Message boards : Number crunching : FIFO??
| Author | Message |
|---|---|
|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4
|
I haven't seen any sign of the next alpha build yet, but we'll need that and a new round of testing before we start to consider a public release again. I'm waiting for a new tagged release so the Arm CPU model identification fix can hopefully be passed out to the repository builds; still waiting. Claggy |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14152 Credit: 200,643,578 RAC: 874
|
Agreed. We got a code fix ("fix job scheduling bug that starves CPU instances") and a private drop (hosted by Rom, because he built it while the BOINC server was out of action): http://www.romwnet.org/files/boinc.030815.x64.zip I haven't seen any sign of the next alpha build yet, but we'll need that and a new round of testing before we start to consider a public release again. |
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67
|
Take 2 on the event logs has provided the requisite smoking gun: A change was made and I did test it successfully, but I don't know how many more testers are needed or when it will be available for beta testing. In my opinion this is a show stopper, so I wouldn't be too anxious to put 7.6.6 beta to the general public. I don't buy computers, I build them!! |
|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4
|
We're in a bit of a bind at the moment because the BOINC server (holding the message boards and changelogs) has crashed - and even the WayBack machine hasn't got any of the alpha build history archived. Or look at the tagged release: https://github.com/BOINC/boinc/commits/client_release/7.6/7.6.3 Claggy |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14152 Credit: 200,643,578 RAC: 874
|
Take 2 on the event logs has provided the requisite smoking gun: [cpu_sched_debug] using 2.00 out of 6 CPUs With Cliff's permission, I've forwarded the logs on to David and Rom: I think we'll have to wait for feedback when they get their server(s) working again. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14152 Credit: 200,643,578 RAC: 874
|
Well, Cliff has sent me the debug logs I requested, and they're showing the expected "reserving 0.500000 of coproc NVIDIA" - so we're still in 'multiple apps per GPU' territory, not the 'multiple GPUs per app' that commit was designed to handle. Still, the law of unintended consequences is very widely applicable... Argh, just noticed a finger-fumble on his logs - they're identical, even to the 12:56:09 timestamp on both logs. Must have failed to copy the second one before pasting it into a text file (we've all done that). Back to the drawing board. |
|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4
|
This one looks like a possible candidate: https://github.com/BOINC/boinc/commit/eab76dc245ed88bcce3e787b9315cd3c440d4f97 client: fix bug when app version uses > 1 GPU instance Claggy |
|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4
|
We're in a bit of a bind at the moment because the BOINC server (holding the message boards and changelogs) has crashed - and even the WayBack machine hasn't got any of the alpha build history archived. You can check back in the 7.6 head to see what changesets were applied: https://github.com/BOINC/boinc/commits/client_release/7/7.6 Claggy |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14152 Credit: 200,643,578 RAC: 874
|
4) Did the same with 7.6.3 and that is when the problem starts. AP CPU tasks will not run concurrently with AP GPU tasks. That's very odd. Both v7.6.2 and v7.6.3 were created because of bugs I reported: v7.6.2 was to fix the problem deleting files over 4 GB v7.6.3 was to fix a delay doing the task cleanup at the end of a job (GPUGrid, using optional renamed output files) Neither of those should have changed GPU/CPU scheduling, though of course other development work will have been pottering along beside the specific bugs I remember. We're in a bit of a bind at the moment because the BOINC server (holding the message boards and changelogs) has crashed - and even the WayBack machine hasn't got any of the alpha build history archived. So, even though it's a bit of a chore, could you possibly save and send me one cycle of <cpu_sched_debug> log for the working v7.6.2 client, and one cycle for v7.6.3 (or v7.6.6, doesn't really matter: either of the broken ones), so I can try and work out what went wrong? Unless we can send them some evidence like that, it's unlikely they'll bother to build a new test version - and then we'd be stuck with v7.6.6 for ever. |
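The `<cpu_sched_debug>` log requested above is normally switched on through the `<log_flags>` section of the client's `cc_config.xml`. A minimal sketch that builds such a file with Python's standard library follows; the helper name `build_cc_config` is hypothetical, and the sketch only produces the text (where the file is saved and how the client is told to re-read it are left to the reader).

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1.

    Hypothetical helper for illustration; it does not touch the BOINC
    data directory or the running client.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config(["cpu_sched_debug"])
print(config_text)
```

Because the flag is verbose, it would probably be switched off again once one scheduling cycle per client version has been captured.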
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67
|
Since it has been proven that the AP CPU 7.03 (SSE2/AVX) tasks are running on this machine when all AP GPU 7.10 (opencl_nvidia_100) tasks are suspended, I'm now wondering if there is something within Lunatics, the BOINC client, or a combination of the two that is preventing the two working in conjunction with each other. I've been running Lunatics with .5 CPU & .5 GPU for several years with no problems. I know the current wisdom is to allocate 1 core for each GPU task, but I've found that my settings have worked very well since my Fermi days. I did some regression testing this morning and found the following: 1) Uninstalled 7.6.6 and reverted back to 7.4.42, recycled the machine and the CPU tasks had no problem with running concurrently with the GPU tasks. 2) Installed 7.6.1 over 7.4.42 and recycled machine with the same results once BOINC started. 3) Installed 7.6.2 over 7.6.1 with the same results as #1 & #2. 4) Did the same with 7.6.3 and that is when the problem starts. AP CPU tasks will not run concurrently with AP GPU tasks. 5) Did the same with 7.6.6 with the same results as #4. There is definitely something, either in Lunatics or BOINC (whether it's the client or manager), that is preventing the two working concurrently. Right now I'm running 7.6.2 so that both apps run together until this is fixed or another version is ready for beta testing. I don't buy computers, I build them!! |
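The regression test in the post above boils down to walking the client versions in order and noting where the behaviour flips. A small sketch of that bookkeeping, using only the five results reported there (the function name `first_bad` is illustrative):

```python
# Results as reported in the post: True means AP CPU tasks ran
# concurrently with AP GPU tasks on that client version.
results = [
    ("7.4.42", True),
    ("7.6.1", True),
    ("7.6.2", True),
    ("7.6.3", False),
    ("7.6.6", False),
]


def first_bad(ordered_results):
    """Return (last_good, first_bad) from (version, ok) pairs in release order."""
    last_good = None
    for version, ok in ordered_results:
        if not ok:
            return last_good, version
        last_good = version
    return last_good, None


last_good, bad = first_bad(results)
print(f"last good: {last_good}, first bad: {bad}")
```

This narrows the search to whatever changed between the last good and first bad builds, which is why the changesets between those two tags are the natural place to look.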
Richard Haselgrove Joined: 4 Jul 99 Posts: 14152 Credit: 200,643,578 RAC: 874
|
Since it has been proven that the AP CPU 7.03 (SSE2/AVX) tasks are running on this machine when all AP GPU 7.10 (opencl_nvidia_100) tasks are suspended, I'm now wondering if there is something within Lunatics, the BOINC client, or a combination of the two that is preventing the two working in conjunction with each other. Two (and only two) simple possibilities come to mind: You may have overdone the much-touted "free at least one core when running OpenCL tasks on NVidia" advice. The Lunatics default setting is 0.04 avg / 0.2 max CPUs per task (so five tasks would need to be running concurrently to free an extra core - maybe we should rethink that). Any additional restriction would be down to your local choices/edits for the number of CPUs BOINC schedules, or any app_config.xml file settings. The other possibility is a <max_concurrent> setting in app_config.xml Beyond that, you would be into the realms of resource exhaustion - not enough memory, perhaps - but that's less likely. The evidence would be found using the <cpu_sched_debug> event log flag (which includes GPU scheduling, despite the name) - but that's complex and verbose. You probably wouldn't want to run it continually, but it might be worth a peek. |
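The "five tasks to free an extra core" remark above is simple arithmetic on the per-task CPU figure. A sketch of it, assuming (as the post implies) that the per-task CPU fractions simply add up until they reach a whole core; exact fractions are used so that 0.2 does not pick up floating-point error:

```python
import math
from fractions import Fraction


def tasks_to_free_one_core(cpus_per_task):
    """Smallest number of concurrent GPU tasks whose CPU shares sum to >= 1 core.

    cpus_per_task is given as a string such as "0.2" so it converts exactly.
    """
    return math.ceil(Fraction(1) / Fraction(cpus_per_task))


lunatics_default = tasks_to_free_one_core("0.2")  # max figure quoted in the post
half_core_each = tasks_to_free_one_core("0.5")    # the .5 CPU setting used on this host
print(lunatics_default, half_core_each)
```

Under that reading, the Lunatics default needs five concurrent GPU tasks before a core is set aside, while a 0.5 CPU setting sets one aside for every two GPU tasks.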
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67
|
Since it has been proven that the AP CPU 7.03 (SSE2/AVX) tasks are running on this machine when all AP GPU 7.10 (opencl_nvidia_100) tasks are suspended, I'm now wondering if there is something within Lunatics, the BOINC client, or a combination of the two that is preventing the two working in conjunction with each other. I don't buy computers, I build them!! |
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67
|
All MB CPU tasks have been completed and the MB(cuda50) are now running. There are NO CPU tasks running at this time. I don't buy computers, I build them!! |
betreger Joined: 29 Jun 99 Posts: 10359 Credit: 29,581,041 RAC: 66
|
Frankly, with the very relaxed deadlines at this project (most other BOINC projects set 7-day or 14-day deadlines), it would actually take considerable effort and ingenuity to get BOINC to worry about deadlines (invoke 'high priority' or EDF processing) for a SETI-only cruncher, and I've not seen any evidence of it yet in this thread. Richard I have 2 examples of high priority. My XP box when it runs AP only and they are plentiful as they have been recently does it on its GT430, as does my W7 box on its CPU. I'll admit I try to run a 20 day cache but since nothing ever times out and all validates I figure that's OK. |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835
|
From one of your CPU tasks ...
You are running AVX; mine also shows SSE2, but it's AVX as well. Simple solution if you are running into priority issues ... Back off your cache to 4 days, that should still keep you full.
|
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67
|
The 4 CPU work units are all VLARs which take a lot longer to process than an AP. As the APs are using the Lunatics app to run. All of the MB CPU VLAR tasks had a receive date of 08/01 and a deadline of 09/23, with an est. run time of 5.07 hrs. each, with 3 "running" left. There are 42 MB (cuda50) with a deadline of 08/21 and 15 with deadlines beginning 09/11, all d/l'ed on 08/01 w/ est. run time of .13 hrs. each "ready to start". I have 130 AP GPU tasks remaining (4 "running" & 126 "ready to start") with earliest deadline of 08/21, w/ est. run time of 1.28 hrs. each. There are the 6 AP CPU tasks, d/l'ed on 07/31 w/ deadlines between 08/21 & 08/25 and est. run times of 5.07 hrs. each "waiting to run". These are the CPU tasks that started this thread. (These tasks are in this status because I suspended all GPU tasks to make sure that they were able to run. As soon as I allowed MB tasks to be processed they have not resumed running). There are 72 AP CPU tasks, d/l'ed on 08/01 with the earliest deadline of 08/26 and est. run time of 5.07 hrs. each. None of the AP CPU tasks have resumed running at this time, and I see nothing apparent with security issues stopping them from running. BTW, the AP GPU tasks have been running since the start of this thread. I will point out that some of the AP GPU tasks were already in the queue prior to the failure of the Win 10 upgrade and the complete re-install of Win 7, BOINC 7.6.6 & Lunatics 0.43b on 07/31. The data directory is not on the system drive (C:), but on the data drive (D:) which allows me to fully retain all SETI data. Something else I've noticed is that all of the AP CPU tasks came in as SSE2, but AVX was defaulted in the Lunatics install. I don't buy computers, I build them!! |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14152 Credit: 200,643,578 RAC: 874
|
The 4 CPU work units are all VLARs which take a lot longer to process than an AP. As the APs are using the Lunatics app to run. Sorry, but absolutely none of that is true. VLARs are a special case for NVidia GPUs only, but on CPUs: They take less time to process than APs If Lunatics apps are installed, they will be run with a Lunatics MB CPU app. They have a longer deadline than AP - 53 days. VLAR work issued today won't be required back until 24 or 25 September. Frankly, with the very relaxed deadlines at this project (most other BOINC projects set 7-day or 14-day deadlines), it would actually take considerable effort and ingenuity to get BOINC to worry about deadlines (invoke 'high priority' or EDF processing) for a SETI-only cruncher, and I've not seen any evidence of it yet in this thread. |
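The 53-day VLAR deadline quoted above can be checked against the dates given elsewhere in the thread (tasks received on 08/01 with a deadline of 09/23; the year 2015 is taken from the event log timestamps). A short sketch using the standard library, with an illustrative helper name:

```python
from datetime import date, timedelta

VLAR_DEADLINE_DAYS = 53  # figure quoted in the post above


def report_deadline(received, days=VLAR_DEADLINE_DAYS):
    """Return the date a task is due, given the day it was received."""
    return received + timedelta(days=days)


due_from_aug_1 = report_deadline(date(2015, 8, 1))
due_from_aug_2 = report_deadline(date(2015, 8, 2))
print(due_from_aug_1, due_from_aug_2)
```

Work received on 1 August comes due on 23 September and work received on 2 August on 24 September, which lines up with both the reported deadline and the "24 or 25 September" estimate.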
Zalster Joined: 27 May 99 Posts: 5445 Credit: 528,817,460 RAC: 242
|
If you have it set at 80% CPU usage ... you are saying that 2 cores BOINC can't touch. I'm guessing that line is a duplicate? Otherwise you have 5 GPU work units. The 4 CPU work units are all VLARs which take a lot longer to process than an AP. As the APs are using the Lunatics app to run. So to me, since the VLARs don't have any special app to speed them along, they will take a lot longer to run and having a shorter deadline, they would move to the front of the cache to be processed. Now I have to go back and read what was happening prior to the MB download.
|
Cliff Harding Joined: 18 Aug 99 Posts: 1432 Credit: 110,967,840 RAC: 67
|
If you have it set at 80% CPU usage ... you are saying that 2 cores BOINC can't touch. 08/01/2015 16:37:10 | | max CPUs used: 6 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ad.11509.22323.438086664197.12.103.vlar_0 using setiathome_v7 version 700 in slot 11 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.159.vlar_0 using setiathome_v7 version 700 in slot 13 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.94.vlar_1 using setiathome_v7 version 700 in slot 10 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15aa.9767.21335.438086664200.12.125.vlar_1 using setiathome_v7 version 700 in slot 12 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ag_B0_P0_00070_20150731_00353.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 3 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ab_B4_P0_00157_20150731_07482.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 1 08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15aa_B4_P0_00361_20150731_13562.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 0 That's 4 GPU tasks @ .5 CPU & GPU each, plus 4 CPU tasks, sorry I didn't make that clear in the previous post. I don't buy computers, I build them!! |
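Counting the restarted tasks in the log excerpt above by hand is error-prone, since one line appears twice. A rough sketch that parses the `[cpu_sched] Restarting task` lines copied from the post and de-duplicates them by task name; the regular expression is written for these lines only, and treating "has a plan class in parentheses" as "GPU task" is an assumption that happens to hold for this excerpt:

```python
import re
from collections import Counter

LOG = """\
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ad.11509.22323.438086664197.12.103.vlar_0 using setiathome_v7 version 700 in slot 11
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.159.vlar_0 using setiathome_v7 version 700 in slot 13
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15ac.31563.143519.438086664198.12.94.vlar_1 using setiathome_v7 version 700 in slot 10
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task 09my15aa.9767.21335.438086664200.12.125.vlar_1 using setiathome_v7 version 700 in slot 12
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_11my15af_B6_P1_00355_20150731_17277.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 2
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ag_B0_P0_00070_20150731_00353.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 3
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15ab_B4_P0_00157_20150731_07482.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 1
08/01/2015 16:37:11 | SETI@home | [cpu_sched] Restarting task ap_12my15aa_B4_P0_00361_20150731_13562.wu_0 using astropulse_v7 version 710 (opencl_nvidia_100) in slot 0
"""

PATTERN = re.compile(
    r"\[cpu_sched\] Restarting task (?P<task>\S+) using (?P<app>\S+) "
    r"version (?P<ver>\d+)(?: \((?P<plan>[^)]+)\))? in slot (?P<slot>\d+)"
)

# Keyed by task name, so the repeated slot-2 line is counted once.
tasks = {}
for match in PATTERN.finditer(LOG):
    # Assumption: a plan class in parentheses marks a GPU app version here.
    tasks[match["task"]] = "GPU" if match["plan"] else "CPU"

counts = Counter(tasks.values())
print(dict(counts))
```

For this excerpt the tally comes to four CPU tasks and four GPU tasks, matching the clarification at the end of the post.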
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835
|
If you have it set at 80% CPU usage ... you are saying that 2 cores BOINC can't touch. Then 4 AP tasks at 0.5 CPU is 2 more cores that really don't do much other than feed the GPU, which are about 3% CPU usage .... but you can't run CPU tasks on a reserved core. So you would have 4 cores that are reserved, and not allowed to do BOINC CPU processing.
|
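The core accounting in the post above can be written out as a small calculation. This is a sketch under stated assumptions: the host is taken to have 8 logical CPUs (inferred from "80%" leaving "2 cores BOINC can't touch"), the usable count is rounded down, and the function name `cpu_budget` is illustrative rather than anything in the BOINC client.

```python
import math


def cpu_budget(logical_cpus, cpu_percent, gpu_tasks, cpus_per_gpu_task):
    """Return (usable, reserved_for_gpu, left_for_cpu_tasks).

    usable            CPUs the client may schedule after the percentage limit
    reserved_for_gpu  CPU share set aside to feed the running GPU tasks
    left_for_cpu      what remains for CPU-only tasks
    """
    usable = math.floor(logical_cpus * cpu_percent / 100)
    reserved = gpu_tasks * cpus_per_gpu_task
    return usable, reserved, usable - reserved


# Assumed host: 8 logical CPUs at 80%, 4 GPU tasks at 0.5 CPU each.
usable, reserved, left = cpu_budget(8, 80, 4, 0.5)
print(usable, reserved, left)
```

Under these assumptions 6 CPUs are usable, 2 are reserved for the GPU tasks and 4 remain for CPU work, which is the budget a healthy scheduler would be expected to fill with CPU tasks.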
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.