No GPU work on one machine |
Message boards : Number crunching : No GPU work on one machine
| Author | Message |
|---|---|
|
I haven't been able to get any GPU work on one of my machines (5360487) for at least 2-3 weeks now, even though everything seems to be running smoothly again. My other machines are all getting what they want now, except for this one. It has the maximum of 100 CPU WUs but just can't get any GPU work. Any ideas? | |
| ID: 1177084 · | |
First, update that host to BOINC 6.10.60, and no later. (BOINC 6.10.18 is a bit too buggy; I think it had a libcurl DNS bug.) Move that host to a different venue from the others (home, school or work) and set up a new set of project preferences for that venue. Then disable CPU work fetch by setting Use CPU to no, and update that host. Now it can only fetch ATI GPU work. When it has enough ATI GPU work, enable CPU work fetch again. (BOINC 6.10.18 and earlier also had a bug where preferences across venues didn't work correctly.) Edit: I've now noticed you have an HD 2600 on that host and are running hybrid Astropulse. Is it really worth it? Do you get any speedup over the r409 CPU app? I certainly didn't get a speedup with my overclocked E8500 & HD5770. Claggy | |
| ID: 1177097 · | |
|
I'm having a similar issue. Blue has been crunching WUs just fine, but for some reason Green keeps saying the project has no new work. | |
| ID: 1177112 · | |
|
It seems the workaround described above is working, which is interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled? | |
| ID: 1177115 · | |
The scheduler probably thinks the CPU is faster than the GPU, so it sends work to the CPU first. Why? I don't know, but probably because of the dodgy DCF. Claggy | |
| ID: 1177119 · | |
I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU. ____________ Kevin | |
| ID: 1177129 · | |
|
Before I started running the GPU app, my RAC was around 1,000. With the GPU running it was up to 1,700, so a 70% improvement is worth it. I'll try the fixes and see what happens. | |
| ID: 1177170 · | |
|
Kevin Olley wrote: "I find at times that the 'to completion' times on CPU WUs are shorter than the times for the same type of WU on GPU." It may be the VLARs you rescheduled onto GPU which have made the servers think your CPU can do 63 GFLOPS on MB tasks. The servers don't know those ~3800 second runtimes were done on GPU, so they affect the CPU average. Joe | |
| ID: 1177180 · | |
I doubt it; I was seeing this long before I had any reason to reschedule any WUs from CPU to GPU. The only reason I reschedule is a lack of GPU WUs. I have been having DCF problems, or problems with "to completion" times affecting cache size. Too many shorties on the GPU will reduce DCF to a level where, if it were not for the CPU limits in place, I would have been flooded with CPU WUs. Too many shorties or regular WUs on the CPU while the GPU was processing regular WUs would increase DCF and the "to completion" times to the point that the GPUs, when their cache was at the limits, would be forced into high priority mode until DCF dropped to a more normal level. This can be controlled to a certain extent with APs: with careful use of the suspend button you can force an AP to start, then unsuspend the suspended units, and the AP is next in line. If no APs are available then I have a couple of CPND WUs that I use. I use BoincTasks on this machine, and one of the things it shows is the total estimated completion time for the cache. The way the estimated runtime of the GPU cache jumps when a VLAR completes on the CPU is horrifying (18 days to 250+ days is not unusual; divide by 9). ____________ Kevin | |
| ID: 1177226 · | |
Do you have <flops> values in your app_info? Claggy | |
| ID: 1177227 · | |
No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files is above my comfort level. I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste in the right one when I need it. I know it's cheating but it works for me. My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me. ____________ Kevin | |
| ID: 1177230 · | |
Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question. For example, for SETI@home Enhanced (anonymous platform, nvidia GPU) it shows 156.60088862089. All you have to do is add e09 to it and you get <flops>156.60088862089e09</flops>. Your CPU is <flops>62.12696862414e09</flops> and AP is <flops>32.098248233676e09</flops>. ____________ | |
| ID: 1177346 · | |
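For anyone who would rather not hand-edit the number, the conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `flops_tag` is made up here, and it assumes the figure on the application details page is in GFLOPS, so appending `e09` turns it into plain flops.

```python
def flops_tag(apr_gflops: str) -> str:
    """Build a <flops> line from the figure shown in the application details.

    Hypothetical helper: assumes the figure is in GFLOPS, so appending
    "e09" scales it to flops, as described in the post above.
    """
    value = apr_gflops.strip()
    float(value)  # raises ValueError if the text is not a number
    return f"<flops>{value}e09</flops>"


print(flops_tag("156.60088862089"))  # <flops>156.60088862089e09</flops>
```

The value is kept as text rather than converted to a float so that the digits copied from the web page are written out unchanged.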
I wouldn't use the CPU MB flops value, as CPU work has been rebranded to the GPU and the CPU's APR has been inflated. I'd use a figure of around 15e09 instead of 62.12e09. Claggy | |
| ID: 1177354 · | |
OK, I can see where the figures are coming from and the potential problems that could be caused by my prior actions, but I am sorry, I am still lost as to where in the app_info file I have to put this information. I apologize for being a pain, but my computing skills (software, programming etc.) are extremely limited. ____________ Kevin | |
| ID: 1177366 · | |
|
I stick mine right under the version number.
<app_info>
<app>
<name>setiathome_enhanced</name>
</app>
<file_info>
<name>AK_v8b2_win_SSE3_AMD.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>603</version_num>
<flops>15.812442866308e09</flops>
<platform>windows_intelx86</platform>
<file_ref>
<file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>603</version_num>
<flops>15.812442866308e09</flops>
<platform>windows_x86_64</platform>
<file_ref>
<file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>setiathome_enhanced</name>
</app>
<file_info>
<name>Lunatics_x41g_win32_cuda41rc1.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart32_41_15.dll</name>
<executable/>
</file_info>
<file_info>
<name>cufft32_41_15.dll</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>610</version_num>
<flops>203.99134720153e09</flops>
<platform>windows_intelx86</platform>
<plan_class>cuda_fermi</plan_class>
<avg_ncpus>0.040000</avg_ncpus>
<max_ncpus>0.040000</max_ncpus>
<coproc>
You will need to add those numbers to all 6 occurrences of the GPU part. ____________ | |
| ID: 1177374 · | |
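If editing every `<app_version>` block by hand feels risky, the placement shown above (a `<flops>` line right under each `<version_num>`) could be sketched as a small Python script. Treat it as a rough illustration and not a tested tool: `add_flops` is a hypothetical helper, and picking the value by version number is a simplifying assumption, since your own app_info may need different values for blocks that share a version number. Back up the file first.

```python
import re


def add_flops(app_info_text, flops_by_version):
    """Insert a <flops> line under each matching <version_num> line.

    flops_by_version maps a version number such as "603" to the text to
    put inside the tag, e.g. "15e09" (hypothetical mapping, see above).
    A block that already has <flops> on the next line is left alone.
    """
    lines = app_info_text.splitlines()
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        match = re.match(r"\s*<version_num>(\d+)</version_num>\s*$", line)
        if not match:
            continue
        version = match.group(1)
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        if version in flops_by_version and "<flops>" not in next_line:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f"{indent}<flops>{flops_by_version[version]}</flops>")
    return "\n".join(out)


sample = """<app_version>
<app_name>setiathome_enhanced</app_name>
<version_num>603</version_num>
<platform>windows_intelx86</platform>
</app_version>"""

print(add_flops(sample, {"603": "15e09"}))
```

The script works line by line instead of parsing the XML, so it leaves the rest of the file exactly as it was and can be run twice without adding a second `<flops>` line.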
Thank you, I shall stop rescheduling, let the numbers stabilise, and then give it a try. ____________ Kevin | |
| ID: 1177375 · | |
Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors for maximum elapsed time exceeded on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know, I hope someone with practical experience will provide a clearer explanation. Those cached tasks will have scaled down rsc_fpops_est which is used to calculate estimated time, and rsc_fpops_bound which is used to calculate the time limit. The big increase in flops will make the times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit. Joe | |
| ID: 1177396 · | |
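To see why a big jump in <flops> can push cached tasks over the time limit, here is a rough Python sketch of the arithmetic Joe describes. It is a simplified model, not the client's actual code: the two formulas are assumed from his description (estimate from rsc_fpops_est scaled by DCF, limit from rsc_fpops_bound with no DCF), and every number below is made up for illustration.

```python
def estimated_seconds(rsc_fpops_est, flops, dcf):
    """Estimated runtime: task size over speed, corrected by DCF (assumed model)."""
    return rsc_fpops_est / flops * dcf


def limit_seconds(rsc_fpops_bound, flops):
    """Maximum elapsed time: DCF is not applied here (assumed model)."""
    return rsc_fpops_bound / flops


# Hypothetical cached task whose bound is 10x its estimate.
rsc_fpops_est = 4e13
rsc_fpops_bound = 4e14
real_runtime = 4000.0  # seconds the task actually needs on this host

# Before adding <flops>: a low speed figure, DCF near 1.
print(estimated_seconds(rsc_fpops_est, 1e10, 1.0))  # 4000.0
print(limit_seconds(rsc_fpops_bound, 1e10))         # 40000.0

# After adding a much larger <flops>: the limit shrinks by the same factor.
print(limit_seconds(rsc_fpops_bound, 2e11))         # 2000.0

# DCF can climb to repair the estimate, but the limit stays where it is.
print(estimated_seconds(rsc_fpops_est, 2e11, 20.0)) # 4000.0
print(real_runtime > limit_seconds(rsc_fpops_bound, 2e11))  # True
```

In this made-up case the task needs 4000 seconds but is only allowed 2000, which is the "maximum elapsed time exceeded" situation the rescheduler setting is meant to guard against.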
That is one of the main things I use the rescheduler for; the other is to get an estimate of how many VHARs versus regular WUs are residing in my cache. ____________ Kevin | |
| ID: 1177418 · | |
|
Well, I finally got some GPU work for this machine. In fact 6 AP WU's no less! | |
| ID: 1178212 · | |
Good info, Joe. I just put <flops> onto my main cruncher and got -177s. I will give this a try. Or is there a way to edit fpops in the app_info? Regards, A ____________ | |
| ID: 1178272 · | |
| Copyright © 2013 University of California |