No GPU work on one machine

Author	Message
Starman Joined: 15 May 99 Posts: 204 Credit: 81,351,915 RAC: 25	Message 1177084 - Posted: 10 Dec 2011, 15:53:22 UTC
I haven't been able to get any GPU work on one of my machines (5360487) for at least 2-3 weeks now, even though everything seems to be running smoothly again. My other machines are all getting what they want now, except for this one. It has the max 100 CPU WUs but just can't get any GPU work. Any ideas?
Thanks,
Brett

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1177097 - Posted: 10 Dec 2011, 17:10:35 UTC - in response to Message 1177084. Last modified: 10 Dec 2011, 17:57:08 UTC
> I haven't been able to get any GPU work on one of my machines (5360487) for at least 2-3 weeks now, even though everything seems to be running smoothly again. My other machines are all getting what they want now, except for this one. It has the max 100 CPU WUs but just can't get any GPU work. Any ideas? Thanks, Brett
First, update that host to BOINC 6.10.60, and no later. (BOINC 6.10.18 is just a bit too buggy; I think it had a libcurl DNS bug.)
Then move that host to a different venue from the others (home, school or work) and set up a new set of project preferences for that venue. Disable CPU work fetch by setting "Use CPU" to no, then update that host. Now it can only do ATI GPU work fetches. When it has enough ATI GPU work, enable CPU work fetch again. (BOINC 6.10.18 and earlier also had a bug where preferences across venues didn't work correctly.)
Edit: I've now noticed you have an HD 2600 on that host and are running Hybrid Astropulse. Is it really worth it? Do you get any speedup over the r409 CPU app? I certainly didn't get a speedup with my overclocked E8500 & HD5770.
Claggy

Team kizb Joined: 8 Mar 01 Posts: 219 Credit: 3,709,162 RAC: 0	Message 1177112 - Posted: 10 Dec 2011, 18:20:39 UTC - in response to Message 1177097.
I'm having a similar issue. Blue has been crunching WUs just fine, but for some reason Green keeps saying the project has no new work.
My Computers: █ Blue Offline █ Green Offline █ Red Offline

Team kizb Joined: 8 Mar 01 Posts: 219 Credit: 3,709,162 RAC: 0	Message 1177115 - Posted: 10 Dec 2011, 18:43:54 UTC - in response to Message 1177112.
Seems the workaround talked about above is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1177119 - Posted: 10 Dec 2011, 18:54:44 UTC - in response to Message 1177115.
> Seems the workaround talked about above is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?
The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? Don't know, but probably because of the dodgy DCF.
Claggy

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1177129 - Posted: 10 Dec 2011, 19:58:35 UTC - in response to Message 1177119.
>> Seems the workaround talked about above is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?
> The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? Don't know, but probably because of the dodgy DCF. Claggy
I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.
Kevin

Starman Joined: 15 May 99 Posts: 204 Credit: 81,351,915 RAC: 25	Message 1177170 - Posted: 11 Dec 2011, 0:44:16 UTC
Before I started running the GPU app, my RAC was around 1,000. With the GPU running it was up to 1,700, so a 70% improvement is worth it. I'll try the fixes and see what happens.

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1177180 - Posted: 11 Dec 2011, 2:50:52 UTC - in response to Message 1177129.
Kevin Olley wrote:
> I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.
It may be the VLARs you rescheduled onto GPU which have made the servers think your CPU can do 63 GFLOPS on MB tasks. The servers don't know those ~3800 second runtimes were done on GPU, so they affect the CPU average.
Joe

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1177226 - Posted: 11 Dec 2011, 10:54:58 UTC - in response to Message 1177180.
>> Kevin Olley wrote: I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.
> It may be the VLARs you rescheduled onto GPU which have made the servers think your CPU can do 63 GFLOPS on MB tasks. The servers don't know those ~3800 second runtimes were done on GPU, so they affect the CPU average. Joe
I doubt it. I was seeing this a long time before I had any reason to reschedule any WUs from CPU to GPU. The only reason I reschedule is lack of GPU WUs.
I have been having DCF problems, or problems with "to completion" times affecting cache size. Too many shorties on GPU will reduce DCF to a level where, if it were not for the CPU limits in place, I would have been flooded with CPU WUs. Too many shorties or regular WUs on CPU while the GPU was processing regular WUs would increase DCF and "to completion" times to the point that it would affect my GPU cache, or BOINC, to the point that the GPUs, when their cache was up to the limits, would be forced into high priority mode until DCF dropped to a more normal level.
This can be controlled to a certain extent with APs. With careful use of the suspend button you can force an AP to start; you then unsuspend the suspended units and the AP is next in line. If no APs are available then I have a couple of CPND WUs that I use.
I use BoincTasks on this machine, and one of the things it shows is the total estimated completion time for the cache. The way the GPU cache's estimated runtime jumps on the completion of a VLAR on CPU is horrifying. (18 days to 250+ days is not unusual (divide by 9).)
Kevin

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1177227 - Posted: 11 Dec 2011, 11:12:55 UTC - in response to Message 1177129.
>>> Seems the workaround talked about above is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?
>> The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? Don't know, but probably because of the dodgy DCF. Claggy
> I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.
Do you have <flops> values in your app_info?
Claggy

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1177230 - Posted: 11 Dec 2011, 12:06:39 UTC - in response to Message 1177227.
>>>> Seems the workaround talked about above is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?
>>> The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? Don't know, but probably because of the dodgy DCF. Claggy
>> I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.
> Do you have <flops> values in your app_info? Claggy
No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files is above my comfort level. I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste in the right one when I need it. I know it's cheating but it works for me.
My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me.
Kevin

arkayn Volunteer tester Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 1177346 - Posted: 11 Dec 2011, 22:42:51 UTC - in response to Message 1177230.
>> Do you have <flops> values in your app_info? Claggy
> No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files is above my comfort level. I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste in the right one when I need it. I know it's cheating but it works for me. My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me.
Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question.
For example, for SETI@home Enhanced (anonymous platform, nvidia GPU) it shows 156.60088862089. All you have to do is add an e09 to it and you get <flops>156.60088862089e09</flops>
Your CPU is <flops>62.12696862414e09</flops> and AP is <flops>32.098248233676e09</flops>
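The conversion arkayn describes is plain string handling: take the figure from the application details page and append e09. A minimal sketch, assuming the figure is copied as text; the helper name `flops_tag` is hypothetical and not part of BOINC.

```python
def flops_tag(gflops_text: str) -> str:
    """Build a <flops> line from a GFLOPS figure given as text.

    The figure is kept as a string so no digits are lost to rounding.
    """
    value = gflops_text.strip()
    float(value)  # raises ValueError if the text is not a number
    return f"<flops>{value}e09</flops>"


if __name__ == "__main__":
    # Figures quoted in the post above.
    for label, figure in [("GPU", "156.60088862089"), ("AP", "32.098248233676")]:
        print(label, flops_tag(figure))
```

Keeping the value as text means the line in app_info carries exactly the digits shown on the details page.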

Claggy Volunteer tester Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4	Message 1177354 - Posted: 11 Dec 2011, 23:16:21 UTC - in response to Message 1177346. Last modified: 11 Dec 2011, 23:17:08 UTC
>>> Do you have <flops> values in your app_info? Claggy
>> No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files is above my comfort level. I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste in the right one when I need it. I know it's cheating but it works for me. My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me.
> Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question. For example, for SETI@home Enhanced (anonymous platform, nvidia GPU) it shows 156.60088862089. All you have to do is add an e09 to it and you get <flops>156.60088862089e09</flops> Your CPU is <flops>62.12696862414e09</flops> and AP is <flops>32.098248233676e09</flops>
I wouldn't use the CPU MB flops value, as CPU work has been rebranded to the GPU and the APR of the CPU has been inflated. I'd use a figure of around 15e09 instead of 62.12e09.
Claggy

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1177366 - Posted: 12 Dec 2011, 0:17:16 UTC - in response to Message 1177354.
>> Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question. For example, for SETI@home Enhanced (anonymous platform, nvidia GPU) it shows 156.60088862089. All you have to do is add an e09 to it and you get <flops>156.60088862089e09</flops> Your CPU is <flops>62.12696862414e09</flops> and AP is <flops>32.098248233676e09</flops>
> I wouldn't use the CPU MB flops value, as CPU work has been rebranded to the GPU and the APR of the CPU has been inflated. I'd use a figure of around 15e09 instead of 62.12e09. Claggy
OK, I can see where the figures are coming from and the potential problems that could be caused by my prior actions, but I am sorry, I am still lost as to where in the app_info file I have to put this information.
I apologize for being a pain, but my computing skills (software, programming etc.) are extremely limited.
Kevin

arkayn Volunteer tester Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 1177374 - Posted: 12 Dec 2011, 1:00:09 UTC
I stick mine right under the version number.

<app_info>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>AK_v8b2_win_SSE3_AMD.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <flops>15.812442866308e09</flops>
        <platform>windows_intelx86</platform>
        <file_ref>
            <file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <flops>15.812442866308e09</flops>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>Lunatics_x41g_win32_cuda41rc1.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cudart32_41_15.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cufft32_41_15.dll</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <flops>203.99134720153e09</flops>
        <platform>windows_intelx86</platform>
        <plan_class>cuda_fermi</plan_class>
        <avg_ncpus>0.040000</avg_ncpus>
        <max_ncpus>0.040000</max_ncpus>
        <coproc>

You will need to add those numbers to all 6 occurrences of the GPU part.
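For anyone who would rather not hand-edit every occurrence, the placement arkayn shows (a <flops> line directly under <version_num> in each <app_version>) can be scripted. This is only a sketch under stated assumptions: the file is well-formed XML, CPU versions are the ones without a <plan_class>, and the helper `add_flops` and its lookup by plan class are hypothetical conveniences, not anything BOINC provides. The sample below is a trimmed stand-in for a real app_info file.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for an app_info file.
SAMPLE = """<app_info>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <platform>windows_intelx86</platform>
  </app_version>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <platform>windows_intelx86</platform>
    <plan_class>cuda_fermi</plan_class>
  </app_version>
</app_info>"""


def add_flops(app_info_xml: str, flops_by_plan_class: dict) -> str:
    """Insert or update <flops> right after <version_num> in each <app_version>.

    Keys of flops_by_plan_class are plan class names; "" is assumed to
    mean a CPU version, i.e. one with no <plan_class> element.
    """
    root = ET.fromstring(app_info_xml)
    for app_version in root.iter("app_version"):
        plan_class = app_version.findtext("plan_class", default="")
        flops = flops_by_plan_class.get(plan_class)
        version_num = app_version.find("version_num")
        if flops is None or version_num is None:
            continue  # nothing to add for this version
        entry = app_version.find("flops")
        if entry is None:
            entry = ET.Element("flops")
            entry.tail = version_num.tail  # keep the existing indentation
            position = list(app_version).index(version_num) + 1
            app_version.insert(position, entry)
        entry.text = flops
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_flops(SAMPLE, {"": "15e09", "cuda_fermi": "203.99134720153e09"}))
```

Work on a copy of the file and stop BOINC before swapping it in; the sketch does not touch anything on disk by itself.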

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1177375 - Posted: 12 Dec 2011, 1:09:55 UTC - in response to Message 1177374.
> I stick mine right under the version number. You will need to add those numbers to all 6 occurrences of the GPU part.
Thank you. I shall stop rescheduling and let the numbers stabilise and then give it a try.
Kevin

Josef W. Segur Volunteer developer Volunteer tester Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 1177396 - Posted: 12 Dec 2011, 5:05:58 UTC - in response to Message 1177375.
>> I stick mine right under the version number. You will need to add those numbers to all 6 occurrences of the GPU part.
> Thank you. I shall stop rescheduling and let the numbers stabilise and then give it a try.
Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors (maximum elapsed time exceeded) on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know; I hope someone with practical experience will provide a clearer explanation.
Those cached tasks will have scaled down rsc_fpops_est, which is used to calculate estimated time, and rsc_fpops_bound, which is used to calculate the time limit. The big increase in flops will make the times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit.
Joe
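Joe's point can be put into numbers with a simplified model: estimated time is rsc_fpops_est divided by flops and then multiplied by DCF, while the time limit is rsc_fpops_bound divided by flops with no DCF. The sketch below assumes that model; the task figures, the tenfold flops increase and the function names are hypothetical and chosen only to show the effect.

```python
def estimated_seconds(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Estimated runtime: scaled by DCF, so it recovers as DCF adapts."""
    return rsc_fpops_est / flops * dcf


def limit_seconds(rsc_fpops_bound: float, flops: float) -> float:
    """Runtime limit: DCF is not applied, so it stays short."""
    return rsc_fpops_bound / flops


# Hypothetical cached task, sized while the client assumed 20e9 flops.
EST = 2e13      # rsc_fpops_est
BOUND = 2e14    # rsc_fpops_bound
ACTUAL = 1200.0  # seconds the task really needs

if __name__ == "__main__":
    for flops in (20e9, 200e9):  # before and after adding a much larger <flops>
        limit = limit_seconds(BOUND, flops)
        verdict = "finishes" if ACTUAL <= limit else "hits the time limit"
        print(f"flops={flops:.0e}: estimate {estimated_seconds(EST, flops, 1.0):.0f} s, "
              f"limit {limit:.0f} s, task {verdict}")
    # DCF can climb to repair the estimate, but the limit does not move.
    print(f"estimate with DCF 12: {estimated_seconds(EST, 200e9, 12.0):.0f} s")
```

In this made-up case the limit drops from 10000 s to 1000 s once the larger flops value is in place, which is why the already-cached tasks need their rsc_fpops_bound protected first.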

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1177418 - Posted: 12 Dec 2011, 8:25:49 UTC - in response to Message 1177396.
>>> I stick mine right under the version number. You will need to add those numbers to all 6 occurrences of the GPU part.
>> Thank you. I shall stop rescheduling and let the numbers stabilise and then give it a try.
> Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors (maximum elapsed time exceeded) on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know; I hope someone with practical experience will provide a clearer explanation. Those cached tasks will have scaled down rsc_fpops_est, which is used to calculate estimated time, and rsc_fpops_bound, which is used to calculate the time limit. The big increase in flops will make the times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit. Joe
That is one of the main reasons I use the rescheduler; the other is to get an estimate of how many VHARs versus regulars are residing in my cache.
Kevin

Starman Joined: 15 May 99 Posts: 204 Credit: 81,351,915 RAC: 25	Message 1178212 - Posted: 15 Dec 2011, 5:30:56 UTC
Well, I finally got some GPU work for this machine. In fact, 6 AP WUs no less! The only change I made (2 days ago) was to downgrade to 6.10.60. :)

AndyJ Joined: 17 Aug 02 Posts: 248 Credit: 27,380,797 RAC: 0	Message 1178272 - Posted: 15 Dec 2011, 11:56:34 UTC - in response to Message 1177396.
>>> I stick mine right under the version number. You will need to add those numbers to all 6 occurrences of the GPU part.
>> Thank you. I shall stop rescheduling and let the numbers stabilise and then give it a try.
> Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors (maximum elapsed time exceeded) on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know; I hope someone with practical experience will provide a clearer explanation. Those cached tasks will have scaled down rsc_fpops_est, which is used to calculate estimated time, and rsc_fpops_bound, which is used to calculate the time limit. The big increase in flops will make the times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit. Joe
Good info Joe, I just put <flops> onto my main cruncher, got -17's. Will give this a try. Or is there a way to edit fpops in the app_info?
Regards,
A

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.