No GPU work on one machine
Author | Message |
---|---|
Starman Joined: 15 May 99 Posts: 204 Credit: 81,351,915 RAC: 25 |
I haven't been able to get any GPU work on one of my machines (5360487) for at least 2-3 weeks now, even though everything seems to be running smoothly again. My other machines are all getting what they want now. This one has the maximum 100 CPU WUs but just can't get any GPU work. Any ideas? Thanks, Brett |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Starman wrote: "I haven't been able to get any GPU work on one of my machines (5360487), for at least 2-3 weeks now. Even though everything seems to be running smoothly again. My other machines are all getting what they want now, except for this one. It has the max 100 CPU WU's but just can't get any GPU work. Any ideas." First, update that host to BOINC 6.10.60, and no later. (BOINC 6.10.18 is just a bit too buggy; I think it had a libcurl DNS bug.) Move that host to a different venue from the others (home, school or work) and set up a new set of project preferences for that venue. Then disable CPU work fetch by setting Use CPU to no, and update that host. Now it can only do ATI GPU work fetches. When it has enough ATI GPU work, enable CPU work fetch again. (BOINC 6.10.18 and earlier also had a bug where preferences across venues didn't work correctly.) Edit: I've now noticed you have an HD 2600 on that host and are running Hybrid Astropulse. Is it really worth it? Do you get any speedup over the r409 CPU app? I certainly didn't get a speedup with my overclocked E8500 & HD5770. Claggy |
Team kizb Joined: 8 Mar 01 Posts: 219 Credit: 3,709,162 RAC: 0 |
|
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Seems as the above talked about work around is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?" The scheduler probably thinks the CPU is faster than the GPU, so it sends work to the CPU first. Why? I don't know, but probably because of the dodgy DCF. Claggy |
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
"Seems as the above talked about work around is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?" I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU. Kevin |
Starman Joined: 15 May 99 Posts: 204 Credit: 81,351,915 RAC: 25 |
Before I started running the GPU app, my RAC was around 1,000. With the GPU running it was up to 1,700, so a 70% improvement is worth it. I'll try the fixes and see what happens. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Kevin Olley wrote: "I find at times that the 'to completion' times on CPU WU's is shorter than the times for the same type of WU on GPU." It may be the VLARs you rescheduled onto GPU which have made the servers think your CPU can do 63 GFLOPS on MB tasks. The servers don't know those ~3800 second runtimes were done on GPU, so they affect the CPU average. Joe |
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
Kevin Olley wrote: "I find at times that the 'to completion' times on CPU WU's is shorter than the times for the same type of WU on GPU." I doubt it; I was seeing this a long time before I had any reason to reschedule any WUs from CPU to GPU. The only reason I reschedule is lack of GPU WUs. I have been having DCF problems, or rather problems with "to completion" times affecting cache size. Too many shorties on GPU will reduce DCF to a level where, if it were not for the CPU limits in place, I would have been flooded with CPU WUs. Too many shorties or regular WUs on CPU while the GPU was processing regular WUs would increase DCF and "to completion" times to a point where it would affect my GPU cache, or push BOINC to the point that the GPUs, when they had a cache up to the limits, would be forced into high priority mode until DCF reduced to a more normal level. This can be controlled to a certain extent with APs: with careful use of the suspend button you can force an AP to start, and when you then unsuspend the suspended units the AP is next in line. If no APs are available, I have a couple of CPDN WUs that I use. I use BoincTasks on this machine, and one of the things it shows is the total estimated completion time for the cache. The way the GPU cache's estimated runtime jumps on the completion of a VLAR on CPU is horrifying (18 days to 250+ days is not unusual; divide by 9). Kevin |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"Seems as the above talked about work around is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?" Do you have <flops> values in your app_info? Claggy |
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
"Seems as the above talked about work around is working, interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?" No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files are above my comfort level. I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste the right one in when I need it. I know it's cheating, but it works for me. My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me. Kevin |
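The copy-and-paste swap described above could also be scripted. What follows is a minimal Python sketch and not something posted in the thread: it assumes the GPU share is stored in `<count>` elements inside `<coproc>`, and that the client keeps starting tasks while their combined count does not exceed 1. The helper names `set_gpu_count` and `tasks_per_gpu` and the sample fragment are hypothetical.

```python
import math
import re


def set_gpu_count(app_info_text, count):
    """Return the text with every <count> value replaced by `count`."""
    new_tag = f"<count>{count:g}</count>"
    return re.sub(r"<count>[^<]*</count>", new_tag, app_info_text)


def tasks_per_gpu(count):
    """How many tasks fit on one GPU if each one claims `count` of it."""
    return math.floor(1 / count + 1e-9)


# Hypothetical fragment of an app_info file, for illustration only.
fragment = "<coproc><type>CUDA</type><count>1</count></coproc>"

for count in (1, 0.5, 0.33):
    print(tasks_per_gpu(count), set_gpu_count(fragment, count))
```

Keeping one master file and rewriting only the count avoids the three copies drifting apart when something else in the file changes.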
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question. For example, SETI@home Enhanced (anonymous platform, nvidia GPU) shows 156.60088862089. All you have to do is add e09 to it and you get <flops>156.60088862089e09</flops>. Your CPU is <flops>62.12696862414e09</flops> and AP is <flops>32.098248233676e09</flops>. |
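The conversion described above can be sketched in a few lines of Python. This is only an illustration: the function name `flops_tag` is made up, and the one rule taken from the post is "append e09 to the figure shown in the application details".

```python
def flops_tag(apr_gflops):
    """Turn a GFLOPS figure (given as text) into a <flops> line.

    Appending "e09" scales the number by 10**9 while leaving the digits
    exactly as they appear on the application details page.
    """
    value = apr_gflops.strip()
    float(value)  # raises ValueError if the text is not a number
    return f"<flops>{value}e09</flops>"


if __name__ == "__main__":
    # Figures quoted in the post above.
    for apr in ("156.60088862089", "62.12696862414", "32.098248233676"):
        print(flops_tag(apr))
```

Working on the text rather than on a float keeps every digit unchanged, so the result can be compared directly against the details page.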
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
I wouldn't use the CPU MB flops value, as CPU work has been rebranded to the GPU and the APR of the CPU has been inflated. I'd use a figure of around 15e09 instead of 62.12e09. Claggy |
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
OK, I can see where the figures are coming from and the potential problems that could be caused by my prior actions, but I am sorry, I am still lost as to where in the app_info file I have to put this information. I apologize for being a pain, but my computing skills (software, programming etc.) are extremely limited. Kevin |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
I stick mine right under the version number.

    <app_info>
        <app>
            <name>setiathome_enhanced</name>
        </app>
        <file_info>
            <name>AK_v8b2_win_SSE3_AMD.exe</name>
            <executable/>
        </file_info>
        <app_version>
            <app_name>setiathome_enhanced</app_name>
            <version_num>603</version_num>
            <flops>15.812442866308e09</flops>
            <platform>windows_intelx86</platform>
            <file_ref>
                <file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
        <app_version>
            <app_name>setiathome_enhanced</app_name>
            <version_num>603</version_num>
            <flops>15.812442866308e09</flops>
            <platform>windows_x86_64</platform>
            <file_ref>
                <file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
                <main_program/>
            </file_ref>
        </app_version>
        <app>
            <name>setiathome_enhanced</name>
        </app>
        <file_info>
            <name>Lunatics_x41g_win32_cuda41rc1.exe</name>
            <executable/>
        </file_info>
        <file_info>
            <name>cudart32_41_15.dll</name>
            <executable/>
        </file_info>
        <file_info>
            <name>cufft32_41_15.dll</name>
            <executable/>
        </file_info>
        <app_version>
            <app_name>setiathome_enhanced</app_name>
            <version_num>610</version_num>
            <flops>203.99134720153e09</flops>
            <platform>windows_intelx86</platform>
            <plan_class>cuda_fermi</plan_class>
            <avg_ncpus>0.040000</avg_ncpus>
            <max_ncpus>0.040000</max_ncpus>
            <coproc>

You will need to add those numbers to all 6 occurrences of the GPU part. |
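The placement shown above, a `<flops>` line directly after `<version_num>`, can also be applied with a short script. The sketch below is an illustration in Python using only the standard library. The function `set_flops`, the `SAMPLE` text and the idea of choosing the value by `<version_num>` are assumptions made for this example, and the sample is a cut-down, well-formed file rather than the excerpt above.

```python
import xml.etree.ElementTree as ET

# Cut-down, hypothetical app_info content for illustration only.
SAMPLE = """<app_info>
  <app><name>setiathome_enhanced</name></app>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>603</version_num>
    <platform>windows_intelx86</platform>
  </app_version>
  <app_version>
    <app_name>setiathome_enhanced</app_name>
    <version_num>610</version_num>
    <platform>windows_intelx86</platform>
    <plan_class>cuda_fermi</plan_class>
  </app_version>
</app_info>"""


def set_flops(root, flops_by_version):
    """Put a <flops> element right after <version_num> in each <app_version>.

    flops_by_version maps a version_num string to the flops text to use.
    Returns the number of <app_version> sections that were changed.
    """
    changed = 0
    for app_version in root.iter("app_version"):
        version = app_version.find("version_num")
        if version is None:
            continue
        key = (version.text or "").strip()
        if key not in flops_by_version:
            continue
        flops = app_version.find("flops")
        if flops is None:
            flops = ET.Element("flops")
            flops.tail = version.tail  # reuse the existing indentation
            position = list(app_version).index(version) + 1
            app_version.insert(position, flops)
        flops.text = flops_by_version[key]
        changed += 1
    return changed


root = ET.fromstring(SAMPLE)
count = set_flops(root, {"603": "15.812442866308e09", "610": "203.99134720153e09"})
print(count)
print(ET.tostring(root, encoding="unicode"))
```

Because the loop visits every `<app_version>`, repeated sections for the same application are all updated in one pass. If one version number covers several plan classes with different speeds, the lookup key would need to include the plan class as well.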
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
"I stick mine right under the version number." Thank you. I shall stop rescheduling, let the numbers stabilise, and then give it a try. Kevin |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"I stick mine right under the version number." Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors (maximum elapsed time exceeded) on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know; I hope someone with practical experience will provide a clearer explanation. Those cached tasks will have a scaled-down rsc_fpops_est, which is used to calculate the estimated time, and rsc_fpops_bound, which is used to calculate the time limit. The big increase in flops will make both times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit. Joe |
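The mechanism Joe describes can be illustrated with a simplified model in Python. It assumes, in line with his description, that the estimate is rsc_fpops_est divided by flops and scaled by DCF, while the limit is rsc_fpops_bound divided by flops with no DCF applied. The function names and every number below are invented for illustration, including the bound being ten times the estimate.

```python
def estimated_seconds(rsc_fpops_est, flops, dcf):
    """Runtime estimate: estimated operations / speed, scaled by DCF."""
    return rsc_fpops_est / flops * dcf


def limit_seconds(rsc_fpops_bound, flops):
    """Elapsed-time limit: operation bound / speed. DCF is not applied."""
    return rsc_fpops_bound / flops


# Hypothetical cached task; every number here is invented for illustration.
RSC_FPOPS_EST = 3.0e12    # scaled-down estimate sent with the task
RSC_FPOPS_BOUND = 3.0e13  # bound, assumed to be 10x the estimate
REAL_RUNTIME = 600.0      # seconds the task really takes

for label, flops, dcf in (
    ("old flops figure", 15e9, 1.0),
    ("new flops figure", 200e9, 1.0),
    ("new flops figure, DCF grown", 200e9, 40.0),
):
    estimate = estimated_seconds(RSC_FPOPS_EST, flops, dcf)
    limit = limit_seconds(RSC_FPOPS_BOUND, flops)
    verdict = "aborted with -177" if REAL_RUNTIME > limit else "ok"
    print(f"{label}: estimate {estimate:.0f} s, limit {limit:.0f} s, {verdict}")
```

In this model a larger DCF repairs the estimate but leaves the limit where it was, which is why tasks already in the cache need their rsc_fpops_bound raised before the new flops value takes effect.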
Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572 |
"I stick mine right under the version number." That is one of the main reasons I use the rescheduler; the other is to get an estimate of the ratio of VHARs to regulars residing in my cache. Kevin |
Starman Joined: 15 May 99 Posts: 204 Credit: 81,351,915 RAC: 25 |
Well, I finally got some GPU work for this machine. In fact, 6 AP WUs no less! The only change I made (2 days ago) was to downgrade to 6.10.60. :) |
AndyJ Joined: 17 Aug 02 Posts: 248 Credit: 27,380,797 RAC: 0 |
"I stick mine right under the version number." Good info, Joe. I just put <flops> onto my main cruncher and got -177 errors. Will give this a try. Or is there a way to edit fpops in the app_info? Regards, A |