No GPU work on one machine

Message boards : Number crunching : No GPU work on one machine
Starman
Joined: 15 May 99
Posts: 204
Credit: 81,351,915
RAC: 25
Canada
Message 1177084 - Posted: 10 Dec 2011, 15:53:22 UTC

I haven't been able to get any GPU work on one of my machines (5360487) for at least 2-3 weeks now, even though everything seems to be running smoothly again. My other machines are all getting what they want now, except for this one. It has the max 100 CPU WUs but just can't get any GPU work. Any ideas?

Thanks

Brett

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1177097 - Posted: 10 Dec 2011, 17:10:35 UTC - in response to Message 1177084.  
Last modified: 10 Dec 2011, 17:57:08 UTC

Starman wrote:
I haven't been able to get any GPU work on one of my machines (5360487) for at least 2-3 weeks now, even though everything seems to be running smoothly again. My other machines are all getting what they want now, except for this one. It has the max 100 CPU WUs but just can't get any GPU work. Any ideas?

Thanks

Brett

First, update that host to BOINC 6.10.60, and no later. (BOINC 6.10.18 is just a bit too buggy; I think it had a libcurl DNS bug.)

Move that host to a different venue from the others (home, school or work), set up a new set of Project Preferences for that venue, then disable CPU work fetch by setting Use CPU to no, and update that host.
Now it can only do ATI GPU work fetches. When it has enough ATI GPU work, enable CPU work fetch again. (BOINC 6.10.18 and earlier also had a bug where preferences across venues didn't work correctly.)

Edit: I've now noticed you have an HD 2600 on that host and are running Hybrid Astropulse. Is it really worth it? Do you get any speedup over the r409 CPU app? I certainly didn't get a speedup with my overclocked E8500 & HD5770.

Claggy

Team kizb
Joined: 8 Mar 01
Posts: 219
Credit: 3,709,162
RAC: 0
Germany
Message 1177112 - Posted: 10 Dec 2011, 18:20:39 UTC - in response to Message 1177097.  

I'm having a similar issue. Blue has been crunching WUs just fine, but for some reason Green keeps saying the project has no new work.
My Computers:
█ Blue Offline
█ Green Offline
█ Red Offline

Team kizb
Joined: 8 Mar 01
Posts: 219
Credit: 3,709,162
RAC: 0
Germany
Message 1177115 - Posted: 10 Dec 2011, 18:43:54 UTC - in response to Message 1177112.  

It seems the workaround discussed above is working. Interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?
My Computers:
█ Blue Offline
█ Green Offline
█ Red Offline

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1177119 - Posted: 10 Dec 2011, 18:54:44 UTC - in response to Message 1177115.  

Team kizb wrote:
It seems the workaround discussed above is working. Interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?

The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? I don't know, but probably because of the dodgy DCF.

Claggy

Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1177129 - Posted: 10 Dec 2011, 19:58:35 UTC - in response to Message 1177119.  

Team kizb wrote:
It seems the workaround discussed above is working. Interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?

Claggy wrote:
The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? I don't know, but probably because of the dodgy DCF.

Claggy


I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.


Kevin


Starman
Joined: 15 May 99
Posts: 204
Credit: 81,351,915
RAC: 25
Canada
Message 1177170 - Posted: 11 Dec 2011, 0:44:16 UTC

Before I started running the GPU app, my RAC was around 1,000. With the GPU running it was up to 1,700, so a 70% improvement is worth it. I'll try the fixes and see what happens.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1177180 - Posted: 11 Dec 2011, 2:50:52 UTC - in response to Message 1177129.  

Kevin Olley wrote:
I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.

It may be the VLARs you rescheduled onto GPU which have made the servers think your CPU can do 63 GFLOPS on MB tasks. The servers don't know those ~3800 second runtimes were done on GPU so they affect the CPU average.
                                                                   Joe

Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1177226 - Posted: 11 Dec 2011, 10:54:58 UTC - in response to Message 1177180.  

Kevin Olley wrote:
I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.

Josef W. Segur wrote:
It may be the VLARs you rescheduled onto GPU which have made the servers think your CPU can do 63 GFLOPS on MB tasks. The servers don't know those ~3800 second runtimes were done on GPU so they affect the CPU average.
Joe


I doubt it; I was seeing this a long time before I had any reason to reschedule any WUs from CPU to GPU. The only reason I reschedule is lack of GPU WUs.

I have been having DCF problems, or problems with "to completion" times affecting cache size.

Too many shorties on GPU will reduce DCF to a level where, if it were not for the CPU limits in place, I would have been flooded with CPU WUs.

Too many shorties or regular WUs on CPU while the GPU was processing regular WUs would increase DCF and "to completion" times to the point that it would affect my GPU cache, or affect BOINC to the point that the GPUs, when their cache was up to the limits, would be forced into high priority mode until DCF dropped back to a more normal level.

This can be controlled to a certain extent with APs: with careful use of the suspend button you can force an AP to start; you then unsuspend the suspended units and the AP is next in line. If no APs are available, I have a couple of CPDN WUs that I use.

I use BoincTasks on this machine, and one of the things it shows is the total estimated completion time for the cache. The way the GPU cache's estimated runtime jumps on the completion of a VLAR on CPU is horrifying (18 days to 250+ days is not unusual; divide by 9).




Kevin


Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1177227 - Posted: 11 Dec 2011, 11:12:55 UTC - in response to Message 1177129.  

Team kizb wrote:
It seems the workaround discussed above is working. Interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?

Claggy wrote:
The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? I don't know, but probably because of the dodgy DCF.

Claggy

Kevin Olley wrote:
I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.


Do you have <flops> values in your app_info?

Claggy

Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1177230 - Posted: 11 Dec 2011, 12:06:39 UTC - in response to Message 1177227.  

Team kizb wrote:
It seems the workaround discussed above is working. Interesting. Any idea why it doesn't like to download GPU tasks when CPU is enabled?

Claggy wrote:
The scheduler probably thinks the CPU is faster than the GPU, so it sends work to that first. Why? I don't know, but probably because of the dodgy DCF.

Claggy

Kevin Olley wrote:
I find at times that the "to completion" times on CPU WUs are shorter than the times for the same type of WU on GPU.

Claggy wrote:
Do you have <flops> values in your app_info?

Claggy


No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files is above my comfort level.

I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste in the right one when I need it. I know it's cheating, but it works for me.

My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me.
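
For anyone wondering what those three count values mean in practice: count is the fraction of one GPU that a single task claims, so the three copies correspond to running 1, 2 or 3 tasks per GPU at once. Here is a minimal Python sketch of that arithmetic. It assumes BOINC keeps starting tasks while their combined share stays at or under one whole GPU, and tasks_per_gpu is a made-up helper name, not anything from BOINC itself.

```python
def tasks_per_gpu(count: float) -> int:
    """Number of tasks that fit on one GPU when each claims `count` of it."""
    if not 0 < count <= 1:
        raise ValueError("count is expected to be in (0, 1]")
    # Small tolerance guards against floating-point noise in 1 / count.
    return int(1 / count + 1e-9)


# The three values Kevin keeps in his app_info copies.
for count in (1, 0.5, 0.33):
    print(count, tasks_per_gpu(count))
```

A share of 0.34 would only allow two tasks, since three of them would add up to more than one GPU.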




Kevin


arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1177346 - Posted: 11 Dec 2011, 22:42:51 UTC - in response to Message 1177230.  


Claggy wrote:
Do you have <flops> values in your app_info?

Claggy

Kevin Olley wrote:
No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files is above my comfort level.

I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste in the right one when I need it. I know it's cheating, but it works for me.

My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me.


Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question.

For example, for SETI@home Enhanced (anonymous platform, nvidia GPU)
it shows 156.60088862089

All you have to do is add e09 to it and you get
<flops>156.60088862089e09</flops>

Your CPU is
<flops>62.12696862414e09</flops>

and AP is
<flops>32.098248233676e09</flops>
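
The conversion above is just the figure from the application details page with e09 stuck on the end, which multiplies it by a billion. A small Python sketch of that, assuming the figure on the page is in GFLOPS; flops_tag and as_flops are names invented for this example.

```python
def flops_tag(apr_gflops: str) -> str:
    """Build the <flops> line for app_info from a figure given in GFLOPS."""
    # Appending "e09" scales the GFLOPS figure to plain FLOPS.
    return f"<flops>{apr_gflops}e09</flops>"


def as_flops(apr_gflops: str) -> float:
    """The same value as a number, handy for sanity checks."""
    return float(f"{apr_gflops}e09")


# The three figures quoted in the post above.
for figure in ("156.60088862089", "62.12696862414", "32.098248233676"):
    print(flops_tag(figure), as_flops(figure))
```

Passing the figure as text keeps every digit exactly as it appears on the page.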

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1177354 - Posted: 11 Dec 2011, 23:16:21 UTC - in response to Message 1177346.  
Last modified: 11 Dec 2011, 23:17:08 UTC


Claggy wrote:
Do you have <flops> values in your app_info?

Claggy

Kevin Olley wrote:
No, the only alteration I have made is to the count. To be honest, calculating flops and making more advanced modifications to app_info or similar files is above my comfort level.

I keep 3 copies of app_info.txt (count 1, 0.5, 0.33) and copy and paste in the right one when I need it. I know it's cheating, but it works for me.

My "programming" abilities are very low, so I have a bad habit of finding other ways that seem to work for me.

arkayn wrote:
Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question.

For example, for SETI@home Enhanced (anonymous platform, nvidia GPU)
it shows 156.60088862089

All you have to do is add e09 to it and you get
<flops>156.60088862089e09</flops>

Your CPU is
<flops>62.12696862414e09</flops>

and AP is
<flops>32.098248233676e09</flops>


I wouldn't use the CPU MB flops value: CPU work has been rebranded to the GPU, so the APR of the CPU has been inflated. I'd use a figure of around 15e09 instead of 62.12e09.

Claggy

Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1177366 - Posted: 12 Dec 2011, 0:17:16 UTC - in response to Message 1177354.  


arkayn wrote:
Flops are fairly easy to calculate, as the needed numbers are in the application details for the machine in question.

For example, for SETI@home Enhanced (anonymous platform, nvidia GPU)
it shows 156.60088862089

All you have to do is add e09 to it and you get
<flops>156.60088862089e09</flops>

Your CPU is
<flops>62.12696862414e09</flops>

and AP is
<flops>32.098248233676e09</flops>

Claggy wrote:
I wouldn't use the CPU MB flops value: CPU work has been rebranded to the GPU, so the APR of the CPU has been inflated. I'd use a figure of around 15e09 instead of 62.12e09.

Claggy


OK, I can see where the figures are coming from and the potential problems that could be caused by my prior actions, but I am sorry, I am still lost as to where in the app_info file I have to put this information.

I apologize for being a pain, but my computing skills (software, programming etc.) are extremely limited.


Kevin


arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1177374 - Posted: 12 Dec 2011, 1:00:09 UTC

I stick mine right under the version number.
<app_info>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>AK_v8b2_win_SSE3_AMD.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <flops>15.812442866308e09</flops>
        <platform>windows_intelx86</platform>
        <file_ref>
            <file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <flops>15.812442866308e09</flops>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>AK_v8b2_win_SSE3_AMD.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>setiathome_enhanced</name>
    </app>
    <file_info>
        <name>Lunatics_x41g_win32_cuda41rc1.exe</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cudart32_41_15.dll</name>
        <executable/>
    </file_info>
    <file_info>
        <name>cufft32_41_15.dll</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <flops>203.99134720153e09</flops>
        <platform>windows_intelx86</platform>
        <plan_class>cuda_fermi</plan_class>
        <avg_ncpus>0.040000</avg_ncpus>
        <max_ncpus>0.040000</max_ncpus>
        <coproc>


You will need to add those numbers to all 6 occurrences of the GPU part.
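
If hand-editing every app_version block feels error-prone, the same edit can be scripted. This is only a sketch in Python using the standard xml.etree.ElementTree module: add_flops is a made-up helper, the sample file is a cut-down stand-in rather than a real app_info, and it assumes the version number alone tells CPU (603) and GPU (610) blocks apart, as in the listing above. The two values are the ones suggested earlier in this thread. Work on a backup copy if you try it.

```python
import xml.etree.ElementTree as ET


def add_flops(app_info_xml: str, flops_by_version: dict[str, str]) -> str:
    """Put a <flops> entry right under <version_num> in each matching block."""
    root = ET.fromstring(app_info_xml)
    for app_version in root.iter("app_version"):
        version = app_version.findtext("version_num", default="").strip()
        value = flops_by_version.get(version)
        if value is None:
            continue  # no value supplied for this version number
        existing = app_version.find("flops")
        if existing is not None:
            existing.text = value  # update rather than duplicate
            continue
        children = list(app_version)
        index = next(i for i, c in enumerate(children) if c.tag == "version_num")
        flops = ET.Element("flops")
        flops.text = value
        flops.tail = children[index].tail  # reuse the surrounding indentation
        app_version.insert(index + 1, flops)
    return ET.tostring(root, encoding="unicode")


# Cut-down stand-in for an app_info file (hypothetical, not a complete one).
SAMPLE = """<app_info>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>603</version_num>
        <platform>windows_intelx86</platform>
    </app_version>
    <app_version>
        <app_name>setiathome_enhanced</app_name>
        <version_num>610</version_num>
        <plan_class>cuda_fermi</plan_class>
    </app_version>
</app_info>"""

updated = add_flops(SAMPLE, {"603": "15e09", "610": "156.60088862089e09"})
print(updated)
```

Running it a second time on its own output changes nothing, because an existing flops entry is updated instead of added again.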

Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1177375 - Posted: 12 Dec 2011, 1:09:55 UTC - in response to Message 1177374.  

arkayn wrote:
I stick mine right under the version number.

You will need to add those numbers to all 6 occurrences of the GPU part.


Thank you

I shall stop rescheduling and let the numbers stabilise and then give it a try.



Kevin


Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1177396 - Posted: 12 Dec 2011, 5:05:58 UTC - in response to Message 1177375.  

arkayn wrote:
I stick mine right under the version number.

You will need to add those numbers to all 6 occurrences of the GPU part.

Kevin Olley wrote:
Thank you

I shall stop rescheduling and let the numbers stabilise and then give it a try.

Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors for maximum elapsed time exceeded on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know; I hope someone with practical experience will provide a clearer explanation.

Those cached tasks will have a scaled-down rsc_fpops_est, which is used to calculate estimated time, and a scaled-down rsc_fpops_bound, which is used to calculate the time limit. The big increase in flops will make the times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit.
                                                                  Joe
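
To put rough numbers on that, here is a small Python sketch. It assumes the client works out the estimate as rsc_fpops_est / flops * DCF and the limit as rsc_fpops_bound / flops, which is how the description above reads. Every figure in it is hypothetical and picked only to make the effect easy to see.

```python
def estimated_seconds(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Runtime estimate shown as "to completion": scaled by DCF."""
    return rsc_fpops_est / flops * dcf


def limit_seconds(rsc_fpops_bound: float, flops: float) -> float:
    """Maximum elapsed time before the task is aborted: DCF plays no part."""
    return rsc_fpops_bound / flops


# Hypothetical cached task and speeds, chosen only to make the effect visible.
FPOPS_EST = 3e13
FPOPS_BOUND = 3e14     # ten times the estimate
OLD_FLOPS = 15e9       # speed the client assumed before the edit
NEW_FLOPS = 150e9      # <flops> value after the edit
REAL_RUNTIME = 3800.0  # seconds the task actually needs

old_limit = limit_seconds(FPOPS_BOUND, OLD_FLOPS)  # 20000 s, plenty of room
new_limit = limit_seconds(FPOPS_BOUND, NEW_FLOPS)  # 2000 s, below the real runtime

# DCF can grow until the estimate matches reality again...
dcf_needed = REAL_RUNTIME / estimated_seconds(FPOPS_EST, NEW_FLOPS, 1.0)
fixed_estimate = estimated_seconds(FPOPS_EST, NEW_FLOPS, dcf_needed)

# ...but the limit does not move, so the task would hit it first.
would_abort = REAL_RUNTIME > new_limit
print(old_limit, new_limit, dcf_needed, fixed_estimate, would_abort)
```

With these made-up figures the estimate recovers once DCF climbs to 19, while the limit stays at 2000 seconds, which is the situation the rescheduler setting is meant to guard against.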

Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1177418 - Posted: 12 Dec 2011, 8:25:49 UTC - in response to Message 1177396.  

arkayn wrote:
I stick mine right under the version number.

You will need to add those numbers to all 6 occurrences of the GPU part.

Kevin Olley wrote:
Thank you

I shall stop rescheduling and let the numbers stabilise and then give it a try.

Josef W. Segur wrote:
Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors for maximum elapsed time exceeded on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know; I hope someone with practical experience will provide a clearer explanation.

Those cached tasks will have a scaled-down rsc_fpops_est, which is used to calculate estimated time, and a scaled-down rsc_fpops_bound, which is used to calculate the time limit. The big increase in flops will make the times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit.
Joe


That is one of the main reasons I use the rescheduler; the other is to get an estimate of the ratio of VHARs to regular WUs residing in my cache.


Kevin


Starman
Joined: 15 May 99
Posts: 204
Credit: 81,351,915
RAC: 25
Canada
Message 1178212 - Posted: 15 Dec 2011, 5:30:56 UTC

Well, I finally got some GPU work for this machine. In fact, 6 AP WUs no less!
The only change I made (2 days ago) was to downgrade to 6.10.60. :)

AndyJ
Joined: 17 Aug 02
Posts: 248
Credit: 27,380,797
RAC: 0
United Kingdom
Message 1178272 - Posted: 15 Dec 2011, 11:56:34 UTC - in response to Message 1177396.  

arkayn wrote:
I stick mine right under the version number.

You will need to add those numbers to all 6 occurrences of the GPU part.

Kevin Olley wrote:
Thank you

I shall stop rescheduling and let the numbers stabilise and then give it a try.

Josef W. Segur wrote:
Just before you add the <flops> entries, I suggest you use the rescheduler to protect against -177 errors for maximum elapsed time exceeded on cached tasks. The expert tab has a setting called something like "limit rsc_fpops_bound" which provides that protection when you do a reschedule, even if the other settings mean that no tasks are actually rescheduled. That's as much as I know; I hope someone with practical experience will provide a clearer explanation.

Those cached tasks will have a scaled-down rsc_fpops_est, which is used to calculate estimated time, and a scaled-down rsc_fpops_bound, which is used to calculate the time limit. The big increase in flops will make the times much shorter, but DCF will soon increase to fix the estimates. Unfortunately DCF isn't used for the limit.
Joe


Good info, Joe. I just put <flops> onto my main cruncher and got -17's. Will give this a try. Or is there a way to edit fpops in the app_info?

Regards,

A
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.