CUDA Issue

Message boards : Number crunching : CUDA Issue
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 920906 - Posted: 24 Jul 2009, 2:25:06 UTC
Last modified: 24 Jul 2009, 2:26:51 UTC

I am running BOINC 6.6.20 on a machine with a CUDA card.
Whenever I download a new CUDA WU with a nearer report deadline than the one currently running, the client switches to the new WU, no matter how far in the future that deadline is. My nearest deadline is currently 8/16, which shouldn't cause this behavior, but it does. All the previous ones were in September, and they did the same thing...
NOTE: The 8/16 WU just got suspended by an 8/6 WU.
I now have several suspended ("waiting to run") CUDA WUs.
The CPU WUs do NOT show this behavior.
Is this a bug?
Or am I missing something?

AND: by the way, how is it determined whether a WU is CUDA or not? And where (client or server)?
ID: 920906
zpm
Volunteer tester
Joined: 25 Apr 08
Posts: 284
Credit: 1,659,024
RAC: 0
United States
Message 920919 - Posted: 24 Jul 2009, 3:02:09 UTC - in response to Message 920906.  

Upgrade to 6.6.36 or higher.

For GPU work, the nearest deadline beats everything else. For example, GPUGRID, which likes its work returned within a week, will go before SETI, and AQUA would come in third.

8/5 will trump 8/10, and so on.
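
To make the rule concrete, here is a minimal Python sketch of "nearest deadline runs first" as described in this post. It is a toy model, not BOINC's scheduler code: the `Task` class, the `pick_next` function, the task names, and the September date are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Task:
    name: str        # hypothetical label, not a real BOINC task name
    deadline: date   # report deadline


def pick_next(tasks):
    """Return the task with the earliest report deadline."""
    return min(tasks, key=lambda t: t.deadline)


# Two queued tasks: one due in September (made-up day), one due 8/16.
queue = [
    Task("wu_sept", date(2009, 9, 10)),
    Task("wu_aug16", date(2009, 8, 16)),
]
running = pick_next(queue)

# A newly downloaded task with a nearer deadline (8/6) arrives.
queue.append(Task("wu_aug06", date(2009, 8, 6)))
preempting = pick_next(queue)

print(running.name, "->", preempting.name)
```

Under this rule alone, every new download with a nearer deadline displaces the running task, which matches the switching reported at the top of the thread.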

I recommend Secunia PSI: http://secunia.com/vulnerability_scanning/personal/
Go Georgia Tech.
ID: 920919
-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 920945 - Posted: 24 Jul 2009, 6:30:12 UTC

Are you running multiple projects, or both CPU and GPU MB (multibeam) work?

As zpm says, you can upgrade to the latest version. BOINC has difficulty keeping EDF (earliest deadline first) scheduling separate between projects and between CPU and GPU work, so it comes to believe that some work won't make its deadline and starts pausing tasks.

If that doesn't work, you can try my way, which is to downgrade as far as 6.4.7.

So far, that version hasn't paused WUs in the belief that other work wouldn't make its deadline.

At worst I had perhaps 200 paused tasks, and my task list was filled up.

Check this thread: http://setiathome.berkeley.edu/forum_thread.php?id=54613&nowrap=true#917137

Kind regards, Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 920945
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 920961 - Posted: 24 Jul 2009, 7:06:24 UTC - in response to Message 920945.  
Last modified: 24 Jul 2009, 7:21:05 UTC

Only SETI.

The problem is that there is NO EDF involved here. The GPU work CLEARLY has no time problem; if anything, the CPU work should be the one with the problem. Yet it is the GPU work doing the "suspend" dance.
Earlier (a couple of days ago) I had some EDF problems with CPU work, and the client simply ran the nearer-deadline WUs first (out of order); it didn't suspend WUs midway as it is doing for CUDA. And CUDA WUs are clearly shorter, not longer, than CPU ones. Makes no sense to me.

NEW INFO: I just watched it finish a WU and then pick up the WU with the next nearest deadline. There are also 2 "Waiting to run". But there are only 2 running CUDA processes (there should be 3, one running and 2 waiting, per the thread referenced above!!!!).
ID: 920961
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 920967 - Posted: 24 Jul 2009, 7:20:13 UTC - in response to Message 920961.  

Cruncher-American wrote: "And CUDA WUs are clearly shorter, not longer than CPU ones. Makes no sense to me."

There are no CUDA WUs. These SETI Enhanced multibeam tasks can run on both GPU and CPU, and there is really no difference in the data. They just run faster on your GPU because the GPU uses a lot of internal processors that all do calculations on the same data at the same time (in parallel).

Were you to run one such task on your GPU first, then run it on your CPU, you'd notice the difference.
ID: 920967
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 920968 - Posted: 24 Jul 2009, 7:23:07 UTC - in response to Message 920967.  

Jord - I get that...
Part of my post was asking how (and by whom) they are distinguished... Why can't the local client decide how to run them????
ID: 920968
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 920976 - Posted: 24 Jul 2009, 7:45:49 UTC - in response to Message 920968.  

Cruncher-American wrote: "Part of my post was asking how (and by who) they are distinguished..."

Simply said...

When your client asks for work, it sends along the application version number it is asking work for. The scheduler at the project then points you to the download server, which lets you download general work assigned to that application version.

Cruncher-American wrote: "why can't the local client decide how to run them????"

The application isn't that intelligent, and BOINC just checks which application version number the work was assigned and uses that application to run it. This is also why you can't have two applications with the same version number on your system.
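
As a rough illustration of the lookup described above, here is a small Python sketch. It only models the simplified explanation in this post and is not BOINC's real code: `register_app`, `app_for_task`, the version numbers, and the executable names are hypothetical placeholders.

```python
# Toy model: the version number stamped on a task selects the program that runs it.
# All names and numbers below are hypothetical placeholders.
installed_apps = {}


def register_app(version, executable):
    """Register one executable per version number; duplicates are rejected."""
    if version in installed_apps:
        raise ValueError(f"version {version} is already registered")
    installed_apps[version] = executable


def app_for_task(task):
    """Look up the executable by the version number assigned to the task."""
    return installed_apps[task["assigned_version"]]


register_app(100, "cpu_app.exe")   # placeholder CPU build
register_app(200, "cuda_app.exe")  # placeholder CUDA build

task = {"name": "example_wu", "assigned_version": 200}
print(app_for_task(task))
```

Because the version number is the only key, a second application registered under the same number would be ambiguous, which is why the sketch rejects it.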

I'm sure someone else will come along at any time to give you the difficult version, if you want them to do so. ;-)
ID: 920976
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 921005 - Posted: 24 Jul 2009, 12:25:49 UTC - in response to Message 920976.  
Last modified: 24 Jul 2009, 12:26:25 UTC


Jord wrote: "I'm sure someone else will come along at any time to give you the difficult version, if you want them to do so. ;-)"


Probably not!!!!
ID: 921005
Nemesis
Joined: 14 Mar 07
Posts: 129
Credit: 31,295,655
RAC: 0
Canada
Message 921277 - Posted: 25 Jul 2009, 14:54:22 UTC - in response to Message 920961.  

jravin, I have the exact same problem. I am running the 6.6.36 client with the latest optimized apps on a QX6700 and an 8800GTX card. The "suspend dance", as you call it, drives me crazy! Sometimes, but not always, I end up with a dozen or more "waiting to run" CUDA tasks and a varying number of CUDA processes, anywhere from 3 to 8. Once I get over 3, I always end up with failing CUDA WUs due to graphics card memory allocation errors. I generally just reboot the system. It doesn't fix the "waiting to run" problem, but it does clear out the extra CUDA processes and reset the video card.

I hope this gets sorted out soon. I used to leave BOINC to its own devices and just check on it every couple of days, but for the last couple of months I have found myself "babysitting" it multiple times a day...
ID: 921277
Questor (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 921318 - Posted: 25 Jul 2009, 17:06:00 UTC - in response to Message 921277.  
Last modified: 25 Jul 2009, 17:09:21 UTC

I think you are hitting the problem in 6.6.36 and earlier where, when new GPU tasks are started, the old ones can be left in memory; eventually you run out of memory and get the errors. You can check this in your Windows task list.

Message 915264 refers to this and says that 6.6.37 has a fix for it. I haven't noticed any memory errors of this type since upgrading. I still get waiting tasks, but they do eventually get run.

Note, however, that it is still a test version, so you run it at your own risk.
GPU Users Group
ID: 921318
Nemesis
Joined: 14 Mar 07
Posts: 129
Credit: 31,295,655
RAC: 0
Canada
Message 921323 - Posted: 25 Jul 2009, 18:00:34 UTC - in response to Message 921318.  

Thanks for the reply. Yes, that is, of course, the exact problem. I have not moved to 6.6.37 since it's still a "test" version. I have been bitten before doing that...

Hopefully 6.6.37, once officially released, will remedy this situation.
ID: 921323
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 921434 - Posted: 26 Jul 2009, 8:07:25 UTC - in response to Message 920976.  

Well then, is there a tool for changing which app the WU is waiting for? Since WUs run so much faster on my GTS 250 (which I bought just for this use), I believe if I run only CUDA I could liberate my CPUs and still process plenty of WUs.
ID: 921434
Questor (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 3 Sep 04
Posts: 471
Credit: 230,506,401
RAC: 157
United Kingdom
Message 921438 - Posted: 26 Jul 2009, 8:34:40 UTC - in response to Message 921434.  
Last modified: 26 Jul 2009, 8:37:36 UTC

Cruncher-American wrote: "Well then, is there a tool for changing which app the WU is waiting for? Since WUs run so much faster on my GTS 250 (which I bought just for this use), I believe if I run only CUDA I could liberate my CPUs and still process plenty of WUs."


There is a tool called ReSchedule 1.9, available from the Lunatics web site, which will move tasks CPU <--> GPU and can also relocate VLAR tasks to the CPU (these run a lot slower than normal on the GPU).

However, you are using optimized apps, so if you don't want any CPU tasks at all you could remove the CPU entries from your app_info.xml file so that only CUDA tasks are requested. Beware of doing this while you still have CPU tasks, though, as BOINC will report no associated app and abort all of them.
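
Before editing the file, it may help to see which entries are which. The Python sketch below only reads an app_info.xml-style document and lists the version numbers of CPU and GPU entries; it changes nothing. The sample XML is made up (app name, file names, and version numbers are placeholders), and the rule "an `<app_version>` without a `<coproc>` element is a CPU entry" is an assumption about how the file is laid out, so check it against your own file.

```python
import xml.etree.ElementTree as ET

# Made-up sample; the app name, file names and version numbers are placeholders.
SAMPLE = """
<app_info>
  <app><name>example_app</name></app>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>100</version_num>
    <file_ref><file_name>cpu_build.exe</file_name><main_program/></file_ref>
  </app_version>
  <app_version>
    <app_name>example_app</app_name>
    <version_num>200</version_num>
    <plan_class>cuda</plan_class>
    <coproc><type>CUDA</type><count>1</count></coproc>
    <file_ref><file_name>cuda_build.exe</file_name><main_program/></file_ref>
  </app_version>
</app_info>
"""


def split_versions(xml_text):
    """Return (cpu_versions, gpu_versions) as lists of version_num strings.

    Assumption: an <app_version> without a <coproc> child is a CPU entry.
    """
    root = ET.fromstring(xml_text.strip())
    cpu, gpu = [], []
    for app_version in root.findall("app_version"):
        number = app_version.findtext("version_num")
        if app_version.find("coproc") is not None:
            gpu.append(number)
        else:
            cpu.append(number)
    return cpu, gpu


cpu_versions, gpu_versions = split_versions(SAMPLE)
print("CPU entries:", cpu_versions)
print("GPU entries:", gpu_versions)
```

Given the warning above, the CPU entries listed this way should only be removed once no CPU tasks remain in the task list.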

In theory you should be able to deselect CPU on your preferences page, but I can't remember if this still works when you are using app_info.
GPU Users Group
ID: 921438
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 921440 - Posted: 26 Jul 2009, 8:39:01 UTC - in response to Message 921438.  

Questor wrote: "... I can't remember if this still works if you are using app_info?"

No, it doesn't - app_info takes precedence.
ID: 921440
MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 922641 - Posted: 31 Jul 2009, 12:14:42 UTC - in response to Message 921438.  
Last modified: 31 Jul 2009, 12:20:00 UTC

You can tell ReSchedule to shift all work from the CPU to the GPU, but that's not a good idea.

1. VLARs take forever to run on the GPU.

2. BOINC gets all messed up.

Basically, what happens is that BOINC asks for some CPU work, which gets shifted to the GPU (by ReSchedule), so BOINC, thinking it hasn't got enough work, asks for more, until you reach the point where you have far too much GPU work for BOINC to complete. The same applies if you do it the other way (GPU back to CPU).
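
The runaway effect can be sketched with a few lines of Python. This is a toy model of the loop described above, not of BOINC's actual work-fetch logic; the `simulate` function and every number in it are invented for illustration.

```python
def simulate(rounds, target_cpu_tasks, fetch_size):
    """Toy model: the client refills CPU work, then a rescheduler moves it all to the GPU."""
    cpu_queue, gpu_queue = 0, 0
    for _ in range(rounds):
        # The client sees too little CPU work and asks for more.
        if cpu_queue < target_cpu_tasks:
            cpu_queue += fetch_size
        # The rescheduler then shifts all CPU work to the GPU.
        gpu_queue += cpu_queue
        cpu_queue = 0
    return cpu_queue, gpu_queue


cpu, gpu = simulate(rounds=5, target_cpu_tasks=10, fetch_size=10)
print(cpu, gpu)
```

In this model the CPU queue is always empty when the client checks it, so the GPU queue grows by a full fetch every round and never levels off.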

It's best to just let ReSchedule move the VLARs from GPU to CPU, if there are any. Even then you can end up with too much CPU work if you don't keep an eye on it. When that happens, just set all the projects to No new work and let it finish them off.

Oh, and I'm running 6.6.37 on all my rigs after problems similar to yours. There is already a 6.6.38 available; its main change was to the backoff of failed file transfers.
BOINC blog
ID: 922641
Cruncher-American (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 922661 - Posted: 31 Jul 2009, 13:53:18 UTC - in response to Message 922641.  

I AM using ReSchedule (1.9) now; I just use it to move 50% of the work to the GPU. This appears to work fine.

My CUDA card, a GTS 250, does a typical WU in 20-30 minutes, while my dual 2352s take 3-4 hours per WU (so, with 8 cores, about 1 every 20-30 min.), so the workload is now balanced in terms of throughput.
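
The arithmetic behind that comparison, as a quick Python check. The figures are the ones quoted above; the helper name `minutes_per_wu` is just an illustrative choice.

```python
def minutes_per_wu(hours_per_wu, parallel_units):
    """Average minutes between completed WUs when several units run at once."""
    return hours_per_wu * 60 / parallel_units


# 8 CPU cores, each taking 3 to 4 hours per WU.
cpu_fast = minutes_per_wu(3, 8)
cpu_slow = minutes_per_wu(4, 8)
print(cpu_fast, cpu_slow)
```

That gives one CPU result roughly every 22.5 to 30 minutes, in the same range as the 20-30 minutes the single GPU needs per WU.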

Which leads me to think that it's much more cost-effective to buy another CUDA card than to set up another machine to do SETI....
Also more energy-efficient.
ID: 922661

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.