Tasks hanging on CUDA 6.08

Profile Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 865791 - Posted: 15 Feb 2009, 16:51:17 UTC

It comes in random batches, but every so often I get a WU that stays at 0.00% progress while the CPU is fully utilized; because the CPU is at 100% use, the CPU time keeps counting up along with wall time. My apologies, I don't have an actual WU example at hand, but I will keep a lookout and post a few if they come through my hands. Thanks guys!
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 865791 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 866253 - Posted: 16 Feb 2009, 21:38:02 UTC

AH HA! I think I finally found the culprit. The problem is not the SETI CUDA app itself, but rather a problem with either BOINC or CUDA in general. I am running a GPU Grid client and it too just hung in exactly the same fashion as SETI CUDA. With that said, I will no longer be reporting hung SETI CUDA tasks here. I will wait until a new version of BOINC or new drivers come out and report back with any updates.
You will be assimilated...bunghole!

ID: 866253 · Report as offensive
Profile Mike O
Joined: 1 Sep 07
Posts: 428
Credit: 6,670,998
RAC: 0
United States
Message 866351 - Posted: 17 Feb 2009, 2:18:11 UTC

I found that by setting the cpu count to the real count rather than real+1, the hangups stopped. I am using Raistmer's team cpu_gpu mod. This costs me the use of one core whenever there is a task the cuda app can process, but it's better than having the cuda hang for 12 hours like it has.
I think it has something to do with the seti client app. When I got home today, all the cores were showing 100% usage but the temps were low (~50c), nowhere near the normal 65c they run at during full load. There was very little progress happening: <.01% per second per core. I restarted the client and it all started to work again.
If this works for anyone else, please post it ok?
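
For anyone wanting to try the same thing: the post does not say where the cpu count was set, so the following is only a minimal sketch that assumes it refers to the `<ncpus>` option in the BOINC client's cc_config.xml. The function name `build_cc_config` is made up for illustration; the output would still have to be saved as cc_config.xml in the BOINC data directory and the client told to re-read its config.

```python
import os
import xml.etree.ElementTree as ET


def build_cc_config(ncpus: int) -> str:
    """Return cc_config.xml text that sets the CPU count BOINC should assume.

    Assumption: the "cpu count" in the post maps to the <ncpus> option.
    """
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # os.cpu_count() reports logical processors, which may differ from
    # physical cores; adjust by hand if needed.
    real_count = os.cpu_count() or 1
    print(build_cc_config(real_count))
```

Using the real count instead of real+1 is the change described above; the trade-off, as noted, is one less core crunching CPU work while a cuda task is running.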


Not Ready Reading BRAIN. Abort/Retry/Fail?
ID: 866351 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 866377 - Posted: 17 Feb 2009, 3:30:19 UTC - in response to Message 866351.  

I found that by setting the cpu count to the real count rather than real+1, the hangups stopped. I am using Raistmer's team cpu_gpu mod. This costs me the use of one core whenever there is a task the cuda app can process, but it's better than having the cuda hang for 12 hours like it has.
I think it has something to do with the seti client app. When I got home today, all the cores were showing 100% usage but the temps were low (~50c), nowhere near the normal 65c they run at during full load. There was very little progress happening: <.01% per second per core. I restarted the client and it all started to work again.
If this works for anyone else, please post it ok?



The side effect is that sometimes BOINC would only crunch with one of my two CPU cores. I consider the hangups to be less annoying than a constantly under-utilized CPU. :) I have a scheduled task set to run BOINC benchmarks every couple hours which un-sticks any problem tasks so I'll just do it that way until the next BOINC version.
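
The scheduled benchmark run mentioned above could look roughly like this. It is a sketch, not the exact task used in the post: it assumes `boinccmd` is on the PATH and that the script itself is launched every couple of hours by Windows Task Scheduler (or cron). The function names and the `runner` parameter are illustrative only.

```python
import subprocess
from typing import Callable, List, Sequence


def benchmark_command(boinccmd: str = "boinccmd") -> List[str]:
    """Build the command that asks the running client to run its benchmarks."""
    return [boinccmd, "--run_benchmarks"]


def run_benchmarks(
    boinccmd: str = "boinccmd",
    runner: Callable[[Sequence[str]], object] = subprocess.run,
) -> List[str]:
    """Send the benchmark request; tasks are paused while benchmarks run."""
    cmd = benchmark_command(boinccmd)
    runner(cmd)
    return cmd


if __name__ == "__main__":
    run_benchmarks()
```

Passing the runner in as a parameter just makes it possible to check the command without a BOINC install at hand.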
You will be assimilated...bunghole!

ID: 866377 · Report as offensive
john deneer
Volunteer tester
Joined: 16 Nov 06
Posts: 331
Credit: 20,996,606
RAC: 0
Netherlands
Message 866479 - Posted: 17 Feb 2009, 13:06:05 UTC - in response to Message 866377.  

The side effect is that sometimes BOINC would only crunch with one of my two CPU cores. I consider the hangups to be less annoying than a constantly under-utilized CPU. :) I have a scheduled task set to run BOINC benchmarks every couple hours which un-sticks any problem tasks so I'll just do it that way until the next BOINC version.


Now there's an idea .... I'm still having trouble with Raistmer's v8 (q9450, WinXP 64, 8800 GTS card). Once in a while a cuda task simply 'freezes' right at the beginning of execution, staying at 0 seconds and 0%.

This is a bit different from the stickies several people have reported, since those usually occur somewhere during processing (or so I understand) whereas my stickies never start processing at all without restarting boincmgr. So I don't know whether suspending/restarting through a benchmark will unstick mine. But it's worth a try.

The strange thing is that I have never had any of the 'regular' stickies, only those that don't start at all (without restarting boincmgr). And the ones I have come very irregularly, sometimes after only a few cuda tasks and sometimes a whole day goes by with cuda merrily crunching along without any problems at all.

Well, we'll see what happens. I'm always in for a little experiment, studied chemistry for exactly that reason :-)

Regards,
John.
ID: 866479 · Report as offensive
Profile Mike O
Joined: 1 Sep 07
Posts: 428
Credit: 6,670,998
RAC: 0
United States
Message 866581 - Posted: 17 Feb 2009, 23:58:21 UTC

I will try the benchmark approach as well.. I too don't really like to see a Core idling. Causes anxiety.. ;)

Not Ready Reading BRAIN. Abort/Retry/Fail?
ID: 866581 · Report as offensive
QUASAR

Joined: 5 Dec 99
Posts: 14
Credit: 1,734,737
RAC: 0
Canada
Message 866647 - Posted: 18 Feb 2009, 4:23:55 UTC

I have the same issues as john deneer, and the same card, but for the past 3 days I've only gotten compute errors on all my tasks and nothing has fixed it. I suspended seti and let einstein work because I've lost too much crunching time with CUDA. If anyone can help me!!!
ID: 866647 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 866687 - Posted: 18 Feb 2009, 7:03:47 UTC - in response to Message 866647.  

If anyone can help me!!!

Only if you help us. You're not giving any information. Please check this thread, first two posts mostly, to provide correct information. And please make it a new thread, no need to piggy-back onto this thread that has a whole different content.
ID: 866687 · Report as offensive
john deneer
Volunteer tester
Joined: 16 Nov 06
Posts: 331
Credit: 20,996,606
RAC: 0
Netherlands
Message 866810 - Posted: 18 Feb 2009, 19:28:08 UTC - in response to Message 866479.  

I have a scheduled task set to run BOINC benchmarks every couple hours which un-sticks any problem tasks so I'll just do it that way until the next BOINC version.


Now there's an idea .... I'm still having trouble with Raistmer's v8 (q9450, WinXP 64, 8800 GTS card). Once in a while a cuda task simply 'freezes' right at the beginning of execution, staying at 0 seconds and 0%.


I now have a scheduled task running every 4 hours which stops boincmanager and then restarts it. I found that running a benchmark did not get my particular flavor of stickies started. They also did not respond to completely suspending and resuming work. Only a boinccmd --quit followed by restarting boincmgr made them move.

So I now run that task a couple of times per day, hoping that I can keep my cudas coming and going in a more regular fashion than before.
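
A rough sketch of such a stop-and-restart task is below. It assumes `boinccmd` and `boincmgr` are both on the PATH and that the manager starts the client when launched; the 30-second pause, the function names and the injectable `run`/`launch`/`sleep` parameters are illustrative choices, not details from the post. The script would be triggered every 4 hours by the system scheduler.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def restart_steps(boinccmd: str = "boinccmd", boincmgr: str = "boincmgr") -> List[List[str]]:
    """Return the two commands: stop the client, then start the manager."""
    return [[boinccmd, "--quit"], [boincmgr]]


def restart_boinc(
    pause_seconds: float = 30.0,
    run: Callable[[Sequence[str]], object] = subprocess.run,
    launch: Callable[[Sequence[str]], object] = subprocess.Popen,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Quit the BOINC client, wait for it to shut down, relaunch the manager."""
    quit_cmd, start_cmd = restart_steps()
    run(quit_cmd)         # blocks until boinccmd has sent the quit request
    sleep(pause_seconds)  # give the client time to exit (assumed duration)
    launch(start_cmd)     # started without waiting so the script can finish


if __name__ == "__main__":
    restart_boinc()
```

Later posts in this thread report that starting the manager again after a --quit can leave two manager icons in the system tray, so the second step may not be needed on every setup.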

Regards,
John.
ID: 866810 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 871399 - Posted: 2 Mar 2009, 18:19:43 UTC

I just built a second CUDA-capable computer and although it worked flawlessly for the past couple weeks, it too is now starting to have GPU tasks hang in the same fashion as my main system. My new system is as follows:

Pentium Dual Core E2180 2 GHz
2 GB RAM
GeForce 8400GS 256 MB PCI-E
Driver version 182.06
BOINC version 6.4.5
You will be assimilated...bunghole!

ID: 871399 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 882993 - Posted: 7 Apr 2009, 4:34:38 UTC - in response to Message 871399.  

Just an update, CUDA tasks continue to hang intermittently with BOINC 6.6.20.
You will be assimilated...bunghole!

ID: 882993 · Report as offensive
Profile Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 882994 - Posted: 7 Apr 2009, 4:36:03 UTC - in response to Message 866810.  



I now have a scheduled task running every 4 hours which stops boincmanager and then restarts it. I found that running a benchmark did not get my particular flavor of stickies started. They also did not respond to completely suspending and resuming work. Only a boinccmd --quit followed by restarting boincmgr made them move.

When I use that command, it kills the BOINC client but the manager stays open. When I launch BOINC afterwards I find two copies of the manager in my system tray. Do you have the same problem?



You will be assimilated...bunghole!

ID: 882994 · Report as offensive
Profile -=SuperG=-
Joined: 3 Apr 99
Posts: 63
Credit: 89,161,651
RAC: 23
Canada
Message 883312 - Posted: 8 Apr 2009, 6:43:35 UTC
Last modified: 8 Apr 2009, 6:50:59 UTC

Launching the Boincmgr after a --quit resulted in 2 Boinc managers in the systray for me as well.

I have tried the --quit by itself. I found that after about a minute the client reconnected to the localhost and all restarted again ok. So far this works great!
Boinc Wiki




"Great spirits have always encountered violent opposition from mediocre minds." -Albert Einstein
ID: 883312 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.