CUDA work only / CUDA calc errors

Profile Seneca Special Project $75 donor
Volunteer tester
Joined: 17 Oct 02
Posts: 51
Credit: 10,114,348
RAC: 272
Germany
Message 1225791 - Posted: 1 May 2012, 15:55:53 UTC

Hi there ...

Even though I haven't changed any settings, SETI@home no longer delivers standard (CPU) work packets to me. The feed of cuda_fermi packets is (mostly) uninterrupted, as is the feed of CPU packets for my other projects (Rosetta & Einstein).

Does anybody have an explanation (and/or a fix) for that?

By the way, I'm seeing many cuda_fermi packets with calculation errors. Normal packets run for about 10 to 15 minutes and cause a GPU load of 50 to 60%. The "bad" packets start, run for about 2 to 3 hours, and cause approximately zero GPU load. At the end they result in a "calculation error".

Any hints?

Per aspera ad ETI ...

Seneca
0=0
Per Aspera Ad ETI ..
0=0
ID: 1225791 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1225792 - Posted: 1 May 2012, 15:57:37 UTC - in response to Message 1225791.  

It might help if you unhide your computers so folks could have a look at your results and see what the errors are.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1225792 · Report as offensive
Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 429
Credit: 1,844,293
RAC: 0
United States
Message 1225801 - Posted: 1 May 2012, 19:56:33 UTC - in response to Message 1225791.  

Normal packets run for about 10 to 15 minutes and cause a GPU load of 50 to 60%. The "bad" packets start, run for about 2 to 3 hours, and cause approximately zero GPU load. At the end they result in a "calculation error".

If a GPU work unit takes a lot longer than normal and shows 0% GPU load, the app has crunched the work unit on the CPU instead as a fallback, usually because it can't find your GPU. One cause might be Remote Desktop access turning off your GPU. Another is running one of the buggy nVidia drivers and letting Windows turn off your monitor, which also removes the GPU.
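
The symptom described above (much longer runtime combined with near-zero GPU load) can be turned into a rough check. This is only a sketch: the `TaskSample` record, the function name, and the thresholds are hypothetical and are not part of BOINC or the SETI@home apps. You would have to collect the runtime and GPU load yourself, for example from a GPU monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class TaskSample:
    """Hypothetical record of one GPU task, filled in by hand or by a monitoring script."""
    name: str
    elapsed_minutes: float
    avg_gpu_load_percent: float


def looks_like_cpu_fallback(task, normal_minutes=15.0, slow_factor=4.0, idle_load=5.0):
    """Return True if the task ran far longer than normal while the GPU stayed idle.

    normal_minutes: typical runtime of a healthy task (10 to 15 minutes in this thread).
    slow_factor:    how many times longer than normal counts as suspicious (assumed value).
    idle_load:      GPU load in percent below which the GPU is treated as idle (assumed value).
    """
    too_slow = task.elapsed_minutes > normal_minutes * slow_factor
    gpu_idle = task.avg_gpu_load_percent < idle_load
    return too_slow and gpu_idle


healthy = TaskSample("healthy_task", elapsed_minutes=12, avg_gpu_load_percent=55)
suspect = TaskSample("suspect_task", elapsed_minutes=150, avg_gpu_load_percent=0)

for task in (healthy, suspect):
    print(task.name, looks_like_cpu_fallback(task))
```

Both conditions have to hold, so a long-running task that is still loading the GPU is not flagged.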
ID: 1225801 · Report as offensive
Profile Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 1225879 - Posted: 1 May 2012, 22:28:45 UTC
Last modified: 1 May 2012, 22:30:10 UTC

Hi all.

Sounds suspiciously like what is happening to me. It only started after Microsoft Update updated my nvidia drivers on Monday night. The workunits start out fine, but after an hour or so I noticed the GPUs were at 0 load and the workunits were taking forever. An exit and restart of BOINC Manager gets them moving again, but if left unattended they run for about 3 hours and return a computation error!! I have rolled the driver back and am waiting to see if it fixes the problem.

Hi Mark, how are the Kitties?
ID: 1225879 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1225887 - Posted: 1 May 2012, 22:38:09 UTC - in response to Message 1225879.  

Hi all.

Sounds suspiciously like what is happening to me. It only started after Microsoft Update updated my nvidia drivers on Monday night. The workunits start out fine, but after an hour or so I noticed the GPUs were at 0 load and the workunits were taking forever. An exit and restart of BOINC Manager gets them moving again, but if left unattended they run for about 3 hours and return a computation error!! I have rolled the driver back and am waiting to see if it fixes the problem.

Hi Mark, how are the Kitties?

Don't use Microsoft drivers for an NVidia card. Use NVidia drivers - they should know what they're driving for.

But avoid driver versions 295.73 and 296.10, which are known to have a bug.
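
If you want to check an installed driver against the two versions named above, a minimal sketch could look like this. The version list comes only from this thread and is not an official or complete list; the function name is made up for illustration, and you would read the installed version from the driver control panel yourself.

```python
# Driver versions reported in this thread as having the bug.
# This list is taken from the post above and is not exhaustive.
KNOWN_BAD_DRIVERS = {"295.73", "296.10"}


def is_known_bad_driver(version: str) -> bool:
    """Return True if the given driver version string is one of the versions to avoid."""
    return version.strip() in KNOWN_BAD_DRIVERS


for version in ("296.10", "285.62"):
    status = "avoid" if is_known_bad_driver(version) else "not on the list"
    print(version, status)
```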
ID: 1225887 · Report as offensive
Profile Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 1225982 - Posted: 2 May 2012, 2:12:55 UTC - in response to Message 1225887.  

Hi Richard.
I usually only use the Nvidia drivers. It was just that I clicked without reading what Microsoft was updating, so when I backtracked I found that it had changed my drivers. I rolled back to 285.62, and so far no more problems :)
ID: 1225982 · Report as offensive
Profile Seneca Special Project $75 donor
Volunteer tester
Joined: 17 Oct 02
Posts: 51
Credit: 10,114,348
RAC: 272
Germany
Message 1226280 - Posted: 2 May 2012, 18:30:12 UTC - in response to Message 1225792.  
Last modified: 2 May 2012, 18:33:01 UTC

It might help if you unhide your computers so folks could have a look at your results and see what the errors are.


Done. ID 6188051. Hope it helps ...

But avoid driver versions 295.73 and 296.10, which are known to have a bug.


I'm currently running 296.10 ... I'll try rolling back to 285.62.

0=0
ID: 1226280 · Report as offensive
Profile Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 1226300 - Posted: 2 May 2012, 19:36:19 UTC - in response to Message 1226280.  

Seems to have fixed mine :)

ID: 1226300 · Report as offensive
Profile Seneca Special Project $75 donor
Volunteer tester
Avatar

Send message
Joined: 17 Oct 02
Posts: 51
Credit: 10,114,348
RAC: 272
Germany
Message 1226602 - Posted: 3 May 2012, 9:37:24 UTC - in response to Message 1226300.  
Last modified: 3 May 2012, 9:39:35 UTC

Needs some more watchful waiting, but I'm full of hope =;-)

Anyhow - SETI is still delivering almost only CUDA work to me, so my CPU ends up crunching Einstein and Rosetta packets, which are fallback projects for me ... any hints on that?
Per Aspera Ad ETI ..
0=0
ID: 1226602 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1226612 - Posted: 3 May 2012, 10:59:34 UTC

Rosetta is CPU only, Einstein is both. What are the resource shares of your projects?

If you want them as pure backup set resource share to 0.
[iirc the bug that it would request from the backup at minimal got fixed]

Probably a mixture of resource share, cache settings and coming up empty when asking SETI.

When coming from 6.x BOINC swap your cache settings around.



Regarding the errors: the -177 comes from the CPU fallback after the CUDA card disappears. Update your NVidia driver to 300.x or downgrade to 290.x.

Alternatively, go optimised: x41g is more resilient (and faster).


I'm not the Pope. I don't speak Ex Cathedra!
ID: 1226612 · Report as offensive
Profile Seneca Special Project $75 donor
Volunteer tester
Joined: 17 Oct 02
Posts: 51
Credit: 10,114,348
RAC: 272
Germany
Message 1228312 - Posted: 6 May 2012, 14:04:27 UTC - in response to Message 1226612.  

Rosetta is CPU only, Einstein is both. What are the resource shares of your projects?


SETI 1000
Einstein 0
rosetta 1e-7 (doesn't accept zero)



Probably a mixture of resource share, cache settings and coming up empty when asking SETI.


If I explicitly stop Einstein & Rosetta, I get SETI CPU work, and it stays on until done - along with the always running CUDA work - even when E&R are re-enabled again.


When coming from 6.x BOINC swap your cache settings around.


Where & What ?


Regarding the errors: the -177 comes from the CPU fallback after the CUDA card disappears. Update your NVidia driver to 300.x or downgrade to 290.x.


Went back to 285.xx - runs smooth.

0=0
ID: 1228312 · Report as offensive
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1228597 - Posted: 7 May 2012, 8:17:09 UTC - in response to Message 1228312.  

Rosetta is CPU only, Einstein is both. What are the resource shares of your projects?


SETI 1000
Einstein 0
rosetta 1e-7 (doesn't accept zero)


That should do :D Ah yes, Rosetta is running quite old server code.
Two backup projects?
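
To see why a share of 1e-7 is as good as zero next to 1000, here is a small worked example. It assumes the simple model that each project's target fraction of resources is its share divided by the sum of all shares; the actual BOINC scheduler is more involved, and a share of exactly 0 is handled as a special backup case. The helper name is hypothetical.

```python
def share_fractions(shares):
    """Return each project's share divided by the total (simplified model, see note above)."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one project needs a positive resource share")
    return {name: value / total for name, value in shares.items()}


# Resource shares quoted in this thread.
shares = {"SETI@home": 1000.0, "Einstein@Home": 0.0, "Rosetta@home": 1e-7}
fractions = share_fractions(shares)

for name, fraction in fractions.items():
    print(f"{name}: {fraction:.3e}")
```

Under that model Rosetta's fraction is on the order of one part in ten billion, so SETI effectively gets everything.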



Probably a mixture of resource share, cache settings and coming up empty when asking SETI.


If I explicitly stop Einstein & Rosetta, I get SETI CPU work, and it stays on until done - along with the always running CUDA work - even when E&R are re-enabled again.


When there are effectively no other projects, it will keep asking SETI until it gets some.



When coming from 6.x BOINC swap your cache settings around.


Where & What ?

In the general preferences, under network: the "maintain enough tasks for ..." setting and the "... additionally" setting. You want a larger number of days (3-5) on the first and a small one (0.1-1) on the second.
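
The same two values can also be set locally through BOINC's `global_prefs_override.xml` file in the BOINC data directory. The sketch below only builds the XML text. It assumes that the two settings described above correspond to the `work_buf_min_days` and `work_buf_additional_days` elements, which is my reading and should be checked against the BOINC documentation for your client version. The function name and the example values (3 days and 0.5 days) are illustrative.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days, additional_days):
    """Build the text of a minimal preferences override with the two cache values."""
    root = ET.Element("global_preferences")
    # Assumed mapping: first setting -> work_buf_min_days,
    # second setting -> work_buf_additional_days.
    ET.SubElement(root, "work_buf_min_days").text = f"{min_days:g}"
    ET.SubElement(root, "work_buf_additional_days").text = f"{additional_days:g}"
    return ET.tostring(root, encoding="unicode")


override_xml = build_cache_override(3, 0.5)
print(override_xml)
```

The client has to re-read its preferences (or be restarted) before a changed override file takes effect.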

It will always ask SETI for work first, but it will keep asking Rosetta when SETI hasn't got tasks (or rather, when you were unlucky in your request), so you'll end up with quite a bit of Rosetta work you don't really want. I would consider setting NNT (No New Tasks) on Rosetta and manually removing that when SETI is really out of work. Einstein on its own should have quite enough to work as a backup in any case.
I'm not the Pope. I don't speak Ex Cathedra!
ID: 1228597 · Report as offensive
