CUDA work only / CUDA calc errors


Seneca (Project donor)
Volunteer tester
Joined: 17 Oct 02
Posts: 31
Credit: 3,321,887
RAC: 3,579
Germany
Message 1225791 - Posted: 1 May 2012, 15:55:53 UTC

Hi there ...

Even though I haven't changed any settings, SETI@home no longer delivers any standard (CPU) work packets to me. The feed of cuda_fermi packets is (mostly) uninterrupted, as is the feed of CPU packets for my other projects (rosetta & Einstein).

Does anybody have an explanation (and/or fix) for that ?

By the way, I observe many cuda_fermi packets with calculation errors. Normal packets run for about 10..15 minutes and cause a GPU load of 50...60%. The "bad" packets start, run about 2..3 hours and cause approx. zero GPU load. At the end they result in a "calculation error".

Any hint ?

Per aspera ad ETI ...

Seneca
0=0
____________
Per Aspera Ad ETI ..
0=0

Wembley
Volunteer tester
Joined: 16 Sep 09
Posts: 415
Credit: 888,257
RAC: 0
United States
Message 1225801 - Posted: 1 May 2012, 19:56:33 UTC - in response to Message 1225791.

Normal packets run for about 10..15 minutes and cause a GPU load of 50...60%. The "bad" packets start, run about 2..3 hours and cause approx. zero GPU load. At the end they result in a "calculation error".

If a GPU work unit takes a lot longer than normal and shows 0% GPU load, it means the app has crunched the work unit on the CPU instead as a fallback, usually because it can't find your GPU. One cause of this might be Remote Desktop access turning off your GPU. Another is using one of the buggy nVidia drivers and letting Windows turn off your monitor, which also removes the GPU.
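
A rough way to spot such tasks is to compare runtime and GPU load against what a normal task looks like. The sketch below is only a heuristic built from the numbers quoted in this thread (10 to 15 minutes and 50 to 60% load for a normal task); the function name and both thresholds are made up for illustration and are not part of BOINC or the SETI@home app.

```python
def looks_like_cpu_fallback(runtime_minutes, avg_gpu_load_percent,
                            normal_runtime_minutes=15.0,
                            idle_load_percent=5.0):
    """Guess whether a GPU task was silently crunched on the CPU.

    Both thresholds are assumptions taken from the figures in this thread,
    not values used by BOINC itself.
    """
    # A fallback task runs many times longer than a normal GPU task...
    much_slower = runtime_minutes > 4 * normal_runtime_minutes
    # ...while the GPU sits (nearly) idle the whole time.
    gpu_idle = avg_gpu_load_percent <= idle_load_percent
    return much_slower and gpu_idle


print(looks_like_cpu_fallback(12, 55))   # a normal task as described above
print(looks_like_cpu_fallback(150, 0))   # a "bad" task: 2.5 hours, zero load
```

A task flagged this way is worth checking in the result log before blaming the work unit itself.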
____________


Donate with your searches and online buys:
http://www.goodsearch.com/toolbar/university-of-california-setihome

Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 1225879 - Posted: 1 May 2012, 22:28:45 UTC
Last modified: 1 May 2012, 22:30:10 UTC

Hi all.

Sounds suspiciously like what is happening to me. It only started after Microsoft Update updated my nvidia drivers on Monday night. The workunits start out fine, but after an hour or so I noticed the GPUs were at 0 load and the workunits were taking forever. An exit and restart of BOINC Manager gets them moving again, but if left unattended they run for about 3 hours and return a computation error!! I have rolled the driver back and am waiting to see if it fixes the problem.

Hi Mark, how are the Kitties?
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8634
Credit: 51,629,112
RAC: 49,003
United Kingdom
Message 1225887 - Posted: 1 May 2012, 22:38:09 UTC - in response to Message 1225879.

Hi all.

Sounds suspiciously like what is happening to me. It only started after Microsoft Update updated my nvidia drivers on Monday night. The workunits start out fine, but after an hour or so I noticed the GPUs were at 0 load and the workunits were taking forever. An exit and restart of BOINC Manager gets them moving again, but if left unattended they run for about 3 hours and return a computation error!! I have rolled the driver back and am waiting to see if it fixes the problem.

Hi Mark, how are the Kitties?

Don't use Microsoft drivers for an NVidia card. Use NVidia drivers - they should know what they're driving for.

But avoid driver versions 295.73 and 296.10, which are known to have a bug.
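
If you want to check a machine against that list, something like the following would do. It is only a sketch: the set contains just the two versions named above, the function name is made up for illustration, and you still have to read the installed version off the driver yourself.

```python
# Driver versions reported as buggy in this thread (not an official list).
KNOWN_BAD_DRIVERS = {"295.73", "296.10"}


def is_known_bad_driver(version):
    """Return True if the given NVidia driver version string is on the list."""
    return version.strip() in KNOWN_BAD_DRIVERS


print(is_known_bad_driver("296.10"))  # one of the versions to avoid
print(is_known_bad_driver("285.62"))  # the version people here rolled back to
```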

Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 1225982 - Posted: 2 May 2012, 2:12:55 UTC - in response to Message 1225887.

Hi Richard.
I usually only use the Nvidia drivers. I just clicked without reading what Microsoft was updating, and when I backtracked I found that it had changed my drivers. I rolled back to 285.62, and so far no more problems :)
____________

Seneca (Project donor)
Volunteer tester
Joined: 17 Oct 02
Posts: 31
Credit: 3,321,887
RAC: 3,579
Germany
Message 1226280 - Posted: 2 May 2012, 18:30:12 UTC - in response to Message 1225792.
Last modified: 2 May 2012, 18:33:01 UTC

It might help if you unhide your computers so folks could have a look at your results and see what the errors are.


Done. ID 6188051. Hope it helps ...

But avoid driver versions 295.73 and 296.10, which are known to have a bug.


Just running 296.10 ... I'll try rolling back to 285.62.

0=0

Spectrum
Joined: 14 Jun 99
Posts: 468
Credit: 53,129,336
RAC: 0
Australia
Message 1226300 - Posted: 2 May 2012, 19:36:19 UTC - in response to Message 1226280.

Seems to have fixed mine :)

____________

Seneca (Project donor)
Volunteer tester
Joined: 17 Oct 02
Posts: 31
Credit: 3,321,887
RAC: 3,579
Germany
Message 1226602 - Posted: 3 May 2012, 9:37:24 UTC - in response to Message 1226300.
Last modified: 3 May 2012, 9:39:35 UTC

Needs some more watchful waiting, but I'm full of hope =;-)

Anyhow, SETI is still delivering almost only CUDA work to me, so my CPU ends up crunching Einstein and rosetta packets, which are fallback projects for me ... any hint on that ?
____________
Per Aspera Ad ETI ..
0=0

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1226612 - Posted: 3 May 2012, 10:59:34 UTC

Rosetta is CPU only, Einstein is both. What are the resource shares of your projects?

If you want them as pure backup set resource share to 0.
[iirc the bug that it would request from the backup at minimal got fixed]

Probably a mixture of resource share, cache settings and coming up empty when asking SETI.

When coming from 6.x BOINC swap your cache settings around.



Regarding the errors: the -177 comes from the CPU fallback after the CUDA card disappears. Update your NVidia driver to 300.x or downgrade to 290.x.

Alternatively go optimised, x41g is more resilient (and faster).


____________
I'm not the Pope. I don't speak Ex Cathedra!

Seneca (Project donor)
Volunteer tester
Joined: 17 Oct 02
Posts: 31
Credit: 3,321,887
RAC: 3,579
Germany
Message 1228312 - Posted: 6 May 2012, 14:04:27 UTC - in response to Message 1226612.

Rosetta is CPU only, Einstein is both. What are the resource shares of your projects?


SETI 1000
Einstein 0
rosetta 1e-7 (doesn't accept zero)



Probably a mixture of resource share, cache settings and coming up empty when asking SETI.


If I explicitly stop Einstein & Rosetta, I get SETI CPU work, and it stays on until done, along with the always running CUDA work, even when E&R are re-enabled again.


When coming from 6.x BOINC swap your cache settings around.


Where & What ?


Regarding the errors: the -177 comes from the CPU fallback after the CUDA card disappears. Update your NVidia driver to 300.x or downgrade to 290.x.


Went back to 285.xx - runs smooth.

0=0

LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1228597 - Posted: 7 May 2012, 8:17:09 UTC - in response to Message 1228312.

Rosetta is CPU only, Einstein is both. What are the resource shares of your projects?


SETI 1000
Einstein 0
rosetta 1e-7 (doesn't accept zero)


That should do :D ah yes, Rosetta is running quite an old server code..
Two backup projects?



Probably a mixture of resource share, cache settings and coming up empty when asking SETI.


If I explicitly stop Einstein & Rosetta, I get SETI CPU work, and it stays on until done, along with the always running CUDA work, even when E&R are re-enabled again.


When there are effectively no other projects, it will keep asking seti until it gets some.



When coming from 6.x BOINC swap your cache settings around.


Where & What ?

In the general preferences, under network: "maintain enough tasks for .." and "and additionally". You want a larger number of days (3-5) on the first and a small one (0.1-1) on the second.
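
If you would rather set this locally than on the website, the same two values can go into BOINC's global_prefs_override.xml. The sketch below just builds such a fragment. The tag names work_buf_min_days and work_buf_additional_days are how I remember the override file, so treat them as an assumption and check them against the BOINC documentation for your client version; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_cache_override(min_days, additional_days):
    """Build a preferences override fragment with the two cache settings.

    Tag names are assumed from BOINC's global_prefs_override.xml format;
    verify them before using the output.
    """
    root = ET.Element("global_preferences")
    # "maintain enough tasks for .." : the larger value (3-5 days)
    ET.SubElement(root, "work_buf_min_days").text = str(min_days)
    # "and additionally" : the small value (0.1-1 days)
    ET.SubElement(root, "work_buf_additional_days").text = str(additional_days)
    return ET.tostring(root, encoding="unicode")


print(build_cache_override(3.0, 0.5))
```

The client has to re-read its local preferences (or be restarted) before an override file takes effect.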

It will always ask SETI for work first, but it will keep asking Rosetta when SETI hasn't got tasks (or rather, when you were unlucky in your request), so you'll end up with quite a bit of Rosetta work you don't really want. I would consider setting NNT (No New Tasks) on Rosetta and manually removing that when SETI is really out of work. Einstein on its own should have quite enough to work as a backup in any case.
____________
I'm not the Pope. I don't speak Ex Cathedra!


Copyright © 2014 University of California