Flood of Tiny GPU/cuda WUs

Message boards : Number crunching : Flood of Tiny GPU/cuda WUs

Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 10505
Credit: 7,508,002
RAC: 45
United Kingdom
Message 943145 - Posted: 27 Oct 2009, 13:31:05 UTC - in response to Message 942797.  


I thought that was always tried first as a fix/work-around for any Windows problem!

;-b

Cheers,
Martin


I've seen that fix *nix boxes too. <ducking>

Only when the operator is too lazy to check how the name is abbreviated for whatever service is to be reloaded/restarted!

I guess those with years of uptime are very good at abbreviations and acronyms :-)

... But too lazy to update their kernel! :-(


And there's now a new system for updating live running kernels in situ!...


All very clever stuff.

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 943145 · Report as offensive
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 942797 - Posted: 25 Oct 2009, 19:58:13 UTC - in response to Message 942793.  


I thought that was always tried first as a fix/work-around for any Windows problem!

;-b

Cheers,
Martin


I've seen that fix *nix boxes too. <ducking>
ID: 942797 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 10505
Credit: 7,508,002
RAC: 45
United Kingdom
Message 942793 - Posted: 25 Oct 2009, 19:39:24 UTC - in response to Message 942644.  

A small round of applause, everyone, please, for Mike Davis, who solved the problem completely with the first six words typed in response to the opening question.

I thought that was always tried first as a fix/work-around for any Windows problem!

;-b

Cheers,
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 942793 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 942647 - Posted: 24 Oct 2009, 21:54:45 UTC - in response to Message 942640.  

I posted the short term and long term debt lines from the xml file (above)...

Yes, but there should be 14 lines, not 2 ;-)
ID: 942647 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14114
Credit: 200,643,578
RAC: 1,983
United Kingdom
Message 942644 - Posted: 24 Oct 2009, 21:43:47 UTC

A small round of applause, everyone, please, for Mike Davis, who solved the problem completely with the first six words typed in response to the opening question.
ID: 942644 · Report as offensive
Cheech Wizard

Joined: 27 Jun 01
Posts: 4
Credit: 7,151,967
RAC: 54
United States
Message 942640 - Posted: 24 Oct 2009, 21:34:23 UTC - in response to Message 942634.  

Thanks to everyone for weighing in and looking at this. I posted the short term and long term debt lines from the xml file (above), but please note that's after a reboot and after it has been running WUs normally for a couple of hours. GPU temp is back up to 60C, where I normally see it when running BOINC. I should have tried the obvious first and rebooted before I started the thread, but that didn't occur to me (duh!) as everything else was running okay. After the reboot, I told BOINC not to download any more SETI work for the time being, so that I can work through the backlog.

So it still has a 50+ WU backlog in the queue, but it's now taking the normal 30 minutes or so to execute one task, and if you care to go look at my pending credits, they are getting awarded ~114 credits each, so it looks like all is normal. In 24 hours or so, when the queue is empty of GPU work, I'll download GPUGrid work and see how that goes. Richard, I hadn't noticed the GPUGrid problem, but apparently whatever was haywire was impacting work for both CUDA projects.

Looks like a simple solution (at least for now) to what looked like a baffling problem. Thanks for the help, everyone.
ID: 942640 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 942637 - Posted: 24 Oct 2009, 21:23:04 UTC - in response to Message 942634.  

<short_term_debt>182.495135</short_term_debt>
<long_term_debt>-7845.488689</long_term_debt>

There are debt values for each project. You need all to compare them. However, as Richard Haselgrove stated, it's probably not a debt but a quota problem.
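
For anyone who wants to compare the debts of all attached projects at once, here is a minimal Python sketch. It assumes the state file is well-formed XML in which each `<project>` element holds a `<master_url>`, a `<short_term_debt>` and a `<long_term_debt>` child; that layout is an assumption, so check it against your own file. The project URLs in the sample are hypothetical placeholders, and only the first pair of debt values comes from this thread (the thread does not say which project they belong to).

```python
import xml.etree.ElementTree as ET


def project_debts(xml_text):
    """Return (master_url, short_term_debt, long_term_debt) for every <project> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for project in root.iter("project"):
        # Element names are assumed; missing values fall back to a placeholder or 0.
        url = project.findtext("master_url", default="(unknown)").strip()
        std = float(project.findtext("short_term_debt", default="0"))
        ltd = float(project.findtext("long_term_debt", default="0"))
        rows.append((url, std, ltd))
    return rows


# Hypothetical two-project excerpt. The second project's values are invented
# purely so that there is something to compare against.
SAMPLE = """<client_state>
  <project>
    <master_url>http://project-a.example/</master_url>
    <short_term_debt>182.495135</short_term_debt>
    <long_term_debt>-7845.488689</long_term_debt>
  </project>
  <project>
    <master_url>http://project-b.example/</master_url>
    <short_term_debt>-182.495135</short_term_debt>
    <long_term_debt>7845.488689</long_term_debt>
  </project>
</client_state>"""

for url, std, ltd in project_debts(SAMPLE):
    print(f"{url:30s} STD {std:12.3f}  LTD {ltd:12.3f}")
```

To use it on a real machine, read the contents of your own state file into a string and pass that to `project_debts` instead of `SAMPLE`.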

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
ID: 942637 · Report as offensive
Cheech Wizard

Joined: 27 Jun 01
Posts: 4
Credit: 7,151,967
RAC: 54
United States
Message 942634 - Posted: 24 Oct 2009, 21:16:05 UTC - in response to Message 942563.  

<short_term_debt>182.495135</short_term_debt>
<long_term_debt>-7845.488689</long_term_debt>
ID: 942634 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 942606 - Posted: 24 Oct 2009, 17:59:00 UTC - in response to Message 942603.  

The same host is generating lots of errors at GPUGrid too:

http://www.gpugrid.net/results.php?hostid=46283

Edit - GPUGrid is his only other CUDA project, and he's down to a quota of one per day - that's why he's only fetching SETI work, John, nothing to do with debt.

OK, that makes sense.


BOINC WIKI
ID: 942606 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14114
Credit: 200,643,578
RAC: 1,983
United Kingdom
Message 942603 - Posted: 24 Oct 2009, 17:37:27 UTC
Last modified: 24 Oct 2009, 17:56:26 UTC

The same host is generating lots of errors at GPUGrid too:

http://www.gpugrid.net/results.php?hostid=46283

Edit - GPUGrid is his only other CUDA project, and he's down to a quota of one per day - that's why he's only fetching SETI work, John, nothing to do with debt.
ID: 942603 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 942600 - Posted: 24 Oct 2009, 17:31:35 UTC - in response to Message 942584.  

John,
in this case he is getting 30 or 40 tasks at a time. They are erroring out so fast that BOINC thinks he needs that much work. That's why I suggested he check for dust bunnies, or possibly his card isn't seated quite right. As Richard said, if it's running cooler than normal it isn't working. Definitely something wrong with his graphics card. Worst case, his card crapped out on him and he'll need a new one. Sometimes a reboot will cure it if he's lucky, but I don't hold out much hope.

There are a couple of potential problems.

He has something going wrong with his CUDA apps.

He is also getting only S@H work, even though he is attached to another project as well. I was asking for the debts to see what was wrong there.


BOINC WIKI
ID: 942600 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 12
Germany
Message 942597 - Posted: 24 Oct 2009, 17:27:02 UTC


This is the wingman I mentioned above.

Oops, a few 'Completed, marked as invalid'...

[http://setiathome.berkeley.edu/results.php?hostid=4753248&offset=0&show_names=0&state=4]
ID: 942597 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 12
Germany
Message 942594 - Posted: 24 Oct 2009, 17:20:24 UTC - in response to Message 942590.  
Last modified: 24 Oct 2009, 17:21:46 UTC

...
Ok, I type too slow. You both beat me to that one. :-)


;-)


-------------------------------

@ all

If you look at your task lists and a WU is marked as 'Completed, validation inconclusive', then in maybe ~90% of my cases the wingman is running the stock_CUDA_app and returned a -9 overflow error.

An example:
[http://setiathome.berkeley.edu/workunit.php?wuid=520707256]


ID: 942594 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14114
Credit: 200,643,578
RAC: 1,983
United Kingdom
Message 942593 - Posted: 24 Oct 2009, 17:17:06 UTC - in response to Message 942590.  

-9 overflows can happen in any application, CPU or CUDA, stock or optimised. If they match your wingmate, they're absolutely fine and nothing to worry about. But very few tapes would generate more than 5% or so.

If you are returning -9 from a CUDA card, every time, as Cheech Wizard currently is: and in particular if your wingmates are not returning -9: then you have a problem.

In the early CUDA days (round about BOINC v6.6.23) we saw lots of these - BOINC overloaded the cards with too many tasks at once (pre-empting without clearing memory), something errored, and all tasks went -9 until the graphics card was given a hard reset (computer reboot). I suspect that's all that's needed - don't let's start worrying about card failures until the simple things have been tried - but please, Cheech - reboot that computer as soon as possible.
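
The rule of thumb above (a -9 that matches your wingmate is fine; constant -9s that your wingmates do not share are not) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input format (one pair of booleans per task) and the use of 5% as a hard threshold are hypothetical choices, not anything the project's validator actually does.

```python
def overflow_report(results, expected_max_rate=0.05):
    """Summarise -9 overflows for one host.

    results: list of (host_overflowed, wingmate_overflowed) booleans, one pair per task.
    expected_max_rate: assumed ceiling for a healthy host (the "5% or so" from the post).
    """
    total = len(results)
    if total == 0:
        return {"rate": 0.0, "unmatched": 0, "suspect": False}
    host_overflows = sum(1 for host, _ in results if host)
    # Overflows the wingmate did NOT also report are the worrying ones.
    unmatched = sum(1 for host, wing in results if host and not wing)
    rate = host_overflows / total
    suspect = rate > expected_max_rate and unmatched > 0
    return {"rate": rate, "unmatched": unmatched, "suspect": suspect}


# Made-up data: a healthy host with one genuine (matched) overflow in 20 tasks,
# and a host that returns -9 for everything while its wingmates do not.
healthy = [(False, False)] * 19 + [(True, True)]
broken = [(True, False)] * 20

print(overflow_report(healthy))
print(overflow_report(broken))
```

A host flagged as suspect this way is a candidate for the simple fix first: reboot, then look again before suspecting the hardware.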
ID: 942593 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 942590 - Posted: 24 Oct 2009, 17:04:06 UTC - in response to Message 942582.  
Last modified: 24 Oct 2009, 17:08:00 UTC

Sutaru,

You are looking at his error tasks. -9 overflow results show up as complete. The -9s are the ones he is getting now. http://setiathome.berkeley.edu/results.php?hostid=2612987

Check the low scoring completed WUs, that's his problem now.


Ok, I type too slow. You both beat me to that one. :-)


PROUD MEMBER OF Team Starfire World BOINC
ID: 942590 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 12
Germany
Message 942589 - Posted: 24 Oct 2009, 17:04:02 UTC
Last modified: 24 Oct 2009, 17:07:52 UTC


I saw a lot of '-9 overflow errors'* from wingmen running the stock_CUDA_app, while my PCs return good results all the time.

I guess the stock_CUDA_app is more buggy than we know/is known.. ;-)

1st. the GPU is damaged?
2nd. the stock_CUDA_app is bugggyyyy.

:-)


[* For the newcomers among us:
The -9 overflow message is a normal sign that there is too much noise in the WU.
This can happen from time to time.
But if you return only -9 overflows, your CPU/GPU may be damaged, or there is some other problem.]

ID: 942589 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14114
Credit: 200,643,578
RAC: 1,983
United Kingdom
Message 942585 - Posted: 24 Oct 2009, 16:55:38 UTC - in response to Message 942582.  


BTW.

All -12 Unknown errors:

[http://setiathome.berkeley.edu/results.php?hostid=2612987&offset=0&show_names=0&state=5]

Yes, he's got a few of those - but it's the hundred or so marked 'invalid' because of the bogus -9 that I'm more worried about:

http://setiathome.berkeley.edu/results.php?hostid=2612987&offset=0&show_names=0&state=4
ID: 942585 · Report as offensive
Profile perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 942584 - Posted: 24 Oct 2009, 16:54:45 UTC - in response to Message 942563.  

John,
in this case he is getting 30 or 40 tasks at a time. They are erroring out so fast that BOINC thinks he needs that much work. That's why I suggested he check for dust bunnies, or possibly his card isn't seated quite right. As Richard said, if it's running cooler than normal it isn't working. Definitely something wrong with his graphics card. Worst case, his card crapped out on him and he'll need a new one. Sometimes a reboot will cure it if he's lucky, but I don't hold out much hope.


PROUD MEMBER OF Team Starfire World BOINC
ID: 942584 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 12
Germany
Message 942582 - Posted: 24 Oct 2009, 16:52:16 UTC

ID: 942582 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 12
Germany
Message 942579 - Posted: 24 Oct 2009, 16:48:20 UTC
Last modified: 24 Oct 2009, 16:55:22 UTC


@ Cheech Wizard

Is it your Intel(R) Pentium(R) D CPU 2.80GHz with the 9800?

You get a strange error:
SETI@home error -12 Unknown error
cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel
File: c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu
Line: 233

[http://setiathome.berkeley.edu/result.php?resultid=1401747099]

I would recommend that you use the Lunatics Installer to get the optimised apps for your OS:
[http://lunatics.kwsn.net/index.php?module=Downloads;catd=9]

Then select Enhanced/Astropulse for the CPU and Enhanced-CUDA for your GPU.
Do you know which extensions your CPU supports?
If not, CPU-Z will tell you.

Do you crunch SETI@home only on the GPU?
Then select only Enhanced-CUDA.

For a ~30% speed-up, copy the CUDA_V2.3 .dll files into the setiathome.berkeley.edu folder.
[http://lunatics.kwsn.net/index.php?module=Downloads;sa=dlview;id=208]
An nVIDIA driver of version 190.38 or later is required.


With the opt._CUDA_V12_app (which is less buggy than the stock_CUDA_app), which the Lunatics Installer will install for you, you will enjoy SETI@home again! ;-)

ID: 942579 · Report as offensive

 
©2020 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.