Message boards :
Number crunching :
Flood of Tiny GPU/cuda WUs
| Author | Message |
|---|---|
ML1 Joined: 25 Nov 01 Posts: 10505 Credit: 7,508,002 RAC: 45
|
Only when the operator is too lazy to check how the name is abbreviated for whatever service is to be reloaded/restarted! I guess those with years of uptime are very good at abbreviations and acronyms :-) ... But too lazy to update their kernel! :-( And there's now a new system for updating live running kernels in situ!... All very clever stuff. Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
|
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
|
I've seen that fix *nix boxes too. <ducking> |
ML1 Joined: 25 Nov 01 Posts: 10505 Credit: 7,508,002 RAC: 45
|
A small round of applause, everyone, please, for Mike Davis, who solved the problem completely with the first six words typed in response to the opening question. I thought that was always tried first as a fix/work-around for any Windows problem! ;-b Cheers, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0
|
I posted the short term and long term debt lines from the xml file (above)... Yes, but there should be 14 lines, not 2 ;-) |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14114 Credit: 200,643,578 RAC: 1,983
|
A small round of applause, everyone, please, for Mike Davis, who solved the problem completely with the first six words typed in response to the opening question. |
|
Cheech Wizard Joined: 27 Jun 01 Posts: 4 Credit: 7,151,967 RAC: 54
|
Thanks to everyone for weighing in and looking at this. I posted the short term and long term debt lines from the xml file (above), but please note that's after a reboot and after it's been running WUs normally for a couple of hours. GPU temp is back up to 60C, where I normally see it when running BOINC. Should have tried the obvious first and rebooted before I started the thread, but that didn't occur to me (duh!) as everything else was running okay. After the reboot, I told BOINC not to download any more SETI work for the time being, so that I can work through the backlog. So it still has a 50+ WU backlog in the queue, but it's now taking the normal 30 minutes or so to execute one task, and if you care to go look at my pending credits, they are getting awarded ~114 credits each...so it looks like all is normal. In 24 hours or so, when the queue is empty of GPU work, I'll download GPUGrid work and see how that goes. Richard, I hadn't noticed the GPUGrid problem, but apparently whatever was haywire was impacting work for both cuda projects. Looks like a simple solution (at least for now) to what looked like a baffling problem. Thanks for the help, everyone. |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0
|
<short_term_debt>182.495135</short_term_debt> There are debt values for each project. You need all of them to compare. However, as Richard Haselgrove stated, it's probably not a debt problem but a quota problem. Regards, Gundolf Computers aren't everything in life. (Just kidding) SETI@home classic workunits 3,758 SETI@home classic CPU time 66,520 hours |
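To compare the debts of every attached project side by side, something along these lines might help. It is only a sketch, assuming a BOINC state file in which each `<project>` element holds `<project_name>`, `<short_term_debt>` and `<long_term_debt>` children. The sample XML below is a hypothetical excerpt: only the SETI@home numbers come from this thread, and the second project and its values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a client state file. Only the SETI@home values
# are taken from the thread; "Example project" and its numbers are made up.
SAMPLE = """
<client_state>
  <project>
    <project_name>SETI@home</project_name>
    <short_term_debt>182.495135</short_term_debt>
    <long_term_debt>-7845.488689</long_term_debt>
  </project>
  <project>
    <project_name>Example project</project_name>
    <short_term_debt>-182.495135</short_term_debt>
    <long_term_debt>7845.488689</long_term_debt>
  </project>
</client_state>
"""


def project_debts(xml_text):
    """Return {project_name: (short_term_debt, long_term_debt)}."""
    root = ET.fromstring(xml_text)
    debts = {}
    for project in root.iter("project"):
        name = project.findtext("project_name", default="(unnamed)")
        # Missing debt elements are treated as 0.0 (an assumption of this sketch).
        short = float(project.findtext("short_term_debt", default="0"))
        long_ = float(project.findtext("long_term_debt", default="0"))
        debts[name] = (short, long_)
    return debts


if __name__ == "__main__":
    for name, (short, long_) in project_debts(SAMPLE).items():
        print(f"{name:20s} short={short:12.6f} long={long_:14.6f}")
```

With all projects listed together it is easier to see whether one project's debt is far out of line with the others.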
|
Cheech Wizard Joined: 27 Jun 01 Posts: 4 Credit: 7,151,967 RAC: 54
|
<short_term_debt>182.495135</short_term_debt> <long_term_debt>-7845.488689</long_term_debt> |
|
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0
|
The same host is generating lots of errors at GPUGrid too: OK, that makes sense. BOINC WIKI |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14114 Credit: 200,643,578 RAC: 1,983
|
The same host is generating lots of errors at GPUGrid too: http://www.gpugrid.net/results.php?hostid=46283 Edit - GPUGrid is his only other CUDA project, and he's down to a quota of one per day - that's why he's only fetching SETI work, John, nothing to do with debt. |
|
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0
|
John, There are a couple of potential problems. He has something going wrong with his CUDA apps. He is also getting S@H only when he is attached to another project as well. I was asking for the Debts to see what was wrong there. BOINC WIKI |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 12
|
This is my above-mentioned wingman. Oops... a few 'Completed, marked as invalid'... [http://setiathome.berkeley.edu/results.php?hostid=4753248&offset=0&show_names=0&state=4] |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 12
|
... ;-) ------------------------------- @ all If you look at your task lists and a WU is marked 'Completed, validation inconclusive', then in maybe ~90% of cases I have a wingman with the stock_CUDA_app and a -9 overflow error. An example: [http://setiathome.berkeley.edu/workunit.php?wuid=520707256]
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14114 Credit: 200,643,578 RAC: 1,983
|
-9 overflows can happen in any application, CPU or CUDA, stock or optimised. If they match your wingmate, they're absolutely fine and nothing to worry about. But very few tapes would generate more than 5% or so. If you are returning -9 from a CUDA card, every time, as Cheech Wizard currently is: and in particular if your wingmates are not returning -9: then you have a problem. In the early CUDA days (round about BOINC v6.6.23) we saw lots of these - BOINC overloaded the cards with too many tasks at once (pre-empting without clearing memory): something errored: and all tasks went -9 until the graphics card was given a hard reset (computer reboot). I suspect that's all that's needed - don't let's start worrying about card failures until the simple things have been tried - but please, Cheech - Reboot that computer as soon as possible. |
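The rule of thumb above can be written down as a small check. This is a rough sketch only, not anything the project or BOINC provides: the `Result` record, the `diagnose` function and its return strings are hypothetical names, and the 5% figure is simply the "5% or so" mentioned in the post.

```python
from dataclasses import dataclass


@dataclass
class Result:
    host_overflow: bool      # this host returned a -9 overflow
    wingmate_overflow: bool  # the wingmate returned -9 for the same WU


def diagnose(results, expected_rate=0.05):
    """Apply the rule of thumb: matched -9s are fine, frequent unmatched ones are not."""
    if not results:
        return "no data"
    overflows = [r for r in results if r.host_overflow]
    rate = len(overflows) / len(results)
    unmatched = [r for r in overflows if not r.wingmate_overflow]
    if not unmatched:
        # Every -9 (if any) agrees with the wingmate: nothing to worry about.
        return "ok"
    if rate > expected_rate:
        # Far more -9s than a noisy tape would explain, and wingmates disagree.
        return "suspect: reboot first"
    return "watch"


if __name__ == "__main__":
    # A host returning -9 every time while the wingmates do not.
    print(diagnose([Result(True, False)] * 10))
```

A host that reports -9 on every task while its wingmates do not falls into the "reboot first" case, which is the cheapest thing to try before suspecting the card itself.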
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0
|
Sutaru, You are looking at his error tasks. -9 overflow results show up as complete. The -9s are the ones he is getting now. http://setiathome.berkeley.edu/results.php?hostid=2612987 Check the low scoring completed WUs, that's his problem now. Ok, I type too slow. You both beat me to that one. :-) PROUD MEMBER OF Team Starfire World BOINC |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 12
|
I saw a lot of '-9 overflow errors'* with the stock_CUDA_app (wingmen), while my PCs produce good results all the time. I guess the stock_CUDA_app is more buggy than we know/is known.. ;-) 1st. the GPU is damaged? 2nd. the stock_CUDA_app is bugggyyyy. :-) [* For the newcomers among us.. The -9 overflow message is a legitimate sign that there is too much noise in the WU. This can happen from time to time. But if you produce only -9 overflows, it may be that your CPU/GPU is damaged, or there is some other problem.]
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14114 Credit: 200,643,578 RAC: 1,983
|
Yes, he's got a few of those - but it's the hundred or so marked 'invalid' because of the bogus -9 that I'm more worried about: http://setiathome.berkeley.edu/results.php?hostid=2612987&offset=0&show_names=0&state=4 |
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0
|
John, in this case he is getting 30 or 40 tasks at a time. They are erroring out so fast that it thinks he needs that much work. That's why I suggested he check for dust bunnies or possibly his card isn't seated quite right. As Richard said, if it's running cooler than normal it isn't working. Definitely something wrong with his graphics card. Worst case his card crapped out on him and he'll need a new one. Sometimes a reboot will cure it if he's lucky but I don't hold out much hope. PROUD MEMBER OF Team Starfire World BOINC |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 12
|
BTW. All -12 Unknown error's: [http://setiathome.berkeley.edu/results.php?hostid=2612987&offset=0&show_names=0&state=5] |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 12
|
@ Cheech Wizard Is it your Intel(R) Pentium(R) D CPU 2.80GHz with the 9800? You get a strange error: SETI@home error -12 Unknown error cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel File: c:/sw/gpgpu/seti/seti_boinc/client/cuda/cudaAcc_pulsefind.cu Line: 233 [http://setiathome.berkeley.edu/result.php?resultid=1401747099] I would recommend that you use the Lunatics Installer for optimised apps for your OS: [http://lunatics.kwsn.net/index.php?module=Downloads;catd=9] Then select Enhanced/Astropulse for the CPU and Enhanced-CUDA for your GPU. Do you know which instruction-set extensions your CPU supports? If not, CPU-Z will tell you. Do you crunch SETI@home only on the GPU? Then select only Enhanced-CUDA. For a ~30% speed-up, take the CUDA_V2.3 .dll files and copy them into the setiathome.berkeley.edu folder. [http://lunatics.kwsn.net/index.php?module=Downloads;sa=dlview;id=208] nVIDIA_driver_190.38+ required. With the opt._CUDA_V12_app (which is less buggy than the stock_CUDA_app) installed by the Lunatics Installer, you will enjoy SETI@home again! ;-)
|
©2020 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.