-9 Errors on C/GPU

Message boards : Number crunching : -9 Errors on C/GPU

Dead4life (Project Donor)
Joined: 22 Jun 04
Posts: 5
Credit: 6,960,721
RAC: 3
United Kingdom
Message 1067640 - Posted: 17 Jan 2011, 13:59:08 UTC

Gentlemen.

I have just added a GT240 to my rig, in an effort to double my RAC: http://setiathome.berkeley.edu/show_host_detail.php?hostid=5730784

Having read a number of threads, I decided to install the Lunatics apps for CPU and GPU and then to watch the reported results for any errors.

Two errors have recently shown up, one on the CPU and one on the GPU. Both units returned -9 errors, and both were reported at the same time, which I think is odd, considering these are the only two units that have shown the error:
http://setiathome.berkeley.edu/workunit.php?wuid=685558199 GPU
http://setiathome.berkeley.edu/workunit.php?wuid=685432987 CPU

If you look at the sent/received times, they are in line with my usual processing times.

Any thoughts or diagnosis?

Euan
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1067642 - Posted: 17 Jan 2011, 14:17:10 UTC - in response to Message 1067640.  

These seem to be legitimate -9s. The rest of your work is coming in okay. Wait on those two to see if your wingmen get -9s too. All that code means is that the work unit had too much noise in it, mostly caused by ground interference. It's only a problem when you get all or most of your work marked as -9; that could mean your card is overheating or going bad. I'd say you have no problem now, though.
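The rule of thumb above (a few -9s are normal, confirm them against the wingmen, and only worry when all or most results are -9) can be sketched as a small triage function. This is a hypothetical illustration, not part of BOINC or the SETI@home apps: the `Task` record, its `wingmen` field and the 50% `suspect_fraction` threshold are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

RESULT_OVERFLOW = -9  # the "-9 result_overflow" status discussed in this thread


@dataclass
class Task:
    status: int  # this host's outcome for the workunit (0 = normal result)
    # Statuses reported so far by the wingmen (hypothetical field; empty = still pending)
    wingmen: list[int] = field(default_factory=list)


def triage(tasks: list[Task], suspect_fraction: float = 0.5) -> str:
    """Rule-of-thumb triage of -9 results; suspect_fraction is an assumed threshold."""
    if not tasks:
        return "no data"
    overflows = [t for t in tasks if t.status == RESULT_OVERFLOW]
    # All or most tasks overflowing points at the hardware (overheating, failing card).
    if len(overflows) / len(tasks) >= suspect_fraction:
        return "suspect hardware"
    # A -9 that none of the wingmen reproduced deserves a closer look.
    if any(t.wingmen and RESULT_OVERFLOW not in t.wingmen for t in overflows):
        return "unconfirmed -9"
    # Some overflows are still waiting for their wingmen to report.
    if any(not t.wingmen for t in overflows):
        return "wait for wingmen"
    # Every -9 was matched by a wingman: legitimate overflows.
    return "ok"


# Usage: a hypothetical history of 20 normal results plus two fresh -9s.
history = [Task(0, [0]) for _ in range(20)] + [Task(-9), Task(-9)]
print(triage(history))  # wait for wingmen
```

Checking the fraction first mirrors the advice that a card producing mostly -9s is suspect regardless of what any single wingman reports.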


PROUD MEMBER OF Team Starfire World BOINC
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1067643 - Posted: 17 Jan 2011, 14:18:58 UTC - in response to Message 1067640.  

A -9 isn't an error as such; it just means the app has found more signals than it has room to report:

SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected exceeds the storage space allocated.
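The message above describes a fixed amount of storage for detected signals. As a minimal sketch of that idea: the app keeps signals until the allocated space is full and then stops with -9, so a very noisy workunit ends as an "overflow" rather than a crash. The function name and the capacity of 30 signals are assumptions for illustration, not taken from the SETI@home source.

```python
RESULT_OVERFLOW = -9  # "SETI@Home Informational message -9 result_overflow"

# Assumed capacity, for illustration only; the real limit lives in the science app.
MAX_SIGNALS = 30


def collect_signals(candidates, max_signals=MAX_SIGNALS):
    """Store detected signals until the space allocated for them is full.

    Returns (status, stored): status is 0 if every signal fit, or
    RESULT_OVERFLOW if the workunit produced more signals than there is
    room for, in which case processing stops early with a partial list.
    """
    stored = []
    for signal in candidates:
        if len(stored) == max_signals:
            # No room left: stop and report an overflow rather than an error.
            return RESULT_OVERFLOW, stored
        stored.append(signal)
    return 0, stored


# A noisy workunit with far more candidate signals than the buffer holds.
status, kept = collect_signals(range(500))
print(status, len(kept))  # -9 30
```

Because the stored signals are still returned, a genuine overflow can validate against wingmen who overflowed on the same data.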

If both your tasks validate against your wingmen's results when they complete their tasks, then there was no problem.
Also note that GPUs are sometimes prone to producing false -9s; you'll know if that is happening because all your tasks on that GPU will be -9s.

Claggy
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1067653 - Posted: 17 Jan 2011, 15:15:22 UTC - in response to Message 1067643.  
Last modified: 17 Jan 2011, 16:10:04 UTC

Why those -9 errors (real overflows) occur is still not clear to me. One CUDA rig, a Q6600 + GTS 250 (Vista 32-bit Home), is still stable (v0.37 Lunatics).
And of course the X9650 + GTX 470 & 480, now running 2 WUs per card, is stable,
also with v0.37 Lunatics. (In both cases there are still -9 results, but no errors.)
(I never found a reproducible error or reason for the faults on the 9800GTX+ & GTS 250 combination, besides heat.)
Lowering the GPU clock and/or undervolting sometimes works, but neither is easy on most 200-series GPUs; a separate program (TThrottle, from eFMer) is needed.

Just lost my oldest 17'' flat screen: 4 years at 75% uptime. :(
Well, they are less expensive nowadays.....

ADDED
<message>
- exit code -12 (0xfffffff4)
</message>
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
Device 1: GeForce GTX 480, 1535 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 15
clockRate = 1388000
Device 2: GeForce GTX 470, 1279 MiB, regsPerBlock 32768
computeCap 2.0, multiProcs 14
clockRate = 1215000
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 480 is okay
SETI@home using CUDA accelerated device GeForce GTX 480
---[Snipped]---
Multibeam x32f Preview, Cuda 3.0

Work Unit Info:
...............
WU true angle range is : 2.702858
SETI@home error -12 Unknown error
cudaAcc_find_triplets doesn't support more than MAX_TRIPLETS_ABOVE_THRESHOLD numBinsAboveThreshold in find_triplets_kernel
File: d:/[Projects]/Berkeley/seti_cuda/seti_boinc/client/cuda/cudaAcc_pulsefind.cu
Line: 253
Result 1770622711.

During this period, 3 WUs per card were running; I have switched back to 2.
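The -12 in the log above appears to come from a hard limit in the CUDA pulse-finding code rather than from the normal result-overflow path: per the message, cudaAcc_find_triplets cannot handle more than MAX_TRIPLETS_ABOVE_THRESHOLD bins above threshold. A minimal sketch of that kind of fixed-capacity guard, written in Python rather than CUDA, is below. The limit value, the function name and the exception type are hypothetical; only the constant's name and the -12 status come from the log.

```python
UNKNOWN_ERROR = -12  # "SETI@home error -12 Unknown error" in the log above

# Hypothetical value: the real MAX_TRIPLETS_ABOVE_THRESHOLD is a constant in
# the CUDA build and its value is not given in this thread.
MAX_TRIPLETS_ABOVE_THRESHOLD = 16


class TooManyBinsError(RuntimeError):
    """Hypothetical stand-in for the abort seen in cudaAcc_find_triplets."""

    def __init__(self, count: int, limit: int):
        super().__init__(f"{count} bins above threshold, kernel supports at most {limit}")
        self.exit_code = UNKNOWN_ERROR


def bins_above_threshold(power, threshold, limit=MAX_TRIPLETS_ABOVE_THRESHOLD):
    """Return the indices of bins whose power exceeds threshold.

    Models a fixed-size buffer: if more bins qualify than it can hold, the
    task aborts with an error instead of finishing with a -9 overflow.
    """
    hits = [i for i, p in enumerate(power) if p > threshold]
    if len(hits) > limit:
        raise TooManyBinsError(len(hits), limit)
    return hits


print(bins_above_threshold([0.1, 5.0, 0.2, 7.0], threshold=1.0))  # [1, 3]
```

The contrast with the -9 case is the point of the sketch: an overflow still produces a reportable result, while exceeding a fixed kernel buffer ends the task with an error.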
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1067715 - Posted: 17 Jan 2011, 18:17:32 UTC

Speaking of -9s, I see our buddy smithwr3 still hasn't got the word about the new Lunatics app. I just got another inconclusive from him; he is still running the old V12 app. He's got a RAC of 4664 thanks to his i7, but I guess he doesn't care that it could be much higher if his two 480s were turning out good work. He's got over 17,000 work units on that machine, and I'd bet most of them will be marked invalid.


PROUD MEMBER OF Team Starfire World BOINC
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1067728 - Posted: 17 Jan 2011, 18:50:20 UTC - in response to Message 1067715.  

Yes, I had one recently paired with him. I sent a PM but got no response.
James Sotherden
Joined: 16 May 99
Posts: 10436
Credit: 110,373,059
RAC: 54
United States
Message 1067732 - Posted: 17 Jan 2011, 18:58:40 UTC

My i7 with its little old GTS 250 has a RAC of over 11,000. Glad to know I can beat a Fermi card. :)

Old James
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1067767 - Posted: 17 Jan 2011, 21:14:06 UTC - in response to Message 1067732.  

Feel even better, he has a pair of 480s. You're beating two of them!

Bernie, I sent him a PM maybe two months ago and got no reply. He must have email notification turned off. The last time he posted was over 1,400 days ago, so it looks like he doesn't look in here either.



PROUD MEMBER OF Team Starfire World BOINC
Geek@Work
Joined: 29 Oct 02
Posts: 6
Credit: 753,666
RAC: 0
United States
Message 1067871 - Posted: 18 Jan 2011, 2:19:23 UTC

Berkeley could block smithwr3's account if they felt it was doing any damage to the project.
Since they have not done so............????????????
Boinc....Boinc....Boinc....Boinc
Raistmer
Volunteer developer
Volunteer tester
Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1068249 - Posted: 19 Jan 2011, 15:32:37 UTC - in response to Message 1067871.  


It means they have no time for this manual work.....
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1069874 - Posted: 23 Jan 2011, 16:43:39 UTC - in response to Message 1068249.  

Another (false) -9 error on CPU, confirmed by 2 CUDA Fermi results:

06no10ad.32347.16427.15.10.67




 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.