different kinds of errors

David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1343259 - Posted: 5 Mar 2013, 15:07:45 UTC

On this WU, five of us got MAX_TRIPLETS errors and one got 28 pulses and 3 triplets for a -9 overflow. So how come the -9 counts as a completion when the others don't?

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1343259
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1343261 - Posted: 5 Mar 2013, 15:12:52 UTC
Last modified: 5 Mar 2013, 16:05:55 UTC

The other results were done by GPUs, which throw a -12 exit status when they get too many triplets; that status is classified as an error. The -9 exit status is not classified as an error; it indicates that 30 or more results were found, which stops processing at that point.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1343261
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1343282 - Posted: 5 Mar 2013, 16:44:57 UTC

-9 is only 'informational': after finding more than 30 signals the task is deemed too noisy and stopped. Results are still returned, and the '-9' is used for outlier detection, so the short runtime won't influence APR.
It's not an error, it just has an error code. Error codes can be used to carry other information as well ;)

-12 is that infamous triplet design flaw (as discussed in the panic thread). It may be on a task that would overflow, but it can crop up on any task, depending on how the triplets are distributed. The app can't handle it and quits. That's a hard error that kills the task. At least the app quits graciously...
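
To make the distinction concrete, here is a rough Python sketch of how the two exit codes could be treated. Only the values -9 and -12 and their meanings come from the posts above; the names `Outcome` and `classify_exit`, and the handling of any other code, are made up for illustration and are not the actual BOINC or SETI@home code.

```python
from dataclasses import dataclass

# Exit codes as described in this thread; the constant names are illustrative.
RESULT_OVERFLOW = -9      # too many signals: task stops early, results still returned
TOO_MANY_TRIPLETS = -12   # triplet-finding failure: the app quits, the task is lost


@dataclass
class Outcome:
    is_error: bool          # does the task count as a computation error?
    returns_results: bool   # does it still report a result?
    runtime_outlier: bool   # should its short runtime be kept out of APR?


def classify_exit(code: int) -> Outcome:
    """Hypothetical mapping from an exit code to how the task is treated."""
    if code == 0:
        return Outcome(is_error=False, returns_results=True, runtime_outlier=False)
    if code == RESULT_OVERFLOW:
        # Informational only: a valid result, flagged so the short run is ignored for APR.
        return Outcome(is_error=False, returns_results=True, runtime_outlier=True)
    if code == TOO_MANY_TRIPLETS:
        # Hard error: the app could not cope and quit.
        return Outcome(is_error=True, returns_results=False, runtime_outlier=False)
    # Assumption for this sketch: anything else is treated as an error.
    return Outcome(is_error=True, returns_results=False, runtime_outlier=False)
```

So `classify_exit(-9)` comes back as a non-error that is flagged as a runtime outlier, while `classify_exit(-12)` comes back as a plain error.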
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1343282 · Report as offensive
David S
Volunteer tester
Avatar

Send message
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1343732 - Posted: 7 Mar 2013, 14:19:18 UTC

Okay then, how come this one's 4th host got a -9 overflow for 31 triplets instead of a -12 MAX_TRIPLETS?

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1343732
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1343734 - Posted: 7 Mar 2013, 14:47:35 UTC - in response to Message 1343732.  
Last modified: 7 Mar 2013, 14:54:00 UTC

That host is using x41zc, which looks to have more code to handle triplets than the stock or older releases. Its task output shows:
Thread call stack limit is: 1k
Find triplets Cuda kernel encountered too many triplets, or bins above threshold, reprocessing this PoT on CPU...
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1343734
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1343736 - Posted: 7 Mar 2013, 15:00:47 UTC - in response to Message 1343732.  

Okay then, how come this one's 4th host got a -9 overflow for 31 triplets instead of a -12 MAX_TRIPLETS?

It's a difficult workunit, with all those triplets. Different versions of the science application handle the processing in different ways: in particular, the GPU (cuda) applications do a lot of the triplet checking earlier in the run than the CPU applications.

So, from the top:

#1, stock v6.10 cuda. Found triplets early, called error.
#2, optimised x41g cuda. Found triplets early, called error.
#3, stock v6.03 CPU. Found 6 pulses before getting bogged down in triplets. Called overflow.
#4, optimised x41zc cuda. Found the triplets early, but was able to handle it by calling overflow instead of error.
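
The four outcomes above can be put into a toy Python model. This is only a sketch of the behaviour described in this thread, not how any of the applications are actually written: the `App` fields and the `outcome` function are invented for illustration, and `kernel_gave_up` stands for the "encountered too many triplets" condition seen in the x41zc output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    triplets_first: bool  # cuda builds do the triplet checking earlier in the run
    cpu_fallback: bool    # can it reprocess the troublesome data on the CPU?


def outcome(app: App, kernel_gave_up: bool) -> str:
    """Toy outcome for a noisy workunit with a lot of triplets."""
    if app.triplets_first and kernel_gave_up and not app.cpu_fallback:
        return "error (-12)"    # triplet finding fails and the app quits
    return "overflow (-9)"      # too many signals, but a result is returned


apps = [
    App("stock v6.10 cuda", triplets_first=True, cpu_fallback=False),
    App("optimised x41g cuda", triplets_first=True, cpu_fallback=False),
    App("stock v6.03 CPU", triplets_first=False, cpu_fallback=False),
    App("optimised x41zc cuda", triplets_first=True, cpu_fallback=True),
]

for app in apps:
    print(f"{app.name}: {outcome(app, kernel_gave_up=True)}")
```

In this model the first two apps error out and the last two overflow, which is the pattern listed above.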

It's unfortunate that the different processing order means that the reported pulses caused the #3 and #4 results to be only 'weakly similar', and so yet another copy was sent out as a tiebreaker. That makes five downloads for what is, basically, a pretty uninteresting workunit, with (almost certainly) too much RFI swamping any science of value.
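
For anyone wondering what 'weakly similar' might look like in practice, here is a simplified Python sketch. It assumes a rule of the form "strongly similar if every signal in each result has a match in the other, weakly similar if at least half do"; treat that rule, the exact-match comparison, and the function names as assumptions for illustration rather than the project's real validator. The signal lists are made-up data, not the results from this workunit.

```python
def matched_fraction(a, b):
    """Fraction of signals in `a` that have a partner in `b` (exact match, for simplicity)."""
    if not a:
        return 1.0
    remaining = list(b)
    hits = 0
    for signal in a:
        if signal in remaining:
            remaining.remove(signal)  # each signal in `b` can be matched only once
            hits += 1
    return hits / len(a)


def compare(a, b):
    """Classify two results under the assumed half-or-better matching rule."""
    fa, fb = matched_fraction(a, b), matched_fraction(b, a)
    if fa == 1.0 and fb == 1.0:
        return "strongly similar"
    if fa >= 0.5 and fb >= 0.5:
        return "weakly similar"
    return "different"


# Made-up example: one app reports some pulses before overflowing,
# the other overflows on triplets alone.
cpu_like = [("pulse", i) for i in range(6)] + [("triplet", i) for i in range(24)]
gpu_like = [("triplet", i) for i in range(30)]

print(compare(cpu_like, gpu_like))
```

With those two lists, 24 of the 30 signals on each side have a partner, so the pair lands in the 'weakly similar' bucket rather than matching outright.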

The various developers (Eric Korpela for the stock app, and the various members of the Lunatics crew) are aware of the validation problems caused by the different processing architectures: but to make efficient use of the parallel processing capabilities of GPUs, things have to be shuffled around quite a lot, and sometimes - though mercifully rarely - artefacts like this are unavoidable.
ID: 1343736
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1345031 - Posted: 10 Mar 2013, 18:10:50 UTC

WU 1179402281 is an interesting example of this.

These are the returned results, before they get purged:

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application
2855533288 | 6910484 | 1 Mar 2013, 16:39:55 UTC | 2 Mar 2013, 1:56:38 UTC | Completed, marked as invalid | 92.82 | 1.14 | 0.00 | SETI@home Enhanced Anonymous platform (NVIDIA GPU)
2855533289 | 6366977 | 1 Mar 2013, 16:39:56 UTC | 2 Mar 2013, 5:11:26 UTC | Error while computing | 3.25 | 1.58 | --- | SETI@home Enhanced Anonymous platform (NVIDIA GPU)
2856466467 | 5378646 | 2 Mar 2013, 12:28:10 UTC | 7 Mar 2013, 22:20:40 UTC | Completed and validated | 17.19 | 16.16 | 0.11 | SETI@home Enhanced v6.03
2863422788 | 4830263 | 8 Mar 2013, 5:23:44 UTC | 10 Mar 2013, 6:46:52 UTC | Completed and validated | 50.99 | 48.58 | 0.11 | SETI@home Enhanced v6.03

I'm at the top with 31 triplets found by x41zc, and reported as success.
Next came x41g, with the triplet finding error.
The WU was finally completed by two steady old stock CPU crunchers, who agreed on 31 pulses.

Even the single-core P4 spent less time on it than my i7 with GTX 670. Sometimes it doesn't pay to be too clever :P
ID: 1345031
