Computation Errors on the Rise?

Tklop
Joined: 11 May 03
Posts: 175
Credit: 613,952
RAC: 0
United States
Message 651710 - Posted: 30 Sep 2007, 18:18:34 UTC
Last modified: 30 Sep 2007, 18:22:40 UTC

I haven't hit a ton of them, but several WUs on my Results page are showing as "Computation Error", and the other hosts that crunched the same WUs got the same result...

Bad work units maybe?

They all start with the same date: 07mr07aj...

Anybody else running into these?

Just a heads up for all of you...

[edit]

Here's a few of them:

07mr07aj.7643.20931.3.6.20

07mr07aj.9032.7843.5.6.6

07mr07aj.9032.16023.5.6.12

[/edit]
Keep on crunching, all...
SETI@Home Forever!


___Tklop (Step-Founder, U.S. Air Force team)
ID: 651710 · Report as offensive
Osiris30

Joined: 19 Aug 07
Posts: 264
Credit: 41,917,631
RAC: 0
Barbados
Message 651716 - Posted: 30 Sep 2007, 18:23:04 UTC - in response to Message 651710.  

Yesterday's outage caused some WUs with a bad header to go out... I had about 10 myself that surfaced last time I checked.

ID: 651716 · Report as offensive
Tklop
Joined: 11 May 03
Posts: 175
Credit: 613,952
RAC: 0
United States
Message 651719 - Posted: 30 Sep 2007, 18:23:53 UTC - in response to Message 651716.  
Last modified: 30 Sep 2007, 18:24:04 UTC



Ah... Thanks!
Keep on crunching, all...
SETI@Home Forever!


___Tklop (Step-Founder, U.S. Air Force team)
ID: 651719 · Report as offensive
Osiris30

Joined: 19 Aug 07
Posts: 264
Credit: 41,917,631
RAC: 0
Barbados
Message 651758 - Posted: 30 Sep 2007, 19:07:43 UTC - in response to Message 651719.  



You're welcome.. and I just checked again and I have 39 of 'em :(

ID: 651758 · Report as offensive
davidrobertson

Joined: 9 Sep 99
Posts: 1
Credit: 520,727
RAC: 0
United Kingdom
Message 656804 - Posted: 9 Oct 2007, 15:22:51 UTC - in response to Message 651758.  



So do we have to do anything with or to them?
I have one: 17mr07ab.30863.409083.14.6.24_0
ID: 656804 · Report as offensive
TeamDGC
Joined: 27 Oct 99
Posts: 19
Credit: 7,091,042
RAC: 0
Finland
Message 656876 - Posted: 9 Oct 2007, 21:24:02 UTC - in response to Message 651710.  



Anybody else running into these?



Yepp! :-(
All time #1 M.U.R.C. Cruncher!
ID: 656876 · Report as offensive
Nicholas Roberts

Joined: 25 Jun 06
Posts: 4
Credit: 1,195,498
RAC: 0
Argentina
Message 656891 - Posted: 9 Oct 2007, 21:50:30 UTC - in response to Message 656876.  



All my 17xxxxx's have failed within 3 seconds of starting.

Should I abort my remaining 17xxxxx's before they attempt to crunch? Is there any benefit in aborting them rather than letting them fail?

13's, 29's and 18's working OK.

Regards,

Nicholas
ID: 656891 · Report as offensive
*Viking*
Joined: 2 Nov 03
Posts: 17
Credit: 1,051,900
RAC: 1
Canada
Message 656919 - Posted: 9 Oct 2007, 22:27:15 UTC - in response to Message 651710.  

Everything sent today is failing with either client error or compute error.

17mr07ab.30757.4162.12.6.80
17mr07ab.30087.416036.11.6.145
18mr07aa.10677.21749.3.6.156
18mr07aa.10744.5389.5.6.174
18mr07aa.10568.21340.4.6.146
17mr07ab.14071.413991.3.6.121

Other hosts assigned those same workunits are also failing to crunch them.

It's broke. Get the duct tape.
* Viking *
ID: 656919 · Report as offensive
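Several posts in this thread sort failures by the leading part of the workunit name ("07mr07aj", "17xxxxx's", and so on). A minimal sketch of that tally follows, assuming the first dot-separated field is the prefix people are referring to. The function names are hypothetical and the sample names are the ones listed in the post above.

```python
from collections import Counter


def batch_prefix(result_name):
    """Return the text before the first dot, e.g. '17mr07ab'."""
    return result_name.split(".", 1)[0]


def tally_by_prefix(result_names):
    """Count how many result names share each leading field."""
    return Counter(batch_prefix(name) for name in result_names)


# Names copied from the post above; in practice this list would come
# from whatever you see on your own Results page.
failed = [
    "17mr07ab.30757.4162.12.6.80",
    "17mr07ab.30087.416036.11.6.145",
    "18mr07aa.10677.21749.3.6.156",
    "18mr07aa.10744.5389.5.6.174",
    "18mr07aa.10568.21340.4.6.146",
    "17mr07ab.14071.413991.3.6.121",
]

if __name__ == "__main__":
    for prefix, count in tally_by_prefix(failed).items():
        print(prefix, count)
```

A tally like this only shows which prefixes the errors cluster under. It says nothing about whether every workunit with that prefix is bad.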
*Viking*
Joined: 2 Nov 03
Posts: 17
Credit: 1,051,900
RAC: 1
Canada
Message 656988 - Posted: 10 Oct 2007, 0:03:09 UTC - in response to Message 656919.  

Update: I deleted the setiathome_5.27 exe and redownloaded it (which took forever), and now it's working okay, at least on this latest workunit: 08mr07ad.14035.1300.3.6.154.

I guess we'll see what happens...
* Viking *
ID: 656988 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 656989 - Posted: 10 Oct 2007, 0:03:44 UTC - in response to Message 656891.  



There could also be good 17xxxxx's, so it's better to do some checking. The problem causes the WU to be smaller than usual, so you could look in the boinc\projects\setiathome.berkeley.edu folder for those which are less than the usual size, then abort just those. It doesn't matter much whether you abort them or let them attempt to crunch; either will be seen as a client-side error and cause a reissue to someone else (until the sixth error is returned).

Another approach is to suspend the good WUs so BOINC tries to crunch the other stuff, then resume the good ones. That way you get the same error as if you'd just let them run normally, but sooner.
Joe
ID: 656989 · Report as offensive
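The size check suggested in the post above could be sketched roughly as follows. This is only an illustration: the function name, the glob `pattern` argument and the "less than half the median size" cutoff are all assumptions, and you would need to choose a pattern that matches only workunit files in your own project folder.

```python
import sys
from pathlib import Path
from statistics import median


def find_undersized(folder, pattern="*", ratio=0.5):
    """Return files in `folder` matching `pattern` whose size is below
    `ratio` times the median size of the matched files, smallest first."""
    sizes = {
        p: p.stat().st_size
        for p in Path(folder).glob(pattern)
        if p.is_file()
    }
    if not sizes:
        return []
    cutoff = ratio * median(sizes.values())
    small = [p for p, size in sizes.items() if size < cutoff]
    return sorted(small, key=lambda p: sizes[p])


if __name__ == "__main__":
    # Usage: python find_small.py <project folder> [glob pattern]
    if len(sys.argv) > 1:
        for path in find_undersized(sys.argv[1], *sys.argv[2:3]):
            print(path.stat().st_size, path.name)
```

The script only lists candidates; aborting them is still done by hand in the BOINC manager. If most of the matched files are bad, the median itself will be small and nothing gets flagged, so it is worth eyeballing the sizes as well.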
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 657007 - Posted: 10 Oct 2007, 0:41:46 UTC - in response to Message 656989.  


This is like playing whack-a-mole. Every time I abort one, another zero-length WU downloads. And my max daily quota is dropping too. I'm going to let them crash normally. At least that way my max daily quota will suffer less.
ID: 657007 · Report as offensive
Francesco Forti
Joined: 24 May 00
Posts: 334
Credit: 204,421,005
RAC: 15
Switzerland
Message 657191 - Posted: 10 Oct 2007, 5:24:49 UTC
Last modified: 10 Oct 2007, 5:29:50 UTC

I too have seen some compute errors in the last week, and again now, after the restart following the latest Tuesday outage.

A few minutes ago this host http://setiathome.berkeley.edu/show_host_detail.php?hostid=1852935
had four errors like this:

<core_client_version>5.10.20</core_client_version>
<![CDATA[
<message>
- exit code -6 (0xfffffffa)
</message>
<stderr_txt>
SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ..\seti_header.cpp
Line: 235


</stderr_txt>
]]>
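If you have many results to check, the exit code and error text can be pulled out of a block like the one above with a couple of regular expressions. This is a minimal sketch that assumes the wording matches what is pasted here ("exit code N (0x...)" and "SETI@home error N text"). The function name and dictionary keys are hypothetical, and other client versions may word the message differently.

```python
import re

# Patterns written against the exact lines shown in the post above.
EXIT_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")
ERROR_RE = re.compile(r"SETI@home error (-?\d+) (.+)")


def summarize(stderr_text):
    """Extract the exit code and the SETI@home error line, if present."""
    exit_match = EXIT_RE.search(stderr_text)
    error_match = ERROR_RE.search(stderr_text)
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "error_code": int(error_match.group(1)) if error_match else None,
        "error_text": error_match.group(2).strip() if error_match else None,
    }


sample = """<message>
- exit code -6 (0xfffffffa)
</message>
<stderr_txt>
SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
</stderr_txt>"""

if __name__ == "__main__":
    print(summarize(sample))
```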

I use 5.10.20 with optimized seti (2.4V)

Optimized SETI@Home Enhanced application
Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra
Version: Windows SSE 32-bit based on S@H V5.15 'Noo? No - Ni!'
Revision: R-2.4V|xK|FFT:IPP_SSE|Ben-Joe

Bye,
Franz

PS: I should add that all four of those units were sent to me on
9 Oct 2007 22:08:59 UTC, after the last outage.
They are new work, not old.

ID: 657191 · Report as offensive
Alinator
Volunteer tester

Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 657193 - Posted: 10 Oct 2007, 5:28:22 UTC

Yep, there was a mess of clinkers for a while early on after the restart, but the ones I've been getting since around midnight UTC seem to be clean so far.

Alinator
ID: 657193 · Report as offensive
Osiris30

Joined: 19 Aug 07
Posts: 264
Credit: 41,917,631
RAC: 0
Barbados
Message 657201 - Posted: 10 Oct 2007, 6:06:32 UTC - in response to Message 657193.  

I'm still getting 5-10% bad WUs, but nothing to worry about. They'll burn through quick enough..
ID: 657201 · Report as offensive
Matthias Lehmkuhl
Volunteer tester

Joined: 5 Oct 99
Posts: 28
Credit: 10,832,348
RAC: 53
Germany
Message 657278 - Posted: 10 Oct 2007, 11:00:34 UTC

I got two of them

<message>
process exited with code 250 (0xfa, -6)
</message>

or
<message>
- exit code -6 (0xfffffffa)
</message>

<stderr_txt>
SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: seti_header.cpp
Line: 235


wuid=163695795
wuid=163212600

Matthias

ID: 657278 · Report as offensive
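The two message forms quoted in the post above, "code 250 (0xfa, -6)" and "exit code -6 (0xfffffffa)", look like the same value -6 written at 8-bit and 32-bit width. The sketch below is just the two's-complement arithmetic behind that reading; it does not show how BOINC itself formats exit codes, and the helper names are made up for illustration.

```python
def to_unsigned(value, bits):
    """Reinterpret a signed integer as an unsigned value of the given width."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits):
    """Reinterpret an unsigned value of the given width as signed."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        return value - (1 << bits)
    return value


if __name__ == "__main__":
    print(hex(to_unsigned(-6, 32)))  # 32-bit view of -6
    print(to_unsigned(-6, 8))        # 8-bit view of -6
    print(to_signed(0xFA, 8))        # back from the 8-bit view
```

So a host reporting 250 and a host reporting -6 are most likely describing the same failure, the "Bad workunit header" error shown in the stderr text.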
