Bad Batch? All the WUs I get are being trashed

Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 213133 - Posted: 13 Dec 2005, 18:59:58 UTC
Last modified: 13 Dec 2005, 19:17:24 UTC

On both Linux and Windows systems, every WU I download is being trashed with an unrecoverable error. I have seen exit codes 144 (no minus sign), -112, and -6.

Is anyone else seeing the same or similar? Any ideas on what might be wrong?

[B^S] Spydermb
Volunteer tester
Joined: 16 Jul 99
Posts: 496
Credit: 10,860,148
RAC: 0
United States
Message 213143 - Posted: 13 Dec 2005, 19:11:21 UTC - in response to Message 213133.  

Tigher, I am also getting these unrecoverable errors, mostly -112. I have one box that hasn't had any issues. I think I'm going to reboot the one that does and see if it corrects itself. I wish I knew what the issue is. At least it isn't just me. LOL!
BOINC SYNERGY is an International Team and We Welcome All BOINC Participants!

Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 213150 - Posted: 13 Dec 2005, 19:18:09 UTC
Last modified: 13 Dec 2005, 19:21:04 UTC

Ah not alone then.

All of mine begin with 22ap04ab 11574 and then 29697 or 26320.
I just had one download (from a different range) that appears to be running OK.
So it may be a very bad batch of data?

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 213154 - Posted: 13 Dec 2005, 19:21:47 UTC

I got a ton of these -6 errors this morning too. Shut down work requests for a few hours, and now everything seems fine again. (Although now I'm getting 'No work available' sometimes.)

Nightlord
Joined: 17 Oct 01
Posts: 117
Credit: 1,316,241
RAC: 0
United Kingdom
Message 213156 - Posted: 13 Dec 2005, 19:22:53 UTC
Last modified: 13 Dec 2005, 19:29:36 UTC

Yes, I'm getting them on one machine that recently downloaded work; all the others are OK:

13/12/2005 18:10:43|SETI@home|Starting result 22ap04ab.11574.28833.311076.14_0 using setiathome version 411
13/12/2005 18:10:43|SETI@home|Starting result 27se04ab.23281.3714.940902.88_2 using setiathome version 411
13/12/2005 18:10:44|SETI@home|Unrecoverable error for result 22ap04ab.11574.28833.311076.14_0 ( - exit code -6 (0xfffffffa))
13/12/2005 18:10:44|SETI@home|Unrecoverable error for result 27se04ab.23281.3714.940902.88_2 ( - exit code -6 (0xfffffffa))
13/12/2005 18:10:44||request_reschedule_cpus: process exited
13/12/2005 18:10:44|SETI@home|Finished download of 27se04ab.23281.3714.940902.84
13/12/2005 18:10:44|SETI@home|Throughput 0 bytes/sec
13/12/2005 18:10:44|SETI@home|Finished download of 27se04ab.23281.3714.940902.86
13/12/2005 18:10:44|SETI@home|Throughput 0 bytes/sec
13/12/2005 18:10:44|SETI@home|Computation for result

Note the zero bytes per second download throughput too.
This machine is as stable as it gets: no overclock, and it has run many thousands of WUs with Tetsuji's app and hundreds with Crunch3r's new app with no issues. All the other machines are also running the same client.

Do I smell a rat?



Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 213160 - Posted: 13 Dec 2005, 19:24:35 UTC
Last modified: 13 Dec 2005, 19:29:48 UTC

Hi, I got a bunch of these on one host:

[SETI@home] Unrecoverable error for result 29se04aa.5782.13442.404844.8_1 (process exited with code 250 (0xfa))


Probably the same thing. It's back to normal now BTW.


Regards Hans

P.S: Yikes! I still have lots of them in store. Look out for zero-sized WUs in your project folder.

P.P.S: I'll try to delete them to force a new download.
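
For anyone who wants to check their own project folder for zero-sized files, here is a minimal Python sketch. It only lists candidates and deletes nothing. The function name and the idea of passing the folder path in by hand are assumptions for illustration; nothing here is part of BOINC itself.

```python
from pathlib import Path


def find_zero_byte_files(project_dir):
    """Return the regular files directly inside project_dir that are 0 bytes long."""
    return sorted(
        p for p in Path(project_dir).iterdir()
        if p.is_file() and p.stat().st_size == 0
    )
```

Call it with the path to your own SETI@home project directory, for example `find_zero_byte_files("path/to/project/folder")`, and review the list before removing anything.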

Albatros
Joined: 2 Jul 00
Posts: 7
Credit: 245,899
RAC: 0
Germany
Message 213163 - Posted: 13 Dec 2005, 19:25:34 UTC

13.12.2005 20:19:03||Starting BOINC client version 5.3.1 for windows_intelx86
13.12.2005 20:19:03||libcurl/7.14.0 OpenSSL/0.9.8 zlib/1.2.3
13.12.2005 20:19:03||Data directory: D:\Programme\BOINC
13.12.2005 20:19:03|SETI@home|Found app_info.xml; using anonymous platform
13.12.2005 20:19:03||Processor: 1 AuthenticAMD AMD Sempron(tm) 2500+
13.12.2005 20:19:03||Memory: 447.48 MB physical, 1.41 GB virtual
13.12.2005 20:19:03||Disk: 40.00 GB total, 25.05 GB free
13.12.2005 20:19:03|rosetta@home|Computer ID: 97839; location: home; project prefs: default
13.12.2005 20:19:03|Predictor @ Home|Computer ID: 190087; location: home; project prefs: default
13.12.2005 20:19:03|SETI@home|Computer ID: 1028322; location: home; project prefs: default
13.12.2005 20:19:03||General prefs: from rosetta@home (last modified 2005-12-12 20:30:59)
13.12.2005 20:19:03||General prefs: no separate prefs for home; using your defaults
13.12.2005 20:19:03||Remote control not allowed; using loopback address
13.12.2005 20:19:03|rosetta@home|Deferring computation for result 1ogw__topology_sample_38826_1
13.12.2005 20:19:03|Predictor @ Home|Resuming computation for result bprion_4_84032_2 using mfoldB125 version 428
13.12.2005 20:19:03|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
13.12.2005 20:19:03|SETI@home|Reason: To fetch work
13.12.2005 20:19:03|SETI@home|Requesting 86400 seconds of new work
13.12.2005 20:19:04|rosetta@home|Restarting result 1ogw__topology_sample_38826_1 using rosetta version 480
13.12.2005 20:19:04|Predictor @ Home|Pausing result bprion_4_84032_2 (left in memory)
13.12.2005 20:19:13|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
13.12.2005 20:19:14|SETI@home|Started download of 22ap04ab.11574.30993.997154.27
13.12.2005 20:19:14|SETI@home|Started download of 22ap04ab.11574.30993.997154.32
13.12.2005 20:19:17|SETI@home|Finished download of 22ap04ab.11574.30993.997154.27
13.12.2005 20:19:17|SETI@home|Throughput 0 bytes/sec
13.12.2005 20:19:17|SETI@home|Finished download of 22ap04ab.11574.30993.997154.32
13.12.2005 20:19:17|SETI@home|Throughput 0 bytes/sec
13.12.2005 20:19:17|SETI@home|Started download of 22ap04ab.11574.30993.997154.34
13.12.2005 20:19:17|SETI@home|Started download of 22ap04ab.11574.30993.997154.26
13.12.2005 20:19:18|SETI@home|Deferring communication with project for 10 minutes and 0 seconds
13.12.2005 20:19:18||request_reschedule_cpus: files downloaded
13.12.2005 20:19:18||request_reschedule_cpus: files downloaded
13.12.2005 20:19:20|SETI@home|Finished download of 22ap04ab.11574.30993.997154.34
13.12.2005 20:19:20|SETI@home|Throughput 0 bytes/sec
13.12.2005 20:19:21||request_reschedule_cpus: files downloaded
13.12.2005 20:19:22|SETI@home|Finished download of 22ap04ab.11574.30993.997154.26
13.12.2005 20:19:22|SETI@home|Throughput 51468 bytes/sec
13.12.2005 20:19:23||request_reschedule_cpus: files downloaded

"Throughput 0 bytes/sec" - that sounds very strange to me... When I had a look at the directory I saw that 2 of the 3 files downloaded have a size of 0 bytes

Uli

Nightlord
Joined: 17 Oct 01
Posts: 117
Credit: 1,316,241
RAC: 0
United Kingdom
Message 213174 - Posted: 13 Dec 2005, 19:33:58 UTC - in response to Message 213163.

Confirmed: my -6 error WUs are also zero bytes on disk.

Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 213176 - Posted: 13 Dec 2005, 19:35:10 UTC
Last modified: 13 Dec 2005, 19:35:26 UTC

Hmm, now I've got an exit code 250 from a WU on a Linux box. I reckon I might win this hand with a royal flush!
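
The different codes reported in this thread may well be the same values seen at different widths: -6 shows up as 0xfffffffa when printed as a 32-bit value, and as 250 (0xfa) when only the low 8 bits survive, which is all a POSIX exit status keeps. The same arithmetic turns -112 into 144. The sketch below only demonstrates that arithmetic; it is not taken from the BOINC source, and `as_unsigned` is a helper name made up for illustration.

```python
def as_unsigned(code, bits):
    """Reinterpret a signed exit code as an unsigned integer of the given bit width."""
    return code & ((1 << bits) - 1)


for code in (-6, -112):
    # Columns: signed code, 32-bit hex view, low 8 bits
    print(code, hex(as_unsigned(code, 32)), as_unsigned(code, 8))
# -6 0xfffffffa 250
# -112 0xffffff90 144
```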

kev1701e
Joined: 28 Dec 99
Posts: 138
Credit: 10,216,553
RAC: 0
United States
Message 213177 - Posted: 13 Dec 2005, 19:35:15 UTC

Here's what I've been getting:

<core_client_version>5.2.13</core_client_version>
<message> - exit code -6 (0xfffffffa)
</message>
<stderr_txt>
SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ..\seti_header.cpp
Line: 194

From these WUs:

22ap04ab.11574.19952.222162.36_2
22ap04ab.11574.19952.222162.41_1
22ap04ab.11574.19952.222162.42_2
22ap04ab.11574.26368.684642.79_2
27se04ab.6580.2850.648582.219_0

kev

X2 4400+,4200+ @2.75GHz, XP1800+ @1.65GHz, P4 @1.6GHz

Grenadier
Volunteer tester
Joined: 15 May 99
Posts: 63
Credit: 5,445,784
RAC: 0
United States
Message 213193 - Posted: 13 Dec 2005, 19:46:08 UTC

Spoke too soon. I'm still getting these too. I'm not sure whether they're really bad WUs or just bad downloads (in which case turning off new work would help).


Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 213199 - Posted: 13 Dec 2005, 19:50:09 UTC

I tried to manually download one of these using TMR's ftodir and wget, but I'm getting 404 errors.
These WUs seem to have disappeared before they could be downloaded.
Usually you can re-download a WU until about 2 weeks after it first went out.

Regards Hans

Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 213202 - Posted: 13 Dec 2005, 19:52:59 UTC - in response to Message 213199.  

Hmm, strange goings-on! Well, I guess we wait until they run out of bad units before we get good ones.


Rayburner
Volunteer tester
Joined: 25 Nov 03
Posts: 18
Credit: 11,745,976
RAC: 0
Germany
Message 213203 - Posted: 13 Dec 2005, 19:53:15 UTC

Looks like a bad download to me: according to the log, it was downloaded at 0 bytes/sec, and in the project folder the WU is 0 bytes in size.

13.12.2005 20:41:51|SETI@home|Started download of 22ap04ab.11574.32290.373588.127
13.12.2005 20:41:54|SETI@home|Finished download of 22ap04ab.11574.32290.373588.127
13.12.2005 20:41:54|SETI@home|Throughput 0 bytes/sec
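
To spot these in a longer log, a small Python sketch like the following can pair each "Finished download" line with the "Throughput" line that follows it. It assumes the pipe-separated layout shown in the logs in this thread (timestamp, project, message) and that the throughput line comes right after its download line; the function name is made up for illustration.

```python
def zero_throughput_downloads(log_lines):
    """Return names of files whose 'Finished download' line is followed by 'Throughput 0 bytes/sec'."""
    prefix = "Finished download of "
    suspects = []
    last_finished = None
    for line in log_lines:
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a timestamp|project|message line
        message = parts[2]
        if message.startswith(prefix):
            last_finished = message[len(prefix):].strip()
        elif message.startswith("Throughput ") and last_finished is not None:
            if message.split()[1] == "0":
                suspects.append(last_finished)
            last_finished = None
    return suspects
```

Feed it the lines of a saved message log, for example `zero_throughput_downloads(open("messages.txt"))`, where `messages.txt` is a hypothetical file name for a copy of your own log.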

Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 213207 - Posted: 13 Dec 2005, 19:58:20 UTC

Notice the workunits whose names are identical except for the final one to three digits (0 to 255). I got three today. The data for those units come from the same segment of tape, i.e., they were recorded at the same time. The band on the tape is 2.5 MHz wide, but the splitter divides it into 256 bandlets, each 2,500,000 / 256 ≈ 9766 Hz wide, one per workunit. So if a workunit is contaminated, it is highly probable that some, or all, of its bandmates will be contaminated too.
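
As a rough illustration of the arithmetic and of how bandmates can be picked out by name, here is a Python sketch. It assumes that the last dot-separated field of a result name is the bandlet index and that a trailing "_N" is a per-result suffix; that reading of the names is an assumption based on the names posted in this thread, and `group_bandmates` is a made-up helper.

```python
from collections import defaultdict

BAND_HZ = 2_500_000  # tape bandwidth quoted above
SUBBANDS = 256       # number of bandlets quoted above

print(BAND_HZ / SUBBANDS)  # 9765.625, i.e. about 9766 Hz per workunit


def group_bandmates(result_names):
    """Group result names by everything before the final field; collect the bandlet indices."""
    groups = defaultdict(list)
    for name in result_names:
        prefix, _, last = name.rpartition(".")
        bandlet = int(last.split("_")[0])  # drop the assumed "_N" result suffix
        groups[prefix].append(bandlet)
    return dict(groups)
```

For example, `group_bandmates(["22ap04ab.11574.19952.222162.36_2", "22ap04ab.11574.19952.222162.41_1"])` puts bandlets 36 and 41 under the same prefix, marking them as bandmates.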

Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 213208 - Posted: 13 Dec 2005, 19:58:26 UTC
Last modified: 13 Dec 2005, 20:02:04 UTC

This is from ethereal. Weird...


GET /sah/download_fanout/5a/22ap04ab.11574.18450.161082.250 HTTP/1.0
User-Agent: BOINC client (i686-pc-linux-gnu 4.27)
Host: setiboincdata.ssl.berkeley.edu:80
Connection: close
Accept: */*

HTTP/1.1 302 Found
Date: Tue, 13 Dec 2005 19:23:14 GMT
Server: Apache/1.3.33 (Unix) mod_fastcgi/2.4.2
Location: http://boinc2.ssl.berkeley.edu/sah/download_fanout/5a/22ap04ab.11574.18450.161082.250
Connection: close
Content-Type: text/html; charset=iso-8859-1



<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="http://boinc2.ssl.berkeley.edu/sah/download_fanout/5a/22ap04ab.11574.18450.161082.250">here</A>.<P>
<HR>
<ADDRESS>Apache/1.3.33 Server at setiboincdata.ssl.berkeley.edu Port 80</ADDRESS>
</BODY></HTML>



Should the client be able to parse this?

Regards Hans
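
For reference, the part of a 302 reply that a client needs is the status line plus the Location header; the HTML body is only there for humans. As far as RFC 2616 goes, a user agent may follow a 302 automatically when the request was a GET, but whether this BOINC client version does so is not established here. The Python sketch below only shows how the redirect target could be pulled out of a captured response head; the function names are made up for illustration and this is not how the BOINC client is implemented.

```python
def parse_response_head(raw):
    """Split a raw HTTP response head into (status_code, headers_dict)."""
    lines = raw.strip().splitlines()
    _version, code, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers


def redirect_target(raw):
    """Return the Location value for a redirect status, or None otherwise."""
    code, headers = parse_response_head(raw)
    if code in (301, 302, 303, 307):
        return headers.get("location")
    return None
```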

Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 213213 - Posted: 13 Dec 2005, 20:03:05 UTC - in response to Message 213208.  
Last modified: 13 Dec 2005, 20:05:31 UTC

Hmm, a 302: the resource exists, but not here. Do not cache the new URI. Do not redirect without the user's agreement. It gets even stranger! I guess the client can't make sense of this. If it could, it would have to ask us for agreement to go to the new location; otherwise it busts the RFC wide open, and ALL security. So I think the client will dump it.



Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 213218 - Posted: 13 Dec 2005, 20:07:04 UTC

Yup. Bad batch of workunits. Among other things last week one of the storage devices filled up, so there are about 20,000 workunits that are 0-length files. Apparently we need to handle this case a bit better, but in the meantime we're just causing the clients to DOS us for a little bit before these workunits error out and get deleted.

Think of it as a system-wide stress test.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 213221 - Posted: 13 Dec 2005, 20:08:35 UTC - in response to Message 213218.  

Matt,
I would have thought you would have had enough of "stress" lately. :)


[B^S] Spydermb
Volunteer tester
Joined: 16 Jul 99
Posts: 496
Credit: 10,860,148
RAC: 0
United States
Message 213238 - Posted: 13 Dec 2005, 20:18:22 UTC - in response to Message 213218.  

Thanks for the FYI, Matt. Relax, man, things will be OK...
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.