Computation Error - Bad Workunit Header

Message boards : Number crunching : Computation Error - Bad Workunit Header

Jim Volfan
Joined: 22 May 99
Posts: 52
Credit: 24,239,706
RAC: 90
United States
Message 723205 - Posted: 7 Mar 2008, 23:48:10 UTC

I just downloaded 4 workunits and all 4 errored out with Bad Workunit Header

3/7/2008 5:56:47 PM|SETI@home|Reason: Unrecoverable error for result 13fe08ac.24787.2526.4.7.242_1 ( - exit code -6 (0xfffffffa))
3/7/2008 5:56:47 PM|SETI@home|Computation for task 13fe08ac.24787.2526.4.7.242_1 finished
3/7/2008 5:56:47 PM|SETI@home|Output file 13fe08ac.24787.2526.4.7.242_1_0 for task 13fe08ac.24787.2526.4.7.242_1 absent
3/7/2008 6:08:20 PM|SETI@home|Reason: Unrecoverable error for result 13fe08ac.24787.2526.4.7.81_0 ( - exit code -6 (0xfffffffa))
3/7/2008 6:08:20 PM|SETI@home|Computation for task 13fe08ac.24787.2526.4.7.81_0 finished
3/7/2008 6:08:20 PM|SETI@home|Output file 13fe08ac.24787.2526.4.7.81_0_0 for task 13fe08ac.24787.2526.4.7.81_0 absent
3/7/2008 6:13:16 PM|SETI@home|Reason: Unrecoverable error for result 13fe08ac.24787.2526.4.7.223_0 ( - exit code -6 (0xfffffffa))
3/7/2008 6:13:16 PM|SETI@home|Computation for task 13fe08ac.24787.2526.4.7.223_0 finished
3/7/2008 6:13:16 PM|SETI@home|Output file 13fe08ac.24787.2526.4.7.223_0_0 for task 13fe08ac.24787.2526.4.7.223_0 absent
3/7/2008 6:15:16 PM|SETI@home|Reason: Unrecoverable error for result 13fe08ac.24787.2526.4.7.187_1 ( - exit code -6 (0xfffffffa))
3/7/2008 6:15:16 PM|SETI@home|Computation for task 13fe08ac.24787.2526.4.7.187_1 finished
3/7/2008 6:15:16 PM|SETI@home|Output file 13fe08ac.24787.2526.4.7.187_1_0 for task 13fe08ac.24787.2526.4.7.187_1 absent

<core_client_version>5.8.16</core_client_version>
<![CDATA[
<message>
- exit code -6 (0xfffffffa)
</message>
<stderr_txt>
SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ..\\seti_header.cpp
Line: 235


</stderr_txt>
]]>

[The other three tasks returned identical stderr output.]

All 4 workunits were from the same series, 13fe08ac.24787.2526.4.7

I don't know what might have caused the issue, but I hope this helps others.
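
For anyone puzzled by the two forms of the exit code in the logs above: -6 and 0xfffffffa are the same 32-bit value, read as signed and as unsigned respectively. A minimal Python sketch of the conversion (the helper names are made up for illustration and are not part of BOINC):

```python
def as_unsigned32(code: int) -> str:
    """Return the 32-bit two's-complement hex form of a signed exit code."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def as_signed32(raw: int) -> int:
    """Interpret a 32-bit unsigned value as a signed integer."""
    return raw - 0x1_0000_0000 if raw & 0x8000_0000 else raw


print(as_unsigned32(-6))        # 0xfffffffa
print(as_signed32(0xFFFFFFFA))  # -6
```

The same arithmetic applies to any other exit code the client prints in both forms.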
ID: 723205
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 723209 - Posted: 7 Mar 2008, 23:56:56 UTC

Well, thanks for bringing that to our attention.

There have been a few posts from folks over the last 24 to 36 hours which tend to suggest there may have been some 'shaky' work generated.

Your case certainly adds some more evidence to that. ;-)

Alinator
ID: 723209
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 723212 - Posted: 8 Mar 2008, 0:04:08 UTC

We've had some cases in the past when workunit data files have been noticeably malformed: smaller than the normal 367KB, or even of zero size.

Anyone who notices a task from the same series as Jim's (13fe08ac.24787) in their task list might like to check the data file in their SETI project directory and report back.
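
The check suggested above could be scripted roughly like this. It is only a sketch under stated assumptions: that data files sit directly in the project directory under their workunit name, and that "normal" means about 367 KB. The function name, the 5% tolerance and the command-line usage are all hypothetical.

```python
import sys
from pathlib import Path

NORMAL_SIZE = 367 * 1024  # the "normal 367KB" mentioned above (approximate)
MIN_EXPECTED = int(NORMAL_SIZE * 0.95)  # assumed tolerance: 5% below normal


def undersized_files(project_dir, series_prefix):
    """List (name, size) for files of one series that are empty or too small."""
    found = []
    for path in sorted(Path(project_dir).iterdir()):
        if not path.is_file() or not path.name.startswith(series_prefix):
            continue
        size = path.stat().st_size
        if size < MIN_EXPECTED:  # also catches zero-length files
            found.append((path.name, size))
    return found


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_wu.py <project directory> <series prefix>
    for name, size in undersized_files(sys.argv[1], sys.argv[2]):
        print(f"{name}: {size} bytes")
```

Called with the SETI project directory and the prefix 13fe08ac.24787, it would list only files that are empty or clearly short. A file of the right size can still carry a bad header, so an empty result does not prove the work is good.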
ID: 723212
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 723215 - Posted: 8 Mar 2008, 0:33:48 UTC - in response to Message 723212.  

We've had some cases in the past when workunit data files have been noticeably malformed: smaller than the normal 367KB, or even of zero size.

Anyone who notices a task from the same series as Jim's (13fe08ac.24787) in their task list might like to check the data file in their SETI project directory and report back.


Hi there, I have about 15 WUs:

8-3-2008 1:13:32|SETI@home|Scheduler request succeeded: got 7 new tasks
8-3-2008 1:13:34|SETI@home|Started download of 13fe08ac.24787.11524.4.7.47
8-3-2008 1:13:34|SETI@home|Started download of 13fe08ac.24787.11524.4.7.40
8-3-2008 1:13:39|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.47
8-3-2008 1:13:39|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.40
8-3-2008 1:13:39|SETI@home|Started download of 01ap07ad.18916.5389.8.7.210
8-3-2008 1:13:39|SETI@home|Started download of 13fe08ac.24787.11524.4.7.20
8-3-2008 1:13:43|SETI@home|Finished download of 01ap07ad.18916.5389.8.7.210
8-3-2008 1:13:43|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.20
8-3-2008 1:13:43|SETI@home|Started download of 13fe08ac.24787.11524.4.7.65
8-3-2008 1:13:43|SETI@home|Started download of 13fe08ac.24787.11524.4.7.15
8-3-2008 1:13:47|SETI@home|Sending scheduler request: To fetch work. Requesting 3752 seconds of work, reporting 1 completed tasks
8-3-2008 1:13:48|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.65
8-3-2008 1:13:48|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.15
8-3-2008 1:13:48|SETI@home|Started download of 13fe08ac.24787.11524.4.7.25
8-3-2008 1:13:53|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.25
8-3-2008 1:13:53|SETI@home|Scheduler request succeeded: got 1 new tasks
8-3-2008 1:13:55|SETI@home|Started download of 13fe08ac.24787.11524.4.7.175
8-3-2008 1:13:59|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.175
8-3-2008 1:14:04|SETI@home|Sending scheduler request: To fetch work. Requesting 1758 seconds of work, reporting 0 completed tasks
8-3-2008 1:14:10|SETI@home|Scheduler request succeeded: got 1 new tasks
8-3-2008 1:14:12|SETI@home|Started download of 13fe08ac.24787.11524.4.7.212
8-3-2008 1:14:16|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.212
8-3-2008 1:14:21|SETI@home|Sending scheduler request: To fetch work. Requesting 204 seconds of work, reporting 0 completed tasks
8-3-2008 1:14:27|SETI@home|Scheduler request succeeded: got 1 new tasks
8-3-2008 1:14:29|SETI@home|Started download of 13fe08ac.24787.11524.4.7.251
8-3-2008 1:14:32|SETI@home|Finished download of 13fe08ac.24787.11524.4.7.251

These look to be from a similar batch.

I can't tell whether any of them has been 'crunched' and, if OK, sent back to Berkeley.
I'll look in the BOINC log files to see whether any has been processed.

If so, I'll let you know ;)

ID: 723215
Bert
Joined: 12 Oct 06
Posts: 84
Credit: 813,295
RAC: 0
United States
Message 723218 - Posted: 8 Mar 2008, 0:42:59 UTC - in response to Message 723212.  

We've had some cases in the past when workunit data files have been noticeably malformed: smaller than the normal 367KB, or even of zero size.

Anyone who notices a task from the same series as Jim's (13fe08ac.24787) in their task list might like to check the data file in their SETI project directory and report back.


I have two 13fe08ac but the next 5 digits are 15641. Size seems OK, 367 KB.
ID: 723218
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 723219 - Posted: 8 Mar 2008, 0:44:48 UTC

I thought I'd better check if it was the Radar-blanking test tape, but no - that was 28ja08aa.
ID: 723219
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 723227 - Posted: 8 Mar 2008, 1:06:37 UTC
Last modified: 8 Mar 2008, 1:18:16 UTC

I got 19 [EDIT: 20] x Client error - Compute error on my QX6700 too..


..because of:

<message>
 - exit code -6 (0xfffffffa)
</message>
<stderr_txt>
SETI@home error -6 Bad workunit header
!swi.data_type || !found || !swi.nsamples
File: ..\\seti_header.cpp
Line: 235

ID: 723227
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 723228 - Posted: 8 Mar 2008, 1:17:04 UTC - in response to Message 723227.  

I got 19 x Client error - Compute error on my QX6700 too..

I checked the 13 on the first three pages, and they're all from 13fe08ac.24787 too.

I'm not searching the remaining 874 tasks for the six I've missed, but it seems likely we're just seeing one rogue splitter on one tape.
ID: 723228
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 723234 - Posted: 8 Mar 2008, 1:30:45 UTC - in response to Message 723228.  

I got 19 x Client error - Compute error on my QX6700 too..

I checked the 13 on the first three pages, and they're all from 13fe08ac.24787 too.

I'm not searching the remaining 874 tasks for the six I've missed, but it seems likely we're just seeing one rogue splitter on one tape.



..they are on the first 8 pages..


ID: 723234
Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 723239 - Posted: 8 Mar 2008, 1:46:19 UTC

This definitely sounds like splitter problems. I have received a bunch of errors like this on several occasions. On one occasion that I got to check, a bunch of WUs had been sent out with the binary data intact but no header at all. On another occasion I received a bunch of files that were zero length! They didn't contain any header or data. On both occasions it was determined that a splitter had acted up and created the defective files.

ERIC! Looks like you need to use the ole boot on one of the splitters again!
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 723239
dnolan
Joined: 30 Aug 01
Posts: 1228
Credit: 47,779,411
RAC: 32
United States
Message 723245 - Posted: 8 Mar 2008, 2:09:29 UTC - in response to Message 723218.  
Last modified: 8 Mar 2008, 2:10:34 UTC

We've had some cases in the past when workunit data files have been noticeably malformed: smaller than the normal 367KB, or even of zero size.

Anyone who notices a task from the same series as Jim's (13fe08ac.24787) in their task list might like to check the data file in their SETI project directory and report back.


I have two 13fe08ac but the next 5 digits are 15641. Size seems OK, 367 KB.


Got 13 of these between 2 of my machines, all look like the correct size to me, guess I'll wait and see what happens.

[edit] Mine are all of the 13fe08ac.24787 series, though... not 15641

-Dave
ID: 723245
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 723264 - Posted: 8 Mar 2008, 4:39:09 UTC - in response to Message 723234.  

I got 19 [EDIT: 20] x Client error - Compute error on my QX6700 too..

I checked the 13 on the first three pages, and they're all from 13fe08ac.24787 too.

I'm not searching the remaining 874 tasks for the six I've missed, but it seems likely we're just seeing one rogue splitter on one tape.



..they are on the first 8 pages..




19 x 13fe08ac.24787.xxxx
1 x 13fe08ac.8515.xxxx
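
A tally like the one above can be produced from a list of task names instead of paging through the results by hand. This is a small Python sketch that assumes the first two dot-separated fields of a name identify the tape and series, as in the names posted in this thread; `series_of` is a made-up helper, and the sample list is just names quoted earlier in the thread.

```python
from collections import Counter


def series_of(task_name: str) -> str:
    """Return the tape and series part of a name, e.g. '13fe08ac.24787'."""
    tape, series = task_name.split(".")[:2]
    return f"{tape}.{series}"


# Sample names copied from the logs posted earlier in this thread.
tasks = [
    "13fe08ac.24787.2526.4.7.242_1",
    "13fe08ac.24787.2526.4.7.81_0",
    "13fe08ac.24787.2526.4.7.223_0",
    "13fe08ac.24787.2526.4.7.187_1",
    "01ap07ad.18916.5389.8.7.210",
]

counts = Counter(series_of(name) for name in tasks)
for name, n in counts.most_common():
    print(f"{n} x {name}")
```

Feeding it the names of only the errored tasks would show at a glance whether they all trace back to one series.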



ID: 723264
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 723327 - Posted: 8 Mar 2008, 9:27:51 UTC

In the last few hours:

13fe08ac.24787.18477.4.7.170
13fe08ac.8515.20931.3.7.146
13fe08ac.8515.890.3.7.214

but a few days ago:

01ap07aa.20556.16841.5.7.0 (created 05/03/08)

Not had a computation error for months prior to the above.

ID: 723327
SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 723333 - Posted: 8 Mar 2008, 9:56:49 UTC

Just had 68 go through with another load still downloading. I was hoping to fill the cache as well over the weekend, never mind.
ID: 723333
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 723365 - Posted: 8 Mar 2008, 11:41:45 UTC - in response to Message 723333.  

Just had 68 go through with another load still downloading. I was hoping to fill the cache as well over the weekend, never mind.


Had a batch of 13fe08ac.24787 and others from 13fe08ac; see my post from a few days ago (earlier in this thread).

I haven't seen any computing errors so far; most of them are still waiting for upload, as their deadline isn't until 30 March 2008.

Almost all of them have triplets, with triplet power up to 11.

How do you 'display' such images/triplets? I've used Print Screen and saved parts of the images as *.png files. Can I upload them, or do I have to 'own' a URL to use them?

ID: 723365
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 723371 - Posted: 8 Mar 2008, 11:56:27 UTC - in response to Message 723365.  

Just had 68 go through with another load still downloading. I was hoping to fill the cache as well over the weekend, never mind.


Had a batch of 13fe08ac.24787 and others from 13fe08ac; see my post from a few days ago (earlier in this thread).

I haven't seen any computing errors so far; most of them are still waiting for upload, as their deadline isn't until 30 March 2008.

Almost all of them have triplets, with triplet power up to 11.

How do you 'display' such images/triplets? I've used Print Screen and saved parts of the images as *.png files. Can I upload them, or do I have to 'own' a URL to use them?

I use imageshack.

Free and convenient.

F.
ID: 723371
Iona
Joined: 12 Jul 07
Posts: 790
Credit: 22,438,118
RAC: 0
United Kingdom
Message 723391 - Posted: 8 Mar 2008, 13:43:45 UTC

I have also had a load of 'Compute Errors' on my 'second' PC. In frustration, I virtually took the PC apart, tested some parts (RAM and CPU) in its virtual twin, brushed out and vacuum-cleaned the HSF and cooling fans, and reinstalled BOINC. Guess what?! More of the same.... d'oh! I then checked and found that some of the WUs had returned results, which was not the case at the time - they're showing the same error. All those different PCs can't all be wrong, eh?

Oh well, I'm off to visit my family for a couple of days, from tomorrow, so hopefully they'll get things sorted out by the time I return.


Don't take life too seriously, as you'll never come out of it alive!
ID: 723391
SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 723397 - Posted: 8 Mar 2008, 14:23:32 UTC

Have reached download limit for today, so will have to wait till tomorrow to get some more of the dodgy ones.
ID: 723397
Matthias Lehmkuhl
Volunteer tester
Joined: 5 Oct 99
Posts: 28
Credit: 10,832,348
RAC: 53
Germany
Message 723440 - Posted: 8 Mar 2008, 16:42:40 UTC

I got one WU with error - exit code -1073741819 (0xc0000005)
13fe08ac.23325
It's not the same error, but it's also from 13fe08ac.

One result is with setiathome_5.27_windows_intel, BOINC 5.8.16, XP SP2;
mine is with KWSN_2.4V_SSSE3_MB, BOINC 5.10.30, Vista.

Both crunched results got the same error immediately after the start.

wuid=234464976
Matthias

ID: 723440
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 723462 - Posted: 8 Mar 2008, 18:00:09 UTC

I got three from the bad tape (13fe08ac) on host 3751792. They are WUs 234331734, 234211111, and 234081520 - identified as 13fe08ac.6464, 13fe08ac.24787 and 13fe08ac.8515 respectively. They all failed with the 'bad workunit header' error -6.

So I'm looking at the header for the 13fe08ac.24787 task, and comparing it with the saved header for 772524490, a 13fe08ac.18960 task which crunched OK a few days ago. They're noticeably different. Red for the bad header, green for the good header.

[color=red]    <tape_info>
      <name>13fe08ac</name>
      <start_time>2454509.997042</start_time>
      <last_block_time>2454509.997042</last_block_time>
      <last_block_done>16841</last_block_done>
      <missed>0</missed>
      <tape_quality>0</tape_quality>
      <beam>1</beam> <--------------------------------------difference
    </tape_info>
    <name>13fe08ac.24787.16841.4.7</name> <---------------difference
    <data_desc>[/color]
[color=green]      <tape_info>
        <name>13fe08ac</name>
        <start_time>2454510.0702579</start_time>
        <last_block_time>2454510.0702579</last_block_time>
        <last_block_done>47005</last_block_done>
        <missed>0</missed>
        <tape_quality>0</tape_quality>
        <sb_id>0</sb_id> <------------------------------------difference
      </tape_info>
      <name>13fe08ac</name> <-------------------------------difference
      <data_desc>[/color]


[color=red]  <receiver_cfg>
    <s4_id>4</s4_id>
    <name>Arecibo 1.4GHz Array, Beam 0, Pol 1</name>
    <beam_width>0.0500000007</beam_width>
    <center_freq>1420</center_freq>
    <latitude>18.3538056</latitude>
    <longitude>-66.7552222</longitude>
    <elevation>497</elevation>
    <diameter>168</diameter>
    <az_orientation>180</az_orientation>
    <az_corr_coeff length=99 encoding="x-csv">
      -37,-6.05,92.35,-731.21,-1013.97,-24.53,-11.19,9.18,106.04,3.02,-1.74,
      -3.46,1.29
    </az_corr_coeff>
    <zen_corr_coeff length=99 encoding="x-csv">
      -57.55,-95.56,-4.13,141.69,677.51,-10.41,-7.71,-10.39,0.08,0.43,-0.62,
      0.03,-0.36
    </zen_corr_coeff>
    <array_az_ellipse>0</array_az_ellipse> <-------------addition
    <array_za_ellipse>0</array_za_ellipse> <-------------addition
    <array_angle>0</array_angle> <-----------------------addition
  </receiver_cfg>[/color]
[color=green]    <receiver_cfg>
      <s4_id>8</s4_id>
      <name>Arecibo 1.4GHz Array, Beam 2, Pol 1</name>
      <beam_width>0.0500000007</beam_width>
      <center_freq>1420</center_freq>
      <latitude>18.3538056</latitude>
      <longitude>-66.7552222</longitude>
      <elevation>497</elevation>
      <diameter>168</diameter>
      <az_orientation>180</az_orientation>
      <az_corr_coeff length=105 encoding="x-csv">
        -37,-6.05,92.35,-731.21,-1013.97,-24.53,-11.19,9.18,106.04,3.02,-1.74,
        -3.46,1.29
      </az_corr_coeff>
      <zen_corr_coeff length=105 encoding="x-csv">
        -57.55,-95.56,-4.13,141.69,677.51,-10.41,-7.71,-10.39,0.08,0.43,-0.62,
        0.03,-0.36
      </zen_corr_coeff>
    </receiver_cfg>[/color]


[color=red]  <splitter_cfg>
    <version>0</version> <---------------------------difference
    <data_type></data_type> <------------------------empty
    <fft_len>0</fft_len> <---------------------------difference
    <ifft_len>0</ifft_len> <-------------------------difference
    <filter></filter> <------------------------------empty
    <window></window> <------------------------------empty
    <samples_per_wu>0</samples_per_wu> <-------------addition
    <highpass>0</highpass> <-------------------------addition
  </splitter_cfg>[/color]
[color=green]    <splitter_cfg>
      <version>0.200000003</version> <-----------------difference
      <data_type>encoded</data_type>
      <fft_len>2048</fft_len> <------------------------difference
      <ifft_len>8</ifft_len> <-------------------------difference
      <filter>fft</filter>
      <window>welsh</window>
    </splitter_cfg>[/color]

The rest looks plausible, so I guess it's that splitter block which is causing the damage.

Unless it's the new line at the bottom of analysis_cfg: <credit_rate>2.8499999</credit_rate>. Yea.
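
The comparison above can be turned into a quick sanity check on the splitter block. The following Python sketch is not the logic in seti_header.cpp, only an illustration of the same idea: it flags the fields that are empty or zero in the bad header but populated in the good one. `splitter_problems` is a made-up name, and the two sample blocks are copied from the headers quoted above.

```python
import xml.etree.ElementTree as ET

BAD_SPLITTER_CFG = """
<splitter_cfg>
  <version>0</version>
  <data_type></data_type>
  <fft_len>0</fft_len>
  <ifft_len>0</ifft_len>
  <filter></filter>
  <window></window>
  <samples_per_wu>0</samples_per_wu>
  <highpass>0</highpass>
</splitter_cfg>
"""

GOOD_SPLITTER_CFG = """
<splitter_cfg>
  <version>0.200000003</version>
  <data_type>encoded</data_type>
  <fft_len>2048</fft_len>
  <ifft_len>8</ifft_len>
  <filter>fft</filter>
  <window>welsh</window>
</splitter_cfg>
"""


def splitter_problems(xml_text):
    """Return a list of splitter_cfg fields that are empty or zero."""
    root = ET.fromstring(xml_text.strip())
    problems = []
    # Text fields that are populated in the good header.
    for field in ("data_type", "filter", "window"):
        if not (root.findtext(field) or "").strip():
            problems.append(f"{field} is empty")
    # Numeric fields that are non-zero in the good header.
    for field in ("version", "fft_len", "ifft_len"):
        if float(root.findtext(field) or 0) == 0:
            problems.append(f"{field} is zero")
    return problems


print(splitter_problems(BAD_SPLITTER_CFG))
print(splitter_problems(GOOD_SPLITTER_CFG))
```

Only the splitter_cfg block is parsed here, because the receiver_cfg block as posted has unquoted attributes (length=99) that a strict XML parser would reject.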
ID: 723462
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.