Computation Error - Bad Workunit Header

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 724341 - Posted: 10 Mar 2008, 22:59:56 UTC - in response to Message 723371.  

Just had 68 go through with another load still downloading. I was hoping to fill the cache over the weekend as well; never mind.


Had a batch of 13feb08.24787 and others from 13feb08; see my post from a few days ago (earlier in this thread).

Didn't see any computing errors so far; most of them are still waiting for upload, as their deadline isn't until 30 March 2008.

Almost all of them have triplets, with triplet power up to 11.

How do you 'display' such images/triplets? I've taken screenshots with Print Screen and saved parts of the images as *.png files. Can I upload them, or do I have to 'own' a URL to use them?

I use imageshack.

Free and convenient.

F.


Thanks, Fred, for the idea; I didn't know about it.
Just signed up.

ID: 724341
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 724382 - Posted: 10 Mar 2008, 23:54:42 UTC - in response to Message 723239.  
Last modified: 11 Mar 2008, 0:05:43 UTC

This definitely sounds like splitter problems. I received a bunch of errors like this on several occasions. One that I got to check on had a bunch of WUs sent out with the binary data intact but just no header in it. On another occasion I received a bunch of files that were zero length! They didn't contain any header or data. On both occasions it was determined that a splitter had acted up and created the defective files.

ERIC! Looks like you need to use the ole boot on one of the splitters again!

I certainly hope so. I'm getting them on all 3 of my PCs, and they are as follows:

5 x 13fe08ac.6464
4 x 13fe08ac.8515
2 x 13fe08ac.23325
4 x 13fe08ac.24787

And so far that is 15 WUs in total that have errored on me. :( I'm glad it's not my end that's causing the errors (and I thought it was), but at the same time I'm sad to see the splitters spitting out such WUs.

Edit: I've had 3 more just in the last few minutes.

1 x 13fe08ac.23325 (PC3)
2 x 13fe08ac.24787 (PC2)
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 724382
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 724396 - Posted: 11 Mar 2008, 0:23:34 UTC
Last modified: 11 Mar 2008, 0:23:57 UTC

OK, here's the sitch.

I have just picked a task from 13fe08ad.

So far I have no reason to suspect it other than it is from the same day as the last batch of failed test data.

Anybody else have some from this set to comment on yet?

Alinator
ID: 724396
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 724400 - Posted: 11 Mar 2008, 0:32:54 UTC - in response to Message 724396.  

OK, here's the sitch.

I have just picked a task from 13fe08ad.

So far I have no reason to suspect it other than it is from the same day as the last batch of failed test data.

Anybody else have some from this set to comment on yet?

Alinator

Last reference I saw was Richard's post here

F.
ID: 724400
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 724401 - Posted: 11 Mar 2008, 0:32:59 UTC - in response to Message 724396.  

OK, here's the sitch.

I have just picked a task from 13fe08ad.

So far I have no reason to suspect it other than it is from the same day as the last batch of failed test data.

Anybody else have some from this set to comment on yet?

Alinator

No, none here.

How does the header compare with my message 723462 below? Red or green?
ID: 724401
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 724402 - Posted: 11 Mar 2008, 0:33:49 UTC - in response to Message 724396.  

OK, here's the sitch.

I have just picked a task from 13fe08ad.

So far I have no reason to suspect it other than it is from the same day as the last batch of failed test data.

Anybody else have some from this set to comment on yet?

Alinator


It should be OK. Both Eric and Matt have indicated that it was a bad splitter, not bad "tape".

The Server status page shows that 13fe08ad and 13fe08ae are currently being split, and that one splitter, mb_splitter5, is currently disabled.
Sir Arthur C Clarke 1917-2008
ID: 724402
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 724406 - Posted: 11 Mar 2008, 0:40:49 UTC - in response to Message 724401.  
Last modified: 11 Mar 2008, 0:43:15 UTC

OK, here's the sitch.

I have just picked a task from 13fe08ad.

So far I have no reason to suspect it other than it is from the same day as the last batch of failed test data.

Anybody else have some from this set to comment on yet?

Alinator

No, none here.

How does the header compare with my message 723462 below? Red or green?


Can't tell right now. It's on one of my remote hosts and I don't have it set up so I can look at the input files remotely. I'll check it tomorrow.

@ Keith: OK, that makes sense now that I think about it in big picture terms.

I was always going on the idea that it was more a larger-scale test of the radar blanking technology rather than just a 'bad' splitter per se. They just chose a 2008 data set to make it easier to tell the good, from the bad, from the ugly, so to speak! :-)

Alinator
ID: 724406
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 724414 - Posted: 11 Mar 2008, 0:59:19 UTC

Well... 2 of my 36 13fe08ac's have crunched normally. I have done a text search and not found any with empty <data_type>, <window> or <filter> tags so I am going to let them crunch through and see what happens.

Bit of a negative test on Richard's hypothesis but I suppose any additional data may help (and if one of them should crash and burn, then ...)

F.
ID: 724414
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 724417 - Posted: 11 Mar 2008, 1:02:15 UTC - in response to Message 724414.  

Well... 2 of my 36 13fe08ac's have crunched normally. I have done a text search and not found any with empty <data_type>, <window> or <filter> tags so I am going to let them crunch through and see what happens.

Bit of a negative test on Richard's hypothesis but I suppose any additional data may help (and if one of them should crash and burn, then ...)

F.

Were they split (WU created) before or after Eric's "I'm on the case" message?
ID: 724417
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 724419 - Posted: 11 Mar 2008, 1:08:17 UTC - in response to Message 724417.  

Well... 2 of my 36 13fe08ac's have crunched normally. I have done a text search and not found any with empty <data_type>, <window> or <filter> tags so I am going to let them crunch through and see what happens.

Bit of a negative test on Richard's hypothesis but I suppose any additional data may help (and if one of them should crash and burn, then ...)

F.

Were they split (WU created) before or after Eric's "I'm on the case" message?

The oldest one was sent out on 6th March so that would make it "before".

F.
ID: 724419
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 724420 - Posted: 11 Mar 2008, 1:12:08 UTC - in response to Message 724419.  

Well... 2 of my 36 13fe08ac's have crunched normally. I have done a text search and not found any with empty <data_type>, <window> or <filter> tags so I am going to let them crunch through and see what happens.

Bit of a negative test on Richard's hypothesis but I suppose any additional data may help (and if one of them should crash and burn, then ...)

F.

Were they split (WU created) before or after Eric's "I'm on the case" message?

The oldest one was sent out on 6th March so that would make it "before".

F.

Ah. Not only 'before', but 'much before'. See message 724253 in the 'other' thread:
Here's a funny thing. My linux box here has one of the 13fe08ac WUs, and it was crunching it at lunchtime when I checked it. When I checked the wingman, he had completed it and was also running linux. When I get home this evening I will check to see if it completed.

I had one from that tape issued on 3rd March, returned on 4th March, which crunched OK. You've got some issued on the 6th - I suspect that they didn't make the final, fatal, modification to the test splitter until the 7th.

ID: 724420
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 724438 - Posted: 11 Mar 2008, 1:53:17 UTC - in response to Message 724420.  

Well... 2 of my 36 13fe08ac's have crunched normally. I have done a text search and not found any with empty <data_type>, <window> or <filter> tags so I am going to let them crunch through and see what happens.

Bit of a negative test on Richard's hypothesis but I suppose any additional data may help (and if one of them should crash and burn, then ...)

F.

Were they split (WU created) before or after Eric's "I'm on the case" message?

The oldest one was sent out on 6th March so that would make it "before".

F.

Ah. Not only 'before', but 'much before'. See message 724253 in the 'other' thread:
Here's a funny thing. My linux box here has one of the 13fe08ac WUs, and it was crunching it at lunchtime when I checked it. When I checked the wingman, he had completed it and was also running linux. When I get home this evening I will check to see if it completed.

I had one from that tape issued on 3rd March, returned on 4th March, which crunched OK. You've got some issued on the 6th - I suspect that they didn't make the final, fatal, modification to the test splitter until the 7th.


Right; well my text search obviously failed miserably.

I have now located a good half dozen that were issued to me in the early hours (UTC) of the 9th and all have empty <data_type>, <window> and <filter> tags. I'm now debating whether to try to find them in my cache and abort them - but I think that can wait until the morning!!!

F.
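
For anyone wanting to repeat this kind of search across a cache, here is a minimal sketch in Python. It is only an illustration under assumptions: it assumes the workunit header is plain text at the start of each file and that the three tags named above appear as ordinary open/close pairs. The function names, the 64 KB read size and the directory argument are hypothetical and not taken from any SETI@home or BOINC tool.

```python
import re
from pathlib import Path

# Tag names taken from the post above; the exact header layout is an assumption.
SUSPECT_TAGS = ("data_type", "window", "filter")


def empty_tags(header_text, tags=SUSPECT_TAGS):
    """Return the tags that are present in the text but have no content."""
    found = []
    for tag in tags:
        # Matches <tag></tag> (whitespace allowed inside) or a self-closing <tag/>.
        pattern = rf"<{tag}>\s*</{tag}>|<{tag}\s*/>"
        if re.search(pattern, header_text):
            found.append(tag)
    return found


def scan_directory(project_dir, header_bytes=65536):
    """Yield (path, empty_tag_list) for each file whose header has empty tags.

    Only the first header_bytes of each file are read, on the assumption
    that the text header precedes the binary data.
    """
    for path in sorted(Path(project_dir).iterdir()):
        if not path.is_file():
            continue
        with open(path, "rb") as fh:
            head = fh.read(header_bytes).decode("ascii", errors="replace")
        bad = empty_tags(head)
        if bad:
            yield path, bad
```

Calling `list(scan_directory(some_dir))` on the project's data directory would list the suspect files, which could then be matched by name against the tasks shown in BOINC Manager before deciding whether to abort them.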
ID: 724438
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 724488 - Posted: 11 Mar 2008, 4:20:10 UTC

I just reported the one I had on my machine. My wingman and I have both finished with it but the bad thing I noticed is that the silly thing has been sent out to two more people. Too bad there isn't a way to catch these before they get reissued. Looks like these things are going to be causing headaches for some time to come before we see the last of them. :(


PROUD MEMBER OF Team Starfire World BOINC
ID: 724488
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 724493 - Posted: 11 Mar 2008, 4:43:58 UTC - in response to Message 724488.  
Last modified: 11 Mar 2008, 4:44:54 UTC

I just reported the one I had on my machine. My wingman and I have both finished with it but the bad thing I noticed is that the silly thing has been sent out to two more people. Too bad there isn't a way to catch these before they get reissued. Looks like these things are going to be causing headaches for some time to come before we see the last of them. :(


I found this one which hasn't run yet on one of your hosts. You might want to give it the boot manually.

Alinator
ID: 724493
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 724498 - Posted: 11 Mar 2008, 5:00:43 UTC - in response to Message 724493.  



I found this one which hasn't run yet on one of your hosts. You might want to give it the boot manually.

Alinator


Thanks for the heads up Alinator, I missed that one. It's gone now. :)



PROUD MEMBER OF Team Starfire World BOINC
ID: 724498
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65745
Credit: 55,293,173
RAC: 49
United States
Message 724505 - Posted: 11 Mar 2008, 5:30:02 UTC - in response to Message 724438.  

Well... 2 of my 36 13fe08ac's have crunched normally. I have done a text search and not found any with empty <data_type>, <window> or <filter> tags so I am going to let them crunch through and see what happens.

Bit of a negative test on Richard's hypothesis but I suppose any additional data may help (and if one of them should crash and burn, then ...)

F.

Were they split (WU created) before or after Eric's "I'm on the case" message?

The oldest one was sent out on 6th March so that would make it "before".

F.

Ah. Not only 'before', but 'much before'. See message 724253 in the 'other' thread:
Here's a funny thing. My linux box here has one of the 13fe08ac WUs, and it was crunching it at lunchtime when I checked it. When I checked the wingman, he had completed it and was also running linux. When I get home this evening I will check to see if it completed.

I had one from that tape issued on 3rd March, returned on 4th March, which crunched OK. You've got some issued on the 6th - I suspect that they didn't make the final, fatal, modification to the test splitter until the 7th.


Right; well my text search obviously failed miserably.

I have now located a good half dozen that were issued to me in the early hours (UTC) of the 9th and all have empty <data_type>, <window> and <filter> tags. I'm now debating whether to try to find them in my cache and abort them - but I think that can wait until the morning!!!

F.

I aborted every last one that I found. I just wish somebody could get rid of them before they get out, and send them to the loony bin where they belong, :D as they are annoyingly empty.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 724505
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 724583 - Posted: 11 Mar 2008, 11:02:58 UTC - in response to Message 724505.  

Well... 2 of my 36 13fe08ac's have crunched normally. I have done a text search and not found any with empty <data_type>, <window> or <filter> tags so I am going to let them crunch through and see what happens.

Bit of a negative test on Richard's hypothesis but I suppose any additional data may help (and if one of them should crash and burn, then ...)

F.

Were they split (WU created) before or after Eric's "I'm on the case" message?

The oldest one was sent out on 6th March so that would make it "before".

F.

Ah. Not only 'before', but 'much before'. See message 724253 in the 'other' thread:
Here's a funny thing. My linux box here has one of the 13fe08ac WUs, and it was crunching it at lunchtime when I checked it. When I checked the wingman, he had completed it and was also running linux. When I get home this evening I will check to see if it completed.

I had one from that tape issued on 3rd March, returned on 4th March, which crunched OK. You've got some issued on the 6th - I suspect that they didn't make the final, fatal, modification to the test splitter until the 7th.


Right; well my text search obviously failed miserably.

I have now located a good half dozen that were issued to me in the early hours (UTC) of the 9th and all have empty <data_type>, <window> and <filter> tags. I'm now debating whether to try to find them in my cache and abort them - but I think that can wait until the morning!!!

F.

I aborted every last one that I found. I just wish somebody could get rid of them before they get out, and send them to the loony bin where they belong, :D as they are annoyingly empty.


But why abort them? BOINC can do this job as well.

ID: 724583
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 724584 - Posted: 11 Mar 2008, 11:05:35 UTC

I thought that bad WUs got cancelled after 5 errors, but I've just checked my only one of these and it has now reached 5 errors with a new Unsent generated. http://setiathome.berkeley.edu/workunit.php?wuid=234337343
Sir Arthur C Clarke 1917-2008
ID: 724584
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19059
Credit: 40,757,560
RAC: 67
United Kingdom
Message 724589 - Posted: 11 Mar 2008, 11:21:40 UTC - in response to Message 724584.  

I thought that bad WUs got cancelled after 5 errors, but I've just checked my only one of these and it has now reached 5 errors with a new Unsent generated. http://setiathome.berkeley.edu/workunit.php?wuid=234337343

The unsent one was generated before the last one reported in. This would be normal operation, because before that BOINC would still be trying to get two units to form a quorum.
It should now mark the unsent unit as not needed.
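
The sequence described here can be sketched as a toy model in Python. This is not BOINC server code: the two thresholds are simply the numbers quoted in this thread (two units for a quorum, cancellation after 5 errors), and the function names and outcome labels are made up for illustration.

```python
# Toy model only. Thresholds are taken from the posts in this thread,
# not from the BOINC source or project configuration.
TARGET_RESULTS = 2  # "two units to form a quorum"
MAX_ERRORS = 5      # "cancelled after 5 errors"


def new_results_needed(outcomes):
    """How many extra copies would be generated for one workunit.

    outcomes is a list of strings: 'ok', 'error', 'in_progress' or 'unsent'.
    """
    if outcomes.count("error") >= MAX_ERRORS:
        return 0  # workunit is given up on, nothing more is generated
    live = sum(o != "error" for o in outcomes)  # copies that may still count
    return max(0, TARGET_RESULTS - live)


def unsent_still_needed(outcomes):
    """An unsent copy only matters while the error limit has not been hit."""
    return outcomes.count("error") < MAX_ERRORS


# Four errors in, one copy still out in the field: one replacement is created.
before_last_report = ["error"] * 4 + ["in_progress"]
# The last copy then errors too; the replacement is left over as 'unsent'.
after_last_report = ["error"] * 5 + ["unsent"]
```

In this model `new_results_needed(before_last_report)` is 1, which is where the extra Unsent comes from, and `unsent_still_needed(after_last_report)` is `False`, matching the expectation that it gets marked as not needed.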
ID: 724589
Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 724593 - Posted: 11 Mar 2008, 11:34:15 UTC - in response to Message 724583.  


But why abort them? BOINC can do this job as well.


If I abort them now, then BOINC will re-send them to another sap immediately, so they will reach their 5 failures and be deleted from the system sooner. Otherwise, they would be hanging around in my cache for the next 3 or 4 days before being crunched, producing their errors, and then being sent out again.

Just trying to speed the process up a little.

F.
ID: 724593


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.