Astropulse Beam 3 Polarity 1 Errors

perryjay
Volunteer tester
Message 896420 - Posted: 18 May 2009, 14:21:03 UTC - in response to Message 896021.  

I looked at it a bit too quickly also. I saw it was 22fe09, but saw two in progress and four errored out. I didn't check what the errors were, just that it looked like the bad ones. I thought I had found another date with problems. Oh well, tomorrow morning it will be history. I may be slow, but I turn in good work. :)


PROUD MEMBER OF Team Starfire World BOINC
W5GA, W5TAT, W8QR, K6XT

Message 896488 - Posted: 18 May 2009, 16:39:09 UTC

I have been experiencing computation errors on some AP workunits for several days. BOINC records about 2 seconds of work, then the WU stops with an error. I can't provide a result file because none of these WUs generates one. Most AP WUs are processing normally.

What I do see is this in stdoutdae.txt:

18-May-2009 05:39:33 [SETI@home] Output file ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0 for task ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3 absent

And this is in client_state.xml, where I wondered if <status>-161</status> may have significance:

<file_info>
<name>ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0</name>
<nbytes>0.000000</nbytes>
<max_nbytes>655360.000000</max_nbytes>
<generated_locally/>
<status>-161</status>
<upload_when_present/>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
<signed_xml>
<name>ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0</name>
<generated_locally/>
<upload_when_present/>
<max_nbytes>655360</max_nbytes>
<url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url>
</signed_xml>
<xml_signature>
4bf9800b860a2feab7eb934ffc11291e8877dd10a84eaa9c22638443496b7507
ed59748718ffb2a9fc02800cd9a9add912830774003895c864d8e5aa820c84e6
00f968b4edf6630b949624059c45d844d9c15bf488628825479c99f174d0c904
6430ec9bb6df04add443a781a11d08d5183c615b12429abcf847be3721e0f00d
.
</xml_signature>
</file_info>
I don't personally know whether either of these has significance. I see there is talk about work units containing B3_P1. Maybe this is an example?
Cosmic_Ocean
Message 896816 - Posted: 19 May 2009, 2:37:19 UTC - in response to Message 896488.  

I have been experiencing computation errors on some AP workunits for several days. BOINC records about 2 seconds of work, then the WU stops with an error. I can't provide a result file because none of these WUs generates one. Most AP WUs are processing normally.

What I do see is this in stdoutdae.txt:

18-May-2009 05:39:33 [SETI@home] Output file ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0 for task ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3 absent

[client_state.xml extract snipped]

I don't personally know whether either of these has significance. I see there is talk about work units containing B3_P1. Maybe this is an example?

Yes, that is a WU that fits the description. It doesn't actually get a chance to start crunching; it errors out before an output file is created, which is where the error on the Messages tab comes from.

There's really nothing you can do about these, though. About all you can do is, if you see a B3_P1 task from a 'tape' between 12mr09 and 07ap09, suspend the other tasks and get it processed and out of the way. It only takes a few seconds and then it's gone. I burned through about a dozen of them the other night.
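
Incidentally, a quick way to spot these on your own machine is to scan client_state.xml for <file_info> entries with a nonzero <status>. 0 means OK; unless I'm misremembering, -161 is ERR_NOT_FOUND in BOINC's lib/error_numbers.h, i.e. the output file was never created, which squares with the "absent" message above. A minimal Python sketch, assuming the file parses as well-formed XML:

import xml.etree.ElementTree as ET

def find_bad_files(path="client_state.xml"):
    # Print each file_info entry whose <status> is nonzero
    # (0 means OK; negative values are BOINC error codes, e.g. -161).
    root = ET.parse(path).getroot()
    for fi in root.iter("file_info"):
        status = int(float(fi.findtext("status", default="0")))
        if status != 0:
            print(fi.findtext("name", default="?"), "-> status", status)

find_bad_files()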
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving up)
Richard Haselgrove
Volunteer tester
Message 896905 - Posted: 19 May 2009, 9:45:17 UTC

Provisional evidence is that 06mr09ad is 'clean' - does not exhibit the B3_P1 error.

That means that 04mr09ac (being split at the moment) should be clean too.
Fred W
Volunteer tester
Message 896993 - Posted: 19 May 2009, 15:38:31 UTC - in response to Message 896905.  

Provisional evidence is that 06mr09ad is 'clean' - does not exhibit the B3_P1 error.

That means that 04mr09ac (being split at the moment) should be clean too.

Looks like 09mr09aa and 10mr09aa are infected though.

F.
Richard Haselgrove
Volunteer tester
Message 897005 - Posted: 19 May 2009, 16:00:09 UTC - in response to Message 896993.  

Provisional evidence is that 06mr09ad is 'clean' - does not exhibit the B3_P1 error.

That means that 04mr09ac (being split at the moment) should be clean too.

Looks like 09mr09aa and 10mr09aa are infected though.

F.

Yes, Joe had that one logged already - see message 896455.
Fred W
Volunteer tester
Message 897013 - Posted: 19 May 2009, 16:13:47 UTC - in response to Message 897005.  

Provisional evidence is that 06mr09ad is 'clean' - does not exhibit the B3_P1 error.

That means that 04mr09ac (being split at the moment) should be clean too.

Looks like 09mr09aa and 10mr09aa are infected though.

F.

Yes, Joe had that one logged already - see message 896455.

There we go - tail-end charlie again!

F.
Speedy
Volunteer tester
Message 897202 - Posted: 20 May 2009, 9:04:16 UTC - in response to Message 896993.  

These two tapes, 09mr09aa and 10mr09aa, finished very fast. Did they get removed from the splitting queue, or did the results error out and get returned extra fast?
Fred J. Verster
Volunteer tester
Message 897222 - Posted: 20 May 2009, 11:54:39 UTC - in response to Message 897202.  
Last modified: 20 May 2009, 12:48:20 UTC

Hi, I looked at my task & error list and saw a few faulty AP WUs, amongst other errors; here is the AP WU.
And AP WU, AP WU, AP WU and AP WU.
These are the latest I found, but the list goes back further.
The last in the list (the earliest): AP WU and AP WU.
And here's an ODD one, never seen before! Out Of Memory!?
AP result, from this AP WU.
Josef W. Segur
Volunteer developer
Volunteer tester
Message 897244 - Posted: 20 May 2009, 14:17:30 UTC - in response to Message 897202.  

These two tapes, 09mr09aa and 10mr09aa, finished very fast. Did they get removed from the splitting queue, or did the results error out and get returned extra fast?

The 'tapes' are about 1.5 hours long, so they give about 400 AP WUs per channel, around 5600 in total. And because the ap_splitter processes all focus on the same 'tape', they take only an hour or two to finish splitting it. The mb_splitters get over 220 thousand WUs from a 'tape', and try to have each process working on a different 'tape'. With the feeder apparently set to deliver the same number of each kind, it's no surprise the AP tasks also get sent out quickly.

The B3_P1 problem only affects that one channel, so over 92% of the AP WUs still take normal crunch times. I'm sure there are many 09mr09aa and 10mr09aa tasks still "In progress" - some actually being crunched, maybe more sitting in caches awaiting their turn.
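
The arithmetic, for anyone checking (this assumes the 14 channels are ALFA's 7 beams times 2 polarities, with B3_P1 being the single bad channel):

channels = 7 * 2                    # ALFA: 7 beams x 2 polarities
wus_per_channel = 400               # from a ~1.5 hour 'tape'
print(channels * wus_per_channel)   # 5600 AP WUs per 'tape'
print(1 - 1 / channels)             # ~0.929, i.e. over 92% unaffected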
                                                                 Joe
Josef W. Segur
Volunteer developer
Volunteer tester
Message 897835 - Posted: 21 May 2009, 18:34:35 UTC

Most of the additional data Matt has pulled back from the NERSC HPSS does not have the B3_P1 problem, but some does. Noting that the 08mr09aa 'tape' was only 8.78 GB, so it might have only 69 or 70 WUs per channel, I did some searching in the download fanout to find WUs to check directly.

ap_07mr09aa_B3_P1_00234_20090520_31928.wu does not have the problem. The start time was March 7 at 01:27:26 a.m. Atlantic Standard Time.
ap_08mr09aa_B3_P1_00049_20090521_22547.wu does have the problem. The start time was March 8 at 08:33:07 a.m. Atlantic Standard Time.

My guess is that when 07mr09ab is split it will give good data from B3_P1; either way, it will help reduce uncertainty about when the problem began.
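
The 69-70 estimate is just linear scaling from a full-size 'tape'; a sketch, assuming a full 'tape' runs about 50.2 GB for the ~400 WUs per channel mentioned earlier:

full_tape_gb = 50.2                      # assumed size of a full 'tape'
wus_per_channel_full = 400               # per channel on a full 'tape'
print(round(wus_per_channel_full * 8.78 / full_tape_gb))   # ~70 for 08mr09aa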
                                                               Joe
Richard Haselgrove
Volunteer tester
Message 897890 - Posted: 21 May 2009, 20:58:23 UTC - in response to Message 897835.  

ap_08mr09aa_B3_P1_00057_20090521_22547.wu does have the problem.

Start time in the data is 2454899.524249, which converts to 9 Mar 2009 (??) 00:34:54 UTC by onlineconversion.
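
If anyone wants to repeat the conversion without onlineconversion: the Julian date of the Unix epoch (1 Jan 1970, 00:00 UTC) is 2440587.5, so in Python:

from datetime import datetime, timedelta

def jd_to_utc(jd):
    # JD 2440587.5 is the Unix epoch, so the offset gives days since 1970.
    return datetime(1970, 1, 1) + timedelta(days=jd - 2440587.5)

print(jd_to_utc(2454899.524249))   # 2009-03-09 00:34:55, within a second of the above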
Josef W. Segur
Volunteer developer
Volunteer tester
Message 897899 - Posted: 21 May 2009, 21:35:23 UTC - in response to Message 897890.  

ap_08mr09aa_B3_P1_00057_20090521_22547.wu does have the problem.

Start time in the data is 2454899.524249, which converts to 9 Mar 2009 (??) 00:34:54 UTC by onlineconversion.

Atlantic Standard Time is 4 hours earlier, so that's 8 March at 8:34:54 p.m. (and I should have typed 8:33:07 p.m. for ap_08mr09aa_B3_P1_00049_20090521_22547.wu too).
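
Or, reusing the epoch arithmetic from the sketch above and applying the UTC-4 offset:

from datetime import datetime, timedelta
utc = datetime(1970, 1, 1) + timedelta(days=2454899.524249 - 2440587.5)
print(utc - timedelta(hours=4))    # 2009-03-08 20:34:55, i.e. 8:34 p.m. AST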
                                                                Joe
Josef W. Segur
Volunteer developer
Volunteer tester
Message 898253 - Posted: 22 May 2009, 15:53:47 UTC

ap_07mr09ab_B3_P1_00097_20090521_02038.wu does not have the problem. Recorded March 7, 2009 at 9:13:14 p.m. Atlantic Standard Time.
                                                                Joe
Pappa
Volunteer tester
Message 910614 - Posted: 24 Jun 2009, 1:33:43 UTC

Joe,

As there appears to be no activity, I am going to unsticky the thread. If things warm up with the new disk images, let us know.

Regards

Please consider a Donation to the Seti Project.
