Astropulse Beam 3 Polarity 1 Errors
Author | Message |
---|---|
perryjay Joined: 20 Aug 02 Posts: 3377 Credit: 20,676,751 RAC: 0 |
I looked at it a bit too quickly also. I saw it was 22fe09 but saw two in progress and four errored out. I didn't check what the errors were, just that it looked like the bad ones. Thought I had found another date with problems. Oh well, tomorrow morning it will be history. I may be slow but I turn in good work. :) PROUD MEMBER OF Team Starfire World BOINC |
W5GA, W5TAT, W8QR, K6XT Joined: 25 Sep 99 Posts: 42 Credit: 23,144,377 RAC: 6 |
I am experiencing computation errors on some AP workunits for several days. BOINC records about 2 seconds of work, then the WU stops with an error. I can't give a result file because none of these WU generate one. Most AP WU are processing normally. What I do see in stdoutdae.txt is this: 18-May-2009 05:39:33 [SETI@home] Output file ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0 for task ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3 absent And in client_state.xml, where I wondered if <status>-161</status> may have significance: <file_info> <name>ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0</name> <nbytes>0.000000</nbytes> <max_nbytes>655360.000000</max_nbytes> <generated_locally/> <status>-161</status> <upload_when_present/> <url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url> <signed_xml> <name>ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0</name> <generated_locally/> <upload_when_present/> <max_nbytes>655360</max_nbytes> <url>http://setiboincdata.ssl.berkeley.edu/sah_cgi/file_upload_handler</url> </signed_xml> <xml_signature> 4bf9800b860a2feab7eb934ffc11291e8877dd10a84eaa9c22638443496b7507 ed59748718ffb2a9fc02800cd9a9add912830774003895c864d8e5aa820c84e6 00f968b4edf6630b949624059c45d844d9c15bf488628825479c99f174d0c904 6430ec9bb6df04add443a781a11d08d5183c615b12429abcf847be3721e0f00d . </xml_signature> </file_info> I don't personally know whether either of these has significance. I see there is talk about work units containing B3_P1. Maybe this is an example? |
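For anyone who wants to scan client_state.xml for similar entries, here is a minimal Python sketch. It assumes a `<file_info>` block parses as ordinary XML once wrapped in a root element (the real file may not always be strictly well-formed), and `failed_files` is a hypothetical helper name, not part of BOINC. The fragment is trimmed from the post above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <file_info> block quoted in the post above.
FRAGMENT = """
<file_info>
    <name>ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3_0</name>
    <nbytes>0.000000</nbytes>
    <max_nbytes>655360.000000</max_nbytes>
    <generated_locally/>
    <status>-161</status>
    <upload_when_present/>
</file_info>
"""

def failed_files(xml_text):
    """Return (name, status) for every <file_info> with a negative status."""
    # Wrap in a dummy root so one or more sibling fragments parse as one document.
    root = ET.fromstring("<root>" + xml_text + "</root>")
    results = []
    for info in root.iter("file_info"):
        status = int(info.findtext("status", default="0"))
        if status < 0:
            results.append((info.findtext("name"), status))
    return results

for name, status in failed_files(FRAGMENT):
    print(name, status)
```

This only lists the entries; it does not interpret what a given status code means.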
Cosmic_Ocean Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13 |
W5GA, W5TAT, W8QR, K6XT wrote: "I am experiencing computation errors on some AP workunits for several days. BOINC records about 2 seconds of work, then the WU stops with an error. I can't give a result file because none of these WU generate one. Most AP WU are processing normally." Yes, that is a WU that fits the description. It doesn't actually get a chance to start crunching, and errors out before an output file is created, which is where the messages tab error comes from. There's really nothing you can do about these though. About all you can do is if you see a B3_P1 between 12mr09 and 07ap09, suspend the other tasks and get that one processed and out of the way. It only takes a few seconds and then it's gone. I burned through about a dozen of them the other night. Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) |
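The "B3_P1 between 12mr09 and 07ap09" check can be sketched in Python along these lines. The name layout is inferred from the task names quoted in this thread, the month-code table covers only the codes that appear here (fe, mr, ap), and `parse_ap_name` and `is_suspect` are hypothetical helpers. The date range is simply the one given in the post and is passed as a parameter so it can be adjusted.

```python
import re
from datetime import date

# Month codes seen in this thread only; other codes are deliberately not guessed.
MONTH_CODES = {"fe": 2, "mr": 3, "ap": 4}

# ap_<DD><mon><YY><xx>_B<beam>_P<polarity>_...
NAME_RE = re.compile(r"^ap_(\d{2})([a-z]{2})(\d{2})[a-z]{2}_B(\d)_P(\d)_")

def parse_ap_name(name):
    """Return (tape_date, beam, polarity), or None if the name is not recognised."""
    m = NAME_RE.match(name)
    if m is None or m.group(2) not in MONTH_CODES:
        return None
    day, mon, yy, beam, pol = m.groups()
    return date(2000 + int(yy), MONTH_CODES[mon], int(day)), int(beam), int(pol)

def is_suspect(name, first=date(2009, 3, 12), last=date(2009, 4, 7)):
    """True for a B3_P1 task whose tape date falls inside [first, last]."""
    parsed = parse_ap_name(name)
    if parsed is None:
        return False
    tape_date, beam, pol = parsed
    return beam == 3 and pol == 1 and first <= tape_date <= last

print(is_suspect("ap_02ap09ab_B3_P1_00042_20090504_15112.wu_3"))
```

Names that do not match the pattern are treated as not suspect rather than raising an error, so the check can be run over a mixed task list.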
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
Provisional evidence is that 06mr09ad is 'clean' - does not exhibit the B3_P1 error. That means that 04mr09ac (being split at the moment) should be clean too. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
"Provisional evidence is that 06mr09ad is 'clean' - does not exhibit the B3_P1 error." Yes, Joe had that one logged already - see message 896455. |
Fred W Joined: 13 Jun 99 Posts: 2524 Credit: 11,954,210 RAC: 0 |
"Provisional evidence is that 06mr09ad is 'clean' - does not exhibit the B3_P1 error." There we go - tail-end charlie again! F. |
Speedy Joined: 26 Jun 04 Posts: 1639 Credit: 12,921,799 RAC: 89 |
|
Fred J. Verster Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Hi, I looked at my task and error list and saw a few faulty AP WUs, amongst other errors. Here is the AP WU, and AP WU, AP WU, AP WU and AP WU. These are the latest I found, but the list goes back further. The last in the list (first): AP WU and AP WU. Here is an odd one, never seen before: Out of Memory!? AP result, from this AP WU. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"09mr09aa and 10mr09aa: These two tapes finished very fast. Did these tapes get removed from the splitting queue, or did these results error out and get returned extra fast?" The 'tapes' are about 1.5 hours long, so they give about 400 AP WUs per channel, for a total of around 5600. And because the ap_splitter processes all focus on the same 'tape', they only take an hour or two to finish splitting a 'tape'. The mb_splitters get over 220 thousand WUs from a 'tape', and try to have the processes working on different 'tapes'. With the Feeder apparently set to deliver the same number of each kind, it's no surprise the AP tasks also get sent quickly. The B3_P1 problem only affects that one channel, so over 92% of the AP WUs still take normal crunch times. I'm sure there are many 09mr09aa and 10mr09aa tasks still "In progress". Some are actually being crunched, maybe more are sitting in cache awaiting their turn. Joe |
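The figures in that post can be checked with a few lines of Python. This is only arithmetic on the approximate numbers quoted above (about 400 WUs per channel, about 5600 in total); the channel count is derived from those two figures rather than stated in the post.

```python
# Approximate figures quoted in the post above.
wus_per_channel = 400
total_wus = 5600

# Number of channels implied by those two figures.
channels = total_wus // wus_per_channel

# Only one channel (B3_P1) is affected, so the rest crunch normally.
unaffected_fraction = (channels - 1) / channels

print(channels)
print(f"{unaffected_fraction:.1%}")
```

With 14 channels implied, 13 of 14 are unaffected, which is where "over 92%" comes from.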
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
Most of the additional data Matt has pulled back from NERSC HPSS does not have the B3_P1 problem, but some does. Noting that the 08mr09aa 'tape' was only 8.78 GB so might have only 69 or 70 WUs per channel, I did some searching in the download fanout to find WUs to check directly. ap_07mr09aa_B3_P1_00234_20090520_31928.wu does not have the problem. The start time was March 7 at 01:27:26 a.m. Atlantic Standard Time. ap_08mr09aa_B3_P1_00049_20090521_22547.wu does have the problem. The start time was March 8 at 08:33:07 a.m. Atlantic Standard Time. My guess is that when 07mr09ab is split it will give good data from B3_P1; in either case it will help reduce uncertainty about when the problem began. Joe |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
ap_08mr09aa_B3_P1_00057_20090521_22547.wu does have the problem. Start time in data is 2454899.524249, converts to 9 Mar 2009 (?? sic) 00:34:54 UTC by onlineconversion. |
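The conversion can be reproduced offline with a short Python sketch. It assumes the start time in the data is a Julian Date in UTC, as the post implies; `jd_to_utc` is a hypothetical helper, and 2440587.5 is the Julian Date of the Unix epoch (1970-01-01 00:00:00 UTC).

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH_JD = 2440587.5  # Julian Date of 1970-01-01 00:00:00 UTC

def jd_to_utc(jd):
    """Convert a Julian Date (assumed UTC) to a timezone-aware datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=jd - UNIX_EPOCH_JD)

start = jd_to_utc(2454899.524249)
print(start.strftime("%d %b %Y %H:%M:%S UTC"))
```

This lands on 9 Mar 2009 shortly after 00:34 UTC, in agreement with the figure above to within about a second, so the 9 March date is what the data actually says once expressed in UTC.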
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"ap_08mr09aa_B3_P1_00057_20090521_22547.wu does have the problem." Atlantic Standard Time is 4 hours earlier, so 8 March at 8:34:54 p.m. (and I should have typed 8:33:07 p.m. for ap_08mr09aa_B3_P1_00049_20090521_22547.wu too). Joe |
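The offset step is simple enough to sketch in Python as well. This treats Atlantic Standard Time as a fixed UTC-4 offset, as the post does; `utc_to_ast` is a hypothetical helper and the input is the UTC time quoted earlier in the thread.

```python
from datetime import datetime, timedelta, timezone

# Atlantic Standard Time treated as a fixed UTC-4 offset (no daylight saving).
AST = timezone(timedelta(hours=-4), "AST")

def utc_to_ast(dt_utc):
    """Convert a timezone-aware UTC datetime to Atlantic Standard Time."""
    return dt_utc.astimezone(AST)

utc_start = datetime(2009, 3, 9, 0, 34, 54, tzinfo=timezone.utc)
local_start = utc_to_ast(utc_start)
print(local_start.isoformat())
```

The result falls on the evening of 8 March local time, which matches the 08mr09aa tape name even though the UTC date is 9 March.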
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
ap_07mr09ab_B3_P1_00097_20090521_02038.wu does not have the problem. Recorded March 7 2009 at 9:13:14 p.m. Atlantic Standard Time. Joe |
Pappa Joined: 9 Jan 00 Posts: 2562 Credit: 12,301,681 RAC: 0 |
Joe, as there appears to be no activity, I am going to unsticky the thread. If things warm up with the new disk images, let us know. Regards. Please consider a Donation to the Seti Project. |