AstroPulse errors - Reporting
Author | Message |
---|---|
Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
I have an AP work-unit I am crunching now that was estimated to take 82 hours to complete (and three more similar AP WUs to go). After about 60 hours of crunching, the number of hours to completion jumped from about 15 to 131 hours! I rebooted and let it run for 4 more hours, and the number of hours to completion has dropped by 4 hours (so 127 hours to go). Rather than waste more time crunching a work-unit that appears to be corrupt, I think I should simply delete this one (and my other AP work-units). Do any of the experts here have a view about that way of resolving this? |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
The experts might be able to tell you more about your situation if your PCs aren't hidden, so they can look at the tasks you have. |
Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
Sorry, I'm not going to un-hide my computers. But if it will help, I can copy/paste information about the problem work-units. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Every little bit of info has the potential to help. Also, knowing the type of system the APs are running on can give some here an idea of what the normal run time might be. |
web03 Joined: 13 Feb 01 Posts: 355 Credit: 719,156 RAC: 0 |
We really can't see anything proprietary about your machines if you unhide them. Feel free to click on my link to see what we can see if you unhide. It does help us out a bit. Wendy |
Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
Time to completion jumped on a single WU, on only one machine. No other WUs or machines have been affected so far. I need to know if this indicates a corrupt WU, in which case I will delete it and the three other AP WUs that came with it. I will not unhide my machines based on a general assertion that it may help, when there seems to be no logical relation between the problem and the machine information which SETI@home, properly, gave me the option to hide. |
web03 Joined: 13 Feb 01 Posts: 355 Credit: 719,156 RAC: 0 |
I doubt your work unit is corrupt. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
There's a known problem where the checkpoint file may be empty. In that case, when restarting from a checkpoint, the app tries several times to read the file but eventually gives up and starts processing at the beginning of the WU again. The app leaves clear evidence of those cases in stderr. Setiathome_enhanced can also do the same. The partial workaround is to avoid restarting from checkpoints as much as possible by having BOINC keep preempted work in memory, but of course if BOINC is shut down and restarted the app cannot be left in memory. IMO it is unlikely that the problem was a corrupted WU; the symptoms indicate a glitch in processing like the above. Joe |
Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
Thank you for the very helpful answer, Joe. Based on this information, I will kill the affected WU but keep the three other AP WUs and hope they are not corrupted. Would it help if I let someone know the ID of the affected WU? |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
From what you've described, and from what others here have told you, there doesn't seem to be anything wrong with the WU, so there doesn't seem to be a need to abort it if there's still time for it to complete. |
Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
There is time to complete it, but that would mean not being able to complete at least one, and maybe two, of the other AP WUs. I have spent enough time on this one, and there is no way of knowing whether the problem will recur with this WU, which, even if it is not technically corrupted, has acted irregularly. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
What is the progress % on the affected WU, and what messages, if any, are you getting in the Manager regarding it? |
Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
It says 16% done, but I don't have any messages relating to this, since I turn this machine (a laptop) off whenever I move it. |
Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Well, 16% done in this AP WU seems better to me than 0% when starting another. At some point did it go back to 0% like Joe described? How many CPU minutes does the Manager show that it's invested in this WU? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
You do have messages - a full archive is kept in stdoutdae.txt (root of BOINC folder tree). It recycles to stdoutdae.old when it's full (at 2MB). |
Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
I found such a file, but it had messages from 2006 (I did a full search for file names beginning with "stdout"). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
Yes, that's normal if you don't crunch very much. The ones on this box start with `2007-08-27 23:32:17 [SETI@home] [file_xfer] Started upload of file 11fe07ad.17166.1294.12.5.101_1_0` and end with `2008-09-26 00:45:28 [SETI@home] Resuming task ap_15au08aa_B5_P1_00156_20080916_20940.wu_1 using astropulse version 435`. The clues to your problem may be in between, or they may, as Joe says, be in the stderr.txt uploaded when the task completes. |
Matthias Lehmkuhl Joined: 5 Oct 99 Posts: 28 Credit: 10,832,348 RAC: 53 |
I crashed one AP result because the astropulse 4.36 app was downloaded incompletely. I have now downloaded the file again with the right size. resultid=1000975128 Exit status -185 (0xffffffffffffff47) edit: using app_info.xml Matthias |
HFB1217 Joined: 25 Dec 05 Posts: 102 Credit: 9,424,572 RAC: 0 |
My Quad6600 overclocked to 3.5 GHz has errored out with AP 4.35 work units, and the resulting message is: `<core_client_version>6.2.19</core_client_version> <![CDATA[ <message> CreateProcess() failed - </message> ]]>` This has happened with more than one AP unit, so I have set this system not to receive AP work units. All the other types of WUs complete without errors. Come and Visit Us at BBR TeamStarFire ****My 9th year of Seti****A Founding Member of the Original Seti Team Starfire at Broadband Reports.com **** |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14687 Credit: 200,643,578 RAC: 874 |
Your task list shows that you use an optimised app for the ordinary SETI tasks, so I guess you downloaded the AP files manually and tweaked your app_info.xml file. You can get this error message if the AP executable files you downloaded are incomplete. If you try again, check the file sizes of the AP files you download - they're posted in the AP FAQ thread (there's no significant difference between v4.35 and v4.36). |
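
A minimal sketch of three checks suggested in the thread above: searching a message log such as stdoutdae.txt for lines about one task, reproducing the hex form of the exit status -185, and comparing downloaded file sizes against expected values. The function names, the line pattern (inferred only from the two sample log lines quoted above), and any paths or sizes passed in are assumptions for illustration; none of this is part of BOINC itself.

```python
import os
import re
from datetime import datetime

# Assumed line shape, inferred from the two samples quoted in the thread:
# "2008-09-26 00:45:28 [SETI@home] Resuming task ..."
LOG_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")


def find_task_messages(lines, task_name):
    """Return (timestamp, project, message) for log lines that mention task_name."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and task_name in match.group(3):
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            hits.append((stamp, match.group(2), match.group(3)))
    return hits


def exit_status_hex(status, bits=64):
    """Render a signed exit status as unsigned two's-complement hex."""
    return hex(status & ((1 << bits) - 1))


def size_mismatches(expected_sizes):
    """Return {path: actual size, or None if missing} for files that differ.

    expected_sizes is a hypothetical {path: size_in_bytes} mapping that the
    caller fills in from whatever reference lists the correct sizes.
    """
    bad = {}
    for path, expected in expected_sizes.items():
        actual = os.path.getsize(path) if os.path.exists(path) else None
        if actual != expected:
            bad[path] = actual
    return bad


if __name__ == "__main__":
    sample = [
        "2007-08-27 23:32:17 [SETI@home] [file_xfer] Started upload of file "
        "11fe07ad.17166.1294.12.5.101_1_0",
        "2008-09-26 00:45:28 [SETI@home] Resuming task "
        "ap_15au08aa_B5_P1_00156_20080916_20940.wu_1 using astropulse version 435",
    ]
    for stamp, project, message in find_task_messages(sample, "ap_15au08aa"):
        print(stamp, project, message)
    print(exit_status_hex(-185))  # 0xffffffffffffff47, as in the post above
```

To run it against a real log, read the file with `open(path).read().splitlines()` and pass the result to `find_task_messages` along with part of the task name. The file-size check only reports differences; the expected sizes still have to come from the AP FAQ thread mentioned above.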
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.