AstroPulse errors - Reporting
Author | Message |
---|---|
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
I have an AP work-unit I am crunching now that was estimated to take 82 hours to complete (and three more similar AP WU's to go). After about 60 hours of crunching, the number of hours to completion jumped from about 15, to 131 hours! I re-booted and let it run for 4 more hours and the number of hours to completion has dropped 4 hours (so 127 hours to go). Is this WU corrupt? It is due to be completed by 10/8/08, as are the other 3 similar WU's. If I let this one go for 127 hours, about 5 more days, I will not be able to complete all the other AP WU's . . . especially if their completion times jump the same as this one did. Any ideas or advice? |
Zerofool Joined: 22 Apr 03 Posts: 4 Credit: 3,803,896 RAC: 0 |
I also have AP units with 0 granted credit. Here they are: Task ID - 996062921 Work unit ID - 335836004 and Task ID - 975422341 Work unit ID - 310380298 P.S.: Apparently this second (older) unit just got deleted :( And here's my results page if it's needed. |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
Rather than waste more time crunching a work-unit that appears to be corrupt, I think I should simply delete this (and my other AP work-units). Do any of the experts here have a view about that way of resolving this? |
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
The experts might be able to tell you more about your situation if your PCs weren't hidden, so they could look at the tasks you have. |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
Sorry, I'm not going to un-hide my computers. But if it will help, I can copy/paste information about the problem work-units. |
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Every little bit of info has the potential to help. Knowing the type of system the APs are running on can also give some here an idea of what the normal run time might be. |
web03 Joined: 13 Feb 01 Posts: 355 Credit: 719,156 RAC: 0 |
We really can't see anything proprietary about your machines if you unhide them. Feel free to click on my link to see what we can see if you unhide. It does help us out a bit. Wendy |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
Time to completion jumped on a single WU, on only one machine. No other WU's or machines have been affected, so far. I need to know if this indicates a corrupt WU, in which case I will delete it and the three other AP WU's that came with it. I will not unhide my machines based on a general assertion that it may help, when there seems to be no logical relation between the problem and the machine information which SETI@home, properly, gave me the option to hide. |
web03 Joined: 13 Feb 01 Posts: 355 Credit: 719,156 RAC: 0 |
I doubt your work unit is corrupt. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
... There's a known problem where the checkpoint file may be empty. In that case, when restarting from a checkpoint, the app tries several times to read the file but eventually gives up and starts processing at the beginning of the WU again. The app leaves clear evidence of those cases in stderr. Setiathome_enhanced can do the same. The partial workaround is to avoid restarting from checkpoints as much as possible by having BOINC keep preempted work in memory, but of course if BOINC is shut down and restarted the app cannot be left in memory. IMO it is unlikely that the problem was a corrupted WU; the symptoms indicate a glitch in processing like the above. Joe |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
... Thank you for the very helpful answer, Joe. Based on this information, I will kill the affected WU but keep the three other AP WU's and hope they are not corrupted. Would it help if I let someone know the ID of the affected WU? |
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
... From what you've described, and from what others here have told you, there doesn't seem to be anything wrong with the WU, so there doesn't seem to be a need to abort it if there's still time for it to complete. |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
There is time to complete it, but that would mean not being able to complete at least one, and maybe two, of the other AP WU's. I have spent enough time on this one, and there is no way of knowing whether this problem will recur with this WU, which, even if it is not technically corrupted, has acted irregularly. |
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
What is the progress % on the affected WU, and what messages, if any, are you getting in the Manager regarding it? |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
It says 16% done, but I don't have any messages relating to this since I turn this machine off (a laptop) whenever I move it. |
Byron S Goodgame Joined: 16 Jan 06 Posts: 1145 Credit: 3,936,993 RAC: 0 |
Well, 16% done in this AP WU seems better to me than 0% when starting another. At some point did it go back to 0% like Joe described? How many CPU minutes does Manager show that it's invested in this WU? |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
You do have messages - a full archive is kept in stdoutdae.txt (root of BOINC folder tree). It recycles to stdoutdae.old when it's full (at 2MB). |
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
I found such a file, but it had messages from 2006 (I did a full search for file names beginning with "stdout"). |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14677 Credit: 200,643,578 RAC: 874 |
Yes, that's normal if you don't crunch very much. The ones on this box start with: 2007-08-27 23:32:17 [SETI@home] [file_xfer] Started upload of file 11fe07ad.17166.1294.12.5.101_1_0; and end with: 2008-09-26 00:45:28 [SETI@home] Resuming task ap_15au08aa_B5_P1_00156_20080916_20940.wu_1 using astropulse version 435. The clues to your problem may be in between, or they may, as Joe says, be in the std_err.txt uploaded when the task completes. |
Matthias Lehmkuhl Joined: 5 Oct 99 Posts: 28 Credit: 10,832,348 RAC: 53 |
I crashed one AP result because the astropulse 4.36 app was downloaded incompletely. I have now downloaded the file again with the right size. resultid=1000975128 Exit status -185 (0xffffffffffffff47) edit: using app_info.xml Matthias |
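
For anyone following Richard's suggestion to dig through stdoutdae.txt, here is a rough Python sketch that pulls out the lines mentioning one task. The line layout (timestamp, project name in square brackets, then the message) is only assumed from the two sample lines quoted in the thread, and the helper names `task_messages` and `read_log` are made up for illustration; they are not part of BOINC.

```python
import re
from pathlib import Path

# Layout assumed from the sample lines in the thread:
# "YYYY-MM-DD HH:MM:SS [project] message"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[([^\]]*)\] (.*)$")

# The two log lines quoted in the thread, used here as sample input.
SAMPLE_LINES = [
    "2007-08-27 23:32:17 [SETI@home] [file_xfer] Started upload of file "
    "11fe07ad.17166.1294.12.5.101_1_0",
    "2008-09-26 00:45:28 [SETI@home] Resuming task "
    "ap_15au08aa_B5_P1_00156_20080916_20940.wu_1 using astropulse version 435",
]


def task_messages(lines, task_name):
    """Yield (timestamp, project, message) for log lines that mention task_name."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and task_name in match.group(3):
            yield match.groups()


def read_log(path):
    """Read a log file into a list of lines, tolerating odd bytes."""
    return Path(path).read_text(errors="replace").splitlines()


if __name__ == "__main__":
    task = "ap_15au08aa_B5_P1_00156_20080916_20940.wu_1"
    for timestamp, project, message in task_messages(SAMPLE_LINES, task):
        print(timestamp, project, message)
```

To run it against a real log, pass `read_log("stdoutdae.txt")` (with the path adjusted to your BOINC folder) in place of `SAMPLE_LINES`. Lines that do not fit the assumed layout are simply skipped.
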
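
On the exit status in Matthias's post: -185 and 0xffffffffffffff47 are the same value, with the hex form being the 64-bit two's-complement representation of the negative number. A small Python sketch of the conversion follows; the function names are hypothetical and chosen only for this example.

```python
def to_signed(value, bits=64):
    """Interpret an unsigned integer as a two's-complement signed value."""
    value &= (1 << bits) - 1          # keep only the low `bits` bits
    if value >> (bits - 1):           # top bit set means negative
        return value - (1 << bits)
    return value


def to_unsigned_hex(value, bits=64):
    """Show a (possibly negative) integer as its unsigned hex bit pattern."""
    return hex(value & ((1 << bits) - 1))


if __name__ == "__main__":
    print(to_signed(0xFFFFFFFFFFFFFF47))   # -185
    print(to_unsigned_hex(-185))           # 0xffffffffffffff47
    print(to_unsigned_hex(-185, bits=32))  # 0xffffff47
```

The same status may therefore show up as 0xffffff47 on a page that prints it as a 32-bit value.
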
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.