AstroPulse errors - Reporting

Message boards : Number crunching : AstroPulse errors - Reporting

Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 . . . 14 · Next

Profile JDWhale
Volunteer tester
Avatar

Send message
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 797271 - Posted: 13 Aug 2008, 15:14:09 UTC - in response to Message 797248.  
Last modified: 13 Aug 2008, 15:15:17 UTC

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale
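As a rough illustration of the sliding-multiplier idea described above, the sketch below linearly closes a ~40% credit gap over 30 days. The function name, the linear schedule, and all numbers here are assumptions for illustration, not the project's actual formula.

```python
# Illustrative sketch (not the project's actual scheme): a sliding
# multiplier that linearly closes a ~40% credit gap over 30 days.

def sliding_multiplier(day, start=0.6, target=1.0, period=30):
    """Credit multiplier on a given day, interpolating linearly
    from `start` (AP claiming ~40% below MB) to `target` (parity)."""
    if day >= period:
        return target
    return start + (target - start) * day / period

# AP starts at 60% of the MB claim and reaches parity by day 30.
assert sliding_multiplier(0) == 0.6
assert sliding_multiplier(30) == 1.0
assert abs(sliding_multiplier(15) - 0.8) < 1e-9
```

JDWhale's point is that the interpolation could just as easily start above parity and slide down, rewarding early adopters instead of penalizing them.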
ID: 797271 · Report as offensive
Profile JSabin

Send message
Joined: 20 Aug 07
Posts: 40
Credit: 978,691
RAC: 0
United States
Message 797288 - Posted: 13 Aug 2008, 15:59:25 UTC - in response to Message 797271.  

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale


I agree, plus they never finish on my machines. They get to a point and continue to clock time, but the time to finish doesn't change and the percentage done doesn't change. I'm not talking about staring at it for 5 minutes; I mean all day while I'm at work. After that happened on both of my machines, I just started aborting them when I saw them; why waste CPU time on a process that won't finish? But then I ended up with no work, because they are such beasts that they used up my workload quota. For the past two days my machines had nothing to do, so I've switched to Einstein on both of my computers.
ID: 797288 · Report as offensive
Larry

Send message
Joined: 8 Jul 08
Posts: 11
Credit: 692,410
RAC: 0
United States
Message 797407 - Posted: 13 Aug 2008, 20:40:52 UTC - in response to Message 797288.  

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale


I agree, plus they never finish on my machines. They get to a point and continue to clock time, but the time to finish doesn't change and the percentage done doesn't change. I'm not talking about staring at it for 5 minutes; I mean all day while I'm at work. After that happened on both of my machines, I just started aborting them when I saw them; why waste CPU time on a process that won't finish? But then I ended up with no work, because they are such beasts that they used up my workload quota. For the past two days my machines had nothing to do, so I've switched to Einstein on both of my computers.


The AP units seem to be doing pretty well on my 2.4GHz Q6600. I'm receiving about the same credit per CPU second as with MB. But I do have some that may not get any credit for a while.

Several days ago I received an AP WU on my old 450MHz PII. It started out not looking too bad: the initial estimate to completion was about 500 hours, against a 30-day, 720-hour limit. I let it run for a few days, but based on the actual CPU hours it looked like it was going to take more like 950 hours if it remained linear. I just aborted it to give someone else a try at it. LOL

I imagine that they will get this sorted out eventually.

It seems curious that they don't finish on your machines. Maybe there are some more issues.

Larry
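Larry's "if it remained linear" projection can be sketched as simple arithmetic: divide the CPU time spent so far by the fraction of the task completed. The function name and the example numbers (10% done at 95 CPU hours) are illustrative assumptions.

```python
# A minimal sketch of the linear extrapolation described above:
# if progress is roughly linear, total time ~= elapsed CPU time / fraction done.

def projected_total_hours(cpu_hours_so_far, fraction_done):
    if fraction_done <= 0:
        raise ValueError("need nonzero progress to extrapolate")
    return cpu_hours_so_far / fraction_done

# e.g. ~95 CPU hours at 10% done projects to ~950 hours total,
# well past a 720-hour (30-day) limit.
assert abs(projected_total_hours(95, 0.10) - 950.0) < 1e-6
```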
ID: 797407 · Report as offensive
Profile magpie2005
Avatar

Send message
Joined: 2 Dec 05
Posts: 9
Credit: 464,062
RAC: 0
United Kingdom
Message 797515 - Posted: 13 Aug 2008, 23:49:55 UTC - in response to Message 797407.  

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale


I agree, plus they never finish on my machines. They get to a point and continue to clock time, but the time to finish doesn't change and the percentage done doesn't change. I'm not talking about staring at it for 5 minutes; I mean all day while I'm at work. After that happened on both of my machines, I just started aborting them when I saw them; why waste CPU time on a process that won't finish? But then I ended up with no work, because they are such beasts that they used up my workload quota. For the past two days my machines had nothing to do, so I've switched to Einstein on both of my computers.


The AP units seem to be doing pretty well on my 2.4GHz Q6600. I'm receiving about the same credit per CPU second as with MB. But I do have some that may not get any credit for a while.

Several days ago I received an AP WU on my old 450MHz PII. It started out not looking too bad: the initial estimate to completion was about 500 hours, against a 30-day, 720-hour limit. I let it run for a few days, but based on the actual CPU hours it looked like it was going to take more like 950 hours if it remained linear. I just aborted it to give someone else a try at it. LOL

I imagine that they will get this sorted out eventually.

It seems curious that they don't finish on your machines. Maybe there are some more issues.

Larry


Well my second AP WU is ongoing now but I'm not so sure I'll be quite as patient with this one. It started off at 95 hours and after 36 there is still another 197 to go. If it keeps extending time at this rate my machine will have collapsed and died and I'll be six feet under before getting any credit ha-ha. But I'll give it a few days and see how it goes. Glutton for punishment I am...



What the ................
Is that really ..........
It can't be .............
no... NO... NO... NOOOOOO
aaaaaAAAAAARRRRGGGGHHHHHHHHH
ID: 797515 · Report as offensive
Profile JDWhale
Volunteer tester
Avatar

Send message
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 797521 - Posted: 14 Aug 2008, 0:25:20 UTC - in response to Message 797515.  

magpie2005 wrote:

Well my second AP WU is ongoing now but I'm not so sure I'll be quite as patient with this one. It started off at 95 hours and after 36 there is still another 197 to go. If it keeps extending time at this rate my machine will have collapsed and died and I'll be six feet under before getting any credit ha-ha. But I'll give it a few days and see how it goes. Glutton for punishment I am...



I'm guessing that your host is a P4 Prescott with hyperthreading (HT)... and that for much of the time your first AP WU was crunching, it was the only WU being crunched, since it couldn't download new work from the servers for the past 4 days.

These HT CPUs perform much better when there isn't competition from a second WU, almost doubling their performance... There is still better overall throughput crunching two WUs at a time, but each WU pays a significant penalty when sharing the CPU... You might consider putting the AP WU in a suspended state and resuming it if/when the servers run out of work... Just a thought.

Happy BOINCing,
JDWhale
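The hyper-threading trade-off described above can be put in numbers. The rates below are illustrative assumptions, not measurements: two WUs sharing an HT CPU each run much slower than one WU alone, yet the combined throughput is still higher.

```python
# Rough illustration (numbers are assumptions, not measurements) of the
# hyper-threading trade-off: per-WU speed drops when sharing the CPU,
# but total throughput rises.

solo_rate = 1.0      # work/hour for a single WU with the CPU to itself
shared_rate = 0.6    # assumed per-WU rate when two WUs share the CPU

throughput_one = solo_rate        # one WU at a time
throughput_two = 2 * shared_rate  # two WUs at a time

assert throughput_two > throughput_one  # better overall throughput...
assert shared_rate < solo_rate          # ...but each WU pays a penalty
```

This is why suspending the second task can make a single long AP WU finish much sooner, at the cost of total output.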
ID: 797521 · Report as offensive
Profile magpie2005
Avatar

Send message
Joined: 2 Dec 05
Posts: 9
Credit: 464,062
RAC: 0
United Kingdom
Message 797532 - Posted: 14 Aug 2008, 0:51:26 UTC - in response to Message 797521.  

magpie2005 wrote:

Well my second AP WU is ongoing now but I'm not so sure I'll be quite as patient with this one. It started off at 95 hours and after 36 there is still another 197 to go. If it keeps extending time at this rate my machine will have collapsed and died and I'll be six feet under before getting any credit ha-ha. But I'll give it a few days and see how it goes. Glutton for punishment I am...



I'm guessing that your host is a P4 Prescott with hyperthreading (HT)... and that for much of the time your first AP WU was crunching, it was the only WU being crunched, since it couldn't download new work from the servers for the past 4 days.

These HT CPUs perform much better when there isn't competition from a second WU, almost doubling their performance... There is still better overall throughput crunching two WUs at a time, but each WU pays a significant penalty when sharing the CPU... You might consider putting the AP WU in a suspended state and resuming it if/when the servers run out of work... Just a thought.

Happy BOINCing,
JDWhale


You know I really must get up to speed with some of the jargon and technical information that people talk about on here. I'm not that techie minded and really just like to know that me and my computer are contributing but...

P4 yes. Prescott, well, he's an obnoxious Labour MP. Hyperthreading, is that the speed at which you thread needles??? The only WU being crunched? Never, it has always been doing two at the same time...

OK so I make light of a serious subject but as I say I'm not that clever - hey that's why I use a computer!! But if you can't have a laugh now and again...

As for the other information I'll take it on board and think about suspending the AP WU for a while and see what happens.

Keep talking techie... it makes me feel important!!!



What the ................
Is that really ..........
It can't be .............
no... NO... NO... NOOOOOO
aaaaaAAAAAARRRRGGGGHHHHHHHHH
ID: 797532 · Report as offensive
Scott

Send message
Joined: 9 May 08
Posts: 3
Credit: 502,858
RAC: 0
Australia
Message 797786 - Posted: 14 Aug 2008, 13:29:37 UTC
Last modified: 14 Aug 2008, 13:32:40 UTC

Just for the record, I've just about finished crunching my second Astropulse unit on a stock speed E6750. WU 312311405 completed after 41 or so hours, and WU 313876583 should finish around the 39 hour mark.

Nothing compared to the CPDN model they were crunching in parallel with - 630 hours down and only 30% completed ;)

Edit: Hmm, this was supposed to be in the "snagged" thread... oh well.
ID: 797786 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 798004 - Posted: 14 Aug 2008, 22:08:06 UTC - in response to Message 796566.  

...
process exited with code 193 (0xc1, -63)
...
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
SIGABRT: abort called

In a few circumstances the MacOS application crashes with these error messages, on both PPC and Intel. With the preferences option "leave application in memory while suspended" it looks better and the application crashes less often, but it can still happen...
I will contact Eric about this issue.
ID: 798004 · Report as offensive
Allan David Watson

Send message
Joined: 21 Jun 99
Posts: 2
Credit: 114,657
RAC: 0
New Zealand
Message 798140 - Posted: 15 Aug 2008, 1:58:35 UTC - in response to Message 796757.  

Astropulse projects hang up on my system after spending tens of hours on them. I've had this happen three times now.

In the future, if I see them in my queue, I'll just delete them.

Anyone else having this issue?


I have a similar problem. "Normal" SETI runs are fine, but AstroPulse crashes my computer after a few minutes or after several hours. No error messages, no response from Ctrl-Alt-Del; all I can do is hit the reset button. I run XP Home SP3, a 1.8 GHz AMD processor, 512 MB RAM, and over 55 GB of free space on disk.



ID: 798140 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 798379 - Posted: 15 Aug 2008, 16:36:00 UTC
Last modified: 15 Aug 2008, 16:37:19 UTC

I'm wondering if we need another DCF for AP. On my machines, every time an AP work unit completes, the estimated time to complete my cached work suddenly doubles. Is a second DCF even possible in BOINC / SETI?

[edit] Currently crunching for both MB and AP on all my boxes.
Boinc....Boinc....Boinc....Boinc....
ID: 798379 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 798440 - Posted: 15 Aug 2008, 19:18:55 UTC - in response to Message 798379.  

I'm wondering if we need another DCF for AP. On my machines, every time an AP work unit completes, the estimated time to complete my cached work suddenly doubles. Is a second DCF even possible in BOINC / SETI?

[edit] Currently crunching for both MB and AP on all my boxes.

Duration Correction Factor is a project statistic for each host. It would take non-trivial changes in BOINC to make it an application statistic, and even that wouldn't help projects which have applications which can do various kinds of work.

To make DCF work as well as it can, the project needs to get the raw estimates for all work in line. From reported data here it looks like the AP estimate needs to be scaled up, but the most that can be expected is scaling to approximately match stock setiathome_enhanced. Those running optimized s_e will continue to have a significant mismatch unless optimized versions of AstroPulse can be produced which achieve similar speed improvements.

For now, if the estimate for s_e work is about twice real crunch time, setting the "extra work" preference moderately high will compensate. I wouldn't suggest doubling it, but a 50% increase seems reasonable.
                                                               Joe
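Joe's point about DCF being a single per-host, per-project statistic can be sketched as follows. The update rule here is a simplified stand-in (the real BOINC client logic differs in detail), but it shows why one badly scaled application's estimates drag every other application's estimates along with it.

```python
# Sketch of a Duration Correction Factor, per the description above:
# BOINC multiplies an application's raw time estimate by one per-host,
# per-project DCF, so a mis-scaled app (AP) skews estimates for the
# other app (MB) too. The update rule below is simplified.

def corrected_estimate(raw_estimate_hours, dcf):
    return raw_estimate_hours * dcf

def update_dcf(dcf, raw_estimate_hours, actual_hours):
    """Naive update: nudge the DCF toward the observed actual/estimate ratio."""
    observed = actual_hours / raw_estimate_hours
    return 0.9 * dcf + 0.1 * observed

# An AP task whose raw estimate is far too low drags the shared DCF up...
dcf = update_dcf(1.0, raw_estimate_hours=20, actual_hours=160)
assert dcf > 1.0
# ...which then inflates the estimate of every MB task in the cache as well.
assert corrected_estimate(3.0, dcf) > 3.0
```

This matches the observation earlier in the thread that the cache's estimated completion time jumps whenever an AP unit finishes.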
ID: 798440 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 798454 - Posted: 15 Aug 2008, 19:57:41 UTC - in response to Message 798440.  

Duration Correction Factor is a project statistic for each host. It would take non-trivial changes in BOINC to make it an application statistic, and even that wouldn't help projects which have applications which can do various kinds of work.

To make DCF work as well as it can, the project needs to get the raw estimates for all work in line. From reported data here it looks like the AP estimate needs to be scaled up, but the most that can be expected is scaling to approximately match stock setiathome_enhanced. Those running optimized s_e will continue to have a significant mismatch unless optimized versions of AstroPulse can be produced which achieve similar speed improvements.

For now, if the estimate for s_e work is about twice real crunch time, setting the "extra work" preference moderately high will compensate. I wouldn't suggest doubling it, but a 50% increase seems reasonable.
                                                               Joe


Thanks, Joe, for your response. I was also thinking that, given the work involved in creating another DCF for AP, well... it's not going to happen. I'll just leave my cache sizes as is. It does not take long for the DCF to work its way back to a reasonable value.


Boinc....Boinc....Boinc....Boinc....
ID: 798454 · Report as offensive
Richard Turnbull
Avatar

Send message
Joined: 25 Jun 99
Posts: 54
Credit: 90,402,501
RAC: 0
United Kingdom
Message 798821 - Posted: 16 Aug 2008, 10:23:43 UTC

I have completed 3 AP WUs, and all have pending credit of around 50% of what I would have expected. A normal WU receives 1 credit for each 260 seconds of work; however, AP is returning 1 credit for every 500 seconds, making AP less worthwhile to crunch. Will the granted credit be closer to what would be expected, or are we being punished for doing such long tasks?
ID: 798821 · Report as offensive
SeaEagle

Send message
Joined: 14 Jun 99
Posts: 12
Credit: 3,291,985
RAC: 2
United States
Message 799437 - Posted: 17 Aug 2008, 23:12:18 UTC

AP has been running forever: 167:21:03 (hours:minutes:seconds). I suspended it to write down the info:
CPU time = 167:21:29
Progress = 99.950%
Time to complete = 4:59 (minutes:seconds)
As soon as I restarted it, it completed with a computation error.
I've read through this thread several times while this WU was processing and decided to let it run to the end.
Here's the WU:
http://setiathome.berkeley.edu/workunit.php?wuid=313238622
The other user who processed it took 159,845.80 sec to complete and claimed 714.40 credit.
I processed it in 602,463.10 sec and claimed 714.04 credit.
How is credit determined? CPU time should be the basis, not the fastest finisher.
And after processing to 99.950% completion only to have it error out, do you get any credit at all?
I'm not running any more APs until I get some answers. I've had SETI running since June 1999, been through the growing pains, and I'll keep it running.

Regards - Gregg - need to go check on Fay and see if I need to pull out the plywood
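Working through Gregg's numbers above makes his point concrete: the two hosts' claimed credits are nearly identical even though their CPU times differ by roughly 4x, so the claim tracks the work unit rather than each host's CPU time.

```python
# Credit claimed per CPU second for the two hosts quoted above.
claim_fast = 714.40 / 159_845.80   # other host
claim_slow = 714.04 / 602_463.10   # Gregg's host

assert abs(714.40 - 714.04) < 1.0   # nearly identical total claims
assert claim_fast / claim_slow > 3.5  # ~4x difference per CPU second
```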
ID: 799437 · Report as offensive
Profile arkayn
Volunteer tester
Avatar

Send message
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 799520 - Posted: 18 Aug 2008, 4:27:09 UTC - in response to Message 799437.  

AP has been running forever: 167:21:03 (hours:minutes:seconds). I suspended it to write down the info:
CPU time = 167:21:29
Progress = 99.950%
Time to complete = 4:59 (minutes:seconds)
As soon as I restarted it, it completed with a computation error.
I've read through this thread several times while this WU was processing and decided to let it run to the end.
Here's the WU:
http://setiathome.berkeley.edu/workunit.php?wuid=313238622
The other user who processed it took 159,845.80 sec to complete and claimed 714.40 credit.
I processed it in 602,463.10 sec and claimed 714.04 credit.
How is credit determined? CPU time should be the basis, not the fastest finisher.
And after processing to 99.950% completion only to have it error out, do you get any credit at all?
I'm not running any more APs until I get some answers. I've had SETI running since June 1999, been through the growing pains, and I'll keep it running.

Regards - Gregg - need to go check on Fay and see if I need to pull out the plywood


Looks like you got bitten by the same bug as me; I lost an AP unit as well when it was suspended.

Try turning on the "Leave Application in Memory" option.

ID: 799520 · Report as offensive
DJStarfox

Send message
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 799634 - Posted: 18 Aug 2008, 15:16:15 UTC - in response to Message 798440.  

To make DCF work as well as it can, the project needs to get the raw estimates for all work in line. From reported data here it looks like the AP estimate needs to be scaled up, but the most that can be expected is scaling to approximately match stock setiathome_enhanced. Those running optimized s_e will continue to have a significant mismatch unless optimized versions of AstroPulse can be produced which achieve similar speed improvements.


All you can do is calibrate the scaling to the stock apps. The guys running optimized apps should be smart enough to know the estimated completion times will be off. :)

That said, my first AP WU since reattaching to SETI says it will take 92 hours on a 3.0 GHz E8400. That seems a bit high compared to the others on the forum. I'll wait until it reports OK before saying any more about this.
ID: 799634 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 799646 - Posted: 18 Aug 2008, 15:58:46 UTC
Last modified: 18 Aug 2008, 16:00:35 UTC

Reporting *possible* corruption of checkpoint data on application suspend/termination, manifesting as an exception/compute error on resume. Still investigating here.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 799646 · Report as offensive
Profile seversen

Send message
Joined: 24 Feb 00
Posts: 2
Credit: 50,152
RAC: 0
United States
Message 799825 - Posted: 19 Aug 2008, 4:18:30 UTC

Does anyone have an idea as to why this AP unit returned a compute error?
http://setiathome.berkeley.edu/result.php?resultid=946854005

Thanks.
ID: 799825 · Report as offensive
6dj72cn8

Send message
Joined: 3 Sep 99
Posts: 24
Credit: 163,811
RAC: 0
Australia
Message 799845 - Posted: 19 Aug 2008, 7:19:41 UTC - in response to Message 799520.  

Looks like you got bitten by the same bug as me, I lost an AP unit as well when it was suspended.

Try turning on the "Leave Application in Memory" option.


My first Astropulse unit has just crashed too. Intel Mac. Task result

As with others below (or above, depending which way you like your threads), I had suspended it while running other things and it crashed immediately upon resumption.

When I have used 'Leave Application in Memory' in the past, I have had trouble with tasks not releasing the CPU and seeming to get stuck in a full-power loop, busily calculating nothing. Task progress was not incrementing, but Hardware Monitor showed the processor(s) still steaming away at 100%.

I think I'll wait until a real fix is forthcoming.
ID: 799845 · Report as offensive
PeterRehm

Send message
Joined: 12 Jul 99
Posts: 13
Credit: 1,268,024
RAC: 0
United States
Message 799901 - Posted: 19 Aug 2008, 13:57:20 UTC - in response to Message 794480.  

Finished an Astropulse crunch.
940322137 309888091 3 Aug 2008 16:52:28 UTC 5 Aug 2008 23:41:36 UTC Over Success Done 168,830.40 719.06 0.00

719.06 credits requested
0.00 credits given

That was a lot of CPU time to get squat. :-(

Next time I see an astropulse work item, I'll be sure to cancel/abort it.




Agreed. My computer worked on it an entire week, during which time I'd normally get at least 300 credits per day. The 700 or so I got really was not adequate. I'll do the same and cancel the unit next time I see an AstroPulse WU if this is not quickly addressed.
ID: 799901 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.