AstroPulse errors - Reporting

Message boards : Number crunching : AstroPulse errors - Reporting

Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 . . . 14 · Next

Profile JDWhale
Volunteer tester
Avatar

Send message
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 797271 - Posted: 13 Aug 2008, 15:14:09 UTC - in response to Message 797248.  
Last modified: 13 Aug 2008, 15:15:17 UTC

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale
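As a rough illustration of the sliding-multiplier idea described above, the sketch below linearly closes a ~40% credit gap over 30 days. The function name, the linear schedule, and all numbers here are assumptions for illustration, not the project's actual formula.

```python
# Illustrative sketch (not the project's actual scheme): a sliding
# multiplier that linearly closes a ~40% credit gap over 30 days.

def sliding_multiplier(day, start=0.6, target=1.0, period=30):
    """Credit multiplier on a given day, interpolating linearly
    from `start` (AP claiming ~40% below MB) to `target` (parity)."""
    if day >= period:
        return target
    return start + (target - start) * day / period

# AP starts at 60% of the MB claim and reaches parity by day 30.
assert sliding_multiplier(0) == 0.6
assert sliding_multiplier(30) == 1.0
assert abs(sliding_multiplier(15) - 0.8) < 1e-9
```

JDWhale's point is that the interpolation could just as easily start above parity and slide down, rewarding early adopters instead of penalizing them.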
ID: 797271 · Report as offensive
Profile JSabin

Send message
Joined: 20 Aug 07
Posts: 40
Credit: 978,691
RAC: 0
United States
Message 797288 - Posted: 13 Aug 2008, 15:59:25 UTC - in response to Message 797271.  

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale


I agree, plus they never finish on my machines. They get to a point and continue to clock time, but the time to finish doesn't change and the percentage done doesn't change. I'm not talking about staring at it for 5 minutes; I mean all day while I'm at work. After that happened on both of my machines, I just started aborting them when I saw them; why waste CPU time on a process that won't finish? But then I ended up with no work, because they are such beasts that they used up my workload quota. For the past two days my machines had nothing to do, so I've switched to Einstein on both of my computers.
ID: 797288 · Report as offensive
Larry

Send message
Joined: 8 Jul 08
Posts: 11
Credit: 692,410
RAC: 0
United States
Message 797407 - Posted: 13 Aug 2008, 20:40:52 UTC - in response to Message 797288.  

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale


I agree, plus they never finish on my machines. They get to a point and continue to clock time, but the time to finish doesn't change and the percentage done doesn't change. I'm not talking about staring at it for 5 minutes; I mean all day while I'm at work. After that happened on both of my machines, I just started aborting them when I saw them; why waste CPU time on a process that won't finish? But then I ended up with no work, because they are such beasts that they used up my workload quota. For the past two days my machines had nothing to do, so I've switched to Einstein on both of my computers.


The AP units seem to be doing pretty well on my 2.4GHz Q6600. I'm receiving about the same credit per CPU second as with MB. But I do have some that may not get any credit for a while.

Several days ago I received an AP WU on my old 450MHz PII. It started out not looking too bad: the initial estimate to completion was about 500 hours, against a 30-day, 720-hour limit. I let it run for a few days, but based on the actual CPU hours it looked like it was going to take more like 950 hours if it remained linear. I just aborted it to give someone else a try at it. LOL

I imagine that they will get this sorted out eventually.

It seems curious that they don't finish on your machines. Maybe there are some more issues.

Larry
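Larry's "if it remained linear" projection can be sketched as simple arithmetic: divide the CPU time spent so far by the fraction of the task completed. The function name and the example numbers (10% done at 95 CPU hours) are illustrative assumptions.

```python
# A minimal sketch of the linear extrapolation described above:
# if progress is roughly linear, total time ~= elapsed CPU time / fraction done.

def projected_total_hours(cpu_hours_so_far, fraction_done):
    if fraction_done <= 0:
        raise ValueError("need nonzero progress to extrapolate")
    return cpu_hours_so_far / fraction_done

# e.g. ~95 CPU hours at 10% done projects to ~950 hours total,
# well past a 720-hour (30-day) limit.
assert abs(projected_total_hours(95, 0.10) - 950.0) < 1e-6
```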
ID: 797407 · Report as offensive
Profile magpie2005
Avatar

Send message
Joined: 2 Dec 05
Posts: 9
Credit: 464,062
RAC: 0
United Kingdom
Message 797515 - Posted: 13 Aug 2008, 23:49:55 UTC - in response to Message 797407.  

magpie2005 wrote:

Well I let it crunch... and crunch... and it kept on crunching... I started off feeding it 111 hrs and it ended up crunching its way through 164.6 hrs of CPU time... damned hungry little beggar that one is...


Congratulations... You deserve a pat on the back for persevering, thanks for posting and allowing us to share your experience. I hope you don't mind that I point to your AP WU, wuid=309347223.

The best thing about AP WUs seems to be that extended runtimes might fill in gaps of lean WU splitting and server outages. On the other side... some might say that the lack of work is being caused by AP WU splitting and distribution.

Comparing CPU times and credit claims of MB and AP WUs on your host, it looks like AP is underclaiming the MB stock client by ~40%. As this is supposed to be addressed with the "adjustable" sliding multiplier within 30 days to narrow the gap, it seems the early adopters are paying a premium to be the "guinea pigs". This is just the opposite of what I would expect; if the project administrators wanted to gain acceptance, it seems they would pay a premium in credits at first and allow the gap to narrow from the other direction. Folks might then be more accommodating about crunching AP during these "early" days, especially considering the earlier distribution of "ghosts" and guaranteed missed deadlines due to issuing AP work to underqualified hosts.

Just my opinion,
JDWhale


I agree, plus they never finish on my machines. They get to a point and continue to clock time, but the time to finish doesn't change and the percentage done doesn't change. I'm not talking about staring at it for 5 minutes; I mean all day while I'm at work. After that happened on both of my machines, I just started aborting them when I saw them; why waste CPU time on a process that won't finish? But then I ended up with no work, because they are such beasts that they used up my workload quota. For the past two days my machines had nothing to do, so I've switched to Einstein on both of my computers.


The AP units seem to be doing pretty well on my 2.4GHz Q6600. I'm receiving about the same credit per CPU second as with MB. But I do have some that may not get any credit for a while.

Several days ago I received an AP WU on my old 450MHz PII. It started out not looking too bad: the initial estimate to completion was about 500 hours, against a 30-day, 720-hour limit. I let it run for a few days, but based on the actual CPU hours it looked like it was going to take more like 950 hours if it remained linear. I just aborted it to give someone else a try at it. LOL

I imagine that they will get this sorted out eventually.

It seems curious that they don't finish on your machines. Maybe there are some more issues.

Larry


Well my second AP WU is ongoing now but I'm not so sure I'll be quite as patient with this one. It started off at 95 hours and after 36 there is still another 197 to go. If it keeps extending time at this rate my machine will have collapsed and died and I'll be six feet under before getting any credit ha-ha. But I'll give it a few days and see how it goes. Glutton for punishment I am...



What the ................
Is that really ..........
It can't be .............
no... NO... NO... NOOOOOO
aaaaaAAAAAARRRRGGGGHHHHHHHHH
ID: 797515 · Report as offensive
Profile JDWhale
Volunteer tester
Avatar

Send message
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 797521 - Posted: 14 Aug 2008, 0:25:20 UTC - in response to Message 797515.  

magpie2005 wrote:

Well my second AP WU is ongoing now but I'm not so sure I'll be quite as patient with this one. It started off at 95 hours and after 36 there is still another 197 to go. If it keeps extending time at this rate my machine will have collapsed and died and I'll be six feet under before getting any credit ha-ha. But I'll give it a few days and see how it goes. Glutton for punishment I am...



I'm guessing that your host is a P4 Prescott with hyperthreading (HT)... and that for much of the time your first AP WU was crunching, it was the only WU being crunched, since it couldn't download new work from the servers for the past 4 days.

These HT CPUs perform much better when there isn't competition from a second WU, almost doubling their performance... There is still better overall throughput crunching two WUs at a time, but each WU pays a significant penalty when sharing the CPU... You might consider putting the AP WU in a suspended state and resuming it if/when the servers run out of work... Just a thought.

Happy BOINCing,
JDWhale
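The hyper-threading trade-off described above can be put in numbers. The rates below are illustrative assumptions, not measurements: two WUs sharing an HT CPU each run much slower than one WU alone, yet the combined throughput is still higher.

```python
# Rough illustration (numbers are assumptions, not measurements) of the
# hyper-threading trade-off: per-WU speed drops when sharing the CPU,
# but total throughput rises.

solo_rate = 1.0      # work/hour for a single WU with the CPU to itself
shared_rate = 0.6    # assumed per-WU rate when two WUs share the CPU

throughput_one = solo_rate        # one WU at a time
throughput_two = 2 * shared_rate  # two WUs at a time

assert throughput_two > throughput_one  # better overall throughput...
assert shared_rate < solo_rate          # ...but each WU pays a penalty
```

This is why suspending the second task can make a single long AP WU finish much sooner, at the cost of total output.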
ID: 797521 · Report as offensive
Profile magpie2005
Avatar

Send message
Joined: 2 Dec 05
Posts: 9
Credit: 464,062
RAC: 0
United Kingdom
Message 797532 - Posted: 14 Aug 2008, 0:51:26 UTC - in response to Message 797521.  

magpie2005 wrote:

Well my second AP WU is ongoing now but I'm not so sure I'll be quite as patient with this one. It started off at 95 hours and after 36 there is still another 197 to go. If it keeps extending time at this rate my machine will have collapsed and died and I'll be six feet under before getting any credit ha-ha. But I'll give it a few days and see how it goes. Glutton for punishment I am...



I'm guessing that your host is a P4 Prescott with hyperthreading (HT)... and that for much of the time your first AP WU was crunching, it was the only WU being crunched, since it couldn't download new work from the servers for the past 4 days.

These HT CPUs perform much better when there isn't competition from a second WU, almost doubling their performance... There is still better overall throughput crunching two WUs at a time, but each WU pays a significant penalty when sharing the CPU... You might consider putting the AP WU in a suspended state and resuming it if/when the servers run out of work... Just a thought.

Happy BOINCing,
JDWhale


You know I really must get up to speed with some of the jargon and technical information that people talk about on here. I'm not that techie minded and really just like to know that me and my computer are contributing but...

P4 yes. Prescott, well, he's an obnoxious Labour MP. Hyperthreading, is that the speed at which you thread needles??? The only WU being crunched? Never, it has always been doing two at the same time...

OK so I make light of a serious subject but as I say I'm not that clever - hey that's why I use a computer!! But if you can't have a laugh now and again...

As for the other information I'll take it on board and think about suspending the AP WU for a while and see what happens.

Keep talking techie... it makes me feel important!!!



What the ................
Is that really ..........
It can't be .............
no... NO... NO... NOOOOOO
aaaaaAAAAAARRRRGGGGHHHHHHHHH
ID: 797532 · Report as offensive
Scott

Send message
Joined: 9 May 08
Posts: 3
Credit: 502,858
RAC: 0
Australia
Message 797786 - Posted: 14 Aug 2008, 13:29:37 UTC
Last modified: 14 Aug 2008, 13:32:40 UTC

Just for the record, I've just about finished crunching my second Astropulse unit on a stock speed E6750. WU 312311405 completed after 41 or so hours, and WU 313876583 should finish around the 39 hour mark.

Nothing compared to the CPDN model they were crunching in parallel with - 630 hours down and only 30% completed ;)

Edit: Hmm, this was supposed to be in the "snagged" thread... oh well.
ID: 797786 · Report as offensive
Dotsch
Volunteer tester
Avatar

Send message
Joined: 9 Jun 99
Posts: 2422
Credit: 919,393
RAC: 0
Germany
Message 798004 - Posted: 14 Aug 2008, 22:08:06 UTC - in response to Message 796566.  

...
process exited with code 193 (0xc1, -63)
...
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
SIGABRT: abort called

In a few circumstances the MacOS application crashes with these error messages, on both PPC and Intel. With the preferences option "leave application in memory while suspended" it looks better and the application crashes less often, but it can still happen...
I will contact Eric about this issue.
ID: 798004 · Report as offensive
Allan David Watson

Send message
Joined: 21 Jun 99
Posts: 2
Credit: 114,657
RAC: 0
New Zealand
Message 798140 - Posted: 15 Aug 2008, 1:58:35 UTC - in response to Message 796757.  

Astropulse projects hang up on my system after spending tens of hours on them. I've had this happen three times now.

In the future, if I see them in my queue, I'll just delete them.

Anyone else having this issue?


I have a similar problem. "Normal" SETI runs are fine, but AstroPulse crashes my computer after a few minutes or after several hours. No error messages, no response from Ctrl-Alt-Del; all I can do is hit the reset button. I run XP Home SP3, a 1.8 GHz AMD processor, 512 MB RAM, and over 55 GB of free space on disk.



ID: 798140 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 798379 - Posted: 15 Aug 2008, 16:36:00 UTC
Last modified: 15 Aug 2008, 16:37:19 UTC

I'm wondering if we need another DCF for AP. On my machines, every time an AP work unit completes, the estimated time to complete my cached work suddenly doubles. Is a second DCF even possible in BOINC / SETI?

[edit] Currently crunching for both MB and AP on all my boxes.
Boinc....Boinc....Boinc....Boinc....
ID: 798379 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 798440 - Posted: 15 Aug 2008, 19:18:55 UTC - in response to Message 798379.  

I'm wondering if we need another DCF for AP. On my machines, every time an AP work unit completes, the estimated time to complete my cached work suddenly doubles. Is a second DCF even possible in BOINC / SETI?

[edit] Currently crunching for both MB and AP on all my boxes.

Duration Correction Factor is a project statistic for each host. It would take non-trivial changes in BOINC to make it an application statistic, and even that wouldn't help projects which have applications which can do various kinds of work.

To make DCF work as well as it can, the project needs to get the raw estimates for all work in line. From reported data here it looks like the AP estimate needs to be scaled up, but the most that can be expected is scaling to approximately match stock setiathome_enhanced. Those running optimized s_e will continue to have a significant mismatch unless optimized versions of AstroPulse can be produced which achieve similar speed improvements.

For now, if the estimate for s_e work is about twice real crunch time, setting the "extra work" preference moderately high will compensate. I wouldn't suggest doubling it, but a 50% increase seems reasonable.
                                                               Joe
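Joe's point about DCF being a single per-host, per-project statistic can be sketched as follows. The update rule here is a simplified stand-in (the real BOINC client logic differs in detail), but it shows why one badly scaled application's estimates drag every other application's estimates along with it.

```python
# Sketch of a Duration Correction Factor, per the description above:
# BOINC multiplies an application's raw time estimate by one per-host,
# per-project DCF, so a mis-scaled app (AP) skews estimates for the
# other app (MB) too. The update rule below is simplified.

def corrected_estimate(raw_estimate_hours, dcf):
    return raw_estimate_hours * dcf

def update_dcf(dcf, raw_estimate_hours, actual_hours):
    """Naive update: nudge the DCF toward the observed actual/estimate ratio."""
    observed = actual_hours / raw_estimate_hours
    return 0.9 * dcf + 0.1 * observed

# An AP task whose raw estimate is far too low drags the shared DCF up...
dcf = update_dcf(1.0, raw_estimate_hours=20, actual_hours=160)
assert dcf > 1.0
# ...which then inflates the estimate of every MB task in the cache as well.
assert corrected_estimate(3.0, dcf) > 3.0
```

This matches the observation earlier in the thread that the cache's estimated completion time jumps whenever an AP unit finishes.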
ID: 798440 · Report as offensive
Profile Geek@Play
Volunteer tester
Avatar

Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 798454 - Posted: 15 Aug 2008, 19:57:41 UTC - in response to Message 798440.  

Duration Correction Factor is a project statistic for each host. It would take non-trivial changes in BOINC to make it an application statistic, and even that wouldn't help projects which have applications which can do various kinds of work.

To make DCF work as well as it can, the project needs to get the raw estimates for all work in line. From reported data here it looks like the AP estimate needs to be scaled up, but the most that can be expected is scaling to approximately match stock setiathome_enhanced. Those running optimized s_e will continue to have a significant mismatch unless optimized versions of AstroPulse can be produced which achieve similar speed improvements.

For now, if the estimate for s_e work is about twice real crunch time, setting the "extra work" preference moderately high will compensate. I wouldn't suggest doubling it, but a 50% increase seems reasonable.
                                                               Joe


Thanks, Joe, for your response. I was also thinking that, given the work involved in creating another DCF for AP, well... it's not going to happen. I'll just leave my cache sizes as is. It does not take long for the DCF to work its way back to a reasonable value.


Boinc....Boinc....Boinc....Boinc....
ID: 798454 · Report as offensive
Richard Turnbull
Avatar

Send message
Joined: 25 Jun 99
Posts: 54
Credit: 90,402,501
RAC: 0
United Kingdom
Message 798821 - Posted: 16 Aug 2008, 10:23:43 UTC

I have completed 3 AP WUs, and all have pending credit of around 50% of what I would have expected. A normal WU receives 1 credit for each 260 seconds of work; however, AP is returning 1 credit for every 500 seconds, making AP less worthwhile to crunch. Will the granted credit be closer to what would be expected, or are we being punished for doing such long tasks?
ID: 798821 · Report as offensive
SeaEagle

Send message
Joined: 14 Jun 99
Posts: 12
Credit: 3,291,985
RAC: 2
United States
Message 799437 - Posted: 17 Aug 2008, 23:12:18 UTC

AP has been running forever: 167:21:03 (hours:minutes:seconds). I suspended it to write down the info:
CPU time = 167:21:29
Progress = 99.950%
Time to complete = 4:59 (minutes:seconds)
As soon as I restarted it, it completed with a computation error.
I've read through this thread several times while this WU was processing and decided to let it run to the end.
Here's the WU:
http://setiathome.berkeley.edu/workunit.php?wuid=313238622
The other user who processed it took 159,845.80 sec to complete and claimed 714.40 credit.
I processed it in 602,463.10 sec and claimed 714.04 credit.
How is credit determined? CPU time should be the basis, not the fastest finisher.
And after processing to 99.950% completion only to have it error out, do you get any credit at all?
I'm not running any more APs until I get some answers. I've had SETI running since June 1999, been through the growing pains, and I'll keep it running.

Regards - Gregg - need to go check on Fay and see if I need to pull out the plywood
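Working through Gregg's numbers above makes his point concrete: the two hosts' claimed credits are nearly identical even though their CPU times differ by roughly 4x, so the claim tracks the work unit rather than each host's CPU time.

```python
# Credit claimed per CPU second for the two hosts quoted above.
claim_fast = 714.40 / 159_845.80   # other host
claim_slow = 714.04 / 602_463.10   # Gregg's host

assert abs(714.40 - 714.04) < 1.0   # nearly identical total claims
assert claim_fast / claim_slow > 3.5  # ~4x difference per CPU second
```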
ID: 799437 · Report as offensive
Profile arkayn
Volunteer tester
Avatar

Send message
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 799520 - Posted: 18 Aug 2008, 4:27:09 UTC - in response to Message 799437.  

AP has been running forever: 167:21:03 (hours:minutes:seconds). I suspended it to write down the info:
CPU time = 167:21:29
Progress = 99.950%
Time to complete = 4:59 (minutes:seconds)
As soon as I restarted it, it completed with a computation error.
I've read through this thread several times while this WU was processing and decided to let it run to the end.
Here's the WU:
http://setiathome.berkeley.edu/workunit.php?wuid=313238622
The other user who processed it took 159,845.80 sec to complete and claimed 714.40 credit.
I processed it in 602,463.10 sec and claimed 714.04 credit.
How is credit determined? CPU time should be the basis, not the fastest finisher.
And after processing to 99.950% completion only to have it error out, do you get any credit at all?
I'm not running any more APs until I get some answers. I've had SETI running since June 1999, been through the growing pains, and I'll keep it running.

Regards - Gregg - need to go check on Fay and see if I need to pull out the plywood


Looks like you got bitten by the same bug as me; I lost an AP unit as well when it was suspended.

Try turning on the "Leave Application in Memory" option.

ID: 799520 · Report as offensive
DJStarfox

Send message
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 799634 - Posted: 18 Aug 2008, 15:16:15 UTC - in response to Message 798440.  

To make DCF work as well as it can, the project needs to get the raw estimates for all work in line. From reported data here it looks like the AP estimate needs to be scaled up, but the most that can be expected is scaling to approximately match stock setiathome_enhanced. Those running optimized s_e will continue to have a significant mismatch unless optimized versions of AstroPulse can be produced which achieve similar speed improvements.


All you can do is calibrate the scaling to the stock apps. The guys running optimized apps should be smart enough to know the estimated completion times will be off. :)

That said, my first AP WU since reattaching to SETI says it will take 92 hours on a 3.0 GHz E8400. That seems a bit high compared to the others on the forum. I'll wait until it reports OK before saying any more about this.
ID: 799634 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 799646 - Posted: 18 Aug 2008, 15:58:46 UTC
Last modified: 18 Aug 2008, 16:00:35 UTC

Reporting *possible* corruption of checkpoint data on application suspend/termination, manifesting as an exception/compute error on resume. Still investigating here.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 799646 · Report as offensive
Profile seversen

Send message
Joined: 24 Feb 00
Posts: 2
Credit: 50,152
RAC: 0
United States
Message 799825 - Posted: 19 Aug 2008, 4:18:30 UTC

Does anyone have an idea as to why this AP unit returned a compute error?
http://setiathome.berkeley.edu/result.php?resultid=946854005

Thanks.
ID: 799825 · Report as offensive
6dj72cn8

Send message
Joined: 3 Sep 99
Posts: 24
Credit: 163,811
RAC: 0
Australia
Message 799845 - Posted: 19 Aug 2008, 7:19:41 UTC - in response to Message 799520.  

Looks like you got bitten by the same bug as me, I lost an AP unit as well when it was suspended.

Try turning on the "Leave Application in Memory" option.


My first Astropulse unit has just crashed too. Intel Mac. Task result

As with others below (or above, depending which way you like your threads), I had suspended it while running other things and it crashed immediately upon resumption.

When I have used 'Leave Application in Memory' in the past, I have had trouble with tasks not releasing the CPU and seeming to get stuck in a full-power loop, busily calculating nothing. Task progress was not incrementing, but Hardware Monitor showed the processor(s) still steaming away at 100%.

I think I'll wait until a real fix is forthcoming.
ID: 799845 · Report as offensive
PeterRehm

Send message
Joined: 12 Jul 99
Posts: 13
Credit: 1,268,024
RAC: 0
United States
Message 799901 - Posted: 19 Aug 2008, 13:57:20 UTC - in response to Message 794480.  

Finished an Astropulse crunch.
940322137 309888091 3 Aug 2008 16:52:28 UTC 5 Aug 2008 23:41:36 UTC Over Success Done 168,830.40 719.06 0.00

719.06 credits requested
0.00 credits given

That was a lot of CPU time to get squat. :-(

Next time I see an astropulse work item, I'll be sure to cancel/abort it.




Agreed. My computer worked on it an entire week, during which time I'd normally get at least 300 credits per day. The 700 or so I got really was not adequate. I'll do the same and cancel the unit next time I see an AstroPulse WU if this is not quickly addressed.
ID: 799901 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.