Astropulse Errors II-Optimized version 5.03!

dduggan47
Volunteer tester

Send message
Joined: 18 May 99
Posts: 8
Credit: 2,071,520
RAC: 0
United States
Message 879360 - Posted: 26 Mar 2009, 10:41:56 UTC - in response to Message 879234.  

I hope this isn't a duplicate report but I didn't see it elsewhere.

I've got this (http://setiathome.berkeley.edu/result.php?resultid=1169379672) wu on my PC. It's overdue but it's also done. It shows as:

CPU: 188:53:46
Progress: 100.000%
To Completion: ---
Report Deadline: 3/22/2009 8:07:15AM
Status: Waiting to run

Until it finished (yesterday) it was running as high priority but then it finished and just sits.

Wuzzup with that?

- Dick

ID: 879360 · Report as offensive
Profile speedimic
Volunteer tester
Avatar

Send message
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 879464 - Posted: 26 Mar 2009, 16:56:06 UTC - in response to Message 879200.  

Just found this WU in my list.
As you can imagine I'm the one getting nothin'.
Up to now this is the first invalid on all my rigs...

bad sign indeed.
Maybe it's worth starting to save AP workunits on HDD, at least until they pass validation.
Without the original WU we can't do anything, because stock doesn't even report how many pulses it found...


Got some more good candidates for getting nothin':

423714931
414210443

all r112 paired with stock app...

mic.


ID: 879464 · Report as offensive
Profile speedimic
Volunteer tester
Avatar

Send message
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 879468 - Posted: 26 Mar 2009, 16:58:35 UTC - in response to Message 879234.  

But if you feel like digging, it hasn't been purged yet, so you can still go to fanout and find the WU before it is deleted.


Does anyone know the logic behind how the WUs are stored in all those folders?
mic.


ID: 879468 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 879475 - Posted: 26 Mar 2009, 17:15:27 UTC - in response to Message 879360.  

I hope this isn't a duplicate report but I didn't see it elsewhere.

I've got this (http://setiathome.berkeley.edu/result.php?resultid=1169379672) wu on my PC. It's overdue but it's also done. It shows as:

CPU: 188:53:46
Progress: 100.000%
To Completion: ---
Report Deadline: 3/22/2009 8:07:15AM
Status: Waiting to run

Until it finished (yesterday) it was running as high priority but then it finished and just sits.

Wuzzup with that?

- Dick

Hmm, 7.87 days on a 1.86GHz Core 2 Duo suggests you're using the stock application. Still, if it's done the BOINC core client should know it, upload the result and report it.
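For anyone checking the arithmetic, the 7.87 days figure follows from the CPU time quoted above. This is only a minimal sketch; the function name is made up for illustration.

```python
def cpu_time_to_days(hms: str) -> float:
    """Convert an H:MM:SS CPU time string, as shown in BOINC Manager, to days."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / 86400  # 86400 seconds in a day


print(f"{cpu_time_to_days('188:53:46'):.2f} days")  # prints "7.87 days"
```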

Your account isn't visible, but if you're doing another project, an unlikely possibility is that its work is at high priority and the AP_v5 WU is actually paused at 99.9996% done, which gets rounded to 100.000% by the BOINC Manager display. If so, you could suspend that other project just long enough for the AP_v5 to complete.
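To illustrate the rounding, here is a small sketch. It assumes the display simply formats the fraction done to three decimal places; that is a guess at the behaviour, not taken from the BOINC Manager source, and the function name is hypothetical.

```python
def manager_progress(fraction_done: float) -> str:
    """Format a fraction done the way a three-decimal percentage display would (assumed)."""
    return f"{fraction_done * 100:.3f}%"


# A task that is paused just short of the end still shows as complete.
print(manager_progress(0.999996))  # prints "100.000%"
```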

Otherwise, first try suspending then resuming the AP_v5 WU. That might possibly encourage BOINC to do something useful. If that fails, try exiting from BOINC and restarting it. Finally, sometimes a reboot of Windows will clear up such weird conditions. I'm sorry to say it may also end up as an error condition. But if you successfully get it reported you should get credit.
                                                               Joe


ID: 879475 · Report as offensive
Profile Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Avatar

Send message
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 879476 - Posted: 26 Mar 2009, 17:19:03 UTC



just reporting in for general info

Task 1189669312

I'm using the opt app (SSE SSE2 HT) for AP and it is working great!

I just want to thank:

Joe Segur
Simon
Gecko_R7
Raistmer
Jason G
Crunch3r

and all the team and testers at Lunatics.kwsn.net team

Best Wishes
Byron



ID: 879476 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 879479 - Posted: 26 Mar 2009, 17:23:37 UTC - in response to Message 879464.  
Last modified: 26 Mar 2009, 17:25:25 UTC

Just found this WU in my list.
As you can imagine I'm the one getting nothin'.
Up to now this is the first invalid on all my rigs...

bad sign indeed.
Maybe it's worth starting to save AP workunits on HDD, at least until they pass validation.
Without the original WU we can't do anything, because stock doesn't even report how many pulses it found...


Got some more good candidates for getting nothin':

423714931
414210443

all r112 paired with stock app...


Hm... If you have many invalid results... time to clean up the CPU cooler and lower the OC, probably ;)

(try to catch such invalid WU, will test it on my own hosts)
ID: 879479 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 879484 - Posted: 26 Mar 2009, 17:31:56 UTC - in response to Message 879468.  

But if you feel like digging, it hasn't been purged yet, so you can still go to fanout and find the WU before it is deleted.


Does anyone know the logic behind how the WUs are stored in all those folders?

The standard BOINC method is to take an MD5 hash of the file name, select 10 bits of that represented in hex digits to select one of 1024 subdirectories.
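A rough sketch of that idea in Python. Exactly which 10 bits BOINC selects is not stated here, so taking the first 8 hex digits modulo 1024 is an assumption made for illustration, as are the function name and the example file name.

```python
import hashlib

FANOUT = 1024  # 2**10 subdirectories, as described above


def fanout_dir(filename: str, fanout: int = FANOUT) -> str:
    """Map a file name to one of `fanout` subdirectory names, written in hex."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Assumption: read the first 8 hex digits as an integer and keep 10 bits of it.
    value = int(digest[:8], 16)
    return format(value % fanout, "x")


# Hypothetical file name, for illustration only.
print(fanout_dir("example_workunit.wu"))
```

The same file name always lands in the same subdirectory, and the names run from 0 to 3ff.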

Files are deleted very soon after result assimilation, only the BOINC database entries remain unpurged for an extra day for users to look at.
                                                               Joe
ID: 879484 · Report as offensive
dduggan47
Volunteer tester

Send message
Joined: 18 May 99
Posts: 8
Credit: 2,071,520
RAC: 0
United States
Message 879525 - Posted: 26 Mar 2009, 19:01:39 UTC - in response to Message 879475.  


Hmm, 7.87 days on a 1.86GHz Core 2 Duo suggests you're using the stock application. Still, if it's done the BOINC core client should know it, upload the result and report it.

Your account isn't visible, but if you're doing another project, an unlikely possibility is that its work is at high priority and the AP_v5 WU is actually paused at 99.9996% done, which gets rounded to 100.000% by the BOINC Manager display. If so, you could suspend that other project just long enough for the AP_v5 to complete.

Otherwise, first try suspending then resuming the AP_v5 WU. That might possibly encourage BOINC to do something useful. If that fails, try exiting from BOINC and restarting it. Finally, sometimes a reboot of Windows will clear up such weird conditions. I'm sorry to say it may also end up as an error condition. But if you successfully get it reported you should get credit.
                                                               Joe


Thanks Joe. That may have worked ... sort of.

I do have tasks from 23 other projects on that machine at the moment (though none of them are trying to run at high priority). I suspended them all except for SETI and then the task in question did start running again as a high priority. I'm not sure it hasn't started over though:

CPU Time: 188:57:21
Progress: .0064%
To Completion: 675:24:03

Thanks again. We'll see what happens.

- Dick

PS - That time to completion is dropping in 10 to 20 second chunks every second.



ID: 879525 · Report as offensive
Profile speedimic
Volunteer tester
Avatar

Send message
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 879545 - Posted: 26 Mar 2009, 20:03:44 UTC - in response to Message 879479.  

Just found this WU in my list.
As you can imagine I'm the one getting nothin'.
Up to now this is the first invalid on all my rigs...

bad sign indeed.
Maybe it's worth starting to save AP workunits on HDD, at least until they pass validation.
Without the original WU we can't do anything, because stock doesn't even report how many pulses it found...


Got some more good candidates for getting nothin':

423714931
414210443

all r112 paired with stock app...


Hm... If you have many invalid results... time to clean up the CPU cooler and lower the OC, probably ;)

(try to catch such invalid WU, will test it on my own hosts)


Well it's just 3 out of 250 on 2 different hosts - no big thing.

BTW all my rigs run stock speed and my apprentice keeps them neat and clean. ;)

mic.


ID: 879545 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 879588 - Posted: 26 Mar 2009, 22:46:43 UTC - in response to Message 879525.  

...then the task in question did start running again as a high priority. I'm not sure it hasn't started over though:

CPU Time: 188:57:21
Progress: .0064%
To Completion: 675:24:03

Thanks again. We'll see what happens.

- Dick

PS - That time to completion is dropping in 10 to 20 second chunks every second.

I fear it would take another 188+ hours to get it done; there's a known issue with the checkpoint file which can make the app start over. The host which was given the resend probably started it March 23 and took about 8.2 days for an earlier AP_v5. If your host finishes while the WU is unresolved you'll get credit, but that seems unlikely. If it were mine I'd swear a little and abort it.
                                                                Joe
ID: 879588 · Report as offensive
nicky neutrino

Send message
Joined: 14 Jun 02
Posts: 37
Credit: 1,750,735
RAC: 1
United States
Message 879628 - Posted: 27 Mar 2009, 1:33:54 UTC

I also receive the message "astropulse v5 not available for my type of computer". I have an iMac running Leopard 10.5.6. Any connection, or is the system down? Thanks in advance for shedding any light on this. Peace.

rich
aka
nicky neutrino
ID: 879628 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 879697 - Posted: 27 Mar 2009, 7:16:46 UTC - in response to Message 879545.  

...
(try to catch such invalid WU, will test it on my own hosts)


Well it's just 3 out of 250 on 2 different hosts - no big thing.

BTW all my rigs run stock speed and my apprentice keeps them neat and clean. ;)


OK, then if you have some free HDD space, try copying all AP tasks to another directory, once per day for example. Then if you find an invalid result again, you will have its WU available, so we will be able to reproduce the results with stock and opt AP on different hosts.
Without that, such invalid results can raise an alert but prove nothing about whether it's an app problem or not...
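A minimal sketch of such a daily copy, for anyone who wants to script it. The `ap_*.wu` file pattern and both directory paths are assumptions; check what the AP workunit files are actually called in your own BOINC project folder before relying on it.

```python
import shutil
from pathlib import Path


def backup_ap_workunits(project_dir: Path, backup_dir: Path) -> list[Path]:
    """Copy AP workunit files that are not yet in backup_dir; return the new copies."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # Assumption: AP workunit files match "ap_*.wu" in the project directory.
    for wu in sorted(project_dir.glob("ap_*.wu")):
        target = backup_dir / wu.name
        if not target.exists():  # skip files saved on an earlier run
            shutil.copy2(wu, target)
            copied.append(target)
    return copied


# Example call with hypothetical paths, e.g. from a scheduled task once a day:
# backup_ap_workunits(Path("C:/BOINC/projects/setiathome.berkeley.edu"),
#                     Path("D:/ap_backup"))
```

Old copies are never deleted here, so the backup folder needs clearing out by hand once the results have validated.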
ID: 879697 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 879703 - Posted: 27 Mar 2009, 9:24:23 UTC - in response to Message 879697.  

...
(try to catch such invalid WU, will test it on my own hosts)


Well it's just 3 out of 250 on 2 different hosts - no big thing.

BTW all my rigs run stock speed and my apprentice keeps them neat and clean. ;)


OK, then if you have some free HDD space, try copying all AP tasks to another directory, once per day for example. Then if you find an invalid result again, you will have its WU available, so we will be able to reproduce the results with stock and opt AP on different hosts.
Without that, such invalid results can raise an alert but prove nothing about whether it's an app problem or not...

BoincLogX can save copies of these files automatically for you.
ID: 879703 · Report as offensive
Profile speedimic
Volunteer tester
Avatar

Send message
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 879800 - Posted: 27 Mar 2009, 18:10:34 UTC - in response to Message 879703.  

OK, then if you have some free HDD space, try copying all AP tasks to another directory, once per day for example. Then if you find an invalid result again, you will have its WU available, so we will be able to reproduce the results with stock and opt AP on different hosts.
Without that, such invalid results can raise an alert but prove nothing about whether it's an app problem or not...

BoincLogX can save copies of these files automatically for you.


Ok, I'll give it a shot, but it's gonna take until next weekend because those rigs are in productive use all week and I'm in productive use this weekend. ;)
mic.


ID: 879800 · Report as offensive
Profile Ananas
Volunteer tester

Send message
Joined: 14 Dec 01
Posts: 195
Credit: 2,503,252
RAC: 0
Germany
Message 880730 - Posted: 30 Mar 2009, 8:03:04 UTC - in response to Message 879479.  
Last modified: 30 Mar 2009, 8:04:49 UTC

...
(try to catch such invalid WU, will test it on my own hosts)


wuid=418784475 (my only invalid WU so far)

The invalid one ran on ap_5.03r112_SSE3.exe, Q9550, FSB 400MHz, 4GB RAM

Both valid ones ran on Pentium D with the stock application.
ID: 880730 · Report as offensive
Rob.B

Send message
Joined: 23 Jul 99
Posts: 157
Credit: 1,439,682
RAC: 0
United Kingdom
Message 880804 - Posted: 30 Mar 2009, 17:16:29 UTC
Last modified: 30 Mar 2009, 17:17:01 UTC

I try not to do AP, not because of the time it takes to process but because of the time it takes to get validated. How many times do you see "Error while computing" from folks running stock apps? As a project, how do you justify putting out an app that can't process the data?

I have one result that has been waiting over 1.5 months now, and another that I processed in 3 days but that is now out to its 7th host.

It's a joke, and the joke is on us crunchers!

Sorry but this has to be said.

Rob

Kick me at will I don't care!!!!!
ID: 880804 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 880818 - Posted: 30 Mar 2009, 18:23:50 UTC

For a while the number of Astropulse v5 results needed to get a valid pair was huge, but it has now gotten down to about 3.18 (based on the "Results waiting for db purging"/"Workunits waiting for db purging" ratio 300861/94389). It would be nice if that were closer to 2, and it may get down to the ~2.7 that old Astropulse had. I think it's a matter of glitches on hosts affecting a larger percentage of AP than MB simply because of the longer crunch time. If a host of typical speed glitches on average once a day and is doing AP, almost all its results will be affected, but if it were doing MB only a few of the larger number of results would be affected.
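A back-of-the-envelope sketch of both points. The ratio is plain division of the two server status counts and comes out a little under 3.19. The glitch model is an assumption added for illustration: glitches are treated as a Poisson process, so the chance of a task being hit is 1 - exp(-rate × duration). The 7.87-day AP time is the stock figure from earlier in this thread; the 0.1-day MB time is a made-up placeholder.

```python
import math


def results_per_workunit(results: int, workunits: int) -> float:
    """Average number of results issued per workunit."""
    return results / workunits


def p_task_hit(glitches_per_day: float, task_days: float) -> float:
    """Chance that at least one glitch lands inside a task (Poisson assumption)."""
    return 1.0 - math.exp(-glitches_per_day * task_days)


print(f"results per WU: {results_per_workunit(300861, 94389):.3f}")
print(f"AP task hit:    {p_task_hit(1.0, 7.87):.4f}")  # long task, one glitch/day
print(f"MB task hit:    {p_task_hit(1.0, 0.1):.4f}")   # hypothetical short task
```

With one glitch a day, nearly every multi-day task is hit, while a short task is hit less than one time in ten.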

Unfortunately, the quota system is very ineffective for hosts of typical speed doing AP work. If a host takes one day or more to do an AP WU, the quota system never actually affects it since it always will provide one WU/day per CPU. We simply have to hope the user notices the host is producing errors rather than successful results, and isn't earning any credit.
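The reason the quota never bites can be sketched as follows. It assumes, as described above, that the quota cannot fall below one WU per day per CPU; the function name and the fast-host figure are made up for illustration.

```python
def quota_limits_host(days_per_task: float, min_daily_quota: int = 1) -> bool:
    """True if a single CPU could use more tasks per day than the quota floor allows."""
    tasks_wanted_per_day = 1.0 / days_per_task
    return tasks_wanted_per_day > min_daily_quota


# A host needing 7.87 days per AP task asks for far less than one task a day,
# so even the lowest quota never restricts it.
print(quota_limits_host(7.87))  # prints "False"
# A hypothetical host finishing a task in about an hour would be held back.
print(quota_limits_host(0.05))  # prints "True"
```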
                                                              Joe
ID: 880818 · Report as offensive
Rob.B

Send message
Joined: 23 Jul 99
Posts: 157
Credit: 1,439,682
RAC: 0
United Kingdom
Message 880819 - Posted: 30 Mar 2009, 18:36:30 UTC - in response to Message 880818.  

Sorry, you can't rely on hosts "noticing" the issue; there are many that run BOINC as a background, no-maintenance task in order to "do their bit". You can't ask them to be responsible for poor stock coding. If you can't code a stock app to run on the bulk of the machines, why bother!

Strange, is it not, that the opt apps tend to work!


ID: 880819 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 880855 - Posted: 30 Mar 2009, 20:49:43 UTC - in response to Message 880818.  

For a while the number of Astropulse v5 results needed to get a valid pair was huge, but it has now gotten down to about 3.18 (based on the "Results waiting for db purging"/"Workunits waiting for db purging" ratio 300861/94389). It would be nice if that were closer to 2, and it may get down to the ~2.7 that old Astropulse had. I think it's a matter of glitches on hosts affecting a larger percentage of AP than MB simply because of the longer crunch time. If a host of typical speed glitches on average once a day and is doing AP, almost all its results will be affected, but if it were doing MB only a few of the larger number of results would be affected.

Unfortunately, the quota system is very ineffective for hosts of typical speed doing AP work. If a host takes one day or more to do an AP WU, the quota system never actually affects it since it always will provide one WU/day per CPU. We simply have to hope the user notices the host is producing errors rather than successful results, and isn't earning any credit.
                                                              Joe

Yeah, I even posted a color picture some time ago on beta to illustrate the fact that a heavily OCed host can do MB pretty well, failing only occasionally, but will be completely unusable for AP, failing on each task... Unfortunately, an AP task can't be split.
ID: 880855 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 880856 - Posted: 30 Mar 2009, 20:51:35 UTC - in response to Message 880819.  

Sorry, you can't rely on hosts "noticing" the issue; there are many that run BOINC as a background, no-maintenance task in order to "do their bit". You can't ask them to be responsible for poor stock coding. If you can't code a stock app to run on the bulk of the machines, why bother!

Strange, is it not, that the opt apps tend to work!


The faster the app, the fewer failed tasks (in general) it will produce. It's just probability and statistics... (and not poor coding by itself).
ID: 880856 · Report as offensive
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.