AstroPulse errors - Reporting

Olivier ROGER-SOULILLOU
Joined: 28 May 06
Posts: 37
Credit: 501,017
RAC: 0
France
Message 805309 - Posted: 5 Sep 2008, 19:29:17 UTC

Here is, unfortunately, my first compute error with an Astropulse workunit. Will credit eventually be granted, or not?
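325810265 | Sent: 3 Sep 2008 15:36:47 UTC | Reported: 5 Sep 2008 5:51:59 UTC | Server state: Over | Outcome: Client error | Client state: Compute error | CPU time (sec): 51,152.51 | Claimed credit: 110.09 | Granted credit: ---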

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 805321 - Posted: 5 Sep 2008, 20:54:49 UTC - in response to Message 805309.  

Compute errors, I think, always result in no credit. Just move on to the next WU and smile. :D
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 809090 - Posted: 17 Sep 2008, 8:05:33 UTC

I wonder what the validator was playing at here.

This WU was originally sent out on 29th July. Neither host responded, so it was sent out to 2 more hosts on 28th August.

The first result to be uploaded (here) contains a lot of what appear to be error messages in its stderr out, of the form:
     "In ap_fileio.cpp, Statefile::write, statefile is 0'd, trying again: iteration x" (where "x" increments from 1 to 100),

and
     Error reading from foldfile: wanted 1048576 bytes, but read only 0 bytes.

and
     Short fold buffer didn't fill up (lol=0, size=65536).
     Long fold buffer didn't fill up (lol=0, size=262144).

The second result uploaded appears to be clean. Both results claimed 750.78 credit.

Shortly after the second result was uploaded, the WU was issued to another host (me!!) as might be expected, but both the returned results are marked as 'Valid' and both have been granted 0 credit.

So, is there any point in my crunching this WU? If I return a result that matches the one "good" one that has already been returned, will this make any difference to the status that goes into the Master Science Database? If the answer to the latter question is "No" then I may as well abort it before it starts and use the cycles for a WU where they will make a difference.

Note, the fact that I will likely get 0 credit for crunching this WU is totally irrelevant; my question is, "Will the cycles be wasted?"

F.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 809102 - Posted: 17 Sep 2008, 9:00:56 UTC - in response to Message 809090.  

I *think* the situation is as follows - this behaviour has been seen many times before, and seems consistent.

The validator has to do two different things. (Well, probably more than two, but let's concentrate on these).

1) D'uh. Check whether the results are valid, and act accordingly. In this case, there would seem to be a discrepancy, and the correct action has been taken - send out a third copy as a tie-breaker.

2) Record the results of its actions in the BOINC database for us mere humans to view and worry over. This is the bit which seems to be going wrong: the validate state should have become 'checked, but no consensus yet' in the task details view, and the 'granted credit' should have remained 'pending' in the summary view (note that 'granted credit' is shown as 0 anyway pre-validation in task detail view).

The good news is that the validator seems to make a better fist of recording final outcomes: result reports like this get tidied up in the end, when the tiebreaker reports in.

And the other good news is that the 'statefile zeroed' is a known, and not very common, bug: Josh spent a lot of time, and released some instrumented test builds in Beta, trying to track it down, but eventually admitted defeat and released the app anyway. The chances are, therefore, that your new copy will run without errors: will be re-validated against the two existing results, and will match the good one. Result: another canonical result in the database, and credit for two out of three. I'd say go for it.
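
The sequence described above (two results disagree, a tie-breaker is sent out, the third result settles it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BOINC or Astropulse validator: `Result`, `check_workunit`, the `quorum` parameter and the host names are all hypothetical, and exact string equality stands in for whatever comparison the real validator performs on result files.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Result:
    host: str
    output: str  # hypothetical stand-in for the contents of the uploaded result file


def check_workunit(results: List[Result], quorum: int = 2) -> Tuple[Optional[str], List[str]]:
    """Return (canonical output or None, hosts that would be credited)."""
    counts = Counter(r.output for r in results)
    if not counts:
        return None, []
    output, votes = counts.most_common(1)[0]
    if votes < quorum:
        # No agreement yet: a real system would issue a tie-breaker task here.
        return None, []
    # Only results that match the canonical output would receive credit.
    return output, [r.host for r in results if r.output == output]


# The case discussed above: one result damaged by the statefile bug, one clean.
returned = [Result("host_a", "damaged"), Result("host_b", "clean")]
print(check_workunit(returned))  # (None, [])

# The tie-breaker reports in and matches the clean result.
returned.append(Result("host_c", "clean"))
print(check_workunit(returned))  # ('clean', ['host_b', 'host_c'])
```

With only the damaged and the clean result in hand, no canonical result can be chosen; once a third result agrees with the clean one, that output becomes canonical and two of the three hosts would be credited.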

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 809107 - Posted: 17 Sep 2008, 9:20:19 UTC - in response to Message 809102.  
Last modified: 17 Sep 2008, 9:26:04 UTC


Thanks for your input, Richard. It's reassuring to learn that this is not a "new" bug. I will, as you suggest, 'go for it' and will try to keep an eye on this one to check the result when crunched - could be a couple of days before it comes to the top of the heap, though.

F.
[edit]
Your mention of the 'canonical result' caused me to recheck the Task Details page and, of course, there is no mention of such for that task yet. Had I noticed that before posting, then I would have probably just waited it out - but maybe our little interchange will be of value to others in time :)
[/edit]

Byron S Goodgame
Volunteer tester
Joined: 16 Jan 06
Posts: 1145
Credit: 3,936,993
RAC: 0
United States
Message 809586 - Posted: 18 Sep 2008, 23:03:00 UTC

Had an AP working on one of my older machines here and, after getting to 70-something % and some 100-plus hours, had a client error. The machine froze up for some strange reason while running a slideshow and I had to restart the computer. When it started to boot it went into the normal check disk mode (which I just knew I should have stopped, but I didn't want to have to deal with canceling it every time I rebooted until the AP finished; big mistake on my part). It found an error in the AP dat file, reported as cross-linked on allocation unit 2754, and made a copy of the file to repair it. After Windows started, BOINC started up fine, but then I got an error message that AP could not run and had to close, which then caused the client error in BOINC.

I'm sure it was my system that caused the error, and I won't run them on that PC anymore. It's an AMD 2200+ and I was just curious how long it would take... too long to try again and take a chance on an error. Still, I'm curious whether there was any way I could have saved the file after the check disk happened, to prevent the client error?

geoff
Joined: 25 Apr 00
Posts: 123
Credit: 34,100,351
RAC: 18
United Kingdom
Message 809629 - Posted: 19 Sep 2008, 0:24:56 UTC

After nearly 34 hrs of crunching, why have I received 0 credit on this WU? http://setiathome.berkeley.edu/workunit.php?wuid=313537322

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 809670 - Posted: 19 Sep 2008, 2:45:36 UTC - in response to Message 809629.  

You probably haven't, though it looks like that. There's some difficulty with the status code the AP Validator returns, so it can say Valid when it hasn't really been checked yet. For this case, your result was the second "success" returned. The Validator tried to do the comparison but the result file for the first "Success" couldn't be found, so it got a Validate error. Yours should probably have remained in the "Initial" validate state since there was nothing to compare it to, but the Validator called it "Valid" instead. A similar situation has been seen many times where a result should have gone to "Checked, but no consensus yet" state, but was shown as "Valid". The Web display code is configured to show granted credit for results in the "Valid" state and has to show 0 since there can be no grant until a canonical result is chosen.

In short, what all that means is that when another "Success" result is returned I believe your result will be compared and should get credit then.
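
A rough sketch of the display behaviour described above, in Python. It is an assumption-laden illustration rather than the real web code: `displayed_credit` and `credit_is_settled` are hypothetical names, the state strings are simply the ones quoted in this thread, and showing "pending" for every state other than "Valid" is a simplification.

```python
from typing import Optional


def displayed_credit(validate_state: str, granted_credit: float) -> str:
    """Text shown in the 'granted credit' column for one result (hypothetical)."""
    if validate_state == "Valid":
        # Granted credit is displayed as stored; it is still 0 until a
        # canonical result has been chosen for the workunit.
        return f"{granted_credit:.2f}"
    # "Initial" and "Checked, but no consensus yet" have no grant to show.
    return "pending"


def credit_is_settled(canonical_result_id: Optional[int]) -> bool:
    """Hypothetical check: credit is only decided once a canonical result exists."""
    return canonical_result_id is not None


# A result labelled "Valid" before any real comparison took place.
print(displayed_credit("Valid", 0.0))                          # 0.00
print(displayed_credit("Checked, but no consensus yet", 0.0))  # pending
print(credit_is_settled(None))                                 # False
```

The point of the second function is the practical rule of thumb: a 0 in the granted column means little until the workunit page names a canonical result.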

The Transitioner created another task, which unfortunately has been sent to a host with a turnaround time of over 13 days doing non-AP work. It's the last of 25 WUs in that host's list of work not yet done, and the one just before it is also an AP WU. You'll need a lot of patience: that host might possibly complete the WU by the October 19 deadline, but more likely it won't, so the WU would have to be reissued again.
Joe

geoff
Joined: 25 Apr 00
Posts: 123
Credit: 34,100,351
RAC: 18
United Kingdom
Message 809789 - Posted: 19 Sep 2008, 7:49:22 UTC

Thanks for the explanation, Joe. I will wait patiently.

Geoff

Dr. Bob
Joined: 1 Apr 03
Posts: 78
Credit: 623,977
RAC: 0
United States
Message 810278 - Posted: 20 Sep 2008, 15:20:23 UTC - in response to Message 809670.  
Last modified: 20 Sep 2008, 16:19:23 UTC

Hi Joe,

Crunching my first Astropulse was indeed long, as you mentioned. It continued to list 97 hours or so until finish time. Apparently it finished, because as I checked from time to time it showed it was 97% finished. Then yesterday I noted (I had not checked for several days, having been in and out and about) that a new SETI task had started that was only 8 hours long. My credit did jump maybe 700 or so recently, but I can't find it on "Your Results", unless I just misread that or didn't see it at all.

Still, I think it may meet some needs of the researchers who want these services; no matter how long it takes on my computer, I am happy to supply the crunching time. As others mentioned here, some may find the time element burdensome, and their work schedules may really interfere with a long crunching time. Also, the desire to have confirmation of completeness of the crunching is important to many of us who like to see tasks finished. Both of these are understandable, but neither one seems to be a great problem for me, although I, too, like to see things completed.

There were no difficulties except that it did not show the screen saver as it crunched. The computer is often on at my place for 8-10 hours at a time, and still no screen saver. Now, if it is indeed back to the non-Astropulse work with the numbers to crunch, the regular screen saver does not automatically come forth as before. I can click on the screen saver button and let it show that way, and that is what I may do. It helps me to check on the computer that is crunching if I can see the screen saver as I walk by; just a little trick to see if everything is working OK on that computer.

In any event, just a status report from me that you or some others may find helpful. If you get a chance, see if I did get credit for it; I can't seem to find that, but you may be able to. Or it may be that I am in the same "space ship" that Geoff is in: I must wait for my First Space Mate to take another 97 hours (or less on a faster processor) to finish crunching the "check" on my work.

This project always makes me think about that, even in the social sciences, and that is my field. At times it is important to prove a hypothesis wrong, and if SETI falls in that category, so be it; a significant contribution still. Although I think that assertion may be premature so far. The search for intelligent life goes on, it seems, as more places in the great, great areas of the Universe are located to find radio signals from elements and/or communication attempts.

Nonetheless, I am happy to supply computer time and be a part of what I still believe is one of the greatest scientific projects in these centuries. Sorry, I got just a bit romantic about humankind's ends and techniques here and may be off task. Still, this is one of the things that makes humans remarkable, I think.

Best,

Dr. Bob

web03
Volunteer tester
Joined: 13 Feb 01
Posts: 355
Credit: 719,156
RAC: 0
United States
Message 810317 - Posted: 20 Sep 2008, 17:08:49 UTC

Dr. Bob - more than likely you were the second one in on that workunit. I dug through all of your results still visible on the website and I can't find an AP unit. If you were the second one in and you validated against the other person, I'd say that the unit has already been moved over to the science database. You could go over to one of the many stat sites to see if you jumped a bit higher than average over the past few days.

Wendy

Gustav_and_Padma
Joined: 26 Oct 03
Posts: 16
Credit: 315,654
RAC: 0
United States
Message 810733 - Posted: 21 Sep 2008, 20:37:11 UTC

We have successfully computed a few Astropulse batches of data. But we have had a few client errors as well. The first errors we created when we tried to gracefully finish up some SETI@home tasks before closing down the system to do a clean reinstall on our GenuineIntel Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz [x86 Family 6 Model 15 Stepping 13] running Windows Vista Home Premium. We were not so graceful and scrapped a few tasks in the process. Then we got things working right again and SETI@home eventually gave us a couple of other chances to run some Astropulse tasks. They completed fine, but were waiting to report forever. And we lost the data when switching users. Two great big Astropulse crunches - darn. We have more than one user on the computer, and now we are pretty sure we should just do an 'update project' when the BOINC servers are up before either "switching users" (Vista style) or logging off a user.

1) If we are careful to do this, will SETI give us a chance to do some more Astropulse crunches?

It seems almost as if SETI punishes us by giving us low point value - high CPU time tasks if we force project updates, though. Maybe it is just a coincidence. But with our wasted Astropulse data, we are forcing project updates whenever we notice more than one unreported task, as that seems to be a sensible safety precaution.

2) Is there any rhyme or reason for getting low point value - high CPU ratio tasks?

Leaps-from-Shadows
Volunteer tester
Joined: 11 Aug 08
Posts: 323
Credit: 259,220
RAC: 0
United States
Message 810962 - Posted: 22 Sep 2008, 13:59:17 UTC
Last modified: 22 Sep 2008, 14:07:12 UTC

Got something weird going on...

Check this Astropulse unit out.

We both got the same Claimed credit (764.28), and we both got the same Granted credit (0). It wasn't anywhere near the deadline.

Is it because of the Unsent server state for the third wingman?

Edit: This is the first Astropulse unit completed by Destroyer, the faster of my two crunchers. I really would like some credit for my 74 hours of crunching. I'm sure my wingman wouldn't mind having his/her credit too... :)
Cruiser
Gateway GT5692 L-f-S Edition
-Phenom X4 9650 CPU
-4GB 667MHz DDR2 RAM
-500GB SATA HD
-Vista x64 SP1
-BOINC 6.2.19 32-bit client
-SSE3 optimized 32-bit apps

web03
Volunteer tester
Joined: 13 Feb 01
Posts: 355
Credit: 719,156
RAC: 0
United States
Message 810965 - Posted: 22 Sep 2008, 14:10:42 UTC

There have been problems in the past with the validator and AP. I believe Eric runs a script to manually grant the credit.

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 811000 - Posted: 22 Sep 2008, 17:11:59 UTC - in response to Message 810962.  

No canonical result has been chosen, so the likely situation is that status in the Task details should be showing "Checked, but no consensus yet." The probable final outcome will be all three hosts getting credit.
Joe

Leaps-from-Shadows
Volunteer tester
Joined: 11 Aug 08
Posts: 323
Credit: 259,220
RAC: 0
United States
Message 811003 - Posted: 22 Sep 2008, 17:37:10 UTC

Well at least it should be quick - it was actually sent to a third computer finally, and it's a pretty fast Intel Core 2 Duo machine. It had a status of Unsent when I originally posted.
Cruiser
Gateway GT5692 L-f-S Edition
-Phenom X4 9650 CPU
-4GB 667MHz DDR2 RAM
-500GB SATA HD
-Vista x64 SP1
-BOINC 6.2.19 32-bit client
-SSE3 optimized 32-bit apps

Dr. Bob
Joined: 1 Apr 03
Posts: 78
Credit: 623,977
RAC: 0
United States
Message 811005 - Posted: 22 Sep 2008, 17:54:30 UTC - in response to Message 810317.  

Ok, thanks Wendy...

Dr. Bob

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 811197 - Posted: 23 Sep 2008, 9:46:39 UTC - in response to Message 809102.  

Just to complete the record here, apart from a minor hiccup that caused the Task to become invisible in my Task List for a short time after reporting (raised in another thread), the sequence completed exactly as Richard predicted.

The moral of this story is: if you think you have been short-changed on an AP WU, make sure that a canonical result has been identified before freaking out.

Another brownie point to Mr H.

F.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 811201 - Posted: 23 Sep 2008, 10:08:27 UTC - in response to Message 811197.  

Another brownie point to Mr H.

Why isn't my brownie point showing up in BOINCstats yet? ;-) Must be a bug - panic, panic - someone get Matt and Eric down to the lab immediately (even though it's the middle of the night).

Only joking - glad it worked out OK for you.

mjdb
Joined: 5 Jul 00
Posts: 11
Credit: 9,395,247
RAC: 0
New Zealand
Message 811662 - Posted: 24 Sep 2008, 21:51:17 UTC

Reading this thread I see that there are issues with the granting of Astropulse Credit, but this can be granted manually.

Does one have to ask for this to happen?

I recently finished an AP workunit (12 days after it was sent) and received no credit.
I note that mine was the 3rd computer, and the unit was sent out to me only a few hours before the second computer returned its result (after a month).

ap_05jn08ac_B1_P1_00019_20080812_10707.wu

http://setiathome.berkeley.edu/workunit.php?wuid=313977861
http://setiathome.berkeley.edu/result.php?resultid=985738090

Martin.


©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.