Astropulse Errors-Optimized version 5

Message boards : Number crunching : Astropulse Errors-Optimized version 5

mmonroe

Joined: 22 Oct 06
Posts: 20
Credit: 5,132,754
RAC: 0
United States
Message 838447 - Posted: 10 Dec 2008, 9:20:24 UTC - in response to Message 838379.  

OK, again at the risk of sounding stupid: I have two testers, one saying there was no optimized app involved, the other saying there was. I am really confused right now; the best thing may be to not accept AP WUs until there is some consistency about how things get handled here.
ID: 838447 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 838450 - Posted: 10 Dec 2008, 10:15:34 UTC - in response to Message 838447.  

OK, again at the risk of sounding stupid: I have two testers, one saying there was no optimized app involved, the other saying there was. I am really confused right now; the best thing may be to not accept AP WUs until there is some consistency about how things get handled here.


That's because the wingmen that have completed are using stock apps,
but the one that Joe's talking about hasn't completed YET. It IS running optimised; you can tell that by looking at the other tasks it has completed!


Claggy
ID: 838450 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 838451 - Posted: 10 Dec 2008, 10:15:57 UTC - in response to Message 838447.  
Last modified: 10 Dec 2008, 10:20:22 UTC

OK, again at the risk of sounding stupid: I have two testers, one saying there was no optimized app involved, the other saying there was. I am really confused right now; the best thing may be to not accept AP WUs until there is some consistency about how things get handled here.

It's not really that difficult, and it is consistent. But you do have to follow the details if you want an answer to your question.

Here is the WU we're talking about:

Task ID     Computer  Sent                    Reported or deadline      Server state  Outcome   Client state  CPU time (s)  Claimed credit  Granted credit
1047643474  3983344   6 Nov 2008 6:38:25 UTC  10 Nov 2008 16:48:22 UTC  Over          Success   Done          278,706.50    760.43          0.00
1047643475  4646024   6 Nov 2008 6:38:25 UTC  6 Dec 2008 6:38:25 UTC    Over          No reply  New           0.00          ---             ---
1079052754  4570645   6 Dec 2008 6:38:29 UTC  9 Dec 2008 2:54:40 UTC    Over          Success   Done          203,751.10    756.67          0.00
1082041232  4009342   9 Dec 2008 2:54:48 UTC  8 Jan 2009 2:54:48 UTC    In progress   ---       New           ---           ---             ---

The first line is you. You used the stock app. You did the work between 6 November and 10 November. At that time, v4.36 was the right and current application to use.

The second line - your original 'wingman' - bailed out. We can ignore him/her.

The third line is your replacement wingman. As I told you, they also used the stock app. But they did the work between 6 December and 9 December - a month later than you. At that time, Berkeley was (and is) sending out v5.00 as the stock app.

So the two results didn't validate, and the job was sent out to a fourth computer. This is the one Joe told you about. Although we can't be certain at this stage what application he or she will use on your particular job, they seem to be using the v5.00 optimised app for crunching exclusively AP tasks - so it's a pretty safe bet that they will do the same for yours, as Joe said. He has summarised the options quite thoroughly.

There's no reason to stop processing AP work now, because of a problem which happened a month ago and won't happen again. If you allow a new AP task on your computer, you will get the current AP v5.00 application from Berkeley automatically, and your work will validate against anyone else's work run with v5.00, whether stock or optimised. There is still a small chance (decreasing every day) that you will be asked to validate an old task, run a month ago with v4.36 - in that case you may (as in this case) have to wait for a fourth, fifth, etc. wingman to finally resolve the matter: but you'll be on the winning side. And there's just a tiny possibility that you'll come up against a wingman running an optimised v4.35 who hasn't upgraded yet: you'll be on the winning side of the resend again, but if I see anyone still doing that after all the conversations we've had about upgrading in the last three weeks, I will personally go round and stuff their obsolete app where the sun don't shine!
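
For anyone who finds it easier to follow as code, the resend logic described above can be sketched roughly as below. This is only a toy model, not the project's validator: the `Result` class, `results_agree` and `find_quorum` are hypothetical names, and the rule that results agree whenever the application's major version matches is a simplification of this one case. The fourth task is assumed to finish with the optimised v5.00 app, which is not certain yet.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Result:
    task_id: int
    app_version: str  # e.g. "4.36" or "5.00"
    optimised: bool


def major(version: str) -> int:
    """Return the major version number, e.g. 5 for "5.00"."""
    return int(version.split(".")[0])


def results_agree(a: Result, b: Result) -> bool:
    # ASSUMPTION (toy rule): results agree when the major app version matches,
    # whether the app was stock or optimised.
    return major(a.app_version) == major(b.app_version)


def find_quorum(results: List[Result], quorum: int = 2) -> Optional[List[Result]]:
    """Return the first group of `quorum` agreeing results, or None."""
    for i, first in enumerate(results):
        group = [first] + [r for r in results[i + 1:] if results_agree(first, r)]
        if len(group) >= quorum:
            return group[:quorum]
    return None


# Task IDs are taken from the workunit listing above.
first = Result(1047643474, "4.36", optimised=False)   # stock v4.36, November
third = Result(1079052754, "5.00", optimised=False)   # stock v5.00, December
fourth = Result(1082041232, "5.00", optimised=True)   # ASSUMED outcome, still in progress

before = find_quorum([first, third])
after = find_quorum([first, third, fourth])
print(before)
print([r.task_id for r in after])
```

With only the first and third results there is no agreeing pair, so the job goes out again; once a second v5.00 result arrives, the two v5.00 results form the quorum.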
ID: 838451 · Report as offensive
mmonroe

Joined: 22 Oct 06
Posts: 20
Credit: 5,132,754
RAC: 0
United States
Message 838473 - Posted: 10 Dec 2008, 13:06:14 UTC - in response to Message 838451.  

Thanks, guys, for your explanation. I guess this just proves the problem with running all these different flavors of applications with no consistent testing and production implementation methodology. A faithful cruncher like myself, who is not technical but wants to help things here, gets screwed because all the techies want to use this as their personal sandbox to try out the latest and greatest. I am still puzzled by the lack of backward compatibility between the stock 4.36 and 5.00, and that someone did not see the potential issue with the length of the WUs affecting the reporting. Neither I nor my wingman did anything wrong, yet here we stand.
ID: 838473 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 838479 - Posted: 10 Dec 2008, 13:36:10 UTC - in response to Message 838473.  

Thanks, guys, for your explanation. I guess this just proves the problem with running all these different flavors of applications with no consistent testing and production implementation methodology. A faithful cruncher like myself, who is not technical but wants to help things here, gets screwed because all the techies want to use this as their personal sandbox to try out the latest and greatest. I am still puzzled by the lack of backward compatibility between the stock 4.36 and 5.00, and that someone did not see the potential issue with the length of the WUs affecting the reporting. Neither I nor my wingman did anything wrong, yet here we stand.

Yes, that's exactly where we stand. Frustrating and irritating, but we'll get over it.

In fact, the incompatibility was foreseeable and foreseen - see the Technical News for 14 November. The relevant people wanted to avoid the validation problem, but for some reason - and we suspect cock-up, and the untimely intervention of Murphy's Law, rather than conspiracy - what happened in reality didn't match the intention.
ID: 838479 · Report as offensive
mmonroe

Joined: 22 Oct 06
Posts: 20
Credit: 5,132,754
RAC: 0
United States
Message 838507 - Posted: 10 Dec 2008, 14:50:04 UTC - in response to Message 838479.  

Thank you for your reply, Richard, but I do not think I will take your stance to get over it. As I said earlier, this was bolloxed (now there's a fine British word) due to a rush to get the latest and greatest into test without the proper precautions. This seems to me to be a "look at me, we're right there" mentality.

I have been here through outages and understand the issue of uptime on a project like this and do not have a problem with it; I do however draw the line at sloppy process where things could have been avoided (WUs timing out is not an unknown circumstance here) or lazy thinking which obviously happened here to satisfy the small minority of vocal super-crunchers.
ID: 838507 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 838510 - Posted: 10 Dec 2008, 15:03:54 UTC
Last modified: 10 Dec 2008, 15:04:22 UTC

I see no optimised app issue here; you are in the wrong thread. Please take your complaint to the stock thread. We did all we could.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 838510 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 838519 - Posted: 10 Dec 2008, 15:39:27 UTC - in response to Message 838507.  

.... to satisfy the small minority of vocal super-crunchers.

Actually, no.

There are two possible sub-sets of vocal super-crunchers you might be referring to:

1) People like Jason, Joe and me, who volunteer to help with the smooth running of the project, help users like you, and contribute in our various ways to the existence of the optimised applications. We were arguing just as vocally as you that the upgrade roll-out was flawed and should be planned and managed better.

2) People who want to amass the greatest number of BOINC credits possible. Most of them don't want to run Astropulse tasks at all, because they cause short-term depression to their RAC - and hence, by extension, to the RAC's owner. They didn't want the new version rolled out either - most of them probably hoped it would just go quietly away.

Although I can't find a public citation for this, I have reason to believe that the major reason for the urgency was the need to complete and referee Joshua Von Korff's PhD thesis. Although Josh's account operates in stealth mode here, he is the programmer behind the current AP applications, and his account is listed for bug reports on the Astropulse FAQ page. However, don't be hard on him for the deployment issues: that would be the responsibility of the staff scientists who supervise his PhD research.
ID: 838519 · Report as offensive
mmonroe

Joined: 22 Oct 06
Posts: 20
Credit: 5,132,754
RAC: 0
United States
Message 838536 - Posted: 10 Dec 2008, 16:23:22 UTC - in response to Message 838519.  

Thank you for your candor and openness, Richard. I will now leave this thread and go where I belong.
ID: 838536 · Report as offensive
Gustav_and_Padma
Joined: 26 Oct 03
Posts: 16
Credit: 315,654
RAC: 0
United States
Message 839995 - Posted: 14 Dec 2008, 21:06:44 UTC - in response to Message 838510.  

Where is the thread for non-optimised AP issues? The one we had been using has a final post referring all AP issues here. But here there seems to be some problem with that. Maybe you guys would just rather not hear from us at all. Or maybe somebody should start another thread?

We are just trying to figure out if the apps we are running make sense to use. Like this last one,
http://setiathome.berkeley.edu/result.php?resultid=1051552517
by our wingman who reported it using some form of AP ver 4.35, but his result doesn't look like the ones we ran with version 4.36 of Astropulse at all. (We are running Standard AP Ver 5, and that on only one machine - not optimised). So a third wingman has apparently been called in. Ok, that makes sense to casual participants like us.

It just seems odd that sometimes discrepancies require a third wingman and credit remains 'pending', while in other circumstances credit is granted as zero. And, as we've stated before, the results do not, in all cases, remain visible for 24 hours after granting credit. Thus, we are never really sure what happens to some of our efforts.

But some comments we've received back about posts in which we have mentioned this make us sound like morons. And all we are trying to do is provide observations that might be helpful to whoever might understand/debug how the validation and reporting software really works.

It is understandable that, as complicated as the validation coding must certainly be, some sequences of events might trigger inconsistent reporting protocols. People with marginal programming experience (like us) can understand how that could happen. In fact folks like us can probably understand it better than one might guess, lol.

We don't really care if there are inconsistencies in the reporting protocols. Everybody wants their brownie points, of course. But we think, for typical users to remain motivated to donate computing time to this research project, well-informed responses to posts containing questions or comments are of greater benefit to the project.

For us, we would love to hear something like, 'your point is something we are working on, but the current programming priority is to maintain accuracy in the science database' (translation: we are swamped grading final exams right now - don't bother us about your brownie points). :)

So which thread should a post like this go to?, =/null?
ID: 839995 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 840051 - Posted: 15 Dec 2008, 0:16:49 UTC - in response to Message 839995.  

Where is the thread for non-optimised AP issues? The one we had been using has a final post referring all AP issues here. But here there seems to be some problem with that. Maybe you guys would just rather not hear from us at all. Or maybe somebody should start another thread?


There were complaints about too many sticky threads, so an attempt was made to merge similar topics. I think you've chosen the right thread for your post even though the topic is not quite as general as it should be.

We are just trying to figure out if the apps we are running make sense to use. Like this last one,
http://setiathome.berkeley.edu/result.php?resultid=1051552517
by our wingman who reported it using some form of AP ver 4.35, but his result doesn't look like the ones we ran with version 4.36 of Astropulse at all. (We are running Standard AP Ver 5, and that on only one machine - not optimised). So a third wingman has apparently been called in. Ok, that makes sense to casual participants like us.


The stderr part of a Task details page is separate from the actual result which was uploaded. The optimized 4.35 stderr is indeed quite different from stock 4.36, but the results are intended to be the same.

That task sent 9 Nov 2008 was appropriately done with 4.x, as was your use of 5.00 for the task sent 10 Dec 2008. The host working on the reissued task now has already done another AP task with 5.00, so should match your result.

It just seems odd that sometimes discrepancies require a third wingman and credit remains 'pending', while in other circumstances credit is granted as zero. And, as we've stated before, the results do not, in all cases, remain visible for 24 hours after granting credit. Thus, we are never really sure what happens to some of our efforts.

But some comments we've received back about posts in which we have mentioned this make us sound like morons. And all we are trying to do is provide observations that might be helpful to whoever might understand/debug how the validation and reporting software really works.

It is understandable that, as complicated as the validation coding must certainly be, some sequences of events might trigger inconsistent reporting protocols. People with marginal programming experience (like us) can understand how that could happen. In fact folks like us can probably understand it better than one might guess, lol.

We don't really care if there are inconsistencies in the reporting protocols. Everybody wants their brownie points, of course. But we think, for typical users to remain motivated to donate computing time to this research project, well-informed responses to posts containing questions or comments are of greater benefit to the project.

For us, we would love to hear something like, 'your point is something we are working on, but the current programming priority is to maintain accuracy in the science database' (translation: we are swamped grading final exams right now - don't bother us about your brownie points). :)

So which thread should a post like this go to?, =/null?

You are definitely not the only ones puzzled about what the project was and is doing. In the early days of SETI@home there was quite a lot of information and the project was adequately funded. Now the funding is barely enough to keep it running at all, and information is minimal. In my view those relationships basically define the situation.
                                                               Joe
ID: 840051 · Report as offensive
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 840752 - Posted: 17 Dec 2008, 5:12:27 UTC
Last modified: 17 Dec 2008, 5:13:57 UTC

I've started my first AP WU with the optimized 5.0 Linux app. It should take about the same time (115 hours) as the stock 4.6 app, while the time of the untested 4.6 optimized app was about 56 hours. Two of the WUs crunched with it are still pending, awaiting the results of a third and a fifth wingman respectively. The fourth wingman has errored out (compute error) using a stock 5.0 Windows app, and I checked that he has errored all his AP WUs, so maybe he has a problem on his PC. My CPU is an Opteron 1210 at 1.8 GHz; my Linux is SuSE 10.3.
ID: 840752 · Report as offensive
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 840760 - Posted: 17 Dec 2008, 5:40:36 UTC - in response to Message 840752.  

That is not an optimized app for Linux; Crunch3r is still working on it for the time being.

It is the stock AP app until one can be built. It was felt that, for those who want to run AP along with optimized MB, it was better to have the packages include stock for now.

ID: 840760 · Report as offensive
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 840762 - Posted: 17 Dec 2008, 5:47:07 UTC

Thanks for the explanation. I was wondering about the optimization level. There is none.
Tullio
ID: 840762 · Report as offensive
Gustav_and_Padma
Joined: 26 Oct 03
Posts: 16
Credit: 315,654
RAC: 0
United States
Message 840839 - Posted: 17 Dec 2008, 14:48:24 UTC - in response to Message 840051.  

The stderr part of a Task details page is separate from the actual result which was uploaded. The optimized 4.35 stderr is indeed quite different from stock 4.36, but the results are intended to be the same.


You are definitely not the only ones puzzled about what the project was and is doing. In the early days of SETI@home there was quite a lot of information and the project was adequately funded. Now the funding is barely enough to keep it running at all, and information is minimal. In my view those relationships basically define the situation.
                                                               Joe


Thank You Josef

We will continue our support however small it may be. If others do so too, hopefully this research shall continue. And, someday in the future people will look back and be grateful for our eccentric behavior.
ID: 840839 · Report as offensive
Gustav_and_Padma
Joined: 26 Oct 03
Posts: 16
Credit: 315,654
RAC: 0
United States
Message 841578 - Posted: 18 Dec 2008, 19:42:06 UTC - in response to Message 840762.  

This may be too specific for this thread. But, just as a follow-up, it looks like the original wingman's machine running optimized ver 4.35 (Task ID = 1051552517, http://setiathome.berkeley.edu/result.php?resultid=1051552517) ran in only just over half the CPU time of each of us running Standard AP Ver 5, both on almost identical machines, although the third wingman is running XP, not Vista like ours; his CPU time was shorter than ours too. The first wingman -did- get credit, which is a good thing. Yet his validation was delayed for some time after the 2nd and 3rd wingmen received credit. All three of our crunches finally validated and received the same credit. However, it is still comparing apples and oranges, since the first wingman is running a 64 bit processor and we (the 2nd and 3rd wingmen completing the WU) have floating point calculations only about 2/3rds as fast as the first wingman (Thomas Heutinger and Friends). Still, the overall conclusion would appear to be that the optimized AP crunch was, in fact, a more efficient run.
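
As a rough check on that apples-and-oranges comparison, the arithmetic can be sketched as below. The figures are only the approximations quoted above (about half the CPU time, about 2/3 the floating point speed), the function name is made up for illustration, and it assumes that run time scales inversely with floating point speed, which is a simplification.

```python
def speed_adjusted_ratio(cpu_time_ratio: float, relative_fp_speed: float) -> float:
    """Ratio of optimised to stock run time after allowing for hardware speed.

    cpu_time_ratio: optimised host CPU time / stock host CPU time.
    relative_fp_speed: stock host FP speed / optimised host FP speed.
    ASSUMPTION: run time is inversely proportional to FP speed.
    """
    # On the slower stock host the optimised app would take
    # 1 / relative_fp_speed times longer, so scale its time up by that much.
    return cpu_time_ratio / relative_fp_speed


# Approximate figures from the post: ~0.5 of the CPU time, ~2/3 of the FP speed.
adjusted = speed_adjusted_ratio(0.5, 2 / 3)
print(f"speed-adjusted time ratio: {adjusted:.2f}")
```

On those assumptions the optimized app would still need only about three quarters of the stock app's time on equal hardware, so part of the gain is the faster machine and part is the app.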
ID: 841578 · Report as offensive
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 843109 - Posted: 21 Dec 2008, 18:08:48 UTC

http://setiathome.berkeley.edu/workunit.php?wuid=366408834

There must not have been any blanking in this one, as a 4.35 result just validated against my 5.00.

ID: 843109 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 843113 - Posted: 21 Dec 2008, 18:15:04 UTC - in response to Message 843109.  

Oy! That's not an error, LoL
ID: 843113 · Report as offensive
Adam Alexander
Joined: 24 Dec 07
Posts: 3
Credit: 1,039,567
RAC: 0
United States
Message 843325 - Posted: 22 Dec 2008, 1:16:03 UTC - in response to Message 837034.  

This thread is to post errors and address concerns regarding the new Optimised AP v5.00 Astropulse application.

Please do not open new threads but post errors and commentary here.

Please, upgrade your optimized AP version to ap_5.00r69 !

Thanks!


I've got an AP 5.0 wu running that has time to completion as 450 hours on a Core 2 Quad Q6700. Is that normal? I've never noticed a SETI work unit that took that long to crunch.

Running:
2 x Intel Core 2 Quad CPU Q6700 @ 2.66GHz
PS3PF
Intel Core 2 Duo CPU T5750 @ 2.00GHz
Intel P4 CPU 2.40GHz
I
ID: 843325 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 843389 - Posted: 22 Dec 2008, 2:30:43 UTC - in response to Message 843325.  

I've got an AP 5.0 wu running that has time to completion as 450 hours on a Core 2 Quad Q6700. Is that normal? I've never noticed a SETI work unit that took that long to crunch.

No, it's because you've been running CUDA setiathome_enhanced work and the Duration Correction Factor is high. Crunch time will probably be a tenth or less of that estimate. Astropulse does take considerably longer than setiathome_enhanced though, even on hosts running both on the CPU.
                                                              Joe
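
To put rough numbers on that: the BOINC client scales its raw run-time estimate by the project's Duration Correction Factor (DCF), so a DCF pushed up by other work inflates every estimate. A minimal sketch follows, where the DCF of 10 is a made-up value chosen only to match "a tenth or less", and the function names are illustrative, not BOINC's own.

```python
def displayed_hours(raw_hours: float, dcf: float) -> float:
    """Estimate shown to the user: raw estimate scaled by the DCF."""
    return raw_hours * dcf


def raw_hours_from_display(shown_hours: float, dcf: float) -> float:
    """Undo the DCF scaling to recover the raw estimate."""
    return shown_hours / dcf


shown = 450.0        # time to completion reported in the question
assumed_dcf = 10.0   # HYPOTHETICAL value, not read from any real host

likely = raw_hours_from_display(shown, assumed_dcf)
print(f"likely crunch time: about {likely:.0f} hours or less")
```

The DCF should come back down as tasks finish faster than estimated, so later estimates get closer to reality.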
ID: 843389 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.