AstroPulse Ghost WUs !!!

Message boards : Number crunching : AstroPulse Ghost WUs !!!

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 791984 - Posted: 3 Aug 2008, 12:11:18 UTC

I'm wondering what we can do about this. As things stand, I can't see Eric fixing the scheduler code until at least Outage Tuesday (he's up to his neck in the politics thing, and he absolutely needs to have a long sleep and a long walk in the hills before doing any coding - otherwise more mistakes will be made).

Probably, the best thing would be to turn off the AP splitters for the time being. There are now a fair number of AP tasks out for crunching - I got my first for my Xeon with dual-mode app_info an hour ago, so at least that bit is working. Let's wait for those to come back, check the results, and evaluate the whole situation. Then, it'll be the time to debug the scheduler and restart AP for real, long-term, science.

In the meantime, I'm wondering about deliberately sabotaging my app_info.xml on hosts that wouldn't normally get AP work - for example, the 1.8 GHz P4 server I wrote up here. It would normally not ask for AP work, because it has a high resource share for Einstein - but Einstein is down for maintenance this weekend, and it's dry. (Who mentioned a perfect storm?)

We know how to sabotage it - just put that pesky space in the <main_program /> tag! If that generated an immediate error, rather than a ghost WU, the long-term result would be the same, but the re-issue would happen after a few seconds, rather than a month. And it wouldn't put any extra load on the BOINC splitters and database (generating a new task from an existing WU is a much lighter task than generating the WU in the first place). The only downside would be an increase in downloaded data: potential ghost WUs would be downloaded twice, rather than not at all at present.
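For anyone who wants to check whether an app_info.xml already contains that space (deliberately or by accident), here is a minimal Python sketch. It only looks for the spaced form of the tag; it assumes, as described above, that the spaced form is the one the client trips over. The file name in the sample fragments is hypothetical.

```python
import re

# Matches <main_program /> written with whitespace before the slash,
# i.e. the "pesky space" form described in the post above.
SPACED_TAG = re.compile(r"<main_program\s+/>")


def has_spaced_main_program(app_info_text: str) -> bool:
    """Return True if the text contains <main_program /> with a space."""
    return SPACED_TAG.search(app_info_text) is not None


# Hypothetical fragments, for illustration only.
good = "<file_ref><file_name>example_app.exe</file_name><main_program/></file_ref>"
bad = "<file_ref><file_name>example_app.exe</file_name><main_program /></file_ref>"

print(has_spaced_main_program(good))  # False
print(has_spaced_main_program(bad))   # True
```

To check a real file, read its contents into a string and pass that to `has_spaced_main_program`.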

Thoughts, anyone? Please don't rush to do this before we've had a chance to think about it - there may be other drawbacks that haven't occurred to me yet. The last thing we want to do at the moment is to put extra stress on either the staff or the servers at Berkeley.
ID: 791984 · Report as offensive
geoff

Joined: 25 Apr 00
Posts: 123
Credit: 34,100,351
RAC: 18
United Kingdom
Message 791985 - Posted: 3 Aug 2008, 12:12:57 UTC
Last modified: 3 Aug 2008, 12:14:47 UTC

My AstroPulse Ghost WU downloaded last week is in my folder C:\Program Files\BOINC\projects\setiathome.berkeley.edu and is 8196KB but it is not listed in BOINC Tasks so will never run. I have added the AP files and modified app_info.xml but not sure if I want to run AstroPulse at the present time.
ID: 791985 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 791991 - Posted: 3 Aug 2008, 12:24:23 UTC - in response to Message 791949.  


939358121


Thanks Richard. How did you find it? Is there a search function on the website for them, or did you go through all the tasks?
BOINC blog
ID: 791991 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 791992 - Posted: 3 Aug 2008, 12:24:46 UTC - in response to Message 791985.  

My AstroPulse Ghost WU downloaded last week is in my folder C:\Program Files\BOINC\projects\setiathome.berkeley.edu and is 8196KB but it is not listed in BOINC Tasks so will never run. I have added the AP files and modified app_info.xml but not sure if I want to run AstroPulse at the present time.

If it downloaded, then it's a different sort of ghost from the ones we're discussing here.

With a cache that size, I wonder how you know whether a particular task is there or not ;-) ! But I can't find it either: what's the timestamp on that 8196KB file?
ID: 791992 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 791994 - Posted: 3 Aug 2008, 12:27:23 UTC - in response to Message 791991.  


939358121


Thanks Richard. How did you find it? Is there a search function on the website for them, or did you go through all the tasks?

My own patent search strategy!

Actually, just eyeball the task listing, and focus on the ones where the deadline is out of line with its neighbours. Just at the moment, it's easy because the AP tasks run until September, and almost all the MB work is expiring in August.
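The same eyeball test can be sketched in Python: flag any task whose deadline sits far from the median deadline of the list. This is only an illustration under assumptions. The task names, the dates and the 7-day tolerance are made up, and how you get the task list off the website is left open.

```python
from datetime import date
from statistics import median


def deadline_outliers(tasks, tolerance_days=7):
    """Return ids of tasks whose deadline is far from the median deadline.

    tasks is a list of (task_id, deadline) pairs; tolerance_days is an
    arbitrary threshold chosen for this example.
    """
    ordinals = [deadline.toordinal() for _, deadline in tasks]
    mid = median(ordinals)
    return [
        task_id
        for task_id, deadline in tasks
        if abs(deadline.toordinal() - mid) > tolerance_days
    ]


# Made-up task list: MB work expiring in August, one AP task in September.
tasks = [
    ("mb_1", date(2008, 8, 18)),
    ("mb_2", date(2008, 8, 20)),
    ("mb_3", date(2008, 8, 21)),
    ("mb_4", date(2008, 8, 23)),
    ("ap_1", date(2008, 9, 2)),
]

print(deadline_outliers(tasks))  # ['ap_1']
```

This only narrows the list down to candidates; each flagged task still has to be checked by hand.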
ID: 791994 · Report as offensive
geoff

Joined: 25 Apr 00
Posts: 123
Credit: 34,100,351
RAC: 18
United Kingdom
Message 792001 - Posted: 3 Aug 2008, 12:49:34 UTC - in response to Message 791992.  
Last modified: 3 Aug 2008, 13:08:42 UTC

My AstroPulse Ghost WU downloaded last week is in my folder C:\Program Files\BOINC\projects\setiathome.berkeley.edu and is 8196KB but it is not listed in BOINC Tasks so will never run. I have added the AP files and modified app_info.xml but not sure if I want to run AstroPulse at the present time.

If it downloaded, then it's a different sort of ghost from the ones we're discussing here.

With a cache that size, I wonder how you know whether a particular task is there or not ;-) ! But I can't find it either: what's the timestamp on that 8196KB file?


This WU is http://setiathome.berkeley.edu/workunit.php?wuid=307558020 and, as you say, it is probably not a ghost and will eventually be processed. It is listed in BOINC under Application; I was looking under Project.

I was confused because I did get another AP WU which I think is a ghost: it is not listed, and it was received before I added the AP files and app_info.xml. http://setiathome.berkeley.edu/workunit.php?wuid=306871624
ID: 792001 · Report as offensive
Profile Mumps [MM]
Volunteer tester
Joined: 11 Feb 08
Posts: 4454
Credit: 100,893,853
RAC: 30
United States
Message 792062 - Posted: 3 Aug 2008, 14:48:43 UTC

Well, this definitely looks like a scheduler problem. I've got all my machines running the Opt App, and the only one I've configured to try getting AP hasn't seen a single AP WU, but one of the others just had an AP WU scheduled for it. No way that should have happened...

4415467 Should be getting AP
4254505 Should not be getting AP
ID: 792062 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 792083 - Posted: 3 Aug 2008, 15:57:38 UTC

I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 792083 · Report as offensive
gomeyer
Volunteer tester

Joined: 21 May 99
Posts: 488
Credit: 50,370,425
RAC: 0
United States
Message 792095 - Posted: 3 Aug 2008, 16:28:47 UTC - in response to Message 792083.  

I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache.

That is correct. These appear as being issued in the Tasks list online, but do not appear to be actually downloaded to the host. I also have at least one.
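Put another way, a ghost is whatever the website lists for a host that the client itself has no record of. A minimal Python sketch of that comparison, assuming you have already collected the two lists of task names by hand (the names below are hypothetical):

```python
def find_ghosts(server_task_names, local_task_names):
    """Return tasks listed on the website that the client does not have."""
    return sorted(set(server_task_names) - set(local_task_names))


# Hypothetical task names, for illustration only.
server = ["mb_task_a", "mb_task_b", "ap_task_c"]  # as shown in the online Tasks list
local = ["mb_task_a", "mb_task_b"]                # as shown in the client's task list

print(find_ghosts(server, local))  # ['ap_task_c']
```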
ID: 792095 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 792098 - Posted: 3 Aug 2008, 16:36:37 UTC - in response to Message 792083.  
Last modified: 3 Aug 2008, 16:40:02 UTC

I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache.


@msattler : Here's one of your AP WUs... Have you changed your app_info.xml to understand how to crunch ?

Son-of-a-gun... the only rig I didn't update app_info.xml... my 1.6GHz single core laptop got a ghost AP WU. I will not run AP on my laptop... so I fear more ghosts will follow.
ID: 792098 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 792130 - Posted: 3 Aug 2008, 17:36:57 UTC - in response to Message 792098.  

Son-of-a-gun... the only rig I didn't update app_info.xml... my 1.6GHz single core laptop got a ghost AP WU. I will not run AP on my laptop... so I fear more ghosts will follow.

Unless you install a deliberately-sabotaged app_info, as suggested in my message 791984. What say you?
ID: 792130 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 792133 - Posted: 3 Aug 2008, 17:41:06 UTC - in response to Message 792098.  

I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache.


@msattler : Here's one of your AP WUs... Have you changed your app_info.xml to understand how to crunch ?

Son-of-a-gun... the only rig I didn't update app_info.xml... my 1.6GHz single core laptop got a ghost AP WU. I will not run AP on my laptop... so I fear more ghosts will follow.

No, I did not update anything, as I really did not want to decide on running AP until it was out for a while, any bugs had been worked out, credit compared to MB was established, and you optimizing folks had a chance to take a swing at an optimized AP app.

And if I understand the situation correctly at this time, this should NOT be happening, but is due to a bug in the scheduler that Eric will probably look at as soon as he is not up to his eyeballs in non-science issues.

So for now, it was not my intent to do any AP. I would prefer not to go through the trouble of installing a crippled AP app_info just to send the AP WUs back as client errors....hopefully Eric takes note of this thread and has the time to try to sort the problem at the scheduler level.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 792133 · Report as offensive
Profile RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 792146 - Posted: 3 Aug 2008, 17:52:25 UTC

Bummer. I've got at least 3 AP ghosts. All systems (that run SETI) use Opt Apps and none have had their app_info.xml updated to crunch AP.
ID: 792146 · Report as offensive
Profile JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 792159 - Posted: 3 Aug 2008, 18:02:23 UTC - in response to Message 792130.  
Last modified: 3 Aug 2008, 18:14:10 UTC

Son-of-a-gun... the only rig I didn't update app_info.xml... my 1.6GHz single core laptop got a ghost AP WU. I will not run AP on my laptop... so I fear more ghosts will follow.

Unless you install a deliberately-sabotaged app_info, as suggested in my message 791984. What say you?

Sounds like a solid solution for the laptop... better to cancel WUs upon issue than to create more 30-day ghosts.

Can we say Ghost Busters ;-?
ID: 792159 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 792229 - Posted: 3 Aug 2008, 19:50:58 UTC

Eric is now aware of the situation and will be looking into it.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 792229 · Report as offensive
Profile Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 792669 - Posted: 4 Aug 2008, 12:48:38 UTC

ID: 792669 · Report as offensive
Profile Robert Gammon
Volunteer tester

Joined: 29 Aug 01
Posts: 21
Credit: 1,573,250
RAC: 0
United States
Message 792686 - Posted: 4 Aug 2008, 13:59:51 UTC - in response to Message 792159.  

Sounds like a solid solution for the laptop... better to cancel WUs upon issue than to create more 30-day ghosts.

Can we say Ghost Busters ;-?


I too have a ghost: the web site says I have the WU, but the file does not exist on the PC. I also see the State File Error messages in the log, relating to no astropulse and a WU that cannot be processed.

So the question is, how to cancel this WU, given that the localhost has NO knowledge of this wu?

On another topic, SETI has been running dry this week after the outage. It appears that the back-office functions (analysis, result merge with database, purge results, purge WUs) are deluged, and that is slowing down the splitters.

ID: 792686 · Report as offensive
OzzFan Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 792688 - Posted: 4 Aug 2008, 14:09:09 UTC - in response to Message 792686.  

So the question is, how to cancel this WU, given that the localhost has NO knowledge of this wu?


There is no way to cancel the workunit on your end short of doing a detach, which will also rid you of all your current work too. Sort of like destroying your house to fix a broken window.

The general way of dealing with these is to simply let them time out on their own. Once the deadline is reached, it will be sent to another person (and hopefully not be a ghost next time).
ID: 792688 · Report as offensive
Dave Stegner
Volunteer tester
Joined: 20 Oct 04
Posts: 540
Credit: 65,583,328
RAC: 27
United States
Message 792886 - Posted: 4 Aug 2008, 22:15:16 UTC

Looking around, I find that a number of my 11 rigs have been assigned AP WUs. All of them are running the AK8 optimized app and none of them has been updated for AP.



Message 792229 above says Eric is aware and looking into it.

Is anyone aware of an answer yet?

Does anyone have any suggestions on my course of action? Crunching with a RAC of approx. 10K, I will be creating many ghost WUs pretty quickly.





Dave

ID: 792886 · Report as offensive
Profile Labbie
Joined: 19 Jun 06
Posts: 4083
Credit: 5,930,102
RAC: 0
United States
Message 792919 - Posted: 4 Aug 2008, 23:08:38 UTC

This makes two for me here. Just received it a few minutes ago.

Calm Chaos Forum...Join Calm Chaos Now
ID: 792919 · Report as offensive
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.