AstroPulse Ghost WUs !!!
Author | Message |
---|---|
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
I'm wondering what we can do about this. As things stand, I can't see Eric fixing the scheduler code until at least Outage Tuesday (he's up to his neck in the politics thing, and he absolutely needs to have a long sleep and a long walk in the hills before doing any coding - otherwise more mistakes will be made). Probably, the best thing would be to turn off the AP splitters for the time being. There are now a fair number of AP tasks out for crunching - I got my first for my Xeon with dual-mode app_info an hour ago, so at least that bit is working. Let's wait for those to come back, check the results, and evaluate the whole situation. Then, it'll be the time to debug the scheduler and restart AP for real, long-term, science. In the meantime, I'm wondering about deliberately sabotaging my app_info.xml on hosts that wouldn't normally get AP work - for example, the 1.8 GHz P4 server I wrote up here. It would normally not ask for AP work, because it has a high resource share for Einstein - but Einstein is down for maintenance this weekend, and it's dry. (Who mentioned a perfect storm?) We know how to sabotage it - just put that pesky space in the <main_program /> tag! If that generated an immediate error, rather than a ghost WU, the long-term result would be the same, but the re-issue would happen after a few seconds, rather than a month. And it wouldn't put any extra load on the BOINC splitters and database (generating a new task from an existing WU is a much lighter task than generating the WU in the first place). The only downside would be an increase in downloaded data: potential ghost WUs would be downloaded twice, rather than not at all at present. Thoughts, anyone? Please don't rush to do this before we've had a chance to think about it - there may be other drawbacks that haven't occurred to me yet. The last thing we want to do at the moment is to put extra stress on either the staff or the servers at Berkeley. |
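The "pesky space" mentioned above is the whitespace in `<main_program />` as opposed to `<main_program/>`. A minimal sketch of how one might scan an app_info.xml for that pattern, whether to introduce it deliberately or to rule it out as an accident, is below. The helper name `find_spaced_empty_tags` and the sample fragment (including the file name in it) are hypothetical and only for illustration; this does not reproduce how the BOINC client itself parses the file.

```python
import re

# Matches an attribute-less empty-element tag with whitespace before "/>",
# e.g. "<main_program />" but not "<main_program/>".
SPACED_EMPTY_TAG = re.compile(r"<([A-Za-z_][\w.-]*)\s+/>")


def find_spaced_empty_tags(xml_text):
    """Return (line_number, tag_name) for each empty tag written with a space before '/>'."""
    hits = []
    for line_no, line in enumerate(xml_text.splitlines(), start=1):
        for match in SPACED_EMPTY_TAG.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits


# Hypothetical fragment for illustration only; not a real app_info.xml.
SAMPLE = """<file_ref>
    <file_name>example_app.exe</file_name>
    <main_program />
</file_ref>"""

if __name__ == "__main__":
    for line_no, tag in find_spaced_empty_tags(SAMPLE):
        print(f"line {line_no}: <{tag} /> has a space before '/>'")
```

The check is purely textual on purpose: a standard XML parser treats both spellings as the same element, so it could not tell them apart.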
geoff Send message Joined: 25 Apr 00 Posts: 123 Credit: 34,100,351 RAC: 18 |
My AstroPulse Ghost WU downloaded last week is in my folder C:\Program Files\BOINC\projects\setiathome.berkeley.edu and is 8196KB but it is not listed in BOINC Tasks so will never run. I have added the AP files and modified app_info.xml but not sure if I want to run AstroPulse at the present time. |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Thanks Richard. How did you find it? Is there a search function on the website for them, or did you go through all the tasks? BOINC blog |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"My AstroPulse Ghost WU downloaded last week is in my folder C:\Program Files\BOINC\projects\setiathome.berkeley.edu and is 8196KB but it is not listed in BOINC Tasks so will never run. I have added the AP files and modified app_info.xml but not sure if I want to run AstroPulse at the present time." If it downloaded, then it's a different sort of ghost from the ones we're discussing here. With a cache that size, I wonder how you know whether a particular task is there or not ;-) ! But I can't find it either: what's the timestamp on that 8196KB file? |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
My own patent search strategy! Actually, just eyeball the task listing, and focus on the ones where the deadline is out of line with its neighbours. Just at the moment, it's easy because the AP tasks run until September, and almost all the MB work is expiring in August. |
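The eyeballing approach described above (look for tasks whose deadline is out of line with their neighbours) could be sketched roughly as follows. This assumes the task list has already been copied by hand into (name, deadline) pairs; the function name, the 14-day threshold, and all task names and dates are invented for illustration and do not come from the project's website or any BOINC API.

```python
from datetime import date
from statistics import median


def deadline_outliers(tasks, min_gap_days=14):
    """Return names of tasks whose deadline is at least min_gap_days from the median deadline."""
    if not tasks:
        return []
    # Compare deadlines as day counts so the median is easy to compute.
    ordinals = [deadline.toordinal() for _, deadline in tasks]
    mid = median(ordinals)
    return [name for name, deadline in tasks
            if abs(deadline.toordinal() - mid) >= min_gap_days]


# Hypothetical task list: names and dates are invented for illustration.
TASKS = [
    ("mb_task_1", date(2008, 8, 10)),
    ("mb_task_2", date(2008, 8, 12)),
    ("mb_task_3", date(2008, 8, 15)),
    ("ap_task_1", date(2008, 9, 14)),
]

if __name__ == "__main__":
    for name in deadline_outliers(TASKS):
        print(f"{name}: deadline is out of line with the rest of the cache")
```

Using the median rather than the mean keeps a single far-off deadline from dragging the reference point towards itself, which matters when almost all of the cache is MB work.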
geoff Send message Joined: 25 Apr 00 Posts: 123 Credit: 34,100,351 RAC: 18 |
"My AstroPulse Ghost WU downloaded last week is in my folder C:\Program Files\BOINC\projects\setiathome.berkeley.edu and is 8196KB but it is not listed in BOINC Tasks so will never run. I have added the AP files and modified app_info.xml but not sure if I want to run AstroPulse at the present time." This WU is http://setiathome.berkeley.edu/workunit.php?wuid=307558020 and, as you say, it is probably not a ghost and will eventually be processed. It is listed in BOINC under Application; I was looking at Project. I was confused because I did get another AP WU which I think is a ghost: it is not listed, and it was received before I added the AP files and app_info.xml. http://setiathome.berkeley.edu/workunit.php?wuid=306871624 |
Mumps [MM] Send message Joined: 11 Feb 08 Posts: 4454 Credit: 100,893,853 RAC: 30 |
Well, this definitely looks like a scheduler problem. I've got all my machines running the Op App, and the only one I've configured to try getting AP hasn't seen a single AP WU, but I just had one of the others have an AP WU scheduled for it. No way that should have happened... 4415467: should be getting AP. 4254505: should not be getting AP. |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
gomeyer Send message Joined: 21 May 99 Posts: 488 Credit: 50,370,425 RAC: 0 |
"I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache." That is correct. These appear as being issued in the Tasks list online, but do not appear to be actually downloaded to the host. I also have at least one. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
"I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache." @msattler: Here's one of your AP WUs... Have you changed your app_info.xml to understand how to crunch? Son-of-a-gun... the only rig I didn't update app_info.xml... my 1.6GHz single core laptop got a ghost AP WU. I will not run AP on my laptop... so I fear more ghosts will follow. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
"Son-of-a-gun... the only rig I didn't update app_info.xml... my 1.6GHz single core laptop got a ghost AP WU. I will not run AP on my laptop... so I fear more ghosts will follow." Unless you install a deliberately-sabotaged app_info, as suggested in my message 791984. What say you? |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
"I looked through the caches on all of my rigs and do not see any AP work there at present. If any has been issued to my rigs, it is not waiting in cache." No, I did not update anything, as I really did not want to decide on running AP until it was out for a while, any bugs had been worked out, credit compared to MB was established, and you optimizing folks had a chance to take a swing at an optimized AP app. And if I understand the situation correctly at this time, this should NOT be happening, but is due to a bug in the scheduler that Eric will probably look at as soon as he is not up to his eyeballs in non-science issues. So for now, it was not my intent to do any AP. I would prefer not to go through the trouble of installing a crippled AP app_info just to send the AP WUs back as client errors. Hopefully Eric takes note of this thread and has the time to try to sort the problem at the scheduler level. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
RandyC Send message Joined: 20 Oct 99 Posts: 714 Credit: 1,704,345 RAC: 0 |
Bummer. I've got at least 3 AP ghosts. All systems (that run SETI) use Opt Apps and none have had their app_info.xml updated to crunch AP. |
JDWhale Send message Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
"Son-of-a-gun... the only rig I didn't update app_info.xml... my 1.6GHz single core laptop got a ghost AP WU. I will not run AP on my laptop... so I fear more ghosts will follow." Sounds like a solid solution for the laptop... better to cancel WUs upon issue than creating more 30 day ghosts. Can we say Ghost Busters ;-? |
kittyman Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004 |
Eric is now aware of the situation and will be looking into it. "Freedom is just Chaos, with better lighting." Alan Dean Foster |
Labbie Send message Joined: 19 Jun 06 Posts: 4083 Credit: 5,930,102 RAC: 0 |
|
Robert Gammon Send message Joined: 29 Aug 01 Posts: 21 Credit: 1,573,250 RAC: 0 |
"Sounds like a solid solution for the laptop... better to cancel WUs upon issue than creating more 30 day ghosts." I too have a ghost: the web site says I have the WU, but the file does not exist on the PC. I too see the State File Error messages in the log, relating to no astropulse and a WU that cannot be processed. So the question is, how to cancel this WU, given that the localhost has NO knowledge of it? On to another topic: SETI has been running dry this week after the outage. It appears that the back office functions (analysis, result merge with database, purge results, purge WUs) are deluged and that is slowing down the splitters. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
"So the question is, how to cancel this WU, given that the localhost has NO knowledge of this wu?" There is no way to cancel the workunit on your end short of doing a detach, which will also rid you of all your current work. Sort of like destroying your house to fix a broken window. The general way of dealing with these is to simply let them time out on their own. Once the deadline is reached, it will be sent to another person (and hopefully not be a ghost next time). |
Dave Stegner Send message Joined: 20 Oct 04 Posts: 540 Credit: 65,583,328 RAC: 27 |
Looking around, I find that a number of my 11 rigs have been assigned AP WUs; all of them are running the AK8 Opti App and none have been updated for AP. Message 792229, above, says Eric is aware and looking into it. Is anyone aware of an answer yet? Does anyone have any suggestions on my course of action? Crunching with a RAC of approx 10K, I will be creating many ghost WUs pretty quickly. Dave |
Labbie Send message Joined: 19 Jun 06 Posts: 4083 Credit: 5,930,102 RAC: 0 |
This makes two for me here. Just received it a few minutes ago. Calm Chaos Forum...Join Calm Chaos Now |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.