Ghost WU issue (and some talk about deadlines)

Profile ohiomike
Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 570192 - Posted: 18 May 2007, 11:51:49 UTC

Ouch - I'm running x64, so I need the app_info.xml to get WUs.

Boinc Button Abuser In Training >My Shrubbers<
ID: 570192
Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 570201 - Posted: 18 May 2007, 11:58:32 UTC - in response to Message 570170.  

YES!!! I was just about to post the same thing.

The following has worked for me on three systems - two late version 5.8 BOINC, and a 5.3.12.tx. All were service installs, running appropriate Chicken 2.2B, and had run completely dry.

Recipe:

Rename app_info.xml so it won't be recognised
Restart BOINC (service)
Update SETI - may not get through first time, but keep trying
Restore app_info.xml to original name
Wait until all transfers have finished
Restart BOINC (service)

Outcome - decent sized cache (if I haven't nabbed them all already, LOL), still running optimised, time to open a beer.
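For anyone who would rather script the two rename steps of the recipe above, here is a minimal Python sketch. It is only an illustration and was not posted in this thread: the `.hidden` suffix, the function names and the project directory you pass in are all assumptions, and the BOINC restarts and the project update still have to be done by hand in between.

```python
from pathlib import Path

# Hypothetical suffix: any name the BOINC client will not recognise would do.
HIDDEN_SUFFIX = ".hidden"


def hide_app_info(project_dir: Path) -> Path:
    """Rename app_info.xml so the client no longer recognises it (recipe step 1)."""
    original = project_dir / "app_info.xml"
    hidden = project_dir / ("app_info.xml" + HIDDEN_SUFFIX)
    if hidden.exists():
        raise FileExistsError(f"{hidden} already exists; refusing to overwrite it")
    original.rename(hidden)
    return hidden


def restore_app_info(project_dir: Path) -> Path:
    """Give app_info.xml its original name back (recipe step 4)."""
    hidden = project_dir / ("app_info.xml" + HIDDEN_SUFFIX)
    original = project_dir / "app_info.xml"
    if original.exists():
        raise FileExistsError(f"{original} already exists; refusing to overwrite it")
    hidden.rename(original)
    return original
```

Call `hide_app_info()` on the SETI project folder before the first restart and `restore_app_info()` any time before the second one. Both functions refuse to overwrite an existing file, so running a step twice by mistake raises an error instead of losing the original app_info.xml.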

As illogical as it sounded when I first read it, it seems to be true.

Just went through the same procedure on two of my hosts, and voila, results reported, new WUs downloaded, happily crunching again.

So I can recommend the above.

Now all that remains to be investigated is WHY this happens, and whether we need to adjust all app_info.xml files that are out there. If so, that'd be a major blow because many people will be left without work and without knowing why.

Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
ID: 570201
DaveSun
Joined: 17 Jun 00
Posts: 110
Credit: 13,713,289
RAC: 2
United States
Message 570207 - Posted: 18 May 2007, 12:05:21 UTC - in response to Message 570201.  

YES!!! I was just about to post the same thing.

The following has worked for me on three systems - two late version 5.8 BOINC, and a 5.3.12.tx. All were service installs, running appropriate Chicken 2.2B, and had run completely dry.

Recipe:

Rename app_info.xml so it won't be recognised
Restart BOINC (service)
Update SETI - may not get through first time, but keep trying
Restore app_info.xml to original name
Wait until all transfers have finished
Restart BOINC (service)

Outcome - decent sized cache (if I haven't nabbed them all already, LOL), still running optimised, time to open a beer.

As illogical as it sounded when I first read it, it seems to be true.

Just went through the same procedure on two of my hosts, and voila, results reported, new WUs downloaded, happily crunching again.

So I can recommend the above.

Now all that remains to be investigated is WHY this happens, and whether we need to adjust all app_info.xml files that are out there. If so, that'd be a major blow because many people will be left without work and without knowing why.

Regards,
Simon.


My impression is that this problem began when they updated the BOINC tree; prior to that, everything seemed to work just like coming out of an outage. Things seemed to be catching up until they did the update.
ID: 570207
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 570209 - Posted: 18 May 2007, 12:08:36 UTC - in response to Message 570201.  
Last modified: 18 May 2007, 12:13:43 UTC

Thanks, Simon!
I found, as my hosts ran dry, that detaching and then reattaching had the same effect, with the added bonus of marking the ghosts 'client detached' and triggering reissues of them to other users. I'll await news from you before I re-Chicken my boxes, as they are happily crunching with the stock client for the moment.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 570209
Profile Kirsten
Volunteer tester
Joined: 7 Jul 00
Posts: 190
Credit: 566,047
RAC: 0
Denmark
Message 570211 - Posted: 18 May 2007, 12:10:54 UTC - in response to Message 570170.  
Last modified: 18 May 2007, 12:15:16 UTC


YES!!! I was just about to post the same thing.

The following has worked for me on three systems - two late version 5.8 BOINC, and a 5.3.12.tx. All were service installs, running appropriate Chicken 2.2B, and had run completely dry.

Recipe:

Rename app_info.xml so it won't be recognised
Restart BOINC (service)
Update SETI - may not get through first time, but keep trying
Restore app_info.xml to original name
Wait until all transfers have finished
Restart BOINC (service)

Outcome - decent sized cache (if I haven't nabbed them all already, LOL), still running optimised, time to open a beer.


About the recipe: In order to receive the ghost units do you not need to get access to the scheduler *before* restoring app_info.xml to its original name and then restart BM?

Well, at least the internal server error has stopped popping up.

18-05-2007 14:11:41|SETI@home|Requesting 160660 seconds of new work
18-05-2007 14:12:03||Project communication failed: attempting access to reference site
18-05-2007 14:12:04||Access to reference site succeeded - project servers may be temporarily down.
18-05-2007 14:12:06|SETI@home|Scheduler request failed: couldn't connect to server


Kind regards
Kirsten

ID: 570211
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 570216 - Posted: 18 May 2007, 12:20:05 UTC - in response to Message 570211.  

About the recipe: In order to receive the ghost units do you not need to get access to the scheduler *before* restoring app_info.xml to its original name and then restart BM?

Well, that's the way I wrote it - once you get through to the scheduler and the downloads start flowing, you have a bit of slack time to fill and that's a convenient time to do the second rename.

Actually, you can do the rename at any time between the two BOINC restarts - app_info.xml only gets read once, as the BOINC CC is starting, and then just sits there until the next time it's needed. You can edit/rename it at any time.
ID: 570216
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 570223 - Posted: 18 May 2007, 12:30:53 UTC - in response to Message 570201.  

Now all that remains to be investigated is WHY this happens, and whether we need to adjust all app_info.xml files that are out there. If so, that'd be a major blow because many people will be left without work and without knowing why.

Regards,
Simon.

I don't think we should need to adjust existing app_info.xml files except as a short-term work-around: the scheduler should be backwards-compatible with the existing format.

I've logged a 'defect' ticket in BOINC trac (#194) for the scheduler code.

Of course, if you do manage to work out what is wrong, and can fix it with a modification of the app_info, it would be helpful to pre-include it in new downloads in the same way that you include the Beta v517 hack. As well as posting it here, of course!
ID: 570223
Ivailo Bonev
Volunteer tester
Joined: 26 Jun 00
Posts: 247
Credit: 35,864,461
RAC: 2
Bulgaria
Message 570229 - Posted: 18 May 2007, 12:40:19 UTC

After renaming app_info.xml, I get another strange message from the server:

18-05-2007 15:24|SETI@home|Requesting 33769 seconds of new work
18-05-2007 15:25||Project communication failed: attempting access to reference site
18-05-2007 15:25||Access to reference site succeeded - project servers may be temporarily down.
18-05-2007 15:25|SETI@home|Scheduler request failed: server returned nothing (no headers, no data)
18-05-2007 15:25|SETI@home|Deferring communication for 1 min 0 sec
18-05-2007 15:25|SETI@home|Reason: scheduler request failed
18-05-2007 15:26|SETI@home|Sending scheduler request: To fetch work
18-05-2007 15:26|SETI@home|Requesting 33783 seconds of new work
18-05-2007 15:26|SETI@home|Scheduler RPC succeeded [server version 509]
18-05-2007 15:26|SETI@home|Message from server: No work sent
18-05-2007 15:26|SETI@home|Message from server: (reached daily quota of 1 results)
18-05-2007 15:26|SETI@home|Deferring communication for 18 hr 53 min 36 sec
18-05-2007 15:26|SETI@home|Reason: requested by project

Has anyone else gotten that quota of 1 result from the server?
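The client messages quoted above follow a simple `timestamp|project|message` layout, so the back-off can be pulled out with a few lines of Python. This is a rough sketch, assuming that layout and the "Deferring communication for ..." wording seen in this thread hold in general; the function and class names are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class LogLine(NamedTuple):
    timestamp: str
    project: str  # empty string for client-wide messages
    message: str


def parse_log_line(line: str) -> LogLine:
    """Split one 'timestamp|project|message' line into its three fields."""
    timestamp, project, message = line.rstrip("\n").split("|", 2)
    return LogLine(timestamp, project, message)


# Hours and minutes are optional, e.g. "1 min 0 sec" or "18 hr 53 min 36 sec".
_DEFER = re.compile(
    r"Deferring communication for (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec"
)


def deferral_seconds(message: str) -> Optional[int]:
    """Return the deferral in seconds, or None if the message is not a deferral."""
    match = _DEFER.search(message)
    if match is None:
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    line = "18-05-2007 15:26|SETI@home|Deferring communication for 18 hr 53 min 36 sec"
    print(deferral_seconds(parse_log_line(line).message))  # 68016
```

The 18 hr 53 min 36 sec deferral in the log works out to 68016 seconds, which is the rest of the day's quota window rather than an ordinary retry back-off.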
ID: 570229
Profile Henk Haneveld
Volunteer tester

Joined: 16 May 99
Posts: 154
Credit: 1,577,293
RAC: 1
Netherlands
Message 570233 - Posted: 18 May 2007, 12:43:42 UTC

After filling your cache, do not forget to set the project to "No new task" if you want to run the WUs with the enhanced client.

Otherwise, as soon as you finish 1 result, BOINC will try to add to the cache and you may get ghosts again.
ID: 570233
Profile Kirsten
Volunteer tester
Joined: 7 Jul 00
Posts: 190
Credit: 566,047
RAC: 0
Denmark
Message 570237 - Posted: 18 May 2007, 12:50:53 UTC - in response to Message 570216.  
Last modified: 18 May 2007, 13:12:43 UTC

About the recipe: In order to receive the ghost units do you not need to get access to the scheduler *before* restoring app_info.xml to its original name and then restart BM?

Well, that's the way I wrote it - once you get through to the scheduler and the downloads start flowing, you have a bit of slack time to fill and that's a convenient time to do the second rename.


Then I think I have a different problem from the ghost units, as I have not been able to reach the download server at all since May 13 (except for the dozens of ghost units I have allegedly received), and I have only twice been able to upload results to SETI, though without the finished WUs being cleared from the window.

Edit: something just happened

18-05-2007 14:58:57|SETI@home|[file_xfer] Finished download of file setiathome_5.15_windows_intelx86.exe

(and then the other files for initiating SETI) and now it is raining 5.15 version work units on both my hosts.


Kind regards
Kirsten

ID: 570237
Ivailo Bonev
Volunteer tester
Joined: 26 Jun 00
Posts: 247
Credit: 35,864,461
RAC: 2
Bulgaria
Message 570257 - Posted: 18 May 2007, 13:10:08 UTC
Last modified: 18 May 2007, 13:14:13 UTC

And another never-before-seen message:
18-05-2007 16:03|SETI@home|Scheduler request failed: HTTP bad gateway

ID: 570257
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 570258 - Posted: 18 May 2007, 13:11:00 UTC

Glad there's been some progress on this while I was sleeping... LOL

Good find Richard / Simon / et al...!

However, is there anything in doing this rename trick that will permanently prevent the issue? My thought is that it will begin happening again... I admit I haven't read enough of this and other threads yet, and can't right now (haircut appt in under an hour and then some other things to do)... Just curious...

Brian
ID: 570258
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 570261 - Posted: 18 May 2007, 13:17:16 UTC - in response to Message 570170.  

Let me ask a very stupid question: has the internal server error that produces ghost units anything to do with the fact that I am using KWSN's optimized applications?

I have got nothing but ghost units for my two hosts the last couple of days.

I saw that another user uninstalled BOINC, manually deleted his BOINC folder and reinstalled BOINC. All the hosts he did this to are now receiving work. His "untouched" hosts are still getting ghosts and/or no new work.

(This is not a solution for me, as I am running other BOINC projects instead of SETI for the time being. At least I think it is bad BOINC behaviour.)

The above mentioned solution does start from scratch, though. It made me think of my optimized applications and the app_info.xml

YES!!! I was just about to post the same thing.

The following has worked for me on three systems - two late version 5.8 BOINC, and a 5.3.12.tx. All were service installs, running appropriate Chicken 2.2B, and had run completely dry.

Recipe:

Rename app_info.xml so it won't be recognised
Restart BOINC (service)
Update SETI - may not get through first time, but keep trying
Restore app_info.xml to original name
Wait until all transfers have finished
Restart BOINC (service)

Outcome - decent sized cache (if I haven't nabbed them all already, LOL), still running optimised, time to open a beer.


Yes - this is working for me too. I just renamed app_info.xml and restarted BOINC, and immediately I got WUs assigned - it's downloading them (slowly) now :)

Thank you :)

Ned

*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 570261
Profile Kirsten
Volunteer tester
Joined: 7 Jul 00
Posts: 190
Credit: 566,047
RAC: 0
Denmark
Message 570267 - Posted: 18 May 2007, 13:23:41 UTC - in response to Message 570161.  
Last modified: 18 May 2007, 13:24:11 UTC

Let me ask a very stupid question


Quoting myself as I realise my question was not that stupid after all, as I now have work thanks to Richard's recipe.
Kind regards
Kirsten

ID: 570267
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 570272 - Posted: 18 May 2007, 13:29:10 UTC - in response to Message 570267.  

Let me ask a very stupid question


Quoting myself as I realise my question was not that stupid after all, as I now have work thanks to Richard's recipe.

In situations like this, stupid questions are often the best ones - they represent "thinking outside the box" and questioning the perceived conventional wisdom. Maybe not 'stupid' but naïve, in the best sense of that word!

I'm glad it worked for you.
ID: 570272
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 570280 - Posted: 18 May 2007, 13:39:30 UTC - in response to Message 570258.  
Last modified: 18 May 2007, 13:40:07 UTC

.
ID: 570280
Brian Silvers

Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 570281 - Posted: 18 May 2007, 13:39:37 UTC - in response to Message 570258.  

Glad there's been some progress on this while I was sleeping... LOL

Good find Richard / Simon / et al...!

However, is there anything in doing this rename trick that will permanently prevent the issue? My thought is that it will begin happening again... I admit I haven't read enough of this and other threads yet, and can't right now (haircut appt in under an hour and then some other things to do)... Just curious...

Brian


Just bringing my question back up to the top as I head out the door...

Also, thanks to Henk for noticing the connection between the ghosts and the http error!!!!
ID: 570281
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 570283 - Posted: 18 May 2007, 13:43:54 UTC - in response to Message 570281.  
Last modified: 18 May 2007, 13:52:44 UTC

Glad there's been some progress on this while I was sleeping... LOL

Good find Richard / Simon / et al...!

However, is there anything in doing this rename trick that will permanently prevent the issue? My thought is that it will begin happening again... I admit I haven't read enough of this and other threads yet, and can't right now (haircut appt in under an hour and then some other things to do)... Just curious...

Brian


Just bringing my question back up to the top as I head out the door...

Also, thanks to Henk for noticing the connection between the ghosts and the http error!!!!


My guess is that the answer is yes: you could stay on the stock app (stop at step 3) until the scheduler incompatibility is found and fixed.

Also, as for ghosts: if you have lots of them, you might CONSIDER running your cache and upload queues dry and then doing a detach/reattach. This marks the ghosts as 'client detached' and reissues the workunits to other users immediately (hopefully not as ghosts). It also sets the application back to the stock one, though, and wipes the local statistics (the entire project folder, actually), so consider this carefully. Mine reattached to the same host IDs correctly, though.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 570283
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 570353 - Posted: 18 May 2007, 14:51:46 UTC - in response to Message 570281.  

Glad there's been some progress on this while I was sleeping... LOL

Good find Richard / Simon / et al...!

However, is there anything in doing this rename trick that will permanently prevent the issue? My thought is that it will begin happening again... I admit I haven't read enough of this and other threads yet, and can't right now (haircut appt in under an hour and then some other things to do)... Just curious...

Brian


Just bringing my question back up to the top as I head out the door...

Also, thanks to Henk for noticing the connection between the ghosts and the http error!!!!


Good question indeed.

As I've now got a few days work, I think I'll hang in there and see what happens. Can always repeat the procedure in a couple days if I run out again.

It worries me more though that I may be getting ghost WUs in that time, and taking them from the pool of available units for download, although the servers seem to be keeping up atm in creating enough work. Just a thought.


*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 570353
Profile Kirsten
Volunteer tester
Joined: 7 Jul 00
Posts: 190
Credit: 566,047
RAC: 0
Denmark
Message 570357 - Posted: 18 May 2007, 14:56:49 UTC
Last modified: 18 May 2007, 14:58:25 UTC

Eric Korpela has been informed of the problems with ghost units and the new workaround and is looking into the matter. See Eric's blog at SETI Staff Blog.
Kind regards
Kirsten

ID: 570357
 