Ghost WU issue (and some talk about deadlines)

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 571181 - Posted: 19 May 2007, 12:16:00 UTC - in response to Message 571177.  
Last modified: 19 May 2007, 12:20:01 UTC

I keep downloading WUs from March 4, 2005, and all of the ones I am returning have "already been reported as success."

5/19/2007 7:05:05 AM|SETI@home|Message from server: Completed result 04mr05ab.17213.23921.909646.3.57_2 refused: result already reported as success

I still have about 6 more March 4, 2005 WUs left to crunch :-/

Then, to top it off, despite changing the app_info file for optimized clients etc. etc... I would still have to do that every time I get a ghost WU and every time I want new work.


Half right: you won't get new ghosts if you have 'No New Work' set. You only need to do the app_info thing and allow new work when you need a work top-up. When you do, fill your cache so you don't have to do it often.

If you are running BOINC as a service on Windows, and you are comfortable with batch files in Windows (.cmd text files), then you can take a look at my profile for two versions of a SEMI-automated method.
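
For anyone who would rather script the rename step than do it by hand, here is a rough sketch in Python. It is not the batch files mentioned above; the function name and the parked file name (oldapp_info.xml, borrowed from a later post in this thread) are illustrative assumptions, and the project directory is whatever your own BOINC install uses.

```python
from pathlib import Path

ACTIVE_NAME = "app_info.xml"
PARKED_NAME = "oldapp_info.xml"  # assumed parked name; any unused name would do


def set_app_info_enabled(project_dir, enabled):
    """Rename app_info.xml into or out of place and return the resulting path.

    Calling it again with the same argument changes nothing.
    """
    project_dir = Path(project_dir)
    active = project_dir / ACTIVE_NAME
    parked = project_dir / PARKED_NAME
    source, target = (parked, active) if enabled else (active, parked)

    if target.exists():
        if source.exists():
            # Refuse to guess which copy is the right one.
            raise FileExistsError(f"both {ACTIVE_NAME} and {PARKED_NAME} exist")
        return target  # already in the requested state
    if not source.exists():
        raise FileNotFoundError(f"no {ACTIVE_NAME} or {PARKED_NAME} in {project_dir}")

    source.rename(target)
    return target
```

The idea would be to stop BOINC, call set_app_info_enabled(dir, False), restart and fetch work, then stop BOINC again, call set_app_info_enabled(dir, True), restart, and set 'No New Work'. Stopping and starting BOINC is left out on purpose, since how you do that depends on whether it runs as a service.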

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 571181 · Report as offensive
Profile Jason Safoutin
Volunteer tester
Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 571185 - Posted: 19 May 2007, 12:19:19 UTC - in response to Message 571181.  
Last modified: 19 May 2007, 12:19:47 UTC

I keep downloading WUs from March 4, 2005, and all of the ones I am returning have "already been reported as success."

5/19/2007 7:05:05 AM|SETI@home|Message from server: Completed result 04mr05ab.17213.23921.909646.3.57_2 refused: result already reported as success

I still have about 6 more March 4, 2005 WUs left to crunch :-/

Then, to top it off, despite changing the app_info file for optimized clients etc. etc... I would still have to do that every time I get a ghost WU and every time I want new work.


Half right: you won't get new ghosts if you have 'No New Work' set. You only need to do the app_info thing and allow new work when you need a work top-up. When you do, fill your cache so you don't have to do it often.


Fill a cache?? What? And the WU above was received AFTER I renamed the app_info.
"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 571185 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 571188 - Posted: 19 May 2007, 12:22:54 UTC - in response to Message 571185.  


Fill a cache?? What? And the WU above was received AFTER I renamed the app_info.


I interpret the error message as meaning that your BOINC already reported it but didn't get an acknowledgment back, so it tried to report it again... maybe I'm missing something, but it should be all good.
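
To make that reading concrete, here is a toy model of a report whose acknowledgment gets lost. It is only a sketch of the interpretation above, not BOINC's actual scheduler code; ToyScheduler and report_with_lost_ack are made-up names.

```python
class ToyScheduler:
    """Hypothetical stand-in for the project server; not BOINC's real code."""

    def __init__(self):
        self.reported = set()

    def report(self, result_name):
        # A result that is already on record is refused rather than stored twice.
        if result_name in self.reported:
            return "refused: result already reported as success"
        self.reported.add(result_name)
        return "accepted"


def report_with_lost_ack(server, result_name):
    """Client reports, never sees the reply, and reports the same result again."""
    first = server.report(result_name)   # server records it; the reply is lost in transit
    second = server.report(result_name)  # client retries because it saw no acknowledgment
    return first, second
```

In this model the first report counts and only the retry is refused, so the "refused" message would be harmless under that assumption.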

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 571188 · Report as offensive
Profile Jason Safoutin
Volunteer tester
Joined: 8 Sep 05
Posts: 1386
Credit: 200,389
RAC: 0
United States
Message 571192 - Posted: 19 May 2007, 12:31:42 UTC - in response to Message 571188.  


Fill a cache?? What? And the WU above was received AFTER I renamed the app_info.


I interpret the error message as meaning that your BOINC already reported it but didn't get an acknowledgment back, so it tried to report it again... maybe I'm missing something, but it should be all good.


It actually says refused the result...because it was already reported...maybe I read it wrong?
"By faith we understand that the universe was formed at God's command, so that what is seen was not made out of what was visible". Hebrews 11.3

ID: 571192 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 571193 - Posted: 19 May 2007, 12:34:27 UTC - in response to Message 571192.  
Last modified: 19 May 2007, 12:39:26 UTC


It actually says refused the result...because it was already reported...maybe I read it wrong?


04mr05ab.17213.23921.909646.3.57_2 is reported as "Success" / "Done" in your results. It is an ambiguous error message, I reckon :D

[reported by an AMD Sempron 'Manila', ComputerID: 3188650]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 571193 · Report as offensive
Profile littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 571209 - Posted: 19 May 2007, 13:25:26 UTC

OK...

After Jason pointed out the ghost WU issue to me in another thread... thanks, mate...

The Ghost WU issue has been around for a while now, but just not in such large numbers. I've noted before that some crunchers had inordinately large lists of results, which continued growing. I just haven't seen so many affected users at one time.

The issue has hit me before, too.

What I did was detach and re-attach.

That had the effect of cancelling the ghost WUs, and I got new ones.

HOWEVER!!!!!!!!!!!!

In the numbers I see at present, I am hesitant to recommend the above course of action for all, as it would cause incredible overhead in bandwidth/server time.
ID: 571209 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 571212 - Posted: 19 May 2007, 13:31:27 UTC - in response to Message 571209.  
Last modified: 19 May 2007, 13:32:20 UTC

OK...

After Jason pointed out the ghost WU issue to me in another thread... thanks, mate...

The Ghost WU issue has been around for a while now, but just not in such large numbers. I've noted before that some crunchers had inordinately large lists of results, which continued growing. I just haven't seen so many affected users at one time.

The issue has hit me before, too.

What I did was detach and re-attach.

That had the effect of cancelling the ghost WUs, and I got new ones.

HOWEVER!!!!!!!!!!!!

In the numbers I see at present, I am hesitant to recommend the above course of action for all, as it would cause incredible overhead in bandwidth/server time.


Certainly one possibility. However, keep in mind that clients with full caches tend not to hammer the server, whereas (not) getting ghosts means a fairly immediate retry, further draining the results ready for download. Clearing the ghosts also makes them immediately available again (to fill caches).


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 571212 · Report as offensive
Profile littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 571218 - Posted: 19 May 2007, 13:47:23 UTC - in response to Message 571212.  

OK...

After Jason pointed out the ghost WU issue to me in another thread... thanks, mate...

The Ghost WU issue has been around for a while now, but just not in such large numbers. I've noted before that some crunchers had inordinately large lists of results, which continued growing. I just haven't seen so many affected users at one time.

The issue has hit me before, too.

What I did was detach and re-attach.

That had the effect of cancelling the ghost WUs, and I got new ones.

HOWEVER!!!!!!!!!!!!

In the numbers I see at present, I am hesitant to recommend the above course of action for all, as it would cause incredible overhead in bandwidth/server time.


Certainly one possibility. However, keep in mind that clients with full caches tend not to hammer the server, whereas (not) getting ghosts means a fairly immediate retry, further draining the results ready for download. Clearing the ghosts also makes them immediately available again (to fill caches).



Sure,

However, my single cruncher already has 74 ghosts to its name... I've seen this issue cause lists of ghost WUs to rise into the thousands. Multiply that by thousands or more affected machines, and we could have another outage. At least, that's my feeling. I've renamed the app_info.xml file, and have the following log:

19/05/2007 11:33:29 PM|SETI@home|Sending scheduler request: Requested by user
19/05/2007 11:33:29 PM|SETI@home|Requesting 170459 seconds of new work
19/05/2007 11:33:34 PM|SETI@home|Scheduler RPC succeeded [server version 509]
19/05/2007 11:33:34 PM|SETI@home|Deferring communication for 11 sec
19/05/2007 11:33:34 PM|SETI@home|Reason: requested by project
19/05/2007 11:33:34 PM|SETI@home|Deferring communication for 1 min 0 sec
19/05/2007 11:33:34 PM|SETI@home|Reason: no work from project
19/05/2007 11:34:35 PM|SETI@home|Fetching scheduler list
19/05/2007 11:34:40 PM|SETI@home|Master file download succeeded
19/05/2007 11:34:45 PM|SETI@home|Sending scheduler request: To fetch work
19/05/2007 11:34:45 PM|SETI@home|Requesting 170470 seconds of new work
19/05/2007 11:36:20 PM|SETI@home|Scheduler request failed: HTTP bad gateway
19/05/2007 11:36:20 PM|SETI@home|Deferring communication for 1 min 0 sec
19/05/2007 11:36:20 PM|SETI@home|Reason: scheduler request failed
19/05/2007 11:37:20 PM|SETI@home|Sending scheduler request: To fetch work
19/05/2007 11:37:20 PM|SETI@home|Requesting 170485 seconds of new work
19/05/2007 11:37:25 PM|SETI@home|Scheduler RPC succeeded [server version 509]
19/05/2007 11:37:25 PM|SETI@home|Deferring communication for 11 sec
19/05/2007 11:37:25 PM|SETI@home|Reason: requested by project
19/05/2007 11:37:25 PM|SETI@home|Deferring communication for 1 min 0 sec
19/05/2007 11:37:25 PM|SETI@home|Reason: no work from project
19/05/2007 11:38:26 PM|SETI@home|Sending scheduler request: To fetch work
19/05/2007 11:38:26 PM|SETI@home|Requesting 170461 seconds of new work
19/05/2007 11:39:26 PM|SETI@home|Scheduler RPC succeeded [server version 509]
19/05/2007 11:39:26 PM|SETI@home|Deferring communication for 11 sec
19/05/2007 11:39:26 PM|SETI@home|Reason: requested by project
19/05/2007 11:39:26 PM|SETI@home|Deferring communication for 1 min 0 sec
19/05/2007 11:39:26 PM|SETI@home|Reason: no work from project

The workaround seems to have partially worked, as master file download has succeeded, according to the log. (I renamed app_info.xml to oldapp_info.xml) I'm intrigued, though, by the "No work from project" message. My results list is still showing 74 in progress, which is about 15 days' work for my machine, if it crunches only S@h. I think I'll have to wait a while, to see if it clears.

If not, I'll detach and re-attach, even though I'm loath to do so, unless a better plan surfaces in the next day or so.
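
For anyone comparing logs like this one, a small sketch that tallies work requests and deferral reasons may help. It assumes the pipe-delimited layout seen in the log lines in this post (timestamp|project|message); summarize_log and seconds_to_days are made-up helper names, and SAMPLE is a handful of lines copied from that log.

```python
import re
from collections import Counter

SAMPLE = [
    "19/05/2007 11:33:29 PM|SETI@home|Requesting 170459 seconds of new work",
    "19/05/2007 11:33:34 PM|SETI@home|Reason: requested by project",
    "19/05/2007 11:33:34 PM|SETI@home|Reason: no work from project",
    "19/05/2007 11:34:45 PM|SETI@home|Requesting 170470 seconds of new work",
    "19/05/2007 11:36:20 PM|SETI@home|Reason: scheduler request failed",
]

REQUEST_RE = re.compile(r"Requesting (\d+) seconds of new work")


def summarize_log(lines):
    """Return (list of requested seconds, Counter of deferral reasons)."""
    requested = []
    reasons = Counter()
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # skip anything that is not timestamp|project|message
        message = parts[2]
        match = REQUEST_RE.match(message)
        if match:
            requested.append(int(match.group(1)))
        elif message.startswith("Reason: "):
            reasons[message[len("Reason: "):]] += 1
    return requested, reasons


def seconds_to_days(seconds):
    """Convert a work request in seconds to days."""
    return seconds / 86400
```

Run over SAMPLE, the first request of 170459 seconds works out to roughly 1.97 days of work, and each of the three reasons shows up once.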
ID: 571218 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 571224 - Posted: 19 May 2007, 13:53:48 UTC - in response to Message 571218.  
Last modified: 19 May 2007, 13:58:41 UTC

...
If not, I'll detach and re-attach, even though I'm loath to do so, unless a better plan surfaces in the next day or so.


Could be the Ready to send queue running dry, two splitters are shown as 'Not Running'.
[303,978 Ready to Send @ 1 hour ago, was up around 600k before .... ]

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 571224 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 571228 - Posted: 19 May 2007, 14:03:45 UTC

Two points:

1) Nothing that we're doing here will get the existing ghost results sent to anyone's computers. What we're trying to do is (a) avoid creating any new ghost results (set 'no new work' while running with an app_info.xml file), and (b) download new results to keep the CPUs warm (run the workaround).

2) I doubt we've run the queue dry yet, but remember that bit from Matt's final post before his vacation:
....And our old friend "slow feeder query" is back, probably just being aggravated by the heavy load.

I suspect that the 'no work from project' messages just indicate that the feeder cache has temporarily run dry, not the whole queue.
ID: 571228 · Report as offensive
Profile Stan Pleban
Joined: 16 Jun 00
Posts: 13
Credit: 216,111
RAC: 0
United States
Message 571232 - Posted: 19 May 2007, 14:20:03 UTC
Last modified: 19 May 2007, 14:25:28 UTC

I have detached and re-attached twice already... it has cleared the ghost WUs, which now show as Client Detached, and they were re-sent to other users.

Once you get your download of WUs into your task manager, exit BOINC, put the 2.2B files in the SETI folder, then start BOINC again... but set "no new work" on SETI now. I am hoping that will stem the flow of the ghost WUs... I admit I should have done that sooner.

user stats
ID: 571232 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 571264 - Posted: 19 May 2007, 15:07:05 UTC

Silly question.....
We know the server code has been broken by whatever changes were recently made.
Can't the old server code be put back in place until Matt can analyze and fix what went wrong? You do keep a copy of programs before you modify them....correct?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 571264 · Report as offensive
Profile littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 571279 - Posted: 19 May 2007, 15:23:28 UTC - in response to Message 571224.  

...
If not, I'll detach and re-attach, even though I'm loath to do so, unless a better plan surfaces in the next day or so.


Could be the Ready to send queue running dry, two splitters are shown as 'Not Running'.
[303,978 Ready to Send @ 1 hour ago, was up around 600k before .... ]


Situation is changing rapidly:
Now have 86 results in the list, but downloads of program and a dozen WUs are in the "transfers" tab of BOINC. They appear to have stalled. Keeping an eye on it.
ID: 571279 · Report as offensive
Profile littlegreenmanfrommars
Volunteer tester
Joined: 28 Jan 06
Posts: 1410
Credit: 934,158
RAC: 0
Australia
Message 571362 - Posted: 19 May 2007, 16:51:23 UTC

Have a dozen S@h WUs now in cache, but the ghosts haven't been cleaned up. I suppose I'll just have to wait for them to time out.
ID: 571362 · Report as offensive
Profile BORG
Volunteer tester
Joined: 3 Aug 99
Posts: 305
Credit: 6,157,052
RAC: 0
Canada
Message 571407 - Posted: 19 May 2007, 17:28:09 UTC - in response to Message 571362.  
Last modified: 19 May 2007, 17:39:49 UTC

I don't know if others are experiencing this. But after renaming the app_info.xml file I was able to download and upload without renaming it back. I'm running optimized and the optimized client continues to run regardless of the app_info.xml file being present or not.

WU's are authenticating and all seem to be working fine for now.
ID: 571407 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 571443 - Posted: 19 May 2007, 17:55:17 UTC - in response to Message 571407.  

I don't know if others are experiencing this. But after renaming the app_info.xml file I was able to download and upload without renaming it back. I'm running optimized and the optimized client continues to run regardless of the app_info.xml file being present or not.

WU's are authenticating and all seem to be working fine for now.


I ran into this when I ran optimized over on Beta by mistake (don't do it!) and removed the app_info file to go back to normal processing.
What happens is that the WUs downloaded when the app info file was in place will run with the optimized app, but any new work will revert back to the stock app.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 571443 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 571453 - Posted: 19 May 2007, 18:03:03 UTC - in response to Message 571443.  
Last modified: 19 May 2007, 18:06:48 UTC

I don't know if others are experiencing this. But after renaming the app_info.xml file I was able to download and upload without renaming it back. I'm running optimized and the optimized client continues to run regardless of the app_info.xml file being present or not.

WU's are authenticating and all seem to be working fine for now.


I ran into this when I ran optimized over on Beta by mistake (don't do it!) and removed the app_info file to go back to normal processing.
What happens is that the WUs downloaded when the app info file was in place will run with the optimized app, but any new work will revert back to the stock app.

Strangely, that is what I would expect and what I understand from the documentation. But it ain't so.

I was using my own workround this morning, and downloaded 100 new WUs. 44 of them were the shortest deadline possible, so coupled with my 3-day connect setting, they went straight into EDF and started crunching - on all 8 cores - while I was still downloading. App_info was still disabled - I hadn't restarted BOINC since the successful work fetch request - and yet Task Manager confirmed that the active applications were all Chicken 2.2B

Go figure.

Edit - case in point: result 535578535
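
For readers who haven't met EDF (earliest deadline first), here is a minimal sketch of the ordering idea only. The real BOINC client's rules for when it switches into EDF are more involved and are not modelled here; Task, deadline_days and pick_edf are illustrative names.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    deadline_days: float  # days from now until the report deadline


def pick_edf(tasks, n_cores):
    """Return up to n_cores tasks with the earliest deadlines, soonest first."""
    return heapq.nsmallest(n_cores, tasks, key=lambda t: t.deadline_days)
```

With 8 cores and a batch that includes short-deadline results, pick_edf(tasks, 8) would put the short-deadline ones at the front, which would fit them starting to crunch straight away.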
ID: 571453 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51583
Credit: 1,018,363,574
RAC: 1,004
United States
Message 571462 - Posted: 19 May 2007, 18:07:42 UTC - in response to Message 571443.  

I don't know if others are experiencing this. But after renaming the app_info.xml file I was able to download and upload without renaming it back. I'm running optimized and the optimized client continues to run regardless of the app_info.xml file being present or not.

WU's are authenticating and all seem to be working fine for now.


I ran into this when I ran optimized over on Beta by mistake (don't do it!) and removed the app_info file to go back to normal processing.
What happens is that the WUs downloaded when the app info file was in place will run with the optimized app, but any new work will revert back to the stock app.


Actually, right now I'm not sure how this works. I had a rig this morning that had no Seti work. Took out the app info file, got some WUs. Put the app info file back and started crunching. Took the app info file back out, and it looks like it's still using the optimized app.
Please note that each time, Boinc was stopped and restarted.
When you start Boinc with the xml file in place, does it then tag all work in the cache to run with the optimized app, or just the ones it happens to be processing at the moment? If all the WUs get tagged, you could remove the app info xml file, get work, replace the file, start crunching, and remove it again to allow comms with the host as usual.
What I'm not clear on is when Boinc will decide to use the stock app instead of the optimized one.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 571462 · Report as offensive
Profile BORG
Volunteer tester
Joined: 3 Aug 99
Posts: 305
Credit: 6,157,052
RAC: 0
Canada
Message 571472 - Posted: 19 May 2007, 18:14:41 UTC - in response to Message 571462.  
Last modified: 19 May 2007, 18:40:54 UTC

I don't know if others are experiencing this. But after renaming the app_info.xml file I was able to download and upload without renaming it back. I'm running optimized and the optimized client continues to run regardless of the app_info.xml file being present or not.

WU's are authenticating and all seem to be working fine for now.


I ran into this when I ran optimized over on Beta by mistake (don't do it!) and removed the app_info file to go back to normal processing.
What happens is that the WUs downloaded when the app info file was in place will run with the optimized app, but any new work will revert back to the stock app.


Actually, right now I'm not sure how this works. I had a rig this morning that had no Seti work. Took out the app info file, got some WUs. Put the app info file back and started crunching. Took the app info file back out, and it looks like it's still using the optimized app.
Please note that each time, Boinc was stopped and restarted.
When you start Boinc with the xml file in place, does it then tag all work in the cache to run with the optimized app, or just the ones it happens to be processing at the moment? If all the WUs get tagged, you could remove the app info xml file, get work, replace the file, start crunching, and remove it again to allow comms with the host as usual.
What I'm not clear on is when Boinc will decide to use the stock app instead of the optimized one.



I honestly don't know what's going on, only that additional WUs have been downloaded since yesterday. And with the app_info.xml file still renamed as app_info.xml_rename, crunching of the newly downloaded WUs continues with the optimized client. I've renamed the stock app so it's not running. BOINC has not downloaded a new one either.

So I don't get it.....
ID: 571472 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 571479 - Posted: 19 May 2007, 18:19:02 UTC - in response to Message 571462.  

...
What I'm not clear on is when Boinc will decide to use the stock app instead of the optimized one.


The apps seem to be listed in client_state.xml. Somehow, in my case, KWSN_2.2B_SSE2-P4_Ben-Joe.exe has inserted itself where the standard app used to be... worth a closer look.
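
If anyone wants to take that closer look without reading the whole file by eye, here is a rough sketch that lists which executables each app entry points at. The element names (app_version, app_name, file_ref, file_name) are my assumption about how client_state.xml is laid out, so check them against your own copy; list_app_files is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def list_app_files(client_state_xml):
    """Return (app_name, file_name) pairs found under each <app_version>.

    Assumes the element names described in the text above; entries that
    lack a name are reported as "?".
    """
    root = ET.fromstring(client_state_xml)
    found = []
    for app_version in root.iter("app_version"):
        app_name = app_version.findtext("app_name", default="?")
        for file_ref in app_version.iter("file_ref"):
            found.append((app_name, file_ref.findtext("file_name", default="?")))
    return found
```

Read the file with open(...).read() while BOINC is stopped and pass the text in. If the optimized executable appears in the output, that would suggest the client remembers it independently of app_info.xml being present.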


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 571479 · Report as offensive


 
©2026 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.