Lost "Ghost" task recovery protocol

Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 12548
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2010344 - Posted: 1 Sep 2019, 16:42:14 UTC - in response to Message 2010314.  

The problem comes from what the host advertises. One of the magical things that the Lunatics installer does is preserve all the current cache of work on the host when converting to Lunatics apps. It does this by extensive rewriting of the client_state.xml file to change all the current entries for every task in the cache to the new plan_class and applications for the Lunatics apps.

I think this is what confuses the server task scheduler. What it previously knew about the host's advertised capabilities is suddenly completely different on the first connection.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2010344
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2012315 - Posted: 17 Sep 2019, 13:33:58 UTC

I upgraded BOINC to a newer version and, as usual, it ghosted everything I had. I used the recovery protocol, and again the server made everything expire instead of letting me redownload the tasks :(

Here is one example of those tasks: https://setiathome.berkeley.edu/result.php?resultid=8055686426

Task 8055686426
	Name                13se19aa.6745.885.5.32.181.vlar_0
	Workunit            3655915446
	Created             17 Sep 2019, 6:44:38 UTC
	Sent                17 Sep 2019, 12:38:13 UTC
	Report deadline     17 Sep 2019, 13:17:16 UTC
	Received            ---
	Server state        Over
	Outcome             No reply
	Client state        New
	Exit status         0 (0x00000000)
	Computer ID         8652081
	Run time
	CPU time
	Validate state      Initial
	Credit              0.00
	Device peak FLOPS   0.00 GFLOPS
	Application version SETI@home v8
	                    Anonymous platform (CPU)

So the "deadline" was less than one hour after the task was first sent :(
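
To put a number on that, the gap between the Sent and Report deadline timestamps in the listing above can be worked out directly. This is only a minimal Python sketch: the helper name `deadline_window` is made up for illustration, and the two timestamps are copied from the task page.

```python
from datetime import datetime, timedelta

# Timestamp layout used on the task page, e.g. "17 Sep 2019, 12:38:13 UTC".
# %b expects English month abbreviations, which is the default "C" locale.
STAMP_FORMAT = "%d %b %Y, %H:%M:%S UTC"


def deadline_window(sent: str, deadline: str) -> timedelta:
    """Return how long the host had between 'Sent' and 'Report deadline'."""
    sent_at = datetime.strptime(sent, STAMP_FORMAT)
    due_at = datetime.strptime(deadline, STAMP_FORMAT)
    return due_at - sent_at


# Values copied from task 8055686426 above.
window = deadline_window("17 Sep 2019, 12:38:13 UTC", "17 Sep 2019, 13:17:16 UTC")
print(window)  # 0:39:03
```

That is 39 minutes and 3 seconds between being sent and expiring.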
ID: 2012315
rob smith (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 19945
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2012339 - Posted: 17 Sep 2019, 20:58:09 UTC

That is not a "Ghost" task, it is one that has failed to calculate in time.
A "Ghost" task is one that left the servers but you never actually received.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2012339
Mr. Kevvy (Crowdfunding Project Donor*, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3266
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2012340 - Posted: 17 Sep 2019, 21:01:49 UTC - in response to Message 2012339.  

A "Ghost" task is one that left the servers but you never actually received.


You can also "ghost" them by either wiping the files from the project folder, or badly editing app_info.xml so that the executable for a platform that has work units is invalid, i.e. has a typo.
ID: 2012340
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2012452 - Posted: 18 Sep 2019, 20:23:44 UTC - in response to Message 2012340.  

A "Ghost" task is one that left the servers but you never actually received.
You can also "ghost" them by either wiping the files from the project folder, or badly editing app_info.xml so that the executable for a platform that has work units is invalid, i.e. has a typo.
No need to wipe. BOINC wipes them on its own at the slightest suspicion of trouble.

In this case I just stopped the BOINC client, upgraded it and restarted it. It wiped everything on launch, including the apps! And then it downloaded nothing because it no longer had any apps. app_info.xml was still intact, but all the apps it referred to had vanished. I kept copies elsewhere, so I could easily restore them, as this wasn't the first time the BOINC client has deleted my apps!
ID: 2012452
Freewill (Project Donor)
Joined: 19 May 99
Posts: 615
Credit: 354,398,348
RAC: 11,693
United States
Message 2013973 - Posted: 2 Oct 2019, 12:17:41 UTC

For those with many more than the current limit of 80 ghost tasks, the following may be helpful. I had quite a large number due to a disk failure, and I discovered a way to avoid repeating this procedure. If you leave "No new tasks" set for a long time, so that many more than 80 active tasks are returned, and then follow Keith's procedure, you'll get 80 tasks resent at the first update, and it appears you'll then get more resends at the next update (although it may not say so). Can anyone else confirm this is what's happening?
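
If the resend really is capped at 80 tasks per scheduler update, as it appears to be, the number of updates needed is a simple ceiling division. A rough Python sketch follows; `updates_needed` is a hypothetical helper, and the cap of 80 is just the limit mentioned above, not a value read from the server.

```python
import math

# Per-update resend cap mentioned above; an assumption, not a server constant.
RESEND_LIMIT = 80


def updates_needed(ghost_tasks: int, limit: int = RESEND_LIMIT) -> int:
    """Number of scheduler updates needed if each one resends at most `limit` tasks."""
    if ghost_tasks < 0 or limit <= 0:
        raise ValueError("ghost_tasks must be >= 0 and limit must be > 0")
    return math.ceil(ghost_tasks / limit)


print(updates_needed(200))  # 3
```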

Roger
ID: 2013973
Freewill (Project Donor)
Joined: 19 May 99
Posts: 615
Credit: 354,398,348
RAC: 11,693
United States
Message 2013977 - Posted: 2 Oct 2019, 12:45:23 UTC - in response to Message 1993558.  

Actually, you should be able to set it up so that, instead of sending back 20 tasks, it will 'Expire' all your 'Lost tasks' in one move.
In your Preferences, find SETI@home v8: yes and change it to No. The resend sends tasks according to your Preferences, so if SETI@home is set to No, it won't send any when triggered.
Set it here: https://setiathome.berkeley.edu/prefs.php?subset=project
Personally, I trigger the resend by waiting until there is a task to report, copying client_state.xml to another directory, hitting Update to report the task, then stopping BOINC.
I then copy the old client_state.xml back to BOINC, add 1 to the <rpc_seqno></rpc_seqno> number, and start BOINC. I usually remove all the Active tasks from the old client_state.xml when changing the <rpc_seqno>, but I don't think it really matters as long as you have it set to Not checkpoint.


That worked like a charm! It moves the ghost tasks to the "error" section with a status of "timed out - no response" but at least they'll get resent to someone and not leave wing people hanging. Thanks, TBar!
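
For anyone who would rather not hand-edit the file, the "add 1 to the <rpc_seqno> number" step in TBar's procedure quoted above can be sketched in Python. This is only an illustration under stated assumptions: `bump_rpc_seqno` is a made-up helper that works on text, the sample fragment is invented rather than copied from a real client_state.xml, and it changes only the first <rpc_seqno> it finds, so with more than one project attached you would need to make sure that is the right one. BOINC should be stopped while the file is edited, as in the procedure.

```python
import re

# Matches <rpc_seqno>NUMBER</rpc_seqno>, tolerating whitespace around the number.
_SEQNO = re.compile(r"(<rpc_seqno>\s*)(\d+)(\s*</rpc_seqno>)")


def bump_rpc_seqno(xml_text: str) -> str:
    """Return xml_text with the first <rpc_seqno> value increased by 1."""

    def _plus_one(match):
        return f"{match.group(1)}{int(match.group(2)) + 1}{match.group(3)}"

    new_text, count = _SEQNO.subn(_plus_one, xml_text, count=1)
    if count == 0:
        raise ValueError("no <rpc_seqno> element found")
    return new_text


# Invented fragment, for illustration only.
sample = "<project>\n    <rpc_seqno>41</rpc_seqno>\n</project>\n"
print(bump_rpc_seqno(sample))
```

Working on a copy of the file and keeping the original elsewhere, as the procedure already does, means a mistake here costs nothing.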
ID: 2013977
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14349
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2015395 - Posted: 14 Oct 2019, 8:44:41 UTC - in response to Message 2010344.  

One of the magical things that the Lunatics installer does is preserve all the current cache of work on the host when converting to Lunatics apps. It does this by extensive rewriting of the client_state.xml file ...
Actually, it doesn't.

It doesn't even touch client_state.xml

What it does is create an app_info.xml file which covers every known combination of platform, version, and plan_class. Whatever you might have lurking in your client_state file has a home to go to, and one of the Lunatics apps will pick it up and run with it.

What this does mean is that if you write your own app_info.xml, and don't follow the established platform, version, and plan_class values, work assigned under that app_info file will probably become homeless after running the installer.
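
The "homeless" idea can be pictured as a simple lookup: a cached task keeps its home only if its (platform, version, plan_class) combination appears among the app_info.xml entries. The Python sketch below is purely illustrative; `AppKey`, `homeless_tasks` and every value in it are made up, and it models the matching rule as described here rather than the client's actual code.

```python
from typing import Iterable, List, NamedTuple


class AppKey(NamedTuple):
    """The three values a cached task is matched on (as described above)."""
    platform: str
    version: int
    plan_class: str


def homeless_tasks(task_keys: Iterable[AppKey], app_info_keys: Iterable[AppKey]) -> List[AppKey]:
    """Return the task keys that no app_info entry covers."""
    covered = set(app_info_keys)
    return [key for key in task_keys if key not in covered]


# Hypothetical values, for illustration only.
installer_entries = [
    AppKey("example_platform", 800, ""),
    AppKey("example_platform", 800, "example_gpu_class"),
]
cached_tasks = [
    AppKey("example_platform", 800, "example_gpu_class"),
    AppKey("example_platform", 800, "my_custom_class"),  # from a hand-written app_info.xml
]
print(homeless_tasks(cached_tasks, installer_entries))
```

Only the task with the non-standard plan_class is reported, which is the case warned about above.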
ID: 2015395
rcthardcore
Joined: 23 Nov 08
Posts: 48
Credit: 1,306,006
RAC: 0
United States
Message 2018566 - Posted: 11 Nov 2019, 21:23:15 UTC

Maybe it is time for a function to be programmed into BOINC in order to allow us to download all of our ghost tasks. A button maybe. You would think that this would have already been done.
ID: 2018566
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 12548
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2018568 - Posted: 11 Nov 2019, 21:29:05 UTC

Even with a button in the client, resending lost work has to be configured in the server software on each individual project's scheduler. So it would work on some projects and not on others, depending on whether the project has that function enabled.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2018568
juan BFP (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2018574 - Posted: 11 Nov 2019, 21:54:45 UTC

I was talking with Richard recently about this: an automated ghost recovery program.

It really is not a complicated program, but the main problem is that it could add a huge additional load to the servers, something none of us want.

In theory it would need to scan the whole database for every WU assigned to a particular host, check whether each one is actually on the host, adjust the client file, and download the missing WUs to it.
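
The comparison itself is just a set difference between what the server believes the host holds and what the host actually has. A minimal Python sketch, with invented task names and a hypothetical `find_ghosts` helper; the expensive part in practice would be the server-side query that produces the first list, which this sketch does not model.

```python
from typing import Iterable, Set


def find_ghosts(server_task_names: Iterable[str], client_task_names: Iterable[str]) -> Set[str]:
    """Tasks the server lists as in progress for a host but the host does not have."""
    return set(server_task_names) - set(client_task_names)


# Invented names, for illustration only.
on_server = ["task_a_0", "task_b_1", "task_c_0"]
on_host = ["task_b_1"]
print(sorted(find_ghosts(on_server, on_host)))  # ['task_a_0', 'task_c_0']
```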

IIRC there is actually a subroutine on the server side that does something very similar, but it is disabled precisely because of the extra load.

But that is the problem: imagine a few hosts doing that at the same time...

I have never tried it, but I know a few people who have and say it works. I imagine the method described by TBar in this thread is the easy way to do it.

my 0.02
ID: 2018574
Lazydude
Volunteer tester
Joined: 17 Jan 01
Posts: 45
Credit: 96,158,001
RAC: 136
Sweden
Message 2022459 - Posted: 9 Dec 2019, 8:29:48 UTC - in response to Message 2018574.  

IIRC there is actually a subroutine on the server side that does something very similar, but it is disabled precisely because of the extra load.

But that is the problem: imagine a few hosts doing that at the same time...


I would like a button on the Computer page for resends.
With restrictions: you could use it at most once every 3 days, not on Tuesdays, and not for a period of xx hours after an outage.
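
As a rough idea of what such a check could look like, here is a Python sketch of the three restrictions. Everything in it is hypothetical: `resend_allowed` is not an existing BOINC function, and the length of the quiet period after an outage (the "xx hours") is left as a parameter because no value has been proposed.

```python
from datetime import datetime, timedelta
from typing import Optional

TUESDAY = 1  # datetime.weekday() counts Monday as 0


def resend_allowed(
    now: datetime,
    last_resend: Optional[datetime],
    outage_end: Optional[datetime],
    quiet_hours: float,
) -> bool:
    """Apply the three proposed restrictions to a resend request."""
    if now.weekday() == TUESDAY:
        return False  # never on Tuesdays
    if last_resend is not None and now - last_resend < timedelta(days=3):
        return False  # at most once every 3 days
    if outage_end is not None and now - outage_end < timedelta(hours=quiet_hours):
        return False  # still inside the quiet period after an outage
    return True


# 9 Dec 2019 was a Monday, with no earlier resend and no recent outage.
print(resend_allowed(datetime(2019, 12, 9, 12, 0), None, None, quiet_hours=24))  # True
```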
Lazy
ID: 2022459
Steven Gaber
Joined: 19 Jan 13
Posts: 111
Credit: 2,834,186
RAC: 11
United States
Message 2042894 - Posted: 4 Apr 2020, 5:35:44 UTC - in response to Message 1992660.  

"Thanks for this! The procedure seems clear and I tried it, as I appear to have about 60 ghost tasks. However, the server currently has no tasks to send. Would that cause it not to do the resends?"

My account page says I have 64 tasks in progress, but my activity page only shows 15, plus a bunch awaiting validation.

I guess the end is beginning?

Steven Gaber
Oldsmar, FL
ID: 2042894
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13165
Credit: 208,696,464
RAC: 304
Australia
Message 2042899 - Posted: 4 Apr 2020, 6:57:11 UTC - in response to Message 2042894.  
Last modified: 4 Apr 2020, 6:57:40 UTC

"Thanks for this! The procedure seems clear and I tried it, as I appear to have about 60 ghost tasks. However, the server currently has no tasks to send. Would that cause it not to do the resends?"
That shouldn't be the case, as they are already Tasks that it thinks you have, so it should just resend those particular Tasks to your system again. It doesn't have to actually have any Tasks to send, or make any new ones up. Just re-send the existing ones.
Grant
Darwin NT
ID: 2042899
Jiiimbooh
Joined: 1 Jul 09
Posts: 4
Credit: 321,586
RAC: 13
Sweden
Message 2042964 - Posted: 4 Apr 2020, 17:44:26 UTC - in response to Message 2042894.  

"Thanks for this! The procedure seems clear and I tried it, as I appear to have about 60 ghost tasks. However, the server currently has no tasks to send. Would that cause it not to do the resends?"

My account page says I have 64 tasks in progress, but my activity page only shows 15, plus a bunch awaiting validation.

I guess the end is beginning?

Steven Gaber
Oldsmar, FL


The tasks page is not up to date right now. Judging by how many apparent ghost tasks I have, I'd say the page is about 4 days behind.
ID: 2042964
BobMiller
Joined: 24 Jul 08
Posts: 32
Credit: 11,041,077
RAC: 129
United States
Message 2043665 - Posted: 8 Apr 2020, 15:17:49 UTC

Help, my tasks have disappeared again.
ID: 2043665
juan BFP (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043670 - Posted: 8 Apr 2020, 15:31:28 UTC

Your tasks show: Outcome Abandoned

Did you reset the project or something like that?
ID: 2043670
BobMiller
Joined: 24 Jul 08
Posts: 32
Credit: 11,041,077
RAC: 129
United States
Message 2043690 - Posted: 8 Apr 2020, 18:07:47 UTC - in response to Message 2043670.  

They disappeared with no action on my part.
Are they completely and absolutely lost if the project is reset?
ID: 2043690
juan BFP (Crowdfunding Project Donor*, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2043691 - Posted: 8 Apr 2020, 18:17:23 UTC
Last modified: 8 Apr 2020, 18:39:47 UTC

AFAIK yes, once abandoned they are completely lost from your host's POV.
If you look at the WU, the task has already been sent to another host.

Maybe someone else knows a way to recover them and can post it here for us to learn.
ID: 2043691
BobMiller
Joined: 24 Jul 08
Posts: 32
Credit: 11,041,077
RAC: 129
United States
Message 2043785 - Posted: 9 Apr 2020, 3:26:37 UTC - in response to Message 2043691.  

Juan,
Thank you.
Nice to meet you.
ID: 2043785


 
©2021 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.