"Lost Task"

Profile John S. (Low IFR) Jenkins

Joined: 22 Feb 01
Posts: 53
Credit: 24,515,908
RAC: 0
United States
Message 1223121 - Posted: 25 Apr 2012, 10:53:23 UTC

What is a "lost task", as in the "Sending Lost task" message?
ID: 1223121
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1223122 - Posted: 25 Apr 2012, 11:06:41 UTC - in response to Message 1223121.  

What is a "lost task", as in "Sending Lost task" message

A task that failed to reach your computer the first time it was sent.
ID: 1223122
Profile Area 51
Joined: 31 Jan 04
Posts: 965
Credit: 42,193,520
RAC: 0
United Kingdom
Message 1223123 - Posted: 25 Apr 2012, 11:06:52 UTC - in response to Message 1223121.  
Last modified: 25 Apr 2012, 11:08:34 UTC

What is a "lost task", as in "Sending Lost task" message



A task that the servers allocated to your host but that effectively never arrived, or never finished arriving, perhaps due to network congestion. When your host reports, it tells the servers what it has in its cache. If that doesn't agree with what the servers think you have, they resend any tasks your host should have but hasn't got. These lost tasks are often called ghosts.
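The comparison described above boils down to a set difference. Here is a minimal sketch in Python, not the actual scheduler code: the function name `find_lost_tasks` and the task names are made up for illustration.

```python
def find_lost_tasks(server_assigned, host_reported):
    """Return tasks the server believes the host has but the host did not report."""
    reported = set(host_reported)
    # Keep the server's ordering so the resend list is deterministic.
    return [name for name in server_assigned if name not in reported]


# Hypothetical task names, for illustration only.
server_assigned = ["wu_a_0", "wu_b_1", "wu_c_0"]  # what the server thinks the host has
host_reported = ["wu_a_0"]                        # what the host's cache actually holds

print(find_lost_tasks(server_assigned, host_reported))  # ['wu_b_1', 'wu_c_0']
```

Anything in the returned list is a ghost and is a candidate for resending.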
ID: 1223123
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19012
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1223129 - Posted: 25 Apr 2012, 11:47:22 UTC

A further bit of info. Tasks marked VLAR do not run on the GPU, so on the first sending they go only to the CPU.

That restriction applies to the first sending only. When they are resent they can end up assigned to the GPU; if this happens, a quick and dirty piece of code kicks in and gives them an impossible deadline.
This effectively aborts the task.
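As a rough illustration of that idea (not the real server code, which I have not seen), the guard might look something like this. The function name, the parameters, and the choice of "a deadline already in the past" as the impossible deadline are all assumptions.

```python
def resend_deadline(is_vlar, resource, normal_deadline, now):
    """Pick the deadline for a resent task (hypothetical sketch).

    is_vlar:         True if the task is marked VLAR
    resource:        "cpu" or "gpu", where the resend would run
    normal_deadline: deadline (Unix timestamp) the task would normally get
    now:             current time (Unix timestamp)
    """
    if is_vlar and resource == "gpu":
        # Assumption: "impossible" is modelled as a deadline that has
        # already passed, so the task can never be finished in time.
        return now - 1
    return normal_deadline


print(resend_deadline(True, "gpu", normal_deadline=2000, now=1000))  # 999
print(resend_deadline(True, "cpu", normal_deadline=2000, now=1000))  # 2000
```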
ID: 1223129
Sakletare
Joined: 18 May 99
Posts: 132
Credit: 23,423,829
RAC: 0
Sweden
Message 1223162 - Posted: 25 Apr 2012, 15:09:10 UTC

What I don't understand about lost tasks is why they are only resent in batches of 20.
ID: 1223162
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1223180 - Posted: 25 Apr 2012, 15:59:37 UTC - in response to Message 1223162.  

What I don't understand about lost tasks is why they are only resent in batches of 20.

Because that is the arbitrary number they selected for the upper limit. I would guess it is to limit the amount of data the servers have to move from the disk array or something of that nature.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1223180
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1223186 - Posted: 25 Apr 2012, 16:13:06 UTC - in response to Message 1223180.  

What I don't understand about lost tasks is why they are only resent in batches of 20.

Because that is the arbitrary number they selected for the upper limit. I would guess it is to limit the amount of data the servers have to move from the disk array or something of that nature.

When the tasks are first sent out, there's some fairly rigorous checking done on the server (in theory, at least) to ensure that no task is sent which might possibly run into deadline trouble, given the speed of the host, what proportion of the time it's crunching, how much work it has on board from other projects, etc., etc.
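To make that kind of check concrete, here is a back-of-the-envelope sketch. It is only an assumption about how such a test could be structured, not the scheduler's real logic, and every name and parameter in it is hypothetical.

```python
def can_meet_deadline(task_flops, host_flops_per_s, on_fraction,
                      queued_seconds, deadline_seconds):
    """Rough feasibility check: will the task finish before its deadline?

    task_flops:       estimated work in the new task
    host_flops_per_s: estimated speed of the host
    on_fraction:      proportion of the time the host is crunching (0..1]
    queued_seconds:   crunching time already on board, from any project
    deadline_seconds: time allowed from now until the deadline
    """
    if host_flops_per_s <= 0 or on_fraction <= 0:
        return False
    # Wall-clock time stretches when the host only crunches part of the time.
    run_seconds = task_flops / host_flops_per_s / on_fraction
    wait_seconds = queued_seconds / on_fraction
    return wait_seconds + run_seconds <= deadline_seconds


# One hour of work on a host that crunches half the time, nothing queued:
print(can_meet_deadline(3.6e13, 1e10, 0.5, 0, 10_000))     # True
# Same task, but another hour of work is already on board:
print(can_meet_deadline(3.6e13, 1e10, 0.5, 3600, 10_000))  # False
```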

I think the 'resend' logic is much simpler - which is one reason why we have this problem with VLARs. And don't forget, your computer might have gone off and collected a bucketful of work from another project in the meantime, thinking that SETI hadn't given it anything.

I think the 20 task limit is just a simple-minded precaution against overwork. You can always ask for another 20 five minutes later.
ID: 1223186
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1223240 - Posted: 25 Apr 2012, 18:30:27 UTC - in response to Message 1223180.  

What I don't understand about lost tasks is why they are only resent in batches of 20.

Because that is the arbitrary number they selected for the upper limit. I would guess it is to limit the amount of data the servers have to move from the disk array or something of that nature.

Yes, the 20 comes from the <max_wus_to_send> project setting. Resend for lost tasks uses that setting directly rather than multiplied by the number of processing resources.

Even for normal work assignments, a single core CPU system without a usable GPU is restricted to 20 per request.
Joe
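The difference between the two limits can be sketched like this. The function and parameter names are made up, and how the scheduler actually counts "processing resources" is an assumption here; only the value 20 and the "used directly for resends, multiplied for normal work" rule come from the post above.

```python
def per_request_limit(max_wus_to_send, n_resources, is_resend):
    """Upper bound on tasks handed out in one scheduler request (sketch).

    max_wus_to_send: the project setting (20 at the time of this thread)
    n_resources:     number of processing resources on the host (assumed count)
    is_resend:       True when the request is being filled with lost tasks
    """
    if is_resend:
        # Lost-task resends use the setting directly.
        return max_wus_to_send
    # Normal work assignments scale with the host's processing resources.
    return max_wus_to_send * max(1, n_resources)


print(per_request_limit(20, 1, is_resend=False))  # 20  (single core, no usable GPU)
print(per_request_limit(20, 5, is_resend=False))  # 100
print(per_request_limit(20, 5, is_resend=True))   # 20
```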
ID: 1223240
Profile Francesco Forti
Joined: 24 May 00
Posts: 334
Credit: 204,421,005
RAC: 15
Switzerland
Message 1235741 - Posted: 24 May 2012, 8:24:22 UTC
Last modified: 24 May 2012, 8:31:12 UTC

These days I see a lot of:

896	SETI@home	24.05.2012 10:18:39	Resent lost task 07my10ag.14625.51365.15.10.220_1	
916	SETI@home	24.05.2012 10:18:39	[error] Already have task 07my10ag.14625.51365.15.10.220_1	


Of course, I see 20 "Resent lost task" messages followed by 20 "Already have task" errors.
This happens mainly on two hosts. [Edit: now three.]
ID: 1235741
Profile Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1235763 - Posted: 24 May 2012, 10:20:27 UTC
Last modified: 24 May 2012, 10:22:18 UTC

[quote]These days I see a lot of:

896 SETI@home 24.05.2012 10:18:39 Resent lost task 07my10ag.14625.51365.15.10.220_1
916 SETI@home 24.05.2012 10:18:39 [error] Already have task 07my10ag.14625.51365.15.10.220_1

Of course, I see 20 "Resent lost task" messages followed by 20 "Already have task" errors.
This happens mainly on two hosts. [Edit: now three.]
[/quote]

This is an unexpected consequence of a change to the server code that was made during Tuesday's maintenance window. It occurs when you try to report more than 64 results, so it should only show up on your fastest hosts. The project has been alerted to this issue. There is more info in the thread "Unannounced Server-Side Change?"
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1235763
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1236005 - Posted: 24 May 2012, 17:47:25 UTC

Typically, a "Resent lost task" message happens because of the following:

Your machine sends a server request asking for work. The request makes it to the server, gets processed, and WUs get assigned to you. The message from the server back to you informing you of these tasks and where to download them from gets dropped due to network congestion.

Your machine will say something like "time-out window reached" after 360 seconds because it did not get a response from the server. Usually about 60 seconds later, it will send another scheduler request. Each scheduler request includes a copy of client_state.xml, which says what work you have, how long it is expected to take, etc.

The server looks at what your machine says it has, compares it to what you're supposed to have, and sends you those same tasks again, but as "lost tasks." Resends are limited to 20 at a time. As long as there are no crazy hardware limitations, other projects haven't picked up a whole pile of work in the meantime, and you're not getting too close to deadlines, you can get an indefinite number of batches of 20 resends until you have all that you're supposed to have.
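To see how the batches of 20 play out over successive scheduler requests, here is a small sketch. The helper name `resend_batches` and the task names are hypothetical; only the batch size of 20 comes from this thread.

```python
def resend_batches(lost, batch_size=20):
    """Split the lost tasks into the batches successive requests would return."""
    return [lost[i:i + batch_size] for i in range(0, len(lost), batch_size)]


# Suppose 45 tasks went missing (made-up names).
ghosts = [f"task_{i}" for i in range(45)]
batches = resend_batches(ghosts)

print(len(batches))               # 3 scheduler requests needed
print([len(b) for b in batches])  # [20, 20, 5]
```

So a host with 45 ghosts would need three scheduler requests to get everything back, assuming nothing else (deadlines, other projects' work) gets in the way.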


The resending of lost tasks is a relatively new feature. Lost tasks used to be called "ghosts" because they would show up on your tasks page here on the website, but they weren't on your machine. They would simply sit there until the deadline passed and then get sent to somebody else. Ghost tasks only happen when the network pipe is maxed out, which for the past 6+ months has been nearly 24/7.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1236005
