"Lost Task"



Message boards : Number crunching : "Lost Task"

John S. (Low IFR) Jenkins
Joined: 22 Feb 01
Posts: 53
Credit: 24,515,908
RAC: 0
United States
Message 1223121 - Posted: 25 Apr 2012, 10:53:23 UTC

What is a "lost task", as in the "Sending Lost task" message?

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8635
Credit: 23,744,400
RAC: 18,766
United Kingdom
Message 1223122 - Posted: 25 Apr 2012, 11:06:41 UTC - in response to Message 1223121.

What is a "lost task", as in the "Sending Lost task" message?

A task that failed to reach your computer the first time it was sent.

Area 51
Joined: 31 Jan 04
Posts: 965
Credit: 42,193,520
RAC: 0
United Kingdom
Message 1223123 - Posted: 25 Apr 2012, 11:06:52 UTC - in response to Message 1223121.
Last modified: 25 Apr 2012, 11:08:34 UTC

What is a "lost task", as in the "Sending Lost task" message?



A task that the servers allocated to your host - but effectively never arrived, or never finished arriving - perhaps due to network congestion. When your host reports, it tells the servers what it has in its cache - if that doesn't agree with what the server thinks you have, the servers resend any tasks your host hasn't got but should have. These lost tasks are often called ghosts.
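The comparison described above can be pictured as a simple set difference. This is only a rough sketch, not the actual BOINC scheduler code: the function name and the task names are made up for illustration, and the real server checks far more than this.

```python
def find_lost_tasks(server_assigned, host_reported):
    """Return tasks the server has on record for a host that the host did not report."""
    reported = set(host_reported)
    # Keep the server's ordering so the result is deterministic.
    return [name for name in server_assigned if name not in reported]


# Hypothetical task names, for illustration only.
server_assigned = ["task_a_0", "task_b_1", "task_c_0"]
host_reported = ["task_a_0"]

# The two tasks the host never received are the "lost" (ghost) tasks.
print(find_lost_tasks(server_assigned, host_reported))
```

Anything the host reports that the server does not know about is simply ignored in this sketch.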

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8635
Credit: 23,744,400
RAC: 18,766
United Kingdom
Message 1223129 - Posted: 25 Apr 2012, 11:47:22 UTC

A further bit of info: tasks marked VLAR do not run on the GPU, so they are sent to the CPU only. That restriction is applied on the first sending only.

When they are resent, they can end up being sent to the GPU. If this happens, a quick and dirty piece of code kicks in and gives them an impossible deadline.
This effectively aborts the task.

Sakletare
Joined: 18 May 99
Posts: 131
Credit: 20,877,464
RAC: 3,995
Sweden
Message 1223162 - Posted: 25 Apr 2012, 15:09:10 UTC

What I don't understand about lost tasks is why they are only resent in batches of 20.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4087
Credit: 111,884,334
RAC: 147,602
United States
Message 1223180 - Posted: 25 Apr 2012, 15:59:37 UTC - in response to Message 1223162.

What I don't understand about lost tasks is why they are only resent in batches of 20.

Because that is the arbitrary number they selected for the upper limit. I would guess it is to limit the amount of data the servers have to move from the disk array or something of that nature.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8466
Credit: 49,006,816
RAC: 72,891
United Kingdom
Message 1223186 - Posted: 25 Apr 2012, 16:13:06 UTC - in response to Message 1223180.

What I don't understand about lost tasks is why they are only resent in batches of 20.

Because that is the arbitrary number they selected for the upper limit. I would guess it is to limit the amount of data the servers have to move from the disk array or something of that nature.

When the tasks are first sent out, there's some fairly rigorous checking done on the server (in theory, at least) to ensure that no task is sent which might possibly run into deadline trouble, given the speed of the host, what proportion of the time it's crunching, how much work it has on board from other projects, etc., etc.

I think the 'resend' logic is much simpler - which is one reason why we have this problem with VLARs. And don't forget, your computer might have gone off and collected a bucketful of work from another project in the meantime, thinking that SETI hadn't given it anything.

I think the 20 task limit is just a simple-minded precaution against overwork. You can always ask for another 20 five minutes later.

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4230
Credit: 1,043,255
RAC: 311
United States
Message 1223240 - Posted: 25 Apr 2012, 18:30:27 UTC - in response to Message 1223180.

What I don't understand about lost tasks is why they are only resent in batches of 20.

Because that is the arbitrary number they selected for the upper limit. I would guess it is to limit the amount of data the servers have to move from the disk array or something of that nature.

Yes, the 20 comes from the <max_wus_to_send> project setting. Resend for lost tasks uses that setting directly rather than multiplied by the number of processing resources.

Even for normal work assignments, a single core CPU system without a usable GPU is restricted to 20 per request.
Joe
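To put numbers on Joe's explanation, here is a small sketch of how the two limits would differ. The function names are hypothetical, and how the scheduler actually counts "processing resources" is an assumption here; the only values taken from the thread are the setting of 20 and the fact that resends do not multiply it.

```python
MAX_WUS_TO_SEND = 20  # value of <max_wus_to_send> as reported in this thread


def normal_request_limit(n_resources, max_wus_to_send=MAX_WUS_TO_SEND):
    """Per-request cap for normal work: the setting times the resource count.

    n_resources is assumed to be the number of processing resources the
    scheduler counts for the host (hypothetical parameter).
    """
    return max_wus_to_send * max(1, n_resources)


def resend_limit(max_wus_to_send=MAX_WUS_TO_SEND):
    """Per-request cap for lost-task resends: the setting used directly."""
    return max_wus_to_send


print(normal_request_limit(1))  # single-core CPU, no usable GPU
print(normal_request_limit(4))  # a host counted as having 4 resources
print(resend_limit())
```

So a single-resource host sees the same cap either way, while a bigger host gets a larger cap on normal requests but still only 20 resends per request.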

Francesco Forti
Joined: 24 May 00
Posts: 281
Credit: 137,763,817
RAC: 44,132
Switzerland
Message 1235741 - Posted: 24 May 2012, 8:24:22 UTC
Last modified: 24 May 2012, 8:31:12 UTC

These days I see a lot of

896 SETI@home 24.05.2012 10:18:39 Resent lost task 07my10ag.14625.51365.15.10.220_1
916 SETI@home 24.05.2012 10:18:39 [error] Already have task 07my10ag.14625.51365.15.10.220_1


Of course I see 20 "Resent lost task" messages followed by 20 "Already have task" errors.
This happens mainly on two hosts. [Edit: now three].
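If you want to check how many resends are being rejected like this, a short script over saved message-log lines can pair them up. This is a minimal sketch that assumes the lines look exactly like the two quoted above; the function name is made up.

```python
import re

# Patterns based on the two message-log lines quoted in this post.
RESENT = re.compile(r"Resent lost task (\S+)")
ALREADY = re.compile(r"\[error\] Already have task (\S+)")


def duplicate_resends(log_lines):
    """Return task names that were resent as lost but rejected as already present."""
    resent, already = set(), set()
    for line in log_lines:
        match = RESENT.search(line)
        if match:
            resent.add(match.group(1))
            continue
        match = ALREADY.search(line)
        if match:
            already.add(match.group(1))
    return sorted(resent & already)


log = [
    "896 SETI@home 24.05.2012 10:18:39 Resent lost task 07my10ag.14625.51365.15.10.220_1",
    "916 SETI@home 24.05.2012 10:18:39 [error] Already have task 07my10ag.14625.51365.15.10.220_1",
]
print(duplicate_resends(log))  # ['07my10ag.14625.51365.15.10.220_1']
```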

Fred E.
Project donor
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 118
United States
Message 1235763 - Posted: 24 May 2012, 10:20:27 UTC
Last modified: 24 May 2012, 10:22:18 UTC

[quote]These days I see a lot of

896 SETI@home 24.05.2012 10:18:39 Resent lost task 07my10ag.14625.51365.15.10.220_1
916 SETI@home 24.05.2012 10:18:39 [error] Already have task 07my10ag.14625.51365.15.10.220_1

Of course I see 20 "Resent lost task" messages followed by 20 "Already have task" errors.
This happens mainly on two hosts. [Edit: now three].
[/quote]

This is an unexpected consequence of a change to the server code that was made during Tuesday's maintenance window. It occurs when you try to report more than 64 results, so it should only occur on your fastest hosts. The project has been alerted to this issue. There is more info in the thread "Unannounced Server-Side Change?"
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2250
Credit: 8,602,524
RAC: 4,322
United States
Message 1236005 - Posted: 24 May 2012, 17:47:25 UTC

Typically, "Resent lost task" happens because of the following:

Your machine sends a server request asking for work. The request makes it to the server, gets processed, and WUs get assigned to you. The message from the server back to you informing you of these tasks and where to download them from gets dropped due to network congestion.

Your machine will say something like "time-out window reached" after 360 seconds because it did not get a response from the server. After a delay of usually 60 seconds, it will send another scheduler request. Every scheduler request includes a copy of client_state.xml, which says what work you have, how long it is expected to take, etc.

The server looks at what your machine says it has, compares it to what you're supposed to have, and sends you those same tasks again, but as "lost tasks." Resends are limited to 20 at a time. As long as there are no crazy hardware limitations, other projects haven't picked up a whole pile of work in the meantime, and you're not getting too close to deadlines, you can get an indefinite number of batches of 20 resends until you have all that you're supposed to have.
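Here is a toy simulation of those batches of 20, just to show how many scheduler requests it takes to catch up. It assumes every resent batch actually arrives and ignores the deadline, hardware, and other-project checks mentioned above; the names are hypothetical.

```python
def resend_in_batches(server_assigned, host_has, batch_limit=20):
    """Yield successive batches of lost tasks until the host matches the server."""
    if batch_limit < 1:
        raise ValueError("batch_limit must be at least 1")
    have = set(host_has)
    while True:
        # Each scheduler request: compare the server's list with the host's list.
        lost = [task for task in server_assigned if task not in have]
        if not lost:
            return
        batch = lost[:batch_limit]
        have.update(batch)  # assumption: the whole batch reaches the host
        yield batch


# Hypothetical host that lost all 45 of its assigned tasks.
assigned = [f"task_{i}" for i in range(45)]
batches = list(resend_in_batches(assigned, host_has=[]))
print([len(batch) for batch in batches])  # [20, 20, 5]
```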


The resending of lost tasks is a relatively new feature. Lost tasks used to be called "ghosts" because they would show up on your tasks page here on the website, but they weren't on your machine. They would simply sit there until the deadline passed and then get sent to somebody else. Ghost tasks only happen when the network pipe is maxed out, which for the past 6+ months has been nearly 24/7.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact


Copyright © 2014 University of California