Impossible deadlines

Len
Joined: 15 Mar 10
Posts: 52
Credit: 11,725,173
RAC: 86
United Kingdom
Message 1284749 - Posted: 17 Sep 2012, 10:56:36 UTC

I currently have 53 errors and they are all tasks that had deadlines within just a few minutes of the 'Sent' time. My machine is not up there on crunch time ranking like many of you, but even if it were it seems a big ask.

This seems to be something that is increasing rather than diminishing. It does not particularly affect my stats, and frankly I wouldn't really be bothered if it did, as all I am doing is releasing idle time to S@H. But it seems like an awful waste of valuable S@H Server time and bandwidth to me. It has been happening occasionally for months, but just seems to be getting worse.

I don't nursemaid S@H, so I only ever see these as errors. So I can't tell if it's true, but it appears that if the 'Sent' timestamp is taken when the download starts, a task might even have expired by the time it reaches my machine. Even if my machine started it immediately, it could not possibly finish by the deadline.

Len
I think I am. Therefore I am. I think.
ID: 1284749

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14645
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1284755 - Posted: 17 Sep 2012, 11:09:13 UTC - in response to Message 1284749.  

Don't worry about it. The explanation is well known, and has been given on these boards many times - but it's only really of interest to people who do monitor their machines obsessively.

The important thing is that no bandwidth is wasted - these tasks never get anywhere near your computer.
ID: 1284755

.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1284756 - Posted: 17 Sep 2012, 11:10:44 UTC
Last modified: 17 Sep 2012, 11:13:24 UTC

I take it as an indication of how stressed the download system is.
When downloads are stuck I have had some that expire before I complete the download.
Just another day at the office.

edit - Richard types quicker than I do :¬)
ID: 1284756

Len
Joined: 15 Mar 10
Posts: 52
Credit: 11,725,173
RAC: 86
United Kingdom
Message 1284760 - Posted: 17 Sep 2012, 11:23:04 UTC

Thanks Richard & Clive. I shall relax and simply consider it a measure of how hard S@H as a whole is stressed. - And the speedy replies as a measure of the enthusiastic community we have. ;)

Len
I think I am. Therefore I am. I think.
ID: 1284760

Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1284772 - Posted: 17 Sep 2012, 12:38:32 UTC - in response to Message 1284755.  

The explanation is well known, and has been given on these boards many times

I think we should have a sticky thread about it, this question seems to come back at least 1-2 times a week (including posts in the panic thread).
ID: 1284772

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1284777 - Posted: 17 Sep 2012, 13:00:35 UTC - in response to Message 1284772.  

The explanation is well known, and has been given on these boards many times

I think we should have a sticky thread about it, this question seems to come back at least 1-2 times a week (including posts in the panic thread).

People actually read those?

Ideally I think the solution would be to change the results messages from
"Outcome No reply" & "Timed out - no response"
to something like
"Outcome server canceled" & "Timed out - server canceled"

Something more descriptive would be better, but I haven't had coffee yet this morning.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1284777

Link
Joined: 18 Sep 03
Posts: 834
Credit: 1,807,369
RAC: 0
Germany
Message 1284793 - Posted: 17 Sep 2012, 14:09:06 UTC - in response to Message 1284777.  

That would be even better. There is already a "Canceled by server" status. It is used when a result comes back after its deadline: the replacement task is canceled once the late task has been returned and maybe validated (on projects which use this feature, for example Collatz). That status could be used here too; the current messages are completely wrong.
ID: 1284793

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1284975 - Posted: 17 Sep 2012, 21:53:40 UTC

Unfortunately, these are more destructive...

Task        Work unit   Sent                        Time reported or deadline   Status                   Run time  CPU time  Credit  Application
2609402650  1067642640  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402648  1067642634  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402645  1067642628  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402643  1067642622  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402641  1067642616  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402639  1067642610  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402637  1067642604  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402635  1067642598  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
2609402631  1067642586  17 Sep 2012 | 15:27:44 UTC  17 Sep 2012 | 15:33:55 UTC  Timed out - no response  0.00      0.00      ---     SETI@home Enhanced, Anonymous platform (CPU)
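
For reference, the gap between the Sent time and the deadline in the rows above can be checked with a few lines of Python. This is only a sketch: the helper name `window_seconds` and the format string are assumptions made for illustration, matched to how the timestamps are printed in this post.

```python
from datetime import datetime

# Assumed layout of the timestamps as they appear in the task list above.
FMT = "%d %b %Y | %H:%M:%S UTC"


def window_seconds(sent: str, deadline: str) -> float:
    """Return the number of seconds between the Sent time and the deadline."""
    delta = datetime.strptime(deadline, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds()


sent = "17 Sep 2012 | 15:27:44 UTC"
deadline = "17 Sep 2012 | 15:33:55 UTC"
print(window_seconds(sent, deadline))  # 371.0
```

That is a window of 6 minutes 11 seconds for each of the nine tasks listed.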

And now my main cruncher sits w/o CPU tasks at all - quota was cut to 6 tasks per day... because of server errors.

IMHO it's high time to treat client-side and server-side errors separately. The latter should not affect user quota!
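
The proposal in this post could look roughly like the toy policy below. Everything in it is hypothetical: the function name `update_quota`, the `server_side` flag, the limit of 100 and the doubling and decrement rules are invented for illustration and are not taken from the BOINC server code.

```python
def update_quota(quota: int, outcome: str, server_side: bool, max_quota: int = 100) -> int:
    """Toy daily-quota update that ignores errors caused by the server.

    Hypothetical policy, not BOINC's actual algorithm.
    """
    if outcome == "success":
        # Reward a good result by raising the quota, up to the cap.
        return min(max_quota, quota * 2)
    if server_side:
        # Server-side failure (e.g. task expired before delivery): no penalty.
        return quota
    # Client-side failure: reduce the quota, but never below one task per day.
    return max(1, quota - 1)


print(update_quota(6, "error", server_side=True))   # 6
print(update_quota(6, "error", server_side=False))  # 5
```

The only point of the sketch is the middle branch: a timeout the host never had a chance to avoid leaves its quota untouched.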
ID: 1284975

tbret
Volunteer tester
Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1284980 - Posted: 17 Sep 2012, 22:13:06 UTC - in response to Message 1284975.  

And now my main cruncher sits w/o CPU tasks at all - quota was cut to 6 tasks per day... because of server errors.

IMHO it's high time to treat client-side and server-side errors separately. The latter should not affect user quota!

+1
ID: 1284980

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1284983 - Posted: 17 Sep 2012, 22:19:30 UTC

Isn't this new wording? Change today?

9/17/2012 5:09:59 PM | SETI@home | Didn't resend lost task 13my12aa.28462.19290.12.10.168.vlar_0 (expired).

At least it gives a clue.

Agree that these should not impact daily quotas and s/b called server aborted or something like that.

Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1284983

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1285147 - Posted: 18 Sep 2012, 13:51:13 UTC - in response to Message 1284983.  

Isn't this new wording? Change today?

9/17/2012 5:09:59 PM | SETI@home | Didn't resend lost task 13my12aa.28462.19290.12.10.168.vlar_0 (expired).

At least it gives a clue.

Agree that these should not impact daily quotas and s/b called server aborted or something like that.

The client has had that message for quite some time.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1285147

JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1285155 - Posted: 18 Sep 2012, 14:29:02 UTC

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734
ID: 1285155

Alex Erne
Joined: 18 Nov 08
Posts: 12
Credit: 800,330
RAC: 0
Netherlands
Message 1285156 - Posted: 18 Sep 2012, 14:37:25 UTC

While we're at it... How about the BOINC manager working on tasks in order of deadline? I have enough time for all tasks on my system (most of the time...), but I always notice that tasks with a deadline of Nov 3 are already processed while tasks with a deadline of Sept 30 are still waiting...
<--- Searching for this one ;-)
ID: 1285156

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1285176 - Posted: 18 Sep 2012, 15:31:20 UTC - in response to Message 1285156.  

While we're at it... How about the BOINC manager working on tasks in order of deadline? I have enough time for all tasks on my system (most of the time...), but I always notice that tasks with a deadline of Nov 3 are already processed while tasks with a deadline of Sept 30 are still waiting...

AFAIK, BOINC works in strict FIFO order unless it thinks that a WU will miss its deadline. If it worked in deadline order, long tasks with long deadlines would always be delayed and suspended whenever new work with short deadlines appeared. Projects with long tasks (and long deadlines) would then have serious trouble getting their work done (not to mention that some projects might try to shorten their deadlines to put themselves very high in the order).
If the client scheduler notices that some WUs will miss the deadline, it enters "panic mode" (aka High priority), switches from FIFO order to deadline order, and crunches the tasks in danger first...
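
The behaviour described in this post can be sketched as a small single-core model in Python. This is a simplified illustration under stated assumptions, not the BOINC client's real scheduler: the `Task` fields, the `run_order` function and the rule of re-sorting the whole queue by deadline are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int       # order in which the task was received (lower = earlier)
    deadline: float    # hours from now until the deadline
    remaining: float   # estimated hours of work left


def run_order(tasks):
    """Return tasks in FIFO order, or in deadline order if FIFO would miss a deadline."""
    fifo = sorted(tasks, key=lambda t: t.arrival)
    clock = 0.0
    for t in fifo:
        clock += t.remaining          # simulate running the queue back to back
        if clock > t.deadline:        # this task would finish late under FIFO
            # "Panic mode": fall back to earliest deadline first.
            return sorted(tasks, key=lambda t: t.deadline)
    return fifo


relaxed = [Task("nov3", 0, 1000.0, 10.0), Task("sep30", 1, 500.0, 10.0)]
print([t.name for t in run_order(relaxed)])  # ['nov3', 'sep30']

tight = [Task("nov3", 0, 1000.0, 10.0), Task("sep30", 1, 15.0, 10.0)]
print([t.name for t in run_order(tight)])    # ['sep30', 'nov3']
```

In the first case both tasks finish in time under FIFO, so the task with the later deadline still runs first, which matches the observation in the question. In the second case the model switches to deadline order because the later arrival would otherwise miss its deadline.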
ID: 1285176

Bill G Special Project $75 donor
Joined: 1 Jun 01
Posts: 1282
Credit: 187,688,550
RAC: 182
United States
Message 1285233 - Posted: 18 Sep 2012, 21:35:16 UTC - in response to Message 1285155.  

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734

My guess would be that it was stuck downloading and did not finish the download in time. I see that every now and then, now that I am having problems downloading. (This is especially the case with large files, as APs are.)

SETI@home classic workunits 4,019
SETI@home classic CPU time 34,348 hours
ID: 1285233

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1285246 - Posted: 18 Sep 2012, 21:54:12 UTC - in response to Message 1285233.  
Last modified: 18 Sep 2012, 21:54:50 UTC

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734

My guess would be that it was stuck downloading and did not finish the download in time. I see that every now and then, now that I am having problems downloading. (This is especially the case with large files, as APs are.)

My guess is different... and I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...
ID: 1285246

JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1285248 - Posted: 18 Sep 2012, 21:57:35 UTC - in response to Message 1285246.  

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... and I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled AP's for GPU for now, so it can't be that.
ID: 1285248

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 75,967,266
RAC: 0
Argentina
Message 1285254 - Posted: 18 Sep 2012, 22:10:55 UTC - in response to Message 1285248.  
Last modified: 18 Sep 2012, 22:19:08 UTC

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... and I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled AP's for GPU for now, so it can't be that.


On the contrary, I think it's more likely then...
If the scheduler was not able to send them to your GPU, then that's why they were "aborted"... The resend code doesn't put WUs on hold waiting for a request from a more suitable device... If the scheduler did that, then instead of getting the usual short deadline, VLARs would be skipped until you need a CPU task... (I have several APs "aborted" through the deadline on all my hosts. All of them use BOINC 6.10.60, but only one doesn't have the apps to crunch them on GPU...)

Edit: What I think is that with the addition of the new OpenCL stock apps for ATI and Nvidia, and trying to make them compatible with older clients using the optimized apps, and all the workarounds made to avoid VLARs on GPU and whatnot... something is not working as intended with the resend of APs...
ID: 1285254

Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1285260 - Posted: 18 Sep 2012, 22:19:56 UTC - in response to Message 1285248.  

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... and I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled AP's for GPU for now, so it can't be that.

Well then that answers it, that AP just suffered the same fate that VLAR's do.

Cheers.
ID: 1285260

JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1285264 - Posted: 18 Sep 2012, 22:24:41 UTC - in response to Message 1285260.  

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... and I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled AP's for GPU for now, so it can't be that.

Well then that answers it, that AP just suffered the same fate that VLAR's do.

Cheers.

I disabled GPU APs days ago by removing their app_info section, so that AP could only be sent or resent to the CPU.
ID: 1285264

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.