Impossible deadlines


Profile Len
Joined: 15 Mar 10
Posts: 53
Credit: 1,728,200
RAC: 1,868
United Kingdom
Message 1284749 - Posted: 17 Sep 2012, 10:56:36 UTC

I currently have 53 errors and they are all tasks that had deadlines within just a few minutes of the 'Sent' time. My machine is not up there on crunch time ranking like many of you, but even if it were it seems a big ask.

This seems to be something that is increasing rather than diminishing. It does not particularly affect my stats, and frankly I wouldn't really be bothered if it did, as all I am doing is releasing idle time to S@H. But it seems like an awful waste of valuable S@H Server time and bandwidth to me. It has been happening occasionally for months, but just seems to be getting worse.

I don't nursemaid S@H, so I only ever see these as errors. I can't tell whether it's true, but it appears that if the 'Sent' timestamp is taken when the download starts, the task might even have expired by the time it reached my machine. Even if my machine started it immediately, it could not possibly finish by the deadline.

Len
____________
I think I am. Therefore I am. I think.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,946,183
RAC: 13,645
United Kingdom
Message 1284755 - Posted: 17 Sep 2012, 11:09:13 UTC - in response to Message 1284749.

Don't worry about it. The explanation is well known, and has been given on these boards many times - but it's only really of interest to people who do monitor their machines obsessively.

The important thing is that no bandwidth is wasted - these tasks never get anywhere near your computer.

clive G1FYE
Volunteer moderator
Joined: 4 Nov 04
Posts: 1300
Credit: 23,054,144
RAC: 5
United Kingdom
Message 1284756 - Posted: 17 Sep 2012, 11:10:44 UTC
Last modified: 17 Sep 2012, 11:13:24 UTC

I take it as an indication of how stressed the download system is.
When downloads are stuck, I have had some that expire before I complete the download.
Just another day at the office.

edit - Richard types quicker than I do :¬)

Profile Len
Joined: 15 Mar 10
Posts: 53
Credit: 1,728,200
RAC: 1,868
United Kingdom
Message 1284760 - Posted: 17 Sep 2012, 11:23:04 UTC

Thanks Richard & Clive. I shall relax and simply consider it a measure of how hard S@H as a whole is stressed. - And the speedy replies as a measure of the enthusiastic community we have. ;)

Len
____________
I think I am. Therefore I am. I think.

Profile Link
Joined: 18 Sep 03
Posts: 813
Credit: 1,500,813
RAC: 410
Germany
Message 1284772 - Posted: 17 Sep 2012, 12:38:32 UTC - in response to Message 1284755.

The explanation is well known, and has been given on these boards many times

I think we should have a sticky thread about it, this question seems to come back at least 1-2 times a week (including posts in the panic thread).
____________
.

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3570
Credit: 97,994,472
RAC: 79,137
United States
Message 1284777 - Posted: 17 Sep 2012, 13:00:35 UTC - in response to Message 1284772.

The explanation is well known, and has been given on these boards many times

I think we should have a sticky thread about it, this question seems to come back at least 1-2 times a week (including posts in the panic thread).

People actually read those?

Ideally I think the solution would be to change the results messages from
"Outcome No reply" & "Timed out - no response"
to something like
"Outcome server canceled" & "Timed out - server canceled"

Something more descriptive would be better, but I haven't had coffee yet this morning.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Profile Link
Joined: 18 Sep 03
Posts: 813
Credit: 1,500,813
RAC: 410
Germany
Message 1284793 - Posted: 17 Sep 2012, 14:09:06 UTC - in response to Message 1284777.

That would be even better. There is already a status "Canceled by server"; it's used when a result is returned after its deadline, i.e. the replacement task is cancelled once the late task has been returned and maybe validated (on projects which use this feature, for example Collatz). That status could be used here too; the current messages are completely wrong.
____________
.

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 3289
Credit: 40,814,909
RAC: 58,199
Russia
Message 1284975 - Posted: 17 Sep 2012, 21:53:40 UTC

Unfortunately, these are more destructive...

(Columns: task ID, workunit ID, sent, time reported or deadline, status, run time, CPU time, credit, application)
2609402650 1067642640 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402648 1067642634 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402645 1067642628 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402643 1067642622 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402641 1067642616 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402639 1067642610 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402637 1067642604 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402635 1067642598 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
2609402631 1067642586 17 Sep 2012 | 15:27:44 UTC 17 Sep 2012 | 15:33:55 UTC Timed out - no response 0.00 0.00 --- SETI@home Enhanced, Anonymous platform (CPU)
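
For a sense of scale, the window between the sent time and the deadline in the rows above can be worked out directly. A minimal Python sketch using the two timestamps from the listing; the function name `window_seconds` is only illustrative and not part of any BOINC tool:

```python
from datetime import datetime

# Timestamp layout used below (the "|" separator from the listing is dropped).
FMT = "%d %b %Y %H:%M:%S"

def window_seconds(sent: str, deadline: str) -> float:
    """Seconds between the 'sent' time and the deadline (both UTC)."""
    delta = datetime.strptime(deadline, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds()

# Values copied from the task rows above.
window = window_seconds("17 Sep 2012 15:27:44", "17 Sep 2012 15:33:55")
print(window)  # 371.0, i.e. 6 minutes 11 seconds
```

Six minutes or so is not enough to download a task and crunch it, which fits the explanation earlier in the thread that these tasks never reach the host at all.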

And now my main cruncher sits w/o CPU tasks at all - quota was cut to 6 tasks per day... because of server errors.

IMHO it's high time to treat client-side and server-side errors separately. The latter should not affect the user's quota!
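
A rough sketch of what that separation could look like, in Python. Everything here is hypothetical: the outcome labels, the halving and doubling rule and the `MAX_QUOTA` value are assumptions made for illustration, not the actual BOINC server logic.

```python
MAX_QUOTA = 100  # assumed per-day ceiling, illustrative only

# Hypothetical outcome labels; the real server uses its own codes.
SERVER_SIDE = {"server_cancelled", "expired_before_download"}

def update_quota(quota: int, outcome: str) -> int:
    """Return the new daily quota after one task outcome."""
    if outcome == "success":
        return min(MAX_QUOTA, quota * 2)
    if outcome in SERVER_SIDE:
        return quota  # not the host's fault: leave the quota alone
    return max(1, quota // 2)  # client-side error: back off

quota = 100
for outcome in ["expired_before_download"] * 9:
    quota = update_quota(quota, outcome)
print(quota)  # 100
```

Under this assumed rule, nine tasks that expire on the server side leave the quota untouched, while a genuine client error would still halve it.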

tbret
Volunteer tester
Joined: 28 May 99
Posts: 2390
Credit: 164,210,269
RAC: 68,621
United States
Message 1284980 - Posted: 17 Sep 2012, 22:13:06 UTC - in response to Message 1284975.



And now my main cruncher sits w/o CPU tasks at all - quota was cut to 6 tasks per day... because of server errors.

IMHO it's high time to treat client-side and server-side errors separately. The latter should not affect the user's quota!





+1

Profile Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 731
Credit: 22,098,078
RAC: 24,827
United States
Message 1284983 - Posted: 17 Sep 2012, 22:19:30 UTC

Isn't this new wording? Change today?

9/17/2012 5:09:59 PM | SETI@home | Didn't resend lost task 13my12aa.28462.19290.12.10.168.vlar_0 (expired).

At least it gives a clue.

Agree that these should not impact daily quotas and s/b called server aborted or something like that.

____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 3570
Credit: 97,994,472
RAC: 79,137
United States
Message 1285147 - Posted: 18 Sep 2012, 13:51:13 UTC - in response to Message 1284983.

Isn't this new wording? Change today?

9/17/2012 5:09:59 PM | SETI@home | Didn't resend lost task 13my12aa.28462.19290.12.10.168.vlar_0 (expired).

At least it gives a clue.

Agree that these should not impact daily quotas and s/b called server aborted or something like that.

The client has had that message for quite some time.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 823
Credit: 33,522,099
RAC: 69,346
Denmark
Message 1285155 - Posted: 18 Sep 2012, 14:29:02 UTC

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734

Profile DemiGoth
Joined: 18 Nov 08
Posts: 12
Credit: 612,558
RAC: 0
Netherlands
Message 1285156 - Posted: 18 Sep 2012, 14:37:25 UTC

While we're at it... How about the BOINC manager working on tasks in order of deadline? I have enough time for all tasks on my system (most of the time...), but I always notice that tasks with a deadline of Nov 3 are already processed while tasks with deadlines of Sept 30 are still waiting...
____________
<--- Searching for this one ;-)

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 60,047,474
RAC: 87,548
Argentina
Message 1285176 - Posted: 18 Sep 2012, 15:31:20 UTC - in response to Message 1285156.

While we're at it... How about the BOINC manager working on tasks in order of deadline? I have enough time for all tasks on my system (most of the time...), but I always notice that tasks with a deadline of Nov 3 are already processed while tasks with deadlines of Sept 30 are still waiting...

AFAIK, BOINC works in strict FIFO order unless it thinks that a WU will miss its deadline. If it worked in deadline order, long tasks with long deadlines would always be delayed and suspended whenever new work with short deadlines appeared. Projects with long tasks (and long deadlines) would then have serious trouble getting their work done (not to mention that some projects might try to reduce their deadlines to put themselves very high in the order).
If the client scheduler notices that some WUs will miss their deadline, it enters "panic mode" (aka high priority), switches from FIFO order to deadline order, and crunches the tasks in danger first...
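
The behaviour described above can be sketched roughly as follows. This is a toy model in Python, not the BOINC client's scheduler: the `Task` fields, the single-CPU assumption and the way a deadline miss is projected are all simplifications invented for illustration, and the sketch reorders the whole queue rather than only the tasks in danger.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    arrival: int      # order received (lower = earlier)
    runtime: float    # estimated hours of work remaining
    deadline: float   # hours from now

def misses_deadline(queue: list[Task]) -> bool:
    """Simulate running the queue in FIFO order on one CPU."""
    elapsed = 0.0
    for task in sorted(queue, key=lambda t: t.arrival):
        elapsed += task.runtime
        if elapsed > task.deadline:
            return True
    return False

def run_order(queue: list[Task]) -> list[str]:
    """FIFO normally; earliest deadline first if FIFO would miss one."""
    if misses_deadline(queue):
        key = lambda t: t.deadline   # "panic mode"
    else:
        key = lambda t: t.arrival    # plain FIFO
    return [t.name for t in sorted(queue, key=key)]

relaxed = [Task("long", 0, 10, 1000), Task("short", 1, 2, 300)]
tight = [Task("long", 0, 10, 1000), Task("short", 1, 2, 5)]
print(run_order(relaxed))  # ['long', 'short']
print(run_order(tight))    # ['short', 'long']
```

In the relaxed case the later-deadline task runs first simply because it arrived first, which is the behaviour DemiGoth noticed; the order only flips when FIFO would actually cause a miss.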
____________

Profile Bill G
Joined: 1 Jun 01
Posts: 340
Credit: 29,975,387
RAC: 68,618
United States
Message 1285233 - Posted: 18 Sep 2012, 21:35:16 UTC - in response to Message 1285155.

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734

My guess would be that it was stuck downloading and did not finish the download in time. I am seeing that every now and then, now that I am having problems downloading. (This is especially the case for large files, as APs are.)
____________

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 60,047,474
RAC: 87,548
Argentina
Message 1285246 - Posted: 18 Sep 2012, 21:54:12 UTC - in response to Message 1285233.
Last modified: 18 Sep 2012, 21:54:50 UTC

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734

My guess would be that it was stuck downloading and did not finish the download in time. I am seeing that every now and then, now that I am having problems downloading. (This is especially the case for large files, as APs are.)

My guess is different... I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...
____________

JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 823
Credit: 33,522,099
RAC: 69,346
Denmark
Message 1285248 - Posted: 18 Sep 2012, 21:57:35 UTC - in response to Message 1285246.

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled APs for GPU for now, so it can't be that.

Horacio
Joined: 14 Jan 00
Posts: 536
Credit: 60,047,474
RAC: 87,548
Argentina
Message 1285254 - Posted: 18 Sep 2012, 22:10:55 UTC - in response to Message 1285248.
Last modified: 18 Sep 2012, 22:19:08 UTC

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled APs for GPU for now, so it can't be that.


On the contrary, I think that makes it more likely...
If the scheduler was not able to send them to your GPU, then that's why they were "aborted"... The resend code doesn't put WUs on hold waiting for a request from a more suitable device... if the scheduler did that, then instead of getting the usual short deadline, VLARs would be skipped until you need a CPU task... (I have several APs "aborted" through the deadline on all my hosts. All of them use BOINC 6.10.60, but only one doesn't have the apps to crunch them on GPU...)

Edit: What I think is that with the addition of the new OpenCL stock apps for ATI and Nvidia, the attempt to keep them compatible with older clients using the optimized apps, and all the workarounds made to avoid VLARs on GPU and whatnot... something is not working as intended with the resending of APs...
____________

Profile Wiggo
Joined: 24 Jan 00
Posts: 5178
Credit: 82,999,427
RAC: 71,457
Australia
Message 1285260 - Posted: 18 Sep 2012, 22:19:56 UTC - in response to Message 1285248.

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled APs for GPU for now, so it can't be that.

Well then, that answers it: that AP just suffered the same fate that VLARs do.

Cheers.
____________

JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 823
Credit: 33,522,099
RAC: 69,346
Denmark
Message 1285264 - Posted: 18 Sep 2012, 22:24:41 UTC - in response to Message 1285260.

I do wonder why this AP timed out.

http://setiathome.berkeley.edu/workunit.php?wuid=1068219734


My guess is different... I think it's a bug (or feature?) in the resend code that aborts APs that were sent to a CPU if they get resent to an Nvidia GPU that is not OpenCL "capable" (i.e. it's not on a host running BOINC 7.xx)...

I have disabled APs for GPU for now, so it can't be that.

Well then, that answers it: that AP just suffered the same fate that VLARs do.

Cheers.

I removed the app_info section for GPU APs days ago, so that AP could only have been sent/resent to a CPU.


Copyright © 2014 University of California