The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)

Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045575 - Posted: 18 Apr 2020, 9:42:50 UTC - in response to Message 2045497.  
Last modified: 18 Apr 2020, 9:43:49 UTC

It would be smarter to send those n days after the last scheduler contact of the pending cruncher than n days before the deadline. If a host hasn't contacted the servers after the faucets were closed, the task is unlikely to be returned and should be resent now even if its deadline is in June.
. . The problem is that even if that is done and the task is validated it will not clear until the current 'zombied' task is expired.
But if the resend goes into a black hole, the re-resend would happen far sooner than in the case where the extra resend won't happen until shortly before the original deadline.

Also if the extra resend happens long before the original deadline, it could be given the same deadline. So the resend host blackholing wouldn't then extend the WU lifetime at all. Ghost recoveries do just this. When you re-receive a recovered ghost, it gets almost the same deadline the task was originally given (not exactly but the difference is just seconds - I don't know why).

If the original task had a 7-week deadline and the extra resend happens when the host has been MIA for a week, and every host that receives the task just disappears, 6 extra resends could be sent, one every week. The last one would have just a one-week deadline. So when the original deadline hits, there are 8 results from 8 hosts expiring at the same time. What's the chance that fewer than 2 out of those 8 hosts return the result in time? This would ensure the WU gets validated before the original deadline with very high probability without extending its database lifetime at all.
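
To put a rough number on that question, here is a minimal sketch. It assumes every host returns its result independently with the same probability, which is a simplification the post does not claim; `p_return` and the function name are made up for illustration.

```python
from math import comb


def prob_fewer_than_quorum(n_hosts: int, p_return: float, quorum: int = 2) -> float:
    """Probability that fewer than `quorum` of `n_hosts` hosts return a result.

    Assumes independent hosts with an identical return probability
    (an illustrative assumption, not something measured on the project).
    """
    return sum(
        comb(n_hosts, k) * p_return**k * (1 - p_return) ** (n_hosts - k)
        for k in range(quorum)
    )


if __name__ == "__main__":
    # Try a few hypothetical per-host return probabilities for the 8-host case.
    for p in (0.2, 0.5, 0.8):
        print(f"p={p}: {prob_fewer_than_quorum(8, p):.4f}")
```

Even with fairly unreliable hosts, the chance of missing the quorum of 2 shrinks quickly as the number of outstanding results grows.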

The extra resend script would just have to check the total replication of the WU before resending, to avoid a repeat of the March 30 disaster that doomed thousands of WUs to "Completed, can't validate". High replication is good for getting the task done, but too high is bad.
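
The rule described in this post could be sketched roughly as follows. This is not the project's actual script: the one-week MIA threshold is taken from the example above, while `MAX_REPLICATION`, `PendingTask` and `should_resend_early` are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIA_AFTER = timedelta(days=7)  # host counts as MIA after a week without contact
MAX_REPLICATION = 8            # hypothetical cap on tasks per workunit


@dataclass
class PendingTask:
    host_last_contact: datetime  # last scheduler contact of the host holding the task
    wu_replication: int          # total tasks already created for this workunit


def should_resend_early(task: PendingTask, now: datetime,
                        mia_after: timedelta = MIA_AFTER,
                        max_replication: int = MAX_REPLICATION) -> bool:
    """Resend only if the host has gone quiet AND replication is below the cap."""
    host_is_mia = now - task.host_last_contact >= mia_after
    below_cap = task.wu_replication < max_replication
    return host_is_mia and below_cap


if __name__ == "__main__":
    now = datetime(2020, 4, 18, 9, 42)
    quiet_host = PendingTask(datetime(2020, 4, 1), wu_replication=2)
    active_host = PendingTask(datetime(2020, 4, 17), wu_replication=2)
    print(should_resend_early(quiet_host, now))   # host silent for over a week
    print(should_resend_early(active_host, now))  # host contacted the server yesterday
```

Keying the decision on the host's last contact rather than on the task deadline is the point of the proposal; the replication cap is the guard against over-replicating a workunit.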
ID: 2045575
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 717
Credit: 8,032,827
RAC: 62
France
Message 2045576 - Posted: 18 Apr 2020, 9:52:06 UTC

ID: 2045576
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2045577 - Posted: 18 Apr 2020, 10:00:30 UTC - in response to Message 2045575.  

At the moment, they seem to be 'resending batches' early, without re-thinking all the subtleties surrounding work allocation (and BOINC, in spite of all our moans, is really quite subtle).

I've got host 8907573 as an 'early resend' wingmate on at least one task. I got the 'real' resend on deadline, and I've completed it, returned it, and it's validated.

Host 8907573 is a very slow machine, with runtime exceeding 24 hours on one valid task. Yet it had 40 tasks allocated in two batches on 16 April, and has an average turnround of 13 days.

Because our caches have drained to zero, on the rare occasions when a work request is made just as a batch of resends becomes available, we'll ask for and receive a lot of work. Slow machines, or machines which ask once and disappear, will hold workunits back from assimilation even more than usual. I'm already seeing my 'valid, but not purged' task list starting to grow again.
ID: 2045577
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13959
Credit: 208,696,464
RAC: 304
Australia
Message 2045580 - Posted: 18 Apr 2020, 10:28:23 UTC - in response to Message 2045574.  

It's a miracle!
18/04/2020 18:33:57 | SETI@home | Scheduler request completed: got 150 new tasks
And again. 2 miracles in one day is unheard of.

18/04/2020 19:34:48 | SETI@home | Reporting 16 completed tasks
18/04/2020 19:34:48 | SETI@home | Requesting new tasks for NVIDIA GPU
18/04/2020 19:34:52 | SETI@home | Scheduler request completed: got 180 new tasks

Grant
Darwin NT
ID: 2045580
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19716
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2045582 - Posted: 18 Apr 2020, 10:42:42 UTC - in response to Message 2045574.  
Last modified: 18 Apr 2020, 10:46:45 UTC

It's a miracle!
18/04/2020 18:33:57 | SETI@home | Scheduler request completed: got 150 new tasks

O ye of little faith ;-)
ID: 2045582
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13959
Credit: 208,696,464
RAC: 304
Australia
Message 2045587 - Posted: 18 Apr 2020, 11:23:35 UTC

Not sure why many of these Tasks have been resent- they're still waiting on other Results to be returned before anything will happen with them anyway.
Grant
Darwin NT
ID: 2045587
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7381
Credit: 44,181,323
RAC: 238
United States
Message 2045594 - Posted: 18 Apr 2020, 12:19:19 UTC

Greetings,

I'm flabbergasted!!! I got 1 task this morning, the first since the 12th. :|

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2045594
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19716
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2045597 - Posted: 18 Apr 2020, 12:46:36 UTC - in response to Message 2045587.  
Last modified: 18 Apr 2020, 12:47:52 UTC

Not sure why many of these Tasks have been resent- they're still waiting on other Results to be returned before anything will happen with them anyway.

It's probably Eric trying to hurry the process up, but without thinking the whole process through. I've got at least one WU where there has been a pre-emptive task _02 generated (16 Apr 2020, 19:51:42 UTC), which wasn't returned by the original deadline (18 Apr 2020, 9:38:16 UTC), and I got the task _03 generated by the failure to report by deadline. This task I have reported (18 Apr 2020, 10:38:40 UTC) and it is now valid and now waiting for the _02 which has a deadline of 9 Jun 2020, 0:51:24 UTC.
https://setiathome.berkeley.edu/workunit.php?wuid=3902833369
ID: 2045597
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2045601 - Posted: 18 Apr 2020, 13:06:35 UTC - in response to Message 2045558.  

My GPU cache is full with 150 tasks.

13/04/2020 8:14:13 | SETI@home | Scheduler request completed: got 0 new tasks
13/04/2020 8:44:35 | SETI@home | Scheduler request completed: got 0 new tasks
snip
18/04/2020 14:47:18 | SETI@home | Scheduler request completed: got 0 new tasks
18/04/2020 15:17:40 | SETI@home | Scheduler request completed: got 0 new tasks
18/04/2020 16:02:06 | SETI@home | Scheduler request completed: got 0 new tasks

Not that i'm bitter or anything...


. . I know that feeling, it is a lottery. But I have 3 machines still running, (just restarted a 4th) and the faster 2 have nothing, but the slowest had a windfall of a big bunch of tasks yesterday, despite all 3 getting nothing for nearly 2 weeks.

. . Hang in there ...

Stephen

. .
ID: 2045601
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2045602 - Posted: 18 Apr 2020, 13:08:50 UTC - in response to Message 2045574.  

It's a miracle!
18/04/2020 18:33:57 | SETI@home | Scheduler request completed: got 150 new tasks


. . See, what did I tell you :)

Stephen

:)
ID: 2045602
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 2045603 - Posted: 18 Apr 2020, 13:10:53 UTC

Whoa! Don't think I've ever seen this before!
18.04.2020 15.04.05 | SETI@home | Scheduler request completed: got 338 new tasks

ID: 2045603
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2045604 - Posted: 18 Apr 2020, 13:13:29 UTC - in response to Message 2045580.  

It's a miracle!
18/04/2020 18:33:57 | SETI@home | Scheduler request completed: got 150 new tasks
And again. 2 miracles in one day is unheard of.

18/04/2020 19:34:48 | SETI@home | Reporting 16 completed tasks
18/04/2020 19:34:48 | SETI@home | Requesting new tasks for NVIDIA GPU
18/04/2020 19:34:52 | SETI@home | Scheduler request completed: got 180 new tasks


. . OK now you're just bragging :)

Stephen

:)
ID: 2045604
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2045605 - Posted: 18 Apr 2020, 13:16:27 UTC - in response to Message 2045603.  

Whoa! Don't think I've ever seen this before!
18.04.2020 15.04.05 | SETI@home | Scheduler request completed: got 338 new tasks


. . OK, getting out the measuring tapes are we ???

Stephen

. . . I still have hope ...
ID: 2045605
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2045608 - Posted: 18 Apr 2020, 13:24:03 UTC - in response to Message 2045603.  
Last modified: 18 Apr 2020, 13:35:07 UTC

Whoa! Don't think I've ever seen this before!
18.04.2020 15.04.05 | SETI@home | Scheduler request completed: got 338 new tasks

You win the first prize of the SETI Lotto! Congratulations.

I only got 1 last night. But I'm not complaining, I still have a lot in the cache.
ID: 2045608
Scrooge McDuck
Joined: 26 Nov 99
Posts: 1741
Credit: 1,674,173
RAC: 54
Germany
Message 2045614 - Posted: 18 Apr 2020, 13:39:17 UTC - in response to Message 2045597.  
Last modified: 18 Apr 2020, 13:40:11 UTC

It's probably Eric trying to hurry the process up, but without thinking the whole process through. I've got at least one WU where there has been a pre-emptive task _02 generated (16 Apr 2020, 19:51:42 UTC), which wasn't returned by the original deadline (18 Apr 2020, 9:38:16 UTC), and I got the task _03 generated by the failure to report by deadline. This task I have reported (18 Apr 2020, 10:38:40 UTC) and it is now valid and now waiting for the _02 which has a deadline of 9 Jun 2020, 0:51:24 UTC.
https://setiathome.berkeley.edu/workunit.php?wuid=3902833369

Anyhow, carefully triggering an additional resend (a day, 3 days ... or 7 days before deadline) breaks up endless chains of timeout, resend, timeout, resend... with high probability. Why? Without pre-emptive resending, all tasks are necessary to get a validated (authorized) result. With an additional pre-emptive resend the replication of the WU is raised to 3, but only 2 successfully finished tasks are required (minimal quorum). So, for the WU 3902833369 you referred to, a further resend/timeout chain is stopped. When the pre-emptive task ".._02" times out, there won't be a further resend. It's not necessary because the minimal quorum is already reached.
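
The quorum argument can be illustrated with a small sketch. The rule below is a deliberate simplification and not BOINC's actual server logic; the state names, `MIN_QUORUM` constant and `resend_needed` function are hypothetical, and the task states for the example WU are inferred from the description quoted above.

```python
from collections import Counter

MIN_QUORUM = 2  # successful results needed to validate a workunit, per the post


def resend_needed(task_states: list[str], min_quorum: int = MIN_QUORUM) -> bool:
    """Return True if another task must be generated for the workunit.

    Simplified rule: a resend is only needed when the successful results
    plus the tasks still in progress cannot reach the quorum.
    """
    counts = Counter(task_states)
    return counts["success"] + counts["in_progress"] < min_quorum


if __name__ == "__main__":
    # No pre-emptive resend: one result returned, the other timed out.
    print(resend_needed(["success", "timed_out"]))
    # With the pre-emptive _02: once _03 is returned, a later _02 timeout
    # does not trigger anything because two successes already exist.
    print(resend_needed(["success", "timed_out", "timed_out", "success"]))
```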
ID: 2045614
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045636 - Posted: 18 Apr 2020, 14:54:54 UTC - in response to Message 2045614.  
Last modified: 18 Apr 2020, 14:55:18 UTC

Anyhow, carefully triggering an additional resend (a day, 3 days ... or 7 days before deadline) breaks up endless chains of timeout, resend, timeout, resend... with high probability. Why? Without pre-emptive resending, all tasks are necessary to get a validated (authorized) result. With an additional pre-emptive resend the replication of the WU is raised to 3, but only 2 successfully finished tasks are required (minimal quorum). So, for the WU 3902833369 you referred to, a further resend/timeout chain is stopped. When the pre-emptive task ".._02" times out, there won't be a further resend. It's not necessary because the minimal quorum is already reached.
But doing it just a few days before deadline gains only those few days. And if you do that without looking at the host whose task it is, you risk unnecessarily postponing the assimilation of the task by nearly two months. So this should be done as early as possible long before the deadline and only to the tasks of the hosts that have gone MIA.
ID: 2045636
Profile Unixchick Project Donor
Avatar

Send message
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2045672 - Posted: 18 Apr 2020, 18:11:59 UTC
Last modified: 18 Apr 2020, 18:27:46 UTC

Just got some resends of tasks that are due to expire April 30th. So it is now 12 days before due date.

edit: some of the early resends I got yesterday have a May 7 deadline. I should finish them today, but I hope it doesn't resend the resends before I do them.
ID: 2045672
Scrooge McDuck
Avatar

Send message
Joined: 26 Nov 99
Posts: 1741
Credit: 1,674,173
RAC: 54
Germany
Message 2045676 - Posted: 18 Apr 2020, 18:56:42 UTC - in response to Message 2045580.  

It's a miracle!
18/04/2020 18:33:57 | SETI@home | Scheduler request completed: got 150 new tasks
And again. 2 miracles in one day is unheard of.

18/04/2020 19:34:48 | SETI@home | Reporting 16 completed tasks
18/04/2020 19:34:48 | SETI@home | Requesting new tasks for NVIDIA GPU
18/04/2020 19:34:52 | SETI@home | Scheduler request completed: got 180 new tasks

Maybe no miracle or lottery here. I speculated, based on the timestamps in Grant (SSSF)'s message, that an hourly triggered script is resending WUs pre-emptively. So I manually triggered a scheduler request at 16:03:41 (UTC+0200), 3 minutes and 41 seconds after a full hour. I don't know Grant's timezone, though (e.g. there is that Central Australia time shifted by 30 min). Anyhow, the next automatic requests are timed every 30 minutes (1,818 secs). That's the outcome (local times UTC+0200):

18.04.2020 16:03:41 | SETI@home | Sending scheduler request: Requested by user.
18.04.2020 16:03:43 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 16:34:08 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 17:04:31 | SETI@home | Scheduler request completed: got 107 new tasks
18.04.2020 17:34:54 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 18:05:21 | SETI@home | Scheduler request completed: got 182 new tasks

The Munin graph shows hourly peaks in task generation from around 5 to circa 12 minutes after each full hour. Maybe there's some delay due to the cyclic parsing of the server status page by this Munin website...
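
For anyone who wants to check their own event log the same way, here is a minimal sketch that extracts how far past the full hour each scheduler reply arrived. It uses the five log lines above as input; the function name and regular expression are only illustrative, and the timestamp format is assumed to match this particular client's locale.

```python
import re
from datetime import datetime

LOG = """\
18.04.2020 16:03:43 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 16:34:08 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 17:04:31 | SETI@home | Scheduler request completed: got 107 new tasks
18.04.2020 17:34:54 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 18:05:21 | SETI@home | Scheduler request completed: got 182 new tasks
"""

LINE = re.compile(
    r"^(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) \| .*got (\d+) new tasks$"
)


def seconds_past_hour(log: str) -> list[tuple[int, int]]:
    """Return (seconds after the full hour, tasks received) per scheduler reply."""
    rows = []
    for line in log.splitlines():
        match = LINE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%d.%m.%Y %H:%M:%S")
            rows.append((stamp.minute * 60 + stamp.second, int(match.group(2))))
    return rows


if __name__ == "__main__":
    for offset, tasks in seconds_past_hour(LOG):
        print(f"{offset // 60:02d}:{offset % 60:02d} past the hour -> {tasks} tasks")
```

With only five samples this proves nothing, but both non-empty replies in this log arrived between 4 and 6 minutes past the hour.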
ID: 2045676
Ville Saari
Avatar

Send message
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2045678 - Posted: 18 Apr 2020, 19:00:59 UTC - in response to Message 2045672.  

Just got some resends of tasks that are due to expire April 30th. So it is now 12 days before due date.

edit: some of the early resends I got yesterday have a May 7 deadline. I should finish them today, but I hope it doesn't resend the resends before I do them.
My CPU is currently crunching stuff with May 8 deadlines and doing them in deadline order. If they start resending those early, they are just needlessly postponing their assimilation because my computer is returning them very soon.
ID: 2045678
Scrooge McDuck
Avatar

Send message
Joined: 26 Nov 99
Posts: 1741
Credit: 1,674,173
RAC: 54
Germany
Message 2045681 - Posted: 18 Apr 2020, 19:04:01 UTC - in response to Message 2045603.  

Whoa! Don't think I've ever seen this before!
18.04.2020 15.04.05 | SETI@home | Scheduler request completed: got 338 new tasks

Oddbjornik's surprise pops up within the same time span (4:05 min) after a full hour.
ID: 2045681


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.