The Server Issues / Outages Thread - Panic Mode On! (119)
Ville Saari | Joined: 30 Nov 00 | Posts: 1158 | Credit: 49,177,052 | RAC: 82,530
>> But if the resend goes into a black hole, the re-resend would happen far sooner than in the case where the extra resend won't happen until shortly before the original deadline. It would be smarter to send those n days after the last scheduler contact of the pending cruncher than n days before the deadline. If a host hasn't contacted the servers after the faucets were closed, the task is unlikely to be returned and should be resent now even if its deadline is in June.
>
> . . The problem is that even if that is done and the task is validated, it will not clear until the current 'zombied' task is expired.

Also, if the extra resend happens long before the original deadline, it could be given the same deadline, so the resend host black-holing wouldn't extend the WU lifetime at all. Ghost recoveries do just this: when you re-receive a recovered ghost, it gets almost the same deadline the task was originally given (not exactly, but the difference is just seconds; I don't know why).

If the original task had a 7-week deadline and the extra resend happens when the host has been MIA for a week, and every host that receives the task just disappears, 6 extra resends could be sent, one every week. The last one would have just a one-week deadline. So when the original deadline hits, there are 8 results from 8 hosts expiring at the same time. What's the chance that fewer than 2 of those 8 hosts return the result in time? This would ensure the WU gets validated before the original deadline with very high probability, without extending its database lifetime at all.

The extra resend script would just have to check the total replication of the WU before resending, to avoid a repeat of the March 30 disaster that doomed thousands of WUs to "Completed, can't validate". High replication is good for getting the task done, but too high is bad.
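A minimal sketch of the resend policy proposed above, keyed to host silence rather than deadline proximity. This is illustrative Python, not the actual BOINC server code; the field names (`last_scheduler_contact`, `total_replication`) and the 7-day/3-copy thresholds are assumptions for the example:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

RESEND_AFTER = timedelta(days=7)   # resend once a host has been silent this long
MAX_REPLICATION = 3                # quorum is 2, so one spare copy is enough

@dataclass
class Host:
    last_scheduler_contact: datetime

@dataclass
class Task:
    host: Host
    deadline: datetime

@dataclass
class Workunit:
    total_replication: int         # every result ever created for this WU

def should_resend(task: Task, wu: Workunit, now: datetime) -> bool:
    """Early-resend check driven by host silence, not deadline proximity."""
    if now - task.host.last_scheduler_contact < RESEND_AFTER:
        return False   # host is still talking to the scheduler; let it finish
    if wu.total_replication >= MAX_REPLICATION:
        return False   # enough copies in flight; avoid over-replication
    return True        # host looks MIA; resend now, even if the deadline is in June

# Example: a host last seen 9 days ago, WU still at its initial replication of 2
host = Host(last_scheduler_contact=datetime(2020, 4, 9, 12, 0))
print(should_resend(Task(host, datetime(2020, 6, 1)), Workunit(2),
                    now=datetime(2020, 4, 18, 12, 0)))  # -> True
```

With a check like this, a black-holed resend is itself resent a week after its host goes quiet rather than weeks later at the deadline, and the replication cap guards against the over-replication failure mode mentioned above.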
Richard Haselgrove | Joined: 4 Jul 99 | Posts: 14690 | Credit: 200,643,578 | RAC: 874
At the moment, they seem to be 'resending batches' early, without re-thinking all the subtleties surrounding work allocation (and BOINC, in spite of all our moans, is really quite subtle).

I've got host 8907573 as an 'early resend' wingmate on at least one task. I got the 'real' resend on deadline, and I've completed it, returned it, and it's validated. Host 8907573 is a very slow machine, with runtime exceeding 24 hours on one valid task. Yet it had 40 tasks allocated in two batches on 16 April, and has an average turnround of 13 days.

Because our caches have drained to zero, on the rare occasions when a work request is made just as a batch of resends becomes available, we'll ask for and receive a lot of work. Slow machines, or machines which ask once and disappear, will hold workunits back from assimilation even more than usual. I'm already seeing my 'valid, but not purged' task list starting to grow again.
Grant (SSSF) | Joined: 19 Aug 99 | Posts: 13959 | Credit: 208,696,464 | RAC: 304
> It's a miracle!

And again. 2 miracles in one day is unheard of.

18/04/2020 19:34:48 | SETI@home | Reporting 16 completed tasks
18/04/2020 19:34:48 | SETI@home | Requesting new tasks for NVIDIA GPU
18/04/2020 19:34:52 | SETI@home | Scheduler request completed: got 180 new tasks

Grant
Darwin NT
W-K 666 | Joined: 18 May 99 | Posts: 19716 | Credit: 40,757,560 | RAC: 67
> It's a miracle!

O ye of little faith ;-)
Grant (SSSF) | Joined: 19 Aug 99 | Posts: 13959 | Credit: 208,696,464 | RAC: 304
Not sure why many of these Tasks have been resent - they're still waiting on other Results to be returned before anything will happen with them anyway.

Grant
Darwin NT
CAPT Siran d'Vel'nahr | Joined: 23 May 99 | Posts: 7381 | Credit: 44,181,323 | RAC: 238
Greetings,

I'm flabbergasted!!! I got 1 task this morning, the first since the 12th. :|

Have a great day! :)

Siran

CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
W-K 666 | Joined: 18 May 99 | Posts: 19716 | Credit: 40,757,560 | RAC: 67
Not sure why many of these Tasks have been resent- they're still waiting on other Results to be returned before anything will happen with them anyway. It's probably Eric trying to hurry the process up, but without thinking the whole process through. I've got at least one WU where there has been a pre-emptive task _02 generated (16 Apr 2020, 19:51:42 UTC), which wasn't returned by the original deadline (18 Apr 2020, 9:38:16 UTC), and I got the task _03 generated by the failure to report by deadline. This task I have reported (18 Apr 2020, 10:38:40 UTC) and it is now valid and now waiting for the _02 which has a deadline of 9 Jun 2020, 0:51:24 UTC. https://setiathome.berkeley.edu/workunit.php?wuid=3902833369 |
Stephen "Heretic" | Joined: 20 Sep 12 | Posts: 5557 | Credit: 192,787,363 | RAC: 628
> My GPU cache is full with 150 tasks.

. . I know that feeling, it is a lottery. I have 3 machines still running (just restarted a 4th); the faster 2 have nothing, but the slowest had a windfall of a big bunch of tasks yesterday, despite all 3 getting nothing for nearly 2 weeks.

. . Hang in there ...

Stephen

. .
Stephen "Heretic" | Joined: 20 Sep 12 | Posts: 5557 | Credit: 192,787,363 | RAC: 628
> It's a miracle!

. . See, what did I tell you :)

Stephen :)
Oddbjornik | Joined: 15 May 99 | Posts: 220 | Credit: 349,610,548 | RAC: 1,728
Whoa! Don't think I've ever seen this before!

18.04.2020 15.04.05 | SETI@home | Scheduler request completed: got 338 new tasks
Stephen "Heretic" | Joined: 20 Sep 12 | Posts: 5557 | Credit: 192,787,363 | RAC: 628
>> It's a miracle!
>
> And again. 2 miracles in one day is unheard of.

. . OK, now you're just bragging :)

Stephen :)
Stephen "Heretic" | Joined: 20 Sep 12 | Posts: 5557 | Credit: 192,787,363 | RAC: 628
> Whoa! Don't think I've ever seen this before!

. . OK, getting out the measuring tapes are we ???

Stephen

. . . I still have hope ...
juan BFP | Joined: 16 Mar 07 | Posts: 9786 | Credit: 572,710,851 | RAC: 3,799
> Whoa! Don't think I've ever seen this before!

You win first prize in the SETI Lotto! Congratulations. I only got 1 last night. But I'm not complaining, I still have a lot in the cache.
Scrooge McDuck | Joined: 26 Nov 99 | Posts: 1741 | Credit: 1,674,173 | RAC: 54
> It's probably Eric trying to hurry the process up, but without thinking the whole process through. I've got at least one WU where a pre-emptive task _02 was generated (16 Apr 2020, 19:51:42 UTC), the original wasn't returned by its deadline (18 Apr 2020, 9:38:16 UTC), and I got the task _03 generated by that failure to report by the deadline. I have reported this task (18 Apr 2020, 10:38:40 UTC); it is now valid, and the WU is waiting for the _02, which has a deadline of 9 Jun 2020, 0:51:24 UTC.

Anyhow, carefully triggering an additional resend (a day, 3 days, or 7 days before the deadline) breaks up endless chains of timeout, resend, timeout, resend... with high probability. Why? Without a pre-emptive resend, every outstanding task is necessary to get a validated (authorized) result. With an additional pre-emptive resend, the replication of the WU is raised to 3, but only 2 successfully finished tasks are required (minimum quorum). So, for your referenced WU 3902833369, a further resend/timeout chain is stopped: when the pre-emptive task "_02" times out, there won't be a further resend. It's not necessary, because the minimum quorum has already been reached.
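A small sketch of the quorum arithmetic just described. The `results_to_create` helper is hypothetical, illustrative Python rather than the actual BOINC transitioner logic:

```python
QUORUM = 2  # min_quorum: successful results needed to validate a WU

def results_to_create(valid_results: int, in_progress: int) -> int:
    """How many fresh copies the server should issue after a task times out."""
    if valid_results >= QUORUM:
        return 0  # quorum already met; the timed-out copy is simply ignored
    # otherwise keep enough copies in flight to still be able to reach quorum
    return max(0, QUORUM - valid_results - in_progress)

# WU 3902833369 with the pre-emptive _02: two results validate before _02
# expires, so its timeout triggers nothing and the chain ends.
print(results_to_create(valid_results=2, in_progress=0))  # -> 0

# Without the spare copy, a timeout with only one valid result issues yet
# another resend with a fresh two-month deadline, continuing the chain.
print(results_to_create(valid_results=1, in_progress=0))  # -> 1
```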
Ville Saari | Joined: 30 Nov 00 | Posts: 1158 | Credit: 49,177,052 | RAC: 82,530
> Anyhow, carefully triggering an additional resend (a day, 3 days, or 7 days before the deadline) breaks up endless chains of timeout, resend, timeout, resend... with high probability. Why? Without a pre-emptive resend, every outstanding task is necessary to get a validated (authorized) result. With an additional pre-emptive resend, the replication of the WU is raised to 3, but only 2 successfully finished tasks are required (minimum quorum). So, for your referenced WU 3902833369, a further resend/timeout chain is stopped: when the pre-emptive task "_02" times out, there won't be a further resend. It's not necessary, because the minimum quorum has already been reached.

But doing it just a few days before the deadline gains only those few days. And if you do it without looking at the host whose task it is, you risk unnecessarily postponing the assimilation of the WU by nearly two months. So this should be done as early as possible, long before the deadline, and only for the tasks of hosts that have gone MIA.
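Rough arithmetic behind that trade-off, assuming the ~7-week (here, 53-day) resend deadlines discussed earlier in the thread; the numbers are illustrative, not project settings:

```python
RESEND_DEADLINE_DAYS = 53       # assumed deadline given to a fresh resend copy
LATE_RESEND_LEAD_DAYS = 3       # resend issued only 3 days before the deadline

# Upside if the original host is truly gone: the replacement copy merely
# starts those few days earlier than waiting for the natural timeout.
days_gained = LATE_RESEND_LEAD_DAYS

# Downside if the host was alive and reports on time anyway: the WU cannot
# be assimilated and purged until the now-redundant copy is returned or expires.
days_lost = RESEND_DEADLINE_DAYS - LATE_RESEND_LEAD_DAYS

print(f"gain at most {days_gained} days, risk {days_lost} extra days in the DB")
```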
Joined: 5 Mar 12 | Posts: 815 | Credit: 2,361,516 | RAC: 22
Just got some resends of tasks that are due to expire April 30th. So it is now 12 days before the due date.

edit: some of the early resends I got yesterday have a May 7 deadline. I should finish them today, but I hope it doesn't resend the resends before I do them.
Scrooge McDuck | Joined: 26 Nov 99 | Posts: 1741 | Credit: 1,674,173 | RAC: 54
>> It's a miracle!
>
> And again. 2 miracles in one day is unheard of.

Maybe no miracle or lottery here. Based on the timestamps in Grant (SSSF)'s message, I speculated on an hourly triggered script resending WUs pre-emptively. So I manually triggered a scheduler request at 16:03:41 (UTC+0200), i.e. 3 minutes 41 seconds after the full hour. I don't know Grant's timezone (e.g. there is that 30-minute-shifted Central Australia time). The subsequent automatic requests followed every 30 minutes (1,818 secs). This was the outcome (local times, UTC+0200):

18.04.2020 16:03:41 | SETI@home | Sending scheduler request: Requested by user.
18.04.2020 16:03:43 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 16:34:08 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 17:04:31 | SETI@home | Scheduler request completed: got 107 new tasks
18.04.2020 17:34:54 | SETI@home | Scheduler request completed: got 0 new tasks
18.04.2020 18:05:21 | SETI@home | Scheduler request completed: got 182 new tasks

The Munin graph shows hourly peaks in task generation around 5 to roughly 12 minutes after each full hour. Maybe there's some delay due to cyclic parsing of the server status page by the Munin site...
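A quick way to eyeball that hypothesis from the log above: print how far past the full hour each request landed, next to the tasks it received. Illustrative Python only; the timestamps are copied from the post:

```python
from datetime import datetime

# (timestamp, tasks received) from the scheduler log above, local time UTC+0200
log = [
    ("18.04.2020 16:03:41", 0),
    ("18.04.2020 16:34:08", 0),
    ("18.04.2020 17:04:31", 107),
    ("18.04.2020 17:34:54", 0),
    ("18.04.2020 18:05:21", 182),
]

for ts, tasks in log:
    t = datetime.strptime(ts, "%d.%m.%Y %H:%M:%S")
    past = t.minute * 60 + t.second          # seconds past the full hour
    print(f"{ts}  {past // 60}m{past % 60:02d}s past the hour -> {tasks} tasks")

# Only the requests landing 4.5+ minutes past the hour received work, which
# fits a resend batch generated a few minutes after each full hour.
```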
Ville Saari | Joined: 30 Nov 00 | Posts: 1158 | Credit: 49,177,052 | RAC: 82,530
> Just got some resends of tasks that are due to expire April 30th. So it is now 12 days before the due date.

My CPU is currently crunching stuff with May 8 deadlines and doing them in deadline order. If they start resending those early, they are just needlessly postponing their assimilation, because my computer is returning them very soon.
Scrooge McDuck | Joined: 26 Nov 99 | Posts: 1741 | Credit: 1,674,173 | RAC: 54
> Whoa! Don't think I've ever seen this before!

Oddbjornik's surprise pops up within the same time span: 4 min 5 s after a full hour.