The Server Issues / Outages Thread - Panic Mode On! (119)
Author | Message |
---|---|
Richard Haselgrove Joined: 4 Jul 99 Posts: 14655 Credit: 200,643,578 RAC: 874 |
Got the list for your second wingmate: https://setiathome.berkeley.edu/workunit.php?wuid=3861282516
I'll paste the last page in plain text, to save stressing the servers any more.
Task; Work unit; Sent; Time reported or deadline; Status; Run time (sec); CPU time (sec); Credit; Application
8484925894; 3857466952; 27 Jan 2020, 23:17:25 UTC; 2 Feb 2020, 19:28:32 UTC; Completed, waiting for validation; 12.58; 10.08; pending; SETI@home v8 v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin
8484781777; 3857403492; 27 Jan 2020, 22:26:07 UTC; 20 Mar 2020, 13:52:07 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 x86_64-apple-darwin
8478387349; 3853330864; 26 Jan 2020, 13:08:24 UTC; 8 Feb 2020, 8:00:21 UTC; Completed, validation inconclusive; 3,087.29; 259.78; pending; SETI@home v8 v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin
8477774759; 3854401397; 26 Jan 2020, 8:34:43 UTC; 27 Jan 2020, 1:18:17 UTC; Completed, validation inconclusive; 112.47; 25.07; pending; SETI@home v8 v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin
8470877534; 3851643811; 24 Jan 2020, 10:57:47 UTC; 26 Jan 2020, 5:23:06 UTC; Completed, waiting for validation; 9.31; 5.86; pending; SETI@home v8 v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin
8469487773; 3851088659; 24 Jan 2020, 4:50:26 UTC; 3 Feb 2020, 8:00:12 UTC; Completed, validation inconclusive; 5,607.09; 315.76; pending; SETI@home v8 v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin
8446482330; 3840880946; 16 Jan 2020, 3:39:18 UTC; 27 Jan 2020, 23:39:22 UTC; Completed, validation inconclusive; 4,288.40; 205.55; pending; SETI@home v8 v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin
8445832229; 3840571186; 15 Jan 2020, 23:57:54 UTC; 9 Mar 2020, 4:57:36 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 x86_64-apple-darwin
8445610988; 3840463677; 15 Jan 2020, 21:12:11 UTC; 26 Jan 2020, 8:02:14 UTC; Completed, waiting for validation; 2,266.98; 140.51; pending; SETI@home v8 v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin
8443744341; 3839606065; 13 Jan 2020, 20:58:42 UTC; 7 Mar 2020, 1:58:24 UTC; Timed out - no response; 0.00; 0.00; ---; SETI@home v8 v8.05 (mac_intel32) i686-apple-darwin
8438563644; 3837285679; 12 Jan 2020, 14:17:14 UTC; 8 Mar 2020, 3:10:09 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 x86_64-apple-darwin
8419584286; 3828746925; 8 Jan 2020, 9:12:19 UTC; 30 Mar 2020, 3:13:51 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 x86_64-apple-darwin
8419584868; 3828747259; 8 Jan 2020, 9:12:19 UTC; 30 Mar 2020, 3:13:51 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 x86_64-apple-darwin
8418449958; 3828209706; 8 Jan 2020, 4:04:08 UTC; 19 Jan 2020, 0:12:33 UTC; Completed, validation inconclusive; 2,480.45; 158.94; pending; SETI@home v8 v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin
Handy that it's an apple-darwin, so we see all the problems.
In progress = ghost
Timed out - I'll follow up that WU
validation inconclusive - more wingmen, more pendings
waiting for validation - more wingmen, more pendings
Many shorties, so they may be automatic extra checks for the bad drivers. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14655 Credit: 200,643,578 RAC: 874 |
"Timed out - I'll follow up that WU"
Very interesting - we've caught one in the act: WU 3839606065. It shows as timed out on the original computer's task list, but here it's validated:
Task; Computer; Sent; Time reported or deadline; Status; Run time (sec); CPU time (sec); Credit; Application
8443744340; 8741853; 13 Jan 2020, 20:58:37 UTC; 14 Jan 2020, 3:22:27 UTC; Completed and validated; 10,428.06; 10,387.59; 61.29; SETI@home v8 v8.00 x86_64-pc-linux-gnu
8443744341; 8825095; 13 Jan 2020, 20:58:42 UTC; 7 Mar 2020, 8:01:24 UTC; Completed and validated; 11,530.82; 10,085.35; 61.29; SETI@home v8 v8.05 (mac_intel32) i686-apple-darwin
8619499217; 8294363; 7 Mar 2020, 1:58:27 UTC; 29 Apr 2020, 6:58:09 UTC; In progress; ---; ---; ---; SETI@home v8 Anonymous platform (CPU)
But the server had already created and sent out yet another replication. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14655 Credit: 200,643,578 RAC: 874 |
https://setiathome.berkeley.edu/workunit.php?wuid=3860194203
Your first wingmate is a typical 'walk-away', despite the recent contact:
Task; Work unit; Sent; Time reported or deadline; Status; Run time (sec); CPU time (sec); Credit; Application
8491240699; 3860194403; 30 Jan 2020, 5:16:58 UTC; 23 Mar 2020, 10:16:40 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 windows_x86_64
8491240701; 3860194395; 30 Jan 2020, 5:16:58 UTC; 23 Mar 2020, 10:16:40 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 windows_x86_64
8491240703; 3860194409; 30 Jan 2020, 5:16:58 UTC; 22 Mar 2020, 20:58:10 UTC; In progress; ---; ---; ---; SETI@home v8 v8.05 windows_x86_64
8458013734; 3843718439; 19 Jan 2020, 4:12:01 UTC; 11 Mar 2020, 16:29:55 UTC; In progress; ---; ---; ---; SETI@home v8 v8.00 windows_intelx86
8458013663; 3845972264; 19 Jan 2020, 4:12:01 UTC; 12 Mar 2020, 15:42:47 UTC; In progress; ---; ---; ---; SETI@home v8 v8.00 windows_intelx86
8458013730; 3845972449; 19 Jan 2020, 4:12:00 UTC; 12 Mar 2020, 11:27:25 UTC; In progress; ---; ---; ---; SETI@home v8 v8.00 windows_intelx86
8458013732; 3845972455; 19 Jan 2020, 4:12:00 UTC; 12 Mar 2020, 16:07:20 UTC; In progress; ---; ---; ---; SETI@home v8 v8.00 windows_intelx86
8321265296; 3782772627; 10 Dec 2019, 5:56:10 UTC; 26 Dec 2019, 9:53:27 UTC; Completed, validation inconclusive; 6,215.97; 183.42; pending; SETI@home v8 v8.20 (opencl_intel_gpu_sah) windows_intelx86
Possibly switches the machine on for work only, but has the 'suspend BOINC while user is active' option set - so BOINC never has a chance to actually run anything. |
Kissagogo27 Joined: 6 Nov 99 Posts: 716 Credit: 8,032,827 RAC: 62 |
"Assimilation logjam holds tasks in that 15 million result SSP slot for about three days only. So the quorum 1 tasks have been assimilated and deleted long ago."
Not really true, look at this one: minimum quorum 1, initial replication 2, max # of error/total/success tasks 5, 10, 5. It's between the minimal Quorum 1 setting and back to the normal Quorum 2 ... and still here. There must be a lot of cases like this ;) |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"So, my best guess (prediction) is that these will turn out to be ghost tasks, never received and never to be crunched. They will reach deadline and time out on 23 March. What happens then, I'm less certain about. The minimum quorum of one, but initial replication of two, is an unusual combination, and we don't know exactly how the SETI daemons are programmed to cope with it. Ideally, a simple 'finished/purge', but my concern would be that the system, in its current configuration, might create and send out a replacement task."
If the wingman returns the result, he gets his credit and the task moves on. And my observations suggest that he gets the credit regardless of what he returns, so the results aren't even compared when the workunit has been validated already. I have seen several cases where a wildly different result returned to an already validated workunit gets the full credit. I haven't observed what happens when the tasks time out, but I'm fairly certain that the wingman just gets the error and the task moves on without being replicated further. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
"Assimilation logjam holds tasks in that 15 million result SSP slot for about three days only. So the quorum 1 tasks have been assimilated and deleted long ago."
"not really true, look at this one"
This one is still being crunched by one host, just like the two tasks in my list. It's not stuck in assimilation but hangs around simply because deleting it from the database would orphan the result that may still be in the cache of that host. Although it is most likely a ghost that the server only thinks the host has. |
Ville Saari Joined: 30 Nov 00 Posts: 1158 Credit: 49,177,052 RAC: 82,530 |
Assimilation queue is bigger than ever. Almost 5 million workunits, which means nearly 11 million results. Over 50% of all the results in the database are now stuck in assimilation. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13755 Credit: 208,696,464 RAC: 304 |
Message boards sluggish, Scheduler barely responsive (minute or 2 to respond with "Project has no tasks available", even when there are). Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13755 Credit: 208,696,464 RAC: 304 |
"SETI is basically in hibernation already. March 31st would make no big difference."
The forums will be faster & so will the Scheduler responses. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13755 Credit: 208,696,464 RAC: 304 |
Managed to pick up 1 WU, and it started to download & then got stuck 90% done, & even disabling & re-enabling network access won't budge it (till I posted this of course). Grant Darwin NT |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
"SETI is basically in hibernation already. March 31st would make no big difference."
The project is not in hibernation until the 31st of March. If you want to crunch, crunch; if you don't want to crunch, don't crunch. That is my two cents on this topic. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13755 Credit: 208,696,464 RAC: 304 |
"The project is not in hibernation until the 31st of March. If you want to crunch, crunch; if you don't want to crunch, don't crunch."
Yes, but to be able to crunch, you have to be able to get work. That is becoming increasingly difficult. Grant Darwin NT |
Speedy Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89 |
"The project is not in hibernation until the 31st of March. If you want to crunch, crunch; if you don't want to crunch, don't crunch."
"Yes, but to be able to crunch, you have to be able to get work. That is becoming increasingly difficult."
I would agree with that; it's a case of having to go with the flow. Last night when I went to bed around 9:30 New Zealand time, the return rate was like we had come out of the Tuesday outage: it was something like over 251,000. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
Power outages in California appear to be over. That should have helped ease some of the stress of incoming tasks. |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
And the Replica is nearly 10 hours behind. Was it an unfortunate power victim? |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13755 Credit: 208,696,464 RAC: 304 |
Whatever was going on in California recently doesn't appear to have affected UC Berkeley. Grant Darwin NT |
Mr. Kevvy Joined: 15 May 99 Posts: 3776 Credit: 1,114,826,392 RAC: 3,319 |
"Was it an unfortunate power victim?"
One of several reasons that SETI@Home moved all the project's servers to the UC Berkeley colocation (aka CoLo) facility is that it has a backup generator for outages, and an enterprise-grade UPS that will keep the entire CoLo online until the generator kicks in. During last year's Berkeley outages, when PG&E cut power to prevent fires and classes were cancelled and most of the campus closed, I don't think SETI@Home had so much as a hiccup. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13755 Credit: 208,696,464 RAC: 304 |
The Replica is still falling behind, but at least it's not falling behind as fast as it was before. Grant Darwin NT |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13755 Credit: 208,696,464 RAC: 304 |
"The Replica is still falling behind, but at least it's not falling behind as fast as it was before."
That was just a temporary glitch. The Replica is now back to getting as far behind as fast as it can. Grant Darwin NT |
AllgoodGuy Joined: 29 May 01 Posts: 293 Credit: 16,348,499 RAC: 266 |
"The Replica is still falling behind, but at least it's not falling behind as fast as it was before."
"That was just a temporary glitch. The Replica is now back to getting as far behind as fast as it can."
Thanks for the giggle after work. |
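
The late-result behaviour Ville Saari describes in this thread (a result returned to an already validated workunit seems to be credited without comparison, and a timed-out task is guessed to end without further replication) can be sketched as a toy Python model. Everything in it is an assumption for illustration: `ToyWorkunit`, `late_result_returned` and `task_timed_out` are hypothetical names, not BOINC server code, the credit value is just an example figure, and the timeout branch encodes a guess from the thread rather than confirmed server logic.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorkunit:
    """Hypothetical stand-in for a workunit; not the real BOINC schema."""
    validated: bool = False
    granted_credit: float = 0.0
    events: list = field(default_factory=list)


def late_result_returned(wu: ToyWorkunit, host: str) -> float:
    """Reported observation: a late result to a validated WU is credited as-is."""
    if not wu.validated:
        raise ValueError("sketch only covers already-validated workunits")
    wu.events.append((host, "credited without comparison"))
    return wu.granted_credit


def task_timed_out(wu: ToyWorkunit, host: str) -> bool:
    """Guess from the thread: no replacement task once the WU is validated.

    Returns True if a replacement task would be created. The unvalidated
    branch (replacement sent) is likewise an assumption of this sketch.
    """
    wu.events.append((host, "timed out"))
    return not wu.validated


wu = ToyWorkunit(validated=True, granted_credit=61.29)  # illustrative credit value
print(late_result_returned(wu, "wingman"))   # credit granted to the late result
print(task_timed_out(wu, "ghost host"))      # whether a replacement is created
```

Whether the real daemons behave this way for the minimum-quorum-1, initial-replication-2 workunits is exactly the open question in the posts above.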