The Server Issues / Outages Thread - Panic Mode On! (119)

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2036483 - Posted: 7 Mar 2020, 10:39:33 UTC - in response to Message 2036477.  

Got the list for your second wingmate:
https://setiathome.berkeley.edu/workunit.php?wuid=3861282516
I'll paste the last page in plain text, to save stressing the servers any more.

Task ID | Workunit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application | Platform
8484925894 | 3857466952 | 27 Jan 2020, 23:17:25 UTC | 2 Feb 2020, 19:28:32 UTC | Completed, waiting for validation | 12.58 | 10.08 | pending | SETI@home v8 v8.20 (opencl_ati5_SoG_mac) | x86_64-apple-darwin
8484781777 | 3857403492 | 27 Jan 2020, 22:26:07 UTC | 20 Mar 2020, 13:52:07 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | x86_64-apple-darwin
8478387349 | 3853330864 | 26 Jan 2020, 13:08:24 UTC | 8 Feb 2020, 8:00:21 UTC | Completed, validation inconclusive | 3,087.29 | 259.78 | pending | SETI@home v8 v8.00 (opencl_intel_gpu_sah) | x86_64-apple-darwin
8477774759 | 3854401397 | 26 Jan 2020, 8:34:43 UTC | 27 Jan 2020, 1:18:17 UTC | Completed, validation inconclusive | 112.47 | 25.07 | pending | SETI@home v8 v8.20 (opencl_ati5_SoG_mac) | x86_64-apple-darwin
8470877534 | 3851643811 | 24 Jan 2020, 10:57:47 UTC | 26 Jan 2020, 5:23:06 UTC | Completed, waiting for validation | 9.31 | 5.86 | pending | SETI@home v8 v8.20 (opencl_ati5_SoG_mac) | x86_64-apple-darwin
8469487773 | 3851088659 | 24 Jan 2020, 4:50:26 UTC | 3 Feb 2020, 8:00:12 UTC | Completed, validation inconclusive | 5,607.09 | 315.76 | pending | SETI@home v8 v8.00 (opencl_intel_gpu_sah) | x86_64-apple-darwin
8446482330 | 3840880946 | 16 Jan 2020, 3:39:18 UTC | 27 Jan 2020, 23:39:22 UTC | Completed, validation inconclusive | 4,288.40 | 205.55 | pending | SETI@home v8 v8.00 (opencl_intel_gpu_sah) | x86_64-apple-darwin
8445832229 | 3840571186 | 15 Jan 2020, 23:57:54 UTC | 9 Mar 2020, 4:57:36 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | x86_64-apple-darwin
8445610988 | 3840463677 | 15 Jan 2020, 21:12:11 UTC | 26 Jan 2020, 8:02:14 UTC | Completed, waiting for validation | 2,266.98 | 140.51 | pending | SETI@home v8 v8.00 (opencl_intel_gpu_sah) | x86_64-apple-darwin
8443744341 | 3839606065 | 13 Jan 2020, 20:58:42 UTC | 7 Mar 2020, 1:58:24 UTC | Timed out - no response | 0.00 | 0.00 | --- | SETI@home v8 v8.05 (mac_intel32) | i686-apple-darwin
8438563644 | 3837285679 | 12 Jan 2020, 14:17:14 UTC | 8 Mar 2020, 3:10:09 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | x86_64-apple-darwin
8419584286 | 3828746925 | 8 Jan 2020, 9:12:19 UTC | 30 Mar 2020, 3:13:51 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | x86_64-apple-darwin
8419584868 | 3828747259 | 8 Jan 2020, 9:12:19 UTC | 30 Mar 2020, 3:13:51 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | x86_64-apple-darwin
8418449958 | 3828209706 | 8 Jan 2020, 4:04:08 UTC | 19 Jan 2020, 0:12:33 UTC | Completed, validation inconclusive | 2,480.45 | 158.94 | pending | SETI@home v8 v8.00 (opencl_intel_gpu_sah) | x86_64-apple-darwin

Handy that it's an apple-darwin, so we see all the problems.

In progress = ghost
Timed out - I'll follow up that WU
validation inconclusive - more wingmen, more pendings
waiting for validation - more wingmen, more pendings

Many shorties, so they may be automatic extra checks for the bad drivers.
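
For anyone who would rather tally a pasted list like the one above than eyeball it, here is a minimal Python sketch. It assumes every task row contains exactly one of the four status strings seen in this list. The name `tally_statuses` and the `SAMPLE_ROWS` list are made up for illustration (nothing the project provides), and the sample holds just one row of each status taken from the list above.

```python
from collections import Counter

# The four status strings that appear in the pasted task list.
STATUSES = (
    "Completed, waiting for validation",
    "Completed, validation inconclusive",
    "In progress",
    "Timed out - no response",
)

def tally_statuses(rows):
    """Count how many pasted task rows carry each known status string."""
    counts = Counter()
    for row in rows:
        for status in STATUSES:
            if status in row:
                counts[status] += 1
                break  # a row has only one status
    return counts

# Illustrative sample: one row per status, copied from the list above.
SAMPLE_ROWS = [
    "8484925894 3857466952 27 Jan 2020, 23:17:25 UTC 2 Feb 2020, 19:28:32 UTC Completed, waiting for validation 12.58 10.08 pending",
    "8484781777 3857403492 27 Jan 2020, 22:26:07 UTC 20 Mar 2020, 13:52:07 UTC In progress --- --- ---",
    "8478387349 3853330864 26 Jan 2020, 13:08:24 UTC 8 Feb 2020, 8:00:21 UTC Completed, validation inconclusive 3,087.29 259.78 pending",
    "8443744341 3839606065 13 Jan 2020, 20:58:42 UTC 7 Mar 2020, 1:58:24 UTC Timed out - no response 0.00 0.00 ---",
]

if __name__ == "__main__":
    for status, count in tally_statuses(SAMPLE_ROWS).items():
        print(f"{count:3d}  {status}")
```

Feeding it the whole page instead of the four sample rows gives the per-status totals for the host.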
ID: 2036483
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2036484 - Posted: 7 Mar 2020, 10:49:02 UTC - in response to Message 2036483.  

> Timed out - I'll follow up that WU

Very interesting - we've caught one in the act: WU 3839606065.

It shows as timed out on the original computer's task list, but on the workunit page it's validated:

Task ID | Computer ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application | Platform
8443744340 | 8741853 | 13 Jan 2020, 20:58:37 UTC | 14 Jan 2020, 3:22:27 UTC | Completed and validated | 10,428.06 | 10,387.59 | 61.29 | SETI@home v8 v8.00 | x86_64-pc-linux-gnu
8443744341 | 8825095 | 13 Jan 2020, 20:58:42 UTC | 7 Mar 2020, 8:01:24 UTC | Completed and validated | 11,530.82 | 10,085.35 | 61.29 | SETI@home v8 v8.05 (mac_intel32) | i686-apple-darwin
8619499217 | 8294363 | 7 Mar 2020, 1:58:27 UTC | 29 Apr 2020, 6:58:09 UTC | In progress | --- | --- | --- | SETI@home v8 | Anonymous platform (CPU)
But the server had already created and sent out yet another replication.
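
The timing can be checked from the timestamps quoted in these two posts. This is only a small Python sketch: it assumes the deadline is the time shown against the "Timed out" row on the host's task list (7 Mar 2020, 1:58:24 UTC), and the helper name `parse_utc` is invented for the example.

```python
from datetime import datetime, timezone

# Timestamp layout used on the task pages, e.g. "7 Mar 2020, 1:58:24 UTC".
FMT = "%d %b %Y, %H:%M:%S UTC"

def parse_utc(stamp):
    """Parse a task-page timestamp into a timezone-aware datetime."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)

# Assumed deadline: the "Timed out" time on the host's task list.
deadline = parse_utc("7 Mar 2020, 1:58:24 UTC")
# When the replacement task 8619499217 was sent.
replacement_sent = parse_utc("7 Mar 2020, 1:58:27 UTC")
# When the late result for task 8443744341 was reported.
late_report = parse_utc("7 Mar 2020, 8:01:24 UTC")

replacement_delay = replacement_sent - deadline
late_by = late_report - deadline

print("replacement sent after deadline:", replacement_delay)  # 0:00:03
print("late result arrived after deadline:", late_by)         # 6:03:00
```

So, on these figures, the replacement went out three seconds after the deadline, and the late result turned up about six hours after that.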
ID: 2036484
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2036486 - Posted: 7 Mar 2020, 10:59:04 UTC - in response to Message 2036477.  

https://setiathome.berkeley.edu/workunit.php?wuid=3860194203
Your first wingmate is a typical 'walk-away', despite the recent contact:

Task ID | Workunit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application | Platform
8491240699 | 3860194403 | 30 Jan 2020, 5:16:58 UTC | 23 Mar 2020, 10:16:40 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | windows_x86_64
8491240701 | 3860194395 | 30 Jan 2020, 5:16:58 UTC | 23 Mar 2020, 10:16:40 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | windows_x86_64
8491240703 | 3860194409 | 30 Jan 2020, 5:16:58 UTC | 22 Mar 2020, 20:58:10 UTC | In progress | --- | --- | --- | SETI@home v8 v8.05 | windows_x86_64
8458013734 | 3843718439 | 19 Jan 2020, 4:12:01 UTC | 11 Mar 2020, 16:29:55 UTC | In progress | --- | --- | --- | SETI@home v8 v8.00 | windows_intelx86
8458013663 | 3845972264 | 19 Jan 2020, 4:12:01 UTC | 12 Mar 2020, 15:42:47 UTC | In progress | --- | --- | --- | SETI@home v8 v8.00 | windows_intelx86
8458013730 | 3845972449 | 19 Jan 2020, 4:12:00 UTC | 12 Mar 2020, 11:27:25 UTC | In progress | --- | --- | --- | SETI@home v8 v8.00 | windows_intelx86
8458013732 | 3845972455 | 19 Jan 2020, 4:12:00 UTC | 12 Mar 2020, 16:07:20 UTC | In progress | --- | --- | --- | SETI@home v8 v8.00 | windows_intelx86
8321265296 | 3782772627 | 10 Dec 2019, 5:56:10 UTC | 26 Dec 2019, 9:53:27 UTC | Completed, validation inconclusive | 6,215.97 | 183.42 | pending | SETI@home v8 v8.20 (opencl_intel_gpu_sah) | windows_intelx86

Possibly switches the machine on for work only, but has the 'suspend BOINC while user is active' option set - so BOINC never has a chance to actually run anything.
ID: 2036486
Profile Kissagogo27 Special Project $75 donor
Joined: 6 Nov 99
Posts: 716
Credit: 8,032,827
RAC: 62
France
Message 2036495 - Posted: 7 Mar 2020, 12:06:24 UTC - in response to Message 2036411.  
Last modified: 7 Mar 2020, 12:09:44 UTC

> Assimilation logjam holds tasks in that 15 million result SSP slot for about three days only. So the quorum 1 tasks have been assimilated and deleted long ago.

Not really true, look at this one:

minimum quorum 1
initial replication 2
max # of error/total/success tasks 5, 10, 5

It sits between the minimum quorum 1 setting and the return to the normal quorum 2 ... and it's still here.

There must be a lot of cases like this ;)
ID: 2036495
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2036503 - Posted: 7 Mar 2020, 13:21:12 UTC - in response to Message 2036482.  

> So, my best guess (prediction) is that these will turn out to be ghost tasks, never received and never to be crunched. They will reach deadline and time out on 23 March. What happens then, I'm less certain about. The minimum quorum of one, but initial replication of two, is an unusual combination, and we don't know exactly how the SETI daemons are programmed to cope with it. Ideally, a simple 'finished/purge', but my concern would be that the system, in its current configuration, might create and send out a replacement task.

If the wingman returns the result, he gets his credit and the task moves on. And my observations suggest that he gets the credit regardless of what he returns, so the results aren't even compared when the workunit has been validated already. I have seen several cases where a wildly different result returned to an already validated workunit gets the full credit.

I haven't observed what happens when the tasks time out but I'm fairly certain that the wingman just gets the error and the task moves on without being replicated further.
ID: 2036503
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2036504 - Posted: 7 Mar 2020, 13:26:14 UTC - in response to Message 2036495.  

> > Assimilation logjam holds tasks in that 15 million result SSP slot for about three days only. So the quorum 1 tasks have been assimilated and deleted long ago.
> Not really true, look at this one

This one is still being crunched by one host just like the two tasks in my list. It's not stuck in assimilation but hangs around simply because deleting it from the database would orphan the result that may still be in the cache of that host. Although it is most likely a ghost that the server only thinks the host has.
ID: 2036504
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2036526 - Posted: 7 Mar 2020, 15:25:44 UTC

The assimilation queue is bigger than ever: almost 5 million workunits, which means nearly 11 million results. Over 50% of all the results in the database are now stuck in assimilation.
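
A quick back-of-the-envelope check of what those figures imply, as a Python sketch. It treats "almost 5 million" and "nearly 11 million" as round numbers, so the outputs are rough, and the variable names are just for illustration.

```python
# Rounded figures from the post above (approximations, not exact counts).
queued_workunits = 5_000_000
queued_results = 11_000_000

# Average number of results attached to each queued workunit.
results_per_workunit = queued_results / queued_workunits

# If the queued results are more than 50% of all results in the database,
# the database can hold at most twice that many results in total.
max_total_results = queued_results / 0.5

print(f"results per queued workunit: {results_per_workunit:.1f}")
print(f"upper bound on total results: {max_total_results:,.0f}")
```

That works out to a little over two results per workunit, and a database holding somewhat fewer than 22 million results in total.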
ID: 2036526
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2036601 - Posted: 7 Mar 2020, 22:59:33 UTC

Message boards sluggish, Scheduler barely responsive (it takes a minute or two to respond with "Project has no tasks available", even when there are).
Grant
Darwin NT
ID: 2036601
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2036608 - Posted: 7 Mar 2020, 23:21:03 UTC - in response to Message 2036604.  

> SETI is basically in hibernation already. March 31st would make no big difference.

The forums will be faster & so will the Scheduler responses.
Grant
Darwin NT
ID: 2036608
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2036609 - Posted: 7 Mar 2020, 23:31:26 UTC

Managed to pick up 1 WU; it started to download, then got stuck at 90% done, and even disabling & re-enabling network access wouldn't budge it (until I posted this, of course).
Grant
Darwin NT
ID: 2036609
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2036613 - Posted: 7 Mar 2020, 23:36:56 UTC - in response to Message 2036604.  

> SETI is basically in hibernation already. March 31st would make no big difference.

The project is not in hibernation until the 31st of March. If you want to crunch, crunch; if you don't want to crunch, don't crunch. That is my two cents on this topic.
ID: 2036613
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2036620 - Posted: 7 Mar 2020, 23:56:20 UTC - in response to Message 2036613.  

> The project is not in hibernation until the 31st of March. If you want to crunch, crunch; if you don't want to crunch, don't crunch.

Yes, but to be able to crunch, you have to be able to get work. That is becoming increasingly difficult.
Grant
Darwin NT
ID: 2036620
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2036625 - Posted: 8 Mar 2020, 0:04:54 UTC - in response to Message 2036620.  

> > The project is not in hibernation until the 31st of March. If you want to crunch, crunch; if you don't want to crunch, don't crunch.
> Yes, but to be able to crunch, you have to be able to get work. That is becoming increasingly difficult.

I would agree with that; it's a case of having to go with the flow. Last night when I went to bed around 9:30 New Zealand time, the return rate looked as if we had just come out of the Tuesday outage: it was something over 251,000.
ID: 2036625
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036639 - Posted: 8 Mar 2020, 0:49:50 UTC

Power outages in California appear to be over. That should have helped ease some of the stress of incoming tasks.
ID: 2036639
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036641 - Posted: 8 Mar 2020, 0:52:40 UTC - in response to Message 2036639.  

And the Replica is nearly 10 hours behind. Was it an unfortunate power victim?
ID: 2036641
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2036642 - Posted: 8 Mar 2020, 1:03:47 UTC

Whatever was going on in California recently doesn't appear to have affected UC Berkeley.
Grant
Darwin NT
ID: 2036642
Profile Mr. Kevvy Crowdfunding Project Donor*Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2036645 - Posted: 8 Mar 2020, 1:14:02 UTC - in response to Message 2036641.  
Last modified: 8 Mar 2020, 1:15:06 UTC

> Was it an unfortunate power victim?


One of several reasons that SETI@Home moved all the project's servers to the UC Berkeley colocation (aka CoLo) facility is that it has a backup generator for outages, and an enterprise-grade UPS that will keep the entire CoLo online until the generator kicks in. During last year's Berkeley outages, when PG&E cut power to prevent fires, classes were cancelled and most of the campus was closed, but I don't think SETI@Home had so much as a hiccup.
ID: 2036645
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2036649 - Posted: 8 Mar 2020, 1:34:36 UTC

The Replica is still falling behind, but at least it's not falling behind as fast as it was before.
Grant
Darwin NT
ID: 2036649
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 2036695 - Posted: 8 Mar 2020, 6:04:42 UTC - in response to Message 2036649.  

> The Replica is still falling behind, but at least it's not falling behind as fast as it was before.

That was just a temporary glitch. The Replica is now back to getting as far behind as fast as it can.
Grant
Darwin NT
ID: 2036695
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2036727 - Posted: 8 Mar 2020, 12:28:00 UTC - in response to Message 2036695.  

> > The Replica is still falling behind, but at least it's not falling behind as fast as it was before.
> That was just a temporary glitch. The Replica is now back to getting as far behind as fast as it can.


Thanks for the giggle after work.
ID: 2036727


 