Storage machine crash....

Message boards : News : Storage machine crash....

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1344
Credit: 44,293,336
RAC: 60,387
United States
Message 1979420 - Posted: 8 Feb 2019, 21:34:28 UTC

A machine that was holding 15% of our outgoing workunits has crashed and refuses to start back up. Short term it means that attempts to access those workunits will cause an error until the workunit is marked as bad.

Sorry for the inconvenience.
@SETIEric

ID: 1979420
Gary Charpentier
Crowdfunding Project Donor, Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 25397
Credit: 48,680,508
RAC: 25,683
United States
Message 1979430 - Posted: 8 Feb 2019, 22:09:51 UTC

eeeeekk
ID: 1979430
FurryGuy
Volunteer tester
Joined: 1 Jun 04
Posts: 3
Credit: 8,056,383
RAC: 7,947
United States
Message 1979445 - Posted: 8 Feb 2019, 22:54:31 UTC - in response to Message 1979420.  

So... should we wait for the server to catch up on its own, or should we abort any stalled download WUs?
ID: 1979445
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 11478
Credit: 167,496,176
RAC: 105,541
Australia
Message 1979446 - Posted: 8 Feb 2019, 22:58:47 UTC - in response to Message 1979445.  
Last modified: 8 Feb 2019, 23:00:42 UTC

I wouldn't abort them. There are stalled downloads occurring even with the new work units being allocated for download, but they will download eventually. It's just taking a while for them to start, often with extended pauses and restarts before they finally complete.

Oh, and even though the splitters show as running, they're not actually producing much work at the moment. So work is going to remain extremely scarce for some time yet.
Grant
Darwin NT
ID: 1979446
ronssito
Joined: 8 Feb 00
Posts: 14
Credit: 32,167,063
RAC: 46,959
United States
Message 1979514 - Posted: 9 Feb 2019, 2:36:35 UTC

Thanks and great job guys!
ID: 1979514
Gone with the wind
Crowdfunding Project Donor, Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41574
Credit: 41,972,691
RAC: 355
Message 1979540 - Posted: 9 Feb 2019, 6:40:48 UTC

Thanks for keeping us in touch with events Eric.
ID: 1979540
J3P-0
Joined: 1 Dec 11
Posts: 42
Credit: 18,091,442
RAC: 19,626
United States
Message 1979582 - Posted: 9 Feb 2019, 17:27:30 UTC
Last modified: 9 Feb 2019, 17:35:41 UTC

Thanks for the update. As of 11:25 CST this morning I have a bunch of tasks waiting to report and nothing downloaded. Should I continue to wait, abort, or will it pick back up when storage is back online?

EDIT: It seems a reboot fixed my issue.
ID: 1979582
rob smith
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 17589
Credit: 396,845,582
RAC: 185,161
United Kingdom
Message 1979594 - Posted: 9 Feb 2019, 18:54:20 UTC

Don't abort them. Only part of the storage system has failed, and there is no way for us to tell whether a task was distributed from the failed part or from the part that is working correctly (only about 15% of the tasks were held on the failed computer).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1979594
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1344
Credit: 44,293,336
RAC: 60,387
United States
Message 1979678 - Posted: 10 Feb 2019, 7:28:47 UTC

The system eventually came back up and we're getting the missing workunits back online as quickly as we can. There will still be some download errors as things will be out of synchronization for a while. Some workunits that exist in the database may not have been flushed to disk before the system went down (although in theory our disk controllers shouldn't allow that to happen).
@SETIEric

ID: 1979678
Brent Norman
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 521,482,774
RAC: 848,919
Canada
Message 1979679 - Posted: 10 Feb 2019, 7:53:04 UTC - in response to Message 1979678.  

Thanks for the update Eric,
I noticed some tasks making progress now, such as this one.
ID: 1979679
Ray Cameron
Joined: 2 Sep 04
Posts: 5
Credit: 107,850
RAC: 0
Canada
Message 1980093 - Posted: 13 Feb 2019, 14:06:44 UTC - in response to Message 1979420.  

Any idea when there will be data available to process?

Ray
ID: 1980093
Gone with the wind
Crowdfunding Project Donor, Special Project $75 donor
Volunteer tester
Joined: 19 Nov 00
Posts: 41574
Credit: 41,972,691
RAC: 355
Message 1980100 - Posted: 13 Feb 2019, 14:23:52 UTC

A hassle you could have done without, I expect, Eric! Thanks for keeping us workers at the coalface in touch.
ID: 1980100
rob smith
Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 17589
Credit: 396,845,582
RAC: 185,161
United Kingdom
Message 1980102 - Posted: 13 Feb 2019, 14:26:54 UTC

As the servers are coming back to life after a ~24-hour break, one can expect it to be a S-L-O-W process.
(Given the time they started to come back, I would guess it was an automated job, triggered by some task or other coming back to life.)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1980102
Ray Cameron
Joined: 2 Sep 04
Posts: 5
Credit: 107,850
RAC: 0
Canada
Message 1980157 - Posted: 13 Feb 2019, 19:56:08 UTC - in response to Message 1980093.  

2:55 Eastern time, and my computer just downloaded work. I'm now processing!
ID: 1980157
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1344
Credit: 44,293,336
RAC: 60,387
United States
Message 1980206 - Posted: 14 Feb 2019, 1:28:21 UTC

Sorry for the late notice. The problems we had bloated the result table to about double its normal size. Hopefully it will be back down to normal next week.
@SETIEric

ID: 1980206
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 20865
Credit: 2,889,451
RAC: 1,409
Ireland
Message 1980209 - Posted: 14 Feb 2019, 1:42:31 UTC - in response to Message 1980206.  

Wow, still on campus at 17:20? Whatever will CRL say?

Thanks for what you guys do. :-)
ID: 1980209
ronssito
Joined: 8 Feb 00
Posts: 14
Credit: 32,167,063
RAC: 46,959
United States
Message 1980298 - Posted: 14 Feb 2019, 15:24:06 UTC

Third consecutive day with no BOINC stats update.
ID: 1980298
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1344
Credit: 44,293,336
RAC: 60,387
United States
Message 1980347 - Posted: 14 Feb 2019, 19:40:04 UTC - in response to Message 1980298.  

Not sure why that would be. Our stats files are in place and have current timestamps.

https://setiathome.berkeley.edu/stats/
@SETIEric

ID: 1980347
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 13028
Credit: 142,857,673
RAC: 190,810
United Kingdom
Message 1980349 - Posted: 14 Feb 2019, 19:50:45 UTC - in response to Message 1980347.  

BOINCstats has processed one now. Probably all three days' worth in one go, judging by the figures on my account.
ID: 1980349
ronssito
Joined: 8 Feb 00
Posts: 14
Credit: 32,167,063
RAC: 46,959
United States
Message 1980355 - Posted: 14 Feb 2019, 20:11:12 UTC
Last modified: 14 Feb 2019, 20:15:22 UTC

My BOINC stats update each morning, so tomorrow we shall see.
ID: 1980355


 
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.