Storage machine crash....

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1979420 - Posted: 8 Feb 2019, 21:34:28 UTC

A machine that was holding 15% of our outgoing workunits has crashed and refuses to start back up. Short term it means that attempts to access those workunits will cause an error until the workunit is marked as bad.

Sorry for the inconvenience.
@SETIEric@qoto.org (Mastodon)

ID: 1979420
Gary Charpentier Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 1979430 - Posted: 8 Feb 2019, 22:09:51 UTC

eeeeekk
ID: 1979430
FurryGuy
Volunteer tester
Joined: 1 Jun 04
Posts: 6
Credit: 9,294,513
RAC: 1
United States
Message 1979445 - Posted: 8 Feb 2019, 22:54:31 UTC - in response to Message 1979420.  

A machine that was holding 15% of our outgoing workunits has crashed and refuses to start back up. Short term it means that attempts to access those workunits will cause an error until the workunit is marked as bad.

Sorry for the inconvenience.
So......

should we wait for the server to catch up on its own, or should we abort any stalled download WUs?
ID: 1979445
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1979446 - Posted: 8 Feb 2019, 22:58:47 UTC - in response to Message 1979445.  
Last modified: 8 Feb 2019, 23:00:42 UTC

should we wait for the server to catch up on its own, or should we abort any stalled download WUs?

I wouldn't abort. Stalled downloads are occurring even with the new work units being allocated for download, and they will download eventually. But it's taking a while for them to start, often with extended pauses and restarts along the way.

Oh, and even though the splitters show as running, they're not actually producing much work at the moment. So work is going to remain extremely scarce for some time yet.
Grant
Darwin NT
ID: 1979446
ronssito
Joined: 8 Feb 00
Posts: 19
Credit: 43,465,609
RAC: 63
United States
Message 1979514 - Posted: 9 Feb 2019, 2:36:35 UTC

Thanks and great job guys!
ID: 1979514
J3P-0
Joined: 1 Dec 11
Posts: 45
Credit: 25,258,781
RAC: 180
United States
Message 1979582 - Posted: 9 Feb 2019, 17:27:30 UTC
Last modified: 9 Feb 2019, 17:35:41 UTC

Thanks for the update. As of this morning (11:25 CST) I have a bunch of tasks waiting to report and nothing downloaded. Should I continue to wait, abort, or will it pick back up when storage is back online?

EDIT: seems a reboot fixed my issue.
ID: 1979582
rob smith Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1979594 - Posted: 9 Feb 2019, 18:54:20 UTC

Don't abort them: only part of the storage system has failed, and there is no way for us to tell whether a task was distributed from the failed part or from the part that is working correctly (only about 15% of the tasks were held on the failed computer).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1979594
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1979678 - Posted: 10 Feb 2019, 7:28:47 UTC

The system eventually came back up and we're getting the missing workunits back online as quickly as we can. There will still be some download errors as things will be out of synchronization for a while. Some workunits that exist in the database may not have been flushed to disk before the system went down (although in theory our disk controllers shouldn't allow that to happen).
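
As an illustration only (not the project's actual tooling), the mismatch described above, where a workunit is recorded in the database but its file never reached disk, could be detected with a reconciliation pass along these lines. The function name, the flat one-file-per-workunit directory layout, and the workunit names are all assumptions made for the sketch.

```python
import os
import tempfile


def find_missing_workunits(db_names, download_dir):
    """Return names recorded in the database whose files are absent on disk.

    Hypothetical helper: assumes one file per workunit, named after the
    workunit, stored directly inside download_dir.
    """
    on_disk = set(os.listdir(download_dir))
    return sorted(name for name in db_names if name not in on_disk)


if __name__ == "__main__":
    # Simulate a crash in which only two of three recorded files were flushed.
    with tempfile.TemporaryDirectory() as d:
        for name in ("wu_001", "wu_002"):
            with open(os.path.join(d, name), "w") as f:
                f.write("data")
        print(find_missing_workunits(["wu_001", "wu_002", "wu_003"], d))
```

Any names returned by such a pass would presumably be the ones that need to be regenerated or marked as bad so that clients stop retrying them.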
@SETIEric@qoto.org (Mastodon)

ID: 1979678
Brent Norman Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1979679 - Posted: 10 Feb 2019, 7:53:04 UTC - in response to Message 1979678.  

Thanks for the update Eric,
I noticed some tasks making progress now. Such as this one.
ID: 1979679
Ray Cameron
Joined: 2 Sep 04
Posts: 5
Credit: 107,850
RAC: 0
Canada
Message 1980093 - Posted: 13 Feb 2019, 14:06:44 UTC - in response to Message 1979420.  

Any idea when there will be data available to process?

Ray
ID: 1980093
rob smith Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1980102 - Posted: 13 Feb 2019, 14:26:54 UTC

As the servers are coming back to life after a ~24-hour break, one can expect it to be a S-L-O-W process.
(Given the time they started to come back I would guess that it's an automated job, and was triggered by some task or other starting to live again)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1980102
Ray Cameron
Joined: 2 Sep 04
Posts: 5
Credit: 107,850
RAC: 0
Canada
Message 1980157 - Posted: 13 Feb 2019, 19:56:08 UTC - in response to Message 1980093.  

2:55 Eastern time and my computer just downloaded and I'm now processing!
ID: 1980157
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1980206 - Posted: 14 Feb 2019, 1:28:21 UTC

Sorry for the late notice. The problems we had bloated the result table to about double its normal size. Hopefully it will be back down to normal next week.
@SETIEric@qoto.org (Mastodon)

ID: 1980206
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24877
Credit: 3,081,182
RAC: 7
Ireland
Message 1980209 - Posted: 14 Feb 2019, 1:42:31 UTC - in response to Message 1980206.  

Wow, still on campus at 17:20? Whatever will CRL say?

Thanks for what you guys do. :-)
ID: 1980209
ronssito
Joined: 8 Feb 00
Posts: 19
Credit: 43,465,609
RAC: 63
United States
Message 1980298 - Posted: 14 Feb 2019, 15:24:06 UTC

3rd consecutive day of no boinc stats update
ID: 1980298
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1980347 - Posted: 14 Feb 2019, 19:40:04 UTC - in response to Message 1980298.  

3rd consecutive day of no boinc stats update


Not sure why that would be. Our stats files are in place and have current timestamps.

https://setiathome.berkeley.edu/stats/
@SETIEric@qoto.org (Mastodon)

ID: 1980347
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1980349 - Posted: 14 Feb 2019, 19:50:45 UTC - in response to Message 1980347.  

BOINCstats has processed one now. Probably all three days' worth in one go, judging by the figures on my account.
ID: 1980349
ronssito
Joined: 8 Feb 00
Posts: 19
Credit: 43,465,609
RAC: 63
United States
Message 1980355 - Posted: 14 Feb 2019, 20:11:12 UTC
Last modified: 14 Feb 2019, 20:15:22 UTC

my boinc stats update each morning so tomorrow we shall see
ID: 1980355
Roland
Joined: 1 Feb 19
Posts: 1
Credit: 155,648
RAC: 0
Message 1980471 - Posted: 15 Feb 2019, 10:33:47 UTC - in response to Message 1979420.  

Hello Eric,
Would it be possible to update the "Status of the UC-Berkeley SETI Efforts" (Korpela et al. 2011)?
Some years have passed between 2011 and today. Or has nothing changed?

Thank you
ID: 1980471
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 1980557 - Posted: 15 Feb 2019, 19:12:41 UTC - in response to Message 1980471.  

We're working on a couple of large papers right now. We'll certainly post them when we're done.
@SETIEric@qoto.org (Mastodon)

ID: 1980557


 