Hills and Valleys (Feb 10 2011)



Message boards : Technical News : Hills and Valleys (Feb 10 2011)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1391
Credit: 74,079
RAC: 10
United States
Message 1075931 - Posted: 10 Feb 2011, 22:13:48 UTC
Last modified: 10 Feb 2011, 22:13:59 UTC

First the good news. I have thumper all configured and ready to roll as our mega file server. In fact it's already rolling. Note this isn't a public facing server, but will indirectly help the various public services in many ways, including making the sysadmins working on SETI@home/BOINC a lot happier in general. Lots of really fast disk storage for database backups, raw data transfer buffers, doesn't randomly reboot itself like our current home account server, etc.

Mmmkay. Now the less good news. Looks like gowron is having some fundamental RAID issues. The issue has been whittled down to one RAID1 pair tagged as degraded that won't rebuild no matter what we do. The guys at Overland have been super helpful - but this is actually an old SnapAppliance (not a box that Overland sells) and running a (very) old version of the OS. So it's looking like our best bet to move forward is to upgrade the OS on the thing. However, to do so we need to copy the workunits on the system (about 2 terabytes' worth) elsewhere temporarily. How about... thumper! That copy process is happening now.

Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 928
Credit: 12,558,066
RAC: 10,733
United Kingdom
Message 1075933 - Posted: 10 Feb 2011, 22:18:57 UTC - in response to Message 1075931.

OK Matt, well done and all as usual. I simply have no idea how you people do what you do.
I am just down to CUDA tasks now so it'll be one at a time for some time.
____________

Profile [seti.international] Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7122
Credit: 61,600,033
RAC: 16,335
Germany
Message 1075934 - Posted: 10 Feb 2011, 22:19:37 UTC - in response to Message 1075931.

(...)
Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer.

- Matt


Matt, thanks for the news!


Ohh.. at least until next week? A pity.. ET is waiting.. ;-)

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4241
Credit: 34,941,226
RAC: 23,076
United Kingdom
Message 1075937 - Posted: 10 Feb 2011, 22:22:32 UTC - in response to Message 1075931.

Thanks for the update Matt, well done on getting thumper up and running, and good luck on getting gowron running properly.

Claggy

Profile Jeff Mercer
Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1075950 - Posted: 10 Feb 2011, 22:59:26 UTC

Thanks for the news Matt. I've been checking in every few hours, but looks like my little H.P. computer will have a nice, long, cool down break. Wish that there was something I could do to help you, but I don't know a thing about a server. Hope all goes well with the repairs.

Profile Gary Charpentier
Project donor
Volunteer tester
Joined: 25 Dec 00
Posts: 13176
Credit: 7,915,030
RAC: 14,471
United States
Message 1075951 - Posted: 10 Feb 2011, 22:59:54 UTC

As they say, the next work unit in the splitter has ET on it.

Thanks for the update. Dang my cache will be full of other projects by then and everything will go into EDF mode.

____________

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,763,922
RAC: 10,967
Netherlands
Message 1075955 - Posted: 10 Feb 2011, 23:08:56 UTC
Last modified: 10 Feb 2011, 23:10:18 UTC

As always, thanks for the update.

Ouch, the best of luck getting it all working properly again!

PS: maybe in about three weeks we may be able to make life at your end a (tiny?) little bit easier ;)
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

Profile RottenMutt
Joined: 15 Mar 01
Posts: 998
Credit: 209,253,122
RAC: 49,623
United States
Message 1075961 - Posted: 10 Feb 2011, 23:31:12 UTC - in response to Message 1075931.

...
Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer.

- Matt



Can't we do better than that? About 6 hours to copy the data one way. You should not have to copy it back as it shouldn't be destroyed.
____________

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1391
Credit: 74,079
RAC: 10
United States
Message 1075972 - Posted: 10 Feb 2011, 23:53:36 UTC - in response to Message 1075961.

Can't we do better than that? About 6 hours to copy the data one way. You should not have to copy it back as it shouldn't be destroyed.


In a perfect world, yes.

But the data is coming off a degraded RAID, and it's talking over NFS, and it's competing with various other must-get-done backups writing to the same device, and it all will in fact be destroyed as this OS upgrade on the broken system (going up 2 major versions) will wipe out all current RAID configurations to make way for the larger root filesystem. And then we'll have to copy the data back.
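For a rough sense of scale, here is an editorial back-of-the-envelope sketch of the arithmetic behind the "6 hours" estimate. It is not the project's own calculation: the only figures taken from the thread are the roughly 2 terabytes of workunits and the 6-hour guess. The function names and the slower sustained rates are hypothetical, chosen only to show how quickly the estimate grows once a degraded RAID, NFS overhead, and competing backups cut into throughput.

```python
# Back-of-the-envelope copy-time estimate. All throughput figures are
# illustrative assumptions, not measurements from the SETI@home servers.

TB = 10**12  # decimal terabyte, in bytes
MB = 10**6   # decimal megabyte, in bytes


def copy_hours(size_bytes: float, mb_per_second: float) -> float:
    """Hours needed to copy size_bytes at a sustained rate of mb_per_second."""
    seconds = size_bytes / (mb_per_second * MB)
    return seconds / 3600


def implied_rate(size_bytes: float, hours: float) -> float:
    """Sustained MB/s needed to copy size_bytes in the given number of hours."""
    return size_bytes / (hours * 3600) / MB


workunits = 2 * TB  # "about 2 terabytes' worth" from the first post

print(f"6-hour copy needs ~{implied_rate(workunits, 6):.0f} MB/s sustained")
for rate in (90, 30, 10):  # hypothetical sustained rates in MB/s
    print(f"{rate:>3} MB/s -> {copy_hours(workunits, rate):.1f} hours")
```

The time scales inversely with the sustained rate, so a copy that only manages a third of the assumed rate takes three times as long, and that is for one direction only. The data has to make the trip twice.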

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile SciManStev
Project donor
Volunteer tester
Joined: 20 Jun 99
Posts: 4906
Credit: 84,313,324
RAC: 27,743
United States
Message 1075976 - Posted: 11 Feb 2011, 0:03:42 UTC

Thank you Matt! It seems like we will be in a much stronger position once the OS is upgraded and the problems fixed. This sounds like a good solid fix that will get rid of a long-standing problem.

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8811
Credit: 53,464,187
RAC: 44,889
United Kingdom
Message 1075977 - Posted: 11 Feb 2011, 0:05:51 UTC - in response to Message 1075951.

Dang my cache will be full of other projects by then and everything will go into EDF mode.

If you want to crunch SETI above all else, why on earth would you want to punish other projects by allowing them to fill a SETI-sized cache and then run into deadline trouble?

Turn the cache down while you know there's no work, then turn it back up - gradually - once SETI is back and work is flowing. No EDF, no deadlines missed, fastest possible return to SETI crunching, least stress on the download servers and comms. What's to lose?

Profile Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 32628
Credit: 14,495,349
RAC: 13,398
United Kingdom
Message 1076079 - Posted: 11 Feb 2011, 9:57:49 UTC
Last modified: 11 Feb 2011, 9:59:23 UTC

It's a shame to hear that we will be down until next week, but if it has to be then it has to be. However, if as a by-product, Thumper is now helping to keep SETI@home/BOINC sysadmins happy, a daunting task at any time, then that is worth a few brownie points on its own!

I don't know about anyone else, but until fairly recently I thought the sum total of Seti's kit was as listed on the server page. I didn't realise there were other non-public facing machines like Gowron behind the scenes. Clearly the Seti project is even more complicated to Admin than I had previously thought.

Could we have a similar list of backroom kit, saying what they are and what they do?
____________
Damsel Rescuer, Uli Devotee, Julie Supporter, ES99 Admirer,
Raccoon Friend, Anniet fan, Shining Knight in Armour


Profile Wiggo
Joined: 24 Jan 00
Posts: 8597
Credit: 99,230,132
RAC: 52,351
Australia
Message 1076092 - Posted: 11 Feb 2011, 11:54:12 UTC - in response to Message 1076079.

Could we have a similar list of backroom kit, saying what they are and what they do?

This question has been asked several times now over the last few months but still no answer has been given.

Cheers.
____________

Profile APCyberax
Volunteer tester
Joined: 6 Jun 01
Posts: 29
Credit: 2,000,348
RAC: 0
United Kingdom
Message 1076132 - Posted: 11 Feb 2011, 15:04:18 UTC - in response to Message 1076092.

That explains the lack of reporting I had. I'm out of WUs on my PC, but the work server still has a day or so....
Will do some Rosetta while you get things back up.
Will be here when you get things back up.

Take your time and thanks for the update.


____________

Profile Kibble (KB7TIB)
Joined: 6 Dec 99
Posts: 21
Credit: 1,918,867
RAC: 4,574
United States
Message 1076144 - Posted: 11 Feb 2011, 16:07:28 UTC

Echoing KWSN Ekky above, it's just down to the CUDA units being chewed up from SETI, one-by-one now. Figured it was time to start punishing the EINSTEIN@home servers and downloaded a bunch of their work units. (One project or another will have available work.) Take your time and get the job done right. Can't wait to start building up the Pending list again. :-)

And thank you for both the info and the never-ending efforts of the whole SETI crew. You are the original True Believers.
____________

Profile arkayn
Project donor
Volunteer tester
Joined: 14 May 99
Posts: 3746
Credit: 48,777,915
RAC: 1,076
United States
Message 1076151 - Posted: 11 Feb 2011, 16:34:57 UTC - in response to Message 1076079.

It's a shame to hear that we will be down until next week, but if it has to be then it has to be. However, if as a by-product, Thumper is now helping to keep SETI@home/BOINC sysadmins happy, a daunting task at any time, then that is worth a few brownie points on its own!

I don't know about anyone else, but until fairly recently I thought the sum total of Seti's kit was as listed on the server page. I didn't realise there were other non-public facing machines like Gowron behind the scenes. Clearly the Seti project is even more complicated to Admin than I had previously thought.

Could we have a similar list of backroom kit, saying what they are and what they do?


This is the best I could locate on a fast trip through the thread and the about pages; there have been a few changes since then.

http://setiathome.berkeley.edu/forum_thread.php?id=62056#1049143
http://setiathome.berkeley.edu/sah_photos.php?album=closet_02_14_2008
http://setiathome.berkeley.edu/sah_photos.php?album=closet_12_22_2008
http://setiathome.berkeley.edu/forum_thread.php?id=62143#1052095

____________

baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 81
Credit: 20,278,528
RAC: 52,229
United States
Message 1076159 - Posted: 11 Feb 2011, 16:58:29 UTC

I have a feeling that your download servers are gonna be slammed the second they come back up. I have over 2100 tasks ready to be returned on all my computers combined and my RAC is a pittance compared to the big boys, who will have many many times more. I hope that doesn't crash everything again. I'm sure y'all are clever enough to prevent that though.

As always, thanks for all that you mighty admins do to keep things up and running and I'll be here ready to crunch more tasks when you get everything back online. :)

Also, have a nice weekend!
____________
-baron_iv
Proud member of:

Profile KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 2003
Credit: 11,191,747
RAC: 13,152
United States
Message 1076172 - Posted: 11 Feb 2011, 17:30:16 UTC

I can top all of ya - I've got one computer with 4 down projects! (out of 7...) CDPN, MilkyWay, SETI, and SETBeta are all either down, or giving me the random "Servers not available" (CDPN)
____________
.

Mooncalf
Joined: 5 Jan 11
Posts: 19
Credit: 20,196,239
RAC: 0
United States
Message 1076176 - Posted: 11 Feb 2011, 17:36:05 UTC

Is there anything that the "consuming public" can do to assist a more expedient positive outcome?

Benivo.

Profile Corvid
Joined: 31 Oct 05
Posts: 12
Credit: 4,910,906
RAC: 1,120
United States
Message 1076197 - Posted: 11 Feb 2011, 18:51:24 UTC

Thanks for the update Matt,

Looks like I'll run out of work units some time tonight.

Too bad you've been having so much trouble with storage lately, all the hard drive and RAID issues. Is there a reason you don't use a SAN solution or is it just a matter of funding for all the hardware that would involve?

Hope everything comes up better and stronger when the repairs are done.
____________


Copyright © 2014 University of California