Hills and Valleys (Feb 10 2011)

Message boards : Technical News : Hills and Valleys (Feb 10 2011)

1 · 2 · 3 · Next

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1075931 - Posted: 10 Feb 2011, 22:13:48 UTC
Last modified: 10 Feb 2011, 22:13:59 UTC

First the good news. I have thumper all configured and ready to roll as our mega file server. In fact, it's already rolling. Note that this isn't a public-facing server, but it will indirectly help the various public services in many ways, including making the sysadmins working on SETI@home/BOINC a lot happier in general: lots of really fast disk storage for database backups and raw data transfer buffers, and it doesn't randomly reboot itself like our current home account server, etc.

Mmmkay. Now the less good news. Looks like gowron is having some fundamental RAID issues. The issues have been whittled down to one RAID1 pair tagged as degraded that won't rebuild no matter what we do. The guys at Overland have been super helpful, but this is actually an old SnapAppliance (not a box that Overland sells) running a (very) old version of the OS. So it's looking like our best bet to move forward is to upgrade the OS on the thing. However, to do so we need to copy the workunits on the system (about 2 terabytes' worth) elsewhere temporarily. How about... thumper! That copy process is happening now.

Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
KWSN Ekky Ekky Ekky
Joined: 25 May 99
Posts: 944
Credit: 52,956,491
RAC: 67
United Kingdom
Message 1075933 - Posted: 10 Feb 2011, 22:18:57 UTC - in response to Message 1075931.  

OK Matt, well done and all as usual. I simply have no idea how you people do what you do.
I am just down to CUDA tasks now so it'll be one at a time for some time.

Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1075934 - Posted: 10 Feb 2011, 22:19:37 UTC - in response to Message 1075931.  

(...)
Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer.

- Matt


Matt, thanks for the news!


Ohh.. at least until next week? A pity.. ET is waiting.. ;-)

Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1075937 - Posted: 10 Feb 2011, 22:22:32 UTC - in response to Message 1075931.  

Thanks for the update, Matt. Well done on getting thumper up and running, and good luck getting gowron running properly.

Claggy
Jeff Mercer
Joined: 14 Aug 08
Posts: 90
Credit: 162,139
RAC: 0
United States
Message 1075950 - Posted: 10 Feb 2011, 22:59:26 UTC

Thanks for the news, Matt. I've been checking in every few hours, but it looks like my little H.P. computer will have a nice, long cool-down break. I wish there were something I could do to help you, but I don't know a thing about servers. Hope all goes well with the repairs.
Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 30593
Credit: 53,134,872
RAC: 32
United States
Message 1075951 - Posted: 10 Feb 2011, 22:59:54 UTC

As they say, the next work unit in the splitter has ET on it.

Thanks for the update. Dang my cache will be full of other projects by then and everything will go into EDF mode.

S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 385
Credit: 50,200,038
RAC: 0
Netherlands
Message 1075955 - Posted: 10 Feb 2011, 23:08:56 UTC
Last modified: 10 Feb 2011, 23:10:18 UTC

As always, thanks for the update.

Ouch, the best of luck getting it all working properly again!

PS: maybe in about three weeks we may be able to make life at your end a (tiny?) little bit easier ;)
The SETI@Home Gauntlet 2012, April 16 - 30 | info / chat | STATS
RottenMutt
Joined: 15 Mar 01
Posts: 1011
Credit: 230,314,058
RAC: 0
United States
Message 1075961 - Posted: 10 Feb 2011, 23:31:12 UTC - in response to Message 1075931.  

...
Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer.

- Matt



Can't we do better than that? About 6 hours to copy the data one way. You should not have to copy it back, as it shouldn't be destroyed.
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 1075972 - Posted: 10 Feb 2011, 23:53:36 UTC - in response to Message 1075961.  

Can't we do better than that? About 6 hours to copy the data one way. You should not have to copy it back, as it shouldn't be destroyed.


In a perfect world, yes.

But the data is coming off a degraded RAID, it's talking over NFS, and it's competing with various other must-get-done backups writing to the same device. And it all will in fact be destroyed: this OS upgrade on the broken system (going up 2 major versions) will wipe out all current RAID configurations to make way for the larger root filesystem. And then we'll have to copy the data back.
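The disagreement above is mostly arithmetic. Below is a minimal back-of-envelope sketch in Python, assuming "2 terabytes" means 2 × 10^12 bytes. The 6-hour estimate implies roughly 92.6 MB/s sustained, and any slower effective rate stretches that. Because of the OS upgrade, the copy also has to happen twice: off gowron, then back again. The 90/50/30 MB/s rates are hypothetical illustrations, not measurements from this thread.

```python
# Back-of-envelope copy-time arithmetic for moving gowron's workunits.
# Assumptions: 2 TB = 2 * 10**12 bytes (decimal units); the example
# throughputs below are hypothetical, not measured figures.

TB = 10**12  # bytes in a decimal terabyte
MB = 10**6   # bytes in a decimal megabyte


def required_mb_per_s(size_bytes: float, hours: float) -> float:
    """Sustained throughput (MB/s) needed to move size_bytes in `hours`."""
    return size_bytes / (hours * 3600) / MB


def copy_hours(size_bytes: float, mb_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained mb_per_s."""
    return size_bytes / (mb_per_s * MB) / 3600


if __name__ == "__main__":
    data = 2 * TB
    # Throughput implied by a 6-hour one-way copy.
    print(f"needed for 6 h: {required_mb_per_s(data, 6):.1f} MB/s")
    # Hypothetical effective rates (degraded RAID, NFS, competing backups).
    # The round trip doubles the time: copy off, upgrade, copy back.
    for rate in (90, 50, 30):
        one_way = copy_hours(data, rate)
        print(f"{rate:>3} MB/s: {one_way:5.1f} h one way, "
              f"{2 * one_way:5.1f} h round trip")
```

For example, at a hypothetical 30 MB/s effective rate, the same 2 TB takes about 18.5 hours each way, or about 37 hours for the round trip, before any time spent on the upgrade itself.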

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6651
Credit: 121,090,076
RAC: 0
United States
Message 1075976 - Posted: 11 Feb 2011, 0:03:42 UTC

Thank you, Matt! It seems like we will be in a much stronger position once the OS is upgraded and the problems are fixed. This sounds like a good, solid fix that will get rid of a long-standing problem.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1075977 - Posted: 11 Feb 2011, 0:05:51 UTC - in response to Message 1075951.  

Dang my cache will be full of other projects by then and everything will go into EDF mode.

If you want to crunch SETI above all else, why on earth would you want to punish other projects by allowing them to fill a SETI-sized cache and then run into deadline trouble?

Turn the cache down while you know there's no work, then turn it back up - gradually - once SETI is back and work is flowing. No EDF, no deadlines missed, fastest possible return to SETI crunching, least stress on the download servers and comms. What's to lose?
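The deadline trouble Richard describes comes down to simple arithmetic. A cache sized for SETI can hold more work from other projects than one core can finish before that work is due. The Python sketch below uses a deliberately simplified model: one core, and hypothetical 24 h tasks due in 7 days. It is not how the BOINC client actually schedules, since the real client simulates several cores, resource shares and projects. It does show why a smaller cache avoids misses, and why reordering earliest-deadline-first (the idea behind EDF mode) only helps when deadlines differ.

```python
# Simplified, hypothetical model of cache size vs. deadlines on one core.
# Not BOINC's real scheduler; all task figures are invented for illustration.
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    runtime_h: float   # estimated run time in hours
    deadline_h: float  # hours from now until the report deadline


def missed_deadlines(queue: List[Task]) -> List[str]:
    """Names of tasks that finish after their deadline, run in queue order."""
    clock = 0.0
    missed = []
    for task in queue:
        clock += task.runtime_h
        if clock > task.deadline_h:
            missed.append(task.name)
    return missed


def edf_order(queue: List[Task]) -> List[Task]:
    """Reorder the queue earliest-deadline-first."""
    return sorted(queue, key=lambda t: t.deadline_h)


def fill_cache(cache_days: float, runtime_h: float, deadline_h: float) -> List[Task]:
    """Fill a cache of cache_days worth of identical tasks (hypothetical helper)."""
    count = int(cache_days * 24 // runtime_h)
    return [Task(f"task{i}", runtime_h, deadline_h) for i in range(count)]


if __name__ == "__main__":
    # Hypothetical: 24 h tasks due in 168 h (7 days), crunched on one core.
    for days in (10, 3):
        queue = fill_cache(days, runtime_h=24, deadline_h=168)
        print(days, "day cache ->", missed_deadlines(queue) or "no misses")
```

With these invented numbers, a 10-day cache leaves the last three tasks past their deadline, and EDF reordering cannot rescue them because they all share the same deadline. A 3-day cache finishes everything in time.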
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1076092 - Posted: 11 Feb 2011, 11:54:12 UTC - in response to Message 1076079.  

Could we have a similar list of backroom kit, saying what they are and what they do?

This question has been asked several times now over the last few months but still no answer has been given.

Cheers.
APCyberax
Volunteer tester
Joined: 6 Jun 01
Posts: 29
Credit: 24,078,024
RAC: 48
United Kingdom
Message 1076132 - Posted: 11 Feb 2011, 15:04:18 UTC - in response to Message 1076092.  

That explains the lack of reporting I had. I'm out of WUs on my PC, but the work server still has a day or so...
I'll do some Rosetta while you get things back up, and I'll be here when you're done.

Take your time, and thanks for the update.
Kibble (KB7TIB)
Joined: 6 Dec 99
Posts: 27
Credit: 10,121,469
RAC: 2
United States
Message 1076144 - Posted: 11 Feb 2011, 16:07:28 UTC

Echoing KWSN Ekky above, it's just down to the CUDA units from SETI being chewed up one by one now. Figured it was time to start punishing the EINSTEIN@home servers, so I downloaded a bunch of their work units. (One project or another will have available work.) Take your time and get the job done right. Can't wait to start building up the Pending list again. :-)

And thank you for both the info and the never-ending efforts of the whole SETI crew. You are the original True Believers.
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1076151 - Posted: 11 Feb 2011, 16:34:57 UTC - in response to Message 1076079.  

It's a shame to hear that we will be down until next week, but if it has to be, then it has to be. However, if as a by-product Thumper is now helping to keep SETI@home/BOINC sysadmins happy, a daunting task at any time, then that is worth a few brownie points on its own!

I don't know about anyone else, but until fairly recently I thought the sum total of Seti's kit was as listed on the server page. I didn't realise there were other non-public facing machines like Gowron behind the scenes. Clearly the Seti project is even more complicated to Admin than I had previously thought.

Could we have a similar list of backroom kit, saying what they are and what they do?


This is the best I could locate on a quick trip through the thread and the About pages; there have been a few changes since then.

http://setiathome.berkeley.edu/forum_thread.php?id=62056#1049143
http://setiathome.berkeley.edu/sah_photos.php?album=closet_02_14_2008
http://setiathome.berkeley.edu/sah_photos.php?album=closet_12_22_2008
http://setiathome.berkeley.edu/forum_thread.php?id=62143#1052095


baron_iv
Volunteer tester
Joined: 4 Nov 02
Posts: 109
Credit: 104,905,241
RAC: 0
United States
Message 1076159 - Posted: 11 Feb 2011, 16:58:29 UTC

I have a feeling that your download servers are gonna be slammed the second they come back up. I have over 2100 tasks ready to be returned on all my computers combined, and my RAC is a pittance compared to the big boys, who will have many, many times more. I hope that doesn't crash everything again. I'm sure y'all are clever enough to prevent that, though.

As always, thanks for all that you mighty admins do to keep things up and running and I'll be here ready to crunch more tasks when you get everything back online. :)

Also, have a nice weekend!
-baron_iv
Proud member of:
GPU Users Group
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 1076172 - Posted: 11 Feb 2011, 17:30:16 UTC

I can top all of ya: I've got one computer with 4 down projects (out of 7...)! CDPN, MilkyWay, SETI, and SETI Beta are all either down or, in CDPN's case, giving me the random "Servers not available" message.

Hello, from Albany, CA!...
Mooncalf
Joined: 5 Jan 11
Posts: 19
Credit: 20,196,239
RAC: 0
United States
Message 1076176 - Posted: 11 Feb 2011, 17:36:05 UTC

Is there anything that the "consuming public" can do to help bring about a quicker positive outcome?

Benivo.
Corvid
Joined: 31 Oct 05
Posts: 15
Credit: 18,216,988
RAC: 11
United States
Message 1076197 - Posted: 11 Feb 2011, 18:51:24 UTC

Thanks for the update Matt,

Looks like I'll run out of work units some time tonight.

Too bad you've been having so much trouble with storage lately, with all the hard drive and RAID issues. Is there a reason you don't use a SAN solution, or is it just a matter of funding for all the hardware that would involve?

Hope everything comes up better and stronger when the repairs are done.
zii
Joined: 24 May 03
Posts: 7
Credit: 828,565
RAC: 0
Sweden
Message 1076263 - Posted: 11 Feb 2011, 22:10:41 UTC

I never really trusted gowron.

It's the eyes.


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.