Hills and Valleys (Feb 10 2011)
Author | Message |
---|---|
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
First the good news. I have thumper all configured and ready to roll as our mega file server. In fact it's already rolling. Note this isn't a public facing server, but will indirectly help the various public services in many ways, including making the sysadmins working on SETI@home/BOINC a lot happier in general. Lots of really fast disk storage for database backups, raw data transfer buffers, doesn't randomly reboot itself like our current home account server, etc. Mmmkay. Now the less good news. Looks like gowron is having some fundamental RAID issues. The issue has been whittled down to one RAID1 pair tagged as degraded that won't rebuild no matter what we do. The guys at Overland have been super helpful - but this is actually an old SnapAppliance (not a box that Overland sells) and running a (very) old version of the OS. So it's looking like our best bet to move forward is to upgrade the OS on the thing. However, to do so we need to copy the workunits on the system (about 2 terabytes' worth) elsewhere temporarily. How about... thumper! That copy process is happening now. Meanwhile, we'll be off for the foreseeable future. Like at least until next week, I imagine. Bummer. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
KWSN Ekky Ekky Ekky Send message Joined: 25 May 99 Posts: 944 Credit: 52,956,491 RAC: 67 |
OK Matt, well done and all as usual. I simply have no idea how you people do what you do. I am just down to CUDA tasks now so it'll be one at a time for some time. |
Dirk Sadowski Send message Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
(...) Matt, thanks for the news! Ohh.. at least until next week? A pity.. ET is waiting.. ;-) |
Claggy Send message Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt, well done on getting thumper up and running, good luck on getting gowron running properly, Claggy |
Jeff Mercer Send message Joined: 14 Aug 08 Posts: 90 Credit: 162,139 RAC: 0 |
Thanks for the news Matt. I've been checking in every few hours, but looks like my little H.P. computer will have a nice, long, cool down break. Wish that there was something I could do to help you, but I don't know a thing about a server. Hope all goes well with the repairs. |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30930 Credit: 53,134,872 RAC: 32 |
As they say, the next work unit in the splitter has ET on it. Thanks for the update. Dang my cache will be full of other projects by then and everything will go into EDF mode. |
S@NL - Eesger - www.knoop.nl Send message Joined: 7 Oct 01 Posts: 385 Credit: 50,200,038 RAC: 0 |
As always, thanks for the update. Ouch, the best of luck getting it all working properly again! PS: maybe in about three weeks we may be able to make life at your end a (tiny?) little bit easier ;) The SETI@Home Gauntlet 2012, April 16 - 30 |
RottenMutt Send message Joined: 15 Mar 01 Posts: 1011 Credit: 230,314,058 RAC: 0 |
... Can't we do better than that? About 6 hours to copy the data one way. You should not have to copy it back as it shouldn't be destroyed. |
Matt Lebofsky Send message Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Can't we do better than that? About 6 hours to copy the data one way. You should not have to copy it back as it shouldn't be destroyed. In a perfect world, yes. But the data is coming off a degraded RAID, and it's talking over NFS, and it's competing with various other must-get-done backups writing to the same device, and it all will in fact be destroyed as this OS upgrade on the broken system (going up 2 major versions) will wipe out all current RAID configurations to make way for the larger root filesystem. And then we'll have to copy the data back. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
SciManStev Send message Joined: 20 Jun 99 Posts: 6657 Credit: 121,090,076 RAC: 0 |
Thank you Matt! It seems like we will be in a much stronger position once the OS is upgraded and the problems fixed. This sounds like a good solid fix that will get rid of a long standing problem. Steve Warning, addicted to SETI crunching! Crunching as a member of GPU Users Group. GPUUG Website |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Dang my cache will be full of other projects by then and everything will go into EDF mode. If you want to crunch SETI above all else, why on earth would you want to punish other projects by allowing them to fill a SETI-sized cache and then run into deadline trouble? Turn the cache down while you know there's no work, then turn it back up - gradually - once SETI is back and work is flowing. No EDF, no deadlines missed, fastest possible return to SETI crunching, least stress on the download servers and comms. What's to lose? |
Wiggo Send message Joined: 24 Jan 00 Posts: 36375 Credit: 261,360,520 RAC: 489 |
Could we have a similar list of backroom kit, saying what they are and what they do? This question has been asked several times now over the last few months but still no answer has been given. Cheers. |
APCyberax Send message Joined: 6 Jun 01 Posts: 29 Credit: 24,078,024 RAC: 48 |
Explains the lack of reporting I had. Out of WU on my PC, but the work server still has a day or so.... Will do some Rosetta while you get things back up. Will be here when you get things back up. Take your time and thanks for the update. |
Kibble (KB7TIB) Send message Joined: 6 Dec 99 Posts: 27 Credit: 10,121,469 RAC: 2 |
Echoing KWSN Ekky above, it's just down to the CUDA units being chewed up from SETI, one-by-one now. Figured it was time to start punishing the EINSTEIN@home servers and downloaded a bunch of their work units. (One project or another will have available work.) Take your time and get the job done right. Can't wait to start building up the Pending list again. :-) And thank you for both the info and the never-ending efforts of the whole SETI crew. You are the original True Believers. |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
It's a shame to hear that we will be down until next week, but if it has to be then it has to be. However, if as a by-product, Thumper is now helping to keep SETI@home/BOINC sysadmins happy, a daunting task at any time, then that is worth a few brownie points on its own! Best I could locate on a fast trip through the thread and the about pages, there have been a few changes since then. http://setiathome.berkeley.edu/forum_thread.php?id=62056#1049143 http://setiathome.berkeley.edu/sah_photos.php?album=closet_02_14_2008 http://setiathome.berkeley.edu/sah_photos.php?album=closet_12_22_2008 http://setiathome.berkeley.edu/forum_thread.php?id=62143#1052095 |
baron_iv Send message Joined: 4 Nov 02 Posts: 109 Credit: 104,905,241 RAC: 0 |
I have a feeling that your download servers are gonna be slammed the second they come back up. I have over 2100 tasks ready to be returned on all my computers combined and my RAC is a pittance compared to the big boys, who will have many many times more. I hope that doesn't crash everything again. I'm sure y'all are clever enough to prevent that though. As always, thanks for all that you mighty admins do to keep things up and running and I'll be here ready to crunch more tasks when you get everything back online. :) Also, have a nice weekend! -baron_iv Proud member of: GPU Users Group |
KWSN THE Holy Hand Grenade! Send message Joined: 20 Dec 05 Posts: 3187 Credit: 57,163,290 RAC: 0 |
I can top all of ya - I've got one computer with 4 down projects! (out of 7...) CDPN, MilkyWay, SETI, and SETBeta are all either down, or giving me the random "Servers not available" (CDPN). Hello, from Albany, CA!... |
Mooncalf Send message Joined: 5 Jan 11 Posts: 19 Credit: 20,196,239 RAC: 0 |
Is there anything that the "consuming public" can do to assist a more expedient positive outcome? Benivo. |
Corvid Send message Joined: 31 Oct 05 Posts: 15 Credit: 18,216,988 RAC: 11 |
Thanks for the update Matt, Looks like I'll run out of work units some time tonight. Too bad you've been having so much trouble with storage lately, all the hard drive and RAID issues. Is there a reason you don't use a SAN solution or is it just a matter of funding for all the hardware that would involve? Hope everything comes up better and stronger when the repairs are done. |
zii Send message Joined: 24 May 03 Posts: 7 Credit: 828,565 RAC: 0 |
I never really trusted gowron. It's the eyes. |