Down Time (May 01 2007)

Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 558084 - Posted: 2 May 2007, 16:40:27 UTC - in response to Message 558051.  


As I still have about 1 day's worth of work, does this affect beta as well?


Yep, beta will run out of work as well. I'll be getting a current status report from Matt and Jeff in 20 minutes.
@SETIEric@qoto.org (Mastodon)

ID: 558084
netwraith
Joined: 15 Sep 06
Posts: 8
Credit: 61,290
RAC: 0
United States
Message 558087 - Posted: 2 May 2007, 16:50:19 UTC
Last modified: 2 May 2007, 16:54:34 UTC

--

How big is the array ????

The reason I ask is that I have a spare Fibre Channel-over-copper CLARiiON array that could be had... That is, if it's big enough to do some good...

And perhaps a spare dual Xeon server... or another 420/450 Sun UltraSPARC...

Reply in email for confidential discussion ... posting response is also OK...


ID: 558087
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 558090 - Posted: 2 May 2007, 16:52:39 UTC - in response to Message 558051.  


Oh well, if it goes, it goes with a bang (someone find a 24-port SATA controller)



Unfortunately, thumper's hardware is very specialized. The 6 SATA controllers are part of a stacked/sandwiched motherboard design. There are no SATA cables in the box. The 48-port SATA backplane plugs directly into the lower motherboard. It's pre-release hardware, so Sun doesn't know whether a full motherboard swap is even possible due to changes between the engineering model and the production model. Even if we got a 24-port PCI SATA controller, there would be no way to use it in thumper.

Our best bet is to get another machine with 24 drive bays, and we're exploring several avenues. We used software (mdadm) RAID+LVM, so the drives should come right up in a different box. I'll post an update when I have more info on current status.
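
For anyone wondering what moving a software RAID set to another host usually involves, here is a rough sketch of the typical command sequence for an mdadm + LVM stack, written as a dry-run plan in Python. It is an illustration only, not the procedure actually used for thumper: the `reassembly_plan` helper and the `volume_group` parameter are hypothetical names, and the commands are printed rather than executed.

```python
import shlex


def reassembly_plan(volume_group=None):
    """Return, in order, the commands commonly used to bring up an
    mdadm + LVM stack after moving the drives to a new host.

    `volume_group` is a hypothetical name; None means "activate every
    volume group that is found".
    """
    plan = [
        # Reassemble md arrays from the RAID superblocks stored on the drives.
        ["mdadm", "--assemble", "--scan"],
        # Scan the assembled arrays for LVM volume groups.
        ["vgscan"],
    ]
    activate = ["vgchange", "-ay"]  # make the logical volumes available
    if volume_group:
        activate.append(volume_group)
    plan.append(activate)
    return plan


# Dry run: show the plan instead of touching any disks.
for command in reassembly_plan():
    print(shlex.join(command))
```

Because the array layout lives in metadata on the drives themselves rather than in a hardware controller, the new machine only needs to be able to see all the drives.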

Eric

@SETIEric@qoto.org (Mastodon)

ID: 558090
Dirk and LoriEllen
Joined: 13 Feb 07
Posts: 27
Credit: 27,573
RAC: 0
Australia
Message 558092 - Posted: 2 May 2007, 16:56:06 UTC

Will it do any harm to leave the auto-update on BOINC running? It would be great to get work soon, my latest is at 90% complete now.
ID: 558092
Conrad Human
Volunteer tester
Joined: 17 Nov 00
Posts: 67
Credit: 2,009,224
RAC: 0
South Africa
Message 558093 - Posted: 2 May 2007, 16:57:40 UTC
Last modified: 2 May 2007, 17:03:30 UTC

Thanks Eric
I would also like some GB specs on that array (can't help with funds or hardware)
Is it possible to make jocelyn a replica server for Thumper too?

ID: 558093
Jim Baize
Volunteer tester
Joined: 6 May 00
Posts: 758
Credit: 149,536
RAC: 0
United States
Message 558103 - Posted: 2 May 2007, 17:26:41 UTC - in response to Message 558092.  

Will it do any harm to leave the auto-update on BOINC running? It would be great to get work soon, my latest is at 90% complete now.


When you say "auto-update" I assume that you are talking about the ability of BOINC to periodically poll the server to see if there is any work. If so, then no, it will not hurt. In fact, this is a prime example of why BOINC was built with this (and other) features. It was designed to be able to withstand downtime by projects.
ID: 558103
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 558106 - Posted: 2 May 2007, 17:31:30 UTC

If I'm not mistaken, the problem is that if the scheduler is still up and your host is out of work, it won't trigger the random backoff, so your host 'pesters' the project much more frequently looking for new work.

As more hosts run out, it seems to be slowing down others from being able to report their completed work, so it would seem to be better to set No New Tasks for the time being.
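
To illustrate what "random backoff" means in general: after each failed request a client waits longer before retrying, with some randomness mixed in so that many hosts don't all retry at the same moment. The sketch below is a generic exponential backoff with jitter, not BOINC's actual implementation; `backoff_delay`, `base`, and `cap` are made-up names and values chosen for the example.

```python
import random


def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before the next request after `failures`
    consecutive failed attempts (hypothetical parameters).

    The upper bound doubles with each failure until it reaches `cap`;
    the actual delay is drawn at random from the upper half of that
    range so that hosts spread their retries out.
    """
    if failures <= 0:
        return 0.0  # nothing has failed, so no backoff is applied
    exponent = min(failures - 1, 30)  # keep the power of two bounded
    ceiling = min(cap, base * 2 ** exponent)
    return rng.uniform(0.5 * ceiling, ceiling)


# Delays grow with the number of consecutive failures.
for n in range(5):
    print(n, round(backoff_delay(n)))
```

The `failures <= 0` branch is the case described above: a request the scheduler answers successfully, even with "no work", does not count as a failure in this sketch, so the host keeps asking at its normal rate.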

Alinator
ID: 558106
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 558107 - Posted: 2 May 2007, 17:31:37 UTC - in response to Message 558087.  

--
How big is the array ????


24x500GB (12TB, of which about 8TB was active, with 2TB of parity and 2TB of spares).
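
As a quick check of that arithmetic, here is a small sketch. It assumes decimal units (1TB = 1000GB) and treats the "about 8TB" figure as exact, which would imply 16 data drives, 4 parity drives and 4 spares. That per-role drive split is an inference from the totals, not a statement of how the array was actually laid out.

```python
DRIVES = 24
DRIVE_GB = 500  # capacity per drive, decimal gigabytes


def to_tb(gigabytes):
    """Convert decimal GB to decimal TB."""
    return gigabytes / 1000


def drives_for(terabytes, drive_gb=DRIVE_GB):
    """Number of drives needed to hold `terabytes` (assumes exact multiples)."""
    return round(terabytes * 1000 / drive_gb)


total_tb = to_tb(DRIVES * DRIVE_GB)
# Rounded figures quoted for each role.
usage_tb = {"active": 8, "parity": 2, "spares": 2}

for role, tb in usage_tb.items():
    print(f"{role}: {tb}TB = {drives_for(tb)} drives")
print(f"total: {total_tb:g}TB across {DRIVES} drives")
```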

Just talked with Matt and Jeff. It looks like we'll have a replacement machine within a few days, but I can't give details just yet because things are still in process.

Eric


@SETIEric@qoto.org (Mastodon)

ID: 558107
netwraith
Joined: 15 Sep 06
Posts: 8
Credit: 61,290
RAC: 0
United States
Message 558112 - Posted: 2 May 2007, 17:41:30 UTC - in response to Message 558107.  
Last modified: 2 May 2007, 17:42:12 UTC


Thanks... too big for the CLARiiON to handle. Oh well...

ID: 558112
Philadelphia
Volunteer tester
Joined: 12 Feb 07
Posts: 1590
Credit: 399,688
RAC: 0
United States
Message 558131 - Posted: 2 May 2007, 18:46:15 UTC - in response to Message 557665.  

This was one of those days. Sometime in the early morning MySQL on sidious crashed and rebooted itself. It had minor indigestion and restarted on its own just fine. Eric had to restart the BOINC projects to clean the pipes.

But when I came in I found Eric dissecting our master database server, thumper. That's never a good sign. He and Jeff informed me that it lost the ability to see any of its internal drives. Tests throughout the day confirmed that diagnosis - there's something dead between the power supply and the disk controllers so the drives don't even spin up. Booting from a DVD and an "fdisk" shows nothing. This system has a "preliminary" motherboard, which is one of the reasons we got it for free, but it has no hardware support.

Meanwhile I went ahead with the usual database backup/compression while we figured out what the heck we're gonna do. We're pretty confident the data is intact and as long as some server somewhere can mount the 24 SATA drives that make up the database the SETI@home science data will be perfectly intact. Failing that, we can recover from tape but unfortunately we're at a bad point in the backup cycle so the most recent tape is a week old.

Since data loss is most likely not an issue, the upshot of thumper being down is that we can't run the splitters or the assimilators. I just restarted the scheduler, but we only had about 300,000 results to process. I checked again just now and it's already down to about 281,000. Brace yourselves for a long outage.

[Edit: things are looking better regarding previously mentioned inability to procure a replacement. In other words, we might get another server relatively quickly.]

- Matt



Matt,

I sent you an email a couple of minutes ago about a server; can you read it and get back to me? My email is danmartinaz AT aol DOT com

http://setiathome.berkeley.edu/forum_thread.php?id=39206


ID: 558131
marsinph
Volunteer tester
Joined: 7 Apr 01
Posts: 172
Credit: 23,823,824
RAC: 0
Belgium
Message 558134 - Posted: 2 May 2007, 18:47:54 UTC

Hi everybody,

I am not able to give a new mainboard or server, but I can send you all my support. By the way, Sun could also do "something". BOINC has become very popular around the world. There are more and more reports on TV and in newspapers about Arecibo and BOINC. It is free advertising for Sun, so they could also do more than normal support!

Hoping for the (very) speedy arrival of the replacement machine, my greetings.


ID: 558134
Jane Delawney
Joined: 3 May 02
Posts: 5
Credit: 2,886,879
RAC: 0
Message 558168 - Posted: 2 May 2007, 19:45:08 UTC

Matt, this is my first post on the forums, just because as a long-time SETI contributor I've been caught up in the car crash.
Can't offer any hardware or any techie suggestions since I'm just an end user on various platforms, but I can't help wondering what will happen to the work currently in progress on my Linux box, which has been delayed by almost a week because of a simple hardware failure (dead power supply; day job meant I couldn't sort it out until now). 'Cos this box isn't the fastest in the world it's going to be some time tomorrow morning before it attempts to report; do you think it's worth carrying on with it, or should I abort the task and wait until y'all are back up?
In other words, if I let this box carry on number-crunching and reporting, what will happen to that report, assuming you're able to get things up and running again in the next week or so?

[btw because of *quite* a few hardware problems I've been contributing to this project from a win vista laptop for the past couple of weeks. I'm almost ashamed to admit it! but so it goes.
Windows Vista? I hate it already.]

JD
ID: 558168
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 558170 - Posted: 2 May 2007, 19:47:00 UTC - in response to Message 558131.  

Philadelphia

Thank You

I called Eric, who went off to ensure Matt got interrupted long enough to read the email... You should have a reply shortly

Regards

Pappa





Please consider a Donation to the Seti Project.

ID: 558170
Fuzzy Hollynoodles
Volunteer tester
Joined: 3 Apr 99
Posts: 9659
Credit: 251,998
RAC: 0
Message 558171 - Posted: 2 May 2007, 19:47:36 UTC - in response to Message 558107.  
Last modified: 2 May 2007, 19:57:48 UTC




We'll keep our fingers crossed for you.


And the kitties united says: Let's hope those humans will understand our need to crunch!





"I'm trying to maintain a shred of dignity in this world." - Me

ID: 558171
Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 558176 - Posted: 2 May 2007, 19:51:23 UTC - in response to Message 558168.  

Jane

Welcome to Seti

If it has not told you it is in danger of timing out, let it complete...



Please consider a Donation to the Seti Project.

ID: 558176
PJ - SonOfJabba
Joined: 3 Dec 01
Posts: 3
Credit: 242,569
RAC: 0
United States
Message 558183 - Posted: 2 May 2007, 19:57:52 UTC

One of the computers in my Seti Farm just had the same issue. I got it to work with a Promise PCI Controller Card until I can figure out the real cause of the issue.

Good Luck!
ID: 558183
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 558205 - Posted: 2 May 2007, 20:32:34 UTC - in response to Message 558168.  


When there is an outage like this, deadlines are generally extended, either "artificially" (by the project) or simply because everyone is "late" and the work pairs up just fine when things start working.
ID: 558205
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 558217 - Posted: 2 May 2007, 21:01:04 UTC - in response to Message 558205.  
Last modified: 2 May 2007, 21:01:33 UTC

When there is an outage like this, deadlines are generally extended, either "artificially" (by the project) or simply because everyone is "late" and the work pairs up just fine when things start working.


If there is an outage where people cannot connect to the project at all, that's when we artificially bump up deadlines. However, in an outage *like this* all the public facing servers are up and running, they simply aren't sending out any new work. But work being processed can still be submitted and validated (and credit granted). So there's no reason to bump up deadlines.

By the way I did notice just now the file_upload stuff was hung for the past few hours. Not sure why. I kicked the web server and that's working again.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 558217
Qurmo
Volunteer tester
Joined: 2 May 07
Posts: 8
Credit: 306,878
RAC: 0
Belgium
Message 558279 - Posted: 2 May 2007, 22:22:49 UTC

I'm happy to see there's a solution on its way; let's hope they send it by express delivery ;). Hope all will return to normal soon so I can start doing some tasks for you guys. Anyway, I'm gonna call it a night.
ID: 558279
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 558496 - Posted: 3 May 2007, 2:52:11 UTC - in response to Message 558084.  

As I still have about 1 day's worth of work, does this affect beta as well?

Yep, beta will run out of work as well. I'll be getting a current status report from Matt and Jeff in 20 minutes.

Well, at least I won't have to worry there. 200 hours and still crunching.
me@rescam.org
ID: 558496

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.