Down Time (May 01 2007)
Author | Message |
---|---|
Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
This was one of those days. Sometime in the early morning MySQL on sidious crashed and rebooted itself. It had minor indigestion and restarted on its own just fine. Eric had to restart the BOINC projects to clean the pipes. But when I came in I found Eric dissecting our master database server, thumper. That's never a good sign. He and Jeff informed me that it lost the ability to see any of its internal drives. Tests throughout the day confirmed that diagnosis - there's something dead between the power supply and the disk controllers, so the drives don't even spin up. Booting from a DVD and running "fdisk" shows nothing. This system has a "preliminary" motherboard, which is one of the reasons we got it for free, but it also has no hardware support. Meanwhile I went ahead with the usual database backup/compression while we figured out what the heck we're gonna do. We're pretty confident the data is intact, and as long as some server somewhere can mount the 24 SATA drives that make up the database, the SETI@home science data will be perfectly intact. Failing that, we can recover from tape, but unfortunately we're at a bad point in the backup cycle so the most recent tape is a week old. Since data loss is most likely not an issue, the upshot of thumper being down is that we can't run the splitters or the assimilators. I just restarted the scheduler, but we only had about 300,000 results to process. I checked again just now and it's already down to about 281,000. Brace yourselves for a long outage. [Edit: things are looking better regarding the previously mentioned inability to procure a replacement. In other words, we might get another server relatively quickly.] - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Jamie Joined: 8 Feb 01 Posts: 28 Credit: 11,078,008 RAC: 0 |
Are the scheduler errors that I'm getting related to this issue, or is it something separate? |
Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
*Are the scheduler errors that I'm getting related to this issue, or is it something separate?* That's just due to the servers coming online after the 3-hour database backup, so they are swamped with requests and dropping connections. It'll get better soon, but when work runs out (in about 3-6 hours) it'll get worse. - Matt |
Iztok s52d (and friends) Joined: 12 Jan 01 Posts: 136 Credit: 393,469,375 RAC: 116 |
Hi! Maybe the splitters can store data somewhere temporarily? Once thumper is back and tested, you could just load this data and then resume the assimilators. Good luck with the boxes! 73 Iztok |
John-James-Connellan Joined: 16 Jan 00 Posts: 4 Credit: 824,683 RAC: 0 |
Should there be a second master science database as backup? (Perhaps a generous sponsor may be able to help.) From Passive Seti Alpha Tester Brendan |
Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
*Maybe splitters can store data somewhere temporary,* When splitters first create work they need access to the master science database to store the workunit information, so that when results eventually appear on the other end of the pipeline there will be a workunit to match them. *Should there be a second master science database as backup?* Of course there *should* be, but lack of resources (a.k.a. money) dictated that our policy was to be satisfied with a RAIDed database server with tape backups just in case. Of course RAID doesn't do you any good when *every* disk suddenly disappears according to the OS. I don't want to claim anything prematurely, but in light of this we may get a backup server after all. - Matt |
WlanWorks1 Joined: 11 Jan 06 Posts: 7 Credit: 446,368 RAC: 0 |
*Are the scheduler errors that I'm getting related to this issue, or is it something separate?* This question may be premature, and is most likely covered by your response above, but I am going to ask anyway. The downtime today, I now know, was due to the "phantom hard drives" on your end. At first, my messages were advising that I was requesting new work but the project servers may be down. However, after 4:06pm and CURRENTLY, when I attempt to connect via "update", I am getting the message "(not requesting new work or reporting completed tasks)". Now, I know that most likely I will not gain new work, and I have ceased attempts to contact the servers as it will only add to the traffic congestion. What I want to know, though, is whether that message is related to the downed servers and hard drives and their coming back online, or do I have a separate issue all of a sudden? More importantly, will I be able to request and obtain new work without any action on my end? |
Odysseus Joined: 26 Jul 99 Posts: 1808 Credit: 6,701,347 RAC: 6 |
*[…] when I attempt to connect, via "update" - I am getting the message "(not requesting new work or reporting completed tasks)".* That message isn’t usually anything to worry about, unless you do have current tasks that are “Ready to report”. The first part just means BOINC thinks you have (or have had) enough work from the project for the time being. Do you run any other projects? Did you still have S@h work in progress or queued when you tried the update? Only if the answer to both questions is “no” would I assume there’s any problem at your end. |
Joined: 9 Jul 99 Posts: 1199 Credit: 6,615,780 RAC: 0 |
[...] Phew, at least it isn't the disks themselves - spot of luck in an unlucky overall situation. Data is always more valuable than hardware... That said, good luck with procuring a replacement, and maybe a new system board for Thumper so the master science DB will have a live backup. Regards, Simon. Donate to SETI@Home via PayPal! Optimized SETI@Home apps + Information |
Paul J. Bennett Joined: 6 Oct 06 Posts: 4 Credit: 185,279 RAC: 0 |
*This was one of those days. Sometime in the early morning MySQL on sidious crashed and rebooted itself. It had minor indigestion and restarted on its own just fine. Eric had to restart the BOINC projects to clean the pipes.* I suppose that is why I am getting the message STATUS COMMUNICATION DEFERRED. |
kittyman Joined: 9 Jul 00 Posts: 51544 Credit: 1,018,363,574 RAC: 1,004 |
Hope that Sun comes through for you and gets you Thumpin' again quickly! "Time is simply the mechanism that keeps everything from happening all at once." |
Joined: 2 May 07 Posts: 8 Credit: 306,878 RAC: 0 |
Hi, I just switched projects, so I haven't got any work from you guys yet. Is there an indication of when I would be able to get work? I hope your problems get solved quickly ;) |
TarracoServer Joined: 11 Apr 07 Posts: 38 Credit: 595,022 RAC: 0 |
Hi! Yes, that's a real problem. Maybe this is a bad idea (or maybe not), but what about making a daily backup copy (only of the new data, not the whole database) and storing it on a virtual HD on the net? I don't know how much space that DB needs daily (I suppose several GB, not several TB ;)), but there are some free virtual HD space servers that could store that data. (Of course, for safety, it could be a good idea to use a couple of those servers with the same data stored on each.) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13961 Credit: 208,696,464 RAC: 304 |
*is there an indication when I would be able to get work?* Not really, but from the nature of the problem I'd suggest a couple of days, if all goes well. Grant Darwin NT |
Joined: 2 May 07 Posts: 8 Credit: 306,878 RAC: 0 |
OK, thanks for the intel. I hope it will be sooner, but we will see. So I'm on hold ;) |
Joined: 4 Jun 06 Posts: 13 Credit: 1,233,155 RAC: 0 |
Well, at least I see it has been accepting my completed task. But I am unable to request new work. Is there maybe an alternative solution available, or something to work on while thumper is down? |
Joined: 21 Jun 99 Posts: 53 Credit: 2,434,487 RAC: 0 |
*Well, at least I see it has been accepting my completed task.* Yes, just attach to another project and do work for them while SETI fixes the hardware. SETI@home classic workunits 5,429 SETI@home classic CPU time 73,472 hours |
Joined: 21 Jan 03 Posts: 1 Credit: 3,351,670 RAC: 0 |
Why doesn't SETI work? I've been trying to upload and download work, and I get no response. Why doesn't it work? Is it not important? |
Conrad Human Joined: 17 Nov 00 Posts: 67 Credit: 2,009,224 RAC: 0 |
*Why doesn't SETI work? I've been trying to upload and download work* Please read http://setiathome.berkeley.edu/forum_thread.php?id=39188&nowrap=true#557665 As I still have about 1 day's worth of work: does this affect beta as well? Oh well, if it goes, it goes with a bang (someone find a 24-port SATA controller). I am sure we will have an update from Matt later today. |
sideband@seti.usa Joined: 19 Jun 99 Posts: 25 Credit: 2,774,864 RAC: 0 |
Could this explain why, over the last week or so, my RAC seems to have fallen (to the tune of 1K), while the output of my machines has remained relatively constant (aside from Bishop's burp and Twiggy's downtime)? I've noted that other members of my team have been experiencing similar drops, etc, and was wondering what was going on there, too... 73 de AI8W, Chris Abdico Concussio Fidens Servo Libertas Semper! |