Harvey (Mar 24 2009)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
The good news is that our regular Tuesday maintenance outage today chugged along quickly, and without incident. The not-so-great news is that we are still fighting with thumper to get it running properly again. Jeff, Eric, and I whipped up a cookbook yesterday of the 7 or 8 steps to get thumper's root drive mirrored. As of this morning we had only one working drive with root/boot on it, but it's the spare drive sitting in the /dev/sda slot. According to the BIOS, the root/boot drives have to be in slots #0 and #1, but thanks to non-linear disk controller labels on the backplane these drives show up in linux-land as /dev/sdy and /dev/sdac. Of course, you can only install grub on /dev/sd[a-d] which means lots of disk swapping and rebooting and resyncing. However, we're still on step #2 right now, and it won't finish until later tonight. The three of us were huddled over thumper for almost three hours - a frustrating period of time starting with us rebooting thumper "just to make sure everything is working" and then it wouldn't mount the root drive because of underlying issues with the metadevice. This was all mysterious, and after poking this and that it got worse - we could only boot in recovery mode off of DVD, and we had to hack partition tables and change disk identifiers before we could see root again. That's where it's at now: we're syncing the one working drive with a new spare, a process that we thought would take less than an hour but will take five, apparently. To add insult to injury our pulse table in the science database on thumper ran out of extents last night, which basically means the tables are full even though we have disk space available. So as if the above ordeal wasn't enough, we'll need an additional day or two to recreate (or at least hack at) the pulse table to add more extents. Long story short, don't expect SETI@home to be generating any new work or assimilating anything for a week (unless we're lucky). 
We'll at least try to keep Astropulse working during this time, so computers that can run Astropulse will be kept busy. When it rains it pours, but we'll be back to normal again soon enough. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
ML1 Joined: 25 Nov 01 Posts: 21329 Credit: 7,508,002 RAC: 20 |
OK, wild guess #1 for the root drives problem... Can you specify in grub to reorder the IO ports for the disks to get sda, sdb, sdc, sdd to map into grub in sequence? Using disk labels, you can then let Linux sort out the mount mess automagically later. More of a question is where (which disks) to put the MBR (and redundant copies) for the BIOS to boot into... Good luck! (Anyone else ever juggled so many drives?) Or... Set up a dedicated isoboot CD? Memory stick?? Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Andrew Clayton Joined: 12 Apr 99 Posts: 7 Credit: 907,810 RAC: 0 |
If your RAID resync is going slow (check /sys/block/mdX/md/sync_speed, speed in KB/sec), you could try increasing it by tweaking /sys/block/mdX/md/sync_speed_{min,max} |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
"If your RAID resync is going slow (check /sys/block/mdX/md/sync_speed, speed in KB/sec). You could try increasing it by tweaking" Good tip, but I just checked - we're nowhere near the max, and well above the min... - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
KW2E Joined: 18 May 99 Posts: 346 Credit: 104,396,190 RAC: 34 |
Hey Matt, If we go without work for a while, then we go without work for a while. Take your time man and do what you gotta do. We can all wait. Don't pull your hair out. I shave mine off once a week with a #1 so I couldn't if I wanted to. ;-) Rob |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
. . . Thanks for the Posting Matt - and Thanks to Each of You @ Berkeley for All that you do & have done especially w/ AP [+ to those @ Lunatics - well done Mates . . .] BOINC Wiki . . . Science Status Page . . . |
P. J. Crabtree Joined: 17 Jan 07 Posts: 22 Credit: 1,847,766 RAC: 0 |
Matt, should we suspend network activity so as to reduce the load on the servers when the situation is resolved? |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
from Matt: 'We'll at least try to keep Astropulse working during this time, so computers that can run Astropulse will be kept busy.' Matt, should we suspend network activity so as to reduce the load on the servers when the situation is resolved? BOINC Wiki . . . Science Status Page . . . |
arkayn Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
|
Andreas Joined: 21 Jan 02 Posts: 16 Credit: 9,911,789 RAC: 0 |
"We'll at least try to keep Astropulse working during this time, so computers that can run Astropulse will be kept busy." I'm usually crunching MBs only and am trying to change this now. Is the project sending out APs at the moment? Greetings from "Good Old Europe", Andreas |
Neil Blaikie Joined: 17 May 99 Posts: 143 Credit: 6,652,341 RAC: 0 |
I have it set to MB and Astropulse and have not received either kind of work unit. (Temporarily turned off MB to ease the burden and only set to AP.) I get the "3/25/2009 11:00:22 AM|SETI@home|Message from server: (Project has no jobs available)" message when trying to request work. The status page says there are units available, but none seem to be getting sent out. (Then again, the server status page is lagging far behind, so the queue could be empty.) |
Andreas Joined: 21 Jan 02 Posts: 16 Credit: 9,911,789 RAC: 0 |
the server status page is out of date, the numbers given there for available work are 25h old. Has anyone actually gotten APs today? |
Neil Blaikie Joined: 17 May 99 Posts: 143 Credit: 6,652,341 RAC: 0 |
Not me, haven't got anything since yesterday. Giving the dual cores a nice earned rest until work becomes available again. |
suki quin Joined: 12 Oct 08 Posts: 81 Credit: 1,053,392 RAC: 0 |
"Matt, should we suspend network activity so as to reduce the load on the servers when the situation is resolved?" Received no work since around 21:00 UTC on the 24th... Seconding this question and suspending network activity until an answer appears (again). Thank you ALL, Suki keep telescopic listening devices aimed at the Zenith of the Horizon |
speedimic Joined: 28 Sep 02 Posts: 362 Credit: 16,590,653 RAC: 0 |
"the server status page is out of date, the numbers given there for available work are 25h old." yep, 7 APs - all resends - nothing newly split. mic. |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
"the server status page is out of date, the numbers given there for available work are 25h old." The status page is now up to date. Creation rates for MB and AP_v5 reflect the amount of resends being created. Demand is much higher than that, of course, so it takes luck to get work. Joe |
Gary Charpentier Joined: 25 Dec 00 Posts: 31047 Credit: 53,134,872 RAC: 32 |
"the server status page is out of date, the numbers given there for available work are 25h old." Was going to ask why the result creation rate wasn't zero, but you have explained it. |
ionocean Joined: 1 Nov 05 Posts: 1 Credit: 46,446 RAC: 0 |
Hello from Salina, Kansas, Matt; "...According to the BIOS, the root/boot drives have to be in slots #0 and #1..." This seems to be the bottom line here. I know that you guys have your plates full of "things to do", but have you considered writing a custom BIOS to take care of stuff like that? I've been programming for about 30 years, and have run into problems like this before, but on mainframes running Sys V, v4., and several times have just hooked from a maintenance monitor to do what I wanted, instead of what the BIOS wanted. Ya gotta be careful here, it's like brain surgery....but sometimes it was the only way. Like the #1 precept of programmers that states: A program must never modify its own code while it is running, I consider a program that CAN modify its own code a better piece of work...it's only a program, after all. Just a few thoughts for ya, good luck with all that you do, and I'll keep chugging along with my old Evo N-150 (800 MHz) and my Dell laptops for you. Mike Kashkin "ionocean" Salina, Kansas |
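
The resync check Andrew Clayton suggests in the thread (read sync_speed, then compare it against sync_speed_min and sync_speed_max) can be sketched in Python. This is only a minimal sketch, assuming a Linux md array that exposes those three files under /sys/block/<name>/md/. The device name "md0", the helper names, and the 90% threshold are illustrative assumptions, not anything stated in the thread.

```python
from pathlib import Path


def parse_speed(text):
    """Parse an md sysfs speed value such as '1000 (system)', '85000' or 'none'.

    Returns the speed in KB/sec as an int, or None when no number is present
    (for example when no resync is running).
    """
    parts = text.split()
    token = parts[0] if parts else "none"
    return int(token) if token.isdigit() else None


def read_sync_speeds(md_name, sysfs_root="/sys/block"):
    """Read the current, minimum and maximum resync speeds for one md device.

    md_name is a hypothetical device name such as 'md0'. Missing files are
    reported as None instead of raising an error.
    """
    md_dir = Path(sysfs_root) / md_name / "md"
    speeds = {}
    for key in ("sync_speed", "sync_speed_min", "sync_speed_max"):
        path = md_dir / key
        speeds[key] = parse_speed(path.read_text()) if path.exists() else None
    return speeds


def is_throttled(speeds, margin=0.9):
    """Return True if the current speed is at least `margin` of the maximum.

    The 0.9 default is an arbitrary illustrative threshold. When this returns
    False, raising sync_speed_max is unlikely to help, which matches Matt's
    reply that the resync was nowhere near the max.
    """
    current = speeds["sync_speed"]
    maximum = speeds["sync_speed_max"]
    return current is not None and maximum is not None and current >= margin * maximum
```

Calling read_sync_speeds("md0") on the affected machine and passing the result to is_throttled() would show whether the configured ceiling is the bottleneck. If it is not, as Matt reports, the slow resync has some other cause, such as the disks or the controller.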
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.