Matt Lebofsky (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 1 Mar 99 Posts: 1376 Credit: 74,079 RAC: 0

The good news is that our regular Tuesday maintenance outage today chugged along quickly, and without incident. The not-so-great news is that we are still fighting with thumper to get it running properly again.
Jeff, Eric, and I whipped up a cookbook yesterday of the 7 or 8 steps to get thumper's root drive mirrored. As of this morning we had only one working drive with root/boot on it, but it's the spare drive sitting in the /dev/sda slot. According to the BIOS, the root/boot drives have to be in slots #0 and #1, but thanks to non-linear disk controller labels on the backplane these drives show up in linux-land as /dev/sdy and /dev/sdac. Of course, you can only install grub on /dev/sd[a-d] which means lots of disk swapping and rebooting and resyncing.
However, we're still on step #2 right now, and it won't finish until later tonight. The three of us were huddled over thumper for almost three hours - a frustrating period of time starting with us rebooting thumper "just to make sure everything is working" and then it wouldn't mount the root drive because of underlying issues with the metadevice. This was all mysterious, and after poking this and that it got worse - we could only boot in recovery mode off of DVD, and we had to hack partition tables and change disk identifiers before we could see root again. That's where it's at now: we're syncing the one working drive with a new spare, a process that we thought would take less than an hour but will take five, apparently.
To add insult to injury, our pulse table in the science database on thumper ran out of extents last night, which basically means the table is full even though we have disk space available. So as if the above ordeal wasn't enough, we'll need an additional day or two to recreate (or at least hack at) the pulse table to add more extents. Long story short, don't expect SETI@home to be generating any new work or assimilating anything for a week (unless we're lucky). We'll at least try to keep Astropulse working during this time, so computers that can run Astropulse will be kept busy.
When it rains it pours, but we'll be back to normal again soon enough.
- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ML1 (Volunteer tester)
Joined: 25 Nov 01 Posts: 7210 Credit: 3,703,390 RAC: 728

OK, wild guess #1 for the root drives problem...
Can you specify in grub to reorder the IO ports for the disks to get sda, sdb, sdc, sdd to map into grub in sequence?
Using disk labels, you can then let Linux sort out the mount mess automagically later.
More of a question is where (which disks) to put the MBR (and redundant copies) for the BIOS to boot into...
Good luck!
(Anyone else ever juggled so many drives?)
Or... Set up a dedicated isoboot CD? Memory stick??
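To illustrate the disk-label suggestion above: on Linux systems that use udev, /dev/disk/by-label holds one symlink per filesystem label, so a label can be resolved to whatever /dev/sdX name the kernel assigned on this boot. The following is a minimal sketch only; the function name is made up for illustration, and it assumes the filesystems on the box actually carry labels.

```python
import os


def label_map(by_label_dir="/dev/disk/by-label"):
    """Return {filesystem label: resolved device path} for one directory of symlinks.

    Hypothetical helper: on a udev-based system each entry in
    /dev/disk/by-label is a symlink to the device node the kernel
    happened to assign (for example /dev/sdy on one boot, /dev/sda on another).
    """
    mapping = {}
    for name in sorted(os.listdir(by_label_dir)):
        link = os.path.join(by_label_dir, name)
        # realpath follows the symlink to the current device node
        mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    if os.path.isdir("/dev/disk/by-label"):
        for label, device in label_map().items():
            print(f"{label} -> {device}")
    else:
        print("no /dev/disk/by-label directory on this system")
```

Mounting by label (or by UUID) rather than by /dev/sdX name is what keeps the mounts stable when drives move between slots; it does not by itself solve where the boot loader has to live.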
Regards,
Martin
____________
Mandriva Linux A user friendly OS!
See new freedom Mageia2
The Future is what We make IT (GPLv3)
If your RAID resync is going slowly (check /sys/block/mdX/md/sync_speed, which reports the speed in KB/sec), you could try increasing it by tweaking
/sys/block/mdX/md/sync_speed_{min,max}
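
A minimal sketch of that check, assuming a Linux md array whose sysfs directory is passed in by hand (md0 below is only a placeholder, and the function names are made up for illustration). It reads the three files and reports whether the current speed is pressing against the configured maximum; if it is not, raising the limits is unlikely to help.

```python
from pathlib import Path

FILES = ("sync_speed", "sync_speed_min", "sync_speed_max")


def read_md_sync(md_dir):
    """Read resync speeds (KB/sec) from an md sysfs directory.

    sync_speed reads "none" when no resync is running, and the min/max
    files look like "1000 (system)", so only the first token is parsed.
    Unparseable values come back as None.
    """
    md_dir = Path(md_dir)
    speeds = {}
    for name in FILES:
        tokens = (md_dir / name).read_text().split()
        try:
            speeds[name] = int(tokens[0])
        except (IndexError, ValueError):
            speeds[name] = None
    return speeds


def is_near_cap(speeds, margin=0.9):
    """True if the current speed is within `margin` of sync_speed_max."""
    current, cap = speeds["sync_speed"], speeds["sync_speed_max"]
    return current is not None and cap is not None and current >= margin * cap


if __name__ == "__main__":
    md_dir = Path("/sys/block/md0/md")  # placeholder array name
    if md_dir.is_dir():
        speeds = read_md_sync(md_dir)
        print(speeds, "near cap" if is_near_cap(speeds) else "not limited by max")
    else:
        print(f"{md_dir} not found")
```

The 0.9 margin is an arbitrary choice for the sketch, not a kernel threshold.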
____________
Matt Lebofsky (Volunteer moderator, Project administrator, Project developer, Project scientist)
Joined: 1 Mar 99 Posts: 1376 Credit: 74,079 RAC: 0

If your RAID resync is going slowly (check /sys/block/mdX/md/sync_speed, which reports the speed in KB/sec), you could try increasing it by tweaking
/sys/block/mdX/md/sync_speed_{min,max}
Good tip, but I just checked - we're nowhere near the max, and well above the min...
- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
Hey Matt,
If we go without work for a while, then we go without work for a while. Take your time man and do what you gotta do. We can all wait.
Don't pull your hair out. I shave mine off once a week with a #1, so I couldn't if I wanted to. ;-)
Rob
____________
. . . Thanks for the Posting Matt
- and Thanks to Each of You @ Berkeley for All that you do & have done
especially w/ AP
[+ to those @ Lunatics - well done Mates . . .]
____________
BOINC Wiki . . .
Science Status Page . . .
Matt, should we suspend network activity so as to reduce the load on the servers when the situation is resolved?
____________
from Matt: 'We'll at least try to keep Astropulse working during this time, so computers that can run Astropulse will be kept busy.'
Matt, should we suspend network activity so as to reduce the load on the servers when the situation is resolved?
____________
BOINC Wiki . . .
Science Status Page . . .
arkayn (Volunteer tester)
Joined: 14 May 99 Posts: 3148 Credit: 41,361,719 RAC: 33,080

If you are only doing MB on a host, set it to NNW (No New Work) instead.
____________
We'll at least try to keep Astropulse working during this time, so computers that can run Astropulse will be kept busy.
I'm usually crunching MB only and am trying to change this now. Is the project sending out APs at the moment?
Greetings from "Good Old Europe",
Andreas
____________
I have it set to MB and Astropulse and have not received either kind of work unit.
(I temporarily turned off MB to ease the burden, leaving it set to AP only.)
I get the message "3/25/2009 11:00:22 AM|SETI@home|Message from server: (Project has no jobs available)" when trying to request work.
The status page says there are units available, but none seem to be going out. (Then again, the server status page is lagging badly, so the queue could be empty.)
____________
The server status page is out of date; the numbers given there for available work are 25 hours old.
Has anyone actually gotten APs today?
____________
Not me, I haven't got anything since yesterday. I'm giving the dual cores a well-earned rest until work becomes available again.
____________
Matt, should we suspend network activity so as to reduce the load on the servers when the situation is resolved?
Received no work since around 21:00 UTC on the 24th... Seconding this question and suspending network activity until answer appears (again)
Thank you ALL
Suki
____________
keep telescopic listening devices aimed at the Zenith of the Horizon
The server status page is out of date; the numbers given there for available work are 25 hours old.
Has anyone actually gotten APs today?
yep, 7 APs - all resends - nothing newly split.
____________
mic.
The server status page is out of date; the numbers given there for available work are 25 hours old.
Has anyone actually gotten APs today?
yep, 7 APs - all resends - nothing newly split.
The status page is now up to date. Creation rates for MB and AP_v5 reflect the number of resends being created. Demand is much higher than that, of course, so it takes luck to get work.
Joe
The server status page is out of date; the numbers given there for available work are 25 hours old.
Has anyone actually gotten APs today?
yep, 7 APs - all resends - nothing newly split.
The status page is now up to date. Creation rates for MB and AP_v5 reflect the number of resends being created. Demand is much higher than that, of course, so it takes luck to get work.
Joe
Was going to ask why the result creation rate wasn't zero, but you have explained it.
____________
Hello from Salina, Kansas, Matt;
...According to the BIOS, the root/boot drives have to be in slots #0 and #1...
This seems to be the bottom line here.
I know that you guys have your plates full of "things to do", but have you considered writing a custom BIOS to take care of stuff like that?
I've been programming for about 30 years and have run into problems like this before, though on mainframes running Sys V, v4, and several times I have just hooked in from a maintenance monitor to do what I wanted instead of what the BIOS wanted. Ya gotta be careful here, it's like brain surgery... but sometimes it was the only way.
As for the #1 precept of programmers, which states that a program must never modify its own code while it is running: I consider a program that CAN modify its own code a better piece of work... it's only a program, after all.
Just a few thoughts for ya, good luck with all that you do, and I'll keep chugging along with my old Evo N-150 (800 MHz) and my Dell laptops for you.
Mike Kashkin
"ionocean"
Salina, Kansas
____________