A Billion or So (Jan 06 2010) |
Message boards : Technical News : A Billion or So (Jan 06 2010)
| Author | Message |
|---|---|
|
Still catching up from the reduced/random schedule during the holidays. The science database rehabilitation project continues, and we're nearing the end: the primary science database (thumper) is now corruption free, stable, and logging properly. The secondary science database (bambi) is being rebuilt as I type, using the science database backup we made on Monday. The rebuild is going rather slowly - we predict it will take 11 days (!) at current rates. As I typed this paragraph we noticed the rebuild was stuck. We feared we would have to reboot the system and start again from scratch, but luckily we were able to find the errant process locking the whole system, and everything else sprang to life, continuing where it left off. Phew. | |
| ID: 961366 · | |
|
Thanks for the news Matt, glad most things are doing OK. | |
| ID: 961367 · | |
|
Right... regarding mork we still have no clue. I don't think it's power - the system is just hanging there in a frozen state until we have to hard reset it. Maybe that's a symptom of a power problem I've never seen before. Anyway, it's an "engineering model" system so all bets are off, really. | |
| ID: 961369 · | |
|
Thanks for the update Matt, | |
| ID: 961370 · | |
|
This is just a theory, as I have not seen the system: you might have some AC ripple leaking through onto a DC line due to a filter breaking down. Is it possible to swap out a power unit? | |
| ID: 961374 · | |
|
Great work Matt! | |
| ID: 961401 · | |
|
Can't... resist... | |
| ID: 961442 · | |
"one of the drives in thumper's RAID issued some warnings. Last time that happened we got some, well, um... corruption." This has been suggested before, but why not use ZFS on the Thumper (and other fileservers)? Having an additional end-to-end checksum is very useful when your data volume starts to approach typical disk bit-error rates. You can also run a data verify ("scrub") to ensure that the data you read off the drives matches the expected data. | |
| ID: 961499 · | |
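To illustrate the checksum-and-scrub idea from the post above, here is a minimal Python sketch. It is not how ZFS is implemented; the function names (`checksum`, `write_blocks`, `scrub`) and the sample block contents are hypothetical, and SHA-256 is just one convenient choice of hash. The point is only that a checksum recorded at write time lets a later pass detect blocks whose contents have silently changed.

```python
import hashlib

def checksum(block: bytes) -> str:
    # Hash of one block's contents (SHA-256 chosen for illustration).
    return hashlib.sha256(block).hexdigest()

def write_blocks(blocks):
    # "Write" the blocks and record a checksum for each one,
    # kept separately from the data itself.
    data = list(blocks)
    sums = [checksum(b) for b in data]
    return data, sums

def scrub(data, sums):
    # Re-read every block, recompute its checksum, and return the
    # indices of blocks that no longer match what was recorded.
    return [i for i, (b, s) in enumerate(zip(data, sums)) if checksum(b) != s]

# Hypothetical sample data, not taken from the real science database.
data, sums = write_blocks([b"workunit-0", b"workunit-1", b"workunit-2"])
data[1] = b"workunit-X"   # simulate silent on-disk corruption
print(scrub(data, sums))  # prints [1]
```

A scrub like this only detects damage; repairing it needs a redundant good copy (a mirror or parity) to restore from.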
It's still no good to have your master user database crash and recover every other week. Maybe you'd be better off switching it to be the secondary instead? | |
| ID: 961540 · | |
|
Is there any chance whatsoever to run some of these servers as virtual servers, thereby increasing the system's flexibility? It seems possible that one of the VM providers may be interested in the experiment and would provide 'free' software to such a visible project. With any luck it may make Matt's work-experience a little less anxious! | |
| ID: 961559 · | |
"Is there any chance whatsoever to run some of these servers as virtual servers, thereby increasing the system's flexibility? It seems possible that one of the VM providers may be interested in the experiment and would provide 'free' software to such a visible project. With any luck it may make Matt's work-experience a little less anxious!" Virtual servers are only useful if: 1) The servers are not completely saturated (virtualizing things does take extra CPU and disk resources). 2) There are processes that come and go on an irregular basis. Neither of these is true in this case. ____________ BOINC WIKI | |
| ID: 961709 · | |
|
I was thinking of the capability of migrating the VM between hardware platforms. Isn't that a useful property, as well? | |
| ID: 961811 · | |
|
"I was thinking of the capability of migrating the VM between hardware platforms. Isn't that a useful property, as well?" With all the servers running at full load, where could you migrate the VM to? ____________ mic. | |
| ID: 961903 · | |
| Copyright © 2013 University of California |