A Billion or So (Jan 06 2010)


Message boards : Technical News : A Billion or So (Jan 06 2010)

Author Message
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 961366 - Posted: 6 Jan 2010, 23:47:01 UTC

Still catching up from the reduced/random schedule during the holidays. The science database rehabilitation project continues. We're nearing the end: the primary science database (thumper) is now corruption free, stable, and logging properly. The secondary science database (bambi) is being rebuilt as I type using the science database backup we made on Monday. The rebuilding is going rather slowly - we predict it will take 11 days (!) at current rates. As I typed this paragraph we noticed the rebuild was stuck. We feared we would have to reboot the system and start again from scratch, but luckily we were able to find the errant process locking the whole system, and everything else sprang to life, continuing where it left off. Phew.

By the way... not to rain on the parade, but during the holidays one of the drives in thumper's RAID issued some warnings. Last time that happened we got some, well, um... corruption. I doubt we'll have to go through this whole rigamarole again. If anything, just a small part of the cookbook. Ah, probably not worth worrying about. We'll run some checks when all the above is through and see where we're at.

In better news, I got scram_peek working again. What's that? It's a little utility that runs down at the telescope and reads various diagnostics as they are broadcast around the local net. Stuff like current telescope position, whether ALFA is running, etc. This hasn't been working since our data recorder issues a loooong time ago, so our science status page (where we post such info) has been rather stale. One major stumbling block was that the old scram_peek ran on a Solaris machine, but that particular system died. We had no other Solaris system handy so I had to recompile it on Linux. It's really old code, linking against even older libraries. I had some compiler errors to work through - annoying but nothing too extreme.
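
For readers curious what "reading diagnostics as they are broadcast around the local net" can look like, here is a minimal Python sketch. It is not the real scram_peek: the post does not describe the transport or packet format, so the use of UDP, the `KEY=VALUE;KEY=VALUE` text payload, and the names `parse_diagnostics` and `listen` are all assumptions made purely for illustration.

```python
import socket


def parse_diagnostics(payload: bytes) -> dict:
    """Parse a HYPOTHETICAL 'KEY=VALUE;KEY=VALUE' text payload into a dict.

    The real telescope broadcast format is not given in the post.
    """
    fields = {}
    for item in payload.decode("ascii", errors="replace").split(";"):
        key, sep, value = item.partition("=")
        if sep:  # ignore fragments that contain no '='
            fields[key.strip()] = value.strip()
    return fields


def listen(port: int, max_packets: int = 1) -> list:
    """Collect up to max_packets UDP datagrams arriving on the given port."""
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))  # empty host: accept datagrams on all interfaces
        for _ in range(max_packets):
            payload, _sender = sock.recvfrom(65535)  # blocks until a packet arrives
            results.append(parse_diagnostics(payload))
    return results


# Parsing a made-up sample packet (field names are invented):
sample = parse_diagnostics(b"AZ=123.4;ZA=10.2;ALFA=on")
```

A status page could then poll something like `listen(port)` periodically and display the latest fields; the port number would have to come from whatever the telescope's network actually uses.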

Anyway, I'm looking at the science status page right now and the ALFA receiver light is green. That's beautiful. You may also notice the # of spikes in the science database is shockingly low. That's because we recently split the spike table into two (it grew beyond the bounds a single logical table could handle). We'll combine them again at a later point. Until then, that number is off by a billion or so (1,341,844,240 to be exact).
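
Until the tables are recombined, correcting the displayed figure is simple addition. A tiny Python sketch, where the function name and the sample displayed value are made up for illustration and only the offset comes from the post:

```python
# Offset quoted in the post: how far the status page's spike count is off
# while the spike table remains split in two.
SPIKE_COUNT_OFFSET = 1_341_844_240


def corrected_spike_count(displayed: int) -> int:
    """Add the known offset back onto the number shown on the status page."""
    return displayed + SPIKE_COUNT_OFFSET


# 25,000,000 is an invented example of a displayed value, not a real reading.
print(f"{corrected_spike_count(25_000_000):,}")
```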

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 738
Credit: 232,023
RAC: 28
United Kingdom
Message 961367 - Posted: 6 Jan 2010, 23:58:44 UTC - in response to Message 961366.

Thanks for the news Matt, glad most things are doing OK.

Did you see my questions in http://setiathome.berkeley.edu/forum_thread.php?id=56160? What happened to Mork a few days ago? Have you eliminated any intermittent hardware problems?

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 961369 - Posted: 7 Jan 2010, 0:02:46 UTC
Last modified: 7 Jan 2010, 0:03:13 UTC

Right... regarding mork we still have no clue. I don't think it's power - the system is just hanging there in a frozen state until we have to hard reset it. Maybe that's a symptom of a power problem I've never seen before. Anyway, it's an "engineering model" system so all bets are off, really.

The good news on that front, if anything, is that Jeff and I finally incorporated an IP-enabled power switch so we can at least fully power cycle the thing from the comfort of our homes next time this happens off hours. It's still no good to have your master user database crash and recover every other week.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4067
Credit: 32,904,876
RAC: 7,762
United Kingdom
Message 961370 - Posted: 7 Jan 2010, 0:04:06 UTC
Last modified: 7 Jan 2010, 0:51:54 UTC

Thanks for the update Matt,

Claggy

Edit: Strangely enough, I was an Avionics Technician in the '80s as well :)

Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 738
Credit: 232,023
RAC: 28
United Kingdom
Message 961374 - Posted: 7 Jan 2010, 0:18:04 UTC - in response to Message 961369.

This is just a theory, as I have not seen the system, but you might have some AC ripple leaking through onto a DC line due to a filter breaking down. Is it possible to swap out a power unit?

I used to be an avionics tech in the Air Force back in the '80s. I have seen a few weird snags get tracked down to bad power or bad grounding or earthing.

Keith T.

Luke
Volunteer developer
Joined: 31 Dec 06
Posts: 2546
Credit: 817,560
RAC: 0
New Zealand
Message 961401 - Posted: 7 Jan 2010, 2:39:45 UTC

Great work Matt!

Is there perhaps any chance of an update to the S@H Data Distribution History page?

Looks like the last time it was updated was in August 2008...

- Luke.
____________
- Luke.

Francis Noel
Joined: 30 Aug 05
Posts: 417
Credit: 55,431,419
RAC: 68,535
Canada
Message 961442 - Posted: 7 Jan 2010, 4:59:46 UTC

Can't...resist...

<sagan> a billion or so </sagan>

:D
____________
mambo

Peter Jeremy
Joined: 18 May 99
Posts: 1
Credit: 1,106,165
RAC: 1,761
Australia
Message 961499 - Posted: 7 Jan 2010, 9:12:08 UTC

one of the drives in thumper's RAID issued some warnings. Last time that happened we got some, well, um... corruption.


This has been suggested before but why not use ZFS on the Thumper (and other fileservers)? Having an additional end-to-end checksum is very useful when your data volume starts to approach typical disk bit-error rates. You can also run a data verify ("scrub") to ensure that the data you read off the drives matches the expected data.
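
To illustrate the idea of an end-to-end checksum plus a scrub pass, here is a toy Python sketch. It is not how ZFS is implemented and uses none of its tooling; the class `ChecksummedStore` and its methods are invented for this example, and SHA-256 is simply a convenient stand-in for whatever checksum a real filesystem would use.

```python
import hashlib


class ChecksummedStore:
    """Toy block store that records a SHA-256 digest for every block written."""

    def __init__(self):
        self._blocks = {}   # block id -> bytes as currently "on disk"
        self._digests = {}  # block id -> hex digest recorded at write time

    def write(self, block_id, data: bytes) -> None:
        self._blocks[block_id] = data
        self._digests[block_id] = hashlib.sha256(data).hexdigest()

    def read(self, block_id) -> bytes:
        """Return the block, refusing to hand back data that fails its checksum."""
        data = self._blocks[block_id]
        if hashlib.sha256(data).hexdigest() != self._digests[block_id]:
            raise IOError(f"checksum mismatch on block {block_id}")
        return data

    def scrub(self) -> list:
        """Verify every block and return the ids of those that no longer match."""
        return [bid for bid, data in self._blocks.items()
                if hashlib.sha256(data).hexdigest() != self._digests[bid]]


store = ChecksummedStore()
store.write(0, b"spike table page")
store.write(1, b"gaussian table page")

# Simulate silent corruption: the stored bytes change behind the store's back.
store._blocks[1] = b"gaussian table pagf"
damaged = store.scrub()  # ids of blocks that failed verification
```

The point of the scrub is that corruption is found by a deliberate sweep rather than whenever the damaged block next happens to be read; with redundancy available, a real system could then repair the block from a good copy.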
____________

DJStarfox
Joined: 23 May 01
Posts: 1040
Credit: 544,758
RAC: 267
United States
Message 961540 - Posted: 7 Jan 2010, 14:56:57 UTC - in response to Message 961369.

It's still no good to have your master user database crash and recover every other week.


Maybe you'd be better off switching it to be the secondary instead?

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,107,260
RAC: 3,857
United States
Message 961559 - Posted: 7 Jan 2010, 16:42:27 UTC

Is there any chance whatsoever of running some of these servers as virtual servers, thereby increasing the system's flexibility? It seems possible that one of the VM providers may be interested in the experiment and would provide 'free' software to such a visible project. With any luck it may make Matt's work-experience a little less anxious!

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24384
Credit: 519,750
RAC: 37
United States
Message 961709 - Posted: 7 Jan 2010, 23:37:27 UTC - in response to Message 961559.

Is there any chance whatsoever of running some of these servers as virtual servers, thereby increasing the system's flexibility? It seems possible that one of the VM providers may be interested in the experiment and would provide 'free' software to such a visible project. With any luck it may make Matt's work-experience a little less anxious!

Virtual servers are only useful if:

1) The servers are not completely saturated (virtualizing things does take extra CPU and disk resources).
2) There are processes that come and go on an irregular basis.

Neither of these is true in this case.
____________


BOINC WIKI

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 22,107,260
RAC: 3,857
United States
Message 961811 - Posted: 8 Jan 2010, 5:33:15 UTC - in response to Message 961709.

I was thinking of the capability of migrating the VM between hardware platforms. Isn't that a useful property, as well?

speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 961903 - Posted: 8 Jan 2010, 13:46:57 UTC - in response to Message 961811.

With all the servers running at full load, where could you migrate the VM to?

I was thinking of the capability of migrating the VM between hardware platforms. Isn't that a useful property, as well?


____________
mic.



Copyright © 2014 University of California