A Billion or So (Jan 06 2010)

Message boards : Technical News : A Billion or So (Jan 06 2010)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 961366 - Posted: 6 Jan 2010, 23:47:01 UTC

Still catching up from the reduced/random schedule during the holidays. The science database rehabilitation project continues. We're nearing the end: the primary science database (thumper) is now corruption free, stable, and logging properly. The secondary science database (bambi) is being rebuilt as I type, using the science database backup we made on Monday. The rebuilding is going rather slowly - we predict it will take 11 days (!) at current rates. As I typed this paragraph we noticed the rebuild was stuck. We feared we would have to reboot the system and start again from scratch, but luckily we were able to find the errant process locking the whole system, and everything else sprang to life, continuing where it left off. Phew.
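
An estimate like "11 days at current rates" is presumably a linear extrapolation: the remaining work divided by the rate observed so far. A minimal Python sketch of that arithmetic follows. The row counts and elapsed time are hypothetical placeholders, not bambi's real figures, and a real rebuild rarely holds a constant rate (the stall described above is a good example).

```python
from datetime import timedelta

def estimate_remaining(rows_done: int, rows_total: int, elapsed: timedelta) -> timedelta:
    """Linear ETA: assume the rebuild keeps its average rate so far."""
    if rows_done <= 0:
        raise ValueError("need some progress before a rate can be estimated")
    rate = rows_done / elapsed.total_seconds()  # rows per second so far
    remaining_rows = rows_total - rows_done
    return timedelta(seconds=remaining_rows / rate)

# Hypothetical numbers: 10% of the rows loaded after 1.2 days.
eta = estimate_remaining(rows_done=100_000_000, rows_total=1_000_000_000,
                         elapsed=timedelta(days=1.2))
print(f"{eta.total_seconds() / 86400:.1f} more days")  # 10.8 more days
```

Because the estimate uses the average rate since the start, a temporary stall drags it out only gradually. A rate taken over a recent window would react faster but fluctuate more.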

By the way... not to rain on the parade, but during the holidays one of the drives in thumper's RAID issued some warnings. Last time that happened we got some, well, um... corruption. I doubt we'll have to go through this whole rigamarole again. If anything, just a small part of the cookbook. Ah, probably not worth worrying about. We'll run some checks when all the above is through and see where we're at.

In better news, I got scram_peek working again. What's that? It's a little utility that runs down at the telescope and reads various diagnostics as they are broadcast around the local net: stuff like current telescope position, whether ALFA is running, etc. This hasn't been working since our data recorder issues a loooong time ago, so our science status page (where we post such info) has been rather stale. One major stumbling block was that the old scram_peek ran on a Solaris machine, but that particular system died. We had no other Solaris system handy, so I had to recompile it on Linux. It's really old code, linking against even older libraries. I had some compiler errors to work through - annoying, but nothing too extreme.
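
For readers wondering what "reading diagnostics as they are broadcast around the local net" might look like, here is a minimal Python sketch of a broadcast listener. It assumes the diagnostics arrive as UDP broadcast datagrams carrying `key=value;` text. The port number, payload format, and field names (`az`, `za`, `alfa`) are hypothetical, since the post does not describe the real SCRAM packets, and this is not the actual scram_peek code.

```python
import socket

# Hypothetical port: the real SCRAM broadcast port is not given in the post.
SCRAM_PORT = 5555

def parse_status(payload: bytes) -> dict:
    """Parse an assumed 'key=value;key=value' text payload into a dict."""
    fields = {}
    for item in payload.decode("ascii", errors="replace").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields

def listen(port: int = SCRAM_PORT, count: int = 1) -> list:
    """Receive `count` datagrams on `port` and parse each one."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))  # "" = all interfaces, so broadcasts are received
        return [parse_status(sock.recvfrom(4096)[0]) for _ in range(count)]

print(parse_status(b"az=123.4; za=10.2; alfa=on"))
# {'az': '123.4', 'za': '10.2', 'alfa': 'on'}
```

Keeping the parsing separate from the socket code means the format handling can be checked offline, without a live telescope network.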

Anyway, I'm looking at the science status page right now and the ALFA receiver light is green. That's beautiful. You may also notice the number of spikes in the science database is shockingly low. That's because we recently split the spike table in two (it grew beyond what a single logical table could handle). We'll combine them again at a later point. Until then, that number is off by a billion or so (1,341,844,240 to be exact).
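
The undercount follows directly from the split: a page that counts rows in only one piece of the table misses every row in the other piece. The minimal sketch below uses Python's built-in sqlite3 module, not the project's actual database, and the table names `spike` and `spike_2` are hypothetical. It shows that the true total is the sum over the pieces.

```python
import sqlite3

def total_spikes(conn: sqlite3.Connection, tables=("spike", "spike_2")) -> int:
    """Sum row counts across every piece of a split table."""
    total = 0
    for name in tables:
        # Table names come from our own fixed tuple, never from user input.
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()
        total += n
    return total

conn = sqlite3.connect(":memory:")
for name in ("spike", "spike_2"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, power REAL)")
conn.executemany("INSERT INTO spike (power) VALUES (?)", [(1.0,), (2.5,), (3.1,)])
conn.executemany("INSERT INTO spike_2 (power) VALUES (?)", [(4.2,), (0.7,)])

print("one table only:", conn.execute("SELECT COUNT(*) FROM spike_2").fetchone()[0])
print("combined:", total_spikes(conn))
# one table only: 2
# combined: 5
```

Here the single-table count is off by exactly the number of rows in `spike`, which mirrors the "off by 1,341,844,240" situation in miniature.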

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 961366
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 961367 - Posted: 6 Jan 2010, 23:58:44 UTC - in response to Message 961366.  

Thanks for the news Matt, glad most things are doing OK.

Did you see my questions in http://setiathome.berkeley.edu/forum_thread.php?id=56160? What happened to Mork a few days ago? Have you eliminated any intermittent hardware problems?
ID: 961367
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 961369 - Posted: 7 Jan 2010, 0:02:46 UTC
Last modified: 7 Jan 2010, 0:03:13 UTC

Right... regarding mork, we still have no clue. I don't think it's power - the system just hangs there in a frozen state until we have to hard reset it. Maybe that's a symptom of a power problem I've never seen before. Anyway, it's an "engineering model" system, so all bets are off, really.

The good news on that front, if anything, is that Jeff and I finally incorporated an IP-enabled power switch, so we can at least fully power cycle the thing from the comfort of our homes next time this happens off hours. It's still no good to have your master user database crash and recover every other week.

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 961369
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 961370 - Posted: 7 Jan 2010, 0:04:06 UTC
Last modified: 7 Jan 2010, 0:51:54 UTC

Thanks for the update Matt,

Claggy

Edit: Strangely enough, I was an avionics technician in the '80s as well :)
ID: 961370
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 961374 - Posted: 7 Jan 2010, 0:18:04 UTC - in response to Message 961369.  

This is just a theory, as I have not seen the system: you might have some AC ripple leaking through onto a DC line due to a filter breaking down. Is it possible to swap out a power unit?

I used to be an avionics tech in the Air Force back in the '80s. I have seen a few weird snags get tracked down to bad power or bad grounding/earthing.

Keith T.
ID: 961374
Luke
Volunteer developer
Joined: 31 Dec 06
Posts: 2546
Credit: 817,560
RAC: 0
New Zealand
Message 961401 - Posted: 7 Jan 2010, 2:39:45 UTC

Great work Matt!

Is there perhaps any chance of an update at the S@H Data Distribution History page?

Looks like the last time it was updated was in August 2008...

- Luke.
ID: 961401
Francis Noel
Joined: 30 Aug 05
Posts: 452
Credit: 142,832,523
RAC: 94
Canada
Message 961442 - Posted: 7 Jan 2010, 4:59:46 UTC

Can't... resist...

<sagan> a billion or so </sagan>

:D
mambo
ID: 961442
Peter Jeremy
Joined: 18 May 99
Posts: 1
Credit: 1,821,011
RAC: 0
Australia
Message 961499 - Posted: 7 Jan 2010, 9:12:08 UTC

one of the drives in thumper's RAID issued some warnings. Last time that happened we got some, well, um... corruption.


This has been suggested before, but why not use ZFS on the Thumper (and other fileservers)? Having an additional end-to-end checksum is very useful when your data volume starts to approach typical disk bit-error rates. You can also run a data verify ("scrub") to ensure that the data you read off the drives matches the expected data.
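
The idea behind the end-to-end checksum suggestion can be sketched in a few lines of Python. Keep a checksum for every block separately from the block itself, verify it on each read, and have a "scrub" walk every block looking for mismatches. This is a toy model of the concept, not how ZFS is implemented (ZFS also keeps checksums in parent block pointers and, given redundancy, repairs a bad copy from a good one). The block size and the SHA-256 choice are illustrative assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

class ChecksummedStore:
    """Toy block store: each block's checksum is kept apart from the block,
    so a read can tell whether the bytes coming back are the bytes written."""

    def __init__(self):
        self.blocks = {}     # block number -> data (stands in for the disks)
        self.checksums = {}  # block number -> checksum (stands in for metadata)

    def write(self, n: int, data: bytes) -> None:
        self.blocks[n] = data
        self.checksums[n] = checksum(data)

    def read(self, n: int) -> bytes:
        data = self.blocks[n]
        if checksum(data) != self.checksums[n]:
            raise OSError(f"checksum mismatch on block {n}")
        return data

    def scrub(self) -> list:
        """Verify every block; return the numbers of the corrupt ones."""
        return [n for n, data in self.blocks.items()
                if checksum(data) != self.checksums[n]]

store = ChecksummedStore()
for n in range(4):
    store.write(n, bytes([n]) * BLOCK_SIZE)

# Simulate a silent bit flip in block 2 that the drive never reported.
flipped = bytearray(store.blocks[2])
flipped[100] ^= 0x01
store.blocks[2] = bytes(flipped)

print(store.scrub())  # [2]
```

A drive can hand back wrong bytes without reporting an error. A check like this catches the problem at read or scrub time instead of after the bad data has been copied elsewhere.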
ID: 961499
DJStarfox
Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 961540 - Posted: 7 Jan 2010, 14:56:57 UTC - in response to Message 961369.  

It's still no good to have your master user database crash and recover every other week.


Maybe you'd be better off switching it to be the secondary instead?
ID: 961540
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 961559 - Posted: 7 Jan 2010, 16:42:27 UTC

Is there any chance whatsoever to run some of these servers as virtual servers, thereby increasing the system's flexibility? It seems possible that one of the VM providers may be interested in the experiment and would provide 'free' software to such a visible project. With any luck it may make Matt's work-experience a little less anxious!
ID: 961559
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 961709 - Posted: 7 Jan 2010, 23:37:27 UTC - in response to Message 961559.  

Is there any chance whatsoever to run some of these servers as virtual servers, thereby increasing the system's flexibility? It seems possible that one of the VM providers may be interested in the experiment and would provide 'free' software to such a visible project. With any luck it may make Matt's work-experience a little less anxious!

Virtual servers are only useful if:

1) The servers are not completely saturated (virtualizing things does take extra CPU and disk resources).
2) There are processes that come and go on an irregular basis.

Neither of these is true in this case.


BOINC WIKI
ID: 961709
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 961811 - Posted: 8 Jan 2010, 5:33:15 UTC - in response to Message 961709.  

I was thinking of the capability of migrating the VM between hardware platforms. Isn't that a useful property, as well?
ID: 961811
speedimic
Volunteer tester
Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 961903 - Posted: 8 Jan 2010, 13:46:57 UTC - in response to Message 961811.  

I was thinking of the capability of migrating the VM between hardware platforms. Isn't that a useful property, as well?

With all the servers running at full load, where could you migrate the VM to?


mic.


ID: 961903



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.