Crashy (Feb 21 2013)


Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1340001 - Posted: 21 Feb 2013, 20:34:01 UTC

I already posted this on the front page, but FYI there's going to be another lab-wide power outage all weekend, during which all our servers will be unreachable. Hopefully this is the last of this sort of thing, and/or we relocate to the colocation facility before it happens again.

Meanwhile, we've hit a few bumps in the road. I don't think anything dire is happening outside of normal, expected drive failures and kernel hangs. But it's been causing cascading failures on the public facing servers thanks to the web of dependencies each machine has on another. It may seem bad, but everything is more or less okay. I think. I continue to aggressively upgrade and prepare for the impending probable move to the colocation facility, so maybe I'm exercising some lingering, forgotten hardware and configuration issues.

That's all I have to report for now, tech-wise. Behind the scenes development has been largely focused on getting a new polyphase filter bank splitter into production. The current splitter has standard, known FFT artifacts that cause dips in sensitivity at the edges of workunits and rolloffs at the edges of the whole 2.5 MHz band. The new splitter will create workunits that exhibit more even sensitivity across the whole spectrum, as well as more sensitivity in general for finding signals in the noise. We also are turning corners on (finally) getting the NTPCkr back into regular production.
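
As a rough illustration of why a polyphase filter bank evens out the response, here is a minimal pure-Python sketch. It is not the project's splitter code: the function names, the Hamming-windowed sinc prototype filter, and the 4-tap, 16-channel sizes are all assumptions chosen for the example. It compares how much a tone lying halfway between two channels leaks into a distant channel under a plain DFT and under the filter bank.

```python
import cmath
import math

NCHAN = 16   # number of output channels (assumed for the example)
NTAPS = 4    # filter taps per channel (assumed for the example)


def dft(x):
    """Direct N-point discrete Fourier transform (slow but dependency-free)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def prototype_filter(ntaps, nchan):
    """Hamming-windowed sinc low-pass filter, ntaps * nchan coefficients long."""
    length = ntaps * nchan
    coeffs = []
    for i in range(length):
        t = (i - (length - 1) / 2) / nchan
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        coeffs.append(sinc * hamming)
    return coeffs


def pfb_spectrum(x, ntaps, nchan):
    """Weight ntaps * nchan samples, fold them into nchan sums, then DFT."""
    h = prototype_filter(ntaps, nchan)
    weighted = [xi * hi for xi, hi in zip(x, h)]
    folded = [sum(weighted[t * nchan + c] for t in range(ntaps))
              for c in range(nchan)]
    return dft(folded)


def leakage(spectrum, far_bin):
    """Amplitude in a distant bin relative to the strongest bin."""
    peak = max(abs(v) for v in spectrum)
    return abs(spectrum[far_bin]) / peak


# A complex test tone halfway between channels 3 and 4.
tone = [cmath.exp(2j * math.pi * 3.5 * n / NCHAN) for n in range(NTAPS * NCHAN)]

fft_leak = leakage(dft(tone[:NCHAN]), far_bin=10)
pfb_leak = leakage(pfb_spectrum(tone, NTAPS, NCHAN), far_bin=10)
print(f"plain DFT leakage into channel 10:   {fft_leak:.4f}")
print(f"filter bank leakage into channel 10: {pfb_leak:.6f}")
```

The filter bank figure should come out much smaller than the plain DFT figure. The price is that each output spectrum reads NTAPS times as many input samples.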

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 8499
Credit: 4,194,256
RAC: 1,721
United Kingdom
Message 1340005 - Posted: 21 Feb 2013, 20:37:19 UTC
Last modified: 21 Feb 2013, 20:39:43 UTC

What is it with all the power loss?...


Can you conscript a few students to pedal like fury to drive a few dynamos to keep the closet powered?...

;-)


And very good news for the improved data splitting for analysis.

Hang in there!

Keep crunchin',
Martin
____________
See new freedom: Mageia4
Linux Voice See & try out your OS Freedom!
The Future is what We make IT (GPLv3)

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4141
Credit: 33,622,701
RAC: 27,875
United Kingdom
Message 1340008 - Posted: 21 Feb 2013, 20:48:17 UTC - in response to Message 1340001.

Thanks for the update Matt,

Claggy

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1340029 - Posted: 21 Feb 2013, 21:35:25 UTC
Last modified: 21 Feb 2013, 21:35:36 UTC

...By the way I'm fully aware we are about to temporarily run out of workunits to send. I'm just waiting for a RAID resync to finish (in about 90 minutes) before firing up the splitters again.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Chris S
Project donor
Volunteer tester
Joined: 19 Nov 00
Posts: 32104
Credit: 13,793,121
RAC: 25,029
United Kingdom
Message 1340041 - Posted: 21 Feb 2013, 22:15:40 UTC
Last modified: 21 Feb 2013, 22:16:25 UTC

We also are turning corners on (finally) getting the NTPCkr back into regular production.

Matt, as always many thanks for the heads up, we really do appreciate it. The news about Nitpicker is particularly welcome :-)

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7101
Credit: 60,934,513
RAC: 17,226
Germany
Message 1340044 - Posted: 21 Feb 2013, 22:36:31 UTC

Matt, thanks for the news!


* Best regards! :-) * Philip J. Fry formerly Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

N9JFE David S
Project donor
Volunteer tester
Joined: 4 Oct 99
Posts: 11992
Credit: 14,657,963
RAC: 12,195
United States
Message 1340206 - Posted: 22 Feb 2013, 14:11:16 UTC - in response to Message 1340029.

...By the way I'm fully aware we are about to temporarily run out of workunits to send. I'm just waiting for a RAID resync to finish (in about 90 minutes) before firing up the splitters again.

- Matt

Thanks, especially, for explaining that.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8634
Credit: 51,604,535
RAC: 48,754
United Kingdom
Message 1341610 - Posted: 28 Feb 2013, 12:22:24 UTC - in response to Message 1340001.
Last modified: 28 Feb 2013, 12:28:33 UTC

Behind the scenes development has been largely focused on getting a new polyphase filter bank splitter into production.

Matt,

It looks as if in compiling the new splitters with

<splitter_cfg>
  ...
  <pfb_ntaps>0</pfb_ntaps>
  <pfb_width_factor>0</pfb_width_factor>
</splitter_cfg>

you've picked up the faulty library code which puts a 64-bit representation of the receiver ID into the WU name. Eric confirmed that this was harmless when we first questioned the long names at Beta, but the huge long names do look a bit ugly in BOINC Manager and task lists.

Edit - sorry, not a library problem, but a g++ type conversion problem between enum() and integer - I didn't read on to message 44106.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8634
Credit: 51,604,535
RAC: 48,754
United Kingdom
Message 1341685 - Posted: 28 Feb 2013, 17:11:02 UTC

And another left-over, perhaps from the crash or power-down.

People are reporting that the stats dump for external stats sites, http://setiathome.berkeley.edu/stats/, hasn't been updated since 25 February. Could you kick the daemon, please?

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 1341702 - Posted: 28 Feb 2013, 18:11:47 UTC - in response to Message 1341685.

And another left-over, perhaps from the crash or power-down.

People are reporting that the stats dump for external stats sites, http://setiathome.berkeley.edu/stats/, hasn't been updated since 25 February. Could you kick the daemon, please?


Ah! Thanks. I just fixed that (the scripts were trying to run an older binary that doesn't work after the OS upgrade). Should be back to normal again within the next 24 hours...

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 24551
Credit: 33,888,733
RAC: 24,414
Germany
Message 1341777 - Posted: 28 Feb 2013, 22:19:39 UTC

Thanks Matt.


{BDC} Thomas Dupont
Project donor
Volunteer tester
Joined: 9 Dec 11
Posts: 3905
Credit: 1,325,539
RAC: 202
France
Message 1343198 - Posted: 5 Mar 2013, 7:11:22 UTC

Thanks for the heads-up Matt !
Good news for this new splitter ! :)
____________
Founder of french team BRIGADE DU COSMOS
Ranked 55th worldwide

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,148
RAC: 2
United States
Message 1343948 - Posted: 8 Mar 2013, 3:42:49 UTC
Last modified: 8 Mar 2013, 3:43:06 UTC

The suggested registry entry seems to be improving throughput in spite of the eternal log jam of the bandwidth. I would be very curious whether the amount of work accomplished increases as people make this edit, or whether any negative side effects on the server end are noted.
____________

Janice

David Mueller
Volunteer tester
Joined: 8 Feb 01
Posts: 5
Credit: 12,690,872
RAC: 9,720
Canada
Message 1344314 - Posted: 9 Mar 2013, 2:49:55 UTC

Not sure if you can help. I have been trying to download AP files since the scheduled shutdown. I have tried different connections to the internet.
Normally I can download AP files at a good speed, greater than 25; now they just continue to download slowly, at less than 10.
I even rebooted my system and the SETI app.
Thanks
Dave

Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1344325 - Posted: 9 Mar 2013, 3:16:47 UTC - in response to Message 1344314.

You need to read this thread.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8634
Credit: 51,604,535
RAC: 48,754
United Kingdom
Message 1345372 - Posted: 11 Mar 2013, 16:30:31 UTC

@ Matt,

I'm occasionally seeing

Data Distribution State                        SETI@home #    Astropulse #    As of*
Results ready to send                          315,972        0               -1m
Current result creation rate                   41.2241/sec    0.0441/sec      5m
Results out in the field                       3,584,585      150,826         -1m
Results received in last hour                  98,472         1,194           0m
Result turnaround time (last hour average)     29.95 hours    149.99 hours    0m
Results returned and awaiting validation       2,819,506      176,961         -1m
Workunits waiting for validation               52             3               -1m
Workunits waiting for assimilation             84             49              -1m
Workunit files waiting for deletion            32             0               -1m
Result files waiting for deletion              208            0               -1m
Workunits waiting for db purging               880,516        23,465          -1m
Results waiting for db purging                 1,850,801      63,477          -1m

on the server status page.

Could those "As of minus one minute" lines indicate that one or more servers have lost their NTP syncs again?
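
One way to read those negative ages: if the status page subtracts each sample's timestamp from the web server's own clock, a sample stamped by a machine whose clock runs ahead comes out with a negative age. The sketch below only shows that arithmetic. The function names, the example timestamps, and the 30-second tolerance are assumptions for illustration, not the actual status page code.

```python
from datetime import datetime, timedelta, timezone


def age_minutes(sampled_at, rendered_at):
    """Age of a sample in minutes; negative if the sample is stamped after the render time."""
    return round((rendered_at - sampled_at).total_seconds() / 60)


def clock_runs_ahead(sampled_at, rendered_at, tolerance=timedelta(seconds=30)):
    """True if the sampling machine's clock appears to be ahead of the web server's."""
    return sampled_at - rendered_at > tolerance


# Hypothetical render time and sample timestamps, all in UTC.
rendered = datetime(2013, 3, 11, 16, 30, tzinfo=timezone.utc)
samples = {
    "Results ready to send": rendered + timedelta(minutes=1),
    "Current result creation rate": rendered - timedelta(minutes=5),
}

for name, stamp in samples.items():
    note = "clock skew suspected" if clock_runs_ahead(stamp, rendered) else "ok"
    print(f"{name}: {age_minutes(stamp, rendered)}m ({note})")
```

A check like this only says that two clocks disagree; it cannot tell which of the two machines has drifted.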

rob smith
Project donor
Volunteer tester
Joined: 7 Mar 03
Posts: 8535
Credit: 59,457,217
RAC: 86,314
United Kingdom
Message 1345400 - Posted: 11 Mar 2013, 17:12:43 UTC

daylight saving strikes again....
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Glenn savill
Joined: 20 Aug 99
Posts: 2598
Credit: 3,814,257
RAC: 23,026
Australia
Message 1345639 - Posted: 12 Mar 2013, 0:48:21 UTC - in response to Message 1345400.

Hello, sorry about this guys, but I'm having trouble with Einstein@Home: I can't upload units and can't get to the web pages or message boards. Yes, I know this is not Einstein, but does anybody out there know what has happened to Einstein@Home and how long the problem will take to fix? Thanks. I know I'm a pain in the ass.


Copyright © 2014 University of California