Crashy (Feb 21 2013)

Message boards : Technical News : Crashy (Feb 21 2013)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1441
Credit: 213,689
RAC: 0
United States
Message 1340001 - Posted: 21 Feb 2013, 20:34:01 UTC

I already posted this on the front page, but FYI there's going to be another lab-wide power outage all weekend, during which all our servers will be unreachable. Hopefully this is the last of this sort of thing, and/or we relocate to the colocation facility before it happens again.

Meanwhile, we've hit a few bumps in the road. I don't think anything dire is happening beyond the normal, expected drive failures and kernel hangs, but they've been causing cascading failures on the public-facing servers thanks to the web of dependencies between machines. It may seem bad, but everything is more or less okay. I think. I continue to aggressively upgrade and prepare for the probable impending move to the colocation facility, so maybe I'm exercising some lingering, forgotten hardware and configuration issues.

That's all I have to report for now, tech-wise. Behind-the-scenes development has been largely focused on getting a new polyphase filter bank splitter into production. The current splitter has standard, known FFT artifacts that cause dips in sensitivity at the edges of workunits and rolloffs at the edges of the whole 2.5 MHz band, but the new splitter will create workunits with more even sensitivity across the whole spectrum, as well as more sensitivity in general for finding signals in the noise. We are also turning corners on (finally) getting the NTPCkr back into regular production.
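To make the "known FFT artifacts" point a little more concrete, here is a minimal Python sketch of one well-known example, scalloping loss. It is not the SETI@home splitter code. It assumes a band is cut into channels by a plain FFT with no window (a rectangular window): a steady tone sitting exactly on a channel centre comes through at full strength, but a tone halfway between two channel centres, i.e. at a channel edge, is attenuated to about 2/π of that amplitude. The 256-point size, the chosen bin, and the function names are arbitrary illustration choices; the polyphase filter bank itself is not shown.

```python
import cmath
import math


def dft_bin_magnitude(signal, k):
    """Magnitude of DFT bin k, computed with a direct O(N) sum."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal))
    return abs(acc)


def tone(n, freq_bins):
    """n samples of a complex tone whose frequency is given in units of FFT bins."""
    return [cmath.exp(2j * math.pi * freq_bins * i / n) for i in range(n)]


def scalloping_gain(n, offset):
    """Amplitude seen in the nearest bin for a tone `offset` bins away from
    that bin's centre, relative to a tone exactly on the centre
    (rectangular window, i.e. a plain unwindowed FFT)."""
    k = n // 4  # arbitrary bin well away from DC
    on_centre = dft_bin_magnitude(tone(n, k), k)
    off_centre = dft_bin_magnitude(tone(n, k + offset), k)
    return off_centre / on_centre


for offset in (0.0, 0.25, 0.5):
    g = scalloping_gain(256, offset)
    print(f"offset {offset:4.2f} bin: gain {g:.3f} ({20 * math.log10(g):6.2f} dB)")
```

At a half-bin offset the printed gain comes out at about 0.64, roughly a 3.9 dB loss, which is the kind of edge-of-channel dip a filter bank design aims to flatten.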

- Matt


-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

ID: 1340001
ML1
Volunteer tester
Joined: 25 Nov 01
Posts: 9201
Credit: 5,925,310
RAC: 1,861
United Kingdom
Message 1340005 - Posted: 21 Feb 2013, 20:37:19 UTC
Last modified: 21 Feb 2013, 20:39:43 UTC

What is it with all the power loss?...


Can you conscript a few students to pedal like fury to drive a few dynamos to keep the closet powered?...

;-)


And very good news for the improved data splitting for analysis.

Hang in there!

Keep crunchin',
Martin


See new freedom: Mageia5
See & try out for yourself: Linux Voice
The Future is what We all make IT (GPLv3)

ID: 1340005
Claggy
Project Donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4622
Credit: 46,334,083
RAC: 3,000
United Kingdom
Message 1340008 - Posted: 21 Feb 2013, 20:48:17 UTC - in response to Message 1340001.

Thanks for the update, Matt,

Claggy

ID: 1340008
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1441
Credit: 213,689
RAC: 0
United States
Message 1340029 - Posted: 21 Feb 2013, 21:35:25 UTC
Last modified: 21 Feb 2013, 21:35:36 UTC

...By the way, I'm fully aware that we are about to temporarily run out of workunits to send. I'm just waiting for a RAID resync to finish (in about 90 minutes) before firing up the splitters again.

- Matt



ID: 1340029
Chris S
Crowdfunding Project Donor
Volunteer tester
Joined: 19 Nov 00
Posts: 38176
Credit: 21,218,844
RAC: 27,678
United Kingdom
Message 1340041 - Posted: 21 Feb 2013, 22:15:40 UTC
Last modified: 21 Feb 2013, 22:16:25 UTC

Matt, as always many thanks for the heads up, we really do appreciate it. The news about Nitpicker is particularly welcome :-)

ID: 1340041
Dirk Sadowski
Volunteer tester
Joined: 6 Apr 07
Posts: 7066
Credit: 100,834,856
RAC: 56,913
Germany
Message 1340044 - Posted: 21 Feb 2013, 22:36:31 UTC

Matt, thanks for the news!


* Best regards! :-) * Philip J. Fry formerly Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. *


ID: 1340044
David S
Project Donor
Volunteer tester
Joined: 4 Oct 99
Posts: 17034
Credit: 20,913,156
RAC: 5,975
United States
Message 1340206 - Posted: 22 Feb 2013, 14:11:16 UTC - in response to Message 1340029.

Thanks, especially, for explaining that.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


ID: 1340206
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45858
Credit: 814,537,051
RAC: 121,835
United States
Message 1340208 - Posted: 22 Feb 2013, 14:17:02 UTC

Thank you, Matt.
As usual, your insights into what goes on behind the scenes help many of us understand that more goes on than just server crashes...LOL.
You are a very much appreciated part of the Seti lab team. (Not to slight the others, of course.)


Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

ID: 1340208
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11136
Credit: 83,478,328
RAC: 40,882
United Kingdom
Message 1341610 - Posted: 28 Feb 2013, 12:22:24 UTC - in response to Message 1340001.
Last modified: 28 Feb 2013, 12:28:33 UTC

Matt,

It looks as if in compiling the new splitters with

  <splitter_cfg>
    ...
    <pfb_ntaps>0</pfb_ntaps>
    <pfb_width_factor>0</pfb_width_factor>
  </splitter_cfg>

you've picked up the faulty library code which puts a 64-bit representation of the receiver ID into the WU name. Eric confirmed that this was harmless when we first questioned the long names at Beta, but the huge long names do look a bit ugly in BOINC Manager and task lists.

Edit - sorry, not a library problem, but a g++ type conversion problem between enum() and integer - I didn't read on to message 44106.

ID: 1341610
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11136
Credit: 83,478,328
RAC: 40,882
United Kingdom
Message 1341685 - Posted: 28 Feb 2013, 17:11:02 UTC

And another left-over, perhaps from the crash or power-down.

People are reporting that the stats dump for external stats sites, http://setiathome.berkeley.edu/stats/, hasn't been updated since 25 February. Could you kick the daemon, please?

ID: 1341685
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1441
Credit: 213,689
RAC: 0
United States
Message 1341702 - Posted: 28 Feb 2013, 18:11:47 UTC - in response to Message 1341685.

Ah! Thanks. I just fixed that (the scripts were trying to run an older binary that doesn't work after the OS upgrade). Should be back to normal again within the next 24 hours...

- Matt

ID: 1341702
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45858
Credit: 814,537,051
RAC: 121,835
United States
Message 1341703 - Posted: 28 Feb 2013, 18:13:33 UTC - in response to Message 1341702.

Thanks for the fix, Matt.
If you happen to see Eric, you can tell him to ignore the email I sent him on the subject a while ago.

ID: 1341703
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29557
Credit: 49,007,960
RAC: 16,933
Germany
Message 1341777 - Posted: 28 Feb 2013, 22:19:39 UTC

Thanks Matt.


With each crime and every kindness we birth our future.

ID: 1341777
Thomas
Volunteer tester
Joined: 9 Dec 11
Posts: 1499
Credit: 1,345,576
RAC: 0
France
Message 1343198 - Posted: 5 Mar 2013, 7:11:22 UTC

Thanks for the heads-up, Matt!
Good news about this new splitter! :)


ID: 1343198
soft^spirit
Joined: 18 May 99
Posts: 6438
Credit: 31,801,139
RAC: 6,403
United States
Message 1343948 - Posted: 8 Mar 2013, 3:42:49 UTC
Last modified: 8 Mar 2013, 3:43:06 UTC

The suggested registry entry seems to be improving throughput in spite of the eternal bandwidth log jam. I would be very curious whether the amount of work accomplished increases as people make this edit, and whether any negative side effects are noted on the server end.



Janice

ID: 1343948
David Mueller
Volunteer tester
Joined: 8 Feb 01
Posts: 5
Credit: 18,250,384
RAC: 5,447
Canada
Message 1344314 - Posted: 9 Mar 2013, 2:49:55 UTC

Not sure if you can help, but since the scheduled shutdown I have been having trouble downloading AP files. I have tried different connections to the Internet.
Normally I can download AP files at a good speed, greater than 25, but now they keep downloading slowly, at less than 10.
I have even rebooted my system and the SETI app.
Thanks
Dave


ID: 1344314
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1344325 - Posted: 9 Mar 2013, 3:16:47 UTC - in response to Message 1344314.

You need to read this thread.


ID: 1344325
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11136
Credit: 83,478,328
RAC: 40,882
United Kingdom
Message 1345372 - Posted: 11 Mar 2013, 16:30:31 UTC

@ Matt,

I'm occasionally seeing

Data Distribution State                     SETI@home #   Astropulse #   As of*
Results ready to send                       315,972       0              -1m
Current result creation rate                41.2241/sec   0.0441/sec     5m
Results out in the field                    3,584,585     150,826        -1m
Results received in last hour               98,472        1,194          0m
Result turnaround time (last hour average)  29.95 hours   149.99 hours   0m
Results returned and awaiting validation    2,819,506     176,961        -1m
Workunits waiting for validation            52            3              -1m
Workunits waiting for assimilation          84            49             -1m
Workunit files waiting for deletion         32            0              -1m
Result files waiting for deletion           208           0              -1m
Workunits waiting for db purging            880,516       23,465         -1m
Results waiting for db purging              1,850,801     63,477         -1m

on the server status page.

Could those "As of minus one minute" lines indicate that one or more servers have lost their NTP syncs again?
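Here is a small sketch of why unsynchronised clocks could produce a negative "As of" age. It rests on an assumption, not on the status page's actual code: that each "As of" value is the difference between the page's render time, taken from the web server's clock, and the time the figures were recorded, taken from the clock of whichever server produced them. The function name `age_label` and the floor-to-whole-minutes rounding are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta, timezone


def age_label(recorded_at: datetime, rendered_at: datetime) -> str:
    """Hypothetical 'As of' label: whole minutes between when a figure was
    recorded (producer's clock) and when the page was rendered (web server's
    clock). timedelta // timedelta floors toward minus infinity."""
    minutes = (rendered_at - recorded_at) // timedelta(minutes=1)
    return f"{minutes}m"


# The web server's idea of "now" when it renders the status page.
web_now = datetime(2013, 3, 11, 16, 30, tzinfo=timezone.utc)

# A producer with a correct clock that wrote its figures five minutes ago.
print(age_label(web_now - timedelta(minutes=5), web_now))  # 5m

# A producer whose clock is 60 s fast and that wrote its figures just now
# (by its own clock): the recorded time lies in the web server's future.
print(age_label(web_now + timedelta(seconds=60), web_now))  # -1m
```

Because both timestamps are timezone-aware UTC values here, a daylight-saving change does not enter this calculation; only the offset between the two clocks does.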

ID: 1345372
rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 13295
Credit: 154,075,904
RAC: 111,951
United Kingdom
Message 1345400 - Posted: 11 Mar 2013, 17:12:43 UTC

daylight saving strikes again....


Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

ID: 1345400
Darth Beaver
Joined: 20 Aug 99
Posts: 6357
Credit: 15,592,889
RAC: 1,204
Australia
Message 1345639 - Posted: 12 Mar 2013, 0:48:21 UTC - in response to Message 1345400.

Hello, sorry about this, guys, but I'm having trouble with Einstein@home: I can't upload units, and I can't get to the web pages or message boards. And yes, I know this is not Einstein, but does anybody out there know what has happened to Einstein@home and how long the problem will take to fix? Thanks. I know I'm a pain in the ass.



ID: 1345639
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.