Crashy (Feb 21 2013)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
I already posted this on the front page, but FYI there's going to be another lab-wide power outage all weekend, during which all our servers will be unreachable. Hopefully this is the last of this sort of thing, and/or we relocate to the colocation facility before it happens again. Meanwhile, we've hit a few bumps in the road. I don't think anything dire is happening outside of normal, expected drive failures and kernel hangs. But it's been causing cascading failures on the public facing servers thanks to the web of dependencies each machine has on another. It may seem bad, but everything is more or less okay. I think. I continue to aggressively upgrade and prepare for the impending probable move to the colocation facility, so maybe I'm exercising some lingering, forgotten hardware and configuration issues. That's all I have to report for now, tech-wise. Behind the scenes development has been largely focused on getting a new polyphase filter bank splitter into production. The current splitter has standard, known FFT artifacts causing dips in sensitivity at the edges of workunits and rolloffs at the edges of the whole 2.5 MHz band, but this new splitter will create workunits that exhibit more even sensitivity across the whole spectrum, as well as more sensitivity in general to find signals in the noise. We also are turning corners on (finally) getting the NTPCkr back into regular production. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
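For readers wondering what a polyphase filter bank changes compared with a plain FFT, here is a minimal sketch in plain Python. It is not the project's splitter code: the Hamming-weighted sinc prototype filter, the function names, and the parameters `n_channels`, `n_taps` and `tone` are all assumptions chosen for illustration. It feeds both methods a tone that sits between two channel centres and measures how much power ends up outside the strongest channel and its two neighbours.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform; O(n^2) but fine for a small demo."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def prototype_filter(n_channels, n_taps):
    """Hamming-weighted sinc low-pass filter spanning n_taps blocks (assumed design)."""
    length = n_channels * n_taps
    centre = (length - 1) / 2
    coeffs = []
    for i in range(length):
        x = (i - centre) / n_channels  # offset from the centre, in blocks
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        coeffs.append(sinc * hamming)
    return coeffs


def pfb_spectrum(samples, n_channels, n_taps):
    """Weight n_taps consecutive blocks, sum them into one block, then transform."""
    weights = prototype_filter(n_channels, n_taps)
    folded = [
        sum(samples[t * n_channels + c] * weights[t * n_channels + c]
            for t in range(n_taps))
        for c in range(n_channels)
    ]
    return dft(folded)


def leakage(spectrum):
    """Fraction of power outside the strongest channel and its two neighbours."""
    power = [abs(v) ** 2 for v in spectrum]
    n = len(power)
    peak = power.index(max(power))
    kept = sum(power[(peak + d) % n] for d in (-1, 0, 1))
    return 1.0 - kept / sum(power)


def demo(n_channels=16, n_taps=8, tone=5.3):
    """Return (plain FFT leakage, PFB leakage) for an off-centre complex tone."""
    total = n_channels * n_taps
    samples = [cmath.exp(2j * math.pi * tone * t / n_channels) for t in range(total)]
    plain = leakage(dft(samples[:n_channels]))
    pfb = leakage(pfb_spectrum(samples, n_channels, n_taps))
    return plain, pfb


if __name__ == "__main__":
    plain, pfb = demo()
    print(f"plain FFT leakage: {plain:.4f}")
    print(f"PFB leakage:       {pfb:.6f}")
```

Under these assumptions the filter bank should leave far less power in distant channels than the single-block FFT, which is the general effect behind the more even sensitivity described in the post above. A production splitter would use an optimized FFT and its own filter design.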
ML1 Joined: 25 Nov 01 Posts: 21019 Credit: 7,508,002 RAC: 20 |
What is it with all the power loss?... Can you conscript a few students to pedal like fury to drive a few dynamos to keep the closet powered?... ;-) And very good news for the improved data splitting for analysis. Hang in there! Keep crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
Thanks for the update Matt, Claggy |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
...By the way I'm fully aware we are about to temporarily run out of workunits to send. I'm just waiting for a RAID resync to finish (in about 90 minutes) before firing up the splitters again. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Dirk Sadowski Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
Matt, thanks for the news! * Best regards! :-) * Philip J. Fry formerly Sutaru Tsureku, team seti.international founder. * Optimize your PC for higher RAC. * SETI@home needs your help. * |
David S Joined: 4 Oct 99 Posts: 18352 Credit: 27,761,924 RAC: 12 |
"...By the way I'm fully aware we are about to temporarily run out of workunits to send. I'm just waiting for a RAID resync to finish (in about 90 minutes) before firing up the splitters again." Thanks, especially, for explaining that. David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
Thank you, Matt. As usual, your insights into what goes on behind the scenes help many of us to understand what goes on besides just server crashes...LOL. You are a very much appreciated part of the Seti lab team. (Not to slight the others, of course.) "Time is simply the mechanism that keeps everything from happening all at once." |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
"Behind the scenes development has been largely focused on getting a new polyphase filter bank splitter into production." Matt, It looks as if in compiling the new splitters with `<splitter_cfg> ... <pfb_ntaps>0</pfb_ntaps> <pfb_width_factor>0</pfb_width_factor> </splitter_cfg>` you've picked up the faulty library code which puts a 64-bit representation of the receiver ID into the WU name. Eric confirmed that this was harmless when we first questioned the long names at Beta, but the huge long names do look a bit ugly in BOINC Manager and task lists. Edit - sorry, not a library problem, but a g++ type conversion problem between enum() and integer - I didn't read on to message 44106. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
And another left-over, perhaps from the crash or power-down. People are reporting that the stats dump for external stats sites, http://setiathome.berkeley.edu/stats/, hasn't been updated since 25 February. Could you kick the daemon, please? |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
"And another left-over, perhaps from the crash or power-down." Ah! Thanks. I just fixed that (the scripts were trying to run an older binary that doesn't work after the OS upgrade). Should be back to normal again within the next 24 hours... - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
"And another left-over, perhaps from the crash or power-down." Thanks for the fix, Matt. If you happen to see Eric, you can tell him to ignore the email I sent him on the subject a while ago. "Time is simply the mechanism that keeps everything from happening all at once." |
Mike Joined: 17 Feb 01 Posts: 34353 Credit: 79,922,639 RAC: 80 |
Thanks Matt. With each crime and every kindness we birth our future. |
Thomas Joined: 9 Dec 11 Posts: 1499 Credit: 1,345,576 RAC: 0 |
Thanks for the heads-up Matt ! Good news for this new splitter ! :) |
soft^spirit Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
The suggested registry entry seems to be improving throughput in spite of the eternal log jam of the bandwidth. I would be very curious if amount of work accomplished increases as people make this edit, or if any negative side effects on the server end are noted. Janice |
David Mueller Joined: 8 Feb 01 Posts: 5 Credit: 21,378,404 RAC: 16 |
Not sure if you can help. I have been trying to download AP files since the scheduled shutdown, and I have tried different connections to the internet. Normally I can download AP files at a good speed, greater than 25, but they just continue to download slowly, at less than 10. I even rebooted my system and the SETI app. Thanks, Dave |
Gatekeeper Joined: 14 Jul 04 Posts: 887 Credit: 176,479,616 RAC: 0 |
You need to read this thread. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
@ Matt, I'm occasionally seeing

Data Distribution

| State | SETI@home # | Astropulse # | As of* |
|---|---|---|---|
| Results ready to send | 315,972 | 0 | -1m |
| Current result creation rate | 41.2241/sec | 0.0441/sec | 5m |
| Results out in the field | 3,584,585 | 150,826 | -1m |
| Results received in last hour | 98,472 | 1,194 | 0m |
| Result turnaround time (last hour average) | 29.95 hours | 149.99 hours | 0m |
| Results returned and awaiting validation | 2,819,506 | 176,961 | -1m |
| Workunits waiting for validation | 52 | 3 | -1m |
| Workunits waiting for assimilation | 84 | 49 | -1m |
| Workunit files waiting for deletion | 32 | 0 | -1m |
| Result files waiting for deletion | 208 | 0 | -1m |
| Workunits waiting for db purging | 880,516 | 23,465 | -1m |
| Results waiting for db purging | 1,850,801 | 63,477 | -1m |

on the server status page. Could those "As of minus one minute" lines indicate that one or more servers have lost their NTP syncs again? |
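A negative "As of" value is what one would expect if the age is computed as page build time minus sample time and the two timestamps come from clocks that disagree. How the status page actually computes this column is not stated in the thread, so the sketch below is only an assumption, and the names `as_of_minutes` and `as_of_label` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def as_of_minutes(sample_time, page_time):
    """Age of a sample in whole minutes; negative if the sample appears to be from the future."""
    return round((page_time - sample_time).total_seconds() / 60)


def as_of_label(minutes):
    """Format the age the way the status table shows it, for example '5m' or '-1m'."""
    return f"{minutes}m"


if __name__ == "__main__":
    page_time = datetime(2013, 3, 10, 12, 0, tzinfo=timezone.utc)
    # A host whose clock runs one minute fast stamps its sample "in the future".
    fast_host_sample = page_time + timedelta(minutes=1)
    # A host with a correct clock that sampled five minutes earlier.
    good_host_sample = page_time - timedelta(minutes=5)
    print(as_of_label(as_of_minutes(fast_host_sample, page_time)))
    print(as_of_label(as_of_minutes(good_host_sample, page_time)))
```

Under this assumption, a clock that is one minute ahead produces the "-1m" entries, whichever cause (a lost NTP sync or a time-change glitch) pushed the clock out of step.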
rob smith Joined: 7 Mar 03 Posts: 22456 Credit: 416,307,556 RAC: 380 |
daylight saving strikes again.... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Darth Beaver Joined: 20 Aug 99 Posts: 6728 Credit: 21,443,075 RAC: 3 |
Hello, sorry about this guys, but I'm having trouble with Einstein@home: I can't upload units and can't get to the web pages or message boards. Yes, I know this is not Einstein, but does anybody out there know what has happened to Einstein@home and how long the problem will take to fix? Thanks, I know I'm a pain in the ass |
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
We are all in the same boat. The E@H site is out. I have no idea why, but today I saw something about some trouble with the replica server. Let's see tomorrow. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.