Message boards :
Technical News :
Weirder (Sep 06 2007)
Author | Message |
---|---|
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Guess what? A *second* drive on thumper failed this morning, around the same time the other drive failed yesterday. This system is on service, so we should get some replacements soon. But there are no obvious signs of why these two failed so close in succession. They were both on the same drive controller, but there's a 15% chance of that happening at random. The temperatures all look sane. In better news, we got to the bottom of the weird splitter sequence number problems I spotted yesterday. Now we understand what happened and why this really isn't a problem at all. Basically, data that was meant to be tacked on the tail of one raw data file ended up at the start of the next file instead. No biggie. As far as those overflow workunits taking forever... Jeff and Eric wrote some code (and checked it twice) to scour the database for such workunits and "cancel" them. Immediately we saw our pipelines flood with requests for new work, so expect some delays for a while. We hope to eventually give credit to those who got stuck with these troubled workunits. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14674 Credit: 200,643,578 RAC: 874 |
Excellent news - thanks for getting rid of the "Splitsville Evercrunch Specials" - it'll save us a lot of explanation time in Number Crunching. And to be offered credit as well - more than we ever asked for. I've linked the news into NC here. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
Hear, Hear - Great News from Berkeley - Nice Going All . . . |
Rev. Tim Olivera Joined: 15 Jan 06 Posts: 20 Credit: 1,717,714 RAC: 0 |
I said I was going to keep bitching as long as BOINC was not doing anything!! Over the last 9 days BOINC has been able to log on to the SETI server once to get work. My systems are Intel dual 3.6 GHz systems with 2 GB of RAM, so it takes them about 12 hours to crunch a download's worth of work. They have not been able to log back on to the server and get any more work! What follows is the message:
9/7/2007 8:30:34 AM|SETI@home|Deferring communication for 1 min 0 sec
9/7/2007 8:30:34 AM|SETI@home|Reason: no work from project
9/7/2007 8:31:34 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:31:34 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:31:39 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:31:39 AM|SETI@home|Deferring communication for 1 min 0 sec
9/7/2007 8:31:39 AM|SETI@home|Reason: no work from project
9/7/2007 8:32:39 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:32:39 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:32:44 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:32:44 AM|SETI@home|Deferring communication for 1 min 2 sec
9/7/2007 8:32:44 AM|SETI@home|Reason: no work from project
9/7/2007 8:33:49 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:33:49 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:33:54 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:33:54 AM|SETI@home|Deferring communication for 3 min 23 sec
9/7/2007 8:33:54 AM|SETI@home|Reason: no work from project
9/7/2007 8:37:19 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:37:19 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:37:24 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:37:24 AM|SETI@home|Deferring communication for 12 min 26 sec
9/7/2007 8:37:24 AM|SETI@home|Reason: no work from project
9/7/2007 8:46:59 AM||Suspending computation - user is active
Anyone else not getting anything from the SETI server?? Are we just wasting our time leaving our systems running?? Tim Olivera |
Profi Joined: 8 Dec 00 Posts: 19 Credit: 20,552,123 RAC: 0 |
"Guess what? A *second* drive on thumper failed this morning, around the same time the other drive failed yesterday. This system is on service, so we should get some replacements soon. But there are no obvious signs of why these two failed so close in succession. They were both on the same drive controller, but there's a 15% chance of that happening at random. The temperatures all look sane." Thanks, Matt, for the update... Anyway - from the brief analysis - SETI@home has an approx. 1 week MTBF.... -Profi |
Terror Australis Joined: 14 Feb 04 Posts: 1817 Credit: 262,693,308 RAC: 44 |
My 6 computers have received nothing since 0012Z and they've been pinging their little heads off. Looks like things went kaput about two hours after Matt's original post on this thread. Regards Brodo |
ML1 Joined: 25 Nov 01 Posts: 21009 Credit: 7,508,002 RAC: 20 |
"I said I was going to keep bitching..." Very irreverend of you... I guess you're not of The True Faith? "9/7/2007 8:46:59 AM||Suspending computation - user is active" Why have you got that enabled? There should be no need. BOINC shouldn't interfere with normal operation in any case. "Anyone else not getting anything from the SETI server??" New work is trickling in fine for this system. Just leave it to try a few times and all should settle in OK. Also, it's a good idea to join a second BOINC project so that your machine isn't idle if one of the projects has any hiccups. "Are we just wasting our time leaving our systems running??" That's up to you to decide... Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
William Roeder Joined: 19 May 99 Posts: 69 Credit: 523,414 RAC: 0 |
"9/7/2007 8:37:24 AM|SETI@home|Reason: no work from project" http://setiathome.berkeley.edu/sah_status.html says the splitters are down and there's no work to be had. I've had no problems getting work until today. So now I'll just have to crunch my 8 days' worth of cache and then go on to other projects. The beta site still has work: http://setiweb.ssl.berkeley.edu/beta/status.php |
DJStarfox Joined: 23 May 01 Posts: 1066 Credit: 1,226,053 RAC: 2 |
"9/7/2007 8:37:24 AM|SETI@home|Reason: no work from project" No work here either. I think when the project has no work, the server should reply with "wait 4 hours and try again". I don't want BOINC checking for work every 1 min 0 sec and just loading the scheduler for no reason. |
Dr. C.E.T.I. Joined: 29 Feb 00 Posts: 16019 Credit: 794,685 RAC: 0 |
THIS is the SETI / Berkeley Pages for the NEWS from Them - iT is NOT the NC Forums - please take Your Issues there (or to the Q & A Pages) - RESPECT the ADMIN's Rules Please . . . Thank You Kindly . . . With Respect, richard w lubrich jr (AKA leonardo, nobody & watCh out! with SETI-BOINC since February 29 2000) edit - and i have been and am gettin' 'plenty' of work @ SETI . . . fyi |
nickth Joined: 21 Jan 07 Posts: 8 Credit: 822,571 RAC: 0 |
Right, this is really getting to me: people are screaming and shouting about not having any work to crunch without actually looking around the forums to find out why they are not getting any work. 1. If you had all set your work creche to 10 days then you would still be crunching right now, because I am. I have plenty of work units. Why? Because I thought about the number of outages SETI has, so I planned. 2. If any of you had bothered to look at the server status page once in a while, you would find that every splitter is disabled, so they would not be making any work for you. 3. If you had read Matt's post properly, then you would have found out why there was a problem with sending work: "In better news, we got to the bottom of the weird splitter sequence number problems I spotted yesterday. Now we understand what happened and why this really isn't a problem at all. Basically, data that was meant to be tacked on the tail of one raw data file ended up at the start of the next file instead. No biggie. As far as those overflow workunits taking forever... Jeff and Eric wrote some code (and checked it twice) to scour the database for such workunits and 'cancel' them. Immediately we saw our pipelines flood with requests for new work, so expect some delays for a while. We hope to eventually give credit to those who got stuck with these troubled workunits." Really, some of you have to WAKE UP and start thinking for yourselves instead of bothering the SETI team with stupid I CAN'T GET ANY WORK AT THE MOMENT posts. They are trying their best to fix the problem, but they don't work on SETI 24/7 like some of the other projects; they do have other things to do. So why don't you give them a break? You're doing a great job, Matt. Keep up the good work. |
kittyman Joined: 9 Jul 00 Posts: 51477 Credit: 1,018,363,574 RAC: 1,004 |
"I said I was going to keep bitching as long as BOINC was not doing anything!! Over the last 9 days BOINC has been able to log on to the SETI server once to get work. My systems are Intel dual 3.6 GHz systems with 2 GB of RAM, so it takes them about 12 hours to crunch a download's worth of work. They have not been able to log back on to the server and get any more work! What follows is the message:" Reverend. With such a title, you should know something about respect. Would you kindly respect the fact that this is the tech news forum and post your comments/complaints/questions to the Number Crunching forum? And the kitties say..........'Thank You, kind Sir'. "Time is simply the mechanism that keeps everything from happening all at once." |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
"I said I was going to keep bitching as long as BOINC was not doing anything!!" No, Reverend, it's just you. I've got at least a week's worth of SETI. |
Kevin Beasley Joined: 27 Aug 99 Posts: 15 Credit: 5,059,412 RAC: 16 |
"1. If you had all set your work creche to 10 days then you would still be crunching right now, because I am. I have plenty of work units. Why? Because I thought about the number of outages SETI has, so I planned." Quick question - how do you (re)set the work creche? Cheers |
RandyC Joined: 20 Oct 99 Posts: 714 Credit: 1,704,345 RAC: 0 |
"1. If you had all set your work creche to 10 days then you would still be crunching right now, because I am. I have plenty of work units. Why? Because I thought about the number of outages SETI has, so I planned." Your work cache (a creche is a baby-bed) depends on your Connect Interval along with the 'Maintain enough work for an additional x days' setting. Go to Your Account, select General Preferences and update your Network Preferences. There are 3 (4 if you count Default) venues you can customize for your use. They are called Home, Work, and School...but you can use them for whatever you feel like. |
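
Matt's opening post puts the chance of both failed drives sharing a controller at about 15%. The post does not give the drive layout, so the following is only a sanity check under an assumed layout: 48 drives spread evenly over 6 controllers (8 per controller), with failures independent and equally likely on every drive. The function name and parameters are illustrative.

```python
from fractions import Fraction


def same_controller_probability(total_drives: int, drives_per_controller: int) -> Fraction:
    """Chance that a second failed drive sits on the same controller as the first.

    Assumes failures are independent and every drive is equally likely to fail.
    """
    if total_drives < 2 or not 1 <= drives_per_controller <= total_drives:
        raise ValueError("need at least two drives and a valid controller size")
    # Once one drive has failed, (total_drives - 1) drives remain, and
    # (drives_per_controller - 1) of them share the first drive's controller.
    return Fraction(drives_per_controller - 1, total_drives - 1)


# Assumed layout, not stated in the thread: 48 drives, 8 per controller.
p = same_controller_probability(48, 8)
print(p, f"{float(p):.1%}")
```

Under that assumption the result is 7/47, or roughly 14.9%, which is consistent with the figure quoted in the post.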
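
The client log in Rev. Tim Olivera's post already shows the retry delay growing between requests. As a small illustration, the sketch below pulls the "Deferring communication" intervals out of those log lines and converts them to seconds. The regular expression only covers the "N min N sec" form that appears in the post; other formats the client may print are not handled, and the function name is hypothetical.

```python
import re
from typing import List

# The five deferral lines copied from the log quoted in the thread.
LOG = """\
9/7/2007 8:30:34 AM|SETI@home|Deferring communication for 1 min 0 sec
9/7/2007 8:31:39 AM|SETI@home|Deferring communication for 1 min 0 sec
9/7/2007 8:32:44 AM|SETI@home|Deferring communication for 1 min 2 sec
9/7/2007 8:33:54 AM|SETI@home|Deferring communication for 3 min 23 sec
9/7/2007 8:37:24 AM|SETI@home|Deferring communication for 12 min 26 sec
"""

# Matches only the "N min N sec" wording seen in this log.
DEFER = re.compile(r"Deferring communication for (\d+) min (\d+) sec")


def deferral_seconds(log_text: str) -> List[int]:
    """Return each deferral interval found in the log, in seconds."""
    return [int(m) * 60 + int(s) for m, s in DEFER.findall(log_text)]


print(deferral_seconds(LOG))  # [60, 60, 62, 203, 746]
```

The delays stay near one minute for the first three attempts and then climb to over twelve minutes, so the client was not retrying every minute indefinitely.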
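
DJStarfox suggests a long wait when the project has no work. One generic way to get that effect on the client side is randomized exponential backoff with an upper limit. The sketch below is not BOINC's actual scheduler logic; the function name, the 60-second base, and the 4-hour cap (taken from the figure in DJStarfox's post) are all assumptions chosen for illustration.

```python
import random
from typing import Optional


def backoff_delay(failures: int, base: float = 60.0, cap: float = 4 * 3600.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after `failures` consecutive "no work" replies.

    The upper bound doubles with each failure until it reaches `cap`;
    the delay is drawn uniformly between `base` and that bound so that
    many clients do not all retry at the same moment.
    """
    if failures < 1:
        raise ValueError("failures must be at least 1")
    if rng is None:
        rng = random.Random()
    # Clamp the exponent so the doubling cannot overflow a float.
    exponent = min(failures - 1, 20)
    ceiling = min(cap, base * 2 ** exponent)
    return rng.uniform(base, ceiling)


rng = random.Random(2007)  # fixed seed so repeated runs print the same values
for n in range(1, 9):
    print(n, round(backoff_delay(n, rng=rng)))
```

The random spread matters as much as the growth: when thousands of hosts lose work at the same time, a fixed delay would send them all back to the scheduler at once.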