Weirder (Sep 06 2007)

Message boards : Technical News : Weirder (Sep 06 2007)


Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 634938 - Posted: 6 Sep 2007, 22:13:25 UTC

Guess what? A *second* drive on thumper failed this morning, around the same time the other drive failed yesterday. This system is under a service contract, so we should get some replacements soon. But there are no obvious signs of why these two failed in such close succession. They were both on the same drive controller, but there's a 15% chance of that happening at random. The temperatures all look sane.
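The 15% figure is easy to reproduce under one assumption the post doesn't state: that thumper is a Sun Fire X4500 holding 48 drives spread evenly over six 8-port controllers. Once one drive has failed, 7 of the remaining 47 drives share its controller, so a second random failure lands there with probability 7/47. A minimal sketch, assuming independent and equally likely failures:

```python
from fractions import Fraction


def same_controller_probability(total_drives: int, drives_per_controller: int) -> Fraction:
    """Chance that a second failed drive shares a controller with the first.

    Assumes failures are independent and equally likely for every drive.
    """
    # Once the first drive has failed, the second failure can hit any of the
    # remaining (total_drives - 1) drives; (drives_per_controller - 1) of those
    # sit on the same controller as the first.
    return Fraction(drives_per_controller - 1, total_drives - 1)


# ASSUMPTION: thumper has 48 drives on six 8-port controllers.
p = same_controller_probability(total_drives=48, drives_per_controller=8)
print(p, f"{float(p):.1%}")  # prints: 7/47 14.9%
```

With those numbers the exact answer is 7/47, about 14.9%, which rounds to the 15% quoted.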

In better news, we got to the bottom of the weird splitter sequence number problems I spotted yesterday. Now we understand what happened and why this really isn't a problem at all. Basically, data that was meant to be tacked onto the tail of one raw data file ended up at the start of the next file instead. No biggie.

As far as those overflow workunits taking forever... Jeff and Eric wrote some code (and checked it twice) to scour the database for such workunits and "cancel" them. Immediately we saw our pipelines flood with requests for new work, so expect some delays for a while. We hope to eventually give credit to those who got stuck with these troubled workunits.
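The post doesn't say how the overflow workunits were identified or cancelled. Purely as an illustration, here is a minimal sketch of "cancelling" workunits by setting a flag bit in an error mask, so the rest of the pipeline can skip them. The table layout, the bit value 16, and selection by an id range are all assumptions made up for this example, not the project's real schema or query:

```python
import sqlite3

# ASSUMPTION: a "cancelled" bit in the workunit error mask. The value 16 is
# illustrative only; check your own server's definitions before relying on it.
WU_ERROR_CANCELLED = 16


def cancel_workunits(conn: sqlite3.Connection, min_id: int, max_id: int) -> int:
    """Mark every not-yet-cancelled workunit with id in [min_id, max_id] as cancelled.

    Returns the number of rows changed, so running it twice reports 0 the second time.
    """
    cur = conn.execute(
        "UPDATE workunit SET error_mask = error_mask | ? "
        "WHERE id BETWEEN ? AND ? AND (error_mask & ?) = 0",
        (WU_ERROR_CANCELLED, min_id, max_id, WU_ERROR_CANCELLED),
    )
    conn.commit()
    return cur.rowcount


# Hypothetical toy table standing in for the real project database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workunit (id INTEGER PRIMARY KEY, name TEXT, "
    "error_mask INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO workunit (id, name) VALUES (?, ?)",
    [(i, f"wu_{i}") for i in range(1, 6)],
)
print(cancel_workunits(conn, 2, 4))  # 3 rows flagged
print(cancel_workunits(conn, 2, 4))  # 0: already cancelled
```

Using a bitwise OR instead of overwriting the mask keeps any error bits that were already set, and the `(error_mask & ?) = 0` guard makes the update safe to re-run.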

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 634938
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14687
Credit: 200,643,578
RAC: 874
United Kingdom
Message 634948 - Posted: 6 Sep 2007, 22:56:03 UTC

Excellent news - thanks for getting rid of the "Splitsville Evercrunch Specials" - it'll save us a lot of explanation time in Number Crunching.

And to be offered credit as well - more than we ever asked for.

I've linked the news into NC here.
ID: 634948
Profile Dr. C.E.T.I.

Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 634987 - Posted: 7 Sep 2007, 0:21:55 UTC

Hear, Hear - Great News from Berkeley - Nice Going All . . .

ID: 634987
Profile Rev. Tim Olivera

Joined: 15 Jan 06
Posts: 20
Credit: 1,717,714
RAC: 0
United States
Message 635309 - Posted: 7 Sep 2007, 13:22:29 UTC - in response to Message 634938.  

I said I was going to keep bitching as long as BOINC was not doing anything!! Over the last 9 days BOINC has been able to log on to the SETI server only once to get work. My systems are Intel dual 3.6 GHz systems with 2 GB of RAM, so it takes them about 12 hours to crunch a download's worth of work. They have not been able to log back on to the server and get any more work! What follows is the message log:

9/7/2007 8:30:34 AM|SETI@home|Deferring communication for 1 min 0 sec
9/7/2007 8:30:34 AM|SETI@home|Reason: no work from project
9/7/2007 8:31:34 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:31:34 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:31:39 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:31:39 AM|SETI@home|Deferring communication for 1 min 0 sec
9/7/2007 8:31:39 AM|SETI@home|Reason: no work from project
9/7/2007 8:32:39 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:32:39 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:32:44 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:32:44 AM|SETI@home|Deferring communication for 1 min 2 sec
9/7/2007 8:32:44 AM|SETI@home|Reason: no work from project
9/7/2007 8:33:49 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:33:49 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:33:54 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:33:54 AM|SETI@home|Deferring communication for 3 min 23 sec
9/7/2007 8:33:54 AM|SETI@home|Reason: no work from project
9/7/2007 8:37:19 AM|SETI@home|Sending scheduler request: To fetch work
9/7/2007 8:37:19 AM|SETI@home|Requesting 44928 seconds of new work
9/7/2007 8:37:24 AM|SETI@home|Scheduler RPC succeeded [server version 511]
9/7/2007 8:37:24 AM|SETI@home|Deferring communication for 12 min 26 sec
9/7/2007 8:37:24 AM|SETI@home|Reason: no work from project
9/7/2007 8:46:59 AM||Suspending computation - user is active

Anyone else not getting anything from the SETI server?? Are we just wasting our time leaving our systems running??

Tim Olivera

ID: 635309
Profile Profi
Volunteer tester

Send message
Joined: 8 Dec 00
Posts: 19
Credit: 20,552,123
RAC: 0
Message 635334 - Posted: 7 Sep 2007, 13:41:26 UTC - in response to Message 634938.  
Last modified: 7 Sep 2007, 13:42:06 UTC


Thanks, Matt, for the update... Anyway, from a brief analysis, SETI@Home has an MTBF of approx. 1 week....

ID: 635334
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Message 635337 - Posted: 7 Sep 2007, 13:42:26 UTC

My 6 computers have received nothing since 0012Z and they've been pinging their little heads off. Looks like things went kaput about two hours after Matt's original post on this thread.

ID: 635337
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21567
Credit: 7,508,002
RAC: 20
United Kingdom
Message 635345 - Posted: 7 Sep 2007, 13:44:43 UTC - in response to Message 635309.  

I said I was going to keep bitching...

Very irreverend of you... I guess you're not of The True Faith?

9/7/2007 8:46:59 AM||Suspending computation - user is active

Why do you have that enabled? There should be no need; BOINC shouldn't interfere with normal operation in any case.

Anyone else not getting anything from the SETI server??

New work is trickling in fine for this system. Just leave it to try a few times and all should settle in ok. Also, it's a good idea to join a second Boinc project so that your machine isn't idle if one of the projects has any hiccups.

Are we just wasting our time leaving our systems running??

That's up to you to decide...

Happy crunchin',

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 635345
William Roeder
Volunteer tester

Joined: 19 May 99
Posts: 69
Credit: 523,414
RAC: 0
United States
Message 635352 - Posted: 7 Sep 2007, 13:49:50 UTC - in response to Message 635309.  

9/7/2007 8:37:24 AM|SETI@home|Reason: no work from project
9/7/2007 8:46:59 AM||Suspending computation - user is active

Anyone else not getting anything from the SETI server??

The server status page says the splitters are down and there's no work to be had.

I had no problems getting work until today, so now I'll just have to crunch my 8 days' worth of cache and then move on to other projects.

The beta site still has work.
ID: 635352

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 635421 - Posted: 7 Sep 2007, 15:07:20 UTC - in response to Message 635352.  


No work either. I think when the project has no work, the server should reply with "wait 4 hours and try again". I don't want BOINC checking for work every 1 min 0 sec, loading the scheduler for no reason.
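The log earlier in the thread shows the client's deferral already growing after repeated "no work" replies (1 min, 1 min, 1 min 2 sec, 3 min 23 sec, 12 min 26 sec). The usual way to get the behaviour suggested here is capped exponential backoff with random jitter, plus a floor the server can raise. This is a generic sketch, not how the BOINC client actually computes its deferrals; the 60 s base, the 4-hour cap (borrowed from the suggestion), and the `server_hint` parameter are all assumptions:

```python
import random
from typing import Optional


def next_deferral(failures: int,
                  base: float = 60.0,
                  cap: float = 4 * 3600.0,
                  server_hint: float = 0.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before asking for work again after `failures` empty replies in a row."""
    rng = rng or random.Random()
    # Double the ceiling on each consecutive empty reply (exponent clamped so the
    # float math cannot overflow), but never exceed the cap.
    exponent = min(max(failures - 1, 0), 30)
    ceiling = min(cap, base * 2 ** exponent)
    # Randomize within [base, ceiling] so thousands of idle clients spread out
    # instead of hitting the scheduler in lockstep.
    delay = rng.uniform(base, ceiling) if ceiling > base else base
    # Never come back sooner than the server asked (hypothetical server hint).
    return max(delay, server_hint)


rng = random.Random(2007)
for n in range(1, 9):
    print(n, round(next_deferral(n, rng=rng)))
```

The jitter matters as much as the growth: after an outage, every client that backed off by the same fixed amount would otherwise return at the same moment.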
ID: 635421
Profile Dr. C.E.T.I.

Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 635435 - Posted: 7 Sep 2007, 15:24:06 UTC
Last modified: 7 Sep 2007, 15:25:01 UTC

THIS is the SETI / Berkeley Pages for the NEWS from Them - it is NOT the NC Forums - please take Your Issues there (or to the Q & A Pages) - RESPECT the ADMIN's Rules Please . . .

Thank You Kindly . . .

With Respect,

richard w lubrich jr (AKA leonardo, nobody & watCh out! with SETI-BOINC since February 29 2000)

edit - and i have been and am gettin' 'plenty' of work @ SETI . . . fyi
ID: 635435
Volunteer tester

Joined: 21 Jan 07
Posts: 8
Credit: 822,571
RAC: 0
United Kingdom
Message 635459 - Posted: 7 Sep 2007, 15:48:44 UTC

Right, this is really getting to me: people are screaming and shouting about not having any work to crunch without actually looking around the forums to find out why they aren't getting any work.

1. If you had all set your work creche to 10 days, you would still be crunching right now. I am: I have plenty of work units, because I thought about the number of outages SETI has and planned for it.

2. If any of you had bothered to look at the server stats page once in a while, you would find that every splitter is disabled, so they are not making any work for you.

3. If you had read Matt's post properly, you would have found out why there was a problem with sending work:
In better news, we got to the bottom of the weird splitter sequence number problems I spotted yesterday. Now we understand what happened and why this really isn't a problem at all. Basically, data that was meant to be tacked onto the tail of one raw data file ended up at the start of the next file instead. No biggie.

As far as those overflow workunits taking forever... Jeff and Eric wrote some code (and checked it twice) to scour the database for such workunits and "cancel" them. Immediately we saw our pipelines flood with requests for new work, so expect some delays for a while. We hope to eventually give credit to those who got stuck with these troubled workunits.

Really, some of you have to WAKE UP and start thinking for yourselves instead of bothering the SETI team with stupid I CAN'T GET ANY WORK AT THE MOMENT posts. They are trying their best to fix the problem, but unlike some other projects they don't work on SETI 24/7; they do have other things to do.
So why don't you give them a break.

You're doing a great job, Matt. Keep up the good work!
ID: 635459
kittyman Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor
Volunteer tester

Joined: 9 Jul 00
Posts: 51507
Credit: 1,018,363,574
RAC: 1,004
United States
Message 635461 - Posted: 7 Sep 2007, 15:49:17 UTC - in response to Message 635309.  


Reverend. With such a title, you should know something about respect. Would you kindly respect the fact that this is the Technical News forum and post your comments/complaints/questions to the Number Crunching forum?

And the kitties say..........'Thank You, kind Sir'.

"Time is simply the mechanism that keeps everything from happening all at once."

ID: 635461
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 635881 - Posted: 7 Sep 2007, 20:45:50 UTC - in response to Message 635309.  

I said I was going to keep bitching as long as BOINC was not doing anything!!

<redundant "bloat" removed>

Anyone else not getting anything from the SETI server?? Are we just wasting our time leaving our systems running??

Tim Olivera

No, Reverend, it's just you. I've got at least a week's worth of SETI.
ID: 635881
Profile Kevin Beasley

Joined: 27 Aug 99
Posts: 15
Credit: 5,059,412
RAC: 16
United Kingdom
Message 636281 - Posted: 8 Sep 2007, 9:38:14 UTC - in response to Message 635459.  

1. If you had all set your work creche to 10 days, you would still be crunching right now. I am: I have plenty of work units, because I thought about the number of outages SETI has and planned for it.

Quick question - how do you (re)set the work creche?


ID: 636281
Profile RandyC

Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 636315 - Posted: 8 Sep 2007, 12:03:11 UTC - in response to Message 636281.  

1. If you had all set your work creche to 10 days, you would still be crunching right now. I am: I have plenty of work units, because I thought about the number of outages SETI has and planned for it.

Quick question - how do you (re)set the work creche?


Your work cache (a creche is a baby-bed) depends on your Connect Interval along with the 'Maintain enough work for an additional x days' setting.

Go to Your Account, select General Preferences and update your Network Preferences. There are 3 (4 if you count Default) venues you can customize for your use. They are called Home, Work, and School...but you can use them for whatever you feel like.
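As a rough mental model of how those two settings combine, a client could aim to hold (connect interval + additional days) of work per CPU and ask only for the shortfall. This is a deliberately simplified sketch with a hypothetical function name; the real BOINC client also weighs things such as how much of the time the host is on and each project's resource share, which are ignored here:

```python
SECONDS_PER_DAY = 86_400


def work_request_seconds(connect_interval_days: float,
                         additional_days: float,
                         ncpus: int,
                         queued_seconds: float = 0.0) -> float:
    """Seconds of new work to ask for under a simplified buffer model.

    Target buffer = (connect interval + additional days) of work for every CPU;
    the request is whatever part of that target is not already queued.
    """
    target = (connect_interval_days + additional_days) * SECONDS_PER_DAY * ncpus
    return max(0.0, target - queued_seconds)


# Example: a dual-core host wanting a 10-day buffer, with 2 days per CPU already queued.
print(work_request_seconds(0.0, 10.0, ncpus=2, queued_seconds=2 * 2 * SECONDS_PER_DAY))
# prints: 1382400.0
```

Under this model a bigger buffer only helps if work was available to fill it before the splitters went down; it delays running dry rather than preventing it.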
ID: 636315


©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.