Blop (Jun 25 2007)

Message boards : Technical News : Blop (Jun 25 2007)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 592582 - Posted: 25 Jun 2007, 22:54:36 UTC
Last modified: 25 Jun 2007, 22:55:50 UTC

No major failures to report today. Good. Maybe you noticed web servers going up and down today - I was upgrading versions just to keep up with security. You may also note the result queue draining a bit. I changed the ceiling from 500K to 200K. This is plenty high, and the lower ceiling will free up some extra breathing room so when multibeam workunits are created they won't fill up the download volume. I also fixed the top_hosts.php again. I guess I didn't check my changes into SVN fast enough and they were overwritten with the previous buggy code. Should be okay now. I also took some time to upgrade my desktop machine to Fedora Core 7, just so I can start getting used to that process.

Not sure when I'll get to working on "bane" again, but Intel in conjunction with Colfax International assembled and donated a master science database replica machine, which was delivered at the very end of last week. It basically has the same specs as thumper and the plan is to use it as a replica on which we do some real scientific development and final analysis. I should try to get that rolling soon.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 592582
Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 592596 - Posted: 25 Jun 2007, 23:56:34 UTC

Some nice hardware donations lately. I am glad to see the hardware getting a little closer to today's machines.

Glad to hear that the day was a lot less of a rush and fix, and more trying to get some of the small things done.

Keep up the grand work, all of you. Maybe we can get those pesky old orphans that are still clogging the DB a little cleaned up, especially those results that no longer have a WU associated with them. Pointers have to be really funky with things like that hanging on.


My movie https://vimeo.com/manage/videos/502242
ID: 592596
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 592667 - Posted: 26 Jun 2007, 2:01:41 UTC
Last modified: 26 Jun 2007, 2:02:18 UTC

Matt,

The "http error", first mentioned in the "blip" thread, and as evidenced by:

6/25/2007 6:56:18 PM|SETI@home|Requesting 185928 seconds of new work, and reporting 1 completed tasks
6/25/2007 6:56:39 PM|SETI@home|Scheduler request failed: HTTP file not found
6/25/2007 6:56:39 PM|SETI@home|Sending scheduler request: To fetch work
6/25/2007 6:56:39 PM|SETI@home|Requesting 185928 seconds of new work, and reporting 1 completed tasks
6/25/2007 6:56:59 PM|SETI@home|Scheduler RPC succeeded [server version 509]
6/25/2007 6:56:59 PM|SETI@home|Deferring communication for 11 sec
6/25/2007 6:56:59 PM|SETI@home|Reason: requested by project
6/25/2007 6:56:59 PM|SETI@home|Deferring communication for 1 min 25 sec
6/25/2007 6:56:59 PM|SETI@home|Reason: no work from project

is (obviously) still with us. (times PDT)

I'm also not sure why that last line reads "no work from project" as there were about 200K WU's in the pipeline, ready to go...
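
For anyone who wants to tally these events rather than eyeball them, here is a minimal sketch of a parser for the pipe-delimited client messages above. It assumes every line has the form `timestamp|project|message`, as in the excerpt. The names `parse_line` and `summarize` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# A few lines copied from the excerpt above.
LOG = """\
6/25/2007 6:56:18 PM|SETI@home|Requesting 185928 seconds of new work, and reporting 1 completed tasks
6/25/2007 6:56:39 PM|SETI@home|Scheduler request failed: HTTP file not found
6/25/2007 6:56:59 PM|SETI@home|Scheduler RPC succeeded [server version 509]
6/25/2007 6:56:59 PM|SETI@home|Reason: no work from project
"""


def parse_line(line):
    """Split one 'timestamp|project|message' line into typed fields."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return when, project, message


def summarize(log_text):
    """Count failed scheduler requests and 'no work' replies."""
    counts = {"http_failed": 0, "no_work": 0}
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _, _, message = parse_line(line)
        if message.startswith("Scheduler request failed"):
            counts["http_failed"] += 1
        elif message == "Reason: no work from project":
            counts["no_work"] += 1
    return counts


print(summarize(LOG))
```

Counting the two kinds of event separately matters here, since the HTTP failure and the "no work from project" reply may well have different causes.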
.

Hello, from Albany, CA!...
ID: 592667
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 592713 - Posted: 26 Jun 2007, 5:02:25 UTC - in response to Message 592667.  
Last modified: 26 Jun 2007, 5:02:58 UTC

I'm also not sure why that last line reads "no work from project" as there were about 200K WU's in the pipeline, ready to go...

Definitely a glitch somewhere - for the last 4 hours I've been getting "No work from project" responses to requests for more Work Units.
Grant
Darwin NT
ID: 592713
Rob
Joined: 25 Sep 06
Posts: 1
Credit: 199,258
RAC: 0
Australia
Message 592767 - Posted: 26 Jun 2007, 8:13:39 UTC

G'day.

I've had no new work units for about 4 days now.

Rob
Adelaide SA
ID: 592767
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 592811 - Posted: 26 Jun 2007, 10:19:31 UTC - in response to Message 592767.  

I've had no new work units for about 4 days now.

Would be worth posting about the problem in Number Crunching or the appropriate Help forum (Windows, Linux, etc.).
This problem has only just occurred today. Prior to that, downloading as required, first attempt every time.
Grant
Darwin NT
ID: 592811
Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 592856 - Posted: 26 Jun 2007, 11:48:12 UTC - in response to Message 592713.  

I'm also not sure why that last line reads "no work from project" as there were about 200K WU's in the pipeline, ready to go...

Definitely a glitch somewhere - for the last 4 hours I've been getting "No work from project" responses to requests for more Work Units.

Just a "stab in the dark", but SETI@home is running 2 scheduling servers, where one of them handles all even results and the other all odd results. If one of the servers gets far enough ahead of the other, it's possible there aren't any odd or even results left, meaning one of the scheduling servers doesn't have any work at all. Only when the other scheduling server "catches up" enough will both scheduling servers have more work to send out...

Now, in theory the odd/even scheduling server should be chosen randomly for each scheduler request, but it's possible this isn't really happening, so if you ask the "wrong" server you'll continue to ask the "wrong" server again and again...
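
The hypothesis above can be played out in a toy simulation. This is only a sketch under stated assumptions: the result ids are made up, the "sticky" flag models a client that keeps being routed to the same scheduler, and none of the names (`scheduler_reply`, `simulate`) come from the real BOINC server code.

```python
import random

NO_WORK = "no work from project"


def scheduler_reply(queue):
    """Hand out the next result id from this scheduler's queue, if any."""
    return queue.pop(0) if queue else NO_WORK


def simulate(ready_ids, requests, sticky, seed=0):
    """Issue `requests` work requests against an even and an odd scheduler.

    sticky=True  : the client always lands on the same (odd) scheduler.
    sticky=False : the scheduler is picked at random for every request.
    """
    rng = random.Random(seed)
    queues = {
        "even": [i for i in ready_ids if i % 2 == 0],
        "odd": [i for i in ready_ids if i % 2 == 1],
    }
    choice = "odd"  # hypothetical starting point for the sticky client
    replies = []
    for _ in range(requests):
        if not sticky:
            choice = rng.choice(["even", "odd"])
        replies.append(scheduler_reply(queues[choice]))
    return replies


# Only even ids are ready to send, i.e. the odd scheduler has run dry.
ready = [2, 4, 6, 8]
print(simulate(ready, 4, sticky=True))
print(simulate(ready, 8, sticky=False))
```

With the sticky client every request gets the "no work" reply even though work is ready on the other scheduler, which is the symptom being guessed at. With random selection the same client gets work on some requests and "no work" on others.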

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 592856
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19013
Credit: 40,757,560
RAC: 67
United Kingdom
Message 592860 - Posted: 26 Jun 2007, 11:52:51 UTC
Last modified: 26 Jun 2007, 11:54:24 UTC

In last half hour my two computers have downloaded 10 results all with even number ID's.

Andy

[edit] Just noticed that splitters are now splitting work. They weren't when I was getting 'no work from project'. [/edit]
ID: 592860
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13722
Credit: 208,696,464
RAC: 304
Australia
Message 593494 - Posted: 27 Jun 2007, 6:54:35 UTC - in response to Message 592860.  
Last modified: 27 Jun 2007, 7:36:08 UTC

In last half hour my two computers have downloaded 10 results all with even number ID's.

Just got some work, all starting with even IDs. It's still asking for more work & now getting the "No work from project" messages again.

EDIT - after about 6 automatic retries I finally got another Work Unit, odd this time.
Grant
Darwin NT
ID: 593494
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 593635 - Posted: 27 Jun 2007, 15:37:41 UTC

There are two schedulers, running redundantly. However, to make sure they don't send out the same work twice they are given "control" of one half of the work, i.e. one sends out results with even id's, the other sends out odd. So, depending on which scheduler DNS randomly hands you at the time, you'll get either even or odd id'ed results to work on.
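
As a rough illustration of why this split avoids duplicates, here is a minimal sketch of ownership by id parity. It is not the actual scheduler code: the function names and the sample ids are invented, and the only thing taken from the description above is the even/odd rule.

```python
def owning_scheduler(result_id):
    """Return which scheduler is allowed to send this result."""
    return "even" if result_id % 2 == 0 else "odd"


def partition(result_ids):
    """Group result ids by the scheduler that owns them."""
    split = {"even": [], "odd": []}
    for rid in result_ids:
        split[owning_scheduler(rid)].append(rid)
    return split


ids = list(range(100, 110))  # made-up result ids
split = partition(ids)

# Every id has exactly one owner, so no result can be sent twice.
assert not set(split["even"]) & set(split["odd"])
assert sorted(split["even"] + split["odd"]) == ids
print(split)
```

The appeal of a parity rule is that the two schedulers need no coordination at send time, at the cost that either one can run out of its own half while the other still has work.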

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 593635
Byron Leigh Hatch @ team Carl Sagan
Volunteer tester
Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 593657 - Posted: 27 Jun 2007, 16:25:54 UTC - in response to Message 593635.  


There are two schedulers, running redundantly. However, to make sure they don't send out the same work twice they are given "control" of one half of the work, i.e. one sends out results with even id's, the other sends out odd. So, depending on which scheduler DNS randomly hands you at the time, you'll get either even or odd id'ed results to work on.

- Matt

Matt , thanks very much for this information.

Kind Regards
Byron
ID: 593657
KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 3187
Credit: 57,163,290
RAC: 0
United States
Message 594426 - Posted: 28 Jun 2007, 13:57:21 UTC - in response to Message 593635.  
Last modified: 28 Jun 2007, 14:01:12 UTC

There are two schedulers, running redundantly. However, to make sure they don't send out the same work twice they are given "control" of one half of the work, i.e. one sends out results with even id's, the other sends out odd. So, depending on which scheduler DNS randomly hands you at the time, you'll get either even or odd id'ed results to work on.

- Matt


Matt, someone should check on the scheduler sending out the "odd" results, as I think that's the culprit that's handing out the "no new work" (even with 200k WU's ready to send) line... the http error is still with us, too; see following:

6/28/2007 6:49:50 AM|SETI@home|Sending scheduler request: To fetch work
6/28/2007 6:49:50 AM|SETI@home|Requesting 456833 seconds of new work
6/28/2007 6:49:55 AM|SETI@home|Scheduler request failed: HTTP file not found
6/28/2007 6:49:55 AM|SETI@home|Sending scheduler request: To fetch work
6/28/2007 6:49:55 AM|SETI@home|Requesting 456833 seconds of new work
6/28/2007 6:50:00 AM|SETI@home|Scheduler RPC succeeded [server version 509]
6/28/2007 6:50:00 AM|SETI@home|Deferring communication for 11 sec
6/28/2007 6:50:00 AM|SETI@home|Reason: requested by project
6/28/2007 6:50:00 AM|SETI@home|Deferring communication for 1 min 30 sec
6/28/2007 6:50:00 AM|SETI@home|Reason: no work from project


.

Hello, from Albany, CA!...
ID: 594426
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 594589 - Posted: 28 Jun 2007, 20:06:42 UTC - in response to Message 594426.  

There are two schedulers, running redundantly. However, to make sure they don't send out the same work twice they are given "control" of one half of the work, i.e. one sends out results with even id's, the other sends out odd. So, depending on which scheduler DNS randomly hands you at the time, you'll get either even or odd id'ed results to work on.

- Matt


Matt, someone should check on the scheduler sending out the "odd" results, as I think that's the culprit that's handing out the "no new work" (even with 200k WU's ready to send) line...

While it's unclear why "odd" is so far behind "even", I think it is likely that the "no new work" replies are coming from "even". Take a look at wuid 137582495 and note that the results were sent within seconds of creation. That indicates "even" doesn't really have any "ready to send" and is sending as soon as it gets some from the splitters. When it has sent all those it really doesn't have any until another batch comes from a splitter or the transitioner does a resend.

Also look at wuid 137443002 where "odd" was working at about the same 19:00 UTC time. It was sending work which had been created 16 hours earlier, the even result had been sent immediately.

I'm not sure how to relate the wuid difference of about 140000 and the resultid difference of about 435000 to the 200000 target for ready to send.
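
The inference above boils down to comparing a result's creation time with its sent time. A minimal sketch of that check follows. The timestamps are illustrative only and are not copied from the workunit pages, and the one-minute threshold is an arbitrary assumption; the post itself supplies only "within seconds" and "16 hours earlier".

```python
from datetime import datetime, timedelta


def queue_state(created, sent, threshold=timedelta(minutes=1)):
    """Guess a scheduler's queue state from one result's wait time.

    A result sent almost as soon as it was created suggests the
    scheduler had nothing ready to send; a long wait suggests a backlog.
    """
    waited = sent - created
    return "drained" if waited <= threshold else "backlogged"


# Illustrative values: both results sent at about 19:00 UTC.
sent = datetime(2007, 6, 28, 19, 0, 0)
even_created = sent - timedelta(seconds=5)   # "within seconds of creation"
odd_created = sent - timedelta(hours=16)     # "created 16 hours earlier"

print("even:", queue_state(even_created, sent))
print("odd:", queue_state(odd_created, sent))
```

A single result per scheduler is thin evidence, so in practice one would want to look at several results from each before drawing a conclusion.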
Joe
ID: 594589
