Stretch (Apr 22 2009)


Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1384
Credit: 74,079
RAC: 0
United States
Message 887410 - Posted: 22 Apr 2009, 22:33:18 UTC

Looks like there were some beta project problems after the outage yesterday caused by a missing executable. That got replaced, and I think that everything should be okay now on that front. I heard rumors that regular users were seeing beta errors, but I'm hoping that was just confusion. I haven't heard anything since.

Other than that today was more or less a day of system/web plumbing. The web stuff I'm working on is becoming a major kludge due to time constraints. It's actually a conglomeration of C code and perl, php, and C-shell scripts. You know, whatever works. I'm a big fan of getting things working as soon as possible, then making it pretty later.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 13,712,936
RAC: 12,326
United States
Message 887412 - Posted: 22 Apr 2009, 22:44:34 UTC - in response to Message 887410.

I don't think there were any actual errors but a few people reported getting switched to Beta. This is one of the threads started on it... http://setiathome.berkeley.edu/forum_thread.php?id=53205
____________


PROUD MEMBER OF Team Starfire World BOINC

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 11732
Credit: 5,969,877
RAC: 0
United States
Message 887436 - Posted: 23 Apr 2009, 0:24:03 UTC - in response to Message 887410.

Looks like there were some beta project problems after the outage yesterday caused by a missing executable. That got replaced, and I think that everything should be okay now on that front. I heard rumors that regular users were seeing beta errors, but I'm hoping that was just confusion. I haven't heard anything since.

Other than that today was more or less a day of system/web plumbing. The web stuff I'm working on is becoming a major kludge due to time constraints. It's actually a conglomeration of C code and perl, php, and C-shell scripts. You know, whatever works. I'm a big fan of getting things working as soon as possible, then making it pretty later.

- Matt

Saw those things about failed BETA d/l's and wondered if a router got pointed in the wrong direction, if just for a moment, while the project was coming back up.

As for making things work, well, there is never time to go back and make them pretty. If you have to maintain it, better to do it right the first time, or not close the ticket until you have finished the cleanup.

Thanks for all the good work to the whole staff!


____________

PhonAcq
Joined: 14 Apr 01
Posts: 1622
Credit: 21,581,406
RAC: 3,114
United States
Message 887557 - Posted: 23 Apr 2009, 12:21:51 UTC

I'd like to suggest that the balance between MB and AP wu generation be adjusted somehow. Looking at the last few entries of my log from last night, I see that the servers ran out of MB's to give me (I only run MB). But then 16 seconds later boinc 6.6.20 requests work again and gets 2. Then 10 mins later it requests work again and gets 1. Then 3 mins later it requests more, and gets 1. And then 2 mins later it requests and gets 1.

If this goes on for 100,000 clients, no wonder life is tough at times for the servers. Wouldn't it make sense to produce a few more MBs per hour and then (hopefully) eliminate all these extra requests?

(Or are multiple, sequential, redundant requests a 'feature' of boinc that would occur independent of the #wu's available? I would believe that too.)

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5566
Credit: 51,443,283
RAC: 42,445
Australia
Message 887667 - Posted: 23 Apr 2009, 18:09:26 UTC - in response to Message 887557.

Looking at the last few entries of my log from last night, I see that the servers ran out of MB's to give me (I only run MB).

It's a glitch of some sort.
Scarecrow's Graphs show there's been plenty of MB work ready to send for over 4 days. Often my system will request work & get
23/04/2009 6:51:01 SETI@home Message from server: No work sent
23/04/2009 6:51:01 SETI@home Message from server: No work is available for SETI@home Enhanced
23/04/2009 6:51:01 SETI@home Message from server: No work available for the applications you have selected. Please check your settings on the web site.
Then a minute or so later when it makes another request, it gets some.
____________
Grant
Darwin NT.

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 11732
Credit: 5,969,877
RAC: 0
United States
Message 887668 - Posted: 23 Apr 2009, 18:09:53 UTC - in response to Message 887557.

I'd like to suggest that the balance between MB and AP wu generation be adjusted somehow. Looking at the last few entries of my log from last night, I see that the servers ran out of MB's to give me (I only run MB). But then 16 seconds later boinc 6.6.20 requests work again and gets 2. Then 10 mins later it requests work again and gets 1. Then 3 mins later it requests more, and gets 1. And then 2 mins later it requests and gets 1.

If this goes on for 100,000 clients, no wonder life is tough at times for the servers. Wouldn't it make sense to produce a few more MBs per hour and then (hopefully) eliminate all these extra requests?

(Or are multiple, sequential, redundant requests a 'feature' of boinc that would occur independent of the #wu's available? I would believe that too.)

Answering your last question first: yes. There is even method to the madness. Your client asks for a given amount of work. The project sends some work units. It does not know how fast your machine crunches. It may send a single work unit that is twice (or more) the amount you requested, or that work unit might be only a fraction of the amount you requested. After it gets the work unit, your machine calculates whether it needs more. If it does, it sends another request.
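
To make that loop concrete, here is a rough Python sketch of a client topping up its buffer. It is not BOINC's actual scheduler code: the function name, the list-of-replies input, and the estimate callback are all made up for illustration. It only shows why several back-to-back requests are expected when the server cannot size its replies to the machine.

```python
def fill_buffer(requested_seconds, server_replies, estimate_seconds):
    """Count how many requests it takes to cover the wanted amount of work.

    requested_seconds: how much work (in estimated seconds) the client wants.
    server_replies:    hypothetical list of replies, each a list of work units.
    estimate_seconds:  client-side function that estimates one unit's runtime;
                       the server does not know this number in advance.
    """
    buffered = 0.0
    requests = 0
    for reply in server_replies:
        # Stop asking as soon as the local estimate covers the request.
        if buffered >= requested_seconds:
            break
        requests += 1
        buffered += sum(estimate_seconds(wu) for wu in reply)
    return requests, buffered


if __name__ == "__main__":
    # Each work unit is represented by its estimated runtime in seconds.
    replies = [[1000], [1000, 1000], [2000], [500]]
    print(fill_buffer(3600, replies, float))
```

In this toy run the client wants an hour of work, the first two replies fall short, and the third overshoots, so it stops after three requests without ever touching the fourth reply.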

Now as to your specific request that got no work: in this case either the splitters couldn't make work fast enough or, much more likely, the internal queue between the splitter and the server backed up and the server was itself waiting because too much data was being moved around. That backlog may have cleared a few microseconds after your request arrived, and in the 16 seconds before your machine asked for work again it may have formed and gone away a few dozen times. IIRC Matt said it grabs work units from the splitter queue in 100-unit chunks. Matt might want to tune this value so it is higher right after he brings the project back up, if it has been offline for a while, to reduce thrashing, but he knows much better than I do where the bottlenecks are or whether it is even possible to tune this value dynamically.
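
A toy model of that chunked hand-off, again in Python and again only a sketch: the class and method names are invented, and the chunk size of 100 is just the figure recalled above, not a confirmed server setting. It shows how a request can come back empty even though plenty of work is sitting in the backlog, simply because it arrived between refills.

```python
from collections import deque


class Feeder:
    """Hypothetical stand-in for the server side that hands out work."""

    def __init__(self, ready_to_send, chunk_size=100):
        self.backlog = deque(ready_to_send)  # work produced by the splitters
        self.slots = deque()                 # work the scheduler can hand out now
        self.chunk_size = chunk_size         # assumed refill size

    def refill(self):
        """Move up to chunk_size units from the backlog into the slots."""
        moved = 0
        while self.backlog and moved < self.chunk_size:
            self.slots.append(self.backlog.popleft())
            moved += 1
        return moved

    def request(self, wanted):
        """Hand out up to `wanted` units; an empty list means 'No work sent'."""
        sent = []
        while self.slots and len(sent) < wanted:
            sent.append(self.slots.popleft())
        return sent


if __name__ == "__main__":
    feeder = Feeder(range(250))
    print(feeder.request(2))  # arrives before a refill: nothing to send
    feeder.refill()
    print(feeder.request(2))  # same backlog, a moment later: work is sent
```

A larger chunk_size means fewer empty replies right after a restart, at the cost of moving more data per refill, which is the trade-off being guessed at above.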

As to the ratio of M/B to A/P: as long as the queues for both are being kept reasonably full, the ratio is correct. If it weren't, one would overflow and the other would be empty.


____________

John G
Joined: 29 Dec 01
Posts: 62
Credit: 10,037,163
RAC: 24,371
Canada
Message 887733 - Posted: 23 Apr 2009, 21:50:29 UTC

Hey Guys and Gals

Uploads and downloads just went down. Hopefully it's just preventative maintenance?

regards


Copyright © 2014 University of California