Stretch (Apr 22 2009)

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 887410 - Posted: 22 Apr 2009, 22:33:18 UTC

Looks like there were some beta project problems after the outage yesterday caused by a missing executable. That got replaced, and I think that everything should be okay now on that front. I heard rumors that regular users were seeing beta errors, but I'm hoping that was just confusion. I haven't heard anything since.

Other than that, today was more or less a day of system/web plumbing. The web stuff I'm working on is becoming a major kludge due to time constraints. It's actually a conglomeration of C code and Perl, PHP, and C-shell scripts. You know, whatever works. I'm a big fan of getting things working as soon as possible, then making it pretty later.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 887412 - Posted: 22 Apr 2009, 22:44:34 UTC - in response to Message 887410.  

I don't think there were any actual errors but a few people reported getting switched to Beta. This is one of the threads started on it... http://setiathome.berkeley.edu/forum_thread.php?id=53205


PROUD MEMBER OF Team Starfire World BOINC

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30637
Credit: 53,134,872
RAC: 32
United States
Message 887436 - Posted: 23 Apr 2009, 0:24:03 UTC - in response to Message 887410.  

Saw those things about failed BETA d/l's and wondered if a router got pointed in the wrong direction, if just for a moment, while the project was coming back up.

As for making things work, well, there is never time to go back and make them pretty. If you have to maintain it, better to do it right the first time, or not close the ticket until you have finished the cleanup.

Thanks for all the good work to the whole staff!


PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 887557 - Posted: 23 Apr 2009, 12:21:51 UTC

I'd like to suggest that the balance between MB and AP WU generation be adjusted somehow. Looking at the last few entries of my log from last night, I see that the servers ran out of MB's to give me (I only run MB). But then 16 seconds later BOINC 6.6.20 requests work again and gets 2. Then 10 mins later it requests work again and gets 1. Then 3 mins later it requests more and gets 1. And then 2 mins later it requests and gets 1.

If this goes on for 100,000 clients, no wonder life is tough at times for the servers. Wouldn't it make sense to produce a few more MBs per hour and then (hopefully) eliminate all these extra requests?

(Or are multiple, sequential, redundant requests a 'feature' of BOINC that would occur independent of the number of WUs available? I would believe that too.)
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 887667 - Posted: 23 Apr 2009, 18:09:26 UTC - in response to Message 887557.  

Looking at the last few entries of my log from last night, I see that the servers ran out of MB's to give me (I only run MB).

It's a glitch of some sort.
Scarecrow's Graphs show there's been plenty of MB work ready to send for over 4 days. Often my system will request work & get
23/04/2009 6:51:01 SETI@home Message from server: No work sent
23/04/2009 6:51:01 SETI@home Message from server: No work is available for SETI@home Enhanced
23/04/2009 6:51:01 SETI@home Message from server: No work available for the applications you have selected. Please check your settings on the web site.
Then a minute or so later when it makes another request, it gets some.
Grant
Darwin NT

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30637
Credit: 53,134,872
RAC: 32
United States
Message 887668 - Posted: 23 Apr 2009, 18:09:53 UTC - in response to Message 887557.  

Answering your last question first: yes. There is even method to the madness. Your client asks for a given amount of work. The project sends some work units. It does not know how fast your machine crunches. It may send a single work unit that is twice (or more) the amount you requested, or that work unit might be only a fraction of your request. Your machine, after it gets the work unit, calculates whether it needs more. If it does, it sends another request.
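
The request loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual BOINC client logic: `fetch_until_satisfied`, `estimate_runtime`, and the batch sizes are all hypothetical names and numbers made up for the example.

```python
def fetch_until_satisfied(requested_seconds, estimate_runtime, server_batches):
    """Simulate a client that keeps asking until its work buffer is full.

    requested_seconds: how much work (in estimated seconds) the client wants.
    estimate_runtime:  hypothetical function mapping a work unit to the
                       client's own runtime estimate for it, in seconds.
    server_batches:    what the server happens to send on each request;
                       an empty list stands for "No work sent".
    Returns (number of requests made, seconds of work buffered).
    """
    buffered = 0.0
    requests = 0
    for batch in server_batches:
        # The client only asks again if it still needs more work.
        if buffered >= requested_seconds:
            break
        requests += 1
        # Only the client knows how long each unit will take on this host.
        buffered += sum(estimate_runtime(wu) for wu in batch)
    return requests, buffered


# Assumed example: a host wants 10,000 seconds of work; the server's
# replies are 0, 2, 1, and 1 work units on successive requests.
batches = [[], [3000, 3000], [2500], [2000], [9999]]
result = fetch_until_satisfied(10000, lambda wu: wu / 1.0, batches)
print(result)
```

In this made-up run the client needs four requests before its buffer is full, which is the same pattern of several small follow-up requests seen in the log.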

Now as to your specific request that got none: in this case either the splitters couldn't make work fast enough or, much more likely, the internal queue between the splitter and the server backed up and the server was itself waiting because too much data was being moved around. That backlog may have cleared a few microseconds after your request arrived. And in the 16 seconds before your machine asked for work again, it may have formed and gone away a few dozen times. IIRC Matt said it grabs work units from the splitter queue in 100-unit hunks. Matt might want to tune this value so it is higher right after he brings the project back up, if it has been offline for a while, to reduce thrashing, but he knows much better than I do where the bottlenecks are and whether it is even possible to tune this value dynamically.
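
To make the timing argument concrete, here is a toy model of a server that pulls work from the splitter queue in fixed-size chunks. It is a sketch only: `ChunkedFeeder`, `refill`, and `request` are hypothetical names, and nothing here is taken from the real BOINC server code. The 100-unit chunk size is the figure recalled above.

```python
from collections import deque


class ChunkedFeeder:
    """Toy server-side cache refilled from a splitter queue in chunks."""

    def __init__(self, splitter_queue, chunk_size=100):
        self.splitter_queue = deque(splitter_queue)
        self.chunk_size = chunk_size
        self.cache = deque()

    def refill(self):
        """Move up to chunk_size units from the splitter queue to the cache."""
        moved = 0
        while self.splitter_queue and moved < self.chunk_size:
            self.cache.append(self.splitter_queue.popleft())
            moved += 1
        return moved

    def request(self, wanted):
        """Hand out up to `wanted` units; an empty list means 'No work sent'."""
        sent = []
        while self.cache and len(sent) < wanted:
            sent.append(self.cache.popleft())
        return sent


feeder = ChunkedFeeder(range(250), chunk_size=100)
print(feeder.request(2))  # arrives just before a refill, so nothing is sent
print(feeder.refill())    # the next chunk is pulled from the splitter queue
print(feeder.request(2))  # the same request a moment later succeeds
```

A request that lands just before a refill gets nothing even though the splitter queue holds plenty of work, which matches "No work sent" followed by a successful request a minute later.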

As to the ratio of MB to AP: as long as the queues for both are being kept reasonably full, the ratio is correct. If it weren't, one would overflow and the other would be empty.


John G

Joined: 29 Dec 01
Posts: 68
Credit: 10,932,850
RAC: 0
Canada
Message 887733 - Posted: 23 Apr 2009, 21:50:29 UTC

Hey Guys and Gals

Uploads and downloads just went down. Hopefully it's just preventative maintenance?

regards