Fast One (May 16 2007)
Author | Message |
---|---|
Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
Quick note as I gotta catch a bus.. Wow - what a mess. I think we're in the middle of our biggest outage recovery to date, and it's breaking everything. The good news is we're coming into some newer hardware which we'll get online to help somehow. See Eric's thread in the Staff Blog. He's been working overtime getting a new Frankenstein machine together to act as another upload/download server and reduce the load on bruno. The scheduling server (galileo) has been choking - I just now moved all that over to bruno as well. So we may retire galileo soon, too. Jeff has been going nuts trying to track down errors in validator/assimilator code so we can get those online as well. And our old friend "slow feeder query" is back, probably just being aggravated by the heavy load. Gotta go.. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
KB7RZF Joined: 15 Aug 99 Posts: 9549 Credit: 3,308,926 RAC: 2 |
Matt, thanks for the quick update. We're all keeping our fingers crossed and wish you all good luck getting things sorted. You guys keep up the awesome job; we know it's a pain in the rear. Jeremy |
Flyer Joined: 8 Aug 00 Posts: 3 Credit: 545,047 RAC: 0 |
Matt and company, thanks for the great effort. Take your time and get it fixed correctly, and we'll all be better off for it. Again, thanks. Flyer |
Joined: 29 Aug 01 Posts: 12 Credit: 2,493,076 RAC: 3 |
I know it's all in good hands. Keep up your spirits; there should be light at the end of the tunnel, so don't let it startle you when you come upon it. Best of luck & wishes, Jim USAF Projects Page My Home Site |
Claudel Joined: 2 Dec 00 Posts: 1 Credit: 109,396 RAC: 0 |
Would it help if everybody stopped asking for new work? |
tombew Joined: 12 Apr 00 Posts: 111 Credit: 12,182,261 RAC: 0 |
Thanks for the update. |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
"Would it help if everybody stopped asking for new work?" While a noble effort, you have no chance of getting that level of cooperation. I agree with what I've seen mentioned elsewhere: the projects need an additional throttle mechanism built into BOINC, a "break glass" option that is only used in dire circumstances... Something that puts a little more control into their hands so they can bring wildly out-of-control processes back under control more quickly. |
Joined: 4 Sep 99 Posts: 3868 Credit: 2,697,267 RAC: 0 |
First, let me say thanks to the SAH staff for their efforts over the last few weeks, both in fixing things and in keeping us informed. Second, I finally got a WU, and I'm trying to report it, but I'm getting a new error message - new to me anyway. 5/16/2007 8:47:39 PM|SETI@home|Sending scheduler request: To report completed tasks 5/16/2007 8:47:39 PM|SETI@home|Requesting 1863 seconds of new work, and reporting 1 completed tasks 5/16/2007 8:48:04 PM|SETI@home|Scheduler RPC succeeded [server version 509] 5/16/2007 8:48:04 PM|SETI@home|Message from server: Incomplete request received. Anything I should be doing from my end, or is this part of the general mess already underway? |
divedude Joined: 5 Jun 06 Posts: 9 Credit: 4,394,705 RAC: 0 |
You guys do an awesome job, and I hope that the work we all do helping to process the work units results in something. But is my understanding from the forums correct that you rely on one to three servers to upload/download units and process them, with no backup servers? A single server going down should not have resulted in two weeks or more of downtime. I have just now started getting work, but my uploads are not working. We understand that the project is based on donations, but a project this large should have backup servers in place before operating. Can we as a community petition Sun to donate more server hardware to enhance the program? |
Martin Johnson Joined: 9 Jun 01 Posts: 201 Credit: 224,995 RAC: 0 |
I just got this too for the first time, plus New Host Venue (??): 01-44-20|SETI@home|Sending scheduler request: Requested by user 01-44-20|SETI@home|(not requesting new work or reporting completed tasks) 01-44-35|SETI@home|Scheduler RPC succeeded [server version 509] 01-44-35|SETI@home|Message from server: Incomplete request received. 01-44-35|SETI@home|New host venue: 01-44-35||General prefs: from SETI@home (last modified 2007-04-01 01:31:42) 01-44-35||Host location: none 01-44-35||General prefs: using your defaults 01-44-35|SETI@home|Deferring communication for 11 sec 01-44-35|SETI@home|Reason: requested by project |
Brian Silvers Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0 |
I think everyone is seeing that now. Not sure what it is. It probably won't be fixed until at least 15:00 GMT today (05/17), unless someone is staying late again... Edit: Had to change "tomorrow" to "today" because I'm in EDT, so it is already "tomorrow" as far as GMT is concerned... |
Joined: 3 Apr 99 Posts: 9659 Credit: 251,998 RAC: 0 |
"You guys do an awesome job, and I hope that the work we all do helping to process the work units results in something. But is my understanding from the forums correct that you rely on one to three servers to upload/download units and process them, with no backup servers? A single server going down should not have resulted in two weeks or more of downtime. I have just now started getting work, but my uploads are not working. We understand that the project is based on donations, but a project this large should have backup servers in place before operating. Can we as a community petition Sun to donate more server hardware to enhance the program?" Hi divedude, and welcome to the boards. :-) Yes, you have got it right: they are operating on a shoestring, relying entirely on money and hardware donations. They don't have any grants, and the server failures we have seen here over the past weeks are a result of working with old, obsolete hardware. They got the Thumper from Sun last year, but it was a beta test model, and the one they got to replace Thumper was not a donation; they got it for a reduced price. So what they have to work with at the moment is donated hardware, and it seems that the old servers are giving up one by one. Of fatigue, I suppose. And no, they don't have any backup servers, hence the long outages and difficulties of the past weeks. So all donations are welcome, money and usable hardware. You can see their budget here. In case you would like to donate money, please click on the link in my sig. Thank you. "I'm trying to maintain a shred of dignity in this world." - Me |
Joined: 30 Nov 03 Posts: 66507 Credit: 55,293,173 RAC: 49 |
"You guys do an awesome job, and I hope that the work we all do helping to process the work units results in something. But is my understanding from the forums correct that you rely on one to three servers to upload/download units and process them, with no backup servers? A single server going down should not have resulted in two weeks or more of downtime. I have just now started getting work, but my uploads are not working. We understand that the project is based on donations, but a project this large should have backup servers in place before operating. Can we as a community petition Sun to donate more server hardware to enhance the program?" Fatigue, figures. The SETI users must be driving the old servers into a metal breakdown. ;) Hopefully nothing else will go wrong, as I'm about to put up a 5th cruncher on a shoestring of my own (I need to replace one or two PSUs eventually). CA HSR built a foundation, is laying Track! PRR T1 Class 4-4-4-4 #5550 Loco, US's 1st HST |
Mithotar Joined: 11 Apr 01 Posts: 88 Credit: 66,037,385 RAC: 50 |
"Would it help if everybody stopped asking for new work?" It might help, but as noted elsewhere it's not likely to happen. I have 5 PCs doing BOINC... I have shut down BOINC on 4 of the 5 and left just the 1 running BOINC... my "canary", if you like... It's not much, but every little bit will help get things back to normal... |
TarracoServer Joined: 11 Apr 07 Posts: 38 Credit: 595,022 RAC: 0 |
I don't think stopping BOINC clients would be the best option, because the problem will come when they say "You can connect now": another overflow for Bruno. The best approach is what they're doing: a new up/download server to free up Bruno. Keep up the good job! |
Teasel Joined: 16 May 03 Posts: 2 Credit: 3,467,167 RAC: 0 |
"Would it help if everybody stopped asking for new work?" I'm no network expert, but how about simply firewalling out half of the internet? That might relieve the load on the servers sufficiently that they could achieve some reasonable throughput and clear some of the backlog. A bit tough on the half that's firewalled, but the chunk of IP addresses allowed through could be changed every few hours to give everyone a chance. |
Joined: 4 Dec 99 Posts: 12 Credit: 1,401,540 RAC: 0 |
"Would it help if everybody stopped asking for new work?" I have done the same.. I have 11 PCs set up for SETI, but only one is running SETI right now (and at the moment it's not requesting more work). The other 10 are running Rosetta until the connection problems get better :) I think that will help reduce the load on the project.. |
Joined: 4 Dec 99 Posts: 12 Credit: 1,401,540 RAC: 0 |
"Would it help if everybody stopped asking for new work?" That's a really good idea! I'm no network expert either, but it sounds like a good idea.. If the bottleneck is the servers and not the firewall itself, that would work, I think.. :) |
Joined: 12 Jun 99 Posts: 105 Credit: 5,858,225 RAC: 0 |
|
Joined: 25 Nov 01 Posts: 21533 Credit: 7,508,002 RAC: 20 |
"Would it help if everybody stopped asking for new work?" There is "exponential backoff" built into the Boinc manager that is designed to avoid giving a Boinc project a DDoS from its own clients. Perhaps that feature needs to be looked at again... There's also a problem/vulnerability in the Boinc communication protocol: completing a transaction requires more than one TCP connection, and additional wasteful overhead is generated if any part of the sequence fails. Worse still, under heavy load, the chance of getting the subsequent connections needed to complete the sequence keeps shrinking (choked by all the other first-connection attempts from everyone else) until no one can complete the sequence... Load shaping that gives higher priority to connections further along the sequence, so that once you're in you are guaranteed to complete the transfer, would likely help greatly. Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
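
For readers unfamiliar with the term, the "exponential backoff" idea mentioned in the last post can be sketched roughly as follows. This is a generic Python illustration of capped exponential backoff with random jitter, not BOINC's actual client code: the function names `backoff_ceiling` and `backoff_delay`, the 60-second base, and the 4-hour cap are all assumptions made up for the example.

```python
import random


def backoff_ceiling(attempt, base=60.0, cap=4 * 3600.0):
    """Upper bound (seconds) on the wait before retry number `attempt` (0-based).

    `base` and `cap` are hypothetical values chosen for illustration.
    """
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    exponent = min(max(attempt, 0), 32)
    # Double the bound on every failed attempt, but never exceed `cap`.
    return min(cap, base * (2 ** exponent))


def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Pick a random wait in [0, ceiling] so clients do not all retry at once."""
    return rng.uniform(0.0, backoff_ceiling(attempt, base, cap))


if __name__ == "__main__":
    # Show how the bound grows as consecutive failures pile up.
    for attempt in range(10):
        print(attempt, backoff_ceiling(attempt), round(backoff_delay(attempt), 1))
```

The random jitter is what keeps thousands of clients from retrying in lockstep after an outage ends; the cap keeps a long outage from pushing the next retry out indefinitely. Whether BOINC's own backoff uses jitter or these particular limits is not something this thread establishes.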