Is it possible to improve performance following 'outages'?
Author | Message |
---|---|
Dorsai Send message Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0 |
There must be some way for BOINC to 'change' its behaviour to take into account 'grid-lock' following an outage. Perhaps the project could issue a "suspend uploads for 1 day, on results that have a deadline that is more than 3 days away" instruction to users, so that the server could concentrate on receiving uploads that are 'urgent'. Similarly, an instruction "you have more than 3 days' work, so you will have to wait 24 hours before you try to get any more work" could be issued, so that the server could concentrate on getting work to dry users. Clearly the time factors could be whatever the project wanted. Does anyone think this would help ease the pressure, allowing users close to deadlines to upload, and users close to running out to download? All the server would need to do is notice that it is running at 100% and reply to requests with "I'm busy", and then the BOINC client could say: "The server is busy. I need work, but I have enough for 3 days... All finished results are not due in for a week. I will try tomorrow." OR "The server is busy. I need work, but have 3 days' worth. My finished results are due in tomorrow, so I will continue trying to return them now, but will wait till tomorrow before trying to get more work." etc? BOINC already has a panic mode for doing work, so why not add this feature to the returning and requesting of work? That way, following an outage, as soon as someone tries to connect, BOINC would try once, find the server busy, and then make a decision as to how urgent the desire to connect was. Foamy is "Lord and Master". (Oh, + some Classic WUs too.) |
Tigher Send message Joined: 18 Mar 04 Posts: 1547 Credit: 760,577 RAC: 0 |
I agree. I expressed similar sentiments in another thread; that you can't have a new WU until you are down to your last one/two or one for each CPU you have. Also that priority be given to downloads rather than getting results back....which can wait. Well worth doing. Their target is 1M client hosts....it will kill 'em! |
RPMurphy Send message Joined: 2 Jun 00 Posts: 131 Credit: 622,641 RAC: 0 |
I don't think that would really solve the problem. Imagine that, after a 2-3 day outage, everyone with built-up WUs who needs to upload them and get new ones were instructed by the server to wait 48 hours... yep, you guessed it. After that mandatory waiting period is over we're right back to hammering the server and maybe even getting dangerously close to missing deadlines. On the surface it's a great idea, but in reality I think it would just delay the inevitable. It is a sad sad day when someone takes your spoon away from you... |
MikeSW17 Send message Joined: 3 Apr 99 Posts: 1603 Credit: 2,700,523 RAC: 0 |
BOINC already has a backing-off mechanism. If a scheduler request fails, it initially retries after about 1 minute. If that and successive requests fail, each retry is delayed further and further into the future. A very brief communication failure/server outage will usually be met by a quick retry, longer breaks by longer retries. I believe it has both a random and an exponential element in determining the next request time. Of course, in designing this method, the developers must have had some expectations regarding what actual down times might be expected. Priority seems also to be given to sending out new work after an outage. As far as upload priority goes, IMO it doesn't make sense to run your 'cache' or 'connect days' too close to the deadline anyhow, and in SETI it's not possible as max cache=10 and deadline is around 14 days, so there's a 4 day window. I suppose in the event of a _really_ major outage (>=1 week) the project admins could/would relax the deadlines rather than just reject masses of work. |
Tony D Send message Joined: 20 Mar 04 Posts: 73 Credit: 330,114 RAC: 0 |
I have had a weird idea, so think about it before the flames, ok? What if we were to take this "distributed computing" idea a little further? I know that some of us have servers we have available/can get/can procure or build, so why don't we all get together and make them available to SETI so that, in some little way, we may be able to prevent the SCIENCE from suffering a setback during the likes of these outages we are ALL suffering right now? As I have said elsewhere in these fora, I have just suffered the pain of redundancy and find myself with some enforced time on my hands. If there is ANYTHING I can do to help this project, please let me know and I will be there... immediately. I am a systems engineer and have a modicum of knowledge about how computer systems work, so I may be of some little help if required. Having read the posts of the regulars here, I also know that there is a lot of skill, knowledge and dedication within the ranks of my fellow BOINCers which, I am sure, could be put to some good use by the people at Berkeley if they feel it could be of some use, or, maybe, if someone would take the original idea and expand it a little. I'm not saying it would be a simple undertaking, but neither was SETI. All it took was someone to come up with the idea and the rest followed, with others working out how it could be done, and then still others tasked with the implementation, until we are where we are now. So, I suppose, what I'm asking is: why don't we all get together and try to come up with a way that a backup to Berkeley can be implemented? Let's face it, we have a lot of the World's computing power in our hands and, as I've said earlier, from what I've gathered from the fora, some of the World's smartest people too. (And a few idiots, it has to be said, but let's ignore them for now, ok?) Just one small suggestion, but let's try and take it further, eh? Kindest regards to all my fellow crunchers Tony I may be getting older but I refuse to grow up! |
Dorsai Send message Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0 |
BOINC already has a backing-off mechanism. Yup, it does. But: do you think my backing off for 60 seconds to upload a result that is not due in for 10 days will be helpful to a user with a result that is due in 24 hours? Ditto, will my backing off downloading a new WU for 60 seconds, when I have 35 to go, help a user who has none left? I have 'disabled net access'. I have work to last till the middle of next week. So I am not trying to upload/download, so that the overstretched server can deal with those users who are 'urgently trying' to get work, or 'urgently trying' to upload work before it's out of date. I'M OK JACK. Just trying to help those that are not, with a suggestion. Foamy is "Lord and Master". (Oh, + some Classic WUs too.) |
RichaG Send message Joined: 20 May 99 Posts: 1690 Credit: 19,287,294 RAC: 36 |
Have you noticed that uploads already have low priority? Plus they usually get set to wait at least 3 hours. Downloads have higher priority and get serviced fairly soon after an outage. What you're suggesting will only imply people should have large connect times and caches. I have found using 4.43 that a connect time larger than a few days will cause Einstein@home to go into panic mode before it has even got to 20% done. They have a very short deadline. Red Bull Air Racing Gas price by zip at Seti |
Tigher Send message Joined: 18 Mar 04 Posts: 1547 Credit: 760,577 RAC: 0 |
I have had a weird idea, so think about it before the flames, ok? Well I'm up for it. You have a thread going on this? |
W-K 666 Send message Joined: 18 May 99 Posts: 19075 Credit: 40,757,560 RAC: 67 |
Priority seems also to be given to sending out new work after an outage. One says you're right here, one shouldn't work too close to the edge. But I refer you to my post yesterday: CC V4.45 and Commsunication disruptions, where the system put my computer too close to the edge. Andy |
Dorsai Send message Joined: 7 Sep 04 Posts: 474 Credit: 4,504,838 RAC: 0 |
What you're suggesting will only imply people should have large connect times and caches. I'm not telling anyone what sized cache they want/need. For you a small cache meets your needs/requirements. That's fine by me. When you need more work though, I (or those like me) are there, with enough work to last 5 days, and finished work not due in for 10 days, tying up the server downloading/uploading work when we don't need to yet. It makes sense to me that users like me should be told to back off for a big bit, to let other users who have more pressing needs upload their 'close to deadline' results and get new work, as they have none left. But if this is a dumb idea then so be it. It does seem clear to me that the current system does not work well at all following even short outages. There must be a better way. If my suggestion is not it, then fine. So go on everyone, come up with a better idea than mine. Happy crunching... Foamy is "Lord and Master". (Oh, + some Classic WUs too.) |
Ace Casino Send message Joined: 5 Feb 03 Posts: 285 Credit: 29,750,804 RAC: 15 |
Per Tony: It's a nice "dream" but I don't really think Berkeley would ever go for it. It's their baby, and something like that would require hands-on or face-to-face work with the designers. They would have to give you a lot of info that is private and/or confidential. "You never know", but I don't think it would ever happen with relatively unknown people from the internet. |
Tigher Send message Joined: 18 Mar 04 Posts: 1547 Credit: 760,577 RAC: 0 |
Per Tony: Don't disagree really, but their baby won't live without our CPU time - fact. Also it's open source... there are NO secrets. I have all the code to change and build a complete system - others have it too. So... if it's all open, why can't the goal setting and design part be open too? There are some smart people in this community who might just have some decent thoughts about how this might be done, tested, developed, run and supported. More... the vast majority would be willing and free! |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Per Tony: And that is indeed how the new CPU scheduler came about. Some of us identified a problem, designed a fix, I implemented it, and convinced David to adopt it. BOINC WIKI |
Tigher Send message Joined: 18 Mar 04 Posts: 1547 Credit: 760,577 RAC: 0 |
Per Tony: I was perhaps thinking of a wider group. |
Tony D Send message Joined: 20 Mar 04 Posts: 73 Credit: 330,114 RAC: 0 |
I have had a weird idea, so think about it before the flames, ok? I have, Ian. It's here ... http://setiweb.ssl.berkeley.edu/forum_thread.php?id=15622#122345 kind regards Tony I may be getting older but I refuse to grow up! |
RPMurphy Send message Joined: 2 Jun 00 Posts: 131 Credit: 622,641 RAC: 0 |
A mirror on a sister-campus for the heavy load systems, perhaps on the East Coast? Dorsai's idea I think is a valid one, just very hard if not impossible to implement with current setup. The mirror could be anywhere in the world that the existing BOINC management has a brain trust. It is a sad sad day when someone takes your spoon away from you... |
Geek@Play Send message Joined: 31 Jul 01 Posts: 2467 Credit: 86,146,931 RAC: 0 |
I would say that there is plenty of room to improve their performance. Look at this. It shows the bandwidth used by the Cogent link. Remember this link is shared with Seti Classic, so 35 to 40M is their used bandwidth. In the hourly chart I have been watching the spikes for the last 6 hours. When they spike up, uploads and downloads happen really nicely. When the spikes go down, uploads and downloads almost stop. It looks to me like the upload and download server gets really busy and is unable to answer IO requests even though the bandwidth is available, and then it is freed up and able to handle nearly double the IO requests, which makes the peaks. Make sense? This is the same thing that happened before they shut down for the OS upgrade, which you can see by looking at the daily graph. I think they need to find out what procedure is IO binding that server. Boinc....Boinc....Boinc....Boinc.... |
Ace Casino Send message Joined: 5 Feb 03 Posts: 285 Credit: 29,750,804 RAC: 15 |
I know that some of us have servers we have available/can get/can procure or build, so why don't we all get together and make them available to SETI. ////////////////////////////////////////////////////////////////////////// Sorry if I ruffled some feathers. Maybe I was not clear or you did not understand what I meant. It was the above part I was referring to. I still might be wrong, but I don't think the people at Berkeley are going to allow just anyone to distribute and/or receive WUs at their own "private server". That's what I meant by their baby and confidential info (computers/servers communicating with each other). If someone could design or improve something that they could use on a new or existing server "at Berkeley", that is very different. ///////////////////////////////////////////////////////////////// And that is indeed how the new CPU scheduler came about. Some of us identified a problem, designed a fix, I implemented it, and convinced David to adopt it. ///////////////////////////////////////////// That's great and I know there are a lot of smart computer people out here. But it's not being run from your server at home or business. Maybe if someone designed a fix and it required a server we could all pitch in $5 to buy them a new one. |
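For anyone who wants to play with the two ideas argued over in this thread, here is a rough Python sketch. It is not BOINC's actual code and not anything the project has adopted. The first function illustrates the kind of back-off MikeSW17 describes (a retry after about 1 minute, growing exponentially, with a random element); the 4 hour cap and the "half to full" jitter range are assumptions. The second illustrates Dorsai's "how urgent am I?" decision when the server says it is busy; the 3 day thresholds and 24 hour deferral come from his post, while the names `ClientState`, `next_retry_delay` and `plan_when_busy` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


def next_retry_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait after `failures` consecutive failed requests.

    Exponential back-off with random jitter. `base` (about 1 minute) follows
    the thread; `cap` and the jitter range are assumptions for illustration.
    """
    exponent = min(max(failures - 1, 0), 30)   # bound the exponent
    ceiling = min(cap, base * 2 ** exponent)   # doubles per failure, up to cap
    # Pick a random point in the upper half so clients do not retry in lockstep.
    return rng.uniform(ceiling / 2, ceiling)


@dataclass
class ClientState:
    days_of_work_cached: float
    # None means there are no finished results waiting to be uploaded.
    days_until_earliest_deadline: Optional[float]


def plan_when_busy(state, work_threshold_days=3.0,
                   deadline_threshold_days=3.0, defer_hours=24.0):
    """Decide what to do after the server answers "I'm busy".

    Returns hours to wait before the next upload and fetch attempt.
    0.0 means "keep retrying with the normal back-off".
    """
    deadline = state.days_until_earliest_deadline
    upload_urgent = deadline is not None and deadline <= deadline_threshold_days
    fetch_urgent = state.days_of_work_cached < work_threshold_days
    return {
        "upload_wait_hours": 0.0 if upload_urgent else defer_hours,
        "fetch_wait_hours": 0.0 if fetch_urgent else defer_hours,
    }


if __name__ == "__main__":
    # Dorsai's second example: 3 days of work cached, results due tomorrow.
    print(plan_when_busy(ClientState(3.0, 1.0)))
    print([round(next_retry_delay(n)) for n in range(1, 8)])
```

Note that RPMurphy's objection still applies to a fixed `defer_hours`: everyone told to wait exactly 24 hours comes back at the same moment. Randomising the deferral in the same way as the retry delay would spread that second wave out.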