Is it possible to improve performance following 'outages'?

Message boards : Number crunching : Is it possible to improve performance following 'outages'?
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 121801 - Posted: 10 Jun 2005, 18:03:43 UTC
Last modified: 10 Jun 2005, 18:06:59 UTC

There must be some way for BOINC to 'change' its behaviour to take into account 'grid-lock' following an outage.

Perhaps the project could issue a "suspend uploads for 1 day, on results that have a deadline that is more than 3 days away" instruction to users, so that the server could concentrate on receiving uploads that are 'urgent'.

Similarly, an instruction "you have more than 3 days work, so will have to wait 24 hours before you try and get any more work" could be issued, so that the server could concentrate on getting work to dry users.

Clearly the time factors could be whatever the project wanted.

Does anyone think this would help ease the pressure, allowing users close to deadlines to upload, and users close to running out to download?

All the server would need to do is notice that it is running at 100% and reply to requests with "I'm busy"; the BOINC client could then say:

"The server is busy. I need work, but I have enough for 3 days...None of my finished results are due in for a week. I will try tomorrow."

OR

"The server is busy. I need work, but have 3 days' worth. My finished results are due in tomorrow, so I will continue trying to return them now, but will wait till tomorrow before trying to get more work."

etc?

BOINC already has a panic mode for doing work, so why not add this feature to the returning and requesting of work?
That way, following an outage, as soon as someone tries to connect, BOINC would try once, find the server busy, and then make a decision as to how urgent the need to connect was.
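
To make the suggestion concrete, here is a rough sketch of the decision the client might take after an "I'm busy" reply. It is only an illustration under stated assumptions: `HostState`, `on_server_busy` and all of the thresholds are hypothetical names and values, not anything that exists in BOINC today.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Hypothetical thresholds; the project could set these to whatever it wanted.
COMFORTABLE_BUFFER = timedelta(days=3)  # having at least this much work is "not urgent"
URGENT_DEADLINE = timedelta(days=3)     # results due within this window are "urgent"
SHORT_RETRY = timedelta(minutes=1)      # keep trying soon
LONG_DEFER = timedelta(hours=24)        # "I will try tomorrow"


@dataclass
class HostState:
    work_remaining: timedelta              # estimated time until the host runs dry
    nearest_deadline: Optional[timedelta]  # time until the earliest finished result is due,
                                           # or None if nothing is waiting to upload


def on_server_busy(state: HostState) -> dict:
    """Decide how long to defer uploads and work requests after an "I'm busy" reply."""
    upload_urgent = (state.nearest_deadline is not None
                     and state.nearest_deadline <= URGENT_DEADLINE)
    fetch_urgent = state.work_remaining < COMFORTABLE_BUFFER
    return {
        "upload_delay": SHORT_RETRY if upload_urgent else LONG_DEFER,
        "fetch_delay": SHORT_RETRY if fetch_urgent else LONG_DEFER,
    }
```

With 3 days of work in hand and results not due for a week, both delays come out as 24 hours; with results due tomorrow, only the upload keeps retrying.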


Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 121801 · Report as offensive
Profile Tigher
Volunteer tester

Send message
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 121807 - Posted: 10 Jun 2005, 18:14:12 UTC

I agree. I expressed similar sentiments in another thread: that you shouldn't get a new WU until you are down to your last one or two, or one for each CPU you have. Also that priority be given to downloads rather than getting results back....which can wait. Well worth doing. Their target is 1M client hosts....it will kill 'em!

ID: 121807 · Report as offensive
Profile RPMurphy
Volunteer tester
Joined: 2 Jun 00
Posts: 131
Credit: 622,641
RAC: 0
United States
Message 121861 - Posted: 10 Jun 2005, 20:29:31 UTC

I don't think that would really solve the problem.
Imagine that after a 2-3 day outage everyone with
built-up WUs who needs to upload them and get new ones
is instructed by the server to wait 48 hours...yep, you
guessed it. After that mandatory waiting period is over we're
right back to hammering the server and maybe even getting
dangerously close to missing deadlines.

On the surface it's a great idea, but in reality I think
it would just delay the inevitable.
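
The objection can be illustrated with a toy simulation, offered only as a sketch. The client count, the 48-hour wait and the 24-hour spreading window are assumptions, and the randomised variant is added purely for contrast; it is not what was proposed above.

```python
import random
from collections import Counter


def peak_load(retry_hours):
    """Largest number of clients that reconnect within the same hour."""
    return max(Counter(retry_hours).values())


def fixed_wait(n_clients, wait_hours=48):
    # Every client is told to come back after exactly the same delay,
    # so the whole crowd arrives together once the wait is over.
    return [wait_hours] * n_clients


def spread_wait(n_clients, wait_hours=48, window_hours=24, seed=0):
    # Each client adds its own random offset, so the return is spread
    # over `window_hours` instead of landing in a single hour.
    rng = random.Random(seed)
    return [wait_hours + rng.randrange(window_hours) for _ in range(n_clients)]


if __name__ == "__main__":
    n = 10_000
    print("fixed wait, worst hour: ", peak_load(fixed_wait(n)))
    print("spread wait, worst hour:", peak_load(spread_wait(n)))
```

With a fixed wait the worst hour sees every client at once; with the random offset the same crowd is divided across the window, at the price of some clients waiting longer.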
It is a sad sad day when someone takes your spoon away from you...
ID: 121861 · Report as offensive
Profile MikeSW17
Volunteer tester

Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 121901 - Posted: 10 Jun 2005, 22:15:38 UTC

BOINC already has a backing-off mechanism.
If a scheduler request fails, then initially it re-tries after about 1 minute.
If that and successive requests fail, each retry is delayed further and further into the future.

A very brief communication failure/server outage will usually be met by a quick retry, longer breaks by longer retries.

I believe it has both a random and exponential element in determining the next request time.
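
For anyone unfamiliar with the idea, a generic exponential back-off with a random element looks roughly like the sketch below. This is not BOINC's actual code: the function name `next_retry_delay`, the 60-second base and the 4-hour cap are assumptions chosen for illustration.

```python
import random


def next_retry_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before the next request after `failures` consecutive failures."""
    if failures < 1:
        return 0.0
    # Exponential element: the window doubles with each failure, up to `cap`.
    exponent = min(failures - 1, 30)  # bounded so the arithmetic stays safe
    ceiling = min(cap, base * 2 ** exponent)
    # Random element: pick a point in the upper half of the window so that
    # clients which failed at the same moment do not all retry in lockstep.
    return rng.uniform(ceiling / 2, ceiling)


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, round(next_retry_delay(n)))
```

The first failure gives a retry within about a minute, and each further failure pushes the next attempt further out until the cap is reached.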

Of course, in designing this method, the developers must have had some expectations re what actual down times might be expected.

Priority seems also to be given to sending out new work after an outage.
As far as upload priority goes, IMO it doesn't make sense to run your 'cache' or 'connect days' too close to the deadline anyhow - and in SETI it's not possible, as max cache=10 days and the deadline is around 14 days, so there's a 4 day window.
I suppose in the event of a _really_ major outage (>=1 week) the project admins could/would relax the deadlines rather than just reject masses of work.


ID: 121901 · Report as offensive
Profile Tony D

Joined: 20 Mar 04
Posts: 73
Credit: 330,114
RAC: 0
United Kingdom
Message 121914 - Posted: 10 Jun 2005, 22:45:32 UTC

I have had a weird idea, so think about it before the flames, ok?

What if we were to take this "distributed computing" idea a little further?

I know that some of us have servers we have available/can get/can procure or build, so why don't we all get together and make them available to SETI so that, in some little way, we may be able to prevent the SCIENCE from suffering a setback during the likes of these outages we are ALL suffering right now? As I have said elsewhere in these fora, I have just suffered the pain of redundancy and find myself with some enforced time on my hands. If there is ANYTHING I can do to help this project, please let me know and I will be there... immediately.

I am a systems engineer and have a modicum of knowledge about how computer systems work, so I may be of some little help if required.

Having read the posts of the regulars here, I also know that there is a lot of skill, knowledge and dedication within the ranks of my fellow BOINCers which, I am sure, could be put to some good use by the people at Berkeley if they feel it could be of some use, or, maybe, if someone would take the original idea and expand it a little.

I'm not saying it would be a simple undertaking, but neither was SETI. All it took was someone to come up with the idea and the rest followed with others working out how it could be done, and then still others tasked with the implementation until we are where we are now.

So, I suppose, what I'm asking is why don't we all get together and try to come up with a way that a backup to Berkeley can be implemented? Let's face it, we have a lot of the World's computing power in our hands and, as I've said earlier, from what I've gathered from the fora, some of the World's smartest people too. (and a few idiots, it has to be said, but let's ignore them for now, ok?)

Just one small suggestion, but let's try and take it further, eh?

Kindest regards to all my fellow crunchers

Tony
I may be getting older but I refuse to grow up!

ID: 121914 · Report as offensive
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 121915 - Posted: 10 Jun 2005, 22:51:57 UTC - in response to Message 121901.  

BOINC already has a backing-off mechanism.
If a scheduler request fails, then initially it re-tries after about 1 minute.


Yup, it does.

But:

Do you think my backing off for 60 seconds to upload a result that is not due in for 10 days will be helpful to a user with a result that is due in 24 hours?

Ditto, will my backing off downloading a new WU for 60 seconds, when I have 35 to go, help a user who has none left?

I have 'disabled net access'.

I have work to last till the middle of next week.

So I am not trying to upload/download, so that the overstretched server can deal with those users who are 'urgently trying' to get work, or 'urgently trying' to upload work before it's out of date.

I'M OK JACK.

Just trying to help those that are not, with a suggestion.

Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 121915 · Report as offensive
Profile RichaG
Volunteer tester
Joined: 20 May 99
Posts: 1690
Credit: 19,287,294
RAC: 36
United States
Message 121986 - Posted: 11 Jun 2005, 2:13:55 UTC
Last modified: 11 Jun 2005, 2:14:18 UTC

Have you looked? The uploads already have low priority, plus they usually get set to wait at least 3 hours.
Downloads have higher priority and get serviced fairly soon after an outage.

What you're suggesting will only imply people should have large connect times and caches.

I have found out using 4.43 that a connect time larger than a few days will cause Einstein@home to go into panic mode before it has even got to 20% done. They have a very short deadline.
Red Bull Air Racing

Gas price by zip at Seti

ID: 121986 · Report as offensive
Profile Tigher
Volunteer tester

Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 122109 - Posted: 11 Jun 2005, 12:35:01 UTC - in response to Message 121914.  


Well I'm up for it. You have a thread going on this?

ID: 122109 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19060
Credit: 40,757,560
RAC: 67
United Kingdom
Message 122183 - Posted: 11 Jun 2005, 16:21:43 UTC - in response to Message 121901.  
Last modified: 11 Jun 2005, 16:22:11 UTC

Priority seems also to be given to sending out new work after an outage.
As far as upload priority goes, IMO it doesn't make sense to run your 'cache' or 'connect days' too close to the deadline anyhow - and in SETI it's not possible as max cache=10 and deadline is around 14 days. so there's a 4 day window.
I suppose in the event of a _really_ major outage (>=1 week) the project admins could/would relax the deadlines rather than just reject masses of work.



I'd say you're right here that one shouldn't work too close to the edge. But I refer you to my post yesterday:-
CC V4.45 and Commsunication disruptions
where the system put my computer too close to the edge.

Andy
ID: 122183 · Report as offensive
Profile Dorsai
Joined: 7 Sep 04
Posts: 474
Credit: 4,504,838
RAC: 0
United Kingdom
Message 122210 - Posted: 11 Jun 2005, 17:14:37 UTC - in response to Message 121986.  
Last modified: 11 Jun 2005, 17:17:39 UTC

What you're suggesting will only imply people should have large connect times and caches.

I have found out using 4.43 that a connect time larger than a few days will cause Einstein@home to go into panic mode before it has even got to 20% done. They have a very short deadline.


I'm not telling anyone what sized cache they want/need.
For you a small cache meets your needs/requirements. That's fine by me. When you need more work, though, I (or those like me) are there, with enough work to last 5 days and finished work not due in for 10 days, tying up the server downloading/uploading work when we don't need to yet.
It makes sense to me that users like me should be told to back off for a good while, to let other users who have more pressing needs upload their 'close to deadline' results and get new work, as they have none left.

But if this is a dumb idea then so be it.

It does seem clear to me that the current system does not work well at all following even short outages.
There must be a better way.
If my suggestion is not it, then fine.
So go on everyone, come up with a better idea than mine.

Happy crunching...



Foamy is "Lord and Master".
(Oh, + some Classic WUs too.)
ID: 122210 · Report as offensive
Profile Ace Casino
Joined: 5 Feb 03
Posts: 285
Credit: 29,750,804
RAC: 15
United States
Message 122264 - Posted: 11 Jun 2005, 19:17:58 UTC
Last modified: 11 Jun 2005, 19:19:17 UTC

Per Tony:
It's a nice "dream" but I don't really think Berkeley would ever go for it. It's their baby and something like that would require hands-on or face-to-face work with the designers. They would have to give you a lot of info that is private and/or confidential. "You never know" but I don't think it would ever happen with relatively unknown people from the internet.
ID: 122264 · Report as offensive
Profile Tigher
Volunteer tester

Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 122279 - Posted: 11 Jun 2005, 19:50:42 UTC - in response to Message 122264.  

Per Tony:
It's a nice "dream" but I don't really think Berkeley would ever go for it. It's their baby and something like that would require hands-on or face-to-face work with the designers. They would have to give you a lot of info that is private and/or confidential. "You never know" but I don't think it would ever happen with relatively unknown people from the internet.


Don't disagree really, but their baby won't live without our CPU time - fact. Also it's open source...there are NO secrets. I have all the code to change and build a complete system - others have it too. So...if it's all open, why can't the goal setting and design part be open too? There are some smart people in this community who might just have some decent thoughts about how this might be done, tested, developed, run and supported. More.......the vast majority would be willing and free!


ID: 122279 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 122280 - Posted: 11 Jun 2005, 19:53:53 UTC - in response to Message 122279.  


Don't disagree really, but their baby won't live without our CPU time - fact. Also it's open source...there are NO secrets. I have all the code to change and build a complete system - others have it too. So...if it's all open, why can't the goal setting and design part be open too? There are some smart people in this community who might just have some decent thoughts about how this might be done, tested, developed, run and supported. More.......the vast majority would be willing and free!

And that is indeed how the new CPU scheduler came about. Some of us identified a problem, designed a fix, I implemented it, and convinced David to adopt it.


BOINC WIKI
ID: 122280 · Report as offensive
Profile Tigher
Volunteer tester

Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 122297 - Posted: 11 Jun 2005, 20:09:07 UTC - in response to Message 122280.  

And that is indeed how the new CPU scheduler came about. Some of us identified a problem, designed a fix, I implemented it, and convinced David to adopt it.


I was perhaps thinking of a wider group.

ID: 122297 · Report as offensive
Profile Tony D

Joined: 20 Mar 04
Posts: 73
Credit: 330,114
RAC: 0
United Kingdom
Message 122346 - Posted: 11 Jun 2005, 21:35:59 UTC - in response to Message 122109.  


Well I'm up for it. You have a thread going on this?


I have, Ian. It's here ...

http://setiweb.ssl.berkeley.edu/forum_thread.php?id=15622#122345

kind regards

Tony


I may be getting older but I refuse to grow up!

ID: 122346 · Report as offensive
Profile RPMurphy
Volunteer tester
Joined: 2 Jun 00
Posts: 131
Credit: 622,641
RAC: 0
United States
Message 122351 - Posted: 11 Jun 2005, 21:48:03 UTC

A mirror on a sister-campus for the heavy load systems, perhaps
on the East Coast?

Dorsai's idea, I think, is a valid one, just very hard if not impossible to implement with the current setup. The mirror could be anywhere in the world that the existing BOINC management has a brain trust.
It is a sad sad day when someone takes your spoon away from you...
ID: 122351 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 122373 - Posted: 11 Jun 2005, 22:32:43 UTC
Last modified: 11 Jun 2005, 22:34:24 UTC

I would say that there is plenty of room to improve their performance. Look at this. It shows the bandwidth used by the Cogent link. Remember this link is shared with SETI Classic, so 35 to 40M is their used bandwidth. In the hourly chart I have been watching the spikes for the last 6 hours. When they spike up, uploads and downloads go through really nicely. When the spikes go down, uploads and downloads almost stop. It looks to me like the upload and download server gets really busy and is unable to answer IO requests even though the bandwidth is available, and then it is freed up and able to handle nearly double the IO requests, which makes the peaks. Make sense? This is the same thing that happened before they shut down for the OS upgrade, which you can see by looking at the daily graph. I think they need to find out what procedure is IO-binding that server.


Boinc....Boinc....Boinc....Boinc....
ID: 122373 · Report as offensive
Profile Ace Casino
Joined: 5 Feb 03
Posts: 285
Credit: 29,750,804
RAC: 15
United States
Message 122591 - Posted: 12 Jun 2005, 11:17:00 UTC - in response to Message 122373.  

I know that some of us have servers we have available/can get/can procure or build, so why don't we all get together and make them available to SETI.
//////////////////////////////////////////////////////////////////////////
Sorry if I ruffled some feathers. Maybe I was not clear or you did not understand what I meant. It was the above part I was referring to. I still might be wrong but I don't think the people at Berkeley are going to allow just anyone to distribute and/or receive WUs at their own "private server". That's what I meant by their baby and confidential info (computers/servers communicating with each other). If someone could design or improve something that they could use on a new or existing server "at Berkeley", that is very different.
/////////////////////////////////////////////////////////////////
And that is indeed how the new CPU scheduler came about. Some of us identified a problem, designed a fix, I implemented it, and convinced David to adopt it.
/////////////////////////////////////////////
That's great and I know there are a lot of smart computer people out here. But it's not being run from your server at home or business. Maybe if someone designed a fix and it required a server we could all pitch in $5 to buy them a new one.
ID: 122591 · Report as offensive

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.