Suggestion for clients to help backlog

hooded.figure
Volunteer tester
Joined: 15 Dec 02
Posts: 33
Credit: 670,271
RAC: 0
United States
Message 163725 - Posted: 6 Sep 2005, 2:31:19 UTC

To not compound the backlog further and to help make it blow over quicker, why don't we set our BOINC clients to not allow any new work? It sounds crazy, but we should give the servers a break while they catch up. That is what I did to my three computers: they are going to finish and upload whatever they have, then start crunching Einstein. I think this is a good idea. We need to do our part to help the servers catch up, and our part is not giving the servers more to do.

That is my opinion, just give the servers a break until they can catch up.

-matt
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
- Benjamin Franklin,
Historical Review of Pennsylvania, 1759.
ID: 163725
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 163727 - Posted: 6 Sep 2005, 2:39:36 UTC - in response to Message 163725.  
Last modified: 6 Sep 2005, 2:41:34 UTC

To not compound the backlog further and to help make it blow over quicker, why don't we set our BOINC clients to not allow any new work? It sounds crazy, but we should give the servers a break while they catch up. That is what I did to my three computers: they are going to finish and upload whatever they have, then start crunching Einstein. I think this is a good idea. We need to do our part to help the servers catch up, and our part is not giving the servers more to do.

That is my opinion, just give the servers a break until they can catch up.

-matt


DOH.....

There you go confusing the issue with logic and common sense! :-O

Actually, that is a good idea and I did it to most of my machines at the beginning of this weekend. ;-)

Alinator
ID: 163727
hooded.figure
Volunteer tester
Joined: 15 Dec 02
Posts: 33
Credit: 670,271
RAC: 0
United States
Message 163732 - Posted: 6 Sep 2005, 2:57:24 UTC

I didn't think about this earlier, but say we have 5000 people do this, and they all have 2 computers; that is 10000 computers. Now say all these people decide to allow new work at once. That would be quite a large spike and may introduce a new backlog. I propose that if enough people do this, we create our own system for allowing new work. We can go with something like "OK, if your last name starts with a-d, allow new work today" and so on. Or if someone has a better idea, post it here.

Also, so we know who is doing this, tell us on this thread; then we can decide whether we need to allow new work systematically or not.
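
A rough sketch of the staggered re-enable idea above, in Python. The four-letter groups, the one-group-per-day schedule, and the function names are assumptions made up for illustration; nothing here is part of BOINC.

```python
import string


def letter_groups(group_size=4):
    """Split a-z into consecutive groups: a-d, e-h, ..., y-z."""
    letters = string.ascii_lowercase
    return [letters[i:i + group_size] for i in range(0, len(letters), group_size)]


def resume_day(last_name, group_size=4):
    """Return the 0-based day on which this volunteer would allow new work.

    Hypothetical rule: day 0 is a-d, day 1 is e-h, and so on.
    Names that do not start with a-z fall into one extra catch-all day.
    """
    groups = letter_groups(group_size)
    first = last_name.strip().lower()[:1]
    for day, group in enumerate(groups):
        # Guard against the empty string, which is "in" every string.
        if len(first) == 1 and first in group:
            return day
    return len(groups)
```

With four letters per group there are seven groups, so the 10000 computers would come back over about a week rather than all at once. Last names are not spread evenly over the alphabet, so the daily groups would not be equal in size.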

-matt
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
- Benjamin Franklin,
Historical Review of Pennsylvania, 1759.
ID: 163732
Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 163734 - Posted: 6 Sep 2005, 3:07:55 UTC

As stated, they are trying to test their limits, at least to some extent. If a ton of people stop, how can they test and see what the limits will be? You are just causing more issues: when you come back, they will get hit hard again and then not know what they need to do.



My movie https://vimeo.com/manage/videos/502242
ID: 163734
hooded.figure
Volunteer tester
Joined: 15 Dec 02
Posts: 33
Credit: 670,271
RAC: 0
United States
Message 163736 - Posted: 6 Sep 2005, 3:13:16 UTC - in response to Message 163734.  

As stated, they are trying to test their limits, at least to some extent. If a ton of people stop, how can they test and see what the limits will be? You are just causing more issues: when you come back, they will get hit hard again and then not know what they need to do.



Please post a link to where it says they are trying to test the limits. It seems to me that the limit has been reached, and right now they don't know how to fix it, or even know what went wrong. First things first: we need to lighten the load on the servers so they can get back to normal. THEN they can do what they need to do to test the limits. Also, if we use a system like the one I described in my previous post, it will not be such a hard hit when we come back.

-matt
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
- Benjamin Franklin,
Historical Review of Pennsylvania, 1759.
ID: 163736
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 163739 - Posted: 6 Sep 2005, 3:19:54 UTC - in response to Message 163732.  
Last modified: 6 Sep 2005, 3:20:53 UTC

I didn't think about this earlier, but say we have 5000 people do this, and they all have 2 computers; that is 10000 computers. Now say all these people decide to allow new work at once. That would be quite a large spike and may introduce a new backlog. I propose that if enough people do this, we create our own system for allowing new work. We can go with something like "OK, if your last name starts with a-d, allow new work today" and so on. Or if someone has a better idea, post it here.

Also, so we know who is doing this, tell us on this thread; then we can decide whether we need to allow new work systematically or not.

-matt


Actually if I'm reading your post correctly, the Dev Team tried to take the spike effect into account in the design of BOINC. That's why the client will randomize the retry interval when communications with the project break down.
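
For illustration, a minimal Python sketch of what a randomized retry interval looks like. This is not the BOINC client's actual code; the base delay, the cap, and the doubling rule are assumed values picked only to show why randomizing spreads the retries out.

```python
import random


def retry_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Pick a random delay in seconds for the given retry attempt (0-based).

    The upper bound doubles with each failed attempt until it reaches `cap`.
    The delay itself is drawn uniformly below that bound, so clients that
    all failed at the same moment do not all retry at the same moment.
    """
    upper = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, upper)
```

Ten thousand clients calling something like `retry_delay(3)` would land at scattered times across an eight-minute window instead of hitting the server in the same second.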

I don't know how well a "grassroots" load management campaign would work, but I guess things couldn't get much worse from trying! ;-)

Alinator


ID: 163739
Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 163740 - Posted: 6 Sep 2005, 3:21:07 UTC - in response to Message 163736.  

Please post a link to where it says they are trying to test the limits. It seems to me that the limit has been reached, and right now they don't know how to fix it, or even know what went wrong. First things first: we need to lighten the load on the servers so they can get back to normal. THEN they can do what they need to do to test the limits. Also, if we use a system like the one I described in my previous post, it will not be such a hard hit when we come back.

-matt


If they didn't want it, would they not ask people to stop, or force people to stop? They need the load in place to find what is wrong so they can fix it. I know I read something before about this.

I have worked in a different but similar situation where user loads caused issues. When fewer users were around, we did not have the issues, and could not test and find out why we had them. The load of users is needed.

If you want to stop, fine. Come back whenever and see if it is fixed. If not, blame yourself for making it impossible to test the load issues.



My movie https://vimeo.com/manage/videos/502242
ID: 163740
hooded.figure
Volunteer tester
Joined: 15 Dec 02
Posts: 33
Credit: 670,271
RAC: 0
United States
Message 163745 - Posted: 6 Sep 2005, 3:26:56 UTC

But what I am trying to say is that they have no idea what is causing the issues. Here is a part of the latest tech news:

An even better (and quicker) solution is that we release the new SETI@home/BOINC client which does a lot more science (with much better resolution in chirp space) and therefore it takes much longer for workunits to complete. While this will not affect user credit (as BOINC credit is based on actual work, not the more arbitrary number of workunits), this will reduce the load on our servers by as much as 75% (maybe more), since there will be a lot less workunits/results to process. This should have an immediate positive effect on all our backend services, and then we can diagnose our disk wait issues in a less stressful environment. We are still testing this new client, and the scientist/programmer doing most of the work on this will be returning from vacation shortly.

This implies that they WANT to lighten the load, strengthening my point.
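
To put a number on that 75%, here is a back-of-the-envelope sketch in Python. It assumes server load scales with the number of results coming back, which is a simplification, and the function name is made up for this post.

```python
def load_reduction(runtime_factor):
    """Fractional drop in results returned per day if each workunit
    takes `runtime_factor` times as long on the same hardware.

    Assumes (as a simplification) that server load is proportional
    to the number of results processed.
    """
    if runtime_factor <= 0:
        raise ValueError("runtime_factor must be positive")
    return 1.0 - 1.0 / runtime_factor
```

Under that assumption, a 75% reduction corresponds to workunits that take about four times as long to crunch: `load_reduction(4)` gives 0.75.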

-matt
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
- Benjamin Franklin,
Historical Review of Pennsylvania, 1759.
ID: 163745
Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 163748 - Posted: 6 Sep 2005, 3:35:02 UTC

I estimate that there are about 100-200 people who come to these boards, and most of them don't come very regularly. There are 180,000+ users of SETI/BOINC. A suggestion, like this one, to the folks who use these boards will reach so few that even if ALL the people on the message forums did what you suggest (and as you can see, many would resist the suggestion), the impact would be very small.
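
As a quick check of the scale, here is the arithmetic in Python. It assumes every user contributes about the same load, which is certainly not exact, and it uses the rough estimates above rather than measured figures.

```python
def forum_share(forum_readers, total_users):
    """Fraction of all users who would even see a forum suggestion,
    assuming each user puts roughly equal load on the servers."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    return forum_readers / total_users


# Upper end of the estimate: 200 forum readers out of 180,000 users.
share = forum_share(200, 180_000)
print(f"{share:.2%} of users")
```

Even at the high end of the estimate, that is roughly a tenth of one percent of the user base.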

There's no way of telling how many visit the SETI front page, but I suspect that number is also small, unless the project is having problems.
ID: 163748
Pooh Bear 27
Volunteer tester
Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 163751 - Posted: 6 Sep 2005, 3:36:50 UTC

I think they would have done a lot more to ask people to stop, or force them to stop if they wanted the load lighter.

I feel stopping is a hindrance.



My movie https://vimeo.com/manage/videos/502242
ID: 163751
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 163753 - Posted: 6 Sep 2005, 3:39:18 UTC
Last modified: 6 Sep 2005, 3:40:53 UTC

I guess the bottom line here is that anybody who's participated in Seti/BOINC for more than 24 hours knows that these guys *like* to push the edge of the envelope using "stone knives and bearskins".

That being said, this isn't the first time the project has "augered in" for one reason or another, and the team has always managed to get it back on track sooner or later.

I'm sure they will figure out where they went wrong pretty soon and get it straightened out.

One thing related to Pooh Bear's comments is that there has been a marked increase in the number of participants since they closed registration for Classic. I'm pretty sure this is related somehow to the current problems, and may tend to increase the difficulty and time required to figure out all the problems. But on the other hand, if it was easy, everyone would be doing it! :-)

Alinator
ID: 163753
BarryAZ
Joined: 1 Apr 01
Posts: 2580
Credit: 16,982,517
RAC: 0
United States
Message 163756 - Posted: 6 Sep 2005, 3:40:40 UTC - in response to Message 163725.  

But it seems it is not simply a 'wait until they catch up' issue. Rather, it is a 'can't keep up with the normal (increasing) workload' issue. It seems the hope from the project folks is that a new client which does more work will take quite a bit longer to process each unit, and this will reduce the work on the server side. Makes sense -- but that's nothing 'we' can do at this point. I'd note that the Einstein folks use a client which takes about 50% longer than the existing Seti work unit -- which probably helps them with their load (which has increased in part as a function of SETI's travails over the past 75 days).


To not compound the backlog further and to help make it blow over quicker, why don't we set our BOINC clients to not allow any new work? It sounds crazy, but we should give the servers a break while they catch up. That is what I did to my three computers: they are going to finish and upload whatever they have, then start crunching Einstein. I think this is a good idea. We need to do our part to help the servers catch up, and our part is not giving the servers more to do.

That is my opinion, just give the servers a break until they can catch up.

-matt


ID: 163756
hooded.figure
Volunteer tester
Joined: 15 Dec 02
Posts: 33
Credit: 670,271
RAC: 0
United States
Message 163761 - Posted: 6 Sep 2005, 3:43:30 UTC

But this isn't the normal workload currently. This is all stuff that piled up from the outage.

-matt
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
- Benjamin Franklin,
Historical Review of Pennsylvania, 1759.
ID: 163761
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 163763 - Posted: 6 Sep 2005, 3:46:46 UTC - in response to Message 163761.  

But this isn't the normal workload currently. This is all stuff that piled up from the outage.

-matt


That's not true from what I can tell; all the stuff I have pending now is from after they went back online with the cleaned-up filesystem.

Alinator
ID: 163763
Project III
Volunteer tester
Joined: 7 Oct 04
Posts: 106
Credit: 442,001
RAC: 1
United States
Message 163765 - Posted: 6 Sep 2005, 3:47:22 UTC

I think there exists a question of ethics.

"I'll do it if you'll do it." It seems to me many people would agree to do it, but then continue to get new work anyhow. It would give liars a slight advantage in terms of gaining credit.

I love credit (and science).
SETI.USA
ID: 163765
Alinator
Volunteer tester
Joined: 19 Apr 05
Posts: 4178
Credit: 4,647,982
RAC: 0
United States
Message 163767 - Posted: 6 Sep 2005, 3:50:22 UTC - in response to Message 163765.  
Last modified: 6 Sep 2005, 3:50:39 UTC

I think there exists a question of ethics.

"I'll do it if you'll do it." It seems to me many people would agree to do it, but then continue to get new work anyhow. It would give liars a slight advantage in terms of gaining credit.

I love credit (and science).


LOL, that's why I left one box still grabbing results when it can! ;-)

Alinator
ID: 163767
Misfit
Volunteer tester
Joined: 21 Jun 01
Posts: 21804
Credit: 2,815,091
RAC: 0
United States
Message 163784 - Posted: 6 Sep 2005, 4:18:14 UTC - in response to Message 163753.  

"stone knives and bearskins"

.o0(a ST original series quote)

Anyway earlier today there was no new work to be had at all. Thank goodness for multiple projects.
ID: 163784
Skeptic
Joined: 15 Mar 03
Posts: 106
Credit: 30,946
RAC: 0
United States
Message 163790 - Posted: 6 Sep 2005, 4:25:41 UTC - in response to Message 163745.  
Last modified: 6 Sep 2005, 4:27:42 UTC

"...just give the servers a break until they can catch up."
"This implies that they WANT to lighten the load, strengthening my point." -matt


Agree with the diagnosis, disagree with the prescription.

Ultimately, we need to remember that this is not just about crunching, but about doing something that passes for science, to wit: looking for little green men.

That end is not served by having the most productive crunchers essentially boycott the project, no matter how noble the intent.

Moreover, they do have an untested theory about what is the cause of the recent problems, and it is posed in the first sentence of the 4-Sep Technical Update:
================
September 4, 2005 - 23:00 UTC - Technical News "... This server worked much better in the past - we're not sure what changed. Perhaps just the influx of new users?"
================

Well, how about this test for that hypothesis: Stop adding new users.

Or in other words:

"When you are in a hole, stop digging."

Make it work effectively for you big dog crunchers working now, which appears to be quite a good stress test for the system, as you manage to break it quite frequently. Continue to get the most science done. Then, once stable, re-open for new users.

- Skeptic - "... and there is no intelligent life in Washington D.C. either."
ID: 163790
EclipseHA
Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 163800 - Posted: 6 Sep 2005, 4:42:06 UTC

The reality is that if all the crunchers who read these forums on a regular basis were to stop requesting new work, it would have only a minor impact.

Most others have a "set and forget" method of running. If there's work, it gets crunched; if not, there is either more idle time or more time for another project.

The only way to throttle work returned is to throttle work sent to be crunched. And that throttle might not really kick in for a few days.
ID: 163800