Suggestion for clients to help backlog
Author | Message |
---|---|
hooded.figure Joined: 15 Dec 02 Posts: 33 Credit: 670,271 RAC: 0 |
To not compound the backlog further and to help make it blow over quicker, why don't we set our BOINC clients to Not allow any new work. It sounds crazy, but we should give the servers a break while they catch up. That is what I did to my three computers, they are going to finish and upload whatever they have, then start crunching Einstein. I think this is a good idea, we need to do our part to help the servers catch up, and our part is not giving the servers more to do. That is my opinion, just give the servers a break until they can catch up. -matt "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, Historical Review of Pennsylvania, 1759. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
"To not compound the backlog further and to help make it blow over quicker, why don't we set our BOINC clients to Not allow any new work. It sounds crazy, but we should give the servers a break while they catch up. That is what I did to my three computers, they are going to finish and upload whatever they have, then start crunching Einstein. I think this is a good idea, we need to do our part to help the servers catch up, and our part is not giving the servers more to do." DOH..... There you go confusing the issue with logic and common sense! :-O Actually, that is a good idea and I did it to most of my machines at the beginning of this weekend. ;-) Alinator |
hooded.figure Joined: 15 Dec 02 Posts: 33 Credit: 670,271 RAC: 0 |
I didn't think about this earlier, but say we have 5000 people do this, and they all have 2 computers, that is 10000 computers. Now say all these people at once decide to allow new work. That would be quite a large spike and may introduce new backlog. I propose that if enough people do this we create our own system of allowing new work. We can go with something like "ok if your last name starts with a-d allow new work today" and so on. Or if someone has a better idea post it here. Also, so we can know who all does this for systematic purposes, tell us on this thread so we can decide if we need to systematically allow new work or not -matt "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, Historical Review of Pennsylvania, 1759. |
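The staggered return proposed above ("last name starts with a-d allow new work today") can be sketched in a few lines of Python. This is only an illustration of the idea: the function names, the group size of four letters per day, and the rule that non-alphabetic initials go to the last day are all assumptions made for the example, not anything the project or the BOINC client provides.

```python
import string


def build_schedule(group_size=4):
    """Map each letter a-z to a day offset: a-d -> 0, e-h -> 1, and so on."""
    letters = string.ascii_lowercase
    return {letter: index // group_size for index, letter in enumerate(letters)}


def resume_day(last_name, group_size=4):
    """Return the day offset on which this volunteer re-enables new work.

    Hypothetical rule: names that do not start with a-z (or are empty)
    are assigned to the final day of the schedule.
    """
    schedule = build_schedule(group_size)
    initial = last_name.strip().lower()[:1]
    return schedule.get(initial, max(schedule.values()))


if __name__ == "__main__":
    for name in ("Adams", "Edwards", "Smith", "Zed"):
        print(name, "-> day", resume_day(name))
```

With four letters per day the return is spread over seven days. Last names are not evenly distributed across the alphabet, so the daily groups would be uneven; that is a limitation of the scheme itself rather than of the sketch.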
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
As stated, they are trying to test their limits, at least some. If a ton of people stop, how can they test and see what the limits will be? You are just causing more issues, when you come back, as they will get hit hard again, and then not know what they need to do. My movie https://vimeo.com/manage/videos/502242 |
hooded.figure Joined: 15 Dec 02 Posts: 33 Credit: 670,271 RAC: 0 |
"As stated, they are trying to test their limits, at least some. If a ton of people stop, how can they test and see what the limits will be? You are just causing more issues, when you come back, as they will get hit hard again, and then not know what they need to do." Please post a link to where it says they are trying to test the limits. It seems to me that the limit has been reached, and right now they don't know how to fix it, or even know what went wrong. First things first, we need to lighten the load on the servers so they can get back to normal. THEN they can do what they need to do to test the limits. And also if we do a system like I stated in my previous post, then it will not be so hard of a hit when we come back. -matt "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, Historical Review of Pennsylvania, 1759. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
"I didn't think about this earlier, but say we have 5000 people do this, and they all have 2 computers, that is 10000 computers. Now say all these people at once decide to allow new work. That would be quite a large spike and may introduce new backlog. I propose that if enough people do this we create our own system of allowing new work. We can go with something like 'ok if your last name starts with a-d allow new work today' and so on. Or if someone has a better idea post it here." Actually if I'm reading your post correctly, the Dev Team tried to take the spike effect into account in the design of BOINC. That's why the client will randomize the retry interval when communications with the project break down. I don't know how well a "grassroots" load management campaign would work, but I guess things couldn't get much worse from trying! ;-) Alinator |
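The idea of randomizing the retry interval can be illustrated with a generic exponential backoff with jitter. This is a sketch of the general technique only, not the BOINC client's actual code: the function name, the 60-second base, and the 4-hour cap are assumptions chosen for the example.

```python
import random


def retry_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a randomized wait (in seconds) before the next retry.

    The upper bound doubles with each failed attempt, up to `cap`, and the
    actual delay is drawn uniformly between 0 and that bound. Clients that
    failed at the same moment therefore retry at different times.
    All default values here are illustrative assumptions.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


if __name__ == "__main__":
    for attempt in range(6):
        print(f"attempt {attempt}: wait {retry_delay(attempt):.0f} s")
```

Because each client draws its own delay, a server coming back from an outage sees requests spread over the whole interval instead of arriving in one spike, which is the effect the manual last-name scheme tries to achieve by hand.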
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
"Please post a link to where it says they are trying to test the limits. It seems to me that the limit has been reached, and right now they don't know how to fix it, or even know what went wrong. First things first, we need to lighten the load on the servers so they can get back to normal. THEN they can do what they need to do to test the limits. And also if we do a system like I stated in my previous post, then it will not be so hard of a hit when we come back." If they didn't want it, would they not ask people to stop, or force people to stop? They need to find what they are working on and have the loads so they can fix it. I know I read something before about this. I have worked in a different but similar situation where user loads caused issues; when fewer users were around, we did not have the issues, and could not test and find why we had them. The load of users is a need. If you want to stop, fine. Come back whenever and see if it is fixed. If not, blame yourself for the load issues not being testable. My movie https://vimeo.com/manage/videos/502242 |
hooded.figure Joined: 15 Dec 02 Posts: 33 Credit: 670,271 RAC: 0 |
But what I am trying to say is that they have no idea what is causing the issues. Here is a part of the latest tech news: "An even better (and quicker) solution is that we release the new SETI@home/BOINC client which does a lot more science (with much better resolution in chirp space) and therefore it takes much longer for workunits to complete. While this will not affect user credit (as BOINC credit is based on actual work, not the more arbitrary number of workunits), this will reduce the load on our servers by as much as 75% (maybe more), since there will be a lot less workunits/results to process. This should have an immediate positive effect on all our backend services, and then we can diagnose our disk wait issues in a less stressful environment. We are still testing this new client, and the scientist/programmer doing most of the work on this will be returning from vacation shortly." This implies that they WANT to lighten the load, strengthening my point. -matt "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, Historical Review of Pennsylvania, 1759. |
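The 75% figure in the quoted tech news is consistent with simple arithmetic if server load scales with the number of results handled: workunits that take k times longer to crunch produce 1/k as many results in the same time. A minimal Python sketch under that assumption (the function name is hypothetical, and the proportional-load model is an assumption, not something the tech news states):

```python
def load_reduction(runtime_factor):
    """Fractional drop in results per unit time when each workunit
    takes `runtime_factor` times longer to complete.

    Assumes server load is proportional to the number of results
    processed, which is a simplification made for this example.
    """
    if runtime_factor <= 0:
        raise ValueError("runtime_factor must be positive")
    return 1.0 - 1.0 / runtime_factor


if __name__ == "__main__":
    for factor in (1.5, 2, 4):
        print(f"{factor}x longer workunits: {load_reduction(factor):.0%} fewer results")
```

Under this model a 75% reduction corresponds to workunits that run about four times longer than the current ones.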
Qui-Gon Joined: 15 May 99 Posts: 2940 Credit: 19,199,902 RAC: 11 |
I estimate that there are about 100-200 people who come to these boards, and most of them don't come very regularly. There are 180,000+ users of SETI/BOINC. A suggestion, like this one, to the folks who use these boards will reach so few that even if ALL the people on the message forums did what you suggest (and as you can see, many would resist the suggestion), the impact would be very small. There's no way of telling how many visit the SETI front page, but I suspect that number is also small, unless the project is having problems. |
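The scale of this estimate is easy to check. The sketch below uses the numbers given above (100-200 forum regulars, 180,000 users) and assumes, purely for illustration, that every user contributes the same load; the function name is hypothetical.

```python
def forum_share(forum_users, total_users):
    """Fraction of all users represented by the forum regulars."""
    return forum_users / total_users


if __name__ == "__main__":
    TOTAL_USERS = 180_000  # "180,000+ users" from the post above
    for n in (100, 200):
        print(f"{n} forum users: {forum_share(n, TOTAL_USERS):.3%} of all users")
```

Even at the upper estimate the forum regulars are roughly a tenth of one percent of the user base. Heavy crunchers are probably over-represented on the boards, so their share of the actual load may be higher than this head count suggests, but the post's point about scale still holds.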
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
I think they would have done a lot more to ask people to stop, or force them to stop if they wanted the load lighter. I feel stopping is a hindrance. My movie https://vimeo.com/manage/videos/502242 |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
I guess the bottom line here is that anybody who's participated in Seti/BOINC for more than 24 hours knows that these guys *like* to push the edge of the envelope using "stone knives and bearskins". That being said, this isn't the first time the project has "augered in" for one reason or another, and the team has always managed to get it back on track sooner or later. I'm sure they will figure out where they went wrong pretty soon and get it straightened out. One thing related to Pooh Bear's comments is there has been a marked increase in the number of participants since they closed registration for Classic. I'm pretty sure this is related somehow to the current problems, and may tend to increase the difficulty and time required to figure out all the problems. But on the other hand, if it was easy, everyone would be doing it! :-) Alinator |
BarryAZ Joined: 1 Apr 01 Posts: 2580 Credit: 16,982,517 RAC: 0 |
But it seems it is not simply a 'wait until they catch up' issue. Rather, it is a 'can't keep up with the normal (increasing) workload' issue. It seems the hope from the project folks is that a new client which does more work will take quite a bit longer to process each unit, and this will reduce the work on the server side. Makes sense -- but that's nothing 'we' can do at this point. I'd note that the Einstein folks use a client which takes about 50% longer than the existing Seti work unit -- which probably helps them with their load (which has increased in part as a function of SETI's travails over the past 75 days). "To not compound the backlog further and to help make it blow over quicker, why don't we set our BOINC clients to Not allow any new work. It sounds crazy, but we should give the servers a break while they catch up. That is what I did to my three computers, they are going to finish and upload whatever they have, then start crunching Einstein. I think this is a good idea, we need to do our part to help the servers catch up, and our part is not giving the servers more to do." |
hooded.figure Joined: 15 Dec 02 Posts: 33 Credit: 670,271 RAC: 0 |
But this isn't normal workload currently. This is all stuff that piled up from the outage. -matt "They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, Historical Review of Pennsylvania, 1759. |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
"But this isn't normal workload currently. This is all stuff that piled up from the outage." That's not true from what I can tell; all the stuff I have pending now is from after they went back online with the cleaned up filesystem. Alinator |
Project III Joined: 7 Oct 04 Posts: 106 Credit: 442,001 RAC: 1 |
I think there exists a question of ethics. "I'll do it if you'll do it." It seems to me many people would agree to do it, but then continue to get new work anyhow. It would give liars a slight advantage in terms of gaining credit. I love credit (and science). SETI.USA |
Alinator Joined: 19 Apr 05 Posts: 4178 Credit: 4,647,982 RAC: 0 |
"I think there exists a question of ethics." LOL, that's why I left one box still grabbing results when it can! ;-) Alinator |
Misfit Joined: 21 Jun 01 Posts: 21804 Credit: 2,815,091 RAC: 0 |
"stone knives and bearskins" .o0(a ST original series quote) Anyway earlier today there was no new work to be had at all. Thank goodness for multiple projects. |
Skeptic Joined: 15 Mar 03 Posts: 106 Credit: 30,946 RAC: 0 |
"...just give the servers a break until they can catch up." Agree with the diagnosis, disagree with the prescription. Ultimately, we need to remember that this is not just about crunching, but about doing something that passes for science, to wit - looking for little green men. That end is not served by having the most productive crunchers essentially boycott the project, no matter how noble the intent. Moreover, they do have an untested theory about what is the cause of the recent problems, and it is posed in the first sentence of the 4-Sep Technical Update: ================ September 4, 2005 - 23:00 UTC - Technical News "... This server worked much better in the past - we're not sure what changed. Perhaps just the influx of new users?" ================ Well, how about this test for that hypothesis: Stop adding new users. Or in other words: "When you are in a hole, stop digging." Make it work effectively for you big dog crunchers working now, which appears to be quite a good stress test for the system, as you manage to break it quite frequently. Continue to get the most science done. Then, once stable, re-open for new users. - Skeptic - "... and there is no intelligent life in Washington D.C. either." |
EclipseHA Joined: 28 Jul 99 Posts: 1018 Credit: 530,719 RAC: 0 |
The reality is that if all crunchers that read these forums on a regular basis were to stop requesting new work, it would have a minor impact. Most others have a "set and forget" method of running. If there's work, it gets crunched; if not, there's either more idle time or more time for another project. The only way to throttle work returned is to throttle work sent to be crunched. And that throttle might not really kick in for a few days. |