Seti it runs dry...
Author | Message |
---|---|
SoNic Joined: 24 Dec 00 Posts: 140 Credit: 2,963,627 RAC: 0 |
I have just looked at the units waiting to be sent out for work. ZERO. It's sad, so many things are going wrong now, even with all the effort from the "geek" team... I wonder if it isn't time to switch to something other than [edit]GNU[/edit] software. Something more dependable? Don't shoot me, those are just my thoughts... Sorry, I meant GNU, but I was thinking of the General Public License, so I wrote GPL |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
What is GPL? May this Farce be with You |
The frozen Joined: 2 Jun 99 Posts: 11 Credit: 261,900 RAC: 0 |
"I wonder if isn't the time to switch to something else than GPL as software. Something more dependable? Don't shoot me, that's just my toughts..." Actually, the software that causes these problems is not GPL. As far as I know, the backend software is ONLY available to the SETI team and it is NOT GPL. |
SoNic Joined: 24 Dec 00 Posts: 140 Credit: 2,963,627 RAC: 0 |
I don't know exactly... Isn't Apache GNU? It says "compatible GPL"... Anyway, free is cheap but usually not the best. |
The frozen Joined: 2 Jun 99 Posts: 11 Credit: 261,900 RAC: 0 |
"I don't know exactly... Apache isn't GNU?" Apache is GNU, but most of the backend parts I know of aren't. Are they running SunOS? That is not GNU. The validators, transitioners, splitters and all the other parts are not GNU either. You found one piece of GNU software in a system of non-GPLed software. But I don't think it is okay to say "THIS is the problem! It's GPL so it CAN'T be good", and that's what I read from your post. Please correct me if I am wrong! "Running dry" is the result of many things coming together: an outage of several hours, old and somewhat "small" hardware, slow tapes to be read, an internet connection failing for hours... Just give it time and it will be okay again, or donate a few thousand dollars to the SETI project... Edit: "Anyway, free is cheap but not the best usually." I understand your thinking, but I think totally differently. Just one example: most internet servers run Apache. If "cheap" really meant "not the best", IIS or some other non-GPL HTTP server would be serving most pages on the web. I will stop answering here now, because I'm afraid this will become a "Linux vs. Windows" flame war and I don't want something like that... |
RichaG Joined: 20 May 99 Posts: 1690 Credit: 19,287,294 RAC: 36 |
It ran dry because the splitters were offline too long. After an outage of this length, the database gets hit so hard that the splitters have trouble entering new work units into the DB. Red Bull Air Racing Gas price by zip at Seti |
John McLeod VII Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
What's more, everyone has been warned that S@H will occasionally not be able to supply work to everyone who asks. It is strongly suggested that everyone join a couple of other BOINC projects. BOINC WIKI |
Pooh Bear 27 Joined: 14 Jul 03 Posts: 3224 Credit: 4,603,826 RAC: 0 |
"The Waiting to transition queue tells you how many workunits/results are waiting to move down the pipeline. A large number means there is a problem somewhere in our backend server system." Currently the transitioner is at 9 hours, and no WU ready to send. Do I detect a problem??? My movie https://vimeo.com/manage/videos/502242 |
Divide Overflow Joined: 3 Apr 99 Posts: 365 Credit: 131,684 RAC: 0 |
I think that RichardG has hit the nail on the head. We just need to wait a day or two for the ripple from the down time to work its way out of the project. As I frequently remind myself: I need patience and I need it RIGHT NOW!!! ;) |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
I mentioned this in another thread. Basically, just because the queue says "0" doesn't mean there's no work to be sent. Just that there's no backlog of work to be sent. There are 5 splitters working full bore to keep up with current demand. As soon as work is ready, it is sent out. As well, the transitioners are backlogged - once they start catching up you'll see the backlog of work increase. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
Byron Leigh Hatch @ team Carl Sagan Joined: 5 Jul 99 Posts: 4548 Credit: 35,667,570 RAC: 4 |
"I mentioned this in another thread. Basically, just because the queue says "0" doesn't mean there's no work to be sent. Just that there's no backlog of work to be sent. There are 5 splitters working full bore to keep up with current demand. As soon as work is ready, it is sent out. As well, the transitioners are backlogged - once they start catching up you'll see the backlog of work increase." Hi Matt, thank you very much for your post and information. byron :) Happy Canada Day |
Dave Mickey Joined: 19 Oct 99 Posts: 178 Credit: 11,122,965 RAC: 0 |
Matt, perhaps I'm thinking of the message you are referring to, from a past iteration of "systems have been down but are catching up", where you noted that you have some script or tool that can show you, for the last 10 data requests, how many were satisfied. For times like these, perhaps something like that could be leveraged into an informative status page item. Something along the lines of "82% of data requests filled in the last 10 minutes", or raw stats like "in the last 10 minutes, 123,456 data requests received, 112,345 filled", or some similar expression of demand vs. output. It would give stat-mongers something to latch onto and "watch" the systems catch up, and/or be comforted that sometime soon the "no work" messages will abate. That would be easy in your "spare time", no? ;) Dave |
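For anyone curious what such a status item might look like, here is a minimal Python sketch of a sliding-window "requests filled" counter. It is only an illustration of the idea in the post above: `WorkRequest`, `FillRateWindow` and the record format are hypothetical names invented for this sketch, not anything from the real SETI@home or BOINC backend, and it assumes requests are recorded in time order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    """One scheduler request (hypothetical record format)."""
    timestamp: float  # seconds, e.g. Unix time of the request
    filled: bool      # True if work was actually handed out


class FillRateWindow:
    """Keeps the requests seen during the last `window_seconds` seconds."""

    def __init__(self, window_seconds: float = 600.0):
        self.window_seconds = window_seconds
        self._requests = deque()

    def record(self, request: WorkRequest) -> None:
        # Assumes requests arrive in non-decreasing timestamp order.
        self._requests.append(request)
        self._expire(request.timestamp)

    def _expire(self, now: float) -> None:
        # Drop everything older than the window.
        cutoff = now - self.window_seconds
        while self._requests and self._requests[0].timestamp < cutoff:
            self._requests.popleft()

    def summary(self, now: float) -> str:
        """Return a one-line status string for a status page."""
        self._expire(now)
        minutes = int(self.window_seconds // 60)
        received = len(self._requests)
        if received == 0:
            return f"no work requests received in the last {minutes} minutes"
        filled = sum(1 for r in self._requests if r.filled)
        percent = 100.0 * filled / received
        return (f"{filled:,} of {received:,} work requests filled "
                f"in the last {minutes} minutes ({percent:.0f}%)")
```

Feeding it ten requests of which eight were filled and calling `summary()` gives a line of the "demand vs. output" kind suggested above. A deque is used so that expiring old requests stays cheap even under heavy load.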
SoNic Joined: 24 Dec 00 Posts: 140 Credit: 2,963,627 RAC: 0 |
"Last night the upload/download server ran out of processes. This happened because the load was very heavy, which causes adverse effects in apache. When hourly apache restarts were issued (for log rotation), old processes wouldn't die and new ones would fill the process queue. By this morning we had over 7000 httpd processes on the machine! Apparently some apache tuning is in order." This is from the technical page... that's why I was talking about GPL/GNU. |
EclipseHA Joined: 28 Jul 99 Posts: 1018 Credit: 530,719 RAC: 0 |
"Last night the upload/download server ran out of processes. This happened because the load was very heavy, which causes adverse effects in apache. When hourly apache restarts were issued (for log rotation), old processes wouldn't die and new ones would fill the process queue. By this morning we had over 7000 httpd processes on the machine! Apparently some apache tuning is in order." I may be wrong, but doesn't apache have a (configurable) limit on the max number of spawned processes? Yes, here it is: http://httpd.apache.org/docs-2.0/mod/core.html#rlimitnproc (The same kind of thing was there in apache 1.x.) If the processes are not terminating correctly and the max process limit is hit, that seems to be a CGI problem and not an apache problem. Maybe the client just isn't reporting/reacting to the correct apache-type error in this case. It seems that the CGI not being able to talk to the required service (server) might be part of the problem, and some additional error detection might be required in the CGI. In other words, I don't think the fault lies with apache. If the "rotate log" scheme currently used hourly SHOULD kill all active connections, I'd question whether they are using the proper scheme for log rotation! |
Matt Lebofsky Joined: 1 Mar 99 Posts: 1444 Credit: 957,058 RAC: 0 |
We have maxclients set low enough so that (normally) when it maxes out it barely eats up half the RAM. As for log rotation, I've been using apache on solaris for about 7 years now, and I gotta say it works most of the time, but not always. Random things happen and there's nothing we can do about it. I've seen this happen on quiet systems, and on heavily loaded systems. That is, you do a "restart" and the old processes don't die. Then the next batch waiting to bind to port 80 don't die either. And on and on. The best we could do is pkill httpd, wait for everything to die, then pkill -9 httpd, wait some more, and then restart apache, which we may very well start doing in light of this, however ungraceful. Or tune down the maxclients even more and see if this happens again. - Matt -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude |
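The "ungraceful" sequence Matt describes (pkill, wait, pkill -9, wait, start) could be scripted roughly as below. This is a sketch under stated assumptions, not the project's actual tooling: `forced_restart` and `wait_until_gone` are hypothetical helper names, `apachectl start` is only assumed to be the right start command for their install, and unlike the description above it sends `pkill -9` only if processes survive the first signal. The command runner and sleep function are passed in so the logic can be tried without touching a real server.

```python
import subprocess
import time
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def wait_until_gone(run: Runner, sleep: Callable[[float], None],
                    timeout: float, poll: float = 1.0) -> bool:
    """Poll until no httpd process is left; False if `timeout` runs out."""
    waited = 0.0
    # pgrep exits with status 0 while at least one process still matches.
    while run(["pgrep", "httpd"]) == 0:
        if waited >= timeout:
            return False
        sleep(poll)
        waited += poll
    return True


def forced_restart(run: Runner = run_command,
                   sleep: Callable[[float], None] = time.sleep,
                   grace: float = 30.0,
                   start_cmd: Sequence[str] = ("apachectl", "start")) -> List[List[str]]:
    """Stop httpd (escalating to SIGKILL if needed), then start it again.

    Returns the list of commands issued, in order.
    """
    issued: List[List[str]] = []

    def do(argv: Sequence[str]) -> int:
        issued.append(list(argv))
        return run(argv)

    do(["pkill", "httpd"])                    # polite SIGTERM first
    if not wait_until_gone(run, sleep, grace):
        do(["pkill", "-9", "httpd"])          # stragglers get SIGKILL
        wait_until_gone(run, sleep, grace)
    do(start_cmd)                             # assumed start command
    return issued
```

Calling `forced_restart()` with the defaults would run the real commands, so it needs the appropriate privileges and should not be tried casually on a live machine.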