Seti it runs dry...

Author	Message
SoNic · Joined: 24 Dec 00 · Posts: 140 · Credit: 2,963,627 · RAC: 0
Message 130684 - Posted: 30 Jun 2005, 23:24:30 UTC - Last modified: 30 Jun 2005, 23:28:30 UTC
I have just looked at the units waiting to be sent for work: ZERO. It's sad, so many things are going wrong now, even with all the effort from the "geek" team... I wonder if it isn't time to switch to something other than [edit]GNU[/edit] software. Something more dependable? Don't shoot me, those are just my thoughts... Sorry, I meant GNU but was thinking of the General Public Licence, so I wrote GPL.

PhonAcq · Joined: 14 Apr 01 · Posts: 1656 · Credit: 30,658,217 · RAC: 1
Message 130685 - Posted: 30 Jun 2005, 23:25:44 UTC
What is GPL?
May this Farce be with You

The frozen · Joined: 2 Jun 99 · Posts: 11 · Credit: 261,900 · RAC: 0
Message 130687 - Posted: 30 Jun 2005, 23:29:46 UTC - in response to Message 130684
> I wonder if it isn't time to switch to something other than GPL software. Something more dependable? Don't shoot me, those are just my thoughts...
Actually, the software causing these problems is not GPL. As far as I know, the backend software is available ONLY to the SETI team, and it is NOT GPL.

SoNic · Joined: 24 Dec 00 · Posts: 140 · Credit: 2,963,627 · RAC: 0
Message 130688 - Posted: 30 Jun 2005, 23:30:57 UTC - Last modified: 30 Jun 2005, 23:32:59 UTC
I don't know exactly... Isn't Apache GNU? It says "GPL compatible"... Anyway, free is cheap but usually not the best.

The frozen · Joined: 2 Jun 99 · Posts: 11 · Credit: 261,900 · RAC: 0
Message 130695 - Posted: 30 Jun 2005, 23:38:08 UTC - in response to Message 130688 - Last modified: 30 Jun 2005, 23:41:51 UTC
> I don't know exactly... Isn't Apache GNU?
Apache is GNU, but most backend parts I know of aren't. Are they running SunOS? That's not GNU. The validators, transitioners, splitters and all the other parts aren't GNU either. You found one piece of GNU software in a system of non-GPL software. But I don't think it's fair to say "THIS is the problem! It's GPL, so it CAN'T be good", and that's what I read in your post. Please correct me if I'm wrong!
"Running dry" is the result of many things coming together: an outage of several hours, old and somewhat "small" hardware, slow tapes to read, an internet connection failing for hours... Just give it time and it will be okay again, or donate a few thousand dollars to the SETI project...
Edit:
> Anyway, free is cheap but usually not the best.
I understand your thinking, but I see it completely differently. Just one example: most internet servers run Apache. If "cheap" really meant "not the best", IIS or some other non-GPL HTTP server would be serving most pages on the web. I'll stop answering here now, because I'm afraid this will turn into a "Linux vs. Windows" flame war, and I don't want that...

RichaG · Volunteer tester · Joined: 20 May 99 · Posts: 1690 · Credit: 19,287,294 · RAC: 36
Message 130698 - Posted: 30 Jun 2005, 23:40:26 UTC
It ran dry because the splitters were offline too long. After an outage of this length, the database gets hit so hard that the splitters have trouble entering new work units into the DB.
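A back-of-the-envelope model helps show why an outage like this echoes for so long. This is a simplified sketch under stated assumptions, not the project's actual accounting: demand is constant, requests missed during the outage are still waiting afterward, and only the backend's surplus capacity (capacity minus ongoing demand) goes toward clearing them. The function name and all rates are hypothetical.

```python
def recovery_hours(outage_hours: float, demand_per_hour: float,
                   capacity_per_hour: float) -> float:
    """Hours needed to work off the demand that piled up during an outage.

    Simplified model (assumption): demand is constant, every request missed
    during the outage is still waiting afterward, and only the surplus
    capacity (capacity minus ongoing demand) goes toward the backlog.
    """
    if capacity_per_hour <= demand_per_hour:
        # No spare capacity: the backlog never shrinks.
        return float("inf")
    backlog = outage_hours * demand_per_hour
    return backlog / (capacity_per_hour - demand_per_hour)


# Hypothetical rates: a 12-hour outage, backend only 20% faster than demand.
print(recovery_hours(12, 50_000, 60_000))  # 60.0
```

Recovery time grows with the outage length and shrinks only as spare capacity grows, so a backend running close to its limit can need days to catch up after a half-day outage.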

John McLeod VII · Volunteer developer · Volunteer tester · Joined: 15 Jul 99 · Posts: 24806 · Credit: 790,712 · RAC: 0
Message 130703 - Posted: 30 Jun 2005, 23:47:29 UTC
What's more, everyone has been warned that S@H will occasionally be unable to supply work to everyone who asks. Everyone is strongly encouraged to join a couple of other BOINC projects.

Pooh Bear 27 · Volunteer tester · Joined: 14 Jul 03 · Posts: 3224 · Credit: 4,603,826 · RAC: 0
Message 130716 - Posted: 1 Jul 2005, 0:29:26 UTC
> "The Waiting to transition queue tells you how many workunits/results are waiting to move down the pipeline. A large number means there is a problem somewhere in our backend server system."
Currently the transitioner is at 9 hours, and there are no WUs ready to send. Do I detect a problem???

Divide Overflow · Volunteer tester · Joined: 3 Apr 99 · Posts: 365 · Credit: 131,684 · RAC: 0
Message 130731 - Posted: 1 Jul 2005, 0:42:08 UTC
I think RichardG has hit the nail on the head. We just need to wait a day or two for the ripple from the downtime to work its way out of the project. As I frequently remind myself: I need patience, and I need it RIGHT NOW!!! ;)

Matt Lebofsky · Volunteer moderator · Project administrator · Project developer · Project scientist · Joined: 1 Mar 99 · Posts: 1444 · Credit: 957,058 · RAC: 0
Message 130739 - Posted: 1 Jul 2005, 0:49:44 UTC
I mentioned this in another thread. Basically, just because the queue says "0" doesn't mean there's no work to be sent, just that there's no backlog of work to be sent. There are 5 splitters working full bore to keep up with current demand. As soon as work is ready, it is sent out. Also, the transitioners are backlogged; once they start catching up, you'll see the backlog of work increase. - Matt
-- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
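Matt's point, that a ready-to-send count of zero is not the same as zero throughput, can be illustrated with a toy producer/consumer loop. This is a hypothetical sketch, not the BOINC feeder or scheduler: each tick the splitters add a fixed number of work units, the scheduler sends as many as are requested, and whatever is left over is what a status page would report as ready to send. The per-tick rates are made up.

```python
def simulate(ticks: int, produced_per_tick: int, requested_per_tick: int):
    """Return (units_sent, units_ready) for each tick of a toy pipeline."""
    ready = 0
    history = []
    for _ in range(ticks):
        ready += produced_per_tick                # splitters add new work
        sent = min(ready, requested_per_tick)     # scheduler hands it out
        ready -= sent                             # what a status page would show
        history.append((sent, ready))
    return history


# Demand above production: "ready to send" stays at 0, yet work flows.
print(simulate(3, 800, 1000))   # [(800, 0), (800, 0), (800, 0)]
# Production above demand: a visible backlog starts to build.
print(simulate(3, 1000, 800))   # [(800, 200), (800, 400), (800, 600)]
```

In the first case 800 units go out every tick while the queue reads 0; only once production outpaces demand does the backlog number start to climb.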

Byron Leigh Hatch @ team Carl Sagan · Volunteer tester · Joined: 5 Jul 99 · Posts: 4548 · Credit: 35,667,570 · RAC: 4
Message 130754 - Posted: 1 Jul 2005, 1:08:51 UTC - in response to Message 130739 - Last modified: 1 Jul 2005, 1:56:50 UTC
Hi Matt, thank you very much for your post and the information. byron :) Happy Canada Day

Dave Mickey · Joined: 19 Oct 99 · Posts: 178 · Credit: 11,122,965 · RAC: 0
Message 130784 - Posted: 1 Jul 2005, 2:09:40 UTC
Matt, perhaps I'm thinking of the message you're referring to: during a past round of "systems have been down but are catching up", you noted that you have a script or tool that can show you, for the last 10 data requests, how many were satisfied. For times like these, perhaps something like that could be turned into an informative status page item. Something along the lines of "82% of data requests filled in the last 10 minutes", or raw stats like "in the last 10 minutes, 123,456 data requests received, 112,345 filled", or some similar expression of demand vs. output. It would give stat-mongers something to latch onto and "watch" the systems catch up, and/or be reassured that the "no work" messages will soon abate. That would be easy in your "spare time", no? ;) Dave
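Dave's suggested status item amounts to a ratio over a sliding time window. A minimal sketch, assuming the server could report each work request as it is handled; the class name, its methods and the 600-second default are hypothetical, and no real BOINC log format is implied.

```python
from collections import deque


class FillRateWindow:
    """Counts work requests received vs. filled over a sliding time window.

    Hypothetical helper: assumes each request is reported as it is handled,
    with a timestamp in seconds.
    """

    def __init__(self, window_seconds: float = 600.0):
        self.window = window_seconds
        self.events = deque()  # (timestamp, filled) pairs, oldest first

    def record(self, timestamp: float, filled: bool) -> None:
        self.events.append((timestamp, filled))

    def stats(self, now: float):
        """Return (received, filled, percent_filled) for the last window."""
        # Evict requests that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        received = len(self.events)
        filled = sum(1 for _, ok in self.events if ok)
        percent = 100.0 * filled / received if received else None
        return received, filled, percent


window = FillRateWindow()
for t in range(10):
    window.record(t, filled=(t % 5 != 0))  # requests at t=0 and t=5 go unfilled
print(window.stats(now=10))  # (10, 8, 80.0)
```

A deque keeps eviction cheap because requests arrive in time order, so expired entries are always at the left end.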

SoNic · Joined: 24 Dec 00 · Posts: 140 · Credit: 2,963,627 · RAC: 0
Message 130835 - Posted: 1 Jul 2005, 3:13:39 UTC
> Last night the upload/download server ran out of processes. This happened because the load was very heavy, which causes adverse effects in apache. When hourly apache restarts were issued (for log rotation), old processes wouldn't die and new ones would fill the process queue. By this morning we had over 7000 httpd processes on the machine! Apparently some apache tuning is in order. This went unnoticed, though the lack of server status page updates did get noticed. The page gets updated every 10 minutes (along with all kinds of internal-use BOINC status files). Once every few hours the whole system "skips a turn" due to some funny interaction with cron. But occasionally the whole system stops altogether until somebody comes along and "kicks it" (i.e. removes some stale lock files). So we noticed the status page was stale, "kicked" the whole system and it started up again (temporarily). Everything looked okay, so we went to bed, only to realize the gravity of the problem in the morning (the system was hanging because it would get stuck trying to talk to hosed server).
This is from the technical page... that's why I was talking about GPL/GNU.
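The "kick" described in the quoted report (removing stale lock files so the periodic status job can run again) is the kind of step that can be automated. A hedged sketch, assuming the lock file's modification time reflects when its holder last made progress; the function name, the single lock path, and the threshold of three missed 10-minute update cycles are all hypothetical.

```python
import os
import time


def remove_if_stale(lock_path: str, max_age_seconds: float = 3 * 600) -> bool:
    """Delete lock_path if it has not been touched for max_age_seconds.

    Assumption: the lock's mtime tracks the holder's last progress. The
    default (three missed 10-minute cycles) is a hypothetical threshold.
    Returns True only if this call removed a stale lock.
    """
    try:
        age = time.time() - os.path.getmtime(lock_path)
    except FileNotFoundError:
        return False  # no lock, nothing to clean up
    if age < max_age_seconds:
        return False  # lock is recent; its holder may still be working
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        return False  # someone else removed it first
    return True
```

Age alone cannot tell a hung holder from a slow one, so a safer variant writes the holder's process ID into the lock and checks whether that process is still alive before deleting it.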

EclipseHA · Joined: 28 Jul 99 · Posts: 1018 · Credit: 530,719 · RAC: 0
Message 130896 - Posted: 1 Jul 2005, 4:40:44 UTC - in response to Message 130835 - Last modified: 1 Jul 2005, 4:45:12 UTC
I may be wrong, but doesn't Apache have a configurable limit on the max number of spawned processes? Yes, here it is: http://httpd.apache.org/docs-2.0/mod/core.html#rlimitnproc (the same kind of thing was in Apache 1.x). If the processes are not terminating correctly and the max process count is hit, that seems to be a CGI problem and not an Apache problem. Maybe the client just isn't reporting/reacting to the correct Apache error type in this case. It seems the CGI being unable to talk to the required service (server) might be part of the problem, and some additional error detection might be required in the CGI.
In other words, I don't think the fault lies with Apache. If the hourly "rotate log" scheme currently in use is supposed to kill all active connections, I'd question whether they are using the proper scheme for log rotation!
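A common rule of thumb for sizing such a process cap (for example Apache's MaxClients, which Matt mentions in his reply) is to make sure a completely full pool still fits in a chosen share of RAM. A sketch of that arithmetic, assuming every child is charged its full typical resident size; the function name and the example figures are hypothetical, not SETI@home's actual settings.

```python
def max_clients(ram_mb: float, budget_fraction: float,
                per_process_mb: float) -> int:
    """Largest process cap whose worst case fits in the memory budget.

    Rule-of-thumb sizing (assumption): each child is counted at its typical
    resident size, ignoring memory shared between children.
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    return int(ram_mb * budget_fraction // per_process_mb)


# Hypothetical: 4 GB of RAM, half of it for httpd, about 8 MB per child.
print(max_clients(4096, 0.5, 8))  # 256
```

Counting shared pages once per child makes the estimate conservative, which suits the aim of keeping the machine out of swap even when the pool maxes out.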

Matt Lebofsky · Volunteer moderator · Project administrator · Project developer · Project scientist · Joined: 1 Mar 99 · Posts: 1444 · Credit: 957,058 · RAC: 0
Message 130994 - Posted: 1 Jul 2005, 9:03:13 UTC - in response to Message 130896
We have MaxClients set low enough that (normally) when it maxes out it barely eats up half the RAM. As for log rotation, I've been using Apache on Solaris for about 7 years now, and I gotta say it works most of the time, but not always. Random things happen and there's nothing we can do about it. I've seen this happen on quiet systems and on heavily loaded systems. That is, you do a "restart" and the old processes don't die. Then the next batch waiting to bind to port 80 doesn't die either. And on and on. The best we could do is pkill httpd, wait for everything to die, then pkill -9 httpd, wait some more, and then restart Apache, which we may very well start doing in light of this, however ungraceful. Or tune MaxClients down even more and see if this happens again. - Matt
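Matt's fallback (pkill, wait, pkill -9, wait, restart) is a standard escalating shutdown. The sketch below shows that pattern in Python under simplifying assumptions: POSIX signals, and processes the script started itself as `subprocess.Popen` objects. Stopping an already-running httpd would instead mean looking up its PIDs and signalling them, which this sketch does not attempt. The function name and the 10-second grace period are hypothetical.

```python
import subprocess
import time


def stop_all(procs, grace_seconds: float = 10.0):
    """Stop processes politely, then forcefully, like pkill then pkill -9.

    Assumes POSIX and subprocess.Popen objects started by this script.
    Returns the processes that ignored SIGTERM and had to be killed.
    """
    # Step 1: ask everything still running to exit (SIGTERM, as plain pkill sends).
    for p in procs:
        if p.poll() is None:
            p.terminate()
    # Step 2: give them one shared grace period to exit.
    deadline = time.monotonic() + grace_seconds
    for p in procs:
        try:
            p.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            pass  # still running; handled below
    # Step 3: SIGKILL (pkill -9) anything left, then reap it.
    stubborn = [p for p in procs if p.poll() is None]
    for p in stubborn:
        p.kill()
        p.wait()
    return stubborn
```

Sending SIGTERM first lets well-behaved processes exit cleanly; SIGKILL cannot be caught or ignored, which is why it is the last resort.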

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.