Seti it runs dry...

Message boards : Number crunching : Seti it runs dry...

SoNic

Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 130684 - Posted: 30 Jun 2005, 23:24:30 UTC
Last modified: 30 Jun 2005, 23:28:30 UTC

I have just looked at the units waiting to be sent for work. ZERO. It's sad: so many things are going wrong right now, despite all the effort from the "geek" team... I wonder if it isn't time to switch to something other than [edit]GNU[/edit] software. Something more dependable? Don't shoot me, those are just my thoughts...
Sorry, I meant GNU, but I was thinking of the General Public License... so I wrote GPL.
ID: 130684
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 130685 - Posted: 30 Jun 2005, 23:25:44 UTC

What is GPL?
May this Farce be with You
ID: 130685
The frozen

Joined: 2 Jun 99
Posts: 11
Credit: 261,900
RAC: 0
Germany
Message 130687 - Posted: 30 Jun 2005, 23:29:46 UTC - in response to Message 130684.  

> I wonder if it isn't time to switch to something other than GPL software. Something more dependable? Don't shoot me, those are just my thoughts...


Actually, the software that causes these problems is not GPL. As far as I know, the backend software is ONLY available to the SETI team, and it is NOT GPL.
ID: 130687
SoNic

Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 130688 - Posted: 30 Jun 2005, 23:30:57 UTC
Last modified: 30 Jun 2005, 23:32:59 UTC

I don't know exactly... Isn't Apache GNU? It says "compatible GPL"... Anyway, free is cheap, but usually not the best.
ID: 130688
The frozen

Joined: 2 Jun 99
Posts: 11
Credit: 261,900
RAC: 0
Germany
Message 130695 - Posted: 30 Jun 2005, 23:38:08 UTC - in response to Message 130688.  
Last modified: 30 Jun 2005, 23:41:51 UTC

> I don't know exactly... Isn't Apache GNU?


Apache is GNU, but most of the backend parts I know of aren't. Are they running SunOS? That isn't GNU. The validators, transitioners, splitters and all the other parts are not GNU either. You found one piece of GNU software in a system of non-GPLed software. But I don't think it is fair to say "THIS is the problem! It's GPL, so it CAN'T be good", and that's what I read from your post. Please correct me if I am wrong!

"Running dry" is the result of many things coming together: an outage of several hours, old and somewhat "small" hardware, slow tapes to be read, an internet connection failing for hours... Just give it time and it will be okay again, or donate a few thousand dollars to the SETI project...

Edit:
> Anyway, free is cheap, but usually not the best.


I understand your thinking, but I see it totally differently. Just one example: most internet servers run Apache. If "cheap" really meant "not the best", IIS or some other non-GPL HTTP server would be serving most pages on the web...

I will stop answering here now, because I'm afraid this will turn into a "Linux vs. Windows" flame war, and I don't want that...
ID: 130695
RichaG
Volunteer tester

Joined: 20 May 99
Posts: 1690
Credit: 19,287,294
RAC: 36
United States
Message 130698 - Posted: 30 Jun 2005, 23:40:26 UTC

It ran dry because the splitters were offline too long.

After an outage of this length, the database gets hit so hard that the splitters have trouble entering new work units into it.
Red Bull Air Racing

Gas price by zip at Seti

ID: 130698
John McLeod VII
Volunteer developer
Volunteer tester

Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 130703 - Posted: 30 Jun 2005, 23:47:29 UTC

What's more, everyone has been warned that S@H will occasionally be unable to supply work to everyone who asks. It is strongly suggested that everyone join a couple of other BOINC projects.


BOINC WIKI
ID: 130703
Pooh Bear 27
Volunteer tester

Joined: 14 Jul 03
Posts: 3224
Credit: 4,603,826
RAC: 0
United States
Message 130716 - Posted: 1 Jul 2005, 0:29:26 UTC

"The Waiting to transition queue tells you how many workunits/results are waiting to move down the pipeline. A large number means there is a problem somewhere in our backend server system."

Currently the transitioner is at 9 hours, and no WU ready to send. Do I detect a problem???
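As a rough illustration of why a 9-hour transitioner backlog takes a while to clear: new work keeps arriving at the normal rate while the backlog is being worked off, so the backlog only shrinks by the surplus capacity. A back-of-the-envelope sketch, assuming a constant arrival rate and a constant speedup factor (both hypothetical; the real rates aren't given in this thread):

```python
def hours_to_catch_up(backlog_hours: float, speedup: float) -> float:
    """Estimate wall-clock hours needed to clear a queue backlog.

    backlog_hours: how far behind the queue is, in hours of normal arrivals.
    speedup: processing rate divided by arrival rate; must exceed 1.
    """
    if speedup <= 1:
        raise ValueError("the backlog never clears unless processing outpaces arrivals")
    # Each hour the processor clears `speedup` hours of work while one new
    # hour of work arrives, so the backlog shrinks by (speedup - 1) per hour.
    return backlog_hours / (speedup - 1)


# Hypothetical example: 9 hours behind, processing 1.5x faster than work arrives.
print(hours_to_catch_up(9, 1.5))  # 18.0
```

With only a 50% surplus, a 9-hour backlog needs about 18 hours to drain, which is in line with the "day or two" estimate given later in the thread.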



My movie https://vimeo.com/manage/videos/502242
ID: 130716
Divide Overflow
Volunteer tester

Joined: 3 Apr 99
Posts: 365
Credit: 131,684
RAC: 0
United States
Message 130731 - Posted: 1 Jul 2005, 0:42:08 UTC

I think that RichardG has hit the nail on the head. We just need to wait a day or two for the ripple from the down time to work its way out of the project.

As I frequently remind myself: I need patience and I need it RIGHT NOW!!! ;)

ID: 130731
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 130739 - Posted: 1 Jul 2005, 0:49:44 UTC

I mentioned this in another thread. Basically, just because the queue says "0" doesn't mean there's no work to be sent. Just that there's no backlog of work to be sent. There are 5 splitters working full bore to keep up with current demand. As soon as work is ready, it is sent out. As well, the transitioners are backlogged - once they start catching up you'll see the backlog of work increase.
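Matt's point that a "0" in the ready-to-send queue means "no backlog" rather than "no work" can be shown with a toy producer/consumer model. A minimal sketch, assuming the splitters add a fixed number of results per tick and clients ask for a fixed number per tick (the function name and numbers are hypothetical):

```python
from collections import deque


def simulate(ticks: int, produced_per_tick: int, requested_per_tick: int):
    """Toy model of a ready-to-send queue fed by splitters and drained by clients.

    Returns (total_sent, queue_length_after_each_tick).
    """
    queue = deque()
    total_sent = 0
    history = []
    for _ in range(ticks):
        # Splitters add freshly generated results to the queue.
        queue.extend(range(produced_per_tick))
        # Clients take whatever is available, up to what they ask for.
        sent = min(requested_per_tick, len(queue))
        for _ in range(sent):
            queue.popleft()
        total_sent += sent
        history.append(len(queue))
    return total_sent, history


# Demand (8 per tick) exceeds supply (5 per tick).
print(simulate(6, 5, 8))  # (30, [0, 0, 0, 0, 0, 0])
```

The queue reads 0 after every tick even though 30 results went out; a backlog only builds once production outpaces demand, for example `simulate(3, 8, 5)` leaves 3, 6 and then 9 results waiting.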

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 130739
Byron Leigh Hatch @ team Carl Sagan
Volunteer tester

Joined: 5 Jul 99
Posts: 4548
Credit: 35,667,570
RAC: 4
Canada
Message 130754 - Posted: 1 Jul 2005, 1:08:51 UTC - in response to Message 130739.  
Last modified: 1 Jul 2005, 1:56:50 UTC

Hi Matt
thank you very much for your post and information
byron
:)
Happy Canada Day
ID: 130754
Dave Mickey

Joined: 19 Oct 99
Posts: 178
Credit: 11,122,965
RAC: 0
United States
Message 130784 - Posted: 1 Jul 2005, 2:09:40 UTC

Matt, perhaps I'm thinking of the message you are referring to: during a past round of "systems have been down but are catching up", you noted that you have a script or tool that can show you, for the last 10 data requests, how many were satisfied. For times like these, perhaps something like that could be leveraged into an informative status page item.

Something along the lines of

82% of data requests filled in the last 10 minutes

or raw stats like, for 10 minutes,

123,456 data requests received, 112,345 filled

or some similar expression of demand vs. output.

It would give stat-mongers something to latch onto and "watch" the systems catch up, and/or be comforted that sometime soon the "no work" messages will abate.
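The "% of data requests filled in the last 10 minutes" figure suggested above is a sliding-window count. A minimal sketch, assuming the scheduler could log one (timestamp, filled) pair per work request; the class and method names are hypothetical and not part of BOINC:

```python
from collections import deque


class RequestStats:
    """Sliding-window counter of work requests and how many were filled."""

    def __init__(self, window_seconds: float = 600.0):
        self.window = window_seconds
        self.events = deque()  # (timestamp, filled) pairs, oldest first

    def record(self, timestamp: float, filled: bool) -> None:
        self.events.append((timestamp, filled))

    def summary(self, now: float):
        """Return (received, filled, percent_filled) for the last window."""
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        received = len(self.events)
        filled = sum(1 for _, ok in self.events if ok)
        percent = 100.0 * filled / received if received else 0.0
        return received, filled, percent


stats = RequestStats()
stats.record(0, False)  # too old to count by t=700
for i in range(10):
    stats.record(200 + i, filled=(i < 8))
print(stats.summary(now=700))  # (10, 8, 80.0)
```

Keeping only the events inside the window makes each update cheap; the status page would just format the returned tuple, e.g. "80% of data requests filled in the last 10 minutes".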

That would be easy in your "spare time", no? ;)

Dave


ID: 130784
SoNic

Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 130835 - Posted: 1 Jul 2005, 3:13:39 UTC

Last night the upload/download server ran out of processes. This happened because the load was very heavy, which causes adverse effects in apache. When hourly apache restarts were issued (for log rotation), old processes wouldn't die and new ones would fill the process queue. By this morning we had over 7000 httpd processes on the machine! Apparently some apache tuning is in order.
This went unnoticed, though the lack of server status page updates did get noticed. The page gets updated every 10 minutes (along with all kinds of internal-use BOINC status files). Once every few hours the whole system "skips a turn" due to some funny interaction with cron. But occasionally the whole system stops altogether until somebody comes along and "kicks it" (i.e. removes some stale lock files).
So we noticed the status page was stale, "kicked" the whole system and it started up again (temporarily). Everything looked okay, so we went to bed, only to realize the gravity of the problem in the morning (the system was hanging because it would get stuck trying to talk to hosed server).

This is from the technical page... that's why I was talking about GPL/GNU.
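The "kick it" step in the quoted tech note (removing stale lock files so the periodic status jobs run again) could in principle be automated by treating a lock as stale once it is older than a few update intervals. A minimal sketch, assuming a plain lock file whose modification time marks when it was taken; the function name and the 3x threshold are hypothetical and not taken from the SETI@home scripts:

```python
import os
import time
from typing import Optional

UPDATE_INTERVAL = 10 * 60  # status page refresh period from the tech note (seconds)


def remove_stale_lock(lock_path: str, max_age: float = 3 * UPDATE_INTERVAL,
                      now: Optional[float] = None) -> bool:
    """Delete lock_path if it is older than max_age seconds.

    Returns True if a stale lock was removed, False otherwise.
    """
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(lock_path)
    except FileNotFoundError:
        return False  # no lock held, nothing to do
    if age <= max_age:
        return False  # recent lock; assume its owner is still working
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        return False  # another cleaner removed it first
    return True
```

Age alone is a blunt test: a real cleanup job would also want to confirm that the process which took the lock is gone, since a slow but healthy run can legitimately hold a lock past the threshold.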
ID: 130835
EclipseHA

Joined: 28 Jul 99
Posts: 1018
Credit: 530,719
RAC: 0
United States
Message 130896 - Posted: 1 Jul 2005, 4:40:44 UTC - in response to Message 130835.  
Last modified: 1 Jul 2005, 4:45:12 UTC

I may be wrong, but doesn't Apache have a (configurable) limit on the maximum number of spawned processes?

Yes, here it is: http://httpd.apache.org/docs-2.0/mod/core.html#rlimitnproc

(The same kind of setting existed in Apache 1.x.)

If the processes are not terminating correctly and the maximum process count is hit, that seems to be a CGI problem rather than an Apache problem. Maybe the client just isn't reporting/reacting to the correct Apache-type error in this case. It seems the CGI being unable to talk to the required service (server) might be part of the problem, and some additional error detection might be required in the CGI.
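The extra error detection suggested above amounts to never letting a CGI block forever on a backend that has stopped answering. A minimal sketch of that idea, assuming the CGI talks to its backend over a plain TCP socket; the host `backend.example`, port 9000, the function names and the "fail fast with a 503" policy are all hypothetical:

```python
import socket


def query_backend(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send payload to a backend service and return its full reply.

    Raises OSError (including socket timeouts) instead of hanging, so the
    calling process can exit and free its slot.
    """
    # The timeout given to create_connection applies to the connect and
    # to every later send/recv on this socket.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        sock.shutdown(socket.SHUT_WR)  # tell the backend we're done sending
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # backend closed the connection: reply complete
                break
            chunks.append(chunk)
        return b"".join(chunks)


def handle_request() -> None:
    """CGI entry point: answer quickly even when the backend is down."""
    try:
        reply = query_backend("backend.example", 9000, b"status\n")
        print("Content-Type: text/plain\n")
        print(reply.decode(errors="replace"))
    except OSError as exc:  # socket timeouts are a subclass of OSError
        # Report a temporary failure instead of waiting forever.
        print("Status: 503 Service Unavailable")
        print("Content-Type: text/plain\n")
        print(f"backend unavailable: {exc}")
```

A 503 tells the client to come back later, while the worker process is released immediately instead of piling up behind a hung connection.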

In other words, I don't think the fault lies with Apache.

If the hourly "rotate log" scheme currently in use is SUPPOSED to kill all active connections, I'd question whether they're using the proper scheme for log rotation!
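One common alternative to restarting the server every hour is to let the log writer switch files itself, so no process has to be killed. Apache ships a piped-log helper, `rotatelogs`, for this; purely to illustrate the same idea in Python, the standard library's `TimedRotatingFileHandler` rolls a log over in-process. This is a swapped-in technique, not what SETI@home used, and the logger name, file name and retention count are hypothetical:

```python
import logging
import logging.handlers
import os
import tempfile


def make_hourly_logger(log_dir: str) -> logging.Logger:
    """Return a logger whose file rolls over once an hour without a restart.

    Call once per process: each call attaches another handler.
    """
    handler = logging.handlers.TimedRotatingFileHandler(
        os.path.join(log_dir, "access.log"),
        when="H",        # time unit: hours
        interval=1,      # roll over every 1 hour
        backupCount=48,  # keep two days of old files (arbitrary choice)
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
    logger = logging.getLogger("hourly_access")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_hourly_logger(tempfile.mkdtemp())
    log.info("GET /index.html 200")
```

Because the rollover happens inside the writing process, an hourly rotation never has to signal or restart the processes serving connections.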
ID: 130896
Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 130994 - Posted: 1 Jul 2005, 9:03:13 UTC - in response to Message 130896.  

We have MaxClients set low enough that (normally), when it maxes out, it barely uses half the RAM.
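The sizing rule above (cap the number of httpd children so that a full house uses only about half the RAM) is simple arithmetic. A minimal sketch, assuming every child has roughly the same resident size; the function name and the RAM and per-process figures in the example are made up for illustration, not SETI@home's actual numbers:

```python
def max_clients_for(ram_mb: float, per_process_mb: float, ram_fraction: float = 0.5) -> int:
    """Largest process count whose combined footprint fits in ram_fraction of RAM."""
    if per_process_mb <= 0:
        raise ValueError("per-process size must be positive")
    return int(ram_mb * ram_fraction // per_process_mb)


# Hypothetical host: 8 GB of RAM, about 6 MB resident per httpd child.
print(max_clients_for(8192, 6))  # 682
```

The estimate ignores shared pages and per-request spikes, so it is a starting point for tuning rather than a guarantee.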

As for log rotation, I've been using Apache on Solaris for about 7 years now, and I gotta say it works most of the time, but not always. Random things happen and there's nothing we can do about it. I've seen this happen on quiet systems and on heavily loaded systems: you do a "restart" and the old processes don't die. Then the next batch, waiting to bind to port 80, doesn't die either. And so on. The best we could do is pkill httpd, wait for everything to die, then pkill -9 httpd, wait some more, and then restart Apache, which we may very well start doing in light of this, however ungraceful. Or tune MaxClients down even more and see if this happens again.
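The "pkill httpd, wait, pkill -9 httpd, wait some more" sequence is a graceful-then-forceful shutdown. A minimal sketch of that escalation, assuming the script launched the processes itself and so holds `subprocess.Popen` handles; on the real server the PIDs would come from pgrep/pkill or Apache's pid file instead. The function name and grace period are hypothetical, and the restart step is left out:

```python
import subprocess
import time


def stop_all(procs: list, grace: float = 10.0) -> list:
    """SIGTERM every process, then SIGKILL whatever is still alive after `grace` seconds.

    procs: subprocess.Popen handles. Returns the PIDs that had to be force-killed.
    """
    for p in procs:
        if p.poll() is None:
            p.terminate()  # SIGTERM on POSIX, like a plain `pkill httpd`
    # One shared deadline, so the total wait stays near `grace`
    # no matter how many processes are stuck.
    deadline = time.monotonic() + grace
    forced = []
    for p in procs:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            p.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            p.kill()  # SIGKILL on POSIX, like `pkill -9 httpd`
            p.wait()  # reap it so it doesn't linger as a zombie
            forced.append(p.pid)
    return forced
```

Processes that honour SIGTERM get a chance to exit cleanly; only the ones that ignore it within the grace period are killed outright.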

- Matt
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 130994



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.