Maintenance Day

bill
Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1143626 - Posted: 21 Aug 2011, 23:25:21 UTC

What would be the downside to moving the maintenance
day from Tuesday to Monday? Since the system (more often
than not) develops problems over the weekend, why not
move the maintenance up and possibly decrease the total
downtime for the week?

It might also give one more day for any problems that pop up
after the system is brought back online to appear while
everybody is still in the office, so they get fixed
during normal work hours (hopefully).
ID: 1143626
Slavac
Volunteer tester
Joined: 27 Apr 11
Posts: 1932
Credit: 17,952,639
RAC: 0
United States
Message 1143627 - Posted: 21 Aug 2011, 23:28:43 UTC - in response to Message 1143626.  

I like Tuesday for a few reasons:

1. Allows programmers to get their ducks in a row before they start working on Tues.

2. Who wants to tear apart a server closet on Monday after a long Sunday in the sun?




Executive Director GPU Users Group Inc. -
brad@gpuug.org
ID: 1143627
.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 55,390,408
RAC: 69
United Kingdom
Message 1143667 - Posted: 22 Aug 2011, 1:24:07 UTC

Some time ago I read that doing it on a Tuesday avoids public holidays when they happen and keeps the automated bits of the job nice and regular.
ID: 1143667
soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1143683 - Posted: 22 Aug 2011, 2:40:07 UTC

My preference is: any time they need to, for as long as they need to. Just let us know beforehand if possible.
Janice
ID: 1143683
Tom95134
Joined: 27 Nov 01
Posts: 216
Credit: 3,790,200
RAC: 0
United States
Message 1143685 - Posted: 22 Aug 2011, 2:40:52 UTC

Starting the Maintenance Cycle on Tuesday also gives the guys a chance to try and figure out what little "gifts" we left in the system over the weekend and work out a sensible plan to try to sort them out during the maintenance cycle.

Even if there are no surprises from the weekend, it gives the guys a chance to evaluate what went right over the weekend, make a reasonable plan for the Maintenance Cycle, and then get things running again and the pipeline filled.


ID: 1143685
EdwardPF
Volunteer tester
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1143706 - Posted: 22 Aug 2011, 3:16:13 UTC - in response to Message 1143685.  

My 2 cents: it's not my project, it belongs to the boys out west ... I'm just glad they let me play in the sandbox!!

Ed F
ID: 1143706
bill
Joined: 16 Jun 99
Posts: 861
Credit: 29,352,955
RAC: 0
United States
Message 1143740 - Posted: 22 Aug 2011, 6:15:16 UTC
Last modified: 22 Aug 2011, 6:16:49 UTC

Ahh, the answer appears to be political. It would
interfere with government-mandated holidays.

Personally, I think it would decrease downtime overall,
but we'll never know.
ID: 1143740
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22217
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1143751 - Posted: 22 Aug 2011, 7:41:00 UTC

Having once worked in server management, I'd say Monday is the day most problems are uncovered, so Tuesday is the day to start fixing them. If hardware is needed, that gives you a fighting chance to get it ordered and delivered before you start the sweaty jobs in the server room. And as Slavac says, who wants to go into the server room after a nice weekend in the sun?

The posted three-day maintenance "window of opportunity" is a very good idea: on the first day you do the routine work, and the rest you either use or don't, but your user base knows there may be other periods of non-availability.

(On an 8/24-5/7 operation like S@H, with a complex and ageing server collection, there is a well-known fact of life: servers will misbehave worst just when everyone has had the "third glass of red", so can't drive back to work...)
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1143751
Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 1143787 - Posted: 22 Aug 2011, 11:58:03 UTC - in response to Message 1143740.  

Ahh, the answer appears to be political. It would
interfere with government-mandated holidays.

Personally, I think it would decrease downtime overall,
but we'll never know.


I think that is a bit of an oversimplification. People expect holidays, mandated or not. You can ask them to give them up, but you will pay a long-term price in staff performance, and eventually staff migration. Been there, both as a hard-assed manager and as a fed-up employee.

And what will we do when there is a Monday holiday? Skip a week's worth of maintenance? Sometimes have maintenance on Tuesdays? There are prices to pay for both of those options as well.

As often happens in life, you set out looking for a good solution, but may have to settle for the "least bad" solution.

ID: 1143787
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1143805 - Posted: 22 Aug 2011, 13:14:57 UTC

Monday is also often a "recovery day" for any problems that might have occurred over the weekend. I normally end up doing maintenance on our stuff at work on Wednesday or Thursday.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1143805
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1143806 - Posted: 22 Aug 2011, 13:17:48 UTC - in response to Message 1143805.  

Monday is also often a "recovery day" for any problems that might have occurred over the weekend. I normally end up doing maintenance on our stuff at work on Wednesday or Thursday.

I have often seen a 'quick fix' put into place on Monday to get things running until the full sort on Tuesday.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1143806
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 1143819 - Posted: 22 Aug 2011, 14:39:37 UTC

How about segmenting the system architecture better so that different parts can be "maintained" on different days, deferring complete shutdowns to a more infrequent interval?
ID: 1143819
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1143832 - Posted: 22 Aug 2011, 15:10:43 UTC - in response to Message 1143819.  

How about segmenting the system architecture better so that different parts can be "maintained" on different days, deferring complete shutdowns to a more infrequent interval?

I thought the primary reason for the weekly maintenance was to make a backup of the science databases, which effectively takes everything down.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1143832
musicplayer
Joined: 17 May 10
Posts: 2430
Credit: 926,046
RAC: 0
Message 1143859 - Posted: 22 Aug 2011, 15:52:31 UTC
Last modified: 22 Aug 2011, 15:54:11 UTC

If I push the Update button on PrimeGrid, I will pass 8 million in credit there today.

Even so, I was able to upload my results here a little earlier on. It shows that there are people who still believe in this project.

Also I received a CUDA-task a little earlier on from someone who must have been a very early morning bird. Not bad!

Isn't it really so that when downloads do not work, you need to have a cache?

So if uploads break down on either a Saturday or Sunday, you soon run dry.

If one day or so is "fix it" day and it works the rest of the week, Tuesday seems to me the best day for backing up the data. It really could soon be worse.

Uploads are separate from downloads. Downloads are raw, unprocessed data that come from Arecibo. Uploads are the finished tasks, which are meant to be stored in the Master Database or Science Database as well.
ID: 1143859
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1143870 - Posted: 22 Aug 2011, 16:01:53 UTC - in response to Message 1143832.  

How about segmenting the system architecture better so that different parts can be "maintained" on different days, deferring complete shutdowns to a more infrequent interval?

I thought the primary reason for the weekly maintenance was to make a backup of the science databases, which effectively takes everything down.

Posted about 4 days ago by Matt in Tech News:

"The most important thing that happens during the outage is that we compress the mysql databases. Since we are inserting/deleting millions of rows per day (all results and workunits) the database pages get ridiculously fragmented really fast, and after about a week can no longer fit in memory. The compression part is what takes about 2-3 hours, and you can't really do much with the database while that happens, which is why we stop the projects.

The actual backup part takes about 45 minutes, and could actually happen live if we did it on the replica, but just to be safe we back up the master.

We also take care of other odds and ends, like rotating the backend server logs, and replacing broken drives, etc. while everything is quiet."
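For a rough sense of scale, the figures in Matt's note can be added up. This is only a minimal Python sketch: it assumes the compression and the backup run one after the other, that a backup taken on the replica would add no downtime, and it ignores the odds and ends (log rotation, drive swaps). The function name `outage_window` and those assumptions are illustrative, not anything Matt describes.

```python
from datetime import timedelta
from typing import Tuple

# Durations quoted in Matt's Tech News post.
COMPRESS_MIN = timedelta(hours=2)   # "about 2-3 hours" (lower bound)
COMPRESS_MAX = timedelta(hours=3)   # "about 2-3 hours" (upper bound)
BACKUP = timedelta(minutes=45)      # "about 45 minutes"


def outage_window(backup_on_replica: bool = False) -> Tuple[timedelta, timedelta]:
    """Return the (shortest, longest) estimated outage.

    Assumptions (not from the post): compression and backup run back to
    back, and a backup taken live on the replica adds no downtime.
    """
    extra = timedelta(0) if backup_on_replica else BACKUP
    return COMPRESS_MIN + extra, COMPRESS_MAX + extra


if __name__ == "__main__":
    for on_replica in (False, True):
        low, high = outage_window(on_replica)
        print(f"backup on replica={on_replica}: {low} to {high}")
```

On those assumptions the backup is the smaller part of the window, so moving it to the replica would shorten the outage but not remove it; the compression is what keeps the projects stopped.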
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1143870
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1143890 - Posted: 22 Aug 2011, 16:31:15 UTC - in response to Message 1143870.  

Right! That's it. Thanks.
Like I can remember 4 days ago... :)
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1143890
justsomeguy
Joined: 27 May 99
Posts: 84
Credit: 6,084,595
RAC: 11
United States
Message 1143895 - Posted: 22 Aug 2011, 16:40:20 UTC - in response to Message 1143890.  


Like I can remember 4 days ago... :)



Hear, hear!

Heck, all I remember from yesterday is blah blah blah, you never listen, or something like that anyway...

:)

"Two things are infinite: The universe and human stupidity; and I'm not sure about the universe." - Albert Einstein

ID: 1143895
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1143942 - Posted: 22 Aug 2011, 17:38:55 UTC - in response to Message 1143929.  

If you want to know the findings of any result (which is far off topic for this thread), the Near Time Persistency Checker (NTPCKR, or "nit picker") is supposed to do just that. The problem is that due to SETI's overworked and overstressed server infrastructure, the NTPCKR functionality has been temporarily disabled.

In the meantime, we're just collecting the results to be analyzed later.
ID: 1143942
musicplayer
Joined: 17 May 10
Posts: 2430
Credit: 926,046
RAC: 0
Message 1143948 - Posted: 22 Aug 2011, 17:44:49 UTC

Is there a difference between some blahh blahh blahh and the results that are being received by you from the users of Seti@home?
ID: 1143948
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1143961 - Posted: 22 Aug 2011, 18:06:51 UTC - in response to Message 1143948.  

Is there a difference between some blahh blahh blahh and the results that are being received by you from the users of Seti@home?


Being received by me? I am not receiving anything. I'm nowhere near Berkeley, California. I work in IT as a Server Administrator.

If you want to talk numbers, create a new thread here in Number Crunching and I'm sure someone who understands the signal analysis would love to speak up and help you.
ID: 1143961
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.