Outage schedule change suggestion.

Message boards : Number crunching : Outage schedule change suggestion.

AuthorMessage
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1014502 - Posted: 10 Jul 2010, 23:33:13 UTC

Has anyone thought about changing the outage to Friday through Monday? On Saturday and Sunday, at least, the staff wouldn't need to be around to restart the servers if they crash (and it does seem to happen on the weekends), and on Friday the staff can set up the lab to do "the science" that is the reason for the longer outage (I'm assuming no one is doing "the science" on a round-the-clock basis, and that whatever computing is being done in furtherance of "the science", staff should not need to baby-sit it).
ID: 1014502 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1014507 - Posted: 10 Jul 2010, 23:49:06 UTC - in response to Message 1014502.  

AFAIK the main point was to free resources for their internal work with the database/servers.
Obviously it should be on work days, not holidays ;)
ID: 1014507 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1014512 - Posted: 11 Jul 2010, 0:04:59 UTC - in response to Message 1014507.  

Why? Do they have to be on site while the database/servers are running? If the work was started on Friday, couldn't it run the rest of the weekend? It is not obvious to me that a weekend outage wouldn't work.
ID: 1014512 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1014517 - Posted: 11 Jul 2010, 0:47:07 UTC - in response to Message 1014512.  

It depends on how long each query is. If it's a full-day query, then it could indeed be done from Friday to Monday. But if it's just many short queries, each of which needs full server capacity and each of which requires some human thinking (yes, AI is still not good enough :P ) before starting the next one... I think you get the picture :)
ID: 1014517 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1014541 - Posted: 11 Jul 2010, 2:40:59 UTC - in response to Message 1014517.  

Do you know for a fact that they must have people there all the time? I am not familiar with you and I don't know how to judge your argument. What happens now if there is a problem after midnight on a Tuesday or Wednesday when I assume no one is around?

I don't have "the picture" because I have not read anything detailed about "the science" being done and how it is accomplished during the current three-day long outage. (The front page and Technical News say little about what they are doing during the outage to advance "the science" or how it is being done.)

If it is as you say, then they might load a number of queries in a queue that would take them through the weekend; if the servers they are using crash over the weekend they might remotely re-start, as they try to do now when upload/download servers crash over a weekend.
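The queue idea above can be sketched in a few lines of Python. This is only an illustration of the general technique (run queued jobs in order, retry a job that fails), not a description of how the SETI@home staff actually run their queries; the function name `run_queue`, the job names, and the retry count are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Job = Tuple[str, Callable[[], object]]


def run_queue(jobs: Deque[Job], max_retries: int = 2) -> List[Tuple[str, str]]:
    """Run queued jobs in order; retry a failing job before giving up on it."""
    report: List[Tuple[str, str]] = []
    while jobs:
        name, job = jobs.popleft()
        last_error = None
        # One initial attempt plus up to max_retries retries.
        for attempt in range(1, max_retries + 2):
            try:
                job()
                report.append((name, f"ok after {attempt} attempt(s)"))
                break
            except Exception as exc:  # a crashed server, a blocked query, ...
                last_error = exc
        else:
            # Every attempt failed: record it and move on to the next job.
            report.append((name, f"failed: {last_error}"))
    return report


def make_flaky_job(failures_before_success: int) -> Callable[[], None]:
    """Hypothetical job that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def job() -> None:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("server restarted")

    return job


if __name__ == "__main__":
    queue: Deque[Job] = deque([
        ("merge spike tables", make_flaky_job(1)),
        ("integrity check", make_flaky_job(0)),
    ])
    for name, status in run_queue(queue):
        print(f"{name}: {status}")
```

Whether this would work in practice depends on the point Raistmer raises: it only helps if the next query does not need a person to look at the previous result first.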
ID: 1014541 · Report as offensive
Profile Geek@Play
Volunteer tester
Joined: 31 Jul 01
Posts: 2467
Credit: 86,146,931
RAC: 0
United States
Message 1014545 - Posted: 11 Jul 2010, 3:00:42 UTC
Last modified: 11 Jul 2010, 3:01:02 UTC

The reason for the science fair being scheduled for Tuesday through Thursday was to allow our scientists more time for other endeavors besides keeping the servers and SETI online. You know, time for bug squashing, Boinc software enhancements, and Nitpicker to pick through the data uninterrupted. The scientists are in the office on these days doing other things for us.
Boinc....Boinc....Boinc....Boinc....
ID: 1014545 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1014553 - Posted: 11 Jul 2010, 3:49:15 UTC - in response to Message 1014545.  

Geek@Play, you seem to be telling me that when they said the extended outage was meant to do "the science", that included system maintenance, like "bug squashing" and "BOINC software enhancements". Those things don't need so much computing power that the upload/download servers need to be off. As far as NTPCkr going through the data, it doesn't sound like much human intervention should be required if, as I suggested, the data to be checked is put in a queue. At any rate, I hate to be so blunt, but are you and Raistmer guessing or can you point me to some post or discussion that says what "the science" consists of? As I said before, I haven't read about it on the front page or Technical News.
ID: 1014553 · Report as offensive
Profile John Neale
Volunteer tester
Joined: 16 Mar 00
Posts: 634
Credit: 7,246,513
RAC: 9
South Africa
Message 1014577 - Posted: 11 Jul 2010, 6:19:24 UTC

Qui-Gon, Matt Lebofsky did tell us what was achieved during the first three-day scheduled outage, in the first post of the Staycation thread, in the Technical News forum.

He used the following phrases: "...stuff that would never get done under normal-operation circumstances..." and "...doing so while the project was up was causing all kinds of headaches...". In addition, I think his comment that "the mood around here is a lot better when we have the time and resources to take care of longstanding projects without worrying about squeezing them in edgewise" says a lot.

All of this may not answer your question directly, but Matt did say that future outage schedules may be adjusted as productivity improves. I'm speculating, but it may be that once they've completed some of the hands-on projects, the data analysis (the "science") will be able to run over weekends, without requiring constant human supervision or intervention.
ID: 1014577 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1014765 - Posted: 11 Jul 2010, 19:57:37 UTC - in response to Message 1014577.  

All of this may not answer your question directly, but Matt did say that future outage schedules may be adjusted as productivity improves. I'm speculating, but it may be that once they've completed some of the hands-on projects, the data analysis (the "science") will be able to run over weekends, without requiring constant human supervision or intervention.

I read that post by Matt. I believe he was saying that during the extended outage they can work on their projects uninterrupted by server crashes. That certainly has merit (I don't like interruptions either when doing work that requires serious concentration).

I started this thread to put out the idea of changing the extended outage to the weekend. I hope that at some point it will be feasible to have a single work-day (Friday) outage, for doing the work Matt was describing, followed by an upload/download outage on Saturday-Sunday, for doing "the science" that requires the use of the project's computing power. That way the staff will be available for most of the work-week to deal with old cranky servers, and if the servers behave, they might even do some of "the science" work.
ID: 1014765 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1015276 - Posted: 13 Jul 2010, 4:38:12 UTC

The one thing that struck me from the staycation thread was:

Data wise, we were able to get back to merging our various spike tables together full bore - doing so while the project was up was causing all kinds of headaches. We'll have to turn the merge off over the weekend, of course. I also was able to do a whole bunch of data integrity testing - it's nice to be able to pull 1 Gbyte of signals out of the science database without the query getting blocked, or worrying about blocking other queries.


From History, we know that the Science Database is Postgres and the Boinc Database is MySQL. One of the plans was to migrate the Science Database to MySQL (which takes time for all those millions of records, plus some unknown allowance for problems).

Each Database has things it likes or dislikes about various queries.

As each completed workunit gets uploaded and reported, it transitions from MySQL to Postgres. So keeping the Databases running and talking (which has always been a problem with multiple machines accessing the databases) is not easy.

So with the goal of getting long-standing "broken things" working (there is no time to fix them during the normal weekly outage)... The time they are spending is fine with me.

I also await the time when they talk about changes to the plan that allow for changes of days, etc.

Regards


Please consider a Donation to the Seti Project.

ID: 1015276 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1015418 - Posted: 13 Jul 2010, 23:49:36 UTC - in response to Message 1015276.  
Last modified: 13 Jul 2010, 23:50:23 UTC

Thank you Pappa, and all others who have expressed their views on my suggestion to change the outage days. At this point, there may be very good reasons to keep the extended outage from Tuesday to Thursday, but my hope is that when having staff on hand to manage tasks becomes less critical, it might make sense to consider moving the outage to the weekend. If that is done, server crashes that happen on work days would be more quickly addressed by staff, and server crashes that occur on a weekend (when no one is around to fix things, and when the project's computing power would be used for "the science") would not impact crunching by the volunteers.

I see now that the staff is able to clean up long-standing issues during the weekday extended outage; when those issues are under control, and I hope that is soon, it would benefit the project and help out the volunteer crunchers to have staff available as much as possible on work days to keep the servers up, supplying work and collecting results. Under the present system, crunchers cannot get or return work from Tuesday to Thursday; coupling that with the good possibility that the servers will go down over any given weekend, when staff can't correct the problem, means that crunchers could, potentially, have only Monday and Friday to access the upload/download servers.
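The day-counting behind that worst case is easy to check. The sketch below is illustrative only: it assumes a crash late on Friday keeps the servers down for all of Saturday and Sunday, and the names `WEEK` and `reachable_days` are made up for this example.

```python
WEEK = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def reachable_days(planned_outage, unattended_crash_days=()):
    """Days on which volunteers can still reach the upload/download servers."""
    down = set(planned_outage) | set(unattended_crash_days)
    return [day for day in WEEK if day not in down]


# Present schedule: Tue-Thu outage, plus a crash that lasts the whole weekend.
present_worst_case = reachable_days(["Tue", "Wed", "Thu"], ["Sat", "Sun"])

# Hypothetical weekend schedule: a weekend crash falls inside the planned outage.
weekend_worst_case = reachable_days(["Fri", "Sat", "Sun"], ["Sat", "Sun"])

if __name__ == "__main__":
    print(present_worst_case)   # ['Mon', 'Fri']
    print(weekend_worst_case)   # ['Mon', 'Tue', 'Wed', 'Thu']
```

Under these assumptions the present schedule leaves two reachable days in a bad week, while a Friday-to-Sunday outage would leave four, because the weekend crash and the planned outage overlap.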
ID: 1015418 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19525
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1015433 - Posted: 14 Jul 2010, 0:36:37 UTC - in response to Message 1015418.  

It is even worse from a European viewpoint, especially for computers run only during office hours. A weekend outage means the servers are off during the European Monday working hours, as their end of day is at about the same time as the Berkeley start of working day. And on Friday when there is liable to be an extended recovery period and a limit on downloads it will be impossible to download enough to cover the next week.
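The time-zone overlap can be checked with a short calculation. This is a rough sketch using fixed summer-time UTC offsets (an assumption that holds for July 2010 but not year-round), and the 09:00 Monday restart time in Berkeley is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed summer-time offsets, valid for July 2010 only (assumption).
PDT = timezone(timedelta(hours=-7), "PDT")   # Berkeley
BST = timezone(timedelta(hours=1), "BST")    # United Kingdom
CEST = timezone(timedelta(hours=2), "CEST")  # Central Europe


def local_time(berkeley_time: datetime, tz: timezone) -> datetime:
    """Convert a Berkeley wall-clock time to another zone."""
    return berkeley_time.astimezone(tz)


# Hypothetical: servers restarted at the start of the Berkeley working day.
monday_restart = datetime(2010, 7, 12, 9, 0, tzinfo=PDT)

if __name__ == "__main__":
    for tz in (BST, CEST):
        print(tz.tzname(None), local_time(monday_restart, tz).strftime("%H:%M"))
```

With these offsets, a 09:00 restart in Berkeley is 17:00 in the UK and 18:00 in Central Europe, so an office machine there would see essentially none of Monday.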
ID: 1015433 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1015450 - Posted: 14 Jul 2010, 1:31:06 UTC

There is more:
If we look at bringing a single server up to date for the OS and the various "patches" to make things behave correctly... That alone can take a couple of days. Not to mention machines that are not listed as doing Science. Then there is bringing MySQL up to date with the various patches needed to keep all the machines that talk to the MySQL Database in sync...

Meanwhile, Jeff is working on fixing queries that talk to all the databases to make them work more efficiently (and not cause an outage).

The last I heard, the Boinc Database (MySQL) is one of the largest active databases (by number of transactions/second) in the world.
With over a million Results reported daily (going by the reported WUs sent per day), the database is hit as the tapes are split, as the Scheduler links the WU to the User and the returned result to the user, then for silly things like credit granted as a result of the validation, and finally as the WUs get purged and the Canonical Result gets transferred to the Science Database. Add a few Users scraping stats via the web... And then Users wanting to post in the forums (open a long thread and hit refresh: each message that has a Number has just been queried).

The Staff makes tweaks, or asks for help, to improve the quality of the Database Engine and Nix. They know the Developers; your outage means someone gets an email.

As for the suggestion to move the Science (downtime) to the weekends to increase User productivity: if they are not worried about the servers, then it would make some sense. It would presume they are not having to stop and restart processes when they need to tweak a program that is misbehaving. As things get stabilized, I suspect we will see some changes.

If I remember correctly (and I would have to go back and look), there was a statement, I believe by Matt, to the effect that due to the Budget shortfall for UCB, people were required to take unpaid "Furlough Days", which were supposed to run into the Fall.

I suspect that, as it implied, there was some flexibility in the timing of those days, and that a few of the Staff are overdue for postponed time off. That was also an implication in Staycation.

Regards


Please consider a Donation to the Seti Project.

ID: 1015450 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1015459 - Posted: 14 Jul 2010, 2:12:30 UTC - in response to Message 1015450.  

Whew! Okay Pappa, you've convinced me that for now the weekday outages are necessary. All I'm trying to say is that when it is possible, outages on the weekend would be much fairer to crunchers who have over the years lost accessibility over many weekends and who now have lost accessibility every week from Tuesday to Thursday (not to mention difficulties connecting by those in far away time-zones). The prospect of having only two days of connection in a week when the servers go down late Friday evening (this happens!) is pretty dismal. Even though the extended outage can't be scheduled for the weekend now, I hope that someday when the staff gets things running more efficiently, the option will be considered.
ID: 1015459 · Report as offensive
ront

Joined: 25 Aug 01
Posts: 77
Credit: 386,336
RAC: 0
United States
Message 1015539 - Posted: 14 Jul 2010, 10:05:20 UTC

Good Morning,

I was directed to ask this question here.


"Can anyone assist me in adjusting my program so that I get more than 2 WUs at a time?

I have adjusted profile to reflect "10" days.

Do I need to do anything else (Obvious answer = yes! :-))?

If 'cache' needs adjusting, I will probably need details.

Thanks,

Be Blessed & Be A Blessing,"

ront


ID: 1015539 · Report as offensive
Profile Gundolf Jahn

Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 1015545 - Posted: 14 Jul 2010, 10:33:10 UTC - in response to Message 1015539.  
Last modified: 14 Jul 2010, 10:43:21 UTC

You still haven't found the right thread ;-)

You should either open a new thread or look for ones that better suit your problem, like Not requesting new tasks.

But anyway, we need more information.

Does your client ask for new work?

What is your Duration Correction Factor (to be found in host details)?

[edit]What is the status of your other (inactive) project (debt/offsets)?[/edit]

Please also post the first few lines from the Messages tab after a restart of BOINC, including an update request.

Regards,
Gundolf
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours
ID: 1015545 · Report as offensive
ront

Joined: 25 Aug 01
Posts: 77
Credit: 386,336
RAC: 0
United States
Message 1015551 - Posted: 14 Jul 2010, 11:29:57 UTC - in response to Message 1015545.  

Thank you for your response.

I will try to find the appropriate thread per your counsel.

Yes, the host does request tasks, but no more than three.

I do not know what the duration correction factor is, nor where to find it.

I do not have any other "projects."

14-Jul-10 06:38:15 SETI@home Requesting new tasks
14-Jul-10 06:38:36 Project communication failed: attempting access to reference site
14-Jul-10 06:38:36 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 06:38:37 Internet access OK - project servers may be temporarily down.
14-Jul-10 06:40:01 SETI@home Sending scheduler request: To fetch work.
14-Jul-10 06:40:01 SETI@home Requesting new tasks
14-Jul-10 06:40:23 Project communication failed: attempting access to reference site
14-Jul-10 06:40:23 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 06:40:24 Internet access OK - project servers may be temporarily down.
14-Jul-10 06:45:04 SETI@home Sending scheduler request: To fetch work.
14-Jul-10 06:45:04 SETI@home Requesting new tasks
14-Jul-10 06:45:26 Project communication failed: attempting access to reference site
14-Jul-10 06:45:26 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 06:45:27 Internet access OK - project servers may be temporarily down.
14-Jul-10 06:49:36 SETI@home Sending scheduler request: To fetch work.
14-Jul-10 06:49:36 SETI@home Requesting new tasks
14-Jul-10 06:49:58 Project communication failed: attempting access to reference site
14-Jul-10 06:49:58 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 06:50:00 Internet access OK - project servers may be temporarily down.
14-Jul-10 07:25:09 SETI@home Sending scheduler request: To fetch work.
14-Jul-10 07:25:09 SETI@home Requesting new tasks
14-Jul-10 07:25:31 Project communication failed: attempting access to reference site
14-Jul-10 07:25:31 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 07:25:32 Internet access OK - project servers may be temporarily down.
14-Jul-10 07:26:44 SETI@home update requested by user
14-Jul-10 07:26:46 SETI@home Sending scheduler request: Requested by user.
14-Jul-10 07:26:46 SETI@home Requesting new tasks
14-Jul-10 07:27:08 Project communication failed: attempting access to reference site
14-Jul-10 07:27:08 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 07:27:10 Internet access OK - project servers may be temporarily down.
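A log like the one above can be summarized with a short script. This is a minimal sketch that assumes each line starts with a `DD-Mon-YY HH:MM:SS` timestamp, as in the excerpt; it is not part of BOINC, the function name `summarize` is made up, and month parsing relies on the default English locale.

```python
from datetime import datetime

# A few lines copied from the excerpt above.
SAMPLE = """\
14-Jul-10 06:38:15 SETI@home Requesting new tasks
14-Jul-10 06:38:36 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 06:40:01 SETI@home Requesting new tasks
14-Jul-10 06:40:23 SETI@home Scheduler request failed: Couldn't connect to server
14-Jul-10 06:45:04 SETI@home Requesting new tasks
14-Jul-10 06:45:26 SETI@home Scheduler request failed: Couldn't connect to server
"""


def summarize(log_text: str) -> dict:
    """Count work requests and scheduler failures, and time between requests."""
    requests = []
    failures = 0
    for line in log_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        date_part, time_part, message = parts
        stamp = datetime.strptime(f"{date_part} {time_part}", "%d-%b-%y %H:%M:%S")
        if message.endswith("Requesting new tasks"):
            requests.append(stamp)
        elif "Scheduler request failed" in message:
            failures += 1
    gaps = [int((b - a).total_seconds()) for a, b in zip(requests, requests[1:])]
    return {"attempts": len(requests), "failures": failures, "gaps_seconds": gaps}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Here every attempt fails with "Couldn't connect to server", which points at the servers being unreachable rather than at a cache setting on the host.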

Again, thanks for your help.

I hope this is helpful

ront
ID: 1015551 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1015552 - Posted: 14 Jul 2010, 11:31:53 UTC - in response to Message 1015551.  

Servers should be coming back up Friday. No one is getting or turning in units at the moment.
Janice
ID: 1015552 · Report as offensive
Profile Qui-Gon
Volunteer tester
Joined: 15 May 99
Posts: 2940
Credit: 19,199,902
RAC: 11
United States
Message 1016382 - Posted: 16 Jul 2010, 18:50:06 UTC

The suggestion in this thread is based on the kind of problem we are seeing today.

From the front page:

The project down due to a network problem.
We have lost connectivity somewhere between our two routers. One of these routers is located at the Space Sciences Lab and the other at the Peering And Internet eXchange (PAIX) in Palo Alto. Until we get this problem cleared up we cannot distribute new work.

If this happened on Saturday, or if this problem persists into the weekend, we who crunch would be unable to connect for more than half of the week. If there needs to be a three-day upload/download outage, and Pappa has convinced me the need exists, then as soon as possible it should be moved to the weekend.

ID: 1016382 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1016392 - Posted: 16 Jul 2010, 19:06:19 UTC - in response to Message 1016382.  

Then the staff would have to work on the weekend to do the things they try to get done during the Tue-Fri outage.
My understanding is that the extended outage is meant to leave the db free for them to work on things, which can't be done with everyone using it.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1016392 · Report as offensive
 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.