Panic Mode On (9) Server problems


Message boards : Number crunching : Panic Mode On (9) Server problems

Tonne
Joined: 11 Jun 99
Posts: 27
Credit: 16,093,366
RAC: 0
Sweden
Message 818667 - Posted: 15 Oct 2008, 8:07:03 UTC

Sure. Start writing one minute, informing us about what is going on, and finish one minute later. This total lack of interest in us crunchers is making me think about switching to support another project.
____________
/TB

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 667
Credit: 12,514,650
RAC: 8,194
New Zealand
Message 818668 - Posted: 15 Oct 2008, 8:08:09 UTC - in response to Message 818653.

From Matt's post in the Technical News section:

"We're also fairly pegged at our network limit again, I think thanks to the workunit turnaround time being pretty low (i.e. fast). Plus I have to send extra raw data to our archive over the same link. Oh well. Expect data transfer headaches for the next qwhile."

I think that we are still in the 'Qwhile' period, and catching up after the extended Tuesday outage is bogging down the network link.

No servers are down; there are no faults to fix. Just cache a couple of days' work and you won't have any problems. The system is tolerant of temporary faults and network overloads and will sort itself out in due course.

Ian

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8644
Credit: 24,405,883
RAC: 24,936
United Kingdom
Message 818670 - Posted: 15 Oct 2008, 8:13:24 UTC - in response to Message 818653.

There was prior warning here, and the front page still says:
[quote]Weekly Outage
Every Tuesday morning (Pacific time) we have a 3-4 hour outage for database maintenance. The upload/download servers and some web pages are offline during this time. Due to extra work today's outage will start earlier and end later than usual.[/quote]

Tonne
Joined: 11 Jun 99
Posts: 27
Credit: 16,093,366
RAC: 0
Sweden
Message 818672 - Posted: 15 Oct 2008, 8:26:42 UTC

I've said what I wanted to say, so now I rest my case.
____________
/TB

Aristoteles Doukas
Joined: 11 Apr 08
Posts: 1091
Credit: 2,140,913
RAC: 0
Finland
Message 818673 - Posted: 15 Oct 2008, 8:30:20 UTC

Did not know I was in court; hope the verdict is "you are free".

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 667
Credit: 12,514,650
RAC: 8,194
New Zealand
Message 818678 - Posted: 15 Oct 2008, 8:48:03 UTC - in response to Message 818673.

[quote]Did not know I was in court; hope the verdict is "you are free".[/quote]


The verdict is

"Check back again in 24 hours and everything will be normal."

Ian

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8551
Credit: 50,433,731
RAC: 51,240
United Kingdom
Message 818679 - Posted: 15 Oct 2008, 8:52:10 UTC

OK, if you're done, let's start a new panic: the Server Status page hasn't updated since "As of 15 Oct 2008 6:00:20 UTC".

And if anyone does get out of bed at 2 a.m. local time, I expect them to concentrate on fixing it, not transcribing what I've just written to the front page.

(And when I say 'expect', I mean that's my prediction of what they will do, not any sort of demand on my part).

Tonne
Joined: 11 Jun 99
Posts: 27
Credit: 16,093,366
RAC: 0
Sweden
Message 818682 - Posted: 15 Oct 2008, 9:14:44 UTC - in response to Message 818679.

[quote]OK, if you're done, let's start a new panic: the Server Status page hasn't updated since "As of 15 Oct 2008 6:00:20 UTC".

And if anyone does get out of bed at 2 a.m. local time, I expect them to concentrate on fixing it, not transcribing what I've just written to the front page.

(And when I say 'expect', I mean that's my prediction of what they will do, not any sort of demand on my part).[/quote]


Sorry that you don't understand what I mean. What I mean is that after several years of not bothering to write even the shortest possible explanation, it is starting to irritate me. That is what I'm talking about: this total lack of care.
____________
/TB

Ageless
Joined: 9 Jun 99
Posts: 12311
Credit: 2,607,444
RAC: 1,009
Netherlands
Message 818684 - Posted: 15 Oct 2008, 9:37:30 UTC - in response to Message 818682.

[quote]This total lack of care.[/quote]

Have you ever looked here?

But when things start to irritate you, it's time for different venues. There are 100+ other projects out there to explore. Some with forums, some without. Some with personal admin care, most without.

Or perhaps it's time to move away from the keyboard and go do something else. Watch Heroes 3 or something like it (on BBC 2 tonight).
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 667
Credit: 12,514,650
RAC: 8,194
New Zealand
Message 818688 - Posted: 15 Oct 2008, 10:03:21 UTC - in response to Message 818682.

[quote][quote]OK, if you're done, let's start a new panic: the Server Status page hasn't updated since "As of 15 Oct 2008 6:00:20 UTC".

And if anyone does get out of bed at 2 a.m. local time, I expect them to concentrate on fixing it, not transcribing what I've just written to the front page.

(And when I say 'expect', I mean that's my prediction of what they will do, not any sort of demand on my part).[/quote]

Sorry that you don't understand what I mean. What I mean is that after several years of not bothering to write even the shortest possible explanation, it is starting to irritate me. That is what I'm talking about: this total lack of care.[/quote]



Put the boot on the other foot:

What if SETI started emailing you because YOU hadn't reported in for 48 hours?

What you agreed to when you signed up is to give your 'free' CPU cycles to SETI, for them to use when they have work for you to do. They don't guarantee to supply you with work units 24/7. LHC has been patchy on work for the last couple of weeks; they have melted a magnet and won't be back online for a few months. Live with it.

If having work for your PC is important to you, then sign up for a couple of other projects. If SETI is bogged down, let your work cache run, or do work on another project.

Sure, they could hire a PR department to keep us up to date on the hour-to-hour status of the project, but I prefer that they work on keeping the servers running and just keep us informed of any major problems. A bit of network congestion is NOT a major problem; it has no serious effect on YOU or the project.

RELAX

Ian

Tonne
Joined: 11 Jun 99
Posts: 27
Credit: 16,093,366
RAC: 0
Sweden
Message 818689 - Posted: 15 Oct 2008, 10:09:30 UTC - in response to Message 818688.

What is it you don't get? 2 lines on the front page. 1 minute's work. I'm not talking about writing a bestselling novel or e-mailing every user individually.

Server X is down because of Y, and we expect it to be up again in Z days/hours.

How long can that take to write on the front page?
____________
/TB

champ
Volunteer tester
Joined: 12 Mar 03
Posts: 3642
Credit: 1,489,147
RAC: 0
Germany
Message 818692 - Posted: 15 Oct 2008, 10:16:35 UTC

Calm down Tonne.

Your point is noted.

The crew is working hard to solve the problem(s). They do what they can. We are all humans and can fail in our work.
____________

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8644
Credit: 24,405,883
RAC: 24,936
United Kingdom
Message 818700 - Posted: 15 Oct 2008, 10:38:10 UTC - in response to Message 818689.

Besides you, "Who reads the front page"?

Tonne
Send message
Joined: 11 Jun 99
Posts: 27
Credit: 16,093,366
RAC: 0
Sweden
Message 818702 - Posted: 15 Oct 2008, 10:44:08 UTC - in response to Message 818700.

[quote]Besides you, "Who reads the front page"?[/quote]


Those who can read?
____________
/TB

Ianab
Volunteer tester
Joined: 11 Jun 08
Posts: 667
Credit: 12,514,650
RAC: 8,194
New Zealand
Message 818704 - Posted: 15 Oct 2008, 10:45:06 UTC - in response to Message 818689.

[quote]What is it you don't get? 2 lines on the front page. 1 minute's work. I'm not talking about writing a bestselling novel or e-mailing every user individually.

Server X is down because of Y, and we expect it to be up again in Z days/hours.

How long can that take to write on the front page?[/quote]


By the time they have got into work and worked out what the problem is and how long it will take to fix, it's generally fixed already.

Matt does a pretty good job of keeping us informed of what is going on, you just need to read his posts. Let him fix the problems and get on with life.

Or else he can post useless info like "Something is wrong, don't know what it is, or how long it's going to take to fix." Great... we know that already.

I understand the frustration when the project is not running smoothly, but give the guys a break. The current problem is an extended Tuesday outage and the ensuing catch-up period. There is a note on the main page to that effect.

Set your work cache to 3 days and relax. None of my machines have run out of work: in spite of one of them having 150-odd work units waiting to report, there were still 2 days' work on hand. There have been some comms issues today, but all the machines have caught up now.

It's a Non-Panic, no need to write home about it.

Ian

Tonne
Send message
Joined: 11 Jun 99
Posts: 27
Credit: 16,093,366
RAC: 0
Sweden
Message 818705 - Posted: 15 Oct 2008, 10:46:11 UTC - in response to Message 818692.

[quote]Calm down Tonne.

Your point is noted.

The crew is working hard to solve the problem(s). They do what they can. We are all humans and can fail in our work.[/quote]


I'm not upset. But it sounds as if I'm asking for something very time-consuming that involves hiring new staff.
____________
/TB

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8551
Credit: 50,433,731
RAC: 51,240
United Kingdom
Message 818706 - Posted: 15 Oct 2008, 10:49:11 UTC - in response to Message 818700.
Last modified: 15 Oct 2008, 10:50:06 UTC

[quote]Besides you, "Who reads the front page"?[/quote]

Apparently some people like to use RSS feeds, but I find there's enough bad news pouring out of the TV and radio these days. In BOINC projects, where my mode is "participant", rather than "passive couch potato", I prefer to try to work things out for myself.

And unlike our friend Tonne, I find that the staff are quite communicative when you engage in the sort of dialog that helps, rather than hinders, their working day (or sleepless night, as in the current case). I got both a PM from Matt and 'thread title of the day' in Technical News when I diagnosed that Vader had lost http service the other day.

But I haven't worked out the cause of this one yet.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8644
Credit: 24,405,883
RAC: 24,936
United Kingdom
Message 818721 - Posted: 15 Oct 2008, 11:38:50 UTC

I asked the question "Who reads the front page?" because a few years ago a notice was put on the front page about two days before a known outage.
Within about 2 hours of the outage starting, about 10 threads had been started here, in the Cafe and on Q&A, wanting to know what was going on.

And since those people obviously didn't read it, and I admit I very rarely do, then who does read the front page?

Also, when I watch family and friends go to frequently visited sites, their cursor is usually hovering over, and clicking, the next page they need before the front page has fully loaded.

Richard Haselgrove
Project donor
Volunteer tester
Joined: 4 Jul 99
Posts: 8551
Credit: 50,433,731
RAC: 51,240
United Kingdom
Message 818744 - Posted: 15 Oct 2008, 12:59:52 UTC
Last modified: 15 Oct 2008, 13:29:49 UTC

We've all - myself included - had a bit of fun at Tonne's expense, but I think he deserves a serious response as well (even in what is meant to be a light-hearted thread). Actually, I do understand what he means, and I have felt - and posted - similar opinions myself.

Let's examine the fault-handling process, from the staff's point of view.

1) First possible indication of fault - user input
2) Staff become aware of problem
3) Potential reporting point
4) Staff diagnose fault, form plan of action
5) Another potential reporting point
6) Staff fix fault
7) Final potential reporting point

There is no way that (1) is going to find its way onto the front page - UCB isn't going to allow access for outside volunteers, however well vetted, to modify the home page. And the staff can't do it, because at this stage (early hours of the morning? weekend?) they aren't aware of the problem.

There is the possibility of amending the home page at line (3), but I don't think many of us are asking for that. Tonne, in particular, doesn't want reporting at this point, because in two (edit - now three) separate posts he's indicated that he would like an indication of the expected duration of the outage: and that requires diagnosis first.

So the earliest possible useful front-page notice comes at step (5). But by this stage, they've already got the closet doors open, or the server logs in a terminal window, or whatever: it's actually much more efficient to apply simple fixes at that moment, with the tools to hand, than to move to a completely different operating environment to modify the home page. Which means we're at line (7) already, and personally I'm happy for the post-mortem to be conducted in Technical News, rather than the front page.

Of course, the weasel word in that last paragraph is "simple". The obvious counter-example is the day the motherboard died in Thumper Mark I. Because that was a pre-production, prototype server, it couldn't be fixed with a simple switch-out, and the project was down until a new server could be built and shipped trans-continent. That sort of outage deserves, and I seem to remember received, regular updates on the front page.

[seti.international] Dirk Sadowski
Project donor
Volunteer tester
Joined: 6 Apr 07
Posts: 7091
Credit: 60,531,852
RAC: 18,729
Germany
Message 818750 - Posted: 15 Oct 2008, 13:19:24 UTC
Last modified: 15 Oct 2008, 13:22:54 UTC

I don't know why some people sometimes get angry at the project...

We have all been here long enough to know that the project sometimes has problems... so what?

Raise your WU cache... and everything is fine. Where is the problem?


EDIT:
BTW,
(200th post in this thread ;-)
____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.


Copyright © 2014 University of California