Panic Mode On (46) Server problems


KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 2009
Credit: 11,263,155
RAC: 15,005
United States
Message 1092226 - Posted: 1 Apr 2011, 16:57:33 UTC

The scheduling server is off-line...
____________
.

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4665
Credit: 123,823,681
RAC: 94,721
United States
Message 1092238 - Posted: 1 Apr 2011, 17:40:35 UTC - in response to Message 1092226.

The scheduling server is off-line...

Perhaps it was the one mucking things up last night.

Seems the cricket went flat at 00:00 UTC. Then around 12:00 UTC all the web stuff went wonky for a while.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Swibby Bear
Joined: 1 Aug 01
Posts: 236
Credit: 7,276,504
RAC: 1
United States
Message 1092315 - Posted: 1 Apr 2011, 22:33:25 UTC - in response to Message 1092120.

If they don't get it fixed by knock-off time on Friday, then you may want to be concerned.


Okay, now we're all concerned!

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 6370
Credit: 803,398
RAC: 1,752
United States
Message 1092405 - Posted: 2 Apr 2011, 3:33:52 UTC
Last modified: 2 Apr 2011, 3:36:22 UTC

I reported 1 WU just a few minutes ago. The Scheduler was up and running, the only things that were not online were the Back-up database on jocelyn and Download server #2 on vader (and some of the ntpckrs on synergy).

I've been in and out all day, and did notice that some functions were taken down for a time, but almost all are up now. Not worried.
____________
Donald
Infernal Optimist / Submariner, retired

HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 4665
Credit: 123,823,681
RAC: 94,721
United States
Message 1092433 - Posted: 2 Apr 2011, 4:06:02 UTC - in response to Message 1092405.

I reported 1 WU just a few minutes ago. The Scheduler was up and running, the only things that were not online were the Back-up database on jocelyn and Download server #2 on vader (and some of the ntpckrs on synergy).

I've been in and out all day, and did notice that some functions were taken down for a time, but almost all are up now. Not worried.

I haven't been worried. Looks like they have brought vader up and taken it down as a download server a few times since this evening. I haven't really seen any issues since I got home from work 6 hours ago.
____________
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,527,792
RAC: 40,652
Australia
Message 1092534 - Posted: 2 Apr 2011, 5:55:34 UTC - in response to Message 1092433.


There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. Takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.
____________
Grant
Darwin NT.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2358
Credit: 8,957,554
RAC: 3,931
United States
Message 1092577 - Posted: 2 Apr 2011, 9:42:10 UTC

*sigh* Yay for getting loop-holed out of credit for an AP.

_0 missed the deadline and therefore I became _2. A few hours later, _0 turned theirs in and _0 and _1 validated and got credit. A few days go by and my machine turns the work in. Invalid because credit was already granted.

Or it was a legitimate invalid result. They were both stock, I'm not, so it's hard to tell.

WU in question

I know in the past I've seen issues where _2 would get "robbed" of credit if _0 or _1 turned theirs in late, but before you did. I don't know if this is still a problem though, since I don't see nearly as many WUs since going AP-only.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Mike
Project donor
Volunteer tester
Joined: 17 Feb 01
Posts: 25223
Credit: 34,844,595
RAC: 21,553
Germany
Message 1092601 - Posted: 2 Apr 2011, 12:30:23 UTC

The third always gets the credit if the result is valid.
So I think yours is invalid.

____________

Claggy
Project donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4251
Credit: 34,993,280
RAC: 20,519
United Kingdom
Message 1092629 - Posted: 2 Apr 2011, 13:45:50 UTC - in response to Message 1092601.
Last modified: 2 Apr 2011, 13:46:33 UTC

The third always gets the credit if the result is valid.
So I think yours is invalid.

I suspect that if he had reported earlier, his task would have been inconclusive, and when the third result was in, he probably would have got credit for it.
But since those two tasks had already validated, there's no point making it inconclusive and sending out further tasks to see whether his is valid or the original two.

Claggy

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,527,792
RAC: 40,652
Australia
Message 1092743 - Posted: 2 Apr 2011, 19:30:18 UTC - in response to Message 1092534.

There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. Takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.

Still the same. Downloads timing out, extremely slow when they do download. Work in Progress pretty much stagnant, enough work going out to keep things busy. But not enough to fill caches.
____________
Grant
Darwin NT.

Sten-Arne
Volunteer tester
Joined: 1 Nov 08
Posts: 3780
Credit: 21,511,116
RAC: 15,066
Sweden
Message 1092779 - Posted: 2 Apr 2011, 22:08:21 UTC - in response to Message 1092743.
Last modified: 2 Apr 2011, 22:09:37 UTC

There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. Takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.

Still the same. Downloads timing out, extremely slow when they do download. Work in Progress pretty much stagnant, enough work going out to keep things busy. But not enough to fill caches.


Not so strange, since download server 2 has been offline for days now. One download server can't handle all downloads, so timeouts and slow downloads are to be expected.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,527,792
RAC: 40,652
Australia
Message 1092807 - Posted: 2 Apr 2011, 22:53:07 UTC - in response to Message 1092779.

Not so strange, since download server 2 has been offline for days now.

Ah, that explains it.

____________
Grant
Darwin NT.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2358
Credit: 8,957,554
RAC: 3,931
United States
Message 1092917 - Posted: 3 Apr 2011, 6:38:34 UTC

It looks like the "saturation" line on the cricket graph looks pretty nice being below 90mbit though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time...for me anyway.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,527,792
RAC: 40,652
Australia
Message 1092935 - Posted: 3 Apr 2011, 9:21:49 UTC - in response to Message 1092917.

It looks like the "saturation" line on the cricket graph looks pretty nice being below 90mbit though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time...for me anyway.

The problem is that it can take anywhere from 3-12 attempts to download a single Work Unit. And where it's usually around 70kB/s or better, at the moment it can be as slow as 2kB/s.
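
To put those two rates side by side, here is a rough back-of-the-envelope sketch. Only the 70kB/s and 2kB/s figures come from the post; the Work Unit size is a purely hypothetical placeholder (the thread never states one), and the function name is made up for illustration.

```python
def download_seconds(size_kb, rate_kb_per_s):
    """Time in seconds to move size_kb kilobytes at a steady rate."""
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_kb / rate_kb_per_s


# ASSUMPTION: placeholder Work Unit size, not taken from the thread.
ASSUMED_WU_SIZE_KB = 366.0

NORMAL_RATE_KB_S = 70.0   # "usually around 70kB/s or better"
DEGRADED_RATE_KB_S = 2.0  # "as slow as 2kB/s"

normal_s = download_seconds(ASSUMED_WU_SIZE_KB, NORMAL_RATE_KB_S)
degraded_s = download_seconds(ASSUMED_WU_SIZE_KB, DEGRADED_RATE_KB_S)

# The ratio of the two rates does not depend on the assumed size.
slowdown = NORMAL_RATE_KB_S / DEGRADED_RATE_KB_S

print(f"normal: {normal_s:.1f} s, degraded: {degraded_s:.1f} s, "
      f"slowdown: {slowdown:.0f}x")
```

Whatever the real file size is, the slowdown factor is the same, and that is before counting the 3-12 attempts each download may need.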
____________
Grant
Darwin NT.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2358
Credit: 8,957,554
RAC: 3,931
United States
Message 1093164 - Posted: 4 Apr 2011, 1:20:35 UTC - in response to Message 1092935.

It looks like the "saturation" line on the cricket graph looks pretty nice being below 90mbit though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time...for me anyway.

The problem is that it can take anywhere from 3-12 attempts to download a single Work Unit. And where it's usually around 70kB/s or better, at the moment it can be as slow as 2kB/s.

Yeah, I see that now. Had gotten a few downloads whilst sleeping or away from the house for a few hours, but I see one now that's trying. Has restarted four times and when it does actually get data.. 1.21kB/sec.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Careface
Joined: 6 Jun 03
Posts: 115
Credit: 11,626,751
RAC: 0
New Zealand
Message 1093267 - Posted: 4 Apr 2011, 7:45:36 UTC - in response to Message 1093164.

My 5 day cache has nearly run out.. I attribute this to the fact that when I woke up this morning, the 100 or so WU I had been assigned overnight were in 12+ hour project backoff due to failing to download so many times in a row lol

How come the second download server is offline? I've checked news/tech news and I can't find any mention of it.. Same with the backup database.. :o

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5955
Credit: 62,527,792
RAC: 40,652
Australia
Message 1093273 - Posted: 4 Apr 2011, 8:33:54 UTC - in response to Message 1093267.

How come the second download server is offline? I've checked news/tech news and I can't find any mention of it.. Same with the backup database.. :o

Download server, no idea.
Replica database: they were having issues with its external storage system.
____________
Grant
Darwin NT.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2358
Credit: 8,957,554
RAC: 3,931
United States
Message 1093278 - Posted: 4 Apr 2011, 9:17:21 UTC

I do enjoy still using a pre-GPU build of BOINC. Max back-off is 3:59:59.. or so I've observed. Unless the scheduler specifically responds with a different back-off. A couple weeks ago with that extended downtime for..something, I hadn't turned network communications off yet, and saw "scheduler request pending, waiting 18:xx:xx". So it can still happen for scheduler contacts, but not for failed transfers..those max out at 4 hours.
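
A minimal sketch of a capped transfer back-off like the one observed above, assuming the delay simply doubles after each failed attempt. This is not BOINC's actual client code: the doubling rule, the 60-second starting delay, and the function name are assumptions for illustration. Only the roughly 4-hour ceiling comes from the post.

```python
# Cap observed in the post: failed transfers max out at about 4 hours.
MAX_TRANSFER_BACKOFF_S = 4 * 60 * 60

# ASSUMPTION: starting delay of one minute, doubling on every failure.
ASSUMED_FIRST_BACKOFF_S = 60


def transfer_backoff_seconds(consecutive_failures):
    """Delay before the next retry after N consecutive failures (N >= 1)."""
    if consecutive_failures < 1:
        raise ValueError("need at least one failure to back off")
    delay = ASSUMED_FIRST_BACKOFF_S * 2 ** (consecutive_failures - 1)
    # Never wait longer than the cap, however many failures pile up.
    return min(delay, MAX_TRANSFER_BACKOFF_S)


for n in (1, 2, 5, 9, 20):
    print(n, transfer_backoff_seconds(n))
```

Under these assumptions the cap is reached on the ninth consecutive failure, and a newer client with a higher ceiling would keep growing past that point, which would fit the 12+ hour back-offs reported earlier in the thread.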
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,520
RAC: 119
Netherlands
Message 1093280 - Posted: 4 Apr 2011, 9:39:58 UTC - in response to Message 1093278.
Last modified: 4 Apr 2011, 9:55:42 UTC

Well, I just got some WUs to upload and download. Whatever was 'blocking' it, it does work now:

4-4-2011 1:43:43 SETI@home Sending scheduler request: To fetch work.
4-4-2011 1:43:43 SETI@home Requesting new tasks
4-4-2011 1:43:44 SETI@home Started upload of 19fe11aa.18283.24607.10.10.167_1_0
4-4-2011 1:43:47 SETI@home Scheduler request completed: got 1 new tasks
4-4-2011 1:43:49 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:44:02 SETI@home Finished upload of 19fe11aa.18283.24607.10.10.167_1_0
4-4-2011 1:44:50 Project communication failed: attempting access to reference site
4-4-2011 1:44:50 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error
4-4-2011 1:44:50 SETI@home Backing off 1 min 0 sec on download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:44:52 Internet access OK - project servers may be temporarily down.
4-4-2011 1:45:51 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:45:53 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error

Some ~5 hours later:

4-4-2011 6:38:39 SETI@home Temporarily failed download of 18fe11ac.29560.19699.4.10.64: HTTP error
4-4-2011 6:38:39 SETI@home Backing off 1 min 0 sec on download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:39:39 SETI@home Started download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:40:07 SETI@home Finished download of 18fe11ac.29560.19699.4.10.64
4-4-2011 8:35:59 SETI@home Started upload of 19fe11aa.13638.17245.12.10.240_1_0
4-4-2011 8:35:59 SETI@home Sending scheduler request: To fetch work.
4-4-2011 8:35:59 SETI@home Reporting 1 completed tasks, requesting new tasks
4-4-2011 8:36:06 SETI@home Scheduler request completed: got 1 new tasks
4-4-2011 8:36:08 SETI@home Started download of 18fe11ab.29874.4975.8.10.56
4-4-2011 8:36:16 SETI@home Finished upload of 19fe11aa.13638.17245.12.10.240_1_0
4-4-2011 8:36:42 SETI@home Finished download of 18fe11ab.29874.4975.8.10.56
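
For anyone who wants to tally this sort of thing from their own message log, here is a small sketch that counts started, failed, and finished download events per file. It assumes the messages are worded exactly as in the lines pasted above; the function name and the regular expression are illustrative, not part of BOINC.

```python
import re
from collections import defaultdict

# Matches only download events; "Backing off ... on download of" and
# upload lines are deliberately not matched.
EVENT = re.compile(
    r"(Started|Temporarily failed|Finished) download of ([^\s:]+)"
)
KEYS = {
    "Started": "started",
    "Temporarily failed": "failed",
    "Finished": "finished",
}


def summarize_downloads(log_text):
    """Return {file name: {"started": n, "failed": n, "finished": n}}."""
    stats = defaultdict(lambda: {"started": 0, "failed": 0, "finished": 0})
    for line in log_text.splitlines():
        match = EVENT.search(line)
        if match:
            stats[match.group(2)][KEYS[match.group(1)]] += 1
    return dict(stats)


# A few lines copied from the log above.
SAMPLE_LOG = """\
4-4-2011 1:43:49 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:44:50 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error
4-4-2011 1:44:50 SETI@home Backing off 1 min 0 sec on download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:45:51 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:45:53 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error
4-4-2011 6:38:39 SETI@home Temporarily failed download of 18fe11ac.29560.19699.4.10.64: HTTP error
4-4-2011 6:38:39 SETI@home Backing off 1 min 0 sec on download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:39:39 SETI@home Started download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:40:07 SETI@home Finished download of 18fe11ac.29560.19699.4.10.64
"""

for name, counts in summarize_downloads(SAMPLE_LOG).items():
    print(name, counts)
```

A file with several failures and no finish is one that is still stuck in retry, which is the pattern the first half of the log shows.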


Someone got to the Lab, or it fixed itself, :/
____________

Careface
Joined: 6 Jun 03
Posts: 115
Credit: 11,626,751
RAC: 0
New Zealand
Message 1093282 - Posted: 4 Apr 2011, 10:05:25 UTC - in response to Message 1093280.

Well, whatever the case is, both download servers are offline now o_O

I've got about 6-8 hours of CPU work left, and a good couple of days of GPU work.. so I might just reschedule some work for the time being until stuff is back up and running :)


Copyright © 2014 University of California