Panic Mode On (46) Server problems

Message boards : Number crunching : Panic Mode On (46) Server problems

KWSN THE Holy Hand Grenade!
Volunteer tester
Joined: 20 Dec 05
Posts: 2576
Credit: 34,630,096
RAC: 19,007
United States
Message 1092226 - Posted: 1 Apr 2011, 16:57:33 UTC

The scheduling server is off-line...


.

ID: 1092226
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6085
Credit: 154,951,366
RAC: 46,505
United States
Message 1092238 - Posted: 1 Apr 2011, 17:40:35 UTC - in response to Message 1092226.

The scheduling server is off-line...

Perhaps it was the one mucking things up last night.

Seems the cricket went flat at 00:00 UTC. Then around 12:00 UTC all the web stuff went wonky for a while.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

ID: 1092238
Swibby Bear
Joined: 1 Aug 01
Posts: 246
Credit: 7,912,597
RAC: 803
United States
Message 1092315 - Posted: 1 Apr 2011, 22:33:25 UTC - in response to Message 1092120.

If they don't get it fixed by knock-off time on Friday, then you may want to be concerned.


Okay, now we're all concerned!

ID: 1092315
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8205
Credit: 4,328,174
RAC: 5,389
United States
Message 1092405 - Posted: 2 Apr 2011, 3:33:52 UTC
Last modified: 2 Apr 2011, 3:36:22 UTC

I reported 1 WU just a few minutes ago. The Scheduler was up and running, the only things that were not online were the Back-up database on jocelyn and Download server #2 on vader (and some of the ntpckrs on synergy).

I've been in and out all day, and did notice that some functions were taken down for a time, but almost all are up now. Not worried.


Donald
Infernal Optimist / Submariner, retired

ID: 1092405
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6085
Credit: 154,951,366
RAC: 46,505
United States
Message 1092433 - Posted: 2 Apr 2011, 4:06:02 UTC - in response to Message 1092405.

I reported 1 WU just a few minutes ago. The Scheduler was up and running, the only things that were not online were the Back-up database on jocelyn and Download server #2 on vader (and some of the ntpckrs on synergy).

I've been in and out all day, and did notice that some functions were taken down for a time, but almost all are up now. Not worried.

I haven't been worried. Looks like they have brought vader up and taken it down as a download server a few times this evening. I haven't really seen any issues since I got home from work 6 hours ago.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours

Join the BP6/VP6 User Group today!

ID: 1092433
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7474
Credit: 90,845,285
RAC: 45,042
Australia
Message 1092534 - Posted: 2 Apr 2011, 5:55:34 UTC - in response to Message 1092433.


There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. Takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.


Grant
Darwin NT

ID: 1092534
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,827
RAC: 294
United States
Message 1092577 - Posted: 2 Apr 2011, 9:42:10 UTC

*sigh* Yay for getting loop-holed out of credit for an AP.

_0 missed the deadline, and therefore I became _2. A few hours later, _0 turned theirs in, and _0 and _1 validated and got credit. A few days go by and my machine turns the work in. Invalid because credit was already granted.

Or it was a legitimate invalid result. They were both stock, I'm not, so it's hard to tell.

WU in question

I know in the past I've seen issues where _2 would get "robbed" of credit if _0 or _1 turned theirs in late, but before you did. I don't know if this is still a problem, though, since I don't see nearly as many WUs since going AP-only.


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1092577
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29560
Credit: 49,009,252
RAC: 16,928
Germany
Message 1092601 - Posted: 2 Apr 2011, 12:30:23 UTC

The third always gets the credit if the result is valid.
So I think yours is invalid.
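
The rule above (a late third result still gets credit as long as it agrees with what already validated) can be sketched as a toy model. This is not the project's validator: the function name `validate`, the quorum of 2, and the plain equality check standing in for a real result comparison are all assumptions made for illustration.

```python
def validate(results, quorum=2, agree=lambda a, b: a == b):
    """Toy quorum validation; returns one status per result, in report order.

    `results` lists the result values in the order they were reported.
    The first `quorum` results that agree with one another become canonical;
    anything reported after that is only compared against the canonical value.
    All of this is a simplified assumption, not the real validator logic.
    """
    statuses = ["inconclusive"] * len(results)
    canonical = None
    for i, value in enumerate(results):
        if canonical is not None:
            # A canonical result already exists: the late result is simply
            # checked against it.
            statuses[i] = "valid" if agree(value, canonical) else "invalid"
            continue
        matches = [j for j in range(i + 1) if agree(results[j], value)]
        if len(matches) >= quorum:
            canonical = value
            for j in range(i + 1):
                statuses[j] = "valid" if j in matches else "invalid"
    return statuses


if __name__ == "__main__":
    print(validate(["a", "a", "a"]))  # late third result agrees
    print(validate(["a", "a", "b"]))  # late third result disagrees
```

In this model a third result that matches the canonical one is marked valid no matter how late it arrives, so a late result marked invalid would mean it genuinely disagreed.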


With each crime and every kindness we birth our future.

ID: 1092601
Claggy
Project Donor
Volunteer tester
Joined: 5 Jul 99
Posts: 4622
Credit: 46,334,181
RAC: 2,959
United Kingdom
Message 1092629 - Posted: 2 Apr 2011, 13:45:50 UTC - in response to Message 1092601.
Last modified: 2 Apr 2011, 13:46:33 UTC

The third always gets the credit if the result is valid.
So I think yours is invalid.

I suspect if he had reported earlier, his task would have been inconclusive, and when the third result was in, he probably would have got credit for it.
But since those two tasks had already validated, there's no point making it inconclusive and sending out further tasks to see whether his is valid or the original two are.

Claggy

ID: 1092629
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7474
Credit: 90,845,285
RAC: 45,042
Australia
Message 1092743 - Posted: 2 Apr 2011, 19:30:18 UTC - in response to Message 1092534.

There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. Takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.

Still the same. Downloads timing out, extremely slow when they do download. Work in Progress pretty much stagnant, enough work going out to keep things busy. But not enough to fill caches.
Grant
Darwin NT

ID: 1092743
Tutankhamon "Communist"
Volunteer tester
Joined: 1 Nov 08
Posts: 6081
Credit: 37,591,261
RAC: 14,859
Sweden
Message 1092779 - Posted: 2 Apr 2011, 22:08:21 UTC - in response to Message 1092743.
Last modified: 2 Apr 2011, 22:09:37 UTC

There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. Takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.

Still the same. Downloads timing out, extremely slow when they do download. Work in Progress pretty much stagnant, enough work going out to keep things busy. But not enough to fill caches.


Not so strange, since download server 2 has been offline for days now. One download server can't handle all downloads, so timeouts and slow downloads are to be expected.
This is a test of the Emergency Moron System. Had there been a real moron in the room, there would've been a small mushroom cloud in the place where the idiot had been standing.

ID: 1092779
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7474
Credit: 90,845,285
RAC: 45,042
Australia
Message 1092807 - Posted: 2 Apr 2011, 22:53:07 UTC - in response to Message 1092779.

Not so strange, since download server 2 has been offline for days now.

Ah, that explains it.

Grant
Darwin NT

ID: 1092807
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,827
RAC: 294
United States
Message 1092917 - Posted: 3 Apr 2011, 6:38:34 UTC

The "saturation" line on the cricket graph looks pretty nice sitting below 90 Mbit, though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time... for me anyway.


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1092917
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7474
Credit: 90,845,285
RAC: 45,042
Australia
Message 1092935 - Posted: 3 Apr 2011, 9:21:49 UTC - in response to Message 1092917.

The "saturation" line on the cricket graph looks pretty nice sitting below 90 Mbit, though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time... for me anyway.

The problem is that it can take anywhere from 3-12 attempts to download a single Work Unit. And where it's usually around 70kB/s or better, at the moment it can be as slow as 2kB/s.
Grant
Darwin NT

ID: 1092935
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,827
RAC: 294
United States
Message 1093164 - Posted: 4 Apr 2011, 1:20:35 UTC - in response to Message 1092935.

The "saturation" line on the cricket graph looks pretty nice sitting below 90 Mbit, though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time... for me anyway.

The problem is that it can take anywhere from 3-12 attempts to download a single Work Unit. And where it's usually around 70kB/s or better, at the moment it can be as slow as 2kB/s.

Yeah, I see that now. Had gotten a few downloads whilst sleeping or away from the house for a few hours, but I see one now that's trying. Has restarted four times and when it does actually get data.. 1.21kB/sec.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1093164
Careface
Joined: 6 Jun 03
Posts: 128
Credit: 16,561,684
RAC: 12
New Zealand
Message 1093267 - Posted: 4 Apr 2011, 7:45:36 UTC - in response to Message 1093164.

My 5 day cache has nearly run out.. I attribute this to the fact that when I woke up this morning, the 100 or so WU I had been assigned overnight were in 12+ hour project backoff due to failing to download so many times in a row lol

How come the second download server is offline? I've checked news/tech news and I can't find any mention of it.. Same with the backup database.. :o

ID: 1093267
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7474
Credit: 90,845,285
RAC: 45,042
Australia
Message 1093273 - Posted: 4 Apr 2011, 8:33:54 UTC - in response to Message 1093267.

How come the second download server is offline? I've checked news/tech news and I can't find any mention of it.. Same with the backup database.. :o

Download server, no idea.
Replica database: they were having issues with its external storage system.
Grant
Darwin NT

ID: 1093273
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,619,827
RAC: 294
United States
Message 1093278 - Posted: 4 Apr 2011, 9:17:21 UTC

I do enjoy still using a pre-GPU build of BOINC. Max back-off is 3:59:59.. or so I've observed. Unless the scheduler specifically responds with a different back-off. A couple weeks ago with that extended downtime for..something, I hadn't turned network communications off yet, and saw "scheduler request pending, waiting 18:xx:xx". So it can still happen for scheduler contacts, but not for failed transfers..those max out at 4 hours.
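
A capped back-off like the one described here can be sketched roughly as follows. This is not BOINC's actual code: the doubling rule, the 60-second starting delay, the random spread, and the function name `transfer_backoff` are assumptions for illustration. Only the roughly 4-hour ceiling for failed transfers comes from the observation above.

```python
import random

MAX_TRANSFER_BACKOFF = 4 * 60 * 60  # seconds; the ~4 hour ceiling described above
MIN_TRANSFER_BACKOFF = 60           # seconds; assumed starting delay


def transfer_backoff(failures, rng=random.random):
    """Return a retry delay in seconds after `failures` consecutive failed transfers.

    Assumed rule (not taken from BOINC source): double the delay window on
    every failure, cap it at the maximum, then pick a random point in the
    upper half of that window so many clients do not retry at the same instant.
    """
    if failures < 1:
        return 0.0
    ceiling = min(MIN_TRANSFER_BACKOFF * 2 ** (failures - 1), MAX_TRANSFER_BACKOFF)
    return ceiling * (0.5 + 0.5 * rng())


if __name__ == "__main__":
    # Fix the random factor at its maximum to show the largest possible delay.
    for n in (1, 2, 5, 9, 20):
        print(n, "failures ->", round(transfer_backoff(n, rng=lambda: 1.0)), "s at most")
```

Under these assumptions the delay stops growing after the ninth consecutive failure, which is why a long run of failed downloads would sit at the cap rather than backing off indefinitely.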


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1093278
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1093280 - Posted: 4 Apr 2011, 9:39:58 UTC - in response to Message 1093278.
Last modified: 4 Apr 2011, 9:55:42 UTC

Well, I just got some WUs to UPload and DOWNload. Whatever was 'blocking' it, it does work now:

4-4-2011 1:43:43 SETI@home Sending scheduler request: To fetch work.
4-4-2011 1:43:43 SETI@home Requesting new tasks
4-4-2011 1:43:44 SETI@home Started upload of 19fe11aa.18283.24607.10.10.167_1_0
4-4-2011 1:43:47 SETI@home Scheduler request completed: got 1 new tasks
4-4-2011 1:43:49 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:44:02 SETI@home Finished upload of 19fe11aa.18283.24607.10.10.167_1_0
4-4-2011 1:44:50 Project communication failed: attempting access to reference site
4-4-2011 1:44:50 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error
4-4-2011 1:44:50 SETI@home Backing off 1 min 0 sec on download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:44:52 Internet access OK - project servers may be temporarily down.
4-4-2011 1:45:51 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:45:53 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error

Some ~5 hours later:

4-4-2011 6:38:39 SETI@home Temporarily failed download of 18fe11ac.29560.19699.4.10.64: HTTP error
4-4-2011 6:38:39 SETI@home Backing off 1 min 0 sec on download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:39:39 SETI@home Started download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:40:07 SETI@home Finished download of 18fe11ac.29560.19699.4.10.64
4-4-2011 8:35:59 SETI@home Started upload of 19fe11aa.13638.17245.12.10.240_1_0
4-4-2011 8:35:59 SETI@home Sending scheduler request: To fetch work.
4-4-2011 8:35:59 SETI@home Reporting 1 completed tasks, requesting new tasks
4-4-2011 8:36:06 SETI@home Scheduler request completed: got 1 new tasks
4-4-2011 8:36:08 SETI@home Started download of 18fe11ab.29874.4975.8.10.56
4-4-2011 8:36:16 SETI@home Finished upload of 19fe11aa.13638.17245.12.10.240_1_0
4-4-2011 8:36:42 SETI@home Finished download of 18fe11ab.29874.4975.8.10.56


Someone got to the Lab, or it fixed itself, :/
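
For picking the download events out of a log excerpt like the one above, a small parser along these lines may help. It is only a sketch: the function name `summarize_downloads` is made up, the timestamps are assumed to be day-month-year, and it only recognizes the three "Started / Finished / Temporarily failed download of" messages that appear in the excerpt.

```python
import re
from datetime import datetime

# Matches the three download events seen in the log excerpt above.
EVENT = re.compile(
    r"^(\d{1,2}-\d{1,2}-\d{4} \d{1,2}:\d{2}:\d{2}) .*?"
    r"(Started|Finished|Temporarily failed) download of ([^\s:]+)"
)


def summarize_downloads(lines):
    """Map each file name to its first start time, finish time and failure count."""
    summary = {}
    for line in lines:
        match = EVENT.match(line.strip())
        if not match:
            continue  # scheduler requests, uploads, back-off notices, etc.
        # Assumption: the date is written day-month-year.
        stamp = datetime.strptime(match.group(1), "%d-%m-%Y %H:%M:%S")
        event, name = match.group(2), match.group(3)
        entry = summary.setdefault(
            name, {"started": None, "finished": None, "failures": 0}
        )
        if event == "Started" and entry["started"] is None:
            entry["started"] = stamp
        elif event == "Finished":
            entry["finished"] = stamp
        elif event == "Temporarily failed":
            entry["failures"] += 1
    return summary


if __name__ == "__main__":
    sample = [
        "4-4-2011 8:36:08 SETI@home Started download of 18fe11ab.29874.4975.8.10.56",
        "4-4-2011 8:36:42 SETI@home Finished download of 18fe11ab.29874.4975.8.10.56",
    ]
    for name, entry in summarize_downloads(sample).items():
        elapsed = None
        if entry["started"] and entry["finished"]:
            elapsed = (entry["finished"] - entry["started"]).total_seconds()
        print(name, "failures:", entry["failures"], "first start to finish (s):", elapsed)
```

Counting failures per file this way makes it easier to compare a bad stretch against a good one without reading the log line by line.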


ID: 1093280
Careface
Joined: 6 Jun 03
Posts: 128
Credit: 16,561,684
RAC: 12
New Zealand
Message 1093282 - Posted: 4 Apr 2011, 10:05:25 UTC - in response to Message 1093280.

Well, whatever the case is, both download servers are offline now o_O

I've got about 6-8 hours of CPU work left, and a good couple of days of GPU work.. so I might just reschedule some work for the time being until stuff is back up and running :)

ID: 1093282


 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.