Panic Mode On (46) Server problems

Swibby Bear

Joined: 1 Aug 01
Posts: 246
Credit: 7,945,093
RAC: 0
United States
Message 1092315 - Posted: 1 Apr 2011, 22:33:25 UTC - in response to Message 1092120.  

If they don't get it fixed by knock-off time on Friday, then you may want to be concerned.


Okay, now we're all concerned!
ID: 1092315 · Report as offensive
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1092405 - Posted: 2 Apr 2011, 3:33:52 UTC
Last modified: 2 Apr 2011, 3:36:22 UTC

I reported 1 WU just a few minutes ago. The Scheduler was up and running; the only things that were not online were the Back-up database on jocelyn and Download server #2 on vader (and some of the ntpckrs on synergy).

I've been in and out all day, and did notice that some functions were taken down for a time, but almost all are up now. Not worried.
Donald
Infernal Optimist / Submariner, retired
ID: 1092405 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1092433 - Posted: 2 Apr 2011, 4:06:02 UTC - in response to Message 1092405.  

I reported 1 WU just a few minutes ago. The Scheduler was up and running; the only things that were not online were the Back-up database on jocelyn and Download server #2 on vader (and some of the ntpckrs on synergy).

I've been in and out all day, and did notice that some functions were taken down for a time, but almost all are up now. Not worried.

I haven't been worried. Looks like they have brought vader up and taken it down as a download server a few times this evening. I haven't really seen any issues since I got home from work 6 hours ago.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1092433 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1092534 - Posted: 2 Apr 2011, 5:55:34 UTC - in response to Message 1092433.  


There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. It takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.
Grant
Darwin NT
ID: 1092534 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1092577 - Posted: 2 Apr 2011, 9:42:10 UTC

*sigh* Yay for getting loop-holed out of credit for an AP.

_0 missed the deadline and therefore I became _2. A few hours later, _0 turned theirs in, and _0 and _1 validated and got credit. A few days go by and my machine turns the work in. Invalid, because credit was already granted.

Or it was a legitimate invalid result. They were both stock, I'm not, so it's hard to tell.

WU in question

I know in the past I've seen issues where _2 would get "robbed" of credit if _0 or _1 turned theirs in late, but before you did. I don't know if this is still a problem though, since I don't see nearly as many WUs since going AP-only.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1092577 · Report as offensive
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 34255
Credit: 79,922,639
RAC: 80
Germany
Message 1092601 - Posted: 2 Apr 2011, 12:30:23 UTC

The third always gets the credit if the result is valid.
So I think yours is invalid.



With each crime and every kindness we birth our future.
ID: 1092601 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1092629 - Posted: 2 Apr 2011, 13:45:50 UTC - in response to Message 1092601.  
Last modified: 2 Apr 2011, 13:46:33 UTC

The third always gets the credit if the result is valid.
So I think yours is invalid.

I suspect that if he had reported earlier, his task would have been inconclusive, and when the third result was in, he probably would have got credit for it.
But since those two tasks had already validated, there's no point making it inconclusive and sending out further tasks to see whether his is valid or the original two are.

Claggy
ID: 1092629 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1092743 - Posted: 2 Apr 2011, 19:30:18 UTC - in response to Message 1092534.  

There's still some sort of problem there. Even when the network traffic drops off, downloads are very slow & often time out almost as soon as they start. It takes multiple retries before each one does download.
And the amount of work in progress is fairly steady, yet it's about a million short of where it should be.

Still the same. Downloads timing out, extremely slow when they do download. Work in Progress pretty much stagnant, enough work going out to keep things busy. But not enough to fill caches.
Grant
Darwin NT
ID: 1092743 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1092807 - Posted: 2 Apr 2011, 22:53:07 UTC - in response to Message 1092779.  

Not so strange, since download server 2 has been offline for days now.

Ah, that explains it.

Grant
Darwin NT
ID: 1092807 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1092917 - Posted: 3 Apr 2011, 6:38:34 UTC

The "saturation" line on the cricket graph looks pretty nice being below 90 Mbit, though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time... for me, anyway.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1092917 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1092935 - Posted: 3 Apr 2011, 9:21:49 UTC - in response to Message 1092917.  

The "saturation" line on the cricket graph looks pretty nice being below 90 Mbit, though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time... for me, anyway.

The problem is that it can take anywhere from 3-12 attempts to download a single Work Unit. And where it's usually around 70kB/s or better, at the moment it can be as slow as 2kB/s.
Grant
Darwin NT
ID: 1092935 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1093164 - Posted: 4 Apr 2011, 1:20:35 UTC - in response to Message 1092935.  

The "saturation" line on the cricket graph looks pretty nice being below 90 Mbit, though. Scheduler requests take a few seconds longer than usual, but it seems like uploads and scheduler requests go through every time... for me, anyway.

The problem is that it can take anywhere from 3-12 attempts to download a single Work Unit. And where it's usually around 70kB/s or better, at the moment it can be as slow as 2kB/s.

Yeah, I see that now. Had gotten a few downloads whilst sleeping or away from the house for a few hours, but I see one now that's trying. Has restarted four times and when it does actually get data.. 1.21kB/sec.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1093164 · Report as offensive
Careface
Joined: 6 Jun 03
Posts: 128
Credit: 16,561,684
RAC: 0
New Zealand
Message 1093267 - Posted: 4 Apr 2011, 7:45:36 UTC - in response to Message 1093164.  

My 5 day cache has nearly run out.. I attribute this to the fact that when I woke up this morning, the 100 or so WU I had been assigned overnight were in 12+ hour project backoff due to failing to download so many times in a row lol

How come the second download server is offline? I've checked news/tech news and I can't find any mention of it.. Same with the backup database.. :o
ID: 1093267 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1093273 - Posted: 4 Apr 2011, 8:33:54 UTC - in response to Message 1093267.  

How come the second download server is offline? I've checked news/tech news and I can't find any mention of it.. Same with the backup database.. :o

Download server, no idea.
Replica database: they were having issues with its external storage system.
Grant
Darwin NT
ID: 1093273 · Report as offensive
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1093278 - Posted: 4 Apr 2011, 9:17:21 UTC

I do enjoy still using a pre-GPU build of BOINC. Max back-off is 3:59:59.. or so I've observed. Unless the scheduler specifically responds with a different back-off. A couple weeks ago with that extended downtime for..something, I hadn't turned network communications off yet, and saw "scheduler request pending, waiting 18:xx:xx". So it can still happen for scheduler contacts, but not for failed transfers..those max out at 4 hours.
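
For illustration, here is a minimal Python sketch of a capped retry back-off like the behaviour described above. Only the roughly 4-hour ceiling comes from the observation in this post. The 1-minute starting delay, the doubling rule, and the names `transfer_backoff` and `format_hms` are assumptions made for the example; this is not BOINC's actual code.

```python
MIN_DELAY_S = 60            # assumed starting delay (1 minute)
MAX_DELAY_S = 4 * 60 * 60   # ceiling of roughly 4 hours, as observed above


def transfer_backoff(failures: int) -> int:
    """Return the wait in seconds after `failures` consecutive failed transfers."""
    if failures < 1:
        return 0
    # Assumed rule: double the delay for each extra failure, then clamp it.
    delay = MIN_DELAY_S * 2 ** (failures - 1)
    return min(delay, MAX_DELAY_S)


def format_hms(seconds: int) -> str:
    """Format a number of seconds as H:MM:SS, like the back-off timer display."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, format_hms(transfer_backoff(n)))
```

Under these assumptions the delay stops growing once it reaches the ceiling, which would match a transfer timer that never shows more than about four hours.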
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1093278 · Report as offensive
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1093280 - Posted: 4 Apr 2011, 9:39:58 UTC - in response to Message 1093278.  
Last modified: 4 Apr 2011, 9:55:42 UTC

Well, I just got some WUs to upload and download. Whatever was 'blocking' it,
it does work now.

4-4-2011 1:43:43 SETI@home Sending scheduler request: To fetch work.
4-4-2011 1:43:43 SETI@home Requesting new tasks
4-4-2011 1:43:44 SETI@home Started upload of 19fe11aa.18283.24607.10.10.167_1_0
4-4-2011 1:43:47 SETI@home Scheduler request completed: got 1 new tasks
4-4-2011 1:43:49 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:44:02 SETI@home Finished upload of 19fe11aa.18283.24607.10.10.167_1_0
4-4-2011 1:44:50 Project communication failed: attempting access to reference site
4-4-2011 1:44:50 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error
4-4-2011 1:44:50 SETI@home Backing off 1 min 0 sec on download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:44:52 Internet access OK - project servers may be temporarily down.
4-4-2011 1:45:51 SETI@home Started download of 19fe11aa.13638.17245.12.10.240
4-4-2011 1:45:53 SETI@home Temporarily failed download of 19fe11aa.13638.17245.12.10.240: HTTP error

Some ~5 hours later:

4-4-2011 6:38:39 SETI@home Temporarily failed download of 18fe11ac.29560.19699.4.10.64: HTTP error
4-4-2011 6:38:39 SETI@home Backing off 1 min 0 sec on download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:39:39 SETI@home Started download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:40:07 SETI@home Finished download of 18fe11ac.29560.19699.4.10.64
4-4-2011 8:35:59 SETI@home Started upload of 19fe11aa.13638.17245.12.10.240_1_0
4-4-2011 8:35:59 SETI@home Sending scheduler request: To fetch work.
4-4-2011 8:35:59 SETI@home Reporting 1 completed tasks, requesting new tasks
4-4-2011 8:36:06 SETI@home Scheduler request completed: got 1 new tasks
4-4-2011 8:36:08 SETI@home Started download of 18fe11ab.29874.4975.8.10.56
4-4-2011 8:36:16 SETI@home Finished upload of 19fe11aa.13638.17245.12.10.240_1_0
4-4-2011 8:36:42 SETI@home Finished download of 18fe11ab.29874.4975.8.10.56
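
A log like the one above can be tallied with a short script. The following is a minimal Python sketch, assuming the message wording shown in these log lines ("Temporarily failed download of ..." and "Finished download of ..."); the function name `summarize` and the embedded sample are just for illustration, and other BOINC versions may word their messages differently.

```python
import re
from collections import Counter

# Sample lines copied from the log above.
LOG = """\
4-4-2011 6:38:39 SETI@home Temporarily failed download of 18fe11ac.29560.19699.4.10.64: HTTP error
4-4-2011 6:38:39 SETI@home Backing off 1 min 0 sec on download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:39:39 SETI@home Started download of 18fe11ac.29560.19699.4.10.64
4-4-2011 6:40:07 SETI@home Finished download of 18fe11ac.29560.19699.4.10.64
4-4-2011 8:36:08 SETI@home Started download of 18fe11ab.29874.4975.8.10.56
4-4-2011 8:36:42 SETI@home Finished download of 18fe11ab.29874.4975.8.10.56
"""

FAILED = re.compile(r"Temporarily failed download of (\S+): ")
FINISHED = re.compile(r"Finished download of (\S+)\s*$")


def summarize(log_text):
    """Return (failure count per file, set of files that finished downloading)."""
    failures = Counter()
    finished = set()
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = FINISHED.search(line)
        if match:
            finished.add(match.group(1))
    return failures, finished


if __name__ == "__main__":
    failures, finished = summarize(LOG)
    for name in sorted(finished):
        print(f"{name}: {failures[name]} failed attempt(s) before finishing")
```

Counting failed attempts per file this way makes it easier to compare how many retries each download needed.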


Someone got to the Lab, or it fixed itself, :/
ID: 1093280 · Report as offensive
Careface
Joined: 6 Jun 03
Posts: 128
Credit: 16,561,684
RAC: 0
New Zealand
Message 1093282 - Posted: 4 Apr 2011, 10:05:25 UTC - in response to Message 1093280.  

Well, whatever the case is, both download servers are offline now o_O

I've got about 6-8 hours of CPU work left, and a good couple of days of GPU work.. so I might just reschedule some work for the time being until stuff is back up and running :)
ID: 1093282 · Report as offensive
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1093284 - Posted: 4 Apr 2011, 10:45:47 UTC

The d/l server VADER has been in trouble all weekend. I think they attempted to get it going late Friday - early Saturday and it lasted a short time. If you look at the server page you will notice that it was disabled by the staff. I suspect that they will take care of it as soon as someone comes in later this morning.
ID: 1093284 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1093312 - Posted: 4 Apr 2011, 13:03:59 UTC - in response to Message 1093278.  

I do enjoy still using a pre-GPU build of BOINC. Max back-off is 3:59:59.. or so I've observed. Unless the scheduler specifically responds with a different back-off. A couple weeks ago with that extended downtime for..something, I hadn't turned network communications off yet, and saw "scheduler request pending, waiting 18:xx:xx". So it can still happen for scheduler contacts, but not for failed transfers..those max out at 4 hours.

I like how, in the version I'm running (6.10.48), I'll see tasks downloading & then "(project back-off 00:30:00)" shows up next to them while they are still downloading. Once one of the downloading tasks finishes, the back-off goes away. It just amuses me to see it do that. I would use .58, but I have problems connecting to remote machines with the manager on my work network.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1093312 · Report as offensive
Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1093354 - Posted: 4 Apr 2011, 16:03:40 UTC - in response to Message 1093312.  
Last modified: 4 Apr 2011, 16:40:41 UTC


BOINC replica database (jocelyn): Disabled
download server 2 (vader): Disabled
ap_splitter1 (vader): Not Running
ap_splitter2 (lando): Not Running
ap_splitter3 (lando): Not Running
(as of 4 Apr 2011, 15:50:06 UTC)



Uploads and downloads do get through, most of the time.

4-4-2011 13:36:04 SETI@home Started upload of 18fe11ac.29560.19699.4.10.64_1_0
4-4-2011 13:36:05 SETI@home Started download of 18fe11ab.12264.8656.11.10.114
4-4-2011 13:36:06 SETI@home Temporarily failed download of 18fe11ab.12264.8656.11.10.114: HTTP error
4-4-2011 13:36:06 SETI@home Backing off 1 min 0 sec on download of 18fe11ab.12264.8656.11.10.114
4-4-2011 13:36:08 SETI@home Finished upload of 18fe11ac.29560.19699.4.10.64_1_0
4-4-2011 13:37:06 SETI@home Started download of 18fe11ab.12264.8656.11.10.114
4-4-2011 13:37:35 SETI@home Finished download of 18fe11ab.12264.8656.11.10.114
4-4-2011 15:21:58 SETI@home Reporting 1 completed tasks, requesting new tasks
4-4-2011 15:22:03 SETI@home Scheduler request completed: got 1 new tasks
4-4-2011 15:22:05 SETI@home Finished upload of 18fe11ab.29874.4975.8.10.56_1_0
4-4-2011 15:22:05 SETI@home Started download of 18fe11ac.30016.21744.6.10.219
4-4-2011 15:23:01 Project communication failed: attempting access to reference site
4-4-2011 15:23:01 SETI@home Temporarily failed download of 18fe11ac.30016.21744.6.10.219: HTTP error
4-4-2011 15:23:01 SETI@home Backing off 1 min 0 sec on download of 18fe11ac.30016.21744.6.10.219
4-4-2011 15:23:02 Internet access OK - project servers may be temporarily down.
4-4-2011 15:24:02 SETI@home Started download of 18fe11ac.30016.21744.6.10.219
4-4-2011 15:26:12 SETI@home Finished download of 18fe11ac.30016.21744.6.10.219


The upload server (bruno) is Running, but apparently that is not enough
to handle the (DDoS!?) requests.
Download server 1 (anakin) is Running.
Download server 2 (vader) is Disabled.


By the way, this comes from my LT(T2400)
ID: 1093354 · Report as offensive


 