Panic Mode On (60) Server problems?


arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3747
Credit: 48,777,915
RAC: 1,076
United States
Message 1167949 - Posted: 5 Nov 2011, 3:50:20 UTC

Red Alert, servers are down at this moment.....
____________

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3747
Credit: 48,777,915
RAC: 1,076
United States
Message 1167952 - Posted: 5 Nov 2011, 4:08:51 UTC - in response to Message 1167951.

Red Alert, servers are down at this moment.....

Oh, you mean more down than they have been for the last hour or two?


More like the last 6 hours or so.

That is when the upload server started misbehaving.
____________

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3747
Credit: 48,777,915
RAC: 1,076
United States
Message 1167954 - Posted: 5 Nov 2011, 4:24:30 UTC - in response to Message 1167953.

Red Alert, servers are down at this moment.....

Oh, you mean more down than they have been for the last hour or two?


More like the last 6 hours or so.

That is when the upload server started misbehaving.

And kinda looks like it's gonna be rough ridin' until somebody can get in to da lab to set things straight.

Eric was trying to remote boot thingys when he got home, but I don't think it quite worked out yet.

Sometimes the servers doin' 'most alright...
And sometimes I tink dey ain't.
Once in a while dey get a few bits out,
but most da time they cain't.


I just allowed Milkyway for Nvidia for a little while to see how fast the GTX560 can crunch a standard unit. Most likely way slower than my HD5830 in the same machine.

____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1759
Credit: 206,855,743
RAC: 22,474
Australia
Message 1167960 - Posted: 5 Nov 2011, 4:36:35 UTC

Sometimes the servers doin' 'most alright...
And sometimes I tink dey ain't.
Once in a while dey get a few bits out,
but most da time they cain't.

Hey Mark.
As the author of this original work, may I please have your permission to print it out and stick it on the wall of the server room at work?
:-)

T.A.

arkayn (Project donor)
Volunteer tester
Joined: 14 May 99
Posts: 3747
Credit: 48,777,915
RAC: 1,076
United States
Message 1167964 - Posted: 5 Nov 2011, 4:46:06 UTC - in response to Message 1167957.



I just allowed Milkyway for Nvidia for a little while to see how fast the GTX560 can crunch a standard unit. Most likely way slower than my HD5830 in the same machine.

Well, y'all enjoy......
The kitties are gonna proceed to crunch what they got, and hope things can get fixed before the kibble bowls run dry.

Meow meow meow!


My CUDA kibble bowl is dry, I ran through those 140 units almost as fast as I could download them.

The other machine still has a good supply though.
____________

Terror Australis
Volunteer tester
Joined: 14 Feb 04
Posts: 1759
Credit: 206,855,743
RAC: 22,474
Australia
Message 1167974 - Posted: 5 Nov 2011, 5:05:06 UTC

Uploads have been down for 6 to 7 hours now, but the green on the Cricket Graphs is still maxed out.

Shows how many downloads must have been backed up. (Unfortunately none of them are mine.) :(

T.A.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5954
Credit: 62,482,213
RAC: 40,312
Australia
Message 1168071 - Posted: 5 Nov 2011, 6:05:45 UTC - in response to Message 1167982.

There is still a bit of work being issued on the few scheduler requests that make it in and out of the black hole.

6.12.33 is showing its limits again. The machine with 6.10.58 continues to occasionally report & get new work, but 6.12.33 backs off so far with each failed attempt that it hasn't reported or received work for hours.
____________
Grant
Darwin NT.

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8780
Credit: 25,957,251
RAC: 16,959
United Kingdom
Message 1168127 - Posted: 5 Nov 2011, 7:29:46 UTC

I can't report or request now.

And in reply to the last two posts, I am sure the extra long back-offs in 6.12.nn are actually causing more problems than they are curing. Very noticeable if you abuse the buttons.

Would you believe that in the distant past, I actually gave written warnings to people who played with switches and twiddled with variable controls for no good reason?

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8819
Credit: 63,183,441
RAC: 81,329
United Kingdom
Message 1168130 - Posted: 5 Nov 2011, 7:36:12 UTC

Looking at the crickets, there has been a series of drop-outs over the last few hours: every couple of hours the throughput has dropped by about 10%.
Coupled with very poor upload performance, it's obvious that the servers are less than happy. We're going to have to wait a few hours until anyone in Berkeley is awake, or maybe a couple of days until someone gets in to work on Monday.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8819
Credit: 63,183,441
RAC: 81,329
United Kingdom
Message 1168148 - Posted: 5 Nov 2011, 9:02:44 UTC

Work outside the project indicates that long back-offs are actually counterproductive in terms of overall throughput on saturated data links. This is particularly so when you have a long first back-off. If you must use back-off, then a short, random initial back-off is far more effective in load spreading.
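A minimal sketch of what a short, random initial back-off could look like, for illustration only. This is not the BOINC client's actual algorithm; the function name `retry_delay` and the 30-second and one-hour window values are assumptions picked for the example.

```python
import random

def retry_delay(attempt, first_window=30.0, max_window=3600.0, rng=random):
    """Return seconds to wait before retry number `attempt` (0 = first retry).

    The delay is drawn uniformly from [0, window]. The window starts short
    (first_window, a hypothetical value) and doubles with each failed
    attempt, capped at max_window (also hypothetical).
    """
    window = min(max_window, first_window * (2 ** attempt))
    return rng.uniform(0.0, window)

if __name__ == "__main__":
    rng = random.Random(2011)  # seeded only so repeated runs print the same values
    for attempt in range(5):
        print(attempt, round(retry_delay(attempt, rng=rng), 1))
```

Because each client draws its own random delay, clients that failed at the same moment do not all come back at the same moment, which is the load-spreading effect described above.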

Far more effective is load throttling, where you reduce the data rate to each of your concurrent clients by a small fraction.
So, if you have 100 concurrent clients who would normally each have a 1% share of the available bandwidth and you are suffering signs of congestion (increased packet drops, for example), you reduce the bandwidth share of each client by 1%, that is, from 1% of the total to 0.99% of the total. This has virtually no effect on the data rate to each user, but it lowers the instantaneous data rate enough to reduce the number of destructive collisions and so reduce packet loss. The client actually observes an INCREASE in effective data rate at their end of the bit of wire, even though you are only using 99% of the available bandwidth.
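The arithmetic in that example can be checked with a few lines of Python. This is only a sketch of the numbers above; the function name `throttled_shares` is made up for the illustration and does not correspond to anything in the project's server code.

```python
def throttled_shares(n_clients, reduction=0.01):
    """Return (per-client share, total share in use) after throttling.

    Each client starts with an equal share of the link (1 / n_clients),
    and that share is cut by the fraction `reduction`.
    """
    fair_share = 1.0 / n_clients
    per_client = fair_share * (1.0 - reduction)
    total_used = per_client * n_clients
    return per_client, total_used

if __name__ == "__main__":
    per_client, total_used = throttled_shares(100, reduction=0.01)
    print(f"per client: {per_client:.4%} of the link, in use: {total_used:.0%}")
```

With 100 clients and a 1% cut, each client's share goes from 1% to 0.99% of the link and the total in use drops to 99%.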

Obviously if you have a situation where the feed server is off-line (for maintenance, or it has crashed) then you have to implement a message that says so, and a realistic "wait for" delay.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 1168182 - Posted: 5 Nov 2011, 12:27:35 UTC - in response to Message 1168127.

I can't report or request now.

And in reply to the last two posts, I am sure the extra long back-offs in 6.12.nn are actually causing more problems than they are curing. Very noticeable if you abuse the buttons.

Would you believe that in the distant past, I actually gave written warnings to people who played with switches and twiddled with variable controls for no good reason?


That's one of the reasons I left.....
____________
Official Abuser of Boinc Buttons...
And no good credit hound!

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5954
Credit: 62,482,213
RAC: 40,312
Australia
Message 1168378 - Posted: 5 Nov 2011, 21:19:19 UTC - in response to Message 1168182.
Last modified: 5 Nov 2011, 21:19:56 UTC

Uploads piled up overnight, so I just had a look at the network traffic graphs, and that is one weird looking graph. I don't know how the Scheduler is going at the moment, but the upload server is certainly having kittens.


EDIT: I'm pretty sure we had similar issues a few months ago.
____________
Grant
Darwin NT.


Copyright © 2014 University of California