Panic Mode On (79) Server Problems?



Message boards : Number crunching : Panic Mode On (79) Server Problems?

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5872
Credit: 60,851,680
RAC: 47,565
Australia
Message 1308666 - Posted: 22 Nov 2012, 7:45:33 UTC - in response to Message 1308658.


AP being split & sent off in large numbers will be the real test.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5872
Credit: 60,851,680
RAC: 47,565
Australia
Message 1308676 - Posted: 22 Nov 2012, 7:57:55 UTC - in response to Message 1308669.
Last modified: 22 Nov 2012, 8:00:44 UTC


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?

Just the Scheduler.
Apparently (at least for now) they're able to use the campus network for the Scheduler traffic.

If you look at the network graphs at present, instead of being around 14-20Mb/s it's been sitting around 10-12Mb/s inbound.

I did some pings (posted a few posts back, if I remember correctly).
No packet loss at all, whereas the download server (I use .13 exclusively) is around 50-75% & the upload server is around 50% packet loss.


EDIT- & the other real test will be to bump up the limits & see if things fall over again or not. Maybe 400 per core & 1200 per GPU to start with?
Hint, hint.
____________
Grant
Darwin NT.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 628
Credit: 144,352,028
RAC: 153,190
United Kingdom
Message 1308767 - Posted: 22 Nov 2012, 13:56:44 UTC - in response to Message 1308561.
Last modified: 22 Nov 2012, 14:02:49 UTC

Now we just need a small tweak to divide those into 'before tonight' and 'after tonight', so we know what effect Eric's changes have had.

Here's a graph of my response times (UTC) for the last couple of days -- I couldn't get it to embed, perhaps because of the https. Timed-out requests were set to 330 seconds.
https://lh4.googleusercontent.com/-dde5ywVYBuM/UK4sHPNdySI/AAAAAAAAAY0/KCmDzfOo6lI/s800/setiresponse.png

[Edit] Spoke too soon; everything's dropped off the cliff and it's timing out again...
____________

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 790,750,043
RAC: 1,964
United States
Message 1308771 - Posted: 22 Nov 2012, 14:10:37 UTC
Last modified: 22 Nov 2012, 14:20:34 UTC

11/22/2012 8:06:02 AM | SETI@home | Sending scheduler request: Requested by user.
11/22/2012 8:06:02 AM | SETI@home | Reporting 28 completed tasks, requesting new tasks for CPU
11/22/2012 8:06:24 AM | SETI@home | Scheduler request failed: Couldn't connect to server

<PanicMode>1</PanicMode>
It is going to be a long weekend....

[Edit] A U.S. Holyday (sic) for many. 4 days till the work week resumes.
____________

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5421
Credit: 308,585,872
RAC: 357,592
Brazil
Message 1308774 - Posted: 22 Nov 2012, 14:23:19 UTC
Last modified: 22 Nov 2012, 14:39:17 UTC

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graphs show a dive, dive... dive! But downloads rise to an incredible > 250 kbps! Without a proxy!

Any clues???

(edit)

<PanicMode>1</PanicMode>

+1
____________

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 0
United States
Message 1308795 - Posted: 22 Nov 2012, 15:17:01 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graphs show a dive, dive... dive! But downloads rise to an incredible > 250 kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw

____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5421
Credit: 308,585,872
RAC: 357,592
Brazil
Message 1308800 - Posted: 22 Nov 2012, 15:21:35 UTC - in response to Message 1308795.
Last modified: 22 Nov 2012, 15:24:23 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graphs show a dive, dive... dive! But downloads rise to an incredible > 250 kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw


Already noticed that; it got stuck at 13:00 UTC hours ago... and I believe nobody is in the lab because of the Thanksgiving holiday, besides the ghosts in the machine... I believe we can do nothing but open a beer or two and wait... I will do my part ASAP, since it is a normal working day here.
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8670
Credit: 51,868,594
RAC: 49,356
United Kingdom
Message 1308804 - Posted: 22 Nov 2012, 15:29:14 UTC

OK, I've been having a scout around. So far....

Well, it happened while I was out at lunch, OK? Nothing to do with me. Jeez, can't I even trust you guys to mind the shop while I go and fetch a sandwich ... LOL :)

Something seemed to happen to the scheduler - quite suddenly - at about 13:24 UTC. One host got a timeout, everything else has been "Couldn't connect to server" since then.

Synergy has been responding to pings, so I guess the server itself is running, but the programs we need to handle work requests and reports clearly aren't - maybe Apache has failed.

I also see that the Server Status Page hasn't updated since [As of 22 Nov 2012 | 13:00:07 UTC]. That usually means that one of the auxiliary servers in the lab, that handles the glue that holds the whole ball of string together, has crashed.

Some of the lab servers are on remotely-controlled power strips, so they can be given a remote kicking (power down and power back up). If this failure can be handled like that, we might see some resumption after the staff have finished their holiday lie-ins. Otherwise, we're probably reduced to hoping that some member of staff will accept the excuse to evade the Black Friday shopping trip tomorrow...

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5421
Credit: 308,585,872
RAC: 357,592
Brazil
Message 1308807 - Posted: 22 Nov 2012, 15:36:19 UTC

Have a beer on my account to help with the waiting, and thanks for the info.
Have a good day


____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8670
Credit: 51,868,594
RAC: 49,356
United Kingdom
Message 1308842 - Posted: 22 Nov 2012, 17:36:07 UTC

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2861
Credit: 217,213,576
RAC: 226,363
United States
Message 1308924 - Posted: 22 Nov 2012, 19:34:16 UTC - in response to Message 1308842.

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric


Now THAT is above and beyond the call of duty.


dancer42
Volunteer tester
Joined: 2 Jun 02
Posts: 436
Credit: 1,159,720
RAC: 102
United States
Message 1308986 - Posted: 22 Nov 2012, 20:53:52 UTC - in response to Message 1307767.

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

OR it could have simply been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than worrying about being grammatically correct..........



No, it's the LGMs. I just know it.
lol
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 628
Credit: 144,352,028
RAC: 153,190
United Kingdom
Message 1309015 - Posted: 22 Nov 2012, 22:08:47 UTC

...and so we're cranking up to rolling speed again...
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8670
Credit: 51,868,594
RAC: 49,356
United Kingdom
Message 1309016 - Posted: 22 Nov 2012, 22:09:21 UTC

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.

tbret (Project donor)
Volunteer tester
Joined: 28 May 99
Posts: 2861
Credit: 217,213,576
RAC: 226,363
United States
Message 1309020 - Posted: 22 Nov 2012, 22:14:34 UTC - in response to Message 1309016.

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.


+1

...etc.

juan BFB (Project donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 5421
Credit: 308,585,872
RAC: 357,592
Brazil
Message 1309027 - Posted: 22 Nov 2012, 22:43:35 UTC

It's alive! Again...
____________

Gone
Joined: 31 May 99
Posts: 150
Credit: 125,774,760
RAC: 0
United Kingdom
Message 1309043 - Posted: 22 Nov 2012, 23:37:18 UTC

Yuuupp,
thanks to Eric for fixing it on Thanksgiving.

____________


Copyright © 2014 University of California