Panic Mode On (79) Server Problems?



Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,262,763
RAC: 38,756
Australia
Message 1308666 - Posted: 22 Nov 2012, 7:45:33 UTC - in response to Message 1308658.


AP being split & sent off in large numbers will be the real test.
____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37287
Credit: 498,266,474
RAC: 494,610
United States
Message 1308669 - Posted: 22 Nov 2012, 7:48:11 UTC - in response to Message 1308666.


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,262,763
RAC: 38,756
Australia
Message 1308676 - Posted: 22 Nov 2012, 7:57:55 UTC - in response to Message 1308669.
Last modified: 22 Nov 2012, 8:00:44 UTC


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?

Just the Scheduler.
Apparently (at least for now) they're able to use the campus network for the Scheduler traffic.

If you look at the network graphs at present, instead of being around 14-20Mb/s it's been sitting around 10-12Mb/s inbound.

I did some pings (posted a few posts before these from memory).
No packet loss at all on the Scheduler, whereas the download server (I use .13 exclusively) is around 50-75% & the upload server is around 50% packet loss.
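For anyone wanting to turn their own ping counts into the percentages quoted above, here is a minimal sketch. The function name and the sample counts are made up for illustration; they are not the actual numbers from the ping runs.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Return the percentage of ping probes that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


# Hypothetical counts, chosen only to show the arithmetic.
print(packet_loss_percent(20, 20))  # every reply came back
print(packet_loss_percent(20, 10))  # half the replies went missing
```

Feed it the "packets sent" and "packets received" figures that ping prints in its summary.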


EDIT- & the other real test will be to bump up the limits & see if things fall over again or not. Maybe 400 per core & 1200 per GPU to start with?
Hint, hint.
____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37287
Credit: 498,266,474
RAC: 494,610
United States
Message 1308678 - Posted: 22 Nov 2012, 8:03:54 UTC - in response to Message 1308676.
Last modified: 22 Nov 2012, 8:08:22 UTC


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?

Just the Scheduler.
Apparently (at least for now) they're able to use the campus network for the Scheduler traffic.

If you look at the network graphs at present, instead of being around 14-20Mb/s it's been sitting around 10-12Mb/s inbound.

I did some pings (posted a few posts before these from memory).
No packet loss at all, whereas the download server (I use .13 exclusively) is around 50% & the upload server is 50-75% packet loss.

Whatever works, I guess. Splitting the scheduler request comms from the download pipe makes a lot of sense.
Over the last couple of weeks, I found that when my rigs did a scheduler request, most of the time when I checked my account page, contact WAS made by them at the time of the request. The problem was, they never got answered. So if the scheduler comms can be handled without too many errors, that should help to stop the ghost task generation. Then the only problem is downloads.....which kind of moderate things themselves, as when downloads are backed up, you get scheduler requests to report work which don't ask for new tasks.
It's kinda like a salesman driving around in a Porsche to take orders, but of course the delivery is by a much slower truck. And if da truck don't deliver da goods, the salesman don't get no more orders....LOL.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 552
Credit: 120,056,058
RAC: 85,320
United Kingdom
Message 1308767 - Posted: 22 Nov 2012, 13:56:44 UTC - in response to Message 1308561.
Last modified: 22 Nov 2012, 14:02:49 UTC

Now we just need a small tweak to divide those into 'before tonight' and 'after tonight', so we know what effect Eric's changes have had.

Here's a graph of my response times (UTC) for the last couple of days -- I couldn't get it to embed, perhaps because of the https. Timed-out requests were set to 330 seconds.
https://lh4.googleusercontent.com/-dde5ywVYBuM/UK4sHPNdySI/AAAAAAAAAY0/KCmDzfOo6lI/s800/setiresponse.png
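A rough sketch of how response times like these could be summarised, with timed-out requests counted as 330 seconds as described above. The data layout (a list of seconds, with `None` for a timeout) and the function name are assumptions for illustration, not how the graph was actually produced.

```python
from statistics import median

TIMEOUT_SECONDS = 330.0  # the value the post says was used for timed-out requests


def summarize(samples):
    """Summarise response times in seconds; None marks a timed-out request."""
    if not samples:
        raise ValueError("need at least one sample")
    # Substitute the fixed timeout value so timeouts still show up in the stats.
    filled = [TIMEOUT_SECONDS if s is None else float(s) for s in samples]
    timeouts = sum(1 for s in samples if s is None)
    return {
        "count": len(filled),
        "timeouts": timeouts,
        "timeout_fraction": timeouts / len(filled),
        "median_seconds": median(filled),
    }


# Hypothetical samples: two answered requests and one timeout.
print(summarize([2.0, None, 4.0]))
```

Reporting the timeout fraction alongside the median keeps the substituted 330-second values from being mistaken for real measurements.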

[Edit] Spoke too soon; everything's dropped off the cliff and it's timing out again...
____________

mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 779,552,224
RAC: 124,615
United States
Message 1308771 - Posted: 22 Nov 2012, 14:10:37 UTC
Last modified: 22 Nov 2012, 14:20:34 UTC

11/22/2012 8:06:02 AM | SETI@home | Sending scheduler request: Requested by user.
11/22/2012 8:06:02 AM | SETI@home | Reporting 28 completed tasks, requesting new tasks for CPU
11/22/2012 8:06:24 AM | SETI@home | Scheduler request failed: Couldn't connect to server
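If you want to count these failures in your own event log, something like the sketch below would do it. It assumes the exact line layout shown above (timestamp | project | message, US-style 12-hour timestamps); other BOINC installs may format dates differently, and the helper names are made up.

```python
from datetime import datetime

LOG = """\
11/22/2012 8:06:02 AM | SETI@home | Sending scheduler request: Requested by user.
11/22/2012 8:06:02 AM | SETI@home | Reporting 28 completed tasks, requesting new tasks for CPU
11/22/2012 8:06:24 AM | SETI@home | Scheduler request failed: Couldn't connect to server
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message)."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    # Assumed timestamp layout, taken from the lines above.
    return datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"), project, message


def failed_requests(log_text):
    """Yield (seconds_until_failure, reason) for each failed scheduler request."""
    started = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, _project, message = parse_line(line)
        if message.startswith("Sending scheduler request"):
            started = when
        elif message.startswith("Scheduler request failed:") and started is not None:
            reason = message.split(":", 1)[1].strip()
            yield (when - started).total_seconds(), reason
            started = None


for seconds, reason in failed_requests(LOG):
    print(f"failed after {seconds:.0f} s: {reason}")
```

For the three lines above this reports a single failure, 22 seconds after the request was sent.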

<PanicMode>1</PanicMode>
It is going to be a long weekend....

[Edit] A U.S. Holyday (sic) for many. 4 days till the work week resumes.
____________

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,457,340
RAC: 330,211
Brazil
Message 1308774 - Posted: 22 Nov 2012, 14:23:19 UTC
Last modified: 22 Nov 2012, 14:39:17 UTC

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

Cricket graph: shows dive, dive... dive! But DL rises to an incredible > 250kbps! Without a proxy!

Any clues???

(edit)

<PanicMode>1</PanicMode>

+1
____________

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 731
Credit: 22,049,123
RAC: 23,699
United States
Message 1308795 - Posted: 22 Nov 2012, 15:17:01 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

Cricket graph: shows dive, dive... dive! But DL rises to an incredible > 250kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw

____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,457,340
RAC: 330,211
Brazil
Message 1308800 - Posted: 22 Nov 2012, 15:21:35 UTC - in response to Message 1308795.
Last modified: 22 Nov 2012, 15:24:23 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

Cricket graph: shows dive, dive... dive! But DL rises to an incredible > 250kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw


Already noticed that, it has been stuck at 13:00 UTC for hours... and I believe nobody is in the lab because of the Thanksgiving holiday, besides the ghosts in the machine... I believe we can do nothing else besides open a beer or two and wait... I will do my part ASAP, as it is a normal working day here.
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,924,001
RAC: 13,574
United Kingdom
Message 1308804 - Posted: 22 Nov 2012, 15:29:14 UTC

OK, I've been having a scout around. So far....

Well, it happened while I was out at lunch, OK? Nothing to do with me. Jeez, can't I even trust you guys to mind the shop while I go and fetch a sandwich ... LOL :)

Something seemed to happen to the scheduler - quite suddenly - at about 13:24 UTC. One host got a timeout, everything else has been "Couldn't connect to server" since then.

Synergy has been responding to pings, so I guess the server itself is running, but the programs we need to handle work requests and reports clearly aren't - maybe Apache has failed.
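The reasoning in the paragraph above can be written down as a tiny decision function. This is only a sketch of the logic as described; the function name and the wording of the results are made up, and it says nothing about how the lab actually monitors its servers.

```python
def triage(ping_ok: bool, scheduler_connect_ok: bool) -> str:
    """Classify an outage from two observations made from a client machine."""
    if scheduler_connect_ok:
        return "scheduler reachable"
    if ping_ok:
        # The machine answers pings but the service does not accept connections.
        return "host up, scheduler service not answering"
    return "host or network path down"


# The situation described above: pings answered, scheduler connections refused.
print(triage(ping_ok=True, scheduler_connect_ok=False))
```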

I also see that the Server Status Page hasn't updated since [As of 22 Nov 2012 | 13:00:07 UTC]. That usually means that one of the auxiliary servers in the lab, that handles the glue that holds the whole ball of string together, has crashed.

Some of the lab servers are on remotely-controlled power strips, so they can be given a remote kicking (power down and power back up). If this failure can be handled like that, we might see some resumption after the staff have finished their holiday lie-ins. Otherwise, we're probably reduced to hoping that some member of staff will accept the excuse to evade the Black Friday shopping trip tomorrow...

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,457,340
RAC: 330,211
Brazil
Message 1308807 - Posted: 22 Nov 2012, 15:36:19 UTC

Have a beer on my account to help with the waiting, and thanks for the info.
Have a good day


____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,924,001
RAC: 13,574
United Kingdom
Message 1308842 - Posted: 22 Nov 2012, 17:36:07 UTC

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37287
Credit: 498,266,474
RAC: 494,610
United States
Message 1308843 - Posted: 22 Nov 2012, 17:39:04 UTC - in response to Message 1308842.

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric

Even Seti serves up a turkey today....LOL.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

tbret
Volunteer tester
Joined: 28 May 99
Posts: 2389
Credit: 164,100,045
RAC: 68,357
United States
Message 1308924 - Posted: 22 Nov 2012, 19:34:16 UTC - in response to Message 1308842.

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric


Now THAT is above and beyond the call of duty.


dancer42
Volunteer tester
Joined: 2 Jun 02
Posts: 341
Credit: 1,078,912
RAC: 2
United States
Message 1308986 - Posted: 22 Nov 2012, 20:53:52 UTC - in response to Message 1307767.

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

OR it could have simply been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than worrying about being grammatically correct..........



No, it is the LGMs, I just know it.
lol
____________

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 552
Credit: 120,056,058
RAC: 85,320
United Kingdom
Message 1309015 - Posted: 22 Nov 2012, 22:08:47 UTC

...and so we're cranking up to rolling speed again...
____________

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,924,001
RAC: 13,574
United Kingdom
Message 1309016 - Posted: 22 Nov 2012, 22:09:21 UTC

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.

tbret
Volunteer tester
Joined: 28 May 99
Posts: 2389
Credit: 164,100,045
RAC: 68,357
United States
Message 1309020 - Posted: 22 Nov 2012, 22:14:34 UTC - in response to Message 1309016.

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.


+1

...etc.

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,457,340
RAC: 330,211
Brazil
Message 1309027 - Posted: 22 Nov 2012, 22:43:35 UTC

It's alive! Again...
____________

Big Reg
Joined: 31 May 99
Posts: 142
Credit: 119,635,708
RAC: 344,416
United Kingdom
Message 1309043 - Posted: 22 Nov 2012, 23:37:18 UTC

Yuuupp,
thanks to Eric for fixing it on Thanksgiving.

____________


Copyright © 2014 University of California