Panic Mode On (79) Server Problems?

Message boards : Number crunching : Panic Mode On (79) Server Problems?

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7489
Credit: 91,156,651
RAC: 46,508
Australia
Message 1308666 - Posted: 22 Nov 2012, 7:45:33 UTC - in response to Message 1308658.  


AP being split & sent off in large numbers will be the real test.
Grant
Darwin NT
ID: 1308666
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45941
Credit: 815,378,855
RAC: 125,004
United States
Message 1308669 - Posted: 22 Nov 2012, 7:48:11 UTC - in response to Message 1308666.  


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1308669
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7489
Credit: 91,156,651
RAC: 46,508
Australia
Message 1308676 - Posted: 22 Nov 2012, 7:57:55 UTC - in response to Message 1308669.  
Last modified: 22 Nov 2012, 8:00:44 UTC


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?

Just the Scheduler.
Apparently (at least for now) they're able to use the campus network for the Scheduler traffic.

If you look at the network graphs at present, instead of being around 14-20Mb/s it's been sitting around 10-12Mb/s inbound.

I did some pings (posted a few posts before these from memory).
No packet loss at all, whereas the download server (I use .13 exclusively) is around 50-75% & the upload server is around 50% packet loss.


EDIT: And the other real test will be to bump up the limits & see whether things fall over again. Maybe 400 per core & 1200 per GPU to start with?
Hint, hint.
Grant
Darwin NT
ID: 1308676
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45941
Credit: 815,378,855
RAC: 125,004
United States
Message 1308678 - Posted: 22 Nov 2012, 8:03:54 UTC - in response to Message 1308676.  
Last modified: 22 Nov 2012, 8:08:22 UTC


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?

Just the Scheduler.
Apparently (at least for now) they're able to use the campus network for the Scheduler traffic.

If you look at the network graphs at present, instead of being around 14-20Mb/s it's been sitting around 10-12Mb/s inbound.

I did some pings (posted a few posts before these from memory).
No packet loss at all, whereas the download server (I use .13 exclusively) is around 50% & the upload server is 50-75% packet loss.

Whatever works, I guess. Splitting the scheduler request comms from the download pipe makes a lot of sense.
Over the last couple of weeks, I found that when my rigs made a scheduler request, my account page usually showed that contact WAS made at the time of the request. The problem was, the requests never got answered. So if the scheduler comms can be handled without too many errors, that should help stop the ghost task generation. Then the only problem is downloads.....which kind of moderate themselves, because when downloads are backed up, you get scheduler requests that report work but don't ask for new tasks.
It's kinda like a salesman driving around in a Porsche to take orders, but of course the delivery is by a much slower truck. And if da truck don't deliver da goods, the salesman don't get no more orders....LOL.
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1308678
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 780
Credit: 232,728,686
RAC: 81,413
United Kingdom
Message 1308767 - Posted: 22 Nov 2012, 13:56:44 UTC - in response to Message 1308561.  
Last modified: 22 Nov 2012, 14:02:49 UTC

Now we just need a small tweak to divide those into 'before tonight' and 'after tonight', so we know what effect Eric's changes have had.

Here's a graph of my response times (UTC) for the last couple of days -- I couldn't get it to embed, perhaps because of the https. Timed-out requests were set to 330 seconds.
https://lh4.googleusercontent.com/-dde5ywVYBuM/UK4sHPNdySI/AAAAAAAAAY0/KCmDzfOo6lI/s800/setiresponse.png

[Edit] Spoke too soon; everything's dropped off the cliff and it's timing out again...
ID: 1308767
mikeej42
Joined: 26 Oct 00
Posts: 109
Credit: 791,863,759
RAC: 0
United States
Message 1308771 - Posted: 22 Nov 2012, 14:10:37 UTC
Last modified: 22 Nov 2012, 14:20:34 UTC

11/22/2012 8:06:02 AM | SETI@home | Sending scheduler request: Requested by user.
11/22/2012 8:06:02 AM | SETI@home | Reporting 28 completed tasks, requesting new tasks for CPU
11/22/2012 8:06:24 AM | SETI@home | Scheduler request failed: Couldn't connect to server

<PanicMode>1</PanicMode>
It is going to be a long weekend....

[Edit] A U.S. Holyday (sic) for many. 4 days till the work week resumes.
ID: 1308771
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,560,911
RAC: 7,813
Panama
Message 1308774 - Posted: 22 Nov 2012, 14:23:19 UTC
Last modified: 22 Nov 2012, 14:39:17 UTC

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graphs show a dive, dive... dive! But DL rises to an incredible > 250 kbps! Without a proxy!

Any clues???

(edit)
<PanicMode>1</PanicMode>

+1
ID: 1308774
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1308795 - Posted: 22 Nov 2012, 15:17:01 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graphs show a dive, dive... dive! But DL rises to an incredible > 250 kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw

Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1308795
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,560,911
RAC: 7,813
Panama
Message 1308800 - Posted: 22 Nov 2012, 15:21:35 UTC - in response to Message 1308795.  
Last modified: 22 Nov 2012, 15:24:23 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graphs show a dive, dive... dive! But DL rises to an incredible > 250 kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw


Already noticed that; it has been stuck at 13:00 UTC for hours... and I believe nobody is in the lab because of the Thanksgiving holiday, besides the ghosts in the machine... I believe we can do nothing but open a beer or two and wait... I'll do my part ASAP, since it's a normal working day here.
ID: 1308800
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11142
Credit: 83,815,452
RAC: 45,927
United Kingdom
Message 1308804 - Posted: 22 Nov 2012, 15:29:14 UTC

OK, I've been having a scout around. So far....

Well, it happened while I was out at lunch, OK? Nothing to do with me. Jeez, can't I even trust you guys to mind the shop while I go and fetch a sandwich ... LOL :)

Something seemed to happen to the scheduler - quite suddenly - at about 13:24 UTC. One host got a timeout, everything else has been "Couldn't connect to server" since then.

Synergy has been responding to pings, so I guess the server itself is running, but the programs we need to handle work requests and reports clearly aren't - maybe Apache has failed.

I also see that the Server Status Page hasn't updated since [As of 22 Nov 2012 | 13:00:07 UTC]. That usually means that one of the auxiliary servers in the lab, that handles the glue that holds the whole ball of string together, has crashed.

Some of the lab servers are on remotely-controlled power strips, so they can be given a remote kicking (power down and power back up). If this failure can be handled like that, we might see some resumption after the staff have finished their holiday lie-ins. Otherwise, we're probably reduced to hoping that some member of staff will accept the excuse to evade the Black Friday shopping trip tomorrow...
ID: 1308804
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,560,911
RAC: 7,813
Panama
Message 1308807 - Posted: 22 Nov 2012, 15:36:19 UTC

Have a beer on my account to help with the waiting, and thanks for the info.
Have a good day


ID: 1308807
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11142
Credit: 83,815,452
RAC: 45,927
United Kingdom
Message 1308842 - Posted: 22 Nov 2012, 17:36:07 UTC

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric
ID: 1308842
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45941
Credit: 815,378,855
RAC: 125,004
United States
Message 1308843 - Posted: 22 Nov 2012, 17:39:04 UTC - in response to Message 1308842.  

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric

Even Seti serves up a turkey today....LOL.
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1308843
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3373
Credit: 248,516,178
RAC: 20,814
United States
Message 1308924 - Posted: 22 Nov 2012, 19:34:16 UTC - in response to Message 1308842.  

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric


Now THAT is above and beyond the call of duty.


ID: 1308924
dancer42
Project Donor
Volunteer tester
Joined: 2 Jun 02
Posts: 455
Credit: 2,283,606
RAC: 170
United States
Message 1308986 - Posted: 22 Nov 2012, 20:53:52 UTC - in response to Message 1307767.  

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

OR it could have simply been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than worrying about being grammatically correct..........



No, it's the LGMs, I just know it.
lol
ID: 1308986
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 780
Credit: 232,728,686
RAC: 81,413
United Kingdom
Message 1309015 - Posted: 22 Nov 2012, 22:08:47 UTC

...and so we're cranking up to rolling speed again...
ID: 1309015
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11142
Credit: 83,815,452
RAC: 45,927
United Kingdom
Message 1309016 - Posted: 22 Nov 2012, 22:09:21 UTC

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.
ID: 1309016
tbret
Volunteer tester
Joined: 28 May 99
Posts: 3373
Credit: 248,516,178
RAC: 20,814
United States
Message 1309020 - Posted: 22 Nov 2012, 22:14:34 UTC - in response to Message 1309016.  

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.


+1

...etc.
ID: 1309020
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 5847
Credit: 330,560,911
RAC: 7,813
Panama
Message 1309027 - Posted: 22 Nov 2012, 22:43:35 UTC

It's alive! Again...
ID: 1309027
Gone
Joined: 31 May 99
Posts: 150
Credit: 125,779,206
RAC: 0
United Kingdom
Message 1309043 - Posted: 22 Nov 2012, 23:37:18 UTC

Yuuupp,
thanks to Eric for fixing it on Thanksgiving.

ID: 1309043
 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.