Panic Mode On (79) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1308676 - Posted: 22 Nov 2012, 7:57:55 UTC - in response to Message 1308669.  
Last modified: 22 Nov 2012, 8:00:44 UTC


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?

Just the Scheduler.
Apparently (at least for now) they're able to use the campus network for the Scheduler traffic.

If you look at the network graphs at present, instead of being around 14-20Mb/s it's been sitting around 10-12Mb/s inbound.

I did some pings (posted a few posts before these from memory).
No packet loss at all, whereas the download server (I use .13 exclusively) is around 50-75% & the upload server is around 50% packet loss.
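For anyone wanting to put numbers on their own ping runs, here is a minimal sketch of the arithmetic behind a packet-loss figure. The function name and the probe counts are hypothetical and chosen only for illustration; they are not the counts from the pings mentioned above.

```python
def packet_loss_percent(sent: int, received: int) -> float:
    """Return the percentage of probes that never got a reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


# Hypothetical counts, only to illustrate the calculation.
print(packet_loss_percent(20, 20))  # every ping answered
print(packet_loss_percent(20, 10))  # half of the pings lost
```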


EDIT- & the other real test will be to bump up the limits & see if things fall over again or not. Maybe 400 per core & 1200 per GPU to start with?
Hint, hint.
Grant
Darwin NT
ID: 1308676 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1308678 - Posted: 22 Nov 2012, 8:03:54 UTC - in response to Message 1308676.  
Last modified: 22 Nov 2012, 8:08:22 UTC


AP being split & sent off in large numbers will be the real test.

Agreed.

If I understand it, the change was in the routing of the pipe to the servers?

Just the Scheduler.
Apparently (at least for now) they're able to use the campus network for the Scheduler traffic.

If you look at the network graphs at present, instead of being around 14-20Mb/s it's been sitting around 10-12Mb/s inbound.

I did some pings (posted a few posts before these from memory).
No packet loss at all, whereas the download server (I use .13 exclusively) is around 50% & the upload server is 50-75% packet loss.

Whatever works, I guess. Splitting the scheduler request comms from the download pipe makes a lot of sense.
Over the last couple of weeks, I found that when my rigs did a scheduler request, most of the time when I checked my account page, contact WAS made by them at the time of the request. The problem was, they never got answered. So if the scheduler comms can be handled without too many errors, that should help to stop the ghost task generation. Then the only problem is downloads.....which kind of moderate things themselves, as when downloads are backed up, you get scheduler requests to report work which don't ask for new tasks.
It's kinda like a salesman driving around in a Porsche to take orders, but of course the delivery is by a much slower truck. And if da truck don't deliver da goods, the salesman don't get no more orders....LOL.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1308678 · Report as offensive
ivan
Volunteer tester

Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1308767 - Posted: 22 Nov 2012, 13:56:44 UTC - in response to Message 1308561.  
Last modified: 22 Nov 2012, 14:02:49 UTC

Now we just need a small tweak to divide those into 'before tonight' and 'after tonight', so we know what effect Eric's changes have had.

Here's a graph of my response times (UTC) for the last couple of days -- I couldn't get it to embed, perhaps because of the https. Timed-out requests were set to 330 seconds.
https://lh4.googleusercontent.com/-dde5ywVYBuM/UK4sHPNdySI/AAAAAAAAAY0/KCmDzfOo6lI/s800/setiresponse.png
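A minimal sketch of how response times like these could be collected, assuming any timed-out request is recorded as 330 seconds as described above. `timed_call` and `TIMEOUT_SENTINEL` are hypothetical names, and treating every network error as a timeout is an assumption, not necessarily how the graph was produced.

```python
import time
from typing import Callable

# Value recorded for requests that time out (330 s, as stated in the post).
TIMEOUT_SENTINEL = 330.0


def timed_call(request: Callable[[], object],
               sentinel: float = TIMEOUT_SENTINEL) -> float:
    """Run `request` and return its duration in seconds.

    If the request raises OSError (which includes TimeoutError and most
    socket and urllib errors), the sentinel value is returned instead.
    """
    start = time.monotonic()
    try:
        request()
    except OSError:
        return sentinel
    return time.monotonic() - start
```

Any zero-argument callable can be passed in, for example a function that performs one scheduler request with its own timeout, so the timing logic can be tried out without touching the network.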

[Edit] Spoke too soon; everything's dropped off the cliff and it's timing out again...
ID: 1308767 · Report as offensive
mikeej42

Joined: 26 Oct 00
Posts: 109
Credit: 791,875,385
RAC: 9
United States
Message 1308771 - Posted: 22 Nov 2012, 14:10:37 UTC
Last modified: 22 Nov 2012, 14:20:34 UTC

11/22/2012 8:06:02 AM | SETI@home | Sending scheduler request: Requested by user.
11/22/2012 8:06:02 AM | SETI@home | Reporting 28 completed tasks, requesting new tasks for CPU
11/22/2012 8:06:24 AM | SETI@home | Scheduler request failed: Couldn't connect to server
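A minimal sketch for picking failed scheduler requests out of event log lines in the format shown above (timestamp, project, message, separated by `|`). The names `LogEntry`, `parse_line` and `failed_requests` are hypothetical, and the timestamp format string is an assumption based on these three lines only; other clients or locales may write dates differently.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    when: datetime   # local time as written in the log, no time zone attached
    project: str
    message: str


def parse_line(line: str) -> LogEntry:
    """Split one 'timestamp | project | message' line into its parts."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    when = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
    return LogEntry(when, project, message)


def failed_requests(lines: Iterable[str]) -> List[LogEntry]:
    """Return only the entries that report a failed scheduler request."""
    entries = (parse_line(line) for line in lines)
    return [e for e in entries if e.message.startswith("Scheduler request failed")]


LOG = [
    "11/22/2012 8:06:02 AM | SETI@home | Sending scheduler request: Requested by user.",
    "11/22/2012 8:06:02 AM | SETI@home | Reporting 28 completed tasks, requesting new tasks for CPU",
    "11/22/2012 8:06:24 AM | SETI@home | Scheduler request failed: Couldn't connect to server",
]

for entry in failed_requests(LOG):
    print(entry.when.isoformat(), entry.message)
```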

<PanicMode>1</PanicMode>
It is going to be a long weekend....

[Edit] A U.S. Holyday (sic) for many. 4 days till the work week resumes.
ID: 1308771 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1308774 - Posted: 22 Nov 2012, 14:23:19 UTC
Last modified: 22 Nov 2012, 14:39:17 UTC

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graph shows dive, dive... dive! But the DL speed rises to an incredible > 250 kbps! Without a proxy!

Any clues???

(edit)
<PanicMode>1</PanicMode>

+1
ID: 1308774 · Report as offensive
Fred E.
Volunteer tester

Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1308795 - Posted: 22 Nov 2012, 15:17:01 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graph shows dive, dive... dive! But the DL speed rises to an incredible > 250 kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw

Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1308795 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1308800 - Posted: 22 Nov 2012, 15:21:35 UTC - in response to Message 1308795.  
Last modified: 22 Nov 2012, 15:24:23 UTC


http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=octets

The Cricket graph shows dive, dive... dive! But the DL speed rises to an incredible > 250 kbps! Without a proxy!

Any clues???


Would also note that the Server Status Page hasn't updated for over 2 hours. I had good luck w/o a proxy last night, although download speeds were low.
If interested, here's a Cricket link that also shows the weekly graph:

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;view=Octets;ranges=d%3Aw


Already noticed that; it has been stuck at 13:00 UTC for hours... and I believe nobody is in the lab because of the Thanksgiving holiday, apart from the ghosts in the machine... I believe we can do nothing else except open a beer or two and wait... I will do my part ASAP; it is a normal working day here.
ID: 1308800 · Report as offensive
Richard Haselgrove · Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1308804 - Posted: 22 Nov 2012, 15:29:14 UTC

OK, I've been having a scout around. So far....

Well, it happened while I was out at lunch, OK? Nothing to do with me. Jeez, can't I even trust you guys to mind the shop while I go and fetch a sandwich ... LOL :)

Something seemed to happen to the scheduler - quite suddenly - at about 13:24 UTC. One host got a timeout, everything else has been "Couldn't connect to server" since then.

Synergy has been responding to pings, so I guess the server itself is running, but the programs we need to handle work requests and reports clearly aren't - maybe Apache has failed.

I also see that the Server Status Page hasn't updated since [As of 22 Nov 2012 | 13:00:07 UTC]. That usually means that one of the auxiliary servers in the lab, that handles the glue that holds the whole ball of string together, has crashed.

Some of the lab servers are on remotely-controlled power strips, so they can be given a remote kicking (power down and power back up). If this failure can be handled like that, we might see some resumption after the staff have finished their holiday lie-ins. Otherwise, we're probably reduced to hoping that some member of staff will accept the excuse to evade the Black Friday shopping trip tomorrow...
ID: 1308804 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1308807 - Posted: 22 Nov 2012, 15:36:19 UTC

Have a beer on my account to help with the waiting, and thanks for the info.
Have a good day


ID: 1308807 · Report as offensive
Richard Haselgrove · Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1308842 - Posted: 22 Nov 2012, 17:36:07 UTC

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric
ID: 1308842 · Report as offensive
kittyman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1308843 - Posted: 22 Nov 2012, 17:39:04 UTC - in response to Message 1308842.  

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric

Even Seti serves up a turkey today....LOL.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1308843 · Report as offensive
tbret
Volunteer tester

Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1308924 - Posted: 22 Nov 2012, 19:34:16 UTC - in response to Message 1308842.  

Front page News:

The scheduler will be down until someone can get to the lab to reboot it. I'll try to convince Angela to let me go in once the turkey is in the oven.

Eric


Now THAT is above and beyond the call of duty.


ID: 1308924 · Report as offensive
dancer42
Volunteer tester

Joined: 2 Jun 02
Posts: 455
Credit: 2,422,890
RAC: 1
United States
Message 1308986 - Posted: 22 Nov 2012, 20:53:52 UTC - in response to Message 1307767.  

One of the red flags that a site has been hacked or spoofed is that you find grammatical or spelling errors that are out of the ordinary, and that make a message hard to read. Did anyone else notice the most recent message on the front page seems to have such errors? For example, "the lookup of result in process", "hosts being assigned large number or [of?] results to compute", and "The host. think it received", among others. These are not normal for the seti@home front page or any technical message one usually finds on the site.

OR it could have simply been that Eric was tired on a Sunday when he composed that posting and was more concerned with getting the info out than worrying about being grammatically correct..........



No, it is the LGMs, I just know it.
lol
ID: 1308986 · Report as offensive
ivan
Volunteer tester

Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1309015 - Posted: 22 Nov 2012, 22:08:47 UTC

...and so we're cranking up to rolling speed again...
ID: 1309015 · Report as offensive
Richard Haselgrove · Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1309016 - Posted: 22 Nov 2012, 22:09:21 UTC

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.
ID: 1309016 · Report as offensive
tbret
Volunteer tester

Joined: 28 May 99
Posts: 3380
Credit: 296,162,071
RAC: 40
United States
Message 1309020 - Posted: 22 Nov 2012, 22:14:34 UTC - in response to Message 1309016.  

Master database queries/second 3,438

Congratulations Eric - my, that turkey is going to taste nice when you get home.


+1

...etc.
ID: 1309020 · Report as offensive
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester

Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1309027 - Posted: 22 Nov 2012, 22:43:35 UTC

It's alive! Again...
ID: 1309027 · Report as offensive
Gone

Joined: 31 May 99
Posts: 150
Credit: 125,779,206
RAC: 0
United Kingdom
Message 1309043 - Posted: 22 Nov 2012, 23:37:18 UTC

Yuuupp,
thanks to Eric for fixing it on Thanksgiving.

ID: 1309043 · Report as offensive
musicplayer

Joined: 17 May 10
Posts: 2430
Credit: 926,046
RAC: 0
Message 1309052 - Posted: 23 Nov 2012, 0:20:06 UTC

Having problems reporting to the server?

Why not try out the following:

Check your task list. Tasks that have finished on your computer are listed as "Ready to report" in the Tasks tab. Other tasks in this tab may be in other states. Then check the Messages tab, or the separate Event Log in the most recent versions of BOINC Manager. You may see tasks that are still being uploaded in this list as well.

Uploads work out most of the time - at least I do not have to press the Retry button on the uploaded tasks, although they may sometimes hang for a little while at 100 % uploaded before finishing completely.

Then choose the "Projects" tab and select "Update" for the selected project. After doing this, go back to the Tasks tab or the Messages tab / Event Log; alternating between these tabs should show you that the tasks have been reported. This could take a little while, sometimes a couple of minutes.

If this does not work, set "No new tasks" for the selected project and repeat the process, namely push the "Update" button again. This should work, but if you are experienced on this project you may know when it will not work without even trying.

The only question is whether you should wait 5 minutes after trying to report with "Allow new tasks" before retrying with "No new tasks" set. Does the scheduler acknowledge a request when the client is unable to report to the scheduler? My assumption is that it does, but perhaps this is not correct.
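For those who prefer the command line, the same steps can be sketched with `boinccmd`, whose `--project URL update` and `--project URL nomorework` operations correspond to the "Update" and "No new tasks" buttons. This is a rough sketch only: the project URL below is an assumption (use the one shown in your own client), and `project_command` and `report_tasks` are hypothetical helper names.

```python
import subprocess
from typing import Callable, List

# Assumed project URL; replace it with the URL your client lists for the project.
PROJECT_URL = "http://setiathome.berkeley.edu/"

ALLOWED_OPERATIONS = {"update", "nomorework", "allowmorework"}


def project_command(operation: str, url: str = PROJECT_URL) -> List[str]:
    """Build the boinccmd argument list for one project operation."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unsupported operation: {operation}")
    return ["boinccmd", "--project", url, operation]


def report_tasks(run: Callable[..., object] = subprocess.run,
                 no_new_tasks: bool = False) -> None:
    """Ask the client to contact the scheduler, optionally without requesting work."""
    if no_new_tasks:
        # Same effect as setting "No new tasks" in the Projects tab.
        run(project_command("nomorework"), check=True)
    # Same effect as pressing "Update" in the Projects tab.
    run(project_command("update"), check=True)
```

Calling `report_tasks()` on a machine where `boinccmd` is on the PATH and the client is running triggers a plain update; `report_tasks(no_new_tasks=True)` sets "No new tasks" first. Use `project_command("allowmorework")` to build the command that switches work fetch back on afterwards.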
ID: 1309052 · Report as offensive
PCMS

Joined: 12 Aug 12
Posts: 2
Credit: 3,903,982
RAC: 0
Denmark
Message 1309064 - Posted: 23 Nov 2012, 0:50:26 UTC - in response to Message 1309034.  

I'm getting tired of not being able to receive tasks or send the calculated files back. I have scaled back this project in favor of another one, which does let me upload and send results. If this does not get better, I am considering no longer giving my idle time to this service. I hope this service will get better.
ID: 1309064 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.