Panic Mode On (60) Server problems?

Message boards : Number crunching : Panic Mode On (60) Server problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1168401 - Posted: 5 Nov 2011, 21:56:20 UTC - in response to Message 1168378.  

I don't know how the Scheduler is going at the moment

Uploads finally cleared, and the Scheduler requests are still timing out.

Grant
Darwin NT
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1168409 - Posted: 5 Nov 2011, 22:05:06 UTC - in response to Message 1168401.  

I don't know how the Scheduler is going at the moment

Uploads finally cleared, and the Scheduler requests are still timing out.

Maybe time to do a tracert test on the scheduler from your side of the planet. I got a scheduler response in 45 seconds, half an hour ago: that's slow, but nothing like as slow as yesterday.
JohnDK Crowdfunding Project Donor*Special Project $250 donor
Volunteer tester
Joined: 28 May 00
Posts: 1222
Credit: 451,243,443
RAC: 1,127
Denmark
Message 1168416 - Posted: 5 Nov 2011, 22:29:36 UTC

Just had 6 timeouts in a row; the 7th got through, sending lost tasks.
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1168419 - Posted: 5 Nov 2011, 22:36:32 UTC - in response to Message 1168409.  
Last modified: 5 Nov 2011, 22:37:39 UTC

I don't know how the Scheduler is going at the moment

Uploads finally cleared, and the Scheduler requests are still timing out.

Maybe time to do a tracert test on the scheduler from your side of the planet. I got a scheduler response in 45 seconds, half an hour ago: that's slow, but nothing like as slow as yesterday.

I suggested putting the scheduler on the campus link; then there would be no contention with downloads on the maxed-out Hurricane link. (There was no response to the suggestion.)

Claggy
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1168420 - Posted: 5 Nov 2011, 22:39:56 UTC - in response to Message 1168409.  

I don't know how the Scheduler is going at the moment

Uploads finally cleared, and the Scheduler requests are still timing out.

Maybe time to do a tracert test on the scheduler from your side of the planet. I got a scheduler response in 45 seconds, half an hour ago: that's slow, but nothing like as slow as yesterday.

My other system's had a couple of timeouts, but most times it gets a response. But it's got 6.10.58, so it tries every 5 min; sooner or later you'll get a response. The one with 6.12.33 has been backing off anything from 30 min to 4 hours. Even with a better hit/miss ratio than 6.10.58, it's not going to get much work.
Grant
Darwin NT
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1168422 - Posted: 5 Nov 2011, 22:46:48 UTC - in response to Message 1168419.  


I suggested putting the scheduler on the campus link; then there would be no contention with downloads on the maxed-out Hurricane link. (There was no response to the suggestion.)

Claggy


That sounds like an NIH response
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1168427 - Posted: 5 Nov 2011, 22:50:07 UTC - in response to Message 1168422.  
Last modified: 5 Nov 2011, 23:03:09 UTC

Here's some pings.

Ping statistics for 208.68.240.13:
Packets: Sent = 4, Received = 2, Lost = 2 (50% loss),
Approximate round trip times in milli-seconds:
Minimum = 286ms, Maximum = 290ms, Average = 288ms


Ping statistics for 208.68.240.16:
Packets: Sent = 4, Received = 3, Lost = 1 (25% loss),
Approximate round trip times in milli-seconds:
Minimum = 282ms, Maximum = 311ms, Average = 292ms


Ping statistics for 208.68.240.18:
Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),


Ping statistics for 208.68.240.20:
Packets: Sent = 4, Received = 1, Lost = 3 (75% loss),
Approximate round trip times in milli-seconds:
Minimum = 278ms, Maximum = 278ms, Average = 278ms


EDIT: just noticed my good machine was getting Scheduler responses in less than a minute, so I did a manual update. It reported & got some work in about 1:30.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1168431 - Posted: 5 Nov 2011, 23:02:42 UTC
Last modified: 5 Nov 2011, 23:07:25 UTC

And here's a tracert

Tracing route to boinc2.ssl.berkeley.edu [208.68.240.13]
over a maximum of 30 hops:

1 1 ms <1 ms <1 ms home.gateway.home.gateway [192.168.1.254]
2 56 ms 55 ms 55 ms lo5000.lns2.adl1.adnap.net.au [122.49.191.12]
3 55 ms 55 ms 54 ms g3-32.cor2.adl1.adnap.net.au [219.90.143.121]
4 55 ms 55 ms 54 ms te1-3.cor2.adl4.adnap.net.au [219.90.143.242]
5 55 ms 56 ms 55 ms vlan369.55drc76fg.optus.net.au [59.154.0.49]
6 236 ms 235 ms 236 ms 203.208.192.241
7 241 ms 240 ms 240 ms xe-0-0-0-0.plapx-cr2.ix.singtel.com [203.208.183.161]
8 237 ms 238 ms 236 ms paix.he.net [198.32.176.20]
9 241 ms 240 ms 240 ms 64.71.140.42
10 278 ms 278 ms 279 ms 208.68.243.254
11 * 294 ms 280 ms boinc2.ssl.berkeley.edu [208.68.240.13]

Trace complete.


Tracing route to setiboincdata.ssl.berkeley.edu [208.68.240.16]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms home.gateway.home.gateway [192.168.1.254]
2 55 ms 55 ms 54 ms lo5000.lns2.adl1.adnap.net.au [122.49.191.12]
3 54 ms 54 ms 54 ms g3-32.cor2.adl1.adnap.net.au [219.90.143.121]
4 54 ms 54 ms 55 ms te1-3.cor2.adl4.adnap.net.au [219.90.143.242]
5 55 ms 55 ms 55 ms vlan369.55drc76fg.optus.net.au [59.154.0.49]
6 239 ms 240 ms 240 ms 203.208.192.241
7 244 ms 243 ms 244 ms POS3-2.sngtp-ar2.ix.singtel.com [203.208.182.205]
8 250 ms 249 ms 249 ms paix.he.net [198.32.176.20]
9 244 ms 265 ms 244 ms 64.71.140.42
10 282 ms * 282 ms 208.68.243.254
11 * 282 ms * setiboincdata.ssl.berkeley.edu [208.68.240.16]
12 283 ms 282 ms 282 ms setiboincdata.ssl.berkeley.edu [208.68.240.16]

Trace complete.


Tracing route to boinc2.ssl.berkeley.edu [208.68.240.18]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms home.gateway.home.gateway [192.168.1.254]
2 55 ms 55 ms 55 ms lo5000.lns2.adl1.adnap.net.au [122.49.191.12]
3 54 ms 55 ms 55 ms g3-32.cor2.adl1.adnap.net.au [219.90.143.121]
4 54 ms 55 ms 55 ms te1-3.cor2.adl4.adnap.net.au [219.90.143.242]
5 55 ms 56 ms 55 ms vlan369.55drc76fg.optus.net.au [59.154.0.49]
6 240 ms 359 ms 239 ms 203.208.192.241
7 244 ms 244 ms 244 ms POS3-2.sngtp-ar2.ix.singtel.com [203.208.182.205]
8 240 ms 248 ms 248 ms paix.he.net [198.32.176.20]
9 245 ms 244 ms 245 ms 64.71.140.42
10 * 282 ms 282 ms 208.68.243.254
11 * * * Request timed out.
12 * * * Request timed out.
13 * * * Request timed out.
14 * * * Request timed out.
15 * * * Request timed out.
16 * * * Request timed out.
17 * * * Request timed out.
18 * * * Request timed out.
19 * * * Request timed out.
20 * * * Request timed out.
21 * * * Request timed out.
22 * * * Request timed out.
23 * * * Request timed out.
24 * * * Request timed out.
25 * * * Request timed out.
26 * * * Request timed out.
27 * * * Request timed out.
28 * * * Request timed out.
29 * * * Request timed out.
30 * * * Request timed out.

Trace complete.


Tracing route to setiboinc.ssl.berkeley.edu [208.68.240.20]
over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms home.gateway.home.gateway [192.168.1.254]
2 55 ms 54 ms 55 ms lo5000.lns2.adl1.adnap.net.au [122.49.191.12]
3 55 ms 55 ms 57 ms g3-32.cor2.adl1.adnap.net.au [219.90.143.121]
4 54 ms 54 ms 55 ms te1-3.cor2.adl4.adnap.net.au [219.90.143.242]
5 56 ms 55 ms 55 ms vlan369.55drc76fg.optus.net.au [59.154.0.49]
6 236 ms 237 ms 235 ms 203.208.192.241
7 253 ms 240 ms 240 ms xe-0-0-0-0.plapx-cr2.ix.singtel.com [203.208.183.161]
8 239 ms 237 ms 236 ms paix.he.net [198.32.176.20]
9 240 ms 241 ms 241 ms 64.71.140.42
10 241 ms 240 ms 240 ms 208.68.243.254
11 241 ms 241 ms 241 ms setiboinc.ssl.berkeley.edu [208.68.240.20]

Trace complete.


EDIT: .18 & .13 are the download servers, aren't they? In which case .18 must either be providing most of the bandwidth or be broken, if the ping & tracert are anything to go by.
Grant
Darwin NT
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1168526 - Posted: 6 Nov 2011, 3:49:23 UTC - in response to Message 1168516.  


And down.
No outbound network traffic.
Grant
Darwin NT
Khangollo
Joined: 1 Aug 00
Posts: 245
Credit: 36,410,524
RAC: 0
Slovenia
Message 1168530 - Posted: 6 Nov 2011, 3:57:35 UTC

Bandwidth problems fixed.
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1168550 - Posted: 6 Nov 2011, 5:11:15 UTC

Looks like mebbe Vader hit the skids.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1168567 - Posted: 6 Nov 2011, 6:18:29 UTC
Last modified: 6 Nov 2011, 6:30:47 UTC

YES - UL & scheduler
NO - DL

(YES/NO = server reachable, at least from my machine.)


- Best regards! - Sutaru Tsureku, team seti.international founder. - Optimize your PC for higher RAC. - SETI@home needs your help. -
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1168569 - Posted: 6 Nov 2011, 6:41:43 UTC - in response to Message 1168422.  


I suggested putting the scheduler on the campus link; then there would be no contention with downloads on the maxed-out Hurricane link. (There was no response to the suggestion.)

Claggy


That sounds like an NIH response

More like the organizational politics problem that Eric has mentioned a time or two.
Donald
Infernal Optimist / Submariner, retired
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1168576 - Posted: 6 Nov 2011, 7:39:26 UTC - in response to Message 1168567.  
Last modified: 6 Nov 2011, 7:40:12 UTC

NO - DL


NJMT*.

* Not just me then.
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1168585 - Posted: 6 Nov 2011, 8:55:52 UTC - in response to Message 1168422.  


I suggested putting the scheduler on the campus link; then there would be no contention with downloads on the maxed-out Hurricane link. (There was no response to the suggestion.)

Claggy


That sounds like an NIH response


I only suggested it to the forum, and there was no response from the forum to it.

Claggy
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1168586 - Posted: 6 Nov 2011, 9:14:28 UTC

Ah, I thought you'd suggested it directly to the men on the hill.

Sounds a good idea, but a bit of thought would be needed to make sure that issues such as responses from "wrong" IP addresses weren't going to cause problems at the client end of things.
And it shouldn't be too difficult to implement.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22190
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1168598 - Posted: 6 Nov 2011, 9:57:12 UTC

It's going to be a "fun filled feeding frenzy" when the tyre kicker gets in. I've got about 80 downloads stalled, and the number is growing steadily.

Interesting looking at the server status page - some bits of Vader are still showing as working (download server 2), but much of the rest is dead.

This I don't understand. I'd have thought that if Vader was truly down for the count then all its processes would be out, but both download servers are apparently running, yet we are getting no deliveries. Must be more than just a "simple" server crash :-(




That yellow fluff ball must be one mean critter...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1168604 - Posted: 6 Nov 2011, 11:10:08 UTC - in response to Message 1168598.  
Last modified: 6 Nov 2011, 11:10:58 UTC

Indeed. I've already got about 700 WUs trying to download, and that number is going to keep climbing unless the DL server is fixed soon. In fact, my caches will be mostly dry within 24 hours. With current limits in place, that's 3200 WUs across my 3 machines: 3200 tasks that have already been assigned and are just waiting to download.

Tasks are still being assigned, so I would think that the RAID system is still functioning. I'm betting a network connection got unplugged or failed. Neither DL server is even responding to pings, and traces fail, but the upload server and scheduler are both working just fine. If the RAID had died, work creation would cease, and the servers would still be reachable. "Ready to send" has been holding at around 200,000.

Hopefully it's nothing serious and we are back in business without too much trouble.
AndyJ
Joined: 17 Aug 02
Posts: 248
Credit: 27,380,797
RAC: 0
United Kingdom
Message 1168609 - Posted: 6 Nov 2011, 11:42:14 UTC

This is the flight deck, Captain Server speaking. As you may have noticed, we have started our descent, so buckle up as things may get a little bumpy on the way down. We hope to get you on the ground real soon, as we will simply fall out of the sky on Tuesday.
Thank you for flying Seti R.A.C,
Have a great day.

:-)

Regards,

A
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1168620 - Posted: 6 Nov 2011, 13:30:14 UTC - in response to Message 1168604.  

Indeed. I've already got about 700 WUs trying to download, and that number is going to keep climbing unless the DL server is fixed soon. In fact, my caches will be mostly dry within 24 hours. With current limits in place, that's 3200 WUs across my 3 machines: 3200 tasks that have already been assigned and are just waiting to download.

Tasks are still being assigned, so I would think that the RAID system is still functioning. I'm betting a network connection got unplugged or failed. Neither DL server is even responding to pings, and traces fail, but the upload server and scheduler are both working just fine. If the RAID had died, work creation would cease, and the servers would still be reachable. "Ready to send" has been holding at around 200,000.

Hopefully it's nothing serious and we are back in business without too much trouble.

Yeah, very interesting. Overnight, the rigs have been issued tasks to fill the caches back up to the current limits. Of course, no downloads are possible right now, so they can't process any of it.
Gonna be dicey getting those downloads through when the connection is reestablished.
"Freedom is just Chaos, with better lighting." Alan Dean Foster




 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.