Panic Mode On (100) Server Problems?

Message boards : Number crunching : Panic Mode On (100) Server Problems?

Profile Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1730562 - Posted: 1 Oct 2015, 7:12:29 UTC - in response to Message 1730548.  

Thanks Matt, It's always nice to know you are on top of it !!
ID: 1730562 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1730563 - Posted: 1 Oct 2015, 7:13:38 UTC - in response to Message 1730549.  

Just had a look at my log, "Scheduler request failed: Couldn't connect to server" seems to be the going response, then every now and then it gets through.

A few download glitches, but that's been the case for a while now with the second download server down.
Grant
Darwin NT
ID: 1730563 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1730569 - Posted: 1 Oct 2015, 7:27:46 UTC - in response to Message 1730562.  

Thanks Matt, It's always nice to know you are on top of it !!

+1
ID: 1730569 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1730570 - Posted: 1 Oct 2015, 7:30:50 UTC - in response to Message 1730563.  

Just had a look at my log, "Scheduler request failed: Couldn't connect to server" seems to be the going response, then every now and then it gets through.

A few download glitches, but that's been the case for a while now with the second download server down.

Really bizarre stuff.
I can upload just fine, but have not hit the scheduler to report uploads or request new work from either machine for 15+ hours now. GPUs went empty, CPUs are threatening to do likewise.
ID: 1730570 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1730574 - Posted: 1 Oct 2015, 7:47:03 UTC - in response to Message 1730570.  

Really bizarre stuff.
I can upload just fine, but have not hit the scheduler to report uploads or request new work from either machine for 15+ hours now. GPUs went empty, CPUs are threatening to do likewise.

I seem to be getting through to the Scheduler about once in every 5-15 requests.
Grant
Darwin NT
ID: 1730574 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1730575 - Posted: 1 Oct 2015, 7:57:52 UTC - in response to Message 1730574.  
Last modified: 1 Oct 2015, 7:58:25 UTC

Really bizarre stuff.
I can upload just fine, but have not hit the scheduler to report uploads or request new work from either machine for 15+ hours now. GPUs went empty, CPUs are threatening to do likewise.

I seem to be getting through to the Scheduler about once in every 5-15 requests.

Grant, if you're up for it, I'd love to see a tracert of a successful request.
Would be interesting to see if you're hitting different routers than I do coming through Comcast.
ID: 1730575 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1730576 - Posted: 1 Oct 2015, 8:06:57 UTC - in response to Message 1730570.  

Just had a look at my log, "Scheduler request failed: Couldn't connect to server" seems to be the going response, then every now and then it gets through.

A few download glitches, but that's been the case for a while now with the second download server down.

Really bizarre stuff.
I can upload just fine, but have not hit the scheduler to report uploads or request new work from either machine for 15+ hours now.

To be specific, the last work was received 09/30 at 14:00 GMT, and the last uploaded work I was able to report was 09/30 at 09:29 GMT. Uploads seem to be no problem at this point.
ID: 1730576 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1730577 - Posted: 1 Oct 2015, 8:15:55 UTC - in response to Message 1730576.  
Last modified: 1 Oct 2015, 8:18:14 UTC

setiboinc.ssl.berkeley.edu

1 3 ms 3 ms 3 ms home.gateway.home.gateway [192.168.1.254]
2 19 ms 18 ms 17 ms lo0.bng2.drw1.on.ii.net [150.101.32.81]
3 17 ms 34 ms 18 ms aeXX.cr1.drw1.on.ii.net [150.101.33.156]
4 242 ms 241 ms 246 ms xe-1-0-0-5.cr1.bne4.on.ii.net [150.101.35.0]
5 94 ms 107 ms 95 ms ae6.br1.syd7.on.ii.net [150.101.33.76]
6 269 ms 269 ms 268 ms te0-2-0-3.br2.sjc2.on.ii.net [203.16.213.158]
7 249 ms 246 ms 256 ms paix0.tr-cps.internet2.edu [198.32.176.128]
8 247 ms 248 ms 246 ms 64.57.21.7
9 241 ms 241 ms 240 ms dc-oak-agg4--svl-agg4-100ge.cenic.net [137.164.46.144]
10 253 ms 255 ms 257 ms ucb--oak-agg4-10g.cenic.net [137.164.50.31]
11 274 ms 274 ms 275 ms t2-3.inr-201-sut.Berkeley.EDU [128.32.0.37]
12 240 ms 241 ms 241 ms et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101]

13 et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101] reports: Destination host unreachable.

Trace complete.


Gets lost at the same place as Cosmic_Oceans tracert.
Grant
Darwin NT
ID: 1730577 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1730579 - Posted: 1 Oct 2015, 8:35:57 UTC - in response to Message 1730575.  

if you're up for it, I'd love to see a tracert of a successful request.

I haven't checked through all the logs, but I have four machines with GPUs that all request replacement work every 7-10 minutes - and their caches are all full, with what looks like a steady allocation of new work through the night. One of them has reported completed work while I've been typing this.

But a few attempts at tracert yielded

C:\>tracert setiboinc.ssl.berkeley.edu

Tracing route to setiboinc.ssl.berkeley.edu [208.68.240.126]
over a maximum of 30 hops:

1 1 ms 1 ms 2 ms BThomehub.home
2 5 ms 7 ms 8 ms 217.32.143.233
3 7 ms 5 ms 6 ms 217.32.144.30
4 9 ms 7 ms 7 ms 212.140.235.226
5 7 ms 10 ms 7 ms 217.41.169.177
6 10 ms 7 ms 7 ms 213.120.179.81
7 7 ms 6 ms 7 ms acc1-xe-4-3-0.sf.21cn-ipp.bt.net [109.159.251.103]
8 14 ms 15 ms 16 ms acc2-10GigE-0-2-3.sf.21cn-ipp.bt.net [109.159.251.171]
9 14 ms 15 ms 15 ms transit2-xe1-0-0.ealing.ukcore.bt.net [62.6.200.134]
10 75 ms 16 ms 14 ms t2c4-xe-10-3-0-0.uk-eal.eu.bt.net [166.49.168.45]
11 * * * Request timed out.
12 154 ms 155 ms 153 ms ae-3-80.ear1.SanJose1.Level3.net [4.69.152.150]
13 * * 153 ms ae-3-80.ear1.SanJose1.Level3.net [4.69.152.150]
14 162 ms 165 ms 164 ms CENIC.ear1.SanJose1.Level3.net [4.15.122.46]
15 166 ms 164 ms 162 ms dc-oak-agg4--svl-agg4-100ge.cenic.net [137.164.46.144]
16 166 ms 166 ms 164 ms ucb--oak-agg4-10g.cenic.net [137.164.50.31]
17 161 ms 159 ms 160 ms t2-3.inr-202-reccev.Berkeley.EDU [128.32.0.39]
18 165 ms 165 ms 165 ms et3-47.inr-311-ewdc.Berkeley.EDU [128.32.0.103]
19 et3-47.inr-311-ewdc.Berkeley.EDU [128.32.0.103] reports: Destination host unreachable.

Trace complete.

The problem seems to be consistently the step after et3-47.inr-311-ewdc.Berkeley.EDU - i.e., on the Berkeley campus somewhere, as Matt said. I'll keep trying at intervals, and post the final step(s) if I catch them.
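
For anyone collecting traces, here is a minimal Python sketch (not anything Berkeley or BOINC provides) that pulls the last responding hop out of Windows tracert output like the above. The function name and the regular expression are assumptions for illustration, and it only understands the line shapes pasted in this thread.

```python
import re

# One hop line: hop number, three probe columns (each "<n> ms", "<1 ms" or "*"),
# then a host name or bare IP, then an optional "[ip]".
# Assumed layout: Windows tracert output as pasted in this thread.
HOP_RE = re.compile(
    r"^\s*(\d+)\s+(?:(?:<?\d+ ms|\*)\s+){3}(\S+)(?:\s+\[([\d.]+)\])?\s*$"
)


def last_responding_hop(trace_text):
    """Return (hop_number, host, ip) for the last hop that answered, or None."""
    last = None
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if match is None:
            # Skips banners, "* * * Request timed out." lines and
            # "... reports: Destination host unreachable." lines.
            continue
        hop, host, ip = match.group(1), match.group(2), match.group(3)
        # Hops printed without a bracketed address: fall back to what was shown.
        last = (int(hop), host, ip or host)
    return last


sample = """
 17 161 ms 159 ms 160 ms t2-3.inr-202-reccev.Berkeley.EDU [128.32.0.39]
 18 165 ms 165 ms 165 ms et3-47.inr-311-ewdc.Berkeley.EDU [128.32.0.103]
 19 et3-47.inr-311-ewdc.Berkeley.EDU [128.32.0.103] reports: Destination host unreachable.
"""
print(last_responding_hop(sample))
```

On the sample it reports hop 18 at 128.32.0.103, which makes it easier to compare the last good hop across many attempts.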
ID: 1730579 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1730580 - Posted: 1 Oct 2015, 8:39:18 UTC - in response to Message 1730577.  
Last modified: 1 Oct 2015, 8:44:49 UTC

setiboinc.ssl.berkeley.edu


13 et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101] reports: Destination host unreachable.

Trace complete.


Gets lost at the same place as Cosmic_Oceans tracert.


Definitely the same places I die, at "et-3", though sometimes it's 128.32.0.101, sometimes .100 and sometimes .99.
But I was wondering if that's the same box you hit on one of your successful attempts?
Edit: As we can see from Richard, he hit .103. Probably a pool of routers there; would not be the first time only one of the boxes in a rotor had a good route and the rest did not. Makes life really interesting.
Still scratching my head as to how it is we can get to these forums consistently. From the trace it doesn't seem like SSL is a factor there.
ID: 1730580 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1730581 - Posted: 1 Oct 2015, 8:45:53 UTC - in response to Message 1730580.  

setiboinc.ssl.berkeley.edu


13 et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101] reports: Destination host unreachable.

Trace complete.


Gets lost at the same place as Cosmic_Oceans tracert.


Definitely the same places I die, at "et-3", though sometimes it's 128.32.0.101, sometimes .100 and sometimes .99.
But I was wondering if that's the same box you hit on one of your successful attempts?

The ethernet port numbers change as well. You quoted

et3-48 at 128.32.0.101

I hit

et3-47 at 128.32.0.103

Perhaps we should keep a list of those? (especially the successful ones....)
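
One way to keep such a list, sketched in Python. The tuple layout and the outcome labels are assumptions for illustration; the two entries are the last hops quoted just above, and no successful trace has been captured yet.

```python
from collections import Counter, defaultdict


def tally_last_hops(observations):
    """Group outcomes by (port, ip) so good and bad last hops can be compared.

    Each observation is a (port, ip, outcome) tuple; the outcome strings
    are free-form labels chosen by whoever records the trace.
    """
    table = defaultdict(Counter)
    for port, ip, outcome in observations:
        table[(port, ip)][outcome] += 1
    return table


# The two last hops quoted above; both traces ended "unreachable".
observations = [
    ("et3-48", "128.32.0.101", "unreachable"),
    ("et3-47", "128.32.0.103", "unreachable"),
]

for (port, ip), outcomes in sorted(tally_last_hops(observations).items()):
    print(port, ip, dict(outcomes))
```

Adding an "ok" entry whenever a trace completes would show quickly whether any one port or address is consistently the good one.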
ID: 1730581 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1730582 - Posted: 1 Oct 2015, 9:00:12 UTC - in response to Message 1730581.  
Last modified: 1 Oct 2015, 9:52:02 UTC

Perhaps we should keep a list of those? (especially the successful ones....)

I have no successes to report, but note that boinc.berkeley.edu is also toast now, wasn't a bit earlier, and somebody gets through as I get email updates of new messages.
In case it's interesting, I always hit .99, but I'm doing a different trace:

tracert setiathome.berkeley.edu

Tracing route to setiathome.berkeley.edu [208.68.240.110] over a maximum of 30 hops:

1 <1 ms <1 ms <1 ms router.asus.com [192.168.1.1]
...
15 53 ms 53 ms 50 ms dc-oak-agg4--svl-agg4-100ge.cenic.net [137.164.46.144]
16 56 ms 55 ms 50 ms ucb--oak-agg4-10g.cenic.net [137.164.50.31]
17 54 ms 51 ms 55 ms t2-3.inr-202-reccev.Berkeley.EDU [128.32.0.39]
18 50 ms 54 ms 51 ms e3-47.inr-310-ewdc.Berkeley.EDU [128.32.0.99]
19 * * Request timed out.

So my info may be meaningless, especially as we can get to the forums. Just noticed that ...
ID: 1730582 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1730606 - Posted: 1 Oct 2015, 11:26:56 UTC - in response to Message 1730582.  

I have no successes to report, but note that boinc.berkeley.edu is also toast now, wasn't a bit earlier, and somebody gets through as I get email updates of new messages.

I saw last night, in a small window of time that I could reach the BOINC forums, that there are some people for whom the Seti forums seem down, but they can reach the BOINC forums.

If for instance you find Mark Sattler so quiet around here, he was one of the peeps for whom Seti seemed down and was posting on the BOINC forums. A couple of others were in the same situation.

I myself can't now reach the BOINC forums anymore and still can't report all work done for Seti. Shrug, more time to play GTA V Online then versus other players. :)
ID: 1730606 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1730611 - Posted: 1 Oct 2015, 11:47:43 UTC - in response to Message 1730606.  

Which suggests it's primarily an internet/intranet routing problem. I've replied to a post on the BOINC forums within the last half hour, and I can see that there have been two further replies to my post since then.

But I'm getting intermittent, extended delays on the communications links to all the various servers at Berkeley, though not all at the same time. SETI uploads seem to be the most reliable (I can't so easily monitor the downloads). Scheduler reports mostly go through, but sometimes they fail. Both the BOINC message boards and the SETI message boards come and go, seemingly at random.

This machine has had 87 failures like this so far today:

01-Oct-2015 12:39:21 [SETI@home] Scheduler request failed: Couldn't connect to server

but over the same period it's had 89 of these:

01-Oct-2015 12:28:05 [SETI@home] Scheduler request completed: got 1 new tasks
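
Those two counts work out to a success rate of roughly 51% (89 of 176). A minimal Python sketch of the tally, assuming the event log has been saved as plain text with lines shaped like the two above; the function name is made up for illustration.

```python
def scheduler_stats(log_lines, project="SETI@home"):
    """Count failed and completed scheduler requests for one project.

    Returns (failed, completed, success_rate); success_rate is None
    when the log contains no scheduler requests at all.
    """
    failed = completed = 0
    tag = f"[{project}] Scheduler request "
    for line in log_lines:
        if tag + "failed" in line:
            failed += 1
        elif tag + "completed" in line:
            completed += 1
    total = failed + completed
    rate = completed / total if total else None
    return failed, completed, rate


lines = [
    "01-Oct-2015 12:39:21 [SETI@home] Scheduler request failed: Couldn't connect to server",
    "01-Oct-2015 12:28:05 [SETI@home] Scheduler request completed: got 1 new tasks",
]
print(scheduler_stats(lines))
```

Running the same tally on several machines over the same period would make it easier to compare how often each one gets through.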
ID: 1730611 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1730626 - Posted: 1 Oct 2015, 13:07:50 UTC - in response to Message 1730614.  

It's the Pope, Daesh, or Obama's fault. No doubt about that, whatsoever.

Not blaming the Russians or the Chinese, then? :P
ID: 1730626 · Report as offensive
Profile Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1730627 - Posted: 1 Oct 2015, 13:20:08 UTC - in response to Message 1730626.  
Last modified: 1 Oct 2015, 13:23:15 UTC

It's the Pope, Daesh, or Obama's fault. No doubt about that, whatsoever.

Not blaming the Russians or the Chinese, then? :P


No, I think Grumpy is right - it's Pope Obama in cahoots with the North Koreans and Isis. Now that McDonald's is in Russia and China, they are all fat and stupid like we Americans.
Member of the 20 Year Club



ID: 1730627 · Report as offensive
Profile Gary Charpentier · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30651
Credit: 53,134,872
RAC: 32
United States
Message 1730629 - Posted: 1 Oct 2015, 13:31:42 UTC - in response to Message 1730606.  

I have no successes to report, but note that boinc.berkeley.edu is also toast now, wasn't a bit earlier, and somebody gets through as I get email updates of new messages.

I saw last night, in a small window of time that I could reach the BOINC forums, that there are some people for whom the Seti forums seem down, but they can reach the BOINC forums.

If for instance you find Mark Sattler so quiet around here, he was one of the peeps for whom Seti seemed down and was posting on the BOINC forums. A couple of others were in the same situation.

I myself can't now reach the BOINC forums anymore and still can't report all work done for Seti. Shrug, more time to play GTA V Online then versus other players. :)

Well, last night I had no issues getting to any of them. This morning, here and Beta are fine, but the dev board is unreachable. I wonder if the sites are under some kind of DDoS attack and the campus IT firewall is reacting?
ID: 1730629 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1730635 - Posted: 1 Oct 2015, 13:49:20 UTC

I'm getting a private report - as yet unverified - of a possible circular route from et3-48 back to et3-48, via seven or eight unlabelled nodes. That might point to a potential configuration error (or corrupted routing table) in inr-311.
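
If someone has the full hop list from such a trace, a loop is easy to check for mechanically. A minimal Python sketch, assuming the hops have already been reduced to a list of addresses with None for nodes that did not identify themselves; the sample list is invented to match the description and is not a real trace.

```python
def find_loop(hops, min_gap=2):
    """Return (first_index, repeat_index, address) for the first address
    seen again at least min_gap hops later, or None if there is none.

    Back-to-back repeats are ignored by default, because tracert can print
    the same router on adjacent lines without the route being circular.
    """
    last_seen = {}
    for index, address in enumerate(hops):
        if address is None:  # unlabelled or timed-out hop
            continue
        previous = last_seen.get(address)
        if previous is not None and index - previous >= min_gap:
            return previous, index, address
        last_seen[address] = index
    return None


# Hypothetical data: et3-48, seven unlabelled nodes, then et3-48 again.
hops = ["128.32.0.101"] + [None] * 7 + ["128.32.0.101"]
print(find_loop(hops))
```

A hit only says the same address showed up twice; whether that means a misconfigured router or a corrupted routing table is still for the campus network people to decide.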
ID: 1730635 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1730638 - Posted: 1 Oct 2015, 14:24:00 UTC

Okay, thus far I found that when using Google DNS I can't reach the BOINC forums.
I could reach them with my mom's internet, and with the internet at the supermarket (why they have wifi, who knows?), nor the internet at the rehabilitation center my mum's currently in.

I'll go test different (free) DNS ranges.
ID: 1730638 · Report as offensive
Profile Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1730645 - Posted: 1 Oct 2015, 14:51:26 UTC - in response to Message 1730638.  
Last modified: 1 Oct 2015, 14:51:48 UTC

So far I tried the following values:

123.223.35.185
213.168.186.246
178.61.22.68
201.216.200.74
77.85.169.227
66.249.99.130
91.205.35.36
213.125.124.99
185.56.30.132
77.73.224.193
185.51.195.195
213.126.24.234

None worked. For people with more time on their hands than me, see http://public-dns.tk/ for more public DNS servers.
Now set back to Google DNS, also set for IPv6.

Hope there won't be spam attacks on the Dev forums, as I won't be able to stop it...
ID: 1730645 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.