Panic Mode On (28) Server problems

zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65734
Credit: 55,293,173
RAC: 49
United States
Message 971306 - Posted: 18 Feb 2010, 19:38:00 UTC

I got this when I did a tracert to Berkeley, and it looks like something within Berkeley is busted.

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>tracert setiathome.berkeley.edu

Tracing route to setiathome.SSL.berkeley.edu [128.32.18.150]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  dslrouter.westell.com [192.168.1.1]
  2    33 ms    33 ms    33 ms  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
  3    35 ms    35 ms    35 ms  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
  4    37 ms    40 ms    36 ms  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
  5    38 ms    37 ms    37 ms  0.so-6-3-0.XL3.LAX15.ALTER.NET [152.63.113.241]
  6    37 ms    37 ms    38 ms  0.xe-11-0-0.BR2.LAX15.ALTER.NET [152.63.116.157]
  7    38 ms    39 ms    94 ms  xe-10-1-0.edge1.LosAngeles9.Level3.net [4.68.63.129]
  8    37 ms    35 ms    36 ms  ae-1-60.edge5.LosAngeles1.Level3.net [4.69.144.11]
  9    38 ms    37 ms    37 ms  CENIC.edge5.LosAngeles1.Level3.net [4.59.48.178]
 10    46 ms    45 ms    46 ms  dc-svl-isp1--lax-isp1-ge.cenic.net [137.164.47.34]
 11    47 ms    49 ms    48 ms  inet-ucb--svl-isp.cenic.net [137.164.24.106]
 12    47 ms    47 ms    47 ms  g3-19.inr-201-eva.Berkeley.EDU [128.32.0.58]
 13    48 ms    57 ms    48 ms  g6-1.inr-230-spr.Berkeley.EDU [128.32.255.110]
 14     *        *        *     Request timed out.
 15    57 ms    47 ms    55 ms  thinman.ssl.berkeley.edu [128.32.18.150]

Trace complete.

The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971306 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971312 - Posted: 18 Feb 2010, 19:46:23 UTC - in response to Message 971292.  

When the heavy number crunchers move on to other things where will that leave the science of Seti?

I'm not saying this is good, but....

If some crunchers leave, that reduces load, making the load more tolerable for others, making those who stayed less likely to leave.

I'm also not sure it's necessary. We're talking less about technology and more about psychology, and I didn't take psych, I took computer science.

The only real problem is when the average load gets really close to what the servers can handle, then the knee gets really really dangerous.

... and the answer is to tune clients to smooth off the peaks and raise the valleys.
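
The "knee" can be illustrated with a textbook single-queue (M/M/1) model. This is only a sketch: it assumes a server behaves like one queue with random arrivals, which the real project servers may not, and the function name and the capacity of 100 requests per second are made up for illustration.

```python
def mean_response_time(arrival_rate, service_rate):
    """Mean time a request spends in an M/M/1 queue (waiting + service), in seconds."""
    if arrival_rate >= service_rate:
        return float("inf")  # overloaded: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

SERVICE_RATE = 100.0  # hypothetical capacity, requests per second

for load in (0.50, 0.80, 0.90, 0.95, 0.99):
    t = mean_response_time(load * SERVICE_RATE, SERVICE_RATE)
    print(f"{load:.0%} load -> {t * 1000:.0f} ms")
```

In this model, going from 50% to 90% load multiplies the delay by five, and going from 90% to 99% multiplies it by ten again, which is why shaving the peaks matters most when the average is already close to capacity.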
ID: 971312 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971313 - Posted: 18 Feb 2010, 19:47:39 UTC - in response to Message 971306.  

I got this when I did a tracert to Berkeley, and it looks like something within Berkeley is busted.

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>tracert setiathome.berkeley.edu

Tracing route to setiathome.SSL.berkeley.edu [128.32.18.150]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  dslrouter.westell.com [192.168.1.1]
  2    33 ms    33 ms    33 ms  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
  3    35 ms    35 ms    35 ms  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
  4    37 ms    40 ms    36 ms  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
  5    38 ms    37 ms    37 ms  0.so-6-3-0.XL3.LAX15.ALTER.NET [152.63.113.241]
  6    37 ms    37 ms    38 ms  0.xe-11-0-0.BR2.LAX15.ALTER.NET [152.63.116.157]
  7    38 ms    39 ms    94 ms  xe-10-1-0.edge1.LosAngeles9.Level3.net [4.68.63.129]
  8    37 ms    35 ms    36 ms  ae-1-60.edge5.LosAngeles1.Level3.net [4.69.144.11]
  9    38 ms    37 ms    37 ms  CENIC.edge5.LosAngeles1.Level3.net [4.59.48.178]
 10    46 ms    45 ms    46 ms  dc-svl-isp1--lax-isp1-ge.cenic.net [137.164.47.34]
 11    47 ms    49 ms    48 ms  inet-ucb--svl-isp.cenic.net [137.164.24.106]
 12    47 ms    47 ms    47 ms  g3-19.inr-201-eva.Berkeley.EDU [128.32.0.58]
 13    48 ms    57 ms    48 ms  g6-1.inr-230-spr.Berkeley.EDU [128.32.255.110]
 14     *        *        *     Request timed out.
 15    57 ms    47 ms    55 ms  thinman.ssl.berkeley.edu [128.32.18.150]

Trace complete.

Are you talking about line 14?
ID: 971313 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65734
Credit: 55,293,173
RAC: 49
United States
Message 971315 - Posted: 18 Feb 2010, 19:49:55 UTC - in response to Message 971313.  

I got this when I did a tracert to Berkeley, and it looks like something within Berkeley is busted.

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>tracert setiathome.berkeley.edu

Tracing route to setiathome.SSL.berkeley.edu [128.32.18.150]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  dslrouter.westell.com [192.168.1.1]
  2    33 ms    33 ms    33 ms  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
  3    35 ms    35 ms    35 ms  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
  4    37 ms    40 ms    36 ms  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
  5    38 ms    37 ms    37 ms  0.so-6-3-0.XL3.LAX15.ALTER.NET [152.63.113.241]
  6    37 ms    37 ms    38 ms  0.xe-11-0-0.BR2.LAX15.ALTER.NET [152.63.116.157]
  7    38 ms    39 ms    94 ms  xe-10-1-0.edge1.LosAngeles9.Level3.net [4.68.63.129]
  8    37 ms    35 ms    36 ms  ae-1-60.edge5.LosAngeles1.Level3.net [4.69.144.11]
  9    38 ms    37 ms    37 ms  CENIC.edge5.LosAngeles1.Level3.net [4.59.48.178]
 10    46 ms    45 ms    46 ms  dc-svl-isp1--lax-isp1-ge.cenic.net [137.164.47.34]
 11    47 ms    49 ms    48 ms  inet-ucb--svl-isp.cenic.net [137.164.24.106]
 12    47 ms    47 ms    47 ms  g3-19.inr-201-eva.Berkeley.EDU [128.32.0.58]
 13    48 ms    57 ms    48 ms  g6-1.inr-230-spr.Berkeley.EDU [128.32.255.110]
 14     *        *        *     Request timed out.
 15    57 ms    47 ms    55 ms  thinman.ssl.berkeley.edu [128.32.18.150]

Trace complete.

Are you talking about line 14?

No I'm talking about the line item veto, What do Ya think I'm talking about?
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971315 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 971316 - Posted: 18 Feb 2010, 19:50:24 UTC - in response to Message 971305.  

I've just tried clicking a 'retry upload' button (one machine, two clicks - no more). It made a valiant effort, but no complete uploads. I'm aware there's a problem. Then I looked (again) at the Cricket graph: it's steady at well over 90 Mbits. Diagnosis? Normal for Tuesday - I wouldn't expect uploads to be going through just now. Response - leave it well alone, and see if it sorts itself out when things are quieter.

But I think there's a tendency, in both your and Matt's posts, to assume that the diagnosis is 'overload' (in one of its many forms), and formulate the response accordingly: in fact, immediately following that snip of Matt's I posted earlier, he says "This should simmer down in due time." If the diagnosis of overwork is correct, that would be the appropriate response - go away and do something more constructive with your time.

But I think he missed out the 'awareness' stage. I don't think Matt was aware, when he posted that, that the upload failures were - in my opinion - from some different cause, and hence not likely to be self-healing through benign indifference. There are some problems which don't go away of their own accord.


Scarecrow's graphs are very telling that there is a continuing problem with uploads.
Normally after an outage, even with download bandwidth fully saturated, there is a surge of uploads: over 100,000/hr (even as high as 180,000/hr), where the usual rate is around 50,000/hr. After last Tuesday's outage the peak was about 70,000/hr. After the aircon outage it barely reached 35,000/hr. After such an outage I would have expected a new record of over 180,000/hr.

With a shorty storm the return rate can be as high as 60,000/hr, but over the last week the rate has barely been 40,000/hr, which means Matt's statement about a problem due to short/noisy work units just can't be right.
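
To put the figures quoted above side by side, here is a small worked comparison. The numbers are the ones from this post; the variable and function names are just for illustration.

```python
# Figures quoted in the post, in results uploaded per hour.
EXPECTED_PEAK = 180_000   # highest post-outage surge mentioned
USUAL_RATE = 50_000       # typical rate ("around 50,000/hr")

observed = {
    "peak after last Tuesday's outage": 70_000,
    "peak after the aircon outage": 35_000,
    "rate over the last week": 40_000,
}

def fraction_of(value, reference):
    """Express value as a fraction of reference."""
    return value / reference

for label, rate in observed.items():
    print(f"{label}: {fraction_of(rate, EXPECTED_PEAK):.0%} of the expected peak, "
          f"{fraction_of(rate, USUAL_RATE):.0%} of the usual rate")
```

The post-aircon peak comes out below the usual steady rate, not just below the expected surge, which is the point being made about uploads.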
Grant
Darwin NT
ID: 971316 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65734
Credit: 55,293,173
RAC: 49
United States
Message 971317 - Posted: 18 Feb 2010, 19:52:34 UTC - in response to Message 971316.  

I've just tried clicking a 'retry upload' button (one machine, two clicks - no more). It made a valiant effort, but no complete uploads. I'm aware there's a problem. Then I looked (again) at the Cricket graph: it's steady at well over 90 Mbits. Diagnosis? Normal for Tuesday - I wouldn't expect uploads to be going through just now. Response - leave it well alone, and see if it sorts itself out when things are quieter.

But I think there's a tendency, in both your and Matt's posts, to assume that the diagnosis is 'overload' (in one of its many forms), and formulate the response accordingly: in fact, immediately following that snip of Matt's I posted earlier, he says "This should simmer down in due time." If the diagnosis of overwork is correct, that would be the appropriate response - go away and do something more constructive with your time.

But I think he missed out the 'awareness' stage. I don't think Matt was aware, when he posted that, that the upload failures were - in my opinion - from some different cause, and hence not likely to be self-healing through benign indifference. There are some problems which don't go away of their own accord.


Scarecrow's graphs are very telling that there is a continuing problem with uploads.
Normally after an outage, even with download bandwidth fully saturated, there is a surge of uploads: over 100,000/hr (even as high as 180,000/hr), where the usual rate is around 50,000/hr. After last Tuesday's outage the peak was about 70,000/hr. After the aircon outage it barely reached 35,000/hr. After such an outage I would have expected a new record of over 180,000/hr.

With a shorty storm the return rate can be as high as 60,000/hr, but over the last week the rate has barely been 40,000/hr, which means Matt's statement about a problem due to short/noisy work units just can't be right.

And that's why I shut down BOINC, as it's rather pointless to crunch since nothing can be uploaded or reported, and I've tried to no avail.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971317 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 971319 - Posted: 18 Feb 2010, 19:55:14 UTC - in response to Message 971317.  
Last modified: 18 Feb 2010, 19:55:31 UTC


And that's why I shut down BOINC, as it's rather pointless to crunch since nothing can be uploaded or reported, and I've tried to no avail.

It's NOT pointless to continue to crunch what work you have.
BOINC will continue to store the results and upload/report them when the servers are able to service the requests.
Whenever the dam breaks, regardless of the root cause.

Keep 'em crunching folks.

Meow meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 971319 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971320 - Posted: 18 Feb 2010, 19:56:38 UTC - in response to Message 971305.  
Last modified: 18 Feb 2010, 20:03:35 UTC

But I think there's a tendency, in both your and Matt's posts, to assume that the diagnosis is 'overload' (in one of its many forms), and formulate the response accordingly: in fact, immediately following that snip of Matt's I posted earlier, he says "This should simmer down in due time." If the diagnosis of overwork is correct, that would be the appropriate response - go away and do something more constructive with your time.

To which I have two comments:

1) It's a theory. Once you have a theory, you go to the metrics, and go through the troubleshooting, and if you find that the facts don't fit the theory, well, it wouldn't be the first time.

2) It is said that "The race doesn't always go to the fastest, or the fight to the strongest, but that's how you bet."

There is a strong correlation between things that cause higher loading (AP, "shorty storms", outages) and complaints about uploads and reporting.

There is one more thing that draws me to loading: I can't think of a way to prevent the A/C from breaking by writing software. Software (in the client) can mitigate a loading issue, so I'm most interested in that part of the problem.
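
One common client-side mitigation is randomized exponential backoff. The sketch below is a generic illustration and not what the BOINC client actually does; the function name, the 60-second base and the 4-hour cap are all assumptions picked for the example.

```python
import random

def backoff_delay(attempt, base=60.0, cap=4 * 3600.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`. The actual delay is
    drawn uniformly from [0, ceiling] ("full jitter"), so clients that failed
    at the same moment do not all come back at the same moment.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)

if __name__ == "__main__":
    demo_rng = random.Random(0)  # seeded only so the demo is repeatable
    for attempt in range(6):
        print(attempt, round(backoff_delay(attempt, rng=demo_rng)), "s")
```

The jitter is the part that smooths the peaks: without it, every client that was turned away during an outage would retry in lockstep and recreate the spike.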

Edit: As for Matt, I assume he has a much better picture of the situation than I do, since he'd have access to at least some of the metrics on my wish list, and he can query the servers to see what they're really doing.
ID: 971320 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971322 - Posted: 18 Feb 2010, 20:00:53 UTC - in response to Message 971315.  

I got this when I did a tracert to Berkeley, and it looks like something within Berkeley is busted.

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>tracert setiathome.berkeley.edu

Tracing route to setiathome.SSL.berkeley.edu [128.32.18.150]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  dslrouter.westell.com [192.168.1.1]
  2    33 ms    33 ms    33 ms  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
  3    35 ms    35 ms    35 ms  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
  4    37 ms    40 ms    36 ms  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
  5    38 ms    37 ms    37 ms  0.so-6-3-0.XL3.LAX15.ALTER.NET [152.63.113.241]
  6    37 ms    37 ms    38 ms  0.xe-11-0-0.BR2.LAX15.ALTER.NET [152.63.116.157]
  7    38 ms    39 ms    94 ms  xe-10-1-0.edge1.LosAngeles9.Level3.net [4.68.63.129]
  8    37 ms    35 ms    36 ms  ae-1-60.edge5.LosAngeles1.Level3.net [4.69.144.11]
  9    38 ms    37 ms    37 ms  CENIC.edge5.LosAngeles1.Level3.net [4.59.48.178]
 10    46 ms    45 ms    46 ms  dc-svl-isp1--lax-isp1-ge.cenic.net [137.164.47.34]
 11    47 ms    49 ms    48 ms  inet-ucb--svl-isp.cenic.net [137.164.24.106]
 12    47 ms    47 ms    47 ms  g3-19.inr-201-eva.Berkeley.EDU [128.32.0.58]
 13    48 ms    57 ms    48 ms  g6-1.inr-230-spr.Berkeley.EDU [128.32.255.110]
 14     *        *        *     Request timed out.
 15    57 ms    47 ms    55 ms  thinman.ssl.berkeley.edu [128.32.18.150]

Trace complete.

Are you talking about line 14?

No I'm talking about the line item veto, What do Ya think I'm talking about?

I have absolutely no idea what you're talking about, because all I see is a router that doesn't respond to the ICMP echo requests tracert sends (a common setting on all "real" routers). It's not something I do on my networks, but it's not unusual.

You're also looking at the SETI@Home web server, and not one of the data servers. They're on SETI's bandwidth (through Hurricane Electric), while the web server is on Campus bandwidth through Cenic.
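
To make the point about hop 14 concrete, here is a rough sketch that reads Windows tracert output and checks whether a silent hop actually matters. The regular expression and the function names are my own, and it assumes the line layout shown in the trace quoted above.

```python
import re

# Matches Windows tracert hop lines, e.g.
# " 14     *        *        *     Request timed out."
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+ ms|\*)\s+){3})(.*)$")

def parse_hops(tracert_output):
    """Return a list of (hop_number, answered, host) tuples."""
    hops = []
    for line in tracert_output.splitlines():
        m = HOP_RE.match(line)
        if m:
            answered = "ms" in m.group(2)  # at least one probe got a reply
            hops.append((int(m.group(1)), answered, m.group(3).strip()))
    return hops

def diagnose(hops):
    """A silent hop only matters if nothing beyond it answers."""
    if not hops:
        return "no hops parsed"
    if hops[-1][1]:
        silent = [n for n, answered, _ in hops if not answered]
        if silent:
            return f"destination reached; hops {silent} just don't answer probes"
        return "destination reached; every hop answered"
    return "trace ended without a reply; path may be broken past the last answering hop"

SAMPLE = """\
 13    48 ms    57 ms    48 ms  g6-1.inr-230-spr.Berkeley.EDU [128.32.255.110]
 14     *        *        *     Request timed out.
 15    57 ms    47 ms    55 ms  thinman.ssl.berkeley.edu [128.32.18.150]
"""

print(diagnose(parse_hops(SAMPLE)))
```

Since hop 15 answers with normal latency, packets are clearly getting through hop 14; it simply isn't replying to the probes itself.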
ID: 971322 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65734
Credit: 55,293,173
RAC: 49
United States
Message 971324 - Posted: 18 Feb 2010, 20:09:30 UTC - in response to Message 971322.  

I got this when I did a tracert to Berkeley, and it looks like something within Berkeley is busted.

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>tracert setiathome.berkeley.edu

Tracing route to setiathome.SSL.berkeley.edu [128.32.18.150]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  dslrouter.westell.com [192.168.1.1]
  2    33 ms    33 ms    33 ms  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
  3    35 ms    35 ms    35 ms  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
  4    37 ms    40 ms    36 ms  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
  5    38 ms    37 ms    37 ms  0.so-6-3-0.XL3.LAX15.ALTER.NET [152.63.113.241]
  6    37 ms    37 ms    38 ms  0.xe-11-0-0.BR2.LAX15.ALTER.NET [152.63.116.157]
  7    38 ms    39 ms    94 ms  xe-10-1-0.edge1.LosAngeles9.Level3.net [4.68.63.129]
  8    37 ms    35 ms    36 ms  ae-1-60.edge5.LosAngeles1.Level3.net [4.69.144.11]
  9    38 ms    37 ms    37 ms  CENIC.edge5.LosAngeles1.Level3.net [4.59.48.178]
 10    46 ms    45 ms    46 ms  dc-svl-isp1--lax-isp1-ge.cenic.net [137.164.47.34]
 11    47 ms    49 ms    48 ms  inet-ucb--svl-isp.cenic.net [137.164.24.106]
 12    47 ms    47 ms    47 ms  g3-19.inr-201-eva.Berkeley.EDU [128.32.0.58]
 13    48 ms    57 ms    48 ms  g6-1.inr-230-spr.Berkeley.EDU [128.32.255.110]
 14     *        *        *     Request timed out.
 15    57 ms    47 ms    55 ms  thinman.ssl.berkeley.edu [128.32.18.150]

Trace complete.

Are you talking about line 14?

No I'm talking about the line item veto, What do Ya think I'm talking about?

I have absolutely no idea what you're talking about, because all I see is a router that doesn't respond to the ICMP echo requests tracert sends (a common setting on all "real" routers). It's not something I do on my networks, but it's not unusual.

You're also looking at the SETI@Home web server, and not one of the data servers. They're on SETI's bandwidth (through Hurricane Electric), while the web server is on Campus bandwidth through Cenic.

Well, I knew that; I just don't have the address of the data server. If I had that, I could do a tracert on it.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971324 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971326 - Posted: 18 Feb 2010, 20:19:23 UTC - in response to Message 971324.  


I have absolutely no idea what you're talking about, because all I see is a router that doesn't respond to the ICMP echo requests tracert sends (a common setting on all "real" routers). It's not something I do on my networks, but it's not unusual.

You're also looking at the SETI@Home web server, and not one of the data servers. They're on SETI's bandwidth (through Hurricane Electric), while the web server is on Campus bandwidth through Cenic.

Well, I knew that; I just don't have the address of the data server. If I had that, I could do a tracert on it.

Try "setiboincdata.ssl.berkeley.edu"

What you'll find is consistent with the Cricket Graphs (good ping times) and you'll find that none of the routers on that path filter ICMP echoes.

That's also consistent with what I said earlier: the bottleneck isn't bandwidth.
ID: 971326 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 971327 - Posted: 18 Feb 2010, 20:21:59 UTC - in response to Message 971326.  


Hmmm.
Outbound traffic volume has just plummeted, no increase in inbound (my uploads are still sitting there).
I suspect that all those that can download have, it's the backlog of uploads that's brought the download frenzy to an early end.
Grant
Darwin NT
ID: 971327 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 971328 - Posted: 18 Feb 2010, 20:27:02 UTC

I don't have any difficulty reaching the upload server: it's the answer I get back after I've reached it that suggests there's a problem:

18/02/2010 20:23:04|SETI@home|[file_xfer] Started upload of file 13fe07ac.24261.3344.7.10.225_0_0
18/02/2010 20:23:05||[http_debug] [ID#15] info: About to connect() to setiboincdata.ssl.berkeley.edu port 80 (#0)
18/02/2010 20:23:05||[http_debug] [ID#15] info:   Trying 208.68.240.16... 
18/02/2010 20:23:05||[http_debug] [ID#15] info: Connected to setiboincdata.ssl.berkeley.edu (208.68.240.16) port 80 (#0)
18/02/2010 20:23:05||[http_debug] [ID#15] Sent header to server: POST /sah_cgi/file_upload_handler HTTP/1.1
User-Agent: BOINC client (windows_intelx86 5.10.13)
Host: setiboincdata.ssl.berkeley.edu
Accept: */*
Accept-Encoding: deflate, gzip
Content-Type: application/x-www-form-urlencoded
Content-Length: 286


18/02/2010 20:23:05||[http_debug] [ID#15] Received header from server: HTTP/1.0 503 Service Unavailable

18/02/2010 20:23:05||[http_debug] [ID#15] Received header from server: Content-Type: text/html

18/02/2010 20:23:05||[http_debug] [ID#15] Received header from server: Content-Length: 53

18/02/2010 20:23:05||[http_debug] [ID#15] info: Expire cleared
18/02/2010 20:23:05||[http_debug] [ID#15] info: Closing connection #0
18/02/2010 20:23:06|SETI@home|[file_xfer] Temporarily failed upload of 13fe07ac.24261.3344.7.10.225_0_0: http error
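
The distinction being drawn here (the server was reached and answered with an error, versus never connecting at all) can be pulled out of a log mechanically. This is a rough sketch that relies only on the message strings visible in this thread's `http_debug` output; other client versions may word them differently, and the function name is made up.

```python
import re

STATUS_RE = re.compile(r"Received header from server: HTTP/\d\.\d (\d{3})")

def classify_transfer(log_lines):
    """Tell 'server answered with an error' apart from 'could not connect'."""
    connected = False
    for line in log_lines:
        if "info: Connected to" in line:
            connected = True
        m = STATUS_RE.search(line)
        if m:
            status = int(m.group(1))
            if status >= 500:
                return f"reached server, which answered with an error (HTTP {status})"
            return f"reached server (HTTP {status})"
        if "Timed out" in line or "Failed connect" in line:
            return "never connected (timeout during connect)"
    return "connected, no response seen" if connected else "inconclusive"

# Two lines taken from the upload log above.
UPLOAD_LOG = [
    "18/02/2010 20:23:05||[http_debug] [ID#15] info: Connected to setiboincdata.ssl.berkeley.edu (208.68.240.16) port 80 (#0)",
    "18/02/2010 20:23:05||[http_debug] [ID#15] Received header from server: HTTP/1.0 503 Service Unavailable",
]
# A connect timeout in the same log format, for contrast.
TIMEOUT_LOG = [
    "18/02/2010 20:47:14||[http_debug] [ID#18] info:   Trying 208.68.240.20... ",
    "18/02/2010 20:47:35||[http_debug] [ID#18] info: Timed out",
]

print(classify_transfer(UPLOAD_LOG))
print(classify_transfer(TIMEOUT_LOG))
```

A 503 means the network path and the web server front end are both working and something behind them is declining the work, which is a different failure from a saturated link.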
ID: 971328 · Report as offensive
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65734
Credit: 55,293,173
RAC: 49
United States
Message 971330 - Posted: 18 Feb 2010, 20:32:20 UTC - in response to Message 971326.  


I have absolutely no idea what you're talking about, because all I see is a router that doesn't respond to the ICMP echo requests tracert sends (a common setting on all "real" routers). It's not something I do on my networks, but it's not unusual.

You're also looking at the SETI@Home web server, and not one of the data servers. They're on SETI's bandwidth (through Hurricane Electric), while the web server is on Campus bandwidth through Cenic.

Well, I knew that; I just don't have the address of the data server. If I had that, I could do a tracert on it.

Try "setiboincdata.ssl.berkeley.edu"

What you'll find is consistent with the Cricket Graphs (good ping times) and you'll find that none of the routers on that path filter ICMP echoes.

That's also consistent with what I said earlier: the bottleneck isn't bandwidth.

OK, I did a tracert on the supplied address, thanks.

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>tracert setiboincdata.ssl.berkeley.edu

Tracing route to setiboincdata.ssl.berkeley.edu [208.68.240.16]
over a maximum of 30 hops:

  1     1 ms     1 ms     1 ms  dslrouter.westell.com [192.168.1.1]
  2    35 ms    33 ms    85 ms  L100.LSANCA-DSL-35.verizon-gni.net [71.105.32.1]
  3    36 ms    35 ms    36 ms  9-0-2935.LSANCA-LCR-09.verizon-gni.net [130.81.136.14]
  4    38 ms    38 ms    37 ms  so-4-0-0-0.LAX01-BB-RTR1.verizon-gni.net [130.81.28.72]
  5    38 ms    39 ms    37 ms  0.so-6-3-0.XT1.LAX9.ALTER.NET [152.63.10.153]
  6    49 ms    49 ms    49 ms  0.ge-7-1-0.XL3.SJC7.ALTER.NET [152.63.48.254]
  7    49 ms    49 ms    48 ms  POS6-0-0.GW4.SJC7.ALTER.NET [152.63.48.241]
  8    47 ms    47 ms    49 ms  teliasonera-test-gw.customer.alter.net [157.130.215.70]
  9    49 ms    49 ms    49 ms  hurricane-113209-sjo-bb1.c.telia.net [213.248.86.54]
 10    47 ms    49 ms    49 ms  64.71.140.42
 11    57 ms    55 ms    56 ms  208.68.243.254
 12    56 ms    55 ms    55 ms  setiboincdata.ssl.berkeley.edu [208.68.240.16]

Trace complete.

C:\Documents and Settings\Administrator.PC1>

The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 971330 · Report as offensive
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 971331 - Posted: 18 Feb 2010, 20:34:38 UTC - in response to Message 971327.  


Hmmm.
Outbound traffic volume has just plummeted, no increase in inbound (my uploads are still sitting there).
I suspect that all those that can download have, it's the backlog of uploads that's brought the download frenzy to an early end.

The servers just ran out of AP WU's to send....that's what the bandwidth drop is about.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 971331 · Report as offensive
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 971334 - Posted: 18 Feb 2010, 20:49:54 UTC - in response to Message 971331.  

The servers just ran out of AP WU's to send....that's what the bandwidth drop is about.

In which case, there should be plenty of spare connections available.

But no:

18/02/2010 20:47:14|SETI@home|Sending scheduler request: Requested by user
18/02/2010 20:47:14|SETI@home|Requesting 35123 seconds of new work
18/02/2010 20:47:14||[http_debug] HTTP_OP::init_post(): http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
18/02/2010 20:47:14||[http_debug] [ID#18] info: About to connect() to setiboinc.ssl.berkeley.edu port 80 (#0)
18/02/2010 20:47:14||[http_debug] [ID#18] info:   Trying 208.68.240.20... 
18/02/2010 20:47:35||[http_debug] [ID#18] info: Timed out
18/02/2010 20:47:35||[http_debug] [ID#18] info: Failed connect to setiboinc.ssl.berkeley.edu:80; No error
18/02/2010 20:47:35||[http_debug] [ID#18] info: Expire cleared
18/02/2010 20:47:35||[http_debug] [ID#18] info: Closing connection #0
18/02/2010 20:47:35||[http_debug] HTTP error: couldn't connect to server
ID: 971334 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 971336 - Posted: 18 Feb 2010, 20:53:26 UTC - in response to Message 971331.  


Hmmm.
Outbound traffic volume has just plummeted, no increase in inbound (my uploads are still sitting there).
I suspect that all those that can download have, it's the backlog of uploads that's brought the download frenzy to an early end.

The servers just ran out of AP WU's to send....that's what the bandwidth drop is about.

Which means there's even more work waiting to be uploaded, blocking downloads of new work, than I first thought.
Given the length of the outage, even with MultiBeam-only work I'd expect the download traffic to have been pegged for at least 12 hours.
Grant
Darwin NT
ID: 971336 · Report as offensive
Rick
Joined: 3 Dec 99
Posts: 79
Credit: 11,486,227
RAC: 0
United States
Message 971341 - Posted: 18 Feb 2010, 21:01:04 UTC - in response to Message 971301.  

Just noticed that my iMac got a set of tasks from Seti about 15 minutes ago. My other system is still unable to get any tasks. Guess my iMac's lottery number just happened to come up.


My second system got a download of 31 GPU tasks about 20 minutes ago but I got no CPU tasks.

ID: 971341 · Report as offensive
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 971344 - Posted: 18 Feb 2010, 21:20:13 UTC - in response to Message 971341.  
Last modified: 18 Feb 2010, 21:40:57 UTC

Got 15 GPU tasks about 15 minutes ago. No CPU tasks yet but there is light at the end of the tunnel finally. (Hope it's not a train coming through! :-) )


Ok, CPUs are happy now. Just got 22 WUs for them. That should keep me busy for a while.


PROUD MEMBER OF Team Starfire World BOINC
ID: 971344 · Report as offensive
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 971346 - Posted: 18 Feb 2010, 21:23:50 UTC - in response to Message 971334.  

The servers just ran out of AP WU's to send....that's what the bandwidth drop is about.

In which case, there should be plenty of spare connections available.

But no:

18/02/2010 20:47:14||[http_debug] [ID#18] info:   Trying 208.68.240.20... 
18/02/2010 20:47:35||[http_debug] [ID#18] info: Timed out
18/02/2010 20:47:35||[http_debug] [ID#18] info: Failed connect to setiboinc.ssl.berkeley.edu:80; No error
18/02/2010 20:47:35||[http_debug] [ID#18] info: Expire cleared
18/02/2010 20:47:35||[http_debug] [ID#18] info: Closing connection #0
18/02/2010 20:47:35||[http_debug] HTTP error: couldn't connect to server

The exact same thing would happen if the initial TCP SYN got to the servers, but the SYN+ACK was late due to extreme loading (due to the sheer number of incoming SYNs). SYN packets are small, so not a lot of bandwidth needed.

I'm not saying that's the reason, just that there's more than one way for this to happen.
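
As a rough illustration of why the client cannot tell these cases apart, here is a minimal TCP connect probe. The function name and the timeout value are assumptions for the example, and it uses only the standard `socket` module.

```python
import socket

def probe(host, port, timeout=5.0):
    """Attempt one TCP connect and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"   # SYN, SYN+ACK and ACK all completed in time
    except socket.timeout:
        return "timeout"         # no SYN+ACK arrived before the deadline
    except ConnectionRefusedError:
        return "refused"         # the host answered with a reset: nothing listening
    except OSError as exc:
        return f"error: {exc}"
```

Calling something like `probe("setiboinc.ssl.berkeley.edu", 80)` would return "timeout" both when the SYN never arrives and when the SYN+ACK is merely late, so a timeout on its own does not say whether the link or the server is the bottleneck.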
ID: 971346 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.