Panic Mode On (9) Server problems


Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3232
Credit: 31,585,541
RAC: 0
Netherlands
Message 809785 - Posted: 19 Sep 2008, 7:40:35 UTC - in response to Message 809779.
Last modified: 19 Sep 2008, 8:01:17 UTC

I know the possibility of DNS attacks was floated in the past, but it was discounted at the time. Might another analysis of network traffic be in order now?
I was curious about the traffic too, as I've been studying some networking subjects lately. When I started, traffic was at a much more humble rate, around 20Mbps on the Cricket graphs. Switching to the long-term view recently, however, seems to tell the grim truth: bandwidth utilisation is scaling proportionally with Moore's law...


Hi Mark and Jason, not too much trouble in getting WUs uploaded or downloaded; it only goes in big 'chunks', like 20 to 40 WUs at a time.
On all hosts, 'large' numbers of WUs are waiting to upload. But never mind that.
Eventually, they get uploaded.
{EDIT/ADD}We don't have to panic, though ;)
____________


Knight Who Says Ni N!, OUT numbered.................

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 4813
Credit: 71,606,593
RAC: 9,206
Australia
Message 809797 - Posted: 19 Sep 2008, 8:29:40 UTC - in response to Message 809783.
Last modified: 19 Sep 2008, 8:31:05 UTC

But is it all due to crunching, or server data transfer, or what??????
Well, surely some peak periods would represent recovery and/or bulk moving of server data about. However, the general trend, and I'm looking at the yearly Cricket graph here, looks somewhere between linear and exponential growth, and has gone from ~20Mbps to 60Mbps+ [in about a year?].

Assuming from the hardware donations threads that upgrading internal network speeds is going to happen gradually, that still leaves that 100Mbps link 'up the hill'. If that connection is fibre of some sort, replacing the transceivers at either end would, I reckon, cost big bucks :(. I hope we'll be alright for a while longer before that capacity is required, but then the general hardware performance increases of the last few years seem pretty staggering, and are set to continue.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin
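
For anyone who wants to play with the numbers in the post above, here is a rough sketch of how long the 100Mbps link might last. It only uses the figures quoted there (~20Mbps growing to ~60Mbps in about a year) and assumes the trend simply continues, either linearly or exponentially; the function names are made up for illustration, and real traffic will of course not follow either curve exactly.

```python
import math

def years_until_linear(current, previous, limit, years_elapsed=1.0):
    """Years until `limit` if traffic keeps growing by the same absolute amount per year."""
    slope = (current - previous) / years_elapsed  # Mbps gained per year
    return (limit - current) / slope

def years_until_exponential(current, previous, limit, years_elapsed=1.0):
    """Years until `limit` if traffic keeps growing by the same ratio per year."""
    rate = math.log(current / previous) / years_elapsed  # continuous growth rate per year
    return math.log(limit / current) / rate

# Figures from the post: ~20 Mbps a year ago, ~60 Mbps now, 100 Mbps link capacity.
linear = years_until_linear(60, 20, 100)
exponential = years_until_exponential(60, 20, 100)
print(f"linear: {linear:.2f} years, exponential: {exponential:.2f} years")
```

Under these assumptions the link fills up in roughly half a year to a year, which is at least consistent with the worry expressed above.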

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 809861 - Posted: 19 Sep 2008, 13:01:55 UTC - in response to Message 809785.

I know the possibility of DNS attacks was floated in the past, but it was discounted at the time. Might another analysis of network traffic be in order now?
I was curious about the traffic too, as I've been studying some networking subjects lately. When I started, traffic was at a much more humble rate, around 20Mbps on the Cricket graphs. Switching to the long-term view recently, however, seems to tell the grim truth: bandwidth utilization is scaling proportionally with Moore's law...


Hi Mark and Jason, not too much trouble in getting WUs uploaded or downloaded; it only goes in big 'chunks', like 20 to 40 WUs at a time.
On all hosts, 'large' numbers of WUs are waiting to upload. But never mind that.
Eventually, they get uploaded.
{EDIT/ADD}We don't have to panic, though ;)

I know that, but speculating is still fun. :)
____________

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4135
Credit: 1,004,312
RAC: 247
United States
Message 809938 - Posted: 19 Sep 2008, 18:55:45 UTC - in response to Message 809797.

But is it all due to crunching, or server data transfer, or what??????
Well, surely some peak periods would represent recovery and/or bulk moving of server data about. However, the general trend, and I'm looking at the yearly Cricket graph here, looks somewhere between linear and exponential growth, and has gone from ~20Mbps to 60Mbps+ [in about a year?].

Assuming from the hardware donations threads that upgrading internal network speeds is going to happen gradually, that still leaves that 100Mbps link 'up the hill'. If that connection is fibre of some sort, replacing the transceivers at either end would, I reckon, cost big bucks :(. I hope we'll be alright for a while longer before that capacity is required, but then the general hardware performance increases of the last few years seem pretty staggering, and are set to continue.

One of Eric's posts estimated the upgrade cost at over 30K USD, definitely not pocket change.

As to the blip, I've noticed it at recurring periods on the Cricket graph. That is, the traffic in to SSL jumps up to about 12 MBits/sec at intervals of about 3 hours and remains high for about half an hour. If I try to upload or get new work during those high periods, there are a lot of HTTP 500 errors (Internal server error) and occasionally 503 or even 403. When the rate is below 10 MBits/sec those errors are fairly rare.

I think the 90+ MBits/sec output rates recently are mostly the project uploading raw data to the NERSC HPSS at LBNL. There were a couple of periods when the project was down but there was still a steady ~35 MBits/sec flowing out. That would be the max rate of that flow, though Matt said it was NICE'd, so it probably doesn't get that much when WUs are also being downloaded. Still, those blips at ~3 hour intervals are suspiciously close to how long it takes to upload 50 GB at 35 MBits/sec.
Joe
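
The "50 GB at 35 MBits/sec" estimate in the post above is easy to check. This is just a back-of-the-envelope sketch that assumes decimal units (1 GB = 10^9 bytes, 1 Mbit = 10^6 bits) and a flow that holds the full rate the whole time; the function name is only illustrative.

```python
def transfer_hours(size_gb, rate_mbit_per_s):
    """Hours needed to move size_gb gigabytes at a steady rate_mbit_per_s."""
    bits = size_gb * 1e9 * 8                    # decimal gigabytes -> bits
    seconds = bits / (rate_mbit_per_s * 1e6)    # decimal megabits per second
    return seconds / 3600

hours = transfer_hours(50, 35)
print(f"{hours:.2f} hours")  # about 3.17 hours
```

So a 50 GB file at a steady 35 MBits/sec does come out at a little over three hours, in line with the interval between the blips.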

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 810209 - Posted: 20 Sep 2008, 7:40:37 UTC

OK, I'm seeing trouble with a tracert I've run. Earlier, my WUs were getting connect failures and HTTP errors.

Microsoft Windows [Version 5.2.3790]
(C) Copyright 1985-2003 Microsoft Corp.

C:\Documents and Settings\Administrator.PC1>tracert setiathome.berkeley.com

Tracing route to setiathome.berkeley.com [208.254.26.139]
over a maximum of 30 hops:

1 25 ms 25 ms 25 ms L100.DSL-35.LSANCA.verizon-gni.net [71.105.32.1]
2 27 ms 27 ms 26 ms G9-0-2035.LCR-10.LSANCA.verizon-gni.net [130.81.136.16]
3 28 ms 28 ms 28 ms P15-3.LCR-01.TAMPFL.verizon-gni.net [130.81.28.74]
4 30 ms 29 ms 30 ms 0.so-6-0-0.XT2.LAX9.ALTER.NET [152.63.10.157]
5 106 ms 105 ms 105 ms 0.so-5-0-2.XT2.DCA6.ALTER.NET [152.63.0.197]
6 107 ms 108 ms 107 ms so-0-0-0.ur2.iad6.web.wcom.net [157.130.59.74]
7 * * * Request timed out.
8 * * * Request timed out.
9 * * * Request timed out.
10 * * * Request timed out.
11 * * * Request timed out.
12 * * * Request timed out.
13 * * * Request timed out.
14 * * * Request timed out.
15 * * * Request timed out.
16 * * * Request timed out.
17 * * * Request timed out.
18 * * * Request timed out.
19 * * * Request timed out.
20 * * * Request timed out.
21 * * * Request timed out.
22 * * * Request timed out.
23 * * * Request timed out.
24 * * * Request timed out.
25 * * * Request timed out.
26 * * * Request timed out.
27 * * * Request timed out.
28 * * * Request timed out.
29 * * * Request timed out.
30 * * * Request timed out.

Trace complete.

C:\Documents and Settings\Administrator.PC1>

I also tried to ping berkeley and the ping timed out 4 times.
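
With long traces like the one above, the interesting part is the last hop that still answered. Below is a small sketch that pulls that out of Windows tracert output; the sample lines are copied from the trace, while the function name and the "a hop answered if it reports at least one time in ms" rule are assumptions made for illustration. Note also that the command as posted targets setiathome.berkeley.com, while the project's site is setiathome.berkeley.edu, so the timeouts may say more about that other host than about the project servers.

```python
import re

# A few hop lines copied from the trace above.
TRACE = """\
  5 106 ms 105 ms 105 ms 0.so-5-0-2.XT2.DCA6.ALTER.NET [152.63.0.197]
  6 107 ms 108 ms 107 ms so-0-0-0.ur2.iad6.web.wcom.net [157.130.59.74]
  7 * * * Request timed out.
  8 * * * Request timed out.
"""

# A hop line starts with the hop number followed by the rest of the line.
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")

def last_responding_hop(trace_text):
    """Return (hop_number, address) of the last hop that reported a latency, or None."""
    last = None
    for line in trace_text.splitlines():
        match = HOP.match(line)
        # Assumption: a hop answered if at least one latency in "ms" is shown.
        if not match or " ms" not in line:
            continue
        # The address is the last token, with or without square brackets.
        address = match.group(2).split()[-1].strip("[]")
        last = (int(match.group(1)), address)
    return last

print(last_responding_hop(TRACE))
```

For the sample lines this reports hop 6 at 157.130.59.74 as the last router that replied.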
____________

Careface
Joined: 6 Jun 03
Posts: 115
Credit: 11,626,751
RAC: 0
New Zealand
Message 810211 - Posted: 20 Sep 2008, 8:02:02 UTC
Last modified: 20 Sep 2008, 8:03:59 UTC

I'm getting rather worried now. My "best" cruncher (laptop) is almost out of work and, for some reason, isn't requesting any more; whenever I try to upload the last 3 or so days of work, the uploads get to 100% and then fail. Is this a server problem, do you think?

19/09/2008 8:00:41 p.m.|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
19/09/2008 8:01:00 p.m.||Project communication failed: attempting access to reference site
19/09/2008 8:01:00 p.m.|SETI@home|Temporarily failed upload of 02ap08ad.19170.23385.14.8.119_2_0: http error
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 50 min 48 sec on upload of 02ap08ad.19170.23385.14.8.119_2_0
19/09/2008 8:01:00 p.m.|SETI@home|Temporarily failed upload of 18au08ab.20784.12752.9.8.246_1_0: http error
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 58 min 31 sec on upload of 18au08ab.20784.12752.9.8.246_1_0
19/09/2008 8:01:00 p.m.|SETI@home|Started upload of 04ap08ac.18298.2936.12.8.211_2_0
19/09/2008 8:01:00 p.m.|SETI@home|Started upload of 18au08ab.10585.16842.10.8.44_1_0
19/09/2008 8:01:02 p.m.||Access to reference site succeeded - project servers may be temporarily down.

EDIT: My other cruncher can upload/download WUs fine. I figured it could be the LAN connection (my laptop is on wireless and the desktop is wired), but I tried both wireless and wired, and still no dice on the laptop. *cries*
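
When the message log fills up with lines like these, it can help to list how long each transfer has been told to wait. The sketch below extracts the back-off times from BOINC messages worded like the ones in this thread; the regular expression is only matched against that wording (an assumption, since other client versions may phrase it differently), and the function name is made up for illustration.

```python
import re

# Two back-off lines copied from the log above.
LOG = """\
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 50 min 48 sec on upload of 02ap08ad.19170.23385.14.8.119_2_0
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 58 min 31 sec on upload of 18au08ab.20784.12752.9.8.246_1_0
"""

# Hours and minutes are optional; some clients also print "file" before the name.
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec "
    r"on (upload|download) of (?:file )?(\S+)"
)

def backoffs(log_text):
    """Map each file name to its most recent back-off time in seconds."""
    result = {}
    for match in BACKOFF.finditer(log_text):
        hours, minutes, seconds, _direction, name = match.groups()
        result[name] = int(hours or 0) * 3600 + int(minutes or 0) * 60 + int(seconds)
    return result

for name, seconds in backoffs(LOG).items():
    print(f"{name}: retry in {seconds} s")
```

A back-off of nearly four hours, as here, suggests the client has already failed on that file many times in a row, since the delay grows with repeated failures.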

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 810212 - Posted: 20 Sep 2008, 8:16:44 UTC - in response to Message 810211.

I'm getting rather worried now. My "best" cruncher (laptop) is almost out of work and, for some reason, isn't requesting any more; whenever I try to upload the last 3 or so days of work, the uploads get to 100% and then fail. Is this a server problem, do you think?

19/09/2008 8:00:41 p.m.|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
19/09/2008 8:01:00 p.m.||Project communication failed: attempting access to reference site
19/09/2008 8:01:00 p.m.|SETI@home|Temporarily failed upload of 02ap08ad.19170.23385.14.8.119_2_0: http error
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 50 min 48 sec on upload of 02ap08ad.19170.23385.14.8.119_2_0
19/09/2008 8:01:00 p.m.|SETI@home|Temporarily failed upload of 18au08ab.20784.12752.9.8.246_1_0: http error
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 58 min 31 sec on upload of 18au08ab.20784.12752.9.8.246_1_0
19/09/2008 8:01:00 p.m.|SETI@home|Started upload of 04ap08ac.18298.2936.12.8.211_2_0
19/09/2008 8:01:00 p.m.|SETI@home|Started upload of 18au08ab.10585.16842.10.8.44_1_0
19/09/2008 8:01:02 p.m.||Access to reference site succeeded - project servers may be temporarily down.

EDIT: My other cruncher can upload/download WUs fine. I figured it could be the LAN connection (my laptop is on wireless and the desktop is wired), but I tried both wireless and wired, and still no dice on the laptop. *cries*

Don't know, as mine are all wired. It looks like my farm's RAC is going up and down like the teeth on a (crosscut) saw blade.
____________

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3232
Credit: 31,585,541
RAC: 0
Netherlands
Message 810214 - Posted: 20 Sep 2008, 8:36:09 UTC - in response to Message 810212.
Last modified: 20 Sep 2008, 8:51:37 UTC

I'm getting rather worried now. My "best" cruncher (laptop) is almost out of work and, for some reason, isn't requesting any more; whenever I try to upload the last 3 or so days of work, the uploads get to 100% and then fail. Is this a server problem, do you think?

19/09/2008 8:00:41 p.m.|SETI@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 0 completed tasks
19/09/2008 8:01:00 p.m.||Project communication failed: attempting access to reference site
19/09/2008 8:01:00 p.m.|SETI@home|Temporarily failed upload of 02ap08ad.19170.23385.14.8.119_2_0: http error
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 50 min 48 sec on upload of 02ap08ad.19170.23385.14.8.119_2_0
19/09/2008 8:01:00 p.m.|SETI@home|Temporarily failed upload of 18au08ab.20784.12752.9.8.246_1_0: http error
19/09/2008 8:01:00 p.m.|SETI@home|Backing off 3 hr 58 min 31 sec on upload of 18au08ab.20784.12752.9.8.246_1_0
19/09/2008 8:01:00 p.m.|SETI@home|Started upload of 04ap08ac.18298.2936.12.8.211_2_0
19/09/2008 8:01:00 p.m.|SETI@home|Started upload of 18au08ab.10585.16842.10.8.44_1_0
19/09/2008 8:01:02 p.m.||Access to reference site succeeded - project servers may be temporarily down.

EDIT: My other cruncher can upload/download WUs fine. I figured it could be the LAN connection (my laptop is on wireless and the desktop is wired), but I tried both wireless and wired, and still no dice on the laptop. *cries*

Don't know, as mine are all wired. It looks like my farm's RAC is going up and down like the teeth on a (crosscut) saw blade.


As of 'now' (UTC 08:30), the ready-to-report WUs have been uploaded.

20-9-2008 10:07:23|SETI@home|Temporarily failed upload of 20au08ad.14308.11524.6.8.208_1_0: HTTP error
20-9-2008 10:07:23|SETI@home|Backing off 1 min 0 sec on upload of 20au08ad.14308.11524.6.8.208_1_0
20-9-2008 10:07:25||Internet access OK - project servers may be temporarily down.
20-9-2008 10:08:24|SETI@home|Started upload of 20au08ad.14308.11524.6.8.208_1_0
20-9-2008 10:08:59|SETI@home|Finished upload of 20au08ad.14308.11524.6.8.208_1_0
20-9-2008 10:28:27|SETI@home|Computation for task 20au08aa.13418.13978.6.8.147_1 finished
20-9-2008 10:28:27|Einstein@Home|Resuming task h1_0818.10_S5R4__542_S5R4a_0 using einstein_S5R4 version 604
20-9-2008 10:28:30|SETI@home|Started upload of 20au08aa.13418.13978.6.8.147_1_0

Then Internet access 'blocks' for a few hours before carrying on.
A 'tracert' looks like this now:



1 <1 ms <1 ms <1 ms SX551xxxxxx [192.168.2.1]
2 7 ms 7 ms 7 ms 195.190.249.32
3 10 ms 10 ms 9 ms iawxsrt-dc2-bb21-ge-5-0-0.328.wxs.nl [213.75.1.213]
4 25 ms 10 ms 10 ms 208.49.200.129
5 101 ms 94 ms 94 ms 0.ge-4-0-0.BR3.NYC4.ALTER.NET [204.255.169.125]
6 93 ms 93 ms 93 ms 0.ge-5-0-0.XL3.NYC4.ALTER.NET [152.63.3.109]
7 94 ms 93 ms 93 ms 0.so-4-0-3.XT1.DCA6.ALTER.NET [152.63.1.117]
8 96 ms 96 ms 96 ms so-0-0-0.ur1.iad6.web.wcom.net [157.130.59.70]
9 * * * Request timed out.
10 * * * Request timed out.
11 * * * Request timed out.
12 *

?
____________


Knight Who Says Ni N!, OUT numbered.................

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 810747 - Posted: 21 Sep 2008, 21:35:49 UTC

Looks like the server could use some brief attention:

9/21/2008 2:28:04 PM|SETI@home|[file_xfer] Temporarily failed download of 22mr08ac.7857.14387.16.8.31: HTTP error
9/21/2008 2:28:04 PM|SETI@home|Backing off 4 min 2 sec on download of file 22mr08ac.7857.14387.16.8.31
9/21/2008 2:28:43 PM|SETI@home|[file_xfer] Started download of file 21au08ab.1813.25021.15.8.98
9/21/2008 2:29:03 PM|SETI@home|[file_xfer] Finished download of file 21au08ab.1813.25021.15.8.98
9/21/2008 2:29:03 PM|SETI@home|[file_xfer] Throughput 17615 bytes/sec
9/21/2008 2:29:46 PM|SETI@home|Computation for task 20au08ac.7620.11524.8.8.27_0 finished
9/21/2008 2:29:46 PM|SETI@home|Starting 20au08ab.7342.15614.11.8.64_1
9/21/2008 2:29:46 PM|SETI@home|Starting task 20au08ab.7342.15614.11.8.64_1 using setiathome_enhanced version 528
9/21/2008 2:29:48 PM|SETI@home|[file_xfer] Started upload of file 20au08ac.7620.11524.8.8.27_0_0
9/21/2008 2:30:24 PM||Project communication failed: attempting access to reference site
9/21/2008 2:30:24 PM|SETI@home|[file_xfer] Temporarily failed upload of 20au08ac.7620.11524.8.8.27_0_0: connect() failed
9/21/2008 2:30:24 PM|SETI@home|Backing off 1 min 0 sec on upload of file 20au08ac.7620.11524.8.8.27_0_0
9/21/2008 2:30:25 PM||Access to reference site succeeded - project servers may be temporarily down.
9/21/2008 2:31:12 PM||Project communication failed: attempting access to reference site
9/21/2008 2:31:12 PM|SETI@home|[file_xfer] Temporarily failed upload of 20au08ac.7620.11115.8.8.223_0_0: HTTP error
____________

MAKS
Joined: 29 Sep 03
Posts: 2
Credit: 61,096
RAC: 0
Germany
Message 811014 - Posted: 22 Sep 2008, 18:29:09 UTC

Same problems here (near Dresden, Germany).
BOINC can't keep the connection up long enough to fully upload or download the packets.
Pinging the servers works fine one second and gets stuck the next.

How about a subproject SENSS (Search for Exterior Network SETI-Servers)?
____________

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 813262 - Posted: 29 Sep 2008, 23:15:42 UTC

OK, I'm getting HTTP errors, and some of my PCs can't get in touch with the server to upload, report, or probably download WUs.

Someone needs to nudge the server awake, I think.
____________

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3544
Credit: 46,191,847
RAC: 31,008
United States
Message 813271 - Posted: 29 Sep 2008, 23:29:59 UTC

Looks like everything is back to normal as Matt killed that rogue process.
____________

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 813274 - Posted: 29 Sep 2008, 23:46:46 UTC - in response to Message 813271.

Looks like everything is back to normal as Matt killed that rogue process.

Just what we needed: a rogue process on the loose. It deserved to die. ;)
____________

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 813303 - Posted: 30 Sep 2008, 1:15:09 UTC
Last modified: 30 Sep 2008, 1:16:33 UTC

I don't know what's going on, but uploads don't seem to be working here.

And now PC3 can't upload, not just PC4. Something's borked.
____________

arkayn
Volunteer tester
Joined: 14 May 99
Posts: 3544
Credit: 46,191,847
RAC: 31,008
United States
Message 813336 - Posted: 30 Sep 2008, 2:08:52 UTC - in response to Message 813303.

I don't know what's going on, but uploads don't seem to be working here.

And now PC3 can't upload, not just PC4. Something's borked.



All 3 of mine are uploading just fine.
____________

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 814073 - Posted: 2 Oct 2008, 18:32:36 UTC

I think the scheduler needs some coffee, as it's not responding right now.

10/2/2008 11:28:09 AM|SETI@home|Reason: scheduler request failed
10/2/2008 11:28:22 AM|SETI@home|[file_xfer] Started upload of file 16au08aa.23831.9070.4.8.14_1_0
10/2/2008 11:28:38 AM|SETI@home|[file_xfer] Finished upload of file 16au08aa.23831.9070.4.8.14_1_0
10/2/2008 11:28:38 AM|SETI@home|[file_xfer] Throughput 1885 bytes/sec
10/2/2008 11:29:11 AM||Time passed...reporting result(s) now.
10/2/2008 11:29:11 AM|SETI@home|Sending scheduler request: To report completed tasks
10/2/2008 11:29:11 AM|SETI@home|Reporting 4 tasks
10/2/2008 11:29:33 AM||Project communication failed: attempting access to reference site
10/2/2008 11:29:35 AM||Access to reference site succeeded - project servers may be temporarily down.
10/2/2008 11:29:37 AM|SETI@home|Scheduler request failed: couldn't connect to server
10/2/2008 11:29:37 AM|SETI@home|Deferring communication for 1 min 42 sec
10/2/2008 11:29:37 AM|SETI@home|Reason: scheduler request failed
10/2/2008 11:30:28 AM|SETI@home|Sending scheduler request: Requested by user
10/2/2008 11:30:28 AM|SETI@home|Reporting 4 tasks
10/2/2008 11:30:49 AM||Project communication failed: attempting access to reference site
10/2/2008 11:30:50 AM||Access to reference site succeeded - project servers may be temporarily down.
10/2/2008 11:30:53 AM|SETI@home|Scheduler request failed: couldn't connect to server
10/2/2008 11:30:53 AM|SETI@home|Deferring communication for 5 min 19 sec
10/2/2008 11:30:53 AM|SETI@home|Reason: scheduler request failed
____________

Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1384
Credit: 74,079
RAC: 0
United States
Message 814080 - Posted: 2 Oct 2008, 19:02:18 UTC

Coffee applied. It was actually the file upload handlers whining, I think. I moved them all back to bruno (where they are working fine) - up until just now half of them were going to anakin (for redundancy purposes), but for some reason anakin started barfing on them.

- Matt
____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

zoom314
Joined: 30 Nov 03
Posts: 44561
Credit: 35,424,374
RAC: 9,060
Message 814081 - Posted: 2 Oct 2008, 19:15:11 UTC - in response to Message 814080.

Coffee applied. It was actually the file upload handlers whining, I think. I moved them all back to bruno (where they are working fine) - up until just now half of them were going to anakin (for redundancy purposes), but for some reason anakin started barfing on them.

- Matt

Maybe anakin was having an allergy attack? ;) Thanks, Matt.
____________

Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4135
Credit: 1,004,312
RAC: 247
United States
Message 814085 - Posted: 2 Oct 2008, 19:30:18 UTC
Last modified: 2 Oct 2008, 19:31:08 UTC

I think it's fascinating that 22mr08aa still has the same two active channels as it had three weeks ago on 11 September, and 23mr08aa one. We've been remarkably lucky that the other splitters have had enough mid-range work to meet demand; thank the ALFALFA project for that.

Joe

ignorance is no excuse
Joined: 4 Oct 00
Posts: 9529
Credit: 44,432,240
RAC: 132
Korea, North
Message 814091 - Posted: 2 Oct 2008, 20:13:21 UTC

I've had trouble sending results back for the last few days.
____________
In a rich man's house there is no place to spit but his face.
Diogenes Of Sinope

End terrorism by building a school


Copyright © 2014 University of California