Panic Mode On (24) Server problems

rob smith
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22186
Credit: 416,307,556
RAC: 380
United Kingdom
Message 932872 - Posted: 12 Sep 2009, 20:51:26 UTC

Looking at the server status, there's something amiss with the AP splitters: none are running and jobs are stalled in progress. The MB splitters are working, but it would be nice to get a few APs coming out...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 932872 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 932877 - Posted: 12 Sep 2009, 21:39:35 UTC - in response to Message 932844.  

Are there still problems with the uploads? Some go through, but my latest one has not gone through since 19:08 UTC. I will wait until it does; when that will be is the question.

Depends a lot on what you mean by "problem."

In my opinion, it is only a problem when work misses the deadline because it couldn't be uploaded in time.
ID: 932877 · Report as offensive
Profile twister@austria-national-team.at
Volunteer tester

Send message
Joined: 26 Jan 00
Posts: 30
Credit: 60,419,551
RAC: 0
Austria
Message 933049 - Posted: 13 Sep 2009, 14:41:35 UTC

I still cannot see my pending credit, which is on its way to 600,000.

And when I look at the "tasks" for my computer, it says I last received WUs yesterday, 12 Sept., which is not true. The PC has been requesting and receiving work all day. The result shown on the PHP page is also incorrect.


ID: 933049 · Report as offensive
Profile Gundolf Jahn

Send message
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 933055 - Posted: 13 Sep 2009, 15:12:39 UTC - in response to Message 933049.  
Last modified: 13 Sep 2009, 15:13:27 UTC

I still cannot see my pending credit, which is on its way to 600,000.

And when I look at the "tasks" for my computer, it says I last received WUs yesterday, 12 Sept., which is not true. The PC has been requesting and receiving work all day. The result shown on the PHP page is also incorrect.


The replica database is offline.

Gruß,
Gundolf
ID: 933055 · Report as offensive
FiveHamlet
Avatar

Send message
Joined: 5 Oct 99
Posts: 783
Credit: 32,638,578
RAC: 0
United Kingdom
Message 933069 - Posted: 13 Sep 2009, 15:56:00 UTC
Last modified: 13 Sep 2009, 15:56:25 UTC

Replica database now back online.

From us all:
Thanks to whoever gave it a kick.
ID: 933069 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 933134 - Posted: 13 Sep 2009, 20:26:48 UTC

Astropulse splitters producing, download bandwidth heading for saturation....
Joe
ID: 933134 · Report as offensive
HAL

Send message
Joined: 28 Mar 03
Posts: 704
Credit: 870,617
RAC: 0
United States
Message 933162 - Posted: 13 Sep 2009, 23:10:27 UTC

Replica database retired AGAIN: a very short life span. New game of guess your score.
ID: 933162 · Report as offensive
Profile Vistro
Avatar

Send message
Joined: 6 Aug 08
Posts: 233
Credit: 316,549
RAC: 0
United States
Message 933166 - Posted: 13 Sep 2009, 23:19:35 UTC

BOINC replica database jocelyn Running

Replica seconds behind master Offline 0m

Which one is lying?
30+ computers heading our way! Currently at the "Zomg we need to talk to our tech expert at the co-op about this first!!!" stage. 16 lab machines and 14+ staff machines, each with 2.2 GHz CPUs and 256 MB RAM. Think they balance? The RAM certainly is bad
ID: 933166 · Report as offensive
Fred W
Volunteer tester

Send message
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 933168 - Posted: 13 Sep 2009, 23:28:09 UTC - in response to Message 933166.  
Last modified: 13 Sep 2009, 23:33:05 UTC

Which one is lying?

Probably neither. The replica database is probably running as a database but not as a "replica" according to my Tasks page (i.e. it is not being kept up to date).

F.

[Edited]
ID: 933168 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 933238 - Posted: 14 Sep 2009, 6:34:27 UTC


Someone appears to have done something during the day- uploads are going through without any problems now.
Grant
Darwin NT
ID: 933238 · Report as offensive
Fred W
Volunteer tester

Send message
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 933242 - Posted: 14 Sep 2009, 6:59:24 UTC - in response to Message 933238.  


Someone appears to have done something during the day- uploads are going through without any problems now.

Not from this side of the globe:

14/09/2009 07:56:20	SETI@home	Started upload of 25au09ab.19058.25016.7.10.249_1_0
14/09/2009 07:56:22		Project communication failed: attempting access to reference site
14/09/2009 07:56:22	SETI@home	Temporarily failed upload of 25au09ab.19058.25016.7.10.249_1_0: HTTP error
14/09/2009 07:56:22	SETI@home	Backing off 1 min 30 sec on upload of 25au09ab.19058.25016.7.10.249_1_0
14/09/2009 07:56:24		Internet access OK - project servers may be temporarily down.
14/09/2009 07:57:01	SETI@home	Started upload of 25au09ab.19058.25016.7.10.249_1_0
14/09/2009 07:57:03		Project communication failed: attempting access to reference site
14/09/2009 07:57:03	SETI@home	Temporarily failed upload of 25au09ab.19058.25016.7.10.249_1_0: HTTP error
14/09/2009 07:57:03	SETI@home	Backing off 5 min 9 sec on upload of 25au09ab.19058.25016.7.10.249_1_0
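The growing retry delays in a log like the one above can be pulled out programmatically. What follows is only a minimal Python sketch, assuming the client messages keep the "Backing off N min M sec on upload of <file>" wording seen in this thread; the function name `backoff_seconds` is hypothetical and not part of BOINC.

```python
import re

# Two "Backing off" lines copied from the log above.
LOG = """\
14/09/2009 07:56:22 SETI@home Backing off 1 min 30 sec on upload of 25au09ab.19058.25016.7.10.249_1_0
14/09/2009 07:57:03 SETI@home Backing off 5 min 9 sec on upload of 25au09ab.19058.25016.7.10.249_1_0
"""

# Assumption: the delay is always written as "<N> min <M> sec", as in this thread.
BACKOFF_RE = re.compile(r"Backing off (\d+) min (\d+) sec on upload of (\S+)")


def backoff_seconds(log_text):
    """Map each upload file name to its list of backoff delays in seconds."""
    delays = {}
    for minutes, seconds, filename in BACKOFF_RE.findall(log_text):
        delays.setdefault(filename, []).append(int(minutes) * 60 + int(seconds))
    return delays


if __name__ == "__main__":
    for name, values in backoff_seconds(LOG).items():
        print(name, values)
```

A list of delays that keeps growing for the same file suggests that repeated attempts are failing, rather than a single unlucky try.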


F.

ID: 933242 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 933252 - Posted: 14 Sep 2009, 8:30:15 UTC - in response to Message 933242.  


Someone appears to have done something during the day- uploads are going through without any problems now.

Not from this side of the globe:

My fault.
It was working well for a few hours there, until not long after I commented on how well it had been working.
Multiple attempts at uploading before success have resumed here as well.
Grant
Darwin NT
ID: 933252 · Report as offensive
W5DMG - Dave

Send message
Joined: 19 May 99
Posts: 155
Credit: 33,162,251
RAC: 0
United States
Message 933284 - Posted: 14 Sep 2009, 13:42:29 UTC - in response to Message 933252.  
Last modified: 14 Sep 2009, 13:49:25 UTC

9/14/2009 8:19:43 AM SETI@home Started upload of 29au09aa.20757.19699.10.10.252_0_0
9/14/2009 8:19:44 AM Project communication failed: attempting access to reference site
9/14/2009 8:19:44 AM SETI@home Temporarily failed upload of 29au09aa.20757.19699.10.10.252_0_0: HTTP error
9/14/2009 8:19:44 AM SETI@home Backing off 1 min 0 sec on upload of 29au09aa.20757.19699.10.10.252_0_0
9/14/2009 8:19:46 AM Internet access OK - project servers may be temporarily down.
9/14/2009 8:19:46 AM SETI@home Started upload of 28au09ad.15131.19699.3.10.73_0_0
9/14/2009 8:19:47 AM Project communication failed: attempting access to reference site
9/14/2009 8:19:47 AM SETI@home Temporarily failed upload of 28au09ad.15131.19699.3.10.73_0_0: HTTP error
9/14/2009 8:19:47 AM SETI@home Backing off 18 min 8 sec on upload of 28au09ad.15131.19699.3.10.73_0_0
9/14/2009 8:19:49 AM Internet access OK - project servers may be temporarily down.
9/14/2009 8:19:53 AM SETI@home Started upload of 28au09ad.15131.19699.3.10.169_0_0
9/14/2009 8:19:54 AM Project communication failed: attempting access to reference site


Yes, I have been having this same problem for the past 24 hours.
ID: 933284 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Send message
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 933287 - Posted: 14 Sep 2009, 13:49:12 UTC
Last modified: 14 Sep 2009, 13:49:59 UTC


Hey, thanks for enabling AstroPulse again.


Uploads don't go through, though.

ID: 933287 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 933450 - Posted: 15 Sep 2009, 4:28:03 UTC - in response to Message 933287.  
Last modified: 15 Sep 2009, 4:29:02 UTC

The upload problem is getting more severe.
It used to only take a few attempts for an upload to go through, now it's taking dozens (at a minimum).
Grant
Darwin NT
ID: 933450 · Report as offensive
PP

Send message
Joined: 3 Jul 99
Posts: 42
Credit: 10,012,664
RAC: 0
Sweden
Message 933475 - Posted: 15 Sep 2009, 8:48:40 UTC - in response to Message 933450.  
Last modified: 15 Sep 2009, 8:49:27 UTC

Yes, something is really wrong. I have hundreds of finished jobs pending upload. Any connection attempt looks like this:

10:42:44.802571 IP my.ip.42902 > setiboincdata.ssl.berkeley.edu.http: S 2647950799:2647950799(0) win 5840 <mss 1460,sackOK,timestamp 47594046 0,nop,wscale 7>
10:42:44.802727 IP my.ip.42903 > setiboincdata.ssl.berkeley.edu.http: S 1789212786:1789212786(0) win 5840 <mss 1460,sackOK,timestamp 47594046 0,nop,wscale 7>
10:42:45.008852 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: S 3884893807:3884893807(0) ack 2647950800 win 5696 <mss 1436,sackOK,timestamp 1175400497 47594046,nop,wscale 10>
10:42:45.009103 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42903: S 3888998256:3888998256(0) ack 1789212787 win 5696 <mss 1436,sackOK,timestamp 1175400498 47594046,nop,wscale 10>
10:42:45.011758 IP my.ip.42902 > setiboincdata.ssl.berkeley.edu.http: P 1:255(254) ack 1 win 46 <nop,nop,timestamp 47594109 1175400497>
10:42:45.012778 IP my.ip.42903 > setiboincdata.ssl.berkeley.edu.http: P 1:255(254) ack 1 win 46 <nop,nop,timestamp 47594109 1175400498>
10:42:45.014820 IP my.ip.42902 > setiboincdata.ssl.berkeley.edu.http: . ack 1 win 46 <nop,nop,timestamp 47594109 1175400497>
10:42:45.014823 IP my.ip.42903 > setiboincdata.ssl.berkeley.edu.http: . ack 1 win 46 <nop,nop,timestamp 47594109 1175400498>
10:42:45.218107 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: . ack 255 win 7 <nop,nop,timestamp 1175400707 47594109>
10:42:45.218233 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: R 1:1(0) ack 255 win 7 <nop,nop,timestamp 1175400707 47594109>
10:42:45.219858 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42903: . ack 255 win 7 <nop,nop,timestamp 1175400708 47594109>
10:42:45.220233 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42903: R 1:1(0) ack 255 win 7 <nop,nop,timestamp 1175400708 47594109>
10:42:45.220731 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42903: R 3888998257:3888998257(0) win 0
10:42:45.220981 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: R 3884893808:3884893808(0) win 0
10:42:45.224115 IP my.ip.42902 > setiboincdata.ssl.berkeley.edu.http: P 255:542(287) ack 1 win 46 <nop,nop,timestamp 47594172 1175400707>
10:42:45.226156 IP my.ip.42903 > setiboincdata.ssl.berkeley.edu.http: P 255:542(287) ack 1 win 46 <nop,nop,timestamp 47594172 1175400708>
10:42:45.430738 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: R 3884893808:3884893808(0) win 0
10:42:45.432610 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42903: R 3888998257:3888998257(0) win 0
10:42:46.400563 IP my.ip.42904 > setiboincdata.ssl.berkeley.edu.http: S 2087092463:2087092463(0) win 5840 <mss 1460,sackOK,timestamp 47594525 0,nop,wscale 7>
10:42:46.400567 IP my.ip.42905 > setiboincdata.ssl.berkeley.edu.http: S 807197585:807197585(0) win 5840 <mss 1460,sackOK,timestamp 47594525 0,nop,wscale 7>
10:42:46.605942 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42904: S 3925343597:3925343597(0) ack 2087092464 win 5696 <mss 1436,sackOK,timestamp 1175402095 47594525,nop,wscale 10>
10:42:46.606941 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42905: S 3929144410:3929144410(0) ack 807197586 win 5696 <mss 1436,sackOK,timestamp 1175402095 47594525,nop,wscale 10>
10:42:46.610895 IP my.ip.42904 > setiboincdata.ssl.berkeley.edu.http: P 1:255(254) ack 1 win 46 <nop,nop,timestamp 47594588 1175402095>
10:42:46.612937 IP my.ip.42905 > setiboincdata.ssl.berkeley.edu.http: P 1:255(254) ack 1 win 46 <nop,nop,timestamp 47594588 1175402095>
10:42:46.613956 IP my.ip.42904 > setiboincdata.ssl.berkeley.edu.http: . ack 1 win 46 <nop,nop,timestamp 47594588 1175402095>
10:42:46.613960 IP my.ip.42905 > setiboincdata.ssl.berkeley.edu.http: . ack 1 win 46 <nop,nop,timestamp 47594588 1175402095>
10:42:46.817697 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42904: . ack 255 win 7 <nop,nop,timestamp 1175402306 47594588>
10:42:46.817713 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42904: R 1:1(0) ack 255 win 7 <nop,nop,timestamp 1175402306 47594588>
10:42:46.819167 IP my.ip.42904 > setiboincdata.ssl.berkeley.edu.http: P 255:542(287) ack 1 win 46 <nop,nop,timestamp 47594652 1175402306>
10:42:46.820824 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42904: R 3925343598:3925343598(0) win 0
10:42:46.820837 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42905: . ack 255 win 7 <nop,nop,timestamp 1175402308 47594588>
10:42:46.820944 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42905: R 1:1(0) ack 255 win 7 <nop,nop,timestamp 1175402308 47594588>
10:42:46.822070 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42905: R 3929144411:3929144411(0) win 0
10:42:46.834479 IP my.ip.42905 > setiboincdata.ssl.berkeley.edu.http: P 255:538(283) ack 1 win 46 <nop,nop,timestamp 47594652 1175402308>
10:42:47.026203 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42904: R 3925343598:3925343598(0) win 0
10:42:47.041319 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42905: R 3929144411:3929144411(0) win 0


The TCP handshake appears to go well, but then the server simply resets the connection. No FIN, no graceful disconnect. I think those Berkeley network guys need the motivation of a well-placed blow torch. ;-)
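The same reading of the capture (SYN and SYN-ACK exchanged, then RST from the server and no FIN) can be checked by counting the TCP flags per sender. This is a rough Python sketch, assuming the older tcpdump text format shown above, where the flags field follows the destination address; the sample is abridged to one connection with the TCP options removed, and `flag_counts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Abridged lines for client port 42902 from the capture above (options trimmed).
CAPTURE = """\
10:42:44.802571 IP my.ip.42902 > setiboincdata.ssl.berkeley.edu.http: S 2647950799:2647950799(0) win 5840
10:42:45.008852 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: S 3884893807:3884893807(0) ack 2647950800 win 5696
10:42:45.011758 IP my.ip.42902 > setiboincdata.ssl.berkeley.edu.http: P 1:255(254) ack 1 win 46
10:42:45.218107 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: . ack 255 win 7
10:42:45.218233 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: R 1:1(0) ack 255 win 7
10:42:45.220981 IP setiboincdata.ssl.berkeley.edu.http > my.ip.42902: R 3884893808:3884893808(0) win 0
"""

SERVER = "setiboincdata.ssl.berkeley.edu.http"

# timestamp, "IP", source, ">", destination followed by ":", then the flags field.
LINE_RE = re.compile(r"^\S+ IP (\S+) > (\S+): (\S+)")


def flag_counts(capture, server=SERVER):
    """Count (sender, flags) pairs, where sender is 'server' or 'client'."""
    counts = Counter()
    for line in capture.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that is not a packet line
        src, _dst, flags = match.groups()
        sender = "server" if src == server else "client"
        counts[(sender, flags)] += 1
    return counts


if __name__ == "__main__":
    counts = flag_counts(CAPTURE)
    for key in sorted(counts):
        print(key, counts[key])
    saw_fin = any("F" in flags for _sender, flags in counts)
    print("FIN seen:", saw_fin)
```

If the server's RST count is non-zero and no flags field contains F, the connection was reset rather than closed gracefully, which matches the description above.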
ID: 933475 · Report as offensive
Terror Australis
Volunteer tester

Send message
Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 933476 - Posted: 15 Sep 2009, 8:54:25 UTC
Last modified: 15 Sep 2009, 8:58:15 UTC

Is Astropulse really to blame?
The Cricket graphs show "bits in" (the output to us) averaging 64 Mbit/s and currently at 50 Mbit/s. This should leave plenty of headroom for our uploads, but "bits out" (our uploads) has been falling steadily since midnight Sunday, Berkeley time, and is currently only 4.29 Mbit/s.

This means that there is a lot of unused capacity on the system at the moment. So what is happening?
ID: 933476 · Report as offensive
PP

Send message
Joined: 3 Jul 99
Posts: 42
Credit: 10,012,664
RAC: 0
Sweden
Message 933477 - Posted: 15 Sep 2009, 9:05:23 UTC - in response to Message 933476.  

Maybe they forgot to upgrade all of their Cisco routers... :-)
http://www.cisco.com/warp/public/707/cisco-sa-20090908-tcp24.shtml
ID: 933477 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 933478 - Posted: 15 Sep 2009, 9:23:02 UTC - in response to Message 933476.  

Is Astropulse really to blame?

Nope.
It's some sort of issue with the upload server; hopefully a configuration tweak is all that'll be required.
Grant
Darwin NT
ID: 933478 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 933479 - Posted: 15 Sep 2009, 10:12:36 UTC - in response to Message 933478.  

Is Astropulse really to blame?

Nope.
It's some sort of issue with the upload server; hopefully a configuration tweak is all that'll be required.

Or possibly a router problem: PP's information is interesting.

Cisco's advisory suggests that a router reboot may be helpful: that should be a relatively easy thing to try during the maintenance outage today.
ID: 933479 · Report as offensive
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.