Panic Mode On (100) Server Problems?

Message boards : Number crunching : Panic Mode On (100) Server Problems?
Cactus Bob
Joined: 19 May 99
Posts: 209
Credit: 10,924,287
RAC: 29
Canada
Message 1730999 - Posted: 2 Oct 2015, 4:32:09 UTC

Looking through my logs, I have some connection issues, but overall not many. I am up to date on reporting and receiving WUs. This is from Arizona, USA. I suspect TRs won't help much, but if anyone thinks they will, I will do a few. Running at 80% and that works for me until the problem gets fixed.

Bob
Sometimes I wonder, what happened to all the people I gave directions to?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SETI@home classic workunits 4,321
SETI@home classic CPU time 22,169 hours
ID: 1730999 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1731007 - Posted: 2 Oct 2015, 5:57:06 UTC - in response to Message 1730970.  

Still having issues contacting the Scheduler.

Came home to find a tonne of WUs waiting to be reported, repeated abuse of the Retry button eventually got that sorted.
But requests since then are still resulting in random "Scheduler request failed: Couldn't connect to server" messages, interspersed with the odd success.
Grant
Darwin NT
ID: 1731007 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1731008 - Posted: 2 Oct 2015, 6:01:19 UTC - in response to Message 1731007.  
Last modified: 2 Oct 2015, 6:07:14 UTC

Tracert on setiboinc.ssl.berkeley.edu

9 241 ms 241 ms 240 ms dc-oak-agg4--svl-agg4-100ge.cenic.net [137.164.46.144]
10 253 ms 255 ms 257 ms ucb--oak-agg4-10g.cenic.net [137.164.50.31]
11 274 ms 274 ms 275 ms t2-3.inr-201-sut.Berkeley.EDU [128.32.0.37]
12 240 ms 241 ms 241 ms et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101]

13 * * et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101] reports: Destination host unreachable.

Trace complete.
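
For anyone comparing traces from different ISPs, here is a minimal sketch of pulling the last hop that answered out of Windows-style tracert output. The function names and the regular expression are illustrative assumptions, not part of any tool mentioned in this thread, and the sample is simply the trace pasted above. Lines containing `*` or `<1 ms` are deliberately not matched.

```python
import re

# Matches a Windows tracert hop line with three numeric round-trip times, e.g.
# "9 241 ms 241 ms 240 ms host.example [192.0.2.1]"
HOP_RE = re.compile(
    r"^\s*(\d+)\s+(\d+) ms\s+(\d+) ms\s+(\d+) ms\s+(\S+)\s+\[([\d.]+)\]\s*$"
)


def parse_hops(text):
    """Return (hop, avg_ms, host, ip) for each hop that answered all three probes."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            times = [int(m.group(i)) for i in (2, 3, 4)]
            hops.append((int(m.group(1)), sum(times) / 3, m.group(5), m.group(6)))
    return hops


def last_responding_hop(text):
    """Return the final hop that answered, or None if no hop matched."""
    hops = parse_hops(text)
    return hops[-1] if hops else None


# The trace from the post above, copied verbatim.
SAMPLE = """\
 9 241 ms 241 ms 240 ms dc-oak-agg4--svl-agg4-100ge.cenic.net [137.164.46.144]
10 253 ms 255 ms 257 ms ucb--oak-agg4-10g.cenic.net [137.164.50.31]
11 274 ms 274 ms 275 ms t2-3.inr-201-sut.Berkeley.EDU [128.32.0.37]
12 240 ms 241 ms 241 ms et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101]
13 * * et3-48.inr-311-ewdc.Berkeley.EDU [128.32.0.101] reports: Destination host unreachable.
"""

if __name__ == "__main__":
    hop, avg, host, ip = last_responding_hop(SAMPLE)
    print(f"last hop that answered: {hop} {host} [{ip}], avg {avg:.0f} ms")
```

Run against the trace above, this points at hop 12 inside the Berkeley network as the last router that replied, which is the same router that then reports the destination as unreachable.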
Grant
Darwin NT
ID: 1731008 · Report as offensive
Iztok s52d (and friends)

Joined: 12 Jan 01
Posts: 136
Credit: 393,469,375
RAC: 116
Slovenia
Message 1731009 - Posted: 2 Oct 2015, 6:07:02 UTC - in response to Message 1730999.  

hi!
not DNS: both working and nonworking ISPs resolve to 208.68.240.126.

but for most of my machines, 208.68.240.126 port 80 is unreachable.

73
s52d

one ISP, not working:

wget http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
--2015-10-02 07:59:16-- http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
Resolving setiboinc.ssl.berkeley.edu... 208.68.240.126
Connecting to setiboinc.ssl.berkeley.edu|208.68.240.126|:80... failed: Connection timed out.
Retrying.



working machine, another ISP:

wget http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
--2015-10-02 08:00:39-- http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
Resolving setiboinc.ssl.berkeley.edu (setiboinc.ssl.berkeley.edu)... 208.68.240.126
Connecting to setiboinc.ssl.berkeley.edu (setiboinc.ssl.berkeley.edu)|208.68.240.126|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/xml]
Saving to: ‘cgi.1’

[ <=> ] 299 --.-K/s in 0s

2015-10-02 08:00:39 (26.4 MB/s) - ‘cgi.1’ saved [299]
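
The same "is it DNS or is it the connection?" check can be sketched in Python, assuming only the standard library. The helper names (`resolve`, `can_connect`, `diagnose`) are made up for this illustration; the host name and port 80 are the ones from the wget runs above. It keeps name resolution and the TCP connect as separate steps so a failure can be pinned on one or the other.

```python
import socket


def resolve(host):
    """Return the sorted IPv4 addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(host, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def can_connect(ip, port, timeout=5.0):
    """Try a plain TCP connect; True on success, False on timeout or refusal."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(host, port=80, timeout=5.0):
    """Report DNS and TCP results separately for one host and port."""
    try:
        addresses = resolve(host)
    except socket.gaierror as exc:
        return f"DNS failure for {host}: {exc}"
    lines = []
    for ip in addresses:
        state = "reachable" if can_connect(ip, port, timeout) else "unreachable"
        lines.append(f"{host} -> {ip}:{port} {state}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(diagnose("setiboinc.ssl.berkeley.edu"))
```

If two machines print the same address but only one says "reachable", the difference is on the network path, not in DNS, which matches what the two wget runs show.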
ID: 1731009 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1731011 - Posted: 2 Oct 2015, 6:19:19 UTC - in response to Message 1730982.  
Last modified: 2 Oct 2015, 6:35:35 UTC

yep i got about 600 wu's waiting to upload from all the machines

funny enough one of my machines has no issue at all, it's happily crunching away and reporting ... same WAN IP

For the one that's successfully reporting, about how many tasks is it reporting in each scheduler request? Also, do you happen to know if the machines have different MTU values?

This is getting maddening.
My MTU is 1492, set on my router through which all rigs communicate.

There was a time long ago when very large numbers of WUs being reported in a work request caused trouble. I now have all rigs set to report a maximum of 100 at a time, regardless of how many are queued to return.
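
As a rough illustration of the idea only (how the BOINC client enforces such a limit is not described here, so none of this is its actual code), capping each report at 100 means a large backlog goes out as several smaller requests. The `batches` helper and the task names are hypothetical.

```python
def batches(items, limit=100):
    """Yield consecutive slices of items, each at most `limit` long."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(items), limit):
        yield items[start:start + limit]


if __name__ == "__main__":
    queued = [f"task_{n}" for n in range(600)]  # hypothetical task names
    for number, batch in enumerate(batches(queued), start=1):
        print(f"request {number}: reporting {len(batch)} tasks")
```

With a backlog of 600 results, that works out to six scheduler requests instead of one very large one.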

I find it hard to believe that an IT expert, whom the Berkeley campus IT department SHOULD have on staff, cannot employ a network analyzer to find the rogue switch or router and get this problem gone.

Perhaps a piece of kit that got marginalized by the heat of the colo fire?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1731011 · Report as offensive
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1731018 - Posted: 2 Oct 2015, 6:35:29 UTC - in response to Message 1731011.  

I find it hard to believe that an IT expert, whom the Berkeley campus IT department SHOULD have on staff, cannot employ a network analyzer to find the rogue switch or router and get this problem gone.

Well, what I saw in the "real world" was that if a production server was unavailable due to something like this, for more than 36 hours, the IT manager wasn't too worried ... because he was already looking for a new job ;)
ID: 1731018 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1731024 - Posted: 2 Oct 2015, 7:04:56 UTC

Looks like my message may have gotten thru as regards the replica anyway........
Sent it to Eric, Matt, and Jeff about an hour ago, and it's just come back up, about 14k seconds behind.

Now if only we could get such luck with the comms situation.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1731024 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1731025 - Posted: 2 Oct 2015, 7:37:40 UTC

Can someone please enlighten me as to what is going on ... or, rather, is NOT going on? Since about 10am EDT on 9/30 I have not been able to communicate ONCE with Berkeley on either of my crunchers, but the Status page seems to be OK. I haven't really followed the discussion here about a possible "bad" router; what does it all mean?

Is anybody really getting work?

Is SETI dead, or what????
ID: 1731025 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1731027 - Posted: 2 Oct 2015, 7:41:16 UTC - in response to Message 1731025.  

Can someone please enlighten me as to what is going on ... or, rather, is NOT going on? Since about 10am EDT on 9/30 I have not been able to communicate ONCE with Berkeley on either of my crunchers, but the Status page seems to be OK. I haven't really followed the discussion here about a possible "bad" router; what does it all mean?

Is anybody really getting work?

Is SETI dead, or what????

Not dead, the kitties have the crunchers up to full cache here.
However, I had to hit the update button on a couple of rigs to entice them to try again because Boinc backoff times had convinced them to give up trying.
Comms are hit and miss at present.
Eventually, most rigs will make contact.
May take some very long timeouts before they try again if they miss too many times in a row, however.
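
To show why a string of misses turns into very long waits, here is a generic exponential backoff sketch. The base delay, the cap, and the jitter parameter are assumed values chosen for illustration; this is not BOINC's actual backoff schedule.

```python
import random


def backoff_delays(failures, base=60.0, cap=4 * 3600.0, jitter=0.0, rng=None):
    """Return the wait in seconds after each of `failures` consecutive failures."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(failures):
        # Double the wait after every failure, but never exceed the cap.
        delay = min(cap, base * 2 ** attempt)
        # Optional jitter spreads clients out so they do not all retry at once.
        delay *= 1.0 + jitter * rng.uniform(-1.0, 1.0)
        delays.append(delay)
    return delays


if __name__ == "__main__":
    for n, d in enumerate(backoff_delays(10), start=1):
        print(f"after failure {n}: wait {d / 60:.0f} min")
```

Under these assumed numbers the wait goes from one minute to the four-hour cap after nine failures in a row, which is why a manual update can be needed to get a rig to try again sooner.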
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1731027 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1731031 - Posted: 2 Oct 2015, 7:52:09 UTC

And carolyn (the replica server) is back up to speed again....

Meow!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1731031 · Report as offensive
Cruncher-American Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor

Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1731035 - Posted: 2 Oct 2015, 8:07:35 UTC

Sorry, Mark, more like Barf!

This really sucks. Every attempted access (Including all via Update button) gets "Scheduler request failed: couldn't connect to Server".

This is really frustrating. Now that it is cooling off here in New England, I can use the heat from my crunchers running...

Is there anything I can do?
ID: 1731035 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1731036 - Posted: 2 Oct 2015, 8:08:32 UTC - in response to Message 1731035.  

Is there anything I can do?

Back up projects. Else, nothing.
ID: 1731036 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1731037 - Posted: 2 Oct 2015, 8:13:59 UTC - in response to Message 1731035.  

Sorry, Mark, more like Barf!

This really sucks. Every attempted access (Including all via Update button) gets "Scheduler request failed: couldn't connect to Server".

This is really frustrating. Now that it is cooling off here in New England, I can use the heat from my crunchers running...

Is there anything I can do?

Other than wearing out the retry button, not much at the moment.
It is unclear why some can get comms, and some cannot. It is dicey even for those who are somewhat successful, as the kitties have been.

It's down to 39°F here tonight, and I am still shunting heat outside from the crunchers. 9 rigs at full bore generate a lot of heat. The fans in the garage are cycling on and off.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1731037 · Report as offensive
Kibble (KB7TIB)
Joined: 6 Dec 99
Posts: 27
Credit: 10,121,469
RAC: 2
United States
Message 1731039 - Posted: 2 Oct 2015, 8:17:04 UTC - in response to Message 1731037.  

<sighs>
ID: 1731039 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1731041 - Posted: 2 Oct 2015, 8:33:30 UTC - in response to Message 1731039.  
Last modified: 2 Oct 2015, 8:34:38 UTC

<sighs>

Kittysighs here as well, at times.
I have been blessed a bit.
Out of some 175,000 active hosts on Seti, I have 9.
And the kitties keep them in tight tow.
But why do I seem to have less trouble than some?
I cannot answer that.
The right hardware?
Possibly. All nVidia, all Intel.
Old Boinc, not the most current drivers.
I go with what works and update as little as possible.

All connected through an 8 port BEFSR81 Linksys router, one port with a switch that allows 2 rigs to share a port, and then on to my ATT modem. My ATT speed is nothing special. I am on a very, very old part of the lines here, and I get all I can by paying top dollar, yet I do not get nearly as much speed as some others.

It all just works.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1731041 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1731042 - Posted: 2 Oct 2015, 8:33:53 UTC - in response to Message 1730905.  

According to David the routing problem is fixed.

Please tell David it isn't fixed for the BOINC domain currently.

I'm getting ERR_ADDRESS_UNREACHABLE, as often as not.
ID: 1731042 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1731044 - Posted: 2 Oct 2015, 8:35:40 UTC - in response to Message 1731042.  

According to David the routing problem is fixed.

Please tell David it isn't fixed for the BOINC domain currently.

I'm getting ERR_ADDRESS_UNREACHABLE, as often as not.

I also get a timeout when trying to reach the 'Seti is Down' boards...unreachable.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1731044 · Report as offensive
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1731048 - Posted: 2 Oct 2015, 8:45:38 UTC

Can't report finished tasks, can't get new tasks, can't reach the Boinc download page to dl the package with virtual box so I can run vlhc instead of Seti, and once again nobody seems to care. At least I can't find any official word on the homepage or on the forums.
ID: 1731048 · Report as offensive
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1731053 - Posted: 2 Oct 2015, 9:04:47 UTC - in response to Message 1731042.  

According to David the routing problem is fixed.

Please tell David it isn't fixed for the BOINC domain currently.

It was fixed for a couple of hours. It's now broken again.
Oh, who can we blame this time? Any takers? :)
ID: 1731053 · Report as offensive
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1731054 - Posted: 2 Oct 2015, 9:04:55 UTC - in response to Message 1731048.  

Can't report finished tasks, can't get new tasks, can't reach the Boinc download page to dl the package with virtual box so I can run vlhc instead of Seti, and once again nobody seems to care. At least I can't find any official word on the homepage or on the forums.

Ditto
ID: 1731054 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.