Panic Mode On (100) Server Problems?

Message boards : Number crunching : Panic Mode On (100) Server Problems?
betreger · Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 1731353 - Posted: 3 Oct 2015, 2:10:08 UTC - in response to Message 1731352.  

Flooding the Help Desk with trouble reports about a problem they are already working on may serve only to piss off the folks on the help desk, and possibly IT management.

And waste manpower.
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1731356 - Posted: 3 Oct 2015, 2:20:31 UTC

A few dozen (or hundred or thousand) emails might help Berkeley IST grasp the scope of the problem.


Never hassle the cook when he's preparing your food...

"Sour Grapes make a bitter Whine." <(0)>
Cruncher-American · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Joined: 25 Mar 02
Posts: 1513
Credit: 370,893,186
RAC: 340
United States
Message 1731360 - Posted: 3 Oct 2015, 2:29:40 UTC - in response to Message 1731187.  

Yup - one of my machines started receiving WUs about an hour ago, without a manual update. I just did a manual update on the other, and the floodgates opened (170 WUs in the first shot!).

Looks like all is well, at long last.


It only lasted about 2 hours, then went dead again, and it's still dead now. Who knows for how long this time????

And that is about 8 1/2 hours summed up nicely...
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 1731363 - Posted: 3 Oct 2015, 2:36:23 UTC - in response to Message 1731360.  

Just to add to the fun: there appear to be a lot of VLARs coming through again, so even if you do somehow manage to contact the Scheduler, your GPU may not get any work out of it.
Grant
Darwin NT
Gary Charpentier · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 25 Dec 00
Posts: 30640
Credit: 53,134,872
RAC: 32
United States
Message 1731368 - Posted: 3 Oct 2015, 3:24:47 UTC

Interesting...
From my work ISP, no issues at all.
From home, I can't reach the Dev board, and the machines can't get or report main/beta work.
$ traceroute boinc.berkeley.edu
traceroute to boinc.berkeley.edu (208.68.240.115), 64 hops max, 52 byte packets
 1  * * *
 2  l100.lsanca-dsl-20.verizon-gni.net (71.108.177.1)  23.000 ms  22.884 ms  22.903 ms
 3  p0-2-2-5.lsanca-lcr-21.verizon-gni.net (130.81.35.32)  25.212 ms  27.449 ms  24.986 ms
 4  ae9-0.lax01-bb-rtr1.verizon-gni.net (130.81.163.240)  24.490 ms
    so-3-1-0-0.lax01-bb-rtr1.verizon-gni.net (130.81.151.236)  30.732 ms *
 5  0.ae5.br1.lax15.alter.net (140.222.225.135)  24.207 ms * *
 6  0.ae5.br1.lax15.alter.net (140.222.225.135)  27.917 ms
    ae6.edge1.losangeles9.level3.net (4.68.62.169)  25.812 ms
    0.ae6.br1.lax15.alter.net (140.222.225.137)  26.247 ms
 7  * ae6.edge1.losangeles9.level3.net (4.68.62.169)  27.217 ms
    ae-1-60.ear1.losangeles1.level3.net (4.69.144.18)  26.395 ms
 8  cenic.ear1.losangeles1.level3.net (4.35.156.66)  25.842 ms
    ae-1-60.ear1.losangeles1.level3.net (4.69.144.18)  26.455 ms
    cenic.ear1.losangeles1.level3.net (4.35.156.66)  25.547 ms
 9  cenic.ear1.losangeles1.level3.net (4.35.156.66)  25.461 ms
    dc-svl-agg4--lax-agg6-100ge.cenic.net (137.164.11.1)  33.797 ms
    cenic.ear1.losangeles1.level3.net (4.35.156.66)  26.200 ms
10  dc-svl-agg4--lax-agg6-100ge.cenic.net (137.164.11.1)  33.577 ms
    dc-oak-agg4--svl-agg4-100ge.cenic.net (137.164.46.144)  35.655 ms  36.262 ms
11  ucb--oak-agg4-10g.cenic.net (137.164.50.31)  37.876 ms
    dc-oak-agg4--svl-agg4-100ge.cenic.net (137.164.46.144)  36.301 ms
    ucb--oak-agg4-10g.cenic.net (137.164.50.31)  37.990 ms
12  ucb--oak-agg4-10g.cenic.net (137.164.50.31)  39.112 ms
    t2-3.inr-201-sut.berkeley.edu (128.32.0.37)  38.593 ms
    ucb--oak-agg4-10g.cenic.net (137.164.50.31)  68.088 ms
13  t2-3.inr-201-sut.berkeley.edu (128.32.0.37)  40.203 ms
    et3-47.inr-311-ewdc.berkeley.edu (128.32.0.103)  36.416 ms
    t2-3.inr-202-reccev.berkeley.edu (128.32.0.39)  36.620 ms
14  et3-47.inr-311-ewdc.berkeley.edu (128.32.0.103)  37.451 ms  36.168 ms
    et3-48.inr-311-ewdc.berkeley.edu (128.32.0.101)  35.919 ms
15  et3-47.inr-311-ewdc.berkeley.edu (128.32.0.103)  1996.520 ms !H  3010.989 ms !H  3011.396 ms !H
$

Also with ICMP:
$ traceroute -I boinc.berkeley.edu
traceroute to boinc.berkeley.edu (208.68.240.115), 64 hops max, 72 byte packets
 1  * * *
 2  l100.lsanca-dsl-20.verizon-gni.net (71.108.177.1)  23.491 ms  23.292 ms  23.347 ms
 3  p0-2-2-5.lsanca-lcr-21.verizon-gni.net (130.81.35.32)  27.793 ms  27.883 ms  27.847 ms
 4  ae1-0.lax01-bb-rtr1.verizon-gni.net (130.81.199.90)  24.727 ms  38.736 ms  28.939 ms
 5  * * *
 6  0.ae5.br1.lax15.alter.net (140.222.225.135)  26.017 ms  24.799 ms  24.371 ms
 7  ae6.edge1.losangeles9.level3.net (4.68.62.169)  25.671 ms  25.568 ms  25.833 ms
 8  * * *
 9  cenic.ear1.losangeles1.level3.net (4.35.156.66)  25.125 ms  24.813 ms  25.380 ms
10  dc-svl-agg4--lax-agg6-100ge.cenic.net (137.164.11.1)  33.727 ms  33.480 ms  33.718 ms
11  dc-oak-agg4--svl-agg4-100ge.cenic.net (137.164.46.144)  36.548 ms  35.663 ms  35.933 ms
12  ucb--oak-agg4-10g.cenic.net (137.164.50.31)  38.237 ms  38.657 ms  38.193 ms
13  t2-3.inr-201-sut.berkeley.edu (128.32.0.37)  39.225 ms  38.411 ms  38.433 ms
14  et3-48.inr-311-ewdc.berkeley.edu (128.32.0.101)  36.511 ms  37.666 ms  36.438 ms
15  et3-48.inr-311-ewdc.berkeley.edu (128.32.0.101)  128.161 ms !H  3010.297 ms !H  3036.063 ms !H
$

Note the very long times, and the !H (host unreachable) flags, on the last reported router.
As it happened on two different boxes, I doubt it's a hardware failure. I suspect someone may have put a loop in the routing tables, or made some similar software error. Perhaps not everything got turned back on in the right order after the fire shutdown?
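For anyone who wants to compare traces from several hosts, here is a minimal Python sketch that splits traceroute output like the two above into individual probes (hop, router IP, RTT, and any trailing annotation such as !H, which classic traceroute prints when it receives an ICMP host-unreachable reply). It assumes the BSD/macOS-style output format pasted above; the function names and the 1000 ms threshold are hypothetical choices, not part of any tool.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hop lines start with a hop number; continuation lines are indented.
HOP_RE = re.compile(r"^\s*(?P<hop>\d+)\s+(?P<rest>.*)$")
# One probe: optional "name (ip)", an RTT in ms, then an optional !X annotation.
PROBE_RE = re.compile(
    r"(?:(?P<host>\S+) \((?P<ip>[\d.]+)\)\s+)?"
    r"(?P<rtt>\d+(?:\.\d+)?) ms"
    r"(?:\s+(?P<flag>![A-Za-z0-9]*))?"
)


@dataclass
class Probe:
    hop: int
    ip: Optional[str]
    rtt_ms: float
    flag: Optional[str]  # e.g. "!H" (host unreachable); None if absent


def parse_traceroute(text: str) -> List[Probe]:
    """Split classic traceroute output into individual probe results."""
    probes: List[Probe] = []
    hop: Optional[int] = None
    ip: Optional[str] = None
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:
            # New hop: reset the router so it is not inherited from the last hop.
            hop, rest, ip = int(m.group("hop")), m.group("rest"), None
        elif hop is not None and line.startswith(" "):
            rest = line  # more probes for the same hop, answered by another router
        else:
            continue  # header line such as "traceroute to ..."
        for pm in PROBE_RE.finditer(rest):
            if pm.group("ip"):
                ip = pm.group("ip")  # unnamed probes belong to the last named router
            probes.append(Probe(hop, ip, float(pm.group("rtt")), pm.group("flag")))
    return probes


def suspect_hops(
    probes: List[Probe], rtt_threshold_ms: float = 1000.0
) -> List[Tuple[int, Optional[str]]]:
    """Hops whose probes carry an error flag or exceed a (hypothetical) RTT threshold."""
    found = {(p.hop, p.ip) for p in probes if p.flag or p.rtt_ms > rtt_threshold_ms}
    return sorted(found, key=lambda t: (t[0], t[1] or ""))
```

Running both traces through suspect_hops should single out only hop 15, the campus router that answers with !H after multi-second delays.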
Jimbocous · Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1731409 - Posted: 3 Oct 2015, 6:11:51 UTC

So somebody did put a note on the SETI top page a few minutes ago about the issue. Nothing new, just that it's being worked on.
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1731413 - Posted: 3 Oct 2015, 6:39:57 UTC - in response to Message 1731409.  

Nothing new, just that it's being worked on.



HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1731421 - Posted: 3 Oct 2015, 7:34:30 UTC - in response to Message 1731318.  

This is the new "normal". I guess we just have to get used to it. Yo-Yo connections, the new SETI normal.

If they are using any kind of traffic prioritization, I would expect SETI@home data traffic not to be high on the list. Previously, the Hurricane line was likely just trunked over a VLAN, which may not have been subject to the same kind of packet scrutiny that normally occurs.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group (http://tinyurl.com/8y46zvu)
Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1731434 - Posted: 3 Oct 2015, 8:42:51 UTC

OK, I am sorry to have suggested what I thought might be a helpful solution. Since there is no mention of any issues involving ssl.berkeley.edu on the Berkeley IST status (trouble) page, they are either unaware of the problem or have assigned it a priority below low.
Member of the 20 Year Club



WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1731448 - Posted: 3 Oct 2015, 10:11:54 UTC
Last modified: 3 Oct 2015, 10:12:38 UTC

I have major upload problems on one host too, but restarting BOINC helped; downloads etc. started working again.
Richard Haselgrove · Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1731452 - Posted: 3 Oct 2015, 10:28:03 UTC - in response to Message 1731448.  

I have major upload problems on one host too, but restarting BOINC helped; downloads etc. started working again.

I suspect that's either the placebo effect, or a long-winded way of clicking the update/retry button to clear the backoff.
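To illustrate the backoff Richard mentions, here is a toy Python model; it is not BOINC's actual code. After each consecutive failed request the wait doubles up to a cap, with random jitter, and a manual update simply clears the pending wait. The class name and the 60-second / 4-hour constants are assumptions for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class ProjectBackoff:
    """Toy model of per-project request backoff; constants are assumptions."""
    min_delay_s: float = 60.0          # hypothetical first retry delay
    max_delay_s: float = 4 * 3600.0    # hypothetical upper bound
    failures: int = 0
    next_request_at: float = 0.0

    def record_failure(self, now: float, rng: random.Random) -> float:
        """Double the delay per consecutive failure, cap it, then add jitter."""
        self.failures += 1
        exponent = min(self.failures - 1, 30)  # keep 2**n from overflowing a float
        base = min(self.max_delay_s, self.min_delay_s * 2 ** exponent)
        delay = rng.uniform(base / 2, base)  # jitter keeps many hosts from retrying in lockstep
        self.next_request_at = now + delay
        return delay

    def record_success(self) -> None:
        self.failures = 0
        self.next_request_at = 0.0

    def manual_update(self, now: float) -> None:
        """Clicking Update/Retry in this model: drop the pending wait, keep the count."""
        self.next_request_at = now

    def may_request(self, now: float) -> bool:
        return now >= self.next_request_at
```

In this model a host that has failed a dozen times in a row can sit idle for hours even after the servers come back, which is why anything that clears the wait (a manual update, or a restart if it resets that state) can look like a fix.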
Cavalary
Joined: 15 Jul 99
Posts: 104
Credit: 7,507,548
RAC: 38
Romania
Message 1731453 - Posted: 3 Oct 2015, 10:29:31 UTC

Fortunately, I caught a period yesterday when it worked: it reported some 50+ completed WUs and downloaded enough to refill the queue. It then managed to report 5 more WUs as they completed and download replacements, and then it was gone again.
Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1731456 - Posted: 3 Oct 2015, 10:43:53 UTC

03-10-15 6:42:15 AM Project communication failed: attempting access to reference site
03-10-15 6:42:15 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:42:17 AM Internet access OK - project servers may be temporarily down.
Member of the 20 Year Club



Richard Haselgrove · Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1731457 - Posted: 3 Oct 2015, 10:46:22 UTC

03/10/2015 11:19:23 | SETI@home | Scheduler request failed: Couldn't connect to server
03/10/2015 11:24:01 | SETI@home | Scheduler request failed: Couldn't connect to server
03/10/2015 11:25:54 | SETI@home | Scheduler request failed: Couldn't connect to server
03/10/2015 11:26:11 | SETI@home | Scheduler request completed: got 2 new tasks
Oz
Joined: 6 Jun 99
Posts: 233
Credit: 200,655,462
RAC: 212
United States
Message 1731461 - Posted: 3 Oct 2015, 10:57:27 UTC
Last modified: 3 Oct 2015, 11:00:15 UTC

03-10-15 4:11:45 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 4:23:04 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:05:23 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:06:53 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:08:16 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:09:40 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:11:04 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:12:26 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:14:39 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:19:33 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:21:52 AM SETI@home Scheduler request failed: Couldn't connect to server
03-10-15 6:42:15 AM SETI@home Scheduler request failed: Couldn't connect to server

Heading into Day 5...
Member of the 20 Year Club



Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1731470 - Posted: 3 Oct 2015, 12:59:10 UTC

I had connection problems with the SETI servers this morning too; now all OK.
WezH
Volunteer tester
Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1731477 - Posted: 3 Oct 2015, 13:29:32 UTC
Last modified: 3 Oct 2015, 13:42:10 UTC

On this host, out of the last 48 scheduler requests, 26 completed and 22 failed.

On this host, out of the last 115 scheduler requests, 50 completed and 65 failed.
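A quick way to get numbers like these from a pasted event log is to count the "Scheduler request completed" and "Scheduler request failed" lines. This minimal Python sketch assumes only that those phrases appear as in the log excerpts posted in this thread; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the two outcomes seen in the event-log excerpts posted in this thread.
OUTCOME_RE = re.compile(r"Scheduler request (completed|failed)")


def tally_scheduler_requests(log_text: str) -> Counter:
    """Count completed vs failed scheduler requests in pasted event-log text."""
    matches = (OUTCOME_RE.search(line) for line in log_text.splitlines())
    return Counter(m.group(1) for m in matches if m)


def failure_rate(counts: Counter) -> float:
    """Fraction of counted requests that failed (0.0 if none were counted)."""
    total = counts["completed"] + counts["failed"]
    return counts["failed"] / total if total else 0.0
```

With the figures above, 22 failures out of 48 requests is a failure rate of about 46%, and 65 out of 115 is about 57%.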
Herb Smith
Volunteer tester
Joined: 28 Jan 07
Posts: 76
Credit: 31,615,205
RAC: 0
United States
Message 1731478 - Posted: 3 Oct 2015, 13:31:02 UTC

Still down. My machines are starting to run out of work again. I'm coming in via Comcast from the Chicago area.
It seems very strange that this depends on where you are and/or who your ISP is.
Has anybody put a packet analyzer on their attempts to communicate?

Herb
Cliff Harding
Volunteer tester
Joined: 18 Aug 99
Posts: 1432
Credit: 110,967,840
RAC: 67
United States
Message 1731496 - Posted: 3 Oct 2015, 14:32:08 UTC

Maybe this will explain what is going on with us only having one d/l server for the last several weeks. Skip the first notice and go to 2 & 3.

http://ucbsystems.org/category/active/scheduled-maintenance/


I don't buy computers, I build them!!
JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1731503 - Posted: 3 Oct 2015, 14:55:29 UTC - in response to Message 1731496.  

Maybe this will explain what is going on with us only having one d/l server for the last several weeks. Skip the first notice and go to 2 & 3.

http://ucbsystems.org/category/active/scheduled-maintenance/

I noticed those items earlier today and wondered if someone in IT was trying to 'get a head start' on the work and now has no idea what they screwed up.

Wouldn't be the first time....

"Sour Grapes make a bitter Whine." <(0)>


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.