Message boards :
Number crunching :
Panic Mode On (100) Server Problems?
OTS Send message Joined: 6 Jan 08 Posts: 369 Credit: 20,533,537 RAC: 0 |
Looks like I am all done for the weekend. As of a few minutes ago, no more WUs to work on and I would guess the chances of new work showing up are slim to none. I have not been able to connect even once in almost 24 hours. If SETI connectivity is a low priority, then having only a few users unable to connect must be a very low priority. :( |
Louis Loria II Send message Joined: 20 Oct 03 Posts: 259 Credit: 9,208,040 RAC: 24 |
OTS wrote:
Looks like I am all done for the weekend. As of a few minutes ago, no more WUs to work on and I would guess the chances of new work showing up are slim to none. I have not been able to connect even once in almost 24 hours.

My connection is sporadic as well. I am getting WUs though; it seems to be about every eight to twelve hours. It is playing with my RAC of course, but I have plenty of work. I don't understand much about networks, but the crying and pinging are unnerving. I think that if the SETI crew knew what was really happening, they would tell us. It seems that they are at the mercy of UC Berkeley. Anyhow, I for one will let my rig continue to run until I am out of work or a solution is found. Keep crunching... (if possible) |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Have set Einstein as backup. Oh well, the app is rather buggy but good for testing.

With each crime and every kindness we birth our future. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Oh what fun that they changed back over to campus network connection. Don't we all now wish they'd kept their own internet connection? Isn't it possible to switch back to that one for the time being? :) |
Herb Smith Send message Joined: 28 Jan 07 Posts: 76 Credit: 31,615,205 RAC: 0 |
I find it interesting that there is no mention of the SETI issues on the IT web site. The policy stated at the top of the page is that any issue affecting users for more than 30 minutes will be posted. Well, this has certainly been going on for a lot longer than 30 minutes. And it just says users, not all users; it is quite clear that it is not just one or two people impacted. At the very least they should be posting information about the issue. |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30646 Credit: 53,134,872 RAC: 32 |
Still down. Machines starting to run out of work again. Coming in from a Comcast ISP in the Chicago area.

I suspect the ISP-dependent behaviour comes down to a routing table that won't refresh, or hasn't yet refreshed. That might be a hardware issue with the router, but as it seems to be happening on two routers it looks more like a software configuration issue. http://www.net.berkeley.edu/netinfo/newmaps/campus-topology.pdf

The two routers in question are the only paths into the Earl Warren Data Center. Every campus login for every connection would flow through one or the other of these boxes to reach the central database of campus users. Campus-wide backups too. All course material. Nearly every IT-related function will at some point have to access some database in the data center. Obviously that is all working, or there would be notices all over the system status page.

Behind these routers there must be a switch, likely on the rack the Seti/BOINC equipment is on, to distribute the traffic to all the machines. This dumb switch might be the issue. I don't think the issue is with DHCP, because some of the people can reach it some of the time. If the issue were DHCP, no one could reach it until the lease expires. Oh, and much of the Seti/BOINC kit is on fixed IPs anyway.

I'm sure Matt has taken a long hard look at the configuration of the Seti/BOINC equipment, assured it is correct, and worked with Campus to be sure that some change order wasn't overlooked. If Campus thinks the issue is with their equipment, it will be resolved. Just remember: as it is 99% working for them, they are going to be loath to intentionally take a 100%-working service offline for a fix until they are assured there is no other choice. |
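The DHCP-versus-routing reasoning above (some people can reach the server some of the time, which rules out an expired lease) is the kind of thing a volunteer could check with a simple reachability poller. A minimal sketch, assuming the scheduler host answers TCP on port 80; the hostname, port, and "intermittent" classification rule are illustrative, not anything the project documents:

```python
import socket

def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Attempt one TCP connection; True only if the host accepted it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def classify(probes: list) -> str:
    """Summarize a series of probe results gathered over time."""
    if all(probes):
        return "up"
    if not any(probes):
        return "down"
    # A mix of successes and failures fits a flapping route far better
    # than an expired DHCP lease, which would fail consistently.
    return "intermittent"

# Usage sketch: call can_connect("setiboinc.ssl.berkeley.edu") from a cron
# job every few minutes, append the result to a list, then classify() it.
```

Running this on a timer from two different ISPs would make the "it depends on who your ISP is" pattern visible in the logs rather than anecdotal.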
OGM Send message Joined: 14 Apr 15 Posts: 12 Credit: 1,001,458 RAC: 0 |
Started to work on my end... not sure for how long. |
Cavalary Send message Joined: 15 Jul 99 Posts: 104 Credit: 7,507,548 RAC: 38 |
Yep, working for me now too. |
Kibble (KB7TIB) Send message Joined: 6 Dec 99 Posts: 27 Credit: 10,121,469 RAC: 2 |
Two of three machines have been able to make contact this morning. Looks like Einstein is getting the benefits from one. Oh, well. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
It's as if it's on a timer. What kind of timer does not allow access to the network for 18-20 hours a day? |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30646 Credit: 53,134,872 RAC: 32 |
It's as if it's on a timer. What kind of timer does not allow access to the network for 18-20 hours a day?

A RAM buffer overflow timer!

With the system working:

$ traceroute -I boinc.berkeley.edu
traceroute to boinc.berkeley.edu (208.68.240.115), 64 hops max, 72 byte packets
 1  * * *
 2  l100.lsanca-dsl-20.verizon-gni.net (71.108.177.1)  23.359 ms  23.182 ms  23.112 ms
 3  p0-2-2-5.lsanca-lcr-21.verizon-gni.net (130.81.35.32)  26.164 ms  28.747 ms  35.141 ms
 4  ae1-0.lax01-bb-rtr1.verizon-gni.net (130.81.199.90)  26.092 ms  24.303 ms  24.625 ms
 5  * * *
 6  0.ae5.br1.lax15.alter.net (140.222.225.135)  24.722 ms  24.374 ms  24.884 ms
 7  ae6.edge1.losangeles9.level3.net (4.68.62.169)  24.365 ms  24.890 ms  24.662 ms
 8  ae-3-80.ear1.losangeles1.level3.net (4.69.144.146)  25.726 ms  25.891 ms  25.620 ms
 9  cenic.ear1.losangeles1.level3.net (4.35.156.66)  25.590 ms  25.382 ms  25.355 ms
10  dc-svl-agg4--lax-agg6-100ge.cenic.net (137.164.11.1)  36.270 ms  35.459 ms  36.197 ms
11  dc-oak-agg4--svl-agg4-100ge.cenic.net (137.164.46.144)  39.906 ms  36.472 ms  36.677 ms
12  ucb--oak-agg4-10g.cenic.net (137.164.50.31)  34.949 ms  38.679 ms  35.176 ms
13  t2-3.inr-201-sut.berkeley.edu (128.32.0.37)  38.471 ms  39.041 ms  39.848 ms
14  et3-48.inr-311-ewdc.berkeley.edu (128.32.0.101)  39.286 ms  39.073 ms  38.682 ms
15  isaac.ssl.berkeley.edu (208.68.240.115)  39.068 ms  38.789 ms  38.355 ms
$ |
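Traceroute output like the listing above is easy to mine when comparing a working run against a broken one. A small parser sketch, assuming the common BSD/Linux traceroute line layout of hop number, hostname, parenthesized IP, and up to three RTT samples (lines of `* * *` carry no data and are skipped):

```python
import re
from statistics import median

# One traceroute report line: hop number, host, (IP), then RTT samples in ms.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)((?:\s+[\d.]+\s+ms)+)\s*$")

def parse_hop(line: str):
    """Return (hop, host, ip, median_rtt_ms) for one line, or None.

    None covers '5  * * *' style lines where the router dropped the probe.
    """
    m = HOP_RE.match(line)
    if not m:
        return None
    hop, host, ip, rtts = m.groups()
    samples = [float(x) for x in re.findall(r"([\d.]+)\s+ms", rtts)]
    return int(hop), host, ip, median(samples)
```

Diffing the last hop that answers between a good trace and a bad one points at the segment where packets start disappearing, which is exactly the campus-edge question being debated in this thread.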
JaundicedEye Send message Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
All systems running smoothly..............I smell a rat......

[edit] Wow, did that ready-to-send buffer drain quickly.......

"Sour Grapes make a bitter Whine." <(0)> |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
Very interesting that with things working "properly", a tracert setiboinc.ssl.berkeley.edu now yields a timeout. Better than not found, but still ??? |
Herb Smith Send message Joined: 28 Jan 07 Posts: 76 Credit: 31,615,205 RAC: 0 |
Went out to see The Martian and came home to find my caches full and things looking more normal. Oh, and it is a good movie also. Two good things today. Herb |
betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
SSP shows RTS = 0 but I won't panic because I have 5 days or so of APs left to process. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Looking at my log, for the last 7.5 hours I haven't had any Scheduler contact issues. The only issue at the moment is with the splitters; once again their output has dropped way off & they're not able to keep up with demand, let alone build up the ready-to-send buffer. 09ap11aa has 2 splitters on the one file. Sticky file? Even so, 1 or 2 splitters down shouldn't result in running out of ready-to-send work unless there's a shorty storm, and that isn't the case.

What's really causing concern for me right now: the Haveland page isn't displaying anything at the moment.

Grant
Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
What's really causing concern for me right now- the Haveland page isn't displaying anything at the moment.

It's showing normally for me, with none of the gaps that usually occur when it can't pull server stats from the status page. The only oddity I see is a full RTS (and a matching low creation rate, because of the high-water mark) until 18:00 his time - which I think is one hour the other side of UTC from me. Then, a dramatic draining of RTS over about 3 hours, which the creation rate - now uninhibited - wasn't able to keep up with. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
What's really causing concern for me right now- the Haveland page isn't displaying anything at the moment.

*sighs in relief* The page is now coming up for me as well (mostly; some of the graphs aren't loading). Before, it was just a blank white page.

The only oddity I see is a full RTS (and matching low creation rate, because of high water mark) until 18:00 his time - which I think is one hour the other side of UTC from me. Then, a dramatic draining of RTC over about 3 hours, which the creation rate - now uninhibited - wasn't able to keep up with.

Yep, at 18:00 there was a huge surge in returned work (250,000 per hour), the ready-to-send buffer drained, and at that time the splitter output wasn't inhibited, but it is now. It was peaking at 38 with lows of 25; then at approx. 22:30 hrs it dropped to a max of 30 & a minimum below 20. It's just now coming off the sub-20 minimum & is still just on 30.

For some reason, received in the last hour is still sitting around 100,000. In my cache, the odd amount of work I'm able to get seems to be a reasonable mix of VLARs and shorties. Possibly as I get more work there will be more shorties than VLARs, hence the current received-in-the-last-hour numbers.

Grant
Darwin NT |
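The drain being described, a full ready-to-send buffer emptying over a few hours when demand outruns the splitters, is simple to model. A back-of-the-envelope sketch with made-up numbers (only the ~250,000/hour return surge comes from the post; the buffer size and creation rate are illustrative assumptions):

```python
def simulate_rts(start, created_per_hr, sent_per_hr, hours):
    """Hour-by-hour ready-to-send buffer level; it cannot go below zero."""
    level, history = start, [start]
    for _ in range(hours):
        level = max(0, level + created_per_hr - sent_per_hr)
        history.append(level)
    return history

# Illustrative: a 600k buffer, splitters creating 100k/hr, and demand
# surging to 250k/hr drains the buffer to zero in four hours.
```

Once `level` hits zero, every extra unit of demand becomes a "project has no tasks available" reply, which matches the empty-cache reports earlier in the thread.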
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Looking at that surge in returned work, and at my client logs for Scheduler contact issues, it appears my Scheduler contact issues were resolved pretty much around the same time the upload avalanche began. Looks like they may have fixed the network issues.

Grant
Darwin NT |
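Correlating the client log against the upload surge, as described above, amounts to finding the timestamp of the last scheduler failure. A hedged sketch: the `dd-Mon-yyyy hh:mm:ss` stamp and the "Scheduler request failed" marker are assumptions about the BOINC client's log wording, so check your own log format before relying on it:

```python
from datetime import datetime

def last_failure(log_lines):
    """Timestamp of the most recent scheduler failure, or None if none found.

    Assumes lines start with a 'dd-Mon-yyyy hh:mm:ss' stamp (20 characters)
    and that failures contain 'Scheduler request failed' in the message text.
    """
    latest = None
    for line in log_lines:
        if "Scheduler request failed" in line:
            stamp = datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S")
            if latest is None or stamp > latest:
                latest = stamp
    return latest
```

Comparing that timestamp with when the upload avalanche began (visible on the Haveland graphs) is exactly the correlation Grant is drawing by eye.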
betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
Looks like they may have fixed the network issues.

If so, we will never know what was wrong. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.