Message boards :
Number crunching :
Panic Mode On (100) Server Problems?
OTS Send message Joined: 6 Jan 08 Posts: 369 Credit: 20,533,537 RAC: 0 |
Looks like I am all done for the weekend. As of a few minutes ago, no more WUs to work on and I would guess the chances of new work showing up are slim to none. I have not been able to connect even once in almost 24 hours. If SETI connectivity is a low priority, then having only a few users unable to connect must be a very low priority. :( |
Louis Loria II Send message Joined: 20 Oct 03 Posts: 259 Credit: 9,208,040 RAC: 24 |
OTS wrote:
Looks like I am all done for the weekend. As of a few minutes ago, no more WUs to work on and I would guess the chances of new work showing up are slim to none. I have not been able to connect even once in almost 24 hours.

My connection is sporadic as well. I am getting WUs though; it seems to be about every eight to twelve hours. It is playing with my RAC of course, but I have plenty of work. I don't understand much about networks, but the crying and pinging are unnerving. I think that if the SETI crew knew what was really happening, they would tell us. It seems that they are at the mercy of UC Berkeley. Anyhow, I for one will let my rig continue to run until I am out of work or a solution is found. Keep crunching... (if possible) |
Mike Send message Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
Have set Einstein as backup. Oh well, the app is rather buggy but good for testing.

With each crime and every kindness we birth our future. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
Oh what fun that they changed back over to campus network connection. Don't we all now wish they'd kept their own internet connection? Isn't it possible to switch back to that one for the time being? :) |
Herb Smith Send message Joined: 28 Jan 07 Posts: 76 Credit: 31,615,205 RAC: 0 |
I find it interesting that there is no mention of the SETI issues on the IT web site. The policy stated at the top of the page is that any issue affecting users for more than 30 minutes will be posted. Well, this has certainly been going on for a lot longer than 30 minutes. And it just says users, not all users; it is quite clear that it is not just one or two people impacted. At the very least they should be posting information about the issue. |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30646 Credit: 53,134,872 RAC: 32 |
Still down. Machines starting to run out of work again. Coming in from a Comcast ISP in the Chicago area.

I suspect the ISP-dependent behaviour comes down to a routing table that won't refresh, or hasn't yet refreshed. That might be a hardware issue with the router, but as it seems to be happening on two routers it looks more like a software configuration issue. http://www.net.berkeley.edu/netinfo/newmaps/campus-topology.pdf

The two routers in question are the only paths into the Earl Warren Data Center. Every campus login for every connection would flow through one or the other of these boxes to reach the central database of campus users. Campus-wide backups too. All course material. Nearly every IT-related function will at some point have to access some database in the data center. Obviously that is all working, or there would be notices all over the system status page.

Behind these routers there must be a switch, likely on the rack the Seti/BOINC equipment is on, to distribute the traffic to all the machines. This dumb switch might be the issue. I don't think the issue is with DHCP, because some of the people can reach it some of the time. If the issue were DHCP, no one could reach it until the lease expires. Oh, and much of the Seti/BOINC kit is on fixed IPs anyway.

I'm sure Matt has taken a long hard look at the configuration of the Seti/BOINC equipment, assured it is correct, and worked with Campus to be sure that some change order wasn't overlooked. If Campus thinks the issue is with their equipment, it will be resolved. Just remember: as it is 99% working for them, they are going to be loath to intentionally take a 100%-working service offline for a fix until they are assured there is no other choice. |
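The DHCP-versus-routing reasoning above (some people can reach the server some of the time, which rules out an expired lease) is the kind of thing a volunteer could check with a simple reachability poller. A minimal sketch, assuming the scheduler host answers TCP on port 80; the hostname, port, and "intermittent" classification rule are illustrative, not anything the project documents:

```python
import socket

def can_connect(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Attempt one TCP connection; True only if the host accepted it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def classify(probes: list) -> str:
    """Summarize a series of probe results gathered over time."""
    if all(probes):
        return "up"
    if not any(probes):
        return "down"
    # A mix of successes and failures fits a flapping route far better
    # than an expired DHCP lease, which would fail consistently.
    return "intermittent"

# Usage sketch: call can_connect("setiboinc.ssl.berkeley.edu") from a cron
# job every few minutes, append the result to a list, then classify() it.
```

Running this on a timer from two different ISPs would make the "it depends on who your ISP is" pattern visible in the logs rather than anecdotal.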
OGM Send message Joined: 14 Apr 15 Posts: 12 Credit: 1,001,458 RAC: 0 |
Started to work on my end... not sure for how long. |
Cavalary Send message Joined: 15 Jul 99 Posts: 104 Credit: 7,507,548 RAC: 38 |
Yep, working for me now too. |
Kibble (KB7TIB) Send message Joined: 6 Dec 99 Posts: 27 Credit: 10,121,469 RAC: 2 |
Two of three machines have been able to make contact this morning. Looks like Einstein is getting the benefits from one. Oh, well. |
Jord Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3 |
It's as if it's on a timer. What kind of timer does not allow access to the network for 18-20 hours a day? |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 30646 Credit: 53,134,872 RAC: 32 |
It's as if it's on a timer. What kind of timer does not allow access to the network for 18-20 hours a day?

A RAM buffer overflow timer!

With the system working:

$ traceroute -I boinc.berkeley.edu
traceroute to boinc.berkeley.edu (208.68.240.115), 64 hops max, 72 byte packets
 1  * * *
 2  l100.lsanca-dsl-20.verizon-gni.net (71.108.177.1)  23.359 ms  23.182 ms  23.112 ms
 3  p0-2-2-5.lsanca-lcr-21.verizon-gni.net (130.81.35.32)  26.164 ms  28.747 ms  35.141 ms
 4  ae1-0.lax01-bb-rtr1.verizon-gni.net (130.81.199.90)  26.092 ms  24.303 ms  24.625 ms
 5  * * *
 6  0.ae5.br1.lax15.alter.net (140.222.225.135)  24.722 ms  24.374 ms  24.884 ms
 7  ae6.edge1.losangeles9.level3.net (4.68.62.169)  24.365 ms  24.890 ms  24.662 ms
 8  ae-3-80.ear1.losangeles1.level3.net (4.69.144.146)  25.726 ms  25.891 ms  25.620 ms
 9  cenic.ear1.losangeles1.level3.net (4.35.156.66)  25.590 ms  25.382 ms  25.355 ms
10  dc-svl-agg4--lax-agg6-100ge.cenic.net (137.164.11.1)  36.270 ms  35.459 ms  36.197 ms
11  dc-oak-agg4--svl-agg4-100ge.cenic.net (137.164.46.144)  39.906 ms  36.472 ms  36.677 ms
12  ucb--oak-agg4-10g.cenic.net (137.164.50.31)  34.949 ms  38.679 ms  35.176 ms
13  t2-3.inr-201-sut.berkeley.edu (128.32.0.37)  38.471 ms  39.041 ms  39.848 ms
14  et3-48.inr-311-ewdc.berkeley.edu (128.32.0.101)  39.286 ms  39.073 ms  38.682 ms
15  isaac.ssl.berkeley.edu (208.68.240.115)  39.068 ms  38.789 ms  38.355 ms
$ |
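Traceroute output like the listing above is easy to mine when comparing a working run against a broken one. A small parser sketch, assuming the common BSD/Linux traceroute line layout of hop number, hostname, parenthesized IP, and up to three RTT samples (lines of `* * *` carry no data and are skipped):

```python
import re
from statistics import median

# One traceroute report line: hop number, host, (IP), then RTT samples in ms.
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)((?:\s+[\d.]+\s+ms)+)\s*$")

def parse_hop(line: str):
    """Return (hop, host, ip, median_rtt_ms) for one line, or None.

    None covers '5  * * *' style lines where the router dropped the probe.
    """
    m = HOP_RE.match(line)
    if not m:
        return None
    hop, host, ip, rtts = m.groups()
    samples = [float(x) for x in re.findall(r"([\d.]+)\s+ms", rtts)]
    return int(hop), host, ip, median(samples)
```

Diffing the last hop that answers between a good trace and a bad one points at the segment where packets start disappearing, which is exactly the campus-edge question being debated in this thread.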
JaundicedEye Send message Joined: 14 Mar 12 Posts: 5375 Credit: 30,870,693 RAC: 1 |
All systems running smoothly..............I smell a rat......

[edit] Wow, did that ready-to-send buffer drain quickly.......

"Sour Grapes make a bitter Whine." <(0)> |
Jimbocous Send message Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
Very interesting that with things working "properly", a tracert setiboinc.ssl.berkeley.edu now yields a timeout. Better than not found, but still ??? |
Herb Smith Send message Joined: 28 Jan 07 Posts: 76 Credit: 31,615,205 RAC: 0 |
Went out to see The Martian and came home to find my caches full and things looking more normal. Oh, and it is a good movie also. Two good things today. Herb |
betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
SSP shows RTS = 0 but I won't panic because I have 5 days or so of APs left to process. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Looking at my log, for the last 7.5 hours I haven't had any Scheduler contact issues. The only issue at the moment is with the splitters; once again their output has dropped way off & they're not able to keep up with demand, let alone build up the ready-to-send buffer. 09ap11aa has 2 splitters on the one file. Sticky file? Even so, 1 or 2 splitters down shouldn't result in running out of ready-to-send work unless there's a shorty storm, and that isn't the case.

What's really causing concern for me right now: the Haveland page isn't displaying anything at the moment.

Grant
Darwin NT |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
What's really causing concern for me right now- the Haveland page isn't displaying anything at the moment.

It's showing normally for me, with none of the gaps that usually occur when it can't pull server stats from the status page. The only oddity I see is a full RTS (and a matching low creation rate, because of the high-water mark) until 18:00 his time - which I think is one hour the other side of UTC from me. Then, a dramatic draining of RTS over about 3 hours, which the creation rate - now uninhibited - wasn't able to keep up with. |
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
What's really causing concern for me right now- the Haveland page isn't displaying anything at the moment.

*sighs in relief* The page is now coming up for me as well (mostly; some of the graphs aren't loading). Before, it was just a blank white page.

The only oddity I see is a full RTS (and matching low creation rate, because of high water mark) until 18:00 his time - which I think is one hour the other side of UTC from me. Then, a dramatic draining of RTC over about 3 hours, which the creation rate - now uninhibited - wasn't able to keep up with.

Yep, at 18:00 there was a huge surge in returned work (250,000 per hour), the ready-to-send buffer drained, and at that time the splitter output wasn't inhibited, but it is now. It was peaking at 38 with lows of 25; then at approx. 22:30 hrs it dropped to a max of 30 & a minimum below 20. It's just now coming off the sub-20 minimum & is still just on 30.

For some reason, received in the last hour is still sitting around 100,000. In my cache, the odd amount of work I'm able to get seems to be a reasonable mix of VLARs and shorties. Possibly as I get more work there will be more shorties than VLARs, hence the current received-in-the-last-hour numbers.

Grant
Darwin NT |
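The drain being described, a full ready-to-send buffer emptying over a few hours when demand outruns the splitters, is simple to model. A back-of-the-envelope sketch with made-up numbers (only the ~250,000/hour return surge comes from the post; the buffer size and creation rate are illustrative assumptions):

```python
def simulate_rts(start, created_per_hr, sent_per_hr, hours):
    """Hour-by-hour ready-to-send buffer level; it cannot go below zero."""
    level, history = start, [start]
    for _ in range(hours):
        level = max(0, level + created_per_hr - sent_per_hr)
        history.append(level)
    return history

# Illustrative: a 600k buffer, splitters creating 100k/hr, and demand
# surging to 250k/hr drains the buffer to zero in four hours.
```

Once `level` hits zero, every extra unit of demand becomes a "project has no tasks available" reply, which matches the empty-cache reports earlier in the thread.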
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
Looking at that surge in returned work, and at my client logs for Scheduler contact issues, it appears my Scheduler contact issues were resolved pretty much around the same time the upload avalanche began. Looks like they may have fixed the network issues.

Grant
Darwin NT |
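Correlating the client log against the upload surge, as described above, amounts to finding the timestamp of the last scheduler failure. A hedged sketch: the `dd-Mon-yyyy hh:mm:ss` stamp and the "Scheduler request failed" marker are assumptions about the BOINC client's log wording, so check your own log format before relying on it:

```python
from datetime import datetime

def last_failure(log_lines):
    """Timestamp of the most recent scheduler failure, or None if none found.

    Assumes lines start with a 'dd-Mon-yyyy hh:mm:ss' stamp (20 characters)
    and that failures contain 'Scheduler request failed' in the message text.
    """
    latest = None
    for line in log_lines:
        if "Scheduler request failed" in line:
            stamp = datetime.strptime(line[:20], "%d-%b-%Y %H:%M:%S")
            if latest is None or stamp > latest:
                latest = stamp
    return latest
```

Comparing that timestamp with when the upload avalanche began (visible on the Haveland graphs) is exactly the correlation Grant is drawing by eye.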
betreger Send message Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66 |
Looks like they may have fixed the network issues.

If so, we will never know what was wrong. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.