Panic Mode On (110) Server Problems?

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304	Message 1918740 - Posted: 14 Feb 2018, 5:10:26 UTC - in response to Message 1918735. Last modified: 14 Feb 2018, 5:14:19 UTC This makes for a long day now :( And then they cleared. Hopefully they'll stay good now. Edit- now it'd be nice if the splitters could finally get going, and keep going. But with all those deletions backing up... Grant Darwin NT ID: 1918740 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918750 - Posted: 14 Feb 2018, 6:26:32 UTC - in response to Message 1918740. And then they cleared. Hopefully they'll stay good now. Edit- now it'd be nice if the splitters could finally get going, and keep going. But with all those deletions backing up... Watched a documentary and came back and gave the stalled downloads a try. Cleared them all up but now no work is available. If past experience shows, I will wake up tomorrow morning to full caches on all machines. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918750 ·

Stargate (SA) Volunteer tester Send message Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0	Message 1918755 - Posted: 14 Feb 2018, 6:49:49 UTC Now it's lag time, right on cue. ID: 1918755 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918757 - Posted: 14 Feb 2018, 7:38:46 UTC - in response to Message 1918755. Now it's lag time, right on cue. Yes, but much shorter tonight. Only about ten minutes for the notch in the Haveland graphs. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918757 ·

Stargate (SA) Volunteer tester Send message Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0	Message 1918758 - Posted: 14 Feb 2018, 7:45:53 UTC Last modified: 14 Feb 2018, 7:52:26 UTC Not sure what that is, but it happens around 5pm through to 6pm every day, Adelaide time. Right after 6pm everything is running fine lol. Could be the transition of time zones (fast then slow, then vice versa). All I know is that Seti is the only one affected; all other web sites work like normal. ID: 1918758 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918806 - Posted: 14 Feb 2018, 17:48:18 UTC - in response to Message 1918758. My experience also. No other websites exhibit the phenomenon, only SETI. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918806 ·

juan BFP Volunteer tester Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 1918809 - Posted: 14 Feb 2018, 18:12:18 UTC Maybe something runs at the servers at this time, scheduled since it's always at the same time, and slows down everything. ID: 1918809 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918815 - Posted: 14 Feb 2018, 18:52:07 UTC - in response to Message 1918809. Yes, that is what I suspect. It seems to run on all the exposed servers in the Haveland graphs. So all the splitters, validators, purgers etc. for both AP and MB. Same for all the schedulers, up/down servers and the replica database. Since it's network related, I wonder if the routers or backup power supplies have a 15 minute update period or do some sort of internal housekeeping. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918815 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1918821 - Posted: 14 Feb 2018, 19:33:37 UTC - in response to Message 1918815. Last modified: 14 Feb 2018, 19:40:40 UTC The Haveland graphs are drawn from exactly the same data as we see on the server status page ("SETI@home server status information is also available in XML") - I debugged that when it wouldn't show SaH v8 for some time after we started using that. So, there are three possibilities for that gap in the line.
1) Every single server pauses at exactly the same time.
2) One server - the one which collects the data - pauses.
3) The XML data is inaccessible over the internet, either because of server connection failures, or because of router and line congestion.
I think the second two are both more likely than the first.
Richard Haselgrove <redacted> 26/11/16 at 10:21 AM
To David Anderson Eric Korpela Jeff Cobb
Message body
While the SETI website - especially the Server Status page - is still fresh in our minds, could somebody fix the XML version of the SSP to show the correct values for sah_v8, please, rather than duplicating the Astropulse values? I think the problem lies in the function show_three_counts https://setisvn.ssl.berkeley.edu/trac/browser/seti_boinc_html/sah_status.php#L206 and specifically in lines 222-224:
222 $xmlstring = " <$xmlkey>$value</$xmlkey>\n";
223 $xmlstring .= " <$axmlkey>$avalue</$axmlkey>\n";
224 $xmlstring .= " <$bxmlkey>$avalue</$bxmlkey>\n";
Line 224 should use $bvalue, to match the b keys. This isn't mere pedantry - there are very useful graphs at https://setistats.haveland.com/ driven from the XML, but currently meaningless. ID: 1918821 ·
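
The copy-paste bug reported in the email above can be illustrated with a minimal sketch. This is Python, not the project's PHP, and it is not the actual sah_status.php code: the function signature and the key names in the usage line are hypothetical, chosen only to mirror the three key/value pairs in the quoted lines 222-224.

```python
def show_three_counts_xml(key, value, a_key, a_value, b_key, b_value):
    """Build three XML count lines, pairing each key with its own value.

    Hypothetical stand-in for the XML branch of show_three_counts.
    """
    xml = f"  <{key}>{value}</{key}>\n"
    xml += f"  <{a_key}>{a_value}</{a_key}>\n"
    # The reported bug emitted a_value on this line, so the third
    # element duplicated the second one. The b key needs b_value.
    xml += f"  <{b_key}>{b_value}</{b_key}>\n"
    return xml


# Illustrative key names and counts only (not taken from the real status page).
print(show_three_counts_xml("results_ready", 10,
                            "results_ready_ap", 20,
                            "results_ready_sah_v8", 30))
```

With the corrected third line, each element carries its own count instead of repeating the Astropulse figure.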

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918836 - Posted: 14 Feb 2018, 20:38:22 UTC - in response to Message 1918821. Thanks for the comment Richard about where the Haveland graphs get their data. I think that #2 is the likely cause as the break in the graphs seems to occur regularly every night at almost exactly the same time. I would think that #3 would be more variable as the data going over the connection is likely a lot more variable in its traffic load. So do we know the name of the server that pulls the XML data from the SSP to publish to the Haveland graphs? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918836 ·
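
The regularity argument in the post above could be checked against recorded sample times. The following is a rough sketch under stated assumptions: the ten-minute sampling interval is the status page update cycle mentioned elsewhere in this thread, the function names are hypothetical, and the sample data is synthetic, generated only to exercise the code.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected=timedelta(minutes=10),
              tolerance=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where samples are missing."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if b - a > expected + tolerance]


def gap_start_spread(gaps):
    """Minutes between the earliest and latest time of day a gap began.

    A small spread suggests a scheduled cause; a large one suggests
    something variable such as congestion. Ignores midnight wrap-around.
    """
    minutes = [start.hour * 60 + start.minute for start, _ in gaps]
    return max(minutes) - min(minutes) if minutes else 0


# Synthetic data: two days of ten-minute samples with 07:10-07:40 missing.
start = datetime(2018, 2, 14)
samples = [start + timedelta(minutes=10 * i) for i in range(288)]
samples = [t for t in samples
           if not (7 * 60 + 10 <= t.hour * 60 + t.minute <= 7 * 60 + 40)]

gaps = find_gaps(samples)
print(gaps)
print(gap_start_spread(gaps))
```

A spread near zero across many nights would fit a scheduled pause on one machine better than variable line congestion, though it cannot by itself tell possibilities 1 and 2 apart.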

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1918850 - Posted: 14 Feb 2018, 22:08:41 UTC - in response to Message 1918836. Thanks for the comment Richard about where the Haveland graphs get their data. I think that #2 is the likely cause as the break in the graphs seems to occur regularly every night at almost exactly the same time. I would think that #3 would be more variable as the data going over the connection is likely a lot more variable in its traffic load. So do we know the name of the server that pulls the XML data from the SSP to publish to the Haveland graphs? Wrong question. The same server renders the data, whether it's requested in html form or xml form - it's all done in the single sah_status.php file I linked. So I guess that would be muarae1 - the web (and god knows what else) server that you complain about being unresponsive each morning. ID: 1918850 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304	Message 1918859 - Posted: 14 Feb 2018, 22:48:26 UTC - in response to Message 1918850. Last modified: 14 Feb 2018, 23:27:10 UTC So I guess that would be muarae1 - the web (and god knows what else) server that you complain about being unresponsive each morning. Web site & forums become slow/unresponsive/timeout. Scheduler is out of reach. No server status updates (Haveland graphs). It's generally a 30-45 min period. Lately 45 min has been more common. And it's now occurring about 1 hour later than it used to. Edit- I can't remember when this started occurring, but I'm pretty sure it was very late last year (Nov, Dec?) Grant Darwin NT ID: 1918859 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918884 - Posted: 15 Feb 2018, 1:41:41 UTC - in response to Message 1918859. For me, granted I have not been sitting in front of the computer at exactly the same time every night, the unresponsiveness occurs around 07:15 UTC and usually lasts for 30 - 45 minutes, and the site becomes available around 07:45 UTC. Can anyone follow up my post, look into the Haveland graphs and verify what I see with regard to the UTC time under each graph? I see the graph legend off by 1 hour UTC at all times. But the graph dropout is exactly in sync with when I have the site go unresponsive. For example my computer indicates the time is 01:38 UTC 15 Feb 2018 and the Haveland graphs are all showing 02:30 UTC 15 Feb 2018. That accounts for the SSP ten minute update cycle. They are showing a DST offset from last November still. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918884 ·
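
The arithmetic behind that observation can be spelled out in a short sketch. It uses only the two times quoted in the post, the ten-minute update cycle it mentions, and the one-hour offset the poster suspects; the variable names are illustrative, and the DST explanation remains the poster's hypothesis rather than something this check confirms.

```python
from datetime import datetime, timedelta

computer_utc = datetime(2018, 2, 15, 1, 38)  # time shown by the poster's computer
graph_label = datetime(2018, 2, 15, 2, 30)   # time shown under the graphs
update_cycle = timedelta(minutes=10)         # status page update cycle, per the post
suspected_offset = timedelta(hours=1)        # stale DST offset suspected by the poster

# Remove the suspected offset, then see how old the newest sample would be.
corrected_label = graph_label - suspected_offset
sample_age = computer_utc - corrected_label

# If the one-hour theory holds, the newest sample should be less than
# one update cycle old and should not lie in the future.
consistent = timedelta(0) <= sample_age < update_cycle

print(corrected_label.strftime("%H:%M"), sample_age, consistent)
```

After removing one hour the label reads 01:30, eight minutes before the computer's 01:38, which fits inside a single ten-minute update cycle.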

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918902 - Posted: 15 Feb 2018, 3:00:41 UTC The results awaiting purge has now exceeded 7 million. That has made it impossible to view any of my tasks on my fastest crunchers because the database times out. They need to get those results purged and back down to reasonable levels. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918902 ·

Stargate (SA) Volunteer tester Send message Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0	Message 1918903 - Posted: 15 Feb 2018, 3:05:08 UTC It might get done during the "Lag o'Clock" period :/ ID: 1918903 ·

Stargate (SA) Volunteer tester Send message Joined: 4 Mar 10 Posts: 1854 Credit: 2,258,721 RAC: 0	Message 1918942 - Posted: 15 Feb 2018, 6:52:18 UTC 5:20pm and so far no lag looks promising ID: 1918942 ·

Ghia Send message Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1918947 - Posted: 15 Feb 2018, 7:21:12 UTC Started here at 7:13 UTC. Humans may rule the world...but bacteria run it... ID: 1918947 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13732 Credit: 208,696,464 RAC: 304	Message 1918952 - Posted: 15 Feb 2018, 7:59:58 UTC - in response to Message 1918942. Last modified: 15 Feb 2018, 8:01:51 UTC 5:20pm and so far no lag looks promising Didn't notice any web site issues (not that I was doing much here at the time), but as per usual from 16:45 till 17:25 (CST (Australia)) no Scheduler contact was possible. Edit- And the Haveland graphs show the usual small gap, then drop & surge in Received-last-hour numbers. Grant Darwin NT ID: 1918952 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1918957 - Posted: 15 Feb 2018, 8:21:24 UTC - in response to Message 1918952. Last modified: 15 Feb 2018, 8:22:01 UTC Missed it. Was watching the telly. Came in to check on the computers and saw they were down on work. Looked back through the logs and see the first no connection event at 07:09 UTC. The big jump in returned tasks is a good telltale that many others were unable to contact the servers to report and get new work.
Keith-Windows7
3196 SETI@home 2/14/2018 23:09:06 Sending scheduler request: To fetch work.
3197 SETI@home 2/14/2018 23:09:06 Reporting 4 completed tasks
3198 SETI@home 2/14/2018 23:09:06 Requesting new tasks for CPU and NVIDIA GPU
3199 SETI@home 2/14/2018 23:09:28 Scheduler request failed: Couldn't connect to server
3200 2/14/2018 23:09:29 Project communication failed: attempting access to reference site
3201 2/14/2018 23:09:31 Internet access OK - project servers may be temporarily down.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1918957 ·
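
Finding the first failed contact in a log like the one above can be automated. This is a minimal sketch that assumes the line layout exactly as pasted in the post (index, optional project name, local date and time, message). The UTC-8 offset is an assumption inferred from the post itself, which pairs the 23:09 local entry with 07:09 UTC; the function and constant names are hypothetical.

```python
import re
from datetime import datetime, timedelta

LOG = """\
3196 SETI@home 2/14/2018 23:09:06 Sending scheduler request: To fetch work.
3197 SETI@home 2/14/2018 23:09:06 Reporting 4 completed tasks
3198 SETI@home 2/14/2018 23:09:06 Requesting new tasks for CPU and NVIDIA GPU
3199 SETI@home 2/14/2018 23:09:28 Scheduler request failed: Couldn't connect to server
3200 2/14/2018 23:09:29 Project communication failed: attempting access to reference site
3201 2/14/2018 23:09:31 Internet access OK - project servers may be temporarily down.
"""

# index, optional project name, local timestamp, message
LINE = re.compile(
    r"^(\d+)\s+(?:(\S+)\s+)?"
    r"(\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2})\s+(.*)$"
)

# Assumption: the log uses local time eight hours behind UTC.
LOCAL_UTC_OFFSET = timedelta(hours=-8)


def first_failure_utc(log_text):
    """Return the UTC time of the first failed scheduler request, or None."""
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match and "Scheduler request failed" in match.group(4):
            local = datetime.strptime(match.group(3), "%m/%d/%Y %H:%M:%S")
            return local - LOCAL_UTC_OFFSET
    return None


print(first_failure_utc(LOG))
```

Run over several nights of logs, the same function would give a list of start times to compare with the gap in the graphs.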

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1919055 - Posted: 15 Feb 2018, 19:26:53 UTC - in response to Message 1918884. Can anyone follow up my post, look into the Haveland graphs and verify what I see with regard to the UTC time under each graph? I see the graph legend off by 1 hour UTC at all times. My Haveland times have always been out by 1 hour for me - for as long as I can remember. I have always thought it was because of the strange time zone I'm in that switches between Central and Mountain. ID: 1919055 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.