Panic Mode On (116) Server Problems?

Author	Message
arkayn Volunteer tester Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0	Message 1987648 - Posted: 28 Mar 2019, 20:36:56 UTC Time to create another new thread, we are over critical mass in the old thread. I will start off with the same image I posted at the end of the last thread. ID: 1987648 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1987655 - Posted: 28 Mar 2019, 22:08:10 UTC - in response to Message 1987648. I have been creating these threads for 10.5 years. And speaking of that, as we are past 600 again, I think I will create a new one again. . . I believe these threads you create are without doubt the MOST used threads in the system ... :) Stephen :) or should that be :( ID: 1987655 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1987656 - Posted: 28 Mar 2019, 22:18:48 UTC Still having major download issues and keeping on top of the stalled and backoffed downloads. I think Eric's change is the cause of the issue. (My workaround was just to not let connection attempts sit in the local queues for long periods of time. Quick drops are often much better than those that hang around and prevent other connections. That seems to have fixed the log jam, but there may still be people who can't connect.) Whatever he changed to shorten the time a connection attempt sits in the local queue is not long enough. The tasks don't even start to download, just immediately go to backoff when the client asks for work. His comment that it might affect people is true, though I can connect, but I can't maintain a steady download queue and some tasks always stall out on the connection leaving them hanging around to prevent a normal client connection at the normal intervals. Until those stalled downloads clear, I don't ask for work which could be for several hours depending on the backoff length. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1987656 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1987660 - Posted: 28 Mar 2019, 22:43:39 UTC - in response to Message 1987656. Last modified: 28 Mar 2019, 22:45:30 UTC Still having major download issues and keeping on top of the stalled and backoffed downloads. I think Eric's change is the cause of the issue. . . Since the problem existed before Eric made the change it is certainly NOT the cause but it may be an imperfect cure. It may, as you say, need to be a trifle longer to prevent momentary traffic conflicts from causing the instant and quickly prolonged backoffs. Stephen ID: 1987660 ·

Cosmic_Ocean Send message Joined: 23 Dec 00 Posts: 3027 Credit: 13,516,867 RAC: 13	Message 1987662 - Posted: 28 Mar 2019, 23:06:36 UTC haven't been here in a while, heard CPU fan revved up and wondered why, saw one AP was running. Checked over in Manager and saw one running.. 8 were downloading. All in project backoff. Came here to see what's up with that.. saw there's complications. Did the only sensible thing you CAN do.. and I remember having to do this all the time back before the move down to the co-lo... hammer the retry button, of course They DO start transferring after 1-5 tries Linux laptop: record uptime: 1511d 20h 19m (ended due to the power brick giving-up) ID: 1987662 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1987663 - Posted: 28 Mar 2019, 23:08:52 UTC - in response to Message 1987660. Still having major download issues and keeping on top of the stalled and backoffed downloads. I think Eric's change is the cause of the issue. . . Since the problem existed before Eric made the change it is certainly NOT the cause but it may be an imperfect cure. It may, as you say, need to be a trifle longer to prevent momentary traffic conflicts from causing the instant and quickly prolonged backoffs. Stephen Yes we had download issues before. What I was commenting on was the "patch on top of the patch". He made changes back when we lost one entire download server and were reduced to one server. He made some configuration changes to get it back online that were not the normal or previous configuration, if I remember. Now the aforementioned patch on top of that patch. Not optimal. Could we return to the previous download server configuration before that failure? Things were going great beforehand. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1987663 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1987664 - Posted: 28 Mar 2019, 23:10:45 UTC - in response to Message 1987662. haven't been here in a while, heard CPU fan revved up and wondered why, saw one AP was running. Checked over in Manager and saw one running.. 8 were downloading. All in project backoff. Came here to see what's up with that.. saw there's complications. Did the only sensible thing you CAN do.. and I remember having to do this all the time back before the move down to the co-lo... hammer the retry button, of course They DO start transferring after 1-5 tries Not in my case. If I hammer the retry button I just increment the backoff by 45 minutes till it hits 6 hours. A fruitless exercise that makes matters worse than before I did anything. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1987664 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1987666 - Posted: 28 Mar 2019, 23:33:47 UTC It's getting late and I still have download problems. The Hammer deal works for me, it's about all that does work at the moment. The biggest problem is the Mac with 5 GPUs and a 500 WU cache, it can't seem to make it 5 minutes without stalling a download. This morning it was Out of work with a cache full of stalled downloads, can't leave it more than a few hours or it stops working. ID: 1987666 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1987676 - Posted: 29 Mar 2019, 0:13:26 UTC May have figured something out on my download issues. Ian's suggestion to revert to stock <max_file_xfers_per_project>2</max_file_xfers_per_project> seems to have improved things greatly. But did not solve the issue entirely. I was still having stalled downloads that turned into backoffs. I was also getting the instant retries though the max_file_xfers change greatly reduced those but didn't eliminate them. What I do think made some difference is putting the <http_transfer_timeout></http_transfer_timeout> back to stock 300 seconds. I had changed that for the earlier problem of only having one download server along with the <max_file_xfers_per_project>2</max_file_xfers_per_project> change a month ago. That value was still set for 90 seconds. I think I realized that with the reduction of the allowed connections from my normal 8 connections to the project with my many hundred plus task downloads on every connection, and with the length of time it now takes to download that many tasks, two at a time, that I may have exceeded the 90 second http_transfer_timeout. That may have been what was forcing so many tasks into backoff and retries. Now that I allow the connection to last for 300 seconds, I am not getting retries or backoffs. Or if I do get a retry, the connection is still alive when the first retry counts down. So if anyone else had made that change in the parameter, I suggest nulling it out again. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1987676 ·
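
For reference, the two client settings discussed in the post above are set in the BOINC client's cc_config.xml. The fragment below is only a minimal sketch of the stock values described there (2 transfers per project, 300-second transfer timeout). Placing both elements inside the <options> section is an assumption about the file layout, not something quoted in the thread, and an existing cc_config.xml will likely contain other settings that should be left in place.

```xml
<!-- Sketch of a cc_config.xml with the two settings from the post above. -->
<!-- Values are the stock ones mentioned in the thread; layout is assumed. -->
<cc_config>
  <options>
    <!-- Limit simultaneous file transfers per project (stock value per the post: 2) -->
    <max_file_xfers_per_project>2</max_file_xfers_per_project>
    <!-- Seconds a stalled transfer may sit before timing out (stock value per the post: 300) -->
    <http_transfer_timeout>300</http_transfer_timeout>
  </options>
</cc_config>
```

Deleting either line from the file should have the same effect as setting the stock value, which matches the "nulling it out" suggestion in the post. The client has to re-read its config files or be restarted before a change takes effect.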

Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 1987691 - Posted: 29 Mar 2019, 2:39:00 UTC - in response to Message 1987676. Nice Keith. My systems have been pretty hands off for me all day. Once I changed it back to max xfers 2, I pretty much didn't have to touch it. Now I wonder what's going on with the stagnant RAC. Prior to the outage on Tuesday, my RAC was steadily climbing. I took the hit from the outage and the beast running out of work. But expected RAC to recover after a day or two like it usually does. But still RAC has been stagnant for several days now. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ID: 1987691 ·

Wiggo Send message Joined: 24 Jan 00 Posts: 38618 Credit: 261,360,520 RAC: 489	Message 1987693 - Posted: 29 Mar 2019, 2:54:22 UTC It's sorta like the upload problem we had before the last outage, but it seems now that I've gotta check my downloads every hour or 2 to stay on top of things. :-( I'm sorta glad that I'm not running Linux w/ SS yet. Cheers. ID: 1987693 ·

Keith Myers Volunteer tester Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1987698 - Posted: 29 Mar 2019, 4:14:09 UTC I noticed that the stats export to BOINCStats has changed from around 1430 hours UTC to now around 2130 hours UTC. So the later time might mean it doesn't update the stats till the next day. I too have noticed a rather severe drop in RAC across all hosts. Normally would have recovered by now. But maybe the change in data mix is the thing affecting the RAC. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) ID: 1987698 ·

Unixchick Send message Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1987702 - Posted: 29 Mar 2019, 5:02:20 UTC The status page is missing for me. I hope it is only me, or just a weird fluke that clears up in 5 minutes. no panic, just weirdness. Hopefully all the systems are working and it is only a page problem. ID: 1987702 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 14010 Credit: 208,696,464 RAC: 304	Message 1987705 - Posted: 29 Mar 2019, 5:33:18 UTC Last modified: 29 Mar 2019, 5:38:35 UTC I see we are still having download issues. Came home to find one system out of CPU work as there were several downloads in super extended backoff mode. Cleared those, and the next batch to download was interesting. About half started downloading straight off and at pretty good speed. The others took quite a while to start downloading, and they tended to start & stop, resulting in download speeds of around 10kB/s. So one download server is now mostly OK, the other still borked? Edit- Next couple of mass downloads, all managed to download at reasonable speeds. Grant Darwin NT ID: 1987705 ·

John Neale Volunteer tester Send message Joined: 16 Mar 00 Posts: 634 Credit: 7,246,513 RAC: 9	Message 1987708 - Posted: 29 Mar 2019, 5:56:57 UTC - in response to Message 1987702. The status page is missing for me. I hope it is only me, or just a weird fluke that clears up in 5 minutes. no panic, just weirdness. Hopefully all the systems are working and it is only a page problem. Nope, not just you. The Server status page is blank for me too. :) ID: 1987708 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 14010 Credit: 208,696,464 RAC: 304	Message 1987710 - Posted: 29 Mar 2019, 6:10:32 UTC - in response to Message 1987708. The status page is missing for me. I hope it is only me, or just a weird fluke that clears up in 5 minutes. no panic, just weirdness. Hopefully all the systems are working and it is only a page problem. Nope, not just you. The Server status page is blank for me too. :) And the Haveland graphs are starved for data as well. Grant Darwin NT ID: 1987710 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1987716 - Posted: 29 Mar 2019, 6:43:55 UTC Last modified: 29 Mar 2019, 7:21:51 UTC It appears the Host Web Pages aren't updating either. I am trying to test an App, it would be nice if the Web pages were working. Oh well, I guess it's tested enough anyway... Hey, the Web Pages are working again. I'm going to bed anyway, got all ready, and then it started working again. Blah, false alarm. Only a couple of pages updated but they are still way behind. The other pages never updated. ID: 1987716 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 14010 Credit: 208,696,464 RAC: 304	Message 1987719 - Posted: 29 Mar 2019, 6:52:40 UTC Still getting the occasional instant/near instant download timeout. Grant Darwin NT ID: 1987719 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874	Message 1987721 - Posted: 29 Mar 2019, 8:24:48 UTC - in response to Message 1987710. And the Haveland graphs are starved for data as well. Yes, the data sources went dark at - it seems - exactly 05:00 UTC. But in the two hours before that, the replica database started to fall behind and the MB result creation rate fell to near zero. Not looking good. ID: 1987721 ·

Tom M Volunteer tester Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462	Message 1987729 - Posted: 29 Mar 2019, 10:10:25 UTC Last modified: 29 Mar 2019, 10:31:40 UTC Just had this error show up at the top of my browser Notice: unserialize(): Error at offset 4074 of 4096 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/inc/user.inc on line 43 Then it went away. ---edit--- Then it came back. Notice: unserialize(): Error at offset 4074 of 4096 bytes in /disks/carolyn/b/home/boincadm/projects/sah/html/inc/user.inc on line 43 Tom A proud member of the OFA (Old Farts Association). ID: 1987729 ·

©2026 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.