The Server Issues / Outages Thread - Panic Mode On! (117)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022157 - Posted: 7 Dec 2019, 18:42:27 UTC - in response to Message 2022136. The results out in the field are over 6 million and it is still splitting fine (at a high rate). It still can't keep up, as it has a large backlog of "holes" to fill. Yep. Even with a 0.2 day cache setting, my Linux system is still unable to get work often enough to actually fill its cache. Still taking 4-8 Scheduler requests to get any work. Grant Darwin NT

Siran d'Vel'nahr Volunteer tester Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238	Message 2022160 - Posted: 7 Dec 2019, 18:50:58 UTC Greetings, Is there any reason why, now that the 100 tasks per day per device limit is gone, my PCs don't report finished WUs regularly? I have to do it manually with BT because they just add up and stick around. My Pis, laptop and tablet do not seem to be affected. They aren't getting any more WUs than they were before. They are reporting regularly. Have a great day! :) Siran CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 2022162 - Posted: 7 Dec 2019, 19:00:55 UTC - in response to Message 2022160. my PCs don't report finished WUs regularly now? The PCs don't report work 'regularly': they contact the servers when they feel they want new work. If you were regularly asking for more work than the limits allowed (100 tasks per device), your PCs will have been perpetually hungry, and - like Oliver Twist - always asking for more. If they now have their fill, there's no need to ask so often, and nothing to trigger the early reporting. If there ever is any unreported work hanging around, it should be sent automatically after not longer than 1 hour. That's quick enough - you can relax your trigger finger. That reduction in the number of wasteful scheduler contacts (asking for work that will never be sent) might even be a beneficial side-effect of the new limits, once they've settled down.
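The reporting behaviour described in the post above can be sketched roughly as follows. This is a simplified illustration only, not BOINC's actual client code: the function name `should_contact_scheduler` and its parameters are hypothetical, and the one-hour figure is simply the one quoted in the post.

```python
from datetime import timedelta
from typing import Optional

# "Not longer than 1 hour", as stated in the post above.
MAX_REPORT_DELAY = timedelta(hours=1)


def should_contact_scheduler(wants_work: bool,
                             oldest_unreported: Optional[timedelta]) -> bool:
    """Hypothetical decision rule (illustrative, not BOINC's real logic).

    wants_work: True if the cache is below the level the client wants.
    oldest_unreported: how long the oldest finished-but-unreported task has
        been waiting, or None if nothing is waiting to be reported.
    """
    if wants_work:
        # A request for work also carries any finished results with it.
        return True
    if oldest_unreported is not None and oldest_unreported >= MAX_REPORT_DELAY:
        # Nothing to fetch, but finished work has waited long enough.
        return True
    return False
```

Under this model a "hungry" host contacts the scheduler constantly and so reports early as a side effect, while a host with a full cache only reports once the delay expires.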

Zalster Volunteer tester Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 2022164 - Posted: 7 Dec 2019, 19:14:26 UTC - in response to Message 2022162. I'm more worried about post maintenance work reporting of that many work units out there.

juan BFP Volunteer tester Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799	Message 2022167 - Posted: 7 Dec 2019, 19:19:08 UTC - in response to Message 2022164. I'm more worried about post maintenance work reporting of that many work units out there. Results out in the field: 6,524,999. Let's see what we have in a few hours.

Sirius B Volunteer tester Joined: 26 Dec 00 Posts: 24879 Credit: 3,081,182 RAC: 7	Message 2022168 - Posted: 7 Dec 2019, 19:21:02 UTC Nice, this should see me through to Xmas. :-)
07/12/2019 15:31:53 | SETI@home | Scheduler request completed: got 100 new tasks
07/12/2019 15:36:57 | SETI@home | Sending scheduler request: To fetch work.
07/12/2019 15:36:57 | SETI@home | Requesting new tasks for CPU
07/12/2019 15:37:01 | SETI@home | Scheduler request completed: got 96 new tasks
07/12/2019 15:42:08 | SETI@home | Scheduler request completed: got 0 new tasks
07/12/2019 15:42:08 | SETI@home | No tasks sent
07/12/2019 15:42:08 | SETI@home | This computer has reached a limit on tasks in progress

Kiska Volunteer tester Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0	Message 2022180 - Posted: 7 Dec 2019, 20:06:07 UTC - in response to Message 2022067. I hope we don't come to regret this. That's the million dollar question. Have we been given what we wished for, or is it just because something's broken? If we have been given it, I would have thought it would be a case of gradually increasing the limits to make sure the system doesn't fall over in a screaming heap. To just remove them seems rather risky. Especially considering how sluggish the system has been over the last few weeks (although now without the Haveland graphs we can't actually see that occurring anymore). I guess it looks fine https://munin.kiska.pw/munin/Munin-Node/Munin-Node/index.html#setiathome

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022185 - Posted: 7 Dec 2019, 20:24:42 UTC - in response to Message 2022180. Last modified: 7 Dec 2019, 20:29:30 UTC https://munin.kiska.pw/munin/Munin-Node/Munin-Node/index.html#setiathome Thanks for that, I was really missing the graphs. Edit - is there one available for splitter output (Current result creation rate) & Results received in last hour? Grant Darwin NT

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022189 - Posted: 7 Dec 2019, 20:31:45 UTC - in response to Message 2022154. At my last company, no new or changed software was ever rolled out on a Friday, Wednesday was preferred, as being the farthest point from a weekend. Monday, people recovering from the weekend, Tuesday getting up to speed, Thursday thinking about the weekend, Friday, winding down. ;-) And I'd expect they did it on a stable functional system, not after a system crash. I've also seen mention of a gradual increase in work limits. To me, that would have been a 50% increase, then see how it goes for a few days (even a week), and if OK another 50% increase (over the original limit value). A 400% increase doesn't seem all that gradual to me. Grant Darwin NT
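To put numbers on the comparison in the post above, here is a small worked example. It assumes the 400% figure is measured against the old limit of 100 tasks per device mentioned earlier in the thread; the actual new server-side limits are not stated here, and the function names are made up for illustration.

```python
ORIGINAL_LIMIT = 100  # tasks per device, the figure quoted earlier in the thread


def stepped_limits(original: int, step_percent: int, steps: int) -> list:
    """Limit after each step, with every step sized relative to the original."""
    return [original + original * step_percent * n // 100
            for n in range(1, steps + 1)]


def single_jump(original: int, increase_percent: int) -> int:
    """Limit after one increase of the given percentage."""
    return original + original * increase_percent // 100


print(stepped_limits(ORIGINAL_LIMIT, 50, 2))  # [150, 200]
print(single_jump(ORIGINAL_LIMIT, 400))       # 500
```

On those assumptions, two cautious 50% steps would end at double the old limit, while a single 400% increase lands at five times the old limit in one go.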

Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 2022190 - Posted: 7 Dec 2019, 20:33:02 UTC maybe not the best methodology, but things are slowly coming back to normal. all seems ok right now. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Siran d'Vel'nahr Volunteer tester Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238	Message 2022192 - Posted: 7 Dec 2019, 20:41:53 UTC - in response to Message 2022162. Hi Richard, If you were regularly asking for more work than the limits allowed (100 tasks per device), your PCs will have been perpetually hungry, and - like Oliver Twist - always asking for more. If they now have their fill, there's no need to ask so often, and nothing to trigger the early reporting. I do believe I understand now. I had gotten back from the store and saw that I had about 60+ WUs to report. I went ahead and reported them. I was just gone watching a video for about an hour or so, about as long as when I was at the store, and there weren't nearly as many when I checked BT. I went ahead and reported them before I read your reply. Now, after reading the above explanation, I will just leave well enough alone. Thanks for this! :) Have a great day! :) Siran CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022195 - Posted: 7 Dec 2019, 20:52:28 UTC - in response to Message 2022192. I had gotten back from the store and saw that I had about 60+ WUs to report. I went ahead and reported them. I was just gone watching a video for about an hour or so, about as long as when I was at the store, and there weren't nearly as many when I checked BT. I went ahead and reported them before I read your reply. Now, after reading the above explanation, I will just leave well enough alone. Thanks for this! :) Sounds like the behaviour you get with a large Additional days cache setting. Grant Darwin NT

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022197 - Posted: 7 Dec 2019, 20:57:50 UTC - in response to Message 2022190. maybe not the best methodology, but things are slowly coming back to normal. all seems ok right now. Apart from the growing Assimilator backlog, and no Ready-to-send buffer (which is why my Linux system is still unable to build up its cache, although it is doing a lot better than it was...) Looks like the Results in progress is starting to level off a bit now, so that should help things along considerably. Grant Darwin NT

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 2022198 - Posted: 7 Dec 2019, 20:59:09 UTC - in response to Message 2022195. No, not really. I have 0 day additional cache setting and I only report crunched results every hour because I am way in excess of my normal cache settings because of the changes. Dropping the spoofed gpu count down considerably and also changing the daily cache level to get to 200 gpu tasks. Half a day only netted me around 130-150 cpu tasks after the change. Moved to 0.75 day for the primary cache. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022199 - Posted: 7 Dec 2019, 21:06:04 UTC - in response to Message 2022198. Last modified: 7 Dec 2019, 21:12:15 UTC No, not really. I have 0 day additional cache setting and I only report crunched results every hour because I am way in excess of my normal cache settings because of the changes. I understand that, but if you have a large Additional days setting, even when you drop below your "Store at least" value the work won't be reported till you hit the "Up to an Additional" setting. In my case I have to report frequently: after outages, reporting more than 70 tasks at a time results in Scheduler issues and nothing gets reported, so I just have the Additional setting at 0.02. I'll see how things go when my Windows system gets down to the current cache setting, and my Linux system finally gets up to it. I'd like to have 24 hours' work. As short as the weekly outages have been, it's still been taking the system a while to recover & build the Ready-to-send buffer back up. What the effect of the new server-side limits will be is anyone's guess. Ideally, none if everyone's caches are at their new maximum limits. But if they're low going into an outage, I'd expect the after-outage server recovery period to be much, much longer. Look at how long it's taking to get a Ready-to-send buffer again, even with the splitters working as well as they have for a while. Grant Darwin NT

Siran d'Vel'nahr Volunteer tester Joined: 23 May 99 Posts: 7379 Credit: 44,181,323 RAC: 238	Message 2022200 - Posted: 7 Dec 2019, 21:10:07 UTC - in response to Message 2022195. I had gotten back from the store and saw that I had about 60+ WUs to report. I went ahead and reported them. I was just gone watching a video for about an hour or so, about as long as when I was at the store, and there weren't nearly as many when I checked BT. I went ahead and reported them before I read your reply. Now, after reading the above explanation, I will just leave well enough alone. Thanks for this! :) Sounds like the behaviour you get with a large Additional days cache setting. Hi Grant, I have my settings at "Store at least [ 1 ] days of work" and "Store up to an additional [ 0.5 ] days of work". Do I need to change either of those 2 settings? I believe I have the old Linux PC set that way too. Have a great day! :) Siran CAPT Siran d'Vel'nahr - L L & P _\\// Winders 11 OS? "What a piece of junk!" - L. Skywalker "Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath

Kiska Volunteer tester Joined: 31 Mar 12 Posts: 302 Credit: 3,067,762 RAC: 0	Message 2022203 - Posted: 7 Dec 2019, 21:12:25 UTC - in response to Message 2022185. https://munin.kiska.pw/munin/Munin-Node/Munin-Node/index.html#setiathome Thanks for that, I was really missing the graphs. Edit - is there one available for splitter output (Current result creation rate) & Results received in last hour? I'll put that in once I remember how I set up munin :D

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022204 - Posted: 7 Dec 2019, 21:13:36 UTC - in response to Message 2022200. I have my settings at "Store at least [ 1 ] days of work" and "Store up to an additional [ 0.5 ] days of work". Do I need to change either of those 2 settings? I believe I have the old Linux PC set that way too. A definite, maybe... See how things go over the next day or so as they (hopefully) settle down. Grant Darwin NT

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 2022205 - Posted: 7 Dec 2019, 21:14:17 UTC - in response to Message 2022203. I'll put that in once I set up munin :D Excellent! Grant Darwin NT

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22205 Credit: 416,307,556 RAC: 380	Message 2022208 - Posted: 7 Dec 2019, 21:24:44 UTC - in response to Message 2022200. Leave the "Store x days" setting as it is, but reduce the "Store additional days" setting to a much smaller fraction, say 0.1, or even 0.02. This will ensure your cache remains fairly constant in size by being topped up regularly. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
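A toy simulation may help show why a smaller "additional" value keeps the cache steadier. It is only a sketch under loud assumptions, not BOINC's real work-fetch code: it assumes the client asks for work only once the buffer drops below "Store at least", then refills to "Store at least" plus "additional" in one request, that work is crunched at a constant rate, and that the server always has work to send. The function name `count_work_requests` is hypothetical.

```python
def count_work_requests(store_at_least_days: float,
                        additional_days: float,
                        sim_hours: int = 48) -> int:
    """Count scheduler requests in a toy low-water/high-water buffer model.

    The buffer is measured in minutes of work and drains one minute of work
    per minute of wall-clock time (an assumption of this sketch).
    """
    low = round(store_at_least_days * 24 * 60)
    high = round((store_at_least_days + additional_days) * 24 * 60)
    buffered = high          # start with a full cache
    requests = 0
    for _ in range(sim_hours * 60):
        buffered -= 1        # one minute of work crunched
        if buffered < low:   # below "Store at least": ask for work
            buffered = high  # assume the server fills the whole request
            requests += 1
    return requests


print(count_work_requests(1, 0.5))   # 3 large top-ups in 48 hours
print(count_work_requests(1, 0.02))  # 96 small top-ups in 48 hours
```

In this model the 1 + 0.5 day setting lets the cache swing by about half a day between contacts, while 1 + 0.02 tops it up roughly every half hour, so any finished work also goes back to the server in smaller batches.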
