The Server Issues / Outages Thread - Panic Mode On! (117)

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022253 - Posted: 8 Dec 2019, 0:01:01 UTC - in response to Message 2022249.  

I'm starting to see a small amount in the Ready to Send Queue... 40K. I take this as a good sign. Are some of the faster machines now getting some WUs to fill the cache??

Think that is the effect of all the spoofed clients reducing their gpu count to reasonable levels owing to the 400-per-GPU limit now. I certainly backed off considerably on all my hosts. Still working through the overabundance of gpu tasks, trying to find the new reduced cache floor. Haven't asked for gpu work since discovering the new limits this morning.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2022253
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1857
Credit: 268,616,081
RAC: 1,349
United States
Message 2022254 - Posted: 8 Dec 2019, 0:03:09 UTC - in response to Message 2022241.  


. . Damn, a perfectly good joke ruined/wasted ...
Nope, it just backfired and got much funnier ...
+1 !!
ID: 2022254
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13861
Credit: 208,696,464
RAC: 304
Australia
Message 2022258 - Posted: 8 Dec 2019, 0:09:44 UTC - in response to Message 2022253.  

I'm starting to see a small amount in the Ready to Send Queue... 40K. I take this as a good sign. Are some of the faster machines now getting some WUs to fill the cache??
Think that is the effect of all the spoofed clients reducing their gpu count to reasonable levels owing to the 400-per-GPU limit now.
And hosts such as my Linux one that kept getting mostly "Project has no tasks available" responses when trying for work; now that it's regularly getting work, it's finally managed to fill its cache.
The Results-in-progress line is now more horizontal than vertical; it's still going to take a while for things to settle down but the end is in sight. Looks like there will be an extra 1.8 million or so WUs out with hosts now (around 6.8 million in total).
Grant
Darwin NT
ID: 2022258
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022267 - Posted: 8 Dec 2019, 0:44:11 UTC - in response to Message 2022254.  


. . Damn, a perfectly good joke ruined/wasted ...
Nope, it just backfired and got much funnier ...
+1 !!


. . OK, I'll go get my big red nose ... :)

Stephen
ID: 2022267
Wiggo
Joined: 24 Jan 00
Posts: 36982
Credit: 261,360,520
RAC: 489
Australia
Message 2022269 - Posted: 8 Dec 2019, 0:57:28 UTC - in response to Message 2022267.  


. . Damn, a perfectly good joke ruined/wasted ...

Nope, it just backfired and got much funnier ...
+1 !!
. . OK, I'll go get my big red nose ... :)

Stephen
At least you got your laughs, Stephen (just not the way you thought). :-D

Cheers.
ID: 2022269
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022270 - Posted: 8 Dec 2019, 1:04:07 UTC - in response to Message 2022248.  

I'll put that in once I remember how I set up munin :D
Excellent!


Ok, I've added both graphs. But I don't have historical data since I just added them.

Thank you very much!

On my wishlist: Result turnaround time (last hour average)
It's a good indication of when there are a lot of shorties in the system.
Earlier this year (Aug), if the value went under 30h, I suspected the system would be in trouble within a couple of hours.
I have not yet seen trouble start again - 26h seems to be fine.

Thanks again!


Ummm.... sure, I'll add it when I have time.

So I guess soon™ is apt
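
For what it's worth, the indicator being wished for boils down to a simple threshold check (a minimal sketch in Python; the 30h and 26h figures come from the quoted post, which also shows the threshold drifts and would need recalibrating):

# Sketch of the wished-for indicator: a low last-hour average turnaround
# means lots of shorties in the system, which has preceded server trouble.
# ~30h was the trouble level in Aug; 26h seemed fine by Dec, so the
# threshold is a tunable, not a constant.
def shorties_warning(avg_turnaround_hours, threshold_hours=30.0):
    """True when average turnaround has dropped below the trouble level."""
    return avg_turnaround_hours < threshold_hours

print(shorties_warning(26.0))                        # True with the Aug threshold
print(shorties_warning(26.0, threshold_hours=25.0))  # False once retuned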
ID: 2022270
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13861
Credit: 208,696,464
RAC: 304
Australia
Message 2022295 - Posted: 8 Dec 2019, 3:28:33 UTC
Last modified: 8 Dec 2019, 3:28:52 UTC

Just had another bunch of stuck downloads in extended back-off; they ended up going through on the first manual retry, although several of them were rather reluctant.
Grant
Darwin NT
ID: 2022295
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13861
Credit: 208,696,464
RAC: 304
Australia
Message 2022330 - Posted: 8 Dec 2019, 7:34:47 UTC

Looks like Results-out-in-the-field has found its new level, but the splitters are still struggling to refill the Ready-to-send buffer.
Grant
Darwin NT
ID: 2022330
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022344 - Posted: 8 Dec 2019, 10:37:17 UTC - in response to Message 2022330.  

Looks like Results-out-in-the-field has found its new level, but the splitters are still struggling to refill the Ready-to-send buffer.


I think I've dealt with the errant plugin causing some gaps in the monitoring. But if Yafu doesn't sort out their response times... I'll have to drop them from graphing

Also, we should move this to another thread.
ID: 2022344
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13861
Credit: 208,696,464
RAC: 304
Australia
Message 2022466 - Posted: 9 Dec 2019, 9:33:10 UTC

Splitters still unable to refill Ready-to-send buffer.
Grant
Darwin NT
ID: 2022466
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2022484 - Posted: 9 Dec 2019, 15:27:33 UTC - in response to Message 2022466.  
Last modified: 9 Dec 2019, 15:29:28 UTC

Splitters still unable to refill Ready-to-send buffer.


The hourly return rate is high (148K) and the results out in the field have crossed the 7 million mark. The system is still filling a hole. I'm glad we have any RTS.

On further thought, I really like the extra WUs, and I think they will allow me to not connect on Tuesdays and skip the whole maintenance "hunger games" grab for WUs.

edit: I wonder if they changed the size of the RTS queue. There isn't a need for such a large RTS queue if we all have larger caches of them.
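
As a rough back-of-the-envelope check (a sketch in Python using the two figures above; the steady-state assumption is mine), those numbers imply close to a two-day average turnaround:

# Rough turnaround estimate from the server-status figures quoted above,
# via Little's law: items in system = arrival rate x average time in system.
in_field = 7_000_000     # results out in the field (from the post)
return_rate = 148_000    # results returned per hour (from the post)
print(f"average turnaround ~ {in_field / return_rate:.0f} hours")  # ~47 hours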
ID: 2022484
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022486 - Posted: 9 Dec 2019, 15:37:23 UTC - in response to Message 2022484.  

Looking at Kiska's RTS graph, it's very striking how quickly RTS dropped on Friday night. That says a lot of hosts were set to cache more than they were allowed, and might have been hammering on the server doors all this time.

That might, paradoxically, make the server load lighter in the future, once we've filled everyone up.
ID: 2022486
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022503 - Posted: 9 Dec 2019, 17:43:49 UTC - in response to Message 2022484.  

Splitters still unable to refill Ready-to-send buffer.


The hourly return rate is high (148K) and the results out in the field have crossed the 7 million mark. The system is still filling a hole. I'm glad we have any RTS.

On further thought, I really like the extra WUs, and I think they will allow me to not connect on Tuesdays and skip the whole maintenance "hunger games" grab for WUs.

edit: I wonder if they changed the size of the RTS queue. There isn't a need for such a large RTS queue if we all have larger caches of them.


. . The RTS is a buffer between splitter output and WU demand, and that demand is proportional to the number of results returned. Regardless of the size of the caches 'in the field', the RTS needs to be what it is so that work requests can be met; the higher the rate of returns, the bigger it needs to be, unless the splitter output itself stays significantly higher than the rate of returns fairly constantly.
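
. . Put another way (a minimal sketch in Python; all the rates below are illustrative except the ~148K/hour return rate mentioned earlier in the thread):

# Minimal sketch of the buffer dynamic described above: the Ready-to-send
# level only rises while the splitters outpace demand, and demand roughly
# tracks the rate of returns.
def simulate_rts(rts, splitter_rate, return_rate, hours):
    """Track the RTS level hour by hour, never going below zero."""
    levels = []
    for _ in range(hours):
        rts = max(0, rts + splitter_rate - return_rate)
        levels.append(rts)
    return levels

# Splitters running just behind demand: the buffer steadily drains.
print(simulate_rts(rts=600_000, splitter_rate=140_000,
                   return_rate=148_000, hours=12))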

Stephen

. .
ID: 2022503
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022504 - Posted: 9 Dec 2019, 17:51:01 UTC - in response to Message 2022486.  

Looking at Kiska's RTS graph, it's very striking how quickly RTS dropped on Friday night. That says a lot of hosts were set to cache more than they were allowed, and might have been hammering on the server doors all this time.

That might, paradoxically, make the server load lighter in the future, once we've filled everyone up.


. . Undoubtedly! Most of the hosts in the field request far more work than the previous limits allowed, which is why it was very 'brave' to suddenly increase those limits many-fold without some prior notice and recommendations, such as "reduce the size of your work requests". A host with a pair of older GPUs that can only process a couple of dozen WUs per day does not need a cache of 800 GPU WUs, and some old clunker out there with a Core 2 Duo and no GPU, crunching through a dozen or so WUs/day, does not need 200 CPU WUs. But that is a long-time bugbear of mine.
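
. . The arithmetic behind those two examples (a quick sketch; the per-day rates are the rough figures above, not measurements):

# Days of cached work implied by the examples above.
gpu_cache, gpu_per_day = 800, 24   # pair of older GPUs, ~2 dozen WUs/day
cpu_cache, cpu_per_day = 200, 12   # old Core 2 Duo, ~a dozen WUs/day
print(f"GPU cache lasts ~{gpu_cache / gpu_per_day:.0f} days")  # ~33 days
print(f"CPU cache lasts ~{cpu_cache / cpu_per_day:.0f} days")  # ~17 days

. . Weeks of queued work, on hosts that could comfortably run on a few days' worth.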

Stephen

:(
ID: 2022504
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14680
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022507 - Posted: 9 Dec 2019, 18:00:08 UTC - in response to Message 2022504.  

Some of us have been advising people to moderate their cache requests for decades - but the message never seems to get through :-(

It's hard to get people to think of themselves as part of a larger collective: every part of that collective has to work together, or we all suffer.
ID: 2022507
Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2022512 - Posted: 9 Dec 2019, 18:28:10 UTC - in response to Message 2022507.  

Some of us have been advising people to moderate their cache requests for decades - but the message never seems to get through :-(

It's hard to get people to think of themselves as part of a larger collective: every part of that collective has to work together, or we all suffer.

Hi Richard,

Resistance is futile, you will be assimilated. We will add your biological and technological distinctiveness to our own. lol ;)

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2022512
rob smith Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22555
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2022522 - Posted: 9 Dec 2019, 19:28:14 UTC - in response to Message 2022507.  

There are a couple of server-side "tricks" that could totally thwart attempts at having excessively large caches - I can well imagine the gnashing of teeth that would ensue if one of them were triggered....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2022522
juan BFP Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2022524 - Posted: 9 Dec 2019, 19:41:14 UTC
Last modified: 9 Dec 2019, 19:52:19 UTC

Current result creation rate **	0/sec	0/sec	0/sec	5m


Panic mode ON?

<edit> Never mind, it's back now.
ID: 2022524
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1857
Credit: 268,616,081
RAC: 1,349
United States
Message 2022529 - Posted: 9 Dec 2019, 20:27:46 UTC - in response to Message 2022507.  

Some of us have been advising people to moderate their cache requests for decades - but the message never seems to get through :-(

It's hard get get people to think of themselves as part of a larger collective: every part of that collective has to work together, or we all suffer.

People react to what they see, not what they're told, I believe.
In that regard, I think the previous low hard limits were in fact counterproductive. Personally, I like being able to set a realistic cache time and actually see an effect. If nothing else, return times should improve a bit based on improved CPU cache turnaround.
ID: 2022529
betreger Project Donor
Joined: 29 Jun 99
Posts: 11419
Credit: 29,581,041
RAC: 66
United States
Message 2022531 - Posted: 9 Dec 2019, 20:48:29 UTC

Methinks Tuesday will be very interesting with the low RTS cache.
ID: 2022531