The Server Issues / Outages Thread - Panic Mode On! (117)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (117)


Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022253 - Posted: 8 Dec 2019, 0:01:01 UTC - in response to Message 2022249.  

I'm starting to see a small amount in the Ready to Send Queue... 40K. I take this as a good sign. Are some of the faster machines now getting some WUs to fill the cache??

Think that is the effect of all the spoofed clients reducing their GPU counts to reasonable levels owing to the 400-per-GPU limit now. I certainly backed off considerably on all my hosts. Still working through the overabundance of GPU tasks, trying to find the new reduced cache floor. Haven't asked for GPU work since discovering the new limits this morning.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2022253 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 2022254 - Posted: 8 Dec 2019, 0:03:09 UTC - in response to Message 2022241.  


. . Damn, a perfectly good joke ruined/wasted ...
Nope, it just backfired and got much funnier ...
+1 !!
ID: 2022254 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13789
Credit: 208,696,464
RAC: 304
Australia
Message 2022258 - Posted: 8 Dec 2019, 0:09:44 UTC - in response to Message 2022253.  

I'm starting to see a small amount in the Ready to Send Queue... 40K. I take this as a good sign. Are some of the faster machines now getting some WUs to fill the cache??
Think that is the effect of all the spoofed clients reducing their GPU counts to reasonable levels owing to the 400-per-GPU limit now.
And hosts such as my Linux one that kept getting mostly "Project has no tasks available" responses when trying for work; now that it's regularly getting work, it's finally managed to fill its cache.
The Results-in-progress line is now more horizontal than vertical; it's still going to take a while for things to settle down but the end is in sight. Looks like there will be an extra 1.8 million or so WUs out with hosts now (around 6.8 million in total).
Grant
Darwin NT
ID: 2022258 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022267 - Posted: 8 Dec 2019, 0:44:11 UTC - in response to Message 2022254.  


. . Damn, a perfectly good joke ruined/wasted ...
Nope, it just backfired and got much funnier ...
+1 !!


. . OK, I'll go get my big red nose ... :)

Stephen
ID: 2022267 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 35476
Credit: 261,360,520
RAC: 489
Australia
Message 2022269 - Posted: 8 Dec 2019, 0:57:28 UTC - in response to Message 2022267.  


. . Damn, a perfectly good joke ruined/wasted ...

Nope, it just backfired and got much funnier ...
+1 !!
. . OK, I'll go get my big red nose ... :)

Stephen
At least you got your laughs Stephen (just not the way you thought). :-D

Cheers.
ID: 2022269 · Report as offensive
Kiska
Volunteer tester

Send message
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022270 - Posted: 8 Dec 2019, 1:04:07 UTC - in response to Message 2022248.  

I'll put that in once I remember how I setup munin :D
Excellent!


OK, I've added both graphs, but I don't have historical data since I just added them.

Thank you very much!

On my wishlist: Result turnaround time (last hour average).
It's a good indication of when there are a lot of shorties in the system.
Earlier this year (Aug), if the value went under 30h, I suspected the system would be in trouble within a couple of hours.
I haven't yet seen where trouble starts again; 26h seems to be fine.

Thanks again!


Ummm.... sure, I'll add it when I have time.

So I guess soon™ is apt
ID: 2022270 · Report as offensive
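[Editor's note: the munin plugins discussed above follow a simple protocol: called with `config`, a plugin describes its graph; called with no argument, it emits current values. A minimal sketch in Python is below. The field name, graph labels, and the placeholder 26-hour value are illustrative assumptions, not Kiska's actual setup, which would parse the project's server-status page.]

```python
import sys

def munin_plugin(argv):
    """Return the lines a munin node would read from this plugin."""
    if len(argv) > 1 and argv[1] == "config":
        # "config" mode: describe the graph and its single field.
        return [
            "graph_title SETI result turnaround (last hour avg)",
            "graph_vlabel hours",
            "turnaround.label turnaround",
        ]
    # Fetch mode: emit "field.value N". Placeholder value; a real plugin
    # would scrape the current figure from the server-status page.
    return ["turnaround.value 26"]

if __name__ == "__main__":
    print("\n".join(munin_plugin(sys.argv)))
```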
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13789
Credit: 208,696,464
RAC: 304
Australia
Message 2022295 - Posted: 8 Dec 2019, 3:28:33 UTC
Last modified: 8 Dec 2019, 3:28:52 UTC

Just had another bunch of stuck downloads in extended back-off; they ended up going through on the first manual retry, although several of them were rather reluctant.
Grant
Darwin NT
ID: 2022295 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13789
Credit: 208,696,464
RAC: 304
Australia
Message 2022330 - Posted: 8 Dec 2019, 7:34:47 UTC

Looks like Results-out-in-the-field has found its new level, but the splitters are still struggling to refill the Ready-to-send buffer.
Grant
Darwin NT
ID: 2022330 · Report as offensive
Kiska
Volunteer tester

Send message
Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022344 - Posted: 8 Dec 2019, 10:37:17 UTC - in response to Message 2022330.  

Looks like Results-out-in-the-field has found its new level, but the splitters are still struggling to refill the Ready-to-send buffer.


I think I've dealt with the errant plugin causing some gaps in the monitoring. But if Yafu doesn't sort out their response times... I'll have to drop them from graphing

Also we should move this to another thread
ID: 2022344 · Report as offensive
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 13789
Credit: 208,696,464
RAC: 304
Australia
Message 2022466 - Posted: 9 Dec 2019, 9:33:10 UTC

Splitters still unable to refill Ready-to-send buffer.
Grant
Darwin NT
ID: 2022466 · Report as offensive
Profile Unixchick Project Donor
Avatar

Send message
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2022484 - Posted: 9 Dec 2019, 15:27:33 UTC - in response to Message 2022466.  
Last modified: 9 Dec 2019, 15:29:28 UTC

Splitters still unable to refill Ready-to-send buffer.


The hourly return rate is high (148K) and Results out in the field has crossed the 7 million mark. The system is still filling a hole. I'm glad we have any RTS at all.

On further thought, I really like the extra WUs; I think they will let me skip connecting on Tuesdays and avoid the whole maintenance "hunger games" grab for WUs.

edit: I wonder if they changed the size of the RTS queue. There isn't as much need for a large RTS queue if we all have larger caches.
ID: 2022484 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14659
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022486 - Posted: 9 Dec 2019, 15:37:23 UTC - in response to Message 2022484.  

Looking at Kiska's RTS graph, it's very striking how quickly RTS dropped on Friday night. That says a lot of hosts were set to cache more than they were allowed, and might have been hammering on the server doors all this time.

That might, paradoxically, make the server load lighter in the future, once we've filled everyone up.
ID: 2022486 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022503 - Posted: 9 Dec 2019, 17:43:49 UTC - in response to Message 2022484.  

Splitters still unable to refill Ready-to-send buffer.


The hourly return rate is high (148K) and Results out in the field has crossed the 7 million mark. The system is still filling a hole. I'm glad we have any RTS at all.

On further thought, I really like the extra WUs; I think they will let me skip connecting on Tuesdays and avoid the whole maintenance "hunger games" grab for WUs.

edit: I wonder if they changed the size of the RTS queue. There isn't as much need for a large RTS queue if we all have larger caches.


. . The RTS is a buffer between splitter output and WU demand, which is proportional to the results-returned rate. Regardless of the size of caches 'in the field', the RTS needs to be what it is so that work requests can be met, and the higher the rate of returns, the bigger it needs to be, unless splitter output itself stays significantly higher than the rate of returns fairly constantly.

Stephen

. .
ID: 2022503 · Report as offensive
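[Editor's note: Stephen's sizing argument above is a producer-consumer balance: the RTS buffer must absorb any gap between splitter output and the demand driven by the return rate. A small sketch follows; all rates and the buffer size are illustrative assumptions, not measured project figures.]

```python
def rts_depletion_time(rts_size, return_rate_per_hr, splitter_rate_per_hr):
    """Hours until the Ready-to-Send buffer empties, assuming work
    demand tracks the result-return rate.

    Returns float('inf') when the splitters keep up with demand,
    i.e. the buffer never drains.
    """
    net_drain = return_rate_per_hr - splitter_rate_per_hr
    if net_drain <= 0:
        return float('inf')  # splitters outpace returns; buffer refills
    return rts_size / net_drain

# e.g. a hypothetical 600k-task buffer, 148k results/hr returned,
# 120k tasks/hr split: the buffer lasts roughly 21 hours.
hours_left = rts_depletion_time(600_000, 148_000, 120_000)
```

This is why a higher return rate demands a deeper buffer: depletion time scales inversely with the drain, not with cache sizes in the field.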
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022504 - Posted: 9 Dec 2019, 17:51:01 UTC - in response to Message 2022486.  

Looking at Kiska's RTS graph, it's very striking how quickly RTS dropped on Friday night. That says a lot of hosts were set to cache more than they were allowed, and might have been hammering on the server doors all this time.

That might, paradoxically, make the server load lighter in the future, once we've filled everyone up.


. . Undoubtedly! Most of the hosts in the field request far more work than the previous limits allowed, which is why it was very 'brave' to suddenly increase those limits manifold without some prior notice and recommendations, such as "reduce the size of your work requests". A host with a pair of older GPUs that can only process a couple of dozen WUs per day does not need a cache of 800 GPU WUs, and some old clunker out there with a Core 2 Duo and no GPU, crunching through a dozen or so WUs a day, does not need 200 CPU WUs. But that is a long-time bugbear of mine.

Stephen

:(
ID: 2022504 · Report as offensive
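[Editor's note: the arithmetic behind Stephen's point can be sketched as follows. A sensible cache covers the host's throughput over the outage window it needs to bridge, capped by the server limit. The two-day window is an assumption; the per-host rates echo the post's own examples.]

```python
def sensible_cache(tasks_per_day, cover_days, server_limit):
    """Tasks worth holding: enough to cover `cover_days` of crunching,
    never more than the server-side cap."""
    return min(int(tasks_per_day * cover_days), server_limit)

# The older dual-GPU host doing ~24 WUs/day gains nothing from an
# 800-task cap: 48 tasks already covers two days.
old_gpu_host = sensible_cache(24, 2, 800)   # 48
# The Core 2 Duo doing ~12 WUs/day needs nowhere near 200 CPU tasks.
old_cpu_host = sensible_cache(12, 2, 200)   # 24
```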
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14659
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022507 - Posted: 9 Dec 2019, 18:00:08 UTC - in response to Message 2022504.  

Some of us have been advising people to moderate their cache requests for decades - but the message never seems to get through :-(

It's hard to get people to think of themselves as part of a larger collective: every part of that collective has to work together, or we all suffer.
ID: 2022507 · Report as offensive
Profile Siran d'Vel'nahr
Volunteer tester
Avatar

Send message
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2022512 - Posted: 9 Dec 2019, 18:28:10 UTC - in response to Message 2022507.  

Some of us have been advising people to moderate their cache requests for decades - but the message never seems to get through :-(

It's hard to get people to think of themselves as part of a larger collective: every part of that collective has to work together, or we all suffer.

Hi Richard,

Resistance is futile, you will be assimilated. We will add your biological and technological distinctiveness to our own. lol ;)

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2022512 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22323
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2022522 - Posted: 9 Dec 2019, 19:28:14 UTC - in response to Message 2022507.  

There are a couple of server side "tricks" that could totally thwart attempts at having excessively large caches - I can well imagine the gnashing of teeth that would ensue if one of them was triggered....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2022522 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2022524 - Posted: 9 Dec 2019, 19:41:14 UTC
Last modified: 9 Dec 2019, 19:52:19 UTC

Current result creation rate ** 0/sec 0/sec 0/sec 5m


Panic mode ON?

<edit> Never mind, it's back now.
ID: 2022524 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Avatar

Send message
Joined: 1 Apr 13
Posts: 1855
Credit: 268,616,081
RAC: 1,349
United States
Message 2022529 - Posted: 9 Dec 2019, 20:27:46 UTC - in response to Message 2022507.  

Some of us have been advising people to moderate their cache requests for decades - but the message never seems to get through :-(

It's hard get get people to think of themselves as part of a larger collective: every part of that collective has to work together, or we all suffer.

People react to what they see, not what they're told, I believe.
In that regard, I think the previous low hard limits were in fact counterproductive. Personally, I like being able to set realistic cache time and actually see an effect. If nothing else, return times should improve a bit based on an improved CPU cache turnaround.
ID: 2022529 · Report as offensive
Profile betreger Project Donor
Avatar

Send message
Joined: 29 Jun 99
Posts: 11378
Credit: 29,581,041
RAC: 66
United States
Message 2022531 - Posted: 9 Dec 2019, 20:48:29 UTC

Methinks Tuesday will be very interesting with the low RTS cache.
ID: 2022531 · Report as offensive



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.