The Server Issues / Outages Thread - Panic Mode On! (117)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (117)

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022157 - Posted: 7 Dec 2019, 18:42:27 UTC - in response to Message 2022136.  

The results out in the field are over 6 million, and it is still splitting fine (at a high rate). It still can't keep up, as it has a large backlog of "holes" to fill.
Yep. Even with a 0.2 day cache setting, my Linux system is still unable to get work often enough to actually fill its cache. It is still taking 4-8 Scheduler requests to get any work.
Grant
Darwin NT
ID: 2022157 · Report as offensive
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2022160 - Posted: 7 Dec 2019, 18:50:58 UTC

Greetings,

Is there any reason why, with the 100 tasks per day per device limit gone, my PCs don't report finished WUs regularly now? I have to do it manually with BT because they just add up and stick around.

My Pis, laptop and tablet do not seem to be affected. They aren't getting any more WUs than they were before. They are reporting regularly.

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2022160 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022162 - Posted: 7 Dec 2019, 19:00:55 UTC - in response to Message 2022160.  

my PCs don't report finished WUs regularly now?
The PCs don't report work 'regularly': they contact the servers when they feel they want new work.

If you were regularly asking for more work than the limits allowed (100 tasks per device), your PCs will have been perpetually hungry, and - like Oliver Twist - always asking for more. If they now have their fill, there's no need to ask so often, and nothing to trigger the early reporting.

If there ever is any unreported work hanging around, it should be sent automatically after not longer than 1 hour. That's quick enough - you can relax your trigger finger.

That reduction in the number of wasteful scheduler contacts (asking for work that will never be sent) might even be a beneficial side-effect of the new limits, once they've settled down.
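
The behaviour described above can be sketched roughly as follows. This is only an illustration of the rule of thumb in this post, not the BOINC client's actual code: the `ClientState` fields and the function name are made up for the example, and the one-hour figure is the "not longer than 1 hour" mentioned above.

```python
from dataclasses import dataclass

# "not longer than 1 hour", taken from the post above
MAX_REPORT_DELAY_S = 3600


@dataclass
class ClientState:
    wants_work: bool              # hypothetical flag: cache is below its target
    unreported_tasks: int         # finished tasks not yet reported
    oldest_unreported_age_s: int  # seconds since the oldest of them finished


def should_contact_scheduler(state: ClientState) -> bool:
    """Return True if this simplified client would contact the server now."""
    if state.wants_work:
        # A request for work also carries any finished results with it,
        # which is what produced the "early reporting" under the old limits.
        return True
    # Otherwise only contact the server once finished work has waited long enough.
    return (state.unreported_tasks > 0
            and state.oldest_unreported_age_s >= MAX_REPORT_DELAY_S)
```

With a full cache, 60 finished tasks that are 20 minutes old give `False`, and the same tasks at the one-hour mark give `True`, so a pile of unreported work sitting there for a while is expected rather than a fault.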
ID: 2022162 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 2022164 - Posted: 7 Dec 2019, 19:14:26 UTC - in response to Message 2022162.  

I'm more worried about post maintenance work reporting of that many work units out there.
ID: 2022164 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2022167 - Posted: 7 Dec 2019, 19:19:08 UTC - in response to Message 2022164.  

I'm more worried about post maintenance work reporting of that many work units out there.


Results out in the field: 6,524,999. Let's see what we have in a few hours.
ID: 2022167 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24912
Credit: 3,081,182
RAC: 7
Ireland
Message 2022168 - Posted: 7 Dec 2019, 19:21:02 UTC

Nice, this should see me through to Xmas. :-)
07/12/2019 15:31:53 | SETI@home | Scheduler request completed: got 100 new tasks
07/12/2019 15:36:57 | SETI@home | Sending scheduler request: To fetch work.
07/12/2019 15:36:57 | SETI@home | Requesting new tasks for CPU
07/12/2019 15:37:01 | SETI@home | Scheduler request completed: got 96 new tasks
07/12/2019 15:42:08 | SETI@home | Scheduler request completed: got 0 new tasks
07/12/2019 15:42:08 | SETI@home | No tasks sent
07/12/2019 15:42:08 | SETI@home | This computer has reached a limit on tasks in progress
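
For anyone who wants to total these up from their own event log, here is a minimal sketch. It assumes the log lines look exactly like the ones pasted above; the regular expression and the function name are made up for this example and are not part of BOINC.

```python
import re

# The three "got N new tasks" lines copied from the log above.
LOG = """\
07/12/2019 15:31:53 | SETI@home | Scheduler request completed: got 100 new tasks
07/12/2019 15:37:01 | SETI@home | Scheduler request completed: got 96 new tasks
07/12/2019 15:42:08 | SETI@home | Scheduler request completed: got 0 new tasks
"""

GOT_TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def tasks_per_request(log_text: str) -> list[int]:
    """Return the number of tasks received in each completed scheduler request."""
    counts = []
    for line in log_text.splitlines():
        match = GOT_TASKS.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


counts = tasks_per_request(LOG)
print(counts, sum(counts))  # [100, 96, 0] 196
```

So the two successful requests brought in 196 tasks before the "limit on tasks in progress" message appeared.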
ID: 2022168 · Report as offensive
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022180 - Posted: 7 Dec 2019, 20:06:07 UTC - in response to Message 2022067.  

I hope we don't come to regret this.
That's the million dollar question. Have we been given what we wished for, or is it just because something's broken?
If we have been given it, I would have thought it would be a case of gradually increasing the limits to make sure the system doesn't fall over in a screaming heap. To just remove them seems rather risky. Especially considering how sluggish the system has been over the last few weeks (although now without the Haveland graphs we can't actually see that occurring anymore).


I guess it looks *fine*
https://munin.kiska.pw/munin/Munin-Node/Munin-Node/index.html#setiathome
ID: 2022180 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022185 - Posted: 7 Dec 2019, 20:24:42 UTC - in response to Message 2022180.  
Last modified: 7 Dec 2019, 20:29:30 UTC

https://munin.kiska.pw/munin/Munin-Node/Munin-Node/index.html#setiathome
Thanks for that, I was really missing the graphs.

Edit- is there one available for splitter output (Current result creation rate) & Results received in last hour?
Grant
Darwin NT
ID: 2022185 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022189 - Posted: 7 Dec 2019, 20:31:45 UTC - in response to Message 2022154.  

At my last company, no new or changed software was ever rolled out on a Friday, Wednesday was preferred, as being the farthest point from a weekend.

Monday, people recovering from the weekend, Tuesday getting up to speed, Thursday thinking about the weekend, Friday, winding down. ;-)
And I'd expect they did it on a stable, functional system, not after a system crash.




I've also seen mention of a gradual increase in work limits. To me, that would have been a 50% increase, then see how it goes for a few days (even a week), and if ok another 50% increase (over the original limit value). A 400% increase doesn't seem all that gradual to me.
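
To put numbers on that, here is a quick sketch of the arithmetic. It assumes the old limit of 100 tasks per device mentioned earlier in the thread and takes the 400% figure at face value; the thread does not state the actual new limit, and the function names are made up for the example.

```python
def staged_limits(original: int, step_pct: int, steps: int) -> list[int]:
    """Limit after each stage, where every step adds step_pct of the ORIGINAL limit."""
    return [original + original * step_pct * k // 100 for k in range(1, steps + 1)]


def single_jump(original: int, increase_pct: int) -> int:
    """Limit after one increase of increase_pct over the original limit."""
    return original + original * increase_pct // 100


ORIGINAL_LIMIT = 100  # old per-device limit, as mentioned earlier in the thread

print(staged_limits(ORIGINAL_LIMIT, 50, 2))  # [150, 200]
print(single_jump(ORIGINAL_LIMIT, 400))      # 500
```

Two cautious steps would have ended at 200 after a week or two of watching the servers, against 500 in one go.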
Grant
Darwin NT
ID: 2022189 · Report as offensive
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2022190 - Posted: 7 Dec 2019, 20:33:02 UTC

maybe not the best methodology, but things are slowly coming back to normal. all seems ok right now.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2022190 · Report as offensive
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2022192 - Posted: 7 Dec 2019, 20:41:53 UTC - in response to Message 2022162.  

Hi Richard,

If you were regularly asking for more work than the limits allowed (100 tasks per device), your PCs will have been perpetually hungry, and - like Oliver Twist - always asking for more. If they now have their fill, there's no need to ask so often, and nothing to trigger the early reporting.

I do believe I understand now. I had gotten back from the store and saw that I had about 60+ WUs to report. I went ahead and reported them. I was just gone watching a video for about an hour or so, about as long as when I was at the store, and there weren't nearly as many when I checked BT. I went ahead and reported them before I read your reply. Now, after reading the above explanation, I will just leave well enough alone. Thanks for this! :)

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2022192 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022195 - Posted: 7 Dec 2019, 20:52:28 UTC - in response to Message 2022192.  

I had gotten back from the store and saw that I had about 60+ WUs to report. I went ahead and reported them. I was just gone watching a video for about an hour or so, about as long as when I was at the store, and there weren't nearly as many when I checked BT. I went ahead and reported them before I read your reply. Now, after reading the above explanation, I will just leave well enough alone. Thanks for this! :)
Sounds like the behaviour you get with a large Additional days cache setting.
Grant
Darwin NT
ID: 2022195 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022197 - Posted: 7 Dec 2019, 20:57:50 UTC - in response to Message 2022190.  

maybe not the best methodology, but things are slowly coming back to normal. all seems ok right now.
Apart from the growing Assimilator backlog, and no Ready-to-send buffer (which is why my Linux system is still unable to build up its cache, although it is doing a lot better than it was...)

Looks like the Results in progress is starting to level off a bit now, so that should help things along considerably.
Grant
Darwin NT
ID: 2022197 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022198 - Posted: 7 Dec 2019, 20:59:09 UTC - in response to Message 2022195.  

No, not really. I have 0 day additional cache setting and I only report crunched results every hour because I am way in excess of my normal cache settings because of the changes. I am dropping the spoofed GPU count down considerably and also changing the daily cache level to get to 200 GPU tasks. Half a day only netted me around 130-150 CPU tasks after the change, so I moved to 0.75 day for the primary cache.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2022198 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022199 - Posted: 7 Dec 2019, 21:06:04 UTC - in response to Message 2022198.  
Last modified: 7 Dec 2019, 21:12:15 UTC

No, not really. I have 0 day additional cache setting and I only report crunched results every hour because I am way in excess of my normal cache settings because of the changes.
I understand that, but if you have a large Additional day setting, even when you drop below your "Store at least" value the work won't be reported till you hit the "Up to an Additional" setting.
In my case, I have to report frequently: reporting more than 70 tasks at a time after outages results in Scheduler issues and nothing gets reported, so I just have the Additional setting at 0.02.

I'll see how things go when my Windows system gets down to the current cache setting, and my Linux system finally gets up to it.

I'd like to have 24 hours of work. As short as the weekly outages have been, it's still been taking the system a while to recover & build the Ready-to-send buffer back up. What the effect of the new server-side limits will be is anyone's guess. Ideally, none if everyone's caches are at their new maximum limits. But if they're low going into an outage, I'd expect the after-outage server recovery period to be much, much longer.
Look at how long it's taking to get a Ready-to-send buffer again, even with the splitters working as well as they have for a while.
Grant
Darwin NT
ID: 2022199 · Report as offensive
Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2022200 - Posted: 7 Dec 2019, 21:10:07 UTC - in response to Message 2022195.  

I had gotten back from the store and saw that I had about 60+ WUs to report. I went ahead and reported them. I was just gone watching a video for about an hour or so, about as long as when I was at the store, and there weren't nearly as many when I checked BT. I went ahead and reported them before I read your reply. Now, after reading the above explanation, I will just leave well enough alone. Thanks for this! :)
Sounds like the behaviour you get with a large Additional days cache setting.

Hi Grant,

I have my settings at "Store at least [ 1 ] days of work" and "Store up to an additional [ 0.5 ] days of work". Do I need to change either of those 2 settings? I believe I have the old Linux PC set that way too.

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2022200 · Report as offensive
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022203 - Posted: 7 Dec 2019, 21:12:25 UTC - in response to Message 2022185.  

https://munin.kiska.pw/munin/Munin-Node/Munin-Node/index.html#setiathome
Thanks for that, I was really missing the graphs.

Edit- is there one available for splitter output (Current result creation rate) & Results received in last hour?


I'll put that in once I remember how I set up munin :D
ID: 2022203 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022204 - Posted: 7 Dec 2019, 21:13:36 UTC - in response to Message 2022200.  

I have my settings at "Store at least [ 1 ] days of work" and "Store up to an additional [ 0.5 ] days of work". Do I need to change either of those 2 settings? I believe I have the old Linux PC set that way too.
A definite, maybe...
See how things go over the next day or so as they (hopefully) settle down.
Grant
Darwin NT
ID: 2022204 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 2022205 - Posted: 7 Dec 2019, 21:14:17 UTC - in response to Message 2022203.  

I'll put that in once I remember how I set up munin :D
Excellent!
Grant
Darwin NT
ID: 2022205 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22535
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2022208 - Posted: 7 Dec 2019, 21:24:44 UTC - in response to Message 2022200.  

Leave the "Store x days" as it is, but reduce the "Store additional days" to a much smaller fraction, say 0.1, or even 0.02. This will ensure your cache remains fairly constant in size by being topped up regularly.
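
A toy model shows why the smaller setting keeps the cache steadier. This is not the BOINC client's real work-fetch algorithm. It assumes the host crunches exactly one day of work per day, that the cache is refilled to "Store at least" plus "additional" whenever it drops below "Store at least", and that the server always delivers. The function name and its parameters are made up for the example.

```python
def simulate_cache(min_days: float, extra_days: float, sim_days: int = 10):
    """Toy model: refill to min+extra whenever the cache drops below min.

    Works in hundredths of a day to avoid floating-point drift.
    Returns (number_of_refills, largest_refill_in_days).
    """
    low = round(min_days * 100)
    high = round((min_days + extra_days) * 100)
    cache = high      # start with a full cache
    refills = 0
    largest = 0
    for _ in range(sim_days * 100):   # one tick = 0.01 day of crunching
        cache -= 1
        if cache < low:
            # Assumption: the server always sends enough to reach the top mark.
            largest = max(largest, high - cache)
            cache = high
            refills += 1
    return refills, largest / 100


for extra in (0.5, 0.1, 0.02):
    print(extra, simulate_cache(1, extra))
```

In this model an additional setting of 0.5 gives a few large refills with the cache swinging by about half a day, while 0.02 gives many small top-ups and a cache that hardly moves.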
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2022208 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.