The Server Issues / Outages Thread - Panic Mode On! (117)

Profile Siran d'Vel'nahr
Volunteer tester
Joined: 23 May 99
Posts: 7379
Credit: 44,181,323
RAC: 238
United States
Message 2022209 - Posted: 7 Dec 2019, 21:25:46 UTC - in response to Message 2022204.  

Hi Grant,

A definite, maybe...
See how things go over the next day or so as they (hopefully) settle down.

Ok, will do. I'll only use my "trigger finger" if'n I see a great many ready to report when I'm gonna log into Winders.

Have a great day! :)

Siran
CAPT Siran d'Vel'nahr - L L & P _\\//
Winders 11 OS? "What a piece of junk!" - L. Skywalker
"Logic is the cement of our civilization with which we ascend from chaos using reason as our guide." - T'Plana-hath
ID: 2022209 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022213 - Posted: 7 Dec 2019, 21:47:16 UTC

I have my cache settings at 'whatever you want' days plus an additional 0.05 days. That way, 'report after an hour' pretty much matches up with 'request work after an hour' and things potter along very harmoniously.
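As a rough aid for reading those settings, the day values convert to minutes as sketched below. The helper name `days_to_minutes` is invented for this illustration, and the snippet only does the unit conversion; it does not model how the client actually schedules its work requests.

```python
MINUTES_PER_DAY = 24 * 60


def days_to_minutes(days: float) -> float:
    """Convert a cache setting expressed in days to minutes."""
    return days * MINUTES_PER_DAY


# Cache values mentioned in this thread (labels are descriptive only).
settings = [
    ("additional 0.05 days", 0.05),
    ("additional 0.5 days", 0.5),
    ("store at least 1 day", 1.0),
]

for label, days in settings:
    print(f"{label}: about {days_to_minutes(days):.0f} minutes")
```

So 0.05 days is a little over an hour (72 minutes), while 0.5 days is twelve hours.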
ID: 2022213 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022214 - Posted: 7 Dec 2019, 21:50:50 UTC - in response to Message 2022200.  

Hi Grant,
I have my settings at "Store at least [ 1 ] days of work" and "Store up to an additional [ 0.5 ] days of work". Do I need to change either of those 2 settings? I believe I have the old Linux PC set that way too.
Have a great day! :)
Siran


. . If you want your machine to report results more frequently try setting the 'additional work' value to zero. I have mine set like that on all 5 machines and as long as my caches are at nominal levels it reports every 5 mins.

Stephen

. .
ID: 2022214 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022216 - Posted: 7 Dec 2019, 21:52:04 UTC - in response to Message 2022199.  

In my case, I have to report frequently; reporting more than 70 tasks at a time after outages results in Scheduler issues and nothing gets reported.
False logic. Even if you manage to report everything outstanding right at the beginning of an outage, you still have every task completed during the outage still waiting at the end. There are other ways of ensuring that the 'end of outage report' isn't too large, and they work even if we have a 12 hour outage or a 2 day outage next time.
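The point above can be put as simple arithmetic. This is only a sketch: the function name and the 40 tasks per hour figure are made up for illustration, and it assumes the host has enough cached work to keep crunching for the whole outage.

```python
def backlog_after_outage(tasks_per_hour: float, outage_hours: float) -> int:
    """Estimate how many finished tasks are waiting to be reported when the
    servers come back, assuming nothing could be reported during the outage."""
    if tasks_per_hour < 0 or outage_hours < 0:
        raise ValueError("rates and durations must be non-negative")
    return round(tasks_per_hour * outage_hours)


# Hypothetical host finishing 40 tasks per hour.
for hours in (12, 48):
    waiting = backlog_after_outage(40, hours)
    print(f"{hours} hour outage: {waiting} tasks waiting to report")
```

Reporting everything just before the outage does not change these numbers, which is why the size of the report at the end has to be limited some other way.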
ID: 2022216 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022217 - Posted: 7 Dec 2019, 21:53:33 UTC - in response to Message 2022209.  

Hi Grant,
Ok, will do. I'll only use my "trigger finger" if'n I see a great many ready to report when I'm gonna log into Winders.
Have a great day! :)
Siran


. . Or maybe only use it if the machine has not reported results in over 1 hour.

Stephen

:)
ID: 2022217 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2022219 - Posted: 7 Dec 2019, 22:02:43 UTC

Once I used the parameter in the cc_config.xml file to reduce the # of reported tasks to something lower than 400-500, the server stopped having a hiccup every time I tried to report after the last Tuesday outage.

Something like this:

<cc_config>
 <log_flags>
   <sched_op_debug>1</sched_op_debug>
 </log_flags>
 <options>
   <use_all_gpus>1</use_all_gpus>
   <save_stats_days>90</save_stats_days>
   <max_file_xfers>16</max_file_xfers>
   <max_file_xfers_per_project>8</max_file_xfers_per_project>
   <max_tasks_reported>150</max_tasks_reported>
 </options>
</cc_config>


Tom
A proud member of the OFA (Old Farts Association).
ID: 2022219 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2022223 - Posted: 7 Dec 2019, 22:07:03 UTC

If I remember correctly (Richard can confirm), the client is hard-wired in the scheduler module to contact the project at least once an hour when it has completed results to report. Yep, in cs_scheduler.cpp:

// report results within this time after completion
//
#define MAX_REPORT_DELAY    3600
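
How a constant like that might be used can be sketched as follows. This is a simplified illustration, not the actual BOINC client code: the function name `report_overdue` is hypothetical, and the real client takes other conditions into account that are not modelled here.

```python
# Seconds; the value quoted above from cs_scheduler.cpp.
MAX_REPORT_DELAY = 3600


def report_overdue(completed_at: float, now: float) -> bool:
    """Return True once a finished task has waited at least MAX_REPORT_DELAY
    seconds without being reported (simplified assumption)."""
    return now - completed_at >= MAX_REPORT_DELAY


# A task that finished at t=0 is checked after 30 and after 60 minutes.
print(report_overdue(completed_at=0, now=30 * 60))
print(report_overdue(completed_at=0, now=60 * 60))
```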

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2022223 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022226 - Posted: 7 Dec 2019, 22:14:20 UTC

Yes to both of those -

MAX_REPORT_DELAY 3600
<max_tasks_reported>150</max_tasks_reported>

- though I'd personally take <max_tasks_reported> even lower, to perhaps 64.

Only caveat: neither of those were included in the earlier versions of BOINC. If you're still using one of those, I'd seriously suggest you consider upgrading it: there are some good things in the newer versions, even if you have to do a little re-learning.
ID: 2022226 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 2022228 - Posted: 7 Dec 2019, 22:26:32 UTC - in response to Message 2022226.  
Last modified: 7 Dec 2019, 22:31:01 UTC

False logic. Even if you manage to report everything outstanding right at the beginning of an outage, you still have every task completed during the outage still waiting at the end.

<max_tasks_reported>150</max_tasks_reported>
Which is why I have that set to 75 to avoid Scheduler problems when reporting.
Avoid letting it build up to problem levels (when possible), and limit the number to avoid problems when it's not possible to avoid the build up in the first place.
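
To see what a given limit means after an outage, here is a minimal sketch. It assumes the client reports at most `max_tasks_reported` tasks per scheduler request and sends the rest in later requests; the function name and the backlog of 450 tasks are illustrative only.

```python
import math


def requests_needed(backlog: int, max_tasks_reported: int) -> int:
    """Number of scheduler requests needed to clear a backlog when each
    request may carry at most max_tasks_reported results."""
    if max_tasks_reported <= 0:
        raise ValueError("max_tasks_reported must be positive")
    return math.ceil(backlog / max_tasks_reported)


# The three limits mentioned in this thread, against a hypothetical backlog.
backlog = 450
for limit in (150, 75, 64):
    count = requests_needed(backlog, limit)
    print(f"limit {limit}: {count} requests to report {backlog} tasks")
```

A lower limit means more requests, but each one is small enough that the Scheduler is less likely to choke on it.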



Edit- oh, and it's taken a while but finally my Linux system is able to get work on most requests.
Grant
Darwin NT
ID: 2022228 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022229 - Posted: 7 Dec 2019, 22:26:44 UTC - in response to Message 2022226.  

Yes to both of those -
MAX_REPORT_DELAY 3600
<max_tasks_reported>150</max_tasks_reported>
- though I'd personally take <max_tasks_reported> even lower, to perhaps 64.

Only caveat: neither of those were included in the earlier versions of BOINC. If you're still using one of those, I'd seriously suggest you consider upgrading it: there are some good things in the newer versions, even if you have to do a little re-learning.


. . It's OK Wiggo, it is ONLY a recommendation, you can keep using your prehistoric version of BOINC :)

Stephen

:)
ID: 2022229 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2022231 - Posted: 7 Dec 2019, 22:32:58 UTC

. . It's OK Wiggo, it is ONLY a recommendation, you can keep using your prehistoric version of BOINC :)

Stephen
But I'm not using Windows any more Stephen. ;-)

Cheers.
ID: 2022231 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2022233 - Posted: 7 Dec 2019, 22:34:50 UTC - in response to Message 2022231.  

. . It's OK Wiggo, it is ONLY a recommendation, you can keep using your prehistoric version of BOINC :)

Stephen
But I'm not using Windows any more Stephen. ;-)

Cheers.
And he's using BOINC v7.14.2 :-) Well done, that man.
ID: 2022233 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022237 - Posted: 7 Dec 2019, 22:48:32 UTC - in response to Message 2022231.  

. . It's OK Wiggo, it is ONLY a recommendation, you can keep using your prehistoric version of BOINC :)

Stephen
But I'm not using Windows any more Stephen. ;-)

Cheers.


. . D'oh!

< Stephen slaps himself with a wet trout ... >

. . Damn, a perfectly good joke ruined/wasted ...

Stephen

:)
ID: 2022237 · Report as offensive
Lazydude
Volunteer tester

Joined: 17 Jan 01
Posts: 45
Credit: 96,158,001
RAC: 136
Sweden
Message 2022241 - Posted: 7 Dec 2019, 23:06:45 UTC - in response to Message 2022237.  

. . It's OK Wiggo, it is ONLY a recommendation, you can keep using your prehistoric version of BOINC :)

Stephen
But I'm not using Windows any more Stephen. ;-)

Cheers.


. . D'oh!

< Stephen slaps himself with a wet trout ... >

. . Damn, a perfectly good joke ruined/wasted ...

Stephen

:)

. . Damn, a perfectly good joke ruined/wasted ...
Nope, it just backfired and got much funnier ...
ID: 2022241 · Report as offensive
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022242 - Posted: 7 Dec 2019, 23:08:56 UTC - in response to Message 2022205.  

I'll put that in once I remember how I setup munin :D
Excellent!


OK, I've added both graphs, but I don't have historical data since I only just added them.
ID: 2022242 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 2022245 - Posted: 7 Dec 2019, 23:20:14 UTC - in response to Message 2022242.  

OK, I've added both graphs, but I don't have historical data since I only just added them.
Thanks for that.
The graphs make it so much easier to see how things are going than having just the current numbers.
Grant
Darwin NT
ID: 2022245 · Report as offensive
Kiska
Volunteer tester

Joined: 31 Mar 12
Posts: 302
Credit: 3,067,762
RAC: 0
Australia
Message 2022246 - Posted: 7 Dec 2019, 23:22:21 UTC - in response to Message 2022245.  

OK, I've added both graphs, but I don't have historical data since I only just added them.
Thanks for that.
The graphs make it so much easier to see how things are going than having just the current numbers.


Also, I'm sorry that anyone wanting those graphs has to load the other million on the page... It's just a side effect of graphing most projects.
ID: 2022246 · Report as offensive
Lazydude
Volunteer tester

Joined: 17 Jan 01
Posts: 45
Credit: 96,158,001
RAC: 136
Sweden
Message 2022248 - Posted: 7 Dec 2019, 23:25:39 UTC - in response to Message 2022242.  

I'll put that in once I remember how I setup munin :D
Excellent!


OK, I've added both graphs, but I don't have historical data since I only just added them.

Thank you very much!

On my wishlist: Result turnaround time (last hour average).
It's a good indication of when there are a lot of shorties in the system.
Earlier this year (Aug), if the value went under 30h, I suspected that the system would be in trouble within a couple of hours.
I have not yet seen at what value the trouble starts this time; 26h seems to be fine.
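
That rule of thumb could be checked against a series of readings roughly like this. It is only a sketch: the function name and the sample values are invented, and the 30h default is just the figure from August, which may not hold now given that 26h currently seems fine.

```python
from typing import Optional, Sequence


def first_drop_below(samples_hours: Sequence[float],
                     threshold_hours: float = 30.0) -> Optional[int]:
    """Return the index of the first turnaround reading under the threshold,
    or None if every reading stays at or above it."""
    for index, value in enumerate(samples_hours):
        if value < threshold_hours:
            return index
    return None


# Made-up hourly readings of "Result turnaround time (last hour average)".
readings = [34.0, 31.5, 29.8, 27.0]
hit = first_drop_below(readings)
if hit is None:
    print("no reading under the threshold")
else:
    print(f"reading {hit} ({readings[hit]}h) is the first under the threshold")
```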

Thanks again!
ID: 2022248 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2022249 - Posted: 7 Dec 2019, 23:28:48 UTC

I'm starting to see a small amount in the Ready to Send Queue... 40K. I take this as a good sign. Are some of the faster machines now getting some WUs to fill the cache??
ID: 2022249 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2022250 - Posted: 7 Dec 2019, 23:41:52 UTC - in response to Message 2022241.  

But I'm not using Windows any more Stephen. ;-)
Cheers.

. . D'oh!
< Stephen slaps himself with a wet trout ... >
. . Damn, a perfectly good joke ruined/wasted ...
Stephen
:)

. . Damn, a perfectly good joke ruined/wasted ...
Nope, it just backfired and got much funnier ...


. . Glad it wasn't wasted ...

Stephen

:)
ID: 2022250 · Report as offensive
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.