Panic Mode On (109) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1914094 - Posted: 19 Jan 2018, 21:37:29 UTC - in response to Message 1914035.  

"I'd say the administrators need to shorten up the cron job interval on the deleters' purge task so that we could maintain a higher average RTS buffer quantity. Or, if the purge is threshold-based, to lower the threshold."

I think it's just a question of I/O congestion.
The deleters run all the time. However, with the current rate of work return and the rate of WU splitting required to keep that rate of return going, there's so much I/O contention that the deleters can't keep up. Eventually the I/O contention gets to such a point that the output of the splitters falls away, but the deleters still can't keep up with the load, so the backlog continues to grow. Eventually it gets to the point where the deleters are able to catch up & clear the backlog, and their reduced level of I/O then allows the splitters to crank back up again, until the delete backlog & load reach that trigger point & the splitters slow down again.
Rinse and repeat.
The combination of results returned per hour, results in progress, WUs awaiting deletion & the required splitter output is generating a huge amount of I/O, more than the servers can actually handle. So you end up with these moving trigger points where one function slows down and the other speeds up, then that one slows down & the first one speeds up again. And back & forth they go.
That's my speculation based on minimal facts.
Grant
Darwin NT
ID: 1914094
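
The back-and-forth Grant speculates about above can be illustrated with a toy model. This is only a sketch under invented assumptions, not a description of the real servers: splitters and deleters share one fixed I/O budget, results come back at a constant rate, and every name and number (`io_budget`, `return_rate`, `high_mark`, `low_mark`, the 70/20 splitter loads) is hypothetical and chosen purely for illustration.

```python
def simulate(steps=200, io_budget=100.0, return_rate=40.0,
             high_mark=500.0, low_mark=50.0):
    """Toy model: splitters and deleters compete for one fixed I/O budget.

    All parameters are made-up illustration values, not server measurements.
    Returns a list of (backlog, splitters_fast) tuples, one per time step.
    """
    backlog = 0.0          # workunits awaiting deletion
    splitters_fast = True  # True while the splitters run at full speed
    history = []
    for _ in range(steps):
        # Splitters use most of the budget at full speed, little when throttled.
        splitter_io = 70.0 if splitters_fast else 20.0
        # Deleters get whatever I/O is left over.
        deleter_capacity = io_budget - splitter_io
        # Returned work joins the deletion backlog, then the deleters clear
        # as much as their share of the I/O allows.
        backlog += return_rate
        backlog -= min(backlog, deleter_capacity)
        # Two trigger points (hysteresis): throttle the splitters when the
        # backlog is high, release them once it has drained.
        if splitters_fast and backlog >= high_mark:
            splitters_fast = False
        elif not splitters_fast and backlog <= low_mark:
            splitters_fast = True
        history.append((backlog, splitters_fast))
    return history


if __name__ == "__main__":
    for step, (backlog, fast) in enumerate(simulate(), start=1):
        if step % 10 == 0:
            mode = "full speed" if fast else "throttled"
            print(f"step {step:3d}: backlog {backlog:6.0f}, splitters {mode}")
```

With these made-up defaults the backlog climbs while the splitters run flat out, drains once they are throttled, and then the cycle starts again. The model ignores the fact that throttled splitters would eventually reduce the return rate too, so it shows only the oscillation, not realistic magnitudes.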
Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1914100 - Posted: 19 Jan 2018, 22:12:50 UTC - in response to Message 1914094.  
Last modified: 19 Jan 2018, 22:47:33 UTC


Grant (SSSF) wrote: "I think it's just a question of I/O congestion. The deleters run all the time"

OK, I'm sure I read in some other post in recent days that the deleters and purgers don't run continuously. Now I have to find that post.

[Edit] Found it: Message 1913582, by Rob Smith.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1914100
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1914103 - Posted: 19 Jan 2018, 22:20:52 UTC - in response to Message 1914100.  

Grant (SSSF) wrote: "I think it's just a question of I/O congestion. The deleters run all the time"
Keith Myers wrote: "OK, I'm sure I read in some other post in recent days that the deleters and purgers don't run continuously. Now I have to find that post."

That's probably a difference between Main and Beta. Beta certainly doesn't purge the database continuously - Eric likes to keep older tasks visible for comparison and retrospective bug-hunting. Main, on the other hand, needs to clear the decks within 24 hours or we're swamped.
ID: 1914103
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1914105 - Posted: 19 Jan 2018, 22:30:37 UTC - in response to Message 1914100.  

Keith Myers wrote: "OK, I'm sure I read in some other post in recent days that the deleters and purgers don't run continuously. Now I have to find that post."

Got me curious too.
When things have been working well, the number of WUs awaiting Validation, Assimilation and Deletion is generally around 0, occasionally 1-3 (emphasis on when everything is working OK). So even if they don't run all the time, they run when there is something to do, which is pretty much all the time (especially with 145k results being returned per hour).

Looking at AP, where the return rate is less than 1 per minute at the moment, the WUs awaiting Validation, Assimilation & Deletion are around 1, with periods of 0 & a few periods of 2 or 3. It could be they run all the time, and those values of 1-3 are at the time the data is read, before the WU is processed. Or it could be as you say: they don't run all the time, only when there is work to be done.
Either way, it means the MB WU Validators, Assimilators & Deleters are running (effectively) all the time, with about 40 results per second there to be processed, as the values there were usually around (or very close to) 0.
Grant
Darwin NT
ID: 1914105
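
The "40/s" figure in the post above follows from the quoted 145k results returned per hour. A quick check of that arithmetic, using only the number from the post (the helper name `per_second` is just for illustration):

```python
SECONDS_PER_HOUR = 3600


def per_second(per_hour):
    """Convert an hourly rate into a per-second rate."""
    return per_hour / SECONDS_PER_HOUR


if __name__ == "__main__":
    # 145k results returned per hour, as quoted in the post.
    print(f"{per_second(145_000):.1f} results per second")  # 40.3
```

So the MB daemons would see a little over 40 results per second at that return rate.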
Bernie Vine
Volunteer moderator
Volunteer tester

Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1914179 - Posted: 20 Jan 2018, 7:17:21 UTC

"Panic Mode On (110) Server Problems?" is now open for business.
ID: 1914179


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.