Message boards : Number crunching : Panic Mode On (109) Server Problems?
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
Splitters still struggling. There was a brief boost, but not enough to top up the ready-to-send buffer, or even stop the decline; it just slowed it down for a bit. About 3-3.5 hrs of work left at the current rate of consumption.

Grant · Darwin NT
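The estimate above is a simple linear projection: buffer level divided by the net drain rate. A minimal sketch in Python, with figures invented purely to land near the 3-3.5 hr answer (the real numbers would come from the server status page):

```python
def hours_until_empty(buffer_level: float, net_drain_per_hour: float) -> float:
    """Naive linear projection of when the ready-to-send buffer runs dry."""
    if net_drain_per_hour <= 0:
        return float("inf")  # buffer is stable or growing
    return buffer_level / net_drain_per_hour

# Hypothetical figures, chosen only to reproduce an estimate like Grant's.
print(f"{hours_until_empty(455_000, 130_000):.1f} hours")  # -> 3.5 hours
```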
Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0
Here's a theory for y'all. Do you suppose Meltdown and Spectre patches have been applied to that server, possibly degrading performance?
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> Here's a theory for y'all. Do you suppose Meltdown and Spectre patches have been applied to that server, possibly degrading performance?

At the back of my mind also .... great minds thinking alike and all :-}

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
I noticed that at the same time the Deleters cleared a huge backlog, the Splitters picked up their pace. They've since dropped their output again, but at least they're still producing enough to slowly build up the Ready-to-send buffer. Deleter I/O affecting splitter I/O?

Grant · Darwin NT
Speedy · Joined: 26 Jun 04 · Posts: 1646 · Credit: 12,921,799 · RAC: 89
> I noticed that at the same time the Deleters cleared a huge backlog, the Splitters picked up their pace. They've since dropped their output again, but at least they're still producing enough to slowly build up the Ready-to-send buffer.

You could be onto something there, Grant. When you say a "huge backlog" was cleared, are you talking about workunit/result files waiting to be deleted, or were you referring to the DB purge? Currently sitting at 3.463 million results.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
> You could be onto something there, Grant. When you say a "huge backlog" was cleared, are you talking about workunit/result files waiting to be deleted, or were you referring to the DB purge? Currently sitting at 3.463 million results.

MB WU-awaiting-deletion went from 398,000 to 100,000 in 30 min or less (hard to tell due to the scale of the graphs). At roughly that point in time, the splitters went from 35/s to over 60/s. WU-awaiting-deletion dropped slightly further, but has since started climbing again. And as it has started climbing again, the splitter output has declined again (60/s, down to 50/s, down to 30/s). Hence my wild speculation that some of the splitter issues are related to I/O contention in the database/file storage.

Received-last-hour is still around 135,000. It used to be that 90k or over meant a shorty storm. Then 90k-100k became the new norm. Now 135k. The Replica used to be able to keep up after the outages; not any more. It's often only a few minutes behind, but now there are more frequent periods of 30 min or more.

An I/O bottleneck is my personal theory, be it security-patch related, or just coming up against the limits of the present HDD-based storage.

Grant · Darwin NT
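For anyone wanting to put a number on that hunch, a Pearson correlation over the two status-page series would do it. A minimal sketch, assuming Python 3.10+ for `statistics.correlation`; the sample values are invented to mimic the movements described above, not real readings:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Hypothetical paired samples: deleter backlog vs splitter output over time,
# shaped like the behaviour Grant describes (backlog up -> splitters down).
wu_awaiting_deletion = [398_000, 250_000, 100_000, 120_000, 180_000, 260_000]
splitter_rate        = [35,      50,      62,      55,      45,      30]  # WU/s

r = correlation(wu_awaiting_deletion, splitter_rate)
print(f"Pearson r = {r:.2f}")  # ~ -0.8: consistent with I/O contention,
                               # though correlation still isn't causation
```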
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
MB WU-awaiting-deletion is on the rise, and splitter output is on the decline (below 30/s now). About 5 hrs of work left in the Ready-to-send buffer at the present rate of its decline.

Grant · Darwin NT
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> > Here's a theory for y'all. Do you suppose Meltdown and Spectre patches have been applied to that server, possibly degrading performance?
>
> At the back of my mind also .... great minds thinking alike and all :-}

I heard Kevin Reed say that the World Community Grid servers had slowed by between 20% and 30% when they applied the patches. Fortunately, WCG had recently upgraded the hardware, so they had enough headroom - but they would have been struggling with the previous hardware.

Servers are different beasts from consumer PCs, and they do a different job.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> I heard Kevin Reed say that the World Community Grid servers had slowed by between 20% and 30% when they applied the patches. Fortunately, WCG had recently upgraded the hardware, so they had enough headroom - but they would have been struggling with the previous hardware.

I have my suspicions too. After all, servers do LOTS of I/O transactions, and in all the online tests I read, the most deleterious effect the patch had on server software was in apps that did a lot of I/O transactions.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0
Back about a week ago, Eric wrote:

> If we don't start building a queue I'll add more GBT splitters.

I wonder if that's still an option.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
Change the unused pfb splitters over to gbt splitters.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14690 · Credit: 200,643,578 · RAC: 874
> Back about a week ago, Eric wrote:

Although Eric attended the same teleconference with Kevin Reed, he joined us a few minutes late: Kevin told us about the slowdown in the general chit-chat before the start of the formal business (which was about something completely different), so Eric didn't hear the actual statement. But I expect he's found out about it by now.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
Thanks so much for the update, Richard. It sounds like a real concern; hopefully Eric addresses it soon.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
> Back about a week ago, Eric wrote:
> > If we don't start building a queue I'll add more GBT splitters.

I'm sure it is, but IMHO it would be better to sort out what is causing the slowdowns. The current splitters are capable of sustaining 50+/s.

Is the issue I/O contention? Looking at the graphs, there's a pretty strong correlation between deletion & splitting - but correlation isn't causation. Have the exploit patches even been applied yet? (If they haven't, then it sounds like things will be even worse than they are now.) Will more RAM in the servers involved help with more caching? Or will it require a move to flash-based storage? And if we make that move, will the current hardware running the queries be good enough to take advantage of that storage for some time to come, or will it quickly become the next bottleneck?

The Ready-to-send buffer seems to have settled around 100k for now. The splitters crank up, then fall over, crank up, fall over. Along with that, the deleters clear the backlog, then lose ground, then clear it, then lose ground. Cause & effect, or just another symptom? *shrug*

Results & WUs awaiting purge are also on the climb. Received-last-hour is sitting at 142k (after being over 145k for some time). The servers really are working hard at present. And it looks like we've just about finished off all those BLC05 files that were loaded in one batch.

Grant · Darwin NT
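Whether the exploit patches have been applied is at least directly checkable on the machines themselves. Linux kernels from roughly 4.15 onward report mitigation status through sysfs; this sketch assumes such a kernel (whether the SETI servers run one is unknown):

```python
from pathlib import Path

# The kernel's own report of Meltdown/Spectre mitigation status.
# Assumes Linux with a kernel new enough (~4.15+) to expose this directory.
vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")

if vuln_dir.is_dir():
    for entry in sorted(vuln_dir.iterdir()):
        # e.g. "meltdown: Mitigation: PTI" or "spectre_v2: Vulnerable"
        print(f"{entry.name}: {entry.read_text().strip()}")
else:
    print("No vulnerabilities interface; kernel predates mitigation reporting.")
```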
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
I thought for sure I saw mention of them applying the security patch, and that Jeff Cobb was involved. But I can't find the post now, and I might be imagining it.

Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Speedy · Joined: 26 Jun 04 · Posts: 1646 · Credit: 12,921,799 · RAC: 89
> You could be onto something there, Grant. When you say a "huge backlog" was cleared, are you talking about workunit/result files waiting to be deleted, or were you referring to the DB purge? Currently sitting at 3.463 million results.

There could be hope when they load some more tapes with the longer units on them. Until this happens, I guess 130k-odd will be the new average return per hour.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13913 · Credit: 208,696,464 · RAC: 304
> There could be hope when they load some more tapes with the longer units on them. Until this happens, I guess 130k-odd will be the new average return per hour.

If we get a batch of the longest-running WUs, I suspect it could be 90k or less. My GPUs take 5 min 10 sec / 44 min to process the present WUs. The longer-running WUs take 8 min / 1 hr 15 min+ to process.

Grant · Darwin NT
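That guess is easy to sanity-check: scale the current hourly return rate by the inverse of the runtime ratio, assuming every task stretches by the same factor. A rough sketch using only the figures quoted in the thread:

```python
# Minutes per WU for the two runtimes Grant quotes, now vs longer WUs.
current = {"fast": 5 + 10/60, "slow": 44.0}
longer  = {"fast": 8.0,       "slow": 75.0}

for kind in current:
    ratio = longer[kind] / current[kind]
    projected = 142_000 / ratio  # scale the 142k/hr figure from earlier
    print(f"{kind}: {ratio:.2f}x longer -> ~{projected/1000:.0f}k results/hr")
# fast: 1.55x -> ~92k/hr; slow: 1.70x -> ~83k/hr, bracketing the "90k or less" guess
```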
Joined: 1 Jun 01 · Posts: 1282 · Credit: 187,688,550 · RAC: 182
Looks like the Replica is falling further and further behind. What is coming next...?

SETI@home classic workunits: 4,019 · SETI@home classic CPU time: 34,348 hours
kittyman · Joined: 9 Jul 00 · Posts: 51527 · Credit: 1,018,363,574 · RAC: 1,004
Just got word from Eric that he's gonna try to add a couple more GBT splitters.

Meow!

"Time is simply the mechanism that keeps everything from happening all at once."
kittyman · Joined: 9 Jul 00 · Posts: 51527 · Credit: 1,018,363,574 · RAC: 1,004
> Just got word from Eric that he's gonna try to add a couple more GBT splitters.

It would not surprise me if Eric took care of adding more splitter cache at the same time. We all know there is tons of it to work on.

"Time is simply the mechanism that keeps everything from happening all at once."