Message boards : Number crunching : Panic Mode On (109) Server Problems?

Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
That's fine and works if you only have one main project at a time. But if you run multiple projects at the same time, it does not.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours · A proud member of the OFA (Old Farts Association)

Brent Norman · Joined: 1 Dec 99 · Posts: 2786 · Credit: 685,657,289 · RAC: 835
Oh, and if you forget to change 0.1 back to 10 when you move from Einstein to Seti, you figure it out quickly when you don't get the full allotment of 100 work units per GPU/CPU... And if you forget to change the 4.0 + 0.01 when changing to Einstein (with RS <> 0), you find out that E@H doesn't have a 100-task limit. Last time I did that, I turned my back for a few minutes and had, IIRC, 736 tasks... way, way overcommitted!

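For anyone following along at home: the figures being toggled here are BOINC's two work-buffer preferences, "store at least X days of work" and "store up to an additional X days". A minimal sketch of how they look if you set them by hand in the client's global_prefs_override.xml (which pair of values suits which project is exactly what the posters above are debating, so treat the numbers as illustrative only):

    <!-- global_prefs_override.xml (sketch) -->
    <global_preferences>
        <!-- "store at least X days of work": the 0.1-vs-10 value in this thread -->
        <work_buf_min_days>0.1</work_buf_min_days>
        <!-- "store up to an additional X days": the +0.01 value in this thread -->
        <work_buf_additional_days>0.01</work_buf_additional_days>
    </global_preferences>

After editing, the Manager's "Read local prefs file" command (under Options in the Advanced view, if memory serves) should make the client pick up the new values without a restart.
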
betreger · Joined: 29 Jun 99 · Posts: 11361 · Credit: 29,581,041 · RAC: 66
> [...] Last time I did that, I turned my back for a few minutes and had, IIRC, 736 tasks... way, way overcommitted!

Yep, you gotta be careful. Very dangerous after the cocktail hour.

TBar · Joined: 22 May 99 · Posts: 5204 · Credit: 840,779,836 · RAC: 2,768
> I kept threatening, mainly toggling preferences and the Triple Update. Haven't resorted to kicking the server. Cache is full right now. Not going to do anything. Will have to see where I'm at in the morning. Calling it a night.

I've been seeing the same on my Linux machines today, similar to a rolling blackout. The Server will stop sending the tasks requested by the Client and just send a few tasks at random. Once the Host is down by around 100 tasks, the Server will recover and fill the cache. A while later the same will happen on a different machine. The current victim is down by about 70 tasks and just received 5 new tasks instead of the 70 or so the client is requesting. The triple-update routine hasn't had any effect so far. The cache should be around 220 on this machine: https://setiathome.berkeley.edu/results.php?hostid=6906726&offset=140

Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> [...] Last time I did that, I turned my back for a few minutes and had, IIRC, 736 tasks... way, way overcommitted!

Ha! LOL. Been there... done that. I have you beat: I once forgot to switch to NNT (No New Tasks) for an hour and accumulated over 5,000 tasks. Couldn't even abort them all in one shot; I had to take whacks at a couple of hundred at a time.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours · A proud member of the OFA (Old Farts Association)

Zalster · Joined: 27 May 99 · Posts: 5517 · Credit: 528,817,460 · RAC: 242
> [...] Last time I did that, I turned my back for a few minutes and had, IIRC, 736 tasks... way, way overcommitted!

Dang, I hate when that happens... Wait... Hold this...

Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13736 · Credit: 208,696,464 · RAC: 304
The splitter output has fallen even further. They were good for 50+/s, then it dropped to around 42/s; now they're struggling to provide 30/s. That's about 108,000 per hour. Unfortunately, current demand is 130,000/hr minimum, averaging around 135,000. We need 39/s as a minimum to meet peak demand (140,000/hr) and keep a ready-to-send buffer with the present load. In a few hours there will be no work left in the ready-to-send buffer, and caches will start to run down (more than they normally do) and not get refilled till the splitter output recovers. I think Eric might need to do some further splitter troubleshooting. Or it could be related to the general server-system malaise: the Replica keeps dropping behind, and the WU deleters likewise can't keep up.
Grant · Darwin NT

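A quick check of those rate figures (the demand numbers are Grant's; the arithmetic below only verifies the conversion):

\[
30\ \text{WU/s} \times 3600\ \text{s/hr} = 108{,}000\ \text{WU/hr},
\qquad
\frac{140{,}000\ \text{WU/hr}}{3600\ \text{s/hr}} \approx 38.9\ \text{WU/s},
\]

so roughly 39/s is indeed the minimum sustained splitter rate needed to cover a 140,000/hr peak.
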
juan BFP · Joined: 16 Mar 07 · Posts: 9786 · Credit: 572,710,851 · RAC: 3,799
Why do these things always happen on a Friday? TGIF cocktail hours? Oops, 5:10 PM, I'm late for the first one.

Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13736 · Credit: 208,696,464 · RAC: 304
Splitters still struggling. There was a brief boost, but not enough to top up the ready-to-send buffer, or even stop the decline; it just slowed it down for a bit. About 3-3.5 hrs of work left at the current rate of consumption.
Grant · Darwin NT

Jeff Buck · Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0
Here's a theory for y'all. Do you suppose Meltdown and Spectre patches have been applied to that server, possibly degrading performance?

Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> Here's a theory for y'all. Do you suppose Meltdown and Spectre patches have been applied to that server, possibly degrading performance?

At the back of my mind also... great minds thinking alike and all :-}
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours · A proud member of the OFA (Old Farts Association)

Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13736 · Credit: 208,696,464 · RAC: 304
I notice that at the same time the Deleters cleared a huge backlog, the Splitters picked up their pace. They've since dropped their output again, but at least they're still producing enough to slowly build up the ready-to-send buffer. Deleter I/O affecting splitter I/O?
Grant · Darwin NT

Speedy · Joined: 26 Jun 04 · Posts: 1643 · Credit: 12,921,799 · RAC: 89
> I notice that at the same time the Deleters cleared a huge backlog, the Splitters picked up their pace. They've since dropped their output again, but at least they're still producing enough to slowly build up the ready-to-send buffer.

You could be onto something there, Grant. When you say a "huge backlog" was cleared, are you talking about workunit files/result files waiting to be deleted, or were you referring to the DB purge? Currently sitting at 3.463 million results.

Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13736 · Credit: 208,696,464 · RAC: 304
> When you say a "huge backlog" was cleared, are you talking about workunit files/result files waiting to be deleted, or were you referring to the DB purge?

MB WUs-awaiting-deletion went from 398,000 to 100,000 in 30 min or less (hard to tell due to the scale of the graphs). At roughly that point in time, the splitters went from 35/s to over 60/s. WUs-awaiting-deletion dropped slightly further, but since then has started climbing again. And as it has started climbing again, the splitter output has declined again (60/s, down to 50/s, down to 30/s). Hence my wild speculation that some of the splitter issues are related to I/O contention in the database/file storage.
Received-last-hour is still around 135,000. It used to be that 90k or over meant a shorty storm; then 90k-100k became the new norm; now it's 135k. The Replica used to be able to keep up after the outages, but not any more: often it's only a few minutes behind, but there are now more frequent periods of 30 min or more. An I/O bottleneck is my personal theory, be it security-patch related, or just coming up against the limits of the present HDD-based storage.
Grant · Darwin NT

Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13736 · Credit: 208,696,464 · RAC: 304
MB WUs-awaiting-deletion on the rise, splitter output on the decline (below 30/s now). About 5 hrs of work left in the ready-to-send buffer at the present rate of its decline.
Grant · Darwin NT

Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874
> Here's a theory for y'all. Do you suppose Meltdown and Spectre patches have been applied to that server, possibly degrading performance?
> At the back of my mind also... great minds thinking alike and all :-}

I heard Kevin Reed say that the World Community Grid servers had slowed by between 20% and 30% when they applied the patches. Fortunately, WCG had recently upgraded the hardware, so they had enough headroom - but they would have been struggling with the previous hardware. Servers are different beasts from consumer PCs, and they do a different job.

Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> I heard Kevin Reed say that the World Community Grid servers had slowed by between 20% and 30% when they applied the patches. Fortunately, WCG had recently upgraded the hardware, so they had enough headroom - but they would have been struggling with the previous hardware.

I have my suspicions too. After all, servers do LOTS of I/O transactions, and in all the online tests I read, the most deleterious effect the patches had on server software was in apps that do a lot of I/O.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours · A proud member of the OFA (Old Farts Association)

Jeff Buck · Joined: 11 Feb 00 · Posts: 1441 · Credit: 148,764,870 · RAC: 0
Back about a week ago, Eric wrote:
> If we don't start building a queue I'll add more GBT splitters.

I wonder if that's still an option.

Keith Myers · Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
Change the unused pfb splitters over to gbt splitters.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours · A proud member of the OFA (Old Farts Association)

Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14650 · Credit: 200,643,578 · RAC: 874
> Back about a week ago, Eric wrote: [...]

Although Eric attended the same teleconference with Kevin Reed, he joined us a few minutes late: Kevin told us about the slowdown in the general chit-chat before the start of the formal business (which was about something completely different), so Eric didn't hear the actual statement. But I expect he's found out about it by now.
