Message boards :
Number crunching :
The Server Issues / Outages Thread - Panic Mode On! (118)
AllgoodGuy · Joined: 29 May 01 · Posts: 293 · Credit: 16,348,499 · RAC: 266
That didn't take too long today.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> That didn't take too long today.
About double what it should be from what it was in the past.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13882 · Credit: 208,696,464 · RAC: 304
> That didn't take too long today.
> About double what it should be from what it was in the past.
But still way better than it has been recently. Got home to find one system with a full cache, and the other system with a bunch of downloads in extended backoff mode, but one Retry and everything came down OK. And since then the Scheduler has been dishing out work and it hasn't taken any effort on my part to download it. First time for several weeks.
Edit: although it looks like we are about to run out of work; the splitters are still having issues getting going again after an outage. And one of my systems has a tonne of Shorties in its cache, which is not going to help the Validation & Assimilation backlogs clear.
Grant
Darwin NT
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
True, better than the past couple of weeks, but nowhere near the Tuesday outage boilerplate of a 4-5 hour outage for the database backup. I still have one system that stubbornly refuses to get any CPU work even though it requests it, and only finally gets CPU tasks once the GPU cache is filled. Infuriating, when the CPUs provide a good portion of the house heating during winter.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Joined: 1 Apr 13 · Posts: 1858 · Credit: 268,616,081 · RAC: 1,349
Best recovery I've seen in quite a while. Hopefully this means they've gotten a handle on the issues. Perfect? No, but I can live with this ...
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13882 · Credit: 208,696,464 · RAC: 304
> Best recovery I've seen in quite a while.
It's just a case of finally no more BLC35 files being split and putting out pretty much nothing but noise bombs. The fact is the backlogs from that (and the added replication for the RX5000 series issues) are still to be cleared; luckily they're presently low enough not to cause everything to fall over or come to a grinding halt.
Grant
Darwin NT
Stephen "Heretic" · Joined: 20 Sep 12 · Posts: 5557 · Credit: 192,787,363 · RAC: 628
> Best recovery I've seen in quite a while.
. . Hi Jimbo,
. . Less than 9 hours is a pleasant change from recent outages.
. . And the recovery has been very smooth, with only a few niggling 'http internal error' messages.
Stephen :)
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13882 · Credit: 208,696,464 · RAC: 304
I'm wondering if they ran some sort of script during this outage to jiggle any missed WUs? I'm seeing a lot more groups of re-sends than usual: getting batches of 20+ at a time.
Grant
Darwin NT
Ville Saari · Joined: 30 Nov 00 · Posts: 1158 · Credit: 49,177,052 · RAC: 82,530
> . . Less than 9 hours is a pleasant change from recent outages.
You apparently count the outage length differently than I do. My system logged that as an 11.93-hour outage. That's the time between the last reported or received task before the outage and the first received new task after it. So for me the outage ends when the servers have recovered to the point where they can actually hand out new work. That's what matters if I want to determine whether my cache was sufficient to 'survive' the outage.
Actually, the end should be at the point when the server can feed my computers new work faster than they can process it, so that the caches start filling. The first received task could be a random single task long before the faucets open for real, but the first received task is just much easier to measure.
December 3rd was the last 'normal' Tuesday outage (4.21 hours). All outages after that have been way longer - except some unplanned ones.
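Ville's measure of outage length is simple to pin down in code: the gap between the last task event before the outage and the first new task received after it. A minimal sketch in Python; the timestamps here are purely illustrative (the actual log times are not in the thread), chosen only to reproduce an 11.93-hour gap:

```python
from datetime import datetime

def outage_hours(last_task_before: str, first_task_after: str) -> float:
    """Outage length as Ville counts it: time between the last reported or
    received task before the outage and the first new task received after it."""
    fmt = "%Y-%m-%d %H:%M:%S"
    t0 = datetime.strptime(last_task_before, fmt)
    t1 = datetime.strptime(first_task_after, fmt)
    return (t1 - t0).total_seconds() / 3600.0

# Illustrative timestamps only: 23:02 one day to 10:58 the next
# is 11 h 56 m, i.e. about 11.93 hours.
print(f"{outage_hours('2020-01-21 23:02:00', '2020-01-22 10:58:00'):.2f}")
```

By this definition the "outage" includes part of the recovery period, which is exactly the point of disagreement with Stephen's maintenance-window-only count below.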
W-K 666 · Joined: 18 May 99 · Posts: 19485 · Credit: 40,757,560 · RAC: 67
> True, better than the past couple of weeks, but nowhere near the Tuesday outage boilerplate of a 4-5 hour outage for database backup.
During the outage, did your CPUs run out of tasks?
Stephen "Heretic" · Joined: 20 Sep 12 · Posts: 5557 · Credit: 192,787,363 · RAC: 628
> I'm wondering if they ran some sort of script during this outage to jiggle any missed WUs? I'm seeing a lot more groups of re-sends than usual. Getting batches of 20+ at a time.
. . A measure to try and clear the backlog, perhaps?
Stephen "Heretic" · Joined: 20 Sep 12 · Posts: 5557 · Credit: 192,787,363 · RAC: 628
> You apparently count the outage length differently than I do. My system logged that as an 11.93-hour outage. That's the time between the last reported or received task before the outage and the first received new task after it. So for me the outage ends when the servers have recovered to the point where they can actually hand out new work. That's what matters if I want to determine if my cache was sufficient to 'survive' the outage.
. . I think the general consensus is that it starts when the servers go into maintenance mode and ends when they come back online and we no longer receive a "Shut down for maintenance" message. After that, the time till we receive "normal" feeds of WUs is the recovery period.
. . I found the recovery pretty smooth and not long this time. This outage began at approx. 12:30 am local time and I was able to report work from about 9:05 am, with only a relatively small number of "http internal error" messages, so the outage was around 8.5 hours. I agree that the assessment of work required to get through an outage must include the recovery period, but I was receiving 'significant' downloads (20 or more WUs) by about 10:15 am. So, all up, still less than 10 hours here. After 24 to 48 hour monsters I welcome the improvement ...
Stephen
< shrug >
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> During the outage, did your CPUs run out of tasks?
The Ryzens, ALWAYS. The Intel Xeon almost always has CPU work during these long outages; it is just way slower than the Ryzens.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
W-K 666 · Joined: 18 May 99 · Posts: 19485 · Credit: 40,757,560 · RAC: 67
> The Ryzens, ALWAYS.
I just took a quick look at one of your Ryzens (computer 5741129) and guesstimate that about 60 tasks/day would keep each core busy, excluding noise bombs. So the next question has to be: why are there not enough tasks being cached for the CPUs?
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
> I just took a quick look at one of your Ryzens (computer 5741129) and guesstimate that about 60 tasks/day would keep each core busy, excluding noise bombs.
I don't know how you came up with 60 tasks a day on that host. BoincTasks says that host does 576 CPU tasks a day. Generally I do a CPU task in 25-45 minutes for most work, and have 16 threads running CPU work. I don't generally reschedule GPU work to the CPU; I just crunch through my allotted 150 tasks and then the CPU goes idle for the rest of the outage. I only do CPU work for Seti.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Speedy · Joined: 26 Jun 04 · Posts: 1643 · Credit: 12,921,799 · RAC: 89
I am just guessing, Keith: I am thinking possibly a slower CPU by 600 MHz, and running Windows 10.
Joined: 29 Apr 01 · Posts: 13164 · Credit: 1,160,866,277 · RAC: 1,873
Well, that host has the new Ryzen 3950X CPU, so 32 threads available, and I have the cores locked to 4.2 GHz and run the memory at 3600 MHz CL14. So it crunches through the CPU tasks the fastest of all my hosts. The second fastest is the 3900X, with the same memory clock speed but only running 4.15 GHz and only 12 threads running CPU work. That one is currently in pieces on the kitchen table getting a custom loop cooling upgrade so I can punch the clocks up on it. The Threadripper is next fastest and is mostly hamstrung by slower memory and a slower, auto-boosted and variable clock. I'm waiting on a new hot-shirt CPU block for it so I can move to locked clocks, which will improve the CPU processing time. Hoping to punch the memory clocks up too with a cooler CPU.
Seti@Home classic workunits: 20,676 · CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
W-K 666 · Joined: 18 May 99 · Posts: 19485 · Credit: 40,757,560 · RAC: 67
> I just took a quick look at one of your Ryzens (computer 5741129) and guesstimate that about 60 tasks/day would keep each core busy, excluding noise bombs.
I did say each core. Surely the limits set by Seti are per processor, not per type of processor. So I must ask again: why is your CPU cache not storing enough to last a 12-hour outage, when by your own numbers each core/thread is averaging 36 tasks/day? At a max cache/processor of 100, a number I think is now out of date, you should be able to cache nearly three days of CPU work, assuming no -9's.
AllgoodGuy · Joined: 29 May 01 · Posts: 293 · Credit: 16,348,499 · RAC: 266
> At a max cache/processor of 100, a number I think is now out of date, you should be able to cache nearly three days of cpu work. Assuming no -9's.
There we go using common sense again :) By your maths, I should be able to store about 1500 GPU tasks to keep me purring along, although I wish I could get a cache of about 150-200 opencl_ati_mac AP files a day as a decent replacement.
Edit: Next you'll be saying we should no longer crunch AP on CPU because it is too inefficient, and has too many casual users who don't devote their machines to the extent full-time crunchers do. But hey, it is a good hobby.
Stephen "Heretic" · Joined: 20 Sep 12 · Posts: 5557 · Credit: 192,787,363 · RAC: 628
> I did say each core.
. . The limit is per CPU, not per core. So we only get 150 WUs whether you have 4 cores or 64 cores.
Stephen
< shrug >
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.