The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2034066 - Posted: 26 Feb 2020, 18:13:04 UTC

I've already cut down to just 3 machines running SETI, and now one of those can't get enough work to keep busy. Since the outage yesterday morning it has only had work for about 7 hours. It's currently out of work again: https://setiathome.berkeley.edu/results.php?hostid=6813106
Perhaps I should cut back to just 2 machines?
ID: 2034066
Lazydude
Volunteer tester
Joined: 17 Jan 01
Posts: 45
Credit: 96,158,001
RAC: 136
Sweden
Message 2034069 - Posted: 26 Feb 2020, 18:43:28 UTC

I use RTT as an early warning:
32h and above is, in my eyes, OK
below 31h: WARNING
under 30h: "Houston, we have a small problem"

As of the time I wrote this:
Result turnaround time (last hour average): 29.60 hours
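As a rough sketch, the rule of thumb above maps to thresholds like the following. This is my reading of the poster's heuristic, not anything official, and it interpolates the unstated band between 31h and 32h as part of the warning zone:

```python
def rtt_status(rtt_hours):
    """Classify the server's average result turnaround time (RTT).

    Thresholds follow the rule of thumb quoted above (a user heuristic,
    not an official metric): 32 h and above is OK, under 30 h is trouble,
    and the band in between is treated as a warning.
    """
    if rtt_hours >= 32:
        return "OK"
    if rtt_hours >= 30:
        return "WARNING"
    return "Houston, we have a small problem"

print(rtt_status(29.60))  # the value quoted above -> "Houston, we have a small problem"
```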
ID: 2034069
juan BFP
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2034070 - Posted: 26 Feb 2020, 19:11:39 UTC - in response to Message 2034069.  
Last modified: 26 Feb 2020, 19:18:58 UTC

I use RTT as an early warning:
32h and above is, in my eyes, OK
below 31h: WARNING
under 30h: "Houston, we have a small problem"

As of the time I wrote this:
Result turnaround time (last hour average): 29.60 hours

By your scale, now at 31.34 hours (just 30 min after your post), we are in the WARNING stage.

<edit> Reached 32.25 hours at 19:10:04 UTC, a few minutes later.

At this rate of increase: are we doomed?
ID: 2034070
Lazydude
Volunteer tester
Joined: 17 Jan 01
Posts: 45
Credit: 96,158,001
RAC: 136
Sweden
Message 2034071 - Posted: 26 Feb 2020, 19:28:32 UTC

At this rate of increase: are we doomed?

No - it's a warning when the trend is towards shorter times.
Now it's in recovery mode, since the trend is upwards.

I may add that when it goes over roughly 36h, then we have had an outage.
ID: 2034071
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2034108 - Posted: 27 Feb 2020, 0:11:39 UTC

Now the 2nd out of 3 machines has run out of work: https://setiathome.berkeley.edu/results.php?hostid=6796479
That leaves 1 machine still working. I suppose when that one runs out of work I'll just shut everything down and brag about how much money I'm saving on electricity.
ID: 2034108
W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19118
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034116 - Posted: 27 Feb 2020, 0:45:07 UTC - in response to Message 2034108.  

Now the 2nd out of 3 machines has run out of work: https://setiathome.berkeley.edu/results.php?hostid=6796479
That leaves 1 machine still working. I suppose when that one runs out of work I'll just shut everything down and brag about how much money I'm saving on electricity.

I can only assume it is a problem at your end. I've had very few problems since 08:00 UTC on the 26th.
ID: 2034116
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2034123 - Posted: 27 Feb 2020, 1:02:31 UTC - in response to Message 2034116.  

That's what you get for assuming. Since making that post, the machine that had run out of work now has 400 tasks instead of zero. I had absolutely nothing to do with it.
ID: 2034123
Boiler Paul
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2034125 - Posted: 27 Feb 2020, 1:12:09 UTC

Work can be hard to come by. All I've gotten over the past few hours is "Project has no tasks available" in the log. Just need to be patient.
ID: 2034125
Boiler Paul
Joined: 4 May 00
Posts: 232
Credit: 4,965,771
RAC: 64
United States
Message 2034126 - Posted: 27 Feb 2020, 1:17:51 UTC - in response to Message 2034125.  

and, of course, after I post, I receive work!
ID: 2034126
Jimbocous
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2034131 - Posted: 27 Feb 2020, 1:41:16 UTC

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
ID: 2034131
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 2034135 - Posted: 27 Feb 2020, 2:20:40 UTC - in response to Message 2034131.  

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.


+1
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 2034135
W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19118
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034141 - Posted: 27 Feb 2020, 2:54:08 UTC

Could it be that the assimilation process is the problem?

How difficult is it to translate the data we produce, plus all the other necessary details, and put it into the science database?

This is what the Server Status page says:
sah_assimilator/ap_assimilator : Takes scientific data from validated results and puts them in the SETI@home (or Astropulse) database for later analysis.
ID: 2034141
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2034144 - Posted: 27 Feb 2020, 3:09:55 UTC - in response to Message 2034131.  

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
It is an illusion. Everyone has the same priority, but the higher your RAC, the more successful scheduler requests you need to keep your cache from depleting.

If every 12th request wins the lottery and gets some work, then you get work roughly once an hour. That may be all a slow host needs to refill its cache to the brim, but it is nowhere near the one-hour production of a fast host.
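That lottery effect can be sketched in a few lines. All the numbers here are illustrative assumptions (a request every 5 minutes, a 1-in-12 success rate, 20 tasks per successful request, hypothetical consumption rates), not SETI@home's actual limits:

```python
import random

def simulate(tasks_per_hour, cache_size, hours=24, p_success=1 / 12, tasks_per_grant=20):
    """Track a host's cache when scheduler requests succeed at random.

    Assumed model: the client asks for work every 5 minutes (12 requests
    per hour); each request independently succeeds with probability
    p_success and grants tasks_per_grant tasks; the host consumes
    tasks_per_hour tasks per hour. Returns hours spent with an empty cache.
    """
    random.seed(42)  # fixed seed for a reproducible run
    cache = float(cache_size)
    dry_hours = 0.0
    for _ in range(hours * 12):          # one scheduler request per 5 minutes
        if random.random() < p_success:  # this request "wins the lottery"
            cache = min(cache_size, cache + tasks_per_grant)
        cache = max(0.0, cache - tasks_per_hour / 12)  # work done in 5 minutes
        if cache == 0.0:
            dry_hours += 1 / 12
    return dry_hours

# A slow host refills to the brim; a fast host spends most of the day dry.
slow_dry = simulate(tasks_per_hour=5, cache_size=150)
fast_dry = simulate(tasks_per_hour=100, cache_size=150)
print(f"slow host dry for {slow_dry:.1f} h, fast host dry for {fast_dry:.1f} h")
```

With the same request stream, the slow host never empties its cache while the fast host starves, which is exactly the "illusion" described: equal priority, unequal appetite.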
ID: 2034144
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2034146 - Posted: 27 Feb 2020, 3:23:54 UTC - in response to Message 2034141.  

Could it be that the assimilation process is the problem?
There has clearly been some problem with assimilation for the last several weeks, but it could lie in many different places. It could be the throughput of the BOINC database somehow hitting the assimilator harder than the other processes. Or it could be a problem in the assimilator program itself. Or it could be the throughput of the science databases, or of the upload filesystem where the result files the assimilator needs to read are stored.
ID: 2034146
W-K 666
Volunteer tester
Joined: 18 May 99
Posts: 19118
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2034167 - Posted: 27 Feb 2020, 8:17:43 UTC - in response to Message 2034131.  
Last modified: 27 Feb 2020, 8:25:22 UTC

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.

Maybe related, but I think it has more to do with how much work the host requests.
When I get up on Wednesday mornings (UTC times rule in the UK winter), if the computer hasn't started receiving work, I set the cache to a very low level. I find that usually works after a few attempts, and as I receive work I increase the cache in steps up to 0.6 days, which, unless the servers give me oodles of AP, fills the GPU cache to 150 tasks.

Also, here are some numbers on tasks downloaded and validated in the 24 hours since 08:06:31 on 26th Feb, ~24 hours ago.
After 12 hours, at ~20:00 26th:
Downloaded = 345; In Progress = 150; Valid = 86
Processed = 345 - 150 = 195
Percentage of tasks downloaded and validated in 12 hours = 100 * 86 / 195 = 44.1%

After 24 hours, at ~08:00 27th:
Downloaded = 523; In Progress = 150; Valid = 253
Processed = 523 - 150 = 373
Percentage of tasks downloaded and validated in 24 hours = 100 * 253 / 373 = 67.8%
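The arithmetic above can be reproduced in a few lines, using the figures quoted in this post:

```python
def validated_percentage(downloaded, in_progress, valid):
    """Share of processed tasks that have validated.

    Processed = tasks no longer in the cache (downloaded minus in progress);
    the percentage compares validated results against that processed count.
    """
    processed = downloaded - in_progress
    return processed, 100 * valid / processed

# The two snapshots quoted above, ~12 h and ~24 h after the outage.
p12, pct12 = validated_percentage(downloaded=345, in_progress=150, valid=86)
p24, pct24 = validated_percentage(downloaded=523, in_progress=150, valid=253)
print(f"12 h: processed {p12}, validated {pct12:.1f}%")  # 195, 44.1%
print(f"24 h: processed {p24}, validated {pct24:.1f}%")  # 373, 67.8%
```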

I only crunch on the GPU, so it is fairly simple to scroll through the pages, count each page, and add up the totals.

[edit] Prior to 08:06 yesterday the SETI cache was empty.
ID: 2034167
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2034169 - Posted: 27 Feb 2020, 8:32:03 UTC - in response to Message 2034167.  

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
Maybe related, but I think it has more to do with how much work the host requests.
When I get up on Wednesday mornings (UTC times rule in the UK winter), if the computer hasn't started receiving work, I set the cache to a very low level. I find that usually works after a few attempts, and as I receive work I increase the cache in steps up to 0.6 days, which, unless the servers give me oodles of AP, fills the GPU cache to 150 tasks.
That's my experience, too. I now have two machines in the 'high RAC' category (top 100): they were both completely dry yesterday morning. I did a little Einstein backup work while the servers were sorting themselves out, but once work started flowing, I ramped them up gently by requesting an hour of work at a time (0.05 days) and increasing the cache a step at a time as they filled up. Reached full cache by evening, with just a little tweak any time I happened to be passing.
ID: 2034169
AllgoodGuy
Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2034224 - Posted: 27 Feb 2020, 18:48:43 UTC - in response to Message 2034169.  

Validation Pending is still steadily growing; it looks like around 23 million objects waiting to be satisfied. Still getting work though, despite the RTS showing a pretty steady 0. I even fell asleep in the wrong configuration the night before last, which dropped my Pending column below its normal average, but I'm well over that again. This poor system needs a break.
ID: 2034224
Jimbocous
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2034292 - Posted: 27 Feb 2020, 22:28:05 UTC - in response to Message 2034169.  
Last modified: 27 Feb 2020, 22:35:36 UTC

I'm still convinced that somehow, whether it be intent or just net result, the higher your RAC is the lower you are in the priority stack in terms of actually getting work during a recovery from outage. This is entirely too consistent to be the luck of the draw.
Maybe related, but I think it has more to do with how much work the host requests.
When I get up on Wednesday mornings (UTC times rule in the UK winter), if the computer hasn't started receiving work, I set the cache to a very low level. I find that usually works after a few attempts, and as I receive work I increase the cache in steps up to 0.6 days, which, unless the servers give me oodles of AP, fills the GPU cache to 150 tasks.
That's my experience, too. I now have two machines in the 'high RAC' category (top 100): they were both completely dry yesterday morning. I did a little Einstein backup work while the servers were sorting themselves out, but once work started flowing, I ramped them up gently by requesting an hour of work at a time (0.05 days) and increasing the cache a step at a time as they filled up. Reached full cache by evening, with just a little tweak any time I happened to be passing.

Sounds like a reality, not "an illusion". Main cruncher cache here is around 25% to ~50% [my error]. No heartburn; work is getting assigned and completed, but it seems clear that there's more than "first come, first served" going on.
ID: 2034292
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22241
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2034295 - Posted: 27 Feb 2020, 22:33:04 UTC

I can't help wondering if the splitters are being deliberately throttled in an attempt to reduce the amount of work sitting around in the various queues. After all, work not being split will have that effect.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2034295
Wiggo
Joined: 24 Jan 00
Posts: 35004
Credit: 261,360,520
RAC: 489
Australia
Message 2034303 - Posted: 27 Feb 2020, 23:04:48 UTC - in response to Message 2034295.  
Last modified: 27 Feb 2020, 23:05:59 UTC

I can't help wondering if the splitters are being deliberately throttled in an attempt to reduce the amount of work sitting around in the various queues. After all, work not being split will have that effect.
Yes they are. Eric stated that this is being done to try to keep the system within its RAM limits; I just can't remember where that post was made, or whether Eric actually made it or it was passed along, ATM.

Cheers.
ID: 2034303


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.