The Server Issues / Outages Thread - Panic Mode On! (119)

Message boards : Number crunching : The Server Issues / Outages Thread - Panic Mode On! (119)

AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041092 - Posted: 28 Mar 2020, 16:06:08 UTC - in response to Message 2041089.  
Last modified: 28 Mar 2020, 16:07:27 UTC

Yep. I'm just going through the tasks I have and making sure the _2 and _1 are prioritized and running first. Maybe we can force a small kick on the system by trying to clear out the DB???

Edit: My room is so much colder now.
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22221
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2041095 - Posted: 28 Mar 2020, 16:12:21 UTC

No point in prioritising _1 tasks; they are one half of the initial pair sent out.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041096 - Posted: 28 Mar 2020, 16:13:19 UTC - in response to Message 2041092.  

Yep. I'm just going through the tasks I have and making sure the _2 and _1 are prioritized and running first. Maybe we can force a small kick on the system by trying to clear out the DB???
There's no point in prioritizing _1s. Only _2s and up. _0 and _1 are the initial replication, not resends.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2041097 - Posted: 28 Mar 2020, 16:18:53 UTC - in response to Message 2041089.  
Last modified: 28 Mar 2020, 16:28:53 UTC

Where is the newly split data going?
I'm getting some, in short bursts every few hours. It's just the luck of asking at exactly the right time.

Edit - I didn't get everything I wanted, but it'll do for now. These were all new work, not resends.

28/03/2020 16:26:07 | SETI@home | Reporting 1 completed tasks
28/03/2020 16:26:07 | SETI@home | [sched_op] NVIDIA GPU work request: 36067.85 seconds; 0.00 devices
28/03/2020 16:26:10 | SETI@home | Scheduler request completed: got 31 new tasks
28/03/2020 16:26:10 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 15004 seconds
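A quick sanity check on a burst like this is to divide the estimated total duration by the number of tasks received. A sketch in Python (the figures are copied from the log above; the variable names are mine):

```python
# Figures copied from the scheduler log above.
requested_seconds = 36067.85       # GPU work requested from the scheduler
tasks_received = 31                # new tasks actually granted
estimated_total_seconds = 15004    # scheduler's estimate for those tasks

per_task = estimated_total_seconds / tasks_received
shortfall = requested_seconds - estimated_total_seconds

print(f"average estimate per task: {per_task:.0f} s")   # 484 s
print(f"request still unfilled by: {shortfall:.0f} s")  # 21064 s
```

So even a "lucky" burst of 31 tasks filled well under half of the ten hours of work requested.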
Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2041098 - Posted: 28 Mar 2020, 16:20:52 UTC

I wish the system were making some progress on assimilation, since it isn't doing much splitting at the moment, if any. I know the Status page data is about an hour out of date, but I'd still like to see the system recovering a bit.

I have been prioritizing _2s as well. I used to do it to help the db size, but recently I have been doing it because they are usually short-run WUs that I could return and get a "meatier" WU in their place. Since I'm no longer topped off, and thus have space should any request actually get something, I'm no longer prioritizing anything and I'm running in normal FIFO order. I think I'm good for another 2 days, so I'm hanging out for a bit longer.

Glad to have made it over 2 million credit between this and the previous Mac mini.
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041100 - Posted: 28 Mar 2020, 16:34:44 UTC

Well then...the _2s are running. Not much help on my end.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2041103 - Posted: 28 Mar 2020, 16:42:11 UTC

All my machines are out of work and not receiving much of anything. I think it's time to do something a little more constructive, I dunno, maybe finish burning the leaves or something...
[TA]Assimilator1
Joined: 16 Oct 99
Posts: 52
Credit: 8,551,146
RAC: 50
United Kingdom
Message 2041104 - Posted: 28 Mar 2020, 16:43:54 UTC
Last modified: 28 Mar 2020, 16:44:19 UTC

I seem to be having the opposite problem to most: so far my faster rig hasn't run out of WUs (it got some at ~11:30, 14:30 & 15:30), but the cache is dwindling.
My 2nd rig's CPU ran out of WUs yesterday, bar 1 AP WU which is going to take over a day!! Its GPU was getting WUs sporadically but is now out.

So is the consensus just general server problems which aren't being addressed, rather than a purposeful wind-down?
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, DHEP@H.

Main rig - Ryzen 5 3600, 32GB DDR4 3200, RX 580 8GB, 500GB Samsung 970 Evo+, Win 10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR2 1866, HD 7870 XT 3GB (DS), Win7 64bit
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041109 - Posted: 28 Mar 2020, 17:08:42 UTC - in response to Message 2041104.  

So is the consensus just general server problems which aren't being addressed, rather than a purposeful wind down?
The database server is out of RAM and running in snail mode and the splitters are being heavily throttled in an attempt to reduce the size of the database so it could recover.
Keith T.
Volunteer tester
Joined: 23 Aug 99
Posts: 962
Credit: 537,293
RAC: 9
United Kingdom
Message 2041113 - Posted: 28 Mar 2020, 17:18:38 UTC - in response to Message 2041092.  

Yep. I'm just going through the tasks I have and making sure the _2 and _1 are prioritized and running first. Maybe we can force a small kick on the system by trying to clear out the DB???

Edit: My room is so much colder now.


_1 and _0 are original, normal tasks

_2 or greater are the re-sends.
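That suffix rule is easy to mechanise when sorting through a task list. A hypothetical helper in Python (the `_N` replication-number convention is as described above; the function name and example task names are made up for illustration):

```python
def is_resend(task_name: str) -> bool:
    """Return True if the trailing _N replication number marks a resend.

    _0 and _1 are the initial replication pair sent to two hosts;
    _2 and above are resends issued after a timeout or invalid result.
    """
    suffix = task_name.rsplit("_", 1)[-1]
    return suffix.isdigit() and int(suffix) >= 2

print(is_resend("24mr20aa.12345.678.wu_1"))  # False: half of the initial pair
print(is_resend("24mr20aa.12345.678.wu_3"))  # True: a resend
```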
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041117 - Posted: 28 Mar 2020, 17:34:43 UTC - in response to Message 2041097.  

Where is the newly split data going?
I'm getting some, in short bursts every few hours. It's just the luck of asking at exactly the right time.

Edit - I didn't get everything I wanted, but it'll do for now. These were all new work, not resends.

28/03/2020 16:26:07 | SETI@home | Reporting 1 completed tasks
28/03/2020 16:26:07 | SETI@home | [sched_op] NVIDIA GPU work request: 36067.85 seconds; 0.00 devices
28/03/2020 16:26:10 | SETI@home | Scheduler request completed: got 31 new tasks
28/03/2020 16:26:10 | SETI@home | [sched_op] estimated total NVIDIA GPU task duration: 15004 seconds


Just got:
28-Mar-2020 12:29:30 [SETI@home] Scheduler request completed: got 33 new tasks

Does that indicate we both won the S@H new-WU lotto?
AllgoodGuy

Joined: 29 May 01
Posts: 293
Credit: 16,348,499
RAC: 266
United States
Message 2041120 - Posted: 28 Mar 2020, 17:46:26 UTC

7 splitters splitting
6 machines doing nothing
5 extraterrestrial rings
4 servers burning
3 clients crashing
2 servers smashing
and an adult beverage to be enjoyed now.
[TA]Assimilator1
Joined: 16 Oct 99
Posts: 52
Credit: 8,551,146
RAC: 50
United Kingdom
Message 2041126 - Posted: 28 Mar 2020, 18:12:48 UTC - in response to Message 2041109.  
Last modified: 28 Mar 2020, 18:14:43 UTC

Lol! :D

So is the consensus just general server problems which aren't being addressed, rather than a purposeful wind down?
The database server is out of RAM and running in snail mode and the splitters are being heavily throttled in an attempt to reduce the size of the database so it could recover.


Ok, thanks :) (not that I can remember what the splitters do :o ...oh wait, do they split up the large data blocks from the observatory(ies) into the smaller chunks we crunch?)
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, DHEP@H.

Main rig - Ryzen 5 3600, 32GB DDR4 3200, RX 580 8GB, 500GB Samsung 970 Evo+, Win 10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR2 1866, HD 7870 XT 3GB (DS), Win7 64bit
Ville Saari
Joined: 30 Nov 00
Posts: 1158
Credit: 49,177,052
RAC: 82,530
Finland
Message 2041130 - Posted: 28 Mar 2020, 18:26:34 UTC - in response to Message 2041126.  

Ok, thanks :) (not that I can remember what the splitters do :o, ........oh wait, do they split up the large data blocks from the observatory(ies?) to the smaller chunks we crunch?
They produce the tasks for us to download and crunch. If they are not running and the RTS buffer is empty, all we get are occasional resends, and even to get those our computers have to win the lottery against thousands of other hungry computers.
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041146 - Posted: 28 Mar 2020, 19:28:56 UTC

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2041159 - Posted: 28 Mar 2020, 20:11:18 UTC
Last modified: 28 Mar 2020, 20:25:24 UTC

Still barely any work, so it looks like the Servers have finally ground to a halt.
"Results out in the field" is only 5 million now. Dropping away as work comes in but little goes out.
"Results returned and awaiting validation" 20.2 million and "Workunits waiting for assimilation" 6.85 million.

I think they might as well shut things down now.
Leave just the BOINC master database, SETI@home science database, data-driven web pages, the Transitioners and the sah Assimilators running, but shut down all other functions (incl. AP, the Scheduler, upload & download servers, Replica etc). Just let the Assimilators clear that 6.85 million WU backlog.
Once that is done, then start up the Deleters to delete all the Tasks & WUs that have now been marked for deletion. Once that backlog clears, then run the Purgers to actually remove everything marked for Deletion. And once that is done, time for one final official weekly outage to compact the database.

It would be nice to think this could all be done in a few days, but given the state of the database I can see it taking a week (or more). But once done, the database shouldn't have any further issues, with 6.8 million fewer WUs and at least 13.6 million fewer Results in it, along with much smaller indexes.

Fire up everything except for the Scheduler, upload server (and of course the splitters). Let things run for a bit (after a week there should be plenty of resends to go out), then start up the Scheduler, then a bit later the upload server, and let the WU cleanup lottery commence.
And I would really, really, really hope they finally implement the revised deadlines for resends: set it to 2 days & get the worst of the outstanding work cleared up in 3.5 months, instead of the 10+ months it will take (if all work is to be cleared completely).
Grant
Darwin NT
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2041161 - Posted: 28 Mar 2020, 20:24:27 UTC - in response to Message 2041146.  

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.

I am amazed a Linux machine can process 6 work units in under a minute. Did I read that correctly?
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13746
Credit: 208,696,464
RAC: 304
Australia
Message 2041162 - Posted: 28 Mar 2020, 20:26:29 UTC - in response to Message 2041161.  

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.
I am amazed a Linux machine can process 6 work units in under a minute. Did I read that correctly?
Depends how many GPUs you have, and what type. High-end cards can do a WU in 30 sec or so.
Grant
Darwin NT
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2041165 - Posted: 28 Mar 2020, 20:40:25 UTC - in response to Message 2041162.  
Last modified: 28 Mar 2020, 21:29:19 UTC

I feel so lucky, I just got 6 new WUs!!! Enough for less than a minute... but it's progress. 6 is better than nothing.
I am amazed a Linux machine can process 6 work units in under a minute. Did I read that correctly?
Depends how many GPUs you have, and what type. High end cards can do a WU in 30sec or so.

Yes, you are right. This host now has 4 GPUs (could be more) and it crunches a WU in about 70 secs (some in the range of 30 secs, the slower ones in about 2:10), so 6 WUs normally last 1-1:30 minutes. But the WUs currently produced are mainly shorties or noise bombs, so they crunch in about 1 min only.
I measure the crunching performance by how many WUs are reported on each scheduler call. It goes from 22 WUs to 60 every 5 min. Then just do the math.
Be aware I'm running 1070s & 2070s; on a 2080 Ti or Titan that time is cut by a factor of 2 or 3.
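Doing that math, a rough sketch (the per-call rates are taken from the post above; real throughput varies with the task mix):

```python
# Reported-per-call figures from the post above.
low_per_call, high_per_call = 22, 60   # WUs reported per scheduler call
call_interval_min = 5                  # one scheduler call every 5 minutes

calls_per_hour = 60 // call_interval_min
low_rate = low_per_call * calls_per_hour
high_rate = high_per_call * calls_per_hour
print(f"{low_rate}-{high_rate} WUs per hour")  # 264-720 WUs per hour
```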
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2041166 - Posted: 28 Mar 2020, 20:40:40 UTC - in response to Message 2041159.  

Once that is done, then start up the Deleters to delete all the Tasks & WUs that have now been marked for deletion. Once that backlog clears, then run the Purgers to actually remove everything marked for Deletion. And once that is done, time for one final official weekly outage to compact the database.

Unless the above can be done remotely, it will not get done until Berkeley is out of lockdown.



©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.