Panic Mode On (113) Server Problems?

Author	Message
Ghia Joined: 7 Feb 17 Posts: 238 Credit: 28,911,438 RAC: 50	Message 1962637 - Posted: 31 Oct 2018, 9:12:09 UTC - in response to Message 1962636.
> And with the Ready-to-send buffer empty, and splitter output at 18/s, it's going to be that way for a while. Edit: looks like the splitters have woken up. Now cranking out 80/s. As long as they can keep at 55 or better, then things will improve, eventually.
Well, something did wake up... suddenly my cache filled up, and I'm back to the steady hum of my GPU fans ;-) Humans may rule the world...but bacteria run it...
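The splitter figures quoted above come down to simple rate arithmetic: the Ready-to-send buffer only grows when splitter output exceeds the rate at which hosts draw work. A minimal Python sketch, assuming (as the quoted post implies) that demand is roughly 55 tasks/s, and ignoring the fact that an empty buffer cannot go below zero; the function name is hypothetical:

```python
SECONDS_PER_HOUR = 3600


def rts_change_per_hour(split_rate: float, demand_rate: float) -> float:
    """Net change in Ready-to-send tasks per hour.

    split_rate and demand_rate are in tasks per second. A positive result
    means the buffer grows; a negative one means hosts go short of work.
    """
    return (split_rate - demand_rate) * SECONDS_PER_HOUR


# Assumed demand of ~55 tasks/s, from "As long as they can keep at 55 or better".
for split_rate in (18, 55, 80):
    print(f"{split_rate}/s -> {rts_change_per_hour(split_rate, 55):+,.0f} tasks/hour")
# 18/s -> -133,200 tasks/hour
# 55/s -> +0 tasks/hour
# 80/s -> +90,000 tasks/hour
```

At 18/s the shortfall is about 133,200 tasks an hour, which an already empty buffer cannot absorb; at 80/s the buffer refills at about 90,000 tasks an hour.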

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304	Message 1962639 - Posted: 31 Oct 2018, 9:27:05 UTC Last modified: 31 Oct 2018, 9:28:53 UTC And one of the splitters has started on a BLC01 file, so hopefully the number of noise bombs will start to decline as the BLC22 & BLC23 files are finally finished off, and the other servers can clear the Validation/Assimilation backlog (now at 7.2/1.4 million). Grant Darwin NT

Kevin Olley Joined: 3 Aug 99 Posts: 906 Credit: 261,085,289 RAC: 572	Message 1962669 - Posted: 31 Oct 2018, 14:29:43 UTC - in response to Message 1962639.
> And one of the splitters has started on a BLC01 file, so hopefully the number of noise bombs will start to decline as the BLC22 & BLC23 files are finally finished off, and the other servers can clear the Validation/Assimilation backlog (now at 7.2/1.4 million).
Just done a couple of them; they are fast, 55 seconds each compared with 90 seconds for normal BLC22 & BLC23 WUs. Kevin

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1962750 - Posted: 31 Oct 2018, 22:29:29 UTC - in response to Message 1962669.
> > And one of the splitters has started on a BLC01 file, so hopefully the number of noise bombs will start to decline as the BLC22 & BLC23 files are finally finished off, and the other servers can clear the Validation/Assimilation backlog (now at 7.2/1.4 million).
> Just done a couple of them; they are fast, 55 seconds each compared with 90 seconds for normal BLC22 & BLC23 WUs.
. . They are the "new" GBT format, which first appeared in a Blc04 data series 12 months ago and became the norm by about Christmas last year or January this year. The Blc22/23 series we have been wading through, noise bombs and all, are the "old" format from before that. At least, they conform to the run times that identify each of what I call formats (for want of a better term). . . I like the new format much better :) Stephen :)

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304	Message 1962791 - Posted: 1 Nov 2018, 5:21:17 UTC I don't want to tempt fate, but I have to say I'm impressed with the servers at the moment. The recovery from the weekly outage wasn't all that great, but now that they have recovered they're holding up well. There's been a sustained return rate of 130k per hour, and the splitters have still been able to meet the demand and build up a Ready-to-send buffer as well. On top of that effort, they've also been able to put a dent in the Validation & Assimilation backlogs.

Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1962795 - Posted: 1 Nov 2018, 6:32:18 UTC Last modified: 1 Nov 2018, 7:02:07 UTC The status page numbers have some lag. I'm not sure if it is a sign of a problem or not. Edit to add: it was a false alarm. All is well.

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1962830 - Posted: 1 Nov 2018, 15:34:37 UTC So what is going on now? Just looked and all my hosts are out of GPU work: they are reporting and getting nothing in return. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)

Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640	Message 1962832 - Posted: 1 Nov 2018, 15:39:34 UTC Mine aren't out of work, but the queue does seem to be falling from the maximum it usually holds. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1962834 - Posted: 1 Nov 2018, 15:49:57 UTC - in response to Message 1962832. It looks like it's hit and miss over a dozen request cycles whether you get any work or not. My big iron has been empty for half an hour and just got a slug of 114 tasks 5 minutes ago, but that is almost half completed by now. My other crunchers are getting nothing and are now working on the backup projects.

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1962835 - Posted: 1 Nov 2018, 15:51:31 UTC I have been loading up on tasks here, and it has been hit-and-miss for the last 10-15 minutes. But certainly not long enough to run out of work.

Unixchick Joined: 5 Mar 12 Posts: 815 Credit: 2,361,516 RAC: 22	Message 1962858 - Posted: 1 Nov 2018, 17:58:06 UTC Last modified: 1 Nov 2018, 18:00:43 UTC The results out in the field look about normal at 4.4 million. The results ready to send count is a bit low, but holding steady in the 470k range. The results received in the last hour seems high at 140k. Are we getting noise bombs again? Or is this just the noise bombs filtering through the slower machines? Edit to add: we are now getting 30oc18aa, so that might help.

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1962861 - Posted: 1 Nov 2018, 18:32:49 UTC - in response to Message 1962858. That could be the effect of the noise bombs filtering through the slow hosts. I think it more likely that the "shorty" BLC01 tasks are the cause. Mine are processing in 30-60 seconds, so the GPUs make quick work of them. The Arecibo file will slow things down a bit.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1962867 - Posted: 1 Nov 2018, 18:55:08 UTC - in response to Message 1962861.
> That could be the effect of the noise bombs filtering through the slow hosts. I think it more likely that the "shorty" BLC01 tasks are the cause. Mine are processing in 30-60 seconds, so the GPUs make quick work of them. The Arecibo file will slow things down a bit.
The BLC01 tasks run quickly on CPUs as well as GPUs, and on GPUs they take not much over half the runtime of the recent BLC22/23 run. So to a first approximation, we would expect roughly double the return rate for a while. Caches (apart from those bumping the hard limit) will also grow as initial runtime estimates adjust to the new normal, increasing the drawdown rate from RTS.
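The "roughly double" estimate follows from the fact that, if every host stays busy, each host's results per hour are inversely proportional to task runtime. A small Python sketch, assuming a fixed fleet that is always fully loaded (in practice cache limits and the CPU/GPU mix will blur this); the runtimes are the per-task figures Kevin and Stephen report in this thread, and the helper name is hypothetical:

```python
def return_rate_multiplier(old_runtime_s: float, new_runtime_s: float) -> float:
    """Factor by which a fully busy host's results-per-hour changes
    when the per-task runtime drops from old_runtime_s to new_runtime_s."""
    return old_runtime_s / new_runtime_s


# Per-task runtimes in seconds: (BLC22/23, BLC01) as reported in the thread.
reports = {"Kevin": (90, 55), "Stephen": (180, 105)}
for who, (old, new) in reports.items():
    print(f"{who}: x{return_rate_multiplier(old, new):.2f}")
# Kevin: x1.64
# Stephen: x1.71
```

Both come out somewhat under 2x, consistent with "not much over half the runtime", which is why doubling is only a first approximation.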

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1962873 - Posted: 1 Nov 2018, 21:03:16 UTC - in response to Message 1962861. Last modified: 1 Nov 2018, 21:05:13 UTC
> That could be the effect of the noise bombs filtering through the slow hosts. I think it more likely that the "shorty" BLC01 tasks are the cause. Mine are processing in 30-60 seconds, so the GPUs make quick work of them. The Arecibo file will slow things down a bit.
. . They aren't 'shorties'; they are normal tasks in the 'new' format that started with a Blc04 series in October last year. They run in about the same time as Arecibo 'normal' tasks, and had been the norm until we got this recent run of '2x' series tasks, which were old school and slow. The new format takes 105 secs on my top machine, compared to 180 for the Blc22/23 series of late. And in usual form, Credit Screw ignores the greater efficiency of the tasks (my APR jumped over 35%) and halved the awarded credit because they take only slightly more than half the time. But that is typical for Credit Screw. There was allegedly a 'committee' formed to ponder an upgrade/replacement for Credit New, but that seems to have evaporated; I haven't heard anything for months, not since the initial 'rumour' was started. But then the same is true of Parkes: PMs are unanswered, and when I asked to have the Parkes thread in News re-activated (it had been closed because of no activity) it was instead removed. There are fewer leaks in SETI than in the government or the CIA. . . As for the recent problem getting new work, I reviewed my logs for overnight and it only seems to have lasted from 2:20 pm to 2:50 pm UTC. So maybe it was that glitch that is often mentioned, but simply at a different time. Stephen <shrug>

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1962927 - Posted: 2 Nov 2018, 5:48:59 UTC ... Welcome back to Earth Assimilator backlog - Pull up a chair and stay awhile.

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304	Message 1962932 - Posted: 2 Nov 2018, 6:37:48 UTC - in response to Message 1962927.
> ... Welcome back to Earth Assimilator backlog - Pull up a chair and stay awhile.
Yep, good to see the Validation & Assimilation backlogs have finally cleared. Now all we need is for the Results & WU Purge backlogs to clear...

Speedy Volunteer tester Joined: 26 Jun 04 Posts: 1643 Credit: 12,921,799 RAC: 89	Message 1963009 - Posted: 2 Nov 2018, 20:53:56 UTC - in response to Message 1962932.
> > ... Welcome back to Earth Assimilator backlog - Pull up a chair and stay awhile.
> Yep, good to see the Validation & Assimilation backlogs have finally cleared. Now all we need is for the Results & WU Purge backlogs to clear...
I am just curious: why do we need the Results & WU Purge backlogs to clear? These will probably take 24 hours. Apart from people having lots of results in their accounts, I cannot see any other major issue, unless we run out of disk space.

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13727 Credit: 208,696,464 RAC: 304	Message 1963013 - Posted: 2 Nov 2018, 21:19:53 UTC - in response to Message 1963009.
> I am just curious: why do we need the Results & WU Purge backlogs to clear? These will probably take 24 hours. Apart from people having lots of results in their accounts, I cannot see any other major issue, unless we run out of disk space.
That's part of it: running out of disk space, not to mention the load on the database, and really, really, really long wait times when looking at the results on your computers, particularly for those with high-output systems. Think of it as being like the Recycle Bin on the Windows desktop: once you delete the file, it's still there till you empty it. Same here: the files need to be purged after being deleted to finally free up that space and reduce the size of the database. And most of the issues we have with the servers relate to the size of the database.

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1963016 - Posted: 2 Nov 2018, 21:30:19 UTC
> and really, really, really long wait times when looking at the results on your computers, particularly for those with high-output systems.
That is my primary annoyance. The website is unusable when the Results and WU purge backlogs get large, especially after an outage and for several days afterwards, typically running right into the next Tuesday outage. If the web page for one of my hosts doesn't display and times out, I just give up and look at it the next day. That makes it hard to keep on top of my hosts, to make sure they are returning valid work and haven't had an upset I need to catch and rectify.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1963017 - Posted: 2 Nov 2018, 21:46:28 UTC - in response to Message 1963009.
> I am just curious: why do we need the Results & WU Purge backlogs to clear? These will probably take 24 hours. Apart from people having lots of results in their accounts, I cannot see any other major issue, unless we run out of disk space.
It's more that records in the database tables need to clear, and in particular, the indexes to the records need to shrink until they fit in RAM (not on disk). Otherwise, they will be unutterably slow.

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.