Panic Mode On (113) Server Problems?

Ghia
Joined: 7 Feb 17
Posts: 238
Credit: 28,911,438
RAC: 50
Norway
Message 1962637 - Posted: 31 Oct 2018, 9:12:09 UTC - in response to Message 1962636.  


And with the Ready-to-send buffer empty, and splitter output at 18/s, it's going to be that way for a while.

Edit- looks like the splitters have woken up. Now cranking out 80/s. As long as they can keep at 55 or better, then things will improve, eventually.

Well something did wake up...suddenly my cache filled up, and I'm back to the steady hum of my GPU fans ;-)
Humans may rule the world...but bacteria run it...
ID: 1962637
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1962639 - Posted: 31 Oct 2018, 9:27:05 UTC
Last modified: 31 Oct 2018, 9:28:53 UTC

And one of the splitters has started on a BLC01 file, so hopefully the number of noise bombs will start to decline as the BLC22 & BLC23 files are finally finished off, and the other servers can clear the Validation/Assimilation backlog (now at 7.2/1.4 million).
Grant
Darwin NT
ID: 1962639
Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1962669 - Posted: 31 Oct 2018, 14:29:43 UTC - in response to Message 1962639.  

And one of the splitters has started on a BLC01 file, so hopefully the number of noise bombs will start to decline as the BLC22 & BLC23 files are finally finished off, and the other servers can clear the Validation/Assimilation backlog (now at 7.2/1.4 million).


Just done a couple of them; they are fast: 55 seconds each, compared with 90 seconds for normal BLC22 & BLC23 WUs.
Kevin


ID: 1962669
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1962750 - Posted: 31 Oct 2018, 22:29:29 UTC - in response to Message 1962669.  

And one of the splitters has started on a BLC01 file, so hopefully the number of noise bombs will start to decline as the BLC22 & BLC23 files are finally finished off, and the other servers can clear the Validation/Assimilation backlog (now at 7.2/1.4 million).


Just done a couple of them; they are fast: 55 seconds each, compared with 90 seconds for normal BLC22 & BLC23 WUs.


. . They are the "new" GBT format, which first appeared in a Blc04 data series 12 months ago and became the norm by about Christmas last year or January this year. The Blc22/23 series we have been wading through, noise bombs and all, are the "old" format from before that. At least, they conform to the run times that identify each of what I call formats (for want of a better term).

. . I like the new format much better :)

Stephen

:)
ID: 1962750
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1962791 - Posted: 1 Nov 2018, 5:21:17 UTC

I don't want to tempt fate, but I have to say I'm impressed with the servers at the moment. The recovery from the weekly outage wasn't all that great, but now that they have recovered they're holding up well.
There's been a sustained return rate of 130k, and the splitters have still been able to meet the demand, and build up a Ready-to-send buffer as well. On top of that effort, they've also been able to put a dent in the Validation & Assimilation backlogs.
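For anyone who wants to check the balance, here is a rough sketch of the arithmetic. It assumes the 130k figure is results returned per hour, that one new task goes out for every result returned, and that the splitter figures quoted in this thread (18/s, 55/s, 80/s) are tasks per second. The function names are made up for illustration.

```python
def per_second(per_hour: float) -> float:
    """Convert an hourly count into a per-second rate."""
    return per_hour / 3600.0


def buffer_change_per_hour(splitter_rate_s: float, returned_per_hour: float) -> float:
    """Net hourly change in the Ready-to-send buffer.

    Assumption: one new task is sent out for every result returned.
    """
    return splitter_rate_s * 3600.0 - returned_per_hour


RETURNED_PER_HOUR = 130_000  # return rate mentioned above, assumed hourly

demand = per_second(RETURNED_PER_HOUR)
print(f"demand: about {demand:.1f} tasks/s")

# Splitter rates quoted in this thread, in tasks per second.
for rate in (18, 55, 80):
    change = buffer_change_per_hour(rate, RETURNED_PER_HOUR)
    print(f"{rate}/s -> buffer changes by {change:+.0f} per hour")
```

Under those assumptions the buffer drains at 18/s and grows at 55/s or 80/s, which is consistent with the splitters being able to rebuild the Ready-to-send buffer.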
Grant
Darwin NT
ID: 1962791
Unixchick
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1962795 - Posted: 1 Nov 2018, 6:32:18 UTC
Last modified: 1 Nov 2018, 7:02:07 UTC

The status page numbers have some lag. I'm not sure if it is a sign of a problem or not.

edit to add... it was a false alarm. All is well.
ID: 1962795
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1962830 - Posted: 1 Nov 2018, 15:34:37 UTC

So what is going on now? Just looked and all hosts are out of gpu work. Reporting and getting nothing in return.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1962830
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1962832 - Posted: 1 Nov 2018, 15:39:34 UTC

Mine aren’t out of work, but it does seem to be falling from the max queue that they usually hold.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1962832
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1962834 - Posted: 1 Nov 2018, 15:49:57 UTC - in response to Message 1962832.  

It looks like hit and miss over a dozen request cycles whether you get any work or not. My big iron has been empty for half an hour and just got a slug of 114 tasks 5 minutes ago. But that is almost half completed by now. My other crunchers are getting nothing and are now working on the backup projects.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1962834
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1962835 - Posted: 1 Nov 2018, 15:51:31 UTC

I have been loading up on tasks here, and it has been hit-and-miss for the last 10-15m.
But certainly not long enough to run out of work.
ID: 1962835
Unixchick
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1962858 - Posted: 1 Nov 2018, 17:58:06 UTC
Last modified: 1 Nov 2018, 18:00:43 UTC

The results out in the field look about normal at 4.4 million. The results ready to send are a bit low, but holding steady in the 470k range. The results received in the last hour seem high at 140K.

Are we getting noise bombs again? Is this just the noise bombs filtering through the slower machines?

edit to add : we are now getting 30oc18aa, so that might help.
ID: 1962858
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1962861 - Posted: 1 Nov 2018, 18:32:49 UTC - in response to Message 1962858.  

That could be the effect of the noise bombs filtering through the slow hosts. I think it more likely that the "shorty" BLC01 tasks are the cause. Mine are processing in 30-60 seconds, so the GPUs make quick work of them. The Arecibo file will slow things down a bit.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1962861
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14672
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1962867 - Posted: 1 Nov 2018, 18:55:08 UTC - in response to Message 1962861.  

That could be the effect of the noise bombs filtering through the slow hosts. I think it more likely that the "shorty" BLC01 tasks are the cause. Mine are processing in 30-60 seconds, so the GPUs make quick work of them. The Arecibo file will slow things down a bit.

The BLC01 tasks run quickly on CPUs as well as GPUs, and on GPUs they run in not much over half the runtime of the recent BLC22/23 run. So to a first approximation, we would expect roughly double the return rate for a while. Caches (apart from those bumping the hard limit) will also grow as initial runtime estimates adjust to the new normal, increasing the drawdown rate from RTS.
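To a first approximation, that can be sketched as below. The device count is a made-up number and the function is purely illustrative; the 90 and 55 second runtimes are the GPU figures Kevin reported earlier in the thread. It assumes every device crunches tasks back to back.

```python
def return_rate_per_hour(devices: int, runtime_s: float) -> float:
    """Tasks returned per hour if every device runs tasks back to back."""
    return devices * 3600.0 / runtime_s


DEVICES = 1000  # hypothetical fleet size, not a project figure

old_rate = return_rate_per_hour(DEVICES, 90.0)  # BLC22/23 runtime from the thread
new_rate = return_rate_per_hour(DEVICES, 55.0)  # BLC01 runtime from the thread
ratio = new_rate / old_rate

print(f"old: {old_rate:.0f}/h, new: {new_rate:.0f}/h, ratio: {ratio:.2f}")
```

The ratio depends only on the two runtimes, not on the device count, so halving the runtime doubles the return rate whatever the size of the fleet.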
ID: 1962867
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1962873 - Posted: 1 Nov 2018, 21:03:16 UTC - in response to Message 1962861.  
Last modified: 1 Nov 2018, 21:05:13 UTC

That could be the effect of the noise bombs filtering through the slow hosts. I think it more likely that the "shorty" BLC01 tasks are the cause. Mine are processing in 30-60 seconds, so the GPUs make quick work of them. The Arecibo file will slow things down a bit.


. . They aren't 'shorties'; they are normal tasks in the 'new' format that started with a Blc04 series in October last year and runs in about the same time as Arecibo 'normal' tasks. That format had been the norm until we got this recent run of '2x' series tasks, which were old school and slow. The new format takes 105 secs on my top machine compared to 180 for the Blc22/23 series of late. And in usual form Credit Screw ignores the greater efficiency of the tasks (my APR jumped over 35%) and halves the awarded credit because they take only slightly more than half the time. But that is typical for Credit Screw. There was allegedly a 'committee' formed to ponder an upgrade/replacement for Credit New, but that seems to have evaporated; I haven't heard anything for months, not since the initial 'rumour' was started. But then the same is true of Parkes: PMs are unanswered, and when I asked to have the Parkes thread in News re-activated (it had been closed because of no activity) it was instead removed. There are fewer leaks in SETI than in the government or the CIA.

. . As for the recent problem getting new work, I reviewed my logs for overnight and it only seems to have lasted from 2:20 pm to 2:50 pm UTC. So maybe it was that glitch that is often mentioned, but simply at a different time.

Stephen

<shrug>
ID: 1962873
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1962927 - Posted: 2 Nov 2018, 5:48:59 UTC

... Welcome back to Earth Assimilator backlog - Pull up a chair and stay awhile.
ID: 1962927
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1962932 - Posted: 2 Nov 2018, 6:37:48 UTC - in response to Message 1962927.  

... Welcome back to Earth Assimilator backlog - Pull up a chair and stay awhile.

Yep, good to see the Validation & Assimilation backlogs have finally cleared. Now all we need is for the Results & WU Purge backlogs to clear...
Grant
Darwin NT
ID: 1962932
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1963009 - Posted: 2 Nov 2018, 20:53:56 UTC - in response to Message 1962932.  

... Welcome back to Earth Assimilator backlog - Pull up a chair and stay awhile.

Yep, good to see the Validation & Assimilation backlogs have finally cleared. Now all we need is for the Results & WU Purge backlogs to clear...

I am just curious to know why we need the work unit and result purge backlogs to clear. These will probably take 24 hours. Apart from people having lots of results in their accounts I cannot see any other major issue, unless we run out of disk space.
ID: 1963009
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13832
Credit: 208,696,464
RAC: 304
Australia
Message 1963013 - Posted: 2 Nov 2018, 21:19:53 UTC - in response to Message 1963009.  

I am just curious to know why we need the work unit and result purge backlogs to clear. These will probably take 24 hours. Apart from people having lots of results in their accounts I cannot see any other major issue, unless we run out of disk space.

That's part of it: running out of disk space, not to mention the load on the database, and really, really, really long wait times when looking at the results on your computers, particularly for those with high output systems.
Think of it as being like the Recycle Bin on the Windows desktop: once you delete the file, it's still there till you empty it. Same here: the files need to be purged after being deleted to finally free up that space, and reduce the size of the database.
And most of the issues we have with the servers relate to the size of the database.
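A toy sketch of the delete-then-purge idea, in Python. This is not the project's actual database code or schema; the class and method names are invented purely to illustrate why deleted records keep taking up space until a purge runs.

```python
from dataclasses import dataclass, field


@dataclass
class ResultTable:
    """Toy table: rows stay in storage until purge() is called."""
    rows: dict[int, bytes] = field(default_factory=dict)
    deleted: set[int] = field(default_factory=set)

    def delete(self, row_id: int) -> None:
        # Mark only. The row still occupies space, like a file in the Recycle Bin.
        if row_id in self.rows:
            self.deleted.add(row_id)

    def purge(self) -> int:
        # Physically drop the marked rows and report how many were removed.
        count = len(self.deleted)
        for row_id in self.deleted:
            del self.rows[row_id]
        self.deleted.clear()
        return count

    def size(self) -> int:
        return len(self.rows)


table = ResultTable({i: b"x" for i in range(5)})
table.delete(1)
table.delete(3)
size_after_delete = table.size()  # deleting alone frees nothing
purged = table.purge()
size_after_purge = table.size()   # only now does the table shrink
print(size_after_delete, purged, size_after_purge)
```

Until `purge()` runs, the table is exactly as large as before the deletes, which is the point of the Recycle Bin comparison.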
Grant
Darwin NT
ID: 1963013
Keith Myers
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1963016 - Posted: 2 Nov 2018, 21:30:19 UTC

and really, really, really long wait times when looking at the results on your computers- particularly for those with high output systems.


That is my primary annoyance. The website is unusable when the Results and WU purge loads get large, especially after an outage and for several days later, typically running right into the next Tuesday outage. If the web page for one of my hosts doesn't display and times out, I just give up and look at it the next day. Makes it hard to keep on top of my hosts to make sure they are returning valid work and haven't had an upset I need to catch and rectify.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1963016
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14672
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1963017 - Posted: 2 Nov 2018, 21:46:28 UTC - in response to Message 1963009.  

I am just curious to know why we need the work unit and result purge backlogs to clear. These will probably take 24 hours. Apart from people having lots of results in their accounts I cannot see any other major issue, unless we run out of disk space.

It's more that records in the database tables need to clear, and in particular the indexes to the records need to shrink until they fit in RAM (not on disk). Otherwise, they will be unutterably slow.
ID: 1963017


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.