Panic Mode On (116) Server Problems?

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1999085 - Posted: 21 Jun 2019, 22:19:04 UTC

Just sent Eric a message about the 18dc09aa stuck dataset.
Maybe he'll be able to reboot it or just boot it.

Meow.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1999085 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1999087 - Posted: 21 Jun 2019, 22:21:21 UTC - in response to Message 1999085.  

Just sent Eric a message about the 18dc09aa stuck dataset.
Maybe he'll be able to reboot it or just boot it.

Good to hear- with it sitting there tying up splitters, there are only 3 PFB splitters available to actually process any Arecibo work that does come along.
Grant
Darwin NT
ID: 1999087 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1999134 - Posted: 22 Jun 2019, 5:49:30 UTC - in response to Message 1999132.  

The BOINC server is down, and with that the whole website and forums. Jeff has been notified, no ETA on when it's back.

I downloaded a full cache at 5:03:13. Great work to whoever brought us back online.
ID: 1999134 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1999551 - Posted: 25 Jun 2019, 9:41:12 UTC

Looks like there are a few noisy WUs making their way through at the moment, although nowhere near the numbers of the previous BLC25 noise bomb storm.
Grant
Darwin NT
ID: 1999551 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1999566 - Posted: 25 Jun 2019, 14:02:56 UTC - in response to Message 1999134.  

The BOINC server is down, and with that the whole website and forums. Jeff has been notified, no ETA on when it's back.

I downloaded a full cache at 5:03:13. Great work to whoever brought us back online.


. . They are talking about the BOINC site, not SETI ...

Stephen

<shrug>
ID: 1999566 · Report as offensive
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1999567 - Posted: 25 Jun 2019, 14:04:15 UTC - in response to Message 1999551.  

Looks like there are a few noisy WUs making their way through at the moment, although nowhere near the numbers of the previous BLC25 noise bomb storm.


. . Probably not as bad but it seems to me it is getting kind of windy out there :)

Stephen

<shrug>
ID: 1999567 · Report as offensive
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1999572 - Posted: 25 Jun 2019, 14:44:53 UTC

We are definitely in a shorty storm. We are returning 145K results per hour right now, when we have been closer to 100k/hr for these data files. We need a shorty storm scale like we have for hurricanes.
ID: 1999572 · Report as offensive
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1999595 - Posted: 25 Jun 2019, 22:02:43 UTC
Last modified: 25 Jun 2019, 22:03:09 UTC

The recovery from the normal Tuesday outage is not going well. I'm sure it will sort itself out in a couple of hours, but WU creation is too slow to keep up with demand at the moment. We will hit empty in the ready to send queue soon, so hopefully everyone has a small cache of WUs to last until we dig ourselves out of this hole.

They have 28 splitters going versus our normal 14, which will cause splitting of the last bit of some of the files (9) to hang. The normal 14 will continue to split, though, so it doesn't do any lasting harm; it just looks ugly on the status page.

The shorty storm observed before shutdown could also be causing some problems.

Really wish we had some Arecibo data to split, as it helps to have more splitters working on filling the ready-to-send queue after an outage. Some Parkes data would be nice too if it has separate splitters :-P
ID: 1999595 · Report as offensive
Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1999613 - Posted: 25 Jun 2019, 23:38:19 UTC

18dc09aa remains stuck ...
ID: 1999613 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1999651 - Posted: 26 Jun 2019, 4:10:13 UTC - in response to Message 1999572.  
Last modified: 26 Jun 2019, 4:37:05 UTC

We are definitely in a shorty storm. We are returning 145K results per hour right now, when we have been closer to 100k/hr for these data files. We need a shorty storm scale like we have for hurricanes.

Around 187k for a while now (down from a peak of over 200k).

Edit-
Make that over 190k for a while now.
Grant
Darwin NT
ID: 1999651 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1999657 - Posted: 26 Jun 2019, 6:00:44 UTC - in response to Message 1999651.  
Last modified: 26 Jun 2019, 6:09:34 UTC

Edit-
Make that over 190k for a while now.

Just hit 197k. Can we make the 200k mark? (And when will the servers fall in a heap?)

Edit-
That didn't take long, over 200k and still climbing.
Grant
Darwin NT
ID: 1999657 · Report as offensive
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1999658 - Posted: 26 Jun 2019, 6:04:14 UTC - in response to Message 1999613.  

18dc09aa remains stuck ...

I don't know why they haven't kicked that file to the curb long ago.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1999658 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1999659 - Posted: 26 Jun 2019, 6:08:18 UTC - in response to Message 1999658.  

18dc09aa remains stuck ...

I don't know why they haven't kicked that file to the curb long ago.

Or at least used it for debugging just what it is that causes the splitters to jam up on it.
Grant
Darwin NT
ID: 1999659 · Report as offensive
Unixchick (Project Donor)
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 1999660 - Posted: 26 Jun 2019, 6:32:11 UTC - in response to Message 1999657.  

Edit-
Make that over 190k for a while now.

Just hit 197k. Can we make the 200k mark? (And when will the servers fall in a heap?)

Edit-
That didn't take long, over 200k and still climbing.


It says 211k/hr for returned results - bad shorty storm. I must say that I'm amazed at how well the machine is handling the load. Hopefully it won't crash. The number of results out in the field is falling, so it can't split fast enough to keep up with demand. It's 11:30pm in California, so I doubt much will be done about the situation until morning.
ID: 1999660 · Report as offensive
Jimbocous (Project Donor)
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1999662 - Posted: 26 Jun 2019, 6:42:41 UTC - in response to Message 1999572.  

We are definitely in a shorty storm.

Not getting any better, at least here ...
ID: 1999662 · Report as offensive
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1999663 - Posted: 26 Jun 2019, 6:44:50 UTC - in response to Message 1999660.  

Yes, the servers are handling it quite well considering the load.
Once the database lag and assimilation catch up (from maintenance), performance should improve with them having less load as well.

I wouldn't expect it to get any better for overflows until the next tape series - there are a million slower CPUs out there with 24h plus cache, so they are still working on normal runtimes. As they get into these tasks ... more and more returns.
ID: 1999663 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1999664 - Posted: 26 Jun 2019, 7:09:34 UTC - in response to Message 1999663.  

I wouldn't expect it to get any better for overflows until the next tape series - there are a million slower CPUs out there with 24h plus cache, so they are still working on normal runtimes. As they get into these tasks ... more and more returns.

Yeah, I had been getting 1 or 2 noise bombs here & there, but just had a batch of 9 in a row bomb out.
Grant
Darwin NT
ID: 1999664 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13727
Credit: 208,696,464
RAC: 304
Australia
Message 1999665 - Posted: 26 Jun 2019, 7:23:35 UTC - in response to Message 1999572.  

We are definitely in a shorty storm.

Not shorties, noise bombs.
Shorties are Arecibo WUs that take less than half the usual time to process, but at least do take time to process.
Noise bombs are what we've got at the moment- noisy data that finishes almost as soon as it starts.

And to add to the fun, the splitters are showing signs of problems. Pumping out the work at a good rate, then not producing any for several minutes. Then pumping it out again for an hour or two, then nothing for a while.
Grant
Darwin NT
ID: 1999665 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1999667 - Posted: 26 Jun 2019, 7:42:51 UTC

Hundreds of Noise Bombs have been through here (probably well over a thousand) and now I'm having trouble downloading enough work fast enough to cover the mongrels.

Cheers.
ID: 1999667 · Report as offensive
Stargate (SA)
Volunteer tester
Joined: 4 Mar 10
Posts: 1854
Credit: 2,258,721
RAC: 0
Australia
Message 1999669 - Posted: 26 Jun 2019, 7:50:04 UTC - in response to Message 1999667.  

Getting loads myself Boss
ID: 1999669 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.