The Server Issues / Outages Thread - Panic Mode On! (118)

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2026105 - Posted: 3 Jan 2020, 7:57:53 UTC - in response to Message 2026104.  

Thank you for the insight & I completely agree with you that it will be much faster.
ID: 2026105 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2026121 - Posted: 3 Jan 2020, 11:14:06 UTC - in response to Message 2026104.  
Last modified: 3 Jan 2020, 11:15:14 UTC

Grant, in regard to the speed, I am not sure there will be any increase, as the hard drives are only 7200 rpm. That said, I do not know the speed of the drives we are currently using.
The areal density of the new drives is much, much, much higher than the older ones, which means even for the same rotational speed their minimum & maximum data rates will be much higher.
Also being in a single enclosure, with much newer hardware (and more RAM for caching), the overall performance should be significantly better.
An All Flash Array would be better still (probably by an order of magnitude or more), but the cost of one with the same storage capacity would be considerably greater.


. . And that was in effect the new server which has died on Beta, Muarae2, all SSD drives.

Stephen

:(
ID: 2026121 · Report as offensive
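The areal-density argument quoted in the post above can be put into rough numbers. The sketch below assumes, purely for illustration, that a gain in areal density is split evenly between more tracks and more bits per track, so the sequential data rate at a fixed rotational speed grows with the square root of the gain. The function name and the example figures are hypothetical; they are not measurements of the project's old or new drives.

```python
import math


def sequential_rate_gain(areal_density_gain, rpm_old=7200.0, rpm_new=7200.0):
    """Estimate the ratio of new to old sequential data rate.

    Assumption (not from the thread): the areal density gain is split
    evenly between track density and linear bit density, so the number
    of bits on one track grows as sqrt(areal_density_gain). The data
    rate is bits per track multiplied by revolutions per second.
    """
    if areal_density_gain <= 0 or rpm_old <= 0 or rpm_new <= 0:
        raise ValueError("all arguments must be positive")
    linear_density_gain = math.sqrt(areal_density_gain)
    return linear_density_gain * (rpm_new / rpm_old)


# Hypothetical example: 16 times the areal density at the same 7200 rpm.
print(sequential_rate_gain(16.0))  # 4.0
```

Under these assumptions, a drive does not need to spin faster to move more data per second; denser tracks alone raise the rate.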
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2026124 - Posted: 3 Jan 2020, 11:42:26 UTC - in response to Message 2026121.  

Grant, in regard to the speed, I am not sure there will be any increase, as the hard drives are only 7200 rpm. That said, I do not know the speed of the drives we are currently using.
The areal density of the new drives is much, much, much higher than the older ones, which means even for the same rotational speed their minimum & maximum data rates will be much higher.
Also being in a single enclosure, with much newer hardware (and more RAM for caching), the overall performance should be significantly better.
An All Flash Array would be better still (probably by an order of magnitude or more), but the cost of one with the same storage capacity would be considerably greater.

. . And that was in effect the new server which has died on Beta, Muarae2, all SSD drives.

Stephen :(
Until the write/rewrite wear, cost and size factors have been eliminated, a mechanical drive is still the best outright bang-for-buck solution under ideal conditions. ;-)

Cheers.
ID: 2026124 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2026146 - Posted: 3 Jan 2020, 16:24:16 UTC - in response to Message 2026124.  

. . And that was in effect the new server which has died on Beta, Muarae2, all SSD drives.

I haven't seen ANY post-mortem report on muarae2 over at Beta. What have you heard? All I heard was they had a problem with the database. Not any specifics.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2026146 · Report as offensive
Profile Unixchick Project Donor
Joined: 5 Mar 12
Posts: 815
Credit: 2,361,516
RAC: 22
United States
Message 2026148 - Posted: 3 Jan 2020, 17:38:23 UTC
Last modified: 3 Jan 2020, 17:40:28 UTC

The RTS is building and the "out in the field" count is dropping. The greedy server isn't handing out WUs right now, even though it has some. I know at some point the dam will break and the WUs will flow, but I'm curious what coding bit is causing this hoarding.

Edit: and just after posting, I get a bunch of WUs.
ID: 2026148 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11361
Credit: 29,581,041
RAC: 66
United States
Message 2026149 - Posted: 3 Jan 2020, 17:42:06 UTC - in response to Message 2026148.  

Posting about a problem often fixes it immediately.
ID: 2026149 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2026154 - Posted: 3 Jan 2020, 18:39:45 UTC - in response to Message 2026148.  
Last modified: 3 Jan 2020, 18:39:58 UTC

The RTS is building and the "out in the field" count is dropping. The greedy server isn't handing out WUs right now, even though it has some. I know at some point the dam will break and the WUs will flow, but I'm curious what coding bit is causing this hoarding.

Edit: and just after posting, I get a bunch of WUs.

Unless the RTS row (or any row, really) is showing "As Of" 0 minutes, the data really doesn't mean much. Right now, it's 9 minutes behind. Also, the lag itself probably indicates that other, higher-priority processes are hogging the available processing capacity or, at the least, that the server is too busy to respond with real times or numbers.
ID: 2026154 · Report as offensive
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22194
Credit: 416,307,556
RAC: 380
United Kingdom
Message 2026155 - Posted: 3 Jan 2020, 19:01:12 UTC

The server stats page only updates about every 15 minutes, so 9 minutes simply falls between updates; no need to worry about that.
But the RTS is perilously low at about 6000, with a creation rate of only 25 per second.
What is "interesting" is that the master database is running at about 1600 queries per second, which is pretty close to the highest I've seen.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 2026155 · Report as offensive
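The figures in the post above lend themselves to a quick back-of-envelope check. The sketch below estimates how long a ready-to-send buffer of about 6000 lasts at a creation rate of 25 per second, assuming a steady client request rate (a hypothetical number; the thread does not give one). It also encodes the point that an "As Of" age shorter than the roughly 15-minute update interval is normal. All function and parameter names are illustrative.

```python
def seconds_until_empty(ready_to_send, creation_rate, request_rate):
    """Return the seconds until the buffer runs dry, or None if it is not shrinking.

    ready_to_send: results currently waiting to be handed out
    creation_rate: results created per second (splitter output)
    request_rate:  results requested by clients per second (assumed steady)
    """
    net_drain = request_rate - creation_rate
    if net_drain <= 0:
        return None  # the buffer is holding steady or growing
    return ready_to_send / net_drain


def stats_are_overdue(as_of_minutes, update_interval_minutes=15):
    """True only when the page is older than one full update interval."""
    return as_of_minutes > update_interval_minutes


# 6000 ready to send, 25 created per second, hypothetical demand of 30 per second.
print(seconds_until_empty(6000, 25, 30))  # 1200.0
print(stats_are_overdue(9))               # False
```

With the hypothetical demand of 30 per second, the buffer would last 20 minutes, which is why a count of about 6000 reads as perilously low.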
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2026175 - Posted: 3 Jan 2020, 22:47:48 UTC

Splitters still struggling, present output 0.
Grant
Darwin NT
ID: 2026175 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2026177 - Posted: 3 Jan 2020, 23:00:38 UTC - in response to Message 2026146.  
Last modified: 3 Jan 2020, 23:03:01 UTC

. . And that was in effect the new server which has died on Beta, Muarae2, all SSD drives.

I haven't seen ANY post-mortem report on muarae2 over at Beta. What have you heard? All I heard was they had a problem with the database. Not any specifics.

. . When it first went down I am sure I read there was a problem with one of the drives that corrupted the database. Not heard anything since ... Things happen here in Astronomical time, everything takes light years :) Sorry, I'll put that one back in the bad jokes box.

Stephen

:)

P.S. I am still waiting to process any Parkes data.
ID: 2026177 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2026179 - Posted: 3 Jan 2020, 23:19:00 UTC - in response to Message 2026177.  
Last modified: 3 Jan 2020, 23:23:50 UTC

When it first went down I am sure I read there was a problem with one of the drives that corrupted the database.
All it said was that there was a database issue. No indication of what that issue was, or how it was caused.



Edit- here it is,

The file system containing the beta project uploads directory is having problems, so beta is down until further notice.
So it could be hardware (the drives, memory, power supply, motherboard, BIOS etc, etc, etc) or software- the OS (a patch or update or a configuration change that borked things).
Grant
Darwin NT
ID: 2026179 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2026189 - Posted: 4 Jan 2020, 0:12:47 UTC - in response to Message 2026179.  

When it first went down I am sure I read there was a problem with one of the drives that corrupted the database.
All it said was that there was a database issue. No indication of what that issue was, or how it was caused.



Edit- here it is,

The file system containing the beta project uploads directory is having problems, so beta is down until further notice.
So it could be hardware (the drives, memory, power supply, motherboard, BIOS etc, etc, etc) or software- the OS (a patch or update or a configuration change that borked things).

That is why I posted the question in response to the quote that the new SSD drives had all failed and that only spinning storage was reliable.

I questioned whether that statement was true when AFAIK there had never been any statement of ANY hardware failure. We just don't know what went wrong.

And as far as I'm concerned Stephen . . . your joke CAN be pulled out of the box as it is very relevant.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2026189 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2026197 - Posted: 4 Jan 2020, 0:35:52 UTC - in response to Message 2026177.  

Not heard anything since ... Things happen here in Astronomical time, everything takes light years :) Sorry, I'll put that one back in the bad jokes box.

Stephen

:)


Stephen, don't put that poor joke away. It was "Astronomically" funny....

Tom
A proud member of the OFA (Old Farts Association).
ID: 2026197 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2026337 - Posted: 4 Jan 2020, 21:52:44 UTC

Linux system getting some downloads timing out instantly (no hosts file setting), Windows system ok so far (hosts file setting in use).
Grant
Darwin NT
ID: 2026337 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 2026350 - Posted: 4 Jan 2020, 23:45:26 UTC

Sorry, I just have not had time to figure out what has gone wrong with the beta SSDs. It could just be a file system problem. Or it could be worse. Hopefully this coming week I'll have a chance.
@SETIEric@qoto.org (Mastodon)

ID: 2026350 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2026352 - Posted: 4 Jan 2020, 23:49:04 UTC - in response to Message 2026350.  
Last modified: 4 Jan 2020, 23:51:55 UTC

Sorry, I just have not had time to figure out what has gone wrong with the beta SSDs. It could just be a file system problem. Or it could be worse. Hopefully this coming week I'll have a chance.
Thanks for the update.


BTW- we may have a stuck Arecibo file.
04ja20aa
4 Jan 2020, 23:40:04 UTC, MB- 11 in progress, 3 to do. AP- 7 in progress, 7 to do.
Pretty sure it's been that way for a while now, but just noting time & date to check again in an hour or two.


Splitters, still struggling.
Grant
Darwin NT
ID: 2026352 · Report as offensive
Stephen "Heretic" · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2026357 - Posted: 5 Jan 2020, 0:42:29 UTC - in response to Message 2026350.  

Sorry, I just have not had time to figure out what has gone wrong with the beta SSDs. It could just be a file system problem. Or it could be worse. Hopefully this coming week I'll have a chance.


. . I don't believe you owe us an apology. Anyone who has been taking part in this project for any length of time and fails to appreciate that you guys are pretty busy just has not been paying attention. It just helps to grumble a little ... :)

. . BTW how is the Parkes issue going ? ...

Stephen

<grins, ducks and runs fast ... >
ID: 2026357 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 2026375 - Posted: 5 Jan 2020, 2:30:00 UTC - in response to Message 2026352.  

BTW- we may have a stuck Arecibo file.
04ja20aa
4 Jan 2020, 23:40:04 UTC, MB- 11 in progress, 3 to do. AP- 7 in progress, 7 to do.
Pretty sure it's been that way for a while now, but just noting time & date to check again in an hour or two.
It was just sitting there for quite a while, but is now slowly being split.
Grant
Darwin NT
ID: 2026375 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 2026404 - Posted: 5 Jan 2020, 8:02:27 UTC - in response to Message 2026375.  

BTW- we may have a stuck Arecibo file.
04ja20aa
4 Jan 2020, 23:40:04 UTC, MB- 11 in progress, 3 to do. AP- 7 in progress, 7 to do.
Pretty sure it's been that way for a while now, but just noting time & date to check again in an hour or two.
It was just sitting there for quite a while, but is now slowly being split.

It has now been completely processed.
ID: 2026404 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2026415 - Posted: 5 Jan 2020, 11:59:26 UTC
Last modified: 5 Jan 2020, 11:59:58 UTC

It's Sunday morning, and it appears the server has checked out. I can upload, but none of the machines can report or fetch work.
It's going to be the coldest night of the year tonight, and I can't get work.
The results received figure is nosediving... Results received in last hour = 93,016
ID: 2026415 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.