Panic Mode On (87) Server Problems?

Message boards : Number crunching : Panic Mode On (87) Server Problems?

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51527
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1485677 - Posted: 7 Mar 2014, 9:37:29 UTC - in response to Message 1485672.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides rack space, solid power, AC and a big fast pipe to the outside world, plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware; the hardware is the same as was in the lab and has nothing to do with the COLO.

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it, the problems are still 100% SETI hardware or software; the COLO has no bearing on the current problems.

Unless they are not keeping up with their end on the power conditioning or cooling.
Dunno.
Most of these servers are just a few years old. I might suspect PSU aging before anything else. If that is the case, a few modestly priced PSU replacements might bring back the old Marvin server for further use. Don't know if da boyz in da lab have done any testing in that regard.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1485677 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 37709
Credit: 261,360,520
RAC: 489
Australia
Message 1485679 - Posted: 7 Mar 2014, 9:40:48 UTC

You're right Bernie, unless of course the rack space isn't being properly cooled.

Cheers.
ID: 1485679 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1485747 - Posted: 7 Mar 2014, 15:14:09 UTC

At this point, we don't even know if this was a hardware or software crash.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1485747 · Report as offensive
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1485812 - Posted: 7 Mar 2014, 17:37:23 UTC - in response to Message 1485747.  

At this point, we don't even know if this was a hardware or software crash.

Indeed we do not, but whatever it was, I doubt the COLO is to blame. I remember seeing that the lab is paying for the COLO; in that case it would be embarrassing for the COLO if servers were failing all over the place due to their power or AC.
ID: 1485812 · Report as offensive
juan BFP (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1485908 - Posted: 7 Mar 2014, 20:43:19 UTC
Last modified: 7 Mar 2014, 20:43:34 UTC

Per the server status page (SSP): BOINC master database oscar Disabled

Is the DB still recovering after almost 24 hours?
ID: 1485908 · Report as offensive
Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1859
Credit: 268,616,081
RAC: 1,349
United States
Message 1485928 - Posted: 7 Mar 2014, 21:32:43 UTC - in response to Message 1485908.  

Per the server status page (SSP): BOINC master database oscar Disabled

Is the DB still recovering after almost 24 hours?


I believe the master database server crashed hard, and we're running on the replica as current master.
ID: 1485928 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1486210 - Posted: 8 Mar 2014, 15:15:03 UTC - in response to Message 1485672.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides rack space, solid power, AC and a big fast pipe to the outside world, plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware; the hardware is the same as was in the lab and has nothing to do with the COLO.

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it, the problems are still 100% SETI hardware or software; the COLO has no bearing on the current problems.

Basic colo services I have come across are normally just:
$xx/mo per RU.
UPS & generator backed power.
Internet access at xxMb/s, given for the paid tier, or connection to the user's 3rd party provider, if allowed.
Staff to push buttons on request: x number of instances/hours a month free, $xx per instance/hour after.

Then if you want monitoring of IPMI or babysitting of your equipment beyond that, the pricing tends to go up quite a lot.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1486210 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14690
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1486219 - Posted: 8 Mar 2014, 15:36:31 UTC - in response to Message 1486210.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides rack space, solid power, AC and a big fast pipe to the outside world, plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware; the hardware is the same as was in the lab and has nothing to do with the COLO.

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it, the problems are still 100% SETI hardware or software; the COLO has no bearing on the current problems.

Basic colo services I have come across are normally just:
$xx/mo per RU.
UPS & generator backed power.
Internet access at xxMb/s, given for the paid tier, or connection to the user's 3rd party provider, if allowed.
Staff to push buttons on request: x number of instances/hours a month free, $xx per instance/hour after.

Then if you want monitoring of IPMI or babysitting of your equipment beyond that, the pricing tends to go up quite a lot.

All very similar in http://ist.berkeley.edu/files/DataCenterColocationSLA-20130521.pdf
ID: 1486219 · Report as offensive
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1486272 - Posted: 8 Mar 2014, 17:25:08 UTC - in response to Message 1486219.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides rack space, solid power, AC and a big fast pipe to the outside world, plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware; the hardware is the same as was in the lab and has nothing to do with the COLO.

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it, the problems are still 100% SETI hardware or software; the COLO has no bearing on the current problems.

Basic colo services I have come across are normally just:
$xx/mo per RU.
UPS & generator backed power.
Internet access at xxMb/s, given for the paid tier, or connection to the user's 3rd party provider, if allowed.
Staff to push buttons on request: x number of instances/hours a month free, $xx per instance/hour after.

Then if you want monitoring of IPMI or babysitting of your equipment beyond that, the pricing tends to go up quite a lot.

All very similar in http://ist.berkeley.edu/files/DataCenterColocationSLA-20130521.pdf

I had tried to find that after my post but didn't remember the location. The services a colo offers seem fairly universal in my checking for personal and work use.

Everyone just needs to remember the colo doesn't come with 24/7 server admins. The same staff of 3ish guys are still the ones admining the boxen, and they require things like sleep, have other duties/responsibilities, and sometimes take their own personal time on weekends to fix things.

Personally I am impressed with how quickly they were able to find the time to get the AP server swapped out.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1486272 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 1486348 - Posted: 8 Mar 2014, 20:59:40 UTC - in response to Message 1486272.  

AP Assimilator & Validator backlogs continue to grow.
Grant
Darwin NT
ID: 1486348 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13913
Credit: 208,696,464
RAC: 304
Australia
Message 1486516 - Posted: 9 Mar 2014, 3:10:40 UTC - in response to Message 1486515.  

I felt a great disturbance in the Force. Are we going down again?

It's been up for a couple of days now, so we're due.
Grant
Darwin NT
ID: 1486516 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 37709
Credit: 261,360,520
RAC: 489
Australia
Message 1486517 - Posted: 9 Mar 2014, 3:15:16 UTC - in response to Message 1486515.  

I felt a great disturbance in the Force. Are we going down again?

Sorry, that was just me passing wind. :-O

Cheers.
ID: 1486517 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1486557 - Posted: 9 Mar 2014, 6:48:01 UTC - in response to Message 1486517.  

I felt a great disturbance in the Force. Are we going down again?

Sorry, that was just me passing wind. :-O

Cheers.

Effects of the fried rice?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1486557 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 37709
Credit: 261,360,520
RAC: 489
Australia
Message 1486600 - Posted: 9 Mar 2014, 9:25:14 UTC - in response to Message 1486557.  

I felt a great disturbance in the Force. Are we going down again?

Sorry, that was just me passing wind. :-O

Cheers.

Effects of the fried rice?

Or a slightly gassier batch of beer.

Cheers.
ID: 1486600 · Report as offensive
Miklos M.

Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 1486662 - Posted: 9 Mar 2014, 12:51:13 UTC

I keep getting the following messages from SETI, but no WUs, even when I suspend other programs.
3/9/2014 8:47:38 AM | SETI@home | update requested by user
3/9/2014 8:47:42 AM | SETI@home | Sending scheduler request: Requested by user.
3/9/2014 8:47:42 AM | SETI@home | Not requesting tasks: don't need
3/9/2014 8:47:45 AM | SETI@home | Scheduler request completed
Does anyone have any suggestions, please?
By the way Happy Almost 15th Anniversary.
ID: 1486662 · Report as offensive
Wiggo
Joined: 24 Jan 00
Posts: 37709
Credit: 261,360,520
RAC: 489
Australia
Message 1486665 - Posted: 9 Mar 2014, 13:04:15 UTC - in response to Message 1486662.  

I keep getting the following messages from SETI, but no WUs, even when I suspend other programs.
3/9/2014 8:47:38 AM | SETI@home | update requested by user
3/9/2014 8:47:42 AM | SETI@home | Sending scheduler request: Requested by user.
3/9/2014 8:47:42 AM | SETI@home | Not requesting tasks: don't need
3/9/2014 8:47:45 AM | SETI@home | Scheduler request completed
Does anyone have any suggestions, please?
By the way Happy Almost 15th Anniversary.

A lot of people have been having this problem with 7.2.39 (seems to be a buggy version) and updating to 7.2.42 has fixed it for them.
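
If you want to check whether your own log shows the same symptom, here is a rough Python sketch. It assumes the event log lines look exactly like the ones quoted above (timestamp | project | message, with a US-style 12-hour timestamp); other locales will format the timestamp differently. The function names and the sample list are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Assumed line layout, copied from the quoted log:
#   "3/9/2014 8:47:42 AM | SETI@home | Not requesting tasks: don't need"
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumption: US-style 12-hour timestamps


def parse_event_line(line):
    """Split one log line into (datetime, project, message), or None if it does not fit."""
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        when = datetime.strptime(stamp, STAMP_FORMAT)
    except ValueError:
        return None
    return when, project, message


def skipped_requests(lines, project="SETI@home"):
    """Return (datetime, message) for every 'Not requesting tasks' line of one project."""
    hits = []
    for line in lines:
        parsed = parse_event_line(line)
        if parsed is None:
            continue
        when, proj, message = parsed
        if proj == project and message.startswith("Not requesting tasks"):
            hits.append((when, message))
    return hits


# Hypothetical sample: the four lines quoted in the post above.
SAMPLE = [
    "3/9/2014 8:47:38 AM | SETI@home | update requested by user",
    "3/9/2014 8:47:42 AM | SETI@home | Sending scheduler request: Requested by user.",
    "3/9/2014 8:47:42 AM | SETI@home | Not requesting tasks: don't need",
    "3/9/2014 8:47:45 AM | SETI@home | Scheduler request completed",
]

if __name__ == "__main__":
    for when, message in skipped_requests(SAMPLE):
        print(when.isoformat(), message)
```

If every manual update produces one of these lines while the cache is clearly empty, that matches the symptom described here; it does not by itself prove which client version is at fault.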

Cheers.
ID: 1486665 · Report as offensive
Filipe

Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1486680 - Posted: 9 Mar 2014, 14:28:54 UTC

No new tapes coming?
ID: 1486680 · Report as offensive
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1486697 - Posted: 9 Mar 2014, 15:12:31 UTC

No stock MB on a BOINC 6.10.58 laptop. AP on my 7.2.41 workstation.
Tullio
ID: 1486697 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1486711 - Posted: 9 Mar 2014, 16:28:08 UTC - in response to Message 1486697.  
Last modified: 9 Mar 2014, 16:49:30 UTC

No stock MB on a BOINC 6.10.58 laptop.

It hasn't contacted the project since 2 Mar 2014; if it doesn't ask, it doesn't get. But it does have work from Test4Theory and Einstein:

In progress tasks for computer 62554 at Test4Theory

In progress tasks for computer 8444797 at Einstein

Claggy
ID: 1486711 · Report as offensive
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1486721 - Posted: 9 Mar 2014, 16:46:14 UTC

No new tapes coming?

One finally started. I was about to enter Panic Mode.
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.
ID: 1486721 · Report as offensive


 
©2025 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.