Panic Mode On (87) Server Problems?

Message boards : Number crunching : Panic Mode On (87) Server Problems?

kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51469
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1485677 - Posted: 7 Mar 2014, 9:37:29 UTC - in response to Message 1485672.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides, rack space, solid power, AC and a big fast pipe to the outside world. Plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware, that is the same as was in the lab and is nothing to do with the COLO

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it the problems are still 100% SETI hardware or software, COLO has no bearing on the current problems

Unless they are not keeping up with their end on the power conditioning or cooling.
Dunno.
Most of these servers are just a few years old. I might suspect PSU aging before anything else. If that is the case, a few modestly priced PSU replacements might bring back the old Marvin server for further use. Don't know if da boyz in da lab have done any testing in that regard.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1485677 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34905
Credit: 261,360,520
RAC: 489
Australia
Message 1485679 - Posted: 7 Mar 2014, 9:40:48 UTC

You're right Bernie, unless of course the rack space isn't being properly cooled.

Cheers.
ID: 1485679 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1485747 - Posted: 7 Mar 2014, 15:14:09 UTC

At this point, we don't even know if this was a hardware or software crash.
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1485747 · Report as offensive
Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1485812 - Posted: 7 Mar 2014, 17:37:23 UTC - in response to Message 1485747.  

At this point, we don't even know if this was a hardware or software crash.

Indeed we do not, but either way I doubt the COLO is to blame. I remember seeing that the lab is paying for the COLO; in that case it would be embarrassing for the COLO if servers were failing all over the place due to their power or AC.
ID: 1485812 · Report as offensive
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 1485908 - Posted: 7 Mar 2014, 20:43:19 UTC
Last modified: 7 Mar 2014, 20:43:34 UTC

Per the SSP: BOINC master database oscar: Disabled

Is the DB still recovering after almost 24 hours?
ID: 1485908 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1485928 - Posted: 7 Mar 2014, 21:32:43 UTC - in response to Message 1485908.  

Per the SSP: BOINC master database oscar: Disabled

Is the DB still recovering after almost 24 hours?


I believe the master database server crashed hard, and we're running on the replica as current master.
ID: 1485928 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1486210 - Posted: 8 Mar 2014, 15:15:03 UTC - in response to Message 1485672.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides, rack space, solid power, AC and a big fast pipe to the outside world. Plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware, that is the same as was in the lab and is nothing to do with the COLO

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it the problems are still 100% SETI hardware or software, COLO has no bearing on the current problems

Basic colo services I have come across are normally just:
$xx/mo per RU.
UPS & generator-backed power.
Internet access at xx Mb/s, given for the paid tier, or a connection to the user's 3rd-party provider, if allowed.
Staff to push buttons on request: x instances/hours a month free, $xx per instance/hour after.

Then if you want monitoring of IPMI or babysitting of your equipment beyond that, the pricing tends to go up quite a lot.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1486210 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1486219 - Posted: 8 Mar 2014, 15:36:31 UTC - in response to Message 1486210.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides, rack space, solid power, AC and a big fast pipe to the outside world. Plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware, that is the same as was in the lab and is nothing to do with the COLO

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it the problems are still 100% SETI hardware or software, COLO has no bearing on the current problems

Basic colo services I have come across are normally just:
$xx/mo per RU.
UPS & generator-backed power.
Internet access at xx Mb/s, given for the paid tier, or a connection to the user's 3rd-party provider, if allowed.
Staff to push buttons on request: x instances/hours a month free, $xx per instance/hour after.

Then if you want monitoring of IPMI or babysitting of your equipment beyond that, the pricing tends to go up quite a lot.

All very similar in http://ist.berkeley.edu/files/DataCenterColocationSLA-20130521.pdf
ID: 1486219 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1486272 - Posted: 8 Mar 2014, 17:25:08 UTC - in response to Message 1486219.  

Seems to be a lot of server crashes lately, I just have to wonder how well the COLO facility is looking after them.

Cheers.

From what I understand, the COLO provides, rack space, solid power, AC and a big fast pipe to the outside world. Plus someone who can reboot a server if it cannot be rebooted remotely.

It does not provide new hardware, that is the same as was in the lab and is nothing to do with the COLO

I am sure the guys at the COLO do not have direct access to log in to the servers and do not monitor each one. That is the job of the owners, in this case SETI@home.

As I see it the problems are still 100% SETI hardware or software, COLO has no bearing on the current problems

Basic colo services I have come across are normally just:
$xx/mo per RU.
UPS & generator-backed power.
Internet access at xx Mb/s, given for the paid tier, or a connection to the user's 3rd-party provider, if allowed.
Staff to push buttons on request: x instances/hours a month free, $xx per instance/hour after.

Then if you want monitoring of IPMI or babysitting of your equipment beyond that, the pricing tends to go up quite a lot.

All very similar in http://ist.berkeley.edu/files/DataCenterColocationSLA-20130521.pdf

I had tried to find that after my post but didn't remember the location. The services a colo offers seem fairly universal in my checking for personal and work use.

Everyone just needs to remember the colo doesn't come with 24/7 server admins. The same staff of (3ish?) guys are still the ones admining the boxen. They require things like sleep, have other duties/responsibilities, and sometimes take their own personal time on weekends to fix things.

Personally I am impressed with how quickly they were able to find the time to get the AP server swapped out.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
ID: 1486272 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1486348 - Posted: 8 Mar 2014, 20:59:40 UTC - in response to Message 1486272.  

AP Assimilator & Validator backlogs continue to grow.
Grant
Darwin NT
ID: 1486348 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65776
Credit: 55,293,173
RAC: 49
United States
Message 1486453 - Posted: 8 Mar 2014, 23:55:02 UTC - in response to Message 1485106.  

but a machine like Juan's only has a few minutes grace.

Very few minutes... I still hate the 100WU limit... At least my electric bill will be lower this month. :(

<edit> GPUs don't count in the equation.

I don't like the 100 WU limit either. For the CPU here the limit is technically good enough, as the WUs last for 4.5-9 days with just 2 cores being used; for the GPU(s) it's practically starvation. It would be better to have the GPU limit at 100 per GPU, but those in charge have decided otherwise.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1486453 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1486516 - Posted: 9 Mar 2014, 3:10:40 UTC - in response to Message 1486515.  

I felt a great disturbance in the Force. Are we going down again?

It's been up for a couple of days now, so we're due.
Grant
Darwin NT
ID: 1486516 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34905
Credit: 261,360,520
RAC: 489
Australia
Message 1486517 - Posted: 9 Mar 2014, 3:15:16 UTC - in response to Message 1486515.  

I felt a great disturbance in the Force. Are we going down again?

Sorry, that was just me passing wind. :-O

Cheers.
ID: 1486517 · Report as offensive
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1486557 - Posted: 9 Mar 2014, 6:48:01 UTC - in response to Message 1486517.  

I felt a great disturbance in the Force. Are we going down again?

Sorry, that was just me passing wind. :-O

Cheers.

Effects of the fried rice?
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1486557 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34905
Credit: 261,360,520
RAC: 489
Australia
Message 1486600 - Posted: 9 Mar 2014, 9:25:14 UTC - in response to Message 1486557.  

I felt a great disturbance in the Force. Are we going down again?

Sorry, that was just me passing wind. :-O

Cheers.

Effects of the fried rice?

Or a slightly gassier batch of beer.

Cheers.
ID: 1486600 · Report as offensive
Miklos M.

Joined: 5 May 99
Posts: 955
Credit: 136,115,648
RAC: 73
Hungary
Message 1486662 - Posted: 9 Mar 2014, 12:51:13 UTC

I keep getting the following messages from SETI, but no WUs, even when I suspend other programs.
3/9/2014 8:47:38 AM | SETI@home | update requested by user
3/9/2014 8:47:42 AM | SETI@home | Sending scheduler request: Requested by user.
3/9/2014 8:47:42 AM | SETI@home | Not requesting tasks: don't need
3/9/2014 8:47:45 AM | SETI@home | Scheduler request completed
Does anyone have any suggestions, please?
By the way Happy Almost 15th Anniversary.
ID: 1486662 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34905
Credit: 261,360,520
RAC: 489
Australia
Message 1486665 - Posted: 9 Mar 2014, 13:04:15 UTC - in response to Message 1486662.  

I keep getting the following messages from SETI, but no WUs, even when I suspend other programs.
3/9/2014 8:47:38 AM | SETI@home | update requested by user
3/9/2014 8:47:42 AM | SETI@home | Sending scheduler request: Requested by user.
3/9/2014 8:47:42 AM | SETI@home | Not requesting tasks: don't need
3/9/2014 8:47:45 AM | SETI@home | Scheduler request completed
Does anyone have any suggestions, please?
By the way Happy Almost 15th Anniversary.

A lot of people have been having this problem with 7.2.39 (seems to be a buggy version) and updating to 7.2.42 has fixed it for them.

Cheers.
ID: 1486665 · Report as offensive
Filipe

Joined: 12 Aug 00
Posts: 218
Credit: 21,281,677
RAC: 20
Portugal
Message 1486680 - Posted: 9 Mar 2014, 14:28:54 UTC

No new tapes coming?
ID: 1486680 · Report as offensive
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 8797
Credit: 2,930,782
RAC: 1
Italy
Message 1486697 - Posted: 9 Mar 2014, 15:12:31 UTC

No stock MB on a BOINC 6.10.58 laptop. AP on my 7.2.41 workstation.
Tullio
ID: 1486697 · Report as offensive
Claggy
Volunteer tester

Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1486711 - Posted: 9 Mar 2014, 16:28:08 UTC - in response to Message 1486697.  
Last modified: 9 Mar 2014, 16:49:30 UTC

No stock MB on a BOINC 6.10.58 laptop.

It hasn't contacted the project since 2 Mar 2014; if it doesn't ask, it doesn't get. But it does have work from Test4Theory and Einstein:

In progress tasks for computer 62554 at Test4Theory

In progress tasks for computer 8444797 at Einstein

Claggy
ID: 1486711 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.