Panic Mode On (80) Server Problems?


Message boards : Number crunching : Panic Mode On (80) Server Problems?

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2523
Credit: 9,558,384
RAC: 3,864
United States
Message 1324806 - Posted: 5 Jan 2013, 5:25:27 UTC

If I get one more AP, I'll have a full 10-day cache. I guess about the only good thing about being CPU-only is that a large cache comes in under the limits.
____________
Linux laptop:
Record uptime: 1484d 22h 42m
Ended due to UPS failure, as discovered 14 hours later

Current run: 1139d 23h 53m (as of 20150727_2050 UTC)

Grant (SSSF)
Joined: 19 Aug 99
Posts: 6303
Credit: 72,497,165
RAC: 46,868
Australia
Message 1324814 - Posted: 5 Jan 2013, 6:12:51 UTC - in response to Message 1324806.
Last modified: 5 Jan 2013, 6:13:05 UTC

I guess about the only good thing about being CPU-only is that a large cache comes in under the limits.

With MB my Core2 Duo can last for about 4 days with long running WUs, probably 5 if they were all VLARs.
On my i7 even with all VLARs the present limits wouldn't give me a days work.
____________
Grant
Darwin NT.

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 41704
Credit: 690,328,515
RAC: 331,638
United States
Message 1324924 - Posted: 5 Jan 2013, 14:39:44 UTC

Bad kitty juju here.......
Some time last night there was a power glitch, and EVERYTHING reset.
Most of the rigs restarted OK, but a couple of them hung, including the Frozen One, whose compressor does not appreciate being restarted like that.
Or the freaking modem and router, who apparently cannot come back up without a few restarts by the kitty paws.
Not a great welcome to the morning here.
____________
*********************************************
Behold the power of kitty!!

I have met a few friends in my life.
Most were cats.

Keith White
Joined: 29 May 99
Posts: 374
Credit: 3,577,128
RAC: 2,744
United States
Message 1324929 - Posted: 5 Jan 2013, 15:03:15 UTC

I'm concerned that they haven't powered down for the scheduled electrical work. I'm concerned because I could see that the staff was told the work was delayed but sometime today someone else will simply cut the power, leading to a mess that'll take days to restore.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 41704
Credit: 690,328,515
RAC: 331,638
United States
Message 1324930 - Posted: 5 Jan 2013, 15:06:14 UTC - in response to Message 1324929.
Last modified: 5 Jan 2013, 15:07:33 UTC

I'm concerned that they haven't powered down for the scheduled electrical work. I'm concerned because I could see that the staff was told the work was delayed but sometime today someone else will simply cut the power, leading to a mess that'll take days to restore.

Keith...

I am sure they have it covered.
The work was rescheduled.
I am sure we shall see a notice when they are gonna do it.
They're not simply gonna cut out the line to the SSL without notice.
____________
*********************************************
Behold the power of kitty!!

I have met a few friends in my life.
Most were cats.

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 9855
Credit: 85,511,465
RAC: 144,339
United Kingdom
Message 1324945 - Posted: 5 Jan 2013, 15:40:24 UTC

The fact that Matt has sent out a message saying that the outage has been cancelled indicates that the outage for the repairs to the SSL-wide power systems has been cancelled for this weekend, and that the whole of the SSL will be breathing a communal sigh of relief. One of two things has happened - the uni has not been able to undertake the repairs this weekend for whatever reason, or the uni has found a way to conduct the repairs without shutting down the whole of the SSL. I suspect the former is the more probable, so there will be a planned outage for the work to be done, and there will be notice of it being done so the whole lab can prepare for it.

(We are looking forward to version four of a similar outage where I work.)
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Ageless
Joined: 9 Jun 99
Posts: 12810
Credit: 2,894,216
RAC: 1,427
Netherlands
Message 1324951 - Posted: 5 Jan 2013, 15:59:21 UTC

Looks like there can be some outage on Monday, though.
IST Service Status writes:

Scheduled Outage Space Sciences Laboratory

Outage Type: SCHEDULED OUTAGE
Date Submitted: Monday, January 7, 2013
Outage Start/End Time: 0630 – 0700
Groups Impacted: Space Sciences Laboratory
Equipment: sut1ds, sslringsut1fes

Description: Users on the PPCS-SSL-net_169.229.155.240/29 subnet will be without network connectivity while a switchport is reconfigured.
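
For scale, a /29 prefix is a very small block. A minimal sketch using Python's standard `ipaddress` module shows how many addresses the subnet named in the notice covers (the variable name `subnet` is only illustrative, and nothing here says which SSL machines actually sit in that range):

```python
import ipaddress

# Subnet as quoted in the IST notice
subnet = ipaddress.ip_network("169.229.155.240/29")

# Total addresses in the block, including network and broadcast
print(subnet.num_addresses)

# First and last address in the block
print(subnet[0], subnet[-1])

# Usable host addresses (network and broadcast excluded)
print(len(list(subnet.hosts())))
```

So the reconfiguration touches a block of eight addresses, of which six are usable hosts.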

But perhaps it only affects the forums, and not the data carrier.
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

msattler
Project donor
Volunteer tester
Avatar
Send message
Joined: 9 Jul 00
Posts: 41704
Credit: 690,328,515
RAC: 331,638
United States
Message 1324954 - Posted: 5 Jan 2013, 16:01:08 UTC - in response to Message 1324945.

The fact that Matt has sent out a message saying that the outage has been cancelled indicates that the outage for the repairs to the SSL-wide power systems has been cancelled for this weekend, and that the whole of the SSL will be breathing a communal sigh of relief. One of two things has happened - the uni has not been able to undertake the repairs this weekend for whatever reason, or the uni has found a way to conduct the repairs without shutting down the whole of the SSL. I suspect the former is the more probable, so there will be a planned outage for the work to be done, and there will be notice of it being done so the whole lab can prepare for it.

(We are looking forward to version four of a similar outage where I work.)

If they have to do any transformer work, total outage is likely.
____________
*********************************************
Behold the power of kitty!!

I have met a few friends in my life.
Most were cats.

Donald L. Johnson
Project donor
Joined: 5 Aug 02
Posts: 7366
Credit: 1,646,402
RAC: 4,684
United States
Message 1324957 - Posted: 5 Jan 2013, 16:05:14 UTC - in response to Message 1324951.

Looks like there can be some outage on Monday, though.
IST Service Status writes:
Scheduled Outage Space Sciences Laboratory

Outage Type: SCHEDULED OUTAGE
Date Submitted: Monday, January 7, 2013
Outage Start/End Time: 0630 – 0700
Groups Impacted: Space Sciences Laboratory
Equipment: sut1ds, sslringsut1fes

Description: Users on the PPCS-SSL-net_169.229.155.240/29 subnet will be without network connectivity while a switchport is reconfigured.

But perhaps it only affects the forums, and not the data carrier.

Only half an hour allotted, and before normal working hours at Berkeley - should be minimal impact on us, probably just the forums and front page.
____________
Donald
Infernal Optimist / Submariner, retired

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 41704
Credit: 690,328,515
RAC: 331,638
United States
Message 1324958 - Posted: 5 Jan 2013, 16:07:27 UTC - in response to Message 1324957.

Looks like there can be some outage on Monday, though.
IST Service Status writes:
Scheduled Outage Space Sciences Laboratory

Outage Type: SCHEDULED OUTAGE
Date Submitted: Monday, January 7, 2013
Outage Start/End Time: 0630 – 0700
Groups Impacted: Space Sciences Laboratory
Equipment: sut1ds, sslringsut1fes

Description: Users on the PPCS-SSL-net_169.229.155.240/29 subnet will be without network connectivity while a switchport is reconfigured.

But perhaps it only affects the forums, and not the data carrier.

Only half an hour allotted, and before normal working hours at Berkeley - should be minimal impact on us, probably just the forums and front page.

Yeah, OK.
I am an electrician, and know what we can do, LOL.


____________
*********************************************
Behold the power of kitty!!

I have met a few friends in my life.
Most were cats.

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 9855
Credit: 85,511,465
RAC: 144,339
United Kingdom
Message 1325024 - Posted: 5 Jan 2013, 18:53:33 UTC

This is an IST outage, not a power outage, and it only affects (one of) the out of building switches, so the servers will be able to carry on, but there may be no out of building connectivity depending on which switch they are working on and how that switch is being reconfigured.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

Keith White
Joined: 29 May 99
Posts: 374
Credit: 3,577,128
RAC: 2,744
United States
Message 1325032 - Posted: 5 Jan 2013, 19:15:13 UTC - in response to Message 1324930.
Last modified: 5 Jan 2013, 19:21:09 UTC

I'm concerned that they haven't powered down for the scheduled electrical work. I'm concerned because I could see that the staff was told the work was delayed but sometime today someone else will simply cut the power, leading to a mess that'll take days to restore.

Keith...

I am sure they have it covered.
The work was rescheduled.
I am sure we shall see a notice when they are gonna do it.
They're not simply gonna cut out the line to the SSL without notice.

I take it you haven't worked with contractors, utilities or a university maintenance department before. It makes those 6 hour windows for the cable guy seem downright convenient. ;)

Edit: Okay, they did cancel it. Now back to your previously scheduled grumblings already in progress.
____________
"Life is just nature's way of keeping meat fresh." - The Doctor

.clair.
Joined: 4 Nov 04
Posts: 1300
Credit: 29,779,598
RAC: 1,171
United Kingdom
Message 1325036 - Posted: 5 Jan 2013, 19:36:58 UTC - in response to Message 1325024.

which switch they are working on and how that switch is being reconfigured.

That can be :-
software
firmware
hardware
reboot
steel toecap boot ....
hammer
wire cutters
trash IT

rob smith
Volunteer tester
Send message
Joined: 7 Mar 03
Posts: 9855
Credit: 85,511,465
RAC: 144,339
United Kingdom
Message 1325042 - Posted: 5 Jan 2013, 19:51:33 UTC

Just now downloads are back to their normal - about as fast as a snail with arthritis moving across coarse glass paper...

Unlike when there were 4,000,000 awaiting download when I couldn't get any, but downloads were fast.

I just wonder if they were trying to see what happens if you overfill the tanks, and at what point the downloads "work well". Keep going down, it's a long way below the current level.
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 726
Credit: 175,960,828
RAC: 138,195
United Kingdom
Message 1325050 - Posted: 5 Jan 2013, 20:36:39 UTC - in response to Message 1325042.

Just now downloads are back to their normal - about as fast as a snail with arthritis moving across coarse glass paper...

Scheduler response times are still measured in seconds though, here at home at least. I wish they'd start to increase the limits a bit though, my faster machines can't last too long a break at 100 GPU WUs/machine.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 6303
Credit: 72,497,165
RAC: 46,868
Australia
Message 1325058 - Posted: 5 Jan 2013, 20:54:57 UTC - in response to Message 1325050.

I wish they'd start to increase the limits a bit though, my faster machines can't last too long a break at 100 GPU WUs/machine.

Or 100 per CPU.

____________
Grant
Darwin NT.

Bernie Vine
Project donor
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 7809
Credit: 34,465,892
RAC: 14,812
United Kingdom
Message 1325069 - Posted: 5 Jan 2013, 21:56:42 UTC

Just now downloads are back to their normal - about as fast as a snail with arthritis moving across coarse glass paper...


Hmm, that fast!! I was getting speeds of 350K, now I'm lucky if it's 3K!!
____________


Sometimes it is the people no one imagines anything of who do the things that no one can imagine.

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1325255 - Posted: 6 Jan 2013, 13:29:58 UTC

I noticed this morning that my pc 6738288 has marked around 100 tasks as abandoned. What caused this and is there anything I need to do? It hasn't abandoned any more in the past 6-7 hours and appears to be working. Boinc tasks is reporting about 200 tasks on the pc but the web site reports about 120.

thanks...Frank

msattler
Project donor
Volunteer tester
Joined: 9 Jul 00
Posts: 41704
Credit: 690,328,515
RAC: 331,638
United States
Message 1325263 - Posted: 6 Jan 2013, 13:38:26 UTC - in response to Message 1325255.

I noticed this morning that my pc 6738288 has marked around 100 tasks as abandoned. What caused this and is there anything I need to do? It hasn't abandoned any more in the past 6-7 hours and appears to be working. Boinc tasks is reporting about 200 tasks on the pc but the web site reports about 120.

thanks...Frank

Random task abandonment has been reported before. Nothing you can or need to do on your end. Just keep crunching, and if the servers pick up the problem, they might resend the missing tasks. If not, no worries, they'll get crunched by somebody sooner or later.
____________
*********************************************
Behold the power of kitty!!

I have met a few friends in my life.
Most were cats.

fscheel
Joined: 13 Apr 12
Posts: 73
Credit: 11,135,641
RAC: 0
United States
Message 1325270 - Posted: 6 Jan 2013, 13:45:32 UTC - in response to Message 1325263.

I noticed this morning that my pc 6738288 has marked around 100 tasks as abandoned. What caused this and is there anything I need to do? It hasn't abandoned any more in the past 6-7 hours and appears to be working. Boinc tasks is reporting about 200 tasks on the pc but the web site reports about 120.

thanks...Frank

Random task abandonment has been reported before. Nothing you can or need to do on your end. Just keep crunching, and if the servers pick up the problem, they might resend the missing tasks. If not, no worries, they'll get crunched by somebody sooner or later.



Thanks


Copyright © 2015 University of California