Panic Mode On (69) Server problems?



Message boards : Number crunching : Panic Mode On (69) Server problems?

Author Message
N9JFE David S
Volunteer tester
Joined: 4 Oct 99
Posts: 10725
Credit: 13,431,630
RAC: 13,824
United States
Message 1200970 - Posted: 29 Feb 2012, 15:33:20 UTC - in response to Message 1200910.

And at about 03:00hrs d/l u/l & sched servers showing disabled again.
03:24 hrs only 1 server now disabled.. These darn things are switching on and off for some reason.
Can't see a human hand being responsible in the very early hours of the morning.
[times are PST, not UTC]

Cheers

When I looked an hour or so ago, only the scheduling server was showing disabled. Now it and the upload server are. Disabled means someone manually turned it off, right? But the guys usually come into the lab at 0800 PST and the last update of the server status page was at 0700.

Also, what the heck is going on with the crickets? They seem to have dropped by about 20Mbps at around 1900 last night.

Update: before posting, I checked the SSP again and the upload server is back up; the blue crickets aren't showing any significant problems. Ready to send has dropped quite a bit, to ~200K, which means work is continuing to be scheduled and sent.

"I'm so confused..."

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1201017 - Posted: 29 Feb 2012, 17:17:33 UTC

Since reporting is working, it's obviously some glitch in the status display.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38315
Credit: 558,720,297
RAC: 628,431
United States
Message 1201035 - Posted: 29 Feb 2012, 18:17:44 UTC
Last modified: 29 Feb 2012, 18:20:08 UTC

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1201037 - Posted: 29 Feb 2012, 18:37:21 UTC - in response to Message 1201035.

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)


HAHA sweet :)
____________

Kevin Olley
Joined: 3 Aug 99
Posts: 368
Credit: 35,176,971
RAC: 2,979
United Kingdom
Message 1201043 - Posted: 29 Feb 2012, 18:44:04 UTC - in response to Message 1200873.

I may have a problem.

Consecutive valid tasks on the application details page for my GPUs keeps dropping to single figures.

I may be trashing WUs, but I'm not seeing anything obvious (short run times etc.) in BOINC Manager.

Over the weekend I stripped and cleaned this machine and upgraded the video drivers, first to 285.62 and now to 290.53.

ATM I cannot do much more than keep an eye on it when I can; if it starts looking too bad I will stop processing on the GPUs.

It would be nice if the tasks pages could be turned on again.




Found the problem.

The driver update reduced the vcore voltage on my GPUs from 1.00v to 0.95v. It looks as if my cards don't like being run at too low a voltage.

Thanks for turning on the tasks pages.


____________
Kevin


N9JFE David S
Volunteer tester
Joined: 4 Oct 99
Posts: 10725
Credit: 13,431,630
RAC: 13,824
United States
Message 1201047 - Posted: 29 Feb 2012, 18:48:36 UTC - in response to Message 1201035.

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Meowza indeed! That's even better than a silly green star.

____________
David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


Dave
Joined: 29 Mar 02
Posts: 774
Credit: 23,193,139
RAC: 0
United Kingdom
Message 1201072 - Posted: 29 Feb 2012, 19:35:03 UTC

Those "silly green stars" keep this place alive.

SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 4792
Credit: 79,657,099
RAC: 33,444
United States
Message 1201080 - Posted: 29 Feb 2012, 20:16:49 UTC

It is so nice having the tasks pages back. I now know from work, that I am filled to my limits. With it off, I had to use the rescheduler tool to see what I had on board.

Steve
____________
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website

zoom314
Joined: 30 Nov 03
Posts: 45781
Credit: 36,405,058
RAC: 7,398
Message 1201082 - Posted: 29 Feb 2012, 20:21:37 UTC - in response to Message 1201047.

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Meowza indeed! That's even better than a silly green star.

I think blue is quite an improvement...
____________

Zapped Sparky
Volunteer moderator
Volunteer tester
Joined: 30 Aug 08
Posts: 6637
Credit: 1,200,844
RAC: 77
United Kingdom
Message 1201147 - Posted: 29 Feb 2012, 23:25:54 UTC - in response to Message 1201035.

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Woo!
____________
In an alternate universe, it was a ZX81 that asked for clothes, boots and motorcycle.

Client error 418: I'm a teapot

Tropical Goldfish Fish 13: You're not crazy if you crunch for Seti :)

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2235
Credit: 8,442,558
RAC: 4,126
United States
Message 1201182 - Posted: 1 Mar 2012, 0:44:12 UTC

Well, I am glad the tasks pages are back on.. but at the same time.. my spreadsheet for all the APs I've done is just completely ruined. 80% of the WUs that I had made entries for and were waiting to be crunched and returned to get the rest of the data are all now "unable to collect data." Scrapping that project after nearly two years.

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Wiggo
Joined: 24 Jan 00
Posts: 6488
Credit: 90,390,384
RAC: 73,941
Australia
Message 1201234 - Posted: 1 Mar 2012, 2:27:10 UTC - in response to Message 1201182.

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).

Isn't it strange that when one of those gets pointed out it's suddenly no longer available to see (that's happened to me before as well).

Cheers.
____________

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2235
Credit: 8,442,558
RAC: 4,126
United States
Message 1201251 - Posted: 1 Mar 2012, 3:02:24 UTC - in response to Message 1201234.
Last modified: 1 Mar 2012, 3:03:49 UTC

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).

Isn't it strange that when one of those gets pointed out it's suddenly no longer available to see (that's happened to me before as well).

Cheers.

Not trying to promote/start any witch-hunts.. but it was hostid=6029917. I looked through the AP tasks that machine has listed and there is a huge variation in run time for the GPU; there was at least one with an error (it is purged now), and several that had a pretty normal granted credit.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

cliff
Joined: 16 Dec 07
Posts: 322
Credit: 2,509,590
RAC: 0
United Kingdom
Message 1201331 - Posted: 1 Mar 2012, 11:51:58 UTC - in response to Message 1201251.

Hah,
I have an outstanding AP task, been outstanding for quite some time..
So I had a look at the wingman..

WoW, 45 day turnround, ~380 tasks on the rig..

When, I wonder, will the last of those tasks be completed? 2020? Or later?

Still, he did contact the server today :-) I wonder why.. Oh.. silly me, it must have been day 45.. Wonder if he actually returned a completed task?

Not as irritating as losing 2 years' worth of work, but still irritating.

Regards,


____________
Cliff,
Been there, Done that, Still no damm T shirt!

Dave
Joined: 29 Mar 02
Posts: 774
Credit: 23,193,139
RAC: 0
United Kingdom
Message 1201443 - Posted: 1 Mar 2012, 19:36:26 UTC

You didn't "lose 2 years of work". Think of it as an ongoing research project that reached a natural end. Everything we do is a learning exercise & we come away from it better for it.

cliff
Joined: 16 Dec 07
Posts: 322
Credit: 2,509,590
RAC: 0
United Kingdom
Message 1201446 - Posted: 1 Mar 2012, 19:46:29 UTC - in response to Message 1201443.

Hi Dave,
I'm betting that's not what Cosmic Ocean is thinking or feeling right now though..

Cheers
____________
Cliff,
Been there, Done that, Still no damm T shirt!

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8375
Credit: 46,659,950
RAC: 18,274
United Kingdom
Message 1201468 - Posted: 1 Mar 2012, 20:56:49 UTC

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).

Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1201469 - Posted: 1 Mar 2012, 21:01:59 UTC - in response to Message 1201468.

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).


ohhhhh ! good to know
____________

Graham Middleton
Joined: 1 Sep 00
Posts: 412
Credit: 44,293,857
RAC: 9
United Kingdom
Message 1201474 - Posted: 1 Mar 2012, 21:18:36 UTC - in response to Message 1201469.
Last modified: 1 Mar 2012, 21:20:24 UTC

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).


ohhhhh ! good to know



Indeed good to know, but also something to beware of. Every time there are power tests in the data centers of my employers, there seem to be 2 constants:

1. The testing causes further power issues that continue for some time after the scheduled duration, and have a wider effect than planned.

And

2. A number of the systems (notably those that the business can least do without) fail to reboot after the power work. The causes include failed disks (the one remaining of a mirror pair fails to restart); power supplies (N out of N + 1 power supplies may be enough to keep the server running, but won't be enough, or will be wrongly configured, to enable the system to boot); or a database server that has been wrongly set to auto-boot on power-up, so it tries to open a database when some, but not all, of the required disks are available, resulting in corruption and the need for a full restore of the database from backups [and they are always usable and correct, aren't they???!!!???]. Or even some combination of these issues.

In other words, Murphy's Law always applies.

Of course this isn't from personal experience! :-D
____________
Happy Crunching,

Graham

GPUUG Officer




graham@gpuug.org

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2235
Credit: 8,442,558
RAC: 4,126
United States
Message 1201537 - Posted: 1 Mar 2012, 23:51:16 UTC - in response to Message 1201446.

Hi Dave,
I'm betting that's not what Cosmic Ocean is thinking or feeling right now though..

Cheers

Eh.. it's not really that important. At first it was a project to compare the run-time vs. percent blanked for AP tasks using the lunatics apps. It started way back with r103 when I made the switch from stock->optimized. At that time, I had four machines that were crunching, and they were all radically different architectures. I made some good observations and data points.

Even recently, when my main cruncher of just over five years started developing problems and I removed one of the CPUs, the data revealed a possible architecture flaw. At one point I also sent all of my work to Josef to see if he could make any sense of an issue I was having.

So it wasn't really a waste, but like you said Dave.. it was probably time for that project to come to an end. I've worked through small periods of not being able to get at the tasks, or DB crashes that last a week or more without any significant loss, but this most recent occurrence was enough to just make me scrap it. Of course I could just start anew now that it is working for the most part.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact


Copyright © 2014 University of California