Panic Mode On (69) Server problems?

David S
Project Donor
Volunteer tester
Joined: 4 Oct 99
Posts: 17047
Credit: 20,956,986
RAC: 6,140
United States
Message 1200970 - Posted: 29 Feb 2012, 15:33:20 UTC - in response to Message 1200910.  

And at about 03:00hrs d/l u/l & sched servers showing disabled again.
03:24 hrs only 1 server now disabled.. These darn things are switching on and off for some reason.
Can't see a human hand being responsible in the very early hours of the morning.
[times are PST, not UTC]

Cheers

When I looked an hour or so ago, only the scheduling server was showing disabled. Now it and the upload server are. Disabled means someone manually turned it off, right? But the guys usually come into the lab at 0800 PST and the last update of the server status page was at 0700.

Also, what the heck is going on with the crickets? They seem to have dropped by about 20Mbps at around 1900 last night.

Update: before posting, I checked the SSP again and the upload server is back up; the blue crickets aren't showing any significant problems. Ready to send has dropped quite a bit, to ~200K, which means work is continuing to be scheduled and sent.

"I'm so confused..."

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


ID: 1200970
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1201017 - Posted: 29 Feb 2012, 17:17:33 UTC

Since reporting is working, it's obviously some glitch in the status display.
ID: 1201017
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45946
Credit: 815,426,159
RAC: 124,540
United States
Message 1201035 - Posted: 29 Feb 2012, 18:17:44 UTC
Last modified: 29 Feb 2012, 18:20:08 UTC

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)
Always remember.....kitties are all Angels with fur.

Have made friends in this life.
Most were cats.
ID: 1201035
Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1201037 - Posted: 29 Feb 2012, 18:37:21 UTC - in response to Message 1201035.  

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)


HAHA sweet :)
ID: 1201037
Kevin Olley
Joined: 3 Aug 99
Posts: 502
Credit: 46,956,616
RAC: 13,292
United Kingdom
Message 1201043 - Posted: 29 Feb 2012, 18:44:04 UTC - in response to Message 1200873.  

I may have a problem.

Consecutive valid tasks on the application details page for my GPUs keeps dropping to single figures.

I may be trashing WUs; I'm not seeing anything obvious (short run times etc.) in BOINC Manager.

I have over the weekend stripped and cleaned this machine and upgraded video drivers, first to 285.62 and now to 290.53.

ATM I cannot do much more than keep an eye on it when I can, if it starts looking too bad I will stop processing on GPU's.

It would be nice if the tasks pages could be turned on again.




Found the problem.

The driver update reduced the vcore voltage on my GPUs from 1.00 V to 0.95 V. It looks as if my cards don't like being run at too low a voltage.

Thanks for turning on the tasks pages.


Kevin


ID: 1201043
David S
Project Donor
Volunteer tester
Joined: 4 Oct 99
Posts: 17047
Credit: 20,956,986
RAC: 6,140
United States
Message 1201047 - Posted: 29 Feb 2012, 18:48:36 UTC - in response to Message 1201035.  

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Meowza indeed! That's even better than a silly green star.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.


ID: 1201047
Dave
Joined: 29 Mar 02
Posts: 777
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1201072 - Posted: 29 Feb 2012, 19:35:03 UTC

Those "silly green stars" keep this place alive.
ID: 1201072
SciManStev
Volunteer tester
Joined: 20 Jun 99
Posts: 5851
Credit: 105,986,638
RAC: 1,927
United States
Message 1201080 - Posted: 29 Feb 2012, 20:16:49 UTC

It is so nice having the tasks pages back. I now know from work that I am filled to my limits. With it off, I had to use the rescheduler tool to see what I had on board.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1201080
zoom314
Volunteer tester
Joined: 30 Nov 03
Posts: 56772
Credit: 40,754,312
RAC: 5,000
United States
Message 1201082 - Posted: 29 Feb 2012, 20:21:37 UTC - in response to Message 1201047.  

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Meowza indeed! That's even better than a silly green star.

I think blue is quite an improvement...
Pluto is still a planet

Beep! Beep!
ID: 1201082
Dimly Lit Lightbulb 😀
Project Donor
Volunteer tester
Joined: 30 Aug 08
Posts: 14363
Credit: 2,924,933
RAC: 3,133
United Kingdom
Message 1201147 - Posted: 29 Feb 2012, 23:25:54 UTC - in response to Message 1201035.  

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Woo!
ID: 1201147
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,622,529
RAC: 342
United States
Message 1201182 - Posted: 1 Mar 2012, 0:44:12 UTC

Well, I am glad the tasks pages are back on.. but at the same time.. my spreadsheet for all the APs I've done is just completely ruined. 80% of the WUs that I had made entries for and were waiting to be crunched and returned to get the rest of the data are all now "unable to collect data." Scrapping that project after nearly two years.

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1201182
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 10534
Credit: 135,462,653
RAC: 41,141
Australia
Message 1201234 - Posted: 1 Mar 2012, 2:27:10 UTC - in response to Message 1201182.  

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).

Isn't it strange that when one of those gets pointed out it's suddenly no longer available to see (that's happened to me before as well).

Cheers.
ID: 1201234
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,622,529
RAC: 342
United States
Message 1201251 - Posted: 1 Mar 2012, 3:02:24 UTC - in response to Message 1201234.  
Last modified: 1 Mar 2012, 3:03:49 UTC

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).

Isn't it strange that when one of those gets pointed out it's suddenly no longer available to see (that's happened to me before as well).

Cheers.

Not trying to promote/start any witch-hunts.. but it was hostid=6029917. I looked through the AP tasks that machine has listed: there is a huge variation in run time for the GPU, there was at least one with an error (now purged), and several had a pretty normal granted credit.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1201251
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1201331 - Posted: 1 Mar 2012, 11:51:58 UTC - in response to Message 1201251.  

Hah,
I have an outstanding AP task, been outstanding for quite some time..
So I had a look at the wingman..

WoW, 45 day turnround, ~380 tasks on the rig..

When, I wonder, will the last of those tasks be completed? 2020? or later?

Still he did contact the server today:-) I wonder why.. Oh.. silly me it must have been day 45.. Wonder if he actually returned a completed task?

Not as irritating as losing 2 years worth of work, but still irritating.

Regards,


Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1201331
Dave
Joined: 29 Mar 02
Posts: 777
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1201443 - Posted: 1 Mar 2012, 19:36:26 UTC

You didn't "lose 2 years of work". Think of it as an ongoing research project that reached a natural end. Everything we do is a learning exercise & we come away from it better for it.
ID: 1201443
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1201446 - Posted: 1 Mar 2012, 19:46:29 UTC - in response to Message 1201443.  

Hi Dave,
I'm betting that's not what Cosmic Ocean is thinking or feeling right now though..

Cheers
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1201446
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11142
Credit: 83,835,956
RAC: 46,061
United Kingdom
Message 1201468 - Posted: 1 Mar 2012, 20:56:49 UTC

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).
ID: 1201468
Michel448a
Volunteer tester
Joined: 27 Oct 00
Posts: 1201
Credit: 2,891,635
RAC: 0
Canada
Message 1201469 - Posted: 1 Mar 2012, 21:01:59 UTC - in response to Message 1201468.  

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).


ohhhhh ! good to know
ID: 1201469
Graham Middleton
Joined: 1 Sep 00
Posts: 1012
Credit: 74,976,029
RAC: 37,380
United Kingdom
Message 1201474 - Posted: 1 Mar 2012, 21:18:36 UTC - in response to Message 1201469.  
Last modified: 1 Mar 2012, 21:20:24 UTC

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).


ohhhhh ! good to know



Indeed good to know, but also something to beware of. Every time there are power tests in the Data Centers of my employers, there seem to be 2 constants:

1. The testing causes further power issues that continue for some time after the scheduled duration, and have a wider effect than planned.

And

2. A number of the systems (notably those that the business can least do without) fail to reboot after the power work. The cause may be failed disks (the one remaining of a mirror pair fails to restart); power supplies (N out of N + 1 power supplies may be enough to keep the server running, but won't be enough, or will be wrongly configured, to enable the system to boot); or a database server that has been wrongly set to auto-boot on power-up, so it tries to open a database when some, but not all, of the required disks are available, resulting in corruption and the need for a full restore of the database from backups [and they are always usable and correct, aren't they???!!!???]. Or even some combination of these issues.

In other words, Murphy's Law always applies.

Of course this isn't from personal experience! :-D
Happy Crunching,

Graham

GPUUG Officer




graham@gpuug.org
ID: 1201474
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,622,529
RAC: 342
United States
Message 1201537 - Posted: 1 Mar 2012, 23:51:16 UTC - in response to Message 1201446.  

Hi Dave,
I'm betting that's not what Cosmic Ocean is thinking or feeling right now though..

Cheers

Eh.. it's not really that important. At first it was a project to compare the run-time vs. percent blanked for AP tasks using the Lunatics apps. It started way back with r103 when I made the switch from stock->optimized. At that time, I had four machines that were crunching, and they were all radically different architectures. I made some good observations and data points.

Even recently when my main cruncher of just over five years started developing problems and I removed one of the CPUs, the data discovered a possible architecture flaw. I have at one point also sent all of my work to Josef to see if he could make any sense of an issue I was having.

So it wasn't really a waste, but like you said Dave.. it was probably time for that project to come to an end. I've worked through small periods of not being able to get at the tasks, or DB crashes that last a week or more without any significant loss, but this most recent occurrence was enough to just make me scrap it. Of course I could just start anew now that it is working for the most part.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1201537


 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.