Panic Mode On (69) Server problems?

Kevin Olley

Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1200873 - Posted: 29 Feb 2012, 6:46:51 UTC

I may have a problem.

The consecutive valid tasks count on the application details page for my GPUs keeps dropping to single figures.

I may be trashing WUs, but I'm not seeing anything obvious (short run times etc.) in BOINC Manager.

Over the weekend I stripped and cleaned this machine and upgraded the video drivers, first to 285.62 and now to 290.53.

ATM I cannot do much more than keep an eye on it when I can; if it starts looking too bad I will stop processing on the GPUs.

It would be nice if the tasks pages could be turned on again.



Kevin


ID: 1200873
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1200878 - Posted: 29 Feb 2012, 6:54:39 UTC

Pshaw.....it seems that they still don't think the servers can handle the DB inquiries to give us back our tasks info.

Meowphhhhhht.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1200878
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1200910 - Posted: 29 Feb 2012, 11:26:39 UTC - in response to Message 1200878.  

And at about 03:00 hrs the d/l, u/l & sched servers were showing disabled again.
03:24 hrs, only 1 server now disabled.. These darn things are switching on and off for some reason.
Can't see a human hand being responsible in the very early hours of the morning.
[times are PST, not UTC]

Cheers
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1200910
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1200970 - Posted: 29 Feb 2012, 15:33:20 UTC - in response to Message 1200910.  

And at about 03:00 hrs the d/l, u/l & sched servers were showing disabled again.
03:24 hrs, only 1 server now disabled.. These darn things are switching on and off for some reason.
Can't see a human hand being responsible in the very early hours of the morning.
[times are PST, not UTC]

Cheers

When I looked an hour or so ago, only the scheduling server was showing disabled. Now it and the upload server are. Disabled means someone manually turned it off, right? But the guys usually come into the lab at 0800 PST and the last update of the server status page was at 0700.

Also, what the heck is going on with the crickets? They seem to have dropped by about 20Mbps at around 1900 last night.

Update: before posting, I checked the SSP again and the upload server is back up; the blue crickets aren't showing any significant problems. Ready to send has dropped quite a bit, to ~200K, which means work is continuing to be scheduled and sent.

"I'm so confused..."

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1200970
LadyL
Volunteer tester
Joined: 14 Sep 11
Posts: 1679
Credit: 5,230,097
RAC: 0
Message 1201017 - Posted: 29 Feb 2012, 17:17:33 UTC

Since reporting is working, it's obviously some glitch in the status display.
ID: 1201017
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1201035 - Posted: 29 Feb 2012, 18:17:44 UTC
Last modified: 29 Feb 2012, 18:20:08 UTC

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1201035
Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1201043 - Posted: 29 Feb 2012, 18:44:04 UTC - in response to Message 1200873.  

I may have a problem.

The consecutive valid tasks count on the application details page for my GPUs keeps dropping to single figures.

I may be trashing WUs, but I'm not seeing anything obvious (short run times etc.) in BOINC Manager.

Over the weekend I stripped and cleaned this machine and upgraded the video drivers, first to 285.62 and now to 290.53.

ATM I cannot do much more than keep an eye on it when I can; if it starts looking too bad I will stop processing on the GPUs.

It would be nice if the tasks pages could be turned on again.




Found the problem.

The driver update reduced the vcore voltage on my GPUs from 1.00 V to 0.95 V. It looks as if my cards don't like being run at too low a voltage.

Thanks for turning on the tasks pages.


Kevin


ID: 1201043
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1201047 - Posted: 29 Feb 2012, 18:48:36 UTC - in response to Message 1201035.  

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Meowza indeed! That's even better than a silly green star.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1201047
Dave
Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1201072 - Posted: 29 Feb 2012, 19:35:03 UTC

Those "silly green stars" keep this place alive.
ID: 1201072
SciManStev (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Jun 99
Posts: 6651
Credit: 121,090,076
RAC: 0
United States
Message 1201080 - Posted: 29 Feb 2012, 20:16:49 UTC

It is so nice having the tasks pages back. I can now see, from work, that I am filled to my limits. With them off, I had to use the rescheduler tool to see what I had on board.

Steve
Warning, addicted to SETI crunching!
Crunching as a member of GPU Users Group.
GPUUG Website
ID: 1201080
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65709
Credit: 55,293,173
RAC: 49
United States
Message 1201082 - Posted: 29 Feb 2012, 20:21:37 UTC - in response to Message 1201047.  

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Meowza indeed! That's even better than a silly green star.

I think blue is quite an improvement...
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 1201082
Dimly Lit Lightbulb 😀
Volunteer tester
Joined: 30 Aug 08
Posts: 15399
Credit: 7,423,413
RAC: 1
United Kingdom
Message 1201147 - Posted: 29 Feb 2012, 23:25:54 UTC - in response to Message 1201035.  

Matt just turned the tasks page back on!!!

Meowza!

(Now, don't everybody go crashing the DB now. LOL.)

Woo!

Member of the People Encouraging Niceness In Society club.

ID: 1201147
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1201182 - Posted: 1 Mar 2012, 0:44:12 UTC

Well, I am glad the tasks pages are back on.. but at the same time.. my spreadsheet for all the APs I've done is just completely ruined. 80% of the WUs that I had made entries for and were waiting to be crunched and returned to get the rest of the data are all now "unable to collect data." Scrapping that project after nearly two years.

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1201182
Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1201234 - Posted: 1 Mar 2012, 2:27:10 UTC - in response to Message 1201182.  

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).

Isn't it strange that when one of those gets pointed out, it's suddenly no longer available to see? (That's happened to me before as well.)

Cheers.
ID: 1201234
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1201251 - Posted: 1 Mar 2012, 3:02:24 UTC - in response to Message 1201234.  
Last modified: 1 Mar 2012, 3:03:49 UTC

With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got).

Isn't it strange that when one of those gets pointed out, it's suddenly no longer available to see? (That's happened to me before as well.)

Cheers.

Not trying to promote/start any witch-hunts.. but it was hostid=6029917. I looked through the AP tasks that machine has listed: there is a huge variation in GPU run time, there was at least one with an error (purged now), and several had pretty normal granted credit.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1201251
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1201331 - Posted: 1 Mar 2012, 11:51:58 UTC - in response to Message 1201251.  

Hah,
I have an outstanding AP task, been outstanding for quite some time..
So I had a look at the wingman..

Wow, 45 day turnround, ~380 tasks on the rig..

When, I wonder, will the last of those tasks be completed? 2020? Or later?

Still, he did contact the server today :-) I wonder why.. Oh.. silly me, it must have been day 45.. Wonder if he actually returned a completed task?

Not as irritating as losing 2 years' worth of work, but still irritating.

Regards,


Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1201331
Dave
Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1201443 - Posted: 1 Mar 2012, 19:36:26 UTC

You didn't "lose 2 years of work". Think of it as an ongoing research project that reached a natural end. Everything we do is a learning exercise & we come away from it better for it.
ID: 1201443
cliff
Joined: 16 Dec 07
Posts: 625
Credit: 3,590,440
RAC: 0
United Kingdom
Message 1201446 - Posted: 1 Mar 2012, 19:46:29 UTC - in response to Message 1201443.  

Hi Dave,
I'm betting that's not what Cosmic_Ocean is thinking or feeling right now though..

Cheers
Cliff,
Been there, Done that, Still no damm T shirt!
ID: 1201446
Richard Haselgrove (Project Donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 14649
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1201468 - Posted: 1 Mar 2012, 20:56:49 UTC

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).
ID: 1201468
Graham Middleton
Joined: 1 Sep 00
Posts: 1517
Credit: 86,815,638
RAC: 0
United Kingdom
Message 1201474 - Posted: 1 Mar 2012, 21:18:36 UTC - in response to Message 1201469.  
Last modified: 1 Mar 2012, 21:20:24 UTC

Not today, but next week. From the front page:

Monday Morning Outage
The entire lab is undergoing some electrical power tests on the morning of Monday, March 5th. All SETI web sites and servers will be unreachable for 2 hours (from 8am to 10am, Pacific Time).


ohhhhh ! good to know



Indeed good to know, but also something to beware of. Every time there are power tests in my employers' data centers, there seem to be two constants:

1. The testing causes further power issues that continue for some time after the scheduled duration, and have a wider effect than planned.

And

2. A number of the systems (notably those that the business can least do without) fail to reboot after the power work. The causes include failed disks (the one remaining disk of a mirrored pair fails to restart); power supplies (N out of N + 1 power supplies may be enough to keep the server running, but won't be enough, or will be wrongly configured, to let the system boot); or a database server that has been wrongly set to auto-boot on power-up, so it tries to open a database when some, but not all, of the required disks are available, resulting in corruption and the need for a full restore of the database from backups [and they are always usable and correct, aren't they???!!!???]. Or even some combination of these issues.

In other words, Murphy's Law always applies.

Of course this isn't from personal experience! :-D
Happy Crunching,

Graham

ID: 1201474


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.