Panic Mode On (69) Server problems? |
Message boards : Number crunching : Panic Mode On (69) Server problems?
| Author | Message |
|---|---|
And at about 03:00hrs d/l u/l & sched servers showing disabled again. When I looked an hour or so ago, only the scheduling server was showing disabled. Now it and the upload server are. Disabled means someone manually turned it off, right? But the guys usually come into the lab at 0800 PST and the last update of the server status page was at 0700. Also, what the heck is going on with the crickets? They seem to have dropped by about 20Mbps at around 1900 last night. Update: before posting, I checked the SSP again and the upload server is back up; the blue crickets aren't showing any significant problems. Ready to send has dropped quite a bit, to ~200K, which means work is continuing to be scheduled and sent. "I'm so confused..." ____________ David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. | |
| ID: 1200970 · | |
|
Since reporting is working, it's obviously some glitch in the status display. | |
| ID: 1201017 · | |
|
Matt just turned the tasks page back on!!! | |
| ID: 1201035 · | |
Matt just turned the tasks page back on!!! HAHA sweet :) ____________ | |
| ID: 1201037 · | |
I may have a problem. Found the problem. The driver update reduced the vcore voltage on my GPUs from 1.00v to 0.95v. It looks as if my cards don't like being run at too low a voltage. Thanks for turning on the tasks pages. ____________ Kevin | |
| ID: 1201043 · | |
Matt just turned the tasks page back on!!! Meowza indeed! That's even better than a silly green star. ____________ David Sitting on my butt while others boldly go, Waiting for a message from a small furry creature from Alpha Centauri. | |
| ID: 1201047 · | |
|
Those "silly green stars" keep this place alive. | |
| ID: 1201072 · | |
|
It is so nice having the tasks pages back. I now know from work, that I am filled to my limits. With it off, I had to use the rescheduler tool to see what I had on board. | |
| ID: 1201080 · | |
Matt just turned the tasks page back on!!! I think blue is quite an improvement... ____________ BSG Anthem My Facebook page | |
| ID: 1201082 · | |
Matt just turned the tasks page back on!!! Woo! ____________ In an alternate universe, it was a ZX81 that asked for clothes, boots and motorcycle. Beer/wine o'clock, the best of the o'clocks. Humpty dumpty sat on a wall, along came a giant, and cooked him for breakfast. | |
| ID: 1201147 · | |
|
Well, I am glad the tasks pages are back on.. but at the same time.. my spreadsheet for all the APs I've done is just completely ruined. 80% of the WUs that I had made entries for and were waiting to be crunched and returned to get the rest of the data are all now "unable to collect data." Scrapping that project after nearly two years. | |
| ID: 1201182 · | |
With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got). Isn't it strange that when one of those gets pointed out it's suddenly no longer available to see (that's happened to me before as well). Cheers. ____________ | |
| ID: 1201234 · | |
With that bombshell.. I present: Getting totally credit-screwed by an ATI wingmate. (since it will purge soon.. 3.31 is all it got). Not trying to promote/start any witch-hunts.. but it was hostid=6029917. I looked through the AP tasks that machine has listed and there is a huge variation in run time for the GPU, there was at least one with an error (purged now), and several had a pretty normal granted credit. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 1201251 · | |
|
Hah, | |
| ID: 1201331 · | |
|
You didn't "lose 2 years of work". Think of it as an ongoing research project that reached a natural end. Everything we do is a learning exercise & we come away from it better for it. | |
| ID: 1201443 · | |
|
Hi Dave, | |
| ID: 1201446 · | |
|
Not today, but next week. From the front page: Monday Morning Outage | |
| ID: 1201468 · | |
Not today, but next week. From the front page: ohhhhh ! good to know ____________ | |
| ID: 1201469 · | |
Not today, but next week. From the front page: Indeed good to know, but also something to beware of. Every time there are power tests in the Data Centers of my employers, there seem to be 2 constants: 1) The testing causes further power issues that continue for some time after the scheduled duration, and have a wider effect than planned. 2) A number of the systems (notably those that the business can least do without) fail to reboot after the power work. The causes are failed disks (the one remaining disk of a mirror pair fails to restart), power supplies (N out of N + 1 power supplies may be enough to keep the server running, but won't be enough, or will be wrongly configured, to enable the system to boot), or a database server that has been wrongly set to auto-boot on power-up, so it tries to open a database when some, but not all, of the required disks are available, resulting in corruption and the need for a full restore of the database from backups [and they are always usable and correct, aren't they???!!!???]. Or even some combination of these issues. In other words, Murphy's Law always applies. Of course this isn't from personal experience! :-D ____________ Happy Crunching, Graham GPUUG Officer graham@gpuug.org | |
| ID: 1201474 · | |
Hi Dave, Eh.. it's not really that important. At first it was a project to compare the run-time vs. percent blanked for AP tasks using the lunatics apps. It started way back with r103 when I made the switch from stock->optimized. At that time, I had four machines that were crunching, and they were all radically different architectures. I made some good observations and data points. Even recently when my main cruncher of just over five years started developing problems and I removed one of the CPUs, the data discovered a possible architecture flaw. I have at one point also sent all of my work to Josef to see if he could make any sense of an issue I was having. So it wasn't really a waste, but like you said Dave.. it was probably time for that project to come to an end. I've worked through small periods of not being able to get at the tasks, or DB crashes that last a week or more without any significant loss, but this most recent occurrence was enough to just make me scrap it. Of course I could just start anew now that it is working for the most part. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 1201537 · | |
| Copyright © 2013 University of California |