Panic Mode On (98) Server Problems?

Profile Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9958
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1696011 - Posted: 26 Jun 2015, 17:29:59 UTC

There is no "official outage"; read Matt's post:

Speaking of outages, this Saturday (the 27th) we will be bringing the project down for the day as the colo is messing with power lines and while they are confident we shouldn't lose power during their upgrades we're going to play it safe and make sure our databases are quiescent.


They have decided to bring the project down "just in case". The colo expects no interruptions.
ID: 1696011 · Report as offensive
Profile HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1696017 - Posted: 26 Jun 2015, 18:32:41 UTC - in response to Message 1696010.  
Last modified: 26 Jun 2015, 18:33:59 UTC

Well, since nothing comes out of the splitters, my GTX980 will be cold in 3 hours. And even if they start splitting full bore and I am able to fill the cache to the brim, it will go dead cold again 6 hours into tomorrow's outage.

100 tasks for the GTX980 only last about 6 hours....

Maybe I missed it, but was any indication given as to how long an outage to expect?

Matt said "we will be bringing the project down for the day". So I expect that means pretty much all day. Depending on who is doing the work, campus or the power company, they might be finished by the evening. Then someone could remote in to power everything up.
I'm thinking we won't see the servers back up before 6PM local SETI time.

I suppose so.
Oddly, there was nothing posted on the IST page about it.

Not really. There is no scheduled down time. They have chosen to take things down.
"they are confident we shouldn't lose power during their upgrades we're going to play it safe"
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1696017 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1696104 - Posted: 27 Jun 2015, 2:40:26 UTC

Ahhhhh...
That makes sense. Thanks for the clarification.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1696104 · Report as offensive
WezH
Volunteer tester

Joined: 19 Aug 99
Posts: 576
Credit: 67,033,957
RAC: 95
Finland
Message 1696201 - Posted: 27 Jun 2015, 10:38:47 UTC
Last modified: 27 Jun 2015, 10:39:52 UTC

After today's new APs I found 8 more (7567911, 7567922, 7567938, 7568510, 7568518, 7568644, 7569479, 7572925)

All 35 identical in chronological order of building:

7567886, 7567911, 7567912, 7567913, 7567922, 7567924, 7567931, 7567938, 7567941, 7567951, 7568499, 7568501, 7568503, 7568504, 7568508, 7568510, 7568511, 7568512, 7568513, 7568516, 7568518, 7568519, 7568596, 7568597, 7568637, 7568640, 7568642, 7568643, 7568644, 7568646, 7569479, 7569519, 7572867, 7572890, 7572925



The last 5 are not identical to the first 30.


As for these mysterious 30 identical hosts aborting APs, it seems to be at an end. No new tasks were downloaded in the last "feeding frenzy". All of those hosts have "Last contact - 12 Jun 2015".

So between 30 June and 5 July, 414 AP tasks will be resent, most of them on 2 July.
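
The host counts in this post can be cross-checked with a few lines of Python. This is only a sketch: the IDs are copied from the two lists above, and the variable names are made up for illustration. It confirms that all 8 newly found hosts appear in the full list of 35, and shows which of them fall among the last 5.

```python
# Host IDs copied from the post; variable names are illustrative only.
new_hosts = {7567911, 7567922, 7567938, 7568510,
             7568518, 7568644, 7569479, 7572925}

all_hosts = [
    7567886, 7567911, 7567912, 7567913, 7567922, 7567924, 7567931,
    7567938, 7567941, 7567951, 7568499, 7568501, 7568503, 7568504,
    7568508, 7568510, 7568511, 7568512, 7568513, 7568516, 7568518,
    7568519, 7568596, 7568597, 7568637, 7568640, 7568642, 7568643,
    7568644, 7568646, 7569479, 7569519, 7572867, 7572890, 7572925,
]

# Every newly found host should be part of the full list.
missing = new_hosts - set(all_hosts)

# Hosts that were already known before today's additions.
previously_known = [h for h in all_hosts if h not in new_hosts]

# The post treats the first 30 and the last 5 as separate groups.
first_30, last_5 = all_hosts[:30], all_hosts[30:]
new_in_last_5 = sorted(new_hosts & set(last_5))

print(len(all_hosts), len(previously_known), sorted(missing), new_in_last_5)
```

So 27 of the 35 were already on the list, and two of the new finds (7569479 and 7572925) belong to the last 5.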
ID: 1696201 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1696208 - Posted: 27 Jun 2015, 11:53:33 UTC - in response to Message 1696201.  

After today's new APs I found 8 more (7567911, 7567922, 7567938, 7568510, 7568518, 7568644, 7569479, 7572925)

All 35 identical in chronological order of building:

7567886, 7567911, 7567912, 7567913, 7567922, 7567924, 7567931, 7567938, 7567941, 7567951, 7568499, 7568501, 7568503, 7568504, 7568508, 7568510, 7568511, 7568512, 7568513, 7568516, 7568518, 7568519, 7568596, 7568597, 7568637, 7568640, 7568642, 7568643, 7568644, 7568646, 7569479, 7569519, 7572867, 7572890, 7572925



The last 5 are not identical to the first 30.


As for these mysterious 30 identical hosts aborting APs, it seems to be at an end. No new tasks were downloaded in the last "feeding frenzy". All of those hosts have "Last contact - 12 Jun 2015".

So between 30 June and 5 July, 414 AP tasks will be resent, most of them on 2 July.

They must have finished testing the server farm...........

"Sour Grapes make a bitter Whine." <(0)>
ID: 1696208 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1696228 - Posted: 27 Jun 2015, 19:26:31 UTC - in response to Message 1696208.  

We may be back but not without some hiccups.

Getting the dreaded Scheduler request failed: HTTP service unavailable

Oh well, now it's a waiting game.
ID: 1696228 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1696229 - Posted: 27 Jun 2015, 19:30:02 UTC - in response to Message 1696228.  

We may be back but not without some hiccups.

Getting the dreaded Scheduler request failed: HTTP service unavailable

Oh well, now it's a waiting game.

It's often like that when first coming back up after an outage.
The servers start getting hit whilst they are still sorting themselves out.
Should sort itself in a while.
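
To illustrate why things settle on their own: clients that get a failed scheduler request typically wait longer and longer before retrying, with some randomness so they do not all come back at once. The sketch below shows that general idea (exponential backoff with jitter). It is not BOINC's actual code; the function name, the 60-second base and the 4-hour cap are assumptions chosen purely for illustration.

```python
import random

def backoff_delay(failures, base=60.0, cap=4 * 3600.0, rng=random):
    """Return a wait in seconds after `failures` consecutive failed requests.

    Hypothetical helper: the 60 s base and 4 h cap are illustrative values,
    not settings taken from any real client.
    """
    # Clamp the exponent so very long failure streaks cannot overflow.
    exponent = min(max(failures, 0), 16)
    ceiling = min(cap, base * (2 ** exponent))
    # Pick a random point in the upper half of the window so that hosts
    # which failed at the same moment do not all retry together.
    return rng.uniform(ceiling / 2, ceiling)

if __name__ == "__main__":
    for n in range(1, 9):
        print(f"after {n} failures: wait about {backoff_delay(n):.0f} s")
```

With these assumed values the wait roughly doubles with each failure until it reaches the cap, which is why hosts that failed many times during the outage can take hours to report back in.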
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1696229 · Report as offensive
Profile zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 66336
Credit: 55,293,173
RAC: 49
United States
Message 1696238 - Posted: 27 Jun 2015, 20:11:46 UTC

Seems a bit slow here since the forums came back online.
Savoir-Faire is everywhere!
The T1 Trust, T1 Class 4-4-4-4 #5550, America's First HST

ID: 1696238 · Report as offensive
Profile JaundicedEye
Joined: 14 Mar 12
Posts: 5375
Credit: 30,870,693
RAC: 1
United States
Message 1696244 - Posted: 27 Jun 2015, 20:32:56 UTC

Data Distribution State        SETI@home #   Astropulse #   As of*
Results ready to send          304,948       0              7h
Current result creation rate   0.1445/sec    -1.0000/sec    4m

I don't think I've ever seen a negative creation rate before.

"Sour Grapes make a bitter Whine." <(0)>
ID: 1696244 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1696246 - Posted: 27 Jun 2015, 20:35:22 UTC - in response to Message 1696244.  

Data Distribution State        SETI@home #   Astropulse #   As of*
Results ready to send          304,948       0              7h
Current result creation rate   0.1445/sec    -1.0000/sec    4m

I don't think I've ever seen a negative creation rate before.

I've seen it here and there for AP, but not often.
Last SSP page is more current.
The 7 hour old stats are now updated.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1696246 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1696255 - Posted: 27 Jun 2015, 21:15:46 UTC - in response to Message 1696246.  

There is a heartbeat!!

OK, maybe just an MB here and there, lol...

slow and steady....
ID: 1696255 · Report as offensive
Profile BANZAI56
Volunteer tester

Joined: 17 May 00
Posts: 139
Credit: 47,299,948
RAC: 2
United States
Message 1696256 - Posted: 27 Jun 2015, 21:25:32 UTC

Times like this I still miss having a simple OEM tab for the Event Log.

No hoops necessary...
ID: 1696256 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1696257 - Posted: 27 Jun 2015, 21:26:46 UTC

Times like this I still miss our dearly departed Cricket graphs......
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1696257 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1696263 - Posted: 27 Jun 2015, 22:08:54 UTC - in response to Message 1696261.  
Last modified: 27 Jun 2015, 22:14:30 UTC

It will be fine, well before Christmas, I'm sure...

Things should start to settle a bit over the next hour, save any major conniptions.

My last couple of rigs got out of their BOINC 'deep sleep' backoff mode just a short while ago and reported in. So the rest of the rigs on the project have probably done the same, or should shortly. And the SSP shows results received starting to drop back off accordingly.

The daily stats dump completed about 15 minutes ago, so that DB thrasher is out of the way for today.

Work is starting to come back in to the kitties, albeit in hit and miss fashion.
I am about 1,300 tasks short of a full boat here. I wish I still had the Cricket graphs to monitor what the traffic in and out of the servers is actually doing. That was a real loss to us SETI server fixated types.

Kitties want their kibbles back....LOL.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1696263 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1696275 - Posted: 27 Jun 2015, 22:50:20 UTC - in response to Message 1696263.  

There's a lot of work available (220,000 & the splitters are actually producing more, almost 40/s), but from what I've seen so far it appears to be mostly VLARs.
Almost all of my CPU cache is now VLAR (over 90%) & most GPU requests for work result in "Project has no tasks available" messages; then every 15-30 requests a whole bunch of WUs finally come through.
Grant
Darwin NT
ID: 1696275 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13854
Credit: 208,696,464
RAC: 304
Australia
Message 1696278 - Posted: 27 Jun 2015, 23:16:57 UTC - in response to Message 1696263.  

I wish I still had the Cricket graphs to monitor what the traffic in and out of the servers is actually doing.

You and me both.
It makes it a lot easier to see just how heavy the load actually is or isn't without having to try & take a wild arse guess based on the graphs.
Grant
Darwin NT
ID: 1696278 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1696279 - Posted: 27 Jun 2015, 23:19:15 UTC - in response to Message 1696275.  

Yup, 91 out of 100 CPU MB are VLARs on my 1 machine.
ID: 1696279 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1696286 - Posted: 28 Jun 2015, 0:19:11 UTC

Moving along pretty well now....
The kitties are starting to top off the tanks in fair fashion.

Meow!
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1696286 · Report as offensive
Admiral Gloval
Joined: 31 Mar 13
Posts: 21280
Credit: 5,308,449
RAC: 0
United States
Message 1696313 - Posted: 28 Jun 2015, 2:42:35 UTC

Had a VLAR storm here too: only 1 CPU WU is not a VLAR. And my remaining WUs are ati5-nocal. I am going to ask: what exactly is a VLAR?

ID: 1696313 · Report as offensive
Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 1643
Credit: 12,921,799
RAC: 89
New Zealand
Message 1696361 - Posted: 28 Jun 2015, 6:40:31 UTC - in response to Message 1696313.  

Had a VLAR storm here too: only 1 CPU WU is not a VLAR. And my remaining WUs are ati5-nocal. I am going to ask: what exactly is a VLAR?

It stands for Very Low Angle Range, if my memory serves me correctly.
ID: 1696361 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.