Panic Mode On (84) Server Problems?


Message boards : Number crunching : Panic Mode On (84) Server Problems?

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 23408
Credit: 31,878,855
RAC: 23,421
Germany
Message 1384497 - Posted: 25 Jun 2013, 7:22:18 UTC

My GPU is rarely getting work also.
I want my VLARs back.

____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38343
Credit: 562,008,676
RAC: 638,224
United States
Message 1384499 - Posted: 25 Jun 2013, 7:26:21 UTC
Last modified: 25 Jun 2013, 7:28:55 UTC

I am wondering what the sustained inbound traffic to the servers as seen on the Cricket graphs for the last 10-11 hours is all about.....

It is unlike any 'normal' comms from the hosts, and it is also unlike any previous bursts of uploading new data from the lab.

DOS attack??

The kitties are curious.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,531,471
RAC: 48,516
Australia
Message 1384502 - Posted: 25 Jun 2013, 7:38:39 UTC - in response to Message 1384499.

I am wondering what the sustained inbound traffic to the servers as seen on the Cricket graphs for the last 10-11 hours is all about.....

It is unlike any 'normal' comms from the hosts, and it is also unlike any previous bursts of uploading new data from the lab.

DOS attack??

The kitties are curious.


I was thinking it might be the usual data from the archive traffic, but limited by whatever is causing issues with the Scheduler/splitters/feeders.

There's plenty of work there, but it's just not being allocated. AP assimilators are backing up again. Ready-to-send buffer actually spiked much higher than usual before it topped out.
So something's gummed up the works.
____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38343
Credit: 562,008,676
RAC: 638,224
United States
Message 1384504 - Posted: 25 Jun 2013, 7:44:09 UTC - in response to Message 1384502.

I am wondering what the sustained inbound traffic to the servers as seen on the Cricket graphs for the last 10-11 hours is all about.....

It is unlike any 'normal' comms from the hosts, and it is also unlike any previous bursts of uploading new data from the lab.

DOS attack??

The kitties are curious.


I was thinking it might be the usual data from the archive traffic, but limited by whatever is causing issues with the Scheduler/splitters/feeders.

There's plenty of work there, but it's just not being allocated. AP assimilators are backing up again. Ready-to-send buffer actually spiked much higher than usual before it topped out.
So something's gummed up the works.

I just dunno....
Whatever is going on, it is not usual.

When you are on the Cricket graphs, click the 'long term' link.
There has not been any activity similar to this since the move to the colo.


____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,531,471
RAC: 48,516
Australia
Message 1384506 - Posted: 25 Jun 2013, 7:46:07 UTC - in response to Message 1384504.

When you are on the Cricket graphs, click the 'long term' link.
There has not been any activity similar to this since the move to the colo.

Nope.
But then we haven't had Scheduler/feeder issues either since the move. Till now.
____________
Grant
Darwin NT.

jravin
Joined: 25 Mar 02
Posts: 927
Credit: 95,297,970
RAC: 88,216
United States
Message 1384536 - Posted: 25 Jun 2013, 11:52:41 UTC - in response to Message 1384425.

2 of your computers are getting work, just one isn't. Look to it to make sure it is asking for work.



The one not getting work is no longer online. It was GPU only. I recently acquired an i7-3820 and MB and 4x2GB of quad channel 2133MHz RAM (thanks, Craigslist!), and have replaced Unimatrix002 with I7-3820. Am now running 8 HT cores, with 4 CPU tasks and the other 4 reserved for graphics support (love those AP 6.04s!).

When I can get them, that is.

Right now, neither is getting work except for a (very) few APs overnight. I7-3820 is about to run out of work; Fermibox2 still has a bunch to do.
____________

JohnDK
Volunteer tester
Joined: 28 May 00
Posts: 829
Credit: 40,819,364
RAC: 69,440
Denmark
Message 1384542 - Posted: 25 Jun 2013, 12:26:43 UTC

Re. lack of AP work. Could it be that more and more people are switching to doing AP only (like me), due to the credits, plus there's now a stock Linux GPU AP app?

jravin
Joined: 25 Mar 02
Posts: 927
Credit: 95,297,970
RAC: 88,216
United States
Message 1384546 - Posted: 25 Jun 2013, 13:20:12 UTC - in response to Message 1384542.

Re. lack of AP work. Could it be that more and more people are switching to doing AP only (like me), due to the credits, plus there's now a stock Linux GPU AP app?

Maybe, but I note that on the server status page, the number of channels to do for MB hasn't changed since at least last night, and very few (if any) of AP have been processed either.

Definitely seems to be a splitter problem, at least at first glance.
____________

jravin
Joined: 25 Mar 02
Posts: 927
Credit: 95,297,970
RAC: 88,216
United States
Message 1384552 - Posted: 25 Jun 2013, 14:09:12 UTC

UPDATE: Since my last post, SSP says that 3 channels of AP and 3 of MB have been processed. Not very much.
____________

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 23408
Credit: 31,878,855
RAC: 23,421
Germany
Message 1384560 - Posted: 25 Jun 2013, 14:56:44 UTC - in response to Message 1384546.
Last modified: 25 Jun 2013, 14:56:58 UTC

Re. lack of AP work. Could it be that more and more people are switching to doing AP only (like me), due to the credits, plus there's now a stock Linux GPU AP app?

Maybe, but I note that on the server status page, the number of channels to do for MB hasn't changed since at least last night, and very few (if any) of AP have been processed either.

Definitely seems to be a splitter problem, at least at first glance.


I think it's more a feeder problem, since over 300,000 V7 tasks are still ready to send.
I might be wrong on that.
____________

jravin
Joined: 25 Mar 02
Posts: 927
Credit: 95,297,970
RAC: 88,216
United States
Message 1384561 - Posted: 25 Jun 2013, 14:59:01 UTC - in response to Message 1384560.

I think it's more a feeder problem, since over 300,000 V7 tasks are still ready to send.
I might be wrong on that.


The question then becomes: do we split a feeder or feed a splitter?
____________

Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 12156
Credit: 6,442,536
RAC: 8,248
United States
Message 1384566 - Posted: 25 Jun 2013, 15:15:39 UTC - in response to Message 1384536.

Right now, neither is getting work except for a (very) few APs overnight. I7-3820 is about to run out of work; Fermibox2 still has a bunch to do.

Being about to run out of work is normal with BOINC 7.x. Unless you change the "connect every" setting to a long value, it drains the queue until it is nearly dry before it asks for more work.

E.g. set to the default of 10 times a day, the queue is drained to 1/10 of a day's work. Set to every 2 days, the queue is drained to 2 days' work. Yeah, it kind of works backwards.

I don't suggest too big a value for the extra work fill either. We haven't had a long outage since they moved to the co-lo, and too big a number here simply runs you into the max tasks limit, so you don't get the extra work anyway.
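To make the "works backwards" behaviour concrete, here is a minimal Python sketch of the drain-then-refill pattern described above. It is only a toy model, not BOINC's actual work-fetch code: the function name `simulate_buffer`, its parameters, and the simplifications (the host burns exactly one day of queued work per day, the server always has work to give, and the max tasks limit is ignored) are assumptions made up for illustration.

```python
def simulate_buffer(min_days, extra_days, hours):
    """Toy model of a drain-then-refill work queue (hypothetical, not BOINC code).

    min_days   -- the "connect every" value: the level below which work is requested
    extra_days -- the extra work fill: how far above min_days a request tops up
    hours      -- how many hours to simulate

    Returns a list of (hour, level_before_request, requested) tuples,
    where the level is measured in days of queued work.
    """
    high_mark = min_days + extra_days
    buffered = high_mark  # assume the queue starts full
    history = []
    for hour in range(hours):
        # Assumption: the host crunches one hour's worth of work each hour.
        buffered = max(0.0, buffered - 1.0 / 24.0)
        requested = buffered < min_days
        history.append((hour, buffered, requested))
        if requested:
            # Assumption: the server always fills the request completely.
            buffered = high_mark
    return history


if __name__ == "__main__":
    for min_days in (0.1, 2.0):
        history = simulate_buffer(min_days, extra_days=0.25, hours=72)
        lows = [level for _, level, requested in history if requested]
        print(f"connect every {min_days} days: {len(lows)} requests, "
              f"lowest queue level {min(lows):.3f} days")
```

In this model a small "connect every" value lets the queue fall to a fraction of a day before the first request goes out, while a value of 2 days keeps roughly 2 days of work on hand at all times. Making `extra_days` larger only raises the refill target, which on the real project would be capped by the max tasks limit.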

____________

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 38343
Credit: 562,008,676
RAC: 638,224
United States
Message 1384577 - Posted: 25 Jun 2013, 15:43:48 UTC

And I see the 'mystery bandwidth' has continued.
The kitties have been losing cache all night. Not a usual thingy since the move to the colo.
Hopefully they'll sort it all during today's outage.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,136,801
RAC: 2,076
United States
Message 1384621 - Posted: 25 Jun 2013, 20:03:33 UTC

Guess something got fixed. My first request after maintenance brought me back to the limits. Hope it holds up.

And I see the 'mystery bandwidth' has continued.

Don't know what to think of that. Maybe they throttled back the Lab-to-CoLo transfers?
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

rob smith
Volunteer tester
Joined: 7 Mar 03
Posts: 8149
Credit: 52,949,682
RAC: 75,088
United Kingdom
Message 1384632 - Posted: 25 Jun 2013, 21:04:22 UTC

Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-(
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

jravin
Joined: 25 Mar 02
Posts: 927
Credit: 95,297,970
RAC: 88,216
United States
Message 1384650 - Posted: 25 Jun 2013, 22:05:08 UTC - in response to Message 1384566.

@Gary - THANKS! That explains why I7-3820 (my new cruncher) wasn't even asking for WUs except when nearly empty. I changed the settings per your suggestion, and the machine promptly loaded up to the limit.
____________

Wiggo
Joined: 24 Jan 00
Posts: 6546
Credit: 90,770,443
RAC: 75,085
Australia
Message 1384692 - Posted: 26 Jun 2013, 2:17:11 UTC - in response to Message 1384632.

Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-(

It certainly isn't giving in, is it?

Cheers.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5701
Credit: 56,531,471
RAC: 48,516
Australia
Message 1384729 - Posted: 26 Jun 2013, 5:53:04 UTC - in response to Message 1384692.
Last modified: 26 Jun 2013, 5:53:24 UTC

Splitters still having issues. Ready-to-send buffer continues to shrink with the splitters unable to produce enough work. Could run out of work yet again, all within 24 hours of last running out.
____________
Grant
Darwin NT.

msattler
Volunteer tester
Avatar
Send message
Joined: 9 Jul 00
Posts: 38343
Credit: 562,008,676
RAC: 638,224
United States
Message 1384740 - Posted: 26 Jun 2013, 7:00:33 UTC - in response to Message 1384729.

Splitters still having issues. Ready-to-send buffer continues to shrink with the splitters unable to produce enough work. Could run out of work yet again, all within 24 hours of last running out.

Well, that stuck dataset 20jn12ac is not helping anything. Tying up one MB splitter that is not producing WUs.

Eric said that Matt had restarted it. It apparently is still in terminal fail mode. I sent Eric another message, and we'll have to wait until tomorrow to see if they kick it again or just boot the sucker.
____________
*********************************************
Embrace your inner kitty...ya know ya wanna!

I have met a few friends in my life.
Most were cats.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2240
Credit: 8,462,455
RAC: 4,122
United States
Message 1384744 - Posted: 26 Jun 2013, 7:15:21 UTC
Last modified: 26 Jun 2013, 7:22:46 UTC

Was just looking through my single-core machine's recently-reported v7 WUs and saw a cool one. 1265536732

The beginnings of an inconclusive train. I guess build 1846 and the stock CPU app didn't quite agree with each other..? It's obvious cuda32 just completely got it wrong. It'll be interesting to see what happens when cuda50 gives it a whirl.

edit: and then I looked into it more.. the cuda32 attempt is a runaway machine. hostid 6721035. PMed to inform them and pointed to NC to ask for solutions for fixing it.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact


Copyright © 2014 University of California