Panic Mode On (84) Server Problems?


Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 24941
Credit: 34,409,274
RAC: 7,745
Germany
Message 1384497 - Posted: 25 Jun 2013, 7:22:18 UTC

My GPU is rarely getting work also.
I want my VLARs back.

____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5920
Credit: 61,710,351
RAC: 17,622
Australia
Message 1384502 - Posted: 25 Jun 2013, 7:38:39 UTC - in response to Message 1384499.

I am wondering what the sustained inbound traffic to the servers as seen on the Cricket graphs for the last 10-11 hours is all about.....

It is unlike any 'normal' comms from the hosts, and it is also unlike any previous bursts of uploading new data from the lab.

DoS attack??

The kitties are curious.


I was thinking it might be the usual data from the archive traffic, but limited by whatever is causing issues with the Scheduler/splitters/feeders.

There's plenty of work there, but it's just not being allocated. AP assimilators are backing up again. Ready-to-send buffer actually spiked much higher than usual before it topped out.
So something's gummed up the works.
____________
Grant
Darwin NT.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5920
Credit: 61,710,351
RAC: 17,622
Australia
Message 1384506 - Posted: 25 Jun 2013, 7:46:07 UTC - in response to Message 1384504.

When you are on the Cricket graphs, click the 'long term' link.
There has not been any activity similar to this since the move to the colo.

Nope.
But then we haven't had Scheduler/feeder issues either since the move. Till now.
____________
Grant
Darwin NT.

jravin
Joined: 25 Mar 02
Posts: 969
Credit: 104,816,575
RAC: 35,496
United States
Message 1384536 - Posted: 25 Jun 2013, 11:52:41 UTC - in response to Message 1384425.

2 of your computers are getting work, just one isn't. Look to it to make sure it is asking for work.



The one not getting work is no longer online. It was GPU only. I recently acquired an i7-3820 and MB and 4x2GB of quad channel 2133MHz RAM (thanks, Craigslist!), and have replaced Unimatrix002 with I7-3820. Am now running 8 HT cores, with 4 CPU tasks and the other 4 reserved for graphics support (love those AP 6.04s!).

When I can get them, that is.

Right now, neither is getting work except for a (very) few APs overnight. I7-3820 is about to run out of work; Fermibox2 still has a bunch to do.
____________

JohnDK (Project donor)
Volunteer tester
Joined: 28 May 00
Posts: 876
Credit: 48,318,816
RAC: 20,113
Denmark
Message 1384542 - Posted: 25 Jun 2013, 12:26:43 UTC

Re. lack of AP work. Could it be that more and more people are switching to AP only (like me), due to the credits, plus there's now a stock Linux GPU AP app?

jravin
Joined: 25 Mar 02
Posts: 969
Credit: 104,816,575
RAC: 35,496
United States
Message 1384546 - Posted: 25 Jun 2013, 13:20:12 UTC - in response to Message 1384542.

Re. lack of AP work. Could it be that more and more people are switching to AP only (like me), due to the credits, plus there's now a stock Linux GPU AP app?

Maybe, but I note that on the server status page, the number of channels to do for MB hasn't changed since at least last night, and very few (if any) of AP have been processed either.

Definitely seems to be a splitter problem, at least at first glance.
____________

jravin
Joined: 25 Mar 02
Posts: 969
Credit: 104,816,575
RAC: 35,496
United States
Message 1384552 - Posted: 25 Jun 2013, 14:09:12 UTC

UPDATE: Since my last post, SSP says that 3 channels of AP and 3 of MB have been processed. Not very much.
____________

Mike (Project donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 24941
Credit: 34,409,274
RAC: 7,745
Germany
Message 1384560 - Posted: 25 Jun 2013, 14:56:44 UTC - in response to Message 1384546.
Last modified: 25 Jun 2013, 14:56:58 UTC

Re. lack of AP work. Could it be that more and more people are switching to AP only (like me), due to the credits, plus there's now a stock Linux GPU AP app?

Maybe, but I note that on the server status page, the number of channels to do for MB hasn't changed since at least last night, and very few (if any) of AP have been processed either.

Definitely seems to be a splitter problem, at least at first glance.


I think it's more of a feeder problem, since over 300,000 V7 tasks are still ready to send.
I might be wrong on that.
____________

jravin
Joined: 25 Mar 02
Posts: 969
Credit: 104,816,575
RAC: 35,496
United States
Message 1384561 - Posted: 25 Jun 2013, 14:59:01 UTC - in response to Message 1384560.

I think it's more of a feeder problem, since over 300,000 V7 tasks are still ready to send.
I might be wrong on that.


The question then becomes: do we split a feeder or feed a splitter?
____________

Gary Charpentier (Project donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 13004
Credit: 7,666,318
RAC: 6,097
United States
Message 1384566 - Posted: 25 Jun 2013, 15:15:39 UTC - in response to Message 1384536.

Right now, neither is getting work except for a (very) few APs overnight. I7-3820 is about to run out of work; Fermibox2 still has a bunch to do.

Being about to run out of work is normal with BOINC 7.x. Unless you change the "connect every" setting to a long value, it drains the queue until it is nearly dry before it asks for more work.

e.g. set to the default of 10 times a day, the queue is drained to 1/10 of a day's work. Set to every 2 days, the queue is drained to 2 days' work. Yeah, it kind of works backwards.

I don't suggest too big a value for the extra work setting either. We haven't had a long outage since the move to the co-lo, and too big a number here simply runs you into the max tasks limit, so you don't get the work anyway.
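
For anyone who wants that spelled out, here is a rough Python sketch of the behaviour described above. It is only an illustration, not the BOINC client's actual code: the function name and its three parameters are made up, the numbers are arbitrary, and it ignores the server-side max tasks limit entirely.

```python
def days_of_work_to_request(buffered_days, min_buffer_days, extra_days):
    """Return how many days of work to ask for (0.0 means: do not ask yet).

    Hypothetical model, assuming the "connect every" value acts as a
    low-water mark and the extra work value is added on top when refilling.
    """
    # Stay quiet while the queue is at or above the low-water mark.
    if buffered_days >= min_buffer_days:
        return 0.0
    # Below the mark: top up to the low-water mark plus the extra buffer.
    return (min_buffer_days + extra_days) - buffered_days


# Small "connect every" value: half a day of queued work is not refilled.
print(days_of_work_to_request(0.5, 0.125, 0.25))
# "Connect every 2 days": the same queue now triggers a request.
print(days_of_work_to_request(0.5, 2.0, 0.25))
```

With a small "connect every" value the first call asks for nothing, which is why a host can look like it is about to run dry; with the 2-day value the same queue gets topped up straight away.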

____________

Fred E. (Project donor)
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,139,004
RAC: 0
United States
Message 1384621 - Posted: 25 Jun 2013, 20:03:33 UTC

Guess something got fixed. My first request after maintenance brought me back to the limits. Hope it holds up.

And I see the 'mystery bandwidth' has continued.

Don't know what to think of that. Maybe they throttled back to Lab-to-CoLo transfers?
____________
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8754
Credit: 61,654,605
RAC: 33,588
United Kingdom
Message 1384632 - Posted: 25 Jun 2013, 21:04:22 UTC

Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-(
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

jravin
Joined: 25 Mar 02
Posts: 969
Credit: 104,816,575
RAC: 35,496
United States
Message 1384650 - Posted: 25 Jun 2013, 22:05:08 UTC - in response to Message 1384566.

@Gary - THANKS! That explains why I7-3820 (my new cruncher) wasn't even asking for WUs except when nearly empty. I changed the settings per your suggestion, and the machine promptly loaded up to the limit.
____________

Wiggo
Joined: 24 Jan 00
Posts: 7983
Credit: 98,334,789
RAC: 23,455
Australia
Message 1384692 - Posted: 26 Jun 2013, 2:17:11 UTC - in response to Message 1384632.

Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-(

It certainly isn't giving in, is it?

Cheers.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5920
Credit: 61,710,351
RAC: 17,622
Australia
Message 1384729 - Posted: 26 Jun 2013, 5:53:04 UTC - in response to Message 1384692.
Last modified: 26 Jun 2013, 5:53:24 UTC

Splitters are still having issues. The ready-to-send buffer continues to shrink, with the splitters unable to produce enough work. We could run out of work yet again, all within 24 hours of last running out.
____________
Grant
Darwin NT.

Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2327
Credit: 8,869,285
RAC: 683
United States
Message 1384744 - Posted: 26 Jun 2013, 7:15:21 UTC
Last modified: 26 Jun 2013, 7:22:46 UTC

Was just looking through my single-core machine's recently-reported v7 WUs and saw a cool one. 1265536732

The beginnings of an inconclusive train. I guess build 1846 and the stock CPU app didn't quite agree with each other..? It's obvious cuda32 just completely got it wrong. It'll be interesting to see what happens when cuda50 gives it a whirl.

edit: and then I looked into it more.. the cuda32 attempt is a runaway machine. hostid 6721035. PMed to inform them and pointed to NC to ask for solutions for fixing it.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact


Copyright © 2014 University of California