Panic Mode On (84) Server Problems?

Message boards : Number crunching : Panic Mode On (84) Server Problems?

Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29565
Credit: 49,029,477
RAC: 16,644
Germany
Message 1384497 - Posted: 25 Jun 2013, 7:22:18 UTC

My GPU is rarely getting work also.
I want my VLARs back.


With each crime and every kindness we birth our future.

ID: 1384497
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45884
Credit: 814,714,178
RAC: 121,902
United States
Message 1384499 - Posted: 25 Jun 2013, 7:26:21 UTC
Last modified: 25 Jun 2013, 7:28:55 UTC

I am wondering what the sustained inbound traffic to the servers as seen on the Cricket graphs for the last 10-11 hours is all about.....

It is unlike any 'normal' comms from the hosts, and it is also unlike any previous bursts of uploading new data from the lab.

DOS attack??

The kitties are curious.


Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

ID: 1384499
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7475
Credit: 90,910,429
RAC: 45,362
Australia
Message 1384502 - Posted: 25 Jun 2013, 7:38:39 UTC - in response to Message 1384499.

I am wondering what the sustained inbound traffic to the servers as seen on the Cricket graphs for the last 10-11 hours is all about.....

It is unlike any 'normal' comms from the hosts, and it is also unlike any previous bursts of uploading new data from the lab.

DOS attack??

The kitties are curious.


I was thinking it might be the usual data from the archive traffic, but limited by whatever is causing issues with the Scheduler/splitters/feeders.

There's plenty of work there, but it's just not being allocated. AP assimilators are backing up again. Ready-to-send buffer actually spiked much higher than usual before it topped out.
So something's gummed up the works.
Grant
Darwin NT

ID: 1384502
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45884
Credit: 814,714,178
RAC: 121,902
United States
Message 1384504 - Posted: 25 Jun 2013, 7:44:09 UTC - in response to Message 1384502.

I am wondering what the sustained inbound traffic to the servers as seen on the Cricket graphs for the last 10-11 hours is all about.....

It is unlike any 'normal' comms from the hosts, and it is also unlike any previous bursts of uploading new data from the lab.

DOS attack??

The kitties are curious.


I was thinking it might be the usual data from the archive traffic, but limited by whatever is causing issues with the Scheduler/splitters/feeders.

There's plenty of work there, but it's just not being allocated. AP assimilators are backing up again. Ready-to-send buffer actually spiked much higher than usual before it topped out.
So something's gummed up the works.

I just dunno....
Whatever is going on, it is not usual.

When you are on the Cricket graphs, click the 'long term' link.
There has not been any activity similar to this since the move to the colo.


Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

ID: 1384504
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7475
Credit: 90,910,429
RAC: 45,362
Australia
Message 1384506 - Posted: 25 Jun 2013, 7:46:07 UTC - in response to Message 1384504.

When you are on the Cricket graphs, click the 'long term' link.
There has not been any activity similar to this since the move to the colo.

Nope.
But then we haven't had Scheduler/feeder issues either since the move. Till now.
Grant
Darwin NT

ID: 1384506
Cruncher-American
Joined: 25 Mar 02
Posts: 1310
Credit: 175,601,851
RAC: 103,927
United States
Message 1384536 - Posted: 25 Jun 2013, 11:52:41 UTC - in response to Message 1384425.

2 of your computers are getting work, just one isn't. Look to it to make sure it is asking for work.



The one not getting work is no longer online. It was GPU only. I recently acquired an i7-3820 and MB and 4x2GB of quad channel 2133MHz RAM (thanks, Craigslist!), and have replaced Unimatrix002 with I7-3820. Am now running 8 HT cores, with 4 CPU tasks and the other 4 reserved for graphics support (love those AP 6.04s!).

When I can get them, that is.

Right now, neither is getting work except for a (very) few APs overnight. I7-3820 is about to run out of work; Fermibox2 still has a bunch to do.

ID: 1384536
JohnDK
Crowdfunding Project Donor
Volunteer tester
Joined: 28 May 00
Posts: 944
Credit: 87,766,462
RAC: 47,435
Denmark
Message 1384542 - Posted: 25 Jun 2013, 12:26:43 UTC

Re. lack of AP work: could it be that more and more people are switching to AP only (like me), because of the credits, plus the fact that there's now a stock Linux GPU AP app?

ID: 1384542
Cruncher-American
Joined: 25 Mar 02
Posts: 1310
Credit: 175,601,851
RAC: 103,927
United States
Message 1384546 - Posted: 25 Jun 2013, 13:20:12 UTC - in response to Message 1384542.

Re. lack of AP work: could it be that more and more people are switching to AP only (like me), because of the credits, plus the fact that there's now a stock Linux GPU AP app?

Maybe, but I note that on the server status page, the number of channels to do for MB hasn't changed since at least last night, and very few (if any) of AP have been processed either.

Definitely seems to be a splitter problem, at least at first glance.

ID: 1384546
Cruncher-American
Joined: 25 Mar 02
Posts: 1310
Credit: 175,601,851
RAC: 103,927
United States
Message 1384552 - Posted: 25 Jun 2013, 14:09:12 UTC

UPDATE: Since my last post, SSP says that 3 channels of AP and 3 of MB have been processed. Not very much.


ID: 1384552
Mike
Volunteer tester
Joined: 17 Feb 01
Posts: 29565
Credit: 49,029,477
RAC: 16,644
Germany
Message 1384560 - Posted: 25 Jun 2013, 14:56:44 UTC - in response to Message 1384546.
Last modified: 25 Jun 2013, 14:56:58 UTC

Re. lack of AP work: could it be that more and more people are switching to AP only (like me), because of the credits, plus the fact that there's now a stock Linux GPU AP app?

Maybe, but I note that on the server status page, the number of channels to do for MB hasn't changed since at least last night, and very few (if any) of AP have been processed either.

Definitely seems to be a splitter problem, at least at first glance.


I think it's more of a feeder problem, since over 300,000 V7 tasks are still ready to send.
I might be wrong on that.
With each crime and every kindness we birth our future.

ID: 1384560
Cruncher-American
Joined: 25 Mar 02
Posts: 1310
Credit: 175,601,851
RAC: 103,927
United States
Message 1384561 - Posted: 25 Jun 2013, 14:59:01 UTC - in response to Message 1384560.

I think it's more of a feeder problem, since over 300,000 V7 tasks are still ready to send.
I might be wrong on that.


The question then becomes: do we split a feeder or feed a splitter?

ID: 1384561
Gary Charpentier
Crowdfunding Project Donor
Volunteer tester
Joined: 25 Dec 00
Posts: 18618
Credit: 21,384,402
RAC: 19,971
United States
Message 1384566 - Posted: 25 Jun 2013, 15:15:39 UTC - in response to Message 1384536.

Right now, neither is getting work except for a (very) few APs overnight. I7-3820 is about to run out of work; Fermibox2 still has a bunch to do.

Being about to run out of work is normal with BOINC 7.x. Unless you change the "connect every" setting to a longer value, the client drains the queue nearly dry before it asks for more work.

E.g., at the default of 10 times a day (0.1 days), the queue is drained to 1/10 of a day's work. Set to every 2 days, the queue is drained to 2 days' work. Yeah, it kind of works backwards.

I don't suggest too big a value for the extra work setting either. We haven't had a long outage since the move to the co-lo, and too big a number here simply runs you into the max tasks limit, so you don't get the extra work anyway.
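
A rough sketch of the behaviour described above, as a toy model and not BOINC's actual work-fetch code. The function and parameter names (`work_request_days`, `min_buffer_days`, `extra_days`) are made up for illustration, and the assumption is simply that the client stays quiet until the queue drops below the "connect every" value, then asks for enough to reach that value plus the extra work setting.

```python
def work_request_days(buffered_days, min_buffer_days, extra_days):
    """Toy model: how many days of work to ask for right now.

    Assumption (hypothetical, simplified): no request is made while the
    queue holds at least min_buffer_days (the "connect every" value).
    Once it drops below that, the request tops the queue up to
    min_buffer_days + extra_days.
    """
    if buffered_days >= min_buffer_days:
        return 0.0  # queue still above the low-water mark: no request
    return (min_buffer_days + extra_days) - buffered_days


# "Connect every" 0.1 days (10 times a day): nothing is requested until
# less than a tenth of a day of work is left.
print(work_request_days(0.5, 0.1, 0.25))   # 0.0
print(work_request_days(0.05, 0.1, 0.25))  # asks to top up to 0.35 days

# "Connect every" 2 days: a request goes out as soon as the queue
# drops below 2 days of work.
print(work_request_days(1.5, 2.0, 0.25))   # asks to top up to 2.25 days
```

The model only deals in days of work. It does not include the max tasks limit, which caps what the server hands out no matter how much is requested.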

ID: 1384566
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45884
Credit: 814,714,178
RAC: 121,902
United States
Message 1384577 - Posted: 25 Jun 2013, 15:43:48 UTC

And I see the 'mystery bandwidth' has continued.
The kitties have been losing cache all night. Not a usual thingy since the move to the colo.
Hopefully they'll sort it all during today's outage.


Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

ID: 1384577
Fred E.
Volunteer tester
Joined: 22 Jul 99
Posts: 768
Credit: 24,140,697
RAC: 0
United States
Message 1384621 - Posted: 25 Jun 2013, 20:03:33 UTC

Guess something got fixed. My first request after maintenance brought me back to the limits. Hope it holds up.

And I see the 'mystery bandwidth' has continued.

Don't know what to think of that. Maybe they throttled back the Lab-to-CoLo transfers?
Another Fred
Support SETI@home when you search the Web with GoodSearch or shop online with GoodShop.

ID: 1384621
rob smith
Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 13305
Credit: 154,260,731
RAC: 114,039
United Kingdom
Message 1384632 - Posted: 25 Jun 2013, 21:04:22 UTC

Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-(


Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

ID: 1384632
Cruncher-American
Joined: 25 Mar 02
Posts: 1310
Credit: 175,601,851
RAC: 103,927
United States
Message 1384650 - Posted: 25 Jun 2013, 22:05:08 UTC - in response to Message 1384566.

@Gary - THANKS! That explains why I7-3820 (my new cruncher) wasn't even asking for WUs except when nearly empty. I changed the settings per your suggestion, and the machine promptly loaded up to the limit.


ID: 1384650
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 10499
Credit: 135,212,039
RAC: 37,291
Australia
Message 1384692 - Posted: 26 Jun 2013, 2:17:11 UTC - in response to Message 1384632.

Oh dear, I see 20jn12ac is still hanging around, stuck with loads of errors :-(

It certainly isn't giving in, is it?

Cheers.

ID: 1384692
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 7475
Credit: 90,910,429
RAC: 45,362
Australia
Message 1384729 - Posted: 26 Jun 2013, 5:53:04 UTC - in response to Message 1384692.
Last modified: 26 Jun 2013, 5:53:24 UTC

Splitters still having issues. Ready-to-send buffer continues to shrink with the splitters unable to produce enough work. Could run out of work yet again, all within 24 hours of last running out.


Grant
Darwin NT

ID: 1384729
kittyman
Project Donor
Volunteer tester
Joined: 9 Jul 00
Posts: 45884
Credit: 814,714,178
RAC: 121,902
United States
Message 1384740 - Posted: 26 Jun 2013, 7:00:33 UTC - in response to Message 1384729.

Splitters still having issues. Ready-to-send buffer continues to shrink with the splitters unable to produce enough work. Could run out of work yet again, all within 24 hours of last running out.

Well, that stuck dataset 20jn12ac is not helping anything. Tying up one MB splitter that is not producing WUs.

Eric said that Matt had restarted it. It apparently is still in terminal fail mode. I sent Eric another message, and we'll have to wait until tomorrow to see if they kick it again or just boot the sucker.
Kitties make wonderful traveling companions on your journey through life.

Have made a few friends in this life.
Most were cats.

ID: 1384740
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 2871
Credit: 10,620,227
RAC: 301
United States
Message 1384744 - Posted: 26 Jun 2013, 7:15:21 UTC
Last modified: 26 Jun 2013, 7:22:46 UTC

Was just looking through my single-core machine's recently-reported v7 WUs and saw a cool one. 1265536732

The beginnings of an inconclusive train. I guess build 1846 and stock CPU app didn't quite agree with each other..? It's obvious cuda32 just completely got it wrong. It'll be interesting to see what happens when cuda50 gives it a whirl.

edit: and then I looked into it more.. the cuda32 attempt is a runaway machine. hostid 6721035. PMed to inform them and pointed to NC to ask for solutions for fixing it.


Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)

ID: 1384744
 
©2016 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.