Panic Mode On (83) Server Problems?



Message boards : Number crunching : Panic Mode On (83) Server Problems?

Previous · 1 . . . 9 · 10 · 11 · 12 · 13 · 14 · 15 . . . 22 · Next
Bernie Vine
Volunteer tester
Joined: 26 May 99
Posts: 6601
Credit: 22,371,573
RAC: 14,859
United Kingdom
Message 1363696 - Posted: 2 May 2013, 19:57:10 UTC

If they are holding back data, then they need to let us know this explicitly. It's a lot harder to get work units now compared to last week, but I'm making the assumption that this is temporary and we will return to a somewhat normal condition in a little while. If this is not going to be the case, I just want to know.

You have been with this project a long while so surely you realise by now that "project communication" is at best sparse. Someone, probably Matt, will eventually let us know what has been going on. I suggest just waiting a while, I am sure all will be revealed when the team have time.
____________


Today is life, the only life we're sure of. Make the most of today.

Donald L. Johnson
Joined: 5 Aug 02
Posts: 5681
Credit: 561,761
RAC: 621
United States
Message 1363782 - Posted: 3 May 2013, 5:09:09 UTC - in response to Message 1363696.

If they are holding back data, then they need to let us know this explicitly. It's a lot harder to get work units now compared to last week, but I'm making the assumption that this is temporary and we will return to a somewhat normal condition in a little while. If this is not going to be the case, I just want to know.

You have been with this project a long while so surely you realise by now that "project communication" is at best sparse. Someone, probably Matt, will eventually let us know what has been going on. I suggest just waiting a while, I am sure all will be revealed when the team have time.

This post from Matt seems pretty explicit to me ...
On 8 April 2013, as we came back up after the move to the Colocation Facility, Matt wrote this (emphasis mine):

Jeff and I predicted based on previous demand that we'd see, once things settled down, a bandwidth usage average of 150Mbits/second (as long as both multibeam and astropulse workunits were available). And in fact this is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and get on the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't be setting a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems that our result storage server can't handle much higher a load. Yes, that seems to be our big bottleneck at this point - the ability of that server to write results to disk much faster than current demand. We expected as much. We'll look into improving the disk i/o on that system soon.
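As a rough sanity check of that 150 Mbits/second figure, the sketch below estimates how many workunits that bandwidth can deliver; the ~366 KB multibeam workunit size is an assumed typical value, not something Matt states in the post.

```python
# Back-of-envelope: workunits/second deliverable over a 150 Mbit/s
# average outbound link. The ~366 KB multibeam workunit size is an
# assumption for illustration only.
LINK_MBITS = 150                       # average outbound bandwidth, Mbit/s
WU_BYTES = 366 * 1024                  # assumed multibeam workunit size

bytes_per_sec = LINK_MBITS * 1_000_000 / 8
wus_per_sec = bytes_per_sec / WU_BYTES
print(f"~{wus_per_sec:.1f} workunits/s, ~{wus_per_sec * 3600:,.0f}/hour")
```

On those assumptions the pipe itself is good for roughly fifty workunits a second, which is why the bottleneck Matt describes is the result server's disk I/O rather than the network.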




____________
Donald
Infernal Optimist / Submariner, retired

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,233,164
RAC: 38,313
Australia
Message 1363791 - Posted: 3 May 2013, 5:37:11 UTC - in response to Message 1363782.

This post from Matt seems pretty explicit to me ...

However that post was on March 8; the problems we're having occurred about a week, week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,928,047
RAC: 492,928
United States
Message 1363836 - Posted: 3 May 2013, 8:03:47 UTC - in response to Message 1363791.
Last modified: 3 May 2013, 8:04:39 UTC

This post from Matt seems pretty explicit to me ...

However that post was on March 8; the problems we're having occurred about a week, week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

I suspect this shall either pass or become the new normal, although right now it is being brought to bear due to the shorties being sent out.

We are dealing with quite a different animal since the move to the new digs for the servers. The amazing bandwidth is a two-edged sword. Cuts both ways.

It has allowed the servers to send and receive all the data possible in both directions most of the time. This is a very grand thing!

What has happened is that other limitations now present themselves. Splitting capacity. Database capacity. I/O capacity.
Still, a very good thing.

What do you think is worse, not being able to get work because the servers could not parse it out over 100Mb bandwidth, or not being able to get work at times because the servers are handing it out as fast as they can feed it to the new pipe?

I vote for the current situation.

The day may well come when Eric or Matt have to post and say..'We have no more data from Arecibo to split today.'

And that, believe it or not, would be the best situation the project could be in.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,233,164
RAC: 38,313
Australia
Message 1363838 - Posted: 3 May 2013, 8:11:13 UTC - in response to Message 1363836.

I suspect this shall either pass or become the new normal, although right now it is being brought to bear due to the shorties being sent out.

Unfortunately it was also the case for several hours where there were hardly any shorties about.

____________
Grant
Darwin NT.

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,928,047
RAC: 492,928
United States
Message 1363842 - Posted: 3 May 2013, 8:16:57 UTC - in response to Message 1363838.

I suspect this shall either pass or become the new normal, although right now it is being brought to bear due to the shorties being sent out.

Unfortunately it was also the case for several hours where there were hardly any shorties about.

Well, either there is something left to be sorted, or this is just the way it shall be.

I am just now looking at the Killawatt on one of my best rigs. 350 watts.
It should be drawing about 650 watts. It will keep sniffing for work, and will get a hit here and there.

I DO still wish the limits were raised; it would keep my fast crunchers in work for a bit longer when the ready-to-send buffer crashes.

But, I am in the same boat as everybody else. And will keep rowing as hard as I can.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

juan BFB
Volunteer tester
Joined: 16 Mar 07
Posts: 4609
Credit: 232,248,229
RAC: 330,832
Brazil
Message 1363849 - Posted: 3 May 2013, 9:01:50 UTC

I believe the problem is the way the pfb_splitter works: it seems slower than the old mb_splitter used until a few days ago, plus the lack of AP work, which has us all asking for MB work only, and perhaps some hidden limit added by Matt. Sooner or later we will know the answer. Until it is fixed, our power bills will be smaller...
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,233,164
RAC: 38,313
Australia
Message 1363853 - Posted: 3 May 2013, 9:32:16 UTC - in response to Message 1363842.

I DO still wish the limits were raised, would keep my fast crunchers in work for a bit longer when the ready to send crashes.

Even that wouldn't help in this case as the output isn't sufficient to meet demand, let alone produce a ready-to-send buffer.

Ideally, they'll sort out the splitters so they can maintain a ready-to-send buffer.
Then sort out the database issues so the limits can be increased, and we can get enough work to last more than a couple of hours.


If we run out of work, we run out of work. I can live with that (it'd be disappointing, but if there isn't any to process then there's no point getting worked up about it).
But when there is work there to be done, but it's not available, that's frustrating.
____________
Grant
Darwin NT.

Donald L. Johnson
Joined: 5 Aug 02
Posts: 5681
Credit: 561,761
RAC: 621
United States
Message 1364063 - Posted: 3 May 2013, 18:06:01 UTC - in response to Message 1363791.
Last modified: 3 May 2013, 18:15:43 UTC

This post from Matt seems pretty explicit to me ...

However that post was on March 8; the problems we're having occurred about a week, week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

No, Grant, it is from April 8, immediately after the move to the CoLo facility. We have been here less than 1 month, and for the last 2 weeks the data being split has been mostly shorties. Absent any new word from Matt, I must presume that those throttles he mentioned are still in place, and for the reasons he stated. We are still establishing the new "normal". As always with this project, patience is not just a virtue, it is a requirement.

Edit: added link to Matt's Tech News message.
____________
Donald
Infernal Optimist / Submariner, retired

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,928,047
RAC: 492,928
United States
Message 1364076 - Posted: 3 May 2013, 18:50:05 UTC
Last modified: 3 May 2013, 18:55:05 UTC

I just wish for what I came here for.
Unending WUs for me to process in the name of the unending search.

I have come and gone.

You can not sidetrack me any more.

Ya know, friends..........I've been wrong as hell sometimes.
Not on this one.

Hit the road, Jack.

I really don't think some of you realize just who you're really dealing with.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,233,164
RAC: 38,313
Australia
Message 1364128 - Posted: 3 May 2013, 21:16:50 UTC - in response to Message 1364063.

This post from Matt seems pretty explicit to me ...

However that post was on March 8; the problems we're having occurred about a week, week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

No, Grant, it is from April 8, immediately after the move to the CoLo facility.

Typo on my part - I meant April. The rest of my statement in that post is correct.

____________
Grant
Darwin NT.

Sliver
Joined: 18 May 11
Posts: 281
Credit: 6,538,297
RAC: 14,494
United States
Message 1364185 - Posted: 3 May 2013, 23:58:46 UTC - in response to Message 1364076.

I really don't think some of you realize just who you're really dealing with.


No one is any bit scared of you, so knock it off.

Can't seem to get any work for the GPU. Anyone else with the same issue?

____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,233,164
RAC: 38,313
Australia
Message 1364202 - Posted: 4 May 2013, 0:20:53 UTC - in response to Message 1364185.
Last modified: 4 May 2013, 0:21:15 UTC

Can't seem to get any work for the GPU. Anyone else with the same issue?

Nope.

"Project has no tasks available" has been the usual response to any type of work request over the last week or so; then every now and then some work gets allocated. Depending on the backoffs for CPU or GPU work, one or the other may miss out on that allocation because the backoff meant it wasn't asked for.
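That per-resource backoff effect can be sketched as a toy simulation: after each "no tasks available" reply a resource waits longer before asking again, so it can be mid-backoff (and silent) during the brief windows when the server does have work. The timings and probabilities here are illustrative, not BOINC's actual constants.

```python
# Toy sketch of independent per-resource request backoff: each
# resource doubles its wait after an empty reply (capped), resets on
# success, and so can miss a work window the other resource catches.
import random

random.seed(1)  # deterministic run for illustration

def simulate(resources=("CPU", "GPU"), minutes=600, p_work=0.05):
    next_request = {r: 0 for r in resources}  # minute of next request
    backoff = {r: 1 for r in resources}       # current backoff, minutes
    got = {r: 0 for r in resources}           # successful allocations
    for t in range(minutes):
        server_has_work = random.random() < p_work  # brief work windows
        for r in resources:
            if t < next_request[r]:
                continue                      # still backing off: no ask
            if server_has_work:
                got[r] += 1
                backoff[r] = 1                # reset backoff on success
            else:
                backoff[r] = min(backoff[r] * 2, 60)  # exponential, capped
            next_request[r] = t + backoff[r]
    return got

print(simulate())
```

Because each resource's backoff clock runs independently, the counts for CPU and GPU typically diverge even though both face the same server.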
____________
Grant
Darwin NT.

zoom314
Joined: 30 Nov 03
Posts: 44514
Credit: 35,395,520
RAC: 9,369
Message 1364216 - Posted: 4 May 2013, 1:02:04 UTC - in response to Message 1364185.

I really don't think some of you realize just who you really dealing with.


No one is any bit scared of you, so knock it off.

Can't seem to get any work for the GPU. Anyone else with the same issue?

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,233,164
RAC: 38,313
Australia
Message 1364229 - Posted: 4 May 2013, 1:27:36 UTC - in response to Message 1364216.

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?
____________
Grant
Darwin NT.

zoom314
Joined: 30 Nov 03
Posts: 44514
Credit: 35,395,520
RAC: 9,369
Message 1364237 - Posted: 4 May 2013, 1:45:49 UTC - in response to Message 1364229.

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?

The wu output seems kind of low.
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5561
Credit: 51,233,164
RAC: 38,313
Australia
Message 1364240 - Posted: 4 May 2013, 1:54:36 UTC - in response to Message 1364237.
Last modified: 4 May 2013, 1:56:23 UTC

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?

The wu output seems kind of low.

That just shows network traffic.

This gives a better idea of WU generation.
http://setistats.haveland.com/cgi/munin-cgi-graph/setiathome/setiathome/sah_creation-day.png

and this shows you how much work is ready-to-send & how much is being returned per hour.
http://setistats.haveland.com/cgi/munin-cgi-graph/setiathome/setiathome/sah_results-day.png
____________
Grant
Darwin NT.

Speedy
Volunteer tester
Joined: 26 Jun 04
Posts: 618
Credit: 4,747,470
RAC: 4,373
New Zealand
Message 1364245 - Posted: 4 May 2013, 2:01:54 UTC - in response to Message 1364237.

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?

The wu output seems kind of low.

Work out (the blue line) represents the work being returned to the servers. I think this was sitting at about the normal 18.19 MB as I write. The reason it was so high yesterday, Berkeley time, is, I think, that they were transferring more data sets to the servers for the splitters to work on. As I write, the data being sent to clients is at 189.06 MB; I don't consider this to be low unless it drops under 100 MB.


____________

Live in NZ y not join Smile City?

msattler
Volunteer tester
Joined: 9 Jul 00
Posts: 37286
Credit: 497,928,047
RAC: 492,928
United States
Message 1364394 - Posted: 4 May 2013, 14:55:02 UTC

Not so good here right now.
I have a few powerful rigs left sucking big wind for the lack of tasks sent.
____________
******************
Crunching Seti, loving all of God's kitties.

I have met a few friends in my life.
Most were cats.

Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 8275
Credit: 44,914,609
RAC: 13,524
United Kingdom
Message 1364397 - Posted: 4 May 2013, 14:59:15 UTC - in response to Message 1364394.

Not so good here right now.
I have a few powerful rigs left sucking big wind for the lack of tasks sent.

Sorry guys ;)

04/05/2013 15:55:05 | SETI@home | Scheduler request completed: got 48 new tasks



Copyright © 2014 University of California