Panic Mode On (83) Server Problems?



Message boards : Number crunching : Panic Mode On (83) Server Problems?

Profile Bernie Vine
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 26 May 99
Posts: 7135
Credit: 28,515,034
RAC: 16,914
United Kingdom
Message 1363696 - Posted: 2 May 2013, 19:57:10 UTC

If they are holding back data, then they need to let us know this explicitly. It's a lot harder to get work units now compared to last week, but I'm making the assumption that this is temporary and we will return to a somewhat normal condition in a little while. If this is not going to be the case, I just want to know.

You have been with this project a long while so surely you realise by now that "project communication" is at best sparse. Someone, probably Matt, will eventually let us know what has been going on. I suggest just waiting a while, I am sure all will be revealed when the team have time.
____________


Today is life, the only life we're sure of. Make the most of today.

Profile Donald L. JohnsonProject donor
Avatar
Send message
Joined: 5 Aug 02
Posts: 6326
Credit: 769,481
RAC: 597
United States
Message 1363782 - Posted: 3 May 2013, 5:09:09 UTC - in response to Message 1363696.

If they are holding back data, then they need to let us know this explicitly. It's a lot harder to get work units now compared to last week, but I'm making the assumption that this is temporary and we will return to a somewhat normal condition in a little while. If this is not going to be the case, I just want to know.

You have been with this project a long while so surely you realise by now that "project communication" is at best sparse. Someone, probably Matt, will eventually let us know what has been going on. I suggest just waiting a while, I am sure all will be revealed when the team have time.

This post from Matt seems pretty explicit to me ...
On 8 April 2013, as we came back up after the move to the Colocation Facility, Matt wrote this (emphasis mine):

Jeff and I predicted based on previous demand that we'd see, once things settled down, a bandwidth usage average of 150Mbits/second (as long as both multibeam and astropulse workunits were available). And in fact this is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and get on the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't be setting a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems that our result storage server can't handle much higher a load. Yes, that seems to be our big bottleneck at this point - the ability of that server to write results to disk much faster than current demand. We expected as much. We'll look into improving the disk i/o on that system soon.

____________
Donald
Infernal Optimist / Submariner, retired

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5918
Credit: 61,709,657
RAC: 18,253
Australia
Message 1363791 - Posted: 3 May 2013, 5:37:11 UTC - in response to Message 1363782.

This post from Matt seems pretty explicit to me ...

However that post was on March 8; the problems we're having occurred about a week to a week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

____________
Grant
Darwin NT.

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5918
Credit: 61,709,657
RAC: 18,253
Australia
Message 1363838 - Posted: 3 May 2013, 8:11:13 UTC - in response to Message 1363836.

I suspect this shall either pass or become the new normal, although right now it is being brought to bear due to the shorties being sent out.

Unfortunately it was also the case for several hours when there were hardly any shorties about.

____________
Grant
Darwin NT.

juan BFBProject donor
Volunteer tester
Avatar
Send message
Joined: 16 Mar 07
Posts: 5473
Credit: 313,441,391
RAC: 94,012
Brazil
Message 1363849 - Posted: 3 May 2013, 9:01:50 UTC

I believe the problem is a combination of things: the way the pfb_splitter works (it seems to be slower than the old mb_splitter used until a few days ago), the lack of AP work, which makes us all ask for MB work only, and some hidden limit added by Matt. Sooner or later we will know the answer. Until it is fixed, our power bills will be smaller...
____________

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5918
Credit: 61,709,657
RAC: 18,253
Australia
Message 1363853 - Posted: 3 May 2013, 9:32:16 UTC - in response to Message 1363842.

I DO still wish the limits were raised; it would keep my fast crunchers in work for a bit longer when the ready-to-send buffer crashes.

Even that wouldn't help in this case as the output isn't sufficient to meet demand, let alone produce a ready-to-send buffer.

Ideally, they'll sort out the splitters so they can maintain a ready-to-send buffer.
Then sort out the database issues so the limits can be raised and we can get enough work to last more than a couple of hours.


If we run out of work, we run out of work. I can live with that (it'd be disappointing, but if there isn't any to process then there's no point getting worked up about it).
But when there is work there to be done, but it's not available, that's frustrating.
____________
Grant
Darwin NT.

Profile Donald L. JohnsonProject donor
Avatar
Send message
Joined: 5 Aug 02
Posts: 6326
Credit: 769,481
RAC: 597
United States
Message 1364063 - Posted: 3 May 2013, 18:06:01 UTC - in response to Message 1363791.
Last modified: 3 May 2013, 18:15:43 UTC

This post from Matt seems pretty explicit to me ...

However that post was on March 8; the problems we're having occurred about a week to a week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

No, Grant, it is from April 8, immediately after the move to the CoLo facility. We have been here less than 1 month, and for the last 2 weeks the data being split has been mostly shorties. Absent any new word from Matt, I must presume that those throttles he mentioned are still in place, and for the reasons he stated. We are still establishing the new "normal". As always with this project, patience is not just a virtue, it is a requirement.

Edit: added link to Matt's Tech News message.
____________
Donald
Infernal Optimist / Submariner, retired

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5918
Credit: 61,709,657
RAC: 18,253
Australia
Message 1364128 - Posted: 3 May 2013, 21:16:50 UTC - in response to Message 1364063.

This post from Matt seems pretty explicit to me ...

However that post was on March 8; the problems we're having occurred about a week to a week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

No, Grant, it is from April 8, immediately after the move to the CoLo facility.

Typo on my part - I meant April. The rest of my statement in that post is correct.

____________
Grant
Darwin NT.

Profile SliverProject donor
Avatar
Send message
Joined: 18 May 11
Posts: 281
Credit: 7,192,891
RAC: 111
United States
Message 1364185 - Posted: 3 May 2013, 23:58:46 UTC - in response to Message 1364076.

I really don't think some of you realize just who you're really dealing with.


No one is any bit scared of you, so knock it off.

Can't seem to get any work for the GPU. Anyone else with the same issue?

____________

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5918
Credit: 61,709,657
RAC: 18,253
Australia
Message 1364202 - Posted: 4 May 2013, 0:20:53 UTC - in response to Message 1364185.
Last modified: 4 May 2013, 0:21:15 UTC

Can't seem to get any work for the GPU. Anyone else with the same issue?

Nope.

"Project has no tasks available" has been the usual response to any type of work request over the last week or so; then every now and then some work gets allocated. Depending on the backoffs for CPU or GPU work, one or the other may miss out on that allocation because, due to the backoff, it wasn't asked for.
____________
Grant
Darwin NT.
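The backoff behaviour Grant describes is the classic exponential backoff pattern: each "no tasks available" reply roughly doubles the wait before the next request for that resource, so a CPU or GPU deep in backoff can sleep straight through the brief windows when work is available. A minimal sketch of the idea (hypothetical parameters and jitter scheme, not the actual BOINC client code):

```python
import random

def next_backoff(prev_backoff, min_s=60, max_s=4 * 3600):
    """Return the next wait (seconds) after a failed work request.

    Hypothetical sketch: double the previous backoff, clamp to
    [min_s, max_s], then jitter downward so many clients don't
    all retry at the same instant.
    """
    base = min(max(prev_backoff * 2, min_s), max_s)
    # jitter: pick uniformly in [base/2, base]
    return random.uniform(base / 2, base)
```

With a cap of a few hours, a client that has been refused work repeatedly may not ask again until long after a batch of tasks has come and gone, which is consistent with one resource "missing out" while the other happens to request at the right moment.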

zoom314Project donor
Avatar
Send message
Joined: 30 Nov 03
Posts: 46816
Credit: 37,000,998
RAC: 2,483
United States
Message 1364216 - Posted: 4 May 2013, 1:02:04 UTC - in response to Message 1364185.

I really don't think some of you realize just who you're really dealing with.


No one is any bit scared of you, so knock it off.

Can't seem to get any work for the GPU. Anyone else with the same issue?

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets
____________
My Facebook, War Commander, 2015

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5918
Credit: 61,709,657
RAC: 18,253
Australia
Message 1364229 - Posted: 4 May 2013, 1:27:36 UTC - in response to Message 1364216.

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?
____________
Grant
Darwin NT.

zoom314Project donor
Avatar
Send message
Joined: 30 Nov 03
Posts: 46816
Credit: 37,000,998
RAC: 2,483
United States
Message 1364237 - Posted: 4 May 2013, 1:45:49 UTC - in response to Message 1364229.

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?

The wu output seems kind of low.
____________
My Facebook, War Commander, 2015

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5918
Credit: 61,709,657
RAC: 18,253
Australia
Message 1364240 - Posted: 4 May 2013, 1:54:36 UTC - in response to Message 1364237.
Last modified: 4 May 2013, 1:56:23 UTC

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?

The wu output seems kind of low.

That just shows network traffic.

This gives a better idea of WU generation.
http://setistats.haveland.com/cgi/munin-cgi-graph/setiathome/setiathome/sah_creation-day.png

and this shows you how much work is ready-to-send & how much is being returned per hour.
http://setistats.haveland.com/cgi/munin-cgi-graph/setiathome/setiathome/sah_results-day.png
____________
Grant
Darwin NT.

Speedy
Volunteer tester
Avatar
Send message
Joined: 26 Jun 04
Posts: 702
Credit: 5,993,479
RAC: 1,523
New Zealand
Message 1364245 - Posted: 4 May 2013, 2:01:54 UTC - in response to Message 1364237.

I'm offline until 7pm, but this might explain the lack of work.

http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=/router-interfaces/inr-211/gigabitethernet6_17&ranges=d%3Aw&view=Octets

?

The wu output seems kind of low.

"Work out" is the blue line; this represents the work being returned to the servers. As I write it is sitting at about its normal level, 18.19 MB. The reason it was so high yesterday (Berkeley time) is, I think, that they were transferring more data sets to the servers for the splitters to work on. As I write, the "bits in" data being sent to clients is at 189.06 MB; I don't consider that low unless it drops under 100 MB.


____________

Live in NZ y not join Smile City?

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8764
Credit: 52,716,643
RAC: 17,358
United Kingdom
Message 1364397 - Posted: 4 May 2013, 14:59:15 UTC - in response to Message 1364394.

Not so good here right now.
I have a few powerful rigs left sucking big wind for the lack of tasks sent.

Sorry guys ;)

04/05/2013 15:55:05 | SETI@home | Scheduler request completed: got 48 new tasks


Copyright © 2014 University of California