Panic Mode On (42) Server problems

Message boards : Number crunching : Panic Mode On (42) Server problems

Previous · 1 · 2 · 3 · 4 · 5 · 6 . . . 11 · Next

kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1057436 - Posted: 18 Dec 2010, 10:24:08 UTC

It would be interesting if they could actually implement true QoS.....

Favoring established connections at the expense of new ones.

I don't think that is what was happening at the time.

But, as things seem to have settled down a bit, I think it's time to raise the limits, eh?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1057436
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1057536 - Posted: 18 Dec 2010, 16:46:38 UTC

Whatever.
By my calculations, the current limits are compatible with a 4-day cache.

As any future *planned* outages will be only 3 days, isn't this discussion academic?

T.A.
ID: 1057536
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1057546 - Posted: 18 Dec 2010, 17:28:46 UTC - in response to Message 1057536.  
Last modified: 18 Dec 2010, 17:30:11 UTC

Whatever.
By my calculations, the current limits are compatible with a 4-day cache.

As any future *planned* outages will be only 3 days, isn't this discussion academic?

T.A.

And what, pray tell, are you basing your calculations on?
The Frozen 920, running at 4.1 GHz, completes most MB work in about an hour.
Many only take half that time or less. Depends on the AR.
150 WUs won't get that rig through a single day on the 4 CPU cores.

You can have a look at the rig's valid results here.
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1057546
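The arithmetic behind that claim can be sketched as a quick back-of-the-envelope calculation. It uses only the figures in the post above ("about an hour" per MB workunit, "many only take half that time", 4 CPU cores, a 150-WU limit); the variable names are mine:

```python
# Rough cache arithmetic from the figures in the post above.
# Actual runtimes vary with the angle range (AR).
cores = 4
hours_per_wu_slow = 1.0   # typical multibeam workunit
hours_per_wu_fast = 0.5   # shorter workunits, depending on AR
limit = 150               # per-host workunit limit under discussion

wus_per_day_low = cores * 24 / hours_per_wu_slow    # slowest case: 96/day
wus_per_day_high = cores * 24 / hours_per_wu_fast   # fastest case: 192/day
days_of_cache = limit / wus_per_day_low             # best case for the cache

print(f"throughput: {wus_per_day_low:.0f}-{wus_per_day_high:.0f} WUs/day")
print(f"{limit} WUs last at most {days_of_cache:.1f} days")
```

Which bears out the complaint: at 96-192 WUs a day, a 150-WU cache covers well under two days on the CPU cores alone.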
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1057577 - Posted: 18 Dec 2010, 19:06:11 UTC - in response to Message 1057536.  

By my calculations the current limits are compatible with a 4 day cache.

?
My system isn't a powerful one, and my 4-day cache isn't full due to the present server-side limits. I've got about 2.5-3 days' worth with the present limit on work.
Grant
Darwin NT
ID: 1057577
Terror Australis
Volunteer tester

Joined: 14 Feb 04
Posts: 1817
Credit: 262,693,308
RAC: 44
Australia
Message 1057667 - Posted: 19 Dec 2010, 1:10:39 UTC - in response to Message 1057536.  

Whatever.
By my calculations, the current limits are compatible with a 4-day cache.

As any future *planned* outages will be only 3 days, isn't this discussion academic?

T.A.

OK. Statement retracted. Jose Cuervo was helping me with my calculations and we misplaced a decimal point.

Sorry for any angst caused :-)

T.A.
ID: 1057667
perryjay
Volunteer tester
Joined: 20 Aug 02
Posts: 3377
Credit: 20,676,751
RAC: 0
United States
Message 1057669 - Posted: 19 Dec 2010, 1:24:16 UTC - in response to Message 1057667.  

For those that missed it, Matt says the limits have been raised.


PROUD MEMBER OF Team Starfire World BOINC
ID: 1057669
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1057686 - Posted: 19 Dec 2010, 3:45:35 UTC - in response to Message 1057536.  
Last modified: 19 Dec 2010, 3:46:58 UTC

Whatever.
By my calculations, the current limits are compatible with a 4-day cache.

As any future *planned* outages will be only 3 days, isn't this discussion academic?

T.A.


not so my friend ...

It appears as though the limit is 320 per GPU. In my case, that means the dual GTX 470s are limited to 640 WUs. This is just under 2 days' worth of work for each of these boxes.

The problem is that they will go dry before the next outage finishes (as will many others); the feeding frenzy begins again, it takes 2-3 days for the caches to fill back up to this low limit, and then the cycle repeats.

All of us are effectively brought down to the lowest common denominator.

The limits need to be raised or abolished so that we can slowly build our caches up, operate outside of SETI's 3-day outage period and unscheduled downtime, and not be affected by the virtual DDoS attack that the outage generates.

In the past I ran my caches deep to avoid SETI's ups and downs and never had an issue. If it went down, I just waited a few days after it came back to avoid congestion. With the limits in place, I am now subject to SETI's intermittent behaviour and its congestion issues.

L.
ID: 1057686
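A small, hypothetical estimator for the situation described above: a per-GPU limit of 320 tasks on a dual-GTX 470 host. The ~8.5-minute per-task runtime is an assumed figure chosen to reproduce "just under 2 days"; it is not stated in the post.

```python
# Hypothetical cache-depth estimator. The per-task runtime is an
# assumption picked to match "just under 2 days" for 640 tasks.
def cache_days(limit_per_gpu: int, n_gpus: int, minutes_per_task: float) -> float:
    """Days of work a full cache represents for n_gpus identical GPUs."""
    tasks = limit_per_gpu * n_gpus                    # 320 * 2 = 640 WUs
    tasks_per_day = n_gpus * (24 * 60) / minutes_per_task
    return tasks / tasks_per_day

print(f"{cache_days(320, 2, 8.5):.1f} days")
```

Note the per-GPU limit makes the host's GPU count cancel out: every host, fast or slow, ends up with the same fixed number of tasks per device, which is the "lowest common denominator" effect the post complains about.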
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1057688 - Posted: 19 Dec 2010, 3:47:37 UTC - in response to Message 1057686.  
Last modified: 19 Dec 2010, 3:48:04 UTC


not so my friend ...

It appears as though the limit is 320 per GPU. In my case, that means the dual GTX 470s are limited to 640 WUs. This is just under 2 days' worth of work for each of these boxes.

The problem is that they will go dry before the next outage finishes (as will many others); the feeding frenzy begins again, it takes 2-3 days for the caches to fill back up to this low limit, and then the cycle repeats.

All of us are effectively brought down to the lowest common denominator.

The limits need to be raised or abolished so that we can slowly build our caches up, operate outside of SETI's 3-day outage period and unscheduled downtime, and not be affected by the virtual DDoS attack that the outage generates.

In the past I ran my caches deep to avoid SETI's ups and downs and never had an issue. If it went down, I just waited a few days after it came back to avoid congestion. With the limits in place, I am now subject to SETI's intermittent behaviour and its congestion issues.

L.


The limit must be higher than that, because right now my GTS 250 has 564 tasks and my 480 has 376 before I communicate with the project again. And that's not counting the tasks for my CPUs.
Traveling through space at ~67,000mph!
ID: 1057688
Lionel

Joined: 25 Mar 00
Posts: 680
Credit: 563,640,304
RAC: 597
Australia
Message 1057689 - Posted: 19 Dec 2010, 3:48:48 UTC - in response to Message 1057669.  

For those that missed it, Matt says the limits have been raised.


to what ...

ID: 1057689
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1057692 - Posted: 19 Dec 2010, 3:50:32 UTC - in response to Message 1057689.  

For those that missed it, Matt says the limits have been raised.


to what ...


Gotta be near double or higher considering I have 564 units on one machine.
Traveling through space at ~67,000mph!
ID: 1057692
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19403
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1057694 - Posted: 19 Dec 2010, 3:52:12 UTC
Last modified: 19 Dec 2010, 3:52:32 UTC

At the moment, this discussion about d/load limits is a bit academic, because if you take a peek at the server status page, they will run out of tasks very soon.

Unless someone comes in and finds some blanked data and puts it in the splitters.
ID: 1057694
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1057697 - Posted: 19 Dec 2010, 3:55:25 UTC - in response to Message 1057694.  

At the moment, this discussion about d/load limits is a bit academic, because if you take a peek at the server status page, they will run out of tasks very soon.

Unless someone comes in and finds some blanked data and puts it in the splitters.


It'll get handled. It's Saturday night, and I'm sure they won't be back around to deal with any of that till Monday morning. Kind of expected at this point in time. I know there has been some talk about re-striping the RAID, etc., so maybe they want it dried up for a reason, or maybe they are still underestimating the power of the new servers. Either way, my machines have work according to BOINC for the next 8.5 days and then some.
Traveling through space at ~67,000mph!
ID: 1057697
Cosmic_Ocean
Avatar

Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 1057700 - Posted: 19 Dec 2010, 4:14:22 UTC - in response to Message 1057392.  

... Not sure how accurate their cricket graphs are, but the highest I've ever seen it was about 97 Mbps, so it kind of makes you wonder what QoS they do have running. ...

I've looked numerous times at setting up Cricket on my own equipment (with no success so far), but from what I've seen, the "chirps" (each vertical column of pixels) can be configured as an average over an interval of time. It's not a case of making a query to the hardware every X seconds; instead, the hardware sends out SNMP packets saying what is going on, and Cricket simply listens to those packets and makes sense of the data it is looking for. The default is, I believe, 5 minutes per chirp. That makes it a poor tool for real-time monitoring, but a good one for time-lapse trends.

As far as what the throttling is set for, or the QoS... no idea.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 1057700
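A quick illustration of the averaging effect described above: a short burst to full line rate barely moves the 5-minute average that one graph "chirp" would display. All traffic figures here are invented for illustration.

```python
# Why a 5-minute averaging interval can hide short bursts: a 30-second
# spike to 100 Mbit/s on an otherwise 40 Mbit/s link barely moves the
# 5-minute average. All figures are made up for illustration.
samples = [40.0] * 300            # one reading per second for 5 minutes, Mbit/s
samples[100:130] = [100.0] * 30   # 30-second burst to full line rate

chirp_value = sum(samples) / len(samples)   # what the graph would plot
print(f"peak: {max(samples):.0f} Mbit/s, plotted chirp: {chirp_value:.0f} Mbit/s")
```

So a graph that never shows more than ~97 Mbps does not rule out the link briefly saturating between chirps.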
Scarecrow

Joined: 15 Jul 00
Posts: 4520
Credit: 486,601
RAC: 0
United States
Message 1057703 - Posted: 19 Dec 2010, 4:42:59 UTC
Last modified: 19 Dec 2010, 4:44:16 UTC

I think I feel a disturbance in the Scheduler force.


Project communication failed: attempting access to reference site
Scheduler request failed: Server returned nothing (no headers, no data)
Internet access OK - project servers may be temporarily down.
ID: 1057703
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1057720 - Posted: 19 Dec 2010, 5:42:41 UTC - in response to Message 1057700.  

... Not sure how accurate their cricket graphs are, but the highest I've ever seen it was about 97 Mbps, so it kind of makes you wonder what QoS they do have running. ...

I've looked numerous times at setting up Cricket on my own equipment (with no success so far), but from what I've seen, the "chirps" (each vertical column of pixels) can be configured as an average over an interval of time. It's not a case of making a query to the hardware every X seconds; instead, the hardware sends out SNMP packets saying what is going on, and Cricket simply listens to those packets and makes sense of the data it is looking for. The default is, I believe, 5 minutes per chirp. That makes it a poor tool for real-time monitoring, but a good one for time-lapse trends.

As far as what the throttling is set for, or the QoS... no idea.


Yeah, that sounds about right. I've set up MRTG on hardware numerous times, and it only updates at most every 5 minutes. That could be, I suppose, the reason why it never goes over that; however, on hardware I've dealt with, if you spike all the way out, it will eventually show with the graphs running 24/7. No huge deal anyway, as it's just an indication, not 100% accurate, since it only updates every 5 minutes or so.
Traveling through space at ~67,000mph!
ID: 1057720
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13855
Credit: 208,696,464
RAC: 304
Australia
Message 1058350 - Posted: 21 Dec 2010, 7:33:18 UTC


I was going to post about the assimilator queue rapidly growing, but then read in the Tech News that Matt has turned the assimilators off in preparation for some shuffling about of data over the next couple of days.
Grant
Darwin NT
ID: 1058350
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1058426 - Posted: 21 Dec 2010, 13:14:48 UTC

Ruh roh........

I noticed some glitches and timeouts in forum access in the last half hour or so.

And now I see that the Cricket graphs seem to have taken a dive.

Could there be problems in kittyland?

Meow?
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1058426
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1058427 - Posted: 21 Dec 2010, 13:17:54 UTC - in response to Message 1058426.  

Probably just a maintenance brown-out, like Matt posted about last week.
ID: 1058427
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51478
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1058428 - Posted: 21 Dec 2010, 13:19:11 UTC - in response to Message 1058427.  

Probably just a maintenance brown-out, like Matt posted about last week.

At THIS time of the morning?????
"Time is simply the mechanism that keeps everything from happening all at once."

ID: 1058428
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1058430 - Posted: 21 Dec 2010, 13:28:14 UTC - in response to Message 1058428.  

Probably just a maintenance brown-out, like Matt posted about last week.

At THIS time of the morning?????

Quite likely - it was around this time last week too.

Campus maintenance elves have to work unsocial hours - they get complaints from faculty otherwise.
ID: 1058430


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.