Panic Mode On (83) Server Problems?

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1363114 - Posted: 1 May 2013, 7:22:33 UTC - in response to Message 1363109.  

Might be something with the new pfb splitting thingy, although I think it was Richard that expressed the opinion he thought they might be a little faster than the old splitters.....hmmmmmm.
Might depend on the work they are splitting.

I saw that mentioned, but so far they've proved to be about half the speed of the previous ones. 3/4 speed if they're really flying.
They're definitely borked.
Grant
Darwin NT
ID: 1363114
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1363122 - Posted: 1 May 2013, 7:58:11 UTC - in response to Message 1363114.  

Might be something with the new pfb splitting thingy, although I think it was Richard that expressed the opinion he thought they might be a little faster than the old splitters.....hmmmmmm.
Might depend on the work they are splitting.

I saw that mentioned, but so far they've proved to be about half the speed of the previous ones. 3/4 speed if they're really flying.
They're definitely borked.

Well, I got to get some sleeps now with the kitties.

I suspect they shall get up and paw the keyboard once in a while to keep things in order whilst I sleep. And shall keep the lights on for any wayward WUs that need to find a good home.

All WUs are welcome to the kitty crunching farm.

Meow, and good night.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1363122
Keith White
Joined: 29 May 99
Posts: 392
Credit: 13,035,233
RAC: 22
United States
Message 1363145 - Posted: 1 May 2013, 9:22:24 UTC - in response to Message 1363106.  

It's interesting that a creation rate of 30 units/s isn't enough, at least during a shorty storm.

During a shorty storm 55/s is the minimum needed to meet demand.
The storm is over, but still the splitters aren't able to crank out enough work. For a while they were doing about 30/s (barely enough when there are a lot of VLARs in the mix). Now they've dropped down to less than 20.

Someone in the lab needs to take a look at what is going on. The splitters used to be able to sustain 70/s with no problems at all; now they can't even reach it as a peak.
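
As a rough illustration of why those numbers matter, here is a minimal Python sketch of the arithmetic. It is not anything the project actually runs; the function names and the starting buffer of 90,000 workunits are made up for the example. Only the rates (20, 30, 55 and 70 per second) come from the posts above.

```python
def net_rate(split_rate, demand_rate):
    """Workunits per second gained (positive) or lost (negative) by the buffer."""
    return split_rate - demand_rate


def seconds_until_empty(buffer_units, split_rate, demand_rate):
    """Time until the ready-to-send buffer hits zero, or None if it never drains."""
    net = net_rate(split_rate, demand_rate)
    if net >= 0:
        return None  # splitters keep up with (or outpace) demand
    return buffer_units / -net


# Hypothetical starting buffer; the real figure is not given in the thread.
START_BUFFER = 90_000
DEMAND = 55  # minimum units/s quoted for a shorty storm

for split in (20, 30, 70):
    remaining = seconds_until_empty(START_BUFFER, split, DEMAND)
    if remaining is None:
        print(f"{split}/s: buffer holds or grows")
    else:
        print(f"{split}/s: buffer empty in about {remaining / 60:.0f} minutes")
```

Under these assumptions any split rate below demand empties the buffer eventually; only the 70/s the old splitters managed would let it refill.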


EDIT: at least the shorty storm was over for a while. The work my systems were able to get after the outage while I was at work didn't have many shorties in it, but one of the systems was just able to get some more work (still nowhere near enough...) & it was almost all shorties.


The reason I was talking about a shorty storm is that I had drained my queues to update BOINC and move from r390 to r1817 (my 3 cores pretty well match my low-end GPU in performance, so they were fairly close to running out at the same time). When I turned off NNT I got only shorties for my CPU, and the few GPU units I got were a mix but leaned toward shorties as well.

Edit: and once again without fail, the lack of GPU units gets resolved as I'm writing about it. Woo Hoo. I finally have full queues again.

"Life is just nature's way of keeping meat fresh." - The Doctor
ID: 1363145
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1363287 - Posted: 1 May 2013, 17:55:37 UTC - in response to Message 1363145.  


Splitters still not keeping up.
Grant
Darwin NT
ID: 1363287
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1363291 - Posted: 1 May 2013, 18:09:21 UTC - in response to Message 1363287.  
Last modified: 1 May 2013, 18:22:58 UTC


Splitters still not keeping up.

The kitties shall just have to sniff and scratch and claw for all the WUs they can find for the crunchers today whilst I am at work.

They are usually pretty good at sniffing out Seti work.
Cache down about 330 from maximum right now across all 9 rigs, so they'll have to work at it.

But, it would be great if da boyz in da lab could commission one more of those neat pfb splitter jobs on the servers!
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1363291
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1363298 - Posted: 1 May 2013, 18:38:42 UTC

Just within the last hour have I finally been able to top off my allocation on all three rigs. Looks like the shortie storm is at least temporarily over.
ID: 1363298
Kevin Benfield
Joined: 29 Dec 03
Posts: 39
Credit: 30,085,439
RAC: 0
United Kingdom
Message 1363311 - Posted: 1 May 2013, 19:50:19 UTC

For some reason, I have not been able to get any new units for most of the day. I'm completely out of units to crunch as well. I tried the usual kick to get things going, but nothing :(
ID: 1363311
Gatekeeper
Joined: 14 Jul 04
Posts: 887
Credit: 176,479,616
RAC: 0
United States
Message 1363352 - Posted: 1 May 2013, 22:20:36 UTC - in response to Message 1363311.  

For some reason, I have not been able to get any new units for most of the day. I'm completely out of units to crunch as well. I tried the usual kick to get things going, but nothing :(


And, as is usually the case, 25 minutes after you posted, one of your rigs got work.
ID: 1363352
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1363408 - Posted: 2 May 2013, 3:34:45 UTC - in response to Message 1363352.  


Splitters still unable to build a ready-to-send buffer.
Grant
Darwin NT
ID: 1363408
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1363490 - Posted: 2 May 2013, 8:34:19 UTC
Last modified: 2 May 2013, 8:35:25 UTC

On 8 April 2013, as we came back up after the move to the Colocation Facility, Matt wrote this (emphasis mine):

Jeff and I predicted based on previous demand that we'd see, once things settled down, a bandwidth usage average of 150Mbits/second (as long as both multibeam and astropulse workunits were available). And in fact this is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and get on the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't be setting a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems that our result storage server can't handle much higher a load. Yes, that seems to be our big bottleneck at this point - the ability of that server to write results to disk much faster than current demand. We expected as much. We'll look into improving the disk i/o on that system soon.

I wonder if this is one of the throttles he's mentioned, to slow down the MB/pfb splitters so we don't overwhelm the result storage server and/or run out of data....
Donald
Infernal Optimist / Submariner, retired
ID: 1363490
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1363492 - Posted: 2 May 2013, 8:41:29 UTC - in response to Message 1363490.  

On 8 April 2013, as we came back up after the move to the Colocation Facility, Matt wrote this (emphasis mine):

Jeff and I predicted based on previous demand that we'd see, once things settled down, a bandwidth usage average of 150Mbits/second (as long as both multibeam and astropulse workunits were available). And in fact this is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and get on the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't be setting a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems that our result storage server can't handle much higher a load. Yes, that seems to be our big bottleneck at this point - the ability of that server to write results to disk much faster than current demand. We expected as much. We'll look into improving the disk i/o on that system soon.

I wonder if this is one of the throttles he's mentioned, to slow down the MB/pfb splitters so we don't overwhelm the result storage server and/or run out of data....


Possibly, although the problem continues even when there aren't tonnes of shorties going through the system. Prior to the move & the change of splitter type it wasn't an issue.
Grant
Darwin NT
ID: 1363492
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1363563 - Posted: 2 May 2013, 13:33:18 UTC

The blue line on the cricket graph is certainly interesting for the last 18 or 19 hours.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1363563
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1363570 - Posted: 2 May 2013, 13:52:03 UTC - in response to Message 1363563.  

The blue line on the cricket graph is certainly interesting for the last 18 or 19 hours.

More datasets being transferred from the lab to the colo in preparation for splitting, I suspect.

God bless the new bandwidth, eh?
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1363570
David S
Volunteer tester
Joined: 4 Oct 99
Posts: 18352
Credit: 27,761,924
RAC: 12
United States
Message 1363572 - Posted: 2 May 2013, 13:55:50 UTC - in response to Message 1363570.  

The blue line on the cricket graph is certainly interesting for the last 18 or 19 hours.

More datasets being transferred from the lab to the colo in preparation for splitting, I suspect.

I hope so, with only about 300 MBs ready to send.

God bless the new bandwidth, eh?

Yup.

David
Sitting on my butt while others boldly go,
Waiting for a message from a small furry creature from Alpha Centauri.

ID: 1363572
kittyman
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1363581 - Posted: 2 May 2013, 14:13:07 UTC - in response to Message 1363572.  

The blue line on the cricket graph is certainly interesting for the last 18 or 19 hours.

More datasets being transferred from the lab to the colo in preparation for splitting, I suspect.

I hope so, with only about 300 MBs ready to send.

That's a problem with the splitting rate, not the amount of data waiting to be split. There is plenty of data awaiting splitting right now.

We need another pfb splitter online.

Unless the powers that be are deliberately holding back to spare the bandwidth or database a bit.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1363581
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30608
Credit: 53,134,872
RAC: 32
United States
Message 1363620 - Posted: 2 May 2013, 17:09:09 UTC - in response to Message 1363581.  

The blue line on the cricket graph is certainly interesting for the last 18 or 19 hours.

More datasets being transferred from the lab to the colo in preparation for splitting, I suspect.

I hope so, with only about 300 MBs ready to send.

That's a problem with the splitting rate, not the amount of data waiting to be split. There is plenty of data awaiting splitting right now.

We need another pfb splitter online.

Unless the powers that be are deliberately holding back to spare the bandwidth or database a bit.

With the comments about not wanting to be too big a data hog, the limits on total units, the comment that we are going through the data faster than they are collecting it, I would think the conclusion is they are holding back on purpose.

Face it, there are more crunchers than there is Seti data. This is a good thing. Now how do we convince more of them to, say, join the 84+ cents a month club so that ntpckr development can continue?

ID: 1363620
ExchangeMan
Volunteer tester
Joined: 9 Jan 00
Posts: 115
Credit: 157,719,104
RAC: 0
United States
Message 1363687 - Posted: 2 May 2013, 19:39:48 UTC - in response to Message 1363581.  

The blue line on the cricket graph is certainly interesting for the last 18 or 19 hours.

More datasets being transferred from the lab to the colo in preparation for splitting, I suspect.

I hope so, with only about 300 MBs ready to send.

That's a problem with the splitting rate, not the amount of data waiting to be split. There is plenty of data awaiting splitting right now.

We need another pfb splitter online.

Unless the powers that be are deliberately holding back to spare the bandwidth or database a bit.

If they are holding back data, then they need to let us know this explicitly. It's a lot harder to get work units now compared to last week, but I'm making the assumption that this is temporary and we will return to a somewhat normal condition in a little while. If this is not going to be the case, I just want to know.

ID: 1363687
Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1363696 - Posted: 2 May 2013, 19:57:10 UTC

If they are holding back data, then they need to let us know this explicitly. It's a lot harder to get work units now compared to last week, but I'm making the assumption that this is temporary and we will return to a somewhat normal condition in a little while. If this is not going to be the case, I just want to know.

You have been with this project a long while so surely you realise by now that "project communication" is at best sparse. Someone, probably Matt, will eventually let us know what has been going on. I suggest just waiting a while, I am sure all will be revealed when the team have time.
ID: 1363696
Donald L. Johnson
Joined: 5 Aug 02
Posts: 8240
Credit: 14,654,533
RAC: 20
United States
Message 1363782 - Posted: 3 May 2013, 5:09:09 UTC - in response to Message 1363696.  

If they are holding back data, then they need to let us know this explicitly. It's a lot harder to get work units now compared to last week, but I'm making the assumption that this is temporary and we will return to a somewhat normal condition in a little while. If this is not going to be the case, I just want to know.

You have been with this project a long while so surely you realise by now that "project communication" is at best sparse. Someone, probably Matt, will eventually let us know what has been going on. I suggest just waiting a while, I am sure all will be revealed when the team have time.

This post from Matt seems pretty explicit to me ...
On 8 April 2013, as we came back up after the move to the Colocation Facility, Matt wrote this (emphasis mine):

Jeff and I predicted based on previous demand that we'd see, once things settled down, a bandwidth usage average of 150Mbits/second (as long as both multibeam and astropulse workunits were available). And in fact this is what we're seeing, though we are still tuning some throttle mechanisms to make sure we don't go much higher than that.

Why not go higher? At least three reasons for now. First, we don't really have the data or the ability to split workunits faster than that. Second, we eventually hope to move off Hurricane and get on the campus network (and wantonly grabbing all the bits we can for no clear scientific reason wouldn't be setting a good example that we are in control of our needs/traffic). Third, and perhaps most importantly, it seems that our result storage server can't handle much higher a load. Yes, that seems to be our big bottleneck at this point - the ability of that server to write results to disk much faster than current demand. We expected as much. We'll look into improving the disk i/o on that system soon.




Donald
Infernal Optimist / Submariner, retired
ID: 1363782
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13720
Credit: 208,696,464
RAC: 304
Australia
Message 1363791 - Posted: 3 May 2013, 5:37:11 UTC - in response to Message 1363782.  

This post from Matt seems pretty explicit to me ...

However, that post was on April 8; the problems we're having occurred about a week, week and a half after that.
If it is related to that post, it would be nice to have confirmation, as things were running OK with a ready-to-send buffer.

Grant
Darwin NT
ID: 1363791


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.