Panic Mode On (41) Server problems

Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1054010 - Posted: 8 Dec 2010, 18:46:10 UTC - in response to Message 1045537.  


Time for the first panic of the new system.
Network traffic has dropped way down (wasn't even pegged the way it was previously). Result creation rate shows as 0, splitters shown as all down.
Grant
Darwin NT
ID: 1054010 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1054012 - Posted: 8 Dec 2010, 18:49:00 UTC - in response to Message 1054010.  
Last modified: 8 Dec 2010, 18:49:28 UTC


Time for the first panic of the new system.
Network traffic has dropped way down (wasn't even pegged the way it was previously). Result creation rate shows as 0, splitters shown as all down.

I would imagine da boyz in da lab are doing some tuning of the setup.

I am sure there will be a lot of configuration issues in the coming weeks that will have to be adjusted as they learn the new hardware and get it tweaked to perfection.

See Matt's last tech post.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1054012 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1054023 - Posted: 8 Dec 2010, 18:57:24 UTC

And they have Oscar offline now as well, so they are playing with things a bit.
Hopefully not a big showstopper.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1054023 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1054031 - Posted: 8 Dec 2010, 19:11:16 UTC

No panic.

Matt just posted this in tech news......

"In case anybody is wondering - we're trying to increase various settings like I mentioned at the top of the thread, and this is leading to the predictably unexpected snags. No worries - we've proven we can fall back to this morning's settings without much ado, but we're leaving splitters/assimilators off for now in case we can figure this out quickly.

- Matt"
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1054031 · Report as offensive
Profile Wiggo
Joined: 24 Jan 00
Posts: 34746
Credit: 261,360,520
RAC: 489
Australia
Message 1054060 - Posted: 8 Dec 2010, 20:31:06 UTC - in response to Message 1054031.  

All but 2 AP splitters back to green now.

Cheers.
ID: 1054060 · Report as offensive
Dave

Joined: 29 Mar 02
Posts: 778
Credit: 25,001,396
RAC: 0
United Kingdom
Message 1054105 - Posted: 8 Dec 2010, 22:55:12 UTC

Surely someone should rename this thread to "Panic Mode Off (41) Server problems"... :D.
ID: 1054105 · Report as offensive
Profile Allie in Vancouver
Volunteer tester
Joined: 16 Mar 07
Posts: 3949
Credit: 1,604,668
RAC: 0
Canada
Message 1054132 - Posted: 9 Dec 2010, 0:16:17 UTC

Everything save one AP splitter running now.

438 MB ready to send. Such riches to behold!

@ Dave: there will always be the occasional interruption to the work flow and always some folks will panic about it. Nature of the beast. ;o)
Pure mathematics is, in its way, the poetry of logical ideas.

Albert Einstein
ID: 1054132 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1054308 - Posted: 9 Dec 2010, 17:24:26 UTC

Splitters down.....

More playtime for Matt?
He was having fun ramping up the RAM on Oscar yesterday....LOL.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1054308 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1054338 - Posted: 9 Dec 2010, 18:28:38 UTC - in response to Message 1054308.  


Up again, although I notice that the splitting rate is quite low: no Ready to Send buffer is developing.
Grant
Darwin NT
ID: 1054338 · Report as offensive
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1054342 - Posted: 9 Dec 2010, 18:35:32 UTC

It's because from my understanding as of right now they are splitting and sending on demand.
Traveling through space at ~67,000mph!
ID: 1054342 · Report as offensive
kittyman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 1054343 - Posted: 9 Dec 2010, 18:39:08 UTC - in response to Message 1054342.  

It's because from my understanding as of right now they are splitting and sending on demand.

Well, not quite.
It's just that work is being sent out as fast as it can be split.
Once the caches start to fill, or current limits on cache are reached, ready to send will start to build.
But given the magnitude of the big outage, that may take some time to happen.

My rigs are currently having no problem getting enough to build a little cache and keep them all in production.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 1054343 · Report as offensive
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 1054376 - Posted: 9 Dec 2010, 20:35:31 UTC - in response to Message 1054366.  

Looks like Oscar still needs some tweaking.
ID: 1054376 · Report as offensive
-BeNt-
Joined: 17 Oct 99
Posts: 1234
Credit: 10,116,112
RAC: 0
United States
Message 1054417 - Posted: 9 Dec 2010, 22:15:29 UTC - in response to Message 1054343.  

It's because from my understanding as of right now they are splitting and sending on demand.

Well, not quite.
It's just that work is being sent out as fast as it can be split.
Once the caches start to fill, or current limits on cache are reached, ready to send will start to build.
But given the magnitude of the big outage, that may take some time to happen.

My rigs are currently having no problem getting enough to build a little cache and keep them all in production.


Yeah mine are sending in results to Seti almost instantly when they are finished and downloading a new WU along with it. My cache has been at 300+ on both my machines all day today.

Right now it's saying there are 439/3 results in the ready to send box along with a ~13 second creation rate. And as soon as I typed that I checked (read: refreshed) the server stats page and it's empty again. Ah well, everything seems good on this end to keep my machines partly happy for a while. Now if I could get some CPU WUs everything would be great! I'm sure there are some people out there that still can't get work units, but over time it will get better; we are only, what, one day in from the long downtime. Half full, guys, half full.


Traveling through space at ~67,000mph!
ID: 1054417 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1054517 - Posted: 10 Dec 2010, 4:53:19 UTC - in response to Message 1054417.  


Still not splitting very fast, and yet the network pipe is less than 2/3 full. Before the outage the system was able to saturate the network connection and still build a Ready to Send buffer. I think still more tweaking is required.
And the assimilator queue continues to grow.
Grant
Darwin NT
ID: 1054517 · Report as offensive
Profile soft^spirit
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1054522 - Posted: 10 Dec 2010, 5:02:57 UTC - in response to Message 1054517.  

I have to disagree. Before the outage the bandwidth was saturated with errors and failures, and repeated handshakes to try again. It is not staying saturated because work is getting THROUGH!!!
Janice
ID: 1054522 · Report as offensive
Profile Pappa
Volunteer tester
Joined: 9 Jan 00
Posts: 2562
Credit: 12,301,681
RAC: 0
United States
Message 1054529 - Posted: 10 Dec 2010, 5:17:49 UTC - in response to Message 1054522.  

I have to disagree. Before the outage the bandwidth was saturated with errors and failures, and repeated handshakes to try again. It is not staying saturated because work is getting THROUGH!!!


I have to agree! Before the outage there were over 4,000,000 results in the field (which reflects cache sizes, etc.). Work is being sent as fast as it can be split. And everyone knew that it would take a week+ to get things back to where they were before the outage.

So we are now looking at what the New Balance will be. Yes, that will take time to establish.

Regards



Please consider a Donation to the Seti Project.

ID: 1054529 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1054546 - Posted: 10 Dec 2010, 7:28:10 UTC - in response to Message 1054529.  


And I disagree.
Previously network traffic was maxed out by downloads. The splitters were still able to produce enough work for the Ready to Send buffer to build up.
At present the network traffic is only 60Mb/s, yet the Ready to Send buffer holds only 151 Work Units; it used to have a limit of 200,000. So far the highest it's been is 1,000.
Grant
Darwin NT
ID: 1054546 · Report as offensive
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1054550 - Posted: 10 Dec 2010, 7:45:45 UTC - in response to Message 1054546.  


And I disagree.
Previously network traffic was maxed out by downloads. The splitters were still able to produce enough work for the Ready to Send buffer to build up.
At present the network traffic is only 60Mb/s, yet the Ready to Send buffer holds only 151 Work Units; it used to have a limit of 200,000. So far the highest it's been is 1,000.



Coming back from a weekly outage, there were typically 300k+ WUs ready to be sent, which immediately clogged the pipes; server back-off was only 10 seconds instead of 5 minutes, and people weren't COMPLETELY out of work, as I'm sure 99% of the userbase was two days ago. Now EVERYONE needs work, so whatever WUs are available are immediately snatched up. If you look at the server status, it has been averaging 300-500 DB requests per second since the project went live again. At a creation rate of 25 WU/sec, if even just 1/10 (30-50) of those requests are for work, those 25 WUs are going to be gone within the same second they are created. Throw in the fact that the team is still working on optimizing the servers, and I'd say we are in pretty good shape. The project has been running on these machines now for the last week, and has done so without ANY unexpected downtime. Sure, there have been a few glitches here and there as they have been trying various settings, but they have been quickly fixed and things brought back up.
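
To put rough numbers on that, here is a quick back-of-the-envelope sketch in Python. It is only an illustration, not anything from the project's servers: it assumes each work request asks for a single WU and that the figures quoted above (25 WU/s created, 300-500 requests/s, one in ten of them asking for work) hold steady. The function name and its parameters are made up for this example.

```python
def net_buffer_rate(creation_rate, requests_per_sec, one_in_n_want_work,
                    wus_per_request=1):
    """Net change of the Ready to Send buffer in WU per second.

    A negative value means demand exceeds what the splitters produce,
    so the buffer cannot build up.
    Assumption: every work request takes `wus_per_request` WUs (1 here).
    """
    demand = requests_per_sec / one_in_n_want_work * wus_per_request
    return creation_rate - demand


# Figures quoted in the post: 25 WU/s created, 300-500 requests/s,
# and a guess that 1 in 10 of those requests is asking for work.
for rps in (300, 500):
    net = net_buffer_rate(25, rps, 10)
    print(f"{rps} requests/s -> net {net:+.0f} WU/s")
```

Under those assumptions the net rate comes out negative at both ends of the range, which would fit with the Ready to Send figure staying near empty. In practice a single request can ask for more than one WU, so real demand is probably higher still.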
ID: 1054550 · Report as offensive
Highlander
Joined: 5 Oct 99
Posts: 167
Credit: 37,987,668
RAC: 16
Germany
Message 1054552 - Posted: 10 Dec 2010, 7:47:08 UTC

I'm only wondering about the result creation rate: so far I have never seen it go over 25/s, whereas before the outage it was able to reach 40/s. I hope this can be changed by fine-tuning the new servers.
- Performance is not a simple linear function of the number of CPUs you throw at the problem. -
ID: 1054552 · Report as offensive
Blake Bonkofsky
Volunteer tester
Joined: 29 Dec 99
Posts: 617
Credit: 46,383,149
RAC: 0
United States
Message 1054553 - Posted: 10 Dec 2010, 7:48:43 UTC - in response to Message 1054552.  

I'm only wondering about the result creation rate: so far I have never seen it go over 25/s, whereas before the outage it was able to reach 40/s. I hope this can be changed by fine-tuning the new servers.


I'm sure they'll be able to get that tuned up. This hardware is far superior to the previous setup, but being brand new, it will take some fine-tuning to get it to really run like it should.
ID: 1054553 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.