There Goes a Tenner (Jan 20 2011) |
Message boards : Technical News : There Goes a Tenner (Jan 20 2011)
| Author | Message |
|---|---|
|
As expected, it took about 1.5 days to copy all the results from our failed upload server (bruno) to the new upload server (synergy). I was out yesterday, hence the lack of an update from me, but nothing could get done until the result copy finished anyway. | |
| ID: 1068814 · | |
|
Thanks for the update Matt, and for you and Jeff getting the replacement Bruno working in quick time, | |
| ID: 1068817 · | |
Random drives are disappearing, random partitions are disappearing. Crikey, that's a bit of a conundrum. I'm glad the data was transferred OK, and I hope you have one of those eureka moments :) | |
| ID: 1068865 · | |
|
Matt, thanks for the news and the whole crew for their work! | |
| ID: 1068867 · | |
|
Ideally, to run multiple upload and post-processing servers, you'd need a very easy way to partition the results. Essentially, you'd have to create multiple pipelines. For example, odd-numbered results (_1, _3, etc.) would go to upload1, validator1. Even-numbered results (_0, _2, etc.) would go to upload2, validator2, and so on. | |
| ID: 1068874 · | |
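The odd/even split described in the post above could be sketched roughly as follows. This is only an illustration, not anything the project runs: the function name `pipeline_for` and the example result name are hypothetical, and it assumes result names end in an underscore followed by a number.

```bash
#!/bin/bash
# pipeline_for - print the pipeline for a result, chosen by its numeric suffix.
# Hypothetical helper: assumes result names end in _<number>.
pipeline_for() {
    local result="$1"

    # Capture the digits after the final underscore, or fail if there are none.
    if [[ ! "$result" =~ _([0-9]+)$ ]]; then
        echo "no numeric suffix: $result" >&2
        return 1
    fi
    local suffix="${BASH_REMATCH[1]}"

    # 10# forces base 10 so a suffix such as 08 is not parsed as octal.
    if (( 10#$suffix % 2 == 1 )); then
        echo "upload1 validator1"   # odd-numbered results
    else
        echo "upload2 validator2"   # even-numbered results
    fi
}
```

Calling `pipeline_for wu_example_1` would print the first pipeline and `pipeline_for wu_example_0` the second. Both copies of a workunit need to reach the same validator eventually, so a real split would probably key on something other than the copy number; the sketch only mirrors the example given.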
Matt, thanks for the news and the whole crew for their work! Previously there was a 10-second backoff too; that is now 5 minutes. I'm sure that leads to a lot of the "smoothing" we are seeing. | |
| ID: 1068928 · | |
|
Performance-wise, I'd expect Synergy to be about 10-20% of the throughput of Bruno. This means that the catch-up from an outage will be slower, with a higher retry count. So we have to sit here and be patient for a bit longer. So what? The data we are processing is already a few months old, and not "time critical". | |
| ID: 1068945 · | |
Performance-wise, I'd expect Synergy to be about 10-20% of the throughput of Bruno. This means that the catch-up from an outage will be slower, with a higher retry count. So we have to sit here and be patient for a bit longer. So what? The data we are processing is already a few months old, and not "time critical". Uploads and downloads seem to have settled down nicely, but there's quite a backlog growing for validations, which are also running on Synergy (aka 'the new Bruno'). They'll be held back by the lack of disk I/O, too: every validation attempt will require finding and retrieving at least two, and possibly several, previously uploaded result files. | |
| ID: 1068951 · | |
|
Sooooooooo. | |
| ID: 1068956 · | |
Fix Bruno, or add to Synergy? I'd go with fix Bruno. Synergy is already spoken for. ____________ Grant Darwin NT. | |
| ID: 1068958 · | |
Fix Bruno, or add to Synergy? For what? Last I heard, its duties were not spoken for yet. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1068960 · | |
|
In the server list it shows it as handling Nitpicker duties? | |
| ID: 1068968 · | |
In the server list it shows it as handling Nitpicker duties? I think Matt said that was kinda a test routine. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1068969 · | |
|
Either way, synergy is way more powerful than the demands of the upload server require. Look at the old specs of bruno compared to synergy. The RAM is different by a factor of twelve! To replace bruno completely probably wouldn't cost nearly as much as carolyn and oscar, nor even synergy. | |
| ID: 1068971 · | |
|
Here's a little script I wrote that would gradually open up the floodgates and stop the file server from thrashing and dropping half-uploaded files:

```bash
#!/bin/bash
# choke - gradually open seti@home gates
# usage: choke [0-3|setup]
#
IPT=/sbin/iptables

if [ "$1" = "setup" ]; then
    $IPT -N CHOKE
    $IPT -F CHOKE
    $IPT -I INPUT 1 -p tcp -m state --state ESTABLISHED,RELATED -j ACCEPT
    $IPT -I INPUT 2 -p tcp -m state --state NEW --dport 80 -j CHOKE
fi

# start from an empty chain before adding the rule for this step
$IPT -F CHOKE

# accept 0/4 traffic
if [ "$1" = 0 ]; then $IPT -A CHOKE -j REJECT; fi
# accept 1/4 traffic - reject *.1,2,3 but pass .0,4,8,12,16 etc.
if [ "$1" = 1 ]; then $IPT -A CHOKE ! -s 0.0.0.0/0.0.0.3 -j REJECT; fi
# accept 2/4 traffic - reject *.2,3 pass 0,1,4,5,8,9 etc.
if [ "$1" = 2 ]; then $IPT -A CHOKE ! -s 0.0.0.0/0.0.0.2 -j REJECT; fi
# accept 3/4 traffic - reject *.3 pass 0,1,2,4,5,6,8,9,10 etc.
if [ "$1" = 3 ]; then $IPT -A CHOKE -s 0.0.0.3/0.0.0.3 -j REJECT; fi
```

This would open up the address space in 1/4 steps using inverse subnets, though it could suffer from favouritism. Another way would be to cycle through the quarters for an hour each until traffic reduced enough to open up completely. | |
| ID: 1068980 · | |
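The hourly rotation suggested at the end of the post above might look something like this. It is a rough sketch under assumptions: the function names `rotate_rule` and `passes_quarter` are made up for illustration, the CHOKE chain is assumed to exist already, and it relies on the same non-contiguous mask matching (`address/0.0.0.3`) that the original script uses. The functions only print or test values; nothing is applied to the firewall.

```bash
#!/bin/bash
# rotate_rule - print iptables arguments that admit one quarter of source
# addresses, selected by the hour of day (hypothetical helper, prints only).
rotate_rule() {
    local hour="$1"
    # 10# forces base 10 so hours such as 08 and 09 are not parsed as octal.
    local quarter=$(( 10#$hour % 4 ))
    # Reject every source whose low two address bits differ from the quarter.
    echo "-A CHOKE ! -s 0.0.0.${quarter}/0.0.0.3 -j REJECT"
}

# passes_quarter - succeed if a last octet falls inside the open quarter.
passes_quarter() {
    local octet="$1" quarter="$2"
    (( (octet & 3) == quarter ))
}
```

Run from an hourly cron job as root, it could be applied with something like `iptables -F CHOKE` followed by `iptables $(rotate_rule "$(date +%H)")`. Over four hours every address gets one open slot, which avoids the favouritism of always passing the same quarter first.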
Fix Bruno, or add to Synergy? I disagree. Bruno is an old server that is having hardware problems again (dropped drives in the array). There would be far fewer headaches in just adding spindles to Synergy (a new server). That's assuming there's money for either operation. | |
| ID: 1069010 · | |
|
Synergy does not have the ability to install additional drives in its chassis. | |
| ID: 1069017 · | |
Synergy does not have the ability to install additional drives in its chassis. I suppose we should wait and see if Bruno is still viable. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1069019 · | |
|
Oh, I fully agree! If it was the RAID card, I was prepared to just order one and have it drop-shipped to Berkeley, but it sounds like the problem is more than just that. | |
| ID: 1069042 · | |
Oh, I fully agree! If it was the RAID card, I was prepared to just order one and have it drop-shipped to Berkeley, but it sounds like the problem is more than just that. You worked at Cray??? I was impressed with your knowledge, and your generosity. Now I am REALLY impressed. That explains a lot. ____________ ****** "Ask not, what your kitty can do for you. Ask what you can do for your kitty." As it is kitten, so shall it be done. | |
| ID: 1069048 · | |