Mirror (Feb 26 2009) |
Message boards : Technical News : Mirror (Feb 26 2009)
| Author | Message |
|---|---|
|
Random day today for me. Catching up on various documentation/sysadmin/data pipeline tasks. Not very glamorous. | |
| ID: 869769 · | |
|
I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue? | |
| ID: 869834 · | |
I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue? Well, to clarify - bandwidth is definitely a priority. Solving it by compressing workunits? Not so much... It wasn't a priority in the past for reasons I stated earlier, and it isn't a priority in the near future since we'll need a lot more bandwidth than compressing workunits will provide. This is why we're exploring the bigger options mentioned in an earlier thread. - Matt ____________ -- BOINC/SETI@home network/web/science/development person -- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude | |
| ID: 869840 · | |
|
As usual - Thanks for the Posting Matt - It's really appreciated here . . . < wonderin' how many ever notice that re: the quote ;) > ____________ BOINC Wiki . . .Science Status Page . . . | |
| ID: 869863 · | |
this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage). Question: could the client binaries themselves be located off-site (or closer to your feed), perhaps a single server just for those files? ____________ | |
| ID: 869868 · | |
this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage). That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page that notes that this is not allowed, and the substitute page downloads correctly. BOINC thinks it has the file until it does a checksum at which point all tasks that relied on that file error out and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of a redirect, then it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch. ____________ BOINC WIKI | |
| ID: 869876 · | |
I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue? The A2010 ALFALFA Spring 2009 observations will continue until the end of April, so the midrange work they provide gives a convenient period to decide how to handle the increase in VHAR 'shorties' after that. I guess the delivery pipeline would make the critical time the second or third week of May. I agree compression wouldn't be enough for a permanent solution, but it might ease problems temporarily. Joe | |
| ID: 869889 · | |
|
Thanks Matt.. | |
| ID: 869891 · | |
this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage). Wearing my ISP hat for a moment, I'm not sure why an ISP would block the redirect. Corporate America is a different story. Do the applications have to come from the same download servers as the work? Seems like the easiest solution would be to tell BOINC to look for clients.setiathome.ssl.berkeley.EDU or somesuch and download from there. Then that could be one, or several, distributed servers around the planet just based on the number of "A" records in DNS. ... or it could be mapped to the same IP on smaller projects. ____________ | |
| ID: 869911 · | |
Second, the concern that binary data will freak out some incredibly protective proxies or ISPs (the traffic is all going over port 80). But binary data is already going over port 80. A lot of web pages already get compressed by the web server before they get transferred to the browser, which then proceeds to uncompress that stream to render it. 2/26/2009 10:59:55 PM||Libraries: libcurl/7.19.0 OpenSSL/0.9.8i zlib/1.2.3 That's why libcurl includes zlib, to unpack the data stream. The client appears to already be ready to at least receive if not already send compressed streams. caveat vendor, I'm not a programmer and looking at the documentation I don't see a way of handling it on the sending side, not that it would be necessary. And if the server is already doing THAT processing, zipping the file isn't going to help the bandwidth one little bit. ____________ | |
| ID: 869928 · | |
|
I agree implementing your own compression within BOINC would require a significant amount of work, and the rewards are not large enough given your time and financial constraints. | |
| ID: 869951 · | |
|
Oh, I didn't see your post, Bryan. Yes, if it's already being compressed via HTTP realtime, then there's no point to using GZIP/Deflate. But if it's not... then that should be a quick fix to enable. | |
| ID: 869954 · | |
|
| |
| ID: 869960 · | |
But binary data is already going over port 80. A lot of web pages already get compressed by the web server before they get transferred to the browser, which then proceeds to uncompress that stream to render it. I think we're forgetting GIFs and JPGs, to name a couple. ____________ | |
| ID: 869961 · | |
That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page that notes that this is not allowed, and the substitute page downloads correctly. BOINC thinks it has the file until it does a checksum at which point all tasks that relied on that file error out and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of a redirect, then it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch. The problem wasn't so much with the location of the files on another server, it was with the technique used to tell the BOINC client where to collect the files from. Matt made the (very reasonable) point that 'caching' (of any description) means that the relief server doesn't have to be manually loaded with the new files when anything changes. But subject to that limitation, and the need to issue Matt with a pair of long-range kicking boots in case anything goes wrong, then establishing an application download server with a different URL closer to the head-end of the 1Gb/s feed might buy a bit of time. Einstein does something like this - even supplying BOINC with a set of multiple download urls - and it seems to work well in general, with just occasional glitches if a mirror site goes off-air. Comparing notes with Einstein might throw up some more ideas. | |
| ID: 869981 · | |
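The failure mode described above (a substituted page downloads "successfully" and only the checksum reveals the problem) and the multiple-download-URL approach can be sketched together. This is a minimal illustration, not BOINC's actual client logic: the function name `fetch_verified`, the example URLs, and the choice of MD5 as the checksum are all assumptions made for the sketch, and the `fetch` callable is injected so the example runs without a network.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_verified(
    urls: Iterable[str],
    expected_md5: str,
    fetch: Callable[[str], bytes],
) -> Optional[bytes]:
    """Try each URL in order and return the first payload whose MD5 matches.

    `fetch` is any callable that returns the body for a URL; a real client
    could pass something like `lambda u: urllib.request.urlopen(u).read()`.
    """
    for url in urls:
        try:
            data = fetch(url)
        except (OSError, KeyError):
            # Mirror unreachable or unknown: move on to the next one.
            continue
        if hashlib.md5(data).hexdigest() == expected_md5:
            return data
        # Wrong content (for example an ISP "not allowed" page): keep trying.
    return None


# Hypothetical mirrors: the first returns a substituted block page,
# the second returns the real file.
app_bytes = b"pretend this is the application binary"
fake_web = {
    "http://mirror-a.example/app": b"<html>Redirects are not allowed</html>",
    "http://mirror-b.example/app": app_bytes,
}
result = fetch_verified(
    fake_web.keys(),
    hashlib.md5(app_bytes).hexdigest(),
    fake_web.__getitem__,
)
print("verified download" if result == app_bytes else "all mirrors failed")
```

The point of the design is that a bad payload costs one extra request to the next mirror, rather than a round of errored tasks and fresh downloads.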
However... Have you considered HTTP based compression? Apache could compress the work unit downloads realtime using gzip or deflate. You'll get about 27% smaller downloads this way. gz compression over http is great for servers that are generating significant html/xml/css/js traffic, and although advisable to use on this forum, this traffic is a tiny fraction of the WU bandwidth requirements. Its fraction of total bandwidth would be small. Compressibility is related to entropy, and in terms of compression a given block of data has varying degrees of 'sponginess' - you could take a large sponge, compress it and feed it through a gas pipe, and it expands again on reaching the exit. Same principle. However, we are analysing noise that is random and has no identifiable redundancy - to any compressor it is indistinguishable from concrete! The only viable solution that I recommend is to repackage the XML workunits with binary CDATA instead of base64 encoding. I don't agree that this will cause any problems with routing or content filtering - http has to handle all forms of binary transmission, or we would have no web! If the apps were properly XML compliant and did not make assumptions about content encoding then they should not need to be rewritten. Transmission should be transparent regardless of content encoding, and much bandwidth could be saved as with Astropulse WUs. Andy. | |
| ID: 869996 · | |
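The two effects discussed above (noise does not compress, while base64 inflates it) can be checked with Python's standard `zlib` and `base64` modules. This is a rough sketch under stated assumptions: seeded pseudo-random bytes stand in for workunit noise, the helper names `noise_bytes` and `size_report` are made up for illustration, and real workunits also carry an XML header and their own encoding details that are not modelled here.

```python
import base64
import random
import zlib


def noise_bytes(n: int, seed: int = 0) -> bytes:
    """Return n pseudo-random bytes as a stand-in for recorded noise."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


def size_report(raw: bytes) -> dict:
    """Compare raw and base64 sizes, before and after deflate compression."""
    b64 = base64.b64encode(raw)
    return {
        "raw": len(raw),
        "raw_deflated": len(zlib.compress(raw, 9)),
        "base64": len(b64),
        "base64_deflated": len(zlib.compress(b64, 9)),
    }


report = size_report(noise_bytes(30000))
for name, size in report.items():
    print(f"{name:16s} {size:6d} bytes")
```

Base64 spends four output characters on every three input bytes, so the encoded payload is about a third larger than the raw noise. Deflating the base64 text mostly wins back that encoding overhead (roughly a quarter of the encoded size) and little else, while deflating the raw noise gains nothing, which is consistent with the argument for shipping the data as binary in the first place.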
|
I just received a 1200+ hour Astropulse work unit (computer is a P4 2.65 GHz). Isn't that kinda long? | |
| ID: 870016 · | |
That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page that notes that this is not allowed, and the substitute page downloads correctly. BOINC thinks it has the file until it does a checksum at which point all tasks that relied on that file error out and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of a redirect, then it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch. I was kind of thinking more along the lines of don't try to download any work for an application that needs to be downloaded, until the application has successfully downloaded. The problem was WUs were being assigned, BOINC was saying "hey, I need this application to run these tasks", it would then try to get the application, then the app download failed, and all the WUs were dumped, and the process would start over again at the next connect interval. Another thing that would have greatly helped keep that situation from spiraling out of control is what I believe would be a better way to do the quota system. Instead of doubling the quota for a good result returned, it should just be +2. It was pointed out that if you are on an 8-CPU system and you turn in 800+ bad tasks, it only takes eleven (11) good results to bring the quota back to 800. 4-cpu takes 10, 2-cpu only takes 8. Then there's the CUDA quotas that were thrown into the mix, as well, with the multiply factor for that. I think +2 instead of 2x would keep problem computers at bay very nicely. It doesn't even have to be +2; it can be +5, just as long as it's not multiplication.
____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 870055 · | |
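The difference between doubling and adding can be made concrete with a small simulation. This is only a sketch of the arithmetic, not the scheduler's real code: the function name `results_to_recover`, the floor of 1, and the target of 100 per CPU are assumptions. The exact counts depend on the floor and on how the per-CPU multiplier is applied, so they need not match the figures quoted above.

```python
from typing import Callable


def results_to_recover(
    target: int,
    step: Callable[[int], int],
    start: int = 1,
) -> int:
    """Count good results needed to raise a quota from `start` to `target`.

    `step` maps the current quota to the quota after one good result.
    The quota is capped at `target`, as a daily maximum would be.
    """
    quota = start
    count = 0
    while quota < target:
        quota = min(target, step(quota))
        count += 1
    return count


def double(quota: int) -> int:
    """Recovery rule described in the post: quota doubles per good result."""
    return quota * 2


def plus_two(quota: int) -> int:
    """Proposed alternative: quota grows by a fixed 2 per good result."""
    return quota + 2


for cpus in (2, 4, 8):
    target = 100 * cpus  # assumed: 100 tasks per CPU per day
    print(
        f"{cpus} CPUs: doubling needs {results_to_recover(target, double)} "
        f"good results, +2 needs {results_to_recover(target, plus_two)}"
    )
```

Under these assumptions doubling restores a fully penalised host within about ten good results regardless of size, while an additive rule makes recovery take hundreds, which is the behaviour the suggestion is after.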
|
This may be a dumb question, but... | |
| ID: 870108 · | |
This may be a dumb question, but... A problem computer is one that returns an excessive number of errors. Each error reduces your daily CPU quota by one. Typically you want to be at 100 all the time, but something as simple as missing a deadline will count as an error and reduce the quota. You can look at your hosts page and check and see what the daily quotas are for all of your systems. If you find one that is lower than 90, there's a problem that needs to be addressed. Now that I've said that, there are many different problems, and of course, multiple solutions per problem, so I'm not going to make this post any longer than this. ____________ Linux laptop uptime: 1484d 22h 42m Ended due to UPS failure, found 14 hours after the fact | |
| ID: 870111 · | |
| Copyright © 2013 University of California |