Mirror (Feb 26 2009)



Message boards : Technical News : Mirror (Feb 26 2009)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar
Send message
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 869769 - Posted: 26 Feb 2009, 19:46:29 UTC

Random day today for me. Catching up on various documentation/sysadmin/data pipeline tasks. Not very glamorous.

The question was raised: Why don't we compress workunits to save bandwidth? I forget the exact arguments, but I think it's a combination of small things that, when added together, make this a very low priority. First, the programming overhead to the splitters, clients, etc. - however minor it may be, it's still labor and (even worse) testing. Second, the concern that binary data will freak out some incredibly protective proxies or ISPs (the traffic is all going over port 80). Third, the amount of bandwidth we'd gain by compressing workunits is relatively minor considering the possible effort of making it so. Fourth, this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage). We may actually be implementing better download logic so that the Coral Cache isn't reached via a redirect, which may solve this latter issue. Anyway.. this idea comes up from time to time within our group and we usually determine we have bigger fish to fry. Or lower-hanging fruit.

Oh - I guess that's the end of this month's thread title theme: names of lakes in or around the Sierras that I've been to.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

PhonAcq
Send message
Joined: 14 Apr 01
Posts: 1622
Credit: 22,063,680
RAC: 4,104
United States
Message 869834 - Posted: 26 Feb 2009, 23:12:27 UTC

I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue?

More generally, if a virtual stress test could be performed with 10x more hosts than are currently involved (the news release suggests a potential factor of 500 - yikes!), what parts of S@H wouldn't scale and would need to be re-engineered?

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar
Send message
Joined: 1 Mar 99
Posts: 1389
Credit: 74,079
RAC: 0
United States
Message 869840 - Posted: 26 Feb 2009, 23:23:58 UTC - in response to Message 869834.

I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue?


Well, to clarify - bandwidth is definitely a priority. Solving it by compressing workunits? Not so much... It wasn't a priority in the past for the reasons I stated earlier, and it isn't a priority in the near future since we'll need far more bandwidth than compressing workunits would free up. This is why we're exploring the bigger options mentioned in an earlier thread.

- Matt

____________
-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude

Profile Dr. C.E.T.I.
Avatar
Send message
Joined: 29 Feb 00
Posts: 15993
Credit: 690,597
RAC: 1
United States
Message 869863 - Posted: 27 Feb 2009, 0:38:09 UTC




<snip>

Oh - I guess that's the end of this month's thread title theme: names of lakes in or around the Sierras that I've been to.

- Matt

As usual - Thanks for the Posting Matt - It's really appreciated here . . .

< wonderin' how many ever notice that re: the quote ;) >


____________
BOINC Wiki . . .

Science Status Page . . .

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 869868 - Posted: 27 Feb 2009, 1:16:32 UTC - in response to Message 869769.

this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage).

Question: could the client binaries themselves be located off-site (or closer to your feed), perhaps a single server just for those files?

____________

John McLeod VII
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 15 Jul 99
Posts: 24239
Credit: 519,558
RAC: 57
United States
Message 869876 - Posted: 27 Feb 2009, 1:36:19 UTC - in response to Message 869868.

this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage).

Question: could the client binaries themselves be located off-site (or closer to your feed), perhaps a single server just for those files?

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page noting that the request is not allowed, and that substitute page downloads successfully, so BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of following a redirect, it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.
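
A minimal sketch of that failure mode, assuming an MD5 checksum is known for the file; the URL and checksum below are placeholders, not real BOINC values:

import hashlib
import urllib.request

def fetch_and_verify(url, expected_md5):
    """Download a file and verify its checksum. A proxy's substitute HTML
    page downloads 'successfully' but fails this check, so the client keeps
    discarding the bad copy and retrying - the churn described above."""
    data = urllib.request.urlopen(url, timeout=30).read()
    actual = hashlib.md5(data).hexdigest()
    if actual != expected_md5:
        raise ValueError(f"checksum mismatch: expected {expected_md5}, got {actual}")
    return data

# Placeholder values, for illustration only:
# fetch_and_verify("http://example.org/app_binary", "0123456789abcdef0123456789abcdef")
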
____________


BOINC WIKI

Josef W. Segur
Project donor
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4220
Credit: 1,040,185
RAC: 419
United States
Message 869889 - Posted: 27 Feb 2009, 2:13:30 UTC - in response to Message 869840.

I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue?


Well, to clarify - bandwidth is definitely a priority. Solving it by compressing workunits? Not so much... It wasn't a priority in the past for the reasons I stated earlier, and it isn't a priority in the near future since we'll need far more bandwidth than compressing workunits would free up. This is why we're exploring the bigger options mentioned in an earlier thread.

- Matt

The A2010 ALFALFA Spring 2009 observations will continue until the end of April, so the midrange work they provide gives a convenient period to decide how to handle the increase in VHAR 'shorties' after that. I guess the delivery pipeline would make the critical time the second or third week of May.

I agree compression wouldn't be enough for a permanent solution, but it might ease problems temporarily.
Joe

Profile Mike O
Avatar
Send message
Joined: 1 Sep 07
Posts: 428
Credit: 6,670,998
RAC: 0
United States
Message 869891 - Posted: 27 Feb 2009, 2:16:26 UTC

Thanks Matt..
I didn't know why compression wasn't being used; that's why I started with... Here's a crazy idea.
It seems pretty clear now actually. You would need to have a beta server and testers.



____________
Not Ready Reading BRAIN. Abort/Retry/Fail?

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 869911 - Posted: 27 Feb 2009, 3:28:10 UTC - in response to Message 869876.

this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage).

Question: could the client binaries themselves be located off-site (or closer to your feed), perhaps a single server just for those files?

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page noting that the request is not allowed, and that substitute page downloads successfully, so BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of following a redirect, it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.

Wearing my ISP hat for a moment, I'm not sure why an ISP would block the redirect. Corporate America is a different story.

Do the applications have to come from the same download servers as the work? Seems like the easiest solution would be to tell BOINC to look for clients.setiathome.ssl.berkeley.EDU or somesuch and download from there.

Then that could be one, or several, distributed servers around the planet just based on the number of "A" records in DNS.

... or it could be mapped to the same IP on smaller projects.
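
A quick way to see how many "A" records a hostname resolves to, using the hypothetical hostname from above (substitute any real download host):

import socket

# Hypothetical hostname from the post above; replace with a real download host.
host = "clients.setiathome.ssl.berkeley.edu"

try:
    infos = socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
    addrs = sorted({info[4][0] for info in infos})
    print(f"{host} resolves to {len(addrs)} address(es): {addrs}")
except socket.gaierror as err:
    print(f"lookup failed: {err}")
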
____________

Profile Bryan Price
Send message
Joined: 3 Apr 99
Posts: 3
Credit: 509,972
RAC: 0
United States
Message 869928 - Posted: 27 Feb 2009, 4:35:31 UTC - in response to Message 869769.

Second, the concern that binary data will freak out some incredibly protective proxies or ISPs (the traffic is all going over port 80).


But binary data is already going over port 80. A lot of web pages already get compressed by the web server before they are transferred to the browser, which then proceeds to uncompress that stream to render it.

2/26/2009 10:59:55 PM||Libraries: libcurl/7.19.0 OpenSSL/0.9.8i zlib/1.2.3


That's why libcurl includes zlib: to unpack the data stream. The client already appears ready to at least receive, if not send, compressed streams. Caveat: I'm not a programmer, and looking at the documentation I don't see a way of handling it on the sending side, not that it would be necessary.

And if the server is already doing THAT processing, zipping the file isn't going to help the bandwidth one little bit.
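
For what it's worth, the receiving side really is just a header plus an inflate call; a rough sketch of roughly what libcurl's zlib support does for the client (the URL is only an example):

import urllib.request
import zlib

# Example URL only; any server that honours Accept-Encoding will do.
req = urllib.request.Request("http://example.org/",
                             headers={"Accept-Encoding": "gzip"})
with urllib.request.urlopen(req, timeout=30) as resp:
    body = resp.read()
    if resp.headers.get("Content-Encoding") == "gzip":
        body = zlib.decompress(body, 47)   # 47 = 32 + 15, auto-detects the gzip wrapper
print(f"{len(body)} bytes after decoding")
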
____________

DJStarfox
Send message
Joined: 23 May 01
Posts: 1040
Credit: 541,672
RAC: 182
United States
Message 869951 - Posted: 27 Feb 2009, 5:39:27 UTC - in response to Message 869769.

I agree implementing your own compression within BOINC would require a significant amount of work, and the rewards are not large enough given your time and financial constraints.

However... Have you considered HTTP-based compression? Apache could compress the work unit downloads in real time using gzip or deflate. You'll get about 27% smaller downloads this way.

Here's a website that shows how to edit the Apache config file:
http://www.serverwatch.com/tutorials/article.php/10825_3514866_2

I'd have to check the BOINC code, but this should be very straightforward. The one disadvantage of this feature is that CPU load on the web servers will increase; memory requirements for compression are minimal.

DJStarfox
Send message
Joined: 23 May 01
Posts: 1040
Credit: 541,672
RAC: 182
United States
Message 869954 - Posted: 27 Feb 2009, 5:48:47 UTC - in response to Message 869928.

Oh, I didn't see your post, Bryan. Yes, if it's already being compressed via HTTP in real time, then there's no point in using GZIP/Deflate. But if it's not... then that should be a quick fix to enable.

Edit: tcpdump would show whether the data is compressed. Can someone run tcpdump while fetching work from SETI to verify?
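
An alternative to tcpdump is simply checking the response headers for a download URL; the URL below is a placeholder, so paste a real workunit link from a client log:

import urllib.request

# Placeholder URL; replace with an actual workunit download link.
url = "http://example.org/some_workunit"

req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip, deflate"})
with urllib.request.urlopen(req, timeout=30) as resp:
    enc = resp.headers.get("Content-Encoding", "none")
    print(f"Content-Encoding: {enc}")   # 'gzip' or 'deflate' means the server is compressing
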

Profile [seti.international] Dirk Sadowski
Project donor
Volunteer tester
Avatar
Send message
Joined: 6 Apr 07
Posts: 7047
Credit: 59,770,800
RAC: 21,803
Germany
Message 869960 - Posted: 27 Feb 2009, 6:25:29 UTC


BTW.
For days now, nearly every work request has resulted in the PC getting only one new WU.
So BOINC asks again and again, because the cache can't fill up. More server load..

Is there a reason why this happens?

____________
BR



>Das Deutsche Cafe. The German Cafe.<

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 869961 - Posted: 27 Feb 2009, 6:26:52 UTC - in response to Message 869928.

But binary data is already going over port 80. A lot of web pages already get compressed by the web server before they are transferred to the browser, which then proceeds to uncompress that stream to render it.

I think we're forgetting GIFs and JPGs, to name a couple.

____________

Richard Haselgrove
Project donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8437
Credit: 47,968,863
RAC: 60,168
United Kingdom
Message 869981 - Posted: 27 Feb 2009, 9:46:38 UTC - in response to Message 869876.

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page noting that the request is not allowed, and that substitute page downloads successfully, so BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of following a redirect, it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.

The problem wasn't so much with the location of the files on another server, it was with the technique used to tell the BOINC client where to collect the files from. Matt made the (very reasonable) point that 'caching' (of any description) means that the relief server doesn't have to be manually loaded with the new files when anything changes.

But subject to that limitation, and the need to issue Matt with a pair of long-range kicking boots in case anything goes wrong, establishing an application download server with a different URL closer to the head-end of the 1Gb/s feed might buy a bit of time. Einstein does something like this - even supplying BOINC with a set of multiple download URLs - and it seems to work well in general, with just occasional glitches if a mirror site goes off-air. Comparing notes with Einstein might throw up some more ideas.
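
A minimal sketch of that multi-URL approach, assuming the client is simply handed an ordered list of mirrors for each file (the hostnames are made up):

import urllib.request

def download_from_mirrors(urls, dest):
    """Try each mirror in turn and keep the first copy that downloads."""
    last_err = None
    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data = resp.read()
            with open(dest, "wb") as out:
                out.write(data)
            return url                     # report which mirror worked
        except OSError as err:
            last_err = err                 # mirror off-air: try the next one
    raise RuntimeError(f"all mirrors failed, last error: {last_err}")

# Made-up mirror names, for illustration only:
# download_from_mirrors(["http://apps1.example.org/app_binary",
#                        "http://apps2.example.org/app_binary"], "app_binary")
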

Profile Andy Lee Robinson
Avatar
Send message
Joined: 8 Dec 05
Posts: 615
Credit: 40,434,237
RAC: 42,824
Hungary
Message 869996 - Posted: 27 Feb 2009, 12:10:44 UTC - in response to Message 869951.

However... Have you considered HTTP-based compression? Apache could compress the work unit downloads in real time using gzip or deflate. You'll get about 27% smaller downloads this way.


gzip compression over HTTP is great for servers generating significant HTML/XML/CSS/JS traffic, and although it would be advisable to use on this forum, that traffic is a tiny fraction of the WU bandwidth requirements.

Compressibility is related to entropy, and in terms of compression a given block of data has varying degrees of 'sponginess' - you could take a large sponge, compress it, and feed it through a gas pipe, and it expands again on reaching the exit. Same principle.

However, we are analysing noise that is random and has no identifiable redundancy - to any compressor it is indistinguishable from concrete!

The only viable solution that I recommend is to repackage the XML workunits with binary CDATA instead of base64 encoding.

I don't agree that this will cause any problems with routing or content filtering - http has to handle all forms of binary transmission, or we would have no web!

If the apps were properly XML-compliant and didn't make assumptions about content encoding, then they should not need to be rewritten. Transmission should be transparent regardless of content encoding, and much bandwidth could be saved, as with Astropulse WUs.

Andy.
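
Both points are easy to check: random noise barely compresses, while the base64 wrapper around it does, and sending the same bytes as raw binary avoids the 4/3 expansion entirely. A toy illustration, not the actual workunit format:

import base64, os, zlib

raw = os.urandom(256 * 1024)          # stand-in for 256 KB of noisy telescope data
b64 = base64.b64encode(raw)

print(f"raw binary     : {len(raw):7d} bytes")
print(f"base64 text    : {len(b64):7d} bytes  (~4/3 of raw)")
print(f"gzip of raw    : {len(zlib.compress(raw, 9)):7d} bytes  (noise: no real gain)")
print(f"gzip of base64 : {len(zlib.compress(b64, 9)):7d} bytes  (only claws back the encoding overhead)")
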

Robert
Send message
Joined: 2 May 00
Posts: 5
Credit: 2,266,286
RAC: 0
United States
Message 870016 - Posted: 27 Feb 2009, 13:26:56 UTC

I just received a 1200+ hour Astropulse work unit (the computer is a P4 at 2.65 GHz). Isn't that kinda long??
____________

Cosmic_Ocean
Avatar
Send message
Joined: 23 Dec 00
Posts: 2244
Credit: 8,551,230
RAC: 4,310
United States
Message 870055 - Posted: 27 Feb 2009, 15:26:35 UTC

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page noting that the request is not allowed, and that substitute page downloads successfully, so BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of following a redirect, it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.

I was kind of thinking more along the lines of: don't try to download any work for an application until that application has successfully downloaded. The problem was that WUs were being assigned, BOINC would say "hey, I need this application to run these tasks", it would try to get the application, the app download would fail, all the WUs would be dumped, and the process would start over again at the next connect interval.

Another thing that would have greatly helped keep that situation from spiraling out of control is what I believe would be a better way to do the quota system. Instead of doubling the quota for each good result returned, it should just be +2. It was pointed out that if you are on an 8-CPU system and you turn in 800+ bad tasks, it only takes eleven (11) good results to bring the quota back to 800; a 4-CPU host takes 10, a 2-CPU host only 8. Then there are the CUDA quotas that were thrown into the mix as well, with their own multiplier. I think +2 instead of 2x would keep problem computers at bay very nicely. It doesn't even have to be +2; it could be +5, just as long as it's not multiplication.
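
The arithmetic behind that, under a toy model where the quota bottoms out at 1 and the cap is 100 tasks per CPU per day (the real BOINC rules differ in detail):

def results_to_recover(cap, update, start=1):
    """Valid results needed to climb from `start` back up to `cap`
    under a given quota-update rule."""
    quota, n = start, 0
    while quota < cap:
        quota = min(update(quota), cap)
        n += 1
    return n

cap = 8 * 100   # 8-CPU host, 100 tasks per CPU per day

print("2x per valid result:", results_to_recover(cap, lambda q: q * 2), "results")   # ~10
print("+2 per valid result:", results_to_recover(cap, lambda q: q + 2), "results")   # ~400
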
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact

Kaylie
Send message
Joined: 26 Jul 08
Posts: 39
Credit: 332,100
RAC: 0
United States
Message 870108 - Posted: 27 Feb 2009, 17:33:16 UTC - in response to Message 870055.

This may be a dumb question, but...
How do I know if I have a problem computer and how can I fix it if I do?

(Not wanting to be part of the problem), Thanks.

Cosmic_Ocean
Avatar
Send message
Joined: 23 Dec 00
Posts: 2244
Credit: 8,551,230
RAC: 4,310
United States
Message 870111 - Posted: 27 Feb 2009, 17:50:43 UTC - in response to Message 870108.

This may be a dumb question, but...
How do I know if I have a problem computer and how can I fix it if I do?

(Not wanting to be part of the problem), Thanks.

A problem computer is one that returns an excessive number of errors. Each error reduces your daily CPU quota by one. Typically you want to be at 100 all the time, but something as simple as missing a deadline counts as an error and reduces the quota.

You can look at your hosts page and see what the daily quotas are for all of your systems. If you find one that is lower than 90, there's a problem that needs to be addressed. That said, there are many different problems, and of course multiple solutions per problem, so I'm not going to make this post any longer than this.
____________

Linux laptop uptime: 1484d 22h 42m
Ended due to UPS failure, found 14 hours after the fact


Copyright © 2014 University of California