Mirror (Feb 26 2009)

Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 869769 - Posted: 26 Feb 2009, 19:46:29 UTC

Random day today for me. Catching up on various documentation/sysadmin/data pipeline tasks. Not very glamorous.

The question was raised: Why don't we compress workunits to save bandwidth? I forget the exact arguments, but I think it's a combination of small things that, when added together, make this a very low priority. First, the programming overhead to the splitters, clients, etc. - however minor it may be, it's still labor and (even worse) testing. Second, the concern that binary data will freak out some incredibly protective proxies or ISPs (the traffic is all going over port 80). Third, the amount of bandwidth we'd gain by compressing workunits is relatively minor considering the possible effort of making it so. Fourth, this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage). We may actually be implementing better download logic to prevent Coral Cache from being a redirect, which may solve this latter issue. Anyway... this idea comes up from time to time within our group, and we usually determine we have bigger fish to fry. Or lower-hanging fruit.

Oh - I guess that's the end of this month's thread title theme: names of lakes in or around the Sierras that I've been to.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 869769
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 869834 - Posted: 26 Feb 2009, 23:12:27 UTC

I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue?

More generally, if a virtual stress test could be performed with 10x more hosts than are currently involved (the news release suggests a potential factor of 500 - yikes!), what parts of S@H wouldn't scale and would need to be re-engineered?
ID: 869834
Profile Matt Lebofsky
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 1 Mar 99
Posts: 1444
Credit: 957,058
RAC: 0
United States
Message 869840 - Posted: 26 Feb 2009, 23:23:58 UTC - in response to Message 869834.  

I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue?


Well, to clarify - bandwidth is definitely a priority. Solving it by compressing workunits? Not so much... It wasn't a priority in the past for reasons I stated earlier, and it isn't a priority in the near future since we'll need a lot more bandwidth than compressing workunits will provide. This is why we're exploring the bigger options mentioned in an earlier thread.

- Matt

-- BOINC/SETI@home network/web/science/development person
-- "Any idiot can have a good idea. What is hard is to do it." - Jeanne-Claude
ID: 869840
Profile Dr. C.E.T.I.
Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 869863 - Posted: 27 Feb 2009, 0:38:09 UTC




<snip>

Oh - I guess that's the end of this month's thread title theme: names of lakes in or around the Sierras that I've been to.

- Matt

As usual - Thanks for the Posting Matt - It's really appreciated here . . .

< wonderin' how many ever notice that re: the quote ;) >


BOINC Wiki . . .

Science Status Page . . .
ID: 869863
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 869868 - Posted: 27 Feb 2009, 1:16:32 UTC - in response to Message 869769.  

this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage).

Question: could the client binaries themselves be located off-site (or closer to your feed), perhaps a single server just for those files?

ID: 869868
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 869876 - Posted: 27 Feb 2009, 1:36:19 UTC - in response to Message 869868.  

this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage).

Question: could the client binaries themselves be located off-site (or closer to your feed), perhaps a single server just for those files?

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page that notes that this is not allowed, and the substitute page downloads correctly. BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of a redirect, then it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.
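
To make the churn concrete, here's a minimal sketch of that failure mode (Python; the URL and expected checksum are placeholders, not real project values):

import hashlib
import urllib.request

APP_URL = "http://example.invalid/download/setiathome_app"   # placeholder
EXPECTED_MD5 = "d41d8cd98f00b204e9800998ecf8427e"             # placeholder

def fetch_and_verify(url, expected_md5):
    data = urllib.request.urlopen(url).read()
    if hashlib.md5(data).hexdigest() != expected_md5:
        # An ISP or proxy substituted an HTML "not allowed" page: the
        # download "succeeded", but the checksum gives it away.
        raise ValueError("checksum mismatch - substituted page?")
    return data

# Until the redirect problem is fixed, every task depending on the file
# errors out, new tasks are issued, the bad download repeats, and the
# bandwidth churns.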


BOINC WIKI
ID: 869876
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 869889 - Posted: 27 Feb 2009, 2:13:30 UTC - in response to Message 869840.  

I understand that when the bandwidth only spikes for 'short' periods of time, it may not be a priority to fix it. However, per a not-so-recent news release, S@H's need for more computers and volunteers was going to increase many-fold. In light of this hoped-for need, can one actually afford to deprioritize the bandwidth issue?


Well, to clarify - bandwidth is definitely a priority. Solving them by compressing workunits? Not so much... It wasn't a priority in the past for reasons I stated earlier, and it isn't a priority in the near future since we'll need a lot more bandwidth than compressing workunits will provide. This is why we're exploring the bigger options mentioned in an earlier thread.

- Matt

The A2010 ALFALFA Spring 2009 observations will continue until the end of April, so the midrange work they provide gives a convenient period to decide how to handle the increase in VHAR 'shorties' after that. I guess the delivery pipeline would make the critical time the second or third week of May.

I agree compression wouldn't be enough for a permanent solution, but it might ease problems temporarily.
                                                               Joe
ID: 869889
Profile Mike O
Joined: 1 Sep 07
Posts: 428
Credit: 6,670,998
RAC: 0
United States
Message 869891 - Posted: 27 Feb 2009, 2:16:26 UTC

Thanks Matt..
I didn't know why compression wasn't being used; that's why I started with... Here's a crazy idea.
It seems pretty clear now, actually. You would need to have a beta server and testers.



Not Ready Reading BRAIN. Abort/Retry/Fail?
ID: 869891
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 869911 - Posted: 27 Feb 2009, 3:28:10 UTC - in response to Message 869876.  

this is really only a problem (so far) during client download phases - workunits alone don't really clobber the network except for short, infrequent events (like right after the weekly outage).

Question: could the client binaries themselves be located off-site (or closer to your feed), perhaps a single server just for those files?

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page that notes that this is not allowed, and the substitute page downloads correctly. BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of a redirect, then it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.

Wearing my ISP hat for a moment, I'm not sure why an ISP would block the redirect. Corporate America is a different story.

Do the applications have to come from the same download servers as the work? Seems like the easiest solution would be to tell BOINC to look for clients.setiathome.ssl.berkeley.EDU or somesuch and download from there.

Then that could be one, or several, distributed servers around the planet just based on the number of "A" records in DNS.

... or it could be mapped to the same IP on smaller projects.
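
As a minimal illustration of the multiple-"A"-record idea (the hostname below is hypothetical), a client resolving one download name can get back several addresses and spread its downloads across them with no extra client logic:

import socket

HOST = "downloads.example.invalid"   # hypothetical download hostname

# getaddrinfo returns one entry per A (or AAAA) record published for the
# name, so a single URL in the client can fan out to several servers.
try:
    addresses = sorted({info[4][0] for info in socket.getaddrinfo(HOST, 80)})
    print(HOST, "resolves to", addresses)
except socket.gaierror:
    print(HOST, "did not resolve (placeholder name)")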
ID: 869911
Profile Bryan Price

Joined: 3 Apr 99
Posts: 3
Credit: 509,972
RAC: 0
United States
Message 869928 - Posted: 27 Feb 2009, 4:35:31 UTC - in response to Message 869769.  

Second, the concern that binary data will freak out some incredibly protective proxies or ISPs (the traffic is all going over port 80).


But binary data is already going over port 80. A lot of web pages already get compressed by the web server before they get transferred to the browser, which then proceeds to uncompress that stream to render it.

2/26/2009 10:59:55 PM||Libraries: libcurl/7.19.0 OpenSSL/0.9.8i zlib/1.2.3


That's why libcurl includes zlib: to unpack the data stream. The client already appears to be ready to at least receive, if not send, compressed streams. Caveat vendor: I'm not a programmer, and looking at the documentation I don't see a way of handling it on the sending side - not that it would be necessary.

And if the server is already doing THAT processing, zipping the file isn't going to help the bandwidth one little bit.
ID: 869928
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 869951 - Posted: 27 Feb 2009, 5:39:27 UTC - in response to Message 869769.  

I agree implementing your own compression within BOINC would require a significant amount of work, and the rewards are not large enough given your time and financial constraints.

However... Have you considered HTTP-based compression? Apache could compress the work unit downloads in real time using gzip or deflate. You'll get about 27% smaller downloads this way.

Here's a website that shows you how to edit the Apache config file:
http://www.serverwatch.com/tutorials/article.php/10825_3514866_2

I'd have to check the BOINC code, but this should be very straightforward. The one disadvantage of this feature is that CPU load on the web servers will increase. Memory requirements for compression are minimal.
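
As a quick sanity check of whether a server is already applying mod_deflate/gzip, here's a sketch (Python; the URL is a placeholder for a real workunit download): ask for gzip and look at the Content-Encoding header of the reply.

import urllib.request

URL = "http://example.invalid/some_workunit"   # placeholder

req = urllib.request.Request(URL, headers={"Accept-Encoding": "gzip, deflate"})
with urllib.request.urlopen(req) as resp:
    encoding = resp.headers.get("Content-Encoding", "none")
    body = resp.read()   # raw bytes as sent on the wire

# If the server compresses on the fly, Content-Encoding will say gzip or
# deflate and the byte count on the wire will be smaller than the payload.
print("Content-Encoding:", encoding)
print("Bytes received:", len(body))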
ID: 869951
DJStarfox

Joined: 23 May 01
Posts: 1066
Credit: 1,226,053
RAC: 2
United States
Message 869954 - Posted: 27 Feb 2009, 5:48:47 UTC - in response to Message 869928.  

Oh, I didn't see your post, Bryan. Yes, if it's already being compressed via HTTP in real time, then there's no point in using gzip/deflate. But if it's not... then that should be a quick fix to enable.

Edit: tcpdump shows compressed data. Can someone run tcpdump while fetching work from SETI to verify?
ID: 869954
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 869960 - Posted: 27 Feb 2009, 6:25:29 UTC


BTW.
For days now, nearly every work request has resulted in the PC getting only one new WU. So BOINC asks and asks again, because the cache can't fill up. More server load...

Is there a reason why this happens?

ID: 869960
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 869961 - Posted: 27 Feb 2009, 6:26:52 UTC - in response to Message 869928.  

But binary data is already going over port 80. A lot of web pages already get compressed by the web server before they get transferred to the browser, which then proceeds to uncompress that stream to render it.

I think we're forgetting GIFs and JPGs, to name a couple.

ID: 869961
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 869981 - Posted: 27 Feb 2009, 9:46:38 UTC - in response to Message 869876.  

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page that notes that this is not allowed, and the substitute page downloads correctly. BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of a redirect, then it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.

The problem wasn't so much with the location of the files on another server, it was with the technique used to tell the BOINC client where to collect the files from. Matt made the (very reasonable) point that 'caching' (of any description) means that the relief server doesn't have to be manually loaded with the new files when anything changes.

But subject to that limitation, and the need to issue Matt with a pair of long-range kicking boots in case anything goes wrong, establishing an application download server with a different URL closer to the head-end of the 1Gb/s feed might buy a bit of time. Einstein does something like this - even supplying BOINC with a set of multiple download URLs - and it seems to work well in general, with just occasional glitches if a mirror site goes off-air. Comparing notes with Einstein might throw up some more ideas.
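
A sketch of the multiple-download-URL idea (all URLs below are placeholders): the client tries each mirror in order and falls back on failure, instead of relying on a redirect that a proxy might intercept.

import urllib.request
import urllib.error

MIRRORS = [
    "http://mirror1.example.invalid/apps/setiathome_app",   # placeholders
    "http://mirror2.example.invalid/apps/setiathome_app",
]

def download_from_mirrors(urls):
    for url in urls:
        try:
            return urllib.request.urlopen(url, timeout=30).read()
        except (urllib.error.URLError, OSError):
            continue   # mirror off-air: quietly try the next one
    raise RuntimeError("all mirrors failed")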
ID: 869981
Profile Andy Lee Robinson
Joined: 8 Dec 05
Posts: 630
Credit: 59,973,836
RAC: 0
Hungary
Message 869996 - Posted: 27 Feb 2009, 12:10:44 UTC - in response to Message 869951.  

However... Have you considered HTTP-based compression? Apache could compress the work unit downloads in real time using gzip or deflate. You'll get about 27% smaller downloads this way.


gzip compression over HTTP is great for servers that generate significant HTML/XML/CSS/JS traffic, and although it would be advisable to use on this forum, that traffic is a tiny fraction of the WU bandwidth requirements, so its share of total bandwidth would be small.

Compressibility is related to entropy, and in terms of compression a given block of data has varying degrees of 'sponginess' - you could take a large sponge, compress it, feed it through a gas pipe, and it expands again on reaching the exit. Same principle.

However, we are analysing noise that is random and has no identifiable redundancy - to any compressor it is indistinguishable from concrete!

The only viable solution that I recommend is to repackage the XML workunits with binary CDATA instead of base64 encoding.

I don't agree that this will cause any problems with routing or content filtering - http has to handle all forms of binary transmission, or we would have no web!

If the apps were properly XML-compliant and did not make assumptions about content encoding, then they should not need to be rewritten. Transmission should be transparent regardless of content encoding, and much bandwidth could be saved, as with Astropulse WUs.
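
A small sketch of those two points (Python): pure noise is essentially incompressible, but base64-encoded noise compresses back by roughly a quarter, which is all that on-the-fly gzip - or a switch to binary CDATA - could recover.

import base64
import os
import zlib

noise = os.urandom(100_000)              # stand-in for raw telescope noise
text = b"the quick brown fox " * 5000    # highly redundant, for contrast

for label, data in [("raw noise", noise),
                    ("base64-encoded noise", base64.b64encode(noise)),
                    ("repetitive text", text)]:
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{label:22s} {len(data):7d} bytes -> {ratio:.0%} of original")

# Expected outcome: raw noise stays near 100% (indistinguishable from
# concrete), base64-encoded noise drops to roughly 75-78% (only the
# encoding overhead is recovered), and the repetitive text shrinks to a
# few percent.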

Andy.
ID: 869996
Robert

Joined: 2 May 00
Posts: 5
Credit: 12,853,177
RAC: 10
United States
Message 870016 - Posted: 27 Feb 2009, 13:26:56 UTC

I just received a 1200+ hour Astropulse work unit (the computer is a P4 at 2.65 GHz) - isn't that kinda long??
ID: 870016
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 870055 - Posted: 27 Feb 2009, 15:26:35 UTC

That technique was causing quite a bit of the grief. The problem was the redirect - which apparently a large number of firewalls and ISPs do not allow. The ISP substitutes a page that notes that this is not allowed, and the substitute page downloads correctly. BOINC thinks it has the file until it does a checksum, at which point all tasks that relied on that file error out, and then new ones are downloaded and run into the same problem. Churning like this is a good way of filling the bandwidth. If the BOINC client knew where to go look for the files instead of a redirect, then it would work better. Another possibility would be to have something like BitTorrent - where BOINC would ask the torrent for the files and the torrent would tell BOINC where to go fetch them.

I was kind of thinking more along the lines of: don't try to download any work for an application until the application itself has successfully downloaded. The problem was that WUs were being assigned, BOINC was saying "hey, I need this application to run these tasks", it would then try to get the application, the app download would fail, all the WUs would be dumped, and the process would start over again at the next connect interval.

Another thing that would have greatly helped keep that situation from spiraling out of control is what I believe would be a better way to do the quota system. Instead of doubling the quota for each good result returned, it should just be +2. It was pointed out that if you are on an 8-CPU system and you turn in 800+ bad tasks, it only takes eleven (11) good results to bring the quota back to 800; a 4-CPU host takes 10, and a 2-CPU host only takes 8. Then there are the CUDA quotas that were thrown into the mix as well, with the multiplier for that. I think +2 instead of 2x would keep problem computers at bay very nicely. It doesn't even have to be +2... it can be +5... just as long as it's not multiplication.
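
To make that arithmetic concrete, here's a back-of-envelope sketch (Python) comparing the two recovery rules, assuming the daily quota bottoms out at 1 and is capped at 100 per CPU on an 8-CPU host:

def results_to_recover(step, cap, start=1):
    quota, count = start, 0
    while quota < cap:
        quota = min(step(quota), cap)
        count += 1
    return count

CAP = 8 * 100   # 8 CPUs x 100 results/day each

doubling = results_to_recover(lambda q: q * 2, CAP)   # current rule
additive = results_to_recover(lambda q: q + 2, CAP)   # proposed +2 rule

print("doubling:", doubling, "good results to climb from 1 back to", CAP)
print("+2 rule: ", additive, "good results to climb from 1 back to", CAP)

# Doubling gets back to full quota in about ten good results; the additive
# rule needs a few hundred, so a host that has just errored out hundreds of
# tasks has to prove itself for much longer.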
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 870055
Kaylie

Joined: 26 Jul 08
Posts: 39
Credit: 333,106
RAC: 0
United States
Message 870108 - Posted: 27 Feb 2009, 17:33:16 UTC - in response to Message 870055.  

This may be a dumb question, but...
How do I know if I have a problem computer and how can I fix it if I do?

(Not wanting to be part of the problem), Thanks.
ID: 870108
Cosmic_Ocean
Joined: 23 Dec 00
Posts: 3027
Credit: 13,516,867
RAC: 13
United States
Message 870111 - Posted: 27 Feb 2009, 17:50:43 UTC - in response to Message 870108.  

This may be a dumb question, but...
How do I know if I have a problem computer and how can I fix it if I do?

(Not wanting to be part of the problem), Thanks.

A problem computer is one that returns an excessive number of errors. Each error reduces your daily CPU quota by one. Typically you want to be at 100 all the time, but something as simple as missing a deadline will count as an error and reduce the quota.

You can look at your hosts page to see what the daily quotas are for all of your systems. If you find one that is lower than 90, there's a problem that needs to be addressed. Now that I've said that, there are many different problems and, of course, multiple solutions per problem, so I'm not going to make this post any longer than this.
Linux laptop:
record uptime: 1511d 20h 19m (ended due to the power brick giving-up)
ID: 870111