Message boards :
Technical News :
Busy Bytes (Jul 06 2009)
Author | Message |
---|---|
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
And the files are going to have different names XXXXX_0 and XXXXX_1. Will squid know that these are the same file? I looked at a work request, and it appears that the file name does not have the suffix. In other words, the scheduler says "go get XXXXX and return the result named XXXXX_0" -- but I haven't read the documentation to make sure. The squid cache machine would need a SETI@Home IP so that it uses the HE bandwidth; boinc2.ssl.berkeley.edu would point at the squid address, and squid would be told how to find the "true" download server. It looks like it is possible. The best part of the idea is that once configured, it would require a minimum amount of attention (as simple as making sure the machine is up). It would require a little bit of rack space in a high-bandwidth spot, and some help from Campus to make it all play. I'm a tiny bit skeptical about the gain. |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Very good point: is squid smart enough to realize it's already trying to grab the work unit and not connect again? Since each paired workunit has a different filename ending in _0, _1, etc., wouldn't squid think they are all different files? |
W-K 666 Send message Joined: 18 May 99 Posts: 19064 Credit: 40,757,560 RAC: 67 |
Not convinced, yet, that this squid box idea will work, but isn't there still going to be lots of comms between the squid box and the main servers? How long do the WU's live on the squid box? What happens with re-issues? Who or what moves the WU's up and down the hill? Surely not a student, as this is a 24/7/365 job, and also not one for the present staff; they are too overworked and there are not enough of them as it is. Who re-stocks squid at 3 in the morning when it is realised that the previously issued batch of tasks was all VHAR and is therefore only a third of what is really required for the next period? The more I think of this, the more I am convinced this is not a good idea; the manpower and logistics are just not in place, or affordable, to make it work. I suggest that, if you do pursue it, you completely think the idea through and then present it. |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
Not convinced, yet, that this squid box idea will work, but isn't there still going to be lots of comms between the squid box and the main servers. Re-issues are only a couple percent of the total number of results downloaded. The problem is the file names. If they are different, SQUID is unlikely to realize that they are the same contents. BOINC WIKI |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Very good point: is squid smart enough to realize it's already trying to grab the work unit and not connect again? The work units have the same file name. The result files (the uploaded results) differ with _0, _1, etc. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Not convinced, yet, that this squid box idea will work, but isn't there still going to be lots of comms between the squid box and the main servers. I'm not convinced either, but these questions are not the reason. If you look in the BOINC data directory, you'll see the last scheduler response. Scanning through it: <wu_name>21no08aa.25514.8252.11.8.89</wu_name> <name>21no08aa.25514.8252.11.8.89_1</name> This gives the WU name and the result file name. A bit of searching also gives the URL for the file: <url>http://boinc2.ssl.berkeley.edu/sah/download_fanout/3e4/21no08aa.25514.8252.11.8.39</url> Note that there is no _1 in the URL. Here is how it would work: your BOINC client resolves boinc2.ssl.berkeley.edu and connects to it to transfer data. The resolved IP is the Squid box. Squid searches the cache and doesn't find /sah/download_fanout/3e4/21no08aa.25514.8252.11.8.39, so it uses rules in the configuration file to change the host name (maybe it's "secret-boinc2.ssl.berkeley.edu"), connects to the REAL download server, and transfers the file. It stores a copy (it doesn't know if anyone will ask for it again) and sends a copy back to the original requester. Answers to questions: How long do they stay: depends on the hard drive, but it's not important as long as most paired downloads occur before the cache expires. Something in my head says an hour would be plenty. It can always get the WU again. (A second request within an hour would skip talking to the "real" download server.) What happens with reissues: Squid checks the cache, and then gets the work from the download server. Who or what moves the work: Squid does, automatically. It also automatically manages the cache. The idea is that the cache would give a 2:1 leverage -- files currently transit the 100mbit line twice, and this way, they'd transit once. In reality, it'll be something between 2:1 and a slight bottleneck caused by extra overhead. ... and only for downloads, not uploads or the scheduler. |
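The host-rewriting scheme described above maps onto squid's reverse-proxy ("accelerator") mode. The following is a minimal illustrative sketch only: the "secret-boinc2.ssl.berkeley.edu" origin name is the hypothetical from the post, and the port, memory size, object-size limit and freshness window are guesses, not the project's real values.

```conf
# Hypothetical squid accelerator config -- all values illustrative.
# Accept client requests addressed to boinc2.ssl.berkeley.edu:
http_port 80 accel defaultsite=boinc2.ssl.berkeley.edu
# Forward cache misses to the real (hidden) download server:
cache_peer secret-boinc2.ssl.berkeley.edu parent 80 0 no-query originserver name=dl
# Only handle workunit downloads through this peer:
acl sah_dl urlpath_regex ^/sah/download_fanout/
cache_peer_access dl allow sah_dl
# Keep hot objects in RAM; workunit files are small:
cache_mem 2048 MB
maximum_object_size 4 MB
# Treat cached workunit files as fresh for at most an hour:
refresh_pattern ^/sah/download_fanout/ 60 0% 60
```

Since the URL carries no _0/_1 suffix, both replicas of a workunit hash to the same cache key, which is what makes the 2:1 saving possible at all.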
ML1 Send message Joined: 25 Nov 01 Posts: 20291 Credit: 7,508,002 RAC: 20 |
The idea is that the cache would give a 2:1 leverage -- that files currently transit the 100mbit line twice, and this way, they'd transit once. More importantly, you'd get a further 10:1 improvement over the present setup, in that the squid box would be facing the 1000 Mbit/s link. However, without some traffic management measures built into the BOINC server and BOINC clients, you're just moving the real problem up to higher numbers... I think the squid box could certainly help even if it isn't an actual solution. Regards, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
djmotiska Send message Joined: 26 Jul 01 Posts: 20 Credit: 29,378,647 RAC: 105 |
100% agreed. But would this box even need to be connected to the SSL servers? Why not a dedicated box that only distributes pre-split WUs to clients on request. This could be housed somewhere down the hill from SSL and would simply require someone from the Lab (one of the students) to come by and hot-swap in a fresh drive full of pre-split WUs every few days. This is exactly what I suggested in my post in this thread. I think the existing 100 mbit pipe can handle the DB traffic, so maybe the splitter could also be moved outside the lab, if needed. |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
100% agreed. But would this box even need to be connected to the SSL servers? Why not a dedicated box that only distributes pre-split WUs to clients on request. This could be housed somewhere down the hill from SSL and would simply require someone from the Lab (one of the students) to come by and hot-swap in a fresh drive full of pre-split WUs every few days. The DB is getting hammered. There are already problems with DB access speeds (this was the last bottleneck that was worked on). BOINC WIKI |
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
100% agreed. But would this box even need to be connected to the SSL servers? Why not a dedicated box that only distributes pre-split WUs to clients on request. This could be housed somewhere down the hill from SSL and would simply require someone from the Lab (one of the students) to come by and hot-swap in a fresh drive full of pre-split WUs every few days. Moving parts of the setup is not an option according to Matt's post. It's an all-or-nothing kind of deal. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
A hypothetical cache would be limited by the 100Mbit link for fetching files from the download server. It could at most get a 2:1 improvement because one workunit file is sent to no more than two people. To get a 10:1 ratio, you'd need to send that work to ten people. That's where the idea starts hitting limits: there is too much randomness. |
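The 2:1 ceiling follows directly from the replication factor: every workunit file is downloaded by at most two hosts, so a perfect cache fetches it from the origin once and serves it twice. A quick sketch of that arithmetic (the 2% reissue figure comes from an earlier post in this thread; the hit rate is an assumption):

```python
def leverage(replication=2, reissue_fraction=0.02, hit_rate=1.0):
    """Client-side downloads per workunit divided by origin fetches
    per workunit, for a cache with the given hit rate."""
    client_fetches = replication + reissue_fraction
    # With a perfect cache only the first request reaches the origin;
    # every missed request after that costs an extra origin fetch.
    origin_fetches = 1 + (client_fetches - 1) * (1 - hit_rate)
    return client_fetches / origin_fetches
```

With a perfect hit rate this tops out just above 2:1; with no cache hits at all it degrades to 1:1 plus proxy overhead, which matches the "between 2:1 and a slight bottleneck" estimate above.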
OzzFan Send message Joined: 9 Apr 02 Posts: 15691 Credit: 84,761,841 RAC: 28 |
Not to mention that if the outgoing connection to everyone else is 1 Gbit but the incoming connection to the splitter/db is 100 Mbit, you're sending faster than you're receiving, so there will still be a slight bottleneck; it should still be an improvement, though. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
The place where it would work really well is whenever a new science app. needs to be downloaded -- assuming it comes from the DL server of course. |
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
I don't know, but if the high bandwidth was caused by downloading the AP 5.05 application, then this particular "burst" could have been downloading the app. as needed. If so, an offsite cache would have helped for this particular case. |
ML1 Send message Joined: 25 Nov 01 Posts: 20291 Credit: 7,508,002 RAC: 20 |
OK... Enough 'random' guessing. Doing a few sums I've come up with the numbers for how many connections the s@h upload and download servers should be able to service simultaneously... As an experiment, can s@h limit the max number of simultaneous upload/download connections to see what happens to the data rates? I suggest a 'first try' max number of simultaneous connections of 150 for uploads and 20 for downloads. Adjust as necessary to keep the link at an average that is no more than just 80 Mbit/s. Yep. Those numbers are very low compared to usual website serving. Note that uploads and downloads are not a typical website, even if they are being serviced by a web server... Anyone like to back up those numbers please? Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) |
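The connection caps suggested above amount to dividing the target link rate by an assumed per-connection transfer rate. The per-connection rates below are pure assumptions chosen to reproduce the suggested figures; real values would have to be measured from server transfer logs, and this ignores that uploads and downloads share the same link.

```python
def max_connections(target_mbit_per_s, per_connection_mbit_per_s):
    """Cap on simultaneous transfers so the aggregate stays under
    the target link rate."""
    return int(target_mbit_per_s // per_connection_mbit_per_s)

# Assumed per-connection rates (illustrative only):
download_cap = max_connections(80, 4.0)   # large WU files, fast clients
upload_cap = max_connections(80, 0.5)     # small result files, slow trickle
```

Under these assumptions the download cap comes out at 20 and the upload cap at 160, close to the 150/20 figures proposed above; changing either assumed rate moves the cap proportionally.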
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
100% agreed. But would this box even need to be connected to the SSL servers? Why not a dedicated box that only distributes pre-split WUs to clients on request. This could be housed somewhere down the hill from SSL and would simply require someone from the Lab (one of the students) to come by and hot-swap in a fresh drive full of pre-split WUs every few days. Matt doesn't have time to post all possible details of how BOINC works; there is documentation and source code elsewhere for those who want full detail. Certainly most parts of the BOINC backend need to interact with the database and therefore need to be kept local. Gigabit ethernet is needed for those transactions; 10Gbit would be even better. The splitters definitely need database access. The download server is an exception to the rule: it does not interact with the database. In fact, there is no "download server" code in BOINC; the splitters simply store created work in a directory structure. The URL of the WU is passed to hosts, and after they report the work done and validation is complete, the file deleter removes the WU. That allows the download server to be located anywhere, and I believe some projects operate that way. But of course that requires the remote file server to have all the WUs which it is serving, around 6 terabytes for this project now. The proposed squid proxy would need about 45 gigabytes for 1 hour of cache. As the project is operating now, the 2 initial replication tasks are sent within a few seconds of each other. The proxy would certainly provide the 2:1 reduction for those cases. If one of those hosts immediately trashes and reports the error, a reissue is generated and put at the end of the "Results ready to send" queue. If that queue is short, the reissue can be sent before the proxy cache expires, and typically the queue is shortest when the project is in difficulty. But it's a small advantage; probably squid would perform better with a smaller cache. Could a 3 minute cache (~2.25 GB), which would easily fit in memory, be a better approach? Joe
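The cache-size figures above follow directly from the link rate: the worst case is that everything which crossed the 100 Mbit/s link during the retention window is still in the cache. Checking the arithmetic:

```python
def cache_bytes(link_mbit_per_s, retention_seconds):
    """Worst-case cache footprint: all traffic that crossed the link
    within the retention window is still cached."""
    return link_mbit_per_s * 1e6 / 8 * retention_seconds

one_hour = cache_bytes(100, 3600)   # 45 GB, matching the figure above
three_min = cache_bytes(100, 180)   # 2.25 GB, small enough for RAM
```

Both quoted figures check out, so the 1-hour vs. 3-minute trade-off really is disk-backed cache vs. an in-memory one.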
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
Just gaming this around in my head, I doubt there would be a big difference in "gain" between a few minutes of cache and a few hours of cache, since most of the "hits" would be two downloads in short succession, and then the file would sit until expired. If I was doing the work, I'd set up to measure the effects, and then experiment until I found the optimal rate. |
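The "measure, then experiment" step could be prototyped offline before touching any servers. The sketch below assumes each workunit is requested exactly twice, with a uniformly random gap of up to an hour between the two requests. That distribution is entirely made up for illustration; the real gap distribution would have to come from download-server logs.

```python
import random

def hit_ratio(ttl_seconds, n_files=10_000, max_gap=3600.0, seed=1):
    """Fraction of all requests served from cache, given that the
    second request for a file hits only if it arrives within the TTL."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_files)
               if rng.uniform(0, max_gap) <= ttl_seconds)
    return hits / (2 * n_files)   # two requests per file in total

for ttl in (180, 900, 3600):
    print(ttl, hit_ratio(ttl))
```

Under these assumptions the hit ratio climbs with the TTL and saturates at 50% (every second request a hit) once the TTL covers the whole gap distribution, which is why measuring the real gaps is the important first step.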
Nicolas Send message Joined: 30 Mar 05 Posts: 161 Credit: 12,985 RAC: 0 |
And the files are going to have different names XXXXX_0 and XXXXX_1. Will squid know that these are the same file? Input filenames are unrelated to workunit/result filenames. And the file isn't duplicated for each result (that would make no sense at all). Only output files have the _0, _1 suffix. Contribute to the Wiki! |
Nicolas Send message Joined: 30 Mar 05 Posts: 161 Credit: 12,985 RAC: 0 |
A possibility that might help would be to generate a second task or not based on the "reputation" of the first computer to get the task. A computer that keeps returning no errors that validate would keep getting its reputation increased. You ought to be able to get around 40% of the bandwidth back. That already exists: Adaptive replication. Contribute to the Wiki! |
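The adaptive replication idea can be sketched as a trivial policy function. The threshold and the "consecutive valid results" metric below are illustrative stand-ins, not BOINC's actual adaptive-replication rule:

```python
def initial_replication(consecutive_valid_results, threshold=10):
    """Trusted hosts (a long run of validated results) get a single
    copy of new work; everyone else gets the usual pair so results
    can be cross-validated."""
    return 1 if consecutive_valid_results >= threshold else 2
```

A host whose result later fails validation would reset its streak and fall back to paired work, so the bandwidth saving scales with the fraction of reliably trusted hosts.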
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.