Busy Bytes (Jul 06 2009)



Message boards : Technical News : Busy Bytes (Jul 06 2009)

Previous · 1 · 2 · 3 · 4 · 5 · Next
Author Message
Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,563,805
RAC: 9,734
Netherlands
Message 916223 - Posted: 9 Jul 2009, 18:22:36 UTC - in response to Message 916148.

While yes, BitTorrent really isn't an option here since it only works in a one to many topology, I agree the idea of sending the data once to get it delivered to all the wingmen does have merit. No, it doesn't really solve the problem the way a gbit line would, but it would certainly help, essentially cutting the outgoing bandwidth in half.

Hmm, that way you still need to buy/rent/pay for the traffic at the other location.. I'm afraid that won't fix much.. the big advantage of the first wu downloader distributing it to all the wingmen would also be the distribution of 50% (maybe even more, factoring in timeouts etc?) of the bandwidth/traffic!
..
I'm not sure that gains anything though. From the graphs I've seen the problem is greater on the incoming side than the outgoing. But frankly that makes no sense to me. Aren't results considerably smaller than workunits? Could it be that all this excess incoming traffic is just requests that are going unanswered? If so, then this idea might actually be worth something.


This one I can answer ;) (read it somewhere)
What is inbound for one is outbound for the other.. what you consider inbound is shown as outbound in the graph!
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 916325 - Posted: 9 Jul 2009, 21:57:24 UTC - in response to Message 916143.

And the torrent system could be (ab)used for moving the wu from the first downloader to the wingman.. This could (almost) split the bandwidth usage in half! but I have no idea if the torrent system can handle stuff like this.. (and the security/legal issues getting involved in making BOINC have a torrent component)

Incoming connections to workstations here is simply not an option, so a work unit assigned to my machine would not be available for another machine to pick up, period.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 916328 - Posted: 9 Jul 2009, 22:01:40 UTC - in response to Message 916148.


As I understand things, the problem is getting the data to the ISP rather than getting it from there out to the crunchers, right? So how hard would it be to get a SETI box colo'd at the ISP? I expect this could be done with a very simple box just running a slightly tweaked version of squid.

That moves the problem, it doesn't really fix it. Data still has to get from SSL to the "squid" box.

Matt has said repeatedly that there are enough interdependencies that shuffling servers around offsite is not likely.

It is more likely that the server closet could be moved to another building, where gigabit rates are already available.

Would they move staff to be closer to the boxes, or would the staff become masters at remote administration?

As Matt said, CNS is working through the options now, and while they aren't fast, they tend to be thorough.

____________

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,563,805
RAC: 9,734
Netherlands
Message 916335 - Posted: 9 Jul 2009, 22:09:43 UTC
Last modified: 9 Jul 2009, 22:10:21 UTC

In another thread there is much information that answers questions raised here.

The mentioned 'torrent idea' (or the sql-structure idea I mentioned) will most likely not fix it all, but it might be a way to alleviate some of the "pain". An idea doesn't have to solve the whole problem.. but if it helps, it helps (says he who doesn't know all the inner workings, but I 'cling' to the idea that there are no stupid questions, only stupid answers ;).
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

Aurora Borealis
Volunteer tester
Joined: 14 Jan 01
Posts: 3000
Credit: 5,180,429
RAC: 2,812
Canada
Message 916362 - Posted: 10 Jul 2009, 0:14:57 UTC

The use of torrent would only be useful for distributing the application software and the BOINC manager software. This is already being considered as a possible future option by the BOINC devs. It just wouldn't be efficient for WU distribution.

Profile S@NL - Eesger - www.knoop.nl
Joined: 7 Oct 01
Posts: 384
Credit: 37,563,805
RAC: 9,734
Netherlands
Message 916467 - Posted: 10 Jul 2009, 8:05:35 UTC

Ah!

So more people have had the same idea; then it almost has to be a good one ;)
____________
The SETI@Home Gauntlet 2012 april 16 - 30| info / chat | STATS

DrFoo
Joined: 17 Jul 99
Posts: 26
Credit: 26,138,215
RAC: 17,635
United States
Message 916505 - Posted: 10 Jul 2009, 12:49:50 UTC - in response to Message 916223.

Hmm, that way you still need to buy/rent/pay for the traffic on the other location..

My understanding is they already are paying for the bandwidth! Or perhaps it was donated. Either way, they can only use 1/10 of what's available to them because they can't get data to the ISP fast enough. While I expect placing a box at the ISP would cost something more, that wouldn't really be necessary. The box could actually be placed anywhere at Berkeley that already has a gigabit line.

This one I can answer ;) (read it somewhere)
What is inbound for one is outbound for the other.. what you consider inbound is shown as outbound in the graph!

ROFL, of course it is! I do this sort of thing for a living, that's not the question. I was speaking strictly from the BOINC/SETI point of view.
____________

DrFoo
Joined: 17 Jul 99
Posts: 26
Credit: 26,138,215
RAC: 17,635
United States
Message 916512 - Posted: 10 Jul 2009, 13:24:47 UTC - in response to Message 916328.

That moves the problem, it doesn't really fix it. Data still has to get from SSL to the "squid" box.

Of course, but the whole point is they send each workunit once to squid instead of two or more times directly to each wingman. There are other less obvious advantages here too. If, as I suspect, much of the traffic is simply caused by congestion, this should eliminate a lot of additional traffic, not to mention some hefty socket load on the upload servers.

Anytime your network can't keep up with the data, the burden on your servers goes through the roof since they have to wait around for acknowledgments they will simply never receive. You end up opening a dozen sockets or more where only one would be needed without the congestion. It's the sort of thing a network admin tries to avoid like the plague. Never ever ever saturate your network!

Matt has said repeatedly that there are enough interdependencies that shuffling servers around offsite is not likely.

It is more likely that the server closet could be moved to another building, where gigabit rates are already available.

I understand that, but this is not at all the same thing as moving an existing upload server elsewhere. It's adding a much needed cache (or lift station if you will) to the system. It's a very simple box with a single purpose in life. Such boxes are actually quite common at ISPs and in large corporate networks.

... or would the staff become masters at remote administration?

Heh, trust me, they already are! Administering a Linux box across the room, down the hall, or across town is really all the same. The only time it makes a difference is when you need to physically reset the box or change out some hardware.

As Matt said, CNS is working through the options now, and while they aren't fast, they tend to be thorough.

Unfortunately, I didn't read that yesterday until after I posted. No doubt a gigabit line directly to the server closet makes my idea a rather moot point. Given that fact, I agree it's best to wait and see what CNS comes up with first.
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 916544 - Posted: 10 Jul 2009, 16:42:42 UTC - in response to Message 916512.


Matt has said repeatedly that there are enough interdependencies that shuffling servers around offsite is not likely.

It is more likely that the server closet could be moved to another building, where gigabit rates are already available.

I understand that, but this is not at all the same thing as moving an existing upload server elsewhere. It's adding a much needed cache (or lift station if you will) to the system. It's a very simple box with a single purpose in life. Such boxes are actually quite common at ISPs and in large corporate networks.

What is really frustrating is that most people making suggestions like this are looking at SETI@Home through "web-colored glasses" and this is not the web.

Splitters read raw data recorded at Berkeley, and write the result files to the upload server, and enter the relevant info into the database.

Putting the upload server off-site limits the splitter to the speed of the link between the two sites (at best).

In other words, we're still talking about trying to move the problem (from one end of the wire to the other, effectively) and not solve the problem.

... and yes, I do understand what ISPs do. I've been doing it for 15 years.

____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24785
Credit: 524,053
RAC: 86
United States
Message 916588 - Posted: 10 Jul 2009, 20:04:01 UTC - in response to Message 916512.

That moves the problem, it doesn't really fix it. Data still has to get from SSL to the "squid" box.

Of course, but the whole point is they send each workunit once to squid instead of two or more times directly to each wingman. There are other less obvious advantages here too. If, as I suspect, much of the traffic is simply caused by congestion, this should eliminate a lot of additional traffic, not to mention some hefty socket load on the upload servers.

Anytime your network can't keep up with the data, the burden on your servers goes through the roof since they have to wait around for acknowledgments they will simply never receive. You end up opening a dozen sockets or more where only one would be needed without the congestion. It's the sort of thing a network admin tries to avoid like the plague. Never ever ever saturate your network!

Matt has said repeatedly that there are enough interdependencies that shuffling servers around offsite is not likely.

It is more likely that the server closet could be moved to another building, where gigabit rates are already available.

I understand that, but this is not at all the same thing as moving an existing upload server elsewhere. It's adding a much needed cache (or lift station if you will) to the system. It's a very simple box with a single purpose in life. Such boxes are actually quite common at ISPs and in large corporate networks.

... or would the staff become masters at remote administration?

Heh, trust me, they already are! Administering a Linux box across the room, down the hall, or across town is really all the same. The only time it makes a difference is when you need to physically reset the box or change out some hardware.

As Matt said, CNS is working through the options now, and while they aren't fast, they tend to be thorough.

Unfortunately, I didn't read that yesterday until after I posted. No doubt a gigabit line directly to the server closet makes my idea a rather moot point. Given that fact, I agree it's best to wait and see what CNS comes up with first.

Moving the upload server down the hill means that the validator either needs to load the files across the link down the hill, or the validator has to move down the hill. If the validator moves down the hill, then the DB access from the validator either has to be done up the hill, or you move the DB down the hill. If you move the DB down the hill, then you have to either do DB access from the splitters, the web pages, and everything else across the link down the hill, or move everything down the hill.

In any case, the servers all have cross mounted drives so moving anything is a really bad idea.
____________


BOINC WIKI

DrFoo
Joined: 17 Jul 99
Posts: 26
Credit: 26,138,215
RAC: 17,635
United States
Message 916604 - Posted: 10 Jul 2009, 20:49:42 UTC - in response to Message 916588.

Okay, I realize this is bordering on beating a dead horse, but you guys just aren't getting it. The entire current infrastructure with the complex cross-mounts stays put just where it is. Nothing gets moved.

The clients still make all requests to the exact same machines as they are now, and all transfers from the clients go directly to the core. The only thing that changes is that requests for file downloads are intercepted by a squid box located someplace upstream that has the full bandwidth available. When a request comes in one of two things happens:

Squid already has that file and sends it out. If the core system needs to know that the unit was sent out successfully, the squid box tells it so. I haven't dug into the details, but this can't possibly be a difficult tweak to make on the standard squid program if need be. My guess, though, is that the remote client is doing such acknowledgments already, in which case everything already works, no changes are needed at all.

The other possibility of course is that squid doesn't have the file, in which case it simply requests it from the core upload server in exactly the same manner as a remote client does now, and then forwards the file to the remote client.

This is just a simple cache, it wouldn't mess things up in the core in any way, nor would it be at all difficult to implement.
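As a rough illustration of how simple such a box could be, here is a minimal squid.conf sketch for a reverse-proxy ("accelerator") cache sitting in front of a download server. The hostname, paths, and sizes are hypothetical, purely for illustration, and not the project's actual configuration:

```
# Hypothetical squid.conf sketch: accelerator cache in front of a
# download server. Hostname and sizes are made up for illustration.
http_port 80 accel defaultsite=downloads.example.edu
cache_peer downloads.example.edu parent 80 0 no-query originserver name=dl_origin

acl dl_site dstdomain downloads.example.edu
http_access allow dl_site
http_access deny all

cache_dir ufs /var/spool/squid 102400 16 256   # ~100 GB on-disk cache
maximum_object_size 1 MB                       # workunits are well under this
```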
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,695,730
RAC: 28,713
Australia
Message 916620 - Posted: 10 Jul 2009, 21:26:45 UTC - in response to Message 916604.

This is just a simple cache, it wouldn't mess things up in the core in any way, nor would it be at all difficult to implement.

How does the data get to & from the cache?

____________
Grant
Darwin NT.

Profile Jon Golding
Joined: 20 Apr 00
Posts: 56
Credit: 365,460
RAC: 1
United Kingdom
Message 916626 - Posted: 10 Jul 2009, 21:36:57 UTC - in response to Message 916604.
Last modified: 10 Jul 2009, 21:40:47 UTC

Okay, I realize this is bordering on beating a dead horse, but you guys just aren't getting it. The entire current infrastructure with the complex cross-mounts stays put just where it is. Nothing gets moved.

The clients still make all requests to the exact same machines as they are now, and all transfers from the clients go directly to the core. The only thing that changes is that requests for file downloads are intercepted by a squid box located someplace upstream that has the full bandwidth available. When a request comes in one of two things happens:

Squid already has that file and sends it out. If the core system needs to know that the unit was sent out successfully, the squid box tells it so. I haven't dug into the details, but this can't possibly be a difficult tweak to make on the standard squid program if need be. My guess, though, is that the remote client is doing such acknowledgments already, in which case everything already works, no changes are needed at all.

The other possibility of course is that squid doesn't have the file, in which case it simply requests it from the core upload server in exactly the same manner as a remote client does now, and then forwards the file to the remote client.

This is just a simple cache, it wouldn't mess things up in the core in any way, nor would it be at all difficult to implement.


100% agreed. But would this box even need to be connected to the SSL servers? Why not a dedicated box that only distributes pre-split WUs to clients on request. This could be housed somewhere down the hill from SSL and would simply require someone from the Lab (one of the students) to come by and hot-swap in a fresh drive full of pre-split WUs every few days.
That's all this box would do, just a distribution mirror - clients still return all processed data to the SSL servers, as usual.
I emphasise that this box does not take over any function of the current servers - it just acts as an additional distribution service, which should take some pressure off the SSL servers. If it works, then maybe several could be set up on campus?
____________

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 916628 - Posted: 10 Jul 2009, 21:52:48 UTC - in response to Message 916620.

This is just a simple cache, it wouldn't mess things up in the core in any way, nor would it be at all difficult to implement.

How does the data get to & from the cache?

The same way it does with any cache: the request goes to the squid box and if it doesn't already have it, the squid box gets it from the download servers, and then serves it to the user.

It's all HTTP.

The assumption is that for most work units, the downloads are going to happen close enough together that a squid cache would be able to get the work unit once, and serve it up twice (that it wouldn't expire from squid before the second download).

I don't think it'd cut the bandwidth in half, but it might help some.
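The hit-or-miss flow described above can be sketched with a tiny LRU-cache simulation (an illustration only, not BOINC or squid code): when both replicas of a workunit are requested before it is evicted, the origin server sends it only once.

```python
from collections import OrderedDict

def origin_fetches(requests, cache_capacity):
    """Simulate an LRU cache in front of the origin server and return
    how many requests were cache misses (i.e. hit the origin)."""
    cache = OrderedDict()  # workunit name -> cached marker, in LRU order
    misses = 0
    for wu in requests:
        if wu in cache:
            cache.move_to_end(wu)          # hit: served with no origin traffic
        else:
            misses += 1                    # miss: fetch once from the origin
            cache[wu] = True
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # evict the least recently used item
    return misses

# Two replicas of each workunit requested back-to-back: the cache
# fetches each workunit once, halving origin-side downloads.
requests = ["wu1", "wu1", "wu2", "wu2", "wu3", "wu3"]
print(origin_fetches(requests, cache_capacity=100))  # -> 3 (vs 6 uncached)
```

With a too-small cache (or widely separated requests), the misses climb back toward the uncached count, which is exactly the sizing question raised below in the thread.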
____________

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,695,730
RAC: 28,713
Australia
Message 916630 - Posted: 10 Jul 2009, 22:02:02 UTC - in response to Message 916628.

The assumption is that for most work units, the downloads are going to happen close enough together that a squid cache would be able to get the work unit once, and serve it up twice (that it wouldn't expire from squid before the second download).

But would that be the case?
When a Work Unit is initially split, does the Scheduler download it to one client, then download it to another on the next download request, or is other work scheduled for download before the 2nd copy of the recently split work unit?
____________
Grant
Darwin NT.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 916634 - Posted: 10 Jul 2009, 22:17:49 UTC - in response to Message 916630.

The assumption is that for most work units, the downloads are going to happen close enough together that a squid cache would be able to get the work unit once, and serve it up twice (that it wouldn't expire from squid before the second download).

But would that be the case?
When a Work Unit is initially split, does the Scheduler download it to one client, then download it to another on the next download request, or is other work scheduled for download before the 2nd copy of the recently split work unit?

The downloaded file is the same file for every result, and the download file does not contain the result file name.

The question is: how close together are the initial two downloads? Are they close enough that the squid cache would not discard the first copy before the second request?

After that (i.e. reissues) those might pair-up if, and only if, both original issues timed out.

Wikimedia says they use Squid and it reduced load on their servers by 75%, but they serve the same pages over and over and over.
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8757
Credit: 52,707,223
RAC: 27,580
United Kingdom
Message 916641 - Posted: 10 Jul 2009, 22:45:57 UTC - in response to Message 916634.

The question is: how close together are the initial two downloads? Are they close enough that the squid cache would not discard the first copy before the second request?

Actually, I suspect the question is: Does the second request follow so closely on the first that it arrives before squid has completed the cache operation triggered by the first request?

Most downloads will have just two instances, within a second of each other according to a spot-check. How does squid handle a request for a file "not cached yet, but requested and on its way"?

Other downloads will have a third or later instance, separated by up to a month. Unless the squid cache is of comparable size to the main download server (at least a terabyte), these month-delayed downloads are likely to have been flushed and need a re-fetch.
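For what it's worth, squid has a directive aimed at exactly this race: with collapsed forwarding enabled (present in squid 2.6, and again in 3.5 and later), a later request for a URL whose first fetch is still in flight waits and shares that fetch instead of opening a second origin connection. A hypothetical fragment, with an illustrative cache size only:

```
# Share one in-flight origin fetch among near-simultaneous requests
# for the same URL (squid 2.6, or 3.5+).
collapsed_forwarding on

# A cache on the order of a terabyte (per the thread's estimate) would
# also cover month-delayed reissues; size here is illustrative only.
cache_dir ufs /var/spool/squid 1048576 64 512   # ~1 TB
```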

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24785
Credit: 524,053
RAC: 86
United States
Message 916649 - Posted: 10 Jul 2009, 23:19:57 UTC - in response to Message 916641.
Last modified: 10 Jul 2009, 23:20:44 UTC

The question is: how close together are the initial two downloads? Are they close enough that the squid cache would not discard the first copy before the second request?

Actually, I suspect the question is: Does the second request follow so closely on the first that it arrives before squid has completed the cache operation triggered by the first request?

Most downloads will have just two instances, within a second of each other according to a spot-check. How does squid handle a request for a file "not cached yet, but requested and on its way"?

Other downloads will have a third or later instance, separated by up to a month. Unless the squid cache is of comparable size to the main download server (at least a terabyte), these month-delayed downloads are likely to have been flushed and need a re-fetch.

And the files are going to have different names XXXXX_0 and XXXXX_1. Will squid know that these are the same file?

At least I believe that this is the case.
____________


BOINC WIKI

Grant (SSSF)
Joined: 19 Aug 99
Posts: 5917
Credit: 61,695,730
RAC: 28,713
Australia
Message 916654 - Posted: 10 Jul 2009, 23:25:55 UTC - in response to Message 916649.


Seti science aside, it really is a very interesting project from a technology point of view. Lots of variables, lots of dependencies & interactions (some positive feedback & others negative), limited resources, conflicting goals.
As I said, very interesting.
____________
Grant
Darwin NT.

1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 916658 - Posted: 10 Jul 2009, 23:41:29 UTC - in response to Message 916641.
Last modified: 10 Jul 2009, 23:50:20 UTC

The question is: how close together are the initial two downloads? Are they close enough that the squid cache would not discard the first copy before the second request?

Actually, I suspect the question is: Does the second request follow so closely on the first that it arrives before squid has completed the cache operation triggered by the first request?

Very good point: is squid smart enough to realize it's already trying to grab the work unit and not connect again? Either way, timing is everything.

On cache size, I'd expect anything big enough to carry six hours or so worth of downloads would be as good as something that held six days -- or close enough.

It might be possible to get squid to limit bandwidth or connections to the download server.

But all of that said, it'd be good for no more than a 2:1 boost, and I think I'd be surprised if it was half that.
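Back-of-the-envelope on the "six hours' worth" sizing: assuming the download link runs near 100 Mbit/s (my assumption for illustration, not a figure from this thread), six hours of traffic is roughly 270 GB, which fits comfortably on a single cheap disk:

```python
def cache_size_gb(link_mbit_per_s, hours):
    """Rough cache sizing: total bytes flowing over the link in a window."""
    bits = link_mbit_per_s * 1e6 * hours * 3600  # link rate * seconds
    return bits / 8 / 1e9                        # bits -> bytes -> GB

# Six hours at an assumed 100 Mbit/s:
print(round(cache_size_gb(100, 6)))  # -> 270
```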
____________


Copyright © 2014 University of California