Working as Expected (Jul 13 2009)
Josef W. Segur · Joined: 30 Oct 99 · Posts: 4504 · Credit: 1,414,761 · RAC: 0
What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue; this just moves that issue from the upload server to the server near the edge.

The upload bandwidth used jumped from about 7 Mbits/sec to 25 Mbits/sec, more than triple, and I think that's what Eric was looking at. In other words, we were under 50% utilization when the servers were flooded with queued connections. I think it's likely that the below-50% download utilization was due to many hosts with work requests disabled by stalled uploads. The Cricket graphs only have 10-minute resolution, but when the upload usage jumped to 25 Mbits/sec the download jumped to 69 Mbits/sec, then 84 Mbits/sec for two intervals, then ~90 Mbits/sec. IOW, the download increase took about 30 minutes.

I'm not really disagreeing, I'm just saying that the server out on the edge is going to be subject to all of the problems Bruno faces now -- and be more accessible. Bruno has a fibre channel disk array, IIRC, and that's exactly why it is used as the upload handler, file deleter, etc. In fact it's used for so many things between Main and Beta that I wonder how a system with two single-core 2.8 GHz Xeon CPUs handles them all as well as it does.

One change from your design that I would make: I would try to keep two connections going at speed at all times, so that if one connection stalled for any reason the other could use that bandwidth -- and each time a transfer completes, I'd start making a new .zip file, instead of doing it hourly or some such. I agree a 2.25 MByte file every 30 seconds or so would be better than a 45 MByte file every ten minutes. Neither strains any reasonable connection rate criteria, and too much delay gives too much opportunity for Murphy's law to work.

Joe
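A minimal Python sketch of Joe's "two connections, zip on completion" idea, purely as an illustration: the pending_results directory, the ~2.25 MB batch target, and the upload_batch() placeholder are hypothetical, and none of this is the actual SETI@home transfer code.

```python
# Sketch only: keep two uploads in flight at all times, and start building the
# next .zip batch as soon as any transfer finishes, rather than on a fixed timer.
# Directory name, batch size and upload target are assumptions for illustration.

import os
import time
import zipfile
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

PENDING_DIR = "pending_results"      # hypothetical directory of finished results
BATCH_BYTES = 2.25 * 1024 * 1024     # aim for ~2.25 MB batches, per the post
os.makedirs(PENDING_DIR, exist_ok=True)

def next_batch():
    """Zip up to ~BATCH_BYTES of pending result files and return the zip path."""
    files, total = [], 0
    for name in sorted(os.listdir(PENDING_DIR)):
        path = os.path.join(PENDING_DIR, name)
        files.append(path)
        total += os.path.getsize(path)
        if total >= BATCH_BYTES:
            break
    if not files:
        return None
    zip_path = f"batch_{int(time.time())}.zip"
    with zipfile.ZipFile(zip_path, "w") as zf:
        for path in files:
            zf.write(path, arcname=os.path.basename(path))
            os.remove(path)          # sketch only; real code would defer deletion
    return zip_path

def upload_batch(zip_path):
    """Placeholder for the real transfer (e.g. an HTTP PUT to the edge server)."""
    time.sleep(1)                    # simulate network time
    print("uploaded", zip_path)

with ThreadPoolExecutor(max_workers=2) as pool:   # two connections at speed
    in_flight = set()
    while True:
        while len(in_flight) < 2:                 # top up to two transfers
            batch = next_batch()
            if batch is None:
                break
            in_flight.add(pool.submit(upload_batch, batch))
        if not in_flight:
            time.sleep(5)                         # nothing pending; poll again
            continue
        done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
```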
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14674 · Credit: 200,643,578 · RAC: 874
The interesting thing that we saw when Eric made his change was a sudden, dramatic increase in bandwidth used, from somewhere around 40 megabits to something near 90 megabits -- Eric said "tripled". There was something strange about that transition that I don't fully understand: it seemed different from anything we've seen before. Here's a static copy of Eric's image, so that it doesn't scroll off the screen while we think about it:

The upload server was disabled until around 09:00 local Wednesday. Then it was turned on, and nothing happened. Downloads continued as before, and a few uploads - very few, fewer than usual at 95% download - crept through.

Then, around 17:00 local, a dam burst, and both uploads and downloads jumped. Eric posted at 17:22 local, if I've got the timezones right, which suggests that prior to that point the upload server was (first) disabled, and (second) misconfigured. Perhaps Matt tried to set up a new configuration, couldn't get it to work, and disabled the server meaning to come back to it later. Whatever. Maybe we'll find out when Matt is back from his vacation, maybe we won't - no big deal either way (he's earned the time off many times over).

What I'm saying is - I'm not sure we can put the low rates from 09:00 to 17:00, and the relative jump after 17:00, down purely to "flooded with queued connections".
John McLeod VII · Joined: 15 Jul 99 · Posts: 24806 · Credit: 790,712 · RAC: 0
One thought I have had.....

This has been implemented and checked in. It has NOT made it as far as test code yet, though.

BOINC WIKI
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14674 · Credit: 200,643,578 · RAC: 874
You know, some people had pointed that out already in this same thread... ;-)

And now we are three.
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13835 · Credit: 208,696,464 · RAC: 304
The upload server was disabled until around 09:00 local Wednesday. Then it was turned on, and nothing happened.....

I was thinking along similar lines. Configuration tweak or otherwise, normally as soon as the outbound traffic drops, if there is a backlog of uploads waiting to happen, it happens. Yet after the upload server came back online (and there was relatively bugger all download traffic at the time) there was only the slightest increase in upload traffic.

Grant
Darwin NT
nero · Joined: 28 Jun 03 · Posts: 5 · Credit: 18,414 · RAC: 0
Hi guys

Just a query: the program says I have 3 work units ready to report. They have been sitting in tasks for days. The other work units have been uploaded. Is this an issue with the program or the server?
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13835 · Credit: 208,696,464 · RAC: 304
Hi guys. Just a query: the program says I have 3 work units ready to report. They have been sitting in tasks for days. The other work units have been uploaded. Is this an issue with the program or the server?

Neither. Reporting tends to put a fair load on the database, so it's only done when absolutely necessary. From memory, it's generally when requesting more work, or when the deadline of a result is close.

Grant
Darwin NT
nero · Joined: 28 Jun 03 · Posts: 5 · Credit: 18,414 · RAC: 0
Thanks Grant. I will wait till the other work units are done before I request more work. The ones that are ready for reporting are not due till next month.
nero · Joined: 28 Jun 03 · Posts: 5 · Credit: 18,414 · RAC: 0
Just to let you know, they reported when I finished typing the last message. ET must be around QLD Australia (attempt at a joke).
Jord · Joined: 9 Jun 99 · Posts: 15184 · Credit: 4,362,181 · RAC: 3
You know, some people had pointed that out already in this same thread... ;-)

Four, including John's reply just before your reply. :-)
Richard Haselgrove · Joined: 4 Jul 99 · Posts: 14674 · Credit: 200,643,578 · RAC: 874
You know, some people had pointed that out already in this same thread... ;-)

Which is what I was commenting on. OK, you got me: I can't count.
John McLeod VII · Joined: 15 Jul 99 · Posts: 24806 · Credit: 790,712 · RAC: 0
Hi guys. Just a query: the program says I have 3 work units ready to report. They have been sitting in tasks for days. The other work units have been uploaded. Is this an issue with the program or the server?

Tasks are reported at the first of:
1) 24 hours before the report deadline.
2) "Connect every X" before the report deadline.
3) On completion of upload, if after 1 or 2.
4) 24 hours after completion.
5) On a work request.
6) On the report of any other task.
7) On a trickle-up message (CPDN only, as far as I know).
8) On a trickle-down request (no projects that I am aware of do this).
9) On a server-specified minimum connect interval.
10) When the user pushes the "Update" button.

BOINC WIKI
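John's list can be condensed into a small decision function. The Python sketch below is only an illustration of those rules, not actual BOINC client code; the Task fields and the should_report() helper are invented for the example, and a few of the rules (3, 7, 8, 9) are left out to keep it short.

```python
# Illustrative condensation of the reporting rules listed above.
# Not actual BOINC client code; field and function names are made up.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    deadline: float                  # report deadline (seconds since epoch)
    completed_at: Optional[float]    # when the result finished uploading, or None
    uploaded: bool = False

def should_report(task: Task, now: float,
                  connect_every_x: float,
                  requesting_work: bool,
                  reporting_other_task: bool,
                  user_pressed_update: bool) -> bool:
    """Return True if this task would be reported now, per the list above."""
    if now >= task.deadline - 24 * 3600:            # 1) within 24 h of deadline
        return True
    if now >= task.deadline - connect_every_x:      # 2) within "connect every X"
        return True
    if task.uploaded and task.completed_at is not None:
        if now >= task.completed_at + 24 * 3600:    # 4) 24 h after completion
            return True
    if requesting_work:                             # 5) piggy-back on a work request
        return True
    if reporting_other_task:                        # 6) piggy-back on another report
        return True
    if user_pressed_update:                         # 10) manual "Update"
        return True
    # Rules 3, 7, 8 and 9 (upload-completion ordering, trickle messages,
    # server-specified connect interval) are omitted to keep the sketch short.
    return False
```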
1mp0£173 · Joined: 3 Apr 99 · Posts: 8423 · Credit: 356,897 · RAC: 0
You know, some people had pointed that out already in this same thread... ;-)

Which brings up the question: do we also count the people pointing out that it's already been suggested?
.clair. · Joined: 4 Nov 04 · Posts: 1300 · Credit: 55,390,408 · RAC: 69
You know, some people had pointed that out already in this same thread... ;-)

Err, how many years are you going back . . ;)

Now then, if they switch the forums off during the network / multiple motorway pileup days, how much bandwidth can that save, without us being able to `talk` about it?? ;)

edit - this thread is getting a nice s i z e . . . .
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13835 · Credit: 208,696,464 · RAC: 304
Now then, if they switch the forums off during the network / multiple motorway pileup days, how much bandwidth can that save?

None. The forums use campus bandwidth; uploads & downloads go through a different network.

Grant
Darwin NT
ML1 · Joined: 25 Nov 01 · Posts: 20982 · Credit: 7,508,002 · RAC: 20
Four days on and the downloads continue to be maxed out on the s@h 100 Mbit/s bottleneck, strangling the control packets for all uploads and strangling the downloads themselves down to likely much less than max (lossless) link capacity...

Sooo... With a saturated link, what usable download rate is actually being achieved amongst all the TCP resends?...

Is some server-side traffic management being put in place? As a bodge-fix, just simply limit the WU supply to keep the download traffic to less than 80 Mbit/s? ...Or?

Regards,
Martin

Indeed so... Working exactly as expected.

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
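Martin's question about the usable rate through a saturated link can be roughed out with the well-known Mathis et al. approximation for steady-state TCP throughput, roughly MSS / (RTT * sqrt(p)) for packet-loss rate p. The RTT, MSS and loss figures below are illustrative guesses, not measurements of the Berkeley link.

```python
# Rough per-connection TCP goodput under loss, via the Mathis approximation:
# throughput ~= MSS / (RTT * sqrt(p)). Numbers are illustrative guesses only.

from math import sqrt

MSS = 1460 * 8          # bits per segment (typical Ethernet MSS)
RTT = 0.150             # seconds, a guess for a long-haul volunteer connection

for loss in (0.0001, 0.001, 0.01, 0.05):
    goodput = MSS / (RTT * sqrt(loss))      # bits per second, per connection
    print(f"loss {loss*100:5.2f}%  ->  ~{goodput/1e6:6.2f} Mbit/s per TCP stream")
```

The point matches Martin's: once the queues start dropping packets heavily, each stream's useful rate collapses, so the aggregate goodput can sit far below the 90+ Mbit/s the Cricket graph shows on the wire.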
1mp0£173 · Joined: 3 Apr 99 · Posts: 8423 · Credit: 356,897 · RAC: 0
As a bodge-fix, just simply limit the WU supply to limit the download traffic to less than 80Mbit/s?

At this point, the problem isn't the newly assigned work, but work already downloaded and work that has been completed and not yet uploaded. Stopping work unit production completely would eventually stop new downloads, but the download link would still be saturated until the backlog already assigned gets through.
ML1 · Joined: 25 Nov 01 · Posts: 20982 · Credit: 7,508,002 · RAC: 20
As a bodge-fix, just simply limit the WU supply to limit the download traffic to less than 80Mbit/s?

Crossed wires on the directions?... Note that http://fragment1.berkeley.edu/newcricket/grapher.cgi?target=%2Frouter-interfaces%2Finr-250%2Fgigabitethernet2_3;ranges=d;view=Octets shows the view wrt the router at "the bottom of the hill looking up". The saturated direction is downloads: Berkeley servers -> clients around the world.

In whatever way, the rate at which new WUs are made available for download shouldn't exceed the link capacity, including a good margin for bursts. Indeed, the present overload won't clear until the presently assigned WUs have cleared or their release rate is controlled. Or unless packet-level traffic management is imposed...

The uploads (client WU results -> Berkeley servers) have plenty of spare bandwidth to upload freely, IF the upload TCP connections had guaranteed success for their return data packets to get through the downlink. There is a recent demonstration of the effect mentioned here and also here.

Whatever is done, wherever, and at whatever level, the link in BOTH directions must be kept at something like 89 Mbit/s or less for 'smooth' operation to gain MAXIMUM transfer rates. Although the link shows 90+ Mbit/s downlink, with all the repeated resends due to dropped packets there's going to be very much less than 90 Mbit/s of useful data making it through. That is, the effective bandwidth will be very poor whilst saturated.

The source problem is in allowing an unlimited flood of data into a very finite internet connection. Infinite into finite doesn't work... All of which I'm sure must be obvious.

(Note that data link "policing" is highly wasteful of data bandwidth. Sure, TCP will mop up the mess, but at a high cost of greatly wasted bandwidth...)

Happy crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
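One way to picture "control the release rate" is as a token bucket in front of work-unit dispatch. This is only a sketch of the idea under assumptions taken from the surrounding posts (an 80 Mbit/s budget and roughly 0.36 MB per work unit); it is not how the BOINC scheduler is actually configured, and the class and function names are made up.

```python
# Illustrative token-bucket limiter for work-unit dispatch, sized so downloads
# stay under a target link budget. Not the real BOINC scheduler; the budget
# and per-WU size come from the surrounding discussion.

import time

LINK_BUDGET_MBIT = 80.0          # target ceiling from Martin's post
WU_SIZE_MBIT = 0.36 * 8          # ~0.36 MB per multibeam work unit (approx.)
MAX_WU_PER_SEC = LINK_BUDGET_MBIT / WU_SIZE_MBIT   # roughly 27-28 WU/s

class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def try_take(self, n=1):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

bucket = TokenBucket(rate_per_sec=MAX_WU_PER_SEC, burst=100)

def maybe_dispatch(workunit):
    """Hand out a work unit only if the download budget allows it right now."""
    if bucket.try_take():
        return workunit          # scheduler would send this WU
    return None                  # client is told to come back later
```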
DJStarfox · Joined: 23 May 01 · Posts: 1066 · Credit: 1,226,053 · RAC: 2
Martin, I agree with a lot of what you're saying. This problem is actually very simple: demand for WUs is greater than the WU creation rate. If I'm doing my math correctly, the WU creation rate peaks around 8.2 MB/sec (corresponding to 23 WU/sec). Demand already exceeds this and would probably be higher if they had the bandwidth.

Short of finding a way to DOUBLE the WU creation rate, the only option is to add latency to the downloads. The easiest way to do this would be to cap download bandwidth at the router. Is traffic shaping imposed? If not, I would be shocked, as this is the quickest and easiest way to help the situation (assuming the router(s) in place have this capability).

It makes no sense to flood the clients with WUs, because it makes the database (results in the field) grow to an unmanageable size. So the only quick solution for now is to flow-control the download speeds. With slower, more reliable downloads/transactions, there will be fewer retries/resends. It should also give the splitters a little breathing room to build a queue (during the slow times of day).

Edit: OK, I see the splitter WU creation rate at 39 WU/sec, corresponding to 14.1 MB/sec of WUs. Perhaps there is less I/O contention because fewer people are downloading/querying this late at night. Still, my recommendation for traffic shaping (or changing its parameters) stands.
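For what it's worth, the two figures in the post above are mutually consistent and imply roughly 0.36 MB per work unit. A quick back-of-envelope check (the per-WU size is inferred from those figures, not an official number):

```python
# Back-of-envelope check of the WU-rate / bandwidth figures in the post above.
# The ~0.36 MB per work unit is inferred from those figures, not an official size.

def mb_per_sec(wu_per_sec, wu_size_mb=0.36):
    return wu_per_sec * wu_size_mb

for rate in (23, 39):
    mb = mb_per_sec(rate)
    print(f"{rate} WU/s  ->  ~{mb:.1f} MB/s  (~{mb*8:.0f} Mbit/s)")
# 23 WU/s -> ~8.3 MB/s (~66 Mbit/s); 39 WU/s -> ~14.0 MB/s (~112 Mbit/s),
# i.e. the second rate alone would more than fill a 100 Mbit/s link.
```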
Grant (SSSF) · Joined: 19 Aug 99 · Posts: 13835 · Credit: 208,696,464 · RAC: 304
Edit: OK, I see the splitter WU creation rate at 39 WU/sec, corresponding to 14.1 MB/sec of WUs. Perhaps there is less I/O contention because fewer people are downloading/querying this late at night.

The splitters have on occasion pumped out as many as 50 MultiBeam work units per second, which is way more than required. Generally around 15-18/s is enough to meet demand; any more than that builds up the Ready to Send buffer. When there are a lot of shorties, around 20-25/s is enough to meet demand; any more than that builds up the buffer. With AP the present demand is a bit more difficult to work out, but it would appear that once caches are full, 1-2 per second is more than enough to build up a Ready to Send buffer.

Grant
Darwin NT