Working as Expected (Jul 13 2009) |
Message boards : Technical News : Working as Expected (Jul 13 2009)
| Author | Message |
|---|---|
Gigabit is one solution, but looking at the progress rate of SETI it is only a temporary one - S@H will face the same hardware problem again in the future - I think the better way is to distribute the backend of the project across many sites, preferably spread around the globe. Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date. ____________ Boinc....Boinc....Boinc....Boinc.... | |
| ID: 918370 · | |
Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date. You can be sure files won't be modified. There are checksums on all files, and additionally digital signatures for executable files. For uploads, the client first sends the file to the upload server, then sends checksums when reporting the task (which wouldn't go through any intermediate server). ____________ Contribute to the Wiki! | |
| ID: 918371 · | |
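For anyone wondering how the checksum step protects a file that passes through an intermediate server, here is a minimal sketch. It assumes SHA-256 purely for illustration (the post above does not say which checksum is used), and the names `file_digest` and `verify_upload` are made up for this example, not taken from BOINC.

```python
import hashlib

def file_digest(data: bytes) -> str:
    """Return the hex digest of a file's contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()

def verify_upload(received: bytes, reported_digest: str) -> bool:
    """Compare what the server received with the digest the client reported."""
    return file_digest(received) == reported_digest

# Hypothetical result file as produced on the cruncher.
original = b"<result><name>example_wu_0</name></result>\n"
digest_from_client = file_digest(original)

# The same file after being altered somewhere in transit.
tampered = original.replace(b"example_wu_0", b"example_wu_1")

print(verify_upload(original, digest_from_client))   # unmodified file
print(verify_upload(tampered, digest_from_client))   # modified file
```

Because the digest travels with the task report rather than with the file, a relay that changed the file would also have to change the report to go unnoticed.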
The project scientists don't have control while the work is on our crunchers. That's why two crunchers work each WU. ____________ | |
| ID: 918374 · | |
This may seem silly, but what would the bandwidth of a couple of large HDDs or tapes (say, terabyte or larger) be if sent by courier or post from various other sites back to Berkeley? It would get around the issue of bandwidth consumption, and the data could be stripped off the drive and confirmed before it was wiped from the satellite systems doing the actual receiving... Low-tech I know, and there would be the delay of receiving and loading the data, but it's a thought... | |
| ID: 918379 · | |
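To put a rough number on the courier question, here is a back-of-the-envelope sketch. The 1 TB capacity comes from the post; the 24-hour and 72-hour transit times are assumptions chosen only for illustration, and the figures ignore the time needed to copy data on and off the drive.

```python
def courier_bandwidth_mbit_s(capacity_bytes: float, transit_hours: float) -> float:
    """Effective throughput of a shipped drive, in megabits per second."""
    bits = capacity_bytes * 8
    seconds = transit_hours * 3600
    return bits / seconds / 1e6

ONE_TB = 1e12  # decimal terabyte, as drive vendors count it

overnight = courier_bandwidth_mbit_s(ONE_TB, 24)   # assumed overnight courier
three_day = courier_bandwidth_mbit_s(ONE_TB, 72)   # assumed ordinary post

print(f"1 TB in 24 h: {overnight:.1f} Mbit/s")
print(f"1 TB in 72 h: {three_day:.1f} Mbit/s")
```

Throughput scales with the number of drives in the box, but the latency stays at the full transit time, which is the scheduling problem raised in the next reply.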
The biggest problem with that is that the scheduler can't assign the work while the big hard drives are in transit. ... of course, we're assuming that there is someone at the other end who is willing to swap disks. ____________ | |
| ID: 918380 · | |
Would a set of remote upload servers as "data aggregators" work? | |
| ID: 918399 · | |
Would a set of remote upload servers as "data aggregators" work? I thought of this too when people started to suggest multiple upload servers spread worldwide. A quick zip indicates that 58 finished results take up approx. 275 KB, so that would equal about 46 MB for every 10,000 results. That file could reside in an incoming folder with a cron script unpacking *.zip into that same folder. Very nice idea actually. In my view worth looking into. Kind regards Vyper P.S I'm sure that Swedish University Network (www.sunet.se) here in Sweden would donate their gigabit links for this purpose, perhaps Telia also. D.S ____________ _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group | |
| ID: 918402 · | |
A quick zip indicates that 58 finished results take up approx. 275 KB, so that would equal about 46 MB for every 10,000 results. Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. So all you need is a server capable of running Apache/fastcgi, with a public-facing NIC capable of handling that many connections (the only stress point), and a private-facing NIC which can transfer one 45 MB file every seven or eight minutes. That sounds do-able: didn't someone post a link saying that Campus had some free refurb servers available for bidding? Edit - better get a RAID / hot spare / fresh set of server-class, 5-year warranty, 15,000 rpm disk drives though - they're going to take a hammering! | |
| ID: 918406 · | |
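The arithmetic behind those figures can be checked with a short sketch. The inputs (58 results in about 275 KB zipped, 70,000 to 80,000 results per hour, batches of 10,000) are the ones quoted in the posts; the function names are hypothetical.

```python
def batch_size_mb(sample_results: int, sample_kb: float, batch_results: int) -> float:
    """Scale a measured zip size up to a full batch, in MB (1 MB = 1024 KB)."""
    return sample_kb / sample_results * batch_results / 1024

def minutes_per_batch(results_per_hour: int, batch_results: int) -> float:
    """How often a full batch accumulates at a given upload rate."""
    return 60 * batch_results / results_per_hour

size = batch_size_mb(58, 275, 10_000)
fast = minutes_per_batch(80_000, 10_000)
slow = minutes_per_batch(70_000, 10_000)

print(f"10,000 results: about {size:.1f} MB zipped")
print(f"one batch every {fast:.1f} to {slow:.1f} minutes")
```

So a batch of 10,000 results comes to a little over 46 MB, and at the sustained rates mentioned one such batch fills every 7.5 to 8.6 minutes, close to the "seven or eight minutes" estimate above.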
Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. Looking at the 90 day graphs shows quite a few peaks over 100,000, and a couple over 200,000. ____________ Grant Darwin NT. | |
| ID: 918414 · | |
Eric is close to implementing a configuration change which will increase the resolution of chirp rates (thus increasing analysis/sensitivity) Can you tell me more about this increase in chirp rate resolution? The maximum bandwidth of SETI@home is 1200 Hz, I guess. Why not increase this as well? 2400 Hz? 4800 Hz? 9600 Hz? HTH. | |
| ID: 918415 · | |
A quick zip indicates that 58 finished results take up approx. 275 KB, so that would equal about 46 MB for every 10,000 results. That would be rather cool, and the more places you could put an upload server, the less load there would be on each. The upload server would also need to check the country setting on the user's account: a server in Scandinavia would accept connections from users resident in Scandinavia, England, Russia, Ireland etc. and deny the rest. If the country is not set, or is set to "International", Berkeley doesn't send out that server's IP to connect to but defaults to its own slower connection. By implementing that you tie users to their country locations and to the servers they could connect to for a reliable connection. Should this be doable? Perhaps the upload server and scheduler could be made "more intelligent" if multiple upload servers reside in different countries. Kind regards Vyper ____________ _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group | |
| ID: 918417 · | |
Would a set of remote upload servers as "data aggregators" work? Unfortunately, zip is not an option for a few reasons. First, that particular data does not zip very well; second, the server CPUs are just about saturated anyway, so they do not have the extra CPU power to do the unzips; and third, reports cannot be made until the server has the data. The compression ratio on the data as tested ranges from about 3% for AP to about 17% for SETI. This is not enough to fix the problem. The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either. Reports are asynchronous and can occur at any time after the file is uploaded. If the report is made, and the file cannot be located, the report will be rejected along with the credit request. You still have to cram all of the data through the same pipe, it is only in a larger file instead of a bunch of smaller ones. ____________ BOINC WIKI | |
| ID: 918418 · | |
Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. Yes, that's why I said "sustained". Those will be reporting peaks, not upload peaks: typically, when the upload server has been left open through a 4-hour maintenance window, and every upload is reported within 1 hour (the maintenance backoff interval) after the scheduler comes back up. My suggested data aggregator (you could also call it a communications multiplexor) would also act as a buffer, helping to smooth the upload peaks even more. And if it did reach its incoming connection limit - well, we'd just get backoffs and retries, as now. At least it would (I hope - waiting for the networking gurus to check it over) de-couple the upload problems from the download saturation. | |
| ID: 918420 · | |
Unfortunately, zip is not an option for a couple of reasons. First, that particular data does not zip very well... John, No, no, NO! These are uploads I'm talking about. They are tiny text files, ASCII/XML, and they compress very sweetly. And even if they didn't, it wouldn't matter much. As Joe Segur says, My firm belief is there's no bandwidth problem on the link going to SSL, rather it's a transaction rate problem. My suggestion is addressed at solving the transaction overhead by combining the files. But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive. Can I offer the payback of losing the comms overhead, as a consolation? | |
| ID: 918422 · | |
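The "combine the files" idea can be sketched in a few lines. This is only an illustration of the aggregator concept under stated assumptions: the file names and XML payloads are synthetic stand-ins, not real SETI@home results, and `bundle_results` / `unpack_results` are hypothetical helpers rather than anything in the BOINC server code.

```python
import io
import zipfile

def bundle_results(results: dict) -> bytes:
    """Aggregator side: pack many small result files into one zip archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, payload in results.items():
            zf.writestr(name, payload)
    return buf.getvalue()

def unpack_results(archive: bytes) -> dict:
    """Receiving side: recover every result file from the archive."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

# 58 synthetic ASCII/XML results; real results will compress differently.
fake_results = {
    f"result_{i}.xml": (
        f"<result id='{i}'>" + "<spike power='1.0'/>" * 200 + "</result>"
    ).encode("ascii")
    for i in range(58)
}

archive = bundle_results(fake_results)
raw_size = sum(len(p) for p in fake_results.values())

print(f"{len(fake_results)} files, {raw_size} bytes raw, {len(archive)} bytes zipped")
print("round trip intact:", unpack_results(archive) == fake_results)
```

The synthetic payloads are far more repetitive than real results, so the size reduction printed here says nothing about actual ratios. The point is the transaction count: 58 uploads become one transfer, at the price of an unzip on the receiving server, which is the CPU cost raised earlier in the thread.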
The compression ratio on the data as tested ranges from about 3% for AP to about 17% for SETI. This is not enough to fix the problem. Nope, if you are referring to work downloads then there is no point zipping it, but in terms of uploads the result is compressed roughly 80% for me: 28K gets to around 5K in size for uploads to Berkeley. The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either. OK, I didn't know that the CPUs are stalled at Berkeley, I only thought that there was an excessive amount of disk access and database shuffling. You still have to cram all of the data through the same pipe, it is only in a larger file instead of a bunch of smaller ones. That's correct, but in terms of TCP/IP efficiency it's much better to have one connection transferring 40 MB of data instead of 25,000 connections that all the network equipment needs to account for. Don't forget that switches get congested too, and with those large numbers of people trying to connect, many switches don't even have enough memory to keep track of all the MAC addresses connecting. //Vyper ____________ _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group | |
| ID: 918423 · | |
John, Oops you got me there :) You were 4 minutes faster.. //Vyper ____________ _________________________________________________________________________ Addicted to SETI crunching! Founder of GPU Users Group | |
| ID: 918424 · | |
This is my first post to this (any) board, although I have been lurking for years. I'm sure they already thought of this and probably tried it but I need to get the thought out of my head. If S@H is having server problems or network problems maybe they can distribute the load. Seti@home@home or BOINC@home. If they need uploads collected, combined and stored for a while SIGN ME UP. Make it a BOINC project ! | |
| ID: 918437 · | |
One thought I have had..... | |
| ID: 918441 · | |
Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries. | |
| ID: 918443 · | |
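The difference between per-file random delays and one shared delay can be shown with a small sketch. This is not BOINC's actual backoff algorithm: the exponential growth, the 60-second base, the 4-hour cap and the name `retry_delay` are all assumptions made for illustration.

```python
import random

def retry_delay(failures: int, rng: random.Random,
                base: float = 60.0, cap: float = 4 * 3600.0) -> float:
    """Pick a pseudo-random delay in seconds after `failures` failed attempts.

    The ceiling doubles with each failure up to `cap`; the delay is drawn
    uniformly from the upper half of that range.
    """
    ceiling = min(cap, base * 2 ** failures)
    return rng.uniform(ceiling / 2, ceiling)

rng = random.Random(2009)  # fixed seed so the example is repeatable

# Present scheme as described above: each failed upload draws its own delay.
per_file = [retry_delay(3, rng) for _ in range(10)]

# Alternative under discussion: every pending upload shares one delay.
shared = [retry_delay(3, random.Random(2009))] * 10

print("distinct retry times, per file:", len(set(per_file)))
print("distinct retry times, shared:  ", len(set(shared)))
```

With a shared delay all ten uploads come back at the same instant, while independent draws spread them across the window, which is the spreading effect described in the post above.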
Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries. YES, and NO... BOINC has a built-in setting which only allows x number of simultaneous transfers (Default = 2 per project) ____________ Flying high with Team Sicituradastra. | |
| ID: 918445 · | |
| Copyright © 2013 University of California |