Working as Expected (Jul 13 2009)



Profile Geek@Play
Volunteer tester
Avatar
Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 279
United States
Message 918370 - Posted: 16 Jul 2009, 4:30:53 UTC - in response to Message 918356.

Gigabit is one solution, but looking at the progress rate of SETI it is only a temporary one - S@H will run up against the hardware problem again in the future - I think the better way is to distribute the back end of the project across many sites, preferably spread around the globe.

Many people suggested putting multiple servers around the world for clients to download from, so the bandwidth is distributed. But how do you get the data to those servers in the first place? It would have to go from the servers at Berkeley to the distributed download servers through the current 100 Mbit pipe. I don't see how that lowers the bandwidth requirement at all.

Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date.
____________
Boinc....Boinc....Boinc....Boinc....

Nicolas
Avatar
Send message
Joined: 30 Mar 05
Posts: 160
Credit: 10,335
RAC: 0
Argentina
Message 918371 - Posted: 16 Jul 2009, 4:38:23 UTC - in response to Message 918370.

Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date.

You can be sure files won't be modified. There are checksums on all files, and additionally digital signatures for executable files.

For uploads, the client first sends the file to the upload server, then sends checksums when reporting the task (which wouldn't go through any intermediate server).
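As a rough sketch of the integrity check described here (the function names, and the choice of MD5, are illustrative assumptions rather than BOINC's actual server code), a server-side verification could look something like this:

```python
# Hypothetical sketch (not BOINC's actual server code): verify that an
# uploaded result file matches the checksum reported by the client.
import hashlib

def md5_of_file(path: str) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def upload_is_intact(upload_path: str, reported_md5: str) -> bool:
    """True if the stored upload matches the checksum sent with the report."""
    return md5_of_file(upload_path) == reported_md5
```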
____________

Contribute to the Wiki!

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918374 - Posted: 16 Jul 2009, 4:45:13 UTC - in response to Message 918370.


Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date.

The project scientists don't have control while the work is on our crunchers.

That's why two crunchers work each WU.
____________

Red_Wolf_2
Send message
Joined: 24 Sep 00
Posts: 1
Credit: 2,459,165
RAC: 0
Australia
Message 918379 - Posted: 16 Jul 2009, 5:05:21 UTC - in response to Message 918356.


Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


This may seem silly, but what would the effective bandwidth of a couple of large HDDs or tapes (say, a terabyte or larger) be if sent by courier or post from various other sites back to Berkeley? It would get around the issue of bandwidth consumption, and the data could be taken off the drive and confirmed before it was wiped from the satellite systems doing the actual receiving... Low-tech, I know, and there would be the delay of receiving and loading the data, but it's a thought...
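For a sense of scale (the drive size and transit time below are assumed figures, and loading/unloading time is ignored), the effective bandwidth of a shipped drive works out roughly like this:

```python
# Back-of-the-envelope figures (assumed for illustration): effective
# bandwidth of shipping a 1 TB drive that takes 3 days to arrive.
capacity_bits = 1e12 * 8          # 1 TB expressed in bits
transit_seconds = 3 * 24 * 3600   # 3 days door to door
effective_mbit_s = capacity_bits / transit_seconds / 1e6
print(f"{effective_mbit_s:.0f} Mbit/s sustained")   # roughly 31 Mbit/s per drive
# several drives shipped in parallel scale this linearly
```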

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918380 - Posted: 16 Jul 2009, 5:10:45 UTC - in response to Message 918379.


Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


This may seem silly, but what would the effective bandwidth of a couple of large HDDs or tapes (say, a terabyte or larger) be if sent by courier or post from various other sites back to Berkeley? It would get around the issue of bandwidth consumption, and the data could be taken off the drive and confirmed before it was wiped from the satellite systems doing the actual receiving... Low-tech, I know, and there would be the delay of receiving and loading the data, but it's a thought...

The biggest problem with that is that the scheduler can't assign the work while the big hard drives are in transit.

... of course, we're assuming that there is someone at the other end who is willing to swap disks.
____________

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918399 - Posted: 16 Jul 2009, 8:37:17 UTC
Last modified: 16 Jul 2009, 8:57:43 UTC

Would a set of remote upload servers acting as "data aggregators" work?

Each would receive many small files and handle many TCP connections; then, once an hour or so, it would zip them all up into one big compressed file and need only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload syn/ack packets off the 100 Mbit link. If it works there, it's scalable to other campuses, other continents as SETI grows.
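A minimal sketch of such an aggregator, assuming hypothetical spool directories and leaving the actual bulk transfer back to Berkeley out of scope (this is not an existing BOINC component):

```python
# Hypothetical sketch of the "data aggregator" idea: bundle the small
# result files received in the last hour into one zip so that only one
# connection back to the project is needed. Paths are assumptions.
import os, time, zipfile

INCOMING = "/var/spool/aggregator/incoming"   # where client uploads land
OUTGOING = "/var/spool/aggregator/outgoing"   # picked up by one bulk transfer

def bundle_once() -> str:
    os.makedirs(OUTGOING, exist_ok=True)
    names = os.listdir(INCOMING)
    batch = os.path.join(OUTGOING, f"uploads-{int(time.time())}.zip")
    with zipfile.ZipFile(batch, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(os.path.join(INCOMING, name), arcname=name)
    for name in names:               # delete only once the archive is complete
        os.remove(os.path.join(INCOMING, name))
    return batch

if __name__ == "__main__":
    print("wrote", bundle_once())    # run hourly, e.g. from cron
```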

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918402 - Posted: 16 Jul 2009, 9:16:52 UTC - in response to Message 918399.
Last modified: 16 Jul 2009, 9:21:18 UTC

Would a set of remote upload servers acting as "data aggregators" work?

Each would receive many small files and handle many TCP connections; then, once an hour or so, it would zip them all up into one big compressed file and need only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload syn/ack packets off the 100 Mbit link. If it works there, it's scalable to other campuses, other continents as SETI grows.


I thought of this too when people started to suggest multiple upload servers spread around the world.

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal about 46 MB for every 10,000 results.
That file could reside in an incoming folder, with a cron script unpacking *.zip into that same folder.
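A minimal sketch of that receiving side, assuming a hypothetical incoming directory (the real server layout may well differ); it could be run hourly from cron:

```python
# Hypothetical counterpart at the receiving end: unpack any *.zip found
# in the incoming folder, then remove the archive so the folder does not
# fill up with old zips. The directory name is an assumption.
import glob, os, zipfile

INCOMING = "/var/spool/uploads/incoming"

for archive in glob.glob(os.path.join(INCOMING, "*.zip")):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(INCOMING)   # results reappear as individual files
    os.remove(archive)
```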

Very nice idea actually.

In my perspective worth looking into.

Kind regards Vyper

P.S. I'm sure that the Swedish University Network (www.sunet.se) here in Sweden would donate their gigabit links for this purpose, perhaps Telia also. D.S.
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918406 - Posted: 16 Jul 2009, 10:27:36 UTC - in response to Message 918402.
Last modified: 16 Jul 2009, 10:43:52 UTC

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal about 46 MB for every 10,000 results.
That file could reside in an incoming folder, with a cron script unpacking *.zip into that same folder.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. So all you need is a server capable of running Apache/fastcgi, with a public-facing NIC capable of handling that many connections (the only stress point), and a private-facing NIC which can transfer one 45 MB file every seven or eight minutes. That sounds doable: didn't someone post a link saying that Campus had some free refurb servers available for bidding?

Edit - better get a RAID / hot spare / fresh set of server-class, 5-year warranty, 15,000 rpm disk drives though - they're going to take a hammering!
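Spelling out the arithmetic behind those figures (the 275 KB per 58 zipped results and the 80,000 results per hour come from the posts above; the rest is just unit conversion):

```python
# Rough check of the figures quoted in the thread.
kb_per_result = 275 / 58                          # ~4.7 KB per zipped result
results_per_hour = 80_000
mb_per_hour = results_per_hour * kb_per_result / 1024
per_batch_mb = 10_000 * kb_per_result / 1024      # ~46 MB per 10,000 results
minutes_between_batches = 60 / (results_per_hour / 10_000)
print(f"{mb_per_hour:.0f} MB/hour, one {per_batch_mb:.0f} MB file every "
      f"{minutes_between_batches:.1f} minutes")   # ~370 MB/hour, 46 MB every 7.5 min
```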

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5955
Credit: 62,523,780
RAC: 40,740
Australia
Message 918414 - Posted: 16 Jul 2009, 11:29:54 UTC - in response to Message 918406.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that.

Looking at the 90 day graphs shows quite a few peaks over 100,000, and a couple over 200,000.
____________
Grant
Darwin NT.

HTH
Volunteer tester
Send message
Joined: 8 Jul 00
Posts: 690
Credit: 835,288
RAC: 0
Finland
Message 918415 - Posted: 16 Jul 2009, 11:34:17 UTC - in response to Message 917472.

Eric is close to implementing a configuration change which will increase the resolution of chirp rates (thus increasing analysis/sensitivity)


Can you tell me more about this increase in chirp rate resolution?

The maximum bandwidth of SETI@home is 1200 Hz, I guess. Why not increase this as well? 2400 Hz? 4800 Hz? 9600 Hz?

HTH.

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918417 - Posted: 16 Jul 2009, 11:47:51 UTC - in response to Message 918406.

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal about 46 MB for every 10,000 results.
That file could reside in an incoming folder, with a cron script unpacking *.zip into that same folder.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. So all you need is a server capable of running Apache/fastcgi, with a public-facing NIC capable of handling that many connections (the only stress point), and a private-facing NIC which can transfer one 45 MB file every seven or eight minutes. That sounds doable: didn't someone post a link saying that Campus had some free refurb servers available for bidding?

Edit - better get a RAID / hot spare / fresh set of server-class, 5-year warranty, 15,000 rpm disk drives though - they're going to take a hammering!


That would be rather cool, but the more places you can put an upload server, the less load on each one.
The upload server would also need to check the user's country setting on their account: if I'm resident in Scandinavia it would accept my connections on that server (likewise England, Russia, Ireland, etc.), and if no country is set, or it's set to International, Berkeley wouldn't send out that server's IP to connect to but would default to its own slower connection.

By implementing that, you pin users to their country locations and to the servers they could connect to for a reliable connection.

Should this be doable? Perhaps the upload server and scheduler would have to be made "more intelligent" if multiple upload servers reside in different countries.
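A rough sketch of that routing rule, with an entirely made-up lookup table and URLs (nothing like this exists in the real scheduler):

```python
# Hypothetical country-based choice of upload server, falling back to the
# project's own upload URL when no country (or "International") is set.
from typing import Optional

REGIONAL_UPLOAD_SERVERS = {
    "Sweden":         "http://upload-scandinavia.example.org/file_upload_handler",
    "United Kingdom": "http://upload-uk.example.org/file_upload_handler",
    "Russia":         "http://upload-ru.example.org/file_upload_handler",
}
# placeholder for the project's own upload URL
DEFAULT_UPLOAD_SERVER = "http://setiathome.berkeley.edu/file_upload_handler"

def upload_url_for(country: Optional[str]) -> str:
    """Return the upload URL the scheduler would hand out for this account."""
    if not country or country == "International":
        return DEFAULT_UPLOAD_SERVER
    return REGIONAL_UPLOAD_SERVERS.get(country, DEFAULT_UPLOAD_SERVER)
```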

Kind regards Vyper
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

John McLeod VII
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 531,382
RAC: 345
United States
Message 918418 - Posted: 16 Jul 2009, 11:52:45 UTC - in response to Message 918399.

Would a set of remote upload servers acting as "data aggregators" work?

Each would receive many small files and handle many TCP connections; then, once an hour or so, it would zip them all up into one big compressed file and need only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload syn/ack packets off the 100 Mbit link. If it works there, it's scalable to other campuses, other continents as SETI grows.

Unfortunately, zip is not an option, for a few reasons. First, that particular data does not zip very well; second, the server CPUs are just about saturated anyway, so they do not have the extra CPU power to do the unzips; and third, reports cannot be made until the server has the data.

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.

The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either.

Reports are asynchronous and can occur at any time after the file is uploaded. If the report is made, and the file cannot be located, the report will be rejected along with the credit request.

You still have to cram all of the data through the same pipe, it is only in a larger file instead of a bunch of smaller ones.
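To make that point concrete (the 3% and 17% figures are from the post above, which later replies read as applying to workunit downloads; the 100 Mbit link figure is from earlier in the thread), even the better ratio barely dents a saturated pipe:

```python
# Even the best quoted saving leaves a saturated 100 Mbit/s link nearly full.
link_mbit = 100
for name, saving in (("Astropulse", 0.03), ("SETI multibeam", 0.17)):
    needed_after = link_mbit * (1 - saving)
    print(f"{name}: a full 100 Mbit/s of traffic still needs "
          f"{needed_after:.0f} Mbit/s after compression")
```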
____________


BOINC WIKI

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918420 - Posted: 16 Jul 2009, 11:54:57 UTC - in response to Message 918414.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that.

Looking at the 90 day graphs shows quite a few peaks over 100,000, and a couple over 200,000.

Yes, that's why I said "sustained".

Those will be reporting peaks, not upload peaks: typically, when the upload server has been left open through a 4-hour maintenance window, and every upload is reported within 1 hour (the maintenance backoff interval) after the scheduler comes back up.

My suggested data aggregator (you could also call it a communications multiplexor) would also act as a buffer, helping to smooth the upload peaks even more. And if it did reach its incoming connection limit - well, we'd just get backoffs and retries, as now.

At least it would (I hope - waiting for the networking gurus to check it over) de-couple the upload problems from the download saturation.

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918422 - Posted: 16 Jul 2009, 12:04:27 UTC - in response to Message 918418.

Unfortunately, zip is not an option, for a few reasons. First, that particular data does not zip very well...

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.

John,

No, no, NO!

These are uploads I'm talking about. They are tiny text files, ASCII/XML, and they compress very sweetly.

And even if they didn't, it wouldn't matter much. As Joe Segur says,

My firm belief is there's no bandwidth problem on the link going to SSL, rather it's a transaction rate problem.

My suggestion is addressed at solving the transaction overhead by combining the files.

But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive. Can I offer the payback of losing the comms overhead, as a consolation?

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918423 - Posted: 16 Jul 2009, 12:08:05 UTC - in response to Message 918418.

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.



Nope - if you are referring to work downloads then there is no point zipping them, but in terms of uploads the result is compressed by roughly 80% for me: a 28 KB result gets down to around 5 KB when uploaded to Berkeley.
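A quick way to reproduce that kind of measurement on any result file (the file name below is a hypothetical example, and gzip is used as a stand-in for whatever compressor an aggregator might run):

```python
# Measure how much a small ASCII/XML result file shrinks under gzip.
import gzip, os

path = "result_12345_0"                      # hypothetical upload file name
raw = os.path.getsize(path)
with open(path, "rb") as f:
    compressed = len(gzip.compress(f.read()))
print(f"{raw} -> {compressed} bytes "
      f"({100 * (1 - compressed / raw):.0f}% smaller)")
```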


The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either.


OK, I didn't know that the CPUs are saturated at Berkeley; I only thought that there was an excessive amount of disk access and database shuffling.

You still have to cram all of the data through the same pipe, it is only in a larger file instead of a bunch of smaller ones.


That's correct, but in terms of TCP/IP efficiency it's much better to have one connection transferring 40 MB of data instead of 25,000 connections that all the network equipment has to account for. Don't forget that switches get congested too, and with that many people trying to connect, many switches don't even have enough memory to keep track of all the MAC addresses connecting.

//Vyper
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918424 - Posted: 16 Jul 2009, 12:09:25 UTC - in response to Message 918422.
Last modified: 16 Jul 2009, 12:11:14 UTC

John,

No, no, NO!

These are uploads I'm talking about. They are tiny text files, ASCII/XML, and they compress very sweetly.


Oops you got me there :)

You were 4 minutes faster..

//Vyper
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

D.J.Lankenau
Send message
Joined: 17 May 99
Posts: 1
Credit: 607,207
RAC: 29
United States
Message 918437 - Posted: 16 Jul 2009, 13:23:00 UTC
Last modified: 16 Jul 2009, 13:28:47 UTC

This is my first post to this (any) board, although I have been lurking for years. I'm sure they already thought of this and probably tried it but I need to get the thought out of my head. If S@H is having server problems or network problems maybe they can distribute the load. Seti@home@home or BOINC@home. If they need uploads collected, combined and stored for a while SIGN ME UP. Make it a BOINC project !
____________

Profile Virtual Boss*
Volunteer tester
Avatar
Send message
Joined: 4 May 08
Posts: 417
Credit: 6,206,208
RAC: 250
Australia
Message 918441 - Posted: 16 Jul 2009, 13:35:05 UTC

One thought I have had.....

BUT it would require a change to the Boinc client software.

I'll throw it in the ring anyway

It seems a lot of the problem is the continual hammering of the upload server, with attempts to upload each result individually.

Why not get BOINC to apply the backoff to ALL results attempting to upload to the SAME server that caused the initial backoff?

This would mean having a backoff clock for each upload server, instead of for each result.

This would mean just one or two results (whatever your # of simultaneous transfers setting is) would make the attempt, then the rest of the waiting results (up to thousands in some cases) would be backed off as well, giving the servers a breather.

Not being a programmer, I'm not sure how difficult this would be to implement (it doesn't seem like it should be, to me), and the benefit of reduced bandwidth wasting should be substantial.
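A minimal sketch of that per-server backoff clock (illustrative only, not the actual BOINC client logic; the delay formula is an assumption):

```python
# One backoff clock per upload server: skip every file destined for a
# server that is still backed off, instead of retrying each file on its
# own schedule.
import random, time

server_backoff_until = {}     # server URL -> unix time when retries may resume

def may_try(server: str) -> bool:
    return time.time() >= server_backoff_until.get(server, 0.0)

def record_failure(server: str, attempt: int) -> None:
    # exponential backoff with a little jitter, capped at four hours
    delay = min(4 * 3600, (2 ** attempt) * 60 * random.uniform(0.8, 1.2))
    server_backoff_until[server] = time.time() + delay

def files_to_try(pending):
    """pending: list of (filename, server) pairs; keep only open servers."""
    return [(f, s) for f, s in pending if may_try(s)]
```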

Please feel free to comment.

____________
Flying high with Team Sicituradastra.

Profile Bill Walker
Avatar
Send message
Joined: 4 Sep 99
Posts: 3462
Credit: 2,217,237
RAC: 1,093
Canada
Message 918443 - Posted: 16 Jul 2009, 13:43:22 UTC - in response to Message 918441.
Last modified: 16 Jul 2009, 13:49:07 UTC

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.

Edit added after serious thought: maybe what we need is longer maximum delay times, meaning an adjustment in BOINC and an adjustment in the deadlines set by the projects. All this would result in longer turnaround times on average, but that may be the price we pay to accommodate the ever increasing number of users on limited hardware. On a related note, I can remember when airlines would sell you a ticket AFTER you got on the plane on some shuttle flights; now you have to buy the ticket days in advance and get to the airport hours before the flight. All just signs of straining infrastructure.
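A toy simulation of that point (all numbers are made up): with a single shared delay every retry lands in the same minute, while a pseudo-random delay per file spreads them across the whole window:

```python
# Compare the worst-case retry burst for a fixed shared delay versus a
# random per-file delay, for 1000 pending uploads.
import random

pending = 1000
fixed = [600 for _ in range(pending)]                  # all retry at t+600 s
jittered = [random.uniform(300, 900) for _ in range(pending)]

def peak_per_minute(delays):
    buckets = {}
    for d in delays:
        minute = int(d // 60)
        buckets[minute] = buckets.get(minute, 0) + 1
    return max(buckets.values())

print("worst minute, fixed delay:  ", peak_per_minute(fixed))     # all 1000 at once
print("worst minute, random delay: ", peak_per_minute(jittered))  # roughly 100
```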
____________

Profile Virtual Boss*
Volunteer tester
Avatar
Send message
Joined: 4 May 08
Posts: 417
Credit: 6,206,208
RAC: 250
Australia
Message 918445 - Posted: 16 Jul 2009, 13:46:15 UTC - in response to Message 918443.
Last modified: 16 Jul 2009, 13:47:41 UTC

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.


YES and NO... BOINC has an inbuilt setting which only allows x simultaneous transfers (default = 2 per project).
____________
Flying high with Team Sicituradastra.
