Working as Expected (Jul 13 2009)



Profile Geek@Play
Volunteer tester
Avatar
Send message
Joined: 31 Jul 01
Posts: 2467
Credit: 86,144,272
RAC: 279
United States
Message 918370 - Posted: 16 Jul 2009, 4:30:53 UTC - in response to Message 918356.

Gigabit is one solution, but looking at the progress rate of SETI it is only a temporary one - S@H will run up against the hardware problem again in the future - I think the better way is to distribute the back end of the project across many sites, preferably spread around the globe.

Many people suggested putting multiple servers around the world for clients to download from, so the bandwidth is distributed. But how do you get the data to those servers in the first place? It would have to go from the servers at Berkeley to the distributed download servers through the current 100 Mbit pipe. I don't see how that lowers the bandwidth requirement at all.

Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date.
____________
Boinc....Boinc....Boinc....Boinc....

Nicolas
Avatar
Send message
Joined: 30 Mar 05
Posts: 160
Credit: 10,335
RAC: 0
Argentina
Message 918371 - Posted: 16 Jul 2009, 4:38:23 UTC - in response to Message 918370.

Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date.

You can be sure files won't be modified. There are checksums on all files, and additionally digital signatures for executable files.

For uploads, the client first sends the file to the upload server, then sends checksums when reporting the task (which wouldn't go through any intermediate server).
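As a rough sketch of the integrity check described here (the function names, and the choice of MD5, are illustrative assumptions rather than BOINC's actual server code), a server-side verification could look something like this:

```python
# Hypothetical sketch (not BOINC's actual server code): verify that an
# uploaded result file matches the checksum reported by the client.
import hashlib

def md5_of_file(path: str) -> str:
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def upload_is_intact(upload_path: str, reported_md5: str) -> bool:
    """True if the stored upload matches the checksum sent with the report."""
    return md5_of_file(upload_path) == reported_md5
```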
____________

Contribute to the Wiki!

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918374 - Posted: 16 Jul 2009, 4:45:13 UTC - in response to Message 918370.


Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date.

The project scientists don't have control while the work is on our crunchers.

That's why two crunchers work each WU.
____________

Red_Wolf_2
Send message
Joined: 24 Sep 00
Posts: 1
Credit: 2,459,165
RAC: 0
Australia
Message 918379 - Posted: 16 Jul 2009, 5:05:21 UTC - in response to Message 918356.


Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


This may seem silly, but what would the effective bandwidth of a couple of large HDDs or tapes (say, a terabyte or larger) be if sent by courier or post from various other sites back to Berkeley? It would get around the issue of bandwidth consumption, and the data could be taken off the drive and confirmed before it was wiped from the satellite systems doing the actual receiving... Low-tech, I know, and there would be the delay of receiving and loading the data, but it's a thought...
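For a sense of scale (the drive size and transit time below are assumed figures, and loading/unloading time is ignored), the effective bandwidth of a shipped drive works out roughly like this:

```python
# Back-of-the-envelope figures (assumed for illustration): effective
# bandwidth of shipping a 1 TB drive that takes 3 days to arrive.
capacity_bits = 1e12 * 8          # 1 TB expressed in bits
transit_seconds = 3 * 24 * 3600   # 3 days door to door
effective_mbit_s = capacity_bits / transit_seconds / 1e6
print(f"{effective_mbit_s:.0f} Mbit/s sustained")   # roughly 31 Mbit/s per drive
# several drives shipped in parallel scale this linearly
```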

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918380 - Posted: 16 Jul 2009, 5:10:45 UTC - in response to Message 918379.


Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


This may seem silly, but what would the effective bandwidth of a couple of large HDDs or tapes (say, a terabyte or larger) be if sent by courier or post from various other sites back to Berkeley? It would get around the issue of bandwidth consumption, and the data could be taken off the drive and confirmed before it was wiped from the satellite systems doing the actual receiving... Low-tech, I know, and there would be the delay of receiving and loading the data, but it's a thought...

The biggest problem with that is that the scheduler can't assign the work while the big hard drives are in transit.

... of course, we're assuming that there is someone at the other end who is willing to swap disks.
____________

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918399 - Posted: 16 Jul 2009, 8:37:17 UTC
Last modified: 16 Jul 2009, 8:57:43 UTC

Would a set of remote upload servers acting as "data aggregators" work?

Each would receive many small files and handle many TCP connections; then, once an hour or so, it would zip them all up into one big compressed file and need only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload syn/ack packets off the 100 Mbit link. If it works there, it's scalable to other campuses, other continents as SETI grows.
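A minimal sketch of such an aggregator, assuming hypothetical spool directories and leaving the actual bulk transfer back to Berkeley out of scope (this is not an existing BOINC component):

```python
# Hypothetical sketch of the "data aggregator" idea: bundle the small
# result files received in the last hour into one zip so that only one
# connection back to the project is needed. Paths are assumptions.
import os, time, zipfile

INCOMING = "/var/spool/aggregator/incoming"   # where client uploads land
OUTGOING = "/var/spool/aggregator/outgoing"   # picked up by one bulk transfer

def bundle_once() -> str:
    os.makedirs(OUTGOING, exist_ok=True)
    names = os.listdir(INCOMING)
    batch = os.path.join(OUTGOING, f"uploads-{int(time.time())}.zip")
    with zipfile.ZipFile(batch, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(os.path.join(INCOMING, name), arcname=name)
    for name in names:               # delete only once the archive is complete
        os.remove(os.path.join(INCOMING, name))
    return batch

if __name__ == "__main__":
    print("wrote", bundle_once())    # run hourly, e.g. from cron
```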

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918402 - Posted: 16 Jul 2009, 9:16:52 UTC - in response to Message 918399.
Last modified: 16 Jul 2009, 9:21:18 UTC

Would a set of remote upload servers acting as "data aggregators" work?

Each would receive many small files and handle many TCP connections; then, once an hour or so, it would zip them all up into one big compressed file and need only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload syn/ack packets off the 100 Mbit link. If it works there, it's scalable to other campuses, other continents as SETI grows.


I thought of this too when people started to suggest multiple upload servers spread around the world.

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal about 46 MB for every 10,000 results.
That file could reside in an incoming folder, with a cron script unpacking *.zip into that same folder.
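A minimal sketch of that receiving side, assuming a hypothetical incoming directory (the real server layout may well differ); it could be run hourly from cron:

```python
# Hypothetical counterpart at the receiving end: unpack any *.zip found
# in the incoming folder, then remove the archive so the folder does not
# fill up with old zips. The directory name is an assumption.
import glob, os, zipfile

INCOMING = "/var/spool/uploads/incoming"

for archive in glob.glob(os.path.join(INCOMING, "*.zip")):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(INCOMING)   # results reappear as individual files
    os.remove(archive)
```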

Very nice idea actually.

In my perspective worth looking into.

Kind regards Vyper

P.S. I'm sure that the Swedish University Network (www.sunet.se) here in Sweden would donate their gigabit links for this purpose, perhaps Telia also. D.S.
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918406 - Posted: 16 Jul 2009, 10:27:36 UTC - in response to Message 918402.
Last modified: 16 Jul 2009, 10:43:52 UTC

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal about 46 MB for every 10,000 results.
That file could reside in an incoming folder, with a cron script unpacking *.zip into that same folder.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. So all you need is a server capable of running Apache/fastcgi, with a public-facing NIC capable of handling that many connections (the only stress point), and a private-facing NIC which can transfer one 45 MB file every seven or eight minutes. That sounds doable: didn't someone post a link saying that Campus had some free refurb servers available for bidding?

Edit - better get a RAID / hot spare / fresh set of server-class, 5-year warranty, 15,000 rpm disk drives though - they're going to take a hammering!
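Spelling out the arithmetic behind those figures (the 275 KB per 58 zipped results and the 80,000 results per hour come from the posts above; the rest is just unit conversion):

```python
# Rough check of the figures quoted in the thread.
kb_per_result = 275 / 58                          # ~4.7 KB per zipped result
results_per_hour = 80_000
mb_per_hour = results_per_hour * kb_per_result / 1024
per_batch_mb = 10_000 * kb_per_result / 1024      # ~46 MB per 10,000 results
minutes_between_batches = 60 / (results_per_hour / 10_000)
print(f"{mb_per_hour:.0f} MB/hour, one {per_batch_mb:.0f} MB file every "
      f"{minutes_between_batches:.1f} minutes")   # ~370 MB/hour, 46 MB every 7.5 min
```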

Grant (SSSF)
Send message
Joined: 19 Aug 99
Posts: 5955
Credit: 62,523,780
RAC: 40,740
Australia
Message 918414 - Posted: 16 Jul 2009, 11:29:54 UTC - in response to Message 918406.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that.

Looking at the 90 day graphs shows quite a few peaks over 100,000, and a couple over 200,000.
____________
Grant
Darwin NT.

HTH
Volunteer tester
Send message
Joined: 8 Jul 00
Posts: 690
Credit: 835,288
RAC: 0
Finland
Message 918415 - Posted: 16 Jul 2009, 11:34:17 UTC - in response to Message 917472.

Eric is close to implementing a configuration change which will increase the resolution of chirp rates (thus increasing analysis/sensitivity)


Can you tell me more about this increase in chirp rate resolution?

The maximum bandwidth of SETI@home is 1200 Hz, I guess. Why not increase this as well? 2400 Hz? 4800 Hz? 9600 Hz?

HTH.

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918417 - Posted: 16 Jul 2009, 11:47:51 UTC - in response to Message 918406.

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal about 46 MB for every 10,000 results.
That file could reside in an incoming folder, with a cron script unpacking *.zip into that same folder.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. So all you need is a server capable of running Apache/fastcgi, with a public-facing NIC capable of handling that many connections (the only stress point), and a private-facing NIC which can transfer one 45 MB file every seven or eight minutes. That sounds doable: didn't someone post a link saying that Campus had some free refurb servers available for bidding?

Edit - better get a RAID / hot spare / fresh set of server-class, 5-year warranty, 15,000 rpm disk drives though - they're going to take a hammering!


That would be rather cool, but the more places you can put an upload server, the less load on each one.
The upload server would also need to check the user's country setting on their account: if I'm resident in Scandinavia it would accept my connections on that server (likewise England, Russia, Ireland, etc.), and if no country is set, or it's set to International, Berkeley wouldn't send out that server's IP to connect to but would default to its own slower connection.

By implementing that, you pin users to their country locations and to the servers they could connect to for a reliable connection.

Should this be doable? Perhaps the upload server and scheduler would have to be made "more intelligent" if multiple upload servers reside in different countries.
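A rough sketch of that routing rule, with an entirely made-up lookup table and URLs (nothing like this exists in the real scheduler):

```python
# Hypothetical country-based choice of upload server, falling back to the
# project's own upload URL when no country (or "International") is set.
from typing import Optional

REGIONAL_UPLOAD_SERVERS = {
    "Sweden":         "http://upload-scandinavia.example.org/file_upload_handler",
    "United Kingdom": "http://upload-uk.example.org/file_upload_handler",
    "Russia":         "http://upload-ru.example.org/file_upload_handler",
}
# placeholder for the project's own upload URL
DEFAULT_UPLOAD_SERVER = "http://setiathome.berkeley.edu/file_upload_handler"

def upload_url_for(country: Optional[str]) -> str:
    """Return the upload URL the scheduler would hand out for this account."""
    if not country or country == "International":
        return DEFAULT_UPLOAD_SERVER
    return REGIONAL_UPLOAD_SERVERS.get(country, DEFAULT_UPLOAD_SERVER)
```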

Kind regards Vyper
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

John McLeod VII
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 531,382
RAC: 345
United States
Message 918418 - Posted: 16 Jul 2009, 11:52:45 UTC - in response to Message 918399.

Would a set of remote upload servers acting as "data aggregators" work?

Each would receive many small files and handle many TCP connections; then, once an hour or so, it would zip them all up into one big compressed file and need only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate errors' we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload syn/ack packets off the 100 Mbit link. If it works there, it's scalable to other campuses, other continents as SETI grows.

Unfortunately, zip is not an option, for a few reasons. First, that particular data does not zip very well; second, the server CPUs are just about saturated anyway, so they do not have the extra CPU power to do the unzips; and third, reports cannot be made until the server has the data.

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.

The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either.

Reports are asynchronous and can occur at any time after the file is uploaded. If the report is made, and the file cannot be located, the report will be rejected along with the credit request.

You still have to cram all of the data through the same pipe, it is only in a larger file instead of a bunch of smaller ones.
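To make that point concrete (the 3% and 17% figures are from the post above, which later replies read as applying to workunit downloads; the 100 Mbit link figure is from earlier in the thread), even the better ratio barely dents a saturated pipe:

```python
# Even the best quoted saving leaves a saturated 100 Mbit/s link nearly full.
link_mbit = 100
for name, saving in (("Astropulse", 0.03), ("SETI multibeam", 0.17)):
    needed_after = link_mbit * (1 - saving)
    print(f"{name}: a full 100 Mbit/s of traffic still needs "
          f"{needed_after:.0f} Mbit/s after compression")
```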
____________


BOINC WIKI

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918420 - Posted: 16 Jul 2009, 11:54:57 UTC - in response to Message 918414.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that.

Looking at the 90 day graphs shows quite a few peaks over 100,000, and a couple over 200,000.

Yes, that's why I said "sustained".

Those will be reporting peaks, not upload peaks: typically, when the upload server has been left open through a 4-hour maintenance window, and every upload is reported within 1 hour (the maintenance backoff interval) after the scheduler comes back up.

My suggested data aggregator (you could also call it a communications multiplexor) would also act as a buffer, helping to smooth the upload peaks even more. And if it did reach its incoming connection limit - well, we'd just get backoffs and retries, as now.

At least it would (I hope - waiting for the networking gurus to check it over) de-couple the upload problems from the download saturation.

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,662,212
RAC: 48,529
United Kingdom
Message 918422 - Posted: 16 Jul 2009, 12:04:27 UTC - in response to Message 918418.

Unfortunately, zip is not an option, for a few reasons. First, that particular data does not zip very well...

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.

John,

No, no, NO!

These are uploads I'm talking about. They are tiny text files, ASCII/XML, and they compress very sweetly.

And even if they didn't, it wouldn't matter much. As Joe Segur says,

My firm belief is there's no bandwidth problem on the link going to SSL, rather it's a transaction rate problem.

My suggestion is addressed at solving the transaction overhead by combining the files.

But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive. Can I offer the payback of losing the comms overhead, as a consolation?

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918423 - Posted: 16 Jul 2009, 12:08:05 UTC - in response to Message 918418.

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.



Nope - if you are referring to work downloads then there is no point zipping them, but in terms of uploads the result is compressed by roughly 80% for me: a 28 KB result gets down to around 5 KB when uploaded to Berkeley.
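A quick way to reproduce that kind of measurement on any result file (the file name below is a hypothetical example, and gzip is used as a stand-in for whatever compressor an aggregator might run):

```python
# Measure how much a small ASCII/XML result file shrinks under gzip.
import gzip, os

path = "result_12345_0"                      # hypothetical upload file name
raw = os.path.getsize(path)
with open(path, "rb") as f:
    compressed = len(gzip.compress(f.read()))
print(f"{raw} -> {compressed} bytes "
      f"({100 * (1 - compressed / raw):.0f}% smaller)")
```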


The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either.


OK, I didn't know that the CPUs are saturated at Berkeley; I only thought that there was an excessive amount of disk access and database shuffling.

You still have to cram all of the data through the same pipe, it is only in a larger file instead of a bunch of smaller ones.


That's correct, but in terms of TCP/IP efficiency it's much better to have one connection transferring 40 MB of data instead of 25,000 connections that all the network equipment has to account for. Don't forget that switches get congested too, and with that many people trying to connect, many switches don't even have enough memory to keep track of all the MAC addresses connecting.

//Vyper
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

Profile -= Vyper =-Project donor
Volunteer tester
Avatar
Send message
Joined: 5 Sep 99
Posts: 1098
Credit: 329,162,668
RAC: 157,426
Sweden
Message 918424 - Posted: 16 Jul 2009, 12:09:25 UTC - in response to Message 918422.
Last modified: 16 Jul 2009, 12:11:14 UTC

John,

No, no, NO!

These are uploads I'm talking about. They are tiny text files, ASCII/XML, and they compress very sweetly.


Oops you got me there :)

You were 4 minutes faster..

//Vyper
____________

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

D.J.Lankenau
Send message
Joined: 17 May 99
Posts: 1
Credit: 607,207
RAC: 29
United States
Message 918437 - Posted: 16 Jul 2009, 13:23:00 UTC
Last modified: 16 Jul 2009, 13:28:47 UTC

This is my first post to this (any) board, although I have been lurking for years. I'm sure they already thought of this and probably tried it but I need to get the thought out of my head. If S@H is having server problems or network problems maybe they can distribute the load. Seti@home@home or BOINC@home. If they need uploads collected, combined and stored for a while SIGN ME UP. Make it a BOINC project !
____________

Profile Virtual Boss*
Volunteer tester
Avatar
Send message
Joined: 4 May 08
Posts: 417
Credit: 6,206,208
RAC: 250
Australia
Message 918441 - Posted: 16 Jul 2009, 13:35:05 UTC

One thought I have had.....

BUT it would require a change to the Boinc client software.

I'll throw it in the ring anyway

It seems a lot of the problem is the continual hammering of the upload server, with attempts to upload each result individually.

Why not get BOINC to apply the backoff to ALL results attempting to upload to the SAME server that caused the initial backoff?

This would mean having a backoff clock for each upload server, instead of for each result.

This would mean just one or two results (whatever your # of simultaneous transfers setting is) would make the attempt, then the rest of the waiting results (up to thousands in some cases) would be backed off as well, giving the servers a breather.

Not being a programmer, I'm not sure how difficult this would be to implement (it doesn't seem like it should be, to me), and the benefit of reduced bandwidth wasting should be substantial.
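A minimal sketch of that per-server backoff clock (illustrative only, not the actual BOINC client logic; the delay formula is an assumption):

```python
# One backoff clock per upload server: skip every file destined for a
# server that is still backed off, instead of retrying each file on its
# own schedule.
import random, time

server_backoff_until = {}     # server URL -> unix time when retries may resume

def may_try(server: str) -> bool:
    return time.time() >= server_backoff_until.get(server, 0.0)

def record_failure(server: str, attempt: int) -> None:
    # exponential backoff with a little jitter, capped at four hours
    delay = min(4 * 3600, (2 ** attempt) * 60 * random.uniform(0.8, 1.2))
    server_backoff_until[server] = time.time() + delay

def files_to_try(pending):
    """pending: list of (filename, server) pairs; keep only open servers."""
    return [(f, s) for f, s in pending if may_try(s)]
```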

Please feel free to comment.

____________
Flying high with Team Sicituradastra.

Profile Bill Walker
Avatar
Send message
Joined: 4 Sep 99
Posts: 3462
Credit: 2,217,237
RAC: 1,093
Canada
Message 918443 - Posted: 16 Jul 2009, 13:43:22 UTC - in response to Message 918441.
Last modified: 16 Jul 2009, 13:49:07 UTC

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.

Edit added after serious thought: maybe what we need is longer maximum delay times, meaning an adjustment in BOINC and an adjustment in the deadlines set by the projects. All this would result in longer turnaround times on average, but that may be the price we pay to accommodate the ever increasing number of users on limited hardware. On a related note, I can remember when airlines would sell you a ticket AFTER you got on the plane on some shuttle flights; now you have to buy the ticket days in advance and get to the airport hours before the flight. All just signs of straining infrastructure.
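A toy simulation of that point (all numbers are made up): with a single shared delay every retry lands in the same minute, while a pseudo-random delay per file spreads them across the whole window:

```python
# Compare the worst-case retry burst for a fixed shared delay versus a
# random per-file delay, for 1000 pending uploads.
import random

pending = 1000
fixed = [600 for _ in range(pending)]                  # all retry at t+600 s
jittered = [random.uniform(300, 900) for _ in range(pending)]

def peak_per_minute(delays):
    buckets = {}
    for d in delays:
        minute = int(d // 60)
        buckets[minute] = buckets.get(minute, 0) + 1
    return max(buckets.values())

print("worst minute, fixed delay:  ", peak_per_minute(fixed))     # all 1000 at once
print("worst minute, random delay: ", peak_per_minute(jittered))  # roughly 100
```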
____________

Profile Virtual Boss*
Volunteer tester
Avatar
Send message
Joined: 4 May 08
Posts: 417
Credit: 6,206,208
RAC: 250
Australia
Message 918445 - Posted: 16 Jul 2009, 13:46:15 UTC - in response to Message 918443.
Last modified: 16 Jul 2009, 13:47:41 UTC

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.


YES and NO... BOINC has an inbuilt setting which only allows x simultaneous transfers (default = 2 per project).
____________
Flying high with Team Sicituradastra.
