Working as Expected (Jul 13 2009)

1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918374 - Posted: 16 Jul 2009, 4:45:13 UTC - in response to Message 918370.  


Also I don't believe the project scientists would like to lose direct control of the project data. It's like a "chain of custody" in an investigation. If you can't account for the security of the data throughout the process then questions will arise if something of some importance is claimed at a later date.

The project scientists don't have control while the work is on our crunchers.

That's why two crunchers work each WU.
ID: 918374
Red_Wolf_2

Joined: 24 Sep 00
Posts: 1
Credit: 2,776,134
RAC: 0
Australia
Message 918379 - Posted: 16 Jul 2009, 5:05:21 UTC - in response to Message 918356.  


Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


This may seem silly, but what would the effective bandwidth of a couple of large HDDs or tapes (say, a terabyte or larger) be if sent by courier or post from various other sites back to Berkeley? It would get around the issue of bandwidth consumption, and the data could be stripped off the drive and confirmed before it was wiped from the satellite systems doing the actual receiving... Low-tech, I know, and there would be the delay of receiving and loading the data, but it's a thought...
ID: 918379
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 918380 - Posted: 16 Jul 2009, 5:10:45 UTC - in response to Message 918379.  


Same for upload. Yes, you can put multiple upload servers around the world, but eventually, somehow, the data has to go back to setiathome.berkeley.edu.


This may seem silly, but what would the effective bandwidth of a couple of large HDDs or tapes (say, a terabyte or larger) be if sent by courier or post from various other sites back to Berkeley? It would get around the issue of bandwidth consumption, and the data could be stripped off the drive and confirmed before it was wiped from the satellite systems doing the actual receiving... Low-tech, I know, and there would be the delay of receiving and loading the data, but it's a thought...

The biggest problem with that is that the scheduler can't assign the work while the big hard drives are in transit.

... of course, we're assuming that there is someone at the other end who is willing to swap disks.
ID: 918380
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918399 - Posted: 16 Jul 2009, 8:37:17 UTC
Last modified: 16 Jul 2009, 8:57:43 UTC

Would a set of remote upload servers acting as "data aggregators" work?

Each one receives many small files and handles many TCP connections; then, once an hour or so, it zips them all up into one big compressed file and needs only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?
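
In rough script form, the hourly aggregation step might look something like this. A minimal sketch only: the directory layout, file naming and the choice of zip are my own assumptions for illustration, not anything BOINC prescribes.

    # Hourly cron job on a remote "aggregator" upload server.
    # Paths, names and the zip format are illustrative assumptions.
    import os
    import time
    import zipfile

    SPOOL_DIR = "/var/spool/uploads/incoming"   # where the web server drops client uploads
    OUTBOX    = "/var/spool/uploads/outbox"     # batches waiting to be shipped to Berkeley

    def make_hourly_batch():
        """Bundle every spooled result file into a single zip in the outbox."""
        names = [n for n in os.listdir(SPOOL_DIR) if not n.startswith(".")]
        if not names:
            return None
        batch = os.path.join(OUTBOX, "results-%d.zip" % int(time.time()))
        with zipfile.ZipFile(batch, "w", zipfile.ZIP_DEFLATED) as zf:
            for n in names:
                zf.write(os.path.join(SPOOL_DIR, n), arcname=n)
        # Only remove the originals once the batch is safely on disk.
        for n in names:
            os.remove(os.path.join(SPOOL_DIR, n))
        return batch

    if __name__ == "__main__":
        make_hourly_batch()   # a separate job ships the outbox over one connection

One connection per hour to Berkeley instead of tens of thousands is the whole point; how the batch actually gets shipped (rsync, scp, whatever) is a detail.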

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate error' results we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.
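
The validator change I have in mind, in pseudo-logic only (the real validator is C++ in the BOINC server code; the class, field names and delays below are made up purely to illustrate the idea):

    import os
    import time

    MAX_MISSES  = 6      # passes before we call it a real validate error
    RETRY_DELAY = 600    # seconds to back off between passes

    class Result:
        def __init__(self, output_path):
            self.output_path = output_path   # where the uploaded file should appear
            self.misses = 0
            self.retry_at = 0.0

    def check_result(result):
        """Back off and retry while the upload hasn't arrived yet,
        instead of flagging 'validate error' on the first miss."""
        if os.path.exists(result.output_path):
            return "ready_to_validate"
        result.misses += 1
        if result.misses >= MAX_MISSES:
            return "validate_error"                    # genuinely missing
        result.retry_at = time.time() + RETRY_DELAY    # try again on a later pass
        return "deferred"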

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload SYN/ACK packets off the 100Mbit link. If it works there, it's scalable to other campuses and other continents as SETI grows.
ID: 918399
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 918402 - Posted: 16 Jul 2009, 9:16:52 UTC - in response to Message 918399.  
Last modified: 16 Jul 2009, 9:21:18 UTC

Would a set of remote upload servers acting as "data aggregators" work?

Each one receives many small files and handles many TCP connections; then, once an hour or so, it zips them all up into one big compressed file and needs only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate error' results we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload SYN/ACK packets off the 100Mbit link. If it works there, it's scalable to other campuses and other continents as SETI grows.


I thought of this too when people started to suggest multiple upload servers spread worldwide.

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal roughly 46 MB for every 10,000 results.
That file could reside in an incoming folder with a cron script unpacking *.zip into that same folder.
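
Something like this on the Berkeley side, run from cron every few minutes. Just a sketch; the path and the unpack-in-place layout are assumptions on my part.

    import glob
    import os
    import zipfile

    INCOMING = "/upload/incoming"   # where the big zips from the aggregators land

    def unpack_batches():
        for batch in glob.glob(os.path.join(INCOMING, "*.zip")):
            with zipfile.ZipFile(batch) as zf:
                zf.extractall(INCOMING)   # drop the small result files in place
            os.remove(batch)              # finished with this batch

    if __name__ == "__main__":
        unpack_batches()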

Very nice idea actually.

From my perspective, it's worth looking into.

Kind regards Vyper

P.S I'm sure that Swedish University Network (www.sunet.se) here in Sweden would donate their gigabit links for this purpose, perhaps Telia also. D.S

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 918402
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918406 - Posted: 16 Jul 2009, 10:27:36 UTC - in response to Message 918402.  
Last modified: 16 Jul 2009, 10:43:52 UTC

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal roughly 46 MB for every 10,000 results.
That file could reside in an incoming folder with a cron script unpacking *.zip into that same folder.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. So all you need is a server capable of running Apache/fastcgi, with a public-facing NIC capable of handling that many connections (the only stress point), and a private-facing NIC which can transfer one 45MB file every seven or eight minutes. That sounds do-able: didn't someone post a link saying that Campus had some free refurb servers available for bidding?
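
For what it's worth, a quick back-of-envelope check of those figures (all numbers taken from the posts above, not measured independently):

    results_per_hour = 80000          # sustained peak from Scarecrow's graphs
    kb_per_result    = 275.0 / 58     # ~4.7 KB per zipped result (Vyper's sample)

    batches_per_hour = results_per_hour / 10000.0       # one zip per 10,000 results
    mb_per_batch     = 10000 * kb_per_result / 1024     # ~46 MB
    minutes_between  = 60.0 / batches_per_hour          # ~7.5 minutes
    mbit_sustained   = results_per_hour * kb_per_result * 8 / 1024 / 3600.0

    print(batches_per_hour, mb_per_batch, minutes_between, mbit_sustained)
    # roughly 8 batches/hour of ~46 MB each, one every ~7.5 minutes,
    # i.e. well under 1 Mbit/s sustained on the private side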

Edit - better get a RAID / hot spare / fresh set of server-class, 5-year warranty, 15,000 rpm disk drives though - they're going to take a hammering!
ID: 918406
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13715
Credit: 208,696,464
RAC: 304
Australia
Message 918414 - Posted: 16 Jul 2009, 11:29:54 UTC - in response to Message 918406.  

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that.

Looking at the 90 day graphs shows quite a few peaks over 100,000, and a couple over 200,000.
Grant
Darwin NT
ID: 918414
HTH
Volunteer tester

Joined: 8 Jul 00
Posts: 691
Credit: 909,237
RAC: 0
Finland
Message 918415 - Posted: 16 Jul 2009, 11:34:17 UTC - in response to Message 917472.  

Eric is close to implementing a configuration change which will increase the resolution of chirp rates (thus increasing analysis/sensitivity)


Can you tell me more about this chirp rate resolution increase?

The maximum bandwidth of SETI@home is 1200 Hz, I guess. Why not increase this as well? 2400 Hz? 4800 Hz? 9600 Hz?

HTH.
ID: 918415
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 918417 - Posted: 16 Jul 2009, 11:47:51 UTC - in response to Message 918406.  

A quick zip indicates that 58 finished results take up approximately 275 KB, so that would equal roughly 46 MB for every 10,000 results.
That file could reside in an incoming folder with a cron script unpacking *.zip into that same folder.

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that. So all you need is a server capable of running Apache/fastcgi, with a public-facing NIC capable of handling that many connections (the only stress point), and a private-facing NIC which can transfer one 45MB file every seven or eight minutes. That sounds do-able: didn't someone post a link saying that Campus had some free refurb servers available for bidding?

Edit - better get a RAID / hot spare / fresh set of server-class, 5-year warranty, 15,000 rpm disk drives though - they're going to take a hammering!


That would be rather cool, but the more places you can put an upload server, the less load there is on each one.
The upload server would also need to check the user's country setting on their account: if I'm resident in Scandinavia it would accept my connections on that server (likewise for England, Russia, Ireland, etc.) and deny others. If the account has no country set, or is set to "International", Berkeley wouldn't send out that server's IP at all and would default to its own slower connection.

By implementing that, you pin users to servers in their own region, which they can connect to for a reliable connection.

Should this be doable? Perhaps the upload server and scheduler would need to be made "more intelligent" if multiple upload servers reside in different countries.

Kind regards Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 918417
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 918418 - Posted: 16 Jul 2009, 11:52:45 UTC - in response to Message 918399.  

Would a set of remote upload servers acting as "data aggregators" work?

Each one receives many small files and handles many TCP connections; then, once an hour or so, it zips them all up into one big compressed file and needs only one connection to Berkeley. Maybe even negotiate permission for those files to arrive over the campus network (upload only), so that Hurricane Electric becomes effectively download only?

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate error' results we get when the report arrives before the upload: but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of an immediate error.

Edit - with the current scale of operations, it might even work with a single co-lo server down at the bottom of the hill at Campus networking HQ. Keep all those upload SYN/ACK packets off the 100Mbit link. If it works there, it's scalable to other campuses and other continents as SETI grows.

Unfortunately, zip is not an option, for a few reasons. First, that particular data does not zip very well; second, the server CPUs are just about saturated anyway, so they do not have the extra CPU power to do the unzips; and third, reports cannot be made until the server has the data.

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.

The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either.

Reports are asynchronous and can occur at any time after the file is uploaded. If the report is made, and the file cannot be located, the report will be rejected along with the credit request.

You still have to cram all of the data through the same pipe; it is just in one larger file instead of a bunch of smaller ones.


BOINC WIKI
ID: 918418
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918420 - Posted: 16 Jul 2009, 11:54:57 UTC - in response to Message 918414.  

Looking at Scarecrow's graphs, we've had sustained bursts of 70,000 - 80,000 results per hour recently, but nothing above that.

Looking at the 90 day graphs shows quite a few peaks over 100,000, and a couple over 200,000.

Yes, that's why I said "sustained".

Those will be reporting peaks, not upload peaks: typically, when the upload server has been left open through a 4-hour maintenance window, and every upload is reported within 1 hour (the maintenance backoff interval) after the scheduler comes back up.

My suggested data aggregator (you could also call it a communications multiplexor) would also act as a buffer, helping to smooth the upload peaks even more. And if it did reach its incoming connection limit - well, we'd just get backoffs and retries, as now.

At least it would (I hope - waiting for the networking gurus to check it over) de-couple the upload problems from the download saturation.
ID: 918420
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 918422 - Posted: 16 Jul 2009, 12:04:27 UTC - in response to Message 918418.  

Unfortunately, zip is not an option, for a few reasons. First, that particular data does not zip very well...

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.

John,

No, no, NO!

These are uploads I'm talking about. They are tiny text files, ASCII/XML, and they compress very sweetly.

And even if they didn't, it wouldn't matter much. As Joe Segur says,

My firm belief is there's no bandwidth problem on the link going to SSL, rather it's a transaction rate problem.

My suggestion is addressed at solving the transaction overhead by combining the files.

But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive. Can I offer the payback of losing the comms overhead, as a consolation?
ID: 918422
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 918423 - Posted: 16 Jul 2009, 12:08:05 UTC - in response to Message 918418.  

The compression ratios on the data as tested range from about 3% for AP to about 17% for SETI. This is not enough to fix the problem.



Nope, that is only if you are referring to work downloads; there is no point zipping those. But for uploads, the result compresses by roughly 80% for me: a 28 KB result gets down to around 5 KB on its way to Berkeley.
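
If anyone wants to check that figure on their own machine, it is a one-minute test; the file name below is just an example, substitute a finished result file of your own:

    import gzip
    import os
    import shutil

    src = "result_upload_file.xml"    # example name: a finished result awaiting upload
    dst = src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)

    before = os.path.getsize(src)
    after  = os.path.getsize(dst)
    print("%.0f%% smaller" % (100.0 * (before - after) / before))
    # e.g. 28 KB down to ~5 KB is about an 82% reduction, i.e. "roughly 80%"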


The last round of bottlenecks were server CPU issues. We don't need to aggravate that area either.


OK, I didn't know that the CPUs are saturated at Berkeley; I only thought there were excessive amounts of disk access and database shuffling.

You still have to cram all of the data through the same pipe; it is just in one larger file instead of a bunch of smaller ones.


That's correct, but in terms of TCP/IP efficiency it's much better to have one connection transferring 40 MB of data than 25,000 connections that all the network equipment has to account for. Don't forget that switches get congested too, and with that many people trying to connect, many switches don't even have enough memory to keep track of all the MAC addresses.

//Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 918423
Profile -= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 918424 - Posted: 16 Jul 2009, 12:09:25 UTC - in response to Message 918422.  
Last modified: 16 Jul 2009, 12:11:14 UTC

John,

No, no, NO!

These are uploads I'm talking about. They are tiny text files, ASCII/XML, and they compress very sweetly


Oops you got me there :)

You were 4 minutes faster..

//Vyper

_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
ID: 918424
D.J.Lankenau

Joined: 17 May 99
Posts: 1
Credit: 614,824
RAC: 0
United States
Message 918437 - Posted: 16 Jul 2009, 13:23:00 UTC
Last modified: 16 Jul 2009, 13:28:47 UTC

This is my first post to this (or any) board, although I have been lurking for years. I'm sure they have already thought of this and probably tried it, but I need to get the thought out of my head. If S@H is having server problems or network problems, maybe they can distribute the load. SETI@home@home, or BOINC@home. If they need uploads collected, combined and stored for a while, SIGN ME UP. Make it a BOINC project!
ID: 918437
Profile Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 918441 - Posted: 16 Jul 2009, 13:35:05 UTC

One thought I have had.....

BUT it would require a change to the BOINC client software.

I'll throw it in the ring anyway

It seems a lot of the problem is the continual hammering of the upload server with attempts to upload each result individually.

Why not get BOINC to apply the backoff to ALL results waiting to upload to the SAME server that caused the initial backoff?

This would mean having a backoff clock for each upload server, instead of for each result.

This would mean just one or two results (whatever your "number of simultaneous transfers" setting is) would make the attempt; the rest of the waiting results (up to thousands in some cases) would be backed off as well, giving the servers a breather.

Not being a programmer, I'm not sure how difficult this would be to implement (it doesn't seem like it would be, to me), but the benefit of less wasted bandwidth should be substantial.
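
Something along these lines is what I have in mind, purely to show the idea; the names and delays are made up and this is not how the real BOINC client code is structured:

    import random
    import time

    MAX_CONCURRENT = 2          # BOINC's default transfers per project

    class UploadServer:
        def __init__(self, url):
            self.url = url
            self.next_attempt = 0.0   # ONE shared backoff clock for this server
            self.failures = 0

        def backoff(self):
            # exponential backoff with random jitter, capped at a few hours
            self.failures += 1
            delay = min(4 * 3600, 60 * 2 ** self.failures)
            self.next_attempt = time.time() + delay * random.uniform(0.5, 1.0)

        def clear(self):
            self.failures = 0
            self.next_attempt = 0.0

    def try_uploads(server, queued_results, do_upload):
        """Attempt at most MAX_CONCURRENT uploads; one failure backs off
        every result queued for this server, not just the one that failed."""
        if time.time() < server.next_attempt:
            return                    # the whole queue waits on the server clock
        for result in queued_results[:MAX_CONCURRENT]:
            if do_upload(result):
                queued_results.remove(result)
                server.clear()
            else:
                server.backoff()      # everyone queued for this server now waits
                return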

Please feel free to comment.

Flying high with Team Sicituradastra.
ID: 918441
Profile Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 918443 - Posted: 16 Jul 2009, 13:43:22 UTC - in response to Message 918441.  
Last modified: 16 Jul 2009, 13:49:07 UTC

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.

Edit added after serious thought: maybe what we need is longer maximum delay times, meaning an adjustment in BOINC and an adjustment in the deadlines set by the projects. All this would result in longer turn-around times on average, but that may be the price we pay to accommodate the ever-increasing number of users on limited hardware. On a related note, I can remember when airlines would sell you a ticket AFTER you got on the plane on some shuttle flights; now you have to buy the ticket days in advance and get to the airport hours before the flight. All just signs of straining infrastructure.

ID: 918443
Profile Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 918445 - Posted: 16 Jul 2009, 13:46:15 UTC - in response to Message 918443.  
Last modified: 16 Jul 2009, 13:47:41 UTC

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.


YES, and NO... BOINC has an inbuilt setting which only allows x number of simultaneous transfers (default = 2 per project).
Flying high with Team Sicituradastra.
ID: 918445
Profile Bill Walker
Joined: 4 Sep 99
Posts: 3868
Credit: 2,697,267
RAC: 0
Canada
Message 918449 - Posted: 16 Jul 2009, 13:52:00 UTC - in response to Message 918445.  

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.


YES, and NO... BOINC has an inbuilt setting which only allows x number of simultaneous transfers (default = 2 per project).


So after the delay BOINC would try x uploads, then when they fail x more, and so on. That sounds exactly like what happens right now when a frustrated user hits "Retry Now" for all his umpteen pending uploads. Not an improvement, in my opinion.

ID: 918449
Profile Virtual Boss*
Volunteer tester
Joined: 4 May 08
Posts: 417
Credit: 6,440,287
RAC: 0
Australia
Message 918451 - Posted: 16 Jul 2009, 13:58:38 UTC - in response to Message 918445.  

Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.


YES, and NO... BOINC has an inbuilt setting which only allows x number of simultaneous transfers (default = 2 per project).


Just to clarify...(using the default setting of two)

Yes ... ALL would be ready to attempt upload at the same time.

IF the first 2 attempting upload failed, they would all back off again.

or IF the first two succeeded, two more would immediately attempt upload...
continuing until all have uploaded (or one fails, which would initiate a new back-off).
Flying high with Team Sicituradastra.
ID: 918451