|
|
|
This is turning into quite a pub crawl - any offers from Lincolnshire? (we might be getting a little thirsty as we head south....) |
|
|
|
|
Seeing the constant issues with bandwidth, I was wondering if the SETI data packets could be compressed (to reduce size for transfer), then decompressed by BOINC for processing.
I know it would take more CPU time to compress the packets, but it would reduce bandwidth use.
Just an idea, thought I would share.
- Wol
File compression works because the data in those files (word processing, databases, etc.) is not entirely random. A "flat file" database may compress 90% because one filler character appears over and over.
Common bytes get shorter codes, uncommon bytes longer, and the average number of bits/character goes down. (gross oversimplification)
Binary data consisting almost entirely of noise is going to be equally distributed across the whole range, so it isn't very compressible.
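A quick way to see the point above for yourself: the sketch below compresses two made-up buffers with Python's standard zlib module. The "noise" buffer is only a stand-in for receiver noise (pseudo-random bytes, not real work unit data), and the "padded" buffer is a stand-in for a flat file full of filler characters. Sizes and names are assumptions chosen for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


SIZE = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for receiver noise: byte values spread evenly over 0..255.
noise = bytes(rng.getrandbits(8) for _ in range(SIZE))

# Stand-in for a "flat file" record padded with one filler character.
padded = (b"NAME" + b" " * 96) * (SIZE // 100)

noise_ratio = compression_ratio(noise)
padded_ratio = compression_ratio(padded)

print(f"noise:  {noise_ratio:.2%} of original size")
print(f"padded: {padded_ratio:.2%} of original size")
```

The noise buffer should come out at roughly its original size (or a hair larger), while the padded buffer shrinks to a tiny fraction. Real work units sit somewhere in between because of their text headers and encoding.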
To make this concrete, I compressed (Windows) my project directory of 500 or so wu's. It saved 7MB out of 180MB, or about 4%. So I then zipped up the entire directory and saved about 28% of disk space.
Granted, there may be better or more tailored compression algorithms available, but these numbers are a good guide, and they tend to validate Ned's observation. That said, 28% might be appealing if it could be implemented, and if it is related to the wu's and not some subtlety of Vista NTFS. |
|
|
|
|
|
Oops. I just saw Josef's estimates. Sorry for the added post. |
|
|
|
|
Granted, there may be better or more tailored compression algorithms available, but these numbers are a good guide, and they tend to validate Ned's observation. That said, 28% might be appealing if it could be implemented, and if it is related to the wu's and not some subtlety of Vista NTFS.
All file compression is based on the fact that data usually isn't very random.
In the absence of a strong, consistent signal, stuff coming from a radio receiver is very well randomized.
Joe's comments: the easiest way to pick up that 15% would be to encode the MB work units as pure binary, same as AP. Of course, we'd all need new science apps.
____________
|
|
|
|
|
We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:
And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.
F.
And I'll come across from Nottinghamshire to triple the contribution.
Claggy
Add another 4 from Cambridgeshire :).
Mick
I'd probably even manage to come over from the Isle of Man for that :)
____________
|
|
|
|
|
We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:
And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.
F.
And I'll come across from Nottinghamshire to triple the contribution.
Claggy
Add another 4 from Cambridgeshire :).
Mick
I'd probably even manage to come over from the Isle of Man for that :)
Perhaps we should get Matt to put it on a CD and just have a get-together to listen to it. Seems a pity for all those pints to go to waste ;)
F.
____________
|
|
|
|
|
We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:
And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.
F.
And I'll come across from Nottinghamshire to triple the contribution.
Claggy
Add another 4 from Cambridgeshire :).
Mick
Here's another 4 from a Yorkshireman in Lincolnshire!
____________
|
|
|
|
|
|
Ned Ludd wrote: ...
Joe's comments: the easiest way to pick up that 15% would be to encode the MB work units as pure binary, same as AP. Of course, we'd all need new science apps.
It would be more like 13% that way: the 20K of XML in Enhanced WUs gzips to 4K, and you lose the gzip compression of AP WUs. OTOH, it has an advantage for project WU storage space. Adding the capability to handle encoding="binary" to the Enhanced apps would be fairly easy; once everyone had new apps, the splitter could be changed to send that format. It would be a way to weed out those numbskulls who are running obsolete optimized apps, too. Joe |
|
|
|
|
We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:
And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.
F.
And I'll come across from Nottinghamshire to triple the contribution.
Claggy
Add another 4 from Cambridgeshire :).
Mick
Here's another 4 from a Yorkshireman in Lincolnshire!
To complete the liquid odyssey I've got a crate ready here in Hampshire - if he's still standing - and will pour him back on a plane at Heathrow :)
____________
|
|
|
|
|
We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:
And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.
F.
And I'll come across from Nottinghamshire to triple the contribution.
Claggy
Add another 4 from Cambridgeshire :).
Mick
Here's another 4 from a Yorkshireman in Lincolnshire!
To complete the liquid odyssey I've got a crate ready here in Hampshire - if he's still standing - and will pour him back on a plane at Heathrow :)
There are a `few` counties left yet........
and I am sure a pint or few can be found in Cumbria (west coast)
(hello Isle of Man, I can see you :).... |
|
|
|
|
Right now Seti@Home is coping quite well with a 100 Mb connection to the internet
for the most part. This connection has to send out at least 2 copies of each WU,
accept work returned, and send out updated Apps.
If a server closet could be set up at the 1 Gb feed to distribute the
WU's, as well as the updated apps, and accept the work returned, then the
bandwidth could easily double.
With Wu's going down the 100 Mb line just once, instead of 2 or 3 times,
returned work coming back on the same line, and new Apps only being
updated to the 1 Gb server closet once per update, the increase in bandwidth
would easily exceed 50%.
It might be easier to have the work returned directly to the lab since the
bandwidth required is so little.
Sounds good, but....
I don't think the two work-units are identical, so you'd need a pre-splitter that made one, and then something else that copied the identical parts to make another.
I'm not saying it's not possible, just that it might not be as helpful as it appears at first blush.
I think it'd be better if all of the servers could end up close to the 1 Gb feed, and then managed remotely (with the "tapes" sent over the 100 Mb link) but ultimately, whatever they do has to be workable for them.
____________
|
|
|
|
|
Right now Seti@Home is coping quite well with a 100 Mb connection to the internet
for the most part. This connection has to send out at least 2 copies of each WU,
accept work returned, and send out updated Apps.
If a server closet could be set up at the 1 Gb feed to distribute the
WU's, as well as the updated apps, and accept the work returned, then the
bandwidth could easily double.
With Wu's going down the 100 Mb line just once, instead of 2 or 3 times,
returned work coming back on the same line, and new Apps only being
updated to the 1 Gb server closet once per update, the increase in bandwidth
would easily exceed 50%.
It might be easier to have the work returned directly to the lab since the
bandwidth required is so little.
Sounds good, but....
I don't think the two work-units are identical, so you'd need a pre-splitter that made one, and then something else that copied the identical parts to make another.
I'm not saying it's not possible, just that it might not be as helpful as it appears at first blush.
I think it'd be better if all of the servers could end up close to the 1 Gb feed, and then managed remotely (with the "tapes" sent over the 100 Mb link) but ultimately, whatever they do has to be workable for them.
Ned,
Surely the data in a workunit (the 366KB or 8MB download file) is identical for all replications. Otherwise validation makes no sense. Have a look in the fanout directories for a current one of yours (the directory name is in the url in client_state.xml). You'll only find one copy.
The 'tasks' assigned to each host are different, but trivially so: "process this data, and return a file called WU_0" - "... WU_1" - and so on. But the 'tasks' are trivially small, and highly compressible, components of 'sched_reply...xml' files.
What's more, the data files remain unchanged on disk in case of compute error, deadline exceeded, CBNC etc.: they can then be resent quickly if needed, or deleted once validation is complete.
That's probably the weakness with guido.man's idea: storing the datafiles at the bottom of the hill involves a large data storage unit, and a lot of management data traffic as data is added and deleted - all for a comparatively small (just over 50%, allowing for resends) gain in bandwidth. Storing the applications down there would require much less data capacity and much less management, and still be a worthwhile contribution. |
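The "just over 50%" figure can be checked with a back-of-envelope sketch. It assumes the only change is that each data file crosses the 100 Mb link once instead of once per copy sent to hosts; the function name and the 2.2 average (two initial copies plus an occasional resend) are made up for illustration, not measured project numbers.

```python
def uplink_saving(copies_per_workunit: float) -> float:
    """Fraction of download traffic removed from the slow link if each
    data file crosses it once instead of once per copy sent to hosts."""
    if copies_per_workunit < 1:
        raise ValueError("each workunit is sent at least once")
    return 1 - 1 / copies_per_workunit


# 2.0 = initial replication only; 2.2 and 3.0 are hypothetical averages
# that include resends.
for copies in (2.0, 2.2, 3.0):
    saving = uplink_saving(copies)
    print(f"{copies:.1f} copies per WU: {saving:.0%} less traffic on the link")
```

With exactly two copies the saving is 50%, and any resends push it a little higher, which is consistent with the estimate above. It says nothing about the storage and management cost that the post raises.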
|
|
|
|
|
When a new application becomes available, it might help a lot if the application were sent first and the host acknowledged that it downloaded correctly, before any tasks for that application are sent.
Rather than: here is a task. Oh, you haven't got the application? Wait one, I'll send it to you. Only to find the application downloads are not working. Oooops.
Sending out, in some cases, 5 copies of the task at 8 MB each, only for them to fail because the host hasn't got the application yet, was a big waste of bandwidth. It caused all sorts of problems for the project team and for us poor users, who have now done several long tasks only for them to reach the "Too many errors" threshold. |
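The "application first" idea boils down to a filter on what gets handed out. The sketch below is purely illustrative: the function, the task names and the app labels are all hypothetical, and it is not how the BOINC scheduler is actually written.

```python
from __future__ import annotations


def tasks_to_send(confirmed_apps: set[str],
                  queued: list[tuple[str, str]]) -> list[str]:
    """Return only the queued tasks whose application the host has
    already confirmed as downloaded.

    queued holds (task_name, app_label) pairs.
    """
    return [task for task, app in queued if app in confirmed_apps]


# Hypothetical queue: task_a needs an app the host has not confirmed yet.
queued = [("task_a", "new_app"), ("task_b", "old_app")]
confirmed = {"old_app"}

sendable = tasks_to_send(confirmed, queued)
print(sendable)
```

Here only task_b would go out; task_a waits until the host reports the new application, so no 8 MB download is spent on a task that is bound to fail.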
|
|
|
|
Surely the data in a workunit (the 366KB or 8MB download file) is identical for all replications. Otherwise validation makes no sense. Have a look in the fanout directories for a current one of yours (the directory name is in the url in client_state.xml). You'll only find one copy.
At the moment, I've got one of the 5.01 "beta" astropulse units that is taking a long time to crunch, so I can't look at Multibeam.
When I look at that workunit, it has a header, plus the data.
I don't have two copies of the same workunit sent to different hosts, so I can't compare two headers to see if they're the same.
I assume they're different -- that they contain information that identifies that result.
The data part of each of these should be identical.
I don't know how BOINC handles directory fan-outs, but I would expect the companion WU to be in a different directory on the server, based on a hash of the file name.
____________
|
|
|
|
|
Surely the data in a workunit (the 366KB or 8MB download file) is identical for all replications. Otherwise validation makes no sense. Have a look in the fanout directories for a current one of yours (the directory name is in the url in client_state.xml). You'll only find one copy.
At the moment, I've got one of the 5.01 "beta" astropulse units that is taking a long time to crunch, so I can't look at Multibeam.
When I look at that workunit, it has a header, plus the data.
I don't have two copies of the same workunit sent to different hosts, so I can't compare two headers to see if they're the same.
I assume they're different -- that they contain information that identifies that result.
The data part of each of these should be identical.
I don't know how BOINC handles directory fan-outs, but I would expect the companion WU to be in a different directory on the server, based on a hash of the file name.
Sorry, Ned, Richard is right. The splitter produces one workunit file and stores it on a download server. It also produces a database record for that workunit which includes a <target_nresults> value; that is what determines how many hosts will initially be told to download a copy of the workunit file (initial replication). The Transitioner creates that many result database entries, which are added to the "Results ready to send" pool; the Feeder eventually passes those on to the Scheduler, and hosts are told about the tasks. It's the result records in the database which differ for each host; the workunit file is identical. Joe |
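To picture the description above, here is a toy model of "one workunit file, several result records". It is a simplified sketch, not BOINC's schema or code: the class names, field names other than target_nresults, the file path and the naming scheme are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Workunit:
    name: str
    data_file: str        # the single file kept on the download server
    target_nresults: int  # initial replication


@dataclass
class Result:
    name: str             # differs per host, e.g. "<wu>_0", "<wu>_1"
    workunit: Workunit    # every result points at the same workunit


def transition(wu: Workunit) -> list[Result]:
    """Create one result record per initial replica; no data is copied."""
    return [Result(f"{wu.name}_{i}", wu) for i in range(wu.target_nresults)]


# Hypothetical workunit with an initial replication of 2.
wu = Workunit("example_wu", "download/example_wu", target_nresults=2)
ready_to_send = transition(wu)

for result in ready_to_send:
    print(result.name, "->", result.workunit.data_file)
```

Both results print the same data file; only the result names differ, which matches the point that the per-host difference lives in the database records and not in the download.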
|
|
|
|
Hello from France,
I'm happy that the outage was resolved; you did a great job.
Unfortunately I can't get SETI WUs. I can only get Astropulse WUs, and they are too big to crunch with my old PC (about 650 h estimated!).
Do you think I'll be able to get some in a few hours?
Thanks for your patience.
Patrick from "l'Alliance francophone" team.
Trouble resolved: I checked my preferences and now I get only SETI WUs. Special thanks to BernardP from my team.
____________
seti1 was pretty good, seti2 will be better ?
|
|
|
|
|
Hello from France,
I'm happy that the outage was resolved; you did a great job.
Unfortunately I can't get SETI WUs. I can only get Astropulse WUs, and they are too big to crunch with my old PC (about 650 h estimated!).
Do you think I'll be able to get some in a few hours?
Thanks for your patience.
Patrick from "l'Alliance francophone" team.
Trouble resolved: I checked my preferences and now I get only SETI WUs. Special thanks to BernardP from my team.
Hello,
Good to hear that your problem is solved.
Next time you have a problem, have a look in the 'Number Crunching' forum:
http://setiathome.berkeley.edu/forum_forum.php?id=10
There are more people around who could help you. :-)
Nice greetings from Germany! :-)
____________
>Das Deutsche Cafe. The German Cafe.< |
|
|
|
|
|
What would happen if, say, half of the top 25 crunchers no longer needed internet access? Why not make it like Netflix?
Imagine someone could put, say, 10,000 work units on a DVD and post it to me on Monday. Whenever I crunch through that disk, I put all the results on a disk, pop it in the prepaid mailer, and in 3-5 days a new disk arrives. Rinse and repeat. You would need 3 disks for the system to be most efficient: one that is being crunched and two in the mail going in opposite directions. The ermm "benefit" would of course require that the cruncher make regular minimum donations to the project, to cover the time involved, supplies, shipping, etc. Heck, with the decreased server load the total workload may actually improve due to less time spent chasing server gremlins.
P.S. I second the request to add Allen array data to the project someday if there ever were a shortfall of raw data. I am still working on getting the other Tesla card for this machine. Just bought the wifey a new Mustang yesterday, so the card may have to wait a bit. I can't seem to successfully explain to her why I hoard video cards for headless machines...lol
Hey, this machine has a 9500, Tesla, and 260. How come on the computer information page it is listed as 3x GTX260? Thanks!
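As a rough sanity check on the 10,000-per-DVD figure, the sketch below uses the download sizes quoted earlier in the thread (366KB and 8MB). It assumes those are the Multibeam and Astropulse files respectively, reads them as KiB and MiB, and takes the nominal 4.7 GB of a single-layer DVD; it ignores filesystem overhead and any compression.

```python
DVD_BYTES = 4_700_000_000     # nominal single-layer DVD capacity (assumption)
MB_WU_BYTES = 366 * 1024      # "366KB" download, assumed to be Multibeam
AP_WU_BYTES = 8 * 1024 ** 2   # "8MB" download, assumed to be Astropulse


def fits_on_dvd(count: int, wu_bytes: int) -> bool:
    """True if `count` files of `wu_bytes` each fit on one disc."""
    return count * wu_bytes <= DVD_BYTES


def max_per_dvd(wu_bytes: int) -> int:
    """Largest whole number of files of this size that fit on one disc."""
    return DVD_BYTES // wu_bytes


print("10,000 small WUs fit:", fits_on_dvd(10_000, MB_WU_BYTES))
print("10,000 large WUs fit:", fits_on_dvd(10_000, AP_WU_BYTES))
print("large WUs per disc:  ", max_per_dvd(AP_WU_BYTES))
```

Under those assumptions 10,000 of the small files do fit on one disc, while the 8 MB files would be limited to a few hundred per disc.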
____________
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
|
|
|

|
|
I'm not so certain that idea would work. One of the pros of volunteering is that it's easy and not a whole lot of work. If people had to load their own workunits via DVD and send results back the same way, they'd quickly find another project.
____________
|
|
|
|
|
|
Aloha!
*shrugs* mo better than spending 2-3 days a week with idle machines reading:
"Scheduler request completed: got 0 new tasks" ;)
____________
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
|
|
|