Use DeDuplication in the files

Questions and Answers : Wish list : Use DeDuplication in the files
cthame

Joined: 20 Aug 06
Posts: 3
Credit: 2,097,551
RAC: 0
Australia
Message 1411988 - Posted: 6 Sep 2013, 2:10:00 UTC

I am wondering: if we deduplicated the data files, couldn't processing or file transmission be quicker?
ID: 1411988
OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 15691
Credit: 84,761,841
RAC: 28
United States
Message 1412000 - Posted: 6 Sep 2013, 2:57:15 UTC - in response to Message 1411988.  

Deduplication is a form of compression, which adds overhead to compress and decompress the files. How would that make processing quicker?


...and actually, SETI@home is trying to find ways to make processing take longer, so as to keep the 150,000+ hosts from constantly pounding the servers for more work. Processing time was recently increased in SETI@home v7 by adding autocorrelation.
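To make the overhead point concrete, here is a minimal sketch of block-level deduplication in Python. It is an illustration only, not how BOINC or SETI@home handle files: the function names, the 4096-byte chunk size, and the use of random bytes as a stand-in for data with no repeated blocks are all assumptions made for this example.

```python
import hashlib
import os


def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each distinct chunk once.

    Returns (store, recipe): store maps a SHA-256 digest to the chunk bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()  # hashing is the extra work
        store.setdefault(digest, chunk)          # keep only the first copy
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original bytes from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


repetitive = b"A" * 4096 * 8     # eight identical chunks
no_repeats = os.urandom(4096 * 8)  # hypothetical stand-in: no repeated blocks

for name, data in (("repetitive", repetitive), ("no_repeats", no_repeats)):
    store, recipe = dedupe_chunks(data)
    assert rebuild(store, recipe) == data
    print(name, len(recipe), "chunks,", len(store), "unique")
```

Deduplication only saves space when whole chunks repeat. Every chunk still has to be hashed on the way in and looked up on the way out, so data without repeated blocks pays the overhead and gains nothing.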
ID: 1412000
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1412111 - Posted: 6 Sep 2013, 10:49:16 UTC

On top of that, the SETI@home tasks are only 367KB each, which is not big.
Even if you compress them, they are still 270KB, a reduction of only about 26%.

The Astropulse tasks are 8MB each, and can only be partially compressed.
I tried 7zip with LZMA Ultra settings: 7zip a -t7z -m0=lzma -mx=9 -mfb=256 -md=512m -ms=on compresses my 8,196KB test AP task to 5,666KB, a reduction of ~30%. This also depends on the AR of the task, as not all of them compress that well.
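For anyone who wants to repeat this kind of measurement without 7zip, here is a rough sketch using Python's standard `lzma` module. It is not the same tool or container as the 7zip command above, so the numbers will not match exactly. The helper name `lzma_reduction` and the two sample buffers are made up for illustration; in practice you would read a real task file into `data` instead.

```python
import lzma
import os


def lzma_reduction(data: bytes, preset: int = lzma.PRESET_DEFAULT) -> float:
    """Return the size reduction (as a fraction of the original) after LZMA.

    preset=9 | lzma.PRESET_EXTREME is the strongest, slowest setting,
    but it needs far more memory than the default preset.
    """
    packed = lzma.compress(data, preset=preset)
    assert lzma.decompress(packed) == data  # round-trip sanity check
    return 1 - len(packed) / len(data)


# Hypothetical sample buffers, not real work units.
structured = b"SETI" * 25000        # highly repetitive, compresses very well
noise_like = os.urandom(100_000)    # random bytes, essentially incompressible

print(f"structured: {lzma_reduction(structured):.1%}")
print(f"noise_like: {lzma_reduction(noise_like):.1%}")
```

The second buffer shows why results vary from task to task: the closer the data is to noise, the less any compressor can remove, and the result can even come out slightly larger than the input.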

Now, the problem is that the compression method needs to be available on the 4 main platforms that BOINC supports: FreeBSD, Linux, Windows and Mac OS X. 7zip (7z) may cover all of those, but it still needs an external 7zip program installed before it can be used.

That means we fall back to the next best thing, one that all platforms support out of the box: plain zip.
Using zip -m0=deflate -mx=9 -mfb=128 -md=32k -ms=on, the SETI@home v7 task compresses from 367KB to 266KB, while the AP task goes from 8,196KB to 8,096KB: reductions of about 28% and 1.2% respectively.
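As a quick check of the arithmetic, the reductions can be recomputed from the sizes quoted in this post. The sketch below uses only those numbers; the function name and labels are made up for illustration, and results are shown to one decimal place, so they can differ slightly from the rounded figures in the text.

```python
def reduction_percent(original_kb: float, compressed_kb: float) -> float:
    """Percentage by which the file shrank relative to its original size."""
    return 100 * (original_kb - compressed_kb) / original_kb


# (original KB, compressed KB) as quoted in the post above
results = {
    "S@H task, first test (method not stated)": (367, 270),
    "AP task, 7zip LZMA": (8196, 5666),
    "S@H v7 task, zip deflate": (367, 266),
    "AP task, zip deflate": (8196, 8096),
}

for label, (before, after) in results.items():
    print(f"{label}: {reduction_percent(before, after):.1f}%")
# S@H task, first test (method not stated): 26.4%
# AP task, 7zip LZMA: 30.9%
# S@H v7 task, zip deflate: 27.5%
# AP task, zip deflate: 1.2%
```

In absolute terms, the best case here saves roughly 100KB on a S@H task and about 2.5MB on an AP task, and only with the heavier 7zip settings for the latter.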
ID: 1412111



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.