Use DeDuplication in the files

Questions and Answers : Wish list : Use DeDuplication in the files

Joined: 20 Aug 06
Posts: 3
Credit: 1,661,888
RAC: 887
Message 1411988 - Posted: 6 Sep 2013, 2:10:00 UTC

I am wondering: if we deduplicated the data files, couldn't processing or file transmission be quicker?

Volunteer tester
Joined: 9 Apr 02
Posts: 13614
Credit: 30,355,503
RAC: 21,204
United States
Message 1412000 - Posted: 6 Sep 2013, 2:57:15 UTC - in response to Message 1411988.

Deduplication compresses files, which adds overhead to compress and decompress them. How would this make processing quicker?
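To put a number on the question, here is a minimal sketch of how one could estimate what block-level deduplication would save on a single file. It assumes fixed-size 4096-byte chunks and SHA-256 fingerprints. Both are illustrative choices, not anything SETI@home or BOINC is known to use, and the function name is hypothetical.

```python
import hashlib


def dedup_savings(data: bytes, chunk_size: int = 4096) -> float:
    """Fraction of bytes that fixed-size chunk dedup would avoid storing.

    chunk_size and the SHA-256 fingerprint are assumptions made for
    illustration; real dedup systems often use variable-size chunking.
    """
    if not data:
        return 0.0
    seen = set()
    unique_bytes = 0
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        # Only the first occurrence of a chunk has to be stored or sent.
        if digest not in seen:
            seen.add(digest)
            unique_bytes += len(chunk)
    return 1.0 - unique_bytes / len(data)
```

Running this over a task file would show whether it contains repeated blocks at all. Noisy recorded data is unlikely to repeat chunk for chunk, in which case the savings would be close to zero.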

...and actually, SETI@home is trying to find ways to make processing longer so as to keep the 150,000+ hosts from constantly pounding the servers for more work. Processing was recently increased in SETI@home v7 by adding autocorrelation into the processing.

Ageless
Joined: 9 Jun 99
Posts: 12300
Credit: 2,598,419
RAC: 1,128
Message 1412111 - Posted: 6 Sep 2013, 10:49:16 UTC

Added to that, the SETI@home tasks are 367KB each, which is not too big.
Even if you compress them, they are still 270KB, a reduction of only about 26%.

The Astropulse (AP) tasks are 8MB each and can only be partially compressed.
I tried 7zip LZMA Ultra: 7zip a -t7z -m0=lzma -mx=9 -mfb=256 -md=512m -ms=on compresses my 8,196KB test AP task to 5,666KB, a reduction of ~30%. This depends on the AR of the task as well, as not all of them compress that well.

Now, the problem is that the compression method needs to be available on the 4 main platforms that BOINC is available for: FreeBSD, Linux, Windows and Macintosh OS X. 7zip (7z) may do all that, but it still needs an external 7zip program before it can be used.

Which then means that we fall back to the next best thing, one that is supported by all out-of-the-box, which is plain zip.
Now then, using zip -m0=deflate -mx=9 -mfb=128 -md=32k -ms=on, the SETI@home v7 task compresses from 367KB to 266KB, while the AP task goes from 8,196KB to 8,096KB. Those are reductions of 28% and 1.2% respectively.
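For anyone who wants to repeat this kind of comparison without an external archiver, here is a rough sketch using Python's standard zlib (deflate) and lzma modules. It does not reproduce the exact 7zip settings above: the lzma call uses the library's default preset, and the helper names are made up for this example.

```python
import lzma
import zlib


def percent_reduction(original_size: int, compressed_size: int) -> float:
    """Size reduction as a percentage of the original size."""
    return 100.0 * (original_size - compressed_size) / original_size


def compare_codecs(data: bytes) -> dict:
    """Compress the same bytes with deflate and LZMA and report the reductions."""
    deflated = zlib.compress(data, 9)  # deflate at maximum level
    xz = lzma.compress(data)           # LZMA in an .xz container, default preset
    return {
        "deflate": percent_reduction(len(data), len(deflated)),
        "lzma": percent_reduction(len(data), len(xz)),
    }
```

Reading a task file in binary mode and passing its bytes to compare_codecs gives the two percentages side by side. percent_reduction can also be used on its own to check the sizes quoted above.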

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic

Copyright © 2014 University of California