Use DeDuplication in the files


Questions and Answers : Wish list : Use DeDuplication in the files

Author Message
cthame
Joined: 20 Aug 06
Posts: 3
Credit: 1,700,105
RAC: 60
Australia
Message 1411988 - Posted: 6 Sep 2013, 2:10:00 UTC

I am wondering: if we deduplicated the data files, couldn't processing or file transmission be quicker?
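One rough way to find out whether deduplication would help is to measure how many blocks in a data file repeat an earlier block. The sketch below is only an illustration: the thread does not say which deduplication scheme is meant, so fixed-size chunks, SHA-256 hashing, and the function name `dedup_ratio` are all assumptions.

```python
import hashlib


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return the fraction of fixed-size chunks that repeat an earlier chunk.

    Hypothetical helper: chunk size and hash choice are assumptions,
    not anything the project is known to use.
    """
    seen = set()      # digests of chunks encountered so far
    total = 0         # number of chunks examined
    duplicates = 0    # chunks whose digest was already in `seen`
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        total += 1
        if digest in seen:
            duplicates += 1
        else:
            seen.add(digest)
    return duplicates / total if total else 0.0
```

A result near 0.0 means almost no chunk repeats, so block-level deduplication would save very little on that file. Data that looks like noise would be expected to land close to 0.0.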

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13706
Credit: 31,734,174
RAC: 12,809
United States
Message 1412000 - Posted: 6 Sep 2013, 2:57:15 UTC - in response to Message 1411988.

Deduplication compresses files, which adds overhead to compress and decompress them. How would this make processing quicker?


...and actually, SETI@home is trying to find ways to make processing longer so as to keep the 150,000+ hosts from constantly pounding the servers for more work. Processing was recently increased in SETI@home v7 by adding autocorrelation into the processing.

Ageless
Joined: 9 Jun 99
Posts: 12474
Credit: 2,695,287
RAC: 1,405
Netherlands
Message 1412111 - Posted: 6 Sep 2013, 10:49:16 UTC

Added to that, the Seti@Home tasks are 367KB each. Not too big.
Even if you compress them, they are still 270KB. A reduction of only about 26%.
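The reduction figures in this post can be checked with a one-line calculation. A minimal sketch, where the function name `percent_reduction` is made up for illustration:

```python
def percent_reduction(original_kb: float, compressed_kb: float) -> float:
    """Size saved by compression, as a percentage of the original size."""
    return (original_kb - compressed_kb) / original_kb * 100.0


# Sizes quoted above for a Seti@Home task, in KB.
print(round(percent_reduction(367, 270), 1))
```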

The Astropulse tasks are 8MB each and can only be partially compressed.
I tried 7zip with LZMA at the Ultra setting: 7zip a -t7z -m0=lzma -mx=9 -mfb=256 -md=512m -ms=on compresses my 8,196KB test AP task to 5,666KB, a reduction of about 31%. This depends on the AR of the task as well, as not all of them compress that well.

Now, the problem is that the compression method needs to be available on the 4 main platforms that BOINC is available for: FreeBSD, Linux, Windows and Macintosh OS X. 7zip (7z) may do all that, but it still needs an external 7zip program before it can be used.

That means we fall back to the next best thing, one that all of them support out of the box: plain zip.
Using zip -m0=deflate -mx=9 -mfb=128 -md=32k -ms=on, the Seti@Home v7 task compresses from 367KB to 266KB, while the AP task goes from 8,196KB to 8,096KB. Reductions of 28% and 1.2% respectively.
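For anyone who wants to repeat this kind of comparison without installing an external archiver, here is a rough sketch using Python's standard `zlib` (deflate) and `lzma` modules. It is not the same as the commands above: `zlib` produces a zlib-wrapped deflate stream rather than a .zip archive, `lzma` produces an .xz stream rather than a .7z archive, and the name `compare_codecs` and the sample payload are invented for illustration, so the sizes will not match the figures quoted here.

```python
import lzma
import zlib


def compare_codecs(data: bytes) -> dict:
    """Compress `data` with deflate and with LZMA; report sizes in bytes."""
    deflated = zlib.compress(data, 9)  # deflate at maximum level
    xz = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)  # LZMA2, .xz container
    # Round-trip both results to confirm the compression is lossless.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(xz) == data
    return {"original": len(data), "deflate": len(deflated), "lzma": len(xz)}


if __name__ == "__main__":
    # Made-up, highly repetitive payload; a real task file would go here.
    sample = b"example header line\n" * 2000
    sizes = compare_codecs(sample)
    for name, size in sizes.items():
        print(f"{name}: {size} bytes")
```

To test a real task, read the file with `open(path, "rb").read()` and pass the bytes to `compare_codecs`.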
____________
Jord

Fighting for the correct use of the apostrophe, together with Weird Al Yankovic


Copyright © 2014 University of California