News from the top - file compression

Message boards : Number crunching : News from the top - file compression

trux
Volunteer tester
Joined: 6 Feb 01
Posts: 344
Credit: 1,127,051
RAC: 0
Czech Republic
Message 253963 - Posted: 26 Feb 2006, 7:35:30 UTC - in response to Message 253941.  

You all seem to be going on about bandwidth here but where is the cpu power coming from to do the compression/decompression at the server end?
As I explained, that will happen only once per four WU downloads, and once per ~830,000 downloads of the 2.6 MB application, saving more than a terabyte of bandwidth. We can speculate about the impact of the additional CPU load on the servers from packing WU downloads and/or result uploads, but for the application downloads there is absolutely no question.
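
For anyone who wants to check the numbers, here is a rough sketch of the arithmetic. The download count and file size are the figures quoted above; the compression ratio of 0.5 is only an assumption for illustration, and decimal units (1 TB = 1,000,000 MB) are assumed as well.

```python
# Back-of-the-envelope check of the application-download saving.
DOWNLOADS = 830_000      # approximate number of application downloads (from the post)
APP_SIZE_MB = 2.6        # uncompressed application size in MB (from the post)
ASSUMED_RATIO = 0.5      # ASSUMPTION: compressed size / original size


def bandwidth_saved_tb(downloads, size_mb, ratio):
    """Return the bandwidth saved in TB, using 1 TB = 1,000,000 MB."""
    total_mb = downloads * size_mb
    return total_mb * (1.0 - ratio) / 1_000_000


total_tb = DOWNLOADS * APP_SIZE_MB / 1_000_000
saved_tb = bandwidth_saved_tb(DOWNLOADS, APP_SIZE_MB, ASSUMED_RATIO)

print(f"total uncompressed traffic: {total_tb:.2f} TB")
print(f"saved at ratio {ASSUMED_RATIO}: {saved_tb:.2f} TB")
```

Under these assumptions the uncompressed traffic is a little over 2 TB, so any ratio near one half gives a saving of more than a terabyte.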

trux
BOINC software
Freediving Team
Czech Republic
ID: 253963
W-K 666 Project Donor
Volunteer tester
Joined: 18 May 99
Posts: 19080
Credit: 40,757,560
RAC: 67
United Kingdom
Message 253966 - Posted: 26 Feb 2006, 7:49:06 UTC

But how often do we need to do an app download? As far as I can remember, we have had at most three changes in the official app in the last year, so compressing the app will only save bandwidth on a few days a year. Most hosts only do one unit a day. One way of saving bandwidth might be to allow those of us with more than one host, especially the farmers, to download a new app once and copy it locally to all of the user's hosts.
ID: 253966
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 254249 - Posted: 26 Feb 2006, 22:26:24 UTC

The host that does the ul/dl is, I believe, not normally CPU bound. It is occasionally I/O bound, and it is also sometimes disk bound (caused by a bug, hopefully never to happen again).


BOINC WIKI
ID: 254249
Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 254294 - Posted: 27 Feb 2006, 0:30:47 UTC - in response to Message 254249.  

The host that does the ul/dl is, I believe, not normally CPU bound. It is occasionally I/O bound, and it is also sometimes disk bound (caused by a bug, hopefully never to happen again).


Ok, the disk usage for doing the compression may cause a problem then. So much for our cpu concerns! haha. But good news as far as the cpu then. Sounds like if disk problems can be conquered it might be worth it, at least for apps, but as was mentioned that's rare. A greater benefit to the I/O would be compressing wu's but it would put a lot more strain on the cpu and disk. Possibly not worth it?
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 254294
MikeSW17
Volunteer tester
Joined: 3 Apr 99
Posts: 1603
Credit: 2,700,523
RAC: 0
United Kingdom
Message 254483 - Posted: 27 Feb 2006, 11:23:04 UTC - in response to Message 253966.  
Last modified: 27 Feb 2006, 11:27:35 UTC

But how often do we need to do an app download? As far as I can remember, we have had at most three changes in the official app in the last year, so compressing the app will only save bandwidth on a few days a year. Most hosts only do one unit a day. One way of saving bandwidth might be to allow those of us with more than one host, especially the farmers, to download a new app once and copy it locally to all of the user's hosts.


/EDIT (See edit at end!)

Actually, in the past year there have been the following releases:

12/05/2005 22:39 6,288,477 boinc_4.39_windows_intelx86.exe
13/05/2005 06:21 6,288,989 boinc_4.40_windows_intelx86.exe
15/05/2005 10:43 6,288,989 boinc_4.41_windows_intelx86.exe
17/05/2005 00:03 6,288,989 boinc_4.42_windows_intelx86.exe
19/05/2005 19:36 6,290,013 boinc_4.43_windows_intelx86.exe
27/05/2005 06:58 6,347,357 boinc_4.44_windows_intelx86.exe
08/06/2005 19:36 6,341,725 boinc_4.45_windows_intelx86.exe
08/07/2005 13:17 6,363,741 boinc_4.70_windows_intelx86.exe
29/09/2005 10:16 10,496,089 boinc_5.1.5_windows_intelx86.exe
11/10/2005 19:26 10,508,377 boinc_5.2.1_windows_intelx86.exe
20/10/2005 21:09 10,507,865 boinc_5.2.2_windows_intelx86.exe
28/10/2005 08:36 10,514,521 boinc_5.2.5_windows_intelx86.exe
04/11/2005 19:39 10,515,545 boinc_5.2.6_windows_intelx86.exe
26/11/2005 19:13 10,532,954 boinc_5.2.10_windows_intelx86.exe
26/11/2005 23:05 10,541,658 boinc_5.2.11_windows_intelx86.exe
19/12/2005 23:58 10,546,778 boinc_5.2.13_windows_intelx86.exe

That's 16. Over 130 MB of BOINC releases, and I may have missed a few short-lived ones.

Admittedly, not everyone downloads every/most releases but it's still a significant transfer load.
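
The total can be checked by adding up the sizes in the listing above. This is just a sketch of that sum; the version labels and byte counts are copied from the listing, and nothing else is assumed.

```python
# Installer sizes in bytes, copied from the directory listing above.
RELEASE_SIZES = {
    "4.39": 6_288_477,
    "4.40": 6_288_989,
    "4.41": 6_288_989,
    "4.42": 6_288_989,
    "4.43": 6_290_013,
    "4.44": 6_347_357,
    "4.45": 6_341_725,
    "4.70": 6_363_741,
    "5.1.5": 10_496_089,
    "5.2.1": 10_508_377,
    "5.2.2": 10_507_865,
    "5.2.5": 10_514_521,
    "5.2.6": 10_515_545,
    "5.2.10": 10_532_954,
    "5.2.11": 10_541_658,
    "5.2.13": 10_546_778,
}

release_count = len(RELEASE_SIZES)
total_bytes = sum(RELEASE_SIZES.values())

print(f"{release_count} releases")
print(f"{total_bytes:,} bytes")
print(f"{total_bytes / 1e6:.1f} MB (decimal), {total_bytes / 2**20:.1f} MiB (binary)")
```

Whether the total is "over 130" depends on the unit: it is in decimal megabytes, and slightly under in binary mebibytes.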

/EDIT
Oops! On re-reading I realise you're talking about the science app. Nevertheless, compression should certainly help on the one-off side, but I'm less convinced that it will help with WUs.

ID: 254483
Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 254493 - Posted: 27 Feb 2006, 11:37:54 UTC
Last modified: 27 Feb 2006, 11:38:32 UTC

I'm not sure the app is important in this measurement exercise that's going on. I took just one of those apps last year - 5.2.13. What I do take every day is (a decent guess here) about 160 work units at something like 354 KB each. That's approaching 60 MB per day, every day.

Compress it, I say, and the results coming back too! Batch them as well, and stop giving in to "I want to see each result get credit as it completes"; I vote for one batch a day, all compressed. Now there is a serious saving in server overhead.
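
As a quick sanity check of the daily figure, here is a small sketch. The work unit count and size are the guesses from the post; the saving fraction is purely hypothetical, and decimal units (1 MB = 1000 KB) are assumed.

```python
WUS_PER_DAY = 160        # from the post ("a decent guess")
WU_SIZE_KB = 354         # from the post
ASSUMED_SAVING = 0.25    # ASSUMPTION: fraction of bytes removed by compression

daily_mb = WUS_PER_DAY * WU_SIZE_KB / 1000          # uncompressed MB per day
daily_mb_compressed = daily_mb * (1 - ASSUMED_SAVING)

print(f"uncompressed: {daily_mb:.2f} MB/day")
print(f"compressed (assumed): {daily_mb_compressed:.2f} MB/day")
```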

ID: 254493
Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 254499 - Posted: 27 Feb 2006, 11:57:56 UTC - in response to Message 254483.  


The greatest benefit as far as disk space and bandwidth go (about 100 KB apiece) would come if it were possible to store the wu's on the disk as compressed files. However, since there are only four or a few more copies of each wu on the disk, the cpu usage for compressing all those files would be high. A slightly better situation would be if the splitter could compress the files (I'm assuming the splitter is a separate computer and has the cpu cycles to spare), but I understand the header of the wu has to be parsed for information to store in the database. Of course, it might be possible here to uncompress just the first couple of hundred bytes (just enough to get the header info) without uncompressing the entire file. Also, I'm talking about compressing the wu's before they are even stored, not "on the fly" when they're sent. This would mean less disk space needed to store them, and the compression/storage could be done "off-peak". That would give the greatest return on the investment of cpu cycles.
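
The idea of uncompressing only the first couple of hundred bytes can be sketched with Python's `zlib` streaming interface. This is only an illustration of the technique, not how the SETI@home servers work: the header text, the payload, and the `peek_header` helper are all made up for the example.

```python
import zlib


def peek_header(compressed: bytes, nbytes: int = 200) -> bytes:
    """Decompress at most `nbytes` from the start of a zlib stream."""
    decompressor = zlib.decompressobj()
    # The second argument (max_length) caps how much output is produced,
    # so the rest of the stream is never expanded.
    return decompressor.decompress(compressed, nbytes)


# Hypothetical workunit: a small XML-like header followed by bulk data.
header = b"<workunit_header><name>example_wu</name></workunit_header>\n"
payload = bytes(range(256)) * 1000
blob = zlib.compress(header + payload)

head = peek_header(blob, 200)
print(head[:len(header)].decode())
```

In a real setup one would presumably also read only the first chunk of the compressed file from disk rather than the whole thing, since the streaming decompressor accepts partial input.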
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 254499
Jim-R.
Volunteer tester
Joined: 7 Feb 06
Posts: 1494
Credit: 194,148
RAC: 0
United States
Message 254501 - Posted: 27 Feb 2006, 12:08:03 UTC - in response to Message 254493.  

I'm not sure the app is important in this measurement exercise that's going on. I took just one of those apps last year - 5.2.13. What I do take every day is (a decent guess here) about 160 work units at something like 354 KB each. That's approaching 60 MB per day, every day.

Compress it, I say, and the results coming back too! Batch them as well, and stop giving in to "I want to see each result get credit as it completes"; I vote for one batch a day, all compressed. Now there is a serious saving in server overhead.

The compression of the wu's would save about 1/3 in bandwidth, and if they were stored on disk in compressed format it would also save 1/3 of the disk space; however, the cpu power to compress all those files would be fairly large. On the other hand, it's been found that compressing the application saves about 1/2 of the file size, so half the bandwidth of sending it out for just a slight one-time cpu jump: very little cpu usage and negligible disk space savings, but a tremendous return in saved bandwidth. Wu's, however, would give greater savings overall on both disk space and bandwidth, but at a very high cost in cpu cycles. Can't get something for nothing! It's just a question of whether the saving in disk space and bandwidth is worth the extra expense in cpu cycles.
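
One way to put rough numbers on that trade-off is to time the compressor on a representative file and compare the time spent with the bytes saved. The sketch below does this with `zlib` at a few compression levels. The sample data is an artificial stand-in, not a real wu, so the ratios it prints say nothing about actual SETI@home files; `measure` is a hypothetical helper.

```python
import time
import zlib


def measure(data: bytes, level: int):
    """Return (compressed size / original size, seconds spent compressing)."""
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(out) / len(data), elapsed


# Stand-in data only; substitute the bytes of a real file to get useful numbers.
sample = (b"0123456789abcdef" * 64 + bytes(range(256))) * 200

for level in (1, 6, 9):
    ratio, secs = measure(sample, level)
    print(f"level {level}: ratio {ratio:.3f}, {secs * 1000:.2f} ms")
```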
Jim

Some people plan their life out and look back at the wealth they've had.
Others live life day by day and look back at the wealth of experiences and enjoyment they've had.
ID: 254501
Tigher
Volunteer tester
Joined: 18 Mar 04
Posts: 1547
Credit: 760,577
RAC: 0
United Kingdom
Message 254505 - Posted: 27 Feb 2006, 12:22:03 UTC
Last modified: 27 Feb 2006, 12:24:00 UTC

Well, it's in 5.3.22, which has been released (non-stable test version!), so we will see, or perhaps hear, how successful it is or is not. Then again... we never hear much about anything, so let's not build our hopes up!

I think it's likely to be pretty flaky, as I "think" each WU will be zipped as it is sent, so it may then be zipped 3 or 4 or 5 times. Unnecessary CPU time for sure, but saved bandwidth. No change on disc. As I say, I think this is how it's done. I might get the source and take a peek.
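
If the WUs really are zipped on every send, the repeated work could in principle be avoided by compressing once and reusing the result. Here is a minimal sketch of that idea. It is not taken from the BOINC source; `CompressedCache` and the WU name are invented for illustration.

```python
import zlib


class CompressedCache:
    """Compress each named file once and reuse the result for later sends."""

    def __init__(self):
        self._cache = {}
        self.compress_calls = 0   # counts how often real compression happened

    def get(self, name: str, raw: bytes) -> bytes:
        if name not in self._cache:
            self._cache[name] = zlib.compress(raw)
            self.compress_calls += 1
        return self._cache[name]


cache = CompressedCache()
raw = b"example workunit contents " * 1000   # stand-in data

# The same WU goes out to four hosts, but is compressed only once.
sent = [cache.get("wu_example", raw) for _ in range(4)]
print("sends:", len(sent), "compressions:", cache.compress_calls)
```

The price is memory or disk for the cached copies, which is the same disk-versus-CPU trade-off discussed earlier in the thread.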

ID: 254505
PRouleau
Joined: 3 Apr 99
Posts: 15
Credit: 20,638,960
RAC: 0
Canada
Message 441122 - Posted: 21 Oct 2006, 10:33:17 UTC

Hi!

I would like to have an update on this thread.

Are the workunits sent in a compressed form?


My problem is that we have a lot of idle cpu cycles at work, but we are limited by the bandwidth. Using only 7 PCs, BOINC uses ~1GB of our 20GB/month. The dual cores boost the download frequency...

It would be nice if the wu was also saved in compressed form, but the HD space isn't a problem (on the client side). A compressed partition can do the job, when needed.

ID: 441122
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 441132 - Posted: 21 Oct 2006, 10:54:44 UTC - in response to Message 441122.  
Last modified: 21 Oct 2006, 10:56:59 UTC

Hi!

I would like to have an update on this thread.

Are the workunits sent in a compressed form?


My problem is that we have a lot of idle cpu cycles at work, but we are limited by the bandwidth. Using only 7 PCs, BOINC uses ~1GB of our 20GB/month. The dual cores boost the download frequency...

It would be nice if the wu was also saved in compressed form, but the HD space isn't a problem (on the client side). A compressed partition can do the job, when needed.

I understand that the latest Alpha test versions of BOINC have this ability, but I don't know if any projects are planning to use it.

In any event, how much use would it be here on SETI? The data we're searching is very close to random - that's why we have to use so much computing power to find patterns in it! And random files don't compress well.

I recently used WinZip to compress a data file in case it might be useful for the optimisers to analyse one with a fairly rare AR. It went down from 354KB to 267KB, a 25% saving in bandwidth. Useful, but not the answer to all your problems.
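
Both points can be checked with a few lines of Python. This sketch works out the saving from the two file sizes quoted above, then shows how a general-purpose compressor behaves on random bytes. The random buffer is only a stand-in and is not SETI data.

```python
import os
import zlib

ORIGINAL_KB = 354   # data file size before zipping (from the post)
ZIPPED_KB = 267     # size after zipping (from the post)

saving = (ORIGINAL_KB - ZIPPED_KB) / ORIGINAL_KB
print(f"saving: {saving:.1%}")

# Random bytes as a stand-in for incompressible data.
random_bytes = os.urandom(100_000)
random_ratio = len(zlib.compress(random_bytes, 9)) / len(random_bytes)
print(f"random data: compressed/original = {random_ratio:.4f}")
```

The saving comes out at just under 25%, and the random buffer should not shrink at all.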
ID: 441132
Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 441548 - Posted: 22 Oct 2006, 2:15:36 UTC

The majority of the data in the WU files is also ASCII encoded binary data. (the first little part is XML). Can't be sure it would compress if still in binary form.
ID: 441548
John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 441580 - Posted: 22 Oct 2006, 3:34:37 UTC - in response to Message 441548.  

The majority of the data in the WU files is also ASCII encoded binary data. (the first little part is XML). Can't be sure it would compress if still in binary form.

But ASCII compresses very nicely.


BOINC WIKI
ID: 441580
Hans Dorn
Volunteer developer
Volunteer tester
Joined: 3 Apr 99
Posts: 2262
Credit: 26,448,570
RAC: 0
Germany
Message 441592 - Posted: 22 Oct 2006, 3:55:18 UTC - in response to Message 441580.  

The majority of the data in the WU files is also ASCII encoded binary data. (the first little part is XML). Can't be sure it would compress if still in binary form.

But ASCII compresses very nicely.


Sending the WUs in binary in the first place would compress even better :o)

Regards Hans
ID: 441592
Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 442504 - Posted: 23 Oct 2006, 15:38:07 UTC - in response to Message 441580.  

The majority of the data in the WU files is also ASCII encoded binary data. (the first little part is XML). Can't be sure it would compress if still in binary form.

But ASCII compresses very nicely.


Normally True John, but not seemingly random ASCII as in one of the WUs.

Open one yourself and have a look.
ID: 442504
Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 442505 - Posted: 23 Oct 2006, 15:38:31 UTC - in response to Message 441580.  
Last modified: 23 Oct 2006, 15:39:44 UTC

The majority of the data in the WU files is also ASCII encoded binary data. (the first little part is XML). Can't be sure it would compress if still in binary form.

But ASCII compresses very nicely.


Normally true John,

But not the seemingly random ASCII of an encoded WU. Open one yourself and look at the last hundred lines or so.
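
A small experiment along these lines: take random bytes, encode them as ASCII, and compress both forms. Base64 is used here purely as an assumed stand-in, since the thread does not say which encoding the WU files actually use, and the pseudo-random buffer is not real WU data.

```python
import base64
import random
import zlib

rng = random.Random(0)   # fixed seed so the run is repeatable
raw = bytes(rng.getrandbits(8) for _ in range(100_000))

# ASSUMPTION: base64 stands in for whatever ASCII encoding the WUs use.
encoded = base64.b64encode(raw)

raw_z = len(zlib.compress(raw, 9))
enc_z = len(zlib.compress(encoded, 9))

print(f"raw binary:      {len(raw):>7} bytes -> {raw_z:>7} compressed")
print(f"ASCII (base64):  {len(encoded):>7} bytes -> {enc_z:>7} compressed")
print(f"ASCII compressed/original = {enc_z / len(encoded):.3f}")
```

With this stand-in, the ASCII form shrinks by roughly a quarter, which only wins back the overhead the encoding added; the compressed ASCII is still no smaller than the raw binary. That is in the same ballpark as the 354 KB to 267 KB WinZip result reported earlier in the thread, although that may be coincidence.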
ID: 442505


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.