Blitzed Again (Jul 02 2009) |
Message boards : Technical News : Blitzed Again (Jul 02 2009)
| Author | Message |
|---|---|
|
| |
| ID: 914176 · | |
|
Rather than speculating in a vacuum, I suggest any of you could simply try compressing a WU or two. You'd find that setiathome_enhanced WUs are moderately compressible, astropulse_v505 only slightly. If compressed downloads were implemented, it would be akin to adding another 25 MBits/second to the download bandwidth; enough to help short term, but hardly a permanent fix. Correct me if I am wrong, but isn't it a text file that is sent each way? The payload in Enhanced work is 256KB of nearly random data encoded in uue fashion, so can be sent as text. The payload in AP work is 8MB of pure 8 bit nearly random data. In both cases, it's the task of the science applications to ferret out the few cases where the data deviates from random noise. Joe | |
| ID: 914192 · | |
Rather than speculating in a vacuum, I suggest any of you could simply try compressing a WU or two. You'd find that setiathome_enhanced WUs are moderately compressible, astropulse_v505 only slightly. If compressed downloads were implemented, it would be akin to adding another 25 MBits/second to the download bandwidth; enough to help short term, but hardly a permanent fix. The enhanced files are being transmitted as XML, not binary; therefore there is something to work with, even if the underlying data is completely random. The one enhanced WU that I have compresses by 27% with a fairly weak compression technique. That is assuming that I used the right file for the test. The AP task I tried compressing got 11% with one of the better compression techniques that WinZip has at its disposal. This is a somewhat disappointing number. The question still remains as to whether the CPU time spent compressing those tasks is worth the saved bandwidth. Typically the project is shorter on CPU time than on bandwidth, although this is not true in all cases. ____________ BOINC WIKI | |
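For anyone who wants to repeat this kind of test without a real workunit at hand, here is a rough sketch. It assumes a 256 KB payload of random bytes as a stand-in (real workunits also carry an XML header, which is not modelled here), uses Base64 as the text encoding purely for illustration, and uses zlib's DEFLATE rather than whatever WinZip method was used above.

```python
import base64
import os
import zlib


def compression_saving(data: bytes) -> float:
    """Return the fraction of bytes saved by zlib (DEFLATE) at level 6."""
    compressed = zlib.compress(data, 6)
    return 1.0 - len(compressed) / len(data)


# Stand-in for a workunit payload (an assumption, not a real WU):
# 256 KB of random bytes.
raw = os.urandom(256 * 1024)

# The same payload as text. Base64 with line breaks is used for illustration.
as_text = base64.encodebytes(raw)

raw_saving = compression_saving(raw)
text_saving = compression_saving(as_text)

print(f"raw binary : {raw_saving:.1%} saved")
print(f"base64 text: {text_saving:.1%} saved")
```

If the sketch behaves as expected, the raw bytes barely shrink at all while the text form gives back roughly the overhead that the text encoding added in the first place, which is in the same ballpark as the 27% reported for the enhanced WU.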
| ID: 914197 · | |
|
In my opinion, any dual-core processor spends not a lot of time wrapping it up and compressing....a quad even less.... | |
| ID: 914206 · | |
In my opinion, any dual-core processor spends not a lot of time wrapping it up and compressing....a quad even less.... Especially when it's not doing anything else. But when the CPU is busy with other tasks, the system is low on available memory, and the disk sub-system is busy with other throughput, you will find that compressing even a small file can take a while, even more so when you've got to do 10, or 100, or 1,000 per second to keep up with demand. ____________ Grant Darwin NT. | |
| ID: 914219 · | |
|
i'll still say "A byte Saved is a Byte that can be spent later." this concept is helping DD@H and H@H, when it comes to paying for bandwidth. | |
| ID: 914224 · | |
i'll still say "A byte Saved is a Byte that can be spent later." this concept is helping DD@H and H@H, when it comes to paying for bandwidth. The problem here isn't just one of bandwidth; it's resources overall. Compression certainly might ease the bandwidth problem, but then the load it imposes on the system could cause a new bottleneck elsewhere in the system. hey, if i were to win the lottery, i would donate a lot to boinc projects.... Same here. Unfortunately I missed out on the recent $120,000,000 draw. ____________ Grant Darwin NT. | |
| ID: 914233 · | |
|
Lossless compression is usually done with the Lempel-Ziv-Welch (LZW) algorithm or a variation of it. | |
| ID: 914264 · | |
|
Perhaps the problem is that of the distribution algorithm? | |
| ID: 914277 · | |
As the Seti data files consist of a header followed by random data, the header can be compressed considerably but the random data will only be compressed a very small amount. The header is about the same size for MB and AP tasks. Random data is like concrete. It cannot be compressed because there are too few recurring patterns to be worth substituting. AP files contain binary data. MB files contain base64 encoded binary data, and so are 33% bigger than if they were plain binary. They are still not very compressible with LZW because the base64 codes are also random, just with 64 symbols instead of the 256 of plain binary. base64 can be compressed by removing the redundant 2 bits of each byte and then replacing them afterwards, but this is a different algorithm to LZW! Therefore, the lowest hanging fruit to reduce bandwidth demand is to remove the base64 encoding for the data in the MB files. If the newer AP workunits work without it, then why not for MB too? Andy. | |
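To put numbers on the "redundant 2 bits" point: stripping those bits is just what a Base64 decoder does. The sketch below assumes the MB payload is plain Base64, which other posts in this thread dispute, and uses a made-up 3 KB random payload instead of real workunit data.

```python
import base64
import os

# Hypothetical binary payload; 3 KB so the Base64 output needs no padding.
payload = os.urandom(3 * 1024)

# Text form: every 3 bytes become 4 characters of 6 useful bits each.
encoded = base64.b64encode(payload)

# "Removing the redundant 2 bits of each byte" is ordinary Base64 decoding.
decoded = base64.b64decode(encoded)

overhead = len(encoded) / len(payload) - 1          # growth caused by encoding
saving_if_removed = 1 - len(payload) / len(encoded)  # share of traffic saved

print(f"encoding overhead      : {overhead:.1%}")
print(f"saving by sending raw  : {saving_if_removed:.1%}")
```

Note the two percentages differ: a file that is 33% bigger as text shrinks by 25% when sent as binary, because the saving is measured against the larger size.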
| ID: 914287 · | |
I'm not so sure about compression. My understanding was that random noise wouldn't compress very much at all as there are no patterns in it to exploit to compress. Now a WU with a signal might compress because of the pattern of the signal. ... Very good idea, and that has parallels to the statistical analysis used to break the German Enigma codes. But... Would you get anything that could be distinguished from the background noise? Even for when Arecibo is sweeping across the sky...? Try the idea on a selection of VLARs? Keep searchin', Martin ____________ Mandriva Linux A user friendly OS! See new freedom Mageia2 The Future is what We make IT (GPLv3) | |
| ID: 914304 · | |
|
Getting back to a point much earlier in this thread about the system being designed not to be overloaded by noisy work because there are multiple splitters working on multiple files... Has anyone noticed that the splitter working on 05mr09af appears to be 'stuck'? It's been at the same point since the servers came back up on Tuesday... | |
| ID: 914318 · | |
In my opinion, any dual-core processor spends not a lot of time wrapping it up and compressing....a quad even less.... The processing bottleneck would be at the server, not at the client. Now imagine having to compress ten to twenty tasks per second in order to feed the outbound queue. Similar for decompressing the result data. ____________ BOINC WIKI | |
| ID: 914369 · | |
In my opinion, any dual-core processor spends not a lot of time wrapping it up and compressing....a quad even less.... I agree that compression is tough. ... but what we're doing right now is taking the binary data and converting it to text. The mime type is x-setiathome, which is legal but non-standard, and I can't tell how efficient it is by looking at it. If it's like Base64, it is 4 characters for 3 bytes; if it's like ASCII85, it's 5 characters for 4 bytes. Then it converts back on the other end. Shifting from a text encoding to binary (the way AP does it) could save 15% or more, save storage on disk, etc. Of course, the science application would have to understand more than one encoding method. ____________ | |
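The relative cost of the candidate text encodings can be checked directly with Python's standard library. This is only a sketch: the actual x-setiathome encoding is not identified in this thread, and the 240 sample bytes are arbitrary.

```python
import base64
import binascii

# 240 sample bytes containing no zero bytes, so Ascii85's "z" shortcut for
# all-zero groups never triggers; 240 divides evenly by both 3 and 4.
data = bytes(range(1, 241))

b64 = base64.b64encode(data)            # 4 characters per 3 bytes
a85 = base64.a85encode(data)            # 5 characters per 4 bytes
uu_line = binascii.b2a_uu(data[:45])    # one uuencoded line holds 45 bytes

print(f"Base64  : {len(b64) / len(data):.3f} characters per byte")
print(f"Ascii85 : {len(a85) / len(data):.3f} characters per byte")
print(f"uuencode: {len(uu_line) / 45:.3f} characters per byte (one line)")
```

uuencode comes out slightly worse than Base64 here because each line also carries a length character and a newline. Going from Base64 to raw binary would save 25% of the payload bytes, and from Ascii85 to raw binary 20%.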
| ID: 914380 · | |
... Shifting from a text encoding to binary (the way AP does it) could save 15% or more, save storage on disk, etc. The code is already there as used in AP... Just add an "if" statement? (And someone to code it.) Happy crunchin', Martin ____________ Mandriva Linux A user friendly OS! See new freedom Mageia2 The Future is what We make IT (GPLv3) | |
| ID: 914384 · | |
In my opinion, any dual-core processor spends not a lot of time wrapping it up and compressing....a quad even less.... re mime type This morning it came to my attention that we've been sending out workunits with the "application/x-troff-man" mime type. This was because files with numerical suffixes (like workunits) are assumed to be man pages. This may have been causing some blockages at firewalls. I changed the mime type to "text/plain." I get the feeling some Apache tables have been borked. Elsewhere I believe someone said they were UUcodes. About the worst choice from a bandwidth standpoint but perhaps the only 100% compatible choice at the time BOINC was put in place. Today I'm sure that a different encoding can be chosen. As to compression, and some tests people have run, first don't forget when you compress the file, you are compressing a file that was uncompressed to be transmitted as plain text. At least one of every 8 bits is a zero and 12.5% compression is automatic in that case. If they are UUCode or Base64 or anything else, none of these use all 128 possible 7 bit configurations, so even more automatic compression is a given. But all this automatic compression is false. To transmit you have to uncompress to plain text. The numbers I saw quoted looked like MB is UUCode and AP is Base64 with random incompressible data inside as expected. Perhaps the best compression would be to transmit in a full 8 bit binary mode. I'm not sure however if that is 100% compatible with all the equipment everywhere in use on the net. I know that everything sold today is, but who has what 20 year old box with a couple of percent of SETI users behind it? ____________ | |
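The "automatic" compression described above can be estimated from the alphabet size alone. The sketch below assumes every character is stored in an 8-bit byte and that all symbols are equally likely (as they would be for encoded random data); `max_saving` is a helper name made up for this illustration.

```python
import math


def max_saving(alphabet_size: int, bits_per_stored_char: int = 8) -> float:
    """Upper bound on the saving from the unused bit patterns alone,
    assuming all symbols in the alphabet are equally likely."""
    useful_bits = math.log2(alphabet_size)
    return 1 - useful_bits / bits_per_stored_char


for name, size in [("7-bit ASCII", 128), ("Ascii85", 85), ("Base64/UUcode", 64)]:
    print(f"{name:14s}: up to {max_saving(size):.1%} 'automatic' saving")
```

Under these assumptions the 7-bit case gives exactly the 12.5% quoted above, and a 64-symbol encoding gives 25%. None of it is a real gain, since it only undoes the expansion added by the text encoding.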
| ID: 914389 · | |
re mime type That's talking about the HTTP header, and the only thing that would look at that is some nosy, paranoid firewall (they're supposed to be paranoid). I'm talking about the internal type in the XML-ish work-unit. That is x-setiathome. It's a private encoding, or it wouldn't start with "x-" and it doesn't have to follow UUENCODE because SETI@Home "owns" both ends of the process. Looking at the data, it isn't hex, which would be kind-of dumb. UUENCODE and Base64 look a lot alike, and are about as efficient at six bits per character. It could be something like ASCII85, which encodes four bytes into five characters. What it isn't is raw binary. Binary could work because HTTP can handle binary (or you couldn't have graphics on web pages, GIFs and JPEGs are binary), and it would have the least CPU load, especially on the server end. ____________ | |
| ID: 914418 · | |
... Shifting from a text encoding to binary (the way AP does it) could save 15% or more, save storage on disk, etc. ... and test it, and then time to deploy, and it'd be nice to give the optimized apps a chance to add it to their apps. The actual coding is probably the easiest part. ____________ | |
| ID: 914421 · | |
How about rather than getting other unis involved you get Nvidia involved. The deciding factor for me between an ATI card and an Nvidia card was the CUDA. Both cards play the games I want to play but only one will crunch. Amp up CUDA support messages on the SETI website and I'm sure that Nvidia would find hosting *all* the SETI boxes with a monster pipe a very profitable move. Win Win Win all round. Berkeley gets the boxes and their associated costs off site, we get WUs to crunch, Nvidia gets advertising worth millions, the SETI team spends money on research rather than air conditioning and leasing space. Cheers Jason =:) | |
| ID: 914561 · | |
|
well, hey that would be nice, but people are working on an opencl platform that will allow nvid and ati cards to crunch.... it would go much faster if ati weren't so "do it yourself...." which then if it came out, there goes the incentive for nvid. | |
| ID: 914566 · | |
| Copyright © 2013 University of California |