Message boards :
Number crunching :
Idea for Boinc.....
Author | Message |
---|---|
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Apart from the fact that there simply are too many computers or too little bandwidth, not much can be done. Making the WUs bigger means fewer WUs, but still the same amount of data. Compressing MB WUs could help, if it isn't too much of an 'operation' and doesn't put extra strain on the server, or the splitter, which would be used for compression. I'm noticing a crazy amount of time and retries downloading a few AstroPulse WUs: 36 hours and still downloading... but eventually, they arrive. Uploads aren't possible at all. (Have to check again!) Hope the extra memory for the PAIX router will help, as I read in another post; forgot which 'thread'. |
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
Apart from the fact that there simply are too many computers or too little bandwidth, not much can be done. You're right. I compressed some VLAR files and they got smaller by about a third: 360 kbytes became 260 kbytes, which is a surprise to me. Maybe that is a coincidence for my particular VLAR files, but it is surely interesting enough to follow up. The extra strain on SETI's servers should be manageable. |
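janneseti's one-third figure is easy to sanity-check. A minimal sketch, assuming the workunit payload is Base64-encoded text (as MB WUs are); the random bytes below only stand in for real signal data:

```python
import base64
import gzip
import os

# Stand-in for a workunit's data section: random "signal" bytes,
# Base64-encoded the way the splitter encodes real WUs.
raw = os.urandom(196608)            # 192 KiB of incompressible bytes
encoded = base64.b64encode(raw)     # Base64 inflates this by ~33%

# Lossless gzip claws back most of that encoding overhead, because
# Base64 text uses only 64 of the 256 possible byte values.
packed = gzip.compress(encoded)
print(f"{len(encoded)} -> {len(packed)} bytes "
      f"({len(packed) / len(encoded):.0%} of encoded size)")
```

Even on incompressible input this lands around 75-80% of the encoded size, the same ballpark as the 360 kB to 260 kB observed above; real VLAR files evidently compress slightly better.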
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
One other possibility for MB WUs, which has been noted before, is to switch to pure binary data rather than the Base64 format used now. That would require an application change to handle, but S@h v7 could fairly easily be modified along those lines; the code could be imported from the Astropulse application. That would reduce the data section from 354991 bytes to just 262144 bytes. Overall WU size would be almost as small as compressing the current WUs, but with no extra computation needed server-side. It would actually reduce splitter load slightly, since storing the raw binary is faster than converting it to Base64. The download pipe may get ~72000 MB WUs sent per hour now on average when operating well, with some AP mixed in. The change would allow that to increase to about ~92000 per hour. Not a miracle cure, but perhaps enough to justify the change. Joe |
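Joe's 354991-vs-262144 figure falls straight out of Base64 arithmetic. A quick check; the 64-character line wrap is an assumption, but it reproduces the quoted size to within a byte:

```python
import math

BINARY = 262144                    # 256 KiB raw data section (Joe's figure)

# Base64 maps every 3 input bytes to 4 output characters...
b64_chars = 4 * math.ceil(BINARY / 3)

# ...and the encoder wraps the text into lines (64 chars assumed here),
# adding one newline per line.
b64_total = b64_chars + math.ceil(b64_chars / 64)

print(b64_chars, b64_total)        # 349528 and 354990, vs the quoted 354991
```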
rob smith Send message Joined: 7 Mar 03 Posts: 22204 Credit: 416,307,556 RAC: 380 |
While it may appear to be a "good idea", there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated. You also have the not inconsiderable server overhead of compressing the data before dispatch. During the transition period you have to be able to detect which clients can accept a compressed WU and which can't - it's no good trying server-side tracking, it just won't work given the rate some folks update, tweak, modify and regress their crunchers. Most of the projects that are held up as using compressed data did so from the outset, and as such both the clients and the servers have been configured with compression/de-compression in mind, rather than having it "bolted on" as a late addition. Mark's initial idea was a fairly simple attempt at reducing the amount of traffic, mainly by hitting the overheads, the non-productive traffic, and as such it bears some thinking about: how to reduce the amount of non-productive traffic? And doing so in such a manner that it is "client transparent" - in other words, if the client can perform the latest magic trick it does it, but if it can't then the servers don't get themselves in a twist... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
soft^spirit Send message Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
A basic unzip is available through most clients. Proprietary formats and .rar should be avoided due to compatibility/licensing issues, and I am pretty sure there are free versions that could be sent out with a version of the client. Self-extracting archives are possible, but each .exe would be reviewed and might require manual intervention by the end user - not a good idea for this application. But in principle, if it saves a fair amount of space, it should make better use of the bandwidth. Yes, the servers would have to have the capacity to perform the compress function, but in theory it could be a good idea. Janice |
Tex1954 Send message Joined: 16 Mar 11 Posts: 12 Credit: 6,654,193 RAC: 17 |
I didn't see this thread and started a new one. http://setiathome.berkeley.edu/forum_thread.php?id=65725&nowrap=true#1160350 MPEG or ZIP compression algorithms would be a really iffy thing. One could not rely on them much due to the nature of the data; it could be all irrelevant NULL data or filled with subtle pertinent data that could be lost in compression. I think by far the best idea is to do what I (and others) have already suggested: make the tasks 10 times longer. Einstein sends multiple 4-meg files per WU and has a small upload. SETI seems to be about the same... larger download, smaller upload. So, thinking disk-related I/O is bottlenecking throughput and causing lost packets (due to timeouts and other concurrent I/O requests), making the tasks 10 times longer (by whatever means) could theoretically reduce indexing/disk I/O lookups by an order of magnitude. JMHO :) |
Wiggo Send message Joined: 24 Jan 00 Posts: 34754 Credit: 261,360,520 RAC: 489 |
Some time ago I remember one of the guys from the closet (I just can't remember which one) said that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse. Cheers. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Some time ago I remember one of the guys from the closet (I just can't remember which one) said that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse. Sorry, but rubbish. There are two sorts of file compression: lossy and lossless. 'Lossy' compression (like JPEG for images) works by throwing away fine detail that the human eye wouldn't notice anyway. That would indeed introduce errors, and wouldn't be considered here. But 'lossless' compression, like the various flavours of ZIP, allows the recreation of the exact same digital data file at the destination. That wouldn't cause any errors at all, and it's perfectly fair to consider it here - even if the compression ratios achieved with lossless compression are much less impressive than those for lossy compression. |
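Richard's distinction is easy to demonstrate: a lossless round trip returns the input bit for bit. A minimal sketch using zlib; the payload is arbitrary stand-in data, not a real workunit:

```python
import os
import zlib

# Arbitrary stand-in for a workunit: noise plus some repetitive text.
original = os.urandom(4096) + b"spike pulse triplet gaussian " * 512

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

# Lossless means exactly that: no signal can be altered in transit.
assert restored == original
print(len(original), "->", len(compressed), "bytes, restored bit-for-bit")
```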
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
Some time ago I remember one of the guys from the closet (I just can't remember which one) said that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse. BOINC already has a solution to this, and we crunchers do not have to bother. |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
I didn't see this thread and started a new one. Making the tasks 10 times larger won't help much, as the size of the data for a typical update is minuscule in comparison to the size of the data transferred for each task. The gain would be a few percent at most: there would be ten times the data sent for each task as now, so the only gains would be the overlap and the slight decrease in the size of a work request. Zipping the files would decrease the data size of MB tasks by about 30%, but would increase the load on some of the servers that are already near the edge of their CPU power. BOINC WIKI |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Maybe a combination of things? 1. Josef's idea of pure binary data instead of Base64 (approx 50% saving in data size) 2. Increase the wu size by 50% so it's back to its original size, but the host is getting more work to do 3. gzip compression on the scheduler request That saves the servers having to compress (except on the scheduler request), thus doesn't require too much extra cpu power, and the number of scheduler requests should drop as larger work is being given out. BOINC blog |
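For what it's worth, plugging Joe's byte counts from earlier in the thread into step 1 gives a somewhat smaller saving than the approximate 50% quoted above; the numbers below are illustrative only:

```python
# Joe's figures: Base64 data section vs the same samples stored as raw binary.
base64_section = 354991
binary_section = 262144

# Step 1: switching to binary shrinks each download by about a quarter.
saving = 1 - binary_section / base64_section

# Step 2: growing the WU until the download is back to its old size means
# each download carries correspondingly more work, i.e. fewer downloads
# (and fewer scheduler requests) for the same total work.
work_per_download = base64_section / binary_section

print(f"{saving:.0%} smaller per WU, {work_per_download:.2f}x work per download")
```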
W-K 666 Send message Joined: 18 May 99 Posts: 19063 Credit: 40,757,560 RAC: 67 |
It is not the size of the task that is important but the result, and the storing thereof. Since the beginnings of Seti Classic the tasks have been of the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians). After we have completed a task they are stored in a table, which at its simplest has 31 columns, the first containing the WU number. So what you are proposing by increasing the task data size is throwing away information: if the task size is doubled then it is possible, under the present rules, to find 60 items of interest, with only 30 cells to store them in. |
Ingleside Send message Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
While it may appear to be a "good idea" there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated. On-the-fly download-compression has been included since v5.4.xx, while on-the-fly upload-compression has been included since v5.8.xx. So apart from some zombie computers still running ancient clients, 99.9% of clients support this. Also, as long as the project uses on-the-fly compression, the few ancient clients will just get the uncompressed files, meaning 100% of clients are catered for. The extra load on the download servers, on the other hand, can be a problem. Also, on-the-fly compression possibly gives less effect than other compression methods, so it's not a good option. Using pre-compressed files is another option; as long as they are gzipped files, the client will just automatically uncompress them on download. But unfortunately this method breaks download-resuming, and the few ancient clients running pre-v5.4.xx won't work. If pre-compressed files such as zip files are used, these AFAIK need application support to decompress, so a new application is needed. But if you're going to add compression support into the application, as Josef Segur already mentioned, it's better not to encode the data the way it's being done now, but to use binary data instead. Seeing how a new SETI application will be released "soon" anyway, adding the necessary changes to the application would be better than adding other compression support. "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
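The pre-compressed-gzip option Ingleside describes needs no science-application change because the round trip is transparent. A sketch of the idea; the payload is illustrative, and real clients of course do this over HTTP rather than in one process:

```python
import gzip

# Stand-in for a raw workunit as produced by the splitter.
payload = b"\x01\x02\x03\x04" * 32768          # 128 KiB of sample bytes

# Server side, done once at split time: store workunit.gz, not workunit.
stored = gzip.compress(payload)

# Client side (v5.4.xx+), done per download: inflate transparently,
# so the science application still sees the original bytes.
received = gzip.decompress(stored)

assert received == payload
print(f"served {len(stored)} bytes instead of {len(payload)}")
```

The caveat from the post still applies: once the stored file is the compressed one, byte-range resume refers to compressed offsets, which is why download-resuming breaks.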
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
It is not the size of the task that is important but the result, and the storing thereof. No. Append 2 tasks together (with appropriate header info), the same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append tasks), but it maintains DB consistency. BOINC blog |
W-K 666 Send message Joined: 18 May 99 Posts: 19063 Credit: 40,757,560 RAC: 67 |
It is not the size of the task that is important but the result, and the storing thereof. Unless the situation has changed significantly since I last did Einstein: only if you have fast h/ware and devote enough time/resources to Einstein. |
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
It is not the size of the task that is important but the result, and the storing thereof. 30 or 31 'signals' are regarded as an overflow, so why should 30 not be enough? What would the average be for Pulses, Spikes, Triplets and Gaussians in a 'normal' 0.4 AR WU? The chance of finding a combined signal count close to 30 would be possible with 'double-sized WUs'; could it become a problem? They did improve their GPU app; with the same resources, my throughput, or R.A.C., in Einstein@home has improved by more than 5 times. (Probably 10-fold, looking at RAC only.) |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
It is not the size of the task that is important but the result, and the storing thereof. It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash. BOINC WIKI |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash. It would reduce the rate of scheduler requests by a factor of 2. If they were also compressed then you'd get a (bandwidth) saving there as well. They could take it a step further and have more than 2 wu appended, which would reduce the number of scheduler requests even further. BOINC blog |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.