Message boards :
Number crunching :
Idea for Boinc.....
Author | Message |
---|---|
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
Apart from the fact that there simply are too many computers or too little bandwidth, not much can be done. Making the WUs bigger means fewer WUs, but still the same amount of data. Compressing MB WUs could help, if it isn't too much of an 'operation' and doesn't put extra strain on the server, or the splitter, which would be used for compression. I'm noticing a crazy amount of time and retries downloading a few AstroPulse WUs: 36 hours and still downloading... but eventually, they arrive. Uploads aren't possible at all. (Have to check again!) Hope the extra memory for the PAIX router will help, as I read in another post; forgot which 'thread'. |
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
Apart from the fact that there simply are too many computers or too little bandwidth, not much can be done. You're right. I compressed some VLAR files and they got smaller by about a third: 360 kbytes became 260 kbytes, which is a surprise to me. Maybe that is a coincidence for my particular VLAR files, but it is surely interesting enough to follow up. The extra strain on SETI's servers should be manageable. |
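janneseti's one-third figure is easy to sanity-check. A minimal sketch, assuming the workunit payload is Base64-encoded text (as MB WUs are); the random bytes below only stand in for real signal data:

```python
import base64
import gzip
import os

# Stand-in for a workunit's data section: random "signal" bytes,
# Base64-encoded the way the splitter encodes real WUs.
raw = os.urandom(196608)            # 192 KiB of incompressible bytes
encoded = base64.b64encode(raw)     # Base64 inflates this by ~33%

# Lossless gzip claws back most of that encoding overhead, because
# Base64 text uses only 64 of the 256 possible byte values.
packed = gzip.compress(encoded)
print(f"{len(encoded)} -> {len(packed)} bytes "
      f"({len(packed) / len(encoded):.0%} of encoded size)")
```

Even on incompressible input this lands around 75-80% of the encoded size, the same ballpark as the 360 kB to 260 kB observed above; real VLAR files evidently compress slightly better.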
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
One other possibility for MB WUs, which has been noted before, is to switch to pure binary data rather than the Base64 format used now. That would require an application change to handle, but S@h v7 could fairly easily be modified along those lines; the code could be imported from the Astropulse application. That would reduce the data section from 354991 bytes to just 262144 bytes. Overall WU size would be almost as small as compressing the current WUs, but with no extra computation needed server-side. It would actually reduce splitter load slightly, since storing the raw binary is faster than converting it to Base64. The download pipe may get ~72000 MB WUs sent per hour now on average when operating well, with some AP mixed in. The change would allow that to increase to about ~92000 per hour. Not a miracle cure, but perhaps enough to justify the change. Joe |
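Joe's 354991-vs-262144 figure falls straight out of Base64 arithmetic. A quick check; the 64-character line wrap is an assumption, but it reproduces the quoted size to within a byte:

```python
import math

BINARY = 262144                    # 256 KiB raw data section (Joe's figure)

# Base64 maps every 3 input bytes to 4 output characters...
b64_chars = 4 * math.ceil(BINARY / 3)

# ...and the encoder wraps the text into lines (64 chars assumed here),
# adding one newline per line.
b64_total = b64_chars + math.ceil(b64_chars / 64)

print(b64_chars, b64_total)        # 349528 and 354990, vs the quoted 354991
```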
rob smith Send message Joined: 7 Mar 03 Posts: 22204 Credit: 416,307,556 RAC: 380 |
While it may appear to be a "good idea", there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated. You also have the not inconsiderable server overhead of compressing the data before dispatch. During the transition period you have to be able to detect which clients can accept a compressed WU and which can't - it's no good trying server-side tracking, it just won't work given the rate some folks update, tweak, modify and regress their crunchers. Most of the projects that are held up as using compressed data did so from the outset, and as such both the clients and the servers have been configured with compression/de-compression in mind, rather than having it "bolted on" as a late addition. Mark's initial idea was a fairly simple attempt at reducing the amount of traffic, mainly by hitting the overheads, the non-productive traffic, and as such it bears some thinking about: how to reduce the amount of non-productive traffic? And doing so in such a manner that it is "client transparent" - in other words, if the client can perform the latest magic trick it does it, but if it can't then the servers don't get themselves in a twist... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
soft^spirit Send message Joined: 18 May 99 Posts: 6497 Credit: 34,134,168 RAC: 0 |
A basic unzip is available through most clients. Proprietary formats and .rar should be avoided due to compatibility/licensing issues, and I am pretty sure there are free versions that could be sent out with a version of the client. Self-extracting archives are possible, but each .exe would be reviewed and might require manual intervention by the end user - not a good idea for this application. But in principle, if it saves a fair amount of space, it should make better use of the bandwidth. Yes, the servers would have to have the capacity to perform the compress function, but in theory it could be a good idea. Janice |
Tex1954 Send message Joined: 16 Mar 11 Posts: 12 Credit: 6,654,193 RAC: 17 |
I didn't see this thread and started a new one. http://setiathome.berkeley.edu/forum_thread.php?id=65725&nowrap=true#1160350 MPEG or ZIP compression algorithms would be a really iffy thing. One could not rely on them much due to the nature of the data; it could be all irrelevant NULL data or filled with subtle pertinent data that could be lost in compression. I think by far the best idea is to do what I (and others) have already suggested: make the tasks 10 times longer. Einstein sends multiple 4-meg files per WU and has a small upload. SETI seems to be about the same... larger download, smaller upload. So, thinking disk-related I/O is bottlenecking throughput and causing lost packets (due to timeouts and other concurrent I/O requests), making the tasks 10 times longer (by whatever means) could theoretically reduce indexing/disk I/O lookups by an order of magnitude. JMHO :) |
Wiggo Send message Joined: 24 Jan 00 Posts: 34754 Credit: 261,360,520 RAC: 489 |
Some time ago I remember one of the guys from the closet (I just can't remember which one) said that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse. Cheers. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Some time ago I remember one of the guys from the closet (I just can't remember which one) said that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse. Sorry, but rubbish. There are two sorts of file compression: lossy and lossless. 'Lossy' compression (like JPEG for images) works by throwing away fine detail that the human eye wouldn't notice anyway. That would indeed introduce errors, and wouldn't be considered here. But 'lossless' compression, like the various flavours of ZIP, allows the recreation of the exact same digital data file at the destination. That wouldn't cause any errors at all, and it's perfectly fair to consider it here - even if the compression ratios achieved with lossless compression are much less impressive than those for lossy compression. |
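Richard's distinction is easy to demonstrate: a lossless round trip returns the input bit for bit. A minimal sketch using zlib; the payload is arbitrary stand-in data, not a real workunit:

```python
import os
import zlib

# Arbitrary stand-in for a workunit: noise plus some repetitive text.
original = os.urandom(4096) + b"spike pulse triplet gaussian " * 512

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

# Lossless means exactly that: no signal can be altered in transit.
assert restored == original
print(len(original), "->", len(compressed), "bytes, restored bit-for-bit")
```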
janneseti Send message Joined: 14 Oct 09 Posts: 14106 Credit: 655,366 RAC: 0 |
Some time ago I remember one of the guys from the closet (I just can't remember which one) said that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse. BOINC already has a solution to this, and we crunchers do not have to bother. |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
I didn't see this thread and started a new one. Making the tasks 10 times larger won't help much, as the size of the data for a typical update is minuscule in comparison to the size of the data transferred for each task. The gain would be a few percent at most: there would be ten times the data sent for each task as now, so the only gains would be the overlap and the slight decrease in the size of a work request. Zipping the files would decrease the data size of MB tasks by about 30%, but would increase the load on some of the servers that are already near the edge of their CPU power. BOINC WIKI |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Maybe a combination of things? 1. Josef's idea of pure binary data instead of Base64 (approx 50% saving in data size) 2. Increase the wu size by 50% so it's back to its original size, but the host is getting more work to do 3. gzip compression on the scheduler request That saves the servers having to compress (except on the scheduler request), thus doesn't require too much extra cpu power, and the number of scheduler requests should drop as larger work is being given out. BOINC blog |
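For what it's worth, plugging Joe's byte counts from earlier in the thread into step 1 gives a somewhat smaller saving than the approximate 50% quoted above; the numbers below are illustrative only:

```python
# Joe's figures: Base64 data section vs the same samples stored as raw binary.
base64_section = 354991
binary_section = 262144

# Step 1: switching to binary shrinks each download by about a quarter.
saving = 1 - binary_section / base64_section

# Step 2: growing the WU until the download is back to its old size means
# each download carries correspondingly more work, i.e. fewer downloads
# (and fewer scheduler requests) for the same total work.
work_per_download = base64_section / binary_section

print(f"{saving:.0%} smaller per WU, {work_per_download:.2f}x work per download")
```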
W-K 666 Send message Joined: 18 May 99 Posts: 19063 Credit: 40,757,560 RAC: 67 |
It is not the size of the task that is important but the result, and the storing thereof. Since the beginnings of Seti Classic the tasks have been of the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians). After we have completed a task they are stored in a table, which at its simplest has 31 columns, the first containing the WU number. So what you are proposing by increasing the task data size is throwing away information: if the task size is doubled then it is possible, under the present rules, to find 60 items of interest, with only 30 cells to store them in. |
Ingleside Send message Joined: 4 Feb 03 Posts: 1546 Credit: 15,832,022 RAC: 13 |
While it may appear to be a "good idea" there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated. On-the-fly download-compression has been included since v5.4.xx, while on-the-fly upload-compression has been included since v5.8.xx. So apart from some zombie computers still running ancient clients, 99.9% of clients support this. Also, as long as the project uses on-the-fly compression, the few ancient clients will just get the uncompressed files, meaning 100% of clients are catered for. The extra load on the download servers, on the other hand, can be a problem. Also, on-the-fly compression possibly gives less effect than other compression methods, so it's not a good option. Using pre-compressed files is another option; as long as they are gzipped files, the client will just automatically uncompress them on download. But unfortunately this method breaks download-resuming, and the few ancient clients running pre-v5.4.xx won't work. If pre-compressed files such as zip files are used, these AFAIK need application support to decompress, so a new application is needed. But if you're going to add compression support into the application, as Josef Segur already mentioned, it's better not to encode the data the way it's being done now, but to use binary data instead. Seeing how a new SETI application will be released "soon" anyway, adding the necessary changes to the application would be better than adding other compression support. "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
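The pre-compressed-gzip option Ingleside describes needs no science-application change because the round trip is transparent. A sketch of the idea; the payload is illustrative, and real clients of course do this over HTTP rather than in one process:

```python
import gzip

# Stand-in for a raw workunit as produced by the splitter.
payload = b"\x01\x02\x03\x04" * 32768          # 128 KiB of sample bytes

# Server side, done once at split time: store workunit.gz, not workunit.
stored = gzip.compress(payload)

# Client side (v5.4.xx+), done per download: inflate transparently,
# so the science application still sees the original bytes.
received = gzip.decompress(stored)

assert received == payload
print(f"served {len(stored)} bytes instead of {len(payload)}")
```

The caveat from the post still applies: once the stored file is the compressed one, byte-range resume refers to compressed offsets, which is why download-resuming breaks.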
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
It is not the size of the task that is important but the result, and the storing thereof. No. Append 2 tasks together (with appropriate header info), the same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append tasks), but it maintains DB consistency. BOINC blog |
W-K 666 Send message Joined: 18 May 99 Posts: 19063 Credit: 40,757,560 RAC: 67 |
It is not the size of the task that is important but the result, and the storing thereof. Unless the situation has changed significantly since I last did Einstein: only if you have fast h/ware and devote enough time/resources to Einstein. |
Fred J. Verster Send message Joined: 21 Apr 04 Posts: 3252 Credit: 31,903,643 RAC: 0 |
It is not the size of the task that is important but the result, and the storing thereof. 30 or 31 'signals' are regarded as an overflow, so why should 30 not be enough? What would the average be for Pulses, Spikes, Triplets and Gaussians in a 'normal' 0.4 AR WU? The chance of finding a combined signal count close to 30 would be possible with 'double-sized WUs'; could it become a problem? They did improve their GPU app; with the same resources, my throughput, or R.A.C., in Einstein@home has improved by more than 5 times. (Probably 10-fold, looking at RAC only.) |
John McLeod VII Send message Joined: 15 Jul 99 Posts: 24806 Credit: 790,712 RAC: 0 |
It is not the size of the task that is important but the result, and the storing thereof. It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash. BOINC WIKI |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash. It would reduce the rate of scheduler requests by a factor of 2. If they were also compressed then you'd get a (bandwidth) saving there as well. They could take it a step further and have more than 2 wu appended, which would reduce the number of scheduler requests even further. BOINC blog |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.