Idea for Boinc.....

Profile Fred J. Verster
Volunteer tester
Avatar

Send message
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1160265 - Posted: 8 Oct 2011, 17:59:31 UTC - in response to Message 1160260.  

Apart from the fact that there are simply too many computers or too little bandwidth, not much can be done.

Making the WUs bigger means fewer WUs, but still the same amount of data.
Compressing MB WUs could help, if it isn't too much of an 'operation' and doesn't put
extra strain on the server, or the splitter, which would do the compression.

Noticing the crazy amount of time and retries: downloading a few AstroPulse
WUs has taken 36 hours and they are still downloading... but eventually, they arrive.

Uploads aren't possible at all. (Have to check again!)
Hope the extra memory for the PAIX router will help, as I read in another post,
though I forgot which 'thread'.


ID: 1160265 · Report as offensive
Profile janneseti
Avatar

Send message
Joined: 14 Oct 09
Posts: 14106
Credit: 655,366
RAC: 0
Sweden
Message 1160292 - Posted: 8 Oct 2011, 18:49:59 UTC - in response to Message 1160291.  

Apart from the fact that there are simply too many computers or too little bandwidth, not much can be done.

Making the WUs bigger means fewer WUs, but still the same amount of data.
Compressing MB WUs could help, if it isn't too much of an 'operation' and doesn't put
extra strain on the server, or the splitter, which would do the compression.

Noticing the crazy amount of time and retries: downloading a few AstroPulse
WUs has taken 36 hours and they are still downloading... but eventually, they arrive.

Uploads aren't possible at all. (Have to check again!)
Hope the extra memory for the PAIX router will help, as I read in another post,
though I forgot which 'thread'.




You're right. I compressed some VLAR files and they got smaller by about a third:
360 kBytes came down to 260 kBytes, which is a surprise to me.
Maybe that is a coincidence of my particular VLAR files, but it is surely interesting enough to follow up.
The extra strain on SETI's servers should be manageable.
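
For anyone who wants to repeat that check, a minimal Python sketch (the filename is hypothetical - point it at any downloaded MB/VLAR workunit file in the BOINC data directory):

    import gzip
    from pathlib import Path

    wu_path = Path("01ja11aa.12345.6789.10.11.123.vlar")   # hypothetical WU file name

    raw = wu_path.read_bytes()
    packed = gzip.compress(raw, compresslevel=9)
    print(f"{len(raw)} bytes -> {len(packed)} bytes "
          f"({1 - len(packed)/len(raw):.0%} smaller)")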
ID: 1160292 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1160309 - Posted: 8 Oct 2011, 19:35:33 UTC

One other possibility for MB WUs which has been noted before is to switch to pure binary data rather than the Base64 format used now. That would require an application change to handle, but S@h v7 could fairly easily be modified along those lines. The code could be imported from the Astropulse application.

That would reduce the data section from 354991 bytes to just 262144 bytes; the overall WU size would be almost as small as compressing the current WUs, but with no extra computation needed server-side. It would actually reduce splitter load slightly, since storing the raw binary is faster than converting it to Base64.
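
As a rough cross-check of those numbers (assuming the splitter wraps the Base64 text at 64 characters per line, which is a guess on my part), the 4:3 expansion can be reproduced in a few lines of Python:

    import base64

    RAW_BYTES = 262144                            # raw data bytes per MB workunit (figure above)

    encoded = base64.b64encode(bytes(RAW_BYTES))  # Base64 needs 4 output bytes for every 3 input bytes
    print(len(encoded))                           # 349528

    # If the splitter also wraps the text at 64 characters per line (an assumption),
    # each line adds one newline byte, which lands very close to the quoted 354991:
    lines = -(-len(encoded) // 64)                # ceiling division
    print(len(encoded) + lines)                   # 354990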

The download pipe may carry ~72000 MB WUs per hour now on average when operating well, with some AP mixed in. The change would allow that to increase to roughly 92000 per hour. Not a miracle cure, but perhaps enough to justify the change.
                                                                  Joe
ID: 1160309 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22149
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1160337 - Posted: 8 Oct 2011, 20:44:05 UTC

While it may appear to be a "good idea" there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated. Then there is the not inconsiderable server overhead of compressing the data before dispatch. During the transition period you have to be able to detect which clients can accept a compressed WU and which can't - it's no good trying server-side tracking; it just won't work given the rate at which some folks update, tweak, modify and regress their crunchers. Most of the projects that are held up as using compressed data did so from the outset, and as such both the clients and the servers have been configured with compression/de-compression in mind, rather than having it "bolted on" as a late addition.

Mark's initial idea was a fairly simple attempt at reducing the amount of traffic, mainly by hitting the overheads - the non-productive traffic - and as such it bears some thinking about: how do we reduce the amount of non-productive traffic, and do so in such a manner that it is "client transparent"? In other words, if the client can perform the latest magic trick it does it, but if it can't, the servers don't get themselves in a twist...
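
One way to picture that "client transparent" behaviour (purely an illustration, not actual BOINC scheduler code; the function name and version threshold are made up): the server looks at the version the client reports in its request and only hands out the new format when the client says it can cope.

    # Hypothetical sketch of server-side capability gating, for illustration only.
    MIN_VERSION_FOR_NEW_FORMAT = (7, 0, 0)   # assumed threshold

    def choose_wu_format(reported_version):
        """Hand the new format only to clients that report they can handle it;
        everyone else silently keeps getting the existing uncompressed format."""
        return "compressed" if reported_version >= MIN_VERSION_FOR_NEW_FORMAT else "raw"

    print(choose_wu_format((7, 2, 42)))    # -> compressed
    print(choose_wu_format((5, 10, 45)))   # -> raw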
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1160337 · Report as offensive
Profile soft^spirit
Avatar

Send message
Joined: 18 May 99
Posts: 6497
Credit: 34,134,168
RAC: 0
United States
Message 1160341 - Posted: 8 Oct 2011, 20:59:47 UTC

A basic unzip is available through most clients. Proprietary formats and .rar should be avoided due to compatibility/licensing issues, and I am pretty sure there are free versions that could be sent out with a version of the client.

Self-extracting archives are possible, but each .exe would have to be reviewed and might require manual intervention by the end user. Not a good idea for this application.

But in principle, if it saves a fair amount of space, it should make better use of the bandwidth. Yes, the servers would have to have the capacity to perform the compression. But in theory, it could be a good idea.


Janice
ID: 1160341 · Report as offensive
Tex1954
Volunteer tester

Send message
Joined: 16 Mar 11
Posts: 12
Credit: 6,654,193
RAC: 17
United States
Message 1160401 - Posted: 8 Oct 2011, 23:03:42 UTC - in response to Message 1160341.  

I didn't see this thread and started a new one.
http://setiathome.berkeley.edu/forum_thread.php?id=65725&nowrap=true#1160350


MPEG or ZIP compression algorithms would be a really iffy thing. One could not rely on them much due to the nature of the data. It could be all irrelevant NULL data or filled with subtle pertinent data that could be lost in compression.

I think by far the best idea is to do what I (and others) already suggested: make the tasks 10 times longer.

Einstein sends multiple 4-Meg files per WU and has a small upload. SETI seems to be about the same... larger download, smaller upload.

So, thinking that disk-related I/O is bottlenecking throughput and causing lost packets (due to timeouts and other concurrent I/O requests), making the tasks 10 times longer (by whatever means) could theoretically reduce indexing/disk I/O lookups by an order of magnitude.

JMHO

:)
ID: 1160401 · Report as offensive
Profile Wiggo
Avatar

Send message
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 1160410 - Posted: 8 Oct 2011, 23:20:30 UTC - in response to Message 1160401.  

Some time ago I remember 1 of the guys from the closet (I just can't remember which 1) said that compressing the files won't work with the data that we're working with as compression will introduce errors in the signals that we're trying to analyse.

Cheers.
ID: 1160410 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Send message
Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1160420 - Posted: 8 Oct 2011, 23:36:45 UTC - in response to Message 1160410.  

Some time ago I remember 1 of the guys from the closet (I just can't remember which 1) said that compressing the files won't work with the data that we're working with as compression will introduce errors in the signals that we're trying to analyse.

Cheers.

Sorry, but rubbish.

There are two sorts of file compression: lossy, and lossless.

'lossy' compression (like JPEG for images) works by throwing away fine detail that the human eye wouldn't notice anyway. That would indeed introduce errors, and wouldn't be considered here.

But 'lossless' compression like the various flavours of ZIP allows the recreation of the exact same digital data file at the destination. That wouldn't cause any errors at all, and it's perfectly fair to consider it here - even if the compression ratios achieved with lossless compression are much less impressive than those for lossy compression.
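
A quick check of that claim, with Python's zlib standing in for whatever lossless codec might actually be used:

    import os
    import zlib

    original = os.urandom(4096) + bytes(4096)             # arbitrary test data
    restored = zlib.decompress(zlib.compress(original))
    assert restored == original                           # bit-for-bit identical: nothing is lost
    print(len(original), "->", len(zlib.compress(original)), "bytes")
    # The achievable ratio depends entirely on the data; the correctness does not.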
ID: 1160420 · Report as offensive
Profile janneseti
Avatar

Send message
Joined: 14 Oct 09
Posts: 14106
Credit: 655,366
RAC: 0
Sweden
Message 1160426 - Posted: 8 Oct 2011, 23:48:47 UTC - in response to Message 1160420.  

Some time ago I remember 1 of the guys from the closet (I just can't remember which 1) said that compressing the files won't work with the data that we're working with as compression will introduce errors in the signals that we're trying to analyse.

Cheers.

Sorry, but rubbish.

There are two sorts of file compression: lossy, and lossless.

'lossy' compression (like JPEG for images) works by throwing away fine detail that the human eye wouldn't notice anyway. That would indeed introduce errors, and wouldn't be considered here.

But 'lossless' compression like the various flavours of ZIP allows the recreation of the exact same digital data file at the destination. That wouldn't cause any errors at all, and it's perfectly fair to consider it here - even if the compression ratios achieved with lossless compression are much less impressive than those for lossy compression.


BOINC already has a solution to this, and we crunchers do not have to bother.
ID: 1160426 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1160472 - Posted: 9 Oct 2011, 3:40:12 UTC - in response to Message 1160401.  

I didn't see this thread and started a new one.
http://setiathome.berkeley.edu/forum_thread.php?id=65725&nowrap=true#1160350


MPEG or ZIP compression algorithms would be a really iffy thing. One could not rely on them much due to the nature of the data. It could be all irrelevant NULL data or filled with subtle pertinent data that could be lost in compression.

I think by far the best idea is to do what I (and others) already suggested: make the tasks 10 times longer.

Einstein sends multiple 4-Meg files per WU and has a small upload. SETI seems to be about the same... larger download, smaller upload.

So, thinking that disk-related I/O is bottlenecking throughput and causing lost packets (due to timeouts and other concurrent I/O requests), making the tasks 10 times longer (by whatever means) could theoretically reduce indexing/disk I/O lookups by an order of magnitude.

JMHO

:)

Making the tasks 10 times larger won't help much, as the size of the data for a typical update is minuscule in comparison to the size of the data transferred for each task. The gain would be a few percent at most. Ten times as much data would be sent for each task as now, so the only gains would be the overlap and the slight decrease in the size of a work request.

Zipping the files would decrease the data size of MB tasks by about 30%, but would increase the load on some of the servers that are already near the edge of their CPU power.
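
A back-of-envelope version of that argument, with assumed (not measured) sizes for the workunit and the scheduler exchange:

    WU_BYTES      = 366 * 1024    # assume roughly 366 KB per MB workunit download
    REQUEST_BYTES = 5 * 1024      # assume a few KB per scheduler request/reply

    per_task_now    = WU_BYTES + REQUEST_BYTES
    per_task_at_10x = (10 * WU_BYTES + REQUEST_BYTES) / 10   # same data, one request per 10 tasks

    print(f"saving per task: {1 - per_task_at_10x / per_task_now:.1%}")   # roughly 1%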


BOINC WIKI
ID: 1160472 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1160515 - Posted: 9 Oct 2011, 8:40:14 UTC

Maybe a combination of things?

1. Josef's idea of pure binary data instead of Base64 (approx. 50% saving in data size).

2. Increase WU size by 50% so it's back to its original size, but the host is getting more work to do.

3. gzip compression on the scheduler request.

That saves the servers from having to compress (except on the scheduler request), so it doesn't require too much extra CPU power, and the number of scheduler requests should drop as larger work is being given out.
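
A rough feel for point 3 (the request body below is made up; real scheduler requests are XML with a similarly repetitive structure):

    import gzip

    # Made-up request body: one <result> block per reported task, like the real thing in spirit.
    results = "".join(
        f"  <result><name>task_{i:05d}</name><final_cpu_time>123.4</final_cpu_time></result>\n"
        for i in range(50)
    )
    fake_request = f"<scheduler_request>\n{results}</scheduler_request>\n".encode()

    packed = gzip.compress(fake_request)
    print(len(fake_request), "->", len(packed), "bytes")   # repetitive XML compresses very well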
BOINC blog
ID: 1160515 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Send message
Joined: 18 May 99
Posts: 18996
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1160539 - Posted: 9 Oct 2011, 11:02:43 UTC

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginning of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
They are stored, once we have completed them, in a table which at its simplest has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled, then it is possible under the present rules to find 60 items of interest, with only 30 cells to store them in.
ID: 1160539 · Report as offensive
Ingleside
Volunteer developer

Send message
Joined: 4 Feb 03
Posts: 1546
Credit: 15,832,022
RAC: 13
Norway
Message 1160545 - Posted: 9 Oct 2011, 12:06:08 UTC - in response to Message 1160337.  

While it may appear to be a "good idea" there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated.

On-the-fly download compression has been included since v5.4.xx, while on-the-fly upload compression has been included since v5.8.xx. So apart from some zombie computers still running ancient clients, 99.9% of clients support this. Also, as long as the project uses on-the-fly compression, the few ancient clients will just get the uncompressed files, meaning 100% of clients are supported.

The extra load on the download servers, on the other hand, can be a problem. Also, on-the-fly compression possibly gives less benefit than other compression methods, so it's not a good option.

Using pre-compressed files is another option; as long as they are gzipped files, the client will just automatically uncompress them on download. But unfortunately this method breaks download resuming, and the few ancient clients running pre-v5.4.xx won't work.

If pre-compressed files such as zip files are used, these AFAIK need application support to decompress, so a new application is needed. But if you're going to add compression support to the application, then, as Josef Segur already mentioned, it's better not to encode the data the way it's done now, but to use binary data instead. Seeing how a new SETI application will be released "soon" anyway, adding the necessary changes to the application would be better than adding other compression support.
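
For what it's worth, the difference between the two download styles looks roughly like this (the URL is a placeholder, not a real SETI@home download link):

    import gzip
    import urllib.request

    URL = "http://example.org/workunits/sample.wu"   # placeholder only

    # 1) On-the-fly compression: the client advertises gzip support and the web
    #    server compresses the response as it sends it; a client that never sends
    #    the header simply receives the ordinary uncompressed bytes.
    req = urllib.request.Request(URL, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)

    # 2) Pre-compressed files: sample.wu.gz already sits on disk, so the server does
    #    no extra work per transfer, but byte ranges for resuming a partial download
    #    no longer line up with offsets in the uncompressed file.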

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 1160545 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1160801 - Posted: 10 Oct 2011, 2:32:43 UTC

Switching to Binary is probably the best space savings available.


BOINC WIKI
ID: 1160801 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1160898 - Posted: 10 Oct 2011, 11:11:50 UTC - in response to Message 1160539.  

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginning of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
They are stored, once we have completed them, in a table which at its simplest has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled, then it is possible under the present rules to find 60 items of interest, with only 30 cells to store them in.


No. Append 2 tasks together (with appropriate header info). Same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.
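
A conceptual sketch of that bundling (not the real task format): each sub-task keeps its own header and its own result list, so the existing 30-signal limit still applies per sub-task rather than per download.

    from dataclasses import dataclass

    @dataclass
    class SubTask:
        header: str        # the usual per-WU header info
        data: bytes        # the usual per-WU data block

    def analyse(data):
        """Stand-in for the science application; returns the signals it found."""
        return []

    def run_bundle(bundle):
        """Run the appended tasks back to back, reporting one result set per sub-task."""
        return [analyse(task.data)[:30] for task in bundle]   # 30-signal cap still applies per sub-task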
BOINC blog
ID: 1160898 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Send message
Joined: 18 May 99
Posts: 18996
Credit: 40,757,560
RAC: 67
United Kingdom
Message 1160918 - Posted: 10 Oct 2011, 13:16:51 UTC - in response to Message 1160898.  

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginning of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
They are stored, once we have completed them, in a table which at its simplest has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled, then it is possible under the present rules to find 60 items of interest, with only 30 cells to store them in.


No. Append 2 tasks together (with appropriate header info). Same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.

Unless the situation has changed significantly since I last did Einstein, that only works if you have fast h/ware and devote enough time/resources to Einstein.
ID: 1160918 · Report as offensive
Profile Fred J. Verster
Volunteer tester
Avatar

Send message
Joined: 21 Apr 04
Posts: 3252
Credit: 31,903,643
RAC: 0
Netherlands
Message 1160939 - Posted: 10 Oct 2011, 14:39:10 UTC - in response to Message 1160918.  
Last modified: 10 Oct 2011, 14:43:26 UTC

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginning of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
They are stored, once we have completed them, in a table which at its simplest has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled, then it is possible under the present rules to find 60 items of interest, with only 30 cells to store them in.


30 or 31 'signals' are regarded as an overflow, so why should 30 not be enough? What would the average count be for Pulses, Spikes, Triplets and Gaussians in a 'normal'
0.4 AR WU? Finding a combined signal count close to 30 would be possible with
'double-sized' WUs; could that become a problem?


No. Append 2 tasks together (with appropriate header info). Same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.

Unless the situation has changed significantly since I last did Einstein, that only works if you have fast h/ware and devote enough time/resources to Einstein.


They did improve their GPU app; with the same resources, my
throughput, or RAC, in Einstein@home has improved by more than 5 times.
(Probably 10-fold, looking at RAC only.)
ID: 1160939 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 1161089 - Posted: 10 Oct 2011, 23:06:28 UTC - in response to Message 1160898.  

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginning of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
They are stored, once we have completed them, in a table which at its simplest has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled, then it is possible under the present rules to find 60 items of interest, with only 30 cells to store them in.


No. Append 2 tasks together (with appropriate header info). Same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.

It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash.


BOINC WIKI
ID: 1161089 · Report as offensive
MarkJ Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 1161440 - Posted: 12 Oct 2011, 10:57:45 UTC - in response to Message 1161089.  
Last modified: 12 Oct 2011, 11:04:30 UTC

It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash.


It would reduce the rate of scheduler requests by a factor of 2. If they were also compressed, you would get a (bandwidth) saving there as well.

They could take it a step further and have more than 2 WUs appended, which would reduce the number of scheduler requests even further.
BOINC blog
ID: 1161440 · Report as offensive


 