Idea for Boinc.....


Message boards : Number crunching : Idea for Boinc.....

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3236
Credit: 31,654,926
RAC: 5,370
Netherlands
Message 1160265 - Posted: 8 Oct 2011, 17:59:31 UTC - in response to Message 1160260.

Apart from the fact that there simply are too many computers or too little bandwidth, not much can be done.

Making the WUs bigger means fewer WUs, but still the same amount of data.
Compressing MB WUs could help, if it isn't too much of an 'operation' and doesn't put
extra strain on the server, or on the splitter, which would do the compression.

Noticing the crazy amount of time and retries: downloading a few AstroPulse
WUs, 36 hours and still downloading... but eventually, they arrive.

Uploads aren't possible at all. (Have to check again!)
Hope the extra memory for the PAIX router will help, as I read in another post;
forgot which thread.


____________


janneseti
Joined: 14 Oct 09
Posts: 985
Credit: 482,324
RAC: 104
Sweden
Message 1160292 - Posted: 8 Oct 2011, 18:49:59 UTC - in response to Message 1160291.

Apart from the fact that there simply are too many computers or too little bandwidth, not much can be done.

Making the WUs bigger means fewer WUs, but still the same amount of data.
Compressing MB WUs could help, if it isn't too much of an 'operation' and doesn't put
extra strain on the server, or on the splitter, which would do the compression.

Noticing the crazy amount of time and retries: downloading a few AstroPulse
WUs, 36 hours and still downloading... but eventually, they arrive.

Uploads aren't possible at all. (Have to check again!)
Hope the extra memory for the PAIX router will help, as I read in another post;
forgot which thread.




You're right. I compressed some VLAR files and they got smaller by about a third:
360 kbytes became 260 kbytes, which was a surprise to me.
Maybe that's a coincidence for my VLAR files, but it's surely interesting enough to follow up.
The extra strain on SETI's servers should be manageable.
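For anyone who wants to repeat that quick test, here is a minimal sketch using Python's standard zlib module; the filename is just a placeholder for a locally cached VLAR workunit, and the exact ratio will of course vary with the data.

```python
import zlib

# Placeholder path: point this at a locally cached multibeam/VLAR workunit file.
wu_path = "some_cached_vlar_workunit"

raw = open(wu_path, "rb").read()
packed = zlib.compress(raw, 9)          # level 9 = slowest, best compression

saved = 1 - len(packed) / len(raw)
print(f"{len(raw)} bytes -> {len(packed)} bytes ({saved:.0%} smaller)")
```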

Josef W. Segur (Project donor)
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4225
Credit: 1,041,649
RAC: 357
United States
Message 1160309 - Posted: 8 Oct 2011, 19:35:33 UTC

One other possibility for MB WUs which has been noted before is to switch to pure binary data rather than the Base64 format used now. That would require an application change to handle, but S@h v7 could fairly easily be modified along those lines. The code could be imported from the Astropulse application.

That would reduce the data section from 354991 bytes to just 262144 bytes; the overall WU size would be almost as small as compressing the current WUs, but with no extra computation needed server-side. It would actually reduce splitter load slightly, since storing the raw binary is faster than converting it to Base64.
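Those two figures can be sanity-checked with a little arithmetic: Base64 turns every 3 raw bytes into 4 printable characters, plus a newline per wrapped line. A quick sketch (the 64-character line wrap is an assumption, not something stated in this thread):

```python
import math

raw_bytes = 262144                         # 256 KiB of sample data per MB workunit
b64_chars = 4 * math.ceil(raw_bytes / 3)   # Base64: 3 raw bytes -> 4 characters
lines     = math.ceil(b64_chars / 64)      # assumed wrap at 64 characters per line
encoded   = b64_chars + lines              # one newline per wrapped line

print(b64_chars, encoded)   # 349528 354990 -- within a byte or two of the 354991 quoted
```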

The download pipe may get ~72000 MB WUs sent per hour now on average when operating well, with some AP mixed in. The change would allow that to increase to roughly 92000 per hour. Not a miracle cure, but perhaps enough to justify the change.

Joe

rob smith (Project donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 8292
Credit: 54,876,180
RAC: 74,842
United Kingdom
Message 1160337 - Posted: 8 Oct 2011, 20:44:05 UTC

While it may appear to be a "good idea", there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated. You also have the not inconsiderable server overhead of compressing the data before dispatch. During the transition period you have to be able to detect which clients can accept a compressed WU and which can't - it's no good trying server-side tracking; it just won't work given the rate at which some folks update, tweak, modify and regress their crunchers. Most of the projects that are held up as using compressed data did so from the outset, and as such both the clients and the servers were configured with compression/de-compression in mind, rather than having it "bolted on" as a late addition.

Mark's initial idea was a fairly simple attempt at reducing the amount of traffic, mainly by hitting the overheads - the non-productive traffic - and as such it bears some thinking about: how do we reduce the amount of non-productive traffic? And how do we do it in such a manner that it is "client transparent" - in other words, if the client can perform the latest magic trick it does it, but if it can't, the servers don't get themselves in a twist...
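To make the "client transparent" point concrete, here is a rough server-side sketch of that kind of negotiation. The request field name and file names are hypothetical; this is not the real BOINC scheduler protocol, just the shape of the idea.

```python
def choose_payload(request, wu_raw_path, wu_gz_path):
    """Pick which copy of a workunit to hand out, based on what the client advertises.

    `request` is assumed to be a parsed scheduler request; the
    'accepts_gzip_wu' field is invented purely for illustration.
    """
    if request.get("accepts_gzip_wu"):
        return wu_gz_path    # capable client: send the pre-compressed copy
    return wu_raw_path       # old or unknown client: fall back to the raw file

# An old client that advertises nothing still gets the plain file:
print(choose_payload({}, "wu_0001", "wu_0001.gz"))
```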
____________
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

soft^spirit
Joined: 18 May 99
Posts: 6374
Credit: 28,631,059
RAC: 94
United States
Message 1160341 - Posted: 8 Oct 2011, 20:59:47 UTC

A basic unzip is available through most clients. Proprietary formats and .rar should be avoided due to compatibility/licensing issues, and I am pretty sure there are free versions that could be sent out with a version of the client.

Self-extracting archives are possible, but each .exe would be reviewed and possibly require manual intervention by the end user. Not a good idea for this application.

But in principle, if it saves a fair amount of space, it should make better use of the bandwidth. Yes, the servers would have to have the capacity to perform the compression. But in theory, it could be a good idea.


____________

Janice

Tex1954
Volunteer tester
Joined: 16 Mar 11
Posts: 8
Credit: 3,192,711
RAC: 20,611
United States
Message 1160401 - Posted: 8 Oct 2011, 23:03:42 UTC - in response to Message 1160341.

I didn't see this thread and started a new one.
http://setiathome.berkeley.edu/forum_thread.php?id=65725&nowrap=true#1160350


MPEG or ZIP compression algorithms would be a really iffy thing. One could not rely on it much due to the nature of the data. It could be all irrelevant NULL data or filled with subtle pertinent data that could be lost in compression.

I think by far the best idea is to do what I (and others) have already suggested: make the tasks 10 times longer.

Einstein sends multiple 4-Meg files per WU and has a small upload. SETI seems to be about the same... larger download, smaller upload.

So, thinking that disk-related I/O is bottlenecking throughput and causing lost packets (due to timeouts and other concurrent I/O requests), making the tasks 10 times longer (by whatever means) could theoretically reduce indexing/disk I/O lookups by an order of magnitude.

JMHO

:)

Wiggo
Joined: 24 Jan 00
Posts: 6763
Credit: 92,711,358
RAC: 76,237
Australia
Message 1160410 - Posted: 8 Oct 2011, 23:20:30 UTC - in response to Message 1160401.

Some time ago I remember one of the guys from the closet (I just can't remember which one) saying that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse.

Cheers.
____________

Richard Haselgrove (Project donor)
Volunteer tester
Joined: 4 Jul 99
Posts: 8458
Credit: 48,585,791
RAC: 79,662
United Kingdom
Message 1160420 - Posted: 8 Oct 2011, 23:36:45 UTC - in response to Message 1160410.

Some time ago I remember one of the guys from the closet (I just can't remember which one) saying that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse.

Cheers.

Sorry, but rubbish.

There are two sorts of file compression: lossy, and lossless.

'lossy' compression (like JPEG for images) works by throwing away fine detail that the human eye wouldn't notice anyway. That would indeed introduce errors, and wouldn't be considered here.

But 'lossless' compression like the various flavours of ZIP allows the recreation of the exact same digital data file at the destination. That wouldn't cause any errors at all, and it's perfectly fair to consider it here - even if the compression ratios achieved with lossless compression are much less impressive than those for lossy compression.
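The lossless point is easy to demonstrate: a zlib round trip hands back exactly the bytes that went in, so no signal can be altered. A minimal check:

```python
import os
import zlib

original = os.urandom(262144)                 # stand-in for a workunit's data section
restored = zlib.decompress(zlib.compress(original, 9))

assert restored == original                   # byte-for-byte identical, nothing lost
print("lossless round trip OK")
```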

janneseti
Joined: 14 Oct 09
Posts: 985
Credit: 482,324
RAC: 104
Sweden
Message 1160426 - Posted: 8 Oct 2011, 23:48:47 UTC - in response to Message 1160420.

Some time ago I remember one of the guys from the closet (I just can't remember which one) saying that compressing the files won't work with the data that we're working with, as compression will introduce errors in the signals that we're trying to analyse.

Cheers.

Sorry, but rubbish.

There are two sorts of file compression: lossy, and lossless.

'lossy' compression (like JPEG for images) works by throwing away fine detail that the human eye wouldn't notice anyway. That would indeed introduce errors, and wouldn't be considered here.

But 'lossless' compression like the various flavours of ZIP allows the recreation of the exact same digital data file at the destination. That wouldn't cause any errors at all, and it's perfectly fair to consider it here - even if the compression ratios achieved with lossless compression are much less impressive than those for lossy compression.


BOINC already has a solution for this, and we crunchers don't have to bother.

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24313
Credit: 519,558
RAC: 29
United States
Message 1160472 - Posted: 9 Oct 2011, 3:40:12 UTC - in response to Message 1160401.

I didn't see this thread and started a new one.
http://setiathome.berkeley.edu/forum_thread.php?id=65725&nowrap=true#1160350


MPEG or ZIP compression algorithms would be a really iffy thing. One could not rely on it much due to the nature of the data. It could be all irrelevant NULL data or filled with subtle pertinent data that could be lost in compression.

I think by far the best idea is to do what I (and others) have already suggested: make the tasks 10 times longer.

Einstein sends multiple 4-Meg files per WU and has a small upload. SETI seems to be about the same... larger download, smaller upload.

So, thinking that disk-related I/O is bottlenecking throughput and causing lost packets (due to timeouts and other concurrent I/O requests), making the tasks 10 times longer (by whatever means) could theoretically reduce indexing/disk I/O lookups by an order of magnitude.

JMHO

:)

Making the tasks 10 times larger won't help much, as the size of the data for a typical update is minuscule in comparison to the size of the data transferred for each task. The gain would be a few percent at most. There would be ten times the data sent for each task as now, so the only gains would be the overlap and the slight decrease in the size of a work request.

Zipping the files would decrease the data size of MB tasks by about 30% but would increase the load on some of the servers that are already near the edge of CPU power.
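A back-of-envelope version of that argument, with the per-task overhead figure being a pure assumption for illustration:

```python
wu_payload = 354991        # bytes of data per MB workunit (Josef's figure above)
overhead   = 10 * 1024     # assumed request/reply overhead per task fetched (illustrative)

for factor in (1, 10):
    per_task = factor * wu_payload + overhead
    waste = per_task / (factor * wu_payload) - 1
    print(f"{factor:2d}x tasks: {waste:.1%} overhead")

# prints roughly 2.9% for 1x and 0.3% for 10x -- only a few percent to be gained
```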
____________


BOINC WIKI

MarkJ (Project donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 937
Credit: 22,066,931
RAC: 87,022
Australia
Message 1160515 - Posted: 9 Oct 2011, 8:40:14 UTC

Maybe a combination of things?

1. Josef's idea of pure binary data instead of Base64 (roughly a 25% saving in data size, going by his figures).

2. Increase the WU size by about a third, so the download is back to roughly its original size but the host is getting more work to do.

3. gzip compression on the scheduler requests.

That saves the servers from having to compress anything (except the scheduler requests), so it doesn't require too much extra CPU power, and the number of scheduler requests should drop as larger work is being given out.
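Point 3 is cheap because scheduler traffic is repetitive XML, which gzips far better than the quasi-random sample data does. A rough illustration, with an invented request body just to show the shape of the saving:

```python
import gzip

# Invented stand-in for a scheduler request: many near-identical result lines.
request = ("<scheduler_request>\n"
           + "  <result><name>wu_0001_0</name><cpu_time>1234.5</cpu_time></result>\n" * 200
           + "</scheduler_request>\n")

body = request.encode()
packed = gzip.compress(body)
print(len(body), "->", len(packed), "bytes")   # typically well under 10% of the original
```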
____________
BOINC blog

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8616
Credit: 23,625,323
RAC: 18,507
United Kingdom
Message 1160539 - Posted: 9 Oct 2011, 11:02:43 UTC

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginnings of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
After we have completed a task, they are stored in a table which, at its simplest, has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled then it is possible, under the present rules, to find 60 items of interest with only 30 cells to store them in.

Ingleside
Volunteer developer
Joined: 4 Feb 03
Posts: 1546
Credit: 4,116,226
RAC: 24,074
Norway
Message 1160545 - Posted: 9 Oct 2011, 12:06:08 UTC - in response to Message 1160337.

While it may appear to be a "good idea", there are quite a few overheads associated with moving from a "raw data" feed to a "compressed data" feed. First, and by no means least, is the vast number of clients out in the field that would have to be updated.

On-the-fly download compression has been included since v5.4.xx, while on-the-fly upload compression has been included since v5.8.xx. So apart from some zombie computers still running ancient clients, 99.9% of clients support this. Also, as long as the project uses on-the-fly compression, the few ancient clients will just get the uncompressed files, meaning 100% of clients are covered.

The extra load on the download servers, on the other hand, can be a problem. Also, on-the-fly compression possibly gives less benefit than other compression methods, so it's not a good option.

Using pre-compressed files is another option; as long as they are gzipped files, the client will just automatically uncompress them on download. Unfortunately, this method breaks download resuming, and the few ancient clients running pre-v5.4.xx won't work.

If you use other pre-compressed formats, for example zip files, these AFAIK need application support to decompress, so a new application would be needed. But if you're going to add compression support to the application then, as Josef Segur already mentioned, it's better not to encode the data in the format it's in now, but to use binary data instead. Seeing how a new SETI application will be released "soon" anyway, adding the necessary changes to the application would be better than adding other compression support.
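For the gzip route, the server-side preparation could be as simple as writing a .gz copy next to each split file, so the download server can hand out whichever form a client can take. A sketch, with the directory layout and naming assumed rather than taken from the real splitter:

```python
import gzip
import shutil
from pathlib import Path

def pregzip_workunits(download_dir):
    """Write a .gz copy beside every workunit file that doesn't have one yet.

    The '*.wu' naming and the directory layout are assumptions for illustration.
    """
    for wu in Path(download_dir).glob("*.wu"):
        gz = Path(str(wu) + ".gz")
        if not gz.exists():
            with open(wu, "rb") as src, gzip.open(gz, "wb", compresslevel=9) as dst:
                shutil.copyfileobj(src, dst)

# pregzip_workunits("/path/to/download/fanout")   # hypothetical path
```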

____________
"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24313
Credit: 519,558
RAC: 29
United States
Message 1160801 - Posted: 10 Oct 2011, 2:32:43 UTC

Switching to Binary is probably the best space savings available.
____________


BOINC WIKI

MarkJ (Project donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 937
Credit: 22,066,931
RAC: 87,022
Australia
Message 1160898 - Posted: 10 Oct 2011, 11:11:50 UTC - in response to Message 1160539.

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginnings of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
After we have completed a task, they are stored in a table which, at its simplest, has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled then it is possible, under the present rules, to find 60 items of interest with only 30 cells to store them in.


No. Append 2 tasks together (with appropriate header info). The same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.
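A toy version of that bundling idea, to show why the 30-signal table limit isn't broken: each embedded sub-task still reports its own result row. The data structures and the crunch() stand-in below are illustrative only, not SETI's actual code.

```python
MAX_SIGNALS = 30   # per-task limit from the result table described above

def crunch(subtask_data):
    """Hypothetical stand-in for the science code: returns a list of found signals."""
    return []

def run_bundle(bundle):
    """Run the sub-tasks of one bundled workunit sequentially.

    `bundle` is assumed to be a list of (wu_name, data) pairs appended by the
    scheduler; each sub-task keeps its own result of at most 30 signals.
    """
    results = []
    for wu_name, data in bundle:
        signals = crunch(data)[:MAX_SIGNALS]   # the overflow rule applies per sub-task
        results.append((wu_name, signals))
    return results
```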
____________
BOINC blog

WinterKnight
Volunteer tester
Joined: 18 May 99
Posts: 8616
Credit: 23,625,323
RAC: 18,507
United Kingdom
Message 1160918 - Posted: 10 Oct 2011, 13:16:51 UTC - in response to Message 1160898.

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginnings of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
After we have completed a task, they are stored in a table which, at its simplest, has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled then it is possible, under the present rules, to find 60 items of interest with only 30 cells to store them in.


No. Append 2 tasks together (with appropriate header info). The same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.

Unless the situation has changed significantly since I last did Einstein, that only works if you have fast h/ware and devote enough time/resources to Einstein.

Fred J. Verster
Volunteer tester
Joined: 21 Apr 04
Posts: 3236
Credit: 31,654,926
RAC: 5,370
Netherlands
Message 1160939 - Posted: 10 Oct 2011, 14:39:10 UTC - in response to Message 1160918.
Last modified: 10 Oct 2011, 14:43:26 UTC

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginnings of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
After we have completed a task, they are stored in a table which, at its simplest, has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled then it is possible, under the present rules, to find 60 items of interest with only 30 cells to store them in.


30 or 31 'signals' are regarded as an overflow, so why should 30 not be enough? What would the average be for Pulses, Spikes, Triplets and Gaussians in a 'normal' 0.4 AR WU? The chance of finding a combined signal count close to 30 would be there with 'double-sized WUs'; could that become a problem?


No. Append 2 tasks together (with appropriate header info). The same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.

Unless the situation has changed significantly since I last did Einstein, that only works if you have fast h/ware and devote enough time/resources to Einstein.


They did improve their GPU app; with the same resources, my
throughput, or R.A.C., in Einstein@home has improved by more than 5 times.
(Probably 10-fold, looking at RAC only.)
____________

John McLeod VII
Volunteer developer
Volunteer tester
Joined: 15 Jul 99
Posts: 24313
Credit: 519,558
RAC: 29
United States
Message 1161089 - Posted: 10 Oct 2011, 23:06:28 UTC - in response to Message 1160898.

It is not the size of the task that is important but the result, and the storing thereof.

Since the beginnings of SETI Classic the tasks have been the same size, and the results can contain up to 30 items (Spikes, Pulses, Triplets and Gaussians).
After we have completed a task, they are stored in a table which, at its simplest, has 31 columns, the first containing the WU number.

So what you are proposing by increasing the task data size is throwing away information. If the task size is doubled then it is possible, under the present rules, to find 60 items of interest with only 30 cells to store them in.


No. Append 2 tasks together (with appropriate header info). The same as Einstein did, except they put 8 of them together. They run as a single task, but are really multiple tasks run sequentially. It would still require an app update to cope, and probably a scheduler change (to append the tasks), but it maintains DB consistency.

It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash.
____________


BOINC WIKI

MarkJ (Project donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 937
Credit: 22,066,931
RAC: 87,022
Australia
Message 1161440 - Posted: 12 Oct 2011, 10:57:45 UTC - in response to Message 1161089.
Last modified: 12 Oct 2011, 11:04:30 UTC

It really would not help that much. Most of the data transmission in the task is the raw data, not the updates. Doubling it halves the number of tasks, but doubles the size of each one - net is a wash.


It would reduce the rate of scheduler requests by a factor of 2. If they were also compressed, then you would get a (bandwidth) saving there as well.

They could take it a step further and have more than 2 WUs appended, which would reduce the number of scheduler requests even further.
____________
BOINC blog
