Idea for Boinc.....



Message boards : Number crunching : Idea for Boinc.....

Rick
Avatar
Send message
Joined: 3 Dec 99
Posts: 79
Credit: 11,486,227
RAC: 0
United States
Message 1159972 - Posted: 7 Oct 2011, 22:21:25 UTC
Last modified: 7 Oct 2011, 22:23:34 UTC

Is the WU data compressed? If not, has anybody ever tried to compress the WU data to see if it would make much difference? I know compressing the data on the Seti side would take more resources, but I'm not sure how that would balance out against the reduced traffic.

I would also suggest changing the download process so the client only makes one connection and then downloads multiple files across that one connection; that would also reduce the load on the router. If it meant fewer lost connections, the actual throughput would probably be better than we're seeing now.
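The single-connection idea is essentially HTTP/1.1 keep-alive: one TCP session carrying several downloads in a row. A minimal sketch, with a tiny local server standing in for the download host (nothing here is the real SETI server or the BOINC client's actual code):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"          # HTTP/1.1 enables keep-alive
    def do_GET(self):
        body = b"x" * 1024                 # fake workunit payload
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):          # keep the demo quiet
        pass

# Stand-in download server on a random local port.
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One connection, three downloads: the TCP session is reused each time.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
sizes = []
for path in ("/wu1.dat", "/wu2.dat", "/wu3.dat"):
    conn.request("GET", path)
    sizes.append(len(conn.getresponse().read()))
conn.close()
server.shutdown()
print(sizes)                               # [1024, 1024, 1024]
```

Each request after the first skips the TCP (and any TLS) handshake, which is where the per-file connection overhead goes.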
____________

Rick
Avatar
Send message
Joined: 3 Dec 99
Posts: 79
Credit: 11,486,227
RAC: 0
United States
Message 1159980 - Posted: 7 Oct 2011, 22:44:22 UTC - in response to Message 1159973.

Is the WU data compressed? If not, has anybody ever tried to compress the WU data to see if it would make much difference? I know compressing the data on the Seti side would take more resources but not sure how that would balance out against the reduced traffic.

I would also suggest that changing the download process so the client only makes one connection then downloads multiple files across that one connection would also reduce the load on the router. If it would mean fewer lost connections then the actual throughput would probably be better than we're seeing now.

I believe it's been tested, and compressing the actual WUs results in very little gain.


Well, it was worth a shot but after this many years I should have guessed that someone would have looked into it by now.

____________

Profile Arvid Almstrom
Avatar
Send message
Joined: 23 Mar 00
Posts: 98
Credit: 137,331,372
RAC: 49
Australia
Message 1159981 - Posted: 7 Oct 2011, 22:46:39 UTC

As the data that we process is more or less random, compression will do little to improve this.
Have a look at compressing an MP3 or JPG file and you will find virtually no compression is achievable.
It is not quite the same, but similar enough, I think.
____________
Arvid Almstrom

Profile janneseti
Avatar
Send message
Joined: 14 Oct 09
Posts: 2730
Credit: 528,265
RAC: 335
Sweden
Message 1159994 - Posted: 7 Oct 2011, 23:42:18 UTC - in response to Message 1159962.


You have to remember that graph is inside-out....
The green is downloads going out.
The blue line is uploads and work requests going in.


OK. Help me out, since English is not my native language.
You say downloads are going out and uploads are going in.
Please answer the following questions.

If Alice sends a message to Bob, does she download the message?
If I receive message from Bob, do I upload the message?

Profile Zapped SparkyProject donor
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 30 Aug 08
Posts: 9370
Credit: 1,333,553
RAC: 702
United Kingdom
Message 1159995 - Posted: 7 Oct 2011, 23:45:24 UTC

What if every host were only allowed to connect to the project once a certain percentage of its cache had been depleted, and the update button did nothing?

For example, with a ten day cache, Boinc would only connect to the server to simultaneously upload/report/get new workunits after 1 day of cache was depleted.

If your cache hasn't been filled, maybe Boinc could be limited to connecting just once an hour, or even less often if necessary, except when a host was out of workunits.
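The proposed policy can be sketched as a single decision function. All of the names and parameters below are illustrative, not part of the actual BOINC client:

```python
def may_contact_server(cache_days, cache_remaining_days,
                       hours_since_last_contact, tasks_in_cache):
    """Sketch of the proposed connect policy (illustrative only)."""
    if tasks_in_cache == 0:
        return True                       # out of work: always allowed
    depleted = cache_days - cache_remaining_days
    if depleted >= 1.0:                   # e.g. 1 day of a 10-day cache used:
        return True                       # upload/report/fetch in one session
    return hours_since_last_contact >= 1.0  # otherwise at most hourly

# A 10-day cache with 1.5 days depleted may connect; one with only
# 0.2 days depleted must wait until an hour has passed.
print(may_contact_server(10, 8.5, 0.1, 50))   # True
print(may_contact_server(10, 9.8, 0.5, 50))   # False
```

The point of batching upload, report, and fetch into one session is fewer scheduler contacts overall, not less data.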

Josef W. SegurProject donor
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4347
Credit: 1,123,882
RAC: 739
United States
Message 1160020 - Posted: 8 Oct 2011, 1:15:03 UTC - in response to Message 1159972.

Is the WU data compressed? If not, has anybody ever tried to compress the WU data to see if it would make much difference? I know compressing the data on the Seti side would take more resources but not sure how that would balance out against the reduced traffic.
...

MB WUs compress to about 75 to 85% of their original size because the data format is BASE64 and there's a fairly large XML header section. AP WUs have a smaller header and the data is pure binary, so compression gains much less.
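Joe's numbers are easy to illustrate with synthetic data: random bytes stand in for noise-like samples (this is not a real SETI workunit, just the encoding effect). BASE64 spends 8 bits per character to carry 6 bits of data, so even incompressible data gains about 25% back once encoded:

```python
import base64
import os
import zlib

raw = os.urandom(256 * 1024)          # stand-in for noise-like binary samples
b64 = base64.b64encode(raw)           # MB-style BASE64 payload of the same data

def ratio(data):
    """Compressed size as a fraction of original size."""
    return len(zlib.compress(data, 9)) / len(data)

# Random binary barely compresses (ratio stays near 1.0); the BASE64
# version drops toward 0.75 because each byte carries only 6 bits.
print(f"binary: {ratio(raw):.2f}  base64: {ratio(b64):.2f}")
```

The remaining 75-85% figure for real MB WUs also reflects the compressible XML header Joe mentions.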
Joe

Profile janneseti
Avatar
Send message
Joined: 14 Oct 09
Posts: 2730
Credit: 528,265
RAC: 335
Sweden
Message 1160038 - Posted: 8 Oct 2011, 2:57:19 UTC - in response to Message 1159996.



You have to remember that graph is inside-out....
The green is downloads going out.
The blue line is uploads and work requests going in.


OK. Help me out, since English is not my native language.
You say downloads are going out and uploads are going in.
Please answer the following questions.

If Alice sends a message to Bob, does she download the message?
If I receive message from Bob, do I upload the message?

No, the opposite.
If you receive a file into your computer, you are downloading it from another computer or server.
If you send a file out of your computer, you are uploading it to another computer or server.


Thanks msattler.
However, the cricket graph still makes me wonder.
Who's gigabitethernet2_3, and who's sending and receiving what?



John McLeod VII
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 529,956
RAC: 323
United States
Message 1160040 - Posted: 8 Oct 2011, 3:37:25 UTC - in response to Message 1160038.



You have to remember that graph is inside-out....
The green is downloads going out.
The blue line is uploads and work requests going in.


OK. Help me out, since English is not my native language.
You say downloads are going out and uploads are going in.
Please answer the following questions.

If Alice sends a message to Bob, does she download the message?
If I receive message from Bob, do I upload the message?

No, the opposite.
If you receive a file into your computer, you are downloading it from another computer or server.
If you send a file out of your computer, you are uploading it to another computer or server.


Thanks msattler.
However the cricket graph makes me wonder still.
Who's gigabitethernet2_3 and who's sending and receiving what?




gigabitethernet2_3 does SETI@Home upload, download, and possibly update (work requests and reports). There are two paths into SSL (Space Sciences Lab), the one that does the web pages goes through the normal Berkeley network. The one that does uploads and downloads goes through Hurricane Electric. The cricket graph is for a switch at Hurricane Electric. It is in the Hurricane Electric building facing SSL. Since in and out are from the point of view of a router/switch facing SSL, they have to be swapped to match the point of view of SSL. Think of it this way. The port side of a boat is to your left if you are in the stern facing the bow. If you turn around, so you are looking at the wake, the port side of the boat will be on your right.
____________


BOINC WIKI

WinterKnight
Volunteer tester
Send message
Joined: 18 May 99
Posts: 8777
Credit: 25,919,032
RAC: 18,119
United Kingdom
Message 1160042 - Posted: 8 Oct 2011, 3:40:02 UTC - in response to Message 1160038.



You have to remember that graph is inside-out....
The green is downloads going out.
The blue line is uploads and work requests going in.


OK. Help me out, since English is not my native language.
You say downloads are going out and uploads are going in.
Please answer the following questions.

If Alice sends a message to Bob, does she download the message?
If I receive message from Bob, do I upload the message?

No, the opposite.
If you receive a file into your computer, you are downloading it from another computer or server.
If you send a file out of your computer, you are uploading it to another computer or server.


Thanks msattler.
However the cricket graph makes me wonder still.
Who's gigabitethernet2_3 and who's sending and receiving what?

gigabitethernet2_3 is the Berkeley campus designation; the cricket graphs are also generated by the Berkeley campus.

Therefore the cricket graphs show downloading from the Seti servers in green and uploads to the Seti servers in blue.

tbretProject donor
Volunteer tester
Avatar
Send message
Joined: 28 May 99
Posts: 2905
Credit: 218,679,445
RAC: 14,147
United States
Message 1160100 - Posted: 8 Oct 2011, 5:39:23 UTC - in response to Message 1159808.
Last modified: 8 Oct 2011, 5:39:50 UTC

Why not make the workunits bigger?
Every server at Seti would benefit from it.


I'll try again.

Why not make the workunits larger in datasize?
Every server at Seti would benefit from it.
Instead of about 300 Kbytes files per WU, increase it to let's say 1 Mbyte.
That will decrease the number of workunits by a third.
The network load for WU's surely must drop.

That depends on the time required for the hosts to crunch the WU.
Simply increasing their size will not reduce server load if the crunch time per Kbyte of data remains the same.
AP is a good example... the WUs are much larger, but the time required to process them does not increase in proportion to their size, so they actually are harder on bandwidth than MB work. Of course, VHAR MB work, with its very quick processing times, is even worse.

The only thing that would change the ratios is if the science application did more work on the data sent.

I am not sure how the new version of Seti being rolled out later this year compares in terms of processing time per WU sent.


Here's an idea --- What if they just didn't send VHAR tasks at all and processed them on a vhar-optimized cruncher or two locally?

I'm seeing completion times under 2:00 on some computers, under 4:00 on many, many (and I assume these are running two or three at a time).

Say... eight GTX 590s running six units each (48) completing every three minutes, so 20 x that per hour...ummm 960 WUs/hr...ummmm, 23040 WUs per day that wouldn't have to be sent, reported, retrieved... x 2... 46080 Wus of bandwidth @ 366.55KB = ummmm.... 16GB/day...downloaded, and ummm...results are how big?... ummmm...

And do that twice... so you've got 32GB/day of bandwidth opened up while crunching 92,000 vhars/day.

Well, whatever the real number is, it's a lot of bandwidth and a lot of CPU cycles wasted trying to download and upload and a lot of scheduler tasks and a lot of everything else. If nothing else, crunching them locally would save half the bandwidth because I assume they wouldn't need duplication for verification.
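For what it's worth, the back-of-envelope numbers above do check out. A quick arithmetic pass, using tbret's own assumed figures (8 GTX 590s, 6 tasks each, 3 minutes per task, 366.55 KB per WU, 2x replication):

```python
GPUS = 8
TASKS_PER_GPU = 6
MINUTES_PER_TASK = 3
WU_KB = 366.55                         # assumed average VHAR WU size

batches_per_hour = 60 // MINUTES_PER_TASK              # 20 batches/hour
wus_per_hour = GPUS * TASKS_PER_GPU * batches_per_hour # 960 WUs/hour
wus_per_day = wus_per_hour * 24                        # 23040 WUs/day
with_replication = wus_per_day * 2                     # 46080 downloads/day
gb_per_day = with_replication * WU_KB / 1024 / 1024
print(f"{gb_per_day:.1f} GB/day")                      # 16.1 GB/day per farm
```

So one such farm accounts for roughly 16 GB/day of downloads, and two of them for the ~32 GB/day quoted above.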

Now if I just knew what portion of the 40-50k results every hour were vhars I'd know if I'm slowing things down or speeding things up for the project as a whole.

And then there is the question: Would anything be left for us to crunch?

We would "hit" the servers a WHOLE lot less frequently if we were doing nothing but larger tasks. As it is, I've got a pot-load of under 4:00 tasks I've been downloading, crunching, and uploading.

Maybe one of these?

http://www.youtube.com/watch?NR=1&v=PZ3CJw-u2cc

Yeah, I know. But it looks so cool.

I've just got this feeling that we're not thinking outside the box and there's some very simple principle that could be applied to make things much better very quickly. I just "feel" like there's an "ah-HA!" to be had.

Profile MarkJProject donor
Volunteer tester
Avatar
Send message
Joined: 17 Feb 08
Posts: 944
Credit: 25,172,902
RAC: 450
Australia
Message 1160106 - Posted: 8 Oct 2011, 6:07:25 UTC

Einstein does locality scheduling, meaning that once you have some work files you can generate your own wu from them, this saves downloading them all the time. Once the scheduler has worked out all the work units have been done for that work file it instructs the BOINC client to delete it. Don't know if it's at all possible to do on here.

Also I would suggest gzip for the scheduler requests to reduce the traffic for them; while not giving a big saving in bandwidth, every little bit helps.
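Scheduler requests are repetitive XML, which gzip handles very well. A rough sketch of the saving; the XML fragment below is invented for illustration, not the real BOINC scheduler request format:

```python
import gzip

# Made-up scheduler-request-like XML: a small header plus many
# near-identical reported-result elements.
request = (b"<scheduler_request><authenticator>abc123</authenticator>"
           b"<work_req_seconds>8640</work_req_seconds>"
           + b"<other_result><name>task</name></other_result>" * 50
           + b"</scheduler_request>")

packed = gzip.compress(request)
print(len(request), "->", len(packed), "bytes")  # shrinks to a small fraction
```

Because the repeated elements are nearly identical, the deflate dictionary collapses them, so the more results a host reports in one request, the better the ratio gets.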
____________
BOINC blog

Profile janneseti
Avatar
Send message
Joined: 14 Oct 09
Posts: 2730
Credit: 528,265
RAC: 335
Sweden
Message 1160160 - Posted: 8 Oct 2011, 11:39:09 UTC - in response to Message 1160040.



You have to remember that graph is inside-out....
The green is downloads going out.
The blue line is uploads and work requests going in.


OK. Help me out, since English is not my native language.
You say downloads are going out and uploads are going in.
Please answer the following questions.

If Alice sends a message to Bob, does she download the message?
If I receive message from Bob, do I upload the message?

No, the opposite.
If you receive a file into your computer, you are downloading it from another computer or server.
If you send a file out of your computer, you are uploading it to another computer or server.


Thanks msattler.
However the cricket graph makes me wonder still.
Who's gigabitethernet2_3 and who's sending and receiving what?




gigabitethernet2_3 does SETI@Home upload, download, and possibly update (work requests and reports). There are two paths into SSL (Space Sciences Lab), the one that does the web pages goes through the normal Berkeley network. The one that does uploads and downloads goes through Hurricane Electric. The cricket graph is for a switch at Hurricane Electric. It is in the Hurricane Electric building facing SSL. Since in and out are from the point of view of a router/switch facing SSL, they have to be swapped to match the point of view of SSL. Think of it this way. The port side of a boat is to your left if you are in the stern facing the bow. If you turn around, so you are looking at the wake, the port side of the boat will be on your right.


Thanks all for the help.
I can now see that Berkeley uses all the bandwidth all the time.
The bottom line is that Berkeley lacks funding, and we all have to wait some years until they can get all the bandwidth they (we) want.

Sami
Send message
Joined: 12 Aug 99
Posts: 37
Credit: 3,358,623
RAC: 164
Finland
Message 1160169 - Posted: 8 Oct 2011, 12:30:58 UTC - in response to Message 1159801.

Why not make the workunits larger in datasize?
Every server at Seti would benefit from it.
Instead of about 300 Kbytes files per WU, increase it to let's say 1 Mbyte.
That will decrease the number of workunits by a third.
The network load for WU's surely must drop.


RNA World beta did something like that. They combined many small WUs into one big WU. One reason was to reduce server load.

http://www.rnaworld.de/rnaworld/forum_thread.php?id=76

Is this something you meant?
____________

Profile MarkJProject donor
Volunteer tester
Avatar
Send message
Joined: 17 Feb 08
Posts: 944
Credit: 25,172,902
RAC: 450
Australia
Message 1160175 - Posted: 8 Oct 2011, 13:01:25 UTC - in response to Message 1160169.

Why not make the workunits larger in datasize?
Every server at Seti would benefit from it.
Instead of about 300 Kbytes files per WU, increase it to let's say 1 Mbyte.
That will decrease the number of workunits by a third.
The network load for WU's surely must drop.


Rna world beta did something like that. They combined many small WUs into one big WU. One reason was to reduce server load.

http://www.rnaworld.de/rnaworld/forum_thread.php?id=76

Is this something you meant?


They also did this for the Einstein CUDA work. Each task is actually 8 WUs combined: to process it you download 8 x 4 MB data files, which are processed serially and uploaded as 8 result files.
____________
BOINC blog

WinterKnight
Volunteer tester
Send message
Joined: 18 May 99
Posts: 8777
Credit: 25,919,032
RAC: 18,119
United Kingdom
Message 1160225 - Posted: 8 Oct 2011, 15:41:31 UTC - in response to Message 1160106.

Einstein does locality scheduling, meaning that once you have some work files you can generate your own wu from them, this saves downloading them all the time. Once the scheduler has worked out all the work units have been done for that work file it instructs the BOINC client to delete it. Don't know if it's at all possible to do on here.

Also I would suggest gzip for the scheduler requests to reduce the traffic for them, while not giving a big saving in bandwidth every little bit helps.

Until you get to the end of a run and are on clean-up duty, processing the odd ones not completed yet. Then you download these large files just to extract one unit for processing, and when that one task is complete the order is to delete those massive downloads.

Profile janneseti
Avatar
Send message
Joined: 14 Oct 09
Posts: 2730
Credit: 528,265
RAC: 335
Sweden
Message 1160255 - Posted: 8 Oct 2011, 17:15:04 UTC - in response to Message 1160175.

Why not make the workunits larger in datasize?
Every server at Seti would benefit from it.
Instead of about 300 Kbytes files per WU, increase it to let's say 1 Mbyte.
That will decrease the number of workunits by a third.
The network load for WU's surely must drop.


Rna world beta did something like that. They combined many small WUs into one big WU. One reason was to reduce server load.

http://www.rnaworld.de/rnaworld/forum_thread.php?id=76

Is this something you meant?


They also did this for the Einstein cuda work. They are actually 8 wu combined. To process you download 8 x 4Mb data files, which are processed serially and uploaded as 8 result files.

Profile janneseti
Avatar
Send message
Joined: 14 Oct 09
Posts: 2730
Credit: 528,265
RAC: 335
Sweden
Message 1160260 - Posted: 8 Oct 2011, 17:25:06 UTC - in response to Message 1160255.

Why not make the workunits larger in datasize?
Every server at Seti would benefit from it.
Instead of about 300 Kbytes files per WU, increase it to let's say 1 Mbyte.
That will decrease the number of workunits by a third.
The network load for WU's surely must drop.


Rna world beta did something like that. They combined many small WUs into one big WU. One reason was to reduce server load.

http://www.rnaworld.de/rnaworld/forum_thread.php?id=76

Is this something you meant?


They also did this for the Einstein cuda work. They are actually 8 wu combined. To process you download 8 x 4Mb data files, which are processed serially and uploaded as 8 result files.



Hyvää päivää ("good day").

Exactly.
However, the Berkeley network load is probably too high, and the crunchers' demand is more than they can handle.
But, like they say, every bit counts.




Copyright © 2014 University of California