Tenaya (Feb 24 2009)



Message boards : Technical News : Tenaya (Feb 24 2009)

Previous · 1 · 2 · 3 · 4 · Next
Author Message
Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,667,614
RAC: 48,514
United Kingdom
Message 869339 - Posted: 25 Feb 2009, 15:13:14 UTC

This is turning into quite a pub crawl - any offers from Lincolnshire? (we might be getting a little thirsty as we head south....)

PhonAcq
Send message
Joined: 14 Apr 01
Posts: 1624
Credit: 22,621,644
RAC: 4,620
United States
Message 869364 - Posted: 25 Feb 2009, 16:21:05 UTC - in response to Message 869213.

Seeing the constant issues with bandwidth, I was wondering if the SETI data packets could be compressed (to reduce size for transfer), then have BOINC decompress them for processing.

I know it would take more CPU time to compress the packets, but it would reduce bandwidth use.

Just an idea, thought I would share.

- Wol

File compression works based on the fact that the data in those files (word processing, databases, etc.) is not entirely random. A "flat file" database may compress 90% because one filler character appears over and over.

Common bytes get shorter codes, uncommon bytes longer, and the average number of bits/character goes down. (gross oversimplification)

Binary data consisting almost entirely of noise is going to be equally distributed across the whole byte range, so it isn't very compressible.


To make this concrete, I compressed (with Windows compression) my project directory of 500 or so wu's. It saved 7 MB out of 180 MB, or about 4%. So I then zipped up the entire directory and saved about 28% of disk space.

Granted, there may be better or more tailored compression algorithms available, but these numbers are a good guide, and they tend to validate Ned's observation. Although 28% might be appealing if it could be implemented - and if it relates to the wu's and not some subtlety of Vista NTFS.
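The effect both posters describe can be reproduced with a quick sketch (Python's zlib here, purely illustrative - not the tool either of them used): filler-heavy data compresses dramatically, while noise-like bytes barely shrink at all.

```python
import os
import zlib

# A "flat file" full of one filler character compresses extremely well...
structured = b"X" * 100_000
# ...while noise-like data has almost no redundancy to exploit.
noisy = os.urandom(100_000)

structured_ratio = len(zlib.compress(structured)) / len(structured)
noisy_ratio = len(zlib.compress(noisy)) / len(noisy)

print(f"structured: {structured_ratio:.2%} of original size")
print(f"noisy:      {noisy_ratio:.2%} of original size")
```

The structured input shrinks to well under 1% of its size; the random input typically ends up slightly *larger* than it started, once the compressor's own overhead is counted.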

PhonAcq
Send message
Joined: 14 Apr 01
Posts: 1624
Credit: 22,621,644
RAC: 4,620
United States
Message 869365 - Posted: 25 Feb 2009, 16:26:02 UTC - in response to Message 869364.

Oops. I just saw Josef's estimates. Sorry for the added post.

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 869428 - Posted: 25 Feb 2009, 20:37:03 UTC - in response to Message 869364.
Last modified: 25 Feb 2009, 20:40:21 UTC

Granted, there may be better or more tailored compression algorithms available, but these numbers are a good guide, and they tend to validate Ned's observation. Although 28% might be appealing if it could be implemented - and if it relates to the wu's and not some subtlety of Vista NTFS.

All file compression is based on the fact that data usually isn't very random.

In the absence of a strong, consistent signal, stuff coming from a radio receiver is very well randomized.

Joe's comments: the easiest way to pick up that 15% would be to encode the MB work units as pure binary, same as AP. Of course, we'd all need new science apps.
____________

Mike Davis
Volunteer tester
Send message
Joined: 17 May 99
Posts: 232
Credit: 5,305,576
RAC: 0
Isle of Man
Message 869447 - Posted: 25 Feb 2009, 21:50:14 UTC - in response to Message 869334.


We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:


And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.

F.


And I'll come across from Nottinghamshire to triple the contribution.

Claggy


Add another 4 from Cambridgeshire :).

Mick


I'd probably even manage to come over from the Isle of Man for that :)
____________

Fred W
Volunteer tester
Send message
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 869492 - Posted: 25 Feb 2009, 23:46:36 UTC - in response to Message 869447.


We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:


And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.

F.


And I'll come across from Nottinghamshire to triple the contribution.

Claggy


Add another 4 from Cambridgeshire :).

Mick


I'd probably even manage to come over from the Isle of Man for that :)

Perhaps we should get Matt to put it on a CD and just have a get-together to listen to it. Seems a pity for all those pints to go to waste ;)

F.

____________

Nick Fox
Send message
Joined: 5 Jan 04
Posts: 46
Credit: 2,156,647
RAC: 893
United Kingdom
Message 869623 - Posted: 26 Feb 2009, 7:40:56 UTC - in response to Message 869334.


We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:


And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.

F.


And I'll come across from Nottinghamshire to triple the contribution.

Claggy


Add another 4 from Cambridgeshire :).

Mick


Here's another 4 from a Yorkshireman in Lincolnshire!

____________

Josef W. SegurProject donor
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4348
Credit: 1,129,381
RAC: 957
United States
Message 869629 - Posted: 26 Feb 2009, 8:43:21 UTC - in response to Message 869428.

Ned Ludd wrote:
...
Joe's comments: the easiest way to pick up that 15% would be to encode the MB work units as pure binary, same as AP. Of course, we'd all need new science apps.

It would be more like 13% that way: the 20K of XML in Enhanced WUs gzips to 4K, and you'd lose the gzip compression of AP WUs. OTOH, it has an advantage for project WU storage space. Adding the capability to handle encoding="binary" to the Enhanced apps would be fairly easy; once everyone had new apps, the splitter could be changed to send that format. It would be a way to weed out those numbskulls who are running obsolete optimized apps, too.
Joe
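Joe's two numbers - XML overhead that gzips well, noise that doesn't - can be illustrated with a toy sketch. Hex encoding and random bytes below are my stand-ins for illustration, not the real Enhanced workunit format:

```python
import gzip
import os

# Stand-in for a workunit's noise-like sample data (NOT the real MB format).
payload = os.urandom(64 * 1024)

# A text encoding of binary data (hex here, for illustration) doubles the size.
text_encoded = payload.hex().encode()

# gzip can squeeze the text-encoding overhead back out, but it cannot shrink
# the noise itself below roughly its original binary size.
gz = gzip.compress(text_encoded)
print(len(payload), len(text_encoded), len(gz))
```

That is why going to pure binary helps: it removes the text-encoding inflation entirely, even though the underlying samples themselves remain incompressible.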

Zydor
Send message
Joined: 4 Oct 03
Posts: 172
Credit: 491,111
RAC: 0
United Kingdom
Message 869643 - Posted: 26 Feb 2009, 9:37:09 UTC - in response to Message 869623.


We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:


And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.

F.


And I'll come across from Nottinghamshire to triple the contribution.

Claggy


Add another 4 from Cambridgeshire :).

Mick


Here's another 4 from a Yorkshireman in Lincolnshire!

To complete the liquid odyssey, I've got a crate ready here in Hampshire - if he's still standing - and will pour him back onto a plane at Heathrow :)
____________

.clair.
Volunteer moderator
Send message
Joined: 4 Nov 04
Posts: 1300
Credit: 23,738,400
RAC: 33,672
United Kingdom
Message 869855 - Posted: 27 Feb 2009, 0:07:44 UTC - in response to Message 869643.


We don't usually do pitchers in England, but I'd love to hear the stories - if you're ever in Yorkshire, these are waiting for you:


And a dyed-in-the-wool Lancastrian would risk crossing the border to double the contribution.

F.


And I'll come across from Nottinghamshire to triple the contribution.

Claggy


Add another 4 from Cambridgeshire :).

Mick


Here's another 4 from a Yorkshireman in Lincolnshire!

To complete the liquid odyssey, I've got a crate ready here in Hampshire - if he's still standing - and will pour him back onto a plane at Heathrow :)


There are a `few` counties left yet........
and I am sure a pint or few can be found in Cumbria (west coast)
(hello Isle of Man, I can see you :)....

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 870191 - Posted: 27 Feb 2009, 22:41:20 UTC - in response to Message 870138.

Right now Seti@Home is coping quite well with a 100 Mb connection to the internet
for the most part. This connection has to send out at least 2 copies of each WU,
accept work returned, and send out updated Apps.

If a server closet could be set up at the 1 Gb feed to distribute the
WU's, as well as the updated apps, and accept the work returned, then the
bandwidth could easily double.

With Wu's going down the 100 Mb line just once, instead of 2 or 3 times,
returned work coming back on the same line, and new Apps only being
updated to the 1 Gb server closet once per update, the increase in bandwidth
would easily exceed 50%.

It might be easier to have the work returned directly to the lab since the
bandwidth required is so little.

Sounds good, but....

I don't think the two work-units are identical, so you'd need a pre-splitter that made one, and then something else that copied the identical parts to make another.

I'm not saying it's not possible, just that it might not be as helpful as it appears at first blush.

I think it'd be better if all of the servers could end up close to the 1 Gb feed, and then be managed remotely (with the "tapes" sent over the 100 Mb link), but ultimately, whatever they do has to be workable for them.


____________

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8833
Credit: 53,667,614
RAC: 48,514
United Kingdom
Message 870195 - Posted: 27 Feb 2009, 22:57:37 UTC - in response to Message 870191.

Right now Seti@Home is coping quite well with a 100 Mb connection to the internet
for the most part. This connection has to send out at least 2 copies of each WU,
accept work returned, and send out updated Apps.

If a server closet could be set up at the 1 Gb feed to distribute the
WU's, as well as the updated apps, and accept the work returned, then the
bandwidth could easily double.

With Wu's going down the 100 Mb line just once, instead of 2 or 3 times,
returned work coming back on the same line, and new Apps only being
updated to the 1 Gb server closet once per update, the increase in bandwidth
would easily exceed 50%.

It might be easier to have the work returned directly to the lab since the
bandwidth required is so little.

Sounds good, but....

I don't think the two work-units are identical, so you'd need a pre-splitter that made one, and then something else that copied the identical parts to make another.

I'm not saying it's not possible, just that it might not be as helpful as it appears at first blush.

I think it'd be better if all of the servers could end up close to the 1 Gb feed, and then be managed remotely (with the "tapes" sent over the 100 Mb link), but ultimately, whatever they do has to be workable for them.

Ned,

Surely the data in a workunit (the 366KB or 8MB download file) is identical for all replications. Otherwise validation makes no sense. Have a look in the fanout directories for a current one of yours (the directory name is in the url in client_state.xml). You'll only find one copy.

The 'tasks' assigned to each host are different, but trivially so: "process this data, and return a file called WU_0" - "... WU_1" - and so on. But the 'tasks' are trivially small, and highly compressible, components of 'sched_reply...xml' files.

What's more, the data files remain unchanged on disk in case of compute error, deadline exceeded, CBNC etc.: they can then be resent quickly if needed, or deleted once validation is complete.

That's probably the weakness with guido.man's idea: storing the datafiles at the bottom of the hill involves a large data storage unit, and a lot of management data traffic as data is added and deleted - all for a comparatively small (just over 50%, allowing for resends) gain in bandwidth. Storing the applications down there would require much less data capacity and much less management, and still be a worthwhile contribution.

WinterKnight
Volunteer tester
Send message
Joined: 18 May 99
Posts: 8781
Credit: 25,976,389
RAC: 17,090
United Kingdom
Message 870217 - Posted: 28 Feb 2009, 0:27:30 UTC

When a new application is available, it might help a lot if the application were sent first - and the host acknowledged a correct download - before any tasks for that application were sent.

Rather than: here is a task - oh, you haven't got the application, wait one, I'll send it to you - only to find the application downloads are not working. Oooops.

Sending out, in some cases, 5 copies of a task at 8 MB each, only for them to fail because the host hadn't got the application yet, was a big waste of bandwidth. That caused all sorts of problems for the project team, and for us poor users who have now done several long tasks only for them to reach the "Too many errors" threshold.
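The ordering WinterKnight describes amounts to a simple two-phase check - a sketch with invented names, not the actual BOINC scheduler protocol: only dispatch tasks for an app version once the host has acknowledged downloading it.

```python
def can_send_tasks(acked_apps: set, app_version: str) -> bool:
    """Dispatch tasks only for applications the host has confirmed it holds."""
    return app_version in acked_apps

# Hypothetical host that has acknowledged one app version so far.
host_acked = {"setiathome_enhanced_6.03"}

print(can_send_tasks(host_acked, "setiathome_enhanced_6.03"))  # safe to send
print(can_send_tasks(host_acked, "astropulse_5.03"))           # send the app first
```

The point of the gate is cheapness: a failed acknowledgment costs one small app download, not five 8 MB task copies that will all error out.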

1mp0£173
Volunteer tester
Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 870295 - Posted: 28 Feb 2009, 4:17:35 UTC - in response to Message 870195.


Surely the data in a workunit (the 366KB or 8MB download file) is identical for all replications. Otherwise validation makes no sense. Have a look in the fanout directories for a current one of yours (the directory name is in the url in client_state.xml). You'll only find one copy.

At the moment, I've got one of the 5.01 "beta" astropulse units that is taking a long time to crunch, so I can't look at Multibeam.

When I look at that workunit, it has a header, plus the data.

I don't have two copies of the same workunit sent to different hosts, so I can't compare two headers to see if they're the same.

I assume they're different -- that they contain information that identifies that result.

The data part of each of these should be identical.

I don't know how BOINC handles directory fan-outs, but I would expect the companion WU to be in a different directory on the server, based on a hash of the file name.
____________
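For the curious, a hash-based fanout like the one Ned speculates about maps a file name deterministically to one subdirectory, so every reference to the same workunit file resolves to the same place - consistent with Richard's "you'll only find one copy" observation. A rough sketch (the directory count and hash choice are my assumptions, not BOINC's exact code):

```python
import hashlib

FANOUT = 1024  # hypothetical number of download subdirectories

def fanout_dir(filename: str) -> str:
    """Deterministically map a file name to a download subdirectory."""
    digest = int(hashlib.md5(filename.encode()).hexdigest(), 16)
    return f"download/{digest % FANOUT:x}"

# Every host asking for this workunit file is pointed at the same directory,
# so only one copy needs to exist on the server.
print(fanout_dir("27fe09ab.12345.6789.10.8.123"))
```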

Josef W. SegurProject donor
Volunteer developer
Volunteer tester
Send message
Joined: 30 Oct 99
Posts: 4348
Credit: 1,129,381
RAC: 957
United States
Message 870346 - Posted: 28 Feb 2009, 8:42:35 UTC - in response to Message 870295.


Surely the data in a workunit (the 366KB or 8MB download file) is identical for all replications. Otherwise validation makes no sense. Have a look in the fanout directories for a current one of yours (the directory name is in the url in client_state.xml). You'll only find one copy.

At the moment, I've got one of the 5.01 "beta" astropulse units that is taking a long time to crunch, so I can't look at Multibeam.

When I look at that workunit, it has a header, plus the data.

I don't have two copies of the same workunit sent to different hosts, so I can't compare two headers to see if they're the same.

I assume they're different -- that they contain information that identifies that result.

The data part of each of these should be identical.

I don't know how BOINC handles directory fan-outs, but I would expect the companion WU to be in a different directory on the server, based on a hash of the file name.

Sorry, Ned, Richard is right. The splitter produces one workunit file and stores it on a download server. It also produces a database record for that workunit which includes a <target_nresults> value, that is what determines how many hosts will initially be told to download a copy of the workunit file (initial replication). The Transitioner creates that many result database entries which are added to the "Results ready to send" pool, the Feeder eventually passes those on to the Scheduler and hosts are told about the tasks. It's the result records in the database which differ for each host, the workunit file is identical.
Joe
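Joe's pipeline can be summarized in a few lines - hypothetical field and function names here, BOINC's real database schema is richer: the splitter produces one workunit, and the transitioner fans it out into per-host result records that all point at the same data file.

```python
from dataclasses import dataclass, field

@dataclass
class Workunit:
    name: str
    target_nresults: int                       # initial replication, set by the splitter
    results: list = field(default_factory=list)

def transition(wu: Workunit) -> list:
    """Create one 'ready to send' result record per requested replication.
    Every record references the same single workunit file on disk."""
    wu.results = [f"{wu.name}_{i}" for i in range(wu.target_nresults)]
    return wu.results

wu = Workunit("27fe09ab.12345.6789.10.8.123", target_nresults=2)
print(transition(wu))
```

This matches the naming Richard noted earlier: the tasks differ only in a tiny suffix ("..._0", "..._1"), while the download file itself is identical for all hosts.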

Profile [AF>France>Bourgogne]Patouchon
Avatar
Send message
Joined: 25 Aug 01
Posts: 7
Credit: 2,119,827
RAC: 1,013
France
Message 870350 - Posted: 28 Feb 2009, 9:10:01 UTC - in response to Message 869256.

Hello from France.
I'm glad the outage has been resolved - you did a great job.
Unfortunately I can't get SETI wus; I can only get Astropulse wus, and they are too big to crunch with my old PC (about 650 h estimated!!!).
Do you think I'll be able to get some in a few hours?
Thanks for your patience.
Patrick from "l'Alliance francophone" team.

Trouble resolved: I checked my preferences, and I now get only SETI wus. Special thanks to BernardP from my team.
____________
seti1 was pretty good, seti2 will be better ?

Profile [seti.international] Dirk Sadowski
Volunteer tester
Avatar
Send message
Joined: 6 Apr 07
Posts: 7124
Credit: 61,647,272
RAC: 15,435
Germany
Message 870428 - Posted: 28 Feb 2009, 14:21:36 UTC - in response to Message 870350.

Hello from France.
I'm glad the outage has been resolved - you did a great job.
Unfortunately I can't get SETI wus; I can only get Astropulse wus, and they are too big to crunch with my old PC (about 650 h estimated!!!).
Do you think I'll be able to get some in a few hours?
Thanks for your patience.
Patrick from "l'Alliance francophone" team.

Trouble resolved: I checked my preferences, and I now get only SETI wus. Special thanks to BernardP from my team.


Hello,

Good to hear that your problem is solved.

For the next time if you have a problem, have a look in the 'number crunching' forum:
http://setiathome.berkeley.edu/forum_forum.php?id=10

There are more people around there who could help you. :-)

Nice greetings from Germany! :-)

____________
BR

SETI@home Needs your Help ... $10 & U get a Star!

Team seti.international

Das Deutsche Cafe. The German Cafe.

Profile Westsail and *Pyxey*
Volunteer tester
Avatar
Send message
Joined: 26 Jul 99
Posts: 338
Credit: 20,538,216
RAC: 0
United States
Message 870471 - Posted: 28 Feb 2009, 15:51:25 UTC

What would happen if, say, half of the top 25 crunchers no longer needed internet access? Why not make it like Netflix?

Imagine someone could put, say, 10,000 work units on a DVD and post it to me on Monday. Whenever I crunch through that disk, I put all the results on a disk, pop it in the prepaid mailer, and in 3-5 days a new disk arrives. Rinse and repeat. You would need 3 disks for the system to be most efficient: one being crunched and two in the mail going in opposite directions. The ermm "benefit" would of course require that the cruncher make regular minimum donations to the project, to cover the time involved, supplies, shipping etc. Heck, with the decreased server load, the total throughput may actually improve due to less time spent chasing server gremlins.

P.S. I second the request to add Allen array data to the project someday, if there is ever a shortfall of raw data.

I am still working on getting the other Tesla card for this machine. Just bought the wifey a new Mustang yesterday, so the card may have to wait a bit. I can't seem to successfully explain to her why I hoard video cards for headless machines... lol

Hey, this machine has a 9500, a Tesla, and a 260. How come on the computer information page it is listed as 3x GTX 260? Thanks!
____________
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov

OzzFan
Volunteer tester
Avatar
Send message
Joined: 9 Apr 02
Posts: 13706
Credit: 31,760,370
RAC: 13,474
United States
Message 870472 - Posted: 28 Feb 2009, 15:54:08 UTC - in response to Message 870471.

I'm not so certain that idea would work. One of the pros of volunteering is that it's easy and not a whole lot of work. If people had to load their own workunits via DVD and send results back the same way, they'd quickly find another project.
____________

Profile Westsail and *Pyxey*
Volunteer tester
Avatar
Send message
Joined: 26 Jul 99
Posts: 338
Credit: 20,538,216
RAC: 0
United States
Message 870485 - Posted: 28 Feb 2009, 16:16:56 UTC - in response to Message 870472.

Aloha!
*shrugs* mo better than spending 2-3 days a week with idle machines reading:
"Scheduler request completed: got 0 new tasks" ;)

____________
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov


Copyright © 2014 University of California