Help optimize SETI@Home

HachPi
Joined: 2 Aug 99
Posts: 481
Credit: 21,807,425
RAC: 21
Belgium
Message 47380 - Posted: 17 Nov 2004, 15:29:09 UTC - in response to Message 47376.  
Last modified: 17 Nov 2004, 15:31:19 UTC

Sorry, but this is not the correct definition of a cluster...
I prefer to call it a giant network of computers but certainly NOT a cluster doing clustered computing.

When a huge job is to be done, a real cluster can split it into different parts within the same program and distribute the constituent parts to a large number of processors. All these processors work more or less simultaneously, and as efficiently as possible, on that job together to speed up completion of the running program. In this sense the SETI setup cannot be called a cluster. The software is simply not written to do this at the moment.

Greetings ;-))


ID: 47380 · Report as offensive
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 47518 - Posted: 18 Nov 2004, 7:38:24 UTC - in response to Message 47380.  

> Sorry, but this is not the correct definition of a cluster...
> I prefer to call it a giant network of computers but certainly NOT a cluster
> doing clustered computing.
>
> When a huge job is to be done, a real cluster can split it into different
> parts within the same program and distribute the constituent parts to a large
> number of processors. All these processors work more or less simultaneously,
> and as efficiently as possible, on that job together to speed up completion of
> the running program. In this sense the SETI setup cannot be called a cluster.
> The software is simply not written to do this at the moment.
>
> Greetings ;-))

Hi!

A Google search on "define: cluster" returns many definitions, two of which I find relevant:

1) (1) (n.) A group of computers connected by a high-speed network that work together as if they were one machine with multiple CPUs. (2) (n.) A logical collection of packages (software modules). (3) (n.) A group of software packages. Clusters can contain other clusters. Clusters and their components form a hierarchical tree.
Link

2) A group of independent computer systems known as nodes or hosts, that work together as a single system to ensure that mission-critical applications and resources remain available to clients. A server cluster is the type of cluster that the Cluster service implements. Network Load Balancing provides a software solution for clustering multiple computers running Windows 2000 Server that provides networked services over the Internet and private intranets.
Link

The two definitions support our two viewpoints.

Which is why I used the "traditionally" tag in my reply. The use of a group of homogeneous machines to make a "supercomputer" is a fairly new phenomenon, driven mostly by cost considerations and by the fact that this architecture suits many problems. Clusters of inexpensive machines allow a large virtual machine to be created in a loosely coupled architecture, and also in "hypercubes" that improve adjacent-node communications with less expensive channels.

Along these lines we should note that this architecture choice is not a perfect fit in all cases. The NSF recently started to fund research into other classes of machine architectures because we (the USA) have started to fall behind in the technologies that enable the creation of the old style supercomputers of the Cray type (I did not save the reference so I cannot point to where I saw this).

BOINC and now the IBM effort are an interesting set because the "homogeneous" computers are not that similar to each other. I mean that in the "normal" cluster-type supercomputer all of the nodes are identical. This is obviously not true for BOINC, with Linux, Macintosh OS X, and Microsoft Windows machines of varying memory sizes, disk sizes, processor speeds, etc. being harnessed to perform work.

One last point is that many providers of web sites using large quantities of PC-class machines find that, at some point, the total cost of the machines plus the management overhead of caring for and feeding thousands of PCs makes them more expensive and less reliable than a single mainframe performing the same workload. Contrary to persistent rumors, the mainframe is not dead ...


ID: 47518 · Report as offensive
Alex

Joined: 26 Sep 01
Posts: 260
Credit: 2,327
RAC: 0
Canada
Message 47523 - Posted: 18 Nov 2004, 8:37:30 UTC

Sorry, Paul, but both definitions of cluster have 'work together' in their descriptions.

A pair of P3s crunching SETI don't 'work together'. They don't even communicate the fact that they're crunching SETI to each other.

The nodes of a real cluster talk to each other, or have one machine acting as a traffic cop for computing threads to be processed by different CPUs in the group.
In the case of a 'web server cluster', if one machine goes down, the other machines make up for the missing machine.

OpenMosix simulates one giant CPU. When you run SETI on an openMosix cluster, it tends to run one thread (try it when running ClusterKnoppix). It'll also move computing threads from the inefficient machines to the machines that are not busy.

OSCAR is another Linux package which does clustering, and it's used by the BC Genome Science Center. Their servers do a network boot, hook up to a master node, and then take instructions from that node.

Google is just a text search engine. Just because 'cluster definition' is printed on the web doesn't mean it was written in the right context.
ID: 47523 · Report as offensive
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 47554 - Posted: 18 Nov 2004, 13:52:15 UTC - in response to Message 47523.  

> Sorry paul, but both definitions of cluster have 'work together' in their
> descriptions.

Well, I guess we will have to agree to disagree.

I will grant that the definitions, though acquired via Google, are not necessarily perfect, but if you look at the links, one was written by Microsoft and the other by Sun. I grant you I distrust most things from Microsoft, but I would submit that these two definitions are about the best official definitions that can be found for the word "cluster".

Secondly my original point was that the primary purpose of the use of a cluster was to mitigate situations where a hardware failure occurred. In the face of this problem the other nodes of the cluster can take over for the failed node.

BOINC does still fit within these definitions in that each participant's computer works on the same body of source material and the failure of one node does not damage this process. Secondly, though there are no direct links between the nodes in the BOINC system, there is still the control exercised by the servers that the participant connects to in order to obtain and report work. I do grant that the coupling of the nodes is very loose, much looser than in most clusters such as those that make up a "cluster" supercomputer.
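
The loose coupling described above can be sketched in a few lines of Python. This is a toy illustration only, not BOINC's actual scheduler: the function name, the round-robin assignment, and the `fails` set used to simulate lost results are all assumptions made for the example. The point is that hosts only ever talk to the server, and a lost result is simply reissued.

```python
from collections import deque

def run_project(workunits, hosts, fails):
    """Hand out work units to hosts and reissue any unit whose result is lost.

    `fails` holds (host, workunit) pairs that simulate a one-time failure.
    All names here are hypothetical; this is not BOINC's real scheduler.
    """
    remaining_failures = set(fails)   # each simulated failure happens once
    pending = deque(workunits)        # units still waiting for a valid result
    results = {}                      # workunit -> host that completed it
    turn = 0
    while pending:
        wu = pending.popleft()
        host = hosts[turn % len(hosts)]   # hosts never talk to each other
        turn += 1
        if (host, wu) in remaining_failures:
            remaining_failures.discard((host, wu))
            pending.append(wu)            # the server reissues the unit later
        else:
            results[wu] = host
    return results

print(run_project(["wu1", "wu2", "wu3"], ["A", "B"], {("A", "wu1")}))
```

Here host A loses `wu1`, the server puts it back in the queue, and another host picks it up; no other node is affected.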

Next, one of the prime uses of a "cluster" in today's environment is to make up a web server. And we seem to agree here. As need grows, throw in some more boxes. Since the main parts of web communication are "stateless" this works just fine. Well, almost. It works fine up to that point of diminishing returns I mentioned. However, the machines in the web cluster don't really communicate with each other either.

:)

As I said, maybe we should agree to not agree on this one. I think that BOINC falls within the bounds of the definition of a cluster and that one of the primary reasons to use a cluster is to mitigate node failures. I did not say that the next reason to use a cluster such as this is to improve processing speed. Since that is pretty much a given ...

ID: 47554 · Report as offensive
SwissNic
Joined: 27 Nov 99
Posts: 78
Credit: 633,713
RAC: 0
United Kingdom
Message 47562 - Posted: 18 Nov 2004, 14:38:11 UTC - in response to Message 47523.  

> Openmosix simulates one giant CPU. When you run seti on an openmosix cluster,
> it tends to run one thread (try it when running clusterknoppix). It'll also
> move computing threads from the inefficient machines to the machines that are
> not busy.

Hi there,

The statement above really interested me. Being able to combine many machines into one would be awesome! Unfortunately, the OpenMOSIX website doesn't support your comments. It says OpenMOSIX can turn a number of different machines into one SMP machine. (SMP stands for Symmetric Multiprocessing.)

Now, I have a Dual Opteron SMP machine, and BOINC runs two seti processes on this machine - one for each processor.

Now if an OpenMOSIX cluster (say 4 nodes) runs SETI - it would assign one work unit to each node, and therefore process each WU on an individual cpu. How is this different from running these machines individually?

This is contrary to your statement that OpenMOSIX simulates one giant CPU...

Is there any software that would allow multiple computers to be merged into one virtual cpu?

Cheers,

SwissNic


ID: 47562 · Report as offensive
HachPi
Joined: 2 Aug 99
Posts: 481
Credit: 21,807,425
RAC: 21
Belgium
Message 47599 - Posted: 18 Nov 2004, 16:32:02 UTC - in response to Message 47554.  

Everyone interested in this discussion should read the three articles by Robbins (Intel):

http://www.intel.com/cd/ids/developer/asmo-na/eng/20449.htm

They shed light on the subject (from the OpenMOSIX point of view).
He knows what he is talking about because he stood at the cradle of its development...

As is often the case with something new, our existing definitions are out of date, do not encompass the new developments, or are insufficient to describe accurately what we really mean.
If possible, always go straight to the pioneers themselves, as in the three articles mentioned above, because very often what the internet (Google) says is outdated or even scientifically inaccurate.
I'm a scientist myself and I never trust what's on the web, except for university pages and articles by known fellow researchers. The drawback is that often you can only get in if a password is granted to you, or if your department is willing to pay for admission.
In my opinion, only in this way can you get firm proof, because the author of the article is known by name and his scientific reputation will be in question if he fakes...

Greetings from Belgium ;-)
Hugo P
Professor Physics Chemistry


ID: 47599 · Report as offensive
Benher
Volunteer developer
Volunteer tester
Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 47605 - Posted: 18 Nov 2004, 16:46:50 UTC

All of you cluster people are welcome to come help develop seti source ;)

Just head on over to Join Up Forum and make a post.
ID: 47605 · Report as offensive
Mark Day
Joined: 19 Aug 02
Posts: 81
Credit: 502,830
RAC: 0
United States
Message 47616 - Posted: 18 Nov 2004, 17:31:07 UTC
Last modified: 18 Nov 2004, 17:33:39 UTC

I considered cooking up something with PVM (Parallel Virtual Machine) along
these lines. Just as the SETI splitter doles out work units to each host, so too
would a PVM "workunit splitter" parallelize each of these units across a number
of given hosts. Note that we are talking about process parallelizing here and
not clustering.

Something like this could still run under BOINC, which would run the master node
much like the SETI client runs already. PVM has some really cool stuff, like
being able to assign hosts a speed factor based on their processing power; it's
multiprocessor aware and works across NFS (shared tmp etc.).

See http://www.csm.ornl.gov/pvm/pvm_home.html
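
To illustrate the speed-factor idea, here is a minimal Python sketch of how a "workunit splitter" might divide one work unit's chunks among hosts in proportion to their relative speed. It does not use PVM itself; the function name, the host names, and the integer speed factors are all made up for the example.

```python
def split_by_speed(n_chunks, speeds):
    """Divide n_chunks among hosts in proportion to their speed factors.

    `speeds` maps a (hypothetical) host name to a positive integer factor.
    """
    total = sum(speeds.values())
    shares = {host: (n_chunks * s) // total for host, s in speeds.items()}
    # Integer division can leave a few chunks over; give them to the
    # fastest hosts first so the slow ones are not overloaded.
    leftover = n_chunks - sum(shares.values())
    for host in sorted(speeds, key=speeds.get, reverse=True)[:leftover]:
        shares[host] += 1
    return shares

print(split_by_speed(12, {"opteron": 4, "pentium": 1, "sparc": 1}))
```

With these made-up factors the fast machine takes 8 of the 12 chunks and the two slow ones take 2 each, so all three should finish at roughly the same time.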

Mark
ID: 47616 · Report as offensive
SwissNic
Joined: 27 Nov 99
Posts: 78
Credit: 633,713
RAC: 0
United Kingdom
Message 47790 - Posted: 19 Nov 2004, 8:27:15 UTC - in response to Message 47616.  

> I considered cooking up something with PVM (parallel virtual machine) along
> these lines. Just as the seti splitter doles out work units to each host, so
> too would
> a PVM "workunit splitter" parallelize each of these units to a number of given
> hosts. Note that we are talking process parallelizing here and not
> clustering.

Hi Mark,

PVM is mentioned in the Intel OpenMOSIX documentation. Doesn't PVM require you to add special code and libraries to the BOINC client to get it to run? OpenMOSIX would achieve the same or similar results whilst using a standard client...

This is not my area of expertise, so apologies if I'm totally wrong on this - just what I read... link below...

http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/ia32/xeon/threading/20449.htm

Cheers, Nic.
ID: 47790 · Report as offensive
Mark Day
Joined: 19 Aug 02
Posts: 81
Credit: 502,830
RAC: 0
United States
Message 47816 - Posted: 19 Nov 2004, 12:57:32 UTC
Last modified: 19 Nov 2004, 13:11:16 UTC

>Doesn't PVM require you to add special code and libraries to the BOINC client to get it to run?

Yes, it does. From the OpenMOSIX URL...

"OpenMosix, like an SMP system, cannot execute a single process on multiple physical CPUs at the same time."

In addition...

"OpenMosix can migrate most standard Linux processes between nodes with no problem. If an application forks many child processes, each of which performs work, then openMosix will be able to migrate each one of these processes to an appropriate node in the cluster."

This is not us... We have a single process, pieces of which (the FFT code and
analysis routines) could be run on multiple physical processors simultaneously,
giving the illusion of one powerful CPU. This is what PVM does. OpenMOSIX also
requires in-kernel support. PVM does not, and is available for many platforms,
including Windows.

Let's say we have 3 nodes and each one can complete a workunit in 3 hours...

Clustering à la OpenMOSIX gives us 3 workunits in 3 hours.

Process parallelizing à la PVM gives us, ideally, 1 workunit in 1 hour.

Note that over time, the same amount of work is done; it's just a matter of
how one gets there ;)
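
The arithmetic above can be checked with a short Python sketch. It assumes ideal linear speedup for the parallel case (no communication overhead, which a real PVM setup would certainly have), and the function names are invented for the example.

```python
def cluster_mode(nodes, hours_per_wu):
    """Each node crunches its own work unit independently."""
    return {"first_result_h": hours_per_wu,
            "wu_per_hour": nodes / hours_per_wu}

def parallel_mode(nodes, hours_per_wu):
    """All nodes share one work unit, assuming ideal linear speedup."""
    t = hours_per_wu / nodes
    return {"first_result_h": t, "wu_per_hour": 1 / t}

print(cluster_mode(3, 3))
print(parallel_mode(3, 3))
```

Both modes come out at the same throughput in work units per hour; the only difference is how long you wait for the first result.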

My thinking is that I could put more nodes to work that I'm currently not using.
I have several Pentium and SPARC based machines that are sitting around idling
because they just can't keep up, so maybe they could actually contribute.

Mark
ID: 47816 · Report as offensive
SwissNic
Joined: 27 Nov 99
Posts: 78
Credit: 633,713
RAC: 0
United Kingdom
Message 47832 - Posted: 19 Nov 2004, 14:23:10 UTC - in response to Message 47816.  

Thanks Mark - I thought I had understood it correctly. PVM is what I would like to run then too...

> I have several pentium and sparc based machines that are sitting around
> idling
> because they just cant keep up, so maybe they could actually contribute.

A Pentium that can't keep up? What do you mean???

I have several P1s and P2s all helping the effort - and my team is specialising in low-powered computing. These low-powered computers are helping to keep the awarded credit higher!!! ;o)))

I look forward to you joining Team Sausageroll then Mark - even if it's only with a couple of your old machines!!! ;o)))

Cheers, Nic.

ID: 47832 · Report as offensive
SwissNic
Joined: 27 Nov 99
Posts: 78
Credit: 633,713
RAC: 0
United Kingdom
Message 47835 - Posted: 19 Nov 2004, 14:40:46 UTC - in response to Message 47816.  


> Process parallelizing ala PVM gives us 1 workunit in 1 hour.

I just remembered something. The Fourier transform calculations which SETI does are an iterative process, where the result from one calculation is used as the data for the next. This type of calculation cannot be done in parallel due to the very nature of the math... which is why a grid computer is perfect for this sort of calculation...

See http://aurora.phys.utk.edu/~forrest/papers/fourier/ for a much better explanation!
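
The stage-to-stage dependency described above can be made concrete with a minimal sketch of an iterative radix-2 FFT in Python. This is a textbook formulation written for illustration, not code from the SETI client, and the function names are hypothetical. Each pass of the outer loop consumes the output of the previous pass, so the stages themselves run in sequence, while the butterflies inside a single stage touch disjoint pairs of elements.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft(x):
    """Iterative radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    bits = n.bit_length() - 1
    # Bit-reversal permutation puts the inputs in the order the stages expect.
    a = [x[bit_reverse(i, bits)] for i in range(n)]
    size = 2
    while size <= n:                      # stages: each needs the previous one
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):   # blocks in a stage share no data
            w = 1
            for k in range(size // 2):
                u = a[start + k]
                v = a[start + k + size // 2] * w
                a[start + k] = u + v                  # butterfly, upper output
                a[start + k + size // 2] = u - v      # butterfly, lower output
                w *= w_step
        size *= 2
    return a

print(fft([1, 2, 3, 4]))
```

For an input of length n there are log2(n) stages, and that chain of stages is the part that has to happen in order.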

Cheers, Nic.


ID: 47835 · Report as offensive
Ned Slider

Joined: 12 Oct 01
Posts: 668
Credit: 4,375,315
RAC: 0
United Kingdom
Message 47841 - Posted: 19 Nov 2004, 15:12:50 UTC - in response to Message 46783.  

> Ned,
>
> Join up on sourceforge.
> -- https://sourceforge.net/account/register.php
>
> ... and post your sourceforge ID. I will then make you a "documenter".
>
> There is a doc posting section for managing online docs.
>
> Anyone can post probationary documentation, only admins and documenters can
> make them visible to site visitors.
>
>

Ben,

I've signed up and posted my details in your "How to apply" thread.

Ned

*** My Guide to Compiling Optimised BOINC and SETI Clients ***
*** Download Optimised BOINC and SETI Clients for Linux Here ***
ID: 47841 · Report as offensive