What's up with NEZ?

Message boards : Number crunching : What's up with NEZ?

Previous · 1 . . . 3 · 4 · 5 · 6

Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806657 - Posted: 10 Sep 2008, 10:44:38 UTC - in response to Message 806647.  
Last modified: 10 Sep 2008, 10:46:42 UTC

[...]
The real issue is how you define 'cluster'. The e@h systems only loosely fit the description in that they have a comparatively low quality interconnect between the 'cluster' nodes. They're more like what we call a server farm.

[...]

According to Dr. Bruce Allen:
In comparison to Einstein@Home, Atlas is very general-purpose. It offers high IO bandwidth, rapid access to more than 1 Petabyte of data, fast interprocessor communication, 'reliable' hardware, and other features that E@H lacks...

OK, so Atlas includes:

Debian GNU/Linux powers Max Planck Institute 32.8 TFlops supercomputer
... hierarchical fully non-blocking network. The EFX 1000 core switch features 144 10 Gb/s CX4 ports and currently connects to 32 TRX100 edge switches, each with 48 1 Gb/s ports and 4x10 Gb/s uplinks, reaching 2880 Gb/s. Their Sun Fire X4500 servers are also connected directly to the core switch.

... which is a fair bit better than your usual 1Gbit/s LAN interconnect!
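For what it's worth, the quoted figures can be sanity-checked with a little arithmetic. My reading of the 2880 Gb/s as the core switch's full-duplex aggregate is an assumption on my part, not something the article states:

```python
# Back-of-envelope check of the quoted Atlas switch figures.
# Assumption: 2880 Gb/s = 144 ports x 10 Gb/s x 2 directions (full duplex).
CORE_PORTS = 144
PORT_RATE_GBPS = 10

aggregate_gbps = CORE_PORTS * PORT_RATE_GBPS * 2
print(aggregate_gbps)  # 2880

# Uplink capacity out of the 32 edge switches (4 x 10 Gb/s uplinks each):
EDGE_SWITCHES = 32
UPLINKS_PER_EDGE = 4
uplink_gbps = EDGE_SWITCHES * UPLINKS_PER_EDGE * PORT_RATE_GBPS
print(uplink_gbps)  # 1280
```

So even just the edge-switch uplinks carry more than a thousand Gb/s between racks, which is what sets it apart from a plain Gbit LAN.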

I think that rates more as a cluster than as a mere loosely connected farm.


In contrast, the interconnect for Merlin and Morgane looks to be just a hierarchically connected Gbit LAN. Myself, I'd rate them as big farms, but you can call them a cluster if the interconnect isn't a bottleneck for the overall performance...

Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806657
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806666 - Posted: 10 Sep 2008, 11:03:01 UTC

Do all the ATLAS nodes share a global memory (multiprocessor-like), or does each have its own private memory (multicomputer) and communicate by passing messages, as in PVM? This, in my opinion, is the focal point.
Tullio
ID: 806666
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806675 - Posted: 10 Sep 2008, 11:59:42 UTC - in response to Message 806666.  
Last modified: 10 Sep 2008, 12:00:34 UTC

Do all the ATLAS nodes share a global memory (multiprocessor-like), or does each have its own private memory (multicomputer) and communicate by passing messages, as in PVM? This, in my opinion, is the focal point.

All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, it is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN.

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?
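To make the shared-versus-private memory point concrete, here is a toy Python sketch of my own (nothing to do with the actual ATLAS software): threads on one machine share its memory directly, while a separate process, like a separate node, can only report back by sending a message.

```python
import subprocess
import sys
import threading

# Cores on one node share that node's memory: a worker thread's write
# is immediately visible to the main thread, with no copying at all.
shared = []
t = threading.Thread(target=lambda: shared.append("visible"))
t.start(); t.join()
print(shared)  # ['visible']

# A separate process -- like a separate cluster node -- has its own
# private memory. It cannot see `shared`; anything we want from it
# must be sent back explicitly (here via its stdout, a "message").
code = "data = ['private']; print(len(data))"
out = subprocess.run([sys.executable, "-c", code],
                     capture_output=True, text=True).stdout.strip()
print(out)     # the child's answer arrived as a message, not via memory
print(shared)  # still ['visible']: the child never touched our list
```

That boundary between "same motherboard" and "other node" is exactly where the OS-level message passing takes over.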

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806675
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806694 - Posted: 10 Sep 2008, 14:14:10 UTC - in response to Message 806675.  
Last modified: 10 Sep 2008, 14:16:39 UTC

Do all the ATLAS nodes share a global memory (multiprocessor-like), or does each have its own private memory (multicomputer) and communicate by passing messages, as in PVM? This, in my opinion, is the focal point.

All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, it is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN.

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?

Happy crunchin',
Martin

I would call it a NUMA machine (Not Uniform Memory Architecture). Right?
Tullio
ID: 806694
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806726 - Posted: 10 Sep 2008, 16:50:01 UTC - in response to Message 806694.  

[ATLAS, Merlin, Morgane]
All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, [ATLAS] is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN. [The others use slower cheaper more 'conventional' LANs.]

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?

I would call it a NUMA machine (Not Uniform Memory Architecture). Right?

For this example, that depends on what level you consider.

NUMA: Non-Uniform Memory Access or Non-Uniform Memory Architecture

You have NUMA at the motherboard level for the multiple cores of each node. However, I very much doubt that any node's CPU can directly address or access the memory space of another node. Inter-node communication must be via 'message passing' at a level higher than the memory system: that is, the OS or the application must implement the communication with the other nodes.

Or have they done something clever with a hypervisor or some such? (I would not expect so, I doubt they have the need.)
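As a toy illustration of that message passing (a sketch of my own, with two threads on localhost standing in for two nodes on the LAN): the only way one 'node' learns anything from the other is an explicit send and receive over the network.

```python
import socket
import threading

# Two endpoints that cannot touch each other's memory must exchange
# data explicitly. A toy "node" replies to a message over TCP on
# localhost, standing in for the cluster LAN.

def node_b(server_sock):
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)      # receive the message...
        conn.sendall(data.upper())  # ...and send a reply back

server = socket.socket()
server.bind(("127.0.0.1", 0))       # any free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=node_b, args=(server,))
t.start()

# "Node A" has no pointer into node B's memory; it can only send bytes.
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"work unit 42")
    reply = c.recv(1024)

t.join()
server.close()
print(reply.decode())  # WORK UNIT 42
```

Whatever MPI-style library sits on top, it bottoms out in sends and receives like these, which is why the interconnect speed matters so much.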


Further thoughts on the cluster/farm distinction: I think the emphasis must be on whether the nodes are connected via a 'standard' LAN or whether something 'special' has been done to give a significantly faster/better node interconnect (cluster) than is possible with a mere LAN (farm).

Happy crunchin',
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806726
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806730 - Posted: 10 Sep 2008, 17:08:36 UTC - in response to Message 806726.  

[ATLAS, Merlin, Morgane]
All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, [ATLAS] is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN. [The others use slower cheaper more 'conventional' LANs.]

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?

I would call it a NUMA machine (Not Uniform Memory Architecture). Right?

For this example, that depends on what level you consider.

NUMA: Non-Uniform Memory Access or Non-Uniform Memory Architecture

You have NUMA at the motherboard level for the multiple cores of each node. However, I very much doubt that any node's CPU can directly address or access the memory space of another node. Inter-node communication must be via 'message passing' at a level higher than the memory system: that is, the OS or the application must implement the communication with the other nodes.

Or have they done something clever with a hypervisor or some such? (I would not expect so, I doubt they have the need.)


Further thoughts on the cluster/farm distinction: I think the emphasis must be on whether the nodes are connected via a 'standard' LAN or whether something 'special' has been done to give a significantly faster/better node interconnect (cluster) than is possible with a mere LAN (farm).

Happy crunchin',
Martin


Here is what the top500 list says (#58):
Pyramid Cluster Xeon QC 32xx 2.4 GHz, GigEthernet

ID: 806730
Profile speedimic
Volunteer tester

Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 806737 - Posted: 10 Sep 2008, 18:10:57 UTC

Further thoughts on the cluster/farm distinction: I think the emphasis must be on whether the nodes are connected via a 'standard' LAN or whether something 'special' has been done to give a significantly faster/better node interconnect (cluster) than is possible with a mere LAN (farm).


What I get from reading around the net is that it's the cluster software, together with the head nodes, that makes the difference.

In a server-farm setup every computer does the work it gets assigned, no matter if its neighbour sits idle. It's just a bunch of more or less independent computers.

In a cluster setup you've got head nodes running some cluster software (Condor in this case) which distributes the work and checks the status (working/idle) of every computing node. The nodes aren't independent - they only do what the head node/cluster software tells them to do.
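A minimal sketch of that head-node idea in Python (threads standing in for nodes, a shared queue standing in for the scheduler -- nothing like real Condor internals, just the shape of the coordination):

```python
import queue
import threading

# Toy "head node": a shared task queue. A worker "node" pulls the next
# task only when it is free, so no node sits busy while another starves.
tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def worker(node_id):
    while True:
        try:
            wu = tasks.get_nowait()  # ask the head node for work
        except queue.Empty:
            return                   # nothing left: this node goes idle
        with results_lock:
            results.append((node_id, wu * wu))  # pretend to crunch
        tasks.task_done()

for wu in range(8):                  # the head node's backlog of work units
    tasks.put(wu)

nodes = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for n in nodes: n.start()
for n in nodes: n.join()

print(sorted(r for _, r in results))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The point is that the nodes never decide for themselves what to do next; all the work flows through the central coordinator.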

mic.


ID: 806737
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806747 - Posted: 10 Sep 2008, 19:11:30 UTC
Last modified: 10 Sep 2008, 19:19:08 UTC

I think that the first example of a cluster was the C.mmp or Cm* (if I remember correctly) of Carnegie Mellon University back in the Seventies. At Elettronica San Giorgio ELSAG in Genoa they had conceived a similar system (EMMA, Elaboratore Multi Mini Associativo) in which each node was a minicomputer; it was used in an automated postal system later sold also to the US Postal Service. When I described it to Prof. Emilio Segrè he told me, "Then I won't receive any mail." He did not trust Italian technology after having taken part in the Manhattan Project.
Tullio
ID: 806747
OzzFan
Volunteer tester

Joined: 9 Apr 02
Posts: 15666
Credit: 75,827,165
RAC: 87,420
United States
Message 806837 - Posted: 11 Sep 2008, 0:56:04 UTC - in response to Message 806647.  

Can you say CLUSTER?

BOINC isn't written for clustering, unless the person has written their own version of BOINC (since it is open source) that will work with clusters.


But on E@H there are at least 2 huge clusters participating through BOINC.

Indeed so on all counts.

The e@h clusters are clusters only in that they are closely connected PCs and that Boinc is coordinating them all on the same or similar tasks.

It would be pointless to run Boinc itself on top of a cluster OS of a (closely coupled) cluster system.


The real issue is how you define 'cluster'. The e@h systems only loosely fit the description in that they have a comparatively low quality interconnect between the 'cluster' nodes. They're more like what we call a server farm.

Happy crunchin',
Martin


Martin is right. I'd call that a farm, as it doesn't really qualify to technically be called a 'cluster'. Some people refer to a generic farm as a 'cluster' of computers, but I'd still call that incorrect.


According to Dr. Bruce Allen:
In comparison to Einstein@Home, Atlas is very general-purpose. It offers high IO bandwidth, rapid access to more than 1 Petabyte of data, fast interprocessor communication, 'reliable' hardware, and other features that E@H lacks...

So I would assume that Atlas is a cluster in its most basic sense; however, BOINC work done on it will be done with each node acting as an individual PC with BOINC installed on it.

Hope this explanation eliminates our differences.

Greetings,


While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.
ID: 806837
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806880 - Posted: 11 Sep 2008, 2:17:07 UTC - in response to Message 806837.  


While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.

I agree. Cheers.
Tullio
ID: 806880
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806977 - Posted: 11 Sep 2008, 10:06:31 UTC - in response to Message 806880.  

While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.

I agree. Cheers.

Sorry... That's far too easy an answer.

We should be able to thrash about over when a cluster is not a cluster, and when a farm is a cluster or a farm, for another 100 posts or so at least!

I prefer the description of a cluster as interconnected nodes coordinated by supervisor nodes/software, whereas a farm is a collection of nodes working independently on a similar task.

A cluster should also imply that you have "good" interconnect between the nodes for interprocess and supervisor communication.


Meanwhile, what has happened to Nez?

Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806977
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806981 - Posted: 11 Sep 2008, 11:46:49 UTC - in response to Message 806977.  
Last modified: 11 Sep 2008, 12:18:36 UTC

While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.

I agree. Cheers.

Sorry... That's far too easy an answer.

We should be able to thrash about over when a cluster is not a cluster, and when a farm is a cluster or a farm, for another 100 posts or so at least!

I prefer the description of a cluster as interconnected nodes coordinated by supervisor nodes/software, whereas a farm is a collection of nodes working independently on a similar task.

A cluster should also imply that you have "good" interconnect between the nodes for interprocess and supervisor communication.


Meanwhile, what has happened to Nez?

Happy fast crunchin',
Martin

I admit that I wanted to end the discussion in order to watch the presentations at the Grenoble BOINC Workshop. Unfortunately, some of the slides, including those of Prof. Allen of Einstein@home, were practically invisible on my screen. Too bad.

Practically all of the big clusters in the top500 list have high-speed interconnects between the nodes, Infiniband or Quadrics, Xeon processors, and Linux as the OS plus a hypervisor software.

I have no experience of clusters, but years ago, while at Trieste Science Park as manager of the Unix Bull Laboratory (four Bull employees plus four graduate students), I downloaded a software package called PVM (Parallel Virtual Machine) from the University of Tennessee and compiled it on my Bull/MIPS R6000 minicomputer (not to be confused with the IBM RS/6000) running UNIX System V. I compiled both server and client, then put a client on a Sun SPARC workstation running SunOS (practically Berkeley Unix). I was thus able to parallelize a job across those two machines. When I suggested to the people of the Istituto Nazionale di Fisica Nucleare that they put a client also on their DEC systems, running both VMS and the DEC version of UNIX (Ultrix), they looked at me with horror. They believed in DEC more than in God. Poor guys.
Tullio
ID: 806981
Profile Misfit
Volunteer tester

Joined: 21 Jun 01
Posts: 21803
Credit: 2,815,091
RAC: 0
United States
Message 807059 - Posted: 11 Sep 2008, 18:53:55 UTC - in response to Message 806977.  

Meanwhile, what has happened to Nez?

The thread turned into a cluster ....
me@rescam.org
ID: 807059
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 807060 - Posted: 11 Sep 2008, 18:57:47 UTC - in response to Message 807059.  

Meanwhile, what has happened to Nez?

The thread turned into a cluster ....

bomb
ID: 807060



 
©2019 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.