What's up with NEZ?

Message boards : Number crunching : What's up with NEZ?

Previous · 1 . . . 3 · 4 · 5 · 6

Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806657 - Posted: 10 Sep 2008, 10:44:38 UTC - in response to Message 806647.  
Last modified: 10 Sep 2008, 10:46:42 UTC

[...]
The real issue is how you define 'cluster'. The e@h systems only loosely fit the description in that they have a comparatively low quality interconnect between the 'cluster' nodes. They're more like what we call a server farm.

[...]

According to Dr. Bruce Allen:
In comparison to Einstein@Home, Atlas is very general-purpose. It offers high IO bandwidth, rapid access to more than 1 Petabyte of data, fast interprocessor communication, 'reliable' hardware, and other features that E@H lacks...

OK, so Atlas includes:

Debian GNU/Linux powers Max Planck Institute 32.8 TFlops supercomputer
... hierarchical fully non-blocking network. The EFX 1000 core switch features 144 10 Gb/s CX4 ports and currently connects to 32 TRX100 edge switches, each with 48 1 Gb/s ports and 4x10 Gb/s uplinks, reaching 2880 Gb/s. Their Sun Fire X4500 servers are also connected directly to the core switch.

... which is a fair bit better than your usual 1Gbit/s LAN interconnect!
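For what it's worth, the quoted figures can be sanity-checked with a little arithmetic. My reading of the 2880 Gb/s as the core switch's full-duplex aggregate is an assumption on my part, not something the article states:

```python
# Back-of-envelope check of the quoted Atlas switch figures.
# Assumption: 2880 Gb/s = 144 ports x 10 Gb/s x 2 directions (full duplex).
CORE_PORTS = 144
PORT_RATE_GBPS = 10

aggregate_gbps = CORE_PORTS * PORT_RATE_GBPS * 2
print(aggregate_gbps)  # 2880

# Uplink capacity out of the 32 edge switches (4 x 10 Gb/s uplinks each):
EDGE_SWITCHES = 32
UPLINKS_PER_EDGE = 4
uplink_gbps = EDGE_SWITCHES * UPLINKS_PER_EDGE * PORT_RATE_GBPS
print(uplink_gbps)  # 1280
```

So even just the edge-switch uplinks carry more than a thousand Gb/s between racks, which is what sets it apart from a plain Gbit LAN.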

I think that rates more as a cluster than as a mere loosely connected farm.


In contrast, the interconnect for Merlin and Morgane looks to be just a hierarchically connected Gbit LAN. Myself, I'd rate them as big farms, but you can call them a cluster if the interconnect isn't a bottleneck for the overall performance...

Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806657
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806666 - Posted: 10 Sep 2008, 11:03:01 UTC

Do all the ATLAS nodes share a global memory (multiprocessor-like), or does each have its own private memory (multicomputer) and communicate by passing messages, as in PVM? This, in my opinion, is the focal point.
Tullio
ID: 806666
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806675 - Posted: 10 Sep 2008, 11:59:42 UTC - in response to Message 806666.  
Last modified: 10 Sep 2008, 12:00:34 UTC

Do all the ATLAS nodes share a global memory (multiprocessor-like), or does each have its own private memory (multicomputer) and communicate by passing messages, as in PVM? This, in my opinion, is the focal point.

All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, it is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN.

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?
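To make the shared-versus-private memory point concrete, here is a toy Python sketch of my own (nothing to do with the actual ATLAS software): threads on one machine share its memory directly, while a separate process, like a separate node, can only report back by sending a message.

```python
import subprocess
import sys
import threading

# Cores on one node share that node's memory: a worker thread's write
# is immediately visible to the main thread, with no copying at all.
shared = []
t = threading.Thread(target=lambda: shared.append("visible"))
t.start(); t.join()
print(shared)  # ['visible']

# A separate process -- like a separate cluster node -- has its own
# private memory. It cannot see `shared`; anything we want from it
# must be sent back explicitly (here via its stdout, a "message").
code = "data = ['private']; print(len(data))"
out = subprocess.run([sys.executable, "-c", code],
                     capture_output=True, text=True).stdout.strip()
print(out)     # the child's answer arrived as a message, not via memory
print(shared)  # still ['visible']: the child never touched our list
```

That boundary between "same motherboard" and "other node" is exactly where the OS-level message passing takes over.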

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806675
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806694 - Posted: 10 Sep 2008, 14:14:10 UTC - in response to Message 806675.  
Last modified: 10 Sep 2008, 14:16:39 UTC

Do all the ATLAS nodes share a global memory (multiprocessor-like), or does each have its own private memory (multicomputer) and communicate by passing messages, as in PVM? This, in my opinion, is the focal point.

All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, it is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN.

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?

Happy crunchin',
Martin

I would call it a NUMA machine (Not Uniform Memory Architecture). Right?
Tullio
ID: 806694
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806726 - Posted: 10 Sep 2008, 16:50:01 UTC - in response to Message 806694.  

[ATLAS, Merlin, Morgane]
All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, [ATLAS] is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN. [The others use slower cheaper more 'conventional' LANs.]

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?

I would call it a NUMA machine (Not Uniform Memory Architecture). Right?

For this example, that depends on what level you consider.

NUMA: Non-Uniform Memory Access or Non-Uniform Memory Architecture

You have NUMA at the motherboard level for the multiple cores of each node. However, I very much doubt that any node's CPU can directly address or access the memory space of another node. Inter-node communication must be via 'message passing' at a level higher than the memory system: that is, the OS or the application must implement the communication with the other nodes.

Or have they done something clever with a hypervisor or some such? (I would not expect so, I doubt they have the need.)
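As a toy illustration of that message passing (a sketch of my own, with two threads on localhost standing in for two nodes on the LAN): the only way one 'node' learns anything from the other is an explicit send and receive over the network.

```python
import socket
import threading

# Two endpoints that cannot touch each other's memory must exchange
# data explicitly. A toy "node" replies to a message over TCP on
# localhost, standing in for the cluster LAN.

def node_b(server_sock):
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(1024)      # receive the message...
        conn.sendall(data.upper())  # ...and send a reply back

server = socket.socket()
server.bind(("127.0.0.1", 0))       # any free port
server.listen(1)
port = server.getsockname()[1]

t = threading.Thread(target=node_b, args=(server,))
t.start()

# "Node A" has no pointer into node B's memory; it can only send bytes.
with socket.create_connection(("127.0.0.1", port)) as c:
    c.sendall(b"work unit 42")
    reply = c.recv(1024)

t.join()
server.close()
print(reply.decode())  # WORK UNIT 42
```

Whatever MPI-style library sits on top, it bottoms out in sends and receives like these, which is why the interconnect speed matters so much.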


Further thoughts on the cluster/farm distinction: I think the emphasis must be on whether the nodes are connected via a 'standard' LAN or whether something 'special' has been done to give a significantly faster/better node interconnect (cluster) than is possible with a mere LAN (farm).

Happy crunchin',
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806726
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806730 - Posted: 10 Sep 2008, 17:08:36 UTC - in response to Message 806726.  

[ATLAS, Merlin, Morgane]
All three use "off-the-shelf" PC parts. Each node has a multi-core CPU, with memory shared only among the cores on that node's motherboard, as in any PC. Inter-node communication goes through the OS over the network connections.

Physically, [ATLAS] is like having a shelf of individual PCs but with a 10Gbit/s LAN instead of the more common 100Mbit/s or 1Gbit/s LAN. [The others use slower cheaper more 'conventional' LANs.]

The focus is: when is the inter-node communication "fast enough" to call the system a cluster? How "closely" must the individual CPUs work on the same data?

I would call it a NUMA machine (Not Uniform Memory Architecture). Right?

For this example, that depends on what level you consider.

NUMA: Non-Uniform Memory Access or Non-Uniform Memory Architecture

You have NUMA at the motherboard level for the multiple cores of each node. However, I very much doubt that any node's CPU can directly address or access the memory space of another node. Inter-node communication must be via 'message passing' at a level higher than the memory system: that is, the OS or the application must implement the communication with the other nodes.

Or have they done something clever with a hypervisor or some such? (I would not expect so, I doubt they have the need.)


Further thoughts on the cluster/farm distinction: I think the emphasis must be on whether the nodes are connected via a 'standard' LAN or whether something 'special' has been done to give a significantly faster/better node interconnect (cluster) than is possible with a mere LAN (farm).

Happy crunchin',
Martin


Here is what the top500 list says (#58):
Pyramid Cluster Xeon QC 32xx 2.4 GHz, GigEthernet

ID: 806730
Profile speedimic
Volunteer tester

Joined: 28 Sep 02
Posts: 362
Credit: 16,590,653
RAC: 0
Germany
Message 806737 - Posted: 10 Sep 2008, 18:10:57 UTC

Further thoughts on the cluster/farm distinction: I think the emphasis must be on whether the nodes are connected via a 'standard' LAN or whether something 'special' has been done to give a significantly faster/better node interconnect (cluster) than is possible with a mere LAN (farm).


What I get from reading around the net is that it's the cluster software, together with the head nodes, that makes the difference.

In a server-farm setup every computer does the work it gets assigned, no matter if its neighbour sits idle. It's just a bunch of more or less independent computers.

In a cluster setup you've got head nodes running some cluster software (Condor in this case) which distributes the work and checks the status (working/idle) of every computing node. The nodes aren't independent - they only do what the head node/cluster software tells them to do.
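A minimal sketch of that head-node idea in Python (threads standing in for nodes, a shared queue standing in for the scheduler -- nothing like real Condor internals, just the shape of the coordination):

```python
import queue
import threading

# Toy "head node": a shared task queue. A worker "node" pulls the next
# task only when it is free, so no node sits busy while another starves.
tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def worker(node_id):
    while True:
        try:
            wu = tasks.get_nowait()  # ask the head node for work
        except queue.Empty:
            return                   # nothing left: this node goes idle
        with results_lock:
            results.append((node_id, wu * wu))  # pretend to crunch
        tasks.task_done()

for wu in range(8):                  # the head node's backlog of work units
    tasks.put(wu)

nodes = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for n in nodes: n.start()
for n in nodes: n.join()

print(sorted(r for _, r in results))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

The point is that the nodes never decide for themselves what to do next; all the work flows through the central coordinator.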

mic.


ID: 806737
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806747 - Posted: 10 Sep 2008, 19:11:30 UTC
Last modified: 10 Sep 2008, 19:19:08 UTC

I think that the first example of a cluster was the C.mmp or Cm* (if I remember correctly) of Carnegie Mellon University back in the Seventies. At Elettronica San Giorgio ELSAG in Genoa they had conceived a similar system (EMMA, Elaboratore Multi Mini Associativo) in which each node was a minicomputer; it was used in an automated postal system later sold also to the US Postal Service. When I described it to Prof. Emilio Segrè he told me, "Then I won't receive any mail." He did not trust Italian technology after having taken part in the Manhattan Project.
Tullio
ID: 806747
OzzFan
Volunteer tester

Joined: 9 Apr 02
Posts: 15666
Credit: 75,827,165
RAC: 87,420
United States
Message 806837 - Posted: 11 Sep 2008, 0:56:04 UTC - in response to Message 806647.  

Can you say CLUSTER?

BOINC isn't written for clustering, unless the person has written their own version of BOINC (since it is open source) that will work with clusters.


But on E@H there are at least 2 huge clusters participating through BOINC.

Indeed so on all counts.

The e@h clusters are clusters only in that they are closely connected PCs and that Boinc is coordinating them all on the same or similar tasks.

It would be pointless to run Boinc itself on top of a cluster OS of a (closely coupled) cluster system.


The real issue is how you define 'cluster'. The e@h systems only loosely fit the description in that they have a comparatively low quality interconnect between the 'cluster' nodes. They're more like what we call a server farm.

Happy crunchin',
Martin


Martin is right. I'd call that a farm, as it doesn't really qualify to technically be called a 'cluster'. Some people refer to a generic farm as a 'cluster' of computers, but I'd still call that incorrect.


According to Dr. Bruce Allen:
In comparison to Einstein@Home, Atlas is very general-purpose. It offers high IO bandwidth, rapid access to more than 1 Petabyte of data, fast interprocessor communication, 'reliable' hardware, and other features that E@H lacks...

So I would assume that Atlas is a cluster in its most basic sense; however, BOINC work done on it will be done with each node acting as an individual PC with BOINC installed on it.

Hope this explanation eliminates our differences.

Greetings,


While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.
ID: 806837
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806880 - Posted: 11 Sep 2008, 2:17:07 UTC - in response to Message 806837.  


While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.

I agree. Cheers.
Tullio
ID: 806880
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 9529
Credit: 7,379,080
RAC: 148
United Kingdom
Message 806977 - Posted: 11 Sep 2008, 10:06:31 UTC - in response to Message 806880.  

While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.

I agree. Cheers.

Sorry... That's far too easy an answer.

We should be able to thrash about over when a cluster is not a cluster, and when a farm is a cluster or a farm, for another 100 posts or so at least!

I prefer the description of a cluster as interconnected nodes coordinated by supervisor nodes/software, whereas a farm is a collection of nodes working independently on a similar task.

A cluster should also imply that you have "good" interconnect between the nodes for interprocess and supervisor communication.


Meanwhile, what has happened to Nez?

Happy fast crunchin',
Martin

See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 806977
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 806981 - Posted: 11 Sep 2008, 11:46:49 UTC - in response to Message 806977.  
Last modified: 11 Sep 2008, 12:18:36 UTC

While that may clarify (somewhat) that Atlas could be considered a cluster in the most basic sense, bringing the focus back to the original statement: BOINC does not work with clustering in that it uses each machine as a single node, which is more like a farm.

I agree. Cheers.

Sorry... That's far too easy an answer.

We should be able to thrash about over when a cluster is not a cluster, and when a farm is a cluster or a farm, for another 100 posts or so at least!

I prefer the description of a cluster as interconnected nodes coordinated by supervisor nodes/software, whereas a farm is a collection of nodes working independently on a similar task.

A cluster should also imply that you have "good" interconnect between the nodes for interprocess and supervisor communication.


Meanwhile, what has happened to Nez?

Happy fast crunchin',
Martin

I admit that I wanted to end the discussion in order to watch the presentations at the Grenoble BOINC Workshop. Unfortunately, some of the slides, including those of Prof. Allen of Einstein@home, were practically invisible on my screen. Too bad.

Practically all of the big clusters in the top500 list have high-speed interconnects between the nodes, Infiniband or Quadrics, Xeon processors, and Linux as the OS plus a hypervisor software.

I have no experience of clusters, but years ago, while at Trieste Science Park as manager of the Unix Bull Laboratory (four Bull employees plus four graduate students), I downloaded a software package called PVM (Parallel Virtual Machine) from the University of Tennessee and compiled it on my Bull/MIPS R6000 minicomputer (not to be confused with the IBM RS/6000) running UNIX System V. I compiled both server and client, then put a client on a Sun SPARC workstation running SunOS (practically Berkeley Unix). I was thus able to parallelize a job across those two machines. When I suggested to the people of the Istituto Nazionale di Fisica Nucleare that they put a client also on their DEC systems, running both VMS and the DEC version of UNIX (Ultrix), they looked at me with horror. They believed in DEC more than in God. Poor guys.
Tullio
ID: 806981
Profile Misfit
Volunteer tester

Joined: 21 Jun 01
Posts: 21803
Credit: 2,815,091
RAC: 0
United States
Message 807059 - Posted: 11 Sep 2008, 18:53:55 UTC - in response to Message 806977.  

Meanwhile, what has happened to Nez?

The thread turned into a cluster ....
me@rescam.org
ID: 807059
Profile tullio
Volunteer tester

Joined: 9 Apr 04
Posts: 7382
Credit: 2,348,353
RAC: 1,430
Italy
Message 807060 - Posted: 11 Sep 2008, 18:57:47 UTC - in response to Message 807059.  

Meanwhile, what has happened to Nez?

The thread turned into a cluster ....

bomb
ID: 807060



 
©2019 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.