Message boards : Number crunching : Best performing hardware
Author | Message |
---|---|
Ryan Munro Send message Joined: 5 Feb 06 Posts: 63 Credit: 18,519,866 RAC: 10 |
So just a thought: what kit out there would perform best for SETI? I'm not talking big clusters etc., just specific pieces of hardware. I would assume it would be some form of GPU, but what about the Intel Xeon Phi cards, for example? I'm not looking to purchase, just curious as to what the ideal SETI rig would be :) |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
So just a thought: what kit out there would perform best for SETI? I'm not talking big clusters etc., just specific pieces of hardware. I'm not sure that anyone has developed any SETI@home apps for the Xeon Phi hardware yet. However, at the moment the best performance per watt comes from doing work on GPUs. Which vendor's hardware is most efficient depends on the type of work. For SETI@home (MB work): NV GPUs using CUDA. For Astropulse (AP work): ATI GPUs using OpenCL. That isn't to say MB work on ATI GPUs or AP work on NV GPUs is bad. In the NV range the 750 Ti seems to be the best PPW GPU. I have not seen enough reports on the new NV 900 series GPUs to know how well they perform at the moment. SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url] |
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
Correct me if I'm wrong, but AFAIK the important thing for SETI is single-precision performance. The Xeon Phi 7120P seems to have a theoretical single-precision peak of 2.4 TFLOPS: http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-theoretical-maximums.html A GTX 780 or a Titan seems to be much faster: "The GTX 780 still offers respectable single precision performance though, clocking in at 4 Teraflops compared to the Titan's 4.5 Teraflops." http://www.maximumpc.com/article/news/geforce_gtx_780_benchmarks The Titan Black is rated at 5.1 TFLOPS: http://www.bit-tech.net/news/hardware/2014/02/18/nvidia-gtx-titan-black-launched/1 The GTX 980 should be about the same with ~5 TFLOPS: http://www.pcworld.com/article/2686115/nvidia-unveils-its-all-new-geforce-gtx-980-and-gtx-970-graphics-processors.html But by far the fastest "computing unit" is the human brain, which is rated at ~1,000,000 TFLOPS (that's 1 exaFLOPS)! So maybe we should look for a way to use our brains to crunch SETI ;-) |
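The ranking above can be sanity-checked with a quick script using the theoretical peak figures quoted in the post (these are vendor peak numbers, not measured SETI@home throughput):

```python
# Theoretical single-precision peaks quoted in the post above, in TFLOPS.
peaks = {
    "Xeon Phi 7120P": 2.4,
    "GTX 780": 4.0,
    "GTX Titan": 4.5,
    "GTX Titan Black": 5.1,
    "GTX 980": 5.0,
}

# Rank fastest first and show each card relative to the Phi.
ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
for name, tflops in ranked:
    ratio = tflops / peaks["Xeon Phi 7120P"]
    print(f"{name}: {tflops} TFLOPS ({ratio:.2f}x Xeon Phi 7120P)")
```

Even the slowest consumer card quoted has well over 1.5x the Phi's theoretical peak, which matches the post's conclusion.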
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
Correct me if I'm wrong, but AFAIK the important thing for SETI is single-precision performance. I have a Xeon Phi. I gave up on trying to port BOINC and S@H to it. For the best performance you have to run native code on the Phi itself (otherwise communication bottlenecks between the host and the card slow you down). That would mean running BOINC on the Phi as well, and BOINC code is not the most portable (though not too bad if you ignore boincmgr and use boinccmd for all interactions). Then there's the memory problem: our particular model only has 8 GB of RAM, which has to contain the OS as well as applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box S@H reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 This might all change with the new Phis, which slot into motherboard sockets and can access main memory, but Intel hasn't offered me one to play with yet... |
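The "around 30 MB per thread" figure above follows from simple division; a sketch using the numbers from the post:

```python
# Back-of-the-envelope memory budget for a 60-core Xeon Phi,
# using the figures reported in the post above.
free_gb = 7.6                 # reported free by top
cores = 60
threads_per_core = 4

threads = cores * threads_per_core        # 240 hardware threads
mb_per_thread = free_gb * 1024 / threads  # MB available per thread
print(f"{threads} threads, about {mb_per_thread:.0f} MB each")
```

That is well under the 40-96 MB resident per process that the Ubuntu box reports, which is the crux of the memory problem described.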
Admiral Gloval Send message Joined: 31 Mar 13 Posts: 20287 Credit: 5,308,449 RAC: 0 |
If you want a hint at a good number cruncher, just go to Statistics and look at the top performers. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Would it be possible to use a massively-parallel system via OpenCL drivers, rather than running a separate instance on each node/core? |
Ryan Munro Send message Joined: 5 Feb 06 Posts: 63 Credit: 18,519,866 RAC: 10 |
SP / DP was to be my next question, but that's been answered. So it looks like, if you had the cash, a dual 18-core Xeon box with 2x Titan Zs would be the best bet? Damn, when I win the lottery I am definitely building the world's best home SETI box: something you can still run off the mains and use day to day, I think. It would be awesome getting to play with that kit and giving something back with all that power :) |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
SP / DP was to be my next question, but that's been answered. So it looks like, if you had the cash, a dual 18-core Xeon box with 2x Titan Zs would be the best bet? You don't really need that much CPU oomph. An i5 or i7 with a pair of those GPUs would likely be in the top 15-20 computers for the project. The CPUs, in comparison to the GPUs, would only account for a small fraction of the work the machine completes. |
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
Perhaps; you have more experience there than I do. I've done a few OpenMP things but haven't really had the chance to play with OpenCL otherwise. Do you have to explicitly parallelise everything in OpenCL, or does the system work that out for you? (Guess I should dig out your code if I get an idle moment.) Another possibility would be to make the shareable section much larger by using shared libraries:
[eesridr:~] > ldd BOINC/projects/setiathome.berkeley.edu/setiathome_7.01_x86_64-pc-linux-gnu
not a dynamic executable |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
Then there's the memory problem -- our particular model only has 8 GB of ram, which has to contain the OS as well as applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box s@h reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 If you are running stock apps, those are UPX compressed. Decompress them and see what figures you get then. Although, iirc, the apps use more than 30 MB for data. |
ivan Send message Joined: 5 Mar 01 Posts: 783 Credit: 348,560,338 RAC: 223 |
Then there's the memory problem -- our particular model only has 8 GB of ram, which has to contain the OS as well as applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box s@h reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 Those are in-memory figures from top -- compression of the executable wouldn't affect that (only the space on disk), surely? |
Woodgie Send message Joined: 6 Dec 99 Posts: 134 Credit: 89,630,417 RAC: 55 |
SP / DP was to be my next question, but that's been answered. So it looks like, if you had the cash, a dual 18-core Xeon box with 2x Titan Zs would be the best bet? If anyone cares to have a look at outlander, it's an i7 overclocked to 4.4 GHz with 16 GB RAM and 2 original TITANs (not Z or Black). When I win the lottery I'm going to use a couple of these... ~W |
Juha Send message Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
Then there's the memory problem -- our particular model only has 8 GB of ram, which has to contain the OS as well as applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box s@h reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 The uncompressed code started its life as compressed data. In particular, before the code became code, it was written to memory as data, and pages that have been written to aren't shared between different processes. So with UPX-packed apps, each process ends up with its own private copy of the code. |
Ryan Munro Send message Joined: 5 Feb 06 Posts: 63 Credit: 18,519,866 RAC: 10 |
Damn, that system beats mine: http://setiathome.berkeley.edu/show_host_detail.php?hostid=7407076 A couple of questions. First, on the Nvidia system posted he is doing a wide range of GPU-based units, whereas I am only doing one type; is this because he has a CUDA-capable card? Second, I was under the impression that the CPU is still important, as there are specific work types that only run on the CPU? Third, I have a second box with a 3770K installed; is it possible to crunch on the CPU's integrated GPU and the discrete card (270X) at the same time? I think this combo, rather than CPU + 270X, would use less power and produce about the same points? |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Damn that system beats mine: In my first message I mentioned which apps run best on which hardware. However, with SETI@home all 4 types of hardware (CPU, ATI GPU, Intel GPU, & NVIDIA GPU) can run both Astropulse & SETI@home applications, at least in Windows. From the Applications page you can see what apps there are for each type of hardware in each OS. OpenCL is used for the ATI, Intel, & Nvidia Astropulse apps. OpenCL is used for the ATI & Intel SETI@home apps. CUDA is used for the NVIDIA SETI@home app. It looks like none of your machines have done any Astropulse work. You may have disabled Astropulse in your preferences. You can check your Project Preferences & make sure everything is enabled, or at least SETI@home v7 & AstroPulse v7, as those two are the current active applications right now. Run only the selected applications: SETI@home Enhanced (obsolete, replaced by SETI@home v7), SETI@home v7, AstroPulse v6 (obsolete, replaced by AstroPulse v7), AstroPulse v7. CPUs are still important. However, in a system with a mid- to high-end GPU, the CPU only accounts for a fraction of the system's total output, primarily because GPUs are several times more efficient than CPUs in the manner we are using them. For example, my HD6870s will process an Astropulse task in about 30 min, & my i5-4670 will take about 4 hours running 4 at a time. So in 4 hours my HD6870 has completed 8 tasks while my CPU has completed 4. Also, the Intel HD4000 @ 107.52 GFLOPS is much less powerful than your R9 270X @ 2560 GFLOPS. |
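The GPU-vs-CPU throughput example above works out like this (a sketch using the timings from the post):

```python
# Tasks completed in a 4-hour window, per the example in the post above.
window_h = 4.0
gpu_task_h = 0.5     # HD 6870: ~30 minutes per Astropulse task
cpu_task_h = 4.0     # i5-4670: ~4 hours per task
cpu_parallel = 4     # 4 CPU tasks running at a time

gpu_tasks = window_h / gpu_task_h                   # one GPU: 8 tasks
cpu_tasks = (window_h / cpu_task_h) * cpu_parallel  # four cores: 4 tasks
print(f"GPU: {gpu_tasks:.0f} tasks, CPU: {cpu_tasks:.0f} tasks")
```

So a single mid-range GPU does twice the work of the whole quad-core CPU over the same window, which is why the CPU contributes only a fraction of such a system's output.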
Ryan Munro Send message Joined: 5 Feb 06 Posts: 63 Credit: 18,519,866 RAC: 10 |
Thanks for the info. With regard to the Intel GPU, I was thinking of running it instead of the Intel CPU alongside the Radeon. My thought was lower power for the same sort of output? |
qbit Send message Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0 |
Oh yeah! The K80 is rated at 8.7 TFLOPS for single precision and therefore should be considerably faster than the fastest consumer GPUs. http://www.computerworld.com/article/2848128/servers/nvidia-reaches-high-on-graphics-performance-with-tesla-k80.html The real problem is the price: it should be around $7,000 :-( |
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
Thanks for the info. So iGPU + Radeon & no CPU? I'm not as familiar with the Ivy Bridge iGPU as I am with Haswell, but if the CPU-to-iGPU performance scales the same, then you could expect about the same output as CPU + Radeon at lower power levels. The iGPU runs in the neighborhood of 11-15 or so watts. EDIT: Also, I just noticed I messed up on the GFLOPS for your iGPU. It should have been 294.4 instead of 107.52, as it has 16, not 6, execution units. |
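The corrected 294.4 GFLOPS figure is consistent with the usual Intel EU arithmetic; a sketch assuming 16 single-precision FLOPS per EU per clock and a 1.15 GHz max clock (both typical HD 4000 figures, assumed here rather than taken from the post):

```python
# Theoretical SP GFLOPS for an Intel HD 4000 iGPU.
# Assumed figures: 16 FLOPS/EU/clock (8 multiply-adds) and 1.15 GHz clock.
eus = 16                  # execution units (the corrected count)
flops_per_eu_clock = 16   # 8 multiply-adds per EU per cycle
clock_ghz = 1.15

gflops = eus * flops_per_eu_clock * clock_ghz
print(f"{gflops:.1f} GFLOPS")
```

With only 6 EUs the same arithmetic gives about 110 GFLOPS, which is roughly where the earlier 107.52 figure came from.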
HAL9000 Send message Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57 |
I don't know about accuracy, but the price is quoted as being $5,000 on AnandTech, which is $2,000 more than a Titan Z rated at 8.1 TFLOPS. |
BassieXp Send message Joined: 5 Jun 05 Posts: 14 Credit: 1,408,518 RAC: 0 |
Oh yeah! The K80 is rated at 8.7 TFLOPS for single precision and therefore should be considerably faster than the fastest consumer GPUs. Just a thought. From the article: "Dell's PowerEdge C4130 is a 1U server that looks more like an appliance and will be able to accommodate up to four K80 cards." At 8.75 TFLOPS a card, 4 cards per 1U server, and 40 servers in a 42U rack (2U for switches): 8.75*4*40 = 1400 TFLOPS. That's a lot of crunching; with six of these racks you could double the output of BOINC entirely (the BOINC site says 8.4 PFLOPS across all projects). As a side note, the press release from Dell says it has up to 7.2 TFLOPS per server, which I find a strange number, as the K80 has 2.91 TFLOPS double precision and that makes 11.64 TFLOPS for four cards. PS: This is just some thinking from someone who has no experience with rack servers. I do know that power and cooling can be problematic with so much in a rack. |
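The rack arithmetic above, sketched out (same figures as the post; 40 servers assumes 2U of a 42U rack are reserved for switches):

```python
# Rack-level throughput estimate from the post above.
tflops_per_card = 8.75      # Tesla K80, single precision
cards_per_server = 4        # Dell PowerEdge C4130, 1U
servers_per_rack = 40       # 42U rack minus 2U for switches

rack_tflops = tflops_per_card * cards_per_server * servers_per_rack
boinc_pflops = 8.4          # BOINC total across all projects
racks_to_double = boinc_pflops * 1000 / rack_tflops
print(f"{rack_tflops:.0f} TFLOPS per rack; "
      f"{racks_to_double:.0f} racks to match all of BOINC")
```

So 1,400 TFLOPS per rack, and six racks reach 8.4 PFLOPS, matching the post's "double the output with six racks" claim (adding that much would double the combined total).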
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.