Best performing hardware

Ryan Munro

Joined: 5 Feb 06
Posts: 63
Credit: 18,519,866
RAC: 10
United Kingdom
Message 1602166 - Posted: 18 Nov 2014, 14:54:27 UTC

So, just a thought: what kit out there would perform best for SETI? I'm not talking about big clusters etc., just specific pieces of hardware.

I would assume it would be some form of GPU, but what about the Intel Xeon Phi cards, for example?

I'm not looking to purchase, just curious as to what the ideal SETI rig would be :)
ID: 1602166
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1602182 - Posted: 18 Nov 2014, 15:30:02 UTC - in response to Message 1602166.  

So, just a thought: what kit out there would perform best for SETI? I'm not talking about big clusters etc., just specific pieces of hardware.

I would assume it would be some form of GPU, but what about the Intel Xeon Phi cards, for example?

I'm not looking to purchase, just curious as to what the ideal SETI rig would be :)

I'm not sure that anyone has developed any SETI@home apps for the Xeon Phi hardware yet. However, at the moment the best performance per watt comes from doing work on GPUs. Which type of work runs most efficiently depends on the vendor:
For SETI@home (MB work): NV GPUs using CUDA
For Astropulse (AP work): ATI GPUs using OpenCL
That isn't to say MB work on ATI GPUs or AP work on NV GPUs is bad.

In the NV range the 750 Ti seems to be the best PPW GPU. I have not seen enough reports on the new NV 900 series GPUs to know how well they perform at the moment.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1602182
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1602206 - Posted: 18 Nov 2014, 16:30:57 UTC

Correct me if I'm wrong, but AFAIK the important thing for SETI is single-precision performance.

The Xeon Phi 7120P seems to have a theoretical single-precision peak of 2.4 TFLOPS:
http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-theoretical-maximums.html

A GTX 780 or a Titan seems to be much faster:

The GTX 780 still offers respectable single precision performance though, clocking in at 4 Teraflops compared to the Titan's 4.5 Teraflops.

http://www.maximumpc.com/article/news/geforce_gtx_780_benchmarks

The Titan Black is rated at 5.1 TFLOPS:
http://www.bit-tech.net/news/hardware/2014/02/18/nvidia-gtx-titan-black-launched/1

The GTX 980 should be about the same at ~5 TFLOPS:
http://www.pcworld.com/article/2686115/nvidia-unveils-its-all-new-geforce-gtx-980-and-gtx-970-graphics-processors.html
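
All those peak numbers come from the same formula, by the way: shader count x 2 FLOPs per clock (one fused multiply-add) x clock speed. A quick sanity check in Python -- the core counts and boost clocks here are my own rough figures, not taken from the linked articles:

# Peak SP throughput = cores x 2 FLOPs/clock (FMA) x clock in GHz.
# Core counts and clocks are approximate figures I'm assuming, not vendor quotes.
gpus = {
    "GTX 780":     (2304, 0.863),
    "GTX Titan":   (2688, 0.837),
    "Titan Black": (2880, 0.889),
    "GTX 980":     (2048, 1.216),
}
for name, (cores, ghz) in gpus.items():
    print("%-12s %.2f TFLOPS" % (name, cores * 2 * ghz / 1000))

That reproduces the ~4, ~4.5, ~5.1 and ~5 TFLOPS figures above.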



But by far the fastest "computing unit" is the human brain, which is rated at ~1,000,000 TFLOPS (that's 1 exaFLOPS)! So maybe we should look for a way to use our brains to crunch SETI ;-)
ID: 1602206
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1602280 - Posted: 18 Nov 2014, 22:55:30 UTC - in response to Message 1602206.  

Correct me if I'm wrong, but AFAIK the important thing for SETI is single-precision performance.

The Xeon Phi 7120P seems to have a theoretical single-precision peak of 2.4 TFLOPS:
http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-theoretical-maximums.html

A GTX 780 or a Titan seems to be much faster:

The GTX 780 still offers respectable single precision performance though, clocking in at 4 Teraflops compared to the Titan's 4.5 Teraflops.

http://www.maximumpc.com/article/news/geforce_gtx_780_benchmarks

The Titan Black is rated at 5.1 TFLOPS:
http://www.bit-tech.net/news/hardware/2014/02/18/nvidia-gtx-titan-black-launched/1

The GTX 980 should be about the same at ~5 TFLOPS:
http://www.pcworld.com/article/2686115/nvidia-unveils-its-all-new-geforce-gtx-980-and-gtx-970-graphics-processors.html

I have a Xeon Phi. I gave up on trying to port BOINC and S@H to it. For the best performance you have to run native code on the Phi cluster (otherwise communication bottlenecks between the host and the cluster slow you down). That would mean running BOINC on the Phi as well, and BOINC code is not the most portable (though not too bad if you ignore boincmgr and use boinccmd for all interactions).
Then there's the memory problem -- our particular model only has 8 GB of RAM, which has to contain the OS as well as the applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box S@H reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 MB shared. (Similar figures on a RHEL box; the one job on Ubuntu that's taking more RAM is not the VLAR WU running at the moment.)
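To spell that out (a quick back-of-envelope in Python, using the top figures just quoted):

# Memory budget on the Phi, using the figures above.
free_mb = 7.6 * 1024
threads = 60 * 4                           # 60 cores x 4 hardware threads
print(free_mb / threads)                   # ~32 MB available per thread
for resident_mb in (40, 96):               # observed resident size per S@H process
    print(resident_mb * threads / 1024.0)  # GB needed for one process per thread

One process per thread would want roughly 9.4-22.5 GB resident against the 7.6 GB free, so it isn't even close.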
This might all change with the new Phis, which slot into motherboard sockets and can access main memory, but Intel hasn't offered me one to play with yet...
ID: 1602280
Admiral Gloval
Joined: 31 Mar 13
Posts: 20152
Credit: 5,308,449
RAC: 0
United States
Message 1602329 - Posted: 19 Nov 2014, 1:00:56 UTC

If you want a hint at a number cruncher, just go to Statistics and look at the top performers.

ID: 1602329
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1602518 - Posted: 19 Nov 2014, 13:17:26 UTC - in response to Message 1602280.  

Then there's the memory problem -- our particular model only has 8 GB of RAM, which has to contain the OS as well as the applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box S@H reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 MB shared. (Similar figures on a RHEL box; the one job on Ubuntu that's taking more RAM is not the VLAR WU running at the moment.)
This might all change with the new Phis, which slot into motherboard sockets and can access main memory, but Intel hasn't offered me one to play with yet...

Would it be possible to use such a massively parallel system via OpenCL drivers, rather than running a separate instance on each node/core?
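
Something along these lines is what I mean -- a single host process submits one kernel and the OpenCL runtime fans it out across every core the device exposes (a minimal illustrative sketch via the pyopencl bindings, not actual S@H code):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()        # picks an OpenCL device (CPU, GPU, Phi, ...)
queue = cl.CommandQueue(ctx)

a = np.random.rand(1 << 20).astype(np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# The kernel is written per work-item; the runtime spreads the work items
# over all cores/threads of the device, so only one host instance is needed.
prg = cl.Program(ctx, """
__kernel void scale(__global const float *a, __global float *out) {
    int gid = get_global_id(0);
    out[gid] = 2.0f * a[gid];
}
""").build()

prg.scale(queue, a.shape, None, a_buf, out_buf)
result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)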
ID: 1602518
Ryan Munro

Joined: 5 Feb 06
Posts: 63
Credit: 18,519,866
RAC: 10
United Kingdom
Message 1602531 - Posted: 19 Nov 2014, 13:50:33 UTC

SP/DP was to be my next question, but that's been answered. So it looks like, if you had the cash, a dual 18-core Xeon box with 2x Titan Zs would be the best bet?

Damn, when I win the lottery I am definitely building the world's best home SETI box -- something you can still run off the mains and use day to day, I think. It would be awesome getting to play with that kit and giving something back with all that power :)
ID: 1602531
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1602548 - Posted: 19 Nov 2014, 15:13:46 UTC - in response to Message 1602531.  

SP/DP was to be my next question, but that's been answered. So it looks like, if you had the cash, a dual 18-core Xeon box with 2x Titan Zs would be the best bet?

Damn, when I win the lottery I am definitely building the world's best home SETI box -- something you can still run off the mains and use day to day, I think. It would be awesome getting to play with that kit and giving something back with all that power :)

You don't really need that much CPU oomph. An i5 or i7 with a pair of those GPUs would likely be in the top 15-20 computers for the project. The CPUs, in comparison to the GPUs, would only account for a small fraction of the work the machine completes.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1602548
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1602580 - Posted: 19 Nov 2014, 16:43:19 UTC - in response to Message 1602518.  

Then there's the memory problem -- our particular model only has 8 GB of RAM, which has to contain the OS as well as the applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box S@H reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 MB shared. (Similar figures on a RHEL box; the one job on Ubuntu that's taking more RAM is not the VLAR WU running at the moment.)
This might all change with the new Phis, which slot into motherboard sockets and can access main memory, but Intel hasn't offered me one to play with yet...

Would it be possible to use such a massively parallel system via OpenCL drivers, rather than running a separate instance on each node/core?

Perhaps; you have more experience there than I do. I've done a few OpenMP things but haven't really had the chance to play with it otherwise. Do you have to explicitly parallelise everything in OpenCL, or does the system work that out for you? (Guess I should dig out your code if I get an idle moment.)
Another possibility would be to try to get the shareable section much larger by using shared libraries:
[eesridr:~] > ldd BOINC/projects/setiathome.berkeley.edu/setiathome_7.01_x86_64-pc-linux-gnu 
	not a dynamic executable

ID: 1602580
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1602689 - Posted: 19 Nov 2014, 20:57:54 UTC - in response to Message 1602280.  

Then there's the memory problem -- our particular model only has 8 GB of RAM, which has to contain the OS as well as the applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box S@H reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 MB shared. (Similar figures on a RHEL box; the one job on Ubuntu that's taking more RAM is not the VLAR WU running at the moment.)

If you are running stock apps, those are UPX-compressed. Decompress them and see what figures you get then. Although, IIRC, the apps use more than 30 MB for data.
ID: 1602689
ivan
Volunteer tester
Joined: 5 Mar 01
Posts: 783
Credit: 348,560,338
RAC: 223
United Kingdom
Message 1602713 - Posted: 19 Nov 2014, 21:49:26 UTC - in response to Message 1602689.  

Then there's the memory problem -- our particular model only has 8 GB of RAM, which has to contain the OS as well as the applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box S@H reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 MB shared. (Similar figures on a RHEL box; the one job on Ubuntu that's taking more RAM is not the VLAR WU running at the moment.)

If you are running stock apps, those are UPX-compressed. Decompress them and see what figures you get then. Although, IIRC, the apps use more than 30 MB for data.

Those are in-memory figures from top -- compression of the executable wouldn't affect that (only the space on disk), surely?
ID: 1602713
Woodgie
Joined: 6 Dec 99
Posts: 134
Credit: 89,630,417
RAC: 55
United Kingdom
Message 1602716 - Posted: 19 Nov 2014, 21:59:48 UTC - in response to Message 1602548.  

SP/DP was to be my next question, but that's been answered. So it looks like, if you had the cash, a dual 18-core Xeon box with 2x Titan Zs would be the best bet?

Damn, when I win the lottery I am definitely building the world's best home SETI box -- something you can still run off the mains and use day to day, I think. It would be awesome getting to play with that kit and giving something back with all that power :)

You don't really need that much CPU oomph. An i5 or i7 with a pair of those GPUs would likely be in the top 15-20 computers for the project. The CPUs, in comparison to the GPUs, would only account for a small fraction of the work the machine completes.


If anyone cares to have a look at outlander, it's an i7 overclocked to 4.4 GHz with 16 GB RAM and 2 original TITANs (not Z or Black).

When I win the lottery I'm going to use a couple of these...
~W

ID: 1602716
Juha
Volunteer tester

Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1602744 - Posted: 19 Nov 2014, 22:53:28 UTC - in response to Message 1602713.  

Then there's the memory problem -- our particular model only has 8 GB of RAM, which has to contain the OS as well as the applications. At the moment top reports 7.6 GB free, so for 60 cores x 4 threads that'd be only around 30 MB available per thread; currently on an Ubuntu box S@H reports 104 or 164 MB virtual memory per process, 40 or 96 MB resident per process, and 12 MB shared. (Similar figures on a RHEL box; the one job on Ubuntu that's taking more RAM is not the VLAR WU running at the moment.)

If you are running stock apps, those are UPX-compressed. Decompress them and see what figures you get then. Although, IIRC, the apps use more than 30 MB for data.

Those are in-memory figures from top -- compression of the executable wouldn't affect that (only the space on disk), surely?

The uncompressed code started its life as compressed data. In particular, before the code became code, it was written to memory as data. Pages that have been written to aren't shared between different processes.
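
If you want to see that directly on Linux, a quick sketch like this totals the shared vs. private page counts behind the numbers top shows -- a UPX-unpacked executable turns up almost entirely under Private_Dirty:

# Sum the shared/private page totals for a process (Linux only).
# Usage: python smaps_totals.py <pid>
import sys
from collections import Counter

totals = Counter()
with open("/proc/%s/smaps" % sys.argv[1]) as f:
    for line in f:
        key, _, rest = line.partition(":")
        if key in ("Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty"):
            totals[key] += int(rest.split()[0])   # values are in kB

for key in sorted(totals):
    print("%s: %d kB" % (key, totals[key]))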
ID: 1602744
Ryan Munro

Joined: 5 Feb 06
Posts: 63
Credit: 18,519,866
RAC: 10
United Kingdom
Message 1603031 - Posted: 20 Nov 2014, 11:13:56 UTC

Damn, that system beats mine:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=7407076

A couple of questions: on the Nvidia system posted, he is doing a wide range of GPU-based units, whereas I am only doing one type. Is this because he has a CUDA-capable card?

Second, I was under the assumption that the CPU was still important, as there are specific work types that only run on the CPU?

Third, I have a second box with a 3770K installed. Is it possible to crunch on the CPU's integrated GPU and the discrete card (270X) at the same time?
I think this combo, rather than CPU + 270X, would use less power and produce about the same points?
ID: 1603031
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1603122 - Posted: 20 Nov 2014, 14:47:33 UTC - in response to Message 1603031.  

Damn, that system beats mine:

http://setiathome.berkeley.edu/show_host_detail.php?hostid=7407076

A couple of questions: on the Nvidia system posted, he is doing a wide range of GPU-based units, whereas I am only doing one type. Is this because he has a CUDA-capable card?

Second, I was under the assumption that the CPU was still important, as there are specific work types that only run on the CPU?

Third, I have a second box with a 3770K installed. Is it possible to crunch on the CPU's integrated GPU and the discrete card (270X) at the same time?
I think this combo, rather than CPU + 270X, would use less power and produce about the same points?

In my first message I mentioned which apps run best on which hardware. However, with SETI@home all 4 types of hardware -- CPU, ATI GPU, Intel GPU, & NVIDIA GPU -- can run both Astropulse & SETI@home applications, at least in Windows.
From the Applications page you can see which apps there are for which types of hardware in each OS.
OpenCL is used for the ATI, Intel, & NVIDIA Astropulse apps.
OpenCL is used for the ATI & Intel SETI@home apps.
CUDA is used for the NVIDIA SETI@home app.

It looks like none of your machines have done any Astropulse work. You may have disabled Astropulse in your preferences. You can check your Project Preferences & make sure everything is enabled, or at least SETI@home v7 & AstroPulse v7, as those are the two currently active applications.
Run only the selected applications:
SETI@home Enhanced - Obsolete & replaced by SETI@home v7
SETI@home v7
AstroPulse v6 - Obsolete & replaced by AstroPulse v7
AstroPulse v7

CPUs are still important. However, on a system with a mid- to high-end GPU, the CPU output only accounts for a fraction of the system's total output, primarily because GPUs are several times more efficient than CPUs in the manner we are using them. For example, my HD6870s will process an Astropulse task in about 30 min & my i5-4670s will take about 4 hours running 4 at a time. So in 4 hours my HD6870 has completed 8 tasks where my CPU has completed 4.
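
As arithmetic (same figures as above, just spelled out):

# HD6870 vs i5-4670 Astropulse throughput over a 4-hour window.
window_h = 4
gpu_tasks = window_h * 60 / 30.0      # one task per ~30 min on the GPU -> 8
cpu_tasks = 4 * (window_h / 4.0)      # 4 tasks at a time, ~4 h each -> 4
print(gpu_tasks, cpu_tasks)           # 8.0 4.0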

Also, the Intel HD4000 @ 107.52 GFLOPS is much less powerful than your R9 270X @ 2560 GFLOPS.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1603122
Ryan Munro

Joined: 5 Feb 06
Posts: 63
Credit: 18,519,866
RAC: 10
United Kingdom
Message 1603137 - Posted: 20 Nov 2014, 15:50:43 UTC - in response to Message 1603122.  

Thanks for the info.

Regarding the Intel GPU: I was thinking of running it instead of the Intel CPU, alongside the Radeon. My thinking was lower power for the same sort of output?
ID: 1603137
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1603144 - Posted: 20 Nov 2014, 16:04:36 UTC - in response to Message 1602716.  


When I win the lottery I'm going to use a couple of these...

Oh yeah! The K80 is rated at 8.7 TFLOPS single precision and therefore would/should be considerably faster than the fastest consumer GPUs.

http://www.computerworld.com/article/2848128/servers/nvidia-reaches-high-on-graphics-performance-with-tesla-k80.html

The problem really is the price; it should be around $7,000 :-(
ID: 1603144
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1603151 - Posted: 20 Nov 2014, 16:10:17 UTC - in response to Message 1603137.  
Last modified: 20 Nov 2014, 16:15:20 UTC

Thanks for the info.

Regarding the Intel GPU: I was thinking of running it instead of the Intel CPU, alongside the Radeon. My thinking was lower power for the same sort of output?

So iGPU + Radeon & no CPU? I'm not as familiar with the Ivy Bridge iGPU as I am with Haswell, but if the CPU-to-iGPU performance scales the same, then you could expect about the same output vs CPU + Radeon at lower power levels. The iGPU runs in the neighborhood of 11-15 or so watts.

EDIT: Also, I just noticed I messed up on the GFLOPS for your iGPU. It should have been 294.4 instead of 107.52, as it has 16, not 6, execution units.
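
For anyone wondering where those figures come from: each Intel EU does 16 single-precision FLOPs per clock (2 issue ports x SIMD4 x 2 for FMA). The clock speeds below are my assumptions, chosen to reproduce the numbers:

# Intel iGPU peak GFLOPS = EUs x 16 FLOPs/clock x clock in GHz.
def igpu_gflops(eus, ghz):
    return eus * 16 * ghz

print(igpu_gflops(16, 1.15))   # 294.4  -- HD 4000 with its 16 EUs
print(igpu_gflops(6, 1.12))    # 107.52 -- the 6-EU figure I quoted at first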
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1603151
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1603159 - Posted: 20 Nov 2014, 16:22:55 UTC - in response to Message 1603144.  


When I win the lottery I'm going to use a couple of these...

Oh yeah! The K80 is rated at 8.7 TFLOPS single precision and therefore would/should be considerably faster than the fastest consumer GPUs.

http://www.computerworld.com/article/2848128/servers/nvidia-reaches-high-on-graphics-performance-with-tesla-k80.html

The problem really is the price; it should be around $7,000 :-(

I don't know about its accuracy, but the price is quoted as $5,000 on AnandTech, which is $2,000 more than a Titan Z, and the Titan Z is rated at 8.1 TFLOPS.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the [url=http://tinyurl.com/8y46zvu]BP6/VP6 User Group[/url]
ID: 1603159
BassieXp
Volunteer tester
Joined: 5 Jun 05
Posts: 14
Credit: 1,408,518
RAC: 0
Netherlands
Message 1603210 - Posted: 20 Nov 2014, 18:25:16 UTC - in response to Message 1603144.  
Last modified: 20 Nov 2014, 18:35:14 UTC

Oh yeah! The K80 is rated at 8.7 TFLOPS single precision and therefore would/should be considerably faster than the fastest consumer GPUs.

http://www.computerworld.com/article/2848128/servers/nvidia-reaches-high-on-graphics-performance-with-tesla-k80.html

The problem really is the price; it should be around $7,000 :-(


Just a thought.

From the article:
Dell's PowerEdge C4130 is a 1U server that looks more like an appliance and will be able to accommodate up to four K80 cards


At 8.75 TFLOPS a card, 4 cards per 1U server, and 40 such servers to a rack (allowing a further 2U for switches):
8.75 x 4 x 40 = 1,400 TFLOPS
That's a lot of crunching; with six of these racks you could double the output of the entire BOINC platform. (The BOINC site says 8.4 PFLOPS across all projects.)

As a side note, the press release from Dell says up to 7.2 TFLOPS per server, which I find a strange number, as the K80 has 2.91 TFLOPS double precision, which makes 11.64 TFLOPS for four cards.
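
The arithmetic, spelled out (the 40 servers assume the 2U of switches still leave 40U of compute space):

# K80 rack throughput, using the numbers above.
rack_tflops = 8.75 * 4 * 40           # TFLOPS/card x cards/server x servers/rack
print(rack_tflops)                    # 1400.0 TFLOPS, i.e. 1.4 PFLOPS per rack
print(8400 / rack_tflops)             # 6.0 racks to match BOINC's ~8.4 PFLOPS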

PS: This is just some thinking from someone who has no experience with rack servers. I do know that power and cooling can be problematic with so much in one rack.
ID: 1603210