What are the "best" combination(s) of motherboard/GPUs for mixed BOINC processing?

Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1971943 - Posted: 25 Dec 2018, 16:45:58 UTC

There has been plenty of discussion about the best pure BOINC/SETI combinations of motherboard/CPU/GPU but I want to explore examples of high powered "mixed" processing systems.

Two of my systems (which are not in the running for "top computers") are running mixes of Seti, Mind Modeling, World Community Grid, Rosetta@home, Seti Beta and sometimes Einstein@Home.

Another Setizen started a PM conversation about the maximum system in a single computer. He doesn't have an unlimited budget so I didn't talk about the results on my "Heavy Metal" thread which includes $6,000 cpus. :)

He is/was thinking about a 2990wx. My impression is there is no BOINC project out there that has tasks that will run strictly in the CPU cache. Is this true? If there is a project that would run strictly in the CPU cache that would be excellent for the 2990wx CPU.

So what would you propose for maximum cores and maximum CPU crunching as well as best bang for the buck for gpu combinations?

I have some experience with previous generation Intel server cpus (e5-2670 and e5-2690). The Intels are not the fastest CPU crunchers but have been very reliable and stable.

I now know enough to believe that AMD's 2950X, 1950X, 1920X and Ryzen 7 (8c/16t) CPUs, i.e. the AMD line below the 2990WX/2970WX, would be fast, reliable crunchers in a mixed project environment.

Keith has a 2920X with 4 GPUs that has joined the top rank of the Seti processors. What I am not sure about is whether he is running a mixed load on that box.

We have a couple of examples of extreme gpu oriented boxes on the list (essentially ex-miner systems if I understand correctly). But they aren't set up for or doing much CPU crunching.

There has been a discussion about this before but I wanted to focus it on a single thread.

I want to thank all the thoughtful Setizens out there.
Tom
A proud member of the OFA (Old Farts Association).
ID: 1971943 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1971953 - Posted: 25 Dec 2018, 18:13:08 UTC - in response to Message 1971943.  
Last modified: 25 Dec 2018, 18:15:33 UTC

Your thread headline and post diverge. Are you looking for a CPU-only project, or are you looking for a combination CPU/GPU project to discuss? I know there are CPU-only projects out there. There are also projects that use both the CPU and the GPU, but for different aspects. For this I'm going to use Einstein and GPUGrid. Einstein has a gravitational wave app for the CPU, 1 core per work unit. Their gamma ray searches run on the GPUs. GPUGrid has a Quantum Chemistry application for the CPU, 4 CPU cores per work unit. They also have GPU work units unrelated to QC that are GPU only.

So are you looking for one of those??

I'm sure Keith has already told you that here in Seti, PCIe speeds don't matter. That's not true for other projects. I've found that the GPU applications at both Einstein and GPUGrid depend on PCIe speed. I've had conversations with other members who wanted to know why I can process work units almost 2x faster than they can, with the same number and kind of GPUs. In the end, I believe it comes down to the number of PCIe lanes the CPU provides and the width the slots run at in my computers. In my case, I use a server-grade motherboard which runs all PCIe slots at x16.
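To put rough numbers on the lane-width point, here is a minimal Python sketch of theoretical PCIe link bandwidth. The function name `link_bandwidth_gbps` is made up for illustration; the per-lane rates follow the published PCIe signalling rates and encodings, and real application throughput will be lower than these ceilings.

```python
# Theoretical one-direction PCIe throughput per lane, in GB/s.
# Gen1 and Gen2 use 8b/10b encoding; Gen3 uses 128b/130b encoding.
PER_LANE_GBPS = {
    1: 2.5 * 8 / 10 / 8,     # 2.5 GT/s -> 0.25 GB/s
    2: 5.0 * 8 / 10 / 8,     # 5.0 GT/s -> 0.50 GB/s
    3: 8.0 * 128 / 130 / 8,  # 8.0 GT/s -> ~0.985 GB/s
}


def link_bandwidth_gbps(generation: int, lanes: int) -> float:
    """Return the theoretical one-direction bandwidth of a PCIe link in GB/s."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return PER_LANE_GBPS[generation] * lanes


if __name__ == "__main__":
    # Compare a full-width slot with the narrower links common on
    # consumer boards once several GPUs share the CPU's lanes.
    for lanes in (16, 8, 4, 1):
        print(f"PCIe 3.0 x{lanes}: {link_bandwidth_gbps(3, lanes):.2f} GB/s")
```

A GPU in an x4 slot has a quarter of the x16 ceiling, which is one plausible reason identical cards can finish bandwidth-sensitive work units at very different rates.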

Lastly, sometimes it's not only about CPU threads. Yes, you can have 48 threads, but can you utilize all of them? I've found that with GPUGrid running on four 1080 Tis plus two Quantum Chemistry tasks, my temperatures came close to the 100C mark on my i7 6950X. I use a 280 mm CPU cooler and it couldn't keep up. That forced me to cut the CPU load to a single 4-core work unit alongside the 4 GPU work units. I'm going to have to build a 360 mm custom water loop if I want to run more than 1 CPU task.
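The thread-versus-cooling trade-off can be sketched as simple arithmetic. The helper names below are hypothetical, and the sketch assumes a 20-thread CPU (the i7 6950X is a 10-core/20-thread part) and one CPU thread reserved per GPU work unit, which is an assumption rather than a measured figure.

```python
def max_cpu_tasks_by_threads(total_threads: int, gpu_tasks: int,
                             threads_per_gpu_task: int,
                             threads_per_cpu_task: int) -> int:
    """CPU tasks that fit after reserving threads to feed the GPU tasks."""
    free_threads = total_threads - gpu_tasks * threads_per_gpu_task
    return max(free_threads // threads_per_cpu_task, 0)


def runnable_cpu_tasks(total_threads: int, gpu_tasks: int,
                       threads_per_gpu_task: int, threads_per_cpu_task: int,
                       thermal_limit: int) -> int:
    """Apply a cooling-imposed cap on top of the thread-count limit."""
    by_threads = max_cpu_tasks_by_threads(
        total_threads, gpu_tasks, threads_per_gpu_task, threads_per_cpu_task)
    return min(by_threads, thermal_limit)


if __name__ == "__main__":
    # 20 threads, 4 GPU work units, 4 threads per Quantum Chemistry task.
    print(max_cpu_tasks_by_threads(20, 4, 1, 4))             # thread limit
    print(runnable_cpu_tasks(20, 4, 1, 4, thermal_limit=1))  # cooling limit
```

On thread count alone four QC tasks would fit, but the cooler caps it at one, so the binding constraint here is thermal rather than core count.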

Finally, HD space. Usually not a problem with most projects, but as I found out with GPUGrid, the 4-thread CPU work units require A LOT of HD scratch space. My 450 GB SSD wasn't big enough. After discussions with the project managers, it was determined that a 1 TB HD would be required. Luckily we don't have to install BOINC on the same HD as the OS, so I threw in a 4 TB HDD (HDDs are MUCH CHEAPER than SSDs in these sizes) as the HD for BOINC.
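A quick way to check a candidate BOINC data drive before attaching to a scratch-hungry project is sketched below. The function name `has_enough_scratch` is hypothetical, and treating the 1 TB figure as a free-space threshold (rather than total drive capacity) is an assumption made for illustration.

```python
import shutil

TB = 10 ** 12  # decimal terabyte, as drive vendors count it


def has_enough_scratch(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # "." stands in for wherever the BOINC data directory actually lives.
    print(has_enough_scratch(".", 1 * TB))
```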

That should give you some ideas.


Z
ID: 1971953 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1971955 - Posted: 25 Dec 2018, 18:30:44 UTC - in response to Message 1971953.  


Quote (Zalster):
That should give you some ideas.


Thank you for the thoughtful post. Your discussion is exactly what I was looking for. I would love to have more Setizens chime in because at least two of us are thinking about our next BOINC boxes. What brand CPU/MB? How many cores? How much cooling to upgrade to? What GPUs? And what projects?

Tom
A proud member of the OFA (Old Farts Association).
ID: 1971955 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1971960 - Posted: 25 Dec 2018, 19:10:03 UTC - in response to Message 1971943.  

First I would like clarification on what you mean by "run strictly in the CPU cache". That seems like nonsense, in that the CPU cache is a fixed size in all modern processors and is a few tens of MB at most.

I am running "mixed" projects on the new Threadripper 2920X. I run Seti, MW, Einstein and GPUGrid on the host. CPU and GPU for Seti and only GPU for the other projects.

Zalster is the expert on running the multi-threaded (mt) Quantum Chemistry (QC) CPU application at GPUGrid. It can take as many CPU cores as you can throw at it. But as he noted, you had better have extreme CPU cooling. I am sure there are plenty of other CPU-only projects that can use as many cores as you have available.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1971960 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1972039 - Posted: 26 Dec 2018, 6:55:44 UTC - in response to Message 1971960.  

Quote (Keith Myers):
First I would like clarification on what you mean by "run strictly in the CPU cache". That seems like nonsense, in that the CPU cache is a fixed size in all modern processors and is a few tens of MB at most.


According to the reviewers, processing on the dies of the 2990WX/2970WX that have no direct memory access runs very, very fast as long as it does not need to access memory. The reviewers all mention rendering, using something like Adobe, as an example that runs very fast. They also posted specific benchmarks that showed spectacular results in a couple of cases. I think those were rendering benchmarks (or maybe not).

I have inferred that if it is not accessing memory, then all the "processing memory" that is being used is in the CPU cache. If this inference is wrong, I apologize.

If there are any BOINC CPU tasks that can run "inside" the CPU cache, it would seem likely they would run very rapidly.

Thank you for confirming that you are running very rapidly while having a mixed project load on your 2920x.

Respectfully,
Tom
A proud member of the OFA (Old Farts Association).
ID: 1972039 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1972041 - Posted: 26 Dec 2018, 7:41:02 UTC - in response to Message 1972039.  

Unless the compute task or render is small enough to fit in the CPU's onboard L1, L2 and L3 caches, memory access has to take place. And the data being computed has to come off the storage system at least once and pass through system memory. That means at least one access through the CPU's memory controller on the way in and one on the way out. All of the tasks that I crunch are too big to fit into CPU local memory, so they have to go through system memory multiple times.

I'm not positive, but I think the 'peak working set size' and the 'peak swap size' printed in each task's stderr.txt show the amount of memory the task used to compute. Seti BLC tasks are using 628MB and 34.6GB respectively. So no, they are not able to compute just in the L1 through L3 caches. GPUGrid tasks take hundreds of GB of memory. A 1 TB hard drive is the minimum requirement for Quantum Chemistry (QC) tasks and Zalster uses a 4 TB drive. They use massive amounts of system memory access to pass that much data to and from storage.
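The comparison between a task's working set and the cache can be made concrete with a short sketch. The 628 MB figure is the one quoted above; the 64 MB cache size is an assumed value, roughly the total L3 of a large desktop CPU, and the helper names are hypothetical.

```python
MB = 1024 ** 2


def fits_in_cache(working_set_bytes: int, cache_bytes: int) -> bool:
    """True only if the whole working set could stay resident in cache."""
    return working_set_bytes <= cache_bytes


def oversize_factor(working_set_bytes: int, cache_bytes: int) -> float:
    """How many times larger the working set is than the cache."""
    return working_set_bytes / cache_bytes


if __name__ == "__main__":
    working_set = 628 * MB  # peak working set reported for a Seti BLC task
    l3_cache = 64 * MB      # assumed L3 size, not taken from the thread
    print(fits_in_cache(working_set, l3_cache))
    print(f"{oversize_factor(working_set, l3_cache):.1f}x the cache size")
```

Even with a generous cache assumption the working set is roughly ten times too large, so such a task has to keep going back to system memory.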
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1972041 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1972043 - Posted: 26 Dec 2018, 8:23:31 UTC - in response to Message 1972041.  

Quote (Keith Myers):
So no, they are not able to compute just in the L1 through L3 caches. GPUGrid tasks take hundreds of GB of memory. A 1 TB hard drive is the minimum requirement for Quantum Chemistry (QC) tasks and Zalster uses a 4 TB drive. They use massive amounts of system memory access to pass that much data to and from storage.

So current Threadripper 2 systems tend to be at a disadvantage compared to similar Intel systems when it comes to memory accesses that aren't on the local die. This is where their new chiplet design, coming first to Epyc Rome CPUs, will give a significant improvement in overall performance with these extreme core count CPUs. There will be a slight increase in memory latency for local accesses, but there won't be any further penalty for accesses that aren't on the local die.
Grant
Darwin NT
ID: 1972043 · Report as offensive
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1972163 - Posted: 27 Dec 2018, 15:29:49 UTC - in response to Message 1972043.  

Quote (Keith Myers):
So no, they are not able to compute just in the L1 through L3 caches. GPUGrid tasks take hundreds of GB of memory. A 1 TB hard drive is the minimum requirement for Quantum Chemistry (QC) tasks and Zalster uses a 4 TB drive. They use massive amounts of system memory access to pass that much data to and from storage.

Quote (Grant (SSSF)):
So current Threadripper 2 systems tend to be at a disadvantage compared to similar Intel systems when it comes to memory accesses that aren't on the local die. This is where their new chiplet design, coming first to Epyc Rome CPUs, will give a significant improvement in overall performance with these extreme core count CPUs. There will be a slight increase in memory latency for local accesses, but there won't be any further penalty for accesses that aren't on the local die.


That is why I am pondering the 2950X. It doesn't have the memory-isolated dies that the 2970WX/2990WX do. I am assuming I would get the same kind of performance Keith is getting with his 2920X. I am also looking hard at the 1950X because it is $300 cheaper than a 2950X. :)

And they are/will be a bit cheaper than the Epyc servers :)

Tom
A proud member of the OFA (Old Farts Association).
ID: 1972163 · Report as offensive
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1972177 - Posted: 27 Dec 2018, 17:59:36 UTC

If you don't want to push your system to the max and want to run a stable basic system, then the 1950X is a fine cpu and costs less than a TR2 cpu. You just won't get the performance boost and higher memory speeds that TR2 is capable of.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1972177 · Report as offensive



 