Multi-GPU hardware setups: My method (feel free to share your own!)

Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 1993635 - Posted: 13 May 2019, 15:20:37 UTC - in response to Message 1993633.  

I am pleased to note that my AMD box with 9 gpus has been running without crashing since Saturday afternoon. My previous experience with running a 9 gpu setup on my Intel box suggests that if it runs more than 3 days non-stop, then it MIGHT be stable.

I think I will shoot for 5 days. If it is still running then, I will start transplanting my high-end gpus from my Intel box into the AMD system. At least get the AMD box to the point where it has 3 gtx 1070's, which turn out to run only about 10 seconds slower than my 3 gtx 1070Ti's but at a significantly lower cost.

Tom


great to hear that those new cables are holding up well so far.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1993635 · Report as offensive     Reply Quote
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 1993637 - Posted: 13 May 2019, 15:30:13 UTC

My Beast also has been running smooth as butter.

8 cards at PCIe gen3 x8 is unheard of from anyone else. there are no problems with power delivery through the PCIe slots if you set things up properly like I have, and do the research on your parts selection to get an idea of how much power each card is likely to pull from the slot. I've proven the setup to be safe and stable - it's been running like this for several months now - but I'll measure the current to the 2 cards on risers for additional verification of the power draw from the slot.

the system runs 10 cards total, but 2 cards are on risers at the moment, only because of clearance issues with the rear CPU heatsink (the last two cards sit over it, and i don't have any right-angle adapters at the moment).

I'm on the fence about whether I should leave things as is, get some right-angle adapters to put those last 2 cards on full lanes, or toss in a couple of PCIe 4-in-1 splitters to see if the system will run 16 GPUs lol. there's no rush, since it's already about 2x as fast as the next contender :)
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1993637 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 1993639 - Posted: 13 May 2019, 15:52:34 UTC - in response to Message 1993637.  

Ah, but wait till I get my "unlimited budget" untracked. Then I can "dog paddle" a little faster..... may give you a run for your "money"....

Nah.... I haven't got the budget. Being semi-retired, and unemployed because I don't want to work at a call center again right now... I think the best I could do would be to redeploy all my 1070 assets into one machine or the other and fill out any leftover slots with 1060's.

That will leave the "other" box with an "all gtx 1060 3GB" cast, but based on past experience it should stay on at least the 2nd page of the leader board no matter which machine it is.

Unfortunately, the AMD (TB350-BTC) Motherboard only has 2 Gen3 slots. Everything else is Gen2.

I really would like to see if you can exceed your PCIe "resource limit" or other headroom limit (excepting not enough PSU watts), and where it tops out. Based on my current experience I would use the high-end UGreens in 1 1/2 foot lengths. I suspect the shorter cable length makes the resources more usable.

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 1993639 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 1993640 - Posted: 13 May 2019, 16:08:53 UTC

Besides seeing how high I can push the AMD 9 gpu rig (if it turns out to be stable) I have another change I would be interested in.

I have already proven that when the "-nobs" parameter is in place on the gpu command line, no combination of less than 1 CPU per GPU keeps the CPUs from pegging at 100%.

Since all previous experience points to slowed total production with a CPU running at 100%, it looks like the only other combination that is reasonable to explore would be something like 0.49 CPU per 1 GPU without the -nobs parameter.

I know that the GPU average processing speed will slow "some", but will the increased number of CPU threads that can now crunch offset that slowdown enough to give me more total production?

Ah, another experiment :)
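
A rough way to frame it before running the experiment (a back-of-envelope Python sketch; every number in it is a placeholder assumption to be swapped for measured task times on this rig):

# Back-of-envelope throughput comparison for the two reservation schemes.
# All task times and thread counts below are placeholder assumptions.
CPU_THREADS = 16        # assumed total CPU threads on the box
GPUS        = 9
CPU_TASK_S  = 3600.0    # assumed CPU task run time, seconds
DAY         = 86400.0

def total_tasks_per_day(gpu_task_s, cpu_per_gpu):
    """GPU output plus whatever the leftover CPU threads can crunch."""
    gpu_tasks  = GPUS * DAY / gpu_task_s
    free_cores = max(CPU_THREADS - round(cpu_per_gpu * GPUS), 0)
    cpu_tasks  = free_cores * DAY / CPU_TASK_S
    return round(gpu_tasks + cpu_tasks)

# Config A: -nobs with a full core reserved per GPU (assume faster GPU tasks)
# Config B: no -nobs, 0.49 CPU per GPU (assume somewhat slower GPU tasks)
print("with -nobs, 1.0 CPU/GPU :", total_tasks_per_day(120.0, 1.0))
print("no -nobs,  0.49 CPU/GPU :", total_tasks_per_day(132.0, 0.49))

Whichever configuration comes out ahead once the real numbers are plugged in is the one worth keeping.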

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 1993640 · Report as offensive     Reply Quote
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 1993641 - Posted: 13 May 2019, 16:10:52 UTC - in response to Message 1993639.  

the shorter length and higher quality both add to the stability. a shorter run will be less susceptible to crosstalk and interference in the link, which is the biggest cause of problems with the cheap cables.

and about gen2, i wouldn't sweat it if you could hypothetically get more lanes (i know you can't on that board, just making a general point). PCIe gen2 is half the speed of gen3 per lane, so a gen2 x2 or x4 link puts you at least in the same ballpark as a PCIe gen3 x1 link. but that also requires the use of a ribbon riser or plugging the card directly into the board, which will be a case-by-case thing depending on hardware.

the lack of PCIe gen3 lanes available via the chipset is the biggest drawback of the AMD consumer platforms, in my opinion. Threadripper has a bunch via the CPU, but it's a much more expensive platform, whereas even older Intel platforms have had PCIe gen3 via the chipset for a long time now.
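
For reference, the rough per-lane math (a quick Python sketch; these are the usual theoretical maxima after line-encoding overhead, so real-world throughput will be somewhat lower):

# Approximate usable bandwidth per PCIe lane, after encoding overhead:
#   gen1: 2.5 GT/s, 8b/10b    -> ~250 MB/s per lane
#   gen2: 5.0 GT/s, 8b/10b    -> ~500 MB/s per lane
#   gen3: 8.0 GT/s, 128b/130b -> ~985 MB/s per lane
MB_PER_LANE = {1: 250, 2: 500, 3: 985}

def link_bw(gen, lanes):
    return MB_PER_LANE[gen] * lanes

for gen, lanes in [(2, 1), (2, 2), (2, 4), (3, 1), (3, 8)]:
    print(f"gen{gen} x{lanes}: ~{link_bw(gen, lanes)} MB/s")
# gen2 x2 lands right around gen3 x1, which is the point being made above.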
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1993641 · Report as offensive     Reply Quote
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 1993870 - Posted: 15 May 2019, 15:03:33 UTC - in response to Message 1993633.  
Last modified: 15 May 2019, 15:17:13 UTC

I am pleased to note that my AMD box with 9 gpus has been running without crashing since Saturday afternoon. My previous experience with running a 9 gpu setup on my Intel box suggests that if it runs more than 3 days non-stop, then it MIGHT be stable.

I think I will shoot for 5 days. If it is still running then, I will start transplanting my high-end gpus from my Intel box into the AMD system. At least get the AMD box to the point where it has 3 gtx 1070's, which turn out to run only about 10 seconds slower than my 3 gtx 1070Ti's but at a significantly lower cost.

Tom


i see your system is still going (as of now) with 9 GPUs. have you had any problems since replacing the cables? definitely interested to hear how the cables are holding up with the cheap splitters.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1993870 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 1993873 - Posted: 15 May 2019, 15:18:28 UTC - in response to Message 1993870.  

I am pleased to note that my AMD box with 9 gpus has been running without crashing since Saturday afternoon. My previous experience with running a 9 gpu setup on my Intel box suggests that if it runs more than 3 days non-stop, then it MIGHT be stable.

I think I will shoot for 5 days. If it is still running then, I will start transplanting my high-end gpus from my Intel box into the AMD system. At least get the AMD box to the point where it has 3 gtx 1070's, which turn out to run only about 10 seconds slower than my 3 gtx 1070Ti's but at a significantly lower cost.

Tom


i see your system is still going (as of now) with 9 GPUs. have you had any problems since replacing the cables? definitely interested to hear how the cables are holding up with the cheap splitters.


So far I have not had any problems with the AMD box and the cheap 1-to-4 extenders. Out of the 3 or 4 of those, I may have had 1 quit.
I have been running one 1-to-4 extender from the word go (when I needed more gpus than I had slots) on my Intel box. It is STILL running, even though I have had to replace cables.

Today was the 5th day of running 9 gpus on the AMD box. It's been "boring" - it just runs. So I re-deployed/added gtx 1070 series gpus to it. It was a simple matter of unplugging a card and replacing it with a gtx 1070Ti or 1070. I have only changed out 3 so far, since I will have to add power cables to manage the last two.

So the AMD box with 9 gpus is now running 1 gtx 1070Ti, 3 gtx 1070's and 5 gtx 1060 3GB's.
The Intel is running 7 gpus: 2 gtx 1070Ti's (plugged into the MB) and 5 gtx 1060 3GB's.

Since the Intel is likely to stop being my top performer, I have increased the World Community Grid cpu thread mix to 10. Meanwhile the RAC on the Intel has stayed around 330,000. I expect that to go down, but heck, it might even stay on the first page of the leader board.

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 1993873 · Report as offensive     Reply Quote
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 1994617 - Posted: 21 May 2019, 17:26:07 UTC
Last modified: 21 May 2019, 17:31:53 UTC



I measured the slot power of one of my 2070s at its peak: about 42W max, which is exactly in line with the research I did prior to putting together this system with the ribbon risers. It fluctuates a lot and only hits 3.5A briefly; the average power across the whole WU is closer to 25-30W.

No problem at all running all 8 GPUs from the MB.
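
For anyone wanting to run the same sanity check on their own cards, the math is just the measured +12V current times 12V, compared against the slot's allowance (a small Python sketch; the 5.5 A / ~66 W figure is the commonly quoted +12V limit for a 75 W slot, so verify it against your own board's documentation):

# Slot power sanity check: measured +12V current -> watts -> margin vs. the limit.
SLOT_12V_LIMIT_A = 5.5                       # commonly quoted +12V limit for a 75 W slot
SLOT_12V_LIMIT_W = SLOT_12V_LIMIT_A * 12.0   # ~66 W

def slot_margin(measured_amps):
    watts = measured_amps * 12.0
    return watts, SLOT_12V_LIMIT_W - watts

peak_w, headroom_w = slot_margin(3.5)        # the 3.5 A peak measured above
print(f"peak draw ~{peak_w:.0f} W, ~{headroom_w:.0f} W of headroom below the +12V limit")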
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1994617 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 1997670 - Posted: 10 Jun 2019, 13:37:39 UTC

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8674981

Some of you will remember the active discussion about the "Wildcardcorp.com" machine.

I am wondering, how do we account for the very long wallclock times those gtx 1080Ti's are taking? Shouldn't a machine with that many gtx 1080Ti's be cranking out more RAC than that?
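
One way to put a rough number on "should be cranking out more" is a quick expected-credit estimate (a Python sketch; the GPU count, per-task run time, and credit per task are placeholder assumptions, not measurements from that host):

# Rough expected daily output for a multi-GPU host; all figures are placeholders.
N_GPUS          = 7        # assumed number of 1080Ti's in the host
TASK_SECONDS    = 600.0    # assumed wall-clock time per GPU task
CREDIT_PER_TASK = 80.0     # assumed average credit per task

tasks_per_day = N_GPUS * 86400.0 / TASK_SECONDS
print(f"~{tasks_per_day:.0f} tasks/day, ~{tasks_per_day * CREDIT_PER_TASK:,.0f} credit/day")
# RAC settles toward the daily credit figure over a few weeks, so a host whose
# RAC sits far below an estimate like this is losing wall-clock time per task somewhere.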

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 1997670 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 29 Apr 01
Posts: 9646
Credit: 887,309,859
RAC: 1,717,115
United States
Message 1997686 - Posted: 10 Jun 2019, 16:00:55 UTC - in response to Message 1997670.  

They are using sleep.
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
ID: 1997686 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 11567
Credit: 170,775,625
RAC: 103,474
Australia
Message 1997748 - Posted: 11 Jun 2019, 4:15:29 UTC

It's purely default stock settings - no optimised values used, no High Performance or High Priority settings. And for all we know they could be running 2, 3, 5 or more WUs at a time.
Yes, their output is extremely poor (even for SoG).
Grant
Darwin NT
ID: 1997748 · Report as offensive     Reply Quote
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 2004539 - Posted: 27 Jul 2019, 17:48:44 UTC
Last modified: 27 Jul 2019, 17:53:08 UTC

Doesn't look like I ever posted about my water-cooled server case. I just reconfigured it, so here we go.

Case: Supermicro CSE-743tq-1200b (incl. 1200W Platinum PSU)
MB: Supermicro X9DRi-F
CPUs: Xeon E5-2630Lv2
RAM: 32GB (8x4GB) DDR3 Reg ECC 1600MHz
GPUs: 2080ti + 2080 + 2080 (all EVGA Black models)
CPU blocks: Watercool Heatkiller IV w/ narrow-ILM brackets
GPU blocks: EK-FC 2080(+ti) classic Nickel/Plexi
GPU bridge: EK-FC X3 Terminal Acetal Parallel flow
I/O adapter: Koolance Part No. BKT-PCI-G (I think the Alphacool product might be better, but this is what I had)
Pumps: (2x) XSPC D5 Vario in EK Dual D5 housing
Koolance quick disconnects
Radiator: Watercool MO-RA3 360
Fluid: Koolance 702 clear

pics:




the pumps and radiator are completely external, with the rad mounted in the window blowing all the hot air outside. here's an old pic from when i first set it up about a year ago (i'm using a different fan now; the car rad fans i was using were not reliable and would fail after about a month)



All in all, it works pretty well. I had to make some compromises on the memory layout due to the GPU in the first slot hanging over the first memory slot, but it won't really affect anything. I don't run CPU work, so the CPUs only get used for feeding work to and from the GPUs. You can see it better in this pic I took during hardware testing:



The GPU (2080) in the last slot does run a bit warmer than the others, as i suspect it's not getting great flow through it. The best solution would be to put the outlet tubing on the opposite side of the GPUs but there doesn't look to be enough room for that between the bridge and the inside wall of the case. Maybe I'll look into it later but I don't really want to mess with it more because draining it for disassembly is a pain.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2004539 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 2004557 - Posted: 27 Jul 2019, 23:20:20 UTC - in response to Message 2004539.  

but I don't really want to mess with it more because draining it for disassembly is a pain.


So now you feeling the "drain" of this hobby?






Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 2004557 · Report as offensive     Reply Quote
Grant (SSSF)
Volunteer tester

Send message
Joined: 19 Aug 99
Posts: 11567
Credit: 170,775,625
RAC: 103,474
Australia
Message 2004566 - Posted: 28 Jul 2019, 0:18:19 UTC - in response to Message 2004539.  

the pumps and radiator are completely external, with the rad mounted in the window blowing all the hot air outside. here's an old pic of when i first set it up about a year ago (using a different fan now, those car rad fans i was using, were not reliable and would fail after about a month)

I like that setup.
Reminds me of the system I saw pictures of years ago, where the overclocker had the radiator acting as a heat exchanger in his pool.
Grant
Darwin NT
ID: 2004566 · Report as offensive     Reply Quote
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 2004573 - Posted: 28 Jul 2019, 1:37:33 UTC - in response to Message 2004566.  

Haha. That’s awesome. I’ve read about guys putting long runs of tubing buried in their yard (below the frost line) with a powerful pump for insane amounts of heat dissipation capacity. I’d love to do that one day. It’s almost the perfect solution other than cost and effort required.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2004573 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 2007489 - Posted: 15 Aug 2019, 13:27:57 UTC
Last modified: 15 Aug 2019, 13:28:54 UTC

Found a Supermicro X9DRX+-F Intel C602 Dual LGA2011 Proprietary Motherboard System Board for $499 (used, listed as "new, open box"). Ordered it.

After WOW I will start transplanting the guts of my Intel server onto it as well as all my fastest gpus.
It will go into the same Mining Rack Ian is using for that MB.

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 2007489 · Report as offensive     Reply Quote
Ian&Steve C.
Avatar

Send message
Joined: 28 Sep 99
Posts: 1718
Credit: 700,208,369
RAC: 2,415,110
United States
Message 2007498 - Posted: 15 Aug 2019, 14:10:28 UTC - in response to Message 2007489.  

cool!

just make sure you get the right CPU coolers. that board is Narrow-ILM.

I use these guys, you'll need 2: Supermicro SNK-P0050AP4

Excellent/quiet heatsink.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2007498 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 2007648 - Posted: 16 Aug 2019, 4:18:51 UTC - in response to Message 2007498.  

cool!

just make sure you get the right CPU coolers. that board is Narrow-ILM.

I use these guys, you'll need 2: Supermicro SNK-P0050AP4

Excellent/quiet heatsink.


I think I have the quietest available narrow-ILM heat sinks already from a previous MB experiment that expired when the MB stopped booting :(

Even so, they are louder than the regular heat sinks.


Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 2007648 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Send message
Joined: 28 Nov 02
Posts: 3452
Credit: 195,992,647
RAC: 525,215
United States
Message 2008430 - Posted: 20 Aug 2019, 0:55:56 UTC - in response to Message 2007648.  

cool!

just make sure you get the right CPU coolers. that board is Narrow-ILM.

I use these guys, you'll need 2: Supermicro SNK-P0050AP4

Excellent/quiet heatsink.


I think I have the quietest available narrow-ILM heat sinks already from a previous MB experiment that expired when the MB stopped booting :(

Even so, they are louder than the regular heat sinks.

Tom


The MB showed up in a well packaged box, in between rain storms :)

My brother suggested a local glass shop that has done custom work for him to cut the MB mounting plate.
So sometime after Aug 29th 1600 UTC (WOW) I will have a "transplant" party, which will move a lot of parts around and leave my current top performer short even more gpus.... :)

I sustain my CORE ENVY :)


Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 2008430 · Report as offensive     Reply Quote