Best way to get more processing from GTX 1050ti alongside GTX 750ti/ GTX 950

Message boards : Number crunching : Best way to get more processing from GTX 1050ti alongside GTX 750ti/ GTX 950

ralphw
Volunteer tester
Joined: 7 May 99
Posts: 72
Credit: 10,666,926
RAC: 16,841
United States
Message 1872515 - Posted: 12 Jun 2017, 0:06:11 UTC

Hello,

I have three models of NVIDIA GPU:

  • NVIDIA GTX 1050 Ti (1)
  • NVIDIA GTX 950 Ti (2)
  • NVIDIA GTX 750 Ti (3)

These are spread across two systems.

What's the best strategy for having the fast cards (950,1050) process more data?
There are some clients that have loop unrolling options, but is running multiple workunits on a GPU - by setting up an app_info.xml file - really taking advantage of the extra CUDA cores?


ID: 1872515
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 8893
Credit: 115,251,013
RAC: 70,859
Australia
Message 1872520 - Posted: 12 Jun 2017, 0:44:46 UTC - in response to Message 1872515.  
Last modified: 12 Jun 2017, 0:49:32 UTC

What's the best strategy for having the fast cards (950,1050) process more data?

Run Petri's development application, or Tbar's recent applications based on it.

There are some clients that have loop unrolling options, but is running multiple workunits on a GPU - by setting up an app_info.xml file - really taking advantage of the extra CUDA cores?

Running more than 1 WU is only of benefit for some high-end cards running the SoG application, or for cards that are running the older CUDA applications.
If running Petri's development application, or some of Tbar's based on that application, 1 WU at a time is best.

You don't need an app_info.xml to run multiple WUs (with the more recent BOINC managers). An app_info.xml lets you run a non-stock application, or a specific stock one.

When running the SoG application there are several command line values you can use to significantly boost output from the default settings.
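Presumably Grant is referring to BOINC's per-project app_config.xml, which recent clients read to decide how many tasks share one GPU. The Python sketch below is not from this thread: it writes that file with a `gpu_usage` of 1/N so BOINC can schedule N tasks per card. The app name `setiathome_v8` is an assumption; confirm the name your client actually uses (it appears in client_state.xml) before relying on it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Return app_config.xml text that lets `tasks_per_gpu` tasks share each GPU.

    BOINC reads gpu_usage as the fraction of one GPU a task needs,
    so 0.5 lets two tasks run on the same card.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed app name; verify it against your own client_state.xml.
    print(build_app_config("setiathome_v8", tasks_per_gpu=2))
```

Setting `tasks_per_gpu=1` matches Grant's advice for Petri's application, while 2 corresponds to the "2 at a time" SoG setup Stephen describes later in the thread. The client generally has to re-read its config files (or be restarted) before a change takes effect.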
Grant
Darwin NT
ID: 1872520
Tom Miller
Volunteer tester
Joined: 28 Nov 02
Posts: 768
Credit: 18,731,376
RAC: 16,210
United States
Message 1872525 - Posted: 12 Jun 2017, 1:19:43 UTC

I was looking at the machines (8278492) you have listed. You have one with three GTX 1050 Tis. It is listing a speed of 183.52 GFLOPS. Before I moved a GTX 750 Ti to another box, it had stabilized at 157.14 GFLOPS.

The 1050 Ti is supposed to be up to 40% faster than a 750 Ti, and the Ti version of the 1050 has more CUDA cores than the 750 Ti. And you have three of them. Unless you just started up that machine, your GFLOPS "should" be 'on the order of' 3 × 157 = 471 GFLOPS. So (I think) you aren't getting the production you are paying for. There are confounding factors in my data, but you should be getting north of 300 GFLOPS at least.

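Tom's estimate is straightforward scaling arithmetic. The Python sketch below reproduces it from the figures in his post; it assumes throughput scales linearly with card count and treats "up to 40% faster" as a best case, both of which are simplifications rather than measured results.

```python
def expected_gflops(per_card: float, cards: int, relative_speed: float = 1.0) -> float:
    """Naive aggregate estimate: per-card rate x relative speed x card count.

    Assumes perfectly linear scaling across cards, which real hosts
    rarely achieve.
    """
    return per_card * relative_speed * cards


single_750ti = 157.14   # GFLOPS Tom's GTX 750 Ti host stabilized at
reported_host = 183.52  # GFLOPS listed for host 8278492

baseline = expected_gflops(single_750ti, 3)          # Tom's "3 x 157" figure
best_case = expected_gflops(single_750ti, 3, 1.40)   # "up to 40% faster"

print(f"Three 750 Ti-class cards: {baseline:.0f} GFLOPS")
print(f"Three 1050 Tis at +40%:   {best_case:.0f} GFLOPS")
print(f"Reported vs. baseline:    {reported_host / baseline:.0%}")
```

As Wiggo points out later in the thread, that host actually holds one 1050 Ti and two 750 Tis, so the three-1050 Ti premise overstates what to expect.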
Since you are running Linux, I can't offer any Linux-specific ideas, but it might make sense to post your equivalent of the Windows MB*SoG.txt command-line file in case we have some ideas on how to improve it.

Here is the one I am running on the GTX 750 Ti: "-sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4". If your command line for the 1050 isn't something similar to this (or with larger parameters), that might be slowing it down. I don't understand all I think I know, but I ran across a two-GTX 750 Ti setup where the command line had this for each card: -tune 1 64 1 4 -tune 2 64 1 4. Apparently the first number refers to each card?

HTH,
Tom
"You are entitled to your own opinion but not to your own facts." Senator and Professor Patrick Moynihan
---
https://GalensonConsulting.WordPress.com
ID: 1872525
Tom Miller
Volunteer tester
Joined: 28 Nov 02
Posts: 768
Credit: 18,731,376
RAC: 16,210
United States
Message 1872526 - Posted: 12 Jun 2017, 1:25:23 UTC - in response to Message 1872515.  

Hello,

I have three models of NVIDIA GPU:

  • NVIDIA GTX 1050 Ti (1)
  • NVIDIA GTX 950 Ti (2)
  • NVIDIA GTX 750 Ti (3)

These are spread across two systems.

What's the best strategy for having the fast cards (950,1050) process more data?



I would put the three GTX 750 Ti cards in one system and the GTX 950 Tis in the second system, and leave the 1050 out of the mix while you try to come up with reasonable parameters, and a location to put it in. Don't forget to set up the MB*SoG.txt command lines/files/parameters.

Tom
"You are entitled to your own opinion but not to your own facts." Senator and Professor Patrick Moynihan
---
https://GalensonConsulting.WordPress.com
ID: 1872526
Wiggo "Socialist"
Joined: 24 Jan 00
Posts: 12607
Credit: 169,502,545
RAC: 87,094
Australia
Message 1872529 - Posted: 12 Jun 2017, 1:45:46 UTC

Tom, don't go by the computer's details, as that only shows the number of GPUs, listed under whichever one BOINC considers to be the most powerful. ;-)

Look instead at the result details of a completed GPU task, and you'll see that the rig listed with the 1050 Ti also has a pair of 750 Tis in it, while the other just has a pair of 950s (I don't believe that NVIDIA ever released a Ti version of the 950, and I can't see the third 750 Ti anywhere).

ralphw, your rig with the 1050 Ti/750 Ti combo will have to be tuned to suit the 750 Tis' performance, as tuning for the 1050 Ti's performance may not suit the 750 Tis at all. I'm sure that someone will supply those tuning settings for both of your rigs.

Cheers.
ID: 1872529
ralphw
Volunteer tester
Joined: 7 May 99
Posts: 72
Credit: 10,666,926
RAC: 16,841
United States
Message 1872556 - Posted: 12 Jun 2017, 9:53:25 UTC - in response to Message 1872529.  

Thanks.

That is my primary configuration (a GTX 1050 Ti alongside two GTX 750 Tis).

The third 750 Ti (from MSI) is currently in an inactive system.
ID: 1872556
Tom Miller
Volunteer tester
Joined: 28 Nov 02
Posts: 768
Credit: 18,731,376
RAC: 16,210
United States
Message 1872583 - Posted: 12 Jun 2017, 16:03:38 UTC - in response to Message 1872529.  
Last modified: 12 Jun 2017, 16:09:59 UTC

Tom, don't go by the computer's details, as that only shows the number of GPUs, listed under whichever one BOINC considers to be the most powerful. ;-)
---------
ralphw, your rig with the 1050 Ti/750 Ti combo will have to be tuned to suit the 750 Tis' performance, as tuning for the 1050 Ti's performance may not suit the 750 Tis at all. I'm sure that someone will supply those tuning settings for both of your rigs.

Cheers.


Wiggo,
I was replying about how to "maximize" production, on the theory that putting identical cards in a single PC maximizes the production of those cards. He had offered a list of cards; I didn't look any further.

I just looked at 8278492, and I must say I can come close to the listed GFLOPS with a single GTX 750 Ti running Lunatics under Windows, so something is out of tune.

As for the "best" SoG command line for the GTX 750 Ti, I am using: -sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4

I ran across the following variation for multiple-card machines: -sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -tune 2 64 1 4 -tune 3 64 1 4 -period_iterations_num 4

I can't swear that the multiple -tune commands, one for each card, work exactly like that. Nor can I swear that whichever one turns out to be the 1050 shouldn't be different.

Adding -hp makes the system laggy, but it does seem to load up my GPU a bit more (from the low 90s to the high 90s).

I suspect that the above command line(s) MIGHT improve the overall production on the mixed 750/1050 machine.
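Keeping several of these command lines straight is easier if each option is held as data. The Python sketch below assembles the two lines Tom quotes verbatim; it is a hypothetical helper, not part of any SETI@home tool, and it deliberately assumes nothing about what the first `-tune` argument means, since that is exactly what Tom is unsure of. Writing the result to the app's MB*SoG.txt file is left to you, because the exact filename depends on the installed build.

```python
def build_cmdline(options: list[tuple[object, ...]]) -> str:
    """Join (flag, value, value, ...) tuples into one command-line string."""
    return " ".join(" ".join(str(part) for part in opt) for opt in options)


# Tom's single-card GTX 750 Ti line, split into (flag, *values) tuples.
single_card = [
    ("-sbs", 512),
    ("-spike_fft_thresh", 2048),
    ("-tune", 1, 64, 1, 4),
    ("-period_iterations_num", 4),
]

# The multi-card variant Tom found: two extra -tune entries inserted
# before -period_iterations_num. Their meaning is not established here.
multi_card = single_card[:3] + [("-tune", 2, 64, 1, 4), ("-tune", 3, 64, 1, 4)] + single_card[3:]

if __name__ == "__main__":
    print(build_cmdline(single_card))
    print(build_cmdline(multi_card))
```

Mike's suggestion in the next post differs only in `-sbs 684`, which in this form would be `[("-sbs", 684)] + single_card[1:]`.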

Tom
"You are entitled to your own opinion but not to your own facts." Senator and Professor Patrick Moynihan
---
https://GalensonConsulting.WordPress.com
ID: 1872583
Mike (Project Donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 30612
Credit: 57,676,486
RAC: 30,836
Germany
Message 1872585 - Posted: 12 Jun 2017, 16:17:31 UTC

For the 750 Tis, I suggest -sbs 684 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4

Tune 2 and tune 3 don't give much benefit on those cards.
But this is host-dependent.
With each crime and every kindness we birth our future.
ID: 1872585
ralphw
Volunteer tester
Joined: 7 May 99
Posts: 72
Credit: 10,666,926
RAC: 16,841
United States
Message 1873749 - Posted: 18 Jun 2017, 5:33:47 UTC - in response to Message 1872556.  

I ended up moving an MSI GTX 750 Ti back into this system.

I was expecting to put four GPUs on this motherboard, but apparently all of my 750 Ti cards need to be the shorter 5-6" models.

Only the first slot of this Gigabyte motherboard really accommodates a full-length card such as MSI's dual-fan GTX 750 Ti.
The fan shroud and card length don't fit well with the other heat sinks and connectors.

I will have to limit myself to the smaller form-factor GPUs to physically use all of my remaining motherboard slots.

I'll see how well the WU averages keep up.
ID: 1873749
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2637
Credit: 48,577,573
RAC: 137,068
Australia
Message 1873760 - Posted: 18 Jun 2017, 7:14:56 UTC - in response to Message 1872520.  
Last modified: 18 Jun 2017, 7:34:04 UTC


Run Petri's development application, or Tbar's recent applications based on it.

Running more than 1 WU is only of benefit for some high-end cards running the SoG application, or for cards that are running the older CUDA applications.
If running Petri's development application, or some of Tbar's based on that application, 1 WU at a time is best.


. . Petri's app, definitely.

. . Actually, Grant, both my GTX 950 and GTX 1050 Ti give/gave better results under SoG when running 2 at a time. But that is r3557; I can't speak for r3584 (V8.22).

Stephen

..
ID: 1873760
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2637
Credit: 48,577,573
RAC: 137,068
Australia
Message 1873762 - Posted: 18 Jun 2017, 7:24:03 UTC - in response to Message 1872525.  
Last modified: 18 Jun 2017, 7:39:59 UTC

I was looking at the machines (8278492) you have listed. You have one with three GTX 1050 Tis. It is listing a speed of 183.52 GFLOPS. Before I moved a GTX 750 Ti to another box, it had stabilized at 157.14 GFLOPS.


. . Hmmm, something strange there. Oh he has moved some cards around ...

. . But if he is willing to tackle running CUDA80 with Petri's special app it will definitely improve his productivity. It is not hard to do when he is already running Linux.

Stephen

ID: 1873762
TheHoosh
Joined: 17 Aug 12
Posts: 11
Credit: 6,693,181
RAC: 35,793
Germany
Message 1879233 - Posted: 20 Jul 2017, 12:03:27 UTC

Two days ago I added a KFA² 1050 Ti to my main cruncher, which already houses a Palit 750 Ti StormX Dual (using the CUDA80 app under Linux, driver version 381.22).
So far, everything is looking good. Running several hundred WUs for Milkyway and Einstein confirmed that the 1050 Ti is about 35-40% faster than the 750 Ti.

However, with SETI the 1050 Ti runs only ~15% faster than the 750 Ti, which is odd considering the results for Milkyway and Einstein.
I'm crunching only 1 WU per GPU and have set the unroll option to "autotune".
My 750 Ti needs 640s to crunch one WU, whereas the 1050 Ti needs 550s, although I would expect it to be around 400s per WU.
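The numbers above can be checked with a little arithmetic. "35-40% faster" can be read two ways: 35-40% more WUs per hour, or 35-40% less time per WU. This Python sketch (the function names are illustrative) computes the observed gap and the runtime each reading would predict. Only the second reading lands near the ~400 s expected above, but under either reading the observed 550 s falls well short.

```python
def pct_faster(slow_secs: float, fast_secs: float) -> float:
    """How much more work per hour the faster card does, in percent."""
    return (slow_secs / fast_secs - 1) * 100


def runtime_if_more_throughput(slow_secs: float, pct: float) -> float:
    """Runtime if the fast card completes pct% more WUs per hour."""
    return slow_secs / (1 + pct / 100)


def runtime_if_less_time(slow_secs: float, pct: float) -> float:
    """Runtime if the fast card needs pct% less time per WU."""
    return slow_secs * (1 - pct / 100)


t_750ti, t_1050ti = 640, 550  # seconds per WU, as reported above
print(f"Observed: {pct_faster(t_750ti, t_1050ti):.1f}% faster")
for pct in (35, 40):
    print(f"{pct}%: {runtime_if_more_throughput(t_750ti, pct):.0f} s "
          f"(throughput) or {runtime_if_less_time(t_750ti, pct):.0f} s (time)")
```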

Is there anything I need to adjust in my app_info.xml in order to leverage the 1050 Ti's full potential for SETI?
As far as I understand, the command-line options discussed in this thread apply only to the OpenCL application.
ID: 1879233
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 1824
Credit: 107,589,887
RAC: 461,468
Canada
Message 1879303 - Posted: 20 Jul 2017, 19:34:25 UTC - in response to Message 1879233.  

Linux questions are better asked here: http://setiathome.berkeley.edu/forum_thread.php?id=80636

I run -unroll autotune with -nobs; it uses a full CPU core but increases performance.
ID: 1879303
Stephen "Heretic" (Project Donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 2637
Credit: 48,577,573
RAC: 137,068
Australia
Message 1879388 - Posted: 21 Jul 2017, 6:07:16 UTC - in response to Message 1879233.  

. . Hi Hoosh,

. . I am running a 1050 Ti on a C2D E7600 rig using Linux and CUDA80; it is in an x16 slot and is the only GPU in that rig. I am not sure what differences may exist between our two setups, but my runtimes are as follows:

. . Normal AR Arecibo tasks ... 250 to 300 secs
. . Halflings (VHAR) ... 125 to 160 secs
. . GBT Blc05 tasks ... 480 to 520 secs.

. . I am at a loss to understand why yours are so much longer. The only 'tweaks' I know of for CUDA80 (special) are -nobs, to disable blocking sync, and -unroll autotune, which is very necessary with two different cards of different abilities, and which you say you have already set. If you want to use -nobs, be sure you have plenty of CPU resources for it to use: it will use 100% of a CPU core and then some, so with 2 x GPUs you would need three spare CPU cores to get the full benefit of it.

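Stephen's CPU budget for -nobs works out to one full core per GPU plus a little extra. The Python sketch below encodes that as a rule of thumb; treating "and then some" as exactly one extra core for the whole host is an assumption drawn from his 2-GPU/3-core example, not a documented requirement of the app.

```python
import os
from typing import Optional


def cores_reserved_for_nobs(n_gpus: int, headroom: int = 1) -> int:
    """Assumed rule of thumb: one core per GPU task running with -nobs,
    plus `headroom` extra core(s) for the 'and then some'."""
    return n_gpus + headroom


def cores_left_for_cpu_work(n_gpus: int, total_cores: Optional[int] = None) -> int:
    """Cores still free for CPU tasks after reserving cores for -nobs GPU tasks."""
    if total_cores is None:
        total_cores = os.cpu_count() or 1
    return max(total_cores - cores_reserved_for_nobs(n_gpus), 0)


if __name__ == "__main__":
    # Two GPUs, as in the mixed 1050 Ti / 750 Ti host discussed above.
    print(cores_reserved_for_nobs(2))
    print(cores_left_for_cpu_work(2))
```

The leftover count is roughly what you would express through BOINC's "use at most N% of the CPUs" computing preference.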
. . Good luck

Stephen

.
ID: 1879388
rob smith (Project Donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 15209
Credit: 252,548,772
RAC: 325,600
United Kingdom
Message 1879391 - Posted: 21 Jul 2017, 6:26:51 UTC

The default is blocking sync disabled, so the command line should look something like:
-unroll autotune


To use blocking sync (which can help reduce temperature problems), the command line becomes:
-bs -unroll autotune


Or, if you want to fiddle around, the unroll option will also take an integer value in the range 1 to ?? (possibly 64?):
-unroll x
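These variants, plus the -nobs and -pfb options mentioned elsewhere in this thread, can be combined by a small helper. Rob and petri33 differ on whether blocking sync is on by default (petri33 says the latest version enables it), so this hypothetical Python sketch takes the build's default as a parameter and only emits -bs or -nobs when the desired state differs. It does not enforce an upper bound for -unroll, because the thread leaves that open.

```python
from typing import Sequence, Union


def special_cmdline(unroll: Union[int, str] = "autotune",
                    want_blocking_sync: bool = False,
                    bs_on_by_default: bool = False,
                    extra: Sequence[str] = ()) -> str:
    """Assemble a command line for the CUDA80 special app (hypothetical helper).

    -bs turns blocking sync on and -nobs turns it off; a flag is emitted only
    when the wanted state differs from the build's default.
    """
    if unroll != "autotune" and (not isinstance(unroll, int) or unroll < 1):
        raise ValueError("unroll must be 'autotune' or a positive integer")
    parts = []
    if want_blocking_sync and not bs_on_by_default:
        parts.append("-bs")
    elif not want_blocking_sync and bs_on_by_default:
        parts.append("-nobs")
    parts += ["-unroll", str(unroll), *extra]
    return " ".join(parts)


if __name__ == "__main__":
    print(special_cmdline())                          # rob's default case
    print(special_cmdline(want_blocking_sync=True))   # rob's cooler-running case
    print(special_cmdline(bs_on_by_default=True, extra=("-pfb", "32")))  # petri33's build
```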


When I had two GTX1080 and a GTX980 in the same system (in the days before autotune) I had to use a compromise value for unroll which was OK for the GTX980, but too low for the GTX1080s - using the value for the GTX1080s resulted in the GTX980 not playing ball at all. It would appear that having unroll set too low is "OK", but too high is "very bad".
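Rob's rule of thumb (too low is "OK", too high is "very bad") implies that, without autotune, a shared unroll value should be the smallest one any card in the box tolerates. A Python sketch of that choice follows; the per-card values are purely hypothetical, since the thread gives none.

```python
def compromise_unroll(best_per_card: dict[str, int]) -> int:
    """Pick one unroll value for a mixed-GPU host.

    Takes the minimum of each card's preferred value: per the experience
    above, a value that is too low only costs speed on the faster cards,
    while one that is too high can stop a slower card from working at all.
    """
    if not best_per_card:
        raise ValueError("need at least one card")
    return min(best_per_card.values())


if __name__ == "__main__":
    # Hypothetical per-card preferences, for illustration only.
    host = {"GTX 1080 #1": 16, "GTX 1080 #2": 16, "GTX 980": 10}
    print(compromise_unroll(host))
```

petri33's later recommendation of -unroll autotune for mixed-GPU setups presumably avoids having to pick this compromise by hand.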
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1879391
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 1824
Credit: 107,589,887
RAC: 461,468
Canada
Message 1879396 - Posted: 21 Jul 2017, 6:56:44 UTC - in response to Message 1879391.  

Hey Rob, what are you running for 1080 cards? I'm curious about the specs, as I seem to outperform you.
I have EVGA 6188 Hybrids.

Maybe it's thermal throttling; I don't know.
ID: 1879396
rob smith (Project Donor)
Volunteer tester
Joined: 7 Mar 03
Posts: 15209
Credit: 252,548,772
RAC: 325,600
United Kingdom
Message 1879398 - Posted: 21 Jul 2017, 7:16:19 UTC

Just basic single-fan cards, no overclocking or anything like that, and the computer does spend a few hours a day off doing other things.
It may be thermal issues, as the "room" is currently at >40°C (outside it's <20°C); I need to sort out the air-con (again).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1879398
petri33 (Project Donor)
Volunteer tester
Joined: 6 Jun 02
Posts: 1465
Credit: 270,118,169
RAC: 304,007
Finland
Message 1879947 - Posted: 23 Jul 2017, 19:40:37 UTC

Hi, you could add the option -pfb 32.
The latest version has blocking sync on by default, and it can be disabled with -nobs.
The -unroll autotune option should give the best performance in mixed-GPU setups.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1879947

©2017 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.