Best way to get more processing from a GTX 1050 Ti alongside GTX 750 Ti / GTX 950

ralphw Volunteer tester Joined: 7 May 99 Posts: 78 Credit: 18,032,718 RAC: 38	Message 1872515 - Posted: 12 Jun 2017, 0:06:11 UTC
Hello,
I have three models of NVIDIA GPU, spread across two systems:
NVIDIA GTX 1050 Ti (1)
NVIDIA GTX 950 Ti (2)
NVIDIA GTX 750 Ti (3)
What's the best strategy for having the fast cards (950, 1050) process more data? Some clients have loop-unrolling options, but does running multiple work units on a GPU (by setting up an app_info.xml file) really take advantage of the extra CUDA cores?

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304	Message 1872520 - Posted: 12 Jun 2017, 0:44:46 UTC - in response to Message 1872515. Last modified: 12 Jun 2017, 0:49:32 UTC
"What's the best strategy for having the fast cards (950,1050) process more data?"
Run Petri's development application, or TBar's recent applications based on it.
"There are some clients that have loop unrolling options, but is running multiple workunits on a GPU - by setting up an app_info.xml file - really taking advantage of the extra CUDA cores?"
Running more than one WU at a time only benefits some high-end cards running the SoG application, or cards running the older CUDA applications. If you are running Petri's development application, or some of TBar's applications based on it, one WU at a time is best.
You don't need an app_info.xml to run multiple WUs (with the more recent BOINC managers). An app_info.xml lets you run a non-stock application, or a specific stock application.
When running the SoG application, there are several command-line values you can use to significantly boost output over the default settings.
Grant
Darwin NT
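
Grant's point that recent BOINC managers can run several WUs per GPU without an app_info.xml refers to BOINC's app_config.xml. As a rough sketch, assuming the SETI@home v8 app is named `setiathome_v8` (check the `<name>` in your client_state.xml; this is an assumption), the file for two tasks per GPU could be generated like this:

```python
import xml.etree.ElementTree as ET


def make_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Build an app_config.xml body that lets BOINC run several tasks per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be >= 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, verify locally
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims this fraction of a GPU: 0.5 lets BOINC fit 2 tasks per card.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    # CPU share budgeted per GPU task (a scheduling hint, not a hard limit).
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_app_config("setiathome_v8", tasks_per_gpu=2))
```

Per Grant's advice, `tasks_per_gpu=1` is the setting to keep when running Petri's special application.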

Tom M Volunteer tester Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462	Message 1872525 - Posted: 12 Jun 2017, 1:19:43 UTC
I was looking at the machines (8278492) you have listed. You have one with 3 GTX 1050 Ti's. It is listing a speed of 183.52 GFLOPS. Before I moved a GTX 750 Ti to another box, my machine had stabilized at 157.14 GFLOPS. The 1050 Ti is supposed to be up to 40% faster than a 750 Ti; the Ti version of the 1050 has more CUDA cores than the 750 Ti. And you have 3 of them. Unless you just started up that machine, your GFLOPS "should" be 'on the order of' 3 x 157 = 471 GFLOPS. So (I think) you aren't getting the production you are paying for. There are confounding factors in my data, but you should be getting north of 300 GFLOPS at least.
Since you are running Linux I can't offer any Linux-specific ideas, but it might make sense to post your equivalent of the Windows MB*SoG.txt command-line file in case we have some ideas on how to improve it. Here is the one I am running with the GTX 750 Ti:
-sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4
If your command line for the 1050 isn't something similar to this (or with larger parameters), that might be slowing it down.
I don't understand all I think I know, but I ran across a two-GTX 750 Ti setup where the command line had this for each card:
-tune 1 64 1 4 -tune 2 64 1 4
Apparently the first number refers to each card?
HTH,
Tom
A proud member of the OFA (Old Farts Association).
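
Tom's estimate is simple linear scaling. A minimal sketch of that arithmetic, using only the figures from his post (157.14 GFLOPS for one 750 Ti, an "up to 40%" uplift for the 1050 Ti) and assuming no overheads, also covers the mix Wiggo later identifies on that host (one 1050 Ti plus two 750 Ti's):

```python
# Figures quoted in Tom's post; the 40% uplift is his "up to" estimate, not a measurement.
GFLOPS_750TI = 157.14
UPLIFT_1050TI = 0.40
REPORTED = 183.52  # what the host page listed


def estimate(n_750ti: int, n_1050ti: int, uplift: float = UPLIFT_1050TI) -> float:
    """Naive linear estimate: every card adds its own GFLOPS, no overheads."""
    return n_750ti * GFLOPS_750TI + n_1050ti * GFLOPS_750TI * (1 + uplift)


cases = {
    "3 x 157 (floor)": estimate(3, 0),
    "3 x 1050 Ti": estimate(0, 3),
    "1 x 1050 Ti + 2 x 750 Ti": estimate(2, 1),
}
for label, gflops in cases.items():
    print(f"{label}: {gflops:.1f} GFLOPS; reported is {REPORTED / gflops:.0%} of that")
```

Linear scaling is an upper-bound style guess; the reported figure depends on how BOINC summarizes the host, as the next replies discuss.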

Tom M Volunteer tester Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462	Message 1872526 - Posted: 12 Jun 2017, 1:25:23 UTC - in response to Message 1872515.
"Hello, I have thee models of NVIDIA GPU, NVIDIA 1050 (1) Ti (1) NVIDIA GTX 950 Ti (2) NVIDIA GTX 750 Ti (3) Spread across two systems What's the best strategy for having the fast cards (950,1050) process more data?"
I would put the 3 GTX 750 Ti cards on one system and the GTX 950 Ti's on the 2nd system, and leave the 1050 out of the mix while you work out reasonable parameters and a location for it. Don't forget to set up the MB*SoG.txt command lines/files/parameters.
Tom
A proud member of the OFA (Old Farts Association).

Wiggo Joined: 24 Jan 00 Posts: 34744 Credit: 261,360,520 RAC: 489	Message 1872529 - Posted: 12 Jun 2017, 1:45:46 UTC
Tom, don't go by the computer's details, as they show the total number of GPUs but label them all with the model BOINC considers to be the most powerful. ;-) Look instead at the result details of a completed GPU task and you'll see that the rig listed with the 1050 Ti also has a pair of 750 Ti's in it, while the other just has a pair of 950s (I don't believe that Nvidia ever released a Ti version of the 950, and I can't see the 3rd 750 Ti anywhere).
ralphw, your rig with the 1050 Ti/750 Ti combo will have to be tuned to suit the 750 Ti's performance, as tuning for the 1050 Ti may not suit the 750 Ti's at all. I'm sure that someone will supply tuning settings for both of your rigs.
Cheers.

ralphw Volunteer tester Joined: 7 May 99 Posts: 78 Credit: 18,032,718 RAC: 38	Message 1872556 - Posted: 12 Jun 2017, 9:53:25 UTC - in response to Message 1872529.
Thanks. That is my primary configuration (a GTX 1050 Ti alongside two GTX 750 Ti cards). The third 750 Ti (from MSI) is currently in an inactive system.

Tom M Volunteer tester Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462	Message 1872583 - Posted: 12 Jun 2017, 16:03:38 UTC - in response to Message 1872529. Last modified: 12 Jun 2017, 16:09:59 UTC
"Tom, don't go by the computer's details as that only shows the number of GPU's listed by what BOINC considers to be the most powerful. ;-) ... ralphw your rig with the 1050Ti/750Ti combo will have to be tuned to suit the 750Ti's performance as tuning to the 1050Ti performance may not suit the 750Ti's at all and I'm sure that someone will supply those tuning settings for both of your rigs. Cheers."
Wiggo, I was replying about how to "maximize" production, based on the theory that putting identical cards in a single PC maximizes the production of those cards. He had offered a list of cards; I didn't look any further.
I just looked at 8278492, and I must say I can come close to the listed GFLOPS with a single GTX 750 Ti running Lunatics under Windows, so something is out of tune.
As for the "best" SoG command line for the GTX 750 Ti, I am using:
-sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4
I ran across the following variation for multiple-card machines:
-sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -tune 2 64 1 4 -tune 3 64 1 4 -period_iterations_num 4
I can't swear that the multiple -tune commands, one for each card, work exactly like that. Nor can I swear that whichever one turns out to be the 1050 shouldn't be different.
If you add -hp, it makes the system laggy, but it does seem to load up my GPU a bit more (GPU load goes from the low 90s to the high 90s).
I suspect that the above command line(s) MIGHT improve the overall production on the mixed 750/1050 machine.
Tom
A proud member of the OFA (Old Farts Association).
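
To compare command lines like Tom's single-card and multi-card variants, it helps to split them into options and values. A minimal sketch, assuming options are whitespace-separated tokens starting with "-" and that repeated options such as -tune should be kept in order; it deliberately does not interpret the first -tune number, since the thread has not confirmed whether it selects a card or something else:

```python
import shlex
from collections import defaultdict


def parse_cmdline(cmdline: str) -> dict[str, list[list[str]]]:
    """Group each "-option" with the values after it; repeated options are kept."""
    opts = defaultdict(list)
    current = None
    for tok in shlex.split(cmdline):
        # A leading "-" starts a new option unless the token is a negative number.
        if tok.startswith("-") and not tok.lstrip("-").isdigit():
            current = tok
            opts[current].append([])
        elif current is not None:
            opts[current][-1].append(tok)
    return dict(opts)


single = "-sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4"
multi = ("-sbs 512 -spike_fft_thresh 2048 -tune 1 64 1 4 -tune 2 64 1 4 "
         "-tune 3 64 1 4 -period_iterations_num 4")

for name, line in (("single", single), ("multi", multi)):
    tunes = parse_cmdline(line).get("-tune", [])
    # Key each -tune entry by its first value; its meaning is left open here.
    print(name, {t[0]: t[1:] for t in tunes})
```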

Mike Volunteer tester Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80	Message 1872585 - Posted: 12 Jun 2017, 16:17:31 UTC
For the 750 Ti's I suggest:
-sbs 684 -spike_fft_thresh 2048 -tune 1 64 1 4 -period_iterations_num 4
Tune 2 and tune 3 don't give much benefit on those cards, but this is host-dependent.
With each crime and every kindness we birth our future.

ralphw Volunteer tester Joined: 7 May 99 Posts: 78 Credit: 18,032,718 RAC: 38	Message 1873749 - Posted: 18 Jun 2017, 5:33:47 UTC - in response to Message 1872556.
I ended up moving an MSI GTX 750 Ti back into this system. I was expecting to put four GPUs on this motherboard, but apparently all of my 750 Ti cards need to be the shorter 5-6" models. Only the first slot of this Gigabyte motherboard really accommodates a full-length card such as MSI's dual-fan GTX 750 Ti; its fan shroud and length don't fit well around the other heat sinks and connectors. I will have to limit myself to the smaller form-factor GPUs to mechanically use all of my remaining motherboard slots. I'll see how well the WU averages keep up.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1873760 - Posted: 18 Jun 2017, 7:14:56 UTC - in response to Message 1872520. Last modified: 18 Jun 2017, 7:34:04 UTC
"Run Petri's development application, or TBar's recent applications based on it."
. . Petri's app, definitely.
"Running more than 1 WU is only of benefit for some high end cards running the SoG application, or for cards that are running the older CUDA applications. If running Petri's development application, or some of TBar's based on that application, 1 WU at a time is best."
. . Actually, Grant, both my GTX 950 and GTX 1050 Ti give/gave better results under SoG by running 2 at a time. But that is r3557; I can't speak for r3584 (v8.22).
Stephen
..

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1873762 - Posted: 18 Jun 2017, 7:24:03 UTC - in response to Message 1872525. Last modified: 18 Jun 2017, 7:39:59 UTC
"I was looking at the machines (8278492) you have listed. You have one with 3 Gtx 1050 Ti's. It is listing a speed of 183.52 GFLOPS. Before I moved a GTX 750 TI to another box, it had stabilized at 157.14 GFLOPS."
. . Hmm, something strange there. Oh, he has moved some cards around...
. . But if he is willing to tackle running CUDA80 with Petri's special app, it will definitely improve his productivity. It is not hard to do when he is already running Linux.
Stephen

TheHoosh Joined: 17 Aug 12 Posts: 12 Credit: 11,693,138 RAC: 0	Message 1879233 - Posted: 20 Jul 2017, 12:03:27 UTC
Two days ago I added a KFA² 1050 Ti to my main cruncher, which already houses a Palit 750 Ti StormX Dual (using the CUDA80 app under Linux, driver version 381.22). So far, everything is looking good. Running several hundred WUs for Milkyway and Einstein confirmed that the 1050 Ti is about 35-40% faster than the 750 Ti. However, with SETI the 1050 Ti runs only ~15% faster than the 750 Ti, which is odd given the Milkyway and Einstein results. I'm crunching only 1 WU per GPU and have set the unroll option to "autotune". My 750 Ti needs 640 s to crunch one WU, whereas the 1050 Ti needs 550 s, although I would expect it to be around 400 s per WU.
Is there anything I need to adjust in my app_info.xml to leverage the 1050 Ti's full potential for SETI? As far as I understand, the command-line options discussed in this thread apply only to the OpenCL application.
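
"Faster" can mean either more WUs per hour or less time per WU, and the two percentages differ. A minimal sketch using only TheHoosh's runtimes (640 s and 550 s) makes both readings explicit, along with what the 35-40% Milkyway/Einstein uplift would predict for SETI:

```python
def throughput_gain(slow_s: float, fast_s: float) -> float:
    """Extra WUs per hour from the faster card, as a fraction (0.16 = 16% more)."""
    return slow_s / fast_s - 1


def time_saved(slow_s: float, fast_s: float) -> float:
    """Fraction of wall time saved per WU (0.14 = 14% shorter)."""
    return 1 - fast_s / slow_s


def expected_runtime(slow_s: float, gain: float) -> float:
    """Runtime per WU if the fast card really delivered `gain` more throughput."""
    return slow_s / (1 + gain)


print(f"observed: {throughput_gain(640, 550):.1%} more WUs/hour, "
      f"{time_saved(640, 550):.1%} less time per WU")
for gain in (0.35, 0.40):
    print(f"if {gain:.0%} faster: {expected_runtime(640, gain):.0f} s per WU")
print(f"400 s per WU would mean {throughput_gain(640, 400):.0%} faster")
```

On these numbers a 35-40% uplift would put the 1050 Ti at roughly 457-474 s per WU; 400 s would correspond to a 60% uplift.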

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1879303 - Posted: 20 Jul 2017, 19:34:25 UTC - in response to Message 1879233.
Linux questions are better asked here: http://setiathome.berkeley.edu/forum_thread.php?id=80636
I run autotune and -nobs; it uses a full CPU core but increases performance.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1879388 - Posted: 21 Jul 2017, 6:07:16 UTC - in response to Message 1879233.
. . Hi Hoosh,
. . I am running a 1050 Ti on a C2D E7600 rig using Linux and CUDA80; it is in an x16 slot and is the only GPU in that rig. I am not sure what differences exist between our two setups, but my runtimes are as follows:
. . Normal AR Arecibo tasks ... 250 to 300 secs
. . Halflings (VHAR) ... 125 to 160 secs
. . GBT Blc05 tasks ... 480 to 520 secs
. . I am at a loss to understand why yours are so much longer. The only 'tweaks' I know of for CUDA80 (special) are -nobs, to disable blocking sync, and -unroll autotune, which is very necessary with two cards of different abilities and which you say you have already set. If you want to use -nobs, be sure you have plenty of CPU resources for it: it will use 100% of a CPU core and then some, so with 2 GPUs you would need three spare CPU cores to get the full benefit.
. . Good luck
Stephen
.
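
Stephen's CPU budget for -nobs can be written as a quick check. This is a sketch under an assumed reading of his rule of thumb ("2 GPUs need three spare cores"): one full core per GPU task plus one extra core of headroom. The host tuples are hypothetical examples, not measurements:

```python
def spare_cores_for_nobs(gpus: int, headroom: int = 1) -> int:
    """Cores to keep free when every GPU task runs with -nobs.

    Assumed reading of Stephen's rule of thumb: with blocking sync disabled,
    each GPU task keeps a full core busy, plus `headroom` core(s) for the
    "and then some".
    """
    return gpus + headroom


def nobs_fits(total_cores: int, cpu_tasks: int, gpus: int) -> bool:
    """True if the cores left after CPU crunching cover the -nobs budget."""
    return total_cores - cpu_tasks >= spare_cores_for_nobs(gpus)


# Hypothetical hosts: (total cores, cores running CPU tasks, GPUs).
for host in [(2, 0, 1), (2, 1, 1), (4, 1, 2), (4, 2, 2)]:
    print(host, "ok for -nobs" if nobs_fits(*host) else "keep blocking sync")
```

On a two-core machine like the C2D E7600 with one GPU, this rule leaves no room for CPU tasks once -nobs is on.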

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1879391 - Posted: 21 Jul 2017, 6:26:51 UTC
The default is blocking sync disabled, so the command line should look something like:
-unroll autotune
To use blocking sync (which can help reduce temperature problems), the command line becomes:
-bs -unroll autotune
Or, if you want to fiddle around, the unroll option will also take an integer value in the range 1 to ?? (possibly 64?):
-unroll x
When I had two GTX 1080s and a GTX 980 in the same system (in the days before autotune), I had to use a compromise unroll value, which was OK for the GTX 980 but too low for the GTX 1080s; using the value for the GTX 1080s resulted in the GTX 980 not playing ball at all. It would appear that having unroll set too low is "OK", but too high is "very bad".
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
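
Rob's experience suggests a simple rule for a fixed unroll value on a mixed rig: because too low is tolerable and too high is not, take the smallest value any card handles. A minimal sketch of that rule plus the command-line forms he lists; the per-card numbers in `rig` are hypothetical placeholders, not recommendations:

```python
def compromise_unroll(best_per_card: dict[str, int]) -> int:
    """Single -unroll value for a mixed rig.

    Too low is "OK" but too high is "very bad", so use the smallest value
    any card in the machine tolerates.
    """
    if not best_per_card:
        raise ValueError("need at least one card")
    return min(best_per_card.values())


def special_cmdline(unroll="autotune", blocking_sync: bool = False) -> str:
    """Command line in the form shown above: [-bs] -unroll <autotune|x>."""
    parts = ["-bs"] if blocking_sync else []
    parts += ["-unroll", str(unroll)]
    return " ".join(parts)


# Hypothetical per-card values; real ones would come from testing each card.
rig = {"GTX 1080 #1": 12, "GTX 1080 #2": 12, "GTX 980": 8}
print(special_cmdline())
print(special_cmdline(compromise_unroll(rig), blocking_sync=True))
```

With autotune available, the fixed-value fallback is only needed on builds that lack it.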

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1879396 - Posted: 21 Jul 2017, 6:56:44 UTC - in response to Message 1879391.
Hey Rob, what are you running for 1080 cards? I'm curious about the specs, as I seem to outperform you. I have EVGA 6188 Hybrids. Maybe it's thermal throttling; I don't know.

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1879398 - Posted: 21 Jul 2017, 7:16:19 UTC
Just basic single-fan cards, no overclocking or anything like that, and the computer does spend a few hours a day off doing other things. It may be thermal issues, as the "room" is currently at >40C (outside it's <20C). I need to sort out the air-con (again).
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1879947 - Posted: 23 Jul 2017, 19:40:37 UTC
Hi, you could add the option -pfb 32.
The latest version has blocking sync on by default, and it can be disabled with -nobs.
-unroll autotune should give the best performance in mixed-GPU setups.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
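
TheHoosh asked what to change in app_info.xml. Assuming the special app takes its options from the `<cmdline>` element of an `<app_version>` (BOINC's anonymous-platform app_info.xml allows one), a minimal sketch for writing petri33's suggested options into the file could look like this. `set_cmdline` is a hypothetical helper, and `SAMPLE` is a heavily trimmed, hypothetical fragment; a real app_info.xml also carries file and GPU entries:

```python
import xml.etree.ElementTree as ET


def set_cmdline(app_info_xml: str, options: list[str]) -> str:
    """Write `options` into the <cmdline> of every <app_version>, replacing any old one."""
    root = ET.fromstring(app_info_xml)
    for version in root.iter("app_version"):
        cmd = version.find("cmdline")
        if cmd is None:
            cmd = ET.SubElement(version, "cmdline")
        cmd.text = " ".join(options)
    return ET.tostring(root, encoding="unicode")


# petri33's suggestions. Drop "-nobs" if you lack a spare core per GPU:
# in the latest build blocking sync is on by default and -nobs turns it off.
OPTIONS = ["-nobs", "-pfb", "32", "-unroll", "autotune"]

# Hypothetical, heavily trimmed fragment; not a complete app_info.xml.
SAMPLE = """<app_info>
  <app><name>setiathome_v8</name></app>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
  </app_version>
</app_info>"""

print(set_cmdline(SAMPLE, OPTIONS))
```

Re-running the helper replaces the existing `<cmdline>` rather than adding a second one, so it is safe to use while experimenting with options.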

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.