Why is my card so slow? (GTX750)

qbit Volunteer tester Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0	Message 1673866 - Posted: 4 May 2015, 17:14:06 UTC I just discovered this today: http://setiathome.berkeley.edu/workunit.php?wuid=1778912613 My wingman's computer, also equipped with a GTX750, did this task in less than 60% of the time it took my machine to crunch it. I wonder why there's so much difference and why my machine is so slow. OK, my wingman's GPU has more memory and a slightly higher clock speed; on the other hand, I am running the Lunatics apps and he isn't. So why is his machine so much faster? Ofc, apart from the GPU, his computer is much better (and newer). Is it possible that my rather old CPU and slow memory are the problem? Or the mainboard? There has to be a bottleneck somewhere that slows the GPU down. Any other thoughts? And Garrett, if you happen to read this: do you also run 2 instances on your card?

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1673948 - Posted: 4 May 2015, 20:55:43 UTC - in response to Message 1673866. From a look at both of your machines and the stock priority settings you are both using, I would guess that either your machine was busy doing something else while you were processing that task or, more likely, his CPU and memory are much better at feeding the GPU than yours. Your times for tasks in a similar AR range are not out of the ordinary. You could improve your times a bit by running a customized MBCUDA.CFG file and bumping up your priority or blocks per launch. Try some values higher than the stock 4, maybe 8 or 10. I really don't see anything to worry about. Cheers, Keith Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association)

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1673961 - Posted: 4 May 2015, 21:49:58 UTC They could be running only 1 GPU task, while you might be running 2 or 3 tasks. That makes a big difference in run times.

Sutaru Tsureku Volunteer tester Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5	Message 1673966 - Posted: 4 May 2015, 22:05:16 UTC - in response to Message 1673866. Last modified: 4 May 2015, 22:09:12 UTC NX-01 wrote: "And Garrett, when you maybe read this here: Do you run 2 instances on your card also?" Send a PM to Garrett and you will know ... So far this service is free. ;-) How many SETI tasks do you run simultaneously on your GTX750? If just one ... Judging by the times, I guess he runs 2 tasks simultaneously (if he uses the app_config.xml file). Or do you also run 1 SETI and 1 AP task simultaneously on your graphics card? Then the AP task could slow down the SETI task calculation - I guess ...

qbit Volunteer tester Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0	Message 1674204 - Posted: 5 May 2015, 21:19:09 UTC Short explanation: This computer was my daily machine and is now a dedicated cruncher. It runs a fresh, clean installation of Windows 7, and besides BOINC there are just 2 more programs installed. The first is Chrome, which I need to control the machine remotely, and the second is Panda, a cloud-based AV that uses very little resources. So the machine wasn't used for anything else while the WU was crunched, but maybe some tasks from Windows and/or Panda were running in the background. @Keith: Yes, I think I should bump up the priority a bit. Can you tell me more about optimizing my MBCuda config file? I use a custom command line for AP but never tuned anything for MB. @Brent: Ofc that's a possibility. But I guess most people run 2 on those cards. @Dirk: Yes, 2 tasks at a time. But no AP was running while I crunched this WU. Maybe it was really just coincidence, or maybe my wingman really runs just one task on his card. I will keep checking my results against other 750s from time to time. Thx everybody!

rob smith Volunteer moderator Volunteer tester Joined: 7 Mar 03 Posts: 22205 Credit: 416,307,556 RAC: 380	Message 1674206 - Posted: 5 May 2015, 21:29:55 UTC If the other person is running one task per GPU, that would explain a lot of the difference. Generally, while running two-per results in a higher throughput, it does mean that each task takes longer to run compared with running one-per on the same hardware. The difference can be anything from a few percent at the very top end to nearly double at the bottom end of GPUs. This means that while each task takes longer, the overall throughput is higher. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
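The trade-off described above can be checked with a little arithmetic. The sketch below is only an illustration: the task times are made-up numbers, not measurements from this thread, and `throughput_per_hour` is a hypothetical helper name.

```python
def throughput_per_hour(task_seconds: float, concurrent_tasks: int) -> float:
    """Tasks finished per hour when `concurrent_tasks` run side by side
    and each one takes `task_seconds` of wall-clock time."""
    return concurrent_tasks * 3600.0 / task_seconds


# Hypothetical timings: one-per takes 1000 s per task,
# two-per takes 1600 s per task (each task is 1.6x slower).
one_per = throughput_per_hour(task_seconds=1000.0, concurrent_tasks=1)
two_per = throughput_per_hour(task_seconds=1600.0, concurrent_tasks=2)

print(f"one-per: {one_per:.2f} tasks/hour")
print(f"two-per: {two_per:.2f} tasks/hour")
print(f"throughput gain from two-per: {two_per / one_per:.2f}x")
```

With these assumed numbers each task runs 60% longer, yet the card still finishes more tasks per hour. If the per-task time were to double exactly, the gain would disappear, which is why timing both setups on your own hardware is worthwhile.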

Keith Myers Volunteer tester Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873	Message 1674248 - Posted: 5 May 2015, 23:29:30 UTC - in response to Message 1674204. qbit wrote: "@Keith: Yes, I think I should bump up the priority a bit. Can you tell me more about optimizing my MBCuda config file? I use a custom command line for AP but never tuned anything for MB." There is a nice little utility called SetiPerformance over at eFMer's BoincTasks web site. It runs through some test files with 1-4 tasks per GPU and times the results. You choose whichever utilization gives the best throughput on your system. If you are running the Lunatics optimized applications, you can simply open the MBCUDA.cfg file in Notepad and read the explanation in the file on how to adjust the parameters. The file is referenced in an app_info system configuration. I can't remember whether you are running stock applications or not. The stock 41zc application runs at below normal priority, and you can boost that to above normal in the MBCUDA file. You can also raise the pfblockspersm value from the stock 4 to something appropriate to your 750, like 8 or 10. You can also raise the pfperiodsperlaunch value from the stock 100 to 200. These tweaks can really help the GPU throughput of a dedicated cruncher. Another way to boost the stock application priority when not running the optimized platform is to use a utility like ProcessLasso to raise the application priority to above normal, or just to normal, instead of the stock below normal. You can also do that in Task Manager, but that applies only to that one instance of the process. You would have to use the Process Priority Saver utility to make the new elevated priority permanent for each time the process is started. You might like to experiment with these settings and see if you can reduce the run times of a GPU task and boost the system throughput.
http://efmer.com/forum/index.php?topic=974.0 Cheers, Keith
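As a rough illustration of the values mentioned above, the sketch below builds the text of a small config file with Python's `configparser`. The `pfblockspersm` and `pfperiodsperlaunch` names and values come from the post; the `[mbcuda]` section name and the `processpriority` key are assumptions made for this example, so check the comments inside your own MBCUDA.cfg for the exact spelling before copying anything.

```python
import configparser
import io


def build_mbcuda_cfg(priority: str, blocks_per_sm: int, periods_per_launch: int) -> str:
    """Return the text of a small INI-style config.
    Section name and 'processpriority' key are assumptions, not confirmed by the thread."""
    cfg = configparser.ConfigParser()
    cfg["mbcuda"] = {
        "processpriority": priority,                      # assumed key name
        "pfblockspersm": str(blocks_per_sm),              # stock value is 4 per the post
        "pfperiodsperlaunch": str(periods_per_launch),    # stock value is 100 per the post
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


text = build_mbcuda_cfg("abovenormal", blocks_per_sm=8, periods_per_launch=200)
print(text)
```

Changing one value at a time and comparing run times over several tasks in a similar AR range makes it easier to tell which tweak actually helped.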

qbit Volunteer tester Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0	Message 1674422 - Posted: 6 May 2015, 17:23:49 UTC Thx again Keith!

qbit Volunteer tester Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0	Message 1697423 - Posted: 1 Jul 2015, 18:58:34 UTC Here's another one: https://setiathome.berkeley.edu/workunit.php?wuid=1833332701 I just don't get it. Yes, it's a Ti and he's probably running just 1 task at a time, but still there shouldn't be that much difference. Look at the device peak: it's 582 GFLOPS for me and 2183 (!!) GFLOPS for my wingman. WTF? It can't be oc'd that much, can it? He also seems to use an older version of the AP app.

HAL9000 Volunteer tester Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1697440 - Posted: 1 Jul 2015, 20:03:06 UTC - in response to Message 1697423. Last modified: 1 Jul 2015, 20:11:11 UTC Well, if you look at the task run times:
               Run time (s)   CPU time (s)
Your machine       5,642.90         104.90
Their machine      1,951.03       1,946.46
With 99.765% of their task run time being CPU time, it would seem their CPU is faster than the GPU they are trying to use.
EDIT: Looking at the MB run times for both machines, they are similar for normal AR tasks, at 800-1500 seconds run time. Also, looking at the average processing rate for the two machines, you are likely running twice as many tasks at once as they are.
               AstroPulse v7    SETI@home v7
Your machine   398.68 GFLOPS    104.43 GFLOPS
Their machine  819.38 GFLOPS    203.57 GFLOPS
SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
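The comparison in the post above can be reproduced from the quoted figures. This is a minimal sketch that only redoes the arithmetic; `cpu_fraction` is a hypothetical helper name, and the numbers are the ones given in the post.

```python
def cpu_fraction(run_time_s: float, cpu_time_s: float) -> float:
    """Share of a task's wall-clock run time that was spent as CPU time."""
    return cpu_time_s / run_time_s


# Run time and CPU time in seconds, as quoted in the post.
yours = cpu_fraction(run_time_s=5642.90, cpu_time_s=104.90)
theirs = cpu_fraction(run_time_s=1951.03, cpu_time_s=1946.46)

# Average processing rates in GFLOPS, as quoted in the post.
ap_ratio = 819.38 / 398.68   # AstroPulse v7, their machine vs. yours
mb_ratio = 203.57 / 104.43   # SETI@home v7, their machine vs. yours

print(f"CPU share of run time, your machine:  {yours:.1%}")
print(f"CPU share of run time, their machine: {theirs:.1%}")
print(f"APR ratio (theirs / yours): AP {ap_ratio:.2f}x, MB {mb_ratio:.2f}x")
```

Both processing-rate ratios come out close to 2, which fits the suggestion that one machine runs about twice as many tasks at once as the other.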

castor Joined: 2 Jan 02 Posts: 13 Credit: 17,721,708 RAC: 0	Message 1697443 - Posted: 1 Jul 2015, 20:31:01 UTC Last modified: 1 Jul 2015, 20:37:08 UTC I also have a 750 Ti running AP under Linux, and they seem to be about 20% faster. But I have the card in a slower PCIe slot and use a command line with somewhat fewer optimizations, so that's not too surprising.

HAL9000 Volunteer tester Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1697451 - Posted: 1 Jul 2015, 20:59:37 UTC - in response to Message 1697441. Quote from Message 1697441: "The main difference is that the faster one is not using sleep (Sleep() & wait for event loops disabled). The slower one is using sleep (Sleep() & wait for event loops will be used in some places), which lowers CPU usage but makes the tasks take a lot longer. Edit: OpenCL on Nvidia takes a full CPU core per task unless you are using the -use_sleep option. And using -use_sleep will punish you with substantially longer run times." Ah yes. The 100% CPU thing on NV cards. I forget about that sometimes.

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1697491 - Posted: 1 Jul 2015, 23:26:41 UTC Well, I can say that with my AMD 4200+ (w/ free core) and my i5 (w/o free core), with identical 750 Tis and the same settings ... my i5 runs about 10% faster. So YES, the feeder DOES make a difference!

betreger Joined: 29 Jun 99 Posts: 11361 Credit: 29,581,041 RAC: 66	Message 1697499 - Posted: 2 Jul 2015, 0:31:32 UTC - in response to Message 1697491. I wonder what your 750 Ti would do with a free core on the i5.

Zalster Volunteer tester Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1697503 - Posted: 2 Jul 2015, 0:42:01 UTC - in response to Message 1697499. I leave 2 cores (1 real and 1 virtual) free on my i7. I don't use the percentage in the pull-down folder or the web-based preferences. I've never bought into the idea that when we set those to, say, 87.5% of all cores (thereby leaving 2 untouched), the computer will somehow override this restriction and use 1 of those 2 untouched free cores to feed the GPUs. More likely, it's going to find percentages somewhere in that 87.5% to use for feeding the GPUs... I went a different route and used a max concurrent in my app_config.xml to restrict the total number of tasks that could be running at any one time, thereby making sure 2 cores weren't being used for crunching and could be utilized by the computer to feed the core. I've noticed my times to complete improved dramatically for both CPU and GPU work units. My 2 cents (apologies to the original author of this thread, didn't mean to hijack it).
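For readers who have not used a "max concurrent" limit, the sketch below generates a small app_config.xml with Python's standard `xml.etree.ElementTree`. The element names (`app_config`, `app`, `name`, `max_concurrent`, `gpu_versions`, `gpu_usage`, `cpu_usage`) follow BOINC's app_config.xml format as I understand it; the application name and all numbers are placeholders chosen for illustration, not Zalster's actual settings.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, max_concurrent: int,
                     gpu_usage: float, cpu_usage: float) -> str:
    """Build app_config.xml text that caps concurrent tasks for one application."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    # Upper limit on simultaneously running tasks of this application.
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    gpu = ET.SubElement(app, "gpu_versions")
    # 0.5 GPU per task means two tasks share one card.
    ET.SubElement(gpu, "gpu_usage").text = str(gpu_usage)
    # CPU share reserved for feeding each GPU task.
    ET.SubElement(gpu, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Placeholder values: app name and limits are assumptions for this example.
xml_text = build_app_config("setiathome_v7", max_concurrent=6,
                            gpu_usage=0.5, cpu_usage=1.0)
print(xml_text)
```

The file belongs in the project's directory under the BOINC data folder, and the client needs to re-read its config files (or be restarted) before the limit takes effect.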

Brent Norman Volunteer tester Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1697511 - Posted: 2 Jul 2015, 1:13:43 UTC My i5 does do marginally better work on the 750 Ti with a free core, but I feel that core does more work than I gain. If I were running more than 1 card, it would most likely be a benefit.

qbit Volunteer tester Joined: 19 Sep 04 Posts: 630 Credit: 6,868,528 RAC: 0	Message 1697637 - Posted: 2 Jul 2015, 6:43:16 UTC Good morning folks! I always thought that use_sleep on Nvidia cards doesn't make much difference in runtime. If it really does, why would most ppl recommend it? @Hal: That's interesting, I didn't check the averages. But with our machines being similar there (taking into account that he runs just one task at a time while I run two), how the hell can he get so much performance on this particular task? BTW everybody: I'm not complaining about my card, I like it a lot. I'm just trying to understand things better so I can get the max out of it.

BetelgeuseFive Volunteer tester Joined: 6 Jul 99 Posts: 158 Credit: 17,117,787 RAC: 19	Message 1697666 - Posted: 2 Jul 2015, 8:37:27 UTC - in response to Message 1697637. If you use use_sleep when running only one task, your GPU will not be fully used. However, if you run multiple tasks at the same time, you can use use_sleep and your GPU will still be at 99% (and you can use your CPU to do something else). BTW, what settings do you use? Your GTX-750 seems a little bit faster than mine (I am running two tasks at the same time). Tom

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1697671 - Posted: 2 Jul 2015, 9:06:25 UTC Last modified: 2 Jul 2015, 9:08:17 UTC Also check whether your GPU memory clock runs at the advertised rate. We've been able to collectively work out that CUDA will only push the GPU to the P2 power state, while a truly stable card/system can run the task with P3 clocks forced. NVIDIA Inspector can be used to observe and correct this. The GTX 750 should be memory bound in theory, so this may have quite a fair impact with current applications. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

HAL9000 Volunteer tester Joined: 11 Sep 99 Posts: 6534 Credit: 196,805,888 RAC: 57	Message 1697719 - Posted: 2 Jul 2015, 13:29:49 UTC - in response to Message 1697637. I did find that the SETI@home performance of my HD6870 went up considerably when I upgraded from a Core 2 Duo E8400 to an i5-4670K. I'm not sure which aspect of the newer system caused the increase. The CPU clock moving from 3.0GHz to 3.4GHz does not seem like it would be enough, at least not alone, to account for the GPU performance increase.
