Why is my card so slow? (GTX750)

qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1673866 - Posted: 4 May 2015, 17:14:06 UTC

I just discovered this today:
http://setiathome.berkeley.edu/workunit.php?wuid=1778912613

The computer of my wingman, also equipped with a GTX750, did this task in less than 60% of the time it took my machine to crunch it.

I wonder why there's so much difference and why my machine is so slow. OK, my wingman's GPU has more memory and a slightly higher clock speed; on the other hand, I am running the Lunatics apps and he isn't. So why is his machine so much faster?

Of course, besides the GPU his computer is much better (and newer) overall. Is it possible that my rather old CPU and slow memory are the problem? Or the mainboard? There has to be a bottleneck somewhere that slows the GPU down. Any other thoughts?

And Garrett, if you happen to read this: do you run 2 instances on your card as well?
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1673948 - Posted: 4 May 2015, 20:55:43 UTC - in response to Message 1673866.  

From a look at both of your machines and the stock priority settings you are both using, I would guess either your machine was busy doing something else while it was processing that task, or, yes, his CPU and memory are much better at feeding the GPU than yours. Your times for similar AR-range tasks are not out of the ordinary. You could improve your times a bit by running a customized MBCUDA.cfg file and bumping up your priority or blocks per launch; try some higher values than the stock 4, maybe 8 or 10. I really don't see anything to worry about.

Cheers, Keith
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1673961 - Posted: 4 May 2015, 21:49:58 UTC

They could be running only 1 GPU task at a time, while you might be running 2 or 3. That makes a big difference in run times.
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1673966 - Posted: 4 May 2015, 22:05:16 UTC - in response to Message 1673866.  
Last modified: 4 May 2015, 22:09:12 UTC

NX-01 wrote:
[...] And Garrett, if you happen to read this: do you run 2 instances on your card as well?

Send a PM to Garrett and you'll know ...
So far this service is free. ;-)

How many SETI tasks do you run simultaneously on your GTX750? If just one ...
Judging by the times, I guess he runs 2 tasks simultaneously (if he uses an app_config.xml file - sketched below).

Or do you also run 1 SETI and 1 AP task simultaneously on your card?
Then the AP task could slow down the SETI task calculation - I guess ...
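
For reference, running 2 tasks at a time on the GPU is usually done with an app_config.xml in the project folder. A minimal sketch only - the app name and CPU share here are assumptions, so check them against your own client_state.xml:

    <app_config>
       <app>
          <name>setiathome_v7</name>              <!-- assumed app name -->
          <gpu_versions>
             <gpu_usage>0.5</gpu_usage>           <!-- 0.5 of a GPU per task = 2 tasks per card -->
             <cpu_usage>0.04</cpu_usage>          <!-- CPU share budgeted per GPU task -->
          </gpu_versions>
       </app>
    </app_config>

The client picks it up after a restart, or via the Manager's read-config option.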
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1674204 - Posted: 5 May 2015, 21:19:09 UTC

Short explanation: this computer used to be my daily machine and is now a dedicated cruncher. It runs a fresh, clean installation of Windows 7, and besides BOINC there are just 2 more programs installed. The first is Chrome, which I need to control the machine remotely, and the second is Panda, a cloud-based AV that uses very few resources. So the machine wasn't being used for anything else while the WU was crunched, but maybe some tasks from Windows and/or Panda were running in the background.

@Keith: Yes, I think I should bump up the priority a bit. Can you tell me more about optimizing my MBCUDA config file? I use a custom command line for AP but have never tuned anything for MB.

@Brent: Of course that's a possibility. But I guess most people run 2 on those cards.

@Dirk: Yes, 2 tasks at a time. But no AP was running while I crunched this WU.


Maybe it really was just a coincidence, or maybe my wingman really does run just one task on his card. I will keep checking my results against other 750s from time to time.

Thx everybody!
rob smith · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1674206 - Posted: 5 May 2015, 21:29:55 UTC

If the other person is running one-per-GPU, then that would explain a lot of the difference. Running two-per gives a higher overall throughput, but each task takes longer than it would one-per on the same hardware; the per-task slowdown can be anything from a few percent on top-end GPUs to nearly double at the bottom end. For example, if a card does one task in 20 minutes running one-per but 35 minutes per task running two-per, it still completes two tasks every 35 minutes, which is the higher throughput.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13161
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1674248 - Posted: 5 May 2015, 23:29:30 UTC - in response to Message 1674204.  


@Keith: Yes, I think I should bump up the priority a bit. Can you tell me more about optimizing my MBCuda config file? I use a custom command line for AP but never tuned anything for MB.


There is a nice little utility called SetiPerformance over at eFMer's BoincTasks web site. It runs some test files at 1-4 tasks per GPU and times the results, so you can choose whichever utilization gives the fastest throughput on your system.

If you are running the Lunatics optimized applications, you can simply open the MBCUDA.cfg file in Notepad and read the explanation in the file of how to adjust the parameters. The file is referenced in an app_info (anonymous platform) configuration; I can't remember whether you are running stock applications or not. The x41zc application runs at below-normal priority by default, and you can boost that to above normal in the MBCUDA file. You can also raise the pfblockspersm value from the stock 4 to something appropriate to your 750, like 8 or 10, and the pfperiodsperlaunch value from the stock 100 to 200. These tweaks can really help the GPU throughput of a dedicated cruncher.
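
As a sketch, the relevant lines in MBCUDA.cfg would end up something like this - treat the numbers as starting points to test, not known-good values, and check the comments in your own copy of the file for the allowed ranges:

    [mbcuda]
    processpriority = abovenormal   ;default is belownormal
    pfblockspersm = 8               ;stock is 4; try 8 or 10 on a 750
    pfperiodsperlaunch = 200        ;stock is 100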

Another way to boost the stock application's priority level, when not running the optimized platform, is to use a utility like Process Lasso to raise it to normal or above normal instead of the stock below normal. You can also do that in Task Manager, but that change applies only to the one running instance of the process; you would have to use the Process Priority Saver utility to make the elevated priority stick each time the process is started.
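
If you would rather script that than click through Task Manager, the built-in wmic tool can set it from a command prompt. A sketch only - the executable name is a placeholder for whatever MB app is actually running on your host, and 32768 is Windows' numeric code for "above normal":

    wmic process where name="Lunatics_x41zc_win32_cuda50.exe" CALL setpriority 32768

Like Task Manager, this only affects the running instance, so it has to be reapplied (or scripted) for each new task.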

You might like to experiment with adjusting these settings and see if you can reduce the runtimes of a GPU task and boost the system throughput.

http://efmer.com/forum/index.php?topic=974.0

Cheers, Keith
Seti@Home classic workunits: 20,676 CPU time: 74,226 hours

A proud member of the OFA (Old Farts Association)
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1674422 - Posted: 6 May 2015, 17:23:49 UTC

Thx again Keith!
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1697423 - Posted: 1 Jul 2015, 18:58:34 UTC

Here's another one:

https://setiathome.berkeley.edu/workunit.php?wuid=1833332701

I just don't get it. Yes, it's a Ti and he's probably running just 1 task at a time, but there still shouldn't be that much difference. Look at the device peak: it's 582 GFLOPS for me and 2183 (!!) GFLOPS for my wingman. WTF? It can't be OC'd that much, can it? He also seems to use an older version of the AP app.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1697440 - Posted: 1 Jul 2015, 20:03:06 UTC - in response to Message 1697423.  
Last modified: 1 Jul 2015, 20:11:11 UTC

qbit wrote:
[...] Look at the device peak: it's 582 GFLOPS for me and 2183 (!!) GFLOPS for my wingman. WTF? It can't be OC'd that much, can it?

Well, if you look at the task run times:

Run time    CPU time
5,642.90    104.90      Your machine
1,951.03    1,946.46    Their machine

With 99.765% of their task's run time spent on the CPU, it would seem their CPU is faster than the GPU they are trying to use.

EDIT: Looking at the MB run times for both machines, they are similar for normal AR tasks, with run times of 800-1500 seconds. Also, looking at the average processing rates for the two machines, you are likely running twice as many tasks at once as they are:

AstroPulse v7    SETI@home v7
398.68 GFLOPS    104.43 GFLOPS    Your machine
819.38 GFLOPS    203.57 GFLOPS    Their machine

SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
castor
Joined: 2 Jan 02
Posts: 13
Credit: 17,721,708
RAC: 0
Finland
Message 1697443 - Posted: 1 Jul 2015, 20:31:01 UTC
Last modified: 1 Jul 2015, 20:37:08 UTC

I also have a 750Ti running AP under Linux, and they seem to be about 20% faster. But I have the card in a slower PCIe slot and a command line with a bit fewer optimizations, so it's not too surprising.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1697451 - Posted: 1 Jul 2015, 20:59:37 UTC - in response to Message 1697441.  

From Message 1697441:
The main difference is that the faster one is not using sleep (Sleep() and wait-for-event loops disabled). The slower one is using sleep (Sleep() and wait-for-event loops used in some places), which lowers CPU usage but makes the tasks take a lot longer.

Edit: OpenCL on Nvidia takes a full CPU core per task unless you are using the -use_sleep command, and using -use_sleep will punish you with substantially longer run times.

Ah yes. The 100% CPU thing on NV cards. I forget about that sometimes.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1697491 - Posted: 1 Jul 2015, 23:26:41 UTC

Well, I can say: my AMD 4200+ (with a free core) and my i5 (without a free core), with identical 750Tis and the same settings ... the i5 runs about 10% faster. So YES, the feeder DOES make a difference!
betreger · Project Donor
Joined: 29 Jun 99
Posts: 11360
Credit: 29,581,041
RAC: 66
United States
Message 1697499 - Posted: 2 Jul 2015, 0:31:32 UTC - in response to Message 1697491.  

Well, I can say: my AMD 4200+ (with a free core) and my i5 (without a free core), with identical 750Tis and the same settings ... the i5 runs about 10% faster. So YES, the feeder DOES make a difference!

I wonder what that 750Ti would do with a free core on the i5.
Zalster · Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1697503 - Posted: 2 Jul 2015, 0:42:01 UTC - in response to Message 1697499.  

I leave 2 cores (1 real and 1 virtual) free on my i7.

I don't use the percentage setting in the pull-down or the web-based preferences.

I've never bought into the idea that when we set that to, say, 87.5% of all cores (thereby leaving 2 untouched), the computer will somehow override the restriction and use 1 of those 2 free cores to feed the GPUs. More likely, it's going to find cycles somewhere within that 87.5% to use for feeding the GPUs...

I went a different route and used max_concurrent in my app_config.xml to restrict the total number of tasks that can run at any 1 time (sketched below), thereby making sure 2 cores weren't being used for crunching and could be used by the computer to feed the GPUs. I've noticed my times to complete have improved dramatically for both CPU and GPU work units.
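
A sketch of that app_config.xml approach - the app name and the limit of 6 are assumptions, pick whatever leaves your 2 cores free:

    <app_config>
       <app>
          <name>setiathome_v7</name>            <!-- assumed app name -->
          <max_concurrent>6</max_concurrent>    <!-- on an 8-thread i7, leaves 2 threads unused -->
       </app>
    </app_config>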

My 2 cents

(apologies to the original author of this thread, didn't mean to hijack it)
Brent Norman · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1697511 - Posted: 2 Jul 2015, 1:13:43 UTC

My i5 does do marginally better work on the 750Ti with a free core, but I feel that core could do more work crunching than I gain. If I was running more than 1 card, it would most likely be a benefit.
qbit
Volunteer tester
Joined: 19 Sep 04
Posts: 630
Credit: 6,868,528
RAC: 0
Austria
Message 1697637 - Posted: 2 Jul 2015, 6:43:16 UTC

Good morning folks!

I always thought that use_sleep doesn't make much difference in runtime on Nvidia cards. If it really does, why would most people recommend it?

@Hal: That's interesting, I didn't check the averages. But with our machines being similar there (when taking into account that he runs just one task at a time while I run two), how the hell can he get so much performance on this particular task?

BTW everybody: I'm not complaining about my card, I like it a lot. I'm just trying to understand things better so I can get the max out of it.
BetelgeuseFive · Project Donor
Volunteer tester
Joined: 6 Jul 99
Posts: 158
Credit: 17,117,787
RAC: 19
Netherlands
Message 1697666 - Posted: 2 Jul 2015, 8:37:27 UTC - in response to Message 1697637.  

qbit wrote:
I always thought that use_sleep doesn't make much difference in runtime on Nvidia cards. If it really does, why would most people recommend it?


If you use use_sleep when running only one task, your GPU will not be fully used. However, if you run multiple tasks at the same time, you can use use_sleep and your GPU will still be at 99% (and you can use your CPU for something else).
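
For example, with the Lunatics OpenCL AstroPulse build the switch just goes into the AP command-line file along with whatever tuning you already use. A sketch only - the file name matches a typical NV install and the other switches are illustrative, not recommendations:

    ap_cmdline_win_x86_SSE2_OpenCL_NV.txt:
    -use_sleep -unroll 12 -ffa_block 8192 -ffa_block_fetch 2048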

BTW, what settings do you use? Your GTX750 seems a little bit faster than mine (I am running two tasks at a time).

Tom
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1697671 - Posted: 2 Jul 2015, 9:06:25 UTC
Last modified: 2 Jul 2015, 9:08:17 UTC

Also check whether your GPU memory clock runs at the advertised rate. We've collectively been able to work out that CUDA will only push the GPU to the P2 power state, while a truly stable card/system can run the tasks with the full P0 clocks forced. NVIDIA Inspector can be used to observe and correct this.

The GTX 750 should be memory-bound in theory, so this may have quite a fair impact with current applications.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
HAL9000
Volunteer tester
Joined: 11 Sep 99
Posts: 6534
Credit: 196,805,888
RAC: 57
United States
Message 1697719 - Posted: 2 Jul 2015, 13:29:49 UTC - in response to Message 1697637.  

qbit wrote:
[...] I'm just trying to understand things better so I can get the max out of it.

I did find the SETI@home performance of my HD6870 went up considerably when I upgraded from a Core 2 Duo E8400 to an i5-4670K. I'm not sure which aspect of the newer system caused the increase; the CPU clock moving from 3.0GHz to 3.4GHz does not seem like enough, at least not alone, to account for the GPU performance gain.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours
Join the BP6/VP6 User Group: http://tinyurl.com/8y46zvu