Why so slow?

Nick

Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 928075 - Posted: 22 Aug 2009, 21:23:36 UTC

I have a Dell system with an Intel Q9300 quad core. I've had it for 6 months or more, and I just added an EVGA card with a GeForce 9400 GT with 1 GB of memory. The CUDA jobs I run seem to take just about the same time as each of the CPU cores requires to process a WU. I was expecting something faster. Was I being unrealistic? Does screen resolution impact performance? I've got 2 monitors, one at 2560x1600 and the other at 1920x1600. Any thoughts?


Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928115 - Posted: 23 Aug 2009, 2:03:10 UTC - in response to Message 928075.  

The 9400 probably has around 4 GFLOPS according to BOINC. It tells you in the message area of the manager when it reports your CUDA card. You are using a lot of video memory with dual monitors at very high resolutions, so I would bet there is little left for it to process with, not to mention how busy you may be keeping the GPU with whatever your desktop is doing. Everything concerning the desktop config and usage chews video memory. My guess is that the times you are seeing are close to right.

Go take a look at your completed CUDA units under Tasks in your host list and see if they show successful completion, or if they show out of memory and processing on the CPU instead. If the CUDA units show successful completion, then the timing probably is accurate. The 9400, even if not used as a video card but only as a CUDA processor, is not a blazing unit that will see 5 or 10 minute workunits. It will take some time. But before you settle for that, double check your production to be sure the card is being used and how much memory is available to CUDA. It's all in the completed workunits.

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928121 - Posted: 23 Aug 2009, 2:27:43 UTC - in response to Message 928075.  

I just took a look at your workunits. They look odd to me. Do you have the CUDA 2.2 toolkit installed, and are the libraries where the CUDA app can find them?
You have plenty of memory available, so that is not an issue like I first thought.

Here is what concerns me in your reports:


Work Unit Info:
...............
WU true angle range is :  1.479648
After app init: total GPU memory 1073741824	 free GPU memory 963768320

Flopcounter: 11519105207831.588000

Spike count:    0
Pulse count:    0
Triplet count:  2
Gaussian count: 0

Wall-clock time elapsed since last restart: 3348.9 seconds
class T_FFT<0>:	total=2.67e+006,	N=98124,	<>=27 (2.70e+001),	min=0 (0.00e+000)
class T_FFT<8>:	total=9.30e+001,	N=3,	<>=31 (3.10e+001),	min=31 (3.10e+001)
class T_FFT<16>:	total=1.25e+002,	N=7,	<>=17 (1.70e+001),	min=15 (1.50e+001)
class T_FFT<64>:	total=2.65e+002,	N=29,	<>=9 (9.00e+000),	min=0 (0.00e+000)
class T_FFT<256>:	total=0.00e+000,	N=115,	<>=0 (0.00e+000),	min=0 (0.00e+000)
class T_FFT<512>:	total=2.65e+003,	N=229,	<>=11 (1.10e+001),	min=0 (0.00e+000)
class T_FFT<1024>:	total=4.30e+003,	N=457,	<>=9 (9.00e+000),	min=0 (0.00e+000)
class T_FFT<2048>:	total=1.50e+004,	N=915,	<>=16 (1.60e+001),	min=0 (0.00e+000)
class T_FFT<4096>:	total=3.94e+003,	N=211,	<>=18 (1.80e+001),	min=15 (1.50e+001)
class T_FFT<8192>:	total=1.72e+004,	N=845,	<>=20 (2.00e+001),	min=15 (1.50e+001)
called boinc_finish

</stderr_txt>
]]>
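
For anyone who wants to put numbers on the report above, here is a minimal sketch in Python. It assumes that Flopcounter is the app's estimate of the floating point operations done for the whole task and that the wall-clock line covers the whole run (it only says "since last restart", so that may not hold). The function name `summarize` is made up for illustration.

```python
import re

# Lines copied from the stderr report quoted above.
STDERR_EXCERPT = """\
After app init: total GPU memory 1073741824\t free GPU memory 963768320
Flopcounter: 11519105207831.588000
Wall-clock time elapsed since last restart: 3348.9 seconds
"""


def summarize(stderr_text):
    """Return (free_mib, free_fraction, effective_gflops) for one report.

    Assumption: Flopcounter / wall-clock seconds is a fair estimate of the
    effective rate. This is an illustration, not how BOINC rates a card.
    """
    mem = re.search(r"total GPU memory (\d+)\s+free GPU memory (\d+)", stderr_text)
    flops = re.search(r"Flopcounter:\s*([\d.]+)", stderr_text)
    wall = re.search(
        r"Wall-clock time elapsed since last restart:\s*([\d.]+) seconds",
        stderr_text,
    )
    total_bytes = int(mem.group(1))
    free_bytes = int(mem.group(2))
    free_mib = free_bytes / (1024 * 1024)
    free_fraction = free_bytes / total_bytes
    effective_gflops = float(flops.group(1)) / float(wall.group(1)) / 1e9
    return free_mib, free_fraction, effective_gflops


free_mib, free_fraction, effective_gflops = summarize(STDERR_EXCERPT)
print(f"free GPU memory: {free_mib:.0f} MiB ({free_fraction:.0%} of total)")
print(f"effective rate:  {effective_gflops:.2f} GFLOPS")
```

Under those assumptions, roughly 90% of the card's memory was free after app init, and the effective rate works out to about 3.4 GFLOPS.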


Is BOINC being stopped a lot?

Here is one of my workunits for comparison:

Work Unit Info:
...............
WU true angle range is : 0.432219

Flopcounter: 43227981027609.000000

Spike count: 21
Pulse count: 0
Triplet count: 0
Gaussian count: 2
called boinc_finish

</stderr_txt>
]]>


Unfortunately I do not know enough about these reports to say for sure what is happening, but there is definitely something...

Don't just use the latest of everything because it is there; sometimes the pieces do not match well. I would be sure to use one of the 185.18 series of drivers, like 185.18.29 or 185.18.31, along with the CUDA toolkit 2.2. I do not think the 190 driver series and CUDA 2.3 will fly properly on this card, plus you need to match the CUDA version of your application, which is likely to be CUDA 2.2 (at least mine is).

Once this is fixed up, you might also want to use the rebranding script available on Lunatics to make sure no VLAR or VHAR units are fed to CUDA and are assigned to the CPU instead, since they will cause extreme slowdown in GPU processing. Plus I think you have the VLARKill version of the app, so it will just trash them, which is a waste.
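
The idea behind that kind of rebranding can be sketched as below. This is not the Lunatics script; it only shows the decision it is described as making, based on the "WU true angle range" value from the report. The cutoff values and the function names are assumptions picked for illustration, so check the script's own documentation for the real ones.

```python
# Hypothetical cutoffs, NOT taken from the Lunatics script.
VLAR_MAX = 0.05   # assumed: angle ranges below this count as "very low" (VLAR)
VHAR_MIN = 1.127  # assumed: angle ranges above this count as "very high" (VHAR)


def classify_angle_range(angle_range, vlar_max=VLAR_MAX, vhar_min=VHAR_MIN):
    """Label a workunit by its true angle range."""
    if angle_range < vlar_max:
        return "VLAR"
    if angle_range > vhar_min:
        return "VHAR"
    return "mid"


def preferred_device(angle_range):
    """Send VLAR/VHAR units to the CPU and everything else to the GPU."""
    return "gpu" if classify_angle_range(angle_range) == "mid" else "cpu"


# The two angle ranges that appear in the reports in this post.
for ar in (1.479648, 0.432219):
    print(ar, classify_angle_range(ar), preferred_device(ar))
```

With these assumed cutoffs, the 1.479648 unit would be labeled VHAR and the 0.432219 unit would be a mid-range one.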

Nick
Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 928124 - Posted: 23 Aug 2009, 2:41:01 UTC

Chuck -

I'm running:

CPUs - AK_v8_win_x64_SSE41

CUDA - MB_6.08_CUDA_V12_VLARKill_FPLim248.exe
driver: 19062 (V2.3 I believe)

Seti@home runs continuously.


Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928125 - Posted: 23 Aug 2009, 2:54:45 UTC - in response to Message 928124.  

Hmm, I'm not familiar with the Windows versioning on the app, and the FPLim248 confuses me. I'm using Linux. If it is not too much of a hassle, I would back off to one of the 185 series drivers I mentioned before and CUDA 2.2 and see what happens. Keep the setups around for 2.3 just in case there is no difference. Yeah, the 190 driver wants 2.3.

I can't run 2.3 as long as I run my Tesla. It is a prerelease engineering version, and when I installed 2.3 it went insane, so I dropped back to 2.2. I don't think the older cards can use 2.3.

I'm going to ask a friend to stop in and look at your workunits to see what he thinks. He is an expert on these things and probably will know instantly what those FFT lines mean.

Nick
Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 928141 - Posted: 23 Aug 2009, 4:37:37 UTC - in response to Message 928125.  

Thanks Chuck. Let me know what he says.


Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 446,358
RAC: 0
Germany
Message 928157 - Posted: 23 Aug 2009, 8:13:17 UTC - in response to Message 928075.  

What's your preference for
Suspend GPU work while computer is in use?

Regards,
Gundolf

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928168 - Posted: 23 Aug 2009, 10:56:46 UTC - in response to Message 928141.  
Last modified: 23 Aug 2009, 11:07:43 UTC

They say the FFT notations are normal for that build. Here is the message I received about it:

The FFT lines are normal for those builds, just keeping some statistics on those which may give clues for future improvements. The FPLim2048 in the file name is also related to the same effort.


So chances are your processing times are about right. The 9400 is one of the slower CUDA cards, having 16 shader cores running at 1.4 GHz, and your CPU is quite fast, so I imagine they would be a close match, with the 9400 probably pulling out the win by a narrow margin. The high performance cards have 192, 216 or 240 shader cores. At least you can process more workunits than without :)

Also, as Gundolf says, check all your settings to be sure GPU work is not suspended when the computer is being used.

Nick
Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 928192 - Posted: 23 Aug 2009, 15:20:45 UTC - in response to Message 928168.  

Thanks Chuck, I'll accept your analysis.

What would you recommend as an alternative card? What's the best price/performance if Seti/CUDA is my main goal? I am not a gamer.

Nick


Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928201 - Posted: 23 Aug 2009, 16:23:55 UTC - in response to Message 928192.  

I was looking at that myself for a while. Definitely the GT200 series cards. I think the best performance in keeping with cost savings is the GTX 260 216-core SP edition. Be careful, there are 3 versions of this card. The one released in Dec 2008 is the one you want, with the GT200b (note the b) 55 nm processor. Its performance should come close to a Tesla: the Tesla rates 933 GFLOPS and this card rates 805 GFLOPS. I had a moment of insanity and decided to go for the gold, so I got the GTX 285 and have not looked back. In fact I am going to replace my Tesla with a 2nd 285.

One of the most important features, more than clock speed, is the number of shaders or 'cores'. The more you can get, the better it will perform.
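
A rough way to see why the core count matters so much: theoretical peak is usually figured as shader cores times shader clock times operations per core per clock. The sketch below, in Python, assumes 3 operations per core per clock. The 9400 GT figures (16 cores, 1.4 GHz) come from earlier in this thread; the shader clocks for the other cards and the Tesla's core count are assumptions filled in for illustration, so check them against the chart before relying on them.

```python
# Assumption: 3 floating point operations per shader core per clock.
FLOPS_PER_CORE_PER_CLOCK = 3


def peak_gflops(shader_cores, shader_clock_mhz,
                flops_per_clock=FLOPS_PER_CORE_PER_CLOCK):
    """Theoretical peak single-precision GFLOPS for a card."""
    return shader_cores * shader_clock_mhz * flops_per_clock / 1000.0


# (shader cores, shader clock in MHz)
cards = {
    "GeForce 9400 GT": (16, 1400),      # both values as stated in this thread
    "GTX 260 (216 core)": (216, 1242),  # clock is an assumption
    "Tesla": (240, 1296),               # core count and clock are assumptions
    "GTX 285": (240, 1476),             # clock is an assumption
}

for name, (cores, mhz) in cards.items():
    print(f"{name}: {peak_gflops(cores, mhz):.0f} GFLOPS peak")
```

With those assumed clocks the results land on the 805, 933 and 1063 GFLOPS chart figures quoted in this thread, while the 9400 GT comes out more than ten times lower than any of them.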

********************

Two extremely important things to consider.

Performance graphics cards are power hungry, heat generating monsters!

You will need a strong power supply. I use an Antec Signature series 850 W for handling 2 cards... and that is barely enough. I find I should have gone for a 1000 W one, as I have little 'headroom' left. For one card a 650 W should be sufficient, as long as the power rail plugged into the card can supply the necessary current. This is *extremely* important. Most of these high performance cards will require 2 PCIe power plugs. If there is not quite enough power they will be clunky in performance, if they even agree to run.
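
A back-of-the-envelope headroom check might look like the sketch below, in Python. Every wattage in it is a made-up placeholder, not a measured or published figure, and the 80% sustained-load target is an assumed rule of thumb. Plug in the numbers from your own power supply label and the card maker's spec sheet.

```python
def headroom_watts(psu_rating_w, base_system_w, card_draws_w, derate=0.8):
    """Watts left over after the system and cards, at a derated PSU load.

    derate is an assumed rule of thumb: plan to load the supply to only
    this fraction of its rating when running flat out around the clock.
    """
    usable = psu_rating_w * derate
    return usable - (base_system_w + sum(card_draws_w))


def rail_amps(card_draw_w, rail_volts=12.0):
    """Current a card pulls from the 12 V rail for a given draw in watts."""
    return card_draw_w / rail_volts


# Hypothetical figures for illustration only.
BASE_SYSTEM_W = 250   # CPU, drives, fans, motherboard (placeholder)
CARD_W = 200          # one high performance card under load (placeholder)

print(headroom_watts(850, BASE_SYSTEM_W, [CARD_W, CARD_W]))  # two cards
print(headroom_watts(650, BASE_SYSTEM_W, [CARD_W]))          # one card
print(round(rail_amps(CARD_W), 1), "A on the 12 V rail per card")
```

The per-rail current is the part that is easy to overlook: a supply can have enough total wattage and still fall short on the rail feeding the card.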

Your case will also need a LOT of fresh air flowing through it to get rid of the excess heat these puppies toss out (mine has 8 fans, not including CPU/GPU fans). Additionally, as soon as you get a card, use EVGA Precision or RivaTuner to increase the fan speed. Watch your GPU and card ambient temps as it processes. I keep both of mine at 100% to hold a max of 70 C under load.

All these fans are stupidly set for slow, hot operation to keep noise down. If you are going to operate a powerhouse of a computer, you should not mind a mini jet engine sitting next to you.

Heat will shorten the life of your entire computer.

*********************

On the economy side, the GTS 250 is a reasonable performer, but it is almost a rebranded 9800 GTX+ and uses the older G92b processor. It's a good choice, but I would stick with the GT200 processor series for better performance and lower power drain than equivalent performers in the G92 series. Here is a chart where you can look up each model, its factory specifications, GFLOPS and watt usage:


http://en.wikipedia.org/wiki/Comparison_of_NVIDIA_Graphics_Processing_Units

This chart helped me the most of anything in comparing 'bang for the buck'.

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928202 - Posted: 23 Aug 2009, 16:28:02 UTC - in response to Message 928192.  

Oh.. the best all-out performers bar none are the GTX 285 or the GTX 295 dual processor card. The 295 is basically 2x 285 cards in one package. The 285 outperforms the Tesla by no small amount: the Tesla rates 933 GFLOPS by the chart and the 285 rates 1063 GFLOPS. More important is the dual precision rating that BOINC gives on startup. My Tesla rates 74 GFLOPS and my 285 rates 127 GFLOPS. Quite a difference.

Nick
Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 928217 - Posted: 23 Aug 2009, 16:51:58 UTC
Last modified: 23 Aug 2009, 16:53:24 UTC

I noticed the 2009 version of PCWizard has a CUDA benchmarking tool. It doesn't seem to work though. Maybe it's the new 2.3 CUDA version I'm running?

Nick
Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 928267 - Posted: 23 Aug 2009, 20:07:42 UTC - in response to Message 928202.  

I see that EVGA sells a GeForce 9400 GT for a PCI 2.1 (not PCIe) slot. I could put that into one of my old machines. Is there any reason why this wouldn't run CUDA?


Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928276 - Posted: 23 Aug 2009, 21:26:21 UTC - in response to Message 928267.  

It will run CUDA. We have 2 8400 GS PCI cards running in an old 600 MHz P3 clunker.
Pretty slow, but it still adds numbers to the total.


Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928277 - Posted: 23 Aug 2009, 21:32:40 UTC - in response to Message 928217.  

For benchmarking, simply use the CUDA SDK. You can check bandwidth and processing power, such as GFLOPS and n-body calculations, from the command line.

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928278 - Posted: 23 Aug 2009, 21:36:00 UTC - in response to Message 928217.  

Don't know about 2.3. Other people swear by it, but I wound up swearing at it. CUDA 2.3 simply does not work for me at this moment. Maybe if I eliminate the Tesla it will, I don't know. I know it was installed properly, and every check I could run on the integrity of the installation showed fine, but the cards wound up complaining bitterly, along with random desktop glitches, when I ran the 2.3 libs and app. So I went back to 2.2 and everything has been happy.

Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928280 - Posted: 23 Aug 2009, 21:41:21 UTC - in response to Message 928217.  

PCWizard won't allow you to set your fans though. For card settings and overclocking (which I don't recommend, though many have done it successfully), use EVGA Precision or RivaTuner, both available from guru3d.com. EVGA Precision is easier to use, especially for setting fans, but RivaTuner lets you do more with the esoteric side of things.

Nick
Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 928288 - Posted: 23 Aug 2009, 22:44:42 UTC - in response to Message 928276.  

Excellent, I'll pick up a couple. BTW, it doesn't show up on your list of computers. You have quite a stable there but no P3s.


Chuck Gorish
Joined: 19 Jun 00
Posts: 156
Credit: 29,589,106
RAC: 0
United States
Message 928313 - Posted: 24 Aug 2009, 1:06:19 UTC - in response to Message 928288.  

That's because my boss is running it at the office on his account. I have all the file and application servers I am responsible for on my account.



Nick
Joined: 17 May 99
Posts: 96
Credit: 17,356,094
RAC: 0
United States
Message 929101 - Posted: 27 Aug 2009, 18:54:33 UTC - in response to Message 928313.  

Chuck - thought I'd update you on running a GeForce 9400 GS PCI card. It was a total disaster under Vista64. It trashed the OS and I had to restore. I tried it both alongside a PCIe card (ATI) and with no other graphics card, and it blue screened both times. I set the BIOS to PCI and it still blue screened. It never gets far enough to load the drivers off the OEM disk; it crashes when Vista detects the new hardware and attempts to install a PCI to PCI bridge.

Works fine on an XP machine.

Don't ya just love Vista?


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.