Questions and Answers :
GPU applications :
Why so slow?
Author | Message |
---|---|
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
I have a Dell system with an Intel Q9300 quad core. I've had it for 6 months or more, and I just added an EVGA GeForce 9400 GT card with 1 GB of memory. The CUDA jobs I run seem to take just about the same time as each CPU core needs to process a work unit (WU). I was expecting something faster. Was I being unrealistic? Does screen resolution impact performance? I've got 2 monitors, one at 2560x1600 and the other at 1920x1600. Any thoughts? |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
The 9400 probably rates around 4 GFLOPS according to BOINC; it tells you in the message area of the manager when it reports your CUDA card. You are using a lot of video memory with dual monitors at very high resolutions, so I would bet there is little left for it to process with, not to mention how busy you may be keeping the GPU with whatever your desktop is doing. Everything about the desktop configuration and usage chews up video memory. My guess is that your times are close to right. Take a look at your completed CUDA units under Tasks in your host list and see whether they show successful completion, or whether they show out of memory and processing on the CPU instead. If the CUDA units show successful completion, then the timing is probably accurate. Even if it is used only as a CUDA processor and not as a video card, the 9400 is not a blazing unit that will finish work units in 5 or 10 minutes. It will take some time. But before you settle for that, double-check your production to be sure the card is being used and how much memory is available to CUDA. It's all in the completed work units. |
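The video-memory worry above can be sanity-checked with some quick arithmetic. The sketch below is a rough estimate only: it assumes 32-bit colour and three full-screen surfaces per monitor (both assumptions, not figures from the thread), and it ignores textures, compositor caches and driver overhead.

```python
# Rough framebuffer-size estimate for the two monitors described above.
# Assumptions (not from the thread): 32-bit colour (4 bytes per pixel) and
# a hypothetical count of 3 full-screen surfaces per display.

BYTES_PER_PIXEL = 4
BUFFERS_PER_DISPLAY = 3  # assumed; adjust to taste


def framebuffer_bytes(width, height, buffers=BUFFERS_PER_DISPLAY):
    """Bytes needed for `buffers` full-screen surfaces at width x height."""
    return width * height * BYTES_PER_PIXEL * buffers


displays = [(2560, 1600), (1920, 1600)]  # resolutions quoted in the thread
card_bytes = 1024 ** 3                   # the 1 GB card

total = sum(framebuffer_bytes(w, h) for w, h in displays)
print(f"desktop surfaces: {total / 2**20:.1f} MiB "
      f"({100 * total / card_bytes:.1f}% of the card)")
```

Under these assumptions the desktop surfaces take only a small share of a 1 GB card, so the free-memory figure the CUDA app prints in its task output is the number worth checking.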
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
I just took a look at your work units. They look odd to me. Do you have the CUDA 2.2 toolkit installed, and are the libraries where the CUDA app can find them? You have plenty of memory available, so that is not an issue like I first thought. Here is what concerns me in your reports:
Work Unit Info:
...............
WU true angle range is : 1.479648
After app init: total GPU memory 1073741824 free GPU memory 963768320
Flopcounter: 11519105207831.588000
Spike count: 0
Pulse count: 0
Triplet count: 2
Gaussian count: 0
Wall-clock time elapsed since last restart: 3348.9 seconds
class T_FFT<0>: total=2.67e+006, N=98124, <>=27 (2.70e+001), min=0 (0.00e+000)
class T_FFT<8>: total=9.30e+001, N=3, <>=31 (3.10e+001), min=31 (3.10e+001)
class T_FFT<16>: total=1.25e+002, N=7, <>=17 (1.70e+001), min=15 (1.50e+001)
class T_FFT<64>: total=2.65e+002, N=29, <>=9 (9.00e+000), min=0 (0.00e+000)
class T_FFT<256>: total=0.00e+000, N=115, <>=0 (0.00e+000), min=0 (0.00e+000)
class T_FFT<512>: total=2.65e+003, N=229, <>=11 (1.10e+001), min=0 (0.00e+000)
class T_FFT<1024>: total=4.30e+003, N=457, <>=9 (9.00e+000), min=0 (0.00e+000)
class T_FFT<2048>: total=1.50e+004, N=915, <>=16 (1.60e+001), min=0 (0.00e+000)
class T_FFT<4096>: total=3.94e+003, N=211, <>=18 (1.80e+001), min=15 (1.50e+001)
class T_FFT<8192>: total=1.72e+004, N=845, <>=20 (2.00e+001), min=15 (1.50e+001)
called boinc_finish
</stderr_txt>
]]>
Is BOINC being stopped a lot? Here is one of my work units for comparison:
Work Unit Info:
Unfortunately, I do not know enough about these reports to say for sure what is happening, but there is definitely something. Don't just use the latest of everything because it is there; sometimes the pieces do not match well. I would be sure to use one of the 185.18 series drivers, like 185.18.29 or 185.18.31, along with the CUDA toolkit 2.2.
I do not think the 190 driver series and CUDA 2.3 will fly properly on this card, plus you need to match the CUDA version of your application, which is likely to be CUDA 2.2 (at least mine is). Once this is fixed up, you might also want to use the rebranding script available on Lunatics to make sure no VLAR or VHAR units are fed to CUDA and that they are assigned to the CPU instead, since they cause extreme slowdown in GPU processing. Plus, I think you have the VLARKill version of the app, so it will just trash them, which is a waste. |
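The figures in the stderr output quoted above can be turned into two quick numbers: how much GPU memory was already in use when the app started, and an effective processing rate. The sketch below is only illustrative. It assumes that Flopcounter is the app's own count of floating-point operations and that the wall-clock figure covers the same work, which may not hold if the task was restarted; the helper name `grab` is made up for this example.

```python
import re

# Values copied from the stderr output quoted above.
log = """\
After app init: total GPU memory 1073741824 free GPU memory 963768320
Flopcounter: 11519105207831.588000
Wall-clock time elapsed since last restart: 3348.9 seconds
"""


def grab(pattern, text):
    """Return the first capture group of `pattern` in `text` as a float."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return float(match.group(1))


total_mem = grab(r"total GPU memory (\d+)", log)
free_mem = grab(r"free GPU memory (\d+)", log)
flops = grab(r"Flopcounter: ([\d.]+)", log)
seconds = grab(r"elapsed since last restart: ([\d.]+) seconds", log)

used_mib = (total_mem - free_mem) / 2**20   # memory in use before processing
gflops = flops / seconds / 1e9              # effective rate, not a benchmark

print(f"GPU memory in use at start: {used_mib:.1f} MiB")
print(f"effective rate: {gflops:.2f} GFLOPS")
```

For this task the effective rate comes out at a few GFLOPS, which is in the same range as the "around 4 GFLOPS" guess made earlier in the thread.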
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
Chuck - I'm running:
CPUs - AK_v8_win_x64_SSE41
CUDA - MB_6.08_CUDA_V12_VLARKill_FPLim248.exe
driver: 19062 (V2.3 I believe)
Seti@home runs continuously. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
Hmm, I'm not familiar with the Windows versioning of the app, and the FPLim248 confuses me. I'm using Linux. If it is not too much of a hassle, I would back off to one of the 185 series drivers I mentioned before and CUDA 2.2 and see what happens. Keep the 2.3 installers around in case there is no difference. Yeah, the 190 driver wants 2.3. I can't run 2.3 as long as I run my Tesla. It is a prerelease engineering version, and when I installed 2.3 it went insane, so I dropped back to 2.2. I don't think the older cards can use 2.3. I'm going to ask a friend to stop in and look at your work units to see what he thinks. He is an expert on these things and will probably know instantly what those FFT lines mean. |
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
Thanks Chuck. Let me know what he says. |
Gundolf Jahn Joined: 19 Sep 00 Posts: 3184 Credit: 446,358 RAC: 0 |
What's your preference for Suspend GPU work while computer is in use? Gruß, Gundolf |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"Thanks Chuck. Let me know what he says."
They say the FFT notations are normal for that build. This is the message I received about it: "The FFT lines are normal for those builds, just keeping some statistics on those which may give clues for future improvements. The FPLim2048 in the file name is also related to the same effort." So chances are your processing times are about right. The 9400 is one of the slower CUDA cards, with 16 shader cores running at 1.4 GHz, and your CPU is quite fast, so I imagine they would be a close match, with the 9400 probably pulling out the win by a narrow margin. The high performance cards have 192, 216 or 240 shader cores. At least you can process more work units than you could without it :) Also, as Gundolf says, check your settings to be sure GPU work is not suspended while the computer is in use. |
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
Thanks Chuck, I'll accept your analysis. What would you recommend as an alternative card? What's the best price/performance if Seti/CUDA is my main goal? I am not a gamer. Nick |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"Thanks Chuck, I'll accept your analysis."
I was looking at that myself for a while. Definitely the GT200 series cards. I think the best performance in keeping with cost savings is the GTX 260 216-core SP edition. Be careful: there are 3 versions of this card. The one released in Dec 2008 is the one you want, with the GT200b (note the b) 55 nm processor. Its performance should come close to a Tesla: the Tesla rates 933 GFLOPS and this card rates 805 GFLOPS. I had a moment of insanity and decided to go for the gold, so I got the GTX 285 and have not looked back. In fact, I am going to replace my Tesla with a second 285. One of the most important features, more than clock speed, is the number of shaders or 'cores'. The more you can get, the better it will perform.
********************
Two extremely important things to consider. Performance graphics cards are power-hungry, heat-generating monsters! You will need a strong power supply. I use an Antec Signature series 850 W to handle 2 cards, and that is barely enough. I find I should have gone for a 1000 W one, as I have little 'headroom' left. For one card, 650 W should be sufficient as long as the power rail plugged into the card can supply the necessary current. This is *extremely* important. Most of these high performance cards require 2 PCIe power plugs. If there is not quite enough power, they will be clunky in performance, if they even agree to run. Your case will also need a LOT of fresh air flowing through it to get rid of the excess heat these puppies toss out (mine has 8 fans, not including CPU/GPU fans). Additionally, as soon as you get a card, use EVGA Precision or RivaTuner to increase the fan speed. Watch your GPU and card ambient temperatures as it processes. I keep both of mine at 100% to hold a maximum of 70 C under load. All these fans are stupidly set for slow, hot operation to keep noise down. If you are going to operate a powerhouse of a computer, you should not mind a mini jet engine sitting next to you.
Heat will shorten the life of your entire computer.
*********************
On the economy side, the GTS 250 is a reasonable performer, but it is almost a rebranded 9800 GTX+ and uses the older G92b processor. It's a good choice, but I would stick with the GT200 processor series for better performance and lower power drain than equivalent performers in the G92 series. Here is a chart where you can look up each model, its factory specifications, GFLOPS and watt usage: http://en.wikipedia.org/wiki/Comparison_of_NVIDIA_Graphics_Processing_Units This chart helped me more than anything else in comparing 'bang for the buck'. |
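The point about power plugs and rail current can be made concrete with a small calculation. This is a sketch under stated assumptions: the connector limits used (75 W from the slot, 75 W per 6-pin plug, 150 W per 8-pin plug) are the commonly cited PCI Express limits, not figures from this thread, and they give a ceiling on what a card may draw, not what any particular card actually draws.

```python
# Ceiling on card power draw, derived from connector limits only.
# The wattage limits below are assumptions (commonly cited PCI Express
# figures), not numbers taken from the thread.
SLOT_W = 75
PLUG_W = {"6-pin": 75, "8-pin": 150}


def max_card_watts(plugs):
    """Maximum power a card may draw from the slot plus its auxiliary plugs."""
    return SLOT_W + sum(PLUG_W[p] for p in plugs)


def plug_amps(plugs, volts=12.0):
    """Current the auxiliary plugs alone may pull from the 12 V rail."""
    return sum(PLUG_W[p] for p in plugs) / volts


two_six_pin = ["6-pin", "6-pin"]   # the "2 PCIe power plugs" case
print(max_card_watts(two_six_pin), "W ceiling per card")
print(plug_amps(two_six_pin), "A needed from the 12 V rail feeding the plugs")
```

Comparing the second number with the per-rail amperage printed on the power supply label is one way to check whether a single rail can feed the card.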
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"Thanks Chuck, I'll accept your analysis."
Oh, the best all-out performers, bar none, are the GTX 285 and the GTX 295 dual processor card. The 295 is basically two 285s in one package. The 285 outperforms the Tesla by no small amount: the Tesla rates 933 GFLOPS by the chart and the 285 rates 1063 GFLOPS. More important is the estimated GFLOPS rating that BOINC gives on startup: my Tesla rates 74 GFLOPS and my 285 rates 127 GFLOPS. Quite a difference. |
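The chart figures quoted in this thread follow from the shader count and shader clock. The sketch below reproduces them as theoretical single-precision peaks. Apart from the 9400 GT's 16 cores at 1.4 GHz, which are stated in the thread, the shader clocks, the factor of 3 operations per core per clock, and the choice of the Tesla C1060 as "the Tesla" are assumptions made for illustration.

```python
# Theoretical single-precision peak:
#   shader cores x shader clock (GHz) x flops per core per clock.
# The factor 3 (one multiply-add plus one multiply) and every shader clock
# except the 9400 GT's are assumptions, not figures from the thread.
FLOPS_PER_CLOCK = 3


def peak_gflops(cores, shader_ghz):
    """Theoretical peak in GFLOPS for the given core count and clock."""
    return cores * shader_ghz * FLOPS_PER_CLOCK


cards = {
    "GeForce 9400 GT": (16, 1.400),       # cores and clock from the thread
    "GTX 260 (216 core)": (216, 1.242),   # clock assumed
    "Tesla C1060": (240, 1.296),          # model and clock assumed
    "GTX 285": (240, 1.476),              # clock assumed
}

for name, (cores, ghz) in cards.items():
    print(f"{name:20s} {peak_gflops(cores, ghz):7.1f} GFLOPS")
```

These are paper peaks; the GFLOPS estimate BOINC prints at startup and the rate a real task achieves are both much lower, as the numbers elsewhere in the thread show.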
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
I noticed the 2009 version of PCWizard has a CUDA benchmarking tool. It doesn't seem to work though. Maybe it's the new 2.3 CUDA version I'm running? |
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
I see that EVGA sells a GeForce 9400 GT for a PCI2.1(not PCIe) slot. I could put that into one of my old machines. Is there any reason why this wouldn't run CUDA? |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"I see that EVGA sells a GeForce 9400 GT for a PCI2.1(not PCIe) slot. I could put that into one of my old machines. Is there any reason why this wouldn't run CUDA?"
It will run CUDA. We have two 8400 GS PCI cards running in an old 600 MHz P3 clunker. It's pretty slow, but it still adds numbers to the total. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"I noticed the 2009 version of PCWizard has a CUDA benchmarking tool. It doesn't seem to work though. Maybe it's the new 2.3 CUDA version I'm running?"
For benchmarking, simply use the CUDA SDK. You can check bandwidth and processing power, such as GFLOPS and n-body calculations, from the command line. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"I noticed the 2009 version of PCWizard has a CUDA benchmarking tool. It doesn't seem to work though. Maybe it's the new 2.3 CUDA version I'm running?"
Don't know about 2.3. Other people swear by it, but I wound up swearing at it. CUDA 2.3 simply does not work for me at the moment. Maybe it will if I eliminate the Tesla; I don't know. I know it was installed properly, and every integrity check I could run on the installation came back fine, but the cards wound up complaining bitterly, along with random desktop glitches, when I ran the 2.3 libs and app. So I went back to 2.2 and everything has been happy. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"I noticed the 2009 version of PCWizard has a CUDA benchmarking tool. It doesn't seem to work though. Maybe it's the new 2.3 CUDA version I'm running?"
That won't allow you to set your fans, though. For card settings and overclocking (which I don't recommend, though many have done it successfully), use EVGA Precision or RivaTuner, both available from guru3d.com. EVGA Precision is easier to use, especially for setting fans, but RivaTuner lets you do more with the esoteric side of things. |
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
"it will run cuda. we have 2 8400GS pci cards running in an old 600mhz p3 clunker pretty slow, but it still adds numbers to the total."
Excellent, I'll pick up a couple. BTW, it doesn't show up on your list of computers. You have quite a stable there but no P3s. |
Chuck Gorish Joined: 19 Jun 00 Posts: 156 Credit: 29,589,106 RAC: 0 |
"it will run cuda. we have 2 8400GS pci cards running in an old 600mhz p3 clunker pretty slow, but it still adds numbers to the total."
That's because my boss is running it at the office on his account. I have all the file and application servers I am responsible for on my account. |
Nick Joined: 17 May 99 Posts: 96 Credit: 17,356,094 RAC: 0 |
Chuck - thought I'd update you on running a GeForce 9400 GS PCI card. It was a total disaster under Vista64. It trashed the OS and I had to restore. I tried it both alongside a PCIe card (ATI) and with no other graphics card, and it blue screened both times. I set it to PCI in the BIOS and it still blue screened. It never gets far enough to load the drivers off the OEM disk; it crashes when Vista detects new hardware and attempts to install a PCI to PCI bridge. It works fine on an XP machine. Don't ya just love Vista? |