CUDA cards: SETI crunching speeds

Jack Shaftoe
Joined: 19 Aug 04
Posts: 44
Credit: 2,343,242
RAC: 0
United States
Message 864310 - Posted: 11 Feb 2009, 12:38:39 UTC - in response to Message 864189.  

It's early days yet, and we don't have data from a full set of cards. But on this very preliminary evidence, using a very preliminary SETI application, the 2xx series cards don't seem to show a performance boost here at SETI commensurate with their pricing premium. This may change, but at the present (early) stage of CUDA development, I'm glad I opted for 9800-range cards.


Thank you Richard, cheers.
ID: 864310
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864316 - Posted: 11 Feb 2009, 13:28:39 UTC - in response to Message 864199.  
Last modified: 11 Feb 2009, 13:30:09 UTC

... OK, now for the killer comparison with a couple of days at high priority...

If the app is polling, then I'd expect little change in wall-clock time for the WUs when the priority changes. If the CPU is genuinely maxed out doing useful work, then the wall-clock times should change proportionately. Here's hoping for a consistent mix of WUs to show something useful!

Such is my hypothesis!!

Well... With just a few noisy plots to compare, the low-priority runs look to be slower and much more variable than the higher-priority runs. That suggests the CPU is the limiting factor, at least. So perhaps there is indeed no polling, or at least only limited polling.

I've also got a log of CPU utilisation running, and there do appear to be brief occasions where the CUDA task uses less than the maximum CPU for its work, dropping to as low as 90% rather than 99-100%.
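
(For anyone wanting to replicate the logging, a minimal sketch using only the stock top and grep - the 10 s interval and the 'boinc' pattern are just my choices; adjust the pattern to your app's process name:)

rm -f cpu_util.log
while true; do
    date >> cpu_util.log                           # timestamp each sample
    top -b -n 1 | grep -i boinc >> cpu_util.log    # snapshot the boinc/CUDA processes
    sleep 10
done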

Someone on Linux with a much more powerful CPU than my old clunker is needed to test!

Still curious as to why the AMD X2 always has one core maxed out, even for quite a low-spec graphics card...

Is not as much of the work handed over to the GPU as expected?...


Still scraping data.

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864316
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 864346 - Posted: 11 Feb 2009, 15:34:38 UTC
Last modified: 11 Feb 2009, 15:38:30 UTC

Does 'polling' mean the CPU is 'waiting'?
The Google translator doesn't know 'polling'... ;-)

If the CPU (one core of it) were at 100% all the time... then it's no longer GPU crunching, is it? ;-)

When my two GTX260 Core216 cards are crunching, each GPU gets ~7% of the whole CPU.
That means ~28% of one core per GPU.

During the first ~25 sec. of every WU, the CPU support rises to 25% of the whole CPU
[~100% of one core].

AMD Phenom II X4 940 BE @ 3.0 GHz.


Or do you mean that with more CPU support for the GPU all the time, the crunching speed would be faster?
ID: 864346
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864397 - Posted: 11 Feb 2009, 19:46:45 UTC - in response to Message 864346.  
Last modified: 11 Feb 2009, 19:48:53 UTC

Does 'polling' mean the CPU is 'waiting'?
The Google translator doesn't know 'polling'... ;-)

"Polling" means the CPU continuously repeatedly checks for whether the GPU has finished. This is also called a "busy loop", in that the CPU is kept 100% busy just running a loop to check (poll) for a finish condition. This is very wasteful of the CPU.

Polling is similar to the 'do nothing' "idle loop", which is exactly the wasted CPU time that BOINC is trying to replace with useful work!
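
(A minimal shell sketch of the difference - 'long_gpu_job' is a hypothetical stand-in for the GPU work:)

rm -f /tmp/gpu_done
( long_gpu_job; touch /tmp/gpu_done ) &    # flag file marks the finish condition

# Polling / "busy loop": the CPU spins at 100%, asking "done yet?" non-stop:
while [ ! -f /tmp/gpu_done ]; do
    :                                      # do nothing, check again straight away
done

# The efficient alternative is a blocking wait - the CPU sleeps until done:
# wait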

If the CPU (one core of it) were at 100% all the time... then it's no longer GPU crunching, is it? ;-)

Either it is 100% occupied feeding and retrieving data to/from the GPU, or it is in a "busy loop" pestering (polling) the GPU to see whether it is still busy or now idle.

When my two GTX260 Core216 cards are crunching, each GPU gets ~7% of the whole CPU.
That means ~28% of one core per GPU.

During the first ~25 sec. of every WU, the CPU support rises to 25% of the whole CPU
[~100% of one core].

AMD Phenom II X4 940 BE @ 3.0 GHz.

OK... So my cores are only running at 2.6GHz but they are kept at 90% to 100% busy...


Or do you mean that with more CPU support for the GPU all the time, the crunching speed would be faster?

That is my question.


My suspicion of polling is now lessened by more recent data. The rate of results appears to be roughly proportional to the CPU time, which suggests that the system is CPU-limited. If polling were used, then there should be less of a slowdown in results when the CPU time is reduced by setting a lower priority.
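
(For anyone repeating the priority experiment, a sketch using the standard renice tool - the 'setiathome' pattern is only a guess at the CUDA app's process name:)

renice 19 -p $(pgrep -f setiathome)   # drop the app to lowest priority for the "low" runs
renice 0 -p $(pgrep -f setiathome)    # restore for the "high" runs (raising priority may need root)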

So... Is the Linux version less efficient than the Windows versions? Or is this a quirk of using the 8600GT 256MByte hardware?

Crunch3r has hinted that with a more powerful CPU, he sees the CPU idle whilst the GPU is kept busy. That strongly suggests there is no polling for that case.


Anyone with a top-end Intel CPU running an 8600GT GPU on Linux for comparison?

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864397
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 864409 - Posted: 11 Feb 2009, 20:22:56 UTC - in response to Message 864397.  
Last modified: 11 Feb 2009, 20:23:21 UTC

Very strange indeed.
I saw no speed increase from polling for my 9600 running with a 2.6GHz quad.
I think your 8600 should be slower than my 9600GSO, so it should require even less CPU for feeding, not more... But you see CPU limits where I don't. Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)
ID: 864409
Crunch3r
Volunteer tester
Joined: 15 Apr 99
Posts: 1546
Credit: 3,438,823
RAC: 0
Germany
Message 864428 - Posted: 11 Feb 2009, 21:39:57 UTC - in response to Message 864409.  

Very strange indeed.
I saw no speed increase from polling for my 9600 running with a 2.6GHz quad.
I think your 8600 should be slower than my 9600GSO, so it should require even less CPU for feeding, not more... But you see CPU limits where I don't. Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)


It's actually the same on Windows. If you run an 8+1, 4+1 or 2+1 setup you'll starve the GPU. GPU crunching times on a Xeon V8 + 8800GT increased by 30 to 45 min depending on the AR (running an 8+1 setup).

You can even watch that behavior in the BOINC Manager... simply stop all CPU tasks and you'll see the GPU crunching speed increase.

Join BOINC United now!
ID: 864428
SoNic
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 864443 - Posted: 11 Feb 2009, 22:59:35 UTC

I don't believe that, because the temperature of the GPU is as high as it can get whether the config is set for 2 CPUs or for 2+1 CPUs (I have a C2D). When I exit from Crysis (BOINC stopped, of course) the GPU temp is the same as during crunching.
I have a baseline now - 3000-3200 sec for a 42-credit unit. I will try for a day with 2 CPUs so that one of the cores will be dedicated to feeding the GPU.
ID: 864443
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 864454 - Posted: 11 Feb 2009, 23:23:42 UTC - in response to Message 864428.  

Very strange indeed.
I saw no speed increase from polling for my 9600 running with a 2.6GHz quad.
I think your 8600 should be slower than my 9600GSO, so it should require even less CPU for feeding, not more... But you see CPU limits where I don't. Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)


It's actually the same on Windows. If you run an 8+1, 4+1 or 2+1 setup you'll starve the GPU. GPU crunching times on a Xeon V8 + 8800GT increased by 30 to 45 min depending on the AR (running an 8+1 setup).

You can even watch that behavior in the BOINC Manager... simply stop all CPU tasks and you'll see the GPU crunching speed increase.



I did a special test: I ran CUDA with no BOINC CPU tasks, then with 1, 2, 3, 4 - the elapsed times differ VERY little, both for the tested CUDA app and for the CPU apps.
So it's not the same on Windows, at least on my own host.

ID: 864454
SoNic
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 864460 - Posted: 11 Feb 2009, 23:39:20 UTC
Last modified: 11 Feb 2009, 23:40:23 UTC

I have the first unit in. The temperature is the same (I don't know why) but the time is at 1700 sec now. I will test some more. WinXP here.
ID: 864460
SoNic
Joined: 24 Dec 00
Posts: 140
Credit: 2,963,627
RAC: 0
Romania
Message 864621 - Posted: 12 Feb 2009, 11:34:45 UTC
Last modified: 12 Feb 2009, 11:36:51 UTC

The average speed after a few units increased just a little bit - from 3200 to 3000 sec/unit... So in my case the CPU feeds the GPU OK even with CPU+1 units running. But I only have a GF 9500GT.
ID: 864621
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864655 - Posted: 12 Feb 2009, 15:29:32 UTC - in response to Message 864409.  
Last modified: 12 Feb 2009, 15:32:30 UTC

Very strange indeed.

... Probably it's a Windows/Linux question... Could you try dual-booting into Windows on the same host?...
(Or maybe someone knows of a Linux live CD with preinstalled nVidia drivers that I could use for testing the Linux CUDA MB app on my own host?)

Mmmm... Still curious...

Sorry, this area is Linux & *nix only. The only Windows I have here is Win95C. I don't relish a 1 hour+ Windows install for a look-see test... Or... Err nope. Linux is already on here so I'd have to start physically swapping HDDs to avoid a Windows install blindly overwriting the Linux... :-(

To try to replicate this for a test, a good bet would be Mandriva Linux One 2009. That's likely the closest to the system here. You may need to add the "contrib" repositories to install:

dkms-nvidia-current-180.22-1mdv2009.0
nvidia-current-devel-180.22-1mdv2009.0

You can then install BOINC with the ".sh" install script in your home directory, add the Crunch3r build, and crunch on.
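
(A sketch of the package install, assuming the standard urpmi tools and that the "contrib" media are already added - exact package versions will vary:)

urpmi dkms-nvidia-current nvidia-current-devel   # DKMS kernel module + CUDA dev files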


Hope that helps,

Cheers,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864655
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20265
Credit: 7,508,002
RAC: 20
United Kingdom
Message 864701 - Posted: 12 Feb 2009, 17:12:39 UTC - in response to Message 864655.  
Last modified: 12 Feb 2009, 17:13:46 UTC

Further thoughts:

You may well also need:

x11-driver-video-nvidia-current


An 'easy' way to sort that lot out is to call up XFdrake, or to select the nVidia drivers from the graphics card setup.

Unless you have more than 256MBytes of VRAM, you'll have to log out of the desktop and drop down to a command terminal (so that you free up as much VRAM as possible). Eg: run boinc, run the graphical boincmgr to set up SETI, exit, log out, then log in via a text terminal (select from the login menu, or simply Ctrl-Alt-F1) and run:

cd BOINC
./boinc >boinc.log 2>&1 &    # run the client headless, log stdout+stderr, put it in the background


... And see what happens.

Use:

top

to see what the processes are doing.

Good luck!

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 864701
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 864767 - Posted: 12 Feb 2009, 20:13:54 UTC - in response to Message 864701.  

Well, it seems it's not a "Live CD/DVD"; more packages need to be added and so on... I'm not ready to jump into Linux configs just yet.
This question will be sorted out sooner or later by comparing the RACs of similar hosts under Linux/Windows...
ID: 864767
RandyC
Joined: 20 Oct 99
Posts: 714
Credit: 1,704,345
RAC: 0
United States
Message 865001 - Posted: 13 Feb 2009, 12:17:13 UTC

Just upgraded my video card from a 9400GT to a 9500GT... same machine, no other changes.

Very preliminary results (overnight):
The 9400GT processed AR .44xxx WUs in ~5100 secs each (~30 CS/hr).
The 9500GT processes AR .44xxx WUs in ~3060 secs each (~50 CS/hr) - about a 40% reduction in run time.
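
(A quick check of that 40% with bc:)

echo "scale=2; 1 - 3060/5100" | bc   # fraction of run time saved -> .40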

Not much to compare since S@H only seems to be sending .44xxx WUs currently. The 9400GT DID process other ARs faster (up to ~37CS/hr at times), but I would expect better performance from the 9500GT at those ARs as well.

The 9400GT has 16 stream processors; the 9500GT has 32. The price difference at Microcenter is about $11.00 more for the 9500GT.
ID: 865001
Westsail and *Pyxey*
Volunteer tester
Joined: 26 Jul 99
Posts: 338
Credit: 20,544,999
RAC: 0
United States
Message 865036 - Posted: 13 Feb 2009, 14:32:52 UTC

OK, got the new host up. She has a c1070 260 and 9500gt.


As soon as we can get some work I will shoot some new data over. It will be neat doing 8 units at once. *grins*
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not Eureka! (I found it!) but rather, 'hmm... that's funny...'" -- Isaac Asimov
ID: 865036
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 865186 - Posted: 13 Feb 2009, 22:27:10 UTC


To my knowledge...

More 'processor cores' and more 'shader MHz' mean more crunching speed...

ID: 865186
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 865260 - Posted: 14 Feb 2009, 3:04:45 UTC
Last modified: 14 Feb 2009, 3:07:27 UTC

I found something confusing...

The same WU.

The CPU time is only ~2 sec. different.
O.K., it's not the real GPU crunching time... ;-)

But the longer the CPU support, the longer the GPU crunching time, or?

So in reality it's maybe ~10 sec. different.

Why?
I thought the GTX 260 would be much faster than an 8800 GTS...
Is it because of the Intel/AMD architecture?
Why doesn't the BOINC message show the real GFLOPS?

What do you think?


WU true angle range is :  0.447869


AMD Phenom II X4 940 BE @ 3.0 GHz

GeForce GTX 260 (OC Edition)
           totalGlobalMem = 939261952 
           sharedMemPerBlock = 16384 
           regsPerBlock = 16384 
           warpSize = 32 
           memPitch = 262144 
           maxThreadsPerBlock = 512 
           clockRate = 1458000 
           totalConstMem = 65536 
           major = 1 
           minor = 3 
           textureAlignment = 256 
           deviceOverlap = 1 
           multiProcessorCount = 27 

[112 GFLOPS - message in BOINC]

804 GFLOPS (stock GPU) - read in a report

Shader: 216 - 1458 MHz

CPU time 118.0469 

With Raistmer's V7 mod

---------------------------------------------------

Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz

GeForce 8800 GTS 
           totalGlobalMem = 335216640 
           sharedMemPerBlock = 16384 
           regsPerBlock = 8192 
           warpSize = 32 
           memPitch = 262144 
           maxThreadsPerBlock = 512 
           clockRate = 1350000 
           totalConstMem = 65536 
           major = 1 
           minor = 0 
           textureAlignment = 256 
           deviceOverlap = 0 
           multiProcessorCount = 12 

518 GFLOPs - read in a report

Shader: 128 - 1350 MHz

CPU time 120.3594 

stock app
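
(FWIW, both "read in a report" figures match the usual peak-rate arithmetic of shaders x shader clock x 3 flops per clock (MAD + MUL). A sketch with bc, assuming the stock 1242 MHz shader clock for the GTX 260 rather than the 1458 MHz OC figure above:)

echo "216 * 1.242 * 3" | bc   # GTX 260 Core 216 @ 1242 MHz -> ~804.8 GFLOPS
echo "128 * 1.350 * 3" | bc   # 8800 GTS @ 1350 MHz -> ~518.4 GFLOPS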

ID: 865260
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 865307 - Posted: 14 Feb 2009, 8:26:44 UTC - in response to Message 865260.  

Better try to get elapsed times for 2 WUs with the same AR for comparison.
ID: 865307
Sutaru Tsureku
Volunteer tester
Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 865508 - Posted: 14 Feb 2009, 22:17:45 UTC - in response to Message 865307.  
Last modified: 14 Feb 2009, 23:15:14 UTC

Better try to get elapsed times for 2 WUs with the same AR for comparison.


But the current app(s) don't have a feature to show the real GPU crunching time... ;-)

-----------------------------------------------------------------

Something confusing...

For a 44.x WU my rig needs ~120 sec. of CPU support. [AR=0.44x]
For a 72.x WU my rig needs ~60 sec. of CPU support... [AR=0.14x]

And then subtract the ~25 sec. of 100% CPU load before the GPU crunching starts...

O.K., O.K., I don't have the real GPU crunching time... but it's something confusing to report... ;-)


EDIT:
I looked at the rig and saw...
The AR=0.44x WUs need around 8 min. - everything fine.
The AR=0.14x WUs need around 15 min. - and the rig draws less wattage and the BOINC Manager is a little bit sluggish...
[GPU crunching times in the BOINC Manager]
ID: 865508
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 865512 - Posted: 14 Feb 2009, 22:27:31 UTC - in response to Message 865508.  

You will be less confused if you use elapsed time instead of CPU time.
It roughly equals the GPU time. And performance is determined by elapsed time, not by anything else.
ID: 865512