JACT (Jet Another Cuda Thread): GPU underperforms when all CPU cores are processing WUs

Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 847383 - Posted: 31 Dec 2008, 17:40:49 UTC

Configuration:

- AMD Phenom (4 cores)
- 8600GT on a PCIE slot
- 8300GS on board
- Windows XP SP2
- Boinc 6.4.5, Nvidia drivers 178.24

Now that CUDA seems to work as expected on my system, I have a couple of reports about scheduling and one important question. Here is what I noticed:

- Sometimes only one CUDA thread is started, other times two of them (as I have got two CUDA-capable devices). If, when both GPUs are crunching, I manually suspend one CUDA WU, no new CUDA WU is started - i.e. only one CUDA thread is left working.

- Sometimes three CPU-based threads are started, sometimes four of them. Note that I did not raise the CPU count artificially by editing the XML file.

- And finally the most important issue: when four CPU-based crunching threads are running, my CUDA devices seem to underperform: nVidia monitor says that GPU load is about 25%, and GPU temperature is considerably lower as well. If I manually stop one of the CPU tasks, GPU usage goes to 100% and temperature rises about 10 degrees C.

Can anybody confirm the latter fact?
Maybe this has all already been reported and/or addressed, but if not, owners of four-core machines might as well decide to reserve one core for the GPU crunchers until the issue is resolved: what they lose in CPU processing power should be more than balanced by the increase in GPU output.
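For anyone who wants to try reserving a core, one way on BOINC 6.x is a cc_config.xml file in the BOINC data directory. A minimal sketch (untested on my side): the <ncpus> option overrides the detected core count, so a value of 3 on a quad leaves one core free to feed the GPU.

```xml
<!-- cc_config.xml in the BOINC data directory (sketch).
     <ncpus>3</ncpus> makes the client schedule only 3 CPU tasks
     on a quad core; delete the file (or set -1) to go back to
     automatic CPU detection. -->
<cc_config>
  <options>
    <ncpus>3</ncpus>
  </options>
</cc_config>
```

The same effect can be had from the web preferences ("On multiprocessors, use at most N processors"), which avoids editing files by hand.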
ohiomike
Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 847387 - Posted: 31 Dec 2008, 17:46:50 UTC - in response to Message 847383.  
Last modified: 31 Dec 2008, 17:48:12 UTC

- Sometimes only one CUDA thread is started, other times two of them (as I have got two CUDA-capable devices). If, when both GPUs are crunching, I manually suspend one CUDA WU, no new CUDA WU is started - i.e. only one CUDA thread is left working.


Just to add a note to this: when I set up my Q6600 with two 8800 GTS cards as a test, it looked like the CUDA tasks may have gotten mixed up about which GPU to use. I could not confirm this without more debug info from the tasks, however. I like how Folding@Home does it: you lock a GPU to a task slot.

Boinc Button Abuser In Training >My Shrubbers<
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 847389 - Posted: 31 Dec 2008, 17:53:42 UTC

Some fairly substantial changes are going to be required to the BOINC client work scheduler to fix these problems. David says he's aware of these issues. I will keep reminding him.

@SETIEric@qoto.org (Mastodon)

archae86

Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 847395 - Posted: 31 Dec 2008, 17:56:53 UTC - in response to Message 847383.  

- And finally the most important issue: when four CPU-based crunching threads are running, my CUDA devices seem to underperform: nVidia monitor says that GPU load is about 25%, and GPU temperature is considerably lower as well. If I manually stop one of the CPU tasks, GPU usage goes to 100% and temperature rises about 10 degrees C.

Can anybody confirm the latter fact?
Maybe it has all already been reported and/or addressed, but if this is not the case, owner of four cores might as well decide to reserve one of them to GPU crunchers until the issue is resolved: what they lose in CPU processing power should be more than balanced by GPU power increase.

Hmmm... that effect is interesting, and seems plausible to me. After all, the CPU is serving pretty much as the graphics card's memory controller in this case, on top of the other housekeeping overhead it is doing.

All the folks who have endlessly told us how foolish Intel's Pentium Pro through Conroe designs were to put the memory controller off-chip on the other side of a Front Side Bus may wish to contemplate a memory controller that is not merely off-chip, but has to wait for a task switch from the currently running task before it can do anything at all, and then talks through a rather slower bus. Of course the on-card graphics memory is rather larger than even a Penryn cache, but still...

Is the task that runs on the CPU to handle the GPU for CUDA operations visible to you in Task Manager? Is it running at the same (low) priority as the conventional BOINC CPU tasks? If so, as a temporary testing expedient, it might be interesting to see whether bumping up its priority to encourage a quicker switch to servicing CUDA reduces the disadvantage you are describing.

Caveat: I don't really understand either the programming or the hardware architecture of the CUDA implementation, so some of the above may be nonsense.

Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 847409 - Posted: 31 Dec 2008, 18:29:16 UTC - in response to Message 847395.  


Is the task that runs on the CPU to handle the GPU for CUDA operations visible to you in Task Manager? Is it running at the same (low) priority as the conventional BOINC CPU tasks?


GPU tasks are, as far as I could observe, normal tasks which use a very small amount of CPU power (less than 1%). Probably they just push and pull data to and from the graphics board memory.

They run at normal priority, while conventional CPU crunchers run at minimum priority. Given this difference, one might expect that when as many CPU crunchers as CPU cores are running, one of them would be slowed down a little by the GPU crunchers. Actually, as I found out, it is the other way around, and the GPU tasks get slowed down by as much as 75%.

I tried artificially raising the GPU task priority, but it did not make any difference. I am confident that the issue will be resolved soon; in the meantime I thought it was better to give other GPU users advance notice.

Happy new year everybody!

Roberto
enusbaum
Volunteer tester
Joined: 29 Apr 00
Posts: 15
Credit: 5,921,750
RAC: 0
United States
Message 847436 - Posted: 31 Dec 2008, 20:15:07 UTC

From what I've clocked so far, it seems that my 8800GTX is 3.4x faster than a single Xeon X5355 core clocked at 2.66 GHz running the AK v8 optimized (SSSE3) version.

So from that, you can say that a single GeForce 8800GTX (768MB edition) is almost as fast as a quad-core Xeon X5355 @ 2.66 GHz.

Not too shabby!

(This is all based on wall clock comparisons for comparable SETI@Home classic work units)
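The arithmetic behind that claim, as a quick sanity check (the 3.4x figure is the measured value quoted above; the linear-scaling assumption across the four cores is mine):

```python
# Back-of-the-envelope check of the GPU-vs-quad comparison above.
gpu_vs_one_core = 3.4   # measured: 8800GTX vs one Xeon X5355 core
cores = 4               # the X5355 is a quad core

# Assuming CPU throughput scales linearly across all four cores:
gpu_vs_quad = gpu_vs_one_core / cores
print(f"8800GTX is about {gpu_vs_quad:.0%} of a fully loaded quad X5355")
# -> about 85%, i.e. "almost as fast as a Quad Core"
```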
enusbaum
Volunteer tester
Joined: 29 Apr 00
Posts: 15
Credit: 5,921,750
RAC: 0
United States
Message 847441 - Posted: 31 Dec 2008, 20:23:53 UTC
Last modified: 31 Dec 2008, 20:24:04 UTC

Another thing to keep in mind is the number of Stream Processors available to the CUDA interface depending on your video card.

- 8300GS only has 8 Stream Processors with a Shader Clock of 900 MHz
- 8600GT only has 32 Stream Processors with a Shader Clock of 1.19 GHz
- 8800GTX has 128 Stream Processors with a Shader Clock of 1.35 GHz

So you can see that, using just the 8xxx series of cards as an example, there's a drastic difference in processing power depending on the card you're using. Just on raw numbers, the 8800GTX is over 20x faster than an 8300GS.

For this reason, I believe, nVidia left the 8300GS off its list of "CUDA Enabled Devices", which is why you're only able to run CUDA tasks on the 8600GT you have installed (running them on the 8300GS would be PAINFULLY slow) :)
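A rough way to compare the cards on raw numbers alone is stream processors times shader clock. This ignores memory bandwidth and architectural differences, so treat it as a first-order estimate only:

```python
# First-order CUDA throughput proxy: stream processors x shader clock (GHz).
# Ignores memory bandwidth, so real-world ratios will differ somewhat.
cards = {
    "8300GS": (8, 0.90),
    "8600GT": (32, 1.19),
    "8800GTX": (128, 1.35),
}

def throughput_proxy(stream_processors, shader_clock_ghz):
    """Relative shader throughput; the units are arbitrary."""
    return stream_processors * shader_clock_ghz

baseline = throughput_proxy(*cards["8300GS"])
for name, (sp, clock) in cards.items():
    ratio = throughput_proxy(sp, clock) / baseline
    print(f"{name}: {ratio:.1f}x an 8300GS")
# The 8800GTX comes out at 24.0x the 8300GS -- "over 20x", as stated above.
```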
Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 847465 - Posted: 31 Dec 2008, 21:27:49 UTC

I am running an 8400 GS, and my CUDA crunch times range from 3 to 8 hours per workunit, while my dual-core P4 crunches a unit in between 1.5 and 6 hours per core. In other words, my CUDA client is far slower than my CPU.
You will be assimilated...bunghole!

MeglaW
Volunteer tester
Joined: 21 Jun 00
Posts: 36
Credit: 479,460
RAC: 0
Sweden
Message 848229 - Posted: 2 Jan 2009, 16:04:48 UTC
Last modified: 2 Jan 2009, 16:05:49 UTC

I noticed this too. I did the cc_config.xml thing for 3 CPUs (C2D + GPU) and crunch times went up from 6-7 min to 20-30 min; also, GPU temp dropped 4 degrees when running 2+1. So I'm back to no config, using my CPU mainly for one unit of Einstein@home and the GPU solely for SETI CUDA. No more folding for me.

Running on a watercooled (brag) C2D 1.6 @ 3.4 GHz (425x8), with an 8800GT running at 655x965 MHz.
Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 848242 - Posted: 2 Jan 2009, 16:36:22 UTC - in response to Message 847465.  

In other words, my CUDA client is far slower than my CPU.


Did you try running the CPU application on just one of your two cores? Your CUDA client should benefit a lot from running alongside an unloaded CPU. It will probably never match your CPU's speed, but it is worth a try, if only for testing purposes.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 848245 - Posted: 2 Jan 2009, 16:48:08 UTC - in response to Message 848242.  

In other words, my CUDA client is far slower than my CPU.


Did you try running the CPU application on just one of your two cores? Your CUDA client should benefit a lot from running alongside an unloaded CPU. It will probably never match your CPU's speed, but it is worth a try, if only for testing purposes.

Well, have a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=50829
This CPU/GPU full-load question has already been discussed there in detail, and a solution for keeping both the GPU and the CPU fully loaded is supplied.
Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 848745 - Posted: 3 Jan 2009, 16:59:13 UTC - in response to Message 847441.  


For this reason I believe, nVidia left the 8300GS off it's list of "CUDA Enabled Devices", which is why you're only able to run CUDA tasks on the 8600GT you have installed (since running them on the 8300GS would be PAINFULLY slow) :)



I don't know whether my built-in 8300 is a GS, but I *am* able to run CUDA on it. It is quite slow, but still useful IMHO.

I took some time to match the credits claimed by my CUDA-crunched workunits against the (wall) time it took to crunch them. The 8300 is indeed some 3.5 times slower than the 8600 GT on similar workunits; this figure roughly agrees with your numbers: 8 shaders (8300) against 32 (8600 GT), but a 1500 MHz shader clock (8300) against 1200 MHz (8600 GT).
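Plugging these clock figures into the same shaders-times-clock estimate gives a predicted gap close to the measured 3.5x (clock values as stated in this post; they differ slightly from those quoted earlier in the thread):

```python
# Predicted 8600GT / 8300 gap from stream processors x shader clock,
# using the figures given in the post above.
sp_8300, clock_8300 = 8, 1.5    # shader clock in GHz, as stated above
sp_8600, clock_8600 = 32, 1.2

predicted = (sp_8600 * clock_8600) / (sp_8300 * clock_8300)
print(f"predicted gap: {predicted:.1f}x, measured: ~3.5x")
# -> predicted 3.2x, in rough agreement with the measured 3.5x
```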

By comparison, the 8300 is slightly slower than one of my CPU cores running the standard S@H multibeam application, while the 8600 is slightly slower than all four of them combined.

So is it worthwhile to run CUDA on an 8300? Of course I would not buy such a board (nor an 8600 GT, BTW) just to crunch workunits. But their combined power matches that of an extra quad-core CPU - a *free* quad-core CPU, as I already own them. Why not use that power?

Regards,

Roberto
Loony
Joined: 8 Dec 99
Posts: 5
Credit: 3,193,475
RAC: 78
United Kingdom
Message 848989 - Posted: 4 Jan 2009, 0:11:47 UTC - in response to Message 847465.  

Unlucky......
Mine dropped from 3 hours to 17 minutes for an average unit.....


Unfortunately it also kept crashing the video card.... some really pretty effects...

So I rolled back to the non-CUDA version.


©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.