JACT (Jet Another Cuda Thread): GPU underperforms when all CPU cores are processing WUs

Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 847383 - Posted: 31 Dec 2008, 17:40:49 UTC

Configuration:

- AMD Phenom (4 cores)
- 8600GT on a PCIE slot
- 8300GS on board
- Windows XP SP2
- Boinc 6.4.5, Nvidia drivers 178.24

Now that CUDA seems to work as expected on my system, I have a couple of reports about scheduling and one important question. Here is what I noticed:

- Sometimes only one CUDA thread is started, other times two of them (as I have got two CUDA-capable devices). If, when both GPUs are crunching, I manually suspend one CUDA WU, no new CUDA WU is started - i.e. only one CUDA thread is left working.

- Sometimes three CPU-based threads are started, sometimes four of them. Note that I did not raise the CPU count artificially by editing the XML file.

- And finally the most important issue: when four CPU-based crunching threads are running, my CUDA devices seem to underperform: nVidia monitor says that GPU load is about 25%, and GPU temperature is considerably lower as well. If I manually stop one of the CPU tasks, GPU usage goes to 100% and temperature rises about 10 degrees C.

Can anybody confirm the latter fact?
Maybe this has all already been reported and/or addressed, but if not, owners of four-core machines might as well decide to reserve one core for the GPU crunchers until the issue is resolved: what they lose in CPU processing power should be more than balanced by the increase in GPU output.
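For anyone who wants to try reserving a core, one way on BOINC 6.x is a cc_config.xml file in the BOINC data directory. A minimal sketch (untested on my side): the <ncpus> option overrides the detected core count, so a value of 3 on a quad leaves one core free to feed the GPU.

```xml
<!-- cc_config.xml in the BOINC data directory (sketch).
     <ncpus>3</ncpus> makes the client schedule only 3 CPU tasks
     on a quad core; delete the file (or set -1) to go back to
     automatic CPU detection. -->
<cc_config>
  <options>
    <ncpus>3</ncpus>
  </options>
</cc_config>
```

The same effect can be had from the web preferences ("On multiprocessors, use at most N processors"), which avoids editing files by hand.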
ohiomike
Joined: 14 Mar 04
Posts: 357
Credit: 650,069
RAC: 0
United States
Message 847387 - Posted: 31 Dec 2008, 17:46:50 UTC - in response to Message 847383.  
Last modified: 31 Dec 2008, 17:48:12 UTC

- Sometimes only one CUDA thread is started, other times two of them (as I have got two CUDA-capable devices). If, when both GPUs are crunching, I manually suspend one CUDA WU, no new CUDA WU is started - i.e. only one CUDA thread is left working.


Just to add a note to this: when I set up my Q6600 with two 8800 GTS cards as a test, it looked like the CUDA tasks may have gotten mixed up about which GPU to use. I could not confirm this without more debug info from the tasks, however. I like how Folding@Home does it: you lock a GPU to a task slot.

Boinc Button Abuser In Training >My Shrubbers<
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 847389 - Posted: 31 Dec 2008, 17:53:42 UTC

Some fairly substantial changes are going to be required to the BOINC client work scheduler to fix these problems. David says he's aware of these issues. I will keep reminding him.

@SETIEric@qoto.org (Mastodon)

archae86

Joined: 31 Aug 99
Posts: 909
Credit: 1,582,816
RAC: 0
United States
Message 847395 - Posted: 31 Dec 2008, 17:56:53 UTC - in response to Message 847383.  

- And finally the most important issue: when four CPU-based crunching threads are running, my CUDA devices seem to underperform: nVidia monitor says that GPU load is about 25%, and GPU temperature is considerably lower as well. If I manually stop one of the CPU tasks, GPU usage goes to 100% and temperature rises about 10 degrees C.

Can anybody confirm the latter fact?
Maybe it has all already been reported and/or addressed, but if this is not the case, owner of four cores might as well decide to reserve one of them to GPU crunchers until the issue is resolved: what they lose in CPU processing power should be more than balanced by GPU power increase.

Hmmm... that effect is interesting, and seems plausible to me. After all, the CPU is serving pretty much as the graphics card's memory controller in this case, on top of the other housekeeping overhead it is doing.

All the folks who have endlessly told us how foolish Intel's Pentium Pro through Conroe designs were to put the memory controller off-chip on the other side of a Front Side Bus may wish to contemplate a memory controller that is not merely off-chip, but has to wait for a task switch from the currently running task before it can do anything at all, and then talks through a rather slower bus. Of course the on-card graphics memory is rather larger than even a Penryn cache, but still...

Is the task that runs on the CPU to handle the GPU for CUDA operations visible to you in Task Manager? Is it running at the same (low) priority as the conventional BOINC CPU tasks? If so, as a temporary testing expedient, it might be interesting to see whether bumping up its priority to encourage a quicker switch to servicing CUDA reduces the disadvantage you are describing.

Caveat: I don't really understand either the programming or the hardware architecture of the CUDA implementation, so some of the above may be nonsense.

Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 847409 - Posted: 31 Dec 2008, 18:29:16 UTC - in response to Message 847395.  


Is the task that runs on the CPU to handle the GPU for CUDA operations visible to you in Task Manager? Is it running at the same (low) priority as the conventional BOINC CPU tasks?


GPU tasks are, as far as I could observe, normal tasks which use a very small amount of CPU power (less than 1%). Probably they just push and pull data to and from the graphics board memory.

They run at normal priority, while conventional CPU crunchers run at minimum priority. Given this difference, one might expect that when as many CPU crunchers as CPU cores are running, one of them would be slowed down a little by the GPU crunchers. Actually, as I found out, it is the other way around, and the GPU tasks get slowed down by as much as 75%.

I tried artificially raising the GPU task priority, but it did not make any difference. I am confident that the issue will be resolved soon; in the meantime I thought it was better to give other GPU users advance notice.

Happy new year everybody!

Roberto
enusbaum
Volunteer tester
Joined: 29 Apr 00
Posts: 15
Credit: 5,921,750
RAC: 0
United States
Message 847436 - Posted: 31 Dec 2008, 20:15:07 UTC

From what I've clocked so far, it seems that my 8800GTX is 3.4x faster than a single Xeon X5355 core clocked at 2.66 GHz running the AK v8 optimized (SSSE3) version.

So from that, you can say that a single GeForce 8800GTX (768MB edition) is almost as fast as a quad-core Xeon X5355 @ 2.66 GHz.

Not too shabby!

(This is all based on wall clock comparisons for comparable SETI@Home classic work units)
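The arithmetic behind that claim, as a quick sanity check (the 3.4x figure is the measured value quoted above; the linear-scaling assumption across the four cores is mine):

```python
# Back-of-the-envelope check of the GPU-vs-quad comparison above.
gpu_vs_one_core = 3.4   # measured: 8800GTX vs one Xeon X5355 core
cores = 4               # the X5355 is a quad core

# Assuming CPU throughput scales linearly across all four cores:
gpu_vs_quad = gpu_vs_one_core / cores
print(f"8800GTX is about {gpu_vs_quad:.0%} of a fully loaded quad X5355")
# -> about 85%, i.e. "almost as fast as a Quad Core"
```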
enusbaum
Volunteer tester
Joined: 29 Apr 00
Posts: 15
Credit: 5,921,750
RAC: 0
United States
Message 847441 - Posted: 31 Dec 2008, 20:23:53 UTC
Last modified: 31 Dec 2008, 20:24:04 UTC

Another thing to keep in mind is the number of Stream Processors available to the CUDA interface depending on your video card.

- 8300GS only has 8 Stream Processors with a Shader Clock of 900 MHz
- 8600GT only has 32 Stream Processors with a Shader Clock of 1.19 GHz
- 8800GTX has 128 Stream Processors with a Shader Clock of 1.35 GHz

So you can see that, using just the 8xxx series of cards as an example, there's a drastic difference in processing power depending on the card you're using. Just on raw numbers, the 8800GTX is over 20x faster than an 8300GS.

For this reason, I believe, nVidia left the 8300GS off its list of "CUDA Enabled Devices", which is why you're only able to run CUDA tasks on the 8600GT you have installed (running them on the 8300GS would be PAINFULLY slow) :)
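A rough way to compare the cards on raw numbers alone is stream processors times shader clock. This ignores memory bandwidth and architectural differences, so treat it as a first-order estimate only:

```python
# First-order CUDA throughput proxy: stream processors x shader clock (GHz).
# Ignores memory bandwidth, so real-world ratios will differ somewhat.
cards = {
    "8300GS": (8, 0.90),
    "8600GT": (32, 1.19),
    "8800GTX": (128, 1.35),
}

def throughput_proxy(stream_processors, shader_clock_ghz):
    """Relative shader throughput; the units are arbitrary."""
    return stream_processors * shader_clock_ghz

baseline = throughput_proxy(*cards["8300GS"])
for name, (sp, clock) in cards.items():
    ratio = throughput_proxy(sp, clock) / baseline
    print(f"{name}: {ratio:.1f}x an 8300GS")
# The 8800GTX comes out at 24.0x the 8300GS -- "over 20x", as stated above.
```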
Borgholio
Joined: 2 Aug 99
Posts: 654
Credit: 18,623,738
RAC: 45
United States
Message 847465 - Posted: 31 Dec 2008, 21:27:49 UTC

I am running an 8400 GS, and my CUDA crunch times range from 3 to 8 hours per workunit, while my dual-core P4 crunches a unit in between 1.5 and 6 hours per core. In other words, my CUDA client is far slower than my CPU.
You will be assimilated...bunghole!

MeglaW
Volunteer tester
Joined: 21 Jun 00
Posts: 36
Credit: 479,460
RAC: 0
Sweden
Message 848229 - Posted: 2 Jan 2009, 16:04:48 UTC
Last modified: 2 Jan 2009, 16:05:49 UTC

I noticed this too. I did the cc_config.xml thing for 3 CPUs (C2D + GPU) and crunch times went up from 6-7 min to 20-30 min; also, GPU temp dropped 4 degrees when running 2+1. So I'm back to no config, using my CPU mainly for one unit of Einstein@home and the GPU solely for SETI CUDA. No more folding for me.

Running on a watercooled (brag) C2D 1.6 @ 3.4 GHz (425x8), with an 8800GT running at 655x965 MHz.
Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 848242 - Posted: 2 Jan 2009, 16:36:22 UTC - in response to Message 847465.  

In other words, my CUDA client is far slower than my CPU.


Did you try running the CPU application on just one of your two cores? Your CUDA client should benefit a lot from running alongside an unloaded CPU. It will probably never match your CPU's speed, but it is worth a try, if only for testing purposes.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 848245 - Posted: 2 Jan 2009, 16:48:08 UTC - in response to Message 848242.  

In other words, my CUDA client is far slower than my CPU.


Did you try running the CPU application on just one of your two cores? Your CUDA client should benefit a lot from running alongside an unloaded CPU. It will probably never match your CPU's speed, but it is worth a try, if only for testing purposes.

Well, have a look at this thread: http://setiathome.berkeley.edu/forum_thread.php?id=50829
This CPU/GPU full-load question has already been discussed there in detail, and a solution for keeping both the GPU and the CPU fully loaded is supplied.
Roberto Patriarca

Joined: 17 May 99
Posts: 22
Credit: 1,967,389
RAC: 0
Italy
Message 848745 - Posted: 3 Jan 2009, 16:59:13 UTC - in response to Message 847441.  


For this reason I believe, nVidia left the 8300GS off it's list of "CUDA Enabled Devices", which is why you're only able to run CUDA tasks on the 8600GT you have installed (since running them on the 8300GS would be PAINFULLY slow) :)



I don't know whether my built-in 8300 is a GS, but I *am* able to run CUDA on it. It is quite slow, but still useful IMHO.

I took some time to match the credits claimed by my CUDA-crunched workunits against the (wall) time it took to crunch them. The 8300 is indeed some 3.5 times slower than the 8600 GT on similar workunits; this figure roughly agrees with your numbers: 8 shaders (8300) against 32 (8600 GT), but a 1500 MHz shader clock (8300) against 1200 MHz (8600 GT).
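Plugging these clock figures into the same shaders-times-clock estimate gives a predicted gap close to the measured 3.5x (clock values as stated in this post; they differ slightly from those quoted earlier in the thread):

```python
# Predicted 8600GT / 8300 gap from stream processors x shader clock,
# using the figures given in the post above.
sp_8300, clock_8300 = 8, 1.5    # shader clock in GHz, as stated above
sp_8600, clock_8600 = 32, 1.2

predicted = (sp_8600 * clock_8600) / (sp_8300 * clock_8300)
print(f"predicted gap: {predicted:.1f}x, measured: ~3.5x")
# -> predicted 3.2x, in rough agreement with the measured 3.5x
```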

By comparison, the 8300 is slightly slower than one of my CPU cores running the standard S@H multibeam application, while the 8600 is slightly slower than all four of them combined.

So is it worthwhile to run CUDA on an 8300? Of course I would not buy such a board (nor an 8600 GT, BTW) just to crunch workunits. But their combined power matches that of an extra quad-core CPU - a *free* quad-core CPU, as I already own them. Why not use that power?

Regards,

Roberto
Loony
Joined: 8 Dec 99
Posts: 5
Credit: 3,193,475
RAC: 78
United Kingdom
Message 848989 - Posted: 4 Jan 2009, 0:11:47 UTC - in response to Message 847465.  

Unlucky......
Mine dropped from 3 hours to 17 minutes for an average unit.....


Unfortunately it also kept crashing the video card.... some really pretty effects...

So I rolled back to the non-CUDA version.


©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.