GPU running at 75%

Message boards : Number crunching : GPU running at 75%
Profile cybrdaze

Joined: 26 Sep 11
Posts: 6
Credit: 14,266,184
RAC: 0
United States
Message 1765000 - Posted: 14 Feb 2016, 22:07:47 UTC

Since the v8 change I can't seem to get my gpu to run over 75%. Is there a setting for max gpu usage?
ID: 1765000
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 1765005 - Posted: 14 Feb 2016, 22:16:58 UTC - in response to Message 1765000.  
Last modified: 14 Feb 2016, 22:28:08 UTC

Running 2 WUs at a time will increase GPU usage; however, having the GPU running at (or close to) 100% often results in less work being done per hour.
For my GTX750Tis, 2 at a time is the sweet spot and generally gives around 80-85% load on the GPU.


EDIT-
This is the app_config.xml file I use to run 2 WUs at a time (I only run MB work).

<app_config>
    <app>
        <name>setiathome_v8</name>
        <gpu_versions>
            <!-- 0.50 GPUs per task, so 2 tasks run on each GPU -->
            <gpu_usage>0.50</gpu_usage>
            <!-- fraction of a CPU core budgeted for each GPU task -->
            <cpu_usage>0.04</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>setiathome_v7</name>
        <gpu_versions>
            <gpu_usage>0.50</gpu_usage>
            <cpu_usage>0.04</cpu_usage>
        </gpu_versions>
    </app>
</app_config>


Place the app_config.xml file in the ProgramData\BOINC\projects\setiathome.berkeley directory (if you went with the default installation settings), then in BOINC Manager select Options > Read config files.
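On a default Windows install that usually works out to something like the following (the exact project folder name may differ slightly on your machine):

C:\ProgramData\BOINC\projects\setiathome.berkeley.edu\app_config.xml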
Grant
Darwin NT
ID: 1765005
Profile cybrdaze

Joined: 26 Sep 11
Posts: 6
Credit: 14,266,184
RAC: 0
United States
Message 1765012 - Posted: 14 Feb 2016, 22:37:34 UTC - in response to Message 1765005.  

Do you replace the entire config file with just this shortened version?
ID: 1765012
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 1765017 - Posted: 14 Feb 2016, 22:50:24 UTC - in response to Message 1765012.  

Do you replace the entire config file with just this shortened version?

That file won't be there unless you've put one there previously.
If there is one there, then just replace the setiathome_v8 & setiathome_v7 (if you still do v7 work) gpu_versions sections with the ones I posted.
Grant
Darwin NT
ID: 1765017
Profile cybrdaze

Joined: 26 Sep 11
Posts: 6
Credit: 14,266,184
RAC: 0
United States
Message 1765020 - Posted: 14 Feb 2016, 22:55:07 UTC - in response to Message 1765017.  

Thanks Grant! It appears I'm running the same GTX 750 Ti card as you; I just swore that it ran harder last year. 80% is fine though; I've started buying parts for a new rig.
ID: 1765020
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 1765063 - Posted: 15 Feb 2016, 2:56:52 UTC - in response to Message 1765020.  

just swore that it ran harder last year.

It would have if you were running the Lunatics installation, as it was optimised compared to the stock installation; it worked the hardware much harder.
The current CUDA application hasn't been optimised to take advantage of newer hardware; it was built to give valid results on a wide range of hardware.
Lunatics are presently working on optimised applications that will be more efficient (and work your hardware harder) than the current stock applications, but it takes time and the developers need to earn a living. That tends to cut into the time available for development.
Grant
Darwin NT
ID: 1765063
Sidewinder
Volunteer tester
Joined: 15 Nov 09
Posts: 100
Credit: 79,432,465
RAC: 0
United States
Message 1765169 - Posted: 15 Feb 2016, 13:42:10 UTC - in response to Message 1765020.  

You're not crazy. My 750 Tis were running ~90-95% utilization (2 WUs per card) on the v7 app and they're only around 80% on v8, but I don't think that has actually slowed the cards any. The run times are pretty much the same between v7 and v8, and your times seem about the same as mine. I'd chalk this up to the optimizations done by the SETI@Home and Lunatics teams; see http://setiathome.berkeley.edu/forum_thread.php?id=78710#1752922.
ID: 1765169
Profile Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1766088 - Posted: 18 Feb 2016, 19:05:36 UTC - in response to Message 1765000.  

After running v8 for a couple weeks on my old xw9400, I noticed the same thing. Whereas with v7 the average GPU load factor on the 4 GPUs (according to GPU-Z) was consistently in the 88-98% range, with v8 it had dropped to 78-83%.

I first experimented with mbcuda.cfg changes (as suggested to someone else by Jason in another thread), increasing the pfblockspersm and pfperiodsperlaunch values. That only seemed to boost the GPU load factor by a percent or two. So, after about 4 days of running that configuration, I decided to try increasing the number of tasks per GPU. With v7, 3 tasks per GPU had tended to degrade the performance of the GTX750Ti cards (although 3 tasks on the GTX660s increased throughput).
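For anyone who hasn't looked inside that file, the settings involved are just plain key = value lines in mbcuda.cfg. A minimal sketch using the higher values from Configurations 4 and 5 below (and assuming the usual [mbcuda] global section at the top of the file) would look something like:

[mbcuda]
processpriority = abovenormal
pfblockspersm = 15
pfperiodsperlaunch = 200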

After changing from 2 to 3 tasks per GPU, the GPU load factor increased to a range of about 86-93%. Still less than v7, but it appeared to be an improvement.

In order to verify that the box truly was more productive, I went and pulled run times from my archives for tasks from two different angle ranges, "Normal" ARs of 0.44nnnn and "High" ARs of 2.72nnnn. I also went further back and pulled run times from v7 tasks, ending up with comparisons for 5 different configurations, 2 under v7 and 3 under v8. All figures are for periods when the machine was running MB tasks only. (Mixing in APs really screws up MB run times.)

The tables below show my results. The first table shows average run times, while the second shows average throughput, which might make for an easier comparison. All but two of the averages were calculated from 5 tasks each. (Only the CPU numbers for Configuration 4 used fewer than 5 tasks, simply because I didn't have enough samples available in each Angle Range, due to the fact that I only ran that configuration for about 4 days.)

The machine is a dedicated cruncher with dual Quad-Core AMD Opteron 2389 processors (8 cores total) running Windows 7 32-bit and BOINC 7.6.9. It has 2 GTX660 and 2 GTX750Ti GPUs (all with different core clocks), using driver 353.30.

KEY:
Configuration 1: v7 - 2 tasks (cuda50) per GPU (0 cores free) - (processpriority = abovenormal; pfblockspersm = 4; pfperiodsperlaunch = 100)
Configuration 2: v7 - 2 tasks (cuda50) per GPU (1 core free) - (processpriority = abovenormal; pfblockspersm = 4; pfperiodsperlaunch = 100)
Configuration 3: v8 - 2 tasks (cuda50) per GPU (1 core free) - (processpriority = abovenormal; pfblockspersm = 4; pfperiodsperlaunch = 100)
Configuration 4: v8 - 2 tasks (cuda50) per GPU (1 core free) - (processpriority = abovenormal; pfblockspersm = 15; pfperiodsperlaunch = 200)
Configuration 5: v8 - 3 tasks (cuda50) per GPU (2 cores free) - (processpriority = abovenormal; pfblockspersm = 15; pfperiodsperlaunch = 200)

Angle Ranges: Normal = 0.44nnnn, High = 2.72nnnn

AVERAGE TASK RUN TIMES ([hh:]mm:ss)
 Config  AR   GTX750Ti  GTX660    GTX660    GTX750Ti           CPU
             (1084Mhz) (1110Mhz) (888Mhz)  (1254Mhz)
   1   Normal  21:51     18:16     19:59     20:51           4:31:07
       High    14:01     11:50     12:40     13:27           2:39:44

   2   Normal  22:12     18:12     20:27     21:41           4:52:48
       High    14:37     11:51     12:30     14:16           2:32:07

   3   Normal  27:19     20:41     22:24     25:37           4:56:57
       High    18:53     12:24     14:51     18:37           2:31:48

   4   Normal  26:54     20:33     21:58     25:16           5:09:58
       High    17:04     12:28     13:51     17:30           2:33:38

   5   Normal  33:00     27:08     32:38     34:07           5:05:29
       High    25:49     15:39     22:47     23:16           2:31:34

AVERAGE TASK THROUGHPUT (Tasks/Hr)
 Config  AR   GTX750Ti  GTX660    GTX660    GTX750Ti    GPU    CPU     Host
             (1084Mhz) (1110Mhz) (888Mhz)  (1254Mhz)   Totals         Totals
   1   Normal  5.49      6.56      6.00      5.76      23.81   1.77   25.58
       High    8.57     10.15      9.48      8.92      37.12   3.01   40.13

   2   Normal  5.41      6.59      5.87      5.53      23.40   1.43   24.83
       High    8.21     10.13      9.60      8.41      36.35   2.76   39.11

   3   Normal  4.39      5.80      5.36      4.68      20.23   1.41   21.64
       High    6.35      9.68      8.08      6.45      30.56   2.77   33.33

   4   Normal  4.46      5.84      5.46      4.75      20.51   1.35   21.86
       High    7.03      9.63      8.66      6.86      32.18   2.73   34.91

   5   Normal  5.45      6.63      5.52      5.28      22.88   1.18   24.06
       High    6.97     11.50      7.90      7.74      34.11   2.38   36.49

Although a few of the numbers seem a bit odd (probably due to the small 5-task sample size that makes up each average), I think this comparison shows at least 3 things (to me, anyway). First, that v8 tasks take slightly longer than v7, no matter what. Second, that increasing pfblockspersm and pfperiodsperlaunch does help. And third, that under v8 it appears that there's at least a slight overall benefit from increasing the number of concurrent tasks for these particular cards from 2 to 3.
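For reference, going from 2 to 3 tasks per GPU is just a matter of dropping gpu_usage from 0.50 to 0.33 in an app_config.xml like the one Grant posted earlier, since BOINC runs as many tasks as will fit into one GPU's worth of usage. A sketch of the relevant section (the cpu_usage value is simply carried over from that earlier example):

<app>
    <name>setiathome_v8</name>
    <gpu_versions>
        <gpu_usage>0.33</gpu_usage>
        <cpu_usage>0.04</cpu_usage>
    </gpu_versions>
</app>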

A possible 4th observation is that freeing CPU cores really doesn't have that much of an impact. I had always run v7 flat out, with all 8 cores crunching, until late December, when I decided to try freeing just one. I continued with the one free core under v8 until I increased the tasks per GPU from 2 to 3, at which time I decided to try also freeing a second core. I may try going back to one, or even none, in the next few days.
ID: 1766088
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13732
Credit: 208,696,464
RAC: 304
Australia
Message 1766122 - Posted: 18 Feb 2016, 22:17:07 UTC - in response to Message 1766088.  

Although a few of the numbers seem a bit odd (probably due to the small 5-task sample size that makes up each average), I think this comparison shows at least 3 things (to me, anyway). First, that v8 tasks take slightly longer than v7, no matter what. Second, that increasing pfblockspersm and pfperiodsperlaunch does help. And third, that under v8 it appears that there's at least a slight overall benefit from increasing the number of concurrent tasks for these particular cards from 2 to 3.

On my systems, with 2 GTX 750Tis each using the default mbcuda.cfg values, I found 2 MB WUs at a time gives the best throughput. 3 at a time increases GPU load, and with longer-running WUs I get more WUs done per hour, but the blowout in shorty runtimes well and truly kills that advantage, hence I'm still running 2 at a time.
Did you try the higher pfblockspersm and pfperiodsperlaunch with the processpriority at the default?
When running v7 I tried multiple values of pfblockspersm and pfperiodsperlaunch with no effect. With v8 I'm running them on one system with processpriority at above normal and it does help, particularly with longer running WUs; but I had to give that away on my main machine as every 15th (or so) letter I typed didn't make it to the screen.



A possible 4th observation is that freeing CPU cores really doesn't have that much of an impact.

I've only ever run MB, and I've found that to be the case. With AP it might be necessary to get maximum GPU performance, but on my systems freeing up a CPU core had no effect on GPU output.
Grant
Darwin NT
ID: 1766122
Profile Jeff Buck
Volunteer tester
Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1766147 - Posted: 18 Feb 2016, 23:25:43 UTC - in response to Message 1766122.  

On my systems, with 2 GTX 750Tis each using the default mbcuda.cfg values, I found 2 MB WUs at a time gives the best throughput. 3 at a time increases GPU load, and with longer-running WUs I get more WUs done per hour, but the blowout in shorty runtimes well and truly kills that advantage, hence I'm still running 2 at a time.

Yeah, for the high AR shorties, my results are mixed. As you can see, one of the 660s and one of the 750Tis seemed to do better at 3 per GPU, while one of each did worse. However, there still appeared to be an overall gain in throughput of about 2 tasks per hour for shorties. For the "normal" ARs, 3 per GPU was a more consistent winner for all 4 GPUs. So, I would think that with a typical mixed bag of ARs, 3 per GPU will also come out slightly ahead.

Did you try the higher pfblockspersm and pfperiodsperlaunch with the processpriority at the default?

No, I didn't. At least, not yet. I've been running that box with "abovenormal" since way back in July 2013, when it just had a GTX650 and a GTX550Ti on board. Since it's a crunch-only machine, I'm not really worried about S@H's impact on anything else. I could probably even run with "processpriority = high" and not affect anything else, but I don't think I'd see much advantage to that, either.

When running v7 I tried multiple values of pfblockspersm and pfperiodsperlaunch with no effect. With v8 I'm running them on one system with processpriority at above normal and it does help, particularly with longer running WUs; but I had to give that away on my main machine as every 15th (or so) letter I typed didn't make it to the screen.

On my daily driver, I also leave everything at the defaults, since here I consider S@H as strictly background processing, sort of the original intent, I believe. :^)
ID: 1766147
