Getting maximum credits


Questions and Answers : GPU applications : Getting maximum credits

Author Message
Default
Joined: 23 Aug 08
Posts: 50
Credit: 2,222,384
RAC: 0
United States
Message 878182 - Posted: 22 Mar 2009, 3:12:39 UTC

I have three rigs with CUDA-enabled GPUs and was curious whether I would get the most credits by processing as many CUDA WUs as I could, or by just letting the BOINC software manage my WUs, which right now means mostly AP 5.03s. The quad core running 8 instances of AP in about 70 hours is fairly impressive, but it also has two GTX 260s in SLI, and I want to know if I might get more credits by running mainly CUDA. Thoughts?

OzzFan
Volunteer tester
Joined: 9 Apr 02
Posts: 13633
Credit: 31,173,330
RAC: 19,582
United States
Message 878197 - Posted: 22 Mar 2009, 4:08:07 UTC - in response to Message 878182.

You'll get the most credits by letting the CPUs run AstroPulse and the GPUs run CUDA. Running only CUDA will mean your CPUs go idle when they could be processing more data, earning you more credit.
____________

Fred W
Volunteer tester
Joined: 13 Jun 99
Posts: 2524
Credit: 11,954,210
RAC: 0
United Kingdom
Message 878226 - Posted: 22 Mar 2009, 8:03:07 UTC

And if your GTX 260s really are SLI'd, then you will be running only one CUDA task at a time. Switching off SLI will allow two to run at the same time.

F.
____________

Bob Mahoney Design
Joined: 4 Apr 04
Posts: 178
Credit: 9,205,632
RAC: 0
United States
Message 883978 - Posted: 10 Apr 2009, 16:48:43 UTC
Last modified: 10 Apr 2009, 16:56:58 UTC

OzzFan and Fred W are exactly right, of course, but here is a tuning method that might optimize anyone's particular system to a max:

1. Run a bunch of WU's with no CPU processing. That is, no Astropulse (AP) for a while.

2. Check BOINC Manager, Tasks screen as follows: Click on "Elapsed" column or "CPU time" column (depending on BOINC version) until completed tasks are at the top.

3. Read down the "To completion" column and see what the time estimate is for tasks that have NOT run yet. This is usually a very good indicator of GPU WU wall-clock time. This number can vary wildly with a few oddball WU's, but you'll get a feel for it after watching it for a while.

4. Now fill up your CPU, all cores, with AP work.

5. Let the system settle down for a long while. Now check your "To completion" and see how bad a performance hit GPU processing has taken. It might be bad.

6. Adjust DOWN by one the number of CPU cores dedicated to AP processing.

7. Recheck the "To completion" column.

8. Calculate your total system RAC.

9. Repeat steps 6 through 8 until you find the RAC sweet spot for your system.

How do you know it is the "sweet spot"? This is a difficult calculation. Here's a clumsy summary of the math:

First, you need to know what your GPU setup will do, all by itself, for an RAC. You can estimate this from the data you've collected above: 86,400 seconds in a day, divided by the seconds to complete a WU, multiplied by the credit per WU, multiplied by the number of GPUs.

Now you know your predicted GPU RAC output.
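As a rough illustration, the estimate described above can be written as a small Python function. This is only a sketch: the function name and the sample figures (900 seconds per WU, 40 credits per WU, 2 GPUs) are made up for the example and are not measurements from this thread. The result is a steady-state credit-per-day figure, which is roughly what RAC should drift toward if the host keeps running at that rate.

```python
SECONDS_PER_DAY = 86_400


def estimate_gpu_daily_credit(seconds_per_wu: float,
                              credit_per_wu: float,
                              num_gpus: int) -> float:
    """Estimate credit per day for GPUs that each run one WU at a time.

    seconds_per_wu: wall-clock seconds to finish one GPU WU
                    (read from the "To completion" column).
    credit_per_wu:  typical credit granted for one such WU.
    num_gpus:       number of GPUs crunching concurrently.
    """
    if seconds_per_wu <= 0:
        raise ValueError("seconds_per_wu must be positive")
    wus_per_gpu_per_day = SECONDS_PER_DAY / seconds_per_wu
    return wus_per_gpu_per_day * credit_per_wu * num_gpus


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(estimate_gpu_daily_credit(seconds_per_wu=900,
                                    credit_per_wu=40,
                                    num_gpus=2))
```

Plug in your own "To completion" time and the credit you usually see granted for a Multibeam WU.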

Then, with each CPU AP test (1-core, 2-core, etc.) you can recalculate the expected GPU total output (should be lower the more you load up the CPU with AP), vs. the predicted gain in RAC from AP processing.

You should find a point of diminishing returns for total RAC. As you run more concurrent AP on the CPU you might find your total GPU RAC output drops more than you re-gain by loading up the CPU cores with Astropulse.
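The comparison across trials can be sketched the same way. The code below assumes you have recorded, for each number of AP cores tried, the observed GPU WU time and AP task time; all names and numbers here are hypothetical placeholders, and the credit-per-task values must come from your own results pages.

```python
SECONDS_PER_DAY = 86_400


def daily_credit(seconds_per_task: float,
                 credit_per_task: float,
                 concurrent_tasks: int) -> float:
    """Credit per day for a number of tasks running side by side."""
    if concurrent_tasks == 0:
        return 0.0
    return SECONDS_PER_DAY / seconds_per_task * credit_per_task * concurrent_tasks


def best_ap_core_count(trials, gpu_credit_per_wu, ap_credit_per_task, num_gpus):
    """Pick the AP core count with the highest combined daily credit.

    trials: iterable of (ap_cores, gpu_seconds_per_wu, ap_seconds_per_task)
            tuples, one per tuning run (steps 6 through 8).
    Returns (best_ap_cores, {ap_cores: total_daily_credit}).
    """
    totals = {}
    for ap_cores, gpu_seconds, ap_seconds in trials:
        gpu_part = daily_credit(gpu_seconds, gpu_credit_per_wu, num_gpus)
        cpu_part = daily_credit(ap_seconds, ap_credit_per_task, ap_cores)
        totals[ap_cores] = gpu_part + cpu_part
    best = max(totals, key=totals.get)
    return best, totals


if __name__ == "__main__":
    # Made-up trial data: GPU WUs slow down as more cores run AP.
    example_trials = [
        (4, 1800, 90_000),
        (3, 1000, 86_400),
        (2, 900, 86_400),
    ]
    best, totals = best_ap_core_count(example_trials,
                                      gpu_credit_per_wu=40,
                                      ap_credit_per_task=700,
                                      num_gpus=2)
    for cores in sorted(totals):
        print(f"{cores} AP cores: about {totals[cores]:.0f} credits/day")
    print("Best AP core count in this made-up data:", best)
```

In this invented data set the GPU slowdown at 4 AP cores costs more credit than the extra AP task earns, which is the diminishing-returns effect described above.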

There should be a magical point at which RAC will be optimized for your entire system working together. This will, usually, also be the point of maximum wattage used. This indicates maximum work. (This isn't always true, but is close enough for this analysis. Just remember a watt given to the GPU usually does more work than that watt spent on a CPU.)

The magic point for my mutated Sarge1 host was not what I expected.

Please remember you are tuning a supercomputer, yes you are. The first job of the CPU is to feed the GPU processors. The next priority for the CPU is to use idle time to complete extra work, but not at the expense of GPU output. A system might work best with lots of apparent idle time for the CPU, since maximizing math/trigonometric-coprocessor (GPU CUDA, in our case) output is exactly what supercomputers are tuned to do.

One last point: I highly recommend using BOINC 6.6.15 or above for this, since older versions of BOINC allow an AP completion to greatly skew the estimated "To Completion" column for all the GPU tasks (running Multibeam) after each AP unit completes. BOINC 6.6.20 nicely isolates AP time calculations from MB time calculations.

This exact method increased my RAC by quite a bit.

Why do we need to do this type of counter-intuitive tuning to get maximum work output? I suspect we are stressing the capabilities of Windows multitasking. That makes the logical answer sometimes less than optimal.

I hope this helps! And I apologize for the windbagishness of this post.

Bob Mahoney
____________
Opinion stated as fact? Who, me?

Default
Joined: 23 Aug 08
Posts: 50
Credit: 2,222,384
RAC: 0
United States
Message 885416 - Posted: 15 Apr 2009, 1:40:45 UTC - in response to Message 883978.

Thanks for the replies. I am slowly getting things figured out. I may have stumbled onto an easy method for maximizing multi-threaded processors running CUDA. Going off what Bob has given us, I used the Performance tab in Task Manager to observe each of my cores and memory usage (it shows 8 cores on a multi-threaded i7). With all eight CPU cores enabled, Task Manager showed (obviously) that all the cores were running at 100 percent when running eight instances of AP. I then started disabling cores in cc_config. With one core disabled, Task Manager still showed all eight running at nearly 100 percent. I then tried two, for a total of 6 CPU and two GPU tasks. This seems to be my magic number, as Task Manager shows average CPU utilization at about 85 percent, with seven CPU cores at near 100 percent usage and one at about 40 percent, which approaches 100 percent only when an MB WU begins. Disabling 3 cores causes the average CPU utilization to drop to 70 percent. Completion times have dropped significantly and I think I've solved my CUDA errors problem, too. A quick way to find the sweet spot? I think so!
____________

Gundolf Jahn
Joined: 19 Sep 00
Posts: 3184
Credit: 359,640
RAC: 35
Germany
Message 885486 - Posted: 15 Apr 2009, 8:07:58 UTC - in response to Message 885416.

...I then started disabling cores in cc_config...

Quote from Client configuration in the BOINC wiki:
<ncpus>
Act as if there were N CPUs: run N tasks at once. This is for debugging, i.e. to simulate 2 CPUs on a machine that has only 1. To use the number of available CPUs, set the value to -1. Don't use it to limit the number of CPUs used by BOINC; use general preferences instead.

The preference in question would be
"On multiprocessors, use at most 75% of the processors" (enforced by version 6.1+)
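To see what that percentage means in whole cores, here is a minimal Python sketch. It assumes the client rounds the result down and never goes below one core; that rounding rule is an assumption for illustration, not something stated in the wiki text quoted above, and the function name is made up.

```python
def cores_from_percentage(total_cores: int, max_percent: float) -> int:
    """Number of cores BOINC would use for CPU tasks.

    Assumption (not confirmed by the thread): the product is rounded
    down and at least one core is always used.
    """
    if total_cores < 1:
        raise ValueError("total_cores must be at least 1")
    return max(1, int(total_cores * max_percent / 100))


if __name__ == "__main__":
    for cores in (2, 4, 8):
        print(cores, "cores at 75% ->", cores_from_percentage(cores, 75))
```

Under that assumption, 75% on a 4-core machine leaves one core free of CPU tasks, and 75% on an 8-thread i7 leaves two.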

Regards,
Gundolf
____________
Computers aren't everything in life. (Just kidding)

SETI@home classic workunits 3,758
SETI@home classic CPU time 66,520 hours

Steven Meyer
Joined: 24 Mar 08
Posts: 2301
Credit: 2,999,007
RAC: 0
United States
Message 887642 - Posted: 23 Apr 2009, 17:00:55 UTC - in response to Message 883978.

... here is a tuning method that might optimize anyone's particular system to a max:
...
6. Adjust DOWN by one the number of CPU cores dedicated to AP processing.
...
Bob Mahoney


Bob,

When we use the option ...
"On multiprocessors, use at most xx% of the processors"
setting it to 75% on a 4 processor system, would that also make one less processor available to feed the CUDA, or does it simply run one less concurrent AP process?

____________


neil.stott
Joined: 28 Aug 03
Posts: 1
Credit: 9,113
RAC: 0
United Kingdom
Message 888385 - Posted: 26 Apr 2009, 1:50:58 UTC - in response to Message 887642.

Looking at the Processes tab in Task Manager, the 75% setting seems to use 3 CPUs for AP [75% total CPU], then 8-16% of my 4th CPU for CUDA feeding [77-79% total CPU].

Running with the 100% setting, each AP process uses approx. 24%, with CUDA using 2-4% [100% total CPU].

I find the 75% setting makes my system more usable while it's completing work units, allowing me to do other stuff.

Estimated time to completion doesn't appear to change for CUDA units between 75% and 100%.

system: Q9550 - 9800GTX+

____________

Bob Mahoney Design
Joined: 4 Apr 04
Posts: 178
Credit: 9,205,632
RAC: 0
United States
Message 888479 - Posted: 26 Apr 2009, 14:34:07 UTC - in response to Message 887642.

Bob,

When we use the option ...
"On multiprocessors, use at most xx% of the processors"
setting it to 75% on a 4 processor system, would that also make one less processor available to feed the CUDA, or does it simply run one less concurrent AP process?

I haven't experimented with the "use at most xx% of the processors" option. I fear it might limit all processing (GPU support included). While that is probably a good idea for systems being used for non-BOINC activities, my system is only running BOINC and SETI, no other user apps.

I've always used the cc_config method of limiting the number of processors.

My system (older 4-core CPU with 6 GPU) is most comfortable at 1xCPU for AP while filling all the GPU with MB.

Here is my cc_config.xml file:

<cc_config>
  <options>
    <ncpus>1</ncpus>
  </options>
</cc_config>

I edit the cc_config.xml with Windows Notepad. If you are creating such a file for the first time, the simplest way to do it is as follows:

Right click on a blank area of your desktop.
Select "new" then "text document"
Now right click on the new text document just created.
Select "open with" then "Notepad"
Copy and paste in the contents as shown above.
Select "File" then "Save As"
Give it the filename cc_config.xml. This forces it to be an xml type of file.

Then make sure you put this new .xml file in your BOINC data folder, which by default contains things like a projects folder, a slots folder, etc.
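If you would rather not do this by hand, the same file can be generated with a few lines of Python (3.9 or later, for `ET.indent`). This is only a sketch: the helper names are made up, and the data folder path is a placeholder you must replace with your own, since its location differs between installs. Keep in mind the BOINC wiki passage quoted earlier in the thread describes <ncpus> as a debugging option.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def build_cc_config(ncpus: int) -> str:
    """Return cc_config.xml text containing a single <ncpus> option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "ncpus").text = str(ncpus)
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode") + "\n"


def write_cc_config(data_dir: Path, ncpus: int) -> Path:
    """Write cc_config.xml into the given BOINC data folder."""
    target = data_dir / "cc_config.xml"
    target.write_text(build_cc_config(ncpus), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Prints the file contents; call write_cc_config() with the path to
    # your own BOINC data folder to actually save it.
    print(build_cc_config(1))
```

Printing first and writing second lets you check the contents against the file shown above before touching the data folder.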

For future edits of this file, just right click on the file, choose "open with" and then "Notepad". Edit it, save it, then exit Notepad.

Note: The <ncpus> parameter only affects the number of CPU cores used for workunit processing on the CPU itself. The <ncpus> parameter does not affect the amount of total CPU available for support of GPU-based processing. So if your GPU wants 10% of total CPU power, that is what it still gets. Not 10% of a core, but 10% of all of the CPU (all cores) in total.
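The difference between "percent of a core" and "percent of the whole CPU" is easy to mix up, so here is a small Python sketch of the conversion. The function names are made up for illustration; the sample figures are the ones quoted in this thread (10% of total CPU, and 8-16% of one core on a 4-core Q9550).

```python
def core_share_to_total(percent_of_one_core: float, total_cores: int) -> float:
    """Convert a load on a single core into percent of the whole CPU."""
    return percent_of_one_core / total_cores


def total_share_to_core_equivalent(percent_of_total: float, total_cores: int) -> float:
    """Convert percent of the whole CPU into percent of a single core."""
    return percent_of_total * total_cores


if __name__ == "__main__":
    # 10% of a 4-core CPU is the same amount of work as 40% of one core.
    print(total_share_to_core_equivalent(10, 4))
    # 8% and 16% of one core on a 4-core CPU, as a share of the whole CPU.
    print(core_share_to_total(8, 4), core_share_to_total(16, 4))
```

This is also why the 8-16% of a fourth core reported earlier in the thread lines up with the 2-4% of total CPU seen at the 100% setting.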

Bob
____________
Opinion stated as fact? Who, me?


Copyright © 2014 University of California