Can I further optimize multiple GPU calculations?

ReiAyanami
Message 1771941 - Posted: 16 Mar 2016, 18:17:25 UTC

I have 3 PCs currently working on SETI.

i7-3930K with 3 x GTX 670, <GPU_usage>.33, <cpu_usage>.04, Run time 1409sec, CPU time 243sec, 76 credits
i7-950 with 1 x GTX 680, <GPU_usage>.33, <cpu_usage>.04, Run time 1446sec, CPU time 199sec, 95 credits
Q6600 with 1 x GTX 950, <GPU_usage>.50, <cpu_usage>.04, Run time 2478sec, CPU time 812sec, 102 credits

Run time, CPU time, and credits are averages of the latest 20 WUs for SETI@home v8.
With these settings, GPU usage stays above 80% most of the time.

If I change the CPU usage assignment, will it affect Run time?
And does it affect credit?

I can always test this by changing the settings,
but could someone help me with the rationale?

rob smith
Message 1771954 - Posted: 16 Mar 2016, 19:24:31 UTC

20 tasks is far too small a sample size to give you any indication of the performance of a processor; you need to run about two thousand tasks to get a meaningful sample of task types and wingmen - and yes, that will take a couple of weeks to achieve.

Sidewinder
Message 1772007 - Posted: 16 Mar 2016, 22:33:11 UTC

Like Rob said, 20 WUs is too small a sample size to get a good idea of average times (not all WUs are the same). To really check your settings, you would need to run some test WUs many times. I think the ones at Lunatics should still be valid (http://lunatics.kwsn.info/index.php?action=downloads;cat=45). Someone correct me if I'm wrong :)

That being said, in my experience you *may* see a decrease in processing times on your crunchers with older CPUs (your i7-950 and Q6600) if you increase the CPU reserved per WU. On my Q8300 I needed to bump up my <cpu_usage> to 0.4 to keep my GPUs properly fed.
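
For reference, that value lives in app_config.xml in the project's data folder. A minimal sketch of what I mean - the app name and numbers here are just an example, so check them against your own setup:

  <app_config>
    <app>
      <name>setiathome_v8</name>          <!-- example app name; use the one from your client_state.xml -->
      <gpu_versions>
        <gpu_usage>0.5</gpu_usage>        <!-- run 2 tasks per GPU -->
        <cpu_usage>0.4</cpu_usage>        <!-- tell BOINC each GPU task wants 0.4 of a CPU core -->
      </gpu_versions>
    </app>
  </app_config>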

ReiAyanami
Message 1772009 - Posted: 16 Mar 2016, 22:46:37 UTC

20 tasks is far too small a sample size to give you any indication of the performance of a processor; you need to run about two thousand tasks to get a meaningful sample of task types and wingmen - and yes, that will take a couple of weeks to achieve.


I get similar averages if I take 20 tasks from 10 days ago or 20 days ago.
Also, I don't understand how wingmen's performance affects my calculation speed.
Am I missing something very fundamental?

My question is, simply put: by increasing <cpu_usage> from 0.04, can I reduce the run time of GPU tasks? And if so, what improvement can I expect from the increased CPU allocation - is it linear? Does it saturate?

ReiAyanami
Message 1772011 - Posted: 16 Mar 2016, 22:50:11 UTC

On my Q8300 I needed to bump up my <cpu_usage> to 0.4 to keep my GPUs properly fed.


Thank you very much for the info.
I guess I can experiment with mine too, now that I know I can further reduce the processing time.

rob smith
Message 1772077 - Posted: 17 Mar 2016, 5:53:00 UTC

Given you are running Intel processors, there will be virtually no improvement that can be realistically measured over such a small sample - which is why I said use a LARGE sample.

ReiAyanami
Message 1772150 - Posted: 17 Mar 2016, 14:26:53 UTC
Last modified: 17 Mar 2016, 14:51:07 UTC

I understand your point.
With a run time of a few thousand seconds and an associated standard deviation of a few hundred, I can estimate the statistical power.
That's why I wanted a rationale: I don't have any clue what the CPU is calculating or what sequence of events is happening, i.e., does the GPU need to wait periodically for whatever the CPU feeds it, etc.?

I decided to start my experiment blindly, but following Sidewinder's observation, by increasing CPU usage to 0.4 for now, and to keep accumulating data.
Sadly, I don't even know what a CPU usage of 0.4 means, i.e., is it an upper limit, an average, time, threads, or something else?

With an average run time of ~2500 sec and a current estimated standard deviation of ~600 sec, it will take a while ;-)

rob smith
Message 1772162 - Posted: 17 Mar 2016, 15:32:20 UTC

Simple statistical analysis as you are suggesting will not work - there are too many variations in the run time of tasks due to the data contained in them. For example, this morning I watched 6 tasks running on the GPUs of one of my PCs; the run times varied between 12 and 45 minutes (ignoring a couple of 5-second runs). Now if I'd been "lucky" and all of the 20 or so tasks I used for an analysis were at one end of the spectrum, and then I made a change and ran another 20 or so tasks, but they came from the other end of the spectrum, I would get a totally wrong impression of the effect of the change.
You need to capture a representative set of data, and, given the way SETI distributes the data, this means long periods of time and lots of samples. And LONG really does mean long - hundreds or even thousands of tasks.

ReiAyanami
Message 1772191 - Posted: 17 Mar 2016, 18:16:13 UTC
Last modified: 17 Mar 2016, 18:29:01 UTC

That's exactly why I asked for a rationale to begin with.
When I look at the run time distributions on my main PC, usually 2/3 are 27-33 min WUs, 20% or so are 10-12 min WUs, and some are in between. I'm sure there is a good reason for this, but I don't know it.

rob smith
Message 1772200 - Posted: 17 Mar 2016, 18:56:19 UTC

There are a number of reasons for the variability in the run times.
The simplest to understand is the amount of noise contained within the data; next is the "angular range" of the data. On top of these, the number of potential signals, the characteristics of those signals, and even the location of those signals affect the speed of processing. Then there are the things that your PC decides to do in the background.

BilBg
Message 1772388 - Posted: 18 Mar 2016, 11:55:08 UTC - in response to Message 1771941.  
Last modified: 18 Mar 2016, 12:27:16 UTC

If I change the CPU usage assignment, will it affect Run time?
And does it affect credit?

No and no

The <cpu_usage> value is not used by the (GPU) applications; they use as much CPU time as they need, no matter what value you set for <cpu_usage>.

<cpu_usage> is only used by BOINC - to decide if it should run fewer CPU tasks so that GPU tasks/apps have enough free CPU.

The value is "part of a CPU core" (1 = full core, .04 = 4% of core ("core" as a device))

So if you change <cpu_usage>.04 to <cpu_usage>.33, this will Not have any effect (for a computer which has one GPU):
3 GPU tasks * 0.33 cpu_usage = 0.99 = (truncated to) 0
= BOINC will act "normally" - the same as with <cpu_usage>0.001
BOINC will Not run one fewer CPU task (will Not "free a CPU core") - it will start/run as many CPU tasks as the number of CPU cores (threads) (if you "Use at most 100% of the CPUs" in Preferences)

If you change to <cpu_usage>.34 -
3 GPU tasks * 0.34 = 1.02 = (truncated to) 1 = BOINC will run one fewer CPU task (will "free a CPU core")

Any value 0.34 ... 0.66 will free 1 CPU core
0.67 will free 2 CPU cores (for 3 GPU tasks running)

Free CPU cores may/will affect the Run time of GPU tasks and so affect credit/day
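
As a sketch, here is where those numbers go (assuming app_config.xml and the setiathome_v8 app name as an example - adjust to your own file):

  <app_config>
    <app>
      <name>setiathome_v8</name>        <!-- example app name -->
      <gpu_versions>
        <gpu_usage>0.33</gpu_usage>     <!-- 3 tasks per GPU -->
        <cpu_usage>0.34</cpu_usage>     <!-- 3 * 0.34 = 1.02, truncated to 1: BOINC frees 1 CPU core -->
      </gpu_versions>
    </app>
  </app_config>

With 0.33 instead of 0.34, the sum is 0.99, truncated to 0 = no core freed.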


Sadly, I don't even know what a CPU usage of 0.4 means, i.e., is it an upper limit, an average, time, threads, or something else?

Again - the applications don't know what you use for <cpu_usage>; only BOINC knows.
BOINC can't restrict GPU apps by "upper limit, average, time, threads" - once the app is started, it uses CPU, GPU, RAM ... as it needs.

So a CPU usage of 0.4 means (is a hint) for BOINC that one running GPU app will need 40% of a CPU core ("thread" (virtual CPU) if you use Hyper-Threading).
BOINC sums all the <cpu_usage> numbers of the currently running apps,
then truncates to an integer (1.99999 -> 1) and reduces the number of started CPU tasks by that amount (i.e. instead of 4 it will start 3 CPU tasks).


i7-3930K with 3 x GTX 670, <GPU_usage>.33, <cpu_usage>.04

For this system with 3 GPUs, <cpu_usage>.4 will make BOINC do:
3 GPUs * 3 GPU tasks * 0.4 = 3.6 = (truncated to) 3 = BOINC will run 3 fewer CPU tasks (will "free 3 CPU cores")
 


ReiAyanami
Message 1772403 - Posted: 18 Mar 2016, 13:21:25 UTC
Last modified: 18 Mar 2016, 13:31:16 UTC

Thank you very much for the detailed explanation.
Your explanation is very clear in terms of how I can free up CPU cores.

May I ask for a few more clarifications?

The <cpu_usage> value is not used by the (GPU) applications; they use as much CPU time as they need, no matter what value you set for <cpu_usage>.

<cpu_usage> is only used by BOINC - to decide if it should run fewer CPU tasks so that GPU tasks/apps have enough free CPU.


This is still confusing to me. Doesn't this mean the <cpu_usage> value affects the GPU application, since it is deciding how much free CPU is allocated to GPU tasks?
If the GPU application uses as much CPU time as it needs, why do we need to free up CPU cores? Also, this brings up another question: how many cores should I free up on my old PC to maximize the speed of the GPU application? (And now I can test it by changing the <cpu_usage> value - thank you.)

i7-3930K with 3 x GTX 670, <GPU_usage>.33, <cpu_usage>.04

For this system with 3 GPUs, <cpu_usage>.4 will make BOINC do:
3 GPUs * 3 GPU tasks * 0.4 = 3.6 = (truncated to) 3 = BOINC will run 3 fewer CPU tasks (will "free 3 CPU cores")


Here, 3 GPUs * 3 GPU tasks * 0.04 = 0.36 = (truncated to) 0
But my PC only runs 11 CPU tasks, not 12.
I'm not restricting CPU usage; it's set at 100% of the CPUs and CPU time.

This PC uses above 85% of the GPUs and 98-99% of the CPU while I'm not doing anything other than SETI calculations with this setting.

tullio
Message 1772416 - Posted: 18 Mar 2016, 14:57:24 UTC
Last modified: 18 Mar 2016, 14:57:46 UTC

My SETI@home GPU tasks use 0.135 of a CPU; my Einstein@home tasks use 0.2. All by themselves - I made no intervention. I get about 30 GPU tasks in SETI@home, and they coexist well with the CERN projects like Atlas@home, vLHC@home and vLHCathome-dev (which includes CMS-dev), all using VirtualBox.
Tullio

ReiAyanami
Message 1772486 - Posted: 18 Mar 2016, 20:18:28 UTC
Last modified: 18 Mar 2016, 20:32:25 UTC

I tested further and found that if I run 8 GPU tasks, all 12 CPU tasks run at the same time, but not with 9 GPU tasks (<cpu_usage>.04). So this doesn't quite fit what was described by BilBg. I wonder what I am overlooking here.
If I suspend all GPU tasks, then all 12 CPU tasks run, of course.
Then I suspended all CPU tasks and observed the CPU usage of the GPU tasks (why didn't I think of this earlier?): 8 threads kept working for the GPU tasks while 4 threads dropped to 0, with average CPU usage of approximately 14%. Meanwhile, the average GPU usage of 85% (with CPU tasks running) went up to 95%.
I still don't have a clue how the CPU contributes to GPU tasks, but for my 3 GPUs, running 9 tasks seems to require 15% of total CPU usage on average.
It looks like if I can dedicate 2 threads to GPU tasks (16.7%), I may be able to maximize my GPU usage.
But again, I don't know the internal workings of how the software assigns CPU to GPU tasks, so this may not be the case...
We'll see.

BilBg
Message 1772572 - Posted: 19 Mar 2016, 3:20:35 UTC - in response to Message 1772403.  
Last modified: 19 Mar 2016, 3:44:18 UTC

May I ask for a few more clarifications?
The <cpu_usage> value is not used by the (GPU) applications; they use as much CPU time as they need, no matter what value you set for <cpu_usage>.

<cpu_usage> is only used by BOINC - to decide if it should run fewer CPU tasks so that GPU tasks/apps have enough free CPU.

This is still confusing to me. Doesn't this mean the <cpu_usage> value affects the GPU application, since it is deciding how much free CPU is allocated to GPU tasks?
If the GPU application uses as much CPU time as it needs, why do we need to free up CPU cores?

What do you mean by "it is deciding"?
It reads like "the value ... is deciding", and a value itself can't decide ;)

If you mean "the GPU application ... is deciding", it is not.
Apps don't know about the value and can't restrict themselves to use less CPU (or "unleash" themselves to use more if you give them freedom with a bigger value).

Even if they knew about the value, I doubt there is an easy way for them (for the programmers) to monitor their own CPU usage and insert pauses to fit under the value you select.
And "pauses" in the CPU part of the GPU app would mean no new data and code for the GPU part (no "feed" for the GPU) = lower GPU load and longer Run time - most users would not like that.

The CPU usage of a GPU app comes not only from the app itself but also from the driver (sometimes mostly from the driver).

BOINC also can't restrict (throttle) CPU apps to make/force them to use less CPU, to free e.g. 30% of a core.
That would mean BOINC managing processes at the thread level (pausing a computing thread for 30% of the time, in ms intervals).
I think TThrottle can do that (judging by what happens with processes' threads in Process Explorer).

BOINC can pause the processes at intervals of seconds.
E.g., if set to "Use at most 70% of CPU time", it will let the processes run for 7 seconds and pause them for 3 seconds.
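
If you prefer a file over the GUI, I believe the same preference can go in global_prefs_override.xml (a sketch - the 70 here is just the example value from above):

  <global_preferences>
    <cpu_usage_limit>70</cpu_usage_limit>   <!-- "Use at most 70% of CPU time": ~7 s running, ~3 s paused -->
  </global_preferences>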


i7-3930K with 3 x GTX 670, <GPU_usage>.33, <cpu_usage>.04

For this system with 3 GPUs, <cpu_usage>.4 will make BOINC do:
3 GPUs * 3 GPU tasks * 0.4 = 3.6 = (truncated to) 3 = BOINC will run 3 fewer CPU tasks (will "free 3 CPU cores")


Here, 3 GPUs * 3 GPU tasks * 0.04 = 0.36 = (truncated to) 0
But my PC only runs 11 CPU tasks, not 12.
I'm not restricting CPU usage; it's set at 100% of the CPUs and CPU time.

You posted earlier:
"I decided to start my experiment blindly, but following Sidewinder's observation, by increasing CPU usage to 0.4 for now, and to keep accumulating data."
That's why I used 0.4 and not 0.04.

For "only runs 11 CPU tasks, not 12" and "found that if I run 8 GPU tasks, all 12 CPU tasks run at the same time but not with 9 GPU tasks (<cpu_usage>.04)."
- maybe another limit is reached, probably on RAM ("When computer is in use, use at most" - try 99% - some tasks need more RAM on startup)

Try the effect of <cpu_usage>0.001.
If you again see 11 CPU tasks, this has to be caused by another limit.

Also, 11+9 = 8+12 = 20, so you may have some forgotten <max_concurrent>20</max_concurrent> in app_config.xml.
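i.e. something like this (a sketch - as I understand it, <project_max_concurrent> caps the total number of running tasks for the project, while <max_concurrent> inside an <app> block caps only that app):

  <app_config>
    <project_max_concurrent>20</project_max_concurrent>   <!-- total running tasks, CPU + GPU -->
    <app>
      <name>setiathome_v8</name>            <!-- example app name -->
      <max_concurrent>20</max_concurrent>   <!-- limit for this app only -->
    </app>
  </app_config>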


There has to be some debug log flag you can set so that BOINC will tell you why it decided to run one fewer CPU task, but I'm not sure which flag:
http://boinc.berkeley.edu/wiki/Client_configuration#Logging_flags

They (Logging Flags) can be set from the BOINC Manager menu: Options -> Event Log options (Ctrl+Shift+F).
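
A sketch of cc_config.xml with the flag I would try first (my guess is <cpu_sched_debug>, which makes the client log its CPU scheduling decisions):

  <cc_config>
    <log_flags>
      <cpu_sched_debug>1</cpu_sched_debug>   <!-- log why the scheduler starts/stops each task -->
    </log_flags>
  </cc_config>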


If you want just a fixed number of free CPU cores (no matter whether and how many GPU tasks run), you may just:
1) set <cpu_usage>0.001
2) set "Use at most XX% of the CPUs"
11/12 ≈ 91.7%
10/12 ≈ 83.3%

So any value 84% ... 91% will free 2 "cores" of 12.
(I think these are virtual/Hyper-Threading "cores" - of course BOINC doesn't care if they are real or virtual; if the OS says "you have 12", BOINC will not disagree ;) )
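
In file form this would be something like the following in global_prefs_override.xml (a sketch; <max_ncpus_pct> is the "Use at most XX% of the CPUs" preference):

  <global_preferences>
    <max_ncpus_pct>91</max_ncpus_pct>   <!-- 0.91 * 12 = 10.92, truncated to 10 "cores" used, 2 left free -->
  </global_preferences>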
 
 


Richard Haselgrove
Message 1772624 - Posted: 19 Mar 2016, 11:50:03 UTC - in response to Message 1772572.  

The CPU usage of a GPU app comes not only from the app itself but also from the driver (sometimes mostly from the driver).

And the 'driver' is not a simple, monolithic whole that is the same for all apps.

It contains the runtime support components for the programming language chosen by the application developer - typically CUDA or OpenCL for the current generation of GPU applications, perhaps soon to be extended with Vulkan.

And within the limitations of each language and runtime environment, CPU usage will also depend on the objectives and programming techniques chosen by the developer.

ReiAyanami
Message 1773029 - Posted: 21 Mar 2016, 13:04:46 UTC
Last modified: 21 Mar 2016, 13:20:13 UTC

Thank you very much for the help.
Also, 11+9 = 8+12 = 20, so you may have some forgotten <max_concurrent>20</max_concurrent> in app_config.xml.

You were right - I had forgotten that I set the max at 20. I was thinking the max applied within the application defined by <name>, but obviously it's the total.

I freed up 1 CPU thread on the PCs with one graphics card and 2 on the one with 3 cards, and GPU usage went up a little. Meanwhile, total CPU usage went down by only a few percent. We'll see if the increased GPU work can make up for the decreased CPU work, or outperforms the previous setting overall.

This setting is great because even with the little bit of work I have to do on these PCs from time to time, the screen doesn't freeze anymore and SETI work isn't held up, either.
Thank you again!

Cruncher-American
Message 1773053 - Posted: 21 Mar 2016, 16:01:16 UTC - in response to Message 1772486.  

I tested further and found that if I run 8 GPU tasks, all 12 CPU tasks run at the same time, but not with 9 GPU tasks (<cpu_usage>.04). So this doesn't quite fit what was described by BilBg. I wonder what I am overlooking here.
If I suspend all GPU tasks, then all 12 CPU tasks run, of course.
Then I suspended all CPU tasks and observed the CPU usage of the GPU tasks (why didn't I think of this earlier?): 8 threads kept working for the GPU tasks while 4 threads dropped to 0, with average CPU usage of approximately 14%. Meanwhile, the average GPU usage of 85% (with CPU tasks running) went up to 95%.
I still don't have a clue how the CPU contributes to GPU tasks, but for my 3 GPUs, running 9 tasks seems to require 15% of total CPU usage on average.
It looks like if I can dedicate 2 threads to GPU tasks (16.7%), I may be able to maximize my GPU usage.
But again, I don't know the internal workings of how the software assigns CPU to GPU tasks, so this may not be the case...
We'll see.


The problem here is that Windows, like many OSes, doesn't properly account for CPU usage.

In Task Manager you have green and red CPU usage: green is application use, and red is system use. BUT there are really THREE types of usage: application (A), system support for applications (SA), and system business not chargeable to applications (SB).

A is obvious. SA is, e.g., reading a file - clearly supporting an app. SB is the problem here - it is, e.g., the system time spent switching tasks or paging. In this latter case, the more tasks are running, the more complex that work becomes, so SB can expand rapidly - even exponentially - as CPU usage gets near 100%. This is called "thrashing": the system stepping on its own toes, so to speak. Windows' charging algorithm also gets progressively worse at allocating usage to programs in this case, so the CPU usage reported for apps may go up even though they are doing the same thing as before, when the system was not as heavily loaded.

For me, it is particularly noticeable when running 2 or 3 APs on my GPUs.

Al
Message 1774449 - Posted: 27 Mar 2016, 15:12:46 UTC

Quick question along the same lines: I'm putting together a number of systems, each running an (older) 4-real-core CPU and a fairly current GPU. I was wondering if it makes sense to reserve 1 or 2 CPU cores for the care and feeding of the GPU? I need to set them up so they will process 2 GPU tasks at once, but I would also like to properly restrain the CPU, if necessary, to allow it to provide maximum productivity. If you have any questions that would help with advising me, let me know. Thanks!


Zalster
Message 1774454 - Posted: 27 Mar 2016, 15:38:26 UTC - in response to Message 1774449.  
Last modified: 27 Mar 2016, 15:41:11 UTC

If you are only running 1 GPU then you won't need to reserve an extra core, but you might want to change the fraction of a CPU that the GPU work units use so that the final total equals 1.

Reserving extra cores to feed the GPUs is only necessary when running more than 1 GPU per system.

If you want to restrain the CPU, there are 2 different ways to do so.

Method 1 is to change the percentage of CPU usage in the local preferences on your computer (I personally don't use this method, as I believe the way people explain it is flawed).

Method 2 is to limit the number of instances of each type of work in app_config.xml, either by setting a <max_concurrent> for each type of work unit or by using <project_max_concurrent> for the overall number of work units; see the sketch below.
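
A sketch of Method 2 in app_config.xml (the app names here are examples - use the ones from your client_state.xml):

  <app_config>
    <app>
      <name>setiathome_v8</name>           <!-- example app name -->
      <max_concurrent>8</max_concurrent>   <!-- at most 8 of these tasks at once -->
    </app>
    <app>
      <name>astropulse_v7</name>           <!-- example app name -->
      <max_concurrent>2</max_concurrent>
    </app>
  </app_config>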