Boinc Manager ignores cpu core usage for gpu WUs

Message boards : Number crunching : Boinc Manager ignores cpu core usage for gpu WUs

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962105 - Posted: 27 Oct 2018, 9:37:15 UTC
Last modified: 27 Oct 2018, 9:37:39 UTC

Hello,

normally my BOINC Manager pauses one CPU WU while a GPU WU is active.
For SETI@home I created an app_config.xml:
<app_config>
   <app>
      <name>setiathome_v8</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
   <app>
      <name>astropulse_v7</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

I also see "1 CPU + 1 Nvidia GPU" in the list of WUs for SETI@home, as expected, so the app_config has been taken into account.

But at the moment these settings are ignored by the BOINC Manager. Whenever I don't use the PC for the specified time, the GPU WU starts, but no CPU WU is suspended like in the past.

I see this behaviour on several PCs, and I didn't change anything in the config files. It just started at some point and I have no idea why.

The problem is that the GPU WU needs a free CPU core to run fast.
I use an optimized CUDA 8.0 app and also optimized CPU apps, but I have done so for more than a year and this worked fine in the past.

A restart of the PC didn't help.
Any ideas what I can try?
My workaround for now is to always keep 1 CPU core free via the BOINC Manager settings.
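
One way to double-check what the posted app_config.xml actually declares is to parse it. The following is a minimal Python sketch, assuming the file content is exactly as posted above; the helper name `gpu_reservations` is made up for illustration and is not part of BOINC.

```python
import xml.etree.ElementTree as ET

# app_config.xml content as posted above
APP_CONFIG = """
<app_config>
   <app>
      <name>setiathome_v8</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
   <app>
      <name>astropulse_v7</name>
      <gpu_versions>
         <gpu_usage>1</gpu_usage>
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
"""


def gpu_reservations(xml_text):
    """Map each app name to its (gpu_usage, cpu_usage) pair.

    Hypothetical helper for illustration; not part of BOINC.
    Apps without a <gpu_versions> block are skipped, and both
    values are assumed to be present inside the block.
    """
    root = ET.fromstring(xml_text.strip())
    result = {}
    for app in root.findall("app"):
        versions = app.find("gpu_versions")
        if versions is None:
            continue
        result[app.findtext("name")] = (
            float(versions.findtext("gpu_usage")),
            float(versions.findtext("cpu_usage")),
        )
    return result


for name, (gpu, cpu) in gpu_reservations(APP_CONFIG).items():
    print(f"{name}: {gpu:g} GPU + {cpu:g} CPU per task")
```

This only confirms that the file is well-formed and says what was intended; it does not show whether the client has read it.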
ID: 1962105

RickToTheMax
Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1962114 - Posted: 27 Oct 2018, 12:33:30 UTC - in response to Message 1962105.  

Hi,

You could add this to your app_config.xml:

<app_config>
	<project_max_concurrent>8</project_max_concurrent>
</app_config>


Then change the number to how many tasks you want running at any given time.
In this example I am using an 8-thread CPU, so as soon as a GPU WU starts, it will pause a CPU WU if 8 are already running.
ID: 1962114

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962118 - Posted: 27 Oct 2018, 12:59:02 UTC
Last modified: 27 Oct 2018, 13:01:55 UTC

This option is also ignored.
I set it to 4 and then to 3, but 4x CPU and 1x GPU+CPU are always running.

Maybe I would need to set this for every other project too, but I would prefer a global option.
ID: 1962118

Mike (Special Project $75 donor)
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1962125 - Posted: 27 Oct 2018, 13:22:05 UTC - in response to Message 1962118.  

This option is also ignored.
I set it to 4 and then to 3, but 4x CPU and 1x GPU+CPU are always running.

Maybe I would need to set this for every other project too, but I would prefer a global option.


Yes, since it seems you are running GPU work only at SETI and CPU work on other projects, it has to be defined for those projects as well.
In your case it would be easier to do it via the BOINC Manager.
Just set it to use 75% of your CPUs; this should work for all projects.
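
The two points above can be sketched in a few lines of Python. This is only a rough model under stated assumptions: that a per-project limit counts only that project's own tasks, and that the percentage preference is truncated to whole CPUs and never drops below 1. Both function names and the "other project" label are hypothetical, and how the client actually picks tasks to pause is not modelled.

```python
def usable_cpus(n_threads, max_pct):
    """CPUs BOINC may use under a "use at most N% of the CPUs" preference.

    Assumption: the product is truncated and never drops below 1.
    """
    return max(1, int(n_threads * max_pct / 100))


def projects_over_limit(running, limits):
    """Return the projects whose running-task count exceeds their own limit.

    running: {project: number of running tasks}
    limits:  {project: project_max_concurrent}; projects without an
             entry are treated as unlimited.
    """
    return [p for p, n in running.items() if p in limits and n > limits[p]]


# Situation described in the thread: 4 CPU tasks from another project
# plus 1 SETI GPU task, with the SETI limit set to 3.
running = {"SETI@home": 1, "other project": 4}
print(projects_over_limit(running, {"SETI@home": 3}))

print(usable_cpus(4, 75))   # 4-thread machine at 75%
print(usable_cpus(8, 75))   # 8-thread machine at 75%
print(usable_cpus(8, 88))   # 8-thread machine at 88%
```

In this model the SETI limit is never exceeded, because only one SETI task is running, while 75% on a 4-thread machine leaves exactly one thread free. On machines with more threads the percentage has to be higher to free just one thread.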


With each crime and every kindness we birth our future.
ID: 1962125

rob smith (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22199
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1962127 - Posted: 27 Oct 2018, 13:24:11 UTC

These settings take a bit to get one's head round.
1 GPU plus 1 CPU means a "GPU" task will have the resources of one GPU and up to one CPU core or thread. It does not always mean that it will use the entire CPU core, and thus one will see what you are describing, an apparent overcommitment of CPU resources.
I find it less painful on my brain to consider these numbers to be targets, not absolute values.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1962127

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962131 - Posted: 27 Oct 2018, 13:40:50 UTC
Last modified: 27 Oct 2018, 13:43:56 UTC

It also has no effect if I put the same option into the WCG app_config (my CPU project).
So I think the BOINC Manager completely ignores all these options. :/

I know that these GPU WUs only need about 10% of a CPU core. But if all CPU cores are at 100%, then this 10% is not available, and this results in a much longer computing duration (about +50%).

I will continue to set the CPU limit directly in the BM and will need to adjust it when I switch to other projects.
ID: 1962131

Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 30646
Credit: 53,134,872
RAC: 32
United States
Message 1962139 - Posted: 27 Oct 2018, 15:15:01 UTC

Hasn't everyone forgotten the number of CPU is done in integer math?
So a w/u declares it needs 0.3 CPU and 1 GPU, that w/u is counted as needing 0 CPU and 1 GPU.
ID: 1962139

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1962141 - Posted: 27 Oct 2018, 15:45:32 UTC - in response to Message 1962139.  

Hasn't everyone forgotten the number of CPU is done in integer math?
So a w/u declares it needs 0.3 CPU and 1 GPU, that w/u is counted as needing 0 CPU and 1 GPU.


I seldom follow what the app says it needs. I prefer to use BoincTasks and get an actual read on what is really being used. BoincTasks will let you see how much it is using, and then you can configure your system by determining the total. For example, the Petri/TBar special app says 0.1 CPU per task, but BoincTasks tells me the tasks are actually using 0.97. So I make adjustments for that when I decide how many threads to give to the GPU vs. the CPU work tasks.
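
The adjustment described here comes down to simple arithmetic. A minimal Python sketch, using the 0.1 and 0.97 figures quoted above; the 8-thread machine, the three concurrent GPU tasks and both function names are made up for illustration:

```python
import math


def cpu_threads_for_gpu_tasks(cpu_per_task, n_gpu_tasks):
    """Whole CPU threads to set aside for the GPU tasks, rounded up."""
    return math.ceil(cpu_per_task * n_gpu_tasks)


def cpu_tasks_to_run(n_threads, cpu_per_task, n_gpu_tasks):
    """CPU-only tasks that fit once the GPU tasks' share is set aside."""
    return max(0, n_threads - cpu_threads_for_gpu_tasks(cpu_per_task, n_gpu_tasks))


declared, measured = 0.1, 0.97   # CPU per GPU task: declared vs. measured
n_threads, n_gpu_tasks = 8, 3    # hypothetical machine and task count

print(cpu_tasks_to_run(n_threads, declared, n_gpu_tasks))
print(cpu_tasks_to_run(n_threads, measured, n_gpu_tasks))
```

With the declared figure one thread would be set aside, leaving 7 CPU tasks; with the measured figure three threads are set aside, leaving 5.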
ID: 1962141

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962159 - Posted: 27 Oct 2018, 18:48:49 UTC - in response to Message 1962139.  

Hasn't everyone forgotten the number of CPU is done in integer math?
So a w/u declares it needs 0.3 CPU and 1 GPU, that w/u is counted as needing 0 CPU and 1 GPU.

But I have set it manually to 1 CPU and 1 GPU. And this worked fine for all these years (when the GPU task started, one CPU task was suspended).
ID: 1962159

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1962176 - Posted: 27 Oct 2018, 21:43:09 UTC - in response to Message 1962159.  

But i have set it manually to 1 CPU and 1 GPU.

And by using 1 CPU core for each GPU WU it is doing what you told it to.

I know that these GPU-WUs only need about 10% of the CPU-core.

So why have you set it to use a whole CPU core, when it doesn't need it? One of the main reasons for the development of the Linux CUDA application was to reduce the amount of CPU resources required to process GPU work.
Grant
Darwin NT
ID: 1962176

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962177 - Posted: 27 Oct 2018, 21:45:39 UTC - in response to Message 1962176.  
Last modified: 27 Oct 2018, 21:50:58 UTC

It is not doing what it should do (that's the reason for this thread...).

And as I wrote above, it does need the CPU core for acceptable performance.
Without a free CPU core the WUs run about 500...550s for 80cr.
With a free CPU core they run about 300...350s for the same 80cr.

I remember that 2-3 years ago it was no problem to leave no CPU core free. But now it is.
Maybe a driver update from Nvidia made this problem appear.
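
The size of the slowdown follows directly from these runtimes. A small Python sketch of the arithmetic, using only the figures above (the function name is made up for illustration):

```python
def extra_runtime_pct(slow_s, fast_s):
    """Percent by which slow_s exceeds fast_s."""
    return (slow_s / fast_s - 1) * 100


busy = (500, 550)   # seconds per WU with all CPU cores busy
free = (300, 350)   # seconds per WU with one CPU core free

best = extra_runtime_pct(busy[0], free[1])    # smallest gap
worst = extra_runtime_pct(busy[1], free[0])   # largest gap
mid = extra_runtime_pct(sum(busy) / 2, sum(free) / 2)

print(f"{best:.0f}% to {worst:.0f}% longer, about {mid:.0f}% at the midpoints")
```

That works out to roughly 43% to 83% longer runtimes without a free core, which is the same order as the +50% estimate given earlier in the thread.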
ID: 1962177

Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1962183 - Posted: 27 Oct 2018, 22:03:59 UTC - in response to Message 1962177.  
Last modified: 27 Oct 2018, 22:04:30 UTC

It is not doing what it should do (that's the reason for this thread...).

It is doing exactly as it should, according to the app_config.xml settings you posted.

However the reserving of cores for non-BOINC use, and the behaviour you describe are the result of other configuration settings, either in app_config.xml, or elsewhere.
I've personally never had an issue with Seti using all available cores, so I can't help with the blocking of cores for BOINC use.
Grant
Darwin NT
ID: 1962183

Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1962189 - Posted: 27 Oct 2018, 23:24:46 UTC - in response to Message 1962105.  


I also see "1 CPU + 1 Nvidia GPU" in the list of WUs for SETI@home, as expected, so the app_config has been taken into account.

But at the moment these settings are ignored by the BOINC Manager. Whenever I don't use the PC for the specified time, the GPU WU starts, but no CPU WU is suspended like in the past.

I'm trying to understand you ... You have seen 1 + 1 displayed in the past but not now? Or does it say 1 + 1 now?
The app_config file needs to be in the seti folder, not BOINC.

"Whenever I don't use the PC for the specified time" ... If you have it set to only use the computer when inactive, you may create problems. The CUDA "special" app is known to have problems restarting from a checkpoint. It is best to have the computer run 24/7 and have your checkpoint interval set longer than your GPU tasks normally run.
ID: 1962189

Darrell Wilcox (Project Donor)
Volunteer tester
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 1962225 - Posted: 28 Oct 2018, 4:46:49 UTC - in response to Message 1962139.  

All the CPU and GPU defined needs are summed, then rounded down, so three GPU tasks defined as needing 0.5 CPUs (for example) would result in 1 CPU being allocated from the pool for BOINC. It does NOT mean the resource(s) will actually be used by the client.
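
The difference between this description and the per-task integer math suggested earlier in the thread is easy to see with numbers. The Python sketch below only compares the two models as described in the posts; it makes no claim about which one the client really implements, and both function names are made up for illustration.

```python
import math


def reserved_truncating_each(cpu_usages):
    """Model A: each task's CPU need is truncated to an integer first."""
    return sum(int(u) for u in cpu_usages)


def reserved_sum_then_floor(cpu_usages):
    """Model B: the needs are summed, then rounded down once."""
    return math.floor(sum(cpu_usages))


for usages in ([0.3], [0.5, 0.5, 0.5], [1.0]):
    print(usages, reserved_truncating_each(usages), reserved_sum_then_floor(usages))
```

The models differ for three tasks at 0.5 CPU each (0 vs. 1 reserved CPU), but both reserve one CPU for a single task declared with a cpu_usage of 1, which is the setting in the app_config.xml under discussion.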
ID: 1962225

Darrell Wilcox (Project Donor)
Volunteer tester
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 1962227 - Posted: 28 Oct 2018, 4:53:43 UTC - in response to Message 1962189.  

Yes, in ...\BOINC\projects\setiathome.berkeley.edu

and a line must appear in the "Tools" -> "Event log" similar to "10/28/2018 11:49:39 AM | SETI@home | Found app_config.xml"

to prove the Manager found it.
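
If the event log has been copied out as text, the check can be automated. A minimal Python sketch, assuming the "timestamp | project | message" layout of the line quoted above; the function name is hypothetical, and where the log text comes from is left to the reader:

```python
def found_app_config(log_text, project="SETI@home"):
    """True if the log has a 'Found app_config.xml' line for the project.

    Assumes each line looks like 'timestamp | project | message'.
    """
    for line in log_text.splitlines():
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) == 3 and parts[1] == project and parts[2] == "Found app_config.xml":
            return True
    return False


sample = "10/28/2018 11:49:39 AM | SETI@home | Found app_config.xml"
print(found_app_config(sample))
```

Passing a different `project` value checks the same thing for another project's app_config.xml.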
ID: 1962227

Gary Charpentier (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 25 Dec 00
Posts: 30646
Credit: 53,134,872
RAC: 32
United States
Message 1962229 - Posted: 28 Oct 2018, 5:14:46 UTC - in response to Message 1962177.  

It is not doing what it should do (that's the reason for this thread...).

And as I wrote above, it does need the CPU core for acceptable performance.
Without a free CPU core the WUs run about 500...550s for 80cr.
With a free CPU core they run about 300...350s for the same 80cr.

I remember that 2-3 years ago it was no problem to leave no CPU core free. But now it is.
Maybe a driver update from Nvidia made this problem appear.

Is this on all 3 of your machines? What else do you have running on that machine? Both BOINC and not BOINC.

I ask because, frankly, what you say doesn't make sense unless something else is going on that is the root of the evil.

I hate to guess without running diagnostics, but I'm thinking it is possibly RAM related. Oh, DUH, do you have "leave paused jobs in memory" set?

See also http://man7.org/linux/man-pages/man7/sched.7.html CONFIG_SCHED_AUTOGROUP
ID: 1962229

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962242 - Posted: 28 Oct 2018, 7:10:23 UTC - in response to Message 1962183.  

It is not doing what it should do (that's the reason for this thread...).

It is doing exactly as it should, according to the app_config.xml settings you posted.

Can you please explain in a bit more detail what you mean?
ID: 1962242

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962244 - Posted: 28 Oct 2018, 7:15:34 UTC - in response to Message 1962189.  
Last modified: 28 Oct 2018, 7:39:13 UTC


I also see "1 CPU + 1 Nvidia GPU" in the list of WUs for SETI@home, as expected, so the app_config has been taken into account.

But at the moment these settings are ignored by the BOINC Manager. Whenever I don't use the PC for the specified time, the GPU WU starts, but no CPU WU is suspended like in the past.

I'm trying to understand you ... You have seen 1 + 1 displayed in the past but not now? Or does it say 1 + 1 now?
The app_config file needs to be in the seti folder, not BOINC.

"Whenever I don't use the PC for the specified time" ... If you have it set to only use the computer when inactive, you may create problems. The CUDA "special" app is known to have problems restarting from a checkpoint. It is best to have the computer run 24/7 and have your checkpoint interval set longer than your GPU tasks normally run.


I have specified that the GPU is suspended while the PC is in use and only starts once the PC has been inactive for 3 minutes.

In the past:
Computer is used: 4x CPU WUs are running
Computer is not used: after 3 minutes the GPU WU starts and a CPU WU is suspended, so 3x CPU and 1x GPU were running.

Now:
Computer is used: 4x CPU WUs are running
Computer is not used: after 3 minutes the GPU WU starts, and 4x CPU and 1x GPU are running.

It's the same behavior on my Ryzen, where 16 threads are available, but there I have not set the option to suspend the work. In the past 15 threads were always used for CPU and 1 for GPU; now 16 are used for CPU, plus the GPU.
I use the Athlon 5350 for all office stuff, so I need to suspend the GPU work to get more than 1 fps on the desktop. Neither in the past nor now have I seen any problems with suspended CUDA apps. It's no problem for me if they start again from zero.
I can set the checkpoint time? I thought that is done by the project app?
ID: 1962244

MagicEye
Volunteer tester
Joined: 19 Sep 99
Posts: 70
Credit: 40,327,877
RAC: 75
Germany
Message 1962245 - Posted: 28 Oct 2018, 7:31:39 UTC - in response to Message 1962229.  
Last modified: 28 Oct 2018, 7:48:50 UTC

It is not doing what it should do (that's the reason for this thread...).

And as I wrote above, it does need the CPU core for acceptable performance.
Without a free CPU core the WUs run about 500...550s for 80cr.
With a free CPU core they run about 300...350s for the same 80cr.

I remember that 2-3 years ago it was no problem to leave no CPU core free. But now it is.
Maybe a driver update from Nvidia made this problem appear.

Is this on all 3 of your machines? What else do you have running on that machine? Both BOINC and not BOINC.

I ask because, frankly, what you say doesn't make sense unless something else is going on that is the root of the evil.

I hate to guess without running diagnostics, but I'm thinking it is possibly RAM related. Oh, DUH, do you have "leave paused jobs in memory" set?

See also http://man7.org/linux/man-pages/man7/sched.7.html CONFIG_SCHED_AUTOGROUP

I only have access to 2 machines; the FX-8350 is far away and I can check it next week.

At the moment the GPUs run SETI and the CPUs run World Community Grid.
On the Ryzen nothing else is running except the usual Ubuntu 18 services; it's a crunching-only PC.
The Athlon 5350 is used with Firefox, Thunderbird, LibreOffice, Gimp, Psensor.

"Leave paused jobs in memory" is activated.

/proc/sys/kernel/sched_autogroup_enabled is set to 1 (=enabled)

Output of BOINC when the GPU work starts:
So 28 Okt 2018 08:41:38 CET |  | Resuming GPU computation
So 28 Okt 2018 08:41:38 CET | SETI@home | [cpu_sched] Restarting task blc23_2bit_guppi_58340_57645_HIP25878_0085.22529.409.19.28.101.vlar_1 using setiathome_v8 version 801 (cuda80) in slot 4
So 28 Okt 2018 08:41:57 CET |  | Suspending GPU computation - computer is in use
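
The three log lines above can be split into timestamp, project and message for a closer look. A minimal Python sketch, assuming every non-empty line follows the "timestamp | project | message" layout and contains an HH:MM:SS time; the `parse` helper is made up for illustration:

```python
import re

# The three event-log lines posted above, unchanged
LOG = """\
So 28 Okt 2018 08:41:38 CET |  | Resuming GPU computation
So 28 Okt 2018 08:41:38 CET | SETI@home | [cpu_sched] Restarting task blc23_2bit_guppi_58340_57645_HIP25878_0085.22529.409.19.28.101.vlar_1 using setiathome_v8 version 801 (cuda80) in slot 4
So 28 Okt 2018 08:41:57 CET |  | Suspending GPU computation - computer is in use
"""


def parse(log_text):
    """Yield (seconds since midnight, project, message) for each log line."""
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, project, message = (p.strip() for p in line.split("|", 2))
        h, m, s = map(int, re.search(r"(\d\d):(\d\d):(\d\d)", stamp).groups())
        yield h * 3600 + m * 60 + s, project, message


events = list(parse(LOG))
gpu_window = events[-1][0] - events[0][0]   # seconds between resume and suspend
sched_lines = [msg for _, _, msg in events if msg.startswith("[cpu_sched]")]

print(f"GPU computation was active for {gpu_window} s")
for msg in sched_lines:
    print(msg)
```

In this excerpt the GPU computation was resumed for only 19 seconds, and the single [cpu_sched] line is the GPU task restarting; none of the three lines mentions a CPU task.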

ID: 1962245

Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1962246 - Posted: 28 Oct 2018, 8:06:06 UTC - in response to Message 1962244.  

What you say it is doing makes no sense, as the syntax of the app_config looks fine to me. What computer you copied that from I do not know.

As Darrell said, a line must appear in "Tools" -> "Event log" similar to "10/28/2018 11:49:39 AM | SETI@home | Found app_config.xml".

The app_config file MUST be in the setiathome folder, AND be readable by the user running BOINC. I believe that is 'boinc-client' for the repository version of BOINC.

Check if you see "SETI@home | Found app_config.xml" in your BOINC startup log.

How do you know what tasks are running after 3 minutes of idle time? Are you watching the BOINC Manager window as the GPU starts up, and the 4 CPU tasks remain running?
ID: 1962246
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.