How to use multiple GPU's

Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1959366 - Posted: 9 Oct 2018, 4:05:19 UTC - in response to Message 1959351.  

OK, adding GPUs is going fairly easily (reboot, reboot, reboot). In the status column (for GPUs), e.g. "Running (0.18 CPUs + 1 NVIDIA GPU)", where is the 0.18 configured (in this instance, as I see almost all PCs are different), and where do the values come from? (If that makes sense.)
THANX


The value of 0.18 is in one of two places. The first is app_info.xml: as you read through it, you will find the value 0.18 in the entries for the CUDA applications. The other is app_config.xml. If you have the latter, whatever value you set there overrides the 0.18.

Where does the value come from...

From what I remember, it was a value that the original creators of the Lunatics came up with, i.e. for cuda 32 and cuda 42. However, it was never followed up on once we went to cuda 5.0. How do I know that? Because I experimented with different values after I started with it and found that 0.18 was too low. Off the top of my head I can't remember the value I finally settled on, but it was higher than 0.18. Found it: 0.35 was the actual amount each cuda 5.0 task needed to run correctly.

For SoG, I found that it needs 0.97 of a core per work unit, i.e. you might as well just set it to 1.
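
As a rough illustration of the override described above, here is a minimal sketch of an app_config.xml placed in the project folder. The app name setiathome_v8 and the gpu_usage value are assumptions for illustration only; use the app names that appear in your own app_info.xml, and pick the cpu_usage that matches your application (0.35 and 1 are the figures mentioned in this post).

```xml
<app_config>
  <app>
    <!-- Assumed app name; check your own app_info.xml for the real one -->
    <name>setiathome_v8</name>
    <gpu_versions>
      <!-- Assumed: 1.0 means one task per GPU -->
      <gpu_usage>1.0</gpu_usage>
      <!-- CPU share reserved per GPU task; replaces the 0.18 from app_info.xml -->
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The client has to re-read its config files (or be restarted) before the status column shows the new value.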
ID: 1959366
Bravo6
Joined: 28 Oct 99
Posts: 52
Credit: 44,947,756
RAC: 0
United States
Message 1960411 - Posted: 15 Oct 2018, 15:52:25 UTC
Last modified: 15 Oct 2018, 15:55:10 UTC

Is there an overall effect (negative or positive) of setting "use at most 0% of CPUs" in the Manager? (I'm going to try running several [5] GPUs.) Would it be better to do this in the config files?
Also, I do not see much performance effect from additional system RAM. Is that expected?
THANX
"Don't worry about it, nothing is gonna be O.K. anyway.........."
ID: 1960411
Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1960417 - Posted: 15 Oct 2018, 16:13:35 UTC - in response to Message 1960411.  

Is there an overall effect (negative or positive) of setting "use at most 0% of CPUs" in the Manager? (I'm going to try running several [5] GPUs.) Would it be better to do this in the config files?
Also, I do not see much performance effect from additional system RAM. Is that expected?
THANX


Excellent question!
Depending on how heavily your CPU is loaded, reducing the number of CPU cores that are processing SETI tasks can speed up your overall production. I think the numbers bandied around are that either something like 10% of the available cores, or 1-3 cores, should be idled.

I have no experience managing the CPUs proper using app_config.xml and/or app_info.xml, BUT

You can control the total number of tasks the project will run using <project_max_concurrent>5</project_max_concurrent> inside the app_config.xml file. It needs to go directly inside the outermost pair of tags (<app_config> ... </app_config>).

And you can control the number of CPU cores you use. If you are using less than about 0.50 (I think) CPUs per GPU, then this will not control the number of GPUs that are being run. If you are running 1 CPU per GPU, it will.

I only have 1 machine right now that is running GPU only. I controlled it that way by setting up one of the "locations" on the SETI website for my computers to be "gpu" only.

Since it will not harm your system to set it to 0% CPU cores, you should be able to experiment and get immediate feedback. If the GPUs stop processing when you do this, then either you have a 1-GPU-per-CPU setup or I am wrong.
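
To show where the tag sits, here is a minimal sketch of an app_config.xml that contains nothing but the project-wide limit. The value 5 is just the example figure from this post.

```xml
<app_config>
  <!-- Direct child of <app_config>, not nested inside an <app> block -->
  <project_max_concurrent>5</project_max_concurrent>
</app_config>
```

If the file also has <app> blocks, the limit goes alongside them, still at the top level.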

Tom
A proud member of the OFA (Old Farts Association).
ID: 1960417
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1960493 - Posted: 16 Oct 2018, 1:46:31 UTC - in response to Message 1960417.  

I've never understood why people continue to limit CPU usage by using the % option. Makes no sense.

You tell the computer it can only use 10% of all CPUs. So what are all the work units (both CPU and GPU) supposed to do? Cut up that 10% among all of them? Because that is what you are telling it to do. Others will say that isn't so, but that's not what I've seen. Tell the computer it can use 100% of all cores, then limit how many work units you have running at any time with <project_max_concurrent> in the app_config.xml.

my 2 cents...
ID: 1960493
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1962364 - Posted: 29 Oct 2018, 6:00:45 UTC - in response to Message 1960493.  
Last modified: 29 Oct 2018, 6:39:29 UTC

I've always used the CPU % setting to set the number of CPUs to run. I've Never had a problem with it either, and it's Global. I have had problems with using <project_max_concurrent> because it's Not Global. I.E. set the SETI preferences to 6 max while running 3 GPUs & 3 CPUs. Now switch a GPU to run SETI Beta. That will result in SETI launching another CPU task making the total be 4 CPU and 3 GPU tasks...which is Too many for me. I never could get it to work correctly, because it's Not Global. So, I don't use an app_config file, the old fashioned way works fine for me.

To prove the % setting only affects the CPU tasks, look at this machine. It is set to run 49% of 8 CPUs, with -nobs on the GPUs. Obviously 3 CPU tasks are running and the GPUs are using 2 full CPUs, one apiece: https://setiathome.berkeley.edu/results.php?hostid=6796475&offset=220&show_names=0&state=0&appid=
It's also the same running 24% (One) CPU, and 3 GPUs with -nobs. One CPU task will run and the 3 GPUs will use 3 Full CPUs, proving the CPU % setting Only affects the CPU tasks, not the GPU tasks.

Also, my Mining machine is set to run One CPU for when I decide to Bunker Tasks. Right now I'm not running any CPU tasks, but, it still shows around 40-60% CPU usage even though the setting says 24%. When I do change it to run One CPU while Bunkering, that 40-60% goes up by about 12.5%, again proving the CPU % setting Only affects the CPUs, https://setiathome.berkeley.edu/results.php?hostid=6813106&offset=1100
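
The arithmetic behind those figures can be written out, assuming (inferred from the numbers in this post, not taken from any BOINC documentation) that the client rounds the allowed share of an 8-core machine down to a whole number of CPUs:

```latex
\[
\lfloor 0.49 \times 8 \rfloor = \lfloor 3.92 \rfloor = 3, \qquad
\lfloor 0.24 \times 8 \rfloor = \lfloor 1.92 \rfloor = 1, \qquad
\frac{1}{8} = 12.5\%
\]
```

So 49% gives 3 CPU tasks, 24% gives 1, and each CPU task adds one eighth of the machine to the total load.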
ID: 1962364
jrs

Joined: 1 Feb 16
Posts: 3
Credit: 73,979,603
RAC: 127
Norway
Message 1968654 - Posted: 4 Dec 2018, 9:46:56 UTC

Hi. I am not able to make both my Nvidia 1060 and 1070 GPUs work with BOINC. They work with other software such as NiceHash Miner.
The OS is Windows 10 Pro. In the log file I get these messages.

04.12.2018 09.47.32 | | Running under account jrs
04.12.2018 09.47.33 | | CUDA: NVIDIA GPU 0: GeForce GTX 1070 (driver version 417.22, CUDA version 10.0, compute capability 6.1, 4096MB, 3560MB available, 6803 GFLOPS peak)
04.12.2018 09.47.33 | | CUDA: NVIDIA GPU 1 (not used): GeForce GTX 1060 3GB (driver version 417.22, CUDA version 10.0, compute capability 6.1, 3072MB, 2487MB available, 4111 GFLOPS peak)
04.12.2018 09.47.33 | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 (driver version 417.22, device version OpenCL 1.2 CUDA, 8192MB, 3560MB available, 6803 GFLOPS peak)
04.12.2018 09.47.33 | | OpenCL: NVIDIA GPU 1 (ignored by config): GeForce GTX 1060 3GB (driver version 417.22, device version OpenCL 1.2 CUDA, 3072MB, 2487MB available, 4111 GFLOPS peak)

What I have tested.

Driver.
I have stopped Windows from making driver updates and installed the newest Nvidia driver with the fresh-install option. In the log it now seems to be OK. When I check the driver under System it is 25.21.14.1722. It has the same date as my new installation. Is it supposed to have another number?

Nvidia control panel.
I have turned everything on. I have also set the 1060 to be Primary OpenGL, without any change.

app_config.xml
I have had a similar problem with an old computer. I then removed this file from the project folder and it started working, but then with only one operation for each GPU. (Also 2 GPUs, but old GTX 560s.)

cc_config.xml
Added <use_all_gpus>1</use_all_gpus> under options.

Remove and reinstall
I have uninstalled BOINC and Oracle VM and installed the newest versions.

Any suggestions?

Richard Steen
Norway
ID: 1968654
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1968665 - Posted: 4 Dec 2018, 11:47:18 UTC - in response to Message 1968654.  

cc_config.xml
Added <use_all_gpus>1</use_all_gpus> under options.

Did you exit and restart BOINC completely after adding this? By completely I don't just mean exiting BOINC Manager and restarting it, because this may just restart the Manager, not the client. Command decisions, including all detection of CPUs and GPUs, are made only at BOINC startup.

Can you also post the contents of your cc_config.xml file?
Is .xml the only extension on cc_config.xml? If you edited the file with Notepad, it may have added .txt to the end (and with Windows by default still hiding known extensions, you may never know!)
ID: 1968665
jrs

Joined: 1 Feb 16
Posts: 3
Credit: 73,979,603
RAC: 127
Norway
Message 1968666 - Posted: 4 Dec 2018, 12:17:44 UTC - in response to Message 1968665.  
Last modified: 4 Dec 2018, 13:06:22 UTC

I have rebooted the computer several times.

Is there any command that I can add to the BOINC startup to force it?

My cc_config.xml. Extension is xml.

<cc_config>
<log_flags>
<file_xfer>1</file_xfer>
<sched_ops>1</sched_ops>
<task>1</task>
<app_msg_receive>0</app_msg_receive>
<app_msg_send>0</app_msg_send>
<async_file_debug>0</async_file_debug>
<benchmark_debug>0</benchmark_debug>
<checkpoint_debug>0</checkpoint_debug>
<coproc_debug>0</coproc_debug>
<cpu_sched>0</cpu_sched>
<cpu_sched_debug>0</cpu_sched_debug>
<cpu_sched_status>0</cpu_sched_status>
<dcf_debug>0</dcf_debug>
<disk_usage_debug>0</disk_usage_debug>
<file_xfer_debug>0</file_xfer_debug>
<gui_rpc_debug>0</gui_rpc_debug>
<heartbeat_debug>0</heartbeat_debug>
<http_debug>0</http_debug>
<http_xfer_debug>0</http_xfer_debug>
<idle_detection_debug>0</idle_detection_debug>
<mem_usage_debug>0</mem_usage_debug>
<network_status_debug>0</network_status_debug>
<notice_debug>0</notice_debug>
<poll_debug>0</poll_debug>
<priority_debug>0</priority_debug>
<proxy_debug>0</proxy_debug>
<rr_simulation>0</rr_simulation>
<rrsim_detail>0</rrsim_detail>
<sched_op_debug>0</sched_op_debug>
<scrsave_debug>0</scrsave_debug>
<slot_debug>0</slot_debug>
<state_debug>0</state_debug>
<statefile_debug>0</statefile_debug>
<suspend_debug>0</suspend_debug>
<task_debug>0</task_debug>
<time_debug>0</time_debug>
<trickle_debug>0</trickle_debug>
<unparsed_xml>0</unparsed_xml>
<work_fetch_debug>0</work_fetch_debug>
</log_flags>
<options>
<use_all_gpus>1</use_all_gpus>
<abort_jobs_on_exit>0</abort_jobs_on_exit>
<allow_multiple_clients>0</allow_multiple_clients>
<allow_remote_gui_rpc>0</allow_remote_gui_rpc>
<disallow_attach>0</disallow_attach>
<dont_check_file_sizes>0</dont_check_file_sizes>
<dont_contact_ref_site>0</dont_contact_ref_site>
<lower_client_priority>0</lower_client_priority>
<dont_suspend_nci>0</dont_suspend_nci>
<dont_use_vbox>0</dont_use_vbox>
<dont_use_wsl>0</dont_use_wsl>
<exit_after_finish>0</exit_after_finish>
<exit_before_start>0</exit_before_start>
<exit_when_idle>0</exit_when_idle>
<fetch_minimal_work>0</fetch_minimal_work>
<fetch_on_update>0</fetch_on_update>
<force_auth>default</force_auth>
<http_1_0>0</http_1_0>
<http_transfer_timeout>300</http_transfer_timeout>
<http_transfer_timeout_bps>10</http_transfer_timeout_bps>
<max_event_log_lines>2000</max_event_log_lines>
<max_file_xfers>8</max_file_xfers>
<max_file_xfers_per_project>2</max_file_xfers_per_project>
<max_stderr_file_size>0</max_stderr_file_size>
<max_stdout_file_size>0</max_stdout_file_size>
<max_tasks_reported>0</max_tasks_reported>
<proxy_info>
<socks_server_name></socks_server_name>
<socks_server_port>80</socks_server_port>
<http_server_name></http_server_name>
<http_server_port>80</http_server_port>
<socks5_user_name></socks5_user_name>
<socks5_user_passwd></socks5_user_passwd>
<socks5_remote_dns>0</socks5_remote_dns>
<http_user_name></http_user_name>
<http_user_passwd></http_user_passwd>
<no_proxy></no_proxy>
<no_autodetect>0</no_autodetect>
</proxy_info>
<rec_half_life_days>10.000000</rec_half_life_days>
<report_results_immediately>0</report_results_immediately>
<run_apps_manually>0</run_apps_manually>
<save_stats_days>30</save_stats_days>
<skip_cpu_benchmarks>0</skip_cpu_benchmarks>
<simple_gui_only>0</simple_gui_only>
<start_delay>0.000000</start_delay>
<stderr_head>0</stderr_head>
<suppress_net_info>0</suppress_net_info>
<unsigned_apps_ok>0</unsigned_apps_ok>
<use_all_gpus>0</use_all_gpus>
<use_certs>0</use_certs>
<use_certs_only>0</use_certs_only>
<vbox_window>0</vbox_window>
</options>
</cc_config>

I will test this next.
I have another computer with a 1060 GPU. After work I will take out the 1070 GPU and replace it with the other 1060 card. The problem computer will then have two 1060s of the same brand and type. That will force the startup to make some changes.

Richard
ID: 1968666
Jord
Volunteer tester
Joined: 9 Jun 99
Posts: 15184
Credit: 4,362,181
RAC: 3
Netherlands
Message 1968694 - Posted: 4 Dec 2018, 22:10:00 UTC - in response to Message 1968666.  
Last modified: 4 Dec 2018, 22:16:55 UTC

You manually added <use_all_gpus>1</use_all_gpus> to your cc_config.xml file, but there's already one in there. Remove the line you added, scroll down and change the 0 on the later line to 1, then save the file. Each option line should appear only once in cc_config.xml.

A full cc_config.xml file is saved to the data directory when you add an exclusive app via the menus in BOINC Manager, or when you make a change in the Event Log options menu and save that. When you next edit that file and add anything to the top, BOINC will read the file, find your added line that switches something on, then continue down to the bottom and find the line that switches it off again, and that there is the problem you have.

Always check whether the line you want is already in the file. They're in alphabetical order, so you can easily glance at the commands starting with U.

By default BOINC uses only the best GPU it detects; any lesser GPUs are not used.
It decides this based on what it detects in the drivers for compute capability, software version, available memory and speed.
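
For comparison, a minimal sketch of a cc_config.xml with the option set exactly once. All the other option and log-flag lines are left out here for brevity; options that are omitted simply keep their defaults.

```xml
<cc_config>
  <options>
    <!-- The only use_all_gpus line in the whole file -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

In a full file like the one posted above, the same result comes from keeping just the existing line near the bottom and changing its 0 to 1.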
ID: 1968694
jrs

Joined: 1 Feb 16
Posts: 3
Credit: 73,979,603
RAC: 127
Norway
Message 1968772 - Posted: 5 Dec 2018, 7:31:37 UTC - in response to Message 1968694.  
Last modified: 5 Dec 2018, 7:48:34 UTC

Hi.

It is now working. Thank you for the help. The two GPUs are now identical: I have replaced the 1070 with a 1060, so this computer now has two 1060s. I made a clean GPU installation with the same driver. BOINC found the cards without problems.

04.12.2018 20.15.50 | | CUDA: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 417.22, CUDA version 10.0, compute capability 6.1, 3072MB, 2487MB available, 4111 GFLOPS peak)
04.12.2018 20.15.50 | | CUDA: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 417.22, CUDA version 10.0, compute capability 6.1, 3072MB, 2487MB available, 4111 GFLOPS peak)
04.12.2018 20.15.50 | | OpenCL: NVIDIA GPU 0: GeForce GTX 1060 3GB (driver version 417.22, device version OpenCL 1.2 CUDA, 3072MB, 2487MB available, 4111 GFLOPS peak)
04.12.2018 20.15.50 | | OpenCL: NVIDIA GPU 1: GeForce GTX 1060 3GB (driver version 417.22, device version OpenCL 1.2 CUDA, 3072MB, 2487MB available, 4111 GFLOPS peak)

My next build is a second-hand GPU mining rig. I haven't purchased any GPUs yet, but I will make sure they are all the same. This motherboard has only one PCIe 3.0 slot, but 11 PCIe 2.0 slots. Since BOINC might drop GPUs that are not as fast, I will place an old GPU in the PCIe 3.0 slot and use identical GPUs in the 11 PCIe 2.0 slots. I guess BOINC will find these to be the same as each other and faster than the one in the PCIe 3.0 slot.
This rig has a slow CPU, which I will replace with one with more cores.

The mistake of having two <use_all_gpus> lines might be the reason for my problem. I guess the last one, <use_all_gpus>0</use_all_gpus>, is the one BOINC uses. I will test it with the new rig.

Richard
ID: 1968772


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.