multi-client and GPU detection

ChristianVirtual
Joined: 23 Jun 13
Posts: 20
Credit: 6,631,963
RAC: 82
Japan
Message 1879614 - Posted: 22 Jul 2017, 7:19:16 UTC
Last modified: 22 Jul 2017, 7:21:48 UTC

I am trying to set up multiple clients on my Linux systems to allow an easy way of bunkering. I know it's controversial, but if it's encouraged, one needs to use it in order to be competitive in challenges. Do I like it? Not really; I would prefer raw power during events.

Technically, I understand how to run multiple clients, and by now I can do so quite well for CPU clients.

But when I try to do that for GPU, I get the following event log:

Sat 22 Jul 16:13:03 2017 |  | Starting BOINC client version 7.7.0 for x86_64-pc-linux-gnu
Sat 22 Jul 16:13:03 2017 |  | This a development version of BOINC and may not function properly
Sat 22 Jul 16:13:03 2017 |  | log flags: file_xfer, sched_ops, task, coproc_debug
Sat 22 Jul 16:13:03 2017 |  | Libraries: libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Sat 22 Jul 16:13:03 2017 |  | Running as a daemon
Sat 22 Jul 16:13:03 2017 |  | Data directory: /home/boinc/zdatadirs/boincdata1
Sat 22 Jul 16:13:03 2017 |  | [coproc] launching child process at boinc
Sat 22 Jul 16:13:03 2017 |  | [coproc] relative to directory /home/boinc
Sat 22 Jul 16:13:03 2017 |  | [coproc] with data directory /home/boinc/zdatadirs/boincdata1
Sat 22 Jul 16:13:03 2017 |  | GPU detection failed. error code 5632
Sat 22 Jul 16:13:03 2017 |  | [coproc] read_coproc_info_file() returned error -108
Sat 22 Jul 16:13:03 2017 |  | No usable GPUs found
Sat 22 Jul 16:13:03 2017 |  | Host name: linuxpowered
Sat 22 Jul 16:13:03 2017 |  | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600S CPU @ 2.80GHz [Family 6 Model 42 Stepping 7]
Sat 22 Jul 16:13:03 2017 |  | OS: Linux CentOS Linux: CentOS Linux 7 (Core) [3.10.0-514.6.1.el7.x86_64]
Sat 22 Jul 16:13:03 2017 |  | Memory: 31.21 GB physical, 11.18 GB virtual
Sat 22 Jul 16:13:03 2017 |  | Disk: 50.03 GB total, 47.52 GB free
Sat 22 Jul 16:13:03 2017 |  | Local time is UTC +9 hours


I tried several things, such as disallowing GPU use on the client on port 31416, or not running that client at all.

On the main client on port 31416, the GPUs are detected correctly with the proper driver version and work well. So the driver and GPU installation are fine in themselves.

Any idea what the error codes mean? I grepped through the sources but could not find any good hint.

Thanks in advance!
ID: 1879614
Tom Miller
Volunteer tester
Joined: 28 Nov 02
Posts: 768
Credit: 18,730,656
RAC: 16,875
United States
Message 1879623 - Posted: 22 Jul 2017, 8:41:09 UTC - in response to Message 1879614.  

Are you running each copy of BOINC in a separate directory? It seems to me I read you had to do that. All the data I have seen about GPUs is specific to that copy of BOINC.

Since I run one task per GPU on my cards, I am uncertain how you would get multiple copies of BOINC to play "nicely" with a single GPU.

Tom
"You are entitled to your own opinion but not to your own facts." Senator and Professor Patrick Moynihan
---
https://GalensonConsulting.WordPress.com
ID: 1879623
ChristianVirtual
Joined: 23 Jun 13
Posts: 20
Credit: 6,631,963
RAC: 82
Japan
Message 1879628 - Posted: 22 Jul 2017, 9:01:03 UTC
Last modified: 22 Jul 2017, 9:02:34 UTC

The main folder is /home/boinc, with a different data folder underneath it for each of the multiple instances.

You mean I need to install into something like /home/boinc, /home/boinc1, ... with a full duplication? I was thinking about that but hoped I could avoid it (it would be similar to a VM-based setup, if GPU passthrough worked).
ID: 1879628
Tom Miller
Volunteer tester
Joined: 28 Nov 02
Posts: 768
Credit: 18,730,656
RAC: 16,875
United States
Message 1879630 - Posted: 22 Jul 2017, 9:17:23 UTC - in response to Message 1879628.  

Quoting ChristianVirtual: "The main folder is /home/boinc, with a different data folder underneath it for each of the multiple instances.

"You mean I need to install into something like /home/boinc, /home/boinc1, ... with a full duplication? I was thinking about that but hoped I could avoid it (it would be similar to a VM-based setup, if GPU passthrough worked)."

I only looked briefly at the multiple BOINC client documentation. But I THOUGHT I understood that each client instance was "fully separate", which meant that even the BOINC directories were entirely separate.

This looks like a relevant discussion which doesn't support my understanding but does discuss how/what should be done.


https://boinc.berkeley.edu/dev/forum_thread.php?id=10986

HTH,
Tom
ID: 1879630
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 350
Credit: 971,179
RAC: 2,350
Finland
Message 1879673 - Posted: 22 Jul 2017, 14:44:40 UTC - in response to Message 1879614.  

The client can't be started from $PATH. You need to include the path to the executable.

5632 is a status code from waitpid(). Decoded, it means the process exited with exit code 22, which is EINVAL (Invalid argument). Not sure where that could come from. Is (/usr/bin/)boinc the real executable or some wrapper?
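
The decoding can be checked with plain shell arithmetic. This is only a minimal sketch, assuming the usual Linux layout of a waitpid() status word (low 7 bits hold a terminating signal, the next 8 bits hold the exit code); the helper name decode_wait_status is made up for illustration and is not part of BOINC.

```shell
#!/bin/sh
# Decode a raw waitpid() status word, assuming the common Linux layout:
#   bits 0-6  : terminating signal (0 means the process exited normally)
#   bits 8-15 : exit code
# decode_wait_status is a hypothetical helper, not a BOINC function.
decode_wait_status() {
    status=$1
    signal=$(( status & 127 ))
    code=$(( (status >> 8) & 255 ))
    if [ "$signal" -eq 0 ]; then
        echo "exited with code $code"
    else
        echo "killed by signal $signal"
    fi
}

# The value from the event log in this thread.
decode_wait_status 5632
```

Since 5632 is 22 * 256, the signal bits are zero and the exit code comes out as 22.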
ID: 1879673
ChristianVirtual
Joined: 23 Jun 13
Posts: 20
Credit: 6,631,963
RAC: 82
Japan
Message 1879884 - Posted: 23 Jul 2017, 8:35:19 UTC - in response to Message 1879673.  

Quoting Juha: "The client can't be started from $PATH. You need to include the path to the executable.

"5632 is a status code from waitpid(). Decoded, it means the process exited with exit code 22, which is EINVAL (Invalid argument). Not sure where that could come from. Is (/usr/bin/)boinc the real executable or some wrapper?"

That's what I do right now ... in a crude way, by installing several complete BOINCs built from source, with the bins and libs in each home folder (otherwise the install complains that some files are locked by other running versions). Disk space is cheap. :-|
I need that setup only occasionally, so I can live with less elegance.

PS: /usr/local/bin/boinc is the executable of the first instance installed ...
ID: 1879884
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11516
Credit: 106,352,340
RAC: 71,573
United Kingdom
Message 1879885 - Posted: 23 Jul 2017, 9:10:33 UTC

By analogy with the Windows model (and only by analogy, so there may be Linux nuances I'm unaware of):

There's no problem with running multiple instances of boinc (the client) and boincmgr (the manager) from a single binary application folder.

BUT:

Each client instance must be directed to use a separate and distinct data folder
Each client instance must be directed to use a separate and distinct TCP port to communicate with its manager
Each manager instance must be directed to use the matching TCP port to communicate with its client

That probably requires extended use of command line switches in your launch scripts, including switches for 'allow multiple clients' and 'allow multiple managers', as appropriate.
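
As a rough illustration of the client side of these rules, the sketch below only prints one launch command per instance rather than running anything. It uses the client switches that appear in this thread (--allow_multiple_clients, --gui_rpc_port, --dir, --daemon); the binary location, data root and base port are placeholder assumptions, and the manager switches are left out because their exact names are not given here.

```shell
#!/bin/sh
# Print (not run) one launch command per client instance.
# BOINC_BIN, DATA_ROOT and BASE_PORT are placeholder values for illustration.
BOINC_BIN=/usr/local/bin/boinc
DATA_ROOT=/home/boinc/zdatadirs
BASE_PORT=30000

# client_cmd is a hypothetical helper: instance n gets its own data folder
# and its own GUI RPC port, and the executable is named by its full path.
client_cmd() {
    n=$1
    echo "$BOINC_BIN --allow_multiple_clients --gui_rpc_port $(( BASE_PORT + n )) --dir $DATA_ROOT/boincdata$n --daemon"
}

for n in 1 2 3; do
    client_cmd "$n"
done
```

Printing the commands first makes it easy to confirm that no two instances share a data folder or a port before anything is started.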
ID: 1879885
ChristianVirtual
Joined: 23 Jun 13
Posts: 20
Credit: 6,631,963
RAC: 82
Japan
Message 1879887 - Posted: 23 Jul 2017, 9:37:52 UTC

Yes, that's the way it works perfectly for CPU crunching under Linux too: just create the data folder, start the instances manually on different ports, etc. I also start the BM remotely with a command-line parameter to reflect the different ports.

The only capability that fails is GPU detection when starting via simple separate data folders.
ID: 1879887
Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11516
Credit: 106,352,340
RAC: 71,573
United Kingdom
Message 1879889 - Posted: 23 Jul 2017, 9:49:32 UTC - in response to Message 1879623.  
Last modified: 23 Jul 2017, 10:15:28 UTC

OK then, I'll bow out - Linux GPU detection is certainly a specialised topic.

I simply wanted to clarify some of the uncertainty introduced by Tom Miller:

"I am uncertain how you would get multiple copies of Boinc to play 'nicely' with a single gpu."

By modifying the (separate, distinct) cc_config.xml files in each data folder, and by creating (separate, distinct) app_config.xml files in as many project folders in each separate data folder as you need.
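
A minimal sketch of the app_config.xml half of this, written into a scratch directory rather than a live data folder. The app name example_app, the project folder name example_project and the 0.5/1.0 usage values are all placeholders, not values taken from this thread; real names have to come from the project in question.

```shell
#!/bin/sh
# Write an app_config.xml into one project folder of one client data folder.
# "example_app" and the usage values are placeholders for illustration only.
write_app_config() {
    project_dir=$1
    mkdir -p "$project_dir"
    cat > "$project_dir/app_config.xml" <<'EOF'
<app_config>
  <app>
    <name>example_app</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
EOF
}

# Demo against a scratch directory instead of a real BOINC data folder.
scratch=$(mktemp -d)
write_app_config "$scratch/boincdata1/projects/example_project"
cat "$scratch/boincdata1/projects/example_project/app_config.xml"
```

With a gpu_usage of 0.5, a client treats each task of that app as occupying half a GPU, which is one way to keep several clients from each claiming the whole card.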

Edit: it's beginning to feel as if the first BOINC (client) instance is establishing and maintaining an exclusive lock on some part of the CUDA library infrastructure. In which case, the BOINC developers and the distro package maintainers ought to be told about it - but they would ask for a detailed investigation report first.
ID: 1879889
Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 350
Credit: 971,179
RAC: 2,350
Finland
Message 1879961 - Posted: 23 Jul 2017, 20:41:14 UTC - in response to Message 1879887.  

Could you share your startup script, or however you start them?
ID: 1879961
ChristianVirtual
Joined: 23 Jun 13
Posts: 20
Credit: 6,631,963
RAC: 82
Japan
Message 1880070 - Posted: 24 Jul 2017, 15:36:17 UTC

In the case of separate installations (working GPU detection) with dedicated users, it looks like this (from ps -ef):

/home/boinc3/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30003 --daemon --suppress_net_info


In the home folder I have the bins and libs:

drwxr-xr-x 38 boinc4 boinc4   4096 Jul 24 16:28 boinc_source
drwxrwxr-x  3 boinc4 boinc4   4096 Jul 24 16:28 include
drwxrwxr-x  2 boinc4 boinc4   4096 Jul 24 16:28 lib
drwxrwxr-x  2 boinc4 boinc4   4096 Jul 24 16:28 bin




In the case of a shared installation with one user (GPU not detected), it looks like this:
sudo -u boinc /usr/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30018 --daemon --dir /var/lib/boinc-client/datadirs/boincdata18  --suppress_net_info >>/var/lib/boinc-client/datadirs/boincdata18/boinc_client.log 2>>/var/lib/boinc-client/datadirs/boincdata18/boinc_client_err.log
ID: 1880070
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 5820
Credit: 76,159,648
RAC: 52,319
Russia
Message 1880123 - Posted: 24 Jul 2017, 20:31:10 UTC - in response to Message 1879887.  

Quoting ChristianVirtual: "Yes, that's the way it works perfectly for CPU crunching under Linux too: just create the data folder, start the instances manually on different ports, etc. I also start the BM remotely with a command-line parameter to reflect the different ports.

"The only capability that fails is GPU detection when starting via simple separate data folders."

Based on Windows experience too, but for portable clients - analyse Juha's answer carefully - you need to run with a fully specified path, but not necessarily different ones.
So you should go into the BOINC dir and run multiple instances, specifying a different BOINC data dir for each instance.
Each time you should use the fully specified path to the BOINC executable (it can be the same one, IMHO).
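
The failing event log earlier in the thread shows "[coproc] launching child process at boinc" relative to /home/boinc, which suggests (as an interpretation, not something confirmed from the sources) that the detection child is launched under the same name the client was started with. One hedged way to guard against a bare name is to resolve it to an absolute path first; resolve_exe below is a made-up helper name.

```shell
#!/bin/sh
# Resolve a command name to an absolute path before launching it.
# resolve_exe is a hypothetical helper name used only in this sketch.
resolve_exe() {
    path=$(command -v "$1") || return 1
    case $path in
        /*) echo "$path" ;;   # absolute path: safe to launch with
        *)  return 1 ;;       # builtin, alias or function: no usable path
    esac
}

# Demo with "sh", which exists on any POSIX system; substitute boinc.
resolve_exe sh
```

A launch script could then refuse to start a client whenever resolve_exe fails, instead of falling back to a $PATH lookup.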
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1880123
ChristianVirtual
Joined: 23 Jun 13
Posts: 20
Credit: 6,631,963
RAC: 82
Japan
Message 1880135 - Posted: 24 Jul 2017, 20:59:28 UTC
Last modified: 24 Jul 2017, 21:01:44 UTC

Looks like a fully qualified path to me (for the bin, data folders and output redirects):

sudo -u boinc /usr/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30018 --daemon --dir /var/lib/boinc-client/datadirs/boincdata18  --suppress_net_info >>/var/lib/boinc-client/datadirs/boincdata18/boinc_client.log 2>>/var/lib/boinc-client/datadirs/boincdata18/boinc_client_err.log


Checking the file at /usr/bin/boinc, I get:

Ubuntu-1704-zesty-64-minimal /usr/bin # file boinc
boinc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=ab7439644e844254efa51a611c6d95f1789536e2, stripped


On the CentOS box it is similar:
[CentOS-73-64-minimal /usr/local/bin]# file boinc
boinc: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=1e5936fcd1500881a1f3ef043a302a28f0b68962, not stripped
ID: 1880135
ChristianVirtual
Joined: 23 Jun 13
Posts: 20
Credit: 6,631,963
RAC: 82
Japan
Message 1880142 - Posted: 24 Jul 2017, 21:23:39 UTC
Last modified: 24 Jul 2017, 21:42:25 UTC

Ahhh, sorry ... now I see what Raistmer and Juha were indicating. My fault: the full-path example was from the pure Ryzen CPU box, not the GPU box.

Doing it once more on a CentOS GPU box this way:

sudo -u boinc /usr/local/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30002 --daemon --dir /home/boinc/zdatadirs/boincdata2  --suppress_net_info >>/home/boinc/zdatadirs/boincdata2/boinc_client.log 2>>/home/boinc/zdatadirs/boincdata2/boinc_client_err.log


with the fully qualified path to boinc, I get:
Tue 25 Jul 06:12:34 2017 |  | Starting BOINC client version 7.7.0 for x86_64-pc-linux-gnu
Tue 25 Jul 06:12:34 2017 |  | This a development version of BOINC and may not function properly
Tue 25 Jul 06:12:34 2017 |  | log flags: file_xfer, sched_ops, task
Tue 25 Jul 06:12:34 2017 |  | Libraries: libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Tue 25 Jul 06:12:34 2017 |  | Running as a daemon
Tue 25 Jul 06:12:34 2017 |  | Data directory: /home/boinc/zdatadirs/boincdata2
Tue 25 Jul 06:12:35 2017 |  | CUDA: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 384.47, CUDA version 9.0, compute capability 6.1, 4096MB, 3976MB available, 11340 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 |  | CUDA: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 384.47, CUDA version 9.0, compute capability 5.2, 4096MB, 4006MB available, 7271 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 |  | OpenCL: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 384.47, device version OpenCL 1.2 CUDA, 11172MB, 3976MB available, 11340 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 |  | OpenCL: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 384.47, device version OpenCL 1.2 CUDA, 6078MB, 4006MB available, 7271 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 |  | Creating new client state file
Tue 25 Jul 06:12:35 2017 |  | Host name: linuxpowered
Tue 25 Jul 06:12:35 2017 |  | Processor: 3 GenuineIntel Intel(R) Core(TM) i7-2600S CPU @ 2.80GHz [Family 6 Model 42 Stepping 7]
Tue 25 Jul 06:12:35 2017 |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid xsaveopt
Tue 25 Jul 06:12:35 2017 |  | OS: Linux CentOS Linux: CentOS Linux 7 (Core) [3.10.0-514.6.1.el7.x86_64]
Tue 25 Jul 06:12:35 2017 |  | Memory: 31.21 GB physical, 11.18 GB virtual
Tue 25 Jul 06:12:35 2017 |  | Disk: 50.03 GB total, 43.24 GB free
Tue 25 Jul 06:12:35 2017 |  | Local time is UTC +9 hours
Tue 25 Jul 06:12:35 2017 |  | Config: allow multiple clients
Tue 25 Jul 06:12:35 2017 |  | Config: GUI RPC allowed from any host
Tue 25 Jul 06:12:35 2017 |  | Config: GUI RPCs allowed from:
Tue 25 Jul 06:12:35 2017 |  | Config: simulate 3 CPUs
Tue 25 Jul 06:12:35 2017 |  | Config: use all coprocessors
Tue 25 Jul 06:12:35 2017 |  | No general preferences found - using defaults
Tue 25 Jul 06:12:35 2017 |  | Preferences:


and the download of WUs started.

My apologies! And thanks for your patience!
ID: 1880142
