multi-client and GPU detection
Author | Message |
---|---|
ChristianVirtual Joined: 23 Jun 13 Posts: 21 Credit: 10,060,003 RAC: 0 |
I am trying to set up multiple clients on my Linux systems to allow an easy way of bunkering. I know it's controversial, but if it's encouraged, one needs to use it in order to be competitive in challenges. Do I like it? Not really; I would prefer raw power during events. Now, technically I understand how to run multiple clients, and by now I can do so quite well for CPU clients. But when I try to do that for GPU, I get the following event log:

Sat 22 Jul 16:13:03 2017 | | Starting BOINC client version 7.7.0 for x86_64-pc-linux-gnu
Sat 22 Jul 16:13:03 2017 | | This a development version of BOINC and may not function properly
Sat 22 Jul 16:13:03 2017 | | log flags: file_xfer, sched_ops, task, coproc_debug
Sat 22 Jul 16:13:03 2017 | | Libraries: libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Sat 22 Jul 16:13:03 2017 | | Running as a daemon
Sat 22 Jul 16:13:03 2017 | | Data directory: /home/boinc/zdatadirs/boincdata1
Sat 22 Jul 16:13:03 2017 | | [coproc] launching child process at boinc
Sat 22 Jul 16:13:03 2017 | | [coproc] relative to directory /home/boinc
Sat 22 Jul 16:13:03 2017 | | [coproc] with data directory /home/boinc/zdatadirs/boincdata1
Sat 22 Jul 16:13:03 2017 | | GPU detection failed. error code 5632
Sat 22 Jul 16:13:03 2017 | | [coproc] read_coproc_info_file() returned error -108
Sat 22 Jul 16:13:03 2017 | | No usable GPUs found
Sat 22 Jul 16:13:03 2017 | | Host name: linuxpowered
Sat 22 Jul 16:13:03 2017 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600S CPU @ 2.80GHz [Family 6 Model 42 Stepping 7]
Sat 22 Jul 16:13:03 2017 | | OS: Linux CentOS Linux: CentOS Linux 7 (Core) [3.10.0-514.6.1.el7.x86_64]
Sat 22 Jul 16:13:03 2017 | | Memory: 31.21 GB physical, 11.18 GB virtual
Sat 22 Jul 16:13:03 2017 | | Disk: 50.03 GB total, 47.52 GB free
Sat 22 Jul 16:13:03 2017 | | Local time is UTC +9 hours

I tried several things, like disallowing GPU use on the client on port 31416, not running that client at all, etc. On the main port, 31416, the GPUs are detected correctly with the proper driver version and work well, so the driver and GPU installation themselves are OK. Any idea what the error codes mean? I grepped through the sources but could not find any good hint. Thanks in advance! |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Are you running each copy of BOINC in a separate directory? It seems to me I read that you had to do that. All the data I have seen about GPUs is specific to that copy of BOINC. Since I run one task per GPU on my cards, I am uncertain how you would get multiple copies of BOINC to play "nicely" with a single GPU.
Tom
A proud member of the OFA (Old Farts Association). |
ChristianVirtual Joined: 23 Jun 13 Posts: 21 Credit: 10,060,003 RAC: 0 |
The main folder is /home/boinc, with a separate data folder underneath it for each of the multiple instances. Do you mean I need to install into /home/boinc, /home/boinc1, ... with full duplication? I was thinking about that but hoped I could avoid it (it would be similar to a VM-based setup, if GPU passthrough worked). |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
ChristianVirtual wrote: "The main folder is /home/boinc, with a separate data folder underneath it for each of the multiple instances."
I only looked briefly at the documentation on multiple BOINC clients, but I THOUGHT I understood that each client instance was "fully separate", which meant that even the BOINC directories were entirely separate. This looks like a relevant discussion; it doesn't support my understanding, but it does discuss how and what should be done: https://boinc.berkeley.edu/dev/forum_thread.php?id=10986
HTH,
Tom |
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
The client can't be started from $PATH. You need to include the path to the executable. 5632 is a status code from waitpid(). Decoded, it means the process exited with exit code 22, which is EINVAL, or "Invalid argument". Not sure where that could come from. Is (/usr/bin/)boinc the real executable or some wrapper? |
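As a minimal sketch of the decoding described above, assuming the usual Linux layout of a waitpid() status word (low 7 bits hold a terminating signal, the next byte holds the exit code), the value 5632 can be unpacked with plain shell arithmetic. The function name `decode_wait_status` is hypothetical, and stopped or continued children are deliberately not handled.

```shell
#!/bin/sh
# Decode a raw waitpid() status word, assuming the common Linux layout:
#   bits 0-6  = terminating signal (0 means the child exited normally)
#   bits 8-15 = exit code of the child
# Stopped/continued states are not handled in this sketch.
decode_wait_status() {
    status=$1
    signal=$(( status & 127 ))
    code=$(( (status >> 8) & 255 ))
    if [ "$signal" -eq 0 ]; then
        echo "exited with code $code"
    else
        echo "killed by signal $signal"
    fi
}

decode_wait_status 5632   # prints: exited with code 22
```

On Linux, errno 22 is EINVAL, which matches the reading given in the post.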
ChristianVirtual Joined: 23 Jun 13 Posts: 21 Credit: 10,060,003 RAC: 0 |
Juha wrote: "The client can't be started from $PATH. You need to include the path to the executable."
That's what I do right now ... in a crude way, by building several complete BOINC installations from source so that the bins and libs sit in each home folder (otherwise the install complains that some files are locked by other running versions). Disk space is cheap. :-| I need that setup only occasionally, so I can live with less elegance. PS: /usr/local/bin/boinc is the executable of the first instance installed ... |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
By analogy with the Windows model (and only by analogy, so there may be Linux nuances I'm unaware of): there's no problem with running multiple instances of boinc (the client) and boincmgr (the manager) from a single binary application folder, BUT:
Each client instance must be directed to use a separate and distinct data folder.
Each client instance must be directed to use a separate and distinct TCP port to communicate with its manager.
Each manager instance must be directed to use the matching TCP port to communicate with its client.
That probably requires extended use of command line switches in your launch scripts, including switches for 'allow multiple clients' and 'allow multiple managers', as appropriate. |
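The requirements above could be wired into a launch script roughly as follows. This is only a sketch: the client flags, the executable path and the data root are the ones that appear in the launch commands posted in this thread, while the `instance_command` helper, the `30000 + N` port scheme and the `boincdataN` naming are assumptions for illustration. The script prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical dry-run launcher: one shared binary, one data folder and one
# GUI RPC port per client instance. Nothing is started; commands are printed.
BOINC_BIN=/usr/local/bin/boinc      # absolute path to the shared executable
DATA_ROOT=/home/boinc/zdatadirs     # parent of the per-instance data folders

instance_command() {
    n=$1
    dir="$DATA_ROOT/boincdata$n"    # separate, distinct data folder
    port=$(( 30000 + n ))           # separate, distinct TCP port (assumed scheme)
    echo "$BOINC_BIN --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port $port --daemon --dir $dir --suppress_net_info"
}

for n in 1 2 3; do
    instance_command "$n"
done
```

Each manager then has to be pointed at the same port as the client it is meant to control.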
ChristianVirtual Joined: 23 Jun 13 Posts: 21 Credit: 10,060,003 RAC: 0 |
Yes, that's the way it works perfectly for CPU crunching under Linux too: just create the data folders, start the instances manually on different ports, etc. I also start the BM remotely with a command-line parameter to reflect the different ports. The only capability that fails is the desired GPU detection when starting via simple separate data folders. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14649 Credit: 200,643,578 RAC: 874 |
OK then, I'll bow out - Linux GPU detection is certainly a specialised topic. I simply wanted to clarify some of the uncertainty introduced by Tom Miller: "I am uncertain how you would get multiple copies of BOINC to play "nicely" with a single GPU." By modifying the (separate, distinct) cc_config.xml files in each data folder, and by creating (separate, distinct) app_config.xml files in as many project folders in each separate data folder as you need.
Edit: it's beginning to feel as if the first BOINC (client) instance is establishing and maintaining an exclusive lock on some part of the CUDA library infrastructure. In which case, the BOINC developers and the distro package maintainers ought to be told about it - but they would ask for a detailed investigation report first. |
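One way to read the cc_config.xml suggestion is to give each client instance its own GPU by telling it to ignore the others. The sketch below prints a cc_config.xml that uses the `<ignore_nvidia_dev>` option. The helper name `cc_config_ignoring` and the idea of splitting two GPUs between two instances are assumptions for illustration; nothing in the thread confirms that this was actually done.

```shell
#!/bin/sh
# Hypothetical helper: print a cc_config.xml that hides one NVIDIA device
# (numbered as in the client's startup log) from a client instance.
cc_config_ignoring() {
    dev=$1
    printf '%s\n' \
        '<cc_config>' \
        '  <options>' \
        "    <ignore_nvidia_dev>$dev</ignore_nvidia_dev>" \
        '  </options>' \
        '</cc_config>'
}

# Example: an instance that should skip GPU 1 and keep GPU 0.
# Once the output looks right, redirect it into that instance's data folder,
# e.g.  cc_config_ignoring 1 > /home/boinc/zdatadirs/boincdata1/cc_config.xml
cc_config_ignoring 1
```

The option should take effect once the client instance is restarted. Sharing one GPU between tasks is a separate matter that the post assigns to app_config.xml in the project folders.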
Juha Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0 |
Could you share your startup script, or however you start them? |
ChristianVirtual Joined: 23 Jun 13 Posts: 21 Credit: 10,060,003 RAC: 0 |
In the case of separate installations with dedicated users (GPU detection works), it looks like this (from ps -ef):

/home/boinc3/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30003 --daemon --suppress_net_info

In the home folder I have the bins and libs:

drwxr-xr-x 38 boinc4 boinc4 4096 Jul 24 16:28 boinc_source
drwxrwxr-x 3 boinc4 boinc4 4096 Jul 24 16:28 include
drwxrwxr-x 2 boinc4 boinc4 4096 Jul 24 16:28 lib
drwxrwxr-x 2 boinc4 boinc4 4096 Jul 24 16:28 bin

In the case of a shared installation with one user (GPU not detected), it looks like this:

sudo -u boinc /usr/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30018 --daemon --dir /var/lib/boinc-client/datadirs/boincdata18 --suppress_net_info >>/var/lib/boinc-client/datadirs/boincdata18/boinc_client.log 2>>/var/lib/boinc-client/datadirs/boincdata18/boinc_client_err.log |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
ChristianVirtual wrote: "Yes, that's the way it works perfectly for CPU crunching under Linux too: just create the data folders, start the instances manually on different ports, etc. I also start the BM remotely with a command-line parameter to reflect the different ports."
My advice is also based on Windows experience, but with portable clients. Analyse Juha's answer carefully: you need to run with a fully specified path, though not necessarily a different one for each instance. So you should go into the BOINC directory and run multiple instances, specifying a different BOINC data directory for each instance. Each time, you should use the fully specified path to the BOINC executable (it can be the same one, IMHO).
SETI apps news
We're not gonna fight them. We're gonna transcend them. |
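The first event log in this thread shows the client launching a child process "at boinc", relative to the working directory, which suggests why a bare command name breaks GPU detection. A launch script can guard against that by resolving the executable to an absolute path before starting anything. A minimal sketch, with `absolute_exe` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: turn a command name into a path before launching, so
# the client is never started via a bare name looked up in $PATH.
absolute_exe() {
    case "$1" in
        /*) echo "$1" ;;          # already absolute, keep as is
        *)  command -v "$1" ;;    # ask the shell where the name resolves to
    esac
}

# "sh" stands in for "boinc" here so the sketch runs on any machine.
exe=$(absolute_exe sh)
case "$exe" in
    /*) echo "would start: $exe" ;;
    *)  echo "could not resolve an absolute path" >&2 ;;
esac
```

The final check matters because `command -v` can return something that is not an absolute path, for example a shell builtin name or a relative path passed in by the caller.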
ChristianVirtual Joined: 23 Jun 13 Posts: 21 Credit: 10,060,003 RAC: 0 |
That looks like a fully qualified path to me (for the binary, the data folders and the output redirects):

sudo -u boinc /usr/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30018 --daemon --dir /var/lib/boinc-client/datadirs/boincdata18 --suppress_net_info >>/var/lib/boinc-client/datadirs/boincdata18/boinc_client.log 2>>/var/lib/boinc-client/datadirs/boincdata18/boinc_client_err.log

Checking the file at /usr/bin/boinc, I get:

Ubuntu-1704-zesty-64-minimal /usr/bin # file boinc
boinc: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=ab7439644e844254efa51a611c6d95f1789536e2, stripped

On the CentOS box it is similar:

[CentOS-73-64-minimal /usr/local/bin]# file boinc
boinc: ELF 64-bit LSB executable, x86-64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=1e5936fcd1500881a1f3ef043a302a28f0b68962, not stripped |
ChristianVirtual Joined: 23 Jun 13 Posts: 21 Credit: 10,060,003 RAC: 0 |
Ahhh, sorry ... now I see what Raistmer and Juha were indicating, and it's my fault: the full-path example was from the pure Ryzen CPU box, not the GPU box. Doing it once more on a CentOS GPU box, this way:

sudo -u boinc /usr/local/bin/boinc --allow_multiple_clients --allow_remote_gui_rpc --gui_rpc_port 30002 --daemon --dir /home/boinc/zdatadirs/boincdata2 --suppress_net_info >>/home/boinc/zdatadirs/boincdata2/boinc_client.log 2>>/home/boinc/zdatadirs/boincdata2/boinc_client_err.log

With the fully qualified path to boinc I get:

Tue 25 Jul 06:12:34 2017 | | Starting BOINC client version 7.7.0 for x86_64-pc-linux-gnu
Tue 25 Jul 06:12:34 2017 | | This a development version of BOINC and may not function properly
Tue 25 Jul 06:12:34 2017 | | log flags: file_xfer, sched_ops, task
Tue 25 Jul 06:12:34 2017 | | Libraries: libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Tue 25 Jul 06:12:34 2017 | | Running as a daemon
Tue 25 Jul 06:12:34 2017 | | Data directory: /home/boinc/zdatadirs/boincdata2
Tue 25 Jul 06:12:35 2017 | | CUDA: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 384.47, CUDA version 9.0, compute capability 6.1, 4096MB, 3976MB available, 11340 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 | | CUDA: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 384.47, CUDA version 9.0, compute capability 5.2, 4096MB, 4006MB available, 7271 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 | | OpenCL: NVIDIA GPU 0: GeForce GTX 1080 Ti (driver version 384.47, device version OpenCL 1.2 CUDA, 11172MB, 3976MB available, 11340 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 | | OpenCL: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 384.47, device version OpenCL 1.2 CUDA, 6078MB, 4006MB available, 7271 GFLOPS peak)
Tue 25 Jul 06:12:35 2017 | | Creating new client state file
Tue 25 Jul 06:12:35 2017 | | Host name: linuxpowered
Tue 25 Jul 06:12:35 2017 | | Processor: 3 GenuineIntel Intel(R) Core(TM) i7-2600S CPU @ 2.80GHz [Family 6 Model 42 Stepping 7]
Tue 25 Jul 06:12:35 2017 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid xsaveopt
Tue 25 Jul 06:12:35 2017 | | OS: Linux CentOS Linux: CentOS Linux 7 (Core) [3.10.0-514.6.1.el7.x86_64]
Tue 25 Jul 06:12:35 2017 | | Memory: 31.21 GB physical, 11.18 GB virtual
Tue 25 Jul 06:12:35 2017 | | Disk: 50.03 GB total, 43.24 GB free
Tue 25 Jul 06:12:35 2017 | | Local time is UTC +9 hours
Tue 25 Jul 06:12:35 2017 | | Config: allow multiple clients
Tue 25 Jul 06:12:35 2017 | | Config: GUI RPC allowed from any host
Tue 25 Jul 06:12:35 2017 | | Config: GUI RPCs allowed from:
Tue 25 Jul 06:12:35 2017 | | Config: simulate 3 CPUs
Tue 25 Jul 06:12:35 2017 | | Config: use all coprocessors
Tue 25 Jul 06:12:35 2017 | | No general preferences found - using defaults
Tue 25 Jul 06:12:35 2017 | | Preferences:

and the download of WUs started. My apologies! And thanks for your patience! |