Message boards :
Number crunching :
Run Cuda app only... again
Luigi R. Send message Joined: 26 Nov 13 Posts: 10 Credit: 1,608,382 RAC: 0 |
Hello, does someone know how to get this working on Linux? This discussion didn't help me.

I have a computer running 2 BOINC clients (hosts) to keep things separated. The selected app is SETI@home v8.

Hardware & software:

17-Aug-2016 01:36:01 [---] Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
17-Aug-2016 01:36:01 [---] CUDA: NVIDIA GPU 0: GeForce GTX 750 Ti (driver version unknown, CUDA version 8.0, compute capability 5.0, 2000MB, 1973MB available, 2132 GFLOPS peak)
17-Aug-2016 01:36:01 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 750 Ti (driver version 367.35, device version OpenCL 1.2 CUDA, 2000MB, 1973MB available, 2132 GFLOPS peak)
17-Aug-2016 01:36:01 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-4770K CPU @ 3.50GHz [Family 6 Model 60 Stepping 3]
17-Aug-2016 01:36:01 [---] OS: Linux: 3.13.0-92-generic

The first host runs GPU apps and crunches whatever comes:
CPU: unchecked
GPU ATI: unchecked
GPU NVIDIA: checked
Intel GPU: unchecked

The second host runs the optimized CPU app "Linux 64bit Multibeam v8 for AVX CPUs" by Lunatics (see here):
CPU: checked
GPU ATI: unchecked
GPU NVIDIA: unchecked
Intel GPU: unchecked

I want the Cuda app only on the first host because it barely uses the CPU, which lets the CPU run 8 threads at ~100%. OpenCL tasks use 100% of a CPU core even though this is my app_config.xml. That's not very efficient.

<app_config>
  <app>
    <name>setiathome_v8</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Could an app_info.xml (like the one below) do it right?
<app_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>801</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <coproc>
      <type>NVIDIA</type>
      <count>0.5</count>
    </coproc>
    <plan_class>cuda60</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>0.2</max_ncpus>
    <cmdline></cmdline>
    <file_ref>
      <file_name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</file_name>
      <main_program/>
    </file_ref>
    <file_ref>
      <file_name>libcudart.so.6.0</file_name>
      <open_name>libcudart.so.6.0</open_name>
    </file_ref>
    <file_ref>
      <file_name>libcufft.so.6.0</file_name>
      <open_name>libcufft.so.6.0</open_name>
    </file_ref>
    <file_ref>
      <file_name>mb_cmdline-8.01-cuda60.txt</file_name>
      <open_name>mb_cmdline.txt</open_name>
    </file_ref>
    <file_ref>
      <file_name>setiathome-8.01-cuda60_AUTHORS</file_name>
      <open_name>setiathome-8.01-cuda60_AUTHORS</open_name>
    </file_ref>
    <file_ref>
      <file_name>setiathome-8.01-cuda60_COPYING</file_name>
      <open_name>setiathome-8.01-cuda60_COPYING</open_name>
    </file_ref>
    <file_ref>
      <file_name>setiathome-8.01-cuda60_COPYRIGHT</file_name>
      <open_name>setiathome-8.01-cuda60_COPYRIGHT</open_name>
    </file_ref>
    <file_ref>
      <file_name>setiathome-8.01-cuda60_README</file_name>
      <open_name>setiathome-8.01-cuda60_README</open_name>
    </file_ref>
    <file_ref>
      <file_name>setiathome-8.01-cuda60_linux_README_x41zi</file_name>
      <open_name>setiathome-8.01-cuda60_linux_README_x41zi</open_name>
    </file_ref>
  </app_version>
</app_info>
Luigi R. Send message Joined: 26 Nov 13 Posts: 10 Credit: 1,608,382 RAC: 0 |
Hello,

These files are in the project directory:

libcudart.so.6.0
libcufft.so.6.0
setiathome_8.01_x86_64-pc-linux-gnu__cuda60

app_info.xml:

<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>801</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <coproc>
      <type>NVIDIA</type>
      <count>0.5</count>
    </coproc>
    <plan_class>cuda60</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>0.2</max_ncpus>
    <cmdline></cmdline>
    <file_ref>
      <file_name>setiathome_8.01_x86_64-pc-linux-gnu__cuda60</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

It works. I hope this helps someone.
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hi Luigi,

Missed this thread in the mayhem, but glad to see you figured something out :) I'm curious (from a developer perspective) about a couple of things:

- First, probably more a matter of Linux driver/library workings: how does it go with the cudart and cufft shared library references omitted? I can build with cudart statically linked, but I guess the system must be finding cufft, and possibly cudart, somewhere on your system already? Running the ldd command on the executable in a terminal session might reveal where the executable is finding them. Not that it matters for you if it's working.

- Second, what is the motivation for going Cuda GPU only in this case? With the majority of work being GBT, for which the Cuda app was not originally designed, and with higher-performing OpenCL apps available at the moment, I need to understand that motivation. With Xbranch currently in drydock for redesign and refit, I'd like to make sure this use case doesn't accidentally get designed out.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
Luigi R. Send message Joined: 26 Nov 13 Posts: 10 Credit: 1,608,382 RAC: 0 |
Hello Jason,

1) The Cuda application finds the cudart and cufft shared libraries in its own directory. They can simply be pasted there and it works.

ldd output:

luis@luis-XU:~/Applicazioni/boinc/projects/setiathome.berkeley.edu$ ldd setiathome_8.01_x86_64-pc-linux-gnu__cuda60
    linux-vdso.so.1 => (0x00007fff699b3000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f63383f5000)
    libcudart.so.6.0 => /home/luis/Applicazioni/boinc/projects/setiathome.berkeley.edu/./libcudart.so.6.0 (0x00007f63381a4000)
    libcufft.so.6.0 => /home/luis/Applicazioni/boinc/projects/setiathome.berkeley.edu/./libcufft.so.6.0 (0x00007f6335f77000)
    libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6335c73000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f633596d000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6335757000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6335392000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f6338613000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f633518e000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6334f86000)

2) I noticed that OpenCL applications use 100% of a CPU core instead of respecting my project configuration file.

app_config.xml:

<app_config>
  <app>
    <name>setiathome_v8</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>.5</gpu_usage>
      <cpu_usage>.04</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

I guess this behaviour makes the CPU-only applications less efficient, because of the 10 concurrent threads (8 CPU + 2 CPU-GPU) and the different priorities (nice=10 for the CPU-GPU tasks vs. nice=19 for the CPU tasks). I also observed that the Cuda application's CPU usage is low (less than 2 minutes per task).

CPU time for CPU-only tasks:
- yesterday: http://setiathome.berkeley.edu/result.php?resultid=5101989937
- today: http://setiathome.berkeley.edu/result.php?resultid=5103322853

opencl_nvidia_SoG vs. opencl_nvidia_sah vs. cuda60:
http://setiathome.berkeley.edu/result.php?resultid=5101496155
http://setiathome.berkeley.edu/result.php?resultid=5098523660
http://setiathome.berkeley.edu/result.php?resultid=5101749058
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
2) I noticed Opencl applications use 100% CPU instead of respecting my project configuration file.

Only BOINC reads that file, so no application can "respect" it. <cpu_usage> is used by BOINC only to decide whether to "free CPU cores" or not:
- if the sum of <cpu_usage> over all running GPU applications is >= 1 and < 2, BOINC will run one less CPU task.

E.g.:
- with <cpu_usage>0.5</cpu_usage> and 4 running GPU tasks, BOINC will free 2 CPU cores (run two fewer CPU tasks);
- with 3 running GPU tasks, <cpu_usage>0.33</cpu_usage> will not free any core (i.e. it is useless, only a cosmetic display in BOINC Manager).

The sum is truncated: 0.99 == 0; 1.99 == 1; ...

I guess this behaviour leads to an inefficiency of CPU-only applications

Most people prefer to run fewer CPU tasks ("free CPU cores") to help the GPU app perform faster, because the GPU can do more work than the cores you "lose".

- ALF - "Find out what you don't do well ..... then don't do it!" :)
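[Editorial note] BilBg's truncation rule can be sketched in a few lines; this is an illustrative model of the behaviour he describes, not BOINC source code:

```python
# Illustrative sketch of the "free CPU cores" decision BilBg describes:
# BOINC sums <cpu_usage> over the running GPU tasks and truncates the result.
import math

def freed_cpu_cores(cpu_usage_per_task: float, running_gpu_tasks: int) -> int:
    """How many fewer CPU tasks BOINC runs to cover GPU support load."""
    return math.floor(cpu_usage_per_task * running_gpu_tasks)

freed_cpu_cores(0.5, 4)    # 4 * 0.5 = 2.0 -> frees 2 cores
freed_cpu_cores(0.33, 3)   # 3 * 0.33 = 0.99 -> truncated to 0, frees nothing
freed_cpu_cores(0.04, 2)   # Luigi's setting: 0.08 -> frees nothing
```

This also shows why Luigi's <cpu_usage>.04</cpu_usage> with 2 GPU tasks never frees a core: the sum (0.08) truncates to zero.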
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
2) I noticed Opencl applications use 100% CPU instead of respecting my project configuration file.

app_config.xml is not intended to apply a controlling directive to GPU apps, and never does. Don't expect any application to modify its internal behaviour as a result of app_config settings, OpenCL ones or otherwise. In the extreme cases where behaviour modification is desirable - multithreaded CPU apps - the load is modified via a command-line argument, --nthreads N.

Would this be a valid place to express a personal opinion that there is never an excuse for an OpenCL app to use 100% CPU for support, and that this is always reducible to bad programming practice? Although I hasten to make clear that the bad programming practice may be that of the developers and providers of the SDK or equivalent development tools, not necessarily of the final application developer.
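[Editorial note] For reference, the --nthreads argument Richard mentions is typically passed per app version in app_config.xml. A sketch in BOINC's config convention - the app name and plan class here are placeholders, not a real SETI@home app:

```xml
<app_config>
  <app_version>
    <app_name>example_mt_app</app_name>  <!-- placeholder, not a real app -->
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>
```

Note that <avg_ncpus> and <cmdline> should agree, so the scheduler's core accounting matches what the app actually uses.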
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
opencl_nvidia_SoG Vs. opencl_nvidia_sah Vs. cuda60

You may read the doc - there are many switches to tune the OpenCL apps for less CPU usage, less lag, or higher performance.

From here: https://cloud.mail.ru/public/LJ8s/c3WyRR8ip
get MB8_win_x86_SSE3_OpenCL_NV_SoG_r3500.7z
Inside it, read: ReadMe_MultiBeam_OpenCL.txt

- ALF - "Find out what you don't do well ..... then don't do it!" :)
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Raistmer has stated that the NVidia OpenCL development tools (I think referring to Windows; not sure if it applies to Linux too) don't provide access to the low-level OpenCL synchronisation routines in the same way that ATI's and Intel's do. That's why the (NV only? Windows only?) apps have a CPU overhead in the first place, and need the command-line interventions to reduce it. Since Luigi's question related to Linux, I thought now might be a moment to examine the NVidia development tool position from that alternative perspective.
Luigi R. Send message Joined: 26 Nov 13 Posts: 10 Credit: 1,608,382 RAC: 0 |
Most people prefer to run less CPU tasks ("free CPU cores") to help the GPU app perform faster because the GPU can do more work than the cores you "lose".

I would consider that loss acceptable if I had a faster GPU, like a GTX 970/980 or 1070/1080 (just talking about NVIDIA).

Let's take *guppi tasks:
GTX 750 Ti: 2 tasks in 4800 s, consumption 27-29 W (nvidia-smi).
i7-4770K: 8 tasks in 5400 s, consumption 84 W (probably higher).

There is little difference between OpenCL and CUDA GPU time, so I don't know if running OpenCL is worth it. I guessed not. Have you got any benchmark for the GTX 750 Ti?
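[Editorial note] A back-of-the-envelope check of Luigi's figures (the wattages and runtimes are his measurements, quoted above; the 28 W is the midpoint of his 27-29 W reading):

```python
# Energy per *guppi task on each device, from the numbers in the post above.
def joules_per_task(watts: float, seconds: float, tasks: int) -> float:
    """Energy consumed per task when `tasks` run concurrently for `seconds`."""
    return watts * seconds / tasks

gpu = joules_per_task(28, 4800, 2)   # GTX 750 Ti: 67200 J/task
cpu = joules_per_task(84, 5400, 8)   # i7-4770K:   56700 J/task
```

With these figures the two devices land in the same ballpark per task, which supports Luigi's point that a 750 Ti is a borderline case compared with a faster card.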
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
This high CPU use from OpenCL started after some driver version. I think for NVIDIA the last driver with low CPU usage was/is 266.58.

On Windows XP with my older AMD ATI Radeon HD 6570 + Catalyst 11.12, the CPU time was low for every app in the past and is still very low with the current app (ATi_HD5_r3500):
http://setiathome.berkeley.edu/results.php?userid=8647488&offset=0&show_names=0&state=4&appid=

The new ATi_HD5_r3500 app has much less lag, even on VLARs (and roughly the same speed for me as r3330) (on defaults, no mb_cmdline - the file is empty).

- ALF - "Find out what you don't do well ..... then don't do it!" :)
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Thanks for the detail on the choices. Yes, I'll be maintaining low CPU use for the default modes of operation in future Cuda builds, even when/if dedicating more CPU becomes an option, and even if I change the way this is done.

As for the expectation that some OS or brand of GPU OpenCL driver or libraries should default to (old-fashioned) blocking synchronisation: at the risk of opening a can of worms, smarter design is needed, given the rapid pace of change. That's part of the reason Xbranch is in drydock now for overhaul. No amount of wishful thinking, whining, or bandaid patches will make old design and implementation practices suddenly ideal again. (Nearly every useful modern API is moving, or has already moved, to fully asynchronous non-blocking behaviour, for some very good reasons.)

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
There is a little difference between OpenCL and CUDA GPU-time. I don't know if it is worth running OpenCL. I guessed no. Have you got any benchmark for GTX 750Ti?

OK, according to the graph Shaggie76 made, for the GTX 750 Ti there is not much difference (the graph is for one task per GPU; he analyses the host.gz from http://setiathome.berkeley.edu/stats/ and some web pages here, but you may ask him for the details):
http://setiathome.berkeley.edu/forum_thread.php?id=80132

- ALF - "Find out what you don't do well ..... then don't do it!" :)
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
Let's take *guppi tasks.

Hello Luigi,

I tried to end the debate - see GTX 750 Ti optimisation with GuppiRescheduler: Cuda50 (3/gpu) bests NV_SoG!?! - but I wasn't successful before the SETI WoW event started, because I wasn't aware that throughput with NV_SoG was better at 2 nonVLARs/GPU, even though a 2nd CPU core would be locked out of running a guppi.

Since then, I am currently running 4 nonVLARs/GPU on both:
GTX 750 Ti: ~48 min for 4 nonVLARs = ~12 min throughput (or ~5 nonVLARs per hour)
GTX 1060: ~25 min for 4 nonVLARs = ~6+ min throughput (or ~10 nonVLARs per hour)

Keep in mind that those numbers and that link are with a device-queue optimiser such as Mr Kevvy's GuppiRescheduler (also available for Linux).

Cheers, RobG :-)
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
There is a little difference between OpenCL and CUDA GPU-time. I don't know if it is worth running OpenCL. I guessed no. Have you got any benchmark for GTX 750Ti?

It depends on your settings. I run 2 WUs per GPU (although I'm now trying a few different settings with the latest release), with 1 CPU core reserved for each WU. On my GTX 750 Tis running 2 at a time, Guppie WUs can take 60-75 min using CUDA; using SoG it's 40-50 min. If there is a mix of Arecibo & Guppie work on one GPU, the Arecibo times can almost double on CUDA50. On SoG there is an increase in the Arecibo runtime too, but nowhere near as much.

Grant
Darwin NT
Luigi R. Send message Joined: 26 Nov 13 Posts: 10 Credit: 1,608,382 RAC: 0 |
Hello Luigi,

Hi Stubbles,

My 750 Ti takes 25 minutes for 2 nonVLARs with CUDA. That doesn't look bad when you consider that I gain 23 minutes of CPU time (CUDA needs less than 2 of those 25 minutes) to dedicate to the AVX app. If I ran SoG (OpenCL) - going by your GPU times - I would gain 1 minute of GPU time and lose all 24 minutes of CPU time.

My 750 Ti's efficiency is pretty similar to my CPU's on SETI@home. Indeed, I crunch by GPU here only because of the SETI WoW event; afterwards I will probably go back to Poem@home, PrimeGrid or GPUGRID.

I don't want to sound repetitive (and maybe argumentative), but I still ask whether it is worth a volunteer's time to look for the best (instead of an optimal) configuration. The optimal configuration took 1 minute of my time; the best configuration would take hours, since I have to read material that is not in my native language and experiment until it works. This post took me 50 minutes to write. :(

Depends on your settings.

I'm blocking *guppi* tasks. I have received and blocked only 4 tasks (in 2 days) so far. Let me know if you consider that bad practice.

P.S. I run cuda60.
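[Editorial note] Luigi's trade-off can be restated in a few lines; the minute figures are the ones quoted in the posts above (CUDA: ~2 min of CPU support over a 25-minute batch of 2 nonVLARs; SoG: a full core for the whole ~24-minute batch):

```python
# Sketch of the CPU-time trade-off between CUDA and SoG for one 2-task batch.
def cpu_minutes_freed(wall_min: float, cpu_support_min: float) -> float:
    """CPU time left over for the AVX app while the GPU batch runs."""
    return wall_min - cpu_support_min

cuda = cpu_minutes_freed(25, 2)   # CUDA frees ~23 CPU minutes per batch
sog = cpu_minutes_freed(24, 24)   # SoG frees ~0, while saving only ~1 GPU minute
```

So on this hardware, SoG trades roughly 23 CPU minutes for 1 GPU minute per batch, which is why Luigi prefers CUDA.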
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Optimal configuration took 1 minute of my time, best configuration will take hours as long as I read something that is not my native language and I do some tries to get it to work.

Understood from this end very well, and thanks very much for the input. I will continue trying to keep things simple.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
I'm blocking *guppi* tasks. I received and blocked only 4 tasks (in 2 days) for now.

There have been some issues with the splitters. For a while there were only Guppie tasks available, with the occasional resend of Arecibo tasks. Now it's the other way around - almost no Guppies at all. The ratio varies, but it's usually in the range of 55%/45% to 65%/35% Guppies/Arecibo.

Let me know if you consider that a bad practice.

It is considered poor practice; however, there are many people here who do it all the time anyway.

Grant
Darwin NT
Stubbles Send message Joined: 29 Nov 99 Posts: 358 Credit: 5,909,255 RAC: 0 |
Optimal configuration took 1 minute of my time, best configuration will take hours as long as I read something that is not my native language and I do some tries to get it to work.

Thanks for taking the time to post again, Luigi. For the rest of SETI WoW, you will likely get tasks in the ratio that Grant mentioned. If you are only looking to improve your daily output for WoW, give Mr Kevvy's GuppiRescheduler a try. Unfortunately, there isn't a front-end for it on Linux yet (I only made one for Windows 7-10, since that is what I know how to write scripts for... even though I'm more of a Linux user when I get the choice).

So from your "optimal" +5 (with +10 = "best"), getting guppis processed on the CPU is the next "better" step. Those who have used it report an increase of 10-15% over their "optimal" or "best" settings; my experience was closer to 10%.

Be aware that some view aborting tasks as ...hmmm... very bad! lol For WoW though, since the rules are "there are no rules!", and the S@h staff did not explicitly rule out any "nefarious" practices, I view it as a way for the project staff to test the overall architecture and see where the weaknesses are. We have already had 2 minor events (the last one seeming to be just a small human error that caused a flood of guppis followed by a flood of nonVLARs).

Cheers, keep calm and RobG :-p

[edit] Oops, I forgot about Cuda60. I don't know if Mr Kevvy's script was tested with that app. Just back up your whole BOINC folder first... just in case. [/e]
BilBg Send message Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
[edit] oops, I forgot about Cuda60. I don't know if Mr Kevvy's script was tested for that app. Just backup your whole Boinc folder first...just in case. [/e]

Mr Kevvy's script looks for client_state.xml in the current directory, so it can be tested on a copy of client_state.xml, and the changed copy can then be compared to the original client_state.xml.

Put in an empty directory:
- the "script"
- a copy of client_state.xml
- a copy of sched_request_setiathome.berkeley.edu.xml (the "script" analyses that file too - maybe Mr Kevvy found that easier than analysing just client_state.xml - but it changes only client_state.xml)

Run the "script" from this temp directory, then compare the changed copy with the original client_state.xml.

P.S. In fact the "script" is compiled C++ code, not what I would call a "script".

- ALF - "Find out what you don't do well ..... then don't do it!" :)
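[Editorial note] BilBg's dry-run procedure can be automated; here is a sketch under the assumption that the rescheduler is a standalone executable whose path you supply (the helper names `dry_run` and `changed_lines` are mine, not part of Mr Kevvy's tool):

```python
# Sketch of BilBg's safe test: copy the two files into a scratch directory,
# run the rescheduler there, then diff the copy against the untouched original.
import difflib
import shutil
import subprocess
import tempfile
from pathlib import Path

def changed_lines(original_text: str, modified_text: str) -> list:
    """Lines added/removed between original and rescheduled client_state.xml."""
    diff = difflib.unified_diff(original_text.splitlines(),
                                modified_text.splitlines(), lineterm="")
    return [l for l in diff
            if l[:1] in "+-" and not l.startswith(("+++", "---"))]

def dry_run(boinc_dir: Path, rescheduler: Path) -> list:
    """Run the rescheduler on copies only; the real BOINC dir is untouched."""
    with tempfile.TemporaryDirectory() as tmp_name:
        tmp = Path(tmp_name)
        for name in ("client_state.xml",
                     "sched_request_setiathome.berkeley.edu.xml"):
            shutil.copy(boinc_dir / name, tmp / name)
        subprocess.run([str(rescheduler)], cwd=tmp, check=True)
        return changed_lines((boinc_dir / "client_state.xml").read_text(),
                             (tmp / "client_state.xml").read_text())
```

The diff output shows exactly which task entries the rescheduler would rewrite, without risking the live client_state.xml.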
Luigi R. Send message Joined: 26 Nov 13 Posts: 10 Credit: 1,608,382 RAC: 0 |
I know that aborting tasks is not highly regarded, but I don't care about challengers. :P I care whether it is OK with the project administrators. I don't usually abort anything, and this project is good at something else anyway: thanks to the long deadlines, it allows me to block undesired tasks and complete them after the challenge period.

I'm afraid the rescheduler doesn't suit my requirements. I'm using two BOINC clients on the same computer: the first runs GPU tasks, the second runs CPU tasks, and I wouldn't like additional concurrent CPU tasks. What it does is interesting, though - I didn't know about the sched*.xml files.
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.