Linux CUDA 'Special' App finally available, featuring Low CPU use

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1834787 - Posted: 8 Dec 2016, 11:16:38 UTC - in response to Message 1834778. TBAR: Have you compared the speed of your compile with Petri's different builds?..... I was one of the First testers, been at it for over a Year now. I've tested hundreds of builds during that Year, right up to p_zi3i. I haven't been sent any version newer than zi3i. Your other theory doesn't take into account the use of Offline Benchmarking. The Benchmark App will identify the source of the problem. I just ran another series of tests which show the Pulsefind Error that was addressed in zi3f is still present, it's just a little better in the zi+ build than the zi3i build. That's sad news, but thanks for rigorously monitoring result quality. If the issue really is what I think it is, it's a bug that is quite hard to track, and it can manifest itself differently not only on different platforms but on different hardware too. Apparently (I've had no time to review the code yet, so I'm guessing here) the Pulse find search is parallelized by splitting the periods across different workitems. And _IF_ the splitting is done not only across a few workitems but across a few workgroups as well, that could be the bug in its current manifestation. AFAIK by both OpenCL and CUDA design separate workgroups are completely independent entities. There is no ordering in their execution and no syncing between them other than running them in separate, ordered kernel calls. So the order can be chosen freely by the runtime and can differ between platforms/hardware. This is worth checking. I will have time for a code review only after New Year perhaps, hardly earlier. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1834787 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1834788 - Posted: 8 Dec 2016, 11:20:31 UTC - in response to Message 1834781. Last modified: 8 Dec 2016, 11:24:12 UTC Have you noticed it's always just One Bad Pulse? Never 2 or more, always one , no matter how many Pulses are found. I think that's just luck. A Pulse is a rare event, and a Bad Pulse is a rare event among rare events. The probability of getting 2 bad ones in a single task is simply too low to catch easily. EDIT: but to check this, it would be worth putting all the BAD Pulses in a table and seeing what they have in common: the same FFT length, for example, or a particular chirp sign, or a particular period, and so forth. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1834788 ·
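The table Raistmer suggests can be sketched in a few lines. This is only an illustration: the field names (`fft_len`, `chirp_sign`) and every value in `bad_pulses` are made-up placeholders, not results from any real task.

```python
from collections import Counter

# Placeholder records only. Real entries would be copied by hand from the
# stderr output of tasks that reported a bad Pulse.
bad_pulses = [
    {"fft_len": 512, "chirp_sign": "-"},
    {"fft_len": 512, "chirp_sign": "-"},
    {"fft_len": 2048, "chirp_sign": "-"},
    {"fft_len": 512, "chirp_sign": "+"},
]


def most_common_values(records, fields):
    """Return, per field, the most frequent value and how often it occurs."""
    summary = {}
    for field in fields:
        counts = Counter(record[field] for record in records)
        summary[field] = counts.most_common(1)[0]
    return summary


for field, (value, count) in most_common_values(
    bad_pulses, ["fft_len", "chirp_sign"]
).items():
    print(f"{field}: {value} appears in {count} of {len(bad_pulses)} bad pulses")
```

If one value dominates a column once enough real records are collected, that column is a candidate for what the bad Pulses have in common.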

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1834791 - Posted: 8 Dec 2016, 11:51:55 UTC - in response to Message 1834787. ...If the issue really is what I think it is, it's a bug that is quite hard to track, and it can manifest itself differently not only on different platforms but on different hardware too. Apparently (I've had no time to review the code yet, so I'm guessing here) the Pulse find search is parallelized by splitting the periods across different workitems. And _IF_ the splitting is done not only across a few workitems but across a few workgroups as well, that could be the bug in its current manifestation. AFAIK by both OpenCL and CUDA design separate workgroups are completely independent entities. There is no ordering in their execution and no syncing between them other than running them in separate, ordered kernel calls. So the order can be chosen freely by the runtime and can differ between platforms/hardware. ... This is where the discontinued serial emulated device would have come in handy. A missing __syncthreads() or a warp-level reduction not marked volatile can produce similar behaviour. Tough to locate something that could be missing indeed :) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1834791 ·
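The ordering concern discussed above can be illustrated with a small CPU-side simulation. It is a sketch under stated assumptions, not the Special App's code: the scores, the group size and the tie between periods 3 and 9 are invented, and "workgroups" are simply Python lists processed in an arbitrary order. It shows how a combine step that depends on which group finishes last can report different winners, while a separate second pass (the equivalent of a second, ordered kernel call) always reports the same one.

```python
from itertools import permutations

# Hypothetical per-period scores; periods 3 and 9 tie on purpose.
SCORES = {1: 0.2, 3: 0.9, 5: 0.4, 7: 0.1, 9: 0.9, 11: 0.3}


def split_into_groups(periods, group_size):
    """Split the sorted periods into consecutive 'workgroups'."""
    ordered = sorted(periods)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def local_best(group):
    """Best (score, period) inside one group; ties go to the lowest period."""
    return max(((SCORES[p], p) for p in group), key=lambda sp: (sp[0], -sp[1]))


def order_dependent_combine(groups, order):
    """Each group overwrites a shared result; on a tie the last group to run wins."""
    best = None
    for g in order:
        candidate = local_best(groups[g])
        if best is None or candidate[0] >= best[0]:
            best = candidate
    return best


def two_pass_combine(groups, order):
    """Groups only store partial results; a separate ordered pass picks the winner."""
    partial = [None] * len(groups)
    for g in order:
        partial[g] = local_best(groups[g])
    return max(partial, key=lambda sp: (sp[0], -sp[1]))


groups = split_into_groups(SCORES, 2)
orders = list(permutations(range(len(groups))))
print({order_dependent_combine(groups, o) for o in orders})  # more than one winner
print({two_pass_combine(groups, o) for o in orders})         # a single winner
```

On a real GPU the same effect would depend on how the runtime schedules the blocks, which is why it could look different on different drivers or hardware.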

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1835615 - Posted: 12 Dec 2016, 18:33:18 UTC Now up to 10 downloads, Linux CUDA 6 Special App, and not a single report on how it's working. It would be nice to hear a little Feedback. I ran driver 364.19 for a while and it also gave a couple Invalid Unmatched Overflows on the 750Ti, so, you need to use at least driver 367.xx. Everything else appears to be working fine as long as those Quick Arecibo Overflows stay away. Seems none of the 'Special' Apps like those things. Still waiting on the word about the Unroll problem... ID: 1835615 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1835696 - Posted: 13 Dec 2016, 0:59:25 UTC - in response to Message 1835615. Still digging here. Will know more later on. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1835696 ·

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 20283 Credit: 7,508,002 RAC: 20	Message 1835809 - Posted: 14 Dec 2016, 0:51:41 UTC TBar, Just caught the end of this thread. Silly busy here (Damned Americans... And silly IT... And...)... Any value in trying some older CUDA GPUs?... Happy fast crunchin, Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 1835809 ·

Rockhount Send message Joined: 29 May 00 Posts: 34 Credit: 31,935,954 RAC: 29	Message 1835847 - Posted: 14 Dec 2016, 6:22:53 UTC Hi TBar, I just downloaded your special cuda apps 6.0 & 4.2. At the moment the system is working on old cpu units. Maybe today it will ask for cuda work. For a first try I've bound the cuda 4.2 app in my app_info. This machine is now running cuda on linux: https://setiathome.berkeley.edu/show_host_detail.php?hostid=1931980 If everything looks good I could try the cuda 6.0 app. Keep crunching. Regards from northern Germany Roman SETI@home classic workunits 207,059 SETI@home classic CPU time 1,251,095 hours ID: 1835847 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1835850 - Posted: 14 Dec 2016, 6:56:11 UTC - in response to Message 1835847. Any value in trying some older CUDA GPUs?... Hi TBar, I just downloaded your special cuda apps 6.0 & 4.2. At the moment the system is working on old cpu units. Maybe today it will ask for cuda work. I've bind for first use the cuda 4.2 app in my app_info. This machine is now with cuda linux: https://setiathome.berkeley.edu/show_host_detail.php?hostid=1931980 ... Well, the Special App will only work with GPUs that have a Compute Capability of 3.2 or higher. It will not work with older GPUs. For the Older GPUs, such as the Pre-Fermi GPUs, and very Low End GPUs the CUDA 4.2 App will probably be best. For the GPUs between the Pre-Fermi and GPUs Lower than CC 3.2 the OpenCL App will be best. There is nothing Special about the 4.2 or OpenCL App. If you have a Supported GPU, such as the 750Ti, the best choice is the Special CUDA 60 App. This is Not the same as the Plain CUDA 60 App currently on Main, it's totally different. The Plain CUDA 60 App currently on Main is about the same as the CUDA 4.2 App, it was coded back in 2007 and does Not work well with the BLC tasks. The New Special App works the best on Supported GPUs. The New Special App was developed to run ONE task at a time per GPU, it's remarkably different than the Older CUDA Apps. ID: 1835850 ·
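TBar's rule of thumb can be written down as a small lookup. This is a sketch, not part of any app: the function name is hypothetical, and it assumes that "Pre-Fermi" means Compute Capability below 2.0 (Fermi being CC 2.x). The "very Low End GPUs" case from the post is a judgment call and is not modelled.

```python
def recommend_app(compute_capability):
    """Map a (major, minor) Compute Capability to the app suggested in the post."""
    if compute_capability >= (3, 2):
        return "Special CUDA 60"
    if compute_capability >= (2, 0):
        # Between Pre-Fermi and CC 3.2: the OpenCL app.
        return "OpenCL"
    # Pre-Fermi GPUs: the CUDA 4.2 app.
    return "CUDA 4.2"


for name, cc in [("GTX 750 Ti", (5, 0)), ("GT 730 (CC 3.5 variant)", (3, 5))]:
    print(name, "->", recommend_app(cc))
```

Tuples are used instead of floats so that the comparison works on the major and minor numbers directly.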

Rockhount Send message Joined: 29 May 00 Posts: 34 Credit: 31,935,954 RAC: 29	Message 1835857 - Posted: 14 Dec 2016, 7:59:42 UTC Ok, I will switch it, but I need the libcudart.so.6 and libcufft.so.6. The old version libcudart.so.4 can be found in your 7z file but not the new ones. Regards from northern Germany Roman SETI@home classic workunits 207,059 SETI@home classic CPU time 1,251,095 hours ID: 1835857 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1835858 - Posted: 14 Dec 2016, 8:24:19 UTC - in response to Message 1835857. Linux CUDA 6 Special App This App will only work on GPUs that are Compute Capability 3.2 or higher. See the list of supported GPUs here, https://en.wikipedia.org/wiki/CUDA#GPUs_supported Tested in Ubuntu 14.04.5 with nVidia driver 367.44 You must download the CUDA Libraries from SETI Beta, see the Docs folder, Notes_x41p_zi+.txt for the links. Also, make sure to read the Notes about setting the -unroll number in app_info.xml. The cufft file is too large to post. You will also need to set the -unroll in the app_info.xml to match your GPU. For the 750Ti the best setting is -unroll 5 http://boinc2.ssl.berkeley.edu/beta/download/libcudart.so.6.0 http://boinc2.ssl.berkeley.edu/beta/download/libcufft.so.6.0 ID: 1835858 ·

Rockhount Send message Joined: 29 May 00 Posts: 34 Credit: 31,935,954 RAC: 29	Message 1835862 - Posted: 14 Dec 2016, 8:57:17 UTC Ok, will update when the machine has crunched down the cache of cuda 4.2 units. Some of them are already finished. https://setiathome.berkeley.edu/show_host_detail.php?hostid=1931980 Maintenance with putty is a really nice way to work when you're at the office and the machine is at home, especially when you're using a mobile device with limited bandwidth. Regards from northern Germany Roman SETI@home classic workunits 207,059 SETI@home classic CPU time 1,251,095 hours ID: 1835862 ·

Rockhount Send message Joined: 29 May 00 Posts: 34 Credit: 31,935,954 RAC: 29	Message 1835876 - Posted: 14 Dec 2016, 10:54:37 UTC First inconclusive found with cuda 4.2 app (pulse find issue). https://setiathome.berkeley.edu/workunit.php?wuid=2360508589 Regards from northern Germany Roman SETI@home classic workunits 207,059 SETI@home classic CPU time 1,251,095 hours ID: 1835876 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1835884 - Posted: 14 Dec 2016, 11:36:03 UTC - in response to Message 1835850. Last modified: 14 Dec 2016, 12:20:52 UTC Well, the Special App will only work with GPUs that have a Compute Capability of 3.2 or higher. It will not work with older GPUs. For the Older GPUs, such as the Pre-Fermi GPUs, and very Low End GPUs the CUDA 4.2 App will probably be best. For the GPUs between the Pre-Fermi and GPUs Lower than CC 3.2 the OpenCL App will be best. There is nothing Special about the 4.2 or OpenCL App. If you have a Supported GPU, such as the 750Ti, the best choice is the Special CUDA 60 App. This is Not the same as the Plain CUDA 60 App currently on Main, it's totally different. The Plain CUDA 60 App currently on Main is about the same as the CUDA 4.2 App, it was coded back in 2007 and does Not work well with the BLC tasks. The New Special App works the best on Supported GPUs. The New Special App was developed to run ONE task at a time per GPU, it's remarkably different than the Older CUDA Apps. . . It is a shame I am running Windows, I would love to see how well the Special Cuda 60 app performs on my GT730 (CC=3.5), even though I am planning on retiring it if MSI ever actually release the low profile version of the GTX1050 ti. It would be nice to improve its output until that happens. But I am not at the level of compiling and tweaking software at the programming level. . . Even though the GT 730 is a fairly newish card it actually performs much better under Cuda50 than with the OpenCL app (SoG r3557). Stephen . ID: 1835884 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1835892 - Posted: 14 Dec 2016, 13:11:07 UTC - in response to Message 1835876. First inconclusive found with cuda 4.2 app (pulse find issue). https://setiathome.berkeley.edu/workunit.php?wuid=2360508589 That's a known issue with your Wingperson, SETI@home v8 v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin, and it isn't going to get any better anytime soon as the newer Apple Intel App at Beta has the same problem(s). The CUDA 42 App is very good, presently more reliable than the Special App. Hopefully the last few bugs will be solved in the Special App soon. You will probably receive a few Inconclusives with the New App, but, nowhere close to the number the 'Stock' Apple iGPU App receives. You are running with the Default Unroll number of 2;
    Using unroll = 2 from command line args
you would receive better times if you set it to -unroll 5 in the app_info.xml;
    <version_num>801</version_num>
    <plan_class>cuda60</plan_class>
    <cmdline>-bs -unroll 5</cmdline>
Otherwise it appears to be working normally, Congratulations. ID: 1835892 ·
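The edit TBar describes can also be scripted. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the `<app_version>` wrapper and the starting value `-bs -unroll 2` are supplied for the example, and a real app_info.xml contains more elements than this fragment. It only shows the idea of replacing (or appending) the `-unroll` value in the `<cmdline>` of the matching plan class.

```python
import re
import xml.etree.ElementTree as ET


def set_unroll(cmdline, value):
    """Replace an existing '-unroll N' in the command line, or append one."""
    pattern = r"-unroll\s+\d+"
    replacement = f"-unroll {value}"
    if re.search(pattern, cmdline):
        return re.sub(pattern, replacement, cmdline)
    return f"{cmdline} {replacement}".strip()


def update_app_version(xml_text, plan_class, value):
    """Set -unroll on one <app_version> fragment if its plan class matches."""
    node = ET.fromstring(xml_text)
    if node.findtext("plan_class") == plan_class:
        cmd = node.find("cmdline")
        if cmd is None:
            cmd = ET.SubElement(node, "cmdline")
        cmd.text = set_unroll(cmd.text or "", value)
    return ET.tostring(node, encoding="unicode")


# Example fragment only; values other than those quoted in the post are assumed.
FRAGMENT = """<app_version>
  <version_num>801</version_num>
  <plan_class>cuda60</plan_class>
  <cmdline>-bs -unroll 2</cmdline>
</app_version>"""

print(update_app_version(FRAGMENT, "cuda60", 5))
```

Editing the file by hand is just as good; either way BOINC has to be restarted before a changed app_info.xml is picked up.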

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1835894 - Posted: 14 Dec 2016, 13:15:09 UTC - in response to Message 1835884. . . It is a shame I am running Windows, I would love to see how well the Special Cuda 60 app performs on my GT730 (CC=3.5), even though I am planning on retiring it... You could always build a Linux machine for it, the parts are cheap on eBay. ID: 1835894 ·

Rockhount Send message Joined: 29 May 00 Posts: 34 Credit: 31,935,954 RAC: 29	Message 1835895 - Posted: 14 Dec 2016, 13:35:02 UTC Just changed the -bs -unroll parameter. Memory consumption increased from 870MiB to ~1086MiB with the cuda 6.0 version. The old cuda 4.2 version was only at 270MiB with -bs -unroll 2. Power consumption and GPU utilisation are about the same, ~24W @ 100% (44°C), with the Gigabyte 750ti. The calculation time decreased from 1200-1300s to 800-900s (cuda 4.2 -> cuda 6). Nice work TBar. With this app I squeeze a lot more credits out of this old machine. Thumbs up! Regards from northern Germany Roman SETI@home classic workunits 207,059 SETI@home classic CPU time 1,251,095 hours ID: 1835895 ·
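The run times Rockhount reports translate into a speed-up range as follows. This is plain arithmetic on the numbers in the post; the function names are hypothetical, and the tasks-per-day figure assumes the GPU runs one task at a time around the clock.

```python
def speedup_bounds(old_range, new_range):
    """Worst-case and best-case speed-up from two (fastest, slowest) time ranges."""
    old_lo, old_hi = old_range
    new_lo, new_hi = new_range
    # Worst case: fastest old time vs slowest new time; best case is the opposite.
    return old_lo / new_hi, old_hi / new_lo


def tasks_per_day(seconds_per_task):
    """Tasks finished in 24 hours at a constant time per task."""
    return 86400 / seconds_per_task


low, high = speedup_bounds((1200, 1300), (800, 900))
print(f"speed-up between {low:.2f}x and {high:.2f}x")
print(f"tasks per day at 1250 s: {tasks_per_day(1250):.0f}, at 850 s: {tasks_per_day(850):.0f}")
```

So the cuda 6 app is somewhere between about one third and a little over one half faster on this 750ti, depending on which ends of the two ranges are compared.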

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1835910 - Posted: 14 Dec 2016, 16:58:21 UTC - in response to Message 1835894. . . It is a shame I am running Windows, I would love to see how well the Special Cuda 60 app performs on my GT730 (CC=3.5), even though I am planning on retiring it... You could always build a Linux machine for it, the parts are cheap on eBay. . . But then I would have to learn Linux. Stephen . ID: 1835910 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1835921 - Posted: 14 Dec 2016, 18:35:28 UTC Learning Linux isn't too bad, especially if you use one of the modern distros - I've just reconfigured one of my PCs into a Linux box using "Mint" and am now getting the beast to run as I want it. The actual install took less than half an hour, sits alongside the Windows installation which is still there as a fall-back if I need it (or any of the data therein). The one thing I am having to think about is that only one of the GPUs is recognised by BOINC/S@H, and I do recall having a similar problem under Windows... Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1835921 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1835926 - Posted: 14 Dec 2016, 19:07:50 UTC - in response to Message 1835921. The one thing I am having to think about is that only one of the GPUs is recognised by BOINC/S@H, and I do recall having a similar problem under Windows... If your version of Mint is based on Ubuntu 14.04 you might need to apply the GPU Manager fix. I think I've only had to apply it to the ATI machines, but, it might help. It took me over a Year to find this fix; https://bugs.launchpad.net/ubuntu/+source/ubuntu-drivers-common/+bug/1310489 The other link; http://askubuntu.com/questions/453902/problem-in-setting-up-amd-dual-graphics-trinity-radeon-hd-7660g-and-thames-ra/477006#477006
There are two possible workarounds:
a) Edit /etc/init/gpu-manager.conf commenting out lines until it looks like this:
    #start on (starting lightdm
    #          or starting kdm
    #          or starting xdm
    #          or starting lxdm)
    task
    exec gpu-manager --log /var/log/gpu-manager.log
b) Remove the ubuntu-drivers-common package:
    sudo apt-get purge ubuntu-drivers-common
After either a) or b), you should generate your xorg.conf again:
    sudo aticonfig --initial -f --adapter=all
Finally reboot, xorg.conf should no longer be overwritten.
I use option A. It's easy to edit the file from the gui by entering gksu nautilus into the Terminal, navigate to the gpu-manager.conf, and add the comments. ID: 1835926 ·
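Workaround a) amounts to commenting out the `start on (...)` stanza and leaving the rest of the file alone. A minimal sketch of that edit follows; the function name is hypothetical, it works on text only and does not touch /etc/init/gpu-manager.conf itself, and it assumes the stanza looks like the one quoted in the post.

```python
def comment_out_start_stanza(conf_text):
    """Prefix every line of the 'start on (...)' stanza with '#'."""
    result = []
    depth = 0          # open parentheses still to be closed
    in_stanza = False
    for line in conf_text.splitlines():
        if not in_stanza and line.lstrip().startswith("start on"):
            in_stanza = True
        if in_stanza:
            result.append("#" + line)
            depth += line.count("(") - line.count(")")
            if depth <= 0:
                # Stanza is complete once its parentheses are balanced.
                in_stanza = False
                depth = 0
        else:
            result.append(line)
    return "\n".join(result) + "\n"


SAMPLE = (
    "start on (starting lightdm\n"
    "          or starting kdm\n"
    "          or starting xdm\n"
    "          or starting lxdm)\n"
    "task\n"
    "exec gpu-manager --log /var/log/gpu-manager.log\n"
)

print(comment_out_start_stanza(SAMPLE), end="")
```

Running it on a file that is already commented changes nothing, because a line beginning with `#` no longer starts with `start on`. Keep a backup of the original file before writing anything back.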

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1835934 - Posted: 14 Dec 2016, 20:50:09 UTC Great, but it didn't work :-( Mint (Cinnamon, 18) is happy to tell me I have three GPUs, two GTX970 and one 1080. Prodding around I found, in one of the boinc-client directories, a file called "coproc_info.xml"; this file lists all three GPUs. I've set "use_all_gpus" to 1, that was the first thing I did. After trying (a) from the two options BOINC isn't detecting any of the GPUs - one step back :-( So, I've tried re-installing the drivers (Nvidia v.367.57). I've now lost BOINC manager from the GUI - oh joy, and I had everything suspended while I tried the above - now to find the command lines I need to get back in control. Ah, getting lots of "authorization failure -102" responses, hmm, I don't like that as it means the boinc user has lost his marbles somewhere along the road :-( Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1835934 ·
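For reference, the "use_all_gpus" setting mentioned above lives in BOINC's cc_config.xml, inside the `<options>` element. The sketch below only builds that minimal file as text; the function name is hypothetical, and where the file has to be placed (the BOINC data directory) depends on how BOINC was installed, so no path is assumed here.

```python
import xml.etree.ElementTree as ET


def build_cc_config(use_all_gpus):
    """Return a minimal cc_config.xml with only the use_all_gpus option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(True))
```

The setting only tells the client to use every GPU it detects. It cannot help while the client is detecting no GPUs at all, which is the state described in the post.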


SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.