Linux CUDA 'Special' App finally available, featuring Low CPU use

Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1834787 - Posted: 8 Dec 2016, 11:16:38 UTC - in response to Message 1834778.  

TBar: Have you compared the speed of your compile to Petri's different builds?.....

I was one of the First testers, been at it for over a Year now. I've tested hundreds of builds during that Year, right up to p_zi3i. I haven't been sent any version newer than zi3i.
Your other theory doesn't take into account the use of Offline Benchmarking. The Benchmark App will identify the source of the problem. I just ran another series of tests which show the Pulsefind Error that was addressed in zi3f is still present; it's just a little better in the zi+ build than in the zi3i build.

That's sad news, but thank you for monitoring result quality so rigorously.

If the issue is really what I think it is, it's quite a hard bug to track down, and it can manifest differently not only on different platforms but on different hardware too.
Apparently (I still haven't had time to review the code, so I'm guessing here) the Pulse-find search is parallelized by splitting the periods across different workitems.
And _IF_ the splitting is done not only across a few workitems but across a few workgroups as well, that could be the bug in its current manifestation.
AFAIK, by design in both OpenCL and CUDA, separate workgroups are completely independent entities: there is no ordering of their execution and no synchronization between them other than splitting the work into separate, ordered kernel calls.
So the execution order can be chosen freely by the runtime and can differ between platforms and hardware.

This is worth checking. I will probably only have time for a code review after New Year, hardly earlier.
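To make the hazard concrete, here is a minimal sketch (hypothetical code, not the actual x41p pulse-find kernels; the names find_best_per_block, combine_blocks and the data layout are invented for illustration). Each block reduces its own slice of candidates in shared memory, and the cross-block combination is deferred to a second, ordered kernel launch, because blocks/workgroups within one launch cannot synchronize with each other:

// Hypothetical sketch only -- not the x41p sources. Kernel names, the
// "power" array and the reduction layout are invented for illustration.
#include <cstdio>
#include <cuda_runtime.h>

// Each block scans its own slice of candidates and records the best one.
// Blocks are independent: nothing here may assume any other block has
// already written its own entry.
__global__ void find_best_per_block(const float* power, int n,
                                    float* block_best, int* block_arg)
{
    extern __shared__ float s_val[];
    int* s_idx = (int*)&s_val[blockDim.x];

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    s_val[tid] = (i < n) ? power[i] : -1.0f;
    s_idx[tid] = (i < n) ? i : -1;
    __syncthreads();                    // make all shared-memory writes visible

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride && s_val[tid + stride] > s_val[tid]) {
            s_val[tid] = s_val[tid + stride];
            s_idx[tid] = s_idx[tid + stride];
        }
        __syncthreads();                // forgetting this is the classic bug
    }
    if (tid == 0) {
        block_best[blockIdx.x] = s_val[0];
        block_arg[blockIdx.x]  = s_idx[0];
    }
}

// The cross-block combination happens in a SECOND, ordered kernel launch.
// Doing it inside the first kernel (e.g. "block 0 reads the other blocks'
// results") would race, and the outcome would depend on whatever scheduling
// order the runtime happens to pick on a given platform/GPU.
__global__ void combine_blocks(const float* block_best, const int* block_arg,
                               int num_blocks, float* best, int* arg)
{
    float v = -1.0f; int a = -1;
    for (int b = 0; b < num_blocks; ++b)
        if (block_best[b] > v) { v = block_best[b]; a = block_arg[b]; }
    *best = v; *arg = a;
}

int main()
{
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_power, *d_block_best, *d_best;
    int   *d_block_arg, *d_arg;
    cudaMalloc(&d_power,      n * sizeof(float));
    cudaMalloc(&d_block_best, blocks * sizeof(float));
    cudaMalloc(&d_block_arg,  blocks * sizeof(int));
    cudaMalloc(&d_best, sizeof(float));
    cudaMalloc(&d_arg,  sizeof(int));
    cudaMemset(d_power, 0, n * sizeof(float));   // dummy data for the sketch

    size_t shmem = threads * (sizeof(float) + sizeof(int));
    find_best_per_block<<<blocks, threads, shmem>>>(d_power, n,
                                                    d_block_best, d_block_arg);
    combine_blocks<<<1, 1>>>(d_block_best, d_block_arg, blocks, d_best, d_arg);
    cudaDeviceSynchronize();

    float best; int arg;
    cudaMemcpy(&best, d_best, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&arg,  d_arg,  sizeof(int),   cudaMemcpyDeviceToHost);
    printf("best = %f at index %d\n", best, arg);
    return 0;
}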
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1834787 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1834788 - Posted: 8 Dec 2016, 11:20:31 UTC - in response to Message 1834781.  
Last modified: 8 Dec 2016, 11:24:12 UTC


Have you noticed it's always just One Bad Pulse? Never 2 or more, always one, no matter how many Pulses are found.

I think that's just luck. A Pulse is a rare event, and a bad Pulse is a rare event among rare events. The probability of getting 2 bad ones in a single task is simply too low to catch easily.

EDIT: but to check this it would be worth putting all the BAD Pulses into a table and then seeing what they have in common. The same FFT length, for example, or some particular chirp sign, or some particular period, and so forth.
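Something along these lines would do for the tally (a hypothetical sketch; the bad_pulses.csv file and its fft_len,chirp,period,power columns are assumptions, not an existing tool in the package):

// Hypothetical tally helper: count bad Pulses grouped by FFT length and
// chirp sign to see whether the bad results cluster on something common.
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main()
{
    std::ifstream in("bad_pulses.csv");   // assumed format: fft_len,chirp,period,power
    std::map<int, int> by_fft;            // bad pulses per FFT length
    std::map<int, int> by_chirp_sign;     // -1 / 0 / +1
    std::map<double, int> by_period;

    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        int fft_len; double chirp, period, power; char c;
        if (!(ss >> fft_len >> c >> chirp >> c >> period >> c >> power))
            continue;                      // skip header or malformed rows
        ++by_fft[fft_len];
        ++by_chirp_sign[(chirp > 0) - (chirp < 0)];
        ++by_period[period];
    }
    std::cout << "bad pulses by FFT length:\n";
    for (auto& kv : by_fft)
        std::cout << "  " << kv.first << " : " << kv.second << "\n";
    std::cout << "bad pulses by chirp sign:\n";
    for (auto& kv : by_chirp_sign)
        std::cout << "  " << kv.first << " : " << kv.second << "\n";
    return 0;
}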
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1834788 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1834791 - Posted: 8 Dec 2016, 11:51:55 UTC - in response to Message 1834787.  

...If the issue is really what I think it is, it's quite a hard bug to track down, and it can manifest differently not only on different platforms but on different hardware too.
Apparently (I still haven't had time to review the code, so I'm guessing here) the Pulse-find search is parallelized by splitting the periods across different workitems.
And _IF_ the splitting is done not only across a few workitems but across a few workgroups as well, that could be the bug in its current manifestation.
AFAIK, by design in both OpenCL and CUDA, separate workgroups are completely independent entities: there is no ordering of their execution and no synchronization between them other than splitting the work into separate, ordered kernel calls.
So the execution order can be chosen freely by the runtime and can differ between platforms and hardware.
...


This is where the discontinued serial emulated device would have come in handy. A missing __syncthreads() or a warp-level reduction not marked volatile can have similar behaviour. Tough to locate something that could be missing, indeed :)
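For illustration, a hypothetical sketch of that kind of construct (not taken from the actual sources); the warp-synchronous tail is exactly the place where a missing __syncthreads() or a non-volatile shared pointer produces occasional, hardware-dependent wrong results:

// Hypothetical sketch, not x41p code.  Classic shared-memory block sum with
// a warp-synchronous tail, in the style that was common in 2016.
// (On newer architectures the safe replacements are __syncwarp() or
// warp shuffle intrinsics.)
__device__ void warp_reduce(volatile float* s, int tid)
{
    // 'volatile' forces each partial sum back to shared memory so the other
    // lanes of the warp read the updated value, not a stale register copy.
    s[tid] += s[tid + 32];
    s[tid] += s[tid + 16];
    s[tid] += s[tid + 8];
    s[tid] += s[tid + 4];
    s[tid] += s[tid + 2];
    s[tid] += s[tid + 1];
}

__global__ void block_sum(const float* in, float* out, int n)
{
    __shared__ float s[256];             // assumes blockDim.x == 256
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 32; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();                 // dropping one of these gives rare,
                                         // scheduling-dependent bad results
    }
    if (tid < 32) warp_reduce(s, tid);   // no __syncthreads() below this point
    if (tid == 0) out[blockIdx.x] = s[0];
}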
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1834791 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1835615 - Posted: 12 Dec 2016, 18:33:18 UTC

Now up to 10 downloads, Linux CUDA 6 Special App, and not a single report on how it's working. It would be nice to hear a little Feedback.
I ran driver 364.19 for a while and it also gave a couple of Invalid Unmatched Overflows on the 750Ti, so you need to use at least driver 367.xx.
Everything else appears to be working fine as long as those Quick Arecibo Overflows stay away. Seems none of the 'Special' Apps like those things.
Still waiting on the word about the Unroll problem...
ID: 1835615 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1835696 - Posted: 13 Dec 2016, 0:59:25 UTC - in response to Message 1835615.  

Still digging here. Will know more later on.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1835696 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Send message
Joined: 25 Nov 01
Posts: 20147
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1835809 - Posted: 14 Dec 2016, 0:51:41 UTC

TBar,

Just caught the end of this thread.

Silly busy here (Damned Americans... And silly IT... And...)...


Any value in trying some older CUDA GPUs?...


Happy fast crunchin,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1835809 · Report as offensive
Rockhount
Avatar

Send message
Joined: 29 May 00
Posts: 34
Credit: 31,935,954
RAC: 29
Germany
Message 1835847 - Posted: 14 Dec 2016, 6:22:53 UTC

Hi TBar,
I just downloaded your special cuda apps 6.0 & 4.2. At the moment the system is working on old cpu units.
Maybe today it will ask for cuda work. I've bound the cuda 4.2 app for first use in my app_info.

This machine is now with cuda linux:
https://setiathome.berkeley.edu/show_host_detail.php?hostid=1931980

If everything looks good I could try the cuda 6.0 app.

Keep crunching.
Regards from northern Germany
Roman

SETI@home classic workunits 207,059
SETI@home classic CPU time 1,251,095 hours

ID: 1835847 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1835850 - Posted: 14 Dec 2016, 6:56:11 UTC - in response to Message 1835847.  

Any value in trying some older CUDA GPUs?...

Hi TBar,
I just downloaded your special cuda apps 6.0 & 4.2. At the moment the system is working on old cpu units.
Maybe today it will ask for cuda work. I've bound the cuda 4.2 app for first use in my app_info.

This machine is now with cuda linux:
https://setiathome.berkeley.edu/show_host_detail.php?hostid=1931980
...

Well, the Special App will only work with GPUs that have a Compute Capability of 3.2 or higher. It will not work with older GPUs.
For the older GPUs, such as the Pre-Fermi GPUs and very low-end GPUs, the CUDA 4.2 App will probably be best. For the GPUs between Pre-Fermi and CC 3.2, the OpenCL App will be best. There is nothing Special about the 4.2 or OpenCL App.
If you have a supported GPU, such as the 750Ti, the best choice is the Special CUDA 60 App. This is NOT the same as the plain CUDA 60 App currently on Main; it's totally different. The plain CUDA 60 App currently on Main is about the same as the CUDA 4.2 App: it was coded back in 2007 and does not work well with the BLC tasks. The new Special App works best on supported GPUs. It was developed to run ONE task at a time per GPU and is remarkably different from the older CUDA Apps.
ID: 1835850 · Report as offensive
Rockhount
Avatar

Send message
Joined: 29 May 00
Posts: 34
Credit: 31,935,954
RAC: 29
Germany
Message 1835857 - Posted: 14 Dec 2016, 7:59:42 UTC

Ok, I will switch it, but I need the libcudart.so.6 and libcufft.so.6.
The old libcudart.so.4 version could be found in your 7z file, but not the new ones.
Regards from northern Germany
Roman

SETI@home classic workunits 207,059
SETI@home classic CPU time 1,251,095 hours

ID: 1835857 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1835858 - Posted: 14 Dec 2016, 8:24:19 UTC - in response to Message 1835857.  

Linux CUDA 6 Special App

This App will only work on GPUs that are Compute Capability 3.2 or higher.
See the list of supported GPUs here, https://en.wikipedia.org/wiki/CUDA#GPUs_supported
Tested in Ubuntu 14.04.5 with nVidia driver 367.44
You must download the CUDA Libraries from SETI Beta; see Notes_x41p_zi+.txt in the Docs folder for the links.
Also, make sure to read the Notes about setting the -unroll number in app_info.xml.

The cufft file is too large to post.
You will also need to set the -unroll value in app_info.xml to match your GPU. For the 750Ti the best setting is -unroll 5.

http://boinc2.ssl.berkeley.edu/beta/download/libcudart.so.6.0
http://boinc2.ssl.berkeley.edu/beta/download/libcufft.so.6.0
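For reference, a minimal app_info.xml app_version along these lines should cover it (a sketch only; the executable file name below is an assumption, so use the actual names from the download and Notes_x41p_zi+.txt, and adjust -unroll for your GPU):

<app_info>
  <app>
      <name>setiathome_v8</name>
  </app>
  <file_info>
      <name>setiathome_x41p_zi+_x86_64-pc-linux-gnu_cuda60</name>  <!-- assumed name; use the real binary name -->
      <executable/>
  </file_info>
  <file_info>
      <name>libcudart.so.6.0</name>
  </file_info>
  <file_info>
      <name>libcufft.so.6.0</name>
  </file_info>
  <app_version>
      <app_name>setiathome_v8</app_name>
      <version_num>801</version_num>
      <plan_class>cuda60</plan_class>
      <cmdline>-bs -unroll 5</cmdline>  <!-- -unroll 5 suits the 750Ti; tune per GPU -->
      <coproc>
          <type>NVIDIA</type>
          <count>1</count>
      </coproc>
      <file_ref>
          <file_name>setiathome_x41p_zi+_x86_64-pc-linux-gnu_cuda60</file_name>
          <main_program/>
      </file_ref>
      <file_ref>
          <file_name>libcudart.so.6.0</file_name>
      </file_ref>
      <file_ref>
          <file_name>libcufft.so.6.0</file_name>
      </file_ref>
  </app_version>
</app_info>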
ID: 1835858 · Report as offensive
Rockhount
Avatar

Send message
Joined: 29 May 00
Posts: 34
Credit: 31,935,954
RAC: 29
Germany
Message 1835862 - Posted: 14 Dec 2016, 8:57:17 UTC

Ok, will update when the machine has crunched down the cache of cuda 4.2 units.
Some of them are already finished.
https://setiathome.berkeley.edu/show_host_detail.php?hostid=1931980

Maintenance with PuTTY is really nice when you're at work and the machine is at home, especially when you're using a mobile device with limited bandwidth.
Regards from northern Germany
Roman

SETI@home classic workunits 207,059
SETI@home classic CPU time 1,251,095 hours

ID: 1835862 · Report as offensive
Rockhount
Avatar

Send message
Joined: 29 May 00
Posts: 34
Credit: 31,935,954
RAC: 29
Germany
Message 1835876 - Posted: 14 Dec 2016, 10:54:37 UTC

First inconclusive found with cuda 4.2 app (pulse find issue).

https://setiathome.berkeley.edu/workunit.php?wuid=2360508589
Regards from northern Germany
Roman

SETI@home classic workunits 207,059
SETI@home classic CPU time 1,251,095 hours

ID: 1835876 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1835884 - Posted: 14 Dec 2016, 11:36:03 UTC - in response to Message 1835850.  
Last modified: 14 Dec 2016, 12:20:52 UTC


Well, the Special App will only work with GPUs that have a Compute Capability of 3.2 or higher. It will not work with older GPUs.
For the older GPUs, such as the Pre-Fermi GPUs and very low-end GPUs, the CUDA 4.2 App will probably be best. For the GPUs between Pre-Fermi and CC 3.2, the OpenCL App will be best. There is nothing Special about the 4.2 or OpenCL App.
If you have a supported GPU, such as the 750Ti, the best choice is the Special CUDA 60 App. This is NOT the same as the plain CUDA 60 App currently on Main; it's totally different. The plain CUDA 60 App currently on Main is about the same as the CUDA 4.2 App: it was coded back in 2007 and does not work well with the BLC tasks. The new Special App works best on supported GPUs. It was developed to run ONE task at a time per GPU and is remarkably different from the older CUDA Apps.


. . It is a shame I am running Windows; I would love to see how well the Special Cuda 60 app performs on my GT730 (CC=3.5), even though I am planning on retiring it if MSI ever actually release the low-profile version of the GTX 1050 Ti. It would be nice to improve its output until that happens. But I am not at the level of compiling and tweaking software at the programming level.

. . Even though the GT 730 is a fairly newish card, it actually performs much better under Cuda50 than with the OpenCL app (SoG r3557).

Stephen

.
ID: 1835884 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1835892 - Posted: 14 Dec 2016, 13:11:07 UTC - in response to Message 1835876.  

First inconclusive found with cuda 4.2 app (pulse find issue).
https://setiathome.berkeley.edu/workunit.php?wuid=2360508589

That's a known issue with your Wingperson, SETI@home v8 v8.00 (opencl_intel_gpu_sah) x86_64-apple-darwin, and it isn't going to get any better anytime soon, as the newer Apple Intel App at Beta has the same problem(s). The CUDA 42 App is very good, presently more reliable than the Special App. Hopefully the last few bugs in the Special App will be solved soon. You will probably receive a few Inconclusives with the new App, but nowhere close to the number the 'Stock' Apple iGPU App receives.

You are running with the default unroll number of 2 (the log shows "Using unroll = 2 from command line args"); you would get better times if you set it to -unroll 5 in the app_info.xml:
<version_num>801</version_num>
<plan_class>cuda60</plan_class>
<cmdline>-bs -unroll 5</cmdline>

Otherwise it appears to be working normally, Congratulations.
ID: 1835892 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1835894 - Posted: 14 Dec 2016, 13:15:09 UTC - in response to Message 1835884.  

. . It is a shame I am running Windows, I would love to see how well the Special Cuda 60 app performs on my GT730 (CC=3.5), even though I am planning on retiring it...

You could always build a Linux machine for it, the parts are cheap on eBay.
ID: 1835894 · Report as offensive
Rockhount
Avatar

Send message
Joined: 29 May 00
Posts: 34
Credit: 31,935,954
RAC: 29
Germany
Message 1835895 - Posted: 14 Dec 2016, 13:35:02 UTC

Just changed the -bs -unroll parameter. The memory consumption increased from 870 MiB to ~1086 MiB with the cuda 6.0 version.
The old cuda 4.2 version used only 270 MiB with -bs -unroll 2.

Power consumption and GPU utilisation are about the same, ~24 W @ 100% (44 °C) with the Gigabyte 750Ti.
The calculation time decreased from 1200-1300 s to 800-900 s (cuda 4.2 -> cuda 6.0).

Nice work TBar. With this app I squeeze a lot more credits out of this old machine.

Thumbs up!
Regards from northern Germany
Roman

SETI@home classic workunits 207,059
SETI@home classic CPU time 1,251,095 hours

ID: 1835895 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Avatar

Send message
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1835910 - Posted: 14 Dec 2016, 16:58:21 UTC - in response to Message 1835894.  

. . It is a shame I am running Windows; I would love to see how well the Special Cuda 60 app performs on my GT730 (CC=3.5), even though I am planning on retiring it...

You could always build a Linux machine for it, the parts are cheap on eBay.


. . But then I would have to learn Linux.

Stephen

.
ID: 1835910 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1835921 - Posted: 14 Dec 2016, 18:35:28 UTC

Learning Linux isn't too bad, especially if you use one of the modern distros - I've just reconfigured one of my PCs into a Linux box using "Mint" and am now getting the beast to run as I want it. The actual install took less than half an hour, and it sits alongside the Windows installation, which is still there as a fall-back if I need it (or any of the data therein).
The one thing I am having to think about is that only one of the GPUs is recognised by BOINC/S@H, and I do recall having a similar problem under Windows...
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1835921 · Report as offensive
TBar
Volunteer tester

Send message
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1835926 - Posted: 14 Dec 2016, 19:07:50 UTC - in response to Message 1835921.  

The one thing I am having to think about is that only one of the GPUs is recognised by BOINC/S@H, and I do recall having a similar problem under Windows...

If your version of Mint is based on Ubuntu 14.04 you might need to apply the GPU Manager fix. I think I've only had to apply it to the ATI machines, but it might help.
It took me over a Year to find this fix: https://bugs.launchpad.net/ubuntu/+source/ubuntu-drivers-common/+bug/1310489
The other link: http://askubuntu.com/questions/453902/problem-in-setting-up-amd-dual-graphics-trinity-radeon-hd-7660g-and-thames-ra/477006#477006
There are two possible workarounds:
a) Edit /etc/init/gpu-manager.conf commenting out lines until it looks like this:
#start on (starting lightdm
#          or starting kdm
#          or starting xdm
#          or starting lxdm)
task
exec gpu-manager --log /var/log/gpu-manager.log

b) Remove the ubuntu-drivers-common package:
sudo apt-get purge ubuntu-drivers-common

After either a) or b), you should generate your xorg.conf again:
sudo aticonfig --initial -f --adapter=all

Finally reboot, xorg.conf should no longer be overwritten.

I use option A.
It's easy to edit the file from the GUI: enter gksu nautilus in the Terminal, navigate to gpu-manager.conf, and add the comment marks.
ID: 1835926 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester

Send message
Joined: 7 Mar 03
Posts: 22160
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1835934 - Posted: 14 Dec 2016, 20:50:09 UTC

Great, but it didn't work :-(
Mint (Cinnamon, 18) is happy to tell me I have three GPUs: two GTX 970s and one 1080.
Prodding around, I found a file called "coproc_info.xml" in one of the boinc-client directories; it lists all three GPUs. I've set "use_all_gpus" to 1 - that was the first thing I did. After trying (a) from the two options, BOINC isn't detecting any of the GPUs - one step back :-(
So I've tried re-installing the drivers (Nvidia v367.57), and I've now lost BOINC Manager from the GUI - oh joy - and I had everything suspended while I tried the above. Now to find the command lines I need to get back in control.
Ah, I'm getting lots of "authorization failure -102" responses. Hmm, I don't like that, as it means the boinc user has lost his marbles somewhere along the road :-(
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1835934 · Report as offensive