Linux CUDA 'Special' App finally available, featuring Low CPU use

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1839522 - Posted: 2 Jan 2017, 1:42:24 UTC - in response to Message 1839500.  
Last modified: 2 Jan 2017, 1:43:26 UTC

I have a list of issues, but none of the others prevent wider testing; they're much rarer annoyances. I'll probably list those once I (or anyone else) resolve the pulsefinding issue, since that's the priority. Later tonight, I'm going to attempt to bring the Mac Pro up on Windows 10 (the USB stick is prepared). It currently has El Capitan, Sierra, and Ubuntu 16.04 LTS. Once same-machine, same-device cross-platform comparison is doable, things get a bit easier. Fingers crossed for at least a Sierra 1050 Ti driver sometime soon.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1839522 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 4803
Credit: 549,538,426
RAC: 1,269,078
United States
Message 1839535 - Posted: 2 Jan 2017, 3:19:54 UTC - in response to Message 1839522.  

Well, I just read the last few weeks at Mac Rumors and it doesn't look good for any Pascal drivers. The final word appears to be here, ...Actually, we did hear from the Nvidia CEO himself and it's not promising. The people in that thread would have it working if it were possible. So, you might be better off to put your 980 in the Mac and forget about the Mac running Pascal for now. The 980 works fine in a Mac, I've run across a few while looking over the Mac nVidia OpenCL App.
ID: 1839535 · Report as offensive
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1839540 - Posted: 2 Jan 2017, 3:55:17 UTC - in response to Message 1839535.  

Ugh, frustrating. Well, we'll see. If nothing happens between now and getting Windows working with the Radeon and 1050 Ti, I'd probably rather umbilical in the 780 again for regression testing purposes (relatively easy, but ugly). If the 980 can be flashed to enable the boot screen I'd prefer that, since it'd drop 1 GPU I'm not really using for crunching, and avoid snaking cables in through the back. Failing that, the 680 can apparently be flashed more readily, and is currently sitting idle. Unfortunately the alpha code would need some jiggering to work on the 680, but then again that's going to have to happen eventually anyway.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1839540 · Report as offensive
tullio
Volunteer tester
Joined: 9 Apr 04
Posts: 7666
Credit: 2,717,612
RAC: 2,580
Italy
Message 1840327 - Posted: 6 Jan 2017, 11:18:04 UTC

I have installed it on my SuSE Leap 42.2 and it works both as CPU and GPU.
Tullio
ID: 1840327 · Report as offensive
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 546,440,464
RAC: 844,443
Canada
Message 1840343 - Posted: 6 Jan 2017, 13:10:54 UTC

I just looked at Petri's times for his stack of 1080s ... WOW, 45 s for Arecibo tasks!!

It looks like he is running a 2.75-hour GPU cache; that's speedy.
ID: 1840343 · Report as offensive
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 4623
Credit: 144,020,989
RAC: 237,071
Australia
Message 1840431 - Posted: 6 Jan 2017, 20:59:08 UTC - in response to Message 1840343.  

I just looked at Petri's times for his stack of 1080s ... WOW, 45 s for Arecibo tasks!!

It looks like he is running a 2.75-hour GPU cache; that's speedy.


. . Absolutely, but a pain during the weekly outages.

Stephen

:)
ID: 1840431 · Report as offensive
rob smith
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 17719
Credit: 402,636,076
RAC: 155,880
United Kingdom
Message 1840433 - Posted: 6 Jan 2017, 21:05:02 UTC

...more like about an hour - remember he only has 100 tasks for each of his GPUs on that rig.
(My own "big" cruncher barely manages three hours on the 300 tasks available for the three GPUs, and that one is nowhere near as fast as Petri's monster.)
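
A rough back-of-the-envelope check of that estimate, as a small Python sketch. It assumes every task takes the 45 s quoted above and that each GPU runs one task at a time; neither assumption is taken from Petri's actual configuration, and the function name is made up for illustration.

```python
def cache_hours(tasks_per_gpu: int, seconds_per_task: float,
                concurrent_per_gpu: int = 1) -> float:
    """Hours of work held in one GPU's queue.

    Assumes all tasks take the same time and that `concurrent_per_gpu`
    tasks run side by side without slowing each other down.
    """
    return tasks_per_gpu * seconds_per_task / concurrent_per_gpu / 3600.0


# 100 tasks per GPU at 45 s each, one at a time
print(f"{cache_hours(100, 45):.2f} h")  # prints "1.25 h"
```

Slower task types in the mix would stretch that figure, so it is only a lower-end guess.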
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1840433 · Report as offensive
Sidewinder
Volunteer tester
Joined: 15 Nov 09
Posts: 100
Credit: 79,103,942
RAC: 4,833
United States
Message 1840581 - Posted: 7 Jan 2017, 10:35:36 UTC
Last modified: 7 Jan 2017, 10:47:23 UTC

My GTX 1050 Tis arrived today and I switched over to TBar's app. It appears to be running well on the 1050 Tis and Arch.

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8173821

Name            : nvidia
Version         : 375.26-1
Description     : NVIDIA drivers for linux
Architecture    : x86_64

Name            : cuda
Version         : 8.0.44-2
Description     : NVIDIA's GPU programming toolkit
Architecture    : x86_64


GPU stats:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 0000:02:00.0     Off |                  N/A |
| 39%   62C    P0    59W /  75W |   1189MiB /  4036MiB |     87%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 105...  Off  | 0000:06:00.0     Off |                  N/A |
| 35%   45C    P0    60W /  75W |   1189MiB /  4038MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      3651    C   ...thome_x41p_zi+_x86_64-pc-linux-gnu_cuda60  1187MiB |
|    1      3526    C   ...thome_x41p_zi+_x86_64-pc-linux-gnu_cuda60  1187MiB |
+-----------------------------------------------------------------------------+

ID: 1840581 · Report as offensive
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 546,440,464
RAC: 844,443
Canada
Message 1840586 - Posted: 7 Jan 2017, 11:16:12 UTC - in response to Message 1840581.  

Hey Sidewinder, how do you get those outputs from Ubuntu?
ID: 1840586 · Report as offensive
Sidewinder
Volunteer tester
Joined: 15 Nov 09
Posts: 100
Credit: 79,103,942
RAC: 4,833
United States
Message 1840591 - Posted: 7 Jan 2017, 11:45:32 UTC - in response to Message 1840586.  
Last modified: 7 Jan 2017, 11:46:02 UTC

Hey Sidewinder, how do you get those outputs from Ubuntu?


I'm on an Arch-based distro, so it may be different for Debian/Ubuntu. The first is just a pacman (Arch's package manager) query and the second is nvidia-smi, which comes with the NVIDIA driver package.
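
For anyone who wants the same numbers in a script rather than the boxed table, here is a minimal Python sketch. It relies on nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options; the helper names and the sample line are made up for illustration, with the sample values copied by hand from the table above.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in output order.
FIELDS = ["index", "name", "utilization.gpu", "power.draw", "memory.used"]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into a list of dicts."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if row:  # skip blank lines
            rows.append(dict(zip(FIELDS, (cell.strip() for cell in row))))
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs the driver installed)."""
    cmd = ["nvidia-smi",
           "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(out)


# Hand-written sample shaped like the CSV output, so the parser can be tried
# on a machine without an NVIDIA card.
sample = "0, GeForce GTX 1050 Ti, 87, 59.00, 1189\n"
print(parse_gpu_csv(sample))
```

On a machine with the driver installed, `query_gpus()` should return the live values, all as strings.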
ID: 1840591 · Report as offensive
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1654
Credit: 533,618,957
RAC: 469,284
Finland
Message 1840722 - Posted: 7 Jan 2017, 22:34:50 UTC - in response to Message 1840591.  

Hey Sidewinder, how do you get those outputs from Ubuntu?


I'm on an Arch-based distro, so it may be different for Debian/Ubuntu. The first is just a pacman (Arch's package manager) query and the second is nvidia-smi, which comes with the NVIDIA driver package.


Hi,

I run
nvidia-smi -l

in a separate console window all the time.
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 0000:05:00.0      On |                  N/A |
| 96%   66C    P2   173W / 215W |   4258MiB /  8112MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    On   | 0000:06:00.0     Off |                  N/A |
|100%   70C    P2   148W / 215W |   3896MiB /  8113MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    On   | 0000:09:00.0     Off |                  N/A |
| 96%   60C    P2   148W / 215W |   3896MiB /  8113MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    On   | 0000:0A:00.0     Off |                  N/A |
| 96%   59C    P2   140W / 215W |   3896MiB /  8113MiB |     94%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       898    G   /usr/bin/X                                     214MiB |
|    0      1537    G   compiz                                         146MiB |
|    0     11565    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  3893MiB |
|    1     11574    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  3893MiB |
|    2     12029    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  3893MiB |
|    3     11947    C   ...thome_x41zc_x86_64-pc-linux-gnu_cuda65_v8  3893MiB |
+-----------------------------------------------------------------------------+



To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1840722 · Report as offensive
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 546,440,464
RAC: 844,443
Canada
Message 1840761 - Posted: 8 Jan 2017, 2:34:20 UTC
Last modified: 8 Jan 2017, 2:35:57 UTC

Thanks guys,
I'm a little surprised to see my 980 using less power than a 1080

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0     Off |                  N/A |
| 26%   37C    P2    94W / 180W |   2007MiB /  4037MiB |     48%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 0000:03:00.0     Off |                  N/A |
| 51%   72C    P2    72W / 151W |   1810MiB /  8113MiB |     39%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1358    G   /usr/lib/xorg/Xorg                             181MiB |
|    0      2465    G   compiz                                          38MiB |
|    0      2901    G   /usr/lib/firefox/plugin-container                4MiB |
|    0      5762    C   ...thome_x41p_zi+_x86_64-pc-linux-gnu_cuda60  1779MiB |
|    1      5798    C   ...thome_x41p_zi+_x86_64-pc-linux-gnu_cuda60  1807MiB |
+-----------------------------------------------------------------------------+
ID: 1840761 · Report as offensive
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 11562
Credit: 170,513,160
RAC: 101,702
Australia
Message 1840768 - Posted: 8 Jan 2017, 3:13:20 UTC - in response to Message 1840761.  

Thanks guys,
I'm a little surprised to see my 980 using less power than a 1080

You need to compare your GPU utilisation with Petri's.

GPU-Util
You: 48%, 39%
petri33: 94%, 94%, 90%, 94%

And the APR:
643 vs 1734 GFLOPS.

The more work it does, the more power it needs.
Still, his cards use less than double your power, but pump out (almost) 3 times as much work.
Pascal really is impressive for its work per watt-hour.
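
The comparison can be worked through in a few lines of Python. This is only a sketch: the wattages are the instantaneous readings from the two nvidia-smi outputs posted earlier, averaged per card, and it assumes the two APR figures are comparable per-device numbers, which the thread does not confirm.

```python
from statistics import mean

# Instantaneous power readings copied from the nvidia-smi outputs above (watts).
brent_watts = [94, 72]               # GTX 980, GTX 1070
petri_watts = [173, 148, 148, 140]   # four GTX 1080s

# APR figures as quoted above (GFLOPS).
brent_apr = 643.0
petri_apr = 1734.0

power_ratio = mean(petri_watts) / mean(brent_watts)  # average watts per card
work_ratio = petri_apr / brent_apr

print(f"power ratio:         {power_ratio:.2f}")               # 1.83
print(f"work ratio:          {work_ratio:.2f}")                # 2.70
print(f"work-per-watt ratio: {work_ratio / power_ratio:.2f}")  # 1.47
```

So on these snapshot numbers the 1080s draw roughly 1.8 times the power per card for roughly 2.7 times the work, which matches the "less than double the power, almost 3 times the work" summary.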
Grant
Darwin NT
ID: 1840768 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 4803
Credit: 549,538,426
RAC: 1,269,078
United States
Message 1840774 - Posted: 8 Jan 2017, 3:39:54 UTC - in response to Message 1840761.  

Thanks guys,
I'm a little surprised to see my 980 using less power than a 1080

That's low for the GPU utilization. If it's constantly that low, you might try raising the unroll number. Petri is using -unroll 40 on his 1080s, which is twice the number of compute units. However, it appears to be working well at 40.

Most people have a higher GPU-Util number:
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 750 Ti  Off  | 0000:01:00.0      On |                  N/A |
| 70%   61C    P0    24W /  38W |   1476MiB /  1999MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 750 Ti  Off  | 0000:02:00.0     Off |                  N/A |
| 53%   63C    P0    28W /  38W |   1284MiB /  2000MiB |     93%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 750 Ti  Off  | 0000:03:00.0     Off |                  N/A |
| 46%   54C    P0    29W /  38W |   1284MiB /  2000MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1242    G   /usr/lib/xorg/Xorg                             137MiB |
|    0      1989    G   compiz                                          54MiB |
|    0     19463    C   ...home_x41p_zi3k_x86_64-pc-linux-gnu_cuda75  1280MiB |
|    1     19714    C   ...home_x41p_zi3k_x86_64-pc-linux-gnu_cuda75  1280MiB |
|    2     19237    C   ...home_x41p_zi3k_x86_64-pc-linux-gnu_cuda75  1280MiB |
+-----------------------------------------------------------------------------+

Look at that, under 30 watts on all three.
ID: 1840774 · Report as offensive
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 546,440,464
RAC: 844,443
Canada
Message 1840778 - Posted: 8 Jan 2017, 3:53:45 UTC - in response to Message 1840774.  
Last modified: 8 Jan 2017, 3:56:32 UTC

Yes, I noticed Petri was set at 40 with 20 CUs, and also some other command-line options that I'm unsure of (and that aren't mentioned anywhere that I have found).

Using pfb = 8 from command line args
Using pfp = 40 from command line args
Using unroll = 40 from command line args


Edit: I have 15 CUs (1070) and 16 CUs (980), so that adds another twist.
ID: 1840778 · Report as offensive
Sidewinder
Volunteer tester
Joined: 15 Nov 09
Posts: 100
Credit: 79,103,942
RAC: 4,833
United States
Message 1840782 - Posted: 8 Jan 2017, 4:06:03 UTC - in response to Message 1840778.  

Yes, I noticed Petri was set at 40 with 20 CUs, and also some other command-line options that I'm unsure of (and that aren't mentioned anywhere that I have found).

Using pfb = 8 from command line args
Using pfp = 40 from command line args
Using unroll = 40 from command line args


Edit: I have 15 CUs (1070) and 16 CUs (980), so that adds another twist.


For reference, the 1050 Tis have 6 CUs and my utilization numbers above are with unroll set to 6.

TBar, do you think setting it any higher will keep the utilization higher on the 1050 Tis? They typically stay between the mid-80s and low 90s.
ID: 1840782 · Report as offensive
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 546,440,464
RAC: 844,443
Canada
Message 1840783 - Posted: 8 Jan 2017, 4:22:44 UTC

Well, this is certainly interesting. I was running 3 CPU tasks with 2 GPU tasks (total) at ~85% CPU usage, and figured I would run the tasks out before restarting.

When one CPU task finished, GPU temp/usage went UP
When two CPU tasks finished, GPU temp/usage went UP
When three CPU tasks finished, GPU temp/usage went UP (slightly)

I guess it's because all tasks are at VeryLowPriority ... Anyone know how to raise that in Linux for GPU? I did try -hp before and it didn't change anything.
ID: 1840783 · Report as offensive
Sidewinder
Volunteer tester
Joined: 15 Nov 09
Posts: 100
Credit: 79,103,942
RAC: 4,833
United States
Message 1840785 - Posted: 8 Jan 2017, 4:41:53 UTC - in response to Message 1840783.  
Last modified: 8 Jan 2017, 4:42:23 UTC

Well, this is certainly interesting. I was running 3 CPU tasks with 2 GPU tasks (total) at ~85% CPU usage, and figured I would run the tasks out before restarting.

When one CPU task finished, GPU temp/usage went UP
When two CPU tasks finished, GPU temp/usage went UP
When three CPU tasks finished, GPU temp/usage went UP (slightly)

I guess it's because all tasks are at VeryLowPriority ... Anyone know how to raise that in Linux for GPU? I did try -hp before and it didn't change anything.


It looks like you control that with "<process_priority_special>N</process_priority_special>" in the cc_config.xml file. See: https://boinc.berkeley.edu/wiki/Client_configuration. Looks like it requires BOINC v7.6.14 or higher.
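
As a rough illustration of what that file could look like, here is a Python sketch that prints a minimal cc_config.xml. The `<cc_config>`/`<options>` wrapper is the usual layout for that file; the helper name is made up, and the value 2 is only a placeholder for N, so check the wiki page linked above for what each value means. The sketch writes nothing to disk, and merging the option into an existing cc_config.xml is left to the reader.

```python
import xml.etree.ElementTree as ET


def build_cc_config(options: dict) -> str:
    """Return cc_config.xml text with the given entries under <options>."""
    root = ET.Element("cc_config")
    opts = ET.SubElement(root, "options")
    for name, value in options.items():
        ET.SubElement(opts, name).text = str(value)
    ET.indent(root)  # pretty-print; needs Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Placeholder value: see the BOINC client configuration page for valid N.
print(build_cc_config({"process_priority_special": 2}))
```

BOINC only picks up a changed cc_config.xml after the config files are re-read or the client is restarted.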
ID: 1840785 · Report as offensive
TBar
Volunteer tester
Joined: 22 May 99
Posts: 4803
Credit: 549,538,426
RAC: 1,269,078
United States
Message 1840786 - Posted: 8 Jan 2017, 4:47:02 UTC - in response to Message 1840782.  

TBar, do you think setting it any higher will keep the utilization higher on the 1050 Tis? They typically stay between the mid-80s and low 90s.

You could try it and see. In my experience, raising the unroll on the lower-end cards will slow down the VLAR tasks. Of course we don't have any BLC tasks at present, but the unroll is mainly for the BLC tasks, and a setting for Arecibo tasks may not be best for VLARs.

It might be better to add the <no_priority_change>1</no_priority_change> line to your cc_config.xml file. That line will set All BOINC tasks to nice 0, but usually will increase GPU usage. I have it on my machines, and the machine with three 750 Tis is also running 2 CPU tasks on an old quad-core CPU, yet the GPU usage is around the low to mid 90s.

The pfb & pfp settings are the same as in the Windows CUDA Apps, and just as with the Windows CUDA Apps they produce little to No advantage. They Can cause increased Inconclusive results on some cards though. On my Mac, using them Slows down the tasks unless the settings are maxed out, and then you get Many Inconclusive results. The last thing this App needs is More Inconclusive results, so I don't recommend using those settings.
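
One way to check whether the <no_priority_change> line described above took effect is to look at the nice value of a running task. A minimal Python sketch for Linux follows; the helper name is made up, and the PID would come from somewhere like the nvidia-smi process table.

```python
import os


def nice_value(pid: int = 0) -> int:
    """Return the nice value of a process (pid 0 means the calling process)."""
    return os.getpriority(os.PRIO_PROCESS, pid)


# Nice value of this Python process; pass a task's PID to inspect that instead.
print(nice_value())
```

With the setting active, a task would be expected to report 0 here, going by the description above.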
ID: 1840786 · Report as offensive
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2766
Credit: 546,440,464
RAC: 844,443
Canada
Message 1840787 - Posted: 8 Jan 2017, 5:22:12 UTC

It's certainly warming up in here ...
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 980     Off  | 0000:01:00.0     Off |                  N/A |
| 26%   40C    P2   133W / 180W |   3300MiB /  4037MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1070    Off  | 0000:03:00.0     Off |                  N/A |
| 60%   80C    P2   123W / 151W |   3106MiB /  8113MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+


Down about 25% in times to around 3:30 ... more playing required :)
ID: 1840787 · Report as offensive