Setting up a Linux machine to crunch CUDA80 for Windows users

Stephen "Heretic" · Project Donor
Volunteer tester
Joined: 20 Sep 12
Posts: 2631
Credit: 48,127,191
RAC: 131,354
Australia
Message 1861507 - Posted: 14 Apr 2017, 23:59:46 UTC - in response to Message 1861501.  
Last modified: 15 Apr 2017, 0:00:49 UTC


Only testing can tell. Set the -bs flag and watch what happens to run times.

. . OK, I have looked at app_info.xml and found the line ...

<cmdline> -bs -unroll 10 </cmdline>

. . So I take it that it is already set. Do you want me to take it out and see what difference it makes? Or just leave well enough alone :)
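For anyone wanting to repeat this test, the flag can be removed by editing the cmdline line by hand, or with a one-liner along the following lines. This is only a sketch: `strip_bs` is a made-up helper name, and it assumes the flag appears as a separate `-bs` token preceded by a space, as in the line quoted above. It reads from stdin and writes to stdout, so it never touches the real file.

```shell
#!/bin/sh
# strip_bs: hypothetical helper that removes a standalone -bs token from any
# <cmdline> line read on stdin and writes the result to stdout.
strip_bs() {
    sed -E '/<cmdline>/ s/[[:space:]]+-bs([[:space:]]|<)/\1/'
}

# Example on the line quoted above:
printf '%s\n' '<cmdline> -bs -unroll 10 </cmdline>' | strip_bs
```

Typical use would be something like `strip_bs < app_info.xml > app_info.xml.new`, then comparing the two files before swapping the new one in.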

I'm sure a lot of people would like to know if and how it affects the performance.

. . OK I will remove the -bs flag and check the outcomes. Which rig would you like me to try, the one with one 1050ti or the one with the two 1060s?


. . Being impatient, I decided to try the rig with the 2 x 1060s (because it was easier). Strangely, there is very little noticeable effect. If anything, run times may be a few seconds quicker without it. What exactly does -bs do? With it off, CPU use is slightly lower, run times are maybe a few seconds faster, and GPU temps are up a degree or three. It seems to be using the GPUs more and the CPU less with it off.

Stephen

??

Stephen "Heretic"
Message 1861543 - Posted: 15 Apr 2017, 2:12:02 UTC

I'm sure a lot of people would like to know if and how it affects the performance.

. . OK I will remove the -bs flag and check the outcomes. Which rig would you like me to try, the one with one 1050ti or the one with the two 1060s?
. . Being impatient, I decided to try the rig with the 2 x 1060s (because it was easier). Strangely, there is very little noticeable effect. If anything, run times may be a few seconds quicker without it. What exactly does -bs do? With it off, CPU use is slightly lower, run times are maybe a few seconds faster, and GPU temps are up a degree or three. It seems to be using the GPUs more and the CPU less with it off.


. . I have now repeated the change on the Core2 Duo with the GTX1050ti. The results are more definite on this rig. There is a clear reduction in run times. I will have to check again over the next few days with different batches of tasks to confirm it is not random, but it seems clear that these rigs like it better with -bs off, and that's no bs.

Stephen

:)

petri33 · Project Donor
Volunteer tester
Joined: 6 Jun 02
Posts: 1465
Credit: 269,152,784
RAC: 295,798
Finland
Message 1861594 - Posted: 15 Apr 2017, 6:03:58 UTC

Thanks Stephen,

When set, -bs tells the CPU part of the code (through the CUDA driver) to wait for the GPU to finish its tasks in blocking sync mode. That should reduce CPU usage and increase run time a bit, and the GPU will run less hot. Blocking sync allows the process to go idle while waiting.
Without -bs, the CPU spin-loops, actively waiting for the GPU to finish its tasks. It should be quicker, use more CPU, and push the GPU harder because it is getting more work done.

So your measurements confirm that without -bs it is faster and produces more heat on the GPU, but it is strange that the CPU is less stressed.
One explanation could be that you have all CPU threads/cores running setiMBCPU, and the active spin loop lets the math-intensive setiMBCPU run less (slowing it down).

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones

MarkJ · Project Donor
Volunteer tester
Joined: 17 Feb 08
Posts: 1044
Credit: 50,379,755
RAC: 3,172
Australia
Message 1861619 - Posted: 15 Apr 2017, 7:49:44 UTC
Last modified: 15 Apr 2017, 8:30:16 UTC

Rather than using an outdated Ubuntu, this is what I used with Debian. This will get you the latest Debian (Jessie), the latest kernel (4.9) and the 7.6.33 BOINC client. This won't give you the CUDA80 app, but you should be up and running with a CUDA-capable machine after doing this.


Part 1 - Install Debian
I used the Debian 8.7 net install for this. You’ll need a thumb drive or a blank CD. Download it from http://www.debian.org/distrib/ and write the ISO image to CD or thumb drive.

Boot off the thumb drive or CD. It will start up the Debian installer

Install Debian. At the software selection step, select SSH server and whatever desktop you prefer, and remove all other selections. Once done, it will reboot.


Part 2 - Install Nvidia software
Log in as root, open an xterm window and type the following commands:

cd /etc/apt

nano sources.list (nano is a text editor)

Change the “jessie main” lines to “jessie main contrib non-free” and add a jessie-backports line. It should look like this when you're done. I'm using httpredir as it will pick the fastest server.

deb http://httpredir.debian.org/debian/ jessie main contrib non-free
deb http://security.debian.org/ jessie/updates main contrib non-free
deb http://httpredir.debian.org/debian/ jessie-updates main contrib non-free
deb http://httpredir.debian.org/debian/ jessie-backports main contrib non-free

Exit out of nano and save the file (Control-O followed by Control-X)

apt update

apt install -t jessie-backports firmware-realtek (if needed)

apt install -t jessie-backports linux-image-amd64

apt install -t jessie-backports nvidia-kernel-dkms nvidia-smi nvidia-xconfig

apt install -t jessie-backports nvidia-opencl-icd (if you want OpenCL support)

nvidia-xconfig

sync

reboot
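
The sources.list edit in Part 2 could also be scripted rather than done in nano. A rough sketch follows: `add_components` is an invented name, and it assumes the installer wrote plain `deb ... main` lines. It only filters stdin to stdout so the result can be checked before the real file is replaced; deb-src lines are left alone, and the jessie-backports line still has to be added separately.

```shell
#!/bin/sh
# add_components: hypothetical filter that appends "contrib non-free" to every
# deb line that currently ends in "main". Lines already extended are untouched.
add_components() {
    sed -E 's/^(deb .* main)[[:space:]]*$/\1 contrib non-free/'
}

# Preview the change on a sample line:
printf '%s\n' 'deb http://httpredir.debian.org/debian/ jessie main' | add_components
```

Running the filter twice gives the same result as running it once, because a line that already ends in "non-free" no longer matches.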


Part 3 – Install BOINC
Log in as root. Start xterm again and type the following commands:

apt install -t jessie-backports boinc-nvidia-cuda boinc-manager

I got the cuda libraries that Petri posted earlier in this thread and put them in the /var/lib/boinc-client directory. They may not be needed. Make sure they're marked as executable and are owned by user boinc (do a chown boinc:boinc lib* command).

sync

reboot
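
A quick way to check the library permissions mentioned in Part 3 is sketched below. `list_nonexec_libs` is a made-up helper name, and the directory is whatever location the libraries were copied into; the chmod and chown lines in the comments simply restate the advice from the post.

```shell
#!/bin/sh
# list_nonexec_libs: hypothetical helper that prints lib* files in the given
# directory whose owner-execute bit (octal 100) is not set.
list_nonexec_libs() {
    find "$1" -maxdepth 1 -type f -name 'lib*' ! -perm -100
}

# Typical use as root (directory taken from the post above):
#   list_nonexec_libs /var/lib/boinc-client
#   chmod +x /var/lib/boinc-client/lib*
#   chown boinc:boinc /var/lib/boinc-client/lib*
```

No output means every matching library already has the execute bit set for its owner.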
BOINC blog

Stephen "Heretic"
Message 1861665 - Posted: 15 Apr 2017, 14:36:59 UTC - in response to Message 1861594.  

Thanks Stephen,

When set, -bs tells the CPU part of the code (through the CUDA driver) to wait for the GPU to finish its tasks in blocking sync mode. That should reduce CPU usage and increase run time a bit, and the GPU will run less hot. Blocking sync allows the process to go idle while waiting.
Without -bs, the CPU spin-loops, actively waiting for the GPU to finish its tasks. It should be quicker, use more CPU, and push the GPU harder because it is getting more work done.

So your measurements confirm that without -bs it is faster and produces more heat on the GPU, but it is strange that the CPU is less stressed.
One explanation could be that you have all CPU threads/cores running setiMBCPU, and the active spin loop lets the math-intensive setiMBCPU run less (slowing it down).

Petri


. . On the rig where the reduction in run times is marginal it has only a 2 core Pentium D supporting 2 GTX1060s and both CPU cores are busy doing that, no CPU crunching. The other rig where the run times reduced more significantly has a Core2 Duo with only one GTX1050ti and one CPU core crunching. I guess the better ratio of CPU resources for the GPU made the greater difference.

. . The A/C has been working so temps are well down at the moment. But I will be shutting it down for the night ...

Stephen

.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 3789
Credit: 186,307,125
RAC: 237,507
United States
Message 1861666 - Posted: 15 Apr 2017, 14:44:11 UTC - in response to Message 1861665.  

Did you remember to restart BOINC after changing the app_info settings? Changes to the app_info require a restart. Removing the -bs setting will cause the App to use 100% CPU, I've never seen a case where it didn't.

Stephen "Heretic"
Message 1861673 - Posted: 15 Apr 2017, 16:09:54 UTC - in response to Message 1861666.  

Did you remember to restart BOINC after changing the app_info settings? Changes to the app_info require a restart. Removing the -bs setting will cause the App to use 100% CPU, I've never seen a case where it didn't.


. . So selecting "read config files" won't do it ??

. . I will execute a restart

Stephen

oops.

petri33
Message 1861696 - Posted: 15 Apr 2017, 19:29:05 UTC - in response to Message 1861666.  

Did you remember to restart BOINC after changing the app_info settings? Changes to the app_info require a restart. Removing the -bs setting will cause the App to use 100% CPU, I've never seen a case where it didn't.


Right!

petri33
Message 1861697 - Posted: 15 Apr 2017, 19:31:04 UTC - in response to Message 1861673.  

Did you remember to restart BOINC after changing the app_info settings? Changes to the app_info require a restart. Removing the -bs setting will cause the App to use 100% CPU, I've never seen a case where it didn't.


. . So selecting "read config files" won't do it ??

. . I will execute a restart

Stephen

oops.


You could use an app_config.xml, I guess, so that parameter changes take effect as soon as the next task starts.
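
As a rough idea of what such a file might look like, here is a sketch that prints a candidate app_config.xml. The app name, plan class and command line shown are assumptions, not values confirmed anywhere in this thread; they would have to match the entries in your own app_info.xml. `write_app_config` is an invented helper name, and whether the client picks up the new cmdline without a restart is exactly the open question here.

```shell
#!/bin/sh
# write_app_config: prints a candidate app_config.xml to stdout.
# $1 = app name, $2 = plan class, $3 = command line. All three are placeholders
# that must match what your own app_info.xml declares.
write_app_config() {
    cat <<EOF
<app_config>
  <app_version>
    <app_name>$1</app_name>
    <plan_class>$2</plan_class>
    <cmdline>$3</cmdline>
  </app_version>
</app_config>
EOF
}

# Example with assumed values; redirect into the project directory once checked.
write_app_config setiathome_v8 cuda80 '-unroll 10'
```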

Stephen "Heretic"
Message 1861764 - Posted: 16 Apr 2017, 0:46:34 UTC - in response to Message 1861696.  

Did you remember to restart BOINC after changing the app_info settings? Changes to the app_info require a restart. Removing the -bs setting will cause the App to use 100% CPU, I've never seen a case where it didn't.


Right!


. . Prepare to have your world rocked :)

. . On the Pentium rig I closed and restarted BOINC: no significant change, run times are similar and CPU use is still similar. I shut down and restarted the whole rig. Run times are similar and CPU usage is similar. I rechecked app_info.xml and confirmed I definitely have removed -bs. For whatever reason (I am presuming because of the relatively limited resources of the old architecture) the presence or absence of -bs on that machine makes little or no real difference.

. . On the Core2 Duo, I had to restart the whole rig anyway because I cannot get BOINC Manager to re-launch the BOINC client once halted, but it launches AOK on start up. The CPU use is now as predicted, about 100%, but drops noticeably between running tasks. Run times have formed slightly tighter groupings but show only moderate/minor reductions. Halflings still take 1.66 to 1.75 mins and are tightly grouped, NARA (normal AR Arecibo) take slightly under 5 mins now where they were more typically 5 to 5.25 mins before. Blc04 are taking about the same at 5.5 to 5.66 mins and Blc13 take about the same at 5 to 5.25 mins. The main difference is in NARA run times, which are significantly quicker with about a 10 to 20 sec reduction, not bad out of a 300 to 330 second run time, 5% is still 5% :)

Stephen

.

TBar
Message 1861765 - Posted: 16 Apr 2017, 1:09:55 UTC - in response to Message 1861764.  

Have you considered how many watts that 5% is costing you? You forget I'm the one that built and tested that App. On a machine with ample CPU resources it was common to see the CPU use spike to 110% per task on the CPU monitor before I convinced Petri to add the Blocking Sync feature. If you're OK with burning up near 60% CPU wattage on your Dual core CPU for a 5% gain then go for it.

MarkJ
Message 1861775 - Posted: 16 Apr 2017, 1:39:04 UTC - in response to Message 1861619.  

I got the cuda libraries that Petri posted earlier in this thread and put them in the /var/lib/boinc-client directory. They may not be needed. Make sure they're marked as executable and are owned by user boinc (do a chown boinc:boinc lib* command).

Apparently you don't need these to get it to recognise CUDA. You will of course need to put them and the app into the projects/setiathome folder along with an app_info.

Stephen "Heretic"
Message 1861842 - Posted: 16 Apr 2017, 10:19:25 UTC - in response to Message 1861765.  
Last modified: 16 Apr 2017, 10:23:52 UTC

Have you considered how many watts that 5% is costing you? You forget I'm the one that built and tested that App. On a machine with ample CPU resources it was common to see the CPU use spike to 110% per task on the CPU monitor before I convinced Petri to add the Blocking Sync feature. If you're OK with burning up near 60% CPU wattage on your Dual core CPU for a 5% gain then go for it.


. . My rigs all run off separate power boards that draw their power from the wall via individual power meters (I am a very inquisitive sort of person). And with -bs on the C2D was drawing about 115W, now it is drawing about 120W or maybe 125W. So yes, it is probably slightly better value running with -bs on, but I would have to restart the whole thing again and it is only a small difference after all. The Pentium rig was drawing 330 to 335W, now about 340 to 345W, so again not enough of a difference to make me shut it all down again. If I need to make any other changes I can take care of it then.
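
To put those meter readings next to the run times: energy per task is roughly wall power multiplied by run time. The following is a back-of-envelope sketch only. The 315 s and 300 s figures are assumed round numbers within the range reported earlier for NARA tasks on the C2D, and the wall readings include everything else the rig is doing, so the result is indicative at best. `wh_per_task` is an invented helper name.

```shell
#!/bin/sh
# wh_per_task WATTS SECONDS: watt-hours drawn at the wall during one task.
wh_per_task() {
    awk -v w="$1" -v s="$2" 'BEGIN { printf "%.2f\n", w * s / 3600 }'
}

wh_per_task 115 315   # -bs on:  prints 10.06
wh_per_task 120 300   # -bs off: prints 10.00
```

On these assumed figures the two settings come out close to even per task; using 125W for the -bs off case tips it the other way.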

. . If there is anything you want me to try out of curiosity I would be happy to oblige.

Stephen

:)

Stephen "Heretic"
Message 1861844 - Posted: 16 Apr 2017, 10:26:23 UTC - in response to Message 1861775.  

I got the cuda libraries that Petri posted earlier in this thread and put them in the /var/lib/boinc-client directory. They may not be needed. Make sure they're marked as executable and are owned by user boinc (do a chown boinc:boinc lib* command).

Apparently you don't need these to get it to recognise CUDA. You will of course need to put them and the app into the projects/setiathome folder along with an app_info.


. . Well, I am of the understanding that those libraries are essential for CUDA80 to do its thing. But you can discuss that with TBar or Petri; it is their app. Of course, if you only want to run stock CUDA60 then no worries :)

Stephen

.

TBar
Message 1861861 - Posted: 16 Apr 2017, 13:20:44 UTC - in response to Message 1861842.  

. . If there is anything you want me to try out of curiosity I would be happy to oblige.

Stephen
:)
I've noticed you're running Sleep with the OpenCL App in Windows. Sleep is basically the same as -BS on the CUDA App. Have you calculated the difference between using Sleep in OpenCL versus not using Sleep? That might be interesting. Prior to Sleep & BS both Apps used near 100% CPU; you can even speed up the Old CUDA App by using the -poll cmd, which also uses 100% CPU. There is a larger speed up on the older CUDA App using -poll, but very few people have chosen to use it.

Mark Loukko
Joined: 7 Jun 99
Posts: 52
Credit: 17,835,229
RAC: 9,472
Canada
Message 1861865 - Posted: 16 Apr 2017, 14:00:44 UTC - in response to Message 1861418.  

Hi Petri,

How much effort would it take to create a Windows version of your app? It sounds like there are real benefits.
I would really like to use the most efficient application possible and contribute as much as possible to SETI using the resources I have.

Cheers
Mark

rob smith · Project Donor
Volunteer tester
Joined: 7 Mar 03
Posts: 15197
Credit: 251,581,734
RAC: 324,920
United Kingdom
Message 1861867 - Posted: 16 Apr 2017, 14:05:48 UTC

I believe Jason (and maybe a few others) are working on one - It has proven to be less easy due to the way Windows does various timing things that Petri & TBar have utilised to great effect.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

petri33
Message 1861888 - Posted: 16 Apr 2017, 16:19:28 UTC - in response to Message 1861865.  

Hi Petri,

How much effort would it take to create a Windows version of your app? It sounds like there are real benefits.
I would really like to use the most efficient application possible and contribute as much as possible to SETI using the resources I have.

Cheers
Mark


Hi Mark,
rob_smith kind of answered that. You would need a working Windows compilation environment, and then copy the new code over into it. Then comes some fiddling with C/C++ header files and a lot of testing.
Jason_gee is testing. He can compile and he has a version for Windows, but it is not ready to be released because of the large number of errors still in pulseFind.
My code has an issue where it sometimes reports the wrong pulse as the best one found.
I'm sure the windows version will come eventually.

Petri

Stephen "Heretic"
Message 1861922 - Posted: 16 Apr 2017, 21:48:53 UTC - in response to Message 1861861.  

. . If there is anything you want me to try out of curiosity I would be happy to oblige.

Stephen
:)

I've noticed you're running Sleep with the OpenCL App in Windows. Sleep is basically the same as -BS on the CUDA App. Have you calculated the difference between using Sleep in OpenCL versus not using Sleep? That might be interesting. Prior to Sleep & BS both Apps used near 100% CPU; you can even speed up the Old CUDA App by using the -poll cmd, which also uses 100% CPU. There is a larger speed up on the older CUDA App using -poll, but very few people have chosen to use it.


. . I didn't even hear about the -poll switch until I was well and truly into SoG and trying to learn its tuning options. If I had still been doing CUDA50 I would have given that a try. There is a school of thought (Hi Grant and Zalster) that sleep should not be used as it reduces the overall output, but I am definitely not of that school, though there are hardware configurations where it will hold up. But with a nice little i5-6400 I would lose a CPU core to do that, and I am sure the smallish improvement on my modest GTX950 would not come close to making up for the lost productivity of that one core crunching. If this rig was running a 1070 or 1080 then I would probably give that a run. If you are really curious I can change to that setup for a test period.

. . I am actually considering reducing the load cycle for the CPU in the C2D from 100% to maybe 80% or 75% to see if the reduced CPU load speeds up the CUDA80 app at all. Damn the torpedoes, I'll give that a try. It is always nice to know the answers to questions like this.

Stephen

:)

Stephen "Heretic"
Message 1861939 - Posted: 16 Apr 2017, 23:07:50 UTC

. . I have set Max CPU load to 75% to free some CPU time for CUDA80, it has increased the GPU usage by a small amount, it is sitting in the mid to high 90's now.

Mon Apr 17 08:58:47 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 0000:01:00.0      On |                  N/A |
| 80%   60C    P0    61W /  75W |   1552MiB /  4033MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1053    G   /usr/lib/xorg/Xorg                             101MiB |
|    0      3122    G   compiz                                          29MiB |
|    0      9009    C   ...ome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80  1417MiB |
+-----------------------------------------------------------------------------+


. . The CPU use is cycling between about 60% and 100% as BOINC adjusts the CPU time for crunching.
. . Run times seem to have dropped by a few more seconds, but it is not conclusive yet; I will monitor it for a while.
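
For monitoring over a longer period than a single snapshot, nvidia-smi can emit CSV samples on a timer, and the samples can then be averaged. The sketch below assumes one utilization number per line on stdin; `mean_util` is an invented helper name and util.log is just an example file name.

```shell
#!/bin/sh
# mean_util: average the numeric samples read from stdin, one per line.
mean_util() {
    awk '{ sum += $1; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Collect one sample a minute into a file, then average it later:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits -l 60 > util.log
#   mean_util < util.log
printf '%s\n' 96 94 98 | mean_util   # prints 96.0
```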

. . Having fun!

Stephen

:)