Setting up Linux to crunch CUDA90 and above for Windows users

Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1862304 - Posted: 19 Apr 2017, 5:01:18 UTC

. . @ TBar,

. . Follow on from previous message.

. . The Pent-D is now running with -bs off :). It is also happily purring along with version 0-72 generic. I followed the second option, that is, I re-installed the Nvidia distro, making sure that "yes" was selected for the question about making this version the preferred option in x/xserver for future reference. I can only presume that during the original install I assumed it defaulted to yes and just hit Enter. This time I selected yes and it took.

. . Also just for the record, that icon is still labelled "Files". I don't suppose you have installed any patches that might have changed that ???

Stephen
ID: 1862304
The_Matrix
Volunteer tester
Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1862324 - Posted: 19 Apr 2017, 10:19:26 UTC
Last modified: 19 Apr 2017, 10:40:40 UTC

I followed the steps and everything is in place, but it says "can't open CL file". What does that mean?

Whoops! Now it works without a problem. It was an access permissions issue.
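
For anyone else who hits "can't open CL file", here is a minimal sketch for spotting files the client may not be able to read. The function name `list_unreadable` is made up for illustration, and the idea that a missing read bit was the cause is an assumption; the post does not say which file or which permission was wrong.

```shell
# List regular files under directory $1 that lack a read bit for
# user, group or other (assumed cause; the post does not confirm it).
list_unreadable() {
    find "$1" -type f ! -perm -444
}
```

Run it against the project directory you unpacked the app into, then inspect anything it prints with `ls -l`. Ownership problems (for example files extracted as root) would not show up here and need a separate look.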
ID: 1862324
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1862331 - Posted: 19 Apr 2017, 12:16:45 UTC - in response to Message 1862324.  

I followed the steps and everything is in place, but it says "can't open CL file". What does that mean?

Whoops! Now it works without a problem. It was an access permissions issue.


. . Always a trap with Linux.

. . It will be good to see how well it goes :)

Stephen

:)
ID: 1862331
The_Matrix
Volunteer tester
Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1862361 - Posted: 19 Apr 2017, 15:15:12 UTC

Got 100 seconds less here on a guppi 2-bit WU, let's see if it happens again...
ID: 1862361
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1862387 - Posted: 19 Apr 2017, 19:52:07 UTC - in response to Message 1862361.  

Got 100 seconds less here on a guppi 2-bit WU, let's see if it happens again...


. . Hi The_Matrix

. . Nice to chat, but the purpose of this thread is to help Windows users come to terms with Linux to run the TBar/Petri special app for CUDA processing. You are running SoG which is fine but not the purpose here. If you want tuning info for that there are some very good threads with lots of helpful information.

. . If you do want to run CUDA80 you may be in trouble. Petri and TBar have made it quite clear that this app requires cards with a compute capability of 3.2 or higher to work properly. Your GTX670 only has a CC of 3.0 (see reference). I am not sure what the exact limitations are but I would expect it will not function properly, if at all. Maybe TBar or Petri will add more on this point.

https://developer.nvidia.com/cuda-gpus
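
Once you have looked up your card's compute capability in that table, the comparison itself is simple. A minimal sketch, with `cc_at_least` as a made-up helper name; the 3.0 and 3.2 figures are the ones quoted in this post:

```shell
# Succeed (exit 0) if compute capability $1 (e.g. "3.0") is at least
# $2 (e.g. "3.2"). Major and minor parts are compared as integers.
cc_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        if (h[1] + 0 != n[1] + 0) exit !(h[1] + 0 > n[1] + 0)
        exit !(h[2] + 0 >= n[2] + 0)
    }'
}
```

For example, `cc_at_least 3.0 3.2 || echo "below the minimum for the special app"` prints the warning for a GTX670.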

. . Sorry if this is a disappointment for you.

Stephen

:(
ID: 1862387
The_Matrix
Volunteer tester
Joined: 17 Nov 03
Posts: 414
Credit: 5,827,850
RAC: 0
Germany
Message 1862388 - Posted: 19 Apr 2017, 20:04:26 UTC
Last modified: 19 Apr 2017, 20:15:38 UTC

Just running OpenCL, haven't tried CUDA 8.0 yet, can't run the other AMD app. But that's no problem.

How could I try CUDA 8.0!?

OK, then I am out. Shame on me.
ID: 1862388
scocam
Joined: 28 Feb 17
Posts: 27
Credit: 15,120,999
RAC: 0
United States
Message 1862408 - Posted: 19 Apr 2017, 21:42:00 UTC
Last modified: 19 Apr 2017, 21:43:00 UTC

After a few trials and errors, I'm up and running!

A couple questions regarding BOINC on Linux...
How do you run BOINC Manager? Do you launch it from the icon, use ./boincmgr or ./run_manager or some other method? The reason I ask is that after a reboot, the default startup instance of BOINC Manager shows up as a completely different computer in my account (https://setiathome.berkeley.edu/show_host_detail.php?hostid=8248934). My main computer is https://setiathome.berkeley.edu/show_host_detail.php?hostid=8247189. Very odd and I'm not sure I understand how this is happening. To get to the main instance, upon reboot, I have to launch the manager from the icon and then quit it. Then I'm able to run ./boincmgr and get back to my primary computer's manager. I'm at a loss.

Also, my tasks are quite a bit different than I was expecting. This could be the current batch that downloaded that I'm gradually crunching through but wanted to get a sanity-check. Despite the long "Remaining" times, these tasks actually complete in 15 minutes or less. Any ideas? Anything else look out of the ordinary here?




Regards,
scocam
ID: 1862408
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1862416 - Posted: 19 Apr 2017, 23:00:18 UTC - in response to Message 1862408.  
Last modified: 19 Apr 2017, 23:26:07 UTC

I think you have 2 BOINC installs
- The Ubuntu default in /var/lib/boinc-client
- A BOINC version in your Home directory

The Ubuntu version sets up as auto startup. You should be able to get rid of ONLY that version in the software center where you got it from. Be kind and abort tasks first if you are removing it.

In Ubuntu (not sure what you are running) you can find BOINC Manager in the app search in the top left corner then lock it to the launcher once open - but you might have 2 of them now. Mint also has a search to find it.

EDIT: The run times will sort themselves out as you complete more tasks.
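
If you want to confirm the two-install theory before removing anything, here is a rough sketch that reports which candidate directories look like BOINC data directories, judged by the presence of client_state.xml. The helper name `find_boinc_dirs` and the home-directory path in the usage line are assumptions; substitute wherever you unpacked your own copy.

```shell
# Print each candidate directory that contains a client_state.xml,
# i.e. looks like a BOINC data directory.
find_boinc_dirs() {
    for dir in "$@"; do
        if [ -f "$dir/client_state.xml" ]; then
            echo "$dir"
        fi
    done
}
```

Usage would be something like `find_boinc_dirs /var/lib/boinc-client "$HOME/BOINC"`. Two lines of output means two separate installs, each of which registers as its own host.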
ID: 1862416
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1862421 - Posted: 19 Apr 2017, 23:22:44 UTC - in response to Message 1862388.  

Just running OpenCL, haven't tried CUDA 8.0 yet, can't run the other AMD app. But that's no problem.

How could I try CUDA 8.0!?

OK, then I am out. Shame on me.


. . No shame, you are welcome to read and contribute, but if you want info on SoG (OpenCL) then there is plenty of that in the other threads. I wish I could offer a suggestion on how to run CUDA80 but from what I understand it requires the 'dynamic parallelism' that is only available on GPUs with CC of 3.2 and higher. You could try CUDA60 without the special app but I have doubts that it would give any better results than you can get with SoG. I am not sure what you mean by the 'other AMD app' but I am happy to help if I am able. If you ever upgrade your GPU then a whole new world opens up :)

Stephen

.
ID: 1862421
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1862424 - Posted: 19 Apr 2017, 23:36:14 UTC - in response to Message 1862416.  

I think you have 2 BOINC installs
- The Ubuntu default in /var/lib/boinc-client
- A BOINC version in your Home directory

The Ubuntu version sets up as auto startup. You should be able to get rid of ONLY that version in the software center where you got it from. Be kind and abort tasks first if you are removing it.

In Ubuntu (not sure what you are running) you can find BOINC Manager in the app search in the top left corner then lock it to the launcher once open - but you might have 2 of them now. Mint also has a search to find it.


. . Yep! Yep! I do indeed.

. . When I was stumbling my way through setting up Linux I installed the Seti install package as recommended by TBar, not realising that along the way I had previously managed to install the repository version as well. It caused me lots of confusion for a while but with lots of help I have come to terms/grips with that version and I have let it stay. I have not tried to uninstall the second version because I was afraid I might end up removing them both, or at least some common component that might cripple the version that is doing the work. On top of that I have been trying to get Stubbles' script to work for Linux and was using the "unused" version as a test bed. But I discovered that the Linux version of BOINC makes distinct differences in the format of client_state.xml from the Windows version, so I don't think I can get it to work, and that project has stalled.

. . Somebody already pointed out that if I launch BOINC from that app find utility it would let me lock it to the launch bar, and it worked nicely. Who knew it does things that make that possible, while launching it from a terminal window does not ... :( But thanks for the info. To the best of my knowledge nothing is trying to use the second install so I feel safe for now.

Stephen

:)
ID: 1862424
Brent Norman
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1862426 - Posted: 19 Apr 2017, 23:59:50 UTC
Last modified: 20 Apr 2017, 0:29:21 UTC

Further to the discussion about the -bs option being on or off, this is what I came up with as a workaround for BOINC v7.2 not recognising priority flags in cc_config. Thanks to Petri for sharing his script so I could add to it.

# Enable persistence mode on the GPUs
sudo nvidia-smi -pm 1

# Re-apply affinity and priority every 5 seconds so new tasks get picked up
while true
do

  # Assign CPU Usage Threads (0-7)
  schedtool -v -a 1,3,5,7 $(pidof setiathome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80)
  schedtool -v -a 1,3,5,7 $(pidof setiathome_x41p_zi+_x86_64-pc-linux-gnu_cuda60)
  schedtool -v -a 1,3,5,7 $(pidof setiathome_x41zc_x86_64-pc-linux-gnu_cuda80_2)
  schedtool -v -a 1,3,5,7 $(pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100)
  schedtool -a 3,5,7   $(pidof compiz)
  schedtool -a 5,7,3   $(pidof gnome-system-monitor)
  schedtool -a 7,5,3   $(pidof vino-server)
  schedtool -a 0,2,4,6 $(pidof MBv8_8.0r3305_ssse3_x86_64-pc-linux-gnu)
  schedtool -a 0,2,4,6 $(pidof MBv8_8.06r3371_sse42_linux32)
  schedtool -a 0,2,4,6 $(pidof MBv8_8.05r3345_avx_linux64)
  schedtool -a 0,2,4,6 $(pidof ap_7.05r2728_sse3_linux64)
  #
  # Assign CPU Priority (19=Nice, -20=High)
  schedtool -n -15 $(pidof setiathome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80)
  schedtool -n -15 $(pidof setiathome_x41p_zi+_x86_64-pc-linux-gnu_cuda60)
  schedtool -n -15 $(pidof setiathome_x41zc_x86_64-pc-linux-gnu_cuda80_2)
  schedtool -n -15 $(pidof astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100)
  schedtool -n  19 $(pidof vino-server)
  schedtool -n  6  $(pidof MBv8_8.0r3305_ssse3_x86_64-pc-linux-gnu)
  schedtool -n  6  $(pidof MBv8_8.06r3371_sse42_linux32)
  schedtool -n  0  $(pidof MBv8_8.05r3345_avx_linux64)
  schedtool -n  6  $(pidof ap_7.05r2728_sse3_linux64)
  sleep 5
  echo  "  CPU Priority and Assignment Script"
  date
  #
  # Beta GPU apps, Add Core(1) to CPU app
  #schedtool -n -15 $(pidof setiathome_8.01_x86_64-pc-linux-gnu__cuda60)
  #schedtool -n -15 $(pidof setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_sah)
  #schedtool -n -15 $(pidof setiathome_8.22_x86_64-pc-linux-gnu__opencl_nvidia_SoG)

done

EDIT: Read 'man schedtool' for more info, and edit as you like.
And you need to install schedtool, just type the command and it gives you instructions.

EDIT2: I run this in a root bash window 'sudo bash'
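
One thing to be aware of with the loop above: when an app is not running, pidof prints nothing and schedtool gets called without a PID. A possible guard is sketched below. `sched_if_running` is a hypothetical helper name, not part of schedtool, and it relies only on pidof returning a non-zero status when no matching process is found.

```shell
# Apply schedtool options to a process only if it is currently running.
# Usage: sched_if_running <process-name> <schedtool options...>
sched_if_running() {
    name=$1
    shift
    pids=$(pidof "$name") || return 0   # not running: nothing to do
    # $pids is left unquoted on purpose so several PIDs become separate arguments
    schedtool "$@" $pids
}
```

Inside the loop, a line such as `sched_if_running vino-server -n 19` would then replace the matching schedtool line and stay silent when that process is absent.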
ID: 1862426
scocam
Joined: 28 Feb 17
Posts: 27
Credit: 15,120,999
RAC: 0
United States
Message 1862446 - Posted: 20 Apr 2017, 3:57:42 UTC - in response to Message 1862416.  

Thank you, Brent. That was exactly correct. I'm all set!


Regards,
scocam
ID: 1862446
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1863744 - Posted: 26 Apr 2017, 14:07:10 UTC - in response to Message 1862136.  

...But now I realise he is referring to the problem I have when I accept the Linux updates which get to the login screen and accept the password but keep returning to the login screen instead of actually loading your desktop. So now I think I know why that is happening, the newer versions are loading the 375 driver and there is a conflict. So how do I amend the latest of the updates to remove the extraneous video drivers when I cannot log into it...
The most logical explanation is you didn't Register the Driver Module when you installed the Driver from nVidia. During the Install of the Vendor Driver you will be Asked if you want to Register the Module so it will be applied to future Kernel Updates. If you Don't Register the Module, the next Kernel update will break the driver. That would be my first guess. I always chose to Register the Driver Module. To fix that just follow the previous procedure to Install the Video driver from the Console.


. . Hi TBar,

. . Feeling cocky after removing and re-installing the nvidia driver distro using the second option you suggested and making a point of having the x server set to use the nvidia distro in future, I decided to accept the next major update, which turns out to be 0-75. BUT, on reboot there I was at the never ending login screen loop. So I went through the process again this time using option 1 and installing the nvidia-375 drivers from Linux. Hopefully on the next update it will work right away. I just thought I would let you know.
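
A check that might save a trip around the login loop next time: before rebooting into a new kernel, see whether an nvidia module is registered for it. This sketch assumes the driver was registered through DKMS and that `dkms status` prints one line per module naming the kernel release and ending in "installed"; the exact output format differs between dkms versions, so treat the matching as approximate. `nvidia_dkms_ok` is a made-up helper name.

```shell
# Succeed if the `dkms status` text on stdin shows an nvidia module
# installed for kernel release $1 (output format is an assumption).
nvidia_dkms_ok() {
    grep -i 'nvidia' | grep -F -- "$1" | grep -q 'installed'
}
```

For the running kernel that would be `dkms status | nvidia_dkms_ok "$(uname -r)" || echo "no nvidia module for this kernel"`; pass the new kernel's release string instead to check it before the reboot.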

Stephen

<shrug>
ID: 1863744
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1865630 - Posted: 5 May 2017, 0:38:11 UTC

. . @ Me

. . I'm old so talking to myself is normal. But actually I just wanted to keep this thread alive.

. . It seems to have lost any interest from out there. :(

Stephen

?
ID: 1865630
scocam
Joined: 28 Feb 17
Posts: 27
Credit: 15,120,999
RAC: 0
United States
Message 1865633 - Posted: 5 May 2017, 1:11:17 UTC

I just got my cruncher back up and running after two days of installing a couple more 1070s. I also installed new EK water blocks on all four 1070s, a new radiator, pump, reservoir and cooling loop. After a successful 24 hour leak test, I'm back crunching again. I've been running all day without issue. At the moment, I'm running 1 WU per GPU with 1.25 CPU threads per WU as well as 5 CPU threads crunching WUs. CPU and GPU temperatures are all under 46 degrees after running for 12+ hours. I'm pretty happy with it.

I appreciate all the help!

scocam
ID: 1865633
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1865636 - Posted: 5 May 2017, 2:01:51 UTC - in response to Message 1865633.  

I just got my cruncher back up and running after two days of installing a couple more 1070s. I also installed new EK water blocks on all four 1070s, a new radiator, pump, reservoir and cooling loop. After a successful 24 hour leak test, I'm back crunching again. I've been running all day without issue. At the moment, I'm running 1 WU per GPU with 1.25 CPU threads per WU as well as 5 CPU threads crunching WUs. CPU and GPU temperatures are all under 46 degrees after running for 12+ hours. I'm pretty happy with it.

I appreciate all the help!

scocam


. . Hi Scocam,

. . My gills just turned green ...

. . I am intrinsically wary of putting water inside a PC so I will leave the water cooling to bravehearts like yourself. :) But I am very impressed at being able to crunch so intensively on high level hardware while keeping temps down to that level. I am going to add a Cooler Master Hyper 612 to Bertie to bring the CPU temp back to a sane level. It is sitting in the low 60s at the moment.

. . I expect that rig of yours is destined for greatness! Probably not as great as Petri's rig (who can afford 4 x 1080s?) but right up there.

. . Meanwhile I have to be content keeping the hardware temps under 60 where I can. That is my benchmark.

. . I have had a peek at your results and I am concerned about the numbers of invalids and errors. You might want to talk to Petri or TBar about them. Four of the invalids are damaged file headers on the WUs so that could be almost anything, but I think the other three should be investigated. And the high number of "errors while computing" is of concern as well.

Stephen

:)
ID: 1865636
scocam
Joined: 28 Feb 17
Posts: 27
Credit: 15,120,999
RAC: 0
United States
Message 1865643 - Posted: 5 May 2017, 3:37:45 UTC

I've been keeping an eye on those errors and invalids as well. Not sure what's going on there quite yet. My GPUs aren't overclocked and the 6850 CPU is only very mildly overclocked, but it could be something in the command line that's causing the issue. I'll investigate more tomorrow. I had an embarrassing number of errored WUs while initially building this machine due to a couple of operator errors resulting in complete rebuilds. The errors within the last day or two are something I need to look into further though. I don't recall so many errored WUs while using zi3k+.

scocam
ID: 1865643
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1865659 - Posted: 5 May 2017, 6:12:41 UTC - in response to Message 1865643.  

It looks to be working OK now. Since you just finished major changes I'd say just keep an eye on it. The only App problem I saw was two spike overflows on GPU 3, but it was only two. Everything else appears to have been caused by other problems. The one unfinished task was caused by the App being stopped during work, which could have been a hard restart or power failure. The 'Bad workunit header' error usually appears when BOINC restarts after a hard restart or power failure. All the other errors appear to have been caused by the op errors you mentioned.

It appears to be running as well as other machines that are using a full CPU, https://setiathome.berkeley.edu/workunit.php?wuid=2529161981
If everything goes well I'm afraid the Windows machines will have yet another top ten Linux machine to deal with...soon. It's getting crowded up there.
ID: 1865659
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1865661 - Posted: 5 May 2017, 6:48:10 UTC - in response to Message 1865643.  

The errors within the last day or two are something I need to look into further though. I don't recall so many errored WUs while using zi3k+.
scocam


. . I am not getting many myself but then I am not doing the volume you are. So far I have had only one invalid and that was using zi3k+, none so far with zi3t2b. But I will be keeping an eye out.

Stephen

..
ID: 1865661
Stephen "Heretic"
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1865662 - Posted: 5 May 2017, 6:51:52 UTC - in response to Message 1865659.  
Last modified: 5 May 2017, 6:57:39 UTC

. . @ TBar,

. . I have been looking for the link to Petri's version of zi3t2b but so far can only find multiple links to the one I already have. If you can remember where he posted it I would appreciate knowing.

. . I am keeping an eye on the results for numbers of inconclusives. At first it was 6.7% but that was when there were a lot more Arecibo tasks and the numbers of valids were much higher. At the moment it is 10% but that is because the actual number of inconclusives has yet to change while the number of valids has dropped sharply with the Guppi flood. Hopefully in a day or two the number of 'incs' will drop and be more reflective of current results.
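
To show why the percentage jumps even when the number of inconclusives stays flat, here is the arithmetic as a small sketch. It assumes the rate is inconclusives divided by valids, which is how the paragraph above reads, and the counts in the usage line are invented for illustration, not taken from any host's results page.

```shell
# Inconclusive rate as a percentage of valid results, to one decimal place.
inc_rate() {
    awk -v inc="$1" -v valid="$2" 'BEGIN { printf "%.1f\n", 100 * inc / valid }'
}
```

With 20 inconclusives, `inc_rate 20 300` prints 6.7 and `inc_rate 20 200` prints 10.0, so a drop in valids alone is enough to move the figure.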

Stephen

?
ID: 1865662


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.