Setting up Linux to crunch CUDA90 and above for Windows users

Message boards : Number crunching : Setting up Linux to crunch CUDA90 and above for Windows users

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1894224 - Posted: 9 Oct 2017, 4:06:07 UTC - in response to Message 1894218.  

I thought we already went over this. Yes, here it is: there is a difference between Drivers and Libraries. All three platforms have the CUDA Libraries separate from the Driver. To download the Driver, go to http://www.nvidia.com/object/linux-amd64-display-archive.html; to download the Libraries, go to https://developer.nvidia.com/gpu-accelerated-libraries. It has always been that way; most people just need the Driver, and some of us actually know about the Libraries. If you don't think you still need the CUDA 8 Libraries, just remove them from the setiathome.berkeley.edu folder and watch what happens to the CUDA 8 App. Having the Libraries built into the App just means you won't have to chase down the Correct Libraries to make the App work. The App will Always need the Libraries to work.
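
One way to see which Libraries an App expects to find at run time is to ask the dynamic linker with `ldd`. The sketch below is a minimal illustration, not part of the original post: it parses `ldd`-style output and lists the entries reported as `not found`. The sample output and the helper names `missing_libs` and `run_ldd` are assumptions made for the example.

```python
import subprocess
from typing import List


def missing_libs(ldd_output: str) -> List[str]:
    """Return the names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "libname.so.N => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def run_ldd(path: str) -> str:
    """Run ldd on a binary and return its text output (Linux only)."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return result.stdout


# Illustrative sample only; the entries are assumptions, not real app output.
SAMPLE = """\
    linux-vdso.so.1 (0x00007ffd1a5f2000)
    libcudart.so.8.0 => not found
    libcufft.so.8.0 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3c2a000000)
"""

if __name__ == "__main__":
    print(missing_libs(SAMPLE))
```

On a real system, `missing_libs(run_ldd("/path/to/app"))` would apply the same check to an actual binary; a statically linked build would simply not list the CUDA Libraries as separate dependencies.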
ID: 1894224
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1894234 - Posted: 9 Oct 2017, 5:36:40 UTC - in response to Message 1894224.  

I see I didn't make myself clearly understood. What I meant was WHEN Nvidia starts shipping a Linux driver with CUDA 9.0 libraries or WHEN the repositories get an updated driver with CUDA 9.0 libraries in the package .....
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1894234
rob smith · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 7 Mar 03
Posts: 22200
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1894237 - Posted: 9 Oct 2017, 6:31:31 UTC

Static linking means that the libraries are part of the application. So when NVIDIA starts to deliver CUDA 9 by default there will be no change, unless they change part of the API.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1894237
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1894252 - Posted: 9 Oct 2017, 12:09:20 UTC - in response to Message 1894234.  

Keith Myers wrote:
I see I didn't make myself clearly understood. What I meant was WHEN Nvidia starts shipping a Linux driver with CUDA 9.0 libraries or WHEN the repositories get an updated driver with CUDA 9.0 libraries in the package .....

Never going to happen. The Drivers and Libraries have Always been separate. The CUDA 9 Drivers have been available for months; they are the ones numbered 384.x. The CUDA 9 Toolkit has been out for a while; you can find it at https://developer.nvidia.com/cuda-toolkit. Download and install the Toolkit if you want the Libraries; they will Never be part of the Driver. That's why there are separate links for the CUDA 8.0 Libraries in the CUDA 8 download. The Libraries are separate from the driver:
You will need to Download the CUDA 8.0 libraries mentioned in the README_x41p_zi3v and place them in the setiathome.berkeley.edu folder before running.
ID: 1894252
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1894349 - Posted: 9 Oct 2017, 17:28:35 UTC - in response to Message 1894237.  

rob smith wrote:
Static linking means that the libraries are part of the application. So when NVIDIA starts to deliver CUDA 9 by default there will be no change, unless they change part of the API.

I am having a really hard time explaining my question, I see. I understand that static linking includes the CUDA 9.0 libraries in the application. I understand that Nvidia delivers the drivers separately from the CUDA libraries.

Let's change the question. What was the reason for Petri to statically link the CUDA 9.0 libraries into the application, instead of following past practice, as with the special app using the CUDA 8.0 libraries, of providing a link to the libraries and having the user of his special app retrieve them on their own?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1894349
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1894353 - Posted: 9 Oct 2017, 17:50:50 UTC - in response to Message 1894349.  
Last modified: 9 Oct 2017, 18:01:48 UTC

Petri answered your question days ago over here, https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1893313#1893313

Keith Myers wrote:
Petri, is this strictly to implement CUDA 9.0? Or does it do anything with P2 state in Pascal cards being unable to override like you can with Maxwell cards. Have you ever been able to get around the P2 state on your Pascal cards or have you resigned yourself to accept P2 state as the highest you can clock them?

This (fft callbacks) could have been done with CUDA 6.5 or later. The callbacks need static linking. Static linking also helps to deploy the executable, since it does not need external lib files.

The fft callbacks help to reduce the amount of data transferred to and from GPU RAM, since the pre- and post-processing of data can be done when the fft reads from memory and when it writes to memory. That gives the speed-up, since RAM is 'slow'. The callbacks are now implemented for the autocorrelation search. They will be implemented for all other pulse types later.

The P2 is in the driver. NVIDIA could remove it if they wanted to.
Petri
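
The idea of doing the pre-processing at the moment the transform reads its input can be illustrated with a small CPU-side analogy. This is only a sketch in plain Python, not cuFFT code and not Petri's implementation; `dft_with_load_callback` and `scale_on_load` are hypothetical names, and the 0.5 scaling stands in for whatever pre-processing the real app does.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform of a list of numbers."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dft_with_load_callback(raw, load):
    """Same transform, but every input element passes through `load`
    at the moment it is read, so no pre-processed copy is ever stored."""
    n = len(raw)
    return [
        sum(load(raw, t) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def scale_on_load(raw, t):
    """Hypothetical pre-processing step: scale each sample by 0.5."""
    return 0.5 * raw[t]


raw = [1.0, 2.0, 3.0, 4.0]

# Two-pass version: write a pre-processed copy to memory, then transform it.
two_pass = dft([0.5 * x for x in raw])

# Fused version: the pre-processing happens inside the transform's read.
fused = dft_with_load_callback(raw, scale_on_load)

if __name__ == "__main__":
    print(all(abs(a - b) < 1e-9 for a, b in zip(two_pass, fused)))
```

Both versions produce the same spectrum; the fused one skips the intermediate array, which is the kind of memory traffic the callbacks are described as saving on the GPU.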
ID: 1894353
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1894356 - Posted: 9 Oct 2017, 17:55:10 UTC - in response to Message 1894353.  

No, he didn't answer my question about P2 state, so I guess I ignored the rest of his response as not pertinent to my question.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1894356
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1894386 - Posted: 9 Oct 2017, 20:56:13 UTC - in response to Message 1894356.  
Last modified: 9 Oct 2017, 21:00:26 UTC

Keith Myers wrote:
No, he didn't answer my question about P2 state, so I guess I ignored the rest of his response as not pertinent to my question.


1) I answered that the P2 problem is in the driver. Yes, I have tried to get the cards to run at P0. I have not succeeded. I have searched the internet and everyone is asking the same question. The answer is always that on Linux you cannot get P0 with a compute load on a 1080.

I had a similar problem with the 780 and 980. Then, all of a sudden, one day NVIDIA changed the drivers to allow setting P0 on them. Before that, only Quadros and Titans could do it. Now I'm waiting for NVIDIA to allow that for the 1080 in some future driver.

2) The static cuda90 library link is needed for the fft callbacks. An extra bonus is that you do not need to download the dynamic library files from NVIDIA or another place.

p.s. On Linux, once the executable is in main memory, another process using the same executable shares the code. Load time is not an issue even with this big exe when running multiple copies. I could add a parameter --faststart that could be used to start one executable outside BOINC so that the exe is always in memory, even with one GPU and one task at a time. It would sit in a sleep loop (for a long time, effectively forever).

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1894386
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1894391 - Posted: 9 Oct 2017, 21:06:02 UTC - in response to Message 1894386.  

Thanks for the clarification, Petri. I know you said you were unable to get the 1080 into P0 state, but I never got an answer on whether the issue was with the drivers or with the overclock tools. I understand now that the blame lies solely with the Nvidia Linux drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1894391
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897633 - Posted: 26 Oct 2017, 20:44:27 UTC

Hope that someone can help me out with my Linux machine. I just found it not running BOINC. And now every time I start BOINC, it only runs for about 10 seconds before it reboots the machine. I have reinstalled BOINC and have the same problem. If I quickly suspend the project before it locks up the machine and reboots, the machine will run fine for all other programs. It is definitely BOINC that is causing the reboots. What kind of action can I take, or what tools can I use, to figure out why BOINC is crashing the machine?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897633
Brent Norman · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1897637 - Posted: 26 Oct 2017, 21:00:05 UTC - in response to Message 1897633.  

Maybe try renaming /etc/init.d/boinc-client so that it can't auto start, then you can eliminate that one thing only from your startup ... maybe something in the GPU clocking script is going wrong when not loaded ... dunno man ...
ID: 1897637
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897640 - Posted: 26 Oct 2017, 21:14:45 UTC - in response to Message 1897637.  

I don't have it autostart. I am running TBar 7.8.3 BOINC in /Home directory. Not overclocking for testing. Just load the desktop and then manually start BOINC. It runs for about ten seconds then all the timers freeze, the mouse freezes and five seconds later, reboot. If I don't start BOINC, the machine runs normally.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897640
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1897643 - Posted: 26 Oct 2017, 21:17:56 UTC - in response to Message 1897633.  

Keith Myers wrote:
What kind of action can I take, or what tools can I use, to figure out why BOINC is crashing the machine?

While the Project is Suspended, select all the tasks and Suspend them. Stop BOINC, open client_state.xml, scroll to the bottom, and delete all the active tasks from the list (everything between <active_task_set> & </active_task_set>), then save the file. Then go to the slots folder and delete all the numbered folders. Start BOINC, resume the Project, and resume the tasks one at a time until all devices are running. If it stays running, resume the rest of the tasks.
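
The two file-level steps above (emptying the active task list and deleting the numbered slot folders) can be sketched in Python as follows. This is a hedged illustration rather than an official BOINC tool: the function names are made up for the example, the BOINC data directory is passed in by the caller, and BOINC must be stopped before running anything like it. A backup copy of client_state.xml is written first.

```python
import re
import shutil
from pathlib import Path

# Matches everything between the opening and closing active_task_set tags.
ACTIVE_SET = re.compile(r"(<active_task_set>).*?(</active_task_set>)", re.DOTALL)


def clear_active_tasks(state_text):
    """Empty everything between <active_task_set> and </active_task_set>."""
    return ACTIVE_SET.sub(r"\1\n\2", state_text)


def clear_slots(slots_dir):
    """Delete the numbered folders inside the slots folder; return their names."""
    removed = []
    for entry in sorted(Path(slots_dir).iterdir()):
        if entry.is_dir() and entry.name.isdigit():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


def reset_client(boinc_dir):
    """Apply both steps to a BOINC data directory (stop BOINC first)."""
    state = Path(boinc_dir) / "client_state.xml"
    backup = state.with_name(state.name + ".bak")
    shutil.copy2(state, backup)  # keep a backup before editing
    state.write_text(clear_active_tasks(state.read_text()))
    return clear_slots(Path(boinc_dir) / "slots")
```

If either tag is missing from the file, `clear_active_tasks` returns the text unchanged, so it is worth looking at the result before restarting BOINC.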
ID: 1897643
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897658 - Posted: 26 Oct 2017, 22:18:02 UTC - in response to Message 1897643.  

Thanks TBar. Making some progress. Can run the CPU tasks. The GPU tasks seem to be the factor that causes BOINC to crash. I tried to go back to the CUDA80 app but this is what I am getting in the stdout file.
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | task blc04_2bit_guppi_57976_07262_HIP74926_0026.20486.0.21.44.241.vlar_1 resumed by user
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name: 2

I just checked the dependencies on both the CUDA 80 and CUDA 90 static apps and didn't see any irregularities.
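
When a log like the one above gets long, the error entries can be pulled out mechanically. The sketch below assumes only the "timestamp | project | message" layout visible in the lines quoted in this post; the function names are hypothetical.

```python
def parse_event_line(line):
    """Split 'timestamp | project | message' into a 3-tuple, or return None."""
    parts = [p.strip() for p in line.split(" | ", 2)]
    return tuple(parts) if len(parts) == 3 else None


def error_messages(log_text):
    """Return the message part of every line whose message starts with [error]."""
    found = []
    for line in log_text.splitlines():
        parsed = parse_event_line(line)
        if parsed and parsed[2].startswith("[error]"):
            found.append(parsed[2])
    return found


# The three lines quoted in the post.
LOG = """\
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | task blc04_2bit_guppi_57976_07262_HIP74926_0026.20486.0.21.44.241.vlar_1 resumed by user
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name: 2
"""

if __name__ == "__main__":
    for message in error_messages(LOG):
        print(message)
```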
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897658
Stephen "Heretic" · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1897662 - Posted: 26 Oct 2017, 22:34:44 UTC - in response to Message 1897640.  

Keith Myers wrote:
I don't have it autostart. I am running TBar 7.8.3 BOINC in /Home directory. Not overclocking for testing. Just load the desktop and then manually start BOINC. It runs for about ten seconds then all the timers freeze, the mouse freezes and five seconds later, reboot. If I don't start BOINC, the machine runs normally.


. . Perhaps try going back to the earlier version of BOINC that you were using and see if that runs OK ??

Stephen

??
ID: 1897662
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1897663 - Posted: 26 Oct 2017, 22:40:09 UTC - in response to Message 1897658.  
Last modified: 26 Oct 2017, 22:40:45 UTC

Keith Myers wrote:
Thanks TBar. Making some progress. Can run the CPU tasks. The GPU tasks seem to be the factor that causes BOINC to crash. I tried to go back to the CUDA80 app but this is what I am getting in the stdout file.
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | task blc04_2bit_guppi_57976_07262_HIP74926_0026.20486.0.21.44.241.vlar_1 resumed by user
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name: 2

I just checked the dependencies on both the CUDA 80 and CUDA 90 static apps and didn't see any irregularities.

Hmmm, never heard of that one before. Try stopping BOINC, in the Home folder select Show Hidden Files from the View menu. Open the folder .nv, and then the folder ComputeCache. Delete all items from the folder ComputeCache. Then start BOINC and see if that helps.
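
For anyone who prefers to do this from a script rather than the file manager, here is a minimal Python sketch of the same cleanup. It assumes the cache lives at `.nv/ComputeCache` under the home folder, as described above; `clear_compute_cache` is a made-up helper name, and BOINC should be stopped before running it.

```python
import shutil
from pathlib import Path


def clear_compute_cache(home=None):
    """Delete every item inside <home>/.nv/ComputeCache; return the count removed."""
    base = Path(home) if home is not None else Path.home()
    cache = base / ".nv" / "ComputeCache"
    if not cache.is_dir():
        return 0  # nothing to clean
    removed = 0
    for entry in cache.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # sub-folder: remove it and its contents
        else:
            entry.unlink()  # plain file or symlink
        removed += 1
    return removed
```

The ComputeCache folder itself is left in place; only its contents are removed.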
ID: 1897663
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897665 - Posted: 26 Oct 2017, 22:44:39 UTC - in response to Message 1897662.  

Keith Myers wrote:
I don't have it autostart. I am running TBar 7.8.3 BOINC in /Home directory. Not overclocking for testing. Just load the desktop and then manually start BOINC. It runs for about ten seconds then all the timers freeze, the mouse freezes and five seconds later, reboot. If I don't start BOINC, the machine runs normally.


Stephen "Heretic" wrote:
. . Perhaps try going back to the earlier version of BOINC that you were using and see if that runs OK ??

Stephen

??

It's running again. I found something strange in client_state file where TBar told me to delete the running tasks. There were no running tasks in that section and there wasn't a proper tag opening for that parameter, just the closing tag. So I added the open tag and saved the file. Maybe that was what was causing BOINC to crash. There should have been 7 tasks listed as running there if I understand what that section is for.
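
A quick way to spot this kind of damage is to count the opening and closing tags in the file. The sketch below is a plain text check, not a full XML parse, and the function names are hypothetical; the tag name is the one mentioned in this thread.

```python
def tag_balance(state_text, tag="active_task_set"):
    """Count the opening and closing tags for `tag` in the file text."""
    opens = state_text.count("<" + tag + ">")
    closes = state_text.count("</" + tag + ">")
    return opens, closes


def is_balanced(state_text, tag="active_task_set"):
    """True when the tag has as many openings as closings."""
    opens, closes = tag_balance(state_text, tag)
    return opens == closes


if __name__ == "__main__":
    # A fragment with only the closing tag, like the damage described above.
    broken = "<client_state>\n</active_task_set>\n</client_state>\n"
    print(tag_balance(broken), is_balanced(broken))
```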
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897665
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897666 - Posted: 26 Oct 2017, 22:50:51 UTC - in response to Message 1897663.  

Keith Myers wrote:
Thanks TBar. Making some progress. Can run the CPU tasks. The GPU tasks seem to be the factor that causes BOINC to crash. I tried to go back to the CUDA80 app but this is what I am getting in the stdout file.
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | task blc04_2bit_guppi_57976_07262_HIP74926_0026.20486.0.21.44.241.vlar_1 resumed by user
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name
Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name: 2

I just checked the dependencies on both the CUDA 80 and CUDA 90 static apps and didn't see any irregularities.

TBar wrote:
Hmmm, never heard of that one before. Try stopping BOINC, in the Home folder select Show Hidden Files from the View menu. Open the folder .nv, and then the folder ComputeCache. Delete all items from the folder ComputeCache. Then start BOINC and see if that helps.

Hi TBar, thanks for the help. I didn't know about that hidden directory; I will put that one in the memory bank. The problem was definitely coming from the GPU tasks. Since that hidden directory seems to be about Nvidia, I suspect that is where the problem lay. I would assume that ComputeCache has something to do with what each GPU is working with? Time for some Googling, I guess, to see what that one is about.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897666
Stephen "Heretic" · Crowdfunding Project Donor* · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1897668 - Posted: 26 Oct 2017, 22:54:08 UTC - in response to Message 1897665.  


Keith Myers wrote:
It's running again. I found something strange in client_state file where TBar told me to delete the running tasks. There were no running tasks in that section and there wasn't a proper tag opening for that parameter, just the closing tag. So I added the open tag and saved the file. Maybe that was what was causing BOINC to crash. There should have been 7 tasks listed as running there if I understand what that section is for.


. . That sounds like you have diagnosed it well. You could suspend BOINC and recheck that section and see what is there when it is running OK. That should confirm your diagnosis and you would then know what to expect to see there.

Stephen

..
ID: 1897668
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1897675 - Posted: 26 Oct 2017, 23:46:29 UTC

I was finally able to look at Darksider's tasks. The website has been slow as molasses for the past couple of hours. I gave up waiting for the Error tasks to refresh and fixed my office chair. I was able to look at the stderr.txt output for the errored tasks and then searched on the error. I came up with a hit on the BOINC site about this error showing up back in the 6.2.18 client:
BOINC 6.2.xx - crashes all over the place
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1897675