Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I thought we already went over this. Yes, here it is: there is a difference between Drivers and Libraries. All three platforms have the CUDA Libraries separate from the Driver. To download the Driver you go here: http://www.nvidia.com/object/linux-amd64-display-archive.html To download the Libraries you go here: https://developer.nvidia.com/gpu-accelerated-libraries It has always been that way; most people just need the driver, some of us actually know about the Libraries. If you don't think you still need the CUDA 8 Libraries, just remove them from the setiathome.berkeley.edu folder and watch what happens to the CUDA 8 App. Having the Libraries built into the App just means you won't have to chase down the Correct Libraries to make the App work. The App will Always need the Libraries to work. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I see I didn't make myself clearly understood. What I meant was WHEN Nvidia starts shipping a Linux driver with CUDA 9.0 libraries or WHEN the repositories get an updated driver with CUDA 9.0 libraries in the package ..... Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
rob smith Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380 |
Static linking means that the libraries are part of the application. So when nvidia start to deliver cuda9 by default there will be no change. Unless they change part of the api. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Keith Myers wrote: "I see I didn't make myself clearly understood. What I meant was WHEN Nvidia starts shipping a Linux driver with CUDA 9.0 libraries or WHEN the repositories get an updated driver with CUDA 9.0 libraries in the package ....." Never going to happen. The Drivers and Libraries have Always been separate. The CUDA 9 Drivers have been available for months; they are the ones numbered 384.x. The CUDA 9 Toolkit has been out for a while, you can find it here: https://developer.nvidia.com/cuda-toolkit Download and install the Toolkit if you want the Libraries; they will Never be part of the Driver. That's why there are separate links for the CUDA 8.0 Libraries in the CUDA 8 download: the Libraries are separate from the driver. You will need to Download the CUDA 8.0 libraries mentioned in the README_x41p_zi3v and place them in the setiathome.berkeley.edu folder before running. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
rob smith wrote: "Static linking means that the libraries are part of the application. So when nvidia start to deliver cuda9 by default there will be no change. Unless they change part of the api." I am having a really hard time explaining my question, I see. I understand static linking includes the CUDA 9.0 libraries in the application. I understand Nvidia delivers the drivers separate from the CUDA libraries. Let's change the question. What was the cause for Petri to statically link the CUDA 9.0 libraries into the application, instead of following suit as in the past by providing a link to the CUDA 9.0 libraries and having the user of his special app retrieve them on their own, as was the case with the special app using the CUDA 8.0 libraries? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Petri answered your question days ago over here, https://setiathome.berkeley.edu/forum_thread.php?id=80636&postid=1893313#1893313 Keith Myers wrote: |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
No, he didn't answer my question about P2 state, so I guess I ignored the rest of his response as not pertinent to my question. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
petri33 Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Keith Myers wrote: "No, he didn't answer my question about P2 state, so I guess I ignored the rest of his response as not pertinent to my question." 1) I answered that the P2 problem is in the driver. Yes, I have tried to get the cards to run in P0. I have not succeeded. I have searched the internet and everyone is asking the same question. The answer is always that in Linux you cannot get P0 with a compute load on a 1080. I had a similar problem with the 780 and 980. Then all of a sudden one day NVIDIA changed the drivers to allow setting P0 on them. Before that, only Quadro and Titan cards could do it. Now I'm waiting for NVIDIA to allow that for the 1080 in some future driver. 2) The static cuda90 library link is needed for the fft callbacks. An extra bonus is that you do not need to download the dynamic library files from NVIDIA or another place. p.s. On Linux, once the executable is in main memory, another process using the same executable shares the code. Load time is not an issue even with this big exe when running multiple copies. I could add a parameter --faststart that could be used to start one executable outside boinc so that the exe is always in memory even with one GPU and one task at a time. It would sit in a sleep loop (for a long time, effectively forever). Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks for the clarification Petri. I know you said you were unable to get the 1080 into P0 state. But I never got an answer whether the issue was with the drivers or with the overclock tools. The blame is solely with the Nvidia Linux drivers I understand now. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Hope that someone can help me out with my Linux machine. I just found it not running BOINC. And now every time I start BOINC, it only runs for about 10 seconds before it reboots the machine. I have reinstalled BOINC and have the same problem. If I quickly suspend the project before it locks up the machine and reboots, the machine will run fine for all other programs. It is definitely BOINC that is causing the reboots. What kind of action can I take or tools to figure out why BOINC is crashing the machine. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Maybe try renaming /etc/init.d/boinc-client so that it can't auto start, then you can eliminate that one thing only from your startup ... maybe something in the GPU clocking script is going wrong when not loaded ... dunno man ... |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I don't have it autostart. I am running TBar 7.8.3 BOINC in /Home directory. Not overclocking for testing. Just load the desktop and then manually start BOINC. It runs for about ten seconds then all the timers freeze, the mouse freezes and five seconds later, reboot. If I don't start BOINC, the machine runs normally. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Keith Myers wrote: "What kind of action can I take or tools to figure out why BOINC is crashing the machine." While the Project is Suspended, select all the tasks and Suspend them. Stop BOINC, open client_state.xml, scroll to the bottom and delete all the active tasks from the list (everything between <active_task_set> and </active_task_set>), then save the file. Then go to the slots folder and delete all the numbered folders. Start BOINC, resume the Project, and resume the tasks one at a time until all devices are running. If it stays running, resume the rest of the tasks. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Thanks TBar. Making some progress. Can run the CPU tasks. The GPU tasks seem to be the factor that causes BOINC to crash. I tried to go back to the CUDA80 app but this is what I am getting in the stdout file. Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | task blc04_2bit_guppi_57976_07262_HIP74926_0026.20486.0.21.44.241.vlar_1 resumed by user Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name Thu 26 Oct 2017 03:10:27 PM PDT | SETI@home | [error] error: can't open file for shmem seg name: 2 I just checked the dependencies on both the CUDA 80 and CUDA 90 static apps and didn't see any irregularities. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Keith Myers wrote: "I don't have it autostart. I am running TBar 7.8.3 BOINC in /Home directory. Not overclocking for testing. Just load the desktop and then manually start BOINC. It runs for about ten seconds then all the timers freeze, the mouse freezes and five seconds later, reboot. If I don't start BOINC, the machine runs normally." . . Perhaps try going back to the earlier version of BOINC that you were using and see if that runs OK ?? Stephen ?? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Keith Myers wrote: "Thanks TBar. Making some progress. Can run the CPU tasks. The GPU tasks seem to be the factor that causes BOINC to crash. I tried to go back to the CUDA80 app but this is what I am getting in the stdout file." Hmmm, never heard of that one before. Try stopping BOINC, then in the Home folder select Show Hidden Files from the View menu. Open the folder .nv, and then the folder ComputeCache. Delete all items from the folder ComputeCache. Then start BOINC and see if that helps. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Keith Myers wrote: "I don't have it autostart. I am running TBar 7.8.3 BOINC in /Home directory. Not overclocking for testing. Just load the desktop and then manually start BOINC. It runs for about ten seconds then all the timers freeze, the mouse freezes and five seconds later, reboot. If I don't start BOINC, the machine runs normally." It's running again. I found something strange in the client_state file where TBar told me to delete the running tasks. There were no running tasks in that section and there wasn't a proper opening tag for that parameter, just the closing tag. So I added the opening tag and saved the file. Maybe that was what was causing BOINC to crash. There should have been 7 tasks listed as running there, if I understand what that section is for. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Keith Myers wrote: "Thanks TBar. Making some progress. Can run the CPU tasks. The GPU tasks seem to be the factor that causes BOINC to crash. I tried to go back to the CUDA80 app but this is what I am getting in the stdout file." Hi TBar, thanks for the help. Didn't know about that hidden directory. Will put that one in the memory bank. The problem was definitely coming from the gpu tasks. Since that hidden directory seems to be about Nvidia, I suspect that is where the problem lay. I would assume that ComputeCache has something to do with what each gpu is working with?? Time for some Googling I guess to see what that one is about. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . That sounds like you have diagnosed it well. You could suspend BOINC and recheck that section and see what is there when it is running OK. That should confirm your diagnosis and you would then know what to expect to see there. Stephen .. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I was finally able to look at Darksider's tasks. The website is slow as molasses for the past couple of hours. I gave up waiting for the Error tasks to refresh and fixed my office chair. I was able to look at the stderr.txt output for the errored tasks and then searched on the error. I came up with a hit on the BOINC site about this error showing up back in the 6.2.18 client: "BOINC 6.2.xx - crashes all over the place". Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
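
The recovery steps TBar describes earlier in the thread (empty the `<active_task_set>` section of client_state.xml, delete the numbered folders in the slots folder, and, as a separate suggestion, delete the contents of ~/.nv/ComputeCache) could be sketched in Python roughly as below. This is only an illustration under assumptions, not a BOINC tool: the function names, the backup file name, and the idea of passing the data directory as an argument are all hypothetical, and it should only ever be run while BOINC is stopped. It also refuses to touch a file whose tags are unbalanced, which is the condition Keith reported finding.

```python
from __future__ import annotations

import re
import shutil
from pathlib import Path

# Matches the whole active-task element, including every task listed inside it.
ACTIVE_SET = re.compile(r"<active_task_set>.*?</active_task_set>", re.DOTALL)


def clear_active_tasks(xml_text: str) -> str:
    """Return the client_state.xml text with the active task list emptied."""
    opens = xml_text.count("<active_task_set>")
    closes = xml_text.count("</active_task_set>")
    if opens != closes:
        # Keith's file had a closing tag with no opening tag; stop and inspect by hand.
        raise ValueError(f"unbalanced active_task_set tags: {opens} open, {closes} close")
    return ACTIVE_SET.sub("<active_task_set>\n</active_task_set>", xml_text)


def delete_numbered_dirs(slots_dir: Path) -> list[str]:
    """Delete the numbered folders inside the slots folder; return their names."""
    removed = []
    for entry in sorted(slots_dir.iterdir()):
        if entry.is_dir() and entry.name.isdigit():
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed


def empty_dir(folder: Path) -> int:
    """Delete every item inside folder (for example ~/.nv/ComputeCache)."""
    count = 0
    for entry in folder.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        count += 1
    return count


def recover(data_dir: Path) -> None:
    """Apply the client_state.xml and slots steps to a stopped BOINC data directory."""
    state = data_dir / "client_state.xml"
    # Hypothetical backup name; keep a copy before rewriting the file.
    shutil.copy2(state, data_dir / "client_state.xml.bak")
    state.write_text(clear_active_tasks(state.read_text()))
    delete_numbered_dirs(data_dir / "slots")
```

Nothing runs on import. With BOINC stopped, one would call something like `recover(Path("/path/to/BOINC"))` with the real data directory substituted, and `empty_dir(Path.home() / ".nv" / "ComputeCache")` for the cache suggestion. Raising an error on unbalanced tags, instead of guessing where the missing tag belongs, leaves that repair to a human, which is how Keith fixed his file.
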
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.