Setting up Linux to crunch CUDA90 and above for Windows users

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 3452
Credit: 195,278,888
RAC: 548,519
United States
Message 2002857 - Posted: 16 Jul 2019, 12:03:58 UTC - in response to Message 2002846.  

. . S O S

. . OK I must have missed something (as usual)

. . In preparation for updating the i5 rig with the GTX970s to 0.98b-101, I upgraded the video drivers first. I am guessing I missed a caveat about it not working with 970s because, while the system fired up, BOINC dumped my entire cache when I started it. According to the event log, BOINC did not see the GPU at all :( This is strange because the system itself did; I could run nvidia-smi with the normal results.

. . The real drama is that when I tried to take it back from driver 430 to driver 410 it failed. I even tried taking it back to driver 390 but that also failed. When I reboot I am now at the 'bouncing login screen'. Going back to a previous version also doesn't work. As far as I remember, the old method of changing the video drivers from the terminal screen in this situation no longer works either ...

HELP!

Stephen

? ?


I don't really have a good idea, but what happens if you take it back to the Nouveau-type drivers? The default?

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 2002857

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002858 - Posted: 16 Jul 2019, 12:07:31 UTC - in response to Message 2002856.  

in what way did you "go back"? did you just install the old drivers over the new?
in my experience it's best to purge the previous drivers first
sudo apt purge *nvidia*
(keep the asterisks)
then reinstall the new version you want
sudo apt install nvidia-driver-410

then reboot


. . That is probably what I should have done, but now I am unable to do that because I cannot get Linux to run.

Stephen

:(
ID: 2002858

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002859 - Posted: 16 Jul 2019, 12:08:58 UTC - in response to Message 2002857.  

I don't really have a good idea, but what happens if you take it back to the Nouveau-type drivers? The default?
Tom


. . I can't get past the bouncing login screen :(

Stephen

:(
ID: 2002859

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 3452
Credit: 195,278,888
RAC: 548,519
United States
Message 2002862 - Posted: 16 Jul 2019, 12:13:13 UTC - in response to Message 2002859.  

I don't really have a good idea, but what happens if you take it back to the Nouveau-type drivers? The default?
Tom


. . I can't get past the bouncing login screen :(

Stephen

:(


Try recovery mode? I think you hold down a Shift key during boot. Then you can use the command line to clean out the drivers and start installing them again.

Tom
I will stop procrastinating tomorrow.
\\// Live Long & Prosper (starting tomorrow ;)
ID: 2002862

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002864 - Posted: 16 Jul 2019, 12:25:12 UTC - in response to Message 2002862.  

. . I can't get past the bouncing login screen :(
Stephen

Try recovery mode? I think you hold down a Shift key during boot. Then you can use the command line to clean out the drivers and start installing them again.
Tom


. . The first time this happened to me (a couple of years ago), TBar told me a process to correct it by invoking a system terminal mode and running through a list of commands. But when it happened again several months later, because a system update made the video drivers incompatible in some way, the process no longer worked. I can give it a try since that machine is presently just a paperweight with flashing lights :(

Stephen

:(
ID: 2002864

Ian&Steve C.
Joined: 28 Sep 99
Posts: 1718
Credit: 696,143,009
RAC: 2,438,330
United States
Message 2002866 - Posted: 16 Jul 2019, 12:40:27 UTC

Boot to recovery mode by holding down the left shift key while it’s booting. Sometimes it can boot too fast before you get it so try just spamming the shift key right after the POST messages. Then run the two commands I listed.
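
For anyone following along, the purge-and-reinstall sequence quoted earlier on this page can be sketched as a small dry-run script. This is only a sketch under stated assumptions: `plan_driver_reinstall` is a made-up helper name, it prints the commands rather than running them, and the package name follows the `nvidia-driver-<version>` form quoted in the thread. Older Ubuntu releases may use a different package name, so check what your release actually offers before running anything.

```shell
#!/bin/sh
# Dry-run sketch: print the purge/reinstall steps instead of executing them.
# plan_driver_reinstall is a hypothetical helper, not part of any package.
plan_driver_reinstall() {
    version="$1"
    # Quote the pattern so the shell passes the asterisks through to apt.
    echo "sudo apt purge '*nvidia*'"
    echo "sudo apt install nvidia-driver-${version}"
    echo "sudo reboot"
}

plan_driver_reinstall 410
```

Printing the plan first makes it easy to read the commands over from the recovery console before typing them for real.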
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2002866

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002871 - Posted: 16 Jul 2019, 13:10:40 UTC - in response to Message 2002866.  

Boot to recovery mode by holding down the left shift key while it’s booting. Sometimes it can boot too fast before you get it so try just spamming the shift key right after the POST messages. Then run the two commands I listed.


. . I entered "console mode" with ctrl-alt-F1, stopped lightdm, did the purge and tried to install nvidia-410 but it keeps telling me there is a dependency problem. Apparently nvidia-418 has loaded a module that is not purging when I run that command yet prevents a lower level of the module required by nvidia-410 from installing and so the install fails. The machine is presently running on nouveau drivers but I cannot change to anything else. Every attempt fails.

:(

Stephen
ID: 2002871

TBar
Volunteer tester
Joined: 22 May 99
Posts: 4806
Credit: 550,286,308
RAC: 1,265,804
United States
Message 2002874 - Posted: 16 Jul 2019, 17:36:02 UTC - in response to Message 2002871.  

Well, the 970 is one of the GPUs that doesn't see any improvement with the CUDA 10.x Apps. You'd probably be better off using the CUDA 9.0 App with the Repository CUDA 9.0 driver.
Try running sudo apt-get autoremove, then try installing the Repository driver from Additional Drivers. CUDA 9 needs at least 384.
Sounds like a perfect time to Upgrade to at least Ubuntu 16.04 and the All-In-One to me...
ID: 2002874

Ian&Steve C.
Joined: 28 Sep 99
Posts: 1718
Credit: 696,143,009
RAC: 2,438,330
United States
Message 2002875 - Posted: 16 Jul 2019, 17:36:56 UTC - in response to Message 2002871.  
Last modified: 16 Jul 2019, 17:41:50 UTC

Well which dependency isn’t removing? It should be telling you.

Edit- oh sorry I didn’t see that you were running something other than Ubuntu 18.04. Which OS are you using? And why such an old BOINC version? You’re a recent convert to Linux right, and long time Windows user. I don’t see why you wouldn’t go with the easiest and tried and true setup with Ubuntu 18.04 + AIO package if you were moving to something new
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2002875

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9621
Credit: 884,406,217
RAC: 1,732,322
United States
Message 2002919 - Posted: 17 Jul 2019, 0:01:51 UTC - in response to Message 2002871.  

Apparently nvidia-418 has loaded a module that is not purging when I run that command yet prevents a lower level of the module required by nvidia-410 from installing and so the install fails.

This just cropped up as an issue with the latest updates on the 418 series drivers. The module that is causing the issue is xserver-xorg-video-nvidia-418. It is hanging around like a bad penny.

The solution is a complete purge of nvidia drivers and installation of the nouveau drivers to get back to basics, then reapply the 430 series drivers. The 418 series drivers are being supplanted by the 430 series drivers in the distros.
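
One way to check whether a purge really cleared everything is to list whatever packages with "nvidia" in the name are still known to dpkg. The following is a minimal sketch, assuming a Debian/Ubuntu system with `dpkg` and `awk` available; `list_nvidia_packages` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: filter `dpkg -l`-style output down to packages whose name mentions nvidia.
# list_nvidia_packages is a hypothetical helper name.
list_nvidia_packages() {
    # Column 1 is the package state (ii = installed, rc = removed but config files kept),
    # column 2 is the package name.
    awk '$2 ~ /nvidia/ { print $1, $2 }'
}

# Only query the package database where dpkg exists.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -l | list_nvidia_packages
fi
```

If a package such as xserver-xorg-video-nvidia-418 still shows up after a purge, that is the leftover to deal with before installing another driver series.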
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 2002919

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002931 - Posted: 17 Jul 2019, 0:57:58 UTC - in response to Message 2002874.  
Last modified: 17 Jul 2019, 1:02:37 UTC

Well, the 970 is one of the GPUs that doesn't see any improvement with the CUDA 10.x Apps. You'd probably be better off using the CUDA 9.0 App with the Repository CUDA 9.0 driver.
Try running sudo apt-get autoremove, then try installing the Repository driver from Additional Drivers. CUDA 9 needs at least 384.
Sounds like a perfect time to Upgrade to at least Ubuntu 16.04 and the All-In-One to me...


. . OK then I'm screwed, because I have done that and it fails every time. If there is a magic bullet to kill off the dependency I wish I knew it. Every attempt to install another driver reports that lib32gcc1 is missing and that this module will not load because of lib32gcc4, put in place by nvidia-418. I have purged twice but this nasty little sucker just hangs around. :(

. . The reason I was trying to upgrade the 970s to 0.98b-101 was not for a speed increase but to make the platforms all uniform. Your posts make it clear this version has benefits on some platforms and detriments on none, which makes it the 'obvious' choice for a standard configuration.

Stephen
ID: 2002931

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002938 - Posted: 17 Jul 2019, 1:17:51 UTC - in response to Message 2002875.  

Well which dependency isn’t removing? It should be telling you.
Edit- oh sorry I didn’t see that you were running something other than Ubuntu 18.04. Which OS are you using? And why such an old BOINC version? You’re a recent convert to Linux right, and long time Windows user. I don’t see why you wouldn’t go with the easiest and tried and true setup with Ubuntu 18.04 + AIO package if you were moving to something new


. . The stubborn dependency is lib32gcc4 put there by nvidia-418 drivers. It stops any install attempt from loading the module lib32gcc1 that they require.

. . Yep, I am definitely a recent convert to Linux (I am a Windows user at heart) because of Petri's special sauce. I have been using it for a little under 2.5 years, which is why I am running Ubuntu 14.04 LTS and such an old version of BOINC. I am the "if it ain't broke don't fix it" kind of guy, which is why everything is still old, but I decided to lift everything up to the current "norm". I cannot claim to be a fan of Linux because it ALWAYS bites me.

. . I tried first on the C2D machine but neither Lubuntu 18.04 nor Ubuntu 18.04 would install on the SSD in that machine to allow me to migrate SETI across. Ubuntu made better progress than Lubuntu but both failed to install, so I am back to the overcrowded 16GB flash drive :( But the strange thing is Windows works very happily with that SSD. That rig is running repository BOINC so I left the upgrade to 0.98b for a later date.

. . I thought the unit with the 970s would be easier but ... not to be.

Stephen

. .
ID: 2002938

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002939 - Posted: 17 Jul 2019, 1:25:08 UTC - in response to Message 2002919.  

Apparently nvidia-418 has loaded a module that is not purging when I run that command yet prevents a lower level of the module required by nvidia-410 from installing and so the install fails.

This just cropped up as an issue with the latest updates on the 418 series drivers. The module that is causing the issue is xserver-xorg-video-nvidia-418. It is hanging around like a bad penny.
The solution is a complete purge of nvidia drivers and installation of the nouveau drivers to get back to basics, then reapply the 430 series drivers. The 418 series drivers are being supplanted by the 430 series drivers in the distros.


. . I actually went straight for the 430 drivers and only tried to go back to 410 when there was a problem. So I was a little surprised when there was an issue caused by 418 drivers. I have actually purged twice and it is currently bootable with the nouveau drivers, but I still have the same problem when trying to install any nvidia drivers. I even tried going back to the 384 drivers it had previously been running.

Stephen

? ?
ID: 2002939

TBar
Volunteer tester
Joined: 22 May 99
Posts: 4806
Credit: 550,286,308
RAC: 1,265,804
United States
Message 2002944 - Posted: 17 Jul 2019, 1:44:48 UTC - in response to Message 2002931.  

Well, the 970 is one of the GPUs that doesn't see any improvement with the CUDA 10.x Apps. You'd probably be better off using the CUDA 9.0 App with the Repository CUDA 9.0 driver.
Try running sudo apt-get autoremove, then try installing the Repository driver from Additional Drivers. CUDA 9 needs at least 384.
Sounds like a perfect time to Upgrade to at least Ubuntu 16.04 and the All-In-One to me...


. . OK then I'm screwed, because I have done that and it fails every time. If there is a magic bullet to kill off the dependency I wish I knew it. Every attempt to install another driver reports that lib32gcc1 is missing and that this module will not load because of lib32gcc4, put in place by nvidia-418. I have purged twice but this nasty little sucker just hangs around. :(

. . The reason I was trying to upgrade the 970s to 0.98b-101 was not for a speed increase but to make the platforms all uniform. Your posts make it clear this version has benefits on some platforms and detriments on none, which makes it the 'obvious' choice for a standard configuration.

Stephen
If you know WHERE the file is located you simply track it down, open the last folder as Admin, then Delete the offending file.
Your system has been borked for a while, I'd still suggest Upgrading to a newer version.
A bit of advice already in this thread, Tell People Which Driver You Are Using. The Uninstall Commands are Different depending on which Driver you are using.
For the Driver Downloaded from nVidia the command is a simple sudo nvidia-uninstall. For the Repository & PPA driver you use the purge command.
The nvidia-uninstall command Will remove All nVidia components installed by their driver. Do Not Try to remove the driver from nVidia by purging, as you see it doesn't work and it kills the simple nvidia-uninstall command so you are borked. I've gone from 418 to 410 a couple of times using the driver from nVidia in Ubuntu 18.04 and 19.04, no problems here.
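
The distinction above could be checked with a short script before removing anything. This is a rough sketch, not a definitive test: `detect_nvidia_source` and `removal_hint` are hypothetical helper names, and the heuristics are assumptions (that a driver installed from nVidia's own download leaves an `nvidia-uninstall` command on the PATH, and that a Repository or PPA driver shows up as an installed `nvidia-*` package in dpkg).

```shell
#!/bin/sh
# Sketch: guess how the NVIDIA driver was installed. The heuristics are assumptions.
detect_nvidia_source() {
    if command -v nvidia-uninstall >/dev/null 2>&1; then
        # Assumed to be present only after installing the driver downloaded from nVidia.
        echo runfile
    elif command -v dpkg >/dev/null 2>&1 &&
         dpkg -l 'nvidia-*' 2>/dev/null | grep -q '^ii'; then
        # At least one nvidia-* package is fully installed via the package manager.
        echo package
    else
        echo none
    fi
}

# Map the detected source to the removal command named in the thread.
removal_hint() {
    case "$1" in
        runfile) echo "sudo nvidia-uninstall" ;;
        package) echo "sudo apt purge '*nvidia*'" ;;
        *)       echo "no NVIDIA driver detected" ;;
    esac
}

removal_hint "$(detect_nvidia_source)"
```

The script only prints a suggestion. It is still worth telling people which driver you are using, since a mixed or half-removed install can fool a check like this.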
ID: 2002944

Ian&Steve C.
Joined: 28 Sep 99
Posts: 1718
Credit: 696,143,009
RAC: 2,438,330
United States
Message 2002947 - Posted: 17 Jul 2019, 1:59:31 UTC - in response to Message 2002938.  

I have found that Ubuntu 18.04 is picky about finding the SSD when you have the boot environment not configured properly for the hardware. Mismatches between Legacy/UEFI and SATA mode settings (AHCI/RAID/etc) and how you actually booted the install media all matter. I’ve had the best luck making sure the SATA mode is on AHCI, the boot mode to UEFI, and then boot the UEFI install image (not the Legacy one).

You may have to play around with the settings to get it to load up properly depending on your exact hardware. I really can’t remember if C2D stuff was compatible with UEFI or not.

But once you get it all sorted, I think you should definitely just wipe the system and start over fresh with at least 16.04 or later.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2002947

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9621
Credit: 884,406,217
RAC: 1,732,322
United States
Message 2002948 - Posted: 17 Jul 2019, 2:03:11 UTC

TBar's comment about removing Nvidia drivers being different depending on source origin is an important piece of information to remember. The official Nvidia .run installer creates an uninstall list file just like Windows drivers and programs do. You have to use the installer's provided uninstall command for the official Nvidia closed-source proprietary drivers. The open source Nvidia drivers from the ppa use a different method of removing the drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 2002948

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002951 - Posted: 17 Jul 2019, 2:48:26 UTC - in response to Message 2002944.  

If you know WHERE the file is located you simply track it down, open the last folder as Admin, then Delete the offending file.

. . Is there a Linux command to locate a file anywhere on a system?

Your system has been borked for a while, I'd still suggest Upgrading to a newer version.

. . Actually it has been working quite well until I try to change something. So if I can have this sort of problem changing a video driver how much damage can I cause trying to change the OS itself?

A bit of advice already in this thread, Tell People Which Driver You Are Using. The Uninstall Commands are Different depending on which Driver you are using.
For the Driver Downloaded from nVidia the command is a simple sudo nvidia-uninstall. For the Repository & PPA driver you use the purge command.
The nvidia-uninstall command Will remove All nVidia components installed by their driver. Do Not Try to remove the driver from nVidia by purging, as you see it doesn't work and it kills the simple nvidia-uninstall command so you are borked. I've gone from 418 to 410 a couple of times using the driver from nVidia in Ubuntu 18.04 and 19.04, no problems here.

. . Sorry for not making that clear, this all ensued from your notice that the latest drivers are now in the repository. I have avoided using nvidia's own drivers ever since the first 'bouncing logon screen' problem. I was running nvidia-384 previously and decided to make 0.98b-101 the new platform for my Linux rigs, so I upgraded to nvidia-430. All from the repository/ppa.

Stephen

:(
ID: 2002951

Stephen "Heretic" Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 4624
Credit: 144,159,869
RAC: 236,394
Australia
Message 2002952 - Posted: 17 Jul 2019, 3:03:24 UTC - in response to Message 2002947.  

I have found that Ubuntu 18.04 is picky about finding the SSD when you have the boot environment not configured properly for the hardware. Mismatches between Legacy/UEFI and SATA mode settings (AHCI/RAID/etc) and how you actually booted the install media all matter. I’ve had the best luck making sure the SATA mode is on AHCI, the boot mode to UEFI, and then boot the UEFI install image (not the Legacy one).

You may have to play around with the settings to get it to load up properly depending on your exact hardware. I really can’t remember if C2D stuff was compatible with UEFI or not.

But once you get it all sorted, I think you should definitely just wipe the system and start over fresh with at least 16.04 or later.


. . First, may I ask, should I address you as Ian or Steve? I find it awkward always typing both ... ??

. . Yes, I found that problem when trying to get Linux working on my Ryzen rig, but my settings must be different from those you used. I found that if I booted the install from the UEFI image it always failed; I had to boot from the legacy image to get it to work. But other problems that persist on the installation, mainly the disappearing ethernet port, are why it is still running Windows/SoG.

. . I don't think the C2D does support UEFI but I will check. So I need to have the SATA mode set to AHCI and the install boot from UEFI if it is supported?

Stephen

<scratching head>
ID: 2002952

Ian&Steve C.
Joined: 28 Sep 99
Posts: 1718
Credit: 696,143,009
RAC: 2,438,330
United States
Message 2002955 - Posted: 17 Jul 2019, 3:33:19 UTC - in response to Message 2002952.  

I’m Ian. Steve was my father who started this account. I took it over for him a few years ago and leave his name for legacy.

If your BIOS is set for Legacy mode, then you may need to boot the Legacy installer. My point was that the configurations need to match. My systems are set up for UEFI and so I boot the UEFI installer. I think in both cases I have the SATA mode set to AHCI. A few of my systems are running 2 SSDs in RAID 1 and hence have the SATA mode set to RAID. But I remember having to do a back and forth dance finding the right combination of boot settings in the BIOS to be able to boot it and have the installer recognize the raid array. But with a single drive it should be simpler.
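
To see which mode a running Linux session was actually booted in (for example, the live session from the install media), a check along these lines may help. It is a minimal sketch: `boot_mode` is a hypothetical helper name, and it relies on the Linux kernel exposing the `/sys/firmware/efi` directory only when it was started through UEFI.

```shell
#!/bin/sh
# Sketch: report how the running Linux system was booted.
# /sys/firmware/efi is present only when the kernel was started via UEFI.
boot_mode() {
    if [ -d /sys/firmware/efi ]; then
        echo "UEFI"
    else
        echo "Legacy BIOS"
    fi
}

boot_mode
```

If the result does not match the boot mode selected in the BIOS settings, that mismatch is the first thing to sort out before blaming the SSD.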
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2002955

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 9621
Credit: 884,406,217
RAC: 1,732,322
United States
Message 2002957 - Posted: 17 Jul 2019, 3:35:17 UTC

My notice is that the Nvidia drivers are now part of the default Debian repos. No need to install the ppa repository. The 430 drivers are now automatically installed in the original OS installation without user intervention. They should just work out of the box from a clean install.

There is a difference between the closed proprietary Nvidia 430 drivers provided in the main distro now and the open source 430 drivers provided by the ppa. They ARE DIFFERENT.
Seti@Home classic workunits:20,676 CPU time:74,226 hours
ID: 2002957

©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.