Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
scocam Send message Joined: 28 Feb 17 Posts: 27 Credit: 15,120,999 RAC: 0 |
Hey All, How are you installing nvidia 375.39 without issue? Each time I install it, I get a login loop until I uninstall. I think I have everything else working correctly. This is the last frustrating piece of the puzzle! Any help would be greatly appreciated. You'll see in my Computers list (https://setiathome.berkeley.edu/show_host_detail.php?hostid=8247189), that it's not currently showing the GPU. Regards, scocam |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote (scocam): Hey All,
. . Did you see my message where I detailed the process for updating the video drivers?
. . It can run through a couple of restarts during the installer phase, but if you are patient it should resolve. It took me a couple of tries the first time I did it. Stephen . |
scocam Send message Joined: 28 Feb 17 Posts: 27 Credit: 15,120,999 RAC: 0 |
Thank you for the quick response, Stephen. I followed your instructions to the letter and everything went smoothly with the exception of the nvidia driver. Stuff like this makes me wonder how I've managed to work with Linux for 15 years! If I get the 375 driver installed, I get the login loop and have to open a tty and uninstall to be able to log in again... but at the moment, that's not working either. I'll get it working at some point but didn't know if anyone else had similar issues. I'll keep working on it. I hope to report back some good news tomorrow. Regards, scocam |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Quote (scocam): Thank you for the quick response, Stephen.
Usually the login loop is caused by the initrd.img not being updated correctly after the driver install. This can be caused by having another driver installed. It appears you were running the 375.39 driver earlier in the day; I suppose this was the Repository driver? Back when the instructions were posted the Repository didn't have the 375 driver. Now that it does, you can just use the driver from the Repository. You can try reinstalling it from Additional Drivers, or from the Console. It is possible to have both the Repository and the Vendor drivers installed at the same time, and that usually causes problems. I would drop back into the Console, stop lightdm, and run the uninstall commands for both drivers:
sudo apt-get remove --purge nvidia*
sudo ./NVIDIA-Linux-x86_64-375.39.run --uninstall (you need the Vendor version of the driver in your Home folder for this)
sudo apt-get autoremove
Then install the Repository driver:
sudo apt-get install nvidia-375
That should give you a working Repository version of 375.39. |
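The cleanup sequence described in the post above can be sketched as a small dry-run script. This is only a sketch under stated assumptions: it prints the commands rather than running them, it uses `service lightdm stop` (an assumption, swapped in for the Upstart-style `stop lightdm`), and the function name `driver_cleanup_cmds` and the `INSTALLER` variable are hypothetical names used for illustration. The glob is quoted so that apt-get, not the shell, interprets it.

```shell
#!/bin/sh
# Sketch only: prints the cleanup steps from the post above; nothing is executed.
# Assumptions (not from the thread): `service` is used to stop lightdm, and the
# function name and INSTALLER variable are illustrative.

INSTALLER="${INSTALLER:-$HOME/NVIDIA-Linux-x86_64-375.39.run}"

driver_cleanup_cmds() {
    # Stop the display manager before touching the driver.
    echo "sudo service lightdm stop"
    # Quote the glob so apt-get, not the shell, interprets it.
    echo "sudo apt-get remove --purge 'nvidia*'"
    # Uninstall the vendor (.run) driver only if its installer is present.
    if [ -f "$INSTALLER" ]; then
        echo "sudo sh \"$INSTALLER\" --uninstall"
    fi
    echo "sudo apt-get autoremove"
    # Install the repository driver, then reboot afterwards.
    echo "sudo apt-get install nvidia-375"
    echo "sudo reboot"
}

driver_cleanup_cmds
```

Reviewing the printed list before typing the commands at the console keeps the destructive steps under manual control.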
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote (scocam): If I get the 375 driver installed, I get the login loop and have to open a tty and uninstall to be able to log in again... but at the moment, that's not working either. I'll get it working at some point but didn't know if anyone else had similar issues.
Quote (TBar): Usually the login loop is caused by the initrd.img not being updated correctly after the driver install. This can be caused by having another driver installed. It appears you were running the 375.39 driver earlier in the day, I suppose this was the Repository driver? Back when the instructions were posted the Repository didn't have the 375 driver, now that they do, you can just use the driver from the Repository. You can try reinstalling it from Additional drivers, or from the Console. It is possible to have both the Repository and the Vendor drivers installed at the same time, and that usually causes problems. I would drop back into the Console, stop lightdm, and run the uninstall commands for both drivers;
. . AHA! That clinking sound you may have just heard was the penny dropping. I was misinterpreting his problem as the cycling of the video driver install procedure, which can go round and round several times before the install completes. But now I realise he is referring to the problem I have when I accept the Linux updates: they get to the login screen and accept the password but keep returning to the login screen instead of actually loading the desktop. So now I think I know why that is happening: the newer versions are loading the 375 driver and there is a conflict. So how do I amend the latest of the updates to remove the extraneous video drivers when I cannot log into it? I have abandoned revisions 0.70 and 0.72 and simply manually select revision 0.66 or 0.67 (different on each machine), which loads and runs AOK.
. . I have tried to work out how to fix them by logging into the recovery option but I have no idea what to do with the further options it offers. Stephen ?? |
scocam Send message Joined: 28 Feb 17 Posts: 27 Credit: 15,120,999 RAC: 0 |
Got it, I think. Among other minor things, I was having issues installing the display driver due to "Secure Boot" being enabled in the mobo BIOS. I didn't even connect the two until I started picking through the log files, and then it dawned on me that this Asus mobo has a Secure Boot section that is enabled, by default, for Windows installs. This simple little oversight ate my lunch today. I'm currently crunching WUs, though I'll do some additional testing tomorrow. Also, I had to RMA one of the 1070s due to a leaky fitting/connection that I found prior to installing it into the case... I guess I should mention that I run hybrids. Thanks again for your help, TBar and Stephen (and Petri)! Couldn't have done it without your patience and direction! Regards, scocam |
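For anyone chasing the same symptom, a quick way to see whether Secure Boot is active from within Linux is sketched below. It assumes the `mokutil` utility is installed (the thread does not mention it), and the helper name `sb_state_from_text` is hypothetical. If `mokutil` is missing, the setting can still be checked in the firmware setup screen.

```shell
#!/bin/sh
# Sketch: report whether Secure Boot appears to be enabled.
# Assumption: the `mokutil` utility is installed; the helper name is illustrative.

sb_state_from_text() {
    # Classify a line such as "SecureBoot enabled" or "SecureBoot disabled".
    case "$1" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

if command -v mokutil >/dev/null 2>&1; then
    sb_state_from_text "$(mokutil --sb-state 2>/dev/null)"
else
    echo "mokutil not found; check the Secure Boot setting in the firmware setup"
fi
```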
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Quote (Stephen "Heretic"): . . AHA! That clinking sound you may have just heard was the penny dropping. I was misinterpreting his problem as the cycling of the video driver install procedure that can go round and round several times before the install completes. But now I realise he is referring to the problem I have when I accept the Linux updates which get to the login screen and accept the password but keep returning to the login screen instead of actually loading your desktop. So now I think I know why that is happening, the newer versions are loading the 375 driver and there is a conflict. So how do I amend the latest of the updates to remove the extraneous video drivers when I cannot log into it. I have abandoned revisions 0.70 and 0.72 and simply manually select revision 0.66 or 0.67 (different on each machine) which loads and runs AOK.
The same procedure should work. You should be able to drop into the Console from the login window with Ctrl+Alt+F1, then just follow the same commands:
sudo stop lightdm
sudo apt-get remove --purge nvidia*
sudo ./NVIDIA-Linux-x86_64-375.39.run --uninstall (you need the Vendor version of the driver in your Home folder for this)
sudo apt-get autoremove
Then install the Repository driver:
sudo apt-get install nvidia-375
That should give you a working Repository version of 375.39. If you're using the Vendor driver, it's best to keep the installer in your Home folder so you don't have to find it to run the uninstall command from the Console. |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote (TBar): The same procedure should work. You should be able to drop into the Console from the Log in window, Ctrl+Alt+F1
. . Just to be clear, when you say my home directory you mean /home/<username>, not just /home, right?
. . And should I use sudo reboot at the end to get back into Linux to do the sudo apt-get install nvidia-375? Stephen ? |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . I have set Max CPU load to 75% to free some CPU time for CUDA80; it has increased the GPU usage by a small amount, and it is sitting in the mid to high 90s now.
. . Based on the success of that I have now completely suspended CPU crunching and made both CPU cores fully available to CUDA80. With -bs off the total CPU usage is running about 60% and the power consumed has dropped by about 10 to 15W; it is now where it was crunching on SoG or with -bs on. But the run times are amazing. Blc04 were about 8.5 to 8.75 mins, now 8 to 8.25; NARAs were about 5.25 to 5.5, now consistently about the 4.5 mark, maybe up to 4.75; and VHARs were consistently 2.25, now 2.0. I am sure some people think such gains are trivial but they mean that running in this mode this rig will produce about 25 to 30 more jobs per day, compared to the 10 to 15 I could possibly hope to get from the CPU when crunching. I see that as improved efficiency.
Mon Apr 17 14:34:35 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 0000:01:00.0      On |                  N/A |
| 80%   57C    P0    59W /  75W |   1564MiB /  4033MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1053    G   /usr/lib/xorg/Xorg                             114MiB |
|    0      3122    G   compiz                                          29MiB |
|    0      9976    C   ...ome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80  1417MiB |
+-----------------------------------------------------------------------------+
. . Having fun! BIG TIME! Stephen :) |
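To see how run-time changes like those above translate into daily task counts, the arithmetic is simply 1440 minutes per day divided by minutes per task. The sketch below assumes one task runs at a time and that a single task type runs all day, which a real mixed workload will not match; the function name `tasks_per_day` is hypothetical and the run times are midpoints of the ranges quoted in the post.

```shell
#!/bin/sh
# Sketch: convert a per-task run time (minutes) into tasks per day for one GPU
# running one task at a time. The function name is illustrative.

tasks_per_day() {
    # 1440 minutes in a day divided by minutes per task, rounded.
    awk -v m="$1" 'BEGIN { printf "%.0f\n", 1440 / m }'
}

# Midpoints of the run times quoted in the post (before -> after).
for pair in "Blc04 8.625 8.125" "NARA 5.375 4.625" "VHAR 2.25 2.0"; do
    set -- $pair
    echo "$1: $(tasks_per_day "$2") -> $(tasks_per_day "$3") tasks/day"
done
```

The actual daily gain depends on the mix of task types the rig receives.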
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . @ TBar . . This has reminded me about a setting you had me change in the config on my first Linux rig that prevented a priority change when running CUDA80 tasks. The original message has been lost in the avalanche and I cannot remember where that change was. I have suspicions that if it is still lowering the priority on my second rig that may be why there is no difference with -bs off? It was worth asking the question at least. Stephen ?? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Quote (TBar): The same procedure should work. You should be able to drop into the Console from the Log in window, Ctrl+Alt+F1
Quote (Stephen "Heretic"): . . Just to be clear when you say my home directory you mean /home/<username> not just /home right?
The concept is to keep things simple. The Home folder is that orange icon near the top of the launcher that says Home Folder when you run the mouse over it. That way, when you log into the console with your username and password, you don't have to change directories to run the driver uninstall command. It is normal to reboot after installing the video driver, not before.
The problem with your Pentium 4 machine is most likely lack of CPU resources. Have you tried running just one GPU and seeing if the CPU load will increase? Try suspending all but one task so only one GPU is running and see if the CPU load for that one task will increase. The option you added to cc_config.xml does just the opposite of what you suggested: it stops the priority from being lowered and, if anything, should raise the CPU load of the task. I doubt that option has anything to do with lower CPU load. Try running just one GPU and see what that does. To run more than one GPU that uses 100% CPU you really need at least a Core2 Quad CPU. |
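A quick way to confirm where the console login lands, and whether the vendor installer is actually there, is sketched below. The function name `find_vendor_installers` is hypothetical; the file-name pattern follows the installer name used earlier in the thread.

```shell
#!/bin/sh
# Sketch: list vendor driver installers found in a directory (default: $HOME).
# The function name is illustrative.

find_vendor_installers() {
    dir="${1:-$HOME}"
    for f in "$dir"/NVIDIA-Linux-*.run; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] && echo "$f"
    done
    return 0
}

echo "Home folder: $HOME"
find_vendor_installers
```

If nothing is listed, the uninstall command would need the full path to wherever the installer was saved.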
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote (Stephen "Heretic"): . . Just to be clear when you say my home directory you mean /home/<username> not just /home right?
. . On my rigs running the mouse over that icon just says Files, but when you open it you can select the folder it identifies as Home. But that is the place I thought.
. . I tried that and surprise, surprise . . . . ... still no change :) This machine likes to be contrary. CPU use dropped to about 43% (half of the level when running both GPUs) ... with just one GPU crunching -
Tue Apr 18 08:44:39 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39                 Driver Version: 375.39                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 0000:01:00.0      On |                  N/A |
| 75%   54C    P2    79W / 120W |   1874MiB /  6068MiB |     84%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 106...  Off  | 0000:02:00.0     Off |                  N/A |
| 55%   37C    P0    28W / 120W |      3MiB /  6072MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0       977    G   /usr/lib/xorg/Xorg                             113MiB |
|    0      2142    G   compiz                                          29MiB |
|    0      2545    C   ...ome_x41p_zi3k+_x86_64-pc-linux-gnu_cuda80  1727MiB |
+-----------------------------------------------------------------------------+
. . Even if it is not the fix for this issue, can you remind me of that setting to fix the running priority? I would like to have both rigs with the same settings. Stephen . |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Quote (Stephen "Heretic"): . . Even if it is not the fix for this issue can you remind me of that setting to fix the running priority? I would like to have both rigs with the same settings.
It's cc_config <process_priority> |
Wiggo Send message Joined: 24 Jan 00 Posts: 36396 Credit: 261,360,520 RAC: 489 |
TBar, considering that Stephen's Pentium D (it does have 2 full cores) is sitting in 77th spot by RAC it can't be doing too badly. ;-) Cheers. |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
LOL, I just looked at mine, I have spot 18 and 21 .... with the SAME computer LOL |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote (Stephen "Heretic"): . . Even if it is not the fix for this issue can you remind me of that setting to fix the running priority? I would like to have both rigs with the same settings.
Quote (Brent Norman): It's cc_config <process_priority>
. . Ta muchly Brent. Stephen :) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote (Brent Norman): LOL, I just looked at mine, I have spot 18 and 21 .... with the SAME computer LOL
. . A split personality there then ? :) Stephen LOL |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Yes, the old Dual core is working fine with the Default settings I set just for machines such as his. Apparently he's not happy with that; he wants to suck every bit of life from the old CPU. That GPU App without the Blocking Sync setting will attempt to pull up to 110% from a normal Core2 Quad CPU. That means both GPUs will at times try to pull 220% from his CPU that only has 200%, not to mention what the system takes. It will not work very well. Best to just leave it at the Default setting and be happy it works as well as it does.
Quote (TBar): The Option you added to cc_config.xml does just the Opposite of what you suggested, it Stops the Priority from being lowered...
It's gotta be the Option in cc_config dealing with No Priority Change... from that quote.
My older systems call the icon Files, but show it as Home when opened. If your system is calling it Files instead of Home, then you are running a strange version of the 4.4 Kernel. My 4.4 Kernel calls it Home Folder. It might be wise to attempt to fix the problems with the newer kernel before other whims. |
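For reference, a cc_config.xml fragment with a no-priority-change option might look like the sketch below. It assumes the option being discussed is BOINC's `<no_priority_change>` (the thread only describes it, and another post names `<process_priority>`), and it writes to a scratch file, not to the BOINC data directory; the `SAMPLE` variable and file name are hypothetical.

```shell
#!/bin/sh
# Sketch: write a sample cc_config.xml fragment to a scratch file for review.
# Assumption: <no_priority_change> is the option discussed above; the output
# path is illustrative and is NOT the BOINC data directory.

SAMPLE="${SAMPLE:-./cc_config.sample.xml}"

cat > "$SAMPLE" <<'EOF'
<cc_config>
  <options>
    <!-- 1 = run tasks at the client's own priority instead of lowering it -->
    <no_priority_change>1</no_priority_change>
  </options>
</cc_config>
EOF

echo "Wrote $SAMPLE"
```

After merging such a fragment into the real cc_config.xml, the client has to re-read its config files (for example with `boinccmd --read_cc_config`, or by restarting BOINC) before the change takes effect.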
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Quote (Stephen "Heretic"): ...But now I realise he is referring to the problem I have when I accept the Linux updates which get to the login screen and accept the password but keep returning to the login screen instead of actually loading your desktop. So now I think I know why that is happening, the newer versions are loading the 375 driver and there is a conflict. So how do I amend the latest of the updates to remove the extraneous video drivers when I cannot log into it...
The most logical explanation is that you didn't register the driver module when you installed the driver from nVidia. During the install of the Vendor driver you will be asked if you want to register the module so it will be applied to future kernel updates. If you don't register the module, the next kernel update will break the driver. That would be my first guess. I always choose to register the driver module. To fix that, just follow the previous procedure to install the video driver from the Console. |
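Whether a driver module is registered for rebuilds on kernel updates can be checked roughly as sketched below. This assumes the installer's "register the module" prompt refers to DKMS and that the `dkms` tool is installed; the helper name `has_nvidia_dkms` is hypothetical.

```shell
#!/bin/sh
# Sketch: check whether an NVIDIA module is registered with DKMS.
# Assumption: the "register the module" prompt refers to DKMS; the helper
# name is illustrative.

has_nvidia_dkms() {
    # Succeeds if any line of the given `dkms status` text starts with "nvidia".
    printf '%s\n' "$1" | grep -q '^nvidia'
}

if command -v dkms >/dev/null 2>&1; then
    if has_nvidia_dkms "$(dkms status 2>/dev/null)"; then
        echo "An nvidia module is registered with DKMS"
    else
        echo "No nvidia module registered with DKMS"
    fi
else
    echo "dkms not installed"
fi
```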
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Quote (TBar): Yes, the old Dual core is working fine with the Default settings I set Just for machines such as his. Apparently he's not happy with that, he wants to suck every bit of life from the old CPU.
. . YEP! LOL ..... Well not quite but almost :)
Quote (TBar): That GPU App without the Blocking Sync setting will attempt to pull up to 110% from a normal Core2 Quad CPU. That means both GPUs will at times try to pull 220% from his CPU that only has 200%, not to mention what the system takes. It will not work very well. Best to just leave it at the Default setting and be happy it works as well as it does.
. . This is the part that has me intrigued. Nothing so far makes much or any difference to the way it behaves. With -bs in or out, that priority change allowed or blocked, it just plods along at the same level. And it shows no sign of being a hardware limitation either, despite logic saying it almost certainly is. The GPUs are running light at about the 80% mark, the CPU/s about the same. I don't have GPU-Z for Linux so I cannot be sure about the PCIe bus, but I suspect it is not being overly driven either, and the RAM is not being overcommitted. Everything seems happy to just run at about 80% of capacity and nothing I change makes any difference. Most bizarre. While on the Core2 Duo it is as you say: with -bs off and CPU crunching suspended, overall CPU use is about 60%, which would equate to 120% of a single core. Which is not a problem, having only the one GPU, and I think I like it that way. I just have to clear out the CPU cache and then I will make it a GPU only cruncher. But the Pentium D is an enigma ...
Quote (TBar): The Option you added to cc_config.xml does just the Opposite of what you suggested, it Stops the Priority from being lowered...It's gotta be the Option in cc_config dealing with No Priority Change... from that quote.
. . Yes it is, Brent steered me to it. But I am not sure why you are hung up on that part; if you will refer back to my original message you may notice that is exactly what I did say. I was asking for the option you had me change (or check) that prevents the priority from being changed.
Quote (TBar): My older systems call the Icon Files, but shows it as Home when opened. If your system is calling it Files instead of Home, then you are running a strange version of the 4.4 Kernel. My 4.4 Kernel calls it Home Folder. It might be wise to attempt to fix the problems with the newer kernel before other whims.
. . In my world anything is possible :). It is the same on both Linux boxes but then they are both built from the same installation version. I have so far demurred at doing an unnecessary restart but tomorrow, when the outage has starved that rig of work, I will take the opportunity to kill the proverbial two birds. I will restore -bs (just to settle your qualms) and try to sort out the video driver clash. With luck by tomorrow afternoon it will be running a very happy version generic 0.72. It will be interesting if that icon designation changes.
. . Just for the record. The original installation version was 0.31, which updated at install to 0.66 on the Core2 Duo and 0.67 on the Pent-D. Both systems are rejecting the 0.70 and 0.72 updates, but if I can resolve that on the Pent-D I will do so on the C2D as well.
. . Thanks guys Stephen :) |