Setting up Linux to crunch CUDA90 and above for Windows users
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
I've just been reading posts over on the Nvidia Developer forums, and some light has dawned on why so many people are having trouble getting their GPUs detected when they install the 396.51 drivers. It seems the developers made a configuration mistake in the driver releases after 396.24: the latest drivers require NUMA support, which will not work on 32-bit systems. The CONFIG_NUMA requirement wasn't intentional and is being tracked in bug 2316155. There are also posts about losing monitor output after installing the drivers and rebooting, caused by xorg.conf not assigning any monitor for screen output. I have seen this myself every time with the drivers after 396.24, which were the last good ones. There is also a post about a login boot loop on an Ubuntu 16.04 system after installing the 396.51 drivers; I fought this one myself the night before last. The current short-term release 396.51 drivers are pretty buggy, to say the least, a lot more than usual. The last good short-term release driver was 396.24, and if you want to be entirely safe, stick with the current long-term release 390.77 drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association)
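Whether the running kernel was built with NUMA support can be read from its build configuration. A minimal Python sketch, assuming the kernel config is installed as /boot/config-<release> (the Ubuntu layout; other distros may only provide /proc/config.gz). The function names are hypothetical:

```python
import platform
from pathlib import Path
from typing import Optional


def kernel_config_value(config_text: str, option: str) -> Optional[str]:
    """Return an option's value ("y", "m", ...) from kernel config text, or None if unset."""
    prefix = option + "="
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith(prefix):
            return line[len(prefix):]
    # Options that are off appear as "# CONFIG_FOO is not set", or not at all.
    return None


def running_kernel_has_numa() -> Optional[bool]:
    """True/False for the running kernel, or None if its config file isn't found."""
    # Assumption: Ubuntu-style /boot/config-$(uname -r); platform.release() gives uname -r.
    config_path = Path("/boot") / f"config-{platform.release()}"
    if not config_path.is_file():
        return None
    text = config_path.read_text(errors="replace")
    return kernel_config_value(text, "CONFIG_NUMA") == "y"


if __name__ == "__main__":
    has_numa = running_kernel_has_numa()
    if has_numa is None:
        print("kernel config file not found")
    else:
        print("CONFIG_NUMA=y" if has_numa else "CONFIG_NUMA is not enabled")
```

Matching on the full "CONFIG_NUMA=" prefix keeps related options such as CONFIG_NUMA_BALANCING from being mistaken for it.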
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
That just goes to show you how different people's systems can be. I have three systems running 396.45 and 396.51, and I'm not having any of those problems. I downloaded the driver from nVidia and installed it from Recovery Mode. It's been working fine... so far. One of them is even on Ubuntu 14.04.1, where I built the latest Special App. No problems with 16.04 either.
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Both Zalster's and JBird's machines have had the swapped PCIe BusID issue after running the nvidia-xconfig cool-bits tweak, same as mine. I never had the issue until the 396.45 and 396.51 driver installations.
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
OK, so I'm running 396.51 and looking at the Server Setting with the cool-bits tweak. What should I be looking for to find this bug? I have two 750 Tis in this machine, and it's running Petri's latest App. As far as I can tell, the slot assignments are correct. I have to increase the top card's fan because it's running much hotter than the lower card; check. I just checked the 3-GPU machine, and it appears correct too.
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
[Quote from TBar] "OK, so I'm running 396.51 and looking at the Server Setting with the cool-bits tweak."
It only happens on the first installation of the Nvidia drivers, replacing the Nouveau drivers. After rebooting into the Nvidia drivers, make a backup copy of xorg.conf. Run the cool-bits tweak and you will reboot to a blank monitor. Drop to recovery mode, look at the new xorg.conf with cool-bits enabled, and compare its PCIe BusIDs with your backup copy. Typically the first and last cards have swapped PCIe BusIDs:
[Before] GPU#0 = PCIe BusID 5, GPU#3 = PCIe BusID 10
[After] GPU#0 = PCIe BusID 10, GPU#3 = PCIe BusID 5
The monitor is blank because the display manager is most likely sending screen output to a different card with no monitor attached.
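The before/after BusID comparison can be scripted instead of done by eye. A minimal Python sketch, assuming the nvidia-xconfig layout where each Section "Device" carries an Identifier line and a BusID "PCI:bus:dev:func" line; the script name busid_diff.py and its functions are hypothetical, and it only reports the swap rather than fixing it:

```python
import re
import sys
from pathlib import Path
from typing import Dict, Tuple

# Matches each Section "Device" ... EndSection block; DOTALL lets .*? span lines.
DEVICE_SECTION = re.compile(r'Section\s+"Device"(.*?)EndSection', re.DOTALL)
IDENTIFIER = re.compile(r'^\s*Identifier\s+"([^"]+)"', re.MULTILINE)
BUS_ID = re.compile(r'^\s*BusID\s+"([^"]+)"', re.MULTILINE)  # "# BusID" lines never match


def device_bus_ids(xorg_conf: str) -> Dict[str, str]:
    """Map each Device section's Identifier to its BusID."""
    result = {}
    for body in DEVICE_SECTION.findall(xorg_conf):
        ident = IDENTIFIER.search(body)
        bus = BUS_ID.search(body)
        if ident and bus:
            result[ident.group(1)] = bus.group(1)
    return result


def moved_bus_ids(before: str, after: str) -> Dict[str, Tuple[str, str]]:
    """Return {identifier: (old BusID, new BusID)} for devices whose BusID changed."""
    old, new = device_bus_ids(before), device_bus_ids(after)
    return {
        ident: (old[ident], new[ident])
        for ident in old
        if ident in new and old[ident] != new[ident]
    }


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: busid_diff.py BACKUP_XORG_CONF NEW_XORG_CONF")
    else:
        changes = moved_bus_ids(Path(sys.argv[1]).read_text(), Path(sys.argv[2]).read_text())
        for ident, (was, now) in sorted(changes.items()):
            print(f"{ident}: {was} -> {now}")
        if not changes:
            print("no BusID changes")
```

Run from recovery mode as, for example, python3 busid_diff.py xorg.conf.backup /etc/X11/xorg.conf; a swapped pair shows up as two Device entries trading BusIDs.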
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Ah, so the xorg.conf is not being built correctly then? I have copies of the xorg.conf scattered around each machine; I usually just make sure I have a copy handy if needed. That's really a trivial matter. The driver itself works fine. I have had problems in the past when using a PPA, though. The cool-bits didn't work correctly with the driver from the PPA, whereas it worked correctly with the driver from nVidia installed as per nVidia's instructions. That's why I download the driver from nVidia and install it from Recovery Mode, where the X Server isn't active. I'm really not up to installing a new system just to check the xorg.conf creation. Once the xorg.conf is correct, the driver works fine... on my machines.
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Once you fix the swapped IDs, everything is good. It had me scratching my head for a day, and I kept reverting to the original xorg.conf to get my display back. Then every time I used nvidia-xconfig for the cool-bits tweak, I had no monitor. It took me a while to finally notice the BusID changes. Now I know to expect it and to prepare for it with a backup copy of the original xorg.conf.
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
I downloaded the Nvidia .run file for the latest short-term Linux release and will use it at some point to try that installation method again. When I first started out with Linux and attempted installation with the .run file, I had nothing but trouble and was never successful. Now that I am more familiar with Linux, I should be able to navigate the instructions more easily. Originally, I never could figure out how to stop the display manager for the installation.
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
The most important part of running the nVidia Installer is to make sure you have dkms installed first, so the driver module can be registered with DKMS and rebuilt on Kernel updates:
sudo apt-get install dkms
That should fix that.
Other points:
The installer script Always fails unless you have provided one; click OK, it means nothing.
You can choose to install the 32-bit part of the driver, but most people don't have the 32-bit libraries and it will fail. It means nothing; don't worry about it.
You Really should register the driver module so you don't have to reinstall the driver after a Kernel update.
Most people have a custom xorg.conf, and you shouldn't let the installer touch it; the default is NO! If you don't have a custom xorg.conf installed, you can choose Yes and let the installer build a new xorg.conf.
When finished, enter reboot.
That's about it; I may have left something out.
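The checklist above boils down to a few conditions that can be tested before launching the .run installer: dkms present, running as root, and no X server up. A minimal Python sketch; the function names are hypothetical, and detecting a running X server by the process name Xorg (or X) in /proc is an assumption that holds on stock Ubuntu but may not elsewhere:

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Assumption: an X server shows up in /proc/<pid>/comm as "Xorg" (or plain "X").
X_SERVER_NAMES = {"Xorg", "X"}


def running_process_names(proc: Path = Path("/proc")) -> List[str]:
    """Command names of running processes, read from /proc/<pid>/comm."""
    if not proc.is_dir():
        return []
    names = []
    for entry in proc.iterdir():
        if entry.name.isdigit():
            try:
                names.append((entry / "comm").read_text().strip())
            except OSError:
                continue  # the process exited while we were scanning
    return names


def preflight_problems(
    which: Callable[[str], Optional[str]] = shutil.which,
    euid: Optional[int] = None,
    process_names: Optional[Iterable[str]] = None,
) -> List[str]:
    """Reasons not to start the installer yet; an empty list means good to go."""
    problems = []
    if which("dkms") is None:
        problems.append("dkms not found: run 'sudo apt-get install dkms' first")
    if euid is None:
        euid = os.geteuid()
    if euid != 0:
        problems.append("not running as root")
    if process_names is None:
        process_names = running_process_names()
    if X_SERVER_NAMES.intersection(process_names):
        problems.append("an X server is running: use Recovery Mode or stop the display manager")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("-", problem)
```

The checks take their inputs as parameters so they can be exercised without root or a real /proc; only the defaults touch the live system.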
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628
. . So no 0.95 Special Sauce then? :( Stephen :(
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
No, you can still run the beta 0.96 special sauce. Just expect to deal with the headaches associated with the 396 drivers.
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
I'm glad I missed all those headaches by using the nVidia Installer. So, tell me, does the Installer that comes with the PPA driver allow you to say NO to having the Installer mess with your working xorg.conf? The nVidia Installer is set to Not mess with your working xorg.conf by default. If it works, don't mess with it. Otherwise, I see no headaches with the 396.x drivers. Installing them was no different from installing any other driver I've gotten from nVidia.
juan BFP Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799
[Quote from TBar] "I'm glad I missed all those headaches by using the nVidia Installer."
Could it be related to the GPU or OS version he uses? I use the 396.51 drivers with no problems at all, too. I installed them manually with the driver update as usual: first stop BOINC, then install the driver and restart the host.
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Apparently, there is a problem with the PPA he is using, similar to the repository problems with the 390 drivers. If you use the drivers downloaded from nVidia, as I do, you miss All those problems. I haven't had any problems with either the 390 or 396 drivers from nVidia.
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
[Quote from TBar] "I'm glad I missed all those headaches by using the nVidia Installer."
No, there's no choice, just as with any other package. It is assumed that if you are asking to install the package, you want to install everything. It just writes the package to the system, with no interaction with the install script. The same goes for getting the package from Synaptic Package Manager.
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
OK, well, there's one advantage to installing the driver downloaded from nVidia from Recovery Mode: the Default action of the nVidia installer is to Not Touch your perfectly working xorg.conf. The advantage of using Recovery Mode instead of the Console is that the X Server was never started in Recovery Mode, so you don't have to Stop it as you do in the Console. That removes one step, and it doesn't matter which display manager the OS uses; it was never started, so you don't need to stop it.
Get the CUDA 9.2 drivers installed, people, 'cause the latest CUDA 9.2 App seems to be working. Here are just a few tasks run by both CUDA 9.0 zi3v and V0.97. Note the shorties work this time, and the BLC overflows match with both Apps.

Running on TBarsMacPro.local at Sat Aug 18 16:08:28 2018
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
03my17ab.4903.11519.16.43.91.wu
16fe08aa.12502.25021.6.33.13.wu
18dc09ah.26284.16432.6.33.125.wu
blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
blc03_2bit_guppi_58227_18045_HIP66354_0049.27116.0.22.45.226.vlar.wu
blc04_2bit_blc04_guppi_58226_25178_DIAG_PSR_J1935+1616_0007.31720.818.22.45.135.vlar.wu
blc04_2bit_guppi_58227_05169_HIP53229_0012.26582.409.21.44.134.vlar.wu
blc16_2bit_guppi_58185_76028_Dw1_off_0033.2471.1636.22.45.95.vlar.wu

Listing executable(s) in /APPS :
setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91

Listing executable in /REF_APPs :
setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91
---------------------------------------------------
Current WU: 03my17ab.4903.11519.16.43.91.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
175.75 real 152.75 user 18.94 sys
Elapsed Time: ....................................... 175 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
167.63 real 146.53 user 18.70 sys
Elapsed Time : .................................... 168 seconds
Speed compared to default : 104 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
---------------------------------------------------
Done with 03my17ab.4903.11519.16.43.91.wu.

Current WU: 16fe08aa.12502.25021.6.33.13.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
129.00 real 112.09 user 14.62 sys
Elapsed Time: ....................................... 129 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
109.32 real 95.27 user 11.77 sys
Elapsed Time : .................................... 110 seconds
Speed compared to default : 117 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
---------------------------------------------------
Done with 16fe08aa.12502.25021.6.33.13.wu.

Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
125.65 real 109.25 user 14.11 sys
Elapsed Time: ....................................... 126 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
106.92 real 93.22 user 11.42 sys
Elapsed Time : .................................... 107 seconds
Speed compared to default : 117 %
-----------------
Comparing results
Unmatched signal(s) in R1 at line(s) 393 473
For R1:R2 matched signals only, Q= 100.0%
Result : Weakly similar.
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.

Current WU: blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
293.81 real 256.35 user 34.83 sys
Elapsed Time: ....................................... 293 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
191.44 real 165.61 user 23.35 sys
Elapsed Time : .................................... 191 seconds
Speed compared to default : 153 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
---------------------------------------------------
Done with blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu.

Current WU: blc03_2bit_guppi_58227_18045_HIP66354_0049.27116.0.22.45.226.vlar.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
299.08 real 261.51 user 34.96 sys
Elapsed Time: ....................................... 299 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
195.45 real 170.45 user 22.57 sys
Elapsed Time : .................................... 195 seconds
Speed compared to default : 153 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
---------------------------------------------------
Done with blc03_2bit_guppi_58227_18045_HIP66354_0049.27116.0.22.45.226.vlar.wu.

Current WU: blc04_2bit_blc04_guppi_58226_25178_DIAG_PSR_J1935+1616_0007.31720.818.22.45.135.vlar.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
8.33 real 5.06 user 1.18 sys
Elapsed Time: ....................................... 8 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
9.37 real 6.15 user 1.11 sys
Elapsed Time : .................................... 9 seconds
Speed compared to default : 88 %
-----------------
Comparing results
Result : Strongly similar, Q= 100.0%
---------------------------------------------------
Done with blc04_2bit_blc04_guppi_58226_25178_DIAG_PSR_J1935+1616_0007.31720.818.22.45.135.vlar.wu.

Current WU: blc04_2bit_guppi_58227_05169_HIP53229_0012.26582.409.21.44.134.vlar.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
23.81 real 18.63 user 3.04 sys
Elapsed Time: ....................................... 24 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
23.10 real 18.37 user 2.61 sys
Elapsed Time : .................................... 23 seconds
Speed compared to default : 104 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.99%
---------------------------------------------------
Done with blc04_2bit_guppi_58227_05169_HIP53229_0012.26582.409.21.44.134.vlar.wu.

Current WU: blc16_2bit_guppi_58185_76028_Dw1_off_0033.2471.1636.22.45.95.vlar.wu
---------------------------------------------------
Running default app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda91 -nobs -device 0
357.79 real 315.15 user 39.94 sys
Elapsed Time: ....................................... 358 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_V0.97_x86_64-apple-darwin_cuda91 -nobs -device 0
222.99 real 192.91 user 27.60 sys
Elapsed Time : .................................... 223 seconds
Speed compared to default : 160 %
-----------------
Comparing results
Unmatched signal(s) in R1 at line(s) 373 469
For R1:R2 matched signals only, Q= 100.0%
Result : Weakly similar.
---------------------------------------------------
Done with blc16_2bit_guppi_58185_76028_Dw1_off_0033.2471.1636.22.45.95.vlar.wu.

Done with Benchmark run!

Now to build the Linux App...
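The "Speed compared to default" figures in the run above appear to be the reference (zi3v) elapsed seconds divided by the V0.97 elapsed seconds, times 100, truncated rather than rounded: 126/107 is about 117.8 %, yet the log shows 117 %. This is inferred from the printed numbers, not taken from the benchmark script itself. A minimal Python sketch that reproduces every row:

```python
# Elapsed seconds (reference zi3v, V0.97) copied from the "Elapsed Time" lines of the run.
ELAPSED = {
    "03my17ab.4903.11519.16.43.91.wu": (175, 168),
    "16fe08aa.12502.25021.6.33.13.wu": (129, 110),
    "18dc09ah.26284.16432.6.33.125.wu": (126, 107),
    "blc01_2bit_guppi_58137_29542_HIP45689_0020.26400.818.21.44.80.vlar.wu": (293, 191),
    "blc03_2bit_guppi_58227_18045_HIP66354_0049.27116.0.22.45.226.vlar.wu": (299, 195),
    "blc04_2bit_blc04_guppi_58226_25178_DIAG_PSR_J1935+1616_0007.31720.818.22.45.135.vlar.wu": (8, 9),
    "blc04_2bit_guppi_58227_05169_HIP53229_0012.26582.409.21.44.134.vlar.wu": (24, 23),
    "blc16_2bit_guppi_58185_76028_Dw1_off_0033.2471.1636.22.45.95.vlar.wu": (358, 223),
}


def speed_vs_default(default_s: int, new_s: int) -> int:
    """Reference time over new time, as a whole percent (integer division truncates)."""
    return default_s * 100 // new_s


if __name__ == "__main__":
    for wu, (ref_s, new_s) in ELAPSED.items():
        print(f"{speed_vs_default(ref_s, new_s):4d} %  {wu}")
    total_ref = sum(ref for ref, _ in ELAPSED.values())
    total_new = sum(new for _, new in ELAPSED.values())
    print(f"{speed_vs_default(total_ref, total_new):4d} %  all eight WUs combined")
```

The combined line divides total times, so the long BLC tasks weigh more than the short ones, unlike a plain average of the per-WU percentages.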
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462
I was looking at the app_info.xml file because the CPU GFLOPS I am getting seem to be lower than when I last had Lunatics Beta installed on that box. This box doesn't support AVX, and when I look at the results of a task, it appears to be using an SSE instruction set (I think). If I am reading the file name right, does that CPU app support only SSE and not AVX? If so, can I find an individual download of the CPU app that supports AVX and patch it in? I am getting ready (I hope later today) to install the "secret sauce" (CUDA90 version) on a large 16c/32t CPU/MB that I am certain supports AVX. I want every little extra I can get on the CPU, since the GPU app is certainly holding up its share! I have been looking at "secret sauce" GFLOPS numbers on some 16c/32t machines comparable to my box, and they are lower than the numbers I got on a 4c/8t CPU with AVX under the Lunatics beta distro. So I really am wondering. Thank you, Tom
A proud member of the OFA (Old Farts Association).
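Listing which executables an app_info.xml points at is easy to script. A minimal Python sketch, assuming the usual BOINC anonymous-platform layout (<file_info> entries with a <name> and an <executable/> marker). Guessing the instruction set from the file name is only a heuristic, since nothing obliges a build to encode it there; the script name and the ISA_TAGS list are hypothetical:

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List

# Heuristic only: tags that optimized builds often put in their file names.
# Longer tags come first so "avx2" is not reported as "avx", or "ssse3" as "sse3".
ISA_TAGS = ("avx512", "avx2", "avx", "sse41", "ssse3", "sse3", "sse2")


def executable_names(app_info_xml: str) -> List[str]:
    """Names of <file_info> entries marked <executable/> in a BOINC app_info.xml."""
    root = ET.fromstring(app_info_xml)
    return [
        info.findtext("name", default="").strip()
        for info in root.iter("file_info")
        if info.find("executable") is not None
    ]


def isa_hint(file_name: str) -> str:
    """First instruction-set tag found in the file name, if any."""
    lowered = file_name.lower()
    return next((tag for tag in ISA_TAGS if tag in lowered), "no tag in name")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for name in executable_names(Path(sys.argv[1]).read_text()):
            print(f"{isa_hint(name):16s} {name}")
    else:
        print("usage: list_apps.py PATH/TO/app_info.xml")
```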
rob smith Joined: 7 Mar 03 Posts: 22203 Credit: 416,307,556 RAC: 380
If your CPU does not support AVX, then DO NOT try to run an app that has been compiled using the AVX extensions. Doing so would result either in the app not running or, if it does run, in trashing loads of tasks. The reason the special sauce appears to be returning a lower FLOPS count than expected is the poor way FLOPS are calculated on GPUs.
Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
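Whether a Linux box actually has AVX can be checked from the kernel's CPU flag list before choosing a CPU app. A minimal Python sketch, assuming an x86 CPU, where /proc/cpuinfo reports the extensions as flags such as sse4_1, avx and avx2:

```python
from pathlib import Path
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Feature flags from the first 'flags' line of x86 /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    flags = cpu_flags(cpuinfo.read_text()) if cpuinfo.is_file() else set()
    for isa in ("sse2", "ssse3", "sse4_1", "avx", "avx2"):
        print(f"{isa:7s} {'yes' if isa in flags else 'NO'}")
```

If avx is missing from the list, an AVX build is the wrong choice for that machine, which is the point made above.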
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
Thanks for the benchmark run, TBar. But I would have liked to see the zi3v app run on both CUDA 9.0 and CUDA 9.2. Or am I not comprehending your comment about "get CUDA 9.2, people"? Is that comment directed at the people running the beta 0.97 special app? Are you saying you are compiling a Linux version of the 0.97 special app for public release? Would I see any benefit from running zi3v on CUDA 9.2?
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873
I just searched for CUDA 9.2 on the Nvidia Developer site. Drilling down the download choices gets me to Ubuntu 17.10 and 16.04 options, with no listing for Ubuntu 18.04. Would the 17.10 release work? [Edit] I'm going to try the instructions on this site: How-to-install-CUDA-9-2-on-Ubuntu-18-04
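Whichever CUDA 9.2 package ends up installed, the loaded driver has to be at least the minimum NVIDIA lists for that CUDA release in the CUDA Toolkit release notes; that number is not given in this thread, so look it up there. A minimal Python sketch; the script name is hypothetical, and it assumes the Linux driver's /proc/driver/nvidia/version file, whose first line contains "Kernel Module" followed by the version:

```python
import re
import sys
from pathlib import Path
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'396.51' -> (396, 51), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def loaded_driver_version(proc_text: str) -> Optional[str]:
    """Pull the version out of /proc/driver/nvidia/version ('... Kernel Module  396.51 ...')."""
    match = re.search(r"Kernel Module.*?\s(\d+(?:\.\d+)+)", proc_text)
    return match.group(1) if match else None


def meets_minimum(driver: str, minimum: str) -> bool:
    """True if the driver version is at or above the required minimum."""
    return parse_version(driver) >= parse_version(minimum)


if __name__ == "__main__":
    version_file = Path("/proc/driver/nvidia/version")
    if len(sys.argv) != 2 or not version_file.is_file():
        print("usage (with the nvidia module loaded): driver_check.py MINIMUM_DRIVER_VERSION")
    else:
        loaded = loaded_driver_version(version_file.read_text())
        if loaded is None:
            print("could not find a driver version in", version_file)
        else:
            verdict = "meets" if meets_minimum(loaded, sys.argv[1]) else "is below"
            print(f"loaded driver {loaded} {verdict} the minimum {sys.argv[1]}")
```

Comparing integer tuples matters here: as plain strings, "384.130" sorts before "384.81".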
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.