Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
OK, try this in Ubuntu 14.04 (And Others); Hi Stephen, I suggest you use -nobs when you have spare cores. You can forget the -pfp flag. You can use the -pfb 32 flag, and -pfl values like 64 for a 1080 or better and 512 for the GTX 750. There is also a -pfe flag (it takes no parameter value) that you can try. It may give a boost, but it will most certainly mess up on noise bombs, and your inconclusive and invalid counts will rise. Do not leave the -pfe flag on; just test it to see if it helps with speed. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I am running CUDA90, and for some reason -nobs doesn't do the same thing that the app_config.xml file does for using a full core per GPU. My app_info.xml:

    <app_info>
      <app>
        <name>setiathome_v8</name>
      </app>
      <file_info>
        <name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90</name>
        <executable/>
      </file_info>
      <file_info>
        <name>libcudart.so.9.0</name>
      </file_info>
      <file_info>
        <name>libcufft.so.9.0</name>
      </file_info>
      <app_version>
        <app_name>setiathome_v8</app_name>
        <platform>x86_64-pc-linux-gnu</platform>
        <version_num>801</version_num>
        <plan_class>cuda90</plan_class>
        <cmdline> -nobs -pfb 32 flag -pfl 512</cmdline>
        <coproc>
          <type>NVIDIA</type>
          <count>1</count>
        </coproc>
        <avg_ncpus>0.1</avg_ncpus>
        <max_ncpus>0.1</max_ncpus>
        <file_ref>
          <file_name>setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90</file_name>
          <main_program/>
        </file_ref>
        <file_ref>
          <file_name>libcudart.so.9.0</file_name>
        </file_ref>
        <file_ref>
          <file_name>libcufft.so.9.0</file_name>
        </file_ref>
      </app_version>
      <app>
        <name>astropulse_v7</name>
      </app>
      <file_info>
        <name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
        <executable/>
      </file_info>
      <file_info>
        <name>AstroPulse_Kernels_r2751.cl</name>
      </file_info>
      <file_info>
        <name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
      </file_info>
      <app_version>
        <app_name>astropulse_v7</app_name>
        <platform>x86_64-pc-linux-gnu</platform>
        <version_num>708</version_num>
        <plan_class>opencl_nvidia_100</plan_class>
        <coproc>
          <type>NVIDIA</type>
          <count>1</count>
        </coproc>
        <avg_ncpus>0.1</avg_ncpus>
        <max_ncpus>0.1</max_ncpus>
        <file_ref>
          <file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
          <main_program/>
        </file_ref>
        <file_ref>
          <file_name>AstroPulse_Kernels_r2751.cl</file_name>
        </file_ref>
        <file_ref>
          <file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
          <open_name>ap_cmdline.txt</open_name>
        </file_ref>
      </app_version>
      <app>
        <name>setiathome_v8</name>
      </app>
      <file_info>
        <name>MBv8_8.05r3345_avx_linux64</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>setiathome_v8</app_name>
        <platform>x86_64-pc-linux-gnu</platform>
        <version_num>800</version_num>
        <file_ref>
          <file_name>MBv8_8.05r3345_avx_linux64</file_name>
          <main_program/>
        </file_ref>
      </app_version>
      <app>
        <name>astropulse_v7</name>
      </app>
      <file_info>
        <name>ap_7.05r2728_sse3_linux64</name>
        <executable/>
      </file_info>
      <app_version>
        <app_name>astropulse_v7</app_name>
        <version_num>704</version_num>
        <platform>x86_64-pc-linux-gnu</platform>
        <plan_class></plan_class>
        <file_ref>
          <file_name>ap_7.05r2728_sse3_linux64</file_name>
          <main_program/>
        </file_ref>
      </app_version>
    </app_info>

Yes, this includes the AVX attempt, which appears to be running SSE4.1 instead, but that is not the question. The question is: what am I doing wrong with CUDA90 and the "-nobs" command? A proud member of the OFA (Old Farts Association). |
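One way to double-check a file like the one posted above is to list, for each `<app_version>`, the command-line tokens and the CPU budget that BOINC will see. The sketch below is only an illustration: the function name `summarize_app_versions`, the trimmed `SAMPLE`, and the "suspicious token" heuristic (anything that is neither a dash-prefixed option nor a number) are assumptions made here, not part of BOINC or of the CUDA90 app. Run against the posted values, it flags the literal word `flag` inside `<cmdline>`; whether the application ignores that token or trips on it is not stated in the thread.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample: only the fields inspected below, values copied from the post.
SAMPLE = """
<app_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>cuda90</plan_class>
    <cmdline> -nobs -pfb 32 flag -pfl 512</cmdline>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
  </app_version>
</app_info>
"""


def summarize_app_versions(xml_text):
    """Return one dict per <app_version>: plan class, cmdline tokens, CPU budget."""
    root = ET.fromstring(xml_text)
    rows = []
    for app_version in root.iter("app_version"):
        tokens = (app_version.findtext("cmdline") or "").split()
        # Heuristic (an assumption of this sketch): a token is suspicious if it
        # is neither a dash-prefixed option nor a plain number.
        suspicious = [t for t in tokens
                      if not t.startswith("-") and not t.isdigit()]
        rows.append({
            "plan_class": (app_version.findtext("plan_class") or "").strip(),
            "cmdline": tokens,
            "suspicious": suspicious,
            "avg_ncpus": app_version.findtext("avg_ncpus"),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_app_versions(SAMPLE):
        print(row)
```

To check a real file, the same function can be fed `open("app_info.xml").read()` from the project directory; the path is whatever your installation uses.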
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
Dear all, I was already on the way to unchaining myself from Microsoft, and the latest apps by Petri (thank you thank you thank you!) gave me the last kick. So I moved to Kubuntu 18.04 and installed the latest applications (but not the 9.2 CUDA toolkit; I am downloading the biiiiig files for that now). As of now I am running the 0.97 10x0 application under the 396.51 nVidia driver, using the 9.0 .so libraries. Throughput has increased a lot, and so far I have not trashed too many WUs during my attempts. Invalids have not increased. So far so good. But I have a problem, probably common to many others: my system also has the graphics processor built into the CPU. Not that I want to use it for Seti; that has long been discussed and deprecated. But under Windows I could easily crunch with my 1060 and drive the display with the embedded Intel GPU. This way I could push the 1060 hard without compromising the normal use of my PC. Now it seems the X display is also generated by the 1060 (though physically the monitor is still connected to the old Intel output! I cannot believe this; it is very weird and probably also very wrong). On Monday I tried to solve the problem. The details are not important, but I made a terrible mess: the Intel output was always at low resolution (it could be raised with some command-line commands, but only until the next boot), and the 1060 was no longer recognized by Boinc. I got out of this terrible mess, but I am back at square one, crunching and displaying through nVidia. Before I make another catastrophic attempt, can anyone suggest a way around this corner? Thank you very much in advance! Sleepy |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
That is a VERY good question. I don't have the ability to run my Intel internal GPU at the same time as my discrete card, so the issue has not come up here. I have been running GPU tasks on cards that are also driving the display. I offer three possible workarounds. 1) Ignore the issue. Keep on computing. 2) Set the GPU preferences to suspend whenever the computer is "active." This basically means any time you are on the computer. 3) Set Seti so it will suspend when the system gets busy with other things above XX%. This basically means that it will suspend Seti any time you are doing something that takes a significant amount of CPU time. Depending on exactly what else you are trying to use the computer for while doing Seti processing, #2/#3 might be a working compromise. I have been getting very good production even when sharing the GPU with non-Seti tasks. HTH, Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Yes, this should be doable. You need to look at xorg.conf and see if it has two screen definitions. Look for which screen has the monitor attached to it. If the identifier is nvidia, you need to change it to intel. Pay attention to the BusID for each device and screen and make sure they match. Use lspci, or sudo lshw -c video, to verify the BusID of each graphics device. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
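A detail that often trips people up when matching bus IDs: lspci prints the bus, device, and function in hexadecimal (for example `01:00.0`), while the `BusID` option in xorg.conf expects decimal in the form `PCI:bus:device:function`. The helper below is a minimal sketch of that conversion; the function name `lspci_to_xorg_busid` is made up for this example, and the sample addresses in the demo are illustrative rather than taken from anyone's machine in this thread (apart from `00000000:01:00.0`, which is the Bus-Id form nvidia-smi prints).

```python
def lspci_to_xorg_busid(address):
    """Convert an lspci-style address such as '01:00.0' to 'PCI:1:0:0'.

    An optional PCI domain prefix ('0000:01:00.0' or the longer form that
    nvidia-smi prints) is ignored in this sketch.
    """
    parts = address.strip().split(":")
    bus_hex, device_function = parts[-2], parts[-1]
    device_hex, function_hex = device_function.split(".")
    # lspci prints hexadecimal; xorg.conf wants decimal.
    return "PCI:{}:{}:{}".format(
        int(bus_hex, 16), int(device_hex, 16), int(function_hex, 16)
    )


if __name__ == "__main__":
    for addr in ("00:02.0", "01:00.0", "0a:00.0", "00000000:01:00.0"):
        print(addr, "->", lspci_to_xorg_busid(addr))
```

The difference only shows up once a bus or device number reaches 10 or more, which is why a single-GPU box can appear to work with the hex value copied verbatim.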
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
If your monitor is connected to the motherboard and not the 1060, you are using the iGPU in the CPU to drive the display. No ifs, ands, or buts. There is no such thing as porting (or whatever) from the 1060 to the motherboard port. You have probably been there already, but you want the 'NVIDIA X Server Settings' screen. Just search for it in your Menu. You don't need the NVidia 9.2 toolkit, only the 396 driver, and you're good to go with the latest apps :) |
Brent Norman Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
'nvidia-smi' (in the terminal) will also tell you if the 1060 GPU is active:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 396.51                 Driver Version: 396.51                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  TITAN Xp            On   | 00000000:02:00.0 Off |                  N/A |
    | 83%   83C    P2   267W / 300W |   3632MiB / 12196MiB |    100%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX 1080    On   | 00000000:03:00.0  On |                  N/A |
    | 57%   53C    P2   155W / 217W |   3063MiB /  8111MiB |     99%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  GeForce GTX 1080    On   | 00000000:04:00.0 Off |                  N/A |
    | 57%   49C    P2   143W / 217W |   2814MiB /  8119MiB |     99%      Default |
    +-------------------------------+----------------------+----------------------+

Notice that my Card #1 has the display ON, or active. |
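The Disp.A column can also be read without eyeballing the table, since nvidia-smi has a CSV query mode. The sketch below assumes the `display_active` query field is available in your driver version and that it reports `Enabled` or `Disabled`; the helper names and the `SAMPLE_CSV` text are invented for this example, and the sample mirrors the three cards shown above rather than being captured output.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi",
         "--query-gpu=index,name,display_active",
         "--format=csv,noheader"]

# Hypothetical CSV output shaped like the three-card table in the post.
SAMPLE_CSV = """0, TITAN Xp, Disabled
1, GeForce GTX 1080, Enabled
2, GeForce GTX 1080, Disabled
"""


def parse_display_state(csv_text):
    """Turn 'index, name, state' lines into (index, name, state) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, state = [field.strip() for field in line.split(",")]
        gpus.append((int(index), name, state))
    return gpus


def gpus_driving_a_display(csv_text):
    """Return only the GPUs whose display is reported as active."""
    return [gpu for gpu in parse_display_state(csv_text)
            if gpu[2].lower() == "enabled"]


if __name__ == "__main__":
    text = SAMPLE_CSV
    if shutil.which("nvidia-smi"):
        result = subprocess.run(QUERY, capture_output=True, text=True)
        if result.returncode == 0 and result.stdout.strip():
            text = result.stdout  # use live data when the tool is present
    print(gpus_driving_a_display(text))
```

An empty list would mean that no NVIDIA card reports an active display, which is the situation Brent describes for a monitor plugged into the motherboard.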
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 396.51                 Driver Version: 396.51                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
    | 49%   66C    P2    92W / 120W |   2022MiB /  3019MiB |     98%      Default |
    +-------------------------------+----------------------+----------------------+

Dear Brent, you are right then. nvidia-smi reports no active display on my 1060 (Disp.A is Off), and xorg.conf lists only one GPU, the Intel. I should therefore be getting the signal from the Intel GPU, as the physical connection implies. Nevertheless, when the nVidia GPU is crunching, I experience strong lag and video stuttering, which I never experienced under Win7. I am keeping on average 4 CPUs off Seti, depending on CPU temperature, so I should have enough CPU reserve to cope with anything. When I snooze GPU crunching, everything runs normally again. I have now updated the iGPU driver from ppa:oibaf/graphics-drivers and will check whether this helps, but I can test it only locally, since a VNC connection stutters by definition. At least there are no strange data transfers from one GPU to the other (which would be unreasonable, I admit). I will probably need to adjust some settings. I went straight with the defaults, since nobody was talking, as people usually do, about tweaking the settings for best performance/usability. Thank you for your insights. Sleepy |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The default -pfl for the application is 64. That is tuned for 1070s and 1080s and is probably a bit too aggressive for a 1060. Try -pfl 512. To reduce the impact on CPU core usage, you should also not run -nobs. Those two changes should reduce the stuttering. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Well colour me tickled pink. . . Sorry, my bad, I meant the -pfb 32 flag. With the -pfl flag, can I infer that means values like 128 for my 970s and 1060s and maybe 256 for the 1050ti? Petri wrote: "There is a flag -pfe (no parameter value for that) you can try. It may give a boost but it will most certainly mess up with noise bombs. Your inconclusives and invalids count will rise." . . OK, I will give the other settings a tweak and see how things settle down. Then I will try the -pfe flag and see if there is any noticeable difference. I moved some VLARs to the 970 GPUs and their times were pretty consistent at 3.9 mins. Stephen :) |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
This is odd. On my machine #8560172 the scheduler is "waiting" 12 hours before the next polling. I screwed up and managed to generate a lot of "computation errors" for the GPU because I thought I was still running "headless", but Linux apparently found and installed an Nvidia driver for my GT 720, which is NOT anything but a placeholder until I get a "real" card. When I manually polled the Seti server(s), it appears to have uploaded all the computation errors (finally) and downloaded a bunch of CPU tasks. I no longer count on my systems being "headless" and have set up one of the locations (home) to be CPU only. If the polling continues to be odd, I will post a follow-up to this message. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I sometimes can coax the client to contact the servers when they have imposed a long backoff by setting both priority_debug and time_debug in the diagnostics flags and then doing an update. That usually resets the timers for me. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I found that a newly prepared flash drive with Ubuntu boots on one PC but not on another ("No bootable OS found"). Doing some testing, I reproduced the issue on the original PC too. The flash drive boots when chosen via EFI but fails when chosen just as a USB device (BIOS). At installation time I said roughly "install grub on the drive (not in a partition) and make it EFI/BIOS aware", so I expected to be able to boot via both routes. How can I reinstall grub (the Linux bootloader, right?) without reinstalling the whole OS, and how can I make sure it will boot in both the EFI and BIOS cases? SETI apps news We're not gonna fight them. We're gonna transcend them. |
Sleepy Send message Joined: 21 May 99 Posts: 219 Credit: 98,947,784 RAC: 28,360 |
Yesterday I left out the best part of my nvidia-smi output. Today I did some more searching (e.g. here: https://devtalk.nvidia.com/default/topic/991849/-solved-run-cuda-on-dedicated-nvidia-gpu-while-connecting-monitors-to-intel-hd-graphics-is-this-possible-/, where someone has the same problem and goal as me).

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 396.54                 Driver Version: 396.54                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
    | 42%   59C    P2    41W / 120W |   1988MiB /  3019MiB |     31%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0       854      G   /usr/lib/xorg/Xorg                            83MiB |
    |    0      1223      G   kwin_x11                                      18MiB |
    |    0      1225      G   /usr/bin/krunner                               2MiB |
    |    0      1238      G   /usr/bin/plasmashell                          40MiB |
    |    0     29170      C   ...e.berkeley.edu/setiV0.97.linux_x64_10x0  1831MiB |
    +-----------------------------------------------------------------------------+

As you can see, KDE and X are running from the nVidia card, with the monitor connected physically to the motherboard! As silly as it may seem, it looks like the system is sweating to send the display data produced by the nVidia card out through the mainboard. In the post I linked above, the solution seems to have worked for a while, but then it broke again. If I understood all this stuff correctly, of course. Sleepy |
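The process list is what reveals the desktop running on the card, so it can be handy to pull out just the graphics ("G") clients. Below is a rough sketch that parses the Processes table in the layout shown in the post; the regular expression, the function names, and the idea of summing the graphics memory are choices made for this example, and newer nvidia-smi versions lay the table out differently, so treat the pattern as tied to this 396.x output.

```python
import re

# Rows copied from the Processes table in the post above.
SAMPLE_TABLE = """
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       854      G   /usr/lib/xorg/Xorg                            83MiB |
|    0      1223      G   kwin_x11                                      18MiB |
|    0      1225      G   /usr/bin/krunner                               2MiB |
|    0      1238      G   /usr/bin/plasmashell                          40MiB |
|    0     29170      C   ...e.berkeley.edu/setiV0.97.linux_x64_10x0  1831MiB |
"""

# GPU index, PID, type (G = graphics, C = compute), process name, memory in MiB.
ROW = re.compile(r"^\|\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(\d+)MiB\s+\|$")


def parse_processes(table_text):
    """Return one dict per process row; header and ruler lines are skipped."""
    processes = []
    for line in table_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            gpu, pid, kind, name, mib = match.groups()
            processes.append({"gpu": int(gpu), "pid": int(pid),
                              "type": kind, "name": name, "mib": int(mib)})
    return processes


def graphics_clients(table_text):
    """Processes that use the card for display work rather than compute."""
    return [p for p in parse_processes(table_text) if "G" in p["type"]]


if __name__ == "__main__":
    clients = graphics_clients(SAMPLE_TABLE)
    for proc in clients:
        print(proc["pid"], proc["name"], proc["mib"], "MiB")
    print("graphics memory total:", sum(p["mib"] for p in clients), "MiB")
```

If the Intel GPU were driving the desktop, one would expect this list to be empty and only the Seti compute process to remain on the card.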
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Sleepy wrote: "Yesterday I left the best part of my Nvidia-smi output." Is this a laptop? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The problem with using different platforms is that the xorg.conf can only handle one platform. In your case it needs to be set to run the monitor from the Intel device. The way to run different platforms is to run one device with repository drivers and the other using vendor drivers. With ATI/nVidia the easiest way is to install the nVidia driver from nVidia without touching the xorg.conf, and the repository driver for the ATI. The xorg.conf will be set to use ATI for the monitor. I suppose it would be the same for Intel. Use the driver from nVidia and the repository driver for Intel. The xorg.conf needs to be set to use the Intel device. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Raistmer wrote: "I found newly prepared flash with Ubuntu booting on one PC but not on another. No bootable OS found." Not sure what you are trying to do, but I hope I phrased my search correctly. I think you want the grub bootloader on the drive and not in the EFI partition? This is what the search turned up: "How can I reinstall GRUB to the EFI partition?" Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
BTW, in case people don't know it... There are Two types of nVidia drivers, the ones from the repository and the ones from nVidia. It is Possible to have Both installed at the Same time, and your machine will Not respond well. To remove the repository nVidia driver you use:

    sudo apt-get remove --purge 'nvidia*'
    sudo apt-get autoremove

This will Only remove the Repository nVidia driver. To remove the driver from nVidia you run the nVidia installer again, adding --uninstall to the end:

    sudo ./NVIDIA-Linux-x86_64-396.51.run --uninstall
    sudo apt-get autoremove

With older drivers, removing the driver from nVidia without installing another driver before rebooting was a Sure way to get the Login Loop. Fortunately that was fixed in more recent drivers from nVidia. |
Tom M Send message Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
I have gremlins tonight. It's machine #8560172: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8560172 The Seti server said "Your app_info.xml file doesn't have a usable version of SETI@home v8." 1) Copied a "new" app_info.xml file from my installation media. 2) Copied one from the download folder on the box that was throwing the error. 3) Deleted the cuda90 archive and folder in the download directory. Downloaded a brand new copy of the cuda90 archive. Unarchived it into a folder in the download directory. Copied that "new" app_info.xml file into the setiathome project directory. 4) Displayed app_info.xml in the Leafpad print function. The section that controls the CPU-based processing is literally missing. So I assume I have some undisplayable characters that are screwing things up. But how do they show up in all sorts of copies from various places? Stumped. Turned off the box until I have more inspiration or something. I was having trouble getting Windows installed on this box, so this is my only all-Linux system. I can't even dual boot into Windows and start processing :( Tom A proud member of the OFA (Old Farts Association). |
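The "undisplayable characters" theory in the post above can be tested directly rather than guessed at. The sketch below scans a text for anything outside plain printable ASCII (tabs are allowed, carriage returns are reported) and separately checks whether the text parses as XML. The function names are invented for this example, and the sample string with a no-break space is fabricated purely to show the output shape; it says nothing about what is actually in Tom's file.

```python
import unicodedata
import xml.etree.ElementTree as ET


def find_odd_characters(text):
    """List (line, column, description) for characters that are not
    printable ASCII, a space, or a tab."""
    hits = []
    # Split on "\n" only, so that stray "\r" characters are reported too.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\t" or " " <= ch <= "~":
                continue
            description = "U+%04X %s" % (ord(ch), unicodedata.name(ch, "UNNAMED"))
            hits.append((lineno, col, description))
    return hits


def check_well_formed(text):
    """Return None if the text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)


def check_file(path):
    """Convenience wrapper for a file on disk (the path is up to the caller)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    return find_odd_characters(text), check_well_formed(text)


if __name__ == "__main__":
    sample = "<app_info>\n<app>\u00a0<name>x</name></app>\n</app_info>\n"
    print(find_odd_characters(sample))
    print(check_well_formed("<app_info><app></app_info>"))
```

A clean result from both checks would point away from stray characters and toward something else, such as a file name in `<file_info>` that does not match a file actually present in the project directory.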
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Hi Petri, . . I found things running very nicely and smoothly with -nobs and -pfb 32. Run times are very impressive. So I added the -pfe flag, and I can say it might be saving a few seconds per task but not much more (maybe about 2% - 3%), but the inconclusive rate is much higher; so far I am not seeing any invalids. How long would you like me to test that setting? Stephen |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.