Setting up Linux to crunch CUDA90 and above for Windows users

Message boards : Number crunching : Setting up Linux to crunch CUDA90 and above for Windows users
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1955631 - Posted: 16 Sep 2018, 0:28:04 UTC - in response to Message 1955629.  

What UPS do you use? I'm thinking of buying one here too.

These days I use Eaton.
They cost more, but they are actually capable of meeting their specifications.
I've had about 4 or 5 different brands over the years, all cheaper units, and not one of them came close to meeting the manufacturer's claimed runtime or load ratings.
The Eaton units do.
Grant
Darwin NT
ID: 1955631
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1955636 - Posted: 16 Sep 2018, 0:49:56 UTC - in response to Message 1955629.  

I got the APC SmartUPS SMT1500C (available at Amazon).

It has true RMS sine-wave output and can supply 1000W for 7 minutes at full load when line power goes out. I had the APC BackUPS 1500G before. Both are rated for 1500VA loads, but in reality the BackUPS has a load limit of 865W because it is a stepped-wave output with only 60% conversion efficiency, compared to the 70% conversion efficiency of the SmartUPS. The SmartUPS also has much bigger batteries than the BackUPS. The SmartUPS is a much better product than the BackUPS. You pay for it too: $180 versus $480.

I had SmartUPS 1500s ten years ago, before their battery chargers failed and could no longer maintain the batteries. They are a solid business-class product. Good for at least 5-6 years on the original batteries before the batteries need replacing. Cheapo/knockoff replacement batteries never lasted more than 2-3 years.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1955636
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1955640 - Posted: 16 Sep 2018, 0:58:24 UTC

I had good luck with the earlier generation of SmartUPS 1500s. They could maintain the server load for their published times. They ran well with no issues until I replaced the APC original batteries with Chinese UBC clone batteries. Then the charging circuits began to fail and cook the batteries. They are paperweights now. I'm hoping this new generation of SmartUPS lasts as long as my first ones did. I did the battery calibration rundown on it and it came up with the published 7 minutes at 1000W full-load specification, so it should be able to run my Seti load for that long. I of course have the APCUPSD software controlling and monitoring the UPS. I have it set to shut the system down after 3 minutes, well before it would exhaust the batteries. You should never pull a lead-acid battery below 50% charge, to keep from damaging it.
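For reference, a 3-minute-on-battery shutdown like that can be set in apcupsd's configuration file. A minimal sketch, assuming the default Debian/Ubuntu path and illustrative thresholds; adjust for your install:

```shell
# /etc/apcupsd/apcupsd.conf (assumed default location)

# Shut the system down after 180 seconds on battery,
# regardless of remaining charge.
TIMEOUT 180

# Also shut down if charge drops below 50% or estimated
# runtime falls below 3 minutes, whichever comes first.
BATTERYLEVEL 50
MINUTES 3
```

After editing, restart the apcupsd service so the new thresholds take effect.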

I looked at the Eaton offerings and believe they are very good products. They charge for it too, and that decided me on the APC unit. We'll see, I guess, whether I get my money's worth out of the APC unit.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1955640
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1955653 - Posted: 16 Sep 2018, 2:37:56 UTC - in response to Message 1955640.  

You will be happy with that SmartUPS.
We had dozens of them at work, from 5000W on down, and they just don't fail; you just replace the batteries every 2+ years. The consumer products do fail. My last purchase was an APC BX1500, 900W. I only had 2 days of testing with it before my #2 computer had its meltdown, but it did trip on overload once already. The latest 'sauce' has definitely put more load on my UPSs!

The biggest problem we had with the low-end UPSs (say <500W consumer units, where we only needed low power for a hub or radio supply) was that after an extended outage they just wouldn't go back to line. Once the batteries ran out: toast. Replacing the battery made no difference.
ID: 1955653
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1956338 - Posted: 19 Sep 2018, 16:43:46 UTC

Dear all,
as you know, I eventually managed to have xorg run on the Intel GPU, leaving the Nvidia card totally free to crunch. Which is good.
The next step would be to try overclocking it a bit again, as I did under Windows. That was very easy... Under Linux, not so much, as many have already pointed out here, with many suggestions and tricks.

My problem is that under this configuration I cannot run nvidia-settings unless I run xorg on the Nvidia card (I tried that and it would work, but I do not want to go back down that road).

Does anyone have any idea how to meet these two conflicting goals?

Thank you!

Sleepy
ID: 1956338
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956349 - Posted: 19 Sep 2018, 17:47:15 UTC - in response to Message 1956338.  

You are out of luck with a Pascal card, as the only way to overclock is to run nvidia-settings.
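For anyone who does run X on the Nvidia card, nvidia-settings only exposes the clock-offset controls after CoolBits is enabled in xorg.conf. A minimal sketch (the value 28 is an assumption that combines clock, fan, and voltage control bits; pick the bits you need):

```shell
# Write the CoolBits option into /etc/X11/xorg.conf so that
# nvidia-settings exposes overclocking controls.
sudo nvidia-xconfig --cool-bits=28

# Restart X (or reboot); the offset sliders then appear under
# the PowerMizer page of the nvidia-settings GUI.
```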
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956349
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956411 - Posted: 20 Sep 2018, 0:27:21 UTC

Regarding overclocking and Linux/Seti...

How would one notice instability if the GPU is overclocked too much?
Would it generate inconclusives? Errors? Or would the app crash/restart?

I would guess it would be better to test in a 3D app, like the Unigine benchmark, first.
ID: 1956411
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1956412 - Posted: 20 Sep 2018, 0:31:53 UTC - in response to Message 1956411.  

Regarding overclocking and Linux/Seti...

How would one notice instability if the GPU is overclocked too much?
Would it generate inconclusives? Errors? Or would the app crash/restart?

I would guess it would be better to test in a 3D app, like the Unigine benchmark, first.



The application would become unstable and crash. The work unit would then error out, and all work units following it will error out as well. After that you need to find the compute cache folder and remove the files, as they are corrupted; new ones will be regenerated with a reboot. Keith can explain it better.
ID: 1956412
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956423 - Posted: 20 Sep 2018, 1:30:54 UTC

It's pretty obvious. You will all of a sudden start returning errored work when you overclock too far or your clock is too high for the thermal environment. The tasks will all end up with runtimes of a few seconds and will empty your GPU cache in a matter of minutes. The errored tasks on this host are an example.
Host 6279633

and the task's stderr.txt will contain something like this:
<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Cuda error 'Couldn't get cuda device count
' in file 'cuda/cudaAcceleration.cu' in line 152 : unknown error.

</stderr_txt>
]]>

or

free(): invalid size
SIGABRT: abort called
Stack trace (30 frames):
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x85b6c0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f1a7c9e1890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f1a7b8cce97]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f1a7b8ce801]
/lib/x86_64-linux-gnu/libc.so.6(+0x89897)[0x7f1a7b917897]
/lib/x86_64-linux-gnu/libc.so.6(+0x9090a)[0x7f1a7b91e90a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4dc)[0x7f1a7b925e2c]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x77330a)[0x7f1a71a0c30a]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x131699)[0x7f1a713ca699]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x77117b)[0x7f1a71a0a17b]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x13522b)[0x7f1a713ce22b]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(__cuda_CallJitEntryPoint+0x101d)[0x7f1a71367c0d]
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.396.54(elfLink_Finish+0x63)[0x7f1a796d2fd3]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1d36e4)[0x7f1a79ae46e4]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1284ab)[0x7f1a79a394ab]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1285f0)[0x7f1a79a395f0]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82b73a]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x81eef0]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82a56b]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82ed9f]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82f50a]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x822d9c]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x80721e]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x841d5c]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x40e49b]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x41c7c5]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x426525]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x4080d8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f1a7b8afb97]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x408f89]

Exiting...

</stderr_txt>
]]>

which indicates the application crashed and couldn't find its resources. You need to exit BOINC and delete the files in the hidden .nv directory in the /Compute Cache folder, where the CUDA and OpenCL compute primitives are kept. Then, after reducing your overclock or improving the card's cooling, restart BOINC; it will recreate the compute primitives and you will start crunching again, and after you return some valid work BOINC starts removing your error penalty.
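A minimal command sketch of that cleanup, assuming the per-user cache lives at ~/.nv/ComputeCache and BOINC runs as the boinc-client systemd service (both assumptions; adjust paths for your setup):

```shell
# Stop BOINC so no GPU task is holding the cache open.
sudo systemctl stop boinc-client

# Delete the JIT-compiled CUDA/OpenCL kernel cache; the driver
# rebuilds it automatically the next time a task starts.
rm -rf "$HOME/.nv/ComputeCache"

# Reduce the overclock or improve cooling, then restart BOINC.
sudo systemctl start boinc-client
```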
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956423
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956431 - Posted: 20 Sep 2018, 2:56:56 UTC - in response to Message 1956423.  

Thanks for the info; so far so good, then. I'll slowly raise the core clock some more.

It seems like the Seti tasks don't benefit much from a higher GPU memory clock? Probably not worth pushing it much more, I would guess.
ID: 1956431
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956433 - Posted: 20 Sep 2018, 3:08:31 UTC - in response to Message 1956431.  

Actually on Seti we find that the special app responds best to overclocking the memory. For other projects it's just the opposite, with Einstein as an example. Try to get the memory clock at least back to what it would be in the P0 compute state if Nvidia didn't penalize us for running compute loads on Pascal, then add a modest overclock on top of that. So a 1070 can run 8400MHz on its memory, a 1080 can run 10400, and a 1080 Ti can run 11000-11800. You have to run the cards in the P2 state, with the overclock added to the P2 state in unrestricted mode. You really should also run Petri's keepP2 utility on each card to keep it in the P2 state and prevent it from moving to P0 when a task unloads from the card before the next task loads. That prevents the card from crashing from the overclock.
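As a sketch, those offsets can be applied from the command line with nvidia-settings once CoolBits is enabled. The GPU index and the [3] performance-level suffix here are assumptions for a Pascal card; check `nvidia-settings -q all` for the attribute names your driver actually exposes:

```shell
# Apply a +1400MHz memory transfer-rate offset and a modest
# +40MHz core offset to GPU 0 at performance level 3.
nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=1400"
nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=40"
```

These commands need a running X server on the Nvidia card, which is exactly the constraint discussed earlier in the thread.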
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956433
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956434 - Posted: 20 Sep 2018, 3:11:19 UTC
Last modified: 20 Sep 2018, 3:12:12 UTC

I believe on Seti it is best to let GPU Boost 3.0 manage the core clocks based on the thermal environment. I just add a 20-40MHz boost to the core clocks on each card, and a 1400MHz memory boost to my 1080s and 1080 Tis.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956434
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956439 - Posted: 20 Sep 2018, 3:31:09 UTC - in response to Message 1956434.  

Interesting, I'll look into that. So far I added 50MHz on the 1080 and 100MHz on the 1060, and both at +1000 for memory.

Both cards are pretty cool at 58-62C.

The boost throttles back a bit on the 1080 under load, but the 1060 is steady at 2113MHz.
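A simple way to watch those clocks and temperatures under load is nvidia-smi's query mode (field names as in recent driver releases):

```shell
# Log SM clock, memory clock, and temperature for all GPUs
# every 5 seconds until interrupted with Ctrl-C.
nvidia-smi --query-gpu=index,clocks.sm,clocks.mem,temperature.gpu \
           --format=csv -l 5
```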
ID: 1956439
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956442 - Posted: 20 Sep 2018, 3:56:14 UTC - in response to Message 1956439.  

When I was running my single-fan 1060, it was consistently able to hold the highest GPU Boost 3.0 core clocks, around 2025MHz. It never seemed to be affected much by temperature. All my other air-cooled cards start out at around 2GHz at first load but then drop down to the low 1900s after about 20-30 minutes of crunching. My AIO cards just motor on without a care, with not much deviation from their max boosted core clocks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956442
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1956499 - Posted: 20 Sep 2018, 16:01:45 UTC
Last modified: 20 Sep 2018, 16:02:24 UTC

My watercooled 1080 Tis I run at +125MHz core clock, lol, to keep them around 2000MHz; they are reference FE cards though. Core temps are in the 40s C.

I still don't see much difference in run time with a GPU memory overclock on the latest special app (v0.97b2), since there are fewer calls to memory as I understand it. But I did/do under SoG and zi3v.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1956499
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956515 - Posted: 20 Sep 2018, 17:35:03 UTC - in response to Message 1956499.  

Based on my observations too, Ian, I think you are correct that the latest special app is less responsive to memory overclocks than the older SoG or zi3v apps. Of course, if the overclock doesn't cause instability or heat problems: if it ain't broke . . . it don't need fixin'. I will just leave them the way they are. No harm, no foul.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956515
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956539 - Posted: 20 Sep 2018, 21:16:59 UTC

Is there any thread I should look into to set up keepP2?
I would guess I need a script of some sort; like, I will have to boot at stock clocks, and maybe have a script force the P2 state and then apply the overclock afterwards, right?

I'll look into it a bit, but I might just end up using a mild memory overclock that pushes it close to the limit in the P3/P0 state and run with the P2 penalty.

Running my 1060 @ +120, running at 2138MHz now.

Did you guys have to mess with the power-limit command? I usually crank up the power limit under Windows; just wondering if it's needed under Linux. I think I saw I could change that with the nvidia-smi command, but right now there is still a large buffer before reaching the current TDP limit, so I don't really see the point. Gaming might push it closer to the limit, I would guess.
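For reference, the power limit can indeed be read and set with nvidia-smi. A sketch (the 180W figure is just an illustrative value; each card reports its own valid range):

```shell
# Show the current, default, and min/max enforceable power limits.
nvidia-smi -q -d POWER

# Raise (or lower) the board power limit, e.g. to 180W on GPU 0.
# Requires root; the value must fall inside the reported range.
sudo nvidia-smi -i 0 -pl 180
```

Note the setting does not persist across reboots unless reapplied, e.g. from a startup script.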

On another note, damn, I like Kubuntu 18.04 a lot! KDE Plasma is now quite polished and pretty light in resource usage (at least compared to old KDE/Gnome)!
Anyone who would like to try it out instead of Ubuntu/Lubuntu should; it is closer to the Windows UI in general, and the instructions in here work just fine.
My 2nd system is using KDE Neon (now upgraded to 18.04 as well).
ID: 1956539
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1956550 - Posted: 20 Sep 2018, 22:25:36 UTC - in response to Message 1956539.  
Last modified: 20 Sep 2018, 22:39:03 UTC


On another note, damn, I like Kubuntu 18.04 a lot! KDE Plasma is now quite polished and pretty light in resource usage (at least compared to old KDE/Gnome)!
Anyone who would like to try it out instead of Ubuntu/Lubuntu should; it is closer to the Windows UI in general, and the instructions in here work just fine.
My 2nd system is using KDE Neon (now upgraded to 18.04 as well).


. . Thanks for that tip. I might have a look at that ....

. . A question though: which system is running the KDE Plasma front end? The system with the 1060 says it is running Neon, so I guess that is your "2nd system". Also, is it hard to select either of those desktop options?

Stephen

:)
ID: 1956550
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956552 - Posted: 20 Sep 2018, 22:38:51 UTC - in response to Message 1956550.  

Lubuntu is close to the Windows UI too, though it feels a bit more dated (more Windows 2000/XP style).
ID: 1956552
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1956553 - Posted: 20 Sep 2018, 22:43:37 UTC - in response to Message 1956552.  
Last modified: 20 Sep 2018, 22:44:16 UTC

Lubuntu is close to the Windows UI too, though it feels a bit more dated (more Windows 2000/XP style).


. . I have tried Lubuntu and I do like the look and feel of it, but I have had problems getting it to run properly on my system, which is a mirror of your 2nd system: a Ryzen 7 1700, but with two 1060 6GB cards, not one. If I could get that to work I would be satisfied with it. What mobo is your version of that system using?

. . But I am willing to try Kubuntu if it solves my problems ...

Stephen

:)
ID: 1956553