Setting up Linux to crunch CUDA90 and above for Windows users

Message boards : Number crunching : Setting up Linux to crunch CUDA90 and above for Windows users
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1955631 - Posted: 16 Sep 2018, 0:28:04 UTC - in response to Message 1955629.  

What UPS do you use? I'm thinking of buying one here too.

These days I use Eaton.
They cost more, but they are actually capable of meeting their specifications.
I've had about 4 or 5 different brands over the years, all cheaper units, and not one of them came close to meeting the manufacturer's claimed runtime or load ratings.
The Eaton units do.
Grant
Darwin NT
ID: 1955631
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1955636 - Posted: 16 Sep 2018, 0:49:56 UTC - in response to Message 1955629.  

I got the APC SmartUPS SMT1500C (available at Amazon).

It has true RMS sine-wave output and can supply 1000W for 7 minutes at full load when line power goes out. I had the APC BackUPS 1500G before. Both are rated for 1500VA loads, but in reality the BackUPS has a load limit of 865W because it is a stepped-wave output with only 60% conversion efficiency, compared to the 70% conversion efficiency of the SmartUPS. The SmartUPS also has much bigger batteries than the BackUPS. The SmartUPS is a much better product than the BackUPS. You pay for it too: $180 versus $480.

I had SmartUPS 1500s ten years ago, before their battery chargers failed and could no longer maintain the batteries. They are a solid business-class product. Good for at least 5-6 years on the original batteries before the batteries need replacing. Cheapo/knockoff replacement batteries never lasted more than 2-3 years.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1955636
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1955640 - Posted: 16 Sep 2018, 0:58:24 UTC

I had good luck with the earlier generation of SmartUPS 1500s. They could maintain the server load for their published times. They ran well with no issues until I replaced the APC original batteries with Chinese UBC clone batteries. Then the charging circuits began to fail and cook the batteries. They are paperweights now. I'm hoping this new generation of SmartUPS lasts as long as my first ones did. I did the battery calibration rundown on it and it came up with the published 7 minutes at 1000W full-load specification, so it should be able to run my Seti load for that long. I of course have the APCUPSD software controlling and monitoring the UPS. I have it set to shut the system down after 3 minutes, well before it would exhaust the batteries. You should never pull a lead-acid battery below 50% charge, to keep from damaging it.
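For reference, a 3-minute-on-battery shutdown like that can be set in apcupsd's configuration file. A minimal sketch, assuming the default Debian/Ubuntu path and illustrative thresholds; adjust for your install:

```shell
# /etc/apcupsd/apcupsd.conf (assumed default location)

# Shut the system down after 180 seconds on battery,
# regardless of remaining charge.
TIMEOUT 180

# Also shut down if charge drops below 50% or estimated
# runtime falls below 3 minutes, whichever comes first.
BATTERYLEVEL 50
MINUTES 3
```

After editing, restart the apcupsd service so the new thresholds take effect.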

I looked at the Eaton offerings and believe they are very good products. They charge for it too, and that decided me on the APC unit. We'll see, I guess, whether I get my money's worth out of the APC unit.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1955640
Profile Brent Norman Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1955653 - Posted: 16 Sep 2018, 2:37:56 UTC - in response to Message 1955640.  

You will be happy with that SmartUPS.
We had dozens of them at work, from 5000W on down, and they just don't fail; you just replace the batteries every 2+ years. The consumer products do fail. My last purchase was an APC BX1500, 900W. I only had 2 days of testing with it before my #2 computer had its meltdown, but it did trip on overload once already. The latest 'sauce' has definitely put more load on my UPSs!

The biggest problem we had with the low-end UPSs (say <500W consumer units, where we only needed low power for a hub or radio supply) was that after an extended outage they just wouldn't go back to line. Once the batteries ran out: toast. Replacing the battery made no difference.
ID: 1955653
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1956338 - Posted: 19 Sep 2018, 16:43:46 UTC

Dear all,
as you know, I eventually managed to have xorg run on the Intel GPU, leaving the Nvidia card totally free to crunch. Which is good.
The next step would be to try overclocking it a bit again, as I did under Windows. That was very easy... Under Linux, not so much, as many have already pointed out here, with many suggestions and tricks.

My problem is that under this configuration I cannot run nvidia-settings unless I run xorg on the Nvidia card (I tried that and it would work, but I do not want to go back down that road).

Does anyone have any idea how to meet these two conflicting goals?

Thank you!

Sleepy
ID: 1956338
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956349 - Posted: 19 Sep 2018, 17:47:15 UTC - in response to Message 1956338.  

You are out of luck with a Pascal card, as the only way to overclock is to run nvidia-settings.
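For anyone who does run X on the Nvidia card, nvidia-settings only exposes the clock-offset controls after CoolBits is enabled in xorg.conf. A minimal sketch (the value 28 is an assumption that combines clock, fan, and voltage control bits; pick the bits you need):

```shell
# Write the CoolBits option into /etc/X11/xorg.conf so that
# nvidia-settings exposes overclocking controls.
sudo nvidia-xconfig --cool-bits=28

# Restart X (or reboot); the offset sliders then appear under
# the PowerMizer page of the nvidia-settings GUI.
```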
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956349
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956411 - Posted: 20 Sep 2018, 0:27:21 UTC

Regarding overclocking and Linux/Seti...

How would one notice instability if the GPU is overclocked too much?
Would it generate inconclusives? Errors? Or would the app crash/restart?

I would guess it would be better to test in a 3D app, like the Unigine benchmark, first.
ID: 1956411
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1956412 - Posted: 20 Sep 2018, 0:31:53 UTC - in response to Message 1956411.  

Regarding overclocking and Linux/Seti...

How would one notice instability if the GPU is overclocked too much?
Would it generate inconclusives? Errors? Or would the app crash/restart?

I would guess it would be better to test in a 3D app, like the Unigine benchmark, first.



The application would become unstable and crash. The work unit would then error out, and all work units following it will error out as well. After that you need to find the compute cache folder and remove the files, as they are corrupted; new ones will be regenerated with a reboot. Keith can explain it better.
ID: 1956412
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956423 - Posted: 20 Sep 2018, 1:30:54 UTC

It's pretty obvious. You will all of a sudden start returning errored work when you overclock too far or your clock is too high for the thermal environment. The tasks will all end up with runtimes of a few seconds and will empty your GPU cache in a matter of minutes. The errored tasks on this host are an example.
Host 6279633

and the task's stderr.txt will contain something like this:
<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
Cuda error 'Couldn't get cuda device count
' in file 'cuda/cudaAcceleration.cu' in line 152 : unknown error.

</stderr_txt>
]]>

or

free(): invalid size
SIGABRT: abort called
Stack trace (30 frames):
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x85b6c0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f1a7c9e1890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f1a7b8cce97]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f1a7b8ce801]
/lib/x86_64-linux-gnu/libc.so.6(+0x89897)[0x7f1a7b917897]
/lib/x86_64-linux-gnu/libc.so.6(+0x9090a)[0x7f1a7b91e90a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4dc)[0x7f1a7b925e2c]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x77330a)[0x7f1a71a0c30a]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x131699)[0x7f1a713ca699]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x77117b)[0x7f1a71a0a17b]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(+0x13522b)[0x7f1a713ce22b]
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1(__cuda_CallJitEntryPoint+0x101d)[0x7f1a71367c0d]
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.396.54(elfLink_Finish+0x63)[0x7f1a796d2fd3]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1d36e4)[0x7f1a79ae46e4]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1284ab)[0x7f1a79a394ab]
/usr/lib/x86_64-linux-gnu/libcuda.so.1(+0x1285f0)[0x7f1a79a395f0]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82b73a]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x81eef0]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82a56b]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82ed9f]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x82f50a]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x822d9c]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x80721e]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x841d5c]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x40e49b]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x41c7c5]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x426525]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x4080d8]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f1a7b8afb97]
../../projects/setiathome.berkeley.edu/setiathome_x41p_V0.97b2_Linux-Pascal+_cuda92[0x408f89]

Exiting...

</stderr_txt>
]]>

which indicates the application crashed and couldn't find its resources. You need to exit BOINC and delete the files in the hidden .nv directory in the /Compute Cache folder, where the CUDA and OpenCL compute primitives are kept. Then, after reducing your overclock or improving the card's cooling, restart BOINC; it will recreate the compute primitives and you will start crunching again, and after you return some valid work BOINC starts removing your error penalty.
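A minimal command sketch of that cleanup, assuming the per-user cache lives at ~/.nv/ComputeCache and BOINC runs as the boinc-client systemd service (both assumptions; adjust paths for your setup):

```shell
# Stop BOINC so no GPU task is holding the cache open.
sudo systemctl stop boinc-client

# Delete the JIT-compiled CUDA/OpenCL kernel cache; the driver
# rebuilds it automatically the next time a task starts.
rm -rf "$HOME/.nv/ComputeCache"

# Reduce the overclock or improve cooling, then restart BOINC.
sudo systemctl start boinc-client
```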
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956423
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956431 - Posted: 20 Sep 2018, 2:56:56 UTC - in response to Message 1956423.  

Thanks for the info; so far so good, then. I'll slowly raise the core clock some more.

It seems like the Seti tasks don't benefit much from a higher GPU memory clock? Probably not worth pushing it much more, I would guess.
ID: 1956431
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956433 - Posted: 20 Sep 2018, 3:08:31 UTC - in response to Message 1956431.  

Actually on Seti we find that the special app responds best to overclocking the memory. For other projects it's just the opposite, with Einstein as an example. Try to get the memory clock at least back to what it would be in the P0 compute state if Nvidia didn't penalize us for running compute loads on Pascal, then add a modest overclock on top of that. So a 1070 can run 8400MHz on its memory, a 1080 can run 10400, and a 1080 Ti can run 11000-11800. You have to run the cards in the P2 state, with the overclock added to the P2 state in unrestricted mode. You really should also run Petri's keepP2 utility on each card to keep it in the P2 state and prevent it from moving to P0 when a task unloads from the card before the next task loads. That prevents the card from crashing from the overclock.
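As a sketch, those offsets can be applied from the command line with nvidia-settings once CoolBits is enabled. The GPU index and the [3] performance-level suffix here are assumptions for a Pascal card; check `nvidia-settings -q all` for the attribute names your driver actually exposes:

```shell
# Apply a +1400MHz memory transfer-rate offset and a modest
# +40MHz core offset to GPU 0 at performance level 3.
nvidia-settings -a "[gpu:0]/GPUMemoryTransferRateOffset[3]=1400"
nvidia-settings -a "[gpu:0]/GPUGraphicsClockOffset[3]=40"
```

These commands need a running X server on the Nvidia card, which is exactly the constraint discussed earlier in the thread.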
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956433
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956434 - Posted: 20 Sep 2018, 3:11:19 UTC
Last modified: 20 Sep 2018, 3:12:12 UTC

I believe on Seti it is best to let GPU Boost 3.0 manage the core clocks based on the thermal environment. I just add a 20-40MHz boost to the core clocks on each card, and a 1400MHz memory boost to my 1080s and 1080 Tis.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956434
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956439 - Posted: 20 Sep 2018, 3:31:09 UTC - in response to Message 1956434.  

Interesting, I'll look into that. So far I added 50MHz on the 1080 and 100MHz on the 1060, and both at +1000 for memory.

Both cards are pretty cool at 58-62C.

The boost throttles back a bit on the 1080 under load, but the 1060 is steady at 2113MHz.
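A simple way to watch those clocks and temperatures under load is nvidia-smi's query mode (field names as in recent driver releases):

```shell
# Log SM clock, memory clock, and temperature for all GPUs
# every 5 seconds until interrupted with Ctrl-C.
nvidia-smi --query-gpu=index,clocks.sm,clocks.mem,temperature.gpu \
           --format=csv -l 5
```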
ID: 1956439
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956442 - Posted: 20 Sep 2018, 3:56:14 UTC - in response to Message 1956439.  

When I was running my single-fan 1060, it was consistently able to hold the highest GPU Boost 3.0 core clocks, around 2025MHz. It never seemed to be affected much by temperature. All my other air-cooled cards start out at around 2GHz at first load but then drop down to the low 1900s after about 20-30 minutes of crunching. My AIO cards just motor on without a care, with not much deviation from their max boosted core clocks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956442
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1956499 - Posted: 20 Sep 2018, 16:01:45 UTC
Last modified: 20 Sep 2018, 16:02:24 UTC

My watercooled 1080 Tis I run at +125MHz core clock, lol, to keep them around 2000MHz; they are reference FE cards though. Core temps are in the 40s C.

I still don't see much difference in run time with a GPU memory overclock on the latest special app (v0.97b2), since there are fewer calls to memory as I understand it. But I did/do under SoG and zi3v.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 1956499
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1956515 - Posted: 20 Sep 2018, 17:35:03 UTC - in response to Message 1956499.  

Based on my observations too, Ian, I think you are correct that the latest special app is less responsive to memory overclocks than the older SoG or zi3v apps. Of course, if the overclock doesn't cause instability or heat problems: if it ain't broke . . . it don't need fixin'. I will just leave them the way they are. No harm, no foul.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1956515
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956539 - Posted: 20 Sep 2018, 21:16:59 UTC

Is there any thread I should look into to set up keepP2?
I would guess I need a script of some sort; like, I will have to boot at stock clocks, and maybe have a script force the P2 state and then apply the overclock afterwards, right?

I'll look into it a bit, but I might just end up using a mild memory overclock that pushes it close to the limit in the P3/P0 state and run with the P2 penalty.

Running my 1060 @ +120, running at 2138MHz now.

Did you guys have to mess with the power-limit command? I usually crank up the power limit under Windows; just wondering if it's needed under Linux. I think I saw I could change that with the nvidia-smi command, but right now there is still a large buffer before reaching the current TDP limit, so I don't really see the point. Gaming might push it closer to the limit, I would guess.
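For reference, the power limit can indeed be read and set with nvidia-smi. A sketch (the 180W figure is just an illustrative value; each card reports its own valid range):

```shell
# Show the current, default, and min/max enforceable power limits.
nvidia-smi -q -d POWER

# Raise (or lower) the board power limit, e.g. to 180W on GPU 0.
# Requires root; the value must fall inside the reported range.
sudo nvidia-smi -i 0 -pl 180
```

Note the setting does not persist across reboots unless reapplied, e.g. from a startup script.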

On another note, damn, I like Kubuntu 18.04 a lot! KDE Plasma is now quite polished and pretty light in resource usage (at least compared to old KDE/Gnome)!
Anyone who would like to try it out instead of Ubuntu/Lubuntu should; it is closer to the Windows UI in general, and the instructions in here work just fine.
My 2nd system is using KDE Neon (now upgraded to 18.04 as well).
ID: 1956539
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1956550 - Posted: 20 Sep 2018, 22:25:36 UTC - in response to Message 1956539.  
Last modified: 20 Sep 2018, 22:39:03 UTC


On another note, damn, I like Kubuntu 18.04 a lot! KDE Plasma is now quite polished and pretty light in resource usage (at least compared to old KDE/Gnome)!
Anyone who would like to try it out instead of Ubuntu/Lubuntu should; it is closer to the Windows UI in general, and the instructions in here work just fine.
My 2nd system is using KDE Neon (now upgraded to 18.04 as well).


. . Thanks for that tip. I might have a look at that ....

. . A question though: which system is running the KDE Plasma front end? The system with the 1060 says it is running Neon, so I guess that is your "2nd system". Also, is it hard to select either of those desktop options?

Stephen

:)
ID: 1956550
RickToTheMax

Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1956552 - Posted: 20 Sep 2018, 22:38:51 UTC - in response to Message 1956550.  

Lubuntu is close to the Windows UI too, though it feels a bit more dated (more Windows 2000/XP style).
ID: 1956552
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1956553 - Posted: 20 Sep 2018, 22:43:37 UTC - in response to Message 1956552.  
Last modified: 20 Sep 2018, 22:44:16 UTC

Lubuntu is close to the Windows UI too, though it feels a bit more dated (more Windows 2000/XP style).


. . I have tried Lubuntu and I do like the look and feel of it, but I have had problems getting it to run properly on my system, which is a mirror of your 2nd system: a Ryzen 7 1700, but with two 1060 6GB cards, not one. If I could get that to work I would be satisfied with it. What mobo is your version of that system using?

. . But I am willing to try Kubuntu if it solves my problems ...

Stephen

:)
ID: 1956553