Setting up Linux to crunch CUDA90 and above for Windows users

Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1957078 - Posted: 24 Sep 2018, 6:55:52 UTC - in response to Message 1957023.  
Last modified: 24 Sep 2018, 7:32:21 UTC

It can be done and works.

Here:
http://setiathome.berkeley.edu/forum_thread.php?id=81271&postid=1952070
You can skip the unplug/replug GPU step.

But actually the most important part is the xorg.conf.

You can simply disable the Nouveau driver, install the Nvidia driver (even from the device manager), and write the right xorg.conf.
For Intel, you may put "i915" as driver name (check on your system).
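
A minimal sketch of the kind of xorg.conf described here, written to a scratch file so it can be reviewed before anything under /etc/X11 is touched. The driver name (`modesetting`, X.Org's generic driver), the BusID, the identifiers and the output path are all placeholder assumptions, not values from this thread; check your own hardware with `lspci -k`.

```shell
#!/bin/sh
# Sketch only: generate a sample xorg.conf that puts the desktop on the
# onboard adapter. All values below are assumptions for illustration.
ONBOARD_DRIVER="${ONBOARD_DRIVER:-modesetting}"  # assumed X driver name
ONBOARD_BUSID="${ONBOARD_BUSID:-PCI:0:2:0}"      # assumed bus address
SAMPLE_CONF="${SAMPLE_CONF:-${TMPDIR:-/tmp}/xorg.conf.sample}"

cat > "$SAMPLE_CONF" <<EOF
Section "Device"
    Identifier "Onboard"
    Driver     "$ONBOARD_DRIVER"
    BusID      "$ONBOARD_BUSID"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device     "Onboard"
EndSection
EOF

echo "Wrote $SAMPLE_CONF"
```

Keeping the sample outside /etc/X11 means a mistake in it cannot stop the X server from starting; copy it into place only once the driver and BusID match your system.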

Next step is overclocking.
At the moment it does not work when the Nvidia card is not running Xorg. But I have ideas...

Sleepy
ID: 1957078
J. Mileski
Volunteer tester
Joined: 9 Jun 02
Posts: 632
Credit: 172,116,532
RAC: 572
United States
Message 1957083 - Posted: 24 Sep 2018, 9:33:42 UTC - in response to Message 1957078.  
Last modified: 24 Sep 2018, 9:34:16 UTC

To be clear, my onboard graphics is an ASPEED chipset on the motherboard, not a GPU integrated into the CPU.

I'm going to make it easy on myself and disable the onboard video when I install Linux.
ID: 1957083
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1957084 - Posted: 24 Sep 2018, 9:44:13 UTC - in response to Message 1957083.  

I'm going to make it easy on myself and disable the onboard video when I install linux.
I think that is a good idea. The performance you gain from the special app will more than make up for the video usage ... as long as your card has enough memory.
ID: 1957084
J. Mileski
Volunteer tester
Joined: 9 Jun 02
Posts: 632
Credit: 172,116,532
RAC: 572
United States
Message 1957085 - Posted: 24 Sep 2018, 9:48:06 UTC - in response to Message 1957084.  

The two GTX 960s are 4 GB each, from MSI.
ID: 1957085
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1957093 - Posted: 24 Sep 2018, 10:51:57 UTC - in response to Message 1957078.  

It does not matter which onboard graphics you have.
I assumed it was the Intel GPU embedded in recent CPUs, but my suggestions are the same.
You should only have to change the driver entry in xorg.conf.

In my experience, the problem was not what you lose or gain in SETI crunching; that was fine whether or not my Nvidia card was running Xorg, with no big difference.
The difference was the usability of the PC. With the Nvidia card running Xorg, the stuttering and lag were not acceptable to me while working.
Disabling SETI while working would be overkill and would probably cancel out the advantages of the new application.
With my current settings I am really fine. Working and crunching do not interfere.

Sleepy
ID: 1957093
J. Mileski
Volunteer tester
Joined: 9 Jun 02
Posts: 632
Credit: 172,116,532
RAC: 572
United States
Message 1957098 - Posted: 24 Sep 2018, 11:55:26 UTC - in response to Message 1957093.  

I should be clear too: I only crunch SETI on this box. I currently have two 120 GB SSDs in it; Windows is installed on one and Linux will be installed on the other. The only time I'm on this computer is to see if it is still running, or if Windows needs a reboot after an update. I have other computers for my internet needs.
ID: 1957098
Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1957156 - Posted: 24 Sep 2018, 21:10:19 UTC - in response to Message 1957098.  

Many of us don't have onboard video that can run at the same time as a discrete video card, and most of us have not posted about any issues.

Tom
A proud member of the OFA (Old Farts Association).
ID: 1957156
RickToTheMax
Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1957364 - Posted: 26 Sep 2018, 20:34:19 UTC

One of my hosts is not a dedicated cruncher, and sometimes I need to suspend/resume tasks...
In the special app docs I can see this:
"5) The App may give Incorrect results on a restarted task. One way to avoid restarted tasks is to set the checkpoint higher than the task's estimated run-time, and also avoid suspending/resuming a task."
How do I set the checkpoint? Is it the "Request tasks to checkpoint at most every X seconds" setting in the BOINC computing preferences?
If I set it higher than my task run time, would it just discard the task's progress if it has not finished, start over when resuming, and not generate an incorrect result?

Another problem I have had since adding a 2nd card to my other host: I had to use nvidia-xconfig --enable-all-gpus and set Coolbits on both devices.
But now it seems I have a ghost 2nd screen. When I move my mouse to the right side, it is as if a 2nd monitor were connected: my cursor vanishes, and when I move back to my screen the cursor is at a different position. (It feels like the ghost 2nd screen is using a smaller resolution.)
ID: 1957364
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1957369 - Posted: 26 Sep 2018, 21:06:44 UTC - in response to Message 1957364.  
Last modified: 26 Sep 2018, 21:13:21 UTC

. . Hi Rick

. . Yes!

. . I can't remember if the default is 120, 180 or 300 seconds. I have used several different values there depending on the machine. But as long as you set it definitely higher than your normal run time you should have very few if any restarts. It does mean the running task will be discarded, but for one reason or another I have occasionally had a task restart and not had it crash, so the problem may no longer be as severe as it had been. But it is still prudent to avoid restarts as much as possible. If I need to stop crunching at any time (with multiple GPUs) I check the task status and wait for the most advanced task to end, so I am only dumping one partially completed task. But that is me ... :)
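
For anyone who prefers a file to the preferences dialog, here is a minimal sketch of a local override that sets the same checkpoint interval. It assumes the setting is stored as `disk_interval` in `global_prefs_override.xml`; the 3600-second value and the scratch output path are placeholders chosen for illustration, not recommendations from this thread.

```shell
#!/bin/sh
# Sketch only: write a BOINC preferences override with a long checkpoint
# interval. 3600 seconds is an assumed value; pick one above your own
# task run time.
CHECKPOINT_SECS="${CHECKPOINT_SECS:-3600}"
OVERRIDE_FILE="${OVERRIDE_FILE:-${TMPDIR:-/tmp}/global_prefs_override.xml}"

cat > "$OVERRIDE_FILE" <<EOF
<global_preferences>
    <disk_interval>$CHECKPOINT_SECS</disk_interval>
</global_preferences>
EOF

echo "Wrote $OVERRIDE_FILE"
# To apply it, copy the file into the BOINC data directory (the location
# depends on how BOINC was installed) and then run:
#   boinccmd --read_global_prefs_override
```

If an override file already exists in the data directory, merge the one element into it rather than replacing the file, since it may carry other local settings.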

. . As for the screen issue it sounds to me like you have set the "output screen size" larger than your physical display can manage so it is a virtual screen you have to scroll around. Check out the maximum resolution your monitor can display and set the screen size in xorg to match it. It may be as simple as having a 4:3 monitor but selecting 16:9 screen size. Been there and done that :)

Stephen

:)
ID: 1957369
RickToTheMax
Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1957370 - Posted: 26 Sep 2018, 21:21:26 UTC - in response to Message 1957369.  


. . As for the screen issue it sounds to me like you have set the "output screen size" larger than your physical display can manage so it is a virtual screen you have to scroll around. Check out the maximum resolution your monitor can display and set the screen size in xorg to match it. It may be as simple as having a 4:3 monitor but selecting 16:9 screen size. Been there and done that :)


Actually, my working screen resolution is fine; no problem there. The problem is that my system acts as if a second display were connected when there is none.
Something is probably wrong with my xorg config.
ID: 1957370
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1957398 - Posted: 27 Sep 2018, 1:45:27 UTC

Interesting symptom Rick. Zalster has been complaining of the exact same thing on a computer at work that he crunches with. Now I know it is not a one-off problem since you have it too. I think there is something wrong with your xorg.conf configuration too. Have you started from scratch with a --allow-empty-initial-configuration on nvidia-xconfig? Then follow up with a --no-separate-x-screens? Or a --only-one-x-screen?
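
Before regenerating the file, it may help to see how many X screens the current configuration defines. The sketch below is only a quick check under the assumption that the ghost display comes from an extra Screen section; the `count_screens` helper is a hypothetical name, and the default path is the usual location of xorg.conf rather than something confirmed in this thread.

```shell
#!/bin/sh
# count_screens FILE: print how many top-level Screen sections FILE contains.
count_screens() {
    grep -c '^[[:space:]]*Section[[:space:]]*"Screen"' "$1"
}

XORG_CONF="${XORG_CONF:-/etc/X11/xorg.conf}"
if [ -r "$XORG_CONF" ]; then
    echo "Screen sections in $XORG_CONF: $(count_screens "$XORG_CONF")"
    # Show where each screen is placed in the layout, with line numbers.
    grep -n '^[[:space:]]*Screen[[:space:]]' "$XORG_CONF" || true
fi
```

More than one Screen section with only one monitor attached would be consistent with the cursor wandering off onto a display that is not there.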
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1957398
Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1957408 - Posted: 27 Sep 2018, 2:34:18 UTC

Just managed to make a mess of the Threadripper.

I tried to update the Nvidia drivers and it has gone badly wrong. After I restarted, it failed to start. All I am getting is a blank screen with a message that occasionally appears and disappears so fast that I cannot read it.

I can get into the boot manager and the advanced options but cannot work out how to fix it.
Kevin


ID: 1957408
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1957411 - Posted: 27 Sep 2018, 3:06:41 UTC - in response to Message 1957408.  

Just a wild guess, since I was just dealing with this issue: try moving your connected monitor to the other GPUs in the system and see whether the X server's monitor output has moved to one of the other cards due to re-enumeration of the BusIDs in the xorg.conf file. The symptom I had was that the last output on the screen was the enabling of my user ID, and then the output appeared to freeze. I have removed the no splash in the kernel command line so I can see the boot process. In reality, that point is where the X server starts rendering the desktop, and the output had moved to the bottom installed card. All I had to do was unplug the monitor from the top card and try the outputs of each other card until I found the normal desktop sitting there waiting for me. Then I just had to figure out the BusID of each card with the Nvidia X settings app and rewrite the xorg.conf file to identify the cards properly and get the desktop output back onto the top-slot card, which is the one I normally connect to the monitor.
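
If the desktop is unreachable and the settings app cannot be opened, the BusIDs can also be worked out from a console. The sketch below assumes `lspci` is installed and relies on the fact that `lspci` prints slot addresses in hexadecimal while xorg.conf expects decimal; `to_xorg_busid` is a hypothetical helper name, and it handles the short `bus:device.function` form only (no PCI domain prefix).

```shell
#!/bin/sh
# to_xorg_busid SLOT: convert an lspci slot such as "0a:00.0" (hex) into
# the decimal "PCI:bus:device:function" form used in xorg.conf.
to_xorg_busid() {
    bus=${1%%:*}        # text before the first ':'
    rest=${1#*:}        # text after the first ':'
    dev=${rest%%.*}     # text before the '.'
    fn=${rest#*.}       # text after the '.'
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# List every display adapter with its xorg.conf-style BusID.
if command -v lspci >/dev/null 2>&1; then
    lspci | grep -Ei 'vga|3d controller' | while read -r slot desc; do
        echo "$(to_xorg_busid "$slot")  $desc"
    done
fi
```

The conversion matters on boards with many slots: a card that `lspci` shows at `0a:00.0` has to be written as `PCI:10:0:0` in the Device section.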
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1957411
RickToTheMax
Joined: 22 May 99
Posts: 105
Credit: 7,958,297
RAC: 0
Canada
Message 1957412 - Posted: 27 Sep 2018, 3:11:31 UTC - in response to Message 1957411.  

Thanks for the help! I will try your suggestions
ID: 1957412
Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1957413 - Posted: 27 Sep 2018, 3:16:31 UTC - in response to Message 1957411.  

No, no signal on any other card, boot manager is on first card.
Kevin


ID: 1957413
Keith Myers (Special Project $250 donor)
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1957416 - Posted: 27 Sep 2018, 3:51:15 UTC

If you can boot to the recovery mode, then drop to the root console and enable read/write access to the drive that has the installation with a:
mount -o rw,remount /

then do a:
apt-get remove --purge 'nvidia*'
followed up with a:
apt-get autoremove

Then do the installation again either with the nvidia .run file or with a:
apt-get install nvidia-396
or
apt-get install nvidia-390

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 1957416
Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1957417 - Posted: 27 Sep 2018, 3:57:12 UTC - in response to Message 1957416.  

Ok will do
Kevin


ID: 1957417
Kevin Olley
Joined: 3 Aug 99
Posts: 906
Credit: 261,085,289
RAC: 572
United Kingdom
Message 1957424 - Posted: 27 Sep 2018, 5:10:30 UTC

Up and running again, Thanks
Kevin


ID: 1957424
Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1957429 - Posted: 27 Sep 2018, 6:35:50 UTC - in response to Message 1957416.  

. . Hi Keith,

. . Did I misunderstand TBar about using the recovery mode? I thought the reason for doing that was that there are no video drivers loaded at that point and the install will run as a clean install, so there is no need to stop lightdm or run apt-get remove?

Stephen

? ?
ID: 1957429
Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1957431 - Posted: 27 Sep 2018, 7:04:25 UTC - in response to Message 1957429.  

The drivers may not be loaded when in recovery mode, but they are still installed on the system you are wanting to update.
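
The distinction can be checked from the recovery console. This is a minimal sketch, assuming a Debian or Ubuntu style system with `dpkg-query` available; `nvidia_module_loaded` is a hypothetical helper that reads /proc/modules, where each line starts with a module name.

```shell
#!/bin/sh
# nvidia_module_loaded [FILE]: succeed if an nvidia kernel module is listed
# in FILE (default /proc/modules).
nvidia_module_loaded() {
    grep -q '^nvidia' "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module: loaded"
else
    echo "nvidia kernel module: not loaded"
fi

# Installed packages are a separate question from loaded modules.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${db:Status-Abbrev} ${Package} ${Version}\n' 'nvidia*' 2>/dev/null \
        || echo "no nvidia packages known to dpkg"
fi
```

Seeing "not loaded" alongside a list of installed packages is the situation described above: nothing is running, but the old driver files are still on disk.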
ID: 1957431


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.