Setting up Linux to crunch CUDA90 and above for Windows users

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997953 - Posted: 12 Jun 2019, 20:30:00 UTC - in response to Message 1997952.  

If you read the AIO installer docs, they say that to change to the 0.98b1 CUDA10.1 application you need to edit app_info.xml and replace the CUDA9.0 application name. Other than that, and having an Nvidia driver compatible with Turing cards, nothing else is needed.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

EdwardPF
Volunteer tester
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 1997954 - Posted: 12 Jun 2019, 20:34:33 UTC - in response to Message 1997953.  

Great, THANKS!!

This occasional Linux management is a brain teaser for me ... Patience PLEASE!!

Thanks again!!

Ed F

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1997967 - Posted: 12 Jun 2019, 23:49:18 UTC - in response to Message 1997954.  

Just open app_info.xml with the Text Editor and do a Find and Replace from the hamburger menu, substituting the 0.98b1 CUDA101 filename for the 0.98b1 CUDA90 filename.
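
The same find-and-replace can be sketched from a terminal instead of the Text Editor. This is only an illustration: swap_app_name is a made-up helper, and OLD_APP_NAME and NEW_APP_NAME are placeholders for the real CUDA90 and CUDA101 filenames shipped with the installer, which are not spelled out in this thread. It keeps a backup copy next to the original.

```shell
# Hypothetical helper (not part of the AIO installer).
# $1 = path to app_info.xml, $2 = old app filename, $3 = new app filename.
swap_app_name() {
    cp -p "$1" "$1.bak"        # keep a backup before editing
    sed -i "s|$2|$3|g" "$1"    # replace every occurrence (GNU sed syntax)
}

# Example call, with placeholder names:
# swap_app_name app_info.xml OLD_APP_NAME NEW_APP_NAME
```

The old name is treated as a regular expression, so a dot matches any character; for filenames that is normally harmless. BOINC normally reads app_info.xml only when the client starts, so restart it after the edit.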
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1998151 - Posted: 14 Jun 2019, 7:15:18 UTC
Last modified: 14 Jun 2019, 9:06:29 UTC

Here is a "fun fact"

Back in Feb I upgraded my old AMD MB machine with a new MB; it has a GTX 970. It took that machine 17 days to reach a RAC of 15,000 using Windows 10.

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8669914

I then paired that old MB with a pair of GTX 750 Tis, Linux and the "special app", and it reached the same RAC in 5 days. Unlike the Win machine, it hasn't stopped yet.

https://setiathome.berkeley.edu/show_host_detail.php?hostid=8730293

Now I will be the first to admit I struggled with the first Linux install, and I still don't really like the way it works. But as the three machines I have converted are only headless crunchers, and I still have the two Windows machines for daily use, what's not to like? ;-)

A quick PS: a smiling post lady just delivered a parcel, and look what happened.

[2] NVIDIA GeForce GTX 1060 3GB (3019MB) driver: 418.56 OpenCL: 1.2


One becomes two :-)

Now that will be interesting.

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1998155 - Posted: 14 Jun 2019, 9:10:12 UTC - in response to Message 1998151.  

+1
A proud member of the OFA (Old Farts Association).

Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1998222 - Posted: 14 Jun 2019, 21:46:18 UTC - in response to Message 1998155.  
Last modified: 14 Jun 2019, 21:47:22 UTC

+1
Yep, doing nicely here. Still thinking this beast will top out ~100k.

Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1998489 - Posted: 17 Jun 2019, 1:08:48 UTC

. . So, is it worthwhile to start a sweep on when someone will eventually kick start 18dc09aa or kick it to the weeds?

Stephen

. . I'll take 6pm Wednesday 19th June UTC ....

:)

Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1998501 - Posted: 17 Jun 2019, 4:22:43 UTC - in response to Message 1998222.  
Last modified: 17 Jun 2019, 4:23:22 UTC

Guess I jinxed it. Impressed that in less than 10 days the box worked itself over 100k RAC.
Unimpressed that the OS now seems to have cratered, and will no longer even boot.
I think the software updater did this, as the last action I took was responding to its notification of an Nvidia driver update. It now hangs with a message that the NVidia persistence daemon is waiting for boot to finish.
Twas nice while it lasted ...

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1998503 - Posted: 17 Jun 2019, 4:40:13 UTC - in response to Message 1998501.  

The Nvidia persistence daemon might be waiting on the password. I've run into persistence daemon issues myself when changing drivers.

Two choices. Purge the Nvidia drivers to go back to the stock Nouveau drivers and then reinstall the drivers, or re-enable the persistence daemon by restoring its group and user (passwd) entries. Either way, you are going to have to boot to a recovery mode terminal.

You should try this first. Boot into recovery mode Terminal and enter these commands.

# Create the nvidia-persistenced group only if it is missing
getent group nvidia-persistenced &>/dev/null || sudo groupadd -g 143 nvidia-persistenced
# Create the matching user only if it is missing
getent passwd nvidia-persistenced &>/dev/null || sudo useradd -c 'NVIDIA Persistence Daemon' -u 143 -g nvidia-persistenced -d '/' -s /sbin/nologin nvidia-persistenced
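
The getent-or-create pattern in the commands above only adds the group or user when the lookup fails. A small read-only sketch of the same lookup, which reports what exists without changing anything (account_status is a made-up name, not an Nvidia tool):

```shell
# Hypothetical read-only check: report whether a group and a user exist.
# $1 = account name to look up.
account_status() {
    if getent group "$1" >/dev/null 2>&1; then
        echo "group $1: present"
    else
        echo "group $1: missing"
    fi
    if getent passwd "$1" >/dev/null 2>&1; then
        echo "user $1: present"
    else
        echo "user $1: missing"
    fi
}

# Example: account_status nvidia-persistenced
```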

Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1998509 - Posted: 17 Jun 2019, 5:22:17 UTC - in response to Message 1998503.  
Last modified: 17 Jun 2019, 5:33:50 UTC

Fixed.
Some good info here about how-to-fix-an-ubuntu-system-when-it-wont-boot.
Booted Grub into Recovery Manager, then had dpkg do its package repair magic. First time didn't fly, second time didn't fly, third time through was the ticket.
First pass through it complained that the NVidia driver was only partially installed, likely due to upgrade fail.
Second time through, hung again.
Third time, it did a full reinstall of the NVidia drivers and then another reboot fixed it.
Brought back memories of installing Nortel stuff under VxWorks. I always got it working, but never knew what I was doing or why it worked...
Anyway, all better for the moment, and 100k RAC is worth a bit of grief. Thanks for the note.
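
For reference, the recovery menu's dpkg option does roughly what the following commands do from a root shell. This is a sketch based on general Ubuntu practice, not a record of what was run on this box; repair_packages is a made-up wrapper so that nothing runs until it is called.

```shell
# Hypothetical wrapper around the usual package-repair steps.
# Meant to be called as root from the recovery shell.
repair_packages() {
    mount -o remount,rw /     # recovery shells often start with / read-only
    dpkg --configure -a       # finish any half-configured packages
    apt-get install -f -y     # fix broken or missing dependencies
}

# Run it with: repair_packages
```

As the post above found, it can take more than one pass (and a reboot) before a half-installed driver is fully reinstalled.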

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1998515 - Posted: 17 Jun 2019, 6:48:15 UTC - in response to Message 1998509.  
Last modified: 17 Jun 2019, 6:53:54 UTC

Glad to hear you fixed the issue. Linux doesn't offer the simple repair tools that Windows does. But even Windows fails to repair itself often. That was what prompted me to give my last Windows system the heave-ho when it couldn't fix itself and was going to require a complete reinstall. If I was going to have to reinstall completely, might as well just switch to the Linux install.

[Edit] At least you were able to access the GRUB recovery menu. I swear I have teammates who don't even know how to get into the recovery menu. I have harped on that more than once: they should at least familiarize themselves with the process, as they will likely need it at some time. It is not really that hard to press the ESC or SHIFT key at boot, is it?

But they haven't even tried. Ho hum.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1998521 - Posted: 17 Jun 2019, 7:49:37 UTC - in response to Message 1998515.  
Last modified: 17 Jun 2019, 7:51:52 UTC

Oh, I've been known to blow things up just as much, but usually by trying. For example, I've test loaded Ubuntu on most of my various machines, with varying results.
One thing I have not yet succeeded at is a 18.04/Win 10 dual boot, though my Win 7 dual boot efforts were easy as could be. All the procedures seem pretty convoluted.
A few things that knock my socks off:
1) that apparently you can reinstall the OS and not lose existing user data and structure and, in some cases, even installed apps. Major irritant I've always had with Win.
2) that several scanners I have lying around can't be used on Win, as there was no driver support starting with Win 7, let alone 10. Yet 18.04 supported them just fine. Worth a dual boot just for that.
A few that don't:
GUIs are weak when it comes to file management, e.g. cutting and pasting a directory with the contents thereof.
Folks that write scripts to do stuff like installs and updates are pretty sloppy in providing user update status and progress messages, thus leading to impatience by dummies like me crashing stuff. Wasn't as bad when I still smoked:) We used to measure install scripts in the number of ciggies they took.

But it keeps me amused.

As for tonight's blow-up, it happened while I was trying to see why 1 GPU was using 0% CPU and had been running a 2 minute Arecibo task for almost half an hour, and had aborted several for timeout.
When I suspended the slow task, another didn't start, and as I suspended another it wasn't back-filled either. Thus, soon the machine was waiting for all 4 GPUs. It was while looking at that that I found Update Manager wanting to fiddle with the NVidia driver, and all the above drama ensued.
Pretty weird, but unless it happens again I won't be worried.
Later, ...

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 1998522 - Posted: 17 Jun 2019, 7:50:43 UTC - in response to Message 1998515.  

Best bet. Never do updates and nothing should break.

If you decide to do updates, pay attention to what packages are being installed, and don’t just blindly hit install. If you see nvidia packages trying to be upgraded, simply unselect them.

If you don’t feel confident installing some packages a la carte, then install them all, but before rebooting, purge everything with nvidia in its name:

sudo apt purge '*nvidia*'


Then before reboot, reinstall the drivers you want. For example:

sudo apt install nvidia-driver-410


But I recommend just never doing the updates if you want it to be stable and don’t do anything else with the system.
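
Another way to keep an update run from touching the driver is to put the package on hold. A minimal sketch, assuming the nvidia-driver-410 package from the example above; hold_driver and release_driver are made-up names, and related nvidia packages may need holding as well.

```shell
# Hypothetical wrappers around apt-mark.
# $1 = package name (defaults to the driver used in the example above).
hold_driver() {
    sudo apt-mark hold "${1:-nvidia-driver-410}"
    apt-mark showhold          # list everything currently on hold
}

release_driver() {
    sudo apt-mark unhold "${1:-nvidia-driver-410}"
}

# Example: hold_driver            (later: release_driver)
```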
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours


Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 1998524 - Posted: 17 Jun 2019, 7:58:26 UTC - in response to Message 1998522.  
Last modified: 17 Jun 2019, 7:59:20 UTC

You're right, all that box needs to do is crunch away in its little hidey hole in the basement.
Wondering if I should kill that update manager thingie ...
Thx.

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1998538 - Posted: 17 Jun 2019, 13:43:19 UTC - in response to Message 1998524.  

A lot of us have. I still have it notify me for "security" updates. But I have disabled all the rest including the version upgrade.

Tom
A proud member of the OFA (Old Farts Association).

Stephen "Heretic" (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1998548 - Posted: 17 Jun 2019, 14:47:08 UTC - in response to Message 1998538.  

. . Is there much of a trick to doing that? Sounds like the way to go ...

Stephen

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1998553 - Posted: 17 Jun 2019, 15:20:30 UTC - in response to Message 1998524.  

Another option would be to just disable the Driver PPA so it doesn't offer any more of those "Driver Updates".
I think most, if not all, of the Driver Update SNAFUs are coming from the PPA. Ubuntu rarely offers NV driver updates. If you need it, just re-enable the PPA.
Or, you could do as I have done and just bail on the mistake called 18.04 and move on to 19.04. The good part about 19.04 is it has NV driver 418.56 in the Ubuntu Repository, no need to use the PPA.
Other than having to compile a new version of BOINC against the new OpenSSL, I haven't had any trouble with 19.04. There is currently a list of recent updates I haven't installed, and none of them is an NV driver update.
One of these days I will install them.
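
To see which PPAs are currently active before disabling one, something like the following sketch may help. list_ppas is a made-up helper, and the PPA name in the comment is an assumption (the thread does not say which driver PPA is in use).

```shell
# Hypothetical helper: print active (uncommented) PPA entries.
# $1 = directory to scan (defaults to the usual Ubuntu location).
list_ppas() {
    grep -rhs '^deb .*ppa\.launchpad\.net' "${1:-/etc/apt/sources.list.d}" || true
}

# To drop a PPA once identified (the PPA name here is an assumption):
# sudo add-apt-repository --remove ppa:graphics-drivers/ppa
```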

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 1998558 - Posted: 17 Jun 2019, 17:05:56 UTC - in response to Message 1998548.  

Just toggle off the updates in the Updates tab of the Software & Updates tool and leave only the security updates active.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 1998598 - Posted: 17 Jun 2019, 21:28:06 UTC - in response to Message 1998548.  

Start up "updater" from menu. Follow around till you find "settings". Change to "security only", check every 2 weeks, no version update (Either check boxes or drop downs).
Save.
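
The result of those settings can also be checked from a terminal. A sketch, assuming the usual Ubuntu location /etc/update-manager/release-upgrades for the version-upgrade prompt; release_prompt is a made-up helper name.

```shell
# Hypothetical helper: print the release-upgrade prompt setting
# (normal, lts or never). $1 = file to read.
release_prompt() {
    sed -n 's/^Prompt=//p' "${1:-/etc/update-manager/release-upgrades}"
}

# The periodic update-check intervals (in days) can be listed with:
# apt-config dump | grep Periodic
```

With version upgrades switched off, release_prompt should print never.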
A proud member of the OFA (Old Farts Association).

Bernie Vine
Volunteer moderator
Volunteer tester
Joined: 26 May 99
Posts: 9954
Credit: 103,452,613
RAC: 328
United Kingdom
Message 1998695 - Posted: 18 Jun 2019, 19:06:00 UTC

Just before the outage finished I had this task error

Aborting task blc24_2bit_guppi_58340_31750_HIP112870_0010.23665.409.20.29.37.vlar_1: exceeded elapsed time limit 3440.98 (185579.28G/53.93G)

followed immediately by 112 of these

Starting task blc24_2bit_guppi_58340_31750_HIP112870_0010.31050.818.19.28.202.vlar_1
[SETI@home] Task blc24_2bit_guppi_58340_31750_HIP112870_0010.31050.818.19.28.202.vlar_1 postponed for 180 seconds: Cuda device initialisation failed.

This continued to repeat over and over until I suspended BOINC. I restarted the machine and the 112 tasks are now running OK.

I cannot tell which GPU was the culprit, as it is not in the stdoutdae.txt file or the error file that was reported.

Any ideas?
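
The log lines quoted above do not name a GPU, but the event log can at least be summarised. A minimal sketch that pulls the affected task names out of a BOINC log, using the message text quoted above; the function names are made up, and the log path depends on how BOINC was installed.

```shell
# Hypothetical helpers for summarising a BOINC event log.
# $1 = path to the log (e.g. stdoutdae.txt).

# Print each task that was postponed for a CUDA init failure, once.
postponed_tasks() {
    sed -n 's/.*Task \(.*\) postponed for .*Cuda device initialisation failed.*/\1/p' "$1" | sort -u
}

# Count the distinct postponed tasks.
count_postponed() {
    postponed_tasks "$1" | wc -l | tr -d ' '
}

# nvidia-smi -L lists the cards the driver can currently see.
```

If the count matches the number of tasks in the cache, the failure probably hit every task rather than one card's share, but that is a guess and not something the log confirms.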