Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
If you read the AIO installer docs, they say that if you want to change to the 0.98b1 CUDA10.1 application you will need to edit app_info and change the application name from the CUDA9.0 application. Other than that and having an Nvidia driver compatible with Turing cards, nothing else is needed. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
Great, THANKS!! This occasional Linux management is a brain teaser for me ... Patience PLEASE!! Thanks again!! Ed F |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Just open the app_info with the Text Editor and do a Find and Replace from the hamburger menu substituting the 0.98b1 CUDA101 filename for the 0.98b1 CUDA90 filename. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
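The find-and-replace described above can also be done from a terminal. A minimal sketch, assuming a sed that accepts -i with a backup suffix (GNU and BSD both do); the two application filenames and the throwaway app_info.xml below are placeholders made up for illustration, not the real names shipped with the AIO installer:

```shell
#!/bin/sh
# Replace every occurrence of one application filename with another in an
# app_info.xml file, keeping a .bak copy of the original.
# NOTE: the names are used as sed patterns, so a "." matches any character.
swap_app_name() {
    old=$1; new=$2; file=$3
    sed -i.bak "s/$old/$new/g" "$file"
}

# Demo on a throwaway file; both filenames are hypothetical placeholders.
demo_dir=$(mktemp -d)
demo_file="$demo_dir/app_info.xml"
printf '<file_info><name>special_app_cuda90</name></file_info>\n' > "$demo_file"
swap_app_name 'special_app_cuda90' 'special_app_cuda101' "$demo_file"
cat "$demo_file"
```

The BOINC client reads app_info.xml at startup, so the client has to be restarted before an edit like this takes effect.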
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Here is a "fun fact" Back in Feb I upgraded my old AMD MB machine with a new MB, it has a GTX970. It took that machine 17 days to get an RAC of 15,000 using Windows 10 https://setiathome.berkeley.edu/show_host_detail.php?hostid=8669914 I then paired that old MB with a pair of GTX 750ti's, Linux and the "special app" and it reached the same RAC in 5 days, and unlike the Win machine it hasn't stopped yet. https://setiathome.berkeley.edu/show_host_detail.php?hostid=8730293 Now I will be the first to admit I struggled with the first Linux install, and still don't really like the way it works, but as the three machines I have converted are only headless crunchers and I still have the two Windows machines for daily use what's not to like. ;-) A quick PS, a smiling post lady just delivered a parcel, and look what happened [2] NVIDIA GeForce GTX 1060 3GB (3019MB) driver: 418.56 OpenCL: 1.2 One becomes two :-) Now that will be interesting. |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Here is a "fun fact" +1 A proud member of the OFA (Old Farts Association). |
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
I then paired that old MB with a pair of GTX 750ti's, Linux and the "special app" and it reached the same RAC in 5 days, and unlike the Win machine it hasn't stopped yet. +1 Yep, doing nicely here. Still thinking this beast will top out ~100k. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . So, is it worthwhile to start a sweep on when someone will eventually kick start 18dc09aa or kick it to the weeds? Stephen ? ? . . I'll take 6pm Wednesday 19th June UTC .... :) |
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
I then paired that old MB with a pair of GTX 750ti's, Linux and the "special app" and it reached the same RAC in 5 days, and unlike the Win machine it hasn't stopped yet. Guess I jinxed it. Impressed that in less than 10 days the box worked itself over 100k RAC. Unimpressed that the OS now seems to have cratered, and will no longer even boot. I think the software updater did this, as last action I took was responding to its notification of an Nvidia driver update. Now hangs with a message that NVidia persistence driver is waiting for boot to finish. Twas nice while it lasted ... |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The Nvidia persistence daemon might be waiting on the password. I've run into persistence daemon issues myself when changing drivers. There are two choices: purge the Nvidia drivers to go back to the stock Nouveau drivers and then reinstall the drivers, or re-enable the persistence daemon by resetting the password. Either way you are going to have to boot to a recovery mode Terminal. You should try this first. Boot into the recovery mode Terminal and enter these commands. getent group nvidia-persistenced &>/dev/null || sudo groupadd -g 143 nvidia-persistenced Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
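Before creating the group it can help to see whether it is already there. A small read-only sketch, assuming a POSIX shell; it uses getent when it is installed and otherwise falls back to looking in /etc/group. The function name is made up for this example:

```shell
#!/bin/sh
# Report whether a group exists, using getent when available and falling
# back to a plain lookup in /etc/group. Nothing on the system is changed.
group_status() {
    name=$1
    if command -v getent >/dev/null 2>&1; then
        if getent group "$name" >/dev/null 2>&1; then
            echo present
        else
            echo missing
        fi
    elif grep -q "^$name:" /etc/group 2>/dev/null; then
        echo present
    else
        echo missing
    fi
}

group_status nvidia-persistenced
```

If this prints "missing", the groupadd half of the command in the post is the part that would actually run.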
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
The Nvidia persistence daemon might be waiting on the password. I've run into persistence daemon issues myself when changing drivers. Fixed. Some good info here about how-to-fix-an-ubuntu-system-when-it-wont-boot. Booted Grub into Recovery Manager, then had dpkg do its package repair magic. First time didn't fly, second time didn't fly, third time through was the ticket. On the first pass it complained that the NVidia driver was only partially installed, likely due to the failed upgrade. Second time through, it hung again. Third time, it did a full reinstall of the NVidia drivers and then another reboot fixed it. Brought back memories of installing Nortel stuff under VxWorks. I always got it working, but never knew what I was doing or why it worked... Anyway, all better for the moment, and 100k RAC is worth a bit of grief. Thanks for the note. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Glad to hear you fixed the issue. Linux doesn't offer the simple repair tools that Windows does. But even Windows fails to repair itself often. That was what prompted me to give my last Windows system the heave-ho when it couldn't fix itself and was going to require a complete reinstall. If I was going to have to reinstall completely, might as well just switch to the Linux install. [Edit] At least you were able to access the GRUB recovery menu. I swear I have team mates that don't even know how to get into the recovery menu. And I have harped on that fact more than once, that they should at least familiarize themselves with the process as they will likely need to use it at some time. It is not really that hard to press the ESC or SHIFT key at boot is it? But they haven't even tried. Ho hum. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
... Oh, I've been known to blow things up just as much, but usually by trying. For example, I've test loaded Ubuntu on most of my various machines, with varying results. One thing I have not yet succeeded at is a 18.04/Win 10 dual boot, though my Win 7 dual boot efforts were easy as could be. All the procedures seem pretty convoluted. A few things that knock my socks off: 1) that apparently you can reinstall the OS and not lose existing user data and structure and, in some cases, even installed apps. Major irritant I've always had with Win. 2) that several scanners I have lying around I can't use on Win, as there was no driver support starting with Win 7, let alone 10. Yet 18.04 supported them just fine. Worth a dual boot just for that. A few that don't: GUIs are weak when it comes to file management, e.g. cutting and pasting a directory with the contents thereof. Folks that write scripts to do stuff like installs and updates are pretty sloppy in providing user update status and progress messages, thus leading to impatience by dummies like me crashing stuff. Wasn't as bad when I still smoked:) We used to measure install scripts in the number of ciggies they took. But it keeps me amused. As for tonight's blow-up, it happened while I was trying to see why 1 GPU was using 0% CPU and had been running a 2 minute Arecibo task for almost half an hour, and had aborted several for timeout. When I suspended the slow task, another didn't start, and as I suspended another it wasn't back-filled either. Thus, soon the machine was waiting for all 4 GPUs. It was while looking at that that I found Update Manager wanting to fiddle with the NVidia driver, and all the above drama ensued. Pretty weird, but unless it happens again I won't be worried. Later, ... |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Best bet. Never do updates and nothing should break. If you decide to do updates, pay attention to what packages are being installed, and don’t just blindly hit install. If you see nvidia packages trying to be upgraded, simply unselect them. If you don’t feel confident installing some packages a la carte, then install them all, but before reboot, purge everything with nvidia in it (the pattern is quoted so the shell does not expand it against local files): sudo apt purge '*nvidia*' Then, still before reboot, reinstall the drivers you want. For example: sudo apt install nvidia-driver-410 But I recommend just never doing the updates if you want it to be stable and don’t do anything else with the system. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
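As an alternative to unselecting the Nvidia packages by hand each time, apt can be told to hold them at their current version. A rough sketch, assuming a Debian/Ubuntu system; the sample package list is illustrative, the helper names are invented for this example, and the script only prints the apt-mark commands rather than running them:

```shell
#!/bin/sh
# Keep only package names that contain "nvidia" (case-insensitive).
nvidia_packages() {
    grep -i 'nvidia' || true
}

# Print, but do not run, an "apt-mark hold" command per package name on stdin.
print_hold_commands() {
    while IFS= read -r pkg; do
        if [ -n "$pkg" ]; then
            printf 'sudo apt-mark hold %s\n' "$pkg"
        fi
    done
}

# On a real system the list of packages known to dpkg would come from:
#   dpkg-query -W -f='${Package}\n'
# Here a small illustrative list stands in for it.
hold_cmds=$(printf '%s\n' nvidia-driver-410 boinc-client libnvidia-compute-410 \
    | nvidia_packages | print_hold_commands)
printf '%s\n' "$hold_cmds"
```

Running the printed commands would keep those packages at their installed version until sudo apt-mark unhold is used on them.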
Jimbocous Joined: 1 Apr 13 Posts: 1853 Credit: 268,616,081 RAC: 1,349 |
Best bet. Never do updates and nothing should break. You're right, all that box needs to do is crunch away in its little hidey hole in the basement. Wondering if I should kill that update manager thingie ... Thx. |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
Best bet. Never do updates and nothing should break. A lot of us have. I still have it notify me for "security" updates. But I have disabled all the rest including the version upgrade. Tom A proud member of the OFA (Old Farts Association). |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
You're right, all that box needs to do is crunch away in it's little hidey hole in the basement. . . Is there much of a trick to doing that? Sounds like the way to go ... Stephen ? ? |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Best bet. Never do updates and nothing should break. Another option would be to just disable the Driver PPA so it doesn't offer any more of those "Driver Updates". I think most, if not all, of the Driver Update SNAFUs are coming from the PPA. Ubuntu rarely offers NV driver updates. If you need it, just re-enable the PPA. Or, you could do as I have done and just bail on the mistake called 18.04 and move on to 19.04. The good part about 19.04 is it has NV driver 418.56 in the Ubuntu Repository, no need to use the PPA. Other than having to compile a new version of BOINC with the new OpenSSL I haven't had any trouble with 19.04. There is currently a list of recent updates I haven't installed and none of them are a NV driver update. One of these days I will install them. |
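Disabling a PPA amounts to commenting out its deb lines in the matching file under /etc/apt/sources.list.d/. A sketch of that idea on a throwaway file; the assumption here is that the Driver PPA meant above is graphics-drivers/ppa, which the post does not actually name, and the file name and content are made up for the demo:

```shell
#!/bin/sh
# Comment out every active "deb" / "deb-src" line in an apt source file,
# keeping a .bak copy. Lines that are already comments are left alone.
disable_source_file() {
    sed -i.bak 's/^deb/# deb/' "$1"
}

# Demo on a temporary file (assumed PPA URL, for illustration only).
ppa_dir=$(mktemp -d)
ppa_file="$ppa_dir/graphics-drivers-ppa.list"
cat > "$ppa_file" <<'EOF'
deb http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic main
# deb-src http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic main
EOF
disable_source_file "$ppa_file"
cat "$ppa_file"
```

On a real system the file would need root to edit, and sudo apt update afterwards refreshes the package lists. Restoring the .bak copy re-enables the PPA.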
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
You're right, all that box needs to do is crunch away in it's little hidey hole in the basement. Just toggle off the updates in the Software&Update tools Update tab and leave only the Security Updates active. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Tom M Joined: 28 Nov 02 Posts: 5124 Credit: 276,046,078 RAC: 462 |
You're right, all that box needs to do is crunch away in it's little hidey hole in the basement. Start up "updater" from menu. Follow around till you find "settings". Change to "security only", check every 2 weeks, no version update (Either check boxes or drop downs). Save. A proud member of the OFA (Old Farts Association). |
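For a headless box with no GUI, roughly the same settings can be written as config files. A sketch only: the mapping from the GUI options to these keys is my assumption, the usual Ubuntu locations are taken to be /etc/apt/apt.conf.d/10periodic and /etc/update-manager/release-upgrades, and the script writes to a scratch directory instead of touching /etc:

```shell
#!/bin/sh
# Write an APT periodic setting to the given path:
# refresh the package lists every 14 days.
write_periodic_conf() {
    cat > "$1" <<'EOF'
APT::Periodic::Update-Package-Lists "14";
EOF
}

# Write the release-upgrade prompt setting (never offer a new Ubuntu version).
write_release_conf() {
    cat > "$1" <<'EOF'
[DEFAULT]
Prompt=never
EOF
}

# Demo: generate both files in a scratch directory and show them.
conf_dir=$(mktemp -d)
write_periodic_conf "$conf_dir/10periodic"
write_release_conf "$conf_dir/release-upgrades"
cat "$conf_dir/10periodic" "$conf_dir/release-upgrades"
```

Comparing the generated files with what is already in /etc before copying anything over is the safer way to use this.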
Bernie Vine Joined: 26 May 99 Posts: 9954 Credit: 103,452,613 RAC: 328 |
Just before the outage finished I had this task error: Aborting task blc24_2bit_guppi_58340_31750_HIP112870_0010.23665.409.20.29.37.vlar_1: exceeded elapsed time limit 3440.98 (185579.28G/53.93G). It was followed immediately by 112 of these: Starting task blc24_2bit_guppi_58340_31750_HIP112870_0010.31050.818.19.28.202.vlar_1 [SETI@home] Task blc24_2bit_guppi_58340_31750_HIP112870_0010.31050.818.19.28.202.vlar_1 postponed for 180 seconds: Cuda device initialisation failed. These kept repeating until I suspended BOINC. I restarted the machine and the 112 tasks are now running OK. I cannot tell which GPU was the culprit, as it is not in the stdoutdae.txt file or in the error file that was reported. Any ideas? |
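On where the 3440.98 in that abort message comes from: my understanding is that the two numbers in parentheses are the task's operation bound and the speed BOINC assumes for the application, both in units of 1e9, and that the limit is the first divided by the second. That reading is an assumption, not something the post states. A quick check of the arithmetic:

```shell
#!/bin/sh
# Pull the two figures out of "(185579.28G/53.93G)" and divide them:
# work bound in GFLOP / assumed speed in GFLOPS = time limit in seconds.
msg='exceeded elapsed time limit 3440.98 (185579.28G/53.93G)'
limit=$(printf '%s\n' "$msg" \
    | sed 's/.*(\([0-9.]*\)G\/\([0-9.]*\)G).*/\1 \2/' \
    | awk '{ printf "%.0f", $1 / $2 }')
echo "computed limit: ${limit} s"
```

The result lands within a second of the 3440.98 in the log, with the small gap explained by the speed being printed to only two decimals. It says nothing about which GPU was involved, only that the task ran past its time allowance.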