Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
Tom did you do updates? I am currently downloading a fresh copy of Ubuntu 18.04.3 (still got a half hour left). I will burn that onto my "install" flash drive with Rufus and then try again. The weekend is coming up. I will decide if I want to jump completely out of the mining rack or try to get the standoffs to be "flat". I have another (open box) B360-F Pro coming in. So I will be trying to beat my issues to death. Tom A proud member of the OFA (Old Farts Association). |
Darrell Wilcox Send message Joined: 11 Nov 99 Posts: 303 Credit: 180,954,940 RAC: 118 |
@ Jimbocous Thu Feb 6 20:45:02 2020 One website I found claims that a driver that is too new may cause an unstable system: "... If the graphics card is old, newer drivers can do more harm than good for system stability...." Have you tried backing the driver down to 390 or some such? Or moving up to 440 (which I am using)? |
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
Tom did you do updates?

    tlgalenson@moonshot4:~$ nvidia-smi
    Fri Feb  7 06:14:19 2020
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 440.48.02    Driver Version: 440.48.02    CUDA Version: 10.2     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
    | 56%   41C    P0    34W / 120W |    109MiB /  3019MiB |      7%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  P102-100            Off  | 00000000:02:00.0 Off |                  N/A |
    | 54%   30C    P8     8W / 250W |      0MiB /  5059MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   2  P102-100            Off  | 00000000:03:00.0 Off |                  N/A |
    |  0%   26C    P8    10W / 250W |      0MiB /  5059MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   3  P102-100            Off  | 00000000:06:00.0 Off |                  N/A |
    |  0%   28C    P8    12W / 250W |      0MiB /  5059MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   4  GeForce GTX 106...  Off  | 00000000:0A:00.0 Off |                  N/A |
    | 51%   27C    P8     6W / 120W |      2MiB /  3019MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   5  P102-100            Off  | 00000000:0D:00.0 Off |                  N/A |
    | 55%   25C    P8     7W / 250W |      0MiB /  5059MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   6  P102-100            Off  | 00000000:0F:00.0 Off |                  N/A |
    |  0%   28C    P8    11W / 250W |      0MiB /  5059MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   7  GeForce GTX 1070    Off  | 00000000:10:00.0 Off |                  N/A |
    |  0%   31C    P8     4W / 151W |      2MiB /  8119MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0      1595      G   /usr/lib/xorg/Xorg                            48MiB |
    |    0      1779      G   /usr/bin/gnome-shell                          57MiB |
    +-----------------------------------------------------------------------------+
    tlgalenson@moonshot4:~$

I believe I can now accuse the last version of Lubuntu that I was running of limiting me to 6 GPUs. I have deleted that Linux ISO, along with everything else except an ISO for Part-Ed boot, from the Linux ISO folder I keep on my Windows box. By George, I think I've got it. Since the MB is sitting on non-electrostatic plastic, I am going to shut it down for now. Step 2: get it set up in a more permanent "home". Tom A proud member of the OFA (Old Farts Association). |
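For anyone chasing the same "only 6 GPUs" symptom, a plain count of the cards the driver can see is easier to compare between installs than the full table. A minimal sketch, assuming the NVIDIA driver's `nvidia-smi -L` listing (one `GPU n: ...` line per card); the helper name `count_gpus` is made up for illustration:

```shell
#!/usr/bin/env bash
# count_gpus: hypothetical helper that counts "GPU n: ..." lines,
# the format printed by `nvidia-smi -L`, read from standard input.
count_gpus() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -Ec '^GPU [0-9]+:' || true
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "GPUs visible to the driver: $(nvidia-smi -L | count_gpus)"
fi
```

If the count is lower than the number of cards physically installed, the limit is below BOINC (distro, driver, BIOS, or risers), which matches what Tom found after switching installs.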
Stephen "Heretic" Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . Redundant |
Buckeye4LF Send message Joined: 19 Jun 00 Posts: 173 Credit: 54,916,209 RAC: 833 |
I got my new rig up and running last night and the rig is now starting to report results. I am surprised at how slow it is: ~6,000 seconds per WU. My older Windows-based machine is faster. Where should I check for issues? I have gone through this thread but do not see where I should start troubleshooting. I am very overwhelmed by the Linux learning curve. Computer ID: 8894243; Name: greg-MS-7C59; Avg. credit: 223.02; Total credit: 2,280; BOINC version: 7.16.3; CPU: AMD Ryzen Threadripper 3970X 32-Core Processor [Family 23 Model 49 Stepping 0] (64 processors); GPU: [2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 440.48 OpenCL: 1.2; OS: Linux Ubuntu 19.10 [5.0.0-38-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2)]; Last contact: 7 Feb 2020, 15:44:46 UTC |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I got my new rig up and running last night and the rig is now starting to report results. I am surprised at how slow it is: ~6,000 seconds per WU. My older Windows-based machine is faster. Where should I check for issues? I have gone through this thread but do not see where I should start troubleshooting. I am very overwhelmed by the Linux learning curve. Part of your problem is that you are using the slow stock apps provided by the project. A second problem is that you are likely using 100% of the CPU, which slows everything down as every part of the computer fights over resources. Reduce that in your compute preferences to something like 80-90%. I see no reason why you decided to install the stock BOINC repository install if you eventually intended to use the AIO setup; you should have just gone straight for that. I would remove the repository install completely and just start back up on the AIO package. It contains the faster apps for both CPU and GPU. After you get that working, come back and inquire about getting slightly more optimized apps running. There is a new AMD AVX app that is reported to run very well on Ryzen, as well as updated CUDA 10.2 apps that will perform better on your RTX cards. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Buckeye4LF Send message Joined: 19 Jun 00 Posts: 173 Credit: 54,916,209 RAC: 833 |
I did not do the repository install; I downloaded and unzipped the AIO. I thought it was unzip-and-go; maybe I missed a step somewhere. Not sure what I did wrong.... CPU usage is set at 90%. I was told by someone to reinstall Linux and start over......not very practical. I am using the CUDA 102 already.... Maybe I need to reboot to restart the project; I will try that when I get home from work. I can always go back to Windows........ |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Question is whether the AIO is compatible with Ubuntu 19.10. There is compatibility with 19.04 as there is a zipped up version in the AIO for 19.04. But there is the possibility that dependencies are not met and somehow, someway, the host decided to install the repo version of BOINC when the AIO failed to run. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I did not do the repository install; I downloaded and unzipped the AIO. I thought it was unzip-and-go; maybe I missed a step somewhere. Not sure what I did wrong.... All of the tasks that you have returned have been from the stock apps. I see that you have returned some CUDA 60 and SoG (OpenCL) tasks. You have not returned any work from the special app, and your CPU work is not using the apps provided in the AIO. If you were using the AIO properly, your CPU and GPU tasks would be labelled "Anonymous Platform", but that is not the case. All you have to do is look at your task list and you'll see what we mean. It's likely that you made some mistake in your app_info.xml file and BOINC went back to the stock apps. Post your app_info.xml file contents. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
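Before posting the file, a rough sanity check of app_info.xml can catch the most common edit mistakes (file missing, or a closing tag lost). This is only a sketch: `check_app_info` is a made-up helper, the `~/BOINC/projects/setiathome.berkeley.edu/` location assumes the AIO was unpacked to `~/BOINC`, and the optional `xmllint` pass only runs if that tool happens to be installed:

```shell
#!/usr/bin/env bash
# check_app_info: hypothetical helper. Prints "missing", "malformed" or "ok"
# for the app_info.xml path given as the first argument.
check_app_info() {
    local file="$1"
    # Missing or empty file: BOINC has nothing to describe the custom apps.
    if [ ! -s "$file" ]; then
        echo "missing"; return 1
    fi
    # Cheap structural check: both the opening and closing root tags exist.
    if ! grep -q '<app_info>' "$file" || ! grep -q '</app_info>' "$file"; then
        echo "malformed"; return 1
    fi
    # Stricter check, only when xmllint is available on this machine.
    if command -v xmllint >/dev/null 2>&1 && ! xmllint --noout "$file" 2>/dev/null; then
        echo "malformed"; return 1
    fi
    echo "ok"
}

# Assumed location: the AIO unpacked to ~/BOINC (adjust to your own path).
app_info="$HOME/BOINC/projects/setiathome.berkeley.edu/app_info.xml"
if [ -d "$HOME/BOINC" ]; then
    echo "app_info.xml: $(check_app_info "$app_info")"
fi
```

An "ok" here only says the file looks like XML; whether tasks show up as "Anonymous Platform" in the task list is still the real test.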
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I missed this part of the post . . . I am using the CUDA 102 already.... Likely he flubbed the edit of the app_info and it got thrown away. So that is why he got the stock apps. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
You can see a number of inconsistent results on this page: 1) The AIO doesn't use BOINC 7.16.3. 2) "Data directory: /home/greg" should be "Data directory: /home/greg/BOINC". This implies the Home folder is being used as the Data folder, which is quite messy. 3) "Creating new client state file": this usually means something BAD has happened. 4) "Could not open directory 'slots' from '/home/greg'": again, the Data directory should be /home/greg/BOINC. From the above, it would appear he is not running the BOINC AIO from the BOINC folder. Linux is giving me issues. Does anyone Know what I am doing wrong? Here is a paste from the terminal window..... Oh, and I have no idea if the AIO will work with Ubuntu 19.10; newer versions are REALLY GOOD at breaking things. It's best to stay with 18.04 for now. |
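The "Data directory" line is the quickest of those four clues to check yourself. A small sketch, assuming the All-In-One lives in `~/BOINC` and ships its own `boinccmd`; the helper name `data_dir_from_log` is invented for this example:

```shell
#!/usr/bin/env bash
# data_dir_from_log: hypothetical helper that prints the path from the first
# "Data directory: ..." line found on standard input, then stops reading.
data_dir_from_log() {
    sed -n '/Data directory: /{s/^.*Data directory: //p;q;}'
}

# Assumed layout: the All-In-One unpacked to ~/BOINC with its own boinccmd.
if [ -x "$HOME/BOINC/boinccmd" ]; then
    # Run from inside the folder so boinccmd talks to that folder's client.
    actual=$(cd "$HOME/BOINC" && ./boinccmd --get_messages | data_dir_from_log)
    echo "Client reports data directory: ${actual:-unknown}"
    [ "$actual" = "$HOME/BOINC" ] || echo "Expected $HOME/BOINC instead."
fi
```

If the reported directory is the home folder or somewhere under `/var/lib`, a different client than the All-In-One one is answering.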
Buckeye4LF Send message Joined: 19 Jun 00 Posts: 173 Credit: 54,916,209 RAC: 833 |
So, I'm not sure what I did wrong, but I think it is fixed. I was running the boinccmd command out of my AIO folder, but it was executing in the repository location var/lib/boinc. I let my tasks complete and deleted that folder. I then re-unpacked the AIO and unzipped ubuntu19.04 from the subfolder. I have already reported anonymous platform results in less than a minute. I did change the application file to run CUDA102 but left everything else alone. Thanks for everybody's comments. I am still learning the nuances of Linux, but it looks like I am up and running now. I did not revert; I am still running Ubuntu 19.10. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This happens when you have more than one version of Boinc on your computer and that 'other' version is installed as a Service, such as the Repository version of Boinc. That's why it's recommended to remove any other version of Boinc from your machine before adding the BOINC-All-In-One. The best way to remove the Repository version is to use the Package Manager to search for any file referencing boinc and choose to Remove them COMPLETELY. But now you have the BOINC files in your Home folder as well as the ones in the Repository Install, so you must be careful not to delete any BOINC files from your Home /BOINC folder. It's easiest to zip the active BOINC-All-In-One folder, save it to a USB drive, then reinstall Ubuntu and Do Not Install the Repository version of Boinc. All you need to run Boinc is a compatible video driver and the All-In-One folder. Here are the old instructions for the Berkeley version of BOINC, which is the version in the All-In-One folder: https://boinc.berkeley.edu/wiki/Installing_BOINC#The_Berkeley_Installer "...everything related to the BOINC client is contained within that directory (/Home/BOINC), and you should always run the client and the manager from that working directory." Pay special attention to the part about changing directories before running boincmgr from the Terminal. It's easiest to run boincmgr by just doing as suggested: open the BOINC folder, then double-click on boincmgr. |
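For those who prefer the terminal to the Package Manager, the same check can be sketched as below. It assumes a Debian/Ubuntu system with `dpkg`; `boinc_packages` is a made-up helper name, and the package and service names in the comments (`boinc-client`, `boinc-manager`) are the usual Ubuntu ones, so verify them against your own listing before removing anything:

```shell
#!/usr/bin/env bash
# boinc_packages: hypothetical helper that reads `dpkg -l` output on standard
# input and prints the names of installed ("ii") packages mentioning boinc.
boinc_packages() {
    awk '$1 == "ii" && $2 ~ /boinc/ { print $2 }'
}

if command -v dpkg >/dev/null 2>&1; then
    found=$(dpkg -l 2>/dev/null | boinc_packages)
    if [ -n "$found" ]; then
        printf 'Repository BOINC packages still installed:\n%s\n' "$found"
        # Removal is left as manual steps to review before running, e.g.:
        #   sudo systemctl disable --now boinc-client
        #   sudo apt purge boinc-client boinc-manager
    else
        echo "No repository BOINC packages found."
    fi
fi

# Start the All-In-One manager from its own folder (assumed to be ~/BOINC):
#   cd ~/BOINC && ./boincmgr
```

The script only reports; nothing is removed, so the BOINC files in the Home folder are never touched by it.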
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Here's a suggestion. Having spent last weekend building and then configuring a new Linux box to run the 'special sauce' app (thanks guys for that), I was mentally reviewing all my build/configure steps to make sure I hadn't missed anything. And I had. I believe that the 'Read Me' file (which I didn't read, of course) advises users not to restart processing from a checkpoint file. OK, I've now made the manual change suggested. But it would be far safer, far more robust, and probably quicker if you simply disabled the checkpointing code in the app - then the problem would go away completely, without requiring every user to read and act on the advice. You need to keep the - slightly broken - progress reporting code, of course, but just don't write a 'state.sah' file. That's all. |
tazzduke Send message Joined: 15 Sep 07 Posts: 190 Credit: 28,269,068 RAC: 5 |
Greetings All, I can confirm that I used the 19.04 boinc (AIO folder) on a Kubuntu 19.10 and also a Lubuntu 19.10 and noticed no problems; well, there were no errors in the event log at BOINC startup. I completed a fair few tasks with no known errors as well. I was using these versions of Ubuntu back in November, but have now gone back to running Linux Mint 19.3 on my 2 x Linux crunchers. Turns out I still like the Mint lol. Regards |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Here's a suggestion. Great suggestion Richard, need to mention it to the GPUUG developers to hunt out the checkpointing code in the special app and disable it. Then there would be no need to set a checkpoint interval longer than the normal crunching time for gpu tasks. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
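Until the app itself changes, the manual workaround is the long checkpoint interval. As far as I know the BOINC preference behind "request tasks to checkpoint at most every N seconds" is `disk_interval` in `global_prefs_override.xml`; treat that, and the helper name `checkpoint_override`, as assumptions. The sketch only prints the XML so it can be merged by hand into an existing override file:

```shell
#!/usr/bin/env bash
# checkpoint_override: hypothetical helper that prints a
# global_prefs_override.xml body with the given checkpoint interval (seconds).
checkpoint_override() {
    local seconds="$1"
    printf '<global_preferences>\n'
    printf '   <disk_interval>%s</disk_interval>\n' "$seconds"
    printf '</global_preferences>\n'
}

# Example: one hour, an illustrative value only; pick something longer than
# your own GPU task run time.
checkpoint_override 3600
```

After merging the `disk_interval` line into the override file in the BOINC folder, restarting the client (or asking it to re-read the override with `./boinccmd --read_global_prefs_override`) should pick it up.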
Tom M Send message Joined: 28 Nov 02 Posts: 5126 Credit: 276,046,078 RAC: 462 |
On my Intel MSI B360-F Pro MB I just turned off "cpu virtualization" and the severe lagging behavior went away. I had shut down the system and swapped in 3 GTX 1660 Supers for some non-P102-100 GPUs, and when I rebooted it was slow and laggy. I have all the security patches up to date on my freshly downloaded/installed Ubuntu 18.x.x(?), but until I toggled that off (why was it on? <shrug>) the problem persisted through multiple boots. Tom A proud member of the OFA (Old Farts Association). |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, really confused. Did you mean you turned off cpu hyperthreading, or that you turned off VT or virtualization? Probably interfering with the Above4G decoding. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I think you should go back into the BIOS and set all slots to gen 3 if you can. There’s usually a setting specifically for the main x16 slot, as well as a setting for the other slots through the chipset. In addition to the link speed query you’re running, I’d like to see a normal nvidia-smi command run to see GPU utilization while it’s running tasks on all GPUs. Same with the link speed query: make sure you’re always posting the output from when it’s actually running and not just sitting idle at the desktop. Power saving features will reduce the link speed if the card is not active. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
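One possible form of that link speed query, as a sketch only: it uses `nvidia-smi --query-gpu` with the PCIe generation fields and a made-up helper, `downshifted_gpus`, that lists the cards currently running below their maximum generation.

```shell
#!/usr/bin/env bash
# downshifted_gpus: hypothetical helper. Reads CSV rows of
# "index, name, current gen, max gen" and prints the index of every GPU
# whose current PCIe generation is below its maximum.
downshifted_gpus() {
    awk -F', ' '($3 + 0) < ($4 + 0) { print $1 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Run this while tasks are crunching; idle cards drop their link speed.
    nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.gen.max \
               --format=csv,noheader | downshifted_gpus
fi
```

A card listed here while it is busy crunching is worth a look in the BIOS slot settings; a card listed while idle is probably just power saving.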
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
I guess he means VT/virtualization, since he’s still got HT enabled (reporting 16 CPUs). I’m not sure why virtualization would matter for the problem he was experiencing, unless it’s some weird bug in the BIOS. But he mentioned in another thread that he changed another setting at the same time (turned mining mode off), so when you change 2 variables at once and observe a change, it’s hard to say which one really caused it. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.