Setting up Linux to crunch CUDA90 and above for Windows users

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2031184 - Posted: 7 Feb 2020, 10:39:41 UTC - in response to Message 2031123.  

> Tom did you do updates?
>
> certainly on a fresh install you want to do all your system updates before messing about with this.
>
> sudo apt update
> sudo apt upgrade
>
> reboot
>
> try again.


I am currently downloading a fresh copy of Ubuntu 18.04.3 (still got a half hour left).

I will burn that onto my "install" flash drive with Rufus and then try again.

The weekend is coming up. I will decide if I want to jump completely out of the mining rack or try to get the standoffs to be "flat".

I have another (open box) B360-F Pro coming in. So I will be trying to beat my issues to death.

Tom
A proud member of the OFA (Old Farts Association).

Darrell Wilcox Project Donor
Volunteer tester
Joined: 11 Nov 99
Posts: 303
Credit: 180,954,940
RAC: 118
Vietnam
Message 2031185 - Posted: 7 Feb 2020, 10:45:01 UTC - in response to Message 2031148.  

@ Jimbocous
Thu Feb 6 20:45:02 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+

One website I found claims that a driver that is too new may make the system unstable.

"... If the graphics card is old, newer drivers can do more harm than good for system stability...."

Have you tried backing the driver down to 390 or some such? Or moving up to 440 (which I am using)?

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2031198 - Posted: 7 Feb 2020, 12:30:53 UTC - in response to Message 2031184.  

>> Tom did you do updates?
>>
>> certainly on a fresh install you want to do all your system updates before messing about with this.
>>
>> sudo apt update
>> sudo apt upgrade
>>
>> reboot
>>
>> try again.
>
> I am currently downloading a fresh copy of Ubuntu 18.04.3 (still got a half hour left).
>
> I will burn that onto my "install" flash drive with Rufus and then try again.
>
> The weekend is coming up. I will decide if I want to jump completely out of the mining rack or try to get the standoffs to be "flat".
>
> I have another (open box) B360-F Pro coming in. So I will be trying to beat my issues to death.
>
> Tom




tlgalenson@moonshot4:~$ nvidia-smi
Fri Feb  7 06:14:19 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02    Driver Version: 440.48.02    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  Off  | 00000000:01:00.0 Off |                  N/A |
| 56%   41C    P0    34W / 120W |    109MiB /  3019MiB |      7%      Default |
+-------------------------------+----------------------+----------------------+
|   1  P102-100            Off  | 00000000:02:00.0 Off |                  N/A |
| 54%   30C    P8     8W / 250W |      0MiB /  5059MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  P102-100            Off  | 00000000:03:00.0 Off |                  N/A |
|  0%   26C    P8    10W / 250W |      0MiB /  5059MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  P102-100            Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   28C    P8    12W / 250W |      0MiB /  5059MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  GeForce GTX 106...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 51%   27C    P8     6W / 120W |      2MiB /  3019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  P102-100            Off  | 00000000:0D:00.0 Off |                  N/A |
| 55%   25C    P8     7W / 250W |      0MiB /  5059MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  P102-100            Off  | 00000000:0F:00.0 Off |                  N/A |
|  0%   28C    P8    11W / 250W |      0MiB /  5059MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  GeForce GTX 1070    Off  | 00000000:10:00.0 Off |                  N/A |
|  0%   31C    P8     4W / 151W |      2MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1595      G   /usr/lib/xorg/Xorg                            48MiB |
|    0      1779      G   /usr/bin/gnome-shell                          57MiB |
+-----------------------------------------------------------------------------+
tlgalenson@moonshot4:~$ 


I believe I can now accuse the last version of Lubuntu that I was running of limiting me to 6 GPUs.
I have deleted that Linux ISO, along with everything else except an ISO for Part-Ed boot, from the Linux ISO folder that I keep on my Windows box.
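
A quick way to confirm how many cards the driver sees, without reading the full table, is to count the lines printed by `nvidia-smi -L`. The following is only a sketch; the helper name `count_gpus` is made up for illustration.

```shell
# Count the GPUs listed in `nvidia-smi -L` output supplied on stdin.
# Each detected GPU appears on its own line starting with "GPU <index>:".
count_gpus() {
    grep -c '^GPU [0-9][0-9]*:'
}

# Typical use on a machine with the NVIDIA driver installed:
#   nvidia-smi -L | count_gpus
```

If the number printed is lower than the number of cards physically installed, the limit is somewhere below BOINC (BIOS, kernel, or driver), so there is no point in debugging the BOINC setup yet.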



By George I think I've got it.

Since the MB is sitting on non-electrostatic plastic, I am going to shut it down for now.

Step 2. Get it set up in a more permanent "home".

Tom
A proud member of the OFA (Old Farts Association).

Stephen "Heretic" Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2031210 - Posted: 7 Feb 2020, 14:52:03 UTC - in response to Message 2031112.  
Last modified: 7 Feb 2020, 15:08:36 UTC

. . Redundant

Buckeye4LF Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031223 - Posted: 7 Feb 2020, 16:00:57 UTC

I got my new rig up and running last night, and it is now starting to report results. I am surprised at how slow it is: ~6,000 seconds per WU. My older Windows-based machine is faster. Where should I check for issues? I have gone through this thread but do not see where I should start troubleshooting. I am very overwhelmed by the Linux learning curve.

Computer ID: 8894243
Name: greg-MS-7C59
Avg. credit: 223.02
Total credit: 2,280
BOINC version: 7.16.3
CPU: AMD Ryzen Threadripper 3970X 32-Core Processor [Family 23 Model 49 Stepping 0] (64 processors)
GPU: [2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 440.48 OpenCL: 1.2
OS: Linux Ubuntu 19.10 [5.0.0-38-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2)]
Last contact: 7 Feb 2020, 15:44:46 UTC

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2031231 - Posted: 7 Feb 2020, 16:26:47 UTC - in response to Message 2031223.  

> I got my new rig up and running last night, and it is now starting to report results. I am surprised at how slow it is: ~6,000 seconds per WU. My older Windows-based machine is faster. Where should I check for issues? I have gone through this thread but do not see where I should start troubleshooting. I am very overwhelmed by the Linux learning curve.
>
> Computer ID: 8894243
> Name: greg-MS-7C59
> Avg. credit: 223.02
> Total credit: 2,280
> BOINC version: 7.16.3
> CPU: AMD Ryzen Threadripper 3970X 32-Core Processor [Family 23 Model 49 Stepping 0] (64 processors)
> GPU: [2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 440.48 OpenCL: 1.2
> OS: Linux Ubuntu 19.10 [5.0.0-38-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2)]
> Last contact: 7 Feb 2020, 15:44:46 UTC


Part of your problem is that you are using the slow stock apps provided by the project.

A second problem is that you are likely using 100% of the CPU, which slows everything down because every part of the computer is fighting over resources. Reduce that in your compute preferences to something like 80-90%.
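
The CPU limit can be set on the project website or in the BOINC Manager. It can also be set locally through BOINC's `global_prefs_override.xml` file and its `max_ncpus_pct` setting. The sketch below assumes the data directory is the unpacked AIO folder; the function name `write_cpu_limit` and the example path are illustrative, not something taken from the AIO package.

```shell
# Write a global_prefs_override.xml that limits BOINC to a percentage of the
# CPU threads. Note: this overwrites any existing override file.
# $1 = BOINC data directory (assumed to be the unpacked AIO folder)
# $2 = percentage of processors to use, e.g. 90
write_cpu_limit() {
    data_dir=$1
    pct=$2
    cat > "$data_dir/global_prefs_override.xml" <<EOF
<global_preferences>
   <max_ncpus_pct>$pct</max_ncpus_pct>
</global_preferences>
EOF
}

# Example (the path is an assumption):
#   write_cpu_limit "$HOME/BOINC" 90
#   cd "$HOME/BOINC" && ./boinccmd --read_global_prefs_override
```

The second command asks a running client to re-read the override file, so no restart should be needed.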

I see no reason why you decided to install the stock BOINC repository version if you eventually intended to use the AIO setup. You should have just gone straight for that.

I would remove the repository install completely and just start over with the AIO package. It contains the faster apps for both CPU and GPU. After you get that working, come back and ask about getting slightly more optimized apps running. There is a new AMD AVX app that is reported to run very well on Ryzen, as well as updated CUDA 10.2 apps that will perform better on your RTX cards.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Buckeye4LF Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031236 - Posted: 7 Feb 2020, 16:44:47 UTC - in response to Message 2031231.  
Last modified: 7 Feb 2020, 17:18:07 UTC

I did not do the repository install; I downloaded and unzipped the AIO. I thought it was unzip-and-go, so maybe I missed a step somewhere. Not sure what I did wrong...

CPU usage is set at 90%. I was told by someone to reinstall Linux and start over... not very practical.

I am using the CUDA 102 already...

Maybe I need to reboot to restart the project; I will try that when I get home from work. I can always go back to Windows...

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031248 - Posted: 7 Feb 2020, 19:09:42 UTC

The question is whether the AIO is compatible with Ubuntu 19.10. It is compatible with 19.04, since the AIO includes a zipped-up version for 19.04. But it is possible that some dependencies were not met and that the host somehow ended up with the repo version of BOINC installed when the AIO failed to run.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2031254 - Posted: 7 Feb 2020, 19:35:18 UTC - in response to Message 2031236.  
Last modified: 7 Feb 2020, 19:35:49 UTC

> I did not do the repository install; I downloaded and unzipped the AIO. I thought it was unzip-and-go, so maybe I missed a step somewhere. Not sure what I did wrong...
>
> CPU usage is set at 90%. I was told by someone to reinstall Linux and start over... not very practical.
>
> I am using the CUDA 102 already...
>
> Maybe I need to reboot to restart the project; I will try that when I get home from work. I can always go back to Windows...


All of the tasks that you have returned were run with the stock apps. I see that you have returned some CUDA 60 and SoG (OpenCL) tasks. You have not returned any work from the special app, and your CPU work is not using the apps provided in the AIO.

If you were using the AIO properly, your CPU and GPU tasks would be labelled "Anonymous Platform", but that is not the case. All you have to do is look at your task list and you'll see what we mean.

It's likely that you made a mistake in your app_info.xml file and BOINC went back to the stock apps. Post the contents of your app_info.xml file.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031258 - Posted: 7 Feb 2020, 20:14:22 UTC

I missed this part of the post . . .
> I am using the CUDA 102 already....

Likely he flubbed the edit of the app_info.xml and it got thrown away, which is why he got the stock apps.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2031265 - Posted: 7 Feb 2020, 20:36:56 UTC - in response to Message 2031021.  
Last modified: 7 Feb 2020, 20:55:38 UTC

You can see a number of inconsistencies in the paste below:
1) The AIO doesn't use BOINC 7.16.3.
2) "Data directory: /home/greg" should be "Data directory: /home/greg/BOINC". This implies the Home folder is being used as the Data folder, which is quite messy.
3) "Creating new client state file" usually means something BAD has happened.
4) "Could not open directory 'slots' from '/home/greg'". Again, the Data directory should be /home/greg/BOINC.
From the above, it would appear he is not running the BOINC AIO from the BOINC folder.
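
Points 2) and 4) both come down to a single line of the client's startup output. As a rough sketch, that line can be pulled out and compared against the expected folder. The helper names `boinc_data_dir` and `check_data_dir` are made up for this example, and the startup output is assumed to have been saved to a file or piped in.

```shell
# Print the data directory reported in BOINC client startup output (stdin).
boinc_data_dir() {
    sed -n 's/^.*\] Data directory: //p' | head -n 1
}

# Succeed only if the reported data directory matches the expected one.
# $1 = expected directory, e.g. /home/greg/BOINC
check_data_dir() {
    actual=$(boinc_data_dir)
    if [ "$actual" = "$1" ]; then
        echo "OK: data directory is $actual"
    else
        echo "MISMATCH: data directory is '$actual', expected '$1'"
        return 1
    fi
}

# Example, with a hypothetical file holding the saved startup output:
#   check_data_dir /home/greg/BOINC < startup_output.txt
```

A mismatch here suggests the client was started from the wrong folder, or that a different copy of the client was started altogether.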

> Linux is giving me issues. Does anyone know what I am doing wrong? Here is a paste from the terminal window.....
>
> greg@greg-MS-7C59:~$ boinc
> 06-Feb-2020 05:29:51 [---] cc_config.xml not found - using defaults
> 06-Feb-2020 05:29:51 [---] Starting BOINC client version 7.16.3 for x86_64-pc-linux-gnu
> 06-Feb-2020 05:29:51 [---] log flags: file_xfer, sched_ops, task
> 06-Feb-2020 05:29:51 [---] Libraries: libcurl/7.65.3 OpenSSL/1.1.1c zlib/1.2.11 libidn2/2.2.0 libpsl/0.20.2 (+libidn2/2.0.5) libssh/0.9.0/openssl/zlib nghttp2/1.39.2 librtmp/2.3
> 06-Feb-2020 05:29:51 [---] Data directory: /home/greg
> 06-Feb-2020 05:29:52 [---] CUDA: NVIDIA GPU 0: GeForce RTX 2070 SUPER (driver version 440.48, CUDA version 10.2, compute capability 7.5, 4096MB, 3968MB available, 9216 GFLOPS peak)
> 06-Feb-2020 05:29:52 [---] CUDA: NVIDIA GPU 1: GeForce RTX 2070 SUPER (driver version 440.48, CUDA version 10.2, compute capability 7.5, 4096MB, 3968MB available, 9216 GFLOPS peak)
> 06-Feb-2020 05:29:52 [---] Creating new client state file
> 06-Feb-2020 05:29:52 [---] [libc detection] gathered: 2.30, Ubuntu GLIBC 2.30-0ubuntu2
> 06-Feb-2020 05:29:52 [---] Host name: greg-MS-7C59
> 06-Feb-2020 05:29:52 [---] Processor: 64 AuthenticAMD AMD Ryzen Threadripper 3970X 32-Core Processor [Family 23 Model 49 Stepping 0]
> 06-Feb-2020 05:29:52 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd mba sev ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca
> 06-Feb-2020 05:29:52 [---] OS: Linux Ubuntu: Ubuntu 19.10 [5.0.0-38-generic|libc 2.30 (Ubuntu GLIBC 2.30-0ubuntu2)]
> 06-Feb-2020 05:29:52 [---] Memory: 125.85 GB physical, 2.00 GB virtual
> 06-Feb-2020 05:29:52 [---] Disk: 915.40 GB total, 857.67 GB free
> 06-Feb-2020 05:29:52 [---] Local time is UTC -6 hours
> 06-Feb-2020 05:29:52 [---] Last benchmark was 18298 days 11:29:51 ago
> 06-Feb-2020 05:29:52 [---] No general preferences found - using defaults
> 06-Feb-2020 05:29:52 [---] Preferences:
> 06-Feb-2020 05:29:52 [---] max memory usage when active: 64433.68 MB
> 06-Feb-2020 05:29:52 [---] max memory usage when idle: 115980.63 MB
> 06-Feb-2020 05:29:52 [---] max disk usage: 823.86 GB
> 06-Feb-2020 05:29:52 [---] don't use GPU while active
> 06-Feb-2020 05:29:52 [---] suspend work if non-BOINC CPU load exceeds 25%
> 06-Feb-2020 05:29:52 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
> 06-Feb-2020 05:29:52 [---] Setting up project and slot directories
> dir_open: Could not open directory 'slots' from '/home/greg'.
> 06-Feb-2020 05:29:52 [---] Checking active tasks
> 06-Feb-2020 05:29:52 [---] Setting up GUI RPC socket
> I 06-Feb-2020 05:30:21 [---] GUI RPC bind to port 31416 failed: 98
> 06-Feb-2020 05:30:22 gstate.init() failed
> Error Code: -180

Oh, and I have no idea if the AIO will work with Ubuntu 19.10. Newer versions are REALLY GOOD at breaking things. It's best to stay with 18.04 for now.

Buckeye4LF Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031364 - Posted: 8 Feb 2020, 7:44:42 UTC - in response to Message 2031265.  
Last modified: 8 Feb 2020, 7:46:03 UTC

So, not sure what I did wrong, but I think it is fixed. I was running the boinccmd command out of my AIO folder, but it was executing in the repository location var/lib/boinc. I let my tasks complete and deleted that folder. I then re-unpacked the AIO and unzipped ubuntu19.04 from the subfolder. I have already reported anonymous platform results in less than a minute. I did change the application file to run CUDA102 but left everything else alone. Thanks for everybody's comments. I am still learning the nuances of Linux, but it looks like I am up and running now.

I did not revert; I am still running Ubuntu 19.10.
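
One likely way to end up in this situation is typing a bare `boinc` or `boinccmd`, which the shell resolves through `PATH` rather than the current folder. The sketch below lists every copy of a program that a given search path would find; the function name `find_on_path` is hypothetical.

```shell
# List every executable file with the given name in a colon-separated path.
# $1 = program name, e.g. boinc or boinccmd
# $2 = search path (optional; defaults to the current $PATH)
find_on_path() {
    name=$1
    search_path=${2:-$PATH}
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Example:
#   find_on_path boinc
#   find_on_path boinccmd
```

Any hit outside the AIO folder (for example under /usr/bin) points to a second install. Running `./boinccmd` from inside the AIO folder, with the leading `./`, avoids the `PATH` lookup entirely.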

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2031396 - Posted: 8 Feb 2020, 12:41:44 UTC - in response to Message 2031364.  

This happens when you have more than one version of BOINC on your computer and that 'other' version is installed as a service, such as the repository version of BOINC. That's why it's recommended to remove any other version of BOINC from your machine before adding the BOINC All-In-One. The best way to remove the repository version is to use the Package Manager to search for anything referencing boinc and choose to remove it COMPLETELY. But now you have BOINC files in your Home folder as well as the ones from the repository install, so you must be careful not to delete any BOINC files from your Home/BOINC folder. It's easiest to zip the active BOINC All-In-One folder, save it to a USB drive, then reinstall Ubuntu and NOT install the repository version of BOINC. All you need to run BOINC is a compatible video driver and the All-In-One folder.
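
For anyone who prefers the terminal to the graphical Package Manager, the search step can be sketched as a filter over `dpkg -l` output. The function name `installed_boinc_packages` is made up for this example; the package names it prints depend on what was actually installed.

```shell
# Read `dpkg -l` output on stdin and print the names of installed packages
# (status "ii") whose name contains "boinc".
installed_boinc_packages() {
    awk '$1 == "ii" && $2 ~ /boinc/ { print $2 }'
}

# Example:
#   dpkg -l | installed_boinc_packages
# Then remove the listed packages along with their configuration, e.g.:
#   sudo apt purge <package names printed above>
```

Review the list before purging. This only touches packages from the repository, not the All-In-One folder in Home.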

Here are the old instructions for the Berkeley version of BOINC, which is the version in the All-In-One folder: https://boinc.berkeley.edu/wiki/Installing_BOINC#The_Berkeley_Installer
"...everything related to the BOINC client is contained within that directory (/Home/BOINC), and you should always run the client and the manager from that working directory."
Pay special attention to the part about changing directories before running boincmgr from the Terminal. It's easiest to run boincmgr by just doing as suggested: open the BOINC folder, then double-click on boincmgr.

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14653
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2031398 - Posted: 8 Feb 2020, 12:48:45 UTC

Here's a suggestion.

Having spent last weekend building and then configuring a new Linux box to run the 'special sauce' app (thanks guys for that), I was mentally reviewing all my build/configure steps to make sure I hadn't missed anything.

And I had.

I believe that the 'Read Me' file (which I didn't read, of course) advises users not to restart processing from a checkpoint file. OK, I've now made the manual change suggested. But it would be far safer, far more robust, and probably quicker if you simply disabled the checkpointing code in the app - then the problem would go away completely, without requiring every user to read and act on the advice.

You need to keep the - slightly broken - progress reporting code, of course, but just don't write a 'state.sah' file. That's all.

tazzduke
Volunteer tester
Joined: 15 Sep 07
Posts: 190
Credit: 28,269,068
RAC: 5
Australia
Message 2031411 - Posted: 8 Feb 2020, 14:32:40 UTC - in response to Message 2031398.  

Greetings All

I can confirm that I used the 19.04 BOINC (AIO folder) on Kubuntu 19.10 and also on Lubuntu 19.10 and noticed no problems; at least, there were no errors in the event log at BOINC startup.

Completed a fair few tasks with no known errors as well.

I was using these versions of Ubuntu back in November, but have now gone back to running Linux Mint 19.3 on my 2 x Linux crunchers.

Turns out I still like the Mint lol.

Regards

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031425 - Posted: 8 Feb 2020, 15:53:44 UTC - in response to Message 2031398.  

> Here's a suggestion.
>
> Having spent last weekend building and then configuring a new Linux box to run the 'special sauce' app (thanks guys for that), I was mentally reviewing all my build/configure steps to make sure I hadn't missed anything.
>
> And I had.
>
> I believe that the 'Read Me' file (which I didn't read, of course) advises users not to restart processing from a checkpoint file. OK, I've now made the manual change suggested. But it would be far safer, far more robust, and probably quicker if you simply disabled the checkpointing code in the app - then the problem would go away completely, without requiring every user to read and act on the advice.
>
> You need to keep the - slightly broken - progress reporting code, of course, but just don't write a 'state.sah' file. That's all.

Great suggestion, Richard. It should be passed on to the GPUUG developers so they can hunt out the checkpointing code in the special app and disable it. Then there would be no need to set a checkpoint interval longer than the normal crunching time for GPU tasks.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Tom M
Volunteer tester
Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2031436 - Posted: 8 Feb 2020, 16:25:03 UTC

On my Intel MSI B360-F Pro MB I just turned off "cpu virtualization" and the severe lagging went away.
I had shut down the system and swapped in 3 GTX 1600 Supers for some non-P102-100 GPUs. When I rebooted, it was slow and laggy.

I have all the security patches up to date on my freshly downloaded/installed Ubuntu 18.x.x(?), but until I toggled that off (why was it on? <shrug>) the problem persisted through multiple boots.

Tom
A proud member of the OFA (Old Farts Association).

Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031437 - Posted: 8 Feb 2020, 16:26:35 UTC - in response to Message 2031436.  
Last modified: 8 Feb 2020, 16:30:41 UTC

Wasn't it determined that those mining cards are only running at x1, and that is probably why the system got laggy?
Never mind. I see I got the swap direction reversed.
OK, really confused. Did you mean you turned off CPU hyperthreading, or that you turned off VT or virtualization? Probably interfering with the Above4G decoding.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2031439 - Posted: 8 Feb 2020, 16:34:10 UTC
Last modified: 8 Feb 2020, 16:34:55 UTC

I think you should go back into the BIOS and set all slots to gen 3 if you can. There’s usually a setting specifically for the main x16 slot, as well as a setting for the other slots through the chipset.

In addition to the link speed query you’re running, I’d like to see the output of a normal nvidia-smi command, to check GPU utilization while it’s running tasks on all GPUs. The same goes for the link speed query: make sure you’re always posting the output from when it’s actually running and not just sitting idle at the desktop. Power saving features will reduce the link speed if the card is not active.
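
The exact link speed query is not shown in this thread. The sketch below assumes it is built on the standard `nvidia-smi --query-gpu` fields, and it only reports cards that are busy yet running below their maximum PCIe generation, so idle cards in power saving are ignored. The function name `flag_slow_links` is made up for illustration.

```shell
# Read CSV from stdin, one line per GPU, in the form
#   index, current PCIe gen, max PCIe gen, current link width, GPU utilization %
# and report any GPU that is busy yet running below its maximum PCIe generation.
flag_slow_links() {
    awk -F', *' '$5 + 0 > 0 && $2 + 0 < $3 + 0 {
        printf "GPU %s: gen %s of %s, x%s, %s%% busy\n", $1, $2, $3, $4, $5
    }'
}

# Example, run while tasks are active on all GPUs:
#   nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current,utilization.gpu \
#       --format=csv,noheader,nounits | flag_slow_links
```

No output means every busy card is already at its maximum link generation. The link width column is printed for reference only, since x1 risers are expected on a mining board.
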
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2031441 - Posted: 8 Feb 2020, 16:40:41 UTC - in response to Message 2031437.  

> Wasn't it determined that those mining cards are only running at x1, and that is probably why the system got laggy?
> Never mind. I see I got the swap direction reversed.
> OK, really confused. Did you mean you turned off CPU hyperthreading, or that you turned off VT or virtualization? Probably interfering with the Above4G decoding.


I guess he means VT/virtualization, since he’s still got HT enabled (reporting 16 CPUs).

I’m not sure why virtualization would matter for the problem he was experiencing, unless it’s some weird bug in the BIOS. But he mentioned in another thread that he changed another setting at the same time (turned mining mode off) so when you change 2 variables at once and observe a change it’s hard to say which one really caused the change.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.