Setting up Linux to crunch CUDA90 and above for Windows users

elec999 Project Donor

Joined: 24 Nov 02
Posts: 375
Credit: 416,969,548
RAC: 141
Canada
Message 2011909 - Posted: 14 Sep 2019, 1:45:47 UTC

Is my 2080 system running fine? Seems kinda slow...

Computer ID: 8811367
Name: 2080
Location: home
Avg. credit: 33,915.80
Total credit: 407,519
BOINC version: 7.14.2
CPU: GenuineIntel Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9] (8 processors)
GPU: NVIDIA GeForce RTX 2080 (4095MB) driver: 430.40 OpenCL: 1.2
OS: Linux Ubuntu, Ubuntu 19.04 [5.0.0-27-generic|libc 2.29 (Ubuntu GLIBC 2.29-0ubuntu2)]
Last contact: 14 Sep 2019, 1:40:38 UTC
ID: 2011909 · Report as offensive     Reply Quote
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2011913 - Posted: 14 Sep 2019, 2:57:40 UTC - in response to Message 2011909.  
Last modified: 14 Sep 2019, 2:58:18 UTC

Is my 2080 system running fine? Seems kinda of slow...

ID: 8811367
Details | Tasks
Cross-project stats:
BOINCstats.com Free-DC 2080 home 33,915.80 407,519 7.14.2 GenuineIntel
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9]
(8 processors) NVIDIA GeForce RTX 2080 (4095MB) driver: 430.40 OpenCL: 1.2 Linux Ubuntu
Ubuntu 19.04 [5.0.0-27-generic|libc 2.29 (Ubuntu GLIBC 2.29-0ubuntu2)] 14 Sep 2019, 1:40:38 UTC


What is your basis for thinking it's slow?

I compared your last 250 tasks to other systems and it looks fine to me: roughly the same run times and credit awarded as other 2080s running the same type of tasks.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2011913 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2012079 - Posted: 15 Sep 2019, 18:43:18 UTC - in response to Message 2011913.  

Is my 2080 system running fine? Seems kinda of slow...

ID: 8811367
Details | Tasks
Cross-project stats:
BOINCstats.com Free-DC 2080 home 33,915.80 407,519 7.14.2 GenuineIntel
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9]
(8 processors) NVIDIA GeForce RTX 2080 (4095MB) driver: 430.40 OpenCL: 1.2 Linux Ubuntu
Ubuntu 19.04 [5.0.0-27-generic|libc 2.29 (Ubuntu GLIBC 2.29-0ubuntu2)] 14 Sep 2019, 1:40:38 UTC


what is your basis for thinking it's slow?

I compared your last 250 tasks to other systems and it looks fine to me. roughly the same run times and credit awarded as other 2080s running the same type of tasks


+1

A cursory glance at Ian's mixed 2080Ti/2080 system seems to show that his 2080s are running no more than 1 or 2 seconds faster than yours, if that.

Tom
A proud member of the OFA (Old Farts Association).
ID: 2012079 · Report as offensive     Reply Quote
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2012731 - Posted: 21 Sep 2019, 15:17:32 UTC
Last modified: 21 Sep 2019, 15:33:48 UTC

Well, it seems more tinkering is just an annoying error fest where most attempts end in compile errors. It's a major challenge to even get wxWidgets to compile by itself, much less to get BOINC and wxWidgets to compile together. The only pair that seems to work is wxWidgets 2.8.12 and BOINC 6.10.37, and even then you have to add a file to the Makefile or that too will end in an error. What you get is an app that mostly works using OpenSSL 1.0 and libcurl3 in Ubuntu 18.04 but will not work on newer systems. It really doesn't seem worth the effort; it's mostly a waste of time.

Speaking of time, I need to test this eBay power supply that arrived with just two cables, neither of which was a main power cable. Fortunately I have spare cables that will work, and it was pretty cheap for an EVGA 850BQ. I actually bought some cables from an EVGA 850GQ a few months ago to use with my EVGA 650BQ & 750B2; now I know where the cables came from...
...could somebody who knows what they're doing just compile this thing in ubuntu 18.04 x64 for me?
Well, it doesn't appear to work with OpenSSL 1.1, but 18.04 can use either. It also needs libcurl-openssl1.0-dev, but that's also in 18.04... just don't try it in 19.04. Sometimes the menu bar goes hiding too, but it does seem to work:
Operating System: Linux 5.0.0-27-generic
BOINC version: 6.10.37

Fri 20 Sep 2019 06:10:01 PM EDT Starting BOINC client version 6.10.37 for x86_64-pc-linux-gnu
Fri 20 Sep 2019 06:10:01 PM EDT log flags: file_xfer, sched_ops, task, sched_op_debug
Fri 20 Sep 2019 06:10:01 PM EDT Libraries: libcurl/7.58.0 OpenSSL/1.0.2n zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Fri 20 Sep 2019 06:10:01 PM EDT Data directory: /home/tbar/BOINC-9629
Fri 20 Sep 2019 06:10:01 PM EDT OS: Linux: 5.0.0-27-generic
Fri 20 Sep 2019 06:10:01 PM EDT NVIDIA GPU 0: GeForce GTX 750 Ti (driver version unknown, CUDA version 10020, compute capability 5.0, 2001MB, 89 GFLOPS peak)
Fri 20 Sep 2019 06:10:01 PM EDT SETI@home Found app_info.xml; using anonymous platform
Fri 20 Sep 2019 06:10:32 PM EDT SETI@home URL http://setiathome.berkeley.edu/; Computer ID 6979629; resource share 100
Maybe a little more tinkering...
ID: 2012731 · Report as offensive     Reply Quote
elec999 Project Donor

Joined: 24 Nov 02
Posts: 375
Credit: 416,969,548
RAC: 141
Canada
Message 2014279 - Posted: 5 Oct 2019, 17:19:25 UTC

New problems for me. My MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu file somehow went missing on my AMD machines; I didn't touch anything.

Also, I keep getting a syntax error in app_info.xml now.

<app_info>
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_v8</app_name>
<platform>x86_64-pc-linux-gnu</platform>
<version_num>801</version_num>
<plan_class>cuda90</plan_class>
<cmdline>-nobs</cmdline>
<coproc>
<type>NVIDIA</type>
<count>1</count>
</coproc>
<avg_ncpus>0.1</avg_ncpus>
<max_ncpus>0.1</max_ncpus>
<file_ref>
<file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</name>
<executable/>
</file_info>
<file_info>
<name>AstroPulse_Kernels_r2751.cl</name>
</file_info>
<file_info>
<name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</name>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<platform>x86_64-pc-linux-gnu</platform>
<version_num>708</version_num>
<plan_class>opencl_nvidia_100</plan_class>
<coproc>
<type>NVIDIA</type>
<count>1</count>
</coproc>
<avg_ncpus>0.1</avg_ncpus>
<max_ncpus>0.1</max_ncpus>
<file_ref>
<file_name>astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>AstroPulse_Kernels_r2751.cl</file_name>
</file_ref>
<file_ref>
<file_name>ap_cmdline_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100.txt</file_name>
<open_name>ap_cmdline.txt</open_name>
</file_ref>
</app_version>
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu</name>
<executable/>
</file_info>
<app_version>
<app_name>setiathome_v8</app_name>
<platform>x86_64-pc-linux-gnu</platform>
<version_num>800</version_num>
<file_ref>
<file_name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu/file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>astropulse_v7</name>
</app>
<file_info>
<name>ap_7.05r2728_sse3_linux64</name>
<executable/>
</file_info>
<app_version>
<app_name>astropulse_v7</app_name>
<version_num>704</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<plan_class></plan_class>
<file_ref>
<file_name>ap_7.05r2728_sse3_linux64</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
ID: 2014279 · Report as offensive     Reply Quote
Oddbjornik Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 2014281 - Posted: 5 Oct 2019, 17:38:21 UTC - in response to Message 2014279.  

There's an error on this line:
<file_name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu/file_name>

So says this page: https://www.xmlvalidation.com/index.php
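
For anyone who would rather check the file locally than paste it into a website, here is a minimal sketch using only Python's standard library. The function name `check_well_formed` and the trimmed-down sample are made up for illustration; the sample keeps only the broken line and its neighbours from the posted app_info.xml. It tests well-formedness only, so it says nothing about whether BOINC will accept the tags themselves.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return None if xml_text parses, else a (line, column, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Trimmed-down stand-in for the posted app_info.xml, with the same broken tag.
broken = """<app_info>
<file_ref>
<file_name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu/file_name>
<main_program/>
</file_ref>
</app_info>"""

# Same text with the missing "<" restored in the closing tag.
fixed = broken.replace("gnu/file_name>", "gnu</file_name>")

print("broken:", check_well_formed(broken))
print("fixed: ", check_well_formed(fixed))
```

Note that the parser tends to complain at a later closing tag rather than at the damaged line itself, so look a line or two above the reported position.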
ID: 2014281 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2014283 - Posted: 5 Oct 2019, 18:27:29 UTC - in response to Message 2014281.  

Really cool tool there. Didn't know of it. Bookmarked now. Thanks. I always did my XML sanity check with a browser but that has some limitations. I will be using this one now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2014283 · Report as offensive     Reply Quote
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2014290 - Posted: 5 Oct 2019, 21:15:14 UTC

Nice tool to play with. Love new toys.
ID: 2014290 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2014300 - Posted: 5 Oct 2019, 22:28:19 UTC - in response to Message 2014281.  
Last modified: 5 Oct 2019, 22:31:38 UTC

There's an error on this line:
<file_name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu/file_name>
. . It's missing the angle bracket (less than sign) before the /file_name ...

Stephen

..
ID: 2014300 · Report as offensive     Reply Quote
elec999 Project Donor

Joined: 24 Nov 02
Posts: 375
Credit: 416,969,548
RAC: 141
Canada
Message 2014320 - Posted: 6 Oct 2019, 0:08:49 UTC - in response to Message 2014300.  

There's an error on this line:<file_name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu/file_name>
. . It's missing the angle bracket (less than sign) before the /file_name ...

Stephen

..

You guys always help me, the idiot, out. Thanks!!
ID: 2014320 · Report as offensive     Reply Quote
Profile Tom M
Volunteer tester

Joined: 28 Nov 02
Posts: 5124
Credit: 276,046,078
RAC: 462
Message 2014335 - Posted: 6 Oct 2019, 3:42:28 UTC - in response to Message 2014320.  

There's an error on this line:<file_name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu/file_name>
. . It's missing the angle bracket (less than sign) before the /file_name ...

Stephen

..

you guys always help, me the idiot out. Thanks!!


Finding syntax errors without a compiler/verifier pointing the way is VERY hard.
All of us are idiots at that level!

Tom
A proud member of the OFA (Old Farts Association).
ID: 2014335 · Report as offensive     Reply Quote
Profile Wiggo
Joined: 24 Jan 00
Posts: 34744
Credit: 261,360,520
RAC: 489
Australia
Message 2014355 - Posted: 6 Oct 2019, 10:17:43 UTC - in response to Message 2014335.  

There's an error on this line:<file_name>MBv8_8.22r3711_sse41_amd_x86_64-pc-linux-gnu/file_name>
. . It's missing the angle bracket (less than sign) before the /file_name ...

Stephen

..
you guys always help, me the idiot out. Thanks!!
Finding syntax errors without a compiler/verifier pointing the way is VERY hard.
All of us are idiot's at that level!

Tom
Or it's just plain old rushing. A lot of us old fellows learned over many years to take plenty of time when editing it (I always make a duplicate copy before editing anyway, just in case I miss something). ;-)

Cheers.
ID: 2014355 · Report as offensive     Reply Quote
elec999 Project Donor

Joined: 24 Nov 02
Posts: 375
Credit: 416,969,548
RAC: 141
Canada
Message 2014580 - Posted: 7 Oct 2019, 23:39:30 UTC

Not sure what happened, but this machine's RAC dropped like crazy. Would it make any sense to move the 2070 GPUs to the AMD 1600X, or is the i7-3770K good enough? Not sure where the bottleneck is here.

Computer ID: 8765031
Name: 1070ti
Location: home
Avg. credit: 39,837.69
Total credit: 8,938,679
BOINC version: 7.14.2
CPU: GenuineIntel Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9] (8 processors)
GPU: [2] NVIDIA GeForce RTX 2070 (4095MB) driver: 418.56
OS: Linux Ubuntu, Ubuntu 19.04 [5.0.0-27-generic|libc 2.29 (Ubuntu GLIBC 2.29-0ubuntu2)]
Last contact: 7 Oct 2019, 22:48:28 UTC
ID: 2014580 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2014581 - Posted: 8 Oct 2019, 0:08:01 UTC - in response to Message 2014580.  

Not sure what happened, but this machine dropped in RAC Like crazy. Would it make any sense to switch the 2070 gpus to the amd 1600x, or the i7-3770k is good enough. Not sure where the bottle neck is here.
ID: 8765031
Details | Tasks
Cross-project stats:
BOINCstats.com Free-DC 1070ti home 39,837.69 8,938,679 7.14.2 GenuineIntel
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9]
(8 processors) [2] NVIDIA GeForce RTX 2070 (4095MB) driver: 418.56 Linux Ubuntu
Ubuntu 19.04 [5.0.0-27-generic|libc 2.29 (Ubuntu GLIBC 2.29-0ubuntu2)] 7 Oct 2019, 22:48:28 UTC


. . A whole lot of things, I suspect. First, lost time because of the server outage over the weekend. Then there's the fact that you aborted 45 tasks, which will throttle your work allocation. But mainly it's because you have 2 GPU cards of different types and only the 1080 seems to be doing any work. This leads me to suspect you do not have 'use_all_gpus' set in your cc_config.xml?
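
For reference, a rough sketch of how the use_all_gpus setting could be checked from a script. It assumes the usual cc_config.xml layout, with the flag inside an options block; the sample text and the helper name `use_all_gpus_enabled` are illustrative only, and BOINC's own parser may be more lenient than this one.

```python
import xml.etree.ElementTree as ET

# Illustrative cc_config.xml content; a real file normally carries more options.
SAMPLE_CC_CONFIG = """<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>"""


def use_all_gpus_enabled(cc_config_text):
    """Return True only if <options><use_all_gpus> is present and set to 1."""
    root = ET.fromstring(cc_config_text)
    value = root.findtext("options/use_all_gpus")
    return value is not None and value.strip() == "1"


print(use_all_gpus_enabled(SAMPLE_CC_CONFIG))
```

To check a real host, read the cc_config.xml from the BOINC data directory into a string and pass it to the function. The client has to re-read its config files or be restarted before a change takes effect.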

Stephen

ID: 2014581 · Report as offensive     Reply Quote
elec999 Project Donor

Joined: 24 Nov 02
Posts: 375
Credit: 416,969,548
RAC: 141
Canada
Message 2014732 - Posted: 9 Oct 2019, 12:45:44 UTC - in response to Message 2014581.  

Not sure what happened, but this machine dropped in RAC Like crazy. Would it make any sense to switch the 2070 gpus to the amd 1600x, or the i7-3770k is good enough. Not sure where the bottle neck is here.
ID: 8765031
Details | Tasks
Cross-project stats:
BOINCstats.com Free-DC 1070ti home 39,837.69 8,938,679 7.14.2 GenuineIntel
Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz [Family 6 Model 58 Stepping 9]
(8 processors) [2] NVIDIA GeForce RTX 2070 (4095MB) driver: 418.56 Linux Ubuntu
Ubuntu 19.04 [5.0.0-27-generic|libc 2.29 (Ubuntu GLIBC 2.29-0ubuntu2)] 7 Oct 2019, 22:48:28 UTC


. . A whole lot of things I suspect. First, lost time because of the server outage over the weekend. Then the fact that you aborted 45 tasks which will throttle your work allocation but mainly because you have 2 GPU cards of different types but only the 1080 seems to be doing any work. This leads me to suspect you do not have the 'use_all_gpus" set in your cc_config.xml?

Stephen

? ?


I checked the use_all_gpus setting and it is set to use both GPUs. I also checked nvidia-smi, and both GPUs are doing work.
ID: 2014732 · Report as offensive     Reply Quote
juan BFP Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2014734 - Posted: 9 Oct 2019, 12:58:31 UTC - in response to Message 2014732.  
Last modified: 9 Oct 2019, 12:59:04 UTC

You have a lot of aborted WUs with "time limit exceeded" errors. That could cause the drop in RAC, since all the crunching time was wasted.

That happened on 8 Oct:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 11198.56 (550134.14G/49.13G)</message>
]]>


You need to check the usual stuff to see if you can find what causes that.

My guess: an OS update without a host reboot.

Suggestion: stop all crunching processes before updating the OS, and reboot the host when it is done, even when it says a reboot is not needed.

Apparently it is running fine now.
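
A quick worked example of the numbers in that error message. This is my reading of the log line, not something the thread confirms: the first value in parentheses looks like the task's floating-point operation bound and the second like the client's speed estimate for the app, both in units of 10^9, with the limit being their ratio in seconds. The function and variable names are made up for the example.

```python
def elapsed_time_limit(fpops_bound_g, app_speed_g_per_s):
    """Seconds allowed before the client aborts the task (assumed: bound / speed)."""
    return fpops_bound_g / app_speed_g_per_s


logged_limit = 11198.56  # seconds, from the error message
fpops_bound_g = 550134.14  # first value in parentheses
app_speed_g_per_s = 49.13  # second value in parentheses

limit = elapsed_time_limit(fpops_bound_g, app_speed_g_per_s)
print(f"computed limit: {limit:.2f} s, logged limit: {logged_limit:.2f} s")
print(f"that is roughly {limit / 3600:.1f} hours")
```

The computed value lands within about a second of the logged one, the gap being rounding in the printed speed. So the task had been sitting for around three hours before the client gave up on it. For a GPU task that normally finishes in a minute or so, that points to a stalled GPU or driver, not a slow card.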
ID: 2014734 · Report as offensive     Reply Quote
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 2014805 - Posted: 9 Oct 2019, 23:21:46 UTC - in response to Message 2014732.  
Last modified: 9 Oct 2019, 23:27:41 UTC


Not sure what happened, but this machine dropped in RAC Like crazy. Would it make any sense to switch the 2070 gpus to the amd 1600x, or the i7-3770k is good enough. Not sure where the bottle neck is here.

. . A whole lot of things I suspect. First, lost time because of the server outage over the weekend. Then the fact that you aborted 45 tasks which will throttle your work allocation but mainly because you have 2 GPU cards of different types but only the 1080 seems to be doing any work. This leads me to suspect you do not have the 'use_all_gpus" set in your cc_config.xml?
Stephen

I checked the 'use_all_gpus file and it is set to use both gpus, I also check the nvidia-smi and both gpus are doing work.

. . OK, I only looked at the first few pages of your results and found only results from the 1080. Well that theory is shot full of holes ... :)

. . For what it is worth my RACs dropped after the outage as expected, but they have continued to drop each day since so Credit Screw is just messing with us all too ...

Stephen

ID: 2014805 · Report as offensive     Reply Quote
elec999 Project Donor

Joined: 24 Nov 02
Posts: 375
Credit: 416,969,548
RAC: 141
Canada
Message 2014817 - Posted: 10 Oct 2019, 2:09:50 UTC - in response to Message 2014734.  

You have a lot of aborted WU with time limit exceed, that could cause the drop on the RAC since all the crunching time was wasted.

That happening on the 8 of oct.
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 11198.56 (550134.14G/49.13G)</message>
]]>


Need to check the usual stuff to see if you find something who causes that.

My clue, a OS update without a host reset.

Suggestion, stop all crunching process before update the OS and reboot the host after done, even when it shows is not needed.

Apparently now is running fine.


Once I set these systems up, I leave them at the DC and don't touch them. I didn't run any updates or leave auto-updates on.
ID: 2014817 · Report as offensive     Reply Quote
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1853
Credit: 268,616,081
RAC: 1,349
United States
Message 2014821 - Posted: 10 Oct 2019, 2:42:33 UTC - in response to Message 2014817.  

You have a lot of aborted WU with time limit exceed, that could cause the drop on the RAC since all the crunching time was wasted.


These systems once I setup, I leave at the DC and dont touch them. I didnt run any updates nor leave auto updates on.

Running Ubuntu 18.04 here. There is definitely something going on that intermittently causes those "time limit exceeded" problems. I do see it from time to time, and a reboot solves the issue until it recurs. What I have seen is that before the first of them, a single WU aborts with a compute error on the same GPU that then begins throwing "exceeded" errors. My suspicion is that something in the app or WU causes the driver for that GPU to explode, requiring a reboot to regain sanity. But I have no way to prove this.

The other thing that may contribute, on NV at least, is borderline power to GPUs causing them to generate communications errors on the bus. NVs seem very susceptible to this. I just redid my power connections on the machine that was doing this most, and haven't seen further errors since. Getting some 8-pin CPU-to-GPU adapters really helped, as opposed to the 4-pin Molex splitters. I'm running power-pig GTX 980s, YMMV.
ID: 2014821 · Report as offensive     Reply Quote
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2014824 - Posted: 10 Oct 2019, 3:04:22 UTC

The Linux equivalent of the Windows TDR error and driver reset is what causes either one or all GPUs to go missing and generate those errors. The only solution is a reboot. It is also best to clear out the .nv Compute Cache primitives in case they got corrupted, so that they are regenerated upon restart.
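
A minimal sketch of the cache clean-up, for those who prefer a script. The default location used here, `.nv/ComputeCache` under the home directory of the user running the client, is an assumption; check where the cache lives on your own host, especially if BOINC runs as a service account. The helper name `clear_compute_cache` is made up, and it defaults to a dry run that only lists what it would delete.

```python
import shutil
from pathlib import Path


def clear_compute_cache(cache_dir, dry_run=True):
    """Remove everything inside cache_dir; return the entries removed (or that would be)."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return []
    entries = sorted(cache_dir.iterdir())
    if not dry_run:
        for entry in entries:
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    return entries


if __name__ == "__main__":
    # Assumed location of the NVIDIA compute cache for the current user.
    default_cache = Path.home() / ".nv" / "ComputeCache"
    for path in clear_compute_cache(default_cache, dry_run=True):
        print("would remove", path)
```

Stop the BOINC client before running it with `dry_run=False`, then reboot as described above so the cache is rebuilt cleanly.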
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2014824 · Report as offensive     Reply Quote


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.