Setting up Linux to crunch CUDA90 and above for Windows users

Oddbjornik · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 15 May 99
Posts: 220
Credit: 349,610,548
RAC: 1,728
Norway
Message 2031716 - Posted: 10 Feb 2020, 1:16:52 UTC - in response to Message 2031713.  

Seemed 440 drivers worked last round
Yep, but you're on 435.21.
ID: 2031716
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031718 - Posted: 10 Feb 2020, 1:18:19 UTC - in response to Message 2031686.  

I was never able to replicate that on my systems. *shrug*
I can pop one off in a heartbeat. All I have to do is restart the 9 GPU GDDR5 machine and usually at least one GPU will Miss All Pulses on the first task.
*shrug*

I need clarification. Does the machine itself have to be restarted, or is restarting BOINC enough?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031718
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031719 - Posted: 10 Feb 2020, 1:21:03 UTC - in response to Message 2031713.  

I noticed that, but thanks for checking. I was going to let it run for a day as is to make sure it was not random. Seemed 440 drivers worked last round

I also sent you several PMs apprising you of the problem with your current drivers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031719
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2031720 - Posted: 10 Feb 2020, 1:25:35 UTC - in response to Message 2031719.  

I noticed that, but thanks for checking. I was going to let it run for a day as is to make sure it was not random. Seemed 440 drivers worked last round

I also sent you several PMs apprising you of the problem with your current drivers.


He has 2 different Linux hosts: one with [2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 435.21, whose last contact was today, and the other with [2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 440.48 OpenCL: 1.2, whose last contact was 8 Feb. That is the problem.
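
For anyone who wants to check this kind of thing without eyeballing the host pages, here is a rough Python sketch that pulls the driver version out of a host description string and compares it against a minimum. The two strings are copied from the host listings quoted above. The 440 threshold is only an assumption based on the "440 drivers worked" remark earlier in the thread, and the function names are made up for illustration.

```python
import re

# Host description strings as quoted in this thread.
HOSTS = [
    "[2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 435.21",
    "[2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 440.48 OpenCL: 1.2",
]


def driver_version(description):
    """Return the driver version as a tuple of ints, or None if none is found."""
    match = re.search(r"driver:\s*(\d+(?:\.\d+)*)", description)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(description, minimum=(440,)):
    """True if the description reports a driver at or above `minimum`.

    The default of 440 is an assumption taken from this thread, not an
    official requirement.
    """
    version = driver_version(description)
    return version is not None and version >= minimum


for host in HOSTS:
    print(driver_version(host), meets_minimum(host))
```

Comparing tuples of integers avoids the trap of comparing version strings alphabetically.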
ID: 2031720
Buckeye4LF · Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031722 - Posted: 10 Feb 2020, 1:37:55 UTC - in response to Message 2031720.  

No, I reformatted and reinstalled Linux. Those are the same machine, but with different installs of Linux. The drivers that came with Mint were not the same as the ones that came with Ubuntu.

ID: 2031722
Buckeye4LF · Project Donor
Joined: 19 Jun 00
Posts: 173
Credit: 54,916,209
RAC: 833
United States
Message 2031724 - Posted: 10 Feb 2020, 1:38:50 UTC - in response to Message 2031719.  

I reverted back to cuda90 for now; I do not want to mess with video drivers tonight.

ID: 2031724
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031725 - Posted: 10 Feb 2020, 1:40:17 UTC - in response to Message 2031724.  

Can you read your PMs and reply, please?
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031725
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2031726 - Posted: 10 Feb 2020, 1:45:06 UTC

About the removal of the checkpoint.

FYI, I made some changes in the code and, thanks to Ian's help with the compile process, we have an experimental version of the 10.2 mutex builds running with the checkpoint removed. We will wait for Richard to wake up and guide us on how to test whether everything is working.

Fingers crossed. While we wait, the beer is flowing.
ID: 2031726
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2031728 - Posted: 10 Feb 2020, 2:00:35 UTC - in response to Message 2031718.  
Last modified: 10 Feb 2020, 2:09:12 UTC

I was never able to replicate that on my systems. *shrug*
I can pop one off in a heartbeat. All I have to do is restart the 9 GPU GDDR5 machine and usually at least one GPU will Miss All Pulses on the first task.
*shrug*
I need clarification. Does the machine itself have to be restarted, or is restarting BOINC enough?
Restart/Reboot the Machine. It also needs to be a machine with mostly GDDR5 GPUs. BTW, Petri isn't using any GPUs with GDDR5 VRAM. I bought My 1080Ti because another user wasn't having the Problem with His 1080Ti and I had tried everything else. He also had the Same Problem with His 750Ti Missing Pulses where his 1080Ti doesn't: https://setiathome.berkeley.edu/show_host_detail.php?hostid=8424399

I'd suggest a machine using 1070s and lower for the test, with at least one GPU not connected to a monitor, it's usually the GPU(s) not connected to a monitor that have the problem immediately. Sometimes on a Mac the GPU connected to the monitor can run hours, or days, before it starts missing All pulses.
ID: 2031728
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031731 - Posted: 10 Feb 2020, 2:16:24 UTC - in response to Message 2031728.  

OK, thanks for the instructions. I wanted to test on a machine with your necessary criteria. I have a host with a 1070 Ti and a 1070 which meet the criteria of no monitor attached. The 2080 is attached to the monitor. I did not find any missing pulses on the 1070 upon restarting BOINC. But that was after updating the drivers to the latest 440.59 and restarting the computer. This task was started right after the host was restarted.

https://setiathome.berkeley.edu/result.php?result_name=blc75_2bit_guppi_58693_08905_HIP98801_0143.7855.818.22.45.224.vlar_0


In cudaAcc_initializeDevice(): Boinc passed DevPref 3
setiathome_CUDA: CUDA Device 3 specified, checking...
Device 3: GeForce GTX 1070 is okay
SETI@home using CUDA accelerated device GeForce GTX 1070
Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1

Spike count: 1
Autocorr count: 0
Pulse count: 10
Triplet count: 1
Gaussian count: 0

09-Feb-2020 15:17:16 [---] Starting BOINC spoofed client version 7.16.3 for x86_64-pc-linux-gnu
~ {snip}
~ {snip}
09-Feb-2020 15:17:17 [SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 6279633; resource share 1000
09-Feb-2020 15:17:17 [---] Setting up GUI RPC socket
09-Feb-2020 15:17:17 [---] Checking presence of 8203 project files
09-Feb-2020 15:17:17 Initialization completed
09-Feb-2020 15:17:17 [SETI@home] Starting task blc75_2bit_guppi_58693_08905_HIP98801_0143.7855.818.22.45.224.vlar_0

Not seeing the issue on my GTX 1070.
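
If anyone wants to check many results for this rather than reading each stderr by hand, a small Python sketch along these lines could pull the signal counts out of the text. It assumes the stderr has already been saved as plain text and that the count lines look exactly like the excerpt above; the function name and the SAMPLE string are only for illustration.

```python
import re

# Matches lines such as "Pulse count: 10" in a task's stderr text.
SIGNAL_LINE = re.compile(
    r"^(Spike|Autocorr|Pulse|Triplet|Gaussian) count:\s*(\d+)\s*$",
    re.MULTILINE,
)


def signal_counts(stderr_text):
    """Return a dict mapping signal type to the reported count."""
    return {name: int(value) for name, value in SIGNAL_LINE.findall(stderr_text)}


# The count lines from the result quoted above.
SAMPLE = """\
Spike count: 1
Autocorr count: 0
Pulse count: 10
Triplet count: 1
Gaussian count: 0
"""

counts = signal_counts(SAMPLE)
print(counts)
```

A pulse count of 0 on its own does not prove anything, since some tasks genuinely have no pulses; it only becomes suspicious when the wingmen report pulses for the same workunit.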
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031731
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2031732 - Posted: 10 Feb 2020, 2:19:41 UTC
Last modified: 10 Feb 2020, 2:22:54 UTC

My host has 2x 2070 + 2x 1070, with no monitor attached to either 1070.

I have never seen that either. Will keep an eye on it.

Could it be something related to the Mac hosts only? Who knows?
ID: 2031732
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2031733 - Posted: 10 Feb 2020, 2:22:33 UTC - in response to Message 2031731.  
Last modified: 10 Feb 2020, 2:24:41 UTC

How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs.

The Mining machine Is Not a Mac. Juan was the First to post about the problem long ago when he was just running 1070s in Linux. Remember that Juan?
ID: 2031733
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031734 - Posted: 10 Feb 2020, 2:24:11 UTC - in response to Message 2031733.  

Lowest end card I own is a 1070. Never noticed the problem ever after what . . . . . couple of years now.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031734
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2031735 - Posted: 10 Feb 2020, 2:25:50 UTC - in response to Message 2031733.  

How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs.

Just a question to be sure I look in the right place.

When you say reboot, is that a warm reboot of the host, a cold restart, or just stopping and reloading BOINC itself?
ID: 2031735
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2031737 - Posted: 10 Feb 2020, 2:30:09 UTC - in response to Message 2031733.  
Last modified: 10 Feb 2020, 2:40:18 UTC

How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs.

The Mining machine Is Not a Mac. Juan was the First to post about the problem long ago when he was just running 1070s in Linux. Remember that Juan?

I remember something from a long time ago. But IIRC it was related to the way the reschedule kills the process, sometimes leaving a closed slot, or it was another program; I do not remember. It was solved by adding a round of deleting the slots. Do you remember the post? Just to refresh my memory. You know I am an old man who easily forgets everything.
ID: 2031737
Keith Myers · Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2031739 - Posted: 10 Feb 2020, 2:36:31 UTC - in response to Message 2031735.  

How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs.

Just a question to be sure I look in the right place.

When you say reboot, is that a warm reboot of the host, a cold restart, or just stopping and reloading BOINC itself?

He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2031739
Wiggo
Joined: 24 Jan 00
Posts: 35274
Credit: 261,360,520
RAC: 489
Australia
Message 2031740 - Posted: 10 Feb 2020, 2:45:38 UTC - in response to Message 2031739.  

How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs.
Just a question to be sure I look in the right place.

When you say reboot, is that a warm reboot of the host, a cold restart, or just stopping and reloading BOINC itself?
He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off.
I've never seen that problem on either of my 2 dual GPU rigs, but sudden power downs 1-2 secs after starting a task will produce a corrupt header error and that is all I get on restarting (though that can happen on either 1 of the cards).

Cheers.
ID: 2031740
juan BFP · Crowdfunding Project Donor · Special Project $75 donor · Special Project $250 donor
Volunteer tester
Joined: 16 Mar 07
Posts: 9786
Credit: 572,710,851
RAC: 3,799
Panama
Message 2031742 - Posted: 10 Feb 2020, 2:50:05 UTC - in response to Message 2031740.  
Last modified: 10 Feb 2020, 2:51:01 UTC

How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs.
Just a question to be sure I look in the right place.

When you say reboot, is that a warm reboot of the host, a cold restart, or just stopping and reloading BOINC itself?
He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off.
I've never seen that problem on either of my 2 dual GPU rigs, but sudden power downs 1-2 secs after starting a task will produce a corrupt header error and that is all I get on restarting (though that can happen on either 1 of the cards).

Cheers.

I am trying to refresh my memory, and I remember that the problem I reported only happened on my host when the 4 GPUs (4x 1070 at that time) were enabled and I used a slow HDD. The process was ended before the crunching program was able to finish the slot housekeeping cleanup at the end of it. IIRC only one other SETI user had the same problem at that time, and his host was very similar to mine.
ID: 2031742
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 2031745 - Posted: 10 Feb 2020, 3:17:36 UTC - in response to Message 2031742.  
Last modified: 10 Feb 2020, 3:20:59 UTC

This was when the problem was first being identified, at that point I thought it was just My machine. I later found my other machines had the same problem once I turned the monitors on and started actually using the machines.
juan BFP Message 1953194 - Posted: 1 Sep 2018, 11:37:03 UTC
Last modified: 1 Sep 2018, 11:44:32 UTC

Hi TBar & others. Looking at the invalids on my host, I see this WU:

https://setiathome.berkeley.edu/workunit.php?wuid=3116569820

As you can see, the CUDA 9.2 result shows 0 pulses and the other two show 17...
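
A comparison like the one described here (one result with 0 pulses, the others with 17) can be sketched in a few lines of Python. This is only an illustration: the labels are hypothetical, and the rule "zero while someone else is nonzero" is an assumption about what counts as suspicious, not how the project's validator works.

```python
def suspect_results(pulse_counts):
    """Return labels that report 0 pulses while at least one other result reports some.

    `pulse_counts` maps a label for each result to its reported pulse count.
    """
    if not any(pulse_counts.values()):
        # Every result reports zero, so nothing stands out.
        return []
    return [label for label, count in pulse_counts.items() if count == 0]


# Pulse counts as described in the quoted post; the labels are made up.
example = {"CUDA 9.2 host": 0, "wingman A": 17, "wingman B": 17}
print(suspect_results(example))
```

If all results agree on zero, the function returns an empty list, which matches the case of a workunit that really has no pulses.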



How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs.
Just a question to be sure I look in the right place.

When you say reboot, is that a warm reboot of the host, a cold restart, or just stopping and reloading BOINC itself?
He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off.
I've never seen that problem on either of my 2 dual GPU rigs, but sudden power downs 1-2 secs after starting a task will produce a corrupt header error and that is all I get on restarting (though that can happen on either 1 of the cards).

Cheers.

I am trying to refresh my memory, and I remember that the problem I reported only happened on my host when the 4 GPUs (4x 1070 at that time) were enabled and I used a slow HDD. The process was ended before the crunching program was able to finish the slot housekeeping cleanup at the end of it. IIRC only one other SETI user had the same problem at that time, and his host was very similar to mine.
ID: 2031745
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2031746 - Posted: 10 Feb 2020, 3:48:49 UTC - in response to Message 2031745.  

You also once blamed this issue on a web browser.

https://setiathome.berkeley.edu/forum_thread.php?id=81271&postid=1954273#1954273
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2031746