Setting up Linux to crunch CUDA90 and above for Windows users
Author | Message |
---|---|
Oddbjornik Send message Joined: 15 May 99 Posts: 220 Credit: 349,610,548 RAC: 1,728 |
|
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"I was never able to replicate that on my systems. *shrug*" "I can pop one off in a heartbeat. All I have to do is restart the 9 GPU GDDR5 machine and usually at least one GPU will Miss All Pulses on the first task." I need clarification. Does the machine have to be power restarted, or is simply restarting BOINC enough? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"I noticed that, but thanks for checking. I was going to let it run for a day as is to make sure it was not random. Seemed 440 drivers worked last round" I sent you several PMs apprising you of the problem, which also occurs with your current drivers. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"I noticed that, but thanks for checking. I was going to let it run for a day as is to make sure it was not random. Seemed 440 drivers worked last round" He has 2 different Linux hosts: one with [2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 435.21, whose last contact was today, and the other with [2] NVIDIA GeForce RTX 2070 SUPER (4095MB) driver: 440.48 OpenCL: 1.2, whose last contact was 8 Feb. That is the problem. |
Buckeye4LF Send message Joined: 19 Jun 00 Posts: 173 Credit: 54,916,209 RAC: 833 |
No, I reformatted and reinstalled Linux. Those are the same machine but with different loads of Linux. The drivers that came with Mint were not the same as the ones that came with Ubuntu. |
Buckeye4LF Send message Joined: 19 Jun 00 Posts: 173 Credit: 54,916,209 RAC: 833 |
I reverted back to cuda90 for now; I do not want to mess with video drivers tonight. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Can you read your PMs and reply, please? Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
About the removal of the checkpoint: FYI, I made some changes in the code and, thanks to Ian's help with the compile process, we have an experimental version of the 10.2 mutex builds running with the checkpoint removed. We will wait for Richard to wake up and guide us on how to test whether everything is working. Fingers crossed. While we wait, the beer is flowing. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Restart/Reboot the Machine. It also needs to be a machine with mostly GDDR5 GPUs. BTW, Petri isn't using any GPUs with GDDR5 vram. I bought My 1080Ti because another user wasn't having the Problem with His 1080Ti and I had tried everything else. He also had the Same Problem with His 750Ti Missing Pulses where his 1080Ti doesn't, https://setiathome.berkeley.edu/show_host_detail.php?hostid=8424399 "I need clarification. Does the machine have to be power restarted or simply restarting BOINC?" "I was never able to replicate that on my systems. *shrug*" "I can pop one off in a heartbeat. All I have to do is restart the 9 GPU GDDR5 machine and usually at least one GPU will Miss All Pulses on the first task." I'd suggest a machine using 1070s and lower for the test, with at least one GPU not connected to a monitor; it's usually the GPU(s) not connected to a monitor that have the problem immediately. Sometimes on a Mac the GPU connected to the monitor can run hours, or days, before it starts missing All pulses. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
OK, thanks for the instructions. I wanted to test on a machine with your necessary criteria. I have a host with a 1070 Ti and a 1070 which meet the criteria of no monitor attached. The 2080 is attached to the monitor. I did not find any missing pulses on the 1070 upon restarting BOINC. But that was after updating the drivers to the latest 440.59 and restarting the computer. This task was started right after the host was restarted. https://setiathome.berkeley.edu/result.php?result_name=blc75_2bit_guppi_58693_08905_HIP98801_0143.7855.818.22.45.224.vlar_0 Task stderr: `In cudaAcc_initializeDevice(): Boinc passed DevPref 3` `setiathome_CUDA: CUDA Device 3 specified, checking...` `Device 3: GeForce GTX 1070 is okay` `SETI@home using CUDA accelerated device GeForce GTX 1070` `Unroll autotune 1. Overriding Pulse find periods per launch. Parameter -pfp set to 1` `Spike count: 1` `Autocorr count: 0` `Pulse count: 10` `Triplet count: 1` `Gaussian count: 0` Event log: `09-Feb-2020 15:17:16 [---] Starting BOINC spoofed client version 7.16.3 for x86_64-pc-linux-gnu` `~ {snip} ~ {snip}` `09-Feb-2020 15:17:17 [SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 6279633; resource share 1000` `09-Feb-2020 15:17:17 [---] Setting up GUI RPC socket` `09-Feb-2020 15:17:17 [---] Checking presence of 8203 project files` `09-Feb-2020 15:17:17 Initialization completed` `09-Feb-2020 15:17:17 [SETI@home] Starting task blc75_2bit_guppi_58693_08905_HIP98801_0143.7855.818.22.45.224.vlar_0` Not seeing the issue on my GTX 1070. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
My host has 2x 2070 + 2x 1070 with no monitor attached to either 1070. I have never seen that either. Will keep an eye on it. Could it be something related to the Mac hosts only? Who knows? |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs. The Mining machine Is Not a Mac. Juan was the First to post about the problem long ago when he was just running 1070s in Linux. Remember that Juan? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Lowest end card I own is a 1070. Never noticed the problem ever after what . . . . . couple of years now. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs." Just a question to be sure I look in the right place: when you say reboot, is it a warm reboot of the host, a cold restart, or just stopping and reloading BOINC itself? |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs." I remember something about that from a long time ago. But IIRC it was related to the way the reschedule kills the process, sometimes leaving a closed slot, or it was another program, I do not remember. It was solved by adding a round of deleting the slots. Do you remember the post? Just to refresh my memory. You know I am an old man who easily forgets everything. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs." He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Wiggo Send message Joined: 24 Jan 00 Posts: 36791 Credit: 261,360,520 RAC: 489 |
I've never seen that problem on either of my 2 dual GPU rigs, but sudden power downs 1-2 secs after starting a task will produce a corrupt header error and that is all I get on restarting (though that can happen on either 1 of the cards). "He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off." "How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs." "Just a question to be sure i look in the right place." Cheers. |
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
"I've never seen that problem on either of my 2 dual GPU rigs, but sudden power downs 1-2 secs after starting a task will produce a corrupt header error and that is all I get on restarting (though that can happen on either 1 of the cards)." "He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off." "How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs." "Just a question to be sure i look in the right place." I am trying to refresh my memory, and I remember that the problem I reported only happened on my host when the 4 GPUs (4x 1070 at that time) were enabled and I used a slow HDD. The process was ended before the crunching program was able to finish the slot housekeeping cleanup at the end of it. IIRC only one other SETI user had the same problem at that time and his host was very similar to mine. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This was when the problem was first being identified; at that point I thought it was just My machine. I later found my other machines had the same problem once I turned the monitors on and started actually using the machines. juan BFP Message 1953194 - Posted: 1 Sep 2018, 11:37:03 UTC "I've never seen that problem on either of my 2 dual GPU rigs, but sudden power downs 1-2 secs after starting a task will produce a corrupt header error and that is all I get on restarting (though that can happen on either 1 of the cards)." "He said a power restart of the host. Not just a restart of BOINC. He did not mention a complete cold boot from power switch off." "How many times? Usually less than half of My cards show the problem after a reboot. I'd suggest trying it with just the lower end GPUs." "Just a question to be sure i look in the right place." |
Ian&Steve C. Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
You also once blamed this issue on a web browser. https://setiathome.berkeley.edu/forum_thread.php?id=81271&postid=1954273#1954273 Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
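
For anyone who wants to check a pile of results for the missing-pulses symptom discussed in this thread, here is a minimal sketch in Python. It assumes the task stderr always ends with the same summary labels seen in the excerpt Keith posted (`Spike count:`, `Autocorr count:`, `Pulse count:`, `Triplet count:`, `Gaussian count:`). The function names `parse_signal_counts` and `reports_zero_pulses` are made up for illustration and are not part of BOINC or the SETI@home apps. A zero pulse count by itself is probably not proof of a bad result, so treat a hit only as a hint to compare against the wingman's counts.

```python
import re

# Summary labels as they appear in the stderr excerpt quoted in the thread.
# Assumption: other app versions print the same labels in the same form.
COUNT_PATTERN = re.compile(
    r"(Spike|Autocorr|Pulse|Triplet|Gaussian) count:\s*(\d+)"
)


def parse_signal_counts(stderr_text):
    """Return a dict such as {'Pulse': 10, ...} built from a task's stderr."""
    return {name: int(value) for name, value in COUNT_PATTERN.findall(stderr_text)}


def reports_zero_pulses(stderr_text):
    """True only when a 'Pulse count' line is present and its value is 0."""
    counts = parse_signal_counts(stderr_text)
    return counts.get("Pulse") == 0


# The summary from the GTX 1070 task quoted in the thread.
sample = (
    "Spike count: 1 Autocorr count: 0 Pulse count: 10 "
    "Triplet count: 1 Gaussian count: 0"
)

print(parse_signal_counts(sample))
print(reports_zero_pulses(sample))
```

Run against that GTX 1070 summary, the check does not flag the task, which matches Keith's report that he is not seeing the issue on that card.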