dpc_watchdog_violation windows 8
Author | Message |
---|---|
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
New info: I checked the DPC latency on the other Win7 computer with the 310.90 driver. With GPU computing turned on, the DPC latency fluctuated between 500 and 1000 µs. With GPU computing suspended, it dropped below 100 µs. Could feeding 6 tasks to 2 GPUs cause all those DPCs, or is it just bad driver code on Nvidia's part? |
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
"New info. I checked the DPC latency on the other Win7 computer with 310.90 driver. If GPU computing was turned on the DPC latency was fluctuating between 500 - 1000us. If GPU computing was suspended the DPC latency dropped below 100us." Just ran it on my system: with BOINC running 3 WUs on the CPU and 2 each on the 670 and 650 Ti, I am averaging around 130 µs and saw one spike to 213 µs in a minute of watching. This is with the 310.90 driver. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"New info. I checked the DPC latency on the other Win7 computer with 310.90 driver. If GPU computing was turned on the DPC latency was fluctuating between 500 - 1000us. If GPU computing was suspended the DPC latency dropped below 100us." More likely a side effect of the lower-level PCI Express drivers for the chipset being immature. Naturally the video drivers use that bus subsystem, which involves IO buffers, hardware interrupts, etc. That's where DPCs and timeouts come in. Also make sure to get the chipset drivers straight from Intel rather than from ASRock or the disc that came with the motherboard. You will probably need to monitor for BIOS updates as well. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
We have to consider that the platform involves a PLX bridge chip. Windows 8 is just too unusable. I am considering getting a Windows 7 license instead and letting MS sort out the Win 8 issues in their own time. I have spent too much time playing around with Win 8 already. I cannot conclude whether there are underlying issues that could be better optimized at the motherboard driver level, but under Win 7 it is at least manageable and the system does not freeze every 3 seconds for 5 seconds. |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
Rebooted Win 8. Turned off the SSDP and UPnP services. Now latency looks good again. But for how long? |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Makes sense. It does seem to mirror in some ways the experience when XP first came out, and then again with Vista. For those who don't recall, hardware vendors often have trouble updating to new driver models. Understandable, but frustrating for early adopters. |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
Meanwhile I have updated to the latest Intel INF driver package, 9.3.1.1026. I unlocked the host. The display was asleep for about 15 seconds. Then I got: (Scheduler wait: Cuda runtime, memory related failure, threadsafe temporary Exit) in Boinc -> Tasks for the previously active Cuda tasks. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"... Then I got:" The fact that errors now make it up to the application level is actually a positive sign, because the precise subsystem can likely be better isolated. In the task stderr there should be a line number etc. indicating at what point the failures occur. If that is during memory allocation / initialisation, or memory transfers (across PCIe), the next steps would involve systematically reseating hardware (system RAM, card, power), checking every related BIOS setting and verifying that the settings 'stick', followed by stress tests on system and GPU memory etc. |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
I uninstalled and re-installed Intel Smart Connect. The problems are less frequent now, but I still get restarts of the kernel mode driver (after which all the unrefreshed areas on screen remain black) and occasional restarts of the computer. The HP Scan software also contributes some spikes when it finishes scanning a document (at the end of the batch in the case of multiple pages). |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
Will the task stderr be uploaded and visible to everyone, or do I need to start looking in some files? There are probably several tens if not hundreds of tasks gone by now. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
For a task that is in the process of retrying, stderr.txt will reside in the slot folder (within Boinc's Data directory, slots\#). The difficulty from this end is that, from the server's perspective, the tasks look more or less correctly processed (the machine is producing valid work despite its issues), so they disappear quickly off the server and will be buried among large numbers of tasks that ran normally. The best bet would be to wait for the temporary exit to occur, stop Boinc, look through the slots for one or more folders containing a boinc_temporary_exit file, and post the stderr.txt from that one. |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
Hello, it is easy to reproduce. Just wait for the screen to blank out and sign in again. BTW, the Win7 machine also shows problems, but very seldom (not enough to annoy normal use, but recording a TV show might have some hiccups once an hour or so). Here's one pick: setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully setiathome enhanced x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Work Unit Info: ............... WU true angle range is : 0.432766 Kepler GPU current clockRate = 1071 MHz Thread call stack limit is: 1k Cuda error 'find_triplets_kernel' in file 'c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_pulsefind.cu' in line 262 : the launch timed out and was terminated. Launch timeout error. uncaptured error before call (cudaMemcpy(&flags, dev_flag, sizeof(*dev_flag), cudaMemcpyDeviceToHost)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_pulsefind.cu, line 270: the launch timed out and was terminated Exiting cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... cudaAcc_free() DONE. Cuda sync'd & freed.
Preemptively acknowledging a safe temporary exit-> Exit Status: 0 boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> Cuda threadsafe ExitProcess() initiated, rval 0 setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully Restarted at 55.35 percent, with Lunatics x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Kepler GPU current clockRate = 1071 MHz Thread call stack limit is: 1k Error on call (cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_summax.cu, line 252: the launch timed out and was terminated Exiting cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... cudaAcc_free() DONE. Cuda sync'd & freed. Preemptively acknowledging a safe temporary exit-> Exit Status: 0 boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> Cuda threadsafe ExitProcess() initiated, rval 0 setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... 
Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully Restarted at 74.86 percent, with Lunatics x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Kepler GPU current clockRate = 1071 MHz Thread call stack limit is: 1k |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
Next one: setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully setiathome enhanced x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Work Unit Info: ............... WU true angle range is : 0.432766 Kepler GPU current clockRate = 1071 MHz Thread call stack limit is: 1k Error on call (cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_summax.cu, line 252: the launch timed out and was terminated Exiting cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... cudaAcc_free() DONE. Cuda sync'd & freed. Preemptively acknowledging a safe temporary exit-> Exit Status: 0 boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> Cuda threadsafe ExitProcess() initiated, rval 0 setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... 
Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully Restarted at 30.47 percent, with Lunatics x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Kepler GPU current clockRate = 1071 MHz Thread call stack limit is: 1k Error on call (cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_summax.cu, line 252: the launch timed out and was terminated Exiting cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... cudaAcc_free() DONE. Cuda sync'd & freed. Preemptively acknowledging a safe temporary exit-> Exit Status: 0 boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> Cuda threadsafe ExitProcess() initiated, rval 0 |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
From slot 12: setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully setiathome enhanced x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Work Unit Info: ............... WU true angle range is : 0.432766 Kepler GPU current clockRate = 1071 MHz Thread call stack limit is: 1k Error on call (cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_summax.cu, line 252: the launch timed out and was terminated Exiting cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... cudaAcc_free() DONE. Cuda sync'd & freed. Preemptively acknowledging a safe temporary exit-> Exit Status: 0 boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> Cuda threadsafe ExitProcess() initiated, rval 0 |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
The mouse pointer also sometimes fails to redraw, i.e. it shows the wrong symbol, or no symbol at all. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yep, those confirm the drivers & hardware are not doing what they should be underneath. It looks like interrupts are possibly not getting back from the card to the system (for whatever underlying reason: PCI Express failures, card failures, or something else related). Systematically reseating everything is the next step. Do you have some other card you can swap in, to confirm the issue stays with the machine/OS rather than the graphics card being faulty? Or did it already work in some other machine? |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
I have had that GTX 680 crunching well in other machines for over half a year without any significant problems. A reshuffle could be done. I recall that during install the default VGA video driver did not have any issues. |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
One of my suspects is the HP Scan to PC that is launched every 10 seconds to check for incoming images. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"One of my suspects is the HP Scan to PC that is launched every 10 seconds to check for incoming images." Quite, yes, it could be just about any driver, firmware or device. Another approach that might work well would be to strip everything down to the bare minimum, verify correct operation, and add things back one by one until the issues start to reappear. Whichever direction you take the isolation from here, it will be interesting to see what the culprit turns out to be once it is completely narrowed down. |
Mark Lybeck Send message Joined: 9 Aug 99 Posts: 245 Credit: 216,677,290 RAC: 173 |
A few more of these: setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully setiathome enhanced x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Work Unit Info: ............... WU true angle range is : 0.432556 Kepler GPU current clockRate = 1084 MHz Thread call stack limit is: 1k setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully Restarted at 11.09 percent, with Lunatics x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Kepler GPU current clockRate = 1084 MHz Thread call stack limit is: 1k Error on call (cudaMemcpy(&flags, dev_flag, sizeof(*dev_flag), cudaMemcpyDeviceToHost)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_gaussfit.cu, line 587: the launch timed out and was terminated Exiting cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... cudaAcc_free() DONE. Cuda sync'd & freed. 
Preemptively acknowledging a safe temporary exit-> Exit Status: 0 boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> Cuda threadsafe ExitProcess() initiated, rval 0 setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully setiathome enhanced x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. Work Unit Info: ............... WU true angle range is : 0.425825 Kepler GPU current clockRate = 1084 MHz Thread call stack limit is: 1k setiathome_CUDA: Found 1 CUDA device(s): Device 1: GeForce GTX 680, 2047 MiB, regsPerBlock 65536 computeCap 3.0, multiProcs 8 pciBusID = 3, pciSlotID = 0 In cudaAcc_initializeDevice(): Boinc passed DevPref 1 setiathome_CUDA: CUDA Device 1 specified, checking... Device 1: GeForce GTX 680 is okay SETI@home using CUDA accelerated device GeForce GTX 680 pulsefind: blocks per SM 4 (Fermi or newer default) pulsefind: periods per launch 100 (default) Priority of process set to BELOW_NORMAL (default) successfully Priority of worker thread set successfully Restarted at 9.93 percent, with Lunatics x41zc, Cuda 5.00 Legacy setiathome_enhanced V6 mode. 
Kepler GPU current clockRate = 1084 MHz Thread call stack limit is: 1k Error on call (cudaMemcpy(PowerSpectrumSumMax, dev_PowerSpectrumSumMax, (cudaAcc_NumDataPoints / fftlen) * sizeof(*dev_PowerSpectrumSumMax), cudaMemcpyDeviceToHost)), file c:/[Projects]/__Sources/sah_v7_opt/Xbranch/client/cuda/cudaAcc_summax.cu, line 252: the launch timed out and was terminated Exiting cudaAcc_free() called... cudaAcc_free() running... cudaAcc_free() PulseFind freed... cudaAcc_free() Gaussfit freed... cudaAcc_free() AutoCorrelation freed... cudaAcc_free() DONE. Cuda sync'd & freed. Preemptively acknowledging a safe temporary exit-> Exit Status: 0 boinc_exit(): requesting safe worker shutdown -> boinc_exit(): received safe worker shutdown acknowledge -> Cuda threadsafe ExitProcess() initiated, rval 0 |
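
The slot search suggested earlier in the thread (stop Boinc, find the slot folders holding a boinc_temporary_exit file, read their stderr.txt) can be sketched roughly as below. This is only an illustration, not part of the SETI@home or BOINC tooling: the function names and the regular expression are assumptions, written to fit the two error layouts that appear in the logs posted above, and the slots path has to be supplied by the user.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumed pattern, fitted to the two error layouts in the posted logs, e.g.
#   "...cudaAcc_pulsefind.cu' in line 262 : the launch timed out..."
#   "...cudaAcc_summax.cu, line 252: the launch timed out..."
TIMEOUT_RE = re.compile(r"([\w.]+\.cu)'?,? (?:in )?line (\d+) ?: the launch timed out")


def slots_with_temporary_exit(slots_dir):
    """Return the slot folders that contain a boinc_temporary_exit file."""
    return sorted(p.parent for p in Path(slots_dir).glob("*/boinc_temporary_exit"))


def count_launch_timeouts(stderr_text):
    """Count 'launch timed out' errors per (source file, line number)."""
    return Counter(
        (m.group(1), int(m.group(2))) for m in TIMEOUT_RE.finditer(stderr_text)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of BOINC's slots folder as the only argument.
    for slot in slots_with_temporary_exit(sys.argv[1]):
        stderr_file = slot / "stderr.txt"
        if not stderr_file.exists():
            continue
        text = stderr_file.read_text(errors="replace")
        print(f"slot {slot.name}")
        for (source, line), hits in count_launch_timeouts(text).most_common():
            print(f"  {source}:{line}  {hits} timeout(s)")
```

Grouping by source file and line makes it easier to see whether the timeouts cluster at one call site (for example the cudaMemcpy in cudaAcc_summax.cu, line 252) or are spread across kernels, which would point more toward a system-level cause.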