Cuda runtime, memory related failure, threadsafe temporary exit..
Author | Message |
---|---|
ncoded.com Joined: 16 Aug 16 Posts: 4 Credit: 25,131,308 RAC: 2 |
Hi, We have been running SETI on two GPUs, a GTX 970 and a GTX 750 Ti, for a few days. We just got the message "Cuda runtime, memory related failure, threadsafe temporary exit". It is on the machine called E5-2683V3-1, which has 16 GB of DDR4 and a single 14-core/28-thread Xeon V3, and runs 64-bit Windows 10. Any ideas? This is the first time we have come across this on any of our machines. Thanks |
ncoded.com Joined: 16 Aug 16 Posts: 4 Credit: 25,131,308 RAC: 2 |
Just as an update on this issue: within an hour of the problem detailed above, a second machine (i7-4790-2), also with two GTX 970s, started getting computation errors under a completely different project, Collatz. The reason I mention this is that both machines had the same symptom, the screen temporarily going blank, which indicates some kind of driver issue; all four GPUs use exactly the same NVIDIA driver. We have not noticed any driver updates in the last day or so. The only common factors between the two boxes are the driver, as mentioned, and that both are new installs of the latest BOINC software. Obviously, if there is some kind of issue with the latest software and the NVIDIA driver, then we would expect there to be many similar reports. It does seem odd that the driver problem shows up as CUDA issues with SETI and as computation errors with Collatz. Our other two machines also run SETI on GPUs (GeForce 210) but with completely different drivers, and at this time they do not seem to be showing any problems. |
rob smith Joined: 7 Mar 03 Posts: 22188 Credit: 416,307,556 RAC: 380 |
Your recent additional notes point toward a driver issue: BOINC doesn't do any computation, it only manages the applications, and you've had the same issue with different applications and hardware. What you are describing is very typical of a driver halt/restart cycle. It wouldn't be the first time that there have been issues with the "latest" drivers when doing computational work; modern drivers are heavily focused on games, with computation a poor third. I would make sure I had drivers from Nvidia. I've been using driver version 368.81 for some time, and it appears to be stable on my Windows 7 PC, which has a pair of GTX 970s. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
ncoded.com Joined: 16 Aug 16 Posts: 4 Credit: 25,131,308 RAC: 2 |
Hi Bob, Thanks for the information. We restarted each machine and everything seems to be okay now on both projects. If the problem recurs we will look at older drivers; right now we are using the latest driver, 376.19, which in truth has been okay for the last few weeks apart from today's issues. |
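
Since the thread turns on which driver version each box is running, here is a minimal sketch of one way to record that per machine so the reports can be compared. It assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH (on Windows it may need to be run from the driver's install directory). The helper names `parse_gpu_report`, `driver_versions` and `query_nvidia_smi` are made up for illustration and are not part of BOINC or SETI@home.

```python
import csv
import io
import subprocess


def parse_gpu_report(text):
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Skip blank or malformed lines that do not have both fields.
    return [(r[0].strip(), r[1].strip()) for r in rows if len(r) >= 2]


def driver_versions(gpus):
    """Return the distinct driver versions seen, sorted as strings."""
    return sorted({driver for _, driver in gpus})


def query_nvidia_smi():
    """Ask nvidia-smi for each GPU's name and driver version (assumes it is on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_report(result.stdout)


if __name__ == "__main__":
    try:
        gpus = query_nvidia_smi()
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"Could not query nvidia-smi: {exc}")
    else:
        for name, driver in gpus:
            print(f"{name}: driver {driver}")
        print("Distinct driver versions:", ", ".join(driver_versions(gpus)))
```

Running this on each machine and pasting the output together would show at a glance whether the failing boxes really share a driver version that the healthy ones do not. It only reports versions; it does not diagnose the driver restart itself.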
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.