NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 13904 Credit: 208,696,464 RAC: 304 ![]() ![]() |
I just installed the NVidia Studio Driver version 441.28. I'll let you know if my SETI performance is restored. v431.60 is the last version to work with VHAR Arecibo work. Grant Darwin NT |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
I just installed the NVidia Studio Driver version 441.28 . I'll let you know if my SETI performance is restored. It’ll work fine until you get some Arecibo VHAR tasks, and you’ll start getting failures again. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
![]() Send message Joined: 1 Jul 99 Posts: 15 Credit: 11,329,118 RAC: 32 ![]() ![]() |
I just installed the NVidia Studio Driver version 441.28. I'll let you know if my SETI performance is restored. v431.60 is the last version to work with VHAR Arecibo work. Can't/won't revert to 431.60; I need the latest Studio Driver to take advantage of the newest ray tracing acceleration code for my RTX 2080 Ti. |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
looks like GRD 441.20 DCH was released today. https://www.nvidia.com/download/driverResults.aspx/153947/en-us probably doesn't fix the issue since nothing in the release notes mentions it, but worth a shot. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
Jacob Klein Send message Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 ![]() |
441.20 was released on 11/12/2019. I tested them already. They do not fix the issue. I reported results here. https://setiathome.berkeley.edu/forum_thread.php?id=84694&postid=2018729 However, they recently repacked that version, and the filename has also changed: from: 441.20-desktop-win10-64bit-international-whql.exe to: 441.20-desktop-win10-64bit-international-whql-rp.exe I might retest the repack later, but we can expect to continue to wait on a fix. It's interesting how their website sometimes shows the old date, and sometimes shows the new date. How confusing! NVIDIA.com > Drivers > GeForce Drivers > Search > 441.20 https://www.geforce.com/drivers/results/153948 > Release Date Tue Nov 12, 2019 NVIDIA.com > Drivers > All NVIDIA Drivers > Search > 441.20 https://www.nvidia.com/Download/driverResults.aspx/153944/en-us > Capital "D" in the URL for "Download" > Release Date: 2019.11.12 NVIDIA.com > Drivers > All NVIDIA Drivers > Beta and Older Drivers > Search > 441.20 https://www.nvidia.com/download/driverResults.aspx/153944/en-us > Lowercase "d" in the URL for "download" > Release Date: 2019.11.22 Fun! |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
you tested the non-DCH driver, they are different. like i said, they are unlikely to fix the issue, but the driver is different nonetheless Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
Baltazar Mejia Send message Joined: 14 Sep 05 Posts: 1 Credit: 14,817,297 RAC: 0 ![]() |
I have occasional GPU tasks (opencl_nvidia_SoG) that stall between 0.5% and 0.6%. I manually abort them, and then other tasks work fine. I've noticed the tasks that fail tend to have a deadline in December. I've uninstalled and reinstalled BOINC, deleted the directory, and installed the latest Nvidia drivers. For example, just now I had a task stall at 0.605%: task 19no19ac.26046...., due 12/14/19. Task 19no19ac.20334.1754 stalled at 0.592%. By contrast, task 19no19ac.4760, due 1/6/20, is running fine. No idea how to fix this, or whether this is a problem on my end or a problem with the tasks. My GPU is an RTX 2070 running the latest 441.20 driver. My data directory is on a different drive than the program directory. Would that have anything to do with it? Anyone else having this problem? |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
The problem is a combination of: 1. Windows 10 2. Nvidia drivers newer than 431.xx 3. The SETI SoG app 4. Arecibo VHAR tasks. Remove any of these 4 variables and the problem goes away. The easiest thing to do is to revert back to the 431.xx driver. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
Cameron ![]() Send message Joined: 27 Nov 02 Posts: 110 Credit: 5,082,471 RAC: 17 ![]() ![]() |
The problem is a combination of: One other easy alternative could be to not allow tasks for NVIDIA GPU. |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
Doesn’t seem like a good idea since you won’t get any work that way. You can ignore the problem and you’ll still do most tasks. Or revert the drivers to be back to doing all tasks. Stopping all work to the nvidia GPU seems overly extreme. Would you really prefer to do nothing instead of something? Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
Speedy ![]() Send message Joined: 26 Jun 04 Posts: 1646 Credit: 12,921,799 RAC: 89 ![]() ![]() |
The problem is a combination of: To do this it would require a coding change on the server side I would imagine. I do not know anybody capable of such a task at the present time apart from Jeff, Matt or Eric ![]() |
Jacob Klein Send message Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 ![]() |
The user has a web setting to not receive NVIDIA work, if desired. Anyway, my short term solution has been to just put SETI to "no new work" until things are fixed up. ... since I don't want my GPUs to possibly go idle with the problem, and I have other GPU projects that I'm attached to. |
d.wenzel Send message Joined: 11 Mar 01 Posts: 3 Credit: 38,878,272 RAC: 130 ![]() ![]() |
Good evening, does anyone have experience with Nvidia's new 441.41 driver, published today (11/26/2019)? Kind regards |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
Good evening, nothing in the release notes saying it's been addressed. So likely not fixed. someone will have to test to be sure. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
Jacob Klein Send message Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 ![]() |
NVIDIA released 441.41 drivers today. I tested them, and they still have the "SETI OpenCL SoG VHAR on Windows 10" problems: Maxwell: > Tasks crash with error. > ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995. Pascal/Turing: > Tasks run indefinitely with no load on the GPU. 431.60 are the last drivers that work correctly for those specific SETI tasks on Windows 10. NVIDIA is aware, and per NVIDIA, we must continue to be patient for a driver version that includes a fix. |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
It’s looking more and more like nvidia isn’t going to fix this. We may need to look at other options, either wider adoption of the sah app that doesn’t have this issue, or some tweaking on the distribution servers to not send Arecibo VHAR tasks to Nvidia GPUs on Windows 10. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
Jacob Klein Send message Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 ![]() |
Fixes take time. I've been told they are working on it. Plus, keep in mind that just because a driver is released today doesn't mean that it has their latest efforts. For 441.41, for instance, Device Manager shows a driver date of 11/20/2019, so there's about a week of lag between when the driver was compiled and when it was released, meaning it doesn't have their changes from the past week. Again, we need to remain patient, possibly until the first driver from a new Release branch, before we can really wonder if they're abandoning us. You are welcome to work to create another repro for them and send them driver feedback. I do like your idea of a server-side change to prevent sending those out if they are known to cause problems, especially the "infinite run" problem where it wastes a GPU that could be used for other projects/tasks! |
Ian&Steve C. ![]() Send message Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 ![]() ![]() |
The project has done it in the past; IIRC, preventing Arecibo VLARs going to Nvidia GPUs. But I'm not sure if it had any additional constraints for the app used or the environment (OS). Since it only seems to affect Windows 10, it would be good to narrow it down at least that much, since there are lots of highly productive systems running Linux that do not have this problem and can crunch through these tasks without issue. The problem came about in the R435 release, and Nvidia was informed about the issue fairly early on after that started happening. They pushed several updates on R435 without a fix, and have now moved to R440 with still no fix. That's why I think they might not fix it: they've already moved on to a new driver release branch. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours ![]() ![]() |
Jacob Klein Send message Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 ![]() |
They didn't have my easy repro setup and steps, though, until I put forth the effort to supply them. So... they now have a repro, and are working on it. To my knowledge, they didn't have a readily available repro before. |
Traveller Send message Joined: 6 Jul 99 Posts: 1 Credit: 5,502,932 RAC: 15 ![]() ![]() |
I intend to just turn off GPU for SETI. My other projects are not having a problem. I'll revisit this when I build the next computer. |
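
For anyone who wants to check a host against the four conditions listed earlier in the thread (Windows 10, an Nvidia driver newer than 431.60, the SoG app, and an Arecibo VHAR task), here is a minimal Python sketch. The function names, the `"Windows 10"` string, and the way the inputs are obtained are all assumptions made for illustration; only the plan class name `opencl_nvidia_SoG` and the 431.60 cutoff come from posts in this thread. It is not how the project or BOINC actually decides anything.

```python
from typing import Tuple

# Last driver version reported in this thread to handle these tasks correctly.
LAST_GOOD_DRIVER: Tuple[int, int] = (431, 60)


def parse_driver_version(text: str) -> Tuple[int, int]:
    """Turn a string such as '441.20' into a comparable (major, minor) tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def is_affected(os_name: str, driver_version: str,
                plan_class: str, is_arecibo_vhar: bool) -> bool:
    """Return True only when all four conditions from the thread hold.

    The inputs are hypothetical: how you detect the OS, the plan class,
    or whether a task is Arecibo VHAR is left to the caller.
    """
    return (
        os_name == "Windows 10"
        and parse_driver_version(driver_version) > LAST_GOOD_DRIVER
        and "opencl_nvidia_SoG" in plan_class
        and is_arecibo_vhar
    )


if __name__ == "__main__":
    # Example: RTX 2070 on Windows 10 with driver 441.20 running a VHAR task.
    print(is_affected("Windows 10", "441.20", "opencl_nvidia_SoG", True))
```

Comparing versions as integer tuples rather than as strings or floats avoids mistakes such as treating 431.7 and 431.70 as different releases or sorting "1000.0" before "431.60". Changing any one of the four inputs makes the function return False, which matches the "remove any of these 4 variables" observation above.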