NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units
Author | Message |
---|---|
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
Such pessimism and defeatism! I've asked Einstein to check whether they had such an error for Nvidia cards or not. No answer yet. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"Such pessimism and defeatism!" Or realism. Why waste the time and energy fixing something that will no longer be applicable by the time the fix can be implemented? You remember how long it took to get fixed last time? SETI is over in 4 days, so why bother? Maybe they had a breakdown in communication, but I would imagine the same team that made the updates last time is still making the updates now. Or maybe the changes they made were not in line with other parts of their driver and conflicted with other goals they had, so learning that the fix was no longer needed let them remove the conflicting code? We're not likely to get clear answers, so we can only speculate. Seti@Home classic workunits: 29,492 CPU time: 134,419 hours |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
"why waste the time and energy fixing" Why? Because if it's broken for calls made by one science app, it can be broken for calls made by other science apps. "we're not likely to get clear answers, so we can only speculate" I intend not to speculate, as you have done. I intend to get answers, and to get it fixed. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
"why waste the time and energy fixing" You have no power to get it fixed, and Nvidia isn't likely to give you any answers on why it broke in the first place. They didn't tell you last time. Why would they fix something that will only affect, and can only be verified with, a benchmark tool created to run workunits for an EOL project? At some point the juice isn't worth the squeeze. And unless someone can prove that it's impacting another project, it's not. |
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
"why waste the time and energy fixing" I've seen a message over on Einstein@Home saying that they were probably also affected. They aren't shutting down. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Please link the message. The only message I see is the same thing that Keith posted here. I searched through the messages at Einstein and could not find a verifiable post of someone having this problem there. If it had been a real problem before, you would see lots of posts about it. |
Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
There was a discussion in the Problems and Bug Reports forum about progress stalling out, as it does on SoG tasks in Win10. It only affected Win10; Win7 was fine. So it's likely the same issue. Seti@Home classic workunits: 20,676 CPU time: 74,226 hours A proud member of the OFA (Old Farts Association) |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
I hope you won't impede my attempts to retain optimism and work towards a fix. Thank you. |
Ian&Steve C. Joined: 28 Sep 99 Posts: 4267 Credit: 1,282,604,591 RAC: 6,640 |
Can you link directly to what you're referencing? The only thing I can find is the AMD issue on RX5700 cards (like this one), nothing about Nvidia driver problems. These two issues were happening at about the same time, and were fixed at about the same time too. I looked through 4 pages of threads on the Problems and Bug Reports board, which reach back to early September, before it was reported here. I really think you're confusing the Nvidia issue with the AMD issue. Prime example right here: https://einsteinathome.org/host/12803450 This computer is running Win10 with 445 drivers with no problem. Einstein doesn't have this issue. It seems to be SETI only, and its days are numbered. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Woohoo, today is the day I can test some drivers for this issue! :) Here are my targets: 442.19, 442.37, 442.50, 442.59, 442.74, 445.75, 445.78. Be back in a few hours! ;) |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Nearing the end of my test run. So far, my results suggest the fix was not included in R445. I will report the full result set later today. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
My results are below. The behavior indicates that the fix was not included in the R445 drivers. I sent a report to NVIDIA via their Driver Feedback form here: https://forms.gle/kJ9Bqcaicvjb82SdA If you have this problem too, please also send a report to NVIDIA using that form. My test and result files are all located here: https://1drv.ms/f/s!AgP0NBEuAPQRp6Fr322LD1BXy6rdAg Thank you. Results: 442.19 through 442.74: all good. 445.75 and 445.78 (identical on both): 2080 and 1050 Ti: no GPU usage, ran forever with no progress; 980 Ti, 980, and 970: ERROR: OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36) in file ..\analyzeFuncs.cpp near line 1995. |
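The -36 in those error lines is a standard OpenCL status code: in the Khronos `CL/cl.h` header it is `CL_INVALID_COMMAND_QUEUE`. The Python sketch below pulls the failing call and code out of a line in the format quoted in the 445.xx results and names the code. The lookup table is only a subset of `cl.h`, the regex assumes the exact message format shown here, and decoding the code says nothing about *why* the driver invalidated the queue.

```python
import re

# A subset of OpenCL status codes, as defined in the Khronos CL/cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -38: "CL_INVALID_MEM_OBJECT",
}

# Assumes the message format quoted in the results, e.g.
# "OpenCL kernel/call 'clEnqueueMapBuffer(gpu_GPUState)' call failed (-36)".
ERROR_LINE = re.compile(
    r"OpenCL kernel/call '(?P<call>[^']+)' call failed \((?P<code>-?\d+)\)"
)


def decode(line):
    """Return (call, code, name) for a matching error line, else None."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    code = int(match.group("code"))
    return match.group("call"), code, OPENCL_ERRORS.get(code, "unknown code")
```

Run on one of the 980 Ti lines, `decode` returns `('clEnqueueMapBuffer(gpu_GPUState)', -36, 'CL_INVALID_COMMAND_QUEUE')`; lines without that pattern (such as the 2080 "no progress" cases) return `None`.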
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
I'm trying to run similar tests with 445. How can I identify an Arecibo VHAR workunit if the only workunits I can use for the tests are the ones I've downloaded but not finished yet? I previously reported the problem, but without running suitable tests. Nvidia replied that they would push it up to level 2, but had not yet seen relevant problem reports for 445. |
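One possible way to pick VHAR tasks out of already-downloaded work is to read each workunit's header and compare its angle range against a cutoff. The sketch below rests on two assumptions that do not come from this thread: that the header stores the angle range in a `<true_angle_range>` tag, and that "VHAR" means an angle range above 1.127, the figure commonly quoted on SETI@home boards. Open one workunit in a text editor to confirm the tag before relying on it. `scan` takes whatever folder holds the downloaded workunits (normally the SETI@home folder under the BOINC data directory's `projects` folder).

```python
import re
from pathlib import Path

# Assumption: angle-range cutoff above which a multibeam task counts as "VHAR".
VHAR_THRESHOLD = 1.127

# Assumption: the workunit header stores the angle range in this tag.
AR_TAG = re.compile(r"<true_angle_range>\s*([0-9.eE+-]+)\s*</true_angle_range>")


def angle_range(header_text):
    """Return the angle range found in a workunit header, or None."""
    match = AR_TAG.search(header_text)
    return float(match.group(1)) if match else None


def is_vhar(header_text, threshold=VHAR_THRESHOLD):
    """True if the header's angle range is above the (assumed) VHAR cutoff."""
    ar = angle_range(header_text)
    return ar is not None and ar > threshold


def scan(folder, threshold=VHAR_THRESHOLD):
    """Yield (file name, angle range, is VHAR) for each file carrying the tag."""
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # The header sits at the start of the file, so the first 64 KiB is plenty;
        # latin-1 decodes any byte, so binary data after the header can't raise.
        with path.open("rb") as handle:
            head = handle.read(64 * 1024).decode("latin-1")
        ar = angle_range(head)
        if ar is not None:
            yield path.name, ar, ar > threshold
```

Files without the tag (BOINC's own XML, executables) are simply skipped, so pointing `scan` at the whole project folder should be harmless.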
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
robertmiles, If you go to my OneDrive link, you'll find folders for "CUDA Testing" and "OpenCL Testing". - If you download those, you can use them to do some tests. - You may only need the "Ex 1" (Example 1) folders to get the results you want to check. - It is set up to run tests on up to 3 GPUs in the system (dev0, dev1, dev2). Perhaps you could: - Download and extract those folders - Run the .cmd files for your GPU (in the correct dev folder: dev0 for a single GPU) - Watch the GPU usage - When it is done, inspect the .txt file in the Testdatas folder for the result - Report the results here, and - Report the results to NVIDIA via the Driver Feedback link. Regards, Jacob |
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
I tried what you suggested. The command file started, showed several lines, then appeared to freeze. As best I can tell, the seti*.exe program is still running, but using so little CPU time that it rounds off to zero. How do I check how much the GPU is being used? How long do I wait before deciding that the command file will never finish? Using a GTX 1080. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
You can use GPU-Z to monitor GPU information, including GPU usage. The OpenCL test should start using the GPU after a couple of seconds. If it hasn't started within 30 seconds, it won't start, and you'll need to close the window and kill the seti executable in Task Manager. Hopefully you can: - Test 442.74 (the last R440 driver) to verify it works correctly. - Test 445.75 (the first R445 driver) to verify the problem has been reintroduced. |
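The 30-second rule can be automated instead of watching GPU-Z by hand. The sketch below polls utilization through `nvidia-smi --query-gpu=utilization.gpu`, which ships with the NVIDIA driver and is assumed here to be on the PATH, and reports whether the GPU got busy before the deadline. The 10% "busy" cutoff and the one-second poll interval are assumptions, not values from this thread. Suspend other GPU work in BOINC first, or its load will look like the test starting.

```python
import subprocess
import time


def gpu_utilization(index=0):
    """Current utilization (%) of one GPU, as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip().splitlines()[0])


def started(samples, busy_percent=10):
    """True once any sample shows real GPU work (assumed 10% cutoff)."""
    return any(s >= busy_percent for s in samples)


def watch(index=0, timeout_s=30.0, interval_s=1.0, busy_percent=10,
          read=gpu_utilization):
    """Poll until the GPU gets busy or the timeout passes.

    Returns False when nothing happened in time, i.e. treat the run as hung.
    `read` is injectable so the logic can be exercised without a GPU.
    """
    samples = []
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        samples.append(read(index))
        if started(samples, busy_percent):
            return True
        time.sleep(interval_s)
    return False
```

Start the .cmd file, then call `watch(0)` from a second window; `False` corresponds to "no GPU use within 30 seconds", after which the window can be closed and the seti executable killed.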
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
How do you use GPU-Z to monitor GPU use? It showed me a lot of information about the GPU, but not whether it was being used. GTX 1080 with 445.75: hangs. GTX 1080 with 442.19: finishes in a few minutes, but the *-benchMB.txt has a lot of messages about files not found. It's too late here to download another 442 version of the driver; I'll try tomorrow. |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13913 Credit: 208,696,464 RAC: 304 |
"How do you use GPU-Z to monitor GPU use?" Click on the Sensors tab. Grant Darwin NT |
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
GTX 1080 with 442.74: finishes in about 3 minutes, but the *-benchMB.txt has a lot of messages about files not found. GPU usage was about 97%. The Sensors tab of GPU-Z only made GPU usage obvious AFTER I had observed it both with and without another BOINC project using the GPU. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14690 Credit: 200,643,578 RAC: 874 |
"GTX 1080 442.74 finishes in about 3 minutes, but the *-benchMB.txt has a lot of messages about files not found." What file is missing? state.sah would be worrying; result.sah would be catastrophic. |
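Richard's triage can be applied mechanically to a `*-benchMB.txt`. The exact wording of the missing-file messages isn't shown in this thread, so the sketch below assumes only that each one names the file and contains "not found" on the same line. It flags `state.sah` and `result.sah` with the severities Richard gives and counts every other "not found" line. Matching is case-insensitive because Windows file names are.

```python
from pathlib import Path

# Severities as given by Richard Haselgrove in this thread.
CRITICAL = {"state.sah": "worrying", "result.sah": "catastrophic"}


def triage(log_text):
    """Return ({critical file: severity}, count of other "not found" lines).

    Assumption: a missing-file message names the file and contains
    "not found" on the same line; the real benchMB wording may differ.
    """
    flagged = {}
    other = 0
    for line in log_text.splitlines():
        lowered = line.lower()
        if "not found" not in lowered:
            continue
        hits = [name for name in CRITICAL if name in lowered]
        for name in hits:
            flagged[name] = CRITICAL[name]
        if not hits:
            other += 1
    return flagged, other


def triage_file(path):
    """Triage a benchMB log on disk; undecodable bytes are replaced, not fatal."""
    return triage(Path(path).read_text(errors="replace"))
```

An empty `flagged` dict with a non-zero `other` count would suggest the missing files are something other than the two Richard singles out.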
©2025 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.