NVidia 436.xx and later drivers can cause very long compute times, especially on Arecibo VHAR work units
Author | Message |
---|---|
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I think 442.19 has definitely fixed it! YAY! Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"I think 442.19 has definitely fixed it!" If so, that is excellent news. If it is sorted, the problems should clear up quickly, since many of those who don't want to use the older drivers almost always install the most recent drivers out of habit (even if the new driver doesn't actually do anything for their games or systems). Thank you for all your efforts with this issue. Grant Darwin NT |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
You're welcome. :) It usually takes a bit of effort to make an easy repro, and a proper problem report, for the NVIDIA guys to recognize the issue and go after it. I'd like to thank Richard Haselgrove for his help with the repro. Thanks, Richard! Remember when we were diagnosing while I was at an airport and you were using Discord for one of the first times? Fun times, unforgettable! Also, regarding confirming the fix, I'm in the process of getting OpenCL and CUDA results for us to compare and verify. Maybe Richard can help with that verification when I have the data ready later today. |
Richard Haselgrove Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874 |
Can do. I've arranged to meet someone at 14:30, and may not be back before maintenance starts. We may have to fire up Discord again... |
robertmiles Joined: 16 Jan 12 Posts: 213 Credit: 4,117,756 RAC: 6 |
Could you mention which of the applications runs VHAR workunits faster? We may need a way to ensure that we use that one, except when you need the other one used instead. |
Bruce N. Goren Joined: 1 Jul 99 Posts: 15 Credit: 11,329,118 RAC: 32 |
Yep, I grabbed the Studio Driver variant of 442.19 and it looks good on my RTX2080i. Thanks to all! |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"Yep, I grabbed the Studio Driver variant of 442.19 and it looks good on my RTX2080i. Thanks to all!" The issue only occurred with some Arecibo WUs. Have you processed any shortie Arecibo WUs successfully? There are very, very few of them around at the moment. Grant Darwin NT |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
1) My NVIDIA contact would like to extend NVIDIA's apologies for the lengthy time needed for this fix. They are grateful for our patience. 2) Some people are looking for good examples to test the fix. I offer them my examples, located in the .zip files here. Just download a .zip file, extract it, then run the .cmd file within the folder whose name matches the GPU device (dev 0, dev 1, or dev 2) that you want to test. Results are put into the "Testdatas" folder. Example Work Units - Zips: https://1drv.ms/f/s!AgP0NBEuAPQRp9ky322LD1BXy6rdAg 3) Richard, here are my results. I did not inspect them thoroughly, and I'm hoping you can do that. All I know is that they completed without error and GPU usage seemed good, so I suspect the results are probably good. Can you please have a look? In particular, compare the fixed 442.19 results to the known-working 431.68 results; they should match. My OpenCL Results: https://1drv.ms/f/s!AgP0NBEuAPQRp7kD322LD1BXy6rdAg My CUDA Results: https://1drv.ms/f/s!AgP0NBEuAPQRp8RF322LD1BXy6rdAg 4) For anyone wanting access to every file I have on this bug, here's the main folder: Main folder: https://1drv.ms/f/s!AgP0NBEuAPQRp6Fr322LD1BXy6rdAg Regards, Jacob Klein |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
File 01fe20aa appears to be putting out quite a few shorties, so that will give people something to further test the new driver on. Grant Darwin NT |
EdwardPF Joined: 26 Jul 99 Posts: 389 Credit: 236,772,605 RAC: 374 |
NVIDIA 442.19-desktop-win10-64bit-international-whql has been working fine for the last 12 hours with SoG on my NVIDIA 1660 Super, with no hanging as near as I can tell. Maybe it's a good one. Ed F |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Yep, 442.19 is a good driver that fixes the issue in this thread. Scroll up to see the prior results. And thanks for confirming. |
VelocityRC Joined: 27 Sep 19 Posts: 23 Credit: 1,421,582 RAC: 86 |
I'll give it a spin this morning and see how things run for a few days. From the recent posts, I'm glad that NVIDIA is listening and that there are folks here who understand what to tell them. Have fun, everyone, and thanks!!! Bill S. |
KWSN - Sir Nutsalot Joined: 4 Jun 99 Posts: 5 Credit: 22,114,565 RAC: 47 |
I can confirm that the latest Radeon 20.1.1 drivers have stopped four of my machines (RX 580 and 590) from crashing workunits, which they had been doing on Radeon Software versions above 19.7.5. I also tried the latest NVIDIA DCH drivers, version 441.19, and I have not seen work units stalling yet either. I used the gaming drivers for my test machine. A good day all round, methinks. Jim |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
Interesting, I just found a wingman running Windows 10 and the 441.66 drivers who crapped out on a VHAR task from the 30ja20ab series. In theory it shouldn't have had an issue, since the problem was supposedly fixed in the 441.19 drivers. AR = 14.597567 https://setiathome.berkeley.edu/result.php?resultid=8513797826 |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Keith, I believe you are mistaken. The fix is in 442.19. |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
"Keith, I believe you are mistaken. The fix is in 442.19." Ohh, sorry about that. I see I got the version number of the fix wrong. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
Hi folks, I recently ran all of my local "repro" examples against all 6 of my GPUs, against both CUDA and OpenCL, and against these drivers: 431.60 (known good NVIDIA public release); 431.68 (known good NVIDIA hotfix driver); 432.00 (known good Windows Update driver); 442.19 (recent NVIDIA driver with the fix, looks good so far). A .zip of the results can be found here: https://1drv.ms/f/s!AgP0NBEuAPQRp-ZG322LD1BXy6rdAg Richard is going to look them over, but if anyone else knows how to do that and wants to inspect them for validation as well, please feel free! Regards, Jacob Klein |
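For anyone double-checking which driver versions mentioned in this thread fall on which side of the fix (441.19 versus 442.19 is easy to mix up), here is a minimal Python sketch. The function names are hypothetical, and the ranges are assumptions drawn from this thread rather than anything NVIDIA has published: below 436.00 unaffected (431.60, 431.68 and 432.00 tested good), 436.00 up to but not including 442.19 possibly affected, and 442.19 and later assumed fixed.

```python
# Hypothetical helper, not an NVIDIA or SETI@home tool.
# Ranges are assumptions taken from this forum thread:
#   < 436.00           -> unaffected (431.60, 431.68, 432.00 tested good)
#   436.00 .. < 442.19 -> may show the long VHAR compute times
#   >= 442.19          -> assumed to contain the fix


def parse_version(text: str):
    """Split a Windows driver version like '442.19' into (442, 19)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def classify_driver(version: str) -> str:
    """Return 'unaffected', 'affected' or 'fixed' for a driver version."""
    v = parse_version(version)
    if v < (436, 0):
        return "unaffected"
    if v < (442, 19):
        return "affected"
    return "fixed"


if __name__ == "__main__":
    for ver in ["431.60", "432.00", "441.19", "441.66", "442.19"]:
        print(ver, classify_driver(ver))
```

Comparing (major, minor) integer tuples avoids relying on character-by-character string comparison of version numbers.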
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
It doesn't look like you ran any of the high-AR tasks through the reference app for comparison. That is the application the results need to be evaluated against. |
Jacob Klein Joined: 15 Apr 11 Posts: 149 Credit: 9,783,406 RAC: 9 |
I'm not sure I understand. I ran the CUDA apps, but separately. Does that end up skipping an automated validation process? |
Keith Myers Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
All applications are judged against the stock CPU application: its result is the reference, the standard every other application has to match. I looked in all your folders and only saw results from the individual CUDA and OpenCL applications, with no CPU application results for the tasks that were run. The benchmark lets you run the stock CPU application first and then compare the test application against it. It doesn't look like you did that; at least, I could not find a reference result in any of the folders. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.