NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

Message boards : Number crunching : NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14655
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2031765 - Posted: 10 Feb 2020, 8:52:18 UTC - in response to Message 2031690.  

Richard, do we need any 432.00 validation? I'm not going to redo any of my testing.
No, I think we've done that already: message 2030385.

The next testing milestone will come when Microsoft start distribution of their next driver after 432.00 - which you suggested is likely to happen with the 2003 feature update. But perhaps we should keep an eye on the February and March security rollups, in case they sneak one out early.
ID: 2031765
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2032426 - Posted: 14 Feb 2020, 20:20:06 UTC

Hope the majority of people running Win 10 and the SoG app have updated their drivers to the latest. We are currently processing a cr*p-ton of old Arecibo VHAR tasks of angle range = 2.7
The 05dc14aa file seems to be the main culprit.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2032426
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13758
Credit: 208,696,464
RAC: 304
Australia
Message 2032457 - Posted: 14 Feb 2020, 22:51:40 UTC - in response to Message 2032426.  

We are currently processing a cr*p-ton of old Arecibo VHAR tasks of angle range = 2.7
The 05dc14aa file seems to be the main culprit.
Add 22dc09ac to that, and it looks like it's got quite a few noise bombs in there with the shorties.
Grant
Darwin NT
ID: 2032457
Profile CalicoSkies
Joined: 20 May 99
Posts: 31
Credit: 1,352,098
RAC: 4
United States
Message 2033244 - Posted: 21 Feb 2020, 0:42:22 UTC - in response to Message 2030751.  

I think 442.19 has definitely fixed it!

In my testing of the 442.19 drivers, I had no problems processing VHAR work items, on my main rig (RTX 2080, GTX 980 Ti, GTX 980) using Windows 10.
All 3 GPUs acted correctly.

I can second this. Yes, I recently installed the 442.19 driver (game-ready) and started running BOINC again in the last few days and have not seen any problems running the SETI@Home SoG tasks. They have all been completing within 2-4 minutes for me (RTX 2070 Super) on Windows 10.
ID: 2033244
rcthardcore

Joined: 23 Nov 08
Posts: 48
Credit: 1,306,006
RAC: 0
United States
Message 2034294 - Posted: 27 Feb 2020, 22:30:10 UTC

Seti@Home is working quite well with Nvidia WHQL driver version 442.37 for Windows 10 Professional 64-bit. Everything seems to be OK with this release.
At least on my end, I haven't had any problems with it. Knock on wood.
ID: 2034294
Patrick Meyer

Joined: 18 Jun 11
Posts: 5
Credit: 23,418,285
RAC: 104
Canada
Message 2036945 - Posted: 9 Mar 2020, 15:05:22 UTC

I have just installed Nvidia driver 442.50 and it looks like it is working well on SETI@home.
ID: 2036945
Patrick Meyer

Joined: 18 Jun 11
Posts: 5
Credit: 23,418,285
RAC: 104
Canada
Message 2040112 - Posted: 24 Mar 2020, 15:21:58 UTC - in response to Message 2039624.  
Last modified: 24 Mar 2020, 15:22:25 UTC

I just updated my driver to 445.75 and it is now not working.
ID: 2040112
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14655
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2040115 - Posted: 24 Mar 2020, 15:30:47 UTC - in response to Message 2040112.  

I just updated my driver to 445.75 and it is now not working.
Because of the replica database delay, we won't be able to find/see the error messages from your tasks for another three days or more.

Please help us understand the 'not workingness' of it. Any symptoms? Any error messages or error numbers you can post?
ID: 2040115
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2040166 - Posted: 24 Mar 2020, 19:54:33 UTC

Yuck.

I think I am noticing R445 driver 445.75 fail my OpenCL test for my 2080.

I guess I will be testing all my GPUs against:
- R440 driver versions: 442.37, 442.50, 442.59, 442.74
- R445 driver version: 445.75
... in the coming days.
ID: 2040166
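
For anyone wanting to check a machine against the reports so far, here is a minimal Python sketch. It is not from any poster and not an official tool. The `THREAD_REPORTS` table and the function names are hypothetical; the table only records what individual users have posted in this thread (Windows 10, SoG app), so a version that is missing from it is "unknown", not "safe". It assumes `nvidia-smi` is installed with the driver and reachable on the PATH.

```python
import shutil
import subprocess

# Reports collected from posts in this thread only (Windows 10, SoG app).
# A version that is not listed here is "unknown", not "safe".
THREAD_REPORTS = {
    "442.19": "ok",
    "442.37": "ok",
    "442.50": "ok",
    "445.75": "failing",
}


def thread_report(version):
    """Return 'ok', 'failing', or 'unknown' for a driver version string."""
    return THREAD_REPORTS.get(version.strip(), "unknown")


def installed_driver_version():
    """Ask nvidia-smi for the driver version; None if it cannot be determined."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        out = subprocess.run(
            [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
    except (subprocess.SubprocessError, OSError):
        return None
    # One line is printed per GPU; every GPU shares the same driver version.
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("Could not determine the driver version (is nvidia-smi on PATH?)")
    else:
        print(f"Driver {version}: {thread_report(version)} per this thread")
```

The lookup is deliberately an exact match rather than a version range, because the thread only contains reports for specific releases; 442.59 and 442.74 are listed for testing above but have no results yet.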
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2040174 - Posted: 24 Mar 2020, 20:25:30 UTC

Maybe run a task against the offline benchmark tool with the suspect driver to get instant error logs.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2040174
Michael Donikowski
Volunteer tester

Joined: 29 May 99
Posts: 8
Credit: 22,914,826
RAC: 38
United States
Message 2040182 - Posted: 24 Mar 2020, 21:27:45 UTC - in response to Message 2040115.  
Last modified: 24 Mar 2020, 21:27:59 UTC

The 445 driver is repeating the OpenCL problem of several months ago. I had one WU run for 40 minutes and then got a computation error. Rolled back to the 442 driver and all is OK.
ID: 2040182
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2040192 - Posted: 24 Mar 2020, 22:41:06 UTC - in response to Message 2040182.  

Looks like the changes they put in place for 442.19 to fix the original problem got dropped by the development team. Or a different team was unaware of the code changes that fixed it and did not include them in their release build.

Someone needs to remind Nvidia again to put the code patch back in.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2040192
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13758
Credit: 208,696,464
RAC: 304
Australia
Message 2040238 - Posted: 25 Mar 2020, 3:16:00 UTC - in response to Message 2040192.  

Someone needs to remind Nvidia again to put the code patch back in.
Or they heard Seti was shutting down & decided they didn't need it in there anymore...
Grant
Darwin NT
ID: 2040238
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2040255 - Posted: 25 Mar 2020, 7:52:07 UTC - in response to Message 2040238.  

Someone needs to remind Nvidia again to put the code patch back in.
Or they heard Seti was shutting down & decided they didn't need it in there anymore...

That's a pessimistic view. FYI, the tasks on other projects were failing too with the older drivers without the fix for the "Seti" tasks. Einstein for one.
It would be a shame if they pulled the code fix thinking it only applied to Seti and Seti shutting down led them to believe it was only needed for this project.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2040255
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13758
Credit: 208,696,464
RAC: 304
Australia
Message 2040265 - Posted: 25 Mar 2020, 9:02:32 UTC - in response to Message 2040255.  
Last modified: 25 Mar 2020, 9:03:10 UTC

That's a pessimistic view. FYI, the tasks on other projects were failing too with the older drivers without the fix for the "Seti" tasks. Einstein for one.
It would be a shame if they pulled the code fix thinking it only applied to Seti and Seti shutting down led them to believe it was only needed for this project.
I was just being facetious. Most likely it's the left hand not knowing what the right hand is doing, different code paths, etc.
Never attribute to malice that which can adequately be explained by stupidity (I have to remind myself of this daily due to the crap going on at work at present. The Virus is a good excuse for management to do all sorts of things they never could do otherwise, and they are trying to do it all at once. If there were somewhere to go, the rats would all be jumping ship. But there isn't. So they can't. We're all just trapped rats).
Grant
Darwin NT
ID: 2040265
robertmiles
Volunteer tester

Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 2040301 - Posted: 25 Mar 2020, 12:54:14 UTC - in response to Message 2040238.  

Someone needs to remind Nvidia again to put the code patch back in.
Or they heard Seti was shutting down & decided they didn't need it in there anymore...
I told them last night, although probably without enough details. We probably need many more such messages to them.
ID: 2040301
Profile Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2040348 - Posted: 25 Mar 2020, 17:46:59 UTC - in response to Message 2040301.  

I would hope they don't require the extensive documentation and user input that the first logged complaint needed to effect a fix.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2040348
robertmiles
Volunteer tester

Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 2040363 - Posted: 25 Mar 2020, 18:51:48 UTC - in response to Message 2040348.  

I would hope they don't require the extensive documentation and user input that the first logged complaint needed to effect a fix.

Hopefully, they can try the same OpenCL section that was in the 442 drivers, then ask a few SETI@Home users to test how well that works. If it works, they might even get it fixed in the later 445 drivers.
ID: 2040363
Ian&Steve C.
Joined: 28 Sep 99
Posts: 4267
Credit: 1,282,604,591
RAC: 6,640
United States
Message 2040368 - Posted: 25 Mar 2020, 19:12:36 UTC - in response to Message 2040255.  

Someone needs to remind Nvidia again to put the code patch back in.
Or they heard Seti was shutting down & decided they didn't need it in there anymore...

That's a pessimistic view. FYI, the tasks on other projects were failing too with the older drivers without the fix for the "Seti" tasks. Einstein for one.
It would be a shame if they pulled the code fix thinking it only applied to Seti and Seti shutting down led them to believe it was only needed for this project.


I know Einstein had issues with the AMD driver on RX 5700 cards, but wasn't aware that they had issues with Nvidia drivers that correlated with the same SETI fix. Can you confirm?

If it only affects SETI, and again only one WU type on one app on one OS, then really it's not worth the effort to fix again. SETI will be done in a week.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2040368
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2040462 - Posted: 26 Mar 2020, 3:40:53 UTC

Such pessimism and defeatism!

If there's a bug in these R445 drivers, which seems likely, then ...
... I for one will push for them to fix the bug.

I have not yet fully characterized the behavior that I'm seeing, and may not have time until the weekend.
ID: 2040462

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.