NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030344 - Posted: 1 Feb 2020, 17:56:35 UTC
Last modified: 1 Feb 2020, 18:27:10 UTC

I have a host (8121358) with a 1050 Ti

It normally runs Windows 7, but has an alternative boot to Windows 10. I'll switch it over, push for updates, and see what comes. May take a while...

Bah - wants a Windows 7 update before I can even shut it down...

Into Win 10, downloading 1809 and NVidia 26.21.14.3200 - that looks like the one we want. Except my internet just crashed...

It'll be 8508571 when it's ready - it does have a 1050, honest!
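
For anyone matching version numbers: the 26.21.14.3200 mentioned above is the Windows-style version string of the driver package. A minimal sketch of the usual conversion, assuming the common convention that the last five digits (with the dots removed) give the familiar NVIDIA release number. The function name is made up for illustration.

```python
def nvidia_version_from_windows(windows_version: str) -> str:
    """Convert a Windows-style NVIDIA driver version to the NVIDIA release number.

    Assumption: the last five digits of the dotted version, read with the
    dots removed, are the NVIDIA number (for example ...14.3200 -> 432.00).
    """
    digits = windows_version.replace(".", "")
    if not digits.isdigit() or len(digits) < 5:
        raise ValueError(f"unexpected driver version: {windows_version!r}")
    last_five = digits[-5:]
    return f"{last_five[:3]}.{last_five[3:]}"


print(nvidia_version_from_windows("26.21.14.3200"))  # 432.00
```

This only reformats the string; it says nothing about which release branch the driver came from.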
ID: 2030344
W-K 666 Project Donor
Volunteer tester

Joined: 18 May 99
Posts: 19144
Credit: 40,757,560
RAC: 67
United Kingdom
Message 2030347 - Posted: 1 Feb 2020, 18:07:05 UTC - in response to Message 2030335.  

Or if the host computer does not do games, click the "Windows Driver Type" and choose Studio Driver. That one also works.
ID: 2030347
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030357 - Posted: 1 Feb 2020, 19:05:14 UTC

Here we go. Device Manager says

[screenshot of the driver details in Device Manager, not preserved]

and that's the one I saw downloading from Microsoft

BOINC says

01/02/2020 18:57:07 |  | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3376MB available, 2138 GFLOPS peak)
01/02/2020 18:57:07 |  | OpenCL: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 4096MB, 3376MB available, 2138 GFLOPS peak)
01/02/2020 18:57:07 |  | OpenCL CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 7.6.0.0814, device version OpenCL 2.1 (Build 0))
01/02/2020 18:57:07 |  | app version refers to missing GPU type intel_gpu
I'll sort that last one out later. But I think we've confirmed what came from where. Next I'll run the same bench test (on overflow tasks) that I did for opencl_nvidia_sah six days ago - then hunt for some VHARs.
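
The driver version BOINC detected can be pulled out of startup lines like the ones above with a small script. This is a rough sketch that assumes the event log keeps the "API: VENDOR GPU n: model (driver version x" layout shown in this post; the regular expression and function name are illustrative, not part of BOINC.

```python
import re

# Matches the GPU detection lines shown above, e.g.
# "CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 432.00, ..."
GPU_LINE = re.compile(
    r"(?P<api>CUDA|OpenCL): (?P<vendor>\w+) GPU (?P<index>\d+): "
    r"(?P<model>.+?) \(driver version (?P<driver>[\d.]+)"
)


def gpu_drivers(log_text):
    """Return (api, vendor, model, driver) tuples, one per GPU detection line."""
    found = []
    for line in log_text.splitlines():
        m = GPU_LINE.search(line)
        if m:
            found.append((m["api"], m["vendor"], m["model"], m["driver"]))
    return found


sample = """\
01/02/2020 18:57:07 |  | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3376MB available, 2138 GFLOPS peak)
01/02/2020 18:57:07 |  | OpenCL: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 4096MB, 3376MB available, 2138 GFLOPS peak)
01/02/2020 18:57:07 |  | OpenCL CPU: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (OpenCL driver vendor: Intel(R) Corporation, driver version 7.6.0.0814, device version OpenCL 2.1 (Build 0))
"""

for entry in gpu_drivers(sample):
    print(entry)
```

The "OpenCL CPU" line is skipped on purpose, since only the GPU driver matters for this issue.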
ID: 2030357
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030359 - Posted: 1 Feb 2020, 19:17:33 UTC - in response to Message 2030342.  

I suspect 432.00 is likely to be good as well, since I think it was likely a "Release 430" driver, and the problems started with 436.02 from the "Release 435" driver branch.
The Release branches are found in the Release Notes of the drivers.
Comparing the release dates from the two screen shots, I think you're right.

Microsoft's 432.00 is dated 24 July 2019
NVidia's 431.60 is dated 23 July 2019

The bench test (overflow tasks) ran at normal speed with high Q validation. I'll see if I've got any spare VHARs downstairs.
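
The version reasoning in this post can be written down as a simple check. A minimal sketch, assuming (as the posts here suggest, not as anything NVIDIA has confirmed) that 436.02 is the first affected release and that everything numbered below it behaves like 431.60 and 432.00. The names are hypothetical.

```python
# First release reported as problematic in this thread (assumption, not an
# official NVIDIA statement).
FIRST_AFFECTED = (436, 2)


def parse_version(text):
    """Turn an NVIDIA release number such as '432.00' into (432, 0)."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_affected(driver):
    """True if the driver is at or past the first release reported as bad."""
    return parse_version(driver) >= FIRST_AFFECTED


for version in ("431.60", "432.00", "436.02", "436.30"):
    status = "affected" if is_affected(version) else "expected to be fine"
    print(version, status)
```

Only 431.60 and 432.00 were actually confirmed good here, so treat the result for any other number below 436.02 as a guess.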
ID: 2030359
Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030360 - Posted: 1 Feb 2020, 19:18:23 UTC - in response to Message 2030357.  
Last modified: 1 Feb 2020, 19:22:14 UTC

I only have one VHAR task in my Test WUs directory. I can post it up to my Google drive if you can't find some VHAR tasks in your collection. This one definitely stalled out on Windows 10 with the later drivers. Angle range is 2.7.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030360
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030363 - Posted: 1 Feb 2020, 19:31:37 UTC

Still trying to find my Intel HD 530 - I'll give you a shout if I need that VHAR. Ta.
ID: 2030363
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2030364 - Posted: 1 Feb 2020, 19:32:52 UTC
Last modified: 1 Feb 2020, 19:35:57 UTC

Richard,

If you're looking for a VHAR to test the issue in this thread..
Try this link, for the example that I've been testing with, along with repro steps.
https://setiathome.berkeley.edu/forum_thread.php?id=84780&postid=2016218

"MBbench - OpenCL Testing\28oc11aa.6787.6611.5.32.85.wu"
.. is the folder that you want.
ID: 2030364
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030366 - Posted: 1 Feb 2020, 19:38:47 UTC - in response to Message 2030364.  

And thanks for that, too. Machine is currently installing 1909 feature update, and I'm gagging for a cup of coffee. Taking a break...
ID: 2030366
robertmiles
Volunteer tester

Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 2030374 - Posted: 1 Feb 2020, 20:36:32 UTC - in response to Message 2030332.  

Well, since I announced the availability of the test build in this thread (and the similar parallel ATI thread), I'd sort of assumed that people would report back here. I'd prefer the test reports to be publicly visible for peer review, rather than hidden in PMs, but I'll try to read it wherever you post. But be aware I can't see inside private team discussion groups.

But anyway - thanks for the positive vote. I won't do a public release this late in the day, UK time, but I'm minded to do it tomorrow morning - say any time after 12 hours from now.


A lot of this GPU language is over my head and I'm not sure I have the time to learn all that is needed to understand some of this. I have heard of CUDA but my knowledge is limited. If I recall, it is an alternate driver for nVidia. Is that correct? Us greenhorns would like a driver download link, preferably an nVidia one, but that seems to be an issue ATM. I don't mind using a third party GPU driver with a link posted here.

JMHO

Bill S.

Alternative drivers are usually available only for the less common graphics boards for which no BOINC projects produce suitable workunits.

CUDA is a computer language for Nvidia boards only, and only Nvidia produces drivers that can use it. These drivers can now also use another computer language, OpenCL, which drivers from other GPU companies support as well. Microsoft (the source of Windows) edits these drivers to produce alternate versions with any CUDA and OpenCL support removed, and distributes those versions instead.

If you find any other third party drivers, don't expect them to be useful for any BOINC work.
ID: 2030374
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030385 - Posted: 1 Feb 2020, 22:21:12 UTC

These M$ updates take ages, don't they? Coffee, pizza, wine - we got there in the end.

Downloaded Jacob's test suite, added Keith's WU.

@ Jacob - test wouldn't run because the working directory 'Testdatas' was missing. Corrected that, runs fine.

Windows 10 task manager shows utilisation, but you have to be careful how you interpret it.

[screenshot of the Task Manager GPU performance view, not preserved]

On the face of it, GPU utilisation is zero - but find and look at the 'Cuda' data (displayed in top-left window above). That wobbled at 97-98-99% throughout the tasks.

Result:
Quick timetable 
 
WU : 21jn12ac.5081.67.5.32.189.wu 
setiathome_8.22_windows_intelx86__opencl_nvidia_SoG.exe -verb -nog :
  Elapsed 362.870 secs 
      CPU 156.313 secs 
 
WU : 28oc11aa.6787.6611.5.32.85.wu 
setiathome_8.22_windows_intelx86__opencl_nvidia_SoG.exe -verb -nog :
  Elapsed 393.679 secs 
      CPU 186.344 secs 
I'd say that was normal for a working driver on this card, and it's still described as "driver version 432.00, device version OpenCL 1.2 CUDA".
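
For comparing runs across drivers, the "Quick timetable" text can be turned into numbers. A rough sketch, assuming the bench output always uses the "WU :", "Elapsed ... secs" and "CPU ... secs" layout shown above; the function names and the time limit are hypothetical and not part of the test suite.

```python
import re

# "Elapsed 362.870 secs" / "CPU 156.313 secs" lines from the timetable
TIMING = re.compile(r"(Elapsed|CPU)\s+([\d.]+)\s+secs")


def parse_timetable(text):
    """Return one dict per work unit: {'wu': name, 'elapsed': s, 'cpu': s}."""
    results = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("WU :"):
            current = {"wu": line.split(":", 1)[1].strip()}
            results.append(current)
        elif current is not None:
            match = TIMING.match(line)
            if match:
                current[match.group(1).lower()] = float(match.group(2))
    return results


def overlong(results, limit_secs):
    """Names of work units whose elapsed time exceeds limit_secs."""
    return [r["wu"] for r in results if r.get("elapsed", 0.0) > limit_secs]


sample = """\
Quick timetable

WU : 21jn12ac.5081.67.5.32.189.wu
setiathome_8.22_windows_intelx86__opencl_nvidia_SoG.exe -verb -nog :
  Elapsed 362.870 secs
      CPU 156.313 secs

WU : 28oc11aa.6787.6611.5.32.85.wu
setiathome_8.22_windows_intelx86__opencl_nvidia_SoG.exe -verb -nog :
  Elapsed 393.679 secs
      CPU 186.344 secs
"""

results = parse_timetable(sample)
for r in results:
    print(r["wu"], r["elapsed"], r["cpu"])

# 1800 seconds is an arbitrary illustrative limit, not a project value.
print(overlong(results, limit_secs=1800.0))
```

A run that hangs never produces a timetable at all, so this only helps for comparing runs that finish.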
ID: 2030385
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030388 - Posted: 1 Feb 2020, 22:30:03 UTC
Last modified: 1 Feb 2020, 22:33:35 UTC

Just realised why I can't see the Intel GPU in this version of BOINC - I haven't installed my own patch to make it visible! Updating to v7.16.something...

That's better:
01/02/2020 22:32:09 |  | Starting BOINC client version 7.16.3 for windows_x86_64
01/02/2020 22:32:10 |  | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3376MB available, 2138 GFLOPS peak)
01/02/2020 22:32:10 |  | OpenCL: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 4096MB, 3376MB available, 2138 GFLOPS peak)
01/02/2020 22:32:10 |  | OpenCL: Intel GPU 0: Intel(R) HD Graphics 530 (driver version 26.20.100.7262, device version OpenCL 2.1 NEO, 3231MB, 3231MB available, 202 GFLOPS peak)
ID: 2030388
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2030389 - Posted: 1 Feb 2020, 22:32:07 UTC - in response to Message 2030385.  
Last modified: 1 Feb 2020, 22:33:00 UTC

1) I assure you that Testdatas is there. If you used OneDrive to download it, I just tested that, and it apparently excludes empty folders, on zip creation. What a POS!! I will be filing that Feedback to Microsoft later.

2) I don't recommend trying to use Task Manager to monitor GPU stuffs. Either use GPU-Z to look at the "GPU Load" sensor, or use MSI Afterburner to view GPU Load.
ID: 2030389
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030390 - Posted: 1 Feb 2020, 22:35:54 UTC - in response to Message 2030389.  

Temporary fix - put a 'placeholder.txt' or similar in the folder. Yes, I downloaded it in .zip format.
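
The placeholder idea can also be scripted on the receiving end. A minimal sketch, assuming the bench folder only needs the 'Testdatas' directory to exist; the helper name and the placeholder text are made up for illustration.

```python
import tempfile
from pathlib import Path


def ensure_working_dir(root, name="Testdatas", placeholder="placeholder.txt"):
    """Create root/name if it is missing and keep it non-empty.

    Zip tools that drop empty folders will keep a folder holding one file.
    """
    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)
    if not any(folder.iterdir()):
        (folder / placeholder).write_text("keeps this folder in zip archives\n")
    return folder


# Demonstration in a throwaway directory rather than a real bench folder.
with tempfile.TemporaryDirectory() as root:
    folder = ensure_working_dir(root)
    print(sorted(p.name for p in folder.iterdir()))  # ['placeholder.txt']
```

Calling it again on a folder that already has files in it leaves the folder untouched.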

Windows 10 task manager is OK, if you use it carefully. And it's always on the system!
ID: 2030390
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030392 - Posted: 1 Feb 2020, 22:47:32 UTC

And back to my nice, comfortable Windows 7. Remember that final update I had to do before shut-down? It broke the system - had to run Startup Repair. Be careful of .NET Framework v4.8
ID: 2030392
Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030393 - Posted: 1 Feb 2020, 22:54:11 UTC - in response to Message 2030390.  
Last modified: 1 Feb 2020, 22:54:27 UTC

I'd say that pretty conclusively shows that the MS-supplied 432.00 driver is functionally equivalent to the official Nvidia downloadable 431.60 version. That high AR test WU did in fact hang on testers' machines using a driver past the 431.60 series.

My question and observation is . . . . . did MS respond to the community's concern about the high angle range tasks failing on the later drivers by automatically downgrading hosts to a compatible version? We haven't seen or heard anything official out of Nvidia on the matter, and I know there have been multiple bug reports and issues logged with them.

Did Nvidia quietly behind the scenes suggest to MS to roll out the older driver to hosts running BOINC? I know that MS' telemetry probably advertises a BOINC platform host to their servers.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030393
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14656
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030395 - Posted: 1 Feb 2020, 23:04:42 UTC - in response to Message 2030393.  

I wondered about that, too - it seems like a long time for Microsoft to ignore newer revisions from NVidia.
ID: 2030395
robertmiles
Volunteer tester

Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 2030428 - Posted: 2 Feb 2020, 1:12:16 UTC - in response to Message 2030395.  

I wondered about that, too - it seems like a long time for Microsoft to ignore newer revisions from NVidia.

If Microsoft is still removing the CUDA and OpenCL sections of their versions of the Nvidia drivers they distribute, I would not expect them to care whether there are any problems in the sections they removed.
ID: 2030428
Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030431 - Posted: 2 Feb 2020, 1:26:15 UTC - in response to Message 2030428.  

I wondered about that, too - it seems like a long time for Microsoft to ignore newer revisions from NVidia.

If Microsoft is still removing the CUDA and OpenCL sections of their versions of the Nvidia drivers they distribute, I would not expect them to care whether there are any problems in the sections they removed.

I think their versions of the drivers have been hit or miss regarding whether they have removed the OpenCL parts. Seems like I remember reports that some releases have been fine while others were missing them.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030431
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2030440 - Posted: 2 Feb 2020, 2:26:52 UTC
Last modified: 2 Feb 2020, 2:26:59 UTC

From my experience, the version of the NVIDIA driver that Windows Update installs on your PC depends on a combination of:
- What version of Windows 10 you are using (in System > About)
- What NVIDIA GPU(s) are in your system

I highly doubt Microsoft would roll back any drivers. And, I thought on newer hardware they were issuing 436.30 by default as of this time. But I must retest.

I will do some testing tonight to confirm that, on my 2 systems shown below, on both Win 10 Release and Win 10 Insider Fast.
- System 1: RTX 2080, GTX 980 Ti, GTX 980
- System 2: GTX 970, GTX 1050, GTX 660 Ti

Regarding NVIDIA, I have been in communication with an NVIDIA QA support person who, again, claims our issue is being worked on, and still requires us to be patient.
That is all the info I can give at this time.
I know it sucks.

My workaround continues to be to set Seti@Home to No New Tasks on PCs that I use 436-or-higher drivers on.
If Seti@Home supplied a server side block of some sort for these tasks that run indefinitely on my GPUs, then maybe I might consider unsetting No New Tasks.
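
The No New Tasks workaround can be applied from a script with BOINC's command-line tool. A rough sketch, assuming `boinccmd` is on the PATH and can authenticate to the local client (it may need to be run from the BOINC data directory or given a password option), and assuming the project URL below matches the one your client shows for SETI@home. The helper names are hypothetical.

```python
import subprocess

# Assumption: use the exact project URL listed by your own BOINC client.
PROJECT_URL = "http://setiathome.berkeley.edu/"


def no_new_tasks_command(project_url=PROJECT_URL, allow=False):
    """Build the boinccmd call that sets or clears No New Tasks."""
    operation = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, operation]


def set_no_new_tasks(project_url=PROJECT_URL, allow=False):
    """Run the command against the local client; raises if boinccmd fails."""
    subprocess.run(no_new_tasks_command(project_url, allow), check=True)


# Printing the command keeps this example safe to run without a BOINC client.
print(" ".join(no_new_tasks_command()))
```

Calling `set_no_new_tasks(allow=True)` would undo it once a fixed driver is installed.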
ID: 2030440
Keith Myers Special Project $250 donor
Volunteer tester

Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030443 - Posted: 2 Feb 2020, 2:33:53 UTC
Last modified: 2 Feb 2020, 2:42:31 UTC

Well Richard's test this afternoon on a brand new Win 10 installation yielded him the 432.00 drivers on a GTX 1050 Ti.

So something else is going on as far as which drivers are offered. Probably it depends on the hardware detected.

[Edit] I'm sure that MS is shipping drivers that are compatible with the detected hardware. For example, someone who PM'd me today has a rig with a GTX 1080 Ti, certainly the top card of the Pascal generation, and was running on the 432.00 drivers. That is sufficient for that generation. Anyone running the newest Turing cards like the new Supers would of course need a driver that recognizes those new models, and those have version numbers past the cutoff point for avoiding the VHAR issue.

MS does not appear to be automatically upgrading the drivers to the newest releases just on principle.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030443


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.