NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029363 - Posted: 26 Jan 2020, 13:26:35 UTC - in response to Message 2029362.  
Last modified: 26 Jan 2020, 13:46:55 UTC

8.16 (opencl_nvidia_sah) 9 Jul 2016, 20:20:13 UTC - let me see how that matches up to the code revision log.

Edit - r3551 and r3556 are 30 October 2016 and 05 November respectively. I'm not going to distribute that without re-testing first.

Edit2 - I got cuda42 VLARs. This may take some time...

Edit3 - downloaded it from http://boinc2.ssl.berkeley.edu/beta/download/setiathome_8.16_windows_intelx86__opencl_nvidia_sah.exe. Thank you Eric for being tidy with your file names.

Edit4 - anyone like to guess what OpenCL kernel goes with it? The string in the executable is 'MultiBeam_Kernels_r%d'
ID: 2029363
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029369 - Posted: 26 Jan 2020, 13:48:24 UTC - in response to Message 2029367.  
Last modified: 26 Jan 2020, 13:53:47 UTC

Could you locate and send me the Kernel file, please? That would save time. (or just post the r number - must be 3486 or before)
ID: 2029369
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029371 - Posted: 26 Jan 2020, 13:54:37 UTC - in response to Message 2029370.  
Last modified: 26 Jan 2020, 13:59:54 UTC

Check! Ta.

Strangely, that one was a fix for intel_gpu memory issues. He must have sent up a batch of builds for testing. Now to find some overflows for validation testing - shouldn't be too hard right now ;-)
ID: 2029371
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029376 - Posted: 26 Jan 2020, 14:19:25 UTC

Well, I can still drive the knabench - it's running fine. Interestingly, I have a v8.04 CPU reference app, timed about 20 minutes after it was deployed at Beta. But the baseline Main app is 8.00, two days later. I remember a last-minute flap about CPU accuracy before the full deployment of v8 - I'll check.

Bench only got Q= 99.80% for the reference workunit - that's a bit low.
ID: 2029376
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029382 - Posted: 26 Jan 2020, 14:39:39 UTC - in response to Message 2029380.  

Well, I've gathered two Arecibo and half-a-dozen BLC35 from another machine which still has data to crunch. I want to make it a realistic test on current work - that's what it'll have to cope with if we distribute it. Back to the bench...
ID: 2029382
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029385 - Posted: 26 Jan 2020, 14:58:44 UTC - in response to Message 2029383.  
Last modified: 26 Jan 2020, 15:01:00 UTC

I think this will do:

 10 testWU(s) found
   └─(_WisGenA.wu)
   └─(_WisGenB.wu)
   └─(21ja20ad.23426.17245.6.33.66.wu)
   └─(21ja20ad.23903.6611.9.36.8.wu)
   └─(blc35_2bit_guppi_58691_62810_HIP23311_0035.19050.409.22.45.121.vlar.wu)
   └─(blc35_2bit_guppi_58691_63755_HIP23422_0038.31446.818.21.44.30.vlar.wu)
   └─(blc35_2bit_guppi_58691_64069_HIP23311_0039.12742.0.22.45.81.vlar.wu)
   └─(blc35_2bit_guppi_58691_64069_HIP23311_0039.14146.818.22.45.38.vlar.wu)
   └─(blc35_2bit_guppi_58691_64387_HIP23535_0040.23091.818.22.45.12.vlar.wu)
   └─(blc35_2bit_guppi_58692_01957_HIP80644_0118.23993.409.21.44.246.vlar.wu)

 2 reference science app(s) found
   └─(setiathome_8.00_windows_intelx86.exe -verb -nog)
   └─(setiathome_8.16_windows_intelx86__opencl_nvidia_sah.exe -verb -nog)

 1 science app(s) found
   └─(MB8_win_x86_SSE3_OpenCL_NV_SoG_r3584.exe -verb -nog)
Should give me time to break for lunch, even if they are overflows.
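
For readers unfamiliar with the file names in that listing, here is a minimal Python sketch that sorts them into the three groups mentioned in this thread (WisGen, Arecibo, BLC). The naming rules are only inferred from the ten names shown above, not taken from any project documentation, and `classify_workunit` and `summarise` are hypothetical helpers.

```python
import re
from collections import Counter

# Assumption inferred from the listing: Arecibo names begin with a
# date-like tag such as "21ja20ad." (2 digits, 2 letters, 2 digits, 2 letters).
ARECIBO_PATTERN = re.compile(r"^\d{2}[a-z]{2}\d{2}[a-z]{2}\.")


def classify_workunit(filename: str) -> str:
    """Guess the work unit group from its file name alone."""
    if filename.startswith("_WisGen"):
        return "WisGen"
    if filename.startswith("blc"):
        return "BLC"
    if ARECIBO_PATTERN.match(filename):
        return "Arecibo"
    return "unknown"


def summarise(filenames):
    """Count how many work units fall into each group."""
    return Counter(classify_workunit(name) for name in filenames)
```

Applied to the ten names in the listing, this would report two WisGen, two Arecibo and six BLC work units, which matches the "two Arecibo and half-a-dozen BLC35" described earlier.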

Good thing I put in both reference apps - I'm getting errors between 8.00 and 8.04 - and I'm still on the WisGens!

Edit - that wasn't what I meant to do, was it? OK, it's giving useful info - I'll let it run. Evidently need more coffee...
ID: 2029385
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029407 - Posted: 26 Jan 2020, 17:27:24 UTC

Well, several re-runs and re-configurations of the bench test suite later, I think I've found a setting which gives us the answer we were looking for.

That's not as bad as it sounds. The configuration problems were things like not forcing the test to be done on the NVidia card - and with two GPUs in the system, I wasn't confident about the auto-detection. Anyway, I've now got a run with those 8 randomly-selected VHARs, and

 1 reference science app(s) found 
     (setiathome_8.00_windows_intelx86.exe -verb -nog) 
 2 science app(s) found 
     (MB8_win_x86_SSE3_OpenCL_NV_SoG_r3584.exe -verb -nog) 
     (setiathome_8.16_windows_intelx86__opencl_nvidia_sah.exe -verb -nog) 
definitively run on the GTX 1050 Ti with driver version 388.43

And it's come up smelling of roses, Q 100.00 or 99.99 throughout, and the right signal counts.

Bother.

It's probably worth putting it in the installer after all, and that means I'll have to tear apart the one I had ready to go, and make space for an extra entry on the selection screen. cuda32 will probably have to go. Tomorrow, I think.
ID: 2029407
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2029411 - Posted: 26 Jan 2020, 17:51:51 UTC

Always happy to help Richard. Ha ha. I don't think anyone will miss CUDA32. Always crashed and burned on the current popular mix of hardware and drivers anyway.
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2029411
EdwardPF
Volunteer tester
Joined: 26 Jul 99
Posts: 389
Credit: 236,772,605
RAC: 374
United States
Message 2029435 - Posted: 26 Jan 2020, 20:46:41 UTC

If you need a simple-minded tester ... that's me ... I'm in!

Ed F
ID: 2029435
Wiggo
Joined: 24 Jan 00
Posts: 34928
Credit: 261,360,520
RAC: 489
Australia
Message 2029438 - Posted: 26 Jan 2020, 20:58:20 UTC - in response to Message 2029411.  
Last modified: 26 Jan 2020, 21:03:43 UTC

Always happy to help Richard. Ha ha. I don't think anyone will miss CUDA32. Always crashed and burned on the current popular mix of hardware and drivers anyway.
But there are still people out there using 8xxx, 9xxx and other pre-Fermi GPUs that that app suits. ;-)

Cheers.
ID: 2029438
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029529 - Posted: 27 Jan 2020, 17:17:37 UTC

OK guys, let's make this an open Beta test.

Lunatics Installer v0.46 64-bit is available for testing.

For NVidia: includes the 8.16 (opencl_nvidia_sah) application from Beta. This is intended as a temporary workaround for drivers above 431.60 on Windows 10. If Microsoft has updated your drivers, or you have updated them yourself for gaming, use the default option 'MB8_win_x86_SSE3_OpenCL_NV_sah_r3486' (the original name for the same file). If you have older drivers, or if you are running a different version of Windows, go ahead and choose the 'SoG' app - with a minor update to r3584.

For AMD/ATI: updated all apps to match the v8.24 stock release, with safety patch for RX 5700-series 'NAVI' GPUs. You must also upgrade to the Adrenalin 2020 Edition 20.1.2 Optional driver or later.

The following link is for a Google Drive folder with two files: the main Installer, and a small configuration file. If you put both files in the same folder, and run the installer, it will run in 'test mode': it will extract the files you request and go through the installation process, but place the results into a separate test folder, leaving your main BOINC installation untouched.

If you want to perform the actual installation for real, simply delete the configuration file.

The installation process (which is unchanged) has proved itself over the years, so you shouldn't have any problems. The installer will locate your BOINC installation, stop it, install the new files, and restart it. The restart process sometimes fails: if that happens, just wait a few seconds after the installer has closed, and restart it manually. Or you may prefer to stop and then restart BOINC manually - either will do.

The installer is designed to preserve all SETI tasks in your cache, and run them with the new applications. This is the potentially difficult part, and I'd like to hear if there are any problems with disappearing caches. Because it's a possibility, and we're likely to go into a long maintenance outage within 24 hours, you may prefer to defer testing until SETI is back up and running.

Download from Lunatics v0.46 installer test, and enjoy.
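
The NVidia guidance in this announcement can be sketched as a small Python decision helper. This is only an illustration that follows the wording of the post literally (drivers above 431.60 on Windows 10 get the sah app, everything else gets SoG); it is not part of the installer, and `parse_driver_version` and `recommend_nvidia_app` are hypothetical names.

```python
from typing import Tuple

# App names as they appear in this thread.
SAH_APP = "MB8_win_x86_SSE3_OpenCL_NV_sah_r3486"
SOG_APP = "MB8_win_x86_SSE3_OpenCL_NV_SoG_r3584"

# The announcement targets drivers above this version.
WORKAROUND_ABOVE = (431, 60)


def parse_driver_version(text: str) -> Tuple[int, int]:
    """Turn a string such as '436.02' into (436, 2) so versions compare numerically."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def recommend_nvidia_app(driver: str, on_windows_10: bool) -> str:
    """Hypothetical helper: pick the app the announcement suggests."""
    if on_windows_10 and parse_driver_version(driver) > WORKAROUND_ABOVE:
        return SAH_APP
    return SOG_APP
```

Comparing the version as a pair of integers rather than as a string or a float avoids surprises with values such as 436.02.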
ID: 2029529
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029651 - Posted: 28 Jan 2020, 11:45:35 UTC

Minor refresh made to the installer, to provide a better return path from OpenCL_NV_sah to CUDA. If that's your route, please re-download the installer before using it to revert to CUDA.
ID: 2029651
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029932 - Posted: 30 Jan 2020, 18:52:07 UTC

Well over 48 hours since I posted the refreshed installer. I've had precisely one PM saying it's looking OK, and no more reported quibbles. Are we good to go? I'd like to see some actual tester feedback, please - good or bad.
ID: 2029932
robertmiles
Volunteer tester
Joined: 16 Jan 12
Posts: 213
Credit: 4,117,756
RAC: 6
United States
Message 2029947 - Posted: 30 Jan 2020, 20:08:45 UTC - in response to Message 2029932.  

Well over 48 hours since I posted the refreshed installer. I've had precisely one PM saying it's looking OK, and no more reported quibbles. Are we good to go? I'd like to see some actual tester feedback, please - good or bad.

Looks OK for me. I couldn't find the method you expected to report this, though.
ID: 2029947
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2029950 - Posted: 30 Jan 2020, 20:23:34 UTC - in response to Message 2029947.  

Well, since I announced the availability of the test build in this thread (and the similar parallel ATI thread), I'd sort of assumed that people would report back here. I'd prefer the test reports to be publicly visible for peer review, rather than hidden in PMs, but I'll try to read it wherever you post. But be aware I can't see inside private team discussion groups.

But anyway - thanks for the positive vote. I won't do a public release this late in the day, UK time, but I'm minded to do it tomorrow morning - say any time after 12 hours from now.
ID: 2029950
VelocityRC
Joined: 27 Sep 19
Posts: 23
Credit: 1,421,582
RAC: 86
United States
Message 2030332 - Posted: 1 Feb 2020, 16:38:02 UTC - in response to Message 2029950.  

Well, since I announced the availability of the test build in this thread (and the similar parallel ATI thread), I'd sort of assumed that people would report back here. I'd prefer the test reports to be publicly visible for peer review, rather than hidden in PMs, but I'll try to read it wherever you post. But be aware I can't see inside private team discussion groups.

But anyway - thanks for the positive vote. I won't do a public release this late in the day, UK time, but I'm minded to do it tomorrow morning - say any time after 12 hours from now.


A lot of this GPU language is over my head and I'm not sure I have the time to learn all that is needed to understand some of this. I have heard of CUDA but my knowledge is limited. If I recall, it is an alternate driver for nVidia. Is that correct? Us greenhorns would like a driver download link, preferably an nVidia one, but that seems to be an issue ATM. I don't mind using a third-party GPU driver with a link posted here.

JMHO

Bill S.
ID: 2030332
Mr. Kevvy Crowdfunding Project Donor, Special Project $250 donor
Volunteer moderator
Volunteer tester
Joined: 15 May 99
Posts: 3776
Credit: 1,114,826,392
RAC: 3,319
Canada
Message 2030333 - Posted: 1 Feb 2020, 16:44:06 UTC - in response to Message 2030332.  
Last modified: 1 Feb 2020, 16:44:37 UTC

A lot of this GPU language is over my head and I'm not sure I have the time to learn all that is needed to understand some of this. I have heard of CUDA but my knowledge is limited.


Graphics, especially 3D, is all math/computation. CUDA is simply a framework to allow your GPU to use that capability to do actual computation like a CPU does.
ID: 2030333
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 2030335 - Posted: 1 Feb 2020, 16:52:40 UTC - in response to Message 2030332.  
Last modified: 1 Feb 2020, 17:02:08 UTC

@VelocityRC: You currently are showing 'NVIDIA GeForce GTX 1050 (2048MB) driver: 432.00'. That's earlier than the number in the thread title. You're good to continue as you are - no change needed.

Edit - while I was posting, Keith Myers suggested that one might fall into the gap between 'known good' and 'known bad'. If you have any problems with SETI tasks(*), you might be better following the link for general downloaders.

(*) apart from not getting any...

@everyone else: https://www.nvidia.com/Download/Find.aspx?lang=en-us is probably the place to look. Fill in everything, and you should get something like this:


Don't use the ones I've crossed out: use the green one at the bottom.
ID: 2030335
Keith Myers Special Project $250 donor
Volunteer tester
Joined: 29 Apr 01
Posts: 13164
Credit: 1,160,866,277
RAC: 1,873
United States
Message 2030340 - Posted: 1 Feb 2020, 17:34:54 UTC

Is there a way to check the provenance of a Windows Nvidia driver?

I think all these Windows hosts with driver version 432.00 are using a Microsoft-delivered driver about which we know nothing. The driver could be good or bad, we just don't know. Somebody needs to run the 432.00 driver against some known VHAR Arecibo work and see whether the tasks stall out or complete normally in the benchmark tools.
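
One rough way to tell "stalled" from "completed" in such a test is to run the app under a time limit. The Python sketch below uses the standard `subprocess` module; `run_with_time_limit` is a hypothetical helper, and the time limit is something the tester would have to choose, since nothing in this thread fixes a number for "very long".

```python
import subprocess


def run_with_time_limit(command, limit_seconds):
    """Run a command and report whether it finished within the time limit.

    Returns "completed", "failed" (non-zero exit code) or "stalled"
    (still running when the limit expired, at which point it is killed).
    """
    try:
        finished = subprocess.run(
            command, timeout=limit_seconds, capture_output=True
        )
    except subprocess.TimeoutExpired:
        return "stalled"
    return "completed" if finished.returncode == 0 else "failed"
```

The `command` list would be the science app plus whatever arguments the bench passes to it (the listings earlier in the thread show `-verb -nog`), run from a folder that already holds the test work unit.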
Seti@Home classic workunits:20,676 CPU time:74,226 hours

A proud member of the OFA (Old Farts Association)
ID: 2030340
Jacob Klein
Volunteer tester
Joined: 15 Apr 11
Posts: 149
Credit: 9,783,406
RAC: 9
United States
Message 2030342 - Posted: 1 Feb 2020, 17:52:29 UTC
Last modified: 1 Feb 2020, 17:55:13 UTC

Keith:

I'm willing to do what you want, if you explain it more clearly.

1) Are you proposing that I DDU (to get rid of all current drivers), then see what Windows gives me, and if I get 432.00, test it?
2) Also, do you know which GPU is likely to get 432.00 from Windows Update? I think my newer GPUs get offered a newer driver from Windows Update, though not 100% sure.

Note: I have previously tested 431.60 (public release from NVIDIA), as well as 431.68 (Hotfix release from NVIDIA).
Both were found to be good.

I suspect 432.00 is good as well: I think it belongs to the "Release 430" driver branch, and the problems started with 436.02 from the "Release 435" branch.
The Release branches are found in the Release Notes of the drivers.
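
The state of knowledge described in this post can be written down as a small Python sketch. It assumes that everything up to the newest tested-good driver (431.68) is fine and that everything from 436.02 onward is affected, as the thread title suggests; anything in between, such as 432.00, is reported as untested. `classify_driver` is a hypothetical helper, not an existing tool.

```python
from typing import Tuple

NEWEST_TESTED_GOOD = (431, 68)  # 431.60 and 431.68 were tested and found good
FIRST_KNOWN_BAD = (436, 2)      # problems started with 436.02


def parse_driver_version(text: str) -> Tuple[int, int]:
    """Turn a string such as '431.68' into (431, 68) for numeric comparison."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def classify_driver(text: str) -> str:
    """Place a driver version relative to the tested-good and known-bad versions."""
    version = parse_driver_version(text)
    if version >= FIRST_KNOWN_BAD:
        return "known bad"
    if version <= NEWEST_TESTED_GOOD:
        # Assumption: drivers older than the tested ones behave like them.
        return "known good"
    return "untested"
```

On a machine with the NVIDIA tools installed, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` should print a version string that could be passed to `classify_driver`.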

Let me know,
Jacob
ID: 2030342
 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.