NVidia 436.xx and later drivers can cause very long compute times especially on Arecibo VHAR work units

Ian&Steve C.
Joined: 28 Sep 99
Posts: 1982
Credit: 901,362,660
RAC: 2,740,871
United States
Message 2016388 - Posted: 23 Oct 2019, 17:42:53 UTC - in response to Message 2016385.  
Last modified: 23 Oct 2019, 17:43:31 UTC

GeForce Experience isn’t the culprit. But you did the right thing avoiding that software anyway. It’s been known to cause all kinds of problems, most unrelated to SETI.

As for why you didn’t see anything until recently, well the problem only happens on one type of WU, Arecibo VHAR. So you probably didn’t get one until recently.
Seti@Home classic workunits: 29,492 CPU time: 134,419 hours

ID: 2016388
billy ewell 1931
Volunteer tester

Joined: 1 Apr 03
Posts: 22
Credit: 22,752,843
RAC: 33,134
United States
Message 2016390 - Posted: 23 Oct 2019, 18:00:52 UTC - in response to Message 2016388.  

Of course and thanks; good information. Bill
ID: 2016390
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 99
Credit: 9,381,912
RAC: 1,405
United States
Message 2016438 - Posted: 24 Oct 2019, 1:10:15 UTC
Last modified: 24 Oct 2019, 1:16:12 UTC

Could anyone having this problem please follow the steps below and report your findings to NVIDIA?

If NVIDIA gets more useful reports, they will fix it.

1) Find my repro steps here:
https://setiathome.berkeley.edu/forum_thread.php?id=84780&postid=2016218

2) Test the 431.60 drivers. Verify that they work for all your GPUs.

3) Test the 440.97 drivers. Verify that they FAIL for some of your GPUs.

4) Report your findings to NVIDIA, mentioning "SETI OpenCL", to the Driver Feedback page located here:
https://forms.gle/kJ9Bqcaicvjb82SdA

Hopefully the steps are straightforward enough to complete, but I admit they were written hastily.

Thank you!
ID: 2016438
Holdolin

Joined: 10 Apr 19
Posts: 68
Credit: 87,431,832
RAC: 714,673
United States
Message 2016482 - Posted: 24 Oct 2019, 14:56:51 UTC

Thanks for the information, all. While all my dedicated crunching rigs run Linux, I do have an old, beat-up Windows machine that I decided to put to work as well, and on it I found the exact problem discussed in this thread. Sure enough, reverting to the 431.60 drivers fixed the issue. Again, thanks to all who contributed to the solution.
ID: 2016482
Dave Dickinson

Joined: 30 Oct 05
Posts: 4
Credit: 545,316
RAC: 317
United Kingdom
Message 2016484 - Posted: 24 Oct 2019, 16:03:44 UTC

Thanks to all who posted above. I have just noticed that some work units in SETI were running really slowly and the GPU usage was 10% instead of the usual 90%. Glad I read this thread, as I was starting to believe that my GPU had gone faulty. Will be rolling back the drivers shortly!
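
For anyone who wants to check for the same symptom from a script, here is a minimal Python sketch. It assumes `nvidia-smi` is on the PATH and relies on its `--query-gpu=utilization.gpu --format=csv,noheader,nounits` query; the function names and the 50% threshold are illustrative choices, not anything defined by SETI@home or NVIDIA.

```python
import subprocess

# Command that prints one utilization percentage per GPU, one per line.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"]


def parse_utilization(csv_text):
    """Turn the query output into a list of integer percentages.

    Lines that are not plain digits (for example "[N/A]") are skipped.
    """
    values = []
    for line in csv_text.splitlines():
        line = line.strip()
        if line.isdigit():
            values.append(int(line))
    return values


def low_utilization_gpus(utilizations, threshold=50):
    """Return the indexes of GPUs whose load is below the threshold.

    The threshold is an arbitrary illustrative value; the post above
    reports about 10% on affected tasks versus about 90% normally.
    """
    return [i for i, u in enumerate(utilizations) if u < threshold]


def query_utilization():
    """Run nvidia-smi once and return the parsed percentages."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_utilization(result.stdout)


if __name__ == "__main__":
    # Hypothetical sample output for a two-GPU machine, not a real reading.
    sample = "10\n90\n"
    print(low_utilization_gpus(parse_utilization(sample)))
```

A single low reading proves little on its own, since load also drops between tasks; calling `query_utilization()` several times while a task is running gives a fairer picture.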
ID: 2016484
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 99
Credit: 9,381,912
RAC: 1,405
United States
Message 2016561 - Posted: 25 Oct 2019, 11:41:46 UTC

I just heard back from NVIDIA, regarding my feedback about breaking SETI OpenCL, and NVIDIA is looking into it.

Hurray!
ID: 2016561
Holdolin

Joined: 10 Apr 19
Posts: 68
Credit: 87,431,832
RAC: 714,673
United States
Message 2016569 - Posted: 25 Oct 2019, 12:19:03 UTC

Well that's a good thing. Thanks for your effort.
ID: 2016569
Lemon Wolf
Volunteer tester
Joined: 19 Jul 09
Posts: 9
Credit: 1,109,151
RAC: 273
Germany
Message 2016675 - Posted: 26 Oct 2019, 4:08:12 UTC

Isn't that what I said a couple of posts ago?
ID: 2016675
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 99
Credit: 9,381,912
RAC: 1,405
United States
Message 2016676 - Posted: 26 Oct 2019, 4:21:08 UTC

Yes. And no. Maybe. :)

Thank you for reporting that you got a response from them too. That is good.

My understanding was that they thought they fixed it... and now know that they didn't and are again looking at it.

Maybe. Don't know for sure. But do know that they are for sure looking at it presently, which is good.

Thank you again for helping with that!
ID: 2016676
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11819
Credit: 183,113,987
RAC: 227,991
Australia
Message 2016677 - Posted: 26 Oct 2019, 4:25:19 UTC - in response to Message 2016675.  
Last modified: 26 Oct 2019, 4:25:55 UTC

Isn't that what I said a couple of posts ago?
Someone here posting about contact from Nvidia directly to them is considerably more reassuring than a post from someone here, that has heard from someone, that has a contact with someone at Nvidia.
Grant
Darwin NT
ID: 2016677
Sebastian Brack

Joined: 22 Aug 99
Posts: 9
Credit: 10,481,820
RAC: 3,792
Germany
Message 2016686 - Posted: 26 Oct 2019, 7:55:39 UTC

The Nvidia Quadro driver 440.xx also does not work correctly.

https://setiathome.berkeley.edu/result.php?resultid=8156230119
ID: 2016686
Lemon Wolf
Volunteer tester
Joined: 19 Jul 09
Posts: 9
Credit: 1,109,151
RAC: 273
Germany
Message 2016970 - Posted: 28 Oct 2019, 12:52:38 UTC - in response to Message 2016677.  

Isn't that what I said a couple of posts ago?
Someone here posting about contact from Nvidia directly to them is considerably more reassuring than a post from someone here, that has heard from someone, that has a contact with someone at Nvidia.

I'll be sure to keep my mouth shut in the future then.
Certainly don't want to spread any hearsay.
ID: 2016970
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 99
Credit: 9,381,912
RAC: 1,405
United States
Message 2016973 - Posted: 28 Oct 2019, 13:03:14 UTC - in response to Message 2016970.  

Come on. Same team. Thank you for helping to work together toward getting this problem fixed. I hope you continue to do that.

Thanks,
Jacob
ID: 2016973
Stephen Thomas Home - Delhi

Joined: 16 Sep 99
Posts: 10
Credit: 487,234,377
RAC: 142,770
India
Message 2017120 - Posted: 29 Oct 2019, 13:43:35 UTC

I just tested NVIDIA GRD 441.08, but there was no change. I switched back to 431.60.
ID: 2017120
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 99
Credit: 9,381,912
RAC: 1,405
United States
Message 2017148 - Posted: 29 Oct 2019, 17:46:58 UTC - in response to Message 2017120.  

I confirm. The 441.08 drivers do not fix the problem. We will have to continue waiting for a fix.
431.60 continues to be the workaround.
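
For sorting machines into "needs a rollback" and "fine as is", the rule of thumb from this thread can be sketched in Python as below. The 436.00 cut-off is an assumption taken from the thread title ("436.xx and later"), not from NVIDIA, and since no fixed driver had been reported at the time of these posts, every later version is treated as suspect. The function names are illustrative only.

```python
# Assumption from the thread title: trouble starts with the 436.xx series.
FIRST_SUSPECT = (436, 0)


def parse_driver_version(text):
    """Split a version such as "440.97" into a (major, minor) integer pair."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def is_suspect(version_text):
    """Return True when the driver is 436.xx or later.

    No upper bound is applied because the thread reports 440.97 and
    441.08 as still affected and does not name a fixed release.
    """
    return parse_driver_version(version_text) >= FIRST_SUSPECT


if __name__ == "__main__":
    # Versions mentioned in this thread.
    for version in ("431.60", "440.97", "441.08"):
        status = "suspect" if is_suspect(version) else "known workaround range"
        print(version, status)
```

Comparing integer pairs rather than strings or floats avoids surprises such as "431.6" and "431.60" being read as the same number.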
ID: 2017148
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4913
Credit: 659,292,804
RAC: 1,430,364
United States
Message 2017286 - Posted: 30 Oct 2019, 23:10:10 UTC - in response to Message 2016087.  

Has anyone with Windows 10 tried the VHARs with the Non-SoG OpenCL App? ...
There you go, from BETA, where you can find the Non-SoG App;
https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=34768072
Operating System : Microsoft Windows 10 Professional x64 Edition, (10.00.18362.00)
Driver version: 440.97
WU true angle range is : 4.344386
 OpenCL Platform Name:					 NVIDIA CUDA
Number of devices:				 1
  Max compute units:				 10
  Max work group size:				 1024
  Max clock frequency:				 1784Mhz
  Max memory allocation:			 1610612736
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 491520
  Global memory size:				 6442450944
  Constant buffer size:				 65536
  Max number of constant args:			 9
  Local memory type:				 Scratchpad
  Local memory size:				 49152
  Queue properties:				 
    Out-of-Order:				 Yes
  Name:						 GeForce GTX 1060 6GB
  Vendor:					 NVIDIA Corporation
  Driver version:				 440.97
  Version:					 OpenCL 1.2 CUDA
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics

Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  4.344386
Used GPU device parameters are:
	Number of compute units: 10
	Single buffer allocation size: 128MB
	Total device global memory: 6144MB
	max WG size: 1024
	local mem type: Real
	FERMI path used: yes
	LotOfMem path: no
	LowPerformanceGPU path: no
	HighPerformanceGPU path: no
period_iterations_num=50
Triplet: peak=7.870685, time=83.23, period=0.3428, d_freq=1419212646.48, chirp=0, fft_len=8 
Triplet: peak=8.022648, time=4.461, period=0.0811, d_freq=1419210815.43, chirp=0, fft_len=16 
Triplet: peak=8.850973, time=100.4, period=0.1901, d_freq=1419204101.56, chirp=0, fft_len=16 
Spike: peak=24.75023, time=46.98, d_freq=1419207139.83, chirp=0.79302, fft_len=128k
Spike: peak=24.16562, time=78.85, d_freq=1419208781.99, chirp=-33.037, fft_len=32k
Triplet: peak=8.684459, time=46.9, period=0.2966, d_freq=1419211939.82, chirp=-41.094, fft_len=32 
Spike: peak=24.23859, time=105.7, d_freq=1419210593.08, chirp=-85.017, fft_len=32k
Spike: peak=24.04153, time=105.7, d_freq=1419210593.09, chirp=-85.076, fft_len=32k

Best spike: peak=24.75023, time=46.98, d_freq=1419207139.83, chirp=0.79302, fft_len=128k
Best autocorr: peak=17.08628, time=33.55, delay=6.5962, d_freq=1419209320.8, chirp=10.026, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+011, d_freq=0,
	score=-12, null_hyp=0, chirp=0, fft_len=0 
Best pulse: peak=2.854268, time=80.09, period=0.1032, d_freq=1419205759.15, score=0.8821, chirp=-82.188, fft_len=32 
Best triplet: peak=8.850973, time=100.4, period=0.1901, d_freq=1419204101.56, chirp=0, fft_len=16 

Spike count:    4
Autocorr count: 0
Pulse count:    0
Triplet count:  4
Gaussian count: 0
Another, https://setiweb.ssl.berkeley.edu/beta/result.php?resultid=34767640
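
To compare results like the one above across drivers without reading each log by eye, a small parser helps. The Python sketch below pulls out the angle range and checks that the listed signals agree with the summary counts. It is written against the line formats visible in this one log; the `Pulse:`, `Gaussian:` and `Autocorr:` line prefixes are assumed by analogy and do not appear in the excerpt, and the function name is illustrative.

```python
import re
from collections import Counter

# Signal lines start with the type name; "Best spike: ..." lines do not match.
SIGNAL_RE = re.compile(r"^(Spike|Triplet|Pulse|Gaussian|Autocorr): ",
                       re.MULTILINE)
# Summary lines such as "Spike count:    4".
COUNT_RE = re.compile(r"^(Spike|Autocorr|Pulse|Triplet|Gaussian) count:\s*(\d+)",
                      re.MULTILINE)
ANGLE_RE = re.compile(r"WU true angle range is :\s*([0-9.]+)")


def summarize(stderr_text):
    """Extract the angle range, listed signals and reported counts."""
    found = Counter(m.group(1) for m in SIGNAL_RE.finditer(stderr_text))
    reported = {m.group(1): int(m.group(2))
                for m in COUNT_RE.finditer(stderr_text)}
    angle = ANGLE_RE.search(stderr_text)
    return {
        "angle_range": float(angle.group(1)) if angle else None,
        "found": dict(found),
        "reported": reported,
        # True when every reported count equals the number of listed lines.
        "consistent": all(found.get(k, 0) == v for k, v in reported.items()),
    }


# Shortened sample in the style of the log above. Only two signal lines are
# kept and the counts are edited to match, so this is not the real result.
SAMPLE = """\
WU true angle range is :  4.344386
Triplet: peak=7.870685, time=83.23, period=0.3428, chirp=0, fft_len=8
Spike: peak=24.75023, time=46.98, chirp=0.79302, fft_len=128k
Best spike: peak=24.75023, time=46.98, chirp=0.79302, fft_len=128k
Spike count:    1
Triplet count:  1
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running the same function over the stderr text of a task done on 431.60 and one done on a newer driver gives two small dictionaries that are easy to compare side by side.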
ID: 2017286
Aaron
Volunteer tester

Joined: 1 May 10
Posts: 5
Credit: 2,610,881
RAC: 17,127
United States
Message 2017588 - Posted: 2 Nov 2019, 20:34:48 UTC

It'll be nice when Nvidia fixes the issue. Until then I'll just keep swapping between drivers for games and crunching.
ID: 2017588
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4913
Credit: 659,292,804
RAC: 1,430,364
United States
Message 2017591 - Posted: 2 Nov 2019, 21:10:38 UTC

At this point it's pretty clear the problem doesn't exist when using the Non-SoG version of the App. So, if you are using the Lunatics version of SoG, it's as simple as changing the App in the app_info.xml over to the version here: http://boinc2.ssl.berkeley.edu/beta/download/setiathome_8.16_windows_intelx86__opencl_nvidia_sah.exe
Download the 8.16 App, place it in your setiathome.berkeley.edu folder, then change your app_info.xml to name the new App instead of the SoG version. Since Lunatics usually creates a 5+ page app_info.xml, it would probably be best to use find & replace ;-) I think both Apps use the same libfftw3f-3-3-4_x86.dll, so you probably don't need to change anything else.
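
The find & replace step can also be scripted. The Python sketch below is one hedged way to do it: it swaps every occurrence of the old executable name for the 8.16 one and keeps a `.bak` copy of the original file. The SoG executable name is not given in this thread, so `old_app` is a value you must look up in your own app_info.xml; the function names are illustrative, and this is plain text substitution rather than anything BOINC or Lunatics provides.

```python
from pathlib import Path

# File name of the Non-SoG App, taken from the download link above.
NEW_APP = "setiathome_8.16_windows_intelx86__opencl_nvidia_sah.exe"


def swap_app_name(xml_text, old_app, new_app=NEW_APP):
    """Return (updated_text, number_of_replacements).

    old_app is the SoG executable name as it appears in your own file;
    it is deliberately not hard-coded because the thread does not state it.
    """
    count = xml_text.count(old_app)
    return xml_text.replace(old_app, new_app), count


def rewrite_app_info(path, old_app, new_app=NEW_APP):
    """Rewrite app_info.xml in place, saving the original as <name>.bak."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    updated, count = swap_app_name(original, old_app, new_app)
    if count:
        # Keep a backup so the change is easy to undo.
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(updated, encoding="utf-8")
    return count
```

A return value of 0 from `rewrite_app_info` means the old name was not found and nothing was written, which usually points to a typo in `old_app`.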

You can see the setiathome_8.16_windows_intelx86__opencl_nvidia_sah.exe App working with Arecibo VHARs at beta, https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=87127&offset=220
ID: 2017591
Jacob Klein
Volunteer tester

Joined: 15 Apr 11
Posts: 99
Credit: 9,381,912
RAC: 1,405
United States
Message 2017596 - Posted: 2 Nov 2019, 21:30:53 UTC

While that may be true...

To my knowledge...
Normal crunchers don't mess with app_info.xml or app_config.xml files.
Normal crunchers will want NVIDIA to fix this as soon as possible.
Normal crunchers will do semi-normal things like reverting drivers, as a workaround for this issue, if they're even aware of the issue and seeking a workaround.

I am hopeful NVIDIA will help out the Normal crunchers soon! :)
ID: 2017596
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 11819
Credit: 183,113,987
RAC: 227,991
Australia
Message 2017597 - Posted: 2 Nov 2019, 21:35:49 UTC - in response to Message 2017591.  

At this point it's pretty clear the problem doesn't exist when using the Non-SoG version of the App.
So I guess the questions are:
1 What's different between the applications?
2 What's different about the recent drivers?
Grant
Darwin NT
ID: 2017597
©2019 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.