OpenCL NV MultiBeam v8 SoG edition for Windows

Rasputin42
Volunteer tester

Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1773418 - Posted: 23 Mar 2016, 10:52:55 UTC - in response to Message 1773375.  

Did you experience issues with default settings before such tune attempts?


Yes, I did. The GPU utilization was very spiky and low.

I discovered it was classified as "low performance", which it is not. This set the period iterations num to 500, which made it unbelievably slow.

I got some performance out of it by setting P.I.N. nearly to zero.

I also tried the most recent driver: no change.

No driver restarts.
ID: 1773418 · Report as offensive
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1773450 - Posted: 23 Mar 2016, 13:04:11 UTC - in response to Message 1773418.  
Last modified: 23 Mar 2016, 13:10:09 UTC

Did you experience issues with default settings before such tune attempts?


Yes, I did. The GPU utilization was very spiky and low.

I discovered it was classified as "low performance", which it is not. This set the period iterations num to 500, which made it unbelievably slow.

I got some performance out of it by setting P.I.N. nearly to zero.

I also tried the most recent driver: no change.

No driver restarts.


My GTX 660M, GT 820M, GT 640 and GT 730 are all classified as "low performance" too.
ID: 1773450 · Report as offensive
Rasputin42
Volunteer tester

Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1773456 - Posted: 23 Mar 2016, 13:31:49 UTC

My GTX 660M, GT 820M, GT 640 and GT 730 are all classified as "low performance" too.


So what did you do?
ID: 1773456 · Report as offensive
Marco Franceschini
Volunteer tester
Joined: 4 Jul 01
Posts: 54
Credit: 69,877,354
RAC: 135
Italy
Message 1773478 - Posted: 23 Mar 2016, 15:22:07 UTC

So far I'm in the process of recompiling FFTW 3.3 with the Intel C++ Compiler and researching this "issue" (all my GPUs are under 1 TFLOPS in single precision).
ID: 1773478 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1773480 - Posted: 23 Mar 2016, 15:26:50 UTC - in response to Message 1773450.  

Did you experience issues with default settings before such tune attempts?


Yes, I did. The GPU utilization was very spiky and low.

I discovered it was classified as "low performance", which it is not. This set the period iterations num to 500, which made it unbelievably slow.

I got some performance out of it by setting P.I.N. nearly to zero.

I also tried the most recent driver: no change.

No driver restarts.


My GTX 660M, GT 820M, GT 640 and GT 730 are all classified as "low performance" too.


Because they are.


With each crime and every kindness we birth our future.
ID: 1773480 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1773541 - Posted: 23 Mar 2016, 20:50:10 UTC - in response to Message 1773480.  

FYI and a little help:
I have been using only the “mb_cmdline_win_x86_SSE3_OpenCL_NV.txt” file (V. 3366):
-sbs 192 -instances_per_device 3 -period_iterations_num 20 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16

WU times are fast (for me).

When I added the “MultiBeam_NV_config.xml” file to the mix, my WU times increased by 60+ seconds.
My goal was to use the same parameters for both setups, so there would be minimal speed changes, if any (I think). I also had the same slowdown on version 3401.

So I have 2 questions:
1. Is my “MultiBeam_NV_config.xml” file or setup wrong?
2. Should there be any time difference in using the “MultiBeam_NV_config.xml” file?
Thanks
Joe

WU times are listed per device below. The "cmdline only" rows were run before adding the “MultiBeam_NV_config.xml” file; the "with config" rows were run with the file added.
Note: I could have shown more WUs as samples, but they mostly showed the same time difference ratio for both versions.

Device        Setup         AR      Elapsed (s)  CPU (s)  WU#
GTX 980 (d0)  cmdline only  0.4224  881          424      2101999550
GTX 980 (d0)  with config   0.4224  967          541      2102376628
GTX 980 (d2)  cmdline only  0.4225  878          463      2101726988
GTX 980 (d2)  with config   0.4225  957          447      2102379836
GTX 780 (d3)  cmdline only  0.4222  1067         605      2102034291
GTX 780 (d3)  with config   0.4221  1129         727      2102368697

http://setiathome.berkeley.edu/workunit.php?wuid=2101999550
http://setiathome.berkeley.edu/workunit.php?wuid=2101726988
http://setiathome.berkeley.edu/workunit.php?wuid=2102034291
http://setiathome.berkeley.edu/workunit.php?wuid=2102376628
http://setiathome.berkeley.edu/workunit.php?wuid=2102379836
http://setiathome.berkeley.edu/workunit.php?wuid=2102368697
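
For reference, the size of the slowdown in the elapsed times listed above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the numbers are copied from the timing data in this post, the times are assumed to be in seconds, and the names `samples` and `slowdown` are made up for the example.

```python
# (device, elapsed with cmdline only, elapsed with config file added)
# Values copied from the timing data above; seconds are assumed.
samples = [
    ("GTX 980 (d0)", 881, 967),
    ("GTX 980 (d2)", 878, 957),
    ("GTX 780 (d3)", 1067, 1129),
]


def slowdown(before, after):
    """Return (difference in seconds, relative change in percent)."""
    diff = after - before
    return diff, 100.0 * diff / before


for device, before, after in samples:
    diff, pct = slowdown(before, after)
    print(f"{device}: +{diff} s ({pct:.1f}%)")
```

With these three pairs the differences come out at +86 s, +79 s and +62 s, which matches the "60+ seconds" figure. Three pairs is a small sample, so this says nothing about the cause.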

Note: I ran the file with and without the ";;; GTX xxx" lines, with no speed difference, so I left them in (shown in red).
MultiBeam_NV_config.xml
;;; GTX 980
<device0>
<period_iterations_num>20</period_iterations_num>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>16</localmem_banks>
<localmem_coalesce_width>16</localmem_coalesce_width>
</oclfft_plan>
</device0>
;;; GTX 780
<device1>
<period_iterations_num>20</period_iterations_num>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>16</localmem_banks>
<localmem_coalesce_width>16</localmem_coalesce_width>
</oclfft_plan>
</device1>
;;; GTX 980
<device2>
<period_iterations_num>20</period_iterations_num>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>16</localmem_banks>
<localmem_coalesce_width>16</localmem_coalesce_width>
</oclfft_plan>
</device2>
;;; GTX 960
<device3>
<period_iterations_num>20</period_iterations_num>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>16</localmem_banks>
<localmem_coalesce_width>16</localmem_coalesce_width>
</oclfft_plan>
</device3>


app_info.xml changes in red (only OpenCL_r3366_SoG shown)
<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_ref>
<file_name>MultiBeam_Kernels_r3366.cl</file_name>
</file_ref>
<file_info>
<name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name>
</file_info>
<file_info>
<name>MultiBeam_NV_config.xml</name>
</file_info>

<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_SoG</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
<file_ref>
<file_name>MultiBeam_NV_config.xml</file_name>
</file_ref>

</app_version>

mb_cmdline_win_x86_SSE3_OpenCL_NV.txt
-instances_per_device 3 -tune 1 64 1 4






Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1773541 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1773567 - Posted: 23 Mar 2016, 23:01:31 UTC - in response to Message 1773418.  

Did you experience issues with default settings before such tune attempts?


Yes, I did. The GPU utilization was very spiky and low.

I discovered it was classified as "low performance", which it is not. This set the period iterations num to 500, which made it unbelievably slow.

I got some performance out of it by setting P.I.N. nearly to zero.

I also tried the most recent driver: no change.

No driver restarts.


Don't you see a contradiction between your posts?
"Spiky and low GPU usage" is not a usability issue. Lags and driver restarts are.
The defaults are chosen to allow operation without driver restarts in unattended mode on most cards. If a user wants to optimize, no problem, he can do that (and any lags that then appear are his own choice). So if r3401 is slower than the previous build with defaults, well, that is because it's a stock RC and should work on most cards available.
The previous revision did not work on low-performance GPUs, and that required separate tuning for them.
And regarding peak app performance: do you see slower r3401 operation with the same tuning line as for the previous build? How do you compare builds?
ID: 1773567 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1773569 - Posted: 23 Mar 2016, 23:16:30 UTC - in response to Message 1773541.  
Last modified: 23 Mar 2016, 23:17:47 UTC


When I added the “MultiBeam_NV_config.xml” file to the mix, my WU times increased by 60+ seconds.
My goal was to use the same parameters for both setups, so there would be minimal speed changes, if any (I think). I also had the same slowdown on version 3401.

So I have 2 questions:
1. Is my “MultiBeam_NV_config.xml” file or setup wrong?
2. Should there be any time difference in using the “MultiBeam_NV_config.xml” file?

The "config" file is provided for multi-GPU hosts, to supply separate tuning for different devices. If the tuning provided is the same for all of them, then the "config" file is unnecessary and the cmdline (or any other available method of supplying command-line params) is enough.

Beyond that, no matter which way you supply a param to the app, the app reacts to the particular param value, not to the way the param was passed to it (the config has precedence, though, so it can override common param values).
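
The precedence rule can be pictured with a small Python sketch. This is not the app's code, only an illustration of "common values from the command line, overridden per device by the config file". The function name `effective_params`, the dictionary layout and the override value 50 are all hypothetical; only the option names come from this thread.

```python
def effective_params(cmdline_params, device_overrides, device_index):
    """Return the settings that would apply to one device.

    cmdline_params: common settings, as if parsed from the command line.
    device_overrides: device index -> settings, as if parsed from the
        matching <deviceN> section of the config file.
    """
    merged = dict(cmdline_params)                           # start from the common values
    merged.update(device_overrides.get(device_index, {}))   # config entries win
    return merged


# Common values for every device (hypothetical example data).
common = {"sbs": 192, "period_iterations_num": 20}

# Only device 3 gets its own tuning; 50 is an invented value.
per_device = {3: {"period_iterations_num": 50}}

print(effective_params(common, per_device, 0))
print(effective_params(common, per_device, 3))
```

Device 0 ends up with the common values, while device 3 keeps the common `sbs` but takes its own `period_iterations_num`. If every device section repeats the common values, the merged result is identical to the command line alone, which is why the config file adds nothing in that case.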

Regarding performance changes between builds:
1) defaults are changed.
2) work splitting for PulseFind is changed.

If you see a slowdown because of "low performance GPU detected" and know that the GPU can perform OK with a lower -period_iterations_num N value, then set it lower and the app will obey.

Otherwise, try changing the -sbs N setting and the values of the newly added settings:
-pref_wg_size N (was 128 before, changed to 64 with r3401 for ATi, left at 32 for NV)
-pref_wg_num_per_cu N (default is 4; different algorithm used before so no direct comparison with older default).
ID: 1773569 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1773595 - Posted: 24 Mar 2016, 1:55:13 UTC - in response to Message 1773569.  

Raistmer,
Thanks again for all your help.
Joe

Regarding performance changes between builds:
1) defaults are changed.
2) work splitting for PulseFind is changed.

I'm guilty of this. Not draining cache before build changes. That could explain a lot. Will drain before going back to r3401.

Otherwise, try changing the -sbs N setting and the values of the newly added settings:
-pref_wg_size N (was 128 before, changed to 64 with r3401 for ATi, left at 32 for NV)
-pref_wg_num_per_cu N (default is 4; different algorithm used before so no direct comparison with older default).

Will make some changes, when I'm back on r3401.

The config file will be the last thing to be added, once I get r3401 tuned for best performance. The config file will be tested to see if I can fine-tune my GTX 960 separately from my other video cards. No speed change = no config file.

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1773595 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1774262 - Posted: 26 Mar 2016, 21:57:17 UTC - in response to Message 1773595.  

-pref_wg_num_per_cu N (default is 4; different algorithm used before so no direct comparison with older default).

I was wondering if there is a set range (like 4 to ??)?
So far I tried 4, 6, 10 and 20. None of the values so far seems to make any difference in speed. It's also harder to track any changes for “-pref_wg_num_per_cu N”, because it doesn't say anything in the stderr.txt file. I think after “20”, I'll go back to the default.

So far this is my best set-up for version 3401_SoG. Still trying different params.
Joe

-sbs 192 -instances_per_device 3 -period_iterations_num 20 -pref_wg_size 32 -pref_wg_num_per_cu 4 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16


http://setiathome.berkeley.edu/workunit.php?wuid=2105819825
Stderr output

<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
Running on device number: 0
Maximum single buffer size set to:192MB
Number of app instances per device set to:3
Number of period iterations for PulseFind set to:20
Preferred workgroup size set to 32. <--- Is this right?
SpikeFind FFT size threshold override set to:4096
TUNE: kernel 1 now has workgroup size of (64,1,4)
oclFFT global radix override set to:256
oclFFT local radix override set to:16
oclFFT max WG size override set to:256
oclFFT max local FFT size override set to:512
oclFFT number of local memory banks set to:16
oclFFT minimal memory coalesce width set to:16
Priority of worker thread raised successfully
Priority of process adjusted successfully, below normal priority class used
OpenCL platform detected: Intel(R) Corporation
OpenCL platform detected: NVIDIA Corporation
BOINC assigns device 0
Info: BOINC provided OpenCL device ID used

Build features: SETI8 Non-graphics OpenCL USE_OPENCL_NV OCL_ZERO_COPY SIGNALS_ON_GPU OCL_CHIRP3 FFTW USE_SSE3 x86
CPUID: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz

Cache: L1=64K L2=256K

CPU features: FPU TSC PAE CMPXCHG8B APIC SYSENTER MTRR CMOV/CCMP MMX FXSAVE/FXRSTOR SSE SSE2 HT SSE3 SSSE3 SSE4.1 SSE4.2 AVX
OpenCL-kernels filename : MultiBeam_Kernels_r3401.cl
ar=0.427216 NumCfft=195899 NumGauss=1105829004 NumPulse=226440719403 NumTriplet=452848448667
Currently allocated 293 MB for GPU buffers
In v_BaseLineSmooth: NumDataPoints=1048576, BoxCarLength=8192, NumPointsInChunk=32768

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1774262 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1774267 - Posted: 26 Mar 2016, 22:16:29 UTC

Changing -pref_wg_num_per_cu N just makes it slower.
At least on my GPU.
It might be different on NV cards, not sure atm.
As soon as I have more time I will test it on NV as well.


With each crime and every kindness we birth our future.
ID: 1774267 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1774268 - Posted: 26 Mar 2016, 22:22:17 UTC - in response to Message 1774267.  

Thanks Mike for the info.
Joe

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1774268 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1774269 - Posted: 26 Mar 2016, 22:23:25 UTC - in response to Message 1774262.  

-pref_wg_num_per_cu N was introduced in 3410, so it is not in the current RC.
ID: 1774269 · Report as offensive
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1775367 - Posted: 31 Mar 2016, 17:43:58 UTC

During crunching, sometimes I get a driver reset. BOINC switches to another pair of WUs (I am crunching 2 WUs at a time) which follow their fate, whichever it is.

When crunching comes back to the WU(s) that caused the reset, the driver resets again at roughly the same completion percentage and the cycle starts over. I eventually have to manually abort the "offending" WU(s) before the system efficiency degrades, especially if any other similar resetting WU comes along, increasing the oscillations.

I am experiencing it on both my systems, on a GTX660 and GTX 650Ti.
I am running the 3401 version, which should be the latest one.

I have also tried to use the default settings of the application, but the effects stay the same.
I think, but I have not checked yet, that it is WU related in some form.

But I cannot be more precise at the moment.

Anyone having the same problems or a solution?

Cheers,
Sleepy
ID: 1775367 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1775371 - Posted: 31 Mar 2016, 18:07:32 UTC - in response to Message 1775367.  

Sleepy,

Have you checked to see what the true angle is on those work units? Just curious.
ID: 1775371 · Report as offensive
Sleepy
Volunteer tester
Joined: 21 May 99
Posts: 219
Credit: 98,947,784
RAC: 28,360
Italy
Message 1775374 - Posted: 31 Mar 2016, 18:14:03 UTC - in response to Message 1775371.  

This is what I wanted to do, but I am quite busy ATM and I cannot go so deep.
But yes, this was my idea as well, to be confirmed.

But if you look at my PCs you will find my aborted MB WUs, for as long as they are still listed there.

Sleepy
ID: 1775374 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1778532 - Posted: 12 Apr 2016, 23:23:28 UTC - in response to Message 1775374.  

Raistmer, will your app work with this new GBT data?
ID: 1778532 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1778584 - Posted: 13 Apr 2016, 3:42:18 UTC - in response to Message 1778532.  

Raistmer, will your app work with this new GBT data?


I'm not Raistmer, but yes, it will.


With each crime and every kindness we birth our future.
ID: 1778584 · Report as offensive
Profile Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1778773 - Posted: 13 Apr 2016, 17:52:26 UTC

So I was rewriting a portion of the app_info and afterwards got the following messages in the startup log:

4/13/2016 11:10:28 AM | SETI@home | [error] State file error: duplicate app version: setiathome_v8 windows_x86_64 800
4/13/2016 11:10:28 AM | SETI@home | [error] State file error: duplicate app version: setiathome_v8 windows_intelx86 800



When I went looking, I saw these 2 sections and am thinking they are causing the error report. Should I remove one of them?

<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_x86_64</platform>
<api_version>7.5.0</api_version>
<file_ref>
<file_name>MB8_win_x64_AVX_VS2010_r3330.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x64_AVX_VS2010.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<api_version>7.5.0</api_version>
<file_ref>
<file_name>MB8_win_x64_AVX_VS2010_r3330.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x64.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x64_AVX_VS2010.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
</app_version>
ID: 1778773 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1778789 - Posted: 13 Apr 2016, 19:29:52 UTC - in response to Message 1778773.  

So I was rewriting a portion of the app_info and afterwards got the following messages in the startup log:

4/13/2016 11:10:28 AM | SETI@home | [error] State file error: duplicate app version: setiathome_v8 windows_x86_64 800
4/13/2016 11:10:28 AM | SETI@home | [error] State file error: duplicate app version: setiathome_v8 windows_intelx86 800



When I went looking, I saw these 2 sections and am thinking they are causing the error report. Should I remove one of them?

Yeah, I'd think so. Error message seems pretty clear ...
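
If it helps to find the colliding entries, a short Python script can list them. This is a rough sketch, not a BOINC tool. It assumes the client treats two <app_version> blocks as duplicates when app_name, platform, version_num and plan_class all match (an assumption based on the fields named in the error message plus plan_class), and the function name and sample data are made up for the example. Note that the two blocks quoted above differ in <platform>, while each log line names a single platform, so the copies that collide may sit elsewhere in the file.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_app_versions(app_info_xml):
    """Return (app_name, platform, version_num, plan_class) keys seen more than once."""
    root = ET.fromstring(app_info_xml)
    keys = []
    for av in root.iter("app_version"):
        keys.append((
            (av.findtext("app_name") or "").strip(),
            (av.findtext("platform") or "").strip(),
            (av.findtext("version_num") or "").strip(),
            (av.findtext("plan_class") or "").strip(),  # empty when the tag is absent
        ))
    counts = Counter(keys)
    return [key for key, n in counts.items() if n > 1]


# Hypothetical, shortened app_info.xml: the x86_64 entry appears twice.
sample = """<app_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <platform>windows_x86_64</platform>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <platform>windows_intelx86</platform>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <platform>windows_x86_64</platform>
  </app_version>
</app_info>"""

print(duplicate_app_versions(sample))
# prints [('setiathome_v8', 'windows_x86_64', '800', '')]
```

To check a real file, read it with `open("app_info.xml").read()` and pass the text to the function; an empty list means no two entries share the same key under the assumption above.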
ID: 1778789 · Report as offensive


 