OpenCL NV MultiBeam v8 SoG edition for Windows

Message boards : Number crunching : OpenCL NV MultiBeam v8 SoG edition for Windows

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770396 - Posted: 8 Mar 2016, 9:47:42 UTC
Last modified: 8 Mar 2016, 9:49:40 UTC

If you see a slowdown versus r3366, please try playing with these parameters:

-pref_wg_size N
This is a new option. The older default would correspond to -pref_wg_size 128 for ATi and 32 for NV.
The default for ATi is now 64 (for NV it should still be 32, but the defaults may be off, so try -pref_wg_size from 32 to 256 in steps of 64 for ATi and 32 for NV).
It is better to do this offline, because with some configs high WG sizes caused a total OS freeze (yeah, we have had a "truly preemptive multitasking OS" called Windows all these years :/ )

-sbs N
The default is 128; try different values around it, not necessarily in 64MB steps (!).
This value is used when deciding how many WGs there will be. A non-standard size could change that decision and make things faster.
Also, it would be good to use the -v 8 option and note what WG numbers are formed in r3366 and r3401 for similar PulseFind launches.
r3401 should load all available CUs, and load them more fully when memory is limited, but this can have the side effect of different memory access patterns. It is quite possible that the new memory access pattern causes more slowdown than a few idle CUs did in the previous revision.
That memory access pattern can be changed to some extent with these 2 options.
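
For anyone who wants to script such an offline sweep, here is a minimal sketch. The flags -pref_wg_size and -sbs and the ranges (32 to 256, step 64 for ATi and 32 for NV, -sbs around 128) come from the post above. The function names, the particular -sbs offsets, and the idea of producing one command line per run are assumptions made for illustration only.

```python
from itertools import product


def wg_size_candidates(vendor):
    """Work-group sizes to try: 32..256, step 64 for ATi and 32 for NV."""
    step = {"ati": 64, "nv": 32}[vendor.lower()]
    return list(range(32, 256 + 1, step))


def sbs_candidates(default=128, offsets=(-64, -32, 0, 32, 64)):
    """-sbs values around the default of 128 (the offsets are an arbitrary choice)."""
    return [default + d for d in offsets if default + d > 0]


def sweep_cmdlines(vendor, base_flags=""):
    """Return one command-line string per (-pref_wg_size, -sbs) combination."""
    lines = []
    for wg, sbs in product(wg_size_candidates(vendor), sbs_candidates()):
        lines.append(f"{base_flags} -pref_wg_size {wg} -sbs {sbs}".strip())
    return lines


if __name__ == "__main__":
    # Print every combination for an NV card; each line would be used
    # for one offline run, never on a live host.
    for line in sweep_cmdlines("nv"):
        print(line)
```

Each printed line would be placed in the command-line file for one offline run, with the run timed by whatever offline benchmark setup is already in use.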
ID: 1770396 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770397 - Posted: 8 Mar 2016, 9:52:23 UTC

Also, please report the AR of any task where you see a slowdown.
I expect some changes for low and mid ARs between r3366 and r3401 but no changes for high ARs. If you see a slowdown at a high AR value, say so clearly, because it's unexpected.
ID: 1770397 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770402 - Posted: 8 Mar 2016, 10:08:00 UTC

@ all who have "alpha-tester" status on the Lunatics boards and have NV FERMI+ hardware: please read this: http://lunatics.kwsn.info/index.php/topic,1777.msg60748.html#msg60748 and draw your own conclusions.
ID: 1770402 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1770441 - Posted: 8 Mar 2016, 15:04:54 UTC - in response to Message 1770402.  

Speaking of lunatics, is there a way to register over there anymore? I used to have an account and it got lost somewhere in the transition and now it says it no longer accepts new users.

Thanks,

Chris
ID: 1770441 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1849
Credit: 268,616,081
RAC: 1,349
United States
Message 1770467 - Posted: 8 Mar 2016, 23:00:26 UTC - in response to Message 1770441.  

Speaking of lunatics, is there a way to register over there anymore? I used to have an account and it got lost somewhere in the transition and now it says it no longer accepts new users.

Thanks,

Chris

Dunno. I tried for almost a year to get verified, finally gave it up and found another team to join. Seems like no one is minding the store. Went to Arkayn's place instead ...
ID: 1770467 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770534 - Posted: 9 Mar 2016, 10:26:55 UTC - in response to Message 1770467.  

There are issues with site management.
ID: 1770534 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770535 - Posted: 9 Mar 2016, 10:27:52 UTC

All OpenCL Windows MultiBeam builds were updated on the Beta project; please test there to speed up their release to main as the stock app.
ID: 1770535 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1770580 - Posted: 9 Mar 2016, 19:20:25 UTC - in response to Message 1770535.  

FYI

When I used V. 3401 my CPU was running mostly at 100%, and I got screen lags for the first time. I tried different values for -period_iterations_num in mb_cmdline*.txt, but no value stopped the screen lags. When I used V. 3366 the CPU only hit 100% at times, and when it did I had no screen lags. V. 3401 worked my system too hard. On either version, it would be really nice if I could throttle the CPU just a little. Would using Tthrottle work?

My RAC on my GTX 560 Ti running V. 3366 is going up, even with 1 CPU WU running. When my CPU WUs are done (it's like watching water boil), I'll test with GPU only. After that I'd like to run V. 3401, because I have only one card in this system.

All OpenCL Windows MultiBeam builds were updated on the Beta project; please test there to speed up their release to main as the stock app.

Raistmer,
By the time I saw this post, I was running OpenCL Windows MultiBeam on main. I know you said “all” OpenCL on Beta. Do you mean starting at V3401 and up?

Joe

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1770580 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770581 - Posted: 9 Mar 2016, 19:24:29 UTC - in response to Message 1770580.  

Would using Tthrottle work?

Joe


It should.
Regarding beta testing: the more hosts that are attached to beta and run at default settings, the sooner most bugs will be caught and the app released to main for all.
ID: 1770581 · Report as offensive
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8923
Credit: 49,849,242
RAC: 65
Sweden
Message 1770582 - Posted: 9 Mar 2016, 19:32:14 UTC - in response to Message 1770581.  
Last modified: 9 Mar 2016, 19:32:25 UTC

Would using Tthrottle work?

Joe


It should.
Regarding beta testing: the more hosts that are attached to beta and run at default settings, the sooner most bugs will be caught and the app released to main for all.

To make the app take the user-defined settings, is it required to put -no_defaults_scaling into the mb_cmdline_win_x86_SSE3_OpenCL_NV.txt file?

From the ReadMe_AstroPulse_OpenCL_NV.txt file: -no_defaults_scaling : Disables auto-tuning default parameters. Basic params will be used. Implies user-supplied tuning.
ID: 1770582 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770610 - Posted: 9 Mar 2016, 21:41:01 UTC - in response to Message 1770582.  
Last modified: 9 Mar 2016, 21:41:44 UTC

User settings have priority.
This option just disables auto-tuning. Currently it disables tuning to a very high iterations number for low-end cards (for MB) and the fetch and unroll auto-tuning for AP.
If a user setting is detected, it will be used instead.
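
The precedence described here can be pictured as a small lookup. This is only a sketch: the function name, the parameter names, and the numbers in the usage lines are made up for illustration, and it is not the app's actual code.

```python
def effective_value(user_value, auto_tuned_value, basic_value,
                    no_defaults_scaling=False):
    """Pick a parameter value under the precedence described in the post.

    A user-supplied value always wins. Without one, -no_defaults_scaling
    selects the basic default instead of the auto-tuned one.
    """
    if user_value is not None:
        return user_value
    if no_defaults_scaling:
        return basic_value
    return auto_tuned_value


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the three outcomes.
    print(effective_value(192, 256, 128))                             # user value wins
    print(effective_value(None, 256, 128))                            # auto-tuned value
    print(effective_value(None, 256, 128, no_defaults_scaling=True))  # basic value
```

So -no_defaults_scaling only matters for parameters the user has not set explicitly.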
ID: 1770610 · Report as offensive
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8923
Credit: 49,849,242
RAC: 65
Sweden
Message 1770614 - Posted: 9 Mar 2016, 21:57:27 UTC - in response to Message 1770610.  
Last modified: 9 Mar 2016, 21:57:38 UTC

User settings have priority.
This option just disables auto-tuning. Currently it disables tuning to a very high iterations number for low-end cards (for MB) and the fetch and unroll auto-tuning for AP.
If a user setting is detected, it will be used instead.

Thank you Raistmer, I understand.

Now, another question: Would the same tuning settings that I have found to be the best for my setup, with the older app (Build 3366), be appropriate for this app also?

For my GTX 980 that is:

-cpu_lock -sbs 192 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -instances_per_device 4

I will change to the new app, this weekend.
ID: 1770614 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1770630 - Posted: 9 Mar 2016, 22:43:52 UTC - in response to Message 1770614.  
Last modified: 9 Mar 2016, 23:18:01 UTC

The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. It's in beta, also using -v 8 so hopefully you can see a bit more about what's going on.

Update: never did find a combination of those two values that smoothed out the GPU utilization. It bounces all over the place pretty much regardless of what those settings are.

When I was running it on main, 3366 had an APR of around 350 GFLOPS; 3401 dropped that to 280-290. Numbers in beta are a bit less than that, but there's a much smaller run of WUs at the moment.

Chris
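
To put the APR figures above in relative terms, here is a quick calculation. It uses only the numbers quoted in the post (about 350 GFLOPS for 3366, 280-290 for 3401); the helper name is made up for illustration.

```python
def relative_drop(before, after):
    """Drop from `before` to `after`, as a percentage of `before`."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    apr_3366 = 350.0                # approximate APR reported for r3366
    apr_3401_range = (280.0, 290.0) # approximate APR range reported for r3401
    for apr in apr_3401_range:
        print(f"{apr:.0f} GFLOPS: {relative_drop(apr_3366, apr):.1f}% below r3366")
```

That works out to a drop of roughly 17% to 20% on this host.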
ID: 1770630 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770632 - Posted: 9 Mar 2016, 22:56:33 UTC - in response to Message 1770614.  

The best -sbs N value could change, because PulseFind behavior changed.
ID: 1770632 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770636 - Posted: 9 Mar 2016, 23:13:15 UTC - in response to Message 1770630.  

The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. It's in beta, also using -v 8 so hopefully you can see a bit more about what's going on.

Chris


-v 8 makes sense in offline runs. In a full-scale live run it just overflows the return buffer.
ID: 1770636 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1770638 - Posted: 9 Mar 2016, 23:19:23 UTC - in response to Message 1770636.  

The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. It's in beta, also using -v 8 so hopefully you can see a bit more about what's going on.

Chris


-v 8 makes sense in offline runs. In a full-scale live run it just overflows the return buffer.



So I see.=) I'll try to get things downloaded to do some offline testing tonight after the little one is asleep.

Thanks,

Chris
ID: 1770638 · Report as offensive
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8923
Credit: 49,849,242
RAC: 65
Sweden
Message 1770757 - Posted: 10 Mar 2016, 15:27:08 UTC - in response to Message 1770632.  
Last modified: 10 Mar 2016, 15:29:15 UTC

The best -sbs N value could change, because PulseFind behavior changed.

OK, will take that into consideration, when I change app, the coming weekend.
ID: 1770757 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 16 Jun 01
Posts: 6324
Credit: 106,370,077
RAC: 121
Russia
Message 1770816 - Posted: 10 Mar 2016, 20:47:28 UTC - in response to Message 1770757.  

The best -sbs N value could change, because PulseFind behavior changed.

OK, will take that into consideration, when I change app, the coming weekend.

Offline tests show a speedup for both AMD and NV apps with default settings.
ID: 1770816 · Report as offensive
Grumpy Swede (I stand with Ukraine)
Volunteer tester
Joined: 1 Nov 08
Posts: 8923
Credit: 49,849,242
RAC: 65
Sweden
Message 1771000 - Posted: 11 Mar 2016, 16:57:23 UTC
Last modified: 11 Mar 2016, 16:58:32 UTC

OK, I'm now running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe, with the same settings as I've been running MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe for around 13000 WU's.

That is:
-cpu_lock -sbs 192 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -instances_per_device 4

We'll see how it goes....
ID: 1771000 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1771199 - Posted: 12 Mar 2016, 16:37:16 UTC - in response to Message 1771000.  

FYI
Running Version#3401 (SoG). I hope I can fine-tune each video card separately. Added “MultiBeam_NV_config.xml” file in the mix. Won't be able to make any adjustment until Sunday (fishing).

Raistmer,
Do you think that some day <instances_per_device>N</instances_per_device> could be added to the MultiBeam_NV_config.xml file?

Still didn't use Tthrottle. CPU usage is higher than with V. 3366 on my system. I still hit 100% at times, but with a lot less screen lag. I can live with that.

Here's some Wu's.
Device 0
http://setiathome.berkeley.edu/workunit.php?wuid=2089786243
Device 1
http://setiathome.berkeley.edu/workunit.php?wuid=2090191161
Device 2
http://setiathome.berkeley.edu/workunit.php?wuid=2089927129
Device 3
http://setiathome.berkeley.edu/workunit.php?wuid=2090899771
Joe

Here's my “MultiBeam_NV_config.xml” file.
;;; GTX 980
<device0>
<period_iterations_num>40</period_iterations_num>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device0>
;;; GTX 780
<device1>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device1>
;;; GTX 980
<device2>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device2>
;;; GTX 960
<device3>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device3>


Here's my “mb_cmdline_win_x86_SSE3_OpenCL_NV.txt” file.

-instances_per_device 3 -tune 1 64 1 4


Here's my “app_info.xml” file. I'm only showing the SoG portion that changed.

<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_ref>
<file_name>MultiBeam_Kernels_r3401.cl</file_name>
</file_ref>
<file_info>
<name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name>
</file_info>
<file_info>
<name>MultiBeam_NV_config.xml</name>
</file_info>

<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_SoG</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
<file_ref>
<file_name>MultiBeam_NV_config.xml</file_name>
</file_ref>

</app_version>

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1771199 · Report as offensive


 