OpenCL NV MultiBeam v8 SoG edition for Windows
Author | Message |
---|---|
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
If you see a slowdown versus r3366, please try playing with these parameters: -pref_wg_size N: a new option. The older default would correspond to -pref_wg_size 128 for ATi and 32 for NV. The default for ATi is now 64; for NV it should still be 32, but the defaults may be off, so try -pref_wg_size values from 32 to 256, in steps of 64 for ATi and 32 for NV. It is better to do this offline, because with some configs high WG sizes caused a total OS freeze (yeah, we have had a "truly preemptive multitasking OS" called Windows all these years :/ ). -sbs N: the default is 128; try different values around it, not necessarily in 64MB steps (!). This value is used when deciding how many WGs there will be, and a non-standard size could change that decision to a faster one. Also, it would be good to use the -v 8 option and note what WG numbers are formed in r3366 and r3401 for similar PulseFind launches. r3401 should load all available CUs, and load them more fully when memory is limited, but this can have the side effect of different memory access patterns. It is quite possible that the new memory access pattern causes more slowdown than a few idle CUs did in the previous revision. That memory access pattern can be changed to some extent with these two options. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
Also, please report the AR of any task where you see a slowdown. I expect some changes for low and mid ARs between r3366 and r3401, but no changes for high ARs. If you see a slowdown at a high AR value, please say so explicitly, because it's unexpected. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
To all who have "alpha-tester" status on the Lunatics boards and NV FERMI+ hardware: please read this: http://lunatics.kwsn.info/index.php/topic,1777.msg60748.html#msg60748 and draw your own conclusions. |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Speaking of lunatics, is there a way to register over there anymore? I used to have an account and it got lost somewhere in the transition and now it says it no longer accepts new users. Thanks, Chris |
Joined: 1 Apr 13 Posts: 1849 Credit: 268,616,081 RAC: 1,349 |
"Speaking of lunatics, is there a way to register over there anymore? I used to have an account and it got lost somewhere in the transition and now it says it no longer accepts new users." Dunno. I tried for almost a year to get verified, finally gave it up and found another team to join. Seems like no one is minding the store. Went to Arkayn's place instead ... |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
There are issues with site management. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
All OpenCL Windows MultiBeam builds were updated on the Beta project; please test there to speed up their release to main as the stock app. |
Joe Januzzi Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
FYI: when I used V. 3401, my CPU was running mostly at 100%, and I got screen lags for the first time. I tried different values for “increase -period_iteration_num” in mb_cmdline*.txt. No number stopped the screen lags. When I used V. 3366, the CPU only hit 100% at times, and when it did I had no screen lags. V. 3401 worked my system too hard. On either version, if I could throttle the CPU just a little, it would be real nice. Would using Tthrottle work? My RAC on my GTX 560 Ti running V. 3366 is going up, even with 1 CPU WU running. When my CPU WUs are done (it's like watching water boil), I'll test with GPU only. After that I'd like to run V. 3401, because I have only one card in this system. "All OpenCL Windows MultiBeam builds were updated on Beta project, please test there to speedup release to main as stock app." Raistmer, by the time I saw this post, I was running OpenCL Windows MultiBeam on main. I know you said “all” OpenCL on Beta. Do you mean starting at V3401 and up? Joe Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
"Would using Tthrottle work?" It should. Regarding beta testing: the more hosts that are attached to beta and run at default settings, the sooner most bugs will be caught and the app released to main for everyone. |
Grumpy Swede (I stand with Ukraine) Joined: 1 Nov 08 Posts: 8923 Credit: 49,849,242 RAC: 65 |
"Would using Tthrottle work?" To make the app take the user-defined settings, is it required to put -no_defaults_scaling into the mb_cmdline_win_x86_SSE3_OpenCL_NV.txt file? From the ReadMe_AstroPulse_OpenCL_NV.txt file: -no_defaults_scaling : Disables auto-tuning default parameters. Basic params will be used. Implies user-supplied tuning. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
User settings have priority. This option just disables auto-tuning. Currently it disables tuning to a very high iterations number for low-end cards (for MB), and fetch and unroll auto-tuning for AP. If a user setting is detected, it will be used instead. |
Grumpy Swede (I stand with Ukraine) Joined: 1 Nov 08 Posts: 8923 Credit: 49,849,242 RAC: 65 |
"User settings have priority." Thank you Raistmer, I understand. Now, another question: would the same tuning settings that I found to be best for my setup with the older app (build 3366) also be appropriate for this app? For my GTX 980, that is: -cpu_lock -sbs 192 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -instances_per_device 4 I will change to the new app this weekend. |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
The new build has quite a bit less utilization (the old version kept the GPU at about 97-98%, vs 3401 bouncing between 71-84%), so I'm trying to adjust the -sbs and work group size as you described above. It's in beta, also using -v 8, so hopefully you can see a bit more about what's going on. Update: I never did find a combination of those two values that smoothed out the GPU utilization. It bounces all over the place pretty much regardless of what those settings are. When I was running it on main, 3366 had an APR of around 350 GFlops; 3401 dropped that to 280-290. Numbers in beta are a bit less than that, but there's a much smaller run of WUs at the moment. Chris |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
The best value for the -sbs N option could change, because PulseFind behavior has changed. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
"The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. Its in beta, also using -v 8 so hopefully you can see a bit more about what's going on." -v 8 only makes sense in offline runs. In a full-scale live run it just overflows the return buffer. |
Chris Adamek Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
"The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. Its in beta, also using -v 8 so hopefully you can see a bit more about what's going on." So I see. =) I'll try to get things downloaded to do some offline testing tonight after the little one is asleep. Thanks, Chris |
Grumpy Swede (I stand with Ukraine) Joined: 1 Nov 08 Posts: 8923 Credit: 49,849,242 RAC: 65 |
"-sbs N option could change cause PulseFind behavior changed." OK, I will take that into consideration when I change apps this coming weekend. |
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121 |
"-sbs N option could change cause PulseFind behavior changed." Offline tests show a speedup for both the AMD and NV apps with default settings. |
Grumpy Swede (I stand with Ukraine) Joined: 1 Nov 08 Posts: 8923 Credit: 49,849,242 RAC: 65 |
OK, I'm now running MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe, with the same settings as I've been running MB8_win_x86_SSE3_OpenCL_NV_r3366_SoG.exe for around 13000 WU's. That is: -cpu_lock -sbs 192 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -instances_per_device 4 We'll see how it goes.... |
Joe Januzzi Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
FYI: running version 3401 (SoG). I hope I can fine-tune each video card separately. Added the “MultiBeam_NV_config.xml” file to the mix. Won't be able to make any adjustments until Sunday (fishing). Raistmer, do you think that some day “<instances_per_device>N</instances_per_device>” could be added to the MultiBeam_NV_config.xml file? Still haven't used Tthrottle. CPU usage is higher than with V. 3366 on my system. I still hit 100% at times, but with a lot fewer screen lags. I can live with that. Here are some WUs. Device 0 http://setiathome.berkeley.edu/workunit.php?wuid=2089786243 Device 1 http://setiathome.berkeley.edu/workunit.php?wuid=2090191161 Device 2 http://setiathome.berkeley.edu/workunit.php?wuid=2089927129 Device 3 http://setiathome.berkeley.edu/workunit.php?wuid=2090899771 Joe Here's my “MultiBeam_NV_config.xml” file. ;;; GTX 980 <device0> <period_iterations_num>40</period_iterations_num> <spike_fft_thresh>4096</spike_fft_thresh> <sbs>192</sbs> <oclfft_plan> <size>256</size> <global_radix>256</global_radix> <local_radix>16</local_radix> <workgroup_size>256</workgroup_size> <max_local_size>512</max_local_size> <localmem_banks>64</localmem_banks> <localmem_coalesce_width>64</localmem_coalesce_width> </oclfft_plan> </device0> ;;; GTX 780 <device1> <spike_fft_thresh>4096</spike_fft_thresh> <sbs>192</sbs> <oclfft_plan> <size>256</size> <global_radix>256</global_radix> <local_radix>16</local_radix> <workgroup_size>256</workgroup_size> <max_local_size>512</max_local_size> <localmem_banks>64</localmem_banks> <localmem_coalesce_width>64</localmem_coalesce_width> </oclfft_plan> </device1> ;;; GTX 980 <device2> <spike_fft_thresh>4096</spike_fft_thresh> <sbs>192</sbs> <oclfft_plan> <size>256</size> <global_radix>256</global_radix> <local_radix>16</local_radix> <workgroup_size>256</workgroup_size> <max_local_size>512</max_local_size> <localmem_banks>64</localmem_banks> <localmem_coalesce_width>64</localmem_coalesce_width> </oclfft_plan> </device2> ;;; GTX 960 <device3> <spike_fft_thresh>4096</spike_fft_thresh> <sbs>192</sbs> <oclfft_plan> <size>256</size> <global_radix>256</global_radix> <local_radix>16</local_radix> <workgroup_size>256</workgroup_size> <max_local_size>512</max_local_size> <localmem_banks>64</localmem_banks> <localmem_coalesce_width>64</localmem_coalesce_width> </oclfft_plan> </device3> Here's my “mb_cmdline_win_x86_SSE3_OpenCL_NV.txt” file. -instances_per_device 3 -tune 1 64 1 4 Here's my “app_info.xml” file. I'm only showing the SoG portion that changed. <app> <name>setiathome_v8</name> </app> <file_info> <name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</name> <executable/> </file_info> <file_info> <name>libfftw3f-3-3-4_x86.dll</name> <executable/> </file_info> <file_ref> <file_name>MultiBeam_Kernels_r3401.cl</file_name> </file_ref> <file_info> <name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name> </file_info> <file_info> <name>MultiBeam_NV_config.xml</name> </file_info> <app_version> <app_name>setiathome_v8</app_name> <version_num>800</version_num> <platform>windows_intelx86</platform> <avg_ncpus>0.04</avg_ncpus> <max_ncpus>0.2</max_ncpus> <plan_class>opencl_nvidia_SoG</plan_class> <cmdline></cmdline> <coproc> <type>CUDA</type> <count>1</count> </coproc> <file_ref> <file_name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</file_name> <main_program/> </file_ref> <file_ref> <file_name>libfftw3f-3-3-4_x86.dll</file_name> </file_ref> <file_ref> <file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name> <open_name>mb_cmdline.txt</open_name> </file_ref> <file_ref> <file_name>MultiBeam_NV_config.xml</file_name> </file_ref> </app_version> Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC Try to learn something new everyday. |
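
The four per-device blocks in the MultiBeam_NV_config.xml posted above are nearly identical (only device0 carries a period_iterations_num entry), so a file like this could be generated instead of edited by hand. The following is a minimal Python sketch of that idea, not anything shipped with the app: the helper names `device_block` and `build_config` are hypothetical, the tag names and values are copied from the post, and it assumes the app accepts each `<deviceN>` element written on a single line, which the thread does not confirm.

```python
import xml.etree.ElementTree as ET

# Settings shared by every card, copied from the posted config.
COMMON = {"spike_fft_thresh": 4096, "sbs": 192}

# The <oclfft_plan> values from the posted config, in the same order.
OCLFFT_PLAN = {
    "size": 256,
    "global_radix": 256,
    "local_radix": 16,
    "workgroup_size": 256,
    "max_local_size": 512,
    "localmem_banks": 64,
    "localmem_coalesce_width": 64,
}


def device_block(index, label, extra=None):
    """Return a ';;; label' line followed by one <deviceN> element as text.

    Hypothetical helper: 'extra' holds per-device settings such as
    period_iterations_num, written before the shared settings.
    """
    device = ET.Element(f"device{index}")
    for tag, value in {**(extra or {}), **COMMON}.items():
        ET.SubElement(device, tag).text = str(value)
    plan = ET.SubElement(device, "oclfft_plan")
    for tag, value in OCLFFT_PLAN.items():
        ET.SubElement(plan, tag).text = str(value)
    return f";;; {label}\n" + ET.tostring(device, encoding="unicode")


def build_config(devices):
    """devices: list of (label, extra_settings_or_None), in device order."""
    return "\n".join(
        device_block(i, label, extra) for i, (label, extra) in enumerate(devices)
    )


if __name__ == "__main__":
    # Card order as listed in the post above.
    cards = [
        ("GTX 980", {"period_iterations_num": 40}),
        ("GTX 780", None),
        ("GTX 980", None),
        ("GTX 960", None),
    ]
    print(build_config(cards))
```

Changing a value such as sbs for one card then only means passing it in that card's extra settings, for example `("GTX 960", {"sbs": 128})`, since later keys are not overridden here and a per-device sbs would need `COMMON` to be merged first; treat the merge order as a design choice to adapt.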
©2022 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.