OpenCL NV MultiBeam v8 SoG edition for Windows
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Also, please report the AR of any task where you see a slowdown. I expect some changes for low and mid ARs between r3366 and r3401, but no changes for high ARs. If you see a slowdown at a high AR value, make that clear, because it's unexpected.
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
@ all who have "alpha-tester" status on the Lunatics boards and have NV Fermi+ hardware, please read this: http://lunatics.kwsn.info/index.php/topic,1777.msg60748.html#msg60748 and draw your own conclusions.
Chris Adamek Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Speaking of Lunatics, is there a way to register over there anymore? I used to have an account, but it got lost somewhere in the transition, and now the site says it no longer accepts new users. Thanks, Chris
Jimbocous Send message Joined: 1 Apr 13 Posts: 1856 Credit: 268,616,081 RAC: 1,349 |
Speaking of lunatics, is there a way to register over there anymore? I used to have an account and it got lost somewhere in the transition and now it says it no longer accepts new users.
Dunno. I tried for almost a year to get verified, finally gave it up and found another team to join. Seems like no one is minding the store. Went to Arkayn's place instead ...
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
There are issues with site management. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
All OpenCL Windows MultiBeam builds were updated on the Beta project; please test there to speed up the release to main as the stock app.
Joe Januzzi Send message Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
FYI: When I used V. 3401 my CPU was running mostly at 100%, and the screen lagged for the first time. I tried different values for "increase -period_iterations_num" in mb_cmdline*.txt; no number stopped the screen lag. When I used V. 3366 the CPU only hit 100% at times, and when it did I had no screen lag. V. 3401 works my system too hard. On either version, if I could throttle the CPU just a little it would be real nice. Would using Tthrottle work?

My RAC on my GTX 560 Ti running V. 3366 is going up, even with 1 CPU WU running. When my CPU WUs are done (like watching water boil), I'll test with GPU only. After that I'd like to run V. 3401, because I have only one card in this system.

All OpenCL Windows MultiBeam builds were updated on Beta project, please test there to speedup release to main as stock app.
Raistmer, by the time I saw this post I was running OpenCL Windows MultiBeam on main. I know you said "all" OpenCL on Beta. Do you mean starting at V. 3401 and up?

Joe

Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Would using Tthrottle work?
It should. Regarding beta testing: the more hosts are attached to beta and running at default settings, the sooner most bugs will be caught and the app released to main for all.
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
User settings have priority; this option just disables auto-tuning. Currently it disables the tuning that raises the iterations number very high for low-end cards (for MB), and the fetch and unroll auto-tuning for AP. If a user setting is detected, it is used instead.
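The priority chain described here (an explicit user setting always wins; otherwise the auto-tuned default applies) can be sketched as below. The function name is hypothetical, and the 500/50 defaults are taken from Raistmer's later post in this thread; this is an illustration, not the app's actual code:

```python
# Illustrative sketch of "user settings have priority" over auto-tuning.
# Function name and structure are hypothetical, not the real app source.
# Defaults (500 low-end, 50 mid/high) are quoted from later in the thread.

def pick_period_iterations(user_value, gpu_is_low_end, auto_tune_enabled):
    """Return the PulseFind iteration count the app would use."""
    if user_value is not None:
        return user_value          # explicit user setting always wins
    if not auto_tune_enabled:
        return 50                  # basic default when tuning is disabled
    # auto-tuning: raise iterations for low-end cards to keep them responsive
    return 500 if gpu_is_low_end else 50

print(pick_period_iterations(20, True, True))     # user override -> 20
print(pick_period_iterations(None, True, True))   # low-end auto-tune -> 500
print(pick_period_iterations(None, False, False)) # tuning disabled -> 50
```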
Chris Adamek Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
The new build has quite a bit less utilization (the old version kept the GPU at about 97-98%, vs 3401 bouncing between 71-84%), so I'm trying to adjust the -sbs and work group size as you described above. It's in beta, also using -v 8, so hopefully you can see a bit more about what's going on.

Update: never did find a combination of those two values that smoothed out the GPU utilization. It bounces all over the place pretty much regardless of what those settings are. When I was running it on main, 3366 had an APR of around 350 GFlops; 3401 dropped that to 280-290. Numbers in beta are a bit less than that, but there's a much smaller run of WUs at the moment.

Chris
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The effect of the -sbs N option could have changed, because PulseFind behavior changed.
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. Its in beta, also using -v 8 so hopefully you can see a bit more about what's going on.
-v 8 only makes sense in offline runs. For a full-scale live run it just overflows the return buffer.
Chris Adamek Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. Its in beta, also using -v 8 so hopefully you can see a bit more about what's going on.
So I see. =) I'll try to get things downloaded to do some offline testing tonight after the little one is asleep. Thanks, Chris
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
-sbs N option could change cause PulseFind behavior changed.
Offline tests show a speedup for both the AMD and NV apps with default settings.
Joe Januzzi Send message Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
FYI: Running version 3401 (SoG). I hope I can fine-tune each video card separately, so I added a "MultiBeam_NV_config.xml" file to the mix. Won't be able to make any adjustments until Sunday (fishing).

Raistmer, do you think that some day <instances_per_device>N</instances_per_device> could be added to the MultiBeam_NV_config.xml file?

Still haven't used Tthrottle. CPU usage is higher than with V. 3366 on my system. I still hit 100% at times, but with a lot less screen lag. I can live with that.

Here are some WUs:
Device 0: http://setiathome.berkeley.edu/workunit.php?wuid=2089786243
Device 1: http://setiathome.berkeley.edu/workunit.php?wuid=2090191161
Device 2: http://setiathome.berkeley.edu/workunit.php?wuid=2089927129
Device 3: http://setiathome.berkeley.edu/workunit.php?wuid=2090899771

Joe

Here's my "MultiBeam_NV_config.xml" file:

;;; GTX 980
<device0>
  <period_iterations_num>40</period_iterations_num>
  <spike_fft_thresh>4096</spike_fft_thresh>
  <sbs>192</sbs>
  <oclfft_plan>
    <size>256</size>
    <global_radix>256</global_radix>
    <local_radix>16</local_radix>
    <workgroup_size>256</workgroup_size>
    <max_local_size>512</max_local_size>
    <localmem_banks>64</localmem_banks>
    <localmem_coalesce_width>64</localmem_coalesce_width>
  </oclfft_plan>
</device0>

;;; GTX 780
<device1>
  <spike_fft_thresh>4096</spike_fft_thresh>
  <sbs>192</sbs>
  <oclfft_plan>
    <size>256</size>
    <global_radix>256</global_radix>
    <local_radix>16</local_radix>
    <workgroup_size>256</workgroup_size>
    <max_local_size>512</max_local_size>
    <localmem_banks>64</localmem_banks>
    <localmem_coalesce_width>64</localmem_coalesce_width>
  </oclfft_plan>
</device1>

;;; GTX 980
<device2>
  <spike_fft_thresh>4096</spike_fft_thresh>
  <sbs>192</sbs>
  <oclfft_plan>
    <size>256</size>
    <global_radix>256</global_radix>
    <local_radix>16</local_radix>
    <workgroup_size>256</workgroup_size>
    <max_local_size>512</max_local_size>
    <localmem_banks>64</localmem_banks>
    <localmem_coalesce_width>64</localmem_coalesce_width>
  </oclfft_plan>
</device2>

;;; GTX 960
<device3>
  <spike_fft_thresh>4096</spike_fft_thresh>
  <sbs>192</sbs>
  <oclfft_plan>
    <size>256</size>
    <global_radix>256</global_radix>
    <local_radix>16</local_radix>
    <workgroup_size>256</workgroup_size>
    <max_local_size>512</max_local_size>
    <localmem_banks>64</localmem_banks>
    <localmem_coalesce_width>64</localmem_coalesce_width>
  </oclfft_plan>
</device3>

Here's my "mb_cmdline_win_x86_SSE3_OpenCL_NV.txt" file:

-instances_per_device 3 -tune 1 64 1 4

Here's my "app_info.xml" file (I'm only showing the SoG portion that changed):

<app>
  <name>setiathome_v8</name>
</app>
<file_info>
  <name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</name>
  <executable/>
</file_info>
<file_info>
  <name>libfftw3f-3-3-4_x86.dll</name>
  <executable/>
</file_info>
<file_ref>
  <file_name>MultiBeam_Kernels_r3401.cl</file_name>
</file_ref>
<file_info>
  <name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name>
</file_info>
<file_info>
  <name>MultiBeam_NV_config.xml</name>
</file_info>
<app_version>
  <app_name>setiathome_v8</app_name>
  <version_num>800</version_num>
  <platform>windows_intelx86</platform>
  <avg_ncpus>0.04</avg_ncpus>
  <max_ncpus>0.2</max_ncpus>
  <plan_class>opencl_nvidia_SoG</plan_class>
  <cmdline></cmdline>
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
  <file_ref>
    <file_name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</file_name>
    <main_program/>
  </file_ref>
  <file_ref>
    <file_name>libfftw3f-3-3-4_x86.dll</file_name>
  </file_ref>
  <file_ref>
    <file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name>
    <open_name>mb_cmdline.txt</open_name>
  </file_ref>
  <file_ref>
    <file_name>MultiBeam_NV_config.xml</file_name>
  </file_ref>
</app_version>

Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Hardly. And there's definitely no sense in doing that until BOINC is able to run a different number of tasks on different GPUs of the same vendor. Can it?
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Indeed, the performance variation from AR changes is bigger than the possible performance change from switching between 3 and 4 tasks per GPU. So really good statistics, or some offline tests in a controlled environment, are required for that.
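Because runtime varies strongly with AR, a fair 3-vs-4-instances comparison has to match tasks by AR before averaging. A minimal sketch of that bookkeeping, with made-up runtimes (nothing here is measured data, and the bucket width is an arbitrary choice):

```python
# Hypothetical sketch: compare two instance-count settings fairly by
# bucketing task runtimes by angle range (AR) before averaging.
# All runtimes below are invented for illustration.
from collections import defaultdict

def mean_by_ar_bucket(results, bucket=0.1):
    """results: list of (ar, elapsed_seconds); returns {bucket: mean seconds}."""
    groups = defaultdict(list)
    for ar, secs in results:
        groups[round(ar / bucket) * bucket].append(secs)
    return {b: sum(v) / len(v) for b, v in groups.items()}

three_up = [(0.42, 900), (0.44, 880), (1.1, 300)]   # fake data, 3 per GPU
four_up  = [(0.41, 950), (0.45, 940), (1.1, 310)]   # fake data, 4 per GPU

a, b = mean_by_ar_bucket(three_up), mean_by_ar_bucket(four_up)
for bkt in sorted(set(a) & set(b)):
    print(f"AR~{bkt:.1f}: 3-up {a[bkt]:.0f}s vs 4-up {b[bkt]:.0f}s")
```

Comparing raw means without the AR grouping would mostly measure which batch happened to draw more low-AR (slow) tasks.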
Joe Januzzi Send message Joined: 13 Apr 03 Posts: 54 Credit: 307,134,110 RAC: 492 |
Dropping -period_iterations_num to 20, from default 50, increased the speed considerably.
Dropping it to 20 helped my speed too! Thanks Tutankhamon for the info.

I had the -v 8 switch running without knowing it for about a week. The -v 8 switch was at the tail end of the commands, which I didn't see because my screen was too small :-( So now I'm backtracking a little bit, hopefully with better data this time.

Here's my "mb_cmdline_win_x86_SSE3_OpenCL_NV.txt" file:

-sbs 192 -instances_per_device 3 -period_iterations_num 20 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16

Real Join Date: Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
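For intuition about why a lower -period_iterations_num speeds things up but can worsen screen lag: as described in this thread, the option splits PulseFind into N kernel launches, so fewer iterations mean longer individual kernels that hold the GPU (less launch overhead, worse responsiveness). A toy model with purely hypothetical timing numbers:

```python
# Toy model only: splitting one long GPU job into N kernel launches.
# Fewer launches -> longer per-kernel time (screen lag), less total overhead.
# The 1000 ms workload and 0.05 ms launch overhead are invented numbers.

def kernel_times(total_work_ms, iterations, launch_overhead_ms=0.05):
    per_kernel = total_work_ms / iterations
    total = total_work_ms + iterations * launch_overhead_ms
    return per_kernel, total

for n in (5, 20, 50):
    per, tot = kernel_times(1000.0, n)
    print(f"N={n}: {per:.1f} ms per kernel, {tot:.2f} ms total")
```

This is why 20 can be faster than the default 50 on a strong card, while a display attached to the same GPU stutters during the longer kernels.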
Rasputin42 Send message Joined: 25 Jul 08 Posts: 412 Credit: 5,834,661 RAC: 0 |
The r3401 version is not working well for me. I guess it is no good for cards with few compute units (2 in my case). One of the test WUs (from Lunatics) does not even run at all (no error, but no CPU or GPU usage). I tried all sorts of tweaking and different drivers, but performance is bad. The r3366 version works fine.
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
What exactly did you try? r3401 is currently an RC build, so any usability degradation not solved on beta will remain after release. Did you check the system log for driver restart events?

EDIT: on beta I see this completed result:

Defaults scaling is disabled, basic defaults will be used.
Tuning on user's discretion.
Number of period iterations for PulseFind set to: 5

Such tuning is definitely not correct for a low-performance card with a small number of CUs; you purposely worsened the app's usability with it. For a low-performance GPU the default value is 500; for mid-range and high-end GPUs the default is 50. So a value of 5 can be a complete no-go for your device. Did you experience issues with the default settings before these tuning attempts?
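The defaults scaling quoted here (500 PulseFind iterations for low-performance GPUs, 50 for mid-range and high-end) could be sketched as below. The compute-unit cutoff is purely an assumed illustration, not the app's real heuristic:

```python
# Sketch of the default scaling described in the post: low-performance
# GPUs default to 500 PulseFind iterations, mid/high-range GPUs to 50.
# The compute-unit threshold is an assumption made for illustration only.

LOW_END_CU_THRESHOLD = 4   # hypothetical cutoff, not from the app source

def default_period_iterations(compute_units):
    """Pick the default -period_iterations_num for a GPU by CU count."""
    return 500 if compute_units < LOW_END_CU_THRESHOLD else 50

print(default_period_iterations(2))   # a 2-CU card, like the one above
print(default_period_iterations(16))  # a mid/high-range card
```

Under this model, forcing a 2-CU card from its 500 default down to 5 makes each PulseFind kernel roughly 100x longer, which fits the unresponsiveness described.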
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.