OpenCL NV MultiBeam v8 SoG edition for Windows

Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770397 - Posted: 8 Mar 2016, 9:52:23 UTC

Also, please report the AR (angle range) of any task where you see a slowdown.
I expect some changes for low and mid ARs between r3366 and r3401, but no changes for high ARs. If you see a slowdown at a high AR value, please say so explicitly, because that would be unexpected.
ID: 1770397 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770402 - Posted: 8 Mar 2016, 10:08:00 UTC

To everyone who has "alpha-tester" status on the Lunatics boards and NV Fermi or newer hardware: please read this: http://lunatics.kwsn.info/index.php/topic,1777.msg60748.html#msg60748 and draw your own conclusions.
ID: 1770402 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1770441 - Posted: 8 Mar 2016, 15:04:54 UTC - in response to Message 1770402.  

Speaking of lunatics, is there a way to register over there anymore? I used to have an account and it got lost somewhere in the transition and now it says it no longer accepts new users.

Thanks,

Chris
ID: 1770441 · Report as offensive
Profile Jimbocous Project Donor
Volunteer tester
Joined: 1 Apr 13
Posts: 1856
Credit: 268,616,081
RAC: 1,349
United States
Message 1770467 - Posted: 8 Mar 2016, 23:00:26 UTC - in response to Message 1770441.  

> Speaking of lunatics, is there a way to register over there anymore? I used to have an account and it got lost somewhere in the transition and now it says it no longer accepts new users.
>
> Thanks,
>
> Chris

Dunno. I tried for almost a year to get verified, finally gave it up and found another team to join. Seems like no one is minding the store. Went to Arkayn's place instead ...
ID: 1770467 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770534 - Posted: 9 Mar 2016, 10:26:55 UTC - in response to Message 1770467.  

There are issues with site management.
ID: 1770534 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770535 - Posted: 9 Mar 2016, 10:27:52 UTC

All OpenCL Windows MultiBeam builds have been updated on the Beta project. Please test there to speed up their release to main as the stock app.
ID: 1770535 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1770580 - Posted: 9 Mar 2016, 19:20:25 UTC - in response to Message 1770535.  

FYI

When I used V. 3401, my CPU was running mostly at 100% and I got screen lag for the first time. I tried different values for “-period_iterations_num” in mb_cmdline*.txt, but no value stopped the screen lag. When I used V. 3366, the CPU only hit 100% at times, and when it did I had no screen lag. V. 3401 worked my system too hard. On either version, it would be really nice if I could throttle the CPU just a little. Would using Tthrottle work?

My RAC on my GTX 560 Ti running V. 3366 is going up, even with 1 CPU WU running. When my CPU WUs are done (it's like watching water boil), I'll test with GPU only. After that I'd like to run V. 3401, because I have only one card in this system.

> All OpenCL Windows MultiBeam builds were updated on Beta project, please test there to speedup release to main as stock app.

Raistmer,
By the time I saw this post, I was running OpenCL Windows MultiBeam on main. I know you said “all” OpenCL on Beta. Do you mean starting at V3401 and up?

Joe

Real Join Date:
Joe Januzzi (ID 253343) 29 Sep 1999, 22:30:36 UTC
Try to learn something new everyday.
ID: 1770580 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770581 - Posted: 9 Mar 2016, 19:24:29 UTC - in response to Message 1770580.  

> Would using Tthrottle work?
>
> Joe


It should.
Regarding beta testing: the more hosts that are attached to beta and run at default settings, the sooner most bugs will be caught and the app released to main for everyone.
ID: 1770581 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770610 - Posted: 9 Mar 2016, 21:41:01 UTC - in response to Message 1770582.  
Last modified: 9 Mar 2016, 21:41:44 UTC

User settings take priority.
This option just disables auto-tuning. Currently it disables tuning to a very high number of iterations for low-end cards (for MB), and fetch and unroll auto-tuning for AP.
If a user setting is detected, it will be used instead.
ID: 1770610 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1770630 - Posted: 9 Mar 2016, 22:43:52 UTC - in response to Message 1770614.  
Last modified: 9 Mar 2016, 23:18:01 UTC

The new build has quite a bit less utilization (the old version kept the GPU at about 97-98%, while 3401 bounces between 71% and 84%), so I'm trying to adjust the -sbs and work group size as you described above. It's in beta, also using -v 8, so hopefully you can see a bit more about what's going on.

Update: I never did find a combination of those two values that smoothed out the GPU utilization. It bounces all over the place pretty much regardless of what those settings are.

When I was running it on main, 3366 had an APR of around 350 GFlops; 3401 dropped that to 280-290. Numbers in beta are a bit lower than that, but there's a much smaller run of WUs at the moment.

Chris
ID: 1770630 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770632 - Posted: 9 Mar 2016, 22:56:33 UTC - in response to Message 1770614.  

The best value for the -sbs N option could change, because PulseFind behavior has changed.
ID: 1770632 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770636 - Posted: 9 Mar 2016, 23:13:15 UTC - in response to Message 1770630.  

> The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. Its in beta, also using -v 8 so hopefully you can see a bit more about what's going on.
>
> Chris


-v 8 makes sense in offline runs. In a full-scale live run it just overflows the return buffer.
ID: 1770636 · Report as offensive
Chris Adamek
Volunteer tester

Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1770638 - Posted: 9 Mar 2016, 23:19:23 UTC - in response to Message 1770636.  

> > The new build has quite a bit less utilization (old version kept the GPU at about 97-98% vs 3401 bouncing between 71-84%) so I'm trying to adjust the -sbs and work group size as you described above. Its in beta, also using -v 8 so hopefully you can see a bit more about what's going on.
> >
> > Chris
>
> -v 8 has sense in offline runs. For full-scale live run it just overflows return buffer.



So I see.=) I'll try to get things downloaded to do some offline testing tonight after the little one is asleep.

Thanks,

Chris
ID: 1770638 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1770816 - Posted: 10 Mar 2016, 20:47:28 UTC - in response to Message 1770757.  

> > -sbs N option could change cause PulseFind behavior changed.
>
> OK, will take that into consideration, when I change app, the coming weekend.

Offline tests show a speedup for both AMD and NV apps with default settings.
ID: 1770816 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1771199 - Posted: 12 Mar 2016, 16:37:16 UTC - in response to Message 1771000.  

FYI
Running version r3401 (SoG). I hope I can fine-tune each video card separately. I added the “MultiBeam_NV_config.xml” file to the mix. I won't be able to make any adjustments until Sunday (fishing).

Raistmer,
Do you think that some day “<instances_per_device>N</instances_per_device>” could be added to the MultiBeam_NV_config.xml file?

I still haven't used Tthrottle. CPU usage is higher than with V. 3366 on my system. I still hit 100% at times, but with a lot less screen lag. I can live with that.

Here are some WUs.
Device 0
http://setiathome.berkeley.edu/workunit.php?wuid=2089786243
Device 1
http://setiathome.berkeley.edu/workunit.php?wuid=2090191161
Device 2
http://setiathome.berkeley.edu/workunit.php?wuid=2089927129
Device 3
http://setiathome.berkeley.edu/workunit.php?wuid=2090899771
Joe

Here's my “MultiBeam_NV_config.xml” file.
;;; GTX 980
<device0>
<period_iterations_num>40</period_iterations_num>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device0>
;;; GTX 780
<device1>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device1>
;;; GTX 980
<device2>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device2>
;;; GTX 960
<device3>
<spike_fft_thresh>4096</spike_fft_thresh>
<sbs>192</sbs>
<oclfft_plan>
<size>256</size>
<global_radix>256</global_radix>
<local_radix>16</local_radix>
<workgroup_size>256</workgroup_size>
<max_local_size>512</max_local_size>
<localmem_banks>64</localmem_banks>
<localmem_coalesce_width>64</localmem_coalesce_width>
</oclfft_plan>
</device3>
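
Since the four device blocks above differ only in the device index and in whether period_iterations_num is set, a file like this could be generated rather than edited by hand. The following is only a sketch under assumptions: the helper name device_block and the dictionaries are hypothetical, the tag names are copied from the file above, and whether the app accepts generated output in exactly this layout has not been checked.

```python
# Hypothetical helper: builds one <deviceN> block in the layout shown above.
# Tag names come from the posted file; nothing here is part of the app itself.

FFT_PLAN = {
    "size": 256,
    "global_radix": 256,
    "local_radix": 16,
    "workgroup_size": 256,
    "max_local_size": 512,
    "localmem_banks": 64,
    "localmem_coalesce_width": 64,
}


def device_block(index, settings, fft_plan):
    """Return the text of one <deviceN> section as a single string."""
    lines = [f"<device{index}>"]
    # Per-device settings such as sbs or spike_fft_thresh.
    for tag, value in settings.items():
        lines.append(f"<{tag}>{value}</{tag}>")
    # FFT plan settings nested inside <oclfft_plan>.
    lines.append("<oclfft_plan>")
    for tag, value in fft_plan.items():
        lines.append(f"<{tag}>{value}</{tag}>")
    lines.append("</oclfft_plan>")
    lines.append(f"</device{index}>")
    return "\n".join(lines)


if __name__ == "__main__":
    common = {"spike_fft_thresh": 4096, "sbs": 192}
    for i in range(4):
        settings = dict(common)
        if i == 0:
            # Only device 0 sets period_iterations_num in the posted file.
            settings = {"period_iterations_num": 40, **common}
        print(device_block(i, settings, FFT_PLAN))
```

Keeping the shared values in one place makes it harder for the four blocks to drift apart by accident when one value is changed.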


Here's my “mb_cmdline_win_x86_SSE3_OpenCL_NV.txt” file.

-instances_per_device 3 -tune 1 64 1 4


Here's my “app_info.xml” file. I'm only showing the SoG portion that changed.

<app>
<name>setiathome_v8</name>
</app>
<file_info>
<name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</name>
<executable/>
</file_info>
<file_info>
<name>libfftw3f-3-3-4_x86.dll</name>
<executable/>
</file_info>
<file_ref>
<file_name>MultiBeam_Kernels_r3401.cl</file_name>
</file_ref>
<file_info>
<name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</name>
</file_info>
<file_info>
<name>MultiBeam_NV_config.xml</name>
</file_info>

<app_version>
<app_name>setiathome_v8</app_name>
<version_num>800</version_num>
<platform>windows_intelx86</platform>
<avg_ncpus>0.04</avg_ncpus>
<max_ncpus>0.2</max_ncpus>
<plan_class>opencl_nvidia_SoG</plan_class>
<cmdline></cmdline>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>MB8_win_x86_SSE3_OpenCL_NV_r3401_SoG.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libfftw3f-3-3-4_x86.dll</file_name>
</file_ref>
<file_ref>
<file_name>mb_cmdline_win_x86_SSE3_OpenCL_NV.txt</file_name>
<open_name>mb_cmdline.txt</open_name>
</file_ref>
<file_ref>
<file_name>MultiBeam_NV_config.xml</file_name>
</file_ref>

</app_version>

ID: 1771199 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1771227 - Posted: 12 Mar 2016, 18:49:36 UTC - in response to Message 1771199.  


> Raistmer,
> Do you think that some day this “<instances_per_device>N</instances_per_device>” could be added to the MultiBeam_NV_config.xml file?

Hardly.
And there is definitely no sense in doing that until BOINC is able to run a different number of tasks on different GPUs from the same vendor. Can it?
ID: 1771227 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1773210 - Posted: 22 Mar 2016, 7:51:13 UTC - in response to Message 1773039.  

Indeed, the performance variation caused by AR changes is bigger than any possible change in performance from switching between 3 and 4 tasks per GPU. So really good statistics, or some offline tests in a controlled environment, are required to measure that.
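
One way to gather such statistics is to group completed tasks by AR range before comparing 3 and 4 tasks per GPU, so that tasks with very different ARs are never compared with each other. The sketch below is illustrative only: the bucket boundaries (0.12 and 1.127), the function names and the tuple layout are assumptions, not values taken from this thread or from the app.

```python
from collections import defaultdict
from statistics import mean


def ar_bucket(ar, low=0.12, high=1.127):
    """Classify an angle range as low, mid or high (boundaries are assumed)."""
    if ar < low:
        return "low"
    if ar > high:
        return "high"
    return "mid"


def seconds_per_task(results):
    """Average effective seconds per task, keyed by (AR bucket, tasks per GPU).

    results: iterable of (angle_range, tasks_per_gpu, elapsed_seconds).
    With n tasks running at once, the GPU completes n tasks in roughly
    elapsed_seconds, so the effective cost of one task is elapsed / n.
    """
    groups = defaultdict(list)
    for ar, tasks_per_gpu, elapsed in results:
        groups[(ar_bucket(ar), tasks_per_gpu)].append(elapsed / tasks_per_gpu)
    return {key: mean(values) for key, values in groups.items()}
```

Only entries that share a bucket, for example ("mid", 3) and ("mid", 4), should be compared, and each needs enough tasks behind it for the mean to be meaningful.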
ID: 1773210 · Report as offensive
Joe Januzzi
Volunteer tester
Joined: 13 Apr 03
Posts: 54
Credit: 307,134,110
RAC: 492
United States
Message 1773305 - Posted: 22 Mar 2016, 21:55:46 UTC - in response to Message 1772694.  

> Dropping -period_iterations_num to 20, from default 50, increased the speed considerably.

Dropping it to 20 helped my speed too! Thanks, Tutankhamon, for the info.

I had the -v 8 switch running for about a week without knowing it. The -v 8 switch was at the tail end of the commands, where I didn't see it because my screen was too small :-(
So now I'm backtracking a little bit. Hopefully with better data this time.

Here's my “mb_cmdline_win_x86_SSE3_OpenCL_NV.txt” file.

-sbs 192 -instances_per_device 3 -period_iterations_num 20 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16
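
A long command line like the one above is easy to misread, which is how a stray -v 8 at the end can go unnoticed. As a rough aid, the line can be split into one entry per option. This is a minimal sketch, assuming only that each option starts with "-" and is followed by zero or more values; parse_cmdline is a hypothetical helper and not part of the app.

```python
def parse_cmdline(text):
    """Split an mb_cmdline-style string into {option_name: [values]}."""
    options = {}
    current = None
    for token in text.split():
        # A token such as "-sbs" starts a new option; a token such as "-5"
        # is treated as a negative number and therefore as a value.
        is_number = token[1:].replace(".", "", 1).isdigit()
        if token.startswith("-") and not is_number:
            current = token[1:]
            options[current] = []
        elif current is not None:
            options[current].append(token)
    return options


if __name__ == "__main__":
    line = "-sbs 192 -instances_per_device 3 -period_iterations_num 20 -tune 1 64 1 4"
    for name, values in parse_cmdline(line).items():
        print(name, " ".join(values))
```

Printing one option per line makes a forgotten switch visible even on a small screen.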

ID: 1773305 · Report as offensive
Rasputin42
Volunteer tester

Joined: 25 Jul 08
Posts: 412
Credit: 5,834,661
RAC: 0
United States
Message 1773310 - Posted: 22 Mar 2016, 22:24:28 UTC

The r3401 version is not working well for me.
I guess it is no good for cards with few compute units (2 in my case).
One of the test WUs (from Lunatics) does not run at all (no error, but no CPU or GPU usage).
I tried all sorts of tweaking and different drivers, but performance is bad.

r3366 works fine.
ID: 1773310 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1773375 - Posted: 23 Mar 2016, 5:53:24 UTC - in response to Message 1773310.  
Last modified: 23 Mar 2016, 6:01:25 UTC


> I tried all sorts of tweaking

What exactly did you try?
r3401 is currently an RC build, so any usability degradation not solved on beta will remain after release. Did you check the system log for driver restart events?
EDIT: on beta I see a completed result like this:

Defaults scaling is disabled, basic defaults will be used. Tuning on user's discretion.
Number of period iterations for PulseFind set to:5

Such tuning is definitely not correct for a low-performance card with a small number of CUs. You are deliberately making the app less usable with such tuning.
For low-performance GPUs the default value is 500; for mid-range and high-end GPUs the default is 50.
So a value of 5 can be a complete no-go for your device.
Did you experience issues with the default settings before these tuning attempts?
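
The selection rule described in this thread (a user setting wins, otherwise the default depends on the GPU class) can be sketched as follows. The function name and the low_end_gpu flag are hypothetical, and how the app actually classifies a GPU is not stated here; only the values 500 and 50 come from the post above.

```python
# Defaults quoted above: 500 for low-performance GPUs,
# 50 for mid-range and high-end GPUs.
DEFAULT_LOW_END = 500
DEFAULT_MID_HIGH = 50


def period_iterations(user_value=None, low_end_gpu=False):
    """Return the number of PulseFind period iterations to use.

    A value supplied by the user always takes priority, even when it is
    a poor fit for the card (for example 5 on a GPU with 2 compute units).
    """
    if user_value is not None:
        return user_value
    return DEFAULT_LOW_END if low_end_gpu else DEFAULT_MID_HIGH
```

Under these assumptions, a low-end card with a user value of 5 runs with 5 rather than 500, a hundredfold difference from its default.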
ID: 1773375 · Report as offensive