4x AMD Radeon R9 Fury X

Profile Louis Loria II
Volunteer tester
Joined: 20 Oct 03
Posts: 259
Credit: 9,208,040
RAC: 24
United States
Message 1736598 - Posted: 24 Oct 2015, 2:47:13 UTC

okie dokie... installed the BETA. No problems as far as I can tell at this point. I am still running multiple WUs with no hiccups. We'll see what happens over the next day or so...

AMD FX-8350 at 4300mhz
16gigs of G-Skill Ripjaws RAM at 1600mhz
2-Powercolor R9 280Xs 1030/1500mhz
Gigabyte GA970-UD3 MOBO
Samsung EVO 850 SSD
EVGA Supernova P2 1200W PSU
ID: 1736598 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1736861 - Posted: 25 Oct 2015, 6:33:16 UTC
Last modified: 25 Oct 2015, 7:14:02 UTC

About the bench test runs...

I have no idea when AMD will fix the newly introduced bug (for the new chips).


AstroPulse... 'Windows AP bench 211 minimal' and...
In the past I used the '2LC67' AP WU from 'Zblank shortened WUs' (for the J1900 iGPU and NV GT730).
For the R9 Fury X I should use the '9LC67' AP WU, right?

Is it run the same way as last time, or are there newly added cmdline params?


MultiBeam...
'MBbench 2.10'?
And which WU?

Just the following cmdline params, or more?
-sbs N
-period_iterations_num N
-spike_fft_thresh N
-tune 1 N N N
-oclfft_tune_gr N
-oclfft_tune_lr N
-oclfft_tune_wg N
-oclfft_tune_ls N
-oclfft_tune_bn N
-oclfft_tune_cw N

[EDIT: What range of values is valid for each param?]

Are they all independent, or are some cmdline params linked (like AP's -ffa_block N and -ffa_block_fetch N)?


Now I have 4 identical VGA cards.
If I execute one 'bench .cmd tool', it will run on GPU#0, right?
If I execute a 2nd instance of the 'bench .cmd tool', will it run on GPU#1, or also on GPU#0?
(I could speed up the bench test runs if GPU#0, #1 and #2 were used simultaneously (3 times the same cmdline params).)

Thanks.
ID: 1736861 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1736867 - Posted: 25 Oct 2015, 7:49:13 UTC - in response to Message 1736861.  

BTW.
What are the default cmdline settings for AP and MB for a VGA card with 64 compute units?
ID: 1736867 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1736881 - Posted: 25 Oct 2015, 10:04:29 UTC - in response to Message 1736867.  

BTW.
What are the default cmdline settings for AP and MB for a VGA card with 64 compute units?


Just run a bench without command line settings and check stderr.txt.


With each crime and every kindness we birth our future.
ID: 1736881 · Report as offensive
Profile Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34253
Credit: 79,922,639
RAC: 80
Germany
Message 1736882 - Posted: 25 Oct 2015, 10:12:58 UTC
Last modified: 25 Oct 2015, 10:14:45 UTC

AstroPulse... 'Windows AP bench 211 minimal' and...
In the past I used the '2LC67' AP WU from 'Zblank shortened WUs' (for the J1900 iGPU and NV GT730).
For the R9 Fury X I should use the '9LC67' AP WU, right?

Is it run the same way as last time, or are there newly added cmdline params?


MultiBeam...
'MBbench 2.10'?
And which WU?


Bench script for both is 2.13.

Task for AP is 9LC67

For MB use PG0395, PG444 and PG1327.

You can make 4 folders for bench runs, one for each GPU.
In the command line just add -device 0 through -device 3 to use all GPUs.
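
A minimal sketch of that per-GPU setup, assuming one working folder per card and the executable name from the bench example in this thread. The folder names (bench_gpu0 and so on) and the helper name are made up for illustration; only the -device switch comes from the advice above.

```python
from pathlib import Path

# Executable name as it appears in the bench example in this thread.
APP = "MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe"


def per_gpu_commands(num_gpus, params, root="."):
    """Return (folder, command line) pairs, one per GPU.

    Folder names are hypothetical; -device N selects the card.
    """
    jobs = []
    for device in range(num_gpus):
        folder = Path(root) / f"bench_gpu{device}"
        command = f"{APP} -device {device} {params}".strip()
        jobs.append((folder, command))
    return jobs


if __name__ == "__main__":
    for folder, command in per_gpu_commands(4, "-spike_fft_thresh 2048 -tune 1 64 1 4"):
        print(f"{folder}: {command}")
```

Each folder would hold its own copy of the bench script and test WUs, so the four runs do not overwrite each other's result files.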

To test oclFFT planning and tune params you need to understand how it works.
I certainly won't give lessons in FFT kernels.
There are hundreds of possibilities.

Here is a small example of one of my benches.

#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 4096 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 8 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 32
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 16 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 32 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 32 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 64 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 32
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 128 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 1 1 256 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 4 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 8 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 16 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 32 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 64 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 128 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 256 1 1 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 8 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 16 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 32 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 64 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 2 1 128 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 4 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 4 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 4 1 8 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 4 1 16 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 4 1 32 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 4 1 64 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 8 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 8 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 8 1 16 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 8 1 32 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 8 1 64 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 16 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 16 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 16 1 8 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 16 1 16 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 32 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 32 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 32 1 8 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 64 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
#MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe -device 0 -spike_fft_thresh 2048 -tune 1 128 1 2 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64
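
The list above mostly varies the two numbers in -tune 1 N 1 N while the other switches stay fixed. A rough sketch of generating such a list instead of typing it, assuming power-of-two values and a cap on the product of the two varied numbers. The cap of 256 is a guess read off the examples, not a documented limit; the fixed oclFFT values are copied from the lines above.

```python
APP = "MB7_win_x86_SSE_OpenCL_ATi_HD5_r2889.exe"
FIXED = ("-oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 "
         "-oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64")


def tune_pairs(max_value=256, max_product=256):
    """Power-of-two (x, y) pairs for '-tune 1 x 1 y', capped by product."""
    values = []
    v = 1
    while v <= max_value:
        values.append(v)
        v *= 2
    return [(x, y) for x in values for y in values if x * y <= max_product]


def bench_lines(device=0, thresh=2048):
    """One bench command line per (x, y) pair, other switches held fixed."""
    return [f"{APP} -device {device} -spike_fft_thresh {thresh} "
            f"-tune 1 {x} 1 {y} {FIXED}" for x, y in tune_pairs()]


if __name__ == "__main__":
    for line in bench_lines():
        print(line)
```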


With each crime and every kindness we birth our future.
ID: 1736882 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1736990 - Posted: 25 Oct 2015, 19:12:31 UTC - in response to Message 1736882.  
Last modified: 25 Oct 2015, 19:16:35 UTC

AstroPulse... 'Windows AP bench 211 minimal' and...
In the past I used the '2LC67' AP WU from 'Zblank shortened WUs' (for the J1900 iGPU and NV GT730).
For the R9 Fury X I should use the '9LC67' AP WU, right?

Is it run the same way as last time, or are there newly added cmdline params?


MultiBeam...
'MBbench 2.10'?
And which WU?

Bench script for both is 2.13.

Task for AP is 9LC67

For MB use PG0395, PG444 and PG1327.

You can make 4 folders for bench runs, one for each GPU.
In the command line just add -device 0 through -device 3 to use all GPUs.

To test oclFFT planning and tune params you need to understand how it works.
I certainly won't give lessons in FFT kernels.
There are hundreds of possibilities.

Here is a small example of one of my benches.
(...)

Where can I download 'Bench v2.13'? (http://lunatics.kwsn.info/index.php?module=Downloads)

In which download folder are the three PG0395, PG444 & PG1327 WUs?
(Are the fastest settings the ones that process all three WUs fastest? (Real life is a mix of WUs.))

As I understand it, these cmdline params are 'connected':
-oclfft_tune_gr N
-oclfft_tune_lr N
-oclfft_tune_wg N
-oclfft_tune_ls N
-oclfft_tune_bn N
-oclfft_tune_cw N

What are the min and max values?
min = 16
max = 512
So can I test 16, 32, 48, 64, 80 (always +16) ... up to 512 for all of the -oclfft_tune_* params mentioned above?

Example:
1.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
2.) -oclfft_tune_gr 32 -oclfft_tune_lr 32 -oclfft_tune_wg 32 -oclfft_tune_ls 32 -oclfft_tune_bn 32 -oclfft_tune_cw 32
3.) -oclfft_tune_gr 48 -oclfft_tune_lr 48 -oclfft_tune_wg 48 -oclfft_tune_ls 48 -oclfft_tune_bn 48 -oclfft_tune_cw 48
(...)
32.) -oclfft_tune_gr 512 -oclfft_tune_lr 512 -oclfft_tune_wg 512 -oclfft_tune_ls 512 -oclfft_tune_bn 512 -oclfft_tune_cw 512


1.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
2.) -oclfft_tune_gr 32 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
3.) -oclfft_tune_gr 48 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
(...)
32.) -oclfft_tune_gr 512 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16

1.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
2.) -oclfft_tune_gr 16 -oclfft_tune_lr 32 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
3.) -oclfft_tune_gr 16 -oclfft_tune_lr 48 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
(...)
32.) -oclfft_tune_gr 16 -oclfft_tune_lr 512 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16

1.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
2.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 32 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
3.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 48 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
(...)
32.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 512 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16

1.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
2.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 32 -oclfft_tune_bn 16 -oclfft_tune_cw 16
3.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 48 -oclfft_tune_bn 16 -oclfft_tune_cw 16
(...)
32.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 512 -oclfft_tune_bn 16 -oclfft_tune_cw 16

1.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
2.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 32 -oclfft_tune_cw 16
3.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 48 -oclfft_tune_cw 16
(...)
32.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 512 -oclfft_tune_cw 16

1.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 16
2.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 32
3.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 48
(...)
32.) -oclfft_tune_gr 16 -oclfft_tune_lr 16 -oclfft_tune_wg 16 -oclfft_tune_ls 16 -oclfft_tune_bn 16 -oclfft_tune_cw 512

Then:
1.) -oclfft_tune_gr 16 -oclfft_tune_lr 32 -oclfft_tune_wg 32 -oclfft_tune_ls 32 -oclfft_tune_bn 32 -oclfft_tune_cw 32
2.) -oclfft_tune_gr 32 -oclfft_tune_lr 32 -oclfft_tune_wg 32 -oclfft_tune_ls 32 -oclfft_tune_bn 32 -oclfft_tune_cw 32
3.) -oclfft_tune_gr 48 -oclfft_tune_lr 32 -oclfft_tune_wg 32 -oclfft_tune_ls 32 -oclfft_tune_bn 32 -oclfft_tune_cw 32
(...)
32.) -oclfft_tune_gr 512 -oclfft_tune_lr 32 -oclfft_tune_wg 32 -oclfft_tune_ls 32 -oclfft_tune_bn 32 -oclfft_tune_cw 32

(...)

1.) -oclfft_tune_gr 16 -oclfft_tune_lr 48 -oclfft_tune_wg 48 -oclfft_tune_ls 48 -oclfft_tune_bn 48 -oclfft_tune_cw 48
2.) -oclfft_tune_gr 32 -oclfft_tune_lr 48 -oclfft_tune_wg 48 -oclfft_tune_ls 48 -oclfft_tune_bn 48 -oclfft_tune_cw 48
3.) -oclfft_tune_gr 48 -oclfft_tune_lr 48 -oclfft_tune_wg 48 -oclfft_tune_ls 48 -oclfft_tune_bn 48 -oclfft_tune_cw 48
(...)
32.) -oclfft_tune_gr 512 -oclfft_tune_lr 48 -oclfft_tune_wg 48 -oclfft_tune_ls 48 -oclfft_tune_bn 48 -oclfft_tune_cw 48

(...)

So that in the end I have tested 16 (in +16 steps) up to 512 in all 6 -oclfft_tune_* positions,
with every possible combination?
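
For scale, the arithmetic behind that plan: 16 to 512 in steps of 16 gives 32 values, and six switches varied independently give 32^6 combinations. The sketch below only counts them; the one-minute-per-run figure is an arbitrary assumption for illustration, not a measured bench time.

```python
# Candidate values per switch, as proposed: 16, 32, ..., 512.
values = list(range(16, 513, 16))
switches = ["gr", "lr", "wg", "ls", "bn", "cw"]

# Every combination of all six switches.
full_grid = len(values) ** len(switches)

# Varying one switch at a time while the other five stay fixed.
one_at_a_time = len(values) * len(switches)

MINUTES_PER_RUN = 1  # assumed, purely illustrative
years = full_grid * MINUTES_PER_RUN / (60 * 24 * 365)

print(len(values), full_grid, one_at_a_time, round(years))
```

The full grid is 1,073,741,824 runs (about two thousand years at one minute each), against 192 runs when one switch is varied at a time, so an exhaustive sweep at that step size is not practical.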

Then the (?) independent cmdline params:
-tune 1 N N N = all possible entries, as with AstroPulse

-sbs N = 32, 64, 128, 256, 512

-period_iterations_num N = 10 (with +1 steps) up to 40

-spike_fft_thresh N = 512, 1024, 2048, 4096

...yes?
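
Taking the candidate values exactly as listed (they are the poster's guesses, not confirmed ranges), the three independent switches -sbs, -period_iterations_num and -spike_fft_thresh form a 5 x 31 x 4 grid. A small sketch that enumerates it; the function name is made up for illustration.

```python
from itertools import product

# Candidate values as proposed in the post; unconfirmed.
SBS = [32, 64, 128, 256, 512]
PERIOD_ITERATIONS = list(range(10, 41))        # 10..40 inclusive
SPIKE_FFT_THRESH = [512, 1024, 2048, 4096]


def independent_param_sets():
    """Yield one command-line fragment per combination."""
    for sbs, period, thresh in product(SBS, PERIOD_ITERATIONS, SPIKE_FFT_THRESH):
        yield (f"-sbs {sbs} -period_iterations_num {period} "
               f"-spike_fft_thresh {thresh}")


if __name__ == "__main__":
    sets = list(independent_param_sets())
    print(len(sets))      # 620
    print(sets[0])
```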


I have already done bench test runs for AstroPulse...
For MultiBeam it will be the first time.


Maybe there is already a manual on how to do MB bench test runs?


Thanks.
ID: 1736990 · Report as offensive
Profile Louis Loria II
Volunteer tester
Joined: 20 Oct 03
Posts: 259
Credit: 9,208,040
RAC: 24
United States
Message 1737331 - Posted: 26 Oct 2015, 23:29:16 UTC - in response to Message 1736598.  

okie dokie... installed the BETA. No problems as far as I can tell at this point. I am still running multiple WUs with no hiccups. We'll see what happens over the next day or so...

AMD FX-8350 at 4300mhz
16gigs of G-Skill Ripjaws RAM at 1600mhz
2-Powercolor R9 280Xs 1030/1500mhz
Gigabyte GA970-UD3 MOBO
Samsung EVO 850 SSD
EVGA Supernova P2 1200W PSU


Running the BETA for 3 days now, no problems here.

<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>12</max_concurrent>
    <gpu_versions>
      <gpu_usage>.33</gpu_usage>
      <cpu_usage>.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>12</max_concurrent>
    <gpu_versions>
      <gpu_usage>.20</gpu_usage>
      <cpu_usage>.10</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
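
In BOINC's app_config.xml, gpu_usage is the fraction of a GPU that each task claims, so .33 allows three AstroPulse tasks per card and .20 allows five MultiBeam tasks. A small sketch that reads those numbers out of the config above; the helper name is made up, and the inner elements are written on one line only to keep the example short.

```python
import xml.etree.ElementTree as ET

APP_CONFIG = """
<app_config>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>12</max_concurrent>
    <gpu_versions><gpu_usage>.33</gpu_usage><cpu_usage>.5</cpu_usage></gpu_versions>
  </app>
  <app>
    <name>setiathome_v7</name>
    <max_concurrent>12</max_concurrent>
    <gpu_versions><gpu_usage>.20</gpu_usage><cpu_usage>.10</cpu_usage></gpu_versions>
  </app>
</app_config>
"""


def tasks_per_gpu(xml_text):
    """Map app name -> (tasks per GPU, CPU cores those tasks reserve)."""
    result = {}
    for app in ET.fromstring(xml_text).findall("app"):
        gpu = float(app.findtext("gpu_versions/gpu_usage"))
        cpu = float(app.findtext("gpu_versions/cpu_usage"))
        tasks = int(1 / gpu + 1e-9)          # whole tasks that fit on one GPU
        result[app.findtext("name")] = (tasks, tasks * cpu)
    return result


if __name__ == "__main__":
    for name, (tasks, cores) in tasks_per_gpu(APP_CONFIG).items():
        print(f"{name}: {tasks} tasks per GPU, {cores} CPU cores reserved per GPU")
```

With the two R9 280X cards from the signature, that works out to 6 AstroPulse or 10 MultiBeam tasks at once, both under the max_concurrent limit of 12.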
ID: 1737331 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1737581 - Posted: 27 Oct 2015, 22:46:45 UTC
Last modified: 27 Oct 2015, 23:03:43 UTC

Could someone answer the questions in my last message? Thanks.
I can't do bench test runs until someone answers them...

(Maybe someone from the Lunatics crew could start a thread about which cmdline settings are possible for MB and AP (ATI), and which values should be tested for GPUs with few or many compute units?
This would be very helpful.)



OK, installed Catalyst v15.10 Beta.

MB with '-v 0 -hp -no_cpu_lock' and AP with '-v 0 -hp' in cmdline.txt files.

Count 0.5 -> 2 WUs/GPU

MB: after a short time, a few computation errors.
AP: I started 8 WUs, all finished successfully.

So does this mean that AP works well with 2 WUs/GPU, but MB doesn't?

What's different?
ID: 1737581 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1737586 - Posted: 27 Oct 2015, 23:03:07 UTC

If I wanted to run stock SETI, without an app_info.xml file, would BOINC v7.6.9 get all of these apps?

SETI@home v7
7.07 (opencl_ati5_cat132)
7.07 (opencl_ati5_nocal)
7.07 (opencl_ati5_sah)
7.07 (opencl_ati_cat132)
7.07 (opencl_ati_nocal)
7.07 (opencl_ati_sah)

AstroPulse v7
7.09 (opencl_ati_100)


BOINC says just:
[4] AMD Fiji (4096MB) OpenCL: 2.0

(AFAIK, the VGA cards have 3072 MB VRAM.)

No driver version. So does BOINC need an upgrade?

Maybe BOINC will not get the *_cat132 apps?

Thanks.
ID: 1737586 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1737689 - Posted: 28 Oct 2015, 7:48:31 UTC - in response to Message 1736990.  

Maybe there is already a manual on how to do MB bench test runs?


Nope, you could be its author ;)

To start with, look at these tables I created while developing the oclFFT settings.
They give some impression of what is possible and what isn't, using the C-60 APU's capabilities as a case study. Only the green lines should be used.

https://drive.google.com/file/d/0BwjTLNvsJmLBcEtaNG5xTUc4TDA/view?usp=sharing
https://docs.google.com/spreadsheets/d/1bywjOlnPhTcpzk7UFl4T4ZPb2T0uS19l-ILQBQR3OS4/edit?usp=sharing
https://docs.google.com/spreadsheets/d/1hgMumRHrYJ-R35xixyCI1_CHlczUE2f_M75jQ9euMJ8/edit?usp=sharing
ID: 1737689 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1737690 - Posted: 28 Oct 2015, 7:51:04 UTC - in response to Message 1737586.  

If I wanted to run stock SETI, without an app_info.xml file, would BOINC v7.6.9 get all of these apps?

SETI@home v7
7.07 (opencl_ati5_cat132)
7.07 (opencl_ati5_nocal)
7.07 (opencl_ati5_sah)
7.07 (opencl_ati_cat132)
7.07 (opencl_ati_nocal)
7.07 (opencl_ati_sah)

AstroPulse v7
7.09 (opencl_ati_100)


BOINC says just:
[4] AMD Fiji (4096MB) OpenCL: 2.0

(AFAIK, the VGA cards have 3072 MB VRAM.)

No driver version. So does BOINC need an upgrade?

Maybe BOINC will not get the *_cat132 apps?

Thanks.


It will get all allowed plan classes. Assuming SETI main's plan classes are the same as beta's, look here: http://setiweb.ssl.berkeley.edu/beta/plan_class_spec.xml
ID: 1737690 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1737700 - Posted: 28 Oct 2015, 8:12:00 UTC - in response to Message 1737581.  

What's different?

The application used. The algorithms used inside the application. The exact sequence of GPU machine code inside the applications.
Or do you think that if both apps use the GPU for computations, they are identical?
It would definitely be good to narrow the differences down to particular parts of the algorithm. This could help AMD fix the driver issues.
As an example of this approach, look at the posts by Jacob Klein, who (despite my skepticism) was able to push nVidia into fixing a bug. IMHO the most important part of his endeavour was his ability to demonstrate the bug on nVidia's own SDK samples. Working with known code is usually much easier than debugging someone else's creation.
So, at the very beginning of this thread I proposed that you try AMD's own SDK samples, in an attempt to re-create similar invalid computations in their own code running simultaneously. This would make AMD more convinced of the bug's existence.
Unfortunately, my proposal seems to have been ignored so far.
So, I can only propose this "deal" to you: either you do such tests on the hardware you own, or you buy me such hardware and I do those tests ;) Good deal? ;) ;)
ID: 1737700 · Report as offensive
Profile Sutaru Tsureku
Volunteer tester

Joined: 6 Apr 07
Posts: 7105
Credit: 147,663,825
RAC: 5
Germany
Message 1737711 - Posted: 28 Oct 2015, 10:33:15 UTC - in response to Message 1737700.  
Last modified: 28 Oct 2015, 10:35:01 UTC

OK, there: http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk

AMD-SDK-InstallManager-v1.4.84.exe ?

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_APP_SDK_GettingStartedGuide.pdf (.PDF file)

It looks like I download the InstallManager and run the samples from the command prompt.

No additional software/tools needed?

This is just a quick message.
I will look into it more deeply later, so that I run the samples correctly and get valid results.

Is there anything I shouldn't forget or should pay attention to?

- - - - - - - - - -

Maybe it would also be good if you owned a Fury X VGA card yourself?
SETI would profit from it (app development)?

Maybe a few SETI members could donate money so that you could buy such a VGA card?

I paid €700 for one Fury X. I looked, and it is now €650...
If 13 members pay €50 each...

But currently I have no idea how to organize that.

ID: 1737711 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1737713 - Posted: 28 Oct 2015, 10:43:52 UTC - in response to Message 1737711.  


Is there anything I shouldn't forget or should pay attention to?

AFAIK many tests have CPU-based verification. You should check that the GPU results pass that verification. Also, you should run a few samples simultaneously, or run a sample with BOINC GPU computation enabled (to emulate several GPU apps running simultaneously).
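
A rough sketch of that test: start several copies of one SDK sample at the same time and check each copy's output for its verification verdict. The command line and the "Passed" marker are placeholders, not confirmed SDK behaviour; check the sample's own help text for its verification switch and the exact wording it prints.

```python
import subprocess
import sys


def run_concurrently(command, copies, pass_marker="Passed"):
    """Start `copies` instances of `command` at once.

    Returns one bool per instance: True if it exited with code 0 and its
    output contained `pass_marker` (the marker text is an assumption).
    """
    procs = [subprocess.Popen(command, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT, text=True)
             for _ in range(copies)]
    results = []
    for proc in procs:
        output, _ = proc.communicate()
        results.append(proc.returncode == 0 and pass_marker in output)
    return results


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # real sample executable and its verification switch.
    demo = [sys.executable, "-c", "print('Passed!')"]
    print(run_concurrently(demo, copies=3))
```

Any False in the result while a single copy passes on its own would point at the simultaneous-run problem discussed in this thread.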

- - - - - - - - - -

Maybe it would also be good if you owned a Fury X VGA card yourself?
SETI would profit from it (app development)?

Maybe a few SETI members could donate money so that you could buy such a VGA card?

I paid €700 for one Fury X. I looked, and it is now €650...
If 13 members pay €50 each...

But currently I have no idea how to organize that.

LoL, maybe, indeed :D Especially taking into account what today's exchange rate is...
ID: 1737713 · Report as offensive