Posts by Dirk Sadowski


1) Message boards : Team Recruitment Center : BOINC Team #1@SETI - Needs Your Help! (Message 1658779)
Posted 2 days ago by Dirk Sadowski (Project donor)



As the team name suggests, our goal is to reach and remain at Rank #1 here and here. :-D


Your help is needed!

Join our team here at SETI@home and together we can accomplish this goal! :-)


You can find the detailed team description here.


2) Message boards : Number crunching : Intel GPU tasks fail (Message 1658776)
Posted 2 days ago by Dirk Sadowski (Project donor)
Take a look at this thread; maybe it's the driver version?

Message boards : Number crunching : SETI@home v7 7.03 (opencl_intel_gpu_sah) - Computation Error
3) Message boards : Number crunching : BOINC 7.4.42 (Message 1658768)
Posted 2 days ago by Dirk Sadowski (Project donor)
AFAIK, the '0 resource share' setting didn't work in v7.4.36.

Does it work again in v7.4.42 (BOINC requests work from a project with resource share '0' only if it has no WUs ready for crunching)?
4) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1658620)
Posted 2 days ago by Dirk Sadowski (Project donor)
I worry that finding the fastest settings will not take days, it will take weeks/months ... ;-)

With '-unroll 5 -ffa_block 1472 -ffa_block_fetch 368', the calculation time on the iGPU went from ~21 h down to ~17.5 h.
One CPU thread does an AP WU in ~22 h.
The iGPU and the CPU are no longer similar.

Could I 'skip' the test run with BOINC running CPU WUs in the background?

Could we finish testing this setting and move on to testing the next setting?
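
As a rough check of these figures, the sketch below works out the relative gain. The hour values are the approximate ones quoted above; the variable names are only illustrative.

```python
# Approximate AP WU run times quoted in the post (hours).
igpu_before_h = 21.0   # iGPU before the tuned settings
igpu_after_h = 17.5    # iGPU with -unroll 5 -ffa_block 1472 -ffa_block_fetch 368
cpu_thread_h = 22.0    # one CPU thread

# Fraction of run time saved on the iGPU by the tuned settings.
time_saved = (igpu_before_h - igpu_after_h) / igpu_before_h

# Throughput ratios (WUs finished in the same wall-clock time).
igpu_speedup = igpu_before_h / igpu_after_h
igpu_vs_cpu = cpu_thread_h / igpu_after_h

print(f"iGPU time saved:     {time_saved:.1%}")
print(f"iGPU speedup:        {igpu_speedup:.2f}x")
print(f"iGPU vs. CPU thread: {igpu_vs_cpu:.2f}x")
```

Because the inputs are rounded estimates, the resulting percentages are only indicative.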

Thanks.
5) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1658231)
Posted 3 days ago by Dirk Sadowski (Project donor)
Previous winner, from the 7th run:
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.331 secs CPU 13.313 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1019.347 secs CPU 13.781 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.888 secs CPU 20.969 secs
...........................................................................................Elapsed 1018.189 secs CPU 16.021 secs (average)


11th run: test of the other -oclFFT_plan values (found new fastest params). 12th run: 2nd and 3rd runs of the 3 new fastest:

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 16 8 32 : Elapsed 1067.872 secs CPU 14.875 secs - 1st
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 16 8 64 : Elapsed 1058.029 secs CPU 25.453 secs - 1st
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 16 8 128 : Elapsed 1061.091 secs CPU 14.281 secs - 1st
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 16 8 256 : Elapsed 1068.990 secs CPU 15.234 secs - 1st
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 32 : Elapsed 1086.904 secs CPU 14.688 secs - 1st

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 64 : Elapsed 1008.850 secs CPU 16.750 secs - 1st
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 64 : Elapsed 994.947 secs CPU 15.984 secs - 2nd
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 64 : Elapsed 993.226 secs CPU 21.813 secs - 3rd
........................................................................................................................Elapsed 999.008 secs CPU 18.182 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 128 : Elapsed 1007.959 secs CPU 14.844 secs - 1st
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 128 : Elapsed 1006.604 secs CPU 16.516 secs - 2nd
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 128 : Elapsed 1006.654 secs CPU 15.328 secs - 3rd
..........................................................................................................................Elapsed 1007.072 secs CPU 15.563 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 256 : Elapsed 993.624 secs CPU 22.031 secs - 1st
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 256 : Elapsed 992.701 secs CPU 23.313 secs - 2nd
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 32 8 256 : Elapsed 992.907 secs CPU 22.531 secs - 3rd
..........................................................................................................................Elapsed 993.077 secs CPU 22.625 secs (average)
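
The averages above can be recomputed and compared with a short script. This is only a sketch: the elapsed times are copied from the lines above, and the dictionary layout is an illustrative choice.

```python
from statistics import mean

# Elapsed times (seconds) from the three repeated configurations above,
# keyed by the three -oclFFT_plan values.
elapsed = {
    (32, 8, 64): [1008.850, 994.947, 993.226],
    (32, 8, 128): [1007.959, 1006.604, 1006.654],
    (32, 8, 256): [993.624, 992.701, 992.907],
}
# Previous winner from the 7th run (no -oclFFT_plan).
baseline = mean([1017.331, 1019.347, 1017.888])

averages = {plan: mean(times) for plan, times in elapsed.items()}
best_plan = min(averages, key=averages.get)

for plan, avg in sorted(averages.items(), key=lambda item: item[1]):
    gain = (baseline - avg) / baseline
    print(f"-oclFFT_plan {plan[0]} {plan[1]} {plan[2]}: "
          f"{avg:.3f} s ({gain:.1%} faster than the previous winner)")
```

With these numbers, `best_plan` is `(32, 8, 256)`, which also shows the smallest run-to-run variation of the three.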

What should I do now?

Thanks.

BTW, I now use the fastest settings 'live'.
In the program data/slots/*/stderr.txt file I read that a new AP_clFFTplan_*.bin_* file was created in the setiathome.berkeley.edu folder.
Does this new file have the same function as a .wisdom file (so that a new calibration run in BOINC with 2 new and different tasks is needed again)?
6) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657982)
Posted 4 days ago by Dirk Sadowski (Project donor)
Winner so far, from the 7th run:
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.331 secs CPU 13.313 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1019.347 secs CPU 13.781 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.888 secs CPU 20.969 secs
...........................................................................................Elapsed 1018.189 secs CPU 16.021 secs (average)

'-tune 2' had no effect.

Here is the 10th run (added -oclFFT_plan):
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 8 32 : Elapsed 1444.710 secs CPU 14.766 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 8 64 : Elapsed 1047.000 secs CPU 22.563 secs <- #3
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 8 128 : Elapsed 1018.135 secs CPU 22.109 secs <- fastest, #1
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 8 256 : Elapsed 1032.599 secs CPU 16.547 secs <- #2
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 16 32 : Elapsed 1047.690 secs CPU 22.328 secs <- #4
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 16 64 : Elapsed 1104.392 secs CPU 16.563 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 16 128 : Elapsed 1086.849 secs CPU 21.547 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 64 16 256 : Elapsed 1090.922 secs CPU 13.891 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 8 32 : Elapsed 1641.464 secs CPU 16.234 secs - wrong 0/0 result
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 8 64 : Elapsed 1189.361 secs CPU 14.609 secs - wrong 0/0 result
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 8 128 : Elapsed 938.334 secs CPU 16.297 secs - wrong 0/0 result
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 8 256 : Elapsed 958.776 secs CPU 15.625 secs - wrong 0/0 result

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 16 32 : Elapsed 1520.595 secs CPU 15.344 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 16 64 : Elapsed 1103.218 secs CPU 21.063 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 16 128 : Elapsed 1220.350 secs CPU 13.453 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 128 16 256 : Elapsed 1234.224 secs CPU 15.719 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 8 32 : Elapsed 1937.555 secs CPU 24.750 secs - wrong 0/0 result
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 8 64 : Elapsed 1286.375 secs CPU 14.750 secs - wrong 0/0 result
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 8 128 : Elapsed 934.111 secs CPU 19.125 secs - wrong 0/0 result
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 8 256 : Elapsed 831.297 secs CPU 14.516 secs - wrong 0/0 result

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 16 32 : Elapsed 1752.615 secs CPU 15.406 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 16 64 : Elapsed 1196.623 secs CPU 14.547 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 16 128 : Elapsed 1091.905 secs CPU 22.891 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 16 256 : Elapsed 1048.444 secs CPU 18.641 secs <- #5
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 32 32 : Elapsed 1362.465 secs CPU 15.672 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 32 64 : Elapsed 1238.831 secs CPU 25.891 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 32 128 : Elapsed 1141.789 secs CPU 13.750 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan 256 32 256 : Elapsed 1153.078 secs CPU 16.141 secs
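
Since several of the shortest elapsed times above belong to runs with a wrong result, any ranking has to filter those out first. The following is a minimal sketch of that step; it uses a handful of lines from the list above with the common `-unroll ... -tune 1 64 2 1` prefix omitted for brevity, and the regular expression and function name are illustrative assumptions.

```python
import re

# A few result lines from the 10th run (common prefix omitted).
LOG = """\
-oclFFT_plan 64 8 64 : Elapsed 1047.000 secs CPU 22.563 secs
-oclFFT_plan 64 8 128 : Elapsed 1018.135 secs CPU 22.109 secs
-oclFFT_plan 64 8 256 : Elapsed 1032.599 secs CPU 16.547 secs
-oclFFT_plan 128 8 128 : Elapsed 938.334 secs CPU 16.297 secs - wrong 0/0 result
-oclFFT_plan 256 8 256 : Elapsed 831.297 secs CPU 14.516 secs - wrong 0/0 result
-oclFFT_plan 256 16 256 : Elapsed 1048.444 secs CPU 18.641 secs
"""

LINE = re.compile(
    r"^(?P<params>.+?) : Elapsed (?P<elapsed>[\d.]+) secs "
    r"CPU (?P<cpu>[\d.]+) secs(?P<note>.*)$"
)

def parse(log):
    """Return (params, elapsed_seconds, is_valid) for every result line."""
    rows = []
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match:
            valid = "wrong" not in match["note"]
            rows.append((match["params"], float(match["elapsed"]), valid))
    return rows

rows = parse(LOG)
# Only runs that produced a correct result are candidates.
ranked = sorted((row for row in rows if row[2]), key=lambda row: row[1])
for rank, (params, elapsed, _) in enumerate(ranked, start=1):
    print(f"#{rank}: {params} ({elapsed:.3f} s)")
```

The 831 s and 938 s runs are dropped even though they are the fastest, which matches the manual ranking marked with `<- #N` above.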

What should I do now?

Thanks.
7) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657651)
Posted 4 days ago by Dirk Sadowski (Project donor)
Hi Jason,
now you have confused me. ;-)
Before I test -oclFFT_plan, do I need to go back to the first and following test runs and take the average/median times into account?
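
To illustrate the difference between the two measures, here is a small sketch using the three elapsed times reported elsewhere in this thread for the 7th-run winner. It relies only on Python's standard `statistics` module.

```python
from statistics import mean, median

# Elapsed times (seconds) of the three runs of the 7th-run winner.
runs = [1017.331, 1019.347, 1017.888]

print(f"mean:   {mean(runs):.3f} s")
print(f"median: {median(runs):.3f} s")
print(f"spread: {max(runs) - min(runs):.3f} s")
```

With only three runs per setting, the median is simply the middle value, so a single slow outlier affects it less than it affects the mean.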
8) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657637)
Posted 4 days ago by Dirk Sadowski (Project donor)
Winner of the 7th run:
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.331 secs CPU 13.313 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1019.347 secs CPU 13.781 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.888 secs CPU 20.969 secs
...........................................................................................Elapsed 1018.189 secs CPU 16.021 secs (average)

8th run (added -tune 2 N N N):
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 32 1 1 : Elapsed 1030.832 secs CPU 16.094 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 32 1 1 : Elapsed 1019.339 secs CPU 13.750 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 32 1 1 : Elapsed 1019.760 secs CPU 13.984 secs
...............................................................................................................Elapsed 1023.310 secs CPU 14.609 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 64 1 1 : Elapsed 1031.556 secs CPU 16.734 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 64 1 1 : Elapsed 1017.591 secs CPU 21.656 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 64 1 1 : Elapsed 1033.269 secs CPU 16.859 secs
...............................................................................................................Elapsed 1027.472 secs CPU 18.416 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 128 1 1 : Elapsed 1018.622 secs CPU 21.875 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 128 1 1 : Elapsed 1033.929 secs CPU 16.094 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 128 1 1 : Elapsed 1018.039 secs CPU 21.203 secs
.................................................................................................................Elapsed 1023.530 secs CPU 19.724 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 256 1 1 : Elapsed 1020.815 secs CPU 14.109 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 256 1 1 : Elapsed 1032.858 secs CPU 14.953 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 256 1 1 : Elapsed 1017.683 secs CPU 21.281 secs
.................................................................................................................Elapsed 1023.785 secs CPU 16.781 secs (average)
(...)

As a perfectionist, I had to test it (9th run, -tune 2 with 8 & 16). ;-)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 8 1 1 : Elapsed 1029.878 secs CPU 16.578 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 8 1 1 : Elapsed 1032.028 secs CPU 15.359 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 8 1 1 : Elapsed 1018.096 secs CPU 20.453 secs
..............................................................................................................Elapsed 1026.667 secs CPU 17.463 secs (average)


-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 16 1 1 : Elapsed 1017.932 secs CPU 21.531 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 16 1 1 : Elapsed 1034.118 secs CPU 15.203 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 16 1 1 : Elapsed 1020.170 secs CPU 13.938 secs
...............................................................................................................Elapsed 1024.073 secs CPU 16.891 secs (average)

The -oclFFT_tune test run will follow ASAP.

[EDIT: Oops, -oclFFT_plan is correct. ;-)]
9) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657577)
Posted 4 days ago by Dirk Sadowski (Project donor)
I'll do the next bench test run ASAP.

I saw that SETI Beta released 'APv7 7.08 (opencl_intel_gpu_102)'.

I am doing the tests with the r2737 app.

If v7.08 is released here at SETI Main, will all my tests be obsolete, so that I need to start again from scratch?

Thanks.
10) Message boards : Team Recruitment Center : BOINC Team #1@SETI - Needs Your Help! (Message 1657275)
Posted 5 days ago by Dirk Sadowski (Project donor)



As the team name suggests ;-) , our goal is to reach and remain at Rank #1 here and here. :-)


Your help is needed!

Join our team here at SETI@home and together we can accomplish this goal! :-)


You can find the detailed team description here.


11) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1657267)
Posted 5 days ago by Dirk Sadowski (Project donor)
Winner of the 7th run:
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.331 secs CPU 13.313 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1019.347 secs CPU 13.781 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.888 secs CPU 20.969 secs
...........................................................................................Elapsed 1018.189 secs CPU 16.021 secs (average)

8th run (added -tune 2 N N N):
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 32 1 1 : Elapsed 1030.832 secs CPU 16.094 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 32 1 1 : Elapsed 1019.339 secs CPU 13.750 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 32 1 1 : Elapsed 1019.760 secs CPU 13.984 secs
...............................................................................................................Elapsed 1023.310 secs CPU 14.609 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 64 1 1 : Elapsed 1031.556 secs CPU 16.734 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 64 1 1 : Elapsed 1017.591 secs CPU 21.656 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 64 1 1 : Elapsed 1033.269 secs CPU 16.859 secs
...............................................................................................................Elapsed 1027.472 secs CPU 18.416 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 128 1 1 : Elapsed 1018.622 secs CPU 21.875 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 128 1 1 : Elapsed 1033.929 secs CPU 16.094 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 128 1 1 : Elapsed 1018.039 secs CPU 21.203 secs
.................................................................................................................Elapsed 1023.530 secs CPU 19.724 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 256 1 1 : Elapsed 1020.815 secs CPU 14.109 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 256 1 1 : Elapsed 1032.858 secs CPU 14.953 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -tune 2 256 1 1 : Elapsed 1017.683 secs CPU 21.281 secs
.................................................................................................................Elapsed 1023.785 secs CPU 16.781 secs (average)
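
One way to quantify "no benefit" is to subtract the 7th-run average from each -tune 2 average. The sketch below does that with the elapsed times listed above; the variable names are illustrative.

```python
from statistics import mean

# Elapsed times (seconds) copied from the 7th and 8th runs above.
baseline = [1017.331, 1019.347, 1017.888]   # -tune 1 64 2 1 only
tune2 = {                                   # key N means "-tune 2 N 1 1"
    32: [1030.832, 1019.339, 1019.760],
    64: [1031.556, 1017.591, 1033.269],
    128: [1018.622, 1033.929, 1018.039],
    256: [1020.815, 1032.858, 1017.683],
}

base_avg = mean(baseline)
# Positive delta = slower than the baseline on average.
deltas = {n: mean(times) - base_avg for n, times in tune2.items()}

for n, delta in deltas.items():
    print(f"-tune 2 {n} 1 1: {delta:+.3f} s vs. baseline")
```

All four deltas come out positive (roughly 5 to 9 s slower). With only three runs per setting, differences of this size may still be run-to-run noise rather than a real effect.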


Hm, I'm confused now ... -tune 2 showed no benefit. Are other values possible (maybe between the tested values, or lower or higher)?

What should I test now?

Thanks.
12) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1652965)
Posted 17 days ago by Dirk Sadowski (Project donor)
7th test run (2nd and 3rd runs of the 3 best from the last run):

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 8 4 1 : Elapsed 1010.911 secs CPU 13.922 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 8 4 1 : Elapsed 1029.613 secs CPU 17.047 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 8 4 1 : Elapsed 1030.048 secs CPU 16.422 secs
.........................................................................................Elapsed 1023.524 secs CPU 15.797 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 32 2 1 : Elapsed 1016.773 secs CPU 14.016 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 32 2 1 : Elapsed 1032.878 secs CPU 16.563 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 32 2 1 : Elapsed 1017.640 secs CPU 21.734 secs
...........................................................................................Elapsed 1022.430 secs CPU 17.438 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.331 secs CPU 13.313 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1019.347 secs CPU 13.781 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.888 secs CPU 20.969 secs
...........................................................................................Elapsed 1018.189 secs CPU 16.021 secs (average)

Looking at the results, I think:
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1
... are the fastest settings for the J1900 iGPU so far, yes?

The readme of the Intel® iGPU AP app also lists:
-skip_ffa_precompute
-use_sleep (I thought just for NV GPUs, no?)
-cpu_lock
-sbs N
-oclFFT_plan

Should I test these settings as well (each one alone, independently of the others)? Are there perhaps more possible settings?
Like this, comparing the first 3:

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -skip_ffa_precompute

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -use_sleep

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -cpu_lock

And then these, but how, and with which params?
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -sbs 128 (?)
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -sbs 256 (?)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan N N N
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 -oclFFT_plan N N N
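
The one-option-at-a-time comparison proposed above can be generated rather than typed by hand. This is a sketch only: the option names come from the list above, the -sbs values are the ones marked "(?)" in the post, and -oclFFT_plan is left out because its values are still unspecified ("N N N"). The function name is hypothetical.

```python
BASE = "-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1"

# Extra options to try, each on its own against the baseline.
CANDIDATES = [
    "-skip_ffa_precompute",
    "-use_sleep",
    "-cpu_lock",
    "-sbs 128",   # value marked "(?)" in the post
    "-sbs 256",   # value marked "(?)" in the post
]

def build_test_matrix(base, candidates):
    """Baseline first, then the baseline plus exactly one extra option."""
    return [base] + [f"{base} {option}" for option in candidates]

for cmdline in build_test_matrix(BASE, CANDIDATES):
    print(cmdline)
```

Changing a single option per run keeps each timing difference attributable to that option alone, at the cost of not detecting interactions between options.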


[The iGPU gets 512 MB of system RAM.
With the fastest settings so far, mentioned above, GPU-Z reports the following memory usage:
dedicated: 25 MB
dynamic: around 130-145 MB (fluctuating)]


Thanks.
13) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1652536)
Posted 18 days ago by Dirk Sadowski (Project donor)
6th test run:
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 8 4 1 : Elapsed 1010.911 secs CPU 13.922 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 16 2 1 : Elapsed 1075.417 secs CPU 13.328 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 16 4 1 : Elapsed 1038.886 secs CPU 20.156 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 32 2 1 : Elapsed 1016.773 secs CPU 14.016 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 32 4 1 : Elapsed 1030.911 secs CPU 21.281 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 2 1 : Elapsed 1017.331 secs CPU 13.313 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp -tune 1 64 4 1 : Elapsed 1029.548 secs CPU 19.203 secs

(according to stderr, without -ffa_* settings it ran with the defaults -ffa_block 1024 -ffa_block_fetch 512):
-unroll 5 -hp -tune 1 8 4 1 : Elapsed 1080.439 secs CPU 21.109 secs
-unroll 5 -hp -tune 1 16 2 1 : Elapsed 1121.769 secs CPU 25.063 secs
-unroll 5 -hp -tune 1 16 4 1 : Elapsed 1059.497 secs CPU 16.422 secs
-unroll 5 -hp -tune 1 32 2 1 : Elapsed 1122.313 secs CPU 23.500 secs
-unroll 5 -hp -tune 1 32 4 1 : Elapsed 1073.097 secs CPU 21.172 secs
-unroll 5 -hp -tune 1 64 2 1 : Elapsed 1082.752 secs CPU 25.016 secs
-unroll 5 -hp -tune 1 64 4 1 : Elapsed 1048.965 secs CPU 16.547 secs
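
The two lists above can be compared pairwise to see how much the explicit -ffa_* values save for each -tune setting. A minimal sketch, with the elapsed times copied from the 6th test run and an illustrative dictionary layout:

```python
# Elapsed times (seconds), keyed by the two values that follow "-tune 1"
# (the trailing "1" is the same in every run).
with_ffa = {      # -ffa_block 1472 -ffa_block_fetch 368
    (8, 4): 1010.911, (16, 2): 1075.417, (16, 4): 1038.886,
    (32, 2): 1016.773, (32, 4): 1030.911,
    (64, 2): 1017.331, (64, 4): 1029.548,
}
default_ffa = {   # defaults: -ffa_block 1024 -ffa_block_fetch 512
    (8, 4): 1080.439, (16, 2): 1121.769, (16, 4): 1059.497,
    (32, 2): 1122.313, (32, 4): 1073.097,
    (64, 2): 1082.752, (64, 4): 1048.965,
}

# Seconds saved by the explicit -ffa_* values over the defaults.
saved = {key: default_ffa[key] - with_ffa[key] for key in with_ffa}

for (a, b), seconds in sorted(saved.items(), key=lambda item: -item[1]):
    share = seconds / default_ffa[(a, b)]
    print(f"-tune 1 {a} {b} 1: {seconds:7.3f} s saved ({share:.1%})")
```

Every setting is faster with the explicit -ffa_* values, though each figure comes from a single run and so carries some noise.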

What should I do now?
Was this all I could adjust, or are there more settings (-x N)?
Have I now found the fastest params, or should I run each of the 3 fastest twice more for confirmation?

Thanks.
14) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1651139)
Posted 22 days ago by Dirk Sadowski (Project donor)
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.2
Platform Name: Intel(R) OpenCL
Platform Vendor: Intel(R) Corporation
Platform Extensions: cl_intel_dx9_media_sharing cl_khr_byte_addressable_store cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics

Platform Name: Intel(R) OpenCL
Number of devices: 2

Device Type: CL_DEVICE_TYPE_CPU
Device ID: 32902
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Max clock frequency: 1990Mhz
Address bits: 14757395255531667488
Max memory allocation: 536838144
Image support: Yes
Max number of images read arguments: 480
Max number of images write arguments: 480
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 480
Max size of kernel argument: 3840
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: Yes
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: No
  Round to +ve and infinity: No
  IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 1048576
Global memory size: 2147352576
Constant buffer size: 131072
Max number of constant args: 480
Local memory type: Global
Local memory size: 32768
Error correction support: 0
Profiling timer resolution: 512
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: Yes
Queue properties:
  Out-of-Order: Yes
  Profiling : Yes
Platform ID: 00DE1DA0
Name: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
Vendor: Intel(R) Corporation
Driver version: 3.0.1.10878
Profile: FULL_PROFILE
Version: OpenCL 1.2 (Build 76413)
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing

Device Type: CL_DEVICE_TYPE_GPU
Device ID: 32902
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 256
Max work items[1]: 256
Max work items[2]: 256
Max work group size: 256
Preferred vector width char: 1
Preferred vector width short: 1
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 0
Max clock frequency: 200Mhz
Address bits: 14757395255531667520
Max memory allocation: 341835776
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: No
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 2097152
Global memory size: 1367343104
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Error correction support: 0
Profiling timer resolution: 80
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: No
Queue properties:
  Out-of-Order: No
  Profiling : Yes
Platform ID: 00DE1DA0
Name: Intel(R) HD Graphics
Vendor: Intel(R) Corporation
Driver version: 10.18.10.3408
Profile: FULL_PROFILE
Version: OpenCL 1.2
Extensions: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_accelerator cl_intel_motion_estimation
15) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1650987)
Posted 22 days ago by Dirk Sadowski (Project donor)
Thanks.

I ran an MB WU without '-v 0' and got:
OpenCL Platform Name: Intel(R) OpenCL
Number of devices: 1
Max compute units: 4
Max work group size: 256
Max clock frequency: 200Mhz
Max memory allocation: 341835776
Cache type: Read/Write
Cache line size: 64
Cache size: 2097152
Global memory size: 1367343104
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Queue properties:
  Out-of-Order: No
Name: Intel(R) HD Graphics
Vendor: Intel(R) Corporation
Driver version: 10.18.10.3408
Version: OpenCL 1.2
Extensions: cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_depth_images cl_khr_dx9_media_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_khr_gl_sharing cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_accelerator cl_intel_motion_estimation


Is this enough, or should I also run clInfo?

So should I run the above ...
-tune 1 8 4 1
-tune 1 16 2 1
-tune 1 16 4 1
-tune 1 32 2 1
-tune 1 32 4 1
-tune 1 64 2 1
-tune 1 64 4 1
... (with and without -ffa_*), or more or fewer?

Thanks.
16) Message boards : Number crunching : Intel® iGPU MB bench test run (Message 1650850)
Posted 23 days ago by Dirk Sadowski (Project donor)
I have an Intel® Celeron® J1900 (quad-core) with Intel® HD Graphics (iGPU).
The iGPU has just 4 compute units.

I would like to find the fastest settings for the MB (MultiBeam, SETI@home) Intel® OpenCL application.

Example:
An AR 0.415014 WU takes:
Run time 5 hours 48 min 22 sec
CPU time 7 min 3 sec

http://lunatics.kwsn.net/index.php?module=Downloads;catd=45

Which bench test tool should I use, and with which test WU (ideally a quick WU that still gives a meaningful bench test result)?

Which cmdline settings should I start with and compare?

Thanks.
17) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1650843)
Posted 23 days ago by Dirk Sadowski (Project donor)
clInfo ?

Does anyone know a trusted download site?

Thanks.
18) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1650833)
Posted 23 days ago by Dirk Sadowski (Project donor)
Thanks.

The bench test run with 2 (with and without -ffa_*) x 7 different params will take some time; I will post the results ASAP.

I always use the '-v 0' setting.
So I only have this:
OpenCL Platform Name: Intel(R) OpenCL
Number of devices: 1
Max compute units: 4
Max work group size: 256
Max clock frequency: 200Mhz
Max memory allocation: 341835776
Name: Intel(R) HD Graphics
Vendor: Intel(R) Corporation
Driver version: 10.18.10.3408
Version: OpenCL 1.2

Is this what's wanted?

Max clock frequency: 200 Mhz?
Max memory allocation: 326 MB?

Intel® Celeron® Processor J1900
- says:
Graphics Base Frequency: 688 MHz
Graphics Max Dynamic Frequency: 854 MHz

The iGPU gets 512 MB of system RAM (set in the BIOS). (With 'auto', the iGPU gets just 256 MB.)

So are these two stderr values wrong?
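
For reference, the 326 MB figure follows directly from the byte count in stderr. In OpenCL, the "max memory allocation" device property (`CL_DEVICE_MAX_MEM_ALLOC_SIZE`) is the largest single buffer that may be allocated, not the total memory available to the device. The sketch below converts the values; the global memory size is taken from the fuller stderr output posted elsewhere in this thread, and the variable names are illustrative.

```python
MIB = 1024 * 1024

max_alloc_bytes = 341835776      # "Max memory allocation" from stderr
global_mem_bytes = 1367343104    # "Global memory size" from the fuller stderr output

max_alloc_mib = max_alloc_bytes / MIB
global_mem_mib = global_mem_bytes / MIB

print(f"Max memory allocation: {max_alloc_mib:.0f} MiB")
print(f"Global memory size:    {global_mem_mib:.0f} MiB")
print(f"Global / max alloc:    {global_mem_bytes / max_alloc_bytes:.0f}x")
```

Here the maximum single allocation is exactly one quarter of the reported global memory size, so the 326 MB value is a per-buffer limit and does not necessarily contradict the 512 MB BIOS setting. This sketch does not explain the 200 MHz clock value.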
19) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1650422)
Posted 24 days ago by Dirk Sadowski (Project donor)
Thanks to all.
All your messages are appreciated.
You have the most knowledge and experience.
I don't want anyone to feel offended if I say that I would like to follow Joe's instructions. :-)


5th test run (3rd run for 1472/736; 2nd and 3rd runs for all others):

-unroll 5 -ffa_block 1472 -ffa_block_fetch 736 -hp
Elapsed 1043.571 secs CPU 13.578 secs
Elapsed 1040.732 secs CPU 21.859 secs
Elapsed 1042.560 secs CPU 15.500 secs
Elapsed 1042.288 secs (average)

-unroll 5 -ffa_block 736 -ffa_block_fetch 736 -hp
Elapsed 1043.800 secs CPU 19.016 secs
Elapsed 1042.366 secs CPU 27.672 secs
Elapsed 1058.247 secs CPU 21.547 secs
Elapsed 1048.138 secs (average)

-unroll 5 -ffa_block 2208 -ffa_block_fetch 736 -hp
Elapsed 1043.915 secs CPU 12.172 secs
Elapsed 1042.393 secs CPU 20.453 secs
Elapsed 1045.627 secs CPU 12.984 secs
Elapsed 1043.978 secs (average)

-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp
Elapsed 1041.475 secs CPU 17.234 secs
Elapsed 1025.846 secs CPU 21.359 secs
Elapsed 1028.439 secs CPU 14.469 secs
Elapsed 1031.92 secs (average)


Which settings/params should I test now with the fastest, 1472/368?

Thanks.
20) Message boards : Number crunching : Intel® iGPU AP bench test run (Message 1649566)
Posted 26 days ago by Dirk Sadowski (Project donor)
If I understood correctly, I should test 1470/490 instead of 1473/491.

Winner so far:
1st run of: -unroll 5 -ffa_block 1472 -ffa_block_fetch 736 -hp : Elapsed 1043.571 secs CPU 13.578 secs
2nd run of: -unroll 5 -ffa_block 1472 -ffa_block_fetch 736 -hp : Elapsed 1040.732 secs CPU 21.859 secs

4th run:
-unroll 5 -ffa_block 736 -ffa_block_fetch 736 -hp : Elapsed 1043.800 secs CPU 19.016 secs
-unroll 5 -ffa_block 2208 -ffa_block_fetch 736 -hp : Elapsed 1043.915 secs CPU 12.172 secs
-unroll 5 -ffa_block 2944 -ffa_block_fetch 736 -hp : Elapsed 1067.664 secs CPU 13.672 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 1472 -hp : Elapsed 1057.448 secs CPU 15.375 secs
-unroll 5 -ffa_block 1470 -ffa_block_fetch 490 -hp : Elapsed 1073.687 secs CPU 16.313 secs
-unroll 5 -ffa_block 1472 -ffa_block_fetch 368 -hp : Elapsed 1041.475 secs CPU 17.234 secs
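
The tested -ffa_block / -ffa_block_fetch pairs can be compared arithmetically. The sketch below only reports the ratio, whether it is a whole number, and whether the block size is even. The post does not state why 1470/490 is preferred over 1473/491, so treating any of these properties as a requirement of the app is an assumption; the function name is hypothetical.

```python
# (ffa_block, ffa_block_fetch) pairs mentioned in this post.
pairs = [(1472, 736), (736, 736), (2208, 736), (2944, 736),
         (1472, 1472), (1470, 490), (1472, 368), (1473, 491)]

def describe(block, fetch):
    """Return (block // fetch, exact multiple?, block is even?)."""
    ratio, remainder = divmod(block, fetch)
    return ratio, remainder == 0, block % 2 == 0

for block, fetch in pairs:
    ratio, exact, even = describe(block, fetch)
    print(f"-ffa_block {block} -ffa_block_fetch {fetch}: "
          f"ratio {ratio}, exact multiple: {exact}, even block: {even}")
```

Both 1470/490 and 1473/491 have an exact ratio of 3; the only difference this check shows is that 1473 is odd while 1470 is even.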

Hm, OK, which params are the fastest now? ;-)

Which params should I test now?

Thanks.



Copyright © 2015 University of California