benchmark stock vs. optimized -- problem

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1732477 - Posted: 7 Oct 2015, 6:55:39 UTC - in response to Message 1732475.  

Yeah, x41zc will be a tad better, and current work going in (some from Petri33 and some of my own) will end up being a fairly big step.

I'll certainly be trying to replicate this soon, if only to completely understand whether that behaviour is an artefact of the exe's linkage or something environmental (and so how to avoid it).

For the purposes of general usage, you may consider x41zc usable live. Our delays in development are mostly just to do with a massive switch to a different, team-oriented development model, which is giving us a culture shock on top of real-life demands. On the other hand, a sanity-check Linux build is due shortly, which I would add to downloads. All that is, though, is a baseline build-environment check for all the new work going in. It will have some minor cosmetic tweaks but be more or less functionally equivalent, so using what's there now is fine if it works for you.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1732477

Gene Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1732761 - Posted: 8 Oct 2015, 6:14:11 UTC

Here are the benchmark comparisons. Just to restate the system specs: AMD FX-4300, 3.8 GHz, Linux 3.16.7, Nvidia GTX 650, driver 352.41. The benchmarks were run with BOINC suspended and no other significant GPU activity; the times for the x41g application are from the S@H task status page (1 instance per GPU and 1 core reserved for the x41g task) since it was not happy in "benchmark" context.

Abbreviated task names:
x41g = setiathome_x41g_x86_64-pc-linux-gnu_cuda32
7.08 = setiathome_7.08_x86_64-pc-linux-gnu__opencl_nvidia_sah
x41zc = setiathome_x41zc_x86_64-pc-linux-gnu_cuda60

Three work units:
15ap... = a small one, estimated computation size ~28000 GF
03no... = a medium one, ~69000 GF
04mr... = the largest, ~78000 GF

   App.        15ap     03no     04mr    (times in seconds)
  -----       ------   ------   ------
   x41g        855      1752     2107
   7.08        792      1432     1867
   x41zc       766      1374     1770
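
For anyone who prefers ratios to raw seconds, here is a minimal Python sketch that turns the table into speedups relative to x41g. The times are copied from the table; the dictionary layout and the function name `speedup_vs` are illustrative choices only and are not part of the benchmark script.

```python
# Run times in seconds, copied from the benchmark table in this post.
TIMES = {
    "x41g":  {"15ap": 855, "03no": 1752, "04mr": 2107},
    "7.08":  {"15ap": 792, "03no": 1432, "04mr": 1867},
    "x41zc": {"15ap": 766, "03no": 1374, "04mr": 1770},
}


def speedup_vs(baseline, times=TIMES):
    """Return {app: {work_unit: baseline_time / app_time}} for every other app."""
    base = times[baseline]
    return {
        app: {wu: base[wu] / t for wu, t in runs.items()}
        for app, runs in times.items()
        if app != baseline
    }


if __name__ == "__main__":
    for app, ratios in speedup_vs("x41g").items():
        row = "  ".join(f"{wu}: {r:.2f}x" for wu, r in ratios.items())
        print(f"{app:6s} {row}")
```

On these numbers x41zc comes out roughly 1.12x to 1.28x faster than x41g, with the largest gain on the medium work unit.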


I did not use any command line parameters for any of the applications. From what I could find at kwsn.info there aren't any for the MB apps, only for AP.

Thanks to all who were interested and/or offered advice. I don't think I have anything further to add to this thread unless there is a specific test I can try or something to "grep" out of the strace logs.
ID: 1732761

Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 451
Credit: 431,396,357
RAC: 553
Australia
Message 1732837 - Posted: 8 Oct 2015, 12:59:27 UTC

I don't think jason made any command-line parameters available - configuration is in mbcuda.cfg.
Soli Deo Gloria
ID: 1732837

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1732843 - Posted: 8 Oct 2015, 13:30:42 UTC - in response to Message 1732837.  

I don't think jason made any command-line parameters available - configuration is in mbcuda.cfg.

Command-line parameters are available for the Windows versions, although the configuration file approach is recommended (it's easier to manage, it's self-documented, and it allows finer control down to the device level).

I don't know for certain how much of that carries over to the Linux builds - some of the parameters, like process priority control, would need a separate coding branch, even if the bulk of the science run uses a common code-base. Also, x41zc has greater configurability than x41g.
ID: 1732843

Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1732852 - Posted: 8 Oct 2015, 13:48:05 UTC - in response to Message 1732843.  
Last modified: 8 Oct 2015, 14:19:58 UTC

I don't think jason made any command-line parameters available - configuration is in mbcuda.cfg.

Command-line parameters are available for the Windows versions, although the configuration file approach is recommended (it's easier to manage, it's self-documented, and it allows finer control down to the device level).

I don't know for certain how much of that carries over to the Linux builds - some of the parameters, like process priority control, would need a separate coding branch, even if the bulk of the science run uses a common code-base. Also, x41zc has greater configurability than x41g.


The OpenCL-app for Linux does have optional settings (see README), and if someone has root access, like Gene, it should be no problem to write a script that handles process priority control. The app does not offer this option because it is made for Linux userland. Therefore no reduction of "nice"ness is possible.

Of course, someone could use BOINC's cc_config.xml and change <no_priority_change>0</no_priority_change> to "1", which should give "normal priority" (nice 0) to all apps that BOINC starts, but in general it should not be necessary.
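
To make the cc_config.xml change concrete, here is a minimal Python sketch that builds the XML with the flag set. It assumes the usual layout in which client options sit inside an <options> element under <cc_config>; the helper name `build_cc_config` is made up for illustration, and the sketch only prints the result rather than touching an existing file, which may already hold other options that need to be kept.

```python
import xml.etree.ElementTree as ET


def build_cc_config(no_priority_change=True):
    """Return a minimal cc_config.xml document as a string."""
    root = ET.Element("cc_config")
    # Assumption: client options are nested inside an <options> element.
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "no_priority_change")
    flag.text = "1" if no_priority_change else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config())
```

The client normally has to re-read its config files (or be restarted) before a change like this takes effect.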
_\|/_
U r s
ID: 1732852

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1732858 - Posted: 8 Oct 2015, 14:17:19 UTC - in response to Message 1732852.  


The OpenCL-app for Linux does have optional settings

As I understand it, the Linux port implements all the oclFFT configs along with buffer size management for PulseFind. These options could make some difference, so it is definitely worth trying some tuning if you are so inclined.
ID: 1732858

Gene Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1733201 - Posted: 9 Oct 2015, 16:45:14 UTC


The OpenCL-app for Linux does have optional settings (see README)


@Urs

That link points to options [nographics version verbose] which don't seem to apply to the application. Maybe you had a different README in mind.

@Raistmer

In the benchmark mode I tried (wildly optimistic...) the following command line:
-unroll 4 -tune 1 128 8 1 -oclFFT_plan 256 16 1024 -ffa_block 8192 -ffa_block_fetch 4096

The x41zc app rejected all parameters as "bad arg"; the 7.08 stock app gave no messages but appears to have silently ignored all the cmd line parameters. The run time was the same as without any cmd line.
ID: 1733201

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1733203 - Posted: 9 Oct 2015, 16:53:50 UTC - in response to Message 1733201.  
Last modified: 9 Oct 2015, 17:01:12 UTC

Yes, we were having a quiet conversation about CUDA apps, comparing Windows with Linux, when a couple of people jumped in with numbers and documents relating to OpenCL. Different language, different application, different command line entirely. Skip the previous two posts.

And the next one. Not for the apps you're trying to run.
ID: 1733203

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1733205 - Posted: 9 Oct 2015, 16:58:41 UTC - in response to Message 1733201.  


The x41zc app rejected all parameters as "bad arg"; the 7.08 stock app gave no messages but appears to have silently ignored all the cmd line parameters. The run time was the same as without any cmd line.


You supplied args for AstroPulse, not for the MultiBeam OpenCL app.
Look in the ReadMe for OpenCL MultiBeam for the available options.
As a hint, look at Mike's suggestions for high-end AMD cards:
-sbs 256 -period_iterations_num 40 -spike_fft_thresh 2048 -tune 1 64 1 4 -oclfft_tune_gr 256 -oclfft_tune_lr 16 -oclfft_tune_wg 256 -oclfft_tune_ls 512 -oclfft_tune_bn 64 -oclfft_tune_cw 64 -hp -no_cpu_lock
ID: 1733205

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1733206 - Posted: 9 Oct 2015, 17:01:01 UTC - in response to Message 1733203.  
Last modified: 9 Oct 2015, 17:03:01 UTC

Skip the previous two posts.

Richard, when the OP posted his results, 7.08 was included in the result table. I would suggest re-reading the conversation before making such radical "suggestions".
The OP wants to test different apps under Linux. Some of them allow additional params. So what exactly do you regard as "to skip", then?
ID: 1733206

Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1733210 - Posted: 9 Oct 2015, 17:04:45 UTC - in response to Message 1733206.  

All right, I'll go back to sleep and let him work it out for himself.

And dream of the time when all SETI apps have a common command line syntax.
ID: 1733210

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1733211 - Posted: 9 Oct 2015, 17:05:46 UTC - in response to Message 1733210.  

All right, I'll go back to sleep and let him work it out for himself.

And dream of the time when all SETI apps have a common command line syntax.

I would rather dream of the time when they use the hardware better, or ultimately find something :P
ID: 1733211

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1733234 - Posted: 9 Oct 2015, 19:43:03 UTC - in response to Message 1733210.  

All right, I'll go back to sleep and let him work it out for himself.

And dream of the time when all SETI apps have a common command line syntax.


That would be the dream of the dreams.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1733234

Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1733257 - Posted: 9 Oct 2015, 21:25:03 UTC - in response to Message 1733201.  


The OpenCL-app for Linux does have optional settings (see README)


@Urs

That link points to options [nographics version verbose] which don't seem to apply to the application. Maybe you had a different README in mind.


Yes, sorry Gene, the filenames are somewhat similar. Second try:

OpenCL README
_\|/_
U r s
ID: 1733257

Gene Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1733445 - Posted: 10 Oct 2015, 16:18:51 UTC


OpenCL README


@Urs

Thanks. That one looks good. Give me a couple of days to study it and run a few benchmark comparisons (I have benchmark results in place for default, i.e. no command line options). I don't expect a big improvement - the GTX 650 is "low end" with only 2 compute units, but 1 GB of video memory might allow some gain from buffering efficiency.

@Raistmer

Your list of cmd options is noted. I will try to "merge" that list with the guidance from the OpenCL README and do some benchmark testing. There are a lot(!) of combinations and not all variables are independent.
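
One way to keep the number of combinations manageable is to generate the command lines mechanically and benchmark each in turn. A minimal Python sketch, assuming only two of the options from Raistmer's list are varied: the values 256 and 40 come from that list, while 128 and 20 are hypothetical alternatives added purely for illustration, and the name `command_lines` is invented here.

```python
from itertools import product

# Candidate values per option. 256 and 40 come from the list quoted earlier in
# the thread; 128 and 20 are hypothetical alternatives added for illustration.
GRID = {
    "-sbs": [128, 256],
    "-period_iterations_num": [20, 40],
}


def command_lines(grid):
    """Yield one option string per combination of values in the grid."""
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        yield " ".join(f"{n} {v}" for n, v in zip(names, values))


if __name__ == "__main__":
    for line in command_lines(GRID):
        print(line)
```

A full grid grows multiplicatively with every option added, so varying one or two options at a time against the no-options baseline is probably the practical limit on a slow card.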
ID: 1733445

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1733484 - Posted: 10 Oct 2015, 19:20:57 UTC - in response to Message 1732475.  

Maybe you are not worrying too much about the x41g crashes but something crossed my mind.

selinux

"/tmp/BENCH/KWSN-Bench-Linux-MBv7"...


/tmp is sort of special in that anyone and anything can write there. It could be that SELinux is forcing extra restrictions on code running from there.

Why the other apps don't crash, I don't know. x41g seems to be tripping over a null pointer, so there could be some syscall that failed, with x41g not checking for that failure. Maybe the newer apps or the newer CUDA/OpenCL runtime don't make the same syscall, or they handle the failure better.
ID: 1733484

Gene Project Donor
Joined: 26 Apr 99
Posts: 150
Credit: 48,393,279
RAC: 118
United States
Message 1734565 - Posted: 16 Oct 2015, 4:36:03 UTC


/tmp is sort of special


@juha
I have dealt with the well-known "tmp" properties, i.e. all content is lost on power down and updatedb does not index its contents. And I've even (rarely) had the write permission for "other" mysteriously go away so that, as user, I could not write files there. To satisfy myself that the benchmarking in /tmp/.. was not a factor I replicated everything in /root/BENCH/ with the same results - just the x41g application fails.

Even further experimenting with everything in /home/gene/BENCH/.. and running benchmark as user, and as root, had the same SEGV exit. Then, somewhat surprising, I set up all the files that the benchmark script does (init_data.xml, work_unit.sah, and the executable) and tried running the executable directly in that directory. SIGSEGV !! I even copied the init_data.xml from the boinc/slots directory of a running x41g application instead of using the .template and that also failed.
I am left with the suspicion that boinc does something else, besides passing the init_data.xml and work_unit.sah files, before passing control to the application. Initializing or priming the GPU?? Or, can boinc trap the SIGSEGV and recover from it?? Both hypotheses based on actions not available running in the benchmark script.
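
For reference, a parent process can at least see that a child died from SIGSEGV, even though it cannot recover the child. A minimal Python sketch of running an executable by hand and classifying how it ended; the function name `run_and_classify` is invented for this example, and it relies on the POSIX convention that a negative return code means "killed by that signal".

```python
import signal
import subprocess
import sys


def run_and_classify(argv, cwd=None):
    """Run argv and return (returncode, description of how it ended)."""
    proc = subprocess.run(argv, cwd=cwd)
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        name = signal.Signals(-proc.returncode).name
        return proc.returncode, f"killed by {name}"
    return proc.returncode, f"exited with status {proc.returncode}"


if __name__ == "__main__":
    # Pass the real command as arguments, e.g. the app executable to test.
    if len(sys.argv) > 1:
        print(run_and_classify(sys.argv[1:]))
```

Run from the directory holding init_data.xml and work_unit.sah, this distinguishes a segfault from an ordinary error exit without needing strace.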

Meanwhile, I have been comparing stock 7.08 and Beta x41zc (which the benchmark script handles without complaint!). I tried a few random options for 7.08 from the suggestions given earlier in this thread. Times were either unchanged or longer compared to the no-options reference. My GTX650 is low performance to begin with, so maybe the options only help on higher performance GPUs.
ID: 1734565

BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1734567 - Posted: 16 Oct 2015, 4:51:52 UTC - in response to Message 1734565.  

Do you have CUDA libraries in SETI@home directory (<BOINC_Data>\projects\setiathome.berkeley.edu\) (where you also have the x41g executable)?
Do you have all those same CUDA libraries in /BENCH/... (next to x41g executable)?
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
ID: 1734567

Juha
Volunteer tester
Joined: 7 Mar 04
Posts: 388
Credit: 1,857,738
RAC: 0
Finland
Message 1734754 - Posted: 16 Oct 2015, 21:25:54 UTC - in response to Message 1734565.  

fails SEGV SIGSEGV failed SIGSEGV


So much for that theory.

boinc does something else, besides passing the init_data.xml and work_unit.sah files, before passing control to the application.


BOINC does the following:

1. if API version is < 7.5 adds "--device x" to command line
2. if API version is >= 6.0 sets up memory mapped file, otherwise shared memory segment
3. redirects stdout to /dev/null
4. sets LD_LIBRARY_PATH to "../../$project:.:../.." (on Mac that plus ":/usr/local/cuda/lib/" and sets DYLD_LIBRARY_PATH to same)
5. redirects stderr to stderr file
6. sets priority/niceness to 19 (CPU apps) or 10 (the rest)
7. for CPU apps sets scheduling policy to SCHED_BATCH

It might also change some setting for itself which then gets inherited by the science app. But I haven't seen anything like that nor can I think of any such setting.

I don't think anyone ever sets <api_version> in app_info.xml so you can safely assume that the client thinks the API version is something really old.
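
To try approximating that environment outside the client, here is a rough Python sketch covering steps 1, 3, 4, 5 and 6 of the list. It is not BOINC's code: the function names, the `stderr.txt` file name and the `project_dir` argument (whatever should be substituted for $project) are assumptions made for illustration, and steps 2 and 7 (the memory mapped file and SCHED_BATCH) are left out entirely.

```python
import os
import subprocess


def launch_plan(app, project_dir, device=0, gpu_app=True, base_env=None):
    """Return (argv, env, niceness) approximating steps 1, 4 and 6."""
    env = dict(os.environ if base_env is None else base_env)
    # Step 4: library search path, relative to the working directory.
    env["LD_LIBRARY_PATH"] = f"../../{project_dir}:.:../.."
    # Step 1: old API versions get the device number on the command line.
    argv = [app, "--device", str(device)]
    # Step 6: niceness 19 for CPU apps, 10 for everything else.
    niceness = 10 if gpu_app else 19
    return argv, env, niceness


def launch(app, project_dir, cwd, device=0, gpu_app=True):
    """Start the app with stdout discarded and stderr appended to a file."""
    argv, env, niceness = launch_plan(app, project_dir, device, gpu_app)
    # Assumption: "stderr.txt" stands in for the stderr file of step 5.
    with open(os.path.join(cwd, "stderr.txt"), "ab") as err:
        return subprocess.Popen(
            argv,
            cwd=cwd,
            env=env,
            stdout=subprocess.DEVNULL,          # step 3
            stderr=err,                         # step 5
            preexec_fn=lambda: os.nice(niceness),
        )
```

If x41g behaves differently when started this way than when started bare from the benchmark script, the difference is most likely in one of these steps, the library path being the obvious first suspect.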
ID: 1734754

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1734760 - Posted: 16 Oct 2015, 22:02:03 UTC - in response to Message 1734567.  

Do you have CUDA libraries in SETI@home directory (<BOINC_Data>\projects\setiathome.berkeley.edu\) (where you also have the x41g executable)?
Do you have all those same CUDA libraries in /BENCH/... (next to x41g executable)?


In addition to Juha's excellent comment...

If the lib files that are needed, libcudart and libcufft (with their respective version-dependent names), are not found on the library search path (echo $LD_LIBRARY_PATH), they should be in the ../BENCH/. directory, NOT in the .../BENCH/APPS/. directory where the applications are.

Right now you may have the library files in your boinc/seti directory but not in BENCH/.

... and ... The applications and the libraries need the 'executable' bit set (chmod ugo+x filename).
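
A quick way to check both points at once is a small script like the following. It is only a sketch: the function name `check_bench_dir` and the glob patterns are assumptions based on the library names mentioned above, and the directory to check is passed in rather than guessed.

```python
import glob
import os


def check_bench_dir(bench_dir, patterns=("libcudart*", "libcufft*")):
    """Return {pattern: [(path, is_executable), ...]} for files in bench_dir."""
    report = {}
    for pattern in patterns:
        matches = sorted(glob.glob(os.path.join(bench_dir, pattern)))
        report[pattern] = [(path, os.access(path, os.X_OK)) for path in matches]
    return report


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for pattern, found in check_bench_dir(target).items():
        if not found:
            print(f"{pattern}: MISSING")
        for path, executable in found:
            print(f"{path}: {'ok' if executable else 'not executable'}")
```

Point it at the BENCH directory itself, not at the APPS subdirectory.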

:|
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1734760


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.