Profiling AMD/ATI OpenCL systems (Linux)

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1799262 - Posted: 28 Jun 2016, 23:56:33 UTC

I have 4 AMD 64 systems, one of which has not yet been moved to a 64-bit OS. The oldest system has a 4800+ on an M2R32-MVP motherboard with, I believe, 8 GB of RAM. It is intended to be the LAN server, and will do backups and other stuff. The smallest system was a refurbished business system from Dell, which is running the 32-bit OS and has 3 GB of RAM and an HD-5450 graphics card (a little bit of single-precision floating-point GPU). What was the only other system is a 78LMT-USB3 with an AMD B24 processor and 8 GB of RAM. It normally runs with an HD-6450 video card, which has a bit more single-precision performance than the 5450.

The reason for the 2 GPUs is not games or graphics, just an opportunity to learn about number crunching on GPUs. I thought BOINC would be a good place to learn.

There is an application I have in mind which could be heavy on floating point in an environment where energy efficiency is useful. In terms of GPU, I thought an R7-250 was the unit to use, and I recently had an opportunity to pick up one of these cards (2 GB of video RAM). In the machine with the HD6450, having the R7-250 installed instead of the HD6450 results in a minuscule difference in BOINC units. Well, the 78LMT only has PCIe 2.0 slots, and the RAM isn't that fast (1333?). Why such a small difference between 2 quite different GPUs?

Okay, I went out and bought some more of my application hardware: a motherboard with a PCIe 3.0 slot for a video card and support for RAM up to DDR3-2400 (which is overclocking, and I am not doing that yet), an SSD instead of a disk (SATA3), 16 GB of RAM (capable of DDR3-2400) and an A10-7860K APU. It's a pretty minimal Debian/stable install, with not many daemons running. I think BOINC is getting 99.9+% of the time, with 3 of 4 processors devoted to BOINC and the 8 GPU cores also working on a BOINC job. (No systemd on any of my machines.)

At some point I want to add the R7-250 card in the PCIe-x16 slot, as it is supposed to be meant to work with the A10 APU.

How does one profile this, to find out where the choke points are?

In terms of coding, I prefer FORTRAN, but I have done C, C++ and others.
ID: 1799262

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20264
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1799268 - Posted: 29 Jun 2016, 0:26:00 UTC - in response to Message 1799262.  

What number crunching are you wanting to do?...

If this is all for s@h, then you will be limited by your GPU capability. Look up the Lunatics optimised apps for the latest developments.

If this is for your own code, then you should already have a good idea about what data flows and what computation flows you need...


So... What are you wanting to try and with what?

(In the Linux world, you could try looking up the details for the kernel performance tooling called "perf" to see real-time real world kernel activity...)


Hope that helps with a few thoughts.

Happy fast crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1799268

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799326 - Posted: 29 Jun 2016, 7:33:58 UTC - in response to Message 1799262.  
Last modified: 29 Jun 2016, 7:34:30 UTC


> How does one profile this, to find out where the choke points are?
>
> In terms of coding, I prefer FORTRAN, but I have done C, C++ and others.

The SETI app is written in C/C++/OpenCL/CUDA (for NV), so sorry, no use for FORTRAN here.
The sources are available here: https://setisvn.ssl.berkeley.edu/svn/branches/sah_v7_opt
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1799326

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1799738 - Posted: 30 Jun 2016, 20:42:22 UTC - in response to Message 1799268.  

No use for FORTRAN? Ah gee, .... :-)

I would presume SETI has just about all the help it needs on number crunching.

No, I want to move towards doing my own code on GPUs. My M.Eng. (1986) was a dynamical system (a ring of soap bubbles linked by pipes as a model for grain growth in solids). I have done a fair amount of GIS/GPS related stuff since then, which is probably where I will start as I now have 2 or 3 GPS units and 9/10 DOF IMUs. Hence, with a measured location of the GPS antenna, the known vehicle dimensions and a measured orientation in space, you can calculate all of the vehicle/ground locations.

I wanted to get used to GPU calculations with BOINC, and then move into this other stuff.
ID: 1799738

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799743 - Posted: 30 Jun 2016, 21:09:47 UTC - in response to Message 1799738.  
Last modified: 30 Jun 2016, 21:10:02 UTC

> No use for FORTRAN? Ah gee, .... :-)
>
> I would presume SETI has just about all the help it needs on number crunching.

If you are willing to port to FORTRAN and achieve better performance with that port, sure, SETI will need it ;)
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1799743

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1799746 - Posted: 30 Jun 2016, 21:46:31 UTC - in response to Message 1799743.  

I don't think I have the time to port to FORTRAN. The closest thing I can relate to that was porting Perl-4.x to QNX 2.x many years ago, which was mostly hacking out the networking code, as QNX had a much different idea of networks than UNIX/Perl assumed.

I've read bits and pieces about the first ports of FORTRAN to UNIX, and I once wrote something to translate ratfor into FORTRAN-77. The world view of FORTRAN is orthogonal to C, C++, ..., so I don't think it would be a one or two night venture.

Thanks for the invite. :-)
ID: 1799746

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799760 - Posted: 30 Jun 2016, 22:31:26 UTC - in response to Message 1799746.  
Last modified: 30 Jun 2016, 22:32:39 UTC

> Which was mostly hacking out the networking code, as QNX had a much different idea of networks than UNIX/Perl assumed.

QNX has a POSIX layer AFAIK (my experience is based on QNX 4.x and Momentics, though).
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1799760

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1799819 - Posted: 1 Jul 2016, 1:49:52 UTC - in response to Message 1799760.  

QNX 4.x was a big change from 2.x.

It was an interesting work environment. I had RT-11 on a PDP 11 and some K11 based machines, and then PCs running QNX-2. 8 inch floppies and the really old Winchester disks for the PDP type stuff. Mind you, at least I didn't have to work with paper tape there. I did once have paper tape for a teletype. I guess I am old. :-)

I know of machines with toggle switches on the front panel, I never got to use that kind of hardware.
ID: 1799819

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1799924 - Posted: 1 Jul 2016, 14:50:42 UTC - in response to Message 1799819.  

How BOINC sets things up is different from how Debian sets up BOINC. And my first problem, is how to get GPU jobs to continue running. I have more computers than monitors and mice, and so I am swapping monitors and mice between 2 systems. Sometimes (always) I end up typing on the keyboard of the computer running these GPU jobs, and the GPU job dies or ends or suspends or ....

I tried starting Xeyes, and assigning focus to that application (I think it ignores the keyboard), but the GPU job eventually stopped and nothing GPU happened since.

I've seen some writeups about setting permissions to work with Debian. But, as I intend to eventually move to Devuan, I brought this up with Devuan.

It seems that Jaromil (of Devuan) has ported/forked/updated a privilege escalation tool called sup https://git.devuan.org/jaromil/sup . This tool is configured at compile time, for the purpose of escalating privileges of known programs. As it involves compiling, it isn't a general solution for GPU access with BOINC (too many people can't compile), but it might be useful to some people.
ID: 1799924

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1799939 - Posted: 1 Jul 2016, 15:57:47 UTC - in response to Message 1799924.  
Last modified: 1 Jul 2016, 16:00:03 UTC

Check whether BOINC is configured to run GPU tasks only on an idle system. In that case the GPU task will be suspended when you type or move the mouse.

EDIT: A GPU app doesn't actually require escalated privileges.
It's a question of enqueueing enough work and then just waiting for completion. That can be done at low priority if (big if) the algorithm allows.
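
One way to check the idle-only setting without the manager GUI is to read the local preferences override file. The following is only a sketch under stated assumptions: the path is where a Debian-packaged client is assumed to keep its data directory, and `run_gpu_if_user_active` is the preference tag assumed to control this behaviour. If the file or the tag is absent, the web or manager preferences apply and this script cannot tell you what they are.

```python
import os
import xml.etree.ElementTree as ET

# Assumption: data directory used by the Debian boinc-client package.
OVERRIDE_PATH = "/var/lib/boinc-client/global_prefs_override.xml"


def gpu_runs_while_user_active(xml_text):
    """Return True/False for <run_gpu_if_user_active>, or None if it is not set."""
    root = ET.fromstring(xml_text)
    node = root.find(".//run_gpu_if_user_active")
    if node is None or node.text is None:
        return None
    # BOINC stores booleans as 0 or 1.
    return node.text.strip() != "0"


def check_override_file(path=OVERRIDE_PATH):
    """Read the override file if it exists and is readable; otherwise return None."""
    try:
        with open(path, encoding="utf-8") as fh:
            return gpu_runs_while_user_active(fh.read())
    except OSError:
        return None


if __name__ == "__main__":
    state = check_override_file()
    if state is None:
        print("No local override found; check the web or manager preferences.")
    elif state:
        print("GPU tasks may run while the machine is in use.")
    else:
        print("GPU tasks are suspended whenever keyboard or mouse activity is seen.")
```

A result of `False` would match the symptom described above, where typing on the machine's keyboard stops the GPU job.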
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1799939

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20264
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1800044 - Posted: 2 Jul 2016, 0:59:42 UTC
Last modified: 2 Jul 2016, 1:01:59 UTC

???!...

You are running Linux so...


Yes, you can use a physical keyboard-video-mouse switch.

Or...

Look up using "ssh -X", or look up how to use the boinc client to access boinc on other hosts on your network.


Aside: Note that boinc will need to be in a video group to be able to access any GPU resource. A distro install will set that for you. Whereas the Berkeley installer will leave you to set that manually. Do not run boinc as root! There is no need to.
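
The group membership mentioned in the aside can be checked in a few lines. This is a minimal sketch, assuming the client runs under an account named `boinc` and that the group in question is called `video`; both names are assumptions and may differ on your distro. `user_in_group` is a hypothetical helper, not part of BOINC.

```python
import grp
import pwd


def user_in_group(user, group):
    """True if `user` is a supplementary or primary member of `group`."""
    try:
        g = grp.getgrnam(group)
    except KeyError:
        return False  # no such group on this system
    if user in g.gr_mem:
        return True
    try:
        # The primary group is not listed in gr_mem, so compare GIDs as well.
        return pwd.getpwnam(user).pw_gid == g.gr_gid
    except KeyError:
        return False  # no such user on this system


if __name__ == "__main__":
    # Assumed account and group names; adjust to your install.
    if user_in_group("boinc", "video"):
        print("boinc is in the video group")
    else:
        print("boinc is NOT in the video group (or one of the names is different)")
```

If the check fails, adding the account to the group is a job for your distro's usual user management tools, followed by a restart of the client.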

Happy fast crunchin,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1800044

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1800270 - Posted: 3 Jul 2016, 2:58:27 UTC - in response to Message 1800044.  

I probably could use a KVM switch, if I had one. Over too many years, I have extra keyboards (many are PS/2), some extra mice and no extra monitors. All the monitors that work, are in use.

I try to set things up as commonly as I can (with 2 Debian stable64, 1 Debian stable32 and 1 Gentoo64), and all the users in question are in all the right groups. The Debian32 is going to get updated to Debian64, and the Gentoo is going back to Debian (bad nightmare) at some point. And if it makes a difference, the whole bunch will go to Devuan after that.

But, I can't do much on my tracing things, because as it stands now if I disconnect the monitor (turned off) the GPU job at some point finishes or dies (I've seen defunct a couple of times in ps output). CPU jobs run fine. In the near term, I want to run that system headless. But, in the future it will have a monitor (when it is mounted inside a motor vehicle), and possibly if I can afford another monitor that isn't used all the time.

I have seen some web pages about headless GPU work, I'm not sure how they fit in with how Debian installs BOINC and SETI.

I started installing perf related tools. I see that there is a perf for 4.2 and for 4.6, but not for 4.5 (which I have on 2 machines). DKMS did not run without error under 4.6 on this machine which has the APU (A10-7860) in it.
ID: 1800270

ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20264
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1800326 - Posted: 3 Jul 2016, 12:55:54 UTC - in response to Message 1800270.  
Last modified: 3 Jul 2016, 13:05:45 UTC

Hey, it's good to experiment!

I tried a physical KVM switch a few years ago. One plus point is that it keeps things conceptually direct and 'easy'. Minus points are the mass of cables needed and that you can still get yourself confused as to which machine you have (inadvertently) selected!

Since then, and since all my machines are on a Gbit network, I stay settled and comfortable on just the one physical machine and use the various remote connections available on Linux to work with all the other machines (including for my normal work where the machines can be hundreds of miles away).

> I try to set things up as commonly as I can (with 2 Debian stable64, 1 Debian stable32 and 1 Gentoo64), and all the users in question are in all the right groups. The Debian32 is going to get updated to Debian64, and the Gentoo is going back to Debian


Very good idea to settle on what you like until you wish to explore further.

I learnt through the DEMO-Linux, Mandrake, Mandriva, Mageia route for the 'desktop' Linux distros. Various others were used on odd occasions also.

Alongside that selection, I've moved almost entirely over to customised Gentoo systems, often with no desktop at all!... Works very well but that is quite a lot of new things to learn to move from the other very well packaged 'desktop' Linux distros.

For getting GPUs to work for Boinc:

  • Use the version of boinc that is packaged with the Linux distro;
  • Check the system power settings to set what happens when there is no human activity!
  • Check whichever forums for your distro and the questions forums here for any comments about running boinc.



In the Windows world, some users have to make a dummy video plug to fool the graphics card electronics (or the Windows drivers?) into believing a physical monitor is connected. Is that still the case or has that problem 'gone away'?

Myself, I have not seen that problem for any of the GPU crunching done here across a mix of VGA, DVI and HDMI graphics cards.

Note: Boinc can get disconnected from the GPU if you are running Boinc as yourself (as a user) and you then log off or allow the system to go into sleep or some other power-saving mode.


Go with 4.6 kernel systems for perf and the various other kernel tools... A lot has happened since the previous versions!

And see:

All the performance monitoring tools in Linux and their functionalities in a single pic


Hope that gives a few clues,

Happy cool crunchin'
Martin


See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1800326

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1801673 - Posted: 9 Jul 2016, 2:19:38 UTC - in response to Message 1800326.  

I started with Linux with the 1.2.13 kernel. I will say Slackware, but Yggdrasil (or something like that) seems to be involved. I tried SuSE for a while before finding Debian. Gentoo is a nightmare for me. I will be moving to Devuan, as I cannot stomach systemd.

But that has nothing (or very little) to do with BOINC/SETI. My two most up to date systems, are a Gigabyte 78LMT-USB3 with an AMD B24 processor (and a PCIe-x16 2.x slot) and an ASRock A88m-ITX/ac with an A10-7860k (and a PCIe-x16 3.0 slot). The Gigabyte has 8 GB of RAM, and the ASRock has 16 GB (both dual channel). The RAM in the ASRock is rated to 2400 (g.Skill). As a general rule, I don't overclock. I will consider it here.

For GPUs I can devote to this, I have a HD6450 and a R7-250 (both AMD/ATI). I can do runs (Debian/stable with as little systemd as I can get away with) with the two machines, with any reasonable setup you suggest. If I have to compile PERF tools to instrument things (and you tell me what works), I will modify equipment and conditions to generate data.

But, from casual examination of the data, I think it is silly that there seems to be almost no difference between the HD6450 and the R7-250 in the Gigabyte machine. Or, given that 1333 is the default memory rate if you don't overclock, that there is so little difference between the Gigabyte and the ASRock machines (without installing the R7-250 in the ASRock, since the A10-7860 is meant to work with an R7-250).

I had run across an article about someone (from Apple?) who had been involved with GPU work, and who recently got involved in trying to do things with the RPi GPU (after Broadcom published data about the GPU).

What this looks like to me, is that something like the ATLAS BLAS library needs to get built. For every single operation, a person has to test many different ways to do things, because nobody publishes enough data. ATLAS seems to spend a lot of time on partial loop unrolling, I think the big thing with these GPUs is just getting data into and out of the board. Or in the case of AMD APUs, into the GPU unit.
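
To put rough numbers on the "getting data into and out of the board" suspicion, the theoretical peak rates of the links mentioned in this thread can be worked out from their published signalling rates. This is a back-of-envelope sketch only: the figures are theoretical maxima (real transfers achieve less), the 256 MiB buffer is a hypothetical size chosen for illustration, and none of it shows whether a given BOINC app is actually transfer-bound. That still needs a profiler.

```python
def pcie_gb_per_s(gt_per_s, payload_bits, encoded_bits, lanes):
    """Theoretical one-direction PCIe bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return gt_per_s * payload_bits / encoded_bits / 8 * lanes


def ddr_gb_per_s(mt_per_s, channels, bus_bytes=8):
    """Theoretical DDR bandwidth in GB/s, assuming 64-bit (8-byte) channels."""
    return mt_per_s * bus_bytes * channels / 1000


def transfer_ms(n_bytes, gb_per_s):
    """Time in milliseconds to move n_bytes at the given rate."""
    return n_bytes / (gb_per_s * 1e9) * 1e3


LINKS = {
    # PCIe 2.0: 5 GT/s per lane with 8b/10b encoding (Gigabyte board slot).
    "PCIe 2.0 x16": pcie_gb_per_s(5.0, 8, 10, 16),
    # PCIe 3.0: 8 GT/s per lane with 128b/130b encoding (ASRock board slot).
    "PCIe 3.0 x16": pcie_gb_per_s(8.0, 128, 130, 16),
    "DDR3-1333 dual channel": ddr_gb_per_s(1333, 2),
    "DDR3-2400 dual channel": ddr_gb_per_s(2400, 2),
}

if __name__ == "__main__":
    buf = 256 * 1024 * 1024  # hypothetical buffer size, not taken from any SETI app
    for name, rate in LINKS.items():
        print(f"{name:24s} {rate:6.2f} GB/s  {transfer_ms(buf, rate):6.2f} ms per 256 MiB")
```

On these figures PCIe 2.0 x16 peaks at 8 GB/s per direction and PCIe 3.0 x16 at roughly 15.75 GB/s, while dual-channel DDR3-1333 tops out near 21.3 GB/s. If two very different GPUs score alike, comparing measured transfer times against these ceilings is one way to see whether the bus, the host memory, or something else entirely is the limit.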
ID: 1801673

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1801676 - Posted: 9 Jul 2016, 2:24:24 UTC - in response to Message 1800326.  

Oh, my issues in swapping cables while machines are running.

Brand new Debian/jessie install (too much systemd, which may not be at fault, so I am just grumbling). But, it seems that Debian installs things such that the system was either suspending or hibernating. So, for a while I am not switching cables (or turning off the monitor), just to try and see if having 8 GPU cores and 3 CPU cores working on BOINC/SETI is significantly different from just 3 CPU cores doing BOINC/SETI.

I would not have thought that such an experiment was even needed.
ID: 1801676

Fortran
Joined: 20 Apr 06
Posts: 16
Credit: 13,398,872
RAC: 31
Canada
Message 1801996 - Posted: 10 Jul 2016, 20:49:11 UTC - in response to Message 1801676.  

I stumbled across an OpenCL-capable profiler, supposedly Open Source, called LTPV (github.com/LTPV/LTPV). There is some ASCII text there, but there is a lot of binary for an Open Source package. I had never heard of wap for compiling and installing a project before.

In any event, to use it is:

ltpv /path/to/opencl/program

and it seems it overwrites OpenCL calls, so it should probably be run on a copy of an OpenCL program.

I guess BOINC/SETI calls programs, which might be what to do here (find a SETI program to call). But, is there perhaps something like "Hello world", and maybe something "trivial" (fitting a polynomial to data?) that people know of, to get used to this profiler?
ID: 1801996

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1802006 - Posted: 10 Jul 2016, 22:34:55 UTC - in response to Message 1801996.  
Last modified: 10 Jul 2016, 22:35:58 UTC

Usually a profiler comes with a few examples of how to use it with some test project to profile.
You could also use the OpenCL SDK samples for that. The unmodified SETI app gives too much profiler data to serve as a good sample. One can artificially reduce the number of icfft pairs (either in the app's code or in a sample test task) to make the profiler output more manageable.
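
For the "trivial" job asked about earlier in the thread (fitting a polynomial to data), it helps to have a CPU reference whose answer is known before any OpenCL version goes under a profiler. The following is a minimal sketch in plain Python, not taken from any SDK: `polyfit` is a hypothetical helper that solves the least-squares normal equations by Gaussian elimination. The sums that build the matrix are the part one might later move into a GPU kernel, with the kernel's output compared against this reference.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations: A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1 + 2 * x + 3 * x * x for x in xs]  # exact quadratic, so the fit is known
    print(polyfit(xs, ys, 2))
```

Normal equations become ill-conditioned for high degrees or widely spread x values, so this is only suitable as a small, known-answer test case, in the same spirit as reducing the number of icfft pairs to keep profiler output manageable.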
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1802006


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.