OpenCL apps are available for download on Lunatics

Message boards : Number crunching : OpenCL apps are available for download on Lunatics

Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12100
Credit: 124,084,991
RAC: 44,334
United Kingdom
Message 1340276 - Posted: 22 Feb 2013, 16:54:32 UTC - in response to Message 1340271.  

If so, could somebody put together a recommended toolset and programming guide for study?

Like the nVidia OpenCL Ocean Demo (from the GPU compute SDK)?

I'll see if I can find a pull quote from there (I should have it somewhere, else my download link is going to get some more exercise) that I can pass on to Raistmer.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1340278 - Posted: 22 Feb 2013, 16:57:07 UTC - in response to Message 1340276.  

I'll see if I can find a pull quote from there (I should have it somewhere, else my download link is going to get some more exercise) that I can pass on to Raistmer.
Will have to find it again myself, though I remember it using an OpenGL callback in a render loop for OpenCL 1.0, since OpenCL didn't get event callbacks until 1.1.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12100
Credit: 124,084,991
RAC: 44,334
United Kingdom
Message 1340280 - Posted: 22 Feb 2013, 17:04:17 UTC - in response to Message 1340278.  

Will have to find it again myself, though I remember it using an OpenGL callback in a render loop for OpenCL 1.0, since OpenCL didn't get event callbacks until 1.1.

Pity the OpenCL code samples aren't broken out into a separate download - looks like I need the whole GPU Computing [sic] SDK download (that's the same thing, I take it?)

None of
CUDA C/C++ Code Samples
DirectCompute Code Samples
CUDA Library Samples

look hopeful.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1340324 - Posted: 22 Feb 2013, 17:54:25 UTC - in response to Message 1340280.  
Last modified: 22 Feb 2013, 18:01:38 UTC

Pity the OpenCL code samples aren't broken out into a separate download - looks like I need the whole GPU Computing [sic] SDK download (that's the same thing, I take it?)

None of
CUDA C/C++ Code Samples
DirectCompute Code Samples
CUDA Library Samples

look hopeful.


Things have moved around. The oclSimpleGL sample implements the callbacks using OpenGL, and the ocean simulation has moved to DirectCompute, using DirectX render callbacks. In either case (both workably demonstrate callbacks and very low CPU usage alongside high GPU usage), the basic computation changes from the old style:

Setup device & kernels etc.
main processing loop:
    Do some computation on GPU
    Hard synchronise (implicit or explicit)
    Transfer back to CPU for postprocessing

to:
Setup device & kernels etc. Context with Vsync off where applicable.
Setup callbacks (OpenGL, DirectX, OpenCL 1.1+ or CUPTI)
Sleep loop:
    Check for exit requests etc.

with callback ('render') function:
    Do some computation on GPU
    Transfer back to CPU & postprocess where needed
    [Optionally] draw something


The only core change is that processing moves out of the main polled/blocking loop and into a callback, invoked either by a graphics context or by a callback interface such as the one OpenCL 1.1 or CUPTI supplies, which needs no graphics context at all.
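For illustration only, here is a minimal Python sketch of the callback-driven shape outlined above. It does not touch a GPU: the hypothetical `FakeDevice` class runs each "kernel" on a worker thread and then invokes a completion function, loosely standing in for the `CL_COMPLETE` callback that OpenCL 1.1 lets you register with `clSetEventCallback`. The names `FakeDevice`, `square_all` and `process` are made up for this sketch.

```python
import queue
import threading


class FakeDevice:
    """Hypothetical stand-in for a GPU command queue (no real GPU involved).

    Work runs on a background thread; when each item finishes, the supplied
    completion callback is invoked, much as a driver fires an event callback.
    """

    def __init__(self):
        self._jobs = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def enqueue(self, kernel, data, on_complete):
        self._jobs.put((kernel, data, on_complete))

    def _run(self):
        while True:
            kernel, data, on_complete = self._jobs.get()
            on_complete(kernel(data))  # "Do some computation", then call back


def square_all(block):
    """Toy 'kernel': squares every element of a block."""
    return [x * x for x in block]


def process(blocks, poll_interval=0.05):
    """Callback-driven processing: the main thread never blocks on the device."""
    if not blocks:
        return []
    device = FakeDevice()
    results = []
    lock = threading.Lock()
    finished = threading.Event()
    remaining = len(blocks)

    def on_complete(result):
        # Callback: "transfer back to CPU & postprocess where needed".
        nonlocal remaining
        with lock:
            results.append(sum(result))
            remaining -= 1
            if remaining == 0:
                finished.set()

    for block in blocks:
        device.enqueue(square_all, block, on_complete)

    # Sleep loop: wake occasionally to check for exit requests, nothing more.
    while not finished.wait(timeout=poll_interval):
        pass  # a real app would check for quit/suspend requests here
    return results


print(process([[1, 2], [3]]))  # [5, 9]
```

In real OpenCL the callback runs on a driver thread, and the specification (1.2 onwards, at least) says blocking OpenCL calls such as `clFinish` inside a callback have undefined behaviour, so a real callback should stay short and hand heavier postprocessing elsewhere.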
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 12100
Credit: 124,084,991
RAC: 44,334
United Kingdom
Message 1340338 - Posted: 22 Feb 2013, 18:34:22 UTC - in response to Message 1340324.  

I found the CUDA FFT Ocean simulation, thanks. It still uses OpenGL in the public SDK I downloaded; it may be the developer NDA version that has moved to DirectX. I'll look for the oclSimpleGL sample over the weekend while the servers are dark (got to have an alternative displacement activity), and try to find OpenCL in the documentation while I'm down there.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1340345 - Posted: 22 Feb 2013, 18:47:13 UTC - in response to Message 1340338.  
Last modified: 22 Feb 2013, 18:50:08 UTC

I found the CUDA FFT Ocean simulation, thanks. It still uses OpenGL in the public SDK I downloaded; it may be the developer NDA version that has moved to DirectX. I'll look for the oclSimpleGL sample over the weekend while the servers are dark (got to have an alternative displacement activity), and try to find OpenCL in the documentation while I'm down there.


Yep, right direction. A couple of extra notes that may or may not be helpful:

- If an app using the CUDA runtime crashes, the call stack usually shows that a number of DirectX interface functions are being used internally. Those are how the CUDA runtime sets up its 'traditional looking' synchronisation, so closely emulating the CUDA method on Windows + OpenCL would involve those rather complex interfaces.

- The OpenCL/OpenGL methods use the open-source FreeGLUT library, which, if you dig deeper into it, opens further options for 'roll your own' synchronisation techniques. (I have those on the board for 'other purposes' down the road.)

Obviously both these examples are geared toward demonstrating graphics interop, where low CPU usage is considered important: in games, for example, the CPU might need to be handling the user interface, AI, sound etc. For dedicated GPGPU, 'our case' is a bit unusual in that our users tend to expect both maximum processing speed and low CPU usage. While these demos illustrate that's possible (especially the DirectCompute/DirectX Ocean one), neither is trivial, of course. The OpenCL + OpenGL example, though, is probably much simpler than dealing with DirectX directly, and OpenCL 1.1's mechanism would be an easier route to the same end.
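To make the CPU-usage point concrete, here is a small Python sketch, an illustration only that does not touch a GPU or any driver. It contrasts a waiting thread that busy-polls for completion with one that blocks on an event until signalled. The sleeping `simulated_kernel` is a hypothetical stand-in for GPU work. On a typical machine the busy-poll variant should report CPU time close to the wait time and the blocking variant close to zero, though exact figures will vary.

```python
import threading
import time


def simulated_kernel(done, seconds):
    """Hypothetical GPU kernel stand-in: just sleeps, then signals completion."""
    time.sleep(seconds)
    done.set()


def cpu_seconds_while_waiting(spin, seconds=0.2):
    """Return the CPU time this process burns while waiting for one 'kernel'."""
    done = threading.Event()
    worker = threading.Thread(target=simulated_kernel, args=(done, seconds))
    start = time.process_time()  # CPU time (user + system) of the whole process
    worker.start()
    if spin:
        while not done.is_set():  # busy-poll: keeps a core busy the whole time
            pass
    else:
        done.wait()  # blocking wait: the thread sleeps until signalled
    worker.join()
    return time.process_time() - start


if __name__ == "__main__":
    print(f"busy-poll:     {cpu_seconds_while_waiting(spin=True):.3f} s CPU")
    print(f"blocking wait: {cpu_seconds_while_waiting(spin=False):.3f} s CPU")
```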
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 450
Credit: 320,412,996
RAC: 121,978
Australia
Message 1340566 - Posted: 24 Feb 2013, 22:02:08 UTC

Just an update on my experiment: 275.33 still exhibited high CPU usage with AP, even though the release notes state that it only supports OpenCL 1.0. But then I realised that reverting to an older driver isn't a feasible solution, because that means JG's most recent MB applications are no longer supported. Granted, these are slow cards anyway, but with the new applications the speed improvements are measurable on the order of several minutes for short WUs and dozens of minutes for the longer ones.

With the servers off-line, there was an opportunity for the ATI AP WUs to be processed without any heavy CPU usage for their whole duration. During this time, the host produced at least one valid AP result. If this trend continues, it looks like I'll have to limit the number of NV AP WUs running to one at a time, to avoid excessive CPU usage.

This isn't a problem for most users, I imagine, as most people wouldn't be using such an old CPU in their hosts.
Soli Deo Gloria
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1340582 - Posted: 24 Feb 2013, 22:22:44 UTC - in response to Message 1340566.  

Just an update on my experiment: 275.33 still exhibited high CPU usage with AP, even though the release notes state that it only supports OpenCL 1.0. But then I realised that reverting to an older driver isn't a feasible solution, because that means JG's most recent MB applications are no longer supported. Granted, these are slow cards anyway, but with the new applications the speed improvements are measurable on the order of several minutes for short WUs and dozens of minutes for the longer ones.

Which card? The usual pattern is that CUDA 2.3 is best for pre-Fermi, 4.2 for Fermi and 5.0 for Kepler. Going to an older driver and an earlier CUDA version may not lose you much speed. You might want to bench that, though.
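That rule of thumb can be written down as a small lookup. This is a hedged Python sketch: keying it on the CUDA compute capability (1.x pre-Fermi, 2.x Fermi, 3.x Kepler) is an added assumption, the function name `suggested_cuda_build` is hypothetical, and the result is only a starting point before benchmarking.

```python
# Rule of thumb from this thread, keyed by CUDA compute capability major
# version: 1.x = pre-Fermi, 2.x = Fermi, 3.x = Kepler.
BUILD_BY_ARCH = {
    1: ("pre-Fermi", "2.3"),
    2: ("Fermi", "4.2"),
    3: ("Kepler", "5.0"),
}


def suggested_cuda_build(compute_capability):
    """Return (architecture, CUDA build) for a (major, minor) compute capability.

    Only a starting point: benchmark on your own card before settling.
    """
    major, _minor = compute_capability
    if major not in BUILD_BY_ARCH:
        raise ValueError(f"no rule of thumb for compute capability {major}.x")
    return BUILD_BY_ARCH[major]


print(suggested_cuda_build((2, 1)))  # GT 430 (compute capability 2.1): ('Fermi', '4.2')
```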

With the servers off-line, there was an opportunity for the ATI AP WUs to be processed without any heavy CPU usage for their whole duration. During this time, the host produced at least one valid AP result. If this trend continues, it looks like I'll have to limit the number of NV AP WUs running to one at a time, to avoid excessive CPU usage.

Since you are already running the latest BOINC alpha, you can make use of the 'max concurrent' feature; see this.

This isn't a problem for most users, I imagine, as most people wouldn't be using such an old CPU in their hosts.


Or they might have an old rig and give it a new lease of life with a good GPU. In those cases it might be best if the CPUs didn't crunch at all.

William the Silent
A person who won't read has no advantage over one who can't read. (Mark Twain)
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 450
Credit: 320,412,996
RAC: 121,978
Australia
Message 1340586 - Posted: 24 Feb 2013, 22:34:18 UTC
Last modified: 24 Feb 2013, 22:34:42 UTC

It's a Fermi-based card - GT 430. I'm aware of the differences in each build designed for a different CUDA version. It may all be moot, anyway - with the age of the overall system, I may retire it by the end of the year.

It's not the number of GPU tasks that's the problem, it's the number of NV AP tasks specifically. I think modifying the app_info.xml will be best for limiting those tasks in my particular case.

And yes, what you describe is what I've done - taken an old computer and filled it with GPUs. Still not really cost-effective compared with buying a cheap, modern platform, which is why I doubt many others have done it.
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4264
Credit: 261,879,592
RAC: 354,500
United States
Message 1340593 - Posted: 24 Feb 2013, 22:39:33 UTC

I also gave 275.33 a try with an old NV8800. Still used a full core, and the CUDA 23 tasks were not affected. If anything, the MB tasks improved slightly.
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1340595 - Posted: 24 Feb 2013, 22:42:25 UTC - in response to Message 1340586.  

It's not the number of GPU tasks that's the problem, it's the number of NV AP tasks specifically. I think modifying the app_info.xml will be best for limiting those tasks in my particular case.


If you change the <count> value to 1, you can run only that one AP task. However, if you leave both the MB and AP counts at 0.5 and use app_config.xml to limit max_concurrent to 1 for AP only, you can run an AP alongside an MB, or two MBs at a time, but never two APs at the same time. Just saying: that might be more to your liking throughput-wise.
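A hedged Python sketch of that suggestion: it writes an app_config.xml that caps only the AP app at one running task (leaving the 0.5 `<count>` values in app_info.xml alone), then lists which two-task mixes fit on one GPU under that cap. The app name `astropulse_v6` is an assumption; copy the exact `<name>` from your own app_info.xml. As I understand it, BOINC applies `max_concurrent` across the whole host rather than per GPU, so the pair check models a single card only. The helper names are hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Return app_config.xml text capping one app's concurrently running tasks."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


def allowed_pairs(gpu_usage, ap_limit):
    """Two-task mixes that fit on one GPU, given each task's GPU share
    (the app_info.xml <count>) and an AP-only concurrency cap."""
    pairs = []
    for mix in itertools.combinations_with_replacement(["AP", "MB"], 2):
        fits_gpu = len(mix) * gpu_usage <= 1.0
        if fits_gpu and mix.count("AP") <= ap_limit:
            pairs.append(mix)
    return pairs


# "astropulse_v6" is an assumed app name; check your own app_info.xml.
print(build_app_config("astropulse_v6", max_concurrent=1))
print(allowed_pairs(gpu_usage=0.5, ap_limit=1))  # [('AP', 'MB'), ('MB', 'MB')]
```

With both counts at 0.5, AP+AP is the only mix the cap rules out, which is exactly the trade-off described above.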
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 450
Credit: 320,412,996
RAC: 121,978
Australia
Message 1340600 - Posted: 24 Feb 2013, 23:35:14 UTC
Last modified: 24 Feb 2013, 23:35:29 UTC

I'm aware of how app_info.xml works. This isn't a case of multiple WUs on a single card; it's a case of a single WU on each of multiple cards.
Floyd
Joined: 19 May 11
Posts: 524
Credit: 1,870,625
RAC: 0
United States
Message 1340625 - Posted: 25 Feb 2013, 3:08:45 UTC - in response to Message 1340586.  
Last modified: 25 Feb 2013, 3:24:22 UTC

Wedge009
"And yes, what you describe is what I've done - taken an old computer and filled it with GPUs. Still not really cost-effective compared with buying a cheap, modern platform, which is why I doubt many others have done it."

There are probably more people doing that than you realize. I just did it with an old (2005) Dell GX620 I was given: added a GTX 430 GPU and am letting it work what it can. Total cost $36.00.
With the economy being what it is, I can't afford to do anything else.
RAC 611, 10,410 total credits; every little bit helps the cause...
Wedge009
Volunteer tester
Joined: 3 Apr 99
Posts: 450
Credit: 320,412,996
RAC: 121,978
Australia
Message 1340627 - Posted: 25 Feb 2013, 4:04:11 UTC

Glad to know I'm not the only one who likes tinkering with old hardware as well as new! (:
TBar
Volunteer tester

Joined: 22 May 99
Posts: 4264
Credit: 261,879,592
RAC: 354,500
United States
Message 1342297 - Posted: 2 Mar 2013, 4:34:04 UTC - in response to Message 1340593.  

I also gave 275.33 a try with an old NV8800. Still used a full core, and the CUDA 23 tasks were not affected. If anything, the MB tasks improved slightly.

Seems 266.58 is the ticket for Windows XP. Running r1764, there are just a few little red spikes every so often. We'll see how it goes.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 31288
Credit: 65,710,071
RAC: 26,155
Germany
Message 1345113 - Posted: 10 Mar 2013, 22:11:23 UTC
Last modified: 10 Mar 2013, 22:13:33 UTC

Those suffering driver restarts with MB_r_1761/r_1764 can now download the HD5 version from my site.

Downloads
With each crime and every kindness we birth our future.
cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1345133 - Posted: 10 Mar 2013, 22:53:54 UTC - in response to Message 1345113.  

Those suffering driver restarts with MB_r_1761/r_1764 can now download the HD5 version from my site.

Fantastic! Should this fix restarts with the 13.1 driver?
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 31288
Credit: 65,710,071
RAC: 26,155
Germany
Message 1345137 - Posted: 10 Mar 2013, 23:12:30 UTC - in response to Message 1345133.  

Fantastic! Should this fix restarts with the 13.1 driver?


Yes, it should.
But you will still lose time.

cov_route
Joined: 13 Sep 12
Posts: 342
Credit: 10,270,618
RAC: 0
Canada
Message 1345138 - Posted: 10 Mar 2013, 23:17:18 UTC - in response to Message 1345137.  

Yes, it should.
But you will still lose time.

I just did a quick bench with short test WUs and it is faster than r390 on 12.8.

I'm going to make the switch then go up to 13.1 and see what happens.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 31288
Credit: 65,710,071
RAC: 26,155
Germany
Message 1345139 - Posted: 10 Mar 2013, 23:20:39 UTC - in response to Message 1345138.  

I just did a quick bench with short test WUs and it is faster than r390 on 12.8.

I'm going to make the switch then go up to 13.1 and see what happens.


Of course it's faster.
But 13.1 doesn't work properly.
On my GPU only 5 out of 12 registers were in use.
That means a heavy slowdown.
But it still produces valid results.

Maybe running more instances reduces the slowdown a little.



 
©2018 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.