OpenCL apps are available for download on Lunatics

Message boards : Number crunching : OpenCL apps are available for download on Lunatics

Previous · 1 · 2 · 3 · 4 · 5 · 6 · 7 · 8 . . . 9 · Next
Author Message
Profile MikeProject donor
Volunteer tester
Avatar
Send message
Joined: 17 Feb 01
Posts: 24062
Credit: 33,035,395
RAC: 24,509
Germany
Message 1334313 - Posted: 3 Feb 2013, 17:30:16 UTC

You can't compare the HD5 app version against the non-HD5 app.
Stay tuned, we will release a new HD5 app soon.

____________

Profile MikeProject donor
Volunteer tester
Avatar
Send message
Joined: 17 Feb 01
Posts: 24062
Credit: 33,035,395
RAC: 24,509
Germany
Message 1334356 - Posted: 3 Feb 2013, 18:52:53 UTC
Last modified: 3 Feb 2013, 18:55:21 UTC

For those having trouble running Cat 13.1:
Just uninstall the APP runtime and use the previous version under the same drivers.

You might want to use a file manager like System Commander.

Shut down BOINC.

Go to the C:/AMD/support/13-1_vista_win7_win8_64_dd_ccc_whql/Packages/apps/OpenCL64 directory and run OpenCL.msi to uninstall the APP runtime.

Now follow these steps carefully.


Restart your computer.

Now unpack Cat 12.8 by clicking on it, but abort the installation and instead run OpenCL.msi from the 12.8 package.

Restart your computer again.

Check with GPU-Z or GPU Caps Viewer that the OpenCL box is checked.

Now you are running Cat 13.1 with the OpenCL runtime from 12.8.
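As a rough cross-check of the GPU-Z / GPU Caps Viewer step, you can also probe for the OpenCL runtime programmatically. This is only a sketch: it confirms that an OpenCL ICD loader library is findable on the system, not that the runtime actually works with your card.

```python
import ctypes
import ctypes.util

def opencl_runtime_present():
    """True if an OpenCL loader library can be found and loaded on
    this system -- a rough stand-in for the GPU-Z 'OpenCL' checkbox.
    On Windows this looks for OpenCL.dll, on Linux for libOpenCL."""
    name = ctypes.util.find_library("OpenCL")
    if name is None:
        return False
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False
```

A `False` here after the 12.8 OpenCL.msi step would suggest the runtime did not install correctly.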

Hope this helps.
____________

Profile cov_route
Avatar
Send message
Joined: 13 Sep 12
Posts: 293
Credit: 6,974,720
RAC: 14,454
Canada
Message 1334446 - Posted: 3 Feb 2013, 22:08:42 UTC - in response to Message 1334356.

For those having trouble running Cat 13.1:
Just uninstall the APP runtime and use the previous version under the same drivers....


I don't suppose it would be as easy as: start with a clean install of 12.8, run the 13.1 installer but don't check the box for "APP SDK Runtime"? I.e., keep the previous APP runtime?

Profile MikeProject donor
Volunteer tester
Avatar
Send message
Joined: 17 Feb 01
Posts: 24062
Credit: 33,035,395
RAC: 24,509
Germany
Message 1334451 - Posted: 3 Feb 2013, 22:23:08 UTC - in response to Message 1334446.
Last modified: 3 Feb 2013, 22:24:40 UTC

For those having trouble running Cat 13.1:
Just uninstall the APP runtime and use the previous version under the same drivers....


I don't suppose it would be as easy as: start with a clean install of 12.8, run the 13.1 installer but don't check the box for "APP SDK Runtime"? I.e., keep the previous APP runtime?


It's not that hard at all.
It took me 20 minutes.

Some might want to keep 13.1 for new games.

Edit: It's for those who already have 13.1 installed.
____________

Profile TRuEQ & TuVaLu
Volunteer tester
Avatar
Send message
Joined: 4 Oct 99
Posts: 470
Credit: 18,234,256
RAC: 22,881
Sweden
Message 1334453 - Posted: 3 Feb 2013, 23:14:36 UTC

First task done with 1761:
http://setiathome.berkeley.edu/result.php?resultid=2821308526
Speed is about the same as 1363, and it works, which is the important thing.


Good Work!!!

Wedge009
Volunteer tester
Avatar
Send message
Joined: 3 Apr 99
Posts: 323
Credit: 142,405,787
RAC: 258,445
Australia
Message 1339804 - Posted: 20 Feb 2013, 20:25:21 UTC
Last modified: 20 Feb 2013, 20:25:57 UTC

I've noticed a warning message that's been bothering me since r1761:

WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiathome.berkeley.edu\AP_clFFTplan_ATIRV770_32768_r1766.bin, continue with recompile...

On two of my hosts, I get this message with every single work-unit (and, obviously, I haven't purged any binary caches). This is what I've been able to find, relating to this warning message:

  • Affected hosts both run Windows XP.
  • Affected hosts have this warning message on HD 4000 GPUs. HD 6000 GPUs (only other ATI GPUs I have) are not affected.
  • This is only an issue on ATI AstroPulse. NV AstroPulse and ATI MB are not affected.
  • Issue appears to be present in r1761, r1764, and r1766. It was not present in r1636.

I'm just wondering, is this going to be a problem for me? Are these hosts really going to spend CPU cycles recompiling for every single AP WU?
____________
Soli Deo Gloria

Profile MikeProject donor
Volunteer tester
Avatar
Send message
Joined: 17 Feb 01
Posts: 24062
Credit: 33,035,395
RAC: 24,509
Germany
Message 1339855 - Posted: 20 Feb 2013, 22:03:49 UTC

It's not really a problem.
Nothing has changed regarding binary creation since r1316, AFAIK.

Yes, it will create the binaries on each AP unit.

Maybe you changed something else.
BOINC/driver, etc.?

____________

Profile Raistmer
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 16 Jun 01
Posts: 3419
Credit: 46,514,731
RAC: 10,087
Russia
Message 1339857 - Posted: 20 Feb 2013, 22:13:39 UTC - in response to Message 1339804.
Last modified: 20 Feb 2013, 22:17:34 UTC

I've noticed a warning message that's been bothering me since r1761:

WARNING: can't open binary kernel file for oclFFT plan: C:\Documents and Settings\All Users\Application Data\BOINC/projects/setiathome.berkeley.edu\AP_clFFTplan_ATIRV770_32768_r1766.bin, continue with recompile...

On two of my hosts, I get this message with every single work-unit (and, obviously, I haven't purged any binary caches). This is what I've been able to find, relating to this warning message:

  • Affected hosts both run Windows XP.
  • Affected hosts have this warning message on HD 4000 GPUs. HD 6000 GPUs (only other ATI GPUs I have) are not affected.
  • This is only an issue on ATI AstroPulse. NV AstroPulse and ATI MB are not affected.
  • Issue appears to be present in r1761, r1764, and r1766. It was not present in r1636.

I'm just wondering, is this going to be a problem for me? Are these hosts really going to spend CPU cycles recompiling for every single AP WU?


Stop BOINC.
Locate that file. Does it exist?
Rename it.
Start BOINC.
Is the file recreated?
Compare the recreated one with the saved one: are they identical?
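The final comparison step can be done by hand, but a small script makes it repeatable. A minimal sketch, assuming you substitute the real paths of the saved and recreated `AP_clFFTplan_*.bin` files:

```python
import hashlib

def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents,
    reading in chunks so large kernel binaries are handled cheaply."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def files_identical(path_a, path_b):
    """True if both files have byte-identical contents."""
    return file_digest(path_a) == file_digest(path_b)
```

If the recreated binary differs from the saved one, that points at non-deterministic kernel compilation rather than a simple read failure.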

EDIT:
Is this result from the affected host?
http://setiathome.berkeley.edu/result.php?resultid=2841704346

If yes, it's better to stop running ATI AP on that host; it produces invalids.
Perhaps low-end HD4xxx cards are no longer supported at all.
____________

Wedge009
Volunteer tester
Avatar
Send message
Joined: 3 Apr 99
Posts: 323
Credit: 142,405,787
RAC: 258,445
Australia
Message 1339883 - Posted: 21 Feb 2013, 1:02:17 UTC

I'll double-check when I get home, but I have restarted BOINC and the whole system at least a few times since r1761 was first released. I haven't changed the Catalyst drivers, but that's a good point: the affected hosts are using Catalyst 11.12, and none of my other hosts use that particular version.

As for the invalid results: yes, I'm aware of those. That's the AGP HD 4670 you helped me with some time ago. The funny thing about that card is that its failures (and successes) are inconsistent. With BOINC and the host remaining on continuously and no driver or other system changes, there will be a bunch of valid AP results for a while, then a bunch of invalid results, then a bunch of valid results... I do know that Catalyst 12.1 gives a lot of invalid results on this card, though, but Catalyst 11.12 is okay.
____________
Soli Deo Gloria

Wedge009
Volunteer tester
Avatar
Send message
Joined: 3 Apr 99
Posts: 323
Credit: 142,405,787
RAC: 258,445
Australia
Message 1340084 - Posted: 22 Feb 2013, 4:38:49 UTC

Phew, site is back on-line...

Interestingly, I just realised that the ATI bin cache is not being written to disk at all, which would explain the warning messages. Now the puzzle is why it's not being written to disk. The directory is not write-protected in any way, and the other .bin_V6 and .wisdom cache files are being written as normal.

Any idea why the bin file isn't getting saved with each WU run?
____________
Soli Deo Gloria

Profile Raistmer
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 16 Jun 01
Posts: 3419
Credit: 46,514,731
RAC: 10,087
Russia
Message 1340129 - Posted: 22 Feb 2013, 7:19:38 UTC - in response to Message 1340084.

Phew, site is back on-line...

Interestingly, I just realised that the ATI bin cache is not being written to disk at all, which would explain the warning messages. Now the puzzle is why it's not being written to disk. The directory is not write-protected in anyway, and the other .bin_V6 and .wisdom cache files are being written as normal.

Any idea why the bin file isn't getting saved with each WU run?


Because "patching" is required for oclFFT.
It means that after kernel formation, the oclFFT setup code finds that it can't execute the prepared kernels on that particular device, and re-creates some of them. There were reports that after the binary cache was introduced, the app stopped working on low-end ATI devices (workgroup size of 128), so I disabled cache creation in that case. It will always throw such a warning, but it should create the "right" kernels each time. Though it looks like that last statement is not the case, at least not because of the binary cache.
The app was compiled with SDK 2.5, so you need the corresponding driver to run without issues.
We can try an SDK 2.4-compiled one instead as a debugging effort, but it will take some time to get.
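The cache policy described above can be sketched roughly as follows. The file naming is modelled on the `AP_clFFTplan_ATIRV770_32768_r1766.bin` name from the warning, and the `patching_required` flag is a stand-in for the real oclFFT setup check; both are simplifications of the actual app.

```python
import os

def cache_filename(device_name, fft_size, revision):
    # Modelled on the name seen in the warning message, e.g.
    # AP_clFFTplan_ATIRV770_32768_r1766.bin
    return "AP_clFFTplan_%s_%d_%s.bin" % (device_name, fft_size, revision)

def maybe_write_cache(path, binary, patching_required):
    """Write the compiled kernel binary to disk, unless the device
    needed its kernels patched after creation. In that case a cached
    binary would not match what actually runs, so caching is skipped
    and the app recompiles (and warns) on every run."""
    if patching_required:
        return False
    with open(path, "wb") as f:
        f.write(binary)
    return True
```

This matches the observed behaviour: on devices that trigger patching, the .bin file never appears and the "can't open binary kernel file" warning recurs every work unit.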

____________

Wedge009
Volunteer tester
Avatar
Send message
Joined: 3 Apr 99
Posts: 323
Credit: 142,405,787
RAC: 258,445
Australia
Message 1340149 - Posted: 22 Feb 2013, 8:25:53 UTC

Ah, that makes sense. The 'patching' warning, that is, as both hosts where the .bin cache is not present also exhibit the patching warning. That's fine; as long as I know there's no major problem, I'm satisfied. Thanks for the answer. APP SDK 2.5 support is supposed to start with Catalyst 11.7, so both hosts should be okay at Catalyst 11.12, anyway.

Just for the record, the HD 4850 has a workgroup size of 256, though it produces the patching warning.

Somewhat related, the invalid results you were observing on my HD 4670 earlier, I have a sneaking suspicion - though no valid proof yet - that the invalid results may be caused by CPU starvation. It's a very old Pentium 4 CPU on this host, so I don't have any CPU tasks running on it, but the AstroPulse NV tasks do cause a lot of CPU usage. I'm aware of the high-CPU usage issue in recent GeForce drivers... what's the most recent set of drivers that did not exhibit this high-CPU issue? After waiting so long for AMD to fix Catalyst's high-CPU usage problem, I'm disappointed Nvidia still hasn't fixed theirs...
____________
Soli Deo Gloria

Profile jason_geeProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 24 Nov 06
Posts: 4982
Credit: 73,380,946
RAC: 15,619
Australia
Message 1340152 - Posted: 22 Feb 2013, 8:33:56 UTC - in response to Message 1340149.
Last modified: 22 Feb 2013, 8:49:08 UTC

Somewhat related, the invalid results you were observing on my HD 4670 earlier, I have a sneaking suspicion - though no valid proof yet - that the invalid results may be caused by CPU starvation. It's a very old Pentium 4 CPU on this host, so I don't have any CPU tasks running on it, but the AstroPulse NV tasks do cause a lot of CPU usage. I'm aware of the high-CPU usage issue in recent GeForce drivers... what's the most recent set of drivers that did not exhibit this high-CPU issue? After waiting so long for AMD to fix Catalyst's high-CPU usage problem, I'm disappointed Nvidia still hasn't fixed theirs...


IIRC, around 266.58 or so (maybe earlier, someone else can correct me) was the last NV OpenCL 1.0 driver, from before they went to multithreaded drivers. Those older drivers use the blocking-style synchronisation semantics required for that kind of coding to use low amounts of CPU. All nVidia OpenCL 1.1 drivers expect the developer to implement GPU callback functionality (a la OpenGL/DirectX) for low CPU usage, as blocking synchronisation is no longer used with multithreaded drivers at low levels (a form of 'deadlock prevention', moving to 'non-blocking' APIs).
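The difference between the two synchronisation styles can be illustrated with a plain-Python analogy, using `threading.Event` to stand in for a GPU completion event. This sketches the concept only, not any driver's actual implementation:

```python
import threading
import time

def wait_blocking(done):
    """Blocking-style sync: the waiting thread sleeps in the OS until
    the event fires, costing essentially no CPU (the old OpenCL 1.0
    driver style)."""
    done.wait()
    return True

def wait_spinning(done):
    """Busy-wait polling: the thread repeatedly asks 'done yet?',
    burning a CPU core until completion (the behaviour observed with
    multithreaded drivers lacking a completion callback). Returns the
    number of polls performed."""
    polls = 0
    while not done.is_set():
        polls += 1
    return polls

def run_with(waiter, work_seconds=0.05):
    """Run 'waiter' against a stand-in GPU kernel that completes
    after work_seconds."""
    done = threading.Event()
    worker = threading.Thread(
        target=lambda: (time.sleep(work_seconds), done.set()))
    worker.start()
    result = waiter(done)
    worker.join()
    return result
```

With a callback-based design, the application would instead register a function to be invoked on completion and do useful work in the meantime, getting blocking-style CPU usage without blocking.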
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Wedge009
Volunteer tester
Avatar
Send message
Joined: 3 Apr 99
Posts: 323
Credit: 142,405,787
RAC: 258,445
Australia
Message 1340167 - Posted: 22 Feb 2013, 9:29:39 UTC

Thanks for that. I haven't done any OpenCL development before, but I think I understand what's going on there.

After looking around, it seems OpenCL 1.1 was introduced with GeForce 280.26, with the release notes for GeForce 275.33 indicating OpenCL support is at 1.0. Are you saying the CPU issue relates to the multithreaded drivers introduced with OpenCL 1.1, or did the switch to multithreading happen before that?

Also, you say that OpenCL 1.1 is dependent on the developer to implement GPU call-back - is this the case for the current AstroPulse NV application?

I might give 275.33 a try tomorrow.
____________
Soli Deo Gloria

Profile jason_geeProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 24 Nov 06
Posts: 4982
Credit: 73,380,946
RAC: 15,619
Australia
Message 1340169 - Posted: 22 Feb 2013, 10:07:49 UTC - in response to Message 1340167.

...After looking around, it seems OpenCL 1.1 was introduced with GeForce 280.26, with the release notes for GeForce 275.33 indicating OpenCL support is at 1.0. You're saying the CPU issue relates to the multi-threading drivers introduced with OpenCL 1.1 or did the switch to multi-threading happen before that?
It's a while back now, but I recall a bit of overlap due to the usual transition bugs, etc.

Also, you say that OpenCL 1.1 is dependent on the developer to implement GPU call-back - is this the case for the current AstroPulse NV application?
Not sure which direction Raistmer went in after OpenCL 1.0, but yes. I'm not aware whether AMD's OpenCL 1.2 implementation works around this with CUDA-like syncs at a higher level, or with a change/amendment to the OpenCL specification (or both*).

[* Including callback functionality derived from pre-existing OpenGL, DirectX and Windows Core Audio technologies]
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Wedge009
Volunteer tester
Avatar
Send message
Joined: 3 Apr 99
Posts: 323
Credit: 142,405,787
RAC: 258,445
Australia
Message 1340176 - Posted: 22 Feb 2013, 10:42:25 UTC
Last modified: 22 Feb 2013, 10:43:54 UTC

Thanks for the information. I thought of one more point: I notice that running AP and MB simultaneously on the same NV GPU results in far lower overall CPU usage (on the GPU tasks) than running multiple AP WUs on a single NV card only. I notice this pattern across all hosts, where those hosts also run CPU-only tasks, on recent GeForce drivers.

Does this match your experience of running tasks on NV GPUs?
____________
Soli Deo Gloria

Profile jason_geeProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 24 Nov 06
Posts: 4982
Credit: 73,380,946
RAC: 15,619
Australia
Message 1340185 - Posted: 22 Feb 2013, 12:02:03 UTC - in response to Message 1340176.
Last modified: 22 Feb 2013, 12:02:24 UTC

Does this match your experience of running tasks on NV GPUs?


Technically no, because I don't run GPU AP. x41zc is pretty low CPU usage on its own here.
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin

Profile Raistmer
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 16 Jun 01
Posts: 3419
Credit: 46,514,731
RAC: 10,087
Russia
Message 1340257 - Posted: 22 Feb 2013, 16:08:47 UTC - in response to Message 1340176.

Thanks for the information. I thought of one more point: I notice that running AP and MB simultaneously on the same NV GPU results in far lower overall CPU usage (on the GPU tasks) than running multiple AP WUs on a single NV card only. I notice this pattern across all hosts, where those hosts also run CPU-only tasks, on recent GeForce drivers.

Does this match your experience of running tasks on NV GPUs?


It can be explained if we postulate that OpenCL sync uses a busy-wait loop while CUDA uses interrupt-based signalling (or something more modern that I'm not aware of, but not a busy-wait loop).
Then, if the GPU is busy with both OpenCL AP and CUDA MB kernels, there is time for the AP kernel to complete before the driver thread starts to poll constantly. Hence, lower CPU usage overall.
Also, I see a BIG drop in CPU usage for the same reason between an idle and a busy CPU (studied in detail on an APU, observed on an Intel "APU" (Ivy Bridge) and on "CPU-consuming" NV drivers on a GTX 260). If the CPU is busy with some work, it will take longer to switch to the polling driver thread. Hence, the GPU may report completion on the first poll, saving CPU cycles for useful work.
Unfortunately, in this case the CPU can easily become TOO busy to feed the GPU with new work, so while CPU consumption will be quite low, GPU load and overall host performance will drop considerably.
Hence, freeing a core for GPU feeding is some sort of compromise.
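A toy model of the effect described above. All numbers are hypothetical; this only illustrates why a delayed first poll can mean fewer polls overall, not how any real driver behaves:

```python
import math

def polls_until_done(kernel_ms, first_poll_delay_ms, poll_interval_ms=1.0):
    """How many times the driver thread polls before observing that
    the kernel has finished. A busier CPU delays the first poll; if
    the kernel completes within that delay, a single poll suffices
    and almost no CPU is burned on polling."""
    if first_poll_delay_ms >= kernel_ms:
        return 1
    remaining = kernel_ms - first_poll_delay_ms
    return 1 + math.ceil(remaining / poll_interval_ms)
```

For a hypothetical 10 ms kernel, an idle CPU that starts polling immediately performs 11 polls, while a CPU that is busy for 12 ms before its first poll performs just one. Push the delay too far, though, and the GPU sits idle waiting for new work, which is the compromise described above.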

____________

Richard HaselgroveProject donor
Volunteer tester
Send message
Joined: 4 Jul 99
Posts: 8494
Credit: 49,815,986
RAC: 51,753
United Kingdom
Message 1340269 - Posted: 22 Feb 2013, 16:41:16 UTC - in response to Message 1340257.

Thanks for the information. I thought of one more point: I notice that running AP and MB simultaneously on the same NV GPU results in far lower overall CPU usage (on the GPU tasks) than running multiple AP WUs on a single NV card only. I notice this pattern across all hosts, where those hosts also run CPU-only tasks, on recent GeForce drivers.

Does this match your experience of running tasks on NV GPUs?


It can be explained if we postulate that OpenCL sync uses a busy-wait loop while CUDA uses interrupt-based signalling (or something more modern that I'm not aware of, but not a busy-wait loop).
Then, if the GPU is busy with both OpenCL AP and CUDA MB kernels, there is time for the AP kernel to complete before the driver thread starts to poll constantly. Hence, lower CPU usage overall.
Also, I see a BIG drop in CPU usage for the same reason between an idle and a busy CPU (studied in detail on an APU, observed on an Intel "APU" (Ivy Bridge) and on "CPU-consuming" NV drivers on a GTX 260). If the CPU is busy with some work, it will take longer to switch to the polling driver thread. Hence, the GPU may report completion on the first poll, saving CPU cycles for useful work.
Unfortunately, in this case the CPU can easily become TOO busy to feed the GPU with new work, so while CPU consumption will be quite low, GPU load and overall host performance will drop considerably.
Hence, freeing a core for GPU feeding is some sort of compromise.

Which takes us back to a question which I think has been raised before.

Is it possible, or is it not possible, for a developer to code for "GPU callback functionality ( ala. OpenGL/DirectX )" under the OpenCL 1.1 platform on NVidia hardware, using open-source GPL-compliant development environments?

If so, could somebody put together a recommended toolset and programming guide for study?

Profile jason_geeProject donor
Volunteer developer
Volunteer tester
Avatar
Send message
Joined: 24 Nov 06
Posts: 4982
Credit: 73,380,946
RAC: 15,619
Australia
Message 1340271 - Posted: 22 Feb 2013, 16:49:57 UTC - in response to Message 1340269.
Last modified: 22 Feb 2013, 16:50:35 UTC

If so, could somebody put together a recommended toolset and programming guide for study?


Like the nVidia OpenCL Ocean Demo (from the GPU Computing SDK)?
____________
"It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is the most adaptable to change."
Charles Darwin


Copyright © 2014 University of California