I've Built a Couple OSX CUDA Apps...

Message boards : Number crunching : I've Built a Couple OSX CUDA Apps...

Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1748512 - Posted: 11 Dec 2015, 21:59:04 UTC - in response to Message 1748501.  

Try an AP on One NV card and it makes Both NV cards run at around half speed. There isn't a problem running CUDAs on the NV cards, both cards run at Full speed, it only has the problem with APs on the nVidia cards.

And did you check any other OpenCL-based apps but AstroPulse?
ID: 1748512
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1748517 - Posted: 11 Dec 2015, 22:30:46 UTC - in response to Message 1748512.  
Last modified: 11 Dec 2015, 22:43:57 UTC

Try an AP on One NV card and it makes Both NV cards run at around half speed. There isn't a problem running CUDAs on the NV cards, both cards run at Full speed, it only has the problem with APs on the nVidia cards.

And did you check any other OpenCL-based apps but AstroPulse?

Can you give some examples of which tests to try?
I just downloaded LuxMark 3.1 http://www.luxrender.net/wiki/LuxMark#Binaries and ran LuxBall tests on;
1 ATI 6870 Result = 3075
1 NV 750Ti Result = 4403
2 NV 750Ti Result = 8797
All 3 GPUs Result = 11758
All 8 CPUs Result = 863
I'm not familiar with LuxMark, are those results helpful?
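
One way to read these numbers is to compare each multi-device score with the sum of the single-device scores. The sketch below only does that arithmetic on the figures posted above. The helper name `scaling_efficiency` is made up for illustration, and treating the plain sum as "ideal" scaling is an assumption, not something LuxMark defines.

```python
from typing import Sequence

# LuxBall scores as posted above (LuxMark 3.1).
ATI_6870 = 3075
NV_750TI = 4403
TWO_NV_750TI = 8797
ALL_THREE_GPUS = 11758


def scaling_efficiency(combined: float, singles: Sequence[float]) -> float:
    """Return the combined score as a fraction of the summed single scores.

    Hypothetical helper: assumes perfect scaling equals the plain sum.
    """
    return combined / sum(singles)


two_nv = scaling_efficiency(TWO_NV_750TI, [NV_750TI, NV_750TI])
all_gpus = scaling_efficiency(ALL_THREE_GPUS, [ATI_6870, NV_750TI, NV_750TI])

print(f"2 x 750Ti:  {two_nv:.1%} of ideal")
print(f"All 3 GPUs: {all_gpus:.1%} of ideal")
```

Both combinations come out within roughly 1% of the summed single-card scores, so this particular OpenCL load does not show the half-speed effect described for AP.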
ID: 1748517
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1748526 - Posted: 11 Dec 2015, 23:12:41 UTC - in response to Message 1748505.  
Last modified: 11 Dec 2015, 23:19:11 UTC

BTW, have you noticed the last CUDA build has a problem with tasks with an AR of around 1.103119? All the Errors are around this particular AR, http://setiathome.berkeley.edu/results.php?hostid=6796479&state=6


That's a new one to me; I'll watch out for issues there as I put some of petri's updates through to stock. [We're aware there are some issues to track down before primetime.]


I guess that is with my "optimized" MB app. A glitch, but true, and just at that AR range. (Any rare FFTs, odd Gaussians, or uneven something run there?)

Error at line 64 and at the end of the run means that cuFFT failed freeing resources. So something has screwed up things before - mem overwrite (buffer overflow) - or something, on host (CPU) or on device (GPU).
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1748526
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1748529 - Posted: 11 Dec 2015, 23:19:36 UTC - in response to Message 1748517.  


Can you give some examples of which tests to try?

See if Urs has ported OpenCL MultiBeam already, or try to port it.
Check if the Einstein project has some OpenCL app for OS X NV.
And see what the OpenCL samples from the SDK show.
Your LuxMark results seem to scale well.
ID: 1748529
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1748548 - Posted: 12 Dec 2015, 1:08:01 UTC - in response to Message 1748529.  
Last modified: 12 Dec 2015, 2:02:35 UTC

Well, I decided to run my Other OSX Host as Stock. It hasn't been used in a while and didn't have any active tasks.

Right off the start, Boom, Two Computation errors on the OpenCL MB App using the Same OS I was using a few minutes ago;
Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  0.406220
SIGABRT: abort called

Crashed executable name: setiathome_7.08_x86_64-apple-darwin__opencl_nvidia_mac
Machine type Intel 80486 (64-bit executable)
System version: Macintosh OS 10.10.5 build 14F27
Fri Dec 11 19:54:40 2015
atos cannot load symbols for the file setiathome_7.08_x86_64-apple-darwin__opencl_nvidia_mac for architecture x86_64.
0   setiathome_7.08_x86_64-apple-darwin__opencl_nvidia_mac 0x000000010d883054  
1   setiathome_7.08_x86_64-apple-darwin__opencl_nvidia_mac 0x000000010d874166  
2   libsystem_platform.dylib            0x00007fff8d11ef1a  
3   ???                                 0x0000000000000000  
4   libsystem_c.dylib                   0x00007fff8caa59b3  
5   libGPUSupportMercury.dylib          0x00007fff879e4b81  
6   GeForceGLDriverWeb                  0x0000123440369410  
7   libGPUSupportMercury.dylib          0x00007fff879e5538  
8   GeForceGLDriverWeb                  0x000012344035506e  
9   libclhWeb.dylib                     0x000000011098b30a

Don't know, the Two tasks after those appear to be running. I'll reset the Two Errors and see what Else happens. This Host; http://setiathome.berkeley.edu/results.php?hostid=7199204

My guess is there was a conflict with two identical cards trying to build the initial kernels.
*shrugs*

*************

The first ones finished, and considering the AR the Run-Times aren't that much different than the Times when it was running just One 750Ti, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959&offset=160
So....back to the AP App.

BTW, do you see what I see? http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959&offset=140
ID: 1748548
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1748565 - Posted: 12 Dec 2015, 2:22:47 UTC - in response to Message 1748526.  
Last modified: 12 Dec 2015, 2:38:43 UTC

Error at line 64 and at the end of the run means that cuFFT failed freeing resources. So something has screwed up things before - mem overwrite (buffer overflow) - or something, on host (CPU) or on device (GPU).


Yeah, standard boincApi kills the drivers sometimes. I chatted with the devs about it and they don't know why, but I have some mild boincapi customisations that fix that. [It's freeing memory buffers on the host while the GPU is actively using them, not through our control, since the drivers and libraries do asynchronous stuff.] I'll work out how the fixes fit into the modern API as things move on.

Rather surprised to see it crop up on Mac, since I thought multithreaded C-Runtimes was just an M$ thing (Though did see some hints of it happening in Linux too, with a different set of symptoms).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1748565
Gary Charpentier
Volunteer tester
Joined: 25 Dec 00
Posts: 30593
Credit: 53,134,872
RAC: 32
United States
Message 1748615 - Posted: 12 Dec 2015, 5:59:30 UTC - in response to Message 1748565.  

Rather surprised to see it crop up on Mac, since I thought multithreaded C-Runtimes was just an M$ thing (Though did see some hints of it happening in Linux too, with a different set of symptoms).
OS X is POSIX, and pthreads is part of POSIX.

Some time ago I got upset at a compiler bug on the Mac. It was in the optimizer at -O3, and nasty to find because it happened only very occasionally. The issue was that the optimizer inlined a subroutine call. That shouldn't be a problem, except that by dropping the call it also dropped saving and restoring all the registers. So when the now-inlined subroutine overwrote one of them, the value was invalid in some later code. (Something that any compiler author should have realized!) I went nuts looking for it. The debugger was nearly useless, because the particular variable could not be printed ("compiler optimized it away") since it lived in the register. Testing without optimization showed working code! So I had to disassemble the executable to figure out what was being stored in which register, and as I stepped through I saw what had happened. Then I saw that the GCC branch had fixed that error but the Apple branch hadn't, and they didn't seem to care, as Apple said -Os is the thing to do!

Moral. Your code may be right and the compiler is screwing it up! Check other branches of the compiler for bug / bug fix issues in the optimizer when all else fails and you are sure the code is written correctly.
ID: 1748615
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1748749 - Posted: 12 Dec 2015, 19:52:10 UTC
Last modified: 12 Dec 2015, 20:21:58 UTC

I went ahead and tried compiling a NV AP App in Mountain Lion. After some time I was able to compile a couple but they both have the Same problem my early attempts at the ATI MB App had earlier this year. The file size is a little small, 3.2 MB, and they fail the standalone test with;
INFO: can't open binary kernel file: .//AstroPulse_Kernels_r2935.cl_GeForceGTX750Ti.bin_V7_TWIN_FFA_14.5.0_10523460203f01, continue with recompile...
libc++abi.dylib: terminating with uncaught exception of type std::logic_error: basic_string::_S_construct NULL not valid
SIGABRT: abort called

Crashed executable name: ap_7.08r2935_NV_ssse3_x86_64-apple-darwin
Machine type Intel 80486 (64-bit executable)
System version: Macintosh OS 10.10.5 build 14F27
Sat Dec 12 03:22:25 2015

0   ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108d76e1b std::_Rb_tree<int, std::pair<int const, PROCINFO>, std::_Select1st<std::pair<int const, PROCINFO> >, std::less<int>, std::allocator<std::pair<int const, PROCINFO> > >::_M_create_node(std::pair<int const, PROCINFO> const&) + 1099
1   ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108d68256 COPROCS::clear() + 4006
2   libsystem_platform.dylib            0x00007fff97260f1a _sigtramp + 26
3   ???                                 0x00007fff56f4e208 0x0 + 140734652277256
4   libsystem_c.dylib                   0x00007fff96be79b3 abort + 129
5   libc++abi.dylib                     0x00007fff96b0fa21 __cxa_bad_cast + 0
6   libc++abi.dylib                     0x00007fff96b379b9 default_terminate_handler() + 243
7   libobjc.A.dylib                     0x00007fff920297eb _objc_terminate() + 124
8   libc++abi.dylib                     0x00007fff96b350a1 std::__terminate(void (*)()) + 8
9   libc++abi.dylib                     0x00007fff96b34b30 __cxxabiv1::exception_cleanup_func(_Unwind_Reason_Code, _Unwind_Exception*) + 0
10  libstdc++.6.dylib                   0x00007fff9b05948b std::__throw_logic_error(char const*) + 85
11  libstdc++.6.dylib                   0x00007fff9b083883 char const* std::search<char const*, char const*, bool (*)(char const&, char const&)>(char const*, char const*, char const*, char const*, bool (*)(char const&, char const&)) + 0
12  libstdc++.6.dylib                   0x00007fff9b081c8e std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) + 56
13  ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108d0c3c3 std::vector<int, std::allocator<int> >::_M_insert_aux(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int const&) + 7395
14  ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108d0cadb std::vector<int, std::allocator<int> >::_M_insert_aux(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >, int const&) + 9211
15  ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108cfc8bf Astropulse::Client::Client() + 3967
16  ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108cf197a ap_signal::~ap_signal() + 11562
17  ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108cf13f4 ap_signal::~ap_signal() + 10148
18  ap_7.08r2935_NV_ssse3_x86_64-apple-darwin 0x0000000108cb07d4 ap_7.08r2935_NV_ssse3_x86_64-apple-darwin + 6100
...

Strange I can compile a working ATI AP App but Not a NV AP App.
The Compile Fails in Yosemite with more talk of "std::basic_string<char, std::char_traits<char>, etc"
I can't remember how this was fixed earlier, all I can remember was a problem with the Apple workgroup size.
Looking at the LuxMark results;
1 ATI 6870 Result = 3075
1 NV 750Ti Result = 4403
and the OpenCL MB results, http://setiathome.berkeley.edu/results.php?hostid=7199204, you would think the NV 750 would complete an OpenCL AP task Faster than the ATI 6870.
After all, the NV 750 is faster in OpenCL in those two cases. However, in my experience the best AP times for the NV 750Ti are around 35 minutes, where the ATI 6870 finishes them in less than 30 minutes.
ID: 1748749
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1748753 - Posted: 12 Dec 2015, 20:18:24 UTC - in response to Message 1748548.  

BTW, do you see what I see? http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959&offset=140

Uh, hole in the matrix :)
ID: 1748753
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1748898 - Posted: 13 Dec 2015, 13:43:54 UTC - in response to Message 1748749.  

I went ahead and tried compiling a NV AP App in Mountain Lion. After some time I was able to compile a couple but they both have the Same problem my early attempts at the ATI MB App had earlier this year. The file size is a little small, 3.2 MB, and they fail the standalone test with;
INFO: can't open binary kernel file: .//AstroPulse_Kernels_r2935.cl_GeForceGTX750Ti.bin_V7_TWIN_FFA_14.5.0_10523460203f01, continue with recompile...
libc++abi.dylib: terminating with uncaught exception of type std::logic_error: basic_string::_S_construct NULL not valid
SIGABRT: abort called

Crashed executable name: ap_7.08r2935_NV_ssse3_x86_64-apple-darwin
...

Known error : The AstroPulse_Kernels_r2935.cl file was not found by executable at runtime.
Solution : Add AstroPulse_Kernels_r2935.cl file and retry.
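
The exception text in the log, `basic_string::_S_construct NULL not valid`, is what libstdc++ throws when a `std::string` is built from a null pointer, which would fit the kernel source never having been read. A pre-flight check before a standalone run can catch the missing file early. This is a minimal sketch, assuming the app looks for the `.cl` source in the directory it is started from (the `.//` prefix in the log hints at that, but it is an assumption); `missing_kernel_sources` is a hypothetical helper, not part of the AstroPulse build.

```python
from pathlib import Path


def missing_kernel_sources(app_dir, required):
    """Return the names in `required` that are not regular files in `app_dir`."""
    app_dir = Path(app_dir)
    return [name for name in required if not (app_dir / name).is_file()]


if __name__ == "__main__":
    # File name taken from the log above; "." is assumed to be the
    # directory the standalone test is started from.
    missing = missing_kernel_sources(".", ["AstroPulse_Kernels_r2935.cl"])
    if missing:
        print("Missing kernel source(s):", ", ".join(missing))
    else:
        print("Kernel source present; the app can recompile its kernels.")
```

The check only confirms the file is there. It says nothing about whether the contents match what the r2935 build expects.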
_\|/_
U r s
ID: 1748898
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1748919 - Posted: 13 Dec 2015, 16:01:56 UTC - in response to Message 1748898.  
Last modified: 13 Dec 2015, 16:44:14 UTC

I went ahead and tried compiling a NV AP App in Mountain Lion. After some time I was able to compile a couple but they both have the Same problem my early attempts at the ATI MB App had earlier this year. The file size is a little small, 3.2 MB, and they fail the standalone test with;
INFO: can't open binary kernel file: .//AstroPulse_Kernels_r2935.cl_GeForceGTX750Ti.bin_V7_TWIN_FFA_14.5.0_10523460203f01, continue with recompile...
libc++abi.dylib: terminating with uncaught exception of type std::logic_error: basic_string::_S_construct NULL not valid
SIGABRT: abort called

Crashed executable name: ap_7.08r2935_NV_ssse3_x86_64-apple-darwin
...

Known error : The AstroPulse_Kernels_r2935.cl file was not found by executable at runtime.
Solution : Add AstroPulse_Kernels_r2935.cl file and retry.

Maybe known to some...
OK, there are three AP kernels in the client folder;
AstroPulse_Kernels_float.cl
AstroPulse_Kernels.cl
AstroPulse_Kernels4.cl
If I try and use any one of them by changing the name to AstroPulse_Kernels_r2935.cl, I receive the same Error.
The AstroPulse_Kernels_float.cl actually produces a kernel file before it fails.
However, if I use the Stock AstroPulse_Kernels_r2750.cl by changing the name to AstroPulse_Kernels_r2935.cl then it works.

Unfortunately, it appears to be the same as r2750 and r2709. If the other card is running a CUDA task, both cards run at about half speed. Lots of strangeness around here lately. I awoke to find ALL the CUDA tasks had erred out with 'Out of Memory' but had not reported. There were a few normal completions mixed in with all the Errors, but for some reason it hadn't reported them. I'm still trying to recover from that little fiasco.

When I went to bed all three cards were happily working APs. The best I can tell is that after the NV cards finished the APs they threw Out of Memory Errors on the CUDAs. It seems to be working again now...
ID: 1748919
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1748943 - Posted: 13 Dec 2015, 17:55:14 UTC - in response to Message 1748898.  
Last modified: 13 Dec 2015, 18:21:18 UTC

I decided to suspend the CUDA tasks and let the One remaining AP run by itself on a NV 750Ti. It finished normally using the renamed AstroPulse_Kernels_r2750.cl file, http://setiathome.berkeley.edu/result.php?resultid=4586273023
Now I'm outta APs.
When I receive some more APs I'll try running mixed tasks with the old CUDA 5.5 App and see if the tasks still run at half speed.
Strange.

BTW, is there some reason the CUDA tasks need 23GBs of virtual memory?
http://setiathome.berkeley.edu/result.php?resultid=4592951921
The ATI OpenCL tasks only need around 3GBs,
http://setiathome.berkeley.edu/result.php?resultid=4592893700
ID: 1748943
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1749019 - Posted: 13 Dec 2015, 22:24:50 UTC
Last modified: 13 Dec 2015, 23:08:52 UTC

Really strange stuff. Using the CUDA App setiathome_x41zc_x86_64-apple-darwin_cuda55 isn't any better. When one 750Ti is running an AP and the other 750Ti is running a CUDA both tasks are slowed down. I started the CUDA task after the AP and the CUDA task isn't running much faster than the AP even though it's a shorty and should finish in around 8 minutes with that App. So far the 'Shorty' has run 20 minutes and it's only 55% complete.
?????

The ATI AP finished in normal time, http://setiathome.berkeley.edu/result.php?resultid=4596226192
The Shorty that should have finished in 8 mins took 32:18, http://setiathome.berkeley.edu/result.php?resultid=4595266288 ??????
Might as well go back to the 'Special' CUDA App.
Here is a normal Shorty using setiathome_x41zc_x86_64-apple-darwin_cuda55, 7:10, http://setiathome.berkeley.edu/result.php?resultid=4595266358
The NV AP finished in 41:41 instead of around 35 mins, a little better than before but still slow, http://setiathome.berkeley.edu/result.php?resultid=4596226208
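
To put the run times above side by side, here is the slowdown arithmetic as a small sketch. `to_seconds` is a hypothetical helper, and the 35:00 baseline for the NV AP is only the "around 35 mins" figure quoted above, so that ratio is approximate.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' run time such as '32:18' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Run times quoted above.
shorty_mixed = to_seconds("32:18")  # CUDA shorty while the other card ran an AP
shorty_alone = to_seconds("7:10")   # normal shorty with the cuda55 app
ap_mixed = to_seconds("41:41")      # NV AP while the other card ran CUDA
ap_typical = to_seconds("35:00")    # "around 35 mins", so only approximate

print(f"CUDA shorty slowdown: {shorty_mixed / shorty_alone:.1f}x")
print(f"NV AP slowdown:       {ap_mixed / ap_typical:.2f}x")
```

On these figures the CUDA task takes by far the larger hit, about 4.5x, while the AP is roughly 20% slower than its usual time.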

As soon as the NV AP finished the other card running another CUDA (VERY Slowly) instantly Sped up to a Large degree.
Not a Clue...
ID: 1749019
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1749023 - Posted: 13 Dec 2015, 22:46:35 UTC - in response to Message 1749019.  

Yeah, that will start to get into well hidden driver architecture bits there, with Cuda and OpenCL having different priorities. Probably the best you can do, pending further developments/research, is find ways to keep them out of one another's way ( e.g. priorities + would out settings ). Probably I'll add some cudamb.cfg settings once I figure out how/if you can tweak these things in code on a Mac at all. Probably similar measures would be needed for OpenCL apps, so chatting with the devs there would be the go.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1749023
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1749027 - Posted: 13 Dec 2015, 23:20:25 UTC - in response to Message 1749023.  
Last modified: 13 Dec 2015, 23:34:27 UTC

Yeah, that will start to get into well hidden driver architecture bits there, with Cuda and OpenCL having different priorities. Probably the best you can do, pending further developments/research, is find ways to keep them out of one another's way ( e.g. priorities + would out settings ). Probably I'll add some cudamb.cfg settings once I figure out how/if you can tweak these things in code on a Mac at all. Probably similar measures would be needed for OpenCL apps, so chatting with the devs there would be the go.

I suppose since the same driver is running both cards it must be the driver. However, I wouldn't have expected such a Large difference. The Driver is the Latest one you can get from the nVidia Driver Manager, but I see a newer one here; http://www.nvidia.com/download/driverResults.aspx/96724/en-us
Hmmm, mine says System Version: OS X 10.10.5 (14F27). Someone has Version 10.10.5 (14F1505)?
I'll bet that's what this does, https://support.apple.com/en-us/HT205653
I'll also bet if there is any difference at all it will be Slower, it's Always slower ;-)
ID: 1749027
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1749034 - Posted: 13 Dec 2015, 23:52:36 UTC - in response to Message 1749027.  
Last modified: 13 Dec 2015, 23:53:38 UTC

What is frequently forgotten is that, under default settings, a Boinc client with GPU(s) is in a state of overcommit out of the box.

While not in such a state, the priorities tend not to matter. When the internal queues are full, though, they will. Whether the OpenCL API sits at a higher priority due to being lower level than the Cuda runtime, or for some other reason, the best case scenario would probably be achievable with both explicit priority management and apps that yield (sleep) when not active. The Cuda apps do that (Cuda blocking sync mode), but I guess it looks like the OpenCL one isn't yielding.
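
The difference between a yielding wait and a spinning wait can be shown on the CPU side without any GPU. This is a minimal sketch in Python, not how either app is implemented: `wait_for` and `finishes_after` are hypothetical names, and a timer stands in for a GPU kernel finishing.

```python
import time


def wait_for(is_done, sleep_s=0.0):
    """Poll `is_done` until it returns True; return how many polls it took.

    sleep_s == 0 is a spin-wait that keeps a CPU core busy.
    sleep_s > 0 yields the CPU between polls, loosely analogous to a
    blocking sync mode.
    """
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        if sleep_s > 0:
            time.sleep(sleep_s)


def finishes_after(seconds):
    """Return a predicate that turns True once `seconds` have elapsed."""
    deadline = time.monotonic() + seconds
    return lambda: time.monotonic() >= deadline


spin_polls = wait_for(finishes_after(0.05))
yield_polls = wait_for(finishes_after(0.05), sleep_s=0.01)
print(f"spin-wait polls: {spin_polls}, yielding polls: {yield_polls}")
```

For the same 50 ms wait the spinning version typically polls far more often than the sleeping one. A process that waits like that competes for CPU time with everything else on an overcommitted host.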
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1749034
Richard Haselgrove
Volunteer tester
Joined: 4 Jul 99
Posts: 14644
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1749035 - Posted: 13 Dec 2015, 23:54:35 UTC

I'm used to seeing that sort of behaviour (one type of app speeding up, another type slowing down) when two tasks are running on the same GPU. That seems explicable, as Jason says, with different priorities or different kernel architectures. It applies within SETI (MB and AP), and between SETI and Einstein.

I can't easily see a way that the two apps can interfere with each other when they're on different cards. So, do you have any way (independent of BOINC) of monitoring which apps are running on which card? Remember that CUDA and OpenCL have their own, independent, enumeration schemes, so device 0 for CUDA isn't necessarily device 0 for OpenCL.
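
One way to make that check concrete is to match devices across the two enumerations by something that does not depend on ordering, such as the PCI bus ID. The sketch below uses made-up device lists: the bus IDs, the indices and the `match_by_bus_id` helper are all hypothetical. How to obtain real bus IDs from each API, and whether both expose them on OS X, is not something this sketch establishes.

```python
def match_by_bus_id(cuda_devices, opencl_devices):
    """Map each CUDA index to the OpenCL index that shares its PCI bus ID.

    Both arguments are lists of (index, pci_bus_id) tuples. CUDA devices
    with no OpenCL counterpart map to None.
    """
    opencl_by_bus = {bus: idx for idx, bus in opencl_devices}
    return {idx: opencl_by_bus.get(bus) for idx, bus in cuda_devices}


# Made-up example: two NV cards seen by CUDA, three GPUs seen by OpenCL,
# with OpenCL listing the cards in a different order.
cuda = [(0, "0000:01:00.0"), (1, "0000:02:00.0")]
opencl = [(0, "0000:03:00.0"), (1, "0000:02:00.0"), (2, "0000:01:00.0")]

print(match_by_bus_id(cuda, opencl))
```

In this invented layout CUDA device 0 is OpenCL device 2, which is exactly the kind of mismatch that would put two tasks on one card while BOINC reports them on different ones.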
ID: 1749035
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1749036 - Posted: 14 Dec 2015, 0:03:33 UTC - in response to Message 1749035.  
Last modified: 14 Dec 2015, 0:05:32 UTC

I'm used to seeing that sort of behaviour (one type of app speeding up, another type slowing down) when two tasks are running on the same GPU. That seems explicable, as Jason says, with different priorities or different kernel architectures. It applies within SETI (MB and AP), and between SETI and Einstein.

I can't easily see a way that the two apps can interfere with each other when they're on different cards. So, do you have any way (independent of BOINC) of monitoring which apps are running on which card? Remember that CUDA and OpenCL have their own, independent, enumeration schemes, so device 0 for CUDA isn't necessarily device 0 for OpenCL.


Easily explained with unified common command queues within the drivers+OS, but those are proprietary and so in the shadows. Cuda on Windows, for example, makes DirectX calls underneath, as does OpenCL on NV. They're still largely graphics based devices and infrastructure (except perhaps when using the special Tesla Compute Cluster drivers), and unification of all the devices with system memory and dedicated memory hardware has been an evolving major player since Vista. That unification involves internal queues, priorities, scheduling and synchronisation through a more centralised scheme. In the case of Mac/OSX this *might* simply be less overcommit+gaming oriented, and more coarsely layered or chunky.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1749036
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1749041 - Posted: 14 Dec 2015, 0:13:58 UTC - in response to Message 1748943.  

.

BTW, is there some reason the CUDA tasks need 23GBs of virtual memory?
http://setiathome.berkeley.edu/result.php?resultid=4592951921
The ATI OpenCL tasks only need around 3GBs,
http://setiathome.berkeley.edu/result.php?resultid=4592893700


Actually, both numbers are too high to be true.
ID: 1749041
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1749043 - Posted: 14 Dec 2015, 0:20:06 UTC - in response to Message 1749041.  

.

BTW, is there some reason the CUDA tasks need 23GBs of virtual memory?
http://setiathome.berkeley.edu/result.php?resultid=4592951921
The ATI OpenCL tasks only need around 3GBs,
http://setiathome.berkeley.edu/result.php?resultid=4592893700


Actually, both numbers are too high to be true.


No idea on that custom Cuda one either; it's certainly way more than stock on Win, or self-built Linux. Some sort of memory leak in the builds, perhaps.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1749043

©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.