I've Built a Couple OSX CUDA Apps...

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749235 - Posted: 14 Dec 2015, 19:22:31 UTC - in response to Message 1749193. Last modified: 14 Dec 2015, 19:38:38 UTC
Seems there is the same slowdown problem in El Capitan when running an AP on one card and CUDA on the other.
Here's a resend that ran in 47 mins instead of the normal 35 mins, http://setiathome.berkeley.edu/result.php?resultid=4598060285
Here is a shorty that ran at the same time and instead of finishing in less than 3 mins took 9 mins, http://setiathome.berkeley.edu/result.php?resultid=4597402920
I have a couple more NV AP resends, I suppose I'll change the plan class to have them run on the ATI card.
Looks to be the first Invalid also, http://setiathome.berkeley.edu/result.php?resultid=4592931423
Triplets? That task is from yesterday with Yosemite and involved a restart. Hmmm, that was about the time I was recovering all those Ghosts that had been Out of Memory errors. Took about 5 resend events to recover them all.
ID: 1749235

Juha Volunteer tester Joined: 7 Mar 04 Posts: 388 Credit: 1,857,738 RAC: 0
Message 1749240 - Posted: 14 Dec 2015, 20:21:56 UTC
Linux happily overcommits memory. It only becomes an issue if you actually use that memory. Windows is different in that it wants every memory allocation to be backed by real memory (RAM or page file).
Since allocating memory is free, the CUDA or OpenCL runtimes may simply be allocating buffers or whatever based on some internal limit, without bothering to compute just how large the buffer should be.
OSX could be similar.
ID: 1749240

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749242 - Posted: 14 Dec 2015, 20:46:11 UTC Last modified: 14 Dec 2015, 21:04:20 UTC
Just a little comparison. I have FireFox open with 10 tabs. According to Activity Monitor the Virtual Memory size is 4.4 GBs. I have 2 CUDA tasks running, Activity Monitor says they are using 22.6 GBs of VM each. The ATI MB App shows 2.9 GBs assigned. The CPU tasks show 2.44 GBs each.
ID: 1749242

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 1749276 - Posted: 14 Dec 2015, 22:31:14 UTC Last modified: 14 Dec 2015, 22:35:29 UTC
1) There is a difference between committed memory (that is, allocated) and reserved address space.
2) This difference exists in Windows also, AFAIK. One can reserve address space w/o committing memory pages in RAM or the pagefile.
3) Most probably the CUDA/OpenCL runtime reserves a big address space area to do something with their new unified address space architecture. If the OS/driver mixes it up with really allocated memory... not the app's fault. I would suggest reporting this to both NV and Apple.
@TBar How much RAM does your Mac have? More than 23 GB? Do you understand that 23 GB of committed virtual memory with, let's say, 8 GB of RAM available would completely freeze your Mac with constant HDD activity? Do you see such activity?
ID: 1749276
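The reservation-versus-commit distinction in the post above can be observed directly. The following is a minimal sketch, not taken from any SETI or CUDA code: it assumes a Unix-like system (Linux or OS X) and uses only Python's standard `mmap` and `resource` modules. The function names and the sizes are illustrative choices. It maps a large anonymous region, which enlarges the process's virtual size, and only the pages that are actually written to should become resident.

```python
import mmap
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on OS X and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def reserve_then_touch(reserve_bytes, touch_bytes):
    """Map reserve_bytes of anonymous memory, then write to the first touch_bytes.

    Returns how much the peak resident size grew after each step.
    """
    before = peak_rss_bytes()
    # Private anonymous mapping: address space is set aside, but no page
    # is backed by RAM or swap until it is first accessed.
    region = mmap.mmap(-1, reserve_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_reserve = peak_rss_bytes()
    # Writing one byte per page forces the kernel to back those pages.
    for offset in range(0, touch_bytes, mmap.PAGESIZE):
        region[offset] = 1
    after_touch = peak_rss_bytes()
    region.close()
    return {
        "reserved_bytes": reserve_bytes,
        "growth_after_reserve": after_reserve - before,
        "growth_after_touch": after_touch - before,
    }


if __name__ == "__main__":
    MIB = 1024 * 1024
    print(reserve_then_touch(256 * MIB, 8 * MIB))
```

On a typical system the growth after the reserve step stays far below the reserved size, while tools such as Activity Monitor or `top` would still count the whole mapping in the process's virtual memory figure. Whether the CUDA runtime's reservation behaves the same way on OS X is the open question in this thread; the sketch only shows that the two numbers can differ.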

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749281 - Posted: 14 Dec 2015, 22:54:52 UTC - in response to Message 1749276.
> 1) There is a difference between committed memory (that is, allocated) and reserved address space.
> 2) This difference exists in Windows also, AFAIK. One can reserve address space w/o committing memory pages in RAM or the pagefile.
> 3) Most probably the CUDA/OpenCL runtime reserves a big address space area to do something with their new unified address space architecture. If the OS/driver mixes it up with really allocated memory... not the app's fault. I would suggest reporting this to both NV and Apple.
> @TBar How much RAM does your Mac have? More than 23 GB? Do you understand that 23 GB of committed virtual memory with, let's say, 8 GB of RAM available would completely freeze your Mac with constant HDD activity? Do you see such activity?
Yes, I understand the OSX Memory very well. I also understand VM Very well and that it is related to Disk space and when you start using up 23 GBs at a time it goes away quickly. Do you realize to equal the VM of those 2 cuda tasks I would have to run 10 instances of FireFox with 100 tabs open? Don't you think that is a little excessive for two GPU tasks?
Here are the numbers:
Mon Dec 14 16:34:37 2015 | | Starting BOINC client version 7.4.36 for x86_64-apple-darwin
Mon Dec 14 16:34:37 2015 | | Data directory: /Volumes/Mov1/BOINC/Yosemite/BOINC Data
Mon Dec 14 16:34:37 2015 | | OS: Mac OS X 10.11.2 (Darwin 15.2.0)
Mon Dec 14 16:34:37 2015 | | Memory: 6.00 GB physical, 89.19 GB virtual
Mon Dec 14 16:34:37 2015 | | Disk: 622.12 GB total, 89.19 GB free
Mon Dec 14 16:34:37 2015 | | Local time is UTC -5 hours
Mon Dec 14 16:34:37 2015 | | Config: simulate 8 CPUs
Mon Dec 14 16:34:37 2015 | | max memory usage when active: 3686.40MB
Mon Dec 14 16:34:37 2015 | | max memory usage when idle: 4915.20MB
Mon Dec 14 16:34:37 2015 | | max disk usage: 20.00GB
Here is the important number: 89.19 GB virtual. That means once the total VM equals 89.19 GB You Are Out Of Memory. The VM assignment for the cuda task is Too Large, it's that simple.
ID: 1749281
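For readers following the numbers, here is a small sketch of the arithmetic behind TBar's argument. It takes the figures quoted in the thread at face value (89.19 GB virtual from the BOINC log, 22.6 GB per CUDA task and 4.4 GB for Firefox from Activity Monitor) and assumes, as TBar does, that each task's reported virtual size counts in full against the total. That assumption is exactly what the other posters dispute, so this reproduces the reasoning rather than confirming it. The function names are illustrative.

```python
import math

# Figures quoted in the thread; nothing here is measured independently.
TOTAL_VIRTUAL_GB = 89.19  # "Memory: 6.00 GB physical, 89.19 GB virtual"
CUDA_TASK_VM_GB = 22.6    # per CUDA task, as read from Activity Monitor
FIREFOX_VM_GB = 4.4       # Firefox with 10 tabs, as read from Activity Monitor


def tasks_that_fit(total_gb, per_task_gb):
    """Whole tasks that fit if every task's virtual size counted in full."""
    return math.floor(total_gb / per_task_gb)


def firefox_equivalents(task_count, per_task_gb, firefox_gb):
    """How many Firefox instances would report the same combined virtual size."""
    return task_count * per_task_gb / firefox_gb


if __name__ == "__main__":
    print(tasks_that_fit(TOTAL_VIRTUAL_GB, CUDA_TASK_VM_GB))
    print(round(firefox_equivalents(2, CUDA_TASK_VM_GB, FIREFOX_VM_GB), 1))
```

Under that model only three such tasks would fit before the total is reached, and two CUDA tasks report roughly as much virtual memory as ten Firefox instances, which matches the comparison made in the post.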

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 1749292 - Posted: 14 Dec 2015, 23:26:50 UTC - in response to Message 1749281. Last modified: 14 Dec 2015, 23:28:51 UTC
> Do you realize to equal the VM of those 2 cuda tasks I would have to run 10 instances of FireFox with 100 tabs open? Don't you think that is a little excessive for two GPU tasks?
> Here are the numbers;
> That means once the total VM equals 89.19 GB You Are Out Of Memory. The VM assignment for the cuda task is Too Large, it's that simple.
What I realize is that 100 open tabs really do allocate many GB of memory. And I still very much doubt that CUDA MB allocates 23 GB of memory or that OpenCL AP allocates 3 GB of memory.
Please answer my question about disk activity w/o trying to increase memory consumption with 100 open tabs or whatever. BOINC, 2 CUDA MB running, other apps closed as far as possible: what disk activity do you see?
Sad that I need to ask twice. It's a waste of time, don't you think?
The host has 6 GB of RAM, for the record (as BOINC reports, at least).
ID: 1749292

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749295 - Posted: 14 Dec 2015, 23:37:16 UTC - in response to Message 1749292.
> > Do you realize to equal the VM of those 2 cuda tasks I would have to run 10 instances of FireFox with 100 tabs open? Don't you think that is a little excessive for two GPU tasks?
> > Here are the numbers;
> > That means once the total VM equals 89.19 GB You Are Out Of Memory. The VM assignment for the cuda task is Too Large, it's that simple.
> What I realize is that 100 open tabs really do allocate many GB of memory. And I still very much doubt that CUDA MB allocates 23 GB of memory or that OpenCL AP allocates 3 GB of memory.
> Please answer my question about disk activity w/o trying to increase memory consumption with 100 open tabs or whatever. BOINC, 2 CUDA MB running, other apps closed as far as possible: what disk activity do you see?
> Sad that I need to ask twice. It's a waste of time, don't you think?
> The host has 6 GB of RAM, for the record (as BOINC reports, at least).
This is from Top:
0(0) swapins, 0(0) swapouts
That means NO swap activity.
OSX VM is based on Disk space, once the VM exceeds that space you are Out Of Memory. Sad that a developer doesn't understand the basics of OSX VM. How can you possibly justify 23 GB of VM for a simple Unix GPU task?
ID: 1749295

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1749296 - Posted: 14 Dec 2015, 23:39:25 UTC - in response to Message 1749295.
Where in the SETI CUDA source code do you see this 23 GB being requested?
ID: 1749296

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 1749302 - Posted: 14 Dec 2015, 23:45:39 UTC - in response to Message 1749295. Last modified: 14 Dec 2015, 23:55:39 UTC
> This is from Top:
> 0(0) swapins, 0(0) swapouts
> That means NO swap activity.
> OSX VM is based on Disk space, once the VM exceeds that space you are Out Of Memory. Sad that a developer doesn't understand the basics of OSX VM. How can you possibly justify 23 GB of VM for a simple Unix GPU task?
Thanks for the direct answer. It's sad indeed that I have never had any connection to OS X or Mac in my life. Well, if you donate one to me we could close this gap in my education ;) LoL.
And back to troubleshooting your issue: zero HDD activity means that even if the memory were allocated, it never gets accessed (otherwise swapping would be inevitable). Because there is NO swapping at allocation time either, one can conclude (even with no knowledge of the OS X VM architecture :P ) that NO allocation is made at all.
Hence, we return to one of my previous posts about virtual address space reservation vs real memory allocation. The NV CUDA/OpenCL runtime reserves too much address space. Maybe there are some settings (through an env variable, for example) to restrict it. I would recommend checking with NV _and_ Apple's forums and support.
That's all for now.
ID: 1749302

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749304 - Posted: 14 Dec 2015, 23:50:41 UTC - in response to Message 1749296.
> Where in the SETI CUDA source code do you see this 23 GB being requested?
Here comes the mis-direction.
Look at the Results page for the Peak swap size. Look in the BOINC Properties display for the Virtual Memory size. Look in Activity Monitor for the Virtual Memory size.
Where do you think those numbers are coming from?
ID: 1749304

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1749307 - Posted: 14 Dec 2015, 23:56:45 UTC - in response to Message 1749304.
> Where do you think those numbers are coming from?
Your comments
> Sad that a developer doesn't understand the basics of OSX VM. How can you possibly justify 23 GB of VM for a simple Unix GPU task?
suggest that you think the developer is responsible for the allocation request. If so, how and where? If not, who/where else?
ID: 1749307

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749309 - Posted: 15 Dec 2015, 0:02:56 UTC - in response to Message 1749302.
> > This is from Top:
> > 0(0) swapins, 0(0) swapouts
> > That means NO swap activity.
> > OSX VM is based on Disk space, once the VM exceeds that space you are Out Of Memory. Sad that a developer doesn't understand the basics of OSX VM. How can you possibly justify 23 GB of VM for a simple Unix GPU task?
> Thanks for the direct answer. It's sad indeed that I have never had any connection to OS X or Mac in my life. Well, if you donate one to me we could close this gap in my education ;) LoL.
> And back to troubleshooting your issue: zero HDD activity means that even if the memory were allocated, it never gets accessed (otherwise swapping would be inevitable). Because there is NO swapping at allocation time either, one can conclude (even with no knowledge of the OS X VM architecture :P ) that NO allocation is made at all.
> Hence, we return to one of my previous posts about virtual address space reservation vs real memory allocation. The NV CUDA/OpenCL runtime reserves too much address space. Maybe there are some settings (through an env variable, for example) to restrict it. I would recommend checking with NV _and_ Apple's forums and support.
> That's all for now.
You still don't understand the basics. VM is requested by the App and added to the total. Once the total exceeds available disk space you are Out.
Actually I thought this was a SETI/BOINC problem as they are the ones holding the code. I'm just pointing out how the code appears to have a problem with VM assignment.
ID: 1749309

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749310 - Posted: 15 Dec 2015, 0:07:03 UTC - in response to Message 1749307.
> > Where do you think those numbers are coming from?
> Your comments
> > Sad that a developer doesn't understand the basics of OSX VM. How can you possibly justify 23 GB of VM for a simple Unix GPU task?
> suggest that you think the developer is responsible for the allocation request. If so, how and where? If not, who/where else?
23 GBs is requested by the App, it's certainly someone's problem. It certainly isn't My problem. You're welcome for my help identifying a VM assignment problem. Whether or not you act on my help is up to you.
Have a good one.
ID: 1749310

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874
Message 1749311 - Posted: 15 Dec 2015, 0:07:20 UTC - in response to Message 1749309.
> VM is requested by the App
[hence, by the developer who wrote the app]
> the code appears to have a problem with VM assignment.
My point exactly. The problem now is to read the source code and find out where the VM assignment occurs. In this context ("I'm Trying to Build an OSX CUDA App..."), you are the developer.
ID: 1749311

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 1749315 - Posted: 15 Dec 2015, 0:17:47 UTC - in response to Message 1749309.
> Actually I thought this was a SETI/BOINC problem as they are the ones holding the code.
Reasons? The listed reason is just an assumption w/o a real basis. Apple holds code, NV holds code, the authors of any libs in use hold code, and so on.
To have reason to say that, you should demonstrate that it's a SETI/BOINC issue. Leaving aside the allocation vs reservation issue, what signs are there that this issue belongs solely to BOINC/SETI? Did you check Einstein's application, for example? Did you post top's output for the CUDA SDK's samples?
W/o this, no actual reasons.
ID: 1749315
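The comparison suggested in the post above (the same measurement taken for the SETI CUDA app, another project's app, and a CUDA SDK sample) could be scripted along these lines. This is a rough sketch under stated assumptions: it relies on the standard `ps` options `-o vsz=`, `-o rss=` and `-p`, which are available on both OS X and Linux and report sizes in 1024-byte units. The helper names are hypothetical, and no output from any real application is shown or implied.

```python
import subprocess


def parse_ps_sizes(line):
    """Parse one 'vsz rss' line from ps (both values in KiB) into bytes."""
    vsz_kib, rss_kib = (int(field) for field in line.split())
    return {"virtual_bytes": vsz_kib * 1024, "resident_bytes": rss_kib * 1024}


def sample_process(pid):
    """Ask ps for the virtual and resident size of one running process."""
    # The empty headers ("vsz=", "rss=") suppress the header row, so the
    # output is a single line holding just the two numbers.
    completed = subprocess.run(
        ["ps", "-o", "vsz=", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_sizes(completed.stdout)


if __name__ == "__main__":
    import os
    # Sampling this script's own process, purely as a demonstration.
    print(sample_process(os.getpid()))
```

Running `sample_process` against the process IDs of each application while it computes would give a like-for-like virtual and resident figure per app. If a bare CUDA SDK sample shows a similarly large virtual size, that would point at the runtime or driver rather than at the SETI code.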

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121
Message 1749317 - Posted: 15 Dec 2015, 0:26:19 UTC
> I'm just pointing out how the code appears to have a problem with VM assignment.
And blame the authors of the SETI app code collaterally ;) No one likes to be blamed for nothing.
As I have already tried a few times to explain to you, there are no memory allocations of such sizes anywhere in the SETI MB/AP code. Don't trust me? Check for yourself, as Richard suggested ;).
So why does such a big address space reservation occur, and who does it? You say it's a BOINC/SETI issue; this needs troubleshooting. Try the suggested steps, then we could discuss the results tomorrow.
ID: 1749317

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Message 1749335 - Posted: 15 Dec 2015, 1:24:57 UTC Last modified: 15 Dec 2015, 1:25:56 UTC
Well still has me interested enough to track down exactly where that fantastic number comes from, red herring or not.
post hoc ergo propter hoc rarely turns out to be the right assumption in something this complex. I'll stick with my Occam's Razor assessment, that the Mac driver stacks are different, and so the work needs to be fed differently (especially with multiple GPUs + instances + apps)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1749335

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749354 - Posted: 15 Dec 2015, 2:07:27 UTC - in response to Message 1749335. Last modified: 15 Dec 2015, 2:09:51 UTC
Obviously the number is real, as it is running the system out of VM space. This problem doesn't exist with the other OSX Apps which are around 3 GBs or less. It is strictly a CUDA problem. I've run 6 tasks at once on my 3 ATI cards without a problem. It's a joke there is a problem running just 2 tasks on two nVidia cards.
Someone is going to have to read the manual and then try to apply it to the code. Seeings as how there are some around here that seem to know where to start looking I would suggest they give hints.
Just remember, the other Apps don't have this problem.
ID: 1749354

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Message 1749377 - Posted: 15 Dec 2015, 4:22:24 UTC - in response to Message 1749354.
You will need to check with Petri and the other guy what they did with what you are using .. I can't really vouch for the modified builds and code you are using. (though they may or may not be fine)
No not saying the number isn't 'real' in some way, just saying if you genuinely believe there is ~32 Gigabytes being explicitly allocated somewhere in the app code, then finding it would be helpful.
I'm occupied with (re)creating the Mac build-system from scratch for XBranch. As a test, and for familiarisation, managed to get boinc api+libraries built from current head, and client to compile in XCode. I'm short a few dependencies to make an executable client, which I don't need yet, so will pause that there.
clean skeleton makefiles driving nvcc and clang looks like the way to go comparing to the samples. IF that plays ball as expected, then I'll commit a mac_build directory under the client one. glibtool and all the ports/brew messing around should not be needed for this particular codebase.
ID: 1749377

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768
Message 1749383 - Posted: 15 Dec 2015, 5:32:06 UTC - in response to Message 1749377.
I just had the same experience again with the error;
A cuFFT plan FAILED, Initiating Boinc temporary exit (180 secs)
I came in to find one card running a cuda and the other running an AP...slowly. I decided to suspend all the cudas and let it finish 4 APs then go back to cuda. After running a couple APs on both cards the cuda task failed to start with the above error. I had to reboot to get the cudas to work again. Strange stuff.
I found a possible answer to large VM, but it doesn't explain why in my case 6 + 2 + 2 = 23
http://stackoverflow.com/questions/11631191/why-does-the-cuda-runtime-reserve-80-gib-virtual-memory-upon-initialization
I'm just wondering what would happen if the App was compiled as 32-bit, would it even run?
ID: 1749383
