I've Built a Couple OSX CUDA Apps...

Author	Message
Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1749044 - Posted: 14 Dec 2015, 0:21:33 UTC - in response to Message 1749035.

> I can't easily see a way that the two apps can interfere with each other when they're on different cards. So, do you have any way (independent of BOINC) of monitoring which apps are running on which card? Remember that CUDA and OpenCL have their own, independent, enumeration schemes, so device 0 for CUDA isn't necessarily device 0 for OpenCL.

That is indeed a possible explanation. If OS X has something like GPU-Z, TBar needs to use it to check the GPU load on both cards. If AP and MB tasks are both assigned to the same device at the same time, it's no wonder they slow each other down.

Other possibilities:
- CPU core contention. This can be ruled out by pinning the apps to different cores.
- Some serialization inside the driver that prevents truly parallel operation of two GPU cards. This can be cured only by replacing the driver.

ID: 1749044

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1749045 - Posted: 14 Dec 2015, 0:24:57 UTC - in response to Message 1749043.

> BTW, is there some reason the CUDA tasks need 23 GB of virtual memory? http://setiathome.berkeley.edu/result.php?resultid=4592951921
> The ATI OpenCL tasks only need around 3 GB, http://setiathome.berkeley.edu/result.php?resultid=4592893700
> Actually both numbers are too high to be true.
> No idea on that custom Cuda one either, certainly way more than stock on Win, or self built Linux. Some sort of memory leaks in the builds perhaps.

Well, for the OpenCL task I don't understand where the 3 GB figure comes from at all. The stderr clearly states:

  Currently allocated 185 MB for GPU buffers

ID: 1749045

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1749054 - Posted: 14 Dec 2015, 1:01:55 UTC - in response to Message 1749045. Last modified: 14 Dec 2015, 1:12:39 UTC

> Well, for the OpenCL task I don't understand where the 3 GB figure comes from at all. The stderr clearly states: Currently allocated 185 MB for GPU buffers

It's listed as Peak Swap size in the stderr, and the 'Properties' button in BOINC lists it as Virtual Memory size. The ATI AP App is listing it as 2.7 GB, and the NV AP App says 3.15 GB right now. The Older CUDA 5.5 App also uses 23 GB, http://setiathome.berkeley.edu/result.php?resultid=4595405654
23 GB does seem quite large.

There isn't any App similar to GPU-Z on the Mac. Besides the BOINC Manager and Activity Monitor, there is CUDA-Z, which does show both cards. I'll look at CUDA-Z next time to see what it shows.

When it rains.... I ran the Yosemite Updates and somehow ended up with version 1509. The latest nVidia driver is for 1505 and won't work with 1509. I'll have to wait for nVidia to release another driver for Yosemite. Meanwhile I'm back to running El Capitan...

ID: 1749054

Gary Charpentier Volunteer tester Joined: 25 Dec 00 Posts: 30669 Credit: 53,134,872 RAC: 32	Message 1749092 - Posted: 14 Dec 2015, 5:57:55 UTC - in response to Message 1749041.

> BTW, is there some reason the CUDA tasks need 23GBs of virtual memory? http://setiathome.berkeley.edu/result.php?resultid=4592951921
> The ATI OpenCL tasks only need around 3GBs, http://setiathome.berkeley.edu/result.php?resultid=4592893700
> Actually both numbers too high to be true.

Wondering if the 23GB number is the O/S hard limit being reported and not at all related to actual use?

ID: 1749092

petri33 Volunteer tester Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1749094 - Posted: 14 Dec 2015, 6:04:38 UTC

32Gb is the same as in linux.

  top - 08:02:19 up 11:32, 3 users, load average: 7,18, 7,06, 7,05
  Tasks: 293 total, 8 running, 285 sleeping, 0 stopped, 0 zombie
  %Cpu(s): 0,3 us, 0,9 sy, 56,4 ni, 42,5 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
  KiB Mem:  8112704 total, 3813956 used, 4298748 free, 179628 buffers
  KiB Swap: 8325116 total, 0 used, 8325116 free. 1576248 cached Mem

    PID USER  PR NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
  14450 root  39 19   50428  42552   4272 R 100,3  0,5 26:30.33 ../../projects/setiathome.berkeley.edu/MBv7_7.05r2549_avx_linux32
  15602 root  39 19   50428  39904   4272 R 100,0  0,5 13:24.45 ../../projects/setiathome.berkeley.edu/MBv7_7.05r2549_avx_linux32
  15636 root  39 19   52944  49028   4272 R 100,0  0,6 12:49.71 ../../projects/setiathome.berkeley.edu/MBv7_7.05r2549_avx_linux32
  15656 root  39 19   52944  49088   4272 R 100,0  0,6 12:34.48 ../../projects/setiathome.berkeley.edu/MBv7_7.05r2549_avx_linux32
  16027 root  39 19   50428  42100   4272 R 100,0  0,5  6:53.65 ../../projects/setiathome.berkeley.edu/MBv7_7.05r2549_avx_linux32
  11451 root  39 19   53300  49848   4272 R  99,6  0,6 71:34.09 ../../projects/setiathome.berkeley.edu/MBv7_7.05r2549_avx_linux32
  16482 root  30 10 32,396g 348660 238180 S  25,2  4,3  0:19.47 ../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65 -pfb 16 -pfp 192 --device 1
  16567 root  30 10 32,425g 382504 256128 S  25,2  4,7  0:04.91 ../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65 -pfb 16 -pfp 192 --device 2
  16526 root  30 10 32,396g 350856 238200 R  18,6  4,3  0:14.14 ../../projects/setiathome.berkeley.edu/setiathome_x41zc_x86_64-pc-linux-gnu_cuda65 -pfb 16 -pfp 192 --device 3
  14614 root  30 10 32,389g 146416 110624 S   7,6  1,8  1:57.71 ../../projects/setiathome.berkeley.edu/ap_7.01r2793_sse3_clGPU_x86_64 -sbs 512 -oclFFT_plan 256 16 256 -tune 1 64 1+
  16300 root  30 10 32,389g 144256 110632 S   7,0  1,8  0:20.11 ../../projects/setiathome.berkeley.edu/ap_7.01r2793_sse3_clGPU_x86_64 -sbs 512 -oclFFT_plan 256 16 256 -tune 1 64 1+
    872 root  20  0  244316  75732  47468 S   4,0  0,9  1:26.81 /usr/bin/X -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
   2054 root  20  0 1752820  26860  10544 S   0,7  0,3  6:35.79 /home/petri/Downloads/BOINC/boinc --redirectio --launched_by_manager
   1461 petri 20  0 1376592 163228  80000 S   0,3  2,0  1:51.10 compiz
   1799 petri 20  0  653396  37380  27220 S   0,3  0,5  0:44.37 /usr/lib/gnome-terminal/gnome-terminal-server
   2293 root  20  0   14164   3128   2860 S   0,3  0,0  1:36.43 nvidia-smi -l
  16539 petri 20  0   30576   3420   2748 R   0,3  0,0  0:00.17 top -c

To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones

ID: 1749094

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1749097 - Posted: 14 Dec 2015, 6:13:20 UTC

Jason, check your Mac, http://setiathome.berkeley.edu/result.php?resultid=4524427601

  Peak working set size: 159.50 MB
  Peak swap size: 47,805.18 MB
  setiathome enhanced x41zc, Cuda 5.50

ID: 1749097

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1749114 - Posted: 14 Dec 2015, 6:55:12 UTC - in response to Message 1749097.

Well, I don't know where it's getting that number from, but it's worth investigating. It seems unlikely to be realistic ;)

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

ID: 1749114

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1749144 - Posted: 14 Dec 2015, 8:32:01 UTC - in response to Message 1749054.

> It's listed as Peak Swap size on the sterr, and the 'Properties' button in BOINC lists it as Virtual Memory size.

Please post that log.

ID: 1749144

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1749145 - Posted: 14 Dec 2015, 8:34:22 UTC - in response to Message 1749094. Last modified: 14 Dec 2015, 8:36:05 UTC

> 32Gb is the same as in linux.
> [top output quoted in full from Message 1749094]

And what does the description of top's VIRT field say? What does it report? Combining it with the totals:

  KiB Mem:  8112704 total, 3813956 used, 4298748 free, 179628 buffers
  KiB Swap: 8325116 total, 0 used, 8325116 free. 1576248 cached Mem

one can easily see that the 32 GB of "VIRT" has no connection to committed memory.

ID: 1749145
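Raistmer's point can be checked with simple arithmetic on the top output from Message 1749094. The sketch below is in Python, which nobody in the thread used; it assumes that top's "g" suffix means GiB and that the commas are decimal separators from the poster's locale. The five GPU tasks together show about 162 GiB of VIRT on a host with under 16 GiB of RAM plus swap and no swap in use, so VIRT cannot be memory that is actually backed by RAM or swap.

```python
# Figures copied from the top output in Message 1749094.
# Assumptions: "g" means GiB, and "," is a decimal separator.
KIB_PER_GIB = 1024 * 1024

mem_total_kib = 8112704    # "KiB Mem:  ... total"
swap_total_kib = 8325116   # "KiB Swap: ... total"
swap_used_kib = 0          # "KiB Swap: ... used"

# VIRT column of the five GPU tasks (three CUDA MB, two OpenCL AP).
virt_fields = ["32,396g", "32,425g", "32,396g", "32,389g", "32,389g"]


def virt_to_gib(field):
    """Convert a top VIRT field such as '32,396g' to a float in GiB."""
    if not field.endswith("g"):
        raise ValueError("expected a value with a 'g' suffix: " + field)
    return float(field[:-1].replace(",", "."))


total_virt_gib = sum(virt_to_gib(f) for f in virt_fields)
backing_gib = (mem_total_kib + swap_total_kib) / KIB_PER_GIB

print(f"sum of VIRT for the GPU tasks: {total_virt_gib:.1f} GiB")
print(f"RAM + swap on the host:        {backing_gib:.1f} GiB")
print(f"swap in use:                   {swap_used_kib} KiB")
```

The sum of VIRT is roughly ten times what the machine could hold even with swap completely full, while the RES column for the same tasks stays in the hundreds of MB.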

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1749150 - Posted: 14 Dec 2015, 9:23:59 UTC - in response to Message 1749144. Last modified: 14 Dec 2015, 9:39:59 UTC

> It's listed as Peak Swap size on the sterr, and the 'Properties' button in BOINC lists it as Virtual Memory size.
> Please post that log.

Here you are, the Peak swap size is listed on every result and matches the Virtual memory size as seen in numerous tools, http://setiathome.berkeley.edu/results.php?hostid=6796479&state=4&appid=11
Here are the same Swap sizes on Jason's Mac, http://setiathome.berkeley.edu/results.php?hostid=7644315

Here is the same Error I received last night. The same scenario, the first card to run out of APs started a CUDA task and immediately threw an Out of Memory Error. I was ready this time and had the other CUDAs suspended. Right now though, it will give the Memory Error on any CUDA it tries to start.

  Cuda error 'cudaMalloc((void**) &dev_WorkData' in file 'cuda/cudaAcceleration.cu' in line 433 : out of memory.

Now what? Wait for the last NV AP to finish? Restart the machine?

The Last NV AP finished, but it still refuses to run a CUDA, giving this error:

  A cuFFT plan FAILED, Initiating Boinc temporary exit (180 secs)

I suppose it's restart time.

Since the CUDA55 App has the same outrageous Virtual Memory requirements, and was compiled Almost Two years ago, I'd say the problem is buried in the code somewhere.

ID: 1749150

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1749151 - Posted: 14 Dec 2015, 9:28:53 UTC - in response to Message 1749150. Last modified: 14 Dec 2015, 9:37:32 UTC

Will crank up the Mac Pro again for comparison (since I need to be working on that soon anyway). I don't recall seeing any errors from memory running out while it was running, but will keep an eye out. I don't know which boincapi was used to build the Cuda 5.5 app. Allocations within the app are way smaller than that and tightly controlled, though that doesn't exclude system, driver, [boincapi] or Cuda runtime bugs.

ID: 1749151

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1749152 - Posted: 14 Dec 2015, 9:43:23 UTC - in response to Message 1749150. Last modified: 14 Dec 2015, 9:44:00 UTC

Looks like possibly Mac terminology for 'Address space', which is a different thing than physical or virtual memory usage. Will see what shakes out.

ID: 1749152
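The difference between reserved address space and memory actually in use can be seen with a small experiment. This is a minimal Python sketch, not code from either app. It assumes a Linux host that provides /proc/self/statm (on OS X it finds no such file and skips the comparison), and the 64 MiB size is arbitrary. It maps a region without ever touching it and compares virtual size with resident size before and after.

```python
import mmap
import os


def vm_and_rss_bytes():
    """Return (virtual size, resident size) in bytes, or None where
    /proc/self/statm is not available (for example on OS X)."""
    try:
        with open("/proc/self/statm") as f:
            size_pages, resident_pages = f.read().split()[:2]
    except OSError:
        return None
    page = os.sysconf("SC_PAGE_SIZE")
    return int(size_pages) * page, int(resident_pages) * page


RESERVE = 64 * 1024 * 1024  # 64 MiB of address space, never written to

before = vm_and_rss_bytes()
region = mmap.mmap(-1, RESERVE)  # anonymous mapping; no page is touched
after = vm_and_rss_bytes()

if before and after:
    print("virtual size grew by ", (after[0] - before[0]) // 2**20, "MiB")
    print("resident size grew by", (after[1] - before[1]) // 2**20, "MiB")
else:
    print("no /proc/self/statm on this system, nothing to compare")

region.close()
```

On Linux the virtual size should grow by at least the mapped amount while the resident size barely moves. Whether the 23 GB figure on the Mac arises in this way is not established anywhere in the thread.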

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1749155 - Posted: 14 Dec 2015, 9:56:39 UTC - in response to Message 1749152. Last modified: 14 Dec 2015, 9:59:14 UTC

It's called Virtual Memory and it's nothing new. I didn't have any trouble when I was using just One CUDA card either. With one card you can't do things such as run APs for hours then have one card switch to CUDA while the other card is still working an AP. In both cases that is where the problem exists. So, unless you have Two NV cards, and have them Both running APs before switching to CUDAs, you might not see this 'bug'.

Restarting the machine seems to have fixed the problem...for now.

ID: 1749155

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1749156 - Posted: 14 Dec 2015, 9:59:29 UTC - in response to Message 1749155. Last modified: 14 Dec 2015, 10:00:09 UTC

Well, it would seem to reinforce the driver[/Cuda/OS] bug possibility (without further information to the contrary). I'd recommend gathering as much as you can and reporting it to NV. Maybe they have a multi-GPU Mac Pro running and can replicate it.

ID: 1749156

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1749157 - Posted: 14 Dec 2015, 10:06:17 UTC - in response to Message 1749156. Last modified: 14 Dec 2015, 10:11:37 UTC

> Well it would seem to reinforce the driver[/Cuda/OS] bug possibility (without further information to the contrary). I'd recommend gathering as much as you can and reporting it to NV. Maybe they have a multi-GPU Mac Pro running and can replicate.

I'm willing to bet Real money nVidia will just tell you to reduce the App's Virtual Memory setting to just a few hundred MB instead of over 20 GB. Something such as this machine:

  Peak working set size: 167.74 MB
  Peak swap size: 169.28 MB
  http://setiathome.berkeley.edu/result.php?resultid=4578245382

I'm pretty sure the App sets the Memory requirements, unless something has changed recently. Maybe someone got the decimal points wrong, sorta like what just happened at Beta.

ID: 1749157

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1749158 - Posted: 14 Dec 2015, 10:14:08 UTC - in response to Message 1749157.

If they tried that, I would immediately abandon all nVidia products, because there is no such setting :)

ID: 1749158

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1749159 - Posted: 14 Dec 2015, 10:22:00 UTC - in response to Message 1749158.

> If they tried that, I would immediately abandon all nVidia products, because there is no such setting :)

Here is a nVidia AstroPulse Task using the Exact same driver as the CUDA task. http://setiathome.berkeley.edu/result.php?resultid=4596454729
Note the 3.2 GB Peak swap size. Same Driver.

Last night I was using Yosemite, tonight El Capitan. Different System, Different Driver, Same Problem. I don't think it's the driver.

ID: 1749159

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1749160 - Posted: 14 Dec 2015, 10:28:18 UTC - in response to Message 1749159.

> Here is a nVidia AstroPulse Task using the Exact same driver as the CUDA task. http://setiathome.berkeley.edu/result.php?resultid=4596454729 Note the 3.2 GB Peak swap size. Same Driver. Last night I was using Yosemite, tonight El Capitan. Different System, Different Driver, Same Problem. I don't think it's the driver.

I included OS, boincapi, and system among the suspects. Any of those can generate memory leaks, or otherwise be the place where a detected number is used incorrectly.

ID: 1749160

Gary Charpentier Volunteer tester Joined: 25 Dec 00 Posts: 30669 Credit: 53,134,872 RAC: 32	Message 1749191 - Posted: 14 Dec 2015, 14:53:07 UTC - in response to Message 1749152.

> Looks like possibly Mac terminology for 'Address space', which is a different thing than physical or virtual memory usage. Will see what shakes out.

Mac reports limits in a different way than Linux. Check the code to see that the right call to the right set of APIs is being made. The localization may have been missed, or the result may have a different meaning. It's an RTFM thing.

ID: 1749191
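One concrete case where the same call means different things on the two systems is getrusage: ru_maxrss is documented in kilobytes on Linux and in bytes on OS X. The Python sketch below normalizes the value. The helper name peak_rss_bytes is made up for illustration, treating every non-Mac platform as kilobytes is an assumption, and nothing in the thread shows that BOINC's peak swap figure comes from this call.

```python
import resource
import sys


def peak_rss_bytes(ru_maxrss, platform=sys.platform):
    """Normalize getrusage().ru_maxrss to bytes.

    OS X ("darwin") reports bytes. Linux reports kilobytes, and this
    sketch assumes the same for every other platform.
    """
    if platform == "darwin":
        return ru_maxrss
    return ru_maxrss * 1024


usage = resource.getrusage(resource.RUSAGE_SELF)
peak = peak_rss_bytes(usage.ru_maxrss)
print("peak resident set size: %.1f MiB" % (peak / 2**20))
```

Code that skips this step and treats the raw number as kilobytes everywhere would overstate the Mac figure by a factor of 1024, which is the kind of mismatch worth looking for when a reported size seems implausible.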

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1749193 - Posted: 14 Dec 2015, 14:59:21 UTC - in response to Message 1749191. Last modified: 14 Dec 2015, 15:00:39 UTC

> Mac reports limits in a different way than Linux. Check the code to see that the right call to the right set of API's is being made. The localization may have been missed, or the result may have a different meaning. It's a RTFM thing.

Yeah, it certainly does look that way. Will be rifling through the Boinc code next. Looks like Xcode has just received an update, and there have been fairly notable Boinc codebase changes since the Cuda 5.5 version would have been built, including api+lib memory leak plugs, though those are probably not relevant to this particular observation of TBar's. I'm probably going to have to fish for the right tools to verify that those vulnerabilities are truly fixed and that others aren't introduced.

ID: 1749193
