I've Built a Couple OSX CUDA Apps...

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1793481 - Posted: 4 Jun 2016, 20:39:18 UTC - in response to Message 1793429. Well I run the display off the Radeon, so shouldn't be a problem. I've yet to actually see a crash after finish or sigbus errors on mine, so not sure where they are coming from on yours. What build sequence did you use to build the boinc libraries ? the described XCode one in mac_build of the Boinc source tree ? or some other method ? "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1793481 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1793489 - Posted: 4 Jun 2016, 21:01:52 UTC - in response to Message 1793481. Well I run the display off the Radeon, so shouldn't be a problem. I've yet to actually see a crash after finish or sigbus errors on mine, so not sure where they are coming from on yours. What build sequence did you use to build the boinc libraries ? the described XCode one in mac_build of the Boinc source tree ? or some other method ? This; Well, the CUDA 8.0 App isn't any better than the CUDA 7.5 App. The run-times are similar on my GTX 950s & 750Ti and it seems to produce just as many, if not More, SIGBUS Errors. I even tried using the 'Baseline' seti.cpp with 8.0 & 7.5 and didn't see any change. The Baseline Apps and the Special Apps compiled with Toolkit 6.5 doesn't have this problem with SIGBUS Errors After the Results have been printed. The Special CUDA 6.5 app is about 3 to 4 minutes slower than the other 2 on a 30 minute BLC VLAR. This is using the x41p_zi code from the Repository folder, the newer code seems to produce the same SIGBUS Errors with more Inconclusive results. Since it appears you are using the Baseline App, I doubt you will see any SIGBUS Errors. I did post a few Apps in that thread at C.A. The Apps with cuda75 at the end are the ones causing SIGBUS Errors on my machine. I was hoping you might try one and experience the Errors first hand. Now I have a few with cuda80 at the end that also cause SIGBUS Errors. These Apps compiled with the 'Special' code with ToolKit 7.5 and 8.0 are the Only Apps that cause this Error, None of the other Apps have this problem. Right now I'm running the OpenCL App I built and everything is fine, I can even run a couple more CPU tasks than with the CUDA Special App. ID: 1793489 ·

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1793492 - Posted: 4 Jun 2016, 21:16:16 UTC - in response to Message 1793489. Hmmm. Got a feeling I know what might be happening there. Will try a few things out once I get this beast updated. ID: 1793492 ·

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1794064 - Posted: 6 Jun 2016, 19:31:55 UTC - in response to Message 1793492.

A couple more items found over the weekend. First, compiling from the folder PetriR_raw2 runs almost all the way through then hangs with;

Undefined symbols for architecture x86_64:
  "cudaAcc_GetAutoCorrelation(float, int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
  "cudaAcc_FindAutoCorrelations(int, int)", referenced from:
      seti_analyze(ANALYSIS_STATE&) in seti_cuda-analyzeFuncs.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [seti_cuda] Error 1
make[1]: *** [all-recursive] Error 1
...

Next, using OpenCL in AKv8 will compile an OpenCL CPU App that fails with;

01:52:46 (8900): Can't open init data file - running in standalone mode
Not using mb_cmdline.txt-file, using commandline options.
01:52:46 (8900): Can't open init data file - running in standalone mode
WARNING: init_data.xml missing
OpenCL platform detected: Apple
WARNING: BOINC supplied wrong platform!
Number of OpenCL devices found : 1
BOINC assigns slot on device #0.
WARNING: BOINC failed to provide OpenCL device, using own enumeration abilities
WARNING: CPU used as OpenCL device
Build features: SETI8 Non-graphics OpenCL USE_OPENCL_CPU FFTW SSE4.1 64bit
System: Darwin x86_64 Kernel: 15.4.0
CPU : Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
GenuineIntel x86, Family 6 Model 23 Stepping 6
Features : FPU TSC PAE APIC MTRR MMX SSE SSE2 HT SSE3 SSSE3 SSE4.1
OpenCL-kernels filename : MultiBeam_Kernels_r3414.cl
INFO: can't open binary kernel file: .//MultiBeam_Kernels_r3414.cl_IntelRXeonRCPUE5472300GHz.bin_V7_15.4.0_11, continue with recompile...
Error : Building Program (binary, clBuildProgram):main kernels: not OK code -11
CL file build log on device Intel(R) Xeon(R) CPU E5472 @ 3.00GHz
error: definition of macro '__APPLE__' conflicts with an identifier used in the precompiled header

APPLE is not Identified correctly??? Provided the CPU App eventually works, I'm hoping it will just use the number of cores set in the Preferences? Hopefully it won't use All the cores?

Hmmm, Apps that give Errors, Apps that won't compile, Compiled Apps that won't run... I suppose I could just go out on the deck and watch the rain, http://www.accuweather.com/en/us/tampa-fl/33602/weather-warnings-1604937/347937 ID: 1794064 ·

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1794093 - Posted: 6 Jun 2016, 21:51:52 UTC - in response to Message 1794064. That's probably just the missing Makefile mods needed to use Petri's streamed and streamlined autocorrelation (added as one or more extra .cpp/.cu/.h files, iirc, which would not be in the baseline scripts). Even if I didn't forget that file in the upload, which is quite possible, rerunning configure would have clobbered any manual entry he may have made anyway. Since I last used the flat Makefile on Mac, rather than poking around in the configure Makefile input templates, I can probably verify that one way or another in due course. In preparing for integration, the alternative flat Makefile is an intermediate step towards comprehensive Gradle automation. While the shift in build system isn't strictly necessary, the longer-term target is complete cross-platform automation through to deployment. It's not quite there yet (it's intended as a key component for the x42 series), so in the meantime hacking on Makefiles is necessary wherever the source file arrangement or names change. ID: 1794093 ·

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1794545 - Posted: 8 Jun 2016, 20:57:46 UTC Last modified: 8 Jun 2016, 21:03:55 UTC Seems nVidia OpenCL AstroPulse is also Borked in Darwin 15.4. I don't know about all the results yet, but they take almost twice as long to run in 15.4 versus 14.5. I even tried to build a new App in 15.4 to see if it was any better...it wasn't. Trying different MBv8 Apps isn't any better either, it would appear nVidia OpenCL is totally Borked in 15.4. I also ran across an error about 'no symbols' for BOINCs_dev. It appears that BOINCs_dev relates to the OpenCL device detection problems posted earlier. One line is, "AquireExecutionSlot::BOINCs_device=selected_device;" another, "BOINCs_dev=selected_device;" I didn't have that problem a while back when I built r2935. Fortunately, Raistmer's code does a pretty good job assigning its own device. At least the APs run in 15.4 are validating so far, http://setiathome.berkeley.edu/results.php?hostid=6796479&appid=20 In other news, Toolkit 7.5 actually installs and works in Mountain Lion. Not only that, it would appear the cuda75 App compiled in ML does Not have the SIGBUS Errors and is just as fast as the Apps compiled in Mavericks and above. At least it hasn't given any SIGBUS Errors yet anyway... ID: 1794545 ·

Chris Adamek Volunteer tester Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236	Message 1794548 - Posted: 8 Jun 2016, 21:09:54 UTC - in response to Message 1794545. I've not tried to run Astropulse on my 750ti's yet. Almost installed it last night but didn't quite get around to it as I'm still fighting a little jet lag from my trip to Japan. I think in general there seem to be some issues with OpenCL in 15.4 & 15.5. I've not tried 15.6 and will probably wait until 16.0 next week to see how things do with it, as it seemed to resolve the issues I was seeing. Maybe 16.0 will fix the nvidia issues too... Chris ID: 1794548 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1794551 - Posted: 8 Jun 2016, 21:26:24 UTC - in response to Message 1794548. There's another problem with running the APs mentioned a while back. If you have one card running an OpenCL App, and another running a CUDA App, both tasks will be slowed Waaayyy Down. The CUDA task will be around 3 times as slow as normal. So, it's best to run the APs on All the cards at the same time. Either suspend the other tasks or edit the <received_time> in client_state.xml so all the APs run at the same time. They are almost more trouble than they are worth, I have been just ignoring them. I just wanted to see if there was any difference between 15.4 and 14.5, and there is. ID: 1794551 ·
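
For readers who want to try the <received_time> edit described above, here is a minimal Python sketch of the idea. It works on a simplified, hypothetical stand-in for the <result> entries rather than a real client_state.xml, and the "ap_" name prefix used to pick out the AstroPulse tasks is an assumption. It only gives the matching tasks a common received time (the earliest one among them); whether the client then actually starts them together depends on its scheduler.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the <result> entries in client_state.xml.
# A real file contains many more elements per result.
SAMPLE = """<client_state>
  <result><name>ap_task_1</name><received_time>1465400000.0</received_time></result>
  <result><name>mb_task_1</name><received_time>1465300000.0</received_time></result>
  <result><name>ap_task_2</name><received_time>1465420000.0</received_time></result>
</client_state>"""


def align_received_time(xml_text, name_prefix):
    """Give every result whose name starts with name_prefix the same
    received_time: the earliest one found among those results."""
    root = ET.fromstring(xml_text)
    matching = [
        r for r in root.iter("result")
        if r.findtext("name", "").startswith(name_prefix)
    ]
    if not matching:
        return xml_text  # nothing to change
    earliest = min(float(r.findtext("received_time")) for r in matching)
    for r in matching:
        r.find("received_time").text = f"{earliest:.6f}"
    return ET.tostring(root, encoding="unicode")


# "ap_" is an assumed prefix for AstroPulse task names.
patched = align_received_time(SAMPLE, "ap_")
print(patched)
```

If you apply something like this to a real file, shut the BOINC client down first and keep a backup copy, since the client rewrites client_state.xml while it runs.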

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13797 Credit: 208,696,464 RAC: 304	Message 1794583 - Posted: 9 Jun 2016, 0:41:25 UTC - in response to Message 1794551. If you have one card running an OpenCL App, and another running a CUDA App, both tasks will be slowed Waaayyy Down. Different tasks, on different cards, with different applications significantly affect each other? Grant Darwin NT ID: 1794583 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1794587 - Posted: 9 Jun 2016, 0:52:42 UTC - in response to Message 1794583. If you have one card running an OpenCL App, and another running a CUDA App, both tasks will be slowed Waaayyy Down. Different tasks, on different cards, with different applications significantly affect each other? On newer OSX, there appear to be the largest (system/driver-stack) latencies involved that I've come across on the 3 platforms so far. That's going to require scaling up everything to reduce and hide them effectively. Fortunately I may have found some way to get meaningful utilisation data on this platform, where monitoring tools for NV are quite limited (to be confirmed/rejected when I can). Petri's approach with Cuda streams should ultimately have the biggest impact on this platform. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1794587 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13797 Credit: 208,696,464 RAC: 304	Message 1794593 - Posted: 9 Jun 2016, 1:15:10 UTC - in response to Message 1794587. If you have one card running an OpenCL App, and another running a CUDA App, both tasks will be slowed Waaayyy Down. Different tasks, on different cards, with different applications significantly affect each other? On newer OSX, there appear to be the largest (system/driver-stack) latencies involved that I've come across on the 3 platforms so far. That's going to require scaling up everything to reduce and hide them effectively. Fortunately I may have found some way to get meaningful utilisation data on this platform, where monitoring tools for NV are quite limited (to be confirmed/rejected when I can). Petri's approach with Cuda streams should ultimately have the biggest impact on this platform. On my Win10 system, running a Guppie on a card (eg Device 1) with another WU really drags out the processing time for that other WU (to be expected), but I haven't seen any indication of it affecting the processing times of WUs running on the other card (eg Device 0), although I am using -poll & have 1 CPU core reserved for each GPU WU. Grant Darwin NT ID: 1794593 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1794597 - Posted: 9 Jun 2016, 1:29:55 UTC - in response to Message 1794587. Last modified: 9 Jun 2016, 1:44:51 UTC Hmmm, my OSX CUDA times were almost exactly the same as my Ubuntu 14.04 times running similar cards. I just moved the second 950 back to OSX as it would appear the OpenCL testing is over. At the end of the AP testing I noticed the Slow Down was only happening with the similar 750Ti cards. The last AP was running on a 750Ti and the other 750Ti was running a CUDA task slowly. However, the 950 was running the CUDA tasks at the normal speed. The 950 finished a couple tasks while the 750s were fooling around over the last AP. So it would appear it's more complex than first thought. I've heard of similar Slow Downs when running a CUDA and AP task on the same card on that other platform. So, it's not that unusual. Right now I'm testing a 'Baseline' CUDA75 App compiled in Darwin 12.6. As with other Baseline Apps it's quite a bit slower with the VLARs. I'll see how it goes with the -poll command in a little while. Seeing as how the App was compiled in Mountain Lion, it should work in Mavericks and above, it will need the CUDA 7.5 Driver anyway. The CUDA75 'Special' App compiled in Mountain Lion is just as fast as the Special App running in Ubuntu. The first ones are showing about 25 minutes for a BLC6. Not that bad compared to other baseline v8 CUDA results I've seen. I wonder why my "Compiled by TBar" line isn't showing up. It appears in the Special App...oh well. http://setiathome.berkeley.edu/result.php?resultid=4973599683 I wonder what it will do with the -poll command. ID: 1794597 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13797 Credit: 208,696,464 RAC: 304	Message 1794598 - Posted: 9 Jun 2016, 1:47:56 UTC - in response to Message 1794597. I've heard of similar Slow Downs when running a CUDA and AP task on the same card on that other platform. So, it's not that unusual. I've never run AP, however running MB & Guppie on the same GPU results in a big slowdown of the MB processing. Not so much the Guppie. I figure it relates to the kernel runtimes for the Guppies- they take that long so that's how long it takes to do the Guppie. However that extra time spent on the Guppie impacts on the processing time available for the MB WU. Grant Darwin NT ID: 1794598 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1797698 - Posted: 21 Jun 2016, 3:39:35 UTC - in response to Message 1794587. Last modified: 21 Jun 2016, 3:47:45 UTC If you have one card running an OpenCL App, and another running a CUDA App, both tasks will be slowed Waaayyy Down. Different tasks, on different cards, with different applications significantly affect each other? On newer OSX, there appear to be the largest (system/driver-stack) latencies involved that I've come across on the 3 platforms so far. That's going to require scaling up everything to reduce and hide them effectively. Fortunately I may have found some way to get meaningful utilisation data on this platform, where monitoring tools for NV are quite limited (to be confirmed/rejected when I can). Petri's approach with Cuda streams should ultimately have the biggest impact on this platform. I was testing a few things and discovered the older cards can use maxrregcount=48. This appears to give a nice speedup on the VLARs over maxrregcount=32. I built a New CUDA42 App in Lion and in offline testing it worked fine with the GTS250 in Lion. Right now I'm testing it in Yosemite with the GTX950 where it still seems to work fine, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959. If tests work out I'll be replacing the Older CUDA42 and CUDA65 Apps at C.A. with the newer versions that use maxrregcount=48. Seeing as how the new Apps use the exact same code as the older Apps I don't see any trouble. It would be nice if the OSX nVidia LapTops had an App that actually gives the correct results in El Capitan. While in Lion I also compiled a ssse3 CPU App that 'should' work with those LapTops running an i5 & i7 2600 CPU. The current Stock CPU App doesn't work with those LapTops running Lion (Darwin 11.4.2). 
I don't know why people are sending PMs to users that can do Nothing about SETI sending their machines Apps that don't work, https://setiathome.berkeley.edu/forum_thread.php?id=69782&postid=1797395#1797395 ID: 1797698 ·
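
A rough way to see the trade-off behind the maxrregcount setting mentioned above: nvcc's --maxrregcount flag caps the registers each thread may use, so a higher cap lets the compiler keep more values in registers but leaves room for fewer resident threads per multiprocessor. The Python sketch below does only that arithmetic, assuming the kernel really uses registers up to the cap and ignoring register allocation granularity, block size and shared memory limits. The 64K registers and 2048 threads per SM are the Maxwell-class figures that apply to the GTX 750 Ti and GTX 950. It says nothing about which setting is actually faster, which is something only testing like TBar's can show.

```python
def register_limited_threads(regs_per_sm, regs_per_thread, max_threads_per_sm):
    """Upper bound on resident threads per SM imposed by register use alone.

    Simplified: ignores allocation granularity, block size and shared memory.
    """
    return min(max_threads_per_sm, regs_per_sm // regs_per_thread)


# Maxwell-class figures (GTX 750 Ti / GTX 950): 64K registers, 2048 threads per SM.
REGS_PER_SM = 65536
MAX_THREADS_PER_SM = 2048

for cap in (32, 48):
    threads = register_limited_threads(REGS_PER_SM, cap, MAX_THREADS_PER_SM)
    share = threads / MAX_THREADS_PER_SM
    print(f"maxrregcount={cap}: up to {threads} threads per SM ({share:.0%} of the maximum)")
```

So on these cards going from 32 to 48 gives up some potential occupancy in exchange for more registers per thread; a speedup on the VLARs suggests the kernels benefit more from the extra registers than from the extra threads.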

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13797 Credit: 208,696,464 RAC: 304	Message 1797700 - Posted: 21 Jun 2016, 3:57:33 UTC - in response to Message 1797698. I don't know why people are sending PMs to users that can do Nothing about SETI sending their machines Apps that don't work, https://setiathome.berkeley.edu/forum_thread.php?id=69782&postid=1797395#1797395 So they can stop crunching till there's an application that works? Grant Darwin NT ID: 1797700 ·

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1797701 - Posted: 21 Jun 2016, 4:02:46 UTC - in response to Message 1797700. I don't know why people are sending PMs to users that can do Nothing about SETI sending their machines Apps that don't work, https://setiathome.berkeley.edu/forum_thread.php?id=69782&postid=1797395#1797395 So they can stop crunching till there's an application that works? How about SETI Stop sending Apps that don't work? Seems to me the Problem originates with sending Apps that don't work, the User has little control over if what SETI sends works or not. The User is just volunteering his machine and trusting SETI to send them Apps that have been tested to Work. ID: 1797701 ·

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13797 Credit: 208,696,464 RAC: 304	Message 1797710 - Posted: 21 Jun 2016, 4:41:04 UTC - in response to Message 1797701. I don't know why people are sending PMs to users that can do Nothing about SETI sending their machines Apps that don't work, https://setiathome.berkeley.edu/forum_thread.php?id=69782&postid=1797395#1797395 So they can stop crunching till there's an application that works? How about SETI Stop sending Apps that don't work? Seems to me the Problem originates with sending Apps that don't work, the User has little control over if what SETI sends works or not. The User is just volunteering his machine and trusting SETI to send them Apps that have been tested to Work. And how does Seti know which Apps do & don't work on which hardware & OS? Grant Darwin NT ID: 1797710 ·

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1797715 - Posted: 21 Jun 2016, 5:06:01 UTC - in response to Message 1797710. I don't know why people are sending PMs to users that can do Nothing about SETI sending their machines Apps that don't work, https://setiathome.berkeley.edu/forum_thread.php?id=69782&postid=1797395#1797395 So they can stop crunching till there's an application that works? How about SETI Stop sending Apps that don't work? Seems to me the Problem originates with sending Apps that don't work, the User has little control over if what SETI sends works or not. The User is just volunteering his machine and trusting SETI to send them Apps that have been tested to Work. And how does Seti know which Apps do & don't work on which hardware & OS? If SETI doesn't bother to check their results they should stop what they are doing immediately. In this particular instance I've personally sent Eric PMs about the CPU Apps...back when I was recommending the AVX & SSE41 Apps for Beta. This User has the exact same Problem and posted about it at Beta back when the App was at Beta, http://setiathome.berkeley.edu/show_host_detail.php?hostid=6463983 In this case SETI is very aware about the Problems with Darwin 11.4.2 with the i7 & i5 2600 class CPUs. The Problem with these machines existed with MBv7 and back then most of them actually worked with the ssse3 App. ID: 1797715 ·

Grant (SSSF) Volunteer tester Joined: 19 Aug 99 Posts: 13797 Credit: 208,696,464 RAC: 304	Message 1797720 - Posted: 21 Jun 2016, 5:25:38 UTC - in response to Message 1797715. Last modified: 21 Jun 2016, 5:28:47 UTC I don't know why people are sending PMs to users that can do Nothing about SETI sending their machines Apps that don't work, https://setiathome.berkeley.edu/forum_thread.php?id=69782&postid=1797395#1797395 So they can stop crunching till there's an application that works? How about SETI Stop sending Apps that don't work? Seems to me the Problem originates with sending Apps that don't work, the User has little control over if what SETI sends works or not. The User is just volunteering his machine and trusting SETI to send them Apps that have been tested to Work. And how does Seti know which Apps do & don't work on which hardware & OS? If SETI doesn't bother to check their results they should stop what they are doing immediately. So the project needs to implement what I and others have suggested in other threads; above a certain percentage of Invalids or Errors the host needs to be limited to 1WU at a time. In this particular instance I've personally sent Eric PMs about the CPU Apps...back when I was recommending the AVX & SSE41 Apps for Beta. This User has the exact same Problem and posted about it at Beta back when the App was at Beta, http://setiathome.berkeley.edu/show_host_detail.php?hostid=6463983 In this case SETI is very aware about the Problems with Darwin 11.4.2 with the i7 & i5 2600 class CPUs. The Problem with these machines existed with MBv7 and back then most of them actually worked with the ssse3 App. So the project needs to implement blocks on certain hardware with certain drivers & certain Operating Systems? Sounds like it would be better if they stopped supplying stock applications across the board until these issues are resolved. 
Until then people can install them using the anonymous platform, hopefully taking notice of the readme telling which combinations of hardware/OS/driver are suitable & which aren't. Grant Darwin NT ID: 1797720 ·
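
The rule suggested above (limit a host to 1 WU at a time once its share of Invalids or Errors gets too high) could be sketched roughly as follows. This is only an illustration of the suggestion, not how the project's scheduler works: the function name, the 5% threshold, the minimum sample size and the normal limit of 100 tasks are all made-up values.

```python
def allowed_tasks_in_progress(valid, invalid, errored,
                              normal_limit=100,       # hypothetical normal cap
                              max_bad_fraction=0.05,  # hypothetical threshold
                              min_sample=20):         # hypothetical minimum history
    """Return how many tasks a host may hold at once under the suggested rule."""
    total = valid + invalid + errored
    if total < min_sample:
        return normal_limit  # too little history to judge the host
    bad_fraction = (invalid + errored) / total
    # Above the threshold the host is throttled to a single task at a time.
    return 1 if bad_fraction > max_bad_fraction else normal_limit


print(allowed_tasks_in_progress(valid=99, invalid=1, errored=0))    # healthy host
print(allowed_tasks_in_progress(valid=50, invalid=30, errored=20))  # mostly bad results
```

A host throttled this way would still return the occasional result, so it could get back to the normal limit once a fixed application pushes its error share back under the threshold.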

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1797721 - Posted: 21 Jun 2016, 5:48:11 UTC - in response to Message 1797720. So the project needs to implement blocks on certain hardware with certain drivers & certain Operating Systems? Sounds like it would be better if they stopped supplying stock applications across the board until these issues are resolved. Until then people can install them using the anonymous platform, hopefully taking notice of the readme telling which combinations of hardware/OS/driver are suitable & which aren't. All I'm saying is you shouldn't blame the Volunteer if his machine is sent work that doesn't actually Work. Most of them probably haven't any idea they are wasting Time & Energy due to being sent faulty work. SETI is going to do exactly as they wish, even if it does result in people wasting Time & Energy. If you want to see the actions you described I'm afraid you're going to have to accomplish them yourself. Good Luck with that. nods head ID: 1797721 ·
