I've Built a Couple OSX CUDA Apps...

Author	Message
TBar, Volunteer tester. Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768
Message 1750211 - Posted: 18 Dec 2015, 21:13:22 UTC - Last modified: 18 Dec 2015, 21:14:13 UTC
YES! The first test with the new code from a few hours ago allows the CUDA task to start normally after both cards have been running APs. Nice. Thanks Petri. Now to figure out why all three Apps run around 3 times as slow when the other card is running an AP. Also need to figure out what Jason has done...

petri33, Volunteer tester. Joined: 6 Jun 02, Posts: 1668, Credit: 623,086,772, RAC: 156
Message 1750247 - Posted: 18 Dec 2015, 23:21:42 UTC - in response to Message 1750211.
I'm sure Jason has worked his magick. Glad it helped. I could reduce the memory need by about 800 MB.
To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones

jason_gee, Volunteer developer, Volunteer tester. Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0
Message 1750276 - Posted: 19 Dec 2015, 1:19:22 UTC - in response to Message 1750247. Last modified: 19 Dec 2015, 1:26:25 UTC
Still very much in cleanup mode, though it looks like the first validations went through. Will take it easy until later due to the heat today, but will check the Windows builds [making include file fixes as need be] and probably try to get a Linux build going under the new build system. Compiler flags, then v8 mods after that. Generic-ised hand optimisation and gradle automation while the v8 mods percolate through testing by the team.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

jason_gee, Volunteer developer, Volunteer tester. Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0
Message 1750533 - Posted: 20 Dec 2015, 8:56:32 UTC
The Windows build needed only a single include re-enabled, one that the Mac didn't like. 2 platforms down (out of 3 for now). Onto checking/tweaking the Linux build tonight, then v8 updates here we come! (Eric appears to be getting closer at beta with the CPU build.)

jason_gee, Volunteer developer, Volunteer tester. Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0
Message 1751638 - Posted: 25 Dec 2015, 15:26:12 UTC - Last modified: 25 Dec 2015, 15:29:09 UTC
Third platform: quick and dirty MB Cuda Linux build operational, no extreme build system difficulties, though the deprecations from Cuda65 on are going to make things tricky there too (as on Win and Mac). As per log, it's baseline for (so nothing special), and will just hold until v8 updates and some optimisations go in. Since all three platforms are running here now, there are no more blockages to v8 updates, build system cleanup/change as things go, then back to optimisation proper (Finally!)

TBar, Volunteer tester. Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768
Message 1751656 - Posted: 25 Dec 2015, 17:19:34 UTC - in response to Message 1751638.
Greetings Jason. Would you please look into compiling a v8 OSX CPU App from the seti_boinc/client folder? The one currently on beta is working very slowly on my Mac and I can't get a compile to work. If I try it with the graphics I get a number of errors, and without graphics it doesn't seem to be building the boinc libs. I was able to compile a CPU app from the sah_v7_opt folder but it fails on launch. The thread is here: http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266

jason_gee, Volunteer developer, Volunteer tester. Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0
Message 1751657 - Posted: 25 Dec 2015, 17:29:19 UTC - in response to Message 1751656. Last modified: 25 Dec 2015, 17:31:08 UTC
Will have to look at that indeed, since that branch holds the major changes I'll have to port to XBranch, hopefully over the coming week. One caveat is that I am still seeing notable commits to various bits of multibeam (v8 for beta), so expect slowness (debugging) by default. In that context, optimisation and speed are not necessarily appropriate.
Looks like there are some commits by Charlie Fenton with respect to XCode projects, so it could be worth investigating whether that works better for you in the meantime. Basically, the gnu autotools build system from old to new OSX (mostly libtool) appears to me to have a number of breakages, which would explain the complexity there as well as my own juggling.
Slightly longer term, I can't promise to convince every party to use a common and simple build system, though I suspect that if I put a unified flat Make alongside the existing system, Eric and others might not object.

jason_gee, Volunteer developer, Volunteer tester. Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0
Message 1751732 - Posted: 26 Dec 2015, 5:11:34 UTC - in response to Message 1751657. Last modified: 26 Dec 2015, 5:11:52 UTC
Note that it will be worth checking whether you get Arecibo or GBT tasks, since by my very rudimentary understanding of the differences, GBT tasks are bigger and very low angle range (targeted) by comparison to the familiar mid to high angle range tasks. It could take some time to shake out what works and what doesn't, as well as the expected performance.

jason_gee, Volunteer developer, Volunteer tester. Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0
Message 1752235 - Posted: 29 Dec 2015, 9:25:12 UTC - Last modified: 29 Dec 2015, 9:26:52 UTC
A few days later, the inconclusive to pending ratios on all three platforms appear nominal (<5%). This indicates that, despite excessive heat, the machines with the sanity check builds appear to function as expected, and that overall project health under MB v7 (all CPU+GPU) seems to be in the usual steady state.
Waiting on a RAID rebuild+verify after a drive failed (probably due to the heat mentioned), then onto the fairly straightforward v8 modifications. Probably going to have to think about adding some kind of throttle before some of Petri's and other optimisations become stock. Cooking myself and the dog while we sleep is looking like a distinct possibility, lol.
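
The "<5%" sanity check mentioned in the post above can be sketched as a simple ratio test. This is only an illustration: the function names are hypothetical, the counts in the usage note are made-up placeholders rather than figures from any host in this thread, and the 5% threshold is the one number taken from the post.

```python
def inconclusive_ratio(inconclusive, pending):
    """Ratio of inconclusive results to results pending validation."""
    if pending <= 0:
        raise ValueError("pending must be a positive count")
    return inconclusive / pending


def looks_nominal(inconclusive, pending, threshold=0.05):
    """True when the ratio is under the threshold (5% as quoted in the post)."""
    return inconclusive_ratio(inconclusive, pending) < threshold
```

For example, with placeholder counts of 12 inconclusive against 400 pending, `looks_nominal(12, 400)` is true (3%), while 30 against 400 (7.5%) would not pass.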

TBar, Volunteer tester. Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768
Message 1752591 - Posted: 31 Dec 2015, 2:47:21 UTC - Last modified: 31 Dec 2015, 3:42:42 UTC
Well, I'm going to Declare Success on the opening round of SETIv8 CPU trials & tribulations. I would like to thank whoever is responsible for setting up the opportunity for me to break my old record of merely Twice as Fast. I didn't think exceeding that feat would be possible. The New v8 CPU App is... FOUR TIMES as Fast as the current OSX CPU App on Beta. On My machine anyway.
Some comparisons from http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=63959
SETI@home v7 7.00 i686-apple-darwin: Average processing rate 12.75 GFLOPS
SETI@home v7 7.07 x86_64-apple-darwin (sse41): Average processing rate 23.66 GFLOPS
SETI@home v8 8.00 x86_64-apple-darwin: Average processing rate 5.61 GFLOPS
SETI@home v8 (anonymous platform, CPU): Average processing rate 22.29 GFLOPS
Or in other terms:
SETI@home v8 8.00 x86_64-apple-darwin: Run time 3 hours 41 min 7 sec, CPU time 3 hours 40 min 45 sec, WU true angle range 2.729899
MBv8_8.0r3299_sse41_x86_64-apple-darwin: Run time 52 min 9 sec, CPU time 51 min 40 sec, WU true angle range 2.596247
If you don't like those generalized times there are more here: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959
Up to Four Times as Fast™ 😎
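
As a quick check of the "four times" claim, the figures quoted in the post above can be turned into ratios. This is a minimal sketch that uses only the numbers from the post; the variable and function names are illustrative. Note that the two tasks had slightly different angle ranges (about 2.73 versus 2.60), so this is a rough comparison rather than a controlled benchmark.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Run times quoted in the post.
stock_seconds = to_seconds(3, 41, 7)    # SETI@home v8 8.00 stock app
custom_seconds = to_seconds(0, 52, 9)   # MBv8_8.0r3299_sse41 build

# Average processing rates quoted in the post (GFLOPS).
stock_gflops = 5.61
custom_gflops = 22.29

runtime_speedup = stock_seconds / custom_seconds
rate_speedup = custom_gflops / stock_gflops

print(f"Run-time speedup: {runtime_speedup:.2f}x")
print(f"Processing-rate speedup: {rate_speedup:.2f}x")
```

Both ratios come out at roughly four, which is consistent with the "Up to Four Times as Fast" summary.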

Zalster, Volunteer tester. Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242
Message 1752595 - Posted: 31 Dec 2015, 3:17:52 UTC - in response to Message 1752591.
Congratulations!!

TBar, Volunteer tester. Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768
Message 1752600 - Posted: 31 Dec 2015, 3:49:00 UTC - in response to Message 1752595.
Much Obliged. Now if we could just get Petri's CUDA code transferred over to v8. I'm kinda getting used to my two 750Ti performing as if they were 780s.

jason_gee, Volunteer developer, Volunteer tester. Joined: 24 Nov 06, Posts: 7489, Credit: 91,093,184, RAC: 0
Message 1752604 - Posted: 31 Dec 2015, 4:23:40 UTC - in response to Message 1752600. Last modified: 31 Dec 2015, 4:33:49 UTC
Base v8 compatibility, then it's open season on that for sure ;) Yeah, Eric's still tweaking v8 CPU, which is the baseline for changes. Unsure when the newest builds go live on Beta [or here directly perhaps...]. That doesn't stop me making a start with Cuda now that the 3 platforms build (once it cools down again), though expect changes if problems turn up in beta.
Fingers crossed I come across what's causing the validation issues with Petri's builds as we go. From investigation here a couple of months back with Windows builds, it looked like it's confined to the PoT analysis (gaussians and/or pulses), so fingers crossed it's nothing huge.
For Petri's additions, we won't be able to run streams or large memory on the oldest GPUs+Cuda-versions supported, so I will have to spend some time adding detection and option logic. Not sure yet where a reasonable defaults line will be for general distribution to avoid cooking cards/systems. Will cross that bridge when we come to it.

TBar, Volunteer tester. Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768
Message 1754124 - Posted: 5 Jan 2016, 11:54:47 UTC
The New Mac Petri Version 8 CUDA App is reality. Not only is it just as fast as the Version 7 creation, it appears the number of Inconclusive results has been drastically reduced: http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=180&appid=29
I have also compiled a 'Stock' version from r3312 using ToolKit 6.5 that should work on Compute Capability 2.0 and above cards. I'll be switching to that version shortly.
A new Version 8 CPU App was created and appears to be about as fast as the old v7 SSE41 version.
The current list of Version 8 Mac Apps:
MBv8_8.0r3300_clGPU_sse41_x86_64-apple-darwin
MBv8_8.0r3301_avx_x86_64-apple-darwin
MBv8_8.0r3304_sse41_x86_64-apple-darwin
MBv8_8.0r3305_ati5_ssse3_x86_64-apple-darwin
MBv8_8.0r3306_nvidia_ssse3_x86_64-apple-darwin
setiathome_x41zc_x86_64-apple-darwin_cuda65_Petri
setiathome_x41zc_x86_64-apple-darwin_cuda65_Stock
Getting to be quite a list...

William, Volunteer tester. Joined: 14 Feb 13, Posts: 2037, Credit: 17,689,662, RAC: 0
Message 1754126 - Posted: 5 Jan 2016, 12:15:40 UTC
You may want to make sure it's x41zf and call it accordingly.
A person who won't read has no advantage over one who can't read. (Mark Twain)

TBar, Volunteer tester. Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768
Message 1754246 - Posted: 6 Jan 2016, 2:03:56 UTC
I have posted a few of the Apps that have passed early testing. The packages are self contained and just need to be extracted and placed in the /Library/Application Support/BOINC Data/projects/setiathome.berkeley.edu folder. The permissions will have to be set after installing the files; the easiest way is to simply reinstall BOINC, which sets the file permissions. To revert to Stock, simply remove the files from the setiathome.berkeley.edu folder.
SETIv8 OSX Apps

Tom Rinehart, Volunteer tester. Joined: 12 Dec 01, Posts: 113, Credit: 13,255,975, RAC: 6
Message 1754275 - Posted: 6 Jan 2016, 5:41:40 UTC - in response to Message 1754246.
TBar - Thanks for making these available. So far I've been able to test these on my:
iMac (21.5-inch, Mid 2010)
Processor: 3.06 GHz Intel Core i3
Memory: 16 GB 1333 MHz DDR3
Graphics: ATI Radeon HD 4670 256 MB
The MBv8_8.0r3304_sse41_x86_64-apple-darwin seems to work great and is much faster than the stock CPU app.
The MBv8_8.0r3300_clGPU_sse41_x86_64-apple-darwin runs until it gets to about 2% and then fails with a computation error. I wonder if it is a GPU memory problem, with mine not having enough. These are the results:
http://setiathome.berkeley.edu/result.php?resultid=4650994079
http://setiathome.berkeley.edu/result.php?resultid=4650994066
http://setiathome.berkeley.edu/result.php?resultid=4650994065
http://setiathome.berkeley.edu/result.php?resultid=4650994036
http://setiathome.berkeley.edu/result.php?resultid=4650994035
http://setiathome.berkeley.edu/result.php?resultid=4650993990
To fix the permissions I used:
    cd "/Library/Application Support/BOINC Data/projects/setiathome.berkeley.edu"
    sudo chown boinc_master:boinc_project .
    sudo chmod +r MultiBeam_Kernels_r3300.cl
I had to add the last command because when I first tried to run the GPU app it said "Postponed - can't read CL file".
I will try these on my 27" iMac tomorrow after it is done with its current work.
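
Before reaching for chmod, it can help to see which files in the project folder are missing read permission. The sketch below is an illustration only: the function name is hypothetical, and it assumes that the "can't read CL file" message comes from missing group/other read bits, which depends on the user and group the BOINC client actually runs under.

```python
import os
import stat


def files_missing_read_bits(folder):
    """List regular files in folder that lack the group or the other read bit."""
    missing = []
    for name in sorted(os.listdir(folder)):
        mode = os.stat(os.path.join(folder, name)).st_mode
        if not stat.S_ISREG(mode):
            continue  # skip directories and other non-regular entries
        group_ok = bool(mode & stat.S_IRGRP)
        other_ok = bool(mode & stat.S_IROTH)
        if not (group_ok and other_ok):
            missing.append(name)
    return missing
```

Calling it with the setiathome.berkeley.edu project folder named in the post would list candidates for a `chmod +r`, such as the kernel .cl file.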

TBar, Volunteer tester. Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768
Message 1754279 - Posted: 6 Jan 2016, 6:06:48 UTC - in response to Message 1754275.
Looks like the same error as before on the HD4 card. Probably not going to work on those cards. It worked well on my GTX 750Ti and ATI HD6870; you can see the 750 results here: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959&offset=100
One thing you can try is to open the mb_cmdline_mac_OpenCL_sah.txt file and change the settings to:
    -sbs 64 -oclfft_tune_gr 64 -oclfft_tune_wg 64 -period_iterations_num 96
That should reduce the memory load and work better on the older cards. You could try different settings, and maybe -oclfft_tune_wg 32, as the results say it can't use a wg size over 32. I'm not sure how low you can set those settings without getting even more errors. As before, you should try removing the generated files in between restarts to see if it will run at all. You will have to set the permissions again after editing the mb_cmdline_mac_OpenCL_sah.txt file.
Good Luck.
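
When trying several combinations of these settings, it may be easier to generate the option line than to retype it. The sketch below is only an illustration: the helper names are hypothetical, the option names and values are the ones quoted in the post, and it assumes every option takes exactly one integer value, which holds for the four options shown but is not verified for others.

```python
def build_cmdline(options):
    """Render {'sbs': 64, ...} as '-sbs 64 ...', preserving insertion order."""
    return " ".join(f"-{name} {value}" for name, value in options.items())


def parse_cmdline(line):
    """Inverse of build_cmdline, assuming '-name integer' pairs only."""
    tokens = line.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected '-name value' pairs")
    return {name.lstrip("-"): int(value)
            for name, value in zip(tokens[::2], tokens[1::2])}


# Settings suggested in the post for older cards.
suggested = {
    "sbs": 64,
    "oclfft_tune_gr": 64,
    "oclfft_tune_wg": 64,
    "period_iterations_num": 96,
}

# Variant with the smaller work-group size also mentioned in the post.
smaller_wg = dict(suggested, oclfft_tune_wg=32)

print(build_cmdline(suggested))
print(build_cmdline(smaller_wg))
```

The printed line is what would go into mb_cmdline_mac_OpenCL_sah.txt; as the post notes, permissions have to be set again after editing that file.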

Tom Rinehart, Volunteer tester. Joined: 12 Dec 01, Posts: 113, Credit: 13,255,975, RAC: 6
Message 1754395 - Posted: 6 Jan 2016, 18:01:26 UTC - in response to Message 1754279.
I edited mb_cmdline_mac_OpenCL_sah.txt to:
    -sbs 64 -oclfft_tune_gr 64 -oclfft_tune_wg 32 -period_iterations_num 96
It is now running, but it is slow. It is saying it will take over a day to complete.

Tom Rinehart, Volunteer tester. Joined: 12 Dec 01, Posts: 113, Credit: 13,255,975, RAC: 6
Message 1754398 - Posted: 6 Jan 2016, 18:08:12 UTC - in response to Message 1754246.
I was able to run MBv8_8.0r3300_clGPU_sse41_x86_64-apple-darwin on my 27" iMac. The computer is:
iMac (27-inch, Late 2009)
Processor: 2.8 GHz Intel Core i7
Memory: 8 GB 1067 MHz DDR3
Graphics: ATI Radeon HD 4850 512 MB
I did not edit mb_cmdline_mac_OpenCL_sah.txt. I just ran it.
The first WU worked: http://setiathome.berkeley.edu/result.php?resultid=4651566055
The second did not: http://setiathome.berkeley.edu/result.php?resultid=4651565303
I will try deleting the files and see if it works again (old problem issues?).
