I've Built a Couple OSX CUDA Apps...

TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1750211 - Posted: 18 Dec 2015, 21:13:22 UTC
Last modified: 18 Dec 2015, 21:14:13 UTC

YES!
First test with the new code from a few hours ago allows the CUDA task to start normally after both cards have been running APs.
Nice. Thanks Petri.



Now to figure out why all three Apps run around 3 times as slow when the other card is running an AP.
Also need to figure out what Jason has done...
ID: 1750211
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1750247 - Posted: 18 Dec 2015, 23:21:42 UTC - in response to Message 1750211.  

YES!
First test with the new code from a few hours ago allows the CUDA task to start normally after both cards have been running APs.
Nice. Thanks Petri.



Now to figure out why all three Apps run around 3 times as slow when the other card is running an AP.
Also need to figure out what Jason has done...



I'm sure Jason has worked his magick. Glad it helped; I could reduce the memory need by about 800 MB.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1750247
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1750276 - Posted: 19 Dec 2015, 1:19:22 UTC - in response to Message 1750247.  
Last modified: 19 Dec 2015, 1:26:25 UTC

I'm sure Jason has worked his magick. Glad it helped; I could reduce the memory need by about 800 MB.


Still very much in cleanup mode, though it looks like the first validations went through. Will take it easy until later due to the heat today, but will check the Windows builds (making include file fixes as need be) and probably try to get a Linux build going under the new build system. Compiler flags, then v8 mods after that. Generic-ised hand optimisation and Gradle automation while the v8 mods percolate through testing by the team.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1750276
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1750533 - Posted: 20 Dec 2015, 8:56:32 UTC

Windows build needed only a single include re-enabled, that the Mac didn't like.
2 platforms down (out of 3 for now)
Onto checking/tweaking the Linux build tonight, then v8 updates here we come! (Eric appears to be getting closer at beta with the CPU build)
ID: 1750533
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1751638 - Posted: 25 Dec 2015, 15:26:12 UTC
Last modified: 25 Dec 2015, 15:29:09 UTC

Third platform, quick and dirty MB Cuda linux build operational, no extreme build system difficulties, though the deprecations from Cuda65 on are going to make things tricky there too (as on Win and Mac).

As per log, it's baseline for (so nothing special), and will just hold until v8 updates and some optimisations go in. Since all three platforms are running here now, there are no more blockages to v8 updates, build system cleanup/change as things go, then back to optimisation proper (Finally!)
ID: 1751638
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1751656 - Posted: 25 Dec 2015, 17:19:34 UTC - in response to Message 1751638.  

Greetings Jason. Would you please look into compiling a v8 OSX CPU App from the seti_boinc/client folder? The one currently on beta is working very slowly on my Mac and I can't get a compile to work. If I try it with the graphics I get a number of errors, and without graphics it doesn't seem to be building the boinc libs. I was able to compile a CPU app from the sah_v7_opt folder but it fails on launch. The thread is here: http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266
ID: 1751656
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1751657 - Posted: 25 Dec 2015, 17:29:19 UTC - in response to Message 1751656.  
Last modified: 25 Dec 2015, 17:31:08 UTC

Will have to look at that indeed, since that branch holds the major changes I'll have to port to XBranch, hopefully over the coming week.

One caveat is that I am still seeing notable commits to various bits of multibeam (v8 for beta), so expect slowness (debugging) by default. In that context, optimisation and speed are not necessarily appropriate. It looks like there are some commits by Charlie Fenton with respect to XCode projects, so it could be worth investigating whether that works better for you in the meantime. Basically, the GNU autotools build system from old to new OSX (mostly libtool) appears to me to have a number of breakages, which would explain the complexity there as well as my own juggling.

Slightly longer term, I can't promise to convince every party to use the most common and simple build system, though I suspect that if I put a unified flat Make alongside the existing system, then Eric and others might not object.
ID: 1751657
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1751732 - Posted: 26 Dec 2015, 5:11:34 UTC - in response to Message 1751657.  
Last modified: 26 Dec 2015, 5:11:52 UTC

Note that it will be worth checking whether you get Arecibo or GBT tasks, since by my very rudimentary understanding of the differences, GBT tasks are bigger and very low angle range (targeted) by comparison to the familiar mid to high angle range tasks. Could take some time to shake out what works and what doesn't, as well as expected performance.
ID: 1751732
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1752235 - Posted: 29 Dec 2015, 9:25:12 UTC
Last modified: 29 Dec 2015, 9:26:52 UTC

A few days later, and inconclusive to pending ratios on all three platforms appear nominal (<5%). This indicates that, despite excessive heat, the machines with the sanity check builds appear to function as expected, and overall project health under MB v7 (all CPU+GPU) seems to be in the usual steady state.
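
As a rough illustration of the check described above, here is a minimal Python sketch of an inconclusive-to-pending ratio compared against the 5% figure. The task counts are made up purely for illustration (the post gives no raw numbers), and the function and constant names are hypothetical.

```python
# Sketch only: the 5% limit comes from the post; the counts below are
# hypothetical placeholders, not real host statistics.
NOMINAL_LIMIT = 0.05


def inconclusive_ratio(inconclusive: int, pending: int) -> float:
    """Return inconclusive results as a fraction of pending results."""
    if pending <= 0:
        raise ValueError("pending must be a positive count")
    return inconclusive / pending


example_inconclusive = 12   # hypothetical count
example_pending = 400       # hypothetical count
ratio = inconclusive_ratio(example_inconclusive, example_pending)
print(f"ratio: {ratio:.1%}, nominal: {ratio < NOMINAL_LIMIT}")
```

The same comparison would be repeated per platform (Windows, Mac, Linux) using the counts shown on each host's task pages.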

Waiting on a RAID rebuild+verify, after a drive failed (probably due to the heat mentioned) then onto the fairly straightforward v8 modifications.

Probably going to have to think about adding some kind of throttle, before some of Petri's and other optimisations become stock. Cooking myself and dog while we sleep is looking like a distinct possibility, lol.
ID: 1752235
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1752591 - Posted: 31 Dec 2015, 2:47:21 UTC
Last modified: 31 Dec 2015, 3:42:42 UTC

Well, I'm going to Declare Success on the opening round of SETIv8 CPU trials & tribulations.
I would like to thank whoever is responsible for setting up the opportunity for me to break my old record of merely Twice as Fast. I didn't think exceeding that feat would be possible. The New v8 CPU App is... FOUR TIMES as Fast as the current OSX CPU App on Beta. On My machine anyway.
Some comparisons from http://setiweb.ssl.berkeley.edu/beta/host_app_versions.php?hostid=63959

SETI@home v7 7.00 i686-apple-darwin
Average processing rate: 12.75 GFLOPS

SETI@home v7 7.07 x86_64-apple-darwin (sse41)
Average processing rate: 23.66 GFLOPS

SETI@home v8 8.00 x86_64-apple-darwin
Average processing rate: 5.61 GFLOPS

SETI@home v8 (anonymous platform, CPU)
Average processing rate: 22.29 GFLOPS

Or in other terms;
SETI@home v8 8.00 x86_64-apple-darwin
Run time: 3 hours 41 min 7 sec
CPU time: 3 hours 40 min 45 sec
WU true angle range is : 2.729899

MBv8_8.0r3299_sse41_x86_64-apple-darwin
Run time: 52 min 9 sec
CPU time: 51 min 40 sec
WU true angle range is : 2.596247
If you don't like those generalized times there are more here, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959
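
For anyone who wants to check the "four times" figure, the arithmetic can be sketched in a few lines of Python using only the run times and average processing rates quoted above. The two work units have slightly different angle ranges (2.73 vs 2.60), so this is a rough comparison rather than a like-for-like benchmark; the helper name `to_seconds` is just for illustration.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s run time into seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Run times quoted in the post.
stock_run = to_seconds(3, 41, 7)   # SETI@home v8 8.00 x86_64-apple-darwin
opt_run = to_seconds(0, 52, 9)     # MBv8_8.0r3299_sse41_x86_64-apple-darwin

# Average processing rates (GFLOPS) quoted in the post.
stock_rate = 5.61
opt_rate = 22.29

run_speedup = stock_run / opt_run
rate_speedup = opt_rate / stock_rate

print(f"run-time speedup: {run_speedup:.2f}x")
print(f"processing-rate speedup: {rate_speedup:.2f}x")
```

Both ratios land close to 4, which is consistent with the claim.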

*Up to Four Times as Fast*™
😎
ID: 1752591
Zalster Special Project $250 donor
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1752595 - Posted: 31 Dec 2015, 3:17:52 UTC - in response to Message 1752591.  

Congratulations!!
ID: 1752595
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1752600 - Posted: 31 Dec 2015, 3:49:00 UTC - in response to Message 1752595.  

Much Obliged.
Now if we could just get Petri's CUDA code transferred over to v8.
I'm kinda getting used to my two 750Ti performing as if they were 780s.
ID: 1752600
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1752604 - Posted: 31 Dec 2015, 4:23:40 UTC - in response to Message 1752600.  
Last modified: 31 Dec 2015, 4:33:49 UTC

Base v8 compatibility, then it's open season on that for sure ;)

Yeah Eric's still tweaking v8 CPU, which is the baseline for changes. Unsure when newest builds go live on Beta [or here directly perhaps...]. Doesn't stop me making the start with Cuda now the 3 platforms build (once it cools down again), though expect changes if problems turn up in beta.

Fingers crossed I come across what's causing the validation issues with Petri's builds, as we go. From investigation here before, a couple of months back with Windows builds, it looked like it's confined to the PoT analysis (gaussians and/or pulses), so fingers crossed nothing huge.

For Petri's additions, won't be able to run streams or large memory on the oldest GPUs and CUDA versions supported, so will have to spend some time adding detection and option logic. Not sure yet where a reasonable default line will be for general distribution to avoid cooking cards/systems; will cross that bridge when we come to it.
ID: 1752604
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1754124 - Posted: 5 Jan 2016, 11:54:47 UTC

The New Mac Petri Version 8 CUDA App is reality. Not only is it just as fast as the Version 7 creation, it appears the number of Inconclusive results has been drastically reduced, http://setiathome.berkeley.edu/results.php?hostid=6796479&offset=180&appid=29
I have also compiled a 'Stock' version from r3312 using ToolKit 6.5 that should work on Compute Capability 2.0 and above cards. I'll be switching to that version shortly. A new Version 8 CPU App was created and appears to be about as fast as the old v7 SSE41 version.
The current list of Version 8 Mac Apps:
MBv8_8.0r3300_clGPU_sse41_x86_64-apple-darwin
MBv8_8.0r3301_avx_x86_64-apple-darwin
MBv8_8.0r3304_sse41_x86_64-apple-darwin
MBv8_8.0r3305_ati5_ssse3_x86_64-apple-darwin
MBv8_8.0r3306_nvidia_ssse3_x86_64-apple-darwin
setiathome_x41zc_x86_64-apple-darwin_cuda65_Petri
setiathome_x41zc_x86_64-apple-darwin_cuda65_Stock

Getting to be quite a list...
ID: 1754124
William
Volunteer tester
Joined: 14 Feb 13
Posts: 2037
Credit: 17,689,662
RAC: 0
Message 1754126 - Posted: 5 Jan 2016, 12:15:40 UTC

you may want to make sure it's x41zf and call it accordingly.
A person who won't read has no advantage over one who can't read. (Mark Twain)
ID: 1754126
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1754246 - Posted: 6 Jan 2016, 2:03:56 UTC

I have posted a few of the Apps that have passed early testing. The packages are self-contained and just need to be extracted and placed in the /Library/Application Support/BOINC Data/projects/setiathome.berkeley.edu folder. The permissions will have to be set after installing the files; the easiest way is to simply reinstall BOINC, which sets the file permissions. To revert to Stock, simply remove the files from the setiathome.berkeley.edu folder.
SETIv8 OSX Apps
ID: 1754246
Tom Rinehart
Volunteer tester
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1754275 - Posted: 6 Jan 2016, 5:41:40 UTC - in response to Message 1754246.  

TBar -

Thanks for making these available. So far I've been able to test these on my:

iMac (21.5-inch, Mid 2010)
Processor 3.06 GHz Intel Core i3
Memory 16 GB 1333 MHz DDR3
Graphics ATI Radeon HD 4670 256 MB

The MBv8_8.0r3304_sse41_x86_64-apple-darwin seems to work great and is much faster than the stock CPU app.

The MBv8_8.0r3300_clGPU_sse41_x86_64-apple-darwin runs until it gets to about 2% and then fails with a computation error. I wonder if it is a GPU memory problem with mine not having enough. These are the results:

http://setiathome.berkeley.edu/result.php?resultid=4650994079
http://setiathome.berkeley.edu/result.php?resultid=4650994066
http://setiathome.berkeley.edu/result.php?resultid=4650994065
http://setiathome.berkeley.edu/result.php?resultid=4650994036
http://setiathome.berkeley.edu/result.php?resultid=4650994035
http://setiathome.berkeley.edu/result.php?resultid=4650993990

To fix the permissions I used:

cd "/Library/Application Support/BOINC Data/projects/setiathome.berkeley.edu"
sudo chown boinc_master:boinc_project *.*
sudo chmod +r MultiBeam_Kernels_r3300.cl

I had to add the last command because when I first tried to run the GPU app it said Postponed - can’t read CL file.
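
A quick way to spot files that would trigger the same "can't read" postponement is to list anything in the project folder that is missing a read bit. The Python sketch below is only an illustration: exactly which permission bits BOINC needs is an assumption here (the post only shows that `chmod +r` fixed it), and the function name and demo file names are hypothetical. The demo runs against a temporary folder rather than the real BOINC directory.

```python
import os
import stat
import tempfile

# Assumption: "readable" means read bits for user, group and other,
# which is what `chmod +r` normally grants.
READ_BITS = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def files_missing_read(folder: str) -> list[str]:
    """Return names of regular files in folder lacking any read bit."""
    missing = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        mode = os.stat(path).st_mode
        if (mode & READ_BITS) != READ_BITS:
            missing.append(name)
    return missing


# Demo on a throwaway folder with hypothetical file names.
with tempfile.TemporaryDirectory() as demo_folder:
    for name, mode in (("kernels_demo.cl", 0o600), ("app_demo", 0o644)):
        path = os.path.join(demo_folder, name)
        with open(path, "w") as handle:
            handle.write("placeholder\n")
        os.chmod(path, mode)
    demo_missing = files_missing_read(demo_folder)

print(demo_missing)
```

To use it for real, pass the setiathome.berkeley.edu project folder path instead of the temporary folder; any file it reports is a candidate for `sudo chmod +r`.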

I will try these on my 27" iMac tomorrow after it is done with its current work.
ID: 1754275
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1754279 - Posted: 6 Jan 2016, 6:06:48 UTC - in response to Message 1754275.  

Looks like the same error as before on the HD4 card. Probably not going to work on those cards. It worked well on my GTX 750Ti and ATI HD6870; you can see the 750 results here: http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959&offset=100
One thing you can try is to open the mb_cmdline_mac_OpenCL_sah.txt file and change the settings to:
-sbs 64 -oclfft_tune_gr 64 -oclfft_tune_wg 64 -period_iterations_num 96
That should reduce the memory load and work better on the older cards.
You could try different settings and maybe -oclfft_tune_wg 32 as the results say it can't use over wg size 32. I'm not sure how low you can set those settings without getting even more errors. As before, you should try removing the generated files in between restarts to see if it will run at all. You will have to set the permissions again after editing the mb_cmdline_mac_OpenCL_sah.txt file.
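
If you end up trying several combinations, the settings line can be edited programmatically. This is a minimal Python sketch, assuming every option in the line takes exactly one value (true for the four options shown above, but not verified for the app's full option set); `parse_cmdline` and `render_cmdline` are hypothetical helper names, not part of any SETI@home tool.

```python
def parse_cmdline(text: str) -> dict[str, str]:
    """Split '-option value' pairs into a dict, keeping their order."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected '-option value' pairs")
    settings = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("-"):
            raise ValueError(f"not an option name: {flag!r}")
        settings[flag] = value
    return settings


def render_cmdline(settings: dict[str, str]) -> str:
    """Join the settings back into a single command-line string."""
    return " ".join(f"{flag} {value}" for flag, value in settings.items())


# The settings line suggested in the post.
suggested = "-sbs 64 -oclfft_tune_gr 64 -oclfft_tune_wg 64 -period_iterations_num 96"

settings = parse_cmdline(suggested)
settings["-oclfft_tune_wg"] = "32"   # the smaller workgroup size mentioned above
edited = render_cmdline(settings)
print(edited)
```

The resulting string is what would be written back into mb_cmdline_mac_OpenCL_sah.txt, after which the permissions need to be set again.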
Good Luck.
ID: 1754279
Tom Rinehart
Volunteer tester
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1754395 - Posted: 6 Jan 2016, 18:01:26 UTC - in response to Message 1754279.  

I edited mb_cmdline_mac_OpenCL_sah.txt to:

-sbs 64 -oclfft_tune_gr 64 -oclfft_tune_wg 32 -period_iterations_num 96

It is now running, but it is slow. It is saying it will take over a day to complete.
ID: 1754395
Tom Rinehart
Volunteer tester
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1754398 - Posted: 6 Jan 2016, 18:08:12 UTC - in response to Message 1754246.  

I was able to run MBv8_8.0r3300_clGPU_sse41_x86_64-apple-darwin on my 27" iMac. The computer is:

iMac (27-inch, Late 2009)
Processor 2.8 GHz Intel Core i7
Memory 8 GB 1067 MHz DDR3
Graphics ATI Radeon HD 4850 512 MB

I did not edit mb_cmdline_mac_OpenCL_sah.txt. I just ran it. The first WU worked:

http://setiathome.berkeley.edu/result.php?resultid=4651566055

The second did not:

http://setiathome.berkeley.edu/result.php?resultid=4651565303

I will try deleting the files and see if it works again (old problem issues?).
ID: 1754398


 