I've Built a Couple OSX CUDA Apps...

Message boards : Number crunching : I've Built a Couple OSX CUDA Apps...

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820341 - Posted: 29 Sep 2016, 0:02:26 UTC

So,
Darwin 16.0 is out...so is CUDA Toolkit 8.0.
Do you see a problem here?
Click on the green buttons that describe your target platform. Only supported platforms will be shown.

Operating System   Windows  Linux

The 8.0 driver appears to be MIA as well.
Best I can tell there is a problem with NVCC and CLANG.
Of course, only nVidia knows for sure...

Not impressed.
ID: 1820341
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820356 - Posted: 29 Sep 2016, 1:26:11 UTC - in response to Message 1820341.  

There have been rumours that the next MacBook Pros and/or Mac Pros may include Pascal GPUs, based on certain job openings advertised by nVidia. These were aimed at finding experienced Mac developers to take part in Metal and OpenCL development, and driver development.

So to my mind, it amounts to Cuda support not being there yet for OSX, and the underlying infrastructure might be temporarily missing (for example, on Windows Cuda uses DirectX underneath, with more direct means on Linux).

The logical conclusion for me is that Cuda support would need to migrate significant underlying portions, and that would probably happen piecemeal rather than all at once.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820356
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1820839 - Posted: 30 Sep 2016, 14:40:45 UTC - in response to Message 1820356.  

It's here, https://developer.nvidia.com/cuda-downloads.
Says it's for Darwin 15 & 16. I launched the installer in Darwin 14.5 and got all the way to the button that installs the Driver only, without any complaints. Then I booted into Darwin 15.5 and actually installed the driver. I haven't tried it with Darwin 15.6 yet.

I changed out the HDD on the Linux machine and installed the 361.45.18 driver. So far p_zi+ has run about half a day without any stalls. With the same driver p_zi3i would stall/hang about every other task running on the 750Ti cards.
ID: 1820839
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820847 - Posted: 30 Sep 2016, 15:27:58 UTC - in response to Message 1820839.  
Last modified: 30 Sep 2016, 15:29:46 UTC

In the meantime, an interesting experience between local storms with the alpha build on Windows. I had assumed my host/system wasn't enough to feed the GTX980, so I added some additional syncs and wound back settings to get some usability back.

Tonight I left everything the same but upped the number of instances to 2. No additional lag, and the Guppi task elapsed times stayed roughly the same (so doubling the throughput).

So, the lag is indeed the Windows driver doing some funky stream fusion, inducing some compound kernels that are too long (as opposed to the CPU or bus being hammered, as I had thought). That means this weekend will involve adding synclevel and syncrate options to force a break-up of those pulsefinds. A little polish, and it should be testable by others.
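
A rough sketch of the break-up idea, written in Python rather than CUDA so it can be run anywhere. It assumes that a "syncrate" of N means "synchronize after every N launches"; the thread does not define synclevel or syncrate, so that reading, and the names plan_launches and chunk_items, are hypothetical.

```python
def plan_launches(total_items, chunk_items, syncrate):
    """Split work into chunked launches, with a sync after every `syncrate` launches.

    Hypothetical model only: each ("launch", start, count) stands for one short
    kernel, and ("sync",) stands for a point where the host would wait for the
    device, so the driver cannot merge more than `syncrate` launches together.
    """
    if total_items < 0 or chunk_items <= 0 or syncrate <= 0:
        raise ValueError("total_items must be >= 0; chunk_items and syncrate must be > 0")
    plan = []
    pending = 0  # launches issued since the last sync
    for start in range(0, total_items, chunk_items):
        count = min(chunk_items, total_items - start)
        plan.append(("launch", start, count))
        pending += 1
        if pending == syncrate:
            plan.append(("sync",))
            pending = 0
    if pending:
        plan.append(("sync",))  # make sure the tail of the work is also synced
    return plan


for step in plan_launches(total_items=10, chunk_items=4, syncrate=2):
    print(step)
```

A smaller syncrate gives shorter unsynchronized runs (less lag) at the cost of more syncs, which is the trade-off described above.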
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820847
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1820851 - Posted: 30 Sep 2016, 15:52:17 UTC - in response to Message 1820847.  
Last modified: 30 Sep 2016, 15:53:37 UTC

And if you increase unroll from 1 to 16, you'll shorten the kernel runtimes. The kernel work size is the same, but more SMX units participate. A 27 ms kernel is done in 0.7-4 ms.
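
As a back-of-envelope check on those numbers, here is a small sketch assuming ideal linear scaling: the same work is spread evenly over min(unroll, number of multiprocessors) units with no launch or memory overhead. The function name and the count of 16 multiprocessors are assumptions for illustration, not measurements.

```python
def ideal_kernel_ms(single_unit_ms, unroll, sm_count):
    """Estimate kernel time if the work splits evenly across the active units.

    Assumes perfect scaling; real kernels will deviate from this in both directions.
    """
    if single_unit_ms < 0 or unroll <= 0 or sm_count <= 0:
        raise ValueError("single_unit_ms must be >= 0; unroll and sm_count must be > 0")
    active_units = min(unroll, sm_count)  # extra unroll beyond sm_count adds no units
    return single_unit_ms / active_units


# 27 ms at unroll 1, spread over 16 units (assumed multiprocessor count)
print(ideal_kernel_ms(27.0, unroll=16, sm_count=16))  # 1.6875
```

The ideal figure of about 1.7 ms falls inside the 0.7-4 ms range quoted above.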
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1820851
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820856 - Posted: 30 Sep 2016, 16:02:55 UTC - in response to Message 1820851.  
Last modified: 30 Sep 2016, 16:30:25 UTC

Yep, have been poking at various settings over time (including unroll 16). No lag induced by higher unrolls. [unroll is there, but not showing during startup]. Still won't be going for speed here just yet (a few Windows-specific quirks to iron out with the shutdown).

Probably while alpha circulates I'll dig out some older self-scaling tests. If I can replace the old clumsy pf settings with self-scaling, timer-based values, one for each fft length, then it should be able to keep the execution time more or less constant: target nn milliseconds, minimising syncs but staying within the Windows driver's lag-free envelope.

[Edit:] Re-enabled that printing of unroll, backed off to a single instance, kept low pf settings. Let's see what the Guppi times drop to.
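
One way the self-scaling idea could look, as a minimal Python sketch. It assumes a simple proportional rule: after each timed run, scale the chunk size for that fft length by target time over measured time, then clamp it. The class name, parameters and limits are all hypothetical; the older self-scaling tests mentioned above may work quite differently.

```python
class SelfScalingChunks:
    """Keep one chunk size per fft length, nudged toward a target run time."""

    def __init__(self, target_ms, initial_chunk=8, min_chunk=1, max_chunk=1024):
        if target_ms <= 0 or not (1 <= min_chunk <= initial_chunk <= max_chunk):
            raise ValueError("need target_ms > 0 and 1 <= min <= initial <= max")
        self.target_ms = target_ms
        self.initial_chunk = initial_chunk
        self.min_chunk = min_chunk
        self.max_chunk = max_chunk
        self._chunks = {}  # fft length -> current chunk size

    def chunk_for(self, fft_len):
        """Chunk size to use for the next launch at this fft length."""
        return self._chunks.get(fft_len, self.initial_chunk)

    def report(self, fft_len, measured_ms):
        """Feed back a timing and return the adjusted chunk size."""
        current = self.chunk_for(fft_len)
        if measured_ms <= 0:
            return current  # ignore unusable timings
        scaled = int(current * self.target_ms / measured_ms)
        adjusted = max(self.min_chunk, min(self.max_chunk, scaled))
        self._chunks[fft_len] = adjusted
        return adjusted


scaler = SelfScalingChunks(target_ms=10.0)
print(scaler.report(1024, measured_ms=20.0))  # ran twice as long as wanted, so halve: 4
```

Each fft length converges on its own chunk size, so short and long transforms can both sit near the same target time without hand-tuned settings.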
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820856
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1820868 - Posted: 30 Sep 2016, 16:54:49 UTC
Last modified: 30 Sep 2016, 16:56:03 UTC

ID: 1820868
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820869 - Posted: 30 Sep 2016, 17:07:36 UTC - in response to Message 1820851.  
Last modified: 30 Sep 2016, 17:08:24 UTC

Special speed run just for you ;)
http://setiathome.berkeley.edu/result.php?resultid=5187401114
Well, not really a speed run, because I was watching a Paul's Hardware video on YouTube and put the pf settings very low, but it should show I have no speed issue (just boincapi wiring to fix up).

I'll just (in the morning) default unroll to the number of SMs, tidy up a Windows suspend issue, then throw Richard a quick pre-alpha to check those oddities with.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820869
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820870 - Posted: 30 Sep 2016, 17:11:46 UTC - in response to Message 1820868.  
Last modified: 30 Sep 2016, 17:17:05 UTC

test case: http://boinc2.ssl.berkeley.edu/beta/download/13b/27jl16aa.19977.160094.8.42.65

http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266&postid=59664 and messages around that.


Will run that through 8.00, baseline and zipa2 Windows build comparison, since I haven't been able to replicate the validation issues so far.

[Edit:] Task file deleted already; if someone can email it, that'd be great.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820870
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1820872 - Posted: 30 Sep 2016, 17:19:53 UTC - in response to Message 1820870.  
Last modified: 30 Sep 2016, 17:24:22 UTC

I think you have it already - it's the one we all processed to death just over a week ago.

Edit - OK, this week. I sent it to you early (UK time) on Monday 26 Sept, and to Petri a few hours later the same day.
ID: 1820872
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820874 - Posted: 30 Sep 2016, 17:25:38 UTC - in response to Message 1820872.  
Last modified: 30 Sep 2016, 17:26:49 UTC

Oh, one of those ---> yeah, no discrepancies found here, but I'm using Petri's zi+a sources plus my very Windows-specific mods. We'll need to work out something for a direct Linux comparison. An ssh tunnel into a terminal, perhaps? Will think about options to try, assuming you don't have a Linux install handy (?)
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820874
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1820878 - Posted: 30 Sep 2016, 17:37:22 UTC - in response to Message 1820874.  

No, I don't have Linux to hand.

If you look back through the email thread I copied you into with six similar cases (28 Sept), you'll see that Petri reported bench tests:

I ran a first test against the 8.04 standard result:
./rescmpv5_l testData/ref-result.setiathome_8.04_i686-pc-linux-gnu.27jl16aa.19977.160094.8.42.65.wu.sah testData/result.axo.27jl16aa.19977.160094.8.42.65.wu.sah
Result : Strongly similar, Q= 99.73%

So my current version is OK.

Next, against Raistmer's result:
./rescmpv5_l testData/result-MB8_win_x86_SSE3_OpenCL_NV_SoG_r3528.exe-27jl16aa.19977.160094.8.42.65.wu.res testData/result.axo.27jl16aa.19977.160094.8.42.65.wu.sah
Result : Strongly similar, Q= 99.73%

So OK again.

The implication being that his current source code (which may or may not be the code he was running at Beta), running on his bench machine (which may or may not be the machine he was running at Beta), isn't throwing that class of error. I see the machine in question has neither fetched nor reported any Beta work since 28 September, so its current status is unknowable except to Petri.
ID: 1820878
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820881 - Posted: 30 Sep 2016, 17:43:58 UTC - in response to Message 1820878.  
Last modified: 30 Sep 2016, 17:51:46 UTC

Ah, I see. Another factor is that 4 x 1080s running that application are likely to be reasonably toasty, so a single-task bench run may not be representative of conditions when running fully loaded.

[Edit:] On my 980, zipa2 with conservative settings runs a full 7 degrees C hotter than Cuda50.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820881
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1820882 - Posted: 30 Sep 2016, 17:49:14 UTC - in response to Message 1820881.  

I did consider looking to see if each of the seven identified cases (Raistmer's initial report, and the six I emailed) ran on the same device out of the four. But life intervened.
ID: 1820882
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820883 - Posted: 30 Sep 2016, 17:51:07 UTC - in response to Message 1820882.  

Yep, storms happened here. Oh well. Lots to think about during alpha, regarding temps to start with.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820883
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1820887 - Posted: 30 Sep 2016, 17:58:34 UTC - in response to Message 1820883.  

Nothing as serious as that, but I will be offline all day tomorrow.

Having temporarily banished real life, I've checked the device numbers. In the order I happened to look them up:

1, 3, 2, 4, 1, 2, 4.

That's about as non-deterministic as we're likely to get.
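
For anyone wanting to check the spread, a quick tally of that sequence in Python (the function name is just for illustration):

```python
from collections import Counter


def tally_devices(device_numbers):
    """Count how many of the reported cases ran on each device."""
    return Counter(device_numbers)


cases = [1, 3, 2, 4, 1, 2, 4]  # device numbers in the order listed above
counts = tally_devices(cases)
for device in sorted(counts):
    print(f"device {device}: {counts[device]} case(s)")
```

All four devices appear, and none more than twice, so no single card stands out from these seven cases.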
ID: 1820887
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1820891 - Posted: 30 Sep 2016, 18:09:32 UTC - in response to Message 1820887.  
Last modified: 30 Sep 2016, 18:10:01 UTC

Resembles the firing order on my [six cylinder] 1988 Ford Falcon (about to be retired to the scrapyard).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1820891
TimeLord04
Volunteer tester
Joined: 9 Mar 06
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1822229 - Posted: 6 Oct 2016, 15:35:06 UTC

With Einstein out of work, I've resumed crunching at SETI Beta. MAC/Hackintosh now crunching CUDA75 there, again. 😃


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1822229
TimeLord04
Volunteer tester
Posts: 21140
Credit: 33,933,039
RAC: 23
United States
Message 1822290 - Posted: 6 Oct 2016, 20:10:10 UTC

Already posted this in Beta...

Just upgraded Andromeda (Mac OS X El Capitan 10.11.4) to CUDA Driver 8.0.46 from the last 7.x.x CUDA Driver (7.??.30; can't remember the exact driver number now...).

Also, just found out that the Mac Pro 3,1 model is NOT able to upgrade to Sierra. The 2009 5,1 is the oldest Mac Pro that can upgrade to Sierra. As a Hackintosh, I am able to change my model to a 5,1 if I choose. However, as my RAM is DDR2, not the DDR3 which the real 5,1 has, I don't think I will force a model change on my system. I'd hate to do that and then have it NOT work...

I went to iBuildMACs and checked out the price of a custom 2009 5,1 system. With the specs I want, I CAN get it for just above $1,500.00. Don't have that right now, however, so I will start saving and see what happens in about a year.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
ID: 1822290
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1822391 - Posted: 7 Oct 2016, 3:51:01 UTC - in response to Message 1822290.  
Last modified: 7 Oct 2016, 3:53:41 UTC

I got my genuine 2009 Mac Pro, which was already updated to 5,1 spec and upgraded to 3.46 GHz CPUs and 32 GB ECC memory, for AU$1800 on eBay (~US$1360).

By my understanding, you should be able to get a much better deal in the US, due to more of those around.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1822391
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.