Mac, NVidia GPU and Seti@home

Message boards : Number crunching : Mac, NVidia GPU and Seti@home

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1407284 - Posted: 24 Aug 2013, 12:55:50 UTC - in response to Message 1407277.  
Last modified: 24 Aug 2013, 12:56:54 UTC

Had similar problem. I fixed by setting:

export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-5.0/lib:$DYLD_LIBRARY_PATH

in .bash_profile. You'll need to check library is in same place. Still looking at details of how library is provided as part of app (Einstein appears to use this solution) or a better way to ensure library found.



The Linux equivalent was what I had to manipulate to get what I regard as a bit closer to 'sane'.

In that case it was ldconfig exports.

If it works the same way as on Linux (but named differently), you might want to set rpath to include origin, then the exe will look for the runtime & cufft shared library in the same place as the executable when run.
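
A minimal sketch of the rpath idea on OS X, assuming Apple's developer tools (install_name_tool, otool) are installed and that the app's CUDA dependencies are recorded under @rpath/... names. The helper names add_loader_rpath and show_linked_libs are made up for illustration; @loader_path is the closest OS X counterpart to Linux's $ORIGIN.

```shell
# Hypothetical helper: add the binary's own directory to its runtime
# search path, so dyld also looks there for @rpath-relative libraries.
add_loader_rpath() {
    exe=$1
    [ -f "$exe" ] || { echo "no such file: $exe" >&2; return 1; }
    command -v install_name_tool >/dev/null 2>&1 || {
        echo "install_name_tool not found (needs Apple's developer tools)" >&2
        return 1
    }
    # @loader_path expands to the directory of the binary doing the loading.
    install_name_tool -add_rpath @loader_path "$exe"
}

# Hypothetical helper: list the libraries a binary was linked against.
# Entries starting with @rpath/ are the ones an added rpath can resolve.
show_linked_libs() {
    command -v otool >/dev/null 2>&1 || { echo "otool not found" >&2; return 1; }
    otool -L "$1"
}
```

If that assumption holds, running add_loader_rpath on the science app binary and dropping the runtime & cufft libraries next to it should remove the need for DYLD_LIBRARY_PATH. The same rpath can instead be set at link time (with Apple's linker, -Wl,-rpath,@loader_path).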
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1407284
enewman
Volunteer tester

Joined: 27 Jun 01
Posts: 15
Credit: 6,344,951
RAC: 0
United States
Message 1407285 - Posted: 24 Aug 2013, 12:56:15 UTC - in response to Message 1407281.  

I agree as this is not supportable for all BOINC/Seti users. Just a way that I got my machine up and running to test.
ID: 1407285
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1407287 - Posted: 24 Aug 2013, 13:19:08 UTC - in response to Message 1407284.  
Last modified: 24 Aug 2013, 14:12:47 UTC

Had similar problem. I fixed by setting:

export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-5.0/lib:$DYLD_LIBRARY_PATH

in .bash_profile. You'll need to check library is in same place. Still looking at details of how library is provided as part of app (Einstein appears to use this solution) or a better way to ensure library found.



The Linux equivalent was what I had to manipulate to get what I regard as a bit closer to 'sane'.

In that case it was ldconfig exports.

If it works the same way as on Linux (but named differently), you might want to set rpath to include origin, then the exe will look for the runtime & cufft shared library in the same place as the executable when run.

Would it work if I just copied the libcuda.dylib file to the project folder? Or linked to it? It appears to be in /usr/local/cuda/lib

Nope, neither one of those works. Oh well, I guess when the App is finally released I might try it again...

This appears to sum it up,
Charlie Fenton | 9 Jan 2010 09:16
Re: MAC OS X CUDA App in BOINC

The standard CUDA driver installer for the Mac installs only one library:
/usr/local/cuda/lib/libcuda.dylib
so applications will almost certainly need to send along any other
libraries they need. The beta Einstein application sends
libcudart.dylib, libcufft.dylib and libtlshook.dylib along with the
science app...

http://permalink.gmane.org/gmane.comp.distributed.boinc.devel/2662

As far as I can tell, I DO NOT have libcudart.dylib on my machine. Maybe someone can post the required files in the thread at Arkayns' site?

    Sat Aug 24 08:18:17 2013 | | OS: Mac OS X 10.8.4 (Darwin 12.4.0)
    Sat Aug 24 08:18:17 2013 | | CUDA: NVIDIA GPU 0: GeForce GTS 250 (driver version 5.5.25, CUDA version 5.50, compute capability 1.1, 1024MB, 951MB available, 705 GFLOPS peak)
    Sat Aug 24 08:18:17 2013 | | OpenCL: NVIDIA GPU 0: GeForce GTS 250 (driver version 8.14.11 313.01.02f01, device version OpenCL 1.0, 1024MB, 951MB available, 705 GFLOPS peak)
    Sat Aug 24 08:18:17 2013 | | OpenCL: AMD/ATI GPU 0: ATI Radeon Barts PRO Prototype (driver version 1.0, device version OpenCL 1.1, 1024MB, 1024MB available, 868 GFLOPS peak)

ID: 1407287
enewman
Volunteer tester

Joined: 27 Jun 01
Posts: 15
Credit: 6,344,951
RAC: 0
United States
Message 1407329 - Posted: 24 Aug 2013, 16:01:31 UTC - in response to Message 1407287.  

Believe standard location for these files is /usr/local/cuda/lib/ and these (on my machine with SDK installed) are sym-linked to /Developer/NVIDIA/CUDA-5.5/lib equivalents
ID: 1407329
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1407336 - Posted: 24 Aug 2013, 16:19:30 UTC - in response to Message 1407329.  
Last modified: 24 Aug 2013, 16:49:55 UTC

Believe standard location for these files is /usr/local/cuda/lib/ and these (on my machine with SDK installed) are sym-linked to /Developer/NVIDIA/CUDA-5.5/lib equivalents

It appears Charlie is correct. The only library installed by the CUDA driver is /usr/local/cuda/lib/libcuda.dylib. You have to download and install the CUDA Toolkit to obtain the libcudart.dylib. The Toolkit is a 790MB download. It looks as though it wants over a GB of space to Install. I wasn't expecting, or desiring, to install something taking up over a GB of space. I have extracted libcudart.dylib, and libcufft.dylib to the project folder and run 'export DYLD_LIBRARY_PATH=/Library/Application\ Support/BOINC\ Data/projects/setiathome.berkeley.edu:$DYLD_LIBRARY_PATH'. That doesn't appear to work either. It would be nice to have this work without having to install over a GB of Files.

It Appears this may work;
Download the ToolKit: https://developer.nvidia.com/cuda-downloads
Extract: libcudart5.5.dylib & libcufft5.5.dylib to /usr/local/cuda/lib
Change names to: libcudart.dylib & libcufft.dylib
Run: export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH

The first one is finally running.
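
The steps above can be sketched as a small shell snippet. The helper names install_lib_as and use_lib_dir are hypothetical, and the versioned file names are the ones given in this post; check them against what the Toolkit actually extracts on your machine.

```shell
# Hypothetical helper: copy one versioned CUDA library into a target
# directory under the unversioned name the app looks for.
install_lib_as() {
    src=$1       # path to the extracted, versioned library
    dest_dir=$2  # e.g. /usr/local/cuda/lib
    new_name=$3  # e.g. libcudart.dylib
    [ -f "$src" ] || { echo "missing: $src" >&2; return 1; }
    mkdir -p "$dest_dir" && cp "$src" "$dest_dir/$new_name"
}

# Hypothetical helper: prepend a directory to DYLD_LIBRARY_PATH for the
# current shell, without leaving a stray ':' when the variable is unset.
use_lib_dir() {
    export DYLD_LIBRARY_PATH="$1${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
}
```

Usage would be along the lines of `install_lib_as /path/to/extracted/libcudart5.5.dylib /usr/local/cuda/lib libcudart.dylib`, the same again for cufft, then `use_lib_dir /usr/local/cuda/lib`. Writing under /usr/local usually needs sudo, and the export only affects programs started from that shell session.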
ID: 1407336
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1407372 - Posted: 24 Aug 2013, 17:46:53 UTC - in response to Message 1407336.  

Well, the first one finished. It was estimated at 1:40; the next one is estimated at :33...

At least it seems to work; Task 3129070998

The same card was just in Linux & XP;
http://setiathome.berkeley.edu/results.php?hostid=6864181
http://setiathome.berkeley.edu/results.php?hostid=6979629
ID: 1407372
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1407595 - Posted: 25 Aug 2013, 12:50:44 UTC - in response to Message 1407372.  

So... How long do I need to run the new CUDA App to be considered a Success? It's been running almost a day now without any Errors. It's not exactly a 'real world' test though, considering I'm completely out of work for my other devices. It would be more realistic if I could obtain, say, a few dozen APs to heat up my ATI card and CPUs.

That way I could test the new CUDA & ATI Apps in a more realistic environment :-)
ID: 1407595
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1407631 - Posted: 25 Aug 2013, 16:16:33 UTC - in response to Message 1407624.  

...New APs are being split now. Go get them while you can.

Quiet now....those are mine!
ID: 1407631
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1407712 - Posted: 25 Aug 2013, 21:56:27 UTC - in response to Message 1407595.  
Last modified: 25 Aug 2013, 21:58:32 UTC

So... How long do I need to run the new CUDA App to be considered a Success? It's been running almost a day now without any Errors. It's not exactly a 'real world' test though, considering I'm completely out of work for my other devices. It would be more realistic if I could obtain, say, a few dozen APs to heat up my ATI card and CPUs.

That way I could test the new CUDA & ATI Apps in a more realistic environment :-)


A fair bit when you look at the nuts & bolts.

Basically, x41zc ported over 'relatively' painlessly due to Aaron's tweaks already being in there for Linux, then some Mac specific patch & juggling (according to Ed), but for stock distribution quite a bit more is needed than a solitary binary.

What isn't ready ( Both the Mac & Linux variants ), for public consumption as stock, are a number of things including:
- rationalised/updated build system (requiring a user have cuda toolkit installed is not usually practical, or necessary, under stock distribution). This would include ensuring minimal dependencies etc.
- some of the robust fail-safe error handling isn't active, as it is on Windows (deals with failures like Ed's been seeing better). Some involves a minor patch to the Boincapi used, and also handles driver or hardware problems more gracefully.
- Clarified system requirements & updated documentation.
- Probably more I forgot about off the top of my head.

So in a nutshell, bringing the Mac & Linux builds entirely into line with the multiple Windows flavours, so progress in the next optimisation phase can proceed more or less in lock-step.
ID: 1407712
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1407718 - Posted: 25 Aug 2013, 22:39:33 UTC - in response to Message 1407712.  
Last modified: 25 Aug 2013, 22:50:41 UTC

...What isn't ready ( Both the Mac & Linux variants ), for public consumption as stock, are a number of things including:
- rationalised/updated build system (requiring a user have cuda toolkit installed is not usually practical, or necessary, under stock distribution). This would include ensuring minimal dependencies etc.
- some of the robust fail-safe error handling isn't active, as it is on Windows (deals with failures like Ed's been seeing better). Some involves a minor patch to the Boincapi used, and also handles driver or hardware problems more gracefully.
- Clarified system requirements & updated documentation.
- Probably more I forgot about off the top of my head.

So in a nutshell, bringing the Mac & Linux builds entirely into line with the multiple Windows flavours, so progress in the next optimisation phase can proceed more or less in lock-step.

Yes, definitely need to have just the needed libraries placed in the project folder as with the 'other' cuda apps. I seem to be fine using just the three, libcuda.dylib, libcudart.dylib, and libcufft.dylib. I'm not even sure if you need anymore than just libcudart.dylib in the project folder. Edward seems to have an 'extra' library in his /usr/local/cuda/lib that I don't have.

Something I find interesting is the similar speed differences with the Mac Cuda and OpenCL Apps as compared to other platforms. I decided to test the Mac nVidia AstroPulse App again with the newer drivers. Just as a few weeks ago, the AP is taking just under twice as long as it should. This is about the same as the cuda app, it's just under twice as long as it is in XP & Linux. The good thing is the cuda app doesn't have the memory leak the AP app does. The AP app also has an extreme CPU usage variance from around 2 to 60% every few seconds. It's hard to tell with the cuda app since it has a low CPU usage to begin with. The nVidia AP app just decided to give the Maximum elapsed time exceeded error at 2.9 hrs even though it still had plenty of leeway to run to 7.5 hrs. Oh well, enough of that. I'll reset it to run on the CPU app.
The cuda app is doing much better than the nVidia OpenCL app...
ID: 1407718
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1407730 - Posted: 25 Aug 2013, 23:48:45 UTC - in response to Message 1407718.  
Last modified: 25 Aug 2013, 23:50:19 UTC

...Something I find interesting is the similar speed differences with the Mac Cuda and OpenCL Apps as compared to other platforms. I decided to test the Mac nVidia AstroPulse App again with the newer drivers. Just as a few weeks ago, the AP is taking just under twice as long as it should. This is about the same as the cuda app, it's just under twice as long as it is in XP & Linux....


Yes, the pattern being completely different driver models. I've actually isolated particular portions & types of coding that cause a lot of variation by platform, driver version, driver (OS) model, along with older GPU cuda version performance dependence that shouldn't be there.

These variations are targeted in coming work. For one, the Windows driver model was recently updated in Win7 to bring things in line with 8/8.1 & newer drivers. That in part explains some challenges with the newest drivers & GPUs.

There are considerable (and increasing) unhidden latencies to work on reducing & hiding. That tends to be more obvious with multiple high performance cards, as GPUs get faster.

As the reasons for these weighty driver models (compared to older XPDM etc) have to do with availability & reliability of the graphics subsystems, both very legitimate in my opinion, I'm not surprised Mac might have added latency there, and there are ways to tackle it in the medium term. That will be needed as all the next generation GPUs will be faster again relative to the hosts.
ID: 1407730
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1407937 - Posted: 26 Aug 2013, 13:36:17 UTC - in response to Message 1407730.  

Coming up on two days without any CUDA errors. I'm going to need to boot into XP and finish a number of AP tasks that are due on the third. Now seems to be a good time to switch over. So, it might be a couple days before I work on any more Mac CUDA tasks...
ID: 1407937
enewman
Volunteer tester

Joined: 27 Jun 01
Posts: 15
Credit: 6,344,951
RAC: 0
United States
Message 1410884 - Posted: 2 Sep 2013, 22:34:37 UTC - in response to Message 1407718.  

Hit one issue with distributing libraries with the science app: Seti sets a maximum work unit size of 32 MB, yet libcufft.5.5.dylib on Mac OS X is 146 MB by itself. Spent some time trying to debug why the app was receiving a SIGPIPE signal after ~5 mins, only to realise that this was BOINC enforcing the max WU disk size. Will also need to check licensing issues with redistributing these libraries, and not sure why they are not in the default runtime package.
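
A rough way to check this before packaging, assuming the 32 MB figure above is the limit that applies: add up the sizes of the files you plan to ship and compare against it. The helper names total_bytes and fits_limit are made up for this sketch, and 1 MB is taken as 1048576 bytes.

```shell
# Print the combined size in bytes of the files given as arguments.
total_bytes() {
    total=0
    for f in "$@"; do
        # wc -c reads the file from stdin and prints only the byte count.
        size=$(wc -c < "$f") || return 1
        total=$((total + size))
    done
    echo "$total"
}

# Succeed only if the listed files fit within limit_mb megabytes.
fits_limit() {
    limit_mb=$1
    shift
    [ "$(total_bytes "$@")" -le $((limit_mb * 1048576)) ]
}
```

For example, `fits_limit 32 libcudart.dylib libcufft.dylib || echo "too big to bundle"` would flag the cufft library on its own, going by the 146 MB size quoted above.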

After backing out the app_info.xml changes, have got the latest build of the code to work and it is processing units. I can get 4 units working on the GTX680 but apparently one is processing faster than the others. Needs further investigation.

Still not sure of exactly what is required to get to next step of distributing via Beta project.
ID: 1410884
arkayn
Volunteer tester
Joined: 14 May 99
Posts: 4438
Credit: 55,006,323
RAC: 0
United States
Message 1410888 - Posted: 2 Sep 2013, 22:59:16 UTC - in response to Message 1410884.  

Hit one issue with distributing libraries with the science app: Seti sets a maximum work unit size of 32 MB, yet libcufft.5.5.dylib on Mac OS X is 146 MB by itself. Spent some time trying to debug why the app was receiving a SIGPIPE signal after ~5 mins, only to realise that this was BOINC enforcing the max WU disk size. Will also need to check licensing issues with redistributing these libraries, and not sure why they are not in the default runtime package.

After backing out the app_info.xml changes, have got the latest build of the code to work and it is processing units. I can get 4 units working on the GTX680 but apparently one is processing faster than the others. Needs further investigation.

Still not sure of exactly what is required to get to next step of distributing via Beta project.


Give Jason your recent code changes and he will get them committed to GIT.

Then we just have to wait for Eric.

ID: 1410888
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1410963 - Posted: 3 Sep 2013, 6:45:49 UTC - in response to Message 1410884.  
Last modified: 3 Sep 2013, 7:07:01 UTC

Hit one issue with distributing libraries with the science app: Seti sets a maximum work unit size of 32 MB, yet libcufft.5.5.dylib on Mac OS X is 146 MB by itself. Spent some time trying to debug why the app was receiving a SIGPIPE signal after ~5 mins, only to realise that this was BOINC enforcing the max WU disk size. Will also need to check licensing issues with redistributing these libraries, and not sure why they are not in the default runtime package.

After backing out the app_info.xml changes, have got the latest build of the code to work and it is processing units. I can get 4 units working on the GTX680 but apparently one is processing faster than the others. Needs further investigation.

Still not sure of exactly what is required to get to next step of distributing via Beta project.


The Cuda runtime and Cufft libraries, according to nVidia, fit in their definition of redistributables that come under the 'operating system components' exceptions of the GPL. Just avoid using the Cuda 5.5 static linkage capability, in order to avoid some grey areas.

For stock project beta we're really waiting on Eric's readiness & responses to the email I sent regarding both Mac & Linux, but once you have [one or more] packages that are more or less usable we can stick them as third party public Beta on Arkayn's site. [Also, as Arkayn indicates, any source or build system alterations will need committing for public release, though it sounds like these might be comparatively minor compared to library & platform juggling]

[Edit:] if the Mac version of the Cuda toolkit contains the samples simpleHyperQ and bandwidthTest, then the first will tell you what stream concurrency is achievable as determined by hardware & driver limits. My 680 on Win7 only achieves 2 way concurrency, which meant hiding the considerable driver latencies took 2 to 4 instances. Mac's heavier (presumably more secure) driver model might shift that a bit & share the GPU among applications differently.

The second test, for bandwidth, run with -mode=shmoo will give you an extended test of different transfer sizes for PCI express transfers. Since we currently use too frequent & too small transfers, about 45% of the execution time depends heavily on Operating system driver latencies, rather than using PCIe efficiently. Addressing that & the strided pulsefinding problems are for x42.

Incidentally, avoiding these (driver induced) latencies is why Tesla cards can use a special Tesla Compute Cluster driver, which is low-latency. Since we're bound to consumer grade gear, we have to use elimination & latency hiding techniques instead, which can get complex.
ID: 1410963
enewman
Volunteer tester

Joined: 27 Jun 01
Posts: 15
Credit: 6,344,951
RAC: 0
United States
Message 1411156 - Posted: 3 Sep 2013, 21:29:57 UTC - in response to Message 1410963.  

So we just need to get WU space allocation upped or provide instructions for users on how to add the /usr/local/cuda/lib.

Will send you results from CUDA samples out of band. Was reading earlier about issues with PC cards (I have PNY GTX680 PC version - no bootscreen) not running at full PCI bandwidth so investigating further.

ID: 1411156
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1411250 - Posted: 4 Sep 2013, 4:40:20 UTC - in response to Message 1411156.  

So we just need to get WU space allocation upped or provide instructions for users on how to add the /usr/local/cuda/lib.

Will send you results from CUDA samples out of band. Was reading earlier about issues with PC cards (I have PNY GTX680 PC version - no bootscreen) not running at full PCI bandwidth so investigating further.



So the Mac linker or OS doesn't allow you to include the executable origin into the rpath? For the Linux one, doing so allowed dropping the libraries straight in alongside the executable in the project directory, which when set up by the project should locate them there (or referencing suitably in app_info).
ID: 1411250
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1416965 - Posted: 17 Sep 2013, 22:08:18 UTC

My Mac is Out of APs. Now is as good a time as any to add more files. Can someone please add some AP files?
ID: 1416965
spitfire_mk_2
Joined: 14 Apr 00
Posts: 563
Credit: 27,306,885
RAC: 0
United States
Message 1417028 - Posted: 18 Sep 2013, 2:39:05 UTC - in response to Message 1416965.  

My Mac is Out of APs.

It has been that way for a few days now.
ID: 1417028
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1432017 - Posted: 22 Oct 2013, 23:18:09 UTC

Apple just released 10.9 'Mavericks'. For...FREE. In fact, it looks like as long as you have 10.6.8 you can Update to 10.9 for....Free. I guess I'll have to clone my existing ML and give it a try.

Is there a Big Cat named Mavericks? WTH?



ID: 1432017