Mac, NVidia GPU and Seti@home
Author | Message |
---|---|
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Had similar problem. I fixed by setting: The Linux equivalent was what I had to manipulate to get what I regard as a bit closer to 'sane'. In that case it was ldconfig exports. If it works the same way as on Linux (but named differently), you might want to set rpath to include origin; then the exe will look for the runtime & cufft shared libraries in the same place as the executable when run. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
enewman Send message Joined: 27 Jun 01 Posts: 15 Credit: 6,344,951 RAC: 0 |
I agree as this is not supportable for all BOINC/Seti users. Just a way that I got my machine up and running to test. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Had similar problem. I fixed by setting: Would it work if I just copied the libcuda.dylib file to the project folder? Or linked to it? It appears to be in /usr/local/cuda/lib. Nope, neither one of those works. Oh well, I guess when the App is finally released I might try it again... This appears to sum it up: Charlie Fenton, 9 Jan 2010 09:16, http://permalink.gmane.org/gmane.comp.distributed.boinc.devel/2662 As far as I can tell, I DO NOT have libcudart.dylib on my machine. Maybe someone can post the required files in the thread at Arkayn's site?
Sat Aug 24 08:18:17 2013 | | CUDA: NVIDIA GPU 0: GeForce GTS 250 (driver version 5.5.25, CUDA version 5.50, compute capability 1.1, 1024MB, 951MB available, 705 GFLOPS peak) Sat Aug 24 08:18:17 2013 | | OpenCL: NVIDIA GPU 0: GeForce GTS 250 (driver version 8.14.11 313.01.02f01, device version OpenCL 1.0, 1024MB, 951MB available, 705 GFLOPS peak) Sat Aug 24 08:18:17 2013 | | OpenCL: AMD/ATI GPU 0: ATI Radeon Barts PRO Prototype (driver version 1.0, device version OpenCL 1.1, 1024MB, 1024MB available, 868 GFLOPS peak) |
enewman Send message Joined: 27 Jun 01 Posts: 15 Credit: 6,344,951 RAC: 0 |
Believe standard location for these files is /usr/local/cuda/lib/ and these (on my machine with SDK installed) are sym-linked to /Developer/NVIDIA/CUDA-5.5/lib equivalents |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Believe standard location for these files is /usr/local/cuda/lib/ and these (on my machine with SDK installed) are sym-linked to /Developer/NVIDIA/CUDA-5.5/lib equivalents. It appears Charlie is correct. The only library installed by the CUDA driver is /usr/local/cuda/lib/libcuda.dylib. You have to download and install the CUDA Toolkit to obtain libcudart.dylib. The Toolkit is a 790MB download, and it looks as though it wants over a GB of space to install. I wasn't expecting, or desiring, to install something taking up over a GB of space. I extracted libcudart.dylib and libcufft.dylib to the project folder and ran 'export DYLD_LIBRARY_PATH=/Library/Application\ Support/BOINC\ Data/projects/setiathome.berkeley.edu:$DYLD_LIBRARY_PATH'. That doesn't appear to work either. It would be nice to have this work without having to install over a GB of files. It appears this may work: (1) Download the Toolkit: https://developer.nvidia.com/cuda-downloads (2) Extract: libcudart5.5.dylib & libcufft5.5.dylib to /usr/local/cuda/lib (3) Change names to: libcudart.dylib & libcufft.dylib (4) Run: export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH The first one is finally running. |
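The manual steps in the post above can be checked with a short script. This is only a sketch under assumptions: the three library names come from the thread (whether all three are really needed is not settled there), and the helper names `missing_libs` and `prepend_library_path` are made up for illustration. It reports which libraries are absent from a directory and builds the `DYLD_LIBRARY_PATH` value with that directory placed first.

```python
import os

# Library names as mentioned in the thread; whether all three are
# required by the app is an open question there.
REQUIRED_LIBS = ("libcuda.dylib", "libcudart.dylib", "libcufft.dylib")


def missing_libs(lib_dir, required=REQUIRED_LIBS):
    """Return the required library names that are not present in lib_dir."""
    return [name for name in required
            if not os.path.exists(os.path.join(lib_dir, name))]


def prepend_library_path(lib_dir, current=""):
    """Build a DYLD_LIBRARY_PATH value with lib_dir ahead of the current value."""
    return lib_dir if not current else lib_dir + ":" + current


if __name__ == "__main__":
    lib_dir = "/usr/local/cuda/lib"
    print("missing:", missing_libs(lib_dir))
    print("export DYLD_LIBRARY_PATH=" +
          prepend_library_path(lib_dir, os.environ.get("DYLD_LIBRARY_PATH", "")))
```

Note that an exported variable only reaches processes started from that shell, so whether the BOINC client and the science app actually see it depends on how they are launched.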
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, the first one finished. It was estimated at 1:40; the next one is estimated at :33... At least it seems to work: Task 3129070998. The same card was just in Linux & XP: http://setiathome.berkeley.edu/results.php?hostid=6864181 http://setiathome.berkeley.edu/results.php?hostid=6979629 |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
So... How long do I need to run the new CUDA App to be considered a Success? It's been running almost a day now without any Errors. It's not exactly a 'real world' test though, considering I'm completely out of work for my other devices. It would be more realistic if I could obtain, say, a few dozen APs to heat up my ATI card and CPUs. That way I could test the new CUDA & ATI Apps in a more realistic environment :-) |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
|
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
So... How long do I need to run the new CUDA App to be considered a Success? It's been running almost a day now without any Errors. It's not exactly a 'real world' test though, considering I'm completely out of work for my other devices. It would be more realistic if I could obtain, say, a few dozen APs to heat up my ATI card and CPUs. A fair bit when you look at the nuts & bolts. Basically, x41zc ported over 'relatively' painlessly due to Aaron's tweaks already being in there for Linux, then some Mac-specific patching & juggling (according to Ed), but for stock distribution quite a bit more is needed than a solitary binary. What isn't ready (both the Mac & Linux variants) for public consumption as stock are a number of things, including: - a rationalised/updated build system (requiring that a user have the cuda toolkit installed is not usually practical, or necessary, under stock distribution). This would include ensuring minimal dependencies etc. - some of the robust fail-safe error handling isn't active, as it is on Windows (it deals better with failures like those Ed's been seeing). Some of it involves a minor patch to the Boincapi used, and it also handles driver or hardware problems more gracefully. - clarified system requirements & updated documentation. - probably more I forgot about off the top of my head. So in a nutshell, bringing the Mac & Linux builds entirely into line with the multiple Windows flavours, so progress in the next optimisation phase can proceed more or less in lock-step. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
...What isn't ready (both the Mac & Linux variants) for public consumption as stock are a number of things, including: Yes, definitely need to have just the needed libraries placed in the project folder as with the 'other' cuda apps. I seem to be fine using just the three: libcuda.dylib, libcudart.dylib, and libcufft.dylib. I'm not even sure if you need any more than just libcudart.dylib in the project folder. Edward seems to have an 'extra' library in his /usr/local/cuda/lib that I don't have. Something I find interesting is the similar speed differences with the Mac Cuda and OpenCL Apps as compared to other platforms. I decided to test the Mac nVidia AstroPulse App again with the newer drivers. Just as a few weeks ago, the AP is taking just under twice as long as it should. This is about the same as the cuda app, which takes just under twice as long as it does in XP & Linux. The good thing is the cuda app doesn't have the memory leak the AP app does. The AP app also has an extreme CPU usage variance, from around 2 to 60% every few seconds. It's hard to tell with the cuda app since it has a low CPU usage to begin with. The nVidia AP app just decided to give the Maximum elapsed time exceeded error at 2.9 hrs even though it still had plenty of leeway to run to 7.5 hrs. Oh well, enough of that. I'll reset it to run on the CPU app. The cuda app is doing much better than the nVidia OpenCL app... |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
...Something I find interesting is the similar speed differences with the Mac Cuda and OpenCL Apps as compared to other platforms. I decided to test the Mac nVidia AstroPulse App again with the newer drivers. Just as a few weeks ago, the AP is taking just under twice as long as it should. This is about the same as the cuda app, it's just under twice as long as it is in XP & Linux.... Yes, the pattern being completely different driver models. I've actually isolated particular portions & types of coding that cause a lot of variation by platform, driver version and driver (OS) model, along with a performance dependence on the cuda version for older GPUs that shouldn't be there. These variations are targeted in coming work. For one, the Windows driver model was recently updated in Win7 to bring things in line with 8/8.1 & newer drivers. That in part explains some challenges with the newest drivers & GPUs. There are considerable (and increasing) unhidden latencies to work on reducing & hiding. That tends to be more obvious with multiple high performance cards, as GPUs get faster. As the reasons for these weighty driver models (compared to older XPDM etc) have to do with availability & reliability of the graphics subsystems, both very legitimate in my opinion, I'm not surprised the Mac might have added latency there, and there are ways to tackle it in the medium term. That will be needed, as all the next generation GPUs will be faster again relative to the hosts. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Coming up on two days without any CUDA errors. I'm going to need to boot into XP and finish a number of AP tasks that are due on the third. Now seems to be a good time to switch over. So, it might be a couple days before I work on any more Mac CUDA tasks... |
enewman Send message Joined: 27 Jun 01 Posts: 15 Credit: 6,344,951 RAC: 0 |
Hit one issue with distributing libraries with the science app: Seti sets a maximum work unit size of 32MB, yet libcufft.5.5.dylib on Mac OS X is 146MB by itself. Spent some time trying to debug why the app was receiving a SIGPIPE signal after ~5 mins, only to realise that this was BOINC enforcing the max WU disk size. Will also need to check licensing issues with redistributing these libraries, and not sure why they are not in the default runtime package. After backing out the app_info.xml changes, I have got the latest build of the code to work and it is processing units. I can get 4 units working on the GTX680, but apparently one is processing faster than the others. Needs further investigation. Still not sure exactly what is required to get to the next step of distributing via the Beta project. |
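The size mismatch described above is simple arithmetic, sketched here for clarity. The 32MB bound and the 146MB library size are the figures quoted in the post; the function name `over_budget` is hypothetical, and this does not model how BOINC itself computes or enforces its disk bound.

```python
MB = 1024 * 1024


def over_budget(file_sizes, limit_bytes):
    """Return how many bytes the files exceed limit_bytes by (0 if they fit)."""
    total = sum(file_sizes.values())
    return max(0, total - limit_bytes)


if __name__ == "__main__":
    # Figures quoted in the post, not measured here.
    sizes = {"libcufft.5.5.dylib": 146 * MB}
    limit = 32 * MB
    print(over_budget(sizes, limit) // MB, "MB over the bound")
```

With these numbers a single library already exceeds the bound several times over, which fits the report that the task was stopped a few minutes into the run.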
arkayn Send message Joined: 14 May 99 Posts: 4438 Credit: 55,006,323 RAC: 0 |
Hit one issue with distributing libraries with science app - Seti sets a maximum work unit size of 32MB and yet libcufft.5.5.dylib on Mac OSX is 146MB by itself. Spent some time trying to debug why app was receiving a sigpipe interrupt after ~5 mins only to realise that this was BOINC enforcing max WU disk size. Will also need to check licensing issues with redistributing these libraries and not sure why they are not in default runtime package. Give Jason your recent code changes and he will get them committed to Git. Then we just have to wait for Eric. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Hit one issue with distributing libraries with science app - Seti sets a maximum work unit size of 32MB and yet libcufft.5.5.dylib on Mac OSX is 146MB by itself. Spent some time trying to debug why app was receiving a sigpipe interrupt after ~5 mins only to realise that this was BOINC enforcing max WU disk size. Will also need to check licensing issues with redistributing these libraries and not sure why they are not in default runtime package. The Cuda runtime and Cufft libraries, according to nVidia, fit their definition of redistributables that come under the 'operating system components' exceptions of the GPL. Just avoid using the Cuda 5.5 static linkage capability, in order to avoid some grey areas. For stock project beta, we're really waiting on Eric's readiness & responses to the email I sent regarding both Mac & Linux, but once you have [one or more] packages that are more or less usable we can stick them up as third-party public Beta on Arkayn's site. [Also, as Arkayn indicates, any source or build system alterations will need committing for public release, though it sounds like these might be minor compared to library & platform juggling.] [Edit:] If the Mac version of the Cuda toolkit contains the samples simpleHyperQ and bandwidthTest, then the first will tell you what stream concurrency is achievable as determined by hardware & driver limits. My 680 on Win7 only achieves 2-way concurrency, which meant hiding the considerable driver latencies took 2 to 4 instances. Mac's heavier (presumably more secure) driver model might shift that a bit & share the GPU among applications differently. The second test, for bandwidth, run with -mode=shmoo, will give you an extended test of different transfer sizes for PCI Express transfers. Since we currently use transfers that are too frequent & too small, about 45% of the execution time depends heavily on operating system driver latencies, rather than using PCIe efficiently.
Addressing that & the strided pulsefinding problems is for x42. Incidentally, avoiding these (driver-induced) latencies is why Tesla cards can use a special Tesla Compute Cluster driver, which is low-latency. Since we're bound to consumer-grade gear, we have to use elimination & latency hiding techniques instead, which can get complex. |
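The point about transfers being too frequent and too small can be illustrated with a simple cost model: each transfer pays a fixed per-call latency plus its size divided by the peak bandwidth. This is an illustrative sketch only. The model is a common simplification, and the latency and bandwidth numbers in the demo are hypothetical placeholders, not measurements from any driver or card discussed in the thread.

```python
def transfer_time(size_bytes, latency_s, peak_bytes_per_s):
    """Time for one transfer: fixed per-call latency plus size over peak bandwidth."""
    return latency_s + size_bytes / peak_bytes_per_s


def effective_bandwidth(size_bytes, latency_s, peak_bytes_per_s):
    """Bytes per second actually achieved by one transfer of size_bytes."""
    return size_bytes / transfer_time(size_bytes, latency_s, peak_bytes_per_s)


def split_transfer_time(total_bytes, n_transfers, latency_s, peak_bytes_per_s):
    """Time to move total_bytes as n_transfers equal-sized transfers."""
    chunk = total_bytes / n_transfers
    return n_transfers * transfer_time(chunk, latency_s, peak_bytes_per_s)


if __name__ == "__main__":
    latency = 20e-6   # hypothetical per-transfer driver latency, in seconds
    peak = 6e9        # hypothetical peak PCIe bandwidth, in bytes per second
    for size in (4 * 1024, 64 * 1024, 1024 * 1024, 16 * 1024 * 1024):
        fraction = effective_bandwidth(size, latency, peak) / peak
        print(size, "bytes:", round(100 * fraction, 1), "% of peak")
```

Under this model, moving the same data in n transfers instead of one costs an extra (n - 1) latencies, which is why batching small transfers, or overlapping them with computation, reduces the dependence on driver latency.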
enewman Send message Joined: 27 Jun 01 Posts: 15 Credit: 6,344,951 RAC: 0 |
So we just need to get the WU space allocation upped, or provide instructions for users on how to add /usr/local/cuda/lib. Will send you results from the CUDA samples out of band. Was reading earlier about issues with PC cards (I have PNY GTX680 PC version - no bootscreen) not running at full PCI bandwidth, so investigating further. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
So we just need to get WU space allocation up'ed or provide instructions for users on how to add the /usr/local/cuda/lib. So the Mac linker or OS doesn't allow you to include the executable origin in the rpath? For the Linux one, doing so allowed dropping the libraries straight in alongside the executable in the project directory, where they should be found when set up by the project (or by referencing them suitably in app_info). |
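On the rpath question: the macOS counterpart of the Linux `$ORIGIN` token is `@loader_path` (or `@executable_path`), and an rpath entry can be added to an existing binary with `install_name_tool -add_rpath`. The sketch below only builds and prints the commands rather than running them. The binary name is a hypothetical placeholder, and an added rpath only helps for dependencies that the binary records with an `@rpath/` prefix, which would need checking (for example with `otool -L`) for the actual build.

```python
import shlex


def add_rpath_command(binary, rpath="@loader_path"):
    """Argument list for install_name_tool that adds an rpath entry to binary."""
    return ["install_name_tool", "-add_rpath", rpath, binary]


def list_dependencies_command(binary):
    """Argument list for otool that lists the shared libraries binary links against."""
    return ["otool", "-L", binary]


if __name__ == "__main__":
    binary = "setiathome_cuda_app"  # hypothetical file name, for illustration only
    for cmd in (list_dependencies_command(binary), add_rpath_command(binary)):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

If this applies to the Mac build the way the Linux `$ORIGIN` setting did, the libraries could sit next to the executable in the project directory, with no `DYLD_LIBRARY_PATH` export needed.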
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
My Mac is Out of APs. Now is as good a time as any to add more files. Can someone please add some AP files? |
spitfire_mk_2 Send message Joined: 14 Apr 00 Posts: 563 Credit: 27,306,885 RAC: 0 |
My Mac is Out of APs. It has been that way for a few days now. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
|
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.