I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
That's weird (Both TBar and Richard), because firstly the V7 app I built against Boinc 7.7 libs/api using the XCode project supplied, and secondly that app crunched flawlessly without any api version entry in the app_info.xml, as did the CPU app (wherever I got it from). Yes that is weird considering I've been trying to compile code from the repository for the last year without any success except with Mountain Lion and lower. Right now trying to compile the Xbranch folder in the terminal using Yosemite/Xcode 6.1.1 I'm first getting the object.h Error and then the Linker Error. I have tried it in the App, and Still receive the Linker Error after jumping through all the hoops of loading all the libraries and whatnot. In my experience I receive the Same Errors whether using the Terminal or the App, and using the Terminal is somewhat less frustrating. So, What are you using/compiling and are you using the Terminal or the App? Have you tried compiling something from the AKv8 folder...say a CPU App? Something Else that's weird is all the Dozens, probably Hundreds of My Apps that have been downloaded from C.A. and No One has mentioned this api problem before. I've downloaded Mac Apps before and never had that problem. Yes, Weird. MY CUDA App was compiled in Mountain Lion with ToolKit 6.5, supports CC2.0 and above, and seems to work fine. Just remember the Sincos_stret error when you try to run an App in ML. If it was compiled in Mavericks or above, ML will have the SS error...unless you found a workaround. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Well, will see where that goes. There's always the option of putting out one version that works on a few systems, as a feeler, then working out a priority list from that as to what others will be needed. Do the nVidia toolkit samples build and run (correctly) for you? My flat mac_build/Makefile is based off those sample files. For the 7.7.0 BOINC libs, I had to sift through a readme for the BOINC Xcode project file, and the two required .a libraries just built (and were all I needed), so I didn't try other components. Probably with *something* operational on each platform, I'd reprioritise, since there's a lot of optimisation to go in, and that needs some switching/option logic and surrounding architecture so as not to kill off the old cards that still work on Windows. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
Something Else that's weird is all the Dozens, probably Hundreds of My Apps that have been downloaded from C.A. and No One has mentioned this api problem before. I've downloaded Mac Apps before and never had that problem. Yes, Weird. Running a search for "Waiting for shared memory" shows that the subject has come up a few times in the Macintosh-specific sub-forum in Q & A in recent years - but I must confess I don't read there as a matter of routine. Now that the subject has come up in a forum which I do read regularly, all that research I started two years ago has perhaps paid off. It seems to be non-critical, but I'd suggest adding <api_version> tags to your releases from now on - it might just help some of those silent downloaders to be more productive, as it did for Tom. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, will see where that goes. There's always the option of putting out one version that works on a few systems, as a feeler, then working out a priority list from that as to what others will be needed. Hmmm, I've never looked at the ToolKit samples. I don't think I ever installed them. I do have the Full boinc-master installed though, on numerous systems, installed numerous times. Funny I never had any problems compiling in ML before SETIv8 came out and someone suggested it might be better to use the latest version. After going back to 7.5 the problems went away and I could compile Apps again. It was a little different in Lion, the Apps would compile but crash immediately with a Memory Error. That also went away with reverting back to 7.5. In fact, I just compiled another OSX CPU App in Lion I'm trying right now. Seems the Linux CPU App I created is still faster than anything on the Mac so far. I'm trying to fix that. Why did I compile a Linux CPU App? Because the Stock Linux App kept crashing with a Memory Error. It started on Beta around 8.02, never was fixed. So I compiled a Non-Graphics Linux CPU App and I've Never had the Memory error again. The thing cooks too. Add another line to the app_info? I suppose I could do that, for whatever good it will do. Now to see if this Exact duplicate OSX App I made will be anywhere close to the Linux app. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Well, trial Linux 64 bit, Cuda6 up on my 680 (running 2up on Beta), no serious issues encountered during build. Will see if that holds up to the other builds validation-wise as well, then if it does notify Eric that at least we have a start. Yeah, the toolkit samples gave me confidence that my build systems were correct for Cuda development, with or without crazy nuances and complexity of the stock and Boinc autotools based systems. Based on how much easier the flat makefile (imitating nv's samples) made things, I'll probably transition through flat Makefiles then to Gradle automation (since then one build system, many platforms, automated regression testing and deployment). Better check the weather forecast again in case I fall asleep and cook myself and the dog... "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Gary Charpentier Send message Joined: 25 Dec 00 Posts: 31006 Credit: 53,134,872 RAC: 32 |
The Anonymous Platform documentation has indicated the use of <api_version> since v6.1.0, but to be honest, I ignored it - I couldn't see the point. It makes no visible difference on my platform, Windows. And that is why it isn't fixed. Fixed, you would have to explicitly request the old version to be used, default would be to the new. I do detest that the thinking of the programming community is that "breaking" old bug riddled code is worse than running old bug riddled code. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's a Strange result from using the line <api_version>7.7.0</api_version> on My Mac. Adding that line to the section for my CUDA App Slows both cards by around 90%. You can see the times in my results. It slowed the cards down to where they were Slower than my ATI card. Removing the line restored them to their normal speed. I added the line again, back to Slow motion, http://setiathome.berkeley.edu/result.php?resultid=4659195444 It doesn't seem to bother the other Apps, but the App compiled from Jason's r3328 hates that setting. I'm running BOINC 7.4.36 and the only time I've seen the Shared Memory notice was when I had about a dozen or so CPU tasks waiting to run. Simple fix, don't have about a dozen or so CPU tasks waiting to run. I would be running 7.2.33 but it doesn't see CUDA on my machine, so, I have to run 7.4.36 to have BOINC see CUDA with Yosemite. |
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
That is curious, and worth exploring when we have time (I'd like to get v8 out of the door first). Please make notes of what makes this happen - what source code were you using for your BOINC API library build, for example? For completeness, could you try an intermediate value, please, like <api_version>7.5.0</api_version>? That would take the special Bitcoin Utopia extension out of play, just to be on the safe side (assuming you're using the official Berkeley library code, of course - I know Jason prefers his own modifications, which may not have been extensively tested on Macs). |
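For anyone wanting to try different <api_version> values without hand-editing each time, here is a minimal sketch. It assumes the tag sits directly inside each <app_version> element of app_info.xml, as the Anonymous Platform documentation describes; the sample XML, the app name and version number in it, and the helper name set_api_version are made up for illustration and are not taken from anyone's actual release.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down app_info.xml. A real one also carries
# <file_info>, <file_ref>, <coproc> and similar entries.
SAMPLE = """<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
  </app_version>
</app_info>"""


def set_api_version(xml_text, version):
    """Return xml_text with <api_version> set in every <app_version>."""
    root = ET.fromstring(xml_text)
    for app_version in root.iter("app_version"):
        tag = app_version.find("api_version")
        if tag is None:
            # No entry yet: add one rather than assuming it exists.
            tag = ET.SubElement(app_version, "api_version")
        tag.text = version
    return ET.tostring(root, encoding="unicode")


updated = set_api_version(SAMPLE, "7.5.0")
print(updated)
```

Running it a second time with another value (say "7.7.0") overwrites the existing entry instead of adding a duplicate, which makes it easy to flip between values while timing tasks.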
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
I'm using vanilla 7.7.0 on the Mac build currently working through testing backstage. I will eventually install some customisations, though for the time being OS X El Capitan + Cuda drivers at least seems unaffected by bad threading practices (while some Linuxes seem to be and others not). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
FYI, I'm testing now against the reference-results I got from Richard. Looking better every day. My machine makes almost no errors (one bad for every 400 000 points). The number of invalids is dropping. Jason and TBar may want to check their mailbox for latest source. One question: Where is the period for triplets and pulses calculated and possibly truncated/rounded for output? REF: <period>0.9961472</period> MY : <period>0.99614721536636</period> To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Cablemodem down for a day or so, but checking in from mobile. Thanks for the update :) Going from memory, this might be being rounded either in find_triplets itself, or maybe in cudaAcc_reportTriplet() (or whatever it's called, similar for pulses). Eric may have cast to a float where we don't (?). For some rough numbers: an 8 significant digit match for many values around thresholds, or bests just below reportable, is about 3-4 sig digits better than under v7, and the v6 match was around only 4 sig digits. We've been seeing some hints that we are bouncing off the noise floor, so probably inspection of current validator code will be needed as well (which is probably slightly in flux still). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
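The float idea can be checked against the two period strings quoted earlier with a few lines of arithmetic. The sketch below rounds the longer value to a 32-bit float and prints it at two precisions; the helper name to_float32 is made up. By this arithmetic both strings are consistent with one single-precision value printed with different numbers of decimals, but that is only an observation about the numbers, not a statement about what either application's output code actually does.

```python
import struct


def to_float32(value):
    """Round a Python float (a C double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


mine = 0.99614721536636         # the longer value quoted in the thread
as_float = to_float32(mine)     # nearest single-precision value

seven = "%.7f" % as_float       # short form, 7 decimals
fourteen = "%.14f" % as_float   # long form, 14 decimals

print(seven)
print(fourteen)
print(abs(as_float - mine))     # how far the float is from the quoted value
```

If the two printed forms match the REF and MY strings, the difference is more likely in the output format than in the calculation, and a comparison tool working on the full-precision files should not be affected by it.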
Richard Haselgrove Send message Joined: 4 Jul 99 Posts: 14679 Credit: 200,643,578 RAC: 874 |
If you're referring to my files, I think I sent you two versions of each. ref-...res - that is the actual result file that would be uploaded to the project servers: it is the authoritative, full-precision file. It's a simple text (xml) file, but it's written with *nix line-endings, which can make it hard to read on Windows. Use either WordPad or NotePad++. ref-...summary.txt - that is a simpler summary file, designed to make it easier to compare a result visually when all you have to compare with is the std_err of an OpenCL application. I think I sent you the summariser tool as well: the ReadMe explains that the figures are truncated (not rounded) to six decimal places. For predicting normal validation, the 6-figure summary is probably enough, but for application checking, you should also refer to the Q-value reported by the rescmpv5 tool, which works on the full-precision files. I don't think the OpenCL apps list the triplet periods in std_err, so I didn't bother summarising it. For that, you'll need to look in the full result file. |
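To make the truncated-versus-rounded distinction concrete, here is a minimal sketch using Python's decimal module. The function names truncate6 and round6 are made up, and this is not the summariser tool's code, only an illustration of the two behaviours at six decimal places.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

SIX_PLACES = Decimal("0.000001")


def truncate6(text):
    """Cut a decimal string to six places without rounding."""
    return str(Decimal(text).quantize(SIX_PLACES, rounding=ROUND_DOWN))


def round6(text):
    """Round a decimal string to six places, for comparison."""
    return str(Decimal(text).quantize(SIX_PLACES, rounding=ROUND_HALF_EVEN))


# Seventh decimal is 2: truncating and rounding agree.
print(truncate6("0.99614721536636"), round6("0.99614721536636"))
# Seventh decimal is 7: the two differ in the last printed digit.
print(truncate6("0.1234567"), round6("0.1234567"))
```

So two summaries can differ by one in the sixth decimal purely because one tool truncates and another rounds, which is one more reason to rely on the full-precision files for application checking.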
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
If you're referring to my files, I think I sent you two versions of each. Hi Richard, I'm running under Linux. I use your (thank you) full precision ref-res files that I copied over my ref-result.app-name.PGxxxx.wu xml-style files in testData and renamed them so that my old rescmp5_l can do the comparison and Q-value reporting every time I test a new app. Current Q is 99.17% - 99.97% depending on wu. I can read the result files with emacs in extremely readable pretty printed screenlayout and do a diff 2 buffers and browse for next difference hitting 'n'. Precision seems pretty good with current version. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
TBar - Last night I finally had a chance to test your Mac AVX applications - CPU and Intel GPU. They both worked as expected. The results are here and they are starting to validate: CPU http://setiathome.berkeley.edu/result.php?resultid=4660824260 http://setiathome.berkeley.edu/result.php?resultid=4660824259 http://setiathome.berkeley.edu/result.php?resultid=4660824257 http://setiathome.berkeley.edu/result.php?resultid=4660831005 GPU http://setiathome.berkeley.edu/result.php?resultid=4660830697 http://setiathome.berkeley.edu/result.php?resultid=4660830678 http://setiathome.berkeley.edu/result.php?resultid=4660830852 http://setiathome.berkeley.edu/result.php?resultid=4660830990 http://setiathome.berkeley.edu/result.php?resultid=4660830989 http://setiathome.berkeley.edu/result.php?resultid=4660830726 - Tom |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
This should be interesting to see how the newer iGPU App performs. If I remember that machine completed a few of the r3323 iGPU tasks without any inconclusives. A MacMini with the same HD4000 produces a number of inconclusives with the same App even though the signal count is mostly the same as the wingpeople. That's pretty much the way it was with the v7 iGPU App, it worked fine on some machines but not others. I was hoping to find a more consistent App. Seems that may be a problem with the current state of Intel drivers. |
Tom Rinehart Send message Joined: 12 Dec 01 Posts: 113 Credit: 13,255,975 RAC: 6 |
This should be interesting to see how the newer iGPU App performs. If I remember that machine completed a few of the r3323 iGPU tasks without any inconclusives. One of the GPU tasks has come back as Validation Inconclusive: http://setiathome.berkeley.edu/result.php?resultid=4660830852 Two have validated and three GPU tasks are still pending. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Try the new Intel iGPU at C.A., it might be better, probably not though. In other News I have built a nVidia CUDA 4.2 App for the Pre-Fermi cards in Mountain Lion and below. It seems to be working very well in Mountain Lion with my GTS250, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=71141 and here, http://setiathome.berkeley.edu/results.php?hostid=7199204. Due to nVidia Dropping support for the Pre-Fermi cards in CUDA 6.5, about the last OS for the Pre-Fermi cards for CUDA is Mountain Lion. Mavericks is a transitional OS and has Pre-Fermi CUDA library problems. The App seems to be working well and needs testing in Lion and Snow Leopard. Unlike the earlier CUDA APPs with the Mac Pre-Fermi cards this App appears to be running at Full Speed. At least compared to my GTS250 in Windows 8.1 anyway... |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
After more testing it appears to be still suffering from the @rpath bug. The App will look for the libraries in other locations before the origin, and upon finding a different library version exit with an error. This is even after setting the library path to the setiathome.berkeley.edu folder. The first part of the App has a number of locations listed before $ORIGIN. It also seems to think it needs the libtlshook.dylib file in Mavericks. Once those hurdles are passed it appears to work fine with my GTS250 in Lion, Mountain Lion, and Mavericks. Unfortunately Snow Leopard fails to recognize my PC 250. It also works in Yosemite with my GTX750Ti and is just slightly slower than CUDA 6.5, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959 |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
What I'll probably do, for initial stock Mac deployment, is request limiting to Yosemite+, and maybe Fermi+ (we'll see), probably in a few older Cuda versions. Then systematically work out all the breaking changes back that far (there seem to be a number, probably not limited to the shift from gcc to Clang, and the odd library management changes). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I've found a winning combination with Snow Leopard/Xcode 3.2 and GCC. It compiles without any -dumpspecs nonsense. Right now the only problem is the App is looking at the rpaths instead of the origin. Instead of looking in the same folder it follows the path to the CUDA ToolKit, finds different libraries, then throws an error. Even after the ToolKit has been removed it's still Not looking in the origin folder until you enter a path to it. If you search the makefiles for rpath you find only One hit, and it just happens to be just before ORIGIN: -Wl,-rpath,\$$ORIGIN. Does that have anything to do with it looking for paths instead of ORIGINS? Surely there's an easy way to fix this... |
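On the $ORIGIN question, one point that may be relevant: $ORIGIN is the token the Linux (ELF) loader expands to the binary's own folder, while the macOS loader uses @loader_path and @executable_path for that purpose. Whether this explains the behaviour described above is not established in the thread, and the toolkit paths may still be searched first if they come earlier in the rpath list. A small sketch, with a made-up helper name, of picking the flag per platform:

```python
import platform


def rpath_flag(system=None):
    """Return a linker flag that adds the binary's own folder to the rpath.

    `system` is a platform.system() string. Hypothetical helper, not part
    of any build system mentioned in the thread.
    """
    system = system or platform.system()
    if system == "Darwin":
        # dyld understands @loader_path (folder of the binary doing the load).
        return "-Wl,-rpath,@loader_path"
    # ELF loaders (Linux and similar) expand $ORIGIN instead.
    return "-Wl,-rpath,$ORIGIN"


print(rpath_flag("Darwin"))
print(rpath_flag("Linux"))
```

Inside a Makefile the Linux form has to be written as \$$ORIGIN so that neither make nor the shell expands it, which matches the single hit found in the makefiles.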