I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
On Linux there is the libsleep.so solution, which has been used in other BOINC projects too. libsleep.so does help with the 100% CPU usage. 1) Make the code and CUDA use Yield. 2) libsleep.so replaces Yield() with nanosleep. 3) libsleep.so needs to be loaded with LD_PRELOAD on Linux; on the Mac there may be some DYLD_XXX magic that does the same. How does it work? Yield gives the timeslice to a process that is ready to run and has the same or higher priority. Sleep and nanosleep give the timeslice to any thread that is ready to run. LD_PRELOAD loads a library into memory before the program and its libraries, and replaces the Yield() function call in the program and in the libraries (NVIDIA libs too) with nanosleep(). I have posted the nanosleep instructions and source; they should be easy to find with a search in the forums. Fakesguy is running Linux and has low CPU usage, and so am I. And maxrregcount is a way to tell the CUDA compiler to allocate more registers to a thread, but the sacrifice is running a kernel (a piece of GPU code) with less parallelism. There is always a trade-off between more threads, or more work done by a thread with fewer interdependencies or less waiting for memory. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
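[Editor's note: a minimal sketch of the shim idea petri33 describes, assuming the call the driver spins on is sched_yield(); the actual libsleep.so source posted in the forums may differ in names and sleep duration.]

```c
/* libsleep.c -- hypothetical sketch of the LD_PRELOAD shim.
 * Assumption: the busy-wait the NVIDIA libraries spin on resolves to
 * sched_yield(). Preloading this library makes that call a short
 * nanosleep instead, so the kernel hands the timeslice to ANY runnable
 * thread, not just ones of the same or higher priority.
 *
 * Build:  gcc -shared -fPIC -o libsleep.so libsleep.c
 * Run:    LD_PRELOAD=./libsleep.so ./setiathome_app
 * (On macOS the rough equivalent would be DYLD_INSERT_LIBRARIES.)
 */
#include <time.h>

int sched_yield(void)
{
    /* Sleep ~100 microseconds instead of spinning. */
    struct timespec ts = { 0, 100 * 1000 };
    nanosleep(&ts, NULL);
    return 0;
}
```

Because the replacement happens at dynamic-link time, neither the application nor the closed-source driver libraries need to be rebuilt.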
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Sooo, are you and Fakesguy using the Sleep function? I tried the Sleep option in Windows with the OpenCL Apps and didn't like it. I'd rather just run a couple less CPU Apps than use the same type of Sleep I saw in Windows. It really isn't a big deal, however, if it isn't supposed to cause 100% CPU use it would be nice to correct it. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yes, that's basically why I've been working on configuration and dispatch infrastructure. Without that, the base code has to be able to run on a G80 class Pre-Fermi GPU (e.g. 32 max registers) and blocking sync only, minimising CPU use (limited to no usefulness of Cuda streams). Fortunately we get to choose to fit the larger machines & newer GPUs, with all the things being tried, and I guarantee different configurations will suit different hardware better. A good example is my woeful GTX 980, being underfed by an old Core2Duo. Not a situation you want to be chewing a CPU core per instance :) So we have the knowledge and the plan to make things faster for mainstream, just a fair bit to lay down so as to avoid having to have 50 different builds. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Gianfranco Lizzio Send message Joined: 5 May 99 Posts: 39 Credit: 28,049,113 RAC: 87 |
Increasing the maxrregcount value increases the number of inconclusive results. Now I'm trying to find a good compromise between speed of execution and inconclusive results. I don't want to believe, I want to know! |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Increasing the maxrregcount value increases the number of inconclusive results. Now I'm trying to find a good compromise between speed of execution and inconclusive results. If you're seeing inconclusives directly as a result of switching maxrregcount, then you have Cuda kernels failing silently (and so looking quick) due to launch restrictions in drivers and hardware. The net effect would be that it's skipping searches. I'd advise extreme caution until such time as every kernel has an inbuilt regression test for its first run (meant as part of x42). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
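[Editor's note: a hedged sketch, not the actual x42 code, of how a launch can fail silently. If the requested resources (e.g. registers per thread via -maxrregcount, times the block size) exceed what the driver and hardware allow, the launch is rejected, no work is done, and an unchecked task sails on "quickly" as if it had searched. Kernel and function names here are illustrative.]

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void searchKernel(float *data) { /* ... search work ... */ }

// Launch wrapper that surfaces both kinds of silent failure.
static void checkedLaunch(float *d_data, dim3 grid, dim3 block)
{
    searchKernel<<<grid, block>>>(d_data);

    // Catches launch-configuration errors (too many registers/threads):
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));

    // Catches errors that only appear once the kernel actually runs:
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
}
```

Without the cudaGetLastError() check after the triple-chevron launch, a rejected launch returns control immediately and the host code has no idea the searches were skipped, which is consistent with the "looking quick" inconclusives described above.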
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
After compiling the App in Mountain Lion and moving back to Yosemite for testing, I noticed the new App seemed to match my CPU test results slightly better than the last build. I was expecting the Inconclusive list to shrink a little, instead it went up just a few but now appears to be about where it was with the maxrregcount 32 build. Don't know, I'll watch it as I have been. |
Gianfranco Lizzio Send message Joined: 5 May 99 Posts: 39 Credit: 28,049,113 RAC: 87 |
If you're seeing inconclusives directly as a result of switching maxregcount, then you have Cuda kernels failing silently (and so looking quick) due to launch restrictions in Drivers and hardware..... The most likely explanation seems to be the use of the Nvidia Beta Drivers. But there is nothing to be done, because they are the only ones available. I don't want to believe, I want to know! |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
If you're seeing inconclusives directly as a result of switching maxregcount, then you have Cuda kernels failing silently (and so looking quick) due to launch restrictions in Drivers and hardware..... Hmmm, so you're using a LapTop with an External video card? Speaking of LapTops....I think I've determined just how fast a NV 320 should run, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=58784&offset=40 The 320M in a Mac LapTop should be just a little slower than that full sized card. That would mean the 320M in Mac LapTops should run much faster than they are currently running, http://setiathome.berkeley.edu/results.php?hostid=7523831&offset=40 Hmmm, that would be around 2 to 3 times faster using the CUDA App versus the OpenCL App, which is pretty close to the difference with the Mac 650M & 750M. That's when the 320M actually works with the OpenCL App; a few of the ones I've seen are either much, much slower or simply give Errors, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=59702&offset=40 Has anyone run the CUDA 42 App with one of these 320M LapTops? Of course it will only work with the 320M in Mavericks and lower running CUDA... |
Gianfranco Lizzio Send message Joined: 5 May 99 Posts: 39 Credit: 28,049,113 RAC: 87 |
Hmmm, so you're using a LapTop with an External video card? No Tom...I'm using a self-built Macintosh with performance comparable to a Mac Pro 2013. I don't want to believe, I want to know! |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hmmm, so you're using a LapTop with an External video card? Ok, that explains the CPU then. I was wondering about that. The Webdrivers are still being called Beta for the LapTops only. Even after over a year they are still calling them Beta. The Webdrivers for the MacPros have been around for a very long time, and are not considered Beta. I think I was using the Webdriver back when my 2008 MacPro was new. Anyway, the concept of a Hackintosh is itself Beta, it's nice to know it works fine with the default settings though. |
TimeLord04 Send message Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
TBar, Apple just recently released 10.11.4 Update to El Capitan. When I'm ready to install this Update, it WILL REQUIRE new Alternate NVIDIA Web and CUDA Drivers - just released in the last couple days... I've just now Downloaded the new Web and CUDA Drivers. Just thought you should know this if you are still working on/fine tuning the CUDA Apps for MAC. TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
MarkJ Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5 |
Yes, that's basically why I've been working on configuration and dispatch infrastructure. Without that, the base code has to be able to run on a G80 class Pre-Fermi GPU (e.g. 32 max registers) and blocking sync only, minimising CPU use (limited to no usefulness of Cuda streams). Jason, do we really have to support back to the G80, given Nvidia have thrown them in the legacy bin? Can we convince the powers that be (Eric?) that the lowest supported GPU should be a Fermi for future developments? It sounds like a lot of effort to support old hardware. BOINC blog |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yes, that's basically why I've been working on configuration and dispatch infrastructure. Without that, the base code has to be able to run on a G80 class Pre-Fermi GPU (e.g. 32 max registers) and blocking sync only, minimising CPU use (limited to no usefulness of Cuda streams). Certainly something I've mused about. It's definitely not as high a priority on platforms other than Windows (which was already done), however abandoning those in order to focus solely on Fermi+ leaves some important development holes for the next generation of applications. That next generation is to move toward inclusion of other platforms that may not even have Cuda, and more closely resemble G80 than they do Fermi, Kepler or Maxwell... So while working Pre-Fermi baseline Cuda code may not be relevant for mainstream NV card owners, it is relevant for development purposes. Actually building Pre-Fermi applications isn't especially difficult either, provided the already working code for them is preserved alongside the new stuff. Where the real current challenges lie is in gradually making the build system more unified (across 3 main platforms), and splitting the very hardware specific code into plugin/dispatch infrastructure, so that the next major architecture shifts don't cost so much development investment. What is costly is having 3 separate platforms, on 3 separate build systems. In those respects, discarding working pre-Fermi builds/codepaths would actually be a step backward, by not giving a 'known working' reference to compare against the new stuff to come. Therefore at this point, the strength of Cuda X-branch is in its diversity, despite the massive amount of growing pains needed. Without that diversity there's no incentive or testbed to improve the infrastructure to cope better with the next major change (e.g. GBT, other telescopes, next gen GPUs, everyone decides SPIR-V/Vulkan is better, Samsung starts selling its GPUs as Compute Cards, and so on). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TimeLord04 Send message Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
Just so TBar is aware: I just finished Upgrading to 10.11.4 from 10.11.3, installed the new Alternate NVIDIA, (Web), Driver and new CUDA Driver. (346.03.06f01 and 7.5.26 - respectively.) Resumed BOINC and SETI and crunching resumed without issue. Ran for over a minute; no crashes... Reinstated time of day limitations and BOINC Suspended. :-) (BOINC still Stock, unmodified; haven't been brave enough to install your CUDA App, yet.) TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
TimeLord04 Send message Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
Well, just wanted to report my first Invalid with CUDA 42. WU ID 2114110856 - Comp ID: 7952666. Any idea why a "100% Accurate App." has an Invalid??? Or, is this a fluke??? TL

Stderr output
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
v8 task detected
setiathome_CUDA: Found 1 CUDA device(s):
Device 1: GeForce GTX 750 Ti, 2047 MiB, regsPerBlock 65536
computeCap 5.0, multiProcs 5
pciBusID = 1, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
Device 1: GeForce GTX 750 Ti is okay
SETI@home using CUDA accelerated device GeForce GTX 750 Ti
setiathome enhanced x41zi (baseline v8), Cuda 4.20
setiathome_v8 task detected
Detected Autocorrelations as enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 1.046500
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.
Flopcounter: 3287976293326.461914
Spike count: 26
Autocorr count: 4
Pulse count: 0
Triplet count: 0
Gaussian count: 0
06:46:32 (18049): called boinc_finish(0)
</stderr_txt>
]]>

Name: 01dc10aa.29917.21335.6.33.41_1
Workunit: 2114110856
Created: 3 Apr 2016, 18:04:31 UTC
Sent: 4 Apr 2016, 0:01:24 UTC
Report deadline: 15 May 2016, 10:34:34 UTC
Received: 4 Apr 2016, 14:43:31 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x0)
Computer ID: 7952666
Run time: 2 min 57 sec
CPU time: 42 sec
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 802.88 GFLOPS
Application version: SETI@home v8
Anonymous platform: (NVIDIA GPU)
Peak working set size: 134.09 MB
Peak swap size: 23,073.26 MB
Peak disk usage: 0.02 MB

TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Running a single CUDA 42 task will be faster than what you are running now and it should produce nearly 100% Valid results. At least it does on Real Macs anyway. That's the quote. You seem to miss certain words, such as EXTRACT, FINISH The Tasks, NEARLY, REAL MACS, etc... Hey, you're welcome to go back to what you were using. I'd suggest watching the results and seeing if any other Overflows, besides the three that already happened, show up. You still have Two more suspicious Overflows, that will probably fail, waiting for the third wingperson. As with the Nice results you were receiving with the OpenCL App, I suspect the 10.11.4 update has a little to do with it. The hosts I've been watching have all had their Inconclusives rise after 10.11.4, mine went up by around 20 and I got an Invalid with 10.11.4. Since going back to Yosemite the Inconclusives are dropping and the RAC is rising. I would suggest going back to Yosemite if you want better SETI results. I'd be running Mavericks if I could, it seems to work best on my machine. You might even try CUDA 65, which is the suggested App for your machine. The CUDA 42 App was intended for Hosts running Mavericks and lower with a Pre-Fermi card. CUDA 42 was compiled in SNOW LEOPARD, CUDA 65 was compiled in Mountain Lion for the newer cards. Or you can keep complaining about having to install things you don't have to. The only difference between installing the CUDA 42 App and the CUDA 65 App is you have to obtain the CUDA 65 Libraries from nVidia, EXTRACT them from the ToolKit, and then paste them into the setiathome.berkeley.edu folder just as you did with the CUDA 42 libraries. You DO NOT have to install any ToolKits, and if you do have a ToolKit installed I'd suggest Removing it unless you plan on building a CUDA App. After removing the ToolKit you will probably need to reinstall the CUDA driver. |
TimeLord04 Send message Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
Can't go to Yosemite; as I NEVER had a copy of it, and you can't DL old OSes from the App Store. ONLY 10.11.4, (the current OS Upgrade), is available through the App Store. My available choices are Snow Leopard 10.6.8, El Cap 10.11.3, and 10.11.4. As I've already installed some pretty cool apps that are working in 10.11.4; I'm NOT willing to go backward at this point, and going back to 10.11.3 is still an El Cap version. I will NOT be going back to Snow Leopard. I will monitor the results in queue, and if I'm able, I will change to CUDA 65. I'm NOT really complaining about your work; as I've stated over, and over, I'm in AWE over what you do, and have done. I did NOT appreciate being hammered into installing CUDA 42. That's where I'm coming from. I was hammered by you, by others, by Team Members, and accused in PM of being "aloof". I'm NOT aloof, I'm CONSERVATIVE of what goes on my computer systems. TL [EDIT:] I found this through NVIDIA: CUDA Toolkit 6.5.14 for MAC OS X. I hope this is the right version to get; it's all I could find via Search. [EDIT 2:] I don't know how to "Extract" the files you are looking for. After completing the DL, this looks like an automatic installer. I have NOT Double Clicked it; I do NOT want to inadvertently install something that I don't need. How do I get the two Libraries that you want me to have from this Executable File??? TL [EDIT 3:] I'm also tired of hearing that "because you created a Hack..." doing these mods should be easy in comparison... My building the Hack took 3 Weeks on Snow Leopard trying to make the drive bootable. To then find that the App Store would NOT work, and I could NOT DL El Capitan 10.11.3 from my SL Desktop. Had to get creative and use Skype to DL 10.11.3 from my MAC friend who WAS able to DL 10.11.3 to his MAC. Beyond these issues; getting 10.11.3 Installed and running took a few GUI Tools that I familiarized myself with. 
Had I had to use ANY Command Line coding, I would NOT be running my system today. While it wasn't easy doing this project, it was a learning experience; as I knew it would be. That's WHY I chose to attack this project on my own; for the challenge. I was successful. I now have a running MAC OS on a PC. Does this make me an Expert??? HELL NO!!! I'm learning much from the Hackintosh community on what it takes to maintain this system. I don't want something else to inadvertently take the system down. Again, I'm being CONSERVATIVE! I ALWAYS like to learn new things. Acting on what I learn takes me a LOT longer than many other people. Maybe I'm too cautious... But, I don't plan on changing my personal values any time soon. TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I did NOT appreciate being hammered into installing CUDA 42. That's where I'm coming from. I was hammered by you, by others, by Team Members, and accused in PM of being "aloof". I'm NOT aloof, I'm CONSERVATIVE of what goes on my computer systems. Please point out the post where I "hammered" you. Since I didn't send any PMs it should be easy. As I remember it, I merely pointed out how to install the app seeing as how you obviously didn't understand how easy it is to just copy and paste a few files. Here it is again; You don't need to install the ToolKit, just extract Two Libraries. Oh, to unpack the ToolKit use this tool, http://www.macupdate.com/app/mac/16357/unpkg Here is the file from the 6.5 Download; Links to Libraries and Driver; Here are the instructions from the UnPkg link; unpkg is a utility that simply unpacks all the files in a Mac OS X package (.pkg file) into a folder. Drag more packages here or quit. You simply drag the ToolKit Package onto the App or the Window and release it. The App will UnPack it, just as other utilities Unpack Zip files. |
TimeLord04 Send message Joined: 9 Mar 06 Posts: 21140 Credit: 33,933,039 RAC: 23 |
I did NOT appreciate being hammered into installing CUDA 42. That's where I'm coming from. I was hammered by you, by others, by Team Members, and accused in PM of being "aloof". I'm NOT aloof, I'm CONSERVATIVE of what goes on my computer systems. I didn't say that you PM-ed me; I said I received PMs... TL [EDIT:] However; I did feel attacked because I did NOT understand how to do what you wanted me to do. Again, you specifically did NOT attack, but being told by others about the "20 years of experience..." that you and the others have over me made me feel quite inferior. I did NOT appreciate the attacks. It takes me longer to learn because I have LEARNING DISABILITIES!!! I have ADD, I have Bipolar, and PTSD from being attacked as a kid by peers that hated me. All this takes its toll, and all I ask is for understanding when I DON'T want to immediately jump on something that someone claims works. I choose to read, and learn, and slowly understand what's being presented. THEN, I implement... WHEN I choose to do so. TL TimeLord04 Have TARDIS, will travel... Come along K-9! Join Calm Chaos |
Chris Adamek Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
My personal intent was never to make you feel attacked, and the whole "we've been doing this for nearly 20 years" thing was merely to reinforce, and hopefully get you to acknowledge, that your machine was not acting appropriately with the OpenCL version of the App, in spite of your repeated insistence that it was. It was a fairly common problem that had a solution and for which we all offered you help to resolve. I don't know who PM'd you, but you know, I kinda get it. Greatly summarized, you came to us asking how to make SETI work on your non-standard setup (and I'm all for people doing things like Hackintosh), and we told you. You did some of it, but not all, and then we pointed out it wasn't working right and you argued that it was. Help was offered at every step; heck, if asked nicely, someone on here would have probably gladly remotely connected to your computer and copied the files for you so that you could see what needed to be done. Be that as it may, your computer is working a lot better now than it was and that's great, for you and the science that has brought us all together here. As for the learning disabilities, I get it. My partner is severely ADHD, as well as PTSD from an abusive childhood, and my son is ADHD and will likely be diagnosed with some level of bipolar disorder by the time he is grown. So I juggle that every day, and in my interactions with them I've found some "vigorous encouragement" is sometimes what's needed to encourage a change when a change is needed. Perhaps that came across too strongly for you in this particular situation, and if it did I apologize for that, as I certainly never intended for you to feel attacked by what I wrote. At the end of the day, though, we are all at a better place of understanding and the science will benefit because of that. Chris |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.