I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Well, those are questions for Petri and Eric (not my decisions). For 'Release', as opposed to test/private builds, GPL compliance means you would need to, at a minimum, update and supply a docs folder in packages similar to: https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch#docs, adding links to any code changes you had to make to get the build functional (e.g. for petri's code, indicate in the alpha folder where his modified code is available). Adding a file, or adding to an existing one, saying "Builds compiled by TBar" would be a reasonable way to brand it (though not required afaik). Doing so in the Stderr would make differentiation easier for people looking at the results, should there be some need to trace the build (many possible scenarios). "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Also, any similarity with AMD issue on OS X that Chris follows very good? Gaussians missing there - what pattern here (NV) if any?" My guess would be yes. It seems the problems started around the same time on the desktops, with the change from Darwin 15.3 to 15.4. Some nVidia machines were working fine with the OpenCL App with 15.3 and then began producing mostly Inconclusives with 15.4. On my machine the ATI 6850 takes almost an hour to complete a blc5 in Yosemite 14.5. After booting into 15.4 the exact same arrangement completes the blc5 in around 42 minutes. Obviously a major change with the OS. I was hoping you might come up with a fix before blocking a number of machines. |
Chris Adamek Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236 |
Kinda unrelated, but is documentation all that's missing for releasing the optimized Mac apps you put together, TBar? I'm not sure what all I'd need to do, but I'd be happy to help get those pushed to main. There are lots of 12, 16 and 24 core Mac Pros out there that could really help process some guppies with some better apps. Yours seem to be about twice as fast as the current stock apps. Just curious. Thanks, Chris |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
AFAICT, that'd be about all that's missing for CPU. Eric's usually particularly careful about the completion of that for Stock CPU, because he apparently gets a call from FFTW library's Matteo Frigo if something's out of order there. Fortunately if there aren't any stock codebase changes, then duplicates of the text files distributed with existing stock CPU should be fine. Caveat would be that having not provided a stock CPU application directly myself (just small source patches), I'm not clear if Eric/Berkeley prefer to control those builds, performance secondary, for other reasons. GPUs, reliant on third party development from the start, and decoupled from FFTW, are less reliant on layers of compliance/bureaucracy, and the user-base + combined throughput considerably lower. That isn't to say there isn't still a process for those - making Eric's Job as easy as possible like Raistmer's description of reminders and prodding helps. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The CPU apps were compiled from the AKv8 folder, which also has its own Copyright file: https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/AKv8/COPYRIGHT FFTW: Copyright (c) 2003,2006-2011 Matteo Frigo // Copyright (c) 2003,2006-2011 Massachusetts Institute of Technology // fft8g.[cpp,h]: Copyright (c) 1995-2001 Takuya Ooura // Brook+ and OpenCL parts of code: Copyright (c) 2010-2015 Raistmer // This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. None of my Apps have any stock codebase changes, only custom Configure lines. In fact, Chris helped track down the configure flag that helped the AVX App perform better. Considering the Stock Windows & Linux CPU App has AVX, I see No reason for it to be left out of the Stock Mac CPU App. I believe I mentioned that to Eric the last time I recommended the Apps to Beta a few months ago. Any documentation required for the Apps I've compiled can be found in the Berkeley Repository, as that is the location of the code. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Has Eric released any direct AKv8 derivatives as stock CPU? Historically they may have hardware specific limitations built in that make them less suitable for generic stock. Boinc server doesn't do hardware dispatch unless Eric creates a custom scheduler path for it, which could be one part of explaining why such a build isn't already stock. [Edit:] That's why Lunatics and Joe Segur, before my time, ported Alex Kan's and others' code into stock CPU with internal dispatch, and Joe later added hand AVX kernels there. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The Apps here, http://setiathome.berkeley.edu/apps.php 7.07 (avx), 7.07 (sse41), 7.07 (ssse3), were All compiled from the AKv8 folder and performed Very well with the exception of those few AVX LapTops running Lion. Those Machines are Still choking on the Current Stock CPU App and producing Thousands of Invalids a Day. They did work with the ssse3 App though. The only problem was with those handful of people who seem to only run SETI to watch the ScreenSaver. It appears that way from their RAC anyway. For those few I propose SETI assemble a 50 frame GIF they can watch the few times a year they actually run SETI. That should solve that. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Very well, seems like Ducks in a row then :D To my mind if you're providing drop-in-equivalent builds that work better, have the legals in order, and make it easy for Eric, then probably the only blockages are Eric's timetable & priorities. As Raistmer pointed out the last part can be sometimes tough. Depends on how hard they're turning the screws on him at Berkeley. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Hmmm, after reporting another 56 tasks I just realized I haven't received a single SIGBUS error since placing the 3rd CUDA card back in the machine early this morning. Interesting. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Hmmm, after reporting another 56 tasks I just realized I haven't received a single SIGBUS error since placing the 3rd CUDA card back in the machine early this morning." Haven't been game to click the update button on the Mac Pro until I have more time. Simply reseating cards can change things that are interrupt related. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I dunno, but I'd suspect it doesn't help with BOINC claiming you have GPUs 1,2,2 instead of GPUs 0,1,2. `Sat May 21 17:25:37 2016 \| \| OpenCL: NVIDIA GPU 1: GeForce GTX 950 (driver version 10.11.10 346.03.10f01, device version OpenCL 1.2, 2048MB, 2048MB available, 758 GFLOPS peak)` I'll wait another day before drawing any conclusions. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Agreed, that's a Boinc device enumeration issue, which is why, on Windows MB Cuda, I chose to use PCI bus and slot number for device-specific identification and control; afaik that is the only recommended way that is guaranteed consistent. Bit of a case of someone somewhere not having read something, and making up their own 'best practices'. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Obviously a major change with the OS. I was hoping you might come up with a fix before blocking a number of machines." Fix to... OS X? If I read it correctly, only a certain range of OS X versions produces invalids. The latest version works OK again. So a range exclusion will help clean out invalids and inconclusives. I'm not sure anyone will try to produce a workaround for that range. It's quite time consuming to develop a workaround for semi-broken system code (I suspect broken precision in some trigonometry function or the like) and hardly anyone has free time for that. So it would be good to have a definitive range for exclusion; then the plan class could be corrected. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
As usual, the solution is not that simple. Since going back to the AMD App r3347, Chris's Mac Pro is again finding Gaussians and has moved up in the standings, http://setiathome.berkeley.edu/top_hosts.php?sort_by=expavg_credit&offset=20 It would seem there was a code change somewhere between r3347 and r3430 that results in the Mac Pros having problems using the newer App in Darwin 15.4 and above. The Current OS is 15.5 and it still has the problem with r3430. Right now Darwin 16.0 is somewhere in the future and hasn't even reached Beta stage yet. There is no guarantee 16.0 will work with r3430 in its final release, which is a long way off. The nVidia LapTops are a different story, they were having problems long before Darwin 15.4. Most of the Mac nVidia GPUs are in LapTops. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"The nVidia LapTops are a different story, they were having problems long before Darwin 15.4. Most of the Mac nVidia GPUs are in LapTops." Minor point/question of order. Do these laptops 'have problems', or do the apps 'have problems on them'? My 2009 Mac Pro has problems, because it's obnoxiously shiny and ugly looking all fancy in the rubble, but OS and hardware seem to work. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
From my POV the problem starts with having to get a driver... A person who won't read has no advantage over one who can't read. (Mark Twain) |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"From my POV the problem starts with having to get a driver..." To some extent I agree. People definitely need to consider driver/OS support [and compatibility, of course] when purchasing/installing hardware, more than they often do. A friend of mine paid a fortune for his old Creative Labs sound card, and he won't let it go. They stopped making drivers for it with Vista (because the driver+hardware model changed), but there is always a hack/workaround. Depends on how much you value your time, as to how much you're willing to flog a dead horse I suppose. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The OpenCL drivers are built into the OS. You can't 'get a driver' in OSX. nVidia offers their own Video driver that is again Custom Built for a particular OS version. Each driver will only work with that OS version. To change the driver you must change the OS version. You can look at the nVidia Video driver yourself and see if there is an OpenCL driver in there, I couldn't find one, http://www.nvidia.com/download/driverResults.aspx/103826/en-us Now nVidia does offer a CUDA driver, and you must use the correct version of that as well when using CUDA. The CUDA Apps work with those nVidia LapTops rather nicely even though the OpenCL App doesn't. That's sorta why I posted the Mac CUDA Apps over here, http://www.arkayn.us/forum/index.php?topic=191.msg4411#msg4411 Unfortunately SETI has changed to VLARs, which take almost as long with the CUDA App as they do with the OpenCL App on those LapTops. But the CUDA App produces Valid results on those LapTops whereas the OpenCL App produces mainly Inconclusives. The only drawback is you must manually update the CUDA driver when you update the OS, or else CUDA may stop working. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Those don't sound like serious issues to a Cuda developer intent on producing valid results (i.e. me) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
William Send message Joined: 14 Feb 13 Posts: 2037 Credit: 17,689,662 RAC: 0 |
sorry, NV for me is still synonymous with CUDA. I forget they can do OpenCL as well. A person who won't read has no advantage over one who can't read. (Mark Twain) |
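
For anyone following the device enumeration point above (BOINC reporting GPUs 1,2,2 and jason_gee's choice of PCI bus and slot number for identification), here is a rough Python sketch of the idea. It is not taken from BOINC or from the MB Cuda source. The `Gpu` record, its field names, the bus/slot values, and the two unnamed cards are all made up for illustration. It only shows why a (bus, slot) key stays unique when the reported index does not.

```python
from collections import Counter
from typing import NamedTuple


class Gpu(NamedTuple):
    """Hypothetical device record; field names are illustrative only."""
    reported_index: int  # index as reported by the client's enumeration
    pci_bus: int         # PCI bus number of the card
    pci_slot: int        # PCI slot (device) number of the card
    name: str


def duplicate_indices(gpus):
    """Return the reported indices that appear more than once, sorted."""
    counts = Counter(g.reported_index for g in gpus)
    return sorted(i for i, n in counts.items() if n > 1)


def by_pci_location(gpus):
    """Key each device by (bus, slot), which is unique per physical card."""
    table = {}
    for g in gpus:
        key = (g.pci_bus, g.pci_slot)
        if key in table:
            raise ValueError(f"two devices claim PCI location {key}")
        table[key] = g
    return table


# Made-up example mirroring the "GPUs 1,2,2" report from the thread.
gpus = [
    Gpu(1, 1, 0, "GeForce GTX 950"),
    Gpu(2, 2, 0, "card B (hypothetical)"),
    Gpu(2, 5, 0, "card C (hypothetical)"),
]

dupes = duplicate_indices(gpus)
table = by_pci_location(gpus)
print("ambiguous indices:", dupes)
print("PCI keys:", sorted(table))
```

In a real CUDA app the bus and slot values would come from the runtime rather than a hand-written list; the `cudaDeviceProp` structure exposes `pciBusID` and `pciDeviceID` fields for this. How the actual MB Cuda builds wire that up is not shown in this thread, so treat the above as a sketch of the principle only.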