I've Built a Couple OSX CUDA Apps...

Brent Norman Volunteer tester Send message Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835	Message 1857981 - Posted: 27 Mar 2017, 19:21:05 UTC - in response to Message 1857948. Last modified: 27 Mar 2017, 19:23:48 UTC My i7 with 980/1070 looks fairly good with zi3k+ Validation pending (1214) · Validation inconclusive (79) · Valid (955) · Invalid (0) ID: 1857981 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1858005 - Posted: 27 Mar 2017, 20:29:37 UTC - in response to Message 1857957. Last modified: 27 Mar 2017, 20:39:02 UTC
The parameters to cudaMemsetAsync are in the CUDA documentation, and the size of the reserved mem buffer can be found in cudaAcceleration.cu, where the buffer is allocated using the CUDA device memory allocation function. The size is in bytes, and one float (the short and fast form of a decimal number) takes four bytes (4 chunks of 8-bit integers, totalling 32 bits, i.e. four bytes). Petri
I couldn't make heads or tails of that; however, I decided to bring back the last cudaDeviceReset(); in cudaAcceleration.cu. That particular cudaDeviceReset was disabled to avoid the Mac SIGBUS error, so in hopes of still avoiding it I moved the call to after the cudaThreadExit();. Unfortunately, that didn't work. The 10th task ended in a SIGBUS error. I haven't seen one of those in a while; in fact, I haven't seen one since cudaDeviceReset was disabled.
Best spike: peak=25.77551, time=90.6, d_freq=1419030775.88, chirp=8.566, fft_len=64k
Best autocorr: peak=19.23401, time=60.4, delay=3.331, d_freq=1419034455.47, chirp=20.735, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.121e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=3.728501, time=38.44, period=0.07373, d_freq=1419029541.02, score=0.8326, chirp=0, fft_len=32
Best triplet: peak=8.613966, time=52.78, period=0.0426, d_freq=1419031372.07, chirp=0, fft_len=32
Spike count: 9
Autocorr count: 2
Pulse count: 0
Triplet count: 7
Gaussian count: 0
SIGBUS: bus error
Crashed executable name: setiathome_x41p_zi3t1e_x86_64-apple-darwin_cuda80
Oh well, zi3k+ isn't so bad...for now. ID: 1858005 ·
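
The size arithmetic in Petri's note can be checked in a few lines. This is only a sketch of the "floats times four bytes" rule; the helper name `float_buffer_bytes` and the element count are made up for illustration and are not taken from cudaAcceleration.cu.

```python
import struct

# Standard size of a 32-bit float in bytes ("<f" forces the standard size).
FLOAT_BYTES = struct.calcsize("<f")


def float_buffer_bytes(num_floats: int) -> int:
    """Return the size in bytes of a buffer holding num_floats 32-bit floats."""
    if num_floats < 0:
        raise ValueError("num_floats must be non-negative")
    return num_floats * FLOAT_BYTES


# Hypothetical example: a buffer of 1,048,576 floats.
print(float_buffer_bytes(1_048_576))  # 4194304
```

A byte count computed this way is what a size argument such as the one passed to cudaMemsetAsync expects; the element count itself has to come from wherever the buffer is allocated.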

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13731 Credit: 208,696,464 RAC: 304	Message 1858078 - Posted: 28 Mar 2017, 7:41:04 UTC - in response to Message 1857981. My i7 with 980/1070 looks fairly good with zi3k+ Validation pending (1214) · Validation inconclusive (79) · Valid (955) · Invalid (0) 6.5% So very close. Grant Darwin NT ID: 1858078 ·
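
The 6.5% figure appears to be the inconclusive count divided by the pending count from Brent's totals. That reading is an assumption (the post does not say which denominator was used), but it reproduces the number:

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100.0 * part / whole


# Counts quoted in the post: 79 inconclusive, 1214 pending.
print(round(percent(79, 1214), 1))  # 6.5
```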

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1858307 - Posted: 29 Mar 2017, 19:41:48 UTC If the pattern of the PulseFind issue has remained the same since I last saw it, the issue is not in some missing tail processing. There were samples with the proper signal reported as best (that is, the corresponding data was processed), but an incorrect pulse was reported as reportable for the same chunk. I described the issue in detail some time ago on beta. Maybe that could be useful for a more targeted bug hunt. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1858307 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1858373 - Posted: 30 Mar 2017, 2:16:39 UTC - in response to Message 1858307. ...Maybe could be useful for more targeted bughunt. Any suggestions? I sent Petri a WU that consistently produces a Bad Best Pulse, I think he's working on that. I'm just trying to determine why some of us are receiving False overflows while he isn't. I've always wondered what disabling the last cudaDeviceReset(); would do. Perhaps not ending correctly will cause something to not start correctly? There are 3 cudaDeviceReset(); in cudaAcceleration.cu. The first two appear to affect CUDART_VERSION < 3000, the third one, which is disabled, is for CUDART_VERSION >= 4000. Maybe adding one for CUDART_VERSION >= 4000 will help? It can't be added at the end, that causes SIGBUS errors. Maybe adding one at the top, below CUDART_VERSION >= 4000? But where to put it? If you were going to add a cudaDeviceReset(); to the top of cudaAcceleration.cu, where would you put it? ID: 1858373 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1858429 - Posted: 30 Mar 2017, 13:52:08 UTC Well, Petri is receiving False overflows, it's just that his doesn't happen immediately, https://setiathome.berkeley.edu/results.php?hostid=7475713&state=5 Where the rest of us have them immediately, https://setiathome.berkeley.edu/workunit.php?wuid=2486318447 https://setiathome.berkeley.edu/workunit.php?wuid=2478474501 https://setiathome.berkeley.edu/workunit.php?wuid=2478335863 https://setiathome.berkeley.edu/workunit.php?wuid=2484236659 Etc, etc... Meanwhile, the Inconclusive results have overall dropped dramatically since adding a cudaDeviceReset(); to the Top of cudaAcceleration.cu. Compare the results since 29/1700 UTC, http://setiathome.berkeley.edu/results.php?hostid=6796479&state=3, of course, you have to weed out the usual suspects. ID: 1858429 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1858503 - Posted: 31 Mar 2017, 2:40:52 UTC - in response to Message 1858429. slightly closer to diving in. Will see if this weekend gets as throttled as last... "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1858503 ·

Chris Adamek Volunteer tester Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236	Message 1859840 - Posted: 6 Apr 2017, 17:21:07 UTC Nvidia is finally releasing Pascal drivers for the Mac. Guess it's time to start thinking about an upgrade... https://blogs.nvidia.com/blog/2017/04/06/titan-xp/ Chris ID: 1859840 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1859891 - Posted: 6 Apr 2017, 22:50:12 UTC - in response to Message 1859840. Nvidia is finally releasing Pascal drivers for the Mac. Guess it's time to start thinking about an upgrade... https://blogs.nvidia.com/blog/2017/04/06/titan-xp/ Chris Wooo, That'll make things way easier "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1859891 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1859905 - Posted: 7 Apr 2017, 0:53:48 UTC - in response to Message 1859891.
Nvidia is finally releasing Pascal drivers for the Mac. Guess it's time to start thinking about an upgrade... https://blogs.nvidia.com/blog/2017/04/06/titan-xp/ Chris
Wooo, That'll make things way easier
The big question is which versions of OSX will be supported. I looked around nVidia and didn't see any indications. It's a given 10.12 will be supported, but what about the others? I really don't want to jump through the hoops to install 10.12 on my old Mac; 10.11 works for me.
I just had a meltdown with version x41p_zi3t1e while looking around nVidia. I noticed the machine sounded different, and when I looked at BOINC I saw it had just trashed EVERY GPU task on the machine. Around 288! WTH? It hadn't reported anything, so I shut it down before it could. The best I can figure, it ran into a horde of Shorties that triggered an Error with the Autotune.
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum][0], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file 'cuda/cudaAcc_fft.cu' in line 29 : invalid argument.
About 288 of those. Since they were ALL errors I decided to reset them and see if they would be accepted. So far, it looks as though they are validating instead of being immediately labeled as Invalid. Of course...I went back to zi3k+. I don't need any more Errors anytime soon. I have no idea why that Error was triggered with zi3t1e, but it was pretty nasty. ID: 1859905 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1859923 - Posted: 7 Apr 2017, 2:22:32 UTC - in response to Message 1859905. Last modified: 7 Apr 2017, 2:23:41 UTC cuFFT failures like that have a lot of possible causes; really only some monitoring can diagnose them further. Fingers crossed the NVML support (as used by the nvidia-smi command) will work under the new drivers for Pascal when they materialise. If so I'll just whip up a lightweight (cross-platform) standalone monitoring utility, and put some basic monitoring in the apps. There are some basic failures due to bugs or other causes that may be caught with things like those Petri and I discussed in the Linux thread; however, there is still some juggling likely across all platforms now. That's since Vulkan, and Vulkan over Metal, is starting to show its teeth in places, and supposedly (rumoured) Apple wants VR support, demanding high-end GPUs. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1859923 ·

petri33 Volunteer tester Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156	Message 1859965 - Posted: 7 Apr 2017, 5:31:50 UTC Hi, Autotune has no memory allocation check right now. If you run low on GPU RAM you get those errors. That is on my TODO list: make the app recognise a too-big unroll (/autotuned value) at task startup. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones ID: 1859965 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1861005 - Posted: 12 Apr 2017, 9:57:38 UTC Has anyone tried the New Pascal driver with the New CUDA driver? The Pascal driver came out early yesterday for 10.12.4, I jumped through the Hoops to get 10.12.4 installed and then found the CUDA driver 8.0.71 didn't work. Later they released 8.0.81, but it appears to be trash. With the 9 series cards CUDA 8.0.81 in 10.12.4 is around 30% slower than 8.0.71 in 10.11.6. With CUDA 8.0.81 in 10.11.6 I ended up quitting it cause it was so slow. So, it's back to 10.11.6 with driver 8.0.71 for now. I never even tried the 1050s in the Mac, no sense swapping cards if the CUDA driver doesn't work. The Graphics driver seems to work fine, just the CUDA driver is really slow. I did find you can't use a DVI-VGA adapter with the new graphics driver, it's either HDMI or DVI with the new driver. https://forums.macrumors.com/threads/webdriver-for-gtx-1080-1070.1979778/page-12 ID: 1861005 ·

Zalster Volunteer tester Send message Joined: 27 May 99 Posts: 5517 Credit: 528,817,460 RAC: 242	Message 1861024 - Posted: 12 Apr 2017, 13:00:56 UTC - in response to Message 1861005. Last modified: 12 Apr 2017, 13:03:53 UTC
I'm still running with my old laptop.
MacBook Pro Late 2013 15 inch
Processor 2.6 GHz Intel Core i7
16 GB 1600 MHz DDR3
Graphics Nvidia GeForce GT 750M 2048 MB
Intel Iris Pro 1536 MB
Mac OS v 10.12.4(16E195)
Cuda Driver Version 8.0.81
GPU Driver Version 10.16.34 355.10.05.35f05
Nvidia Web Driver 378.05.05.05f01
One thing I did notice: don't use the Nvidia Web Driver for display while trying to use BOINC at the same time. My webpages immediately went blank and I was forced to hard quit and reboot the system. I had to change the default display to the OS X default Graphics Driver to display webpages and crunch at the same time. Sorry, I haven't really bothered to check the times. I have noticed that if the screen saver was set to run BOINC, then once the screen saver started (or was supposed to start) the system locked up. So I've disabled BOINC as the default SS and only run it when I remember. ID: 1861024 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1861044 - Posted: 12 Apr 2017, 14:58:24 UTC - in response to Message 1861024. Last modified: 12 Apr 2017, 15:52:02 UTC
Looking at the times on that GPU they don't seem much different with the 8.0.81 driver. I still don't know why the LapTops have trouble finding the NV GPU. Since you're not using the iGPU, can you try adding a line to cc_config.xml telling BOINC to ignore the iGPU? It would look like this;
<ignore_intel_dev>0</ignore_intel_dev>
Maybe that will solve the 'no CUDA-capable device is detected.' problem.
I just ran the CUDA 75 App benchmark in 10.11.6 with 8.0.71 and it looks normal on my machine, I don't have an iGPU;
10:29:06 (43008): Can't set up shared mem: -1. Will run in standalone mode.
v8 task detected
setiathome_CUDA: Found 3 CUDA device(s):
Device 1: GeForce GTX 960, 2047 MiB, regsPerBlock 65536
computeCap 5.2, multiProcs 8
pciBusID = 2, pciSlotID = 0
Device 2: GeForce GTX 950, 2047 MiB, regsPerBlock 65536
computeCap 5.2, multiProcs 6
pciBusID = 1, pciSlotID = 0
Device 3: GeForce GTX 950, 2047 MiB, regsPerBlock 65536
computeCap 5.2, multiProcs 6
pciBusID = 5, pciSlotID = 0
setiathome_CUDA: No device specified, determined to use CUDA device 1: GeForce GTX 960
SETI@home using CUDA accelerated device GeForce GTX 960
setiathome enhanced x41zi (baseline v8), Cuda 7.50
setiathome_v8 task detected
Detected Autocorrelations as enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 0.215592
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.
Flopcounter: 59550947364203.703125
Spike count: 7
Autocorr count: 0
Pulse count: 0
Triplet count: 0
Gaussian count: 0
10:44:39 (43008): called boinc_finish(0)
I'll boot to 10.12.4 after it's finished with the benchmark and try the same WUs there with 8.0.81. I'll also check the Browser while the App is running, yesterday there wasn't any problem on my machine while running the Special cuda App. Everything looks normal in 10.11.6 with cuda 8.0.71;
Running on TomsMacPro.local at Wed Apr 12 14:22:18 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18dc09ah.26284.16432.6.33.125.wu
30oc08ae.27779.7839.4.31.25.wu
blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu
Listing executable(s) in /APPS :
setiathome_8.11_x86_64-apple-darwin__cuda75_mac
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 3517 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
407.15 real 85.59 user 73.66 sys
Elapsed Time : ............ 407 seconds
Speed compared to default : 864 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.70%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: 30oc08ae.27779.7839.4.31.25.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 8462 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
936.01 real 111.58 user 92.52 sys
Elapsed Time : ............ 936 seconds
Speed compared to default : 904 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.77%
---------------------------------------------------
Done with 30oc08ae.27779.7839.4.31.25.wu.
Current WU: blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 4797 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
1331.45 real 83.22 user 71.03 sys
Elapsed Time : ............ 1331 seconds
Speed compared to default : 360 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.37%
---------------------------------------------------
Done with blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu.
Done with Benchmark run! Removing temporary files!
++++++++++++++++++++++++++
Well, I see the same problem with the Baseline App with 8.0.81 as with the Special App: the new CUDA driver uses twice as much CPU and, in addition, twice as much memory. It seems to be taking much longer as well. I don't see any Browser problems with Safari or FireFox though. WU 18dc09ah.26284.16432.6.33.125.wu has been running 28 minutes and it still isn't finished.....Bad. I wonder how long it will be before nVidia releases a new CUDA driver for 10.12.4. ID: 1861044 ·
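
For anyone trying the cc_config.xml suggestion from this post, here is a minimal sketch that generates such a file's contents. It assumes the usual BOINC layout, in which `<ignore_intel_dev>` sits inside `<options>` under the `<cc_config>` root and takes a device number; the function name `build_cc_config` is made up for illustration, so check the result against the BOINC client configuration documentation before using it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignored_intel_devices):
    """Build cc_config.xml text that asks BOINC to ignore the given Intel GPUs.

    Assumption: <ignore_intel_dev> entries belong inside <options>.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for device_number in ignored_intel_devices:
        entry = ET.SubElement(options, "ignore_intel_dev")
        entry.text = str(device_number)
    return ET.tostring(root, encoding="unicode")


# Ignore Intel device 0, as suggested in the post.
print(build_cc_config([0]))
```

BOINC generally needs to re-read its config files (or be restarted) before a change like this takes effect.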

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1861071 - Posted: 12 Apr 2017, 17:01:28 UTC
So, I stand by my original comment. The new CUDA driver 8.0.81 released yesterday is trash if used with Graphics driver 10.17.34 378.05.05.05f01 in 10.12.4. The task 18dc09ah.26284.16432.6.33.125.wu finally finished after 45 minutes running the Stock CUDA75 App; it only took around 7 minutes in 10.11.6 with driver 8.0.71. The Stock OpenCL App seems to be working normally in my 10.12.4 system; it's a little slower than CUDA on the Arecibos and a bit faster on the BLCs;
Running on TomsMacPro.local at Wed Apr 12 15:19:44 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18dc09ah.26284.16432.6.33.125.wu
30oc08ae.27779.7839.4.31.25.wu
blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu
Listing executable(s) in /APPS :
setiathome_8.11_x86_64-apple-darwin__cuda75_mac
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 3517 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
2701.69 real 256.77 user 2156.47 sys
Elapsed Time : ............ 2702 seconds
Speed compared to default : 130 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.70%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Done with Benchmark run! Removing temporary files!
Running on TomsMacPro.local at Wed Apr 12 16:07:08 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18dc09ah.26284.16432.6.33.125.wu
30oc08ae.27779.7839.4.31.25.wu
blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu
Listing executable(s) in /APPS :
setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 3517 seconds
---------------------------------------------------
Running app with command : setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac -sbs 256 -period_iterations_num 10
429.37 real 46.42 user 125.92 sys
Elapsed Time : ............ 430 seconds
Speed compared to default : 817 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.48%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: 30oc08ae.27779.7839.4.31.25.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 8462 seconds
---------------------------------------------------
Running app with command : setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac -sbs 256 -period_iterations_num 10
1021.87 real 133.40 user 235.57 sys
Elapsed Time : ............ 1022 seconds
Speed compared to default : 827 %
-----------------
Comparing results
               ------------ R1:R2 -----------   ------------ R2:R1 -----------
                Exact Super Tight  Good   Bad    Exact Super Tight  Good   Bad
Spike               0     5     7     7     0        0     5     7     7     0
Autocorr            0     0     0     0     0        0     0     0     0     0
Gaussian            0     0     0     0     0        0     0     0     0     0
Pulse               0     0     0     0     0        0     0     0     0     0
Triplet             0     0     0     0     0        0     0     0     0     0
Best Spike          0     1     1     1     0        0     1     1     1     0
Best Autocorr       0     0     1     1     0        0     0     1     1     0
Best Gaussian       1     1     1     1     0        1     1     1     1     0
Best Pulse          0     1     1     1     0        0     1     1     1     0
Best Triplet        0     0     0     0     0        0     0     0     0     0
                 ----  ----  ----  ----  ----     ----  ----  ----  ----  ----
                    1     8    11    11     0        1     8    11    11     0
Result : Strongly similar, Q= 97.23%
---------------------------------------------------
Done with 30oc08ae.27779.7839.4.31.25.wu.
Current WU: blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 4797 seconds
---------------------------------------------------
Running app with command : setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac -sbs 256 -period_iterations_num 10
778.23 real 71.34 user 211.07 sys
Elapsed Time : ............ 778 seconds
Speed compared to default : 616 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.89%
---------------------------------------------------
Done with blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu.
Done with Benchmark run! Removing temporary files!
We'll see how it goes with the next CUDA driver. If you're running the Apple Graphics driver you should be able to manually reinstall the older CUDA driver 8.0.71 by just running the installer from here; http://www.nvidia.com/object/macosx-cuda-8.0.71-driver.html ID: 1861071 ·
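
The "Speed compared to default" figures in these runs look like the reference app's saved elapsed time divided by the tested app's time, expressed as a percentage. The benchmark script's actual formula is not shown in the thread, so the function below is an assumption; it reproduces several of the posted figures exactly and others only to within a couple of percentage points.

```python
def speed_vs_default(reference_seconds: float, app_seconds: float) -> float:
    """Return the tested app's speed as a percentage of the reference app's."""
    if app_seconds <= 0:
        raise ValueError("app_seconds must be positive")
    return 100.0 * reference_seconds / app_seconds


# CUDA75 app under CUDA driver 8.0.81 (posted as 130 %).
print(round(speed_vs_default(3517, 2701.69)))  # 130
# CUDA75 app under CUDA driver 8.0.71 (posted as 864 %).
print(round(speed_vs_default(3517, 407.15)))   # 864
```

Read this way, the same work unit dropped from roughly 8.6 times the CPU reference speed to roughly 1.3 times when only the CUDA driver changed.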

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1861477 - Posted: 14 Apr 2017, 18:15:38 UTC
Still nothing from nVidia about CUDA Driver 8.0.81, https://forums.geforce.com/default/topic/1002724/geforce-apple-gpus/pascal-based-graphics-card-drivers-for-macos/post/5127351/#5127351
One can only hope someone at nVidia is running SETI@Home on their test Mac....
In other news it appears the latest App x41p_zi3t2b has solved the problem with False Overflows on the Low Angle Range Arecibo tasks;
Listing executable(s) in /APPS :
setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 30oc08ae.27779.7839.4.31.25.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 8462 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
393.03 real 51.21 user 26.88 sys
Elapsed Time : ............ 393 seconds
Speed compared to default : 2153 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.76%
---------------------------------------------------
Done with 30oc08ae.27779.7839.4.31.25.wu.
Current WU: 31oc08ad.28198.24203.5.32.12.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ............. 10974 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
470.36 real 73.58 user 47.05 sys
Elapsed Time : ............ 470 seconds
Speed compared to default : 2334 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.87%
---------------------------------------------------
Done with 31oc08ad.28198.24203.5.32.12.wu.
Done with Benchmark run! Removing temporary files!
Unfortunately it still has the PulseFind problem;
Current WU: 16fe08aa.12502.25021.6.33.13.wu
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
148.69 real 21.88 user 15.76 sys
Elapsed Time : ............ 148 seconds
Speed compared to default : 2408 %
-----------------
Comparing results
               ------------ R1:R2 -----------   ------------ R2:R1 -----------
                Exact Super Tight  Good   Bad    Exact Super Tight  Good   Bad
Spike               0     0     0     0     0        0     0     0     0     0
Autocorr            0     0     0     0     0        0     0     0     0     0
Gaussian            0     0     0     0     0        0     0     0     0     0
Pulse               0     0     0     0     0        0     0     0     0     0
Triplet             0     0     0     0     0        0     0     0     0     0
Best Spike          0     1     1     1     0        0     1     1     1     0
Best Autocorr       0     1     1     1     0        0     1     1     1     0
Best Gaussian       1     1     1     1     0        1     1     1     1     0
Best Pulse          0     0     0     0     1        0     0     0     0     1
Best Triplet        0     0     0     0     0        0     0     0     0     0
                 ----  ----  ----  ----  ----     ----  ----  ----  ----  ----
                    1     3     3     3     1        1     3     3     3     1
Unmatched signal(s) in R1 at line(s) 396
Unmatched signal(s) in R2 at line(s) 396
For R1:R2 matched signals only, Q= 99.98%
Result : Weakly similar.
---------------------------------------------------
Done with 16fe08aa.12502.25021.6.33.13.wu.
Current WU: blc13_2bit_guppi_57824_79834_HIP22449_0042.23456.818.23.46.39.vlar.wu
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
502.96 real 60.15 user 25.14 sys
Elapsed Time : ............ 503 seconds
Speed compared to default : 1488 %
-----------------
Comparing results
               ------------ R1:R2 -----------   ------------ R2:R1 -----------
                Exact Super Tight  Good   Bad    Exact Super Tight  Good   Bad
Spike               0     5     5     5     0        0     5     5     5     0
Autocorr            0     0     0     0     0        0     0     0     0     0
Gaussian            0     0     0     0     0        0     0     0     0     0
Pulse               0    13    13    13     2        0    13    13    13     2
Triplet             0     3     3     3     0        0     3     3     3     0
Best Spike          0     1     1     1     0        0     1     1     1     0
Best Autocorr       0     1     1     1     0        0     1     1     1     0
Best Gaussian       1     1     1     1     0        1     1     1     1     0
Best Pulse          0     1     1     1     0        0     1     1     1     0
Best Triplet        0     1     1     1     0        0     1     1     1     0
                 ----  ----  ----  ----  ----     ----  ----  ----  ----  ----
                    1    26    26    26     2        1    26    26    26     2
Unmatched signal(s) in R1 at line(s) 404 500
Unmatched signal(s) in R2 at line(s) 404 489
For R1:R2 matched signals only, Q= 99.22%
Result : Weakly similar.
---------------------------------------------------
ID: 1861477 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1863677 - Posted: 26 Apr 2017, 6:32:53 UTC Still Nothing from nVidia on the CUDA 8.0.81 driver. However, the Maxwell GPUs work fine with the Original CUDA 8.0.71 Driver with the Original 10.12.4 Webdriver here, https://images.nvidia.com/mac/pkg/367/WebDriver-367.15.10.45f01.pkg So, We will just have to keep using the Maxwell GPUs for now. I just posted the New Mac CUDA Special App, it works fine on My Mac; http://www.arkayn.us/forum/index.php?topic=191.msg4411#msg4411 It's a nice improvement over the last posted Mac App with much faster results with the BLC tasks. Have fun. ID: 1863677 ·

Chris Adamek Volunteer tester Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236	Message 1863756 - Posted: 26 Apr 2017, 14:48:02 UTC - in response to Message 1863677. If they ever get these drivers working well, I plan on giving an external thunderbolt enclosure with a 10x0 card a try on my 2013 Mac Pro. Don't want to do anything as simple as just adding one to my 5,1 Mac Pro lol. I'll give the new app a try this afternoon. Same command line options for unroll and such? Thanks, Chris ID: 1863756 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1863788 - Posted: 26 Apr 2017, 17:47:13 UTC - in response to Message 1863756.
There is a new command for the unroll that automatically sets the task's unroll number to the number of multiProcs. However, there are known limits depending on the GPU's amount of vRam. On GPUs with only One or Two GBs of vRam you will need to set the unroll number manually (aka, the Old Fashioned Way). Depending on the GPU's vRam, set the unroll as follows;
1 or 2 for GPUs with One GB vRam and more than Two multiProcs
6 for GPUs with Two GB vRam and more than Six multiProcs
GPUs with 3 GBs of vRam haven't been tested but should work using the -unroll autotune command.
2 GB GPUs with 5 or 6 multiProcs can use -unroll autotune.
If you receive the "cufftPlan1d..." Error you should manually reduce the unroll number. It seems Sierra uses slightly more vRam than previous systems. The settings are covered in the Settings_Libraries_and_Drivers.txt file in the docs folder. ID: 1863788 ·
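
The rules of thumb in this post can be written down as a small lookup. This is only a sketch of the posted guidance, not code from the app: the function name `suggest_unroll` is made up, the vRam figure is rounded to whole GB (so a card reporting 2047 MiB counts as 2 GB), and the cases the post does not cover (for example a 1 GB card with one or two multiProcs) are assumed here to be fine with autotune.

```python
def suggest_unroll(vram_mib: int, multiprocs: int):
    """Suggest an unroll setting from vRam (MiB) and multiProcs count.

    Returns an int to set manually, or the string "autotune".
    """
    vram_gb = round(vram_mib / 1024)  # 2047 MiB is treated as a 2 GB card
    if vram_gb <= 1:
        # Post: "1 or 2" for 1 GB cards with more than two multiProcs.
        return 2 if multiprocs > 2 else "autotune"  # else-branch is an assumption
    if vram_gb == 2:
        # Post: 6 for 2 GB cards with more than six multiProcs.
        return 6 if multiprocs > 6 else "autotune"
    # Post: 3 GB and up untested, but autotune should work.
    return "autotune"


# The GTX 960 (2047 MiB, 8 multiProcs) and GTX 950 (2047 MiB, 6 multiProcs)
# from the benchmark log earlier in the thread.
print(suggest_unroll(2047, 8))  # 6
print(suggest_unroll(2047, 6))  # autotune
```

If a suggested value still produces the "cufftPlan1d..." error, the post's advice applies: reduce the unroll number by hand.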
