I've Built a Couple OSX CUDA Apps...

Message boards : Number crunching : I've Built a Couple OSX CUDA Apps...

Brent Norman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 1 Dec 99
Posts: 2786
Credit: 685,657,289
RAC: 835
Canada
Message 1857981 - Posted: 27 Mar 2017, 19:21:05 UTC - in response to Message 1857948.  
Last modified: 27 Mar 2017, 19:23:48 UTC

My i7 with 980/1070 looks fairly good with zi3k+
Validation pending (1214) · Validation inconclusive (79) · Valid (955) · Invalid (0)
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1858005 - Posted: 27 Mar 2017, 20:29:37 UTC - in response to Message 1857957.  
Last modified: 27 Mar 2017, 20:39:02 UTC

The parameters to cudaMemsetAsync are in the CUDA documentation, and the size of the reserved memory buffer can be found in cudaAcceleration.cu, where the buffer is allocated using a CUDA device memory allocation function. The size is in bytes, and one float (a single-precision floating-point number, short and fast) takes four bytes (four chunks of 8 bits, totalling 32 bits).
Petri
I couldn't make heads or tails of that; however, I decided to bring back the last cudaDeviceReset(); in cudaAcceleration.cu. In hopes of avoiding the Mac SIGBUS error, which is why that particular cudaDeviceReset was disabled, I moved it to after the cudaThreadExit();. Unfortunately, that didn't work. The 10th task ended in a SIGBUS error. I hadn't seen one of those in a while; in fact, I hadn't seen one since cudaDeviceReset was disabled.
Best spike: peak=25.77551, time=90.6, d_freq=1419030775.88, chirp=8.566, fft_len=64k
Best autocorr: peak=19.23401, time=60.4, delay=3.331, d_freq=1419034455.47, chirp=20.735, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.121e+11, d_freq=0,
	score=-12, null_hyp=0, chirp=0, fft_len=0 
Best pulse: peak=3.728501, time=38.44, period=0.07373, d_freq=1419029541.02, score=0.8326, chirp=0, fft_len=32 
Best triplet: peak=8.613966, time=52.78, period=0.0426, d_freq=1419031372.07, chirp=0, fft_len=32 

Spike count:    9
Autocorr count: 2
Pulse count:    0
Triplet count:  7
Gaussian count: 0

SIGBUS: bus error

Crashed executable name: setiathome_x41p_zi3t1e_x86_64-apple-darwin_cuda80

Oh well, zi3k+ isn't so bad...for now.
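Petri's point about sizes can be made concrete with a minimal host-side C sketch. This is not code from cudaAcceleration.cu: the helper names float_buffer_bytes and clear_floats are hypothetical, and plain memset stands in for the device call. What it shows is that the count passed to a memset-style call is in bytes, so for n floats it is n * sizeof(float) (four bytes per float on these Macs); passing the element count instead would clear only a quarter of the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed for n single-precision floats.
   sizeof(float) is 4 on the x86_64 Macs discussed in this thread. */
size_t float_buffer_bytes(size_t n_floats)
{
    return n_floats * sizeof(float);
}

/* Hypothetical helper: zero a buffer of n floats.
   memset, like cudaMemsetAsync(devPtr, value, count, stream) on the GPU,
   takes its count in BYTES, not in elements. */
void clear_floats(float *buf, size_t n_floats)
{
    memset(buf, 0, float_buffer_bytes(n_floats));
}
```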
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13731
Credit: 208,696,464
RAC: 304
Australia
Message 1858078 - Posted: 28 Mar 2017, 7:41:04 UTC - in response to Message 1857981.  

My i7 with 980/1070 looks fairly good with zi3k+
Validation pending (1214) · Validation inconclusive (79) · Valid (955) · Invalid (0)

6.5%
So very close.
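The 6.5% figure appears to be the inconclusive count divided by the pending count from the quoted line (an interpretation; the post does not say which ratio was meant):

```latex
\frac{\text{inconclusive}}{\text{pending}} = \frac{79}{1214} \approx 0.065 = 6.5\%
```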
Grant
Darwin NT
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1858307 - Posted: 29 Mar 2017, 19:41:48 UTC

If the pattern of the PulseFind issue has remained the same since I last saw it, the issue is not in some missing tail processing.
There were samples where the proper signal was reported as best (that is, the corresponding data was processed), but an incorrect pulse was reported as reportable for the same chunk.
I described the issue in detail some time ago on Beta. Maybe it could be useful for a more targeted bug hunt.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1858373 - Posted: 30 Mar 2017, 2:16:39 UTC - in response to Message 1858307.  

...Maybe could be useful for more targeted bughunt.
Any suggestions?
I sent Petri a WU that consistently produces a Bad Best Pulse; I think he's working on that.
I'm just trying to determine why some of us are receiving False overflows while he isn't. I've always wondered what disabling the last cudaDeviceReset(); would do. Perhaps not ending correctly will cause something to not start correctly? There are three cudaDeviceReset(); calls in cudaAcceleration.cu. The first two appear to apply to CUDART_VERSION < 3000; the third one, which is disabled, is for CUDART_VERSION >= 4000. Maybe adding one for CUDART_VERSION >= 4000 will help? It can't be added at the end; that causes SIGBUS errors. Maybe add one at the top, below CUDART_VERSION >= 4000? But where to put it? If you were going to add a cudaDeviceReset(); to the top of cudaAcceleration.cu, where would you put it?
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1858429 - Posted: 30 Mar 2017, 13:52:08 UTC

Well, Petri is receiving False overflows too; it's just that his don't happen immediately: https://setiathome.berkeley.edu/results.php?hostid=7475713&state=5
Whereas the rest of us have them immediately: https://setiathome.berkeley.edu/workunit.php?wuid=2486318447
https://setiathome.berkeley.edu/workunit.php?wuid=2478474501
https://setiathome.berkeley.edu/workunit.php?wuid=2478335863
https://setiathome.berkeley.edu/workunit.php?wuid=2484236659
Etc, etc...

Meanwhile, the Inconclusive results have dropped dramatically overall since I added a cudaDeviceReset(); to the top of cudaAcceleration.cu.
Compare the results since 29 March, 17:00 UTC: http://setiathome.berkeley.edu/results.php?hostid=6796479&state=3 (of course, you have to weed out the usual suspects).
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1858503 - Posted: 31 Mar 2017, 2:40:52 UTC - in response to Message 1858429.  

Slightly closer to diving in. We'll see if this weekend gets as throttled as the last...
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1859840 - Posted: 6 Apr 2017, 17:21:07 UTC

Nvidia is finally releasing Pascal drivers for the Mac. Guess it's time to start thinking about an upgrade...

https://blogs.nvidia.com/blog/2017/04/06/titan-xp/

Chris
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1859891 - Posted: 6 Apr 2017, 22:50:12 UTC - in response to Message 1859840.  

Wooo, That'll make things way easier
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1859905 - Posted: 7 Apr 2017, 0:53:48 UTC - in response to Message 1859891.  

The big question is which versions of OSX will be supported. I looked around nVidia's site and didn't see any indication. It's a given 10.12 will be supported, but what about the others? I really don't want to jump through the hoops to install 10.12 on my old Mac; 10.11 works for me.

I just had a meltdown with version x41p_zi3t1e while looking around nVidia's site. I noticed the machine sounded different, and when I looked at BOINC I saw it had just Trashed EVERY GPU task on the machine. Around 288! WTH? It hadn't reported anything, so I shut it down before it could. The best I can figure, it ran into a horde of Shorties that triggered an Error with the Autotune.
Cuda error 'cufftPlan1d(&fft_analysis_plans[FftNum][0], FftLen, CUFFT_C2C, NumDataPoints / FftLen)' in file 'cuda/cudaAcc_fft.cu' in line 29 : invalid argument.
About 288 of those.

Since they were ALL errors I decided to reset them and see if they would be accepted. So far, it looks as though they are validating instead of being Immediately labeled as Invalid. Of course...I went back to zi3k+. I don't need any more Errors anytime soon. I have No idea why that Error was triggered with zi3t1e, but it was pretty nasty.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1859923 - Posted: 7 Apr 2017, 2:22:32 UTC - in response to Message 1859905.  
Last modified: 7 Apr 2017, 2:23:41 UTC

cuFFT failures like that have a lot of possible causes that only some real monitoring can narrow down. Fingers crossed that NVML support (as used by the nvidia-smi command) will work under the new Pascal drivers when they materialise. If so, I'll just whip up a lightweight (cross-platform) standalone monitoring utility and put some basic monitoring in the apps. Some basic failures, due to bugs or otherwise, may be caught with the kinds of checks Petri and I discussed in the Linux thread; however, there is still likely to be some juggling across all platforms now. That's because Vulkan, and Vulkan over Metal, are starting to show their teeth in places, and, supposedly (rumoured), Apple wants VR support, which demands high-end GPUs.
petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1859965 - Posted: 7 Apr 2017, 5:31:50 UTC

Hi,
Autotune has no memory allocation check right now. If you run low on GPU RAM, you get those errors. That is on my TODO list: make the app recognise a too-big unroll (or autotuned value) at task startup.
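A startup check along the lines Petri describes might look like the minimal C sketch below. It is not Petri's code: the function name clamp_unroll and the assumption that memory use grows roughly linearly with unroll (fixed_bytes plus bytes_per_unroll for each unroll step) are illustrative only. In a real CUDA app the free_bytes value could come from cudaMemGetInfo(&free, &total), called before the buffers and cuFFT plans are created.

```c
#include <stddef.h>

/* Hypothetical startup check: reduce a requested (or autotuned) unroll
   so that its estimated memory footprint fits in the free GPU memory.
   Assumed model: total = fixed_bytes + unroll * bytes_per_unroll. */
unsigned clamp_unroll(size_t free_bytes, size_t fixed_bytes,
                      size_t bytes_per_unroll, unsigned requested)
{
    size_t fit;

    if (requested == 0)
        requested = 1;                 /* unroll is at least 1 */
    if (bytes_per_unroll == 0)
        return requested;              /* nothing scales with unroll */
    if (free_bytes <= fixed_bytes)
        return 1;                      /* even the base does not fit; try the minimum */

    fit = (free_bytes - fixed_bytes) / bytes_per_unroll;
    if (fit < 1)
        fit = 1;
    return (fit < requested) ? (unsigned)fit : requested;
}
```

When even the fixed part does not fit, the sketch falls back to an unroll of 1 and leaves it to the allocation calls to report the failure.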

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1861005 - Posted: 12 Apr 2017, 9:57:38 UTC

Has anyone tried the New Pascal driver with the New CUDA driver? The Pascal driver came out early yesterday for 10.12.4. I jumped through the Hoops to get 10.12.4 installed and then found the CUDA driver 8.0.71 didn't work. Later they released 8.0.81, but it appears to be trash. With the 9 series cards, CUDA 8.0.81 in 10.12.4 is around 30% slower than 8.0.71 in 10.11.6. With CUDA 8.0.81 in 10.11.6 I ended up quitting because it was so slow. So, it's back to 10.11.6 with driver 8.0.71 for now. I never even tried the 1050s in the Mac; there's no sense swapping cards if the CUDA driver doesn't work. The Graphics driver seems to work fine; it's just the CUDA driver that is really slow. I did find you can't use a DVI-VGA adapter with the new graphics driver; it's either HDMI or DVI with the new driver.
https://forums.macrumors.com/threads/webdriver-for-gtx-1080-1070.1979778/page-12
Zalster (Special Project $250 donor)
Volunteer tester
Joined: 27 May 99
Posts: 5517
Credit: 528,817,460
RAC: 242
United States
Message 1861024 - Posted: 12 Apr 2017, 13:00:56 UTC - in response to Message 1861005.  
Last modified: 12 Apr 2017, 13:03:53 UTC

I'm still running with my old laptop.

MacBook Pro Late 2013 15 inch
Processor 2.6 GHz Intel Core i7
16 GB 1600 MHz DDR3
Graphics Nvidia GeForce GT 750M 2048 MB
Intel Iris Pro 1536 MB

Mac OS v 10.12.4(16E195)

Cuda Driver Version 8.0.81
GPU Driver Version 10.16.34 355.10.05.35f05

Nvidia Web Driver 378.05.05.05f01

One thing I noticed: don't use the Nvidia Web Driver for the display while using BOINC at the same time. My webpages immediately went blank and I was forced to hard quit and reboot the system. I had to change the default display to the OS X default graphics driver to display webpages and crunch at the same time. Sorry, I haven't really bothered to check the times. I have noticed that if BOINC is selected as the screen saver, the system locks up once the screen saver starts (or is supposed to start). So I've disabled BOINC as the default screen saver and only run it when I remember.
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1861044 - Posted: 12 Apr 2017, 14:58:24 UTC - in response to Message 1861024.  
Last modified: 12 Apr 2017, 15:52:02 UTC

Looking at the times on that GPU, they don't seem much different with the 8.0.81 driver. I still don't know why the laptops have trouble finding the NV GPU. Since you're not using the iGPU, can you try adding a line to cc_config.xml telling BOINC to ignore the iGPU? It goes inside the <options> section and would look like this:
<ignore_intel_dev>0</ignore_intel_dev>
Maybe that will solve the 'no CUDA-capable device is detected.' problem.
I just ran the CUDA 75 App benchmark in 10.11.6 with 8.0.71 and it looks normal on my machine (I don't have an iGPU):
10:29:06 (43008): Can't set up shared mem: -1. Will run in standalone mode.
v8 task detected
setiathome_CUDA: Found 3 CUDA device(s):
  Device 1: GeForce GTX 960, 2047 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 8 
     pciBusID = 2, pciSlotID = 0
  Device 2: GeForce GTX 950, 2047 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 6 
     pciBusID = 1, pciSlotID = 0
  Device 3: GeForce GTX 950, 2047 MiB, regsPerBlock 65536
     computeCap 5.2, multiProcs 6 
     pciBusID = 5, pciSlotID = 0
setiathome_CUDA: No device specified, determined to use CUDA device 1: GeForce GTX 960
SETI@home using CUDA accelerated device GeForce GTX 960

setiathome enhanced x41zi (baseline v8), Cuda 7.50

setiathome_v8 task detected
Detected Autocorrelations as enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is :  0.215592
re-using dev_GaussFitResults array for dev_AutoCorrIn, 4194304 bytes
re-using dev_GaussFitResults+524288x8 array for dev_AutoCorrOut, 4194304 bytes
Thread call stack limit is: 1k
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
cudaAcc_free() DONE.

Flopcounter: 59550947364203.703125

Spike count:    7
Autocorr count: 0
Pulse count:    0
Triplet count:  0
Gaussian count: 0
10:44:39 (43008): called boinc_finish(0)

I'll boot to 10.12.4 after it's finished with the benchmark and try the same WUs there with 8.0.81. I'll also check the Browser while the App is running, yesterday there wasn't any problem on my machine while running the Special cuda App.

Everything looks normal in 10.11.6 with cuda 8.0.71;
Running on TomsMacPro.local at Wed Apr 12 14:22:18 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18dc09ah.26284.16432.6.33.125.wu 30oc08ae.27779.7839.4.31.25.wu blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu

Listing executable(s) in /APPS :
setiathome_8.11_x86_64-apple-darwin__cuda75_mac

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
      407.15 real        85.59 user        73.66 sys
Elapsed Time : ……………………………… 407 seconds
Speed compared to default : 864 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: 30oc08ae.27779.7839.4.31.25.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8462 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
      936.01 real       111.58 user        92.52 sys
Elapsed Time : ……………………………… 936 seconds
Speed compared to default : 904 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.77%
---------------------------------------------------
Done with 30oc08ae.27779.7839.4.31.25.wu.
Current WU: blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 4797 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
     1331.45 real        83.22 user        71.03 sys
Elapsed Time : ……………………………… 1331 seconds
Speed compared to default : 360 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.37%
---------------------------------------------------
Done with blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu.

Done with Benchmark run! Removing temporary files!
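The "Speed compared to default" lines appear to be the reference app's elapsed time divided by the tested app's elapsed time, times 100, truncated to a whole number. That rule is inferred from the figures in these logs, not taken from the benchmark script, and the function name below is hypothetical.

```c
/* Hypothetical: speed of the tested app relative to the reference
   (default) app, as a truncated percentage. Inputs are the whole-second
   "Elapsed Time" values printed by the benchmark. */
unsigned speed_vs_default_pct(unsigned long ref_seconds,
                              unsigned long app_seconds)
{
    if (app_seconds == 0)
        return 0;                      /* avoid division by zero */
    /* Integer division truncates, e.g. 351700 / 407 = 864. */
    return (unsigned)((100UL * ref_seconds) / app_seconds);
}
```

Truncation rather than rounding shows up in the later OpenCL run on 30oc08ae.27779.7839.4.31.25.wu: 8462 s against 1022 s is about 827.98%, and the log prints 827%.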


++++++++++++++++++++++++++

Well, I see the same problem with the Baseline App with 8.0.81 as with the Special App: the new CUDA driver uses Twice as Much CPU and, in addition, Twice as much Memory. It seems to be taking Much longer as well. I don't see any Browser problems with Safari or Firefox, though.

WU: 18dc09ah.26284.16432.6.33.125.wu has been running 28 minutes and it still isn't finished.....Bad. I wonder how long it will be before nVidia releases a New CUDA driver for 10.12.4.
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1861071 - Posted: 12 Apr 2017, 17:01:28 UTC

So, I stand by my original comment. The New CUDA driver 8.0.81 released yesterday is trash if used with Graphics driver 10.17.34 378.05.05.05f01 in 10.12.4.
The task 18dc09ah.26284.16432.6.33.125.wu finally finished after 45 minutes running the Stock CUDA75 App, it only took around 7 minutes in 10.11.6 with driver 8.0.71.
The Stock OpenCL App seems to be working normally in my 10.12.4 system, it's a little slower than CUDA on the Arecibos and a bit faster on the BLCs;
Running on TomsMacPro.local at Wed Apr 12 15:19:44 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18dc09ah.26284.16432.6.33.125.wu 30oc08ae.27779.7839.4.31.25.wu blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu

Listing executable(s) in /APPS :
setiathome_8.11_x86_64-apple-darwin__cuda75_mac

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac
     2701.69 real       256.77 user      2156.47 sys
Elapsed Time : ……………………………… 2702 seconds
Speed compared to default : 130 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Done with Benchmark run! Removing temporary files!

Running on TomsMacPro.local at Wed Apr 12 16:07:08 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18dc09ah.26284.16432.6.33.125.wu 30oc08ae.27779.7839.4.31.25.wu blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu

Listing executable(s) in /APPS :
setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac -sbs 256 -period_iterations_num 10
      429.37 real        46.42 user       125.92 sys
Elapsed Time : ……………………………… 430 seconds
Speed compared to default : 817 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.48%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: 30oc08ae.27779.7839.4.31.25.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8462 seconds
---------------------------------------------------
Running app with command : setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac -sbs 256 -period_iterations_num 10
     1021.87 real       133.40 user       235.57 sys
Elapsed Time : ……………………………… 1022 seconds
Speed compared to default : 827 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      5      7      7      0        0      5      7      7      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      0        0      0      0      0      0
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      0      1      1      0        0      0      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      8     11     11      0        1      8     11     11      0

Result      : Strongly similar,  Q= 97.23%
---------------------------------------------------
Done with 30oc08ae.27779.7839.4.31.25.wu.
Current WU: blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 4797 seconds
---------------------------------------------------
Running app with command : setiathome_8.19_x86_64-apple-darwin__opencl_nvidia_mac -sbs 256 -period_iterations_num 10
      778.23 real        71.34 user       211.07 sys
Elapsed Time : ……………………………… 778 seconds
Speed compared to default : 616 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.89%
---------------------------------------------------
Done with blc2_2bit_guppi_57403_HIP11048_0006.17091.831.22.45.71.wu.

Done with Benchmark run! Removing temporary files!

We'll see how it goes with the Next CUDA driver. If you're running the Apple Graphics driver you should be able to Manually Reinstall the Older CUDA driver 8.0.71 by just running the installer from here;
http://www.nvidia.com/object/macosx-cuda-8.0.71-driver.html
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1861477 - Posted: 14 Apr 2017, 18:15:38 UTC

Still nothing from nVidia about CUDA Driver 8.0.81, https://forums.geforce.com/default/topic/1002724/geforce-apple-gpus/pascal-based-graphics-card-drivers-for-macos/post/5127351/#5127351
One can only hope someone at nVidia is running SETI@Home on their test Mac....

In other news it appears the latest App x41p_zi3t2b has solved the problem with False Overflows on the Low Angle Range Arecibo tasks;
Listing executable(s) in /APPS :
setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 30oc08ae.27779.7839.4.31.25.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8462 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
      393.03 real        51.21 user        26.88 sys
Elapsed Time : ……………………………… 393 seconds
Speed compared to default : 2153 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.76%
---------------------------------------------------
Done with 30oc08ae.27779.7839.4.31.25.wu.
Current WU: 31oc08ad.28198.24203.5.32.12.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 10974 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
      470.36 real        73.58 user        47.05 sys
Elapsed Time : ……………………………… 470 seconds
Speed compared to default : 2334 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.87%
---------------------------------------------------
Done with 31oc08ad.28198.24203.5.32.12.wu.

Done with Benchmark run! Removing temporary files!

Unfortunately it still has the PulseFind problem;
Current WU: 16fe08aa.12502.25021.6.33.13.wu
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
      148.69 real        21.88 user        15.76 sys
Elapsed Time : ……………………………… 148 seconds
Speed compared to default : 2408 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      0        0      0      0      0      0
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      3      3      3      1        1      3      3      3      1

Unmatched signal(s) in R1 at line(s) 396
Unmatched signal(s) in R2 at line(s) 396
For R1:R2 matched signals only, Q= 99.98%
Result      : Weakly similar.
---------------------------------------------------
Done with 16fe08aa.12502.25021.6.33.13.wu.
Current WU: blc13_2bit_guppi_57824_79834_HIP22449_0042.23456.818.23.46.39.vlar.wu
---------------------------------------------------
Running app with command : setiathome_x41p_zi3t2b_x86_64-apple-darwin_cuda80 -bs -unroll autotune -device 0
unroll limits: min = 1, max = 256. Using unroll autotune.
      502.96 real        60.15 user        25.14 sys
Elapsed Time : ……………………………… 503 seconds
Speed compared to default : 1488 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      5      5      5      0        0      5      5      5      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     13     13     13      2        0     13     13     13      2
      Triplet      0      3      3      3      0        0      3      3      3      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     26     26     26      2        1     26     26     26      2

Unmatched signal(s) in R1 at line(s) 404 500
Unmatched signal(s) in R2 at line(s) 404 489
For R1:R2 matched signals only, Q= 99.22%
Result      : Weakly similar.
---------------------------------------------------
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1863677 - Posted: 26 Apr 2017, 6:32:53 UTC

Still Nothing from nVidia on the CUDA 8.0.81 driver. However, the Maxwell GPUs work fine with the Original CUDA 8.0.71 Driver and the Original 10.12.4 Webdriver here: https://images.nvidia.com/mac/pkg/367/WebDriver-367.15.10.45f01.pkg
So, we will just have to keep using the Maxwell GPUs for now. I just posted the New Mac CUDA Special App; it works fine on my Mac: http://www.arkayn.us/forum/index.php?topic=191.msg4411#msg4411
It's a nice improvement over the last posted Mac App, with much faster results on the BLC tasks.
Have fun.
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1863756 - Posted: 26 Apr 2017, 14:48:02 UTC - in response to Message 1863677.  

If they ever get these drivers working well, I plan on giving an external thunderbolt enclosure with a 10x0 card a try on my 2013 Mac Pro. Don't want to do anything as simple as just adding one to my 5,1 Mac Pro lol. I'll give the new app a try this afternoon. Same command line options for unroll and such?

Thanks,

Chris
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1863788 - Posted: 26 Apr 2017, 17:47:13 UTC - in response to Message 1863756.  

There is a new command for the unroll that automatically sets each task's unroll number to the number of multiProcs. However, there are Known limits depending on the GPU's amount of vRAM. On GPUs with only One or Two GB of vRAM you will need to set the unroll number Manually (aka, the Old Fashioned Way). On GPUs with the following vRAM, set the unroll to:
1 or 2 for GPUs with One GB vRam and more than Two multiProcs
6 for GPUs with Two GB vRam and more than Six multiProcs
GPUs with 3 GBs of vRam haven't been tested but should work using the -unroll autotune command. 2 GB GPUs with 5 or 6 multiProcs can use -unroll autotune.
If you receive the "cufftPlan1d..." Error you should manually reduce the unroll number. It seems Sierra uses slightly more vRam than previous systems.
The settings are covered in the Settings_Libraries_and_Drivers.txt file in the docs folder.
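The rules above amount to a cap on the unroll that depends on vRAM, with autotune otherwise using the multiProc count. The C sketch below only restates that list, with a hypothetical function name. It picks 2 (the upper end of "1 or 2") for 1 GB cards, and it assumes that cases the list does not mention (for example a 2 GB card with 4 multiProcs) simply use the autotuned value. vRAM is taken in MiB as the app reports it, so a card listed as 2047 MiB counts as a 2 GB card.

```c
/* Hypothetical restatement of the suggested -unroll settings.
   vram_mib:   vRAM as reported by the app (e.g. "2047 MiB")
   multiprocs: the "multiProcs" value from the device listing
   Returns the suggested unroll; autotune alone would use multiprocs. */
unsigned suggested_unroll(unsigned vram_mib, unsigned multiprocs)
{
    unsigned cap;

    if (vram_mib <= 1024)
        cap = 2;            /* 1 GB: use 1 or 2; this sketch picks 2 */
    else if (vram_mib <= 2048)
        cap = 6;            /* 2 GB: at most 6 */
    else
        return multiprocs;  /* 3 GB and up: -unroll autotune (3 GB untested) */

    return (multiprocs < cap) ? multiprocs : cap;
}
```

For the GTX 960 listed earlier in the thread (2047 MiB, 8 multiProcs) this gives 6. If the "cufftPlan1d..." error still appears, especially under Sierra, the value would be lowered by hand as described above.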


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.