I've Built a Couple OSX CUDA Apps...

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1887768 - Posted: 3 Sep 2017, 19:13:13 UTC - in response to Message 1887723.  
Last modified: 3 Sep 2017, 19:29:30 UTC

There may be Hope! After months of new CUDA drivers that basically didn't work with the Special App, along comes the CUDA 9.0 Toolkit with CUDA Driver 9.0.103, which appears to Work.
I first tried it with the GTX 9xx GPUs in OSX 12.4 with the 367 driver and it worked OK, so I then tried it with the 378 driver without any trouble. Swapped out cards to the 10 series and it still worked. Next was to update to OSX 12.6...it still works! At least it does with the CUDA 8.0 version of the Special App; the CUDA 7.5 version seems to have some weird Memory Hole. I wouldn't call it a leak, it's much worse. The CUDA 8 version works though, and it seems to work a little faster with all the updates.
Finally.
Coprocessors: [3] NVIDIA GeForce GTX 1050 Ti (4095MB) driver: 5899.27
Operating System: Darwin 16.7.0


I'm still trying to decide what to do with the Top card. They changed the spacing on the faceplate between the 950 & 1050 and now the trick I was using won't work. I may have to take the tin-snips to the 1050 faceplate to make it fit in the top slot...

I'm still seeing that BOINC Error as well;
Sun Sep 3 08:58:28 2017 | | Starting BOINC client version 7.6.34 for x86_64-apple-darwin
Sun Sep 3 08:58:28 2017 | | Data directory: /Volumes/Mov1/BOINC/Yosemite/BOINC Data
Sun Sep 3 08:58:29 2017 | | NVIDIA GPU 1: GeForce GTX 1050 Ti cannot be used for CUDA or OpenCL computation with CUDA driver 6.5 or later
Sun Sep 3 08:58:29 2017 | | NVIDIA GPU 2: GeForce GTX 1050 cannot be used for CUDA or OpenCL computation with CUDA driver 6.5 or later
Sun Sep 3 08:58:29 2017 | | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 9.0.103, CUDA version 9.0, compute capability 6.1, 4096MB, 3465MB available, 2255 GFLOPS peak)
Sun Sep 3 08:58:29 2017 | | CUDA: NVIDIA GPU 1: GeForce GTX 1050 (driver version 9.0.103, CUDA version 9.0, compute capability 6.1, 2048MB, 1992MB available, 1976 GFLOPS peak)
Sun Sep 3 08:58:29 2017 | | CUDA: NVIDIA GPU 2: GeForce GTX 950 (driver version 9.0.103, CUDA version 9.0, compute capability 5.2, 2048MB, 1999MB available, 2022 GFLOPS peak)
Sun Sep 3 08:58:29 2017 | | OpenCL: NVIDIA GPU 1: GeForce GTX 1050 Ti (driver version 10.18.5 378.05.05.25f01, device version OpenCL 1.2, 4096MB, 4096MB available, 729 GFLOPS peak)
Sun Sep 3 08:58:29 2017 | | OpenCL: NVIDIA GPU 2: GeForce GTX 950 (driver version 10.18.5 378.05.05.25f01, device version OpenCL 1.2, 2048MB, 1999MB available, 2022 GFLOPS peak)
Sun Sep 3 08:58:29 2017 | | OpenCL: NVIDIA GPU 2: GeForce GTX 1050 (driver version 10.18.5 378.05.05.25f01, device version OpenCL 1.2, 2048MB, 2048MB available, 619 GFLOPS peak)
Sun Sep 3 08:58:29 2017 | | OpenCL CPU: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
Sun Sep 3 08:58:34 2017 | | OS: Mac OS X 10.12.6 (Darwin 16.7.0)
Sun Sep 3 08:58:34 2017 | | Local time is UTC -4 hours

The CUDA driver has the cards in the correct order, the OpenCL part is wrong, and the 10 series are Not Pre-Fermi cards!
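For anyone who wants to check the device ordering without reading the log by eye, here is a minimal sketch in Python. It assumes the startup lines keep the shape shown above ("CUDA: NVIDIA GPU n: name (driver version ..."). The names DEVICE_LINE, devices_by_api and duplicated_indices are made up for this example and are not part of BOINC, and the sample lines are shortened copies of the log above.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the log above:
# "... | | CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 9.0.103, ..."
DEVICE_LINE = re.compile(r"(CUDA|OpenCL): NVIDIA GPU (\d+): (.+?) \(driver version")


def devices_by_api(log_text):
    """Return {api: [(index, name), ...]} in the order the lines appear."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            api, index, name = match.groups()
            found[api].append((int(index), name))
    return dict(found)


def duplicated_indices(devices):
    """Return the sorted GPU indices that are reported more than once."""
    seen, dupes = set(), set()
    for index, _name in devices:
        if index in seen:
            dupes.add(index)
        seen.add(index)
    return sorted(dupes)


# Shortened copies of the startup lines posted above.
SAMPLE_LOG = """\
CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 9.0.103, CUDA version 9.0)
CUDA: NVIDIA GPU 1: GeForce GTX 1050 (driver version 9.0.103, CUDA version 9.0)
CUDA: NVIDIA GPU 2: GeForce GTX 950 (driver version 9.0.103, CUDA version 9.0)
OpenCL: NVIDIA GPU 1: GeForce GTX 1050 Ti (driver version 10.18.5 378.05.05.25f01)
OpenCL: NVIDIA GPU 2: GeForce GTX 950 (driver version 10.18.5 378.05.05.25f01)
OpenCL: NVIDIA GPU 2: GeForce GTX 1050 (driver version 10.18.5 378.05.05.25f01)
OpenCL CPU: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz (OpenCL driver vendor: Apple)
"""

devices = devices_by_api(SAMPLE_LOG)
for api, entries in devices.items():
    print(api, entries, "duplicate indices:", duplicated_indices(entries))
```

On the sample lines this reports index 2 twice on the OpenCL side and no GPU 0, which is the mismatch described above.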


Would you like to have some updated source code so you will not receive so many warnings with the CUDA 9 compiler and get a huge 2% performance boost at the same time?
Look at your mailbox in a couple of minutes. :)
It may have its version labels mixed up, but you can begin testing. It will be called zi3x.

EDIT: The same source code has been sent to you, Gianfranco and W3Perl. It needs the CUDA 9 development environment, available from NVIDIA.
EDIT2: The source code has been sent to Jason too.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1887768
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1887879 - Posted: 4 Sep 2017, 15:19:48 UTC - in response to Message 1887768.  

Thanks Petri, I have the zi3x for CUDA 9. I don't think that's going to work in OSX as the CUDA 9 Toolkit only works with Sierra and the XCode for Sierra won't compile the App. For now the highest you can go is the CUDA 8 Toolkit in El Capitan with the older XCode. The CUDA 8 App compiled back in June is working nicely with the New CUDA driver in Sierra, that will work for now. When I get a chance I'll try it in Ubuntu, right now I have some family matters to deal with. It might be a while.
ID: 1887879
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1889305 - Posted: 11 Sep 2017, 19:47:15 UTC - in response to Message 1887879.  

TBar,

Have you tried the Xcode 9 beta? Not sure if it requires High Sierra or not. I can say I haven't seen any issues with High Sierra and the 3610 ATI build of yours I've been using. I haven't tried loading High Sierra on my 5,1 Mac Pro yet, though. I try to break only one computer in the house at a time. =)

Chris
ID: 1889305
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1889507 - Posted: 13 Sep 2017, 13:18:16 UTC - in response to Message 1889305.  

I think the problem is with Petri's code, any XCode higher than 6.2 doesn't like the asm lines. XCode 6.2 Doesn't work in Sierra, it works fine in El Capitan. I had to jump through hoops to run Sierra, so far I haven't seen any hoops for High Sierra on my Old Mac Pro. Sierra seems to be working fine with the 1050s since I replaced the top card, now it lists the cards correctly;
12-Sep-2017 08:19:57 [---] Starting BOINC client version 7.6.34 for x86_64-apple-darwin
12-Sep-2017 08:19:58 [---] CUDA: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 9.0.103, CUDA version 9.0, compute capability 6.1, 4096MB, 3370MB available, 2255 GFLOPS peak)
12-Sep-2017 08:19:58 [---] CUDA: NVIDIA GPU 1: GeForce GTX 1050 (driver version 9.0.103, CUDA version 9.0, compute capability 6.1, 2048MB, 1992MB available, 1976 GFLOPS peak)
12-Sep-2017 08:19:58 [---] CUDA: NVIDIA GPU 2: GeForce GTX 1050 (driver version 9.0.103, CUDA version 9.0, compute capability 6.1, 2048MB, 1992MB available, 1960 GFLOPS peak)
12-Sep-2017 08:19:58 [---] OpenCL: NVIDIA GPU 0: GeForce GTX 1050 Ti (driver version 10.18.5 378.05.05.25f01, device version OpenCL 1.2, 4096MB, 3370MB available, 2255 GFLOPS peak)
12-Sep-2017 08:19:58 [---] OpenCL: NVIDIA GPU 1: GeForce GTX 1050 (driver version 10.18.5 378.05.05.25f01, device version OpenCL 1.2, 2048MB, 1992MB available, 1976 GFLOPS peak)
12-Sep-2017 08:19:58 [---] OpenCL: NVIDIA GPU 2: GeForce GTX 1050 (driver version 10.18.5 378.05.05.25f01, device version OpenCL 1.2, 2048MB, 1992MB available, 1960 GFLOPS peak)
12-Sep-2017 08:19:58 [---] OpenCL CPU: Intel(R) Xeon(R) CPU E5472 @ 3.00GHz (OpenCL driver vendor: Apple, driver version 1.1, device version OpenCL 1.2)
12-Sep-2017 08:20:03 [---] Processor: 8 GenuineIntel Intel(R) Xeon(R) CPU E5472 @ 3.00GHz [x86 Family 6 Model 23 Stepping 6]
12-Sep-2017 08:20:03 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clfsh ds acpi mmx fxsr sse sse2 ss htt tm pbe pni dtes64 mon dscpl vmx est tm2 ssse3 cx16 tpr pdcm sse4_1
12-Sep-2017 08:20:03 [---] OS: Mac OS X 10.12.6 (Darwin 16.7.0)
This machine will probably be running this system and GPUs for the duration.
ID: 1889507
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1900637 - Posted: 12 Nov 2017, 2:39:49 UTC
Last modified: 12 Nov 2017, 2:47:29 UTC

Good News on the Mac CUDA Front. It appears the drivers have settled down to where All zi3v Special versions work in Sierra 12.6 with the latest CUDA 9 driver. The only version that doesn't work correctly is the 7.5 version still posted at Crunchers Anonymous based on the older zi3t2b; that version still needs OS versions lower than 12.4 and the CUDA 8 driver. I'm still trying to pick a winner. So far the zi3v cuda 9 version is out in front due to the Low number of Inconclusives and lack of Invalid Overflows, probably matched with the new zi3v cuda 7.5 version for the older systems. Some recent results run in Sierra with the current CUDA Driver;

Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda65 -device 0
       95.20 real        21.04 user        14.07 sys
Elapsed Time : ……………………………… 96 seconds
Speed compared to default : 3663 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda75 -device 0
       90.15 real        23.05 user        15.05 sys
Elapsed Time : ……………………………… 90 seconds
Speed compared to default : 3907 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -device 0
       85.70 real        18.08 user        14.54 sys
Elapsed Time : ……………………………… 86 seconds
Speed compared to default : 4089 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda90 -device 0
       85.24 real        18.23 user        14.78 sys
Elapsed Time : ……………………………… 85 seconds
Speed compared to default : 4137 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Running app with command : setiathome_x41p_zi3xs3_x86_64-apple-darwin_cuda90 -device 0
       74.11 real        16.75 user        11.71 sys
Elapsed Time : ……………………………… 74 seconds
Speed compared to default : 4752 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.

Done with Benchmark run! Removing temporary files!
TomsMacPro:KWSN-OSX-bench-MB Tom$ ./benchmark
KWSN-Darwin-MBbench v2.1.08

zi3xs3 using the Static libraries is obviously the fastest, but, results in a continuous Invalid Overflow task in the Invalid field and slightly more Inconclusives. The CUDA 9 version of zi3v uses the autocorr.cu & gaussfit.cu files from zi3x as they appear to be a little better than the zi3v files.
The machine has been building and testing new Apps since around the 7th, which is why the RAC nosedived recently.
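The "Speed compared to default" figures above are consistent with the reference time divided by the app's elapsed time, expressed as a whole percentage and truncated. The bench script's actual formula is not shown in the thread, so treat this as an inference from the posted numbers; speed_percent is a hypothetical helper name.

```python
def speed_percent(reference_seconds, app_seconds):
    """Reference time over app time as a whole percentage, truncated.

    Assumption: truncation (not rounding) is inferred from the posted
    results, e.g. 3517 s vs 86 s is reported as 4089 %, not 4090 %.
    """
    return (100 * reference_seconds) // app_seconds


# Whole-second elapsed times taken from the benchmark output above.
REFERENCE_SECONDS = 3517
ELAPSED_SECONDS = {
    "zi3v cuda65": 96,
    "zi3v cuda75": 90,
    "zi3v cuda80": 86,
    "zi3v cuda90": 85,
    "zi3xs3 cuda90": 74,
}

for app, seconds in ELAPSED_SECONDS.items():
    print(f"{app}: {speed_percent(REFERENCE_SECONDS, seconds)} %")
```

Because the calculation uses the whole-second "Elapsed Time" rather than the "real" time, small differences between runs can disappear or look larger than they are.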
ID: 1900637
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1901052 - Posted: 14 Nov 2017, 22:42:59 UTC - in response to Message 1900637.  

Have you tried it with High Sierra yet? Debating on whether to update my MacPro 5,1 to Sierra or High Sierra once I get back from the holidays in a couple weeks...

Thanks,

Chris
ID: 1901052
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1901096 - Posted: 15 Nov 2017, 2:20:50 UTC - in response to Message 1901052.  

My Mac doesn't run High Sierra, it's not supposed to run Sierra either...but does. I don't see why the App wouldn't run in High Sierra, there shouldn't be any trouble with the App. Speaking of Apps, I was trying to show people how well the ATI App r3610 ran in HS and noticed your one machine is having trouble with what appears to be the Target kernel sequence timer. It might help if you could fix that, all those Errors don't go over very well. I have a new ATI App but was going to wait and see what happens with r3610 before posting it.
ID: 1901096
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1901104 - Posted: 15 Nov 2017, 3:04:09 UTC - in response to Message 1901096.  

Oops, thanks for the heads up. I’ve been exceedingly busy lately and haven’t even checked on any of my computers. I’ll go give it a reset right now and tweak the timer.

Thanks,

Chris
ID: 1901104
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1901187 - Posted: 15 Nov 2017, 15:12:52 UTC - in response to Message 1901104.  

Actually that D700 computer has some quirky hardware issues; Apple never has been able to run them down though. It's prone to randomly lock up for no good reason, running seti or not. If you want another comparison, my D500 machine only runs around 10% slower (despite significantly fewer CUs) and never has issues with your 3610 app. Both machines are running 3 at a time, which seems to be the sweet spot, with almost identical command line tweaks, the timer being one difference. SMS has to be comparatively small since Apple doesn't expose very much memory to OpenCL relative to how much memory is onboard, particularly on the D700...

https://setiathome.berkeley.edu/results.php?hostid=8243589
ID: 1901187
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1901206 - Posted: 15 Nov 2017, 17:56:09 UTC - in response to Message 1901187.  

I had a Mac like that back in the late 90s. After a couple years it was found to be caused by the memory even though Apple said they had tested the memory a couple of times and said it was good.
Do you feel like testing the new CUDA 7.5 App? I'd have to swap out the cards to test it in anything below Sierra. These Pascal cards are a pain since they only work in Sierra and above. The new driver in Sierra is nice though, before that they had been stuck on 346 for a couple of years. That's really the only reason to run Sierra, the new nVidia drivers for the newer GPUs.
ID: 1901206
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 251
Credit: 434,772,072
RAC: 236
United States
Message 1901329 - Posted: 16 Nov 2017, 11:44:19 UTC - in response to Message 1901206.  

Sure, I’ll be back in town on the 28th and I’ll get the machine updated to Sierra and give the app a shot.

Thanks,

Chris
ID: 1901329
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1901361 - Posted: 16 Nov 2017, 17:23:32 UTC - in response to Message 1901329.  

Actually, I've already tested it in Sierra since it's the only OS I can presently run with the Pascal cards. What I need is someone with Maxwell GPUs to test the 7.5 App in Yosemite and El Capitan. Two weeks is a little longer than I had in mind. I suppose I could just forget about the older one and post the one for Sierra and above.
ID: 1901361
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1901752 - Posted: 18 Nov 2017, 22:30:50 UTC

Another bit of BOINC strangeness. For some reason zi3v will cause BOINC to throw an Error after the task is finished when running in Yosemite. The same zi3v works fine in El Capitan. I suppose there will need to be two different Apps for Yosemite and El Capitan, Yosemite will have to keep running zi3t2b.

Spike count:    0
Autocorr count: 0
Pulse count:    1
Triplet count:  3
Gaussian count: 0
cudaAcc_free() called...
cudaAcc_free() running...
cudaAcc_free() PulseFind freed...
cudaAcc_free() Gaussfit freed...
cudaAcc_free() AutoCorrelation freed...
1,2,3,4,5,6,7,8,9,10,10,11,12,13
cudaAcc_free() DONE.

SIGSEGV: segmentation violation

Crashed executable name: setiathome_x41p_zi3v_x86_64-apple-darwin_cuda75
Machine type Intel 80486 (64-bit executable)
System version: Macintosh OS 10.10.5 build 14F2511
Sat Nov 18 12:30:38 2017

0   setiathome_x41p_zi3v_x86_64-apple-darwin_cuda75 0x000000010ea5077d std::pair<int, PROCINFO>::pair(int const&, PROCINFO const&) + 1197
1   setiathome_x41p_zi3v_x86_64-apple-darwin_cuda75 0x000000010ea41046 COPROCS::clear() + 3174
2   libsystem_platform.dylib            0x00007fff92fbcf1a _sigtramp + 26
3   ???                                 0x0000000000000000  0
4   setiathome_x41p_zi3v_x86_64-apple-darwin_cuda75 0x000000010e904e31 void v_pfsubTranspose<8>(float*, float*, int, int) + 12465
5   setiathome_x41p_zi3v_x86_64-apple-darwin_cuda75 0x000000010e914062 std::vector<unsigned char, std::allocator<unsigned char> >::_M_insert_aux(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, std::allocator<unsigned char> > >, unsigned char const&) + 20754
6   setiathome_x41p_zi3v_x86_64-apple-darwin_cuda75 0x000000010e91994a SETI_WU_INFO::~SETI_WU_INFO() + 5754
etc...etc...

There is still a problem with any CUDA 8 driver higher than 8.0.71. The cuda 8 drivers higher than 8.0.71 are about 30% slower and use more memory, so far the cuda 9 drivers work fine.
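If you want to flag an affected driver from the version string BOINC prints at startup, a minimal sketch could look like the following. The 8.0.71 cut-off comes from the paragraph above; the helper names and the example version strings other than 8.0.71 and 9.0.103 are hypothetical. The comparison is done on integer tuples because comparing the raw strings would sort "8.0.9" after "8.0.71".

```python
# Last CUDA 8 driver reported above as working at full speed.
LAST_GOOD_CUDA8 = (8, 0, 71)


def parse_version(text):
    """Turn a dotted version such as '9.0.103' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_slow_cuda8_driver(version_text):
    """True for CUDA 8 drivers newer than 8.0.71; CUDA 9 drivers are not flagged."""
    version = parse_version(version_text)
    return version[0] == 8 and version > LAST_GOOD_CUDA8


# "8.0.90" and "8.0.9" are made-up inputs used only to exercise the check.
for candidate in ("8.0.71", "8.0.90", "8.0.9", "9.0.103"):
    print(candidate, is_slow_cuda8_driver(candidate))
```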
ID: 1901752
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1901905 - Posted: 19 Nov 2017, 18:44:25 UTC

Here you go, Updated CUDA Apps for Mac OS. I'm afraid you will need a different App for Yosemite, El Capitan, and Sierra for best use. Most will be interested in the CUDA 9 App for the Pascal GPUs. Instructions are in the Docs folder of the Download; SETIv8 OSX CUDA Apps
ID: 1901905
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1901944 - Posted: 19 Nov 2017, 23:22:13 UTC

Here are the new Mac OpenCL Apps also posted at Crunchers Anonymous. All Apps are based on AKv8 r3710. The Intel App is the same as the last try, it is the Exact same App as the nVidia App with just a different name & number. Since I don't have an Intel iGPU, it is also completely untested on an Intel GPU, it does work on an nVidia GPU though. If the Intel App works just a Little better than the Current App on Main Please report it as the App on Main is in dire need of replacement. The new versions are;

* ATi5r3710&CPU-AVX
* ATi5r3710&CPU-AVX2
* nVidia_r3709&CPUr3711
* Intel_r3708&CPUr3711
* SSE41_CPUr3711

Here: http://www.arkayn.us/forum/index.php?topic=191.msg4368#msg4368
They work on My Mac...except the AVX apps
ID: 1901944
Tom Rinehart
Volunteer tester
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1902158 - Posted: 21 Nov 2017, 4:03:37 UTC - in response to Message 1901944.  

I have been running ATi5r3710 and SSE41_CPUr3711 apps on my Late 2009 27" iMac (upgraded with a ATI Radeon HD 5750 1024 MB card) for the last two days. They seem to work well. https://setiathome.berkeley.edu/show_host_detail.php?hostid=7560405

I just started running the SSE41_CPUr3711 app on my Mid-2010 21.5" iMac. https://setiathome.berkeley.edu/show_host_detail.php?hostid=7737022

I haven't run them through Bench on these machines. What have your results been like? I don't see that you posted any comparisons. From a quick look at the results, SSE41_CPUr3711 seems similar to SSE41_CPUr3344, which I have been running for a long time. ATi5r3710 seems faster than ATi5r3610, but I'd have to run Bench to verify. Maybe I will try that in the next few days.
ID: 1902158
TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1902216 - Posted: 21 Nov 2017, 10:33:57 UTC - in response to Message 1902158.  

Well, on my Mac, the ATi 3710 App is pretty close to the r3610 App, that doesn't mean it won't be different on other machines. On my old Core2 Xeons the CPU r3711 is quite a bit faster. One result with RWU r3215 showed it at 118% as compared to r3344. I just ran another using a normal WU;
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : MBv8_8.22r3711_sse41_x86_64-apple-darwin
     3104.09 real      3092.26 user         9.70 sys
Elapsed Time : ……………………………… 3105 seconds
Speed compared to default : 113 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.99%
---------------------------------------------------

I've been calling it 15% faster, it may be different on other machines. On mine the VLARs are a bit faster with r3711, anything helps on these old CPUs.
ID: 1902216
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1902227 - Posted: 21 Nov 2017, 11:14:44 UTC - in response to Message 1902216.  

And as I was trying to explain in another thread, the speedup hardly came from revision changes.
Maybe the OS X compiler has become better since last time? Or the linker made slightly different choices on code placement, resulting in less cache thrashing... but no optimizations were done recently at the source level.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1902227
Tom Rinehart
Volunteer tester
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1902302 - Posted: 22 Nov 2017, 7:47:36 UTC - in response to Message 1902227.  

Maybe the OS X compiler has become better since last time? Or the linker made slightly different choices on code placement, resulting in less cache thrashing... but no optimizations were done recently at the source level.


When I compiled the ARM Linux apps, just using the latest FFTW with a newer compiler made a significant improvement in the app. Then with your fixes to the assembly code it got even better. I suspect TBar's Mac apps would be faster from both the compiler and, if it is newer, the FFTW. I will get Bench set up on my Mac and give it a try.
ID: 1902302
Tom Rinehart
Volunteer tester
Joined: 12 Dec 01
Posts: 113
Credit: 13,255,975
RAC: 6
United States
Message 1902405 - Posted: 22 Nov 2017, 19:12:37 UTC - in response to Message 1902302.  

On my computer the apps are about the same speed with the reference WU:

KWSN-Darwin-MBbench v2.1.08
Running on Toms-iMac.local at Wed Nov 22 16:51:48 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
reference_work_unit_r3215.wu

Listing executable(s) in /APPS :
MBv8_8.22r3711_sse41_x86_64-apple-darwin

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: reference_work_unit_r3215.wu
---------------------------------------------------
Running default app with command : MBv8_8.05r3344_sse41_x86_64-apple-darwin -verb -st -nog
     1643.13 real      1638.84 user         2.16 sys
Elapsed Time: ………………………………… 1644 seconds
---------------------------------------------------
Running app with command : MBv8_8.22r3711_sse41_x86_64-apple-darwin -verb -st -nog
     1646.36 real      1641.12 user         2.96 sys
Elapsed Time : ……………………………… 1646 seconds
Speed compared to default : 99 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.98%
---------------------------------------------------
Done with reference_work_unit_r3215.wu.

Done with Benchmark run! Removing temporary files!
Toms-iMac:KWSN-OSX-bench-MB tom$ ./benchmark
KWSN-Darwin-MBbench v2.1.08
Running on Toms-iMac.local at Wed Nov 22 18:37:47 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
reference_work_unit_r3215.wu

Listing executable(s) in /APPS :
MBv8_8.22r3710_ati5_ssse3_x86_64-apple-darwin MBv8_8.22r3711_sse41_x86_64-apple-darwin

Listing executable in /REF_APPs :
MBv8_8.22r3610_ati5_ssse3_x86_64-apple-darwin
---------------------------------------------------
Current WU: reference_work_unit_r3215.wu
---------------------------------------------------
Running default app with command : MBv8_8.22r3610_ati5_ssse3_x86_64-apple-darwin -verb -st -nog
      772.95 real        34.06 user        42.84 sys
Elapsed Time: ………………………………… 773 seconds
---------------------------------------------------
Running app with command : MBv8_8.22r3710_ati5_ssse3_x86_64-apple-darwin -verb -st -nog
      751.35 real        34.37 user        43.54 sys
Elapsed Time : ……………………………… 752 seconds
Speed compared to default : 102 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.81%


I noticed the command lines are different for the two apps in my normal use (I used the second for the Bench run).

For MBv8_8.22r3610_ati5_ssse3_x86_64-apple-darwin it is -sbs 256 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 16
For MBv8_8.22r3710_ati5_ssse3_x86_64-apple-darwin it is -sbs 256 -oclfft_tune_gr 256 -oclfft_tune_wg 256 -spike_fft_thresh 2048 -period_iterations_num 16

That might be why the newer app seemed faster?
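A quick way to see exactly which tuning options differ between the two command lines is to pair each flag with its value and compare. This is only a sketch: it assumes every flag takes exactly one value, which holds for the two lines quoted above but not for command lines in general, and the function names are made up for the example.

```python
def parse_flags(command_line):
    """Pair each '-flag' with the single value that follows it."""
    tokens = command_line.split()
    return dict(zip(tokens[0::2], tokens[1::2]))


def differing_flags(old_line, new_line):
    """Return {flag: (old_value, new_value)} for flags whose values differ."""
    old_flags, new_flags = parse_flags(old_line), parse_flags(new_line)
    return {
        flag: (old_flags.get(flag), new_flags.get(flag))
        for flag in sorted(old_flags.keys() | new_flags.keys())
        if old_flags.get(flag) != new_flags.get(flag)
    }


# The two command lines quoted above.
R3610_LINE = "-sbs 256 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -spike_fft_thresh 2048 -period_iterations_num 16"
R3710_LINE = "-sbs 256 -oclfft_tune_gr 256 -oclfft_tune_wg 256 -spike_fft_thresh 2048 -period_iterations_num 16"

print(differing_flags(R3610_LINE, R3710_LINE))
```

For these two lines the only difference is -oclfft_tune_wg (128 versus 256), so that is the one setting to equalise before comparing the apps again.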
ID: 1902405

 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.