I've Built a Couple OSX CUDA Apps...

TBar
Volunteer tester

Message 1816568 - Posted: 12 Sep 2016, 3:22:18 UTC - in response to Message 1816487.  

Running the Baseline App 2 at a time... Now that's what I call S-L-O-W.
I just clear the Active GPU tasks from client_state.xml every time I restart; it takes about 15 seconds. Another method would be to Suspend all the non-running GPU tasks when you want to Stop, then quit once the running GPU tasks finish. Or best yet, just don't stop crunching.

If you want to run the Much slower App, fine. If you want to remind yourself you made the right decision, just check these numbers every once in a while. It's a similar machine running similar cards, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=340
Profile Raistmer
Volunteer developer
Volunteer tester
Message 1816611 - Posted: 12 Sep 2016, 7:15:35 UTC - in response to Message 1816476.  
Last modified: 12 Sep 2016, 7:19:08 UTC

That is, they run for a couple seconds, their Estimated Times may still have say 10 Min of crunching time left; BUT the Units "Finish" at the point of Resuming. In viewing Tasks on the Web, these Units immediately show up in Inconclusives.

TL


This means that the checkpointing mechanism is broken in that build.

If the app targets high-end GPUs, maybe it's OK to go without checkpoints at all. But in that case it's better to state this directly: omit the state.sah write/read, and exit cleanly with an error state (computational error) if the app detects a resume attempt.
Reporting an invalid result is the worst way.
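
A minimal sketch of the "fail cleanly on resume" idea, assuming the app leaves a marker file when it first starts a task and treats an existing marker as a resume attempt. The marker file, the function names and the exit code are all made up for illustration; none of this is taken from the actual app sources.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical exit code; a real app would use whatever error path
// the project treats as a computation error.
constexpr int kExitResumeNotSupported = 1;

// Returns true if an earlier run already started this task
// (i.e. the marker file can be opened).
bool previous_run_detected(const std::string& marker_path) {
    std::ifstream marker(marker_path);
    return marker.good();
}

// Returns 0 when crunching may start from scratch, non-zero when the
// task must be failed instead of being resumed without a checkpoint.
int start_task(const std::string& marker_path) {
    if (previous_run_detected(marker_path)) {
        std::fprintf(stderr,
                     "Resume attempted but checkpointing is not supported; failing task.\n");
        return kExitResumeNotSupported;
    }
    std::ofstream marker(marker_path);  // leave a marker for any later restart
    marker << "started\n";
    return 0;
}
```

The first call for a task returns 0 and creates the marker; a second call against the same marker returns the error code, so the task would end as an error rather than as an invalid result.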
SETI apps news
We're not gonna fight them. We're gonna transcend them.
TBar
Volunteer tester

Message 1816613 - Posted: 12 Sep 2016, 7:37:34 UTC - in response to Message 1816611.  
Last modified: 12 Sep 2016, 8:13:36 UTC

It appears to be broken in the current builds as well. Any quick and easy fixes you can offer would be appreciated. I think Petri's busy with the AutoCorr problem right now, but, it would be nice to have a working checkpoint. The Older x41p_zi App has basically been surpassed by the newer zi3 versions. The Older version is the best version of the Mac Special App available right now though, and it is listed as 'available for testing'. It also says, "See the Notes in the docs folder...".
Note 5) Restarted tasks could produce Incorrect Results.
Try not to resume suspended/paused tasks, if possible stop BOINC and reset active tasks in client_state.xml.
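
For anyone wondering what "reset active tasks" could look like, here is a rough sketch. It assumes client_state.xml lists running tasks as <active_task> ... </active_task> blocks inside <active_task_set>, and it removes all of them rather than only the GPU ones. The function name strip_active_tasks is hypothetical; this is not a BOINC tool, and it is only meant to be tried on a copy of the file with BOINC stopped.

```cpp
#include <string>

// Removes every <active_task>...</active_task> block from the XML text.
// Assumes the tags appear exactly in this form, without attributes.
// The enclosing <active_task_set> element is left in place.
std::string strip_active_tasks(std::string xml) {
    const std::string open = "<active_task>";
    const std::string close = "</active_task>";
    std::size_t start = 0;
    while ((start = xml.find(open, start)) != std::string::npos) {
        const std::size_t end = xml.find(close, start);
        if (end == std::string::npos) break;  // malformed input: leave the rest untouched
        xml.erase(start, end + close.size() - start);
    }
    return xml;
}
```

Text without any <active_task> block comes back unchanged.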

That's about as much as I can do.
Profile Raistmer
Volunteer developer
Volunteer tester
Message 1816618 - Posted: 12 Sep 2016, 8:17:24 UTC - in response to Message 1816613.  

Any quick and easy fixes you can offer would be appreciated.

I'm afraid there is no quick and easy one, because any fix would mean diving into the new code and all the changes that were made. That's not something I can afford right now, being deep in the new TwichChirp OpenCL path.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Profile TimeLord04
Volunteer tester
Message 1816657 - Posted: 12 Sep 2016, 16:07:10 UTC - in response to Message 1816613.  
Last modified: 12 Sep 2016, 16:08:16 UTC

Well, I didn't seem to see this issue in the "Regular" CUDA75 App. Both the SETI Beta Testing App and the Cruncher's Anonymous App for use here at SETI Main seem to be working fine on my Hackintosh. The system can be Suspended and Resumed at any point in time, and no issues occur for me.


TL
TimeLord04
Have TARDIS, will travel...
Come along K-9!
Join Calm Chaos
Profile Raistmer
Volunteer developer
Volunteer tester
Message 1817328 - Posted: 15 Sep 2016, 16:15:34 UTC

I thought there were some people interested in getting the fastest possible CUDA binaries deployed on main for OS X :/
http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2334
SETI apps news
We're not gonna fight them. We're gonna transcend them.
TBar
Volunteer tester

Message 1817338 - Posted: 15 Sep 2016, 16:51:09 UTC - in response to Message 1817328.  
Last modified: 15 Sep 2016, 16:57:20 UTC

Yep, the last time I looked at the Mac Hosts at Beta, all the Hosts idled by the block on Darwin 15.x were just sitting there doing nothing. Either they are not aware they could be testing the new Apps, or they don't want to do what everyone else does and install an nVidia driver to run CUDA on their Mac. It won't hurt; everyone running Windows or Linux also has to install a driver to run SETI work. Just install the latest driver and update it when you update the OS: http://www.nvidia.com/object/mac-driver-archive.html If you're running an older card in Mountain Lion or Lion, install this one: http://www.nvidia.com/object/macosx-cuda-5.5.47-driver.html

I built another OpenCL App earlier, it's not any better than the last one in Darwin 15.6;
Running on TomsMacPro.local at Thu Sep 15 13:57:29 2016
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
reference_work_unit_r3215.wu sniff.wu

Listing executable(s) in /APPS :
MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: reference_work_unit_r3215.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 2110 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -device 2
Elapsed Time : ……………………………… 418 seconds
Speed compared to default : 504 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      9     11     13      0        0      9     11     13      0
     Autocorr      0      1      1      1      0        0      1      1      1      0
     Gaussian      0      0      0      1      5        0      0      0      1      5
        Pulse      0      0      0      0      0        0      0      0      0      2
      Triplet      0      1      1      2      0        0      1      1      2      1
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      0      0      0      0      1        0      0      0      0      1
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0     14     16     20      7        0     14     16     20     10

Unmatched signal(s) in R1 at line(s) 499 526 580 607 634 694 720
Unmatched signal(s) in R2 at line(s) 482 509 526 569 595 649 676 703 763 789
For R1:R2 matched signals only, Q= 7.885%
Result      : Weakly similar.
---------------------------------------------------
Done with reference_work_unit_r3215.wu.
Current WU: sniff.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 199 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -device 2
Elapsed Time : ……………………………… 25 seconds
Speed compared to default : 796 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      2      5     10      1        0      2      5     10      0
     Autocorr      0      1      1      2      0        0      1      1      2      0
     Gaussian      0      0      0      7      4        0      0      0      7      4
        Pulse      0      1      1      1      2        0      1      1      1      2
      Triplet      2      2      2      2      0        2      2      2      2      0
   Best Spike      0      0      1      1      0        0      0      1      1      0
Best Autocorr      0      0      0      1      0        0      0      0      1      0
Best Gaussian      0      0      0      0      1        0      0      0      0      1
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      1      1      1      1      0        1      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   3      7     11     25      9        3      7     11     25      8

Unmatched signal(s) in R1 at line(s) 554 613 738 765 792 808 834 894 920
Unmatched signal(s) in R2 at line(s) 586 695 738 765 792 818 878 904
For R1:R2 matched signals only, Q= ????
Result      : Weakly similar.
---------------------------------------------------

Bad juju going on there with MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin
The CUDA App is looking much better, http://setiweb.ssl.berkeley.edu/beta/results.php?hostid=63959
Profile petri33
Volunteer tester

Message 1817421 - Posted: 15 Sep 2016, 23:38:55 UTC - in response to Message 1817338.  
Last modified: 15 Sep 2016, 23:43:21 UTC

Hi,

Should anything show with zi3i as a no go / stop working .. then ...

How about, for MAC, rigging the old and reliable zi with unroll?


I need your cudaAcceleration.cu and cudaAcceleration.h plus cudaAcc_pulsefind.cu and confsettings.cpp in addition to main.cpp


With those files (in a zip to my email) I'll return a zi+ version to test on Mac. (I hope you're not yet too tired of testing, testing, testing, ... "Is this thing even on", testing, ..., "now it works. So ..")

That would give a nice guppi speed boost and hopefully maintain usability, I guess.

Should this give you any kind of a headache, I will not mention it again :)
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Profile jason_gee
Volunteer developer
Volunteer tester
Message 1817447 - Posted: 16 Sep 2016, 1:15:16 UTC

ok, getting ready for the weekend run, the Mac Pro here is on no new tasks for main, and project reset on beta to see what it gets there. Probably will get a bit too toasty if the Radeon kicks on as well, and confuse the Cuda testing, so will exclude it a bit later
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
TBar
Volunteer tester

Message 1817484 - Posted: 16 Sep 2016, 4:49:02 UTC - in response to Message 1817421.  
Last modified: 16 Sep 2016, 5:04:23 UTC

I'll send you the RAW folder from r3470. It's the last sah_v7_opt.zip I have that still has the PetriR_raw folder. The problem with x41p_zi is that it fails on most resumed tasks and doesn't have the Device Selection fix. Otherwise, it works on my Mac the same way it works on this Mac, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=340
Right now the x41p_zi3 Apps fail to work correctly with my 750Ti cards with the current 7.5.30 driver. Using the Beta 8.0.29 driver appears to work in Darwin 15.4, but, I'm still getting about twice as many Inconclusives with x41p_zi3i as with x41p_zi. Of course, the GPUs older than Compute Capability 3.2 will still have to use the Baseline Apps currently at Beta. Most of the Mac nVidia GPUs are in Laptops & iMacs and have Compute Capability 3.0, so, they can only use the Baseline Apps.
Profile jason_gee
Volunteer developer
Volunteer tester
Message 1817497 - Posted: 16 Sep 2016, 6:38:12 UTC - in response to Message 1817484.  

Ah, yeah, two things that may help: First @petri33 a reminder to update your main.cpp from svn because of a boincapi change (this fixes the device selection). I applied Juha's patch to both baseline and alpha main.cpp.

Second, my Linux machine has a GTX 680 (Kepler class, but only compute capability 3.0). I'll be pretty determined to refactor the 1 or 2 kernels in alpha that demand compute capability 3.2+ early, since dividing support in the middle of a major compute capability is bound to cause issues, in the same way Boinc does by changing behaviour in the middle of a major version.

Probably at least the second issue will remain an issue until I can take the time to inject the proper preprocessor controls in the .cu files, however that's probably a comparatively minor issue compared to the major ones likely to crop up soon.
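
As a rough illustration of what such preprocessor controls could look like: nvcc defines __CUDA_ARCH__ while compiling device code (300 for compute capability 3.0, 320 for 3.2), so a guard on it can select between a 3.2+ variant and a fallback. The macro USE_CC32_PATH and the function selected_kernel_path are hypothetical names, and which kernels would need the guard is not stated in this thread.

```cpp
#include <string>

// nvcc defines __CUDA_ARCH__ only during device-code compilation.
// A plain host compiler never defines it, so the fallback is chosen there.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 320)
#define USE_CC32_PATH 1   // hypothetical switch for the 3.2+ variant
#else
#define USE_CC32_PATH 0   // fallback that also fits compute capability 3.0
#endif

// Reports which variant this translation unit was built with.
// In a real .cu file the same #if would sit inside the kernel body.
std::string selected_kernel_path() {
#if USE_CC32_PATH
    return "cc32";
#else
    return "generic";
#endif
}
```

Built with an ordinary host compiler, the fallback branch is selected, which mirrors what a compute capability 3.0 build would get.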
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Profile petri33
Volunteer tester

Message 1817538 - Posted: 16 Sep 2016, 12:42:23 UTC - in response to Message 1817484.  

@TBar

Hi, I'm working on the 'zi plus unroll' right now. Some adjustment needed to get the unroll going.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Profile petri33
Volunteer tester

Message 1817539 - Posted: 16 Sep 2016, 12:43:04 UTC - in response to Message 1817497.  

Thanks, will do.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
TBar
Volunteer tester

Message 1817542 - Posted: 16 Sep 2016, 13:11:33 UTC - in response to Message 1817539.  

Also, the older x41p_zi needs the Blocking Sync. I built a version last week with the Older BS changes, but it was producing infrequent Overflows with 30 Gaussians. I built another without the BS on the Gaussian line, but haven't had a chance to test it. I suppose I could try it now even though about all I've got are GUPPIs. It would be nice to have a few Arecibo tasks to break up all these GUPPIs.
Profile jason_gee
Volunteer developer
Volunteer tester
Message 1817543 - Posted: 16 Sep 2016, 13:20:38 UTC - in response to Message 1817542.  

Yeah, the blocking sync things behave (slightly) differently on each of the 3 platforms afaict. That's, logically, probably an artefact of OS+driver differences, so where present it will probably be imperfect until further down the road.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
TBar
Volunteer tester

Message 1817558 - Posted: 16 Sep 2016, 15:22:34 UTC - in response to Message 1817338.  

Here's the same MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin running in Darwin 14.5;
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
reference_work_unit_r3215.wu sniff.wu

Listing executable(s) in /APPS :
MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin setiathome_8.11_x86_64-apple-darwin__cuda75_mac setiathome_x41p_zi_x86_64-apple-darwin_cuda75

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: reference_work_unit_r3215.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 2198 seconds
---------------------------------------------------
Running app with command : MBv8_8.18r3528_NV_ssse3_x86_64-apple-darwin -sbs 192 -oclfft_tune_gr 256 -oclfft_tune_wg 128 -device 2
      248.18 real        57.34 user        85.84 sys
Elapsed Time : ……………………………… 248 seconds
Speed compared to default : 886 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.49%
---------------------------------------------------
Running app with command : setiathome_8.11_x86_64-apple-darwin__cuda75_mac -device 2
      224.89 real        42.92 user        31.60 sys
Elapsed Time : ……………………………… 225 seconds
Speed compared to default : 976 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.82%
---------------------------------------------------
Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda75 -device 2
      115.73 real        32.38 user         9.21 sys
Elapsed Time : ……………………………… 116 seconds
Speed compared to default : 1894 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.81%
---------------------------------------------------
Done with reference_work_unit_r3215.wu.

Result : Strongly similar, Q= 99.49%
Quite a bit different than in Darwin 15.6.
The x41p_zi in that test is the newer build with the blocking sync.
The setiathome_8.11_x86_64-apple-darwin__cuda75_mac in the test is the App at Beta.
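
The "Speed compared to default" figures in these logs look like the reference app's elapsed time divided by the tested app's elapsed time, in whole percent with the fraction dropped. That is an inference from the numbers, not something taken from the benchmark script; speed_percent is a hypothetical helper.

```cpp
// Speed relative to the reference app, in whole percent.
// Integer division drops the fractional part.
long speed_percent(long reference_seconds, long app_seconds) {
    return reference_seconds * 100 / app_seconds;
}
```

With the logged whole-second times, 2110 s against 418 s gives 504, and 2198 s against 116 s gives 1894, matching the logs.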
Profile petri33
Volunteer tester

Message 1817624 - Posted: 16 Sep 2016, 19:33:42 UTC - in response to Message 1817542.  

Ah, yeah, two things that may help: First, @petri33, a reminder to update your main.cpp from svn because of a boincapi change (this fixes the device selection). I applied Juha's patch to both the baseline and alpha main.cpp.

Second, my Linux machine has a GTX 680 (Kepler class, but only compute capability 3.0). I'll be pretty determined to refactor the 1 or 2 kernels in alpha that demand compute capability 3.2+ early, since dividing support in the middle of a major compute capability is bound to cause issues, in the same way BOINC does by changing behaviour in the middle of a major version.

The second issue will probably remain until I can take the time to inject the proper preprocessor controls into the .cu files; however, that's comparatively minor next to the major ones likely to crop up soon.


Thanks, will do.

Also, the older x41p_zi needs the Blocking Sync. I built a version last week with the older BS changes, but it was producing infrequent Overflows with 30 Gaussians. I built another without the BS on the Gaussian line, but haven't had a chance to test it. I suppose I could try it now even though about all I've got are GUPPIs. It would be nice to have a few Arecibo tasks to break up all these GUPPIs.


I'll add a blocking sync -bs flag and the device selection thingy too.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1817624 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1817748 - Posted: 17 Sep 2016, 3:22:52 UTC - in response to Message 1817624.  
Last modified: 17 Sep 2016, 3:24:24 UTC

The last x41p_zi build is working very well. Since starting it around 1500 UTC I haven't received a single Inconclusive running it in Darwin 14.5 with driver 7.5.27. No problems with the 750Ti, or anything else. I used this for the blocking sync;
  cudaEventCreateWithFlags(&chirpDoneEvent, cudaEventDisableTiming);
  cudaEventCreateWithFlags(&fftDoneEvent, cudaEventDisableTiming);
  cudaEventCreateWithFlags(&summaxDoneEvent, cudaEventDisableTiming|cudaEventBlockingSync);
  cudaEventCreateWithFlags(&powerspectrumDoneEvent, cudaEventDisableTiming);

  cudaEventCreateWithFlags(&autocorrelationDoneEvent, cudaEventDisableTiming|cudaEventBlockingSync);
  cudaEventCreateWithFlags(&autocorrelationRepackDoneEvent, cudaEventDisableTiming);
  cudaEventCreateWithFlags(&ac_reduce_partialEvent, cudaEventDisableTiming);
      
  cudaEventCreateWithFlags(&tripletsDoneEvent, cudaEventDisableTiming);
  cudaEventCreateWithFlags(&tripletsDoneEvent1, cudaEventDisableTiming|cudaEventBlockingSync);
  cudaEventCreateWithFlags(&pulseDoneEvent, cudaEventDisableTiming);
  cudaEventCreateWithFlags(&pulseDoneEvent1, cudaEventDisableTiming|cudaEventBlockingSync);
  cudaEventCreateWithFlags(&gaussDoneEvent, cudaEventDisableTiming);
  cudaEventCreateWithFlags(&gaussDoneEvent2, cudaEventDisableTiming|cudaEventBlockingSync);
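
A rough sketch of how a -bs switch could choose between the two flag combinations used in the list above. Everything here is hypothetical and is not petri33's or TBar's actual code: the helper names are invented, and the two constants are local stand-ins so the sketch builds without the CUDA toolkit (their values mirror cudaEventBlockingSync and cudaEventDisableTiming in the CUDA headers). With cudaEventBlockingSync set, a host thread calling cudaEventSynchronize on that event blocks until the event completes instead of spin-waiting.

```cpp
#include <cassert>
#include <cstring>

// Stand-ins for the CUDA runtime constants (assumption: same values as
// cudaEventBlockingSync = 0x01 and cudaEventDisableTiming = 0x02).
constexpr unsigned kEventBlockingSync  = 0x01;
constexpr unsigned kEventDisableTiming = 0x02;

// Hypothetical helper: true if "-bs" appears among the arguments.
bool blocking_sync_requested(int argc, const char* const* argv) {
    assert(argv != nullptr);
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "-bs") == 0) return true;
    }
    return false;
}

// Flags for an event that gets the blocking-sync treatment: timing is
// always disabled, and blocking sync is added only on request.
unsigned waited_event_flags(bool use_blocking_sync) {
    unsigned flags = kEventDisableTiming;
    if (use_blocking_sync) flags |= kEventBlockingSync;
    return flags;
}
```

In a real build the result of waited_event_flags would be passed as the second argument to cudaEventCreateWithFlags for events such as summaxDoneEvent, while the remaining events keep cudaEventDisableTiming alone.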

As usual, I commented out the last cudaDeviceReset to keep from getting the SIGBUS errors;
#if(CUDART_VERSION >= 4000)
fprintf(stderr, "11,");
// cudaDeviceReset();
fprintf(stderr, "12,");
#else

This is much better than what happens with the 750Ti running x41p_zi3i with driver 7.5.x;
http://setiathome.berkeley.edu/results.php?hostid=6796479&state=5
Task 	Work unit 	Sent 	Time reported 	Status 	Run time (sec) 	CPU time (sec) 	Credit 	Application
5156682094 	2264940596 	14 Sep 2016, 18:51:33 UTC 	15 Sep 2016, 14:59:46 UTC 	Completed, marked as invalid 	40.43 	13.49 	0.00 	SETI@home v8 Anonymous platform (NVIDIA GPU)
5156675797 	2264937406 	14 Sep 2016, 18:46:25 UTC 	15 Sep 2016, 14:59:46 UTC 	Completed, marked as invalid 	10.30 	3.82 	0.00 	SETI@home v8 Anonymous platform (NVIDIA GPU)
5156634063 	2264917766 	14 Sep 2016, 18:15:14 UTC 	15 Sep 2016, 5:19:06 UTC 	Completed, marked as invalid 	234.82 	167.19 	0.00 	SETI@home v8 Anonymous platform (NVIDIA GPU)
5156620485 	2264911671 	14 Sep 2016, 18:04:57 UTC 	15 Sep 2016, 5:05:30 UTC 	Completed, marked as invalid 	104.69 	76.00 	0.00 	SETI@home v8 Anonymous platform (NVIDIA GPU)
5156600475 	2264902185 	14 Sep 2016, 17:49:27 UTC 	15 Sep 2016, 14:41:07 UTC 	Completed, marked as invalid 	289.13 	89.06 	0.00 	SETI@home v8 Anonymous platform (NVIDIA GPU)

Hopefully the unroll feature will result in the x41p_zi being as fast as x41p_zi3i with the GUPPIs.
ID: 1817748 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1817784 - Posted: 17 Sep 2016, 8:49:42 UTC

@TBar

I've now got a running zi+ with -unroll and -bs.
Preliminary tests show that it is working, at least on my 1080 Linux system.

The newer zi3 is a bit faster, but I'll let you run the Mac 750Ti tests. Then we will know if the -unroll and -bs work as intended.

Now I'll put the device selection into it.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1817784 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1817787 - Posted: 17 Sep 2016, 9:22:26 UTC

@TBar
you've got email.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1817787 · Report as offensive