I've Built a Couple OSX CUDA Apps...

Richard Haselgrove
Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 11516
Credit: 106,098,599
RAC: 70,220
United Kingdom
Message 1831222 - Posted: 18 Nov 2016, 15:21:49 UTC - in response to Message 1831214.  

Yeah, you won't see the problem since normalisation fixes that. The discrepancy is purely the two different GFlops Numbers [In Plain sight]: one connected to what you see, and the other connected to what actually drives the backend scheduling.

OK, I'll finish lunch and head downstairs to code-walk the line numbers in your email. That may take some time...

If you can explain two different GFlops estimates for the same device as anything better than "WTF", then I will owe you even more respect than I already grant you. If you can explain to me why we should deliberately underestimate by a factor of four or more, then that's bonus points.

If I had access to your Wiki page (or equivalent) for which variable name referenced which real-world value (and where in the code spaghetti the variable's value was set), I might make faster progress.
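A toy Python sketch of the point about normalisation: if every device's GFlops estimate is off by the same factor (four, say), any value expressed relative to the others is unchanged, so the discrepancy stays invisible unless one of the raw numbers is presumably used on its own somewhere. The GTX 750 Ti figures are only an illustration, the CPU value is a made-up placeholder, and `normalise()` is a hypothetical stand-in, not BOINC's credit or scheduler code.

```python
# Toy illustration only: normalise() is a hypothetical stand-in, NOT BOINC code.
# It just shows that a uniform scale factor cancels once values are relative.


def peak_sp_gflops(cuda_cores, clock_mhz):
    """Theoretical single-precision peak: 2 flops (one FMA) per core per clock."""
    return cuda_cores * clock_mhz * 2 / 1000.0


def normalise(estimates):
    """Return each device's estimate as a fraction of the total."""
    total = sum(estimates.values())
    return {name: value / total for name, value in estimates.items()}


# GTX 750 Ti reference figures (640 CUDA cores, 1020 MHz base clock);
# the CPU figure is a hypothetical placeholder.
gpu_peak = peak_sp_gflops(640, 1020)  # about 1305.6 GFLOPS
honest = {"gpu": gpu_peak, "cpu": 50.0}
quartered = {name: value / 4 for name, value in honest.items()}

shares_honest = normalise(honest)
shares_quartered = normalise(quartered)  # same shares as shares_honest
```

A non-uniform error (a different factor per device) would not cancel this way.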
ID: 1831222
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1831262 - Posted: 18 Nov 2016, 19:16:47 UTC

While looking over the New CUDA Apps on Main, I ran across a couple of Hosts running the Older nVidia OpenCL Apps. Any nVidia OpenCL App older than r3551 will Not give the correct results under OSX 10.11.x and 10.12.x. If you are running one of those Apps you should Update to the New OpenCL App r3551, which does give the correct results in El Capitan & Sierra and gives much better performance than the Older OSX OpenCL Apps.

The App is here: nVidia_r3551&CPUr3344.zip
This is the same App currently at Beta.
ID: 1831262
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 5804
Credit: 75,966,129
RAC: 50,436
Russia
Message 1831263 - Posted: 18 Nov 2016, 19:19:55 UTC - in response to Message 1831262.  
Last modified: 18 Nov 2016, 19:21:20 UTC

I would suggest creating a separate thread with a detailed description of where to get the apps, how to install them, and "what for what" regarding CUDA and OpenCL on OS X (we have stock CUDA on main already!). You have put a lot of info in this thread and the beta threads already, but it's scattered through them and needs to be gathered in one place. And then make that thread sticky.
IMHO this way the information will reach more people.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1831263
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7471
Credit: 91,023,003
RAC: 10,759
Australia
Message 1831312 - Posted: 18 Nov 2016, 22:58:58 UTC - in response to Message 1831222.  

Can do: will email that reference once I'm awake, and then I can stop polluting TBar's thread with GFlops stuff.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1831312
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1832245 - Posted: 24 Nov 2016, 20:22:09 UTC - in response to Message 1831263.  

But as soon as the Apps are placed on Main there won't be any need for a separate thread. The OpenCL Apps don't really need anything special because the driver is built into the OS, and apparently you can keep repeating "Update your CUDA Driver" until the Cows come home and it won't make a bit of difference.

BTW, they released another one the other day:
Release Date: 2016.11.22 New Release 8.0.53
http://www.nvidia.com/object/macosx-cuda-8.0.53-driver.html
ID: 1832245
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1833544 - Posted: 1 Dec 2016, 21:19:14 UTC - in response to Message 1831312.  

I was just looking at the repository and noticed that the last CUDA Special App there is zi3f: https://setisvn.ssl.berkeley.edu/trac/browser/branches/sah_v7_opt/Xbranch/client/alpha/PetriR_raw3/analyzeFuncs.cpp#L715
It's possible the zi+ code may work reasonably well when compiled with ToolKit 6.0 and run with a CUDA 8 driver. I'm still looking at it.
Anyway, could the zi+ code be added to the repository, just in case this build works out? Since early on the 30th the two builds have looked promising.
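On the ToolKit 6.0 + CUDA 8 driver combination: the CUDA runtime reports versions as 1000 × major + 10 × minor (the encoding returned by `cudaDriverGetVersion()` and `cudaRuntimeGetVersion()`), and a driver can generally run applications built with the same or an older toolkit, but not a newer one. A minimal Python sketch of that check, with the version numbers hard-coded rather than queried from a GPU; the constant names are hypothetical.

```python
def decode_cuda_version(encoded):
    """Split a CUDA version integer (1000*major + 10*minor) into (major, minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_supports(driver_version, toolkit_version):
    """Rule of thumb: a driver runs apps built with the same or an older toolkit."""
    return driver_version >= toolkit_version


# Hypothetical example values, as they would be encoded by the CUDA runtime.
TOOLKIT_6_0 = 6000  # an app compiled with ToolKit 6.0
DRIVER_8_0 = 8000   # a CUDA 8 driver
```

This only says the combination should load; it says nothing about whether the results validate, which is what is being tested here.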
ID: 1833544
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7471
Credit: 91,023,003
RAC: 10,759
Australia
Message 1833763 - Posted: 3 Dec 2016, 2:30:02 UTC - in response to Message 1833544.  
Last modified: 3 Dec 2016, 2:31:08 UTC

x41zi+a is looking promising, but has a number of showstopping warts whose source I'm trying to clarify before committing. Namely: intermittent failures with 'unknown error' in the powerspectrum/summax pipeline, and some very odd validation concerns that didn't appear at first. I had suspected my own system + GTX980 as a possible cause of what I was seeing, but the picture hasn't really improved with a brand new 1050ti and very thorough maintenance & system checks. In that light I switched back to prepping the x42 infrastructure to receive all of the baseline, Petri, and internal regression code. The situation has basically become complex enough that the full range of issues won't be completely isolated until the tools exist to find the problems :)
ID: 1833763
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1833785 - Posted: 3 Dec 2016, 4:11:46 UTC - in response to Message 1833763.  

It looks as though I'll have to try another build with OSX, but the Linux App still seems to be doing well. I swapped out the BOINC folders yesterday so the Linux Host would start with a low number of existing Inconclusives, and so far it's still running at around 20. About half of those are from the ATI App on the previous Host. This is much better than the ~70 it was producing with the higher Toolkits. The only difference between this build and the others is that it was built in Ubuntu 12.04.5 with Toolkit 6 and driver 346.59. It would be interesting to see how it runs on other machines. I ran a few tasks at Beta just to see how it handled a BLC7: https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=76256 Not bad for a 750Ti.
ID: 1833785
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7471
Credit: 91,023,003
RAC: 10,759
Australia
Message 1833786 - Posted: 3 Dec 2016, 4:29:06 UTC - in response to Message 1833785.  

Yes, x41+a did well here up to a point (before the showstoppers I described). It'll be interesting to see whether you come up with something the same or different :)
ID: 1833786
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1834004 - Posted: 4 Dec 2016, 12:39:19 UTC - in response to Message 1833786.  
Last modified: 4 Dec 2016, 12:54:48 UTC

Well, the problems I've been noticing are the high number of Inconclusive results, the need to use the CUDA 8 driver to avoid unmatched random overflows, and the Invalid quick overflows. I really haven't had any 'simple' errors on any machine. The last Ubuntu 12.04 build seems to have solved most of those problems. The Inconclusives seem to have leveled out at around 30, which isn't bad with around 300 Valids a day, and I haven't had any Errors or Invalids. The Linux build x41p_zi+ looks acceptable. The OSX build is still being tested, as I only just solved the problem with ToolKits 5.0-6.0 using Xcode 4 & 5. The last build was made in Mountain Lion with ToolKit 6, and it's too early to notice any major difference, other than that the Invalid Quick Overflows appear to have stopped. With all the existing Inconclusives on that Host it might be a while before any difference shows.

So, the x41p_zi+ code needs to be available somewhere before I can post the Linux App. Hmmm, I suppose I also need to test the checkpoints to see whether it will resume a task without immediately overflowing. That's at the bottom of my list though...
ID: 1834004
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7471
Credit: 91,023,003
RAC: 10,759
Australia
Message 1834155 - Posted: 4 Dec 2016, 23:18:23 UTC - in response to Message 1834004.  
Last modified: 4 Dec 2016, 23:27:44 UTC

Another main exceptional area to watch is the 'too many triplets' fallback processing, which, at least in my own build, appears to be broken and falls into a 'funk'. That's an extremely rare situation, but it has happened here at least once in recent days, and the probability goes up with noisy data.

What I'll probably do is update the svn alpha code with the zi+a submission as supplied by Petri, but continue focussing on the clean OOA&D 'long game' which is turning up some interesting opportunities/holes in the old app design(*). Am likely to have more time from the end of this week (fingers crossed).

[(*) Namely, the baseline, alternate optimised, Cuda internal, and exceptional case paths already behave polymorphically, in an incomplete way, which opens up a number of streamlining/simplification options.]
ID: 1834155
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1834314 - Posted: 5 Dec 2016, 16:48:48 UTC - in response to Message 1834155.  

The remaining problem I'm seeing is the App picking the correct 'Best' signal. A similar case can be found where the OpenCL App picked the wrong Best autocorr here, https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266&postid=60130#60130
I decided to check one of the Mac's suspicious Inconclusives here, http://setiathome.berkeley.edu/workunit.php?wuid=2342099371
I had to run the task offline, and received interesting results:
Current WU: 09fe09aa.12449.3344.4.31.24.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8861 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
      354.29 real       110.12 user        32.54 sys
Elapsed Time : ……………………………… 354 seconds
Speed compared to default : 2503 %
              Bad
Best Pulse     1

Unmatched signal(s) in R1 at line(s) 450
Unmatched signal(s) in R2 at line(s) 450
For R1:R2 matched signals only, Q= 99.97%
Result      : Weakly similar.

Everything was near perfect, except that it picked the wrong Best Pulse even though no reportable signal was found.
The result files show:
<best_pulse>
<peak_power>5.6462993621826</peak_power>
<mean_power>0.07098638266325</mean_power>
<time>2454871.7238501</time>
<ra>10.217911822692</ra>
<decl>1.5385880378147</decl>
<q_pix>0</q_pix>
<freq>1420236148.8342</freq>
<detection_freq>1420236195.1932</detection_freq>
<barycentric_freq>0</barycentric_freq>
<fft_len>1024</fft_len>
<chirp_rate>7.6889244970957</chirp_rate>
&
<best_pulse>
<peak_power>5.5777869224548</peak_power>
<mean_power>0.070986375212669</mean_power>
<time>2454871.7238501</time>
<ra>10.217911822692</ra>
<decl>1.5385880378147</decl>
<q_pix>0</q_pix>
<freq>1420236148.8342</freq>
<detection_freq>1420236195.1932</detection_freq>
<barycentric_freq>0</barycentric_freq>
<fft_len>1024</fft_len>
<chirp_rate>7.6889244970957</chirp_rate>


It doesn't happen often, but it does happen...with both OpenCL & CUDA.
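The "Speed compared to default" line in the output above looks like the reference app's elapsed time divided by the tested app's, expressed as a truncated whole percentage: 8861 s against 354 s gives 2503 %. The same arithmetic reproduces the other speed figures posted in this thread, but that is inferred from the numbers, not read from the KWSN benchmark script. A Python sketch, with `speed_percent` a hypothetical name:

```python
def speed_percent(reference_seconds, app_seconds):
    """Speed relative to the reference app, truncated to a whole percent.

    Inferred from the benchmark logs in this thread, not taken from the
    benchmark script itself.
    """
    # Integer division reproduces the truncation seen in the logs.
    return 100 * reference_seconds // app_seconds
```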
ID: 1834314
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 227
Credit: 306,950,197
RAC: 244,783
United States
Message 1834315 - Posted: 5 Dec 2016, 16:49:11 UTC - in response to Message 1834004.  

TBar,

Lemme know when you need some more data points on the OS X version and I'll update my 750ti rig. The zi version of the app I have now has been pretty problem free, inconclusives have been pretty steady at 9% or so...

Thanks,

Chris
ID: 1834315
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1834410 - Posted: 6 Dec 2016, 2:58:56 UTC - in response to Message 1834315.  
Last modified: 6 Dec 2016, 2:59:59 UTC

I see you found where to place the commandlines in the app_info. It looks to be working well now.

The Linux version is posted at Crunchers Anonymous for those with Linux machines.
My Linux machine is running very well; once you allow for the leftover ATI tasks, the overflows, and the usual suspects, the Inconclusives are very low.
The Linux version can be downloaded here: http://www.arkayn.us/forum/index.php?topic=197.msg4499#msg4499
ID: 1834410
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7471
Credit: 91,023,003
RAC: 10,759
Australia
Message 1834418 - Posted: 6 Dec 2016, 3:19:29 UTC - in response to Message 1834314.  
Last modified: 6 Dec 2016, 3:23:02 UTC

That mean power variation, from around the 8th decimal place, is enough to say the scores for those particular best pulses are a toss of a coin. Most of the pipeline is single precision, while the displayed value is double precision. Eric and I have discussed a similar threshold-related issue in the past, which remains; however, with no reportables you are indeed in the noise floor (computational or telescope).

IMO the nature of that particular best variation implies that in some places we're starting to hit limitations beyond our influence. Naturally there are still things to address in the reportable region, but 'best only' bad matches are likely noise, while good ones are just feel-good confirmation. ['Correct' best-only matches should really be scattershot, if all the thresholds and applications were perfect for their precision limits. According to the Bullshit detection Kit inspired by Carl Sagan, anyway.]
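The single-precision point can be checked on the two `<mean_power>` values from the result files posted in message 1834314: rounded to float32, they turn out to be adjacent representable values, one unit in the last place (2^-27 at this magnitude) apart, so the difference sits right at the resolution limit of single precision. A small Python sketch that uses `struct` to emulate float32; the helper names are hypothetical, and it does not model how either app actually computes mean_power.

```python
import struct


def float32_bits(x):
    """Round a Python float to IEEE-754 single precision and return its bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def ulp_distance_f32(a, b):
    """Number of float32 steps between a and b (valid for same-sign values)."""
    return abs(float32_bits(a) - float32_bits(b))


# The two <mean_power> values from the posted result files.
mean_power_a = 0.07098638266325
mean_power_b = 0.070986375212669
```

Both values lie in [2^-4, 2^-3), where float32 spacing is 2^-27 (about 7.45e-9), which is exactly the gap between them.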
ID: 1834418
Chris Adamek
Volunteer tester
Joined: 15 May 99
Posts: 227
Credit: 306,950,197
RAC: 244,783
United States
Message 1834469 - Posted: 6 Dec 2016, 12:59:44 UTC - in response to Message 1834410.  

Yup, chugging along pretty well. Haven't noticed a big uptick in inconclusives so far either. We'll see what it looks like in a week or so.

Thanks,

Chris
ID: 1834469
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1834748 - Posted: 8 Dec 2016, 5:10:29 UTC
Last modified: 8 Dec 2016, 5:26:18 UTC

Decided to try 4 of the recent Mac Inconclusives in the Benchmark App using x41p_zi3i and x41p_zi+. Of the 3 normal tasks, p_zi3i resulted in 1 Bad Pulse per task, and 1 task had 1 Bad Best Pulse to go along with the Bad Pulse. The p_zi+ App had 2 tasks with 1 Bad Pulse, and the same task had the Bad Best Pulse as with p_zi3i. Both Apps were correct on the Late Overflow task #4.
Then I decided to compare the current p_zi+ with the Old p_zi I had modified with the Blocking Sync. The results were interesting. The Old 'Original' x41p_zi has the Autocorr Error that has since been corrected, but Not the Pulsefind Error. For some reason p_zi+ found a Bad Pulse the second time it ran the Late Overflow. The New x41p_zi+ doesn't have the Autocorr Error but picked up the Pulsefind Error when the Unroll feature was added. That's the way it appears to me, anyway. I'm running the same test on the Linux version and will post the results in the Linux thread when finished.

The Results:
Last login: Wed Dec  7 09:29:24 on ttys000
TomsMacPro:~ Tom$ cd /Users/Tom/KWSN-OSX-bench-MB 
TomsMacPro:KWSN-OSX-bench-MB Tom$ ./benchmark
KWSN-Darwin-MBbench v2.1.07
Running on TomsMacPro.local at Thu Dec 8 01:09:17 2016
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18au09aa.4654.85539.7.34.226.wu 18dc09ah.26284.16432.6.33.125.wu blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu

Listing executable(s) in /APPS :
setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 setiathome_x41p_zi3i_x86_64-apple-darwin_cuda75

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18au09aa.4654.85539.7.34.226.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8649 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
      364.03 real       111.97 user        32.75 sys
Elapsed Time : ……………………………… 364 seconds
Speed compared to default : 2376 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.95%
---------------------------------------------------
Running app with command : setiathome_x41p_zi3i_x86_64-apple-darwin_cuda75 -bs -unroll 5 -device 0
      326.61 real       165.12 user        35.44 sys
Elapsed Time : ……………………………… 327 seconds
Speed compared to default : 2644 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      5      5      5      0        0      5      5      5      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      4      4      4      1        0      4      4      4      1
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      0      1      1      1      0        0      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0     12     12     12      2        0     12     12     12      2

Unmatched signal(s) in R1 at line(s) 422 611
Unmatched signal(s) in R2 at line(s) 422 611
For R1:R2 matched signals only, Q= 99.95%
Result      : Weakly similar.
---------------------------------------------------
Done with 18au09aa.4654.85539.7.34.226.wu.
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
      146.44 real        17.18 user        11.67 sys
Elapsed Time : ……………………………… 147 seconds
Speed compared to default : 2392 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      1        0      0      0      0      1
      Triplet      0      3      3      3      0        0      3      3      3      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      7      7      7      2        1      7      7      7      2

Unmatched signal(s) in R1 at line(s) 393 473
Unmatched signal(s) in R2 at line(s) 393 473
For R1:R2 matched signals only, Q= 99.99%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3i_x86_64-apple-darwin_cuda75 -bs -unroll 5 -device 0
      141.32 real        92.31 user        19.85 sys
Elapsed Time : ……………………………… 141 seconds
Speed compared to default : 2494 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      1        0      0      0      0      1
      Triplet      0      3      3      3      0        0      3      3      3      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      7      7      7      2        1      7      7      7      2

Unmatched signal(s) in R1 at line(s) 393 473
Unmatched signal(s) in R2 at line(s) 393 473
For R1:R2 matched signals only, Q= 99.99%
Result      : Weakly similar.
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 6543 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
      760.45 real        27.10 user        16.79 sys
Elapsed Time : ……………………………… 761 seconds
Speed compared to default : 859 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      1      1      1      0        0      1      1      1      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     17     17     17      1        0     17     17     17      1
      Triplet      1      1      1      1      0        1      1      1      1      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      1      1      1      1      0        1      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   3     24     24     24      1        3     24     24     24      1

Unmatched signal(s) in R1 at line(s) 663
Unmatched signal(s) in R2 at line(s) 663
For R1:R2 matched signals only, Q= 99.25%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3i_x86_64-apple-darwin_cuda75 -bs -unroll 5 -device 0
      714.78 real        80.76 user        23.18 sys
Elapsed Time : ……………………………… 715 seconds
Speed compared to default : 915 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      1      1      1      0        0      1      1      1      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     17     17     17      1        0     17     17     17      1
      Triplet      1      1      1      1      0        1      1      1      1      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      1      1      1      1      0        1      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   3     24     24     24      1        3     24     24     24      1

Unmatched signal(s) in R1 at line(s) 663
Unmatched signal(s) in R2 at line(s) 663
For R1:R2 matched signals only, Q= 99.25%
Result      : Weakly similar.
---------------------------------------------------
Done with blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu.
Current WU: blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu
---------------------------------------------------
Running default app with command : MBv8_8.05r3344_sse41_x86_64-apple-darwin
      432.75 real       428.96 user         1.26 sys
Elapsed Time: ………………………………… 433 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
       42.53 real         3.09 user         1.89 sys
Elapsed Time : ……………………………… 42 seconds
Speed compared to default : 1030 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      1     15     15     15      0        1     15     15     15      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     12     12     13      0        0     12     12     13      0
      Triplet      0      2      2      2      0        0      2      2      2      0
   Best Spike      0      0      0      0      0        0      0      0      0      0
Best Gaussian      0      0      0      0      0        0      0      0      0      0
   Best Pulse      0      0      0      0      0        0      0      0      0      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     29     29     30      0        1     29     29     30      0

Result      : Strongly similar,  Q= 38.66%
---------------------------------------------------
Running app with command : setiathome_x41p_zi3i_x86_64-apple-darwin_cuda75 -bs -unroll 5 -device 0
       40.86 real        10.64 user         2.80 sys
Elapsed Time : ……………………………… 41 seconds
Speed compared to default : 1056 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      3     15     15     15      0        3     15     15     15      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     12     12     13      0        0     12     12     13      0
      Triplet      0      2      2      2      0        0      2      2      2      0
   Best Spike      0      0      0      0      0        0      0      0      0      0
Best Gaussian      0      0      0      0      0        0      0      0      0      0
   Best Pulse      0      0      0      0      0        0      0      0      0      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   3     29     29     30      0        3     29     29     30      0

Result      : Strongly similar,  Q= 38.66%
---------------------------------------------------
Done with blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu.

Done with Benchmark run! Removing temporary files!
TomsMacPro:KWSN-OSX-bench-MB Tom$ 
Last login: Wed Dec  7 21:41:25 on console
TomsMacPro:KWSN-OSX-bench-MB Tom$ ./benchmark
KWSN-Darwin-MBbench v2.1.07
Running on TomsMacPro.local at Thu Dec 8 03:50:09 2016
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18au09aa.4654.85539.7.34.226.wu 18dc09ah.26284.16432.6.33.125.wu blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu

Listing executable(s) in /APPS :
setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 setiathome_x41p_zi_x86_64-apple-darwin_cuda75

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18au09aa.4654.85539.7.34.226.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8649 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
      383.26 real       115.04 user        33.62 sys
Elapsed Time : ……………………………… 383 seconds
Speed compared to default : 2258 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.95%
---------------------------------------------------
Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda75
      377.88 real       108.33 user        33.30 sys
Elapsed Time : ……………………………… 378 seconds
Speed compared to default : 2288 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.95%
---------------------------------------------------
Done with 18au09aa.4654.85539.7.34.226.wu.
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
      155.80 real        17.08 user        11.36 sys
Elapsed Time : ……………………………… 156 seconds
Speed compared to default : 2254 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      1        0      0      0      0      1
      Triplet      0      3      3      3      0        0      3      3      3      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      7      7      7      2        1      7      7      7      2

Unmatched signal(s) in R1 at line(s) 393 473
Unmatched signal(s) in R2 at line(s) 393 473
For R1:R2 matched signals only, Q= 99.99%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda75
      152.08 real        17.44 user        11.23 sys
Elapsed Time : ……………………………… 152 seconds
Speed compared to default : 2313 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      0      0      0      0        0      0      0      0      1
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      1      1      1      0        0      1      1      1      0
      Triplet      0      3      3      3      0        0      3      3      3      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      0      0      0      1        0      0      0      0      1
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      8      8      8      1        1      8      8      8      2

Unmatched signal(s) in R1 at line(s) 435
Unmatched signal(s) in R2 at line(s) 359 452
For R1:R2 matched signals only, Q= 99.70%
Result      : Weakly similar.
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 6543 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
      792.41 real        27.06 user        16.67 sys
Elapsed Time : ……………………………… 793 seconds
Speed compared to default : 825 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      1      1      1      0        0      1      1      1      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     17     17     17      1        0     17     17     17      1
      Triplet      1      1      1      1      0        1      1      1      1      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      1      1      1      1      0        1      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   3     24     24     24      1        3     24     24     24      1

Unmatched signal(s) in R1 at line(s) 663
Unmatched signal(s) in R2 at line(s) 663
For R1:R2 matched signals only, Q= 99.25%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda75
     1146.46 real       245.66 user        53.32 sys
Elapsed Time : ……………………………… 1147 seconds
Speed compared to default : 570 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.25%
---------------------------------------------------
Done with blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu.
Current WU: blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 433 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 5 -device 0
       44.47 real         3.22 user         1.89 sys
Elapsed Time : ……………………………… 44 seconds
Speed compared to default : 984 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      1     15     15     15      0        1     15     15     15      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     11     11     12      1        0     11     11     12      1
      Triplet      0      2      2      2      0        0      2      2      2      0
   Best Spike      0      0      0      0      0        0      0      0      0      0
Best Gaussian      0      0      0      0      0        0      0      0      0      0
   Best Pulse      0      0      0      0      0        0      0      0      0      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     28     28     29      1        1     28     28     29      1

Unmatched signal(s) in R1 at line(s) 524
Unmatched signal(s) in R2 at line(s) 524
For R1:R2 matched signals only, Q= 38.66%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda75
       58.34 real        12.29 user         3.25 sys
Elapsed Time : ……………………………… 59 seconds
Speed compared to default : 733 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.32%
---------------------------------------------------
Done with blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu.

Done with Benchmark run! Removing temporary files!
TomsMacPro:KWSN-OSX-bench-MB Tom$ 
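The "Speed compared to default" figures in this log seem to be the saved reference time divided by the CUDA app's elapsed time, expressed as a truncated percentage. Every value shown here fits that rule. For example, 3517 s / 152 s = 23.14, which is logged as 2313 % rather than 2314 %, so the value looks truncated rather than rounded. The rule is inferred from the logged numbers, not taken from the KWSN-Darwin-MBbench script, and `speed_percent` is a hypothetical name. The sketch below recomputes the column from the logged seconds:

```python
# Sketch: recompute the "Speed compared to default" column of the KWSN-Darwin-MBbench log.
# Assumption (inferred from the logged numbers, not from the benchmark script):
#   speed % = floor(reference elapsed seconds * 100 / app elapsed seconds)

def speed_percent(ref_seconds: int, app_seconds: int) -> int:
    """Hypothetical helper: integer percentage, truncated rather than rounded."""
    return ref_seconds * 100 // app_seconds

# (WU, saved MBv8_8.05r3344 time, zi+ cuda60 "-bs -unroll 5" time, zi cuda75 default-options time),
# all in seconds, copied from the Dec 8 log above
runs = [
    ("18au09aa", 8649, 383, 378),
    ("18dc09ah", 3517, 156, 152),
    ("blc3 80774 vlar", 6543, 793, 1147),
    ("blc3 81430 vlar", 433, 44, 59),
]

for wu, ref, zi_plus, zi in runs:
    print(f"{wu:16s} zi+ {speed_percent(ref, zi_plus):5d} %   zi {speed_percent(ref, zi):5d} %")
```

The printed pairs (2258/2288, 2254/2313, 825/570, 984/733) match the log. Integer floor division was chosen because rounding would give 2314 and 734 instead of the logged 2313 and 733.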
ID: 1834748
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7471
Credit: 91,023,003
RAC: 10,759
Australia
Message 1834776 - Posted: 8 Dec 2016, 8:50:24 UTC - in response to Message 1834748.  
Last modified: 8 Dec 2016, 8:52:27 UTC

If you can repeat the run in 'slow mode' (i.e. an unroll of 1) and it yields the same discrepancy, that will tell us something about the new pulsefind changes. If the discrepancy disappears, it will point to the unroll reduction stage, and may explain why I was unable to replicate the failure (I added additional synchronisation there).
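The hypothesis above can be pictured with a toy model. With unrolling, the work is split into several chunks that each write a partial result, and a separate reduction stage then combines those partials. If the reduction can read a partial before its chunk has finished writing it, the combined result depends on timing. That is the kind of hazard extra synchronisation removes.

This Python model is only a hypothetical illustration, not the app's pulsefind or reduction kernel. On the GPU the chunks run concurrently. Here `reduce_unrolled` and its `finished` argument are invented for the example: `finished` simply simulates which partial writes landed before the reduction read them. "Best" is also simplified to a plain maximum.

```python
import math

def reduce_unrolled(data, unroll, finished=None):
    """Toy two-stage reduction (hypothetical, not the app's kernel).

    Stage 1: each of `unroll` chunks writes its partial maximum into a buffer.
    Stage 2: the buffer is reduced to a single best value.
    `finished` is the set of chunk indices whose partial was written before
    stage 2 read the buffer; None models a correctly placed barrier (all written).
    """
    partials = [-math.inf] * unroll          # buffer read by the reduction stage
    chunk = math.ceil(len(data) / unroll)
    for k in range(unroll):
        if finished is None or k in finished:
            partials[k] = max(data[k * chunk:(k + 1) * chunk], default=-math.inf)
    return max(partials)

signal = [0.3, 0.9, 0.1, 2.7, 0.4, 1.1, 0.2, 0.8, 0.5, 0.6]

print(reduce_unrolled(signal, unroll=1))                         # single chunk
print(reduce_unrolled(signal, unroll=5))                         # 5 chunks, all partials written
print(reduce_unrolled(signal, unroll=5, finished={0, 2, 3, 4}))  # chunk 1 read before it was written
```

With every partial in place, unroll 1 and unroll 5 agree (2.7). When chunk 1's partial is missing, the reduction reports 1.1 instead. In the same way, a result that changes only when unrolling is on would point at the reduction stage rather than at the search itself.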
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1834776
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 5804
Credit: 75,966,129
RAC: 50,436
Russia
Message 1834789 - Posted: 8 Dec 2016, 11:27:02 UTC - in response to Message 1834776.  

Yep, that experiment is indeed well worth trying.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1834789
TBar
Volunteer tester
Joined: 22 May 99
Posts: 3788
Credit: 186,094,577
RAC: 236,691
United States
Message 1834793 - Posted: 8 Dec 2016, 12:46:12 UTC
Last modified: 8 Dec 2016, 13:09:54 UTC

OK, running the same 4 WUs without the Unroll has produced Good results so far. I'll post the BLC3 tasks when they're finished.

Running on TomsMacPro.local at Thu Dec 8 12:23:59 2016
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18au09aa.4654.85539.7.34.226.wu 18dc09ah.26284.16432.6.33.125.wu blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu

Listing executable(s) in /APPS :
setiathome_x41p_zi+_x86_64-apple-darwin_cuda60

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18au09aa.4654.85539.7.34.226.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8649 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -device 0
      427.70 real       124.24 user        36.67 sys
Elapsed Time : ……………………………… 428 seconds
Speed compared to default : 2020 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.95%
---------------------------------------------------
Done with 18au09aa.4654.85539.7.34.226.wu.
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -device 0
      167.60 real        21.53 user        12.69 sys
Elapsed Time : ……………………………… 168 seconds
Speed compared to default : 2093 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.70%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 6543 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -device 0
     1830.00 real        35.78 user        20.47 sys
Elapsed Time : ……………………………… 1830 seconds
Speed compared to default : 357 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.25%
---------------------------------------------------
Done with blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu.
Current WU: blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 433 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -device 0
       86.02 real         3.82 user         2.27 sys
Elapsed Time : ……………………………… 86 seconds
Speed compared to default : 503 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.32%
---------------------------------------------------
Done with blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu.

Done with Benchmark run! Removing temporary files!


Yep, removing the Unroll fixes it.
So, what did you say I need to add to solve the Unroll bug?
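Here are the two runs of the same setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 build side by side: once with `-bs -unroll 5 -device 0` and once with `-bs -device 0`, on the same four WUs. Dropping the unroll costs the two 18xx09 WUs about 8 to 12 % in elapsed time. The two blc3 VLAR WUs take roughly 2 to 2.3 times as long. In exchange, all four results come back Strongly similar. The elapsed seconds are copied from the two logs; the dictionary keys are only labels for this sketch.

```python
# Elapsed seconds for setiathome_x41p_zi+_x86_64-apple-darwin_cuda60, copied from the two logs
unroll5 = {"18au09aa": 383, "18dc09ah": 156, "blc3_80774": 793, "blc3_81430": 44}   # -bs -unroll 5
no_unroll = {"18au09aa": 428, "18dc09ah": 168, "blc3_80774": 1830, "blc3_81430": 86}  # -bs only

def slowdown(wu: str) -> float:
    """Ratio of the no-unroll elapsed time to the -unroll 5 elapsed time for one WU."""
    return no_unroll[wu] / unroll5[wu]

for wu in unroll5:
    print(f"{wu:12s} {unroll5[wu]:5d} s -> {no_unroll[wu]:5d} s  ({slowdown(wu):.2f}x)")
```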
ID: 1834793


 
©2017 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.