I've Built a Couple OSX CUDA Apps...

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1834798 - Posted: 8 Dec 2016, 13:08:27 UTC - in response to Message 1834793.  
Last modified: 8 Dec 2016, 13:33:16 UTC

That brings things much closer to an explanation :). In the early days of my Windows build of zi+a, I was heavily using my PC for work and entertainment, so I was running without unroll to keep the load low. After all, I was more interested in validation than throughput. Later I wanted to see what it could do, and opened up unroll (Petri also wanted to see my 980 open the unroll throttle). Things went south, though I was too busy to join the dots, thinking I had broken something or my 980 might have gone bad.

It should ultimately turn out to be relatively simple to fix. I should be able to browse the unroll implementation, while Petri and/or Raistmer look from different angles.

[Edit:]
Yep, removing the Unroll fixes it.
So, what did you say I need to add to solve the Unroll bug?

It's going to depend on how Petri implemented unroll: whether it has interior bugs or something is wrong with the subsequent postprocessing. The broad general term is synchronisation, but the problem could be either deep inside the kernels at instruction level, or in the subsequent reductions. My feeling says turning off unroll removes an outer race condition, which would be the simplest to fix, but more insidious possibilities exist if he used warp reduction techniques in the postprocessing. That last ('volatile' warp reduction) possibility could be harder to pin down and fix. I will only be able to clarify that after a lot of digging.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1834802 - Posted: 8 Dec 2016, 13:44:56 UTC - in response to Message 1834798.  

Seems to definitely be the unroll, just using unroll 2 brings the Errors back;

Current WU: 18au09aa.4654.85539.7.34.226.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8649 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 2 -device 0
      413.41 real       119.72 user        39.11 sys
Elapsed Time : ……………………………… 414 seconds
Speed compared to default : 2089 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      5      5      5      0        0      5      5      5      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      4      4      4      1        0      4      4      4      1
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      0      1      1      1      0        0      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0     12     12     12      2        0     12     12     12      2

Unmatched signal(s) in R1 at line(s) 422 611
Unmatched signal(s) in R2 at line(s) 422 611
For R1:R2 matched signals only, Q= 99.95%
Result      : Weakly similar.
---------------------------------------------------
Done with 18au09aa.4654.85539.7.34.226.wu.
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 2 -device 0
      169.59 real        22.36 user        12.81 sys
Elapsed Time : ……………………………… 169 seconds
Speed compared to default : 2081 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      1        0      0      0      0      1
      Triplet      0      3      3      3      0        0      3      3      3      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      7      7      7      2        1      7      7      7      2

Unmatched signal(s) in R1 at line(s) 393 473
Unmatched signal(s) in R2 at line(s) 393 473
For R1:R2 matched signals only, Q= 99.99%
Result      : Weakly similar.
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
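For anyone reading the logs: the "Speed compared to default" figure looks like the saved reference time divided by the new elapsed time, shown as a truncated whole percentage. The sketch below is inferred from the numbers in the two runs above, not taken from the benchmark script, and `speed_percent` is a hypothetical name.

```python
def speed_percent(reference_seconds: int, elapsed_seconds: int) -> int:
    """Reference time over elapsed time as a whole percentage, truncated.

    Inferred from the figures in the logs, not from the MBbench script itself.
    """
    return reference_seconds * 100 // elapsed_seconds


# (reference seconds, elapsed seconds) for the two runs in the log above.
runs = {
    "18au09aa": (8649, 414),
    "18dc09ah": (3517, 169),
}

for name, (reference, elapsed) in runs.items():
    print(f"{name}: {speed_percent(reference, elapsed)} %")
```

Truncation rather than rounding is an assumption that happens to match the logged values.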

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1834803 - Posted: 8 Dec 2016, 13:48:45 UTC - in response to Message 1834802.  
Last modified: 8 Dec 2016, 13:49:24 UTC

OK, will sleep on how to determine whether it's the simpler outer case or the more complex one. Intuition says the simpler case, because a different OS/driver implementation could easily return payloads in another sequence (as Raistmer and I were saying in different ways). Life was much simpler under the good old CUDA default stream 0 ;)
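To make the "simpler outer case" concrete: if each unrolled chunk hands back its own best candidate and the host keeps whichever near-equal score it happens to see first, the reported best depends on arrival order. This is a rough sketch with made-up candidates and hypothetical names (`Candidate`, `merge_first_wins`, `merge_deterministic`); it is not Petri's code, and it only models the arrival-order effect, not a race inside the kernels or reductions.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Candidate:
    chunk: int    # index of the unrolled chunk that produced it (hypothetical)
    score: float


def merge_first_wins(arrivals):
    """Order-dependent: a later candidate must strictly beat the current best."""
    best = None
    for cand in arrivals:
        if best is None or cand.score > best.score:
            best = cand
    return best


def merge_deterministic(arrivals):
    """Order-independent: ties on score are broken by the lowest chunk index."""
    return max(arrivals, key=lambda c: (c.score, -c.chunk))


# Made-up data: chunks 1 and 2 tie on score.
candidates = [Candidate(0, 1.001), Candidate(1, 1.109), Candidate(2, 1.109)]

first_wins = {merge_first_wins(p).chunk for p in permutations(candidates)}
deterministic = {merge_deterministic(p).chunk for p in permutations(candidates)}

print("first-wins picks chunk(s):", sorted(first_wins))
print("deterministic picks chunk(s):", sorted(deterministic))
```

If the real problem is of this kind, fixing the merge order (or the tie-break) on the host side should be enough; if results still differ, the deeper cases are the ones to look at.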

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1841150 - Posted: 10 Jan 2017, 2:56:29 UTC

I found a day-old e-mail that had p_zi3l. Check these results with the same 4 Problem tasks run on p_zi+ and p_zi3l. The p_zi+ app is using the cuda 6.5 libraries, which is usually faster, and the p_zi3l is using the cuda 7.5 libraries.
KWSN-Darwin-MBbench v2.1.07
Running on TomsMacPro.local at Tue Jan 10 02:03:17 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
18au09aa.4654.85539.7.34.226.wu 18dc09ah.26284.16432.6.33.125.wu blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu

Listing executable(s) in /APPS :
setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 18au09aa.4654.85539.7.34.226.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8649 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
      344.17 real       105.96 user        32.83 sys
Elapsed Time : ……………………………… 344 seconds
Speed compared to default : 2514 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      5      5      5      0        0      5      5      5      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      4      4      4      1        0      4      4      4      1
      Triplet      0      0      0      0      0        0      0      0      0      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      0      1      1      1      0        0      1      1      1      0
   Best Pulse      0      0      0      0      1        0      0      0      0      1
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0     12     12     12      2        0     12     12     12      2

Unmatched signal(s) in R1 at line(s) 422 611
Unmatched signal(s) in R2 at line(s) 422 611
For R1:R2 matched signals only, Q= 99.95%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
      323.62 real        45.73 user        31.80 sys
Elapsed Time : ……………………………… 324 seconds
Speed compared to default : 2669 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.96%
---------------------------------------------------
Done with 18au09aa.4654.85539.7.34.226.wu.
Current WU: 18dc09ah.26284.16432.6.33.125.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 3517 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
      144.11 real        17.22 user        11.44 sys
Elapsed Time : ……………………………… 144 seconds
Speed compared to default : 2442 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      0      0      0      0        0      0      0      0      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      1        0      0      0      0      1
      Triplet      0      3      3      3      0        0      3      3      3      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      8      8      8      1        1      8      8      8      1

Unmatched signal(s) in R1 at line(s) 393
Unmatched signal(s) in R2 at line(s) 393
For R1:R2 matched signals only, Q= 99.70%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
      141.81 real        21.12 user        14.95 sys
Elapsed Time : ……………………………… 142 seconds
Speed compared to default : 2476 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.69%
---------------------------------------------------
Done with 18dc09ah.26284.16432.6.33.125.wu.
Current WU: blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 6543 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
      797.86 real        25.95 user        16.11 sys
Elapsed Time : ……………………………… 798 seconds
Speed compared to default : 819 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      1      1      1      0        0      1      1      1      0
     Autocorr      0      0      0      0      0        0      0      0      0      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     17     17     17      1        0     17     17     17      1
      Triplet      1      1      1      1      0        1      1      1      1      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      1      1      1      1      0        1      1      1      1      0
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      1      1      1      1      0        1      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   3     24     24     24      1        3     24     24     24      1

Unmatched signal(s) in R1 at line(s) 663
Unmatched signal(s) in R2 at line(s) 663
For R1:R2 matched signals only, Q= 99.25%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
      551.23 real        30.36 user        21.44 sys
Elapsed Time : ……………………………… 552 seconds
Speed compared to default : 1185 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.25%
---------------------------------------------------
Done with blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu.
Current WU: blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 433 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
       43.40 real         3.01 user         1.90 sys
Elapsed Time : ……………………………… 43 seconds
Speed compared to default : 1006 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      1     15     15     15      0        1     15     15     15      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0     11     11     12      1        0     11     11     12      1
      Triplet      0      2      2      2      0        0      2      2      2      0
   Best Spike      0      0      0      0      0        0      0      0      0      0
Best Gaussian      0      0      0      0      0        0      0      0      0      0
   Best Pulse      0      0      0      0      0        0      0      0      0      0
 Best Triplet      0      0      0      0      0        0      0      0      0      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1     28     28     29      1        1     28     28     29      1

Unmatched signal(s) in R1 at line(s) 356
Unmatched signal(s) in R2 at line(s) 356
For R1:R2 matched signals only, Q= 38.66%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0
       34.78 real         3.88 user         2.46 sys
Elapsed Time : ……………………………… 35 seconds
Speed compared to default : 1237 %
-----------------
Comparing results
Result      : Strongly similar,  Q= 99.32%
---------------------------------------------------
Done with blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu.

Done with Benchmark run! Removing temporary files!
TomsMacPro:KWSN-OSX-bench-MB Tom$

Nice. I'm running it on my Mac now.

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1841162 - Posted: 10 Jan 2017, 4:53:22 UTC - in response to Message 1841150.  

I found a day old e-mail, that had p_zi3l. Check these results with the same 4 Problem tasks run on p_zi+ and p_zi3l. The p_zi+ app is using the cuda 6.5 libraries, which is usually faster, and the p_zi3l is using the cuda 7.5 libraries.
...


Hi TBar,

The l version is fast and accurate, but suffers from 'EXECUTION TIME LIMIT EXCEEDED', i.e. it locks up at random intervals (situations).
It is not ready for publication.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1841169 - Posted: 10 Jan 2017, 5:53:52 UTC - in response to Message 1841162.  

Thanks Petri, I'll keep an eye on it. I did finally track down those random Mac CUDA Driver restarts, seems it is the p_zi3k App. It will cause a CUDA driver restart every day or two on my Mac. The Ubuntu p_zi3k version doesn't have that problem, I've never had a driver restart with p_zi3k on the Linux machine. We'll see how this version works.

I'm still trying to decide what to do about CUDA Toolkit 6.0. It seems that Toolkit 6.0 doesn't know anything about sm_37 even though it supports sm_50. You have to remove sm_37 to compile the App in Toolkit 6. All those Tesla K80s are sm_37. I suppose if I added -gencode arch=compute_35,code=sm_35 along with -gencode arch=compute_50,code=sm_50 it would work on the K80s. I don't know if the last CUDA 6 App will work with those K80s or not, it only has -gencode arch=compute_50,code=sm_50 and doesn't have anything about sm_37. Strange Toolkit 6 doesn't support sm_37. Maybe I should just use Toolkit 7.5 in Ubuntu, but then people would have to work to get the 7.5 Linux CUDA Libraries...
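The K80 question can be sketched with the compatibility rule as I understand it from the CUDA documentation: a cubin built for sm_X.y runs on devices of compute capability X.z with z >= y, and embedded PTX can be JIT-compiled on any device at or above its compute level. The helper names below are hypothetical and the target lists are examples, not the actual makefile contents.

```python
def cubin_runs_on(cubin_cc, device_cc):
    """Binary (SASS) compatibility: same major version, device minor >= cubin minor."""
    return cubin_cc[0] == device_cc[0] and device_cc[1] >= cubin_cc[1]


def build_runs_on(sm_targets, ptx_targets, device_cc):
    """True if any embedded cubin fits the device, or any embedded PTX can be JIT-compiled."""
    if any(cubin_runs_on(cc, device_cc) for cc in sm_targets):
        return True
    return any(ptx_cc <= device_cc for ptx_cc in ptx_targets)


def gencode_flags(sm_targets):
    """nvcc flags that embed a cubin (no PTX) for each target."""
    return [
        f"-gencode arch=compute_{major}{minor},code=sm_{major}{minor}"
        for major, minor in sm_targets
    ]


K80 = (3, 7)                       # Tesla K80 is sm_37
sm50_only = [(5, 0)]               # what the last CUDA 6 app was built with
sm35_and_sm50 = [(3, 5), (5, 0)]   # the combination suggested above

print("sm_50 only runs on K80:", build_runs_on(sm50_only, [], K80))
print("sm_35 + sm_50 runs on K80:", build_runs_on(sm35_and_sm50, [], K80))
print(" ".join(gencode_flags(sm35_and_sm50)))
```

Under that rule an sm_50-only build would not load on a K80, while adding sm_35 (or sm_30) should; whether the kernels then behave correctly on the K80 is a separate question that only a test run can answer.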

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1841212 - Posted: 10 Jan 2017, 13:45:48 UTC - in response to Message 1841169.  

Thanks Petri, I'll keep an eye on it. I did finally track down those random Mac CUDA Driver restarts, seems it is the p_zi3k App. It will cause a CUDA driver restart every day or two on my Mac. The Ubuntu p_zi3k version doesn't have that problem, I've never had a driver restart with p_zi3k on the Linux machine. We'll see how this version works.

I'm still trying to decide what to do about CUDA Toolkit 6.0. It seems that Toolkit 6.0 doesn't know anything about sm_37 even though it supports sm_50. You have to remove sm_37 to compile the App in Toolkit 6. All those Tesla K80s are sm_37. I suppose if I added -gencode arch=compute_35,code=sm_35 along with -gencode arch=compute_50,code=sm_50 it would work on the K80s. I don't know if the last CUDA 6 App will work with those K80s or not, it only has -gencode arch=compute_50,code=sm_50 and doesn't have anything about sm_37. Strange Toolkit 6 doesn't support sm_37. Maybe I should just use Toolkit 7.5 in Ubuntu, but then people would have to work to get the 7.5 Linux CUDA Libraries...


Hi,
thanks for all your hard work. The K80 may need sm_30 if native sm_37 is missing. I have to send you another version for testing since I have not had any lockups now for several hours.
One user (Gianfranco from Italy) is compiling his own executable for Mac from my source (I do not know if he makes any modifications before compilation). He's getting occasional errors (executable aborts), but since I'm not a Mac user I cannot tell what kind of errors he has.

Petri

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1841412 - Posted: 11 Jan 2017, 13:12:35 UTC - in response to Message 1841212.  

So far I haven't had any trouble on the Mac with p_zi3l, I did have two Hangs on the Ubuntu machine with p_zi3l. I changed the Ubuntu machine over to p_zi3m last night and haven't had any hangs since.
The last I checked Gianfranco is running a Hackintosh, so, I wouldn't expect things to be the same. Some Hacks work fine with most Mac Apps, some don't.
I'm still getting more Inconclusives than I would expect considering the Benchmark runs. I'll see how it looks tomorrow.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1841414 - Posted: 11 Jan 2017, 13:23:41 UTC - in response to Message 1841412.  
Last modified: 11 Jan 2017, 13:24:38 UTC

Yeah, since v8 all known good applications have converged a lot, so the benching practices need refinement, probably with some hand selected/tweaked tasks to expose specific weaknesses. We lost our expert in that area (Joe), and no-one seems to know for sure what happened. Most likely it's time to transition to internal regression for most applications (like stock CPU has), but that takes a level of organisation resisted by the state of flux with all platforms at the moment.

Just tell the OS people to stop being 'Special Snowflakes', and concentrate on supporting developers instead of lining their pockets.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1842588 - Posted: 16 Jan 2017, 13:15:28 UTC

I gave up on CUDA 6. Trying to support sm_37 was giving some bad side effects. So, on to CUDA 6.5 with p_zi3m, which supports not only sm_37 but sm_52 as well. It looks promising, but I'm still seeing PulseFind problems. One task from the Mac: http://setiathome.berkeley.edu/workunit.php?wuid=2396809916. It looks as though it just finished, so you can't download the task; I have a copy, however. It seems to be choosing the wrong signal. The GPU result shows:
Pulse: peak=7.264187, time=32.48, period=3.233, d_freq=1420904567.58, score=1.001, chirp=-52.034, fft_len=512
Pulse: peak=5.197393, time=38.98, period=1.979, d_freq=1420904348.58, score=1.001, chirp=62.587, fft_len=512
Pulse: peak=5.612758, time=38.98, period=1.979, d_freq=1420904348.67, score=1.081, chirp=63.079, fft_len=512
Pulse: peak=5.212731, time=38.98, period=1.966, d_freq=1420904348.73, score=1.004, chirp=63.57, fft_len=512
Pulse: peak=5.621056, time=38.98, period=1.979, d_freq=1420904348.78, score=1.082, chirp=64.061, fft_len=512
Pulse: peak=5.264318, time=38.98, period=1.979, d_freq=1420904348.84, score=1.014, chirp=64.551, fft_len=512
Best pulse: peak=5.758566, time=38.98, period=1.979, d_freq=1420904348.73, score=1.109, chirp=63.57, fft_len=512
The Best Pulse isn't listed on the GPU result, but is listed on my CPU result;
Pulse: peak=7.264189, time=32.48, period=3.233, d_freq=1420904567.58, score=1.001, chirp=-52.034, fft_len=512
Pulse: peak=5.197394, time=38.98, period=1.979, d_freq=1420904348.58, score=1.001, chirp=62.587, fft_len=512
Pulse: peak=5.612758, time=38.98, period=1.979, d_freq=1420904348.67, score=1.081, chirp=63.079, fft_len=512
Pulse: peak=5.758566, time=38.98, period=1.979, d_freq=1420904348.73, score=1.109, chirp=63.57, fft_len=512
Pulse: peak=5.621056, time=38.98, period=1.979, d_freq=1420904348.78, score=1.082, chirp=64.061, fft_len=512
Pulse: peak=5.264318, time=38.98, period=1.979, d_freq=1420904348.84, score=1.013, chirp=64.551, fft_len=512
Best pulse: peak=5.758566, time=38.98, period=1.979, d_freq=1420904348.73, score=1.109, chirp=63.57, fft_len=512
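A quick way to see exactly which signal differs is to pair the GPU and CPU pulses by chirp and compare the fields. This is only a sketch: the 1% relative tolerance is an arbitrary assumption and not the validator's criteria, and `parse` and `mismatches` are hypothetical helper names.

```python
import math
import re

# Pulse lines copied from the GPU and CPU results above.
GPU = """\
Pulse: peak=7.264187, time=32.48, period=3.233, d_freq=1420904567.58, score=1.001, chirp=-52.034, fft_len=512
Pulse: peak=5.197393, time=38.98, period=1.979, d_freq=1420904348.58, score=1.001, chirp=62.587, fft_len=512
Pulse: peak=5.612758, time=38.98, period=1.979, d_freq=1420904348.67, score=1.081, chirp=63.079, fft_len=512
Pulse: peak=5.212731, time=38.98, period=1.966, d_freq=1420904348.73, score=1.004, chirp=63.57, fft_len=512
Pulse: peak=5.621056, time=38.98, period=1.979, d_freq=1420904348.78, score=1.082, chirp=64.061, fft_len=512
Pulse: peak=5.264318, time=38.98, period=1.979, d_freq=1420904348.84, score=1.014, chirp=64.551, fft_len=512
"""

CPU = """\
Pulse: peak=7.264189, time=32.48, period=3.233, d_freq=1420904567.58, score=1.001, chirp=-52.034, fft_len=512
Pulse: peak=5.197394, time=38.98, period=1.979, d_freq=1420904348.58, score=1.001, chirp=62.587, fft_len=512
Pulse: peak=5.612758, time=38.98, period=1.979, d_freq=1420904348.67, score=1.081, chirp=63.079, fft_len=512
Pulse: peak=5.758566, time=38.98, period=1.979, d_freq=1420904348.73, score=1.109, chirp=63.57, fft_len=512
Pulse: peak=5.621056, time=38.98, period=1.979, d_freq=1420904348.78, score=1.082, chirp=64.061, fft_len=512
Pulse: peak=5.264318, time=38.98, period=1.979, d_freq=1420904348.84, score=1.013, chirp=64.551, fft_len=512
"""


def parse(block):
    """Map chirp -> {field: value} for every non-empty 'Pulse:' line."""
    pulses = {}
    for line in block.splitlines():
        if not line.strip():
            continue
        fields = {k: float(v) for k, v in re.findall(r"(\w+)=(-?[\d.]+)", line)}
        pulses[fields["chirp"]] = fields
    return pulses


def mismatches(gpu, cpu, rel_tol=0.01):
    """Return {chirp: [fields that differ by more than rel_tol]}."""
    report = {}
    for chirp, g in gpu.items():
        c = cpu.get(chirp)
        if c is None:
            report[chirp] = ["missing"]
            continue
        bad = [f for f in ("peak", "period", "score")
               if not math.isclose(g[f], c[f], rel_tol=rel_tol)]
        if bad:
            report[chirp] = bad
    return report


for chirp, fields in mismatches(parse(GPU), parse(CPU)).items():
    print(f"chirp={chirp}: differs in {', '.join(fields)}")
```

With these inputs only the chirp=63.57 pulse is flagged, which is the one the CPU reports as Best pulse.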

Still a Race?

Otherwise the Linux version looks promising, still a large number of Inconclusives with the Mac version.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1842609 - Posted: 16 Jan 2017, 14:28:40 UTC - in response to Message 1842588.  
Last modified: 16 Jan 2017, 14:30:34 UTC

Still a Race?

Otherwise the Linux version looks promising, still a large number of Inconclusives with the Mac version.


Quite probably (same time yet different lower period+score). That's across the unroll still. In between some family responsibilities, I've managed to work out a way to possibly retain the full speed code, allow some further optimisations, and produce the expected results. Will probably be end of the week before I can dig in properly, and I need to poke into updated code Petri sent me also.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1848071 - Posted: 11 Feb 2017, 17:05:40 UTC
Last modified: 11 Feb 2017, 17:14:24 UTC

The CPU AVX2 App is now posted along with the ATI5 GPU App. If you have an Intel iGPU you should Disable it and just run the CPU. You should free at least one CPU core if you are also running a GPU App. Due to the Boost function, running 50% CPU cores will result in better CPU run-times. This is a machine running the AVX2 App on an i7-6700HQ CPU @ 2.60GHz using 50% CPU cores without the Intel iGPU, https://setiathome.berkeley.edu/results.php?hostid=8177300&offset=140
The files are here, http://www.arkayn.us/forum/index.php?topic=191.0

Still nothing new on the nVidia Special App. At present the OSX version is still producing twice as many Inconclusive results as the Linux version.

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1848088 - Posted: 11 Feb 2017, 18:54:02 UTC - in response to Message 1848071.  

Still nothing new on the nVidia Special App. At present the OSX version is still producing twice as many Inconclusive results as the Linux version.


Should be able to do a little more poking once the ambient daytime temps drop below 40 Celsius. At that point I will temporarily switch the GTX 780 back in, and try for parallel Linux and OSX builds against baseline. If Petri's last contributions improve what was in alpha, I'll wedge in some comparison code for isolating the differences. There are some simple options to try if races are still present. Will know more later in the week.

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1848123 - Posted: 11 Feb 2017, 22:26:56 UTC

@jason_gee
I'll drop some new honeycombed code to you and TBar to test with.
'To honeycomb' == look at it from at least 6 angles and choose the best.

I'll be back late on Sunday here. So, till then.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1849451 - Posted: 18 Feb 2017, 4:36:18 UTC
Last modified: 18 Feb 2017, 5:13:55 UTC

I've stumbled upon a problem with the 41p_zi3m build. Seems it has a problem with a certain angle range Arecibo task, around 1.117012 & 0.917102. It gives the wrong Best gaussian number.
http://setiathome.berkeley.edu/workunit.php?wuid=2436105125
Best gaussian: peak=3.997493 versus Best gaussian: peak=7.937101
http://setiathome.berkeley.edu/workunit.php?wuid=2435132024
Best gaussian: peak=4.078763 versus Best gaussian: peak=6.91799
On the Mac I've tried three different builds, cuda 8.0, cuda 7.5, and cuda 6.5. They all have the same problem with this angle range on multiple tasks.
One of the tasks is here; http://boinc2.ssl.berkeley.edu/sah/download_fanout/11b/26fe09ac.27681.340175.11.38.212
It seems the 41p_zi+, Cuda 6.00 special build has the same problem, at least there are a number of Inconclusive Results at that angle range, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=20&state=3

Since this is a Best gaussian versus a Best pulse, maybe it will help with tracking down the Race...

*shrugs*

petri33
Volunteer tester
Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1849472 - Posted: 18 Feb 2017, 6:20:40 UTC - in response to Message 1849451.  

Thank you TBar. I'll test that. Hope I find something.

TBar
Volunteer tester
Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1849473 - Posted: 18 Feb 2017, 6:31:25 UTC - in response to Message 1849472.  

Seems it's been around for a while, even the Old 41p_zi which doesn't have the unroll function fails on that task;
Running on TomsMacPro.local at Sat Feb 18 06:18:43 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
26fe09ac.27681.340175.11.38.212.wu

Listing executable(s) in /APPS :
setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 setiathome_x41p_zi_x86_64-apple-darwin_cuda80

Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 26fe09ac.27681.340175.11.38.212.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 7585 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 8 -device 0
      233.01 real        64.15 user        21.62 sys
Elapsed Time : ……………………………… 234 seconds
Speed compared to default : 3241 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      2      2      2      0        0      2      2      2      0
     Autocorr      0      1      1      1      0        0      1      1      1      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      0        0      0      0      0      0
      Triplet      0      2      2      2      0        0      2      2      2      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      0      0      0      0      1        0      0      0      0      1
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   0      9      9      9      1        0      9      9      9      1

Unmatched signal(s) in R1 at line(s) 458
Unmatched signal(s) in R2 at line(s) 458
For R1:R2 matched signals only, Q= 99.73%
Result      : Weakly similar.
---------------------------------------------------
Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda80 -device 0
      232.82 real        64.44 user        20.98 sys
Elapsed Time : ……………………………… 233 seconds
Speed compared to default : 3255 %
-----------------
Comparing results
                ------------- R1:R2 ------------     ------------- R2:R1 ------------
                Exact  Super  Tight  Good    Bad     Exact  Super  Tight  Good    Bad
        Spike      0      2      2      2      0        0      2      2      2      0
     Autocorr      0      1      1      1      0        0      1      1      1      0
     Gaussian      0      0      0      0      0        0      0      0      0      0
        Pulse      0      0      0      0      0        0      0      0      0      0
      Triplet      1      2      2      2      0        1      2      2      2      0
   Best Spike      0      1      1      1      0        0      1      1      1      0
Best Autocorr      0      1      1      1      0        0      1      1      1      0
Best Gaussian      0      0      0      0      1        0      0      0      0      1
   Best Pulse      0      1      1      1      0        0      1      1      1      0
 Best Triplet      0      1      1      1      0        0      1      1      1      0
                ----   ----   ----   ----   ----     ----   ----   ----   ----   ----
                   1      9      9      9      1        1      9      9      9      1

Unmatched signal(s) in R1 at line(s) 458
Unmatched signal(s) in R2 at line(s) 458
For R1:R2 matched signals only, Q= 99.72%
Result      : Weakly similar.
---------------------------------------------------
Done with 26fe09ac.27681.340175.11.38.212.wu.

Done with Benchmark run! Removing temporary files!
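
For anyone reading the log above, the "Speed compared to default" lines appear to be just the ratio of the saved reference time to the app's elapsed time, as a percentage. A small Python sketch of that arithmetic follows; the function name `speed_percent` is made up, and whether the benchmark script rounds or truncates is an assumption (both give the same result for these numbers).

```python
# Hypothetical reconstruction of the "Speed compared to default" figure:
# reference (CPU) elapsed time divided by the app's elapsed time, in percent.
# The rounding mode is an assumption; the benchmark script may differ.
def speed_percent(reference_seconds: float, app_seconds: float) -> int:
    return round(100 * reference_seconds / app_seconds)


# Elapsed times taken from the benchmark log above.
print(speed_percent(7585, 234))  # zi+ cuda60 with -unroll 8 -> 3241
print(speed_percent(7585, 233))  # zi cuda80 without unroll   -> 3255
```

So both builds run about 32 times faster than the saved CPU reference on this task, and both show the same single Bad Best Gaussian.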
ID: 1849473
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1849496 - Posted: 18 Feb 2017, 9:35:22 UTC - in response to Message 1849473.  

Hi,
the gaussian search is unaffected by unroll. Unroll is used only in pulse find.
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1849496
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1849507 - Posted: 18 Feb 2017, 10:39:15 UTC

I would recommend checking vanilla zi vs. vanilla CPU on Gaussians in extreme circumstances, because they have been problematic on and off at different times. They are relatively sensitive to platform variation (though that shouldn't affect the best-selection logic, of course).
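
As a generic illustration of one kind of platform variation (not a claim about what is going wrong in this build): floating-point addition is not associative, so a GPU-style reduction that groups terms differently from a sequential CPU loop can land on a slightly different value. A minimal Python sketch with made-up numbers:

```python
import math

# Made-up values chosen so that the order of addition visibly matters
# in IEEE-754 double precision.
values = [1e16, 1.0, -1e16, 1.0]

# Sequential left-to-right sum, as a simple CPU loop would do it.
left_to_right = 0.0
for v in values:
    left_to_right += v

# Same terms grouped differently, as a parallel reduction might.
reordered = (values[0] + values[2]) + (values[1] + values[3])

print(left_to_right)      # 1.0 (the first 1.0 is lost next to 1e16)
print(reordered)          # 2.0
print(math.fsum(values))  # 2.0, the correctly rounded sum
```

Differences of this kind are normally tiny relative to the signal scores, which is why they shouldn't change which candidate is picked as best.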
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1849507
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1849944 - Posted: 19 Feb 2017, 11:39:26 UTC - in response to Message 1849507.  

I would recommend checking vanilla zi vs. vanilla CPU on Gaussians in extreme circumstances, because they have been problematic on and off at different times. They are relatively sensitive to platform variation (though that shouldn't affect the best-selection logic, of course).


Hi,

I PM'ed you, TBar, and Gianfranco. There's now a fix for the Gaussian finding.

Petri
To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1849944
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.