I've Built a Couple OSX CUDA Apps...
Author | Message |
---|---|
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
That brings things much closer to an explanation :). In the early days of my Windows build of zi+a, I was heavily using my PC for work and entertainment, so I was running without unroll to keep the load low. After all, I was more interested in validation than throughput. Later I wanted to see what it could do, and opened up unroll (Petri also wanted to see my 980 open the unroll throttle). Things went south, though I was too busy to join the dots, thinking I had broken something or that my 980 might have gone bad. It should ultimately turn out to be relatively simple to fix. I should be able to browse the unroll implementation, while Petri and/or raistmer look from different angles. [Edit:] Yep, removing the unroll fixes it. It's going to depend on how Petri implemented unroll: whether it has interior bugs or something is wrong with the subsequent postprocessing. The broad general term is synchronisation, but the fault could be either deep inside the kernels at instruction level, or in the subsequent reductions. My feeling is that turning off unroll removes an outer race condition, which would be the simplest to fix, but more insidious possibilities exist if he used warp reduction techniques in the postprocessing. That last ('volatile' warp reduction) possibility could be harder to pin down and fix. I will only be able to clarify that after a lot of digging. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Seems to definitely be the unroll, just using unroll 2 brings the Errors back; Current WU: 18au09aa.4654.85539.7.34.226.wu --------------------------------------------------- Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 8649 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 2 -device 0 413.41 real 119.72 user 39.11 sys Elapsed Time : ……………………………… 414 seconds Speed compared to default : 2089 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 5 5 5 0 0 5 5 5 0 Autocorr 0 0 0 0 0 0 0 0 0 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 4 4 4 1 0 4 4 4 1 Triplet 0 0 0 0 0 0 0 0 0 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 0 1 1 1 0 0 1 1 1 0 Best Pulse 0 0 0 0 1 0 0 0 0 1 Best Triplet 0 0 0 0 0 0 0 0 0 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 0 12 12 12 2 0 12 12 12 2 Unmatched signal(s) in R1 at line(s) 422 611 Unmatched signal(s) in R2 at line(s) 422 611 For R1:R2 matched signals only, Q= 99.95% Result : Weakly similar. --------------------------------------------------- Done with 18au09aa.4654.85539.7.34.226.wu. 
Current WU: 18dc09ah.26284.16432.6.33.125.wu --------------------------------------------------- Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 3517 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 2 -device 0 169.59 real 22.36 user 12.81 sys Elapsed Time : ……………………………… 169 seconds Speed compared to default : 2081 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 0 0 0 0 0 0 0 0 0 Autocorr 0 0 0 0 0 0 0 0 0 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 0 0 0 1 0 0 0 0 1 Triplet 0 3 3 3 0 0 3 3 3 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 1 1 1 1 0 1 1 1 1 0 Best Pulse 0 0 0 0 1 0 0 0 0 1 Best Triplet 0 1 1 1 0 0 1 1 1 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 1 7 7 7 2 1 7 7 7 2 Unmatched signal(s) in R1 at line(s) 393 473 Unmatched signal(s) in R2 at line(s) 393 473 For R1:R2 matched signals only, Q= 99.99% Result : Weakly similar. --------------------------------------------------- Done with 18dc09ah.26284.16432.6.33.125.wu. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
OK, will sleep on how to determine if it's the simpler outer case or the more complex one. Intuition is saying the simpler case, because different OS/Driver implementation could easily return payloads in another sequence (As Raistmer and myself were saying in different ways). Life was much simpler under good old Cuda default stream 0 ;) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
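The ordering concern in the post above can be illustrated with a minimal sketch. This is not the app's code: `Candidate`, `best_first_wins`, and `best_deterministic` are hypothetical names, and the scores are made up. It assumes the "best" signal is chosen by scanning payloads in arrival order, so candidates with equal scores resolve differently when the payloads come back in another sequence. Adding a fixed tie-break key removes that dependence.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Candidate:
    score: float
    period: float
    index: int  # hypothetical position of the candidate in the search space


def best_first_wins(arrivals):
    """Keep the first candidate seen with the highest score (order dependent on ties)."""
    best = None
    for c in arrivals:
        if best is None or c.score > best.score:
            best = c
    return best


def best_deterministic(arrivals):
    """Break score ties on a fixed key (lowest index), so arrival order no longer matters."""
    return max(arrivals, key=lambda c: (c.score, -c.index))


# Made-up payloads: two candidates tie on score, one is clearly weaker.
payloads = [
    Candidate(1.001, 3.233, 7),
    Candidate(1.001, 2.917, 3),
    Candidate(0.950, 4.100, 1),
]

# Try every possible arrival order and collect the index of the winner.
order_dependent = {best_first_wins(p).index for p in permutations(payloads)}
order_free = {best_deterministic(p).index for p in permutations(payloads)}

print(sorted(order_dependent))  # more than one winner across orderings
print(sorted(order_free))       # a single winner
```

Whether the real fault is an outer ordering issue like this one or something deeper inside the kernels is exactly the open question in the thread; the sketch only shows why the outer case would be the simpler one to fix.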
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I found a day old e-mail, that had p_zi3l. Check these results with the same 4 Problem tasks run on p_zi+ and p_zi3l. The p_zi+ app is using the cuda 6.5 libraries, which is usually faster, and the p_zi3l is using the cuda 7.5 libraries. KWSN-Darwin-MBbench v2.1.07 Running on TomsMacPro.local at Tue Jan 10 02:03:17 2017 --------------------------------------------------- Starting benchmark run... --------------------------------------------------- Listing wu-file(s) in /testWUs : 18au09aa.4654.85539.7.34.226.wu 18dc09ah.26284.16432.6.33.125.wu blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu Listing executable(s) in /APPS : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 Listing executable in /REF_APPs : MBv8_8.05r3344_sse41_x86_64-apple-darwin --------------------------------------------------- Current WU: 18au09aa.4654.85539.7.34.226.wu --------------------------------------------------- Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 8649 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 344.17 real 105.96 user 32.83 sys Elapsed Time : ……………………………… 344 seconds Speed compared to default : 2514 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 5 5 5 0 0 5 5 5 0 Autocorr 0 0 0 0 0 0 0 0 0 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 4 4 4 1 0 4 4 4 1 Triplet 0 0 0 0 0 0 0 0 0 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 0 1 1 1 0 0 1 1 1 0 Best Pulse 0 0 0 0 1 0 0 0 0 1 Best Triplet 0 0 0 0 0 0 0 0 0 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 0 12 12 12 2 0 12 12 12 2 Unmatched signal(s) in R1 at line(s) 422 
611 Unmatched signal(s) in R2 at line(s) 422 611 For R1:R2 matched signals only, Q= 99.95% Result : Weakly similar. --------------------------------------------------- Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 323.62 real 45.73 user 31.80 sys Elapsed Time : ……………………………… 324 seconds Speed compared to default : 2669 % ----------------- Comparing results Result : Strongly similar, Q= 99.96% --------------------------------------------------- Done with 18au09aa.4654.85539.7.34.226.wu. Current WU: 18dc09ah.26284.16432.6.33.125.wu --------------------------------------------------- Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 3517 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 144.11 real 17.22 user 11.44 sys Elapsed Time : ……………………………… 144 seconds Speed compared to default : 2442 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 0 0 0 0 0 0 0 0 0 Autocorr 0 0 0 0 0 0 0 0 0 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 0 0 0 1 0 0 0 0 1 Triplet 0 3 3 3 0 0 3 3 3 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 1 1 1 1 0 1 1 1 1 0 Best Pulse 0 1 1 1 0 0 1 1 1 0 Best Triplet 0 1 1 1 0 0 1 1 1 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 1 8 8 8 1 1 8 8 8 1 Unmatched signal(s) in R1 at line(s) 393 Unmatched signal(s) in R2 at line(s) 393 For R1:R2 matched signals only, Q= 99.70% Result : Weakly similar. 
--------------------------------------------------- Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 141.81 real 21.12 user 14.95 sys Elapsed Time : ……………………………… 142 seconds Speed compared to default : 2476 % ----------------- Comparing results Result : Strongly similar, Q= 99.69% --------------------------------------------------- Done with 18dc09ah.26284.16432.6.33.125.wu. Current WU: blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu --------------------------------------------------- Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 6543 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 797.86 real 25.95 user 16.11 sys Elapsed Time : ……………………………… 798 seconds Speed compared to default : 819 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 1 1 1 0 0 1 1 1 0 Autocorr 0 0 0 0 0 0 0 0 0 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 17 17 17 1 0 17 17 17 1 Triplet 1 1 1 1 0 1 1 1 1 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 1 1 1 1 0 1 1 1 1 0 Best Pulse 0 1 1 1 0 0 1 1 1 0 Best Triplet 1 1 1 1 0 1 1 1 1 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 3 24 24 24 1 3 24 24 24 1 Unmatched signal(s) in R1 at line(s) 663 Unmatched signal(s) in R2 at line(s) 663 For R1:R2 matched signals only, Q= 99.25% Result : Weakly similar. 
--------------------------------------------------- Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 551.23 real 30.36 user 21.44 sys Elapsed Time : ……………………………… 552 seconds Speed compared to default : 1185 % ----------------- Comparing results Result : Strongly similar, Q= 99.25% --------------------------------------------------- Done with blc3_2bit_guppi_57424_80774_HIP9480_0005.24846.0.17.26.134.vlar.wu. Current WU: blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu --------------------------------------------------- Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 433 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 43.40 real 3.01 user 1.90 sys Elapsed Time : ……………………………… 43 seconds Speed compared to default : 1006 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 1 15 15 15 0 1 15 15 15 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 11 11 12 1 0 11 11 12 1 Triplet 0 2 2 2 0 0 2 2 2 0 Best Spike 0 0 0 0 0 0 0 0 0 0 Best Gaussian 0 0 0 0 0 0 0 0 0 0 Best Pulse 0 0 0 0 0 0 0 0 0 0 Best Triplet 0 0 0 0 0 0 0 0 0 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 1 28 28 29 1 1 28 28 29 1 Unmatched signal(s) in R1 at line(s) 356 Unmatched signal(s) in R2 at line(s) 356 For R1:R2 matched signals only, Q= 38.66% Result : Weakly similar. 
--------------------------------------------------- Running app with command : setiathome_x41p_zi3l_x86_64-apple-darwin_cuda75 -bs -unroll 6 -device 0 34.78 real 3.88 user 2.46 sys Elapsed Time : ……………………………… 35 seconds Speed compared to default : 1237 % ----------------- Comparing results Result : Strongly similar, Q= 99.32% --------------------------------------------------- Done with blc3_2bit_guppi_57424_81430_HIP9480_0007.5224.831.17.26.71.vlar.wu. Done with Benchmark run! Removing temporary files! TomsMacPro:KWSN-OSX-bench-MB Tom$ Nice. I'm running it on my Mac now. |
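As a quick sanity check on the "Speed compared to default" figures in the log above, the sketch below recomputes them from the elapsed times. The function name `speed_vs_default` is hypothetical, and the truncating integer division is an assumption inferred from the printed numbers; the benchmark script's actual formula is not shown in the thread.

```python
def speed_vs_default(reference_seconds: int, elapsed_seconds: int) -> int:
    """Percentage speed relative to the reference run, truncated to an integer."""
    return (100 * reference_seconds) // elapsed_seconds


# (reference CPU seconds, GPU app seconds) pairs taken from the log above.
runs = [
    (8649, 344), (8649, 324),
    (3517, 144), (3517, 142),
    (6543, 798), (6543, 552),
    (433, 43), (433, 35),
]

percentages = [speed_vs_default(ref, elapsed) for ref, elapsed in runs]
for (ref, elapsed), pct in zip(runs, percentages):
    print(f"{ref:5d} s / {elapsed:3d} s -> {pct} %")
```

Under that assumption the eight values come out the same as the ones the benchmark printed, which suggests the percentages are simple ratios of the rounded elapsed times.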
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
I found a day old e-mail, that had p_zi3l. Check these results with the same 4 Problem tasks run on p_zi+ and p_zi3l. The p_zi+ app is using the cuda 6.5 libraries, which is usually faster, and the p_zi3l is using the cuda 7.5 libraries. Hi TBar, The l version is fast and accurate, but suffers from 'EXECUTION TIME LIMIT EXCEEDED', i.e. it locks up at random intervals (situations). It is not ready for publication. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Thanks Petri, I'll keep an eye on it. I did finally track down those random Mac CUDA Driver restarts, seems it is the p_zi3k App. It will cause a CUDA driver restart every day or two on my Mac. The Ubuntu p_zi3k version doesn't have that problem, I've never had a driver restart with p_zi3k on the Linux machine. We'll see how this version works. I'm still trying to decide what to do about CUDA Toolkit 6.0. It seems that Toolkit 6.0 doesn't know anything about sm_37 even though it supports sm_50. You have to remove sm_37 to compile the App in Toolkit 6, yet all those Tesla K80s are sm_37. I suppose if I added -gencode arch=compute_35,code=sm_35 along with -gencode arch=compute_50,code=sm_50 it would work on the K80s. I don't know whether the last CUDA 6 App will work with those K80s or not; it only has -gencode arch=compute_50,code=sm_50 and doesn't have anything about sm_37. Strange that Toolkit 6 doesn't support sm_37. Maybe I should just use Toolkit 7.5 in Ubuntu, but then people would have to work to get the 7.5 Linux CUDA Libraries... |
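The flag combination considered in the post above can be written out with a small helper. This is only a sketch: `gencode_flags` is a hypothetical name, and the flag format is copied from the post. Whether a build with sm_35 and sm_50 actually runs on the K80s is not confirmed anywhere in the thread.

```python
def gencode_flags(architectures):
    """Build one nvcc -gencode option per compute capability, e.g. 35 -> sm_35."""
    return [f"-gencode arch=compute_{cc},code=sm_{cc}" for cc in architectures]


# Combination floated in the post: sm_35 alongside sm_50, with sm_37 left out.
flags = gencode_flags([35, 50])
print(" ".join(flags))
```

Keeping the list of architectures in one place makes it easy to add or drop a target when switching toolkits, which is the chore described in the post.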
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Thanks Petri, I'll keep an eye on it. I did finally track down those random Mac CUDA Driver restarts, seems it is the p_zi3k App. It will cause a CUDA driver restart every day or two on my Mac. The Ubuntu p_zi3k version doesn't have that problem, I've never had a driver restart with p_zi3k on the Linux machine. We'll see how this version works. Hi, thanks for all your hard work. The K80 may need sm_30 if native sm_37 is missing. I will have to send you another version for testing, since I have not had any lockups now for several hours. One user (Gianfranco from Italy) is compiling his own executable for Mac from my source (I do not know whether he makes any modifications before compilation). He's getting occasional errors (the executable aborts), but since I'm not a Mac user I cannot tell what kind of errors he has. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
So far I haven't had any trouble on the Mac with p_zi3l, but I did have two hangs on the Ubuntu machine with p_zi3l. I changed the Ubuntu machine over to p_zi3m last night and haven't had any hangs since. The last I checked, Gianfranco is running a Hackintosh, so I wouldn't expect things to be the same. Some Hacks work fine with most Mac Apps, some don't. I'm still getting more Inconclusives than I would expect considering the Benchmark runs. I'll see how it looks tomorrow. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yeah, since all known good applications have converged a lot since v8, the benching practices need refinement, probably with some hand selected/tweaked tasks to expose specific weaknesses. We lost our expert in that area (Joe), and no-one seems to know for sure what happened. Most likely it's time to transition to internal regression for most applications (like stock CPU has), but that takes a level of organisation resisted by the state of flux with all platforms at the moment. Just tell the OS people to stop being 'Special Snowflakes', and concentrate on supporting developers instead of lining their pockets. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I gave up on CUDA 6. Trying to support sm_37 was giving some bad side effects. So, on to CUDA 6.5 with p_zi3m, which supports not only sm_37 but sm_52 as well. It looks promising, but I'm still seeing PulseFind problems. One task from the Mac: http://setiathome.berkeley.edu/workunit.php?wuid=2396809916. It looks as though it just finished, so you can't download the task; I, however, have a copy. It seems to be choosing the wrong signal. The GPU result shows; Pulse: peak=7.264187, time=32.48, period=3.233, d_freq=1420904567.58, score=1.001, chirp=-52.034, fft_len=512 The Best Pulse isn't listed on the GPU result, but is listed on my CPU result; Pulse: peak=7.264189, time=32.48, period=3.233, d_freq=1420904567.58, score=1.001, chirp=-52.034, fft_len=512 Still a Race? Otherwise the Linux version looks promising; there is still a large number of Inconclusives with the Mac version. |
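The two Pulse lines quoted above differ only in the last digit of `peak`. A minimal sketch of a field-by-field comparison with a relative tolerance follows. The names `parse_signal` and `signals_match` and the `rel_tol` value are hypothetical; this is not the project validator, whose actual tolerances are not given in the thread.

```python
import math
import re


def parse_signal(line):
    """Split 'Pulse: peak=7.26, time=32.48, ...' into a kind and a dict of floats."""
    kind, _, fields = line.partition(":")
    pairs = re.findall(r"(\w+)=([-+0-9.eE]+)", fields)
    return kind.strip(), {name: float(value) for name, value in pairs}


def signals_match(line_a, line_b, rel_tol=1e-5):
    """True when both lines describe the same kind of signal within rel_tol per field."""
    kind_a, a = parse_signal(line_a)
    kind_b, b = parse_signal(line_b)
    if kind_a != kind_b or a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], rel_tol=rel_tol) for k in a)


# The two result lines as quoted in the post.
gpu = ("Pulse: peak=7.264187, time=32.48, period=3.233, d_freq=1420904567.58, "
       "score=1.001, chirp=-52.034, fft_len=512")
cpu = ("Pulse: peak=7.264189, time=32.48, period=3.233, d_freq=1420904567.58, "
       "score=1.001, chirp=-52.034, fft_len=512")

print(signals_match(gpu, cpu))             # compares within the assumed tolerance
print(signals_match(gpu, cpu, rel_tol=0))  # exact comparison
```

With any tolerance of this kind the two reported pulses agree, so the mismatch described in the post is about which signal gets reported as best, not about the numerical values of the pulse itself.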
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Still a Race? Quite probably (same time yet different lower period+score). That's across the unroll still. In between some family responsibilities, I've managed to work out a way to possibly retain the full speed code, allow some further optimisations, and produce the expected results. Will probably be end of the week before I can dig in properly, and I need to poke into updated code Petri sent me also. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The CPU AVX2 App is now posted along with the ATI5 GPU App. If you have an Intel iGPU you should Disable it and just run the CPU. You should free at least one CPU core if you are also running a GPU App. Due to the Boost function, running 50% CPU cores will result in better CPU run-times. This is a machine running the AVX2 App on an i7-6700HQ CPU @ 2.60GHz using 50% CPU cores without the Intel iGPU, https://setiathome.berkeley.edu/results.php?hostid=8177300&offset=140 The files are here, http://www.arkayn.us/forum/index.php?topic=191.0 Still nothing new on the nVidia Special App. At present the OSX version is still producing twice as many Inconclusive results as the Linux version. |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Still nothing new on the nVidia Special App. At present the OSX version is still producing twice as many Inconclusive results as the Linux version. Should be able to do a little more poking once the ambient daytime temps drop below 40 Celsius. At that point I will temporarily switch the GTX 780 back in, and try for parallel Linux and OSX builds against baseline. If Petri's last contributions improve what was in alpha, I'll wedge in some comparison code for isolating the differences. There are some simple options to try if races are still present. Will know more later in the week. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
@jason_gee I'll drop some new honeycombed code to you and TBar to test with. 'To honeycomb' == look from at least 6 angles and choose the best. I'll be back late on Sunday here. So till then. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I've stumbled upon a problem with the 41p_zi3m build. It seems to have a problem with Arecibo tasks in a certain angle range, around 1.117012 & 0.917102. It gives the wrong Best gaussian number. http://setiathome.berkeley.edu/workunit.php?wuid=2436105125 Best gaussian: peak=3.997493 versus Best gaussian: peak=7.937101 http://setiathome.berkeley.edu/workunit.php?wuid=2435132024 Best gaussian: peak=4.078763 versus Best gaussian: peak=6.91799 On the Mac I've tried three different builds: cuda 8.0, cuda 7.5, and cuda 6.5. They all have the same problem with this angle range on multiple tasks. One of the tasks is here; http://boinc2.ssl.berkeley.edu/sah/download_fanout/11b/26fe09ac.27681.340175.11.38.212 It seems the 41p_zi+, Cuda 6.00 special build has the same problem; at least there are a number of Inconclusive Results at that angle range, http://setiathome.berkeley.edu/results.php?hostid=7942417&offset=20&state=3 Since this is a Best gaussian versus a Best pulse, maybe it will help with tracking down the Race... *shrugs* |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Thank you TBar. I'll test that. Hope I find something. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Seems it's been around for a while, even the Old 41p_zi which doesn't have the unroll function fails on that task; Running on TomsMacPro.local at Sat Feb 18 06:18:43 2017 --------------------------------------------------- Starting benchmark run... --------------------------------------------------- Listing wu-file(s) in /testWUs : 26fe09ac.27681.340175.11.38.212.wu Listing executable(s) in /APPS : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 setiathome_x41p_zi_x86_64-apple-darwin_cuda80 Listing executable in /REF_APPs : MBv8_8.05r3344_sse41_x86_64-apple-darwin --------------------------------------------------- Current WU: 26fe09ac.27681.340175.11.38.212.wu --------------------------------------------------- Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 7585 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi+_x86_64-apple-darwin_cuda60 -bs -unroll 8 -device 0 233.01 real 64.15 user 21.62 sys Elapsed Time : ……………………………… 234 seconds Speed compared to default : 3241 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 2 2 2 0 0 2 2 2 0 Autocorr 0 1 1 1 0 0 1 1 1 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 0 0 0 0 0 0 0 0 0 Triplet 0 2 2 2 0 0 2 2 2 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 0 0 0 0 1 0 0 0 0 1 Best Pulse 0 1 1 1 0 0 1 1 1 0 Best Triplet 0 1 1 1 0 0 1 1 1 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 0 9 9 9 1 0 9 9 9 1 Unmatched signal(s) in R1 at line(s) 458 Unmatched signal(s) in R2 at line(s) 458 For R1:R2 matched signals only, Q= 99.73% Result : Weakly similar. 
--------------------------------------------------- Running app with command : setiathome_x41p_zi_x86_64-apple-darwin_cuda80 -device 0 232.82 real 64.44 user 20.98 sys Elapsed Time : ……………………………… 233 seconds Speed compared to default : 3255 % ----------------- Comparing results ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 2 2 2 0 0 2 2 2 0 Autocorr 0 1 1 1 0 0 1 1 1 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 0 0 0 0 0 0 0 0 0 Triplet 1 2 2 2 0 1 2 2 2 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 0 0 0 0 1 0 0 0 0 1 Best Pulse 0 1 1 1 0 0 1 1 1 0 Best Triplet 0 1 1 1 0 0 1 1 1 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 1 9 9 9 1 1 9 9 9 1 Unmatched signal(s) in R1 at line(s) 458 Unmatched signal(s) in R2 at line(s) 458 For R1:R2 matched signals only, Q= 99.72% Result : Weakly similar. --------------------------------------------------- Done with 26fe09ac.27681.340175.11.38.212.wu. Done with Benchmark run! Removing temporary files! |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Hi, the gaussian search is unaffected by unroll. Unroll is used only in pulse find. To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Would recommend checking a vanilla zi Vs vanilla CPU with Gaussians in extreme circumstances, because they have been problematic on and off at different times. They are relatively sensitive to platform variation (though that shouldn't impact best selection logic of course) "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
petri33 Send message Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Would recommend checking a vanilla zi Vs vanilla CPU with Gaussians in extreme circumstances, because they have been problematic on and off at different times. They are relatively sensitive to platform variation (though that shouldn't impact best selection logic of course) Hi, I pm'ed you, TBar, and Gianfranco. There's a fix now to the Gaussian finding. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |