Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I'm afraid you won't find an nVidia GPU with anything above OpenCL 1.2, as they stopped any further development to concentrate on CUDA. Even the Windows GPUs show OpenCL 1.2, https://setiathome.berkeley.edu/top_hosts.php In addition, Apple has frozen OpenCL at 1.2, saying they don't need 2.0, and it would require them to rewrite their software to boot, Coprocessors : AMD AMD Radeon Pro 580 Compute Engine (2047MB) OpenCL: 1.2 Operating System : Darwin 17.0.0 BTW, all the AMD Macs on Main are still getting that error that doesn't exist on Beta, Exit status : 226 (0xFFFFFF1E) ERR_TOO_MANY_EXITS. At least they were before the outage. |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Well, Petri says it's because his newer Apps are finding signals in the first chirp whereas the other Apps aren't. It is something in the newer Apps, he's just not sure what. Hi, Just like Raistmer said: Zero chirp is the first one and then the +- something ones. The fft PoT slot 0 for every chirp is the static (0 Hz) value and that is not used, it is omitted. Divergence to a short path vs. some other things: Any output value/value to be checked in the middle for action can be multiplied with `factor = (PoT == 0 ? 0.0f : 1.0f);`. One multiplication vs. divergence to a path of length zero can have an impact, and its performance can vary between the CUDA GPU generations/models. Current implementation prefers `if(pot == 0) return;`. That causes divergence (BAD thing for a GPU). Things may change, but pot 0 for any fft will never be in the reported signals. Chirp 0 will be checked as will all other chirps too. p.s. What is a command line option `-spike_fft_limit 4096` or similar (cannot remember it right now) in SoG? Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
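To make the two options Petri describes easier to compare, here is a minimal CPU-side sketch in Python. It is not the Special App's CUDA code; the function names and the sample values are made up for illustration. It only shows that skipping PoT slot 0 with a branch and multiplying it by a zero factor give the same result.

```python
def contribution_branch(pot, value):
    """Mimics `if (pot == 0) return;`: slot 0 takes a separate, empty path."""
    if pot == 0:
        return 0.0
    return value


def contribution_masked(pot, value):
    """Mimics `factor = (PoT == 0 ? 0.0f : 1.0f)`: every slot does the same work."""
    factor = 0.0 if pot == 0 else 1.0
    return value * factor


def total_power(pot_values, contribution):
    """Sum the per-slot contributions; slot 0 (the static 0 Hz value) must not count."""
    return sum(contribution(pot, v) for pot, v in enumerate(pot_values))


# Hypothetical sample: a large static value in slot 0, then two real values.
pot_values = [100.0, 2.0, 3.5]
print(total_power(pot_values, contribution_branch))
print(total_power(pot_values, contribution_masked))
```

One caveat of the multiplication variant: if slot 0 ever holds an infinity or NaN, multiplying by 0.0 yields NaN rather than 0.0, so the masked form is only equivalent for finite values.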
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's an interesting build. A few days ago I again tried to get the Cooperative Groups to work on the Maxwell cards. Again, after a few hours I had to admit it just wasn't going to work. I then tried to build a Maxwell version of zi3xs3 using the Pulsefind from zi3v since zi3v doesn't have the problem with the Invalid Overflows. Still no luck. Next I decided to just build a Static CUDA 9 version of zi3v and see how that worked with the Invalid Overflows. Well, I could build a static zi3v but it failed to get any further than assigning the memory, processing would never start. So, the final build was finally successful. A straight non-static version of CUDA 9 zi3v. Well, what a difference. No More Invalid Overflows, and very few Inconclusive results in general. In fact, there were so few yesterday that today there isn't even an Inconclusive listed for the 2nd. The results for today only list 2 non-overflow Inconclusives as well, https://setiathome.berkeley.edu/results.php?hostid=6906726&state=3 It is a little slower than my patched together Maxwell zi3xs3, but I don't get any Invalids and the Inconclusives are much lower as well. Imagine if I could get a Static 'Callback' version of zi3v working. I wonder how that would work... |
Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
How did it compare against the mostly non cuda apps when you ran it on beta first? |
Joined: 15 Feb 06 Posts: 10 Credit: 27,125,503 RAC: 0 |
You linked to a post about a 780, Please look above where you will see it is Common Knowledge the Kepler CC 3.5 GPUs DO NOT WORK CORRECTLY with anything above CUDA 6.0. Hi Tbar, Thank you for all your work. I think the phrase in bold is really important and should be highlighted in the download area. At this time, it says "The CUDA 6.0 Special App is for the older Kepler CC 3.5 GPUs that might not work well with CUDA 8 and above.", which is less clear. In my case, with both a GTX 780 and a GTX 1060 on the same Linux Fedora 25 host, I thought I could use CUDA 8. Thanks! |
Joined: 15 Feb 06 Posts: 10 Credit: 27,125,503 RAC: 0 |
Interesting, that 1050Ti has only OpenCL 1.2 support. I'm afraid you won't find a nVidia GPU with anything above OpenCL 1.2 as they stopped any further development to concentrate on CUDA. OpenCL 2.0 has been officially supported since nvidia driver 378.66. However, it is not clear which hardware will support it. On a Windows host with driver 388.XX, all OpenCL tasks on all projects were failing until I reverted to 377.XX. However, CUDA gpugrid tasks were working correctly. So OpenCL 2.0 does not seem ready yet :-/ |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
It shifts the threshold for switching between 2 Spike computation strategies. One computes the whole spike on a single thread (so, a 1D grid); the other uses reduction and distributes the computation over a few workitems (threads), so a 2D grid (with overhead on the reduction, though). So for some matrix geometries one kernel is better, for others the other one. This option allows the user to move the threshold for switching between them. SETI apps news We're not gonna fight them. We're gonna transcend them. |
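The switching described above can be sketched roughly as follows. This is plain Python, not the SoG OpenCL kernels. The names `find_spike` and `fft_limit` are hypothetical, and the direction of the switch (short arrays on one "thread", long arrays through the reduction) is an assumption, since the post does not say which geometry favours which kernel.

```python
def spike_single_thread(powers):
    """Strategy 1: one worker scans the whole array (the '1D grid' case)."""
    best = powers[0]
    for p in powers[1:]:
        if p > best:
            best = p
    return best


def spike_reduction(powers, workers=4):
    """Strategy 2: several workers each scan a strided slice, then the
    partial maxima are reduced (the '2D grid' case, with reduction overhead)."""
    partials = []
    for w in range(workers):
        chunk = powers[w::workers]
        if chunk:
            partials.append(max(chunk))
    return max(partials)


def find_spike(powers, fft_limit=4096):
    """Pick a strategy by array length; `fft_limit` plays the role of the
    user-adjustable threshold (assumed direction, see lead-in)."""
    if len(powers) <= fft_limit:
        return "single", spike_single_thread(powers)
    return "reduction", spike_reduction(powers)


print(find_spike([1.0, 5.0, 3.0], fft_limit=4))
print(find_spike([float(i) for i in range(10)], fft_limit=4))
```

Both strategies return the same peak; only the amount of work per worker and the reduction overhead differ, which is why moving the threshold can matter on one GPU and not on another.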
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
. . An interesting development, especially the part about it working better under Linux than under Windows :) Stephen :) |
Joined: 6 Jun 02 Posts: 1668 Credit: 623,086,772 RAC: 156 |
Thank you for the explanation. Petri To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones |
Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I saw a Kepler-compatible binary posted in this thread. Should it work with an 820M GPU? SETI apps news We're not gonna fight them. We're gonna transcend them. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
The 820M is a Fermi GPU, CC = 2.1 and will Not work, https://en.wikipedia.org/wiki/CUDA#GPUs_supported The 940M would work as it is CC = 5.0 The 1050Ti would also work. My new zi3v CUDA 9.0 build seems to work very well with very few Inconclusives. Unfortunately, it still gives the occasional Bad Best Pulse. Looks good at Beta, https://setiweb.ssl.berkeley.edu/beta/results.php?hostid=76256 I'm thinking about building an OSX version as I'm still getting those Invalid Overflows with zi3xs3, https://setiathome.berkeley.edu/results.php?hostid=6796479&offset=300 The Inconclusives are still a little high with zi3xs3, and the code still doesn't build for Maxwell GPUs. |
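The compatibility answer above boils down to comparing a card's compute capability (CC) against the lowest CC a given build targets. A small Python sketch of that check follows. The table holds only the CC values quoted in this thread, and the minimum of 3.5 used in the example is an assumption for illustration, not a documented requirement of any particular Special App build.

```python
# Compute capabilities as quoted in this thread (major, minor).
COMPUTE_CAPABILITY = {
    "820M": (2, 1),     # Fermi
    "GTX 780": (3, 5),  # Kepler
    "940M": (5, 0),
}


def can_run(gpu, min_cc):
    """True if the GPU's compute capability is at least `min_cc`.

    `min_cc` depends on how the binary was built and must be supplied
    by the caller; it is not known from the thread.
    """
    return COMPUTE_CAPABILITY[gpu] >= min_cc


ASSUMED_MIN_CC = (3, 5)  # assumption for illustration only
for gpu in COMPUTE_CAPABILITY:
    print(gpu, can_run(gpu, ASSUMED_MIN_CC))
```

Tuples compare element by element, so (5, 0) correctly ranks above (3, 5) without any string parsing of "5.0" vs "3.5".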
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Hmmm...looks like the Cuda 9 version of x41p_zi3v behaves a bit differently than the Cuda 8 version. Workunit 2736909594 (01mr08ae.1038.4980.12.39.254) Task 6146941858 (S=19, A=0, P=11, T=0, G=0, BS=33.60793, BG=0) x41p_zi3v, Cuda 8.00 special Task 6146941859 (S=17, A=0, P=13, T=0, G=0, BS=32.92188, BG=0) x41p_zi3v, Cuda 9.00 special It appears that the Cuda 9 version reported 3 Pulses that the Cuda 8 didn't, while Cuda 8 reported 1 Pulse that Cuda 9 didn't. In addition, the Best Spikes differ, due to the Cuda 8 app needing to report 2 extra Spikes to reach the 30-signal overflow ceiling. EDIT: I just ran a quick bench with the stock Windows CPU app. It agrees with the signals reported by the Cuda 9 version. Verrrry interesting! EDIT2: Two more benches, one with the SSE3 Windows CPU app and the other with AVX, also confirm the results of the Cuda 9 app. |
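For anyone wanting to compare summaries like the two above without counting by eye, here is a minimal Python sketch. It assumes the summary letters stand for Spikes, Autocorrs, Pulses, Triplets and Gaussians (with BS and BG as best Spike and best Gaussian); the helper names are made up for this example.

```python
import re

COUNT_KEYS = ("S", "A", "P", "T", "G")
FIELD = re.compile(r"(\w+)=([\d.]+)")


def parse_summary(text):
    """Turn 'S=19, A=0, ...' into a dict of floats; fields like 'BS=?' are skipped."""
    return {key: float(value) for key, value in FIELD.findall(text)}


def count_diff(a, b):
    """Signal counts that differ, as (b minus a) per signal type."""
    return {k: int(b[k] - a[k]) for k in COUNT_KEYS if a[k] != b[k]}


def total_signals(summary):
    """Total reported signals; 30 is the overflow ceiling mentioned in the post."""
    return int(sum(summary[k] for k in COUNT_KEYS))


cuda8 = parse_summary("S=19, A=0, P=11, T=0, G=0, BS=33.60793, BG=0")
cuda9 = parse_summary("S=17, A=0, P=13, T=0, G=0, BS=32.92188, BG=0")
print(count_diff(cuda8, cuda9))
print(total_signals(cuda8), total_signals(cuda9))
```

Note that the count difference only gives the net change (two fewer Spikes, two more Pulses); spotting that Cuda 9 found 3 Pulses and missed 1 still requires comparing the individual signals.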
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Hmmm...looks like the Cuda 9 version of x41p_zi3v behaves a bit differently than the Cuda 8 version. . . Hi Jeff, . . Call me optimistic but that is sounding like good news for the Cuda90 version of 3v. I may have to upgrade. Tell me though, are the run times much slower than Cuda80 ?? Stephen ?? |
rob smith Joined: 7 Mar 03 Posts: 22740 Credit: 416,307,556 RAC: 380 |
Speed is a bit of a moot question: if one produces "less acceptable" results than the other, then the one that produces the more acceptable results "wins". Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Speed is a bit of a moot question if one produces "less acceptable" results than the other then the one that produces the more acceptable results "wins" . . The question of speed was not a condition of upgrading. If the sort order issue has been successfully resolved then it is a must. But Petri had suggested that the change in sort order was certainly doable but might have a speed penalty and I was curious how much that was. Since Jeff is the only person posting results he seemed the best person to ask :) Stephen :) |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Since Jeff is the only person posting results he seemed the best person to ask :) Yeah, but in that example, my machine was the one running the Cuda 8 version of zi3v, while TBar's was the one running Cuda 9. I hadn't tried the Cuda 9 because it seemed even more experimental than the rest of the experimental versions. ;^) The thing here, though, is that the code should be the same in both versions, since they're both zi3v. The differences should simply be due to the libraries they're compiled with, which is what made the results for this one WU rather puzzling. Someone would have to run at least a few bench tests with both versions of zi3v with some different WUs to see if differences such as this are common on overflows. That would likely also be the only way, currently, to get speed comparisons, though in the absence of code differences, I wouldn't expect to see anything significant in that regard. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
Since Jeff is the only person posting results he seemed the best person to ask :) . . OK so it is still wait and see ... patience is a virtue :) Stephen :) |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I don't recall seeing this type of Inconclusive before, so I'll post it. Workunit 2736646059 (blc24_2bit_guppi_57895_42566_HIP91491_0020.14868.818.23.46.126.vlar) Task 6146392692 (S=0, A=0, P=6, T=4, G=0, BS=23.64426, BG=0) v8.08 (alt) windows_x86_64 Task 6146392693 (S=0, A=0, P=6, T=5, G=0, BS=23.64431, BG=0) x41p_zi3x, Cuda 9.00 special The Cuda 9 zi3x reported one more Triplet than the stock app. What looks particularly odd to me is that the peak value of the extra Triplet is identical to a Triplet reported earlier. That's just an observation. I don't have any idea what the significance of it might be. Triplet: peak=10.79457, time=54.88, period=20.8, d_freq=2383858494.65, chirp=52.655, fft_len=32 .... Triplet: peak=10.79457, time=13.72, period=5.2, d_freq=2383864500.68, chirp=-55.426, fft_len=8 Both apps reported the first Triplet, but only the Special App reported the second one. One of my Linux machines has been assigned the tiebreaker, which may already have been run (with the zi3t2b Cuda 8.00 Special App) and reported, but the result hasn't yet made it to the Replica DB. (The machine is in weekday siesta mode for about another hour, so I can't check the log yet.) |
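Spotting a repeated peak value like this by eye is easy to miss in a long stderr listing. Below is a small Python sketch that groups reported signal lines by their peak. It assumes lines in the `Triplet: key=value, ...` shape quoted above; `parse_signal` and `repeated_peaks` are hypothetical helper names, not part of any SETI@home tool.

```python
import re
from collections import defaultdict

FIELD = re.compile(r"(\w+)=(-?[\d.]+)")


def parse_signal(line):
    """Split 'Triplet: peak=..., time=...' into ('Triplet', {field: float})."""
    kind, _, rest = line.partition(":")
    fields = {key: float(value) for key, value in FIELD.findall(rest)}
    return kind.strip(), fields


def repeated_peaks(lines):
    """Map each peak value that occurs more than once to its signals."""
    by_peak = defaultdict(list)
    for line in lines:
        kind, fields = parse_signal(line)
        by_peak[(kind, fields["peak"])].append(fields)
    return {key: sigs for key, sigs in by_peak.items() if len(sigs) > 1}


lines = [
    "Triplet: peak=10.79457, time=54.88, period=20.8, "
    "d_freq=2383858494.65, chirp=52.655, fft_len=32",
    "Triplet: peak=10.79457, time=13.72, period=5.2, "
    "d_freq=2383864500.68, chirp=-55.426, fft_len=8",
]
for (kind, peak), signals in repeated_peaks(lines).items():
    print(kind, peak, [s["fft_len"] for s in signals])
```

This only flags exact matches on the printed peak value; it says nothing about why two signals at different chirps and FFT lengths would share one.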
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
One of my Linux machines has been assigned the tiebreaker, which may already have been run (with the zi3t2b Cuda 8.00 Special App) and reported, but the result hasn't yet made it to the Replica DB. (The machine is in weekday siesta mode for about another hour, so I can't check the log yet.) Ah, I see now that the tiebreaker got rescheduled to the CPU queue so it hasn't run yet. I considered moving it back to the GPU but I think I'll leave it where it is. That will avoid the risk of cross-validation by the Special App, should zi3t2b also report that extra Triplet. |
Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
This Inconclusive is kind of a mixed bag when it comes to the results reported by the Cuda 8 zi3v app vs. the Cuda 9 zi3x app. Workunit 2741342989 (24ap08aa.13399.4980.10.37.0) Task 6156180177 (S=26, A=0, P=4, T=0, G=0, BS=40.18273, BG=0) x41p_zi3x, Cuda 9.00 special Task 6156180178 (S=24, A=2, P=4, T=0, G=0, BS=?, BG=?) v8.00 (cuda50) windows_intelx86 Task 6157360893 (S=25, A=2, P=3, T=0, G=0, BS=919.8826, BG=2.109926) x41p_zi3v, Cuda 8.00 special The Cuda 9 reported one more Pulse than the Cuda 8, though the 3 that both reported match. The reported Spikes have wildly different peaks and come from different ranges of fft_len, as noted in a couple of previously posted examples. I call it a mixed bag because the bench I just ran on this WU with the stock Windows CPU app basically matches the Pulses reported by zi3x while matching the Spikes and Autocorrs reported by zi3v. That would tend to indicate that the zi3x has improved Pulse-finding over zi3v, but has gone backwards on Spike reporting. (I assume that the Autocorrs were missed simply because the 30-signal overflow was reached with the Spikes from the lower fft_len range.) This might be a good WU for testing future modifications to the Special App. Anyway, the actual signal totals from the bench match the Cuda50 results reported by the second host, so would probably validate that result as the canonical one. However, since the tiebreaker has been assigned to an Intel GPU running v8.20 (opencl_intel_gpu_sah), I'm not really sure what will happen. |