Linux CUDA 'Special' App finally available, featuring Low CPU use

Author	Message
Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1897357 - Posted: 25 Oct 2017, 10:34:59 UTC Last modified: 25 Oct 2017, 11:04:08 UTC

It seems this thread has become "hot" once again. Let's cool it down a little. Indeed, an overflow means that only a fraction of all signals is reported. And indeed, 30 signals arbitrarily selected from all of them isn't so "scientific" that we must stick with that choice as testimony. But there are still a few reasons to put some effort into proper result ordering (and I think Petri, the only person who can currently change anything on this side, has already agreed with it):

1) In the CPU app's ordering we go from small to big relative motions. Usually, the smaller the value to correct, the more adequate the result of the correction is (simply put, the highest chirps could be more distorted even after correction, but that's IMHO; maybe I'm wrong here).
2) Any deviation in reporting order means a result resend is needed. That is a performance drop at the project level. Hence, proper ordering is just another project-wide level of optimization.

Guys, you are all doing an important job that is required and valued. No need to alienate each other! https://www.youtube.com/watch?v=bBZAccgvp8M :)

SETI apps news
We're not gonna fight them. We're gonna transcend them.

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1897378 - Posted: 25 Oct 2017, 13:31:16 UTC - in response to Message 1897336. Last modified: 25 Oct 2017, 13:57:55 UTC

I see, EVERY example I provided the host just happened to be running doubles...in your world.

It's easy Stephen. If the run-times are equal to the time between reports then he is running singles. If the machine were running doubles there would be twice as many reports, there weren't. Especially on the last example where the only reports were from the GPU and the difference in report times matched the run-times. I can provide many examples, anyone can. Why don't you provide an example to support your claim?

. . 2nd try, system trashed my first response, so much for that hour ..............

. . Again, since you insist on it.

. . It takes a while on a slow machine to gather sufficient evidence to convince someone like you. Especially when my attempt to resurrect a Windows setup that hadn't been used for nearly 9 months failed to recognize the GPU, and a reboot proved to be unwise as M$ kicked off one of their irresistible fupdates that trashed the setup and necessitated a hasty repair/rebuild. Sadly the system now is not achieving the results it was back then. While last year it was achieving 85% utilization of the GPU on single tasks and 95+% on doubles, it is now only getting 75% and 85% respectively. I have no intention of wasting time trying to recover those extra 2 or 3 mins per task as this is a one-off short-term project called IYF-D.

http://setiathome.berkeley.edu/result.php?resultid=6115330054
http://setiathome.berkeley.edu/result.php?resultid=6115329573
http://setiathome.berkeley.edu/result.php?resultid=6115329851
http://setiathome.berkeley.edu/result.php?resultid=6115329857
http://setiathome.berkeley.edu/result.php?resultid=6115329859
http://setiathome.berkeley.edu/result.php?resultid=6115330124
http://setiathome.berkeley.edu/result.php?resultid=6115330161
http://setiathome.berkeley.edu/result.php?resultid=6115329954
http://setiathome.berkeley.edu/result.php?resultid=6115330221
http://setiathome.berkeley.edu/result.php?resultid=6115330223
http://setiathome.berkeley.edu/result.php?resultid=6115330228
http://setiathome.berkeley.edu/result.php?resultid=6113411994
http://setiathome.berkeley.edu/result.php?resultid=6113392134
http://setiathome.berkeley.edu/result.php?resultid=6113392653
http://setiathome.berkeley.edu/result.php?resultid=6113392679
http://setiathome.berkeley.edu/result.php?resultid=6113392011
http://setiathome.berkeley.edu/result.php?resultid=6113392533

. . I expect you're feeling pretty smug and self satisfied right now. The bad news for you is that EVERY ONE of those results is from running doubles.

http://setiathome.berkeley.edu/result.php?resultid=6115329870
http://setiathome.berkeley.edu/result.php?resultid=6115329925
http://setiathome.berkeley.edu/result.php?resultid=6115330183
http://setiathome.berkeley.edu/result.php?resultid=6115329932
http://setiathome.berkeley.edu/result.php?resultid=6113380483
http://setiathome.berkeley.edu/result.php?resultid=6113361743
http://setiathome.berkeley.edu/result.php?resultid=6115330080
http://setiathome.berkeley.edu/result.php?resultid=6115330226
http://setiathome.berkeley.edu/result.php?resultid=6115329777
http://setiathome.berkeley.edu/result.php?resultid=6113524809
http://setiathome.berkeley.edu/result.php?resultid=6113525371
http://setiathome.berkeley.edu/result.php?resultid=6113525373
http://setiathome.berkeley.edu/result.php?resultid=6113525174
http://setiathome.berkeley.edu/result.php?resultid=6113525430
http://setiathome.berkeley.edu/result.php?resultid=6113525436

. . But the pieces de resistance are ...

http://setiathome.berkeley.edu/result.php?resultid=6113525433
http://setiathome.berkeley.edu/result.php?resultid=6113524696

. . Of course a clever man like you might have thought to actually look at the host (not hidden) from which you are demanding results. Evidently not.

. . It's really a shame that when people try to take part and help you have to be a douche and call them a liar.

. . Enjoy your reading ...........

Stephen :(
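The rule of thumb quoted at the top of this post (run time equal to the gap between reports means singles, twice as many reports means doubles) can be sketched roughly as below. This is only an illustration: the function name, its inputs, and the sample figures are hypothetical and not taken from any host in the thread, and it assumes the report times reflect when each GPU task actually finished rather than when BOINC batched them up to the server.

```python
from statistics import mean


def estimate_concurrency(run_time_s, report_times_s):
    """Estimate how many tasks a GPU runs at once.

    run_time_s: typical elapsed time of one task, in seconds.
    report_times_s: completion times of successive GPU tasks, in seconds.
    """
    gaps = [later - earlier
            for earlier, later in zip(report_times_s, report_times_s[1:])]
    if not gaps or mean(gaps) <= 0:
        raise ValueError("need at least two distinct report times")
    # Singles: one result per run time. Doubles: two results per run time.
    return run_time_s / mean(gaps)


# Made-up figures for illustration only.
singles = estimate_concurrency(3600, [0, 3600, 7200, 10800])
doubles = estimate_concurrency(3600, [0, 1800, 3600, 5400])
print(round(singles), round(doubles))  # prints: 1 2
```

A ratio near 1 suggests singles and a ratio near 2 suggests doubles; anything in between would need a closer look at the individual results.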

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1897439 - Posted: 25 Oct 2017, 16:46:22 UTC - in response to Message 1897355.

So, for this WU, having 3 Special Apps end up patting each other on the back worked out just fine. That doesn't mean that it's a good situation. There are still enough nagging little inconsistencies and inaccuracies in the Special App that this kind of cross-validation could manage to squeeze out legitimate results, and as use of the Special App becomes more widespread, that sort of scenario becomes more likely.

And the correct way to solve this would be to send the tiebreaker to a device of a distinct plan class, preferably a CPU device. In the case of anonymous platform this would mean "send to stock CPU". I'll attempt to raise this issue with Eric.

Thanks, Raistmer. That would be very helpful in reducing the cross-validation risk for experimental apps like Petri's, and others that might come along in the future. Of course, some cross-validation risk still exists even for the initial two hosts on a WU, as we've seen occasionally, but I don't know how that could be addressed. There's such an overwhelming preponderance of nVIDIA devices in the mix that it would seem impossible to always have each one paired with a different device class for that first pass.

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1897483 - Posted: 25 Oct 2017, 21:54:44 UTC - in response to Message 1897356.

Best spike: peak=-nan, time=1.678, d_freq=1418984342.92, chirp=-19.121, fft_len=32k
This appears to have resulted from a restart of the task, even though the 2 legitimate Spikes appear in the Stderr both before and after the restart.

Hm.... such not-a-number values are better caught by the app's sanity checks. AFAIK Petri implemented the same sanity checks as the OpenCL and CPU apps currently have. So, probably none of them check for the "not a number" condition... It's a TODO to check...

I just did a bit of research this afternoon and found that my own Linux boxes occasionally return "Best spike: peak=nan" (though not "peak=-nan"), always associated with a restarted task. I found 15 of them in my October archives, spread across all 3 boxes and both flavors of the Special App (x41p_zi3t2b and x41p_zi3v). The most recent appears to be Task 6115510571. I believe that all of these were initially marked Inconclusive, though all but two (which also had bogus Spikes or Triplets) seemed to get validated in the end.

The negative "-nan" does, however, show up on those nagging restarted tasks that throw a bunch of bogus Triplets after the restart (Message 1875324), such as Task 6115692460 where 30 non-existent Triplets similar to the following were reported.

Triplet: peak=-nan, time=84.31, period=3.355, d_freq=1420832615.81, chirp=-18.47, fft_len=8k
Triplet: peak=-nan, time=84.72, period=3.775, d_freq=1420832608.06, chirp=-18.47, fft_len=8k
...
...
Triplet: peak=-nan, time=88.92, period=3.775, d_freq=1420832530.59, chirp=-18.47, fft_len=8k
Triplet: peak=-nan, time=88.08, period=2.097, d_freq=1420832546.08, chirp=-18.47, fft_len=8k

So far, I've only seen the bogus Triplets occur with the x41p_zi3v version.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1897495 - Posted: 26 Oct 2017, 0:25:19 UTC - in response to Message 1897483. Last modified: 26 Oct 2017, 0:27:48 UTC

I just did a bit of research this afternoon and found that my own Linux boxes occasionally return "Best spike: peak=nan" (though not "peak=-nan"), always associated with a restarted task. I found 15 of them in my October archives, spread across all 3 boxes and both flavors of the Special App (x41p_zi3t2b and x41p_zi3v). The most recent appears to be Task 6115510571. I believe that all of these were initially marked Inconclusive, though all but two (which also had bogus Spikes or Triplets) seemed to get validated in the end.

This means the "Special" app needs to check its resuming logic to properly initialize the best signal values. Nevertheless, even if improperly initialized, such values are better caught via a sanity check (this way the error would be much more obvious).

EDIT: AFAIK the GPU uses finite arithmetic by default (it's faster), so a NaN value can't occur due to GPU operations. That leaves a single option for the NaN source: uninitialized memory (though it can be host and/or device based).
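As a rough illustration of the kind of "not a number" sanity check discussed here, the sketch below scans Stderr-style text for signal lines whose peak is not a finite number. It is a post-hoc log check written for this thread, not code from any of the SETI apps; the function name is hypothetical, the first two sample lines are copied from the posts above, and the third "healthy" Spike line is a made-up example of the same layout.

```python
import math
import re

# Matches the value that follows "peak=" up to the next comma or space.
PEAK_RE = re.compile(r"peak=([^,\s]+)")


def non_finite_peaks(stderr_text):
    """Return the signal lines whose peak value is NaN, infinite or unparsable."""
    bad_lines = []
    for line in stderr_text.splitlines():
        match = PEAK_RE.search(line)
        if match is None:
            continue  # not a signal line
        try:
            peak = float(match.group(1))  # float() accepts "nan" and "-nan"
        except ValueError:
            bad_lines.append(line.strip())
            continue
        if not math.isfinite(peak):
            bad_lines.append(line.strip())
    return bad_lines


SAMPLE = """\
Best spike: peak=-nan, time=1.678, d_freq=1418984342.92, chirp=-19.121, fft_len=32k
Triplet: peak=-nan, time=84.31, period=3.355, d_freq=1420832615.81, chirp=-18.47, fft_len=8k
Spike: peak=25.89224, time=6.711, d_freq=1420000000.00, chirp=0, fft_len=128k
"""

for line in non_finite_peaks(SAMPLE):
    print("suspect:", line)
```

Inside an app the equivalent would be a finiteness test on the value before it is reported, so that an uninitialized "best" field fails loudly instead of reaching the validator.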

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1897507 - Posted: 26 Oct 2017, 4:07:14 UTC - in response to Message 1897495.

Yep, certainly seems likely. I wonder why it only seems to affect the individual Triplet reporting and the Best Spike value. I would think that all those fields would get initialized at the same time. Oh well, hopefully Petri will have the time to track it down soon.

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1897512 - Posted: 26 Oct 2017, 4:39:33 UTC - in response to Message 1897483.

The most recent appears to be Task 6115510571.

I just noticed that my initial wingman for this task (in WU 2721792884) got a "Maximum elapsed time exceeded" error. That host belongs to Mr. Kevvy and it looks like perhaps it's hit that 20x APR threshold due to excessive rescheduling. Almost all the CPU tasks on that machine are failing after a couple hours of run time, and appear to have been doing so for quite a while.

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1897526 - Posted: 26 Oct 2017, 6:49:47 UTC - in response to Message 1897378. Last modified: 26 Oct 2017, 6:55:42 UTC

I see, EVERY example I provided the host just happened to be running doubles...in your world.

It's easy Stephen. If the run-times are equal to the time between reports then he is running singles. If the machine were running doubles there would be twice as many reports, there weren't. Especially on the last example where the only reports were from the GPU and the difference in report times matched the run-times. I can provide many examples, anyone can. Why don't you provide an example to support your claim?

. . 2nd try, system trashed my first response, so much for that hour ..............
. . Again, since you insist on it...

Whoa Stephen, you didn't need to go through all that. I would have settled for links to machines at SETI showing similar times as you claimed, sorta what I did. Oh wait...you couldn't, because you can't find any. I certainly haven't found any, all of the ones I can find show about an hour for an AR 0.40 WU. So, I suppose it's as I suggested...something peculiar to your machine? I did notice your clockrate is a bit higher than the rest, 1005 where the stock clock is 901? I trust you have the memory clock spiked as well? Yes, it does look better than all the ones I've been able to find. Shame it seems to have a problem with the Special App, most all the other GPUs are around twice as fast using the Special App. I think that's why You, and a number of others are now running Linux? I ran my own little test using the Linux CUDA 42 app against the zi3v Apps. I think you'll find the Linux CUDA 42 App is just as fast as the Windows one, actually I think it's a little faster.

So this is what I got on my 750Ti;

Current WU: 23se08ac.6875.22968.6.33.135.wu true_angle_range>0.44813078486041
----------------------------------------------------------------
Running default app with command :... setiathome_x41zi_x86_64-pc-linux-gnu_cuda42 -device 0
Elapsed Time: ....................... 725 seconds
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda60 -device 0
Elapsed Time : ...................... 389 seconds
Speed compared to default : ......... 186 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.95%
----------------------------------------------------------------
Running app with command : .......... setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda80 -device 0
Elapsed Time : ...................... 371 seconds
Speed compared to default : ......... 195 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.96%

So, just as with the other Special Apps zi3v CUDA60 is almost twice as fast as the baseline CUDA app on the Arecibo tasks, which means it's also about twice as fast as the OpenCL App, but, people already knew that...that's why they are running the Special App. I don't know why it's different on your particular card, not really something to brag about IMHO. What's that old line...with friends like that who needs... Actually, I don't gain a thing by going through all the trouble to build this App, I don't even have a Kepler card, and I'm certainly not being paid for it. So any help would be directed toward those that may find this App useful, not me. The question would be, did you help any of the people that may use this App? I find that questionable. Some may actually believe you saying it's just a little faster than CUDA 50 and choose not to bother using it. That certainly won't bother me, but, they may lose out.

The main reason I built this App was for the people using 780s & Titans who were asking for a version that worked with their GPUs. Well, they now have one, whether or not they use it is up to them, I gain nothing either way. Should I add an asterisk to the ReadMe? Something like, "works about Twice as fast as the stock Apps, *except on Stephen's 730". I dunno, I suppose that's possible.
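The "Speed compared to default" figures in the 750Ti benchmark above are consistent with a simple ratio of elapsed times (default time divided by candidate time, as a percentage). The short check below assumes that is how the figure is derived, which the thread does not state; the function name is hypothetical.

```python
def speed_vs_default(default_s, candidate_s):
    """Speed of a candidate app relative to the default app, in percent."""
    return 100.0 * default_s / candidate_s


DEFAULT_CUDA42_S = 725  # elapsed time of the default app in the benchmark

for name, elapsed_s in [("zi3v cuda60", 389), ("zi3v cuda80", 371)]:
    print(name, round(speed_vs_default(DEFAULT_CUDA42_S, elapsed_s)), "%")
# prints:
# zi3v cuda60 186 %
# zi3v cuda80 195 %
```

Both values match the benchmark output, so on this WU the zi3v builds finish in a little over half the time of the CUDA 42 build.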

Stephen "Heretic" Volunteer tester Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1897535 - Posted: 26 Oct 2017, 8:10:24 UTC - in response to Message 1897512.

The most recent appears to be Task 6115510571.

I just noticed that my initial wingman for this task (in WU 2721792884) got a "Maximum elapsed time exceeded" error. That host belongs to Mr. Kevvy and it looks like perhaps it's hit that 20x APR threshold due to excessive rescheduling. Almost all the CPU tasks on that machine are failing after a couple hours of run time, and appear to have been doing so for quite a while.

. . Mr Kevvy seems to be quiet these days, I wonder if he is still checking his rigs?

Stephen ??

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1897613 - Posted: 26 Oct 2017, 18:42:05 UTC - in response to Message 1897535.

. . Mr Kevvy seems to be quiet these days, I wonder if he is still checking his rigs?

Stephen ??

He's definitely still around, since he's currently one of the mods. He just hasn't posted in Number Crunching for a while. I suppose if he doesn't pick up on this thread in a day or so, a PM would be in order.

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1898212 - Posted: 30 Oct 2017, 3:38:58 UTC Last modified: 30 Oct 2017, 3:40:44 UTC

Here's a non-overflow Inconclusive that should probably be watched.

Workunit 2727927944 (02ap07ad.14827.10706.5.32.128)
Task 6128258808 (S=13, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375406) x41p_zi3xs3, Cuda 9.00 special
Task 6128258809 (S=9, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375409) x41p_zi3v, Cuda 8.00 special

The 4 additional Spikes that the zi3xs3 reported are the first 4 listed in the Stderr, all with "time=6.711". All other signals and "Best" values appear to match up just fine.

(Note that I've now added a BS value to my summaries, for "Best Spike", since that also seems to be a possible area of concern on some WUs, where a NaN value might be reported.)

As I write this, the tiebreaker task has yet to be sent out.
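The comparison of the two task summaries above can be sketched as a small diff over the fields in parentheses. This is only an illustration of the bookkeeping, not the project validator: the function names and the relative tolerance used for the "Best" values are assumptions, and the real validator's matching criteria are not described in this thread.

```python
import math
import re


def parse_summary(text):
    """Turn '(S=13, A=0, ..., BG=3.375406)' into a dict of floats."""
    return {key: float(value)
            for key, value in re.findall(r"(\w+)=([^,\s)]+)", text)}


def diff_summaries(first, second, rel_tol=1e-5):
    """Return {field: (first_value, second_value)} for fields that disagree."""
    a, b = parse_summary(first), parse_summary(second)
    differences = {}
    for key in sorted(set(a) | set(b)):
        x, y = a.get(key, math.nan), b.get(key, math.nan)
        # isclose() is False whenever either side is NaN, so NaN is always flagged.
        if not math.isclose(x, y, rel_tol=rel_tol):
            differences[key] = (x, y)
    return differences


zi3xs3 = "(S=13, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375406)"
zi3v = "(S=9, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375409)"
print(diff_summaries(zi3xs3, zi3v))  # prints: {'S': (13.0, 9.0)}
```

With the assumed tolerance, only the Spike count differs, which matches the observation that the 4 extra Spikes are the sole disagreement between the two results.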

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1898296 - Posted: 30 Oct 2017, 17:48:51 UTC - in response to Message 1898212.

Here's a non-overflow Inconclusive that should probably be watched.

Workunit 2727927944 (02ap07ad.14827.10706.5.32.128)
Task 6128258808 (S=13, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375406) x41p_zi3xs3, Cuda 9.00 special
Task 6128258809 (S=9, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375409) x41p_zi3v, Cuda 8.00 special

The 4 additional Spikes that the zi3xs3 reported are the first 4 listed in the Stderr, all with "time=6.711". All other signals and "Best" values appear to match up just fine.

(Note that I've now added a BS value to my summaries, for "Best Spike", since that also seems to be a possible area of concern on some WUs, where a NaN value might be reported.)

As I write this, the tiebreaker task has yet to be sent out.

Did Petri say that "3v" is disallowed for production use?

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1898301 - Posted: 30 Oct 2017, 18:15:00 UTC - in response to Message 1898296.

Did Petri say that "3v" is disallowed for production use?

Not as far as I know. A few weeks ago he posted a message to try to discourage people from using the zi3t2, due to Pulse reporting issues. He recommended moving to the "latest" Cuda8, which I believe is the zi3v and was supposed to improve on the Pulse reporting. His message was somewhat confusing, however, and referenced the "s2" a lot. Anyway, I've been running the zi3v on that one box since I first converted it to Linux. While there are still several nagging problems there, the biggest one I run into is the bogus Spikes and Triplets that I often see following restarts.

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1898350 - Posted: 30 Oct 2017, 22:31:10 UTC - in response to Message 1898296.

Here's a non-overflow Inconclusive that should probably be watched.

Well, here's an Overflow issue that needs to be watched. Raistmer, any idea why the Spikes have a much higher reading on the other Overflows than the Special App? Other than the full zi3xs3 only working on Pascal GPUs, this is the only problem I see with the Overflows with zi3xs3. Of course, it still has the occasional bad Best Pulse problem as well.

https://setiathome.berkeley.edu/workunit.php?wuid=2728259294

It's the same with the older App,

https://setiathome.berkeley.edu/workunit.php?wuid=2728384537
https://setiathome.berkeley.edu/workunit.php?wuid=2727684970

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1898357 - Posted: 30 Oct 2017, 23:02:44 UTC - in response to Message 1898350.

That looks about the same as the one I reported in Message 1896776, but with zi3x vs. zi3t2b. When I benched it (Message 1896912), the stock Windows CPU result matched zi3t2b, so it would appear something may have gone sideways starting with zi3x.

TBar Volunteer tester Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1898359 - Posted: 30 Oct 2017, 23:22:58 UTC - in response to Message 1898357.

Well, Petri says it's because his newer Apps are finding signals in the first chirp whereas the other Apps aren't. It is something in the newer Apps, he's just not sure what. Other than that, the full zi3xs3 runs nicely on the Pascal GPUs.

Jeff Buck Volunteer tester Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1898361 - Posted: 30 Oct 2017, 23:37:27 UTC - in response to Message 1898212.

Here's a non-overflow Inconclusive that should probably be watched.

Workunit 2727927944 (02ap07ad.14827.10706.5.32.128)
Task 6128258808 (S=13, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375406) x41p_zi3xs3, Cuda 9.00 special
Task 6128258809 (S=9, A=0, P=5, T=5, G=0, BS=25.89224, BG=3.375409) x41p_zi3v, Cuda 8.00 special

The 4 additional Spikes that the zi3xs3 reported are the first 4 listed in the Stderr, all with "time=6.711". All other signals and "Best" values appear to match up just fine.

The tiebreaker on this one ran as a Linux CPU task and appears to have matched with the zi3v result. It did not report those first 4 Spikes that the zi3xs3 did, so perhaps it's the same issue as in the 30-Spike overflow. All 3 hosts got credit on this one, however.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1898428 - Posted: 31 Oct 2017, 11:53:24 UTC - in response to Message 1898350. Last modified: 31 Oct 2017, 11:54:08 UTC

Raistmer, any idea why the Spikes have a much higher reading on the other Overflows than the Special App?

Because power values come from some summation, the simplest guess would be that those summations are distributed and some part is missing or not reduced in the overflow case.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1898429 - Posted: 31 Oct 2017, 11:58:25 UTC - in response to Message 1898359. Last modified: 31 Oct 2017, 12:06:17 UTC

Well, Petri says it's because his newer Apps are finding signals in the first chirp whereas the other Apps aren't. It is something in the newer Apps, he's just not sure what. Other than that, the full zi3xs3 runs nicely on the Pascal GPUs.

To be correct: the "first chirp" is the zero chirp. And the algorithm definitely looks for signals there (it means no relative motion between source and receiver). What is omitted, and for a reason, is the 0th slot in PoT analysis (for all chirps). The zero slot means static signal strength and obviously should be ignored. If Petri's app really accepts anything from that slot, it's a serious bug.

EDIT: Indeed, handling the 0th slot differently from all the others means divergence and a performance drop in the CU that processes it along with the others. But that's life; correct functioning of the algorithm requires omitting results from that slot. If I recall correctly, I implemented it in such a way that all processing is performed without deviation, but the results reduction omits anything from that slot. That way the GPU performance drop is minimal.
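The approach described in the EDIT (process every slot the same way, then leave the 0th slot out when the results are reduced) can be sketched as below. This is a plain Python stand-in, not GPU code and not taken from any of the SETI apps; the function name and the sample power values are hypothetical.

```python
def best_slot(powers):
    """Return (index, power) of the strongest slot, ignoring slot 0.

    Every slot is assumed to have been computed the same way beforehand;
    only this final reduction treats the 0th slot differently.
    """
    best_index, best_power = None, float("-inf")
    for index, power in enumerate(powers):
        if index == 0:
            continue  # static signal strength, must never be reported
        if power > best_power:
            best_index, best_power = index, power
    return best_index, best_power


# Made-up powers: slot 0 is by far the largest but is skipped.
print(best_slot([100.0, 3.0, 7.5, 2.0]))  # prints: (2, 7.5)
```

Keeping the per-slot work uniform and filtering only in the reduction is what avoids a separate code path for slot 0 during the main computation.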

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1898439 - Posted: 31 Oct 2017, 23:41:19 UTC

Interesting that the 1050Ti has only OpenCL 1.2 support. What NV card would have OpenCL 2.0 then?...
