Linux CUDA 'Special' App finally available, featuring Low CPU use

Author	Message
Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1874857 - Posted: 24 Jun 2017, 13:23:24 UTC - in response to Message 1874825. Maybe a new call for testing CPU apps is a good idea New binary to test on beta to get more CPUs running. I fired mine back up. . . Raistmer has been running testing on V8.05 just recently. Stephen . ID: 1874857 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1874870 - Posted: 24 Jun 2017, 14:03:41 UTC The New CPU App is for Windows; Windows/x86 running on an AMD x86_64 or Intel EM64T CPU 8.06 (alt) 1 Jun 2017, 17:48:19 UTC So....You would need to run a Windows Machine at Beta to run the New CPU App. Basically the Test is finished, and it's just a matter of time before it appears on main, https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266 I've checked my CPU results against the new App and it's a good match. ID: 1874870 ·

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1874872 - Posted: 24 Jun 2017, 14:12:49 UTC - in response to Message 1874870. The New CPU App is for Windows; Windows/x86 running on an AMD x86_64 or Intel EM64T CPU 8.06 (alt) 1 Jun 2017, 17:48:19 UTC So....You would need to run a Windows Machine at Beta to run the New CPU App. Basically the Test is finished, and it's just a matter of time before it appears on main, https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266 I've checked my CPU results against the new App and it's a good match. . . Well that sounds like a fait accompli! . . So I guess that should mean it is just a matter of time for your app as well ?? Stephen ?? ID: 1874872 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1874886 - Posted: 24 Jun 2017, 17:00:30 UTC - in response to Message 1874126. This version uses a little more Video Ram than before meaning GPUs with less than 2 GBs may not work and GPUs with 2 GBs may have problems using Unroll 8. If you have problems lower the -unroll to 6 or lower. Just a quick note to that point. On my newly-launched Linux host, 8289033, I went ahead with x41p_zi3v since I was starting it up from scratch. The 2GB GTX 960 on that machine, which defaults to an unroll value of 8 using autotune, is using 91% of dedicated memory. The 2GB GTX 960s on my other two Linux boxes, both still running x41p_zi3t2b, are using 81% of dedicated memory, so the increase in memory use is significant but not enough to push it over the top. So far, I've only noted one task with a problem using zi3v, and I'm pretty sure that's an isolated incident. Task 5828724704 originally was started on a GTX 750 Ti but, following a reboot, restarted on the GTX 960. Before the restart, it looks like it was running fine, but afterwards it went haywire, identifying 25 bogus Triplets with non-numeric peaks (i.e., "peak=-nan"). I imagine that it's just some sort of restart timing issue, though perhaps on a restart like that the memory usage spikes in some way. Only if it happens again will I really be concerned. Anyway, that task is currently in an Inconclusive state but I expect it to go Invalid once the tie-breaker reports in. ID: 1874886 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1874934 - Posted: 25 Jun 2017, 2:55:06 UTC Last modified: 25 Jun 2017, 3:12:27 UTC
Judging by the following 3 Inconclusive WUs, I have to assume there were significant changes to the Best Gaussian reporting between x41p_zi3j and x41p_zi3t2b. Could identifying where those changes occurred help identify where the issue might be now?
Workunit 2584872714 (07no16ab.5263.633293.16.43.234)
Task 5829506938 (S=12, A=2, P=0, T=1, G=0, BG=8.010661) x41p_zi3t2b, Cuda 8.00 special
Task 5829506939 (S=12, A=2, P=0, T=1, G=0, BG=4.340267) x41p_zi3j, Cuda 8.00 special
Workunit 2584872738 (07no16ab.5263.633293.16.43.239)
Task 5829506946 (S=0, A=0, P=0, T=2, G=0, BG=8.485233) x41p_zi3t2b, Cuda 8.00 special
Task 5829506947 (S=0, A=0, P=0, T=2, G=0, BG=5.217659) x41p_zi3j, Cuda 8.00 special
Workunit 2584872762 (07no16ab.5263.633293.16.43.245)
Task 5829506954 (S=13, A=0, P=0, T=4, G=0, BG=3.93118) x41p_zi3j, Cuda 8.00 special
Task 5829506955 (S=13, A=0, P=0, T=4, G=0, BG=7.523284) x41p_zi3t2b, Cuda 8.00 special
EDIT: I also just noticed an odd one where both hosts are apparently running the same app, just with different command line parameters. The only difference I can see is a very subtle one in the Best Pulse (peak=4.157121 vs. peak=4.282314). Is this the sort of thing that the zi3t2v is supposed to correct?
Workunit 2585082982 (12no16ab.30367.387921.9.36.65)
Task 5829947327 (S=0, A=3, P=0, T=2, G=0, BG=5.81025) x41p_zi3t2b, Cuda 8.00 special
Task 5829947328 (S=0, A=3, P=0, T=2, G=0, BG=5.81025) x41p_zi3t2b, Cuda 8.00 special
ID: 1874934 ·
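
For anyone wanting to compare task summaries like the ones above without eyeballing them, here is a minimal Python sketch. It assumes the summary always has the `(S=…, A=…, P=…, T=…, G=…, BG=…)` shape quoted in the post; the function names and the 1% relative tolerance on BG are hypothetical choices for illustration, not the project validator's actual rule.

```python
import math
import re

# Matches a summary such as "(S=12, A=2, P=0, T=1, G=0, BG=8.010661)".
SUMMARY_RE = re.compile(
    r"\(S=(\d+), A=(\d+), P=(\d+), T=(\d+), G=(\d+), BG=([-\d.]+)\)"
)


def parse_summary(line):
    """Extract the signal counts and Best Gaussian peak from one task line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("no (S=..., BG=...) summary found in line")
    s, a, p, t, g = (int(x) for x in m.groups()[:5])
    return {"S": s, "A": a, "P": p, "T": t, "G": g, "BG": float(m.group(6))}


def diff_summaries(a, b, bg_rel_tol=0.01):
    """Return the fields that differ between two parsed summaries.

    Counts must match exactly; BG is compared with a relative tolerance
    (bg_rel_tol is an assumed value, chosen only for this example).
    """
    diffs = [k for k in ("S", "A", "P", "T", "G") if a[k] != b[k]]
    if not math.isclose(a["BG"], b["BG"], rel_tol=bg_rel_tol, abs_tol=1e-9):
        diffs.append("BG")
    return diffs


if __name__ == "__main__":
    t2b = parse_summary("Task 5829506938 (S=12, A=2, P=0, T=1, G=0, BG=8.010661)")
    zi3j = parse_summary("Task 5829506939 (S=12, A=2, P=0, T=1, G=0, BG=4.340267)")
    print(diff_summaries(t2b, zi3j))
```

Run on the first workunit above, only the BG field is flagged, which matches the observation that the counts agree while the Best Gaussian differs.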

Stephen "Heretic" Volunteer tester Send message Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628	Message 1874966 - Posted: 25 Jun 2017, 12:02:25 UTC - in response to Message 1835988. As soon as I (or anyone) has managed to isolate and fix the unroll bug, the special alpha will become available in Windows form. It's not now, only because of the mass devastation of the project results that would occur if released. Fortunately have more time for digging from a couple of days ago, though not promising quick fixes. The pulsefinding is unfortunately the most complex part of these applications. . . Not that I have the foggiest about the unroll bug but I like the idea of a test version being available for windows. However long it takes it won't be any worse than waiting for this low profile 1050 ti that MSI touted. Nearly a month and not a peep since they posted its existence. Stephen . . . OK, so the Windows Alpha/Beta test version has taken longer to arrive than the LP 1050ti did :) Stephen :) ID: 1874966 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1874990 - Posted: 25 Jun 2017, 15:17:48 UTC - in response to Message 1874793. Last modified: 25 Jun 2017, 15:18:14 UTC One thing that I think I'm noticing is that when there is a reported Gaussian, that peak will match the Best Gaussian peak in SoG. However, in the other apps, the Best Gaussian will have a higher peak than the reported Gaussian. Perhaps there's some significance there. Or perhaps not. :^) Yes, it's a very important observation (though it doesn't cover the 8.05 vs CPU opt difference). OpenCL apps stop the Gaussian search for params worse than an already found reportable one (that is, the best Gaussian will be one of the reportable ones if any reportable one exists). AFAIK the same feature (not reporting a non-reportable Gaussian as best if a reportable one was already found) exists in the CPU path too. Need to check that. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1874990 ·
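
To make the two behaviours described above concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the `Gaussian` record, the single `score` used for ranking, and the made-up numbers. It is not taken from the OpenCL, CUDA, or CPU sources; it only shows how "best is restricted to reportable ones once any exist" can yield a different Best Gaussian than "best is simply the top-ranked candidate".

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical candidate record; field names are illustrative only."""
    peak: float
    score: float       # assumed ranking value, higher is better
    reportable: bool   # assumed flag: passed all reporting thresholds


def best_any(candidates: Iterable[Gaussian]) -> Optional[Gaussian]:
    """Policy A: the best is the top-ranked candidate, reportable or not."""
    return max(candidates, key=lambda g: g.score, default=None)


def best_prefer_reportable(candidates: Iterable[Gaussian]) -> Optional[Gaussian]:
    """Policy B: once any reportable candidate exists, the best must be one of them."""
    pool = list(candidates)
    reportable = [g for g in pool if g.reportable]
    return max(reportable or pool, key=lambda g: g.score, default=None)


if __name__ == "__main__":
    found = [
        Gaussian(peak=4.3, score=1.0, reportable=True),
        Gaussian(peak=8.0, score=2.0, reportable=False),  # ranks higher, not reportable
    ]
    print(best_any(found).peak)                # peak of the non-reportable candidate
    print(best_prefer_reportable(found).peak)  # peak of the reported candidate
```

Under policy B the Best Gaussian peak always matches a reported Gaussian when one exists, which is the pattern noted for SoG; under policy A it can be higher than the reported one.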

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1874991 - Posted: 25 Jun 2017, 15:20:45 UTC - in response to Message 1874966. Last modified: 25 Jun 2017, 15:21:41 UTC . . OK. so the Windows Alpha/Beta test version has taken longer to arrive than the LP 1050ti did :) Stephen :) This sentence would have more meaning with NV/MSI/whatever vs Petri's budget citation... SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1874991 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1875007 - Posted: 25 Jun 2017, 16:59:23 UTC Last modified: 25 Jun 2017, 17:05:02 UTC
After a couple days, I found another Inconclusive against a CPU. It's a matter of missing Triplets, the CUDA App missed three that two CPUs found...first time with this result. The other couple were with bad Best Pulses. Yes, even with zi3v you still get an occasional Bad Best Pulse, but it's extremely rare. The Beta results are here, https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9837856 I ran it with the OSX CPU and got;
Version info: SSE4.1xjf (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE4.1xjf OS X 64bit Build 3344 , Ported by : Raistmer, JDWhale, Urs Echternacht
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 7.386570
Triplet: peak=7.026053, time=34.2, period=0.1917, d_freq=1420949707.03, chirp=0, fft_len=8
Triplet: peak=9.419447, time=14.05, period=0.01065, d_freq=1420950927.73, chirp=0, fft_len=16
Triplet: peak=9.904287, time=14.05, period=0.01065, d_freq=1420950927.73, chirp=0, fft_len=16
Triplet: peak=7.470804, time=80.6, period=0.1483, d_freq=1420951538.09, chirp=0, fft_len=16
Triplet: peak=7.411139, time=26.32, period=0.01475, d_freq=1420948791.5, chirp=0, fft_len=32
Triplet: peak=7.377739, time=44.18, period=0.06062, d_freq=1420944213.87, chirp=0, fft_len=32
Autocorr: peak=17.97998, time=87.24, delay=5.87, d_freq=1420945408.22, chirp=-21.29, fft_len=128k
Spike: peak=24.03246, time=6.711, d_freq=1420944476.05, chirp=-29.499, fft_len=128k
Triplet: peak=6.846987, time=73.11, period=0.1065, d_freq=1420946887.11, chirp=69.956, fft_len=32
Triplet: peak=7.447862, time=26.32, period=0.01475, d_freq=1420948801.88, chirp=69.956, fft_len=32
Triplet: peak=7.316117, time=26.32, period=0.01475, d_freq=1420948781.13, chirp=-69.956, fft_len=32
Best spike: peak=24.03246, time=6.711, d_freq=1420944476.05, chirp=-29.499, fft_len=128k
Best autocorr: peak=17.97998, time=87.24, delay=5.87, d_freq=1420945408.22, chirp=-21.29, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.121e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=2.541833, time=34.19, period=0.04792, d_freq=1420949707.03, score=0.8909, chirp=0, fft_len=8
Best triplet: peak=9.904287, time=14.05, period=0.01065, d_freq=1420950927.73, chirp=0, fft_len=16
Spike count: 1
Autocorr count: 1
Pulse count: 0
Triplet count: 9
Gaussian count: 0
Time cpu in use since last restart: 3618.6 seconds
So, after 2800 tasks, I get a Bad triplet count. I suppose I shouldn't complain. Could have one of those Cosmic Rays I suppose. The task is here, http://boinc2.ssl.berkeley.edu/beta/download/3d/04oc08ab.15453.20522.13.47.97 Anyway, this will give people a chance to compare my SSE4.1xjf OS X 64bit Build 3344 Results to the new SETI@home v8 v8.06 (alt) windows_x86_64 Results. Looks pretty close...
ID: 1875007 ·
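
Stderr dumps like the one in this post can be checked mechanically. The Python sketch below is a rough helper, assuming the log uses the `Triplet: peak=…` and `Triplet count: N` wording shown here; the function names are hypothetical. It tallies the listed signals per type, compares them with the declared counts, and flags non-finite peaks such as the "peak=-nan" case mentioned earlier in the thread.

```python
import math
import re
from collections import Counter

KINDS = r"(Spike|Autocorr|Pulse|Triplet|Gaussian)"
# Reported signals start with a capitalised type; "Best triplet:" etc. do not match.
SIGNAL_RE = re.compile(r"\b" + KINDS + r": peak=([^,\s]+)")
COUNT_RE = re.compile(r"\b" + KINDS + r" count: (\d+)")


def listed_signals(text):
    """Count the reported signal lines of each type found in the log text."""
    return Counter(kind for kind, _ in SIGNAL_RE.findall(text))


def declared_counts(text):
    """Read the 'X count: N' summary values from the log text."""
    return {kind: int(n) for kind, n in COUNT_RE.findall(text)}


def count_mismatches(text):
    """Map each type to (listed, declared) where the two numbers disagree."""
    listed = listed_signals(text)
    return {
        kind: (listed.get(kind, 0), n)
        for kind, n in declared_counts(text).items()
        if listed.get(kind, 0) != n
    }


def bad_peaks(text):
    """Return (type, raw peak) pairs whose peak is not a finite number."""
    bad = []
    for kind, raw in SIGNAL_RE.findall(text):
        try:
            ok = math.isfinite(float(raw))
        except ValueError:
            ok = False
        if not ok:
            bad.append((kind, raw))
    return bad


if __name__ == "__main__":
    log = (
        "Triplet: peak=7.026053, time=34.2, period=0.1917, chirp=0, fft_len=8\n"
        "Spike: peak=24.03246, time=6.711, chirp=-29.499, fft_len=128k\n"
        "Best triplet: peak=7.026053, time=34.2, period=0.1917, chirp=0, fft_len=8\n"
        "Spike count: 1\nTriplet count: 1\n"
    )
    print(count_mismatches(log), bad_peaks(log))
```

The patterns search the whole text rather than splitting on newlines, so the helper also works when a log has been pasted as one long line. The same per-type tallies from two different apps can be compared to see which one is missing Triplets.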

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1875010 - Posted: 25 Jun 2017, 17:30:46 UTC - in response to Message 1875007. Yes, even with zi3v you still get an occasional Bad Best Pulse, but it's extremely rare. That may explain this one then. I saw it last evening but had refrained from posting it until that machine had more of a processing history behind it. Workunit 2567983999 (20oc08aa.4777.254820.12.39.5) Task 5794100079 (S=10, A=3, P=0, T=0, G=0, BG=0) v8.22 (opencl_ati_cat132) windows_intelx86 Task 5829376759 (S=10, A=3, P=0, T=0, G=0, BG=0) x41p_zi3v, Cuda 8.00 special v8.22 (opencl_ati_cat132) windows_intelx86 - Best pulse: peak=0.4685673, time=98.45, period=0.01441, d_freq=1420048834.69, score=0.9218, chirp=-61.928, fft_len=8 x41p_zi3v, Cuda 8.00 special - Best pulse: peak=0.3951461, time=68.92, period=0.0147, d_freq=1420052490.23, score=0.7774, chirp=0, fft_len=8 All the reported signals and Best signals seem to match between the two. ID: 1875010 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1875012 - Posted: 25 Jun 2017, 17:43:46 UTC - in response to Message 1875010. Yes, even with zi3v you still get an occasional Bad Best Pulse, but it's extremely rare. That may explain this one then. I saw it last evening but had refrained from posting it until that machine had more of a processing history behind it. Workunit 2567983999 (20oc08aa.4777.254820.12.39.5) Task 5794100079 (S=10, A=3, P=0, T=0, G=0, BG=0) v8.22 (opencl_ati_cat132) windows_intelx86 Task 5829376759 (S=10, A=3, P=0, T=0, G=0, BG=0) x41p_zi3v, Cuda 8.00 special v8.22 (opencl_ati_cat132) windows_intelx86 - Best pulse: peak=0.4685673, time=98.45, period=0.01441, d_freq=1420048834.69, score=0.9218, chirp=-61.928, fft_len=8 x41p_zi3v, Cuda 8.00 special - Best pulse: peak=0.3951461, time=68.92, period=0.0147, d_freq=1420052490.23, score=0.7774, chirp=0, fft_len=8 All the reported signals and Best signals seem to match between the two. It differs from the usual bug signature: there is no reportable Pulse, so it can be just a difference in precision range. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1875012 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1875014 - Posted: 25 Jun 2017, 17:49:13 UTC - in response to Message 1875010. Last modified: 25 Jun 2017, 17:50:23 UTC The only way to tell for sure is to run the task with your CPU and compare the results. You should give that a try, you can run a CPU task in the benchmark App while running BOINC. Just reduce the CPU usage by One in BOINC and remove any Apps from the APPS folder in the Benchmark package. The CPU App in the REF_APPS folder will search the WU folder and run any task it doesn't have results for. The Benchmark tool is here, KWSN Linux MB Bench v2.01.08. Extract the KWSN-Bench-Linux-MBv7_v2.01.08.7z to your Home folder and run it from there. ID: 1875014 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1875015 - Posted: 25 Jun 2017, 18:05:02 UTC - in response to Message 1875014. The only way to tell for sure is to run the task with your CPU and compare the results. You should give that a try, you can run a CPU task in the benchmark App while running BOINC. Just reduce the CPU usage by One in BOINC and remove any Apps from the APPS folder in the Benchmark package. The CPU App in the REF_APPS folder will search the WU folder and run any task it doesn't have results for. The Benchmark tool is here, KWSN Linux MB Bench v2.01.08. Extract the KWSN-Bench-Linux-MBv7_v2.01.08.7z to your Home folder and run it from there. Thanks. I may try that later. I actually was looking in Lunatics for the Windows CPU bench tool yesterday, but couldn't find one that said v8 on it, or in the Readme, so I wasn't sure what was current. ID: 1875015 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1875019 - Posted: 25 Jun 2017, 18:14:28 UTC - in response to Message 1875015. Just use v8 app as ref one. SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1875019 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1875020 - Posted: 25 Jun 2017, 18:20:50 UTC - in response to Message 1875019. Just use v8 app as ref one. Okay, so for Windows, the MBbench210 is the most current? ID: 1875020 ·

TBar Volunteer tester Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768	Message 1875023 - Posted: 25 Jun 2017, 18:35:50 UTC Last modified: 25 Jun 2017, 18:44:34 UTC
Here's an example of how valuable the tool can be. I had another task that showed zi3v as finding One less triplet, so, even though the other App is a Known Usual Suspect I ran it with the CPU just to make sure. The task is here, https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9838044 My CPU was a Very Close Match to zi3v;
SSE4.1xjf OS X 64bit Build 3344 , Ported by : Raistmer, JDWhale, Urs Echternacht
Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 8.626744
Triplet: peak=7.835223, time=7.977, period=0.1479, d_freq=1420284423.83, chirp=0, fft_len=8
Triplet: peak=7.51159, time=9.684, period=0.2327, d_freq=1420278930.66, chirp=0, fft_len=16
Triplet: peak=7.83766, time=106.4, period=0.1303, d_freq=1420280761.72, chirp=0, fft_len=16
Triplet: peak=7.675463, time=39.55, period=0.1098, d_freq=1420285339.36, chirp=0, fft_len=32
Triplet: peak=7.382066, time=78, period=0.008192, d_freq=1420287170.41, chirp=0, fft_len=32
Triplet: peak=14.74022, time=79.87, period=0.1065, d_freq=1420286063.34, chirp=81.663, fft_len=32
Triplet: peak=7.78183, time=102.1, period=0.1458, d_freq=1420285852.58, chirp=-81.663, fft_len=32
Best spike: peak=23.52901, time=20.13, d_freq=1420281048.53, chirp=-14.024, fft_len=128k
Best autocorr: peak=17.05683, time=46.98, delay=2.9592, d_freq=1420282416.39, chirp=-16.748, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.121e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=4.714036, time=80, period=0.1057, d_freq=1420286074.44, score=0.8968, chirp=81.663, fft_len=32
Best triplet: peak=14.74022, time=79.87, period=0.1065, d_freq=1420286063.34, chirp=81.663, fft_len=32
Spike count: 0
Autocorr count: 0
Pulse count: 0
Triplet count: 7
Gaussian count: 0
Time cpu in use since last restart: 3568.2 seconds
So, I'm pretty sure the Triplet find in zi3v is working normally.
ID: 1875023 ·

Raistmer Volunteer developer Volunteer tester Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1875025 - Posted: 25 Jun 2017, 19:16:57 UTC - in response to Message 1875020. Just use v8 app as ref one. Okay, so for Windows, the MBbench210 is the most current? maybe not but fully adequate for your aims SETI apps news We're not gonna fight them. We're gonna transcend them. ID: 1875025 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1875032 - Posted: 25 Jun 2017, 19:40:32 UTC - in response to Message 1875025. Just use v8 app as ref one. Okay, so for Windows, the MBbench210 is the most current? maybe not but fully adequate for your aims Okay, thanks. Your "Addition to MBBench v2.13" had made me think that there might be a more recent version that I just couldn't find. I had downloaded the 210 version yesterday, so perhaps I'll start playing with it later on. ID: 1875032 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1875057 - Posted: 25 Jun 2017, 21:44:41 UTC - in response to Message 1875010. ...All the reported signals and Best signals seem to match between the two. If implemented as I picture: For the pulse mechanism shunt/workaround, the stderr.txt 'realtime' log might see the racing pulse detections, then shunt to unroll 1 to record the correct ones. If that's the case, it does reflect reality in the new 'racey-fixey' kind of way, but may need to be presented more clearly. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1875057 ·

Jeff Buck Volunteer tester Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0	Message 1875060 - Posted: 25 Jun 2017, 21:55:17 UTC - in response to Message 1875057. ...All the reported signals and Best signals seem to match between the two. If implemented as I picture: For the pulse mechanism shunt/workaround, the stderr.txt 'realtime' log might see the racing pulse detections, then shunt to unroll 1 to record the correct ones. If that's the case, it does reflect reality in the new 'racey-fixey' kindof way, but may need to be presented more clearly. Ah, so perhaps the actual Result file would contain a different Best Pulse value than the Stderr shows? ID: 1875060 ·
