Linux CUDA 'Special' App finally available, featuring Low CPU use

Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1874857 - Posted: 24 Jun 2017, 13:23:24 UTC - in response to Message 1874825.  

Maybe a new call for testing CPU apps is a good idea. New binary to test on beta to get more CPUs running.

I fired mine back up.


. . Raistmer has been running testing on V8.05 just recently.

Stephen

.
ID: 1874857 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1874870 - Posted: 24 Jun 2017, 14:03:41 UTC

The New CPU App is for Windows;
Windows/x86 running on an AMD x86_64 or Intel EM64T CPU 8.06 (alt) 1 Jun 2017, 17:48:19 UTC

So....You would need to run a Windows Machine at Beta to run the New CPU App. Basically the Test is finished, and it's just a matter of time before it appears on main, https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266
I've checked my CPU results against the new App and it's a good match.
ID: 1874870 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1874872 - Posted: 24 Jun 2017, 14:12:49 UTC - in response to Message 1874870.  

The New CPU App is for Windows;
Windows/x86 running on an AMD x86_64 or Intel EM64T CPU 8.06 (alt) 1 Jun 2017, 17:48:19 UTC

So....You would need to run a Windows Machine at Beta to run the New CPU App. Basically the Test is finished, and it's just a matter of time before it appears on main, https://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266
I've checked my CPU results against the new App and it's a good match.


. . Well that sounds like a fait accompli!

. . So I guess that should mean it is just a matter of time for your app as well ??

Stephen

??
ID: 1874872 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1874886 - Posted: 24 Jun 2017, 17:00:30 UTC - in response to Message 1874126.  

This version uses a little more video RAM than before, meaning GPUs with less than 2 GB may not work and GPUs with 2 GB may have problems using Unroll 8. If you have problems, lower the -unroll to 6 or lower.
Just a quick note to that point. On my newly-launched Linux host, 8289033, I went ahead with x41p_zi3v since I was starting it up from scratch. The 2GB GTX 960 on that machine, which defaults to an unroll value of 8 using autotune, is using 91% of dedicated memory. The 2GB GTX 960s on my other two Linux boxes, both still running x41p_zi3t2b, are using 81% of dedicated memory, so the increase in memory use is significant but not enough to push it over the top.

So far, I've only noted one task with a problem using zi3v, and I'm pretty sure that's an isolated incident. Task 5828724704 originally was started on a GTX 750 Ti but, following a reboot, restarted on the GTX 960. Before the restart, it looks like it was running fine, but afterwards it went haywire, identifying 25 bogus Triplets with non-numeric peaks (i.e., "peak=-nan"). I imagine that it's just some sort of restart timing issue, though perhaps on a restart like that the memory usage spikes in some way. Only if it happens again will I really be concerned. Anyway, that task is currently in an Inconclusive state but I expect it to go Invalid once the tie-breaker reports in.
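
For anyone who wants to check their own logs for this symptom, here is a minimal Python sketch that scans stderr text for signal lines whose peak is not a finite number. It assumes the "Type: peak=value, ..." line layout seen in stderr output elsewhere in this thread; the function name find_bad_peaks and the sample text are made up for illustration.

```python
import math
import re

# Signal lines look like "Triplet: peak=7.026053, time=34.2, ..." (assumed layout).
SIGNAL_RE = re.compile(r"^(Spike|Autocorr|Gaussian|Pulse|Triplet): peak=([^,\s]+)")


def find_bad_peaks(stderr_text):
    """Return (line_number, signal_type, raw_peak) for every non-finite peak."""
    bad = []
    for lineno, line in enumerate(stderr_text.splitlines(), start=1):
        match = SIGNAL_RE.match(line.strip())
        if match is None:
            continue
        kind, raw = match.groups()
        try:
            value = float(raw)  # float() parses "nan", "-nan" and "inf" as well
        except ValueError:
            value = math.nan  # anything unparsable is treated as bad
        if not math.isfinite(value):
            bad.append((lineno, kind, raw))
    return bad


if __name__ == "__main__":
    # Hypothetical two-line sample, not taken from a real task.
    sample = (
        "Triplet: peak=7.026053, time=34.2, period=0.1917\n"
        "Triplet: peak=-nan, time=14.05, period=0.01065\n"
    )
    for lineno, kind, raw in find_bad_peaks(sample):
        print(f"line {lineno}: {kind} has non-numeric peak {raw!r}")
```

Only reported-signal lines are checked; "Best ..." lines are deliberately skipped by the pattern.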
ID: 1874886 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1874934 - Posted: 25 Jun 2017, 2:55:06 UTC
Last modified: 25 Jun 2017, 3:12:27 UTC

Judging by the following 3 Inconclusive WUs, I have to assume there were significant changes to the Best Gaussian reporting between x41p_zi3j and x41p_zi3t2b. Could identifying where those changes occurred help identify where the issue might be now?

Workunit 2584872714 (07no16ab.5263.633293.16.43.234)
Task 5829506938 (S=12, A=2, P=0, T=1, G=0, BG=8.010661) x41p_zi3t2b, Cuda 8.00 special
Task 5829506939 (S=12, A=2, P=0, T=1, G=0, BG=4.340267) x41p_zi3j, Cuda 8.00 special

Workunit 2584872738 (07no16ab.5263.633293.16.43.239)
Task 5829506946 (S=0, A=0, P=0, T=2, G=0, BG=8.485233) x41p_zi3t2b, Cuda 8.00 special
Task 5829506947 (S=0, A=0, P=0, T=2, G=0, BG=5.217659) x41p_zi3j, Cuda 8.00 special

Workunit 2584872762 (07no16ab.5263.633293.16.43.245)
Task 5829506954 (S=13, A=0, P=0, T=4, G=0, BG=3.93118) x41p_zi3j, Cuda 8.00 special
Task 5829506955 (S=13, A=0, P=0, T=4, G=0, BG=7.523284) x41p_zi3t2b, Cuda 8.00 special

EDIT: I also just noticed an odd one where both hosts are apparently running the same app, just with different command line parameters. The only difference I can see is a very subtle one in the Best Pulse (peak=4.157121 vs. peak=4.282314). Is this the sort of thing that the zi3t2v is supposed to correct?

Workunit 2585082982 (12no16ab.30367.387921.9.36.65)
Task 5829947327 (S=0, A=3, P=0, T=2, G=0, BG=5.81025) x41p_zi3t2b, Cuda 8.00 special
Task 5829947328 (S=0, A=3, P=0, T=2, G=0, BG=5.81025) x41p_zi3t2b, Cuda 8.00 special
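
A rough Python sketch of how summary lines like the ones above could be compared automatically. It assumes the shorthand used in this post (S, A, P, T, G as signal counts and BG as the Best Gaussian peak); the 1% relative tolerance is an arbitrary choice for illustration and is not the project validator's rule.

```python
import math
import re

# Matches e.g. "Task 5829506938 (S=12, A=2, P=0, T=1, G=0, BG=8.010661) ..."
SUMMARY_RE = re.compile(
    r"Task (\d+) \(S=(\d+), A=(\d+), P=(\d+), T=(\d+), G=(\d+), BG=([0-9.]+)\)"
)
COUNT_FIELDS = ("S", "A", "P", "T", "G")


def parse_summary(line):
    """Turn one summary line into a dict with the task id, counts and BG value."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a task summary line: {line!r}")
    groups = match.groups()
    counts = dict(zip(COUNT_FIELDS, (int(g) for g in groups[1:6])))
    return {"task": int(groups[0]), "counts": counts, "bg": float(groups[6])}


def differences(a, b, rel_tol=0.01):
    """List the fields on which two parsed summaries disagree."""
    diffs = [name for name in COUNT_FIELDS if a["counts"][name] != b["counts"][name]]
    if not math.isclose(a["bg"], b["bg"], rel_tol=rel_tol):  # rel_tol is an assumption
        diffs.append("BG")
    return diffs


if __name__ == "__main__":
    first = parse_summary("Task 5829506938 (S=12, A=2, P=0, T=1, G=0, BG=8.010661) x41p_zi3t2b")
    second = parse_summary("Task 5829506939 (S=12, A=2, P=0, T=1, G=0, BG=4.340267) x41p_zi3j")
    print(differences(first, second))
```

Note that a pair like the last workunit shows no difference at this level, because the Best Pulse is not part of the summary line.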
ID: 1874934 · Report as offensive
Stephen "Heretic" Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester
Joined: 20 Sep 12
Posts: 5557
Credit: 192,787,363
RAC: 628
Australia
Message 1874966 - Posted: 25 Jun 2017, 12:02:25 UTC - in response to Message 1835988.  


As soon as I (or anyone) has managed to isolate and fix the unroll bug, the special alpha will become available in Windows form. It's not available now only because of the mass devastation of project results that would occur if it were released. Fortunately I have had more time for digging since a couple of days ago, though I'm not promising quick fixes. The pulsefinding is unfortunately the most complex part of these applications.


. . Not that I have the foggiest about the unroll bug, but I like the idea of a test version being available for Windows. However long it takes, it won't be any worse than waiting for this low profile 1050 Ti that MSI touted. Nearly a month and not a peep since they posted its existence.

Stephen

.


. . OK, so the Windows Alpha/Beta test version has taken longer to arrive than the LP 1050 Ti did :)

Stephen

:)
ID: 1874966 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1874990 - Posted: 25 Jun 2017, 15:17:48 UTC - in response to Message 1874793.  
Last modified: 25 Jun 2017, 15:18:14 UTC


One thing that I think I'm noticing is that when there is a reported Gaussian, that peak will match the Best Gaussian peak in SoG. However, in the other apps, the Best Gaussian will have a higher peak than the reported Gaussian. Perhaps there's some significance there. Or perhaps not. :^)

Yes, it's a very important observation (though it doesn't cover the 8.05 vs. CPU opt difference).
OpenCL apps stop the Gaussian search for parameters worse than an already-found reportable one (that is, the best Gaussian will be one of the reportable ones if any reportable exists).
AFAIK the same feature (not reporting a non-reportable as best if a reportable one was already found) exists in the CPU path too. Need to check that.
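
To make the difference concrete, here is a toy Python model of the two selection policies being discussed. It is only a sketch of the rule as described above, not the code of any actual app: the Candidate class, its score field and both function names are hypothetical, and the real apps rank Gaussians with their own criteria.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    score: float      # hypothetical ranking value, higher is better
    reportable: bool  # True if the candidate passes the reporting thresholds


def best_any(cands: Iterable[Candidate]) -> Optional[Candidate]:
    """Policy 1: the best is simply the highest-ranked candidate, reportable or not."""
    return max(cands, key=lambda c: c.score, default=None)


def best_preferring_reportable(cands: Iterable[Candidate]) -> Optional[Candidate]:
    """Policy 2: once a reportable candidate exists, a non-reportable one never replaces it."""
    best: Optional[Candidate] = None
    for cand in cands:
        if best is None:
            best = cand
        elif best.reportable and not cand.reportable:
            continue  # keep the reportable best
        elif cand.reportable and not best.reportable:
            best = cand  # first reportable displaces any non-reportable best
        elif cand.score > best.score:
            best = cand  # same class: higher rank wins
    return best


if __name__ == "__main__":
    found = [Candidate(score=3.9, reportable=True), Candidate(score=7.5, reportable=False)]
    print(best_any(found))                    # the non-reportable, higher-ranked one
    print(best_preferring_reportable(found))  # the reportable one
```

Under this model two apps can agree on every reported Gaussian and still disagree on the Best Gaussian, which is the pattern seen in the Inconclusives above.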
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1874990 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1874991 - Posted: 25 Jun 2017, 15:20:45 UTC - in response to Message 1874966.  
Last modified: 25 Jun 2017, 15:21:41 UTC


. . OK, so the Windows Alpha/Beta test version has taken longer to arrive than the LP 1050 Ti did :)

Stephen

:)


This sentence would have more meaning with NV/MSI/whatever vs Petri's budget citation...
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1874991 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1875007 - Posted: 25 Jun 2017, 16:59:23 UTC
Last modified: 25 Jun 2017, 17:05:02 UTC

After a couple days, I found another Inconclusive against a CPU. It's a matter of missing Triplets, the CUDA App missed three that two CPUs found...first time with this result. The other couple were with bad Best Pulses. Yes, even with zi3v you still get an occasional Bad Best Pulse, but it's extremely rare. The Beta results are here, https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9837856
I ran it with the OSX CPU and got;
Version info: SSE4.1xjf (Intel, Core 2-optimized v8-nographics) V5.13 by Alex Kan
SSE4.1xjf OS X 64bit Build 3344 , Ported by : Raistmer, JDWhale, Urs Echternacht

Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  7.386570
Triplet: peak=7.026053, time=34.2, period=0.1917, d_freq=1420949707.03, chirp=0, fft_len=8 
Triplet: peak=9.419447, time=14.05, period=0.01065, d_freq=1420950927.73, chirp=0, fft_len=16 
Triplet: peak=9.904287, time=14.05, period=0.01065, d_freq=1420950927.73, chirp=0, fft_len=16 
Triplet: peak=7.470804, time=80.6, period=0.1483, d_freq=1420951538.09, chirp=0, fft_len=16 
Triplet: peak=7.411139, time=26.32, period=0.01475, d_freq=1420948791.5, chirp=0, fft_len=32 
Triplet: peak=7.377739, time=44.18, period=0.06062, d_freq=1420944213.87, chirp=0, fft_len=32 
Autocorr: peak=17.97998, time=87.24, delay=5.87, d_freq=1420945408.22, chirp=-21.29, fft_len=128k
Spike: peak=24.03246, time=6.711, d_freq=1420944476.05, chirp=-29.499, fft_len=128k
Triplet: peak=6.846987, time=73.11, period=0.1065, d_freq=1420946887.11, chirp=69.956, fft_len=32 
Triplet: peak=7.447862, time=26.32, period=0.01475, d_freq=1420948801.88, chirp=69.956, fft_len=32 
Triplet: peak=7.316117, time=26.32, period=0.01475, d_freq=1420948781.13, chirp=-69.956, fft_len=32 

Best spike: peak=24.03246, time=6.711, d_freq=1420944476.05, chirp=-29.499, fft_len=128k
Best autocorr: peak=17.97998, time=87.24, delay=5.87, d_freq=1420945408.22, chirp=-21.29, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.121e+11, d_freq=0,
	score=-12, null_hyp=0, chirp=0, fft_len=0 
Best pulse: peak=2.541833, time=34.19, period=0.04792, d_freq=1420949707.03, score=0.8909, chirp=0, fft_len=8 
Best triplet: peak=9.904287, time=14.05, period=0.01065, d_freq=1420950927.73, chirp=0, fft_len=16 

Spike count:    1
Autocorr count: 1
Pulse count:    0
Triplet count:  9
Gaussian count: 0
Time cpu in use since last restart: 3618.6 seconds

So, after 2800 tasks, I get a Bad triplet count. I suppose I shouldn't complain. Could have been one of those Cosmic Rays I suppose. The task is here, http://boinc2.ssl.berkeley.edu/beta/download/3d/04oc08ab.15453.20522.13.47.97
Anyway, this will give people a chance to compare my SSE4.1xjf OS X 64bit Build 3344 Results to the new SETI@home v8 v8.06 (alt) windows_x86_64 Results. Looks pretty close...
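
If you want to sanity-check a stderr dump like the one above before comparing it with another app, a small Python sketch along these lines can tally the signal lines and compare them with the "... count:" summary. It assumes the stderr layout shown in this post; tally and mismatches are illustrative names.

```python
import re
from collections import Counter

# "Triplet: peak=..." is a reported signal; "Best triplet: ..." is not counted.
SIGNAL_RE = re.compile(r"^(Spike|Autocorr|Gaussian|Pulse|Triplet): ")
COUNT_RE = re.compile(r"^(Spike|Autocorr|Gaussian|Pulse|Triplet) count:\s*(\d+)")


def tally(stderr_text):
    """Return (signals found per type, counts stated in the summary block)."""
    found = Counter()
    reported = {}
    for raw in stderr_text.splitlines():
        line = raw.strip()
        count_match = COUNT_RE.match(line)
        if count_match:
            reported[count_match.group(1)] = int(count_match.group(2))
            continue
        signal_match = SIGNAL_RE.match(line)
        if signal_match:
            found[signal_match.group(1)] += 1
    return found, reported


def mismatches(found, reported):
    """Map signal type -> (lines found, count stated) wherever the two differ."""
    return {
        kind: (found.get(kind, 0), stated)
        for kind, stated in reported.items()
        if found.get(kind, 0) != stated
    }


if __name__ == "__main__":
    import sys

    found, reported = tally(sys.stdin.read())
    print(mismatches(found, reported) or "signal lines agree with the stated counts")
```

Fed the stderr above, it should tally nine Triplet lines against "Triplet count:  9". The same tally from a second app's stderr then gives a quick per-type comparison.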
ID: 1875007 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1875010 - Posted: 25 Jun 2017, 17:30:46 UTC - in response to Message 1875007.  

Yes, even with zi3v you still get an occasional Bad Best Pulse, but it's extremely rare.
That may explain this one then. I saw it last evening but had refrained from posting it until that machine had more of a processing history behind it.

Workunit 2567983999 (20oc08aa.4777.254820.12.39.5)
Task 5794100079 (S=10, A=3, P=0, T=0, G=0, BG=0) v8.22 (opencl_ati_cat132) windows_intelx86
Task 5829376759 (S=10, A=3, P=0, T=0, G=0, BG=0) x41p_zi3v, Cuda 8.00 special

v8.22 (opencl_ati_cat132) windows_intelx86 - Best pulse: peak=0.4685673, time=98.45, period=0.01441, d_freq=1420048834.69, score=0.9218, chirp=-61.928, fft_len=8
x41p_zi3v, Cuda 8.00 special - Best pulse: peak=0.3951461, time=68.92, period=0.0147, d_freq=1420052490.23, score=0.7774, chirp=0, fft_len=8

All the reported signals and Best signals seem to match between the two.
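
Here is a small Python sketch for comparing two "Best pulse" lines field by field, as I did by eye above. It assumes the key=value layout of the lines quoted in this post; the 1% relative tolerance is an arbitrary illustration value, not the validator's rule, and differing_fields is a made-up name.

```python
import math
import re

# Picks up pairs such as "peak=0.4685673" or "chirp=-61.928".
FIELD_RE = re.compile(r"(\w+)=([-+0-9.eE]+)")


def parse_fields(line):
    """Extract every numeric key=value pair from one stderr line."""
    return {key: float(value) for key, value in FIELD_RE.findall(line)}


def differing_fields(line_a, line_b, rel_tol=0.01):
    """Names of shared fields whose values differ by more than rel_tol (assumed)."""
    a, b = parse_fields(line_a), parse_fields(line_b)
    return sorted(
        key
        for key in a.keys() & b.keys()
        if not math.isclose(a[key], b[key], rel_tol=rel_tol)
    )


if __name__ == "__main__":
    opencl = ("Best pulse: peak=0.4685673, time=98.45, period=0.01441, "
              "d_freq=1420048834.69, score=0.9218, chirp=-61.928, fft_len=8")
    cuda = ("Best pulse: peak=0.3951461, time=68.92, period=0.0147, "
            "d_freq=1420052490.23, score=0.7774, chirp=0, fft_len=8")
    print(differing_fields(opencl, cuda))
```

One caveat: a purely relative tolerance treats the two d_freq values here as equal, because a few kHz is tiny next to 1.42 GHz, so a real comparison would want an absolute tolerance for frequency.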
ID: 1875010 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1875012 - Posted: 25 Jun 2017, 17:43:46 UTC - in response to Message 1875010.  

Yes, even with zi3v you still get an occasional Bad Best Pulse, but it's extremely rare.
That may explain this one then. I saw it last evening but had refrained from posting it until that machine had more of a processing history behind it.

Workunit 2567983999 (20oc08aa.4777.254820.12.39.5)
Task 5794100079 (S=10, A=3, P=0, T=0, G=0, BG=0) v8.22 (opencl_ati_cat132) windows_intelx86
Task 5829376759 (S=10, A=3, P=0, T=0, G=0, BG=0) x41p_zi3v, Cuda 8.00 special

v8.22 (opencl_ati_cat132) windows_intelx86 - Best pulse: peak=0.4685673, time=98.45, period=0.01441, d_freq=1420048834.69, score=0.9218, chirp=-61.928, fft_len=8
x41p_zi3v, Cuda 8.00 special - Best pulse: peak=0.3951461, time=68.92, period=0.0147, d_freq=1420052490.23, score=0.7774, chirp=0, fft_len=8

All the reported signals and Best signals seem to match between the two.

It differs from the usual bug signature: there is no reportable Pulse, so it could be just a difference in precision range.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1875012 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1875014 - Posted: 25 Jun 2017, 17:49:13 UTC - in response to Message 1875010.  
Last modified: 25 Jun 2017, 17:50:23 UTC

The only way to tell for sure is to run the task with your CPU and compare the results. You should give that a try, you can run a CPU task in the benchmark App while running BOINC. Just reduce the CPU usage by One in BOINC and remove any Apps from the APPS folder in the Benchmark package. The CPU App in the REF_APPS folder will search the WU folder and run any task it doesn't have results for. The Benchmark tool is here, KWSN Linux MB Bench v2.01.08. Extract the KWSN-Bench-Linux-MBv7_v2.01.08.7z to your Home folder and run it from there.
ID: 1875014 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1875015 - Posted: 25 Jun 2017, 18:05:02 UTC - in response to Message 1875014.  

The only way to tell for sure is to run the task with your CPU and compare the results. You should give that a try, you can run a CPU task in the benchmark App while running BOINC. Just reduce the CPU usage by One in BOINC and remove any Apps from the APPS folder in the Benchmark package. The CPU App in the REF_APPS folder will search the WU folder and run any task it doesn't have results for. The Benchmark tool is here, KWSN Linux MB Bench v2.01.08. Extract the KWSN-Bench-Linux-MBv7_v2.01.08.7z to your Home folder and run it from there.
Thanks. I may try that later. I actually was looking in Lunatics for the Windows CPU bench tool yesterday, but couldn't find one that said v8 on it, or in the Readme, so I wasn't sure what was current.
ID: 1875015 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1875019 - Posted: 25 Jun 2017, 18:14:28 UTC - in response to Message 1875015.  

Just use v8 app as ref one.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1875019 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1875020 - Posted: 25 Jun 2017, 18:20:50 UTC - in response to Message 1875019.  

Just use v8 app as ref one.
Okay, so for Windows, the MBbench210 is the most current?
ID: 1875020 · Report as offensive
TBar
Volunteer tester

Joined: 22 May 99
Posts: 5204
Credit: 840,779,836
RAC: 2,768
United States
Message 1875023 - Posted: 25 Jun 2017, 18:35:50 UTC
Last modified: 25 Jun 2017, 18:44:34 UTC

Here's an example of how valuable the tool can be. I had another task that showed zi3v as finding One less triplet, so, even though the other App is a Known Usual Suspect I ran it with the CPU just to make sure.
The task is here, https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9838044 My CPU was a Very Close Match to zi3v;
SSE4.1xjf OS X 64bit Build 3344 , Ported by : Raistmer, JDWhale, Urs Echternacht
Work Unit Info:
...............
Credit multiplier is :  2.85
WU true angle range is :  8.626744
Triplet: peak=7.835223, time=7.977, period=0.1479, d_freq=1420284423.83, chirp=0, fft_len=8 
Triplet: peak=7.51159, time=9.684, period=0.2327, d_freq=1420278930.66, chirp=0, fft_len=16 
Triplet: peak=7.83766, time=106.4, period=0.1303, d_freq=1420280761.72, chirp=0, fft_len=16 
Triplet: peak=7.675463, time=39.55, period=0.1098, d_freq=1420285339.36, chirp=0, fft_len=32 
Triplet: peak=7.382066, time=78, period=0.008192, d_freq=1420287170.41, chirp=0, fft_len=32 
Triplet: peak=14.74022, time=79.87, period=0.1065, d_freq=1420286063.34, chirp=81.663, fft_len=32 
Triplet: peak=7.78183, time=102.1, period=0.1458, d_freq=1420285852.58, chirp=-81.663, fft_len=32 

Best spike: peak=23.52901, time=20.13, d_freq=1420281048.53, chirp=-14.024, fft_len=128k
Best autocorr: peak=17.05683, time=46.98, delay=2.9592, d_freq=1420282416.39, chirp=-16.748, fft_len=128k
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.121e+11, d_freq=0,
	score=-12, null_hyp=0, chirp=0, fft_len=0 
Best pulse: peak=4.714036, time=80, period=0.1057, d_freq=1420286074.44, score=0.8968, chirp=81.663, fft_len=32 
Best triplet: peak=14.74022, time=79.87, period=0.1065, d_freq=1420286063.34, chirp=81.663, fft_len=32 

Spike count:    0
Autocorr count: 0
Pulse count:    0
Triplet count:  7
Gaussian count: 0
Time cpu in use since last restart: 3568.2 seconds

So, I'm pretty sure the Triplet find in zi3v is working normally.
ID: 1875023 · Report as offensive
Profile Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1875025 - Posted: 25 Jun 2017, 19:16:57 UTC - in response to Message 1875020.  

Just use v8 app as ref one.
Okay, so for Windows, the MBbench210 is the most current?

Maybe not, but it's fully adequate for your aims.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
ID: 1875025 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1875032 - Posted: 25 Jun 2017, 19:40:32 UTC - in response to Message 1875025.  

Just use v8 app as ref one.
Okay, so for Windows, the MBbench210 is the most current?

Maybe not, but it's fully adequate for your aims.
Okay, thanks. Your "Addition to MBBench v2.13" had made me think that there might be a more recent version that I just couldn't find.

I had downloaded the 210 version yesterday, so perhaps I'll start playing with it later on.
ID: 1875032 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1875057 - Posted: 25 Jun 2017, 21:44:41 UTC - in response to Message 1875010.  

...All the reported signals and Best signals seem to match between the two.


If implemented as I picture: For the pulse mechanism shunt/workaround, the stderr.txt 'realtime' log might see the racing pulse detections, then shunt to unroll 1 to record the correct ones. If that's the case, it does reflect reality in the new 'racey-fixey' kind of way, but may need to be presented more clearly.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1875057 · Report as offensive
Profile Jeff Buck Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer tester

Joined: 11 Feb 00
Posts: 1441
Credit: 148,764,870
RAC: 0
United States
Message 1875060 - Posted: 25 Jun 2017, 21:55:17 UTC - in response to Message 1875057.  

...All the reported signals and Best signals seem to match between the two.


If implemented as I picture: For the pulse mechanism shunt/workaround, the stderr.txt 'realtime' log might see the racing pulse detections, then shunt to unroll 1 to record the correct ones. If that's the case, it does reflect reality in the new 'racey-fixey' kind of way, but may need to be presented more clearly.
Ah, so perhaps the actual Result file would contain a different Best Pulse value than the Stderr shows?
ID: 1875060 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.