Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Those seem like pretty low peaks to start with for best Gaussian. 1 in 300 with contention deep in the noise floor 'feels' as though we're pushing technology limits (once again), but it will warrant more definite understanding either way. Plotting the PoT data from the results and visually comparing whether they look anything alike might say something. My suspicion is they won't look very 'Gaussiany' at all. If so, pushing further into the noise floor, while possible, may be fruitless. Eric's ruled out that we need double-precision or bit-identical results below reportable thresholds (in the case of Gaussians, iirc, the score is derived from the ChiSq fit and null hypothesis). [Edit:] That's also at a very high chirp rate near chirp limits, so one or another application struggling with that, especially 64-bit builds, wouldn't be surprising or unacceptable. Cumulative error will be at its greatest there. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I found the reference in the PMs with Petri from April 16th: "It seems the Best gaussian is off in a few tasks and some tasks are reporting an extra Pulse." That was with zi3t2b on results from Main. Fast forward two months and I'm seeing the same results with zi3v on Beta. The CPUs match the CUDA App over the SoG ones. Here's a couple more: https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9809103 https://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=9796907 They are around the 3 to 4 range; if that's too low, maybe it needs to be raised. But the CUDA Apps don't seem to have a problem with it. So, who broke the repository? I've been getting the same error for over a day: https://setisvn.ssl.berkeley.edu/trac/browser/seti_boinc |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Just a comment, I think this may deserve its own thread since it appears this is leaning towards an SoG issue more than CUDA. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Just a comment, I think this may deserve its own thread since it appears this is leaning towards an SoG issue more than CUDA." Say what? It's about proving the CUDA App should be accepted to Beta, as it agrees with the CPU Apps better than the current 'standard'. I just checked... it is my name on the thread. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Are these the sorts of "Best gaussian" mismatches that you're looking into?

Workunit 2573263722 (23se08ac.6875.22968.6.33.135)
Task 5805117074 (S=3, A=0, P=1, T=3, G=0) x41p_zi3t2b, Cuda 8.00 special
Task 5805117075 (S=3, A=0, P=1, T=3, G=0) v8.22 (opencl_nvidia_SoG) windows_intelx86
Cuda 8.00 special - Best gaussian: peak=3.252388, mean=0.5397108, ChiSq=1.344394, time=14.26, d_freq=1418816790.11, score=-1.169299, null_hyp=2.144445, chirp=-39.071, fft_len=16k
v8.22 SoG - Best gaussian: peak=3.76217, mean=0.5480909, ChiSq=1.226871, time=39.43, d_freq=1418822660.68, score=-1.169124, null_hyp=2.078196, chirp=43.425, fft_len=16k

Workunit 2573397376 (23se08ac.6117.29512.7.34.110)
Task 5805397040 (S=9, A=1, P=0, T=1, G=2) v8.22 (opencl_nvidia_SoG) windows_intelx86
Task 5805397041 (S=9, A=1, P=0, T=1, G=2) x41p_zi3t2b, Cuda 8.00 special
Cuda 8.00 special - Best gaussian: peak=3.803741, mean=0.5273897, ChiSq=1.256179, time=51.17, d_freq=1421074725.21, score=1.100755, null_hyp=2.216236, chirp=46.042, fft_len=16k
v8.22 SoG - Best gaussian: peak=3.719964, mean=0.5246363, ChiSq=1.375913, time=52.85, d_freq=1421074802.65, score=1.091865, null_hyp=2.283048, chirp=46.046, fft_len=16k

Workunit 2576907391 (26mr17aa.25216.7429.14.41.247)
Task 5812811343 (S=0, A=1, P=2, T=0, G=0) v8.22 (opencl_nvidia_SoG) windows_intelx86
Task 5812811344 (S=0, A=1, P=2, T=0, G=0) x41p_zi3t2b, Cuda 8.00 special
Cuda 8.00 special - Best gaussian: peak=3.681468, mean=0.5749262, ChiSq=1.387973, time=34.39, d_freq=1419914966.94, score=-1.986706, null_hyp=2.127978, chirp=-74.326, fft_len=16k
v8.22 SoG - Best gaussian: peak=3.074103, mean=0.5448458, ChiSq=1.296905, time=99.82, d_freq=1419915193.65, score=-1.983227, null_hyp=2.072506, chirp=-74.55, fft_len=16k

Workunit 2577622008 (03my17ab.4903.11519.16.43.91)
Task 5814309939 (S=0, A=2, P=0, T=7, G=0) v8.22 (opencl_nvidia_SoG) windows_intelx86
Task 5814309940 (S=0, A=2, P=0, T=7, G=0) x41p_zi3t2b, Cuda 8.00 special
Cuda 8.00 special - Best gaussian: peak=8.449782, mean=0.7001557, ChiSq=1.376348, time=9.227, d_freq=1420886348.79, score=-1.853296, null_hyp=2.074755, chirp=-14.824, fft_len=16k
v8.22 SoG - Best gaussian: peak=7.661397, mean=0.6757016, ChiSq=1.342818, time=71.3, d_freq=1420892014.37, score=-1.852513, null_hyp=2.053577, chirp=-93.509, fft_len=16k

Workunit 2580203063 (28mr17ac.1412.331287.5.32.153)
Task 5819707063 (S=0, A=0, P=2, T=1, G=1) x41p_zi3t2b, Cuda 8.00 special
Task 5819707064 (S=0, A=0, P=2, T=1, G=1) v8.20 (opencl_ati5_SoG_mac) x86_64-apple-darwin
Cuda 8.00 special - Best gaussian: peak=3.722415, mean=0.5437903, ChiSq=1.365831, time=37.75, d_freq=1418989730.74, score=0.4701011, null_hyp=2.247322, chirp=-45.357, fft_len=16k
v8.20 ATI SoG Mac - Best gaussian: peak=3.699681, mean=0.5424874, ChiSq=1.395795, time=39.43, d_freq=1418989654.65, score=0.3231449, null_hyp=2.257735, chirp=-45.357, fft_len=16k

I spotted a couple more in my Inconclusives list, both also against v8.22 SoG. |
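For anyone who wants to line these pairs up without eyeballing them, here is a minimal Python sketch that parses two "Best gaussian" lines and prints the absolute difference per numeric field. The two strings are copied from the first workunit above; the function names are made up for illustration and are not part of any SETI@home or BOINC tool.

```python
import re

# Result lines copied from Workunit 2573263722 in the post above.
CUDA = ("peak=3.252388, mean=0.5397108, ChiSq=1.344394, time=14.26, "
        "d_freq=1418816790.11, score=-1.169299, null_hyp=2.144445, "
        "chirp=-39.071, fft_len=16k")
SOG = ("peak=3.76217, mean=0.5480909, ChiSq=1.226871, time=39.43, "
       "d_freq=1418822660.68, score=-1.169124, null_hyp=2.078196, "
       "chirp=43.425, fft_len=16k")


def parse_best_gaussian(text):
    """Turn 'key=value, key=value' text into a dict (hypothetical helper)."""
    fields = {}
    for key, value in re.findall(r"(\w+)=([-\w.]+)", text):
        try:
            fields[key] = float(value)
        except ValueError:
            fields[key] = value  # e.g. fft_len=16k stays as text
    return fields


def field_differences(a, b):
    """Absolute difference for every field that is numeric in both results."""
    diffs = {}
    for key, value in a.items():
        other = b.get(key)
        if isinstance(value, float) and isinstance(other, float):
            diffs[key] = abs(value - other)
    return diffs


diffs = field_differences(parse_best_gaussian(CUDA), parse_best_gaussian(SOG))
for key, delta in diffs.items():
    print(f"{key:>8}: {delta:.6g}")
```

On this pair the scores differ by under 2e-4 while time and chirp are far apart, which at least suggests the two apps settled on different candidates with nearly identical scores rather than computing the same candidate differently.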
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Did you run the tasks with your CPU to see which matched better? My CPUs would take about two hours apiece on them. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
"Did you run the tasks with your CPU to see which matched better? My CPUs would take about two hours apiece on them." No, I've never taken the time to set up to run stand-alone tasks. I've tended to leave that to the developers, if any of the WUs I've identified look sufficiently interesting. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Just a comment, I think this may deserve its own thread since it appears this is leaning towards an SoG issue more than CUDA." It's going to be pretty important to discuss these things here, because the special is the new kid on the block, with the most changes in quite a while. If it turns out the SoG app needs some attention, then that's a good thing, because it solidifies knowledge all around. My unconfirmed suspicion is that the SoG app may still be using an OpenCL derivation of the single-precision chirp I made for Pre-Fermi (CUDA). That chirp was tailored for unique Pre-Fermi characteristics, namely that Pre-Fermi CUDA devices don't have IEEE-754 floating point compliance and, in Pre-GTX2xx cases, no double precision at all. It therefore won't necessarily compile to the most accurate GPU code on Fermi or later devices. It was made specifically for G80 type devices. I switched to double-precision chirp for newer devices many moons ago. So under the hood there is valuable history to take into account, and it probably should be properly documented one day. As for 'Allowing on Beta', I'd concur the special needs to be run extensively under anonymous platform, so will be aiming for some trial builds ASAP. As previously mentioned, stock distribution is problematic because of Boinc server limitations more or less demanding a quite generally compatible app. There may be a driver version cutoff where Pre-Fermi or Fermi class drop support, though generalising to even Kepler class onwards will still require significant work. |
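To make the single- versus double-precision point concrete, here is a rough Python sketch. It is not the chirp code from any SETI@home app. It assumes a plain linear chirp whose phase in cycles is 0.5 * rate * t^2, treats the reported chirp value as a rate in Hz/s (also an assumption), and emulates single precision by rounding every intermediate through a 32-bit float.

```python
import math
import struct


def to_f32(x):
    """Round a Python float (IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def phase_f64(chirp_rate, t):
    """Chirp phase in cycles, reduced modulo one cycle, in double precision."""
    return math.fmod(0.5 * chirp_rate * t * t, 1.0)


def phase_f32(chirp_rate, t):
    """Same formula with every intermediate rounded to single precision."""
    rate = to_f32(chirp_rate)
    t32 = to_f32(t)
    t_squared = to_f32(t32 * t32)
    half_rate = to_f32(0.5 * rate)
    return math.fmod(to_f32(half_rate * t_squared), 1.0)


def max_phase_error(chirp_rate, t_start, samples=1000, dt=1e-3):
    """Largest single-vs-double phase difference (cycles) over a time window."""
    worst = 0.0
    for k in range(samples):
        t = t_start + k * dt
        diff = abs(phase_f32(chirp_rate, t) - phase_f64(chirp_rate, t))
        diff = min(diff, 1.0 - diff)  # phases wrap around at one cycle
        worst = max(worst, diff)
    return worst


# 46.042 is one of the chirp values quoted earlier in the thread.
early = max_phase_error(46.042, t_start=0.0)
late = max_phase_error(46.042, t_start=100.0)
print(f"max error in first second: {early:.2e} cycles")
print(f"max error after 100 s:     {late:.2e} cycles")
```

The unreduced phase grows with rate times t squared, so at high chirp rates and late times a 24-bit mantissa has few bits left for the fractional cycle. That is one way to picture the cumulative error being greatest near the chirp limits.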
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
I agree it does need more discussion, and sorry TBar, I didn't mean anything by it really. Just that it will be an ongoing discussion not solely regarding the CUDA8 app. TBar, can you send me the new app and I will run a batch through on both computers during maintenance? That will give you another 500 tasks to look at. You have my email, or PM me and I'll send it. I'm glad to see this app is finally making it through to final testing. |
Stephen "Heretic" Joined: 20 Sep 12 Posts: 5557 Credit: 192,787,363 RAC: 628 |
"I agree it does need more discussion, and sorry TBar, I didn't mean anything by it really. Just that it will be an ongoing discussion not solely regarding the CUDA8 app." . . For what it is worth, things are running fine on my rigs (servers not sending work issues notwithstanding) and the results are highly consistent. Inconclusives are still a little high on the Pentium-D (2 x 1060s) at about 7.2%, but so far only one or maybe 2 invalids, and they were noise bombs (early overflow tasks). That isn't bad out of many hundreds of jobs (been running for about 2 months, 60 days at over 700 WUs per day). . . The numbers aren't any better ATM on the Core2 Duo with the 1050ti at about 7.5%, but it has been down below 5% at times. So it is variable with different work loads. And I cannot recall a single invalid on that machine. . . Seems a good Beta candidate to me ... Stephen :) |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"TBar, can you send me the new app and I will run a batch through on both computers during maintenance. That will give you another 500 tasks to look at." I'll post it at Crunchers Anonymous in a while. This version has the Blocking Sync & Unroll Autotune on by default. You can override it by adding the CmdLines -nobs & -unroll N if you wish. It is a few seconds faster using -nobs. This version uses a little more Video RAM than before, meaning GPUs with less than 2 GBs may not work and GPUs with 2 GBs may have problems using Unroll 8. If you have problems, lower the -unroll to 6 or lower. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Oh well. It appears since moving to the new server I'm not allowed to upload. All I get is: 403. So... you'll have to wait. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"Oh well. It appears since moving to the new server I'm not allowed to upload. All I get is..." Contact Arkayn, as he did message about the ftp server shifting a week or so ago. If there are still problems by the weekend, let me know and I'll put it on jgopt.org (if you email the binaries). Once proper broadband is installed here in a month or so, I'll be rehosting my own domain in my living room, so there will be juggling no matter which of those works, but you could share it on Google Drive or similar in .7z form if you're stuck. |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Well, apparently the less expensive server allows less of an upload. Now instead of 20 MB you get 2.5 MB, which is barely enough for just the CUDA App. So, the new download contains just the CUDA App. You'll have to supply the rest yourself. zi3v is here: http://www.arkayn.us/forum/index.php?topic=197.msg4499#msg4499 To use the App at Beta you need to change the app_info from <version_num>801</version_num> to <version_num>802</version_num>. For some reason Beta keeps deleting the files when I use <version_num>801</version_num>. |
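If you would rather script the version_num change than hand-edit the file, something along these lines should do it. This is only a sketch: the sample XML below is a trimmed, made-up stand-in for a real app_info.xml (the app name is a placeholder), and set_version_num is a hypothetical helper, not a BOINC tool. Stop BOINC and back up your own file before touching it.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical stand-in for an app_info.xml; a real file has more
# elements (file_info, plan_class, cmdline, ...) that are left out here.
SAMPLE = """<app_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>801</version_num>
  </app_version>
</app_info>"""


def set_version_num(app_info_xml, old="801", new="802"):
    """Replace every <version_num> equal to `old` with `new`.

    Returns the updated XML text and the number of elements changed.
    """
    root = ET.fromstring(app_info_xml)
    changed = 0
    for node in root.iter("version_num"):
        if node.text and node.text.strip() == old:
            node.text = new
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


updated, count = set_version_num(SAMPLE)
print(f"changed {count} version_num element(s)")
print(updated)
```

Going through an XML parser instead of a blind search-and-replace means only version_num elements are touched, so an "801" that happens to appear in a file name or elsewhere is left alone.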
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
OK. Will be some (yet more, *sigh*) juggling here, as better broadband arrives tomorrow, a month sooner than expected. Teething problems are likely, though will factor in setting aside some hosting space for various and sundry as the dust settles. Hopefully things get less difficult as time goes on. |
Jeff Buck Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
"Workunit 2576907391 (26mr17aa.25216.7429.14.41.247)" Hmm... looks like the tiebreaker for this WU agreed with SoG. All got credit, though. SETI@home v8 v8.05 windows_x86_64 - Best gaussian: peak=3.074107, mean=0.5448451, ChiSq=1.296909, time=99.82, d_freq=1419915193.65, score=-1.983146, null_hyp=2.072513, chirp=-74.55, fft_len=16k |
TBar Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
Here's how it plays out on my Mac, I'll run it in Linux shortly. Of course, it needs to be run a couple of times to determine if it's consistent.

Last login: Mon Jun 19 21:15:41 on ttys000
TomsMacPro:~ Tom$ cd /Users/Tom/KWSN-OSX-bench-MB
TomsMacPro:KWSN-OSX-bench-MB Tom$ ./benchmark
KWSN-Darwin-MBbench v2.1.08
Running on TomsMacPro.local at Wed Jun 21 03:15:58 2017
---------------------------------------------------
Starting benchmark run...
---------------------------------------------------
Listing wu-file(s) in /testWUs :
03my17ab.4903.11519.16.43.91.wu
04oc08ab.31484.890.13.47.11.wu
23se08ac.6117.29512.7.34.110.wu
23se08ac.6875.22968.6.33.135.wu
26mr17aa.25216.7429.14.41.247.wu
28mr17ac.1412.331287.5.32.153.wu
Listing executable(s) in /APPS :
setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80
Listing executable in /REF_APPs :
MBv8_8.05r3344_sse41_x86_64-apple-darwin
---------------------------------------------------
Current WU: 03my17ab.4903.11519.16.43.91.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 6911 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -device 0
194.25 real 28.60 user 20.28 sys
Elapsed Time : ……………………………… 194 seconds
Speed compared to default : 3562 %
-----------------
Comparing results
                 ------------- R1:R2 ------------   ------------- R2:R1 ------------
                Exact  Super  Tight   Good    Bad  Exact  Super  Tight   Good    Bad
Spike               0      0      0      0      0      0      0      0      0      0
Autocorr            0      2      2      2      0      0      2      2      2      0
Gaussian            0      0      0      0      0      0      0      0      0      0
Pulse               0      0      0      0      0      0      0      0      0      0
Triplet             1      7      7      7      0      1      7      7      7      0
Best Spike          0      1      1      1      0      0      1      1      1      0
Best Autocorr       0      1      1      1      0      0      1      1      1      0
Best Gaussian       0      0      0      0      1      0      0      0      0      1
Best Pulse          0      1      1      1      0      0      1      1      1      0
Best Triplet        0      1      1      1      0      0      1      1      1      0
                 ----   ----   ----   ----   ----   ----   ----   ----   ----   ----
                    1     13     13     13      1      1     13     13     13      1
Unmatched signal(s) in R1 at line(s) 528
Unmatched signal(s) in R2 at line(s) 528
For R1:R2 matched signals only, Q= 99.97%
Result : Weakly similar.
---------------------------------------------------
Done with 03my17ab.4903.11519.16.43.91.wu.

Current WU: 04oc08ab.31484.890.13.47.11.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 9092 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -device 0
291.39 real 44.60 user 30.65 sys
Elapsed Time : ……………………………… 291 seconds
Speed compared to default : 3124 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.94%
---------------------------------------------------
Done with 04oc08ab.31484.890.13.47.11.wu.

Current WU: 23se08ac.6117.29512.7.34.110.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8085 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -device 0
278.72 real 42.63 user 29.38 sys
Elapsed Time : ……………………………… 278 seconds
Speed compared to default : 2908 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.93%
---------------------------------------------------
Done with 23se08ac.6117.29512.7.34.110.wu.

Current WU: 23se08ac.6875.22968.6.33.135.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8701 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -device 0
290.45 real 44.42 user 30.39 sys
Elapsed Time : ……………………………… 290 seconds
Speed compared to default : 3000 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.97%
---------------------------------------------------
Done with 23se08ac.6875.22968.6.33.135.wu.

Current WU: 26mr17aa.25216.7429.14.41.247.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8556 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -device 0
300.71 real 45.47 user 31.39 sys
Elapsed Time : ……………………………… 301 seconds
Speed compared to default : 2842 %
-----------------
Comparing results
                 ------------- R1:R2 ------------   ------------- R2:R1 ------------
                Exact  Super  Tight   Good    Bad  Exact  Super  Tight   Good    Bad
Spike               0      0      0      0      0      0      0      0      0      0
Autocorr            0      1      1      1      0      0      1      1      1      0
Gaussian            0      0      0      0      0      0      0      0      0      0
Pulse               0      2      2      2      0      0      2      2      2      0
Triplet             0      0      0      0      0      0      0      0      0      0
Best Spike          0      1      1      1      0      0      1      1      1      0
Best Autocorr       0      1      1      1      0      0      1      1      1      0
Best Gaussian       0      0      0      0      1      0      0      0      0      1
Best Pulse          0      1      1      1      0      0      1      1      1      0
Best Triplet        0      0      0      0      0      0      0      0      0      0
                 ----   ----   ----   ----   ----   ----   ----   ----   ----   ----
                    0      6      6      6      1      0      6      6      6      1
Unmatched signal(s) in R1 at line(s) 448
Unmatched signal(s) in R2 at line(s) 448
For R1:R2 matched signals only, Q= 99.97%
Result : Weakly similar.
---------------------------------------------------
Done with 26mr17aa.25216.7429.14.41.247.wu.

Current WU: 28mr17ac.1412.331287.5.32.153.wu
---------------------------------------------------
Skipping default app MBv8_8.05r3344_sse41_x86_64-apple-darwin, displaying saved result(s)
Elapsed Time: ………………………………… 8483 seconds
---------------------------------------------------
Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda80 -device 0
294.20 real 44.19 user 30.76 sys
Elapsed Time : ……………………………… 294 seconds
Speed compared to default : 2885 %
-----------------
Comparing results
Result : Strongly similar, Q= 99.77%
---------------------------------------------------
Done with 28mr17ac.1412.331287.5.32.153.wu.

Done with Benchmark run! Removing temporary files!
TomsMacPro:KWSN-OSX-bench-MB Tom$

CUDA Wins 4 to 2 So...that's twice as good. |
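As a sanity check on the "Speed compared to default" figures, here is a small Python sketch that recomputes them from the two elapsed times in the log. How the benchmark script itself does the arithmetic is an assumption; reference time divided by app time, times 100, truncated to a whole number, happens to reproduce all six printed values.

```python
# (reference CPU seconds, CUDA seconds, percent printed in the log above)
RUNS = {
    "03my17ab.4903.11519.16.43.91": (6911, 194, 3562),
    "04oc08ab.31484.890.13.47.11": (9092, 291, 3124),
    "23se08ac.6117.29512.7.34.110": (8085, 278, 2908),
    "23se08ac.6875.22968.6.33.135": (8701, 290, 3000),
    "26mr17aa.25216.7429.14.41.247": (8556, 301, 2842),
    "28mr17ac.1412.331287.5.32.153": (8483, 294, 2885),
}


def speed_percent(ref_seconds, app_seconds):
    """Reference time over app time as a truncated whole percent (assumed formula)."""
    return (100 * ref_seconds) // app_seconds


for name, (ref, app, printed) in RUNS.items():
    print(f"{name}: {speed_percent(ref, app)} % (log shows {printed} %)")
```

In other words the CUDA app finished each of these workunits roughly 28 to 36 times faster than the saved CPU reference run on this machine.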
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
Thanks TBar for zi3v, running it now. Petri gave me zi3w which is 'basically the same' which I ran through maintenance, so some tasks are from that app. Computers you can watch: 1080+1080+980 1070+750Ti |
Grant (SSSF) Joined: 19 Aug 99 Posts: 13736 Credit: 208,696,464 RAC: 304 |
"OK. Will be some (yet more, *sigh*) juggling here, as better broadband arrives tomorrow, a month sooner than expected." Don't count your chickens until they're hatched. When I organised to get NBN it was going to be a wait of several months. Then all of a sudden I get a call to organise a time after only a couple of weeks. Somehow the report I made 6 months earlier, and the points I mentioned when organising the appointment, didn't get passed on to the contractor doing the work. They thought it was a straightforward job. It wasn't. So he comes out, has a look, says "you've got to be joking" and then calls both the NBN and my ISP and tells them what needs to be done before he can set up my NBN modem. Organised another appointment with another contractor a couple of weeks later; they didn't get the message the previous contractor left with both the NBN & my ISP. A few days later (was expecting it to be weeks again) the second contractor calls me back to make an appointment. This time he's got a bloke with trenching & tunneling equipment, and a couple of hours later I had 40/100 Mbps NBN. Good luck. Grant Darwin NT |
Brent Norman Joined: 1 Dec 99 Posts: 2786 Credit: 685,657,289 RAC: 835 |
"a couple of hours later I had 40/100Mbs NBN." Sigh, and the best I can get is 620k/5Mb :( And I am literally sitting about 600 feet away from a split in the TransCanada Fibre! |