Linux CUDA 'Special' App finally available, featuring Low CPU use
Author | Message |
---|---|
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Well, there is no such thing as a best non-reportable triplet. So, if it missed a reportable one, it's OK that best is zero. The missed signal itself isn't OK, though. I'll check this task with different apps. And regarding any OpenCL bug fixing: things aren't going well so far. I'm already able to build working CPU binaries with VS2017, but an attempt to build an OpenCL one against Intel's SDK failed spectacularly. There are a number of incompatibilities with the app's header files, some of them from the oclFFT library... SETI apps news We're not gonna fight them. We're gonna transcend them. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"I don't believe I've seen this type of Inconclusive previously..." Actually, we've seen it quite a bit. It sometimes happens when you see this in the results: "Restarted at 30.64 percent, with setiathome enhanced x41p_zi3v, Cuda 9.00 special". It's even in the README: "6) The App may give Incorrect results on a restarted task. One way to avoid restarted tasks is to set the checkpoint higher than the task's estimated run-time, and also avoid suspending a task." Missed that one, did ya? Until Petri gets a chance to fix the checkpointing you might want to move your checkpoint times a little higher. I ran it with my GPU, without a restart, and it matched my CPU: SETI@home using CUDA accelerated device GeForce GTX 1060 3GB Autocorr: peak=18.18838, time=62.99, delay=2.9876, d_freq=2201261727.47, chirp=-2.7683, fft_len=128k Pulse: peak=3.815869, time=45.86, period=9.093, d_freq=2201256231.14, score=1.018, chirp=11.355, fft_len=1024 Pulse: peak=4.737066, time=45.84, period=9.686, d_freq=2201258411.08, score=1.044, chirp=-14.715, fft_len=512 Triplet: peak=10.93747, time=34.56, period=20.22, d_freq=2201264522.54, chirp=-14.715, fft_len=512 Spike: peak=25.19361, time=5.727, d_freq=2201263188.56, chirp=17.945, fft_len=128k Spike: peak=26.20529, time=5.727, d_freq=2201263188.57, chirp=17.946, fft_len=128k Spike: peak=26.08345, time=5.727, d_freq=2201263188.58, chirp=17.948, fft_len=128k Spike: peak=24.86258, time=5.727, d_freq=2201263188.59, chirp=17.949, fft_len=128k Pulse: peak=2.708074, time=45.84, period=4.787, d_freq=2201259393.53, score=1.009, chirp=36.466, fft_len=512 Pulse: peak=7.60402, time=45.99, period=18.85, d_freq=2201267130.34, score=1.03, chirp=-37.765, fft_len=4k Pulse: peak=1.263832, time=45.86, period=1.535, d_freq=2201263339.07, score=1.03, chirp=42.064, fft_len=1024 Pulse: peak=2.37489, time=45.82, period=4.289, d_freq=2201263569.34, score=1.015, chirp=80.288, fft_len=256 Spike: peak=24.37914, time=10.02, d_freq=2201262803.27, chirp=83.569, fft_len=32k Spike: 
peak=24.39295, time=10.02, d_freq=2201262803.23, chirp=83.671, fft_len=32k Best spike: peak=26.20529, time=5.727, d_freq=2201263188.57, chirp=17.946, fft_len=128k Best autocorr: peak=18.18838, time=62.99, delay=2.9876, d_freq=2201261727.47, chirp=-2.7683, fft_len=128k Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0 Best pulse: peak=4.737066, time=45.84, period=9.686, d_freq=2201258411.08, score=1.044, chirp=-14.715, fft_len=512 Best triplet: peak=10.93747, time=34.56, period=20.22, d_freq=2201264522.54, chirp=-14.715, fft_len=512 Spike count: 6 Autocorr count: 1 Pulse count: 6 Triplet count: 1 Gaussian count: 0 --------------------------------------------------- Starting benchmark run... --------------------------------------------------- Listing wu-file(s) in /testWUs : blc25_2bit_guppi_57895_51266_HIP2_0039.24131.818.24.47.42.vlar.wu Listing executable(s) in /APPS : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda90 Listing executable in /REF_APPs : MBv8_8.22r3711_sse41_x86_64-apple-darwin --------------------------------------------------- Current WU: blc25_2bit_guppi_57895_51266_HIP2_0039.24131.818.24.47.42.vlar.wu --------------------------------------------------- Skipping default app MBv8_8.22r3711_sse41_x86_64-apple-darwin, displaying saved result(s) Elapsed Time: ………………………………… 6585 seconds --------------------------------------------------- Running app with command : setiathome_x41p_zi3v_x86_64-apple-darwin_cuda90 -nobs -device 0 303.53 real 266.39 user 32.97 sys Elapsed Time : ……………………………… 304 seconds Speed compared to default : 2166 % ----------------- Comparing results Result : Strongly similar, Q= 99.27% --------------------------------------------------- Done with blc25_2bit_guppi_57895_51266_HIP2_0039.24131.818.24.47.42.vlar.wu. Raistmer, has anyone reported problems running your current AKv8 code with GCC 6.3? 
It seems to be failing in my Ubuntu 17.04, and there might be problems with GCC 5.4 as well. I was successful using the GCC 4.9.2 in Ubuntu 15.04. There might be an AVX2 Ryzen CPU App running around shortly even though I couldn't compile it in the recommended version of GCC. It's even possible it will have a twin Intel AVX2 CPU App to play with. Anyone interested in testing an AVX2 CPU App in Linux? |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
I have the AMD AVX2 app running as Reference now with the Intel AVX2 app running as Test. Results should be available in the morning. Can anyone give me some simple instructions on how to get the download fanout generator working? I read through the instructions in the macro but it isn't getting anything. I wanted an early overflow task of one of mine I already processed for another test WU. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"6) The App may give Incorrect results on a restarted task. One way to avoid restarted tasks is to set the checkpoint higher than the task's estimated run-time, and also avoid suspending a task." Then it isn't worth running it offline.
No, AFAIK. Regarding AVX2: if the optimizing compiler is really good there could be some speedup, but in general AVX2 adds integer instructions to the AVX set, so it's not very useful for SETI. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"Until Petri gets a chance to fix the checkpointing you might want to move your checkpoint times a little higher." A preferable short-term solution would be to disable checkpointing in the app's code altogether, rather than have each user change settings. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Keith Myers Send message Joined: 29 Apr 01 Posts: 13164 Credit: 1,160,866,277 RAC: 1,873 |
The early results are looking very favorable for a significant performance increase for the AVX2 app over the SSE41 app and the AVX app. So compiler optimizations must be having an effect. Seti@Home classic workunits:20,676 CPU time:74,226 hours A proud member of the OFA (Old Farts Association) |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Nope, didn't miss it, but this result is just the opposite from the problems I've seen caused by a Special App restart. In this WU, the app apparently missed a Triplet. Every other Triplet issue I've seen from the Special App following a restart (and I reported on it months ago) involves the reporting of a massive number of bogus Triplets following the restart, all of them with "peak=nan", leading to an overflow that results in an Invalid. I still see from 1 to 3 of these a week. I've also explained why I use a 120 second checkpoint interval as a compromise. Anything more than that, I lose more overall processing time from my normal weekday restarts than I would gain from avoiding the Special App's restart problems. "I don't believe I've seen this type of Inconclusive previously..." "Actually, we've seen it quite a bit. It sometimes happens when you see this in the results;" And BTW, if you were to actually take a close look at where in the Stderr the Triplet was reported by the CPU app, you might notice that it falls between the first and second Pulse. Even on your GPU run, it falls immediately after the second Pulse and before the 4 Spikes. All of those signals were successfully reported by the Cuda9 zi3v before the restart, so it's not unreasonable to wonder where the Triplet went. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
"Regarding AVX2 - if optimizing compiler really good there could be some speedup but in general AVX2 adds integer instructions to AVX set so not very useful for SETI." I seem to recall you saying something similar back in February about the Mac AVX2 App. Right up to when it was tested, here, https://setiathome.berkeley.edu/forum_thread.php?id=80359&postid=1845916#1845916 --------------------------------------------------- Starting benchmark run... --------------------------------------------------- Listing wu-file(s) in /testWUs : 18dc09ah.26284.16432.6.33.125.wu reference_work_unit_r3215.wu Listing executable(s) in /APPS : MBv8_8.22r3605_avx2_x86_64-apple-darwin Listing executable in /REF_APPs : MBv8_8.06r3366_avx_x86_64-apple-darwin --------------------------------------------------- Current WU: 18dc09ah.26284.16432.6.33.125.wu --------------------------------------------------- Running default app with command : MBv8_8.06r3366_avx_x86_64-apple-darwin 1529.77 real 1522.93 user 1.39 sys Elapsed Time: ………………………………… 1530 seconds --------------------------------------------------- Running app with command : MBv8_8.22r3605_avx2_x86_64-apple-darwin 1328.03 real 1321.80 user 1.40 sys Elapsed Time : ……………………………… 1328 seconds Speed compared to default : 115 % ----------------- Comparing results Result : Strongly similar, Q= 100.0% --------------------------------------------------- From what I've seen the Linux AVX2 App has similar improvements. |
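The "Speed compared to default" figures in the two benchmark logs in this thread can be reproduced from the elapsed times. The sketch below is an assumption about how the benchmark script arrives at the number (reference seconds divided by test seconds, as a whole percent); the script's actual source is not shown in the thread, and `speed_percent` is a name made up for illustration.

```python
def speed_percent(reference_seconds: float, test_seconds: float) -> int:
    """Speed of the test app relative to the reference app, as a whole percent.

    Assumption: the benchmark script divides the two rounded elapsed times.
    """
    return round(100.0 * reference_seconds / test_seconds)


# Elapsed times copied from the two benchmark logs quoted in the thread.
print(speed_percent(6585, 304))   # CPU SSE4.1 reference vs. Cuda 9.00 special
print(speed_percent(1530, 1328))  # Mac AVX reference vs. Mac AVX2 build
```

Both calls match the percentages printed in the logs (2166 % and 115 %), so the 115 % figure corresponds to the AVX2 build finishing in roughly 87 % of the AVX build's time.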
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
If you knew it was a Restart, then Why Not Mention it? Surely you would consider other people would mention it. I did better than notice where the Triplet was located, I ran the Task on My GPU to see how it reported. Mine reported Normally. I suggest you run the task again on Your GPU and see if it has the same problem when not restarted; if it does, You might have another problem. It's a good thing I'm not so stingy when it comes to GPU time, else we might not even have an App to test. "Nope, didn't miss it, but this result is just the opposite from the problems I've seen caused by a Special App restart. In this WU, the app apparently missed a Triplet. Every other Triplet issue I've seen from the Special App following a restart (and I reported on it months ago) involves the reporting of a massive number of bogus Triplets following the restart, all of them with "peak=nan", leading to an overflow that results in an Invalid. I still see from 1 to 3 of these a week. I've also explained why I use a 120 second checkpoint interval as a compromise. Anything more than that, I lose more overall processing time from my normal weekday restarts than I would gain from avoiding the Special App's restart problems." "I don't believe I've seen this type of Inconclusive previously..." "Actually, we've seen it quite a bit. It sometimes happens when you see this in the results;" You seem to have a problem with the Serial CPU & Parallel GPU reporting. They Do Not report found signals in the same order. As long as there are fewer than 30 signals in the WU it isn't a problem. The problem only arises when there are over 30 signals in a WU, such as with an Aborted Overflow, or with a Restarted task when they are caught looking at different parts of the WU. If the serial CPU and Parallel GPU look at the same part of the WU they will find the Same signals in total; they Don't look at the same parts during processing. |
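The ordering argument above can be illustrated with a toy model. This is only a sketch: the signal names, the reversed order standing in for "parallel" processing, and the helper names are all made up; only the 30-signal cap comes from the thread.

```python
MAX_SIGNALS = 30  # reporting cap mentioned in the thread


def reported(found_in_order, cap=MAX_SIGNALS):
    """Signals an app ends up reporting if it stops once the cap is reached."""
    return set(found_in_order[:cap])


def agree(order_a, order_b, cap=MAX_SIGNALS):
    """True when two apps report the same set of signals despite different order."""
    return reported(order_a, cap) == reported(order_b, cap)


# Hypothetical WU with 20 signals: discovery order does not matter.
few = [f"sig{i}" for i in range(20)]
print(agree(few, list(reversed(few))))

# Hypothetical WU with 40 signals: each app stops at 30, so the sets differ.
many = [f"sig{i}" for i in range(40)]
print(agree(many, list(reversed(many))))
```

With fewer signals than the cap, both orders yield the same set; once the cap truncates the list, each app keeps whichever 30 it happened to reach first.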
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
"From what I've seen the Linux AVX2 App has similar improvements." Good for GCC. I hope you compared apples to apples and used the SAME GCC version and the SAME FFTW version for the comparison between the AVX2 and AVX binaries. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Okay, so the bottom line is that the Special App can not only report bogus signals (Triplets and Spikes) following a restart, which has been discussed previously, it can also miss signals that should have been reported, as in this example. That's good to know and just means that Petri needs to be looking at both situations when he gets around to trying to pin down the restart problems. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Alright, here's an unusual Inconclusive involving Triplets that does not have a restart in the mix. Workunit 2760143363 (26ja07aa.30143.294304.16.43.191) Task 6195365750 (S=1, A=0, P=0, T=19, G=0, BS=24.22082, BG=0) x41p_zi3v, Cuda 9.00 special Task 6195365751 (S=1, A=0, P=0, T=20, G=0, BS=24.22082, BG=0) v8.22 (opencl_nvidia_SoG) windows_intelx86 What looks really odd to me are two Triplets in the SoG app's output. Triplet: peak=9.234034, time=28.66, period=0.06226, d_freq=1419366068.31, chirp=92.974, fft_len=32 Triplet: peak=9.234034, time=28.67, period=0.0639, d_freq=1419366068.47, chirp=92.974, fft_len=32 The peaks are identical and the other values nearly so. The Cuda9 zi3v only reported the first of the two, so that's where the discrepancy lies. It'll be up to the tiebreaker (assigned to the stock Windows CPU app) to render a verdict as to whether the Cuda9 app actually missed a Triplet or if the SoG app just stuttered somehow. |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Looks like the Windows CPU app confirmed the SoG results, reporting a total of 20 Triplets. Perhaps when the Special App found that second Triplet with the same peak, it thought it was a duplicate and discarded it. Might be worth looking at, though probably not as high a priority as some other issues. |
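The duplicate hypothesis above can be made concrete with the two Triplet lines from the SoG output. This is not the Special App's code; it is a minimal sketch, with made-up helper names, showing that a de-duplication keyed on `peak` alone would drop the second Triplet, while a key that also includes `time`, `period` and `d_freq` would keep both.

```python
def parse_signal(line):
    """Split a stderr signal line into its kind and a dict of field strings."""
    kind, _, rest = line.partition(":")
    fields = dict(item.strip().split("=") for item in rest.split(","))
    return kind.strip(), fields


def dedupe(lines, key_fields):
    """Keep the first line for each distinct (kind, key_fields...) combination."""
    seen, kept = set(), []
    for line in lines:
        kind, fields = parse_signal(line)
        key = (kind,) + tuple(fields[name] for name in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


# The two Triplets copied from the SoG app's output quoted in the thread.
triplets = [
    "Triplet: peak=9.234034, time=28.66, period=0.06226, d_freq=1419366068.31, chirp=92.974, fft_len=32",
    "Triplet: peak=9.234034, time=28.67, period=0.0639, d_freq=1419366068.47, chirp=92.974, fft_len=32",
]

print(len(dedupe(triplets, ["peak"])))                               # peak-only key
print(len(dedupe(triplets, ["peak", "time", "period", "d_freq"])))   # wider key
```

Whether the Special App does anything like this is for Petri to confirm; the sketch only shows that the observed 19 vs. 20 count is consistent with such a check.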
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
Regarding blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar.wu that was posted in this thread some time ago: I confirmed disagreement between current stock windows builds on this task. MB8_win_x64_SSE3amd_VS2017_r3713.exe -nog / blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar.wu : R2: .\ref\ref-setiathome_8.00_windows_intelx86.exe-blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar.wu.res ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 5 5 5 0 0 5 5 5 0 Autocorr 0 0 0 0 0 0 0 0 0 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 10 10 10 0 0 10 10 10 1 Triplet 0 3 3 3 0 0 3 3 3 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 1 1 1 1 0 1 1 1 1 0 Best Pulse 0 1 1 1 0 0 1 1 1 0 Best Triplet 0 1 1 1 0 0 1 1 1 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 1 23 23 23 0 1 23 23 23 1 Unmatched signal(s) in R2 at line(s) 491 For R1:R2 matched signals only, Q= 99.94% Result : Weakly similar. 
R2: .\ref\ref-setiathome_8.05_windows_x86_64.exe-blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar.wu.res Result : Strongly similar, Q= 99.98% R2: .\ref\ref-setiathome_8.08_windows_x86_64__alt.exe-blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar.wu.res Result : Strongly similar, Q= 99.98% R2: .\ref\ref-setiathome_8.20_windows_intelx86__opencl_intel_gpu_sah.exe-blc25_2bit_guppi_57895_47387_HIP91358_0034.24610.818.23.46.191.vlar.wu.res ------------- R1:R2 ------------ ------------- R2:R1 ------------ Exact Super Tight Good Bad Exact Super Tight Good Bad Spike 0 5 5 5 0 0 5 5 5 0 Autocorr 0 0 0 0 0 0 0 0 0 0 Gaussian 0 0 0 0 0 0 0 0 0 0 Pulse 0 9 10 10 0 0 9 10 10 0 Triplet 0 3 3 3 0 0 3 3 3 0 Best Spike 0 1 1 1 0 0 1 1 1 0 Best Autocorr 0 1 1 1 0 0 1 1 1 0 Best Gaussian 1 1 1 1 0 1 1 1 1 0 Best Pulse 0 1 1 1 0 0 1 1 1 0 Best Triplet 0 1 1 1 0 0 1 1 1 0 ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 1 22 23 23 0 1 22 23 23 0 Result : Strongly similar, Q= 98.96% So, some of the stock builds consider the result of the fresh build fully correct, and some disagree. SETI apps news We're not gonna fight them. We're gonna transcend them. |
Raistmer Send message Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
I suggest moving to a separate thread for this task: https://setiathome.berkeley.edu/forum_thread.php?id=82269&postid=1903422 SETI apps news We're not gonna fight them. We're gonna transcend them. |
TBar Send message Joined: 22 May 99 Posts: 5204 Credit: 840,779,836 RAC: 2,768 |
I took the time to install Ubuntu 17.10 with GCC 7.2.0 and run the same line that works with GCC 4.9.2. You might want to look it over, it's about the same as what you get with GCC 6.3.x and lists 612 errors. AKv8_GCC-7.2.0_Errors.zip |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
Here's an unusual Inconclusive where the Special App, running on one of my machines, reported 2 Gaussians which the SoG app did not. Workunit 2764926755 (14fe07ab.5355.16841.5.32.110) Task 6205405922 (S=18, A=1, P=0, T=4, G=7, BS=27.00169, BG=3.570737) x41p_zi3v, Cuda 9.00 special Task 6205405923 (S=18, A=1, P=0, T=4, G=5, BS=27.00167, BG=3.570736) v8.22 (opencl_nvidia_SoG) windows_intelx86 The 7 Gaussians reported by the Special App are: Gaussian: peak=3.506042, mean=0.5093156, ChiSq=1.364717, time=96.47, d_freq=1421071421.02, score=3.369742, null_hyp=2.403931, chirp=-30.175, fft_len=16k Gaussian: peak=3.570737, mean=0.5002306, ChiSq=1.32693, time=98.15, d_freq=1421071370.4, score=4.532304, null_hyp=2.442002, chirp=-30.175, fft_len=16k Gaussian: peak=3.376572, mean=0.5267106, ChiSq=1.407839, time=62.91, d_freq=1421075607.9, score=1.498692, null_hyp=2.332992, chirp=51.563, fft_len=16k Gaussian: peak=3.437077, mean=0.5095716, ChiSq=1.392497, time=64.59, d_freq=1421075694.4, score=3.369557, null_hyp=2.420066, chirp=51.563, fft_len=16k Gaussian: peak=3.405566, mean=0.5179273, ChiSq=1.419629, time=66.27, d_freq=1421075780.91, score=2.323027, null_hyp=2.382857, chirp=51.563, fft_len=16k Gaussian: peak=3.506042, mean=0.5093156, ChiSq=1.364717, time=96.47, d_freq=1421069540.57, score=3.369742, null_hyp=2.403931, chirp=51.563, fft_len=16k Gaussian: peak=3.570737, mean=0.5002306, ChiSq=1.32693, time=98.15, d_freq=1421069627.08, score=4.532304, null_hyp=2.442002, chirp=51.563, fft_len=16k The SoG app also reported the first 5 of these. What makes the 2 extra reported by the Special App rather intriguing is that almost all the reported values, except for d_freq and chirp, are identical to the first 2 reported. Following the Gaussians, each app reported 10 more Spikes which, for the SoG app, brought the total signals to 28. However, because of the 2 extra Gaussians, the Special App just barely reached the 30-signal threshold to become a -9 overflow. 
It looks like the stock Cuda42 app has been assigned the tiebreaker. |
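The signal totals in this post can be checked directly from the two task summary strings. The sketch below assumes the overflow limit is the 30 signals mentioned above; `total_signals` is a made-up helper, and the regular expression deliberately skips the `BS=` and `BG=` values, which are best-signal scores rather than counts.

```python
import re

OVERFLOW_LIMIT = 30  # signal count at which a task becomes a -9 overflow


def total_signals(summary: str) -> int:
    """Sum the S, A, P, T and G counts from a task summary string."""
    counts = re.findall(r"\b([SAPTG])=(\d+)", summary)
    return sum(int(value) for _, value in counts)


# Summaries copied from the two tasks quoted in the post.
special = "S=18, A=1, P=0, T=4, G=7, BS=27.00169, BG=3.570737"
sog = "S=18, A=1, P=0, T=4, G=5, BS=27.00167, BG=3.570736"

for name, summary in (("Special App", special), ("SoG", sog)):
    total = total_signals(summary)
    print(name, total, "overflow" if total >= OVERFLOW_LIMIT else "normal")
```

The Special App's two extra Gaussians are exactly what moves it from 28 signals to the 30-signal limit.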
juan BFP Send message Joined: 16 Mar 07 Posts: 9786 Credit: 572,710,851 RAC: 3,799 |
Is this what you are talking about? https://setiathome.berkeley.edu/workunit.php?wuid=2771602375 Do I need to make any adjustment on my side? |
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
No, those don't seem to have anything to do with the app. I see them on my machines from time to time, usually associated with a sudden system shutdown and restart, often due to a power outage. I don't know that there's anything you can do, unless the shutdown was voluntary, in which case exiting BOINC first before shutting down would probably avoid them. EDIT: Then there are Invalids like this one, https://setiathome.berkeley.edu/result.php?resultid=6214250418, where the Special App is very much at fault. All those Spikes reported after the restart are phantoms, not reported by your wingmen. The only thing you can do from your end is to either restart as seldom as possible, or change your checkpoint interval to a value that's greater than your normal task run time. That forces tasks to always restart from the beginning. I don't really subscribe to that approach, because overall I would lose more processing time on all the restarted tasks than I would gain from avoiding the Invalids. I keep my checkpoint interval at 120 seconds. However, that option is there if you want to choose that route. EDIT2: And I see you have another type of Invalid that originates with the Special App, https://setiathome.berkeley.edu/result.php?resultid=6213403484. The newer versions changed the processing sequence such that on some overflow tasks, the Special App reports all (or almost all) 30 signals as Triplets, while all other apps report all (or mostly all) Pulses. There's nothing at all you can do about these. |
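The checkpoint trade-off described above can be put into rough numbers. This is a simplified model under stated assumptions, not a measurement: it assumes a restart hits at a uniformly random moment during a task, that all work since the last checkpoint is redone, and it ignores the cost of the Invalids themselves. The 300-second run time is a hypothetical figure; only the 120-second interval comes from the post.

```python
def expected_rework(run_time: float, checkpoint_interval: float) -> float:
    """Average seconds of work redone per restart, for a uniformly random restart.

    Work is lost back to the most recent checkpoint; if the interval exceeds
    the run time, the task always restarts from the beginning.
    """
    full, rest = divmod(run_time, checkpoint_interval)
    return (full * checkpoint_interval ** 2 + rest ** 2) / (2 * run_time)


RUN_TIME = 300  # hypothetical task length in seconds

print(expected_rework(RUN_TIME, 120))  # interval used in the post
print(expected_rework(RUN_TIME, 600))  # interval longer than the task
```

In this model an interval longer than the task costs half the run time per restart on average, which is the loss being weighed against the occasional Invalid.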
Jeff Buck Send message Joined: 11 Feb 00 Posts: 1441 Credit: 148,764,870 RAC: 0 |
I'm posting this Inconclusive primarily because it involves a new version of the Special App that I haven't seen show up in my list before. Workunit 2785085052 (blc04_2bit_guppi_57903_58077_HIP20577_0032.18796.818.17.26.80.vlar) Task 6247381318 (S=0, A=1, P=20, T=0, G=0, BS=23.31016, BG=0) x41p_zi3xs3, Cuda 9.10 special Task 6247381319 (S=0, A=1, P=21, T=0, G=0, BS=23.31014, BG=0) SSE3xj Win32 Build 3500 Task 6249807279 (S=0, A=1, P=19, T=0, G=0, BS=23.31016, BG=0) x41p_zi3t2b, Cuda 8.00 special The discrepancy between the Cuda 9.10 app and that older SoG app is one of those Pulses where SoG calculated a score of exactly '1', so it's simply a borderline Pulse that SoG reported and the Special App didn't. Basically the same as what we discussed in the earlier "TestCase:....." thread, so nothing new, just a new app. Perhaps this WU could come in handy for future testing of such threshold cases. |