Message boards : Number crunching : all 14au18ac GPU tasks on one system running 3 seconds?
Ian&Steve C. (Joined: 28 Sep 99, Posts: 4267, Credit: 1,282,604,591, RAC: 6,640)

So I just put up a new system, and it's having all kinds of strange issues. One of them is that all the 14au18ac tasks are finishing in 3-5 seconds. This doesn't seem right, and my other systems don't seem to be doing this. None of them have been validated yet, but I have a feeling they will all go invalid.

Ubuntu 18.04, NVIDIA 396.51 drivers, GTX 1060.

It doesn't report an error, but it's suspect that they are always 3 seconds on this kind of job, on this system.

Task example: https://setiathome.berkeley.edu/result.php?resultid=6897037707

Stderr output:

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_CUDA: Found 2 CUDA device(s):
  Device 1: GeForce GTX 1060 6GB, 6076 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 10 pciBusID = 4, pciSlotID = 0
  Device 2: GeForce GTX 1060 6GB, 6078 MiB, regsPerBlock 65536 computeCap 6.1, multiProcs 10 pciBusID = 5, pciSlotID = 0
In cudaAcc_initializeDevice(): Boinc passed DevPref 1
setiathome_CUDA: CUDA Device 1 specified, checking...
  Device 1: GeForce GTX 1060 6GB is okay
SETI@home using CUDA accelerated device GeForce GTX 1060 6GB
Unroll autotune 10. Overriding Pulse find periods per launch. Parameter -pfp set to 10
Using default pulse Fft limit (-pfl 64)
setiathome v8 enhanced x41p_V0.96, Cuda 9.20 special
Compiled with NVCC, using static libraries.
Modifications done by petri33 and released to the public by TBar.
Detected setiathome_enhanced_v8 task. Autocorrelations enabled, size 128k elements.
Work Unit Info:
...............
WU true angle range is : 9.297712
Sigma 0
Thread call stack limit is: 1k
Pulse: peak=inf, time=23.36, period=0.172, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=23.65, period=0.1524, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=23.93, period=0.1921, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=24.22, period=0.1577, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=24.51, period=0.1593, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=24.8, period=0.1528, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=25.09, period=0.1843, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=25.38, period=0.1516, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=25.66, period=0.1581, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=25.95, period=0.1724, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=26.24, period=0.1864, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=26.53, period=0.1819, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=26.82, period=0.1896, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=27.11, period=0.1675, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=27.39, period=0.1905, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=27.68, period=0.181, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=27.97, period=0.1671, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=28.26, period=0.1577, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=28.55, period=0.1556, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=28.84, period=0.1581, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=29.12, period=0.1499, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=29.41, period=0.1831, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=29.7, period=0.1749, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=29.99, period=0.1827, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=30.28, period=0.1761, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=30.57, period=0.154, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=30.85, period=0.1487, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=31.14, period=0.1651, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=31.43, period=0.186, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
Pulse: peak=inf, time=31.72, period=0.1313, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.
Best spike: peak=6.95862, time=55.06, d_freq=1420894775.39, chirp=0, fft_len=8
Best autocorr: peak=0, time=-2.124e+11, delay=0, d_freq=0, chirp=0, fft_len=0
Best gaussian: peak=0, mean=0, ChiSq=0, time=-2.124e+11, d_freq=0, score=-12, null_hyp=0, chirp=0, fft_len=0
Best pulse: peak=0, time=-2.124e+11, period=0, d_freq=0, score=0, chirp=0, fft_len=0
Best triplet: peak=0, time=-2.124e+11, period=0, d_freq=0, chirp=0, fft_len=0
Spike count: 0
Autocorr count: 0
Pulse count: 30
Triplet count: 0
Gaussian count: 0
22:51:23 (1489): called boinc_finish(0)
</stderr_txt>
]]>

Seti@Home classic workunits: 29,492 CPU time: 134,419 hours
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

Those are overflows. You can see the -9 result_overflow in the stderr, alongside pulses like this:

Pulse: peak=inf, time=31.43, period=0.186, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
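For anyone who wants to spot these quickly, here is a small illustrative script (not project code; the marker strings and sample lines are copied from the stderr posted above) that flags overflow-style lines in a saved Stderr log:

```python
# Illustration only: scan a saved copy of a task's Stderr output for the
# overflow markers discussed above (peak=inf / score=nan pulses and the
# -9 result_overflow message).
overflow_markers = ("result_overflow", "peak=inf", "score=nan")

# Three sample lines copied from the stderr quoted in this thread.
stderr_text = """\
Pulse: peak=inf, time=31.43, period=0.186, d_freq=1420899658.2, score=nan, chirp=0, fft_len=8
SETI@Home Informational message -9 result_overflow
NOTE: The number of results detected equals the storage space allocated.
"""

hits = [line for line in stderr_text.splitlines()
        if any(marker in line for marker in overflow_markers)]
for line in hits:
    print(line)
```

In practice you would read `stderr_text` from the result page or the slot directory instead of a hard-coded sample.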
arkayn (Joined: 14 May 99, Posts: 4438, Credit: 55,006,323, RAC: 0)

"Those are overflows. You can see the -9 here"

Those are running between 7 and 10 minutes on my 1070s.
TBar (Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768)

"so i just put up a new system. it's having all kinds of strange issues."

Can't say you weren't warned; I'm still trying to figure out why they hate Arecibo shorties. Once that is fixed, I will post a link to the fully working App. Not only does V0.91-7 hate shorties, but on the Mac it also hates the normal-length Arecibo tasks. Your only solution is to run zi3v on the Arecibo tasks, like I'm trying to do. Or just go back to zi3v, period.
Keith Myers (Joined: 29 Apr 01, Posts: 13161, Credit: 1,160,866,277, RAC: 1,873)

The very first ones split off the tape were all early overflows. I have several pages where almost all of them overflowed.

https://setiathome.berkeley.edu/results.php?userid=14084&offset=840&show_names=1&state=0&appid=

Seti@Home classic workunits: 20,676 CPU time: 74,226 hours
A proud member of the OFA (Old Farts Association)
Ian&Steve C. (Joined: 28 Sep 99, Posts: 4267, Credit: 1,282,604,591, RAC: 6,640)

Does overflow mean a 3-second run time is normal?
Ian&Steve C. (Joined: 28 Sep 99, Posts: 4267, Credit: 1,282,604,591, RAC: 6,640)

"so i just put up a new system. it's having all kinds of strange issues."

I don't think that's it. All of my Linux systems are running v0.96. None of them have this behavior, only this one system.
TBar (Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768)

Are the other ones running shorties, or just the normal-length Arecibos? The ones with an AR of more than 1.5 are the ones that give problems on the Linux systems.
Ian&Steve C. (Joined: 28 Sep 99, Posts: 4267, Credit: 1,282,604,591, RAC: 6,640)

I don't know what AR means, or how to find out what the AR of a task is/was. I just set it up and let it run.
TBar (Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768)

"Work Unit Info:"

That is from the one posted above; the AR is 9.297712, hence above 1.5, and it fails on every one of my machines. The ones at 1.2 and below work on my Linux systems. On the Mac they sort of work, but give too high a pulse count, i.e. the count should be 4 and it reports 8.
Zalster (Joined: 27 May 99, Posts: 5517, Credit: 528,817,460, RAC: 242)

AR stands for angle range and is located in the stderr report. For the work unit you listed, this is it:

WU true angle range is : 9.297712
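As a quick sketch of pulling that value out programmatically (the sample line is copied from the task quoted earlier; the regex is an assumption about the log format, not project code):

```python
# Extract the "WU true angle range" value from a line of a saved Stderr log.
import re

stderr_line = "WU true angle range is : 9.297712"
match = re.search(r"WU true angle range is\s*:\s*([0-9.]+)", stderr_line)
ar = float(match.group(1))
print(ar)
```

Anything this returns above TBar's 1.5 threshold would be one of the problem shorties.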
Ian&Steve C. (Joined: 28 Sep 99, Posts: 4267, Credit: 1,282,604,591, RAC: 6,640)

I guess I just got unlucky to get 20+ of these tasks in a row on the same system? My other v0.96 systems don't see this nearly as often, maybe once every couple of days, and they have a lot more power and process many more WUs than this one.
TBar (Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768)

Look at the good side: now Petri is convinced it's a serious problem and he's looking into it more seriously. He got the same results, BTW: https://setiathome.berkeley.edu/results.php?hostid=7475713&state=5
petri33 (Joined: 6 Jun 02, Posts: 1668, Credit: 623,086,772, RAC: 156)

It is a known problem. I'm working on it. I did not have a WU in my test cases that would have revealed it. Now I grabbed one. I'll work on it tomorrow.

To overcome Heisenbergs: "You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
Keith Myers (Joined: 29 Apr 01, Posts: 13161, Credit: 1,160,866,277, RAC: 1,873)

If it ain't broke . . . ya can't fix it!
Jord (Joined: 9 Jun 99, Posts: 15184, Credit: 4,362,181, RAC: 3)

All 14au18a* tasks run for double to triple their normal time on my Android devices.
TBar (Joined: 22 May 99, Posts: 5204, Credit: 840,779,836, RAC: 2,768)

All shorties, no matter where they came from, screw up on all my machines when using the same App as the OP.
petri33 (Joined: 6 Jun 02, Posts: 1668, Credit: 623,086,772, RAC: 156)

Try this on Linux and 10x0.

Executable: https://drive.google.com/open?id=1pe6-p5zn27tXFvvszyGCzo0OfCkkCqDt
Source: https://drive.google.com/open?id=17Djj2E8Pxcd7k2WouYBskfPGrtxwjvFO
Keith Myers (Joined: 29 Apr 01, Posts: 13161, Credit: 1,160,866,277, RAC: 1,873)

"Try this on Linux and 10x0"

Petri, I could use some simple instructions on how to make this one work. I installed the CUDA 9.2 Toolkit and verified nvcc and clinfo, changed the app name in app_info, and then threw away all my GPU work. I did not install the 396.24 driver in the Toolkit because I was already running 396.51. I followed these instructions: How-to-install-CUDA-9-2-on-Ubuntu-18-04. Can you point out where I went wrong? TIA.
petri33 (Joined: 6 Jun 02, Posts: 1668, Credit: 623,086,772, RAC: 156)

Here is an example of an app_info.xml file. The <file_info> part must state the name of the executable so it does not get deleted when BOINC starts. The same name goes in the <app_version> part: <file_name>xxx</file_name>.

Sometimes it is easiest to suspend BOINC, copy the new executable over the old one keeping the old file name, and resume computing.

<app_info>
  <app>
    <name>astropulse_v7</name>
  </app>
  <file_info>
    <name>ap_7.01r2793_sse3_clGPU_x86_64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <version_num>708</version_num>
    <platform>linux_x86_64</platform>
    <plan_class>opencl_nvidia_100</plan_class>
    <cmdline> -verb -st -nog -unroll 80 -ffa_block 2304 -ffa_block_fetch 1152 -oclFFT_plan 256 16 256 </cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>0.33</count>
    </coproc>
    <file_ref>
      <file_name>ap_7.01r2793_sse3_clGPU_x86_64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu</name>
    <namex>MBv8_8.05r3345_avx_linux64</namex>
    <executable/>
  </file_info>
  <file_info>
    <name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda65_v8</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>800</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <file_ref>
      <file_name>MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>801</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <file_ref>
      <file_name>MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>804</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <file_ref>
      <file_name>MBv8_8.22r3712_avx2_x86_64-pc-linux-gnu</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>808</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <plan_class>nvidia_gpu</plan_class>
    <cmdline> -nobs -pfb 32 </cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1.00</count>
    </coproc>
    <file_ref>
      <file_name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda65_v8</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>809</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <avg_ncpus>0.1</avg_ncpus>
    <max_ncpus>0.1</max_ncpus>
    <plan_class>opencl_nvidia_sah</plan_class>
    <cmdline> -nobs -pfb 32 </cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1.00</count>
    </coproc>
    <file_ref>
      <file_name>setiathome_x41zc_x86_64-pc-linux-gnu_cuda65_v8</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

And here is an example of an app_config.xml:

<app_config>
  <xproject_max_concurrent>10</xproject_max_concurrent>
  <app>
    <name>astropulse_v7</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.50</gpu_usage>
      <cpu_usage>0.125</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>setiathome_v8</name>
    <xmax_concurrent>10</xmax_concurrent>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.1</cpu_usage>
    </gpu_versions>
  </app>
  <app_version>
    <app_name>astropulse_v7</app_name>
    <plan_class>opencl_nvidia_100</plan_class>
    <cmdline> -verb -st -nog -unroll 80 -sbs 2048 -ffa_block 2304 -ffa_block_fetch 1152 -oclFFT_plan 256 16 256 </cmdline>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>opencl_nvidia_sah</plan_class>
    <cmdline> -pfb 32 -nobs -pfl 64 </cmdline>
  </app_version>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <plan_class>nvidia_gpu</plan_class>
    <cmdline> -pfb 32 -nobs -pfl 64 </cmdline>
  </app_version>
</app_config>
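One practical check worth adding here: a hand-edited app_info.xml that is not well-formed XML can cause the client to discard cached work when it restarts, so it is worth parsing the file before resuming BOINC. A minimal sketch (the sample string stands in for the real file; point the parser at your own app_info.xml instead):

```python
# Parse a (stand-in) app_info.xml to confirm the XML is well-formed
# before restarting the BOINC client.
import xml.etree.ElementTree as ET

sample = """<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>808</version_num>
    <plan_class>nvidia_gpu</plan_class>
  </app_version>
</app_info>"""

root = ET.fromstring(sample)  # raises ParseError if the file is malformed
names = [el.findtext("name") or el.findtext("app_name") for el in root]
print("well-formed; entries:", names)
```

For the real file, replace `ET.fromstring(sample)` with `ET.parse("app_info.xml")` and check it in place before restarting the client.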
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.