Impossible Pulse, retrying from checkpoint from setiathome_8.05_i686-pc-linux-gnu

Message boards : Number crunching : Impossible Pulse, retrying from checkpoint from setiathome_8.05_i686-pc-linux-gnu
Mr_Maniac
Joined: 22 Oct 04
Posts: 3
Credit: 3,969,266
RAC: 12
Germany
Message 1835396 - Posted: 11 Dec 2016, 18:26:51 UTC

Hello,

For the past few days, I've been getting some workunits that always fail with "Impossible Pulse, retrying from checkpoint". I found that the failing workunits are all processed by setiathome_8.05_i686-pc-linux-gnu. GPU workunits and setiathome_8.00_x86_64-pc-linux-gnu are doing fine.

I even get the following messages in my kernel log / dmesg:
[   67.540314] ------------[ cut here ]------------
[   67.540317] WARNING: CPU: 5 PID: 4572 at ./arch/x86/include/asm/fpu/internal.h:368 fpu__restore+0x1fb/0x200
[   67.540317] Modules linked in: nvidia_uvm(PO) xpad ff_memless joydev nvidia_drm(PO) nvidia_modeset(PO) nvidia(PO) vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O)
[   67.540323] CPU: 5 PID: 4572 Comm: setiathome_8.05 Tainted: P           O    4.8.13-gentoo #1
[   67.540323] Hardware name: System manufacturer System Product Name/Z170 PRO GAMING, BIOS 1901 06/20/2016
[   67.540324]  0000000000000000 ffff880807b5bd90 ffffffff813f84b8 0000000000000000
[   67.540325]  0000000000000000 ffff880807b5bdd0 ffffffff810e3936 0000017000000000
[   67.540326]  ffff8808126340c0 0000000000000000 ffff880812633700 ffff8808126340c0
[   67.540328] Call Trace:
[   67.540330]  [<ffffffff813f84b8>] dump_stack+0x4d/0x65
[   67.540331]  [<ffffffff810e3936>] __warn+0xc6/0xe0
[   67.540332]  [<ffffffff810e3a08>] warn_slowpath_null+0x18/0x20
[   67.540333]  [<ffffffff81078cdb>] fpu__restore+0x1fb/0x200
[   67.540335]  [<ffffffff8107a094>] __fpu__restore_sig+0x224/0x4e0
[   67.540336]  [<ffffffff8107a548>] fpu__restore_sig+0x28/0x40
[   67.540337]  [<ffffffff810dd82f>] ia32_restore_sigcontext+0x14f/0x170
[   67.540338]  [<ffffffff810dd9e0>] sys32_sigreturn+0xa0/0xb0
[   67.540339]  [<ffffffff81002b4e>] do_int80_syscall_32+0x4e/0xa0
[   67.540341]  [<ffffffff8182892a>] entry_INT80_compat+0x2a/0x40
[   67.540342] ---[ end trace d7909f4b1f947dc7 ]---
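
For anyone who wants to check their own kernel log for the same warning, here is a minimal Python sketch. It assumes the dmesg output has been saved as text and that warnings follow the layout shown above (a "WARNING: CPU: ... PID: ... at ..." line followed by a line with a "Comm:" field). The function name `summarize_warnings` is made up for this example.

```python
import re

# Header line of a kernel warning, e.g.
# "... WARNING: CPU: 5 PID: 4572 at ./arch/.../internal.h:368 fpu__restore+0x1fb/0x200"
WARN_RE = re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at (\S+) (\S+)")
# The "Comm:" field names the process that triggered the warning.
COMM_RE = re.compile(r"Comm: (\S+)")


def summarize_warnings(dmesg_text):
    """Return one dict per WARNING block found in the given dmesg text."""
    warnings = []
    current = None
    for line in dmesg_text.splitlines():
        match = WARN_RE.search(line)
        if match:
            current = {
                "cpu": int(match.group(1)),
                "pid": int(match.group(2)),
                "location": match.group(3),
                "symbol": match.group(4),
                "comm": None,
            }
            warnings.append(current)
            continue
        match = COMM_RE.search(line)
        # Attach the first "Comm:" seen after a warning header to that warning.
        if match and current is not None and current["comm"] is None:
            current["comm"] = match.group(1)
    return warnings
```

Feeding it the log above should yield a single entry with `comm` set to `setiathome_8.05` and `symbol` set to `fpu__restore+0x1fb/0x200`.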


Could this be a bug in setiathome_8.05_i686-pc-linux-gnu or is there something wrong with my system? Can I provide anything else to help?
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1835558 - Posted: 12 Dec 2016, 13:07:51 UTC - in response to Message 1835396.  
Last modified: 12 Dec 2016, 13:14:04 UTC

Kernel 4.8 probably requires some adaptations inside the apps.

Do others also have such problems with Linux kernel 4.8?
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
setiathome_v8 8.00 Revision: 3335 g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is :  0.331616
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
           v_avxGetPowerSpectrum 0.000038 0.00000 
                 avx_ChirpData_d 0.002135 0.00000 
           v_avxTranspose4x16ntw 0.000622 0.00000 
                JS AVX_a folding 0.000382 0.00000 
...
Restarted at 100.00 percent.
Restarted at 100.00 percent.
Restarted at 100.00 percent.
SIGSEGV: segmentation violation
Stack trace (10 frames):
[0x8127360]
[0xf7729b00]
[0x8065266]
[0x8060fd3]
[0x805ccbc]
[0x80688b8]
[0x807587d]
[0x8048660]
[0x833e0e8]
[0x8048201]

Exiting...

</stderr_txt>
]]>

@Maniac: Could you try to reproduce the error in standalone mode?
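
As a side note on the "code 193 (0xc1, -63)" in the message above: the three numbers are the same exit status written as decimal, as hex, and as a signed 8-bit value. A small Python sketch of that arithmetic (the helper name `as_signed_byte` is just for illustration):

```python
def as_signed_byte(code):
    """Interpret an exit status in the range 0..255 as a signed 8-bit value."""
    if not 0 <= code <= 255:
        raise ValueError("exit status must fit in one byte")
    # Values with the top bit set wrap around to negative numbers.
    return code - 256 if code >= 128 else code


code = 193
print(f"{code} (0x{code:x}, {as_signed_byte(code)})")  # prints "193 (0xc1, -63)"
```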
_\|/_
U r s
Mr_Maniac
Joined: 22 Oct 04
Posts: 3
Credit: 3,969,266
RAC: 12
Germany
Message 1836590 - Posted: 17 Dec 2016, 22:05:54 UTC - in response to Message 1835558.  
Last modified: 17 Dec 2016, 22:06:19 UTC

Okay, it also fails in standalone mode. setiathome 8.05 just runs for about a minute and then exits. There's a file "boinc_temporary_exit" which contains the following lines:
300
Impossible Pulse, retrying from checkpoint.
notice
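
In case someone wants to read that file from a script, here is a rough Python sketch. It assumes the three-line layout shown above; reading the first line as a retry delay in seconds and the third as a "notice" flag is my interpretation, not something confirmed in this thread. The function name is made up.

```python
def parse_temporary_exit(text):
    """Parse the contents of a boinc_temporary_exit file (assumed layout).

    Assumed layout, based on the file shown above:
      line 1: an integer (interpreted here as a delay in seconds)
      line 2: a human-readable reason
      line 3: optional, the word "notice"
    """
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("expected at least a delay line and a reason line")
    return {
        "delay_seconds": int(lines[0].strip()),
        "reason": lines[1].strip(),
        "is_notice": len(lines) > 2 and lines[2].strip() == "notice",
    }


# Usage with the contents quoted above:
info = parse_temporary_exit("300\nImpossible Pulse, retrying from checkpoint.\nnotice\n")
print(info["delay_seconds"], info["reason"])
```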


stderr.txt:
22:58:49 (25424): Can't open init data file - running in standalone mode
22:58:49 (25424): Can't open init data file - running in standalone mode
setiathome_v8 8.00 Revision: 3335 g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is :  0.010995
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000073 0.00000  test
             v_vGetPowerSpectrum 0.000037 0.00000  test
            v_vGetPowerSpectrum2 0.000036 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000033 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000036 0.00000  test
           v_avxGetPowerSpectrum 0.000026 0.00000  test
           v_avxGetPowerSpectrum 0.000026 0.00000  choice

                     v_ChirpData 0.714439     nan  test
                   fpu_ChirpData 0.542551    -nan  test
               fpu_opt_ChirpData 1.298088    -nan  test
             v_vChirpData_x86_64 0.056348     nan  test
               sse1_ChirpData_ak 0.004843    -nan  test
             sse1_ChirpData_ak8e 0.003809    -nan  test
             sse1_ChirpData_ak8h 0.004056    -nan  test
               sse2_ChirpData_ak 0.004270    -nan  test
              sse2_ChirpData_ak8 0.002725    -nan  test
               sse3_ChirpData_ak 0.004026    -nan  test
              sse3_ChirpData_ak8 0.002747    -nan  test
                 avx_ChirpData_a 0.001499    -nan  test
                 avx_ChirpData_b 0.001501    -nan  test
                 avx_ChirpData_c 0.001530    -nan  test
                 avx_ChirpData_d 0.001495    -nan  test
                 avx_ChirpData_d 0.001495    -nan  choice

                 FPU opt folding 0.004615 0.00000  test
                 ben SSE folding 0.000874 0.00000  test
                  AK SSE folding 0.000640 0.00000  test
                  BH SSE folding 0.000666 0.00000  test
                JS AVX_a folding 0.000505 0.00000  test
                JS AVX_c folding 0.000559 0.00000  test
                JS AVX_a folding 0.000505 0.00000  choice

                   Test duration    72.45 seconds

New best spike:score:-0.93486, power: 4.6473, index=1, fft_len=32, ifft=0,icfft=2
New best spike:score:-0.93156, power: 4.6827, index=31, fft_len=32, ifft=2,icfft=2
New best spike:score:-0.8618, power: 5.4987, index=30, fft_len=32, ifft=5,icfft=2
New best spike:score:-0.85308, power: 5.6102, index=28, fft_len=32, ifft=12,icfft=2
New best spike:score:-0.84689, power: 5.6907, index=2, fft_len=32, ifft=17,icfft=2
New best spike:score:-0.79057, power: 6.4787, index=2, fft_len=32, ifft=18,icfft=2
New best spike:score:-0.73158, power: 7.4213, index=19, fft_len=32, ifft=33,icfft=2
New best spike:score:-0.69114, power: 8.1455, index=22, fft_len=32, ifft=38,icfft=2
New best spike:score:-0.63299, power: 9.3126, index=20, fft_len=32, ifft=505,icfft=2
New best spike:score:-0.62555, power: 9.4735, index=31, fft_len=32, ifft=607,icfft=2
New best spike:score:-0.61784, power: 9.6432, index=13, fft_len=32, ifft=2364,icfft=2
New best spike:score:-0.61588, power: 9.6869, index=25, fft_len=32, ifft=3506,icfft=2
New best spike:score:-0.57285, power: 10.696, index=2, fft_len=32, ifft=4481,icfft=2
New best spike:score:-0.50415, power: 12.529, index=17, fft_len=32, ifft=8243,icfft=2
Best pulse updated: score=0.5042,power=0.19155,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.5098,power=0.6935,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.5106,power=0.13076,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.5109,power=0.089005,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.5773,power=0.10058,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.598,power=1.3064,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.6024,power=0.81926,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.6026,power=0.52487,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.6164,power=1.3461,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.6291,power=0.54706,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.6312,power=0.85566,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.6691,power=0.17081,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.6972,power=0.26385,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.7179,power=2.0992,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.7261,power=0.70736,fftlen=32,freq_bin=1,time_bin=16384,icfft=2
Best pulse updated: score=0.7388,power=1.3186,fftlen=32,freq_bin=2,time_bin=16384,icfft=2
Best pulse updated: score=0.748,power=0.73009,fftlen=32,freq_bin=2,time_bin=16384,icfft=2
Best pulse updated: score=0.7557,power=0.48024,fftlen=32,freq_bin=2,time_bin=16384,icfft=2
Best pulse updated: score=0.7927,power=2.3158,fftlen=32,freq_bin=6,time_bin=16384,icfft=2
Best pulse updated: score=0.8323,power=0.098881,fftlen=32,freq_bin=19,time_bin=16384,icfft=2
New best spike:score:-0.43278, power: 14.767, index=17, fft_len=64, ifft=6209,icfft=3
Best pulse updated: score=0.8339,power=0.39413,fftlen=64,freq_bin=16,time_bin=8192,icfft=3


Do you need other info (result.sah or wisdom.sah)? Fortunately I mostly get workunits that use setiathome 8.00 x86_64 or 8.10 opencl and those work flawlessly.
Urs Echternacht
Volunteer tester
Joined: 15 May 99
Posts: 692
Credit: 135,197,781
RAC: 211
Germany
Message 1837091 - Posted: 20 Dec 2016, 14:01:58 UTC - in response to Message 1836590.  
Last modified: 20 Dec 2016, 14:06:15 UTC

Mr_Maniac wrote (Message 1836590):
Okay, it also fails in standalone. setiathome 8.05 just runs for about a minute and then exits. There's a file "boinc_temporary_exit" which contain the following lines:
300
Impossible Pulse, retrying from checkpoint.
notice


stderr.txt:
...
setiathome_v8 8.00 Revision: 3335 g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is :  0.010995
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000073 0.00000  test
             v_vGetPowerSpectrum 0.000037 0.00000  test
            v_vGetPowerSpectrum2 0.000036 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000033 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000036 0.00000  test
           v_avxGetPowerSpectrum 0.000026 0.00000  test
           v_avxGetPowerSpectrum 0.000026 0.00000  choice

                     v_ChirpData 0.714439     nan  test
                   fpu_ChirpData 0.542551    -nan  test
               fpu_opt_ChirpData 1.298088    -nan  test
             v_vChirpData_x86_64 0.056348     nan  test
               sse1_ChirpData_ak 0.004843    -nan  test
             sse1_ChirpData_ak8e 0.003809    -nan  test
             sse1_ChirpData_ak8h 0.004056    -nan  test
               sse2_ChirpData_ak 0.004270    -nan  test
              sse2_ChirpData_ak8 0.002725    -nan  test
               sse3_ChirpData_ak 0.004026    -nan  test
              sse3_ChirpData_ak8 0.002747    -nan  test
                 avx_ChirpData_a 0.001499    -nan  test
                 avx_ChirpData_b 0.001501    -nan  test
                 avx_ChirpData_c 0.001530    -nan  test
                 avx_ChirpData_d 0.001495    -nan  test
                 avx_ChirpData_d 0.001495    -nan  choice

                 FPU opt folding 0.004615 0.00000  test
                 ben SSE folding 0.000874 0.00000  test
                  AK SSE folding 0.000640 0.00000  test
                  BH SSE folding 0.000666 0.00000  test
                JS AVX_a folding 0.000505 0.00000  test
                JS AVX_c folding 0.000559 0.00000  test
                JS AVX_a folding 0.000505 0.00000  choice

                   Test duration    72.45 seconds

...


Do you need other info (result.sah or wisdom.sah)? Fortunately I mostly get workunits that use setiathome 8.00 x86_64 or 8.10 opencl and those work flawlessly.


Hi,
I've checked several hosts at setiathome beta that run kernel 4.8 and also get the 32-bit setiathome v8 8.05 app, the one that seems to fail completely on your host. So the app is probably OK.
Your last post (important snippet quoted) also seems to point to a problem with your host: the optimal function choices look somewhat odd!
Here is an example of what a sane "Optimal function choices" table should look like:
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000100 0.00000  test
             v_vGetPowerSpectrum 0.000081 0.00000  test
            v_vGetPowerSpectrum2 0.000091 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000071 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000085 0.00000  test
           v_avxGetPowerSpectrum 0.000148 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000071 0.00000  choice

                     v_ChirpData 0.011877 0.00000  test
                   fpu_ChirpData 0.017826 0.00000  test
               fpu_opt_ChirpData 0.012076 0.00000  test
             v_vChirpData_x86_64 0.053050 0.00000  test
               sse1_ChirpData_ak 0.009364 0.00000  test
             sse1_ChirpData_ak8e 0.007752 0.00000  test
             sse1_ChirpData_ak8h 0.007835 0.00000  test
               sse2_ChirpData_ak 0.007863 0.00000  test
              sse2_ChirpData_ak8 0.005082 0.00000  test
               sse3_ChirpData_ak 0.007386 0.00000  test
              sse3_ChirpData_ak8 0.005060 0.00000  test
                 avx_ChirpData_a 0.004639 0.00000  test
                 avx_ChirpData_b 0.004578 0.00000  test
                 avx_ChirpData_c 0.005383 0.00000  test
                 avx_ChirpData_d 0.004439 0.00000  test
                 avx_ChirpData_d 0.004439 0.00000  choice

                     v_Transpose 0.013233 0.00000  test
                    v_Transpose2 0.006874 0.00000  test
                    v_Transpose4 0.004108 0.00000  test
                    v_Transpose8 0.007133 0.00000  test
                 fftwf_transpose 0.003693 0.00000  test
                  v_pfTranspose2 0.006886 0.00000  test
                  v_pfTranspose4 0.003974 0.00000  test
                  v_pfTranspose8 0.007051 0.00000  test
                   v_vTranspose4 0.003855 0.00000  test
                 v_vTranspose4np 0.003809 0.00000  test
                v_vTranspose4ntw 0.006342 0.00000  test
              v_vTranspose4x8ntw 0.003530 0.00000  test
             v_vTranspose4x16ntw 0.002092 0.00000  test
            v_vpfTranspose8x4ntw 0.006276 0.00000  test
            v_avxTranspose4x8ntw 0.003452 0.00000  test
           v_avxTranspose4x16ntw 0.002319 0.00000  test
            v_avxTranspose8x4ntw 0.006354 0.00000  test
          v_avxTranspose8x8ntw_a 0.003525 0.00000  test
          v_avxTranspose8x8ntw_b 0.003474 0.00000  test
             v_vTranspose4x16ntw 0.002092 0.00000  choice

                 FPU opt folding 0.001605 0.00000  test
                 ben SSE folding 0.000764 0.00000  test
                  AK SSE folding 0.000647 0.00000  test
                  BH SSE folding 0.000600 0.00000  test
                JS AVX_a folding 0.001781 0.00000  test
                JS AVX_c folding 0.001854 0.00000  test
                  BH SSE folding 0.000600 0.00000  choice

                   Test duration     7.75 seconds
Yours seems to be missing the Transpose section completely and shows "nan"s (for "Not a Number") in the error column of the Chirp section.
Possibly your copy of the i686 setiathome app is defective.
Have you tried deleting the local copy and letting it download again?
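
To make this kind of check repeatable, here is a minimal Python sketch that scans a stderr.txt for the two oddities mentioned above. It assumes rows look like the tables in this thread (name, timing, error, optional "test"/"choice" tag); the function name and the list of section keywords are my own choices. A missing section is only reported, not treated as an error, since it may depend on the workunit.

```python
import math
import re

# One table row: name (may contain spaces), timing, error, optional tag.
ROW_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<timing>\d+\.\d+)\s+"
    r"(?P<error>-?nan|\d+\.\d+)(?:\s+(?P<tag>test|choice))?\s*$"
)
# Section keywords taken from the function names in the tables above.
FAMILIES = ("GetPowerSpectrum", "ChirpData", "Transpose", "folding")


def check_choices(stderr_text):
    """Report sections that never appear and rows whose error is NaN."""
    seen = set()
    nan_rows = []
    for line in stderr_text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue
        name = match.group("name")
        for family in FAMILIES:
            if family.lower() in name.lower():
                seen.add(family)
        # float() accepts "nan" and "-nan", so both spellings are caught.
        if math.isnan(float(match.group("error"))):
            nan_rows.append(name)
    missing = [family for family in FAMILIES if family not in seen]
    return {"missing_sections": missing, "nan_rows": nan_rows}
```

Usage would be something like `check_choices(open("stderr.txt").read())`, then looking at `missing_sections` and `nan_rows` in the result.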
_\|/_
U r s
Mr_Maniac
Joined: 22 Oct 04
Posts: 3
Credit: 3,969,266
RAC: 12
Germany
Message 1837840 - Posted: 25 Dec 2016, 13:22:44 UTC - in response to Message 1837091.  
Last modified: 25 Dec 2016, 13:33:04 UTC

Unfortunately, even after re-downloading the binary (twice and more), the situation changed only slightly. The md5sum of the binary changed (so it really seems to be a different binary), but the results merely varied from run to run.
Sometimes there were fewer NaN results for ChirpData, and sometimes they were all NaN again. But I'd say this isn't an issue anymore, since I haven't received any i686 workunits in the last few days anyway. The only question I ask myself now is: is it my system or the binary?
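
For reference, the checksum comparison can also be done from Python instead of md5sum. This is just a sketch using the standard hashlib module; the function names and the file paths in the usage comment are placeholders.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """True if both files have the same MD5 digest."""
    return file_md5(path_a) == file_md5(path_b)


# Usage (placeholder paths for an old and a freshly downloaded copy):
# print(same_contents("old/setiathome_8.05_i686-pc-linux-gnu",
#                     "new/setiathome_8.05_i686-pc-linux-gnu"))
```

Hashing the same file several times in a row is also a cheap sanity check: if the digest of an unchanged file differs between runs, the problem is more likely on the host than in the download.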

EDIT:
I was curious and ran an x86_64 workunit in standalone mode. The results in stderr.txt look much better (no nan), but the Transpose section is still missing. Maybe it is only needed for specific workunits?
14:25:50 (22364): Can't open init data file - running in standalone mode
14:25:50 (22364): Can't open init data file - running in standalone mode
setiathome_v8 8.00 Revision: 3290 g++ (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
libboinc: BOINC 7.7.0

Work Unit Info:
...............
WU true angle range is :  0.308809
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000049 0.00000  test
             v_vGetPowerSpectrum 0.000037 0.00000  test
            v_vGetPowerSpectrum2 0.000036 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000033 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000037 0.00000  test
           v_avxGetPowerSpectrum 0.000027 0.00000  test
           v_avxGetPowerSpectrum 0.000027 0.00000  choice

                     v_ChirpData 0.003601 0.00000  test
             v_vChirpData_x86_64 0.055411 0.00000  test
               sse1_ChirpData_ak 0.005125 0.00000  test
             sse1_ChirpData_ak8e 0.004120 0.00000  test
             sse1_ChirpData_ak8h 0.004363 0.00000  test
               sse2_ChirpData_ak 0.007658 0.00000  test
              sse2_ChirpData_ak8 0.002885 0.00000  test
               sse3_ChirpData_ak 0.007782 0.00000  test
              sse3_ChirpData_ak8 0.002854 0.00000  test
                 avx_ChirpData_a 0.001491 0.00000  test
                 avx_ChirpData_b 0.001474 0.00000  test
                 avx_ChirpData_c 0.001471 0.00000  test
                 avx_ChirpData_d 0.001474 0.00000  test
                 avx_ChirpData_c 0.001471 0.00000  choice

                 FPU opt folding 0.000366 0.00000  test
                 ben SSE folding 0.000312 0.00000  test
                  AK SSE folding 0.000202 0.00000  test
                  BH SSE folding 0.000233 0.00000  test
                JS AVX_a folding 0.000235 0.00000  test
                JS AVX_c folding 0.000249 0.00000  test
                  AK SSE folding 0.000202 0.00000  choice

                   Test duration     3.56 seconds

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1837845 - Posted: 25 Dec 2016, 13:54:39 UTC - in response to Message 1837840.  
Last modified: 25 Dec 2016, 14:06:11 UTC

NaNs on that very unusual scale imply some sort of memory (or other component) corruption. I haven't looked into your machine specs at all, but try simply reseating the RAM and CPU. If the memory has a published base timing spec and is set to an XMP profile or similar, try setting the base timings manually in the BIOS (disabling the XMP profile, or whatever the AMD equivalent might be). If the behaviour stays the same, then more digging would probably be needed as far as BIOS revisions are concerned, but if the behaviour changes (even if not fixed), then you have a smoking gun.

[Edit:] Come to think of it, Linux is moving to 4k virtualisation and multithreading as Windows did circa 2005, so extra care with firmware, scratch/page drives and kernel updates could be warranted.

[Edit2:] The discrepancy between x64 and x86 might further imply that the drives need looking at.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.