Long-running work unit

Message boards : Number crunching : Long-running work unit

Previous · 1 · 2 · 3 · Next

Graeme Hewson
Joined: 14 Jun 99
Posts: 19
Credit: 242,802
RAC: 0
United Kingdom
Message 1573177 - Posted: 17 Sep 2014, 11:26:24 UTC - in response to Message 1573169.  

Yes, I'm showing all tasks; there are some running and some ready to start.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1573180 - Posted: 17 Sep 2014, 12:02:07 UTC - in response to Message 1573177.  

Yes, I'm showing all tasks; there are some running and some ready to start.

Well, I can't explain it then. I don't know which version of BOINC you're using, because your computers are hidden at both projects, but I'm not aware of any reported BOINC bug which would falsely report that some task is suspended when it isn't.

Note that it is possible to 'suspend' a task which has completed and is ready to report - that doesn't show visibly in BOINC Manager, but I use it sometimes to defer work fetch if I want some other project to fetch first. That state clears itself automatically when the suspended task is reported.

At some time - probably in the next few weeks - the v7.4.xx development line will be promoted to 'recommended' status. That version of BOINC restores better logging of the various reasons why specific classes of work may or may not be requested during a scheduler contact. If your problem persists, you might consider trying an alpha test version to obtain that extra information. I haven't tested v7.4.22 yet, but v7.4.21 seems safe and clean for most purposes.
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1573272 - Posted: 17 Sep 2014, 16:25:25 UTC - in response to Message 1573180.  

Yes, I'm showing all tasks; there are some running and some ready to start.

Well, I can't explain it then. I don't know which version of BOINC you're using, because your computers are hidden at both projects ...

A link to his computer (the one with the initial problem of the hung CPU task) is in my post:
http://setiathome.berkeley.edu/forum_thread.php?id=75617&postid=1571014#1571014
 


- ALF - "Find out what you don't do well ..... then don't do it!" :)
 
Graeme Hewson
Joined: 14 Jun 99
Posts: 19
Credit: 242,802
RAC: 0
United Kingdom
Message 1573669 - Posted: 18 Sep 2014, 4:47:13 UTC - in response to Message 1573180.  

It's OK now after I restarted BOINC.
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1574144 - Posted: 19 Sep 2014, 2:14:19 UTC - in response to Message 1573669.  

Did you check again, in the slot directories of the running tasks, that the files are updating?
(stderr and state; I'm not sure about the exact names on Linux. That would confirm the app is not hung.)
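A minimal Python sketch of that check. It assumes you pass the path of one task's slot directory (the path in the example comment is hypothetical; where the slots live depends on how BOINC was installed) and that at least one slot file, such as stderr.txt or the checkpoint file, is rewritten while the app is making progress. The 10-minute threshold is an arbitrary assumption: how often each file changes depends on the app and on BOINC's checkpoint interval.

```python
import os
import time


def stale_files(slot_dir, max_age_seconds, now=None):
    """Names of regular files in slot_dir not modified within max_age_seconds."""
    if now is None:
        now = time.time()
    stale = []
    for entry in os.scandir(slot_dir):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            stale.append(entry.name)
    return sorted(stale)


def looks_hung(slot_dir, max_age_seconds=600, now=None):
    """Heuristic: True if the slot has files and none was modified recently.

    A running app keeps rewriting at least some of its slot files, so a slot
    where every file is stale suggests the app is stuck (or suspended).
    """
    files = [e.name for e in os.scandir(slot_dir) if e.is_file()]
    if not files:
        return False
    return len(stale_files(slot_dir, max_age_seconds, now)) == len(files)


# Example (hypothetical slot path):
# print(looks_hung("/var/lib/boinc-client/slots/0"))
```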
 


Graeme Hewson
Joined: 14 Jun 99
Posts: 19
Credit: 242,802
RAC: 0
United Kingdom
Message 1574167 - Posted: 19 Sep 2014, 4:53:37 UTC - in response to Message 1574144.  

No, why? Tasks are going through, and I aborted the WU I had trouble with.
Graeme Hewson
Joined: 14 Jun 99
Posts: 19
Credit: 242,802
RAC: 0
United Kingdom
Message 1574990 - Posted: 20 Sep 2014, 11:08:06 UTC

It's happening again with another task. I'm about to abort it.

stderr.txt has:

setiathome_v7 7.00 Revision: 1782 g++ (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)
libboinc: BOINC 7.1.0

Work Unit Info:
...............
WU true angle range is :  1.571478
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------


I notice it's running with the i686 executable:

ps -fp 4512
UID        PID  PPID  C STIME TTY          TIME CMD
boinc     4512 17291 62 Sep19 ?        11:02:14 ../../projects/setiathome.berkeley.edu/setiathome_7.01_i686-pc-linux-gnu


while the normally running task on the other core is using the x86_64 executable, as I would expect:

ps -fp 11855
UID        PID  PPID  C STIME TTY          TIME CMD
boinc    11855 17291 91 11:09 ?        00:34:19 ../../projects/setiathome.berkeley.edu/setiathome_7.01_x86_64-pc-linux-gnu
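The ps check above can be scripted. This Python sketch parses captured ps -f output (standard columns UID PID PPID C STIME TTY TIME CMD, as shown above) and reports which build each PID is running, keyed on the i686 / x86_64 part of the app name. It is an illustration, not a BOINC tool.

```python
def executable_arch(ps_output):
    """Map each PID in `ps -f` output to the app architecture in its CMD column."""
    result = {}
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # CMD is the 8th field and may contain spaces
        if len(fields) < 8:
            continue
        pid, cmd = int(fields[1]), fields[7]
        app = cmd.split()[0].rsplit("/", 1)[-1]  # executable name without its path
        if "x86_64" in app:
            result[pid] = "x86_64"
        elif "i686" in app:
            result[pid] = "i686"
        else:
            result[pid] = "unknown"
    return result


# Example: capture the output the same way as above, e.g.
# import subprocess
# text = subprocess.run(["ps", "-fp", "4512,11855"], capture_output=True, text=True).stdout
# print(executable_arch(text))
```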

Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1575344 - Posted: 21 Sep 2014, 1:28:09 UTC - in response to Message 1574990.  

It's happening again with another task

Of course it will happen again; it doesn't depend on the task, it depends on the app.
This is a long-existing problem with the 'Optimal function choices' test on AMD CPUs
(sometimes the test hangs, sometimes not; on some systems it hangs a lot, on others never or rarely).

The way to avoid this is to use apps which don't do the 'Optimal function choices' test (and are faster):
http://lunatics.kwsn.net/index.php?module=Downloads;catd=1
 


Graeme Hewson
Joined: 14 Jun 99
Posts: 19
Credit: 242,802
RAC: 0
United Kingdom
Message 1575504 - Posted: 21 Sep 2014, 9:40:05 UTC - in response to Message 1575344.  

The way I'm avoiding it is by suspending work on SETI for now. I'll look back in a few months to see if this long existing problem has been fixed.
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1575843 - Posted: 22 Sep 2014, 3:49:28 UTC - in response to Message 1575504.  
Last modified: 22 Sep 2014, 3:58:13 UTC

 
Long-existing = years.

It will not be fixed magically.
Someone has to figure out why it happens on some systems before a fix can be proposed.

Example:
It happens on 30-40% of task starts/restarts on my K6-2+, but only when it is booted into Windows 2000.
When the same system runs Windows 98, it never happens.
The BOINC and SETI@home files are the same (the same BOINC program and data directories are used, and the same tasks continue to run).


The PSAPI.DLL used in both cases is from Windows 2000.
It is in the SETI@home directory (<BOINC_Data>\projects\setiathome.berkeley.edu\),
and in app_info.xml I put these lines:
    <file_info>
        <name>Psapi.dll</name>
        <executable/>
    </file_info>
...
        <file_ref>
            <file_name>Psapi.dll</file_name>
            <copy_file/>
        </file_ref>



I'm willing to test why the hang happens if someone can provide a test scenario


The currently running task (run only at night, on Windows 2000) shows that on some nights no progress was made (it restarts at the same percentage as on the previous day):
setiathome_v7 7.00 DevC++/MinGW/g++ 4.5.2
libboinc: 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.014060
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
              v_GetPowerSpectrum 0.010141 0.00000 
setiathome_v7 7.00 DevC++/MinGW/g++ 4.5.2
libboinc: 7.1.0

Work Unit Info:
...............
WU true angle range is :  0.014060
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)
              v_GetPowerSpectrum 0.009980 0.00000 
               fpu_opt_ChirpData 0.368909 0.00000 
                     v_Transpose 0.564497 0.00000 
                 FPU opt folding 0.125455 0.00000 
Restarted at 3.66 percent.
Restarted at 7.49 percent.
Restarted at 11.46 percent.
Restarted at 15.09 percent.
Restarted at 15.09 percent.
Restarted at 18.85 percent.
Restarted at 22.83 percent.
Restarted at 26.47 percent.
Restarted at 30.44 percent.
Restarted at 36.80 percent.
Restarted at 36.80 percent.
Restarted at 43.95 percent.
Restarted at 43.95 percent.
Restarted at 51.01 percent.
Restarted at 51.01 percent.
Restarted at 58.15 percent.
Restarted at 58.15 percent.
Restarted at 65.34 percent.
Restarted at 72.66 percent.
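The stalls in this log show up as repeated 'Restarted at' percentages. A minimal Python sketch that picks them out of a stderr.txt, assuming only the 'Restarted at N percent.' line format shown above:

```python
import re

# Matches lines such as "Restarted at 15.09 percent."
RESTART_RE = re.compile(r"Restarted at ([0-9.]+) percent\.")


def stalled_restarts(stderr_text):
    """Percentages at which a restart showed no progress since the previous restart.

    Each "Restarted at" value is where the task resumed from its last checkpoint,
    so a value that does not exceed the previous one means the session in
    between never wrote a newer checkpoint.
    """
    percents = [float(m.group(1)) for m in RESTART_RE.finditer(stderr_text)]
    return [cur for prev, cur in zip(percents, percents[1:]) if cur <= prev]
```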

 


Graeme Hewson
Joined: 14 Jun 99
Posts: 19
Credit: 242,802
RAC: 0
United Kingdom
Message 1575999 - Posted: 22 Sep 2014, 14:09:09 UTC - in response to Message 1575843.  

Is the source code available?

How can I reset a WU so it appears never to have run, so I can watch it start up?
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1576490 - Posted: 23 Sep 2014, 10:58:07 UTC - in response to Message 1575999.  
Last modified: 23 Sep 2014, 11:09:53 UTC

 
Source code is available but I don't know the proper link


You don't need to "reset a WU so it appears never to have run" in order to "watch it start up".
The 'Optimal function choice' test is done on every restart (but only the first result is printed in stderr.txt).
There is a command-line switch which makes the app print all the functions tested, on every restart, to stderr.txt.


Do an Offline Test (outside of BOINC):

- Make an empty directory
- Copy all the app files into it; for Windows those are:
setiathome_7.00_windows_intelx86.exe
libfftw3f-3-3_upx.dll

- Copy a WU file into it and rename it to work_unit.sah

- Stop BOINC and run the app with the -verbose switch:
setiathome_7.00_windows_intelx86.exe -verbose

Now look at what is written to stderr.txt (new lines should appear every few seconds).
After a few minutes, kill the app process.

(Repeat the run with -verbose as many times as you like.)
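The offline test steps above can be automated with a short Python sketch. It only copies files and starts/kills a process. It assumes, as the logs below suggest, that the app run outside BOINC works in standalone mode and writes stderr.txt into its working directory. The function names and the file names in the example comments are illustrative; stop BOINC first, as described.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_offline_test(test_dir, app_files, wu_file):
    """Copy the app files into a new, empty directory and the WU in as work_unit.sah."""
    test_dir = Path(test_dir)
    test_dir.mkdir(parents=True, exist_ok=False)  # insist on a fresh directory
    for f in app_files:
        shutil.copy2(f, test_dir)  # copy2 also preserves permission bits, so the app stays executable
    shutil.copy(wu_file, test_dir / "work_unit.sah")
    return test_dir


def run_for(cmd, test_dir, seconds):
    """Run cmd in test_dir for up to `seconds`; return its exit code, or None if it had to be killed."""
    proc = subprocess.Popen(cmd, cwd=test_dir)
    try:
        return proc.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        return None


# Example (hypothetical file names; on Linux there is no DLL to copy):
# d = prepare_offline_test("offline1", ["setiathome_7.01_x86_64-pc-linux-gnu"], "some_wu_file")
# run_for(["./setiathome_7.01_x86_64-pc-linux-gnu", "-verbose"], d, 300)
# print((d / "stderr.txt").read_text())
```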


(Part of) the results for me (run on 01.08.2013).
(The first test shown hangs and the second finishes, but both have strange big/negative numbers.)
16:23:58 (812): Can't set up shared mem: -1. Will run in standalone mode.
Restarted at 50.94 percent.
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.011967 0.00000  test
             v_vGetPowerSpectrum not supported on CPU
            v_vGetPowerSpectrum2 not supported on CPU
     v_vGetPowerSpectrumUnrolled not supported on CPU
    v_vGetPowerSpectrumUnrolled2 not supported on CPU
           v_avxGetPowerSpectrum not supported on CPU
              v_GetPowerSpectrum 0.011967 0.00000  choice

                     v_ChirpData 0.640901 0.00000  test
                   fpu_ChirpData 573481538.838350 0.00000  test
16:28:36 (1380): Can't set up shared mem: -1. Will run in standalone mode.
Restarted at 50.94 percent.
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.010331 0.00000  test
             v_vGetPowerSpectrum not supported on CPU
            v_vGetPowerSpectrum2 not supported on CPU
     v_vGetPowerSpectrumUnrolled not supported on CPU
    v_vGetPowerSpectrumUnrolled2 not supported on CPU
           v_avxGetPowerSpectrum not supported on CPU
              v_GetPowerSpectrum 0.010331 0.00000  choice

                     v_ChirpData 67253406.327791 0.00000  test
                   fpu_ChirpData -22044769989.102539 0.00000  test
               fpu_opt_ChirpData 0.175478 0.00000  test
             v_vChirpData_x86_64 not supported on CPU
               sse1_ChirpData_ak not supported on CPU
             sse1_ChirpData_ak8e not supported on CPU
             sse1_ChirpData_ak8h not supported on CPU
               sse2_ChirpData_ak not supported on CPU
              sse2_ChirpData_ak8 not supported on CPU
               sse3_ChirpData_ak not supported on CPU
              sse3_ChirpData_ak8 not supported on CPU
                 avx_ChirpData_a not supported on CPU
                 avx_ChirpData_b not supported on CPU
                 avx_ChirpData_c not supported on CPU
                 avx_ChirpData_d not supported on CPU
                   fpu_ChirpData -22044769989.102539 0.00000  choice

                     v_Transpose 3.278891 0.00000  test
                    v_Transpose2 140111.578261 0.00000  test
                    v_Transpose4 68867348302.981277 0.00000  test
                    v_Transpose8 -2.147919 0.00000  test
                  v_pfTranspose2 not supported on CPU
                  v_pfTranspose4 not supported on CPU
                  v_pfTranspose8 not supported on CPU
                   v_vTranspose4 not supported on CPU
                 v_vTranspose4np not supported on CPU
                v_vTranspose4ntw not supported on CPU
              v_vTranspose4x8ntw not supported on CPU
             v_vTranspose4x16ntw not supported on CPU
            v_vpfTranspose8x4ntw not supported on CPU
            v_avxTranspose4x8ntw not supported on CPU
           v_avxTranspose4x16ntw not supported on CPU
            v_avxTranspose8x4ntw not supported on CPU
          v_avxTranspose8x8ntw_a not supported on CPU
          v_avxTranspose8x8ntw_b not supported on CPU
                    v_Transpose8 -2.147919 0.00000  choice

                 FPU opt folding 0.075586 0.00000  test
                 ben SSE folding not supported on CPU
                  AK SSE folding not supported on CPU
                  BH SSE folding not supported on CPU
                JS AVX_a folding not supported on CPU
                JS AVX_c folding not supported on CPU
                 FPU opt folding 0.075586 0.00000  choice

                   Test duration    75.50 seconds
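The choices in these logs are consistent with the app picking the tested function with the lowest measured time. If that is the rule, a bogus negative time always wins, as fpu_ChirpData and v_Transpose8 do above. The Python sketch below is not the app's code: it parses -verbose lines in the format shown, flags implausible timings, and reproduces that assumed selection rule.

```python
import re

# A function line from a -verbose run, e.g.
#   "                   fpu_ChirpData -22044769989.102539 0.00000  test"
TIMING_RE = re.compile(r"^\s*(.+?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(test|choice)\s*$")


def parse_timings(log_text):
    """(name, timing, kind) tuples, kind being "test" or "choice"."""
    rows = []
    for line in log_text.splitlines():
        m = TIMING_RE.match(line)
        if m:
            rows.append((m.group(1), float(m.group(2)), m.group(4)))
    return rows


def implausible(rows, max_seconds=60.0):
    """Tested functions whose timing is negative or larger than max_seconds.

    max_seconds is an arbitrary cap; the normal-looking timings in this
    thread are all under one second.
    """
    return [name for name, t, kind in rows if kind == "test" and (t < 0 or t > max_seconds)]


def pick_fastest(rows):
    """Name of the tested function with the lowest timing (a negative value always wins)."""
    return min((t, name) for name, t, kind in rows if kind == "test")[1]
```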



*****

Normal 'Optimal function choice' test - no big numbers:
20:11:09 (-1710395): Can't set up shared mem: -1. Will run in standalone mode.
Restarted at 71.81 percent.
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.010295 0.00000  test
             v_vGetPowerSpectrum not supported on CPU
            v_vGetPowerSpectrum2 not supported on CPU
     v_vGetPowerSpectrumUnrolled not supported on CPU
    v_vGetPowerSpectrumUnrolled2 not supported on CPU
           v_avxGetPowerSpectrum not supported on CPU
              v_GetPowerSpectrum 0.010295 0.00000  choice

                     v_ChirpData 0.560746 0.00000  test
                   fpu_ChirpData 0.594549 0.00000  test
               fpu_opt_ChirpData 0.506013 0.00000  test
             v_vChirpData_x86_64 not supported on CPU
               sse1_ChirpData_ak not supported on CPU
             sse1_ChirpData_ak8e not supported on CPU
             sse1_ChirpData_ak8h not supported on CPU
               sse2_ChirpData_ak not supported on CPU
              sse2_ChirpData_ak8 not supported on CPU
               sse3_ChirpData_ak not supported on CPU
              sse3_ChirpData_ak8 not supported on CPU
                 avx_ChirpData_a not supported on CPU
                 avx_ChirpData_b not supported on CPU
                 avx_ChirpData_c not supported on CPU
                 avx_ChirpData_d not supported on CPU
               fpu_opt_ChirpData 0.506013 0.00000  choice

                     v_Transpose 0.798243 0.00000  test
                    v_Transpose2 0.420133 0.00000  test
                    v_Transpose4 0.214401 0.00000  test
                    v_Transpose8 0.361900 0.00000  test
                  v_pfTranspose2 not supported on CPU
                  v_pfTranspose4 not supported on CPU
                  v_pfTranspose8 not supported on CPU
                   v_vTranspose4 not supported on CPU
                 v_vTranspose4np not supported on CPU
                v_vTranspose4ntw not supported on CPU
              v_vTranspose4x8ntw not supported on CPU
             v_vTranspose4x16ntw not supported on CPU
            v_vpfTranspose8x4ntw not supported on CPU
            v_avxTranspose4x8ntw not supported on CPU
           v_avxTranspose4x16ntw not supported on CPU
            v_avxTranspose8x4ntw not supported on CPU
          v_avxTranspose8x8ntw_a not supported on CPU
          v_avxTranspose8x8ntw_b not supported on CPU
                    v_Transpose4 0.214401 0.00000  choice

                 FPU opt folding 0.084228 0.00000  test
                 ben SSE folding not supported on CPU
                  AK SSE folding not supported on CPU
                  BH SSE folding not supported on CPU
                JS AVX_a folding not supported on CPU
                JS AVX_c folding not supported on CPU
                 FPU opt folding 0.084228 0.00000  choice

                   Test duration    50.49 seconds



*****

'Optimal function choice' test on another CPU, which does not hang (AMD Athlon II X3 455 + Windows XP):
16:26:24 (476): Can't set up shared mem: -1. Will run in standalone mode.
Restarted at 18.06 percent.
Optimal function choices:
--------------------------------------------------------
                            name   timing   error
--------------------------------------------------------
                v_BaseLineSmooth (no other)

              v_GetPowerSpectrum 0.000382 0.00000  test
             v_vGetPowerSpectrum 0.000320 0.00000  test
            v_vGetPowerSpectrum2 0.000358 0.00000  test
     v_vGetPowerSpectrumUnrolled 0.000324 0.00000  test
    v_vGetPowerSpectrumUnrolled2 0.000340 0.00000  test
           v_avxGetPowerSpectrum not supported on CPU
             v_vGetPowerSpectrum 0.000320 0.00000  choice

                     v_ChirpData 0.012787 0.00000  test
                   fpu_ChirpData 0.016894 0.00000  test
               fpu_opt_ChirpData 0.012891 0.00000  test
             v_vChirpData_x86_64 0.060894 0.00000  test
               sse1_ChirpData_ak 0.010138 0.00000  test
             sse1_ChirpData_ak8e 0.008654 0.00000  test
             sse1_ChirpData_ak8h 0.009152 0.00000  test
               sse2_ChirpData_ak 0.009978 0.00000  test
              sse2_ChirpData_ak8 0.006273 0.00000  test
               sse3_ChirpData_ak 0.009179 0.00000  test
              sse3_ChirpData_ak8 0.006062 0.00000  test
                 avx_ChirpData_a not supported on CPU
                 avx_ChirpData_b not supported on CPU
                 avx_ChirpData_c not supported on CPU
                 avx_ChirpData_d not supported on CPU
              sse3_ChirpData_ak8 0.006062 0.00000  choice

                     v_Transpose 0.021419 0.00000  test
                    v_Transpose2 0.011473 0.00000  test
                    v_Transpose4 0.008364 0.00000  test
                    v_Transpose8 0.014047 0.00000  test
                  v_pfTranspose2 0.011766 0.00000  test
                  v_pfTranspose4 0.007041 0.00000  test
                  v_pfTranspose8 0.011649 0.00000  test
                   v_vTranspose4 0.006263 0.00000  test
                 v_vTranspose4np 0.006060 0.00000  test
                v_vTranspose4ntw 0.005209 0.00000  test
              v_vTranspose4x8ntw 0.003133 0.00000  test
             v_vTranspose4x16ntw 0.002851 0.00000  test
            v_vpfTranspose8x4ntw 0.005198 0.00000  test
            v_avxTranspose4x8ntw not supported on CPU
           v_avxTranspose4x16ntw not supported on CPU
            v_avxTranspose8x4ntw not supported on CPU
          v_avxTranspose8x8ntw_a not supported on CPU
          v_avxTranspose8x8ntw_b not supported on CPU
             v_vTranspose4x16ntw 0.002851 0.00000  choice

                 FPU opt folding 0.003475 0.00000  test
                 ben SSE folding 0.001164 0.00000  test
                  AK SSE folding 0.001046 0.00000  test
                  BH SSE folding 0.001043 0.00000  test
                JS AVX_a folding not supported on CPU
                JS AVX_c folding not supported on CPU
                  BH SSE folding 0.001043 0.00000  choice

                   Test duration     4.77 seconds

 


Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1576497 - Posted: 23 Sep 2014, 11:11:21 UTC - in response to Message 1576490.  

Source code is available but I don't know the proper link

http://setiathome.berkeley.edu/sah_porting.php
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1576500 - Posted: 23 Sep 2014, 11:24:13 UTC - in response to Message 1576490.  

for Windows those are:


Hum:

Yes, I'm running Linux. I restarted Boinc and the work unit, and the progress dropped to 0%. :-(

At least it now has a Remaining estimate (rather longer than usual, about 3.5 hours). Looks good so far.

It sounds as though this is a known problem, then?


Claggy
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1576502 - Posted: 23 Sep 2014, 11:28:11 UTC - in response to Message 1576500.  

The principle is exactly the same for Linux, including the command line switch and the workunit rename. Only the executable name will be different, and it won't need a DLL file.
Claggy
Volunteer tester
Joined: 5 Jul 99
Posts: 4654
Credit: 47,537,079
RAC: 4
United Kingdom
Message 1576515 - Posted: 23 Sep 2014, 12:18:24 UTC - in response to Message 1576502.  

Although it might be easier to do it in a Bench.

I've set up a KWSN-Bench-Linux-MBv7_v2.01.08 Bench for that purpose, with the setiathome_7.01_x86_64-pc-linux-gnu and setiathome_7.01_i686-pc-linux-gnu apps and the Seti v7 wisgen WU.
Just download and extract the Bench program, and run the 'benchmark' file in a terminal; it should only take 30 seconds or so.
Afterwards, navigate to the testData directory. The two text files you'll want to look at are ref-stderr.setiathome_7.01_x86_64-pc-linux-gnu._WisGenA.wu.txt and stderr.setiathome_7.01_i686-pc-linux-gnu._WisGenA.wu.txt.

My OneDrive
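For comparing the two Bench output files, here is a small Python sketch (reading the files is left to the caller) that lists the function choices each build reports and where they differ. It assumes the 'name timing error' line format seen in this thread, with or without the trailing 'choice' of a -verbose run. A file that stops early, for example because the test hung, shows up as missing entries (None).

```python
import re
from itertools import zip_longest

# "name timing error" lines, optionally ending in "choice"; "test" lines do not match.
CHOICE_RE = re.compile(r"^\s*(.+?)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)(?:\s+choice)?\s*$")


def choices(stderr_text):
    """Function choices reported in a stderr file, in the order they appear."""
    return [m.group(1) for m in map(CHOICE_RE.match, stderr_text.splitlines()) if m]


def compare_choices(text_a, text_b):
    """(position, choice_a, choice_b) wherever the two files disagree."""
    a, b = choices(text_a), choices(text_b)
    return [(i, x, y) for i, (x, y) in enumerate(zip_longest(a, b)) if x != y]


# Example (file names as in the Bench testData directory):
# from pathlib import Path
# ref = Path("testData/ref-stderr.setiathome_7.01_x86_64-pc-linux-gnu._WisGenA.wu.txt").read_text()
# alt = Path("testData/stderr.setiathome_7.01_i686-pc-linux-gnu._WisGenA.wu.txt").read_text()
# print(compare_choices(ref, alt))
```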

Claggy
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1576551 - Posted: 23 Sep 2014, 14:02:28 UTC - in response to Message 1576500.  

for Windows those are:

Hum:
Yes, I'm running Linux ...

I know he is running Linux, and I know he knows the app filenames.
(I just didn't look them up while writing the post; I described how I did the test, expecting him to find the proper filenames.)

From his posts, the app filenames must be:
setiathome_7.01_i686-pc-linux-gnu
setiathome_7.01_x86_64-pc-linux-gnu
 


Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1576847 - Posted: 24 Sep 2014, 6:00:19 UTC - in response to Message 1576490.  

BilBg wrote:
...
Do an Offline Test (outside of BOINC):

- Make an empty directory
- Copy in it all the app files, for Windows those are:
setiathome_7.00_windows_intelx86.exe
libfftw3f-3-3_upx.dll

- Copy some WU file and rename it to work_unit.sah

- Stop BOINC and run the app with -verbose switch:
setiathome_7.00_windows_intelx86.exe -verbose

Now look what is written in stderr.txt (new lines have to appear every few seconds)
After a few minutes kill the app process

(repeat run with -verbose as many times as you like)


(Part of) The results for me (run on 01.08.2013)
(first posted test show hang, second finish, but both have strange big/negative numbers)
...

It's a good test method, IMO probably the best that can be done without building special code. I'll think about something better which might help, but won't have time to actually do anything until next week at the earliest.

The timing values are based on the QueryPerformanceCounter function on Windows 2000 and later. If that isn't available, there's a fallback using GetSystemTimeAsFileTime, which is less precise. Windows' implementation of the QueryPerformanceCounter function must, of course, be specific to the hardware; such details are worked out cooperatively between Microsoft and the CPU manufacturers. Whether the flaw is in the implementation for some chips or in the S@H hires_timer.cpp usage of the function isn't clear.

The big positive or negative values could obviously be handled better, wherever they come from. Some kind of sanity check could be added, perhaps retrying a test if it produces unbelievable times. Really figuring out the cause and eliminating it would be much better, of course.
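The sanity-check idea, sketched in Python rather than in the app's C++. It is not the S@H hires_timer.cpp code: it times with Python's time.perf_counter (a monotonic high-resolution clock) and retries a test whose result is negative or above an arbitrary cap. The cap, the retry count and the None fallback are illustrative assumptions.

```python
import time


def time_call(fn, *args):
    """Time one call of fn with a monotonic high-resolution clock; returns seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def checked_timing(fn, *args, max_seconds=10.0, retries=3, timer=time_call):
    """Re-run a timing test whose result is unbelievable (negative or above max_seconds).

    Returns the first plausible timing, or None if every attempt was implausible,
    so the caller can fall back to a default function instead of trusting garbage.
    """
    for _ in range(retries):
        t = timer(fn, *args)
        if 0.0 <= t <= max_seconds:
            return t
    return None
```

The timer parameter exists so that a misbehaving clock can be simulated when testing the retry logic.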
                                                                  Joe
Graeme Hewson
Joined: 14 Jun 99
Posts: 19
Credit: 242,802
RAC: 0
United Kingdom
Message 1576863 - Posted: 24 Sep 2014, 6:28:03 UTC - in response to Message 1576847.  

The timing values are based on the QueryPerformanceCounter function on Windows 2000 and later. If that isn't available, there's a fallback using GetSystemTimeAsFileTime which is less precise.

How about under Linux?
Profile BilBg
Volunteer tester
Joined: 27 May 07
Posts: 3720
Credit: 9,385,827
RAC: 0
Bulgaria
Message 1576875 - Posted: 24 Sep 2014, 6:50:24 UTC - in response to Message 1576863.  
Last modified: 24 Sep 2014, 6:52:26 UTC

The timing values are based on the QueryPerformanceCounter function on Windows 2000 and later. If that isn't available, there's a fallback using GetSystemTimeAsFileTime which is less precise.

How about under Linux?

Please try the test and report back, if you want to help find and fix the issue (Josef W. Segur is one of the main programmers).

The test will show (for your system):
- how often the hang happens
- whether you also see big positive/negative numbers
- whether the hang happens more often at certain points (e.g. while testing a particular function, or after big positive/negative numbers)
- whether the hang happens with both Linux apps (32- and 64-bit)

Run each app (32- and 64-bit) at least 10 times to get enough statistics.
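To collect those statistics, you could note for each offline run which app it was and, if it hung, the last function line logged before the hang. Here is a minimal Python sketch for tallying such notes. The (app, last_function_or_None) record format is a hypothetical convention, not app output, and with only 10 runs per app the rates are rough.

```python
from collections import Counter


def summarize_runs(runs):
    """Hang rate per app and where the hangs happened.

    runs: list of (app, last_function) pairs, where last_function is None for a
    run that completed the 'Optimal function choices' test, or the name of the
    last function line logged before the run hung (hypothetical convention).
    """
    totals = Counter(app for app, _ in runs)
    hangs = Counter(app for app, last in runs if last is not None)
    where = Counter(last for _, last in runs if last is not None)
    rates = {app: hangs[app] / totals[app] for app in totals}
    return rates, where


# Example: my_runs is a hypothetical list of results noted by hand from stderr.txt.
# rates, where = summarize_runs(my_runs)
```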
 
 


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.