Crunching appears to stop
Author | Message |
---|---|
Richard Jablonski Joined: 16 Sep 10 Posts: 9 Credit: 2,322,803 RAC: 0 |
I am having a problem with work units starting and then, after days, it is still at the same time and percentage done. When I look at the graphics, it shows "choosing optimal functions". This has been going on for about a month now. Some work units get past that point and some do not. After about a week of this going on, I normally just abort the stuck unit and get new work units. I would rather have a fix than abort the work units. |
Mike Joined: 17 Feb 01 Posts: 34258 Credit: 79,922,639 RAC: 80 |
I see you are running an FX 8350. Did you check your temps? Are you running on all 8 cores? Free up at least 2 CPU cores. With each crime and every kindness we birth our future. |
Sutaru Tsureku Joined: 6 Apr 07 Posts: 7105 Credit: 147,663,825 RAC: 5 |
I saw that you are running the stock project applications. AFAIK, the stock CPU applications 'like' to 'sleep' here and there on AMD CPUs. Maybe it would help to install the optimised applications with the help of the Lunatics Installer ... Message boards : Number crunching : Optimised Applications and Other Binaries (sticky thread at the top of the NC forum) |
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"I am having problem with work units starting and then after days it is still at the same time and percentage done." Read here: http://setiathome.berkeley.edu/forum_thread.php?id=75617&postid=1571978#1571978 This is a long-standing problem with the 'Optimal function choices' test on AMD CPUs: sometimes the test hangs, sometimes not. On some systems it hangs a lot, on others never or rarely. (Long-standing = years, so don't expect a fix of the stock apps soon, as "Someone have to figure out why it happens on some systems before a fix is proposed".) The only real cure is to use apps which don't do 'Optimal function choices' (and are faster): Optimised Applications - Installer v0.43a http://setiathome.berkeley.edu/forum_thread.php?id=71867&postid=1596404#1596404 "After about a week of this going ... I would rather have a fix than abort the work units." There is no need to wait more than a few minutes: on your CPU 'choosing optimal functions' should finish in < 30 seconds. And there is no need to abort: just Suspend/Restart the task. (If you start using Optimised Applications there will be no need for Suspend/Restart, as they will never hang.) But if you stay on the stock/standard apps, you may need to Suspend/Restart several times (it is unclear when the test passes and when it hangs). The hang doesn't depend on the task, it depends on the (stock) app, so it may happen for any CPU task. This test is done before the app even looks at the task data. Once the app gets past 'Optimal function choices' it will not hang any more during computing (unless the task is restarted again: any restart, e.g. from 77% or from any other %, runs 'Optimal function choices' again). Go for Optimised Applications and you are 'cured'. - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Claggy Joined: 5 Jul 99 Posts: 4654 Credit: 47,537,079 RAC: 4 |
"I am having problem with work units starting and then after days it is still at the same time and percentage done." It's not just AMD CPUs, I see this on my C2D T5500 running Ubuntu 14.04, I've tried compiling my own apps, no change, I've reported about it at Lunatics with some suggestions, don't think anyone was interested. Check out some of the results at Beta: All tasks for computer 73174
<core_client_version>7.5.0</core_client_version>
<![CDATA[
<stderr_txt>
setiathome_v7 7.28 Revision: 2834 g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
libboinc: BOINC 7.5.0
Work Unit Info:
...............
WU true angle range is : 0.011416
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.000654 0.00000 test
v_vGetPowerSpectrum 0.000554 0.00000 test
v_vGetPowerSpectrum2 0.000812 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000320 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000765 0.00000 test
v_avxGetPowerSpectrum faulted
v_vGetPowerSpectrumUnrolled 0.000320 0.00000 choice
v_ChirpData 0.023827 0.00000 test
fpu_ChirpData 0.055913 0.00000 test
fpu_opt_ChirpData 0.053241 0.00000 test
v_vChirpData_x86_64 0.608812 0.01993 test
sse1_ChirpData_ak 8215593.620413 0.00000 test
setiathome_v7 7.28 Revision: 2834 g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
libboinc: BOINC 7.5.0
Work Unit Info:
...............
WU true angle range is : 0.011416
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
setiathome_v7 7.28 Revision: 2834 g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
libboinc: BOINC 7.5.0
Work Unit Info:
...............
WU true angle range is : 0.011416
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.000393 0.00000 test
v_vGetPowerSpectrum 0.000295 0.00000 test
v_vGetPowerSpectrum2 0.000411 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000326 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000316 0.00000 test
v_avxGetPowerSpectrum faulted
v_vGetPowerSpectrum 0.000295 0.00000 choice
v_ChirpData 0.026576 0.00000 test
fpu_ChirpData 0.025330 0.00000 test
fpu_opt_ChirpData 0.032680 0.00000 test
v_vChirpData_x86_64 0.582770 0.01993 test
sse1_ChirpData_ak 0.018779 0.00000 test
sse1_ChirpData_ak8e 0.016703 0.00000 test
sse1_ChirpData_ak8h 79256438.325382 0.00000 test
setiathome_v7 7.28 Revision: 2834 g++ (Ubuntu 4.8.2-19ubuntu1) 4.8.2
libboinc: BOINC 7.5.0
Work Unit Info:
...............
WU true angle range is : 0.011416
Optimal function choices:
--------------------------------------------------------
name timing error
--------------------------------------------------------
v_BaseLineSmooth (no other)
v_GetPowerSpectrum 0.000330 0.00000 test
v_vGetPowerSpectrum 0.000273 0.00000 test
v_vGetPowerSpectrum2 0.000300 0.00000 test
v_vGetPowerSpectrumUnrolled 0.000268 0.00000 test
v_vGetPowerSpectrumUnrolled2 0.000323 0.00000 test
v_avxGetPowerSpectrum faulted
v_vGetPowerSpectrumUnrolled 0.000268 0.00000 choice
v_ChirpData 0.028666 0.00000 test
fpu_ChirpData 0.021731 0.00000 test
fpu_opt_ChirpData 0.027636 0.00000 test
v_vChirpData_x86_64 19813884.405080 0.01993 test
sse1_ChirpData_ak 138697187.902061 0.00000 test
sse1_ChirpData_ak8e 0.023322 0.00000 test
sse1_ChirpData_ak8h 0.020502 0.00000 test
sse2_ChirpData_ak 0.028234 0.00000 test
sse2_ChirpData_ak8 0.263413 0.00000 test
sse3_ChirpData_ak 0.047039 0.00000 test
sse3_ChirpData_ak8 0.016548 0.00000 test
avx_ChirpData_a faulted
avx_ChirpData_b faulted
avx_ChirpData_c faulted
avx_ChirpData_d faulted
sse3_ChirpData_ak8 0.016548 0.00000 choice
v_Transpose 0.049660 0.00000 test
v_Transpose2 0.025243 0.00000 test
v_Transpose4 0.014373 0.00000 test
v_Transpose8 0.019599 0.00000 test
v_pfTranspose2 0.028664 0.00000 test
v_pfTranspose4 0.015308 0.00000 test
v_pfTranspose8 0.037967 0.00000 test
v_vTranspose4 0.019065 0.00000 test
v_vTranspose4np 0.018374 0.00000 test
v_vTranspose4ntw 0.014543 0.00000 test
v_vTranspose4x8ntw 0.018542 0.00000 test
v_vTranspose4x16ntw 0.011961 0.00000 test
v_vpfTranspose8x4ntw 0.014342 0.00000 test
v_avxTranspose4x8ntw faulted
v_avxTranspose4x16ntw faulted
v_avxTranspose8x4ntw faulted
v_avxTranspose8x8ntw_a faulted
v_avxTranspose8x8ntw_b faulted
v_vTranspose4x16ntw 0.011961 0.00000 choice
FPU opt folding 0.005224 0.00000 test
ben SSE folding 0.005372 0.00000 test
AK SSE folding 0.004457 0.00000 test
BH SSE folding 0.004935 0.00000 test
JS AVX_a folding faulted
JS AVX_c folding faulted
AK SSE folding 0.004457 0.00000 choice
Test duration 23.30 seconds
Flopcounter: 44658297320699.539062
Spike count: 8
Autocorr count: 0
Pulse count: 2
Triplet count: 0
Gaussian count: 0
07:18:01 (30888): called boinc_finish(0)
</stderr_txt>
]]>
Claggy |
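The odd rows in a log like this can be picked out mechanically. The following is only a rough sketch, not part of any SETI@home tool: the function name `suspicious_rows` and the 60-second limit are assumptions made for illustration (the limit is chosen loosely because the one complete run above reports "Test duration 23.30 seconds" for the whole table). The last sample row uses a negative timing of the kind quoted later in the thread.

```python
import re

# One benchmark row looks like: "<name> <timing> <error> <test|choice>".
# Names may contain spaces ("AK SSE folding"), so the name is matched lazily.
ROW = re.compile(
    r"^(?P<name>.+?)\s+(?P<timing>-?\d+\.\d+)\s+(?P<error>\d+\.\d+)\s+(?P<kind>test|choice)$"
)


def suspicious_rows(lines, max_seconds=60.0):
    """Return (name, timing) for rows whose timing is negative or implausibly large.

    max_seconds is a hypothetical threshold, not a value used by the stock app.
    """
    flagged = []
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:
            continue  # headers, "faulted" rows, counters, etc.
        timing = float(match.group("timing"))
        if timing < 0 or timing > max_seconds:
            flagged.append((match.group("name"), timing))
    return flagged


sample = [
    "v_ChirpData 0.023827 0.00000 test",
    "sse1_ChirpData_ak 8215593.620413 0.00000 test",
    "v_avxGetPowerSpectrum faulted",
    "AK SSE folding 0.004457 0.00000 choice",
    "sse1_ChirpData_ak8e -0.000188 0.00000 test",
]
print(suspicious_rows(sample))
```

On the sample, only the two rows with the huge and the negative timing are reported; "faulted" rows are skipped because they carry no timing at all.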
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
"sse1_ChirpData_ak 8215593.620413 0.00000 test ... sse1_ChirpData_ak8h 79256438.325382 0.00000 test ... v_vChirpData_x86_64 19813884.405080 0.01993 test sse1_ChirpData_ak 138697187.902061 0.00000 test" You have strange big numbers like the ones in my tests: http://setiathome.berkeley.edu/forum_thread.php?id=75617&postid=1576490#1576490 Joe may be interested in seeing more tests (e.g. whether you also see strange big negative numbers): http://setiathome.berkeley.edu/forum_thread.php?id=75617&postid=1576847#1576847 My guess is that the "big negative numbers" are in fact "too big positive numbers": http://en.wikipedia.org/wiki/2147483647#In_computing OK, here is an example of "big negative numbers" from your results: http://setiweb.ssl.berkeley.edu/beta/result.php?resultid=18517750 It results in: "fpu_opt_ChirpData -79751928.307712 0.00000 choice" There is also a "very small negative number", which has to mean the counter was just 'a bit' over 2147483647: sse1_ChirpData_ak8e -0.000188 0.00000 test I also see some warnings about TSC/RDTSC in the 'Use' section here: http://en.wikipedia.org/wiki/Time_Stamp_Counter - ALF - "Find out what you don't do well ..... then don't do it!" :) |
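The guess that "too big positive numbers" show up as negative ones can be illustrated with plain integer arithmetic. This is a minimal sketch of how a tick count behaves if it is stored in a signed 32-bit integer; whether the stock benchmark code actually does that is not established in this thread. The helper name `as_int32` and the tick values are made up for illustration, and the logged timings are in seconds rather than raw ticks.

```python
INT32_MAX = 2_147_483_647  # largest value a signed 32-bit integer can hold


def as_int32(ticks: int) -> int:
    """Reinterpret the low 32 bits of ticks as a signed two's-complement value."""
    low = ticks & 0xFFFFFFFF
    return low - 0x1_0000_0000 if low >= 0x8000_0000 else low


print(as_int32(INT32_MAX))       # still positive: 2147483647
print(as_int32(INT32_MAX + 1))   # one tick more wraps to -2147483648
print(as_int32(2**32 - 188))     # just short of a full 32-bit wrap: -188
```

In this model a count just past 2147483647 comes out as a large negative number, while a small negative one corresponds to a count just short of a full 2^32 wrap, or to an end timestamp that is slightly smaller than the start timestamp.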
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"It's not just AMD CPUs, I see this on my C2D T5500 running Ubuntu 14.04, I've tried compiling my own apps, no change, I've reported about it at Lunatics with some suggestions, don't think anyone was interested." Part of a long list of ToDos for my own builds (both CPU and GPU, of various types) has for a long time been some C++ class-based inheritance for the key processing functions. When I mentioned that I was going to enable builds to use various CPU FFTs, Cuda and OpenCL with internal dispatch, Eric did express interest in having a more 'pluggable' implementation for the FFTs at least (which currently are not benched), and said he would appreciate it if I could put the same facilities into main. As selection of those depends on hardware, libraries, accuracy and performance, dispatch there has to be a bit more flexible and generic than the existing mechanism, so I started on the class hierarchy to include the other processing functions as well. Since the build system shift, Cuda7 and various Boinc issues have taken precedence, for stock that has sat as a bare/unpopulated file/class structure I committed quite a while back, in a folder under stock v7. That will probably recommence at my end as soon as I've mastered the basics of the Gradle build system, and by its nature it requires redoing the benchmark code. [Edit:] While testing Gradle backstage a little while back, we tested some of the precision timers involved in small test pieces, and they appeared to work for a range of devices/purposes without issue, so the bench code will probably receive some of that work in the end. Yep, bits and pieces everywhere to tie together. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
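As a loose illustration of what benchmark-driven 'pluggable' dispatch means, here is a small Python stand-in. It is not the C++ class hierarchy described above and not the stock app's mechanism: the name `choose_fastest`, the dictionary of candidates and the use of exceptions to model a "faulted" implementation are all assumptions for the sketch.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def choose_fastest(
    candidates: Dict[str, Callable[[], object]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[str], Dict[str, Optional[float]]]:
    """Time every candidate once and return (name of fastest, all timings).

    A candidate that raises is recorded with a timing of None, mirroring
    the "faulted" rows of the benchmark table, and is never chosen.
    """
    timings: Dict[str, Optional[float]] = {}
    for name, func in candidates.items():
        try:
            start = clock()
            func()
            timings[name] = clock() - start
        except Exception:
            timings[name] = None
    usable = {n: t for n, t in timings.items() if t is not None}
    choice = min(usable, key=usable.get) if usable else None
    return choice, timings


def unsupported():
    # Hypothetical stand-in for a code path the host cannot run (e.g. AVX).
    raise RuntimeError("instruction set not available")


choice, timings = choose_fastest({
    "generic": lambda: sum(i * i for i in range(20000)),
    "unsupported": unsupported,
})
print(choice, timings)
```

Passing the clock in as a parameter is a deliberate choice for the sketch: the selection is only as trustworthy as the timer behind it, so it helps to be able to swap the timer out or test the selection logic with a fake one.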
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
My 'bad tests' were on Windows 2000 (and didn't happen on the same machine under Windows 98). This may be related: "Programs that use the QueryPerformanceCounter function may perform poorly in Windows Server 2000, in Windows Server 2003, and in Windows XP" https://support.microsoft.com/en-us/kb/895980 - ALF - "Find out what you don't do well ..... then don't do it!" :) |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Yeah, underneath, most bench code uses that function (for the Windows code), which is basically just a CPU timestamp counter call and an appropriate serialising instruction. Some motherboards and Windows versions have issues, and there are some caveats with hyperthreading and such. If the stock bench code is using the RDTSC instruction directly, or that Windows API function (on Windows builds, obviously), there would be some alternative ways to use a lower-resolution counter that is not prone to the issues. When I get to that point, I'll probably test the reliability of the timer used on the host, and use a less accurate means instead of that timestamp [where necessary]. Another possibility is that the timers in the stock variant are overflowing somehow. Since the hardware counters involved should not overflow for ~100 years or so (64 bit IIRC), it's possible that only a portion of the value is used (e.g. if it only uses 32 bits, the number of 'ticks' might only cover a couple of seconds on some hosts, and some benches take longer than that). So plenty of possibilities to check out. I'd be interested in reproducing the issues in a dedicated standalone test piece down the line, allowing alternatives/fixes to be proven, especially since what I'm working towards will be heavily dependent on timers. "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
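Two of the ideas above lend themselves to a short sketch: how quickly a truncated counter wraps, and cross-checking a fine timer against a coarser one before trusting it. Everything here is illustrative. The 3 GHz tick rate, the names `seconds_until_wrap` and `timed`, and the 0.5-second tolerance are assumptions, and on some platforms Python's `time.perf_counter` and `time.monotonic` may be backed by the same source, in which case the cross-check proves little.

```python
import time
from typing import Callable

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_until_wrap(bits: int, ticks_per_second: float) -> float:
    """Seconds a free-running counter of the given width lasts before wrapping."""
    return (2 ** bits) / ticks_per_second


def timed(
    work: Callable[[], object],
    fine_clock: Callable[[], float] = time.perf_counter,
    coarse_clock: Callable[[], float] = time.monotonic,
    tolerance: float = 0.5,
) -> float:
    """Time work() with the fine clock, falling back to the coarse clock.

    The fine reading is rejected if it is negative or disagrees with the
    coarse reading by more than `tolerance` seconds (a hypothetical limit).
    """
    coarse_start, fine_start = coarse_clock(), fine_clock()
    work()
    fine_end, coarse_end = fine_clock(), coarse_clock()
    fine = fine_end - fine_start
    coarse = max(0.0, coarse_end - coarse_start)
    if fine < 0 or abs(fine - coarse) > tolerance:
        return coarse
    return fine


rate = 3.0e9  # hypothetical tick rate of 3 GHz
print(f"32-bit counter wraps after {seconds_until_wrap(32, rate):.2f} s")   # about 1.43 s
print(f"64-bit counter wraps after {seconds_until_wrap(64, rate) / SECONDS_PER_YEAR:.0f} years")  # about 195 years
print(f"measured: {timed(lambda: sum(range(100000))):.6f} s")
```

At the assumed rate, the full 64-bit value lasts on the order of the "~100 years or so" mentioned above, while the low 32 bits alone cover under a second and a half, which is shorter than some of the individual benchmark rows in the logs.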
BilBg Joined: 27 May 07 Posts: 3720 Credit: 9,385,827 RAC: 0 |
I also read this: "Acquiring high-resolution time stamps": https://msdn.microsoft.com/en-us/library/windows/desktop/dn553408%28v=vs.85%29.aspx The funniest part ;) for me was: "How do I determine and validate that QPC works on my machine? You don't need to perform such checks." Sounds like: Stormtrooper: Let me see your identification. Ben Obi-Wan Kenobi: [with a small wave of his hand] You don't need to see his identification. Stormtrooper: We don't need to see his identification. Ben Obi-Wan Kenobi: These aren't the droids you're looking for. http://www.imdb.com/title/tt0076759/quotes They also say: "Do I need to set the thread affinity to a single core to use QPC? No. ..." But the first user response at the end says: "Is the performance counter monotonic (non-decreasing)? Yes Would be nice if the above was true, but it's not. I can't speak to the technical reasons why this happened in my application, but I found that in order to have monotonic behavior, the thread affinity had to be set. Specifically, the counter was monotonic on only a single CPU core, and affinity had to be set to deal with this. As I recall, this was in a Windows XP SP3 virtual machine running in VirtualBox. All four hyper-threaded cores on my Core i7 were available to the guest (i.e. 8 logical CPUs). Maybe the newer versions of Windows work around this; I don't know. Many others have observed this behavior... .NET Stopwatch class (based on QPC): http://stackoverflow.com/questions/1008345/system-diagnostics-stopwatch-returns-negative-numbers-in-elapsed-properties Note it can be proven by decompiling/examining .NET 2.0 to 4.0 sources to see that MSFT themselves added a hack to protect against returning negative durations in their stopwatch class (see my answer in link above). So you can't say "yes" that the QPC function is monotonic when the NETFX programmers themselves coded to protect against that, and lots of devs observe that it's not." - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Richard Jablonski Joined: 16 Sep 10 Posts: 9 Credit: 2,322,803 RAC: 0 |
Thank you, I installed the installer app. |
©2024 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.