Monitoring inconclusive GBT validations and harvesting data for testing

jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1822077 - Posted: 6 Oct 2016, 1:37:00 UTC - in response to Message 1822042.  
Last modified: 6 Oct 2016, 2:05:41 UTC

Petri's own computer running his own code mis-reported the final pulse (Beta message 59697), but he says his follow-up bench test didn't. Nobody has reproduced the failure, so the finger is pointing towards a hardware glitch, thermal event, etc.


Additionally, though the particular comparison failed to reproduce a problem here with my own zi+a build (and the Cuda baseline, of course), there are some indications here on Windows of 'stuff happening'. I updated from the Cuda 8 release-candidate toolkit driver I was using to the latest WHQL, and saw a few tasks enter a weird state, spinning their wheels. I then aborted these (saving a few for later inspection), rebooted, and dialled the GPU back to its default Superclocks, while raising the P2 memory clock as I usually do.

At this point the 'wheel spinning' behaviour has not recurred, and Guppi VLAR times seem to have dropped a further minute, to ~420 seconds (GTX 980 SC), despite the lower clocks and temperatures.

This implies a number of things that potentially warrant further examination as familiarity with the new streaming code behaviour improves:
- There is, or has been, considerable work going on in the latest drivers (likely on all platforms, in support of Cuda 8 and newer-generation hardware... logically).
- There has likely been debug or profiling code in said drivers, or in the driver compiler components.
- The newest round of memory compression and/or synchronisation optimisations may have quirks we don't yet have experience with, introducing sensitivity to aggressive overclocks/heat in new ways (while at the same time solving/hiding prior overhead limitations).

IMO that's a pretty familiar situation from the Fermi introduction, and represents the typical round of instability when so many system hardware, software, and application changes are going on simultaneously. Experience says it takes time and patience to nut that out, and there is the potential for wild goose chases until things settle down.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822112 - Posted: 6 Oct 2016, 5:14:46 UTC - in response to Message 1822042.  

Nobody has reproduced the failure, so the finger is pointing towards a hardware glitch, thermal event, etc.

It would be good if so, because I interpret http://setiweb.ssl.berkeley.edu/beta/forum_thread.php?id=2266&postid=59737 a little differently.
And, of course, anything that involves missing sync points is very hard to reproduce.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1822135 - Posted: 6 Oct 2016, 7:05:41 UTC - in response to Message 1821919.  

Validation inconclusive (164)
Quite an impressive number. Whether it's OK or not is worth checking.

15% Inconclusive. Not good. Actually, very bad IMHO.
Grant
Darwin NT
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1822147 - Posted: 6 Oct 2016, 8:17:58 UTC - in response to Message 1822135.  

Validation inconclusive (164)
Quite an impressive number. Whether it's OK or not is worth checking.

15% Inconclusive. Not good. Actually, very bad IMHO.


Certainly not reflected here on Windows, so it's something that will have to fall out during alpha testing.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
-= Vyper =-
Volunteer tester
Joined: 5 Sep 99
Posts: 1652
Credit: 1,065,191,981
RAC: 2,537
Sweden
Message 1822153 - Posted: 6 Oct 2016, 9:03:29 UTC - in response to Message 1822135.  

Validation inconclusive (164)
Quite an impressive number. Whether it's OK or not is worth checking.

15% Inconclusive. Not good. Actually, very bad IMHO.


Well, in my world it is not that simple, because Petri's code seems to work through workunits more like the picture below. That way, the information sent back, especially for -9 (overflow) units, will almost certainly fall into the inconclusive state.

CPUs and older code seem to work more like pixel by pixel, top-left to top-right, then down a notch to the next row (metaphorically speaking), so it's not crazy at all that the inconclusive rate is higher than normal. What really matters is that the raw data returned is calculated correctly, and it now seems that in 99.9% of cases it is, with the rest binned as invalid!

To show how I believe Petri's code works, I'm including this picture as a theoretical explanation of how it now seems to process data! Only Petri can tell us more here, but if it works exactly this way, it's no wonder the information is presented differently to the validator, and it takes a second/third opinion before a task is ruled out, even if the results do match.

I've written numerous times now that Petri has stated that his offline testbed of WUs all falls in the Q99+ range (strongly similar status), unless he has something from the last few weeks that behaves differently.

[Image: theoretical illustration of how Petri's code appears to traverse a workunit, as described above]
_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group
Grant (SSSF)
Volunteer tester
Joined: 19 Aug 99
Posts: 13736
Credit: 208,696,464
RAC: 304
Australia
Message 1822154 - Posted: 6 Oct 2016, 9:19:35 UTC - in response to Message 1822153.  

What really matters is that the raw data returned is calculated correctly, and it now seems that in 99.9% of cases it is

True.
Unfortunately, the load on the database is also an important factor: the more times a WU has to be crunched, and the longer it waits on validation, the greater the load on the database. And the more machines there are with high levels of inconclusive results, the more significant that load becomes.
Grant
Darwin NT
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1822157 - Posted: 6 Oct 2016, 9:38:05 UTC - in response to Message 1822153.  

I've written numerous times now that Petri has stated that his offline testbed of WUs all falls in the Q99+ range (strongly similar status), unless he has something from the last few weeks that behaves differently.

That raises a number of questions, some of which are going to be pretty tough.

The specific WU which Raistmer brought back to our attention last night wasn't an overflow WU - it was an 'out of tolerance' difference in a reported signal, and as such was properly rejected by the live validator. Petri re-ran the WU offline, and reported back to me by email:

I ran a first test against the 8.04 standard result:
./rescmpv5_l testData/ref-result.setiathome_8.04_i686-pc-linux-gnu.27jl16aa.19977.160094.8.42.65.wu.sah testData/result.axo.27jl16aa.19977.160094.8.42.65.wu.sah
Result : Strongly similar, Q= 99.73%

but he didn't say anything else about the test protocol - whether it was run on the same hardware, whether it was run with exactly the same version of the application, or whether he checked the pulse report values by eye.

So,

1) What diversity of test WUs is Petri basing that "all Q99+" assessment on?

2) Have we actually validated recently that rescmp is calculating Q values the way we think it is?

3) Since we have no way of recovering an uploaded result file after the event, we have no way of knowing whether these sporadics are hardware glitches (as I postulated last night), or buggy code paths which might be followed under some live operating conditions but not when re-run under laboratory conditions.

4) Assuming we have the source code for rescmp available, I was wondering whether it could be adapted to produce a comparator which would accept as input the signal summary format reported through stderr, and generate a crude Q value from that. If it could also highlight individual out-of-tolerance signal reports, that would make these after-the-event inquests much less tedious than my manual 'signal alignment' technique.

And after all that, we still have the overflow case to consider...
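
If it helps to picture (4), here is a minimal sketch of such a comparator. Everything in it is an assumption rather than anything taken from rescmp: the one-signal-per-line "type peak period" input format, the pairing of signals by position, and the 1% tolerance are all invented for illustration.

#include <algorithm>
#include <cmath>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct Signal { std::string type; double peak; double period; };

// Parse one hypothetical summary line per signal: "<type> <peak> <period>".
static std::vector<Signal> load(const char* path) {
    std::vector<Signal> v;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        Signal s;
        if (ss >> s.type >> s.peak >> s.period) v.push_back(s);
    }
    return v;
}

// Relative difference, guarded against division by zero.
static double relDiff(double a, double b) {
    double m = std::max(std::fabs(a), std::fabs(b));
    return m == 0.0 ? 0.0 : std::fabs(a - b) / m;
}

int main(int argc, char** argv) {
    if (argc != 3) { std::cerr << "usage: crudecmp ref.txt res.txt\n"; return 1; }
    std::vector<Signal> ref = load(argv[1]), res = load(argv[2]);
    size_t n = std::min(ref.size(), res.size());
    size_t ok = 0;
    const double TOL = 0.01;  // assumed 1% tolerance, not the validator's
    for (size_t i = 0; i < n; ++i) {
        double d = std::max(relDiff(ref[i].peak, res[i].peak),
                            relDiff(ref[i].period, res[i].period));
        if (d <= TOL) ++ok;
        else std::cout << "out-of-tolerance signal " << i << " ("
                       << ref[i].type << "), reldiff " << d << "\n";
    }
    // Crude Q: percentage of paired signals within tolerance.
    std::cout << "Q(crude) = " << 100.0 * ok / std::max<size_t>(n, 1) << "%\n";
    return 0;
}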
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822160 - Posted: 6 Oct 2016, 10:18:21 UTC - in response to Message 1822157.  


3) Since we have no way of recovering an uploaded result file after the event, we have no way of knowing whether these sporadics are hardware glitches (as I postulated last night), or buggy code paths which might be followed under some live operating conditions but not when re-run under laboratory conditions.

That's what I fear, and I would be glad to see it ruled out.
As I said on beta, such a misreport could (just could - there are other explanations indeed, and a hardware glitch is one of them) arise from missing ordering inside the processing of a single PoT. That ordering is required by the current processing logic; it can't simply be parallelised without an adequate reduction of the results afterwards. And in that sense it's totally different from the overflow issue.
Why is this effect so fragile? If signal reporting depends on the order in which separate workgroups are processed, that order can easily change from run to run (especially taking into account that kernels can be paired inside the device). That's why such an effect would be hard to reproduce.
Without a code review it's hard to say which particular scenario played out in this single case (and a code review is a resource-costly operation), but I think this task should get "test case" status for a while and be checked against different builds on different hardware (in the hope of not getting a broken result again, but still allowing for the possibility).
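
As a toy illustration of that fragility (standard C++, nothing to do with the actual kernels): a floating-point reduction's result depends on the order in which "workgroups" deliver their partial sums, so any signal selection fed by such sums can differ from run to run.

#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    // One large value among many small ones makes the rounding visible.
    std::vector<double> v(1000, 1.0);
    v[0] = 1e16;

    // Ordered left-to-right sum: each lone +1.0 is lost, because 1.0 is
    // no more than half an ulp at magnitude 1e16.
    double ordered = std::accumulate(v.begin(), v.end(), 0.0);

    // "Workgroups" finishing in another order: the small values are
    // reduced first, so their partial sum (999.0) survives.
    double small = 0.0;
    for (size_t i = 1; i < v.size(); ++i) small += v[i];
    double grouped = v[0] + small;

    std::printf("ordered = %.1f\ngrouped = %.1f\ndelta   = %.1f\n",
                ordered, grouped, grouped - ordered);
    return 0;
}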
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822305 - Posted: 6 Oct 2016, 22:13:01 UTC

Returning to the "fp:strict" discussion.
The question is answered by VC++ very straightforwardly:
1>Compiling...
1>cl : Command line error D8016 : '/arch:SSE2' and '/fp:strict' command-line options are incompatible
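
For context on what the flag choice can change: IEEE-754 addition is not associative, and fp:fast permits reassociation that fp:precise and fp:strict forbid. A toy example (not from the SETI codebase):

#include <cstdio>

int main() {
    double a = 1e16, b = -1e16, c = 1.0;
    // Evaluated as written, these differ; under fp:fast the compiler is
    // free to reassociate, so either line could yield either value.
    std::printf("(a + b) + c = %.1f\n", (a + b) + c);  // 1.0
    std::printf("a + (b + c) = %.1f\n", a + (b + c));  // 0.0
    return 0;
}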
SETI apps news
We're not gonna fight them. We're gonna transcend them.
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1822389 - Posted: 7 Oct 2016, 3:41:15 UTC - in response to Message 1822305.  
Last modified: 7 Oct 2016, 3:42:25 UTC

As far as X-branch needing fp:precise over fp:fast for host code goes, the best explanation I've come across is that I switched to the Ooura FFT for baseline smoothing. The reason is that it takes longer for fftw to plan and then do the one-time smooth than it takes for the entire CUDA initialisation to complete. Modern fftw is crafted to produce precise results anyway (with SSE+), while the Ooura FFT is plain standard C, so it's likely sensitive to aggressive compiler optimisations.

Xbranch on Windows uses no SIMD instruction set in the core code (ARCH:SSE on the Boinc libraries), so fp:precise seems to be needed with the native FPU to make Ooura match IEEE-754.

I can likely prove or disprove this at some stage down the line, when CPU use becomes important again for Cuda (which it will). That examination will happen when I get around to plugin-ising the CPU FFT and other components for x42, as they'll become modularised and internally benched for speed and error growth (as the stock CPU build does).
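
A minimal sketch of that bench-and-select idea, assuming nothing about the real x42 plumbing: the candidate "transforms" below are trivial stubs standing in for real FFT plugins, and a real bench would also compare each output against a reference to measure error growth.

#include <chrono>
#include <cstdio>
#include <functional>
#include <vector>

struct Candidate {
    const char* name;
    std::function<void(std::vector<double>&)> run;  // in-place transform stub
};

int main() {
    std::vector<double> data(1 << 16, 1.0);

    // Stand-ins for real plugins (ooura, fftw, ...), invented for the sketch.
    std::vector<Candidate> candidates = {
        {"plugin_a", [](std::vector<double>& d) { for (double& x : d) x *= 2.0; }},
        {"plugin_b", [](std::vector<double>& d) { for (double& x : d) x += x; }},
    };

    const char* best = nullptr;
    double bestMs = 1e300;
    for (auto& c : candidates) {
        std::vector<double> work = data;  // fresh copy per candidate
        auto t0 = std::chrono::steady_clock::now();
        c.run(work);
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        std::printf("%-8s %8.3f ms\n", c.name, ms);
        if (ms < bestMs) { bestMs = ms; best = c.name; }
    }
    std::printf("selected: %s\n", best);
    return 0;
}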
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822441 - Posted: 7 Oct 2016, 9:56:55 UTC

Here are SSE3 x64 fp:fast and fp:precise builds for offline testing:
https://cloud.mail.ru/public/bhHf/RcjBZU8nv
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1822444 - Posted: 7 Oct 2016, 10:12:55 UTC - in response to Message 1822441.  

Here are SSE3 x64 fp:fast and fp:precise builds for offline testing:
https://cloud.mail.ru/public/bhHf/RcjBZU8nv

I can push through some comparison runs if you want. To save time, do we have any idea which ARs are likely to show the greatest effects from cumulative rounding errors?
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1822445 - Posted: 7 Oct 2016, 10:41:07 UTC - in response to Message 1822444.  

Here are SSE3 x64 fp:fast and fp:precise builds for offline testing:
https://cloud.mail.ru/public/bhHf/RcjBZU8nv

I can push through some comparison runs if you want. To save time, do we have any idea which ARs are likely to show the greatest effects from cumulative rounding errors?


Those with mid to low AR pulsefinding, since the high angle ranges have none.

You might want to look whether you still have the ones you sent me for the v8 Gaussian discrepancies in December 2015, since that was the marker for switching Xbranch - bearing in mind the discrepancies are apparently small. IMO you'll likely run into a lot of questions about how the builds were made, which is where how the individual codebases were compiled matters.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822446 - Posted: 7 Oct 2016, 10:51:48 UTC - in response to Message 1822444.  

Well, the reference task showed a discrepancy initially (but it's quite a long one).
I would (actually, will) try quickref for a start, because it reports plenty of different signals. Then the PG set.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822447 - Posted: 7 Oct 2016, 11:03:36 UTC - in response to Message 1822446.  

MB8_win_x86_SSE3_VS2008_r3525_default_fast_math.exe / refquick_v8.wu :
37.966 secs Elapsed
35.771 secs CPU time
R2: .\ref\ref-MB8_win_x64_AVX_VS2010_r3330.exe-refquick_v8.wu.res
Result : Strongly similar, Q= 99.89%

MB8_win_x86_SSE3_VS2008_r3525_fp_precise.exe / refquick_v8.wu :
46.690 secs Elapsed
44.538 secs CPU time
R2: .\ref\ref-MB8_win_x64_AVX_VS2010_r3330.exe-refquick_v8.wu.res
Result : Strongly similar, Q= 99.90%
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822449 - Posted: 7 Oct 2016, 11:16:46 UTC - in response to Message 1822441.  

Here are SSE3 x64 fp:fast and fp:precise builds for offline testing:
https://cloud.mail.ru/public/bhHf/RcjBZU8nv

Correction: they are actually x86; the archive naming is wrong.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1822451 - Posted: 7 Oct 2016, 11:23:40 UTC - in response to Message 1822445.  

Here are SSE3 x64 fp:fast and fp:precise builds for offline testing:
https://cloud.mail.ru/public/bhHf/RcjBZU8nv

I can push through some comparison runs if you want. To save time, do we have any idea which ARs are likely to show the greatest effects from cumulative rounding errors?


Those with mid to low AR pulsefinding, since the high angle ranges have none.

You might want to look whether you still have the ones you sent me for the v8 Gaussian discrepancies in December 2015, since that was the marker for switching Xbranch - bearing in mind the discrepancies are apparently small. IMO you'll likely run into a lot of questions about how the builds were made, which is where how the individual codebases were compiled matters.

OK, I've found these in a viable benching machine:

ref-setiathome_8.04_windows_intelx86.exe-FG00091_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG00134_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG01307_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG02968_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG03853_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG04160_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG04221_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG04317_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG04465_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG09362_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG11753_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG13462_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG24857_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG53024_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-FG76516_v8.wu.res
ref-setiathome_8.04_windows_intelx86.exe-PG0009_v7.wu.res
ref-setiathome_8.04_windows_intelx86.exe-PG0395_v7.wu.res
ref-setiathome_8.04_windows_intelx86.exe-PG0444_v7.wu.res
ref-setiathome_8.04_windows_intelx86.exe-PG1327_v7.wu.res
ref-setiathome_8.04_windows_intelx86.exe-reference_work_unit_r3215.wu.res

The PG_v7 and reference results are dated 30 December 2015, so they were done on the old Q6600 CPU; the FG_v8 results are dated 24-27 January 2016, so probably done on the replacement i5-4690. Anyone concerned about differences in SSE3 implementation between those two generations? (And come to that, the Q6600 ran under Windows XP, while the i5 runs under Windows 7. Both probably had BIOS microcode valid at their date of manufacture - 2007 and 2016 respectively.)

I'll fire off a few FG_v8 while you think about that.
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 1822453 - Posted: 7 Oct 2016, 11:42:02 UTC - in response to Message 1822451.  
Last modified: 7 Oct 2016, 11:51:24 UTC

Anyone concerned about differences in SSE3 implementation between those two generations?

I would say NO.
What matters: strong validation and reasonable time.
From quickref: validation is reasonably strong in both (we actually see differences between AVX and SSE3 there); the time for fp:precise is unacceptable.

From Core2Quad:
Running app : MB8_win_x86_SSE3_VS2008_r3330.exe -verb -nog
with WU : PG0009_v8.wu
Result : stored as ref for validations.
508.403 secs Elapsed
507.643 secs CPU time

Running app : MB8_win_x86_SSE3_VS2008_r3525_default_fast_math.exe
with WU : PG0009_v8.wu
491.750 secs Elapsed
490.701 secs CPU time
R2: .\ref\ref-MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 99.65%
R2: .\ref\ref-MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 99.65%
R2: .\ref\ref-MB8_win_x86_SSE3_VS2008_r3299.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 100.0%
R2: .\ref\ref-MB8_win_x86_SSE3_VS2008_r3330.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 100.0%

Running app : MB8_win_x86_SSE3_VS2008_r3525_fp_precise.exe
with WU : PG0009_v8.wu
508.886 secs Elapsed
507.643 secs CPU time
R2: .\ref\ref-MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3330.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 99.66%
R2: .\ref\ref-MB8_win_x86_SSE2_OpenCL_ATi_HD5_r3430.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 99.66%
R2: .\ref\ref-MB8_win_x86_SSE3_VS2008_r3299.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 99.98%
R2: .\ref\ref-MB8_win_x86_SSE3_VS2008_r3330.exe-PG0009_v8.wu.res
Result : Strongly similar, Q= 99.98%

So, precision changed indeed. But all results are still in strong coincidence. fp:precise is slower, but not by much on PG0009.
SETI apps news
We're not gonna fight them. We're gonna transcend them.
Mike Special Project $75 donor
Volunteer tester
Joined: 17 Feb 01
Posts: 34258
Credit: 79,922,639
RAC: 80
Germany
Message 1822456 - Posted: 7 Oct 2016, 11:50:24 UTC

The specific WU which Raistmer brought back to our attention last night wasn't an overflow WU - it was an 'out of tolerance' difference in a reported signal, and as such was properly rejected by the live validator. Petri re-ran the WU offline, and reported back to me by email:

I ran a first test against the 8.04 standard result:
./rescmpv5_l testData/ref-result.setiathome_8.04_i686-pc-linux-gnu.27jl16aa.19977.160094.8.42.65.wu.sah testData/result.axo.27jl16aa.19977.160094.8.42.65.wu.sah
Result : Strongly similar, Q= 99.73%


To be honest, all the builds I have tested over the last 3 years which had a Q lower than 99.9x% had some bugs.
So precision first, IMHO.


With each crime and every kindness we birth our future.
Richard Haselgrove Project Donor
Volunteer tester
Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1822457 - Posted: 7 Oct 2016, 12:04:07 UTC - in response to Message 1822453.  

So, precision changed indeed. But all results are still in strong coincidence. fp:precise is slower, but not by much on PG0009.

Thinking of the scientific method, I'm just a tiny bit nervous about using previous optimised apps as references in a test like this: when the precision changes the result, we don't know for sure whether it gets closer to or further away from the project's defined gold standard.