Monitoring inconclusive GBT validations and harvesting data for testing

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820474 - Posted: 29 Sep 2016, 10:00:27 UTC - in response to Message 1820465.
> Now if we go back to the idea of fp:strict (IEEE754) vs anything else, double precision etc.: do any of you have an idea of the speed penalty of going strict instead of precise? If going to fp:strict single precision (more apparently isn't needed) is a few percent slower, then so be it for the sake of conformity! But if it is half the speed, then no, that is not the route to go "for now"; the focus should instead be on the validator/re-order-of-work-reported issue that seems to be apparent.
There is no connection between fp:strict, or whatever precision switch is used, and reporting a subset of results on overflow. This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks.
SETI apps news We're not gonna fight them. We're gonna transcend them.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820475 - Posted: 29 Sep 2016, 10:06:17 UTC - in response to Message 1820470.
> Well what can I say, the opposite was true for Cuda builds, and I guess the host code is different. It was Richard that brought flaky Gaussians to my attention, host fp:precise fixed them against 8.00, and no repeatable dissimilarity to 8.00 CPU has been reported to me since.
Yes, host code is different indeed. So, I'll reformulate regarding fp:precise: it's not a universal solution for precision-related issues. And of course, it's not a solution at all for the different-ordering issue on overflows (as already stated in this thread).

-= Vyper =- Volunteer tester Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1820477 - Posted: 29 Sep 2016, 10:11:18 UTC - in response to Message 1820474. Last modified: 29 Sep 2016, 10:13:21 UTC
> There is no connection between fp:strict, or whatever precision switch is used, and reporting a subset of results on overflow.
Aren't you all using fp:precise (single)? I just wanted to ask what happens if the app is compiled and tested with fp:strict (single) instead! What is the speed penalty of going from precise (single) to strict (single)?
> This was mentioned in this thread before. Increasing precision is not a solution for overflow tasks.
I'm not talking about solving overflow tasks; that was not the purpose. (This was an off-topic question that popped up in my mind.) The purpose in my mind was an overall platform standard that should follow IEEE754 regardless of CPU, x32, x64, ARM or GPU. When calculated and fixed correctly, the outcome would be as close to Q100 as it possibly can be, resulting in less headbanging for all of you optimisers in the future. The idea behind me telling you to test in that direction is mainly to let you all switch more to code optimising instead of bug-hunting various platforms until hell freezes over. It will only increase, as I say, not decrease. Until you know for sure that it won't work, I will continue to push for this unification, provided it isn't much slower than using precise. When numbers have been presented here as a comparison, we will know 100% whether this is worth it or not. But if going fp:strict is, for example, 3% slower while Q is increased to the Q99.99 - Q100 range, then if I were a project manager I would vouch for going that route now instead of banging heads for more months/years to come, chasing annoying rounding bugs and result disparities.
Addicted to SETI crunching! Founder of GPU Users Group

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820478 - Posted: 29 Sep 2016, 10:20:43 UTC - in response to Message 1820477.
> > There is no connection between fp:strict, or whatever precision switch is used, and reporting a subset of results on overflow.
> Aren't you all using fp:precise (single)? I just wanted to ask what happens if the app is compiled and tested with fp:strict (single) instead! What is the speed penalty of going from precise (single) to strict (single)?
No, all my builds use /fp:fast, for example, as has always been the case with AKv8 derivatives AFAIK. Out of interest I could provide you builds for comparison. A recently found inefficiency in CPU pulse finding makes a rebuild of the CPU apps worthwhile.
> The purpose in my mind was an overall platform standard that should follow IEEE754 regardless of CPU, x32, x64, ARM or GPU.
This rules out IEEE754-incompatible devices without any real need to do so.
> The idea behind me telling you to test in that direction is mainly to let you all switch more to code optimising instead of bug-hunting various platforms until hell freezes over. It will only increase, as I say, not decrease.
Unjustified idealization here. Most bug hunting (except our own bugs, of course) comes from non-complying runtimes. If the runtime doesn't comply with the standard, strictly following the standard will not help; it will just make debugging even more obscure.

-= Vyper =- Volunteer tester Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1820479 - Posted: 29 Sep 2016, 10:21:18 UTC Last modified: 29 Sep 2016, 10:26:21 UTC
If the S@h people say that they want to use IEEE754 in future releases to iron out differences, then science-wise it should be very welcome. It's easier to ban a platform/compiler that doesn't conform to those rules in the BOINC API if you developers find a combination that doesn't work properly. https://en.wikipedia.org/wiki/IEEE_floating_point#Basic_and_interchange_formats It's only binary32 or decimal32, whichever serves best, from the simplest CPU application up to monster quadruple GPU/FPGA/ASIC cores in the future. If you all find a card or driver that doesn't work, then it's up to the manufacturer to patch their shit so that they conform 100% to the IEEE754 standard.
EDIT: All of the above is meant to get code closer to the Q100 mark on whatever platform/combination possible, but perhaps as a second step. As we've noticed, what I mention now has nothing to do with the main topic of the thread, inconclusive validations. That is another thing, of course, that actually needs to be fixed on another level, because I'm sure that each and every one of those applications, if compared on all signals found (30+), would get Q99+, so they most certainly would validate.

-= Vyper =- Volunteer tester Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1820481 - Posted: 29 Sep 2016, 10:28:32 UTC
Now one of them validated them all! http://setiathome.berkeley.edu/workunit.php?wuid=2276193382

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820482 - Posted: 29 Sep 2016, 10:29:07 UTC - in response to Message 1820479.
> If you all find a card or driver that doesn't work, then it's up to the manufacturer to patch their shit so that they conform 100% to the IEEE754 standard.
Hm... looks like you don't read these forums frequently. Otherwise you would know how long that "patch their shit" list currently is, even without any precision compliance. Well, currently we have 2 platforms with real precision issues: OpenCL NV + a modern version of OS X; OpenCL Intel + some devices and drivers (still not known exactly which). OS X is out of my scope, but I could do some experiments with iGPU builds regarding the /fp:* switches. I think it's better to provide experimental proof than to just discuss.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820483 - Posted: 29 Sep 2016, 10:30:22 UTC - in response to Message 1820481.
> Now one of them validated them all! http://setiathome.berkeley.edu/workunit.php?wuid=2276193382
As should be the case with the current validator in most cases. Nothing really interesting here. But to reduce the inefficiency of re-processing, the validator should be changed. As I said earlier, this topic is under discussion with Eric.

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1820484 - Posted: 29 Sep 2016, 10:32:29 UTC - in response to Message 1820477.
Here, on Cuda, fp:strict had minimal performance impact, but actually made the match to stock 8.00 worse. That's because x86 builds built with the GNU compiler use the x87 FPU, which uses 80-bit registers for intermediates. There are no such intermediate registers on the GPUs, nor in the SSE+ parts of CPUs, and the MS compiler blocks their use entirely in x64 builds (so they can be reappropriated as general x64 registers, and for register renaming capability for Windows32 on Windows64).
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
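
To illustrate why wider intermediates can change a single-precision result, here is a minimal Python sketch. It is only an emulation under stated assumptions: binary64 stands in for the 80-bit x87 registers, and the helper names `to_f32`, `sum_narrow` and `sum_wide` are made up for this example and are not taken from any SETI@home app.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_narrow(values):
    """Round to binary32 after every addition (no wider intermediates)."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

def sum_wide(values):
    """Keep the running sum in a wider format, round to binary32 once at the end."""
    acc = 0.0
    for v in values:
        acc += to_f32(v)
    return to_f32(acc)

# 2**24 followed by two small terms: each +1 is lost when rounding at every step.
values = [16777216.0, 1.0, 1.0]
narrow = sum_narrow(values)
wide = sum_wide(values)
print(narrow, wide)
```

Both paths are correctly rounded in their own format, yet they disagree by one binary32 step, which is the kind of mismatch described above between builds that do and do not use wide intermediates.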

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820485 - Posted: 29 Sep 2016, 10:34:05 UTC - in response to Message 1820479. Last modified: 29 Sep 2016, 10:35:10 UTC
> EDIT: All of the above is meant to get code closer to the Q100 mark on whatever platform/combination possible, but perhaps as a second step. As we've noticed, what I mention now has nothing to do with the main topic of the thread, inconclusive validations. That is another thing, of course, that actually needs to be fixed on another level, because I'm sure that each and every one of those applications, if compared on all signals found (30+), would get Q99+, so they most certainly would validate.
Actually, conforming to the IEEE754 standard in rounding will not result in the Q100 mark either. The standard just describes how rounding is done; it can't prevent precision loss in a case like this, for example: A+B+C versus A+(B+C), where A is a big number and B and C are much smaller ones.
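
The A+B+C versus A+(B+C) case can be reproduced in a few lines. This is a minimal sketch using Python floats (IEEE754 binary64); the values of `a`, `b` and `c` are chosen purely for illustration.

```python
a = 1e16   # big number: neighbouring binary64 values are 2 apart at this magnitude
b = 1.0    # much smaller
c = 1.0    # much smaller

left = (a + b) + c    # each +1.0 is rounded away on its own
right = a + (b + c)   # the small terms are combined first, so their sum survives

print(left, right, right - left)
```

Every individual addition here is correctly rounded, so both results are standard-conforming; only the order of operations differs.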

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1820486 - Posted: 29 Sep 2016, 10:34:25 UTC - in response to Message 1820477.
> I'm not talking about solving overflow tasks; that was not the purpose. (This was an off-topic question that popped up in my mind.)
Most of the conversation this morning has arisen from your message 1820401:
> Nice find Jeff!
in reply to Jeff Buck's message 1820375:
> It's a -9 overflow with 3 different apps coming up with 4 different results
As Jeff says, the point at issue there is very specifically overflow tasks. I haven't (recently) seen any obvious cases where the validator rejections have been attributable directly to precision issues. As Jason reminded us, there were some in the CUDA builds, but they were detected and corrected during the pre-release testing phase for SaH v8 (Breakthrough Listen and Guppi). As thread originator, I'm absolutely happy to discuss precision issues here too, but let's try to be rigorous, please, and address comments to the appropriate sub-category (overflow, precision, or whatever else). Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report?
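
The overflow sub-category (reporting a different subset of signals, independent of precision) can be sketched as follows. Everything here is hypothetical: the cap of 30 is only taken from the "30+" mentioned earlier in the thread, and the two processing orders are invented for illustration and do not describe how any real app walks the data.

```python
MAX_REPORTED = 30  # illustrative cap on reported signals for an overflow result

def report(found_in_order, limit=MAX_REPORTED):
    """Return the signals an app reports if it stops once `limit` is reached."""
    return found_in_order[:limit]

# Hypothetical workunit with 40 detectable signals, identified by integer IDs.
all_signals = list(range(40))

# App A (serial) meets the signals in their natural order.
serial_order = all_signals
# App B (parallel) meets the same signals, interleaved from two halves of the search.
parallel_order = [s for pair in zip(all_signals[:20], all_signals[20:]) for s in pair]

reported_a = set(report(serial_order))
reported_b = set(report(parallel_order))
common = reported_a & reported_b
print(len(common), "of", MAX_REPORTED, "reported signals in common")
```

Both hypothetical apps find exactly the same 40 signals with identical values, yet their reported subsets differ, which is why a precision switch cannot help in this case.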

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1820487 - Posted: 29 Sep 2016, 10:36:00 UTC - in response to Message 1820481. Last modified: 29 Sep 2016, 10:37:06 UTC
> Now one of them validated them all! http://setiathome.berkeley.edu/workunit.php?wuid=2276193382
The only important one is the canonical; at least the top 2 will be weakly similar. 8.00 apparently strongly matched it, as the results are similarly rolling in here now with Cuda. Q=99.38%. Closer than that is not within reasonable practical reach at this time ;)

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820489 - Posted: 29 Sep 2016, 10:38:04 UTC - in response to Message 1820486.
> Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report?
Yes, in some iGPUs, as you know from beta.

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1820490 - Posted: 29 Sep 2016, 10:40:27 UTC - in response to Message 1820481.
> Now one of them validated them all! http://setiathome.berkeley.edu/workunit.php?wuid=2276193382
The new (Stock CPU) result must have been strongly similar to "canonical result 5182875831" - the opencl_ati_cat132. All of the others must have been weakly similar to one of those two - and 'weakly' can be very weak indeed (only half the signals matched). I'd actually not call that validated at all, but we're stuck with a binary choice in the status column.
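
As a rough illustration of the strong/weak distinction, here is a simplified sketch. It is not the project's validator: the `classify` function, the relative tolerance `tol`, and the use of one number per signal are all assumptions made for this example. Only the idea that "weakly similar" can mean as few as half the signals matching comes from the post above.

```python
def classify(result, canonical, tol=0.01):
    """Classify `result` against `canonical` using a hypothetical matching rule."""
    def matches(x, y):
        # Assumed rule: values agree within a relative tolerance.
        return abs(x - y) <= tol * max(abs(x), abs(y))

    matched = sum(any(matches(s, c) for c in canonical) for s in result)
    total = max(len(result), len(canonical))
    if matched == total:
        return "strongly similar"
    if 2 * matched >= total:
        return "weakly similar"
    return "not similar"

canonical = [1.0, 2.0, 3.0, 4.0]          # made-up signal values
close_result = [1.0, 2.0001, 3.0, 4.0]    # every signal agrees within tolerance
half_result = [1.0, 2.0, 7.0, 9.0]        # only half of the signals agree
other_result = [5.0, 6.0, 7.0, 8.0]       # nothing agrees

for r in (close_result, half_result, other_result):
    print(classify(r, canonical))
```

Under this assumed rule, `half_result` is classed as weakly similar despite two of its four signals being unrelated to the canonical ones. A binary "validated" status would not show that difference from `close_result`.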

jason_gee Volunteer developer Volunteer tester Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1820491 - Posted: 29 Sep 2016, 10:41:30 UTC - in response to Message 1820486. Last modified: 29 Sep 2016, 10:42:31 UTC
> Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report?
Just confirming both of Jeff's matched (8.00) on the 2 Cuda flavours, Q over 99%.

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820492 - Posted: 29 Sep 2016, 10:42:43 UTC - in response to Message 1820490. Last modified: 29 Sep 2016, 10:44:05 UTC
> I'd actually not call that validated at all, but we're stuck with a binary choice in the status column.
And this "feature" really hides issues, making build validation and debugging harder (though of course it allows those damned credits to be received...)

Richard Haselgrove Volunteer tester Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1820493 - Posted: 29 Sep 2016, 10:42:45 UTC - in response to Message 1820489. Last modified: 29 Sep 2016, 11:10:35 UTC
> > Is anyone currently seeing systemic validator rejections because of poor precision of the correct signal, rather than selection of the wrong signal to report?
> Yes, in some iGPUs, as you know from beta.
Fair point. I've not (yet) done a side-by-side visual comparison of the signal summary reports for one of those, but it would be worth doing.
Edit - and the newly-validated one provides an excellent case study. We have an iGPU (HD Graphics 530) with an enormous inconclusive count, and a canonical signal display from the ATI. I'll grab them, and compare after lunch.
Edit2 - and both using comparable r3430 code. Even nicer.
Edit3 - initial eyeball: the iGPU reported a triplet that the ATI didn't. Threshold issue, perhaps?

Raistmer Volunteer developer Volunteer tester Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121	Message 1820500 - Posted: 29 Sep 2016, 11:00:35 UTC - in response to Message 1820482. Last modified: 29 Sep 2016, 11:02:25 UTC
> > If you all find a card or driver that doesn't work, then it's up to the manufacturer to patch their shit so that they conform 100% to the IEEE754 standard.
> Hm... looks like you don't read these forums frequently. Otherwise you would know how long that "patch their shit" list currently is, even without any precision compliance.
To elaborate a little more on this: http://setiathome.berkeley.edu/forum_thread.php?id=80247&postid=1820339 Recently, testing new builds with Mike and his GPU, we discovered that the last build stopped producing inconclusives when run in multiple instances... But then I looked inside stderr and found obviously bad and wrong numbers in the profiling counters the app now prints. They work OK in single-instance mode, and OK in multiple-instance mode in the other configs I tested (where multiple instances were allowed before too). So, I think it's direct evidence that driver GPU context switching is simply bugged for that whole AMD GPU family on Windows! And we are talking about rare borderline inconclusives from Q99 instead of Q100 here....

-= Vyper =- Volunteer tester Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1820504 - Posted: 29 Sep 2016, 11:39:40 UTC - in response to Message 1820492.
> > I'd actually not call that validated at all, but we're stuck with a binary choice in the status column.
> And this "feature" really hides issues, making build validation and debugging harder (though of course it allows those damned credits to be received...)
You've got a point there, because it all comes down to human psychology. If something doesn't need to be fixed because it won't matter in the end (credits), it won't get fixed that much. But if weakly similar ones didn't get a single credit, then things would speed up dramatically to make it work; or, if it can't work, then "ban" the computer/platform/GPU combo on the servers instead and don't send units to devices that can't compute them thoroughly. As simple as that, really. It would be a shame if Cuda/AMD GPU hardware ended up there, but in the end the same rules would then apply to everyone.

-= Vyper =- Volunteer tester Joined: 5 Sep 99 Posts: 1652 Credit: 1,065,191,981 RAC: 2,537	Message 1820505 - Posted: 29 Sep 2016, 11:46:53 UTC - in response to Message 1820493.
> Edit - and the newly-validated one provides an excellent case study. We have an iGPU (HD Graphics 530) with an enormous inconclusive count, and a canonical signal display from the ATI. I'll grab them, and compare after lunch.
I remember this from last year, when I noticed something with the iGPU on Intel.
> Posted: 16 Sep 2015, 14:12:34 UTC Last modified: 16 Sep 2015, 14:12:51 UTC
> Hey, I need some assistance before I start to plunge deep into my issue. One of my crunchers has got a new CPU up and running. The problem is that my Intel GPU is starting to pause work in progress and start on the next and next and so on, so my computer is refused new work on the Nvidia GPU. I presume it's an EDF thing. What is the right way to address this nowadays?
I bought it solely to crunch on the iGPU and CPU at the same time, as the iGPU is powerful, but I sold it and bought a 6700K instead. http://ark.intel.com/sv/products/88040/Intel-Core-i7-5775C-Processor-6M-Cache-up-to-3_70-GHz This processor couldn't do Astropulse and just paused the work and started the next unit; no one at Lunatics had an answer that solved this back then either.
