Average Credit Decreasing?

Author	Message
jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786128 - Posted: 9 May 2016, 13:33:18 UTC - in response to Message 1786126. Last modified: 9 May 2016, 13:36:04 UTC
- CreditNew takes the 'most efficient claiming' stock app as its credit normalisation reference
- The most efficient stock app here is always stock CPU AVX (because only one) [Correction: only one radically underclaiming, because AVX enabled]
- Boinc Whetstone is used for the claim, and it is a scalar (non-SIMD) Whetstone
- Claim peak (#operations) is elapsed_seconds * Boinc Whetstone
- * SSE-SSE4.2 achieves about 3 * Boinc Whetstone
- ** AVX achieves about 5 * Boinc Whetstone
- Typical implementation efficiency of the applications, for CPU, is around 50% FPU (no SIMD)
---> SSEx is magically 150%+ efficient
---> AVX is magically 240% efficient
(** Check SiSoft Sandra single-threaded SSE/AVX compared to Boinc Whetstone)
Now by magic, everyone gets downscaled.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. ID: 1786128 ·
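A minimal sketch of the arithmetic in the post above. The function names are hypothetical, and the speedup factors are the round figures from the post (about 3x for SSE, about 5x for AVX), not measurements. With a round factor of 5 the AVX case comes out at 250%; the 240% quoted in the post would correspond to a slightly lower measured speedup.

```python
# Illustrative only: names are made up for this sketch, figures are the
# round numbers quoted in the post.

def claimed_operations(elapsed_seconds: float, whetstone_flops: float) -> float:
    """Peak claim as described in the post: elapsed time times the
    scalar (non-SIMD) Boinc Whetstone rate."""
    return elapsed_seconds * whetstone_flops


def apparent_efficiency(true_efficiency: float, simd_speedup: float) -> float:
    """Efficiency the server appears to see when the claim is based on a
    scalar benchmark but the app really runs simd_speedup times faster."""
    return true_efficiency * simd_speedup


if __name__ == "__main__":
    for label, speedup in (("FPU", 1.0), ("SSE-SSE4.2", 3.0), ("AVX", 5.0)):
        print(f"{label}: {apparent_efficiency(0.5, speedup):.0%}")
```

Anything above 100% is impossible for a real application, which is the point being made: the reference app looks impossibly efficient, so everything normalised against it is scaled down.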

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1786129 - Posted: 9 May 2016, 13:33:36 UTC - in response to Message 1786125. In a way, that does relate to this morning's post on BOINC Dev - in a weird sort of way, that is. Go back to the concept of the Benchmarks (Whetstone and Dhrystone) rigorously excluding compiler optimisations (as the originals did in 1972), but extend it to the credit base - the stock CPU app - also being optimisation-free. That achieves the design objective of the benchmark accurately reflecting the computational power of the [base] application, as well or better than an optimised benchmark reflecting optimised science apps. That scheme has an elegant simplicity, and calls David's bluff perfectly: I do think it should be tried, even though we may have to go and picket the labs en masse to prevent David breaking in and changing it back again. I wonder how long we could hold out? Practical question: has anybody still got a copy of that stock code? Last I saw, it was running the 'CPU fallback' pathway in the original NVidia supplied CUDA app. ID: 1786129 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786130 - Posted: 9 May 2016, 13:40:03 UTC - in response to Message 1786129. I have a feeling Eric might go for trying disabling SIMD on Beta at least, just to be able to rub David's nose in it. For Stock GPU, well, it doesn't matter, as the CPU will underclaim because of inflated GPU marketing peak_flops. So CPU Stock becomes the reference. For Stock CPU code, it could be harder to regress to FPU/scalar SSE, but all the code is there. ID: 1786130 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1786133 - Posted: 9 May 2016, 13:53:20 UTC - in response to Message 1786130. Pssssst. I thought of that, too, but it has one major drawback. If Beta starts awarding 'proper' credit, guess where all the credit w****s will move to. It'll be Astropulse all over again. Eric would be well advised to disable new registrations before starting the test. Then it'll be ours, all ours, my precious. ID: 1786133 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786137 - Posted: 9 May 2016, 13:59:54 UTC - in response to Message 1786133. Last modified: 9 May 2016, 14:01:05 UTC lol. my guesstimate is just a factor of 3.3x for MB, so probably wouldn't draw the extreme credit seekers, though would certainly represent a restoration of the decline I've seen over time. ID: 1786137 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 1786143 - Posted: 9 May 2016, 14:22:30 UTC I am not going to complain about the current situation. But, I surely would not complain about a correction either. I used to have a RAC closer to 500k instead of 300k, and I have better kit online now than then. Meow. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 1786143 ·

Chris Adamek Volunteer tester Send message Joined: 15 May 99 Posts: 251 Credit: 434,772,072 RAC: 236	Message 1786159 - Posted: 9 May 2016, 15:30:40 UTC - in response to Message 1786143. I like it, nice and simple and yet solves the problem by turning the monster against itself. lol Would you even need to replace all the MB cpu stock apps with non-optimized versions, or would just hitting one of the platforms be sufficient? Chris ID: 1786159 ·

Jord Volunteer tester Send message Joined: 9 Jun 99 Posts: 15184 Credit: 4,362,181 RAC: 3	Message 1786163 - Posted: 9 May 2016, 15:36:26 UTC - in response to Message 1786128. - Boinc Whetstone is used for claim and is scalar (non-simd) Whetstone In how far has this got anything to do with the benchmarks BOINC does every so many times, and what if one has these disabled? I set through cc_config.xml not to do benchmarks. How much will it slow the calculations down? Is it even possible to do any radio frequency data calculations without use of the FFT algorithm? ID: 1786163 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1786179 - Posted: 9 May 2016, 16:23:31 UTC - in response to Message 1786163.

"- Boinc Whetstone is used for claim and is scalar (non-simd) Whetstone"
"In how far has this got anything to do with the benchmarks BOINC does every so many times, and what if one has these disabled? I set through cc_config.xml not to do benchmarks."

In order to really understand that, you probably need to have travelled with us through Evaluation Of Credit New (which I don't fully understand, even the bits I wrote myself). But I'm assured by the prime movers, including Jason, that the basic scaling of all credit awards is fundamentally keyed to the ratios of the BOINC benchmark and the performance of the CPU-only application. The BOINC benchmark shouldn't change over time (unless you upgrade the CPU, or unless you're into cheating). So provided it's run once and a value is stored, disabling repeats is no problem.

"How much will it slow the calculations down?"

Depends how far back we're able to go in the codebase. There have been many optimisations over the years, and of course they overlap with version changes. A true baseline build from the original sources would need to have the changes for '_enhanced', v7 (autocorrelations), and v8 (GBT) patched in. That would be much, much slower than current - dunno, x5, x6, on fast machines? Jason? That's the main reason I'd agree that Eric would probably want to use Beta, where it doesn't matter: there's still a huge contribution from volunteers using the stock app at Main for production work (302,322 GigaFLOPS, according to the applications page). Eric won't be able to throw away that much science for a test which I guess would need to run for weeks to make the point.

"Is it even possible to do any radio frequency data calculations without use of the FFT algorithm?"

No - FFT is essential. But SETI uses an external .DLL version of the FFTW library, which has many internal optimisations - but which I guess can also be run in plain-vanilla mode via an appropriate calling convention. ID: 1786179 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786234 - Posted: 9 May 2016, 18:57:55 UTC - in response to Message 1786179. Last modified: 9 May 2016, 19:06:02 UTC In addition, what you get out of the derivation of all the aggregate benches, is a statistical average from which it's pretty easy to spot the peaks and outliers. i.e. one broken host bench or one person diddling the claim wouldn't have much if any effect, other than to be on the high side of their own quorum claim. There are multiple issues with using averages for this (estimate localisation) process, and you would recognise them if I point at each symptom that is visible. But that's a different [set of] issue[s] to the main scaling problem. ID: 1786234 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22200 Credit: 416,307,556 RAC: 380	Message 1786246 - Posted: 9 May 2016, 19:32:30 UTC Last modified: 9 May 2016, 19:34:02 UTC

Having had a look at the way CN works (as opposed to how it is meant to work) I can see a couple of flaws that seriously hamper its "fitness for purpose". First, there is a dependence on the processor running the task, and second, there is a dependence on the manner in which the calculation is performed. There is some sort of (weak) correction for the performance of the processor, but that fails when the processor is running multiple tasks (e.g. a GPU running two or three simultaneous tasks). However, there is no attempt to correct for the computational efficiency of the task being run.

Any "sensible" credit system should give a score based on the task, not the manner in which the task has been completed, or the processor on which that task has been run. Looking at MB data it is fairly obvious that the relationship between run time (which is a "fair indicator" of computing effort) and angle, or pulses, or spikes etc. is not a simple linear one - it is a curve, with a sweet spot somewhere between 0.3 and 2 - recent data hasn't had enough tasks in this range to say anything better than that. A simple three term fit may be quite adequate.

Advantages - the score is now independent of the computer (unless the task "really" ends in error), thus the more tasks you run, and are validated, the higher your score. One thing to consider is giving a small fraction of the task score in the event of the task really ending in error, say 2%, thus you get a small reward for your efforts...

I've not been able to identify a suitable score seed for AP as I've not been able to collect enough data to do any sensible seed/time analysis. That said, blanking in combination with number of peaks is looking hopeful.

I can see one problem - the score calculation would need to be re-calibrated whenever there was a change in the base algorithm - that is the "what the calculations are all about", NOT the application, which is the "how the calculations are actually done".

A nice, self-contained project....

Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1786246 ·
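A rough sketch of what a task-based score of the kind proposed above might look like. Everything specific here is an assumption: the post does not say which three terms the fit would use, so a quadratic in the angle is assumed, the coefficients are placeholders rather than fitted values, and the function name is hypothetical. Only the 2% award for a task that really ends in error comes from the post.

```python
from typing import Tuple


def task_score(angle: float,
               coeffs: Tuple[float, float, float],
               ended_in_error: bool = False,
               error_fraction: float = 0.02) -> float:
    """Score a task from a property of the task itself (here the angle),
    independent of the host or application that processed it.

    coeffs holds (a, b, c) of an assumed three-term fit a + b*x + c*x**2.
    A task that really ends in error earns only error_fraction of the score.
    """
    a, b, c = coeffs
    full_score = a + b * angle + c * angle ** 2
    return full_score * error_fraction if ended_in_error else full_score


if __name__ == "__main__":
    placeholder = (1.0, 2.0, 3.0)  # NOT fitted values, illustration only
    print(task_score(1.0, placeholder))
    print(task_score(1.0, placeholder, ended_in_error=True))
```

The point of the design is visible in the signature: nothing about the processor or the application version is an input, so the coefficients only need recalibrating when the base algorithm changes.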

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786256 - Posted: 9 May 2016, 20:03:41 UTC - in response to Message 1786234. Last modified: 9 May 2016, 20:09:50 UTC

Some assorted notes I intended as an [Edit:], but time ran out, on why putting SIMD vectorisation on Boinc Whetstone without a fair bit of other supporting logic would be a bad idea:
- mostly only newer, generally faster hosts, that return more CPU results, would tend to update client, so skewing the new numbers
- Not all stock applications will use maximum SIMD available on a given host (e.g. where AVX might be available, stock CPU AP doesn't have it)
- Would have to add new code every time new instruction set extensions are added

Following the averages up through estimates to intended credit (Cobblestone scale award) there are alternatives to de-vectorising the stock CPU applications. Since the host features, features of the boinc clients including type of Whetstone (e.g. Android's is vectorised after some version), and features of stock applications are all known, a justifiable flops multiplier for where boinc Whetstone doesn't include vectorisation might be:

    if ( Whetstone_of_client_not_vectorised() )
    {
        // e.g. not current Android
        pfc_multiplier = 1 + logbase2(min(max_vector_length_of_app(), max_vector_length_of_host()));
    }
    else
    {
        // e.g. recent Android client
        pfc_multiplier = (1 + logbase2(min(max_vector_length_of_app(), max_vector_length_of_host()))) / logbase2(vector_length_of_Boinc_Whetstone);
    }

Produces effective multipliers:
- fpu only host+app (rare): 1
- Current/old Windows/Linux/Mac-intel client with app using SSE-SSE4.2: 3
- Current/old Windows/Linux/Mac-intel client with app using AVX256: 4
- Current Android client using vectorised app: 1
- older Android client (non-SIMD Whetstone) using vectorised app: 3

ID: 1786256 ·
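A runnable sketch of the multiplier pseudocode in the post above, under two stated assumptions. First, vector lengths are counted in single-precision lanes (1 for scalar FPU, 4 for SSE, 8 for AVX256, and 4 assumed for the Android case). Second, the vectorised-Whetstone branch divides by 1 + log2 of the benchmark's vector length; as literally written in the post, the divisor is log2 alone, which would give 1.5 rather than the listed 1 for a 4-wide benchmark. The function name mirrors the post's `pfc_multiplier` but is otherwise hypothetical.

```python
import math


def pfc_multiplier(app_vector_len: int,
                   host_vector_len: int,
                   whetstone_vector_len: int = 1) -> float:
    """Multiplier for the claimed flops, based on how much wider the
    application's usable SIMD is than the benchmark's.

    whetstone_vector_len = 1 models the scalar Boinc Whetstone.
    """
    # The app can only use what both it and the host support.
    usable = min(app_vector_len, host_vector_len)
    numerator = 1 + math.log2(usable)
    # ASSUMPTION: "1 +" added so a scalar benchmark divides by 1 and the
    # listed Android value of 1 is reproduced.
    denominator = 1 + math.log2(whetstone_vector_len)
    return numerator / denominator


if __name__ == "__main__":
    print(pfc_multiplier(1, 1))      # FPU-only host and app
    print(pfc_multiplier(4, 8))      # SSE app on an AVX-capable host
    print(pfc_multiplier(8, 8))      # AVX256 app and host
    print(pfc_multiplier(4, 4, 4))   # vectorised app, vectorised Whetstone
```

With a scalar benchmark the divisor is 1, so both branches of the pseudocode collapse into one expression under this assumption.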

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786261 - Posted: 9 May 2016, 20:15:25 UTC - in response to Message 1786256. Sanity check value for multipliers: SIMD_enabled_Whetstone / Boinc_Whetstone, with SIMD_enabled_Whetstone being a single-threaded benchmark such as SiSoft Sandra Lite, or similar. ID: 1786261 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786269 - Posted: 9 May 2016, 20:39:13 UTC - in response to Message 1786246. Last modified: 9 May 2016, 20:48:25 UTC

"... and second, there is a dependence on the manner in which the calculation is performed."

That's the big one. It's supposed to gauge the compute efficiency of the application, to scale everyone's estimates, and yield a useful GFlops number. Since the applications+hosts are capable of 3-4x Boinc Whetstone on a single thread, a well optimised and vectorised stock CPU application claims >100% efficiency [obvious nonsense], so all estimates end up short (especially when starting a new app), the bounds/limits before a time_exceeded abort end up unintentionally close, and credit ends up low.

Engineering-wise, another more complete way to fix it is to just measure some representative hosts, and use summary figures at each level (using the term 'average' here for familiarity, but it should really say 'controlled estimate', because averages make bad controllers):
- 'average' GFlops on appVersionDevice (client side)
- 'average' GFlops on HostAppVersion (server side, mirrored/synthesised using validations)
- 'average' GFlops on appVersion (project side, many hosts)
- 'average' GFlops on app (project side, 1 to many appversions)
- 'average' GFlops on project
---> Cascade Controllers percolating summary estimates up the chain :D [Down rather, in the order listed]

Then you have good starting points for new hosts, new apps, and potentially new projects, and can grant credit from the appropriate reference level, e.g. at app or project level, a slowly adjusting stable figure. ID: 1786269 ·
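A toy sketch of the cascade idea in the post above: one estimate per level, each fed by the level before it and each adjusting more slowly than the last. This uses plain exponential smoothing as a stand-in, not the controllers the post has in mind, and the class name, gains, and GFlops figures are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LevelEstimate:
    name: str
    gain: float                     # fraction of the error corrected per update
    value: Optional[float] = None   # no estimate until the first observation

    def update(self, observation: float) -> float:
        if self.value is None:
            self.value = observation
        else:
            self.value += self.gain * (observation - self.value)
        return self.value


def build_cascade() -> List[LevelEstimate]:
    # Levels in the order listed in the post; each gain is half the previous
    # one, so later levels move more slowly (illustrative values only).
    names = ["appVersionDevice", "hostAppVersion", "appVersion", "app", "project"]
    return [LevelEstimate(name, 0.5 ** (i + 1)) for i, name in enumerate(names)]


def feed(cascade: List[LevelEstimate], gflops: float) -> List[float]:
    """Push one measurement in at the first level and let each level's
    output become the next level's input."""
    signal = gflops
    for level in cascade:
        signal = level.update(signal)
    return [level.value for level in cascade]


if __name__ == "__main__":
    cascade = build_cascade()
    feed(cascade, 10.0)
    for level, value in zip(cascade, feed(cascade, 20.0)):
        print(f"{level.name}: {value}")
```

After a jump from 10 to 20 at the device level, the later levels barely move, which is the "slowly adjusting stable figure" a new host or app could start from.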

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 20283 Credit: 7,508,002 RAC: 20	Message 1786295 - Posted: 9 May 2016, 22:46:13 UTC Last modified: 9 May 2016, 22:49:14 UTC Very good this is getting thought about again but a bit more seriously... One thought that might be out of date or just simply wrong but just to check just the same... Is there not some server-side code that takes a (median?) average to 'normalize' all the credit rates? And would that not be very significantly skewed as soon as a GPU result became that magic median value that was taken to be representative of all?... (As in, the credit for CPU-only users would then be seen to plummet.) As I'm sure is appreciated, great care is needed when applying "fiddle-factors" to attempt to massage the results for whatever is not being directly measured... Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 1786295 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786306 - Posted: 9 May 2016, 23:49:09 UTC - in response to Message 1786295. Last modified: 9 May 2016, 23:58:52 UTC

"Is there not some server-side code that takes a (median?) average to 'normalize' all the credit rates?"

Yeah, but it's a weighted average, rather than a median, which would have been a better choice. As for the normalisation step, that's the problem. For multibeam it's normalising to the effectively underclaiming AVX stock CPU app, further skewed in number of results by the fact that AVX enabled hosts tend to return more results. [A quick way to halve AP credit would be to release a well optimised stock AVX CPU app for it.]

Probably what you've picked up on already: whichever of the many suitable options is chosen (weighted average, median, or whatever), it's functionally a filter, which is a kind of controller, whose function is localising the estimates. Averages (weighted or otherwise) are known to have a few problems in that context, namely slow convergence, sensitivity to disturbances, and a lack of control to stabilise the thing. Additionally it's using database space when other filter/controller options can do the same job with little to no such overhead.

For the fast response estimates, such as in-client and the equivalent synthetic image on the server, on per task and per validation change, most likely a running median would work, though either Kalman or Extended Kalman would be more efficient and provably optimal in terms of convergence and over/undershoot and response to change. Also simple PID control works fine for tracking local estimates (already tested that, just because it was easy).

For the slower changing appversion, app, and project-wide estimate localisation, properly controlled estimates at host level mean probably simple medians or even averages would be fine, though having those levels operate at different controllable rates would make host statistics percolate up through the levels, giving useful summary info for initial estimates of something new, as well as system health indicators. In other words, cascaded controllers operating in the same time domain tend to fight one another and induce instability (as we sometimes see), while carefully including temporal spacing gives coarse->fine->finer->finest control. ID: 1786306 ·
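A small sketch of the "sensitivity to disturbances" point in the post above, comparing a running mean with a running median over the same sliding window. The class name is hypothetical and the samples are invented to show a single bad value, not real host data.

```python
from collections import deque
from statistics import mean, median


class RunningFilter:
    """Keeps the most recent `window` samples and reports two summaries."""

    def __init__(self, window: int) -> None:
        self.samples = deque(maxlen=window)  # old samples drop off automatically

    def add(self, sample: float) -> None:
        self.samples.append(sample)

    def running_mean(self) -> float:
        return mean(self.samples)

    def running_median(self) -> float:
        return median(self.samples)


if __name__ == "__main__":
    f = RunningFilter(window=5)
    # Four ordinary samples followed by one wild outlier (made-up numbers).
    for sample in (100.0, 101.0, 99.0, 100.0, 1000.0):
        f.add(sample)
    print("mean:  ", f.running_mean())
    print("median:", f.running_median())
```

The single outlier drags the mean far from the typical value while the median does not move, which is why a median (or a properly tuned controller) localises an estimate better than an average.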

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14650 Credit: 200,643,578 RAC: 874	Message 1786384 - Posted: 10 May 2016, 10:35:22 UTC I think we've mentioned already that a parallel discussion on BOINC credit systems appeared on the BOINC message boards: [Discussion] 4th Generation BOINC credit system In recent days, the BOINC discussion has somewhat submerged under smoke and mirrors deployed by proponents on either side of a GridCoin debate. (This tends to happen with all BOINC credit discussions at some stage, to a greater or lesser extent) But Christian Beer - again previously mentioned in this thread, as a member of the BOINC PMC - has this morning attempted to draw the BOINC discussion back to BOINC credit issues, and seems both to acknowledge the discontent within BOINC concerning the current (3rd Generation) credit system, and to be open to proposals for improvement/replacement. I intend to throw my 0.02€ into the ring later today. ID: 1786384 ·

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786397 - Posted: 10 May 2016, 11:56:54 UTC - in response to Message 1786384. Interesting, I wonder if the motivation is coming around (even though being really about estimates was there all along, disguised by 'eww Credit' ) ID: 1786397 ·

shizaru Volunteer tester Send message Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0	Message 1786410 - Posted: 10 May 2016, 12:50:07 UTC - in response to Message 1786384.

"...open to proposals for improvement/replacement."

Improvement is impossible and replacement is moot as long as you guys continue to pretend that even the CreditNew version of Credit is NOT a FLOPS (or MIPS or whatever) counter. I don't care how many "if found cheating->then..." loops CN has to go through and it doesn't matter. Because at the very END of the equation we're multiplying whatever garbage comes out of all those hoops and loops with... drumroll... Cobblestones. Then grant CREDIT in Cobblestones. Then display RAC in Cobblestones. Ergo: CN = Flopcounter

So even if I get another Bank of Zimbabwe speech, CN will still be a Flopcounter. Raistmer, this goes for you too BTW.

OK so hopefully we're out of the "denial" stage and can focus on options. Well the next step in our 12-step program is the only one worth talking about. Someone, somewhere has to decide what's more important: the anti-cheating loops or the speedometer?

- If it's the anti-cheating loops then we have to bite the bullet, get rid of Cobblestones and multiply by rainbows or unicorns or whatever. Because we can multiply by whatever we want. It makes no difference whatsoever in this scenario.
- If it's the speedometer then it has to be stable. It has to at least be stable within each project. And it has to be stable against future app versions. It doesn't have to be super-accurate at first, just stable. It can start out conservatively for example and then someone like Jason can do what he does best: make it better. And enjoy himself while doing it for a change :)

Otherwise, as long as you're calling credit "Cobblestones" people will think there's something wrong with their computer... they'll fiddle with everything for a couple of days, stumble into the forums screaming bloody murder and most of you guys will get all territorial and say: "Don't let the door hit you on the way out"

How many times do we have to watch re-runs of the same episode? ID: 1786410 ·
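For context on the "CN = Flopcounter" argument above, a sketch of how a count of floating-point operations maps to Cobblestones. It assumes BOINC's published definition, as I understand it, that 200 Cobblestones correspond to one day of work on a reference computer sustaining 1 GFLOPS on Whetstone; the function and constant names are hypothetical.

```python
# Assumed reference values from BOINC's Cobblestone definition.
REFERENCE_FLOPS = 1e9            # 1 GFLOPS reference computer
SECONDS_PER_DAY = 86400.0
COBBLESTONES_PER_DAY = 200.0     # credit for one day on the reference computer


def cobblestones(flops_done: float) -> float:
    """Convert a number of floating-point operations into Cobblestones."""
    reference_days = flops_done / (REFERENCE_FLOPS * SECONDS_PER_DAY)
    return reference_days * COBBLESTONES_PER_DAY


if __name__ == "__main__":
    one_reference_day = REFERENCE_FLOPS * SECONDS_PER_DAY
    print(cobblestones(one_reference_day))
```

Under that definition the final step is a fixed scale factor on an operation count, so whatever happens upstream, the displayed number is read as a measure of FLOPs done.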

jason_gee Volunteer developer Volunteer tester Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0	Message 1786415 - Posted: 10 May 2016, 13:06:59 UTC - in response to Message 1786410. Last modified: 10 May 2016, 13:17:47 UTC Not sure if those are meant to be insinuations against my engineering abilities. [Edit:] scratching head, guess not, reading the bit mentioning me :). Just reading way more anger into that than is probably intended. Forgetting about credit altogether will dial down the stress a lot. Better estimates, and the cobblestones fall out as a side effect. Think convergence on +/- 10% elapsed estimates on a given host, adapting to major changes within 10 tasks, would be enough of a goal? ID: 1786415 ·
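One way the goal floated above (estimates within +/- 10% of actual elapsed time, reached within 10 tasks of a major change) could be checked, sketched under the assumption that estimated and actual elapsed times are available as parallel lists for one host. The function names and the sample numbers are invented for illustration.

```python
from typing import Optional, Sequence


def tasks_until_converged(estimated: Sequence[float],
                          actual: Sequence[float],
                          tolerance: float = 0.10) -> Optional[int]:
    """Return how many tasks pass before every later estimate stays within
    `tolerance` of the actual elapsed time, or None if that never happens."""
    count = min(len(estimated), len(actual))
    last_bad = -1
    for i in range(count):
        if abs(estimated[i] - actual[i]) > tolerance * actual[i]:
            last_bad = i
    if last_bad == count - 1:
        return None  # still outside tolerance at the end (or no data at all)
    return last_bad + 1


def meets_goal(estimated: Sequence[float],
               actual: Sequence[float],
               max_tasks: int = 10) -> bool:
    settled = tasks_until_converged(estimated, actual)
    return settled is not None and settled <= max_tasks


if __name__ == "__main__":
    est = [200.0, 150.0, 120.0, 108.0, 101.0]   # made-up estimates, seconds
    act = [100.0] * 5                           # made-up actual run times
    print(tasks_until_converged(est, act), meets_goal(est, act))
```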

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.