Message boards :
Number crunching :
Average Credit Decreasing?
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
In addition, what you get out of the derivation of all the aggregate benches is a statistical average, from which it's pretty easy to spot the peaks and outliers; i.e. one broken host bench or one person diddling the claim wouldn't have much if any effect, other than putting them on the high side of their own quorum claim. There are multiple issues with using averages for this (estimate localisation) process, and you would recognise them if I pointed at each visible symptom. But that's a different set of issues to the main scaling problem.

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions
rob smith Joined: 7 Mar 03 Posts: 20793 Credit: 416,307,556 RAC: 380
Having had a look at the way CN works (as opposed to how it is meant to work) I can see a couple of flaws that seriously hamper its "fitness for purpose". First, there is a dependence on the processor running the task, and second, there is a dependence on the manner in which the calculation is performed. There is some sort of (weak) correction for the performance of the processor, but that fails when the processor is running multiple tasks (e.g. a GPU running two or three simultaneous tasks). However, there is no attempt to correct for the computational efficiency of the task being run. Any "sensible" credit system should give a score based on the task, not the manner in which the task has been completed, or the processor on which that task has been run.

Looking at MB data it is fairly obvious that the relationship between run time (which is a "fair indicator" of computing effort) and angle, or pulses, or spikes etc. is not a simple linear one - it is a curve, with a sweet spot somewhere between 0.3 and 2 - recent data hasn't had enough tasks in this range to say anything better than that. A simple three-term fit may be quite adequate. Advantages: the score is now independent of the computer (unless the task "really" ends in error), thus the more tasks you run and have validated, the higher your score. One thing to consider is giving a small fraction of the task score in the event of the task really ending in error, say 2%, so you get a small reward for your efforts...

I've not been able to identify a suitable score seed for AP, as I've not been able to collect enough data to do any sensible seed/time analysis. That said, blanking in combination with number of peaks is looking hopeful.

I can see one problem - the score calculation would need to be re-calibrated whenever there was a change in the base algorithm - that is, the "what the calculations are all about", NOT the application, which is the "how the calculations are actually done".
A nice, self-contained project....

Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe?
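The "simple three-term fit" idea above could be sketched roughly as follows. The data points here are invented placeholders standing in for real (angle range, run time) pairs from validated MB results, and the quadratic-in-log form is just one plausible three-term curve with a single sweet spot:

```python
import numpy as np

# Hypothetical (angle_range, run_time_seconds) pairs; real values would
# come from validated MultiBeam results.
angle = np.array([0.05, 0.1, 0.3, 0.5, 1.0, 2.0, 5.0, 10.0])
run_time = np.array([9000, 7000, 4500, 4000, 3800, 4200, 6000, 8000])

# A "simple three term fit": quadratic in log(angle range), giving a
# single-sweet-spot curve like the one described above.
coeffs = np.polyfit(np.log(angle), run_time, deg=2)
predict = np.poly1d(coeffs)

# Seed a task score from the fitted curve rather than from the host.
score = predict(np.log(0.42))
```

Because the score is derived from the task's own parameters via the fitted curve, it is independent of the computer that ran it, which is exactly the property argued for above.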
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Some assorted notes I intended as an [Edit:], but time ran out, on why putting SIMD vectorisation into Boinc Whetstone without a fair bit of other supporting logic would be a bad idea:
- mostly only newer, generally faster hosts, which return more CPU results, would tend to update the client, so skewing the new numbers
- not all stock applications will use the maximum SIMD available on a given host (e.g. where AVX might be available, stock CPU AP doesn't have it)
- new code would have to be added every time new instruction set extensions appear

Following the averages up through estimates to intended credit (Cobblestone scale award), there are alternatives to de-vectorising the stock CPU applications. Since the host features, the features of the Boinc clients including type of Whetstone (e.g. Android's is vectorised after some version), and the features of stock applications are all known, a justifiable flops multiplier for where Boinc Whetstone doesn't include vectorisation might be:

if ( Whetstone_of_client_not_vectorised() ) // e.g. not current Android
{
    pfc_multiplier = 1 + logbase2(min(max_vector_length_of_app(), max_vector_length_of_host()));
}
else // e.g. recent Android client
{
    pfc_multiplier = (1 + logbase2(min(max_vector_length_of_app(), max_vector_length_of_host()))) / (1 + logbase2(vector_length_of_Boinc_Whetstone));
}

Produces effective multipliers:
- fpu only host+app (rare): 1
- Current/old Windows/Linux/Mac-Intel client with app using SSE-SSE4.2: 3
- Current/old Windows/Linux/Mac-Intel client with app using AVX256: 4
- Current Android client using vectorised app: 1
- Older Android client (non-SIMD Whetstone) using vectorised app: 3
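As a quick check of the multiplier table above, here is the same idea in Python. The function name and arguments mirror the pseudocode and are illustrative, not from any BOINC source; vector lengths are single-precision SIMD widths (SSE = 4, AVX-256 = 8, NEON = 4):

```python
import math

def pfc_multiplier(app_vec_len, host_vec_len, whetstone_vec_len=1):
    """Sketch of the proposed flops multiplier (names are illustrative).

    whetstone_vec_len is 1 when the client's Whetstone benchmark is not
    vectorised, and the SIMD width it uses otherwise.
    """
    effective = 1 + math.log2(min(app_vec_len, host_vec_len))
    return effective / (1 + math.log2(whetstone_vec_len))

# fpu-only host+app (rare): 1
assert pfc_multiplier(1, 1) == 1
# SSE..SSE4.2 app (4-wide) on a client with non-vectorised Whetstone: 3
assert pfc_multiplier(4, 4) == 3
# AVX-256 app (8-wide) on a client with non-vectorised Whetstone: 4
assert pfc_multiplier(8, 8) == 4
# 4-wide app on a client whose Whetstone is already 4-wide vectorised: 1
assert pfc_multiplier(4, 4, whetstone_vec_len=4) == 1
```

Dividing by the Whetstone's own effective factor (rather than applying the raw multiplier) is what keeps the recent-Android case at 1: the benchmark already credits that host for its SIMD throughput.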
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Sanity check value for the multipliers: SIMD_enabled_Whetstone / Boinc_Whetstone, with SIMD_enabled_Whetstone being a single-threaded benchmark such as SiSoft Sandra Lite, or similar.
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
... and second there is a dependence on the manner in which the calculation is performed.

That's the big one. It's supposed to gauge the compute efficiency of the application, to scale everyone's estimates, and yield a useful GFlops number. Given that the applications+hosts are capable of 3-4x Boinc Whetstone on a single thread, a well optimised and vectorised stock CPU application claims >100% efficiency [obvious nonsense], so all estimates end up short, especially when starting a new app; bounds/limits before time_exceeded abort become unintentionally close, and credit low.

Engineering-wise, another more complete way to fix it is just to measure some representative hosts, and use summary figures at each level (using the term 'average' here for familiarity, but it should really say 'controlled estimate', because averages make bad controllers):

'average' GFlops on appVersionDevice (client side)
'average' GFlops on HostAppVersion (server side, mirrored/synthesised using validations)
'average' GFlops on appVersion (project side, many hosts)
'average' GFlops on app (project side, 1 to many appversions)
'average' GFlops on project

---> Cascade controllers percolating summary estimates up the chain :D [Down rather, in the order listed]

Then you have good starting points for new hosts, new apps, and potentially new projects, and can grant credit from the appropriate reference level, e.g. at app or project level, a slowly adjusting stable figure.
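The five levels listed above could be cascaded so that each one smooths the output of the level below it at a progressively slower rate. This toy sketch uses exponential moving averages with invented smoothing constants purely for illustration (the post itself argues plain averages make poor controllers, so treat this as structure, not tuning):

```python
class CascadeEstimator:
    """Toy cascade of smoothed GFLOPS estimates, percolating upward:
    host -> app version -> app -> project. Level names and alpha
    values are illustrative, not from any BOINC code."""

    def __init__(self, levels=("host", "app_version", "app", "project"),
                 alphas=(0.5, 0.2, 0.05, 0.01)):
        self.levels = levels
        self.alphas = dict(zip(levels, alphas))
        self.estimates = {}

    def update(self, gflops):
        # Each level smooths the previous level's estimate, so fast
        # per-host changes reach the project figure only slowly.
        value = gflops
        for level in self.levels:
            a = self.alphas[level]
            prev = self.estimates.get(level, value)
            self.estimates[level] = (1 - a) * prev + a * value
            value = self.estimates[level]
        return dict(self.estimates)
```

The temporal spacing is the point: after a step change in measured GFLOPS, the host estimate adapts within a few tasks while the project-wide figure stays a slowly adjusting stable reference, usable as a starting point for new hosts and apps.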
Joined: 25 Nov 01 Posts: 16000 Credit: 7,508,002 RAC: 20
Very good that this is getting thought about again, and a bit more seriously...

One thought that might be out of date or just simply wrong, but just to check all the same... Is there not some server-side code that takes a (median?) average to 'normalize' all the credit rates? And would that not be very significantly skewed as soon as a GPU result became that magic median value that was taken to be representative of all?... (As in, the credit for CPU-only users would then be seen to plummet.)

As I'm sure is appreciated, great care is needed when applying "fiddle-factors" to attempt to massage the results for whatever is not being directly measured...

Happy crunchin', Martin

See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3)
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Is there not some server-side code that takes a (median?) average to 'normalize' all the credit rates?

Yeah, but it's a weighted average rather than a median, which would have been a better choice. As for the normalisation step, that's the problem. For MultiBeam it's normalising to the AVX-enabled stock CPU app, which effectively underclaims, further skewed in number of results by the fact that AVX-enabled hosts tend to return more results. [A quick way to halve AP credit would be to release a well optimised stock AVX CPU app for it.]

Probably what you've picked up on already: for that choice of weighted average, median, or whatever choice from many suitable options, it's functionally a filter, which is a kind of controller - functionality for localising the estimates. Averages (weighted or otherwise) are known to have a few problems in that context, namely slow convergence, sensitivity to disturbances, and a lack of control to stabilise the thing. Additionally, it's using database space when other filter/controller options can do the same job with little to no such overhead.

For the fast-response estimates, such as in-client and the equivalent synthetic image on the server, updated per task and per validation change, most likely a running median would work, though either Kalman or Extended Kalman would be more efficient and provably optimal in terms of convergence, over/undershoot and response to change. Also, simple PID control works fine for tracking local estimates (already tested that, just because it was easy).

For the slower-changing appversion, app, and project-wide estimate localisation, properly controlled estimates at host level mean probably simple medians or even averages would be fine. Though having those levels operate at different controllable rates would make host statistics percolate up through the levels, giving useful summary info for initial estimates of something new, as well as system health indicators.
In different words: cascaded controllers operating in the same time domain tend to fight one another and induce instability (as we sometimes see), while carefully including temporal spacing gives coarse -> fine -> finer -> finest control.
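The Kalman option mentioned above, for the fast per-task estimates, can be illustrated with a minimal one-dimensional filter. The noise constants here are invented for the sketch, not taken from any BOINC or SETI code; in practice they would be tuned to per-task scatter (r) and how fast a host's true throughput drifts (q):

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter of the kind suggested for tracking a
    host's per-task estimate. Tuning values are illustrative."""

    def __init__(self, x0, p0=1.0, q=0.01, r=0.25):
        self.x = x0   # current estimate
        self.p = p0   # estimate variance (uncertainty)
        self.q = q    # process noise: how fast the true value drifts
        self.r = r    # measurement noise: per-task scatter

    def update(self, z):
        self.p += self.q                 # predict: uncertainty grows
        k = self.p / (self.p + self.r)   # Kalman gain
        self.x += k * (z - self.x)       # correct toward measurement z
        self.p *= (1 - k)                # uncertainty shrinks after update
        return self.x
```

Unlike a plain average, the gain k adapts: a new host (large p) converges in a handful of tasks, while a well-characterised host resists a single outlier claim, which is exactly the convergence/disturbance behaviour the post contrasts with averages.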
Richard Haselgrove Joined: 4 Jul 99 Posts: 14474 Credit: 200,643,578 RAC: 874
I think we've mentioned already that a parallel discussion on BOINC credit systems appeared on the BOINC message boards: [Discussion] 4th Generation BOINC credit system

In recent days, the BOINC discussion has somewhat submerged under smoke and mirrors deployed by proponents on either side of a GridCoin debate. (This tends to happen with all BOINC credit discussions at some stage, to a greater or lesser extent.) But Christian Beer - again previously mentioned in this thread, as a member of the BOINC PMC - has this morning attempted to draw the BOINC discussion back to BOINC credit issues, and seems both to acknowledge the discontent within BOINC concerning the current (3rd Generation) credit system, and to be open to proposals for improvement/replacement.

I intend to throw my 0.02€ into the ring later today.
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Interesting. I wonder if the motivation is coming around (even though "being really about estimates" was there all along, disguised by 'eww, Credit').
Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0
...open to proposals for improvement/replacement.

Improvement is impossible and replacement is moot as long as you guys continue to pretend that even the CreditNew version of Credit is NOT a FLOPS (or MIPS or whatever) counter. I don't care how many "if found cheating -> then..." loops CN has to go through, and it doesn't matter. Because at the very END of the equation we're multiplying whatever garbage comes out of all those hoops and loops with... drumroll... Cobblestones. Then we grant CREDIT in Cobblestones. Then we display RAC in Cobblestones.

Ergo: CN = Flopcounter.

So even if I get another Bank of Zimbabwe speech, CN will still be a Flopcounter. Raistmer, this goes for you too BTW.

OK, so hopefully we're out of the "denial" stage and can focus on options. Well, the next step in our 12-step program is the only one worth talking about. Someone, somewhere has to decide what's more important: the anti-cheating loops or the speedometer?

- If it's the anti-cheating loops, then we have to bite the bullet, get rid of Cobblestones and multiply by rainbows or unicorns or whatever. Because we can multiply by whatever we want; it makes no difference whatsoever in this scenario.
- If it's the speedometer, then it has to be stable. It has to at least be stable within each project, and it has to be stable against future app versions. It doesn't have to be super-accurate at first, just stable. It can start out conservatively, for example, and then someone like Jason can do what he does best: make it better. And enjoy himself while doing it for a change :)

Otherwise, as long as you're calling credit "Cobblestones", people will think there's something wrong with their computer... they'll fiddle with everything for a couple of days, stumble into the forums screaming bloody murder, and most of you guys will get all territorial and say: "Don't let the door hit you on the way out". How many times do we have to watch re-runs of the same episode?
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Not sure if those are meant to be insinuations against my engineering abilities. [Edit:] *scratching head* Guess not, reading the bit mentioning me :). I was just reading way more anger into that than was probably intended.

Forgetting about credit altogether will dial down the stress a lot. Better estimates, and the cobblestones fall out as a side effect. Think convergence to +/- 10% elapsed estimates on a given host, adapting to major changes within 10 tasks, would be enough of a goal?
Richard Haselgrove Joined: 4 Jul 99 Posts: 14474 Credit: 200,643,578 RAC: 874
Ergo: CN = Flopcounter

May I take slight issue with that? In my view, CN pretends to be a flopcounter: it presents its figures - yes, in cobblestones - as if they were flops which had been counted. But they're not, and they haven't been. One of the most, er, dodgy things that BOINC (centrally) does is to reconvert [*] arbitrary CN numbers back into what it claims are scientifically counted flops. Thank you for reminding me of that.

Arguably, the best BOINC credit system so far was the second - Eric's attempt at true flopcounting, which worked pretty well here, from what I remember.

[*] on the BOINC home page currently: "24-hour average: 11.562 PetaFLOPS"
Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0
(EDIT: Richard, I hadn't seen your post when I wrote what's below, but since we're kinda on the same page now I don't think I'll add anything.)

Unfortunately that tone is my baseline. It gets under people's skin, I know. 3 parts INTJ and 1 part cultural differences.

It's you, and Richard, and Raistmer (and all the other "above & beyond the call of duty" people) whose lives we need to make easier. Ironically that was the whole point of my previous post. It was my point back when I got the BoZ speech too.

- If we multiply the estimates by rainbows (or call the new credits Andersens if it makes anyone feel better), then everybody can sit back, relax and enjoy themselves. Because once we do, we can define what those credits are. And it'll look something like this (just a lot better written):

"We've tried to make a system that's FLOPS-based but unfortunately we can't come up with one (for now) that isn't vulnerable to cheating. CREDITS will henceforth be called Unicorns (and RAC will become Recent Average Unicorns) and be a daily token given by Boinc that CAN NOT and SHOULD NOT be compared with past or future tokens. And it should most certainly NOT be used as a measure of your PC's health or abilities. We'd love to give you a speedometer but are unable to create one ATM that doesn't compromise the science. Hopefully one day we will... But until then it's Unicorns."

You can't say any of that, though, if you're calling your CREDITS "Cobblestones". How much time and energy would it have saved you guys (the gurus) if this had been done when v7 rolled out?

-----
PS
Think convergence on +/- 10% elapsed estimates on a given host, adapting to major changes within 10 tasks, would be enough of a goal?
Is that question directed at me? I'm guessing it's for Richard but not quite sure...
kittyman Joined: 9 Jul 00 Posts: 51445 Credit: 1,018,363,574 RAC: 1,004
The kitties like to call their credits....kibblestones. "Freedom is just Chaos, with better lighting." Alan Dean Foster ![]() |
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Yes, actually both correct the way I see it. The terms 'flopcounter', 'estimate' and 'controller' are, for the purposes of scheduling the tasks, and mildly stretching semantics, more or less interchangeable. It doesn't matter whether you estimate tasks in seconds of work, #fpOperations, Cobblestones, or bananas. It's the function of the estimate that's important, and the quality of its predictions with respect to requirements. Unfortunately there is a disconnect between user expectations/needs and the CreditNew spec, which has not specified any metrics to determine whether it's working properly or not.

Something related ML1 touched on as well was the idea of 'fudge factors', and a reasonable aversion to them. The thing is, if it meets the (currently non-existent) specifications/requirements, then it isn't a fudge-factor, it's a control point. If it's a logically justifiable procedure for solving a problem, it's an algorithm. So the 'real' first step IMO is specifying the requirements/needs (based on our experience that the current situation isn't good enough). Debating semantics is what the Committee is for :-D
Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
Think convergence on +/- 10% elapsed estimates on a given host, adapting to major changes within 10 tasks, would be enough of a goal?

Anyone :) As per my last post, the requirements spec for CN appears to be missing, so we get to make one :)
kittyman Joined: 9 Jul 00 Posts: 51445 Credit: 1,018,363,574 RAC: 1,004
The kitties like to call their credits....kibblestones.

LOL....yes.... And whilst the flavor of the current kibbles seems to be acceptable to the kitties, the quantity has been falling, so they have to get the computers to crunch harder to keep the mean level of kibbles in their bowl at a proper level.
Joined: 16 Jun 01 Posts: 6324 Credit: 106,370,077 RAC: 121
It's just a statement regarding the current state of things. Yes, it's something like that currently. But... if it were acknowledged... then those unicorns would be useless (as current credits are :) ).

Credits are used for:
1) (and most important) bragging/competition.
2) a performance measurement tool.

You propose to forget about 2), discard "FLOPS" and leave arbitrary "unicorns" instead. But all the problems that make current credit unsuitable as 2) make it unsuitable for fair 1) too. Hence, FLOPS or unicorns - people will not be able to have a fair competition, so the cries remain...

Regarding FLOPs-counting as in V2 - good (really good!) for a single project / single homogeneous app. Here we have 2 apps, one of them (MultiBeam) highly non-homogeneous (to such a degree that we have separate VLAR rules on the server). It's quite obvious that different parts of work for such an app, being assigned the same flops, will have absolutely different performance (measured, let's say, in tasks done for a particular AR) on different hardware and different ARs. So we would still have those "VHAR/VLAR storms" that bias RAC, even with credit granted as in V2 (by pre-assigned FLOPS). Biased RAC without any change in either the hardware or the software part of the host setup - just because of the current data stream (totally out of the operator's control). So, no fair competition, cherry picking in the case of an overly competitive person, and so on... the whole spectrum of negative issues remains.
Joined: 14 Jun 04 Posts: 1130 Credit: 1,967,904 RAC: 0
Wow, OK... I'm shocked. By some miracle we're all on the same page all of a sudden. Awesome. I thought you guys were gonna pick up from where we left off (almost 3yrs ago) and that I was gonna get another lecture on how credit is NOT a flopcounter.

So now what? Well now... From where I'm standing it looks like you are trying to fix something that isn't fixable. Now don't get me wrong... Jason, if you think you can crack the FLOPS problem and reel it into some semblance of reality then you're a miracle worker, plain and simple. But for now, and for your own sanity, and Richard's, and Raistmer's... and all the other gurus'. Oh, and Eric's of course :) ...

... It might be a good idea to call credits anything but Cobblestones. Because as long as you do, you're all going to go crazy.

OK, so here's the deal with the CreditNew equation:
- After the CN equation is done chasing its tail, it spits out a number.
- CreditNew calls this number "F" (after calling it a bunch of other things first).

Now let's all call "F" something like "CN" for the sake of ALL our sanity. And again, for all our sanity, let's agree for a moment that CN is a random number generator (which is one of the kindest things people have called it over the years). Unfortunately (and out of the blue) we do this with it:

CN * cobblestone_scale = CREDIT

But if CN is arbitrary, then no matter what you multiply it by, the result will also be arbitrary. Also, there's no real need to multiply it by anything (let alone Cobblestones). We can easily leave that part out. It won't break the equation or anything, because it's not really PART of the equation. It could just as easily be:

CN = Credit

And "Credit" could be anything. Anything EXCEPT Cobblestones ;)
©2022 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.