Message boards :
Number crunching :
New Credit Adjustment?
| Author | Message |
|---|---|
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
The benchmarks are not equivalent between Operating Systems and can be hacked. Using a known-flawed metric and hoping that a statistical sample normalizes it is, at least in my opinion, not scientifically valid. Efforts should be made to equalize the benchmarks before undertaking this endeavor. Once that has been done, then I would be on board with what is proposed. To give you a more concrete scientific measurement, let's say we wanted to find the boiling point of distilled water at 1 atm pressure. Let's say that 80% of the samples were obtained at 1 atm, but 20% were obtained at 0.965 atm (28.88 inHg, the current reading in Kansas City, MO). Uh oh, you don't get 212 F (100 C), but 211.61 F (99.79 C). Is that significant? It depends on the circumstances. However, the experiment is fundamentally flawed, since such a large sample was obtained outside of the stated condition of 1 atm pressure (29.921 inHg). If the scientific community endorsed 99.79 C as the boiling point of water at standard sea-level pressure (1 atmosphere), would that then make it "correct"?
|
W-K 666 Send message Joined: 18 May 99 Posts: 19983 Credit: 40,757,560 RAC: 67
|
The benchmarks are not equivalent between Operating Systems and can be hacked. Using a known-flawed metric and hoping that a statistical sample normalizes it is, at least in my opinion, not scientifically valid. Efforts should be made to equalize the benchmarks before undertaking this endeavor. Once that has been done, then I would be on board with what is proposed. As I understand it, the benchmark figures used will be a median, i.e. the highs and lows will be removed before calculation. From past discussions, a bit of research, and my memory: the equalisation of benchmarks is an impossible task. The results will never be the same on different OSes; they will never be the same when different compilers are used; they will never be meaningful unless they test the complete system, not just the CPU; different CPU architectures will give different results, all other things being equal; the same CPU architecture will give different results depending on the number of cores and the cache sizes; and the same CPU will give different results on a different motherboard. So completely new benchmarks are needed, which would probably take longer to run, and it has been said that users do not want longer, more intrusive benchmarks. It has been said that the only true benchmark is the application being run, but all the project applications are different, and some projects change applications faster than tasks take to run on other projects. |
|
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
|
To affect the system wide average, you'd have to get your hacked benchmarks onto a whole lot of computers. ... and if you did, you'd simply raise the project-wide multiplier, and those who do not have a "hacked" benchmark would get the same credit increase.
We're not trying to determine the boiling point at sea level. We're trying to determine the boiling point where people live. So, some people boil water in Death Valley, and some boil water in Denver (or at even higher elevations). ... and the answer is not 100 C, because we don't all live near sea level. |
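The fleet-wide normalisation Ned describes can be sketched in a few lines. This is a hypothetical illustration with invented names and numbers, not BOINC's actual code; it only shows why a handful of inflated benchmarks doesn't pay off when credit is normalised against a fleet median:

```python
from statistics import median

def fleet_multiplier(target, claims):
    # Normalise so the fleet's median claimed rate maps to the target rate.
    # A handful of inflated ("hacked") benchmark claims barely moves the
    # median, so it barely moves anyone's granted credit.
    return target / median(claims)

claims = [95, 100, 100, 105, 110]
print(fleet_multiplier(100, claims))             # median 100 -> multiplier 1.0
print(fleet_multiplier(100, claims + [10_000]))  # one wild outlier -> ~0.976
```

If the hacked claims became widespread enough to shift the median, the multiplier would drop for everyone uniformly, which is Ned's point: unhacked hosts see the same adjustment.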
|
Grant (SSSF) Send message Joined: 19 Aug 99 Posts: 14015 Credit: 208,696,464 RAC: 304
|
|
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
You always have a comeback, don't you? Good luck with your quest... Brian... done
|
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
The benchmarks are not equivalent between Operating Systems and can be hacked. Using a known-flawed metric and hoping that a statistical sample normalizes it is, at least in my opinion, not scientifically valid. Efforts should be made to equalize the benchmarks before undertaking this endeavor. Once that has been done, then I would be on board with what is proposed. I never said "the same". I said "similar". There is a world of difference. I gave a tolerance figure of 1-3% here earlier. This value was not intended as the delta between two systems with the same CPU on a different motherboard, but as the delta between a single system in a dual-boot configuration, thus every piece of hardware is identical. As I understand it, that delta is larger than 1-3%. Anyway, I know this point is lost on most of you, and I'm tired of trying to be a voice of reason amidst the clamor to "fix" this horrific issue... |
W-K 666 Send message Joined: 18 May 99 Posts: 19983 Credit: 40,757,560 RAC: 67
|
The benchmarks are not equivalent between Operating Systems and can be hacked. Using a known-flawed metric and hoping that a statistical sample normalizes it is, at least in my opinion, not scientifically valid. Efforts should be made to equalize the benchmarks before undertaking this endeavor. Once that has been done, then I would be on board with what is proposed. I know all about the problem with dual boot; if you go back about three years on this board, it was discussed over and over again. Some people even compiled the benchmarks separately for Windows and Linux using the same compiler on the dual-boot computer they were using, and very rarely got benchmarks within 10% of each other. In fact, they very rarely got within your 1-3% for several runs of the same benchmark on the same OS. |
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
The benchmarks are not equivalent between Operating Systems and can be hacked. Using a known-flawed metric and hoping that a statistical sample normalizes it is, at least in my opinion, not scientifically valid. Efforts should be made to equalize the benchmarks before undertaking this endeavor. Once that has been done, then I would be on board with what is proposed. Thus the reason why a more reliable metric is needed. The flop counter is more reliable. From what I'm gathering about the situation, other projects said "no can do" based on their current technology or resources. So, the "answer" is apparently to abandon the idea of the more reliable metric... What many people also don't grasp is that I am not opposed to credit equality across projects. I am, however, opposed to taking what I deem as the wrong approach to get there... Do I know for a fact this is the wrong way? No. My gut feeling tells me it is though...since we are now "building a house upon sand" (unreliable metric being used)... |
W-K 666 Send message Joined: 18 May 99 Posts: 19983 Credit: 40,757,560 RAC: 67
|
Not sure that FLOP counting is any more reliable. It just ensures that for a given task each participating host claims the same credits, which are approximately in line with the average MB * time method. I say this because we don't count each FLOP; we count operations, where each different type of operation has been guesstimated to be some number of FLOPs. Counting each FLOP would approximately double the processing time of a task. Also, FLOPs are not equal: add and multiply are short and quick, divide and sqrt are long and slow, and these individual maths ops do not cost the same on each family of CPU. It's one reason why on SETI MB the AR credit/time curve is so jagged, and why on Einstein the time for each unit varies like an inverse rectified sine wave. |
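The "count operations, not FLOPs" scheme W-K describes reads like this in miniature. The weight table below is invented for illustration and is not SETI's actual figures; the point is only that each operation type carries a pre-assigned FLOP cost that is summed, rather than every floating-point operation being counted at runtime:

```python
# Hypothetical per-operation FLOP weights (illustrative only):
# cheap ops cost one FLOP each, divide and sqrt cost several.
FLOP_WEIGHTS = {"add": 1, "mul": 1, "div": 4, "sqrt": 8}

def estimated_flops(op_counts):
    """Sum weighted operation counts instead of counting every FLOP."""
    return sum(FLOP_WEIGHTS[op] * n for op, n in op_counts.items())

# 1000 adds + 800 muls + 10 divides + 2 sqrts
print(estimated_flops({"add": 1000, "mul": 800, "div": 10, "sqrt": 2}))  # 1856
```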
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
Not sure that Flop counting is any more reliable. It just ensures that for a given task each participating host claims the same credits, which are approximately in line with the average MB * time method. So, what were / are the justifications for using flop counting? Why did we change? If things are so difficult to be at least similar (I'll even broaden that definition to +/- 5%), then why the never-ending drumbeat for "cross-project parity"? Why not just eliminate BOINC-wide standings based upon cobblestones completely? Why not just have the competition within the individual projects, not across projects? If one wanted BOINC-wide standings, then why not create a new BOINC-wide standing that uses a composite index of the relative rank within each project to come up with the BOINC-wide standing? I think there was some site out there that does something similar already. In other words, why keep pissing people off with "fixes" that are no better than the current "crisis" of not having equal credit across the board? Why not shift the focus away from the BOINC-wide ranking based on cobblestones across all projects and onto a different idea? Oh my! The heresy of what I am saying! I surely must be a CW!!!!
|
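Brian's composite-index idea could be sketched as follows. This is purely hypothetical; no such BOINC-wide statistic exists in the proposal, and the averaging of percentile ranks is just one way his "relative rank within each project" suggestion might be realised:

```python
def composite_standing(percentiles):
    # Hypothetical cross-project standing: average the user's percentile
    # rank (0-100, higher is better) within each project they run,
    # instead of summing raw cobblestones across projects.
    return sum(percentiles) / len(percentiles)

# e.g. 90th percentile at one project, 50th at another
print(composite_standing([90.0, 50.0]))  # 70.0
```

Because each rank is relative to that project's own population, per-project credit policies would no longer affect the BOINC-wide number, which is exactly why parity would become moot.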
jason_gee Send message Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0
|
Just some random thoughts on operation counting etc. Actually, flop counts are quite precise and well studied. The approach is to say "I just performed such-and-such a calculation (e.g. a Fast Fourier Transform of size n), therefore I did x (e.g. k·n·log n) operations". They have a solid foundation in algorithmic complexity which cannot be circumvented, are not estimates, and have 30+ years of scientific research hammering them down to the last floating-point operation. Opportunities exist to choose different implementation algorithms based on hardware bias toward additions & multiplies, other special circumstances etc., but these always come at the expense of more operations (no-free-lunch theorem), and current hardware tends to favour balanced algorithms rather than the historical "lots more adds for somewhat fewer multiplies" approaches. No shortcuts. On modern hardware, in tandem with modern compilers, a divide is typically treated as a reciprocal & multiply (with hand treatment where the combo fails). The hybrid RISC-CISC microarchitecture, with out-of-order execution, micro-op fusion and cache considerations, clouds compound exceptions like this and square root, and squeezes them out of significance; to an extent, some more complex operations are even desirable, as runaway stretches of fast additions will tend to drain the CPU front end, creating resource stalls and slowing the end result. Floating-point operations, on average, are equal. The jagged flop/AR curve depends entirely on the fact that each AR requires a different number of operations to process, irrespective of machine speed. I expect that is a function of telescope geometry and search parameters rather than of which operations are faster than others.
The 'somewhat flatter' (but still jagged in the same places) WU time vs AR curve is a different matter, and is completely dependent on machine architecture, the memory subsystem, and microarchitectural optimisations within the application. It is those hardware and software design considerations that make this curve 'flatter' than the original flop curve, while doing no fewer operations on the data. IMO, considering the above and the diversity of platforms and applications in use, I personally can see that a heuristic/adaptive credit multiplier, similar to the one described, may well be the only 'practical' way to prevent direct inflation from Moore's law in concert with optimisation and compiler technology. Based solely on flops with no multiplier, projects unable to provide the computing efficiency achieved here would be priced out of the BOINC credit market before they started. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to Live By: The Computer Science of Human Decisions. |
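Jason's FFT example can be made concrete with the classic textbook estimate: a radix-2 complex FFT of size n costs roughly 5·n·log₂(n) real floating-point operations. The constant 5 is the standard rule of thumb for that algorithm, not a SETI-specific figure, and real implementations vary around it:

```python
import math

def fft_flops(n, k=5):
    # Textbook operation-count estimate for a radix-2 complex FFT:
    # roughly k * n * log2(n) real floating-point operations, k ≈ 5.
    # The count follows from the algorithm itself, not from timing it.
    return k * n * math.log2(n)

print(fft_flops(1024))  # 5 * 1024 * 10 = 51200.0
```

This is his point about algorithmic complexity: the operation count is a property of the chosen algorithm and input size, so it cannot be gamed the way a benchmark can.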
|
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
|
Brian, I'm simply trying to make sure we're talking about the same thing. Your analogy seems relevant, and you're right: your experimental results would not be valid as stated, if that was the goal. ... and my response is partially correct. The question we're really trying to answer is: what is the average altitude where people live? If we can't ask them, but we can ask them to boil water and measure the temperature at boiling, we can calculate their elevation from the measurement. Some will measure on a day when the local barometer is high, some will measure when it is low, and some will have inaccurate thermometers. Some will just say "it's 212, of course!" But on average, we'll get a pretty accurate average boiling point, and be able to calculate an accurate average elevation. Your statement "You always have a comeback, don't you?" is an ad hominem argument. Instead of going after the facts presented, it is easier to discredit me. The problem is that a change has been proposed, and there is much Fear, Uncertainty and Doubt being cast around. We seem to go very quickly from "BOINC is going to use benchmarks averaged over the fleet" (which is true) to "we're going to normalize at 100 credits/day for the median machine" (which is false). The same is true of your "boiling point at 1 atm" argument. It's interesting, it has facts in it, but it does not match the proposal. Eric's right: credit adjustments are "the third rail", and ultimately, the only answer acceptable to everyone may be "credit must never be adjusted." If credit cannot be adjusted, and every application needs to be tweaked before release to provide identical credit to the previous application, it may mean "SETI can never release a new science application." -- Ned |
|
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
|
Not sure that Flop counting is any more reliable. It just ensures that for a given task each participating host claims the same credits, which are approximately in line with the average MB * time method. As I remember, FLOP counting is a direct response to the benchmark * time variations in the earlier application. As has been noted, the benchmark may run unusually fast or unusually slow on some machines. It may fit in the cache (fast memory) while the actual science application does not, or it may be heavy on specific floating point instructions that are particularly good or bad for that CPU. Many of Jason Gee's comments in this message also apply to the benchmark. Three results form a quorum, and the claimed credits might have been 45, 20 and 15. The highest and lowest were thrown out, and 20 was granted. ... and at that time, the message boards were chock-full of "why did I claim 45 but I only got 20" (or "why was I cheated out of 25 credits!"). So, FLOP counting may not be better, but it is incredibly consistent. Just about everyone claims the exact same number. |
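The quorum rule Ned recalls can be shown in miniature. This is a sketch of the described behaviour (sort the claims, drop the highest and lowest, grant the middle), not the actual SETI validator code:

```python
def granted_credit(claims):
    # Benchmark-era quorum rule as described: with three claimed credits,
    # discard the highest and lowest and grant the middle value.
    s = sorted(claims)
    return s[len(s) // 2]

# Ned's example quorum: claims of 45, 20 and 15 -> 20 granted
print(granted_credit([45, 20, 15]))  # 20
```

The host claiming 45 "loses" 25 credits relative to its claim, which is exactly the complaint that filled the boards; FLOP counting sidesteps it by making every host in the quorum claim the same number in the first place.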
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
It was used because I tire of the jousting / fencing with words and redirection. I never knew what was meant by things needing to be "a certain way"; it was never explained to me. As an example, instead of conceding, as you did in your follow-up post, that measurements taken under a different environmental setting contaminated the sample, you elected to deflect the discussion, avoiding a concrete example to which the obvious answer was "no, it is not correct that the boiling point of water at 1 atmosphere pressure is 99.79 C". Furthermore, I know that you're smart enough to know that I knew about the runtimes being uniform in the early stages of SETI, back in 1999 or 2000. I condensed a lot of things so as to not make a lengthy post even more verbose. I suppose I could've added the quote from Blaise Pascal about lacking the time to make the message short, but I didn't know we were going to get hung up on that point of minutiae...
Evidence to suggest that large numbers of users flock to projects that award more credits has never been demonstrated. Yes, there is a subset of users, but even with theoretical "uniform credits", there will still be some degree of deviation, which the exact same subset of users will get out their proverbial slide rules and figure out. Uniform credits are not the only way to "skin the cat", as it were. If you take away the BOINC-wide cobblestone-based standings, cross-project parity suddenly becomes absolutely meaningless. You could then let each project grant whatever it wants. I know the immediate "knee-jerk" response to that will be "oh, but that will cause credit wars, credit inflation, hyperinflation, etc., etc."... That could already have happened. It has not. You mentioned FUD; well, the whole paranoia about credit inflation does indeed seem to be FUD generated by David Anderson / BOINC, and adopted as gospel by various volunteer participants. Also, even if it were to happen, all it would be doing is following standard free-market principles, notably supply and demand. It is quite ironic that Berkeley, traditionally a Left-leaning city, is where these anti-competitive and anti-free-market ideologies are coming from... To illustrate: BOINC credits do not really cost a project anything. If a project boosts credits through the roof and this time, unlike other times with those "nasty and evil" projects (like RieselSieve, QMC, and Cosmology), users actually flock in droves to the new project and its servers are overwhelmed, then that project will have learned the lesson not to do that again without sufficient hardware infrastructure. The same lesson would be learned by the project(s) that incurred losses. All of this is theoretical anyway. As has been shown time and time again by the data, these migratory effects are not happening, nor are credit wars happening.
All the pounding of one's chest by David in his decree (the 20% rule) within the past year may have drawn attention to the issue in the lower-paying projects that previously were just slap-happy with life and unaware that things were different elsewhere... In other words, he may have sown the seeds of discontent on his own. As for the BOINC-wide stats, they are meaningless as they stand anyway. Until / unless a completely new paradigm is established and current levels are frozen (à la the "SETI Classic" figures here), the standings at a BOINC-wide level are an utter mess. At any rate, no amount of logic or reason will stop the drumming for this to happen. It is David's "pet peeve"... the "burr in his saddle". It also has a passionate group of individuals in the rank and file of the participant base. As I said, good luck with your quest...
|
|
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
|
... and this is why I "always have a comeback." I didn't say "it is a problem because people will go to other projects." I said it is a problem because people get upset at the prospect of any change, no matter what the goal. ... and it is a problem because, no matter what, it becomes emotional very quickly. It is very disappointing, because this is a technical forum and a technical endeavor, and we should be able to discuss things on their merits. Your example of boiling at 1 atmosphere, while technically easy to understand, does not match what is being done. It is incorrect. The boiling point of water at 1 atmosphere is a constant. The average credit on a project is a variable: as slower computers are retired and faster computers join, the average credit rises. But no, we can't go there. We can't question someone's example without questioning their intelligence. So we end up with ad hominem attacks -- we go after each other instead of trying to seek the truth. ... and it becomes clear that if you want to change the SETI science application in any way, you really can't, unless you get the multiplier exactly right before you release it anywhere. Eric said in his post that the multiplier for 6.0 should be about 30% lower to adjust for variations in accounting. Why? A different mix of instructions? A better FFT library? It doesn't matter. There are those who say "if the multiplier is not 2.85 I quit", yet releasing with 2.85 will give us a 30% increase in the rate of credit. I don't have an answer. I'm a technologist, and this is pure politics. |
|
Josef W. Segur Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0
|
... The work produced by the mb_splitter in SETI Beta for testing 6.02 has been almost all Very High Angle Range. There were essentially no changes from 5.27 to 6.02 affecting speed, it was all to do with adapting to the BOINC version 6 method of graphic display. I can only conclude that Eric has been too busy to adjust the numbers he has for 6.02 to compensate for the known higher credit/time rate of VHAR work. Luckily he has instituted a credit adjustment method which will work from the actual mix of work here rather than flawed figures from inadequate Beta testing. Joe |
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
"The goal" is the same as the previous goal: cross-project parity. This is why the "credit war" was mentioned. Just because you seem unable to form a linkage in your mind does not mean that the linkage is not there, especially given that Eric openly stated so in his post. If you wish to ignore one part of something, that is your choice, but it doesn't make you right. It is very disappointing, because this is a technical forum and a technical endeavor and we should be able to discuss things on their merits. Eric said: "So, here's the story... Right now SETI@home faces two problems. One technical and one political." Eric also said: "The political problem is related to this. SETI@home has been granting about 15% more credit per cpu second than comparable projects. Other projects have threatened to increase their own credit multipliers to compensate. The problem is that they all have different ideas about how much credit we should be granting. One project has threatened to give 50% more credit per second than the benchmarks would indicate they should. So to avoid the coming credit war, BOINC is implementing this credit multiplier BOINC wide. This will be an objective way to make sure that projects don't grant too much credit. In other words, this will (probably) be happening at most every cpu intensive BOINC project." At this point, if you keep refusing to agree that there is a political element to the situation, the only thing I can say is that you appear to have very selective reading. I don't have an answer. I'm a technologist, and this is pure politics. OH! So you did finally catch on that this is political. Whew! The problem is that oftentimes completely technical / clinical ways of addressing issues do not jibe with human emotion. One example was in the movie I, Robot.
When Will Smith's character and the girl were both in cars drowning, the robot made the "cold" decision that Will Smith's character had the higher statistical odds for survival, so it rescued him instead of the girl.
|
OzzFan Send message Joined: 9 Apr 02 Posts: 15692 Credit: 84,761,841 RAC: 28
|
The atmosphere is getting pretty tense in here. Please refrain from allowing this discussion to head toward personal attacks. |
|
Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0
|
BTW, you haven't figured out that I'm not reacting to what SETI wants to do internally. I don't really care. It's the whole BOINC-wideness that's going on, which you appear not to be focused on at all, likely because you do not participate much in other projects... I'm trying to get you to understand that cross-project parity is being shoehorned in along with an internal SETI issue.
|
|
1mp0£173 Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0
|
I don't have an answer. I'm a technologist, and this is pure politics. The fact that this has become political is not news. It is so blindingly obvious that I'm surprised you'd think anyone would miss it. Read Joe Segur's post. It actually explains pretty well why the proper multiplier can't easily be determined before 6.02 is released... and probably why the same mechanism would benefit Astropulse. In politics, you can't get an idea across without "spin" -- I say "it's political" and you spin that into "what, you didn't know that?" Not what I said. Any idiot can see this is politics -- and anyone can call anyone who disagrees with them an idiot if they don't want to argue on the merits. It's a technical problem, and I'm disappointed that it can't be discussed rationally, based on the facts. |
©2026 University of California
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.