CUDA and Resource Share

Message boards : Number crunching : CUDA and Resource Share


Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 844635 - Posted: 24 Dec 2008, 17:37:30 UTC

Executive Summary:

For those who have no patience for reading anything important that is longer than two sentences, or who lack the capacity to retain and process a complex subject containing large amounts of information. Even so, I will try to be brief.

CUDA is here and this new capability has broken the Resource Share Model used to allocate work among projects on a specific machine.

This problem needs to be fixed. The more obvious fixes would unfortunately rely on cross-project credit compatibility; a problem which has long been allowed to fester because it is deemed to be unimportant.

- Credit was to be allocated based on the processing power of a theoretical computing device.

- Resource Shares were to allow participants to allocate the use of their computer among the various projects for which they were contributors.


Problem Statement:

There were only a few design goals for BOINC, among which the first and foremost was that multiple projects would be seamlessly supported on the computing resource. Because the various projects would have wide variations in work requirements, a theoretical computer would be used to establish the "gold standard" by which a participant's computer's processing power would be measured. With that established, it would be a straight-line calculation of how much credit would be allocated, based upon the Cobblestones-per-second capability of the participant's computer (as measured) multiplied by the time spent processing the work. Several flaws in this model were established quickly, even before BOINC left Beta testing, but were dismissed as either irrelevant or inconsequential. Now, discussions of this issue are dismissed as being impossible to solve. One can assert this as being true, just as a journey is impossible to complete if it is never begun.
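
In rough Python, the original model is just this (a sketch, not BOINC's actual code; I assume the commonly cited definition of 100 Cobblestones for one day's work on a 1 GFLOPS reference machine, and the exact constants have shifted over the years):

REF_FLOPS = 1e9               # the theoretical reference computer: 1 GFLOPS
COBBLESTONES_PER_DAY = 100.0  # credit the reference machine earns per day
SECONDS_PER_DAY = 86400.0

def claimed_credit(benchmark_flops, cpu_seconds):
    # speed relative to the reference machine, times days spent crunching
    days = cpu_seconds / SECONDS_PER_DAY
    return (benchmark_flops / REF_FLOPS) * days * COBBLESTONES_PER_DAY

# a host benchmarked at 2 GFLOPS crunching one task for 6 hours:
claimed_credit(2e9, 6 * 3600)   # -> 50.0 Cobblestones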

Why this is once more relevant is that with CUDA, or GPU co-processor (video card) processing, being added to BOINC, we are once again faced with a situation which we can face squarely and eliminate with a relevant and usable solution, or we can adopt the usual "fix", bury our heads in the sand, and pretend that the problem does not exist. If only wishing could make it so. With the addition of the GPU co-processor as a viable computing resource on appropriately equipped machines, we have further exacerbated the cross-project credit problem, added a new credit calculation problem, and, as if that were not enough, we have also broken the Resource Allocation mechanism.


Cross-Project Credit:

I will not re-hash this problem here. It has been highlighted by the participant community for years, and the documentation of the problem and the proposed solutions are still available for those that are interested. This problem is of such long standing that it was one of the most discussed topics during the Beta test of BOINC (during which we proved the existence of several major flaws in the design of the benchmarking system, including its inconsistency across operating systems). Each time this problem is raised, the objections of the developer community are summed up as:

- Credit should be done away with because I don't like it.
- Those that want credit miss the importance of the science.
- The problem is not solvable (see my statement above).
- It is not an important problem.
- There are not enough developers (which is mostly because there is little point in working on a fix for a problem if the likelihood of its acceptance is minimal; this too is documented elsewhere, and I refer specifically to the BOINC Dev mailing list for examples).
- etc.

All of which miss the most important point: Credit, and the fairness of its awards, is, and always has been, important to the participant community at large. Even more importantly, it is the only way we have of measuring our contribution to the science done (the lack of feedback to the participant community is also a well known and documented problem, and the variation between projects in outreach and feedback is vast, but this too is a separate issue much discussed elsewhere).


Resource Share:

The point of the resource share numbers is that by joining two projects, each with equal share numbers, the computing resource, *OVER TIME*, will be evenly divided between the two projects. On a single core processor, 50% of the time will be spent on one project and then the other. The CPU will be applied to one project for a set period of time and then switched to the other project for a like period of time. With dual core processors the likelihood is that two tasks will be running at the same time, with one core running a task from project 'A' and the other core running project 'B', *ALL THINGS BEING EQUAL* (which they never are). However, the point is that over the long run, the projects will be served relatively equally.

Were cross-project credits comparable, we would expect equivalent RAC values and, over time, an equal increase in project credit scores.

Herein lies the issue. The assumption is that the computer has a pool of resources that are of equal processing power. The one core / single CPU is of course the trivial case. But, in the other cases, the computer will have two, four, eight, sixteen, etc., processors, each of which is, individually, just like all of the other individual processors within the computer. In my Q9300, each core is identical to the other three. In the Dual Xeons, the two Xeon processors are equal, and the HT virtual processors likewise are equivalent. My Mac Pro has two four-core Xeons sporting 8 identical processing elements. And, lastly, this is also true of my newest jewel, the i7-equipped machine with 4 HT-capable cores. But the central truth is that the resource pool only contains elements that are interchangeable. Until now... With CUDA, this identical capability of processing elements is no longer valid.

Even more interesting is that we cannot even assume that the CUDA processing elements will be of equal capability. The i7 machine can support 3 video cards, none of which has to be the same as the others. What this means is that we have no easy metric to specify how to allocate work across the processing elements so as to allow the participant to control his or her project participation through the assignment of Resource Shares.
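
To put rough numbers on the breakage, a hypothetical sketch (the throughput figures are invented purely for illustration):

day = 86400.0
pool = {"cpu_core": 2e9, "gpu": 80e9}   # flops each element can sustain

# Project A is CPU-only; project B has both CPU and CUDA applications.
# "Equal" 50/50 Resource Shares split CPU *time* evenly, but B also gets
# the whole GPU, so the *work* delivered is nothing alike:
work_a = 0.5 * pool["cpu_core"] * day                  # A's half of the CPU
work_b = (0.5 * pool["cpu_core"] + pool["gpu"]) * day  # B's half, plus GPU

work_a / (work_a + work_b)   # ~0.012: equal shares, about 1% of the work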


Complications:

Though it is early days in the development of this capability, the most likely outcome for a long time to come is that we will have the following types of projects:

- Those with CPU only Science Applications.
- Those with CUDA only Science Applications.
- Those with both CPU and CUDA Science Applications.

We will also have participants who:

- can run CPU only Science Applications.
- can run CUDA only Science Applications.
- can run both CPU and CUDA Science Applications.

Adding to that complication we have the unpleasant fact that some that can run CUDA Science Applications may not want to run them (because of gaming, heating considerations, etc.).

More directly, we have the complication of how to measure the processing time applied. Do we add the CPU time used to the GPU time used (can the GPU processing time even be established correctly?), or do we ignore the CPU time as trivial overhead? Do we simply assign standard-sized credit values to valid work, further degrading the original definition of the Cobblestone? Have the recent "adjustments" to the credit system completely removed the correlation between the original definition of the Cobblestone and the supposed reality of today's awards?

Oh, and we have not addressed the fact that the GPUs may not have similar capabilities even when they have identical model numbers.


A Solution:

In the dimness of time I proposed what was a rather sophisticated scheme to calibrate our computers with standard tasks and within that proposal were mechanisms to coordinate the credit awards across projects. The proposal was of course not well received but it is still well suited to solve what is now two major problems. With an accurate measurement of resource capability, we can then correctly calculate resource allocations.


Concluding remarks:

In the world of science in which I am a participant, I am continually astounded by the cavalier attitude of the scientists in control of the various aspects of the BOINC system with regard to the accurate measurement of something that matters to a segment of the community given so little regard. I speak, of course, of the participant community, without which there would be no work processed and therefore no science, and of their simple and obvious desire for accurate recognition of their contributions and the ability to accurately allocate their resources among projects.

The need to be able to correctly and accurately measure computing capability is no longer only a necessity for calculating credit allocations; it is also needed so that we can accurately allocate the available resources to the pool of work in accordance with the desires of the participant. Which is why I am confounded that people whose whole careers are based on the need for precise and accurate measurements, with clear and standardized units, are so comfortable with the elasticity and variability of the fundamental measurement of the BOINC application. I grant that it has nothing to do with the direct science of the projects, but how can you stand in front of an audience and say that you have "N" participants doing "Y" amount of computing when the unit of measure is variably defined?

Anyway, my full expectation is that this will be received as other proposals have been received (and not just mine), with scorn, derision, mocking commentary, and little substantive discussion. The opportunity to resolve a problem will be missed (in this case, two problems) until in the fullness of time we will reap the fruits of the lack of action. After all, if we have perfected nothing else with BOINC, it is the ability to ignore a problem while heaping abuse on people that identify issues. I mean, it has worked so well for us so far ... look how far we have come ...



ID: 844635
Profile Dr. C.E.T.I.

Joined: 29 Feb 00
Posts: 16019
Credit: 794,685
RAC: 0
United States
Message 844711 - Posted: 24 Dec 2008, 20:23:13 UTC


. . . quite well put there Paul - Thanks and have a Good Holiday Sir


BOINC Wiki . . .

Science Status Page . . .
ID: 844711
alpina

Joined: 18 Dec 08
Posts: 22
Credit: 32,011
RAC: 0
Belgium
Message 844856 - Posted: 25 Dec 2008, 4:15:19 UTC

You make fair points there. Something should be done about the credits of the BOINC system; to some it may seem unimportant, but after all it's the cornerstone on which BOINC relies. The credit system should be as flawless as possible; the equality among projects is at stake, and that is crucial. It's a shame there is no serious discussion about this, and I think it's a shame that some users find this an obsolete discussion. It's not. Not because I'm that interested in credits, but because credit is for the BOINC system what money and the banking system are for the economy. If it's flawed, the system will fail sooner or later.
ID: 844856
piper69

Joined: 25 Sep 08
Posts: 49
Credit: 3,042,244
RAC: 0
Romania
Message 844863 - Posted: 25 Dec 2008, 4:27:09 UTC

nicely said
ID: 844863
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 844886 - Posted: 25 Dec 2008, 5:25:33 UTC

I'm going to disagree with my friend Paul, but only slightly.

I think the problem of assigning the proper credit is more difficult than anyone originally anticipated.

... and unfortunately, while the development progresses and the apps get faster, the proper multiplier (for FLOP-based projects) changes. Getting the right multiplier also depends on testers with a reasonable mix of machines.

I suspect that what Paul attributes to indifference is really fatigue.

Eric Korpela's algorithm should make things better, especially if other projects start using it.
ID: 844886
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14650
Credit: 200,643,578
RAC: 874
United Kingdom
Message 844944 - Posted: 25 Dec 2008, 9:13:23 UTC - in response to Message 844886.  

Eric Korpela's algorithm should make things better, especially if other projects start using it.

The trouble is that Eric's algorithm relies on there being some sort of quasi-fixed credit granting method (flop enumeration here, server-generated fixed values at Einstein), and enough benchmark*time hosts to normalise towards.

By working off the median of the host population, Eric gave himself quite a lot of headroom, but in his original explanation, he did point out that the method relied on there never being enough optimised applications in use to reach down towards and affect that median (probably fair enough). Now we have the CUDA apps too, which will look super-optimised to the algorithm. Add in ATI or OpenCL hosts in the future, and eventually Eric will run out of headroom.

No, Eric's algorithm is a stopgap. A good one, and a long-lasting one, but not the real solution. As Paul says, someone needs to think through the fundamentals.
ID: 844944
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 844951 - Posted: 25 Dec 2008, 10:36:46 UTC - in response to Message 844944.  

Eric Korpela's algorithm should make things better, especially if other projects start using it.

The trouble is that Eric's algorithm relies on there being some sort of quasi-fixed credit granting method (flop enumeration here, server-generated fixed values at Einstein), and enough benchmark*time hosts to normalize towards.

By working off the median of the host population, Eric gave himself quite a lot of headroom, but in his original explanation, he did point out that the method relied on there never being enough optimized applications in use to reach down towards and affect that median (probably fair enough). Now we have the CUDA apps too, which will look super-optimized to the algorithm. Add in ATI or OpenCL hosts in the future, and eventually Eric will run out of headroom.

No, Eric's algorithm is a stopgap. A good one, and a long-lasting one, but not the real solution. As Paul says, someone needs to think through the fundamentals.

The point here is that this has decoupled the Cobblestone from the original definition. I am not saying that this was wrong. Just, that we now have a yard stick that is not only variable between projects, but also varies with time as it is adjusted. At least if I have understood what little I know about what Eric tried to do ...

This means that a CS earned this year will not equal one earned next year. In effect, the rich get richer and the newcomers get the shaft ... er, debased and inflated payments.

I'm going to disagree with my friend Paul, but only slightly.

Whew! I was starting to panic ... usually I set off a firestorm of disagreement and disagreeable comment.

I think the problem of assigning the proper credit is more difficult than anyone originally anticipated.

And I disagree right back. Actually, I did mention that this was going to be a serious issue because of the selection of an artificial benchmark as the means of characterizing a system. I was able to do this because "benching" systems used to be a hobby of mine, and I quickly noted that the very popular Sieve benchmark was a really poor choice back when it was the way many rated systems. There was an uglier algorithm that also did primes, but it used loops and was not at all efficient. Because of that, it was a better benchmark: it behaved more like a real workload and could not be trivially optimized away.

And in those days I had a file folder 4" thick containing nothing but various benchmarks and articles on why the best benchmark is the actual workload to be processed. That formed the basis for the "Calibration Concept" in the paper (also referenced above), in that we would use known standardized tasks. Which also dovetails nicely with the discussion about why errors in CUDA processing do not mean that the older processing tools are correct.

At any rate, if we look waaaayyyy back, Benchmark for Win2k and Linux, and discussions like: New credit system. ... sadly, the discussions we had in the BOINC Beta did not make it to the new database. So, some of those discussions may be lost forever.

Ha! Well, maybe not all at that ... if you go *HERE* and expand the links at the top of the page you can see some of the post-Beta test discussions I found WAAAAYYY back then ...

... and unfortunately, while the development progresses and the apps get faster, the proper multiplier (for FLOP-based projects) changes. Getting the right multiplier also depends on testers with a reasonable mix of machines.

Kinda like coming off the gold standard. Now the government can print all the money it wants...

I suspect that what Paul attributes to indifference is really fatigue.

Indifference is certainly not the emotion I was attempting to convey. Nor fatigue. Sadly (and this may be my depression talking, I can't always tell), I think it is out-and-out hostility to the interests of the participant community. I have another screed in me where I hope to address this in a way that is more clear about what I mean on that subject.

I will pick on Carl Christensen, who posted that as a project developer he wants to eliminate the whole credit system (BOINC Dev mailing list, about a month ago). He, like many others, sees it as unnecessary. Now I like Carl, and I think that for the most part he is what I would consider "one of the good guys", but he could not be more wrong about this. His perspective is biased because he is a project insider and he sees things in his project that are never exposed to the outside world. And as such, he can see the progress of the project.

But, the normal participant cannot see that. Progress for virtually all projects, from an external viewer's perspective, happens at a glacial pace. One, maybe two tidbits a year ... in busy years. For many projects the most news is that the project started and is now ending. And little or nothing else.

So, the participant is left only with the measurement of his contribution.

And the actions of the developers, including the inaction, are more an attempt to kill off part of the system by neglect.

Eric Korpela's algorithm should make things better, especially if other projects start using it.

As long as we put aside the fact that our meter stick is no longer defined. Not only that, it changes length without notice. Imagine building a car with the definition of an inch changing over time. In a year or so you could not buy bolts that would work to repair the car.
ID: 844951
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 845051 - Posted: 25 Dec 2008, 18:58:34 UTC - in response to Message 844944.  

Eric Korpela's algorithm should make things better, especially if other projects start using it.

The trouble is that Eric's algorithm relies on there being some sort of quasi-fixed credit granting method (flop enumeration here, server-generated fixed values at Einstein), and enough benchmark*time hosts to normalise towards.

By working off the median of the host population, Eric gave himself quite a lot of headroom, but in his original explanation, he did point out that the method relied on there never being enough optimised applications in use to reach down towards and affect that median (probably fair enough). Now we have the CUDA apps too, which will look super-optimised to the algorithm. Add in ATI or OpenCL hosts in the future, and eventually Eric will run out of headroom.

No, Eric's algorithm is a stopgap. A good one, and a long-lasting one, but not the real solution. As Paul says, someone needs to think through the fundamentals.

The BOINC servers know the benchmark values for every machine actively crunching, and they know the CPU time for every work unit returned.

Therefore, the BOINC servers can calculate the benchmark * time (the "canonical" credit, if you will) for every work unit.

The problem with the canonical credit is that by definition, there will always be some variation between the benchmark * time score, and the actual amount of work done.

Why? Because a general purpose benchmark will never have the exact mix of instructions as all of the science applications. In fact, the whole angle-range vs. credits/hour discussion comes from the fact that we don't even have the same mix of instructions when we stick to just one science application.

If I remember correctly, Paul suggested that the solution was to use the science application as the benchmark, along with some known units.

What Eric's code does is compare the canonical credit (the benchmark * time credit as defined in Paul's yardstick) and the non-benchmark credit (which could be anything) and finds the proper ratio between the two -- and adjusts credit to calibrate.

I don't have access to the data, but I suspect the number he calculates using 100 "median" results is very close to the number you'd get if you used every work unit returned during a 24 hour (or 7 day) period.
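
In rough code, the comparison being described looks something like this sketch (made-up numbers, not Eric's actual implementation):

from statistics import median

def calibration_multiplier(results):
    # results: (benchmark*time claim, flop-counted claim) per validated task
    return median(bt / fc for bt, fc in results if fc > 0)

results = [
    (30.0, 28.0),   # ordinary hosts: the two claims roughly agree
    (31.0, 29.5),
    (29.0, 30.1),
    (3.0, 30.0),    # CUDA host: same flops, far less time, ratio pulled low
]
m = calibration_multiplier(results)       # ~1.01; the median ignores the outlier
granted = [fc * m for bt, fc in results]  # flop claims rescaled to the yardstick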

If you want an absolute level playing field, you have to allow only one processor family, and only one memory architecture.

I can't comment directly on CUDA as most of my computers draw less power than a high-end video card, but work done on CUDA should be based on the same benchmark run on the CUDA chip, not the CPU. If that is done, then the benchmark * time credit should be pretty close, since the GPU's benchmark should be orders of magnitude higher than a general purpose CPU's.
ID: 845051
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 845055 - Posted: 25 Dec 2008, 19:20:36 UTC - in response to Message 844951.  

I suspect that what Paul attributes to indifference is really fatigue.

Indifference is certainly not the emotion I was attempting to convey. Nor fatigue. Sadly, (and this may be my depression talking, I can't always tell) I think it is out and outright hostility to the interests of the participant community. I have another screed in me where I hope to address this in a way that is more clear about what I mean on that subject.

There is a saying "Never ascribe to malice what can be equally explained by stupidity."

Here is my take:

There is certainly a community among crunchers (us), and we tend to forget that there is a community among BOINC projects.

We certainly can look to the boards here and see how people interact in a community where the players have passion for what they're doing.

I can't believe that Bruce Allen at Einstein, or Carl at CPDN would not be as passionate about their projects -- as are the leaders of every other project.

It seems obvious that the squabbles over credit extend into the "projects" community, and enough people have gotten it wrong enough times that we have projects that dramatically underpay, and dramatically overpay.

Once a project "overpays" any attempt to adjust downward is going to cause a tremendous uproar among their crunchers.

If the project continues to overpay, they're going to feel pressure from their peers in the "project community."

No matter what, the project leader loses. Having a credit system, in and of itself, only makes somebody angry.

... and if you adjust at all, the new crunchers get an unfair advantage and the old work is "devalued" or seniority counts because you were around when work was worth more. Think "tar baby."

I think credit is very, very hard -- because of the politics of credit, not the mechanics. I can see why Carl would throw his hands into the air and say "do away with it entirely" because it would certainly make the problem go away.

So, there you have it. It seems intuitively obvious that you can do everything for all the right reasons, with the best of intentions, and still be described as "hostile and malicious" by either the BOINC user community or the BOINC project community.

... and then there are those who have proposed fixes. Do nothing, and you maintain the status quo and make everyone who has a better idea angry.

Until we all lighten-up, credit is going to be a big problem.
ID: 845055
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 845124 - Posted: 25 Dec 2008, 22:04:24 UTC - in response to Message 845051.  

The BOINC servers know the benchmark values for every machine actively crunching, and they know the CPU time for every work unit returned.

Therefore, the BOINC servers can calculate the benchmark * time (the "canonical" credit, if you will) for every work unit.

The problem with the canonical credit is that by definition, there will always be some variation between the benchmark * time score, and the actual amount of work done.

Why? Because a general purpose benchmark will never have the exact mix of instructions as all of the science applications. In fact, the whole angle-range vs. credits/hour discussion comes from the fact that we don't even have the same mix of instructions when we stick to just one science application.

If I remember correctly, Paul suggested that the solution was to use the science application as the benchmark, along with some known units.
...

And therein lies the difficulty. The setiathome_enhanced application uses different instructions depending on the system capabilities, and even shifts processing to a GPU if that's available. So the ratio between BOINC benchmarks and actual processing has a huge range. That doesn't prohibit Paul's calibration concept, but it does add another layer of complexity. The fundamental problem is still to get at least a majority of the large projects interested enough to dedicate time to implementing a revised credit system.

The Cobblestone concept in pure form is simply not suitable. Honest participants want credit proportional to work done, not proportional to artificial benchmarks. Eric's server-side adjustment extends the usefulness of Cobblestones as an overall cross-project touchstone, but as BOINC progresses beyond the initial focus on compute intensive uses, the Cobblestone becomes essentially meaningless. Unfortunately nobody has yet conceived a suitable replacement.
                                                            Joe
ID: 845124
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 845166 - Posted: 26 Dec 2008, 0:08:27 UTC
Last modified: 26 Dec 2008, 0:09:45 UTC

On WCG you have the "Badge" system along with their "points" (from the UD days), where as you return work you earn merit badges. The problem there is that it matters not how much work you do, as long as you have done the hours. So my "gold" badge might be 250 tasks completed where your slower computer only did 100 tasks. Still, it is a thing one may get the urge to collect (I am working on getting my "gold" for the newest project; I just got my "silver" for Christmas). The problem here is that once you get the gold (at this time) there is nothing more to shoot for. Adding badges like platinum and such extends the concept, or adding stars: at 90 days you get your gold, at 90 + 14 a bronze star to add to the gold, and so on until at 180 days you get a gold star on the gold badge. Whatever... But therein lies the problem: you have to have a system where the rewards keep coming and the system is fair about rewarding effort. And the WCG system is just a warmed-over version of the one-task-one-point system of the original SaH Classic.

The point is that many participants do work to "earn" the recognition. As scientists have proven, even dogs sense unfairness in reward systems and will fail to perform. I am not saying that people quit because of the problems in the credit system... but the problems and issues *ARE*, as we have all stated, a hot-button issue.

I am not at all convinced that most projects could *NOT* be brought on board; the problem is that for so long Dr. Anderson has championed the concept that all projects are independent and sovereign empires. Well, he got what he asked for ... anarchy, with 50+ realms and 50+ kings. When I suggested that we consider the radical notion of projects, BOINC Developers, and Participants recognizing that we *WERE* a community with common interests and goals... well, let us just say that the reception to the idea was less than tepid.

Anyway, the real genius of SaH Classic was that there was a reward system and people could get hooked on earning the counts as they did the work.

One of the ingenious ideas in BOINC was that we would have a system that was applicable across projects. Sadly, the implementation did not match up with the cleverness of the concept. And the delays in fixing the problems have only made things worse.

BUT, I also do not think all is lost. I don't think that the CS is necessarily a dead end that we need to toss and start over, even though a SYSTEM-wide adjustment might mean inflation, deflation, and award adjustments across the board. The problem is like many diseases: it slowly eats away at the healthy body.

But Ned is right: the use of a standardized task that is run on the target system is the best way to identify the processing speed of the system. If we establish that this task should be worth 14 CS and it takes 1,000 seconds on the CPU and 500 on the GPU, well, we have now characterized the system. If I run a Milky Way task on that same system and it takes 500 seconds on the CPU, you can have SOME confidence that MWay should be awarding about 7 CS for that task.
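
Spelled out in a sketch (all the numbers are the hypothetical ones above):

STANDARD_TASK_CREDIT = 14.0                 # agreed worth of the test task

cpu_rate = STANDARD_TASK_CREDIT / 1000.0    # 0.014 CS/s for this CPU
gpu_rate = STANDARD_TASK_CREDIT / 500.0     # 0.028 CS/s for this GPU

# a Milky Way task taking 500 s on the same CPU should then be worth about
cpu_rate * 500.0                            # -> 7.0 CS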

Anyway... this was not intended to devolve into a discussion about credit per se, though it does highlight that this *IS* one of the more important aspects of the issue(s) involved. It also illustrates my earlier and larger point that we do *NOT* act like a community; we act like three separate groups with completely disconnected agendas (which is the main point of my other screed, not yet written).

And one of the main points there is that when you are in a hole the first step is to stop digging. President Bush and his team never did learn that lesson and neither have we here in the BOINC Community.
ID: 845166
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 845215 - Posted: 26 Dec 2008, 5:22:26 UTC - in response to Message 845124.  

The BOINC servers know the benchmark values for every machine actively crunching, and they know the CPU time for every work unit returned.

Therefore, the BOINC servers can calculate the benchmark * time (the "canonical" credit, if you will) for every work unit.

The problem with the canonical credit is that by definition, there will always be some variation between the benchmark * time score, and the actual amount of work done.

Why? Because a general purpose benchmark will never have the exact mix of instructions as all of the science applications. In fact, the whole angle-range vs. credits/hour discussion comes from the fact that we don't even have the same mix of instructions when we stick to just one science application.

If I remember correctly, Paul suggested that the solution was to use the science application as the benchmark, along with some known units.
...

And therein lies the difficulty. The setiathome_enhanced application uses different instructions depending on the system capabilities, and even shifts processing to a GPU if that's available. So the ratio between BOINC benchmarks and actual processing has a huge range. That doesn't prohibit Paul's calibration concept, but it does add another layer of complexity. The fundamental problem is still to get at least a majority of the large projects interested enough to dedicate time to implementing a revised credit system.

The Cobblestone concept in pure form is simply not suitable. Honest participants want credit proportional to work done, not proportional to artificial benchmarks. Eric's server-side adjustment extends the usefulness of Cobblestones as an overall cross-project touchstone, but as BOINC progresses beyond the initial focus on compute intensive uses, the Cobblestone becomes essentially meaningless. Unfortunately nobody has yet conceived a suitable replacement.
                                                            Joe

... and the problem there is, how do you define "work."

Counting instructions is flawed, because different instructions take different numbers of clocks.

The number of clocks for a floating point "add" is not the same from processor to processor.

On one CPU, a floating point multiply may take 5 times as many clocks as an add, and another may be able to do a multiply in the same number of clocks.

Then we have all of the SIMD instructions.

Got a GPU? Gotta figure it into the benchmark somehow -- but only if your app. uses the GPU.

Okay, let's do something with straight time: we're now rewarding those with slow computers, and removing all incentives to optimize apps. Slow computers get the same prize as the fastest machines. That doesn't work.

1 WU equals 1 credit doesn't work when Multibeam times vary so much, and Astropulse takes much longer than Multibeam. It sure doesn't work across projects.

It may be that the Cobblestone is the worst possible standard, except for the alternatives.

I don't have an answer, but I certainly admire the problem.
ID: 845215
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 845291 - Posted: 26 Dec 2008, 14:05:00 UTC - in response to Message 845215.  

... and the problem there is, how do you define "work."


And that is the reason I was and always have been arguing for using standardized work units / tasks to make the measurements.

Counting instructions is flawed, because different instructions take different numbers of clocks.

The number of clocks for a floating point "add" is not the same from processor to processor.

On one CPU, a floating point multiply may take 5 times as many clocks as an add, and another may be able to do a multiply in the same number of clocks.


If you just abstract back to the number of operations, this is countable; not necessarily the best method, but doable.

Then we have all of the SIMD instructions.


USUALLY, somewhere there is an equivalence to the number of FLOPS done. Not always, but usually.

Got a GPU? Gotta figure it into the benchmark somehow -- but only if your app. uses the GPU.

Okay, let's do something with straight time: we're now rewarding those with slow computers, and removing all incentives to optimize apps. Slow computers get the same prize as the fastest machines. That doesn't work.

1 WU equals 1 credit doesn't work when Multibeam times vary so much, and Astropulse takes much longer than Multibeam. It sure doesn't work across projects.


SaH is actually kinda perfect as a model at the moment. IF we had a true test task, as I have suggested in another thread, that had known signal elements, and we ran that task on a specific architecture, CPU or GPU, we would have known run times. Now, if we arbitrarily (in that almost all of this is arbitrary) say that this task is worth "x" Cobblestones, we now have a mapping of time vs. productivity for that computer.

If I then run another project's task on the computer I *CAN* use that same rough productivity measurement, in that MOST of the projects are doing roughly the same types of things. I *HAD* suggested that we try to characterize with test tasks from the "biggies" so that instead of only one workload we characterized with a mixture. My suspicion is that we would find, in the grand scheme of things, that there is not that much of a difference ... potential exceptions would be projects doing only integer math (or an exceptionally high proportion of integer math) or using scaled numbers (you don't want to know).

What I mean is that there are architectural differences, but I am not sure that they are large enough that, over the course of the typical task, they are going to be dominant enough to make a difference.

Again, the SaH task is (or could be), at least for now, the bridge for CPU-only, GPU-only, and mixed projects.

It may be that the Cobblestone is the worst possible standard, except for the alternatives.


Yes, well, the question has always been: how do you define it, and how do you measure it? In the beginning I had no problems with the definition, but pointed out that it would be nearly impossible to measure it reliably. And we proved that with the tests we ran in Beta on several systems. Tests you can probably replicate today and see the same variations between repeated runs of the benchmarks, which demonstrates that the benchmark itself is flawed. For those with multi-boot capability, you can demonstrate that a simple change of the OS on identical HW produces radical changes in the bench scores (our original tests showed Windows, OS X, then Linux as the sequence, with Linux benching consistently the worst).

I don't have an answer, but I certainly admire the problem.


The determination of computing capability is one of the harder problems, which is why I was so annoyed by Dr. Anderson's cavalier attitude during Beta that the BOINC Devs would be able to make a good benchmark. Failure after failure did not dampen enthusiasm ...

Anyway, on GPU Grid people are running into RS problems where they cannot get work reliably, because the GPU tasks and CPU tasks both use RS as if the resource pool were uniform and the tasks could be run on either resource, neither of which is true.
ID: 845291
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 845293 - Posted: 26 Dec 2008, 14:12:14 UTC

I just looked at one of my last tasks:

Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.009525

Flopcounter: 19390760638150.219000

Spike count: 0
Pulse count: 3
Triplet count: 0
Gaussian count: 0


Who says we can't count the flops?
ID: 845293
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 845434 - Posted: 26 Dec 2008, 21:58:12 UTC - in response to Message 845293.  

I just looked at one of my last tasks:

Work Unit Info:
...............
Credit multiplier is : 2.85
WU true angle range is : 0.009525

Flopcounter: 19390760638150.219000

Spike count: 0
Pulse count: 3
Triplet count: 0
Gaussian count: 0


Who says we can't count the flops?

I know this looks like a really exact count, but it really counts loops, and multiplies the count by an estimate of the number of flops.

It's an approximation.

But it also makes the grand assumption that "all FLOPs are created equally."

... and that processor architectures don't change.
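
In outline, that kind of estimate works something like this sketch (the stage names and per-point weights are invented, not SETI@home's real constants):

FLOPS_PER_POINT = {"fft": 5.0, "chirp": 8.0, "pulse_fold": 3.0}  # estimates

def estimated_flops(loop_counts):
    # loop_counts: points actually processed per stage, tallied at run time
    return sum(FLOPS_PER_POINT[stage] * n for stage, n in loop_counts.items())

# the sum looks exact to 15 digits, but every weight is an estimate, and no
# weight knows how many clocks the instruction mix really costs on a given CPU
estimated_flops({"fft": 2.5e12, "chirp": 8.0e11, "pulse_fold": 1.1e11})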

It'll take some searching (unless someone has it at their fingertips), but there is a thread that discusses angle-range vs. credits/hour for Multibeam work units.

The ratio between FLOPs and credit should be a constant. It is far from it.

I think your original idea of using a representative work unit as the benchmark is a good idea, I just think that as benchmarks go, it would take more CPU time than it is really worth.

But the $64,000 question now is: what angle range "examples" do we use for a work-based benchmark, since the SETI "curve" isn't even badly curved.
ID: 845434
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 845438 - Posted: 26 Dec 2008, 22:13:10 UTC - in response to Message 845291.  

It may be that the Cobblestone is the worst possible standard, except for the alternatives.


Yes, well, the question has always been: how do you define it, and how do you measure it? In the beginning I had no problems with the definition, but pointed out that it would be nearly impossible to measure it reliably. And we proved that with the tests we ran in Beta on several systems. Tests you can probably replicate today and see the same variations between repeated runs of the benchmarks, which demonstrates that the benchmark itself is flawed. For those with multi-boot capability, you can demonstrate that a simple change of the OS on identical HW produces radical changes in the bench scores (our original tests showed Windows, OS X, then Linux as the sequence, with Linux benching consistently the worst).

It isn't that Windows and Linux are different, but that we aren't using the same compiler.

Writing benchmarks that can be compiled is difficult, because we're depending on two different compilers emitting the same instructions in the same order, and modern optimizing compilers don't do that.

A perfect optimizing compiler is going to look at benchmark code, and recognize that 99% of the instructions do not change the final result -- and optimize them out.

The better the compiler, the faster the benchmark runs.

One of the things that has definitely happened since you last actively posted on this subject: the actual meaning of the benchmark has been eroding. It is little more than a rough measure of performance, multiplied by a duration correction factor to bring work requests in line with the difference between the predicted speed vs. actual speed. You can experiment if you like with very high vs. very low DCF values and see how work requests become more accurate with time.

At least here on SETI, benchmarks in general have little to do with requested credit. That comes from the FLOP count -- and yes, processor architecture makes a difference. The current AMD architecture is not well suited to SETI, while Intel processors are better.

... and I think a sampling of the median machines provides a good approximation of how benchmark * time compares to credit derived from counting flops.

As you pointed out, this is all pretty arbitrary. I think a good approximation of some arbitrary values will do.

As for statements like "cavalier attitude" I will remind you that written communication is pretty low bandwidth -- and that tone, inflection, and attitude don't always make it through the medium. Unless you've talked to him (more bandwidth, so you can hear tone of voice) I don't think that characterization is fair.
ID: 845438
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 845463 - Posted: 27 Dec 2008, 0:02:34 UTC

Posted to the BOINC Dev mailing list:

Extending the comment made by N. Alvarez:

The policy should be generalized to extend to a list of defined types. At the moment we have:

- CPU
- GPU (CUDA)
- Network
- Non-CPU (currently, as stated by N. Alvarez, these have been network jobs, but they may not, in fact, be of that class)
- Other Co-Processor
- Coming attractions

This would mean that we would need to make matching capacity lists so that the policy would look at *ALL* potential shortfalls and bottlenecks. It may be that at the moment we cannot clearly identify the bottlenecks by classification of resource capability, but the mechanism should be generalized so that, as we become able to identify them, they can be fitted into the scheme cleanly.

In the scheduler message, then, the "req_seconds=something" becomes a general mechanism and we do not need to add new fields ... the scheduler message simply has a list of the client's requirements for the various types of resources available and needing work. The same extends to the other fields necessary (send only = CPU, GPU, Network, Other).

This fits with the abstraction of processing resources which are at least partially listed above. Extensions then become additions to the list of abstracted resources, which I would extend as suggested to GPU, CPU, Network, Storage, and Non-CPU to start. Then the algorithm searches for the accumulated shortfalls among the resources and selects the project(s) that best fit.
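
For illustration, a sketch of what such a generalized request might look like (the field names are invented, not BOINC's actual message format):

from dataclasses import dataclass, field

@dataclass
class ResourceRequest:
    resource_type: str        # "cpu", "cuda_gpu", "network", "storage", ...
    shortfall_seconds: float  # how much work this resource type is missing
    instances: int = 1

@dataclass
class SchedulerRequest:
    host_id: int
    requests: list = field(default_factory=list)  # one entry per resource

req = SchedulerRequest(host_id=42, requests=[
    ResourceRequest("cpu", shortfall_seconds=36000.0, instances=8),
    ResourceRequest("cuda_gpu", shortfall_seconds=7200.0, instances=2),
])
# the scheduler matches each entry against the applications it has for that
# resource type, instead of assuming one uniform pool and one req_seconds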

This also simplifies the coming new APIs and resources and potentially even the fact that the ATI API may be so different that it is in effect a new resource type rather than simply a variant of NVidia CUDA.

I posted here and on the SaH NC forum a discussion of the problem of having a single value for Resource Share to allocate all computing resources among projects. It is entirely possible that I may wish to share and allocate differently among the resources that are usable.

By this I mean I may want to allocate less of my network bandwidth or GPU to project A than B assuming that they both can utilize the two resources.

I am not sure that adding a resource share number to each resource is the best solution; it may be, at least in the short term ... but this is also something that needs to be considered. What I mean is that I may want to stress one project over another, and even if this means that a resource becomes underutilized, this may be fine with me. Take, for example, FreeHal, which is primarily a network-using program. If I have no other network-using program on a system, then this project will get 100% use of the machine, which may saturate my network or perform more work than I desire. On the other hand, in the same case, what if I want to maximize the usage? In the case of FreeHal I may want to use all cores available, and with my high-bandwidth connection this may be a distinct potential. Likewise with GPU Grid, where I may want use of the GPUs but may not mind shortfalls. I am not saying that *I* personally may want this, but some may want it, or not care whether full utilization is achieved.

Last objection (I hope):

What of mixed-resource projects? I cannot think of one currently, though the closest are GPU Grid and SaH with their GPU capability. Where is the mechanism that allows me to bound usage of one resource over the other? I know I am stretching here, but consider this: what if the CPU use is higher than is currently being shown (I see 0.03 displayed and yet Task Manager is showing 8%, which is nearly full usage of one CPU), and this is the share that I want to use to allocate? Again, I am not sure this is immediately solvable, but if it is kept in the back of the mind during the development we may be able to avoid painting ourselves into a corner. How do we allocate and control projects where the resource usage is not cleanly CPU only, GPU only, or network only? Granted it is today, for the most part (by all appearances, though I am not sure that GPU Grid is really not using more CPU than I expect) ... but tomorrow?

Rats, one more:

Allocation calculations to this point assume that the resource pool is uniform. With GPUs we now very definitely have a potentially non-uniform pool. When the number of GPUs is one, it is not an issue. But when >1, it is far more likely that the GPUs detected will not be of the same type, and even if of the same class they may have vastly different computing capabilities. For example, two 9800 GT cards with different memory sizes may have dramatically different computational speeds. This may require that the allocation mechanism be generalized such that each GPU is treated as a distinct resource, rather than a generalized number-of-GPUs = N calculation.

An additional consideration is that I may want to allocate only to one of the two GPUs I have installed, or have project A use GPU 2 while projects B, C, D use GPU 1 ... again, the control mechanism is not in place to allow this. One comment I saw recently was from a gamer who did not want BOINC to use his GPU at all because he felt that it was messing up his games. Which may be another problem, but I think that he did not want BOINC to use the GPU while he was gaming. A better solution would be to allow finer resource controls, like the ability to throttle CPU use while the computer is in use.

*IF* we use abstraction and extend the concept ... instead of just one pane for Processor, one for Network, and one for Disk & memory, the application would show a variable number of panes controlling each of the available resources. In my case I would have the standard three, along with two additional ones for the GPUs (one each), with one more for network usage by network applications. What I mean here is that the network usage pane conflates the network usage of BOINC itself with the use by the science applications running under BOINC. Not sure how to label the difference, but they are, in fact, two different cats and should not be controlled from the same pane. For example, I may only want BOINC to connect to servers at predefined times, but allow continuous network usage by the applications.

As additional resources become available the control panes would grow and shrink as resources come and go ...

And I am not sure we have handled the cases where a project actually will be using two or more resources and how that is cleanly handled by resource allocation. I also did not really address the question of controlling the N instances. One could make the argument that the number of CPUs (and now GPUs) to use should not be a global constraint but should also be a per-project constraint. Example: I want BOINC to use 8 CPUs and 2 GPUs, but never want SaH to have more than one of them at a time, or more than one each of them at the same time (1 CPU and 1 GPU), and so forth.

Conclusion:

I think my major objection is that you have cast the problem in terms of work fetch policy only, and that is limiting the scope of your vision of the problem. The problem is larger than just the work fetch policy. It is a failure of the design of the Resource Allocation model which *DRIVES* the work fetch policy. From my stance we not only have issues with the Resource Share model, but also with the controls available to participants to allow them to distribute their resources to the projects as they desire.

I have read and re-read this and I am not sure I am being clear, but ... hopefully this will spark some discussion that may be useful. I would also suggest a peek at the discussion in the NC forum (where I will post this also). I know we segued into a discussion of credit policy, but the point there is that because we never fixed the credit problem, the participant really cannot tell if the resource shares that they have selected are being honored. In other words, if one project overpays, then should I be setting my share lower, and how do I tell? Theory said that if I have two projects with the same share I should be seeing roughly similar RAC and credit balances. Sadly, everything connects to everything ...

Oh, and for David Ball's suggestion of an ability to reset the LTD: for those of us that have never successfully managed to get the CLI to work, could we add it to the GUI? For those of us that run BOINC forever and have large numbers of projects, I have seen odd behaviors too that are cleaned up with a reset of LTDs. I know BOINC is supposed to manage them ... but ...

On Dec 26, 2008, at 1:58 PM, David Anderson wrote:

The work-fetch policy in 6.4 and 6.5 is flawed;
it can result in idle CPUs and/or GPUs.
I don't think there's a quick fix for this problem;
rather, the work fetch policy needs to be transformed.

I've written a preliminary design document for this:
http://boinc.berkeley.edu/trac/wiki/GpuWorkFetch
(This design is based largely on suggestions from John McLeod).

If you're one of those people who enjoy the nuts and bolts of BOINC scheduling,
please review this document carefully and send me comments
(or let me know if there are parts that are incomplete or ambiguous).
I'll probably start working on the implementation middle of next week.

-- David


ID: 845463
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 845473 - Posted: 27 Dec 2008, 0:30:58 UTC - in response to Message 845438.  

As for statements like "cavalier attitude" I will remind you that written communication is pretty low bandwidth -- and that tone, inflection, and attitude don't always make it through the medium. Unless you've talked to him (more bandwidth, so you can hear tone of voice) I don't think that characterization is fair.


Ned,

I am reflecting on a long history involving multiple individuals I loosely class as developers, which includes many people, and summarizing the general attitude.

Yes, written communication is bad, and I have tried in the past to actually talk to some of the people, and yes, better understanding can result.

But the history of BOINC is that if you are not a member of one or more small groups, your ideas and comments are dismissed, denigrated, and ridiculed more than they are reflected upon. Yes, it probably does a disservice to some. So be it, and sorry about that. I am reflecting on *MY* experience historically and how I have seen others and their ideas treated. There is far too much NIH ... until time passes and someone in the appropriate clique can re-propose the idea ... of course we only lose a couple of years while waiting ...

I will further state that though I am no longer as active a poster as I used to be ... there are periods where I lurk, and what I see is still not pretty...

I think your original idea of using a representative work unit as the benchmark is a good idea, I just think that as benchmarks go, it would take more CPU time than it is really worth.


One of the reasons that I dismiss SaH as not doing real science is that there are simple flaws in the methodology that, were *I* on the review panel, I would use to impeach the results. I grant that for practical purposes the methods are "good enough". But that is not the point of science. The point of science is unimpeachable, impeccable, iron-clad, repeatable, reproducible, no-holes experiments. We don't do that here.

I made the comment in another thread that the fact that CUDA results not matching other results does not, in and of itself, prove correctness of earlier programs. The reaction there to that comment is interesting to say the least. And it is illustrative of my other point, and supports my contention here.

*IF* we had a suite of test signals that were artificially created (and though I would personally have a suite of at least 10 different mixes, with the project capacity, having as many as 100 or so is not out of the realm of desirability), and the algorithm correctly identified the pulses, peaks, etc. 100% of the time in testing, I would then say that the algorithm has been proven to be basically functional in detection. Note I would then assert that the algorithm should be tested against those same signals in the presence of noise (take the original test signals and add artificial or real noise from a pure noise source, so that it would not add inadvertent spikes). Now we have proven detection and isolation (from noise).
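
A hypothetical harness for that kind of test (the names are placeholders, not real SaH code):

def validate(science_app, synthetic_tasks):
    # each synthetic task carries data with a known set of injected signals
    for task in synthetic_tasks:
        found = science_app(task.data)          # signals the app reports
        assert set(found) == set(task.injected), \
            "missed or spurious detection on " + task.name

# phase 1: clean injected signals only; phase 2: the same signals buried in
# pure noise, proving detection *and* isolation; a timed run of either phase
# doubles as a workload-based benchmark of the host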

Is that sufficient?

In a word no ...

In the world of science and engineering you measure nothing without a calibrated instrument. And calibration of an instrument has a shelf life. On expiration the instrument cannot be used to make measurements. What is our instrument? Participant's computers. Neither tested, nor calibrated. We just use them. And compare the returns and toss them if they don't compare. Again, good enough for rough work. But you cannot call it science (IMO), because you have not proven your instruments. We have people with computers so over-clocked that they have fallen over and died... but are they the only ones? What if there are other issues that are biasing or damaging our work? Remember the FPU bug in the 486? What if we have something like that and we don't even know it?

As for too much work? Heck, we are running out of it all the time. My proposal has so many advantages for people that are SERIOUS about what we are doing that it is not funny. With known standard tasks we are not only continually proving that our algorithm works, but that it works on all those different architectures and returns accurate results. The side benefit is that it also benchmarks the computer using the actual workload.

But the real benefit is that it eliminates additional sources of questioning about the methodology used in the processes. Then you are talking about real science. Real science where you eliminate potential objections to the work, not because they have validity, but because you are a scientist, and you should want no doubt about the quality of your work.

I know that I want no doubts ... and I was only an engineer ... and a stinking software engineer at that ... and before that I was a repair technician ... whose work was invalid if I used any instrument in my work that was not calibrated ...




ID: 845473
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 845527 - Posted: 27 Dec 2008, 3:47:13 UTC - in response to Message 845473.  


In the world of science and engineering you measure nothing without a calibrated instrument. And calibration of an instrument has a shelf life. On expiration the instrument cannot be used to make measurements. What is our instrument? Participant's computers. Neither tested, nor calibrated. We just use them. And compare the returns and toss them if they don't compare. Again, good enough for rough work.

... and I picked just this one part of your message, because I completely agree, and I think this is where the confusion lies.

What we are doing is in fact "rough work."

If you want to measure the pH of a substance accurately, we have all manners of tests.

If you need a rough idea, you use litmus paper: if you get blue it's alkaline, and if you get red it's acidic. The precise color gives you a rough idea of how much, but it depends a lot.

Our main purpose here is to find those work units that may be of future interest.

... and the only answer is that if we want the kind of calibrated results that you're looking for, then you simply cannot use computers that are "in the wild" and outside the direct control of the project.

Personally, I'd like to see a quorum go back to three.

"BOINC developers" are those actively working on BOINC, IMO.
ID: 845527
Profile Paul D. Buck
Volunteer tester

Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 845574 - Posted: 27 Dec 2008, 6:29:31 UTC - in response to Message 845527.  


In the world of science and engineering you measure nothing without a calibrated instrument. And calibration of an instrument has a shelf life. On expiration the instrument cannot be used to make measurements. What is our instrument? Participant's computers. Neither tested, nor calibrated. We just use them. And compare the returns and toss them if they don't compare. Again, good enough for rough work.

... and I picked just this one part of your message, because I completely agree, and I think this is where the confusion lies.

What we are doing is in fact "rough work."

And my point is that if they don't have the validation tests for computers in the wild, then they do not have them for computers under their control. Test tasks are not used in initial testing because they don't exist. One can argue that test signal injection like that used at EaH is sufficient. Again, were I on the board I would argue that it is not ... because of the inherent limitations of the signal injected, we have only one test.

My argument is that the overhead of even minor testing is so little that there is no practical reason not to demonstrate that care was taken, even in the preliminary stages of the work. Since participants would be granted processing credit for valid work, they are not "hurt". The fact that we could use the testing tasks for computer characterization means that we are killing several birds with one stone in flight.

We characterize compute speed for credit granting purposes, we continually validate all the variations of the software for correct operation, and the computers in use for proper operation. Not bad for one task a week ...

Your test example is good, but I would use the counter-example of your doctor's tests: do you want the lab to use the paper and possibly untrained eyes, or properly calibrated equipment? It is, after all, only your life .... :)

As to the last, I think we will need to agree to disagree. Whether it is from actual and intentional malice or inadvertent and careless malice matters little ... it remains malice to the person on the receiving end. How many people have been hounded off these boards over the years? Just as an example. How many ideas have been killed before we have even tried them? How many have been killed even after they have been coded and tested?

Oh, and a quorum of three ... yes, no real reason not to ... if we are doing science. If we are just standing around playing with ourselves, well, it matters not how many times we do the work...
ID: 845574



 