New Credit Adjustment?

Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790556 - Posted: 31 Jul 2008, 21:02:06 UTC - in response to Message 790552.  
Last modified: 31 Jul 2008, 21:09:42 UTC


Much of the time it appears to be "This is part of Dr. Anderson's evil agenda for BOINC domination -- so it's bad."


At the behest of Ozzfan, I gave David the chance to explain his reasons behind the notion of manipulating the data at the stat site level. David elected to tell me that I "didn't understand" and that there was not any "true credit" that he was talking about manipulating. The bottom line there is that he admitted to a desire to engage in manipulation of some sort, just that it took place outside of BOINC, at the stat site level. As such, I do not have respect for the man. He may have achieved a lot, and he may be very smart, but I cannot respect him. It also means that, since he was so willing to manipulate the data and could not see the problem with doing so, I feel that I cannot trust data coming from him to be real and factual.

I'm hard on David for a reason, Ned...

... and I really don't care, unless your only reason to object to something is "because Dr. Anderson might be behind it."

If that's the case, then all of your "technical" arguments are called into question.


No, it's not my "only reason". I've given you several verifiable examples, specifically the ability to fudge the benchmarks. I'd also imagine that completely legitimate benchmarks for a system would be low if BOINC decided to run them while the processor was in the middle of encoding video, although I'll admit that I don't know enough about the priority ("niceness", in Linux) of the benchmark.
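
To make that concern concrete: a toy wall-clock-timed loop like the sketch below (made-up code, not BOINC's actual benchmark) will report a lower rate whenever something like video encoding is competing for the CPU, unless the benchmark runs at a higher priority.

    import time

    def toy_flops_benchmark(iterations=5_000_000):
        # Made-up stand-in for a Whetstone-style loop; BOINC's real benchmark differs.
        x = 1.0001
        start = time.perf_counter()        # wall clock, so contention from other processes shows up
        for _ in range(iterations):
            x = x * 1.0000001 + 1e-9
        elapsed = time.perf_counter() - start
        return (2 * iterations) / elapsed  # roughly two floating-point operations per pass

    print(toy_flops_benchmark())           # run it alone, then again while encoding video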

I'll go ahead and tell you what I was going to watch for a few days as well - the decay rate, if any, of the Average Credit per CPU Second as seen on BOINCstats for my Pentium 4 over at Cosmology. I'm guessing that it will remain the same, even though I'm no longer processing any tasks there and have nothing pending for that system. If I'm right, and my system is included in the calculation of the ratios in that chart, then who knows how wildly overstated Cosmology's current ratio will be... because I know that several people with large farms have stopped processing at the moment, and I'm sure people with only a couple of computers have stopped as well. If there is no decay of the value until the next time I am awarded credit on that system, then so long as my system and systems like mine are considered "active" and in the mix, the true effect of the credit reduction over there is not known, and is likely greater than what the chart will indicate.

With all due respect, how much would he accomplish if he opened a dialog with every Brian Silvers on the planet?


With all due respect, David did not say "I'm sorry, but I do not have time to discuss this with you"; he replied by telling me I just didn't understand. This was in a second email, a reply to me stating that I knew about LOAD_STORE_ADJUSTMENT and the difference between flop counting and benchmark * time, as well as server-side (Einstein). His reply was very curt, as well as demeaning. He may be a "busy man", but there is also a proper way to handle oneself if one is of that stature and is truly busy.



It's not perfect, but it's better than what we have now:


Is "doing something is better than doing nothing" always true? There are risks in both. If you "do something", you could mess it up worse than it is now if it is not properly thought through. If you "do nothing", you run the risk of things getting worse on their own.

The only thing missing from that comment is "so, why bother making changes."

... and if the projects (and BOINC developers) take that attitude, then we have what we've got, with no further chance of improvement.


I don't quite understand how you felt that "so, why bother making changes" was missing. It is implied by "doing nothing".

The bottom line here is you are more willing to rush headlong into this, where I am less willing to "do something just for the sake of doing something".
ID: 790556 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 790568 - Posted: 31 Jul 2008, 22:22:51 UTC - in response to Message 790556.  


I'll go ahead and tell you what I was going to watch for a few days as well - the decay rate, if any, of the Average Credit per CPU Second as seen on BOINCstats for my Pentium 4 over at Cosmology.

Which is true because this is how RAC is calculated. For the credit adjustment, as I read the proposal, it'll be based on a sample of machines that are actively returning work. Stop returning work, and you have opted-out of that sample.
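
For anyone curious about the decay itself: a minimal sketch of the exponential averaging usually described for RAC, assuming the commonly cited one-week half-life (illustrative only, not BOINC's actual code).

    import math

    HALF_LIFE = 7 * 86400   # seconds; RAC is usually described as having a one-week half-life

    def update_rac(old_rac, new_credit, dt_seconds):
        # Decay the old daily average, then fold in the credit granted since the last update.
        decay = math.exp(-dt_seconds * math.log(2) / HALF_LIFE)
        return old_rac * decay + (1 - decay) * (new_credit / dt_seconds) * 86400

    # With no new credit granted, RAC simply halves every week:
    print(update_rac(100.0, 0.0, 7 * 86400))   # -> 50.0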

The only thing missing from that comment is "so, why bother making changes."

... and if the projects (and BOINC developers) take that attitude, then we have what we've got, with no further chance of improvement.


I don't quite understand how you felt that "so, why bother making changes" was missing. It is implied by "doing nothing".

Maybe I was assuming that you'd still like to see some progress, or you would have said it explicitly. If you'd said "why bother" you would have expressed the opinion that progress is hopeless.

The bottom line here is you are more willing to rush headlong into this, where I am less willing to "do something just for the sake of doing something".

It doesn't look like "something for the sake of something" or Dr. Korpela wouldn't have the graphs.

It also doesn't look like "rushing headlong" because all of the information (all of the performance information and all of the credit) is available to the developers -- and the graphs are what I'd expect to see if someone has done research based on the available data.

ID: 790568 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 790570 - Posted: 31 Jul 2008, 22:27:50 UTC - in response to Message 790522.  

Is the C@H credit per second falling because of this change? Or because of some other reason?

Eric


2.023 (3884) Cosmology@Home


As I mentioned, the value should begin a rapid freefall...and it already has.

1.973 (3867)

Down almost 2.5% in a mere 15 hours.


@SETIEric@qoto.org (Mastodon)

ID: 790570 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790576 - Posted: 31 Jul 2008, 22:53:59 UTC - in response to Message 790568.  
Last modified: 31 Jul 2008, 22:56:20 UTC


I'll go ahead and tell you what I was going to watch for a few days as well - the decay rate, if any, of the Average Credit per CPU Second as seen on BOINCstats for my Pentium 4 over at Cosmology.

Which is true because this is how RAC is calculated. For the credit adjustment, as I read the proposal, it'll be based on a sample of machines that are actively returning work. Stop returning work, and you have opted-out of that sample.

...and the graphs are what I'd expect to see if someone has done research based on the available data.


You still haven't figured out that I'm saying that the available data has flaws. The "available data" told Pons and Fleischmann that cold fusion was discovered.

Also, I may have stopped returning work, but the real question is:

- When is that system removed from the calculation?

Is it 1 day? 3 days? 7 days? 14? 30? 60? For the duration that the system is included, if there was a change in the credit distribution that would've impacted that system had it continued running, that system and all like it cause the ratio to be over or understated. In the case of Cosmology, it would cause SEVERE overstating, as like I said, runtimes went up 4x (ish) while credit only went up 40% (from 50 to 70).
ID: 790576 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790586 - Posted: 31 Jul 2008, 23:11:15 UTC - in response to Message 790570.  
Last modified: 31 Jul 2008, 23:17:43 UTC

Is the C@H credit per second falling because of this change? Or because of some other reason?


Eric,

There are multiple reasons... Basically, Cosmology is a mess right now. A very big mess...

A new application was released that appears to have had minimal internal testing, causing tasks to error out immediately. A new version was compiled and then released, which took care of the issue for Windows, but the Linux app was 0 bytes (???). I'm a bit fuzzy on the Linux side of the issue, since I don't run Linux... There is a news item about that today...

As I mentioned, runtimes increased about 4X-5X, while credit increased only 40%. This caused huge drops in cpcs that will take some time to be reflected, since validation is also taking longer because hosts are not reporting back as quickly as before.
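
To put rough numbers on that (purely illustrative arithmetic, with a made-up original runtime):

    t = 3600.0                    # pretend the old tasks took an hour of CPU time
    old_cpcs = 50.0 / t           # credit per CPU second before the change
    new_cpcs = 70.0 / (4.5 * t)   # ~4X-5X the runtime for only 40% more credit
    print(new_cpcs / old_cpcs)    # ~0.31, i.e. roughly a 70% drop in credit per CPU second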

Beyond that, the errors also caused some instances where 4 download and/or other fast error conditions happened even though max_error_results is set to 3; 2 hosts were able to run to completion before the maximum number of errors was tripped, so those hosts ran the tasks and received 0 credit for results that probably would've validated.

Beyond even that, there are also instances of "too many success results", which cause all results to get 0. But wait, there's more!

The validator was not tested either, so some tasks originally processed by application version 2.12 ended up being redistributed to application version 2.14. When the quorum is formed by a 2.14 app with a 2.12 result waiting, no consensus is reached and the task is sent out again to a 2.14 application. When that new replication comes back, the 2.14 tasks validate and the 2.12 result is declared invalid, for yet another zip, zap, zero...
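
A toy model of that failure mode (invented code, not Cosmology's actual validator): if results can only agree with results from the same application version, a lone 2.12 result can never validate once 2.14 results dominate the quorum.

    def toy_consensus(results, quorum=2):
        # Group results by app version; consensus only forms within one version.
        by_version = {}
        for r in results:
            by_version.setdefault(r["app_version"], []).append(r)
        for version, group in by_version.items():
            if len(group) >= quorum:
                losers = [r for r in results if r["app_version"] != version]
                return group, losers
        return None, []   # no consensus yet; the scheduler would issue another replication

    valid, invalid = toy_consensus([{"app_version": "2.12"},
                                    {"app_version": "2.14"},
                                    {"app_version": "2.14"}])
    print(len(valid), len(invalid))   # two 2.14 results validate, the 2.12 result gets nothing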

The administrator (Scott) is a college grad and has expressed that he is about to head out. He seems to be mostly absent from the project at this point, posting something or doing something perhaps once a week, even when the project is technically down.

When the project runs normally, it runs like a "production level" project. When things go haywire, it behaves like an "alpha level" project. Blend the two and you have "beta", which is the "official" status of that project...
ID: 790586 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 790610 - Posted: 31 Jul 2008, 23:41:58 UTC - in response to Message 790576.  


I'll go ahead and tell you what I was going to watch for a few days as well - the decay rate, if any, of the Average Credit per CPU Second as seen on BOINCstats for my Pentium 4 over at Cosmology.

Which is true because this is how RAC is calculated. For the credit adjustment, as I read the proposal, it'll be based on a sample of machines that are actively returning work. Stop returning work, and you have opted-out of that sample.

...and the graphs are what I'd expect to see if someone has done research based on the available data.


You still haven't figured out that I'm saying that the available data has flaws. The "available data" told Pons and Fleischmann that cold fusion was discovered.

Also, I may have stopped returning work, but the real question is:

- When is that system removed from the calculation?

Is it 1 day? 3 days? 7 days? 14? 30? 60? For the duration that the system is included, if there was a change in the credit distribution that would've impacted that system had it continued running, that system and all like it cause the ratio to be over or understated. In the case of Cosmology, it would cause SEVERE overstating, as like I said, runtimes went up 4x (ish) while credit only went up 40% (from 50 to 70).

I don't know how BOINCstats does their calculation. As far as I know, the XML statistics they use reports results by user and host.

I went to the check-in notes for the change and read what Eric Korpela wrote.

They say:

  • Implementation of automatic credit leveling for cpu based projects that wish to use it.

  • The script calculate_credit_multiplier (expected to be run daily as a config.xml task) looks at the ratio of granted credit to CPU time for recent results for each app. Multiplier is calculated to cause the median host's granted credit per CPU second to equal that expected from its benchmarks. This is 30-day exponentially averaged with the previous value of the multiplier and stored in the table credit_multiplier.

  • When a result is received the server adjusts claimed credit by the value the multiplier had when the result was sent.



My comments:

The first bullet point says "for projects that wish to use it."

The second one basically says "if a project is going to use it, they have to make this script run" -- in other words, it does not default to "on."

I'd have to read the code in this script to find the definition of recent. I mostly do embedded stuff -- not SQL or Perl. I don't see a reference to RAC or Recent Average Credit.

It says it looks at 10,000 results picked from results returned that day. I'm not sure how it uses the data from the result, or how much comes from the host table.

It does look like you have to return a result that day, or you cannot possibly be chosen as part of that day's sample.
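
For what it's worth, a minimal sketch of what such a daily calculation might look like, using hypothetical field names (host_id, cpu_time, granted_credit, a per-host benchmark in FLOPS) and a made-up scale constant; the real calculate_credit_multiplier script may well differ.

    import statistics

    COBBLESTONE_SCALE = 1e-9 * 100 / 86400   # made-up benchmark-to-credit scale, not the server's real constant

    def daily_multiplier(results, host_benchmarks):
        # results: the day's sampled result rows; host_benchmarks: host_id -> benchmark FLOPS.
        # The multiplier aims to make the median granted credit per CPU second
        # match what the benchmarks would predict.
        ratios = []
        for r in results:
            bench = host_benchmarks[r["host_id"]]
            expected = bench * r["cpu_time"] * COBBLESTONE_SCALE   # benchmark-based claim
            if r["granted_credit"] > 0:
                ratios.append(expected / r["granted_credit"])
        return statistics.median(ratios)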


ID: 790610 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 790628 - Posted: 1 Aug 2008, 0:05:38 UTC - in response to Message 790610.  


* The script calculate_credit_multiplier (expected to be run daily as a config.xml task) looks at the ratio of granted credit to CPU time for recent results for each app. Multiplier is calculated to cause the median host's granted credit per CPU second to equal that expected from its benchmarks. This is 30-day exponentially averaged with the previous value of the multiplier and stored in the table credit_multiplier.

I'd have to read the code in this script to find the definition of recent. I mostly do embedded stuff -- not SQL or Perl. I don't see a reference to RAC or Recent Average Credit.


In the script "recent" is defined as up to 10,000 results which have been granted credit and assimilated in the last 24 hours. The timings are defined at the top of the file, so if a project wants a 60-day moving average, or 100,000 results, or results returned in the last 48 hours, that change is easy to make. There are some instances where changes like that would be required. If a project gets fewer than 10 results a day, the multiplier will bounce around by 10 percent or so. In that case you'd want to look at the last week's worth of results, and you might want to go with a 60-day average.

If there are so few hosts (<1000) that the median changes drastically from day to day, that could also be a problem. You'd address that in the same way.

It also doesn't make sense to run it if you just started your project and you don't expect the first result to come back for a year or so. Although if you have trickles, you might want to try to calculate a multiplier based upon your grants of credit for trickles.
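
The 30-day exponential averaging could then be something as simple as the sketch below (illustrative only, not the actual script):

    AVERAGING_DAYS = 30.0

    def smooth_multiplier(previous, todays_raw, days=AVERAGING_DAYS):
        # Blend today's raw multiplier into the stored value with a ~30-day time constant.
        alpha = 1.0 / days
        return previous * (1.0 - alpha) + todays_raw * alpha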


It says it looks at 10,000 results picked from results returned that day. I'm not sure how it uses the data from the result, or how much comes from the host table.


The CPU time and granted credit come from the result table. The host benchmarks come from the host table.

Eric
@SETIEric@qoto.org (Mastodon)

ID: 790628 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 790638 - Posted: 1 Aug 2008, 0:16:54 UTC - in response to Message 790570.  

Is the C@H credit per second falling because of this change? Or because of some other reason?

Eric


2.023 (3884) Cosmology@Home


As I mentioned, the value should begin a rapid freefall...and it already has.

1.973 (3867)

Down almost 2.5% in a mere 15 hours.


Apparently the run length of the tasks increased dramatically, but the fixed credits per task only went up a little.


BOINC WIKI
ID: 790638 · Report as offensive
John McLeod VII
Volunteer developer
Volunteer tester
Avatar

Send message
Joined: 15 Jul 99
Posts: 24806
Credit: 790,712
RAC: 0
United States
Message 790645 - Posted: 1 Aug 2008, 0:22:00 UTC - in response to Message 790576.  


I'll go ahead and tell you what I was going to watch for a few days as well - the decay rate, if any, of the Average Credit per CPU Second as seen on BOINCstats for my Pentium 4 over at Cosmology.

Which is true because this is how RAC is calculated. For the credit adjustment, as I read the proposal, it'll be based on a sample of machines that are actively returning work. Stop returning work, and you have opted-out of that sample.

...and the graphs are what I'd expect to see if someone has done research based on the available data.


You still haven't figured out that I'm saying that the available data has flaws. The "available data" told Pons and Fleischmann that cold fusion was discovered.

Also, I may have stopped returning work, but the real question is:

- When is that system removed from the calculation?

Is it 1 day? 3 days? 7 days? 14? 30? 60? For the duration that the system is included, if there was a change in the credit distribution that would've impacted that system had it continued running, that system and all like it cause the ratio to be over or understated. In the case of Cosmology, it would cause SEVERE overstating, as like I said, runtimes went up 4x (ish) while credit only went up 40% (from 50 to 70).

Out of the most recent 10,000 returns, pick the median host for Credits granted / (benchmarks * time). Base the multiplier for the FLOPS estimate on this host. Note that this throws out all of the high and low benchmarks and the under-requesting earlier hosts. This removes much of the variability caused by high and low benchmarks and allows the project to correct its estimate.


BOINC WIKI
ID: 790645 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790651 - Posted: 1 Aug 2008, 0:30:22 UTC - in response to Message 790610.  


I went to the check-in notes for the change and read what Eric Korpela wrote.

They say:

  • Implementation of automatic credit leveling for cpu based projects that wish to use it.

  • The script calculate_credit_multiplier (expected to be run daily as a config.xml task) looks at the ratio of granted credit to CPU time for recent results for each app. Multiplier is calculated to cause the median host's granted credit per CPU second to equal that expected from its benchmarks. This is 30-day exponentially averaged with the previous value of the multiplier and stored in the table credit_multiplier.

  • When a result is received the server adjusts claimed credit by the value the multiplier had when the result was sent.



My comments:

The first bullet point says "for projects that wish to use it."



All the rest after this is irrelevant if any single project does not use it and that project thus "goes rogue" and offers more than another.

Anyway, being included in the new sample is not what I am talking about. I am talking about the current methodology of the chart at BOINC Combined Statistics that is used to browbeat "bad" projects (1.5x SETI and higher), while not giving friendly encouragement to those down in the 0.6x-0.8x range... I'm attached to projects at both ends of the spectrum, LHC on the low side and Cosmology on the high side.

As for LHC, up until the recent server software update, they weren't even generating the cpcs value in their xml dump. The statistical sample there is very small, so the value could just as easily be lower or higher as it could be right on the money.

As for Cosmology, like I said, if the value does not decay over time without tasks coming in, and it waits until a host is considered "inactive" at the project, then hosts that decided to quit processing due to all the turmoil (see my post to Eric about the situation there) will still be weighting the average UP by a large amount until such time as they start processing again.

Now I ask you, please pay attention to this next part, Ned. What I'm about to type is what you seem to continually misinterpret...

This "equality" thing is being treated as "High Priority" nearly continuously by someone or multiple parties. It is not quite at the level of an Urgent Change Control item, but it's not far from it.

Why is it such a high priority? What justifies the high priority?

The idea that projects are going to go into ever-escalating credit amount battles between each other? That has been brought up many, many, many times, yet it hasn't happened.

Is it perhaps the idea that users are selecting projects that pay out more to increase their standings, leaving deserving projects in the cold without enough resources to complete the data analysis, either at all or not quickly enough? No project that I'm aware of is currently begging for volunteers, so I'd say this is not a good reason to have a high priority on "fixing" things.

So, if the dire predictions (aka "doom and gloom" and/or "FUD") are not coming to fruition, then why is this such a burr in someone's saddle? Is it because "it's the right thing to do"? If so, that is indeed a noble reason, but it doesn't mean that it needs to be rushed or halfway done, especially given the fact that the doom and gloom scenarios are not happening...
ID: 790651 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790668 - Posted: 1 Aug 2008, 0:48:38 UTC - in response to Message 790645.  
Last modified: 1 Aug 2008, 0:57:58 UTC


Out of the most recent 10,000 returns, pick the median host for Credits granted / (benchmarks * time). Base the multiplier for the FLOPS estimate on this host. Note that this throws out all of the high and low benchmarks and the under-requesting earlier hosts. This removes much of the variability caused by high and low benchmarks and allows the project to correct its estimate.


Given the popularity of Core2 Duo / Quad, the most recent 10,000 will be heavily tilted toward high benchmarks and lower times, making the median machine more of a relatively new machine. To illustrate, if a Quad returns 20 results at a time (not unheard of), then only 500 machines doing that make up 10,000 returns.

If this is the case, those of us with non-Intel and/or single-core Intel machines could be pushed to below median fairly regularly... This could then initiate "slow-host-oriented project shopping", bringing to fruition another fear of people shopping for credit, with a new twist. I'm too tired to think through the math at the moment though...
ID: 790668 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 790747 - Posted: 1 Aug 2008, 2:38:05 UTC - in response to Message 790628.  


* The script calculate_credit_multiplier (expected to be run daily as a config.xml task) looks at the ratio of granted credit to CPU time for recent results for each app. Multiplier is calculated to cause the median host's granted credit per CPU second to equal that expected from its benchmarks. This is 30-day exponentially averaged with the previous value of the multiplier and stored in the table credit_multiplier.

I'd have to read the code in this script to find the definition of recent. I mostly do embedded stuff -- not SQL or Perl. I don't see a reference to RAC or Recent Average Credit.


In the script "recent" is defined as up to 10,000 results which have been granted credit and assimilated in the last 24 hours. The timings are defined at the top of the file, so if a project wants a 60-day moving average, or 100,000 results, or results returned in the last 48 hours, that change is easy to make. There are some instances where changes like that would be required. If a project gets fewer than 10 results a day, the multiplier will bounce around by 10 percent or so. In that case you'd want to look at the last week's worth of results, and you might want to go with a 60-day average.

If there are so few hosts (<1000) that the median changes drastically from day to day, that could also be a problem. You'd address that in the same way.

It also doesn't make sense to run it if you just started your project and you don't expect the first result to come back for a year or so. Although if you have trickles, you might want to try to calculate a multiplier based upon your grants of credit for trickles.


It says it looks at 10,000 results picked from results returned that day. I'm not sure how it uses the data from the result, or how much comes from the host table.


The CPU time and granted credit come from the result table. The host benchmarks come from the host table.

Eric

... and all of this data is for the given project, not any other, because it's working against the result table and host table for that project, right?

Thanks for the clarifications. I do mostly embedded systems these days, and that doesn't give much opportunity to use SQL or Perl.
ID: 790747 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Send message
Joined: 18 May 99
Posts: 19078
Credit: 40,757,560
RAC: 67
United Kingdom
Message 790754 - Posted: 1 Aug 2008, 3:23:11 UTC - in response to Message 790668.  


Out of the most recent 10,000 returns, pick the median host for Credits granted / (benchmarks * time). Base the multiplier for the FLOPS estimate on this host. Note that this throws out all of the high and low benchmarks and the under-requesting earlier hosts. This removes much of the variability caused by high and low benchmarks and allows the project to correct its estimate.


Given the popularity of Core2 Duo / Quad, the most recent 10,000 will be heavily tilted toward high benchmarks and lower times, making the median machine more of a relatively new machine. To illustrate, if a Quad returns 20 results at a time (not unheard of), then only 500 machines doing that make up 10,000 returns.

Don't think the core2's have taken over the BOINC world yet. The majority of CPUs, by a long way, are P4's between 2.8 and 3.2 GHz (see the BoincStats Seti CPU Breakdown), which have low benchmarks (BM) and long times. The high-end X2's have high BMs, higher than Q6600's, but longer times.
Also, if others have the same experience as my core2's, they usually need to request more work before they get anywhere near reporting 20 units (connect interval 0.1 days). Five or six tasks reported is the usual max for MB tasks.


If this is the case, those of us with non-Intel and/or single-core Intel machines could be pushed to below median fairly regularly... This could then initiate "slow-host-oriented project shopping", bringing to fruition another fear of people shopping for credit, with a new twist. I'm too tired to think through the math at the moment though...

ID: 790754 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Send message
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 790759 - Posted: 1 Aug 2008, 3:43:33 UTC - in response to Message 790668.  

Given the popularity of Core2 Duo / Quad, the most recent 10,000 will be heavily tilted toward high benchmarks and lower times, making the median machine more of a relatively new machine. To illustrate, if a Quad returns 20 results at a time (not unheard of), then only 500 machines doing that make up 10,000 returns.

If this is the case, those of us with non-Intel and/or single-core Intel machines could be pushed to below median fairly regularly... This could then initiate "slow-host-oriented project shopping", bringing to fruition another fear of people shopping for credit, with a new twist. I'm too tired to think through the math at the moment though...

High benchmarks and lower time do go hand in hand, demonstrating that there is a correlation between the benchmarks and actual work capability. If the benchmarks were perfect for our purposes, it would be a fixed ratio. Eric is taking the median of the actual ratios as representative.

For this project, the most recent 10,000 results is about a 10 or 15 minute snapshot just before the script is run. Because most hosts are using always-on connections and report at any time of day, that's almost certainly a decent sample. There could be some small geographical effect from those like me who only connect about once a day, though. If the script is run during my sleep period my hosts will never contribute, but there are enough similar hosts that I don't see any problem in that.

=====================

The comparisons based on the <credit_per_cpu_second> values in the host stats do have some significant differences from the adjustment method. Each host is equally weighted no matter how much work it's doing, but I don't know what criteria the stats sites choose to keep the comparison among "active" hosts, so the minimum to be included is uncertain. Basically, the method identifies the set of hosts which are active and common to the two projects; the sum of all those hosts' <credit_per_cpu_second> values for each project is computed and the ratio taken between those two totals. When I looked at the code for <credit_per_cpu_second> some time ago, I estimated it has about a 2.5 day time constant. It's 2.5 days of CPU time, though, and a quad can have up to 4 days of CPU time in 1 day of wall time. That makes it change fairly quickly with changes in work characteristics. Because the host stats are only dumped once a calendar day, some of those variations won't be seen; it's effectively a snapshot.
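
A rough sketch of that comparison, assuming the two host dumps have already been parsed into dictionaries keyed by cross-project host ID (the parsing and the "active" filter are left out):

    def cross_project_ratio(cpcs_a, cpcs_b):
        # cpcs_a, cpcs_b: cross-project host ID -> <credit_per_cpu_second> for two projects.
        # The ratio is taken over hosts common to (and active on) both projects.
        common = cpcs_a.keys() & cpcs_b.keys()
        total_a = sum(cpcs_a[h] for h in common)
        total_b = sum(cpcs_b[h] for h in common)
        return total_a / total_b if total_b else float("nan")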
                                                             Joe
ID: 790759 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790769 - Posted: 1 Aug 2008, 4:56:50 UTC - in response to Message 790759.  

Given the popularity of Core2 Duo / Quad, the most recent 10,000 will be heavily tilted toward high benchmarks and lower times, making the median machine more of a relatively new machine. To illustrate, if a Quad returns 20 results at a time (not unheard of), then only 500 machines doing that make up 10,000 returns.

If this is the case, those of us with non-Intel and/or single-core Intel machines could be pushed to below median fairly regularly... This could then initiate "slow-host-oriented project shopping", bringing to fruition another fear of people shopping for credit, with a new twist. I'm too tired to think through the math at the moment though...

High benchmarks and lower time do go hand in hand, demonstrating that there is a correlation between the benchmarks and actual work capability. If the benchmarks were perfect for our purposes, it would be a fixed ratio. Eric is taking the median of the actual ratios as representative.

For this project, the most recent 10,000 results is about a 10 or 15 minute snapshot just before the script is run. Because most hosts are using always-on connections and report at any time of day, that's almost certainly a decent sample. There could be some small geographical effect from those like me who only connect about once a day, though. If the script is run during my sleep period my hosts will never contribute, but there are enough similar hosts that I don't see any problem in that.

=====================

The comparisons based on the <credit_per_cpu_second> values in the host stats do have some significant differences from the adjustment method. Each host is equally weighted no matter how much work it's doing, but I don't know what criteria the stats sites choose to keep the comparison among "active" hosts, so the minimum to be included is uncertain. Basically, the method identifies the set of hosts which are active and common to the two projects; the sum of all those hosts' <credit_per_cpu_second> values for each project is computed and the ratio taken between those two totals. When I looked at the code for <credit_per_cpu_second> some time ago, I estimated it has about a 2.5 day time constant. It's 2.5 days of CPU time, though, and a quad can have up to 4 days of CPU time in 1 day of wall time. That makes it change fairly quickly with changes in work characteristics. Because the host stats are only dumped once a calendar day, some of those variations won't be seen; it's effectively a snapshot.
                                                             Joe


This project still has a high concentration of Pentium 4 systems. The "bad egg" projects have significantly higher ratios of Core2-based systems. The effect I am thinking about will be dependent upon that ratio.

As for the stat site portion, the thing I'm curious about is this: if I do not do any work at all on my Pentium 4, does my cpcs value go down, or does it only go down once I have another task validate and be granted credit? If it does not decay at the project itself (the origin point for the data) or get a lower weighting in the cross-project calculation for that chart, then the chart is not reliable.
ID: 790769 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790770 - Posted: 1 Aug 2008, 5:00:21 UTC - in response to Message 790754.  


Out of the most recent 10,000 returns, pick the median host for Credits granted / (benchmarks * time). Base the multiplier for the FLOPS estimate on this host. Note that this throws out all of the high and low benchmarks and the under-requesting earlier hosts. This removes much of the variability caused by high and low benchmarks and allows the project to correct its estimate.


Given the popularity of Core2 Duo / Quad, the most recent 10,000 will be heavily tilted toward high benchmarks and lower times, making the median machine more of a relatively new machine. To illustrate, if a Quad returns 20 results at a time (not unheard of), then only 500 machines doing that make up 10,000 returns.

Don't think the core2's have taken over the BOINC world yet. The majority of CPUs, by a long way, are P4's between 2.8 and 3.2 GHz (see the BoincStats Seti CPU Breakdown), which have low benchmarks (BM) and long times. The high-end X2's have high BMs, higher than Q6600's, but longer times.
Also, if others have the same experience as my core2's, they usually need to request more work before they get anywhere near reporting 20 units (connect interval 0.1 days). Five or six tasks reported is the usual max for MB tasks.


As I just mentioned to Joe, what you are saying is true for this specific project, but not necessarily for other projects. This is a pitfall of trying to force one project's particular dynamics onto another.
ID: 790770 · Report as offensive
W-K 666 Project Donor
Volunteer tester

Send message
Joined: 18 May 99
Posts: 19078
Credit: 40,757,560
RAC: 67
United Kingdom
Message 790776 - Posted: 1 Aug 2008, 5:27:33 UTC - in response to Message 790770.  
Last modified: 1 Aug 2008, 5:29:30 UTC


Out of the most recent 10,000 returns, pick the median host for Credits granted / (benchmarks * time). Base the multiplier for the FLOPS estimate on this host. Note that this throws out all of the high and low benchmarks and the under-requesting earlier hosts. This removes much of the variability caused by high and low benchmarks and allows the project to correct its estimate.


Given the popularity of Core2 Duo / Quad, the most recent 10,000 will be heavily tilted toward high benchmarks and lower times, making the median machine more of a relatively new machine. To illustrate, if a Quad returns 20 results at a time (not unheard of), then only 500 machines doing that make up 10,000 returns.

Don't think the core2's have taken over the BOINC world yet. The majority of CPUs, by a long way, are P4's between 2.8 and 3.2 GHz (see the BoincStats Seti CPU Breakdown), which have low benchmarks (BM) and long times. The high-end X2's have high BMs, higher than Q6600's, but longer times.
Also, if others have the same experience as my core2's, they usually need to request more work before they get anywhere near reporting 20 units (connect interval 0.1 days). Five or six tasks reported is the usual max for MB tasks.


As I just mentioned to Joe, what you are saying is true for this specific project, but not necessarily for other projects. This is a pitfall of trying to force one project's particular dynamics onto another.

If, as has been suggested, other projects are unhappy at Seti because it is paying above the going rate, then I cannot see how it can be said that Seti is trying to force its attempt at adjustment on others, especially as it should be a self-correcting mechanism.
How it works when the median "computer" is different for different projects/applications, I am not sure. It is almost certain, unless the requirements are changed from the initial proposal, that the median computer for AP tasks will be significantly different from the median computer for MB.

And, to answer your question to Joe, if I understand correctly this method is only about active computers. So if a computer has not been active in a particular project then the figures for that computer are excluded from the calculation.
ID: 790776 · Report as offensive
1mp0£173
Volunteer tester

Send message
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 790780 - Posted: 1 Aug 2008, 5:38:07 UTC - in response to Message 790776.  

How it works when the median "computer" is different on different projects/application I am not sure.

I think the basic idea is to first find the "median" computer.

Then, looking at the individual result that "picked" the median computer, you know the duration, and the number of credits granted for the work unit.

From the host record, you have the benchmarks.

You can now calculate what the benchmark * time credit would have been.

The new scaling factor is the ratio between the two.

As this is the median, it should be a "middle-of-the-road" architecture.

That ratio goes into a weighted average with the result from the 29 previous days, so that one "odd" pick won't distort the value too much. I'd also expect some odd picks to be "high" and others "low."
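
A made-up worked example of those steps (the numbers and the benchmark-to-credit scale are invented purely for illustration):

    SCALE = 0.002                      # invented benchmark*time -> credit scale factor
    cpu_time = 12_000.0                # CPU seconds for the median result
    benchmark_gflops = 2.0             # the median host's benchmark
    granted = 30.0                     # credit the project actually granted for it

    benchmark_credit = benchmark_gflops * cpu_time * SCALE   # 48.0 under this invented scale
    todays_ratio = benchmark_credit / granted                # 1.6: granting less than the benchmarks suggest
    multiplier = (29 * 1.0 + todays_ratio) / 30              # blended with yesterday's value of 1.0
    print(round(multiplier, 3))                              # 1.02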
ID: 790780 · Report as offensive
Brian Silvers

Send message
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 790789 - Posted: 1 Aug 2008, 6:02:05 UTC - in response to Message 790776.  
Last modified: 1 Aug 2008, 6:10:15 UTC


And, to answer your question to Joe, if I understand correctly this method is only about active computers. So if a computer has not been active in a particular project then the figures for that computer are excluded from the calculation.


Most of the sites declare a host "active" if it has received credit within the past 30 days. With folks that are very bent on having that number be 0.8 to 1.2, there could be pressure to initiate a second reduction before the true effect of the first reduction is completely known.

Basically, Cosmology is the "worst offender" among the active projects, or so the chart would lead you to believe. If you are not familiar with all of the turmoil going on over there, you (in the general sense) might still be inclined to rail against them...

On a different subject, some time ago my gut feeling was that the Linux application was faster on identical hardware over at Einstein. I couldn't prove it, as I wasn't willing to go with a dual-boot, and the overhead of a VM made the comparison meaningless. Fortunately, Tony happened to come along with some dual-boot systems and, with his various charts, proved I was right. Gary Roberts, a forum moderator there, has a thread in which he explicitly states that the Linux app is significantly faster than the Windows app.

I'm unclear on how something like that is accounted for in this scheme. Right now I see it as a "that's just too bad" situation that would have to be addressed by the project. What if the project is either unwilling or unable? I'm not speaking specifically about Einstein, but in general. Yes, I know that would be the case no matter the credit scheme, but I'm pointing out known problems that are masked by the current situation in which Windows is the dominant OS. If things were to be "fair" and "equal", then why would that specific situation, where the only difference was the OS and the performance delta was 10% or more, not qualify the disadvantaged host for its own separate "compensation adjustment"?

A "compensation adjustment" was a real-life thing that was done for me in my prior job, as I was paid below the stated minimum pay range for the job. At each review for 3 years, I had access to a separate pool of money that was earmarked for gradually adjusting people in my situation up. I would've rather had it all at once, obviously, but at least it was done rather than saying "that's just too bad". The irony of all ironies was that the same year I finally made it to the bottom of the pay range, I was also laid off...
On that note, must sleep....
ID: 790789 · Report as offensive
Eric Korpela Project Donor
Volunteer moderator
Project administrator
Project developer
Project scientist
Avatar

Send message
Joined: 3 Apr 99
Posts: 1382
Credit: 54,506,847
RAC: 60
United States
Message 790792 - Posted: 1 Aug 2008, 6:12:15 UTC - in response to Message 790769.  
Last modified: 1 Aug 2008, 6:19:41 UTC


This project still has a high concentration of Pentium 4 systems. The "bad egg" projects have significantly higher ratios of Core2-based systems. The effect I am thinking about will be dependent upon that ratio.


Well, let's hope the other projects make the same plots I have (I will suggest that they do, or a stat site could, since all the info is in the hosts.xml file). As I mentioned earlier, our current median is a dual processor (or dual core, no way to tell the difference) with 1.5 GFLOP/s per core, which sounds like the machine on my desktop at work. It's only marginally slower than the median in beta (which tends to attract higher-end machines).


The good thing about medians is that the middle of a distribution tends to vary far less than the average, especially when exponential growth is involved. If machines get replaced on a 3 year cycle, our median machine is probably about 1.5 years old. (Like the one on my desktop at work).

This method doesn't rely on the median machines being the same across projects. It assumes that on average the work done by a machine is proportional to its benchmark scores.

More tomorrow.... Or today if you're across the pond.
@SETIEric@qoto.org (Mastodon)

ID: 790792 · Report as offensive