Observation of CreditNew Impact (4)

Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444249 - Posted: 19 Nov 2013, 23:13:18 UTC - in response to Message 1444243.  

It needs to be remembered that this isn't simply a SETI concern. Other projects issue credits too, and draw conclusions from them:

Crossed the 1 PFLOP barrier now

If they have SSE+ apps and are measuring it with some BOINC server figure, they are likely doing 2-3x that, or even more.

Understand we're talking number of operations, not credits or estimated flops etc. CreditNew claims are an underclaim.

That particular project is still allocating fixed, flat-rate credits, and - because of the deterministic runtime of their workunits - getting away with it. But, as the discussion arising from the thread title implication hints, possibly drawing scientifically-invalid conclusions from them.

They are anxious to move to CreditNew once they trust it, and they have a robust and well-managed Beta project which could be used as a testbed.



Good to know. Sounds like prudent people.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444249 · Report as offensive
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21233
Credit: 7,508,002
RAC: 20
United Kingdom
Message 1444403 - Posted: 20 Nov 2013, 8:00:36 UTC - in response to Message 1444228.  
Last modified: 20 Nov 2013, 8:01:51 UTC

... 'Boinc whetstone' versus app technology. The CN Author didn't understand that instruction level parallelism does more operations in the same time, by an average factor of, you guessed it, 3.3.

First and foremost, Whetstone benchmark was designed as a 'worst case' to defeat compiler and hardware optimisations. In creditNew it's treated as a raw peak claim, which is obviously a bad approach to start with, since no practical software works against the processor but with it...

(Also) Are the credits being compared against the credit granted to the hardware performance for the "median computer" that s@h sees?

What happens when the "median computer" host becomes a GPU-based system rather than CPU-only? Would we then see a sudden unholy credits rate shift?


Keep searchin',
Martin

(Jason: Good to see you're still optimizing!)
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 1444403 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444410 - Posted: 20 Nov 2013, 8:52:12 UTC - in response to Message 1444403.  
Last modified: 20 Nov 2013, 9:00:09 UTC

... 'Boinc whetstone' versus app technology. The CN Author didn't understand that instruction level parallelism does more operations in the same time, by an average factor of, you guessed it, 3.3.

First and foremost, Whetstone benchmark was designed as a 'worst case' to defeat compiler and hardware optimisations. In creditNew it's treated as a raw peak claim, which is obviously a bad approach to start with, since no practical software works against the processor but with it...

(Also) Are the credits being compared against the credit granted to the hardware performance for the "median computer" that s@h sees?

What happens when the "median computer" host becomes a GPU-based system rather than CPU-only? Would we then see a sudden unholy credits rate shift?


Keep searchin',
Martin

(Jason: Good to see you're still optimizing!)


In 'principle' yes, though there is an interesting combination of factors there that leads to a global downscale to a quite specific under 'claiming' app.

1] the GPU [all brands] 'raw peak flop claims' are inflated, of course having been derived from 'Marketing flops' as opposed to Boinc's FPU Whetstone for CPU apps. In 'principle' that would be OK, because of the second point:

2] it isn't 'the median computer' that's used, but in fact the lowest 'claiming' one. In principle that might be OK too, but at least in Multibeam the estimates embedded in the tasks are based on a theoretical minimum number of operations, such as k*n*log(n) for an FFT portion, for example. Since you cannot actually do an FFT in fewer operations than this, any claim below the estimate is actually 'suspicious' rather than, as currently interpreted, 'most efficient'. IOW, AVX does just as many operations as any other app, but with no allowance for parallelism the server code 'believes' in magic, choosing it as 'the most efficient' in number of operations and globally downscaling everyone to below the immutable cobblestone scale.
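To make point 2] concrete, here is a rough Python sketch of comparing a raw claim against a minimum-operations estimate. The constant k = 5 (the figure conventionally quoted for radix-2 complex FFTs), the FFT sizes, the benchmark rate and the runtime are all assumptions for illustration, not values taken from Multibeam or the BOINC server code.

```python
import math


def fft_min_flops(n: int, k: float = 5.0) -> float:
    """Rough operation count for one n-point FFT: k * n * log2(n).

    k = 5 is an assumed constant, not the one embedded in real tasks.
    """
    return k * n * math.log2(n)


def raw_claim_flops(benchmark_flops_per_sec: float, elapsed_sec: float) -> float:
    """Operations 'claimed' as benchmark rate multiplied by elapsed time."""
    return benchmark_flops_per_sec * elapsed_sec


def is_suspicious(claim: float, estimate: float) -> bool:
    """A claim below the minimum estimate cannot mean fewer operations were done."""
    return claim < estimate


# Hypothetical task: 1000 FFTs of 2^20 points on a host with a 2.5 GFLOPS
# scalar benchmark that finished in 20 seconds.
estimate = 1000 * fft_min_flops(1 << 20)
claim = raw_claim_flops(2.5e9, 20.0)
print(f"estimate={estimate:.3e} claim={claim:.3e} suspicious={is_suspicious(claim, estimate)}")
```

In this made-up case the claim is about half the estimate, which would point at the benchmark under-reporting a SIMD application rather than at the application doing less work.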

Where the problem exists is that the 'raw claims' for CPU use a knobbled FPU Whetstone for SIMD applications/hosts [vectorised, instruction level parallelism], with SSE+ being by far dominant now and AVX gaining traction. As a consequence, the two distinct 'unholy steps' we all observed match quite well to the introduction of CreditNew itself and SSE+ optimisations into the stock CPU application, followed by the more recent stock AVX CPU application with V7.

[A bit later this evening, local Oz time] I'll try to find and post the graphs I have somewhere that visually illustrate the 2 key issues: improper scaling through an inaccurate/improper choice of Whetstone, and instability characteristics. There are more minor issues, though at this time it appears most symptoms originate from these two, and are relatively insensitive to the OK workunit estimates that might appear a reasonable first suspect.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444410 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24912
Credit: 3,081,182
RAC: 7
Ireland
Message 1444423 - Posted: 20 Nov 2013, 10:21:54 UTC - in response to Message 1444245.  

200 year old [control theory] technology that works is generally frowned upon in modern academia though, as it doesn't attract funding.


Two points here. 200 years ago, technology was mainly mechanical with the commencement of electrical thereabouts. So if that worked for 200 years, why hasn't an updated version for the electronic age (we can safely say that the 50's started the electronic age), so that's 60 years so far, been produced?

Secondly, can it be done to meet modern academia's approval?


ID: 1444423 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444429 - Posted: 20 Nov 2013, 10:41:40 UTC - in response to Message 1444423.  
Last modified: 20 Nov 2013, 10:43:44 UTC

200 year old [control theory] technology that works is generally frowned upon in modern academia though, as it doesn't attract funding.


Two points here. 200 years ago, technology was mainly mechanical with the commencement of electrical thereabouts. So if that worked for 200 years, why hasn't an updated version for the electronic age (we can safely say that the 50's started the electronic age), so that's 60 years so far, been produced?

Secondly, can it be done to meet modern academia's approval?




Correct on both points.

There are a lot of practical possibilities. One effective choice that is exceedingly simple to code is a modern engineered version of 'classical' control theory known as a PID controller. It is based on steam engine mechanical governors. See http://en.wikipedia.org/wiki/PID_controller. [OK: 1890's there, so more like 120+ years]

At first educated glance, the existing mechanism looks like a PI controller, i.e. missing the 'D' damping term. It isn't quite that though, because it uses sampled averages [sigma-delta controller with no delta] instead of instantaneous cumulative error values. The weightings make it closer to a 'P' with some fudge factors to replace the 'I' and 'D' terms which, if present, 'govern' long term drift and noise immunity.
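For readers who have not met one, a textbook discrete PID loop really is only a few lines. The sketch below is a generic illustration in Python, not the CreditNew code and not the modified client mentioned in this thread; the gains and the input values are made-up assumptions.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller; the gains used below are illustrative."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float = 1.0) -> float:
        self.integral += error * dt                  # 'I': accumulated error (long term drift)
        derivative = (error - self.prev_error) / dt  # 'D': rate of change of the error (damping)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def track(observations, pid):
    """Steer a running estimate towards each new observation in turn."""
    estimate = 0.0
    history = []
    for observed in observations:
        estimate += pid.update(observed - estimate)
        history.append(estimate)
    return history


# Hypothetical per-task values, not taken from any real host.
history = track([100.0] * 200, PID(kp=0.3, ki=0.05, kd=0.1))
print(round(history[-1], 3))
```

Setting ki and kd to zero leaves a pure 'P' controller, which is the closest textbook relative of a weighted average. How the three gains are tuned decides overshoot, convergence speed and noise immunity, so the values here should not be read as a recommendation.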

For the second part, there are 'stable' systems and 'unstable' ones, formal engineering definitions. CreditNew as currently implemented fits in the second category, with particular observable traits that I'll describe with my graphs a bit later.

Fortunately, instability and improper choices aside, CreditNew as a whole is 'relatively' sound. Short term very minor bandaids are feasible, and more carefully tuned control for the longer term ends up simpler and more robust by far, potentially with less server/database load and other advantages.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444429 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24912
Credit: 3,081,182
RAC: 7
Ireland
Message 1444431 - Posted: 20 Nov 2013, 10:50:26 UTC - in response to Message 1444429.  


Fortunately, instability and improper choices aside, CreditNew as a whole is 'relatively' sound. Short term very minor bandaids are feasible, and more carefully tuned control for the longer term ends up simpler and more robust by far, potentially with less server/database load and other advantages.


Thanks Jason.

So wouldn't it be more effective for Creditnew to get that fine control now rather than later as the projects as a whole can benefit?

ID: 1444431 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444437 - Posted: 20 Nov 2013, 11:09:07 UTC - in response to Message 1444431.  
Last modified: 20 Nov 2013, 11:10:23 UTC


Fortunately, instability and improper choices aside, CreditNew as a whole is 'relatively' sound. Short term very minor bandaids are feasible, and more carefully tuned control for the longer term ends up simpler and more robust by far, potentially with less server/database load and other advantages.


Thanks Jason.

So wouldn't it be more effective for Creditnew to get that fine control now rather than later as the projects as a whole can benefit?


Possibly. The mood from the project has been 'understaffed and occupied with other important stuff' like Android and GBT is my guess, which I happen to agree should be high priority. In that light, for example, I emailed months ago about moving Linux and Mac Cuda to Beta, as well as querying GPU reliability for factoring into x42's design. Understandably no response to date, so I'm moving forward regardless.

As for CN itself, it does intrinsically tie into time estimates. Years ago I worked on modifying a 6.10.58 client for my own use, so that it uses client-side per-application correction stabilised with such a PID scheme. That's on my Windows hosts. I still use that today: estimates are generally to within a few seconds either way, and it is robust to outliers like overflows etc.

So in principle a 'proper fix' is warranted and doable, though I have to factor in that multithreaded and heterogeneous forms of parallelism are very very near [i.e. planned for phase two x42]. That sounds complex at first, just like the rest, but it does turn out there are easy ways to gauge effective parallelism server side if the original work estimates are 'reasonable'. Considering all that though warrants care for future-proofing.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444437 · Report as offensive
Sirius B Project Donor
Volunteer tester
Joined: 26 Dec 00
Posts: 24912
Credit: 3,081,182
RAC: 7
Ireland
Message 1444444 - Posted: 20 Nov 2013, 11:24:26 UTC - in response to Message 1444437.  

Thanks again. I thought Time and Manpower would enter the equation. However, it would make more sense to get it fine tuned asap so it can reasonably run under its own steam as well as the reduced server/database load.

That must surely give them more time to spend on android/GBT and others. Another factor of that would be a hell of a reduction in credit complaints.
ID: 1444444 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1444446 - Posted: 20 Nov 2013, 11:36:17 UTC - in response to Message 1444437.  

So in principle a 'proper fix' is warranted and doable, though I have to factor in that multithreaded and heterogeneous forms of parallelism are very very near [i.e. planned for phase two x42]. That sounds complex at first, just like the rest, but it does turn out there are easy ways to gauge effective parallelism server side if the original work estimates are 'reasonable'. Considering all that though warrants care for future-proofing.

As I understand it, "phase two x42" is a specific SETI-centric concept. Unfortunately, 'CreditNew' applies BOINC-wide: data is sparse about how many projects are currently using it, but I suspect the nay-sayers are wider of the mark than they realise.
ID: 1444446 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444448 - Posted: 20 Nov 2013, 11:41:19 UTC - in response to Message 1444446.  
Last modified: 20 Nov 2013, 11:45:35 UTC

So in principle a 'proper fix' is warranted and doable, though I have to factor in that multithreaded and heterogeneous forms of parallelism are very very near [i.e. planned for phase two x42]. That sounds complex at first, just like the rest, but it does turn out there are easy ways to gauge effective parallelism server side if the original work estimates are 'reasonable'. Considering all that though warrants care for future-proofing.

As I understand it, "phase two x42" is a specific SETI-centric concept. Unfortunately, 'CreditNew' applies BOINC-wide: data is sparse about how many projects are currently using it, but I suspect the nay-sayers are wider of the mark than they realise.


Indeed, though by very nature [evolving] heterogeneous design is somewhat universal, and inevitable now. Debate and new ideas are always welcome, especially in stuff that hasn't really been done before, but 'naysaying' achieves nothing but the destruction of motivation.

[Edit:] e.g. 'looks like more trouble than it's worth', or 'What I'm doing isn't working, so it must be someone else's fault'
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444448 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14679
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1444453 - Posted: 20 Nov 2013, 12:03:53 UTC - in response to Message 1444448.  

So in principle a 'proper fix' is warranted and doable, though I have to factor in that multithreaded and heterogeneous forms of parallelism are very very near [i.e. planned for phase two x42]. That sounds complex at first, just like the rest, but it does turn out there are easy ways to gauge effective parallelism server side if the original work estimates are 'reasonable'. Considering all that though warrants care for future-proofing.

As I understand it, "phase two x42" is a specific SETI-centric concept. Unfortunately, 'CreditNew' applies BOINC-wide: data is sparse about how many projects are currently using it, but I suspect the nay-sayers are wider of the mark than they realise.

Indeed, though by very nature [evolving] heterogeneous design is somewhat universal, and inevitable now. Debate and new ideas are always welcome, especially in stuff that hasn't really been done before, but 'naysaying' achieves nothing but the destruction of motivation.

[Edit:] e.g. 'looks like more trouble than it's worth', or 'What I'm doing isn't working, so it must be someone else's fault'

Sorry, I only meant 'naysaying' in the sense of people who say "very few projects have adopted CreditNew" - that was the version which had reached Eric's ears. Perhaps because the people who are most interested in Credit, and comment about it on message boards, have migrated to the projects which have moved furthest away from the BOINC norms for credit.
ID: 1444453 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444454 - Posted: 20 Nov 2013, 12:07:42 UTC - in response to Message 1444453.  
Last modified: 20 Nov 2013, 12:08:12 UTC

So in principle a 'proper fix' is warranted and doable, though I have to factor in that multithreaded and heterogeneous forms of parallelism are very very near [i.e. planned for phase two x42]. That sounds complex at first, just like the rest, but it does turn out there are easy ways to gauge effective parallelism server side if the original work estimates are 'reasonable'. Considering all that though warrants care for future-proofing.

As I understand it, "phase two x42" is a specific SETI-centric concept. Unfortunately, 'CreditNew' applies BOINC-wide: data is sparse about how many projects are currently using it, but I suspect the nay-sayers are wider of the mark than they realise.

Indeed, though by very nature [evolving] heterogeneous design is somewhat universal, and inevitable now. Debate and new ideas are always welcome, especially in stuff that hasn't really been done before, but 'naysaying' achieves nothing but the destruction of motivation.

[Edit:] e.g. 'looks like more trouble than it's worth', or 'What I'm doing isn't working, so it must be someone else's fault'

Sorry, I only meant 'naysaying' in the sense of people who say "very few projects have adopted CreditNew" - that was the version which had reached Eric's ears. Perhaps because the people who are most interested in Credit, and comment about it on message boards, have migrated to the projects which have moved furthest away from the BOINC norms for credit.


Yes, I need to post my graphs in a bit, but careful explanation is warranted. When I talk about practicality and feasibility, I am implying many things, including stabilised functionality in a formal engineering sense, and 'fair credit for work done' in a more abstract sense. Current awards are a fraction of the cobblestone scale, so unfair and in contradiction to the intent in CreditNew's documentation.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444454 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444496 - Posted: 20 Nov 2013, 13:52:01 UTC
Last modified: 20 Nov 2013, 13:53:21 UTC

Here's the first of two graphs. This one is for 'stability'.

The Blue line is actual credit awards for a CPU SSE3 enabled anonymous machine 'X', on the Beta project. 'Shorties' only in issued/processed sequence.


In this one I deliberately downscaled the Red graph, zooming in, to allow the graph extents to show the real [Blue] credit award instabilities closely.

Notable features of the 'real' blue line include:
- It 'looks like' it wants to be around a particular value but jumps around it.
- When you work out the cobblestone scale against the task estimate, it should be well over 100 credits, as opposed to 30-40
- possible 'self similar' looking oscillations
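
For the second bullet, the cobblestone arithmetic can be sketched in a few lines of Python. The scale used is BOINC's definition of 200 credits per day of work on a 1 GFLOPS reference machine; the task estimate of 5e13 operations is a made-up figure for illustration, not the estimate of any real 'shorty'.

```python
SECONDS_PER_DAY = 86_400
REFERENCE_FLOPS = 1e9            # 1 GFLOPS reference machine
CREDITS_PER_REFERENCE_DAY = 200  # cobblestone scale


def cobblestones(flops: float) -> float:
    """Credit for a given number of floating-point operations on the cobblestone scale."""
    reference_days = flops / (REFERENCE_FLOPS * SECONDS_PER_DAY)
    return reference_days * CREDITS_PER_REFERENCE_DAY


# Hypothetical task estimate, for illustration only.
task_estimate_flops = 5.0e13
print(round(cobblestones(task_estimate_flops), 1))
```

With that assumed estimate the scale gives a little over 115 credits, which is the kind of gap against awards of 30-40 that the bullet describes.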

The Red line is the same input data [Whetstone, elapsed, cobblestone scale] fed into a very rough PID controller, implemented in a Google spreadsheet. I divided/downscaled its output as mentioned to illustrate what stability is. Notable characteristics include a small 10 percent initial overshoot, acting as if starting as a new host, which is for rapid convergence within 10 tasks. No special setiathome Multibeam-specific factors were needed, and it seems more immune to 'noise', such as the periodic heavy machine usage that was indicated in the source data.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444496 · Report as offensive
Profile shizaru
Volunteer tester
Joined: 14 Jun 04
Posts: 1130
Credit: 1,967,904
RAC: 0
Greece
Message 1444501 - Posted: 20 Nov 2013, 14:07:30 UTC - in response to Message 1444496.  

At first glance it appears CN is auto-adjusting every few tasks. Assuming all tasks want to settle around 38, it looks like it's trying to compensate whenever you get low-balled and vice-versa. Which could also be an explanation for repeating patterns?

Why it has to jump through these hoops is, of course, a different kettle of fish. But it looks (dare I say) fair?
ID: 1444501 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444507 - Posted: 20 Nov 2013, 14:25:36 UTC - in response to Message 1444501.  
Last modified: 20 Nov 2013, 14:38:01 UTC

At first glance it appears CN is auto-adjusting every few tasks. Assuming all tasks want to settle around 38, it looks like it's trying to compensate whenever you get low-balled and vice-versa. Which could also be an explanation for repeating patterns?

...


Exactly, that's called oscillation, just like the ringing of a bell.

Here is a link to a famous video depicting 'Galloping Gertie', aka the Tacoma Narrows Bridge that collapsed. It led to a wider understanding of resonance in civil engineering.

http://www.youtube.com/watch?v=j-zczJXSxnw

[Edit:]
Why it has to jump through these hoops is, of course, a different kettle of fish. But it looks (dare I say) fair?

Oh it's perfectly fair to get a random [or more precisely chaotic] amount of credit for work done. I will employ you and pay you based on a random amount determined by magic elves that work faster than you, yet claim less. Sounds fair right ? [ wait for the next graph in a bit though before agreeing ;D ]
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444507 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444547 - Posted: 20 Nov 2013, 15:45:30 UTC
Last modified: 20 Nov 2013, 15:55:13 UTC

And here is the second graph depicting 'Scaling': Same input data, now not down scaling the red graph.



The blue line is the same 'real' data. The red line is now unscaled and governed by a commercial SSE Whetstone [SiSoft Sandra single-threaded SSE2] instead of BOINC's [knobbled] FPU one.

The first important point here is that Whetstone is not a 'peak measure' at all, as implied in the CreditNew documentation, but a worst case. So this represents 'more reasonable' credit for the same work, but in fact the cobblestone scale specifies higher.

The second point is that the actual work performed, irrespective of processing device and elapsed time, actually comes out even higher than this. That is the 'fair' cobblestone scale, and it's the system's inability to cope with parallelism [via SSE and AVX SIMD] that is to blame.

For at least Multibeam and Astropulse here, in between a simple bandaid and a comprehensive forward-looking fix, there exists another option. Any [valid] claim lower than is possible by mathematical and physical laws is using some form of effective parallelism [SIMD, multithreading, heterogeneous slave monkeys, etc.]. Use the inverse of that to scale the claim and you have 'fair credit' that compensates for multiple threads and enslaved monkeys in parallel.
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444547 · Report as offensive
Profile petri33
Volunteer tester

Joined: 6 Jun 02
Posts: 1668
Credit: 623,086,772
RAC: 156
Finland
Message 1444681 - Posted: 20 Nov 2013, 20:23:36 UTC

I) What would be the effect of ...

having a powerful host with at least 2 GPUs doing multiple tasks at a time:
a) one or two heavily blanked AP work units that take a long time and then
-- at the same time --
b) doing non blanked AP unit(s) that would execute a lot faster than normal since GPU is less busy waiting for the CPU to do the blanking for other task(s).
-- and on top of that ... at the same time running mixed workloads --
c) running normal MB task(s) and/or vlar on NVIDIA GPU
-- plus some CPU tasks.

An example: A host configured to run 2 MB or 3 AP per GPU and having 2 GPUs and at the same time doing 6 CPU AP or MB tasks and leaving 6 CPU cores free out of total 12 virtual cores (6 real FPUs).

Then ... A transition from MB only to AP only and then after a few days the evident running out of AP work and transitioning back to MB only.

During the transition there could be a real mixture of different workloads going on and having the most unusual run times.

There can not be an AI system that can figure out the "normal" processing rate.

My APR varies from the average by about +-30 for AP and +-20 for MB. The TDCF (task duration correction factor) ranges from 0.8 to 2.22. <-- the number seems random (i.e. it varies too fast, depending on the (too few) last accepted or last reported WUs).


II) When doing the BOINC Whetstone or whatever calculation I get 28000 when running 50% of the processors and about 12000 when running 100% of them. I know that there are just 6 physical AVX/SSE/math units in my CPU even though there are 12 virtual cores, and when using all of them there is a penalty for switching the tasks (register file backup etc.) and a penalty for overcommitting the CPU cache.

-- but --

if that number is used for determining the efficiency/efficacy of the CPU I'd get two totally different numbers if there was an optimization that favours running the application one or multiple at a time.


The Question: How does CreditNew know how efficient a host is? What would be a host's maximum? How well is it doing right now?

To overcome Heisenbergs:
"You can't always get what you want / but if you try sometimes you just might find / you get what you need." -- Rolling Stones
ID: 1444681 · Report as offensive
Josef W. Segur
Volunteer developer
Volunteer tester

Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 1444694 - Posted: 20 Nov 2013, 20:40:38 UTC

Another discrepancy is that BOINC's "Whetstone" benchmark is double precision while the AMD and NVIDIA peak GFLOPS are rated based on single precision. Since the benchmark is non-SIMD that doesn't make a huge difference: the canonical optimized Whetstone implementations from Roy Longbottom's PC benchmark collection give about 1092 MWIPS double precision and 1046 MWIPS single precision on my 1.4GHz Pentium-M (BOINC gives about 1292 MWIPS). Even for 64 bit BOINC builds, where the benchmark would be run using SSE scalar operations, there probably would be little difference.

One way to sort of level the playing field would be to rate CPU peak GFLOPS using an approach similar to that used for GPUs. A post by an Intel engineer from a few years ago explains some of the considerations which would be needed for a fully detailed version. In practice, Intel has published export compliance metrics which include GFLOPS ratings for many of its CPUs (note those are for the full package so need to be divided by the number of CPUs in the package). The practical approach for BOINC would be to simply multiply the CPU clock rate by the number of single precision operations which can be done simultaneously by whatever SIMD capability a processor has. That would also work for Power PC, ARM, etc. CPUs.
                                                                  Joe
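
The 'practical approach' described above (clock rate multiplied by the number of simultaneous single precision operations) can be sketched in Python as follows. The function name, the `ops_per_lane_per_cycle` parameter and the example CPUs are assumptions for illustration; real processors differ in how many SIMD operations they can issue per cycle, so these are nominal figures only.

```python
def peak_sp_gflops(clock_ghz: float, simd_bits: int, cores: int = 1,
                   ops_per_lane_per_cycle: int = 1) -> float:
    """Nominal single precision peak: clock rate times simultaneous SP operations.

    ops_per_lane_per_cycle is an assumed knob for designs that can issue more
    than one operation per lane each cycle (for example separate add and
    multiply units).
    """
    lanes = simd_bits // 32  # 32-bit floats that fit in one SIMD register
    return clock_ghz * lanes * ops_per_lane_per_cycle * cores


# Hypothetical CPUs, one core each.
print(peak_sp_gflops(3.0, 128))  # 128-bit SSE: 4 lanes
print(peak_sp_gflops(3.0, 256))  # 256-bit AVX: 8 lanes
```

A scalar benchmark corresponds to one lane, so on these assumptions an SSE application could do up to 4 times, and an AVX application up to 8 times, the operations per second that a non-SIMD Whetstone figure suggests.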
ID: 1444694 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444717 - Posted: 20 Nov 2013, 21:06:41 UTC - in response to Message 1444681.  

The Question: How does CreditNew know how efficient a host is? What would be a host's maximum? How well is it doing right now?


All CreditNew really has at the start is a somewhat reasonable [in our MB case] approximation of the minimum number of calculation operations that the task takes.

When the task returns, it now has an elapsed time. That can yield a rate for the task representing throughput, which is averaged into APR. Moving averages from point samples, without proper damping controls, can be pretty volatile, as seen in the prior, similar DCF mechanism too. They'll be susceptible to all sorts of ringing, overshoot, drift and noise from even normal conditions like periodic heavy machine usage.
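
As a rough illustration of that volatility, the Python sketch below runs a plain windowed mean over some made-up per-task rates, one of which is depressed by heavy machine usage. The window lengths and sample values are assumptions, and this is not the actual APR calculation.

```python
from collections import deque


def running_mean(samples, window):
    """Mean of the most recent `window` samples after each new sample arrives."""
    recent = deque(maxlen=window)
    means = []
    for sample in samples:
        recent.append(sample)
        means.append(sum(recent) / len(recent))
    return means


# Hypothetical per-task rates; the 3.0 stands for a task run while the host was busy.
rates = [10.0, 10.2, 9.9, 3.0, 10.1, 10.0, 9.8, 10.1]
short = running_mean(rates, window=3)
long_ = running_mean(rates, window=8)
print([round(v, 2) for v in short])
print([round(v, 2) for v in long_])
```

The short window drops sharply when the single outlier arrives and snaps back once it leaves the window, while the long window moves less but also adapts more slowly. Neither choice gives both smoothness and responsiveness, which is the gap a damped control loop is meant to fill.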

These are loosely linked [the estimates and creditnew that is] through a scaling system and a set of averages, project DCF now disabled if using a relatively recent client. That was to put the estimate scaling server side in the hopes of addressing projects like here that mix different applications in the same project. It's a design choice I wouldn't have gone with due to increasing the server workload and slowing client estimate adaptation to new work fetches.

The somewhat unstable APR value is probably the closest figure we have at the moment to some sort of reality. For Multibeam applications, still containing the stderr flopcounter value, you can take this and divide it by your choice of elapsed or CPU time to yield a throughput figure. Either way what's needed is a number that is smooth most of the time, responsive when needed, and relatively noise immune... Here the absolute value isn't all that critical; how it changes over time is.

Obviously this throughput figure is going to vary for all sorts of reasons. Nonetheless for determining overall throughput and estimates etc, properly handled control loops work a lot better than sampled averages.

It's at this point you diverge into control systems engineering theory, but it's perhaps analogous enough to driving a car. Most people don't sit 'feathering' the throttle every millisecond around the speedometer reading at the speed limit. That would be prone to all sorts of noise, overshoot etc. Instead you slip into a groove that's near enough, then make minor smooth adjustments as necessary. That's control.

So in a sense, in CreditNew's current form here, it's this aggressive use of statistics, where they are not the most robust or elegant choice, that is the Achilles' heel causing so much consternation.


"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444717 · Report as offensive
Profile jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 1444728 - Posted: 20 Nov 2013, 21:23:04 UTC - in response to Message 1444694.  
Last modified: 20 Nov 2013, 21:30:39 UTC

Another discrepancy is that BOINC's "Whetstone" benchmark is double precision while the AMD and NVIDIA peak GFLOPS are rated based on single precision. Since the benchmark is non-SIMD that doesn't make a huge difference: the canonical optimized Whetstone implementations from Roy Longbottom's PC benchmark collection give about 1092 MWIPS double precision and 1046 MWIPS single precision on my 1.4GHz Pentium-M (BOINC gives about 1292 MWIPS). Even for 64 bit BOINC builds, where the benchmark would be run using SSE scalar operations, there probably would be little difference.

One way to sort of level the playing field would be to rate CPU peak GFLOPS using an approach similar to that used for GPUs. A post by an Intel engineer from a few years ago explains some of the considerations which would be needed for a fully detailed version. In practice, Intel has published export compliance metrics which include GFLOPS ratings for many of its CPUs (note those are for the full package so need to be divided by the number of CPUs in the package). The practical approach for BOINC would be to simply multiply the CPU clock rate by the number of single precision operations which can be done simultaneously by whatever SIMD capability a processor has. That would also work for Power PC, ARM, etc. CPUs.
                                                                  Joe


Definitely worth detailed consideration IMO, especially for those, I suppose many, projects that don't have particularly good wu estimates to start with.

We certainly don't need a super precise figure here. I'd be happy with plus or minus a few credits out of 100. For MB I'd like to see the 'PI like' weighted sigma averages replaced with a smoother control to compare, and to see how close we get by simply scaling the BOINC Whetstone claim by:

effective_parallelism = 1
if raw_claim < wu_est then effective_parallelism = wu_est / raw_claim
proper_claim = raw_claim * effective_parallelism   <--- yep, that's claiming the estimate [a lower cap for parallel tasks, SIMD etc.]

Something like that might even be 'close enough' for current and future SIMD variants, as well as cope with multithreading and maybe even non-symmetric load balancing.
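
The three pseudocode lines above translate almost directly into Python. This is only a sketch of that idea: the function name, the returned tuple and the example numbers are assumptions, and nothing here is taken from the BOINC server code.

```python
def proper_claim(raw_claim: float, wu_est: float) -> tuple[float, float]:
    """Scale a raw claim up to the workunit estimate when it falls below it.

    Returns (claim, effective_parallelism). A claim below the estimate is
    read as evidence of parallelism (SIMD, threads, ...), not of less work.
    """
    if raw_claim <= 0:
        raise ValueError("raw_claim must be positive")
    effective_parallelism = 1.0
    if raw_claim < wu_est:
        effective_parallelism = wu_est / raw_claim
    return raw_claim * effective_parallelism, effective_parallelism


# Hypothetical numbers: a SIMD app claims half the estimated operations.
print(proper_claim(raw_claim=5.0e10, wu_est=1.0e11))
# A claim above the estimate is left alone.
print(proper_claim(raw_claim=2.0e11, wu_est=1.0e11))
```

In effect the claim becomes the larger of the raw claim and the estimate, so the workunit estimate acts as a floor, and the effective parallelism figure falls out as a by-product that could be tracked per application.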
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 1444728 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.