Astropulse deployment and implementation issues.

Message boards : AstroPulse : Astropulse deployment and implementation issues.

Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34420 - Posted: 28 Jul 2008, 15:21:42 UTC

Some of this arises from ruminations in the AP v4.35 thread, but it isn't, strictly speaking, version specific, so it's better to separate it out.

I'm still concerned about the credit scoring (we all know how touchy BOINCers can be about credit....).

These are the results which have been reported so far with v4.35 - there will be others ready to report (I have 3) but we won't get their score until the scheduler is back up.

Result	WU	Sent	Returned	CPU time (s)	Credit	User	CPU	OS
4109211	1203208	15 Jul 2008 00:59:09 UTC	24 Jul 2008 22:49:01 UTC	206,694.90	719.36		Urs	T7200	Linux
4109203	1203204	15 Jul 2008 00:59:30 UTC	25 Jul 2008 00:36:17 UTC	205,045.80	719.36		Urs	T7200	Linux
4243680	1249465	21 Jul 2008 21:54:43 UTC	27 Jul 2008 20:00:34 UTC	205,005.20	705.80		Urs	T7200	Linux
4243678	1249464	21 Jul 2008 21:54:43 UTC	27 Jul 2008 20:00:34 UTC	206,667.20	705.80		Urs	T7200	Linux
4243659	1249455	23 Jul 2008 19:06:37 UTC	25 Jul 2008 18:20:47 UTC	148,495.30	719.36		Richard	Q9300	XP
4243780	1249515	23 Jul 2008 21:17:05 UTC	27 Jul 2008 12:21:08 UTC	151,903.00	683.11		Richard	Q9300	XP
4243742	1249496	23 Jul 2008 21:23:18 UTC	27 Jul 2008 12:21:08 UTC	150,763.50	683.11		Richard	Q9300	XP
4243753	1249502	24 Jul 2008 13:01:44 UTC	27 Jul 2008 12:21:08 UTC	149,514.30	684.33		Richard	Q9300	XP
4243802	1249526	24 Jul 2008 19:23:35 UTC	27 Jul 2008 00:48:42 UTC	152,565.10	684.33		Richard	Q6600	XP
4244316	1249771	25 Jul 2008 22:38:12 UTC	27 Jul 2008 20:06:31 UTC	149,583.80	684.33		Ambrose	Q6600	Vista


We can clearly see that the credits aren't fixed, which is going to complicate matters even further.

This is presumably due to Eric's new cross-project-parity script, which he checked in just as AP v4.35 became available for testing:
The script looks at a day's worth of returned results for each app (up
to 10 000).  It calculates the granted credit per unit CPU time for each
host that returned one of these results and finds the median (over all
hosts) of the ratio of granted credit to the credit that would have
been granted based upon the benchmarks.  A 30 day moving average of
that ratio is maintained in the database.  The scheduler multiplies
claimed credit by the value of that ratio (on the day the result was
sent to the host rather than the day it was returned.  Think of it as
a contract.).  A median is used rather than an average to avoid
problems where hosts claim zero CPU time or are granted 1e+25 credits
or claim they can do 1e+37 integer operations per second.  It also
will be relatively unaffected by optimized apps unless more than half
the people in the project are using them.
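
As I read Eric's description, the daily update could be sketched roughly like this (the field names, the per-host aggregation, and the exact moving-average form are my assumptions for illustration, not BOINC's actual schema):

```python
from collections import defaultdict
from statistics import median

def update_multiplier(results, old_avg, days=30):
    """One (sketched) daily run of the cross-project parity script."""
    # Aggregate granted and benchmark-based credit per host, so each
    # host contributes one data point however many results it returned.
    granted = defaultdict(float)
    bench = defaultdict(float)
    for r in results:                     # up to ~10,000 results per day
        granted[r["host"]] += r["granted"]
        bench[r["host"]] += r["benchmark_credit"]
    # Median, not mean: a host claiming zero CPU time or 1e+25 credits
    # moves the figure by at most one rank.
    ratios = [granted[h] / bench[h] for h in granted if bench[h] > 0]
    daily = median(ratios)
    # Fold the daily figure into a ~30-day moving average.
    return old_avg + (daily - old_avg) / days
```

The scheduler then multiplies each claim by the average that was in force on the day the task was sent, not returned - the "contract".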

According to Eric's notes, the "contract" for the credit value of a particular task is struck on the day the task was sent to the host. For that reason, I have sorted the table of results into 'date/time sent' order, and the credit awards don't really seem to follow any logical pattern. Perhaps things will become clearer when we have more results.

In the meantime, I've been double-checking that Q9300 graph for my first result. Because the data in the hosts.xml stats export file has a figure for credit_per_cpu_sec, and the BOINC benchmark is expressed in FLOPS_per_cpu_second, we can derive a dimensionless ratio of credit per FLOP, which is directly related to FLOPcounted per FLOPbench - an efficiency or optimisation measure.

This seems to work. Using a human-scaled measure of credits per FLOP (I've lost track of the orders of magnitude: I've been scaling by an extra million either way when the figures get too big or too small to imagine. I think these are ×10^12), the average score for all Q9300 CPUs running Windows XP in my hosts table is 3.176375. When you sort them, there's a big gap between 3.766 and 4.378 - and sure enough, the 3.766 score comes from a host (2185541) running the stock app, and the 4.378 score from a host (4083757) running an optimised app. That's the basis I used for splitting the population of Q9300s into 'stock' and 'optimised' in my graph.
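
For what it's worth, the ratio and the gap-split can be written down directly (the input figures below are shaped like the ones quoted in this thread, and `scale=1e12` is my guess at the order of magnitude):

```python
def credit_per_flop(credit_per_cpu_sec, p_fpops, scale=1e12):
    # Dimensionless efficiency measure: credit per benchmark FLOP,
    # scaled up to human-sized numbers.
    return credit_per_cpu_sec / p_fpops * scale

def split_at_biggest_gap(scores):
    # Split a sorted population at its single largest gap, as done by
    # eye for the Q9300s (3.766 stock vs 4.378 optimised).
    scores = sorted(scores)
    gap_index = max(range(len(scores) - 1),
                    key=lambda i: scores[i + 1] - scores[i])
    return scores[:gap_index + 1], scores[gap_index + 1:]
```

This only works cleanly when the stock and optimised populations really are separated by one dominant gap, as they seem to be here.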

Being thus reassured about the validity of the stock trendline on that graph, I reckon my Q9300 should be claiming about 1040 credits per AP WU to maintain parity with SETI stock MB.

Talking of parity, we now have, by my reckoning, five different parity/normalisation issues which we could be worrying about.

1) Parity between tasks
In the case of SETI Multibeam, this would relate specifically to the rate of credit/CPU_sec at different Angle Ranges. We have a mechanism to deal with this (the WU multiplier - currently motionless at 2.85), and when SETI 6.02 or thereabouts goes live on the main project, the vast majority of hosts will be equipped to handle it. It would then be possible, in theory, to plot a credit rate / AR curve, and use the multiplier to flatten it. I doubt it's worth spending any time on this, because there are so many tasks that they average out for each individual host or user.

This sort of parity doesn't apply to Astropulse, of course.

2) Parity between CPUs
See, in particular, the difference between AMD and Intel in my other graph. I don't think we could, or should, do anything about this except be aware of it.

3) Parity between operating systems
I'm thinking particularly of the Linux benchmark issue, since that affects the whole 'level playing field' question. But that's for BOINC to sort out.

There's a separate question of the relative efficiency/state of optimisation of science apps for different platforms within each project.

4) Parity between applications within a project
That's where the relationship between SETI Enhanced Multibeam and Astropulse fits into the pattern.

5) Parity between projects
This is what Eric's new script is designed to address, and I have no quarrel with the concept. Picking nits, it's perhaps a shame that it should have been implemented via a multiplier - already there is confusion at Main between the parity (1) task multiplier, and the parity (5) project multiplier.
-------------------------------
In this thread, and on this board, I'm concerned with parity (4) - the balance between SETI MB and Astropulse.

As I hope I've demonstrated, I think the balance is currently wrong: and as Joe Segur has pointed out in another thread, there is currently no apparent way of correcting it dynamically. If TOTAL_FLOPS is defined as a constant in the application code header, and there is no analog of

<analysis_cfg>
  <credit_rate>2.8499999</credit_rate>
</analysis_cfg>

in the AP WU headers, we're going to need a new app deployment to correct any errors - and that's going to cause more than double the deployment headaches at Main.

We've been working on the current round of testing Astropulse for over 16 months, since Josh came on board at about v4.12. Let's not spoil the launch experience by being too hasty, and failing to test the deployment and implementation just as thoroughly.
Winterknight
Volunteer tester

Send message
Joined: 15 Jun 05
Posts: 709
Credit: 5,834,108
RAC: 0
United Kingdom
Message 34422 - Posted: 28 Jul 2008, 17:23:58 UTC

Thanks for all that hard work Richard.

I think that will clear up a few things, if people read it and take care to understand it before putting foot in mouth and starting more rumours.

I do think that BOINC should sort out the OS benchmark variations, and it would have been better to do so before rather than after this correction.

One thing I have noted (though I haven't collected and studied the data enough to be specific) is that the variation in credit/time for the stock app at different ARs is greater than for the optimised app. Here, using stock, a quick look says the variation is from below 12/hr to 45/hr; on the Main site, using the optimised app, the variation is between 35/hr and 60/hr.
Urs Echternacht
Volunteer tester
Avatar

Send message
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 34423 - Posted: 28 Jul 2008, 19:15:40 UTC

...
Being thus reassured about the validity of the stock trendline on that graph, I reckon my Q9300 should be claiming about 1040 credits per AP WU to maintain parity with SETI stock MB...

If I remember that figure correctly: after the development period of MB v5.28 Linux, my T7200 had a levelled RAC of 850 to 900 running MB beta 24/7 (credit multiplier: 2.85). With the current credit given for AstroPulse WUs, that RAC would be more like 550 to 600. (One example is no proof.)

As far as I understood earlier reasoning on these boards, these figures should be approximately equal for stock applications. Obviously they are not.


_\|/_
U r s
Josef W. Segur
Volunteer tester

Send message
Joined: 14 Oct 05
Posts: 1137
Credit: 1,848,733
RAC: 0
United States
Message 34430 - Posted: 29 Jul 2008, 5:03:50 UTC - in response to Message 34420.  

...
4) Parity between applications within a project
That's where the relationship between SETI Enhanced Multibeam and Astropulse fits into the pattern.
...

In one of Eric's posts related to the new server-side multiplier adjustment, he said it was application specific. IOW there should be separate adjustment for AP and _enhanced going on, with each independently driven toward the same overall credit rate.

Also, the script is designed to be run daily, so if there are fewer than 10000 results in a day the adjustments are likely to be somewhat uneven. That's perhaps why the AP sequence you showed is not a clear trend. With only about 4000 active hosts here and AP work taking perhaps 2 to 3 days on average, there's bound to be a drunken walk effect.
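
Joe's "drunken walk" can be illustrated with a toy simulation (all numbers invented; it only demonstrates that small daily samples make the averaged multiplier wander around the true ratio rather than settle on it):

```python
import random

random.seed(1)
true_ratio = 2.0
avg = true_ratio
path = []
for day in range(60):
    # Only a handful of AP results come back each day, so the daily
    # median is a noisy estimate of the underlying ratio.
    sample = sorted(random.gauss(true_ratio, 0.5) for _ in range(40))
    daily_median = sample[len(sample) // 2]
    avg += (daily_median - avg) / 30        # 30-day moving average
    path.append(avg)
# `path` hovers near 2.0 but never sits exactly on it.
```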

My guess is the credit parity will resolve itself. I'm more concerned about the Duration Correction Factor, which is definitely a project value for each host rather than an application value.
                                                                Joe
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34435 - Posted: 29 Jul 2008, 11:12:22 UTC - in response to Message 34430.  

In one of Eric's posts related to the new server-side multiplier adjustment, he said it was application specific. IOW there should be separate adjustment for AP and _enhanced going on, with each independently driven toward the same overall credit rate.

Eric has posted in detail here, and although he mentions parity with Astropulse, he isn't explicit about how this is to be achieved. However, I've had a (very brief) look at the code in the changeset [trac]changeset:15661[/trac] implementing this, and I think I would agree with your assessment.

In which case, we have another source of systematic bias.

I've extracted, and averaged, the p_fpops and credit_per_cpu_sec, for each different CPU used by SETI crunchers. All 2189 of them! (wonderful things, databases). Here's the sort of thing I mean:

CountOfHostID	AvgOfp_fpops	AvgOfcredit_per_cpu_sec	CPU name
16767	1311.109	0.002683	Intel(R) Pentium(R) 4 CPU 3.00GHz
11169	1298.351	0.002779	Intel(R) Pentium(R) 4 CPU 2.80GHz
7166	1406.428	0.002896	Intel(R) Pentium(R) 4 CPU 3.20GHz
7150	2386.587	0.007254	Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
6734	1185.095	0.002540	Intel(R) Pentium(R) 4 CPU 2.40GHz
5959	731.002	0.002060	Power Macintosh
5098	2280.526	0.006576	Intel(R) Core(TM)2 CPU          6600  @ 2.40GHz
4165	1365.323	0.003566	Intel(R) Pentium(R) D CPU 2.80GHz
4049	1916.962	0.003649	AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
3947	1905.211	0.003484	AMD Athlon(tm) 64 Processor 3200+
3928	2340.602	0.003862	AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
3818	1466.749	0.004013	Intel(R) Pentium(R) D CPU 3.00GHz

This sort of averaging is only useful for common CPUs, so here's a plot of the data for the 70 CPU types where there are at least 1,000 in use at SETI:



The bias comes in because Astropulse tasks will only be issued to, and returned by, the fastest hosts. So the Astropulse moving average will tend to lie somewhere towards the right-hand line, and the Multibeam moving average, being drawn from all hosts, will tend to lie somewhere towards the left-hand line. [Eric says 'Today the "middle of the pack" is a dual processor 3 GFLOP machine with 1GB of RAM': not for Astropulse, it won't be].

Another way of checking this, for those of you running BoincView: BV displays a credit estimate for each task, based on the benchmark*time formula. Experience at projects still using this credit-scoring method (LHC, Orbit) tells me that BV is accurate. While the scheduler is down, you'll have completed AP tasks with their final credit claim visible. My Q9300 would claim just over 690 credits, my E5320 over 800, if we were still using benchmark*time for individual tasks. Anyone got any data for a P4 or AMD yet?

Incidentally, why do the four Pentium Ds (2.80 GHz, 3.00 GHz, 3.20 GHz, and 3.40GHz) lie so squarely on the Core 2 line, and not on the P4 line? I didn't think there was that much improvement between the two ranges. Casts an ever-so-slight question mark over the credit_per_cpu_sec measure in the stats XML.
Winterknight
Volunteer tester

Send message
Joined: 15 Jun 05
Posts: 709
Credit: 5,834,108
RAC: 0
United Kingdom
Message 34445 - Posted: 29 Jul 2008, 15:12:59 UTC

Incidentally, why do the four Pentium Ds (2.80 GHz, 3.00 GHz, 3.20 GHz, and 3.40GHz) lie so squarely on the Core 2 line, and not on the P4 line? I didn't think there was that much improvement between the two ranges. Casts an ever-so-slight question mark over the credit_per_cpu_sec measure in the stats XML.

I think the Pentium Ds have two separate CPUs, as opposed to the P4's HT, and also have a larger L2 cache (2 × 1 or 2 MB) than the P4's 512k.
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34447 - Posted: 29 Jul 2008, 15:41:35 UTC - in response to Message 34445.  

Incidentally, why do the four Pentium Ds (2.80 GHz, 3.00 GHz, 3.20 GHz, and 3.40GHz) lie so squarely on the Core 2 line, and not on the P4 line? I didn't think there was that much improvement between the two ranges. Casts an ever-so-slight question mark over the credit_per_cpu_sec measure in the stats XML.

I think the Pentium Ds have two separate CPUs, as opposed to the P4's HT, and also have a larger L2 cache (2 × 1 or 2 MB) than the P4's 512k.

I would expect a HT processor to score badly on the 'per_cpu' scale, but the older P4s - like this 2.0 GHz Northwood - didn't have HT: on that basis, the four markers below 1000 flops should line up with the Pent Ds (and the two above the line, pointing at the AMDs, would be the ones with the HT penalty).

Maybe you're right about the PDs (and only the PDs) having more/better L2 cache: it's a dramatic demonstration of the benefit of L2 to SETI if that's the explanation.
Winterknight
Volunteer tester

Send message
Joined: 15 Jun 05
Posts: 709
Credit: 5,834,108
RAC: 0
United Kingdom
Message 34449 - Posted: 29 Jul 2008, 16:06:22 UTC - in response to Message 34447.  
Last modified: 29 Jul 2008, 16:09:03 UTC

Incidentally, why do the four Pentium Ds (2.80 GHz, 3.00 GHz, 3.20 GHz, and 3.40GHz) lie so squarely on the Core 2 line, and not on the P4 line? I didn't think there was that much improvement between the two ranges. Casts an ever-so-slight question mark over the credit_per_cpu_sec measure in the stats XML.

I think the Pentium Ds have two separate CPUs, as opposed to the P4's HT, and also have a larger L2 cache (2 × 1 or 2 MB) than the P4's 512k.

I would expect a HT processor to score badly on the 'per_cpu' scale, but the older P4s - like this 2.0 GHz Northwood - didn't have HT: on that basis, the four markers below 1000 flops should line up with the Pent Ds (and the two above the line, pointing at the AMDs, would be the ones with the HT penalty).

Maybe you're right about the PDs (and only the PDs) having more/better L2 cache: it's a dramatic demonstration of the benefit of L2 to SETI if that's the explanation.

I personally think it is the L2 cache; my son's Q9450 at stock speed, 2.66 GHz, is about 5% faster than my Q6600 OC'd @ 3 GHz. The main difference is L2 cache size.
The Q9450 doesn't run here.

[edit] The E6600 is also faster per core than the Q6600, so maybe memory bandwidth is also a factor.
Winterknight
Volunteer tester

Send message
Joined: 15 Jun 05
Posts: 709
Credit: 5,834,108
RAC: 0
United Kingdom
Message 34452 - Posted: 29 Jul 2008, 16:23:29 UTC

The bias comes in because Astropulse tasks will only be issued to, and returned by, the fastest hosts. So the Astropulse moving average will tend to lie somewhere towards the right-hand line, and the Multibeam moving average, being drawn from all hosts, will tend to lie somewhere towards the left-hand line. [Eric says 'Today the "middle of the pack" is a dual processor 3 GFLOP machine with 1GB of RAM': not for Astropulse, it won't be].

What are the implications if the median computer on different projects and/or applications is different? By different I mean something like a P4 HT @ 3 GHz versus a Core2 @ 2 GHz.
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34455 - Posted: 29 Jul 2008, 23:33:21 UTC - in response to Message 34452.  

The bias comes in because Astropulse tasks will only be issued to, and returned by, the fastest hosts. So the Astropulse moving average will tend to lie somewhere towards the right-hand line, and the Multibeam moving average, being drawn from all hosts, will tend to lie somewhere towards the left-hand line. [Eric says 'Today the "middle of the pack" is a dual processor 3 GFLOP machine with 1GB of RAM': not for Astropulse, it won't be].

What are the implications if the median computer on different projects and/or applications is different? By different I mean something like a P4 HT @ 3 GHz versus a Core2 @ 2 GHz.

I spent a lot of time working out the answer to that question this afternoon, and got as far as previewing several paragraphs about the 'median computer', before I double-checked Eric's email and realised he was talking about the median of the granted/benchmark credit ratio. Another one for the draft bucket in the sky!

I posted a couple of benchmark*time credit claims in the 'BoincView' paragraph of my long post this morning. Let's extend, and simplify for argument's sake, that trend:

AP credit claim using benchmark*time method (simplified example):

45nm Core2 (q9xxx): 700
65nm Core2 (Q6xxx): 800
Pentium 4 (not HT): 900
AMD: 1,000

If the P4 is the 'median' credit-rate computer, then 900 credits is deemed the 'right' answer by Eric's script, and all AP credit awards will gently move towards that figure. If P4 is the median computer for SETI/MB (as it currently is), and the benchmarks are the same for SETI and AP (as they most certainly are), then the credit per hour for a given machine will be the same for both SETI/MB and AP - as Josh says in the FAQ.

But if the 65nm Core2 becomes the 'median' credit-rate computer for AP, then 800 credits is deemed the 'right' answer by Eric's script, and AP credit awards will trend downwards to this new value. The credit per hour will diminish for all AP tasks on all hosts. If P4 remains the median for SETI/MB, then parity is broken and Josh has egg on his face.

NB Because Eric is talking about the ratio of CREDITflop to CREDITbench, the actual speed of the median computer doesn't make any difference (to a first approximation). It's the slope of the trendline that it lies on that is crucial. With a small AP population, no optimised apps to muddy the waters, and such a clear difference between the trendlines for P4 and Core2, we could be in for some quite sharp fluctuations if the median is anywhere close to the P4/Core2 transition.
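
That NB can be checked with two hypothetical hosts (the cobblestone constant here is arbitrary; only the cancellation matters):

```python
def claim_ratio(granted_credit, cpu_sec, p_fpops, cobblestones_per_flop=2.4e-12):
    # Ratio of granted credit to the benchmark-based claim for one task.
    benchmark_claim = cpu_sec * p_fpops * cobblestones_per_flop
    return granted_credit / benchmark_claim

# Same WU, same granted credit; the second host is exactly twice as
# fast (half the CPU time, double the benchmark). Its ratio is
# identical, so the absolute speed of the median machine cancels out
# and only the efficiency slope it lies on matters.
slow = claim_ratio(700.0, 200_000, 2.4e9)
fast = claim_ratio(700.0, 100_000, 4.8e9)
assert abs(slow - fast) < 1e-12
```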
Urs Echternacht
Volunteer tester
Avatar

Send message
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 34456 - Posted: 29 Jul 2008, 23:39:11 UTC
Last modified: 29 Jul 2008, 23:44:39 UTC

Shouldn't different hosts running the same app version get the same credit on the same WU? http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=1249491
Or is this a follow-up of last week's scheduler confusion?

If I look at the current claims on my T7200, I get the feeling that the reducing process of DCF is part of the equation used to calculate the credit:
Result	WU	Sent	Returned	State	Outcome	Validation	CPU time (s)	Claimed	Granted
4243775 1249512 22 Jul 2008 22:21:10 UTC 29 Jul 2008 23:03:22 UTC Over Success Done 207,643.40 694.05 pending
4243680 1249465 21 Jul 2008 21:54:43 UTC 27 Jul 2008 20:00:34 UTC Over Success Done 205,005.20 705.80 705.80
4243678 1249464 21 Jul 2008 21:54:43 UTC 27 Jul 2008 20:00:34 UTC Over Success Done 206,667.20 705.80 684.33
4109211 1203208 15 Jul 2008 0:59:09 UTC 24 Jul 2008 22:49:01 UTC Over Success Done 206,694.90 719.36 719.36
4109203 1203204 15 Jul 2008 0:59:30 UTC 25 Jul 2008 0:36:17 UTC Over Success Done 205,045.80 719.36 719.36

I am confused now!
_\|/_
U r s
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34457 - Posted: 29 Jul 2008, 23:57:42 UTC - in response to Message 34456.  

Shouldn't different hosts running the same app version get the same credit on the same WU? http://setiweb.ssl.berkeley.edu/beta/workunit.php?wuid=1249491
Or is this a follow-up of last week's scheduler confusion?

Good spot. As well as the implied assumption in Eric's script that the median SETI/MB computer will have the same CREDITflop to CREDITbench ratio as the median AP computer, there is a second implied assumption, in
The scheduler multiplies claimed credit by the value of that ratio (on the day the result was sent to the host rather than the day it was returned. Think of it as a contract.).

that both halves of a quorum pair are sent on the same day, and hence at the same credit rate.

In your example,

4243732 _5456 19 Jul 2008 03:14:44 UTC 27 Jul 2008 19:58:57 UTC Over Success Done 207,572.20 719.36 684.33 
4243731 18041 25 Jul 2008 19:12:40 UTC 29 Jul 2008 16:25:10 UTC Over Success Done 286,221.40 684.33 684.33

host 5456 can sue for breach of contract, because the scheduler didn't get round to issuing the task to host 18041 until the exchange rate had fallen some 5% - and the rules of quorum say host 5456 suffers that new rate, too.

What happens if you have to wait a couple of months for AWOL wingmen, and the median host has moved from P4 to Core2 in the meantime?

I don't think any of these variations are due to DCF - more likely, Eric's script is desperately trying to compensate for all those 1,869.04 AP v4.34 credit awards from less than 30 days ago.
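
The "breach of contract" in the example above boils down to a few lines. The rule that the quorum settles on the lower claim is my reading of the figures in this thread, not confirmed project policy:

```python
# Each claim is scaled by the multiplier in force on its own send day,
# but the quorum grants one figure to both hosts.
def granted_for_quorum(claims):
    return min(claims)   # assumption: the lower claim wins, per the example

claim_sent_19_jul = 719.36   # sent while the exchange rate was higher
claim_sent_25_jul = 684.33   # same WU, sent after the rate slipped ~5%
granted = granted_for_quorum([claim_sent_19_jul, claim_sent_25_jul])
assert granted == 684.33     # host 5456 "suffers that new rate, too"
```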
Urs Echternacht
Volunteer tester
Avatar

Send message
Joined: 18 Jan 06
Posts: 1038
Credit: 18,734,730
RAC: 0
Germany
Message 34462 - Posted: 30 Jul 2008, 2:25:51 UTC - in response to Message 34457.  

...more likely, Eric's script is desparately trying to compensate for all those 1,869.04 AP v4.34 credit awards from less than 30 days ago.

Thanks for your explanations of what is going on with the credits.

The script would do better to ignore such overestimates from earlier app versions, because I think it will overcompensate.

One 'hard cut' would have been easier than sneaking up on the new 'credit multiplier' of ca. 2.47826 for enhanced.
_\|/_
U r s
Winterknight
Volunteer tester

Send message
Joined: 15 Jun 05
Posts: 709
Credit: 5,834,108
RAC: 0
United Kingdom
Message 34465 - Posted: 30 Jul 2008, 4:40:46 UTC - in response to Message 34455.  
Last modified: 30 Jul 2008, 4:41:43 UTC

The bias comes in because Astropulse tasks will only be issued to, and returned by, the fastest hosts. So the Astropulse moving average will tend to lie somewhere towards the right-hand line, and the Multibeam moving average, being drawn from all hosts, will tend to lie somewhere towards the left-hand line. [Eric says 'Today the "middle of the pack" is a dual processor 3 GFLOP machine with 1GB of RAM': not for Astropulse, it won't be].

What are the implications if the median computer on different projects and/or applications is different? By different I mean something like a P4 HT @ 3 GHz versus a Core2 @ 2 GHz.

I spent a lot of time working out the answer to that question this afternoon, and got as far as previewing several paragraphs about the 'median computer', before I double-checked Eric's email and realised he was talking about the median of the granted/benchmark credit ratio. Another one for the draft bucket in the sky!

I posted a couple of benchmark*time credit claims in the 'BoincView' paragraph of my long post this morning. Let's extend, and simplify for argument's sake, that trend:

AP credit claim using benchmark*time method (simplified example):

45nm Core2 (q9xxx): 700
65nm Core2 (Q6xxx): 800
Pentium 4 (not HT): 900
AMD: 1,000

If the P4 is the 'median' credit-rate computer, then 900 credits is deemed the 'right' answer by Eric's script, and all AP credit awards will gently move towards that figure. If P4 is the median computer for SETI/MB (as it currently is), and the benchmarks are the same for SETI and AP (as they most certainly are), then the credit per hour for a given machine will be the same for both SETI/MB and AP - as Josh says in the FAQ.

But if the 65nm Core2 becomes the 'median' credit-rate computer for AP, then 800 credits is deemed the 'right' answer by Eric's script, and AP credit awards will trend downwards to this new value. The credit per hour will diminish for all AP tasks on all hosts. If P4 remains the median for SETI/MB, then parity is broken and Josh has egg on his face.

NB Because Eric is talking about the ratio of CREDITflop to CREDITbench, the actual speed of the median computer doesn't make any difference (to a first approximation). It's the slope of the trendline that it lies on that is crucial. With a small AP population, no optimised apps to muddy the waters, and such a clear difference between the trendlines for P4 and Core2, we could be in for some quite sharp fluctuations if the median is anywhere close to the P4/Core2 transition.

Thanks for that.
But it does lead to some other thoughts.
First, in the ideal BOINC world, a computer that has a RAC of, say, 100 on project 'A' should expect its RAC to remain reasonably constant if it moves to project 'B'. But I'm not sure this will happen if the median computers on different projects/apps are greatly different.

Second is the "contract point". If the credits for a task are decided when the WU is split, then a host could get differing credits for very similar tasks (MB, same AR) if one task is a re-issue and the other is hot off the press. That would be guaranteed to generate heated discussion. But if the "contract point" is at the time of issue, then as you said one, or more, hosts could be paid below the agreed price. Again a heated discussion area.

Third, is the OS in the mix? Considering Einstein's present S5R3 power applications - almost the defaults, I am told - the Linux variant is faster than the others.

Fourth point, the median computer. Is it going to be one specific class, i.e. P4 HT @ 3.00GHz with identical stepping, or all P4s at 3 GHz? P4s cover a huge range of Willamette, Northwood and Prescott cores; L2 can be between 256k and 2 MB; all have SSE2 but some have SSE3; plus some Prescotts have EM64T. You can have one with no HT, one with HT, or one with HT but set to use only one CPU in the preferences. Add all these variables together, if they are all classed in the same pot, and you could get many discussions.

Hope I am not being too pessimistic. I actually think it is a good idea; I'm just not sure that all scenarios have been thought through.
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34472 - Posted: 30 Jul 2008, 9:17:19 UTC - in response to Message 34465.  
Last modified: 30 Jul 2008, 9:17:51 UTC

....
Fourth point, the median computer. Is it going to be one specific class, i.e. P4 HT @ 3.00GHz with identical stepping, or all P4s at 3 GHz? P4s cover a huge range of Willamette, Northwood and Prescott cores; L2 can be between 256k and 2 MB; all have SSE2 but some have SSE3; plus some Prescotts have EM64T. You can have one with no HT, one with HT, or one with HT but set to use only one CPU in the preferences. Add all these variables together, if they are all classed in the same pot, and you could get many discussions.

Ah, you've fallen into exactly the same trap I did the first time round. I had lots of lovely stuff about what would happen if the "daily representative" median computer had a weird performance metric - all complete bunkum.

The median that Eric says he's using is "ratio of granted credit to the credit that would have been granted based upon the benchmarks". I don't know what units Eric is using, but on the day I took my big hosts.xml dump, the median value was 2.07823298886989 in my working unit of Cobblestones per Benchmark Floating Point Operation (scaled by 1,000,000,000,000). On my graphs, that isn't a point representing any particular computer: it's a ratio, or slope, so it looks like this:



(That's reassuring, because I did it a different way yesterday: I looked at the 200 individual hosts closest to the median, and grouped them by CPU model. 94 of the 200 were Pentium 4s, but there were also 27 AMDs and 8 Core2s - 92 distinct processor models in all).

The data comes from the main project hosts file on 18 July, so it is exclusively MB work. Unfortunately, once AP goes live, we won't be able to do this sort of analysis externally, because the MB and AP figures won't be exported separately in the stats dump.

[Also, I note that, because Eric is tracking 'credit that would have been granted based upon the benchmarks', I ought to be plotting the average of p_iops and p_fpops, not just p_fpops. I'll re-work the figures on that basis, and post again if there are any significant changes]
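
For reference, the classic benchmark*time claim, as I understand the cobblestone definition (100 credits per day for a host benchmarking 1 GFLOPS Whetstone and 1 GIPS Dhrystone - treat the constants as my assumption), averages the two benchmarks exactly as noted:

```python
def benchmark_claim(cpu_sec, p_fpops, p_iops):
    # Average the floating-point and integer benchmarks, then scale:
    # 100 cobblestones per day for a host benchmarking 1e9 in each.
    mean_bench = (p_fpops + p_iops) / 2.0
    return cpu_sec / 86400.0 * mean_bench / 1e9 * 100.0
```

This is why plotting p_fpops alone understates what a host with a strong integer benchmark would actually claim.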
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34473 - Posted: 30 Jul 2008, 10:56:58 UTC

Here's the graph for the proper benchmark values: the vertical scale is now the average of p_iops and p_fpops, not just p_fpops as it was before. Not as pretty as the previous one, but then, the real world never is.....



The 'high efficiency' CPUs (the ones furthest from the median line) are the E6550, the 6700 (no letter), and - at the extreme top - the E8400. So I guess the BOINC benchmarks, and the systemic bias in Eric's project multiplier, are going to get worse in comparison to real-world performance on floating-point intensive projects as the 45nm (and beyond) CPUs roll out into the wider community.

The Core2 with the worst efficiency (all mouth and no trousers - high benchmark but low throughput) is the T7700. And look at those Pentium Ds crunch!

(Remember, these are all averages of at least 1,000 individual specimens of each separate processor type. So I doubt that we're seeing the effects of over-claiming BOINC clients - which would push the data towards the top of the graph - or of optimised science applications - which would push it to the right-hand side).
Winterknight
Volunteer tester

Send message
Joined: 15 Jun 05
Posts: 709
Credit: 5,834,108
RAC: 0
United Kingdom
Message 34474 - Posted: 30 Jul 2008, 10:58:03 UTC
Last modified: 30 Jul 2008, 10:58:34 UTC

"ratio of granted credit to the credit that would have been granted based upon the benchmarks"

That is the bit that worries me slightly: the benchmark figures for Pappa's AMD X2 6000+ and my Q6600 are close, but the times to complete an AP unit are ~84 hrs for Pappa and ~38 hrs for mine. So if the choice of median is per CPU type, either Pappa is punished or my computer is granted much more than could be expected.

Or am I missing something and wrong again? That is the usual case.
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34476 - Posted: 30 Jul 2008, 11:44:23 UTC - in response to Message 34474.  

"ratio of granted credit to the credit that would have been granted based upon the benchmarks"

That is the bit that worries me slightly: the benchmark figures for Pappa's AMD X2 6000+ and my Q6600 are close, but the times to complete an AP unit are ~84 hrs for Pappa and ~38 hrs for mine. So if the choice of median is per CPU type, either Pappa is punished or my computer is granted much more than could be expected.

Here is the plot one more time - this had better be the last! - with some sample AP v4.35 points.



Your Q6600 is newer than mine (stepping 11 vs. stepping 7), and you have yours overclocked by ~20%, according to the benchmarks. Pappa is benchmarking just a touch above the average for his CPU class (the fastest of the AMD series).

I've done the cr/sec as if we were all claiming 719.36 credits - the first fixed (we thought) credits for AP v4.35 before the self-adjusting multiplier kicked in. I think Pappa is taking more seconds than his class of computer would be expected to. What we don't know yet is whether Pappa has a particularly sluggish computer, or whether all AMDs (or a particular subset of them) do AP work very inefficiently. More data, please.
Richard Haselgrove
Volunteer tester

Send message
Joined: 3 Jan 07
Posts: 1451
Credit: 3,272,268
RAC: 0
United Kingdom
Message 34477 - Posted: 30 Jul 2008, 11:49:51 UTC - in response to Message 34474.  

.... if the choice of median is for cpu type ....

No. The choice of median is the ratio only. That ratio will be typical of, but not defined by, any particular processor type.
Profile Pappa
Volunteer tester
Avatar

Send message
Joined: 13 Nov 05
Posts: 1724
Credit: 3,121,901
RAC: 0
United States
Message 34495 - Posted: 31 Jul 2008, 1:35:33 UTC - in response to Message 34476.  
Last modified: 31 Jul 2008, 1:37:33 UTC

Richard et al

The X2 6000 has been consistent in its times for the various WUs over the last couple of versions. The 6000 does have a 1 MB cache × 2 for the CPUs, and the bus is operating cleanly with no OC.

From what I have seen, it is suffering the compiler penalty (Intel-biased). This goes for machines on SETI Main also. The AK app does bring times closer into alignment with the various Intel machines.

"ratio of granted credit to the credit that would have
What we don't know yet is whether Pappa has a particularly sluggish computer, or whether all AMDs (or a particular subset of them) do AP work very inefficiently. More data, please.


Edit:

BTW the graph shows that fairly clearly... AMD is to the left of the median and Intel is to the right.
Thanks to Paul and Friends
Please consider a Donation to the Seti Project