Let's Play CreditNew (Credit & RAC support thread)

Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937256 - Posted: 26 May 2018, 13:26:57 UTC - in response to Message 1937234.  
Last modified: 26 May 2018, 13:30:08 UTC

That takes us back to the naughties - 2008 and before, when neither CN nor GPUs were in use.

Does this mean you can now tell Eric, "Hey, turns out there's nothing really too wrong with the credit system. It kinda works like your flops based system that everybody loved. So, you know... oops."?

The basic reason is that <rsc_fpops_est> is overstated for MB tasks by - oooh, 248%?

Does that mean that if we were to count "real" flops, credit would drop by more than 50% (instead of rising by 50%)?
And that we should be careful what we wish for? :)
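
A rough worked check of that arithmetic, as a sketch only. It assumes credit is directly proportional to the flops counted, and it tries two readings of "overstated by 248%" (estimate = 2.48 x real, or estimate = 3.48 x real), since the quoted post does not say which is meant. The function name is hypothetical.

```python
def credit_drop(overstatement_factor: float) -> float:
    """Fractional drop in credit if the flops estimate were corrected to the real count.

    Assumption: credit scales linearly with the flops counted.
    """
    return 1.0 - 1.0 / overstatement_factor


# Two possible readings of "overstated by 248%" (both are assumptions).
for factor in (2.48, 3.48):
    print(f"estimate = {factor:.2f} x real -> credit drops by {credit_drop(factor):.0%}")
```

Under either reading the drop comes out at more than 50% (roughly 60% and 71%), which is what the question above is getting at.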
ID: 1937256 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22228
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1937257 - Posted: 26 May 2018, 13:46:00 UTC

Both your quotes are THEORY - not based on what is actually happening.

With the progressive improvement in the stock and optimised applications, the use of "peak" FLOPs has become more inaccurate, which I know is somewhat counter-intuitive. But let me try to explain this AGAIN.
FLOPs is based on a "standard" set of calculations compiled in a very simple way: no compiler optimisation, no code optimisation, no "fancy" op-codes. As any of these come into play the quality of the estimate is degraded. How large the degradation will be is very hard to predict - take an 8-core AMD FX processor as an example:
Run a benchmark process on a single core and you will get one answer. Run the same benchmark twice, simultaneously or in sequence, and you have a roughly equal chance of getting the same answer or a different one, depending on how the operating system decides to allocate the tasks (these processors have a Floating Point Unit shared between pairs of Logic Processing Units). Now try it with three, four times..... Very messy, and we haven't even looked at the impact of compiling the code for an "Intel" or "AMD" base instruction set - both will run, but may give different performances depending on the actual low level instruction set that is required... So take FLOPs as implemented within BOINC & SETI with at least a big bucket of salt, a pinch just isn't big enough :-(

And since FLOPs are inaccurate, that means that Cobblestones (which are only scaled FLOPs) are equally inaccurate.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1937257 · Report as offensive
Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937266 - Posted: 26 May 2018, 14:15:11 UTC - in response to Message 1937257.  

Yeah I know all that.

I also know you can't go above the LINPACK benchmark.
I have found no evidence you can go above the Whetstone bench either.

Please show me so I can really understand.
ID: 1937266 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1937272 - Posted: 26 May 2018, 14:52:37 UTC - in response to Message 1937266.  

OK, here's a show. An Overview of Common Benchmarks. From the top of the third page:

Many Whetstone versions copied informally and used for benchmarking have the print statements removed, apparently with the intention of achieving better timing accuracy. This is contrary to the authors’ intentions, since optimizing compilers may then eliminate significant parts of the program. If timing accuracy is a problem, the loop bounds should be increased in such a way that the time spent in the extra statements becomes insignificant.

We discussed SIMD operations a day or two ago. Whetstone dates from 1976, long before even the most basic SSE instruction set had been implemented (that was 1999). Use of any form of SSE, up to and beyond AVX, would be considered an optimisation in the Whetstone context, and excluded. That's why efficiencies of up to 400% are possible, and why the statement of equivalence of the terms 'whetstone' and 'peak flops' in the BOINC documentation was nonsense from the day it was written.
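
A minimal sketch of where a figure like 400% can come from, assuming it reflects SSE's four single-precision lanes per 128-bit register (my reading; the post does not spell this out). The names below are hypothetical, and the calculation ignores everything except lane count (no fused multiply-add, no memory effects).

```python
# Register widths in bits; scalar code handles one 32-bit float per instruction.
REGISTER_BITS = {"scalar": 32, "SSE": 128, "AVX": 256}


def max_efficiency_vs_scalar(register_bits: int, element_bits: int = 32) -> float:
    """Upper bound on throughput relative to one-float-per-instruction code, in percent."""
    return 100.0 * register_bits / element_bits


for name, bits in REGISTER_BITS.items():
    print(f"{name}: up to {max_efficiency_vs_scalar(bits):.0f}% of the scalar rate")
```

On that assumption SSE tops out at 400% and AVX at 800% of an unoptimised scalar benchmark, before any other optimisation is counted.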

Linpack allows benchmarking to include optimisations, which would make it a better choice for this purpose.
ID: 1937272 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1937275 - Posted: 26 May 2018, 15:39:24 UTC - in response to Message 1937272.  

It is, of course, possible that BOINC's use of the term 'Whetstone' is false, in that it breaks the 'no optimisation' rule. Somebody will have to check both the source code, and the compiler settings on David's VS2010 build machine, for that.

Here's some data.

BOINC:
26/05/2018 16:26:16 |  | Benchmark results:
26/05/2018 16:26:16 |  | Number of CPUs: 4
26/05/2018 16:26:16 |  | 4080 floating point MIPS (Whetstone) per CPU
26/05/2018 16:26:16 |  | 15704 integer MIPS (Dhrystone) per CPU

MajorGeeks:
3,418,803

whets64MP:
Whetstone Single Precision MP SSE Benchmark Sat May 26 16:16:27 2018

 Via Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
 Uses 2 threads second at THREAD_PRIORITY_BELOW_NORMAL. Code produced by
 compiler is not necessarily the same for both threads. So speed can vary
 between threads. Vax MIPS are over-inflated due to excessive optimisation.

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS
   1728  46757   9982   2053   1689   1488    399    139   6406   8155  15653
  Thread 1              1024    842    838    199   69.4   3195   4030  14555
  Thread 2              1029    847    650    200   69.4   3211   4126   1098

 Numeric results were as expected

  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000506E3
  Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz Measured 3192 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  AMD64 processor architecture, 4 CPUs 
  Windows NT  Version 6.1, build 7601, Service Pack 1
  Memory 8117 MB, Free 5416 MB
  User Virtual Space 8388608 MB, Free 8388601 MB

linpack64:
 Linpack SSE2 Double Precision Unrolled Benchmark n @ 100
 Via Microsoft C/C++ Optimizing Compiler Version 15.00.30729.207 for x64
 Sat May 26 16:13:22 2018

 Speed    3637.06 MFLOPS

 Numeric results were as expected

  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000506E3
  Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz Measured 3192 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  AMD64 processor architecture, 4 CPUs 
  Windows NT  Version 6.1, build 7601, Service Pack 1
  Memory 8117 MB, Free 5463 MB
  User Virtual Space 8388608 MB, Free 8388600 MB
The latter two came from http://www.roylongbottom.org.uk/win64.htm - I'm not sure he obeyed the 'no optimisations' rule, either.
ID: 1937275 · Report as offensive
Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937282 - Posted: 26 May 2018, 17:41:41 UTC - in response to Message 1937275.  

Yeah I did mine a couple weeks ago with similar results....

 Whetstone Single Precision SSE Benchmark Fri May 11 02:00:44 2018

 Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86

 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS            1      2      3    MOPS   MOPS   MOPS   MOPS   MOPS
   1050  28812   3727   1125   1125    915   76.9   49.1   4019   5348   8903

 Numeric results were as expected

  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000906E9
  Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz Measured 2808 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  Intel processor architecture, 8 CPUs 
  Windows NT  Version 6.2, build 9200, 
  Memory 4096 MB, Free 4096 MB
  User Virtual Space 4096 MB, Free 4052 MB


 Linpack SSE2 Double Precision Unrolled Benchmark n @ 100
 Via Microsoft C/C++ Optimizing Compiler Version 15.00.30729.207 for x64
 Fri May 11 01:54:58 2018

 Speed    4083.80 MFLOPS

 Numeric results were as expected

  CPUID and RDTSC Assembly Code
  CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000906E9
  Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz Measured 2808 MHz
  Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
  Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
  AMD64 processor architecture, 8 CPUs 
  Windows NT  Version 6.2, build 9200, 
  Memory 8078 MB, Free 5464 MB
  User Virtual Space 134217728 MB, Free 134217683 MB


I took that to mean 4 GFLOPS for our procs is the highest possible, with or without optimizations.
Am I incorrect?
ID: 1937282 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22228
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1937285 - Posted: 26 May 2018, 17:49:15 UTC

All that means is that, running the mix of operations in the sequence used by the Linpack benchmark, your computer has scored a speed of 4083.80 MFLOPS.
Change the mix or sequence of operations and the same computer could score far more or far less. Such benchmarks are really only a rough guide to the real world, not an absolute prediction.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1937285 · Report as offensive
Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937286 - Posted: 26 May 2018, 17:56:28 UTC - in response to Message 1937285.  

Once again you haven't answered my question.

I take the LINPACK result to be a best case scenario, unobtainable in the real world.
Is this assumption incorrect?
ID: 1937286 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22228
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1937288 - Posted: 26 May 2018, 18:45:19 UTC

It is a metric using a very specific set of instructions; it is neither "the best" nor "the worst".

It is, and only is, an indication of performance, not an absolute metric.
Thus you are WRONG.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1937288 · Report as offensive
Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937289 - Posted: 26 May 2018, 19:01:13 UTC - in response to Message 1937288.  

So there's a LINPACK benchmark out there that'll show my proc to be 10 GFLOPS instead of 4 GFLOPS?
Can you show me this version of the LINPACK bench?
ID: 1937289 · Report as offensive
Richard Haselgrove Project Donor
Volunteer tester

Joined: 4 Jul 99
Posts: 14654
Credit: 200,643,578
RAC: 874
United Kingdom
Message 1937290 - Posted: 26 May 2018, 19:33:52 UTC - in response to Message 1937289.  

So there's a LINPACK benchmark out there that'll show my proc to be 10 GFLOPS instead of 4 GFLOPS?
Can you show me this version of the LINPACK bench?
What would be the point? You could write your own program, output some pretty figures, publish it on the web, and call it a benchmark utility. It would still be meaningless.

Benchmarks don't change the speed of anything. They are barely useful, but possibly interesting, if you are careful to run the SAME version of the benchmark on two DIFFERENT computers. Provided the other conditions - like the operating system - are carefully controlled, that can possibly advise you which CPU is faster at running benchmarks under those controlled conditions.

One of the authorities I read earlier contained the lines:

The Intel microprocessors were designed at the height of popularity of the Whetstone benchmark. Examining the instruction set of the math coprocessor, with instructions for sin, cos, atan, sqrt and log, possibly indicates a complete hardware implementation (the one and only?) to match the benchmark.
That's getting into Volkswagen diesel emission territory.

There's one case where an enhanced benchmark might be helpful in the real world. The version we've both reported results from shows the output line

Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
If it's only running those tests (unknown), then the fact that my processor also supports SSSE3 and AVX (both of which are known to be helpful to SETI) wouldn't be discovered when compared with a CPU which maxed out at SSE3 and no more.
ID: 1937290 · Report as offensive
rob smith Crowdfunding Project Donor*Special Project $75 donorSpecial Project $250 donor
Volunteer moderator
Volunteer tester

Joined: 7 Mar 03
Posts: 22228
Credit: 416,307,556
RAC: 380
United Kingdom
Message 1937294 - Posted: 26 May 2018, 19:37:44 UTC

Linpack is a DEFINED set, thus there shouldn't be. It is very obvious that you do not understand the use of the commonly used benchmarks like Linpack - they are INDICATORS, not predictors of absolute performance in the real world.

If you recall, I mentioned in a post a while back that I had spent some time calculating how many clock cycles a process was going to take on a given processor. This was triggered because the selected processor had a Whetstone benchmark (this was the late 1970s, so it pre-dated Linpack) that suggested it would do the task we were coding with a fair margin in hand, but it was failing to do so in a rather dramatic manner. So we were looking for an alternative processor to do the job. Studying the published benchmarks for processors gave a couple of interest, so we had prototype samples and boards produced; again they failed. Next we got serious: there was a new chip rumoured to be coming out from one of the then major players. After a lot of hard work by our management and commercial people we managed to get the datasheet, but there were no benchmarks at the time. We went through the b*ache of hand cranking the application to work on the new processor, looking at the resultant op-code and working out the actual timings. There was time to spare and we went with it. Just after we'd done this the manufacturer published the benchmarks, which indicated that this chip was slower than the ones we had rejected..... Later we went back and tried to hand optimise the code for the rejected processor, but we still couldn't get it fast enough.

Moral? Don't trust benchmarks to accurately predict the real world; to do so is to expose yourself to some very embarrassing situations.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
ID: 1937294 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1937306 - Posted: 26 May 2018, 20:59:32 UTC

And if we use the Credit system I've been proposing the benchmarks on each system play no part in the determination of Credit, and so become irrelevant.
The only benchmark that would count is the one used to define the Cobblestone, and by the Cobblestone definition, it doesn't vary.
A 1 GFLOPS reference machine earns 200 Cobblestones per 86,400 seconds (1 day).
200 doesn't vary.
Number of seconds in a day doesn't vary.
So that reference 1 GFLOPS doesn't vary.
So the Credit allocated for a given WU won't vary depending on what was used to process it.
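
The reasoning above can be sketched as follows. This is only an illustration of the Cobblestone arithmetic, not the proposed system itself: it assumes the number of floating-point operations in a WU is known, and the function name and the example operation count are hypothetical.

```python
SECONDS_PER_DAY = 86_400
CREDIT_PER_REFERENCE_DAY = 200   # Cobblestone definition: 200 credits per day at 1 GFLOPS
REFERENCE_FLOPS = 1e9            # the fixed 1 GFLOPS reference machine


def credit_for_work_unit(fpops: float) -> float:
    """Credit for a WU that needs `fpops` floating-point operations.

    Nothing about the host that crunched the WU appears in the formula,
    so the result is the same whatever hardware processed it.
    """
    reference_seconds = fpops / REFERENCE_FLOPS
    return reference_seconds * CREDIT_PER_REFERENCE_DAY / SECONDS_PER_DAY


# Hypothetical WU of 3.0e13 operations (an illustrative figure, not from the thread).
print(f"{credit_for_work_unit(3.0e13):.2f} credits")
```

A WU that would keep the reference machine busy for exactly one day (86,400 x 1e9 operations) comes out at 200 credits, matching the definition.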
Grant
Darwin NT
ID: 1937306 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1937309 - Posted: 26 May 2018, 21:10:52 UTC - in response to Message 1937294.  

Moral? Don't trust benchmarks to accurately predict the real world; to do so is to expose yourself to some very embarrassing situations.

Particularly so for synthetic benchmarks.
Benchmarks based on actual programmes using actual data from those programmes can provide relevant indications of performance. However, comparisons between different hardware can be problematic, as configuration settings that boost the performance of one system can result in significant performance degradation on another, and vice versa.
Grant
Darwin NT
ID: 1937309 · Report as offensive
Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937322 - Posted: 26 May 2018, 22:58:11 UTC

The performance measured by the LINPACK benchmark consists of the number of 64-bit floating-point operations, generally additions and multiplications, a computer can perform per second, also known as FLOPS. However, a computer's performance when running actual applications is likely to be far behind the maximal performance it achieves running the appropriate LINPACK benchmark.

Vendors of Massively Parallel Processors (MPPs) have put significant effort into the processors and interconnection networks of their parallel computers. They boast about their computer's latest performance on benchmarks and kernels such as the NAS Parallel Benchmarks or LINPACK. However, such benchmark numbers are typically viewed as levels of performance that are guaranteed not to be exceeded and generally unobtainable by all but a very few programmers who tediously optimize their code for that machine and that machine alone.

https://en.wikipedia.org/wiki/LINPACK_benchmarks
http://opensky.ucar.edu/islandora/object/technotes:183
- - - - - -
I wasn't talking about any ol' bench. My last question was about LINPACK.
Keyword "unobtainable".
It would be nice to verify we are NOT obtaining "unobtainable" FLOPS.
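
One way such a check could be sketched, under stated assumptions: take the flops a task is credited with, divide by its CPU time, and compare the implied rate with the LINPACK score for the same machine. The LINPACK figure is the one posted earlier in this thread; the task figures and function names are hypothetical, and the comparison is per core since that LINPACK run appears to be single-threaded.

```python
LINPACK_MFLOPS = 4083.80  # from the i7-7700HQ benchmark output posted earlier in the thread


def implied_mflops(fpops_credited: float, cpu_seconds: float) -> float:
    """Rate a task would have had to sustain to really perform the credited flops."""
    return fpops_credited / cpu_seconds / 1e6


def exceeds_linpack(fpops_credited: float, cpu_seconds: float,
                    linpack_mflops: float = LINPACK_MFLOPS) -> bool:
    """True if the credited flops imply a rate above the LINPACK score."""
    return implied_mflops(fpops_credited, cpu_seconds) > linpack_mflops


# Hypothetical task: 3.0e13 credited operations, at two different CPU times.
for seconds in (5_000, 10_000):
    rate = implied_mflops(3.0e13, seconds)
    print(f"{seconds} s -> {rate:.0f} MFLOPS, exceeds LINPACK: {exceeds_linpack(3.0e13, seconds)}")
```

If real tasks routinely came out above the LINPACK line, that would suggest the credited flops are overstated rather than that the CPU is beating its own benchmark.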
ID: 1937322 · Report as offensive
Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937324 - Posted: 26 May 2018, 23:04:26 UTC - in response to Message 1937290.  
Last modified: 26 May 2018, 23:19:15 UTC

...then the fact that my processor also supports SSSE3 and AVX (both of which are known to be helpful to SETI)...


Just asking for an educated guess.
Do you think SSSE3 and AVX are enough to justify a 150% performance increase over SSE3?
ID: 1937324 · Report as offensive
Profile iwazaru
Volunteer tester
Joined: 31 Oct 99
Posts: 173
Credit: 509,430
RAC: 0
Greece
Message 1937331 - Posted: 26 May 2018, 23:45:02 UTC - in response to Message 1937306.  

And if we use the Credit system I've been proposing...


For the record, I wish we could too.
It's the only "scientific" way.

But the question remains...
We wave our magic wand and get a credit system that counts flops as accurately as humanly possible...
Does credit go up? Down? Stay the same?

I know we all think it'll go up. But do we have any real proof?

(The math+numbers kind of proof, not the blah-blah-blah kind)
ID: 1937331 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1937334 - Posted: 26 May 2018, 23:57:01 UTC - in response to Message 1937331.  

But the question remains...
We wave our magic wand and get a credit system that counts flops as accurately as humanly possible...
Does credit go up? Down? Stay the same?

Sorry, my mistake.
I thought the goal of this thread was a working Credit system.
Grant
Darwin NT
ID: 1937334 · Report as offensive
Profile betreger Project Donor
Joined: 29 Jun 99
Posts: 11362
Credit: 29,581,041
RAC: 66
United States
Message 1937335 - Posted: 26 May 2018, 23:59:20 UTC - in response to Message 1937334.  

I thought the goal of this thread was a working Credit system.

Face it, credit at Seti has always driven people insane.
ID: 1937335 · Report as offensive
Grant (SSSF)
Volunteer tester

Joined: 19 Aug 99
Posts: 13751
Credit: 208,696,464
RAC: 304
Australia
Message 1937338 - Posted: 27 May 2018, 0:10:15 UTC - in response to Message 1937335.  

I thought the goal of this thread was a working Credit system.

Face it, credit at Seti has always driven people insane.

Hence the system that I proposed, which as far as I can tell will work. Not only will it work, but it addresses the stated goals of CreditNew. It will also allow the BOINC Manager to reschedule work according to the user's settings.
I've yet to have anyone point out why it won't work.
Grant
Darwin NT
ID: 1937338 · Report as offensive


 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.