Let's Play CreditNew (Credit & RAC support thread)

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937256 - Posted: 26 May 2018, 13:26:57 UTC - in response to Message 1937234. Last modified: 26 May 2018, 13:30:08 UTC "That takes us back to the naughties - 2008 and before, when neither CN nor GPUs were in use." Does this mean you can now tell Eric that, "Hey, turns out there's nothing really too wrong with the credit system. It kinda works like your flops based system that everybody loved. So, you know... oops."? "The basic reason is that <rsc_fpops_est> is overstated for MB tasks by - oooh, 248%?" Does that mean that if we were to count "real" flops, credit would drop by more than 50% (instead of rise by 50%)? And that we should be careful what we wish for? :) ID: 1937256 ·
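
The arithmetic behind that last question can be sketched as below. This is only an illustration under two loudly stated assumptions: that credit scales linearly with the FLOP count used for a task, and that "overstated by 248%" could mean either of two things, since the thread does not say which. The function name is hypothetical and not part of BOINC.

```python
def credit_change_if_real_flops(overstatement_pct, additive=True):
    """Fractional change in credit if the real FLOP count replaced the estimate.

    Assumption (not from BOINC): credit is directly proportional to the
    FLOP count attributed to a task.
    additive=True  reads "overstated by P%" as estimate = real * (1 + P/100)
    additive=False reads it as                 estimate = real * (P/100)
    """
    if additive:
        ratio = 1 + overstatement_pct / 100
    else:
        ratio = overstatement_pct / 100
    return 1 / ratio - 1


for additive in (True, False):
    change = credit_change_if_real_flops(248, additive)
    print(f"additive={additive}: credit changes by {change:+.1%}")
```

Under either reading the change is a drop of more than half, which is the point of the question. Whether the linear-credit assumption holds for the real system is exactly what the thread is arguing about.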

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22228 Credit: 416,307,556 RAC: 380	Message 1937257 - Posted: 26 May 2018, 13:46:00 UTC Both your quotes are THEORY - not based on what is actually happening. With the progressive improvement in the stock and optimised applications the use of "peak" FLOPs has become more inaccurate, which I know is somewhat counter-intuitive. But let me try to explain this AGAIN. FLOPs is based on a "standard" set of calculations compiled in a very simple way: no compiler optimisation, no code optimisation, no "fancy" op-codes. As any of these come into play the quality of the estimate is degraded. How large the degradation will be is very hard to predict - take an 8-core AMD FX processor as an example: Run a benchmark process on a single core and you will get one answer. Run the same benchmark twice, simultaneously or in sequence, and you have a roughly equal chance of getting the same answer or a different one, depending on how the operating system decides to allocate the tasks (these processors have a Floating Point Unit shared between pairs of Logic Processing Units). Now try it with three, four times..... Very messy, and we haven't even looked at the impact of compiling the code for an "Intel" or "AMD" base instruction set - both will run, but may give different performance depending on the actual low level instruction set that is required... So take FLOPs as implemented within BOINC & SETI with at least a big bucket of salt, a pinch just isn't big enough :-( And since FLOPs are inaccurate, that means that Cobblestones (which are only scaled FLOPs) are equally inaccurate. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1937257 ·

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937266 - Posted: 26 May 2018, 14:15:11 UTC - in response to Message 1937257. Yeah I know all that. I also know you can't go above the LINPACK benchmark. I have found no evidence you can go above the WhetStone bench either. Please show me so I can really understand. ID: 1937266 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1937272 - Posted: 26 May 2018, 14:52:37 UTC - in response to Message 1937266. OK, here's a show. An Overview of Common Benchmarks. From the top of the third page: "Many Whetstone versions copied informally and used for benchmarking have the print statements removed, apparently with the intention of achieving better timing accuracy. This is contrary to the authors' intentions, since optimizing compilers may then eliminate significant parts of the program. If timing accuracy is a problem, the loop bounds should be increased in such a way that the time spent in the extra statements becomes insignificant." We discussed SIMD operations a day or two ago. Whetstone dates from 1976, long before even the most basic SSE instruction set had been implemented (that was 1999). Use of any form of SSE, up to and beyond AVX, would be considered an optimisation in the Whetstone context, and excluded. That's why efficiencies of up to 400% are possible, and why the statement of equivalence of the terms 'whetstone' and 'peak flops' in the BOINC documentation was nonsense from the day it was written. Linpack allows benchmarking to include optimisations, which would make it a better choice for this purpose. ID: 1937272 ·
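
A minimal sketch of where a figure like 400% can come from. It assumes "efficiency" here means the FLOPs an application actually performs divided by the "peak" implied by a scalar Whetstone rate times elapsed time, and it uses an idealised case in which every instruction is a 4-wide single-precision SSE operation. The function name and all the numbers are illustrative, not taken from BOINC.

```python
def efficiency(actual_flop_count, elapsed_seconds, whetstone_flops_per_second):
    """Ratio of FLOPs actually done to the 'peak' implied by a scalar benchmark."""
    peak_flop_count = elapsed_seconds * whetstone_flops_per_second
    return actual_flop_count / peak_flop_count


# Hypothetical machine: the scalar benchmark reports 1 GFLOPS.
scalar_rate = 1e9
elapsed = 100.0  # seconds of run time

# Idealised SIMD application: same instruction rate, but each instruction
# operates on 4 packed single-precision values (one 128-bit SSE register).
simd_width = 4
actual = simd_width * scalar_rate * elapsed

print(f"efficiency = {efficiency(actual, elapsed, scalar_rate):.0%}")
```

Real code never keeps every lane busy, so this is an upper bound for 4-wide SSE under the stated assumption, not a prediction for any particular application.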

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1937275 - Posted: 26 May 2018, 15:39:24 UTC - in response to Message 1937272. It is, of course, possible that BOINC's use of the term 'Whetstone' is false, in that it breaks the 'no optimisation' rule. Somebody will have to check both the source code, and the compiler settings on David's VS2010 build machine, for that. Here's some data.
BOINC:
26/05/2018 16:26:16 | | Benchmark results:
26/05/2018 16:26:16 | | Number of CPUs: 4
26/05/2018 16:26:16 | | 4080 floating point MIPS (Whetstone) per CPU
26/05/2018 16:26:16 | | 15704 integer MIPS (Dhrystone) per CPU
MajorGeeks: 3,418,803
whets64MP:
Whetstone Single Precision MP SSE Benchmark Sat May 26 16:16:27 2018
Via Microsoft C/C++ Optimizing Compiler Version 14.00.40310.41 for AMD64
Uses 2 threads second at THREAD_PRIORITY_BELOW_NORMAL.
Code produced by compiler is not necessarily the same for both threads. So speed can vary between threads.
Vax MIPS are over-inflated due to excessive optimisation.
          MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
           Gmean   MIPS             1      2      3   MOPS   MOPS   MOPS   MOPS   MOPS
            1728  46757   9982   2053   1689   1488    399    139   6406   8155  15653
Thread 1                          1024    842    838    199   69.4   3195   4030  14555
Thread 2                          1029    847    650    200   69.4   3211   4126   1098
Numeric results were as expected
CPUID and RDTSC Assembly Code
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000506E3
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz Measured 3192 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
AMD64 processor architecture, 4 CPUs
Windows NT Version 6.1, build 7601, Service Pack 1
Memory 8117 MB, Free 5416 MB
User Virtual Space 8388608 MB, Free 8388601 MB
linpack64:
Linpack SSE2 Double Precision Unrolled Benchmark n @ 100
Via Microsoft C/C++ Optimizing Compiler Version 15.00.30729.207 for x64
Sat May 26 16:13:22 2018
Speed 3637.06 MFLOPS
Numeric results were as expected
CPUID and RDTSC Assembly Code
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000506E3
Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz Measured 3192 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
AMD64 processor architecture, 4 CPUs
Windows NT Version 6.1, build 7601, Service Pack 1
Memory 8117 MB, Free 5463 MB
User Virtual Space 8388608 MB, Free 8388600 MB
The latter two came from http://www.roylongbottom.org.uk/win64.htm - I'm not sure he obeyed the 'no optimisations' rule, either. ID: 1937275 ·

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937282 - Posted: 26 May 2018, 17:41:41 UTC - in response to Message 1937275. Yeah I did mine a couple weeks ago with similar results....
Whetstone Single Precision SSE Benchmark Fri May 11 02:00:44 2018
Via Microsoft 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86
 MFLOPS    Vax  MWIPS MFLOPS MFLOPS MFLOPS    Cos    Exp  Fixpt     If  Equal
  Gmean   MIPS             1      2      3   MOPS   MOPS   MOPS   MOPS   MOPS
   1050  28812   3727   1125   1125    915   76.9   49.1   4019   5348   8903
Numeric results were as expected
CPUID and RDTSC Assembly Code
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000906E9
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz Measured 2808 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
Intel processor architecture, 8 CPUs
Windows NT Version 6.2, build 9200,
Memory 4096 MB, Free 4096 MB
User Virtual Space 4096 MB, Free 4052 MB
Linpack SSE2 Double Precision Unrolled Benchmark n @ 100
Via Microsoft C/C++ Optimizing Compiler Version 15.00.30729.207 for x64
Fri May 11 01:54:58 2018
Speed 4083.80 MFLOPS
Numeric results were as expected
CPUID and RDTSC Assembly Code
CPU GenuineIntel, Features Code BFEBFBFF, Model Code 000906E9
Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz Measured 2808 MHz
Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,
Windows GetSystemInfo, GetVersionEx, GlobalMemoryStatus
AMD64 processor architecture, 8 CPUs
Windows NT Version 6.2, build 9200,
Memory 8078 MB, Free 5464 MB
User Virtual Space 134217728 MB, Free 134217683 MB
I took that to mean 4 GFLOPS for our procs is the highest possible, with or without optimizations. Am I incorrect? ID: 1937282 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22228 Credit: 416,307,556 RAC: 380	Message 1937285 - Posted: 26 May 2018, 17:49:15 UTC All that means is that, running the mix of operations in the sequence used by the Linpack benchmark, your computer has scored a speed of 4083.80 MFLOPS. Change the mix or sequence of operations and the same computer could score far more or far less. Such benchmarks are really only a rough guide to the real world, not an absolute prediction. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1937285 ·
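
The dependence of a score on the operation mix can be seen even with a toy timing loop. The sketch below is only an illustration of the method: the three "mixes" are arbitrary choices made up for this example, and in Python the interpreter overhead dominates, so the numbers say nothing about Linpack or Whetstone themselves.

```python
import math
import time


def ops_per_second(op, n=200_000):
    """Time n repeated applications of op to a float and return iterations/s."""
    x = 1.0001
    start = time.perf_counter()
    for _ in range(n):
        x = op(x)
    elapsed = time.perf_counter() - start
    return n / elapsed


# Three arbitrary operation mixes (hypothetical, chosen only to differ).
mixes = {
    "multiply-add": lambda x: x * 1.0000001 + 1e-9,
    "sqrt": lambda x: math.sqrt(x) + 1.0,
    "sin/exp": lambda x: math.sin(x) + math.exp(-x),
}

rates = {name: ops_per_second(op) for name, op in mixes.items()}
for name, rate in rates.items():
    print(f"{name:12s} {rate / 1e6:8.2f} M iterations/s")
```

Running it twice on the same machine will typically also give slightly different figures each time, which is a second reason to treat any single score as an indication only.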

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937286 - Posted: 26 May 2018, 17:56:28 UTC - in response to Message 1937285. Once again you haven't answered my question. I take the LINPACK result to be a best case scenario, unobtainable in the real world. Is this assumption incorrect? ID: 1937286 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22228 Credit: 416,307,556 RAC: 380	Message 1937288 - Posted: 26 May 2018, 18:45:19 UTC It is a metric using a very specific set of instructions, it is neither "the best" nor "the worst". It is, and only is, an indication of performance, not an absolute metric. Thus you are WRONG. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1937288 ·

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937289 - Posted: 26 May 2018, 19:01:13 UTC - in response to Message 1937288. So there's a LINPACK benchmark out there that'll show my proc to be 10 GFLOPS instead of 4GFLOPS? Can you show me this version of the LINPACK bench? ID: 1937289 ·

Richard Haselgrove Volunteer tester Send message Joined: 4 Jul 99 Posts: 14654 Credit: 200,643,578 RAC: 874	Message 1937290 - Posted: 26 May 2018, 19:33:52 UTC - in response to Message 1937289. "So there's a LINPACK benchmark out there that'll show my proc to be 10 GFLOPS instead of 4GFLOPS? Can you show me this version of the LINPACK bench?" What would be the point? You could write your own program, output some pretty figures, publish it on the web, and call it a benchmark utility. It would still be meaningless. Benchmarks don't change the speed of anything. They are barely useful, but possibly interesting, if you are careful to run the SAME version of the benchmark on two DIFFERENT computers. Provided the other conditions - like the operating system - are carefully controlled, that can possibly advise you which CPU is faster at running benchmarks under those controlled conditions. One of the authorities I read earlier contained the lines: "The Intel microprocessors were designed at the height of popularity of the Whetstone benchmark. Examining the instruction set of the math coprocessor, with instructions for sin, cos, atan, sqrt and log, possibly indicates a complete hardware implementation (the one and only?) to match the benchmark." That's getting into Volkswagen diesel emission territory. There's one case where an enhanced benchmark might be helpful in the real world. The version we've both reported results from shows the output line "Has MMX, Has SSE, Has SSE2, Has SSE3, No 3DNow,". If it's only running those tests (unknown), then the fact that my processor also supports SSSE3 and AVX (both of which are known to be helpful to SETI) wouldn't be discovered when compared with a CPU which maxed out at SSE3 and no more. ID: 1937290 ·

rob smith Volunteer moderator Volunteer tester Send message Joined: 7 Mar 03 Posts: 22228 Credit: 416,307,556 RAC: 380	Message 1937294 - Posted: 26 May 2018, 19:37:44 UTC Linpack is a DEFINED set, thus there shouldn't be. It is very obvious that you do not understand the use of the commonly used benchmarks like Linpack - they are INDICATORS, not predictors of absolute performance in the real world. If you recall, I mentioned in a post a while back that I had spent some time calculating how many clock cycles a process was going to take on a given processor. This was triggered because the selected processor had a Whetstone benchmark (this was the late 1970s, so pre-dated Linpack) that suggested it would do the task we were coding with a fair margin in hand, but it was failing to do so in a rather dramatic manner. So we were looking for an alternative processor to do the job. Studying the published benchmarks for processors gave a couple of interest, so we had prototype samples and boards produced; again they failed. Next we got serious: there was a new chip rumoured to be coming out from one of the then major players. A lot of hard work by our management and commercial people and we managed to get the datasheet, but there were no benchmarks at the time. We went through the b*ache of hand cranking the application to work on the new processor, looking at the resultant op-code and working out the actual timings; there was time to spare and we went with it. Just after we'd done this the manufacturer published the benchmarks, which indicated that this chip was slower than the ones we had rejected..... Later we went back and tried to hand optimise the code for the rejected processor, but we still couldn't get it fast enough. Moral? Don't trust benchmarks to accurately predict the real world; to do so is to expose yourself to some very embarrassing situations. Bob Smith Member of Seti PIPPS (Pluto is a Planet Protest Society) Somewhere in the (un)known Universe? ID: 1937294 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304	Message 1937306 - Posted: 26 May 2018, 20:59:32 UTC And if we use the Credit system I've been proposing, the benchmarks on each system play no part in the determination of Credit, and so become irrelevant. The only benchmark that would count is the one used to define the Cobblestone, and by the Cobblestone definition, it doesn't vary. 1 GFLOPS for 86,400 seconds (1 day) = 200 Cobblestones. 200 doesn't vary. Number of seconds in a day doesn't vary. So that reference 1 GFLOPS doesn't vary. So the Credit allocated for a given WU won't vary depending on what was used to process it. Grant Darwin NT ID: 1937306 ·
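
A minimal sketch of that fixed conversion, assuming each work unit comes with an agreed FLOP count (how that count would be obtained is part of the proposal discussed earlier in the thread and is not shown here). The function and constant names are hypothetical, not BOINC identifiers.

```python
SECONDS_PER_DAY = 86_400
COBBLESTONES_PER_REFERENCE_DAY = 200
REFERENCE_FLOPS = 1e9  # the 1 GFLOPS reference machine of the Cobblestone definition


def credit_for_flops(flop_count):
    """Credit for a task, given only the number of floating-point operations in it."""
    reference_days = flop_count / (REFERENCE_FLOPS * SECONDS_PER_DAY)
    return reference_days * COBBLESTONES_PER_REFERENCE_DAY


# One full day of work for the reference machine is worth 200 Cobblestones.
one_reference_day = REFERENCE_FLOPS * SECONDS_PER_DAY
print(credit_for_flops(one_reference_day))
```

Nothing about the host appears in the function's arguments, which is the property being claimed: the same work unit gets the same Credit on a CPU, a GPU, or anything else.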

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304	Message 1937309 - Posted: 26 May 2018, 21:10:52 UTC - in response to Message 1937294. "Moral? Don't trust benchmarks to accurately predict the real world, to do so is to expose yourself to some very embarrassing situations." Particularly so for synthetic benchmarks. Benchmarks based on actual programmes using actual data from those programmes can provide relevant indications of performance. However, comparisons between different hardware can be problematic, as configuration settings that boost the performance of one system can result in significant performance degradation on another, and vice versa. Grant Darwin NT ID: 1937309 ·

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937322 - Posted: 26 May 2018, 22:58:11 UTC "The performance measured by the LINPACK benchmark consists of the number of 64-bit floating-point operations, generally additions and multiplications, a computer can perform per second, also known as FLOPS. However, a computer's performance when running actual applications is likely to be far behind the maximal performance it achieves running the appropriate LINPACK benchmark." "Vendors of Massively Parallel Processors (MPPs) have put significant effort into the processors and interconnection networks of their parallel computers. They boast about their computer's latest performance on benchmarks and kernels such as the NAS Parallel Benchmarks or LINPACK. However, such benchmark numbers are typically viewed as levels of performance that are guaranteed not to be exceeded and generally unobtainable by all but a very few programmers who tediously optimize their code for that machine and that machine alone." https://en.wikipedia.org/wiki/LINPACK_benchmarks http://opensky.ucar.edu/islandora/object/technotes:183 - - - - - - I wasn't talking about any ol' bench. My last question was about LINPACK. Keyword "unobtainable". It would be nice to verify we are NOT obtaining "unobtainable" FLOPS. ID: 1937322 ·

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937324 - Posted: 26 May 2018, 23:04:26 UTC - in response to Message 1937290. Last modified: 26 May 2018, 23:19:15 UTC "...then the fact that my processor also supports SSSE3 and AVX (both of which are known to be helpful to SETI)..." Just asking for an educated guess. Do you think SSSE3 and AVX are enough to justify a 150% performance increase over SSE3? ID: 1937324 ·

iwazaru Volunteer tester Send message Joined: 31 Oct 99 Posts: 173 Credit: 509,430 RAC: 0	Message 1937331 - Posted: 26 May 2018, 23:45:02 UTC - in response to Message 1937306. "And if we use the Credit system I've been proposing..." For the record, I wish we could too. It's the only "scientific" way. But the question remains... We wave our magic wand and get a credit system that counts flops as accurately as humanly possible... Does credit go up? Down? Stay the same? I know we all think it'll go up. But do we have any real proof? (The math+numbers kind of proof, not the blah-blah-blah kind) ID: 1937331 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304	Message 1937334 - Posted: 26 May 2018, 23:57:01 UTC - in response to Message 1937331. "But the question remains... We wave our magic wand and get a credit system that counts flops as accurately as humanly possible... Does credit go up? Down? Stay the same?" Sorry, my mistake. I thought the goal of this thread was a working Credit system. Grant Darwin NT ID: 1937334 ·

betreger Send message Joined: 29 Jun 99 Posts: 11362 Credit: 29,581,041 RAC: 66	Message 1937335 - Posted: 26 May 2018, 23:59:20 UTC - in response to Message 1937334. "I thought the goal of this thread was a working Credit system." Face it, credit at Seti has always driven people insane. ID: 1937335 ·

Grant (SSSF) Volunteer tester Send message Joined: 19 Aug 99 Posts: 13751 Credit: 208,696,464 RAC: 304	Message 1937338 - Posted: 27 May 2018, 0:10:15 UTC - in response to Message 1937335. "I thought the goal of this thread was a working Credit system." "Face it, credit at Seti has always driven people insane." Hence the system that I proposed, which as far as I can tell will work. Not only will it work, but it addresses the stated goals of Credit New. And it will also allow the BOINC Manager to reschedule work according to the user's settings. I've yet to have anyone point out why it won't work. Grant Darwin NT ID: 1937338 ·

©2024 University of California

SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.