Refresh My Memory, Why can't we detect CPU to use optimized

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 735164 - Posted: 6 Apr 2008, 1:12:51 UTC Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested? I mean, it is not THAT hard, is it? Sorry for the question, but, I am back in confused mode as to why we (the collective we) are being sub-optimal in our approach. It just does not make sense to me that we are not using the fastest processing code possible on the widest possible set of contributors. I mean, if there is a rash of errors, then you fall back to stock (or the next level down) ... As it is, we waste more time than we need to processing ... I suppose it is a silly question ... ID: 735164 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 735166 - Posted: 6 Apr 2008, 1:17:46 UTC - in response to Message 735164. Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested? I mean, it is not THAT hard, is it? Sorry for the question, but, I am back in confused mode as to why we (the collective we) are being sub-optimal in our approach. It just does not make sense to me that we are not using the fastest processing code possible on the widest possible set of contributors. I mean, if there is a rash of errors, then you fall back to stock (or the next level down) ... As it is, we waste more time than we need to processing ... I suppose it is a silly question ... Not that old chestnut again....... You know darned well why Seti cannot support all platform tweaks,,,,,,, So quit the argument......please. "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 735166 ·

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 735175 - Posted: 6 Apr 2008, 1:30:37 UTC - in response to Message 735166. Not that old chestnut again....... You know darned well why Seti cannot support all platform tweaks,,,,,,, So quit the argument......please. Um, I am not arguing. I asked an honest question. If I knew the answer I would not have asked the question. If I knew the answer at one time in the past, well, I have since forgotten it. At one time in my past I could even drive a car. At this point, there are many things I can no longer perform. ID: 735175 ·

kittyman Volunteer tester Send message Joined: 9 Jul 00 Posts: 51468 Credit: 1,018,363,574 RAC: 1,004	Message 735176 - Posted: 6 Apr 2008, 1:33:47 UTC - in response to Message 735175. Not that old chestnut again....... You know darned well why Seti cannot support all platform tweaks,,,,,,, So quit the argument......please. Um, I am not arguing. I asked an honest question. If I knew the answer I would not have asked the question. If I knew the answer at one time in the past, well, I have since forgotten it. At one time in my past I could even drive a car. At this point, there are many things I can no longer perform. I am sorry, Sir........ I sometimes forget who I am talking to........ You have my respect, and apologies...... "Freedom is just Chaos, with better lighting." Alan Dean Foster ID: 735176 ·

Toby Volunteer tester Send message Joined: 26 Oct 00 Posts: 1005 Credit: 6,366,949 RAC: 0	Message 735180 - Posted: 6 Apr 2008, 1:42:41 UTC It's not that it can't be done... It just hasn't been done yet. David Anderson created Trac ticket 562 one month ago that describes how he would like to see this handled. Of course it would only work with reasonably new clients that actually report the CPU capabilities to the server. A member of The Knights Who Say NI! For rankings, history graphs and more, check out: My BOINC stats site ID: 735180 ·

StokeyBob Send message Joined: 31 Aug 03 Posts: 848 Credit: 2,218,691 RAC: 0	Message 735217 - Posted: 6 Apr 2008, 3:28:19 UTC Long time, no see! Paul D. Buck I haven't been on the message boards for a long time. It is good to see you still around. ID: 735217 ·

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19080 Credit: 40,757,560 RAC: 67	Message 735233 - Posted: 6 Apr 2008, 4:03:01 UTC Joe (Josef W. Segur) in post 729751 Optimised Apps question states that; The stock app has limited support for up to SSE3, but only in certain specific routines. ID: 735233 ·

MarkJ Volunteer tester Send message Joined: 17 Feb 08 Posts: 1139 Credit: 80,854,192 RAC: 5	Message 735249 - Posted: 6 Apr 2008, 5:05:15 UTC - in response to Message 735164. Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested? I mean, it is not THAT hard, is it? Sorry for the question, but, I am back in confused mode as to why we (the collective we) are being sub-optimal in our approach. It just does not make sense to me that we are not using the fastest processing code possible on the widest possible set of contributors. I mean, if there is a rash of errors, then you fall back to stock (or the next level down) ... As it is, we waste more time than we need to processing ... I suppose it is a silly question ... Not only do you have to put in code to determine what the cpu is capable of, you also need all this conditional stuff in there to use the optimizations at the appropriate point. It would make the stock app much larger and harder to maintain. Does it really matter if the app can detect the best capability and use it? After all we have the optimized app (currently from Crunch3r, and another version coming through) that can use your cpu to its potential. It's just that the user has to ascertain their cpu's capability (i.e. run cpu-z) and then use the appropriate app rather than the stock one. BOINC blog ID: 735249 ·
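For readers wondering what "ascertain the cpu's capability and then use the appropriate app" might look like if automated, here is a minimal sketch in Python. It is not how BOINC or any SETI@home app actually does it. The build names in BUILDS and both function names are hypothetical, and it assumes the capabilities arrive as Linux-style /proc/cpuinfo text, where SSE3 is reported under the flag name "pni". As discussed later in the thread, the build with the highest instruction set is not always the fastest, so this covers only the "what can the CPU run" half of the problem.

```python
# Hypothetical build names, ordered from most to least demanding.
# Each entry lists the CPU flags that build would require.
BUILDS = [
    ("ssse3", {"sse", "sse2", "pni", "ssse3"}),
    ("sse3", {"sse", "sse2", "pni"}),  # Linux reports SSE3 as "pni"
    ("sse2", {"sse", "sse2"}),
    ("sse", {"sse"}),
]


def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_build(flags, builds=BUILDS):
    """Return the most demanding build the flags allow, else "stock"."""
    for name, required in builds:
        if required <= flags:  # every required flag is present
            return name
    return "stock"


if __name__ == "__main__":
    sample = "model name\t: Example CPU\nflags\t\t: fpu vme sse sse2 pni\n"
    print(pick_build(parse_flags(sample)))
```

On a real Linux host the text would come from reading /proc/cpuinfo; on Windows a tool such as cpu-z gives the same information by hand.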

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 735260 - Posted: 6 Apr 2008, 5:44:47 UTC - in response to Message 735164. Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested? I mean, it is not THAT hard, is it? ... We do, though not yet everything which may eventually be included. If you do a standalone run of the stock S@H app now and use a -verbose argument then stderr.txt will show all the variant routines which were checked:
    v_GetPowerSpectrum            0.00263  0.00000  test
    v_vGetPowerSpectrum           0.00267  0.00000  test
    v_vGetPowerSpectrum2          0.00267  0.00000  test
    v_vGetPowerSpectrumUnrolled   0.00268  0.00000  test
    v_vGetPowerSpectrumUnrolled2  0.00265  0.00000  test
    v_GetPowerSpectrum            0.00263  0.00000  choice
    v_ChirpData                   0.07636  0.00000  test
    fpu_ChirpData                 0.06686  0.00053  test
    v_vChirpData_x86_64           0.45416  0.00054  test
    sse1_ChirpData_ak             0.05942  0.00053  test
    sse2_ChirpData_ak             0.04772  0.00053  test
    sse2_ChirpData_ak             0.04772  0.00053  choice
    v_Transpose                   0.18752  0.00000  test
    v_Transpose2                  0.09686  0.00000  test
    v_Transpose4                  0.05149  0.00000  test
    v_Transpose8                  0.09197  0.00000  test
    v_pfTranspose2                0.12427  0.00000  test
    v_pfTranspose4                0.06222  0.00000  test
    v_pfTranspose8                0.10654  0.00000  test
    v_vTranspose4                 0.05803  0.00000  test
    v_vTranspose4np               0.05159  0.00000  test
    v_vTranspose4ntw              0.04135  0.00000  test
    v_vTranspose4x8ntw            0.03448  0.00000  test
    v_vTranspose4x16ntw           0.10690  0.00000  test
    v_vpfTranspose8x4ntw          0.04127  0.00000  test
    v_vTranspose4x8ntw            0.03448  0.00000  choice
    FPU opt folding               0.01391  0.00000  test
    AK SSE folding                0.00778  0.00000  test
    BH SSE folding                0.00890  0.00000  test
    AK SSE folding                0.00778  0.00000  choice
That was on one of my systems which doesn't have more than SSE2, so doesn't show the sse3_ChirpData_ak variant chirping routine. The situation is fairly complex, simply checking the CPU capabilities is not always enough. For instance, on some Core 2 systems the SSE1 chirping turns out faster than the SSE3 version. That's why the app tests all the variants which the host can do for speed and accuracy. Joe ID: 735260 ·
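The "test every variant for speed and accuracy, then choose" idea Joe describes can be sketched roughly as below. This is an illustration in Python, not the stock app's code. All function names are hypothetical, the power-spectrum routines are toy stand-ins, and the tolerance and repeat count are arbitrary assumptions.

```python
import time


def power_loop(samples):
    """Reference routine: power of each (re, im) pair, plain loop."""
    out = []
    for re, im in samples:
        out.append(re * re + im * im)
    return out


def power_comprehension(samples):
    """Alternative variant that computes the same values."""
    return [re * re + im * im for re, im in samples]


def power_sloppy(samples):
    """Deliberately inaccurate variant, so the accuracy check has work to do."""
    return [round(re * re + im * im, 1) for re, im in samples]


def time_once(func, data):
    """Wall-clock seconds for a single call of func(data)."""
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start


def choose_variant(variants, reference, data, tolerance=1e-6, repeats=3):
    """Time every variant whose output matches the reference; return the fastest.

    Returns (best_name, timings), where timings only holds accurate variants.
    """
    expected = reference(data)
    timings = {}
    for name, func in variants.items():
        output = func(data)
        if len(output) != len(expected):
            continue  # wrong shape: reject
        error = max((abs(a - b) for a, b in zip(output, expected)), default=0.0)
        if error > tolerance:
            continue  # too inaccurate: reject
        timings[name] = min(time_once(func, data) for _ in range(repeats))
    if not timings:
        raise ValueError("no variant passed the accuracy check")
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    samples = [(i * 0.001, i * 0.002) for i in range(10000)]
    variants = {
        "loop": power_loop,
        "comprehension": power_comprehension,
        "sloppy": power_sloppy,
    }
    best, timings = choose_variant(variants, power_loop, samples)
    print("choice:", best)
```

Which variant wins depends on the machine and can differ between runs, which is the point Joe makes about SSE1 versus SSE3 chirping on some Core 2 systems.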

Paul D. Buck Volunteer tester Send message Joined: 19 Jul 00 Posts: 3898 Credit: 1,158,042 RAC: 0	Message 735271 - Posted: 6 Apr 2008, 7:30:56 UTC Um, thanks for the answers. Though, the thrust was not to "bulk-up" the stock application or to do conditionals. But, to follow the logic of Josef, to test the CPU and then to download the appropriately "tuned" application. I know we have the "stable" of applications and that there is a great deal of advice on the selection process which is why I posed the question. In the years prior, I remember that this was one of the most common questions on this forum, which app and how to install it ... It seems that this is one more area where we are no further along years later than we were ... We used to call situations like this, when I was a boy, "stuck in the mud" ... Anyway, thanks for the answers. ID: 735271 ·

ML1 Volunteer moderator Volunteer tester Send message Joined: 25 Nov 01 Posts: 20334 Credit: 7,508,002 RAC: 20	Message 735373 - Posted: 6 Apr 2008, 15:02:17 UTC - in response to Message 735271. Last modified: 6 Apr 2008, 15:07:02 UTC Um, thanks for the answers. Though, the thrust was not to "bulk-up" the stock application or to do conditionals. But, to follow the logic of Josef, to test the CPU and then to download the appropriately "tuned" application... I think that the only workable solution is indeed to suffer 'bulking up' the stock application. The 'best' optimisation critically depends not only on what features the CPU supports but also upon all of: CPU cache sizes, latency, and available bandwidth; system RAM latency and available bandwidth; use for s@h only or whether shared with other applications. I think you can only realistically "test-and-see" to select the best optimisations for that hardware, which means that all available optimisations need to be available in the stock application. Having a test application, with special support in Boinc for that test, and then many permutations of optimised applications: All that lot is way too convoluted to be maintainable. Hence, one big application whereby the appropriate optimised subroutines are called up as needed is the best bet at present. Any other simpler ways?... We used to call situations like this, when I was a boy, "stuck in the mud" ... That's more an issue of what development effort is available. Yes, the science application can be improved. There's fantastic volunteer effort working on that. However, for Berkeley, I suspect that mere survival and getting something very visual working such as the NITPICKER are far far greater concerns for the 3(?) people available there. The present s@h is 'ticking along' nicely (server hardware panics aside!). The most urgent problems and developments are being worked on. Adding a few more percent performance is, I would guess, not a hot priority for the time being. 
Consistency doesn't always mean "thick mud". Good question still :-) Happy crunchin', Martin See new freedom: Mageia Linux Take a look for yourself: Linux Format The Future is what We all make IT (GPLv3) ID: 735373 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 735383 - Posted: 6 Apr 2008, 15:32:37 UTC - in response to Message 735271. It seems that this is one more area where we are no further along years later than we were ... We used to call situations like this, when I was a boy, "stuck in the mud" ... I don't think I'd agree with the "no further along" argument, as Joe Segur points out, the current stock application does test (using benchmarks, not just CPU ID) and use routines that are more suited to certain processors.... But I also think there is a constant theme that the project(s) have more developer resources than they actually have. Does anyone know off the top of their heads how many staff developers are working on SETI (not BOINC, SETI) science applications? I think it's just Eric, plus a few volunteers and contributions from those doing the separate optimized apps. ID: 735383 ·

W-K 666 Volunteer tester Send message Joined: 18 May 99 Posts: 19080 Credit: 40,757,560 RAC: 67	Message 735403 - Posted: 6 Apr 2008, 16:03:12 UTC The other thing about the Seti units is that, of the various apps available at the moment, one may be good at one or two portions of the AR (angle range) and another app better at other parts of the AR. It's a pity Tony's research has been withdrawn; it illustrated the problem very well. Ned, I'm pretty sure you are correct in your assumption that Eric is the only one working on the Seti app, when he has time. I think he is also involved with the Nitpicker and helping Josh with AstroPulse, plus all the other Seti paperwork etc. And we keep bugging him for updates and news. ID: 735403 ·

Clyde C. Phillips, III Send message Joined: 2 Aug 00 Posts: 1851 Credit: 5,955,047 RAC: 0	Message 735434 - Posted: 6 Apr 2008, 17:58:38 UTC Maybe it would be better to compare crunchtimes for several samples of each angle range class for each processor than to use Dhrystone and Whetstone values. Often just the number of credits awarded each unit can serve as a proxy for the comparison of angle-range workunits. It might get tougher when comparing Intels with AMDs, though. Also, for days on end, there might not be workunits of certain angle range/number-of-credits classes distributed. ID: 735434 ·

1mp0£173 Volunteer tester Send message Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0	Message 735441 - Posted: 6 Apr 2008, 18:10:37 UTC - in response to Message 735434. Maybe it would be better to compare crunchtimes for several samples of each angle range class for each processor than to use Dhrystone and Whetstone values. Often just the number of credits awarded each unit can serve as a proxy for the comparison of angle-range workunits. It might get tougher when comparing Intels with AMDs, though. Also, for days on end, there might not be workunits of certain angle range/number-of-credits classes distributed. Absolutely. The very best way to do this is to use the app. as the benchmark, and to use several selected "benchmark work units" to measure performance. Trouble is, running these benchmarks will take hours, and while crunch times are generally similar, you'd want to select the "benchmark" WUs carefully. So (at least for SETI) the best compromise is what we have: whetstone and dhrystone for a rough measure, and duration correction factor, averaged across several work units for a more representative time estimate. Also, for any project that counts flops, the benchmark has no role in claimed credit. If the benchmark is off, you'll overfetch or underfetch work, but it won't change your scores. ID: 735441 ·
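The compromise described above (a benchmark for a rough figure, then a duration correction factor learned from completed work units) might be sketched like this. It reflects my understanding of the general idea only: the function and parameter names are hypothetical, the numbers are made up, and the simple smoothing rule in update_dcf is an assumption for illustration, not BOINC's actual update rule.

```python
def estimate_seconds(fpops_est, benchmark_flops, dcf=1.0):
    """Rough run-time estimate for one work unit.

    fpops_est: estimated floating-point operations in the work unit
    benchmark_flops: operations per second measured by the benchmark
    dcf: duration correction factor learned from past work units
    """
    return fpops_est / benchmark_flops * dcf


def update_dcf(dcf, fpops_est, benchmark_flops, actual_seconds, weight=0.1):
    """Nudge the correction factor toward the observed/estimated ratio.

    ASSUMPTION: plain exponential smoothing, chosen only for illustration.
    """
    raw_estimate = fpops_est / benchmark_flops
    observed_ratio = actual_seconds / raw_estimate
    return dcf + weight * (observed_ratio - dcf)


if __name__ == "__main__":
    fpops_est = 3e13        # made-up work unit size
    benchmark_flops = 1.5e9  # made-up benchmark result
    dcf = 1.0
    # Suppose an optimized app keeps finishing in half the raw estimate.
    for _ in range(50):
        dcf = update_dcf(dcf, fpops_est, benchmark_flops, actual_seconds=10000.0)
    print(estimate_seconds(fpops_est, benchmark_flops, dcf))
```

With an optimized app the raw benchmark estimate is too high at first, and the correction factor gradually pulls later estimates down, which affects how much work is fetched rather than the credit claimed.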

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 735567 - Posted: 6 Apr 2008, 21:26:14 UTC - in response to Message 735271. Paul D. Buck: Um, thanks for the answers. Though, the thrust was not to "bulk-up" the stock application or to do conditionals. But, to follow the logic of Josef, to test the CPU and then to download the appropriately "tuned" application. I know we have the "stable" of applications and that there is a great deal of advice on the selection process which is why I posed the question. ... In essence, I think it's simply a matter of how many different versions of the app the project has the time to test or maintain. They currently have apps for 8 platforms (though some may be duplicates). If they had separate apps for the various CPU architectures, etc. then that might be multiplied by 4 or so. Much optimization involves finding the parts of the code which are executed billions or trillions of times while processing a WU and figuring out more efficient ways of doing those operations. That tends to be fairly small sequences of instructions, so providing alternate variants doesn't "bulk-up" the app significantly. The added variants in stock plus the code to test them amount to about 63 KB now, for instance, and similar for the Lunatics 2.4 builds. But of course using different compiler options for different target architectures makes it necessary to have many more Lunatics builds than the project can support. I do regret that BOINC doesn't provide a convenient method for users to get and update optimized builds from third parties. I wince every time I come across a host running an obsolete optimised version. The anonymous platform mechanism wasn't really designed for that purpose, and some fraction of users who install the optimised apps will fail to check for updates. Joe ID: 735567 ·

Odysseus Volunteer tester Send message Joined: 26 Jul 99 Posts: 1808 Credit: 6,701,347 RAC: 6	Message 735625 - Posted: 7 Apr 2008, 1:13:10 UTC - in response to Message 735383. Does anyone know off the top of their heads how many staff developers are working on SETI (not BOINC, SETI) science applications? I think it's just Eric, plus a few volunteers and contributions from those doing the separate optimized apps. There's a grad-student as well, Josh Von Korff, working on Astropulse. That's two, or maybe one and a half; I don't know whether Josh is full- or part-time. ID: 735625 ·

Brian Silvers Send message Joined: 11 Jun 99 Posts: 1681 Credit: 492,052 RAC: 0	Message 735680 - Posted: 7 Apr 2008, 6:25:02 UTC - in response to Message 735567. The anonymous platform mechanism wasn't really designed for that purpose, and some fraction of users who install the optimised apps will fail to check for updates. Joe, Does this statement by you match what I had said, that the anonymous platform mechanism should only be for / was originally designed for unsupported OSes and not so much for SIMD optimization levels? Thanks... Brian ID: 735680 ·

Josef W. Segur Volunteer developer Volunteer tester Send message Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0	Message 735821 - Posted: 7 Apr 2008, 16:34:17 UTC - in response to Message 735680. The anonymous platform mechanism wasn't really designed for that purpose, and some fraction of users who install the optimised apps will fail to check for updates. Joe, Does this statement by you match what I had said, that the anonymous platform mechanism should only be for / was originally designed for unsupported OSes and not so much for SIMD optimization levels? ... Brian BOINC provides a set of capabilities which allow the projects to focus on the work they want to get done. The documentation for the anonymous platform mechanism certainly indicates the BOINC developers were not specifically thinking about optimized versions of open source science applications, but the feature is flexible enough to allow that usage (with limitations). As to "should only be for", I think the BOINC developers would rather not spend time creating something new and better to handle optimized apps. They are probably pleased that what they provided is "good enough". I also think that if someone submitted code changes for something better they would accept them. They are currently working on ways to deal efficiently with multi-core and/or CPU plus GPU processing, perhaps some of the additions for that purpose will be adaptable. Joe ID: 735821 ·
