Refresh My Memory, Why can't we detect CPU to use optimized

Message boards : Number crunching : Refresh My Memory, Why can't we detect CPU to use optimized

Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 735164 - Posted: 6 Apr 2008, 1:12:51 UTC

Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested?

I mean, it is not THAT hard, is it?

Sorry for the question, but, I am back in confused mode as to why we (the collective we) are being sub-optimal in our approach.

It just does not make sense to me that we are not using the fastest processing code possible on the widest possible set of contributors.

I mean, if there is a rash of errors, then you fall back to stock (or the next level down) ...

As it is, we waste more time than we need to processing ...

I suppose it is a silly question ...
ID: 735164
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 735166 - Posted: 6 Apr 2008, 1:17:46 UTC - in response to Message 735164.  

Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested?

I mean, it is not THAT hard, is it?

Sorry for the question, but, I am back in confused mode as to why we (the collective we) are being sub-optimal in our approach.

It just does not make sense to me that we are not using the fastest processing code possible on the widest possible set of contributors.

I mean, if there is a rash of errors, then you fall back to stock (or the next level down) ...

As it is, we waste more time than we need to processing ...

I suppose it is a silly question ...

Not that old chestnut again.......
You know darned well why Seti cannot support all platform tweaks,,,,,,,
So quit the argument......please.
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 735166
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 735175 - Posted: 6 Apr 2008, 1:30:37 UTC - in response to Message 735166.  

Not that old chestnut again.......
You know darned well why Seti cannot support all platform tweaks,,,,,,,
So quit the argument......please.

Um, I am not arguing.

I asked an honest question. If I knew the answer I would not have asked the question. If I knew the answer at one time in the past, well, I have since forgotten it.

At one time in my past I could even drive a car. At this point, there are many things I can no longer perform.
ID: 735175
kittyman (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 9 Jul 00
Posts: 51468
Credit: 1,018,363,574
RAC: 1,004
United States
Message 735176 - Posted: 6 Apr 2008, 1:33:47 UTC - in response to Message 735175.  

Not that old chestnut again.......
You know darned well why Seti cannot support all platform tweaks,,,,,,,
So quit the argument......please.

Um, I am not arguing.

I asked an honest question. If I knew the answer I would not have asked the question. If I knew the answer at one time in the past, well, I have since forgotten it.

At one time in my past I could even drive a car. At this point, there are many things I can no longer perform.

I am sorry, Sir........
I sometimes forget who I am talking to........
You have my respect, and apologies......
"Freedom is just Chaos, with better lighting." Alan Dean Foster

ID: 735176
Toby
Volunteer tester
Joined: 26 Oct 00
Posts: 1005
Credit: 6,366,949
RAC: 0
United States
Message 735180 - Posted: 6 Apr 2008, 1:42:41 UTC

It's not that it can't be done... It just hasn't been done yet. David Anderson created Trac ticket 562 one month ago that describes how he would like to see this handled. Of course it would only work with reasonably new clients that actually report the CPU capabilities to the server.
A member of The Knights Who Say NI!
For rankings, history graphs and more, check out:
My BOINC stats site
ID: 735180
StokeyBob
Joined: 31 Aug 03
Posts: 848
Credit: 2,218,691
RAC: 0
United States
Message 735217 - Posted: 6 Apr 2008, 3:28:19 UTC

Long time, no see! Paul D. Buck

I haven't been on the message boards for a long time. It is good to see you still around.


ID: 735217
W-K 666 (Project Donor)
Volunteer tester
Joined: 18 May 99
Posts: 19063
Credit: 40,757,560
RAC: 67
United Kingdom
Message 735233 - Posted: 6 Apr 2008, 4:03:01 UTC

Joe (Josef W. Segur), in post 729751 of the "Optimised Apps question" thread, states that:
The stock app has limited support for up to SSE3, but only in certain specific routines.
ID: 735233
MarkJ (Crowdfunding Project Donor, Special Project $75 donor, Special Project $250 donor)
Volunteer tester
Joined: 17 Feb 08
Posts: 1139
Credit: 80,854,192
RAC: 5
Australia
Message 735249 - Posted: 6 Apr 2008, 5:05:15 UTC - in response to Message 735164.  

Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested?

I mean, it is not THAT hard, is it?

Sorry for the question, but, I am back in confused mode as to why we (the collective we) are being sub-optimal in our approach.

It just does not make sense to me that we are not using the fastest processing code possible on the widest possible set of contributors.

I mean, if there is a rash of errors, then you fall back to stock (or the next level down) ...

As it is, we waste more time than we need to processing ...

I suppose it is a silly question ...


Not only do you have to put in code to determine what the CPU is capable of, you also need all this conditional stuff in there to use the optimizations at the appropriate point. It would make the stock app much larger and harder to maintain.

Does it really matter if the app can detect the best capability and use it? After all, we have the optimized app (currently from Crunch3r, and another version coming through) that can use your CPU to its potential. It's just that the user has to ascertain their CPU's capability (i.e. run CPU-Z) and then use the appropriate app rather than the stock one.
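
For anyone who wants to see what "ascertain the CPU's capability, then pick the matching app" might look like when automated, here is a minimal Python sketch. It assumes a Linux-style /proc/cpuinfo "flags" line as input, and the build names (app_sse3 and so on) are made up for illustration; they are not the names of any real optimized build.

```python
# Hypothetical build names, ordered from most to least demanding.
BUILD_PREFERENCE = [
    ("ssse3", "app_ssse3"),
    ("sse3", "app_sse3"),
    ("sse2", "app_sse2"),
    ("sse", "app_sse"),
]


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            # Linux reports SSE3 under the name "pni".
            if "pni" in flags:
                flags.add("sse3")
            return flags
    return set()


def pick_build(flags, fallback="app_stock"):
    """Pick the most demanding build the CPU supports, else the fallback."""
    for flag, build in BUILD_PREFERENCE:
        if flag in flags:
            return build
    return fallback
```

On Linux the text could come from reading /proc/cpuinfo; on other systems the flags would have to come from somewhere else, such as a CPU-Z report. Picking by feature flag alone is only a first guess at which build is fastest on a given host.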
BOINC blog
ID: 735249
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 735260 - Posted: 6 Apr 2008, 5:44:47 UTC - in response to Message 735164.  

Can someone refresh my memory why we cannot detect the CPU capability and then use the most efficient processing code that you folks have laboriously tested?

I mean, it is not THAT hard, is it?
...

We do, though not yet everything which may eventually be included. If you do a standalone run of the stock S@H app now and use a -verbose argument then stderr.txt will show all the variant routines which were checked:
v_GetPowerSpectrum            0.00263  0.00000  test
v_vGetPowerSpectrum           0.00267  0.00000  test
v_vGetPowerSpectrum2          0.00267  0.00000  test
v_vGetPowerSpectrumUnrolled   0.00268  0.00000  test
v_vGetPowerSpectrumUnrolled2  0.00265  0.00000  test
v_GetPowerSpectrum            0.00263  0.00000  choice

v_ChirpData                   0.07636  0.00000  test
fpu_ChirpData                 0.06686  0.00053  test
v_vChirpData_x86_64           0.45416  0.00054  test
sse1_ChirpData_ak             0.05942  0.00053  test
sse2_ChirpData_ak             0.04772  0.00053  test
sse2_ChirpData_ak             0.04772  0.00053  choice

v_Transpose                   0.18752  0.00000  test
v_Transpose2                  0.09686  0.00000  test
v_Transpose4                  0.05149  0.00000  test
v_Transpose8                  0.09197  0.00000  test
v_pfTranspose2                0.12427  0.00000  test
v_pfTranspose4                0.06222  0.00000  test
v_pfTranspose8                0.10654  0.00000  test
v_vTranspose4                 0.05803  0.00000  test
v_vTranspose4np               0.05159  0.00000  test
v_vTranspose4ntw              0.04135  0.00000  test
v_vTranspose4x8ntw            0.03448  0.00000  test
v_vTranspose4x16ntw           0.10690  0.00000  test
v_vpfTranspose8x4ntw          0.04127  0.00000  test
v_vTranspose4x8ntw            0.03448  0.00000  choice

FPU opt folding               0.01391  0.00000  test
AK SSE folding                0.00778  0.00000  test
BH SSE folding                0.00890  0.00000  test
AK SSE folding                0.00778  0.00000  choice

That was on one of my systems which doesn't have more than SSE2, so doesn't show the sse3_ChirpData_ak variant chirping routine.

The situation is fairly complex; simply checking the CPU capabilities is not always enough. For instance, on some Core 2 systems the SSE1 chirping turns out faster than the SSE3 version. That's why the app tests all the variants which the host can do for speed and accuracy.
                                                              Joe
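
As a rough illustration of the test-then-choose idea described above (not the actual S@H code), the Python sketch below times each candidate routine, rejects any whose output differs from a reference result by more than a tolerance, and keeps the fastest survivor. The function names, the tolerance, and the repeat count are all assumptions made up for the example.

```python
import math
import time


def reference_power(samples):
    """Plain power spectrum: re^2 + im^2 for each (re, im) pair."""
    return [re * re + im * im for re, im in samples]


def power_via_abs(samples):
    """Alternative variant: same quantity computed through complex abs()."""
    return [abs(complex(re, im)) ** 2 for re, im in samples]


def choose_variant(variants, reference, data, tolerance=1e-6, repeats=3):
    """Return the name of the fastest variant whose output stays within
    `tolerance` of the reference output, or None if none qualifies."""
    expected = reference(data)
    best_name, best_time = None, math.inf
    for name, func in variants.items():
        # Keep the best of several timed runs to reduce timing noise.
        elapsed = math.inf
        for _ in range(repeats):
            start = time.perf_counter()
            result = func(data)
            elapsed = min(elapsed, time.perf_counter() - start)
        # Accuracy check: discard variants that drift from the reference.
        if len(result) != len(expected):
            continue
        worst = max((abs(a - b) for a, b in zip(result, expected)), default=0.0)
        if worst > tolerance:
            continue
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

Because the choice is made from measured time on the host itself, a variant that uses a newer instruction set is only picked if it really is faster there, which covers cases like the Core 2 one mentioned above.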
ID: 735260
Paul D. Buck
Volunteer tester
Joined: 19 Jul 00
Posts: 3898
Credit: 1,158,042
RAC: 0
United States
Message 735271 - Posted: 6 Apr 2008, 7:30:56 UTC

Um, thanks for the answers.

Though, the thrust was not to "bulk-up" the stock application or to do conditionals.

But, to follow the logic of Josef, to test the CPU and then to download the appropriately "tuned" application. I know we have the "stable" of applications and that there is a great deal of advice on the selection process which is why I posed the question.

In the years prior, I remember that this was one of the most common questions on this forum, which app and how to install it ...

It seems that this is one more area where we are no further along years later than we were ...

We used to call situations like this, when I was a boy, "stuck in the mud" ...

Anyway, thanks for the answers.


ID: 735271
ML1
Volunteer moderator
Volunteer tester
Joined: 25 Nov 01
Posts: 20291
Credit: 7,508,002
RAC: 20
United Kingdom
Message 735373 - Posted: 6 Apr 2008, 15:02:17 UTC - in response to Message 735271.  
Last modified: 6 Apr 2008, 15:07:02 UTC

Um, thanks for the answers.

Though, the thrust was not to "bulk-up" the stock application or to do conditionals.

But, to follow the logic of Josef, to test the CPU and then to download the appropriately "tuned" application...

I think that the only workable solution is indeed to suffer 'bulking up' the stock application.

The 'best' optimisation critically depends on what features the CPU supports but also upon all of:

  • CPU cache sizes, latency, and available bandwidth;
  • system RAM latency and available bandwidth;
  • use for s@h only or whether shared with other applications.



I think you can only realistically "test-and-see" to select the best optimisations for that hardware, which means that all the optimisations need to be available in the stock application.

Having a test application, with special support in Boinc for that test, and then many permutations of optimised applications: All that lot is way too convoluted to be maintainable.

Hence, one big application whereby the appropriate optimised subroutines are called up as needed is the best bet at present.

Any other simpler ways?...


We used to call situations like this, when I was a boy, "stuck in the mud" ...

That's more an issue of what development effort is available.

Yes, the science application can be improved. There's fantastic volunteer effort working on that. However, for Berkeley, I suspect that mere survival and getting something very visual working such as the NITPICKER are far far greater concerns for the 3(?) people available there.


The present s@h is 'ticking along' nicely (server hardware panics aside!). The most urgent problems and developments are being worked on. Adding a few more percent performance is, I would guess, not a hot priority for the time being.

Consistency doesn't always mean "thick mud".


Good question still :-)

Happy crunchin',
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
ID: 735373
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 735383 - Posted: 6 Apr 2008, 15:32:37 UTC - in response to Message 735271.  

It seems that this is one more area where we are no further along years later than we were ...

We used to call situations like this, when I was a boy, "stuck in the mud" ...


I don't think I'd agree with the "no further along" argument. As Joe Segur points out, the current stock application does test (using benchmarks, not just CPU ID) and use routines that are more suited to certain processors....

But I also think there is a recurring assumption that the project(s) have more developer resources than they actually do.

Does anyone know off the top of their heads how many staff developers are working on SETI (not BOINC, SETI) science applications? I think it's just Eric, plus a few volunteers and contributions from those doing the separate optimized apps.

ID: 735383
W-K 666 (Project Donor)
Volunteer tester
Joined: 18 May 99
Posts: 19063
Credit: 40,757,560
RAC: 67
United Kingdom
Message 735403 - Posted: 6 Apr 2008, 16:03:12 UTC

The other thing about the Seti units is that, of the various apps available at the moment, one may be good at one or two portions of the AR range and another app better at other parts of the AR range.
It's a pity Tony's research has been withdrawn, it illustrated the problem very well.

Ned, I'm pretty sure you are correct in your assumption that Eric is the only one working on the Seti app, when he has time. I think he is also involved with the Nitpicker and helping Josh with AstroPulse, plus all the other Seti paperwork etc. And we keep bugging him for updates and news.
ID: 735403
Clyde C. Phillips, III
Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 735434 - Posted: 6 Apr 2008, 17:58:38 UTC

Maybe it would be better to compare crunchtimes for several samples of each angle range class for each processor than to use Dhrystone and Whetstone values. Often just the number of credits awarded each unit can serve as a proxy for the comparison of angle-range workunits. It might get tougher when comparing Intels with AMDs, though. Also, for days on end, there might not be workunits of certain angle range/number-of-credits classes distributed.
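
A minimal Python sketch of that comparison, assuming crunch times have been logged as (app, angle range, seconds) records. The class boundaries below are hypothetical placeholders, not the project's real angle-range classes, and any app names fed in are up to whoever collects the data.

```python
from collections import defaultdict
from statistics import median


def ar_class(angle_range):
    """Bucket an angle range into a coarse class (hypothetical boundaries)."""
    if angle_range < 0.2:
        return "low"
    if angle_range < 1.2:
        return "mid"
    return "high"


def fastest_app_per_class(runs):
    """Given (app, angle_range, seconds) records, return a dict mapping
    each angle-range class to the app with the lowest median crunch time."""
    times = defaultdict(lambda: defaultdict(list))
    for app, angle_range, seconds in runs:
        times[ar_class(angle_range)][app].append(seconds)
    return {
        cls: min(apps, key=lambda name: median(apps[name]))
        for cls, apps in times.items()
    }
```

Using the median rather than the mean keeps one odd work unit from deciding the result. Classes with no logged work units simply do not appear in the output, which matches the point that some classes may not be distributed for days on end.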
ID: 735434
1mp0£173
Volunteer tester
Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 735441 - Posted: 6 Apr 2008, 18:10:37 UTC - in response to Message 735434.  

Maybe it would be better to compare crunchtimes for several samples of each angle range class for each processor than to use Dhrystone and Whetstone values. Often just the number of credits awarded each unit can serve as a proxy for the comparison of angle-range workunits. It might get tougher when comparing Intels with AMDs, though. Also, for days on end, there might not be workunits of certain angle range/number-of-credits classes distributed.

Absolutely. The very best way to do this is to use the app. as the benchmark, and to use several selected "benchmark work units" to measure performance.

Trouble is, running these benchmarks will take hours, and while crunch times are generally similar, you'd want to select the "benchmark" WUs carefully.

So (at least for SETI) the best compromise is what we have: whetstone and dhrystone for a rough measure, and duration correction factor, averaged across several work units for a more representative time estimate.

Also, for any project that counts flops, the benchmark has no role in claimed credit. If the benchmark is off, you'll overfetch or underfetch work, but it won't change your scores.
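
To make the estimate arithmetic concrete, here is a simplified Python sketch. It assumes the uncorrected estimate is the work unit's estimated floating-point operation count divided by the benchmarked speed, and it averages the correction factor with a simple exponential moving average. That smoothing rule, the function names, and the numbers in the usage note are assumptions chosen for illustration; BOINC's real update rule may differ in detail.

```python
def estimate_seconds(fpops_est, benchmark_flops, dcf=1.0):
    """Estimated run time: operation count / benchmark speed, scaled by
    the duration correction factor (DCF)."""
    return fpops_est / benchmark_flops * dcf


def update_dcf(dcf, uncorrected_estimate, actual_seconds, weight=0.1):
    """Blend the observed actual/estimated ratio into the running DCF.

    A small `weight` means each finished work unit only nudges the
    factor, so it reflects an average over several work units."""
    ratio = actual_seconds / uncorrected_estimate
    return (1.0 - weight) * dcf + weight * ratio
```

For example, a work unit estimated at 3.0e13 operations on a host benchmarked at 2.0e9 operations per second gives an uncorrected estimate of 15000 seconds. If such units keep finishing in about 12000 seconds, repeated updates pull the DCF toward 0.8 and the estimates toward the real run time, without touching claimed credit.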
ID: 735441
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 735567 - Posted: 6 Apr 2008, 21:26:14 UTC - in response to Message 735271.  

Paul D. Buck:
Um, thanks for the answers.

Though, the thrust was not to "bulk-up" the stock application or to do conditionals.

But, to follow the logic of Josef, to test the CPU and then to download the appropriately "tuned" application. I know we have the "stable" of applications and that there is a great deal of advice on the selection process which is why I posed the question.
...

In essence, I think it's simply a matter of how many different versions of the app the project has the time to test or maintain. They currently have apps for 8 platforms (though some may be duplicates). If they had separate apps for the various CPU architectures, etc. then that might be multiplied by 4 or so.

Much optimization involves finding the parts of the code which are executed billions or trillions of times while processing a WU and figuring out more efficient ways of doing those operations. That tends to be fairly small sequences of instructions, so providing alternate variants doesn't "bulk-up" the app significantly. The added variants in stock plus the code to test them amount to about 63 KB now, for instance, and similar for the Lunatics 2.4 builds. But of course using different compiler options for different target architectures makes it necessary to have many more Lunatics builds than the project can support.

I do regret that BOINC doesn't provide a convenient method for users to get and update optimized builds from third parties. I wince every time I come across a host running an obsolete optimised version. The anonymous platform mechanism wasn't really designed for that purpose, and some fraction of users who install the optimised apps will fail to check for updates.
                                                                 Joe
ID: 735567
Odysseus
Volunteer tester
Joined: 26 Jul 99
Posts: 1808
Credit: 6,701,347
RAC: 6
Canada
Message 735625 - Posted: 7 Apr 2008, 1:13:10 UTC - in response to Message 735383.  

Does anyone know off the top of their heads how many staff developers are working on SETI (not BOINC, SETI) science applications? I think it's just Eric, plus a few volunteers and contributions from those doing the separate optimized apps.

There’s a grad student as well, Josh Von Korff, working on Astropulse. That’s two, or maybe one and a half; I don’t know whether Josh is full- or part-time.
ID: 735625
Brian Silvers
Joined: 11 Jun 99
Posts: 1681
Credit: 492,052
RAC: 0
United States
Message 735680 - Posted: 7 Apr 2008, 6:25:02 UTC - in response to Message 735567.  

The anonymous platform mechanism wasn't really designed for that purpose, and some fraction of users who install the optimised apps will fail to check for updates.


Joe,

Does this statement by you match what I had said, that the anonymous platform mechanism should only be for / was originally designed for unsupported OSes and not so much for SIMD optimization levels?

Thanks...

Brian
ID: 735680
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 735821 - Posted: 7 Apr 2008, 16:34:17 UTC - in response to Message 735680.  

The anonymous platform mechanism wasn't really designed for that purpose, and some fraction of users who install the optimised apps will fail to check for updates.

Joe,

Does this statement by you match what I had said, that the anonymous platform mechanism should only be for / was originally designed for unsupported OSes and not so much for SIMD optimization levels?
...
Brian

BOINC provides a set of capabilities which allow the projects to focus on the work they want to get done. The documentation for the anonymous platform mechanism certainly indicates the BOINC developers were not specifically thinking about optimized versions of open source science applications, but the feature is flexible enough to allow that usage (with limitations).

As to "should only be for", I think the BOINC developers would rather not spend time creating something new and better to handle optimized apps. They are probably pleased that what they provided is "good enough". I also think that if someone submitted code changes for something better they would accept them. They are currently working on ways to deal efficiently with multi-core and/or CPU plus GPU processing; perhaps some of the additions for that purpose will be adaptable.
                                                              Joe
ID: 735821



 
©2024 University of California
 
SETI@home and Astropulse are funded by grants from the National Science Foundation, NASA, and donations from SETI@home volunteers. AstroPulse is funded in part by the NSF through grant AST-0307956.