SSE2
Author | Message |
---|---|
nemesis Joined: 12 Oct 99 Posts: 1408 Credit: 35,074,350 RAC: 0 |
I'm concerned that the lack of an SSE2 version of the AK V8 will make some strong computers obsolete. Think about the Pentium M (Centrino) and Pentium 4 Northwood chips. Sure, the Cruncher and Chicken apps are still good, but as most will agree, they are not on a level playing field. Any thoughts from my most esteemed crunching colleagues? |
Luke Joined: 31 Dec 06 Posts: 2546 Credit: 817,560 RAC: 0 |
|
tfp Joined: 20 Feb 01 Posts: 104 Credit: 3,137,259 RAC: 0 |
In the whale thread it states that Alex never made an SSE2 version (Apple never had an x86 release with less than SSE3). Because of that, getting an SSE2 or earlier version means much more work to build the older clients. The first release was a port of Alex's base code with tweaks; releasing a version for SSE2, SSE, and/or MMX would need much more to be done. |
hiamps Joined: 23 May 99 Posts: 4292 Credit: 72,971,319 RAC: 0 |
Guess it is about time to upgrade the 2.8's and 3.0's anyways. Q6600's will be nicer...now where can I get some quick cash..... Official Abuser of Boinc Buttons... And no good credit hound! |
Josef W. Segur Joined: 30 Oct 99 Posts: 4504 Credit: 1,414,761 RAC: 0 |
I totally agree.... Mac/Intel systems don't go back that far, they all have SSE3 or better so where Alex Kan's code is different from the stock 5.13 he started with it's very different. Don't forget he also has PPC code, and that came before the Mac/Intel. That means doing SSE2, SSE, or other "ports" is significantly more work. OTOH, SSE3 didn't add a lot of instructions over SSE2 so it's largely a matter of finding the places where those few new instructions are used and providing similar functionality. 'Taint simple, but not impossible either. Joe |
[B^S] madmac Joined: 9 Feb 04 Posts: 1175 Credit: 4,754,897 RAC: 0 |
|
jegs Joined: 3 May 07 Posts: 16 Credit: 7,649 RAC: 0 |
"According to BOINC I have got SSE2, and I will wait until the new version does come out. However, I am sure that I have got SSE3. Can anyone tell me what site to go to so that I can check? Thank you in advance." Yes, your P4 2.93 GHz does support SSE3. You can download CPU-Z and run it to find all of the info on your CPU. |
SATAN Joined: 27 Aug 06 Posts: 835 Credit: 2,129,006 RAC: 0 |
I can't believe some people around here. Jason, JD and the other guys have done something brilliant and all some people can do is moan about it. Us Mac guys have had an advantage for ages thanks to Alex; a group of people come along and make the field a little more level, and you moan? What a way to say thank you. If you want an SSE2 app, why don't you do something about it? Are you going to help port the code backwards? The code has been available for 18 months or so, yet none of you did anything about it. You have no right to complain about them not making an SSE2 app; you have had plenty of time to do it on your own. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
LOL, Down Satan! (Hehe, sorry, I had to say it ;D) I completely understand the need for an SSE2 version, and a Linux version for each SSE level too, as my P4 Northwood is sitting in a drawer gathering dust, and I would like to use it to play with Linux and crunch a bit too. Though no match for modern hardware, it'll store some files and doesn't use all that much power. SSE2 back conversion is no trivial task, as you might all have guessed. There are only a few instructions in SSE3, but they are hairy compound ones that were introduced by Intel to simplify common code sequences.... So we need to UNsimplify those sequences :S. As you might also have guessed, some of Alex's hand codings are conceptually and functionally rather intricate to begin with (picture a Swiss watch made out of soft spaghetti noodles). For a Linux build, Alex's code might be easier to use as a starting point, though the Windows port sources are available on Lunatics (to those who can reach it at the moment!). Someone much more experienced with Linux development will hopefully beat me to having a crack at that build, though I'll definitely be tinkering in the meantime. An SSE2 build might be a natural consequence of the Linux port also, not sure. Jason "Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
One solution, I hesitate to mention, is for the older machines to migrate to other projects. I guess if SETI, as a group, isn't inclusive enough, people will vote with their feet and find more suitable projects. I'm sure this would be an unintended outcome, yet, as an owner of some older machines, I feel a bit slighted that they can't yet run as efficiently as my newest machines. Yet, if someone else were willing to port AK8 to an older machine or two, how would she do it? I don't see that the code is available anywhere. Only the executable and references to the original code exist for download, not the AK8 variety. Perhaps I can't find it, or, like that French guy, someone is unwilling to release it to the group at large. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Alex's source, AFAIK, has always been available from the location posted in the new Mac Apps of the NEW OPTIMISED APPS thread, stickied at the top. The Windows port source is available at http://lunatics.kwsn.net ... in the downloads section. |
JDWhale Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
I had a go at the SSE2 port last night with a pitcher of Strawberry-Mango Margaritas... Got it to build ~4am, but it was crashing early in the WU... A quick debugging session after a couple of hours' sleep seems promising. Now I'm crashing at the end of my simple WU instead of at the beginning. Anyway... point is: I've desimplified some of those SSE3 calls to SSE sequences... Example: `p1 = _mm_hadd_ps( f1_1, f1_2 );` becomes `p1 = _mm_add_ps(_mm_shuffle_ps(f1_1, f1_2, 0x88), _mm_shuffle_ps(f1_1, f1_2, 0xdd));` and `cd1 = _mm_addsub_ps(cd1, td1);` becomes `cd1 = _mm_shuffle_ps(d1=_mm_shuffle_ps(_mm_sub_ps(cd1,td1),_mm_add_ps(cd1,td1),0xd8), d1, 0xd8);` Also, these macros work for a couple of others... `#define _mm_moveldup_ps(X) (_mm_shuffle_ps( (X) , (X) , 0xa0))` `#define _mm_movehdup_ps(X) (_mm_shuffle_ps( (X) , (X) , 0xf5))` So..... if anyone wants to pick this up and have a go, the above might help you get started... Back to trying to get this thing to run on my P4M (Northwood) laptop. Regards, John |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
LOL, so my "Swiss watch made from wet noodles" analogy isn't too far off then ... ;D |
JDWhale Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
"LOL, so my 'Swiss watch made from wet noodles' analogy isn't too far off then ... ;D" LOL.... I know my brain feels like "wet noodles" after sorting those MM_SHUFFLE masks. I think I'll implement macros for all the SSE3 functions and restart with the original code.... We know the original works! My brain is soggy, I don't know anything right now! |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
Shuffle masks and booze, eh? I find that numbering bottle caps (Crown Lager for example) helps; then you can move them around, make little towers out of them, then wonder why you put numbers on the bottle caps ;D |
Dennis Joined: 26 Jun 07 Posts: 153 Credit: 15,826,319 RAC: 0 |
I would think AMD's Phenom quads and X64 duals would benefit from an SSE2 app build, since all I read on this is how badly the AMDs use/implement SSE3... and so they work better on SSE2. I would love to know I am wrong here. All you involved in this new application, again, thank you for all your efforts. |
jason_gee Joined: 24 Nov 06 Posts: 7489 Credit: 91,093,184 RAC: 0 |
"I would think AMD's Phenom quads and X64 duals would benefit from an SSE2 app build, since all I read on this is how badly the AMDs use/implement SSE3... and so they work better on SSE2. I would love to know I am wrong here. All you involved in this new application, again, thank you for all your efforts." Hi Dennis, AFAIK the Phenom works fine with the SSE3 version. I'd be very surprised if an SSE2 version were faster on that, because it would require more instructions in flight to achieve the same task. I think the earlier Lunatics builds were so even in performance for a different reason: the way RAM was being accessed. Jason |
Laffo Joined: 30 Dec 02 Posts: 13 Credit: 35,665,239 RAC: 0 |
Hi Dennis, The new AK V8 SSE3 app works like a charm on my Phenom. On the 54-credit WUs it has brought the crunch times down from 2 hrs 5 mins to 1 hr 15 mins; that's no mean feat. An SSE2 version may or may not be beneficial to the Phenom, but it would be to chips not supporting SSE3. If you have a Phenom, give it a try; you won't be disappointed. Laffo. |
zoom3+1=4 Joined: 30 Nov 03 Posts: 65763 Credit: 55,293,173 RAC: 49 |
"LOL, so my 'Swiss watch made from wet noodles' analogy isn't too far off then ... ;D" Well, as long as You aren't a Wet Noodle and don't know It, and I don't think You are, of course. The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's |
JDWhale Joined: 6 Apr 99 Posts: 921 Credit: 21,935,817 RAC: 3 |
I'm running live with my first "cut" at an SSE2 client on my P4M laptop, Gilligan. It'll be a few hours before he reports; I'm predicting ~20% or better improvement on the mid-AR WUs in his queue (currently 6.5 hours, down to 5.0 hours). The code changes will require minimal effort to install on the current ported code. Now it's just playing the "waiting game" to see if Gilligan can get a variety of WUs to see how other ARs perform. Regards, JDWhale |
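
Editor's sketch: the shuffle-mask substitutions JDWhale posted earlier in the thread can be sanity-checked against plain scalar arithmetic. The block below is a minimal illustration, not the code from the Lunatics port. The function names (`hadd_ps_sse`, `addsub_ps_sse`, `moveldup_ps_sse`, `movehdup_ps_sse`, `lanes_equal`, `check_sse3_emulation`) are made up for this example, and it assumes an x86 compiler that provides `<xmmintrin.h>`. Only SSE-level intrinsics are used, so it should build without any SSE3 flag.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE only: _mm_shuffle_ps, _mm_add_ps, _mm_sub_ps */

/* Stand-in for SSE3 _mm_hadd_ps: {a0+a1, a2+a3, b0+b1, b2+b3}. */
static inline __m128 hadd_ps_sse(__m128 a, __m128 b) {
    __m128 even = _mm_shuffle_ps(a, b, 0x88);  /* {a0, a2, b0, b2} */
    __m128 odd  = _mm_shuffle_ps(a, b, 0xdd);  /* {a1, a3, b1, b3} */
    return _mm_add_ps(even, odd);
}

/* Stand-in for SSE3 _mm_addsub_ps: {a0-b0, a1+b1, a2-b2, a3+b3}. */
static inline __m128 addsub_ps_sse(__m128 a, __m128 b) {
    /* First shuffle gathers {diff0, diff2, sum1, sum3}. */
    __m128 t = _mm_shuffle_ps(_mm_sub_ps(a, b), _mm_add_ps(a, b), 0xd8);
    /* Second shuffle swaps the middle lanes: {diff0, sum1, diff2, sum3}. */
    return _mm_shuffle_ps(t, t, 0xd8);
}

/* Stand-in for SSE3 _mm_moveldup_ps: {x0, x0, x2, x2}. */
static inline __m128 moveldup_ps_sse(__m128 x) {
    return _mm_shuffle_ps(x, x, 0xa0);
}

/* Stand-in for SSE3 _mm_movehdup_ps: {x1, x1, x3, x3}. */
static inline __m128 movehdup_ps_sse(__m128 x) {
    return _mm_shuffle_ps(x, x, 0xf5);
}

/* Returns 1 if the four lanes of v equal the expected scalars. */
static int lanes_equal(__m128 v, float e0, float e1, float e2, float e3) {
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] == e0 && out[1] == e1 && out[2] == e2 && out[3] == e3;
}

/* Checks each stand-in against hand-computed results; returns 1 on success. */
int check_sse3_emulation(void) {
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
    int ok = lanes_equal(hadd_ps_sse(a, b), 3.0f, 7.0f, 30.0f, 70.0f)
          && lanes_equal(addsub_ps_sse(a, b), -9.0f, 22.0f, -27.0f, 44.0f)
          && lanes_equal(moveldup_ps_sse(a), 1.0f, 1.0f, 3.0f, 3.0f)
          && lanes_equal(movehdup_ps_sse(a), 2.0f, 2.0f, 4.0f, 4.0f);
    assert(ok && "SSE stand-in disagrees with the scalar reference");
    return ok;
}
```

Calling `check_sse3_emulation()` from a small `main` is enough to exercise all four masks. Inline functions are used here instead of macros so that each argument is evaluated only once, which is one way to avoid the temporary variable (`d1`) that the macro-style `_mm_addsub_ps` replacement needs.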