SSE2

Message boards : Number crunching : SSE2

nemesis
Joined: 12 Oct 99
Posts: 1408
Credit: 35,074,350
RAC: 0
Message 746825 - Posted: 2 May 2008, 4:00:25 UTC

I'm concerned that the lack of an SSE2 version of the AK V8 will make some strong computers obsolete.
Think about the Pentium M (Centrino) and Pentium 4 Northwood chips.
Sure, the Cruncher and Chicken apps are still good, but as most will agree, they are not on a level playing field.

Any thoughts from my most esteemed crunching colleagues?
ID: 746825
Luke
Volunteer developer
Joined: 31 Dec 06
Posts: 2546
Credit: 817,560
RAC: 0
New Zealand
Message 746832 - Posted: 2 May 2008, 4:34:12 UTC

I totally agree....
where is the SSE2 version??????


- Luke.
ID: 746832
tfp
Volunteer tester
Joined: 20 Feb 01
Posts: 104
Credit: 3,137,259
RAC: 0
United States
Message 746835 - Posted: 2 May 2008, 4:42:39 UTC
Last modified: 2 May 2008, 4:43:22 UTC

In the whale thread it states that Alex never made an SSE2 version (Apple never had an x86 release with less than SSE3); because of that, getting an SSE2 or earlier version would take much more work to produce the older clients.

The first release was a port of Alex's base code with tweaks; if they release a version for SSE2, SSE, and/or MMX, much would need to be done.
ID: 746835
hiamps
Volunteer tester
Joined: 23 May 99
Posts: 4292
Credit: 72,971,319
RAC: 0
United States
Message 746837 - Posted: 2 May 2008, 4:44:46 UTC

Guess it is about time to upgrade the 2.8's and 3.0's anyways. Q6600's will be nicer...now where can I get some quick cash.....
Official Abuser of Boinc Buttons...
And no good credit hound!
ID: 746837
Josef W. Segur
Volunteer developer
Volunteer tester
Joined: 30 Oct 99
Posts: 4504
Credit: 1,414,761
RAC: 0
United States
Message 746845 - Posted: 2 May 2008, 5:03:57 UTC - in response to Message 746832.  

I totally agree....
where is the SSE2 version??????

Mac/Intel systems don't go back that far, they all have SSE3 or better so where Alex Kan's code is different from the stock 5.13 he started with it's very different. Don't forget he also has PPC code, and that came before the Mac/Intel. That means doing SSE2, SSE, or other "ports" is significantly more work. OTOH, SSE3 didn't add a lot of instructions over SSE2 so it's largely a matter of finding the places where those few new instructions are used and providing similar functionality. 'Taint simple, but not impossible either.
Joe
ID: 746845
[B^S] madmac
Volunteer tester
Joined: 9 Feb 04
Posts: 1175
Credit: 4,754,897
RAC: 0
United Kingdom
Message 746909 - Posted: 2 May 2008, 11:21:36 UTC

According to BOINC I have got SSE2, and I will wait until the new version comes out. However, I am sure that I have got SSE3. Can anyone tell me what site to go to so that I can check? Thank you in advance.
ID: 746909
jegs
Joined: 3 May 07
Posts: 16
Credit: 7,649
RAC: 0
Message 746912 - Posted: 2 May 2008, 11:31:59 UTC - in response to Message 746909.  

According to BOINC I have got SSE2, and I will wait until the new version comes out. However, I am sure that I have got SSE3. Can anyone tell me what site to go to so that I can check? Thank you in advance.


Yes your P4 2.93 GHz does support SSE3. You can download CPU-Z and run it to find all of the info on your CPU.
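
If you would rather check from code than from a tool, something like the following should do it. This is only a minimal sketch, assuming GCC or Clang on an x86 machine (it relies on the `__get_cpuid` helper from `<cpuid.h>`; MSVC has a different intrinsic). The function names `has_sse2`, `has_sse3` and `print_sse_level` are made up for illustration. It reads CPUID leaf 1, where EDX bit 26 reports SSE2 and ECX bit 0 reports SSE3.

```c
#include <cpuid.h>   /* GCC/Clang only; assumption: building on x86 */
#include <stdio.h>

/* Returns 1 if CPUID leaf 1 reports SSE2 (EDX bit 26), else 0. */
int has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (int)((edx >> 26) & 1u);
}

/* Returns 1 if CPUID leaf 1 reports SSE3 (ECX bit 0), else 0. */
int has_sse3(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not available */
    return (int)(ecx & 1u);
}

/* Hypothetical helper: prints what the CPU itself reports. */
void print_sse_level(void)
{
    printf("SSE2: %s\n", has_sse2() ? "yes" : "no");
    printf("SSE3: %s\n", has_sse3() ? "yes" : "no");
}
```

Calling `print_sse_level()` from `main` prints one line per feature, which should agree with what CPU-Z shows.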
ID: 746912
SATAN
Joined: 27 Aug 06
Posts: 835
Credit: 2,129,006
RAC: 0
United Kingdom
Message 746914 - Posted: 2 May 2008, 11:41:49 UTC

I can't believe some people around here. Jason, JD and the other guys have done something brilliant and all some people can do is moan about it. Us Mac guys have had an advantage for ages thanks to Alex; a group of people comes along and makes the field a little more level, and you moan?

What a way to say thank you.

If those of you who want an SSE2 app care so much, why don't you do something about it? Are you going to help port the code backwards? The code has been available for 18 months or so, yet none of you did anything about it.

You have no right to complain about them not making an SSE2 app; you have had plenty of time to do it on your own.

ID: 746914
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 746925 - Posted: 2 May 2008, 12:17:16 UTC
Last modified: 2 May 2008, 12:30:36 UTC

LOL, Down Satan! (Hehe , sorry, I had to say it ;D)

I completely understand the need for an SSE2 version, a Linux version for each SSE level too, as my p4 Northwood is sitting in a drawer gathering dust, and I would like to use it to play with Linux and crunch a bit too. Though no match for modern hardware it'll store some files and doesn't use all that much power.

SSE2 back conversion is no trivial task as you might all have guessed. There are only a few instructions in SSE3, but they are hairy compound ones that were introduced by Intel to simplify common code sequences.... So we need to UNsimplify those sequences :S.

As you might also have guessed some of Alex's hand codings are conceptually and functionally rather intricate to begin with (Picture a Swiss watch made out of soft spaghetti noodles).

For a Linux build, Alex's code might be easier to use as a starting point, though the Windows port sources are available on Lunatics (to those who can reach it at the moment!). Someone much more experienced with Linux development will hopefully beat me to having a crack at that build, though I'll definitely be tinkering in the meantime.

An SSE2 build might be a natural consequence of the Linux port also, not sure.

Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 746925
PhonAcq
Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 746927 - Posted: 2 May 2008, 12:28:30 UTC

One solution, I hesitate to mention, is for the older machines to migrate to other projects. I guess if seti, as a group, isn't inclusive enough people will vote with their feet and find more suitable projects. I'm sure this would be an unintended outcome, yet, as an owner of some older machines, I feel a bit slighted that they can't yet run as efficiently as my newest machines.

Yet, if someone else were willing to port AK8 to an older machine or two, how would she do it? I don't see that the code is available anywhere. Just the executable and references to the original code exist for download; not the AK8 variety. Perhaps I can't find it, or, like that French guy, someone is unwilling to release it to the group at large.
ID: 746927
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 746931 - Posted: 2 May 2008, 12:34:48 UTC - in response to Message 746927.  
Last modified: 2 May 2008, 12:35:59 UTC

Alex's source AFAIK has always been available, from the location posted in the new Mac Apps of the NEW OPTIMISED APPS thread, stickied at the top. The Windows port source is available at http://lunatics.kwsn.net ... In the downloads section
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 746931
JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 746956 - Posted: 2 May 2008, 14:02:23 UTC - in response to Message 746925.  


I completely understand the need for an SSE2 version, a Linux version for each SSE level too, as my p4 Northwood is sitting in a drawer gathering dust, and I would like to use it to play with Linux and crunch a bit too. Though no match for modern hardware it'll store some files and doesn't use all that much power.

SSE2 back conversion is no trivial task as you might all have guessed. There are only a few instructions in SSE3, but they are hairy compound ones that were introduced by Intel to simplify common code sequences.... So we need to UNsimplify those sequences :S.


I had a go at the SSE2 build last night with a pitcher of Strawberry-Mango Margaritas... Got it to build around 4am, but it was crashing early in the WU... A quick debugging session after a couple of hours' sleep seems promising. Now I'm crashing at the end of my simple WU instead of at the beginning.

anyway... point is: I've desimplified some of those SSE3 calls to SSE sequences...

example:

p1 = _mm_hadd_ps( f1_1, f1_2 );

becomes
p1 = _mm_add_ps(_mm_shuffle_ps(f1_1, f1_2, 0x88), _mm_shuffle_ps(f1_1, f1_2, 0xdd));



and

cd1 = _mm_addsub_ps(cd1, td1);

becomes
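d1 = _mm_shuffle_ps(_mm_sub_ps(cd1, td1), _mm_add_ps(cd1, td1), 0xd8);  /* {sub0, sub2, add1, add3}; separate statement so d1 is set before it is read */
cd1 = _mm_shuffle_ps(d1, d1, 0xd8);  /* {sub0, add1, sub2, add3} */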


Also, these macros work for a couple others...
#define _mm_moveldup_ps(X) (_mm_shuffle_ps( (X) , (X) , 0xa0))
#define _mm_movehdup_ps(X) (_mm_shuffle_ps( (X) , (X) , 0xf5))
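
For anyone who wants to sanity-check replacements like these before dropping them into the real code, here is a rough sketch. It assumes an x86 compiler with `<xmmintrin.h>` and uses only SSE intrinsics. The names `hadd_ps_sse`, `addsub_ps_sse`, `moveldup_ps_sse`, `movehdup_ps_sse`, `same4` and `check_sse3_standins` are hypothetical, not anything from the AK V8 sources. The expected values are worked out by hand from the documented lane behaviour of the SSE3 instructions. The addsub stand-in uses a named temporary in its own statement so the result does not depend on the order in which function arguments are evaluated.

```c
#include <xmmintrin.h>  /* SSE only: shuffle, add, sub, load/store */

/* Stand-in for _mm_hadd_ps(a, b): {a0+a1, a2+a3, b0+b1, b2+b3} */
__m128 hadd_ps_sse(__m128 a, __m128 b)
{
    __m128 even = _mm_shuffle_ps(a, b, 0x88);  /* {a0, a2, b0, b2} */
    __m128 odd  = _mm_shuffle_ps(a, b, 0xdd);  /* {a1, a3, b1, b3} */
    return _mm_add_ps(even, odd);
}

/* Stand-in for _mm_addsub_ps(a, b): {a0-b0, a1+b1, a2-b2, a3+b3} */
__m128 addsub_ps_sse(__m128 a, __m128 b)
{
    /* {sub0, sub2, add1, add3} */
    __m128 mixed = _mm_shuffle_ps(_mm_sub_ps(a, b), _mm_add_ps(a, b), 0xd8);
    /* swap the two middle lanes: {sub0, add1, sub2, add3} */
    return _mm_shuffle_ps(mixed, mixed, 0xd8);
}

/* Stand-ins for _mm_moveldup_ps / _mm_movehdup_ps */
__m128 moveldup_ps_sse(__m128 x) { return _mm_shuffle_ps(x, x, 0xa0); } /* {x0, x0, x2, x2} */
__m128 movehdup_ps_sse(__m128 x) { return _mm_shuffle_ps(x, x, 0xf5); } /* {x1, x1, x3, x3} */

/* Returns 1 if the four lanes of v equal the expected values. */
static int same4(__m128 v, float e0, float e1, float e2, float e3)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[0] == e0 && out[1] == e1 && out[2] == e2 && out[3] == e3;
}

/* Returns 1 if every stand-in matches the hand-computed SSE3 result. */
int check_sse3_standins(void)
{
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
    return same4(hadd_ps_sse(a, b),    3.0f,  7.0f,  30.0f, 70.0f)
        && same4(addsub_ps_sse(a, b), -9.0f, 22.0f, -27.0f, 44.0f)
        && same4(moveldup_ps_sse(a),   1.0f,  1.0f,   3.0f,  3.0f)
        && same4(movehdup_ps_sse(a),   2.0f,  2.0f,   4.0f,  4.0f);
}
```

On a machine that does have SSE3, the same check could compare against the real intrinsics from `<pmmintrin.h>` instead of hand-computed values.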



So..... if anyone wants to pick this up and have a go, the above might help you get started...

Back to trying to get this thing to run on my P4M (Northwood) laptop.

Regards,
John
ID: 746956
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 746974 - Posted: 2 May 2008, 15:04:06 UTC
Last modified: 2 May 2008, 15:04:25 UTC

LOL, so my "Swiss watch made from wet noodles" analogy isn't too far off then ... ;D
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 746974
JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 746981 - Posted: 2 May 2008, 15:21:30 UTC - in response to Message 746974.  

LOL, so my "Swiss watch made from wet noodles" analogy isn't too far off then ... ;D


LOL....I know my brain feels like "wet noodles" after sorting those MM_SHUFFLE MASKS.

I think I'll implement macros for all SSE3 functions and restart with the original code.... we know the original works! My brain is soggy, I don't know anything right now!
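
For sorting out the shuffle masks on paper, a plain scalar model can help. The following is only an illustrative sketch in portable C, with no intrinsics. `shuffle_ps_model` and `shuffle_mask` are hypothetical names. The first mimics how `_mm_shuffle_ps(a, b, imm)` picks lanes (two selector bits per result lane, low two lanes from `a`, high two from `b`); the second packs selectors the same way the `_MM_SHUFFLE(z, y, x, w)` macro does.

```c
/* Scalar model of _mm_shuffle_ps(a, b, imm).
   Result lanes 0 and 1 come from a, lanes 2 and 3 come from b;
   each lane is chosen by a 2-bit field of imm, lowest field first.
   out must not alias a or b. */
void shuffle_ps_model(const float a[4], const float b[4],
                      unsigned imm, float out[4])
{
    out[0] = a[(imm >> 0) & 3u];
    out[1] = a[(imm >> 2) & 3u];
    out[2] = b[(imm >> 4) & 3u];
    out[3] = b[(imm >> 6) & 3u];
}

/* Same bit layout as _MM_SHUFFLE(z, y, x, w):
   w selects result lane 0, x lane 1, y lane 2, z lane 3. */
unsigned shuffle_mask(unsigned z, unsigned y, unsigned x, unsigned w)
{
    return (z << 6) | (y << 4) | (x << 2) | w;
}
```

With this, `shuffle_mask(2, 0, 2, 0)` gives 0x88 (even lanes of both inputs) and `shuffle_mask(3, 1, 3, 1)` gives 0xdd (odd lanes), which are the two masks used in the hadd replacement quoted earlier in the thread.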

ID: 746981
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 746984 - Posted: 2 May 2008, 15:28:40 UTC

Shuffle masks and booze eh?

I find that numbering bottle caps (Crown Lager for example) helps, then you can move them around, make little towers out of them, then wonder why you put numbers on the bottle caps ;D

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 746984
Dennis
Joined: 26 Jun 07
Posts: 153
Credit: 15,826,319
RAC: 0
United States
Message 746999 - Posted: 2 May 2008, 15:53:35 UTC

I would think AMD's Phenom quads and X64 duals would benefit from an SSE2 app build, since all I read on this is how badly the AMDs implement SSE3... and so they work better with SSE2. I would love to know if I am wrong here. To all of you involved in this new application, again, thank you for all your efforts.
ID: 746999
jason_gee
Volunteer developer
Volunteer tester
Joined: 24 Nov 06
Posts: 7489
Credit: 91,093,184
RAC: 0
Australia
Message 747002 - Posted: 2 May 2008, 16:00:29 UTC - in response to Message 746999.  

I would think AMD's Phenom quads and X64 duals would benefit from an SSE2 app build, since all I read on this is how badly the AMDs implement SSE3... and so they work better with SSE2. I would love to know if I am wrong here. To all of you involved in this new application, again, thank you for all your efforts.


Hi Dennis,
AFAIK the Phenom works fine with the SSE3 version. I'd be very surprised if an SSE2 version were faster on it, because that would require more instructions in flight to achieve the same task. I think earlier Lunatics builds were so even in performance for a different reason: the way RAM was being accessed.

Jason

"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.
ID: 747002
Laffo
Joined: 30 Dec 02
Posts: 13
Credit: 35,665,239
RAC: 0
United Kingdom
Message 747013 - Posted: 2 May 2008, 16:45:47 UTC

Hi Dennis,
The new AK V8 SSE3 app works like a charm on my Phenom. On the 54-credit WUs it has brought the crunch times down from 2 hrs 5 mins to 1 hr 15 mins; that's no mean feat. An SSE2 version may or may not be beneficial to the Phenom, but it would be to chips not supporting SSE3. If you have a Phenom, give it a try; you won't be disappointed.

Laffo.
ID: 747013
zoom3+1=4
Volunteer tester
Joined: 30 Nov 03
Posts: 65763
Credit: 55,293,173
RAC: 49
United States
Message 747056 - Posted: 2 May 2008, 18:05:53 UTC - in response to Message 746981.  

LOL, so my "Swiss watch made from wet noodles" analogy isn't too far off then ... ;D


LOL....I know my brain feels like "wet noodles" after sorting those MM_SHUFFLE MASKS.

I think I'll implement macros for all SSE3 functions and restart with the original code.... we know the original works! My brain is soggy, I don't know anything right now!


Well as long as You aren't a Wet Noodle and don't know It, And I don't think You are of course.
The T1 Trust, PRR T1 Class 4-4-4-4 #5550, 1 of America's First HST's
ID: 747056
JDWhale
Volunteer tester
Joined: 6 Apr 99
Posts: 921
Credit: 21,935,817
RAC: 3
United States
Message 747095 - Posted: 2 May 2008, 19:40:47 UTC
Last modified: 2 May 2008, 19:42:12 UTC

I'm running live with my first "cut" at SSE2 client on my P4M laptop Gilligan. It'll be a few hours before he reports, I'm predicting ~20% or better improvement on the mid AR WUs in his queue (currently 6.5 hours down to 5.0 hours).

The code changes will require minimal effort to install on current ported code. Now it's just playing the "waiting game" to see if Gilligan can get a variety of WUs to see how other ARs perform.

Regards,
JDWhale
ID: 747095