PowerPC Altivec Code for G4 and G5

Message boards : Number crunching : PowerPC Altivec Code for G4 and G5

SETI @ The Anderson House

Joined: 12 Jul 00
Posts: 8
Credit: 301,172
RAC: 0
United States
Message 3267 - Posted: 1 Jul 2004, 22:38:13 UTC
Last modified: 2 Jul 2004, 15:42:43 UTC

I wanted to post the following code that I created. It gives a decent speedup. I have more to follow, but it's not finished yet. This code works on the G4 and G5 with AltiVec.


//The time test results:
// 0.455862 for the original code
// 0.207627 for the altivec code
void v_GetPowerSpectrum(
    float* FreqData,
    float* PowerSpectrum,
    int NumDataPoints)
{
    register vector float t1, t2, s1, s2;
    register vector float zero = (vector float) vec_splat_u32(0);

    vector unsigned char mergeHigh = (vector unsigned char)
        ( 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27 );
    vector unsigned char mergeLow = (vector unsigned char)
        ( 4, 5, 6, 7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30, 31 );

    for (int i = 0, j = 0; i
ID: 3267

SETI @ The Anderson House
Joined: 12 Jul 00
Posts: 8
Credit: 301,172
RAC: 0
United States
Message 3269 - Posted: 1 Jul 2004, 22:39:22 UTC - in response to Message 3267.  
Last modified: 2 Jul 2004, 15:45:54 UTC

Hmm, it seems that bad things happen when posting this code, so I'll link to it!

http://www.djbradanderson.com/ppc_seti_fft8.txt


This code is the AltiVec-ized version of v_ChirpData, v_GetPowerSpectrum, and cftfsub.
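
For readers who cannot reach the link, the scalar computation that a routine like v_GetPowerSpectrum speeds up can be sketched as follows. This is a minimal portable C sketch, not the linked AltiVec code. It assumes FreqData holds interleaved (real, imaginary) float pairs, which is consistent with the mergeHigh and mergeLow permute masks in the first post (they select the even and odd 4-byte elements of two vectors). The function name get_power_spectrum_scalar is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar reference: power[i] = re[i]^2 + im[i]^2.
 * ASSUMPTION: freq_data holds num_points complex values stored as
 * interleaved (re, im) float pairs; the post does not state the layout. */
void get_power_spectrum_scalar(const float *freq_data,
                               float *power_spectrum,
                               int num_points)
{
    assert(freq_data != NULL && power_spectrum != NULL && num_points >= 0);
    for (int i = 0; i < num_points; i++) {
        float re = freq_data[2 * i];      /* even index: real part      */
        float im = freq_data[2 * i + 1];  /* odd index: imaginary part  */
        power_spectrum[i] = re * re + im * im;
    }
}
```

A vectorized version would be expected to produce the same values four at a time, so a scalar routine like this can also serve as the reference when checking optimized output.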

There is more code for cftmdl_optimized, but when the optimization flag is changed from -O0 to -O2, something in the function breaks, at least for now.

Here are the results of the speed test:
original power elapsed Time: 0.455862
optimized power elapsed Time: 0.207627
power version had 0 offending results

original chirp elapsed Time: 1.641663
optimized chirp elapsed Time: 1.460379
chirp version had 0 offending results
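
The "0 offending results" lines suggest that each optimized routine was checked against the output of the original. One way to do such a check is sketched below in portable C. The function name count_offending and the relative tolerance are assumptions for illustration, since the post does not say how the results were compared.

```c
#include <assert.h>
#include <stddef.h>

/* Count elements where the optimized output differs from the reference
 * by more than a relative tolerance.
 * ASSUMPTION: the tolerance scheme is hypothetical, not taken from the post. */
int count_offending(const float *reference, const float *optimized,
                    size_t n, float rel_tol)
{
    assert(rel_tol >= 0.0f);
    int offending = 0;
    for (size_t i = 0; i < n; i++) {
        float diff = reference[i] - optimized[i];
        if (diff < 0.0f) diff = -diff;

        /* Scale the tolerance by the magnitude of the reference value,
         * but never below 1 so values near zero are compared absolutely. */
        float scale = reference[i] < 0.0f ? -reference[i] : reference[i];
        if (scale < 1.0f) scale = 1.0f;

        /* Written as !(a <= b) so that a NaN result also counts as offending. */
        if (!(diff <= rel_tol * scale)) {
            offending++;
        }
    }
    return offending;
}
```

Calling count_offending(original_out, optimized_out, n, 1e-5f) after running both versions on the same input would return 0 when the two agree within the tolerance.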



// Primary FFT function.
// This function gives about a 3-5% speed gain.
// The results vary with the size of the FFT: the larger the FFT, the bigger the gain.
void cftfsub(int n, float *a, float *w) {
    void cft1st(int n, float *a, float *w);
    void cftmdl(int n, int l, float *a, float *w);
    int j, j1, j2, j3, l;
    register vector float vaj0, vaj1, vaj2, vaj3, vajl0, vt0, vt1, vt2, vt3, v1, v2;

    register vector float zero = (vector float) vec_splat_u32(0);

    l = 2;
    if (n >= 16) {
        cft1st(n, a, w);
        l = 16;
        while ((l
ID: 3269

SETI @ The Anderson House
Joined: 12 Jul 00
Posts: 8
Credit: 301,172
RAC: 0
United States
Message 3289 - Posted: 1 Jul 2004, 23:32:50 UTC - in response to Message 3269.  

The compiler needs the option "-faltivec".

I just posted the AltiVec inner FFT loop. After some research I found that the FFTW project is having problems with the vec_perm (permutation) function under the gcc 3.x compiler. That is quite unfortunate.

Let me show you the speed gains with this AltiVec inner loop:

testing fft size: 32768
original elapsed Time: 1.281796
optimized elapsed Time: 1.071806

testing fft size: 65536
original elapsed Time: 1.503782
optimized elapsed Time: 1.167144

testing fft size: 131072
original elapsed Time: 1.636699
optimized elapsed Time: 1.462052

testing fft size: 262144
original elapsed Time: 2.541583
optimized elapsed Time: 1.890876

I believe this last one is the default size of the largest FFT in SETI. This is a hefty speed gain!
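
To put a number on the gain, the timings above can be turned into a speedup factor and a percentage of run time saved. The helper names below are hypothetical and the sketch is plain arithmetic on the posted timings. For the 262144-point case it works out to roughly 1.34x, or about 25.6% less time.

```c
#include <assert.h>

/* Speedup factor: how many times faster the optimized run is. */
double speedup(double original_time, double optimized_time)
{
    assert(optimized_time > 0.0);
    return original_time / optimized_time;
}

/* Percentage of the original run time that the optimized run saves. */
double percent_time_saved(double original_time, double optimized_time)
{
    assert(original_time > 0.0);
    return 100.0 * (original_time - optimized_time) / original_time;
}
```

For example, speedup(2.541583, 1.890876) and percent_time_saved(2.541583, 1.890876) use the 262144-point timings from the list above.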
ID: 3289

SETI @ The Anderson House
Joined: 12 Jul 00
Posts: 8
Credit: 301,172
RAC: 0
United States
Message 3505 - Posted: 2 Jul 2004, 15:55:41 UTC

The code can be selected through the defined constant __POWERPC__:

#ifdef __POWERPC__

#include "ppcfft8.cpp"
#else

// all of the code this is replacing
#endif
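
A slightly more defensive variant of the guard above is sketched here. It assumes that GCC defines __ALTIVEC__ when AltiVec code generation is enabled, and it also accepts the lowercase __powerpc__ spelling used by some toolchains. Both checks, and the names USE_ALTIVEC_PATH and backend_name, are assumptions of this sketch and are not part of the posted code.

```c
#include <assert.h>

/* Choose the code path at compile time.
 * ASSUMPTION: __ALTIVEC__ is defined when AltiVec is enabled in the compiler;
 * testing it alongside the PowerPC macro is this sketch's choice. */
#if (defined(__POWERPC__) || defined(__powerpc__)) && defined(__ALTIVEC__)
#define USE_ALTIVEC_PATH 1
/* An AltiVec build would pull in the vectorized routines here,
 * for example with #include "ppcfft8.cpp" as in the post. */
#else
#define USE_ALTIVEC_PATH 0
/* All of the portable code that the AltiVec version replaces goes here. */
#endif

/* Report which path was compiled in; handy for logging or benchmarks. */
const char *backend_name(void)
{
    static const char *const names[] = { "scalar", "altivec" };
    assert(USE_ALTIVEC_PATH == 0 || USE_ALTIVEC_PATH == 1);
    return names[USE_ALTIVEC_PATH];
}
```

With a guard like this, a PowerPC build without AltiVec enabled falls back to the portable code instead of failing to compile.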
ID: 3505

SETI @ The Anderson House
Joined: 12 Jul 00
Posts: 8
Credit: 301,172
RAC: 0
United States
Message 23518 - Posted: 8 Sep 2004, 2:24:06 UTC - in response to Message 3505.  

I finally have it posted!!!

http://www.djbradanderson.com/global/seti.php


Thanks to Mikkyo!


This code is running about 25% faster than the normal G5-optimized code. At top speed, it is pushing 3500 MIPS in the average-sized FFT, which puts it in line with the FFTW 3 library. Lastly, this is pushing the temperature of my CPU up to nearly 150 degrees F, so running this version comes at the cost of a little more noise from your computer.

Bo-ya-ca-sha.
ID: 23518

Petit Soleil
Joined: 17 Feb 03
Posts: 1497
Credit: 70,934
RAC: 0
Canada
Message 23588 - Posted: 8 Sep 2004, 6:43:25 UTC
Last modified: 8 Sep 2004, 7:42:56 UTC

Would it work on a PowerPC 7441 (eMac 800) running OS 10.2.8?
I am currently running BOINC 4.05 + SETI 3.10 (from Mikkyo).
SETI 4.02 is not working on 10.2.8.

Thanks
Marc
ID: 23588

Petit Soleil
Joined: 17 Feb 03
Posts: 1497
Credit: 70,934
RAC: 0
Canada
Message 23680 - Posted: 8 Sep 2004, 13:14:52 UTC

Would it work on a PowerPC 7441 (eMac 800) running OS 10.2.8?
I am currently running BOINC 4.05 + SETI 3.10 (from Mikkyo).
SETI 4.02 is not working on 10.2.8.

Thanks
Marc

ID: 23680

Petit Soleil
Joined: 17 Feb 03
Posts: 1497
Credit: 70,934
RAC: 0
Canada
Message 24075 - Posted: 9 Sep 2004, 12:59:07 UTC

up
ID: 24075

goobus
Joined: 5 Mar 03
Posts: 1
Credit: 349,295
RAC: 0
United States
Message 25265 - Posted: 11 Sep 2004, 20:47:09 UTC

http://members.dslextreme.com/~readerforum/forum_team/boinc.html

for optimized compiles. They are FAST
ID: 25265

Petit Soleil
Joined: 17 Feb 03
Posts: 1497
Credit: 70,934
RAC: 0
Canada
Message 25309 - Posted: 11 Sep 2004, 22:40:13 UTC - in response to Message 25265.  

> http://members.dslextreme.com/~readerforum/forum_team/boinc.html
>
> for optimized compiles. They are FAST

The only problem for me is that I'm running 10.2.8.
ID: 25309

Shaktai
Volunteer tester
Joined: 16 Jun 99
Posts: 211
Credit: 259,752
RAC: 0
United States
Message 25370 - Posted: 12 Sep 2004, 2:57:18 UTC

Yeah, mikkyo tried to get a 10.2.x version working under BOINC 3.x, but it was erratic. It worked for some and not for others. Apparently BOINC Dev had similar problems and just hasn't had the time to pursue it further. Maybe someday, someone will figure out the glitch.



The best Macintosh team ever.
ID: 25370

Petit Soleil
Joined: 17 Feb 03
Posts: 1497
Credit: 70,934
RAC: 0
Canada
Message 25383 - Posted: 12 Sep 2004, 3:18:02 UTC - in response to Message 25370.  
Last modified: 12 Sep 2004, 3:19:20 UTC

> Yeah, mikkyo tried to get a 10.2.x version working under BOINC 3.x, but it was
> erratic. Worked for some and not for others. Apparently BOINC Dev had
> similar problems and just hasn't had the time to pursue it further. -- Maybe
> someday, someone will figure out the glitch.

Hi Shaktai,

Mikkyo's BOINC 3.x + SETI 3.10 is working very well for me. I am still running SETI 3.10 under BOINC 4.05 without any problem. Actually, as long as I can run the project, I don't mind too much about speed. I would rather take 10 hours per WU than get nothing. I am also concerned about a low-end eMac running a "supercharged" version that would make it "work" too hard. I've had a few problems with eMacs and I have the feeling they are fragile boxes.

I wanted to give it a try, though, but it seems that the optimized version available here works for the G4 7400 and G4 7450. Would it work on a PowerPC 7441 (eMac 800) running OS 10.2.8?

I didn't know there were so many versions of the G4. I thought they were all the same.

Friendly
Marc

-.-. --.- -.. -..- . - --... ...-- .-.-. -.-
ID: 25383

Shaktai
Volunteer tester
Joined: 16 Jun 99
Posts: 211
Credit: 259,752
RAC: 0
United States
Message 25426 - Posted: 12 Sep 2004, 5:33:15 UTC - in response to Message 25383.  

>
> I wanted to give it a try, though, but it seems that the optimized version
> available here works for the G4 7400 and G4 7450. Would it work on a PowerPC 7441
> (eMac 800) running OS 10.2.8?
>
> Friendly
> Marc

Hey Marc,

I don't know about the eMacs being fragile, but it could be they just need a little better cooling. The 7450 version should work on the 7441 fine. Actually both will work okay, but you could test them to see which delivers the best performance. It should be the 7450 version.



The best Macintosh team ever.
ID: 25426

Petit Soleil
Joined: 17 Feb 03
Posts: 1497
Credit: 70,934
RAC: 0
Canada
Message 25517 - Posted: 12 Sep 2004, 14:38:18 UTC - in response to Message 25426.  

> Hey Marc,
>
> I don't know about the eMacs being fragile, but it could be they just need a
> little better cooling. The 7450 version should work on the 7441 fine.
> Actually both will work okay, but you could test them to see which delivers
> the best performance. It should be the 7450 version.

Thanks Shaktai,

I will give it a try then. I'll post the results here.

Friendly
Marc
ID: 25517

Petit Soleil
Joined: 17 Feb 03
Posts: 1497
Credit: 70,934
RAC: 0
Canada
Message 25648 - Posted: 12 Sep 2004, 20:34:44 UTC
Last modified: 12 Sep 2004, 20:59:36 UTC

@Shaktai

I have tried both versions and it's not working: a "processed signal 5" error or something like that again. Remember that I am running 10.2.8, which is probably why. I will stick with SETI 3.10 for now. 10 hours per WU is better than nothing.

Friendly
Marc

-.-. --.- -.. -..- . - --... ...-- .-.-. -.-
ID: 25648
