PowerPC Altivec Code for G4 and G5
Author | Message |
---|---|
SETI @ The Anderson House Send message Joined: 12 Jul 00 Posts: 8 Credit: 301,172 RAC: 0 |
I wanted to post the following code that I created. It gives a decent speedup. I have more to follow, but it's not finished yet. This code works on the G4 and G5 with AltiVec. //The time test results: // 0.455862 for the original code // 0.207627 for the altivec code void v_GetPowerSpectrum( float* FreqData, float* PowerSpectrum, int NumDataPoints) { register vector float t1, t2, s1, s2; register vector float zero = (vector float) vec_splat_u32(0); vector unsigned char mergeHigh = (vector unsigned char) ( 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27 ); vector unsigned char mergeLow = (vector unsigned char) ( 4, 5, 6, 7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30, 31 ); for (int i = 0, j = 0; i |
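For readers who cannot reach the full source: judging only from the function signature and the two permute masks in the post above, the routine appears to separate interleaved real and imaginary floats and sum their squares. The portable C sketch below is an assumption about that intent, not the author's code. `emulate_perm`, `scalar_power_spectrum`, and `permuted_power_spectrum` are hypothetical names, and `emulate_perm` only imitates the byte selection that AltiVec's `vec_perm` performs, so the example runs on any machine with 4-byte floats.

```c
#include <assert.h>
#include <string.h>

/* Byte-selection masks copied from the post. Over two concatenated 16-byte
   vectors (8 floats) they pick floats 0,2,4,6 and floats 1,3,5,7. */
static const unsigned char MERGE_HIGH[16] =
    { 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27 };
static const unsigned char MERGE_LOW[16] =
    { 4, 5, 6, 7, 12, 13, 14, 15, 20, 21, 22, 23, 28, 29, 30, 31 };

/* Hypothetical stand-in for vec_perm: output byte k is byte mask[k]
   of the 32-byte concatenation of a and b. */
static void emulate_perm(const float a[4], const float b[4],
                         const unsigned char mask[16], float out[4])
{
    unsigned char src[32], dst[16];
    assert(sizeof(float) == 4);
    memcpy(src, a, 16);
    memcpy(src + 16, b, 16);
    for (int k = 0; k < 16; k++)
        dst[k] = src[mask[k] & 31];
    memcpy(out, dst, 16);
}

/* Assumed scalar reference: freq holds n interleaved (re, im) pairs. */
void scalar_power_spectrum(const float *freq, float *power, int n)
{
    for (int i = 0; i < n; i++) {
        float re = freq[2 * i];
        float im = freq[2 * i + 1];
        power[i] = re * re + im * im;
    }
}

/* Same result, four complex points per step, using the masks to
   de-interleave. n must be a multiple of 4 in this sketch. */
void permuted_power_spectrum(const float *freq, float *power, int n)
{
    assert(n % 4 == 0);
    for (int i = 0; i < n; i += 4) {
        float re[4], im[4];
        emulate_perm(freq + 2 * i, freq + 2 * i + 4, MERGE_HIGH, re);
        emulate_perm(freq + 2 * i, freq + 2 * i + 4, MERGE_LOW, im);
        for (int k = 0; k < 4; k++)
            power[i + k] = re[k] * re[k] + im[k] * im[k];
    }
}
```

On real AltiVec hardware the inner four-element loop would presumably be a pair of vector multiply-adds; the emulation here only shows why those two masks separate the real and imaginary parts.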
SETI @ The Anderson House Send message Joined: 12 Jul 00 Posts: 8 Credit: 301,172 RAC: 0 |
Hmm, it seems that bad things happen when posting this code, so I'll link to it instead: http://www.djbradanderson.com/ppc_seti_fft8.txt This code is the AltiVec version of v_ChirpData, v_GetPowerSpectrum, and cftfsub. There is more code for cftmdl_optimized, but when the optimization flag is changed from -O0 to -O2, something in the function breaks, for now. Here are the results of the speed test: original power elapsed Time: 0.455862; optimized power elapsed Time: 0.207627; power version had 0 offending results. original chirp elapsed Time: 1.641663; optimized chirp elapsed Time: 1.460379; chirp version had 0 offending results. //primary fft function //This function is about a 3-5% speed gain. //The results vary on the size of the FFT, obviously, the larger the faster void cftfsub(int n, float *a, float *w) { void cft1st(int n, float *a, float *w); void cftmdl(int n, int l, float *a, float *w); int j, j1, j2, j3, l; register vector float vaj0, vaj1, vaj2, vaj3, vajl0, vt0, vt1, vt2, vt3, v1, v2; register vector float zero = (vector float) vec_splat_u32(0); l = 2; if (n >= 16) { cft1st(n, a, w); l = 16; while ((l |
SETI @ The Anderson House Send message Joined: 12 Jul 00 Posts: 8 Credit: 301,172 RAC: 0 |
The compiler needs the option "-faltivec". I just posted the AltiVec inner FFT loop. After some research I found that the FFTW project is having problems with the vec_perm (permutation) function under gcc 3.x, which is quite unfortunate. Here are the speed gains with this AltiVec inner loop: testing fft size 32768: original elapsed Time: 1.281796, optimized elapsed Time: 1.071806; testing fft size 65536: original elapsed Time: 1.503782, optimized elapsed Time: 1.167144; testing fft size 131072: original elapsed Time: 1.636699, optimized elapsed Time: 1.462052; testing fft size 262144: original elapsed Time: 2.541583, optimized elapsed Time: 1.890876. This last one is, I believe, the default size of the largest FFT in SETI. This is a hefty speed gain!! |
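To put numbers on "hefty", the timings quoted above can be turned into speedup ratios. The following is a small sketch, not part of the posted code: the struct and function names are hypothetical, and the post does not state the units of the timings, so only ratios are computed.

```c
#include <assert.h>
#include <stdio.h>

/* Timings copied from the post above; units are not stated there. */
struct fft_timing {
    int fft_size;
    double original;
    double optimized;
};

static const struct fft_timing POSTED_TIMINGS[] = {
    {  32768, 1.281796, 1.071806 },
    {  65536, 1.503782, 1.167144 },
    { 131072, 1.636699, 1.462052 },
    { 262144, 2.541583, 1.890876 },
};

/* Ratio of old to new elapsed time; above 1.0 means the new code is faster. */
double speedup(double original, double optimized)
{
    assert(optimized > 0.0);
    return original / optimized;
}

/* Share of the original elapsed time that was removed, in percent. */
double percent_time_saved(double original, double optimized)
{
    assert(original > 0.0);
    return 100.0 * (original - optimized) / original;
}

/* Prints one line per posted FFT size. */
void print_posted_speedups(void)
{
    int count = (int)(sizeof POSTED_TIMINGS / sizeof POSTED_TIMINGS[0]);
    for (int i = 0; i < count; i++) {
        const struct fft_timing *t = &POSTED_TIMINGS[i];
        printf("fft size %6d: %.2fx speedup, %.1f%% less time\n",
               t->fft_size,
               speedup(t->original, t->optimized),
               percent_time_saved(t->original, t->optimized));
    }
}
```

For the 262144-point case this works out to a ratio of roughly 1.34, or about a quarter less elapsed time.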
SETI @ The Anderson House Send message Joined: 12 Jul 00 Posts: 8 Credit: 301,172 RAC: 0 |
The code can be selected at compile time through the preprocessor constant __POWERPC__: #ifdef __POWERPC__ #include "ppcfft8.cpp" #else // all of the code this is replacing #endif |
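A slightly more defensive variant of that switch, as a sketch: it also checks `__ALTIVEC__`, which GCC defines when AltiVec code generation is enabled, so a PowerPC build without the vector unit turned on falls back to the portable path. `USE_ALTIVEC_FFT` and `fft_backend_name` are hypothetical names and are not part of the SETI@home source.

```c
#include <assert.h>

/* Take the AltiVec path only when targeting PowerPC *and* the compiler
   has AltiVec enabled; otherwise keep the original portable code. */
#if defined(__POWERPC__) && defined(__ALTIVEC__)
#define USE_ALTIVEC_FFT 1
#else
#define USE_ALTIVEC_FFT 0
#endif

/* Reports which path was compiled in, e.g. for a startup log line. */
const char *fft_backend_name(void)
{
    const char *name = USE_ALTIVEC_FFT ? "altivec" : "portable";
    assert(name[0] != '\0');
    return name;
}
```

In a real build the `#if USE_ALTIVEC_FFT` branch would wrap the `#include "ppcfft8.cpp"` line from the post, with the original routines in the `#else` branch.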
SETI @ The Anderson House Send message Joined: 12 Jul 00 Posts: 8 Credit: 301,172 RAC: 0 |
I finally have it posted!!! http://www.djbradanderson.com/global/seti.php Thanks to Mikkyo! This code runs about 25% faster than the normal G5-optimized code. At top speed, it pushes 3500 MIPS in the average-sized FFT, which puts it in line with the FFTW 3 library. Lastly, it pushes the temperature of my CPU up to nearly 150 degrees F, so the downside of running this version is a little more noise coming from your computer. Bo-ya-ca-sha. |
Petit Soleil Send message Joined: 17 Feb 03 Posts: 1497 Credit: 70,934 RAC: 0 |
Would it work on a PowerPC 7441 (eMac 800) running OS 10.2.8? I am currently running BOINC 4.05 + SETI 3.10 (from Mikkyo). SETI 4.02 is not working on 10.2.8. Thanks, Marc |
Petit Soleil Send message Joined: 17 Feb 03 Posts: 1497 Credit: 70,934 RAC: 0 |
up |
goobus Send message Joined: 5 Mar 03 Posts: 1 Credit: 349,295 RAC: 0 |
http://members.dslextreme.com/~readerforum/forum_team/boinc.html for optimized compiles. They are FAST |
Petit Soleil Send message Joined: 17 Feb 03 Posts: 1497 Credit: 70,934 RAC: 0 |
> http://members.dslextreme.com/~readerforum/forum_team/boinc.html > > for optimized compiles. They are FAST The only problem for me is I'm running 10.2.8 |
Shaktai Send message Joined: 16 Jun 99 Posts: 211 Credit: 259,752 RAC: 0 |
|
Petit Soleil Send message Joined: 17 Feb 03 Posts: 1497 Credit: 70,934 RAC: 0 |
> Yeah, mikkyo tried to get a 10.2.x version working under BOINC 3.x, but it was > erratic. Worked for some and not for others. Apparently BOINC Dev had > similar problems and just hasn't had the time pursue it further. -- Maybe > someday, someone will figure out the glitch. Hi Shaktai, Mikkyo's BOINC 3.x + SETI 3.10 is working very well for me. I am still running SETI 3.10 under BOINC 4.05 without any problems. Actually, as long as I can run the project, I don't mind too much about speed. I'd rather take 10 hours per WU than get nothing. I am also concerned about the low-end eMac running a "supercharged" version that would make it "work" too hard. I've had a few problems with eMacs and I have the feeling they are fragile boxes. I wanted to give it a try, though, but it seems that the optimized version available here works for the G4 7400 and G4 7450. Would it work on a PowerPC 7441 (eMac 800) running OS 10.2.8? I didn't know there were so many versions of the G4; I thought they were all the same. Friendly, Marc -.-. --.- -.. -..- . - --... ...-- .-.-. -.- |
Shaktai Send message Joined: 16 Jun 99 Posts: 211 Credit: 259,752 RAC: 0 |
> > I wanted to give it a try though. but it seem that the optimized version > available here works for G4 7400 and G4 7450. Would it work on a PowerPC 7441 > (eMAC 800) Running OS 10.2.8 ? > > Friendly > Marc Hey Marc, I don't know about the eMacs being fragile, but it could be they just need a little better cooling. The 7450 version should work on the 7441 fine. Actually both will work okay, but you could test them to see which delivers the best performance. It should be the 7450 version. The best Macintosh team ever. |
Petit Soleil Send message Joined: 17 Feb 03 Posts: 1497 Credit: 70,934 RAC: 0 |
> Hey Marc, > > I don't know about the eMacs being fragile, but it could be they just need a > little better cooling. The 7450 version should work on the 7441 fine. > Actually both will work okay, but you could test them to see which delivers > the best performance. It should be the 7450 version. Thanks Shaktai, I will give it a try then. I'll post the results here. Friendly Marc |
Petit Soleil Send message Joined: 17 Feb 03 Posts: 1497 Credit: 70,934 RAC: 0 |
@Shaktai I have tried both versions and neither is working: I get the "processed signal 5" error or something like that again. Remember that I am running 10.2.8, which is probably why. I will stick with SETI 3.10 for now; 10 hours per WU is better than nothing. Friendly, Marc -.-. --.- -.. -..- . - --... ...-- .-.-. -.- |