Hey Who, lets discuss code

Message boards : Number crunching : Hey Who, lets discuss code
Profile Benher
Volunteer developer
Volunteer tester

Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 424423 - Posted: 21 Sep 2006, 19:28:01 UTC

Regarding your idea of splitting the chirped complex data into SIMD-sized chunks: apparently the Intel IPP library is already doing something like that.

This is from a disassembly captured while profiling a WU run. The Intel library calls a function, 'cFft_BlkSplit', which does the following:
w7_ipps_cFft_BlkSplit_32fc+200:
    movaps  xmm0,[edi]
    movaps  xmm1,[edi+10h]
    movaps  xmm2,[edi+20h]
    movaps  xmm3,[edi+30h]
    add     edi,40h
    movaps  xmm4,xmm0
    unpckhps xmm4,xmm1
    movaps  xmm5,xmm2
    unpcklps xmm2,xmm3
    unpckhps xmm5,xmm3
    movaps  [edx+esi],xmm0
    movaps  [edx+esi+10h],xmm4
    movaps  [edx+esi+20h],xmm2
    movaps  [edx+esi+30h],xmm5
    add     edx,40h
    sub     eax,08h
    jnle    $-3dh (0x4adf78)
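
For anyone who doesn't read SSE assembly, here is a rough scalar model in Python of one pass through that loop. It is only a sketch: the function names are made up for illustration, and it assumes each pair of 16-byte registers is interleaved with unpcklps/unpckhps. The listing as posted stores xmm0 without a preceding unpcklps, so the model assumes an `unpcklps xmm0,xmm1` belongs there, by symmetry with the xmm2/xmm3 pair. That is a guess, not something the disassembly shows.

```python
def unpcklps(a, b):
    """Model of UNPCKLPS: interleave the low two floats of a and b."""
    return [a[0], b[0], a[1], b[1]]


def unpckhps(a, b):
    """Model of UNPCKHPS: interleave the high two floats of a and b."""
    return [a[2], b[2], a[3], b[3]]


def blk_split_step(block):
    """One loop iteration: 16 floats (64 bytes) in, 16 floats out.

    Assumption: xmm0/xmm1 are treated the same way as xmm2/xmm3.
    """
    x0, x1, x2, x3 = (block[i:i + 4] for i in range(0, 16, 4))
    return (unpcklps(x0, x1) + unpckhps(x0, x1)
            + unpcklps(x2, x3) + unpckhps(x2, x3))


# Eight complex samples laid out as re0, im0, re1, im1, ...
samples = [f"{part}{n}" for n in range(8) for part in ("re", "im")]
print(blk_split_step(samples))
```

Under that assumption, each group of four outputs holds two real parts followed by the two matching imaginary parts, and the `sub eax,08h` would fit a counter of eight complex samples per iteration.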
Profile mikey
Volunteer tester
Joined: 17 Dec 99
Posts: 4215
Credit: 3,474,603
RAC: 0
United States
Message 424758 - Posted: 22 Sep 2006, 11:37:52 UTC - in response to Message 424423.  
Last modified: 22 Sep 2006, 11:38:55 UTC

Regarding your idea of splitting the chirped complex data into SIMD-sized chunks: apparently the Intel IPP library is already doing something like that.

This is from a disassembly captured while profiling a WU run. The Intel library calls a function, 'cFft_BlkSplit', which does the following:
w7_ipps_cFft_BlkSplit_32fc+200:    

Your post shortened to save space; You might want to find Chicken and discuss the code with him/her. They are doing the current 'optimized' versions of the software and should be able to discuss this with you.

Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 424791 - Posted: 22 Sep 2006, 12:48:44 UTC

Yeah well, we are ;) Ben has been part of that work for a while, mikey.

Let him be :o)
Regards,
Simon.
Donate to SETI@Home via PayPal!

Optimized SETI@Home apps + Information
Profile Clyde C. Phillips, III

Joined: 2 Aug 00
Posts: 1851
Credit: 5,955,047
RAC: 0
United States
Message 424934 - Posted: 22 Sep 2006, 19:00:52 UTC - in response to Message 424758.  


Your post shortened to save space; You might want to find Chicken and discuss the code with him/her. They are doing the current 'optimized' versions of the software and should be able to discuss this with you.


I don't think Chicken is a girl. Now if his name were Simona or Simone that would probably be a different story. I think he is a young (and very important to this project) man. Maybe about the age of one of my nephews, if I remember correctly.

Profile KWSN - Chicken of Angnor
Volunteer developer
Volunteer tester
Joined: 9 Jul 99
Posts: 1199
Credit: 6,615,780
RAC: 0
Austria
Message 425010 - Posted: 22 Sep 2006, 21:53:21 UTC

31, to be exact, and male :o)

Regards,
Simon.
Profile Benher
Volunteer developer
Volunteer tester

Joined: 25 Jul 99
Posts: 517
Credit: 465,152
RAC: 0
United States
Message 425616 - Posted: 24 Sep 2006, 7:30:15 UTC

For me, the di ranges from 3900 to 5500; I guess it depends on the workload.

It is about 20 KB.

There is actually a way to "group" all the passes together into one pass. I'll be giving away the code in November.


I wrote a few lines of sampling code in find_pulse, let it crunch WU #2 for about 2 minutes, then captured the output...

The number inside the [brackets] is the number of times find_pulse was called with a given length. The number outside the brackets is the length.

So as you can see, find_pulse was called 15 million times with a length of 17, 4 million times with a length of 33, and so on. This was only a 2-minute run, so scale these results up for a run of a few hours.

Short lengths are FAR more common.
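
The sampling idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the code that produced the numbers here: the `find_pulse` below is a hypothetical stand-in that does no pulse search at all, and `call_lengths` and `report` are made-up names.

```python
from collections import Counter

# Hypothetical global tally: length -> number of find_pulse calls.
call_lengths = Counter()


def find_pulse(data, length):
    """Stand-in for the real find_pulse; only the sampling line matters."""
    call_lengths[length] += 1
    # ... the actual pulse search would go here ...


def report(tally):
    """Format the tally as LENGTH[count], longest length first."""
    return [f"{length}[{count:,}]"
            for length, count in sorted(tally.items(), reverse=True)]


# A handful of fake calls, just to exercise the counter.
for length in (33, 17, 17, 17, 33, 66):
    find_pulse(None, length)

print(" -- ".join(report(call_lengths)))
```

A single counter increment per call is cheap enough that it should not distort timing much, which matters when the function is called millions of times.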

The new release of chicken should be around 25-30% faster soon.

Typical calling length values taken from a WU run, shown as LENGTH [times called]:
  16913 [       105]
   8456 [       225]
   4228 [       465]
   2114 [       945]
   1057 [     5,715]
    529 [    19,125]
    264 [    68,985]
    132 [   260,865]
     66 [ 1,043,970]
     33 [ 4,118,610]
     17 [15,480,990]
Here are the di values that those lengths produce in sum2 and sum3:
--- Length = 132
pulse tbl3: (di: 44) [ 132]= 0, 44, 88
pulse tbl2: (di: 22) [ 198]= 132, 154,
pulse tbl2: (di: 11) [ 231]= 198, 209,
pulse tbl2: (di: 5) [ 247]= 231, 237,

--- Length = 66
pulse tbl3: (di: 22) [ 66]= 0, 22, 44
pulse tbl2: (di: 11) [ 99]= 66, 77,
pulse tbl2: (di: 5) [ 115]= 99, 105,

--- Length = 33
pulse tbl3: (di: 11) [ 33]= 0, 11, 22
pulse tbl2: (di: 5) [ 49]= 33, 39,

--- Length = 17
pulse tbl3: (di: 5) [ 17]= 0, 6, 11
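
One way to read a row of this table, sketched in Python under a loud assumption: that sum3 adds three di-long segments starting at the listed offsets, and sum2 does the same with two. The table itself doesn't spell that out, and `fold` is a made-up helper, not a function from the SETI@home source. The bracketed number in each row is not modelled here.

```python
def fold(data, offsets, di):
    """Add the di-long segments of data that start at each offset.

    Assumed reading of the table: three offsets for a tbl3 row,
    two offsets for a tbl2 row.
    """
    return [sum(data[off + i] for off in offsets) for i in range(di)]


# Row "pulse tbl3: (di: 5) [ 17]= 0, 6, 11" from the length-17 case.
data = list(range(17))               # dummy samples 0..16
folded = fold(data, (0, 6, 11), 5)   # reads indices 0-4, 6-10 and 11-15
print(folded)
```

With a di of only 5, there are just five sums per call, so a loop like this barely fills two 4-float SIMD registers. That would fit the point about short lengths dominating the call counts.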
Profile ML1
Volunteer moderator
Volunteer tester

Joined: 25 Nov 01
Posts: 21731
Credit: 7,508,002
RAC: 20
United Kingdom
Message 425666 - Posted: 24 Sep 2006, 12:42:27 UTC

OK, I'm jumping in 99% blind here 'cos I ain't looked at the code!...

Are you rewriting the chirp routines to make better use of SIMD for small chunks?

Could that be extended so that you also make better use of L1 cache and then L2 cache for more of the range??


The 30% speedup sounds very interesting!... :-)

Happy hackings,

Regards,
Martin
See new freedom: Mageia Linux
Take a look for yourself: Linux Format
The Future is what We all make IT (GPLv3)
