I guess the burn-in is over?
Author | Message |
---|---|
AlphaLaser Joined: 6 Jul 03 Posts: 262 Credit: 4,430,487 RAC: 0 |
"...since, for example, the L3 did not exist on most x86 machines until recently." Interesting that you note the K6-III. L3 was also used on some high-end Xeons, and very large amounts are used on Itaniums. With AMD's new Phenoms and the Core i5/i7, L3 is becoming more of a "standard" feature and not just exclusive to enthusiast parts like the P4EE or server chips. |
Raistmer Joined: 16 Jun 01 Posts: 6325 Credit: 106,370,077 RAC: 121 |
For correctness: the current AKv8 build uses software prefetch. For example, v_vpChirpData() contains: " // prefetch 1 loop iteration ahead _mm_prefetch((char *) (d+16), _MM_HINT_NTA); ". The current AP build doesn't use software prefetch. I tried some block prefetch in previous builds, but that method seems to be AMD-specific and gave a speedup mostly on older AMD chips like the Athlon XP. Current CPUs have both hardware and software prefetch (pre-loading data from memory into cache). |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
Here is a link that describes a bit more. I'm inferring from this and from Intel's instruction set description that you can use prefetch to notify the CPU that you will want to use a line of cache, but you get no guarantee it will be cached when you use it. I noticed that a different argument is used depending on whether your data will be integer or floating point. It seems that the FPUs draw their data directly from L2, which surprised me. But I know less than I think I know, I'm sure. |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
PhonAcq wrote: "Here is a link that describes a bit more. I'm inferring from this and from Intel's instruction set description that you can use prefetch to notify the cpu that you will want to use a line of cache, but you get no guarantee it will be cached when you use it. Noticed that a different argument is used depending on whether your data will be integer or floating point. It seems that the FPU's draw their data directly from L2, which surprised me. But I know less than I think I know, I'm sure." The link points into the Intel Compiler documentation, which says this is a subroutine -- meaning more than one instruction. It'd be interesting to see what is in the actual subroutine. |
AlphaLaser Joined: 6 Jul 03 Posts: 262 Credit: 4,430,487 RAC: 0 |
Here are the docs on the available prefetch instructions specified by SSE: prefetcht0, prefetcht1, prefetcht2, and prefetchnta. |
PhonAcq Joined: 14 Apr 01 Posts: 1656 Credit: 30,658,217 RAC: 1 |
It appears that Ned is rusty on intrinsics. Does anybody have any assembly code to demonstrate the prefetch instruction? |
1mp0£173 Joined: 3 Apr 99 Posts: 8423 Credit: 356,897 RAC: 0 |
PhonAcq wrote: "It appears that Ned is rusty on intrinsics. Does anybody have any assembly code to demonstrate the prefetch instruction?" My applications rarely have patterns that would benefit from triggering a prefetch. My code also needs to run on machines that don't have SSE (my applications do not benefit), so it's not anything I would have used. There are some really good examples in the Intel documentation, including examples where the prefetch would actually hurt. Most are in C, which is pretty close to assembly. Then there are cache-oblivious algorithms, which are incredibly interesting. It wouldn't surprise me to find out that IPP uses the prefetch instructions. FFTW probably doesn't. |
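
For anyone who wants something concrete to compile, here is a minimal C sketch of the "prefetch one loop iteration ahead" pattern Raistmer quoted. It is not the AKv8 code: the function name `sum_with_prefetch`, the `PREFETCH_NTA` macro, and the assumption of 64-byte cache lines (16 floats, matching the `d+16` in the quote if `d` is a float pointer) are all made up for illustration. The intrinsic `_mm_prefetch` and the hint `_MM_HINT_NTA` come from `<xmmintrin.h>`. The hints `_MM_HINT_T0`, `_MM_HINT_T1`, `_MM_HINT_T2` and `_MM_HINT_NTA` correspond to the prefetcht0, prefetcht1, prefetcht2 and prefetchnta instructions listed above.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
/* Hypothetical helper macro: request a non-temporal prefetch of the line at p. */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
/* No SSE available: the hint becomes a no-op, the result is unchanged. */
#define PREFETCH_NTA(p) ((void)(p))
#endif

/* Assumption: 64-byte cache lines, i.e. 16 four-byte floats per line. */
#define FLOATS_PER_LINE 16

/* Sum n floats, hinting the next cache line while working on the current one. */
float sum_with_prefetch(const float *d, size_t n)
{
    assert(d != NULL || n == 0);
    float total = 0.0f;

    for (size_t i = 0; i < n; i += FLOATS_PER_LINE) {
        /* Only prefetch addresses that are still inside the array. */
        if (i + FLOATS_PER_LINE < n)
            PREFETCH_NTA(d + i + FLOATS_PER_LINE);

        /* Process the current block (the last block may be shorter). */
        size_t end = (i + FLOATS_PER_LINE < n) ? i + FLOATS_PER_LINE : n;
        for (size_t j = i; j < end; j++)
            total += d[j];
    }
    return total;
}
```

The prefetch is only a hint, so the function returns the same sum whether or not the line actually arrives in cache early. Whether it runs any faster depends on the CPU and its hardware prefetcher, and a simple sequential walk like this is exactly the kind of pattern a hardware prefetcher tends to catch on its own, so measure before keeping it.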