I guess the burn-in is over?

AlphaLaser
Volunteer tester

Joined: 6 Jul 03
Posts: 262
Credit: 4,430,487
RAC: 0
United States
Message 902759 - Posted: 2 Jun 2009, 3:27:57 UTC - in response to Message 902719.  

since for example the L3 did not exist on most x86 machines until recently.


I believe L3 cache first appeared on the x86 architecture with the K6-III: because that chip had its L2 cache built into the processor, the L2 cache on the motherboard became the L3 cache. The first sighting of L3 cache on Intel chips, as far as I can remember, was the Pentium 4 Extreme Edition on the Socket 478 platform. These chips had 16K L1, 512K L2 and 2MB L3.

Back then, no additional coding needed to be done to existing applications to take advantage of the third level cache.


Interesting that you note the K6-III. L3 was also used on some high-end Xeons, and very large amounts are used on Itaniums. With AMD's new Phenoms and the Core i5/i7, L3 is becoming more of a "standard" feature and not just exclusive to enthusiast parts like the P4EE or server chips.
ID: 902759
Raistmer
Volunteer developer
Volunteer tester
Joined: 16 Jun 01
Posts: 6325
Credit: 106,370,077
RAC: 121
Russia
Message 903036 - Posted: 3 Jun 2009, 0:58:17 UTC

For correctness:

The current AKv8 build uses software prefetch.
For example:
"
// prefetch 1 loop iteration ahead
_mm_prefetch((char *) (d+16), _MM_HINT_NTA);
"
in v_vpChirpData().

The current AP build doesn't use software prefetch.
I tried some block prefetch in previous builds, but that method seems to be AMD-specific and gave a speedup mostly on older AMD chips like the Athlon XP.

Current CPUs support both hardware and software prefetch (preloading data from memory into cache).
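
As a rough illustration of the "prefetch 1 loop iteration ahead" pattern quoted above (this is not the actual v_vpChirpData() code), here is a minimal C sketch. The function name sum_with_prefetch, the PREFETCH_NTA macro and the block size of 16 floats (64 bytes, assumed to match one cache line) are made up for the example; only _mm_prefetch and _MM_HINT_NTA come from the snippet.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA */
#define PREFETCH_NTA(p) _mm_prefetch((const char *)(p), _MM_HINT_NTA)
#else
#define PREFETCH_NTA(p) ((void)(p))  /* no SSE available: the hint becomes a no-op */
#endif

#define BLOCK 16  /* floats per iteration; 64 bytes, assumed to be one cache line */

/* Hypothetical kernel: sums n floats block by block and, while working on
 * one block, hints the CPU that the next block will be needed soon. */
float sum_with_prefetch(const float *d, size_t n)
{
    float total = 0.0f;
    size_t i = 0;

    assert(d != NULL || n == 0);

    for (; i + BLOCK <= n; i += BLOCK) {
        /* Only hint addresses that are still inside the array. */
        if (i + BLOCK < n)
            PREFETCH_NTA(d + i + BLOCK);

        for (size_t j = 0; j < BLOCK; ++j)
            total += d[i + j];
    }

    /* Leftover elements that do not fill a whole block. */
    for (; i < n; ++i)
        total += d[i];

    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it; whether it helps at all depends on the CPU and the access pattern, and would have to be measured.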
ID: 903036
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 903348 - Posted: 3 Jun 2009, 22:41:36 UTC

Here is a link that describes a bit more. I'm inferring from this and from Intel's instruction set description that you can use prefetch to notify the CPU that you will want to use a line of cache, but you get no guarantee it will be cached when you use it. I noticed that a different argument is used depending on whether your data will be integer or floating point. It seems that the FPUs draw their data directly from L2, which surprised me. But I know less than I think I know, I'm sure.
ID: 903348
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 903353 - Posted: 3 Jun 2009, 22:48:39 UTC - in response to Message 903348.  

Here is a link that describes a bit more. I'm inferring from this and from Intel's instruction set description that you can use prefetch to notify the CPU that you will want to use a line of cache, but you get no guarantee it will be cached when you use it. I noticed that a different argument is used depending on whether your data will be integer or floating point. It seems that the FPUs draw their data directly from L2, which surprised me. But I know less than I think I know, I'm sure.

The link points into the Intel Compiler documentation, which says this is a subroutine -- meaning more than one instruction.

It'd be interesting to see what is in the actual subroutine.
ID: 903353
AlphaLaser
Volunteer tester

Joined: 6 Jul 03
Posts: 262
Credit: 4,430,487
RAC: 0
United States
Message 903472 - Posted: 4 Jun 2009, 3:41:05 UTC

Here are the docs on the prefetch instructions specified by SSE:

- prefetcht0
- prefetcht1
- prefetcht2
- prefetchnta
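
For what it's worth, each of these four instructions corresponds to one hint constant of the _mm_prefetch intrinsic from xmmintrin.h. A minimal C sketch of that mapping follows; the helper name issue_prefetch, the HINT macro and the 0 to 3 level numbering are invented for illustration. The hint has to be a compile-time constant, which is why the sketch uses a switch instead of passing the level straight through.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants */
#define HINT(p, h) _mm_prefetch((const char *)(p), (h))
#else
#define HINT(p, h) ((void)(p))  /* no SSE available: the hint becomes a no-op */
#endif

/* Hypothetical helper: issues one prefetch hint for the cache line holding p.
 * Levels 0..2 select the temporal hints, level 3 the non-temporal one.
 * Returns 1 if a hint was issued, 0 for an unknown level. */
int issue_prefetch(const void *p, int level)
{
    assert(p != NULL);

    switch (level) {
    case 0: HINT(p, _MM_HINT_T0);  return 1;  /* prefetcht0  */
    case 1: HINT(p, _MM_HINT_T1);  return 1;  /* prefetcht1  */
    case 2: HINT(p, _MM_HINT_T2);  return 1;  /* prefetcht2  */
    case 3: HINT(p, _MM_HINT_NTA); return 1;  /* prefetchnta */
    default: return 0;
    }
}
```

None of the hints changes the data or the program's result; they only suggest to the CPU where in the cache hierarchy the line should be placed.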
ID: 903472
PhonAcq

Joined: 14 Apr 01
Posts: 1656
Credit: 30,658,217
RAC: 1
United States
Message 903715 - Posted: 4 Jun 2009, 20:17:44 UTC - in response to Message 903472.  

It appears that Ned is rusty on intrinsics. Does anybody have any assembly code to demonstrate the prefetch instruction?
ID: 903715
1mp0£173
Volunteer tester

Joined: 3 Apr 99
Posts: 8423
Credit: 356,897
RAC: 0
United States
Message 903741 - Posted: 4 Jun 2009, 21:42:55 UTC - in response to Message 903715.  

It appears that Ned is rusty on intrinsics. Does anybody have any assembly code to demonstrate the prefetch instruction?

My applications rarely have patterns that would benefit from triggering a prefetch. My code also needs to run on machines that don't have SSE (my applications do not benefit), so it's not anything I would have used.

There are some really good examples in the Intel documentation, including examples where the prefetch would actually hurt. Most are in C, which is pretty close to assembly.

Then there are cache-oblivious algorithms, which are incredibly interesting.

Wouldn't surprise me to find out that IPP uses the prefetch instructions. FFTW probably doesn't.
ID: 903741


 