[Figure: cache (lines 0, 1, …, 127) and SDRAM (blocks 0x0–0x1F, 0x20–0x3F, …, 0x7E0–0x7FF, 0x800–0x81F, 0x820–0x83F, …, 0xFE0–0xFFF)]
Two-way cache (L1D)
[Figure: two-way L1D with lines 0A, 1A, …, 63A and 0B, 1B, …, 63B, and memory blocks 0x0–0x1F, 0x20–0x3F, …, 0x7E0–0x7FF, 0x800–0x81F, 0x820–0x83F, …, 0xFE0–0xFFF]
L1D cache
L1D address allocation: tag = bits 31–11, set index = bits 10–5, offset = bits 4–0.
On a read miss, a new 32-byte line is loaded, with a penalty of 4 clock cycles.
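The address layout above can be made concrete with a few bit operations. This is a minimal sketch: the function names are hypothetical, and the field widths (5 offset bits for 32-byte lines, 6 set-index bits for 64 sets) are read directly from the bit positions on the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths taken from the slide: offset = bits 4..0, set index = bits 10..5. */
#define L1D_OFFSET_BITS 5u  /* 2^5 = 32-byte lines */
#define L1D_SET_BITS    6u  /* 2^6 = 64 sets       */

/* Byte position inside the 32-byte line. */
static uint32_t l1d_offset(uint32_t addr) { return addr & ((1u << L1D_OFFSET_BITS) - 1u); }

/* Which of the 64 sets the address maps to. */
static uint32_t l1d_set(uint32_t addr) { return (addr >> L1D_OFFSET_BITS) & ((1u << L1D_SET_BITS) - 1u); }

/* Remaining upper bits (31..11), stored in the cache to identify the line. */
static uint32_t l1d_tag(uint32_t addr) { return addr >> (L1D_OFFSET_BITS + L1D_SET_BITS); }

/* Inverse operation: rebuild an address from its three fields. */
static uint32_t l1d_compose(uint32_t tag, uint32_t set, uint32_t offset)
{
    assert(set < (1u << L1D_SET_BITS));
    assert(offset < (1u << L1D_OFFSET_BITS));
    return (tag << (L1D_OFFSET_BITS + L1D_SET_BITS)) | (set << L1D_OFFSET_BITS) | offset;
}
```

For example, 0x7E0 and 0xFE0 both land in set 63 (tags 0 and 1), which is why they can occupy lines 63A and 63B at the same time.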
If two words (2 × 32 bits = 8 bytes) are loaded per clock cycle while reading sequentially through memory, a new line is needed every 32/8 = 4 cycles, so the overhead is 8/32 × 4 = 1 clock cycle per instruction cycle.
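The overhead estimate above can be expressed as a one-line function. This is a sketch under the same simplifying assumptions as the slide: a purely sequential read stream in which every new line costs the full miss penalty and no stall overlaps another. The name `miss_overhead` is hypothetical.

```c
#include <assert.h>

/* Average stall cycles per instruction cycle for a sequential read stream:
   a new line is needed every line_bytes / bytes_per_cycle cycles,
   and each such miss costs miss_penalty cycles. */
static double miss_overhead(double bytes_per_cycle, double line_bytes, double miss_penalty)
{
    assert(line_bytes > 0.0);
    return bytes_per_cycle / line_bytes * miss_penalty;
}
```

With the slide's numbers, `miss_overhead(8, 32, 4)` gives 1.0; loading only one word (4 bytes) per cycle halves it to 0.5.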
A write miss does not load a new line. A four-word write buffer handles up to four write misses without penalty.
cache_miss_example
main.c: Illustrates the impact of L1D write and read misses (compulsory misses).
main2.c: Illustrates the problem of several data objects mapping to the same set (thrashing).
Two data objects map to the same set if
Aa = K*2048 + Ab
for some addresses Aa and Ab in object A and B respectively, and some integer K (2048 bytes = 64 sets × 32 bytes, the size of one way of the L1D).
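Why more than two objects in one set cause thrashing can be shown with a tiny model of a single L1D set. This is a sketch that assumes LRU replacement within the set and treats each object as one line; the type and function names are hypothetical.

```c
#include <assert.h>

/* Model of ONE set of a two-way cache with LRU replacement. */
typedef struct {
    long tag[2]; /* -1 marks an empty way */
    int  lru;    /* index of the least recently used way */
} two_way_set;

static void set_init(two_way_set *s)
{
    s->tag[0] = -1;
    s->tag[1] = -1;
    s->lru = 0;
}

/* Returns 1 on a hit, 0 on a miss (the LRU way is replaced on a miss). */
static int set_access(two_way_set *s, long tag)
{
    assert(tag >= 0);
    for (int w = 0; w < 2; w++) {
        if (s->tag[w] == tag) {
            s->lru = 1 - w; /* the other way becomes least recently used */
            return 1;
        }
    }
    int victim = s->lru;
    s->tag[victim] = tag;
    s->lru = 1 - victim;
    return 0;
}

/* Misses when cycling round-robin through n_objects lines that all map to this set. */
static int count_misses(int n_objects, int rounds)
{
    two_way_set s;
    set_init(&s);
    int misses = 0;
    for (int r = 0; r < rounds; r++)
        for (int obj = 0; obj < n_objects; obj++)
            misses += !set_access(&s, obj);
    return misses;
}
```

With two objects only the first two accesses miss (`count_misses(2, 10)` is 2), but a third object evicts a line that is needed again shortly afterwards, so every access misses (`count_misses(3, 10)` is 30).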
Two code objects map to the same set if
Aa = K*4096 + Ab
for some addresses Aa and Ab in object A and B respectively, and some integer K.
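Both conditions can be checked for whole objects (start address plus size) by testing whether their address ranges overlap modulo the period: 2048 for data, 4096 for code. This is a sketch with a hypothetical function name. It implements the byte-level condition exactly as stated; for buffers aligned on 32-byte boundaries with sizes that are multiples of 32, this is the same as sharing at least one cache set. It assumes the period is a power of two, which holds for both values.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if objects A = [a_start, a_start + a_size) and B = [b_start, b_start + b_size)
   contain addresses Aa, Ab with Aa = K*period + Ab for some integer K. */
static int objects_share_set(uint32_t a_start, uint32_t a_size,
                             uint32_t b_start, uint32_t b_size,
                             uint32_t period)
{
    assert(period > 0 && (period & (period - 1u)) == 0); /* power of two */
    if (a_size == 0 || b_size == 0)
        return 0; /* an empty object occupies no set */
    if (a_size >= period || b_size >= period)
        return 1; /* one object alone already covers every set */
    /* Offset of B's start relative to A's start, reduced modulo the period.
       Unsigned wrap-around is harmless because period divides 2^32. */
    uint32_t d = (b_start - a_start) % period;
    /* Overlap if B starts inside A, or B wraps past the period boundary onto A's start. */
    return d < a_size || d + b_size > period;
}
```

For example, two 256-byte buffers at 0x0 and 0x800 collide in the L1D (period 2048) but not in the program cache (period 4096), while moving the second buffer to 0x900 removes the data conflict.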
For details, read the comments in main.c and main2.c.
What to consider when programming to make good use of the cache
Align all data buffers on 32-byte boundaries (#pragma DATA_ALIGN).
Avoid allocating more than two objects that map to the same set within the same algorithm (the L1D has only two ways per set).
Avoid having two or more computationally heavy algorithms whose code maps to the same set.
Profile the algorithms both with and without the data and program already in the cache (see cache_miss_example).
Force important data and code into the cache before the real-time program starts (e.g. in appl_Init() for EDMA_RTDX_GPIO) by reading the data and calling the functions.
Try processing data in smaller buffers to see whether performance improves.
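The first and fifth tips (alignment and pre-loading data) can be combined in an initialization sketch. The buffer name, its length and `warm_data_cache` are hypothetical. The `#pragma DATA_ALIGN(symbol, alignment)` form is the C syntax of the TI C6000 compiler; when building on a host PC, the sketch falls back to C11 `_Alignas`. Code can be pre-loaded in the same way by simply calling the functions once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32
#define N_SAMPLES 512 /* hypothetical buffer length */

#ifdef __TI_COMPILER_VERSION__
/* TI C6000 compiler (C syntax): align the buffer on a cache-line boundary. */
#pragma DATA_ALIGN(sample_buf, 32)
short sample_buf[N_SAMPLES];
#else
/* Portable C11 equivalent, e.g. for testing on a host PC. */
_Alignas(CACHE_LINE_BYTES) short sample_buf[N_SAMPLES];
#endif

/* Reads one byte per cache line so that every line spanned by buf is loaded,
   e.g. from an init routine before the real-time loop starts.
   Returns the number of reads performed. */
static size_t warm_data_cache(const void *buf, size_t nbytes)
{
    const volatile unsigned char *p = (const volatile unsigned char *)buf;
    size_t reads = 0;
    assert(buf != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i += CACHE_LINE_BYTES) {
        (void)p[i]; /* volatile read: the compiler cannot remove it */
        reads++;
    }
    if (nbytes > 0) {
        (void)p[nbytes - 1]; /* covers the last line if buf is not line-aligned */
        reads++;
    }
    return reads;
}
```

For a line-aligned buffer the final read hits a line that was already loaded, which costs little; it only matters when the buffer does not start on a 32-byte boundary.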