Cache locality is the empirical observation that programs don’t access memory uniformly at random — they tend to revisit recent locations and step through nearby ones. Caches exploit this with two specific behaviors:
- Temporal locality — recently accessed data is likely to be accessed again soon.
- Spatial locality — data near a recently accessed address is likely to be accessed soon.
Without locality, caches wouldn’t work — every access would miss. With locality, even tiny caches catch most accesses.
Temporal locality
A program loop typically references the same handful of variables over and over. The loop counter, the loop bounds, the accumulator — all hot. They get loaded into registers if possible, but anything that doesn’t fit goes to cache. The first miss pays the slow main-memory cost; every subsequent reference hits.
Examples:
- Loop counters:
iaccessed every iteration. - Frequently called subroutines: their instructions stay hot in the instruction cache.
- Stack frames: SP and FP keep the same range of stack addresses hot.
- Recently allocated objects (in heap-using languages): “young” objects tend to be accessed more than “old” ones.
The cache exploits temporal locality with replacement policies like LRU (Least Recently Used) — evict the block that hasn’t been touched in the longest time, on the bet that older blocks are less likely to be reused.
Spatial locality
When a program touches address , it often touches next (the next array element, the next field of a struct, the next instruction in straight-line code). The cache exploits this by fetching a whole block on a miss, not just the requested word.
A block is typically 32 to 64 bytes. When you ask for one word and miss, the cache pulls in 8 to 16 surrounding words too. Subsequent accesses to those words hit “for free.”
Examples:
- Sequential code execution — instructions live consecutively in memory; one instruction fetch brings in many.
- Array traversal —
for (i = 0; i < n; i++) sum += a[i];walks linearly through memory. - Struct field access — accessing
obj.field1andobj.field2hits adjacent memory.
When locality breaks
Programs with poor locality run much slower than their algorithmic complexity would predict:
- Random pointer chasing (linked lists, hash tables) — each “next” pointer jumps to an unpredictable address. Each load might miss.
- Large strided access —
for (i = 0; i < n; i += 1024) sum += a[i];— each access lands in a different block. Every load misses unless the cache is huge. - Cache thrashing — accessing slightly more data than fits in cache, with each new access evicting one that you’ll soon need again. Hit rate plummets.
This is why “cache-friendly” data structures (arrays of structs vs. structs of arrays, depending on access pattern) and algorithms (blocking matrix multiply, B-trees) are so much faster than naive equivalents on real hardware.
Why both kinds matter
Different cache design choices target different localities:
- Larger cache size helps temporal locality (more old data fits, longer until eviction).
- Larger block size helps spatial locality (more surrounding data prefetched per miss).
- Higher associativity reduces conflict misses, helping with mixed access patterns.
- Multiple cache levels — L1 catches temporal hits (small, fast); L2/L3 catch the spatial sweep across larger working sets.
Tuning these for a target workload is what cache designers do. For a programmer, the practical takeaway is: write code that touches related data close together in time, and place related data close together in memory. The hardware will reward you.
For how the cache actually maps addresses to its internal storage, see Cache address mapping.