Cache locality

Cache locality is the empirical observation that programs don’t access memory uniformly at random. They tend to revisit recent locations and step through nearby ones. Caches exploit this with two behaviors:

Temporal locality: recently accessed data is likely to be accessed again soon.
Spatial locality: data near a recently accessed address is likely to be accessed soon.

Without locality, caches wouldn’t work, every access would miss. With it, even tiny caches catch most accesses.

Temporal locality

A loop references the same handful of variables over and over. Loop counter, loop bounds, accumulator: all hot. They get loaded into registers if possible, but anything that doesn’t fit goes to cache. The first miss pays the slow main-memory cost; every subsequent reference hits.

Examples:

Loop counters: i accessed every iteration.
Frequently called subroutines: their instructions stay hot in the instruction cache.
Stack frames: SP and FP keep the same range of stack addresses hot.
Recently allocated objects (in heap-using languages): “young” objects tend to be accessed more than “old” ones.

The cache exploits temporal locality with replacement policies like LRU (Least Recently Used): evict the block that hasn’t been touched in the longest time, betting older blocks are less likely to be reused.

Spatial locality

When a program touches address $X$ , it often touches $X + 4$ next (the next array element, the next field of a struct, the next instruction in straight-line code). The cache exploits this by fetching a whole block on a miss, not just the requested word.

A block is typically 32 to 64 bytes. Ask for one word and miss, and the cache pulls in 8 to 16 surrounding words too. Subsequent accesses to those words hit “for free.”

Examples:

Sequential code execution: instructions live consecutively in memory, one fetch brings in many.
Array traversal: for (i = 0; i < n; i++) sum += a[i]; walks linearly through memory.
Struct field access: obj.field1 and obj.field2 hit adjacent memory.

When locality breaks

Programs with poor locality run much slower than their algorithmic complexity would predict:

Random pointer chasing (linked lists, hash tables): each “next” pointer jumps to an unpredictable address, so each load might miss.
Large strided access: in for (i = 0; i < n; i += 1024) sum += a[i]; each access lands in a different block. Every load misses unless the cache is huge.
Cache thrashing: accessing slightly more data than fits in cache, with each new access evicting one you’ll soon need again. Hit rate plummets.

This is why “cache-friendly” data structures (arrays of structs vs. structs of arrays, depending on access pattern) and algorithms (blocking matrix multiply, B-trees) are so much faster than naive equivalents on real hardware.

Why both kinds matter

Different cache design choices target different localities:

Larger cache size helps temporal locality (more old data fits, longer until eviction).
Larger block size helps spatial locality (more surrounding data prefetched per miss).
Higher associativity reduces conflict misses, helping with mixed access patterns.
Multiple cache levels: L1 catches temporal hits (small, fast); L2/L3 catch the spatial sweep across larger working sets.

For how the cache maps addresses to its internal storage, see Cache address mapping.

Idriss Rami — Notes

Explorer

Cache locality

Temporal locality

Spatial locality

When locality breaks

Why both kinds matter

Graph View

Table of Contents

Backlinks