MESI protocol

The MESI protocol keeps the private caches of multiple cores agreeing on the value of every shared memory location. Each cache line carries one of four states: Modified, Exclusive, Shared, or Invalid (the M, E, S, and I of the name). A line moves between them in response to local reads and writes and to snooped bus traffic.

Why it has to exist: in a multicore CPU each core has its own L1 cache (and often a private L2). If two cores both cache memory address $X$ and one writes, the other’s copy is now stale. Without coherence, threads on different cores could keep reading different values for the same address, and locks, flags, and counters built on shared variables would stop working.

The four states

Each cache line in each core’s cache is in exactly one of these states:

State	Other caches	Memory consistent?	Can read locally?	Can write locally?
Modified (M)	Invalid	No (cache has the only fresh copy)	Yes	Yes
Exclusive (E)	Invalid	Yes	Yes	Yes (becomes M)
Shared (S)	Shared or Invalid	Yes	Yes	No (must invalidate others first)
Invalid (I)	Any state	N/A	No (must fetch)	No

Modified: this cache is the only one with the line, and the line has been written since it was loaded. Memory is stale.
Exclusive: this cache is the only one with the line, but it hasn’t been written; memory still matches. The line is “clean” and exclusive.
Shared: this line exists clean in this cache and possibly others. Memory is up to date. Reads are local; writes need to invalidate other copies first.
Invalid: not in this cache (or was evicted/invalidated). Any access misses.
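The table implies an invariant across caches: for any one line, at most one cache may hold it in M or E, and in that case every other cache must be in I. A minimal sketch of that check follows; the names `State` and `is_coherent` are made up for illustration and do not come from any real library.

```python
from enum import Enum
from itertools import product


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def is_coherent(states):
    """Return True if these per-cache states of ONE line can legally coexist."""
    owners = sum(s in (State.M, State.E) for s in states)
    sharers = sum(s is State.S for s in states)
    # Either nobody holds the line in M/E, or exactly one cache does
    # and no other cache holds a Shared copy.
    return owners == 0 or (owners == 1 and sharers == 0)


# Enumerate every state pair for two caches and keep the legal ones.
legal_pairs = [p for p in product(State, repeat=2) if is_coherent(p)]
print(len(legal_pairs), "of 16 pairs are legal")
```

For two caches this leaves 8 of the 16 combinations: the four made of S and I only, plus M or E in one cache paired with I in the other.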

State transitions

The state changes both on local actions (the CPU reads or writes the line) and on snooped actions (another core’s bus traffic involving the same line).

The transition is determined by the starting state and the event; only a read miss also depends on what the other caches report:

Local read.

I → S or E. Issue a read on the bus. If the snoop response shows that no other cache holds the line, go to E; otherwise go to S.
S, E, M → unchanged. Read locally; no bus traffic.

Local write.

I → M. Issue a “read-for-ownership” (RFO) on the bus, which invalidates other caches’ copies, then write.
S → M. Issue an “invalidate” on the bus, then write locally.
E → M. Just write; no bus traffic needed (we already had exclusive ownership).
M → M. Just write; we already own it dirty.

Snoop: another cache reads this line.

I, S → unchanged.
E → S. Send the data to the requesting core, downgrade self to Shared.
M → S. Send the data, write back to memory (so memory is fresh), downgrade to Shared.

Snoop: another cache writes this line (RFO).

I → unchanged.
S, E → I. The other cache is now the owner; we invalidate.
M → I. Send the data first (so it’s not lost), then invalidate.
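The transition rules above can be sketched as a tiny simulator for a single line shared by several cores. This is an illustration under simplifying assumptions, not a model of real hardware: the class name `Line` and the bus labels ("BusRd", "RFO", "Invalidate", "Flush") are chosen here, evictions are ignored, and "Flush" stands for a Modified holder handing over its dirty data.

```python
class Line:
    """One memory line cached by n cores, tracked as a list of MESI letters."""

    def __init__(self, n):
        self.states = ["I"] * n
        self.bus = []  # log of bus transactions, in order

    def read(self, core):
        if self.states[core] != "I":
            return  # hit in M, E or S: read locally, no bus traffic
        self.bus.append("BusRd")
        others = [c for c in range(len(self.states)) if c != core]
        shared = any(self.states[c] != "I" for c in others)
        for c in others:
            if self.states[c] == "M":
                self.bus.append("Flush")  # dirty holder supplies data, memory refreshed
            if self.states[c] in ("M", "E"):
                self.states[c] = "S"  # a snooped read downgrades the holder
        self.states[core] = "S" if shared else "E"

    def write(self, core):
        state = self.states[core]
        if state == "I":
            self.bus.append("RFO")  # fetch the line and invalidate other copies
        elif state == "S":
            self.bus.append("Invalidate")  # data is already here
        if state in ("I", "S"):
            for c in range(len(self.states)):
                if c != core:
                    if self.states[c] == "M":
                        self.bus.append("Flush")  # hand over dirty data first
                    self.states[c] = "I"
        # From E or M the write is silent: no bus traffic at all.
        self.states[core] = "M"


line = Line(2)
line.read(0)   # core 0: I -> E (nobody else has the line)
line.write(0)  # core 0: E -> M, silently
line.read(1)   # core 1: I -> S, core 0: M -> S after flushing
line.write(1)  # core 1: S -> M, core 0: S -> I
print(line.states, line.bus)
```

The four operations end with core 1 in M and core 0 in I, after four bus transactions; the E → M write contributes none of them.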

Snooping vs directory

The transitions above describe a snooping protocol: every cache watches all bus traffic, and each cache decides on its own which transitions to make based on what it sees and what state its lines are in. Snooping works on a shared bus where every transaction is visible to every cache, typical for a few cores on one chip.

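For larger systems (dozens of cores, or multiple chips), the bus saturates. The alternative is a directory-based protocol: a directory (centralized or distributed) tracks which caches hold which lines and sends invalidations/updates only to those caches. More complex, but it scales further. Multi-socket x86 systems sit between the two: AMD’s HyperTransport and Intel’s QPI links originally broadcast snoops between sockets, and later generations added directory-style filtering (AMD’s HT Assist probe filter, directory-assisted home snooping on Intel) to cut that traffic, even when individual sockets snoop internally.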
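To make the scaling argument concrete, here is a rough sketch of the bookkeeping a directory does. It only tracks sharer sets, not full MESI states, and the `Directory` class, the 64-core count, and the address `0x40` are all assumptions made for the example.

```python
class Directory:
    """Tracks, per line address, which cores currently hold a copy."""

    def __init__(self):
        self.sharers = {}  # line address -> set of core ids

    def read(self, core, line):
        # Record the reader as a sharer of this line.
        self.sharers.setdefault(line, set()).add(core)

    def write(self, core, line):
        """Return the cores that must receive an invalidation."""
        targets = self.sharers.get(line, set()) - {core}
        self.sharers[line] = {core}  # the writer is now the only holder
        return targets


N_CORES = 64  # assumed system size

d = Directory()
d.read(3, 0x40)
d.read(7, 0x40)
targets = d.write(3, 0x40)

broadcast_lookups = N_CORES - 1  # with snooping, every other cache checks its tags
print(len(targets), "invalidation sent instead of", broadcast_lookups, "snoops")
```

The price is the extra hop through the directory and the storage for the sharer sets, which is the added complexity the text refers to.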

MESI vs related protocols

MSI: the three-state predecessor (no Exclusive). Every read miss brings the line in as Shared, even if no other cache has it. The first write then costs a bus transaction to invalidate copies that do not exist, because the protocol cannot tell that the line is private. MESI’s Exclusive state lets a read-then-write sequence avoid the redundant invalidation.
MOESI: adds Owned (O). Like Modified, the line is dirty and memory is stale; like Shared, other caches may hold read-only copies. The cache holding the line in O supplies the data on snoops and is responsible for the eventual write-back. This lets a Modified line be shared on a read without immediately writing back. AMD x86 uses MOESI.
MESIF: adds Forward (F). When a line is Shared, at most one cache holds it in F (forwarder) and supplies the data on a snoop, avoiding multiple caches racing to respond. Intel x86 uses MESIF.

Each extra state shaves off bus traffic at the price of a bigger state machine with more transitions to get right.
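The saving MESI gets over MSI can be counted directly. The sketch below assumes a single core touching a line that no other cache holds; the function name `bus_transactions` is hypothetical.

```python
def bus_transactions(protocol, ops):
    """Count bus transactions for one core accessing a line nobody else caches."""
    if protocol not in ("MSI", "MESI"):
        raise ValueError("protocol must be 'MSI' or 'MESI'")
    state, count = "I", 0
    for op in ops:
        if op == "read" and state == "I":
            count += 1  # read miss goes to the bus
            # MESI learns from the snoop response that the line is private.
            state = "E" if protocol == "MESI" else "S"
        elif op == "write" and state in ("I", "S"):
            count += 1  # RFO from I, invalidate from S
            state = "M"
        elif op == "write":
            state = "M"  # E or M: silent upgrade or plain hit
    return count


msi = bus_transactions("MSI", ["read", "write"])
mesi = bus_transactions("MESI", ["read", "write"])
print("MSI:", msi, "MESI:", mesi)
```

For a read followed by a write, MSI pays two transactions and MESI pays one. Private data is the common case, which is why the extra state earns its keep.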

Why this is hard

Coherence interacts with the rest of the memory system:

False sharing. Two cores write to different variables that happen to share a cache line. Each write invalidates the other core’s copy. The variables aren’t logically shared, but the cache-line granularity makes them behave as if they were. The fix is padding to put them on different lines.
Coherence misses. A “fourth C” beyond the classic compulsory, capacity, and conflict misses: the line was here, but another core’s write invalidated it. Adds latency that doesn’t show up in single-threaded miss-rate analysis.
Bus traffic. Every write to a Shared line costs an invalidate broadcast. Heavy writes to shared data can saturate the interconnect even at low miss rates.
Consistency models. Coherence is a per-location guarantee: writes to one address are serialized, and every core sees them in the same order. It says nothing about the order in which a thread’s writes to different addresses become visible to other threads; that’s the memory consistency model (sequential consistency, total store order, release-acquire, etc.). x86 is TSO; ARM is weaker. Memory barriers explicitly enforce ordering.
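The false-sharing item can be illustrated by counting ownership changes at cache-line granularity. This is a simplified model that assumes a 64-byte line (a common size, not a universal one) and treats every write by a non-owning core as one invalidation; `invalidations` is a name invented for the sketch.

```python
LINE_SIZE = 64  # bytes; assumed cache-line size


def invalidations(writes, line_size=LINE_SIZE):
    """Count writes that must first take the line away from another core.

    writes is a sequence of (core, byte_address) pairs in program order.
    """
    owner = {}  # line index -> core that last wrote it (holds it Modified)
    count = 0
    for core, addr in writes:
        line = addr // line_size
        if line in owner and owner[line] != core:
            count += 1  # the other core's copy is invalidated
        owner[line] = core
    return count


# Two cores each update their own 8-byte counter, alternating 1000 times.
packed = [(0, 0), (1, 8)] * 1000    # counters at bytes 0 and 8: same line
padded = [(0, 0), (1, 64)] * 1000   # counters padded onto separate lines

print(invalidations(packed), invalidations(padded))
```

In this model the packed layout costs an invalidation on every write after the first (1999 in total), while the padded layout costs none, even though the two programs are logically identical.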

Idriss Rami — Notes
