Memory controller

The memory controller sits between the processor and the DRAM modules and runs the protocol for reading and writing memory. It hides DRAM’s complexity (refresh, row/column timing, bank scheduling, multiplexed addressing) behind a simple “read this address, write this address” interface to the rest of the chip.

In modern processors, the memory controller is integrated into the CPU package. In older designs (pre-2008 Intel, for example), it lived in a separate chip on the motherboard, the “northbridge.”

What it does

For every memory access from the processor:

Decode the high-order address bits to determine which memory module to talk to, and assert that module's Chip Select signal.
Split the remaining address into row and column halves for DRAM’s multiplexed addressing.
Send the row address with RAS (Row Address Strobe).
Send the column address with CAS (Column Address Strobe).
Transfer the data: for a read, wait for it to arrive and pass it back to the processor; for a write, drive it onto the data bus.
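The decode steps above can be sketched in Python. The geometry (two modules, 8 banks, 65,536 rows, 1,024 columns, a 64-bit bus) and the bit layout [ chip select | row | bank | column | byte offset ] are assumptions chosen for illustration; real controllers use part-specific layouts and often hash the bank bits. The names `decode`, `field`, and `DramAddress` are hypothetical.

```python
from typing import NamedTuple

# Hypothetical geometry, assumed for illustration (not any specific part):
BYTE_OFFSET_BITS = 3   # 64-bit data bus -> 8 bytes per column access
COLUMN_BITS = 10       # 1024 columns per row
BANK_BITS = 3          # 8 banks per module
ROW_BITS = 16          # 65536 rows per bank
CS_BITS = 1            # 2 modules, selected by the top address bit


class DramAddress(NamedTuple):
    chip_select: int
    row: int
    bank: int
    column: int


def field(addr: int, shift: int, width: int) -> int:
    """Extract `width` bits of `addr` starting at bit position `shift`."""
    return (addr >> shift) & ((1 << width) - 1)


def decode(addr: int) -> DramAddress:
    """Split a physical address laid out as [ cs | row | bank | column | byte ]."""
    shift = BYTE_OFFSET_BITS          # byte-within-word bits are not sent to DRAM
    column = field(addr, shift, COLUMN_BITS)
    shift += COLUMN_BITS
    bank = field(addr, shift, BANK_BITS)
    shift += BANK_BITS
    row = field(addr, shift, ROW_BITS)   # sent with RAS
    shift += ROW_BITS
    chip_select = field(addr, shift, CS_BITS)
    return DramAddress(chip_select, row, bank, column)
```

For example, `decode((1 << 32) | (5 << 16) | (2 << 13) | (7 << 3))` returns `DramAddress(chip_select=1, row=5, bank=2, column=7)`: the controller would assert module 1's chip select, activate row 5 in bank 2, then read column 7.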

For batches of accesses, more cleverness:

Bank parallelism: modern DRAM has multiple internal banks, so the controller can have transactions in flight on several banks at once.
Burst transfers: read a sequence of consecutive addresses with a single command; the data streams out on consecutive clock edges.
Refresh scheduling: interleave refresh cycles with normal reads/writes to keep bandwidth up.
Reordering: group related accesses together to maximize row-buffer hits (consecutive accesses to the same DRAM row are much faster than accesses spanning rows).
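To see why reordering pays off, here is a minimal sketch that counts row-buffer hits for a stream of (bank, row) accesses under an assumed open-page policy (a row stays open until another row in the same bank is needed). `group_by_row` is a deliberately naive, hypothetical reordering that just sorts by bank and row; a real controller must also bound how long any request waits.

```python
from typing import List, Tuple

Access = Tuple[int, int]  # (bank, row)


def count_row_hits(accesses: List[Access]) -> int:
    """Count accesses whose row is already open in that bank's row buffer.

    Assumes an open-page policy: a row stays open until a different row
    in the same bank is needed.
    """
    open_row = {}  # bank -> currently open row
    hits = 0
    for bank, row in accesses:
        if open_row.get(bank) == row:
            hits += 1
        else:
            open_row[bank] = row  # precharge (if needed) and activate the new row
    return hits


def group_by_row(accesses: List[Access]) -> List[Access]:
    """Naive reordering: a stable sort puts accesses to the same (bank, row) side by side."""
    return sorted(accesses)
```

With `stream = [(0, 1), (0, 2), (0, 1), (0, 2)]`, `count_row_hits(stream)` is 0 because every access evicts the other row, while `count_row_hits(group_by_row(stream))` is 2.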

DRAM access protocol details

DRAM access has several phases, each with its own timing constraints. The controller manages all of them.

Activate (ACT) phase

Before reading or writing a row, the controller must “open” it, copying the entire row from the DRAM cell array into the row buffer (a row of sense amplifiers). This takes $t_{RCD}$ (RAS-to-CAS delay), typically 10-20 ns.

Once a row is open, accesses to columns within it are fast.

Read/write phase

With the row open, the controller can issue read or write commands to specific columns. A read returns its first data after the CAS latency $t_{CL}$, typically 10-15 ns; writes have a comparable column-access delay.

Precharge (PRE) phase

Before opening a different row in the same bank, the current row must be closed and the bit lines precharged. This takes $t_{RP}$ (precharge time), typically 10-15 ns.

So accessing a new row in the same bank costs $t_{RP} + t_{RCD} + t_{CL}$, much more than accessing within an already-open row.
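The phases combine into a small per-bank latency model. The 14 ns values for $t_{RP}$, $t_{RCD}$, and $t_{CL}$ are assumed round numbers within the ranges quoted above, and the model ignores other constraints such as the minimum row-active time and command-bus contention. It also covers a case the phases imply: if the bank has no row open (already precharged), the access pays only $t_{RCD} + t_{CL}$. The function name is hypothetical.

```python
from typing import Optional

# Assumed illustrative timings in nanoseconds; real parts vary by speed grade.
T_RP = 14.0   # precharge
T_RCD = 14.0  # activate (RAS-to-CAS delay)
T_CL = 14.0   # CAS latency


def access_latency_ns(open_row: Optional[int], target_row: int,
                      t_rp: float = T_RP, t_rcd: float = T_RCD,
                      t_cl: float = T_CL) -> float:
    """Time to first data for one access to a single bank."""
    if open_row == target_row:
        return t_cl                  # row hit: column access only
    if open_row is None:
        return t_rcd + t_cl          # bank idle: activate, then read
    return t_rp + t_rcd + t_cl       # row conflict: precharge, activate, read
```

With these numbers a row hit costs 14 ns, an access to an idle bank 28 ns, and a row conflict 42 ns, three times the cost of a hit.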

Refresh

DRAM cells lose charge over time. Every row must be refreshed (read and rewritten) every $\sim 64$ ms. The controller interleaves refresh commands with normal accesses, typically issuing one refresh (REF) command every $\sim 7.8$ μs; each command refreshes a batch of rows, so every row is covered within the 64 ms window.
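The 7.8 μs interval falls out of dividing the 64 ms window by the number of refresh commands. This sketch assumes 8,192 REF commands per window (the usual DDR3/DDR4 count) and a refresh-cycle time $t_{RFC}$ of 350 ns, roughly that of an 8 Gb DDR4 device. A rank cannot serve reads or writes during $t_{RFC}$, so the ratio approximates the time lost to refresh. Function names are hypothetical.

```python
REFRESH_WINDOW_NS = 64_000_000  # every row refreshed within 64 ms
REF_COMMANDS_PER_WINDOW = 8192  # assumed: typical DDR3/DDR4 count
T_RFC_NS = 350.0                # assumed refresh-cycle time (roughly an 8 Gb DDR4 device)


def refresh_interval_ns(window_ns: float = REFRESH_WINDOW_NS,
                        commands: int = REF_COMMANDS_PER_WINDOW) -> float:
    """Average spacing between REF commands (tREFI)."""
    return window_ns / commands


def refresh_overhead(t_rfc_ns: float = T_RFC_NS) -> float:
    """Fraction of time a rank is busy refreshing instead of serving accesses."""
    return t_rfc_ns / refresh_interval_ns()


print(refresh_interval_ns())          # 7812.5
print(f"{refresh_overhead():.1%}")    # 4.5%
```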

Multi-module systems

When a system has multiple DRAM modules (multiple DIMMs on the motherboard), the memory controller decodes high-order address bits to assert the right Chip Select line:

[ chip select bits | local address ]

Each module sees only its share of the address space. The controller routes each access to the appropriate module, transparently to the processor.

For more bandwidth, modern controllers interleave addresses across modules: successive cache lines go to different DIMMs, allowing parallel access. This depends on the right BIOS configuration and module placement; in the best case, with two modules on separate channels, it doubles peak bandwidth.
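The difference between chip-select decoding and interleaving shows up in which module each cache line lands on. A minimal sketch, assuming 64-byte cache lines, two 4 GiB modules, and simple modulo interleaving (real controllers often hash more address bits); the function names are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed interleaving granularity


def module_contiguous(addr: int, module_bytes: int) -> int:
    """No interleaving: high-order bits pick the module, so each DIMM holds one contiguous range."""
    return addr // module_bytes


def module_interleaved(addr: int, num_modules: int,
                       line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Cache-line interleaving: successive lines rotate across the modules."""
    return (addr // line_bytes) % num_modules


GIB = 1 << 30
addrs = [0, 64, 128, 192]  # four consecutive cache lines
print([module_contiguous(a, 4 * GIB) for a in addrs])  # [0, 0, 0, 0]
print([module_interleaved(a, 2) for a in addrs])       # [0, 1, 0, 1]
```

A sequential scan stays on one module under contiguous mapping but alternates between modules when interleaved, which is what lets two channels work at once.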

Multi-channel architecture

Modern systems have multiple memory channels, independent paths from the controller to DRAM, each with its own command/address/data wires. Common configurations:

Single channel: 64-bit data path, simplest, lowest bandwidth.
Dual channel: two 64-bit paths in parallel, doubling theoretical bandwidth.
Quad channel: four 64-bit paths, used in workstations and servers.
Eight-channel: server-class chips for memory-bandwidth-hungry workloads.

To use multi-channel, install DIMMs in matched pairs (or sets) in the right slots. The BIOS and motherboard manual specify which.
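Theoretical peak bandwidth scales linearly with channel count, since each 64-bit channel moves 8 bytes per transfer. This sketch assumes DDR4-3200 (3,200 MT/s) as an example speed grade and ignores real-world losses from refresh, row conflicts, and bus turnarounds; the function name is hypothetical.

```python
def peak_bandwidth_gbs(transfer_rate_mts: int, channels: int, bus_bits: int = 64) -> float:
    """Theoretical peak in GB/s (10^9 bytes/s): transfers/s x bytes per transfer x channels."""
    bytes_per_transfer = bus_bits // 8
    # MT/s x bytes = MB/s; divide by 1000 for GB/s
    return transfer_rate_mts * bytes_per_transfer * channels / 1000


for ch in (1, 2, 4, 8):
    print(ch, peak_bandwidth_gbs(3200, ch))
```

At DDR4-3200 this gives 25.6 GB/s for one channel, 51.2 for two, 102.4 for four, and 204.8 for eight.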

Scheduling policies

The controller chooses which pending request to service next. Common policies:

First-come-first-served (FCFS): simple, fair, but doesn’t exploit row-buffer locality.
First-ready-first-come-first-served (FR-FCFS): prioritize requests that hit the open row, fall back to FCFS otherwise. Good for typical workloads.
Bank-aware scheduling: track which banks are busy, prefer requests to idle banks.
Quality-of-service (QoS) scheduling: ensure low latency for high-priority requesters (e.g., GPU vs. background tasks).

The right policy depends on the workload mix. Modern controllers use adaptive heuristics that switch between policies.
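FCFS and FR-FCFS can be compared on a toy queue. This is a minimal sketch under simplifying assumptions: all requests are queued before scheduling starts, timing is ignored, rows stay open (open-page policy), and there is no starvation limit. The names `Request`, `fcfs`, `fr_fcfs`, and `drain` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Request:
    arrival: int  # arrival order; smaller means older
    bank: int
    row: int


OpenRows = Dict[int, int]  # bank -> currently open row
Policy = Callable[[List[Request], OpenRows], Request]


def fcfs(pending: List[Request], open_rows: OpenRows) -> Request:
    """First-come-first-served: always the oldest request."""
    return min(pending, key=lambda r: r.arrival)


def fr_fcfs(pending: List[Request], open_rows: OpenRows) -> Request:
    """FR-FCFS: the oldest request that hits an open row, else the oldest overall."""
    ready = [r for r in pending if open_rows.get(r.bank) == r.row]
    return min(ready or pending, key=lambda r: r.arrival)


def drain(queue: List[Request], policy: Policy) -> Tuple[List[int], int]:
    """Service every queued request with `policy`; return (service order, row hits)."""
    pending = list(queue)
    open_rows: OpenRows = {}
    order: List[int] = []
    hits = 0
    for _ in range(len(queue)):
        req = policy(pending, open_rows)
        pending.remove(req)
        if open_rows.get(req.bank) == req.row:
            hits += 1
        open_rows[req.bank] = req.row  # row stays open after the access
        order.append(req.arrival)
    return order, hits
```

For the queue `[Request(0, 0, 1), Request(1, 0, 2), Request(2, 0, 1)]`, FCFS serves 0, 1, 2 with no row hits, while FR-FCFS serves 0, 2, 1 and turns request 2 into a hit by letting it overtake request 1.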

Why it lives near the CPU

Latency is the killer. Every nanosecond of round-trip time costs the processor cycles. Putting the controller on the CPU die:

Eliminates the chip-to-chip hop that an external northbridge required.
Lets the controller share clock and protocols with the CPU.
Makes integrated GPUs and other on-chip subsystems easier to feed.

Intel introduced an on-die memory controller with Nehalem (2008); AMD did so earlier with the Athlon 64 (2003). Today every consumer CPU has its memory controller on-chip.

ECC

Server-class controllers support ECC (Error-Correcting Code) memory: each 64-bit word is stored alongside 8 extra bits computed as a SECDED (Single Error Correct, Double Error Detect) Hamming code. The controller checks the code on every read, corrects a single flipped bit, and signals a machine-check exception on an uncorrectable double-bit error. Soft errors from cosmic-ray-induced bit flips happen often enough at datacenter scale that running without ECC is considered negligent for server workloads. Consumer DDR5 modules additionally include on-die ECC that corrects internal cell errors, even when the bus between controller and module isn't ECC-protected.
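The SECDED idea can be shown at a smaller scale. This sketch is an extended Hamming (8,4) code: 4 data bits, 3 Hamming parity bits at positions 1, 2, 4, plus an overall parity bit at position 0. It is an illustration of the principle only; the 72/64 code a real controller uses is wider and its exact construction varies. The names `secded_encode` and `secded_decode` are hypothetical.

```python
from typing import List, Optional, Tuple


def secded_encode(data: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword (index 0 = overall parity)."""
    d = [(data >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d   # data bits at non-power-of-two positions
    code[1] = code[3] ^ code[5] ^ code[7]     # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]     # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]     # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2               # overall parity: total number of 1s is even
    return code


def secded_decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected", data) or ("uncorrectable", None)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                   # XOR of set positions is 0 for a valid word
    overall = sum(code) % 2                   # 1 means an odd number of bits flipped
    c = list(code)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        c[syndrome] ^= 1                      # single error at `syndrome` (0 = parity bit itself)
        status = "corrected"
    else:
        return "uncorrectable", None          # two flips: parity even, syndrome nonzero
    return status, c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
```

Flipping any one bit of `secded_encode(11)` decodes to `("corrected", 11)`, while flipping two bits yields `("uncorrectable", None)`, the case where a real controller would raise a machine check.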

Idriss Rami — Notes
