A carry-lookahead adder (CLA) computes all the carry bits in parallel directly from the inputs, instead of waiting for them to ripple stage by stage. The trick is to express each carry as a flat function of the inputs by decomposing per-bit behaviour into two simpler signals — generate and propagate.

The motivation: the ripple-carry adder takes $O(n)$ gate delays for an $n$-bit add because each stage must wait for the previous stage's carry. For wide datapaths at high clock rates, that ripple is too slow. Lookahead breaks the dependency chain and brings worst-case carry delay down to $O(\log n)$ when applied hierarchically.

Generate and propagate

For each bit position $i$ with operand bits $a_i, b_i$:

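  • Generate, $g_i = a_i \cdot b_i$: bit $i$ produces a carry-out regardless of carry-in (both inputs are 1, so $c_{i+1} = 1$ unconditionally).
  • Propagate, $p_i = a_i \oplus b_i$: bit $i$ passes its carry-in through unchanged (exactly one input is 1, so $a_i \oplus b_i = 1$, and a carry-in of 1 flips that sum bit to 0 and carries out).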

These two signals capture everything about how stage $i$ contributes to the carry chain:

$$c_{i+1} = g_i + p_i \cdot c_i$$

Read it as: stage $i$'s carry-out is high if it generates a carry by itself, or if it propagates an incoming carry. The sum bit comes from the propagate signal and the carry-in:

$$s_i = p_i \oplus c_i$$

(For the generate-active case where $g_i = 1$, both operand bits are 1, so $p_i = 0$ and the sum bit is $s_i = c_i$: the carry-in, not the operand sum, is what surfaces.)

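| $a_i$ | $b_i$ | $c_i$ | $g_i$ | $p_i$ | $c_{i+1}$ | $s_i$ |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 0 | 1 | 1 |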
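As a quick sanity check, the per-bit definitions can be sketched in Python and compared with ordinary binary addition. The function names `bit_signals` and `matches_full_adder` are illustrative choices, not part of any library.

```python
from itertools import product

def bit_signals(a: int, b: int, c_in: int) -> tuple[int, int, int, int]:
    """Return (g, p, c_out, s) for one bit position."""
    g = a & b                  # generate: both operand bits are 1
    p = a ^ b                  # propagate: exactly one operand bit is 1
    c_out = g | (p & c_in)     # carry recurrence
    s = p ^ c_in               # sum bit
    return g, p, c_out, s

def matches_full_adder() -> bool:
    """Check all 8 input rows against plain integer addition."""
    for a, b, c_in in product((0, 1), repeat=3):
        _, _, c_out, s = bit_signals(a, b, c_in)
        total = a + b + c_in
        if (c_out, s) != (total >> 1, total & 1):
            return False
    return True
```

Calling `matches_full_adder()` walks the same eight rows as the truth table.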

Flattening the carry chain

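Substitute the recurrence into itself to expand each carry as a function of the inputs and the original carry-in $c_0$:

$$
\begin{aligned}
c_1 &= g_0 + p_0 c_0 \\
c_2 &= g_1 + p_1 g_0 + p_1 p_0 c_0 \\
c_3 &= g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0 \\
c_4 &= g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0
\end{aligned}
$$

Every carry is a single-level OR of ANDs over the inputs, independent of every other carry. With enough silicon, all four can be computed in parallel from $a_i, b_i, c_0$ in just three gate delays (XOR for $p_i$ and AND for $g_i$ in the first, then the OR-of-ANDs for each $c_i$ in the next two). That's constant time, not linear in the bit width, at this scale.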
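The flattened expansion for a 4-bit block can be checked against the step-by-step recurrence with a small Python sketch. The helper names are made up for illustration, and the lists `g` and `p` are assumed to hold bit 0 first.

```python
from itertools import product

def ripple_carries(g, p, c0):
    """Carries [c0..c4] from the recurrence c[i+1] = g[i] | (p[i] & c[i])."""
    carries = [c0]
    for gi, pi in zip(g, p):
        carries.append(gi | (pi & carries[-1]))
    return carries

def flat_carries(g, p, c0):
    """Carries [c0..c4], each written directly in terms of g, p and c0."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]

def flat_matches_ripple() -> bool:
    """Compare both forms over every combination of g, p and c0 bits."""
    for bits in product((0, 1), repeat=9):
        g, p, c0 = list(bits[:4]), list(bits[4:8]), bits[8]
        if flat_carries(g, p, c0) != ripple_carries(g, p, c0):
            return False
    return True
```

In `flat_carries` no carry depends on another carry, which is the property hardware exploits. In software the two functions cost about the same, so this sketch only demonstrates equivalence, not speed.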

A 4-bit CLA combines:

  • 4 full-adders (one per bit) for the sum and per-bit $p_i, g_i$.
  • A carry-lookahead unit (CLU) computing $c_1, \dots, c_4$ in parallel using the flattened formulas above.
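Putting the two parts together, a behavioural model of a 4-bit CLA might look like the following. This is a minimal sketch rather than a gate-level description, and `cla4` is a hypothetical name chosen here.

```python
def cla4(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """Add two 4-bit values; return (4-bit sum, carry-out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]

    # Per-bit cells: generate and propagate for every position
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]

    # Carry-lookahead unit: every carry depends only on g, p and c0
    c = [
        c0,
        g[0] | (p[0] & c0),
        g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0),
        g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0),
        g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
        | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0),
    ]

    # Sum bits: s_i = p_i XOR c_i
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]
```

For example, `cla4(9, 7)` gives `(0, 1)`: the 4-bit sum wraps to 0 and the carry-out is 1, since 9 + 7 = 16.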

Group propagate and generate

The CLU also produces group signals that summarise the whole 4-bit block:

$P = p_3 p_2 p_1 p_0$: the block as a whole propagates a carry only if every stage propagates.

$G = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$: the block as a whole generates a carry if any stage generates and all higher stages propagate.
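One way to see that the group signals summarise the block is to check that the block's carry-out equals G OR (P AND carry-in) for every input. The sketch below does that in Python; the function names are illustrative.

```python
from itertools import product

def group_signals(g, p):
    """Group propagate P and group generate G for a 4-bit block (bit 0 first)."""
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

def block_carry_out(g, p, c0):
    """Carry out of the block, computed stage by stage for reference."""
    c = c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
    return c

def group_form_holds() -> bool:
    """Check carry-out == G | (P & c0) over all g, p, c0 combinations."""
    for bits in product((0, 1), repeat=9):
        g, p, c0 = bits[:4], bits[4:8], bits[8]
        P, G = group_signals(g, p)
        if block_carry_out(g, p, c0) != (G | (P & c0)):
            return False
    return True
```

The check has the same shape as the single-bit recurrence, with the whole block standing in for one stage.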

These are the same recurrence applied at the next level up. Cascade four 4-bit CLAs with a higher-level lookahead carry unit (LCU) consuming their group $P$ and $G$ and you get a 16-bit adder with carry latency proportional to the number of lookahead levels (two here). Stack again for 64 bits, 256 bits, etc.

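$4^k$-bit hierarchical CLA

Generally for a $4^k$-bit CLA, four $4^{k-1}$-bit CLAs feed an LCU. The LCU computes group $P$ and $G$ from the per-block signals $P_j, G_j$, applying the same recurrence at the higher level:

$$C_{j+1} = G_j + P_j C_j, \qquad P = P_3 P_2 P_1 P_0, \qquad G = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0$$

The carry latency grows as $O(\log_4 n)$: three levels for 64 bits, four for 256.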
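A two-level, 16-bit instance of this scheme can be modelled in Python by reusing one lookahead function at both levels. This is a behavioural sketch under the assumption of 4-bit blocks; `lookahead`, `group` and `cla16` are hypothetical names.

```python
def lookahead(g, p, c0):
    """Flattened carries [c0, c1, c2, c3, c4] from four (g, p) pairs."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]

def group(g, p):
    """Group (P, G) summarising one 4-bit block."""
    P = p[3] & p[2] & p[1] & p[0]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    return P, G

def cla16(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """Add two 16-bit values; return (16-bit sum, carry-out)."""
    # Per-bit g/p for each of the four blocks (block 0 is least significant)
    gs = [[(a >> (4 * j + i)) & (b >> (4 * j + i)) & 1 for i in range(4)]
          for j in range(4)]
    ps = [[((a >> (4 * j + i)) ^ (b >> (4 * j + i))) & 1 for i in range(4)]
          for j in range(4)]

    # LCU: treat each block's (G, P) as a single "super bit"
    groups = [group(gs[j], ps[j]) for j in range(4)]
    block_P = [P for P, _ in groups]
    block_G = [G for _, G in groups]
    C = lookahead(block_G, block_P, c0)   # C[j] is the carry into block j

    # Each block resolves its internal carries from its own carry-in
    total = 0
    for j in range(4):
        c = lookahead(gs[j], ps[j], C[j])
        for i in range(4):
            total |= (ps[j][i] ^ c[i]) << (4 * j + i)
    return total, C[4]
```

For example, `cla16(0xFFFF, 1)` returns `(0, 1)`: every block propagates, so the carry generated in bit 0 reaches the carry-out through the LCU.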

Cost

The price of lower delay is more silicon. The CLU's gate count grows superlinearly because of the wider AND gates needed for the higher-order terms. In practice, designers limit the lookahead block size (4 bits is the textbook unit; 8 occasionally) and use hierarchy for the rest.
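To get a feel for that growth, the sketch below counts product terms and literals in a single flat lookahead unit of width `k`, assuming the two-level OR-of-ANDs form with no gate sharing. That is a simplification: real implementations share and factor terms, so these are rough indicators rather than gate counts.

```python
def flat_clu_cost(k: int) -> tuple[int, int, int]:
    """Return (max_fan_in, product_terms, literals) for a flat k-bit CLU."""
    max_fan_in = 0
    product_terms = 0
    literals = 0
    for j in range(1, k + 1):
        # c_j = g_{j-1} + p_{j-1} g_{j-2} + ... + p_{j-1} ... p_0 c_0
        term_sizes = range(1, j + 2)      # literals per term: 1, 2, ..., j+1
        product_terms += len(term_sizes)
        literals += sum(term_sizes)
        max_fan_in = max(max_fan_in, j + 1)   # widest AND (and OR) for c_j
    return max_fan_in, product_terms, literals
```

Under these assumptions `flat_clu_cost(4)` is `(5, 14, 34)` and `flat_clu_cost(8)` is `(9, 44, 164)`: doubling the block width nearly doubles the widest gate and multiplies the literal count by almost five.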

| Aspect | Ripple-carry | Carry-lookahead (hierarchical) |
|---|---|---|
| Worst-case carry delay | $O(n)$ stages | $O(\log n)$ levels |
| Gate count | $O(n)$ | approximately $O(n \log n)$ |
| Layout | Trivially regular, neighbour wiring | Group/level structure, longer wires |
| Fan-in pressure | Constant per stage | Grows with block size |
| When to use | Low-speed, narrow operands | Wide, high-speed datapaths |

Other fast-adder schemes

Carry-lookahead is the textbook example, but not the only fast adder. Brief mention:

  • Carry-skip adder: splits the operand into blocks; if a block has group propagate $P = 1$ (carries propagate through the entire block), the incoming carry "skips" past the block via a multiplexer. Simpler than full lookahead.
  • Carry-select adder: computes both possible sums for each block (assuming carry-in 0 and carry-in 1), then selects the right one once the carry arrives.
  • Kogge-Stone, Brent-Kung, Han-Carlson: parallel-prefix adders that organise the $(g, p)$ combination as a tree, optimising different points on the depth/wire/area trade-off. Modern CPU adders are usually parallel-prefix designs.
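To illustrate the parallel-prefix idea, here is a bit-parallel Python model of a Kogge-Stone style carry computation. It is a software sketch, not a netlist: each loop iteration stands for one prefix level, and the shifts stand for the wires that reach 1, 2, 4, ... positions back. The function name and the `width` parameter are assumptions made for this example.

```python
def kogge_stone_add(a: int, b: int, width: int = 16, c0: int = 0) -> tuple[int, int]:
    """Add two width-bit values; return (sum, carry-out)."""
    mask = (1 << width) - 1
    g = a & b & mask             # per-bit generate, one bit per position
    p = (a ^ b) & mask           # per-bit propagate

    # Fold the carry-in into bit 0 so the prefix network needs no special case
    G = g | (p & c0)
    P = p

    # log2(width) levels: combine each position with the one `dist` below it
    dist = 1
    while dist < width:
        G = (G | (P & (G << dist))) & mask
        P = (P & (P << dist)) & mask
        dist *= 2

    # Bit i of G is now the carry out of position i, i.e. the carry into i+1
    carries_in = ((G << 1) | c0) & mask
    total = (p ^ carries_in) & mask
    carry_out = (G >> (width - 1)) & 1
    return total, carry_out
```

With `width=16` the loop runs four times (distances 1, 2, 4, 8), compared with 16 dependent steps for a ripple chain. Brent-Kung and Han-Carlson compute the same prefix with fewer combination cells at the cost of extra levels.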

All of these reduce carry latency below the $O(n)$ of ripple, by paying with more gates and more wiring.

In context

For typical undergraduate digital logic, ripple-carry and carry-lookahead are the two canonical adders. Real CPUs use deeply optimised parallel-prefix adders, but the lookahead idea (break the carry dependency by computing $g$ and $p$ separately and combining them in parallel) is the foundational insight all of those build on.

For the cell-level building blocks, see Half-adder and Full-adder. For a comparison across the adder family, see Adder.