Binary Multiplication

Binary multiplication computes the product of two binary numbers — the digital analog of long multiplication. A 1-bit multiplication is just a logical AND: $a \times b$ is $1$ only when both $a$ and $b$ are $1$ . Multi-bit multiplication generates partial products by ANDing each bit of the multiplier with the entire multiplicand, then shifting and summing.

$a \times b$	$b = 0$	$b = 1$
$a = 0$	0	0
$a = 1$	0	1

An $m$ -digit multiplicand times an $n$ -digit multiplier produces a result up to $m + n$ digits wide.

Long multiplication

Same procedure as decimal long multiplication. For each bit of the multiplier, write down the multiplicand (if the bit is $1$ ) or zero (if the bit is $0$ ), shifted left by the bit’s position. Sum all the partial products.

For example, $1011 \times 0110$ (i.e., $11 \times 6 = 66$ ):

Each partial product is the multiplicand shifted left by the multiplier bit’s position when that bit is $1$ , or zero when it is $0$ (written right-aligned, so columns line up by place value):

1011 \times 0110 00001011010110000000001000010

The four partial products come from multiplier bits $0, 1, 1, 0$ (LSB first): $1011 \times 0$ ; $1011 \times 1$ shifted left 1; $1011 \times 1$ shifted left 2; $1011 \times 0$ shifted left 3. Their sum is $100001 0_{2} = 66$ .

Shift = power-of-2 multiply

Multiplying a number by $2^{k}$ shifts it left by $k$ bits. Dividing by $2^{k}$ shifts right. So binary multiplication of arbitrary numbers reduces to:

Shifts (free in hardware).
ANDs (one gate per bit pair).
Adds (need adders).

Signed 2’s complement multiplication

For 2’s complement operands, the multiplicand, multiplier, and all partial products must be sign-extended to the product bit width $m + n$ . Then perform the same multiplication. The result is read as a 2’s complement number of width $m + n$ .

Sign extension is required because each partial product is conceptually a multiplication by a 2’s-complement digit; without extending, the high bits get garbage.

Multiplying by a positive value gives a shorter computation (fewer non-zero partial products to sum).

Implementation: adder tree (unsigned)

A simple, slow approach. Generate $m \times n$ partial product bits with AND gates, then sum them with a tree of adders. For an $m \times n$ multiplier:

$m \times n$ AND gates.
$n - 1$ adders, each $m$ bits wide.
The first adder gets a zero input alongside the first partial product (the LSB of the first partial product is read directly as part of the result; the rest gets summed).

Slow because $n - 1$ adders in series form a long propagation path.

Implementation: iterative array (unsigned, faster)

Each cell of the array handles one bit position of one partial product, including the local AND, the adder, and carry propagation. The array is regular — the same cell layout repeats $m \times n$ times.

The first row gets zero as the incoming partial product sum. The rightmost cells get zero as carry-in.

This is the “carry-save adder array” approach: instead of fully propagating each partial product’s carries before summing the next, save them and propagate all at the end. Much faster than the adder tree.

Implementation: Baugh-Wooley (signed 2’s complement)

For 2’s complement multiplication, the Baugh-Wooley algorithm modifies the iterative array. Most cells are the same as the unsigned version; some cells use the complement of the AND (replacing AND with NAND) at positions $i = n - 1$ XOR $j = n - 1$ (one MSB, not both).

Cells in the first column and first row accept zero for the sum input; first row also accepts zero for carry-in. The final row uses full adders with one constant input to handle the sign-extension corrections.

The result handles negative operands without explicit sign extension, by encoding the corrections directly into the cell logic.

Trade-offs

Implementation	Speed	Hardware	Use
Adder tree	Slow	Many adders in series	Simple, low transistor count
Iterative array	Fast	Many cells, more silicon	Standard for unsigned multiplication
Baugh-Wooley	Fast	Modified iterative array	Standard for signed multiplication
Booth’s algorithm (not covered here)	Faster	More complex	Used in modern processors

Modern CPUs use even more sophisticated techniques (Wallace tree, Booth recoding) to compress the partial-product summing time. For the underlying Adder used as the building block, see that note.

Idriss Rami — Notes

Explorer