Binary multiplication computes the product of two binary numbers — the digital analog of long multiplication. A 1-bit multiplication is just a logical AND: is only when both and are . Multi-bit multiplication generates partial products by ANDing each bit of the multiplier with the entire multiplicand, then shifting and summing.
| 0 | 0 | |
| 0 | 1 |
An -digit multiplicand times an -digit multiplier produces a result up to digits wide.
Long multiplication
Same procedure as decimal long multiplication. For each bit of the multiplier, write down the multiplicand (if the bit is ) or zero (if the bit is ), shifted left by the bit’s position. Sum all the partial products.
For example, (i.e., ):
Each partial product is the multiplicand shifted left by the multiplier bit’s position when that bit is , or zero when it is (written right-aligned, so columns line up by place value):
The four partial products come from multiplier bits (LSB first): ; shifted left 1; shifted left 2; shifted left 3. Their sum is .
Shift = power-of-2 multiply
Multiplying a number by shifts it left by bits. Dividing by shifts right. So binary multiplication of arbitrary numbers reduces to:
- Shifts (free in hardware).
- ANDs (one gate per bit pair).
- Adds (need adders).
Signed 2’s complement multiplication
For 2’s complement operands, the multiplicand, multiplier, and all partial products must be sign-extended to the product bit width . Then perform the same multiplication. The result is read as a 2’s complement number of width .
Sign extension is required because each partial product is conceptually a multiplication by a 2’s-complement digit; without extending, the high bits get garbage.
Multiplying by a positive value gives a shorter computation (fewer non-zero partial products to sum).
Implementation: adder tree (unsigned)
A simple, slow approach. Generate partial product bits with AND gates, then sum them with a tree of adders. For an multiplier:
- AND gates.
- adders, each bits wide.
- The first adder gets a zero input alongside the first partial product (the LSB of the first partial product is read directly as part of the result; the rest gets summed).
Slow because adders in series form a long propagation path.
Implementation: iterative array (unsigned, faster)
Each cell of the array handles one bit position of one partial product, including the local AND, the adder, and carry propagation. The array is regular — the same cell layout repeats times.
The first row gets zero as the incoming partial product sum. The rightmost cells get zero as carry-in.
This is the “carry-save adder array” approach: instead of fully propagating each partial product’s carries before summing the next, save them and propagate all at the end. Much faster than the adder tree.
Implementation: Baugh-Wooley (signed 2’s complement)
For 2’s complement multiplication, the Baugh-Wooley algorithm modifies the iterative array. Most cells are the same as the unsigned version; some cells use the complement of the AND (replacing AND with NAND) at positions XOR (one MSB, not both).
Cells in the first column and first row accept zero for the sum input; first row also accepts zero for carry-in. The final row uses full adders with one constant input to handle the sign-extension corrections.
The result handles negative operands without explicit sign extension, by encoding the corrections directly into the cell logic.
Trade-offs
| Implementation | Speed | Hardware | Use |
|---|---|---|---|
| Adder tree | Slow | Many adders in series | Simple, low transistor count |
| Iterative array | Fast | Many cells, more silicon | Standard for unsigned multiplication |
| Baugh-Wooley | Fast | Modified iterative array | Standard for signed multiplication |
| Booth’s algorithm (not covered here) | Faster | More complex | Used in modern processors |
Modern CPUs use even more sophisticated techniques (Wallace tree, Booth recoding) to compress the partial-product summing time. For the underlying Adder used as the building block, see that note.