One-hot encoding

One-hot encoding represents one of $n$ states using $n$ bits, with exactly one bit set (“hot”) and the rest zero. State $i$ is $0 \dots 010 \dots 0$ with the $1$ at position $i$.

This is wasteful in bit count compared to binary (which packs $n$ states into $\lceil \log_2 n \rceil$ bits) but it has practical advantages.
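
As a minimal sketch of the definition above, the Python below builds the one-hot pattern for a state and compares its width with the binary width. The function names `one_hot` and `binary_width` are illustrative, and the convention that bit 0 is the rightmost character is an assumption.

```python
def one_hot(i: int, n: int) -> str:
    """Return state i of n as an n-character bit string (bit 0 on the right)."""
    if not 0 <= i < n:
        raise ValueError("state index out of range")
    return format(1 << i, f"0{n}b")


def binary_width(n: int) -> int:
    """Bits needed to encode n states in plain binary: ceil(log2(n)), at least 1."""
    return max(1, (n - 1).bit_length())


print(one_hot(5, 8))    # 00100000
print(binary_width(8))  # 3
```

For 8 states, one-hot spends 8 bits where binary spends 3.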

Where it’s used

Decoder outputs. A 3-to-8 decoder produces a one-hot 8-bit output: input $5$ → $00100000$ (bit 5 set, counting from bit 0 on the right).
Finite state machine encoding. Assign each FSM state to its own bit. Transitioning between states is just clearing one bit and setting another, and no decoding is needed before checking “are we in state X?”
Address decoding. Memory chip-select lines are one-hot: exactly one chip is selected at a time.
Multiplexer select. Some MUX designs use one-hot select lines (one wire per data input) instead of binary select.
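
The FSM item above can be modelled in software as a rough sketch. The state names (`IDLE`, `LOAD`, `RUN`, `DONE`) and the helpers `transition` and `is_one_hot` are hypothetical, and integers stand in for the register of flip-flops.

```python
# Hypothetical 4-state machine: each state owns one bit of the state register.
IDLE, LOAD, RUN, DONE = (1 << k for k in range(4))


def is_one_hot(state: int) -> bool:
    """True when exactly one bit of the state register is set."""
    return state != 0 and (state & (state - 1)) == 0


def transition(state: int, src: int, dst: int) -> int:
    """Move from src to dst by clearing src's bit and setting dst's bit."""
    if not state & src:
        raise ValueError("FSM is not in the source state")
    return (state & ~src) | dst


state = IDLE
state = transition(state, IDLE, LOAD)
print(format(state, "04b"))  # 0010
```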

Why it’s faster (sometimes)

To check “is state == 5?” with binary encoding in an 8-state machine, you need a 3-input AND of the state bits (with appropriate inversions). With one-hot, you check whether the bit for state 5 is set: a single wire, with no combinational logic required.
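
A software analogy of the two checks, as a sketch only (Python integers, not HDL). The function names are illustrative and the default `width=3` assumes an 8-state machine.

```python
def in_state_binary(code: int, target: int, width: int = 3) -> bool:
    """Binary check: every one of the `width` bits must match the target.

    In hardware this is a width-input AND, with an inverter on each
    position where the target has a 0.
    """
    return all(((code >> k) & 1) == ((target >> k) & 1) for k in range(width))


def in_state_one_hot(state: int, target: int) -> bool:
    """One-hot check: read the single bit that belongs to the target state."""
    return bool((state >> target) & 1)


print(in_state_binary(0b101, 5))        # True
print(in_state_one_hot(0b00100000, 5))  # True
```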

The trade-off is wires and flip-flops. A one-hot FSM with $n$ states uses $n$ flip-flops; a binary-encoded FSM uses $\lceil \log_2 n \rceil$. For small FSMs ($\leq 16$ states) one-hot usually wins on speed, especially in FPGAs where flip-flops are cheap and routing is the bottleneck.

Binary ↔ one-hot

A Decoder converts binary to one-hot: 2 input bits → 4 output lines.

An Encoder (or priority encoder) does the reverse: 4 input lines → 2 binary output bits.
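
Both conversions can be sketched in a few lines of Python. The function names are illustrative, and giving priority to the highest-numbered line in `priority_encode` is an assumption; real parts may prioritise differently.

```python
def decode(code: int, width: int) -> int:
    """Binary -> one-hot: a `width`-bit code selects one of 2**width lines."""
    if not 0 <= code < (1 << width):
        raise ValueError("code does not fit in width bits")
    return 1 << code


def encode(lines: int) -> int:
    """One-hot -> binary: index of the single set line."""
    if lines == 0 or lines & (lines - 1):
        raise ValueError("input is not one-hot")
    return lines.bit_length() - 1


def priority_encode(lines: int) -> int:
    """Index of the highest set line; tolerates several lines being set."""
    if lines == 0:
        raise ValueError("no line is set")
    return lines.bit_length() - 1


print(format(decode(2, 2), "04b"))  # 0100
print(encode(0b0100))               # 2
print(priority_encode(0b0110))      # 2
```

The plain encoder rejects inputs that are not one-hot, which is the case a priority encoder exists to handle.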

Other encoding choices

Several ways to represent one of $n$ states:

Binary: $\lceil \log_2 n \rceil$ bits, fully packed. Compact but needs decoding logic.
One-hot: $n$ bits, exactly one set. Decode-free but uses $n$ flip-flops.
One-cold: $n$ bits, exactly one cleared. Same trade-offs as one-hot but with inverted polarity. Used in some legacy systems where the inactive (idle) state should be all-1s.
Gray code: $\lceil \log_2 n \rceil$ bits, but adjacent states differ in only one bit. Used in K-maps and to reduce switching activity in FSMs. Same density as binary but with the adjacency property.
Johnson code (twisted-ring counter): $n$ bits cycling through $2 n$ states; each step shifts the register by one place and feeds the inverted bit from one end back into the other. Another decode-friendly compact code.
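
The less familiar codes in this list can be generated with short Python helpers. This is a sketch with illustrative function names; the Johnson counter is assumed to start from all zeros and to shift toward the right.

```python
def one_cold(i: int, n: int) -> str:
    """State i of n with exactly one bit cleared (bit 0 on the right)."""
    mask = (1 << n) - 1
    return format(~(1 << i) & mask, f"0{n}b")


def gray(i: int) -> int:
    """Reflected binary Gray code of i."""
    return i ^ (i >> 1)


def johnson_sequence(n: int) -> list[str]:
    """All 2n states of an n-bit Johnson counter, starting from all zeros."""
    state = [0] * n
    states = []
    for _ in range(2 * n):
        states.append("".join(map(str, state)))
        # Shift one place and feed the inverted last bit into the front.
        state = [1 - state[-1]] + state[:-1]
    return states


print(one_cold(5, 8))                               # 11011111
print([format(gray(i), "03b") for i in range(8)])
print(johnson_sequence(3))
```

With 3 bits, the Johnson counter visits 6 states: 000, 100, 110, 111, 011, 001.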

Each has its niche. One-hot dominates for small FSMs in FPGAs because of the speed-and-simplicity advantage.

One-hot in machine learning

The same idea is heavily used in machine learning, where it’s called one-hot encoding of categorical variables. To encode “color = red” out of {red, green, blue}, use the vector $[1, 0, 0]$. ML models can then process the categorical variable as a numeric input.

The downside is the same as in hardware: dimensionality. A categorical with 1000 categories needs 1000 features. Modern alternatives (embeddings, hashing tricks) compress this back down for high-cardinality categoricals.
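
A minimal pure-Python sketch of the categorical case follows. The function name `one_hot_encode` is illustrative, and ordering the categories by first appearance is an assumption; ML libraries provide their own encoders with their own ordering rules.

```python
def one_hot_encode(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return (categories, rows); each row is the one-hot vector of one value."""
    categories = list(dict.fromkeys(values))  # unique values, first-seen order
    index = {c: k for k, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


categories, rows = one_hot_encode(["red", "green", "blue", "red"])
print(categories)  # ['red', 'green', 'blue']
print(rows[0])     # [1, 0, 0]
```

Each row is as wide as the number of distinct categories, which is where the dimensionality cost comes from.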

When to choose one-hot

The classic use case in digital design: FSM state encoding for FPGAs.

For an FSM with 8 states:

Binary: 3 flip-flops, $\sim 5$ LUTs of decoding logic to identify states.
One-hot: 8 flip-flops, 0 LUTs of decoding (just check the right bit).

In an FPGA, each LUT is roughly the same area as one flip-flop, so the trade is comparable. But one-hot is faster, because state checks incur no decoding delay. For small-to-medium FSMs in FPGAs, one-hot wins.

For very large FSMs (50+ states), binary’s compactness eventually wins. Synthesis tools often choose automatically based on the state count.

Idriss Rami — Notes
