Majority voting (labelling)

Majority voting is the simplest way to combine multiple noisy labels into a single consensus label: ask several annotators to label the same example and take whichever label was assigned most often. If five workers label an image as cat, cat, cat, dog, cat, the majority vote is cat.
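A minimal sketch of the vote itself, using the five-worker example above. The function name `majority_vote` is hypothetical, and so is the tie rule. The note does not say what to do when two labels share the top count, so this sketch returns `None` and leaves tie-breaking to the caller.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label assigned most often, or None if the top count is tied.

    Hypothetical helper: the tie rule (return None) is an assumption.
    """
    if not labels:
        raise ValueError("need at least one label")
    # most_common() returns (label, count) pairs sorted by count, highest first.
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: needs a tie-breaking rule or another annotator
    return counts[0][0]


print(majority_vote(["cat", "cat", "cat", "dog", "cat"]))  # cat
```

In practice, asking an odd number of annotators on a two-class task rules ties out entirely.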

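The mathematical justification is the law of large numbers, formalised for voting as Condorcet's jury theorem. Take a binary label and an odd number $n$ of annotators. If each annotator independently gives the correct label with the same probability $p > 0.5$, the number of correct votes is binomial with mean $np > n/2$. The probability that a majority of them is wrong therefore shrinks exponentially in $n$; Hoeffding's inequality gives $P(\text{majority wrong}) \le e^{-2n(p - 1/2)^2}$. With enough annotators, the consensus is very nearly always correct, even when each individual annotator is only modestly better than a coin flip.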
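A sketch that checks the claim numerically. It makes the same assumptions as the argument: a binary label, an odd number of annotators, and independent annotators who are each correct with probability `p`. It computes the exact binomial probability that the majority is wrong and compares it with the Hoeffding bound $e^{-2n(p-1/2)^2}$. Both function names are hypothetical.

```python
from math import comb, exp


def majority_error(n, p):
    """Exact P(majority wrong) for n independent annotators, each correct
    with probability p, on a binary label. Assumes n is odd (no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of annotators to avoid ties")
    # The majority is wrong when at most (n - 1) / 2 annotators are correct.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1))


def hoeffding_bound(n, p):
    """Upper bound on P(majority wrong) from Hoeffding's inequality."""
    return exp(-2 * n * (p - 0.5) ** 2)


# Annotators who are right only 60% of the time.
for n in (1, 5, 25, 101):
    print(n, round(majority_error(n, 0.6), 4), round(hoeffding_bound(n, 0.6), 4))
```

The bound is loose for small panels but captures the exponential decay. For the exact error to become small when `p` is close to 0.5, `n` has to be large.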

The independence assumption is the catch. Suppose annotators share a systematic bias: they were all trained on the same flawed reference material, or they all misread the same kind of edge case. In that situation, adding more annotators doesn’t remove those errors. Their errors are correlated, and the consensus inherits the bias.
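A small simulation of the failure mode. The error model is an illustrative assumption and is not taken from the note. A fraction `p_shared_bias` of examples are "hard cases" on which every annotator makes the same mistake. On all other examples, each annotator is independently correct with probability `p_correct`. With independent errors, the majority error rate shrinks as the panel grows. With a shared bias, the hard cases set a floor on the error rate that no panel size removes. The function name and parameters are hypothetical.

```python
import random


def simulate_error_rate(n_annotators, p_correct, p_shared_bias,
                        trials=20_000, seed=0):
    """Fraction of examples on which a majority of n_annotators (odd) is wrong.

    Hypothetical error model: with probability p_shared_bias an example is a
    shared blind spot and every annotator errs; otherwise each annotator is
    independently correct with probability p_correct.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        if rng.random() < p_shared_bias:
            correct_votes = 0  # correlated error: everyone makes the same mistake
        else:
            correct_votes = sum(rng.random() < p_correct for _ in range(n_annotators))
        if correct_votes <= n_annotators // 2:  # no strict majority for the truth
            errors += 1
    return errors / trials


for n in (1, 5, 25):
    print(n,
          simulate_error_rate(n, 0.7, p_shared_bias=0.0),
          simulate_error_rate(n, 0.7, p_shared_bias=0.2))
```

In the second column the error rate falls towards zero. In the third it levels off a little above 0.2, the share of examples that are blind spots for everyone.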

Majority voting is the standard baseline for aggregating crowdsourced labels. A more sophisticated variant, confidence-weighted labelling, weights each annotator’s vote by how trustworthy they’ve been on prior tasks. An annotator with high accuracy on gold-standard questions (items whose correct label is already known) gets more weight than one with low accuracy. Active learning addresses a different part of the problem. Instead of changing how votes are combined, it lets the model itself choose which examples to send out for voting, focusing the labelling budget on the most informative cases.
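A sketch of both ideas, with assumptions stated up front because the note fixes no formulas. For weighting, it uses the log-odds of each annotator's gold-standard accuracy, $\log(a / (1 - a))$. This is one common choice that comes from the two-class, independent-annotator case. Weights are floored at zero so below-chance annotators are ignored. For active learning, it uses uncertainty sampling: the examples whose model-predicted class probabilities have the highest entropy are sent out first. Uncertainty sampling is only one of several query strategies. All names here are hypothetical.

```python
import math
from collections import defaultdict


def weighted_vote(votes, accuracy, eps=1e-3):
    """Confidence-weighted vote.

    votes:    {annotator_id: label}
    accuracy: {annotator_id: accuracy on gold-standard items}
    Returns the label with the highest total weight, or None if every weight is 0.
    """
    scores = defaultdict(float)
    for annotator, label in votes.items():
        a = min(max(accuracy[annotator], eps), 1 - eps)  # keep the log finite
        # Log-odds weight, floored at zero so below-chance annotators count for nothing.
        scores[label] += max(0.0, math.log(a / (1 - a)))
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def most_uncertain(predictions, k):
    """Uncertainty sampling: the k example ids with the highest predictive entropy.

    predictions: {example_id: list of class probabilities from the current model}
    """
    def entropy(probs):
        return -sum(q * math.log(q) for q in probs if q > 0)

    return sorted(predictions, key=lambda ex: entropy(predictions[ex]), reverse=True)[:k]


# One reliable annotator outvotes two weak ones; plain majority would say "cat".
print(weighted_vote({"A": "dog", "B": "cat", "C": "cat"},
                    {"A": 0.95, "B": 0.6, "C": 0.6}))  # dog
print(most_uncertain({"x1": [0.9, 0.1], "x2": [0.5, 0.5], "x3": [0.99, 0.01]}, k=2))
```

In the first call, annotator A's weight is $\log 19 \approx 2.94$. That outweighs the combined $2 \log 1.5 \approx 0.81$ for "cat".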

Idriss Rami — Notes
