An artifact is a pattern in a signal that looks like a feature but isn’t real, caused by something other than the phenomenon being measured. A spike on an ECG trace caused by the patient briefly touching the lead wire is an artifact: it looks like a heartbeat but isn’t one. The reading reflects external interference, not the underlying cardiac signal.
Artifact vs. noise. Noise obscures real features; it adds randomness on top of the signal, making the real pattern harder to see. An artifact mimics a feature, creating a pattern in the data that looks real but doesn’t reflect what we wanted to measure. In the literature the terms are often used interchangeably, but technically:
- Noise hides features. Smoothing reduces it.
- Artifacts mimic features. Smoothing can hide them but doesn’t correct the corrupted data underneath.
Common artifact sources:
- Electrode movement in an ECG or EEG recording (a touch, a stretch, a lead becoming loose) produces brief spikes or step changes that look like real signal but aren’t.
- Power-line hum at 50 or 60 Hz (depending on the country) appears as a periodic ripple, easily mistaken for a real periodic signal.
- Saturation when a signal exceeds the sensor’s dynamic range produces flat plateaus at the maximum or minimum value. The plateaus are artifacts because the real signal keeps varying beyond the range the sensor can report.
- Quantization when the sensor’s resolution is too coarse for the signal’s amplitude: the recording looks stepped rather than smooth.
- Aliasing when the sampling rate is too low for the signal’s frequency content: real content above half the sampling rate gets folded back into the low-frequency band, where it appears as a pattern that was never in the signal.
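The aliasing case can be checked with a few lines of arithmetic. The sketch below is only an illustration: the 10 Hz sampling rate, the 9 Hz tone, and the helper name `alias_frequency` are all made up for the example. It shows that a 9 Hz tone sampled at 10 Hz yields exactly the same samples as an inverted 1 Hz tone, so nothing in the recording can tell the two apart.

```python
import math

def alias_frequency(f_signal, f_sample):
    """Apparent frequency of a pure tone at f_signal after sampling at f_sample."""
    f = f_signal % f_sample          # fold into [0, f_sample)
    return min(f, f_sample - f)      # reflect into [0, f_sample / 2]

fs = 10.0       # sampling rate in Hz (illustrative)
f_true = 9.0    # true tone in Hz, above half the sampling rate (5 Hz)

alias = alias_frequency(f_true, fs)

# Sample the real 9 Hz tone, and a sign-inverted tone at the alias frequency.
times = [n / fs for n in range(50)]
sampled = [math.sin(2 * math.pi * f_true * t) for t in times]
imposter = [-math.sin(2 * math.pi * alias * t) for t in times]

# The two sample sequences agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(sampled, imposter))
print(alias)             # 1.0
print(max_diff < 1e-9)   # True
```

Because the samples are identical, no filter applied after sampling can undo the damage. The fix has to happen upstream, with a higher sampling rate or an analog low-pass filter in front of the sampler.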
The right response to artifacts is usually different from the right response to noise. For noise, smooth the signal (e.g., with a moving-average filter). For artifacts, identify the cause and either fix it upstream (better electrode contact, properly shielded cables) or detect and reject the affected segments. A moving-average filter applied to an artifact might smooth it into something less obvious, but the data underneath is still wrong, and any model trained on it will learn the wrong things.
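A small experiment makes the contrast concrete. This is a minimal sketch under made-up conditions: a flat baseline, Gaussian noise with standard deviation 0.1, a single-sample spike of height 5, and a 9-sample window. The `moving_average` helper is written for the example and is not taken from any library. Smoothing shrinks the noise, but it turns the one bad sample into nine bad samples of lower amplitude.

```python
import random
import statistics

def moving_average(x, width):
    """Centered moving average; near the edges the window is simply shorter."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

random.seed(0)
n = 200

# Case 1: random noise on a flat (all-zero) baseline.
noisy = [random.gauss(0.0, 0.1) for _ in range(n)]

# Case 2: the same flat baseline with one spike artifact and no noise.
spiked = [0.0] * n
spiked[100] = 5.0

smoothed_noise = moving_average(noisy, 9)
smoothed_spike = moving_average(spiked, 9)

# Noise: the spread around the baseline gets smaller.
noise_before = statistics.pstdev(noisy)
noise_after = statistics.pstdev(smoothed_noise)

# Artifact: count the samples that differ from the true baseline of zero.
affected_before = sum(1 for v in spiked if v != 0.0)
affected_after = sum(1 for v in smoothed_spike if v != 0.0)
peak_after = max(smoothed_spike)

print(noise_after < noise_before)        # True
print(affected_before, affected_after)   # 1 9
```

The smoothed spike peaks at 5/9 of a unit instead of 5, so it is harder to spot by eye or by a simple threshold, yet it now contaminates nine samples instead of one.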
In machine-learning pipelines, knowing the difference matters. A good preprocessing pipeline detects common artifacts (saturation plateaus, sudden steps, missing-data segments) and either flags them or removes the affected windows entirely. Treating artifacts as if they were noise, running them through a smoothing filter and hoping, leads to models that work in the lab and fail in the field.
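One way such a detection step could look is sketched below. Everything in it is an assumption made for illustration: the function names `flag_artifacts` and `clean_windows`, the sensor rails of -5 and 5, the thresholds, and the synthetic test signal. Real thresholds depend on the sensor and the signal. The sketch flags missing samples (NaN), runs pinned at a rail, and sudden jumps between neighbouring samples, then keeps only the windows with no flagged sample.

```python
import math

def flag_artifacts(x, lo, hi, min_run=3, max_step=1.0):
    """Return a list of booleans, True where a sample looks like an artifact."""
    n = len(x)
    bad = [False] * n

    # Missing data: NaN samples.
    for i, v in enumerate(x):
        if math.isnan(v):
            bad[i] = True

    # Saturation: runs of at least min_run samples pinned at either rail.
    run_start = None
    for i in range(n + 1):
        pinned = i < n and not math.isnan(x[i]) and (x[i] <= lo or x[i] >= hi)
        if pinned and run_start is None:
            run_start = i
        elif not pinned and run_start is not None:
            if i - run_start >= min_run:
                for j in range(run_start, i):
                    bad[j] = True
            run_start = None

    # Sudden steps: a jump between neighbours larger than max_step.
    for i in range(1, n):
        if math.isnan(x[i]) or math.isnan(x[i - 1]):
            continue
        if abs(x[i] - x[i - 1]) > max_step:
            bad[i - 1] = True
            bad[i] = True
    return bad

def clean_windows(x, bad, width):
    """Split x into non-overlapping windows; keep those with no flagged sample."""
    kept = []
    for start in range(0, len(x) - width + 1, width):
        if not any(bad[start:start + width]):
            kept.append((start, x[start:start + width]))
    return kept

# Synthetic signal: a slow sine with three injected artifacts.
signal = [0.5 * math.sin(0.3 * i) for i in range(40)]
signal[5] = float("nan")            # dropped sample
signal[10:14] = [5.0] * 4           # plateau at the upper rail
for i in range(28, 40):             # sudden baseline step
    signal[i] += 2.0

bad = flag_artifacts(signal, lo=-5.0, hi=5.0)
kept = clean_windows(signal, bad, width=8)
print([start for start, _ in kept])  # [16, 32]
```

Three of the five windows are rejected, one per injected artifact. The last window is kept even though it sits on the shifted baseline, because this sketch flags only the transition itself. Whether data recorded after a step is still usable is a separate decision that depends on what caused the step.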