In feature extraction, a window is a contiguous segment of a signal over which features are computed. The signal is divided into windows, features are extracted within each, and the model sees a sequence of feature vectors, one per window, instead of the raw signal.

Three design choices govern windowed feature extraction:

Window size

A smaller window gives us finer time resolution: more windows per unit time, more feature vectors, and closer tracking of how the signal changes. But each window contains fewer samples, so the features computed from it are noisier. A larger window gives us steadier features but coarser time resolution.

The rule of thumb: pick a window roughly as long as the events we’re trying to characterize. A heartbeat lasts on the order of 0.5 to 1 second, so for ECG sampled at 500 Hz we’d typically use windows of 250 to 500 samples. A walking footstep is about 1 second, so for accelerometer-based gait recognition we’d use windows of similar duration. In practice the choice is empirical: try a few sizes and see which performs best on a validation set.
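The conversion from event duration to window length can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the helper name window_length_samples and the intermediate 0.75 s candidate are assumptions made for the example, while the 500 Hz rate and the 0.5 to 1 second range come from the ECG case above.

```python
def window_length_samples(event_duration_s: float, sampling_rate_hz: float) -> int:
    """Number of samples that span one event at the given sampling rate."""
    return round(event_duration_s * sampling_rate_hz)

# Candidate window sizes to compare on a validation set (ECG at 500 Hz).
candidates = [window_length_samples(d, 500) for d in (0.5, 0.75, 1.0)]
print(candidates)  # [250, 375, 500]
```

Each candidate would then be evaluated on the validation set and the best-performing size kept.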

Window overlap

Windows can be placed:

  • Back-to-back, no overlap. Every sample belongs to exactly one window. Simple, and it yields the fewest windows and feature vectors per signal. The risk is that an event straddling the boundary between two windows is split, so neither window captures it whole.
  • Overlapping. Successive windows share some samples. 50% overlap is common — each new window starts halfway through the previous one. Produces smoother feature trajectories and reduces boundary problems.
  • Step-of-one sliding. Advance the window by one sample at a time. Every sample position gets its own window, either centered on it or ending at it, so there is a feature value at every position (apart from the edges, where the window is incomplete). Expensive, but often acceptable for offline analysis.
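The three placements differ only in how far the window advances each time. The sketch below assumes NumPy and uses a hypothetical helper, make_windows, written for this illustration; it keeps full windows only, so trailing samples that do not fill a window are dropped.

```python
import numpy as np

def make_windows(signal, size, step):
    """Return a 2D array with one row per full window of `size` samples,
    advancing the start position by `step` samples each time."""
    signal = np.asarray(signal)
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(signal) < size:
        return np.empty((0, size), dtype=signal.dtype)
    starts = np.arange(0, len(signal) - size + 1, step)
    return np.stack([signal[s:s + size] for s in starts])

signal = np.arange(10)  # toy signal: 0, 1, ..., 9

back_to_back = make_windows(signal, size=4, step=4)  # no overlap
half_overlap = make_windows(signal, size=4, step=2)  # 50% overlap
step_of_one = make_windows(signal, size=4, step=1)   # advance by one sample

print(back_to_back.shape)  # (2, 4); samples 8 and 9 do not fill a window
print(half_overlap.shape)  # (4, 4)
print(step_of_one.shape)   # (7, 4)
```

Setting step equal to size gives back-to-back windows, step equal to half of size gives 50% overlap, and step equal to 1 gives the sliding form.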

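Overlap is usually preferable to no overlap, because it produces smoother feature trajectories and makes it less likely that an event is split at a boundary; the cost is more windows to compute. The step-of-one form is what df['col'].rolling(N).mean() computes in pandas. Note that by default pandas aligns each window so that it ends at the current sample rather than being centered on it (pass center=True for centered windows), and the first N-1 outputs are NaN because their windows are incomplete.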
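A small illustration of the pandas rolling call mentioned above, comparing its default alignment with center=True. The series values are made up for the example.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: each window of 3 samples ends at the current position,
# so the first two outputs are NaN (incomplete windows).
trailing = s.rolling(3).mean()

# center=True: each window is centered on the current position,
# so one NaN appears at each end instead.
centered = s.rolling(3, center=True).mean()

print(trailing.tolist())  # [nan, nan, 2.0, 3.0, 4.0]
print(centered.tolist())  # [nan, 2.0, 3.0, 4.0, nan]
```

The values are the same; only the position each value is assigned to changes, which matters when features must line up with labels or with other signals.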

Number of features

More features give us more to work with, but push us toward the curse of dimensionality. If we extract 30 features per window from each of 12 IMU sensors, we have 360 features per training example, and many models will struggle with that many dimensions unless we have a large amount of training data.

The right number is a balance: enough features to capture the relevant variation, but not so many that dimensionality blows up. A common starting point is 5 to 10 statistical features per channel (for example mean, standard deviation, max, min, skewness, and kurtosis), followed by feature selection or PCA if needed.
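As a sketch of what such a starting point might look like, the function below computes the six statistics listed above for every channel of every window. It assumes NumPy and an input array of shape (n_windows, window_size, n_channels); the function name window_features, the synthetic data, and the choice of population (biased) skewness and excess kurtosis are assumptions made for this illustration.

```python
import numpy as np

def window_features(windows):
    """Compute mean, std, max, min, skewness and excess kurtosis per channel.

    windows: array of shape (n_windows, window_size, n_channels)
    returns: array of shape (n_windows, 6 * n_channels)
    """
    windows = np.asarray(windows, dtype=float)
    mean = windows.mean(axis=1)
    std = windows.std(axis=1)
    # Standardize within each window. A constant window has std == 0;
    # dividing by 1 instead avoids NaN (skewness is then 0, kurtosis -3).
    safe_std = np.where(std > 0, std, 1.0)
    z = (windows - mean[:, None, :]) / safe_std[:, None, :]
    skewness = (z ** 3).mean(axis=1)
    kurtosis = (z ** 4).mean(axis=1) - 3.0  # excess kurtosis
    feats = [mean, std, windows.max(axis=1), windows.min(axis=1), skewness, kurtosis]
    return np.concatenate(feats, axis=1)

# Synthetic data: 20 windows of 250 samples from 12 channels.
rng = np.random.default_rng(0)
windows = rng.normal(size=(20, 250, 12))

X = window_features(windows)
print(X.shape)  # (20, 72)
```

With 6 features on 12 channels the result has 72 columns per window, far fewer than the 360 in the earlier example; feature selection or PCA could then be applied to X if that is still too many.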

These three choices (size, overlap, and feature count) interact. A small window with heavy overlap and many features produces a long sequence of detailed feature vectors. A large window with no overlap and few features produces a short sequence of coarse summaries. The right combination depends on the task.
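To make the interaction concrete, the sketch below counts how many feature vectors two configurations would produce for the same recording. The recording length (60 seconds at 500 Hz), the window sizes, and the feature counts are hypothetical values chosen for illustration, and n_windows is a helper written for this example that counts full windows only.

```python
def n_windows(n_samples: int, size: int, step: int) -> int:
    """Number of full windows of `size` samples when advancing by `step`."""
    if n_samples < size:
        return 0
    return (n_samples - size) // step + 1

n_samples = 60 * 500  # hypothetical 60 s recording sampled at 500 Hz

# Small window, 50% overlap, many features per window.
detailed = (n_windows(n_samples, size=250, step=125), 30)
# Large window, no overlap, few features per window.
coarse = (n_windows(n_samples, size=2500, step=2500), 6)

print(detailed)  # (239, 30): sequence length, features per window
print(coarse)    # (12, 6)
```

The same recording becomes either a sequence of 239 vectors with 30 features each or a sequence of 12 vectors with 6 features each, which is the trade-off between detail and compactness described above.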