Pandas rolling

Pandas rolling creates a rolling window over a Pandas Series or DataFrame and applies windowed computations: moving averages, rolling sums, rolling standard deviations, rolling maxima, anything that’s a function of the values in the current window. It’s the engine behind both moving-average filtering and windowed feature extraction.

Two steps: build a rolling object with .rolling(), then chain on an aggregation method.

```python
import pandas as pd

df = pd.read_csv("data.csv")

# Moving average with window size 5
y_smoothed = df['signal'].rolling(window=5).mean()

# Rolling standard deviation
y_std = df['signal'].rolling(window=125).std()

# Rolling maximum
y_max = df['signal'].rolling(window=125).max()

# Rolling skewness and kurtosis
y_skew = df['signal'].rolling(window=125).skew()
y_kurt = df['signal'].rolling(window=125).kurt()
```

The window slides one sample at a time and, by default, ends at the current sample. At each position, the chained aggregation method computes a value from the $N$ samples currently in the window. The result is a Series the same length as the input whose first $N - 1$ entries are NaN, because the window doesn’t yet hold enough samples to compute an aggregate at those positions.

Pandas leaves those leading NaNs in place rather than fabricating values. Downstream code typically drops them with .dropna() before using the rolling result.
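To make the NaN prefix concrete, here is a small sketch on a made-up ten-sample Series. The values and the window size of 3 are illustrative assumptions, not taken from data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy signal: 0.0, 1.0, ..., 9.0
signal = pd.Series(np.arange(10, dtype=float))

N = 3
smoothed = signal.rolling(window=N).mean()

# Same length as the input; the first N - 1 entries are NaN
n_leading_nan = int(smoothed.isna().sum())

# Drop the incomplete windows before using the result downstream
clean = smoothed.dropna()

# First valid value is the mean of samples 0, 1, 2
first_valid = clean.iloc[0]
```

Note that .dropna() keeps the original index labels, so the cleaned Series starts at label $N - 1$ rather than 0.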

Useful arguments:

window=N: the window size in samples.
min_periods=k: the minimum number of non-NaN samples a window must contain to produce an output (defaults to the window size for an integer window). Lowering it produces output earlier, but those early values are computed from fewer than $N$ samples and are correspondingly less reliable.
center=True: center the window on the current sample rather than ending at it. Each output then depends on future samples, so the filter is no longer causal, but the output is aligned with the middle of its window instead of lagging the input.
step=s: evaluate the window only at every $s$-th position rather than at every sample. Reduces output length when fine resolution isn’t needed. Added in pandas 1.5 (2022); older installs raise a TypeError.
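A rough sketch of how these arguments change the output, again on a made-up ten-sample Series (the toy values and the window size of 5 are assumptions for illustration). The step example needs pandas 1.5 or newer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy signal: 0.0, 1.0, ..., 9.0
signal = pd.Series(np.arange(10, dtype=float))

# min_periods=1: output starts at the first sample,
# averaging however many samples are available so far
early = signal.rolling(window=5, min_periods=1).mean()

# center=True: the window is centered on each sample,
# so the NaNs are split between both ends of the output
centered = signal.rolling(window=5, center=True).mean()

# step=2 (pandas >= 1.5): keep only every second window position
strided = signal.rolling(window=5, step=2).mean()
```

With this input, early has no NaNs at all, centered has two NaNs at each end, and strided keeps only the rows at labels 0, 2, 4, 6, 8.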

The Pandas rolling machinery is implemented in Cython and compiled to C, so the built-in aggregations are much faster than the equivalent loop written in Python. For feature extraction over long signals, this matters: a one-minute ECG at 500 Hz is 30,000 samples, and a Python loop that re-aggregates the window at each of them would be slow.
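As a sanity check on what the built-in method computes, the sketch below compares it with a naive pure-Python loop. The signal is synthetic random noise standing in for a one-minute recording at 500 Hz (not real ECG data), and rolling_mean_loop is a hypothetical helper written only for this comparison.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for one minute sampled at 500 Hz
fs = 500
rng = np.random.default_rng(0)
signal = pd.Series(rng.standard_normal(60 * fs))

N = 125

def rolling_mean_loop(values, window):
    """Pure-Python reference: trailing mean, NaN until the window is full."""
    out = [float("nan")] * len(values)
    for i in range(window - 1, len(values)):
        out[i] = sum(values[i - window + 1 : i + 1]) / window
    return out

loop_result = np.array(rolling_mean_loop(signal.tolist(), N))
fast_result = signal.rolling(window=N).mean().to_numpy()

# Both approaches should agree up to floating-point rounding
agree = np.allclose(loop_result, fast_result, equal_nan=True)
```

Wrapping each of the two calls in timeit is an easy way to measure the speed difference on a given machine; no timings are quoted here because they depend on hardware and pandas version.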

Idriss Rami — Notes
