Reinforcement learning

Reinforcement learning (RL) is the family of machine-learning algorithms where a model — usually called an agent — learns by interacting with an environment. It takes actions, gets rewards or penalties depending on whether the actions worked out, and gradually learns a policy that maximizes the cumulative reward over time.

The framing is different from Supervised learning and Unsupervised learning:

Supervised learning has labelled data — a fixed dataset of (input, correct answer) pairs.
Unsupervised learning has unlabelled data — a fixed dataset of inputs, structure to be discovered.
Reinforcement learning has no fixed dataset. The agent generates its own experience by acting, and learns from the rewards that the environment provides in response.
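
To make "generates its own experience" concrete, here is a minimal sketch of the agent-environment loop. The `LineWorld` environment, the `collect_episode` helper, and the random policy are all hypothetical names invented for this illustration; they do not come from any library or from the course.

```python
import random

class LineWorld:
    """Hypothetical toy environment: positions 0..4, start at 0, goal at 4."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, anything else moves left (clipped at the edges)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.size - 1)
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.state, reward, done


def collect_episode(env, policy, max_steps=100):
    """Run one episode and return the transitions the agent experienced."""
    experience = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        experience.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return experience


rng = random.Random(0)
random_policy = lambda state: rng.choice([0, 1])  # an untrained agent
episode = collect_episode(LineWorld(), random_policy)
print(len(episode), "transitions, total reward:", sum(r for _, _, r, _ in episode))
```

There is no dataset anywhere in this code: the list of (state, action, reward, next state) tuples exists only because the agent acted, and a different policy would produce different data.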

This is how AlphaZero-style chess and Go engines, and game-playing agents like the ones that learned to play Atari games from raw pixels, are trained. A board-game agent plays many games against itself or against fixed opponents, gets reward for winning and penalty for losing, and slowly learns a policy that picks good moves in arbitrary positions. For Atari-style games the reward is the game score rather than a win or a loss.

The standard formal framework is the Markov Decision Process (MDP): at each timestep, the agent is in some state, picks an action, transitions to a new state, and receives a reward. The agent’s job is to learn a policy, a mapping from states to actions, that maximizes expected cumulative reward. Algorithms include Q-learning (which learns a value function over state-action pairs), policy gradient methods (which learn the policy directly), and the modern deep-RL methods that combine these ideas with deep neural networks: DQN builds on Q-learning, while A3C and PPO build on policy gradients.
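
As a rough illustration of Q-learning on an MDP, the sketch below runs the tabular version on a hypothetical five-state chain (states 0 to 4, reward 1 for reaching state 4). The environment, the function names, and the hyperparameters (`alpha`, `gamma`, `epsilon`, the episode count, the step cap) are assumptions chosen for the example, not values from the course.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal
ACTIONS = [0, 1]  # 0 = left, 1 = right


def step(state, action):
    """Deterministic transition: move one cell, clipped at the edges."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap so an episode always ends
            # epsilon-greedy: explore sometimes, and break ties at random
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update: move q toward reward + discounted best next value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# Greedy policy read off the learned table, for the non-terminal states
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

With these settings the greedy policy should end up choosing "right" in every non-terminal state, since that is the shortest path to the only reward. The table `q` is the "value function over state-action pairs"; deep-RL methods such as DQN replace it with a neural network when there are too many states to enumerate.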

RL has produced some of the most spectacular machine-learning results: AlphaGo beating world champions at Go, OpenAI Five beating the reigning Dota 2 world champions, robotic systems learning manipulation from scratch. It’s also brittle and sample-inefficient compared to supervised learning: an RL agent often needs millions of environment interactions to become competent, where a supervised model can often learn a task of similar difficulty from thousands of labelled examples.

For the Introduction to Data Science course, reinforcement learning is mentioned as one of the three ML families but not developed further. The course focuses on Supervised learning (Regression for continuous outputs, classification for discrete outputs), the family in which the standard ML toolkit (Gradient descent, train/test splits, Logistic regression, Confusion matrix evaluation) is introduced.

Idriss Rami — Notes
