The instruction execution cycle is the sequence of steps a processor performs for every instruction: Fetch, Decode, Execute, Memory, Writeback. In a simple non-pipelined design, the five stages take five clock cycles per instruction; a pipelined processor overlaps the stages so that, in the ideal case, throughput approaches one instruction per cycle.

The cycle is the same for every RISC-style instruction. What varies between instruction types is what each stage does: the Memory stage accesses memory for Loads and Stores but only passes the result along for plain ALU ops, and the Writeback stage updates a register for ALU ops and Loads but not for Stores.

The five stages

Stage 1 — Fetch

Pull the instruction at PC from memory into IR; increment PC.

In register transfer notation:

IR ← [[PC]]
PC ← [PC] + 4

The brackets matter: [PC] is “contents of PC” (the address); [[PC]] is “contents of memory at that address” (the instruction itself). The increment of 4 assumes 4-byte instructions.

This is always a memory read. See Memory Read and Write Operations.
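As a rough illustration of the Fetch stage, here is a minimal Python sketch. The dictionary-based memory, the `fetch` helper, and the 4-byte instruction size are all assumptions made for the example, not details taken from a particular processor.

```python
# Hypothetical model: memory maps a byte address to an instruction,
# and every instruction is assumed to occupy 4 bytes.
INSTRUCTION_BYTES = 4  # assumption for this sketch


def fetch(memory, pc):
    """Fetch stage: return (ir, next_pc)."""
    ir = memory[pc]                    # IR <- contents of memory at the address in PC
    next_pc = pc + INSTRUCTION_BYTES   # PC <- PC + instruction size
    return ir, next_pc


memory = {0x100: "add r3, r1, r2", 0x104: "ldw r4, 8(r3)"}
ir, pc = fetch(memory, 0x100)
print(ir, hex(pc))
```

Note that the memory read uses the old PC value; the increment only affects which instruction the next Fetch sees.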

Stage 2 — Decode / Read

Decode the opcode in IR; read the source register operands from the register file into the inter-stage registers RA and RB.

The control unit examines IR’s opcode to determine which type of instruction this is and which control signals to assert in the upcoming stages.
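The decode step can be sketched in Python as a table lookup plus two register reads. The signal names (`mem_read`, `mem_write`, `reg_write`), the pre-split instruction tuple, and the `decode` helper are hypothetical; real control units generate many more signals from an encoded instruction word.

```python
# Hypothetical control-signal table; the signal names are illustrative only.
CONTROL_SIGNALS = {
    "add": {"mem_read": False, "mem_write": False, "reg_write": True},
    "ldw": {"mem_read": True,  "mem_write": False, "reg_write": True},
    "stw": {"mem_read": False, "mem_write": True,  "reg_write": False},
}


def decode(ir, register_file):
    """Stage 2: decode the opcode and read both source registers."""
    opcode, src_a, src_b = ir            # assumed pre-split instruction fields
    signals = CONTROL_SIGNALS[opcode]    # control unit selects signals for stages 3-5
    ra = register_file[src_a]            # RA <- contents of first source register
    rb = register_file[src_b]            # RB <- contents of second source register
    return ra, rb, signals


register_file = [0, 10, 20, 30]
ra, rb, signals = decode(("stw", 1, 2), register_file)
print(ra, rb, signals)
```

The table reflects the behaviour described in this note: only loads read memory, only stores write it, and stores do not write a register.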

Stages 3–5 — Vary by instruction type

Each instruction type uses these stages differently:

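| Instruction type | Stage 3 (Execute) | Stage 4 (Memory) | Stage 5 (Writeback) |
|---|---|---|---|
| ALU (Add, Sub, etc.) | RZ ← [RA] op [RB] | RY ← [RZ] | destination register ← [RY] |
| Load | RZ ← [RA] + offset | RY ← memory at [RZ] | destination register ← [RY] |
| Store | RZ ← [RA] + offset; RM ← [RB] | memory at [RZ] ← [RM] | (no action) |
| Call | PC_Temp ← [PC]; PC ← target address | RY ← [PC_Temp] | link register ← [RY] |
| Return | PC ← [RA] | (no action) | (no action) |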

What’s happening in each row

ALU operations (add, subtract, etc.):

  • Stage 3: ALU computes the result, latched into RZ.
  • Stage 4: RZ moves to RY (just a pass-through, but the inter-stage register lets the next stage read consistently).
  • Stage 5: RY’s value writes back to the destination register.

Loads (ldw):

  • Stage 3: ALU computes the effective memory address (base + offset), result in RZ.
  • Stage 4: Memory is read at that address; the data goes into RY.
  • Stage 5: RY writes back to the destination register.

Stores (stw):

  • Stage 3: ALU computes the effective address (RZ); the value to store loads into RM.
  • Stage 4: Memory writes RM to address RZ.
  • Stage 5: Nothing — store has no destination register.

Call:

  • Stage 3: Save the current PC (already incremented during Fetch, so it is the return address) into PC_Temp; set PC to the call target.
  • Stage 4: Move PC_Temp into RY.
  • Stage 5: Write RY to the link register.

Return:

  • Stage 3: Set PC to the contents of RA (which holds the link register’s value).
  • Stages 4–5: Nothing.
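The per-type behaviour of stages 3 to 5 described above can be traced with a small Python sketch. The instruction dictionaries, the `LINK_REGISTER` index, and the `run_stages_3_to_5` helper are hypothetical; the local variables `rz`, `ry`, `rm`, and `pc_temp` stand in for the inter-stage registers named in the text.

```python
import operator

LINK_REGISTER = 31  # hypothetical register index used to hold the return address


def run_stages_3_to_5(instr, regs, memory, pc, ra, rb):
    """Apply Execute, Memory, Writeback for one instruction; return the new PC."""
    kind = instr["kind"]
    if kind == "alu":
        rz = instr["op"](ra, rb)        # Stage 3: ALU result into RZ
        ry = rz                         # Stage 4: pass-through RZ -> RY
        regs[instr["dst"]] = ry         # Stage 5: write destination register
    elif kind == "load":
        rz = ra + instr["offset"]       # Stage 3: effective address
        ry = memory[rz]                 # Stage 4: memory read into RY
        regs[instr["dst"]] = ry         # Stage 5: write destination register
    elif kind == "store":
        rz = ra + instr["offset"]       # Stage 3: effective address ...
        rm = rb                         # ... and the value to store into RM
        memory[rz] = rm                 # Stage 4: memory write
        # Stage 5: nothing
    elif kind == "call":
        pc_temp = pc                    # Stage 3: save return address ...
        pc = instr["target"]            # ... and redirect PC to the call target
        ry = pc_temp                    # Stage 4: PC_Temp -> RY
        regs[LINK_REGISTER] = ry        # Stage 5: write link register
    elif kind == "return":
        pc = ra                         # Stage 3: PC <- RA (link register value)
        # Stages 4-5: nothing
    else:
        raise ValueError(f"unknown instruction kind: {kind}")
    return pc


regs = [0] * 32
memory = {}
regs[1], regs[2] = 5, 7
pc = 0x104
pc = run_stages_3_to_5({"kind": "alu", "op": operator.add, "dst": 3},
                       regs, memory, pc, regs[1], regs[2])
pc = run_stages_3_to_5({"kind": "store", "offset": 4},
                       regs, memory, pc, 100, regs[3])
pc = run_stages_3_to_5({"kind": "call", "target": 0x200},
                       regs, memory, pc, 0, 0)
```

Here `ra` and `rb` are passed in as already-read operand values, mirroring the fact that Stage 2 has filled RA and RB before Stage 3 begins.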

Inter-stage registers

The capital-letter registers (RA, RB, RZ, RY, RM) aren’t visible to the programmer — they’re implementation details that hold values between pipeline stages. Each stage’s outputs get latched into one of these registers at the end of the stage; the next stage reads from the latch.

Why have them? In a pipelined design, multiple instructions are in flight at once. RA’s value for the instruction currently in stage 3 has to stay stable while the instruction in stage 2 computes a different RA for itself. The inter-stage registers prevent the stages from interfering.
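The latching behaviour can be illustrated with a two-stage toy model in Python. Everything here is a simplification assumed for the example: `run` is a hypothetical helper, Decode just passes an operand along, and Execute doubles whatever it finds in the RA latch.

```python
def run(operands):
    """Feed one operand per cycle through a Decode -> Execute pair."""
    ra = None                 # inter-stage register between Decode and Execute
    results = []
    for incoming in list(operands) + [None]:   # one extra cycle drains the pipeline
        # During the cycle, both stages work in parallel from the CURRENT latch value.
        execute_out = None if ra is None else ra * 2   # Execute reads the latched RA
        decode_out = incoming                          # Decode prepares the next RA
        # Clock edge: only now does the latch take on Decode's new value.
        ra = decode_out
        if execute_out is not None:
            results.append(execute_out)
    return results


print(run([1, 2, 3]))
```

Because the latch is updated only at the end of each loop iteration, Execute always sees the value belonging to its own instruction, even though Decode is already working on the next one.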

Pipelining

In a non-pipelined design, only one instruction is being processed at a time, and it takes 5 cycles to complete (5 stages × 1 cycle each). Throughput: one instruction per 5 cycles.

In a pipelined design, each stage works on a different instruction simultaneously:

Cycle 1: Inst1=F
Cycle 2: Inst1=D Inst2=F
Cycle 3: Inst1=E Inst2=D Inst3=F
Cycle 4: Inst1=M Inst2=E Inst3=D Inst4=F
Cycle 5: Inst1=W Inst2=M Inst3=E Inst4=D Inst5=F
Cycle 6:         Inst2=W Inst3=M Inst4=E Inst5=D ...
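The schedule above follows a simple rule: instruction i is fetched in cycle i and then advances one stage per cycle. A minimal Python sketch of that rule, assuming no stalls (the function names are made up for this example):

```python
STAGES = "FDEMW"  # Fetch, Decode, Execute, Memory, Writeback


def pipeline_schedule(num_instructions, num_cycles):
    """Return one text row per cycle for an ideal, stall-free pipeline."""
    rows = []
    for cycle in range(1, num_cycles + 1):
        cells = []
        for inst in range(1, num_instructions + 1):
            stage_index = cycle - inst          # instruction i enters Fetch in cycle i
            if 0 <= stage_index < len(STAGES):
                cells.append(f"Inst{inst}={STAGES[stage_index]}")
        rows.append(f"Cycle {cycle}: " + " ".join(cells))
    return rows


def total_cycles(num_instructions, stages=5, pipelined=True):
    """Cycles to finish all instructions, with or without pipelining."""
    if pipelined:
        return stages + num_instructions - 1    # fill the pipe once, then 1 per cycle
    return stages * num_instructions            # each instruction runs start to finish


for row in pipeline_schedule(5, 6):
    print(row)
```

With these formulas, 100 instructions take 104 cycles pipelined versus 500 non-pipelined, a speedup of about 4.8 that approaches 5 as the instruction count grows.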

After the pipeline fills, one instruction completes per cycle — a 5× speedup over the non-pipelined version in the ideal case. Real pipelines never sustain that ideal because of three classes of hazard:

  • Data hazards. An instruction needs a value that an earlier instruction hasn’t yet written back. The pipeline must either stall (insert a bubble), which costs cycles, or forward the value directly from a later stage’s inter-stage register, which costs extra hardware but avoids most of those stalls.
  • Control hazards. Branches don’t resolve until the Execute stage (or later), but the Fetch stage has already pulled in instructions assuming the branch wasn’t taken. On a misprediction, those fetched instructions must be flushed, costing one cycle for each stage between Fetch and the stage that resolves the branch (two cycles if the branch resolves in Execute).
  • Structural hazards. Two instructions need the same hardware resource (e.g., a single memory port for both fetch and load) at the same time. One must wait.
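To make the cost of a data hazard concrete, here is a hedged Python sketch that counts stall cycles for a read-after-write dependency in this 5-stage model. It assumes registers are read in stage 2 and written in stage 5, that a value written in one cycle can first be read in the next cycle, and that forwarding (when enabled) delivers a value to the consumer's Execute stage the cycle after it is produced. The `stall_cycles` function and its parameters are hypothetical.

```python
def stall_cycles(distance, producer="alu", forwarding=False):
    """Stalls needed by a consumer `distance` instructions after its producer.

    distance=1 means the consumer immediately follows the producer.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if not forwarding:
        gap = 4    # consumer's Decode must come after producer's Writeback
    elif producer == "load":
        gap = 2    # loaded data exists only after the Memory stage
    else:
        gap = 1    # ALU result is available right after Execute
    return max(0, gap - distance)


print(stall_cycles(1))                          # back-to-back, no forwarding
print(stall_cycles(1, "alu", forwarding=True))  # back-to-back, forwarded ALU result
print(stall_cycles(1, "load", forwarding=True)) # load followed by a dependent use
```

Under these assumptions, forwarding removes the stall for a dependent instruction following an ALU operation, while a load followed immediately by a use of the loaded value still needs one bubble.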

A typical cycles-per-instruction (CPI) for a real 5-stage pipeline running general-purpose code is around 1.2–1.5 once stalls and mispredictions are factored in, not the ideal 1.0.
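A CPI figure like this can be built up as the ideal 1.0 plus the average stall cycles per instruction. The Python sketch below shows the arithmetic; the instruction-mix fractions and penalties are invented purely for illustration, and the speedup estimate assumes the pipelined and non-pipelined designs run at the same clock period.

```python
def effective_cpi(events):
    """events: iterable of (fraction_of_instructions, penalty_cycles) pairs."""
    return 1.0 + sum(fraction * penalty for fraction, penalty in events)


def speedup_over_nonpipelined(cpi, stages=5):
    """Non-pipelined cost is `stages` cycles per instruction."""
    return stages / cpi


# Hypothetical workload: 5% of instructions are mispredicted branches
# (2-cycle flush each) and 10% incur a 1-cycle load-use stall.
hypothetical_events = [(0.05, 2), (0.10, 1)]
cpi = effective_cpi(hypothetical_events)
print(round(cpi, 2), round(speedup_over_nonpipelined(cpi), 2))
```

By the same arithmetic, a CPI between 1.2 and 1.5 corresponds to a speedup of roughly 4.2 down to 3.3 over the non-pipelined design, rather than the ideal 5.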

For the hardware that implements all this, see Hardware datapath and Control unit and control signals.