An assembler translates assembly-language source code into machine code, the binary instructions a processor actually executes. It’s conceptually the simplest piece of the toolchain: no real “compilation” in the optimization sense, just a near-mechanical conversion of instructions like add r3, r4, r5 into their 32-bit binary encodings.

To do that, the assembler recognizes several kinds of source elements:

  • Mnemonics: add, ldw, beq, etc. → opcode bits.
  • Register names: r0..r31 → 5-bit register field values.
  • Addressing modes: different syntax forms, mapped to the right encoding.
  • Directives: .data, .text, .word, .global, etc. These tell the assembler how to organize output but aren’t machine code.
  • Labels: symbolic names like loop: that mark addresses.
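As a rough illustration of the mnemonic and register mappings listed above, here is a minimal Python sketch that packs add r3, r4, r5 into a 32-bit word. The opcode values and the field layout are invented for this example (they are not taken from any real instruction set), and the names OPCODES, parse_register, and encode_r_type are hypothetical.

```python
import re

# Hypothetical opcode table: these numeric values are invented for illustration.
OPCODES = {"add": 0x01, "sub": 0x02, "and": 0x03}

def parse_register(name):
    """Turn a register name like 'r3' into its 5-bit field value (0..31)."""
    match = re.fullmatch(r"r(\d+)", name.strip())
    if not match or not 0 <= int(match.group(1)) <= 31:
        raise ValueError(f"bad register: {name!r}")
    return int(match.group(1))

def encode_r_type(line):
    """Encode 'mnemonic rd, rs, rt' into one 32-bit word.

    Assumed layout (not a real ISA): opcode in bits 31..26, rd in 25..21,
    rs in 20..16, rt in 15..11, and bits 10..0 left as zero.
    """
    mnemonic, operands = line.split(None, 1)
    rd, rs, rt = (parse_register(r) for r in operands.split(","))
    return (OPCODES[mnemonic] << 26) | (rd << 21) | (rs << 16) | (rt << 11)

word = encode_r_type("add r3, r4, r5")
print(f"{word:#010x}")  # prints 0x04642800 under the assumed layout
```

A real assembler does the same kind of table lookup and bit packing, only with the field positions and opcode values fixed by the processor's instruction-set manual.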

Two-pass assembly

How does the assembler know what address loop refers to in a bne r3, r0, loop instruction, if loop is defined later in the source?

The usual answer is the two-pass assembler design: walk the source twice. The first pass builds a symbol table mapping every label to its address; the second pass generates machine code with the now-known label addresses filled in.

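In more detail, pass 1 reads the source while keeping a running location counter that advances by the size of each instruction or data item, and records the counter's value in the symbol table whenever it meets a label. Pass 2 re-reads the source, consults the symbol table to resolve every label reference, and writes the object file.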
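The two passes can be sketched in a few lines of Python. This is a simplified model, not a real assembler: it assumes every instruction occupies one 4-byte word, labels sit on their own line and end with a colon, and # starts a comment. The function name assemble and the sample listing are hypothetical, and pass 2 stops at resolving operands rather than emitting encoded words.

```python
def assemble(source, base=0, word_size=4):
    """Two-pass label resolution over a toy assembly listing."""
    # Strip comments and blank lines once so both passes see the same lines.
    lines = []
    for raw in source.splitlines():
        text = raw.split("#", 1)[0].strip()
        if text:
            lines.append(text)

    # Pass 1: advance a location counter and record each label's address.
    symbols = {}
    address = base
    for text in lines:
        if text.endswith(":"):
            symbols[text[:-1].strip()] = address
        else:
            address += word_size

    # Pass 2: walk the lines again, replacing label operands with addresses.
    resolved = []
    address = base
    for text in lines:
        if text.endswith(":"):
            continue
        mnemonic, _, rest = text.partition(" ")
        operands = [symbols.get(op.strip(), op.strip())
                    for op in rest.split(",") if op.strip()]
        resolved.append((address, mnemonic, operands))
        address += word_size
    return symbols, resolved

SOURCE = """
start:
    beq r3, r0, done     # forward reference: done is defined further down
loop:
    add r3, r3, r4
    bne r3, r0, loop     # backward reference
done:
    add r0, r0, r0
"""

symbols, resolved = assemble(SOURCE)
print(symbols)
for entry in resolved:
    print(entry)
```

With this listing the symbol table comes out as start = 0, loop = 4, done = 12, so the forward reference in beq resolves to 12 even though done appears after the branch. Operands that are not in the symbol table (here, the register names) pass through unchanged; a real assembler would instead treat an unknown label as an external reference or an error.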

Relationship to the compiler

The compiler is the previous stage in the toolchain. A compiler translates a high-level language (C, C++, Rust) into assembly; the assembler converts that assembly into machine code. The compiler does the hard intellectual work (parsing, optimizing, register allocation); the assembler does the more mechanical mnemonic → opcode mapping.

program.c  →  [compiler]  →  program.s  →  [assembler]  →  program.o (object file)

In modern toolchains the two are often combined: the compiler emits assembly internally and immediately invokes the assembler.

Object files and what comes next

The assembler emits an object file containing:

  • Machine code for the source’s instructions.
  • Initialized data (.data section).
  • Uninitialized data declarations (.bss section).
  • Symbol table (entries for each defined label, plus references to external labels).
  • Relocation information (where addresses need to be patched up at link time).

That object file isn’t yet a runnable program. It has placeholders for any external symbols that live in other source files. The linker combines multiple object files (and library files) into a single executable, assigning final addresses and resolving all the cross-file references.
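To make the placeholder-and-patch idea concrete, here is a small Python model of two object files and a linking step. It is a deliberately simplified sketch: the ObjectFile class and the link function are hypothetical, a relocation here replaces a whole 32-bit word with the symbol's absolute address (real formats patch specific bit fields inside an instruction), and the instruction words are arbitrary filler values.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    name: str
    words: list                                      # 32-bit words; 0 marks a placeholder
    symbols: dict = field(default_factory=dict)      # defined label -> byte offset in this file
    relocations: list = field(default_factory=list)  # (word index, symbol name) pairs to patch

def link(objects, base=0, word_size=4):
    """Lay the object files out back to back and patch every relocation."""
    # Step 1: assign each file a start address and collect global symbol addresses.
    global_symbols = {}
    address = base
    for obj in objects:
        for label, offset in obj.symbols.items():
            if label in global_symbols:
                raise ValueError(f"duplicate symbol: {label}")
            global_symbols[label] = address + offset
        address += len(obj.words) * word_size

    # Step 2: copy each file's words, filling placeholders from the global table.
    image = []
    for obj in objects:
        words = list(obj.words)
        for index, label in obj.relocations:
            if label not in global_symbols:
                raise ValueError(f"undefined symbol: {label}")
            words[index] = global_symbols[label]
        image.extend(words)
    return image, global_symbols

# main.o refers to helper, which is defined in util.o.
main_o = ObjectFile("main.o", words=[0x11111111, 0],
                    symbols={"main": 0}, relocations=[(1, "helper")])
util_o = ObjectFile("util.o", words=[0x22222222], symbols={"helper": 0})

image, table = link([main_o, util_o], base=0x1000)
print([hex(w) for w in image])
print({name: hex(addr) for name, addr in table.items()})
```

In this run main.o occupies 8 bytes starting at 0x1000, so helper lands at 0x1008 and the placeholder word in main.o is patched to that value. Linking main.o on its own would raise the "undefined symbol" error, which mirrors why a single object file with external references is not yet runnable.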

Assembler directives

Common directives the assembler recognizes (besides actual instructions):

  • .text: start of the code section.
  • .data: start of the initialized data section.
  • .bss: start of the uninitialized data section.
  • .word value: emit a word with the given value (4 bytes on the 32-bit machine assumed here; the size of a word depends on the target).
  • .byte value: emit one byte.
  • .global LABEL: make LABEL visible to the linker (export it).
  • .equ NAME, value: define a symbolic constant.
  • .org address: set the assembler’s current address (its location counter).

None of these encodes a processor instruction. The .word and .byte directives place raw data bytes in the output; the others tell the assembler about layout, exports, and constants.
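The effect of several of these directives on the assembler's internal state can be modeled in Python. This is a sketch under stated assumptions: the function run_directives is hypothetical, words are stored little-endian (the byte order is a property of the target, not of the directive), each directive takes a single value, and section switching with .text, .data, and .bss is left out to keep the example short.

```python
def run_directives(lines):
    """Apply .org, .equ, .global, .word and .byte to a toy assembler state."""
    location = 0      # the assembler's current address (location counter)
    memory = {}       # address -> byte value emitted so far
    constants = {}    # names defined with .equ
    exported = set()  # names marked with .global

    def value_of(token):
        # A value is either a known .equ constant or a numeric literal.
        token = token.strip()
        return constants[token] if token in constants else int(token, 0)

    for line in lines:
        parts = line.split(None, 1)
        directive = parts[0]
        arg = parts[1] if len(parts) > 1 else ""
        if directive == ".org":
            location = value_of(arg)
        elif directive == ".equ":
            name, value = arg.split(",")
            constants[name.strip()] = value_of(value)
        elif directive == ".global":
            exported.add(arg.strip())
        elif directive == ".word":
            # Assumed: 4-byte words, little-endian byte order.
            data = (value_of(arg) & 0xFFFFFFFF).to_bytes(4, "little")
            for i, byte in enumerate(data):
                memory[location + i] = byte
            location += 4
        elif directive == ".byte":
            memory[location] = value_of(arg) & 0xFF
            location += 1
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return memory, constants, exported, location

memory, constants, exported, location = run_directives([
    ".equ SIZE, 0x10",
    ".global table",
    ".org 0x100",
    ".word SIZE",
    ".byte 0xFF",
])
print({hex(a): hex(b) for a, b in sorted(memory.items())})
print(constants, exported, hex(location))
```

Only .word and .byte add bytes to memory here; .equ, .global, and .org change the assembler's bookkeeping (constants, exports, and the location counter) without emitting anything.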