An assembler translates assembly-language source code into machine code — the binary instructions a processor actually executes. It’s the simplest piece of the toolchain: there’s no real “compilation” in the optimization sense, just a near-mechanical conversion of mnemonics like add r3, r4, r5 to their 32-bit binary encodings.

To perform that conversion, the assembler recognizes:

  • Mnemonics: add, ldw, beq, etc. → opcode bits.
  • Register names: r0..r31 → 5-bit register field values.
  • Addressing modes: different syntax forms, mapped to the right encoding.
  • Directives: .data, .text, .word, .global, etc. These are instructions to the assembler about how to organize output, not actual machine code.
  • Labels: symbolic names like loop: that mark addresses.
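
As a rough illustration of the mnemonic and register mappings above, the sketch below encodes add r3, r4, r5 into a 32-bit word. The field layout (6-bit opcode followed by three 5-bit register fields) and the opcode values are assumptions made up for this example; they are not taken from any particular instruction set. The names OPCODES, parse_register, and encode_r_type are likewise hypothetical.

```python
# Hypothetical encoding for illustration only; not a real ISA.
OPCODES = {"add": 0x01, "sub": 0x02}  # made-up opcode values


def parse_register(name):
    """Map a register name such as 'r3' to its 5-bit field value."""
    if not name.startswith("r") or not name[1:].isdigit():
        raise ValueError(f"not a register: {name!r}")
    number = int(name[1:])
    if not 0 <= number <= 31:
        raise ValueError(f"register out of range: {name!r}")
    return number


def encode_r_type(line):
    """Encode 'mnemonic rd, rs1, rs2' as a 32-bit word.

    Assumed layout: opcode in bits 31-26, register fields in bits
    25-21, 20-16 and 15-11, and the low 11 bits left as zero.
    """
    mnemonic, operands = line.split(None, 1)
    rd, rs1, rs2 = (parse_register(op.strip()) for op in operands.split(","))
    return (OPCODES[mnemonic] << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


word = encode_r_type("add r3, r4, r5")
print(f"{word:#010x}")  # prints 0x04642800 under the assumed layout
```

A real assembler would drive the same steps from tables describing the target processor's actual instruction formats rather than from a hard-coded layout.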

Two-pass assembly

A simple challenge: how does the assembler know what address loop refers to in a bne r3, r0, loop instruction, if loop is defined later in the source?

The fix is the two-pass assembler design: walk the source twice, first to build a symbol table mapping every label to its address, then to generate machine code with the now-known label addresses filled in.
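
The two passes can be sketched in a few lines of Python. This is a minimal illustration, not a full assembler: it assumes every instruction occupies 4 bytes, that # starts a comment, and that a label operand is simply replaced by its absolute address (real assemblers often encode branch targets as offsets instead). The function name assemble and the sample program are hypothetical.

```python
WORD = 4  # assumption: every instruction occupies 4 bytes


def assemble(source):
    """Two-pass sketch: returns (symbol table, instructions with labels resolved)."""
    # Pass 1: walk the source once, recording the address of every label.
    symbols = {}
    instructions = []  # (address, instruction text) pairs
    address = 0
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (assumed syntax)
        if ":" in line:
            label, line = line.split(":", 1)
            symbols[label.strip()] = address
            line = line.strip()
        if not line:
            continue
        instructions.append((address, line))
        address += WORD

    # Pass 2: every label is now known, so forward references can be filled in.
    resolved = []
    for address, text in instructions:
        mnemonic, _, rest = text.partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest else []
        operands = [str(symbols[op]) if op in symbols else op for op in operands]
        resolved.append((address, (mnemonic + " " + ", ".join(operands)).strip()))
    return symbols, resolved


SOURCE = """
start:  beq r3, r0, done    # forward reference: done is defined later
        add r3, r3, r4
        bne r3, r0, start   # backward reference
done:   add r5, r3, r0
"""

symbols, resolved = assemble(SOURCE)
print(symbols)      # {'start': 0, 'done': 12}
for address, text in resolved:
    print(address, text)
```

The forward reference to done in the first instruction is resolved only because pass 1 has already seen the whole file before pass 2 begins.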

Relationship to the compiler

The compiler is the previous stage in the toolchain. A compiler translates a high-level language (C, C++, Rust) into assembly; the assembler converts that assembly into machine code. The compiler does the hard intellectual work (parsing, optimizing, register allocation), while the assembler does the more mechanical mnemonic → opcode mapping.

program.c  →  [compiler]  →  program.s  →  [assembler]  →  program.o (object file)

In modern toolchains the two are often combined — the compiler emits assembly internally and immediately invokes the assembler.

Object files and what comes next

The assembler emits an object file containing:

  • Machine code for the source’s instructions.
  • Initialized data (.data section).
  • Uninitialized data declarations (.bss section).
  • Symbol table (entries for each defined label, plus references to external labels).
  • Relocation information (where addresses need to be patched up at link time).

That object file isn’t yet a runnable program: it has placeholders for any external symbols that live in other source files. The linker combines multiple object files (and library files) into a single executable, resolving all the cross-file references.
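
To make the placeholder idea concrete, here is a minimal sketch of a linker patching relocations. The ObjectFile class is a hypothetical, heavily simplified stand-in for a real object format: code is a list of 4-byte words, None marks a placeholder, and each relocation replaces a whole word with the symbol's final address. Real formats patch specific bit fields inside instructions and carry far more metadata.

```python
from dataclasses import dataclass, field

WORD = 4  # assumption: 4-byte words


@dataclass
class ObjectFile:
    """Hypothetical, heavily simplified object file (not a real format)."""
    name: str
    code: list  # one entry per word; None marks a placeholder to patch
    defined: dict = field(default_factory=dict)      # label -> word index in this file
    relocations: list = field(default_factory=list)  # (word index, symbol name)


def link(objects):
    """Concatenate the code, build a global symbol table, then patch placeholders."""
    image, table, base = [], {}, {}
    for obj in objects:
        base[obj.name] = len(image)  # where this file's code starts in the image
        for label, index in obj.defined.items():
            if label in table:
                raise ValueError(f"duplicate symbol: {label}")
            table[label] = (base[obj.name] + index) * WORD
        image.extend(obj.code)
    for obj in objects:
        for index, symbol in obj.relocations:
            if symbol not in table:
                raise KeyError(f"undefined symbol: {symbol}")
            image[base[obj.name] + index] = table[symbol]
    return image, table


# main.o refers to helper, which is defined in util.o.
main_o = ObjectFile("main.o", [0x11111111, None, 0x22222222],
                    {"main": 0}, [(1, "helper")])
util_o = ObjectFile("util.o", [0x33333333, 0x44444444], {"helper": 1})

image, table = link([main_o, util_o])
print(table)  # {'main': 0, 'helper': 16}
```

Linking main.o on its own raises an undefined-symbol error in this sketch, which mirrors why a single object file with external references is not yet runnable.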

For the role of the assembler within the broader toolchain, see Linker and Loader.

Assembler directives

Common directives the assembler recognizes (besides actual instructions):

  • .text — start of the code section.
  • .data — start of the initialized data section.
  • .bss — uninitialized data.
  • .word value — emit a 4-byte word with the given value.
  • .byte value — emit one byte.
  • .global LABEL — make LABEL visible to the linker (export it).
  • .equ NAME, value — define a symbolic constant.
  • .org address — set the assembly’s current address.

These don’t generate instructions themselves: .word and .byte emit data values into the output, while the others tell the assembler about layout, exports, and constants.
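
The effect of a few of these directives on the assembler's current address can be sketched as follows. This is an illustrative sketch under stated assumptions: one directive per line, little-endian byte order for .word, and numeric literals in any form Python's int(text, 0) accepts. The function name layout and the constant LIMIT are hypothetical.

```python
def layout(source):
    """Process .equ, .org, .word and .byte; return (memory, constants).

    memory maps byte address -> byte value.
    """
    memory = {}
    constants = {}
    location = 0  # the assembler's current address (location counter)

    def value_of(token):
        token = token.strip()
        # A token is either a previously defined constant or a numeric literal.
        return constants[token] if token in constants else int(token, 0)

    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        directive, _, args = line.partition(" ")
        if directive == ".equ":
            name, value = args.split(",")
            constants[name.strip()] = value_of(value)  # no output emitted
        elif directive == ".org":
            location = value_of(args)                  # move the current address
        elif directive == ".word":
            word = value_of(args) & 0xFFFFFFFF
            for i, byte in enumerate(word.to_bytes(4, "little")):  # assumed byte order
                memory[location + i] = byte
            location += 4
        elif directive == ".byte":
            memory[location] = value_of(args) & 0xFF
            location += 1
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return memory, constants


memory, constants = layout("""
.equ LIMIT, 0x10
.org 0x100
.word LIMIT
.byte 0xFF
""")
print(constants)                # {'LIMIT': 16}
print(sorted(memory.items()))   # five bytes starting at address 0x100
```

Note that .equ and .org change only the assembler's internal state, whereas .word and .byte place data bytes at the current address and advance it.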