Assembler

An assembler translates assembly-language source code into machine code, the binary instructions a processor actually executes. It is the most mechanical stage of the toolchain: there is no compilation in the optimization sense, just a near one-to-one conversion of mnemonics such as add r3, r4, r5 into their 32-bit binary encodings.

The assembler recognizes several kinds of source elements:

Mnemonics — add, ldw, beq, etc. → opcode bits.
Register names — r0..r31 → 5-bit register field values.
Addressing modes — different syntax forms, mapped to the right encoding.
Directives — .data, .text, .word, .global, etc. — instructions to the assembler about how to organize output, but not actual machine code.
Labels — symbolic names like loop: that mark addresses.
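As a rough illustration of the mnemonic and register mapping listed above, the sketch below packs add r3, r4, r5 into a single 32-bit word. The field layout (6-bit opcode, three 5-bit register fields, 11 unused low bits) and the opcode values are invented for this example and do not describe any real instruction set; only the 5-bit register field width comes from the list above.

```python
# Hypothetical layout: [opcode:6][rd:5][rs1:5][rs2:5][unused:11]
OPCODES = {"add": 0x01, "sub": 0x02}  # invented opcode values, for illustration only


def reg(name: str) -> int:
    """Map a register name such as 'r3' to its 5-bit field value."""
    if not name.startswith("r") or not name[1:].isdigit():
        raise ValueError(f"bad register name: {name!r}")
    number = int(name[1:])
    if not 0 <= number <= 31:
        raise ValueError(f"register out of range: {name!r}")
    return number


def encode(line: str) -> int:
    """Encode a three-register instruction into one 32-bit word."""
    mnemonic, operands = line.split(None, 1)
    rd, rs1, rs2 = (reg(r.strip()) for r in operands.split(","))
    return (OPCODES[mnemonic] << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


word = encode("add r3, r4, r5")
print(f"{word:#010x}")  # hexadecimal view of the encoded word
print(f"{word:032b}")   # the same word as 32 individual bits
```

Each piece of the source line ends up in its own bit field, which is why the conversion needs no analysis of what the program means.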

Two-pass assembly

A simple challenge: how does the assembler know what address loop refers to in a bne r3, r0, loop instruction, if loop is defined later in the source?

The fix is the Two-pass assembler design: walk the source twice — first to build a symbol table mapping every label to its address, then to generate machine code with the now-known label addresses filled in.
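The two passes can be sketched in a few lines of Python. This is a simplified model, not a real assembler: it assumes every instruction occupies 4 bytes, that labels are written as name: at the start of a line, and that # begins a comment. The second pass only substitutes label addresses into the operand list instead of producing binary encodings.

```python
INSTR_SIZE = 4  # assumption: fixed-width 32-bit instructions


def clean(line):
    """Strip comments and whitespace, then split off an optional label."""
    line = line.split("#")[0].strip()
    label = None
    if ":" in line:
        label, line = line.split(":", 1)
        label, line = label.strip(), line.strip()
    return label, line


def first_pass(lines):
    """Pass 1: build the symbol table mapping each label to its address."""
    symbols, address = {}, 0
    for raw in lines:
        label, instr = clean(raw)
        if label:
            symbols[label] = address
        if instr:
            address += INSTR_SIZE
    return symbols


def second_pass(lines, symbols):
    """Pass 2: emit (address, mnemonic, operands) with labels resolved."""
    out, address = [], 0
    for raw in lines:
        _, instr = clean(raw)
        if not instr:
            continue
        mnemonic, _, rest = instr.partition(" ")
        operands = [op.strip() for op in rest.split(",") if op.strip()]
        # Replace any operand that names a label with that label's address.
        operands = [symbols.get(op, op) for op in operands]
        out.append((address, mnemonic, operands))
        address += INSTR_SIZE
    return out


source = [
    "start: addi r3, r0, 10",
    "       bne r3, r0, done    # forward reference: done is defined later",
    "loop:  addi r3, r3, -1",
    "       bne r3, r0, loop    # backward reference",
    "done:  ret",
]

symbols = first_pass(source)
for entry in second_pass(source, symbols):
    print(entry)
```

By the time the second pass reaches the forward reference to done, the symbol table already holds its address, so the lookup succeeds. A real assembler would typically turn a branch target into an offset relative to the branch instruction rather than an absolute address; that step is left out here.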

Relationship to the compiler

The Compiler is the previous stage in the toolchain. A compiler translates a high-level language (C, C++, Rust) into assembly; the assembler converts that assembly into machine code. The compiler does the hard intellectual work (parsing, optimization, register allocation), while the assembler does the more mechanical mapping from mnemonics to opcodes.

program.c  →  [compiler]  →  program.s  →  [assembler]  →  program.o (object file)

In modern toolchains the two are often combined: the compiler driver either writes assembly to a temporary file and immediately invokes the assembler, or uses a built-in assembler to produce the object file directly.

Object files and what comes next

The assembler emits an Object file containing:

Machine code for the source’s instructions.
Initialized data (.data section).
Uninitialized data declarations (.bss section).
Symbol table (entries for each defined label, plus references to external labels).
Relocation information (where addresses need to be patched up at link time).

That object file isn’t yet a runnable program — it has placeholders for any external symbols that live in other source files. The Linker combines multiple object files (and library files) into a single executable, resolving all the cross-file references.
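To make the placeholder idea concrete, here is a toy model of an object file and of the patching a linker performs. The ObjectFile class, its field names, and the link function are all hypothetical and heavily simplified: there is only a text section, symbol offsets are counted in 4-byte words, and a relocation overwrites a whole word with the symbol's final address. Real object formats record far more detail.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Toy object file: text words, defined symbols, and pending relocations."""
    name: str
    text: list                                        # machine-code words (ints)
    symbols: dict = field(default_factory=dict)       # label -> word offset in text
    relocations: list = field(default_factory=list)   # (word index, symbol name)


def link(objects):
    """Concatenate text sections, then patch each relocation with a final address."""
    image, bases, addresses = [], {}, {}
    for obj in objects:
        bases[obj.name] = len(image)                  # where this file's text starts
        for sym, offset in obj.symbols.items():
            addresses[sym] = (bases[obj.name] + offset) * 4   # byte address
        image.extend(obj.text)
    for obj in objects:
        for index, sym in obj.relocations:
            if sym not in addresses:
                raise KeyError(f"undefined symbol: {sym}")
            image[bases[obj.name] + index] = addresses[sym]
    return image


# main.o refers to 'helper', which is defined in util.o; word 1 is a placeholder.
main_o = ObjectFile("main.o", text=[0x11111111, 0], relocations=[(1, "helper")])
util_o = ObjectFile("util.o", text=[0x22222222, 0x33333333], symbols={"helper": 1})

print([hex(word) for word in link([main_o, util_o])])
```

Until link runs, main.o cannot know where helper will end up, which is why the assembler can only record the relocation and leave the word empty.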

For the role of the assembler within the broader toolchain, see Linker and Loader.

Assembler directives

Common directives the assembler recognizes (besides actual instructions):

.text — start of the code section.
.data — start of the initialized data section.
.bss — uninitialized data.
.word value — emit a 4-byte word with the given value.
.byte value — emit one byte.
.global LABEL — make LABEL visible to the linker (export it).
.equ NAME, value — define a symbolic constant.
.org address — set the assembly’s current address.

None of these is encoded as a processor instruction. .word and .byte place raw data in the output; the others instruct the assembler about layout, exports, and constants.
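A small sketch of how an assembler might act on some of these directives follows. The layout function and its return shape are hypothetical, and it handles only .text, .data, .word, .byte, .global, and .equ (.bss and .org are omitted for brevity). Little-endian byte order is an assumption of the example, not something the notes specify.

```python
def layout(lines):
    """Process directive lines; return (sections, constants, exported)."""
    sections = {".text": bytearray(), ".data": bytearray()}
    constants, exported = {}, set()
    current = ".text"

    def value(token):
        # A token is either a previously defined .equ name or a numeric literal.
        return constants[token] if token in constants else int(token, 0)

    for line in lines:
        parts = line.replace(",", " ").split()
        if not parts:
            continue
        directive, args = parts[0], parts[1:]
        if directive in sections:
            current = directive                       # switch output section
        elif directive == ".word":
            word = value(args[0]) & 0xFFFFFFFF
            sections[current] += word.to_bytes(4, "little")  # assumed byte order
        elif directive == ".byte":
            sections[current] += (value(args[0]) & 0xFF).to_bytes(1, "little")
        elif directive == ".global":
            exported.add(args[0])                     # symbol visible to the linker
        elif directive == ".equ":
            constants[args[0]] = value(args[1])       # symbolic constant, emits nothing
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return sections, constants, exported


sections, constants, exported = layout([
    ".equ SIZE, 16",
    ".global table",
    ".data",
    ".word SIZE",
    ".byte 0xFF",
    ".text",
    ".word 0x12345678",
])
print(sections[".data"].hex(), sections[".text"].hex(), constants, exported)
```

Note that .equ and .global leave the sections untouched: they change only the assembler's bookkeeping, while .word and .byte append data bytes to whichever section is current.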

Idriss Rami — Notes
