An assembler translates assembly-language source code into machine code — the binary instructions a processor actually executes. It’s the simplest piece of the toolchain: there’s no real “compilation” in the optimization sense, just a near-mechanical conversion of mnemonics like add r3, r4, r5 to their 32-bit binary encodings.
The assembler also recognizes:
- Mnemonics: add, ldw, beq, etc. → opcode bits.
- Register names: r0..r31 → 5-bit register field values.
- Addressing modes: different syntax forms, mapped to the right encoding.
- Directives: .data, .text, .word, .global, etc. These are instructions to the assembler about how to organize output, not actual machine code.
- Labels: symbolic names like loop: that mark addresses.
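As a rough illustration of the mnemonic and register mapping, here is a minimal Python sketch that packs add r3, r4, r5 into a 32-bit word. The field layout and opcode values are invented for the example and do not correspond to any real instruction set; the names OPCODES, parse_register, and encode_r_type are likewise hypothetical.

```python
# Hypothetical R-type layout (an assumption, not a real ISA's encoding):
#   bits 31..26 opcode | 25..21 rd | 20..16 rs1 | 15..11 rs2 | 10..0 unused
OPCODES = {"add": 0x01, "sub": 0x02}  # made-up opcode values


def parse_register(name: str) -> int:
    """Map a register name such as 'r3' to its 5-bit field value."""
    if not name.startswith("r") or not name[1:].isdigit():
        raise ValueError(f"bad register name: {name!r}")
    number = int(name[1:])
    if not 0 <= number <= 31:
        raise ValueError(f"register out of range: {name!r}")
    return number


def encode_r_type(line: str) -> int:
    """Encode 'mnemonic rd, rs1, rs2' as a 32-bit word."""
    mnemonic, _, operands = line.strip().partition(" ")
    rd, rs1, rs2 = (parse_register(op.strip()) for op in operands.split(","))
    return (OPCODES[mnemonic] << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


word = encode_r_type("add r3, r4, r5")
print(f"{word:#010x}")
```

A real assembler would pick the field layout per instruction format and would also handle immediates and addressing modes; the sketch covers only the three-register case.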
Two-pass assembly
A simple challenge: how does the assembler know what address loop refers to in a bne r3, r0, loop instruction, if loop is defined later in the source?
The fix is the Two-pass assembler design: walk the source twice — first to build a symbol table mapping every label to its address, then to generate machine code with the now-known label addresses filled in.
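The two passes can be sketched in Python as follows. This is a simplified model under stated assumptions: every instruction is 4 bytes, a label sits on its own line, and the second pass substitutes the label's absolute address rather than producing a real encoding (many instruction sets would store a PC-relative offset instead). The function names are hypothetical.

```python
INSTRUCTION_SIZE = 4  # bytes; assumes fixed-width 32-bit instructions


def first_pass(lines):
    """Build the symbol table: label name -> address."""
    symbols, address = {}, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = address  # label marks the next instruction
        else:
            address += INSTRUCTION_SIZE
    return symbols


def second_pass(lines, symbols):
    """Replace label operands with their addresses (stand-in for encoding)."""
    output, address = [], 0
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        mnemonic, _, rest = line.partition(" ")
        operands = [op.strip() for op in rest.split(",")] if rest else []
        resolved = [str(symbols[op]) if op in symbols else op for op in operands]
        output.append((address, mnemonic, resolved))
        address += INSTRUCTION_SIZE
    return output


source = [
    "beq r3, r0, done",  # forward reference: 'done' is defined later
    "loop:",
    "add r3, r3, r4",
    "bne r3, r0, loop",
    "done:",
    "add r5, r0, r0",
]
symbols = first_pass(source)
listing = second_pass(source, symbols)
for entry in listing:
    print(entry)
```

Because the symbol table is complete before any code is generated, the forward reference to done in the first instruction resolves just as easily as the backward reference to loop.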

Relationship to the compiler
The Compiler is the previous stage in the toolchain. A compiler translates a high-level language (C, C++, Rust) into assembly; the assembler converts that assembly into machine code. The compiler does the hard intellectual work — parsing, optimizing, register allocation; the assembler does the more mechanical mnemonic → opcode mapping.
program.c → [compiler] → program.s → [assembler] → program.o (object file)
In modern toolchains the two are often combined — the compiler emits assembly internally and immediately invokes the assembler.
Object files and what comes next
The assembler emits an Object file containing:
- Machine code for the source’s instructions.
- Initialized data (.data section).
- Uninitialized data declarations (.bss section).
- Symbol table (entries for each defined label, plus references to external labels).
- Relocation information (where addresses need to be patched up at link time).
That object file isn’t yet a runnable program — it has placeholders for any external symbols that live in other source files. The Linker combines multiple object files (and library files) into a single executable, resolving all the cross-file references.
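To make the placeholder idea concrete, the following Python sketch models an object file as code words plus a symbol table and a relocation list, and shows a toy link step that patches the placeholders. It is a deliberately simplified assumption: real object formats patch specific bit fields inside an instruction rather than overwriting a whole word, and the ObjectFile class and link function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    code: list                                       # 32-bit words; 0 where an address is pending
    symbols: dict = field(default_factory=dict)      # defined label -> word index in code
    relocations: list = field(default_factory=list)  # (word index, symbol name) to patch


def link(objects, word_size=4):
    """Concatenate code, then patch each relocation with the symbol's final address."""
    image, global_symbols, pending = [], {}, []
    for obj in objects:
        base = len(image)  # word index where this object's code starts
        for name, index in obj.symbols.items():
            global_symbols[name] = (base + index) * word_size
        pending += [(base + index, name) for index, name in obj.relocations]
        image += obj.code
    for index, name in pending:
        if name not in global_symbols:
            raise KeyError(f"undefined symbol: {name}")
        image[index] = global_symbols[name]
    return image


# main.o refers to 'helper', which is defined in util.o.
main_o = ObjectFile(code=[0x11111111, 0], relocations=[(1, "helper")])
util_o = ObjectFile(code=[0x22222222, 0x33333333], symbols={"helper": 1})
image = link([main_o, util_o])
print([hex(word) for word in image])
```

The relocation entry is what lets the assembler finish its work without knowing where helper will end up; only the link step sees all object files and can compute the final address.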
For the role of the assembler within the broader toolchain, see Linker and Loader.
Assembler directives
Common directives the assembler recognizes (besides actual instructions):
- .text: start of the code section.
- .data: start of the initialized data section.
- .bss: uninitialized data.
- .word value: emit a 4-byte word with the given value.
- .byte value: emit one byte.
- .global LABEL: make LABEL visible to the linker (export it).
- .equ NAME, value: define a symbolic constant.
- .org address: set the assembly’s current address.
These don’t generate machine instructions themselves; they instruct the assembler about layout, exports, and constants, although .word and .byte do place data bytes in the output.
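A small Python sketch shows how an assembler might act on a few of these directives. It handles only .org, .equ, .global, .byte, and .word, and it assumes little-endian byte order for .word, which the text above does not specify. The function name assemble_data and the dictionary-based memory image are hypothetical choices for the example.

```python
def assemble_data(lines):
    """Process a few directives; returns (memory image, constants, exported names)."""
    memory, constants, exported = {}, {}, set()
    address = 0

    def value_of(token):
        # A token is either a previously defined .equ name or an integer literal.
        return constants[token] if token in constants else int(token, 0)

    for line in lines:
        parts = line.replace(",", " ").split()
        if not parts:
            continue
        directive, args = parts[0], parts[1:]
        if directive == ".org":
            address = value_of(args[0])           # move the current address
        elif directive == ".equ":
            constants[args[0]] = value_of(args[1])  # no bytes emitted
        elif directive == ".global":
            exported.add(args[0])                 # recorded for the linker
        elif directive == ".byte":
            memory[address] = value_of(args[0]) & 0xFF
            address += 1
        elif directive == ".word":
            # Little-endian byte order is an assumption of this sketch.
            word = (value_of(args[0]) & 0xFFFFFFFF).to_bytes(4, "little")
            for offset, byte in enumerate(word):
                memory[address + offset] = byte
            address += 4
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return memory, constants, exported


memory, constants, exported = assemble_data([
    ".equ SIZE, 0x10",
    ".global table",
    ".org 0x100",
    ".word SIZE",
    ".byte 0xFF",
])
print(sorted(memory.items()))
```

Note that .equ and .global change only the assembler's bookkeeping, while .org changes where the next bytes land and .word and .byte are the only ones here that write to the output.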