An assembler translates assembly-language source code into machine code, the binary instructions a processor actually executes. It’s the simplest piece of the toolchain: no real “compilation” in the optimization sense, just a near-mechanical conversion of mnemonics like add r3, r4, r5 to their 32-bit binary encodings.
The assembler also recognizes:
- Mnemonics: add, ldw, beq, etc. → opcode bits.
- Register names: r0..r31 → 5-bit register field values.
- Addressing modes: different syntax forms, mapped to the right encoding.
- Directives: .data, .text, .word, .global, etc. These tell the assembler how to organize output but aren’t machine code.
- Labels: symbolic names like loop: that mark addresses.
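To make the mnemonic and register mapping concrete, here is a minimal sketch of encoding a three-register instruction into a 32-bit word. The opcode values and the bit layout (6-bit opcode, then three 5-bit register fields) are assumptions made up for illustration, as are the names OPCODES, parse_register, and encode_r_type; a real instruction set defines its own.

```python
import re

# Hypothetical opcode values, chosen only for illustration.
OPCODES = {"add": 0x01, "sub": 0x02, "and": 0x03}


def parse_register(name: str) -> int:
    """Map a register name such as 'r3' to its 5-bit field value."""
    match = re.fullmatch(r"r(\d+)", name.strip())
    if match is None:
        raise ValueError(f"not a register: {name!r}")
    number = int(match.group(1))
    if not 0 <= number <= 31:
        raise ValueError(f"register out of range: {name!r}")
    return number


def encode_r_type(line: str) -> int:
    """Encode 'mnemonic rd, rs1, rs2' as a 32-bit word.

    Assumed layout, high bits to low bits:
    opcode[6] rd[5] rs1[5] rs2[5] unused[11]
    """
    mnemonic, operands = line.split(maxsplit=1)
    rd, rs1, rs2 = (parse_register(r) for r in operands.split(","))
    opcode = OPCODES[mnemonic]
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


word = encode_r_type("add r3, r4, r5")
print(f"{word:#010x}")  # prints 0x04642800 under the assumed layout
```

Each field is a table lookup or a number parse followed by a shift, which is why the conversion is described as near-mechanical.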
Two-pass assembly
How does the assembler know what address loop refers to in a bne r3, r0, loop instruction, if loop is defined later in the source?
The answer is the two-pass assembler design: walk the source twice. The first pass builds a symbol table mapping every label to its address; the second pass generates machine code with the now-known label addresses filled in.
Pass 1 reads the source and populates the symbol table; pass 2 re-reads the source, consults the symbol table to resolve every label reference, and writes the object file.
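The two passes can be sketched in a few lines. This is a toy, not a real assembler: it assumes every instruction occupies 4 bytes, that each label sits on its own line, and it substitutes the label's absolute address as text instead of producing a binary encoding (real branch instructions often store an offset relative to the current instruction). The function name assemble and the sample program are hypothetical.

```python
def assemble(source: str, base: int = 0) -> list[tuple[int, str]]:
    """Toy two-pass assembler that resolves labels but does not encode bits."""
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]

    # Pass 1: build the symbol table (label -> address).
    symbols: dict[str, int] = {}
    address = base
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address  # a label takes no space itself
        else:
            address += 4                  # assumed fixed instruction size

    # Pass 2: re-read the source and replace label operands with addresses.
    resolved = []
    address = base
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, _, rest = line.partition(" ")
        operands = [op.strip() for op in rest.split(",")]
        operands = [hex(symbols[op]) if op in symbols else op for op in operands]
        resolved.append((address, f"{mnemonic} {', '.join(operands)}"))
        address += 4
    return resolved


SOURCE = """
    beq r3, r0, done
loop:
    add r3, r3, r4
    bne r3, r0, loop
done:
    add r2, r3, r0
"""

for addr, text in assemble(SOURCE):
    print(f"{addr:#06x}  {text}")
```

The first instruction refers to done before it is defined. Because pass 1 has already recorded done at address 0xc, pass 2 can emit beq r3, r0, 0xc without backtracking.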
Relationship to the compiler
The compiler is the previous stage in the toolchain. A compiler translates a high-level language (C, C++, Rust) into assembly; the assembler converts that assembly into machine code. The compiler does the hard intellectual work (parsing, optimizing, register allocation); the assembler does the more mechanical mnemonic → opcode mapping.
program.c → [compiler] → program.s → [assembler] → program.o (object file)
In modern toolchains the two are often combined: the compiler emits assembly internally and immediately invokes the assembler.
Object files and what comes next
The assembler emits an object file containing:
- Machine code for the source’s instructions.
- Initialized data (.data section).
- Uninitialized data declarations (.bss section).
- Symbol table (entries for each defined label, plus references to external labels).
- Relocation information (where addresses need to be patched up at link time).
That object file isn’t yet a runnable program. It has placeholders for any external symbols that live in other source files. The linker combines multiple object files (and library files) into a single executable, resolving all the cross-file references.
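The interplay between symbol table, relocation entries, and linker can be illustrated with a heavily simplified model. The ObjectFile class and link function below are hypothetical and do not follow ELF or any real object format. As a further simplification, each placeholder is a whole 32-bit word that receives the symbol's final address, whereas real relocations usually patch a bit field inside an instruction.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """Hypothetical, simplified object file (not a real format)."""
    code: list[int]                                          # one 32-bit word per entry
    defined: dict[str, int] = field(default_factory=dict)    # label -> word index
    relocations: list[tuple[int, str]] = field(default_factory=list)  # (word index, symbol)


def link(objects: list[ObjectFile], base: int = 0) -> list[int]:
    """Concatenate the code, then patch every relocation with a final address."""
    # Step 1: lay the objects out back to back and compute global addresses.
    offsets: list[int] = []
    global_symbols: dict[str, int] = {}
    total = 0
    for obj in objects:
        offsets.append(total)
        for name, index in obj.defined.items():
            if name in global_symbols:
                raise ValueError(f"duplicate symbol: {name}")
            global_symbols[name] = base + 4 * (total + index)
        total += len(obj.code)

    # Step 2: copy the code and overwrite each placeholder word.
    image = [word for obj in objects for word in obj.code]
    for obj, offset in zip(objects, offsets):
        for index, name in obj.relocations:
            if name not in global_symbols:
                raise ValueError(f"undefined symbol: {name}")
            image[offset + index] = global_symbols[name]
    return image


# main.o refers to 'helper', which is defined in util.o.
main_o = ObjectFile(code=[0x11111111, 0x0], defined={"main": 0},
                    relocations=[(1, "helper")])
util_o = ObjectFile(code=[0x22222222, 0x33333333], defined={"helper": 1})

image = link([main_o, util_o])
print([hex(w) for w in image])
```

In this example helper is the second word of the second object, so it lands at byte address 12, and the placeholder word in main.o is overwritten with that value. A reference to a symbol that no object defines is reported as an error, which mirrors the "undefined symbol" errors linkers produce.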
Assembler directives
Common directives the assembler recognizes (besides actual instructions):
- .text: start of the code section.
- .data: start of the initialized data section.
- .bss: uninitialized data.
- .word value: emit a 4-byte word with the given value.
- .byte value: emit one byte.
- .global LABEL: make LABEL visible to the linker (export it).
- .equ NAME, value: define a symbolic constant.
- .org address: set the assembly’s current address.
These don’t generate machine code themselves; they instruct the assembler about layout, exports, and constants.
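As a rough illustration of how a few of these directives affect layout, the sketch below tracks the assembler's current address while processing .org, .equ, .word, and .byte. It assumes little-endian byte order and decimal or 0x-prefixed literals; the function name layout and the sample input are hypothetical, and section directives such as .text and .data are left out.

```python
def layout(source: str) -> tuple[dict[int, int], dict[str, int]]:
    """Process a few directives; return (bytes by address, constants)."""
    memory: dict[int, int] = {}
    constants: dict[str, int] = {}
    address = 0  # the assembler's current address

    def value(token: str) -> int:
        """Evaluate an operand: a known constant name or a numeric literal."""
        token = token.strip()
        return constants[token] if token in constants else int(token, 0)

    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        directive, _, args = line.partition(" ")
        if directive == ".org":
            address = value(args)              # move the current address
        elif directive == ".equ":
            name, expr = args.split(",")
            constants[name.strip()] = value(expr)  # emits no bytes
        elif directive == ".word":
            word = value(args) & 0xFFFFFFFF
            for i, byte in enumerate(word.to_bytes(4, "little")):
                memory[address + i] = byte
            address += 4
        elif directive == ".byte":
            memory[address] = value(args) & 0xFF
            address += 1
        else:
            raise ValueError(f"unsupported directive: {directive}")
    return memory, constants


SOURCE = """
.equ MAGIC, 0x12345678
.org 0x100
.word MAGIC
.byte 0xFF
"""

memory, constants = layout(SOURCE)
for addr in sorted(memory):
    print(f"{addr:#06x}: {memory[addr]:#04x}")
```

Note that .equ and .org change only the assembler's bookkeeping, while .word and .byte place data bytes in the output and advance the current address.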