A compiler translates programs written in a high-level language (C, C++, Rust, Go, Java) into a lower-level form: typically assembly or machine code, or, in Java's case, JVM bytecode. The compiler is where most of the engineering of language tooling lives: it parses the source, builds intermediate representations, runs optimizations, allocates registers, and finally emits target code.
A typical compiler pipeline:
- Lexical analysis: break the source into tokens ("int", "x", "=", "5", ";").
- Parsing: group tokens into an abstract syntax tree (AST).
- Semantic analysis: type-check, resolve names, and verify that the program is well-formed.
- Intermediate representation (IR): convert the AST into a lower-level form amenable to optimization.
- Optimization: constant folding, dead-code elimination, inlining, loop transformations, etc.
- Register allocation: map IR-level “virtual” variables onto a finite set of physical registers, inserting spills when more values are live than there are registers available.
- Code generation: emit assembly or machine code for the target architecture.
Register allocation is listed before code generation here, but in real compilers the boundary is blurry. In LLVM and GCC, allocation runs as part of the code-generator (back-end) phase rather than after it. Instructions are first selected using an unlimited supply of virtual registers. The allocator then assigns physical registers and inserts spill and reload code, and only after that is the final machine code emitted. Allocation still interacts with instruction choice, because some instructions accept only particular registers.
Each stage is a substantial engineering effort. Production compilers (GCC, Clang, MSVC) consist of millions of lines of code, refined over decades of incremental improvement.
Compiler vs assembler
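The assembler is the next stage after the compiler. The compiler emits human-readable assembly, and the assembler converts it to machine code. In modern toolchains a single command runs both steps. GCC typically writes the assembly to a temporary file and invokes the assembler on it, while Clang's integrated assembler produces machine code directly, so the intermediate .s file never appears. Passing -S to either compiler stops after compilation and keeps the .s file.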
The intellectual work is concentrated in the compiler. The assembler does mostly mechanical translation (mnemonic → opcode bits, label → address). The compiler is where decisions like “should this loop be unrolled?” or “is it cheaper to keep this variable in a register or memory?” get made.
Compilation flow
program.c → [compiler] → program.s → [assembler] → program.o
program.o (+ other .o files) + library files → [linker] → executable
Each .c source file goes through the compiler and assembler independently, producing one .o file per source file. The linker then combines these object files, together with any library files, into the executable.
This per-file separate compilation is what enables incremental rebuilds: change one file, recompile only it, re-link. Without separate compilation, every change would require recompiling the entire program.
Optimization
Modern compilers optimize aggressively. Common optimizations include:
- Constant folding: replace 2 + 3 in source with 5 at compile time.
- Dead code elimination: remove statements whose results are never used.
- Inlining: replace function calls with the function body when the function is small.
- Loop unrolling: replace for (i = 0; i < 4; i++) f(i); with four copies of the call to f, one per value of i.
- Register allocation: put hot variables in registers, spill cold ones to memory.
- Vectorization: turn a loop over an array into SIMD instructions that process multiple elements per cycle.
Optimization levels (-O0 through -O3 in GCC and Clang) control how aggressively these transformations are applied. -O0 compiles quickly and is easy to debug. -O3 compiles more slowly and is harder to debug, but its output often runs 2–10× faster, though the gain varies widely by program.
Compiled vs interpreted
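Compilers produce code that the hardware executes directly. Interpreters (Python, Ruby, JavaScript-without-JIT) instead execute the program at runtime. Some walk its syntax tree; more commonly, they first translate it to bytecode that a virtual machine then executes one instruction at a time. Compiled code is generally faster because it avoids this decode-and-dispatch overhead. It is harder to develop iteratively, however, because every change requires a rebuild.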
Modern hybrid systems blur the line:
- JIT compilation (Java, JavaScript V8): start interpreting, then compile hot code to native at runtime.
- AOT compilation (GraalVM native-image for Java, .NET ReadyToRun): compile ahead of time for fast startup.
For C and C++ specifically, compilation is almost always ahead-of-time: you build once, and the executable runs many times.
In context
The compiler is one stage in the toolchain that turns source code into a runnable program. See:
- Assembler — converts assembly to machine code.
- Linker — combines object files into an executable.
- Loader — loads the executable into memory at runtime.
- Object file — what the assembler emits.
For the design behind a specific assembly language, see Nios II assembly language and Instruction Set Architecture.