A multi-byte data value is naturally aligned when its starting address is a multiple of its size. A 32-bit word is aligned at addresses that are multiples of 4 (0, 4, 8, 12, ...); a 64-bit double is aligned at multiples of 8 (0, 8, 16, ...).
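The definition can be checked with a single mask operation. The following is a minimal sketch; the helper name is_naturally_aligned is hypothetical, and it assumes the size is a nonzero power of two, which holds for the usual 1-, 2-, 4- and 8-byte types.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when addr is a multiple of size.
   Assumes size is a nonzero power of two (1, 2, 4, 8, ...),
   so the low bits of an aligned address are all zero. */
static bool is_naturally_aligned(uintptr_t addr, size_t size)
{
    return (addr & (size - 1)) == 0;
}
```

For example, is_naturally_aligned(8, 4) is true, while is_naturally_aligned(6, 4) is false.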
Many processors require or strongly prefer aligned accesses. The historical reason was that the data bus between CPU and memory was word-wide and a properly aligned word fetched in one bus transaction. On modern systems the picture is different: data paths are typically 64–128 bits wide and memory is fetched in cache lines of 64 bytes. The penalty an unaligned access pays now is crossing a cache-line boundary: a 4-byte access at address 62 spans bytes 62–65, hitting bytes 62–63 in one cache line and 64–65 in the next. That requires two cache lookups (or more, on some hardware) plus byte shifting/merging. Within a single cache line, an unaligned access on x86 is nearly free; only the line-crossing case is slow.
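Whether an access pays the line-crossing penalty depends only on the cache lines that hold its first and last bytes. A minimal sketch, assuming a 64-byte line as in the example above (the constant CACHE_LINE_SIZE and the function name are hypothetical, and real hardware may use a different line size):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64u  /* assumed line size; varies by hardware */

/* Hypothetical helper: true when an access of `size` bytes starting at
   `addr` touches more than one cache line. Assumes size >= 1. */
static bool crosses_cache_line(uintptr_t addr, size_t size)
{
    uintptr_t first_line = addr / CACHE_LINE_SIZE;
    uintptr_t last_line  = (addr + size - 1) / CACHE_LINE_SIZE;
    return first_line != last_line;
}
```

With these assumptions, a 4-byte access at address 62 crosses a line, while a 4-byte access at address 1 is unaligned but stays inside one line.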
Aligned access
Fast and simple. The address goes out, memory returns one word, the processor uses it.
Address: 0 4 8 12 16 20 ...
Words: [W0][W1][W2][W3][W4][W5]...
A word at address 8 fits cleanly. One bus transaction.
Unaligned access
A 32-bit word at address 6 (not a multiple of 4) straddles two aligned words:
Address: 0 4 8 12
Words: [W0][W1][W2]
word at 6 spans bytes 6-9 (bytes 6-7 of W1 and bytes 8-9 of W2)
To get this word, the processor (or memory subsystem) has to:
- Read word at address 4. Take its upper bytes (6, 7).
- Read word at address 8. Take its lower bytes (8, 9).
- Combine the four bytes into one word.
That’s two bus transactions plus shifting. On processors that support unaligned access (x86), it works but can be slower. On processors that don’t (MIPS, Nios II, older ARM), the access typically raises an exception or trap; some older ARM cores instead return rotated data without signaling an error.
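The read-read-combine sequence can be modeled in software. The sketch below is illustrative only: it assumes memory is an array of 32-bit words with little-endian byte numbering (byte k of a word occupies bits 8k to 8k+7), and the function name load32_unaligned is hypothetical. Real hardware does this in the load/store unit, not in code.

```c
#include <stdint.h>

/* Hypothetical model of an unaligned 32-bit load built from aligned reads.
   mem is word-addressed; byte_addr is a byte address into that memory.
   Assumes little-endian byte numbering within each word. */
static uint32_t load32_unaligned(const uint32_t *mem, uint32_t byte_addr)
{
    uint32_t index = byte_addr / 4;        /* aligned word holding the first byte */
    uint32_t shift = (byte_addr % 4) * 8;  /* bit offset of that byte in the word */

    uint32_t lo = mem[index];              /* first aligned read */
    if (shift == 0)
        return lo;                         /* aligned case: one read is enough */

    uint32_t hi = mem[index + 1];          /* second aligned read */

    /* Keep the upper bytes of the first word and the lower bytes of the
       second, then merge them into one value. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The early return for the aligned case also avoids a shift by 32 bits, which is undefined behavior in C.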
Why it’s enforced (or not)
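- RISC machines (Nios II, MIPS, classic ARM, RISC-V): typically reject unaligned accesses with an exception, which keeps the hardware simpler because the bus interface only needs to handle aligned cases. RISC-V leaves misaligned support to the implementation, so some cores handle it in hardware or emulate it in a trap handler.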
- CISC machines (x86): accept unaligned accesses transparently, paying a performance penalty mainly when the access crosses a cache-line boundary.
- Modern ARM: configurable, with most operating systems opting to allow unaligned access for compatibility.
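Because the behavior differs across these families, portable C code usually avoids dereferencing a possibly misaligned pointer and copies the bytes instead. A minimal sketch, with the hypothetical name read_u32; it relies only on the standard memcpy, which compilers commonly turn into a single load on targets that allow unaligned access:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value from an address that may
   not be 4-byte aligned. The result uses the host's byte order. */
static uint32_t read_u32(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);  /* byte-wise copy, no alignment needed */
    return value;
}
```

Casting p to uint32_t * and dereferencing it would be undefined behavior in C when p is misaligned, even on hardware that tolerates the access.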
Practical consequences
- Compilers add padding to structs so each field is naturally aligned. A struct { char c; int i; } typically takes 8 bytes (1 for c, 3 padding, 4 for i), not 5.
- On fixed-width ISAs such as Nios II and MIPS, instructions (including the first instruction of a program) sit at word-aligned addresses, so branch targets must be word-aligned.
- Pointer arithmetic in C is in units of the pointed-to type’s size, not bytes: int *p; p++; advances by 4 bytes (with 32-bit ints), preserving alignment automatically.
- The stack pointer is usually kept word-aligned by convention, or more strictly where the ABI demands it (16-byte alignment on x86-64, which SSE code relies on).
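The struct padding and pointer stride can be observed directly with offsetof and sizeof. This is a sketch under the usual assumption of a 4-byte, 4-byte-aligned int; the names struct sample, padding_after_c and int_stride are hypothetical, and exact sizes depend on the compiler and target.

```c
#include <stddef.h>

/* The example struct from the text: one char followed by one int. */
struct sample {
    char c;
    int  i;
};

/* Padding bytes the compiler inserted between c and i. */
static size_t padding_after_c(void)
{
    return offsetof(struct sample, i) - sizeof(char);
}

/* Byte distance covered by advancing an int pointer by one element. */
static size_t int_stride(void)
{
    int arr[2] = {0, 0};
    int *p = arr;
    int *q = p + 1;                        /* same step as p++ */
    return (size_t)((char *)q - (char *)p);
}
```

On a typical 32-bit or 64-bit target, padding_after_c() returns 3, sizeof(struct sample) is 8, and int_stride() returns 4.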
In Nios II, all word loads (ldw) and stores (stw) require 4-byte alignment. Misaligned accesses raise an exception, so make sure addresses you build for word loads are multiples of 4.
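When an address is computed at run time, it can be forced onto a 4-byte boundary before it is used for a word access. A minimal sketch in C, with hypothetical helper names; it assumes 32-bit addresses, and align_up4 wraps around for addresses within 3 bytes of the top of the address space.

```c
#include <stdint.h>

/* Hypothetical helper: round addr down to the nearest multiple of 4
   by clearing the two low bits. */
static uint32_t align_down4(uint32_t addr)
{
    return addr & ~(uint32_t)3;
}

/* Hypothetical helper: round addr up to the nearest multiple of 4.
   An address that is already aligned is returned unchanged. */
static uint32_t align_up4(uint32_t addr)
{
    return (addr + 3u) & ~(uint32_t)3;
}
```

For address 6, align_down4 gives 4 and align_up4 gives 8; either result is safe to use with a word load, though which one is correct depends on what the program intends to read.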