
Memory Alignment

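Alignment means placing data at addresses that are multiples of the type's alignment requirement, which for fundamental types usually equals their size. It is required for correctness on some architectures and affects performance on all of them.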

Hardware Requirement

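x86-64: Misaligned access works but can be slower (from negligible to several times, worst when it straddles a cache line or page)
ARM/RISC: Misaligned access may fault or be emulated slowly, depending on the core, the instruction, and the OS
The compiler aligns data automatically, but understanding the rules helps you optimize struct layouts.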

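Why Alignment Matters

CPUs fetch memory in fixed-size chunks (4, 8, or 16 bytes, with whole cache lines behind that). Aligned data fits in one chunk and needs one read. Misaligned data can straddle two chunks, which costs extra reads plus shifting and masking to reassemble the value.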

Memory:  [0][1][2][3][4][5][6][7]
         └──────────┘ 4-byte read

Aligned int at [0]: One read ✅
Misaligned int at [1]: Two reads ❌ (slower)

Performance impact: on x86-64 the penalty ranges from negligible to several times slower, and it is worst when the access straddles a cache-line or page boundary.
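The one-read versus two-reads picture above can be worked through as arithmetic. The sketch below is a simplified model, not a description of any particular CPU: `chunks_touched` is a hypothetical helper that counts how many fixed-size chunks an access overlaps.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified model: memory is fetched in fixed-size chunks of `chunk` bytes.
// Returns how many chunks an access of `size` bytes starting at `addr` overlaps.
// Assumes size > 0 and chunk > 0.
constexpr std::size_t chunks_touched(std::uint64_t addr, std::size_t size,
                                     std::size_t chunk) {
    return static_cast<std::size_t>((addr + size - 1) / chunk - addr / chunk + 1);
}

static_assert(chunks_touched(0, 4, 4) == 1, "aligned int at [0]: one chunk");
static_assert(chunks_touched(1, 4, 4) == 2, "misaligned int at [1]: two chunks");
static_assert(chunks_touched(60, 8, 64) == 2,
              "8 bytes at offset 60 straddle a 64-byte cache line");
```

The same formula covers cache lines: pass 64 as the chunk size to see whether a value would span two lines.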

Natural Alignment

For fundamental types, alignment usually equals size (up to the platform's word size).

char c;      // 1-byte alignment
short s; // 2-byte alignment
int i; // 4-byte alignment
double d; // 8-byte alignment

// Memory layout:
// Address Type
// 0x1000 char c
// 0x1001 (padding)
// 0x1002 short s
// 0x1004 int i
// 0x1008 double d

Rule: Address must be divisible by alignment requirement.
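The divisibility rule can be expressed as two small helpers. This is a sketch with hypothetical names (`align_up`, `is_aligned_to`); it assumes the alignment is a power of two, which is true of every alignment C++ allows. The assertions reproduce the address table above.

```cpp
#include <cstdint>

// Rounds `addr` up to the next multiple of `alignment`.
// Assumes `alignment` is a power of two.
constexpr std::uint64_t align_up(std::uint64_t addr, std::uint64_t alignment) {
    return (addr + alignment - 1) & ~(alignment - 1);
}

// True when `addr` is divisible by `alignment` (a power of two).
constexpr bool is_aligned_to(std::uint64_t addr, std::uint64_t alignment) {
    return (addr & (alignment - 1)) == 0;
}

// Walk the layout from the table, starting at 0x1000.
static_assert(align_up(0x1000, 1) == 0x1000, "char c");
static_assert(align_up(0x1001, 2) == 0x1002, "short s, after 1 byte of padding");
static_assert(align_up(0x1004, 4) == 0x1004, "int i needs no padding");
static_assert(align_up(0x1008, 8) == 0x1008, "double d needs no padding");
static_assert(!is_aligned_to(0x1001, 2), "an odd address is not 2-byte aligned");
```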

Struct Padding

The compiler inserts padding between members so that each one meets its alignment requirement.

struct Bad {
char c; // 1 byte
// 3 bytes padding
int i; // 4 bytes
char c2; // 1 byte
// 3 bytes padding
};
sizeof(Bad); // 12 bytes (50% waste!)

struct Good {
int i; // 4 bytes
char c; // 1 byte
char c2; // 1 byte
// 2 bytes padding
};
sizeof(Good); // 8 bytes (33% smaller!)

Optimization: Order members largest → smallest to minimize padding.
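One way to see where the padding goes is to ask the compiler for member offsets. The sketch below reuses `Bad` and `Good` from above; `padding_bytes` and `kPayload` are hypothetical names introduced for illustration. The asserted numbers assume a common ABI with a 4-byte, 4-byte-aligned `int`; the C++ standard does not guarantee them.

```cpp
#include <cstddef>  // offsetof, std::size_t

struct Bad  { char c; int i; char c2; };
struct Good { int i; char c; char c2; };

// Padding = total size minus the bytes the members actually need.
template <typename T>
constexpr std::size_t padding_bytes(std::size_t payload) {
    return sizeof(T) - payload;
}

constexpr std::size_t kPayload = sizeof(char) + sizeof(int) + sizeof(char);

// Assumption: 4-byte int with 4-byte alignment (typical, not guaranteed).
static_assert(offsetof(Bad, i) == 4, "3 bytes of padding before i");
static_assert(offsetof(Bad, c2) == 8, "c2 follows i directly");
static_assert(padding_bytes<Bad>(kPayload) == 6, "3 interior + 3 trailing");
static_assert(offsetof(Good, c2) == 5, "the two chars pack together");
static_assert(padding_bytes<Good>(kPayload) == 2, "only trailing padding");
```

If any assertion fails to compile on your platform, the message tells you which layout assumption differs.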

Array Alignment

A struct's size must be a multiple of its alignment so that every element of an array stays aligned.

struct Example {
char c; // 1 byte
int i; // 4 bytes (+ 3 padding before)
char c2; // 1 byte
// 3 bytes trailing padding → size = 12
};

Example arr[2];
// arr[0] at 0x1000: properly aligned
// arr[1] at 0x100C: properly aligned
// Without trailing padding, arr[1].i would be misaligned!

Why trailing padding: Ensures array elements stay aligned.
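The effect of trailing padding can be checked with address arithmetic. In this sketch `member_address` is a hypothetical helper, and the stride of 9 is a made-up "what if there were no trailing padding" value, not something a conforming compiler would produce.

```cpp
#include <cstddef>

struct Example { char c; int i; char c2; };

// Address of a member inside element `index` of an array that starts at `base`
// and whose elements are `stride` bytes apart.
constexpr std::size_t member_address(std::size_t base, std::size_t stride,
                                     std::size_t index, std::size_t member_offset) {
    return base + index * stride + member_offset;
}

// With trailing padding (stride 12), arr[1].i lands at 0x1010: divisible by 4.
static_assert(member_address(0x1000, 12, 1, 4) % 4 == 0, "aligned with padding");
// Hypothetical stride of 9 (no trailing padding): arr[1].i lands at 0x100D.
static_assert(member_address(0x1000, 9, 1, 4) % 4 != 0, "misaligned without padding");
// Follows from the array layout rules: size is always a multiple of alignment.
static_assert(sizeof(Example) % alignof(Example) == 0, "stride keeps elements aligned");
```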

alignof and alignas

Query and control alignment.

// Query alignment
std::cout << alignof(char) << "\n"; // 1
std::cout << alignof(int) << "\n"; // 4
std::cout << alignof(double) << "\n"; // 8

struct Widget { char c; int i; };
std::cout << alignof(Widget) << "\n"; // 4 (largest member)

// Specify alignment
alignas(64) int cache_line_var; // 64-byte aligned

struct alignas(32) Aligned {
int x;
// Padding to 32 bytes
};
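The effect of `alignas` on a type can be verified both at compile time and at run time. This is a minimal sketch; `address_is_32_aligned` and `check_stack_object` are hypothetical helpers, and the `Aligned` struct is the one declared above.

```cpp
#include <cassert>
#include <cstdint>

struct alignas(32) Aligned {
    int x;
};

static_assert(alignof(Aligned) == 32, "alignas raises the type's alignment");
static_assert(sizeof(Aligned) == 32, "size is rounded up to the alignment");

// Runtime check: is this object's address a multiple of 32?
inline bool address_is_32_aligned(const Aligned& a) {
    return reinterpret_cast<std::uintptr_t>(&a) % 32 == 0;
}

// The compiler also honors the alignment for automatic (stack) objects.
inline void check_stack_object() {
    Aligned a{};
    assert(address_is_32_aligned(a));
}
```

Note that the padding to 32 bytes is part of `sizeof(Aligned)`, so an array of these spends 28 bytes per element on padding.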

Common Use Cases

Cache-Line Alignment (Prevent False Sharing)

// Problem: False sharing
struct SharedCounters {
    int thread1; // Same cache line
    int thread2; // Threads fight for cache line ownership
};

// Solution: Give each counter its own cache line (64 bytes assumed)
struct PaddedCounters {
    alignas(64) int thread1; // Cache line 1
    alignas(64) int thread2; // Cache line 2
};
// Can be much faster under heavy multithreaded contention; the gain depends on the workload.
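A fuller sketch of the padded-counter idea with real threads follows. It assumes 64-byte cache lines, as the text does; `PaddedCounter`, `Counters`, and `run_counters` are illustrative names, and atomics are used so the increments are well defined.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>  // std::ref
#include <thread>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
};

struct Counters {
    PaddedCounter thread1;  // occupies its own cache line
    PaddedCounter thread2;  // starts on the next cache line
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per line");
static_assert(sizeof(Counters) == 2 * kCacheLine, "no line is shared");

// Two threads each bump their own counter; returns the combined total.
inline long run_counters(long iterations) {
    Counters counters;
    auto bump = [iterations](PaddedCounter& c) {
        for (long k = 0; k < iterations; ++k) {
            c.value.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t1(bump, std::ref(counters.thread1));
    std::thread t2(bump, std::ref(counters.thread2));
    t1.join();
    t2.join();
    return counters.thread1.value.load() + counters.thread2.value.load();
}
```

Calling `run_counters(n)` should return `2 * n`. Some toolchains need `-pthread` to link it. C++17 also defines `std::hardware_destructive_interference_size` in `<new>`, which can replace the hard-coded 64 where the standard library provides it.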

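SIMD Alignment

// Aligned SSE loads require 16-byte alignment (AVX: 32, AVX-512: 64)
alignas(16) float vector[8];

// Fast: aligned SIMD load (load_simd is a placeholder, e.g. for _mm_load_ps)
load_simd(vector);

// Unaligned: this address is float-aligned but not 16-byte aligned.
// An aligned load would fault here; an unaligned load (e.g. _mm_loadu_ps) is required.
float* unaligned = vector + 1;
load_simd_unaligned(unaligned); // placeholder; may be slower, mainly on older CPUs or across cache lines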
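For a concrete version of the placeholder calls above, the sketch below uses the SSE intrinsics `_mm_load_ps` (aligned) and `_mm_loadu_ps` (unaligned) from `<xmmintrin.h>`. The function name `add4` and the macro `ALIGN_DEMO_HAVE_SSE` are made up for this example, and a scalar fallback is compiled on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_AMD64)
#include <xmmintrin.h>  // SSE intrinsics
#define ALIGN_DEMO_HAVE_SSE 1
#else
#define ALIGN_DEMO_HAVE_SSE 0
#endif

// Adds four floats from `a` and `b` into `out`.
// `a` and `out` must be 16-byte aligned; `b` only needs float alignment.
inline void add4(const float* a, const float* b, float* out) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
#if ALIGN_DEMO_HAVE_SSE
    __m128 va = _mm_load_ps(a);   // aligned load: faults if `a` is misaligned
    __m128 vb = _mm_loadu_ps(b);  // unaligned load: no 16-byte requirement
    _mm_store_ps(out, _mm_add_ps(va, vb));
#else
    for (std::size_t k = 0; k < 4; ++k) out[k] = a[k] + b[k];  // scalar fallback
#endif
}
```

Passing `storage + 1` as `b`, where `storage` is a 16-byte-aligned float array, exercises the unaligned path without invoking undefined behavior, because the pointer is still correctly aligned for `float`.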

Optimal Struct Layout

// ❌ Poor: 24 bytes (14 bytes wasted)
struct Poor {
char a; // 1 + 7 padding
double b; // 8
char c; // 1 + 7 padding
};

// ✅ Better: 16 bytes (6 bytes wasted)
struct Better {
double b; // 8
char a; // 1
char c; // 1
// 6 bytes padding
};

// ✅ Best: 16 bytes (0 bytes wasted: the former padding now holds data)
struct Best {
double b; // 8
int i; // 4
char a; // 1
char c; // 1
short s; // 2
};

Strategy: Largest types first, fill gaps with smaller types.
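The padding rule the compiler applies can be modeled in a few lines, which makes it easy to compare member orders before writing the struct. This is a simplified sketch of the usual layout algorithm (no bit-fields, base classes, or ABI quirks); `Member`, `align_up`, and `struct_size` are hypothetical names.

```cpp
#include <cstddef>

// One struct member described by its size and alignment in bytes.
struct Member { std::size_t size; std::size_t align; };

constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Lays members out in order: pad each up to its alignment, then pad the
// total up to the strictest member alignment (the trailing padding).
template <std::size_t N>
constexpr std::size_t struct_size(const Member (&members)[N]) {
    std::size_t offset = 0;
    std::size_t max_align = 1;
    for (std::size_t k = 0; k < N; ++k) {
        offset = align_up(offset, members[k].align) + members[k].size;
        if (members[k].align > max_align) max_align = members[k].align;
    }
    return align_up(offset, max_align);
}

constexpr Member kPoor[]   = {{1, 1}, {8, 8}, {1, 1}};                  // char, double, char
constexpr Member kBetter[] = {{8, 8}, {1, 1}, {1, 1}};                  // double, char, char
constexpr Member kBest[]   = {{8, 8}, {4, 4}, {1, 1}, {1, 1}, {2, 2}};  // no gaps left

static_assert(struct_size(kPoor) == 24, "matches struct Poor");
static_assert(struct_size(kBetter) == 16, "matches struct Better");
static_assert(struct_size(kBest) == 16, "matches struct Best");
```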

Packed Structures

Remove padding (dangerous, slow).

// GCC/Clang
struct __attribute__((packed)) Packed {
char c; // 1
int i; // 4 (misaligned!)
char c2; // 1
}; // 6 bytes total

// MSVC
#pragma pack(push, 1)
struct Packed {
char c; int i; char c2;
};
#pragma pack(pop)

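Effects:

  • x86-64: Works, but misaligned fields can be slower
  • ARM: May fault, depending on the core and the instruction
  • C/C++: Using a pointer or reference to a misaligned packed member is undefined behavior
  • Use only for: file formats, network protocols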
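For file formats and network protocols, one alternative to a packed struct is to keep a normally aligned struct in memory and copy each field to or from the byte buffer with `std::memcpy`, which has no alignment requirement. The sketch below assumes a hypothetical 6-byte record with the same field order as `Packed` and stores the value in host byte order; real protocols usually fix the byte order, which this example does not handle.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Naturally aligned in-memory form of the 6-byte record.
struct Record {
    std::uint32_t value;
    std::uint8_t tag;
    std::uint8_t flags;
};

constexpr std::size_t kWireSize = 6;  // tag (1) + value (4) + flags (1)

// Decodes a record from `bytes`, which must hold at least kWireSize bytes.
// memcpy reads the value at offset 1 without a misaligned typed access.
inline Record decode(const unsigned char* bytes) {
    assert(bytes != nullptr);
    Record r{};
    r.tag = bytes[0];
    std::memcpy(&r.value, bytes + 1, sizeof r.value);  // host byte order assumed
    r.flags = bytes[5];
    return r;
}

// Encodes `r` into `bytes` (at least kWireSize bytes); the inverse of decode.
inline void encode(const Record& r, unsigned char* bytes) {
    assert(bytes != nullptr);
    bytes[0] = r.tag;
    std::memcpy(bytes + 1, &r.value, sizeof r.value);
    bytes[5] = r.flags;
}
```

Compilers typically turn these small fixed-size `memcpy` calls into plain load and store instructions, so the approach usually costs little compared with a packed struct.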

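Alignment at Runtime

#include <cstdint>   // std::uintptr_t
#include <iostream>

template<typename T>
bool is_aligned(const T* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % alignof(T) == 0;
}

int x;
std::cout << is_aligned(&x); // True for any normally declared int

alignas(int) char buffer[16]; // Start the buffer on an int boundary
int* p = reinterpret_cast<int*>(buffer + 1); // Deliberately misaligned; never dereference
std::cout << is_aligned(p); // False - misaligned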
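Beyond checking a pointer, the standard library can find the next suitably aligned address inside a raw buffer with `std::align` from `<memory>`. The wrapper `first_aligned_slot` and the type `Vec4` below are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>  // std::align

// Example over-aligned type.
struct alignas(16) Vec4 { float v[4]; };

// Returns the first address in [buffer, buffer + length) where a T fits
// with correct alignment, or nullptr if the space is too small.
template <typename T>
void* first_aligned_slot(void* buffer, std::size_t length) {
    assert(buffer != nullptr);
    void* cursor = buffer;
    std::size_t space = length;
    // std::align advances `cursor` and shrinks `space` on success,
    // and returns nullptr (leaving both unchanged) on failure.
    return std::align(alignof(T), sizeof(T), cursor, space);
}
```

Given a 16-byte-aligned buffer `buf`, `first_aligned_slot<Vec4>(buf + 1, 47)` skips 15 bytes and returns `buf + 16`, while the same call with only 16 bytes of space returns `nullptr`.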

Quick Reference

Type      Size            Alignment
char      1               1
short     2               2
int       4               4
long      8               8 (64-bit Linux/macOS; 4 on 64-bit Windows)
float     4               4
double    8               8
pointer   8               8 (64-bit)
struct    sum + padding   max member

Summary

Memory Alignment - Key Points

Alignment Requirements:

  • Data at addresses divisible by its alignment (usually its size)
  • x86-64: Misaligned = works, but can be slower
  • ARM/RISC: Misaligned = may fault or run slowly
  • Compiler aligns automatically (inserts padding)

Natural Alignment Rules:

  • char: 1-byte (any address)
  • short: 2-byte (even addresses)
  • int: 4-byte (divisible by 4)
  • double: 8-byte (divisible by 8)
  • pointer: 8-byte on 64-bit systems

Struct Padding:

  • Compiler inserts padding to align members
  • Order matters: large→small minimizes waste
  • Trailing padding ensures array element alignment
  • Example: char (1) + int (4) + char (1) = 12 bytes (not 6!)

Optimization Strategy:

  • Order members: largest types first
  • Fill gaps with smaller types
  • Bad: char, double, char = 24 bytes
  • Good: double, char, char = 16 bytes

Cache-Line Alignment (64 bytes):

  • Prevents false sharing in multithreaded code
  • alignas(64) for per-thread counters
  • Can give a large speedup under contention by avoiding cache-line ping-pong

SIMD Alignment (16 bytes):

  • Required for aligned vector loads and stores (unaligned variants exist)
  • alignas(16) float vector[4]
  • Aligned loads are never slower than unaligned ones and can be faster

Control Alignment:

  • Query: alignof(Type) returns requirement
  • Specify: alignas(N) increases alignment
  • Packed structs: Remove padding (dangerous)

Packed Structures:

  • Remove all padding with __attribute__((packed))
  • Causes slow/unsafe misaligned access
  • Only for: binary file formats, network protocols
  • Never for: program data structures

// Interview answer:
// "Alignment means data at addresses divisible by its alignment,
// usually its size. It matters for correctness (some ARM and RISC
// cores fault on misalignment) and for performance (x86 handles it
// but can be slower). The compiler inserts struct padding to align
// members. Optimize by ordering large→small. Cache-line alignment
// (64 bytes) prevents false sharing in multithreaded code. Use
// alignof/alignas for explicit control."