Skip to main content

Atomics and Memory Fences

Atomic operations execute as single, indivisible steps with no interference from other threads. Memory fences (barriers) establish ordering constraints between operations.

Lock-Free Synchronization

Atomics = indivisible operations (read/write/modify)
Fences = ordering guarantees for surrounding operations
Together they enable lock-free programming and fine-grained synchronization

Atomic Operations

Atomic operations are indivisible - no other thread can observe them half-complete.

#include <atomic>

std::atomic<int> counter{0};

void increment() {
++counter; // Atomic: read-modify-write as one operation
}

void read() {
int value = counter; // Atomic read
}

The Non-Atomic Problem

// WITHOUT atomics - DATA RACE!
int counter = 0;

// Thread 1 Thread 2
counter++; counter++;
// Load: 0 Load: 0
// Add: 1 Add: 1
// Store: 1 Store: 1
// Result: counter = 1 (should be 2!)

Without atomics, the three operations (load, add, store) can interleave, causing lost updates.

Basic Atomic Operations

#include <atomic>

std::atomic<int> x{0};

// Read
int val = x.load(); // Explicit load
int val2 = x; // Implicit load

// Write
x.store(42); // Explicit store
x = 42; // Implicit store

// Read-modify-write
int old = x.exchange(10); // Swap value, return old

// Increment/decrement
x++; // Atomic increment
++x; // Atomic increment
x--; // Atomic decrement
int prev = x.fetch_add(5); // Add 5, return old value
int prev2 = x.fetch_sub(3); // Subtract 3, return old value

Compare-and-Swap (CAS)

The fundamental operation for lock-free algorithms.

std::atomic<int> value{10};

int expected = 10;
int desired = 20;

// If value==expected, set value=desired; return true
// Otherwise, set expected=value; return false
bool success = value.compare_exchange_strong(expected, desired);

if (success) {
// value is now 20
} else {
// value was not 10, expected now contains actual value
}

Strong vs Weak CAS

// Strong: only fails if value != expected
value.compare_exchange_strong(expected, desired);

// Weak: may spuriously fail even if value == expected
// Faster on some architectures
value.compare_exchange_weak(expected, desired);

// Typical usage of weak in loop
while (!value.compare_exchange_weak(expected, desired)) {
// Retry if spurious failure
}

Weak CAS is faster but can fail spuriously. Use in loops where retry is acceptable.

Lock-Free Stack Example

template<typename T>
class LockFreeStack {
struct Node {
T data;
Node* next;
};

std::atomic<Node*> head_{nullptr};

public:
void push(T value) {
Node* newNode = new Node{value, nullptr};
newNode->next = head_.load();

// Keep trying until successful
while (!head_.compare_exchange_weak(newNode->next, newNode)) {
// If head changed, newNode->next updated automatically
}
}

bool pop(T& result) {
Node* oldHead = head_.load();

while (oldHead &&
!head_.compare_exchange_weak(oldHead, oldHead->next)) {
// Retry if head changed
}

if (oldHead) {
result = oldHead->data;
delete oldHead; // ⚠️ Unsafe - ABA problem!
return true;
}
return false;
}
};

Memory Fences

Fences establish ordering constraints between non-atomic and relaxed atomic operations.

Without Fences - Problem

std::atomic<bool> ready{false};
int data = 0;

void writer() {
data = 42; // (1)
ready.store(true, std::memory_order_relaxed); // (2)
}

void reader() {
while (!ready.load(std::memory_order_relaxed)); // (3)
assert(data == 42); // ⚠️ NOT GUARANTEED with relaxed!
}

With relaxed ordering, there's no guarantee data = 42 happens before ready = true.

With Fences - Solution

void writer() {
data = 42; // (1)
std::atomic_thread_fence(std::memory_order_release); // Fence!
ready.store(true, std::memory_order_relaxed); // (2)
}

void reader() {
while (!ready.load(std::memory_order_relaxed)); // (3)
std::atomic_thread_fence(std::memory_order_acquire); // Fence!
assert(data == 42); // ✅ Now guaranteed!
}

The release fence ensures all prior writes complete before the store. The acquire fence ensures all subsequent reads see those writes.

Fence Types

// Acquire fence: prevents later reads/writes from moving before
std::atomic_thread_fence(std::memory_order_acquire);

// Release fence: prevents earlier reads/writes from moving after
std::atomic_thread_fence(std::memory_order_release);

// Full fence: both acquire and release
std::atomic_thread_fence(std::memory_order_acq_rel);

// Sequential consistency fence: strongest ordering
std::atomic_thread_fence(std::memory_order_seq_cst);

Fence Usage Pattern

Protecting multiple variables with relaxed atomics:

std::atomic<bool> ready{false};
int x = 0, y = 0, z = 0;

void producer() {
x = 1;
y = 2;
z = 3;
std::atomic_thread_fence(std::memory_order_release);
ready.store(true, std::memory_order_relaxed);
}

void consumer() {
while (!ready.load(std::memory_order_relaxed));
std::atomic_thread_fence(std::memory_order_acquire);
// All writes to x, y, z are visible
assert(x == 1 && y == 2 && z == 3);
}

Atomic Counter Example

#include <atomic>

class AtomicCounter {
std::atomic<int> count_{0};

public:
void increment() {
count_.fetch_add(1, std::memory_order_relaxed);
}

int get() const {
return count_.load(std::memory_order_relaxed);
}
};

Relaxed ordering is sufficient for simple counters where exact ordering doesn't matter.

std::atomic_flag

The only guaranteed lock-free atomic type.

std::atomic_flag flag = ATOMIC_FLAG_INIT;

// Test and set
if (!flag.test_and_set()) {
// First time, flag is now set
}

// Clear
flag.clear();

// Simple spinlock
class Spinlock {
std::atomic_flag locked_ = ATOMIC_FLAG_INIT;

public:
void lock() {
while (locked_.test_and_set(std::memory_order_acquire)) {
// Spin
}
}

void unlock() {
locked_.clear(std::memory_order_release);
}
};

Checking Lock-Free Support

std::atomic<int> x;
std::atomic<double> y;
std::atomic<MyLargeStruct> z;

// Check if lock-free at compile-time
static_assert(std::atomic<int>::is_always_lock_free);

// Check at runtime
if (x.is_lock_free()) {
std::cout << "int atomic is lock-free\n";
}

if (y.is_lock_free()) {
std::cout << "double atomic is lock-free\n";
}

// Large types might not be lock-free
if (!z.is_lock_free()) {
std::cout << "MyLargeStruct uses locks\n";
}

Performance Comparison

// Relaxed: fastest, no ordering
counter.fetch_add(1, std::memory_order_relaxed);

// Acquire/Release: medium, synchronization
flag.store(true, std::memory_order_release);

// Sequential: slowest, full ordering
flag.store(true, std::memory_order_seq_cst); // Default

Typical latency:

  • Relaxed: ~1-2 cycles
  • Acquire/Release: ~5-10 cycles
  • Sequential consistency: ~10-20 cycles

Producer-Consumer with Fences

#include <atomic>
#include <vector>

class MessageQueue {
std::vector<int> buffer_;
std::atomic<size_t> writePos_{0};
std::atomic<size_t> readPos_{0};

public:
void push(int value) {
size_t pos = writePos_.load(std::memory_order_relaxed);
buffer_[pos] = value;

std::atomic_thread_fence(std::memory_order_release);
writePos_.store(pos + 1, std::memory_order_relaxed);
}

bool pop(int& result) {
size_t pos = readPos_.load(std::memory_order_relaxed);
if (pos >= writePos_.load(std::memory_order_relaxed)) {
return false;
}

std::atomic_thread_fence(std::memory_order_acquire);
result = buffer_[pos];
readPos_.store(pos + 1, std::memory_order_relaxed);
return true;
}
};

Best Practices

DO
  • Use atomics for simple shared counters/flags
  • Use relaxed ordering for independent operations
  • Check is_lock_free() for critical paths
  • Profile before optimizing memory orders
  • Start with sequential consistency, optimize if needed
DON'T
  • Use atomics for complex data structures (use mutexes)
  • Mix atomic and non-atomic access to same variable
  • Assume all atomic operations are lock-free
  • Use relaxed ordering without understanding memory model
  • Over-optimize memory ordering prematurely

Summary

Atomic operations:

  • Indivisible read/write/modify operations
  • No data races, no torn reads/writes
  • Support lock-free algorithms
  • load(), store(), exchange(), CAS

Memory fences:

  • Establish ordering between operations
  • Release fence: prior writes complete before fence
  • Acquire fence: subsequent reads see prior writes
  • Allow relaxed atomics with ordering guarantees

Use cases:

  • Atomics: Simple counters, flags, lock-free structures
  • Fences: Protecting multiple variables with relaxed atomics
  • Sequential consistency: When in doubt (default, safe)
  • Relaxed: Independent counters, statistics