453: Memory Ordering
Tutorial
The Problem
Modern CPUs and compilers reorder instructions for performance, so on a multi-core system one thread's operations may appear in a different order to another thread. A memory ordering specifies which synchronization guarantees an atomic operation provides: Relaxed (atomicity only, no ordering guarantees), Acquire/Release (a synchronized handoff between writer and reader), AcqRel (both acquire and release, for read-modify-write operations), and SeqCst (acquire/release plus a single total order over all SeqCst operations). An ordering that is too weak can cause data races or stale reads; one that is too strong costs performance for no benefit. The Release-Acquire pair is the key idiom: a Release store "publishes" every write made before it, and an Acquire load that observes that store "subscribes" to those writes.
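The publish/subscribe idiom can be sketched with two threads that really do run concurrently. This is a minimal illustration, not a canonical implementation: the helper name `publish_and_read` and the variable names are hypothetical, and the payload is itself an atomic so the example stays in safe Rust.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: a writer publishes `value`, and a concurrently
/// spinning reader returns what it observes once the flag is set.
fn publish_and_read(value: usize) -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let writer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            // Write the payload first; Relaxed is enough for this store...
            data.store(value, Ordering::Relaxed);
            // ...because the Release store publishes everything before it.
            ready.store(true, Ordering::Release);
        })
    };

    let reader = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            // Spin until the Acquire load observes the Release store.
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // The payload write happens-before this load, so it is visible.
            data.load(Ordering::Relaxed)
        })
    };

    writer.join().unwrap();
    reader.join().unwrap()
}
```

Calling `publish_and_read(42)` returns 42: the reader only leaves its loop after observing the Release store, and at that point the earlier payload write is guaranteed to be visible.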
Memory ordering is foundational to all lock-free programming, Arc's reference counting, message passing channel internals, and spinlock implementations.
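Counting is the simplest of these cases. When an atomic serves only as a counter and no other memory is published through it, Relaxed is sufficient, because each `fetch_add` is still atomic. The sketch below assumes a hypothetical helper named `count_events`; the thread and iteration counts are arbitrary.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: `threads` workers each bump a shared counter
/// `per_thread` times, and the final total is returned.
fn count_events(threads: usize, per_thread: usize) -> usize {
    let hits = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let hits = Arc::clone(&hits);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Relaxed: atomic increment, no ordering with other memory.
                    hits.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    // Joining each worker makes all of its increments visible here.
    for handle in handles {
        handle.join().unwrap();
    }
    hits.load(Ordering::Relaxed)
}
```

No increment is lost, so `count_events(4, 1000)` returns 4000. What Relaxed does not give is any ordering between the counter and other data, which is why it is the wrong choice for a "ready" flag.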
🎯 Learning Outcomes
Relaxed, Acquire, Release, AcqRel, SeqCst
store(..., Release) and a load(..., Acquire) that reads the stored value form a happens-before edge
Relaxed is sufficient for independent counters where ordering doesn't matter
SeqCst is the safest default but has the highest cost
Code Example
#![allow(clippy::all)]
// 453. Memory ordering: Relaxed, Acquire, Release
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_release_acquire() {
        let d = Arc::new(AtomicUsize::new(0)); // payload
        let f = Arc::new(AtomicBool::new(false)); // "ready" flag
        let (dc, fc) = (Arc::clone(&d), Arc::clone(&f));
        thread::spawn(move || {
            // Write the payload, then publish it with a Release store.
            dc.store(42, Ordering::Relaxed);
            fc.store(true, Ordering::Release);
        })
        .join()
        .unwrap();
        // join() already synchronizes with the finished thread, so this test
        // shows the shape of the pattern rather than a real race.
        assert!(f.load(Ordering::Acquire));
        assert_eq!(d.load(Ordering::Relaxed), 42);
    }
}
Key Differences
OCaml Approach
OCaml 5.x's Atomic module uses sequential consistency for all operations; there is no explicit ordering control. The simplicity reduces bug potential but prevents the optimizations that weaker orderings enable. For non-atomic accesses, the OCaml 5 memory model (described in "Bounding Data Races in Space and Time", PLDI 2018) guarantees sequential consistency for data-race-free programs and, unlike C11, gives racy programs bounded, well-defined behavior instead of undefined behavior.
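For comparison, the closest Rust analogue to OCaml's always-sequentially-consistent atomics is to pass SeqCst to every operation. The classic store-buffering test shows what that buys; the following is a sketch in which the helper name `store_buffering_round` is hypothetical. Each thread sets its own flag and then reads the other's. Under SeqCst at least one thread must see the other's store, whereas Release/Acquire alone would allow both to read `false`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: runs one round of the store-buffering test and
/// returns what each thread saw of the other thread's flag.
fn store_buffering_round() -> (bool, bool) {
    let x = Arc::new(AtomicBool::new(false));
    let y = Arc::new(AtomicBool::new(false));

    let t1 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            x.store(true, Ordering::SeqCst);
            y.load(Ordering::SeqCst) // did thread 2's store come first?
        })
    };
    let t2 = {
        let (x, y) = (Arc::clone(&x), Arc::clone(&y));
        thread::spawn(move || {
            y.store(true, Ordering::SeqCst);
            x.load(Ordering::SeqCst) // did thread 1's store come first?
        })
    };

    (t1.join().unwrap(), t2.join().unwrap())
}
```

Because all four operations take part in one total order, whichever store comes first in that order is visible to the other thread's later load, so the result is never `(false, false)`.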
Full Source
#![allow(clippy::all)]
// 453. Memory ordering: Relaxed, Acquire, Release
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_release_acquire() {
        let d = Arc::new(AtomicUsize::new(0));
        let f = Arc::new(AtomicBool::new(false));
        let (dc, fc) = (Arc::clone(&d), Arc::clone(&f));
        thread::spawn(move || {
            dc.store(42, Ordering::Relaxed);
            fc.store(true, Ordering::Release);
        })
        .join()
        .unwrap();
        assert!(f.load(Ordering::Acquire));
        assert_eq!(d.load(Ordering::Relaxed), 42);
    }
}
Exercises
Implement a spinlock using an AtomicBool with compare_exchange(false, true, Acquire, Relaxed) for lock and store(false, Release) for unlock. Explain in a comment why these orderings are sufficient.
… Acquire), reads data, reads counter again, retries if different. Use correct orderings.
Use Relaxed on a flag without the Release-Acquire pattern: have one thread write data and then set a Relaxed flag, and have another spin on the Relaxed flag and then read the data. Document what incorrect result the reader might observe on weakly-ordered CPUs (ARM/POWER).