454 Fundamental

454: Compare-and-Exchange (CAS)

Functional Programming

Tutorial

The Problem

Atomic increment (fetch_add) handles the simple case, but complex atomic updates — "set to max", "set if value is X", "conditional swap" — require compare_and_exchange (CAS). CAS is the universal primitive for lock-free algorithms: read the current value, compute the new value, atomically swap old→new only if the value is still what you read. If another thread changed it, retry. This retry loop is the foundation of all lock-free data structures: lock-free stacks, queues, linked lists, and hash maps.

CAS appears in Arc's reference count decrement (is-zero check), lock-free queue implementations, optimistic concurrency control, and AtomicPtr pointer swaps.

🎯 Learning Outcomes

• Understand the CAS loop pattern: load → compute → compare_exchange → retry on failure

• Learn the difference between compare_exchange (strong, no spurious failure) and compare_exchange_weak (weak, may spuriously fail — preferred in loops)

• See how atomic_max uses CAS to atomically update a maximum value

• Understand the (success_ordering, failure_ordering) parameters

• Learn the ABA problem and why some lock-free algorithms need versioned pointers

Code Example

#![allow(clippy::all)]
// 454. CAS: compare_exchange and loops
use std::sync::atomic::{AtomicI64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn cas_increment(a: &AtomicUsize) {
    let mut cur = a.load(Ordering::Relaxed);
    loop {
        match a.compare_exchange_weak(cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => break,
            Err(actual) => cur = actual,
        }
    }
}

fn atomic_max(a: &AtomicI64, v: i64) {
    let mut cur = a.load(Ordering::Relaxed);
    loop {
        if v <= cur {
            break;
        }
        match a.compare_exchange_weak(cur, v, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => break,
            Err(actual) => cur = actual,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_cas_inc() {
        let a = AtomicUsize::new(0);
        for _ in 0..100 {
            cas_increment(&a);
        }
        assert_eq!(a.load(Ordering::SeqCst), 100);
    }
    #[test]
    fn test_cas_fail() {
        let a = AtomicUsize::new(5);
        assert_eq!(
            a.compare_exchange(99, 0, Ordering::SeqCst, Ordering::SeqCst),
            Err(5)
        );
    }
    #[test]
    fn test_max() {
        let m = AtomicI64::new(i64::MIN);
        for v in [5, 3, 8, 1, 9, 2] {
            atomic_max(&m, v);
        }
        assert_eq!(m.load(Ordering::SeqCst), 9);
    }
}

(* 454. CAS loops – OCaml 5 *)
let counter = Atomic.make 0

let cas_increment () =
  let rec loop () =
    let cur = Atomic.get counter in
    if not (Atomic.compare_and_set counter cur (cur+1)) then loop ()
  in loop ()

let global_max = Atomic.make min_int

let update_max v =
  let rec loop () =
    let cur = Atomic.get global_max in
    if v > cur && not (Atomic.compare_and_set global_max cur v) then loop ()
  in loop ()

let () =
  let ws = Array.init 4 (fun _ ->
    Domain.spawn (fun () -> for _ = 1 to 100 do cas_increment () done)
  ) in
  Array.iter Domain.join ws;
  Printf.printf "counter=%d\n" (Atomic.get counter);
  List.iter update_max [3;7;1;9;4;6];
  Printf.printf "max=%d\n" (Atomic.get global_max)

Key Differences

Result vs. bool: Rust's compare_exchange returns Result<old_val, actual_val>; OCaml returns bool, requiring a separate Atomic.get to get the actual value on failure.

Weak variant: Rust has compare_exchange_weak for loop use; OCaml's single compare_and_set corresponds roughly to compare_exchange (strong).

ABA problem: Both languages' CAS operations are susceptible to the ABA problem; solutions require versioned pointers (u128 packing tag + pointer) or hazard pointers.

Ordering: Rust separates success and failure orderings; OCaml uses sequential consistency for all atomic operations.

OCaml Approach

OCaml 5.x's Atomic.compare_and_set old_val new_val at is the CAS primitive. It returns a bool rather than Result<_, actual>, so you reload the value after failure. OCaml's CAS uses sequential consistency. A CAS-based counter: let cas_inc a = while not (let cur = Atomic.get a in Atomic.compare_and_set a cur (cur+1)) do () done.

Full Source

#![allow(clippy::all)]
// 454. CAS: compare_exchange and loops
use std::sync::atomic::{AtomicI64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn cas_increment(a: &AtomicUsize) {
    let mut cur = a.load(Ordering::Relaxed);
    loop {
        match a.compare_exchange_weak(cur, cur + 1, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => break,
            Err(actual) => cur = actual,
        }
    }
}

fn atomic_max(a: &AtomicI64, v: i64) {
    let mut cur = a.load(Ordering::Relaxed);
    loop {
        if v <= cur {
            break;
        }
        match a.compare_exchange_weak(cur, v, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => break,
            Err(actual) => cur = actual,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_cas_inc() {
        let a = AtomicUsize::new(0);
        for _ in 0..100 {
            cas_increment(&a);
        }
        assert_eq!(a.load(Ordering::SeqCst), 100);
    }
    #[test]
    fn test_cas_fail() {
        let a = AtomicUsize::new(5);
        assert_eq!(
            a.compare_exchange(99, 0, Ordering::SeqCst, Ordering::SeqCst),
            Err(5)
        );
    }
    #[test]
    fn test_max() {
        let m = AtomicI64::new(i64::MIN);
        for v in [5, 3, 8, 1, 9, 2] {
            atomic_max(&m, v);
        }
        assert_eq!(m.load(Ordering::SeqCst), 9);
    }
}

(* 454. CAS loops – OCaml 5 *)
let counter = Atomic.make 0

let cas_increment () =
  let rec loop () =
    let cur = Atomic.get counter in
    if not (Atomic.compare_and_set counter cur (cur+1)) then loop ()
  in loop ()

let global_max = Atomic.make min_int

let update_max v =
  let rec loop () =
    let cur = Atomic.get global_max in
    if v > cur && not (Atomic.compare_and_set global_max cur v) then loop ()
  in loop ()

let () =
  let ws = Array.init 4 (fun _ ->
    Domain.spawn (fun () -> for _ = 1 to 100 do cas_increment () done)
  ) in
  Array.iter Domain.join ws;
  Printf.printf "counter=%d\n" (Atomic.get counter);
  List.iter update_max [3;7;1;9;4;6];
  Printf.printf "max=%d\n" (Atomic.get global_max)

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_cas_inc() {
        let a = AtomicUsize::new(0);
        for _ in 0..100 {
            cas_increment(&a);
        }
        assert_eq!(a.load(Ordering::SeqCst), 100);
    }
    #[test]
    fn test_cas_fail() {
        let a = AtomicUsize::new(5);
        assert_eq!(
            a.compare_exchange(99, 0, Ordering::SeqCst, Ordering::SeqCst),
            Err(5)
        );
    }
    #[test]
    fn test_max() {
        let m = AtomicI64::new(i64::MIN);
        for v in [5, 3, 8, 1, 9, 2] {
            atomic_max(&m, v);
        }
        assert_eq!(m.load(Ordering::SeqCst), 9);
    }
}

Exercises

Atomic stack push: Implement lock-free stack push using CAS on AtomicPtr<Node<T>>. push allocates a node, sets its next to the current head, then CAS-swaps the head from old_head to new_node. Repeat on failure.

Versioned CAS (DCAS): On platforms with 128-bit atomics (using AtomicU128 or a crate), implement an (version, value) CAS that increments the version on every successful update, preventing the ABA problem.

CAS vs. mutex benchmark: Implement a shared counter using CAS loop and one using Mutex<u64>. Benchmark both with 8 threads each doing 1M increments. Plot throughput vs. number of threads from 1 to 16.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust