988 Thread Local

Fundamental · Functional Programming · Tutorial

The Problem

Demonstrate thread-local storage (TLS) in Rust using the thread_local! macro. Each thread gets its own independent copy of the storage — no locks or synchronization needed. Show a thread-local counter where threads set independent values, and a thread-local accumulator that aggregates per-thread sums without shared state.
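Before the full example, here is a minimal sketch of the core mechanic, using `Cell` for a `Copy` type (the `TICK` name is illustrative, not from the example below):

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread that touches TICK gets its own zero-initialized copy.
    static TICK: Cell<u32> = const { Cell::new(0) };
}

fn main() {
    TICK.with(|t| t.set(100)); // mutate the main thread's copy

    let child = thread::spawn(|| {
        // The child starts from a fresh copy: 0, not main's 100.
        TICK.with(|t| t.set(t.get() + 1));
        TICK.with(|t| t.get())
    })
    .join()
    .unwrap();

    // main = 100, child = 1
    println!("main = {}, child = {}", TICK.with(|t| t.get()), child);
}
```

No `Mutex`, no `Arc`: the two threads write to the same static name but never to the same storage.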

🎯 Learning Outcomes

  • Declare thread-local storage with thread_local! { static NAME: RefCell<T> = ... }
  • Access TLS via .with(|cell| ...); the closure receives a reference to the thread-local value
  • Use RefCell for interior mutability: borrow() for read, borrow_mut() for write
  • Understand that thread-local values never cross threads: .with hands out only a short-lived reference, so a value cannot escape the thread that owns it
  • Recognize the use cases: per-thread performance counters, per-request contexts, PRNG state

Code Example

    #![allow(clippy::all)]
    // 988: Thread-Local Storage
    // Rust: thread_local! macro — each thread gets its own instance
    
    use std::cell::RefCell;
    use std::sync::{Arc, Mutex};
    use std::thread;
    
    // --- Approach 1: thread_local! with RefCell (simple counter) ---
    thread_local! {
        static COUNTER: RefCell<i32> = const { RefCell::new(0) };
    }
    
    fn thread_local_counter() -> Vec<i32> {
        let results = Arc::new(Mutex::new(Vec::new()));
    
        let handles: Vec<_> = (0..5i32)
            .map(|i| {
                let results = Arc::clone(&results);
                thread::spawn(move || {
                    // Each thread has its own COUNTER — no sharing
                    COUNTER.with(|c| *c.borrow_mut() = i * 10);
                    thread::yield_now();
                    let v = COUNTER.with(|c| *c.borrow());
                    results.lock().unwrap().push(v);
                })
            })
            .collect();
    
        for h in handles {
            h.join().unwrap();
        }
        let mut v = results.lock().unwrap().clone();
        v.sort();
        v
    }
    
    // --- Approach 2: Thread-local accumulator (no shared state needed) ---
    thread_local! {
        static LOCAL_SUM: RefCell<i64> = const { RefCell::new(0) };
    }
    
    fn thread_local_sum(id: i64) -> i64 {
        LOCAL_SUM.with(|s| {
            *s.borrow_mut() = 0; // reset for this thread
            for i in 1..=10 {
                *s.borrow_mut() += i * id;
            }
            *s.borrow()
        })
    }
    
    fn parallel_sums() -> i64 {
        let results = Arc::new(Mutex::new(Vec::new()));
    
        let handles: Vec<_> = (0..4i64)
            .map(|id| {
                let results = Arc::clone(&results);
                thread::spawn(move || {
                    let s = thread_local_sum(id);
                    results.lock().unwrap().push(s);
                })
            })
            .collect();
    
        for h in handles {
            h.join().unwrap();
        }
        let x = results.lock().unwrap().iter().sum();
        x
    }
    
    // --- Approach 3: Thread-local cache (computed once per thread) ---
    thread_local! {
        static THREAD_ID_CACHE: RefCell<Option<String>> = const { RefCell::new(None) };
    }
    
    fn get_thread_name(name: &str) -> String {
        THREAD_ID_CACHE.with(|cache| {
            let mut c = cache.borrow_mut();
            if c.is_none() {
                *c = Some(format!("thread-{}", name));
            }
            c.clone().unwrap()
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_thread_local_isolation() {
            let counts = thread_local_counter();
            assert_eq!(counts, vec![0, 10, 20, 30, 40]);
        }
    
        #[test]
        fn test_parallel_sums() {
            // 0 + 55 + 110 + 165 = 330
            assert_eq!(parallel_sums(), 330);
        }
    
        #[test]
        fn test_thread_local_doesnt_leak_across_threads() {
            COUNTER.with(|c| *c.borrow_mut() = 999);
            let val_in_new_thread = thread::spawn(|| {
                COUNTER.with(|c| *c.borrow()) // should be 0, not 999
            })
            .join()
            .unwrap();
            assert_eq!(val_in_new_thread, 0);
        }
    
        #[test]
        fn test_thread_name_cached() {
            let n1 = get_thread_name("x");
            let n2 = get_thread_name("y"); // returns cached value, not "thread-y"
            assert_eq!(n1, n2); // same thread — cached
        }
    }

    Key Differences

    | Aspect              | Rust                                | OCaml                                    |
    |---------------------|-------------------------------------|------------------------------------------|
    | Declaration         | thread_local! { static N: T = ... } | No built-in (pre-5.0); Domain.DLS (5.0+) |
    | Access              | .with(\|r\| ...) closure            | Domain.DLS.get key                       |
    | Interior mutability | RefCell<T> in TLS                   | Mutable domain-local slot                |
    | Lock-free           | Yes (no concurrent access possible) | Yes (domain-local)                       |

    TLS is ideal for per-thread random number generators, per-request logging contexts, and accumulating performance counters that are merged at the end. The key advantage over Mutex<T> is zero synchronization overhead.
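The "merged at the end" pattern can be sketched as follows: each worker counts locally with zero synchronization and touches shared state exactly once, when it finishes. The `WORK_DONE`/`TOTAL`/`worker` names are illustrative:

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

thread_local! {
    // Hot-path counter: bumped with no locks and no atomics.
    static WORK_DONE: Cell<u64> = const { Cell::new(0) };
}

static TOTAL: AtomicU64 = AtomicU64::new(0);

fn worker(items: u64) {
    for _ in 0..items {
        // Pretend to process an item; count it locally.
        WORK_DONE.with(|c| c.set(c.get() + 1));
    }
    // One synchronized operation per thread, not one per item.
    TOTAL.fetch_add(WORK_DONE.with(|c| c.get()), Ordering::Relaxed);
}

fn main() {
    let handles: Vec<_> = (1..=4)
        .map(|n| thread::spawn(move || worker(n * 10)))
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // 10 + 20 + 30 + 40 = 100
    println!("total = {}", TOTAL.load(Ordering::Relaxed));
}
```

If a thread processes millions of items, this trades millions of contended atomic operations for millions of uncontended `Cell` writes plus one `fetch_add`.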

    OCaml Approach

    (* OCaml: Thread.self() as key into a Hashtbl — manual TLS *)
    let tls_table : (int, int) Hashtbl.t = Hashtbl.create 16
    let tls_mutex = Mutex.create ()
    
    let tls_set v =
      let tid = Thread.id (Thread.self ()) in
      Mutex.lock tls_mutex;
      Hashtbl.replace tls_table tid v;
      Mutex.unlock tls_mutex
    
    let tls_get () =
      let tid = Thread.id (Thread.self ()) in
      Mutex.lock tls_mutex;
      (* Mutex.protect only exists in OCaml 5.1+, so lock/unlock by hand here *)
      let v = Hashtbl.find_opt tls_table tid in
      Mutex.unlock tls_mutex;
      v
    
    (* OCaml 5.0+: Domain.DLS for domain-local storage *)
    let key = Domain.DLS.new_key (fun () -> 0)
    let set v = Domain.DLS.set key v
    let get () = Domain.DLS.get key
    

    OCaml before 5.0 lacks built-in TLS — it requires a Hashtbl keyed by thread ID with manual locking. OCaml 5.0+'s Domain.DLS provides domain-local storage analogous to Rust's thread_local!.

    Full Source

    #![allow(clippy::all)]
    // 988: Thread-Local Storage
    // Rust: thread_local! macro — each thread gets its own instance
    
    use std::cell::RefCell;
    use std::sync::{Arc, Mutex};
    use std::thread;
    
    // --- Approach 1: thread_local! with RefCell (simple counter) ---
    thread_local! {
        static COUNTER: RefCell<i32> = const { RefCell::new(0) };
    }
    
    fn thread_local_counter() -> Vec<i32> {
        let results = Arc::new(Mutex::new(Vec::new()));
    
        let handles: Vec<_> = (0..5i32)
            .map(|i| {
                let results = Arc::clone(&results);
                thread::spawn(move || {
                    // Each thread has its own COUNTER — no sharing
                    COUNTER.with(|c| *c.borrow_mut() = i * 10);
                    thread::yield_now();
                    let v = COUNTER.with(|c| *c.borrow());
                    results.lock().unwrap().push(v);
                })
            })
            .collect();
    
        for h in handles {
            h.join().unwrap();
        }
        let mut v = results.lock().unwrap().clone();
        v.sort();
        v
    }
    
    // --- Approach 2: Thread-local accumulator (no shared state needed) ---
    thread_local! {
        static LOCAL_SUM: RefCell<i64> = const { RefCell::new(0) };
    }
    
    fn thread_local_sum(id: i64) -> i64 {
        LOCAL_SUM.with(|s| {
            *s.borrow_mut() = 0; // reset for this thread
            for i in 1..=10 {
                *s.borrow_mut() += i * id;
            }
            *s.borrow()
        })
    }
    
    fn parallel_sums() -> i64 {
        let results = Arc::new(Mutex::new(Vec::new()));
    
        let handles: Vec<_> = (0..4i64)
            .map(|id| {
                let results = Arc::clone(&results);
                thread::spawn(move || {
                    let s = thread_local_sum(id);
                    results.lock().unwrap().push(s);
                })
            })
            .collect();
    
        for h in handles {
            h.join().unwrap();
        }
        let x = results.lock().unwrap().iter().sum();
        x
    }
    
    // --- Approach 3: Thread-local cache (computed once per thread) ---
    thread_local! {
        static THREAD_ID_CACHE: RefCell<Option<String>> = const { RefCell::new(None) };
    }
    
    fn get_thread_name(name: &str) -> String {
        THREAD_ID_CACHE.with(|cache| {
            let mut c = cache.borrow_mut();
            if c.is_none() {
                *c = Some(format!("thread-{}", name));
            }
            c.clone().unwrap()
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_thread_local_isolation() {
            let counts = thread_local_counter();
            assert_eq!(counts, vec![0, 10, 20, 30, 40]);
        }
    
        #[test]
        fn test_parallel_sums() {
            // 0 + 55 + 110 + 165 = 330
            assert_eq!(parallel_sums(), 330);
        }
    
        #[test]
        fn test_thread_local_doesnt_leak_across_threads() {
            COUNTER.with(|c| *c.borrow_mut() = 999);
            let val_in_new_thread = thread::spawn(|| {
                COUNTER.with(|c| *c.borrow()) // should be 0, not 999
            })
            .join()
            .unwrap();
            assert_eq!(val_in_new_thread, 0);
        }
    
        #[test]
        fn test_thread_name_cached() {
            let n1 = get_thread_name("x");
            let n2 = get_thread_name("y"); // returns cached value, not "thread-y"
            assert_eq!(n1, n2); // same thread — cached
        }
    }

    Deep Comparison

    Thread-Local Storage — Comparison

    Core Insight

    Thread-local storage is the answer to "I want mutable state but don't want synchronization overhead." Each thread has its own private copy — no races possible, no locks needed.

    OCaml Approach

  • OCaml 5: Domain.DLS.new_key / Domain.DLS.get / Domain.DLS.set (domain-local)
  • OCaml < 5: simulate with a Hashtbl keyed by Thread.id (requires a mutex for the table itself)
  • Domains ≠ threads in OCaml 5: one domain can run many lightweight threads
  • Typical use: per-domain RNG seeds, error buffers, caches

    Rust Approach

  • thread_local! { static NAME: Type = init; } declares the variable
  • .with(|v| ...) is the primary access method; it hands the closure a scoped reference that cannot outlive the call
  • Usually paired with Cell<T> (Copy types) or RefCell<T> (arbitrary types)
  • Initialized lazily on first access per thread
  • Dropped when the thread exits

    Comparison Table

    | Concept             | OCaml                                   | Rust                                |
    |---------------------|-----------------------------------------|-------------------------------------|
    | Declare             | Domain.DLS.new_key (fun () -> init)     | thread_local! { static X: T = ... } |
    | Read                | Domain.DLS.get key                      | X.with(\|v\| *v.borrow())           |
    | Write               | Domain.DLS.set key val                  | X.with(\|v\| *v.borrow_mut() = x)   |
    | Interior mutability | Mutable by nature                       | Cell<T> or RefCell<T>               |
    | Initialization      | Closure passed at creation              | Expression in macro                 |
    | Isolation           | Per-domain (not per-thread in OCaml 5)  | Per-OS-thread                       |
    | No sync needed      | Yes                                     | Yes (the whole point)               |
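One refinement to the Rust column: for `Cell`-wrapped TLS, `std`'s `LocalKey` also provides direct `get`/`set`/`take`/`replace` methods (stabilized in Rust 1.73), so the `.with` closure can often be skipped. A small sketch, with an illustrative `FLAG` static:

```rust
use std::cell::Cell;

thread_local! {
    static FLAG: Cell<bool> = const { Cell::new(false) };
}

fn main() {
    // Equivalent to FLAG.with(|f| f.set(true)), without writing the closure.
    FLAG.set(true);
    assert!(FLAG.get());

    // take() returns the value and resets the cell to Default (false for bool).
    assert!(FLAG.take());
    assert!(!FLAG.get());

    println!("ok");
}
```

For `RefCell`-wrapped values, `.with` (or `with_borrow`/`with_borrow_mut`) is still the way in.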

    std vs tokio

    | Aspect          | std version                   | tokio version                        |
    |-----------------|-------------------------------|--------------------------------------|
    | Runtime         | OS threads via std::thread    | Async tasks on the tokio runtime     |
    | Synchronization | std::sync::Mutex, Condvar     | tokio::sync::Mutex, channels         |
    | Channels        | std::sync::mpsc (unbounded)   | tokio::sync::mpsc (bounded, async)   |
    | Blocking        | Thread blocks on lock/recv    | Task yields; runtime switches tasks  |
    | Overhead        | One OS thread per task        | Many tasks per thread (M:N)          |
    | Best for        | CPU-bound, simple concurrency | I/O-bound, high-concurrency servers  |

    Exercises

  • Implement a thread-local RNG: each thread seeds its own rand::thread_rng() equivalent.
  • Implement per-thread allocation counters that are summed at program end without a shared counter.
  • Implement a "request ID" TLS that is set at thread entry and read by all functions without passing it as a parameter.
  • Demonstrate that modifying COUNTER in one thread does not affect another thread's COUNTER value.
  • Implement thread_local_cache<K: Hash+Eq, V> — a per-thread HashMap that serves as a local cache before hitting a shared Mutex<HashMap>.
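As a starting point for the request-ID exercise, here is a hedged sketch; every name in it (set_request_id, current_request_id, handle_request) is made up for illustration:

```rust
use std::cell::RefCell;

thread_local! {
    // Per-thread request context; None until the thread's entry point sets it.
    static REQUEST_ID: RefCell<Option<String>> = const { RefCell::new(None) };
}

fn set_request_id(id: &str) {
    REQUEST_ID.with(|r| *r.borrow_mut() = Some(id.to_string()));
}

fn current_request_id() -> String {
    REQUEST_ID.with(|r| r.borrow().clone().unwrap_or_else(|| "<unset>".to_string()))
}

// Deep in the call stack: no request-ID parameter is threaded through.
fn log(msg: &str) {
    println!("[{}] {}", current_request_id(), msg);
}

fn handle_request(id: &str) {
    set_request_id(id); // set once at thread entry
    log("started");
    log("finished");
}

fn main() {
    handle_request("req-42"); // prints "[req-42] started" then "[req-42] finished"
}
```

A thread that never calls set_request_id sees `<unset>`, which demonstrates the isolation property from the exercises above.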

    Open Source Repos