ExamplesBy LevelBy TopicLearning Paths
487 Intermediate

String Interning

Functional Programming

Tutorial

The Problem

Compilers, parsers, and databases deal with many repeated string values: variable names, SQL column names, log message templates. Without interning, each copy of "username" is a separate heap allocation — wasting memory and requiring character-by-character comparison for equality. Interning maintains a global table of unique strings; the same content always returns the same pointer. This enables pointer equality (ptr_eq) instead of string comparison — O(1) vs. O(N) — and a 2–10× memory reduction for symbol-heavy workloads. LLVM's StringRef, Java's String.intern(), and Python's string interning all use this pattern.

🎯 Learning Outcomes

  • • Build an Interner backed by HashMap<String, Arc<str>>
  • • Return Arc<str> as a cheap cloneable handle to an interned string
  • • Verify pointer identity with Arc::ptr_eq to confirm interning works
  • • Wrap the interner in Mutex for thread-safe shared access
  • • Understand Arc<str> as a fat-pointer reference-counted string without a String wrapper
  • Code Example

    #![allow(clippy::all)]
    // 487. String interning pattern
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    
    pub struct Interner {
        table: HashMap<String, Arc<str>>,
    }
    
    impl Interner {
        pub fn new() -> Self {
            Interner {
                table: HashMap::new(),
            }
        }
        pub fn intern(&mut self, s: &str) -> Arc<str> {
            if let Some(v) = self.table.get(s) {
                return Arc::clone(v);
            }
            let arc: Arc<str> = Arc::from(s);
            self.table.insert(s.to_string(), Arc::clone(&arc));
            arc
        }
        pub fn len(&self) -> usize {
            self.table.len()
        }
    }
    
    // Thread-safe interner
    pub struct SyncInterner(Mutex<Interner>);
    impl SyncInterner {
        pub fn new() -> Self {
            SyncInterner(Mutex::new(Interner::new()))
        }
        pub fn intern(&self, s: &str) -> Arc<str> {
            self.0.lock().unwrap().intern(s)
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
        #[test]
        fn test_intern_same() {
            let mut i = Interner::new();
            let a = i.intern("hi");
            let b = i.intern("hi");
            assert!(Arc::ptr_eq(&a, &b));
        }
        #[test]
        fn test_intern_diff() {
            let mut i = Interner::new();
            let a = i.intern("hi");
            let b = i.intern("ho");
            assert!(!Arc::ptr_eq(&a, &b));
        }
        #[test]
        fn test_intern_size() {
            let mut i = Interner::new();
            i.intern("a");
            i.intern("b");
            i.intern("a");
            assert_eq!(i.len(), 2);
        }
    }

    Key Differences

  • Reference counting: Rust uses Arc<str> to track interned string lifetimes — the interner table holds one count, callers hold additional counts; OCaml's GC handles lifetime automatically.
  • Pointer equality: Rust uses Arc::ptr_eq; OCaml uses (==) (physical equality).
  • Thread safety: Rust's SyncInterner wraps in Mutex; OCaml requires a Mutex too, but the GC ensures no use-after-free even without one on reads.
  • **Arc<str> vs. Arc<String>**: Arc<str> is a fat pointer (address + length) with one allocation; Arc<String> would add a String wrapper layer unnecessarily.
  • OCaml Approach

    OCaml strings are immutable and GC-managed; the GC does not intern by default. A manual interner uses a Hashtbl:

    let table : (string, string) Hashtbl.t = Hashtbl.create 64
    
    let intern s =
      match Hashtbl.find_opt table s with
      | Some v -> v   (* same physical string *)
      | None -> Hashtbl.add table s s; s
    
    (* Pointer equality *)
    let same a b = a == b  (* physical equality in OCaml *)
    

    OCaml's (==) is physical (pointer) equality; (=) is structural. Interning allows using (==) for string identity checks.

    Full Source

    #![allow(clippy::all)]
    // 487. String interning pattern
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};
    
    pub struct Interner {
        table: HashMap<String, Arc<str>>,
    }
    
    impl Interner {
        pub fn new() -> Self {
            Interner {
                table: HashMap::new(),
            }
        }
        pub fn intern(&mut self, s: &str) -> Arc<str> {
            if let Some(v) = self.table.get(s) {
                return Arc::clone(v);
            }
            let arc: Arc<str> = Arc::from(s);
            self.table.insert(s.to_string(), Arc::clone(&arc));
            arc
        }
        pub fn len(&self) -> usize {
            self.table.len()
        }
    }
    
    // Thread-safe interner
    pub struct SyncInterner(Mutex<Interner>);
    impl SyncInterner {
        pub fn new() -> Self {
            SyncInterner(Mutex::new(Interner::new()))
        }
        pub fn intern(&self, s: &str) -> Arc<str> {
            self.0.lock().unwrap().intern(s)
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
        #[test]
        fn test_intern_same() {
            let mut i = Interner::new();
            let a = i.intern("hi");
            let b = i.intern("hi");
            assert!(Arc::ptr_eq(&a, &b));
        }
        #[test]
        fn test_intern_diff() {
            let mut i = Interner::new();
            let a = i.intern("hi");
            let b = i.intern("ho");
            assert!(!Arc::ptr_eq(&a, &b));
        }
        #[test]
        fn test_intern_size() {
            let mut i = Interner::new();
            i.intern("a");
            i.intern("b");
            i.intern("a");
            assert_eq!(i.len(), 2);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
        #[test]
        fn test_intern_same() {
            let mut i = Interner::new();
            let a = i.intern("hi");
            let b = i.intern("hi");
            assert!(Arc::ptr_eq(&a, &b));
        }
        #[test]
        fn test_intern_diff() {
            let mut i = Interner::new();
            let a = i.intern("hi");
            let b = i.intern("ho");
            assert!(!Arc::ptr_eq(&a, &b));
        }
        #[test]
        fn test_intern_size() {
            let mut i = Interner::new();
            i.intern("a");
            i.intern("b");
            i.intern("a");
            assert_eq!(i.len(), 2);
        }
    }

    Exercises

  • Weak references: Replace the interner's Arc<str> values with Weak<str> so interned strings are freed when no external handles remain, turning the table into a cache rather than a permanent registry.
  • Lock-free interner: Replace Mutex<Interner> with a DashMap<String, Arc<str>> (from the dashmap crate) for concurrent interning without a global lock.
  • Symbol type: Wrap Arc<str> in a newtype struct Symbol(Arc<str>) and implement PartialEq/Hash using pointer identity rather than string content for O(1) hash map lookups.
  • Open Source Repos