ExamplesBy LevelBy TopicLearning Paths
368 Advanced

368: Persistent Data Structures

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "368: Persistent Data Structures" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Mutable data structures destroy history — you can't go back to a previous state without making copies. Key difference from OCaml: | Aspect | Rust `Rc<PList<T>>` | OCaml `'a list` |

Tutorial

The Problem

Mutable data structures destroy history — you can't go back to a previous state without making copies. Persistent (functional/immutable) data structures solve this by sharing unchanged structure between versions. A persistent linked list shares its tail: cons(x, list) creates a new list that shares all of list without copying. This structural sharing makes undo/redo O(1), version control O(changed-nodes), and functional programming idioms possible. Rust implements persistent structures using Rc<T> (single-threaded) or Arc<T> (multi-threaded) for reference-counted shared ownership.

🎯 Learning Outcomes

  • • Implement a persistent linked list as Rc<PList<T>> with Nil / Cons(T, Rc<...>) variants
  • • Understand that cons(x, tail) is O(1) and shares tail without copying
  • • Use Rc::clone to create new owners of the same node (reference counting, not deep copy)
  • • Implement head and tail as O(1) operations on persistent lists
  • • Extend to persistent trees where modifications create new paths from root to changed node
  • • Recognize that all of OCaml's standard data structures are persistent
  • Code Example

    enum PList<T> {
        Nil,
        Cons(T, Rc<PList<T>>),
    }
    
    let l1 = PList::cons(1, PList::nil());
    let l2 = PList::cons(0, Rc::clone(&l1)); // shares l1

    Key Differences

    AspectRust Rc<PList<T>>OCaml 'a list
    Reference trackingRc reference countingGC tracing
    AllocationExplicit Rc::new(...)Implicit on :: cons
    Clone costO(1) Rc::clone (inc refcount)O(1) (GC handles it)
    Thread safetyRc is not Send (use Arc)GC handles concurrency
    CyclesRc cycles leak memoryGC detects cycles

    OCaml Approach

    All OCaml lists are persistent by default — there's no special wrapper needed:

    let nil = []
    let cons head tail = head :: tail  (* O(1), shares tail *)
    let head = List.hd
    let tail = List.tl
    
    (* Multiple lists sharing a tail *)
    let shared = [3]
    let list_a = 1 :: shared  (* [1; 3] — shares shared *)
    let list_b = 2 :: shared  (* [2; 3] — shares shared *)
    (* GC handles deallocation when no references remain *)
    

    In OCaml, the GC tracks all live references automatically — no Rc counter needed. The cons cell (head :: tail) is one allocation on the minor heap. Structural sharing is the default behavior for all functional data structures.

    Full Source

    #![allow(clippy::all)]
    //! Persistent Data Structures
    //!
    //! Immutable data structures with structural sharing via Rc.
    
    use std::rc::Rc;
    
    // === Approach 1: Persistent Linked List ===
    
    /// A persistent (immutable) linked list
    #[derive(Clone, Debug)]
    pub enum PList<T> {
        Nil,
        Cons(T, Rc<PList<T>>),
    }
    
    impl<T: Clone> PList<T> {
        /// Create an empty list
        pub fn nil() -> Rc<Self> {
            Rc::new(Self::Nil)
        }
    
        /// Prepend an element (O(1))
        pub fn cons(head: T, tail: Rc<Self>) -> Rc<Self> {
            Rc::new(Self::Cons(head, tail))
        }
    
        /// Get the head element
        pub fn head(list: &Rc<Self>) -> Option<&T> {
            match list.as_ref() {
                Self::Nil => None,
                Self::Cons(h, _) => Some(h),
            }
        }
    
        /// Get the tail (O(1) - just clone the Rc)
        pub fn tail(list: &Rc<Self>) -> Rc<Self> {
            match list.as_ref() {
                Self::Nil => Self::nil(),
                Self::Cons(_, t) => Rc::clone(t),
            }
        }
    
        /// Check if empty
        pub fn is_empty(list: &Rc<Self>) -> bool {
            matches!(list.as_ref(), Self::Nil)
        }
    
        /// Get length
        pub fn len(list: &Rc<Self>) -> usize {
            let mut count = 0;
            let mut cur = Rc::clone(list);
            loop {
                match cur.as_ref() {
                    Self::Nil => break,
                    Self::Cons(_, t) => {
                        count += 1;
                        cur = Rc::clone(t);
                    }
                }
            }
            count
        }
    
        /// Convert to Vec
        pub fn to_vec(list: &Rc<Self>) -> Vec<T> {
            let mut v = Vec::new();
            let mut cur = Rc::clone(list);
            loop {
                match cur.as_ref() {
                    Self::Nil => break,
                    Self::Cons(x, next) => {
                        v.push(x.clone());
                        cur = Rc::clone(next);
                    }
                }
            }
            v
        }
    
        /// Create from iterator
        pub fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Rc<Self> {
            let items: Vec<T> = iter.into_iter().collect();
            let mut list = Self::nil();
            for item in items.into_iter().rev() {
                list = Self::cons(item, list);
            }
            list
        }
    }
    
    // === Approach 2: Persistent Vector (simple path-copying) ===
    
    /// A persistent vector using copy-on-write
    #[derive(Clone, Debug)]
    pub struct PVec<T: Clone> {
        data: Rc<Vec<T>>,
    }
    
    impl<T: Clone> PVec<T> {
        /// Create an empty vector
        pub fn new() -> Self {
            Self {
                data: Rc::new(Vec::new()),
            }
        }
    
        /// Push an element (returns new vector)
        pub fn push(&self, val: T) -> Self {
            let mut new_data = (*self.data).clone();
            new_data.push(val);
            Self {
                data: Rc::new(new_data),
            }
        }
    
        /// Set element at index (returns new vector)
        pub fn set(&self, i: usize, val: T) -> Self {
            let mut new_data = (*self.data).clone();
            new_data[i] = val;
            Self {
                data: Rc::new(new_data),
            }
        }
    
        /// Get element at index
        pub fn get(&self, i: usize) -> Option<&T> {
            self.data.get(i)
        }
    
        /// Get length
        pub fn len(&self) -> usize {
            self.data.len()
        }
    
        /// Check if empty
        pub fn is_empty(&self) -> bool {
            self.data.is_empty()
        }
    
        /// Convert to Vec
        pub fn to_vec(&self) -> Vec<T> {
            (*self.data).clone()
        }
    
        /// Pop last element (returns new vector and popped value)
        pub fn pop(&self) -> (Self, Option<T>) {
            if self.is_empty() {
                return (self.clone(), None);
            }
            let mut new_data = (*self.data).clone();
            let val = new_data.pop();
            (
                Self {
                    data: Rc::new(new_data),
                },
                val,
            )
        }
    }
    
    impl<T: Clone> Default for PVec<T> {
        fn default() -> Self {
            Self::new()
        }
    }
    
    // === Approach 3: Persistent Map (simple version) ===
    
    /// A persistent map using sorted vector
    #[derive(Clone, Debug)]
    pub struct PMap<K: Ord + Clone, V: Clone> {
        data: Rc<Vec<(K, V)>>,
    }
    
    impl<K: Ord + Clone, V: Clone> PMap<K, V> {
        /// Create empty map
        pub fn new() -> Self {
            Self {
                data: Rc::new(Vec::new()),
            }
        }
    
        /// Insert key-value (returns new map)
        pub fn insert(&self, key: K, value: V) -> Self {
            let mut new_data = (*self.data).clone();
            match new_data.binary_search_by(|(k, _)| k.cmp(&key)) {
                Ok(i) => new_data[i] = (key, value),
                Err(i) => new_data.insert(i, (key, value)),
            }
            Self {
                data: Rc::new(new_data),
            }
        }
    
        /// Get value by key
        pub fn get(&self, key: &K) -> Option<&V> {
            match self.data.binary_search_by(|(k, _)| k.cmp(key)) {
                Ok(i) => Some(&self.data[i].1),
                Err(_) => None,
            }
        }
    
        /// Remove key (returns new map)
        pub fn remove(&self, key: &K) -> Self {
            let mut new_data = (*self.data).clone();
            if let Ok(i) = new_data.binary_search_by(|(k, _)| k.cmp(key)) {
                new_data.remove(i);
            }
            Self {
                data: Rc::new(new_data),
            }
        }
    
        /// Get length
        pub fn len(&self) -> usize {
            self.data.len()
        }
    
        /// Check if empty
        pub fn is_empty(&self) -> bool {
            self.data.is_empty()
        }
    }
    
    impl<K: Ord + Clone, V: Clone> Default for PMap<K, V> {
        fn default() -> Self {
            Self::new()
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_plist_basic() {
            let l1 = PList::cons(2, PList::cons(1, PList::nil()));
            let l2 = PList::cons(3, Rc::clone(&l1));
    
            assert_eq!(PList::to_vec(&l1), vec![2, 1]);
            assert_eq!(PList::to_vec(&l2), vec![3, 2, 1]);
            // l1 is unchanged
            assert_eq!(PList::to_vec(&l1), vec![2, 1]);
        }
    
        #[test]
        fn test_plist_head_tail() {
            let l = PList::cons(1, PList::cons(2, PList::nil()));
            assert_eq!(PList::head(&l), Some(&1));
            assert_eq!(PList::to_vec(&PList::tail(&l)), vec![2]);
        }
    
        #[test]
        fn test_plist_from_iter() {
            let l = PList::from_iter(vec![1, 2, 3]);
            assert_eq!(PList::to_vec(&l), vec![1, 2, 3]);
        }
    
        #[test]
        fn test_pvec_basic() {
            let v0: PVec<i32> = PVec::new();
            let v1 = v0.push(1);
            let v2 = v1.push(2);
            let v3 = v2.set(0, 99);
    
            assert_eq!(v1.to_vec(), vec![1]);
            assert_eq!(v2.to_vec(), vec![1, 2]);
            assert_eq!(v3.to_vec(), vec![99, 2]);
            // v1 unchanged
            assert_eq!(v1.to_vec(), vec![1]);
        }
    
        #[test]
        fn test_pvec_pop() {
            let v = PVec::new().push(1).push(2).push(3);
            let (v2, val) = v.pop();
            assert_eq!(val, Some(3));
            assert_eq!(v2.to_vec(), vec![1, 2]);
        }
    
        #[test]
        fn test_pmap_basic() {
            let m0: PMap<&str, i32> = PMap::new();
            let m1 = m0.insert("a", 1);
            let m2 = m1.insert("b", 2);
            let m3 = m2.insert("a", 10); // update
    
            assert_eq!(m1.get(&"a"), Some(&1));
            assert_eq!(m2.get(&"b"), Some(&2));
            assert_eq!(m3.get(&"a"), Some(&10));
            // m1 unchanged
            assert_eq!(m1.get(&"a"), Some(&1));
        }
    
        #[test]
        fn test_pmap_remove() {
            let m = PMap::new().insert("a", 1).insert("b", 2);
            let m2 = m.remove(&"a");
            assert_eq!(m2.get(&"a"), None);
            assert_eq!(m2.get(&"b"), Some(&2));
        }
    
        #[test]
        fn test_structural_sharing() {
            let l1 = PList::cons(1, PList::nil());
            let l2 = PList::cons(2, Rc::clone(&l1));
            let l3 = PList::cons(3, Rc::clone(&l1));
            // l2 and l3 share the tail l1
            assert_eq!(PList::len(&l1), 1);
            assert_eq!(PList::len(&l2), 2);
            assert_eq!(PList::len(&l3), 2);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_plist_basic() {
            let l1 = PList::cons(2, PList::cons(1, PList::nil()));
            let l2 = PList::cons(3, Rc::clone(&l1));
    
            assert_eq!(PList::to_vec(&l1), vec![2, 1]);
            assert_eq!(PList::to_vec(&l2), vec![3, 2, 1]);
            // l1 is unchanged
            assert_eq!(PList::to_vec(&l1), vec![2, 1]);
        }
    
        #[test]
        fn test_plist_head_tail() {
            let l = PList::cons(1, PList::cons(2, PList::nil()));
            assert_eq!(PList::head(&l), Some(&1));
            assert_eq!(PList::to_vec(&PList::tail(&l)), vec![2]);
        }
    
        #[test]
        fn test_plist_from_iter() {
            let l = PList::from_iter(vec![1, 2, 3]);
            assert_eq!(PList::to_vec(&l), vec![1, 2, 3]);
        }
    
        #[test]
        fn test_pvec_basic() {
            let v0: PVec<i32> = PVec::new();
            let v1 = v0.push(1);
            let v2 = v1.push(2);
            let v3 = v2.set(0, 99);
    
            assert_eq!(v1.to_vec(), vec![1]);
            assert_eq!(v2.to_vec(), vec![1, 2]);
            assert_eq!(v3.to_vec(), vec![99, 2]);
            // v1 unchanged
            assert_eq!(v1.to_vec(), vec![1]);
        }
    
        #[test]
        fn test_pvec_pop() {
            let v = PVec::new().push(1).push(2).push(3);
            let (v2, val) = v.pop();
            assert_eq!(val, Some(3));
            assert_eq!(v2.to_vec(), vec![1, 2]);
        }
    
        #[test]
        fn test_pmap_basic() {
            let m0: PMap<&str, i32> = PMap::new();
            let m1 = m0.insert("a", 1);
            let m2 = m1.insert("b", 2);
            let m3 = m2.insert("a", 10); // update
    
            assert_eq!(m1.get(&"a"), Some(&1));
            assert_eq!(m2.get(&"b"), Some(&2));
            assert_eq!(m3.get(&"a"), Some(&10));
            // m1 unchanged
            assert_eq!(m1.get(&"a"), Some(&1));
        }
    
        #[test]
        fn test_pmap_remove() {
            let m = PMap::new().insert("a", 1).insert("b", 2);
            let m2 = m.remove(&"a");
            assert_eq!(m2.get(&"a"), None);
            assert_eq!(m2.get(&"b"), Some(&2));
        }
    
        #[test]
        fn test_structural_sharing() {
            let l1 = PList::cons(1, PList::nil());
            let l2 = PList::cons(2, Rc::clone(&l1));
            let l3 = PList::cons(3, Rc::clone(&l1));
            // l2 and l3 share the tail l1
            assert_eq!(PList::len(&l1), 1);
            assert_eq!(PList::len(&l2), 2);
            assert_eq!(PList::len(&l3), 2);
        }
    }

    Deep Comparison

    OCaml vs Rust: Persistent Data Structures

    Side-by-Side Comparison

    Persistent List

    OCaml:

    let lst1 = [1;2;3;4;5]
    let lst2 = 0 :: lst1  (* shares tail with lst1 *)
    let lst3 = List.tl lst1  (* shares almost all of lst1 *)
    

    Rust:

    enum PList<T> {
        Nil,
        Cons(T, Rc<PList<T>>),
    }
    
    let l1 = PList::cons(1, PList::nil());
    let l2 = PList::cons(0, Rc::clone(&l1)); // shares l1
    

    Key Differences

    AspectOCamlRust
    Default behaviorPersistent (immutable)Owned (mutable)
    Sharing mechanismAutomatic (GC)Explicit Rc<T>
    List cons:: operatorCons(h, Rc::clone(&t))
    Memory managementGCReference counting

    Memory Model

    OCaml: The GC tracks all references. When you write h :: t, you create a new cons cell pointing to the same tail. The GC ensures neither version is freed while references exist.

    Rust: Rc<T> provides reference counting. When all Rc clones go out of scope, the data is freed. No GC pauses, but cycles would leak (use Weak if needed).

    Performance

    OperationPersistent ListPersistent Vec
    PrependO(1)O(n)
    AppendO(n)O(n)
    UpdateO(n)O(n)
    AccessO(n)O(1)

    Note: For better persistent vector performance, use a tree-based structure like HAMT (O(log₃₂ n) updates).

    Exercises

  • Persistent stack: Implement push(stack, val) -> Rc<PList<T>> and pop(stack) -> (Option<T>, Rc<PList<T>>) using persistent lists; demonstrate multiple stacks sharing common history after a common push sequence.
  • Path sharing: Build a binary tree where update(tree, key, value) creates a new tree that shares all unchanged subtrees with the original — only the path from root to the updated node is new.
  • Version history: Implement a VersionedList<T> that stores a Vec<Rc<PList<T>>> of all versions; commit(value) prepends to the current version and saves it; checkout(version) returns that version's state.
  • Open Source Repos