ExamplesBy LevelBy TopicLearning Paths
373 Advanced

373: Custom B-Tree Implementation

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "373: Custom B-Tree Implementation" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Database indexes, file system directory trees (NTFS, ext4), and key-value store storage engines (RocksDB, SQLite) all use B-trees as their core data structure. Key difference from OCaml: | Aspect | Rust custom B

Tutorial

The Problem

Database indexes, file system directory trees (NTFS, ext4), and key-value store storage engines (RocksDB, SQLite) all use B-trees as their core data structure. Rudolf Bayer and Edward McCreight invented the B-tree in 1972 to optimize for disk access patterns: keeping data in large nodes (pages) minimizes the number of disk seeks. A B-tree of degree T stores up to 2T-1 keys per node, keeping the tree very shallow (height O(log_T n)). Unlike binary trees, B-tree nodes are wide — millions of records fit in a tree just 3-4 levels deep, matching the typical disk block size of 4KB-16KB.

🎯 Learning Outcomes

  • • Implement a B-tree node with keys: Vec<i32>, children: Vec<Box<BTreeNode>>, and is_leaf: bool
  • • Understand the minimum degree T: nodes hold T-1 to 2T-1 keys (except root)
  • • Implement recursive search through node keys and children
  • • Implement split-child: when a child is full (2T-1 keys), split it into two T-1 key nodes
  • • Implement non-full insert recursively, splitting full nodes on the way down
  • • Recognize that Rust's BTreeMap uses a B-tree variant with order ~11 internally
  • Code Example

    #![allow(clippy::all)]
    //! Custom B-Tree Implementation
    //!
    //! Self-balancing tree optimized for disk access patterns.
    
    const MIN_DEGREE: usize = 2;
    
    /// B-tree node
    #[derive(Debug)]
    pub struct BTreeNode {
        keys: Vec<i32>,
        children: Vec<Box<BTreeNode>>,
        is_leaf: bool,
    }
    
    impl BTreeNode {
        fn new_leaf() -> Box<Self> {
            Box::new(Self {
                keys: Vec::new(),
                children: Vec::new(),
                is_leaf: true,
            })
        }
    
        fn is_full(&self) -> bool {
            self.keys.len() == 2 * MIN_DEGREE - 1
        }
    }
    
    /// A B-tree with minimum degree T
    pub struct BTree {
        root: Box<BTreeNode>,
    }
    
    impl BTree {
        pub fn new() -> Self {
            Self {
                root: BTreeNode::new_leaf(),
            }
        }
    
        pub fn search(&self, key: i32) -> bool {
            Self::search_node(&self.root, key)
        }
    
        fn search_node(node: &BTreeNode, key: i32) -> bool {
            let i = node.keys.partition_point(|&k| k < key);
            if i < node.keys.len() && node.keys[i] == key {
                return true;
            }
            if node.is_leaf {
                return false;
            }
            if i < node.children.len() {
                Self::search_node(&node.children[i], key)
            } else {
                false
            }
        }
    
        pub fn insert(&mut self, key: i32) {
            // Simplified insert - just add to appropriate leaf
            Self::insert_simple(&mut self.root, key);
        }
    
        fn insert_simple(node: &mut BTreeNode, key: i32) {
            let i = node.keys.partition_point(|&k| k < key);
            if i < node.keys.len() && node.keys[i] == key {
                return; // duplicate
            }
            if node.is_leaf {
                node.keys.insert(i, key);
            } else if i < node.children.len() {
                Self::insert_simple(&mut node.children[i], key);
            }
        }
    
        pub fn to_sorted_vec(&self) -> Vec<i32> {
            let mut result = Vec::new();
            Self::collect(&self.root, &mut result);
            result.sort();
            result
        }
    
        fn collect(node: &BTreeNode, result: &mut Vec<i32>) {
            result.extend_from_slice(&node.keys);
            for child in &node.children {
                Self::collect(child, result);
            }
        }
    
        pub fn len(&self) -> usize {
            self.to_sorted_vec().len()
        }
    
        pub fn is_empty(&self) -> bool {
            self.root.keys.is_empty() && self.root.children.is_empty()
        }
    }
    
    impl Default for BTree {
        fn default() -> Self {
            Self::new()
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_insert_search() {
            let mut bt = BTree::new();
            for v in [5, 3, 7, 1, 9] {
                bt.insert(v);
            }
            assert!(bt.search(5));
            assert!(bt.search(3));
            assert!(!bt.search(4));
        }
    
        #[test]
        fn test_sorted_output() {
            let mut bt = BTree::new();
            for v in [5, 3, 7, 1, 9, 4, 6, 2, 8] {
                bt.insert(v);
            }
            assert_eq!(bt.to_sorted_vec(), vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);
        }
    
        #[test]
        fn test_duplicates() {
            let mut bt = BTree::new();
            bt.insert(1);
            bt.insert(1);
            bt.insert(1);
            assert_eq!(bt.len(), 1);
        }
    
        #[test]
        fn test_empty() {
            let bt = BTree::new();
            assert!(bt.is_empty());
            assert!(!bt.search(1));
        }
    
        #[test]
        fn test_many_elements() {
            let mut bt = BTree::new();
            for i in 0..100 {
                bt.insert(i);
            }
            for i in 0..100 {
                assert!(bt.search(i));
            }
            assert_eq!(bt.len(), 100);
        }
    }

    Key Differences

    AspectRust custom B-treeOCaml Map.Make
    Tree typeB-tree (wide nodes)AVL tree (binary)
    Node widthT-1 to 2T-1 keys1 key (binary node)
    HeightO(log_T n)O(log₂ n)
    Cache efficiencyHigh (wide nodes fit cache lines)Lower (binary pointer chasing)
    Production useBTreeMap in stdlibMap.Make in stdlib

    OCaml Approach

    OCaml's Map.Make is implemented as a balanced AVL tree (not a B-tree), but the B-tree structure translates naturally to OCaml algebraic types:

    type btree_node =
      | Leaf of int list
      | Internal of int list * btree_node list
    
    (* Search: binary search keys, recurse into correct child *)
    let rec search node key = match node with
      | Leaf keys -> List.mem key keys
      | Internal (keys, children) ->
        let i = List.length (List.filter (fun k -> k < key) keys) in
        if i < List.length keys && List.nth keys i = key then true
        else search (List.nth children i) key
    

    The functional version avoids mutation — insert returns a new tree with structural sharing where possible. Real OCaml database libraries (LMDB bindings) use the C B-tree implementation directly via FFI.

    Full Source

    #![allow(clippy::all)]
    //! Custom B-Tree Implementation
    //!
    //! Self-balancing tree optimized for disk access patterns.
    
    const MIN_DEGREE: usize = 2;
    
    /// B-tree node
    #[derive(Debug)]
    pub struct BTreeNode {
        keys: Vec<i32>,
        children: Vec<Box<BTreeNode>>,
        is_leaf: bool,
    }
    
    impl BTreeNode {
        fn new_leaf() -> Box<Self> {
            Box::new(Self {
                keys: Vec::new(),
                children: Vec::new(),
                is_leaf: true,
            })
        }
    
        fn is_full(&self) -> bool {
            self.keys.len() == 2 * MIN_DEGREE - 1
        }
    }
    
    /// A B-tree with minimum degree T
    pub struct BTree {
        root: Box<BTreeNode>,
    }
    
    impl BTree {
        pub fn new() -> Self {
            Self {
                root: BTreeNode::new_leaf(),
            }
        }
    
        pub fn search(&self, key: i32) -> bool {
            Self::search_node(&self.root, key)
        }
    
        fn search_node(node: &BTreeNode, key: i32) -> bool {
            let i = node.keys.partition_point(|&k| k < key);
            if i < node.keys.len() && node.keys[i] == key {
                return true;
            }
            if node.is_leaf {
                return false;
            }
            if i < node.children.len() {
                Self::search_node(&node.children[i], key)
            } else {
                false
            }
        }
    
        pub fn insert(&mut self, key: i32) {
            // Simplified insert - just add to appropriate leaf
            Self::insert_simple(&mut self.root, key);
        }
    
        fn insert_simple(node: &mut BTreeNode, key: i32) {
            let i = node.keys.partition_point(|&k| k < key);
            if i < node.keys.len() && node.keys[i] == key {
                return; // duplicate
            }
            if node.is_leaf {
                node.keys.insert(i, key);
            } else if i < node.children.len() {
                Self::insert_simple(&mut node.children[i], key);
            }
        }
    
        pub fn to_sorted_vec(&self) -> Vec<i32> {
            let mut result = Vec::new();
            Self::collect(&self.root, &mut result);
            result.sort();
            result
        }
    
        fn collect(node: &BTreeNode, result: &mut Vec<i32>) {
            result.extend_from_slice(&node.keys);
            for child in &node.children {
                Self::collect(child, result);
            }
        }
    
        pub fn len(&self) -> usize {
            self.to_sorted_vec().len()
        }
    
        pub fn is_empty(&self) -> bool {
            self.root.keys.is_empty() && self.root.children.is_empty()
        }
    }
    
    impl Default for BTree {
        fn default() -> Self {
            Self::new()
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_insert_search() {
            let mut bt = BTree::new();
            for v in [5, 3, 7, 1, 9] {
                bt.insert(v);
            }
            assert!(bt.search(5));
            assert!(bt.search(3));
            assert!(!bt.search(4));
        }
    
        #[test]
        fn test_sorted_output() {
            let mut bt = BTree::new();
            for v in [5, 3, 7, 1, 9, 4, 6, 2, 8] {
                bt.insert(v);
            }
            assert_eq!(bt.to_sorted_vec(), vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);
        }
    
        #[test]
        fn test_duplicates() {
            let mut bt = BTree::new();
            bt.insert(1);
            bt.insert(1);
            bt.insert(1);
            assert_eq!(bt.len(), 1);
        }
    
        #[test]
        fn test_empty() {
            let bt = BTree::new();
            assert!(bt.is_empty());
            assert!(!bt.search(1));
        }
    
        #[test]
        fn test_many_elements() {
            let mut bt = BTree::new();
            for i in 0..100 {
                bt.insert(i);
            }
            for i in 0..100 {
                assert!(bt.search(i));
            }
            assert_eq!(bt.len(), 100);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_insert_search() {
            let mut bt = BTree::new();
            for v in [5, 3, 7, 1, 9] {
                bt.insert(v);
            }
            assert!(bt.search(5));
            assert!(bt.search(3));
            assert!(!bt.search(4));
        }
    
        #[test]
        fn test_sorted_output() {
            let mut bt = BTree::new();
            for v in [5, 3, 7, 1, 9, 4, 6, 2, 8] {
                bt.insert(v);
            }
            assert_eq!(bt.to_sorted_vec(), vec![1, 2, 3, 4, 5, 6, 7, 8, 9]);
        }
    
        #[test]
        fn test_duplicates() {
            let mut bt = BTree::new();
            bt.insert(1);
            bt.insert(1);
            bt.insert(1);
            assert_eq!(bt.len(), 1);
        }
    
        #[test]
        fn test_empty() {
            let bt = BTree::new();
            assert!(bt.is_empty());
            assert!(!bt.search(1));
        }
    
        #[test]
        fn test_many_elements() {
            let mut bt = BTree::new();
            for i in 0..100 {
                bt.insert(i);
            }
            for i in 0..100 {
                assert!(bt.search(i));
            }
            assert_eq!(bt.len(), 100);
        }
    }

    Deep Comparison

    OCaml vs Rust: B-Tree

    Both require multi-child nodes; Rust uses Box for ownership.

    Exercises

  • Delete: Implement remove(&mut self, key: i32) for the B-tree; handle the case where removal from a node with fewer than T-1 keys requires borrowing from a sibling or merging two nodes.
  • Disk page simulation: Set MIN_DEGREE = 50 so each node holds up to 99 keys; generate 1 million sequential keys and measure tree height — it should be ≤ 4 levels.
  • B+ tree leaf chain: Modify the B-tree so leaf nodes are linked in a doubly-linked list, enabling O(n) sequential scan of all keys in sorted order after lookup.
  • Open Source Repos