365 Intermediate

365: Disjoint Set (Union-Find)

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "365: Disjoint Set (Union-Find)" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Many problems require tracking which elements belong to the same group and efficiently merging groups: Kruskal's minimum spanning tree algorithm, connected components in a graph, network percolation, image segmentation, and equivalence class tracking in type inference. Key difference from OCaml: | Aspect | Rust `UnionFind` | OCaml `uf` |

Tutorial

The Problem

Many problems require tracking which elements belong to the same group and efficiently merging groups: Kruskal's minimum spanning tree algorithm, connected components in a graph, network percolation, image segmentation, and equivalence class tracking in type inference. The Union-Find (Disjoint Set Union) data structure, developed by Bernard Galler and Michael Fischer (1964) with the key optimizations by Tarjan (1975), provides near-O(1) amortized operations. The two optimizations — path compression and union by rank — together achieve O(α(n)) per operation where α is the inverse Ackermann function, effectively constant for all practical inputs.

🎯 Learning Outcomes

• Implement Union-Find with parent: Vec<usize> where parent[x] == x marks root nodes

• Apply path compression in find: flatten the tree by setting parent[x] = root directly

• Apply union by rank: always attach the shorter tree under the taller one

• Track component count and component sizes

• Implement connected(x, y) -> bool as find(x) == find(y)

• Recognize Union-Find in Kruskal's MST, cycle detection, and percolation

Code Example

struct UnionFind {
    parent: Vec<usize>,
    rank: Vec<u32>,
    size: Vec<usize>,
    components: usize,
}

let parent = Array.init 10 (fun i -> i)
let rank   = Array.make 10 0

Key Differences

Aspect	Rust `UnionFind`	OCaml `uf`
Storage	`Vec<usize>`	`int array`
Path compression	Recursive with assignment	Recursive with assignment
Borrowing	`&mut self` for path compression	Mutable array cells
Size tracking	`Vec<usize>` per root	Not shown (add `size` array)
Complexity	O(α(n)) amortized	O(α(n)) amortized

OCaml Approach

type uf = {
  parent: int array;
  rank: int array;
  mutable components: int;
}

let make n =
  { parent = Array.init n Fun.id; rank = Array.make n 0; components = n }

let rec find uf x =
  if uf.parent.(x) = x then x
  else begin
    let root = find uf uf.parent.(x) in
    uf.parent.(x) <- root;  (* path compression *)
    root
  end

let union uf x y =
  let rx = find uf x and ry = find uf y in
  if rx = ry then false
  else begin
    if uf.rank.(rx) < uf.rank.(ry) then uf.parent.(rx) <- ry
    else if uf.rank.(rx) > uf.rank.(ry) then uf.parent.(ry) <- rx
    else (uf.parent.(ry) <- rx; uf.rank.(rx) <- uf.rank.(rx) + 1);
    uf.components <- uf.components - 1;
    true
  end

The algorithm is identical in both languages — Union-Find is inherently imperative (mutating parent pointers is the key insight). Both use array-based storage for O(1) index access.

Full Source

#![allow(clippy::all)]
//! Union-Find (Disjoint Set) Data Structure
//!
//! Track which elements belong to the same group and merge groups in O(α(n)) amortized.

/// Union-Find with path compression and union by rank
#[derive(Debug, Clone)]
pub struct UnionFind {
    parent: Vec<usize>,
    rank: Vec<u32>,
    size: Vec<usize>,
    components: usize,
}

impl UnionFind {
    // === Approach 1: Basic API ===

    /// Create a new Union-Find with n elements (0..n)
    pub fn new(n: usize) -> Self {
        Self {
            parent: (0..n).collect(),
            rank: vec![0; n],
            size: vec![1; n],
            components: n,
        }
    }

    /// Find the root of element x with path compression - O(α(n)) amortized
    pub fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            self.parent[x] = self.find(self.parent[x]); // path compression
        }
        self.parent[x]
    }

    /// Union two sets - returns true if they were different sets
    pub fn union(&mut self, x: usize, y: usize) -> bool {
        let rx = self.find(x);
        let ry = self.find(y);
        if rx == ry {
            return false; // already connected
        }
        // Union by rank
        if self.rank[rx] < self.rank[ry] {
            self.parent[rx] = ry;
            self.size[ry] += self.size[rx];
        } else if self.rank[rx] > self.rank[ry] {
            self.parent[ry] = rx;
            self.size[rx] += self.size[ry];
        } else {
            self.parent[ry] = rx;
            self.size[rx] += self.size[ry];
            self.rank[rx] += 1;
        }
        self.components -= 1;
        true
    }

    /// Check if two elements are in the same set
    pub fn connected(&mut self, x: usize, y: usize) -> bool {
        self.find(x) == self.find(y)
    }

    // === Approach 2: Query methods ===

    /// Get the size of the component containing x
    pub fn component_size(&mut self, x: usize) -> usize {
        let root = self.find(x);
        self.size[root]
    }

    /// Get the number of distinct components
    pub fn num_components(&self) -> usize {
        self.components
    }

    /// Get the total number of elements
    pub fn len(&self) -> usize {
        self.parent.len()
    }

    /// Check if empty
    pub fn is_empty(&self) -> bool {
        self.parent.is_empty()
    }

    // === Approach 3: Immutable find (for read-only queries) ===

    /// Find root without path compression (for immutable access)
    pub fn find_immut(&self, mut x: usize) -> usize {
        while self.parent[x] != x {
            x = self.parent[x];
        }
        x
    }

    /// Check connectivity without mutation
    pub fn connected_immut(&self, x: usize, y: usize) -> bool {
        self.find_immut(x) == self.find_immut(y)
    }

    /// Get all elements in the same component as x
    pub fn component_members(&self, x: usize) -> Vec<usize> {
        let root = self.find_immut(x);
        (0..self.len())
            .filter(|&i| self.find_immut(i) == root)
            .collect()
    }

    /// Get all roots (one per component)
    pub fn roots(&self) -> Vec<usize> {
        (0..self.len()).filter(|&i| self.parent[i] == i).collect()
    }
}

/// Count connected components in a graph
pub fn count_connected_components(n: usize, edges: &[(usize, usize)]) -> usize {
    let mut uf = UnionFind::new(n);
    for &(u, v) in edges {
        uf.union(u, v);
    }
    uf.num_components()
}

/// Check if a graph has a cycle (for undirected graphs)
pub fn has_cycle(n: usize, edges: &[(usize, usize)]) -> bool {
    let mut uf = UnionFind::new(n);
    for &(u, v) in edges {
        if !uf.union(u, v) {
            return true; // already connected = cycle
        }
    }
    false
}

/// Kruskal's MST - returns edges in MST and total weight
pub fn kruskal_mst(n: usize, mut edges: Vec<(i64, usize, usize)>) -> (Vec<(usize, usize)>, i64) {
    edges.sort_by_key(|e| e.0);
    let mut uf = UnionFind::new(n);
    let mut mst = Vec::new();
    let mut total_weight = 0;

    for (weight, u, v) in edges {
        if uf.union(u, v) {
            mst.push((u, v));
            total_weight += weight;
        }
    }

    (mst, total_weight)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_union_find() {
        let mut uf = UnionFind::new(5);
        assert!(!uf.connected(0, 1));
        uf.union(0, 1);
        assert!(uf.connected(0, 1));
    }

    #[test]
    fn test_transitive_connectivity() {
        let mut uf = UnionFind::new(4);
        uf.union(0, 1);
        uf.union(1, 2);
        assert!(uf.connected(0, 2));
        assert!(!uf.connected(0, 3));
    }

    #[test]
    fn test_component_count() {
        let mut uf = UnionFind::new(5);
        assert_eq!(uf.num_components(), 5);
        uf.union(0, 1);
        assert_eq!(uf.num_components(), 4);
        uf.union(2, 3);
        assert_eq!(uf.num_components(), 3);
        uf.union(0, 2);
        assert_eq!(uf.num_components(), 2);
    }

    #[test]
    fn test_component_size() {
        let mut uf = UnionFind::new(5);
        assert_eq!(uf.component_size(0), 1);
        uf.union(0, 1);
        uf.union(1, 2);
        assert_eq!(uf.component_size(0), 3);
        assert_eq!(uf.component_size(1), 3);
        assert_eq!(uf.component_size(2), 3);
    }

    #[test]
    fn test_count_connected_components() {
        assert_eq!(count_connected_components(5, &[(0, 1), (2, 3)]), 3);
        assert_eq!(count_connected_components(4, &[(0, 1), (1, 2), (2, 3)]), 1);
    }

    #[test]
    fn test_has_cycle() {
        assert!(!has_cycle(3, &[(0, 1), (1, 2)]));
        assert!(has_cycle(3, &[(0, 1), (1, 2), (2, 0)]));
    }

    #[test]
    fn test_kruskal_mst() {
        // Triangle with weights 1, 2, 3
        let edges = vec![(1, 0, 1), (2, 1, 2), (3, 0, 2)];
        let (mst, weight) = kruskal_mst(3, edges);
        assert_eq!(mst.len(), 2);
        assert_eq!(weight, 3); // edges with weight 1 and 2
    }

    #[test]
    fn test_component_members() {
        let mut uf = UnionFind::new(5);
        uf.union(0, 1);
        uf.union(1, 2);

        let mut members = uf.component_members(0);
        members.sort();
        assert_eq!(members, vec![0, 1, 2]);
    }

    #[test]
    fn test_roots() {
        let mut uf = UnionFind::new(4);
        uf.union(0, 1);
        uf.union(2, 3);

        let roots = uf.roots();
        assert_eq!(roots.len(), 2);
    }

    #[test]
    fn test_immutable_find() {
        let mut uf = UnionFind::new(3);
        uf.union(0, 1);

        // Immutable find should work
        assert!(uf.connected_immut(0, 1));
        assert!(!uf.connected_immut(0, 2));
    }
}

(* OCaml: Union-Find with arrays *)

let parent = Array.init 10 (fun i -> i)
let rank   = Array.make 10 0

let rec find x =
  if parent.(x) = x then x
  else begin
    parent.(x) <- find parent.(x);  (* path compression *)
    parent.(x)
  end

let union x y =
  let rx = find x and ry = find y in
  if rx = ry then ()
  else if rank.(rx) < rank.(ry) then parent.(rx) <- ry
  else if rank.(rx) > rank.(ry) then parent.(ry) <- rx
  else begin parent.(ry) <- rx; rank.(rx) <- rank.(rx)+1 end

let () =
  union 0 1; union 2 3; union 0 2;
  Printf.printf "Connected(0,3): %b\n" (find 0 = find 3);
  Printf.printf "Connected(0,4): %b\n" (find 0 = find 4)

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_union_find() {
        let mut uf = UnionFind::new(5);
        assert!(!uf.connected(0, 1));
        uf.union(0, 1);
        assert!(uf.connected(0, 1));
    }

    #[test]
    fn test_transitive_connectivity() {
        let mut uf = UnionFind::new(4);
        uf.union(0, 1);
        uf.union(1, 2);
        assert!(uf.connected(0, 2));
        assert!(!uf.connected(0, 3));
    }

    #[test]
    fn test_component_count() {
        let mut uf = UnionFind::new(5);
        assert_eq!(uf.num_components(), 5);
        uf.union(0, 1);
        assert_eq!(uf.num_components(), 4);
        uf.union(2, 3);
        assert_eq!(uf.num_components(), 3);
        uf.union(0, 2);
        assert_eq!(uf.num_components(), 2);
    }

    #[test]
    fn test_component_size() {
        let mut uf = UnionFind::new(5);
        assert_eq!(uf.component_size(0), 1);
        uf.union(0, 1);
        uf.union(1, 2);
        assert_eq!(uf.component_size(0), 3);
        assert_eq!(uf.component_size(1), 3);
        assert_eq!(uf.component_size(2), 3);
    }

    #[test]
    fn test_count_connected_components() {
        assert_eq!(count_connected_components(5, &[(0, 1), (2, 3)]), 3);
        assert_eq!(count_connected_components(4, &[(0, 1), (1, 2), (2, 3)]), 1);
    }

    #[test]
    fn test_has_cycle() {
        assert!(!has_cycle(3, &[(0, 1), (1, 2)]));
        assert!(has_cycle(3, &[(0, 1), (1, 2), (2, 0)]));
    }

    #[test]
    fn test_kruskal_mst() {
        // Triangle with weights 1, 2, 3
        let edges = vec![(1, 0, 1), (2, 1, 2), (3, 0, 2)];
        let (mst, weight) = kruskal_mst(3, edges);
        assert_eq!(mst.len(), 2);
        assert_eq!(weight, 3); // edges with weight 1 and 2
    }

    #[test]
    fn test_component_members() {
        let mut uf = UnionFind::new(5);
        uf.union(0, 1);
        uf.union(1, 2);

        let mut members = uf.component_members(0);
        members.sort();
        assert_eq!(members, vec![0, 1, 2]);
    }

    #[test]
    fn test_roots() {
        let mut uf = UnionFind::new(4);
        uf.union(0, 1);
        uf.union(2, 3);

        let roots = uf.roots();
        assert_eq!(roots.len(), 2);
    }

    #[test]
    fn test_immutable_find() {
        let mut uf = UnionFind::new(3);
        uf.union(0, 1);

        // Immutable find should work
        assert!(uf.connected_immut(0, 1));
        assert!(!uf.connected_immut(0, 2));
    }
}

Deep Comparison

OCaml vs Rust: Union-Find (Disjoint Set)

Side-by-Side Comparison

Data Structure

OCaml:

let parent = Array.init 10 (fun i -> i)
let rank   = Array.make 10 0

Rust:

struct UnionFind {
    parent: Vec<usize>,
    rank: Vec<u32>,
    size: Vec<usize>,
    components: usize,
}

Find with Path Compression

OCaml:

let rec find x =
  if parent.(x) = x then x
  else begin
    parent.(x) <- find parent.(x);  (* path compression *)
    parent.(x)
  end

Rust:

fn find(&mut self, x: usize) -> usize {
    if self.parent[x] != x {
        self.parent[x] = self.find(self.parent[x]); // path compression
    }
    self.parent[x]
}

Union by Rank

OCaml:

let union x y =
  let rx = find x and ry = find y in
  if rx = ry then ()
  else if rank.(rx) < rank.(ry) then parent.(rx) <- ry
  else if rank.(rx) > rank.(ry) then parent.(ry) <- rx
  else begin parent.(ry) <- rx; rank.(rx) <- rank.(rx)+1 end

Rust:

fn union(&mut self, x: usize, y: usize) -> bool {
    let rx = self.find(x);
    let ry = self.find(y);
    if rx == ry { return false; }
    
    if self.rank[rx] < self.rank[ry] {
        self.parent[rx] = ry;
    } else if self.rank[rx] > self.rank[ry] {
        self.parent[ry] = rx;
    } else {
        self.parent[ry] = rx;
        self.rank[rx] += 1;
    }
    self.components -= 1;
    true
}

Key Differences

Aspect	OCaml	Rust
Return value	`unit`	`bool` (was different?)
Component tracking	Manual	Built into struct
Size tracking	Not included	Included
Mutability	Global arrays	`&mut self`
Encapsulation	Module-level	Struct methods

Complexity

Both achieve the same optimal complexity:

Operation	Complexity
Find	O(α(n)) amortized
Union	O(α(n)) amortized
Connected	O(α(n)) amortized

Where α(n) is the inverse Ackermann function, effectively ≤ 4 for all practical n.

Optimizations

Path Compression: After finding a root, update all nodes on the path to point directly to the root.

Union by Rank: Always attach the shorter tree under the root of the taller tree.

Together, these make operations essentially O(1) in practice.

Use Cases

Use Case	Description
Kruskal's MST	Check if edge creates cycle
Network connectivity	Online "are A and B connected?"
Cycle detection	`union()` returns false = cycle
Image segmentation	Group connected pixels

Exercises

Kruskal's MST: Sort edges by weight, then apply Union-Find to select edges that connect different components; stop when you have n-1 edges for n nodes — this is Kruskal's O(E log E) MST algorithm.

Connected components: Given an undirected graph as edge list, use Union-Find to count connected components; report the size of each component using the size array.

Percolation: Create an N×N grid where each cell is open with probability p; use Union-Find to determine if the top row connects to the bottom row; find the percolation threshold via Monte Carlo simulation.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust