078 Advanced

078 — Topological Sort (Ownership Focus)

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "078 — Topological Sort (Ownership Focus)" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. This example revisits topological sort (see example 073) with explicit focus on ownership in the DFS algorithm. Key difference from OCaml: 1. **Lifetime annotations**: Rust requires explicit `'a` to connect the lifetimes of `node: &'a str` and `visited: &mut HashSet<&'a str>`. OCaml's GC eliminates this bookkeeping.

Tutorial

The Problem

This example revisits topological sort (see example 073) with explicit focus on ownership in the DFS algorithm. The adjacency map borrows string slices (&str), the visited set borrows node names, and the result vector owns String values. Managing these lifetimes correctly is a concrete exercise in Rust's borrow checker.

Understanding how borrowed data flows through recursive algorithms — where closures cannot be used because of borrow conflicts — illustrates why Rust sometimes requires nested function definitions with explicit lifetime parameters.

🎯 Learning Outcomes

• Build an adjacency map HashMap<&str, Vec<&str>> from borrowed edge data

• Use a HashSet<&str> for visited tracking with lifetime annotations

• Implement DFS with a nested function that takes explicit lifetime parameters

• Understand why visited: &mut HashSet<&str> requires 'a on the node references

• Compare with the HashMap-and-clone approach for simpler lifetime management

Code Example

#![allow(clippy::all)]
use std::collections::{HashMap, HashSet};

/// Topological sort using DFS
///
/// Ownership insight: edges are borrowed as slices of string slices.
/// The visited set and result vector are owned locally.
pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
    let mut all_nodes = HashSet::new();
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in edges {
        all_nodes.insert(a);
        all_nodes.insert(b);
        adj.entry(a).or_default().push(b);
    }

    let mut visited = HashSet::new();
    let mut order = Vec::new();

    fn visit<'a>(
        node: &'a str,
        adj: &HashMap<&str, Vec<&'a str>>,
        visited: &mut HashSet<&'a str>,
        order: &mut Vec<String>,
    ) {
        if visited.contains(node) {
            return;
        }
        visited.insert(node);
        if let Some(neighbors) = adj.get(node) {
            for &n in neighbors {
                visit(n, adj, visited, order);
            }
        }
        order.push(node.to_string());
    }

    for &node in &all_nodes {
        visit(node, &adj, &mut visited, &mut order);
    }
    order
}

/// Version using owned Strings
pub fn topo_sort_owned(edges: Vec<(String, String)>) -> Vec<String> {
    let str_edges: Vec<(&str, &str)> = edges
        .iter()
        .map(|(a, b)| (a.as_str(), b.as_str()))
        .collect();
    topo_sort(&str_edges)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_linear_chain() {
        let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
        let result = topo_sort(&edges);
        // d should come before c, c before b, b before a
        let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
        assert!(pos("d") < pos("c"));
        assert!(pos("c") < pos("b"));
    }

    #[test]
    fn test_diamond() {
        let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")];
        let result = topo_sort(&edges);
        let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
        assert!(pos("d") < pos("b"));
        assert!(pos("d") < pos("c"));
    }

    #[test]
    fn test_empty() {
        let result = topo_sort(&[]);
        assert!(result.is_empty());
    }

    #[test]
    fn test_single_edge() {
        let result = topo_sort(&[("x", "y")]);
        assert_eq!(result.len(), 2);
    }

    #[test]
    fn test_owned_version() {
        let edges = vec![("a".into(), "b".into()), ("b".into(), "c".into())];
        let result = topo_sort_owned(edges);
        assert_eq!(result.len(), 3);
    }
}

(* Topological Sort — DAG Ordering *)
module SS = Set.Make(String)

(* Version 1: DFS-based topological sort *)
let topo_sort edges =
  let neighbors node =
    List.filter_map (fun (a, b) -> if a = node then Some b else None) edges in
  let all_nodes = List.fold_left (fun s (a, b) -> SS.add a (SS.add b s)) SS.empty edges in
  let rec visit node (visited, order) =
    if SS.mem node visited then (visited, order)
    else
      let visited = SS.add node visited in
      let visited, order = List.fold_left (fun acc n ->
        visit n acc) (visited, order) (neighbors node) in
      (visited, node :: order)
  in
  let _, order = SS.fold (fun node acc -> visit node acc) all_nodes (SS.empty, []) in
  order

let () =
  let edges = [("a","b");("a","c");("b","d");("c","d");("d","e")] in
  let result = topo_sort edges in
  assert (List.mem "e" result);
  List.iter (Printf.printf "%s ") result

Key Differences

Lifetime annotations: Rust requires explicit 'a to connect the lifetimes of node: &'a str and visited: &mut HashSet<&'a str>. OCaml's GC eliminates this bookkeeping.

Nested function vs closure: Rust's nested fn visit is a free function (not a closure) and cannot capture from the outer scope — each parameter must be passed explicitly. A closure would simplify this but creates borrow conflicts.

**&str in HashSet**: HashSet<&str> stores borrowed references. HashSet<String> stores owned copies. The &str version is faster (no clone) but requires lifetime management. OCaml uses string values directly.

**order.push(node.to_string())**: Converting &str to String at push time converts from borrowed to owned. OCaml's order := node :: !order works directly with string values.

OCaml Approach

OCaml's DFS uses mutable state but requires no lifetime annotations:

let topo_sort edges =
  let visited = Hashtbl.create 16 in
  let order = ref [] in
  let adj = build_adj edges in
  let rec visit node =
    if not (Hashtbl.mem visited node) then begin
      Hashtbl.add visited node ();
      List.iter visit (Hashtbl.find_opt adj node |> Option.value ~default:[]);
      order := node :: !order
    end
  in
  List.iter (fun (a, b) -> visit a; visit b) edges;
  !order

The closure captures adj, visited, and order by reference from the enclosing scope. OCaml's GC ensures all references remain valid. No lifetime annotations needed.

Full Source

#![allow(clippy::all)]
use std::collections::{HashMap, HashSet};

/// Topological sort using DFS
///
/// Ownership insight: edges are borrowed as slices of string slices.
/// The visited set and result vector are owned locally.
pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
    let mut all_nodes = HashSet::new();
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(a, b) in edges {
        all_nodes.insert(a);
        all_nodes.insert(b);
        adj.entry(a).or_default().push(b);
    }

    let mut visited = HashSet::new();
    let mut order = Vec::new();

    fn visit<'a>(
        node: &'a str,
        adj: &HashMap<&str, Vec<&'a str>>,
        visited: &mut HashSet<&'a str>,
        order: &mut Vec<String>,
    ) {
        if visited.contains(node) {
            return;
        }
        visited.insert(node);
        if let Some(neighbors) = adj.get(node) {
            for &n in neighbors {
                visit(n, adj, visited, order);
            }
        }
        order.push(node.to_string());
    }

    for &node in &all_nodes {
        visit(node, &adj, &mut visited, &mut order);
    }
    order
}

/// Version using owned Strings
pub fn topo_sort_owned(edges: Vec<(String, String)>) -> Vec<String> {
    let str_edges: Vec<(&str, &str)> = edges
        .iter()
        .map(|(a, b)| (a.as_str(), b.as_str()))
        .collect();
    topo_sort(&str_edges)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_linear_chain() {
        let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
        let result = topo_sort(&edges);
        // d should come before c, c before b, b before a
        let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
        assert!(pos("d") < pos("c"));
        assert!(pos("c") < pos("b"));
    }

    #[test]
    fn test_diamond() {
        let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")];
        let result = topo_sort(&edges);
        let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
        assert!(pos("d") < pos("b"));
        assert!(pos("d") < pos("c"));
    }

    #[test]
    fn test_empty() {
        let result = topo_sort(&[]);
        assert!(result.is_empty());
    }

    #[test]
    fn test_single_edge() {
        let result = topo_sort(&[("x", "y")]);
        assert_eq!(result.len(), 2);
    }

    #[test]
    fn test_owned_version() {
        let edges = vec![("a".into(), "b".into()), ("b".into(), "c".into())];
        let result = topo_sort_owned(edges);
        assert_eq!(result.len(), 3);
    }
}

(* Topological Sort — DAG Ordering *)
module SS = Set.Make(String)

(* Version 1: DFS-based topological sort *)
let topo_sort edges =
  let neighbors node =
    List.filter_map (fun (a, b) -> if a = node then Some b else None) edges in
  let all_nodes = List.fold_left (fun s (a, b) -> SS.add a (SS.add b s)) SS.empty edges in
  let rec visit node (visited, order) =
    if SS.mem node visited then (visited, order)
    else
      let visited = SS.add node visited in
      let visited, order = List.fold_left (fun acc n ->
        visit n acc) (visited, order) (neighbors node) in
      (visited, node :: order)
  in
  let _, order = SS.fold (fun node acc -> visit node acc) all_nodes (SS.empty, []) in
  order

let () =
  let edges = [("a","b");("a","c");("b","d");("c","d");("d","e")] in
  let result = topo_sort edges in
  assert (List.mem "e" result);
  List.iter (Printf.printf "%s ") result

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_linear_chain() {
        let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
        let result = topo_sort(&edges);
        // d should come before c, c before b, b before a
        let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
        assert!(pos("d") < pos("c"));
        assert!(pos("c") < pos("b"));
    }

    #[test]
    fn test_diamond() {
        let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")];
        let result = topo_sort(&edges);
        let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
        assert!(pos("d") < pos("b"));
        assert!(pos("d") < pos("c"));
    }

    #[test]
    fn test_empty() {
        let result = topo_sort(&[]);
        assert!(result.is_empty());
    }

    #[test]
    fn test_single_edge() {
        let result = topo_sort(&[("x", "y")]);
        assert_eq!(result.len(), 2);
    }

    #[test]
    fn test_owned_version() {
        let edges = vec![("a".into(), "b".into()), ("b".into(), "c".into())];
        let result = topo_sort_owned(edges);
        assert_eq!(result.len(), 3);
    }
}

Deep Comparison

Topological Sort — Comparison

Core Insight

Topological sort highlights how OCaml and Rust handle mutable state differently. OCaml passes immutable tuples (visited, order) through recursive calls. Rust uses &mut references, requiring lifetime annotations when the recursive helper borrows string slices.

OCaml Approach

• Set.Make(String) for visited nodes — persistent/immutable set

• State threaded as (visited, order) tuple through fold

• SS.fold iterates all nodes; inner visit recurses on neighbors

• No mutation — each recursive call returns updated state

Rust Approach

• HashSet and HashMap — mutable, efficient hash-based collections

• &mut visited and &mut order passed to recursive helper

• Lifetime 'a needed on visit to link borrowed &str to input edges

• to_string() at collection point to own the results

Comparison Table

Aspect	OCaml	Rust
Set type	`Set.Make(String)`	`HashSet<&str>`
State passing	Immutable tuple	`&mut` references
Lifetimes	Implicit (GC)	Explicit `'a` annotations
String handling	`string` (GC)	`&str` borrowed, `String` owned
Adjacency	`List.filter_map`	`HashMap<&str, Vec<&str>>`

Learner Notes

• Rust lifetime annotations on recursive functions can be tricky

• HashMap::entry().or_default() is idiomatic for building adjacency lists

• OCaml threading immutable state is elegant but can be harder to optimize

• Rust mutable approach is more imperative but avoids allocation per recursive call

Exercises

String-owned version: Rewrite topo_sort to use HashMap<String, Vec<String>> and HashSet<String> — clone-based, no lifetimes needed. Benchmark vs the &str version.

Parallel DFS: Describe the challenges of parallelizing DFS-based topological sort. What data structures need to be concurrent? When would this be beneficial?

Incremental sort: Write IncrementalTopoSort that allows adding edges one at a time and queries "is there a topological order?" and "what is the current order?" efficiently.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust