ExamplesBy LevelBy TopicLearning Paths
078 Advanced

078 — Topological Sort (Ownership Focus)

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "078 — Topological Sort (Ownership Focus)" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. This example revisits topological sort (see example 073) with explicit focus on ownership in the DFS algorithm. Key difference from OCaml: 1. **Lifetime annotations**: Rust requires explicit `'a` to connect the lifetimes of `node: &'a str` and `visited: &mut HashSet<&'a str>`. OCaml's GC eliminates this bookkeeping.

Tutorial

The Problem

This example revisits topological sort (see example 073) with explicit focus on ownership in the DFS algorithm. The adjacency map borrows string slices (&str), the visited set borrows node names, and the result vector owns String values. Managing these lifetimes correctly is a concrete exercise in Rust's borrow checker.

Understanding how borrowed data flows through recursive algorithms — where closures cannot be used because of borrow conflicts — illustrates why Rust sometimes requires nested function definitions with explicit lifetime parameters.

🎯 Learning Outcomes

  • • Build an adjacency map HashMap<&str, Vec<&str>> from borrowed edge data
  • • Use a HashSet<&str> for visited tracking with lifetime annotations
  • • Implement DFS with a nested function that takes explicit lifetime parameters
  • • Understand why visited: &mut HashSet<&str> requires 'a on the node references
  • • Compare with the HashMap-and-clone approach for simpler lifetime management
  • Code Example

    #![allow(clippy::all)]
    use std::collections::{HashMap, HashSet};
    
    /// Topological sort using DFS
    ///
    /// Ownership insight: edges are borrowed as slices of string slices.
    /// The visited set and result vector are owned locally.
    pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
        let mut all_nodes = HashSet::new();
        let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
        for &(a, b) in edges {
            all_nodes.insert(a);
            all_nodes.insert(b);
            adj.entry(a).or_default().push(b);
        }
    
        let mut visited = HashSet::new();
        let mut order = Vec::new();
    
        fn visit<'a>(
            node: &'a str,
            adj: &HashMap<&str, Vec<&'a str>>,
            visited: &mut HashSet<&'a str>,
            order: &mut Vec<String>,
        ) {
            if visited.contains(node) {
                return;
            }
            visited.insert(node);
            if let Some(neighbors) = adj.get(node) {
                for &n in neighbors {
                    visit(n, adj, visited, order);
                }
            }
            order.push(node.to_string());
        }
    
        for &node in &all_nodes {
            visit(node, &adj, &mut visited, &mut order);
        }
        order
    }
    
    /// Version using owned Strings
    pub fn topo_sort_owned(edges: Vec<(String, String)>) -> Vec<String> {
        let str_edges: Vec<(&str, &str)> = edges
            .iter()
            .map(|(a, b)| (a.as_str(), b.as_str()))
            .collect();
        topo_sort(&str_edges)
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_linear_chain() {
            let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
            let result = topo_sort(&edges);
            // d should come before c, c before b, b before a
            let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
            assert!(pos("d") < pos("c"));
            assert!(pos("c") < pos("b"));
        }
    
        #[test]
        fn test_diamond() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")];
            let result = topo_sort(&edges);
            let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
            assert!(pos("d") < pos("b"));
            assert!(pos("d") < pos("c"));
        }
    
        #[test]
        fn test_empty() {
            let result = topo_sort(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn test_single_edge() {
            let result = topo_sort(&[("x", "y")]);
            assert_eq!(result.len(), 2);
        }
    
        #[test]
        fn test_owned_version() {
            let edges = vec![("a".into(), "b".into()), ("b".into(), "c".into())];
            let result = topo_sort_owned(edges);
            assert_eq!(result.len(), 3);
        }
    }

    Key Differences

  • Lifetime annotations: Rust requires explicit 'a to connect the lifetimes of node: &'a str and visited: &mut HashSet<&'a str>. OCaml's GC eliminates this bookkeeping.
  • Nested function vs closure: Rust's nested fn visit is a free function (not a closure) and cannot capture from the outer scope — each parameter must be passed explicitly. A closure would simplify this but creates borrow conflicts.
  • **&str in HashSet**: HashSet<&str> stores borrowed references. HashSet<String> stores owned copies. The &str version is faster (no clone) but requires lifetime management. OCaml uses string values directly.
  • **order.push(node.to_string())**: Converting &str to String at push time converts from borrowed to owned. OCaml's order := node :: !order works directly with string values.
  • OCaml Approach

    OCaml's DFS uses mutable state but requires no lifetime annotations:

    let topo_sort edges =
      let visited = Hashtbl.create 16 in
      let order = ref [] in
      let adj = build_adj edges in
      let rec visit node =
        if not (Hashtbl.mem visited node) then begin
          Hashtbl.add visited node ();
          List.iter visit (Hashtbl.find_opt adj node |> Option.value ~default:[]);
          order := node :: !order
        end
      in
      List.iter (fun (a, b) -> visit a; visit b) edges;
      !order
    

    The closure captures adj, visited, and order by reference from the enclosing scope. OCaml's GC ensures all references remain valid. No lifetime annotations needed.

    Full Source

    #![allow(clippy::all)]
    use std::collections::{HashMap, HashSet};
    
    /// Topological sort using DFS
    ///
    /// Ownership insight: edges are borrowed as slices of string slices.
    /// The visited set and result vector are owned locally.
    pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
        let mut all_nodes = HashSet::new();
        let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
        for &(a, b) in edges {
            all_nodes.insert(a);
            all_nodes.insert(b);
            adj.entry(a).or_default().push(b);
        }
    
        let mut visited = HashSet::new();
        let mut order = Vec::new();
    
        fn visit<'a>(
            node: &'a str,
            adj: &HashMap<&str, Vec<&'a str>>,
            visited: &mut HashSet<&'a str>,
            order: &mut Vec<String>,
        ) {
            if visited.contains(node) {
                return;
            }
            visited.insert(node);
            if let Some(neighbors) = adj.get(node) {
                for &n in neighbors {
                    visit(n, adj, visited, order);
                }
            }
            order.push(node.to_string());
        }
    
        for &node in &all_nodes {
            visit(node, &adj, &mut visited, &mut order);
        }
        order
    }
    
    /// Version using owned Strings
    pub fn topo_sort_owned(edges: Vec<(String, String)>) -> Vec<String> {
        let str_edges: Vec<(&str, &str)> = edges
            .iter()
            .map(|(a, b)| (a.as_str(), b.as_str()))
            .collect();
        topo_sort(&str_edges)
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_linear_chain() {
            let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
            let result = topo_sort(&edges);
            // d should come before c, c before b, b before a
            let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
            assert!(pos("d") < pos("c"));
            assert!(pos("c") < pos("b"));
        }
    
        #[test]
        fn test_diamond() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")];
            let result = topo_sort(&edges);
            let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
            assert!(pos("d") < pos("b"));
            assert!(pos("d") < pos("c"));
        }
    
        #[test]
        fn test_empty() {
            let result = topo_sort(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn test_single_edge() {
            let result = topo_sort(&[("x", "y")]);
            assert_eq!(result.len(), 2);
        }
    
        #[test]
        fn test_owned_version() {
            let edges = vec![("a".into(), "b".into()), ("b".into(), "c".into())];
            let result = topo_sort_owned(edges);
            assert_eq!(result.len(), 3);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_linear_chain() {
            let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
            let result = topo_sort(&edges);
            // d should come before c, c before b, b before a
            let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
            assert!(pos("d") < pos("c"));
            assert!(pos("c") < pos("b"));
        }
    
        #[test]
        fn test_diamond() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")];
            let result = topo_sort(&edges);
            let pos = |s: &str| result.iter().position(|x| x == s).unwrap();
            assert!(pos("d") < pos("b"));
            assert!(pos("d") < pos("c"));
        }
    
        #[test]
        fn test_empty() {
            let result = topo_sort(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn test_single_edge() {
            let result = topo_sort(&[("x", "y")]);
            assert_eq!(result.len(), 2);
        }
    
        #[test]
        fn test_owned_version() {
            let edges = vec![("a".into(), "b".into()), ("b".into(), "c".into())];
            let result = topo_sort_owned(edges);
            assert_eq!(result.len(), 3);
        }
    }

    Deep Comparison

    Topological Sort — Comparison

    Core Insight

    Topological sort highlights how OCaml and Rust handle mutable state differently. OCaml passes immutable tuples (visited, order) through recursive calls. Rust uses &mut references, requiring lifetime annotations when the recursive helper borrows string slices.

    OCaml Approach

  • Set.Make(String) for visited nodes — persistent/immutable set
  • • State threaded as (visited, order) tuple through fold
  • SS.fold iterates all nodes; inner visit recurses on neighbors
  • • No mutation — each recursive call returns updated state
  • Rust Approach

  • HashSet and HashMap — mutable, efficient hash-based collections
  • &mut visited and &mut order passed to recursive helper
  • • Lifetime 'a needed on visit to link borrowed &str to input edges
  • to_string() at collection point to own the results
  • Comparison Table

    AspectOCamlRust
    Set typeSet.Make(String)HashSet<&str>
    State passingImmutable tuple&mut references
    LifetimesImplicit (GC)Explicit 'a annotations
    String handlingstring (GC)&str borrowed, String owned
    AdjacencyList.filter_mapHashMap<&str, Vec<&str>>

    Learner Notes

  • • Rust lifetime annotations on recursive functions can be tricky
  • HashMap::entry().or_default() is idiomatic for building adjacency lists
  • • OCaml threading immutable state is elegant but can be harder to optimize
  • • Rust mutable approach is more imperative but avoids allocation per recursive call
  • Exercises

  • String-owned version: Rewrite topo_sort to use HashMap<String, Vec<String>> and HashSet<String> — clone-based, no lifetimes needed. Benchmark vs the &str version.
  • Parallel DFS: Describe the challenges of parallelizing DFS-based topological sort. What data structures need to be concurrent? When would this be beneficial?
  • Incremental sort: Write IncrementalTopoSort that allows adding edges one at a time and queries "is there a topological order?" and "what is the current order?" efficiently.
  • Open Source Repos