ExamplesBy LevelBy TopicLearning Paths
073 Advanced

073 — Topological Sort

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "073 — Topological Sort" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Topological sort orders the nodes of a directed acyclic graph (DAG) such that for every directed edge (u → v), node u appears before v in the ordering. Key difference from OCaml: 1. **Nested function**: Rust's nested `fn visit<'a>(...)` inside `topo_sort` requires explicit lifetime annotations for the borrowed adjacency map. OCaml closures capture the environment implicitly.

Tutorial

The Problem

Topological sort orders the nodes of a directed acyclic graph (DAG) such that for every directed edge (u → v), node u appears before v in the ordering. It answers the question: "in what order should I complete these tasks, given their dependencies?" This is one of the most widely used graph algorithms in practice.

Build systems (Bazel, Make, Cargo) topologically sort compilation units so dependencies compile before dependents. Package managers (npm, pip, apt) sort package installations so transitive dependencies install first. Spreadsheet engines recalculate cells in topological order. Database engines plan query execution and schema migrations using topological ordering.

The DFS-based algorithm (Tarjan, 1976) marks each node visited, recurses on all neighbors first, then adds the current node — this is post-order DFS. Reversing the post-order gives a valid topological ordering. Kahn's algorithm (BFS-based) is an alternative that processes nodes with in-degree 0 iteratively and naturally detects cycles.

🎯 Learning Outcomes

  • • Implement DFS-based topological sort with a visited set and post-order accumulation
  • • Build an adjacency list from edge pairs using HashMap<&str, Vec<&str>>
  • • Use a HashSet for O(1) visited checking
  • • Understand the post-order property: a node is added AFTER all its descendants
  • • Recognize that the reversed post-order DFS is a valid topological ordering
  • Code Example

    #![allow(clippy::all)]
    /// # Topological Sort — DAG Ordering
    ///
    /// Order nodes in a directed acyclic graph so every edge goes from earlier to later.
    /// Uses DFS-based algorithm.
    use std::collections::{HashMap, HashSet};
    
    /// DFS-based topological sort.
    /// Returns nodes in topological order (dependencies first).
    pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
        // Collect all nodes
        let mut all_nodes = HashSet::new();
        let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
        for &(from, to) in edges {
            all_nodes.insert(from);
            all_nodes.insert(to);
            adj.entry(from).or_default().push(to);
        }
    
        let mut visited = HashSet::new();
        let mut order = Vec::new();
    
        fn visit<'a>(
            node: &'a str,
            adj: &HashMap<&'a str, Vec<&'a str>>,
            visited: &mut HashSet<&'a str>,
            order: &mut Vec<String>,
        ) {
            if visited.contains(node) {
                return;
            }
            visited.insert(node);
            if let Some(neighbors) = adj.get(node) {
                for &neighbor in neighbors {
                    visit(neighbor, adj, visited, order);
                }
            }
            // Post-order: add after all descendants visited
            order.push(node.to_string());
        }
    
        // Visit all nodes (handles disconnected components)
        let mut sorted_nodes: Vec<&str> = all_nodes.into_iter().collect();
        sorted_nodes.sort(); // deterministic ordering
        for node in sorted_nodes {
            visit(node, &adj, &mut visited, &mut order);
        }
    
        order.reverse(); // reverse post-order = topological order
        order
    }
    
    /// Kahn's algorithm (BFS-based) — alternative approach using in-degree counting.
    pub fn topo_sort_kahn(edges: &[(&str, &str)]) -> Vec<String> {
        let mut all_nodes = HashSet::new();
        let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
        let mut in_degree: HashMap<&str, usize> = HashMap::new();
    
        for &(from, to) in edges {
            all_nodes.insert(from);
            all_nodes.insert(to);
            adj.entry(from).or_default().push(to);
            *in_degree.entry(to).or_insert(0) += 1;
            in_degree.entry(from).or_insert(0);
        }
    
        // Start with nodes that have no incoming edges
        let mut queue: Vec<&str> = in_degree
            .iter()
            .filter(|(_, &deg)| deg == 0)
            .map(|(&node, _)| node)
            .collect();
        queue.sort(); // deterministic
        let mut result = Vec::new();
    
        while let Some(node) = queue.first().copied() {
            queue.remove(0);
            result.push(node.to_string());
            if let Some(neighbors) = adj.get(node) {
                for &neighbor in neighbors {
                    let deg = in_degree.get_mut(neighbor).unwrap();
                    *deg -= 1;
                    if *deg == 0 {
                        queue.push(neighbor);
                        queue.sort();
                    }
                }
            }
        }
    
        result
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_basic() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
            let order = topo_sort(&edges);
            // a must come before b and c; b and c before d; d before e
            let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
            assert!(pos("a") < pos("b"));
            assert!(pos("a") < pos("c"));
            assert!(pos("b") < pos("d"));
            assert!(pos("d") < pos("e"));
        }
    
        #[test]
        fn test_kahn() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
            let order = topo_sort_kahn(&edges);
            let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
            assert!(pos("a") < pos("b"));
            assert!(pos("d") < pos("e"));
        }
    
        #[test]
        fn test_empty() {
            let order = topo_sort(&[]);
            assert!(order.is_empty());
        }
    
        #[test]
        fn test_single_edge() {
            let order = topo_sort(&[("a", "b")]);
            assert_eq!(order, vec!["a", "b"]);
        }
    
        #[test]
        fn test_linear_chain() {
            let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
            let order = topo_sort(&edges);
            assert_eq!(order, vec!["a", "b", "c", "d"]);
        }
    }

    Key Differences

  • Nested function: Rust's nested fn visit<'a>(...) inside topo_sort requires explicit lifetime annotations for the borrowed adjacency map. OCaml closures capture the environment implicitly.
  • **HashSet vs Hashtbl**: Rust's HashSet for visited nodes. OCaml's Hashtbl serves both set and map purposes. Both provide O(1) lookup.
  • Cycle detection: This implementation does not detect cycles (silently handles them by skipping visited nodes, which may produce incorrect output). A full implementation needs a "in progress" visited state (three-color DFS).
  • Ordering: The post-order accumulation produces dependencies-last. Reversing gives dependencies-first. Kahn's algorithm naturally produces dependencies-first without reversal.
  • OCaml Approach

    OCaml uses a hash table for the visited set and a reference list for accumulation:

    let topo_sort edges =
      let visited = Hashtbl.create 16 in
      let order = ref [] in
      let adj = build_adj edges in
      let rec visit node =
        if not (Hashtbl.mem visited node) then begin
          Hashtbl.add visited node ();
          List.iter visit (List.assoc_opt node adj |> Option.value ~default:[]);
          order := node :: !order
        end
      in
      List.iter (fun (a, b) -> visit a; visit b) edges;
      !order
    

    The order := node :: !order prepends in post-order, giving the correct topological sequence without an explicit reverse step.

    Full Source

    #![allow(clippy::all)]
    /// # Topological Sort — DAG Ordering
    ///
    /// Order nodes in a directed acyclic graph so every edge goes from earlier to later.
    /// Uses DFS-based algorithm.
    use std::collections::{HashMap, HashSet};
    
    /// DFS-based topological sort.
    /// Returns nodes in topological order (dependencies first).
    pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
        // Collect all nodes
        let mut all_nodes = HashSet::new();
        let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
        for &(from, to) in edges {
            all_nodes.insert(from);
            all_nodes.insert(to);
            adj.entry(from).or_default().push(to);
        }
    
        let mut visited = HashSet::new();
        let mut order = Vec::new();
    
        fn visit<'a>(
            node: &'a str,
            adj: &HashMap<&'a str, Vec<&'a str>>,
            visited: &mut HashSet<&'a str>,
            order: &mut Vec<String>,
        ) {
            if visited.contains(node) {
                return;
            }
            visited.insert(node);
            if let Some(neighbors) = adj.get(node) {
                for &neighbor in neighbors {
                    visit(neighbor, adj, visited, order);
                }
            }
            // Post-order: add after all descendants visited
            order.push(node.to_string());
        }
    
        // Visit all nodes (handles disconnected components)
        let mut sorted_nodes: Vec<&str> = all_nodes.into_iter().collect();
        sorted_nodes.sort(); // deterministic ordering
        for node in sorted_nodes {
            visit(node, &adj, &mut visited, &mut order);
        }
    
        order.reverse(); // reverse post-order = topological order
        order
    }
    
    /// Kahn's algorithm (BFS-based) — alternative approach using in-degree counting.
    pub fn topo_sort_kahn(edges: &[(&str, &str)]) -> Vec<String> {
        let mut all_nodes = HashSet::new();
        let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
        let mut in_degree: HashMap<&str, usize> = HashMap::new();
    
        for &(from, to) in edges {
            all_nodes.insert(from);
            all_nodes.insert(to);
            adj.entry(from).or_default().push(to);
            *in_degree.entry(to).or_insert(0) += 1;
            in_degree.entry(from).or_insert(0);
        }
    
        // Start with nodes that have no incoming edges
        let mut queue: Vec<&str> = in_degree
            .iter()
            .filter(|(_, &deg)| deg == 0)
            .map(|(&node, _)| node)
            .collect();
        queue.sort(); // deterministic
        let mut result = Vec::new();
    
        while let Some(node) = queue.first().copied() {
            queue.remove(0);
            result.push(node.to_string());
            if let Some(neighbors) = adj.get(node) {
                for &neighbor in neighbors {
                    let deg = in_degree.get_mut(neighbor).unwrap();
                    *deg -= 1;
                    if *deg == 0 {
                        queue.push(neighbor);
                        queue.sort();
                    }
                }
            }
        }
    
        result
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_basic() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
            let order = topo_sort(&edges);
            // a must come before b and c; b and c before d; d before e
            let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
            assert!(pos("a") < pos("b"));
            assert!(pos("a") < pos("c"));
            assert!(pos("b") < pos("d"));
            assert!(pos("d") < pos("e"));
        }
    
        #[test]
        fn test_kahn() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
            let order = topo_sort_kahn(&edges);
            let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
            assert!(pos("a") < pos("b"));
            assert!(pos("d") < pos("e"));
        }
    
        #[test]
        fn test_empty() {
            let order = topo_sort(&[]);
            assert!(order.is_empty());
        }
    
        #[test]
        fn test_single_edge() {
            let order = topo_sort(&[("a", "b")]);
            assert_eq!(order, vec!["a", "b"]);
        }
    
        #[test]
        fn test_linear_chain() {
            let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
            let order = topo_sort(&edges);
            assert_eq!(order, vec!["a", "b", "c", "d"]);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_basic() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
            let order = topo_sort(&edges);
            // a must come before b and c; b and c before d; d before e
            let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
            assert!(pos("a") < pos("b"));
            assert!(pos("a") < pos("c"));
            assert!(pos("b") < pos("d"));
            assert!(pos("d") < pos("e"));
        }
    
        #[test]
        fn test_kahn() {
            let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
            let order = topo_sort_kahn(&edges);
            let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
            assert!(pos("a") < pos("b"));
            assert!(pos("d") < pos("e"));
        }
    
        #[test]
        fn test_empty() {
            let order = topo_sort(&[]);
            assert!(order.is_empty());
        }
    
        #[test]
        fn test_single_edge() {
            let order = topo_sort(&[("a", "b")]);
            assert_eq!(order, vec!["a", "b"]);
        }
    
        #[test]
        fn test_linear_chain() {
            let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
            let order = topo_sort(&edges);
            assert_eq!(order, vec!["a", "b", "c", "d"]);
        }
    }

    Deep Comparison

    Topological Sort — OCaml vs Rust Comparison

    Core Insight

    Topological sort reveals how each language handles graph traversal state. OCaml threads immutable (visited_set, result_list) pairs through recursive calls — purely functional but verbose. Rust passes mutable references to HashSet and Vec — more explicit about mutation but closer to the imperative algorithm.

    OCaml Approach

    Uses Set.Make(String) for the visited set. The visit function takes and returns (visited, order) tuples — no mutation, but every recursive call creates new tuples. SS.fold iterates over all nodes to handle disconnected components. The functional style ensures referential transparency.

    Rust Approach

    Two algorithms: (1) DFS with &mut HashSet and &mut Vec passed to recursive visit — clear ownership of mutable state. (2) Kahn's BFS algorithm using in-degree counting. The Rust version explicitly allocates HashMap for adjacency lists upfront, making the graph structure clear.

    Comparison Table

    AspectOCamlRust
    MemoryImmutable sets (tree nodes)HashSet + Vec (hash table + array)
    Null safetyN/AN/A
    ErrorsNo cycle detectionNo cycle detection (add if needed)
    IterationRecursive with tuple threadingRecursive with &mut references
    StateFunctional (pass-and-return)Mutable references (borrow checker)

    Things Rust Learners Should Notice

  • **&mut in recursive functions** — passing mutable refs down the call stack is idiomatic for graph algorithms
  • **HashMap::entry().or_default()** — builds adjacency lists cleanly
  • **Lifetime annotations 'a** — needed when storing references to input data in the visited set
  • Two algorithms — DFS (post-order reverse) vs Kahn's (BFS with in-degree) — same result, different trade-offs
  • Deterministic output — sorting nodes before iteration ensures reproducible results in tests
  • Further Reading

  • • [Topological sort (Wikipedia)](https://en.wikipedia.org/wiki/Topological_sorting)
  • • [petgraph crate](https://docs.rs/petgraph/) — production-quality graph algorithms for Rust
  • Exercises

  • Cycle detection: Extend topo_sort to return Result<Vec<String>, Vec<String>> where the error contains the nodes in the detected cycle. Use three-color DFS: white (unvisited), gray (in progress), black (done).
  • Kahn's algorithm: Implement topological sort using Kahn's algorithm: repeatedly remove nodes with in-degree 0. Return None if a cycle prevents completion.
  • Build order: Given Cargo-style [[dependencies]] as a list of (package, depends_on) pairs, produce a valid build order using topological sort.
  • Open Source Repos