073 — Topological Sort
Tutorial Video
Text description (accessibility)
This video demonstrates the "073 — Topological Sort" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Topological sort orders the nodes of a directed acyclic graph (DAG) such that for every directed edge (u → v), node u appears before v in the ordering. Key difference from OCaml: 1. **Nested function**: Rust's nested `fn visit<'a>(...)` inside `topo_sort` requires explicit lifetime annotations for the borrowed adjacency map. OCaml closures capture the environment implicitly.
Tutorial
The Problem
Topological sort orders the nodes of a directed acyclic graph (DAG) such that for every directed edge (u → v), node u appears before v in the ordering. It answers the question: "in what order should I complete these tasks, given their dependencies?" This is one of the most widely used graph algorithms in practice.
Build systems (Bazel, Make, Cargo) topologically sort compilation units so dependencies compile before dependents. Package managers (npm, pip, apt) sort package installations so transitive dependencies install first. Spreadsheet engines recalculate cells in topological order. Database engines plan query execution and schema migrations using topological ordering.
The DFS-based algorithm (Tarjan, 1976) marks each node visited, recurses on all neighbors first, then adds the current node — this is post-order DFS. Reversing the post-order gives a valid topological ordering. Kahn's algorithm (BFS-based) is an alternative that processes nodes with in-degree 0 iteratively and naturally detects cycles.
🎯 Learning Outcomes
HashMap<&str, Vec<&str>>HashSet for O(1) visited checkingCode Example
#![allow(clippy::all)]
/// # Topological Sort — DAG Ordering
///
/// Order nodes in a directed acyclic graph so every edge goes from earlier to later.
/// Uses DFS-based algorithm.
use std::collections::{HashMap, HashSet};
/// DFS-based topological sort.
/// Returns nodes in topological order (dependencies first).
pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
// Collect all nodes
let mut all_nodes = HashSet::new();
let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
for &(from, to) in edges {
all_nodes.insert(from);
all_nodes.insert(to);
adj.entry(from).or_default().push(to);
}
let mut visited = HashSet::new();
let mut order = Vec::new();
fn visit<'a>(
node: &'a str,
adj: &HashMap<&'a str, Vec<&'a str>>,
visited: &mut HashSet<&'a str>,
order: &mut Vec<String>,
) {
if visited.contains(node) {
return;
}
visited.insert(node);
if let Some(neighbors) = adj.get(node) {
for &neighbor in neighbors {
visit(neighbor, adj, visited, order);
}
}
// Post-order: add after all descendants visited
order.push(node.to_string());
}
// Visit all nodes (handles disconnected components)
let mut sorted_nodes: Vec<&str> = all_nodes.into_iter().collect();
sorted_nodes.sort(); // deterministic ordering
for node in sorted_nodes {
visit(node, &adj, &mut visited, &mut order);
}
order.reverse(); // reverse post-order = topological order
order
}
/// Kahn's algorithm (BFS-based) — alternative approach using in-degree counting.
pub fn topo_sort_kahn(edges: &[(&str, &str)]) -> Vec<String> {
let mut all_nodes = HashSet::new();
let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
let mut in_degree: HashMap<&str, usize> = HashMap::new();
for &(from, to) in edges {
all_nodes.insert(from);
all_nodes.insert(to);
adj.entry(from).or_default().push(to);
*in_degree.entry(to).or_insert(0) += 1;
in_degree.entry(from).or_insert(0);
}
// Start with nodes that have no incoming edges
let mut queue: Vec<&str> = in_degree
.iter()
.filter(|(_, °)| deg == 0)
.map(|(&node, _)| node)
.collect();
queue.sort(); // deterministic
let mut result = Vec::new();
while let Some(node) = queue.first().copied() {
queue.remove(0);
result.push(node.to_string());
if let Some(neighbors) = adj.get(node) {
for &neighbor in neighbors {
let deg = in_degree.get_mut(neighbor).unwrap();
*deg -= 1;
if *deg == 0 {
queue.push(neighbor);
queue.sort();
}
}
}
}
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
let order = topo_sort(&edges);
// a must come before b and c; b and c before d; d before e
let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
assert!(pos("a") < pos("b"));
assert!(pos("a") < pos("c"));
assert!(pos("b") < pos("d"));
assert!(pos("d") < pos("e"));
}
#[test]
fn test_kahn() {
let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
let order = topo_sort_kahn(&edges);
let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
assert!(pos("a") < pos("b"));
assert!(pos("d") < pos("e"));
}
#[test]
fn test_empty() {
let order = topo_sort(&[]);
assert!(order.is_empty());
}
#[test]
fn test_single_edge() {
let order = topo_sort(&[("a", "b")]);
assert_eq!(order, vec!["a", "b"]);
}
#[test]
fn test_linear_chain() {
let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
let order = topo_sort(&edges);
assert_eq!(order, vec!["a", "b", "c", "d"]);
}
}Key Differences
fn visit<'a>(...) inside topo_sort requires explicit lifetime annotations for the borrowed adjacency map. OCaml closures capture the environment implicitly.HashSet vs Hashtbl**: Rust's HashSet for visited nodes. OCaml's Hashtbl serves both set and map purposes. Both provide O(1) lookup.OCaml Approach
OCaml uses a hash table for the visited set and a reference list for accumulation:
let topo_sort edges =
let visited = Hashtbl.create 16 in
let order = ref [] in
let adj = build_adj edges in
let rec visit node =
if not (Hashtbl.mem visited node) then begin
Hashtbl.add visited node ();
List.iter visit (List.assoc_opt node adj |> Option.value ~default:[]);
order := node :: !order
end
in
List.iter (fun (a, b) -> visit a; visit b) edges;
!order
The order := node :: !order prepends in post-order, giving the correct topological sequence without an explicit reverse step.
Full Source
#![allow(clippy::all)]
/// # Topological Sort — DAG Ordering
///
/// Order nodes in a directed acyclic graph so every edge goes from earlier to later.
/// Uses DFS-based algorithm.
use std::collections::{HashMap, HashSet};
/// DFS-based topological sort.
/// Returns nodes in topological order (dependencies first).
pub fn topo_sort(edges: &[(&str, &str)]) -> Vec<String> {
// Collect all nodes
let mut all_nodes = HashSet::new();
let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
for &(from, to) in edges {
all_nodes.insert(from);
all_nodes.insert(to);
adj.entry(from).or_default().push(to);
}
let mut visited = HashSet::new();
let mut order = Vec::new();
fn visit<'a>(
node: &'a str,
adj: &HashMap<&'a str, Vec<&'a str>>,
visited: &mut HashSet<&'a str>,
order: &mut Vec<String>,
) {
if visited.contains(node) {
return;
}
visited.insert(node);
if let Some(neighbors) = adj.get(node) {
for &neighbor in neighbors {
visit(neighbor, adj, visited, order);
}
}
// Post-order: add after all descendants visited
order.push(node.to_string());
}
// Visit all nodes (handles disconnected components)
let mut sorted_nodes: Vec<&str> = all_nodes.into_iter().collect();
sorted_nodes.sort(); // deterministic ordering
for node in sorted_nodes {
visit(node, &adj, &mut visited, &mut order);
}
order.reverse(); // reverse post-order = topological order
order
}
/// Kahn's algorithm (BFS-based) — alternative approach using in-degree counting.
pub fn topo_sort_kahn(edges: &[(&str, &str)]) -> Vec<String> {
let mut all_nodes = HashSet::new();
let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
let mut in_degree: HashMap<&str, usize> = HashMap::new();
for &(from, to) in edges {
all_nodes.insert(from);
all_nodes.insert(to);
adj.entry(from).or_default().push(to);
*in_degree.entry(to).or_insert(0) += 1;
in_degree.entry(from).or_insert(0);
}
// Start with nodes that have no incoming edges
let mut queue: Vec<&str> = in_degree
.iter()
.filter(|(_, °)| deg == 0)
.map(|(&node, _)| node)
.collect();
queue.sort(); // deterministic
let mut result = Vec::new();
while let Some(node) = queue.first().copied() {
queue.remove(0);
result.push(node.to_string());
if let Some(neighbors) = adj.get(node) {
for &neighbor in neighbors {
let deg = in_degree.get_mut(neighbor).unwrap();
*deg -= 1;
if *deg == 0 {
queue.push(neighbor);
queue.sort();
}
}
}
}
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
let order = topo_sort(&edges);
// a must come before b and c; b and c before d; d before e
let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
assert!(pos("a") < pos("b"));
assert!(pos("a") < pos("c"));
assert!(pos("b") < pos("d"));
assert!(pos("d") < pos("e"));
}
#[test]
fn test_kahn() {
let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
let order = topo_sort_kahn(&edges);
let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
assert!(pos("a") < pos("b"));
assert!(pos("d") < pos("e"));
}
#[test]
fn test_empty() {
let order = topo_sort(&[]);
assert!(order.is_empty());
}
#[test]
fn test_single_edge() {
let order = topo_sort(&[("a", "b")]);
assert_eq!(order, vec!["a", "b"]);
}
#[test]
fn test_linear_chain() {
let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
let order = topo_sort(&edges);
assert_eq!(order, vec!["a", "b", "c", "d"]);
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
let order = topo_sort(&edges);
// a must come before b and c; b and c before d; d before e
let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
assert!(pos("a") < pos("b"));
assert!(pos("a") < pos("c"));
assert!(pos("b") < pos("d"));
assert!(pos("d") < pos("e"));
}
#[test]
fn test_kahn() {
let edges = vec![("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")];
let order = topo_sort_kahn(&edges);
let pos = |s: &str| order.iter().position(|x| x == s).unwrap();
assert!(pos("a") < pos("b"));
assert!(pos("d") < pos("e"));
}
#[test]
fn test_empty() {
let order = topo_sort(&[]);
assert!(order.is_empty());
}
#[test]
fn test_single_edge() {
let order = topo_sort(&[("a", "b")]);
assert_eq!(order, vec!["a", "b"]);
}
#[test]
fn test_linear_chain() {
let edges = vec![("a", "b"), ("b", "c"), ("c", "d")];
let order = topo_sort(&edges);
assert_eq!(order, vec!["a", "b", "c", "d"]);
}
}
Deep Comparison
Topological Sort — OCaml vs Rust Comparison
Core Insight
Topological sort reveals how each language handles graph traversal state. OCaml threads immutable (visited_set, result_list) pairs through recursive calls — purely functional but verbose. Rust passes mutable references to HashSet and Vec — more explicit about mutation but closer to the imperative algorithm.
OCaml Approach
Uses Set.Make(String) for the visited set. The visit function takes and returns (visited, order) tuples — no mutation, but every recursive call creates new tuples. SS.fold iterates over all nodes to handle disconnected components. The functional style ensures referential transparency.
Rust Approach
Two algorithms: (1) DFS with &mut HashSet and &mut Vec passed to recursive visit — clear ownership of mutable state. (2) Kahn's BFS algorithm using in-degree counting. The Rust version explicitly allocates HashMap for adjacency lists upfront, making the graph structure clear.
Comparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Memory | Immutable sets (tree nodes) | HashSet + Vec (hash table + array) |
| Null safety | N/A | N/A |
| Errors | No cycle detection | No cycle detection (add if needed) |
| Iteration | Recursive with tuple threading | Recursive with &mut references |
| State | Functional (pass-and-return) | Mutable references (borrow checker) |
Things Rust Learners Should Notice
&mut in recursive functions** — passing mutable refs down the call stack is idiomatic for graph algorithmsHashMap::entry().or_default()** — builds adjacency lists cleanly'a** — needed when storing references to input data in the visited setFurther Reading
Exercises
topo_sort to return Result<Vec<String>, Vec<String>> where the error contains the nodes in the detected cycle. Use three-color DFS: white (unvisited), gray (in progress), black (done).None if a cycle prevents completion.[[dependencies]] as a list of (package, depends_on) pairs, produce a valid build order using topological sort.