Map-Reduce with Closures
Difficulty: Intermediate. Key concepts: Functional Programming.
Tutorial
The Problem
Map-reduce was popularized by Google's 2004 paper as a framework for processing petabytes of data across thousands of machines. But the pattern predates distributed computing — it is a direct application of map and fold from functional programming, rooted in lambda calculus. Even in single-threaded Rust, the map-reduce idiom cleanly separates the transformation of individual elements from their aggregation, making code more composable and testable than equivalent imperative loops.
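To make that separation concrete, here is a small sketch (function names are illustrative, not part of the example code below) comparing an imperative loop with the map-reduce idiom for the same computation:

```rust
// Sum of squared even numbers, two ways. The imperative loop interleaves
// selection, transformation, and aggregation in one body; the iterator
// version keeps each concern in its own combinator.

fn sum_sq_evens_imperative(nums: &[i64]) -> i64 {
    let mut total = 0;
    for &n in nums {
        if n % 2 == 0 {
            total += n * n; // transform and aggregate fused together
        }
    }
    total
}

fn sum_sq_evens_functional(nums: &[i64]) -> i64 {
    nums.iter()
        .filter(|&&n| n % 2 == 0)  // selection
        .map(|&n| n * n)           // per-element transformation (map)
        .fold(0, |acc, x| acc + x) // aggregation (reduce)
}

fn main() {
    let nums = [1, 2, 3, 4];
    assert_eq!(sum_sq_evens_imperative(&nums), sum_sq_evens_functional(&nums));
    println!("{}", sum_sq_evens_functional(&nums)); // 4 + 16 = 20
}
```

Each stage of the functional version can be tested and swapped independently, which is the composability the paragraph above refers to.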
🎯 Learning Outcomes
- `map_reduce` combines a mapper and a reducer closure
- `fold` serves as the universal aggregation primitive
- `group_by_key` generalizes the reduce step to produce grouped collections
Code Example
pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
where M: Fn(&T) -> U, R: Fn(V, U) -> V {
    items.iter().map(mapper).fold(init, reducer)
}
Key Differences
- Rust's `map_reduce` can be parallelized with Rayon by switching `iter()` to `par_iter()` (the sequential `fold` then becomes Rayon's fold-plus-reduce, since parallel aggregation needs an associative combine step); OCaml's `List.fold_left` is inherently sequential.
- Rust's `fold` takes ownership of the accumulator at each step (passing `V` by value); OCaml's fold passes the accumulator by value too, but without ownership semantics: the GC handles sharing.
- `HashMap` is part of `std::collections`; OCaml's stdlib has `Hashtbl` (mutable) and `Map` (immutable functional maps) as separate choices.
- Rust requires explicit trait bounds in the `map_reduce` signature; OCaml's Hindley-Milner inference handles the same code without annotations in most cases.
OCaml Approach
OCaml's List.map and List.fold_left are the direct equivalents. A word count in OCaml uses Hashtbl or Map with List.fold_left. The pattern is idiomatic and concise, with no need for explicit type annotations in simple cases.
let word_count words =
  List.fold_left (fun tbl w ->
    let n = try Hashtbl.find tbl w with Not_found -> 0 in
    Hashtbl.replace tbl w (n + 1); tbl
  ) (Hashtbl.create 16) words
Full Source
#![allow(clippy::all)]
//! Map-Reduce with Closures
//!
//! Transforming and aggregating data with iterator map and fold.

use std::collections::HashMap;

/// Generic map-reduce: transform each element, then aggregate.
pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
where
    M: Fn(&T) -> U,
    R: Fn(V, U) -> V,
{
    items.iter().map(mapper).fold(init, reducer)
}

/// Word frequency count via map-reduce.
pub fn word_count<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
    words.iter().fold(HashMap::new(), |mut acc, &word| {
        *acc.entry(word).or_insert(0) += 1;
        acc
    })
}

/// Sum of squares via map-reduce.
pub fn sum_of_squares(nums: &[i32]) -> i32 {
    map_reduce(nums, |&x| x * x, |acc, x| acc + x, 0)
}

/// Product of all elements.
pub fn product(nums: &[i32]) -> i32 {
    map_reduce(nums, |&x| x, |acc, x| acc * x, 1)
}

/// Concatenate strings with separator.
pub fn join_strings(strings: &[&str], sep: &str) -> String {
    if strings.is_empty() {
        return String::new();
    }
    strings[1..].iter().fold(strings[0].to_string(), |acc, &s| {
        format!("{}{}{}", acc, sep, s)
    })
}

/// Group by key function.
pub fn group_by_key<T: Clone, K: std::hash::Hash + Eq, F>(
    items: &[T],
    key_fn: F,
) -> HashMap<K, Vec<T>>
where
    F: Fn(&T) -> K,
{
    items.iter().fold(HashMap::new(), |mut acc, item| {
        acc.entry(key_fn(item)).or_default().push(item.clone());
        acc
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_map_reduce_sum() {
        let nums = [1, 2, 3, 4, 5];
        let sum = map_reduce(&nums, |&x| x, |acc, x| acc + x, 0);
        assert_eq!(sum, 15);
    }

    #[test]
    fn test_sum_of_squares() {
        assert_eq!(sum_of_squares(&[1, 2, 3]), 14); // 1 + 4 + 9
        assert_eq!(sum_of_squares(&[]), 0);
    }

    #[test]
    fn test_product() {
        assert_eq!(product(&[1, 2, 3, 4]), 24);
        assert_eq!(product(&[5]), 5);
    }

    #[test]
    fn test_word_count() {
        let words = vec!["hello", "world", "hello", "rust", "world", "world"];
        let counts = word_count(&words);
        assert_eq!(counts.get("hello"), Some(&2));
        assert_eq!(counts.get("world"), Some(&3));
        assert_eq!(counts.get("rust"), Some(&1));
    }

    #[test]
    fn test_join_strings() {
        assert_eq!(join_strings(&["a", "b", "c"], ", "), "a, b, c");
        assert_eq!(join_strings(&["one"], "-"), "one");
        assert_eq!(join_strings(&[], "-"), "");
    }

    #[test]
    fn test_group_by_key() {
        let words = vec!["apple", "banana", "apricot", "blueberry", "cherry"];
        let grouped = group_by_key(&words, |w| w.chars().next().unwrap());
        assert_eq!(grouped.get(&'a').unwrap().len(), 2);
        assert_eq!(grouped.get(&'b').unwrap().len(), 2);
        assert_eq!(grouped.get(&'c').unwrap().len(), 1);
    }

    #[test]
    fn test_map_reduce_max() {
        let nums = [3, 1, 4, 1, 5, 9, 2, 6];
        let max = map_reduce(&nums, |&x| x, |acc, x| acc.max(x), i32::MIN);
        assert_eq!(max, 9);
    }
}
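One design note on `join_strings` above: `format!` allocates a brand-new `String` on every fold step, so joining n strings does O(n) full copies. A sketch of a variant (the name `join_strings_buffered` is ours, not from the source) that threads one growing buffer through the fold instead:

```rust
/// Like `join_strings`, but reuses a single buffer with `push_str`
/// instead of allocating a fresh String via `format!` on every step.
pub fn join_strings_buffered(strings: &[&str], sep: &str) -> String {
    let mut iter = strings.iter();
    let first = match iter.next() {
        Some(s) => s.to_string(),
        None => return String::new(),
    };
    iter.fold(first, |mut acc, s| {
        acc.push_str(sep); // append in place: no intermediate allocation
        acc.push_str(s);
        acc // fold moves the buffer by value into the next step
    })
}
```

This is the same ownership-through-fold pattern `word_count` uses with its `HashMap` accumulator: because `fold` passes the accumulator by value, the buffer can be mutated in place with no clone and no GC.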
Deep Comparison
OCaml vs Rust: Map-Reduce
OCaml
let map_reduce mapper reducer init items =
  List.fold_left (fun acc x -> reducer acc (mapper x)) init items

let word_count words =
  List.fold_left (fun acc word ->
    let count = try Hashtbl.find acc word with Not_found -> 0 in
    Hashtbl.replace acc word (count + 1); acc
  ) (Hashtbl.create 10) words
Rust
pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
where M: Fn(&T) -> U, R: Fn(V, U) -> V {
    items.iter().map(mapper).fold(init, reducer)
}
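The parallelism path can be sketched without pulling in Rayon at all, using only `std::thread::scope`. Everything below is illustrative (the function name, the chunking strategy, and the thread count are our assumptions); Rayon's `par_iter()` automates this split-map-reduce with work stealing, which is why the switch is nearly free:

```rust
use std::thread;

// Parallel sum of squares: split the slice into chunks, map-and-fold each
// chunk on its own thread, then reduce the per-chunk partial sums.
// This works only because + is associative, which is exactly the property
// a parallel reduce needs and a sequential List.fold_left never requires.
// `n_threads` is assumed to be >= 1.
fn parallel_sum_of_squares(nums: &[i64], n_threads: usize) -> i64 {
    let chunk_size = ((nums.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        // Map phase: one scoped thread per chunk, each folding locally.
        let handles: Vec<_> = nums
            .chunks(chunk_size)
            .map(|c| s.spawn(move || c.iter().map(|&x| x * x).fold(0i64, |a, x| a + x)))
            .collect();
        // Reduce phase: combine the partial results.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Scoped threads let each worker borrow from the input slice directly, so no data is copied; the borrow checker guarantees the chunks don't outlive `nums`.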
Exercises
1. Replace `iter()` with `rayon::iter::IntoParallelRefIterator::par_iter()` in `map_reduce` and verify the word count produces the same result on a large word list.
2. Implement `max_by_key<T, K: Ord, F: Fn(&T) -> K>(items: &[T], key: F) -> Option<&T>` using a single `fold` call, without importing `Iterator::max_by_key`.
3. Write a function that takes `&[f64]` and returns a `Vec<usize>` representing a frequency histogram with n equal-width buckets between the min and max values.