Map-Reduce with Closures
Difficulty: Intermediate. Key concepts: Functional Programming.
Tutorial
The Problem
Map-reduce was popularized by Google's 2004 paper as a framework for processing petabytes of data across thousands of machines. But the pattern predates distributed computing — it is a direct application of map and fold from functional programming, rooted in lambda calculus. Even in single-threaded Rust, the map-reduce idiom cleanly separates the transformation of individual elements from their aggregation, making code more composable and testable than equivalent imperative loops.
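To make that separation concrete, here is a small sketch (function names are illustrative, not part of the example code below) comparing an imperative loop with the map-reduce idiom for the same computation:

```rust
// Sum of squared even numbers, two ways. The imperative loop interleaves
// selection, transformation, and aggregation in one body; the iterator
// version keeps each concern in its own combinator.

fn sum_sq_evens_imperative(nums: &[i64]) -> i64 {
    let mut total = 0;
    for &n in nums {
        if n % 2 == 0 {
            total += n * n; // transform and aggregate fused together
        }
    }
    total
}

fn sum_sq_evens_functional(nums: &[i64]) -> i64 {
    nums.iter()
        .filter(|&&n| n % 2 == 0)  // selection
        .map(|&n| n * n)           // per-element transformation (map)
        .fold(0, |acc, x| acc + x) // aggregation (reduce)
}

fn main() {
    let nums = [1, 2, 3, 4];
    assert_eq!(sum_sq_evens_imperative(&nums), sum_sq_evens_functional(&nums));
    println!("{}", sum_sq_evens_functional(&nums)); // 4 + 16 = 20
}
```

Each stage of the functional version can be tested and swapped independently, which is the composability the paragraph above refers to.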
🎯 Learning Outcomes
- `map_reduce` combines a mapper and a reducer closure
- `fold` serves as the universal aggregation primitive
- `group_by_key` generalizes the reduce step to produce grouped collections
Code Example
pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
where M: Fn(&T) -> U, R: Fn(V, U) -> V {
    items.iter().map(mapper).fold(init, reducer)
}
Key Differences
- Rust's `map_reduce` can be parallelized with Rayon by switching `iter()` to `par_iter()` (the sequential `fold` then becomes Rayon's fold-plus-reduce, since parallel aggregation needs an associative combine step); OCaml's `List.fold_left` is inherently sequential.
- Rust's `fold` takes ownership of the accumulator at each step (passing `V` by value); OCaml's fold passes the accumulator by value too, but without ownership semantics: the GC handles sharing.
- `HashMap` is part of `std::collections`; OCaml's stdlib has `Hashtbl` (mutable) and `Map` (immutable functional maps) as separate choices.
- Rust requires explicit trait bounds in the `map_reduce` signature; OCaml's Hindley-Milner inference handles the same code without annotations in most cases.
OCaml Approach
OCaml's List.map and List.fold_left are the direct equivalents. A word count in OCaml uses Hashtbl or Map with List.fold_left. The pattern is idiomatic and concise, with no need for explicit type annotations in simple cases.
let word_count words =
  List.fold_left (fun tbl w ->
    let n = try Hashtbl.find tbl w with Not_found -> 0 in
    Hashtbl.replace tbl w (n + 1); tbl
  ) (Hashtbl.create 16) words
Full Source
#![allow(clippy::all)]
//! Map-Reduce with Closures
//!
//! Transforming and aggregating data with iterator map and fold.

use std::collections::HashMap;

/// Generic map-reduce: transform each element, then aggregate.
pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
where
    M: Fn(&T) -> U,
    R: Fn(V, U) -> V,
{
    items.iter().map(mapper).fold(init, reducer)
}

/// Word frequency count via map-reduce.
pub fn word_count<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
    words.iter().fold(HashMap::new(), |mut acc, &word| {
        *acc.entry(word).or_insert(0) += 1;
        acc
    })
}

/// Sum of squares via map-reduce.
pub fn sum_of_squares(nums: &[i32]) -> i32 {
    map_reduce(nums, |&x| x * x, |acc, x| acc + x, 0)
}

/// Product of all elements.
pub fn product(nums: &[i32]) -> i32 {
    map_reduce(nums, |&x| x, |acc, x| acc * x, 1)
}

/// Concatenate strings with separator.
pub fn join_strings(strings: &[&str], sep: &str) -> String {
    if strings.is_empty() {
        return String::new();
    }
    strings[1..].iter().fold(strings[0].to_string(), |acc, &s| {
        format!("{}{}{}", acc, sep, s)
    })
}

/// Group by key function.
pub fn group_by_key<T: Clone, K: std::hash::Hash + Eq, F>(
    items: &[T],
    key_fn: F,
) -> HashMap<K, Vec<T>>
where
    F: Fn(&T) -> K,
{
    items.iter().fold(HashMap::new(), |mut acc, item| {
        acc.entry(key_fn(item)).or_default().push(item.clone());
        acc
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_map_reduce_sum() {
        let nums = [1, 2, 3, 4, 5];
        let sum = map_reduce(&nums, |&x| x, |acc, x| acc + x, 0);
        assert_eq!(sum, 15);
    }

    #[test]
    fn test_sum_of_squares() {
        assert_eq!(sum_of_squares(&[1, 2, 3]), 14); // 1 + 4 + 9
        assert_eq!(sum_of_squares(&[]), 0);
    }

    #[test]
    fn test_product() {
        assert_eq!(product(&[1, 2, 3, 4]), 24);
        assert_eq!(product(&[5]), 5);
    }

    #[test]
    fn test_word_count() {
        let words = vec!["hello", "world", "hello", "rust", "world", "world"];
        let counts = word_count(&words);
        assert_eq!(counts.get("hello"), Some(&2));
        assert_eq!(counts.get("world"), Some(&3));
        assert_eq!(counts.get("rust"), Some(&1));
    }

    #[test]
    fn test_join_strings() {
        assert_eq!(join_strings(&["a", "b", "c"], ", "), "a, b, c");
        assert_eq!(join_strings(&["one"], "-"), "one");
        assert_eq!(join_strings(&[], "-"), "");
    }

    #[test]
    fn test_group_by_key() {
        let words = vec!["apple", "banana", "apricot", "blueberry", "cherry"];
        let grouped = group_by_key(&words, |w| w.chars().next().unwrap());
        assert_eq!(grouped.get(&'a').unwrap().len(), 2);
        assert_eq!(grouped.get(&'b').unwrap().len(), 2);
        assert_eq!(grouped.get(&'c').unwrap().len(), 1);
    }

    #[test]
    fn test_map_reduce_max() {
        let nums = [3, 1, 4, 1, 5, 9, 2, 6];
        let max = map_reduce(&nums, |&x| x, |acc, x| acc.max(x), i32::MIN);
        assert_eq!(max, 9);
    }
}
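One design note on `join_strings` above: `format!` allocates a brand-new `String` on every fold step, so joining n strings does O(n) full copies. A sketch of a variant (the name `join_strings_buffered` is ours, not from the source) that threads one growing buffer through the fold instead:

```rust
/// Like `join_strings`, but reuses a single buffer with `push_str`
/// instead of allocating a fresh String via `format!` on every step.
pub fn join_strings_buffered(strings: &[&str], sep: &str) -> String {
    let mut iter = strings.iter();
    let first = match iter.next() {
        Some(s) => s.to_string(),
        None => return String::new(),
    };
    iter.fold(first, |mut acc, s| {
        acc.push_str(sep); // append in place: no intermediate allocation
        acc.push_str(s);
        acc // fold moves the buffer by value into the next step
    })
}
```

This is the same ownership-through-fold pattern `word_count` uses with its `HashMap` accumulator: because `fold` passes the accumulator by value, the buffer can be mutated in place with no clone and no GC.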
Deep Comparison
OCaml vs Rust: Map-Reduce
OCaml
let map_reduce mapper reducer init items =
  List.fold_left (fun acc x -> reducer acc (mapper x)) init items

let word_count words =
  List.fold_left (fun acc word ->
    let count = try Hashtbl.find acc word with Not_found -> 0 in
    Hashtbl.replace acc word (count + 1); acc
  ) (Hashtbl.create 10) words
Rust
pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
where M: Fn(&T) -> U, R: Fn(V, U) -> V {
    items.iter().map(mapper).fold(init, reducer)
}
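The parallelism path can be sketched without pulling in Rayon at all, using only `std::thread::scope`. Everything below is illustrative (the function name, the chunking strategy, and the thread count are our assumptions); Rayon's `par_iter()` automates this split-map-reduce with work stealing, which is why the switch is nearly free:

```rust
use std::thread;

// Parallel sum of squares: split the slice into chunks, map-and-fold each
// chunk on its own thread, then reduce the per-chunk partial sums.
// This works only because + is associative, which is exactly the property
// a parallel reduce needs and a sequential List.fold_left never requires.
// `n_threads` is assumed to be >= 1.
fn parallel_sum_of_squares(nums: &[i64], n_threads: usize) -> i64 {
    let chunk_size = ((nums.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        // Map phase: one scoped thread per chunk, each folding locally.
        let handles: Vec<_> = nums
            .chunks(chunk_size)
            .map(|c| s.spawn(move || c.iter().map(|&x| x * x).fold(0i64, |a, x| a + x)))
            .collect();
        // Reduce phase: combine the partial results.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Scoped threads let each worker borrow from the input slice directly, so no data is copied; the borrow checker guarantees the chunks don't outlive `nums`.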
Exercises
1. Replace `iter()` with `rayon::iter::IntoParallelRefIterator::par_iter()` in `map_reduce` and verify the word count produces the same result on a large word list.
2. Implement `max_by_key<T, K: Ord, F: Fn(&T) -> K>(items: &[T], key: F) -> Option<&T>` using a single `fold` call, without importing `Iterator::max_by_key`.
3. Write a function that takes `&[f64]` and returns a `Vec<usize>` representing a frequency histogram with n equal-width buckets between the min and max values.