
Map-Reduce with Closures

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Map-Reduce with Closures" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming.

Tutorial

The Problem

Map-reduce was popularized by Google's 2004 paper as a framework for processing petabytes of data across thousands of machines. But the pattern predates distributed computing — it is a direct application of map and fold from functional programming, rooted in lambda calculus. Even in single-threaded Rust, the map-reduce idiom cleanly separates the transformation of individual elements from their aggregation, making code more composable and testable than equivalent imperative loops.
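The composability claim can be made concrete with a small side-by-side sketch (the function names here are illustrative, not from the example's source): the imperative version tangles transformation and aggregation in one loop body, while the map-fold version keeps them as two independently testable pieces.

```rust
// Sum of squared word lengths: imperative loop vs. map-reduce pipeline.
fn total_imperative(words: &[&str]) -> usize {
    let mut total = 0;
    for w in words {
        total += w.len() * w.len(); // transform and aggregate are intertwined
    }
    total
}

fn total_functional(words: &[&str]) -> usize {
    words
        .iter()
        .map(|w| w.len() * w.len()) // transform: per-element, testable in isolation
        .fold(0, |acc, n| acc + n) // aggregate: a pure reduction
}

fn main() {
    let words = ["map", "reduce"];
    assert_eq!(total_imperative(&words), total_functional(&words));
    println!("{}", total_functional(&words)); // 3*3 + 6*6 = 45
}
```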

🎯 Learning Outcomes

  • How to implement a generic map_reduce combining a mapper and reducer closure
  • How fold serves as the universal aggregation primitive
  • How to express word frequency counting, sum of squares, and string joining as map-reduce pipelines
  • How group_by_key generalizes the reduce step to produce grouped collections
  • Why separating map from reduce enables parallelism (Rayon, async tasks)

    Code Example

    pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
    where M: Fn(&T) -> U, R: Fn(V, U) -> V {
        items.iter().map(mapper).fold(init, reducer)
    }
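Because `map_reduce` fixes only the skeleton (map, then fold), the same call site can express very different aggregations by swapping the mapper and reducer closures. A short usage sketch (the function is reproduced verbatim from the example so the snippet is self-contained):

```rust
pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
where M: Fn(&T) -> U, R: Fn(V, U) -> V {
    items.iter().map(mapper).fold(init, reducer)
}

fn main() {
    let words = ["map", "reduce", "fold"];

    // Total character count: map each &str to its length, sum with fold.
    let total_chars = map_reduce(&words, |w: &&str| w.len(), |acc, n| acc + n, 0);
    assert_eq!(total_chars, 13);

    // Same skeleton, different reducer: length of the longest word.
    let longest = map_reduce(&words, |w: &&str| w.len(), |acc: usize, n| acc.max(n), 0);
    assert_eq!(longest, 6);
}
```

Note that the mapper receives `&T` (here `&&str`), so the closure parameter is a double reference; auto-deref makes `w.len()` work without explicit dereferencing.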

    Key Differences

  • Parallelism path: Rust's map_reduce can be parallelized by switching iter() to par_iter() from Rayon with no other change; OCaml's List.fold_left is inherently sequential.
  • Ownership in fold: Rust's fold takes ownership of the accumulator at each step (passing V by value); OCaml's fold passes the accumulator by value too but without ownership semantics — the GC handles sharing.
  • HashMap vs Hashtbl: Rust's HashMap is part of std::collections; OCaml's stdlib has Hashtbl (mutable) and Map (immutable functional maps) as separate choices.
  • Type inference depth: Rust requires explicit generic parameters on map_reduce signatures; OCaml's HM inference handles the same without annotations in most cases.
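The parallelism point can be demonstrated without pulling in Rayon: when the reducer is associative with an identity `init` (as `+` with `0` is), the input can be split into chunks, mapped and reduced per chunk on separate threads, and the partial results combined. A minimal hand-rolled sketch using `std::thread::scope` (Rayon's `par_iter` does this far more efficiently with work-stealing; this version only illustrates the principle, and the function name is made up for the example):

```rust
use std::thread;

// Chunked parallel map-reduce; valid only because the reducer (+) is
// associative and 0 is its identity, so partial sums can be merged.
fn parallel_sum_of_squares(nums: &[i32]) -> i64 {
    let n_threads = 4;
    let chunk = ((nums.len() + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = nums
            .chunks(chunk)
            .map(|part| {
                // Map + local reduce on a worker thread; scoped threads may
                // borrow `part` directly from the caller's slice.
                s.spawn(move || part.iter().map(|&x| x as i64 * x as i64).sum::<i64>())
            })
            .collect();
        // Final reduce: combine the per-chunk partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).fold(0, |acc, x| acc + x)
    })
}

fn main() {
    let nums: Vec<i32> = (1..=10).collect();
    assert_eq!(parallel_sum_of_squares(&nums), 385); // 1 + 4 + ... + 100
}
```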
    OCaml Approach

    OCaml's List.map and List.fold_left are the direct equivalents. A word count in OCaml uses Hashtbl or Map with List.fold_left. The pattern is idiomatic and concise, with no need for explicit type annotations in simple cases.

    let word_count words =
      List.fold_left (fun tbl w ->
        let n = try Hashtbl.find tbl w with Not_found -> 0 in
        Hashtbl.replace tbl w (n + 1); tbl
      ) (Hashtbl.create 16) words
    

    Full Source

    #![allow(clippy::all)]
    //! Map-Reduce with Closures
    //!
    //! Transforming and aggregating data with iterator map and fold.
    
    use std::collections::HashMap;
    
    /// Generic map-reduce: transform each element, then aggregate.
    pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
    where
        M: Fn(&T) -> U,
        R: Fn(V, U) -> V,
    {
        items.iter().map(mapper).fold(init, reducer)
    }
    
    /// Word frequency count via map-reduce.
    pub fn word_count<'a>(words: &[&'a str]) -> HashMap<&'a str, usize> {
        words.iter().fold(HashMap::new(), |mut acc, &word| {
            *acc.entry(word).or_insert(0) += 1;
            acc
        })
    }
    
    /// Sum of squares via map-reduce.
    pub fn sum_of_squares(nums: &[i32]) -> i32 {
        map_reduce(nums, |&x| x * x, |acc, x| acc + x, 0)
    }
    
    /// Product of all elements.
    pub fn product(nums: &[i32]) -> i32 {
        map_reduce(nums, |&x| x, |acc, x| acc * x, 1)
    }
    
    /// Concatenate strings with separator.
    pub fn join_strings(strings: &[&str], sep: &str) -> String {
        if strings.is_empty() {
            return String::new();
        }
        strings[1..].iter().fold(strings[0].to_string(), |acc, &s| {
            format!("{}{}{}", acc, sep, s)
        })
    }
    
    /// Group by key function.
    pub fn group_by_key<T: Clone, K: std::hash::Hash + Eq, F>(
        items: &[T],
        key_fn: F,
    ) -> HashMap<K, Vec<T>>
    where
        F: Fn(&T) -> K,
    {
        items.iter().fold(HashMap::new(), |mut acc, item| {
            acc.entry(key_fn(item)).or_default().push(item.clone());
            acc
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_map_reduce_sum() {
            let nums = [1, 2, 3, 4, 5];
            let sum = map_reduce(&nums, |&x| x, |acc, x| acc + x, 0);
            assert_eq!(sum, 15);
        }
    
        #[test]
        fn test_sum_of_squares() {
            assert_eq!(sum_of_squares(&[1, 2, 3]), 14); // 1 + 4 + 9
            assert_eq!(sum_of_squares(&[]), 0);
        }
    
        #[test]
        fn test_product() {
            assert_eq!(product(&[1, 2, 3, 4]), 24);
            assert_eq!(product(&[5]), 5);
        }
    
        #[test]
        fn test_word_count() {
            let words = vec!["hello", "world", "hello", "rust", "world", "world"];
            let counts = word_count(&words);
            assert_eq!(counts.get("hello"), Some(&2));
            assert_eq!(counts.get("world"), Some(&3));
            assert_eq!(counts.get("rust"), Some(&1));
        }
    
        #[test]
        fn test_join_strings() {
            assert_eq!(join_strings(&["a", "b", "c"], ", "), "a, b, c");
            assert_eq!(join_strings(&["one"], "-"), "one");
            assert_eq!(join_strings(&[], "-"), "");
        }
    
        #[test]
        fn test_group_by_key() {
            let words = vec!["apple", "banana", "apricot", "blueberry", "cherry"];
            let grouped = group_by_key(&words, |w| w.chars().next().unwrap());
    
            assert_eq!(grouped.get(&'a').unwrap().len(), 2);
            assert_eq!(grouped.get(&'b').unwrap().len(), 2);
            assert_eq!(grouped.get(&'c').unwrap().len(), 1);
        }
    
        #[test]
        fn test_map_reduce_max() {
            let nums = [3, 1, 4, 1, 5, 9, 2, 6];
            let max = map_reduce(&nums, |&x| x, |acc, x| acc.max(x), i32::MIN);
            assert_eq!(max, 9);
        }
    }

    Deep Comparison

    OCaml vs Rust: Map-Reduce

    OCaml

    let map_reduce mapper reducer init items =
      List.fold_left (fun acc x -> reducer acc (mapper x)) init items
    
    let word_count words =
      List.fold_left (fun acc word ->
        let count = try Hashtbl.find acc word with Not_found -> 0 in
        Hashtbl.replace acc word (count + 1); acc
      ) (Hashtbl.create 10) words
    

    Rust

    pub fn map_reduce<T, U, V, M, R>(items: &[T], mapper: M, reducer: R, init: V) -> V
    where M: Fn(&T) -> U, R: Fn(V, U) -> V {
        items.iter().map(mapper).fold(init, reducer)
    }
    

    Key Differences

  • OCaml: fold_left/fold_right for reductions
  • Rust: .fold() on iterators, chainable with .map()
  • Both: Closures passed to map and reduce phases
  • Rust: HashMap via std::collections
  • Both support building complex aggregations
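The "Ownership in fold" difference above can be seen directly in code: Rust's `fold` moves the accumulator into the closure at each step, and the closure moves it back out via its return value, so a heap-allocated accumulator can be mutated in place with no cloning and no borrow conflicts. A small sketch:

```rust
fn main() {
    let words = ["fold", "owns", "its", "accumulator"];
    // The Vec is moved into the closure each iteration (note `mut acc`)
    // and returned as the next accumulator — ownership threads through.
    let upper: Vec<String> = words
        .iter()
        .fold(Vec::with_capacity(words.len()), |mut acc, w| {
            acc.push(w.to_uppercase()); // in-place mutation of the owned value
            acc
        });
    assert_eq!(upper, ["FOLD", "OWNS", "ITS", "ACCUMULATOR"]);
}
```

In OCaml the accumulator is also passed by value, but the garbage collector makes sharing a non-issue; Rust instead proves at compile time that no one else holds the accumulator while it is mutated.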
    Exercises

  • Parallel map-reduce: Replace iter() with rayon::iter::IntoParallelRefIterator::par_iter() in map_reduce and verify the word count produces the same result on a large word list.
  • Max by key: Implement max_by_key<T, K: Ord, F: Fn(&T) -> K>(items: &[T], key: F) -> Option<&T> using a single fold call without importing Iterator::max_by_key.
  • Histogram: Write a function that takes a &[f64] and returns a Vec<usize> representing a frequency histogram with n equal-width buckets between the min and max values.