ExamplesBy LevelBy TopicLearning Paths
1030 Intermediate

1030-hashmap-groupby — Group By with HashMap

Functional Programming

Tutorial

The Problem

Grouping items by a key — SQL's GROUP BY, Python's itertools.groupby, Haskell's Data.Map.fromListWith — is one of the most frequent data processing operations. You want to transform a flat list into a map from key to list of matching elements: group customers by country, group log events by severity, group words by length.

The canonical Rust implementation uses HashMap<K, Vec<V>> with the Entry API's or_default() helper. This avoids double lookups and is idiomatic across the Rust ecosystem.

🎯 Learning Outcomes

  • • Implement group-by using HashMap<K, Vec<V>> and the Entry API
  • • Use or_default() for ergonomic initialization of empty Vecs
  • • Write a generic group_by function parameterized over key type and key function
  • • Compare the push-based approach to Iterator::partition for binary grouping
  • • Understand when BTreeMap is better than HashMap for grouped output
  • Code Example

    #![allow(clippy::all)]
    // 1030: Group Elements by Key — HashMap<K, Vec<V>>
    // The classic group-by pattern using Entry API
    
    use std::collections::HashMap;
    
    /// Group words by their first character
    fn group_by_first_letter() {
        let words = vec!["apple", "avocado", "banana", "blueberry", "cherry"];
    
        let mut groups: HashMap<char, Vec<&str>> = HashMap::new();
        for word in &words {
            let key = word.chars().next().unwrap();
            groups.entry(key).or_default().push(word);
        }
    
        assert_eq!(groups[&'a'], vec!["apple", "avocado"]);
        assert_eq!(groups[&'b'], vec!["banana", "blueberry"]);
        assert_eq!(groups[&'c'], vec!["cherry"]);
    }
    
    /// Group numbers by parity
    fn group_by_parity() {
        let nums = vec![1, 2, 3, 4, 5, 6, 7, 8];
    
        let mut groups: HashMap<&str, Vec<i32>> = HashMap::new();
        for &n in &nums {
            let key = if n % 2 == 0 { "even" } else { "odd" };
            groups.entry(key).or_default().push(n);
        }
    
        assert_eq!(groups["even"], vec![2, 4, 6, 8]);
        assert_eq!(groups["odd"], vec![1, 3, 5, 7]);
    }
    
    /// Generic group_by function
    fn group_by<T, K, F>(items: &[T], key_fn: F) -> HashMap<K, Vec<&T>>
    where
        K: std::hash::Hash + Eq,
        F: Fn(&T) -> K,
    {
        let mut groups: HashMap<K, Vec<&T>> = HashMap::new();
        for item in items {
            groups.entry(key_fn(item)).or_default().push(item);
        }
        groups
    }
    
    fn test_generic_group_by() {
        let data = vec![("Alice", 90), ("Bob", 85), ("Alice", 92), ("Bob", 88)];
        let groups = group_by(&data, |&(name, _)| name);
    
        assert_eq!(groups["Alice"].len(), 2);
        assert_eq!(groups["Bob"].len(), 2);
        assert_eq!(groups["Alice"][0].1, 90);
        assert_eq!(groups["Alice"][1].1, 92);
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_group_by_first_letter() {
            group_by_first_letter();
        }
    
        #[test]
        fn test_group_by_parity() {
            group_by_parity();
        }
    
        #[test]
        fn test_generic() {
            test_generic_group_by();
        }
    
        #[test]
        fn test_group_by_length() {
            let words = vec!["hi", "hey", "hello", "yo", "yes"];
            let groups = group_by(&words, |w| w.len());
            assert_eq!(groups[&2].len(), 2); // "hi", "yo"
            assert_eq!(groups[&3].len(), 2); // "hey", "yes"
            assert_eq!(groups[&5].len(), 1); // "hello"
        }
    }

    Key Differences

  • Mutability: Rust's HashMap mutates in place via push; OCaml's Map creates new versions on each insert.
  • or_default: Rust's or_default() inserts an empty Vec<_> on first access; OCaml requires explicit Not_found handling.
  • Sorted vs unsorted: Rust's HashMap groups in arbitrary order; use BTreeMap for sorted group keys, matching OCaml's Map.Make behavior.
  • Generic API: Rust's generic group_by with a closure is natural; OCaml uses first-class functions via List.fold_left.
  • OCaml Approach

    OCaml's functional approach uses List.fold_left with a persistent map:

    module StringMap = Map.Make(String)
    
    let group_by key_fn items =
      List.fold_left (fun acc item ->
        let k = key_fn item in
        let current = try StringMap.find k acc with Not_found -> [] in
        StringMap.add k (item :: current) acc
      ) StringMap.empty items
    

    Each update creates a new map version. The Base.Map.add_multi function provides a one-liner for the same pattern.

    Full Source

    #![allow(clippy::all)]
    // 1030: Group Elements by Key — HashMap<K, Vec<V>>
    // The classic group-by pattern using Entry API
    
    use std::collections::HashMap;
    
    /// Group words by their first character
    fn group_by_first_letter() {
        let words = vec!["apple", "avocado", "banana", "blueberry", "cherry"];
    
        let mut groups: HashMap<char, Vec<&str>> = HashMap::new();
        for word in &words {
            let key = word.chars().next().unwrap();
            groups.entry(key).or_default().push(word);
        }
    
        assert_eq!(groups[&'a'], vec!["apple", "avocado"]);
        assert_eq!(groups[&'b'], vec!["banana", "blueberry"]);
        assert_eq!(groups[&'c'], vec!["cherry"]);
    }
    
    /// Group numbers by parity
    fn group_by_parity() {
        let nums = vec![1, 2, 3, 4, 5, 6, 7, 8];
    
        let mut groups: HashMap<&str, Vec<i32>> = HashMap::new();
        for &n in &nums {
            let key = if n % 2 == 0 { "even" } else { "odd" };
            groups.entry(key).or_default().push(n);
        }
    
        assert_eq!(groups["even"], vec![2, 4, 6, 8]);
        assert_eq!(groups["odd"], vec![1, 3, 5, 7]);
    }
    
    /// Generic group_by function
    fn group_by<T, K, F>(items: &[T], key_fn: F) -> HashMap<K, Vec<&T>>
    where
        K: std::hash::Hash + Eq,
        F: Fn(&T) -> K,
    {
        let mut groups: HashMap<K, Vec<&T>> = HashMap::new();
        for item in items {
            groups.entry(key_fn(item)).or_default().push(item);
        }
        groups
    }
    
    fn test_generic_group_by() {
        let data = vec![("Alice", 90), ("Bob", 85), ("Alice", 92), ("Bob", 88)];
        let groups = group_by(&data, |&(name, _)| name);
    
        assert_eq!(groups["Alice"].len(), 2);
        assert_eq!(groups["Bob"].len(), 2);
        assert_eq!(groups["Alice"][0].1, 90);
        assert_eq!(groups["Alice"][1].1, 92);
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_group_by_first_letter() {
            group_by_first_letter();
        }
    
        #[test]
        fn test_group_by_parity() {
            group_by_parity();
        }
    
        #[test]
        fn test_generic() {
            test_generic_group_by();
        }
    
        #[test]
        fn test_group_by_length() {
            let words = vec!["hi", "hey", "hello", "yo", "yes"];
            let groups = group_by(&words, |w| w.len());
            assert_eq!(groups[&2].len(), 2); // "hi", "yo"
            assert_eq!(groups[&3].len(), 2); // "hey", "yes"
            assert_eq!(groups[&5].len(), 1); // "hello"
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_group_by_first_letter() {
            group_by_first_letter();
        }
    
        #[test]
        fn test_group_by_parity() {
            group_by_parity();
        }
    
        #[test]
        fn test_generic() {
            test_generic_group_by();
        }
    
        #[test]
        fn test_group_by_length() {
            let words = vec!["hi", "hey", "hello", "yo", "yes"];
            let groups = group_by(&words, |w| w.len());
            assert_eq!(groups[&2].len(), 2); // "hi", "yo"
            assert_eq!(groups[&3].len(), 2); // "hey", "yes"
            assert_eq!(groups[&5].len(), 1); // "hello"
        }
    }

    Deep Comparison

    Group Elements by Key — Comparison

    Core Insight

    Group-by is one of the most common data transformation patterns. Both languages build a map from key to collection of values, but Rust's Entry API makes the pattern a one-liner while OCaml requires explicit option matching.

    OCaml Approach

  • find_opt + pattern match + add with appended list
  • List.fold_left to accumulate groups
  • • Appending to list end (@ [item]) is O(n) — use cons + reverse for performance
  • • Generic version uses first-class modules or functors for key type
  • Rust Approach

  • entry(key).or_default().push(value) — single expressive line
  • or_default() creates empty Vec if key absent
  • • Returns &mut Vec<V> so push works inline
  • • Generic version needs Hash + Eq bounds on key type
  • Comparison Table

    AspectOCamlRust
    Core patternfind_opt + match + addentry().or_default().push()
    Lines of code4-5 per group-by1
    Key constraintOrd (for Map)Hash + Eq (for HashMap)
    Value collectionList (prepend is O(1))Vec (append is amortized O(1))
    MutabilityReturns new map each stepMutates in place
    Iterator methodfold_leftfor loop or fold

    Exercises

  • Write group_and_count<T: Hash + Eq>(items: &[T]) -> HashMap<&T, usize> that counts occurrences without storing the items.
  • Implement a bucket_by_range(numbers: &[i32], bucket_size: i32) -> BTreeMap<i32, Vec<i32>> that groups numbers into buckets of fixed width.
  • Write a invert_map<K, V>(map: HashMap<K, V>) -> HashMap<V, Vec<K>> that inverts a map, grouping keys that share a value.
  • Open Source Repos