1030-hashmap-groupby — Group By with HashMap
Tutorial
The Problem
Grouping items by a key — SQL's GROUP BY, Python's itertools.groupby, Haskell's Data.Map.fromListWith — is one of the most frequent data processing operations. You want to transform a flat list into a map from key to list of matching elements: group customers by country, group log events by severity, group words by length.
The canonical Rust implementation uses HashMap<K, Vec<V>> with the Entry API's or_default() helper. This avoids double lookups and is idiomatic across the Rust ecosystem.
🎯 Learning Outcomes
HashMap<K, Vec<V>> and the Entry APIor_default() for ergonomic initialization of empty Vecsgroup_by function parameterized over key type and key functionIterator::partition for binary groupingBTreeMap is better than HashMap for grouped outputCode Example
#![allow(clippy::all)]
// 1030: Group Elements by Key — HashMap<K, Vec<V>>
// The classic group-by pattern using Entry API
use std::collections::HashMap;
/// Group words by their first character
fn group_by_first_letter() {
let words = vec!["apple", "avocado", "banana", "blueberry", "cherry"];
let mut groups: HashMap<char, Vec<&str>> = HashMap::new();
for word in &words {
let key = word.chars().next().unwrap();
groups.entry(key).or_default().push(word);
}
assert_eq!(groups[&'a'], vec!["apple", "avocado"]);
assert_eq!(groups[&'b'], vec!["banana", "blueberry"]);
assert_eq!(groups[&'c'], vec!["cherry"]);
}
/// Group numbers by parity
fn group_by_parity() {
let nums = vec![1, 2, 3, 4, 5, 6, 7, 8];
let mut groups: HashMap<&str, Vec<i32>> = HashMap::new();
for &n in &nums {
let key = if n % 2 == 0 { "even" } else { "odd" };
groups.entry(key).or_default().push(n);
}
assert_eq!(groups["even"], vec![2, 4, 6, 8]);
assert_eq!(groups["odd"], vec![1, 3, 5, 7]);
}
/// Generic group_by function
fn group_by<T, K, F>(items: &[T], key_fn: F) -> HashMap<K, Vec<&T>>
where
K: std::hash::Hash + Eq,
F: Fn(&T) -> K,
{
let mut groups: HashMap<K, Vec<&T>> = HashMap::new();
for item in items {
groups.entry(key_fn(item)).or_default().push(item);
}
groups
}
fn test_generic_group_by() {
let data = vec![("Alice", 90), ("Bob", 85), ("Alice", 92), ("Bob", 88)];
let groups = group_by(&data, |&(name, _)| name);
assert_eq!(groups["Alice"].len(), 2);
assert_eq!(groups["Bob"].len(), 2);
assert_eq!(groups["Alice"][0].1, 90);
assert_eq!(groups["Alice"][1].1, 92);
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_group_by_first_letter() {
group_by_first_letter();
}
#[test]
fn test_group_by_parity() {
group_by_parity();
}
#[test]
fn test_generic() {
test_generic_group_by();
}
#[test]
fn test_group_by_length() {
let words = vec!["hi", "hey", "hello", "yo", "yes"];
let groups = group_by(&words, |w| w.len());
assert_eq!(groups[&2].len(), 2); // "hi", "yo"
assert_eq!(groups[&3].len(), 2); // "hey", "yes"
assert_eq!(groups[&5].len(), 1); // "hello"
}
}Key Differences
push; OCaml's Map creates new versions on each insert.or_default() inserts an empty Vec<_> on first access; OCaml requires explicit Not_found handling.HashMap groups in arbitrary order; use BTreeMap for sorted group keys, matching OCaml's Map.Make behavior.group_by with a closure is natural; OCaml uses first-class functions via List.fold_left.OCaml Approach
OCaml's functional approach uses List.fold_left with a persistent map:
module StringMap = Map.Make(String)
let group_by key_fn items =
List.fold_left (fun acc item ->
let k = key_fn item in
let current = try StringMap.find k acc with Not_found -> [] in
StringMap.add k (item :: current) acc
) StringMap.empty items
Each update creates a new map version. The Base.Map.add_multi function provides a one-liner for the same pattern.
Full Source
#![allow(clippy::all)]
// 1030: Group Elements by Key — HashMap<K, Vec<V>>
// The classic group-by pattern using Entry API
use std::collections::HashMap;
/// Group words by their first character
fn group_by_first_letter() {
let words = vec!["apple", "avocado", "banana", "blueberry", "cherry"];
let mut groups: HashMap<char, Vec<&str>> = HashMap::new();
for word in &words {
let key = word.chars().next().unwrap();
groups.entry(key).or_default().push(word);
}
assert_eq!(groups[&'a'], vec!["apple", "avocado"]);
assert_eq!(groups[&'b'], vec!["banana", "blueberry"]);
assert_eq!(groups[&'c'], vec!["cherry"]);
}
/// Group numbers by parity
fn group_by_parity() {
let nums = vec![1, 2, 3, 4, 5, 6, 7, 8];
let mut groups: HashMap<&str, Vec<i32>> = HashMap::new();
for &n in &nums {
let key = if n % 2 == 0 { "even" } else { "odd" };
groups.entry(key).or_default().push(n);
}
assert_eq!(groups["even"], vec![2, 4, 6, 8]);
assert_eq!(groups["odd"], vec![1, 3, 5, 7]);
}
/// Generic group_by function
fn group_by<T, K, F>(items: &[T], key_fn: F) -> HashMap<K, Vec<&T>>
where
K: std::hash::Hash + Eq,
F: Fn(&T) -> K,
{
let mut groups: HashMap<K, Vec<&T>> = HashMap::new();
for item in items {
groups.entry(key_fn(item)).or_default().push(item);
}
groups
}
fn test_generic_group_by() {
let data = vec![("Alice", 90), ("Bob", 85), ("Alice", 92), ("Bob", 88)];
let groups = group_by(&data, |&(name, _)| name);
assert_eq!(groups["Alice"].len(), 2);
assert_eq!(groups["Bob"].len(), 2);
assert_eq!(groups["Alice"][0].1, 90);
assert_eq!(groups["Alice"][1].1, 92);
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_group_by_first_letter() {
group_by_first_letter();
}
#[test]
fn test_group_by_parity() {
group_by_parity();
}
#[test]
fn test_generic() {
test_generic_group_by();
}
#[test]
fn test_group_by_length() {
let words = vec!["hi", "hey", "hello", "yo", "yes"];
let groups = group_by(&words, |w| w.len());
assert_eq!(groups[&2].len(), 2); // "hi", "yo"
assert_eq!(groups[&3].len(), 2); // "hey", "yes"
assert_eq!(groups[&5].len(), 1); // "hello"
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_group_by_first_letter() {
group_by_first_letter();
}
#[test]
fn test_group_by_parity() {
group_by_parity();
}
#[test]
fn test_generic() {
test_generic_group_by();
}
#[test]
fn test_group_by_length() {
let words = vec!["hi", "hey", "hello", "yo", "yes"];
let groups = group_by(&words, |w| w.len());
assert_eq!(groups[&2].len(), 2); // "hi", "yo"
assert_eq!(groups[&3].len(), 2); // "hey", "yes"
assert_eq!(groups[&5].len(), 1); // "hello"
}
}
Deep Comparison
Group Elements by Key — Comparison
Core Insight
Group-by is one of the most common data transformation patterns. Both languages build a map from key to collection of values, but Rust's Entry API makes the pattern a one-liner while OCaml requires explicit option matching.
OCaml Approach
find_opt + pattern match + add with appended listList.fold_left to accumulate groups@ [item]) is O(n) — use cons + reverse for performanceRust Approach
entry(key).or_default().push(value) — single expressive lineor_default() creates empty Vec if key absent&mut Vec<V> so push works inlineHash + Eq bounds on key typeComparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Core pattern | find_opt + match + add | entry().or_default().push() |
| Lines of code | 4-5 per group-by | 1 |
| Key constraint | Ord (for Map) | Hash + Eq (for HashMap) |
| Value collection | List (prepend is O(1)) | Vec (append is amortized O(1)) |
| Mutability | Returns new map each step | Mutates in place |
| Iterator method | fold_left | for loop or fold |
Exercises
group_and_count<T: Hash + Eq>(items: &[T]) -> HashMap<&T, usize> that counts occurrences without storing the items.bucket_by_range(numbers: &[i32], bucket_size: i32) -> BTreeMap<i32, Vec<i32>> that groups numbers into buckets of fixed width.invert_map<K, V>(map: HashMap<K, V>) -> HashMap<V, Vec<K>> that inverts a map, grouping keys that share a value.