356: HashMap Patterns
Tutorial
The Problem
Counting, grouping, and frequency analysis are among the most common data processing tasks. A hash map is the universal tool: O(1) average insert and lookup, with keys mapped to values. Rust's HashMap<K, V> uses the SipHash-1-3 algorithm by default (DoS-resistant), supports the entry() API for atomic insert-or-update, and requires K: Hash + Eq. Understanding the three canonical patterns — word counting, grouping by key, and top-N frequency — prepares you for most real-world data pipeline work.
🎯 Learning Outcomes
entry().or_insert(0) and += 1group_by<T, K> that partitions items into a HashMap<K, Vec<T>>entry() avoids double-lookup vs get + insertor_default() as a shorthand for or_insert(Default::default())HashMap's O(1) average vs BTreeMap's O(log n) sorted operationsCode Example
#![allow(clippy::all)]
//! # HashMap Patterns
//! Common patterns: word count, grouping, frequency analysis.
use std::collections::HashMap;
pub fn word_count(text: &str) -> HashMap<String, usize> {
let mut map = HashMap::new();
for word in text.split_whitespace() {
*map.entry(word.to_string()).or_insert(0) += 1;
}
map
}
pub fn group_by<T, K, F>(items: Vec<T>, key: F) -> HashMap<K, Vec<T>>
where
K: Eq + std::hash::Hash,
F: Fn(&T) -> K,
{
let mut map: HashMap<K, Vec<T>> = HashMap::new();
for item in items {
map.entry(key(&item)).or_default().push(item);
}
map
}
pub fn frequency_top_n(map: &HashMap<String, usize>, n: usize) -> Vec<(&str, usize)> {
let mut pairs: Vec<_> = map.iter().map(|(k, &v)| (k.as_str(), v)).collect();
pairs.sort_by(|a, b| b.1.cmp(&a.1));
pairs.into_iter().take(n).collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn count_words() {
let wc = word_count("the cat sat on the mat");
assert_eq!(wc["the"], 2);
assert_eq!(wc["cat"], 1);
}
#[test]
fn group_by_parity() {
let grouped = group_by(
vec![1, 2, 3, 4, 5],
|&x| if x % 2 == 0 { "even" } else { "odd" },
);
assert_eq!(grouped["even"].len(), 2);
assert_eq!(grouped["odd"].len(), 3);
}
#[test]
fn top_n_frequency() {
let wc = word_count("a a a b b c");
let top = frequency_top_n(&wc, 2);
assert_eq!(top[0].0, "a");
assert_eq!(top[1].0, "b");
}
}Key Differences
| Aspect | Rust HashMap | OCaml Hashtbl |
|---|---|---|
| Insert-or-update | entry().or_insert() | find + replace (two lookups) |
| Hash algorithm | SipHash-1-3 (DoS-resistant) | Polymorphic hash (faster, not DoS-safe) |
| Ordering | None | None |
group_by | or_default().push(item) | find + replace with list cons |
| Alternative | BTreeMap for ordered | Map.Make for ordered |
OCaml Approach
OCaml's Hashtbl provides the imperative equivalent:
let word_count text =
let tbl = Hashtbl.create 64 in
String.split_on_char ' ' text |> List.iter (fun w ->
let count = try Hashtbl.find tbl w with Not_found -> 0 in
Hashtbl.replace tbl w (count + 1));
tbl
(* Functional alternative with Map *)
module SMap = Map.Make(String)
let word_count_pure text =
String.split_on_char ' ' text
|> List.fold_left (fun m w ->
let n = try SMap.find w m with Not_found -> 0 in
SMap.add w (n + 1) m) SMap.empty
The Hashtbl version mutates in place; the Map version returns a new map per insertion (persistent). Rust's HashMap is imperative like Hashtbl, but the entry API makes the "find or insert" pattern atomic and idiomatic.
Full Source
#![allow(clippy::all)]
//! # HashMap Patterns
//! Common patterns: word count, grouping, frequency analysis.
use std::collections::HashMap;
pub fn word_count(text: &str) -> HashMap<String, usize> {
let mut map = HashMap::new();
for word in text.split_whitespace() {
*map.entry(word.to_string()).or_insert(0) += 1;
}
map
}
pub fn group_by<T, K, F>(items: Vec<T>, key: F) -> HashMap<K, Vec<T>>
where
K: Eq + std::hash::Hash,
F: Fn(&T) -> K,
{
let mut map: HashMap<K, Vec<T>> = HashMap::new();
for item in items {
map.entry(key(&item)).or_default().push(item);
}
map
}
pub fn frequency_top_n(map: &HashMap<String, usize>, n: usize) -> Vec<(&str, usize)> {
let mut pairs: Vec<_> = map.iter().map(|(k, &v)| (k.as_str(), v)).collect();
pairs.sort_by(|a, b| b.1.cmp(&a.1));
pairs.into_iter().take(n).collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn count_words() {
let wc = word_count("the cat sat on the mat");
assert_eq!(wc["the"], 2);
assert_eq!(wc["cat"], 1);
}
#[test]
fn group_by_parity() {
let grouped = group_by(
vec![1, 2, 3, 4, 5],
|&x| if x % 2 == 0 { "even" } else { "odd" },
);
assert_eq!(grouped["even"].len(), 2);
assert_eq!(grouped["odd"].len(), 3);
}
#[test]
fn top_n_frequency() {
let wc = word_count("a a a b b c");
let top = frequency_top_n(&wc, 2);
assert_eq!(top[0].0, "a");
assert_eq!(top[1].0, "b");
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn count_words() {
let wc = word_count("the cat sat on the mat");
assert_eq!(wc["the"], 2);
assert_eq!(wc["cat"], 1);
}
#[test]
fn group_by_parity() {
let grouped = group_by(
vec![1, 2, 3, 4, 5],
|&x| if x % 2 == 0 { "even" } else { "odd" },
);
assert_eq!(grouped["even"].len(), 2);
assert_eq!(grouped["odd"].len(), 3);
}
#[test]
fn top_n_frequency() {
let wc = word_count("a a a b b c");
let top = frequency_top_n(&wc, 2);
assert_eq!(top[0].0, "a");
assert_eq!(top[1].0, "b");
}
}
Deep Comparison
OCaml vs Rust: Hashmap Patterns
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.
Exercises
group_by(words, |w| { let mut sorted = w.chars().collect::<Vec<_>>(); sorted.sort(); sorted }); output groups where each group contains words that are anagrams of each other.HashMap<String, HashSet<usize>>; then query "documents containing both 'rust' and 'async'".HashMap<K, (V, usize)> tracking access counts; evict the least-frequently-used entry when size exceeds capacity.