1031-hashmap-counting — Frequency Counting with HashMap
Tutorial
The Problem
Frequency counting is the foundation of histogram generation, text analysis, log aggregation, and A/B test result collection. The operation reads each item from a stream and increments a counter in a map, requiring an efficient insert-or-increment primitive.
Rust's Entry API provides and_modify(|c| *c += 1).or_insert(1) as the canonical single-lookup pattern. This example explores all the Entry API variants for counting, showing how each is optimized for different use cases.
🎯 Learning Outcomes
entry(ch).or_insert(0) and *count += 1and_modify(|c| *c += 1).or_insert(1) for combined increment-or-insertHashMap::iter() to iterate and sort countsentry vs re-lookup performance characteristicsCode Example
#![allow(clippy::all)]
// 1031: Count Frequencies
// The classic frequency counting pattern with Entry API
use std::collections::HashMap;
/// Count character frequencies
fn char_frequency() {
let text = "abracadabra";
let mut counts: HashMap<char, usize> = HashMap::new();
for ch in text.chars() {
*counts.entry(ch).or_insert(0) += 1;
}
assert_eq!(counts[&'a'], 5);
assert_eq!(counts[&'b'], 2);
assert_eq!(counts[&'r'], 2);
assert_eq!(counts[&'c'], 1);
assert_eq!(counts[&'d'], 1);
}
/// Count word frequencies using and_modify
fn word_frequency() {
let words = vec!["the", "cat", "sat", "on", "the", "mat", "the", "cat"];
let mut counts: HashMap<&str, usize> = HashMap::new();
for word in &words {
counts.entry(word).and_modify(|c| *c += 1).or_insert(1);
}
assert_eq!(counts["the"], 3);
assert_eq!(counts["cat"], 2);
assert_eq!(counts["sat"], 1);
}
/// Find the most frequent element
fn most_frequent() {
let items = vec![1, 2, 3, 2, 1, 2, 3, 2, 2];
let mut counts: HashMap<i32, usize> = HashMap::new();
for &x in &items {
*counts.entry(x).or_insert(0) += 1;
}
let (most, count) = counts
.iter()
.max_by_key(|&(_, &v)| v)
.map(|(&k, &v)| (k, v))
.unwrap();
assert_eq!(most, 2);
assert_eq!(count, 5);
}
/// Frequency counting with iterators (functional style)
fn functional_counting() {
let data = vec![1, 1, 2, 3, 3, 3];
let counts: HashMap<i32, usize> = data.iter().fold(HashMap::new(), |mut acc, &x| {
*acc.entry(x).or_insert(0) += 1;
acc
});
assert_eq!(counts[&1], 2);
assert_eq!(counts[&3], 3);
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_char_frequency() {
char_frequency();
}
#[test]
fn test_word_frequency() {
word_frequency();
}
#[test]
fn test_most_frequent() {
most_frequent();
}
#[test]
fn test_functional_counting() {
functional_counting();
}
#[test]
fn test_empty_input() {
let empty: Vec<i32> = vec![];
let counts: HashMap<i32, usize> = empty.iter().fold(HashMap::new(), |mut acc, &x| {
*acc.entry(x).or_insert(0) += 1;
acc
});
assert!(counts.is_empty());
}
}Key Differences
entry().and_modify().or_insert() is one hash computation; OCaml's find + replace is two.and_modify(...).or_insert(1) is a fluent chain; OCaml's stdlib requires two statements.Base.Hashtbl.incr**: OCaml's Base library provides a dedicated increment function; Rust's stdlib does not but the Entry API is equivalent.or_insert returns &mut V enabling direct mutation; OCaml's Hashtbl.find returns the value by copy.OCaml Approach
OCaml uses Hashtbl for mutable counting or Map for immutable:
let char_frequency s =
let tbl = Hashtbl.create 16 in
String.iter (fun c ->
let count = try Hashtbl.find tbl c with Not_found -> 0 in
Hashtbl.replace tbl c (count + 1)
) s;
tbl
Base.Hashtbl.incr provides a one-liner: Hashtbl.incr tbl ~key:c ~by:1 ~default:0.
Full Source
#![allow(clippy::all)]
// 1031: Count Frequencies
// The classic frequency counting pattern with Entry API
use std::collections::HashMap;
/// Count character frequencies
fn char_frequency() {
let text = "abracadabra";
let mut counts: HashMap<char, usize> = HashMap::new();
for ch in text.chars() {
*counts.entry(ch).or_insert(0) += 1;
}
assert_eq!(counts[&'a'], 5);
assert_eq!(counts[&'b'], 2);
assert_eq!(counts[&'r'], 2);
assert_eq!(counts[&'c'], 1);
assert_eq!(counts[&'d'], 1);
}
/// Count word frequencies using and_modify
fn word_frequency() {
let words = vec!["the", "cat", "sat", "on", "the", "mat", "the", "cat"];
let mut counts: HashMap<&str, usize> = HashMap::new();
for word in &words {
counts.entry(word).and_modify(|c| *c += 1).or_insert(1);
}
assert_eq!(counts["the"], 3);
assert_eq!(counts["cat"], 2);
assert_eq!(counts["sat"], 1);
}
/// Find the most frequent element
fn most_frequent() {
let items = vec![1, 2, 3, 2, 1, 2, 3, 2, 2];
let mut counts: HashMap<i32, usize> = HashMap::new();
for &x in &items {
*counts.entry(x).or_insert(0) += 1;
}
let (most, count) = counts
.iter()
.max_by_key(|&(_, &v)| v)
.map(|(&k, &v)| (k, v))
.unwrap();
assert_eq!(most, 2);
assert_eq!(count, 5);
}
/// Frequency counting with iterators (functional style)
fn functional_counting() {
let data = vec![1, 1, 2, 3, 3, 3];
let counts: HashMap<i32, usize> = data.iter().fold(HashMap::new(), |mut acc, &x| {
*acc.entry(x).or_insert(0) += 1;
acc
});
assert_eq!(counts[&1], 2);
assert_eq!(counts[&3], 3);
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_char_frequency() {
char_frequency();
}
#[test]
fn test_word_frequency() {
word_frequency();
}
#[test]
fn test_most_frequent() {
most_frequent();
}
#[test]
fn test_functional_counting() {
functional_counting();
}
#[test]
fn test_empty_input() {
let empty: Vec<i32> = vec![];
let counts: HashMap<i32, usize> = empty.iter().fold(HashMap::new(), |mut acc, &x| {
*acc.entry(x).or_insert(0) += 1;
acc
});
assert!(counts.is_empty());
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_char_frequency() {
char_frequency();
}
#[test]
fn test_word_frequency() {
word_frequency();
}
#[test]
fn test_most_frequent() {
most_frequent();
}
#[test]
fn test_functional_counting() {
functional_counting();
}
#[test]
fn test_empty_input() {
let empty: Vec<i32> = vec![];
let counts: HashMap<i32, usize> = empty.iter().fold(HashMap::new(), |mut acc, &x| {
*acc.entry(x).or_insert(0) += 1;
acc
});
assert!(counts.is_empty());
}
}
Deep Comparison
Count Frequencies — Comparison
Core Insight
Frequency counting is the "hello world" of the Entry API. Both languages use a fold/loop to accumulate counts, but Rust's entry pattern compresses the check-then-update into a single dereference-and-increment.
OCaml Approach
find_opt + default 0 + add with incremented valuelet n = match find_opt k m with Some n -> n | None -> 0 in add k (n+1) mfold to find max elementRust Approach
*entry(k).or_insert(0) += 1 — one lineentry(k).and_modify(|c| *c += 1).or_insert(1) — more explicitmax_by_key on iterator to find most frequentiter().fold() also worksComparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Count idiom | find_opt + match + add | *entry(k).or_insert(0) += 1 |
| Lines per count | 4 | 1 |
| Max element | fold with comparison | iter().max_by_key() |
| Allocation | New map node per update | In-place mutation |
| Style | Always functional | Imperative or functional |
Exercises
top_k_chars(text: &str, k: usize) -> Vec<(char, usize)> that returns the k most frequent characters sorted by descending count.FrequencyCounter<K> struct with add(&mut self, key: K) and top_n(n: usize) -> Vec<(K, usize)> methods.