102-frequency-counter — Frequency Counter
Tutorial Video
Text description (accessibility)
This video demonstrates the "102-frequency-counter — Frequency Counter" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Counting the frequency of items in a collection is one of the most common data processing operations: word frequency in text analysis, character histograms in compression, event counts in telemetry. Key difference from OCaml: 1. **Mutability model**: Rust's `HashMap` is imperatively mutated via the entry API; OCaml's `Map` produces new persistent versions on each update.
Tutorial
The Problem
Counting the frequency of items in a collection is one of the most common data processing operations: word frequency in text analysis, character histograms in compression, event counts in telemetry. The pattern requires a map from item to count that inserts a zero for new keys and increments existing ones atomically.
Rust's Entry API makes this pattern concise and allocation-efficient. OCaml's Map.Make module provides an immutable tree-based equivalent with different trade-offs.
🎯 Learning Outcomes
HashMap::entry().or_insert(0) to count occurrences idiomaticallyBTreeMap when sorted iteration order is requiredCode Example
pub fn word_freq(text: &str) -> HashMap<String, usize> {
let mut freq = HashMap::new();
for word in text.split_whitespace() {
*freq.entry(word.to_lowercase()).or_insert(0) += 1;
}
freq
}Key Differences
HashMap is imperatively mutated via the entry API; OCaml's Map produces new persistent versions on each update.Map.Make always iterates in key order; Rust's HashMap has random order and requires BTreeMap for sorted iteration.entry().or_insert(0) is a single lookup; OCaml's find + add is conceptually two operations but O(log n) per structural sharing update.HashMap is compact and cache-friendly; OCaml's tree map uses more pointer-chasing but enables sharing between versions.OCaml Approach
OCaml's standard Map.Make(String) creates an immutable, sorted map:
module StringMap = Map.Make(String)
let word_freq text =
String.split_on_char ' ' text
|> List.fold_left (fun acc w ->
let count = try StringMap.find w acc with Not_found -> 0 in
StringMap.add w (count + 1) acc
) StringMap.empty
Every update produces a new map via structural sharing, so intermediate maps are not copied entirely. The Hashtbl module provides mutable hash tables analogous to Rust's HashMap.
Full Source
#![allow(clippy::all)]
//! # Map Module — Frequency Counter
//!
//! Count word frequencies using a map. OCaml's `Map.Make(String)` creates
//! an immutable tree map; Rust's `HashMap` is the standard mutable map.
use std::collections::BTreeMap;
use std::collections::HashMap;
// ---------------------------------------------------------------------------
// Approach A: HashMap (idiomatic Rust)
// ---------------------------------------------------------------------------
pub fn word_freq_hashmap(text: &str) -> HashMap<String, usize> {
let mut freq = HashMap::new();
for word in text.split_whitespace() {
let w = word.to_lowercase();
*freq.entry(w).or_insert(0) += 1;
}
freq
}
// ---------------------------------------------------------------------------
// Approach B: BTreeMap (sorted, like OCaml's Map)
// ---------------------------------------------------------------------------
pub fn word_freq_btree(text: &str) -> BTreeMap<String, usize> {
text.split_whitespace()
.map(|w| w.to_lowercase())
.fold(BTreeMap::new(), |mut acc, w| {
*acc.entry(w).or_insert(0) += 1;
acc
})
}
// ---------------------------------------------------------------------------
// Approach C: Functional — collect into counts
// ---------------------------------------------------------------------------
pub fn word_freq_functional(text: &str) -> BTreeMap<String, usize> {
let words: Vec<String> = text.split_whitespace().map(|w| w.to_lowercase()).collect();
let mut sorted = words.clone();
sorted.sort();
let mut result = BTreeMap::new();
let mut i = 0;
while i < sorted.len() {
let word = &sorted[i];
let count = sorted[i..].iter().take_while(|w| *w == word).count();
result.insert(word.clone(), count);
i += count;
}
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic_freq() {
let freq = word_freq_hashmap("the cat sat on the mat the cat");
assert_eq!(freq["the"], 3);
assert_eq!(freq["cat"], 2);
assert_eq!(freq["sat"], 1);
}
#[test]
fn test_btree_sorted() {
let freq = word_freq_btree("the cat sat on the mat the cat");
let keys: Vec<&String> = freq.keys().collect();
assert_eq!(keys, vec!["cat", "mat", "on", "sat", "the"]);
}
#[test]
fn test_empty() {
let freq = word_freq_hashmap("");
assert!(freq.is_empty());
}
#[test]
fn test_case_insensitive() {
let freq = word_freq_hashmap("The THE the");
assert_eq!(freq["the"], 3);
}
#[test]
fn test_functional_matches() {
let a = word_freq_btree("hello world hello");
let b = word_freq_functional("hello world hello");
assert_eq!(a, b);
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic_freq() {
let freq = word_freq_hashmap("the cat sat on the mat the cat");
assert_eq!(freq["the"], 3);
assert_eq!(freq["cat"], 2);
assert_eq!(freq["sat"], 1);
}
#[test]
fn test_btree_sorted() {
let freq = word_freq_btree("the cat sat on the mat the cat");
let keys: Vec<&String> = freq.keys().collect();
assert_eq!(keys, vec!["cat", "mat", "on", "sat", "the"]);
}
#[test]
fn test_empty() {
let freq = word_freq_hashmap("");
assert!(freq.is_empty());
}
#[test]
fn test_case_insensitive() {
let freq = word_freq_hashmap("The THE the");
assert_eq!(freq["the"], 3);
}
#[test]
fn test_functional_matches() {
let a = word_freq_btree("hello world hello");
let b = word_freq_functional("hello world hello");
assert_eq!(a, b);
}
}
Deep Comparison
Comparison: Frequency Counter — OCaml vs Rust
Core Insight
OCaml's Map is immutable — SMap.add returns a new map, leaving the original unchanged. Rust's HashMap is mutable, and the entry API is its killer feature: freq.entry(w).or_insert(0) += 1 does a single lookup for both read and write, whereas OCaml's find + add does two tree traversals.
OCaml
module SMap = Map.Make(String)
let word_freq text =
text |> String.split_on_char ' '
|> List.fold_left (fun acc w ->
let count = try SMap.find w acc with Not_found -> 0 in
SMap.add w (count + 1) acc
) SMap.empty
Rust
pub fn word_freq(text: &str) -> HashMap<String, usize> {
let mut freq = HashMap::new();
for word in text.split_whitespace() {
*freq.entry(word.to_lowercase()).or_insert(0) += 1;
}
freq
}
Comparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Map type | Map.Make(String) (tree) | HashMap (hash) / BTreeMap (tree) |
| Mutability | Immutable (new map each add) | Mutable in-place |
| Lookup+update | find + add (2 traversals) | entry().or_insert() (1 lookup) |
| Missing key | Not_found exception | or_insert(default) |
| Ordering | Sorted (tree-based) | Unordered (HashMap) / Sorted (BTreeMap) |
| Functor needed | Yes: Map.Make(String) | No: generics handle it |
Learner Notes
BTreeMap** ≈ OCaml's Map: both are balanced trees with O(log n) ops and sorted iterationHashMap** is O(1) average but unordered — use BTreeMap when you need sorted keysNot_found requires try/with; Rust's or_insert is cleanerentry() takes ownership of the key — use String, not &strExercises
word_freq_hashmap to normalize words by stripping punctuation before counting.top_n(freq: &HashMap<String, usize>, n: usize) -> Vec<(String, usize)> function that returns the n most frequent words in descending order.merge_frequencies function that combines two frequency maps, summing counts for keys that appear in both.