068 — Frequency Counter (Map Module)
Tutorial Video
Text description (accessibility)
This video demonstrates the "068 — Frequency Counter (Map Module)" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Counting word frequencies is the "Hello World" of data analysis — used in text indexing (Elasticsearch), word clouds, spell-checking frequency tables, natural language processing (term frequency for TF-IDF), and compression (Huffman coding frequencies). Key difference from OCaml: 1. **`HashMap` vs `Map.Make`**: Rust's `HashMap` uses hashing (O(1)). OCaml's `Map.Make` uses a balanced BST (O(log n)) — always sorted. Use `BTreeMap` for sorted Rust maps.
Tutorial
The Problem
Counting word frequencies is the "Hello World" of data analysis — used in text indexing (Elasticsearch), word clouds, spell-checking frequency tables, natural language processing (term frequency for TF-IDF), and compression (Huffman coding frequencies). The Rust equivalent of OCaml's Map.Make module is HashMap (unordered, O(1) ops) or BTreeMap (ordered, O(log n) ops).
The entry().or_insert(0) API is the idiomatic Rust pattern for updating-or-inserting — a single lookup that either modifies an existing value or inserts a default. This avoids the two-lookup pattern (check if exists, then insert or update).
🎯 Learning Outcomes
HashMap<String, usize> as a frequency counterentry().or_insert(default) pattern for update-or-insertBTreeMap when sorted output is needed (like OCaml's ordered Map.Make)fold for a functional styleHashMap (unordered) and BTreeMap (sorted)HashMap::entry(key).or_insert(0) to atomically insert-or-update a counter without double lookupBTreeMap instead of HashMap when the output must be sorted by keyCode Example
#![allow(clippy::all)]
/// # Map Module — Frequency Counter
///
/// Word frequency counting using HashMap — the Rust equivalent of OCaml's Map.Make.
use std::collections::HashMap;
/// Idiomatic Rust: split, lowercase, count with entry API.
/// The `entry().or_insert(0)` pattern is the standard way to update-or-insert.
pub fn word_freq(text: &str) -> HashMap<String, usize> {
let mut freq = HashMap::new();
for word in text.split_whitespace() {
let lower = word.to_lowercase();
// `entry` API: get mutable reference, inserting default if absent
*freq.entry(lower).or_insert(0) += 1;
}
freq
}
/// Functional style using fold (iterator chain).
pub fn word_freq_functional(text: &str) -> HashMap<String, usize> {
text.split_whitespace()
.map(|w| w.to_lowercase())
.fold(HashMap::new(), |mut acc, word| {
*acc.entry(word).or_insert(0) += 1;
acc
})
}
/// Using BTreeMap for sorted output (like OCaml's Map which is ordered).
pub fn word_freq_sorted(text: &str) -> std::collections::BTreeMap<String, usize> {
let mut freq = std::collections::BTreeMap::new();
for word in text.split_whitespace() {
*freq.entry(word.to_lowercase()).or_insert(0) += 1;
}
freq
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
let freq = word_freq("the cat sat on the mat the cat");
assert_eq!(freq["the"], 3);
assert_eq!(freq["cat"], 2);
assert_eq!(freq["sat"], 1);
}
#[test]
fn test_empty() {
let freq = word_freq("");
assert!(freq.is_empty());
}
#[test]
fn test_single_word() {
let freq = word_freq("hello");
assert_eq!(freq["hello"], 1);
}
#[test]
fn test_case_insensitive() {
let freq = word_freq("Hello hello HELLO");
assert_eq!(freq["hello"], 3);
}
#[test]
fn test_functional_version() {
let freq = word_freq_functional("a b a c b a");
assert_eq!(freq["a"], 3);
assert_eq!(freq["b"], 2);
}
#[test]
fn test_sorted() {
let freq = word_freq_sorted("b a c a b");
let keys: Vec<&String> = freq.keys().collect();
assert_eq!(keys, vec!["a", "b", "c"]); // sorted!
}
}Key Differences
HashMap vs Map.Make**: Rust's HashMap uses hashing (O(1)). OCaml's Map.Make uses a balanced BST (O(log n)) — always sorted. Use BTreeMap for sorted Rust maps.entry API**: Rust's entry().or_insert() avoids double lookup. OCaml's Map.add key (Option.value (Map.find_opt key m) ~default:0 + 1) m does two lookups.Map.add returns a new map — fully immutable, structural sharing. Rust's HashMap::insert mutates in place. The functional fold style in Rust accumulates a new HashMap per step — less efficient.Map.Make iterates in sorted key order. Rust's HashMap has no guaranteed iteration order; use BTreeMap or sort_by_key on the entries.HashMap::entry API:** The entry API handles the insert-or-update pattern atomically: map.entry(key).or_insert(0) returns a mutable reference to the value, creating it if absent. Then *count += 1. This avoids double-lookup.BTreeMap for sorted output:** Use BTreeMap<K, V> instead of HashMap<K, V> when you need keys in sorted order. Iteration over a BTreeMap is always sorted by key.Hashtbl:** Hashtbl.create 16 + Hashtbl.find_opt + Hashtbl.replace implements the same frequency counting pattern. OCaml's Map module provides a functional (immutable) sorted map.s.chars() for Unicode-correct character counting; s.bytes() for faster ASCII counting. For most frequency-counting tasks, byte-level is sufficient and faster.OCaml Approach
OCaml's Map: module StringMap = Map.Make(String). Frequency counting: List.fold_left (fun map word -> let count = try StringMap.find word map with Not_found -> 0 in StringMap.add word (count + 1) map) StringMap.empty words. Or with find_opt: let count = Option.value (StringMap.find_opt word map) ~default:0.
Full Source
#![allow(clippy::all)]
/// # Map Module — Frequency Counter
///
/// Word frequency counting using HashMap — the Rust equivalent of OCaml's Map.Make.
use std::collections::HashMap;
/// Idiomatic Rust: split, lowercase, count with entry API.
/// The `entry().or_insert(0)` pattern is the standard way to update-or-insert.
pub fn word_freq(text: &str) -> HashMap<String, usize> {
let mut freq = HashMap::new();
for word in text.split_whitespace() {
let lower = word.to_lowercase();
// `entry` API: get mutable reference, inserting default if absent
*freq.entry(lower).or_insert(0) += 1;
}
freq
}
/// Functional style using fold (iterator chain).
pub fn word_freq_functional(text: &str) -> HashMap<String, usize> {
text.split_whitespace()
.map(|w| w.to_lowercase())
.fold(HashMap::new(), |mut acc, word| {
*acc.entry(word).or_insert(0) += 1;
acc
})
}
/// Using BTreeMap for sorted output (like OCaml's Map which is ordered).
pub fn word_freq_sorted(text: &str) -> std::collections::BTreeMap<String, usize> {
let mut freq = std::collections::BTreeMap::new();
for word in text.split_whitespace() {
*freq.entry(word.to_lowercase()).or_insert(0) += 1;
}
freq
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
let freq = word_freq("the cat sat on the mat the cat");
assert_eq!(freq["the"], 3);
assert_eq!(freq["cat"], 2);
assert_eq!(freq["sat"], 1);
}
#[test]
fn test_empty() {
let freq = word_freq("");
assert!(freq.is_empty());
}
#[test]
fn test_single_word() {
let freq = word_freq("hello");
assert_eq!(freq["hello"], 1);
}
#[test]
fn test_case_insensitive() {
let freq = word_freq("Hello hello HELLO");
assert_eq!(freq["hello"], 3);
}
#[test]
fn test_functional_version() {
let freq = word_freq_functional("a b a c b a");
assert_eq!(freq["a"], 3);
assert_eq!(freq["b"], 2);
}
#[test]
fn test_sorted() {
let freq = word_freq_sorted("b a c a b");
let keys: Vec<&String> = freq.keys().collect();
assert_eq!(keys, vec!["a", "b", "c"]); // sorted!
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
let freq = word_freq("the cat sat on the mat the cat");
assert_eq!(freq["the"], 3);
assert_eq!(freq["cat"], 2);
assert_eq!(freq["sat"], 1);
}
#[test]
fn test_empty() {
let freq = word_freq("");
assert!(freq.is_empty());
}
#[test]
fn test_single_word() {
let freq = word_freq("hello");
assert_eq!(freq["hello"], 1);
}
#[test]
fn test_case_insensitive() {
let freq = word_freq("Hello hello HELLO");
assert_eq!(freq["hello"], 3);
}
#[test]
fn test_functional_version() {
let freq = word_freq_functional("a b a c b a");
assert_eq!(freq["a"], 3);
assert_eq!(freq["b"], 2);
}
#[test]
fn test_sorted() {
let freq = word_freq_sorted("b a c a b");
let keys: Vec<&String> = freq.keys().collect();
assert_eq!(keys, vec!["a", "b", "c"]); // sorted!
}
}
Deep Comparison
Frequency Counter — OCaml vs Rust Comparison
Core Insight
Counting frequencies is the classic map use case. OCaml's Map.Make functor creates an immutable balanced tree map — each add creates a new map. Rust's HashMap is mutable with an entry API that makes update-or-insert a one-liner. For sorted output, Rust offers BTreeMap.
OCaml Approach
Map.Make(String) creates a module with an ordered map backed by balanced binary trees. Each update via SMap.add returns a new immutable map (structural sharing keeps this efficient). The find + Not_found exception pattern is common but slightly awkward.
Rust Approach
HashMap::entry() returns an Entry enum (Occupied or Vacant) — calling .or_insert(0) provides a mutable reference to the count, which can be incremented in place. No exception handling needed. For ordered iteration, use BTreeMap instead.
Comparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Memory | Immutable tree (shared nodes) | Mutable hash table |
| Null safety | Not_found exception | entry API (no missing key panic) |
| Errors | Exception on missing key | Entry enum handles presence |
| Iteration | SMap.iter (sorted by key) | Unordered (HashMap) or sorted (BTreeMap) |
| Update | find + add (new map) | entry().or_insert() (in-place) |
Things Rust Learners Should Notice
entry() API** — the killer feature for maps; avoids double-lookup (check then insert)*freq.entry(k).or_insert(0) += 1** — dereference the mutable ref, then incrementHashMap vs BTreeMap** — hash is O(1) average but unordered; BTree is O(log n) but sortedsplit_whitespace()** — handles multiple spaces, tabs, newlines; better than split(' ')Further Reading
Exercises
top_n_words(text: &str, n: usize) -> Vec<(String, usize)> that returns the n most frequent words. Sort by frequency descending.HashMap<(String, String), usize>.print_histogram(freq: &HashMap<String, usize>) that prints each word with a bar of * characters proportional to its frequency (max bar length = 40 chars).top_n_frequent<T: Eq + Hash + Clone>(items: &[T], n: usize) -> Vec<(T, usize)> that returns the n most frequent elements in descending order of frequency. Use a min-heap of size n.char_frequency(s: &str) -> Vec<(char, usize)> that counts character frequencies in a string and returns them sorted by frequency descending, then alphabetically for ties. This is the first step in Huffman coding.