ExamplesBy LevelBy TopicLearning Paths
068 Fundamental

068 — Frequency Counter (Map Module)

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "068 — Frequency Counter (Map Module)" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Counting word frequencies is the "Hello World" of data analysis — used in text indexing (Elasticsearch), word clouds, spell-checking frequency tables, natural language processing (term frequency for TF-IDF), and compression (Huffman coding frequencies). Key difference from OCaml: 1. **`HashMap` vs `Map.Make`**: Rust's `HashMap` uses hashing (O(1)). OCaml's `Map.Make` uses a balanced BST (O(log n)) — always sorted. Use `BTreeMap` for sorted Rust maps.

Tutorial

The Problem

Counting word frequencies is the "Hello World" of data analysis — used in text indexing (Elasticsearch), word clouds, spell-checking frequency tables, natural language processing (term frequency for TF-IDF), and compression (Huffman coding frequencies). The Rust equivalent of OCaml's Map.Make module is HashMap (unordered, O(1) ops) or BTreeMap (ordered, O(log n) ops).

The entry().or_insert(0) API is the idiomatic Rust pattern for updating-or-inserting — a single lookup that either modifies an existing value or inserts a default. This avoids the two-lookup pattern (check if exists, then insert or update).

🎯 Learning Outcomes

  • • Use HashMap<String, usize> as a frequency counter
  • • Apply the entry().or_insert(default) pattern for update-or-insert
  • • Use BTreeMap when sorted output is needed (like OCaml's ordered Map.Make)
  • • Implement word frequency with iterators and fold for a functional style
  • • Understand the difference between HashMap (unordered) and BTreeMap (sorted)
  • • Use HashMap::entry(key).or_insert(0) to atomically insert-or-update a counter without double lookup
  • • Use BTreeMap instead of HashMap when the output must be sorted by key
  • Code Example

    #![allow(clippy::all)]
    /// # Map Module — Frequency Counter
    ///
    /// Word frequency counting using HashMap — the Rust equivalent of OCaml's Map.Make.
    use std::collections::HashMap;
    
    /// Idiomatic Rust: split, lowercase, count with entry API.
    /// The `entry().or_insert(0)` pattern is the standard way to update-or-insert.
    pub fn word_freq(text: &str) -> HashMap<String, usize> {
        let mut freq = HashMap::new();
        for word in text.split_whitespace() {
            let lower = word.to_lowercase();
            // `entry` API: get mutable reference, inserting default if absent
            *freq.entry(lower).or_insert(0) += 1;
        }
        freq
    }
    
    /// Functional style using fold (iterator chain).
    pub fn word_freq_functional(text: &str) -> HashMap<String, usize> {
        text.split_whitespace()
            .map(|w| w.to_lowercase())
            .fold(HashMap::new(), |mut acc, word| {
                *acc.entry(word).or_insert(0) += 1;
                acc
            })
    }
    
    /// Using BTreeMap for sorted output (like OCaml's Map which is ordered).
    pub fn word_freq_sorted(text: &str) -> std::collections::BTreeMap<String, usize> {
        let mut freq = std::collections::BTreeMap::new();
        for word in text.split_whitespace() {
            *freq.entry(word.to_lowercase()).or_insert(0) += 1;
        }
        freq
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_basic() {
            let freq = word_freq("the cat sat on the mat the cat");
            assert_eq!(freq["the"], 3);
            assert_eq!(freq["cat"], 2);
            assert_eq!(freq["sat"], 1);
        }
    
        #[test]
        fn test_empty() {
            let freq = word_freq("");
            assert!(freq.is_empty());
        }
    
        #[test]
        fn test_single_word() {
            let freq = word_freq("hello");
            assert_eq!(freq["hello"], 1);
        }
    
        #[test]
        fn test_case_insensitive() {
            let freq = word_freq("Hello hello HELLO");
            assert_eq!(freq["hello"], 3);
        }
    
        #[test]
        fn test_functional_version() {
            let freq = word_freq_functional("a b a c b a");
            assert_eq!(freq["a"], 3);
            assert_eq!(freq["b"], 2);
        }
    
        #[test]
        fn test_sorted() {
            let freq = word_freq_sorted("b a c a b");
            let keys: Vec<&String> = freq.keys().collect();
            assert_eq!(keys, vec!["a", "b", "c"]); // sorted!
        }
    }

    Key Differences

  • **HashMap vs Map.Make**: Rust's HashMap uses hashing (O(1)). OCaml's Map.Make uses a balanced BST (O(log n)) — always sorted. Use BTreeMap for sorted Rust maps.
  • **entry API**: Rust's entry().or_insert() avoids double lookup. OCaml's Map.add key (Option.value (Map.find_opt key m) ~default:0 + 1) m does two lookups.
  • Functional vs mutable: OCaml's Map.add returns a new map — fully immutable, structural sharing. Rust's HashMap::insert mutates in place. The functional fold style in Rust accumulates a new HashMap per step — less efficient.
  • Ordering: OCaml's Map.Make iterates in sorted key order. Rust's HashMap has no guaranteed iteration order; use BTreeMap or sort_by_key on the entries.
  • **HashMap::entry API:** The entry API handles the insert-or-update pattern atomically: map.entry(key).or_insert(0) returns a mutable reference to the value, creating it if absent. Then *count += 1. This avoids double-lookup.
  • **BTreeMap for sorted output:** Use BTreeMap<K, V> instead of HashMap<K, V> when you need keys in sorted order. Iteration over a BTreeMap is always sorted by key.
  • **OCaml Hashtbl:** Hashtbl.create 16 + Hashtbl.find_opt + Hashtbl.replace implements the same frequency counting pattern. OCaml's Map module provides a functional (immutable) sorted map.
  • Char vs byte counting: s.chars() for Unicode-correct character counting; s.bytes() for faster ASCII counting. For most frequency-counting tasks, byte-level is sufficient and faster.
  • OCaml Approach

    OCaml's Map: module StringMap = Map.Make(String). Frequency counting: List.fold_left (fun map word -> let count = try StringMap.find word map with Not_found -> 0 in StringMap.add word (count + 1) map) StringMap.empty words. Or with find_opt: let count = Option.value (StringMap.find_opt word map) ~default:0.

    Full Source

    #![allow(clippy::all)]
    /// # Map Module — Frequency Counter
    ///
    /// Word frequency counting using HashMap — the Rust equivalent of OCaml's Map.Make.
    use std::collections::HashMap;
    
    /// Idiomatic Rust: split, lowercase, count with entry API.
    /// The `entry().or_insert(0)` pattern is the standard way to update-or-insert.
    pub fn word_freq(text: &str) -> HashMap<String, usize> {
        let mut freq = HashMap::new();
        for word in text.split_whitespace() {
            let lower = word.to_lowercase();
            // `entry` API: get mutable reference, inserting default if absent
            *freq.entry(lower).or_insert(0) += 1;
        }
        freq
    }
    
    /// Functional style using fold (iterator chain).
    pub fn word_freq_functional(text: &str) -> HashMap<String, usize> {
        text.split_whitespace()
            .map(|w| w.to_lowercase())
            .fold(HashMap::new(), |mut acc, word| {
                *acc.entry(word).or_insert(0) += 1;
                acc
            })
    }
    
    /// Using BTreeMap for sorted output (like OCaml's Map which is ordered).
    pub fn word_freq_sorted(text: &str) -> std::collections::BTreeMap<String, usize> {
        let mut freq = std::collections::BTreeMap::new();
        for word in text.split_whitespace() {
            *freq.entry(word.to_lowercase()).or_insert(0) += 1;
        }
        freq
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_basic() {
            let freq = word_freq("the cat sat on the mat the cat");
            assert_eq!(freq["the"], 3);
            assert_eq!(freq["cat"], 2);
            assert_eq!(freq["sat"], 1);
        }
    
        #[test]
        fn test_empty() {
            let freq = word_freq("");
            assert!(freq.is_empty());
        }
    
        #[test]
        fn test_single_word() {
            let freq = word_freq("hello");
            assert_eq!(freq["hello"], 1);
        }
    
        #[test]
        fn test_case_insensitive() {
            let freq = word_freq("Hello hello HELLO");
            assert_eq!(freq["hello"], 3);
        }
    
        #[test]
        fn test_functional_version() {
            let freq = word_freq_functional("a b a c b a");
            assert_eq!(freq["a"], 3);
            assert_eq!(freq["b"], 2);
        }
    
        #[test]
        fn test_sorted() {
            let freq = word_freq_sorted("b a c a b");
            let keys: Vec<&String> = freq.keys().collect();
            assert_eq!(keys, vec!["a", "b", "c"]); // sorted!
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_basic() {
            let freq = word_freq("the cat sat on the mat the cat");
            assert_eq!(freq["the"], 3);
            assert_eq!(freq["cat"], 2);
            assert_eq!(freq["sat"], 1);
        }
    
        #[test]
        fn test_empty() {
            let freq = word_freq("");
            assert!(freq.is_empty());
        }
    
        #[test]
        fn test_single_word() {
            let freq = word_freq("hello");
            assert_eq!(freq["hello"], 1);
        }
    
        #[test]
        fn test_case_insensitive() {
            let freq = word_freq("Hello hello HELLO");
            assert_eq!(freq["hello"], 3);
        }
    
        #[test]
        fn test_functional_version() {
            let freq = word_freq_functional("a b a c b a");
            assert_eq!(freq["a"], 3);
            assert_eq!(freq["b"], 2);
        }
    
        #[test]
        fn test_sorted() {
            let freq = word_freq_sorted("b a c a b");
            let keys: Vec<&String> = freq.keys().collect();
            assert_eq!(keys, vec!["a", "b", "c"]); // sorted!
        }
    }

    Deep Comparison

    Frequency Counter — OCaml vs Rust Comparison

    Core Insight

    Counting frequencies is the classic map use case. OCaml's Map.Make functor creates an immutable balanced tree map — each add creates a new map. Rust's HashMap is mutable with an entry API that makes update-or-insert a one-liner. For sorted output, Rust offers BTreeMap.

    OCaml Approach

    Map.Make(String) creates a module with an ordered map backed by balanced binary trees. Each update via SMap.add returns a new immutable map (structural sharing keeps this efficient). The find + Not_found exception pattern is common but slightly awkward.

    Rust Approach

    HashMap::entry() returns an Entry enum (Occupied or Vacant) — calling .or_insert(0) provides a mutable reference to the count, which can be incremented in place. No exception handling needed. For ordered iteration, use BTreeMap instead.

    Comparison Table

    AspectOCamlRust
    MemoryImmutable tree (shared nodes)Mutable hash table
    Null safetyNot_found exceptionentry API (no missing key panic)
    ErrorsException on missing keyEntry enum handles presence
    IterationSMap.iter (sorted by key)Unordered (HashMap) or sorted (BTreeMap)
    Updatefind + add (new map)entry().or_insert() (in-place)

    Things Rust Learners Should Notice

  • **entry() API** — the killer feature for maps; avoids double-lookup (check then insert)
  • ***freq.entry(k).or_insert(0) += 1** — dereference the mutable ref, then increment
  • **HashMap vs BTreeMap** — hash is O(1) average but unordered; BTree is O(log n) but sorted
  • OCaml's Map is immutable — functional purity at the cost of creating new nodes on each update
  • **split_whitespace()** — handles multiple spaces, tabs, newlines; better than split(' ')
  • Further Reading

  • • [HashMap::entry](https://doc.rust-lang.org/std/collections/struct.HashMap.html#method.entry)
  • • [BTreeMap](https://doc.rust-lang.org/std/collections/struct.BTreeMap.html)
  • Exercises

  • Top N words: Write top_n_words(text: &str, n: usize) -> Vec<(String, usize)> that returns the n most frequent words. Sort by frequency descending.
  • Co-occurrence: Count how often pairs of adjacent words appear together in a text, building a HashMap<(String, String), usize>.
  • Histogram: Write print_histogram(freq: &HashMap<String, usize>) that prints each word with a bar of * characters proportional to its frequency (max bar length = 40 chars).
  • Top-N frequent: Implement top_n_frequent<T: Eq + Hash + Clone>(items: &[T], n: usize) -> Vec<(T, usize)> that returns the n most frequent elements in descending order of frequency. Use a min-heap of size n.
  • Character frequency: Write char_frequency(s: &str) -> Vec<(char, usize)> that counts character frequencies in a string and returns them sorted by frequency descending, then alphabetically for ties. This is the first step in Huffman coding.
  • Open Source Repos