ExamplesBy LevelBy TopicLearning Paths
356 Intermediate

356: HashMap Patterns

Functional Programming

Tutorial

The Problem

Counting, grouping, and frequency analysis are among the most common data processing tasks. A hash map is the universal tool: O(1) average insert and lookup, with keys mapped to values. Rust's HashMap<K, V> uses the SipHash-1-3 algorithm by default (DoS-resistant), supports the entry() API for atomic insert-or-update, and requires K: Hash + Eq. Understanding the three canonical patterns — word counting, grouping by key, and top-N frequency — prepares you for most real-world data pipeline work.

🎯 Learning Outcomes

  • • Build a word frequency counter using entry().or_insert(0) and += 1
  • • Implement group_by<T, K> that partitions items into a HashMap<K, Vec<T>>
  • • Extract top-N entries by sorting pairs by value descending
  • • Understand that entry() avoids double-lookup vs get + insert
  • • Use or_default() as a shorthand for or_insert(Default::default())
  • • Recognize HashMap's O(1) average vs BTreeMap's O(log n) sorted operations
  • Code Example

    #![allow(clippy::all)]
    //! # HashMap Patterns
    //! Common patterns: word count, grouping, frequency analysis.
    
    use std::collections::HashMap;
    
    pub fn word_count(text: &str) -> HashMap<String, usize> {
        let mut map = HashMap::new();
        for word in text.split_whitespace() {
            *map.entry(word.to_string()).or_insert(0) += 1;
        }
        map
    }
    
    pub fn group_by<T, K, F>(items: Vec<T>, key: F) -> HashMap<K, Vec<T>>
    where
        K: Eq + std::hash::Hash,
        F: Fn(&T) -> K,
    {
        let mut map: HashMap<K, Vec<T>> = HashMap::new();
        for item in items {
            map.entry(key(&item)).or_default().push(item);
        }
        map
    }
    
    pub fn frequency_top_n(map: &HashMap<String, usize>, n: usize) -> Vec<(&str, usize)> {
        let mut pairs: Vec<_> = map.iter().map(|(k, &v)| (k.as_str(), v)).collect();
        pairs.sort_by(|a, b| b.1.cmp(&a.1));
        pairs.into_iter().take(n).collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn count_words() {
            let wc = word_count("the cat sat on the mat");
            assert_eq!(wc["the"], 2);
            assert_eq!(wc["cat"], 1);
        }
        #[test]
        fn group_by_parity() {
            let grouped = group_by(
                vec![1, 2, 3, 4, 5],
                |&x| if x % 2 == 0 { "even" } else { "odd" },
            );
            assert_eq!(grouped["even"].len(), 2);
            assert_eq!(grouped["odd"].len(), 3);
        }
        #[test]
        fn top_n_frequency() {
            let wc = word_count("a a a b b c");
            let top = frequency_top_n(&wc, 2);
            assert_eq!(top[0].0, "a");
            assert_eq!(top[1].0, "b");
        }
    }

    Key Differences

    AspectRust HashMapOCaml Hashtbl
    Insert-or-updateentry().or_insert()find + replace (two lookups)
    Hash algorithmSipHash-1-3 (DoS-resistant)Polymorphic hash (faster, not DoS-safe)
    OrderingNoneNone
    group_byor_default().push(item)find + replace with list cons
    AlternativeBTreeMap for orderedMap.Make for ordered

    OCaml Approach

    OCaml's Hashtbl provides the imperative equivalent:

    let word_count text =
      let tbl = Hashtbl.create 64 in
      String.split_on_char ' ' text |> List.iter (fun w ->
        let count = try Hashtbl.find tbl w with Not_found -> 0 in
        Hashtbl.replace tbl w (count + 1));
      tbl
    
    (* Functional alternative with Map *)
    module SMap = Map.Make(String)
    let word_count_pure text =
      String.split_on_char ' ' text
      |> List.fold_left (fun m w ->
        let n = try SMap.find w m with Not_found -> 0 in
        SMap.add w (n + 1) m) SMap.empty
    

    The Hashtbl version mutates in place; the Map version returns a new map per insertion (persistent). Rust's HashMap is imperative like Hashtbl, but the entry API makes the "find or insert" pattern atomic and idiomatic.

    Full Source

    #![allow(clippy::all)]
    //! # HashMap Patterns
    //! Common patterns: word count, grouping, frequency analysis.
    
    use std::collections::HashMap;
    
    pub fn word_count(text: &str) -> HashMap<String, usize> {
        let mut map = HashMap::new();
        for word in text.split_whitespace() {
            *map.entry(word.to_string()).or_insert(0) += 1;
        }
        map
    }
    
    pub fn group_by<T, K, F>(items: Vec<T>, key: F) -> HashMap<K, Vec<T>>
    where
        K: Eq + std::hash::Hash,
        F: Fn(&T) -> K,
    {
        let mut map: HashMap<K, Vec<T>> = HashMap::new();
        for item in items {
            map.entry(key(&item)).or_default().push(item);
        }
        map
    }
    
    pub fn frequency_top_n(map: &HashMap<String, usize>, n: usize) -> Vec<(&str, usize)> {
        let mut pairs: Vec<_> = map.iter().map(|(k, &v)| (k.as_str(), v)).collect();
        pairs.sort_by(|a, b| b.1.cmp(&a.1));
        pairs.into_iter().take(n).collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn count_words() {
            let wc = word_count("the cat sat on the mat");
            assert_eq!(wc["the"], 2);
            assert_eq!(wc["cat"], 1);
        }
        #[test]
        fn group_by_parity() {
            let grouped = group_by(
                vec![1, 2, 3, 4, 5],
                |&x| if x % 2 == 0 { "even" } else { "odd" },
            );
            assert_eq!(grouped["even"].len(), 2);
            assert_eq!(grouped["odd"].len(), 3);
        }
        #[test]
        fn top_n_frequency() {
            let wc = word_count("a a a b b c");
            let top = frequency_top_n(&wc, 2);
            assert_eq!(top[0].0, "a");
            assert_eq!(top[1].0, "b");
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn count_words() {
            let wc = word_count("the cat sat on the mat");
            assert_eq!(wc["the"], 2);
            assert_eq!(wc["cat"], 1);
        }
        #[test]
        fn group_by_parity() {
            let grouped = group_by(
                vec![1, 2, 3, 4, 5],
                |&x| if x % 2 == 0 { "even" } else { "odd" },
            );
            assert_eq!(grouped["even"].len(), 2);
            assert_eq!(grouped["odd"].len(), 3);
        }
        #[test]
        fn top_n_frequency() {
            let wc = word_count("a a a b b c");
            let top = frequency_top_n(&wc, 2);
            assert_eq!(top[0].0, "a");
            assert_eq!(top[1].0, "b");
        }
    }

    Deep Comparison

    OCaml vs Rust: Hashmap Patterns

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Anagram grouping: Given a list of words, group anagrams together: group_by(words, |w| { let mut sorted = w.chars().collect::<Vec<_>>(); sorted.sort(); sorted }); output groups where each group contains words that are anagrams of each other.
  • Inverted index: Build an inverted index mapping each word to the set of document IDs it appears in, using HashMap<String, HashSet<usize>>; then query "documents containing both 'rust' and 'async'".
  • LRU approximation: Implement a simple frequency-based cache using HashMap<K, (V, usize)> tracking access counts; evict the least-frequently-used entry when size exceeds capacity.
  • Open Source Repos