ExamplesBy LevelBy TopicLearning Paths
090 Intermediate

Frequency Analysis — Letter Distribution

String Processing

Tutorial Video

Text description (accessibility)

This video demonstrates the "Frequency Analysis — Letter Distribution" functional Rust example. Difficulty level: Intermediate. Key concepts covered: String Processing. Count how many times each letter (a–z) appears in a string, ignoring case and non-alphabetic characters, then return results sorted by frequency descending. Key difference from OCaml: 1. **Map type:** OCaml uses `Map.Make(Char)` (balanced BST, sorted by key); Rust offers `HashMap` (hash table, unordered) or `BTreeMap` (B

Tutorial

The Problem

Count how many times each letter (a–z) appears in a string, ignoring case and non-alphabetic characters, then return results sorted by frequency descending.

🎯 Learning Outcomes

  • • How HashMap::entry with or_insert replaces OCaml's Map.update for in-place counting
  • • Why BTreeMap mirrors OCaml's Map.Make(Char) — both keep keys sorted, both are O(log n) per operation
  • • How iterator chains (.filter().map().fold()) replace OCaml's String.fold_left pipeline
  • • The difference between hash-based (HashMap) and tree-based (BTreeMap) maps and when to choose each
  • 🦀 The Rust Way

    Rust offers two natural equivalents: HashMap<char, usize> for O(1) average access and BTreeMap<char, usize> for O(log n) with keys in sorted order — the latter directly mirrors OCaml's Map.Make. The entry().or_insert(0) pattern compresses OCaml's update ... function None -> Some 1 | Some n -> Some (n+1) into a single idiomatic expression.

    Code Example

    use std::collections::HashMap;
    
    pub fn frequency(s: &str) -> HashMap<char, usize> {
        s.chars()
            .filter(|c| c.is_ascii_alphabetic())
            .map(|c| c.to_ascii_lowercase())
            .fold(HashMap::new(), |mut map, c| {
                *map.entry(c).or_insert(0) += 1;
                map
            })
    }
    
    pub fn sorted_freq(s: &str) -> Vec<(char, usize)> {
        let mut pairs: Vec<(char, usize)> = frequency(s).into_iter().collect();
        pairs.sort_by(|(c1, n1), (c2, n2)| n2.cmp(n1).then(c1.cmp(c2)));
        pairs
    }

    Key Differences

  • Map type: OCaml uses Map.Make(Char) (balanced BST, sorted by key); Rust offers HashMap (hash table, unordered) or BTreeMap (B-tree, sorted by key)
  • Update idiom: OCaml's CMap.update c f m takes a function over option; Rust's map.entry(c).or_insert(0) returns a mutable reference for in-place mutation
  • String iteration: OCaml's String.fold_left threads state; Rust's .chars().filter().map().fold() composes the same pipeline with explicit types
  • Sorting: OCaml's List.sort (fun (_, a) (_, b) -> compare b a) sorts by value descending; Rust's .sort_by(|(c1,n1),(c2,n2)| n2.cmp(n1).then(c1.cmp(c2))) adds a stable tiebreaker
  • OCaml Approach

    OCaml uses a functor Map.Make(Char) to create a balanced BST map keyed on char. The String.fold_left threads a map accumulator through each character, calling CMap.update to increment or initialize counts. The map is then converted to a sorted binding list for display.

    Full Source

    #![allow(clippy::all)]
    use std::collections::{BTreeMap, HashMap};
    
    /// Solution 1: Idiomatic Rust — HashMap with fold
    /// Uses entry API for efficient in-place counting
    pub fn frequency(s: &str) -> HashMap<char, usize> {
        s.chars()
            .filter(|c| c.is_ascii_alphabetic())
            .map(|c| c.to_ascii_lowercase())
            .fold(HashMap::new(), |mut map, c| {
                *map.entry(c).or_insert(0) += 1;
                map
            })
    }
    
    /// Solution 2: BTreeMap — mirrors OCaml's Map.Make(Char), keys stay sorted
    /// Equivalent to OCaml's `CMap.update c (function None -> Some 1 | Some n -> Some (n+1)) m`
    pub fn frequency_btree(s: &str) -> BTreeMap<char, usize> {
        s.chars()
            .filter(|c| c.is_ascii_alphabetic())
            .map(|c| c.to_ascii_lowercase())
            .fold(BTreeMap::new(), |mut map, c| {
                *map.entry(c).or_insert(0) += 1;
                map
            })
    }
    
    /// Solution 3: Sorted by frequency descending, ties broken alphabetically
    /// Mirrors OCaml's `List.sort (fun (_, a) (_, b) -> compare b a)`
    pub fn sorted_freq(s: &str) -> Vec<(char, usize)> {
        let mut pairs: Vec<(char, usize)> = frequency(s).into_iter().collect();
        pairs.sort_by(|(c1, n1), (c2, n2)| n2.cmp(n1).then(c1.cmp(c2)));
        pairs
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_empty_string() {
            assert_eq!(frequency(""), HashMap::new());
            assert!(sorted_freq("").is_empty());
        }
    
        #[test]
        fn test_single_character() {
            let freq = frequency("a");
            assert_eq!(freq[&'a'], 1);
            assert_eq!(freq.len(), 1);
        }
    
        #[test]
        fn test_case_insensitive() {
            let freq = frequency("AaAa");
            assert_eq!(freq[&'a'], 4);
            assert_eq!(freq.len(), 1);
        }
    
        #[test]
        fn test_non_alpha_ignored() {
            let freq = frequency("a1b2c! a");
            assert_eq!(freq[&'a'], 2);
            assert_eq!(freq[&'b'], 1);
            assert_eq!(freq[&'c'], 1);
            assert_eq!(freq.len(), 3);
        }
    
        #[test]
        fn test_pangram_all_letters_present() {
            let text = "The quick brown fox jumps over the lazy dog";
            let freq = frequency(text);
            // All 26 letters appear at least once in a pangram
            assert_eq!(freq.len(), 26);
            // 'e' and 'o' appear most in this pangram
            assert_eq!(freq[&'e'], 3);
            assert_eq!(freq[&'o'], 4);
        }
    
        #[test]
        fn test_btree_sorted_by_key() {
            let freq = frequency_btree("bac");
            let keys: Vec<char> = freq.keys().copied().collect();
            assert_eq!(keys, vec!['a', 'b', 'c']);
        }
    
        #[test]
        fn test_sorted_freq_descending() {
            let result = sorted_freq("aaabbc");
            assert_eq!(result[0], ('a', 3));
            assert_eq!(result[1], ('b', 2));
            assert_eq!(result[2], ('c', 1));
        }
    
        #[test]
        fn test_sorted_freq_ties_broken_alphabetically() {
            // 'a' and 'b' both appear twice — 'a' should come first
            let result = sorted_freq("aabb");
            assert_eq!(result[0], ('a', 2));
            assert_eq!(result[1], ('b', 2));
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_empty_string() {
            assert_eq!(frequency(""), HashMap::new());
            assert!(sorted_freq("").is_empty());
        }
    
        #[test]
        fn test_single_character() {
            let freq = frequency("a");
            assert_eq!(freq[&'a'], 1);
            assert_eq!(freq.len(), 1);
        }
    
        #[test]
        fn test_case_insensitive() {
            let freq = frequency("AaAa");
            assert_eq!(freq[&'a'], 4);
            assert_eq!(freq.len(), 1);
        }
    
        #[test]
        fn test_non_alpha_ignored() {
            let freq = frequency("a1b2c! a");
            assert_eq!(freq[&'a'], 2);
            assert_eq!(freq[&'b'], 1);
            assert_eq!(freq[&'c'], 1);
            assert_eq!(freq.len(), 3);
        }
    
        #[test]
        fn test_pangram_all_letters_present() {
            let text = "The quick brown fox jumps over the lazy dog";
            let freq = frequency(text);
            // All 26 letters appear at least once in a pangram
            assert_eq!(freq.len(), 26);
            // 'e' and 'o' appear most in this pangram
            assert_eq!(freq[&'e'], 3);
            assert_eq!(freq[&'o'], 4);
        }
    
        #[test]
        fn test_btree_sorted_by_key() {
            let freq = frequency_btree("bac");
            let keys: Vec<char> = freq.keys().copied().collect();
            assert_eq!(keys, vec!['a', 'b', 'c']);
        }
    
        #[test]
        fn test_sorted_freq_descending() {
            let result = sorted_freq("aaabbc");
            assert_eq!(result[0], ('a', 3));
            assert_eq!(result[1], ('b', 2));
            assert_eq!(result[2], ('c', 1));
        }
    
        #[test]
        fn test_sorted_freq_ties_broken_alphabetically() {
            // 'a' and 'b' both appear twice — 'a' should come first
            let result = sorted_freq("aabb");
            assert_eq!(result[0], ('a', 2));
            assert_eq!(result[1], ('b', 2));
        }
    }

    Deep Comparison

    OCaml vs Rust: Frequency Analysis — Letter Distribution

    Side-by-Side Code

    OCaml

    module CMap = Map.Make(Char)
    
    let frequency s =
      String.fold_left (fun m c ->
        let c = Char.lowercase_ascii c in
        if c >= 'a' && c <= 'z' then
          CMap.update c (function None -> Some 1 | Some n -> Some (n+1)) m
        else m
      ) CMap.empty s
    
    let sorted_freq s =
      frequency s |> CMap.bindings
      |> List.sort (fun (_, a) (_, b) -> compare b a)
    

    Rust (idiomatic — HashMap)

    use std::collections::HashMap;
    
    pub fn frequency(s: &str) -> HashMap<char, usize> {
        s.chars()
            .filter(|c| c.is_ascii_alphabetic())
            .map(|c| c.to_ascii_lowercase())
            .fold(HashMap::new(), |mut map, c| {
                *map.entry(c).or_insert(0) += 1;
                map
            })
    }
    
    pub fn sorted_freq(s: &str) -> Vec<(char, usize)> {
        let mut pairs: Vec<(char, usize)> = frequency(s).into_iter().collect();
        pairs.sort_by(|(c1, n1), (c2, n2)| n2.cmp(n1).then(c1.cmp(c2)));
        pairs
    }
    

    Rust (BTreeMap — mirrors OCaml's Map.Make, keys sorted)

    use std::collections::BTreeMap;
    
    pub fn frequency_btree(s: &str) -> BTreeMap<char, usize> {
        s.chars()
            .filter(|c| c.is_ascii_alphabetic())
            .map(|c| c.to_ascii_lowercase())
            .fold(BTreeMap::new(), |mut map, c| {
                *map.entry(c).or_insert(0) += 1;
                map
            })
    }
    

    Type Signatures

    ConceptOCamlRust
    Frequency map typeCMap.t (≈ Char -> int)HashMap<char, usize> or BTreeMap<char, usize>
    Function signatureval frequency : string -> CMap.tfn frequency(s: &str) -> HashMap<char, usize>
    Sorted result(char * int) listVec<(char, usize)>
    Map entry updateCMap.update c f m*map.entry(c).or_insert(0) += 1
    String foldString.fold_left f init ss.chars().fold(init, f)
    Bindings extractionCMap.bindings mmap.into_iter().collect()

    Key Insights

  • **BTreeMap is the faithful OCaml equivalent.** OCaml's Map.Make(Char) is a balanced BST with O(log n) operations and keys always in sorted order. Rust's BTreeMap<char, usize> is structurally identical. HashMap is faster on average but unordered — a deliberate trade-off.
  • **entry().or_insert() beats OCaml's update.** OCaml requires a function option -> option to express "insert 1 or increment". Rust's entry API returns a &mut usize, enabling += 1 directly — no pattern match needed, zero allocation, and no intermediate closure.
  • Iterator chains compose like OCaml pipelines. OCaml's String.fold_left (fun m c -> ...) CMap.empty s threads state explicitly. Rust's .chars().filter().map().fold() is the same pipeline with typed stages. Both are lazy in spirit; Rust's iterators are actually zero-cost lazy.
  • Case normalization is symmetric. OCaml uses Char.lowercase_ascii; Rust uses .to_ascii_lowercase(). Both operate on ASCII only — correct for letter frequency on ASCII text and cheaper than Unicode case folding.
  • Sorting requires a tiebreaker for determinism. OCaml's List.sort (fun (_, a) (_, b) -> compare b a) only compares counts, leaving ties non-deterministic (sort is not stable in all implementations). Rust's .sort_by(|(c1,n1),(c2,n2)| n2.cmp(n1).then(c1.cmp(c2))) adds alphabetical tiebreaking, producing a fully deterministic result — important for testing and reproducibility.
  • When to Use Each Style

    **Use HashMap (idiomatic Rust) when:** you need maximum throughput and don't care about key order — e.g., building frequency tables for large corpora where you'll sort the results anyway.

    **Use BTreeMap (OCaml-equivalent) when:** you need keys in sorted order without a separate sort step, or when you want structural parity with OCaml code for comparison/porting purposes.

    Exercises

  • Extend frequency analysis to produce a normalized frequency map (HashMap<char, f64>) where values sum to 1.0, then compute the cosine similarity between two texts.
  • Implement index_of_coincidence — the probability that two randomly selected letters from a text are the same — and use it to estimate whether a text is in English.
  • Build a Vigenère cipher breaker: use the index of coincidence to guess the key length, then apply frequency analysis on each substream to recover the key.
  • Open Source Repos