090 Intermediate

Frequency Analysis — Letter Distribution

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Frequency Analysis — Letter Distribution" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Count how many times each letter (a–z) appears in a string, ignoring case and non-alphabetic characters, then return results sorted by frequency descending. Key difference from OCaml: 1. **Map type:** OCaml uses `Map.Make(Char)` (balanced BST, sorted by key); Rust offers `HashMap` (hash table, unordered) or `BTreeMap` (B

Tutorial

The Problem

Count how many times each letter (a–z) appears in a string, ignoring case and non-alphabetic characters, then return results sorted by frequency descending.

🎯 Learning Outcomes

• How HashMap::entry with or_insert replaces OCaml's Map.update for in-place counting

• Why BTreeMap mirrors OCaml's Map.Make(Char) — both keep keys sorted, both are O(log n) per operation

• How iterator chains (.filter().map().fold()) replace OCaml's String.fold_left pipeline

• The difference between hash-based (HashMap) and tree-based (BTreeMap) maps and when to choose each

🦀 The Rust Way

Rust offers two natural equivalents: HashMap<char, usize> for O(1) average access and BTreeMap<char, usize> for O(log n) with keys in sorted order — the latter directly mirrors OCaml's Map.Make. The entry().or_insert(0) pattern compresses OCaml's update ... function None -> Some 1 | Some n -> Some (n+1) into a single idiomatic expression.

Code Example

use std::collections::HashMap;

pub fn frequency(s: &str) -> HashMap<char, usize> {
    s.chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .fold(HashMap::new(), |mut map, c| {
            *map.entry(c).or_insert(0) += 1;
            map
        })
}

pub fn sorted_freq(s: &str) -> Vec<(char, usize)> {
    let mut pairs: Vec<(char, usize)> = frequency(s).into_iter().collect();
    pairs.sort_by(|(c1, n1), (c2, n2)| n2.cmp(n1).then(c1.cmp(c2)));
    pairs
}

module CMap = Map.Make(Char)

let frequency s =
  String.fold_left (fun m c ->
    let c = Char.lowercase_ascii c in
    if c >= 'a' && c <= 'z' then
      CMap.update c (function None -> Some 1 | Some n -> Some (n+1)) m
    else m
  ) CMap.empty s

let sorted_freq s =
  frequency s |> CMap.bindings
  |> List.sort (fun (_, a) (_, b) -> compare b a)

Key Differences

Map type: OCaml uses Map.Make(Char) (balanced BST, sorted by key); Rust offers HashMap (hash table, unordered) or BTreeMap (B-tree, sorted by key)

Update idiom: OCaml's CMap.update c f m takes a function over option; Rust's map.entry(c).or_insert(0) returns a mutable reference for in-place mutation

String iteration: OCaml's String.fold_left threads state; Rust's .chars().filter().map().fold() composes the same pipeline with explicit types

Sorting: OCaml's List.sort (fun (_, a) (_, b) -> compare b a) sorts by value descending; Rust's .sort_by(|(c1,n1),(c2,n2)| n2.cmp(n1).then(c1.cmp(c2))) adds a stable tiebreaker

OCaml Approach

OCaml uses a functor Map.Make(Char) to create a balanced BST map keyed on char. The String.fold_left threads a map accumulator through each character, calling CMap.update to increment or initialize counts. The map is then converted to a sorted binding list for display.

Full Source

#![allow(clippy::all)]
use std::collections::{BTreeMap, HashMap};

/// Solution 1: Idiomatic Rust — HashMap with fold
/// Uses entry API for efficient in-place counting
pub fn frequency(s: &str) -> HashMap<char, usize> {
    s.chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .fold(HashMap::new(), |mut map, c| {
            *map.entry(c).or_insert(0) += 1;
            map
        })
}

/// Solution 2: BTreeMap — mirrors OCaml's Map.Make(Char), keys stay sorted
/// Equivalent to OCaml's `CMap.update c (function None -> Some 1 | Some n -> Some (n+1)) m`
pub fn frequency_btree(s: &str) -> BTreeMap<char, usize> {
    s.chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .fold(BTreeMap::new(), |mut map, c| {
            *map.entry(c).or_insert(0) += 1;
            map
        })
}

/// Solution 3: Sorted by frequency descending, ties broken alphabetically
/// Mirrors OCaml's `List.sort (fun (_, a) (_, b) -> compare b a)`
pub fn sorted_freq(s: &str) -> Vec<(char, usize)> {
    let mut pairs: Vec<(char, usize)> = frequency(s).into_iter().collect();
    pairs.sort_by(|(c1, n1), (c2, n2)| n2.cmp(n1).then(c1.cmp(c2)));
    pairs
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty_string() {
        assert_eq!(frequency(""), HashMap::new());
        assert!(sorted_freq("").is_empty());
    }

    #[test]
    fn test_single_character() {
        let freq = frequency("a");
        assert_eq!(freq[&'a'], 1);
        assert_eq!(freq.len(), 1);
    }

    #[test]
    fn test_case_insensitive() {
        let freq = frequency("AaAa");
        assert_eq!(freq[&'a'], 4);
        assert_eq!(freq.len(), 1);
    }

    #[test]
    fn test_non_alpha_ignored() {
        let freq = frequency("a1b2c! a");
        assert_eq!(freq[&'a'], 2);
        assert_eq!(freq[&'b'], 1);
        assert_eq!(freq[&'c'], 1);
        assert_eq!(freq.len(), 3);
    }

    #[test]
    fn test_pangram_all_letters_present() {
        let text = "The quick brown fox jumps over the lazy dog";
        let freq = frequency(text);
        // All 26 letters appear at least once in a pangram
        assert_eq!(freq.len(), 26);
        // 'e' and 'o' appear most in this pangram
        assert_eq!(freq[&'e'], 3);
        assert_eq!(freq[&'o'], 4);
    }

    #[test]
    fn test_btree_sorted_by_key() {
        let freq = frequency_btree("bac");
        let keys: Vec<char> = freq.keys().copied().collect();
        assert_eq!(keys, vec!['a', 'b', 'c']);
    }

    #[test]
    fn test_sorted_freq_descending() {
        let result = sorted_freq("aaabbc");
        assert_eq!(result[0], ('a', 3));
        assert_eq!(result[1], ('b', 2));
        assert_eq!(result[2], ('c', 1));
    }

    #[test]
    fn test_sorted_freq_ties_broken_alphabetically() {
        // 'a' and 'b' both appear twice — 'a' should come first
        let result = sorted_freq("aabb");
        assert_eq!(result[0], ('a', 2));
        assert_eq!(result[1], ('b', 2));
    }
}

module CMap = Map.Make(Char)

(* Idiomatic OCaml — fold over string, update map with lowercase letters *)
let frequency s =
  String.fold_left (fun m c ->
    let c = Char.lowercase_ascii c in
    if c >= 'a' && c <= 'z' then
      CMap.update c (function None -> Some 1 | Some n -> Some (n+1)) m
    else m
  ) CMap.empty s

(* Sorted by frequency descending — bindings gives (key, value) list *)
let sorted_freq s =
  frequency s |> CMap.bindings
  |> List.sort (fun (_, a) (_, b) -> compare b a)

(* Recursive variant — explicit pattern matching on char list *)
let frequency_rec s =
  let rec go m = function
    | [] -> m
    | c :: rest ->
      let c = Char.lowercase_ascii c in
      let m' =
        if c >= 'a' && c <= 'z' then
          CMap.update c (function None -> Some 1 | Some n -> Some (n+1)) m
        else m
      in
      go m' rest
  in
  go CMap.empty (List.of_seq (String.to_seq s))

let () =
  (* Basic frequency tests *)
  let f = frequency "aabb" in
  assert (CMap.find 'a' f = 2);
  assert (CMap.find 'b' f = 2);

  (* Case insensitive *)
  let f2 = frequency "AaAa" in
  assert (CMap.find 'a' f2 = 4);

  (* Non-alpha ignored *)
  let f3 = frequency "a1b!" in
  assert (CMap.find 'a' f3 = 1);
  assert (CMap.find 'b' f3 = 1);
  assert (not (CMap.mem '1' f3));

  (* Sorted descending *)
  let s = sorted_freq "aaabbc" in
  assert (s = [('a', 3); ('b', 2); ('c', 1)]);

  (* Recursive variant matches fold variant *)
  let text = "Hello World" in
  assert (frequency text = frequency_rec text);

  (* Pangram has all 26 letters *)
  let pangram = "The quick brown fox jumps over the lazy dog" in
  assert (CMap.cardinal (frequency pangram) = 26);

  print_endline "ok"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_empty_string() {
        assert_eq!(frequency(""), HashMap::new());
        assert!(sorted_freq("").is_empty());
    }

    #[test]
    fn test_single_character() {
        let freq = frequency("a");
        assert_eq!(freq[&'a'], 1);
        assert_eq!(freq.len(), 1);
    }

    #[test]
    fn test_case_insensitive() {
        let freq = frequency("AaAa");
        assert_eq!(freq[&'a'], 4);
        assert_eq!(freq.len(), 1);
    }

    #[test]
    fn test_non_alpha_ignored() {
        let freq = frequency("a1b2c! a");
        assert_eq!(freq[&'a'], 2);
        assert_eq!(freq[&'b'], 1);
        assert_eq!(freq[&'c'], 1);
        assert_eq!(freq.len(), 3);
    }

    #[test]
    fn test_pangram_all_letters_present() {
        let text = "The quick brown fox jumps over the lazy dog";
        let freq = frequency(text);
        // All 26 letters appear at least once in a pangram
        assert_eq!(freq.len(), 26);
        // 'e' and 'o' appear most in this pangram
        assert_eq!(freq[&'e'], 3);
        assert_eq!(freq[&'o'], 4);
    }

    #[test]
    fn test_btree_sorted_by_key() {
        let freq = frequency_btree("bac");
        let keys: Vec<char> = freq.keys().copied().collect();
        assert_eq!(keys, vec!['a', 'b', 'c']);
    }

    #[test]
    fn test_sorted_freq_descending() {
        let result = sorted_freq("aaabbc");
        assert_eq!(result[0], ('a', 3));
        assert_eq!(result[1], ('b', 2));
        assert_eq!(result[2], ('c', 1));
    }

    #[test]
    fn test_sorted_freq_ties_broken_alphabetically() {
        // 'a' and 'b' both appear twice — 'a' should come first
        let result = sorted_freq("aabb");
        assert_eq!(result[0], ('a', 2));
        assert_eq!(result[1], ('b', 2));
    }
}

Deep Comparison

OCaml vs Rust: Frequency Analysis — Letter Distribution

Side-by-Side Code

OCaml

module CMap = Map.Make(Char)

let frequency s =
  String.fold_left (fun m c ->
    let c = Char.lowercase_ascii c in
    if c >= 'a' && c <= 'z' then
      CMap.update c (function None -> Some 1 | Some n -> Some (n+1)) m
    else m
  ) CMap.empty s

let sorted_freq s =
  frequency s |> CMap.bindings
  |> List.sort (fun (_, a) (_, b) -> compare b a)

Rust (idiomatic — HashMap)

use std::collections::HashMap;

pub fn frequency(s: &str) -> HashMap<char, usize> {
    s.chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .fold(HashMap::new(), |mut map, c| {
            *map.entry(c).or_insert(0) += 1;
            map
        })
}

pub fn sorted_freq(s: &str) -> Vec<(char, usize)> {
    let mut pairs: Vec<(char, usize)> = frequency(s).into_iter().collect();
    pairs.sort_by(|(c1, n1), (c2, n2)| n2.cmp(n1).then(c1.cmp(c2)));
    pairs
}

Rust (BTreeMap — mirrors OCaml's Map.Make, keys sorted)

use std::collections::BTreeMap;

pub fn frequency_btree(s: &str) -> BTreeMap<char, usize> {
    s.chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .fold(BTreeMap::new(), |mut map, c| {
            *map.entry(c).or_insert(0) += 1;
            map
        })
}

Type Signatures

Concept	OCaml	Rust
Frequency map type	`CMap.t` (≈ `Char -> int`)	`HashMap<char, usize>` or `BTreeMap<char, usize>`
Function signature	`val frequency : string -> CMap.t`	`fn frequency(s: &str) -> HashMap<char, usize>`
Sorted result	`(char * int) list`	`Vec<(char, usize)>`
Map entry update	`CMap.update c f m`	`*map.entry(c).or_insert(0) += 1`
String fold	`String.fold_left f init s`	`s.chars().fold(init, f)`
Bindings extraction	`CMap.bindings m`	`map.into_iter().collect()`

Key Insights

**BTreeMap is the faithful OCaml equivalent.** OCaml's Map.Make(Char) is a balanced BST with O(log n) operations and keys always in sorted order. Rust's BTreeMap<char, usize> is structurally identical. HashMap is faster on average but unordered — a deliberate trade-off.

**entry().or_insert() beats OCaml's update.** OCaml requires a function option -> option to express "insert 1 or increment". Rust's entry API returns a &mut usize, enabling += 1 directly — no pattern match needed, zero allocation, and no intermediate closure.

Iterator chains compose like OCaml pipelines. OCaml's String.fold_left (fun m c -> ...) CMap.empty s threads state explicitly. Rust's .chars().filter().map().fold() is the same pipeline with typed stages. Both are lazy in spirit; Rust's iterators are actually zero-cost lazy.

Case normalization is symmetric. OCaml uses Char.lowercase_ascii; Rust uses .to_ascii_lowercase(). Both operate on ASCII only — correct for letter frequency on ASCII text and cheaper than Unicode case folding.

Sorting requires a tiebreaker for determinism. OCaml's List.sort (fun (_, a) (_, b) -> compare b a) only compares counts, leaving ties non-deterministic (sort is not stable in all implementations). Rust's .sort_by(|(c1,n1),(c2,n2)| n2.cmp(n1).then(c1.cmp(c2))) adds alphabetical tiebreaking, producing a fully deterministic result — important for testing and reproducibility.

When to Use Each Style

**Use HashMap (idiomatic Rust) when:** you need maximum throughput and don't care about key order — e.g., building frequency tables for large corpora where you'll sort the results anyway.

**Use BTreeMap (OCaml-equivalent) when:** you need keys in sorted order without a separate sort step, or when you want structural parity with OCaml code for comparison/porting purposes.

Exercises

Extend frequency analysis to produce a normalized frequency map (HashMap<char, f64>) where values sum to 1.0, then compute the cosine similarity between two texts.

Implement index_of_coincidence — the probability that two randomly selected letters from a text are the same — and use it to estimate whether a text is in English.

Build a Vigenère cipher breaker: use the index of coincidence to guess the key length, then apply frequency analysis on each substream to recover the key.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust