082 Fundamental

082 — Nucleotide Count

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "082 — Nucleotide Count" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Count the occurrences of each DNA nucleotide (`A`, `C`, `G`, `T`) in a string, returning a `HashMap<char, usize>` or an error indicating the first invalid character. Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

Count the occurrences of each DNA nucleotide (A, C, G, T) in a string, returning a HashMap<char, usize> or an error indicating the first invalid character. Implement three versions — imperative loop, try_fold, and array-backed — and compare with OCaml's Map.Make(Char) and mutable ref-based approaches.

🎯 Learning Outcomes

• Initialise a HashMap with known keys and zero counts using array-into-iter

• Use get_mut to increment in-place without a double lookup

• Apply try_fold with the ? operator to combine accumulation and early error return

• Recognise when an array [usize; 4] beats HashMap for fixed-key counting

• Map Rust's Result<T, E> early return to OCaml's failwith exception

• Understand chars() vs bytes() for ASCII-only DNA strings

Code Example

#![allow(clippy::all)]
use std::collections::HashMap;

/// Nucleotide Count — counting character frequencies
///
/// Ownership: input DNA string is borrowed (&str).
/// Result HashMap is owned by the caller.

pub fn nucleotide_count(dna: &str) -> Result<HashMap<char, usize>, char> {
    let mut counts: HashMap<char, usize> = [('A', 0), ('C', 0), ('G', 0), ('T', 0)].into();

    for c in dna.chars() {
        match counts.get_mut(&c) {
            Some(n) => *n += 1,
            None => return Err(c),
        }
    }
    Ok(counts)
}

/// Version 2: Using fold
pub fn nucleotide_count_fold(dna: &str) -> Result<HashMap<char, usize>, char> {
    dna.chars().try_fold(
        [('A', 0), ('C', 0), ('G', 0), ('T', 0)]
            .into_iter()
            .collect::<HashMap<_, _>>(),
        |mut acc, c| {
            *acc.get_mut(&c).ok_or(c)? += 1;
            Ok(acc)
        },
    )
}

/// Version 3: Using array instead of HashMap for performance
pub fn nucleotide_count_array(dna: &str) -> Result<[usize; 4], char> {
    let mut counts = [0usize; 4];
    for c in dna.chars() {
        match c {
            'A' => counts[0] += 1,
            'C' => counts[1] += 1,
            'G' => counts[2] += 1,
            'T' => counts[3] += 1,
            _ => return Err(c),
        }
    }
    Ok(counts)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_gattaca() {
        let counts = nucleotide_count("GATTACA").unwrap();
        assert_eq!(counts[&'A'], 3);
        assert_eq!(counts[&'T'], 2);
        assert_eq!(counts[&'G'], 1);
        assert_eq!(counts[&'C'], 1);
    }

    #[test]
    fn test_empty() {
        let counts = nucleotide_count("").unwrap();
        assert!(counts.values().all(|&v| v == 0));
    }

    #[test]
    fn test_invalid() {
        assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
    }

    #[test]
    fn test_fold_version() {
        let counts = nucleotide_count_fold("GATTACA").unwrap();
        assert_eq!(counts[&'A'], 3);
    }

    #[test]
    fn test_array_version() {
        let counts = nucleotide_count_array("GATTACA").unwrap();
        assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
    }
}

(* Nucleotide Count — Bioinformatics *)

module CMap = Map.Make(Char)

(* Version 1: Using Map for counting *)
let nucleotide_count dna =
  let init = List.fold_left (fun m c -> CMap.add c 0 m)
    CMap.empty ['A';'C';'G';'T'] in
  String.fold_left (fun m c ->
    match CMap.find_opt c m with
    | Some n -> CMap.add c (n + 1) m
    | None -> failwith ("invalid nucleotide: " ^ String.make 1 c)
  ) init dna

(* Version 2: Using a simple assoc list *)
let nucleotide_count_simple dna =
  let counts = [('A', ref 0); ('C', ref 0); ('G', ref 0); ('T', ref 0)] in
  String.iter (fun c ->
    (List.assoc c counts) := !(List.assoc c counts) + 1
  ) dna;
  List.map (fun (c, r) -> (c, !r)) counts

let () =
  let counts = nucleotide_count "GATTACA" in
  assert (CMap.find 'A' counts = 3);
  assert (CMap.find 'T' counts = 2)

Key Differences

Aspect	Rust	OCaml
Error return	`Result<_, char>` with `?`	Exception via `failwith`
Map type	`HashMap<char, usize>`	`Map.Make(Char)` (balanced tree)
In-place update	`get_mut` + `*n += 1`	`CMap.add c (n+1) m` (persistent)
`try_fold`	Built into `Iterator`	Custom recursion or exception
Performance alt	`[usize; 4]` array	Array or `Bytes`
String iteration	`.chars()`	`String.fold_left` / `String.iter`

The array version demonstrates an important optimisation pattern: when the key space is small and known, replace HashMap with an array indexed by position — O(1) access with no hashing overhead.

OCaml Approach

OCaml uses a functional map CMap = Map.Make(Char) with fold_left to initialise zero counts and String.fold_left to accumulate. find_opt returns None for invalid characters, triggering failwith. The second version uses mutable ref cells in an association list — closer to the imperative loop. OCaml lacks try_fold but achieves the same effect through exceptions, which are lightweight in OCaml and commonly used for early exit.

Full Source

#![allow(clippy::all)]
use std::collections::HashMap;

/// Nucleotide Count — counting character frequencies
///
/// Ownership: input DNA string is borrowed (&str).
/// Result HashMap is owned by the caller.

pub fn nucleotide_count(dna: &str) -> Result<HashMap<char, usize>, char> {
    let mut counts: HashMap<char, usize> = [('A', 0), ('C', 0), ('G', 0), ('T', 0)].into();

    for c in dna.chars() {
        match counts.get_mut(&c) {
            Some(n) => *n += 1,
            None => return Err(c),
        }
    }
    Ok(counts)
}

/// Version 2: Using fold
pub fn nucleotide_count_fold(dna: &str) -> Result<HashMap<char, usize>, char> {
    dna.chars().try_fold(
        [('A', 0), ('C', 0), ('G', 0), ('T', 0)]
            .into_iter()
            .collect::<HashMap<_, _>>(),
        |mut acc, c| {
            *acc.get_mut(&c).ok_or(c)? += 1;
            Ok(acc)
        },
    )
}

/// Version 3: Using array instead of HashMap for performance
pub fn nucleotide_count_array(dna: &str) -> Result<[usize; 4], char> {
    let mut counts = [0usize; 4];
    for c in dna.chars() {
        match c {
            'A' => counts[0] += 1,
            'C' => counts[1] += 1,
            'G' => counts[2] += 1,
            'T' => counts[3] += 1,
            _ => return Err(c),
        }
    }
    Ok(counts)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_gattaca() {
        let counts = nucleotide_count("GATTACA").unwrap();
        assert_eq!(counts[&'A'], 3);
        assert_eq!(counts[&'T'], 2);
        assert_eq!(counts[&'G'], 1);
        assert_eq!(counts[&'C'], 1);
    }

    #[test]
    fn test_empty() {
        let counts = nucleotide_count("").unwrap();
        assert!(counts.values().all(|&v| v == 0));
    }

    #[test]
    fn test_invalid() {
        assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
    }

    #[test]
    fn test_fold_version() {
        let counts = nucleotide_count_fold("GATTACA").unwrap();
        assert_eq!(counts[&'A'], 3);
    }

    #[test]
    fn test_array_version() {
        let counts = nucleotide_count_array("GATTACA").unwrap();
        assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
    }
}

(* Nucleotide Count — Bioinformatics *)

module CMap = Map.Make(Char)

(* Version 1: Using Map for counting *)
let nucleotide_count dna =
  let init = List.fold_left (fun m c -> CMap.add c 0 m)
    CMap.empty ['A';'C';'G';'T'] in
  String.fold_left (fun m c ->
    match CMap.find_opt c m with
    | Some n -> CMap.add c (n + 1) m
    | None -> failwith ("invalid nucleotide: " ^ String.make 1 c)
  ) init dna

(* Version 2: Using a simple assoc list *)
let nucleotide_count_simple dna =
  let counts = [('A', ref 0); ('C', ref 0); ('G', ref 0); ('T', ref 0)] in
  String.iter (fun c ->
    (List.assoc c counts) := !(List.assoc c counts) + 1
  ) dna;
  List.map (fun (c, r) -> (c, !r)) counts

let () =
  let counts = nucleotide_count "GATTACA" in
  assert (CMap.find 'A' counts = 3);
  assert (CMap.find 'T' counts = 2)

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_gattaca() {
        let counts = nucleotide_count("GATTACA").unwrap();
        assert_eq!(counts[&'A'], 3);
        assert_eq!(counts[&'T'], 2);
        assert_eq!(counts[&'G'], 1);
        assert_eq!(counts[&'C'], 1);
    }

    #[test]
    fn test_empty() {
        let counts = nucleotide_count("").unwrap();
        assert!(counts.values().all(|&v| v == 0));
    }

    #[test]
    fn test_invalid() {
        assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
    }

    #[test]
    fn test_fold_version() {
        let counts = nucleotide_count_fold("GATTACA").unwrap();
        assert_eq!(counts[&'A'], 3);
    }

    #[test]
    fn test_array_version() {
        let counts = nucleotide_count_array("GATTACA").unwrap();
        assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
    }
}

Deep Comparison

Nucleotide Count — Comparison

Core Insight

Character frequency counting shows the difference between immutable maps (OCaml) and mutable maps (Rust). OCaml rebuilds the map on each update; Rust mutates in place. Both handle invalid input, but with different error mechanisms.

OCaml Approach

• Map.Make(Char) — persistent (immutable) map, O(log n) per update

• String.fold_left to iterate and accumulate

• failwith for invalid nucleotides

• Each update creates a new map (old one still valid)

Rust Approach

• HashMap<char, usize> — mutable, O(1) amortized per update

• get_mut(&c) for in-place mutation

• Result<HashMap, char> for error handling

• Array variant [usize; 4] avoids allocation entirely

Comparison Table

Aspect	OCaml	Rust
Map type	`Map.Make(Char)` persistent	`HashMap<char, usize>` mutable
Update	New map per insert	In-place `*n += 1`
Error	`failwith` exception	`Result<T, char>`
Performance	O(log n) per update	O(1) amortized
Zero-alloc option	No	Yes (array variant)

Learner Notes

• HashMap::from([...]) initializes from array of tuples

• get_mut returns Option<&mut V> — the mutable reference pattern

• try_fold is the Rust equivalent of OCaml fold with early exit

• Array variant shows Rust's strength: zero-allocation when structure is known

Exercises

Extend nucleotide_count to also accept lowercase a, c, g, t by normalising with .to_ascii_uppercase().

Add a gc_content(dna: &str) -> Result<f64, char> function that returns the fraction of G and C nucleotides.

Implement a parallel version using rayon::par_iter() that splits the string and merges partial counts.

Write a function complement(dna: &str) -> Result<String, char> that returns the complement strand (A↔T, G↔C).

In OCaml, implement a version using Bytes with mutable array slots indexed by Char.code for O(1) update performance.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust