082 — Nucleotide Count
Tutorial Video
Text description (accessibility)
This video demonstrates the "082 — Nucleotide Count" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Count the occurrences of each DNA nucleotide (`A`, `C`, `G`, `T`) in a string, returning a `HashMap<char, usize>` or an error indicating the first invalid character. Key difference from OCaml: | Aspect | Rust | OCaml |
Tutorial
The Problem
Count the occurrences of each DNA nucleotide (A, C, G, T) in a string, returning a HashMap<char, usize> or an error indicating the first invalid character. Implement three versions — imperative loop, try_fold, and array-backed — and compare with OCaml's Map.Make(Char) and mutable ref-based approaches.
🎯 Learning Outcomes
HashMap with known keys and zero counts using array-into-iterget_mut to increment in-place without a double lookuptry_fold with the ? operator to combine accumulation and early error return[usize; 4] beats HashMap for fixed-key countingResult<T, E> early return to OCaml's failwith exceptionchars() vs bytes() for ASCII-only DNA stringsCode Example
#![allow(clippy::all)]
use std::collections::HashMap;
/// Nucleotide Count — counting character frequencies
///
/// Ownership: input DNA string is borrowed (&str).
/// Result HashMap is owned by the caller.
pub fn nucleotide_count(dna: &str) -> Result<HashMap<char, usize>, char> {
let mut counts: HashMap<char, usize> = [('A', 0), ('C', 0), ('G', 0), ('T', 0)].into();
for c in dna.chars() {
match counts.get_mut(&c) {
Some(n) => *n += 1,
None => return Err(c),
}
}
Ok(counts)
}
/// Version 2: Using fold
pub fn nucleotide_count_fold(dna: &str) -> Result<HashMap<char, usize>, char> {
dna.chars().try_fold(
[('A', 0), ('C', 0), ('G', 0), ('T', 0)]
.into_iter()
.collect::<HashMap<_, _>>(),
|mut acc, c| {
*acc.get_mut(&c).ok_or(c)? += 1;
Ok(acc)
},
)
}
/// Version 3: Using array instead of HashMap for performance
pub fn nucleotide_count_array(dna: &str) -> Result<[usize; 4], char> {
let mut counts = [0usize; 4];
for c in dna.chars() {
match c {
'A' => counts[0] += 1,
'C' => counts[1] += 1,
'G' => counts[2] += 1,
'T' => counts[3] += 1,
_ => return Err(c),
}
}
Ok(counts)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_gattaca() {
let counts = nucleotide_count("GATTACA").unwrap();
assert_eq!(counts[&'A'], 3);
assert_eq!(counts[&'T'], 2);
assert_eq!(counts[&'G'], 1);
assert_eq!(counts[&'C'], 1);
}
#[test]
fn test_empty() {
let counts = nucleotide_count("").unwrap();
assert!(counts.values().all(|&v| v == 0));
}
#[test]
fn test_invalid() {
assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
}
#[test]
fn test_fold_version() {
let counts = nucleotide_count_fold("GATTACA").unwrap();
assert_eq!(counts[&'A'], 3);
}
#[test]
fn test_array_version() {
let counts = nucleotide_count_array("GATTACA").unwrap();
assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
}
}Key Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Error return | Result<_, char> with ? | Exception via failwith |
| Map type | HashMap<char, usize> | Map.Make(Char) (balanced tree) |
| In-place update | get_mut + *n += 1 | CMap.add c (n+1) m (persistent) |
try_fold | Built into Iterator | Custom recursion or exception |
| Performance alt | [usize; 4] array | Array or Bytes |
| String iteration | .chars() | String.fold_left / String.iter |
The array version demonstrates an important optimisation pattern: when the key space is small and known, replace HashMap with an array indexed by position — O(1) access with no hashing overhead.
OCaml Approach
OCaml uses a functional map CMap = Map.Make(Char) with fold_left to initialise zero counts and String.fold_left to accumulate. find_opt returns None for invalid characters, triggering failwith. The second version uses mutable ref cells in an association list — closer to the imperative loop. OCaml lacks try_fold but achieves the same effect through exceptions, which are lightweight in OCaml and commonly used for early exit.
Full Source
#![allow(clippy::all)]
use std::collections::HashMap;
/// Nucleotide Count — counting character frequencies
///
/// Ownership: input DNA string is borrowed (&str).
/// Result HashMap is owned by the caller.
pub fn nucleotide_count(dna: &str) -> Result<HashMap<char, usize>, char> {
let mut counts: HashMap<char, usize> = [('A', 0), ('C', 0), ('G', 0), ('T', 0)].into();
for c in dna.chars() {
match counts.get_mut(&c) {
Some(n) => *n += 1,
None => return Err(c),
}
}
Ok(counts)
}
/// Version 2: Using fold
pub fn nucleotide_count_fold(dna: &str) -> Result<HashMap<char, usize>, char> {
dna.chars().try_fold(
[('A', 0), ('C', 0), ('G', 0), ('T', 0)]
.into_iter()
.collect::<HashMap<_, _>>(),
|mut acc, c| {
*acc.get_mut(&c).ok_or(c)? += 1;
Ok(acc)
},
)
}
/// Version 3: Using array instead of HashMap for performance
pub fn nucleotide_count_array(dna: &str) -> Result<[usize; 4], char> {
let mut counts = [0usize; 4];
for c in dna.chars() {
match c {
'A' => counts[0] += 1,
'C' => counts[1] += 1,
'G' => counts[2] += 1,
'T' => counts[3] += 1,
_ => return Err(c),
}
}
Ok(counts)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_gattaca() {
let counts = nucleotide_count("GATTACA").unwrap();
assert_eq!(counts[&'A'], 3);
assert_eq!(counts[&'T'], 2);
assert_eq!(counts[&'G'], 1);
assert_eq!(counts[&'C'], 1);
}
#[test]
fn test_empty() {
let counts = nucleotide_count("").unwrap();
assert!(counts.values().all(|&v| v == 0));
}
#[test]
fn test_invalid() {
assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
}
#[test]
fn test_fold_version() {
let counts = nucleotide_count_fold("GATTACA").unwrap();
assert_eq!(counts[&'A'], 3);
}
#[test]
fn test_array_version() {
let counts = nucleotide_count_array("GATTACA").unwrap();
assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_gattaca() {
let counts = nucleotide_count("GATTACA").unwrap();
assert_eq!(counts[&'A'], 3);
assert_eq!(counts[&'T'], 2);
assert_eq!(counts[&'G'], 1);
assert_eq!(counts[&'C'], 1);
}
#[test]
fn test_empty() {
let counts = nucleotide_count("").unwrap();
assert!(counts.values().all(|&v| v == 0));
}
#[test]
fn test_invalid() {
assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
}
#[test]
fn test_fold_version() {
let counts = nucleotide_count_fold("GATTACA").unwrap();
assert_eq!(counts[&'A'], 3);
}
#[test]
fn test_array_version() {
let counts = nucleotide_count_array("GATTACA").unwrap();
assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
}
}
Deep Comparison
Nucleotide Count — Comparison
Core Insight
Character frequency counting shows the difference between immutable maps (OCaml) and mutable maps (Rust). OCaml rebuilds the map on each update; Rust mutates in place. Both handle invalid input, but with different error mechanisms.
OCaml Approach
Map.Make(Char) — persistent (immutable) map, O(log n) per updateString.fold_left to iterate and accumulatefailwith for invalid nucleotidesRust Approach
HashMap<char, usize> — mutable, O(1) amortized per updateget_mut(&c) for in-place mutationResult<HashMap, char> for error handling[usize; 4] avoids allocation entirelyComparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Map type | Map.Make(Char) persistent | HashMap<char, usize> mutable |
| Update | New map per insert | In-place *n += 1 |
| Error | failwith exception | Result<T, char> |
| Performance | O(log n) per update | O(1) amortized |
| Zero-alloc option | No | Yes (array variant) |
Learner Notes
HashMap::from([...]) initializes from array of tuplesget_mut returns Option<&mut V> — the mutable reference patterntry_fold is the Rust equivalent of OCaml fold with early exitExercises
nucleotide_count to also accept lowercase a, c, g, t by normalising with .to_ascii_uppercase().gc_content(dna: &str) -> Result<f64, char> function that returns the fraction of G and C nucleotides.rayon::par_iter() that splits the string and merges partial counts.complement(dna: &str) -> Result<String, char> that returns the complement strand (A↔T, G↔C).Bytes with mutable array slots indexed by Char.code for O(1) update performance.