ExamplesBy LevelBy TopicLearning Paths
082 Fundamental

082 — Nucleotide Count

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "082 — Nucleotide Count" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Count the occurrences of each DNA nucleotide (`A`, `C`, `G`, `T`) in a string, returning a `HashMap<char, usize>` or an error indicating the first invalid character. Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

Count the occurrences of each DNA nucleotide (A, C, G, T) in a string, returning a HashMap<char, usize> or an error indicating the first invalid character. Implement three versions — imperative loop, try_fold, and array-backed — and compare with OCaml's Map.Make(Char) and mutable ref-based approaches.

🎯 Learning Outcomes

  • • Initialise a HashMap with known keys and zero counts using array-into-iter
  • • Use get_mut to increment in-place without a double lookup
  • • Apply try_fold with the ? operator to combine accumulation and early error return
  • • Recognise when an array [usize; 4] beats HashMap for fixed-key counting
  • • Map Rust's Result<T, E> early return to OCaml's failwith exception
  • • Understand chars() vs bytes() for ASCII-only DNA strings
  • Code Example

    #![allow(clippy::all)]
    use std::collections::HashMap;
    
    /// Nucleotide Count — counting character frequencies
    ///
    /// Ownership: input DNA string is borrowed (&str).
    /// Result HashMap is owned by the caller.
    
    pub fn nucleotide_count(dna: &str) -> Result<HashMap<char, usize>, char> {
        let mut counts: HashMap<char, usize> = [('A', 0), ('C', 0), ('G', 0), ('T', 0)].into();
    
        for c in dna.chars() {
            match counts.get_mut(&c) {
                Some(n) => *n += 1,
                None => return Err(c),
            }
        }
        Ok(counts)
    }
    
    /// Version 2: Using fold
    pub fn nucleotide_count_fold(dna: &str) -> Result<HashMap<char, usize>, char> {
        dna.chars().try_fold(
            [('A', 0), ('C', 0), ('G', 0), ('T', 0)]
                .into_iter()
                .collect::<HashMap<_, _>>(),
            |mut acc, c| {
                *acc.get_mut(&c).ok_or(c)? += 1;
                Ok(acc)
            },
        )
    }
    
    /// Version 3: Using array instead of HashMap for performance
    pub fn nucleotide_count_array(dna: &str) -> Result<[usize; 4], char> {
        let mut counts = [0usize; 4];
        for c in dna.chars() {
            match c {
                'A' => counts[0] += 1,
                'C' => counts[1] += 1,
                'G' => counts[2] += 1,
                'T' => counts[3] += 1,
                _ => return Err(c),
            }
        }
        Ok(counts)
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_gattaca() {
            let counts = nucleotide_count("GATTACA").unwrap();
            assert_eq!(counts[&'A'], 3);
            assert_eq!(counts[&'T'], 2);
            assert_eq!(counts[&'G'], 1);
            assert_eq!(counts[&'C'], 1);
        }
    
        #[test]
        fn test_empty() {
            let counts = nucleotide_count("").unwrap();
            assert!(counts.values().all(|&v| v == 0));
        }
    
        #[test]
        fn test_invalid() {
            assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
        }
    
        #[test]
        fn test_fold_version() {
            let counts = nucleotide_count_fold("GATTACA").unwrap();
            assert_eq!(counts[&'A'], 3);
        }
    
        #[test]
        fn test_array_version() {
            let counts = nucleotide_count_array("GATTACA").unwrap();
            assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
        }
    }

    Key Differences

    AspectRustOCaml
    Error returnResult<_, char> with ?Exception via failwith
    Map typeHashMap<char, usize>Map.Make(Char) (balanced tree)
    In-place updateget_mut + *n += 1CMap.add c (n+1) m (persistent)
    try_foldBuilt into IteratorCustom recursion or exception
    Performance alt[usize; 4] arrayArray or Bytes
    String iteration.chars()String.fold_left / String.iter

    The array version demonstrates an important optimisation pattern: when the key space is small and known, replace HashMap with an array indexed by position — O(1) access with no hashing overhead.

    OCaml Approach

    OCaml uses a functional map CMap = Map.Make(Char) with fold_left to initialise zero counts and String.fold_left to accumulate. find_opt returns None for invalid characters, triggering failwith. The second version uses mutable ref cells in an association list — closer to the imperative loop. OCaml lacks try_fold but achieves the same effect through exceptions, which are lightweight in OCaml and commonly used for early exit.

    Full Source

    #![allow(clippy::all)]
    use std::collections::HashMap;
    
    /// Nucleotide Count — counting character frequencies
    ///
    /// Ownership: input DNA string is borrowed (&str).
    /// Result HashMap is owned by the caller.
    
    pub fn nucleotide_count(dna: &str) -> Result<HashMap<char, usize>, char> {
        let mut counts: HashMap<char, usize> = [('A', 0), ('C', 0), ('G', 0), ('T', 0)].into();
    
        for c in dna.chars() {
            match counts.get_mut(&c) {
                Some(n) => *n += 1,
                None => return Err(c),
            }
        }
        Ok(counts)
    }
    
    /// Version 2: Using fold
    pub fn nucleotide_count_fold(dna: &str) -> Result<HashMap<char, usize>, char> {
        dna.chars().try_fold(
            [('A', 0), ('C', 0), ('G', 0), ('T', 0)]
                .into_iter()
                .collect::<HashMap<_, _>>(),
            |mut acc, c| {
                *acc.get_mut(&c).ok_or(c)? += 1;
                Ok(acc)
            },
        )
    }
    
    /// Version 3: Using array instead of HashMap for performance
    pub fn nucleotide_count_array(dna: &str) -> Result<[usize; 4], char> {
        let mut counts = [0usize; 4];
        for c in dna.chars() {
            match c {
                'A' => counts[0] += 1,
                'C' => counts[1] += 1,
                'G' => counts[2] += 1,
                'T' => counts[3] += 1,
                _ => return Err(c),
            }
        }
        Ok(counts)
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_gattaca() {
            let counts = nucleotide_count("GATTACA").unwrap();
            assert_eq!(counts[&'A'], 3);
            assert_eq!(counts[&'T'], 2);
            assert_eq!(counts[&'G'], 1);
            assert_eq!(counts[&'C'], 1);
        }
    
        #[test]
        fn test_empty() {
            let counts = nucleotide_count("").unwrap();
            assert!(counts.values().all(|&v| v == 0));
        }
    
        #[test]
        fn test_invalid() {
            assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
        }
    
        #[test]
        fn test_fold_version() {
            let counts = nucleotide_count_fold("GATTACA").unwrap();
            assert_eq!(counts[&'A'], 3);
        }
    
        #[test]
        fn test_array_version() {
            let counts = nucleotide_count_array("GATTACA").unwrap();
            assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_gattaca() {
            let counts = nucleotide_count("GATTACA").unwrap();
            assert_eq!(counts[&'A'], 3);
            assert_eq!(counts[&'T'], 2);
            assert_eq!(counts[&'G'], 1);
            assert_eq!(counts[&'C'], 1);
        }
    
        #[test]
        fn test_empty() {
            let counts = nucleotide_count("").unwrap();
            assert!(counts.values().all(|&v| v == 0));
        }
    
        #[test]
        fn test_invalid() {
            assert_eq!(nucleotide_count("GATTXCA"), Err('X'));
        }
    
        #[test]
        fn test_fold_version() {
            let counts = nucleotide_count_fold("GATTACA").unwrap();
            assert_eq!(counts[&'A'], 3);
        }
    
        #[test]
        fn test_array_version() {
            let counts = nucleotide_count_array("GATTACA").unwrap();
            assert_eq!(counts, [3, 1, 1, 2]); // A, C, G, T
        }
    }

    Deep Comparison

    Nucleotide Count — Comparison

    Core Insight

    Character frequency counting shows the difference between immutable maps (OCaml) and mutable maps (Rust). OCaml rebuilds the map on each update; Rust mutates in place. Both handle invalid input, but with different error mechanisms.

    OCaml Approach

  • Map.Make(Char) — persistent (immutable) map, O(log n) per update
  • String.fold_left to iterate and accumulate
  • failwith for invalid nucleotides
  • • Each update creates a new map (old one still valid)
  • Rust Approach

  • HashMap<char, usize> — mutable, O(1) amortized per update
  • get_mut(&c) for in-place mutation
  • Result<HashMap, char> for error handling
  • • Array variant [usize; 4] avoids allocation entirely
  • Comparison Table

    AspectOCamlRust
    Map typeMap.Make(Char) persistentHashMap<char, usize> mutable
    UpdateNew map per insertIn-place *n += 1
    Errorfailwith exceptionResult<T, char>
    PerformanceO(log n) per updateO(1) amortized
    Zero-alloc optionNoYes (array variant)

    Learner Notes

  • HashMap::from([...]) initializes from array of tuples
  • get_mut returns Option<&mut V> — the mutable reference pattern
  • try_fold is the Rust equivalent of OCaml fold with early exit
  • • Array variant shows Rust's strength: zero-allocation when structure is known
  • Exercises

  • Extend nucleotide_count to also accept lowercase a, c, g, t by normalising with .to_ascii_uppercase().
  • Add a gc_content(dna: &str) -> Result<f64, char> function that returns the fraction of G and C nucleotides.
  • Implement a parallel version using rayon::par_iter() that splits the string and merges partial counts.
  • Write a function complement(dna: &str) -> Result<String, char> that returns the complement strand (A↔T, G↔C).
  • In OCaml, implement a version using Bytes with mutable array slots indexed by Char.code for O(1) update performance.
  • Open Source Repos