070 Fundamental

070 — Hamming Distance

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "070 — Hamming Distance" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Hamming distance counts the number of positions at which two equal-length strings differ. Key difference from OCaml: 1. **`fold_left2` vs `zip`**: OCaml's `List.fold_left2` processes two lists simultaneously. Rust's `zip` creates an iterator of pairs, then processes with `filter + count` or `fold`. Both are O(n).

Tutorial

The Problem

Hamming distance counts the number of positions at which two equal-length strings differ. Named after Richard Hamming, who introduced it in his landmark 1950 paper on error-detecting and error-correcting codes, the concept was foundational for reliable digital communication. A code with minimum Hamming distance d can detect up to d-1 bit errors and correct up to floor((d-1)/2) errors — the foundation of Hamming codes and modern ECC memory.

Applications span many fields: DNA sequence comparison in bioinformatics measures how far two genetic strands have diverged through mutation; error-correcting codes in storage and networking (Hamming codes, Reed-Solomon) rely on Hamming distance to detect and repair corruption; nearest-neighbor search in machine learning uses it to find similar binary feature vectors in LSH (locality-sensitive hashing); and cryptographic protocols analyze password similarity to enforce minimum edit distance policies. The zip + filter + count pipeline it demonstrates is one of the most fundamental iterator composition patterns in Rust, applicable whenever you need to compare two sequences element-by-element.

🎯 Learning Outcomes

• Use .zip() to pair corresponding elements from two iterators into a stream of tuples

• Use .filter(|(a, b)| a != b) to select only the mismatched pairs before counting

• Return Result<usize, String> when inputs must have equal length, signaling the precondition violation explicitly

• Implement the same logic using both filter().count() and fold to compare styles

• Use byte-level comparison (as_bytes()) for ASCII performance, bypassing UTF-8 decoding

• Understand the connection to error-correcting codes and binary distance metrics

Code Example

#![allow(clippy::all)]
/// Hamming Distance
///
/// Count positional differences between two equal-length strings.
/// A clean example of zip + filter + count pattern in both languages.

/// Using zip + filter + count — idiomatic Rust.
pub fn hamming_distance(s1: &str, s2: &str) -> Result<usize, String> {
    if s1.len() != s2.len() {
        return Err("strands must be of equal length".to_string());
    }
    Ok(s1.chars().zip(s2.chars()).filter(|(a, b)| a != b).count())
}

/// Using fold — mirrors OCaml's fold_left2 approach.
pub fn hamming_fold(s1: &str, s2: &str) -> Result<usize, String> {
    if s1.len() != s2.len() {
        return Err("strands must be of equal length".to_string());
    }
    Ok(s1
        .chars()
        .zip(s2.chars())
        .fold(0, |acc, (a, b)| if a != b { acc + 1 } else { acc }))
}

/// Byte-level comparison — faster for ASCII strings.
pub fn hamming_bytes(s1: &[u8], s2: &[u8]) -> Result<usize, String> {
    if s1.len() != s2.len() {
        return Err("strands must be of equal length".to_string());
    }
    Ok(s1.iter().zip(s2.iter()).filter(|(a, b)| a != b).count())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_hamming_basic() {
        assert_eq!(
            hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"),
            Ok(7)
        );
    }

    #[test]
    fn test_identical() {
        assert_eq!(hamming_distance("AAAA", "AAAA"), Ok(0));
    }

    #[test]
    fn test_completely_different() {
        assert_eq!(hamming_distance("AAAA", "TTTT"), Ok(4));
    }

    #[test]
    fn test_unequal_length() {
        assert!(hamming_distance("AA", "AAA").is_err());
    }

    #[test]
    fn test_empty() {
        assert_eq!(hamming_distance("", ""), Ok(0));
    }

    #[test]
    fn test_fold_variant() {
        assert_eq!(
            hamming_fold("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"),
            Ok(7)
        );
    }

    #[test]
    fn test_single_char() {
        assert_eq!(hamming_distance("A", "G"), Ok(1));
        assert_eq!(hamming_distance("A", "A"), Ok(0));
    }
}

exception Invalid_argument of string

let chars_of_string s =
  List.init (String.length s) (String.get s)

let hamming_distance s1 s2 =
  if String.length s1 <> String.length s2 then
    raise (Invalid_argument "strands must be of equal length");
  List.combine (chars_of_string s1) (chars_of_string s2)
  |> List.filter (fun (a, b) -> a <> b)
  |> List.length

let hamming_fold s1 s2 =
  if String.length s1 <> String.length s2 then
    raise (Invalid_argument "strands must be of equal length");
  List.fold_left2
    (fun acc c1 c2 -> if c1 <> c2 then acc + 1 else acc)
    0
    (chars_of_string s1)
    (chars_of_string s2)

let () =
  assert (hamming_distance "GAGCCTACTAACGGGAT" "CATCGTAATGACGGCCT" = 7);
  assert (hamming_distance "AAAA" "AAAA" = 0);
  assert (hamming_fold "AAAA" "TTTT" = 4);
  print_endline "All assertions passed."

Key Differences

**fold_left2 vs zip**: OCaml's List.fold_left2 processes two lists simultaneously. Rust's zip creates an iterator of pairs, then processes with filter + count or fold. Both are O(n).

Error type: Both return Result for the length mismatch. Rust: Result<usize, String>. OCaml: (int, string) result. The pattern is identical.

Byte vs char: Rust's as_bytes() comparison is O(n) without Unicode overhead. OCaml's String.get returns bytes too. Both should use byte comparison for ASCII input.

**chars().zip(chars())**: Creates two Chars iterators that advance in sync. Rust's zip is lazy and stops at the shorter iterator — the earlier length check ensures they are equal.

OCaml Approach

OCaml uses String.fold_left2 (OCaml 4.14+) to visit corresponding characters:

let hamming s1 s2 =
  if String.length s1 <> String.length s2
  then Error "unequal lengths"
  else Ok (String.fold_left2
             (fun acc c1 c2 -> if c1 = c2 then acc else acc + 1)
             0 s1 s2)

Earlier versions convert to List.of_seq and use List.fold_left2. OCaml's String is byte-indexed so characters are always single bytes — no Unicode code-point complexity. The error result type is (int, string) result, matching Rust's Result<usize, String> structurally.

Full Source

#![allow(clippy::all)]
/// Hamming Distance
///
/// Count positional differences between two equal-length strings.
/// A clean example of zip + filter + count pattern in both languages.

/// Using zip + filter + count — idiomatic Rust.
pub fn hamming_distance(s1: &str, s2: &str) -> Result<usize, String> {
    if s1.len() != s2.len() {
        return Err("strands must be of equal length".to_string());
    }
    Ok(s1.chars().zip(s2.chars()).filter(|(a, b)| a != b).count())
}

/// Using fold — mirrors OCaml's fold_left2 approach.
pub fn hamming_fold(s1: &str, s2: &str) -> Result<usize, String> {
    if s1.len() != s2.len() {
        return Err("strands must be of equal length".to_string());
    }
    Ok(s1
        .chars()
        .zip(s2.chars())
        .fold(0, |acc, (a, b)| if a != b { acc + 1 } else { acc }))
}

/// Byte-level comparison — faster for ASCII strings.
pub fn hamming_bytes(s1: &[u8], s2: &[u8]) -> Result<usize, String> {
    if s1.len() != s2.len() {
        return Err("strands must be of equal length".to_string());
    }
    Ok(s1.iter().zip(s2.iter()).filter(|(a, b)| a != b).count())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_hamming_basic() {
        assert_eq!(
            hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"),
            Ok(7)
        );
    }

    #[test]
    fn test_identical() {
        assert_eq!(hamming_distance("AAAA", "AAAA"), Ok(0));
    }

    #[test]
    fn test_completely_different() {
        assert_eq!(hamming_distance("AAAA", "TTTT"), Ok(4));
    }

    #[test]
    fn test_unequal_length() {
        assert!(hamming_distance("AA", "AAA").is_err());
    }

    #[test]
    fn test_empty() {
        assert_eq!(hamming_distance("", ""), Ok(0));
    }

    #[test]
    fn test_fold_variant() {
        assert_eq!(
            hamming_fold("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"),
            Ok(7)
        );
    }

    #[test]
    fn test_single_char() {
        assert_eq!(hamming_distance("A", "G"), Ok(1));
        assert_eq!(hamming_distance("A", "A"), Ok(0));
    }
}

exception Invalid_argument of string

let chars_of_string s =
  List.init (String.length s) (String.get s)

let hamming_distance s1 s2 =
  if String.length s1 <> String.length s2 then
    raise (Invalid_argument "strands must be of equal length");
  List.combine (chars_of_string s1) (chars_of_string s2)
  |> List.filter (fun (a, b) -> a <> b)
  |> List.length

let hamming_fold s1 s2 =
  if String.length s1 <> String.length s2 then
    raise (Invalid_argument "strands must be of equal length");
  List.fold_left2
    (fun acc c1 c2 -> if c1 <> c2 then acc + 1 else acc)
    0
    (chars_of_string s1)
    (chars_of_string s2)

let () =
  assert (hamming_distance "GAGCCTACTAACGGGAT" "CATCGTAATGACGGCCT" = 7);
  assert (hamming_distance "AAAA" "AAAA" = 0);
  assert (hamming_fold "AAAA" "TTTT" = 4);
  print_endline "All assertions passed."

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_hamming_basic() {
        assert_eq!(
            hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"),
            Ok(7)
        );
    }

    #[test]
    fn test_identical() {
        assert_eq!(hamming_distance("AAAA", "AAAA"), Ok(0));
    }

    #[test]
    fn test_completely_different() {
        assert_eq!(hamming_distance("AAAA", "TTTT"), Ok(4));
    }

    #[test]
    fn test_unequal_length() {
        assert!(hamming_distance("AA", "AAA").is_err());
    }

    #[test]
    fn test_empty() {
        assert_eq!(hamming_distance("", ""), Ok(0));
    }

    #[test]
    fn test_fold_variant() {
        assert_eq!(
            hamming_fold("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"),
            Ok(7)
        );
    }

    #[test]
    fn test_single_char() {
        assert_eq!(hamming_distance("A", "G"), Ok(1));
        assert_eq!(hamming_distance("A", "A"), Ok(0));
    }
}

Deep Comparison

Hamming Distance: OCaml vs Rust

The Core Insight

Hamming distance is a perfect showcase for the zip-filter-count pipeline pattern. Both languages express it elegantly, but Rust's lazy iterators avoid the intermediate list allocations that OCaml's eager List.combine creates.

OCaml Approach

OCaml's approach first converts strings to character lists with chars_of_string, then uses List.combine to pair corresponding characters, List.filter to keep mismatches, and List.length to count them. Each step creates a new list. The fold_left2 variant avoids intermediate lists by folding over two lists simultaneously. Error handling uses exceptions (raise Invalid_argument).

Rust Approach

Rust's s1.chars().zip(s2.chars()).filter(|(a, b)| a != b).count() is a single lazy pipeline — no intermediate collections are allocated. Each iterator adapter transforms the stream without materializing it. Error handling uses Result<usize, String>, forcing callers to handle the error case at compile time. The byte-level variant (&[u8]) avoids UTF-8 decoding overhead for ASCII strings.

Side-by-Side

Concept	OCaml	Rust
String to chars	`chars_of_string` (custom)	`.chars()` (built-in iterator)
Zip	`List.combine` (eager, allocates)	`.zip()` (lazy, zero allocation)
Filter	`List.filter` (eager)	`.filter()` (lazy)
Count	`List.length`	`.count()`
Errors	`raise (Invalid_argument ...)`	`Err("...".to_string())`
Fold variant	`List.fold_left2`	`.zip().fold(...)`

What Rust Learners Should Notice

• Rust iterators are lazy — .zip().filter().count() makes a single pass with zero allocation, compared to OCaml's multiple intermediate lists

• Result<T, E> in the return type forces callers to handle the unequal-length case — no hidden exceptions

• .chars() iterates over Unicode scalar values; for ASCII-only strings, working with .bytes() or &[u8] is faster

• The tuple destructuring |(a, b)| in closures mirrors OCaml's (fun (a, b) -> ...)

• Pattern: iter().zip().filter().count() is one of Rust's most common and powerful iterator patterns

Exercises

Maximum in population: Given a list of DNA strands, find the pair with the maximum Hamming distance. Use combinations from example 026 or nested loops.

Error correction: Given a received codeword and a set of valid codewords, find the closest valid codeword by minimum Hamming distance. This is the basis of hard-decision decoding.

XOR shortcut: For binary strings (only '0' and '1'), implement Hamming distance using XOR: s1.bytes().zip(s2.bytes()).filter(|(a, b)| a ^ b != 0).count(). How does this compare for performance?

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust

Tutorial Video

Tutorial

The Problem

🎯 Learning Outcomes

Code Example

Key Differences

OCaml Approach

Full Source

Deep Comparison

Hamming Distance: OCaml vs Rust

The Core Insight

OCaml Approach

Rust Approach

Side-by-Side

What Rust Learners Should Notice

Further Reading

Exercises

Open Source Repos