String Diff Pattern
Functional Programming
Tutorial
The Problem
How similar are two strings? This question drives: autocorrect (find the dictionary word closest to a typo), DNA sequence alignment (edit distance is Smith-Waterman without gaps), database fuzzy search (pg_trgm), file diff tools, and version control. Levenshtein distance solves this with dynamic programming: dp[i][j] is the edit distance between the first i chars of s and the first j chars of t. The recurrence is O(M×N) time and space, with well-known space-optimisation to O(min(M,N)).
🎯 Learning Outcomes
.min() chaining for the three-way minimumclosest using Iterator::min_by_key with Levenshtein as the keyCode Example
#![allow(clippy::all)]
// 496. String diff / edit distance
fn levenshtein(s: &str, t: &str) -> usize {
let sv: Vec<char> = s.chars().collect();
let tv: Vec<char> = t.chars().collect();
let (m, n) = (sv.len(), tv.len());
let mut dp = vec![vec![0usize; n + 1]; m + 1];
for i in 0..=m {
dp[i][0] = i;
}
for j in 0..=n {
dp[0][j] = j;
}
for i in 1..=m {
for j in 1..=n {
dp[i][j] = if sv[i - 1] == tv[j - 1] {
dp[i - 1][j - 1]
} else {
1 + dp[i - 1][j].min(dp[i][j - 1]).min(dp[i - 1][j - 1])
};
}
}
dp[m][n]
}
fn closest<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
candidates
.iter()
.min_by_key(|&&c| levenshtein(query, c))
.copied()
}
fn starts_with_ignore_case(s: &str, prefix: &str) -> bool {
s.len() >= prefix.len() && s[..prefix.len()].eq_ignore_ascii_case(prefix)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kitten() {
assert_eq!(levenshtein("kitten", "sitting"), 3);
}
#[test]
fn test_same() {
assert_eq!(levenshtein("abc", "abc"), 0);
}
#[test]
fn test_empty() {
assert_eq!(levenshtein("", "abc"), 3);
assert_eq!(levenshtein("abc", ""), 3);
}
#[test]
fn test_closest() {
assert_eq!(closest("rast", &["rust", "bust", "just"]), Some("rust"));
}
}Key Differences
Vec<char> vs. String**: Rust converts to Vec<char> for O(1) indexed character access; OCaml uses Array.of_seq (String.to_seq s) for the same purpose.min_by_key ergonomics**: Rust's Iterator::min_by_key reads as prose; OCaml's List.fold_left with explicit comparison is more verbose.Vec<char> ensures Unicode scalar values are the unit of comparison; byte-level comparison would corrupt multi-byte characters.OCaml Approach
let levenshtein s t =
let sv = Array.of_seq (String.to_seq s) in
let tv = Array.of_seq (String.to_seq t) in
let m, n = Array.length sv, Array.length tv in
let dp = Array.make_matrix (m+1) (n+1) 0 in
for i = 0 to m do dp.(i).(0) <- i done;
for j = 0 to n do dp.(0).(j) <- j done;
for i = 1 to m do for j = 1 to n do
dp.(i).(j) <- if sv.(i-1) = tv.(j-1) then dp.(i-1).(j-1)
else 1 + min dp.(i-1).(j) (min dp.(i).(j-1) dp.(i-1).(j-1))
done done;
dp.(m).(n)
OCaml's min is a 2-argument function requiring chaining; Rust's .min() is chained on the value.
Full Source
#![allow(clippy::all)]
// 496. String diff / edit distance
fn levenshtein(s: &str, t: &str) -> usize {
let sv: Vec<char> = s.chars().collect();
let tv: Vec<char> = t.chars().collect();
let (m, n) = (sv.len(), tv.len());
let mut dp = vec![vec![0usize; n + 1]; m + 1];
for i in 0..=m {
dp[i][0] = i;
}
for j in 0..=n {
dp[0][j] = j;
}
for i in 1..=m {
for j in 1..=n {
dp[i][j] = if sv[i - 1] == tv[j - 1] {
dp[i - 1][j - 1]
} else {
1 + dp[i - 1][j].min(dp[i][j - 1]).min(dp[i - 1][j - 1])
};
}
}
dp[m][n]
}
fn closest<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
candidates
.iter()
.min_by_key(|&&c| levenshtein(query, c))
.copied()
}
fn starts_with_ignore_case(s: &str, prefix: &str) -> bool {
s.len() >= prefix.len() && s[..prefix.len()].eq_ignore_ascii_case(prefix)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kitten() {
assert_eq!(levenshtein("kitten", "sitting"), 3);
}
#[test]
fn test_same() {
assert_eq!(levenshtein("abc", "abc"), 0);
}
#[test]
fn test_empty() {
assert_eq!(levenshtein("", "abc"), 3);
assert_eq!(levenshtein("abc", ""), 3);
}
#[test]
fn test_closest() {
assert_eq!(closest("rast", &["rust", "bust", "just"]), Some("rust"));
}
}
✓ Tests
Rust test suite
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kitten() {
assert_eq!(levenshtein("kitten", "sitting"), 3);
}
#[test]
fn test_same() {
assert_eq!(levenshtein("abc", "abc"), 0);
}
#[test]
fn test_empty() {
assert_eq!(levenshtein("", "abc"), 3);
assert_eq!(levenshtein("abc", ""), 3);
}
#[test]
fn test_closest() {
assert_eq!(closest("rast", &["rust", "bust", "just"]), Some("rust"));
}
}
Exercises
levenshtein using two Vec<usize> rows instead of the full matrix, reducing memory from O(M×N) to O(min(M,N)).s into t.