787-edit-distance-levenshtein — Edit Distance (Levenshtein)
Tutorial Video
Text description (accessibility)
This video demonstrates the "787-edit-distance-levenshtein — Edit Distance (Levenshtein)" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Edit distance (Levenshtein distance, 1966) measures how many single-character insertions, deletions, or substitutions transform one string into another. Key difference from OCaml: 1. **Algorithm**: The DP algorithm is language
Tutorial
The Problem
Edit distance (Levenshtein distance, 1966) measures how many single-character insertions, deletions, or substitutions transform one string into another. It is the backbone of spell checkers (aspell, hunspell), fuzzy string search (Elasticsearch, Redis), autocorrect (mobile keyboards), DNA sequence alignment, and natural language processing. The DP algorithm runs in O(mn) time and is one of the most practically important algorithms in computing.
🎯 Learning Outcomes
dp[i-1][j]+1), insert (dp[i][j-1]+1), and substitute (dp[i-1][j-1]+cost)Code Example
#![allow(clippy::all)]
//! # Edit Distance (Levenshtein)
pub fn edit_distance(a: &str, b: &str) -> usize {
let (m, n) = (a.len(), b.len());
let a: Vec<char> = a.chars().collect();
let b: Vec<char> = b.chars().collect();
let mut dp = vec![vec![0; n + 1]; m + 1];
for i in 0..=m {
dp[i][0] = i;
}
for j in 0..=n {
dp[0][j] = j;
}
for i in 1..=m {
for j in 1..=n {
let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
dp[i][j] = (dp[i - 1][j] + 1)
.min(dp[i][j - 1] + 1)
.min(dp[i - 1][j - 1] + cost);
}
}
dp[m][n]
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_edit() {
assert_eq!(edit_distance("kitten", "sitting"), 3);
}
#[test]
fn test_empty() {
assert_eq!(edit_distance("", "abc"), 3);
}
#[test]
fn test_identical() {
assert_eq!(edit_distance("abc", "abc"), 0);
}
#[test]
fn test_delete() {
assert_eq!(edit_distance("abc", "ab"), 1);
}
}Key Differences
chars() handles Unicode properly; OCaml's Bytes.get operates on bytes, requiring explicit UTF-8 handling for Unicode strings.strsim Rust crate provides Levenshtein, Damerau-Levenshtein, and Jaro-Winkler; OCaml's stringdist provides similar functionality.OCaml Approach
OCaml implements the same algorithm with Array.make_matrix. The ukkonen library provides a fast O(nd) edit distance for closely-related strings. diff libraries in OCaml use edit distance internally for line-level comparisons. The functional style can express this as a memoized recursion, but the imperative nested loop is more efficient and idiomatic for DP.
Full Source
#![allow(clippy::all)]
//! # Edit Distance (Levenshtein)
pub fn edit_distance(a: &str, b: &str) -> usize {
let (m, n) = (a.len(), b.len());
let a: Vec<char> = a.chars().collect();
let b: Vec<char> = b.chars().collect();
let mut dp = vec![vec![0; n + 1]; m + 1];
for i in 0..=m {
dp[i][0] = i;
}
for j in 0..=n {
dp[0][j] = j;
}
for i in 1..=m {
for j in 1..=n {
let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
dp[i][j] = (dp[i - 1][j] + 1)
.min(dp[i][j - 1] + 1)
.min(dp[i - 1][j - 1] + cost);
}
}
dp[m][n]
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_edit() {
assert_eq!(edit_distance("kitten", "sitting"), 3);
}
#[test]
fn test_empty() {
assert_eq!(edit_distance("", "abc"), 3);
}
#[test]
fn test_identical() {
assert_eq!(edit_distance("abc", "abc"), 0);
}
#[test]
fn test_delete() {
assert_eq!(edit_distance("abc", "ab"), 1);
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_edit() {
assert_eq!(edit_distance("kitten", "sitting"), 3);
}
#[test]
fn test_empty() {
assert_eq!(edit_distance("", "abc"), 3);
}
#[test]
fn test_identical() {
assert_eq!(edit_distance("abc", "abc"), 0);
}
#[test]
fn test_delete() {
assert_eq!(edit_distance("abc", "ab"), 1);
}
}
Deep Comparison
OCaml vs Rust: Edit Distance Levenshtein
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.
Exercises
bounded_edit_distance(a, b, max_dist) that returns None if the edit distance exceeds max_dist, short-circuiting computation for obviously different strings."ab" → "ba" costs 1 instead of 2).fuzzy_search(query: &str, candidates: &[&str], threshold: usize) -> Vec<(&str, usize)> that returns all candidates within threshold edit distance, sorted by distance.