ExamplesBy LevelBy TopicLearning Paths
787 Intermediate

787-edit-distance-levenshtein — Edit Distance (Levenshtein)

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "787-edit-distance-levenshtein — Edit Distance (Levenshtein)" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Edit distance (Levenshtein distance, 1966) measures how many single-character insertions, deletions, or substitutions transform one string into another. Key difference from OCaml: 1. **Algorithm**: The DP algorithm is language

Tutorial

The Problem

Edit distance (Levenshtein distance, 1966) measures how many single-character insertions, deletions, or substitutions transform one string into another. It is the backbone of spell checkers (aspell, hunspell), fuzzy string search (Elasticsearch, Redis), autocorrect (mobile keyboards), DNA sequence alignment, and natural language processing. The DP algorithm runs in O(mn) time and is one of the most practically important algorithms in computing.

🎯 Learning Outcomes

  • • Build the Levenshtein DP table with proper initialization of boundary conditions
  • • Apply the recurrence: min of delete (dp[i-1][j]+1), insert (dp[i][j-1]+1), and substitute (dp[i-1][j-1]+cost)
  • • Understand the relationship between edit distance and LCS
  • • Implement Damerau-Levenshtein (adds transpositions) and Jaro-Winkler distance
  • • Optimize from O(mn) space to O(min(m,n)) using two rolling rows
  • Code Example

    #![allow(clippy::all)]
    //! # Edit Distance (Levenshtein)
    
    pub fn edit_distance(a: &str, b: &str) -> usize {
        let (m, n) = (a.len(), b.len());
        let a: Vec<char> = a.chars().collect();
        let b: Vec<char> = b.chars().collect();
        let mut dp = vec![vec![0; n + 1]; m + 1];
    
        for i in 0..=m {
            dp[i][0] = i;
        }
        for j in 0..=n {
            dp[0][j] = j;
        }
    
        for i in 1..=m {
            for j in 1..=n {
                let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
                dp[i][j] = (dp[i - 1][j] + 1)
                    .min(dp[i][j - 1] + 1)
                    .min(dp[i - 1][j - 1] + cost);
            }
        }
        dp[m][n]
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_edit() {
            assert_eq!(edit_distance("kitten", "sitting"), 3);
        }
        #[test]
        fn test_empty() {
            assert_eq!(edit_distance("", "abc"), 3);
        }
        #[test]
        fn test_identical() {
            assert_eq!(edit_distance("abc", "abc"), 0);
        }
        #[test]
        fn test_delete() {
            assert_eq!(edit_distance("abc", "ab"), 1);
        }
    }

    Key Differences

  • Algorithm: The DP algorithm is language-independent; Rust and OCaml implementations are structurally identical.
  • Rolling optimization: Replacing the 2D table with two rows reduces memory from O(mn) to O(n); both languages implement this identically.
  • Unicode: Rust's chars() handles Unicode properly; OCaml's Bytes.get operates on bytes, requiring explicit UTF-8 handling for Unicode strings.
  • Production use: The strsim Rust crate provides Levenshtein, Damerau-Levenshtein, and Jaro-Winkler; OCaml's stringdist provides similar functionality.
  • OCaml Approach

    OCaml implements the same algorithm with Array.make_matrix. The ukkonen library provides a fast O(nd) edit distance for closely-related strings. diff libraries in OCaml use edit distance internally for line-level comparisons. The functional style can express this as a memoized recursion, but the imperative nested loop is more efficient and idiomatic for DP.

    Full Source

    #![allow(clippy::all)]
    //! # Edit Distance (Levenshtein)
    
    pub fn edit_distance(a: &str, b: &str) -> usize {
        let (m, n) = (a.len(), b.len());
        let a: Vec<char> = a.chars().collect();
        let b: Vec<char> = b.chars().collect();
        let mut dp = vec![vec![0; n + 1]; m + 1];
    
        for i in 0..=m {
            dp[i][0] = i;
        }
        for j in 0..=n {
            dp[0][j] = j;
        }
    
        for i in 1..=m {
            for j in 1..=n {
                let cost = if a[i - 1] == b[j - 1] { 0 } else { 1 };
                dp[i][j] = (dp[i - 1][j] + 1)
                    .min(dp[i][j - 1] + 1)
                    .min(dp[i - 1][j - 1] + cost);
            }
        }
        dp[m][n]
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_edit() {
            assert_eq!(edit_distance("kitten", "sitting"), 3);
        }
        #[test]
        fn test_empty() {
            assert_eq!(edit_distance("", "abc"), 3);
        }
        #[test]
        fn test_identical() {
            assert_eq!(edit_distance("abc", "abc"), 0);
        }
        #[test]
        fn test_delete() {
            assert_eq!(edit_distance("abc", "ab"), 1);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_edit() {
            assert_eq!(edit_distance("kitten", "sitting"), 3);
        }
        #[test]
        fn test_empty() {
            assert_eq!(edit_distance("", "abc"), 3);
        }
        #[test]
        fn test_identical() {
            assert_eq!(edit_distance("abc", "abc"), 0);
        }
        #[test]
        fn test_delete() {
            assert_eq!(edit_distance("abc", "ab"), 1);
        }
    }

    Deep Comparison

    OCaml vs Rust: Edit Distance Levenshtein

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Implement bounded_edit_distance(a, b, max_dist) that returns None if the edit distance exceeds max_dist, short-circuiting computation for obviously different strings.
  • Implement Damerau-Levenshtein distance that also counts adjacent character transpositions ("ab" → "ba" costs 1 instead of 2).
  • Write fuzzy_search(query: &str, candidates: &[&str], threshold: usize) -> Vec<(&str, usize)> that returns all candidates within threshold edit distance, sorted by distance.
  • Open Source Repos