ExamplesBy LevelBy TopicLearning Paths
814 Intermediate

814-kmp-pattern-matching — KMP Pattern Matching

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "814-kmp-pattern-matching — KMP Pattern Matching" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Knuth-Morris-Pratt (1977) finds all occurrences of a pattern in a text in O(n+m) time, avoiding redundant comparisons. Key difference from OCaml: 1. **LPS computation**: Both languages implement the same KMP failure function loop; the logic is language

Tutorial

The Problem

Knuth-Morris-Pratt (1977) finds all occurrences of a pattern in a text in O(n+m) time, avoiding redundant comparisons. Naive search backtracks to the beginning on mismatch; KMP uses the Failure Function (LPS array) to skip ahead. It is used in grep, text editors (search/replace), network intrusion detection systems (signature matching), and bioinformatics (exact sequence matching). KMP is the foundational substring search algorithm that all others build upon.

🎯 Learning Outcomes

  • • Compute the LPS (Longest Proper Prefix-Suffix) array for a pattern in O(m)
  • • Use the LPS array to avoid re-examining characters in the text
  • • Implement kmp_search returning all starting positions of pattern occurrences
  • • Understand the invariant: at each mismatch, jump to lps[j-1] to skip known matches
  • • Compare with naive O(nm) search and Boyer-Moore O(n/m) average
  • Code Example

    #![allow(clippy::all)]
    //! # KMP Pattern Matching
    pub fn compute_lps(pattern: &str) -> Vec<usize> {
        let p: Vec<char> = pattern.chars().collect();
        let m = p.len();
        let mut lps = vec![0; m];
        let mut len = 0;
        let mut i = 1;
        while i < m {
            if p[i] == p[len] {
                len += 1;
                lps[i] = len;
                i += 1;
            } else if len != 0 {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i += 1;
            }
        }
        lps
    }
    pub fn kmp_search(text: &str, pattern: &str) -> Vec<usize> {
        let t: Vec<char> = text.chars().collect();
        let p: Vec<char> = pattern.chars().collect();
        let (n, m) = (t.len(), p.len());
        if m == 0 {
            return vec![];
        }
        let lps = compute_lps(pattern);
        let mut results = vec![];
        let (mut i, mut j) = (0, 0);
        while i < n {
            if t[i] == p[j] {
                i += 1;
                j += 1;
            }
            if j == m {
                results.push(i - j);
                j = lps[j - 1];
            } else if i < n && t[i] != p[j] {
                if j != 0 {
                    j = lps[j - 1];
                } else {
                    i += 1;
                }
            }
        }
        results
    }
    #[cfg(test)]
    mod tests {
        use super::*;
        #[test]
        fn test_kmp() {
            assert_eq!(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"), vec![10]);
        }
    }

    Key Differences

  • LPS computation: Both languages implement the same KMP failure function loop; the logic is language-independent.
  • String iteration: Rust collects to Vec<char> for indexed access (O(1) per char); OCaml's String.get is O(1) for bytes; both handle ASCII correctly.
  • Multiple matches: Both return Vec<usize> / int list of starting positions; the jump after match (lps[m-1]) enables overlapping matches.
  • Production use: Rust's memchr crate uses SIMD-accelerated byte search for single characters; KMP is used for multi-character patterns.
  • OCaml Approach

    OCaml implements KMP with Array.make m 0 for LPS and Buffer.t or direct position list for results. The recursive functional style: let rec fill i len = ... computes LPS cleanly. OCaml's String.get provides O(1) character access. The Re.execp function in the re library uses a different but related approach for regex matching. OCaml's Str module uses NFA-based matching.

    Full Source

    #![allow(clippy::all)]
    //! # KMP Pattern Matching
    pub fn compute_lps(pattern: &str) -> Vec<usize> {
        let p: Vec<char> = pattern.chars().collect();
        let m = p.len();
        let mut lps = vec![0; m];
        let mut len = 0;
        let mut i = 1;
        while i < m {
            if p[i] == p[len] {
                len += 1;
                lps[i] = len;
                i += 1;
            } else if len != 0 {
                len = lps[len - 1];
            } else {
                lps[i] = 0;
                i += 1;
            }
        }
        lps
    }
    pub fn kmp_search(text: &str, pattern: &str) -> Vec<usize> {
        let t: Vec<char> = text.chars().collect();
        let p: Vec<char> = pattern.chars().collect();
        let (n, m) = (t.len(), p.len());
        if m == 0 {
            return vec![];
        }
        let lps = compute_lps(pattern);
        let mut results = vec![];
        let (mut i, mut j) = (0, 0);
        while i < n {
            if t[i] == p[j] {
                i += 1;
                j += 1;
            }
            if j == m {
                results.push(i - j);
                j = lps[j - 1];
            } else if i < n && t[i] != p[j] {
                if j != 0 {
                    j = lps[j - 1];
                } else {
                    i += 1;
                }
            }
        }
        results
    }
    #[cfg(test)]
    mod tests {
        use super::*;
        #[test]
        fn test_kmp() {
            assert_eq!(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"), vec![10]);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
        #[test]
        fn test_kmp() {
            assert_eq!(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"), vec![10]);
        }
    }

    Deep Comparison

    OCaml vs Rust: Kmp Pattern Matching

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Implement kmp_count(text, pattern) -> usize that counts non-overlapping occurrences efficiently.
  • Extend to 2D pattern matching: search for a 2D pattern in a 2D text by running KMP on each row to find column matches, then KMP on columns.
  • Use KMP to find the shortest string period: compute LPS of pattern and use it to determine if pattern has a period p = m - lps[m-1].
  • Open Source Repos