814-kmp-pattern-matching — KMP Pattern Matching
Tutorial Video
Text description (accessibility)
This video demonstrates the "814-kmp-pattern-matching — KMP Pattern Matching" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Knuth-Morris-Pratt (1977) finds all occurrences of a pattern in a text in O(n+m) time, avoiding redundant comparisons. Key difference from OCaml: 1. **LPS computation**: Both languages implement the same KMP failure function loop; the logic is language
Tutorial
The Problem
Knuth-Morris-Pratt (1977) finds all occurrences of a pattern in a text in O(n+m) time, avoiding redundant comparisons. Naive search backtracks to the beginning on mismatch; KMP uses the Failure Function (LPS array) to skip ahead. It is used in grep, text editors (search/replace), network intrusion detection systems (signature matching), and bioinformatics (exact sequence matching). KMP is the foundational substring search algorithm that all others build upon.
🎯 Learning Outcomes
kmp_search returning all starting positions of pattern occurrenceslps[j-1] to skip known matchesCode Example
#![allow(clippy::all)]
//! # KMP Pattern Matching
pub fn compute_lps(pattern: &str) -> Vec<usize> {
let p: Vec<char> = pattern.chars().collect();
let m = p.len();
let mut lps = vec![0; m];
let mut len = 0;
let mut i = 1;
while i < m {
if p[i] == p[len] {
len += 1;
lps[i] = len;
i += 1;
} else if len != 0 {
len = lps[len - 1];
} else {
lps[i] = 0;
i += 1;
}
}
lps
}
pub fn kmp_search(text: &str, pattern: &str) -> Vec<usize> {
let t: Vec<char> = text.chars().collect();
let p: Vec<char> = pattern.chars().collect();
let (n, m) = (t.len(), p.len());
if m == 0 {
return vec![];
}
let lps = compute_lps(pattern);
let mut results = vec![];
let (mut i, mut j) = (0, 0);
while i < n {
if t[i] == p[j] {
i += 1;
j += 1;
}
if j == m {
results.push(i - j);
j = lps[j - 1];
} else if i < n && t[i] != p[j] {
if j != 0 {
j = lps[j - 1];
} else {
i += 1;
}
}
}
results
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kmp() {
assert_eq!(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"), vec![10]);
}
}Key Differences
Vec<char> for indexed access (O(1) per char); OCaml's String.get is O(1) for bytes; both handle ASCII correctly.Vec<usize> / int list of starting positions; the jump after match (lps[m-1]) enables overlapping matches.memchr crate uses SIMD-accelerated byte search for single characters; KMP is used for multi-character patterns.OCaml Approach
OCaml implements KMP with Array.make m 0 for LPS and Buffer.t or direct position list for results. The recursive functional style: let rec fill i len = ... computes LPS cleanly. OCaml's String.get provides O(1) character access. The Re.execp function in the re library uses a different but related approach for regex matching. OCaml's Str module uses NFA-based matching.
Full Source
#![allow(clippy::all)]
//! # KMP Pattern Matching
pub fn compute_lps(pattern: &str) -> Vec<usize> {
let p: Vec<char> = pattern.chars().collect();
let m = p.len();
let mut lps = vec![0; m];
let mut len = 0;
let mut i = 1;
while i < m {
if p[i] == p[len] {
len += 1;
lps[i] = len;
i += 1;
} else if len != 0 {
len = lps[len - 1];
} else {
lps[i] = 0;
i += 1;
}
}
lps
}
pub fn kmp_search(text: &str, pattern: &str) -> Vec<usize> {
let t: Vec<char> = text.chars().collect();
let p: Vec<char> = pattern.chars().collect();
let (n, m) = (t.len(), p.len());
if m == 0 {
return vec![];
}
let lps = compute_lps(pattern);
let mut results = vec![];
let (mut i, mut j) = (0, 0);
while i < n {
if t[i] == p[j] {
i += 1;
j += 1;
}
if j == m {
results.push(i - j);
j = lps[j - 1];
} else if i < n && t[i] != p[j] {
if j != 0 {
j = lps[j - 1];
} else {
i += 1;
}
}
}
results
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kmp() {
assert_eq!(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"), vec![10]);
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_kmp() {
assert_eq!(kmp_search("ABABDABACDABABCABAB", "ABABCABAB"), vec![10]);
}
}
Deep Comparison
OCaml vs Rust: Kmp Pattern Matching
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.
Exercises
kmp_count(text, pattern) -> usize that counts non-overlapping occurrences efficiently.pattern and use it to determine if pattern has a period p = m - lps[m-1].