Boyer-Moore String Search
Tutorial Video
Text description (accessibility)
This video demonstrates the "Boyer-Moore String Search" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Naive string search checks every position in the text, yielding O(n*m) worst-case time. Key difference from OCaml: | Aspect | Rust | OCaml |
Tutorial
The Problem
Naive string search checks every position in the text, yielding O(n*m) worst-case time. For long texts with long patterns this is prohibitively slow. Real-world applications — log scanning, virus signature detection, text editors, DNA sequence analysis — require sublinear average-case search. Boyer-Moore achieves this by using two heuristics: the bad-character rule skips large chunks of text when a mismatch occurs at the wrong character position, and the good-suffix rule leverages known pattern structure to skip even further. The result is often O(n/m) average-case performance, making Boyer-Moore the algorithm behind grep and many production text search tools.
🎯 Learning Outcomes
Code Example
#![allow(clippy::all)]
//! # Boyer-Moore String Search
pub fn boyer_moore(text: &str, pattern: &str) -> Vec<usize> {
let t: Vec<char> = text.chars().collect();
let p: Vec<char> = pattern.chars().collect();
let (n, m) = (t.len(), p.len());
if m == 0 || m > n {
return vec![];
}
let mut bad_char = std::collections::HashMap::new();
for (i, &c) in p.iter().enumerate() {
bad_char.insert(c, i);
}
let mut results = vec![];
let mut s = 0;
while s <= n - m {
let mut j = m - 1;
while j < m && p[j] == t[s + j] {
if j == 0 {
break;
}
j -= 1;
}
if j == 0 && p[0] == t[s] {
results.push(s);
s += 1;
} else {
s += (j.saturating_sub(*bad_char.get(&t[s + j]).unwrap_or(&0))).max(1);
}
}
results
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_bm() {
assert!(!boyer_moore("ABAAABCD", "ABC").is_empty());
}
}Key Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Shift table | HashMap<char, usize> for Unicode | int array (256) for ASCII or Map |
| String indexing | chars().collect() to Vec | String.get/Bytes.get by index |
| Alphabet size | Unbounded (any Unicode) | Usually 256 for byte-level search |
| Result collection | Vec<usize> pushed in loop | Recursive list accumulation |
| Pattern scanning | Right-to-left inner loop | Same, via index arithmetic |
| Worst case | O(n*m) with bad-character only | Same without good-suffix rule |
OCaml Approach
In OCaml, Boyer-Moore is implemented using Bytes or String with Char.code indexing for O(1) character access. The bad-character table is often an int array of size 256 for ASCII. OCaml's tail recursion enables a clean recursive formulation of the scan loop. The Buffer module avoids string concatenation in preprocessing. Functors can parameterize over alphabet type, and the Map module serves the role of Rust's HashMap. OCaml's immutable-first style encourages returning a list of match positions rather than mutating a results vector.
Full Source
#![allow(clippy::all)]
//! # Boyer-Moore String Search
pub fn boyer_moore(text: &str, pattern: &str) -> Vec<usize> {
let t: Vec<char> = text.chars().collect();
let p: Vec<char> = pattern.chars().collect();
let (n, m) = (t.len(), p.len());
if m == 0 || m > n {
return vec![];
}
let mut bad_char = std::collections::HashMap::new();
for (i, &c) in p.iter().enumerate() {
bad_char.insert(c, i);
}
let mut results = vec![];
let mut s = 0;
while s <= n - m {
let mut j = m - 1;
while j < m && p[j] == t[s + j] {
if j == 0 {
break;
}
j -= 1;
}
if j == 0 && p[0] == t[s] {
results.push(s);
s += 1;
} else {
s += (j.saturating_sub(*bad_char.get(&t[s + j]).unwrap_or(&0))).max(1);
}
}
results
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_bm() {
assert!(!boyer_moore("ABAAABCD", "ABC").is_empty());
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_bm() {
assert!(!boyer_moore("ABAAABCD", "ABC").is_empty());
}
}
Deep Comparison
OCaml vs Rust: Boyer Moore Search
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.
Exercises
"aaaa...a" searching for "aaab".(usize, usize) pairs of (start, end) positions instead of just start indices.