Aho-Corasick Multi-Pattern Search
Tutorial Video
Text description (accessibility)
This video demonstrates the "Aho-Corasick Multi-Pattern Search" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. When you need to find hundreds of keywords simultaneously in a large text — network intrusion detection scanning for attack signatures, spam filters checking for banned phrases, DNA analysis searching for multiple probes — running each pattern separately costs O(n * k) where k is the number of patterns. Key difference from OCaml: | Aspect | Rust | OCaml |
Tutorial
The Problem
When you need to find hundreds of keywords simultaneously in a large text — network intrusion detection scanning for attack signatures, spam filters checking for banned phrases, DNA analysis searching for multiple probes — running each pattern separately costs O(n * k) where k is the number of patterns. Aho-Corasick solves this by building a trie of all patterns and adding failure links so that mismatches fall back to the longest suffix that is also a prefix of some pattern. The result is O(n + m + z) where n is text length, m is total pattern length, and z is match count — regardless of how many patterns there are. This is the algorithm powering fgrep -f patterns.txt, network packet inspection, and antivirus scanning.
🎯 Learning Outcomes
Code Example
#![allow(clippy::all)]
//! PlaceholderKey Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Goto table | Vec<HashMap<char, usize>> | int array or Hashtbl per node |
| Failure links | Vec<usize> indexed by state | int array or int ref per node |
| BFS queue | std::collections::VecDeque | Queue.t from standard library |
| Output links | Vec<Vec<String>> | string list ref per node |
| Unicode support | HashMap handles any char | Depends on implementation |
| Memory layout | Cache-friendly Vec storage | GC-managed node heap |
OCaml Approach
OCaml represents the Aho-Corasick automaton using arrays of int option for goto transitions or Hashtbl per node. Failure links are mutable int ref values set during BFS using a Queue. The output at each node is a string list ref grown by appending output-linked completions. OCaml's polymorphic variants can express node states cleanly. The Buffer module builds the goto structure, and Array.make creates fixed-size transition arrays for ASCII alphabets. The search function is naturally tail-recursive with the current state threaded through.
Full Source
#![allow(clippy::all)]
//! PlaceholderDeep Comparison
OCaml vs Rust: Aho Corasick Multi
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.