ExamplesBy LevelBy TopicLearning Paths
817 Advanced

Aho-Corasick Multi-Pattern Search

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Aho-Corasick Multi-Pattern Search" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. When you need to find hundreds of keywords simultaneously in a large text — network intrusion detection scanning for attack signatures, spam filters checking for banned phrases, DNA analysis searching for multiple probes — running each pattern separately costs O(n * k) where k is the number of patterns. Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

When you need to find hundreds of keywords simultaneously in a large text — network intrusion detection scanning for attack signatures, spam filters checking for banned phrases, DNA analysis searching for multiple probes — running each pattern separately costs O(n * k) where k is the number of patterns. Aho-Corasick solves this by building a trie of all patterns and adding failure links so that mismatches fall back to the longest suffix that is also a prefix of some pattern. The result is O(n + m + z) where n is text length, m is total pattern length, and z is match count — regardless of how many patterns there are. This is the algorithm powering fgrep -f patterns.txt, network packet inspection, and antivirus scanning.

🎯 Learning Outcomes

  • • Build a trie (prefix tree) from a set of patterns with nodes tracking pattern completions
  • • Compute failure links via BFS, connecting mismatch states to the longest proper suffix-prefix
  • • Understand why BFS is necessary for failure link computation (level-by-level guarantees parent links exist)
  • • Implement the search phase that follows failure links on mismatch rather than restarting
  • • Recognize output links that allow reporting multiple overlapping patterns ending at the same position
  • Code Example

    #![allow(clippy::all)]
    //! Placeholder

    Key Differences

    AspectRustOCaml
    Goto tableVec<HashMap<char, usize>>int array or Hashtbl per node
    Failure linksVec<usize> indexed by stateint array or int ref per node
    BFS queuestd::collections::VecDequeQueue.t from standard library
    Output linksVec<Vec<String>>string list ref per node
    Unicode supportHashMap handles any charDepends on implementation
    Memory layoutCache-friendly Vec storageGC-managed node heap

    OCaml Approach

    OCaml represents the Aho-Corasick automaton using arrays of int option for goto transitions or Hashtbl per node. Failure links are mutable int ref values set during BFS using a Queue. The output at each node is a string list ref grown by appending output-linked completions. OCaml's polymorphic variants can express node states cleanly. The Buffer module builds the goto structure, and Array.make creates fixed-size transition arrays for ASCII alphabets. The search function is naturally tail-recursive with the current state threaded through.

    Full Source

    #![allow(clippy::all)]
    //! Placeholder

    Deep Comparison

    OCaml vs Rust: Aho Corasick Multi

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Add support for case-insensitive matching by lowercasing during trie construction.
  • Implement output links to efficiently report all patterns ending at each position, not just direct matches.
  • Measure the speedup over k separate Boyer-Moore searches for k=10, 100, 1000 patterns.
  • Serialize the compiled automaton to disk and reload it without rebuilding from patterns.
  • Extend to report overlapping matches where one pattern is a substring of another.
  • Open Source Repos