Rabin-Karp Rolling Hash
Tutorial Video
Text description (accessibility)
This video demonstrates the "Rabin-Karp Rolling Hash" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Finding multiple patterns in a text, or detecting plagiarism across documents, requires more than single-pattern search. Key difference from OCaml: | Aspect | Rust | OCaml |
Tutorial
The Problem
Finding multiple patterns in a text, or detecting plagiarism across documents, requires more than single-pattern search. Rabin-Karp solves this using polynomial hashing: compute a hash for each window of text equal in size to the pattern, and compare hashes rather than characters. The rolling hash property means each step takes O(1) — subtract the contribution of the outgoing character and add the incoming one. This enables searching for k patterns simultaneously in O(n + k*m) expected time. Real-world uses: duplicate code detection, plagiarism detection, DNA fingerprinting, and detecting repeated substrings in large files.
🎯 Learning Outcomes
hash = (c0*b^(m-1) + c1*b^(m-2) + ... + c(m-1)) mod pCode Example
#![allow(clippy::all)]
//! PlaceholderKey Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Arithmetic | u64 with explicit % modulus | int (63-bit) with mod |
| Byte access | .as_bytes()[i] | Char.code (String.get s i) |
| Hash state | Mutable variable in loop | Threaded through recursion |
| Multi-pattern | HashMap<u64, Vec<Pattern>> | Hashtbl keyed by hash |
| Overflow risk | None with u64 % modulus | Rare on 64-bit, possible on 32-bit |
| False positives | Verified with == comparison | Same character-level check |
OCaml Approach
OCaml implements rolling hash with int arithmetic, relying on its 63-bit native integers to avoid overflow on most platforms. The Char.code function extracts byte values. The rolling hash state threads through a tail-recursive function rather than a loop, maintaining immutability. OCaml's List.rev collects match positions in reverse and reverses at the end. For multi-pattern Rabin-Karp, a Hashtbl maps hash values to pattern lists, checking all patterns with matching hashes.
Full Source
#![allow(clippy::all)]
//! PlaceholderDeep Comparison
OCaml vs Rust: Rabin Karp Rolling Hash
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.