ExamplesBy LevelBy TopicLearning Paths
816 Fundamental

Rabin-Karp Rolling Hash

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Rabin-Karp Rolling Hash" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Finding multiple patterns in a text, or detecting plagiarism across documents, requires more than single-pattern search. Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

Finding multiple patterns in a text, or detecting plagiarism across documents, requires more than single-pattern search. Rabin-Karp solves this using polynomial hashing: compute a hash for each window of text equal in size to the pattern, and compare hashes rather than characters. The rolling hash property means each step takes O(1) — subtract the contribution of the outgoing character and add the incoming one. This enables searching for k patterns simultaneously in O(n + k*m) expected time. Real-world uses: duplicate code detection, plagiarism detection, DNA fingerprinting, and detecting repeated substrings in large files.

🎯 Learning Outcomes

  • • Understand polynomial rolling hash: hash = (c0*b^(m-1) + c1*b^(m-2) + ... + c(m-1)) mod p
  • • Implement O(1) hash update by rolling: subtract old char contribution, multiply, add new char
  • • Handle hash collisions with character-level verification (spurious hits)
  • • Recognize Rabin-Karp's advantage for multi-pattern search vs single-pattern algorithms
  • • Learn the importance of choosing prime modulus and base to minimize collision probability
  • Code Example

    #![allow(clippy::all)]
    //! Placeholder

    Key Differences

    AspectRustOCaml
    Arithmeticu64 with explicit % modulusint (63-bit) with mod
    Byte access.as_bytes()[i]Char.code (String.get s i)
    Hash stateMutable variable in loopThreaded through recursion
    Multi-patternHashMap<u64, Vec<Pattern>>Hashtbl keyed by hash
    Overflow riskNone with u64 % modulusRare on 64-bit, possible on 32-bit
    False positivesVerified with == comparisonSame character-level check

    OCaml Approach

    OCaml implements rolling hash with int arithmetic, relying on its 63-bit native integers to avoid overflow on most platforms. The Char.code function extracts byte values. The rolling hash state threads through a tail-recursive function rather than a loop, maintaining immutability. OCaml's List.rev collects match positions in reverse and reverses at the end. For multi-pattern Rabin-Karp, a Hashtbl maps hash values to pattern lists, checking all patterns with matching hashes.

    Full Source

    #![allow(clippy::all)]
    //! Placeholder

    Deep Comparison

    OCaml vs Rust: Rabin Karp Rolling Hash

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Implement 2D Rabin-Karp to find a pattern matrix inside a larger text matrix.
  • Add multi-pattern support: search for all patterns in a set simultaneously in a single pass.
  • Measure false positive rate empirically with random text and various base/modulus choices.
  • Compare performance with Boyer-Moore on long text with many short patterns of equal length.
  • Handle Unicode by hashing on code points rather than bytes and verify correctness.
  • Open Source Repos