ExamplesBy LevelBy TopicLearning Paths
823 Fundamental

Polynomial String Hashing

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Polynomial String Hashing" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Comparing strings character by character takes O(m) time. Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

Comparing strings character by character takes O(m) time. When you need to compare many substrings — checking if a string is a rotation of another, finding duplicate substrings, comparing all n^2 pairs of substrings — naive comparison becomes O(n^2 * m). Polynomial hashing reduces substring comparison to O(1) with O(n) preprocessing: compute prefix hashes, then any substring hash is (prefix[r] - prefix[l] * base^(r-l)) % mod. This enables O(n log n) string sorting, O(n) duplicate detection, and O(n log n) LCP (longest common prefix) binary search. It's the hash function behind rolling hash algorithms, string fingerprinting, and near-duplicate document detection.

🎯 Learning Outcomes

  • • Build prefix hash array h[i] = h[i-1]*base + s[i] for O(1) substring hash queries
  • • Precompute powers of base for the rolling subtraction formula
  • • Handle hash collisions using double hashing (two independent hash functions)
  • • Understand the polynomial hash formula and why prime moduli reduce collision probability
  • • Apply hashing to O(n log n) LCP computation via binary search + hash comparison
  • Code Example

    #![allow(clippy::all)]
    //! Placeholder

    Key Differences

    AspectRustOCaml
    Arithmetic typeu64 (64-bit unsigned)int (63-bit signed) or Int64
    Underflow preventionAdd modulus^2 before subtractUse absolute value or unsigned
    Double hashingTwo PolyHash structsRecord with two hash arrays
    Power precomputationVec<u64> with loopArray.init or loop
    Collision rate1/mod per hashSame; 1/(mod1*mod2) for double
    String access.as_bytes()[i]Char.code (String.get s i)

    OCaml Approach

    OCaml uses Int64 or native int (63-bit on 64-bit systems) for modular arithmetic. The prefix hash array is Array.make (n+1) 0 and powers array is Array.make (n+1) 1. OCaml's lsl, land, and mod operators handle modular arithmetic. The get function is a simple pure function over the arrays. For double hashing, a record { h1: int array; h2: int array; pw1: int array; pw2: int array } keeps both hash functions bundled. OCaml's polymorphic comparison avoids the need for custom hash combination.

    Full Source

    #![allow(clippy::all)]
    //! Placeholder

    Deep Comparison

    OCaml vs Rust: String Hashing Poly

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Implement O(n log n) LCP array computation using binary search + polynomial hashing.
  • Use double hashing to find the longest duplicate substring with high confidence and no false positives.
  • Detect all anagram windows: substrings of length k that are permutations of a query string.
  • Implement string sorting using hash-based radix sort on hashed prefixes.
  • Measure empirical collision rate for single vs. double hashing on a large random string corpus.
  • Open Source Repos