ExamplesBy LevelBy TopicLearning Paths
818 Advanced

Suffix Array

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Suffix Array" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. A suffix array enables lightning-fast substring search, counting occurrences, finding the longest repeated substring, and computing the longest common substring between two strings — all after O(n log n) preprocessing. Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

A suffix array enables lightning-fast substring search, counting occurrences, finding the longest repeated substring, and computing the longest common substring between two strings — all after O(n log n) preprocessing. Without a suffix array or suffix tree, answering "how many times does pattern P appear in text T?" requires O(n*m) per query. With a suffix array, binary search answers each query in O(m log n). This powers full-text search engines, bioinformatics sequence analysis, data compression (BWT transform), and plagiarism detection tools. The suffix array is a space-efficient alternative to suffix trees that fits in O(n) integers.

🎯 Learning Outcomes

  • • Understand that a suffix array is a sorted array of starting indices of all suffixes of a string
  • • Implement the naive O(n^2 log n) construction using sort with suffix comparisons
  • • Recognize how binary search on a sorted suffix array answers substring queries in O(m log n)
  • • Learn how the LCP (Longest Common Prefix) array augments the suffix array for advanced queries
  • • Compare with suffix trees: suffix arrays use less memory but require more algorithmic sophistication
  • Code Example

    #![allow(clippy::all)]
    //! Placeholder

    Key Differences

    AspectRustOCaml
    Suffix comparisonbytes[a..].cmp(&bytes[b..])String.sub s a len |> compare
    Sort.sort_by() with closureArray.sort with comparison function
    Construction complexityNaive O(n^2 log n)Same for naive; SA-IS for O(n)
    Binary searchpartition_point for rangeManual bisection or library
    MemoryVec<usize> — cache-friendlyint array — GC-managed
    UnicodeByte-level (ASCII assumed)Uchar.t array for full Unicode

    OCaml Approach

    OCaml builds the suffix array with Array.init n (fun i -> i) then Array.sort with a comparator using String.sub for suffix extraction. The naive approach is functionally clean: Array.sort (fun a b -> compare (String.sub s a n) (String.sub s b n)) sa. More efficient O(n log^2 n) doubling algorithms use Array.map to rank suffixes and iteratively refine. OCaml's String.compare for lexicographic order maps directly to the comparison needed. For binary search, Array.binary_search (in some libraries) or manual bisection applies.

    Full Source

    #![allow(clippy::all)]
    //! Placeholder

    Deep Comparison

    OCaml vs Rust: Suffix Array

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Implement the LCP array using Kasai's algorithm and use it to find the longest repeated substring.
  • Add a search_range function returning all positions where a pattern occurs using the LCP array.
  • Implement the O(n log^2 n) suffix array construction using the doubling technique.
  • Use the suffix array to find the longest common substring of two strings by concatenating with a sentinel.
  • Benchmark suffix array binary search against a simple sliding window search for varying pattern lengths.
  • Open Source Repos