Suffix Array
Tutorial Video
Text description (accessibility)
This video demonstrates the "Suffix Array" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. A suffix array enables lightning-fast substring search, counting occurrences, finding the longest repeated substring, and computing the longest common substring between two strings — all after O(n log n) preprocessing. Key difference from OCaml: | Aspect | Rust | OCaml |
Tutorial
The Problem
A suffix array enables lightning-fast substring search, counting occurrences, finding the longest repeated substring, and computing the longest common substring between two strings — all after O(n log n) preprocessing. Without a suffix array or suffix tree, answering "how many times does pattern P appear in text T?" requires O(n*m) per query. With a suffix array, binary search answers each query in O(m log n). This powers full-text search engines, bioinformatics sequence analysis, data compression (BWT transform), and plagiarism detection tools. The suffix array is a space-efficient alternative to suffix trees that fits in O(n) integers.
🎯 Learning Outcomes
Code Example
#![allow(clippy::all)]
//! PlaceholderKey Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Suffix comparison | bytes[a..].cmp(&bytes[b..]) | String.sub s a len |> compare |
| Sort | .sort_by() with closure | Array.sort with comparison function |
| Construction complexity | Naive O(n^2 log n) | Same for naive; SA-IS for O(n) |
| Binary search | partition_point for range | Manual bisection or library |
| Memory | Vec<usize> — cache-friendly | int array — GC-managed |
| Unicode | Byte-level (ASCII assumed) | Uchar.t array for full Unicode |
OCaml Approach
OCaml builds the suffix array with Array.init n (fun i -> i) then Array.sort with a comparator using String.sub for suffix extraction. The naive approach is functionally clean: Array.sort (fun a b -> compare (String.sub s a n) (String.sub s b n)) sa. More efficient O(n log^2 n) doubling algorithms use Array.map to rank suffixes and iteratively refine. OCaml's String.compare for lexicographic order maps directly to the comparison needed. For binary search, Array.binary_search (in some libraries) or manual bisection applies.
Full Source
#![allow(clippy::all)]
//! PlaceholderDeep Comparison
OCaml vs Rust: Suffix Array
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.
Exercises
search_range function returning all positions where a pattern occurs using the LCP array.