ExamplesBy LevelBy TopicLearning Paths
822 Fundamental

Burrows-Wheeler Transform

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Burrows-Wheeler Transform" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Data compression algorithms like bzip2 achieve high compression ratios by first applying the Burrows-Wheeler Transform (BWT) to rearrange text so that similar characters cluster together, making run-length encoding and move-to-front coding highly effective. Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

Data compression algorithms like bzip2 achieve high compression ratios by first applying the Burrows-Wheeler Transform (BWT) to rearrange text so that similar characters cluster together, making run-length encoding and move-to-front coding highly effective. The BWT is a reversible transformation: it takes a string, generates all rotations, sorts them lexicographically, and returns the last column. The magic is that similar contexts cluster the last-column characters. The inverse BWT recovers the original string exactly. BWT is also used in bioinformatics for FM-index construction, enabling compressed full-text indexes that power genome alignment tools like BWA and Bowtie.

🎯 Learning Outcomes

  • • Understand BWT as the last column of the sorted rotation matrix
  • • Implement forward BWT: generate all rotations, sort, take last characters, record original row index
  • • Implement inverse BWT using the first/last column correspondence and the rank property
  • • Recognize why BWT aids compression: characters with similar contexts appear together
  • • Connect BWT to the FM-index and suffix array for full-text search
  • Code Example

    #![allow(clippy::all)]
    //! Placeholder

    Key Differences

    AspectRustOCaml
    Rotation comparisonchain iterator, zero allocationString.sub concat or simulated
    BWT stringString collected from iteratorString.init with Array.get
    Inverse LF-mappingRank array via countingArray sort + index correspondence
    Memory (sort)O(n) indices + O(n log n) sortSame approach
    Rotation materializationAvoided via index sortAvoided similarly
    Compression useFoundation for bzip2-likeSame theoretical role

    OCaml Approach

    OCaml implements BWT with Array.init n (fun i -> i) for rotation indices and Array.sort with a comparator simulating rotation. String.get s ((i + n - 1) mod n) gets the last character. OCaml's String.init builds the BWT string. The inverse BWT uses a Array.sort-built rank table mapping characters to their position in the first column. OCaml's functional style uses Array.fold_left for the inverse reconstruction loop. The String.concat "" [s1; s2] approach for rotation comparison is simpler but allocates O(n) per comparison.

    Full Source

    #![allow(clippy::all)]
    //! Placeholder

    Deep Comparison

    OCaml vs Rust: Burrows Wheeler

    Overview

    See the example.rs and example.ml files for detailed implementations.

    Key Differences

    AspectOCamlRust
    Type systemHindley-MilnerOwnership + traits
    MemoryGCZero-cost abstractions
    MutabilityExplicit refmut keyword
    Error handlingOption/ResultResult<T, E>

    See README.md for detailed comparison.

    Exercises

  • Implement the inverse BWT to recover the original string from (bwt, original_row).
  • Integrate BWT with move-to-front encoding and run-length encoding to build a simple compressor.
  • Measure compression ratio on English text vs. random bytes to demonstrate BWT's effectiveness.
  • Build a simplified FM-index using the BWT and a rank/select data structure for O(m) pattern search.
  • Compare BWT-based compression ratios with deflate (gzip) on natural language text.
  • Open Source Repos