ExamplesBy LevelBy TopicLearning Paths
011 Intermediate

011 — Modified Run-Length Encoding

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "011 — Modified Run-Length Encoding" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Standard run-length encoding always emits `(count, element)` pairs, but this is wasteful for singleton runs: `(1, 'a')` is less clear than just `'a'`. Key difference from OCaml: 1. **Generic enum**: Rust's `RleItem<T>` is a generic enum with a type parameter. OCaml's `'a rle` uses a type variable directly in the variant definition — the same concept, different syntax.

Tutorial

The Problem

Standard run-length encoding always emits (count, element) pairs, but this is wasteful for singleton runs: (1, 'a') is less clear than just 'a'. The modified encoding (OCaml 99 Problems #11) uses a sum type: elements with count > 1 are represented as Many(count, element), while singletons are represented as One(element). This avoids redundancy and makes the encoding self-describing.

This problem demonstrates how algebraic data types (enums with data) replace verbose class hierarchies. The RleItem<T> enum is the Rust/OCaml equivalent of a sealed interface with two implementations in Java. Modified RLE is used in fax transmission (CCITT T.4), BMP file compression, and PCX image format.

🎯 Learning Outcomes

  • • Use a generic enum to represent a sum type (One vs Many)
  • • Implement encoding in both iterative (stateful loop) and recursive styles
  • • Apply the run-detection pattern: scan for runs, then emit the appropriate variant
  • • Understand how enums with data replace class hierarchies from OOP languages
  • • Use PartialEq + Clone trait bounds for generic types
  • • Implement both a stateful loop and a recursive helper that uses position() to find run boundaries
  • • Derive Debug and PartialEq on RleItem<T> to enable assertions in tests
  • Code Example

    #[derive(Debug, PartialEq, Clone)]
    pub enum RleItem<T> { One(T), Many(usize, T) }

    Key Differences

  • Generic enum: Rust's RleItem<T> is a generic enum with a type parameter. OCaml's 'a rle uses a type variable directly in the variant definition — the same concept, different syntax.
  • Clone bound: Rust requires T: Clone when the encode function clones elements into results. OCaml's GC shares structure automatically; no explicit cloning.
  • Run detection: Rust uses indexed access (list[i] == list[i-1]) in the loop. OCaml's span (not in stdlib but easily defined) splits a list by predicate — a higher-level abstraction.
  • Memory layout: Rust's Vec<RleItem<T>> stores enum variants inline with a discriminant tag. OCaml boxes variant values on the heap.
  • Sum types: RleItem<T> with One(T) and Many(usize, T) is a sum type — a value is exactly one of two shapes. OCaml's type 'a rle = One of 'a | Many of int * 'a is identical in concept and nearly identical in syntax.
  • Generic enum: Rust's RleItem<T> is generic over T. The enum itself is not tied to any specific element type. OCaml's 'a rle is the same.
  • Pattern matching in loops: The stateful loop uses an index i from 1..=list.len() and accesses list[i-1] at boundaries. This is a common pattern when you need to emit the last group after the loop ends.
  • Ownership: RleItem::One(T) owns the value. Constructing RleItem::Many(count, list[i-1].clone()) requires cloning from the slice — explicit in Rust, transparent in OCaml.
  • OCaml Approach

    OCaml defines type 'a rle = One of 'a | Many of int * 'a. The encoding function matches on lists: let rec encode = function | [] -> [] | x :: xs -> let (run, rest) = span (fun y -> y = x) xs in let item = if run = [] then One x else Many (1 + List.length run, x) in item :: encode rest. The span function splits the list at the first element that does not match — a common functional idiom.

    Full Source

    #![allow(clippy::all)]
    /// Modified Run-Length Encoding (99 Problems #11)
    ///
    /// Modify the RLE result so that elements without duplicates are simply
    /// copied into the result list, while duplicates are encoded as (count, elem).
    
    #[derive(Debug, PartialEq, Clone)]
    pub enum RleItem<T> {
        One(T),
        Many(usize, T),
    }
    
    // ── Idiomatic Rust: iterator-based ──────────────────────────────────────────
    
    pub fn encode_modified<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
        if list.is_empty() {
            return vec![];
        }
        let mut result = Vec::new();
        let mut count = 1;
        for i in 1..=list.len() {
            if i < list.len() && list[i] == list[i - 1] {
                count += 1;
            } else {
                result.push(if count == 1 {
                    RleItem::One(list[i - 1].clone())
                } else {
                    RleItem::Many(count, list[i - 1].clone())
                });
                count = 1;
            }
        }
        result
    }
    
    // ── Functional/recursive style ──────────────────────────────────────────────
    
    pub fn encode_modified_recursive<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
        fn pack_run<T: PartialEq + Clone>(list: &[T]) -> (&[T], &[T]) {
            if list.len() <= 1 {
                return (list, &[]);
            }
            let first = &list[0];
            let end = list.iter().position(|x| x != first).unwrap_or(list.len());
            (&list[..end], &list[end..])
        }
    
        fn aux<T: PartialEq + Clone>(list: &[T], acc: &mut Vec<RleItem<T>>) {
            if list.is_empty() {
                return;
            }
            let (run, rest) = pack_run(list);
            acc.push(if run.len() == 1 {
                RleItem::One(run[0].clone())
            } else {
                RleItem::Many(run.len(), run[0].clone())
            });
            aux(rest, acc);
        }
    
        let mut result = Vec::new();
        aux(list, &mut result);
        result
    }
    
    // ── Using chunk_by (functional, slice-based) ────────────────────────────────
    
    pub fn encode_modified_chunks<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
        // Manual chunk-by since slice::chunk_by is nightly
        let mut result = Vec::new();
        let mut i = 0;
        while i < list.len() {
            let start = i;
            while i < list.len() && list[i] == list[start] {
                i += 1;
            }
            let len = i - start;
            result.push(if len == 1 {
                RleItem::One(list[start].clone())
            } else {
                RleItem::Many(len, list[start].clone())
            });
        }
        result
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
        use RleItem::*;
    
        #[test]
        fn test_empty() {
            assert_eq!(encode_modified::<char>(&[]), vec![]);
            assert_eq!(encode_modified_recursive::<char>(&[]), vec![]);
        }
    
        #[test]
        fn test_no_duplicates() {
            assert_eq!(
                encode_modified(&['a', 'b', 'c']),
                vec![One('a'), One('b'), One('c')]
            );
        }
    
        #[test]
        fn test_all_same() {
            assert_eq!(encode_modified(&['x', 'x', 'x']), vec![Many(3, 'x')]);
        }
    
        #[test]
        fn test_mixed() {
            let input = vec!['a', 'a', 'a', 'b', 'c', 'c', 'd', 'd', 'd', 'd'];
            let expected = vec![Many(3, 'a'), One('b'), Many(2, 'c'), Many(4, 'd')];
            assert_eq!(encode_modified(&input), expected);
            assert_eq!(encode_modified_recursive(&input), expected);
            assert_eq!(encode_modified_chunks(&input), expected);
        }
    
        #[test]
        fn test_single_element() {
            assert_eq!(encode_modified(&[42]), vec![One(42)]);
        }
    
        #[test]
        fn test_strings() {
            let input = vec!["hi", "hi", "bye"];
            assert_eq!(encode_modified(&input), vec![Many(2, "hi"), One("bye")]);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
        use RleItem::*;
    
        #[test]
        fn test_empty() {
            assert_eq!(encode_modified::<char>(&[]), vec![]);
            assert_eq!(encode_modified_recursive::<char>(&[]), vec![]);
        }
    
        #[test]
        fn test_no_duplicates() {
            assert_eq!(
                encode_modified(&['a', 'b', 'c']),
                vec![One('a'), One('b'), One('c')]
            );
        }
    
        #[test]
        fn test_all_same() {
            assert_eq!(encode_modified(&['x', 'x', 'x']), vec![Many(3, 'x')]);
        }
    
        #[test]
        fn test_mixed() {
            let input = vec!['a', 'a', 'a', 'b', 'c', 'c', 'd', 'd', 'd', 'd'];
            let expected = vec![Many(3, 'a'), One('b'), Many(2, 'c'), Many(4, 'd')];
            assert_eq!(encode_modified(&input), expected);
            assert_eq!(encode_modified_recursive(&input), expected);
            assert_eq!(encode_modified_chunks(&input), expected);
        }
    
        #[test]
        fn test_single_element() {
            assert_eq!(encode_modified(&[42]), vec![One(42)]);
        }
    
        #[test]
        fn test_strings() {
            let input = vec!["hi", "hi", "bye"];
            assert_eq!(encode_modified(&input), vec![Many(2, "hi"), One("bye")]);
        }
    }

    Deep Comparison

    Modified Run-Length Encoding: OCaml vs Rust

    The Core Insight

    This problem naturally requires a sum type to represent two kinds of elements: singles and runs. Both OCaml and Rust make this elegant with algebraic data types. The interesting part is how each language handles the stateful traversal — accumulating runs of equal elements while distinguishing singletons.

    OCaml Approach

    OCaml defines a parameterized variant type and uses pattern matching with accumulators:

    type 'a rle = One of 'a | Many of int * 'a
    
    let encode lst =
      pack lst |> List.map (fun group ->
        match group with
        | [x] -> One x
        | x :: _ -> Many (List.length group, x)
        | [] -> failwith "impossible")
    

    The pack helper groups consecutive duplicates into sublists first, then encode classifies each group. The direct version tracks count in a recursive accumulator.

    Rust Approach

    Rust uses a generic enum with the same structure:

    #[derive(Debug, PartialEq, Clone)]
    pub enum RleItem<T> { One(T), Many(usize, T) }
    

    The idiomatic approach uses index-based iteration with equality checks. The recursive version uses split_first() or position-finding to identify runs. Both require Clone because Rust can't share data implicitly.

    Key Differences

    AspectOCamlRust
    Sum typetype 'a rle = One of 'a \| Many of int * 'aenum RleItem<T> { One(T), Many(usize, T) }
    Type parameter'a (implicit)<T: PartialEq + Clone> (explicit bounds)
    EqualityStructural (built-in)Requires PartialEq trait
    CopyingAutomatic (GC)Requires Clone
    Groupingpack into sublists, then mapDirect counting (no intermediate lists)
    Pattern matching[x] matches singleton listNo slice literal patterns in stable
    PerformanceO(n) with intermediate allocationsO(n) single pass possible

    What Rust Learners Should Notice

  • Trait bounds are explicit: Where OCaml uses structural equality by default, Rust requires PartialEq to compare elements. This makes the contract visible.
  • Clone is the cost of ownership: OCaml freely copies values (GC handles it). Rust requires Clone when you want to put a value into an enum variant while still traversing the original slice.
  • No list pattern matching on slices: OCaml's [x] and x :: rest patterns don't exist for Rust slices (stable). You use split_first(), .len(), or index-based patterns instead.
  • Enums are the same concept: The translation from OCaml variant to Rust enum is nearly mechanical — add #[derive] for common traits, add trait bounds for generics.
  • Further Reading

  • • [The Rust Book — Enums and Pattern Matching](https://doc.rust-lang.org/book/ch06-00-enums.html)
  • • [99 OCaml Problems](https://ocaml.org/problems)
  • • [Rust — Deriving Traits](https://doc.rust-lang.org/book/appendix-03-derivable-traits.html)
  • Exercises

  • Round-trip: Write a decode_modified(encoded: &[RleItem<T>]) -> Vec<T> function and verify that decode_modified(encode_modified(v)) == v for any input.
  • Compression ratio: Write compression_ratio(original: &[i32]) -> f64 that returns original.len() as f64 / encode_modified(original).len() as f64. What input maximizes/minimizes compression?
  • Stream encode: Rewrite encode_modified to accept impl Iterator<Item=T> and return impl Iterator<Item=RleItem<T>> without collecting to a Vec first.
  • Round-trip test: Write a property test (using proptest or manual generation) that verifies decode(encode_modified(list)) == list for any list of characters.
  • Modify to run-length: Convert the RleItem<T> encoding to a simple (count, value) encoding by implementing to_plain_rle(items: &[RleItem<T>]) -> Vec<(usize, T)>.
  • Open Source Repos