ExamplesBy LevelBy TopicLearning Paths
013 Intermediate

013 — Direct Run-Length Encoding

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "013 — Direct Run-Length Encoding" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. OCaml 99 Problems #13 challenges you to implement run-length encoding directly — without first packing runs into sublists (as in example 010) and then counting them. Key difference from OCaml: 1. **Two

Tutorial

The Problem

OCaml 99 Problems #13 challenges you to implement run-length encoding directly — without first packing runs into sublists (as in example 010) and then counting them. This single-pass approach is more efficient because it avoids intermediate allocations and processes each element exactly once.

The direct approach teaches an important programming pattern: maintaining a small amount of state (current element, current count) as you scan through a sequence, emitting output at each run boundary. This is the basis for streaming compression algorithms, tokenizers, lexers, and parser front-ends. Anything that groups consecutive similar items uses this pattern.

🎯 Learning Outcomes

  • • Implement a stateful single-pass scan using a while loop and index tracking
  • • Use the "current run" pattern: track (current_element, count) and emit at boundaries
  • • Compare single-pass direct encoding with two-pass (pack then count) approaches
  • • Understand when single-pass algorithms are preferable to multi-pass
  • • Apply the RleItem enum from example 011 as a shared output type
  • • Use a two-pointer scan: start marks the run beginning, i advances until a different element is found
  • • Understand that single-pass encoding is more memory-efficient than two-pass (pack-then-count) encoding
  • Code Example

    list.iter().fold(Vec::new(), |mut acc, x| {
        match acc.last_mut() {
            Some(RleItem::Many(n, ref y)) if y == x => { *n += 1; }
            Some(RleItem::One(ref y)) if y == x => {
                *acc.last_mut().unwrap() = RleItem::Many(2, y.clone());
            }
            _ => acc.push(RleItem::One(x.clone())),
        }
        acc
    })

    Key Differences

  • Two-pointer vs nested helper: Rust uses an index-based two-pointer approach (idiomatic for Vec/slice). OCaml uses a nested recursive count_run function (idiomatic for linked lists).
  • Slice vs list navigation: Rust random-accesses list[i] and list[start] in O(1). OCaml's list traversal is sequential — random access is O(n), so index-based approaches are avoided.
  • Boundary detection: Rust checks list[i] != list[start] to detect run end. OCaml's when y = x guard in the match arm advances while equal.
  • Output accumulation: Rust pushes to a Vec (amortized O(1)). OCaml builds a cons list in reverse and reverses at the end, or builds forward using a tail-recursive accumulator.
  • Single-pass vs two-pass: Direct encoding scans once, emitting output at run boundaries. Two-pass (pack then encode) scans twice and allocates an intermediate grouped structure. For large inputs, single-pass is significantly faster.
  • State tracking: The direct approach tracks (start, i) or (current, count) explicitly. OCaml's recursive version passes state as function arguments — equivalent logic, different syntactic form.
  • **while loop vs recursion:** The nested while loops in Rust's direct version are idiomatic for scanning with two pointers. OCaml prefers recursive definitions; Rust uses loops for efficiency.
  • **Shared RleItem type:** Both this example and example 011 use the same RleItem<T> enum — showing how shared types enable composition between modules.
  • OCaml Approach

    The direct OCaml version is: let rec encode lst = match lst with | [] -> [] | x :: rest -> let rec count_run acc = function | y :: t when y = x -> count_run (acc + 1) t | rest -> (acc, rest) in let (n, remaining) = count_run 1 rest in let item = if n = 1 then One x else Many (n, x) in item :: encode remaining. This uses a nested helper count_run that advances through matching elements, making the structure explicitly recursive.

    Full Source

    #![allow(clippy::all)]
    /// Direct Run-Length Encoding (99 Problems #13)
    ///
    /// Implement RLE directly — don't create sublists first, then count.
    /// Instead, count consecutive duplicates in a single pass.
    
    #[derive(Debug, PartialEq, Clone)]
    pub enum RleItem<T> {
        One(T),
        Many(usize, T),
    }
    
    // ── Idiomatic Rust: single-pass with state ──────────────────────────────────
    
    pub fn encode_direct<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
        let mut result = Vec::new();
        let mut i = 0;
        while i < list.len() {
            let start = i;
            while i < list.len() && list[i] == list[start] {
                i += 1;
            }
            let count = i - start;
            result.push(if count == 1 {
                RleItem::One(list[start].clone())
            } else {
                RleItem::Many(count, list[start].clone())
            });
        }
        result
    }
    
    // ── Recursive style with accumulator ────────────────────────────────────────
    
    pub fn encode_direct_recursive<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
        fn aux<T: PartialEq + Clone>(list: &[T], count: usize, acc: &mut Vec<RleItem<T>>) {
            match list.split_first() {
                None => {
                    // Flush any remaining count — handled by previous step
                }
                Some((head, tail)) => {
                    match tail.first() {
                        Some(next) if next == head => {
                            aux(tail, count + 1, acc);
                        }
                        _ => {
                            // End of run (or end of list)
                            acc.push(if count + 1 == 1 {
                                RleItem::One(head.clone())
                            } else {
                                RleItem::Many(count + 1, head.clone())
                            });
                            aux(tail, 0, acc);
                        }
                    }
                }
            }
        }
        let mut result = Vec::new();
        aux(list, 0, &mut result);
        result
    }
    
    // ── Fold-based approach ─────────────────────────────────────────────────────
    
    pub fn encode_direct_fold<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
        list.iter().fold(Vec::new(), |mut acc, x| {
            match acc.last_mut() {
                Some(RleItem::One(ref y)) if y == x => {
                    let y_clone = y.clone();
                    *acc.last_mut().unwrap() = RleItem::Many(2, y_clone);
                }
                Some(RleItem::Many(n, ref y)) if y == x => {
                    *n += 1;
                }
                _ => {
                    acc.push(RleItem::One(x.clone()));
                }
            }
            acc
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
        use RleItem::*;
    
        #[test]
        fn test_empty() {
            assert_eq!(encode_direct::<i32>(&[]), vec![]);
            assert_eq!(encode_direct_recursive::<i32>(&[]), vec![]);
            assert_eq!(encode_direct_fold::<i32>(&[]), vec![]);
        }
    
        #[test]
        fn test_no_repeats() {
            let input = vec!['a', 'b', 'c'];
            let expected = vec![One('a'), One('b'), One('c')];
            assert_eq!(encode_direct(&input), expected);
            assert_eq!(encode_direct_fold(&input), expected);
        }
    
        #[test]
        fn test_all_same() {
            assert_eq!(encode_direct(&[1, 1, 1, 1]), vec![Many(4, 1)]);
        }
    
        #[test]
        fn test_classic_example() {
            let input = vec![
                'a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e',
            ];
            let expected = vec![
                Many(4, 'a'),
                One('b'),
                Many(2, 'c'),
                Many(2, 'a'),
                One('d'),
                Many(4, 'e'),
            ];
            assert_eq!(encode_direct(&input), expected);
            assert_eq!(encode_direct_recursive(&input), expected);
            assert_eq!(encode_direct_fold(&input), expected);
        }
    
        #[test]
        fn test_single() {
            assert_eq!(encode_direct(&['z']), vec![One('z')]);
        }
    
        #[test]
        fn test_two_same() {
            assert_eq!(encode_direct(&[5, 5]), vec![Many(2, 5)]);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
        use RleItem::*;
    
        #[test]
        fn test_empty() {
            assert_eq!(encode_direct::<i32>(&[]), vec![]);
            assert_eq!(encode_direct_recursive::<i32>(&[]), vec![]);
            assert_eq!(encode_direct_fold::<i32>(&[]), vec![]);
        }
    
        #[test]
        fn test_no_repeats() {
            let input = vec!['a', 'b', 'c'];
            let expected = vec![One('a'), One('b'), One('c')];
            assert_eq!(encode_direct(&input), expected);
            assert_eq!(encode_direct_fold(&input), expected);
        }
    
        #[test]
        fn test_all_same() {
            assert_eq!(encode_direct(&[1, 1, 1, 1]), vec![Many(4, 1)]);
        }
    
        #[test]
        fn test_classic_example() {
            let input = vec![
                'a', 'a', 'a', 'a', 'b', 'c', 'c', 'a', 'a', 'd', 'e', 'e', 'e', 'e',
            ];
            let expected = vec![
                Many(4, 'a'),
                One('b'),
                Many(2, 'c'),
                Many(2, 'a'),
                One('d'),
                Many(4, 'e'),
            ];
            assert_eq!(encode_direct(&input), expected);
            assert_eq!(encode_direct_recursive(&input), expected);
            assert_eq!(encode_direct_fold(&input), expected);
        }
    
        #[test]
        fn test_single() {
            assert_eq!(encode_direct(&['z']), vec![One('z')]);
        }
    
        #[test]
        fn test_two_same() {
            assert_eq!(encode_direct(&[5, 5]), vec![Many(2, 5)]);
        }
    }

    Deep Comparison

    Direct Run-Length Encoding: OCaml vs Rust

    The Core Insight

    Direct RLE eliminates the intermediate step of grouping — it counts runs on the fly. This requires carrying mutable state (the current count) through traversal. OCaml does this with recursive accumulators; Rust offers both fold-based and imperative approaches, with the fold version being particularly interesting because it mutates the accumulator in place.

    OCaml Approach

    The recursive version carries a count alongside the accumulator:

    let rec aux count acc = function
      | [] -> List.rev acc
      | [x] -> List.rev ((if count = 0 then One x else Many (count+1, x)) :: acc)
      | x :: (y :: _ as rest) ->
        if x = y then aux (count+1) acc rest
        else let item = if count = 0 then One x else Many (count+1, x) in
             aux 0 (item :: acc) rest
    

    The fold version builds the result in reverse, checking the head of the accumulator to extend the current run.

    Rust Approach

    Rust's fold approach modifies the last element of the accumulator in place:

    list.iter().fold(Vec::new(), |mut acc, x| {
        match acc.last_mut() {
            Some(RleItem::Many(n, ref y)) if y == x => { *n += 1; }
            Some(RleItem::One(ref y)) if y == x => {
                *acc.last_mut().unwrap() = RleItem::Many(2, y.clone());
            }
            _ => acc.push(RleItem::One(x.clone())),
        }
        acc
    })
    

    The last_mut() method gives a mutable reference to the last element — a pattern impossible in pure functional OCaml but natural in Rust.

    Key Differences

    AspectOCamlRust
    State threadingExplicit parameters (count, acc)fold with mutable accumulator
    MutationNever (new list nodes)In-place via last_mut()
    Peeking aheadx :: (y :: _ as rest) patterntail.first() or index comparison
    Reverse at endList.rev acc (common pattern)Not needed (push appends)
    Two-element look-aheadNative with pattern matchingRequires explicit index logic

    What Rust Learners Should Notice

  • • **last_mut() enables accumulator mutation**: Rust's ownership system lets you mutate the last element of a Vec through a mutable reference — something OCaml can't do with immutable lists.
  • OCaml's two-element pattern is cleaner: x :: (y :: _ as rest) naturally looks at two elements. Rust requires comparing list[i] with list[i-1] or using windows(2).
  • Both approaches are O(n): The algorithmic complexity is identical. The difference is stylistic — OCaml's recursive version is more declarative, Rust's fold version is more imperative.
  • Direct encoding avoids allocation: Both languages benefit from not creating intermediate sublists, making this more memory-efficient than the two-step approach.
  • Further Reading

  • • [99 OCaml Problems #13](https://ocaml.org/problems)
  • • [Rust — Iterator::fold](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.fold)
  • • [Rust — slice::last_mut](https://doc.rust-lang.org/std/primitive.slice.html#method.last_mut)
  • Exercises

  • Encode bytes: Adapt encode_direct to work on &[u8] for binary data encoding. Benchmark it against the version from example 011 on a large repeated-byte input.
  • Maximum run: Write longest_run<T: PartialEq>(list: &[T]) -> Option<(usize, &T)> that returns the count and value of the longest consecutive run in a single pass.
  • RLE streaming encoder: Implement RleEncoder<T> as a struct that accepts elements one at a time via push(&mut self, x: T) and emits complete RleItem<T> values when a run ends (like a streaming codec).
  • Streaming encoder: Implement encode_direct as an Iterator<Item = RleItem<T>> — returning a lazy encoder that produces encoded items on demand without building the full output vector.
  • Benchmark comparison: Measure the performance difference between the two-pass approach (example 010) and the direct single-pass approach on a 1MB input with varying run lengths.
  • Open Source Repos