011 — Modified Run-Length Encoding
Tutorial Video
Text description (accessibility)
This video demonstrates the "011 — Modified Run-Length Encoding" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Standard run-length encoding always emits `(count, element)` pairs, but this is wasteful for singleton runs: `(1, 'a')` is less clear than just `'a'`. Key difference from OCaml: 1. **Generic enum**: Rust's `RleItem<T>` is a generic enum with a type parameter. OCaml's `'a rle` uses a type variable directly in the variant definition — the same concept, different syntax.
Tutorial
The Problem
Standard run-length encoding always emits (count, element) pairs, but this is wasteful for singleton runs: (1, 'a') is less clear than just 'a'. The modified encoding (OCaml 99 Problems #11) uses a sum type: elements with count > 1 are represented as Many(count, element), while singletons are represented as One(element). This avoids redundancy and makes the encoding self-describing.
This problem demonstrates how algebraic data types (enums with data) replace verbose class hierarchies. The RleItem<T> enum is the Rust/OCaml equivalent of a sealed interface with two implementations in Java. Modified RLE is used in fax transmission (CCITT T.4), BMP file compression, and PCX image format.
🎯 Learning Outcomes
One vs Many)PartialEq + Clone trait bounds for generic typesposition() to find run boundariesDebug and PartialEq on RleItem<T> to enable assertions in testsCode Example
#[derive(Debug, PartialEq, Clone)]
pub enum RleItem<T> { One(T), Many(usize, T) }Key Differences
RleItem<T> is a generic enum with a type parameter. OCaml's 'a rle uses a type variable directly in the variant definition — the same concept, different syntax.T: Clone when the encode function clones elements into results. OCaml's GC shares structure automatically; no explicit cloning.list[i] == list[i-1]) in the loop. OCaml's span (not in stdlib but easily defined) splits a list by predicate — a higher-level abstraction.Vec<RleItem<T>> stores enum variants inline with a discriminant tag. OCaml boxes variant values on the heap.RleItem<T> with One(T) and Many(usize, T) is a sum type — a value is exactly one of two shapes. OCaml's type 'a rle = One of 'a | Many of int * 'a is identical in concept and nearly identical in syntax.RleItem<T> is generic over T. The enum itself is not tied to any specific element type. OCaml's 'a rle is the same.i from 1..=list.len() and accesses list[i-1] at boundaries. This is a common pattern when you need to emit the last group after the loop ends.RleItem::One(T) owns the value. Constructing RleItem::Many(count, list[i-1].clone()) requires cloning from the slice — explicit in Rust, transparent in OCaml.OCaml Approach
OCaml defines type 'a rle = One of 'a | Many of int * 'a. The encoding function matches on lists: let rec encode = function | [] -> [] | x :: xs -> let (run, rest) = span (fun y -> y = x) xs in let item = if run = [] then One x else Many (1 + List.length run, x) in item :: encode rest. The span function splits the list at the first element that does not match — a common functional idiom.
Full Source
#![allow(clippy::all)]
/// Modified Run-Length Encoding (99 Problems #11)
///
/// Modify the RLE result so that elements without duplicates are simply
/// copied into the result list, while duplicates are encoded as (count, elem).
#[derive(Debug, PartialEq, Clone)]
pub enum RleItem<T> {
One(T),
Many(usize, T),
}
// ── Idiomatic Rust: iterator-based ──────────────────────────────────────────
pub fn encode_modified<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
if list.is_empty() {
return vec![];
}
let mut result = Vec::new();
let mut count = 1;
for i in 1..=list.len() {
if i < list.len() && list[i] == list[i - 1] {
count += 1;
} else {
result.push(if count == 1 {
RleItem::One(list[i - 1].clone())
} else {
RleItem::Many(count, list[i - 1].clone())
});
count = 1;
}
}
result
}
// ── Functional/recursive style ──────────────────────────────────────────────
pub fn encode_modified_recursive<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
fn pack_run<T: PartialEq + Clone>(list: &[T]) -> (&[T], &[T]) {
if list.len() <= 1 {
return (list, &[]);
}
let first = &list[0];
let end = list.iter().position(|x| x != first).unwrap_or(list.len());
(&list[..end], &list[end..])
}
fn aux<T: PartialEq + Clone>(list: &[T], acc: &mut Vec<RleItem<T>>) {
if list.is_empty() {
return;
}
let (run, rest) = pack_run(list);
acc.push(if run.len() == 1 {
RleItem::One(run[0].clone())
} else {
RleItem::Many(run.len(), run[0].clone())
});
aux(rest, acc);
}
let mut result = Vec::new();
aux(list, &mut result);
result
}
// ── Using chunk_by (functional, slice-based) ────────────────────────────────
pub fn encode_modified_chunks<T: PartialEq + Clone>(list: &[T]) -> Vec<RleItem<T>> {
// Manual chunk-by since slice::chunk_by is nightly
let mut result = Vec::new();
let mut i = 0;
while i < list.len() {
let start = i;
while i < list.len() && list[i] == list[start] {
i += 1;
}
let len = i - start;
result.push(if len == 1 {
RleItem::One(list[start].clone())
} else {
RleItem::Many(len, list[start].clone())
});
}
result
}
#[cfg(test)]
mod tests {
use super::*;
use RleItem::*;
#[test]
fn test_empty() {
assert_eq!(encode_modified::<char>(&[]), vec![]);
assert_eq!(encode_modified_recursive::<char>(&[]), vec![]);
}
#[test]
fn test_no_duplicates() {
assert_eq!(
encode_modified(&['a', 'b', 'c']),
vec![One('a'), One('b'), One('c')]
);
}
#[test]
fn test_all_same() {
assert_eq!(encode_modified(&['x', 'x', 'x']), vec![Many(3, 'x')]);
}
#[test]
fn test_mixed() {
let input = vec!['a', 'a', 'a', 'b', 'c', 'c', 'd', 'd', 'd', 'd'];
let expected = vec![Many(3, 'a'), One('b'), Many(2, 'c'), Many(4, 'd')];
assert_eq!(encode_modified(&input), expected);
assert_eq!(encode_modified_recursive(&input), expected);
assert_eq!(encode_modified_chunks(&input), expected);
}
#[test]
fn test_single_element() {
assert_eq!(encode_modified(&[42]), vec![One(42)]);
}
#[test]
fn test_strings() {
let input = vec!["hi", "hi", "bye"];
assert_eq!(encode_modified(&input), vec![Many(2, "hi"), One("bye")]);
}
}#[cfg(test)]
mod tests {
use super::*;
use RleItem::*;
#[test]
fn test_empty() {
assert_eq!(encode_modified::<char>(&[]), vec![]);
assert_eq!(encode_modified_recursive::<char>(&[]), vec![]);
}
#[test]
fn test_no_duplicates() {
assert_eq!(
encode_modified(&['a', 'b', 'c']),
vec![One('a'), One('b'), One('c')]
);
}
#[test]
fn test_all_same() {
assert_eq!(encode_modified(&['x', 'x', 'x']), vec![Many(3, 'x')]);
}
#[test]
fn test_mixed() {
let input = vec!['a', 'a', 'a', 'b', 'c', 'c', 'd', 'd', 'd', 'd'];
let expected = vec![Many(3, 'a'), One('b'), Many(2, 'c'), Many(4, 'd')];
assert_eq!(encode_modified(&input), expected);
assert_eq!(encode_modified_recursive(&input), expected);
assert_eq!(encode_modified_chunks(&input), expected);
}
#[test]
fn test_single_element() {
assert_eq!(encode_modified(&[42]), vec![One(42)]);
}
#[test]
fn test_strings() {
let input = vec!["hi", "hi", "bye"];
assert_eq!(encode_modified(&input), vec![Many(2, "hi"), One("bye")]);
}
}
Deep Comparison
Modified Run-Length Encoding: OCaml vs Rust
The Core Insight
This problem naturally requires a sum type to represent two kinds of elements: singles and runs. Both OCaml and Rust make this elegant with algebraic data types. The interesting part is how each language handles the stateful traversal — accumulating runs of equal elements while distinguishing singletons.
OCaml Approach
OCaml defines a parameterized variant type and uses pattern matching with accumulators:
type 'a rle = One of 'a | Many of int * 'a
let encode lst =
pack lst |> List.map (fun group ->
match group with
| [x] -> One x
| x :: _ -> Many (List.length group, x)
| [] -> failwith "impossible")
The pack helper groups consecutive duplicates into sublists first, then encode classifies each group. The direct version tracks count in a recursive accumulator.
Rust Approach
Rust uses a generic enum with the same structure:
#[derive(Debug, PartialEq, Clone)]
pub enum RleItem<T> { One(T), Many(usize, T) }
The idiomatic approach uses index-based iteration with equality checks. The recursive version uses split_first() or position-finding to identify runs. Both require Clone because Rust can't share data implicitly.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Sum type | type 'a rle = One of 'a \| Many of int * 'a | enum RleItem<T> { One(T), Many(usize, T) } |
| Type parameter | 'a (implicit) | <T: PartialEq + Clone> (explicit bounds) |
| Equality | Structural (built-in) | Requires PartialEq trait |
| Copying | Automatic (GC) | Requires Clone |
| Grouping | pack into sublists, then map | Direct counting (no intermediate lists) |
| Pattern matching | [x] matches singleton list | No slice literal patterns in stable |
| Performance | O(n) with intermediate allocations | O(n) single pass possible |
What Rust Learners Should Notice
PartialEq to compare elements. This makes the contract visible.Clone when you want to put a value into an enum variant while still traversing the original slice.[x] and x :: rest patterns don't exist for Rust slices (stable). You use split_first(), .len(), or index-based patterns instead.#[derive] for common traits, add trait bounds for generics.Further Reading
Exercises
decode_modified(encoded: &[RleItem<T>]) -> Vec<T> function and verify that decode_modified(encode_modified(v)) == v for any input.compression_ratio(original: &[i32]) -> f64 that returns original.len() as f64 / encode_modified(original).len() as f64. What input maximizes/minimizes compression?encode_modified to accept impl Iterator<Item=T> and return impl Iterator<Item=RleItem<T>> without collecting to a Vec first.proptest or manual generation) that verifies decode(encode_modified(list)) == list for any list of characters.RleItem<T> encoding to a simple (count, value) encoding by implementing to_plain_rle(items: &[RleItem<T>]) -> Vec<(usize, T)>.