063 — Run-Length Encoding
Tutorial Video
Text description (accessibility)
This video demonstrates the "063 — Run-Length Encoding" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Run-length encoding (RLE) compresses consecutive repeated characters: "AABCCCDEEEE" → "2AB3CD4E". Key difference from OCaml: 1. **`chars()` vs `String.get`**: Rust's `s.chars()` returns a `Chars` iterator over Unicode scalar values. `s.as_bytes()[i]` gives bytes. OCaml's `s.[i]` or `String.get s i` gives bytes. Both should use character
Tutorial
The Problem
Run-length encoding (RLE) compresses consecutive repeated characters: "AABCCCDEEEE" → "2AB3CD4E". It is one of the oldest and simplest compression algorithms, used in fax machines (CCITT T.4 standard, 1980), BMP image files, PCX format, and as the basis for more sophisticated codecs. The decode-recode round-trip test is a classic example of property-based testing.
This version operates on strings rather than generic lists, demonstrating character-level iteration in Rust: collecting to Vec<char> for indexed access, building output with String::push_str, and decoding by parsing runs of digits followed by a character.
🎯 Learning Outcomes
Vec<char> for indexed accessdecode(encode(s)) == sencode_fold with acc.last_mut() for single-pass encoding that avoids intermediate allocationCode Example
#![allow(clippy::all)]
/// # Run-Length Encoding
///
/// Compress consecutive runs of characters into count+char pairs.
/// "AABCCCDEEEE" → "2AB3CD4E"
/// Idiomatic Rust using iterators and fold.
pub fn encode(s: &str) -> String {
if s.is_empty() {
return String::new();
}
let chars: Vec<char> = s.chars().collect();
let mut result = String::new();
let mut count = 1;
for i in 1..chars.len() {
if chars[i] == chars[i - 1] {
count += 1;
} else {
if count > 1 {
result.push_str(&count.to_string());
}
result.push(chars[i - 1]);
count = 1;
}
}
// Don't forget the last run
if count > 1 {
result.push_str(&count.to_string());
}
result.push(*chars.last().unwrap());
result
}
/// Decode: expand "2AB3CD4E" → "AABCCCDEEEE"
pub fn decode(s: &str) -> String {
let mut result = String::new();
let mut count = 0;
for c in s.chars() {
if c.is_ascii_digit() {
count = count * 10 + (c as u32 - '0' as u32) as usize;
} else {
let repeat = if count == 0 { 1 } else { count };
for _ in 0..repeat {
result.push(c);
}
count = 0;
}
}
result
}
/// Recursive encode
pub fn encode_recursive(s: &str) -> String {
fn go(chars: &[char], idx: usize, count: usize, result: &mut String) {
if idx >= chars.len() {
return;
}
if idx + 1 < chars.len() && chars[idx] == chars[idx + 1] {
go(chars, idx + 1, count + 1, result);
} else {
if count > 1 {
result.push_str(&count.to_string());
}
result.push(chars[idx]);
go(chars, idx + 1, 1, result);
}
}
let chars: Vec<char> = s.chars().collect();
let mut result = String::new();
go(&chars, 0, 1, &mut result);
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode() {
assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
}
#[test]
fn test_encode_no_repeats() {
assert_eq!(encode("ABCDE"), "ABCDE");
}
#[test]
fn test_encode_empty() {
assert_eq!(encode(""), "");
}
#[test]
fn test_encode_single() {
assert_eq!(encode("A"), "A");
}
#[test]
fn test_decode() {
assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
}
#[test]
fn test_roundtrip() {
let original = "AABCCCDEEEE";
assert_eq!(decode(&encode(original)), original);
}
#[test]
fn test_recursive() {
assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
}
}Key Differences
chars() vs String.get**: Rust's s.chars() returns a Chars iterator over Unicode scalar values. s.as_bytes()[i] gives bytes. OCaml's s.[i] or String.get s i gives bytes. Both should use character-level access for Unicode safety.String::push and String::push_str. OCaml's Buffer is equivalent — append-efficient string accumulation. Both avoid O(n²) repeated concatenation.c.is_ascii_digit() and arithmetic c as u32 - '0' as u32. OCaml: Char.code c - Char.code '0'. Same approach.count * 10 + digit**: Multi-digit run lengths (10+) require accumulating digits: count = count * 10 + digit. Both implementations handle this the same way.dedup vs RLE:** dedup() removes all consecutive duplicates. RLE counts them. Different operations for different use cases — compression uses RLE; uniqueness uses dedup.Itertools::chunk_by:** The itertools crate provides chunk_by(|a, b| a == b) which groups consecutive equal elements. This is the iterator-based pack operation from example 009.encode<T: PartialEq + Clone> generic handles any element type, not just characters. OCaml's 'a rle is also polymorphic.fold is faster than two-pass (group then count) because it avoids intermediate allocation. For string RLE specifically, byte-level iteration (bytes()) is faster than chars() for ASCII input.OCaml Approach
OCaml's string version: let encode s = let n = String.length s in let buf = Buffer.create n in (* ... iterate with a counter ... *) Buffer.contents buf. OCaml's String.get accesses characters by index. For decoding: String.to_seq s |> Seq.fold_left (fun (count, result) c -> ...). The Buffer type accumulates the output string efficiently.
Full Source
#![allow(clippy::all)]
/// # Run-Length Encoding
///
/// Compress consecutive runs of characters into count+char pairs.
/// "AABCCCDEEEE" → "2AB3CD4E"
/// Idiomatic Rust using iterators and fold.
pub fn encode(s: &str) -> String {
if s.is_empty() {
return String::new();
}
let chars: Vec<char> = s.chars().collect();
let mut result = String::new();
let mut count = 1;
for i in 1..chars.len() {
if chars[i] == chars[i - 1] {
count += 1;
} else {
if count > 1 {
result.push_str(&count.to_string());
}
result.push(chars[i - 1]);
count = 1;
}
}
// Don't forget the last run
if count > 1 {
result.push_str(&count.to_string());
}
result.push(*chars.last().unwrap());
result
}
/// Decode: expand "2AB3CD4E" → "AABCCCDEEEE"
pub fn decode(s: &str) -> String {
let mut result = String::new();
let mut count = 0;
for c in s.chars() {
if c.is_ascii_digit() {
count = count * 10 + (c as u32 - '0' as u32) as usize;
} else {
let repeat = if count == 0 { 1 } else { count };
for _ in 0..repeat {
result.push(c);
}
count = 0;
}
}
result
}
/// Recursive encode
pub fn encode_recursive(s: &str) -> String {
fn go(chars: &[char], idx: usize, count: usize, result: &mut String) {
if idx >= chars.len() {
return;
}
if idx + 1 < chars.len() && chars[idx] == chars[idx + 1] {
go(chars, idx + 1, count + 1, result);
} else {
if count > 1 {
result.push_str(&count.to_string());
}
result.push(chars[idx]);
go(chars, idx + 1, 1, result);
}
}
let chars: Vec<char> = s.chars().collect();
let mut result = String::new();
go(&chars, 0, 1, &mut result);
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode() {
assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
}
#[test]
fn test_encode_no_repeats() {
assert_eq!(encode("ABCDE"), "ABCDE");
}
#[test]
fn test_encode_empty() {
assert_eq!(encode(""), "");
}
#[test]
fn test_encode_single() {
assert_eq!(encode("A"), "A");
}
#[test]
fn test_decode() {
assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
}
#[test]
fn test_roundtrip() {
let original = "AABCCCDEEEE";
assert_eq!(decode(&encode(original)), original);
}
#[test]
fn test_recursive() {
assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode() {
assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
}
#[test]
fn test_encode_no_repeats() {
assert_eq!(encode("ABCDE"), "ABCDE");
}
#[test]
fn test_encode_empty() {
assert_eq!(encode(""), "");
}
#[test]
fn test_encode_single() {
assert_eq!(encode("A"), "A");
}
#[test]
fn test_decode() {
assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
}
#[test]
fn test_roundtrip() {
let original = "AABCCCDEEEE";
assert_eq!(decode(&encode(original)), original);
}
#[test]
fn test_recursive() {
assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
}
}
Deep Comparison
Run-Length Encoding — OCaml vs Rust Comparison
Core Insight
Both languages handle string building through a mutable buffer pattern. OCaml uses Buffer.t, Rust uses String (which is essentially a growable UTF-8 buffer). The recursive structure maps directly between both languages.
OCaml Approach
Uses Buffer.create for efficient string building, with a recursive go function that tracks current character and count. Character access via s.[i] is O(1) since OCaml strings are byte arrays. Buffer operations are imperative but wrapped in a functional recursive structure.
Rust Approach
Uses String::push and String::push_str for building. Characters must first be collected into Vec<char> for indexed access since Rust strings are UTF-8 (variable-width). The iterative version is more idiomatic in Rust than the recursive one due to ownership considerations.
Comparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Memory | Buffer.t (growable) | String (growable Vec<u8>) |
| Null safety | N/A | unwrap() on last char |
| Errors | Index out of bounds | Panics on OOB, or use .get() |
| Iteration | Recursive with index | For loop over indices or windows |
| Strings | Byte array (O(1) index) | UTF-8 (must collect to Vec<char>) |
Things Rust Learners Should Notice
s[i] doesn't work on &str; collect to Vec<char> firstString is a buffer** — push() for char, push_str() for &str, like OCaml's Bufferto_string() on numbers** — count.to_string() allocates; consider write! macro for formattingdecode(encode(s)) == s is a property-based testing patternFurther Reading
Exercises
&[T] instead of &str, returning a String-like encoding. What type should the output be for generic T?RleEncoder struct with a push(c: char) -> Option<String> method that emits encoded chunks when runs complete, enabling streaming use.rle_encode_str(s: &str) -> String that encodes a string in a compact format like "aaa bb c" → "3a 2b c" and rle_decode_str that reverses it.encode_image(pixels: &[u8]) -> Vec<(u8, u8)> for a grayscale image encoded as a flat byte array — this is the actual BMP/PCX compression format.