String Compression — Run-Length Encoding
Functional Programming
Tutorial
The Problem
Run-length encoding is one of the simplest lossless compression schemes: instead of storing "aaabbbcc" (8 bytes), store [(3,'a'), (3,'b'), (2,'c')] or "3a3b2c" (6 chars). It is used in TIFF/BMP image formats (runs of same-color pixels), fax transmission (ITU-T T.4, runs of black/white pixels), PCX graphics, and DNA sequence compression. While LZ77/Huffman/DEFLATE are more powerful for general text, RLE is O(N) time/space and trivially invertible — ideal for data with long runs.
🎯 Learning Outcomes
Vec<(usize, char)> using fold with a (current_char, count, acc) accumulator(count, char) pairs into a String with repeated push"3a3b2c" with map + collectfold with the first element to avoid an OptionCode Example
#![allow(clippy::all)]
//! # 500. String Compression — Run-Length Encoding
//! Encode/decode strings using run-length encoding via iterator fold.
/// Encode a string into (count, char) pairs.
/// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
fn encode(s: &str) -> Vec<(usize, char)> {
let mut chars = s.chars();
let first = match chars.next() {
None => return vec![],
Some(c) => c,
};
let (cur, count, mut acc) =
chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
if c == cur {
(cur, count + 1, acc)
} else {
acc.push((count, cur));
(c, 1, acc)
}
});
acc.push((count, cur));
acc
}
/// Decode (count, char) pairs back into a string.
fn decode(pairs: &[(usize, char)]) -> String {
pairs.iter().fold(String::new(), |mut s, &(n, c)| {
for _ in 0..n {
s.push(c);
}
s
})
}
/// Format encoded pairs as human-readable "2a3b2c"
fn show_encoded(pairs: &[(usize, char)]) -> String {
pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode_basic() {
assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
}
#[test]
fn test_encode_all_same() {
assert_eq!(encode("aaaa"), vec![(4, 'a')]);
}
#[test]
fn test_encode_no_repeats() {
assert_eq!(
encode("abcde"),
vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
);
}
#[test]
fn test_encode_empty() {
assert_eq!(encode(""), vec![]);
}
#[test]
fn test_roundtrip() {
let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
for s in &inputs {
assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
}
}
}Key Differences
fold accumulator**: Both Rust and OCaml use a three-element fold accumulator (current, count, result_so_far); the seeding pattern (extract first element before fold) is the same.fold accumulates into a String via push; OCaml uses Buffer + iter. Both are O(N) with one allocation.String::with_capacity**: The decode result size is known (pairs.iter().map(|(n,_)| n).sum()); pre-allocating avoids reallocation. OCaml's Buffer.create hint is advisory.Iterator::flat_map decode**: An alternative Rust decode: pairs.iter().flat_map(|(n,c)| std::iter::repeat(*c).take(*n)).collect() — more functional but slightly less efficient.OCaml Approach
let encode s =
match String.to_seq s |> List.of_seq with
| [] -> []
| first :: rest ->
let (cur, count, acc) = List.fold_left (fun (cur, count, acc) c ->
if c = cur then (cur, count + 1, acc)
else (c, 1, (count, cur) :: acc)) (first, 1, []) rest in
List.rev ((count, cur) :: acc)
let decode pairs =
let buf = Buffer.create 64 in
List.iter (fun (n, c) ->
for _ = 1 to n do Buffer.add_char buf c done) pairs;
Buffer.contents buf
OCaml's List.fold_left is equivalent to Rust's Iterator::fold; the accumulator pattern is identical.
Full Source
#![allow(clippy::all)]
//! # 500. String Compression — Run-Length Encoding
//! Encode/decode strings using run-length encoding via iterator fold.
/// Encode a string into (count, char) pairs.
/// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
fn encode(s: &str) -> Vec<(usize, char)> {
let mut chars = s.chars();
let first = match chars.next() {
None => return vec![],
Some(c) => c,
};
let (cur, count, mut acc) =
chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
if c == cur {
(cur, count + 1, acc)
} else {
acc.push((count, cur));
(c, 1, acc)
}
});
acc.push((count, cur));
acc
}
/// Decode (count, char) pairs back into a string.
fn decode(pairs: &[(usize, char)]) -> String {
pairs.iter().fold(String::new(), |mut s, &(n, c)| {
for _ in 0..n {
s.push(c);
}
s
})
}
/// Format encoded pairs as human-readable "2a3b2c"
fn show_encoded(pairs: &[(usize, char)]) -> String {
pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode_basic() {
assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
}
#[test]
fn test_encode_all_same() {
assert_eq!(encode("aaaa"), vec![(4, 'a')]);
}
#[test]
fn test_encode_no_repeats() {
assert_eq!(
encode("abcde"),
vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
);
}
#[test]
fn test_encode_empty() {
assert_eq!(encode(""), vec![]);
}
#[test]
fn test_roundtrip() {
let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
for s in &inputs {
assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
}
}
}
✓ Tests
Rust test suite
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_encode_basic() {
assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
}
#[test]
fn test_encode_all_same() {
assert_eq!(encode("aaaa"), vec![(4, 'a')]);
}
#[test]
fn test_encode_no_repeats() {
assert_eq!(
encode("abcde"),
vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
);
}
#[test]
fn test_encode_empty() {
assert_eq!(encode(""), vec![]);
}
#[test]
fn test_roundtrip() {
let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
for s in &inputs {
assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
}
}
}
Exercises
fn encode_str(s: &str) -> String that formats directly as "3a3b2c" without an intermediate Vec.fn decode_str(s: &str) -> Option<String> that parses "3a3b2c" by alternating between parse::<usize>() runs and char reads."abcabcabc", and (c) binary-like data — measure when RLE expands vs. compresses.