500 Fundamental

String Compression — Run-Length Encoding

Functional Programming

Tutorial

The Problem

Run-length encoding is one of the simplest lossless compression schemes: instead of storing "aaabbbcc" (8 bytes), store [(3,'a'), (3,'b'), (2,'c')] or "3a3b2c" (6 chars). It is used in TIFF/BMP image formats (runs of same-color pixels), fax transmission (ITU-T T.4, runs of black/white pixels), PCX graphics, and DNA sequence compression. While LZ77/Huffman/DEFLATE are more powerful for general text, RLE is O(N) time/space and trivially invertible — ideal for data with long runs.

🎯 Learning Outcomes

• Encode a string into Vec<(usize, char)> using fold with a (current_char, count, acc) accumulator

• Decode by folding (count, char) pairs into a String with repeated push

• Format encoded output as human-readable "3a3b2c" with map + collect

• Handle the edge case of an empty string (no fold initial value)

• Recognise the pattern of seeding fold with the first element to avoid an Option

Code Example

#![allow(clippy::all)]
//! # 500. String Compression — Run-Length Encoding
//! Encode/decode strings using run-length encoding via iterator fold.

/// Encode a string into (count, char) pairs.
/// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
fn encode(s: &str) -> Vec<(usize, char)> {
    let mut chars = s.chars();
    let first = match chars.next() {
        None => return vec![],
        Some(c) => c,
    };
    let (cur, count, mut acc) =
        chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
            if c == cur {
                (cur, count + 1, acc)
            } else {
                acc.push((count, cur));
                (c, 1, acc)
            }
        });
    acc.push((count, cur));
    acc
}

/// Decode (count, char) pairs back into a string.
fn decode(pairs: &[(usize, char)]) -> String {
    pairs.iter().fold(String::new(), |mut s, &(n, c)| {
        for _ in 0..n {
            s.push(c);
        }
        s
    })
}

/// Format encoded pairs as human-readable "2a3b2c"
fn show_encoded(pairs: &[(usize, char)]) -> String {
    pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode_basic() {
        assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
    }

    #[test]
    fn test_encode_all_same() {
        assert_eq!(encode("aaaa"), vec![(4, 'a')]);
    }

    #[test]
    fn test_encode_no_repeats() {
        assert_eq!(
            encode("abcde"),
            vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
        );
    }

    #[test]
    fn test_encode_empty() {
        assert_eq!(encode(""), vec![]);
    }

    #[test]
    fn test_roundtrip() {
        let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
        for s in &inputs {
            assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
        }
    }
}

(* Run-length encoding in OCaml *)

(* Encode: "aabbbcc" -> [(2,'a');(3,'b');(2,'c')] *)
let encode (s : string) : (int * char) list =
  let chars = List.of_seq (String.to_seq s) in
  match chars with
  | [] -> []
  | first :: rest ->
    let (cur, count, acc) =
      List.fold_left
        (fun (cur, count, acc) c ->
          if c = cur then (cur, count + 1, acc)
          else (c, 1, (count, cur) :: acc))
        (first, 1, [])
        rest
    in
    List.rev ((count, cur) :: acc)

(* Decode: [(2,'a');(3,'b')] -> "aabbb" *)
let decode (pairs : (int * char) list) : string =
  List.fold_left
    (fun acc (n, c) -> acc ^ String.make n c)
    ""
    pairs

(* Pretty print encoded form *)
let show_encoded pairs =
  let parts = List.map (fun (n, c) -> Printf.sprintf "%d%c" n c) pairs in
  String.concat "" parts

let () =
  let tests = ["aabbbcc"; "aaaa"; "abcde"; ""; "aabbcc"] in
  List.iter (fun s ->
    let enc = encode s in
    let dec = decode enc in
    Printf.printf "Input:   %S\n" s;
    Printf.printf "Encoded: %s\n" (show_encoded enc);
    Printf.printf "Decoded: %S\n" dec;
    Printf.printf "Match:   %b\n\n" (s = dec)
  ) tests

Key Differences

**fold accumulator**: Both Rust and OCaml use a three-element fold accumulator (current, count, result_so_far); the seeding pattern (extract first element before fold) is the same.

Decode strategy: Rust's fold accumulates into a String via push; OCaml uses Buffer + iter. Both are O(N) with one allocation.

**String::with_capacity**: The decode result size is known (pairs.iter().map(|(n,_)| n).sum()); pre-allocating avoids reallocation. OCaml's Buffer.create hint is advisory.

**Iterator::flat_map decode**: An alternative Rust decode: pairs.iter().flat_map(|(n,c)| std::iter::repeat(*c).take(*n)).collect() — more functional but slightly less efficient.

OCaml Approach

let encode s =
  match String.to_seq s |> List.of_seq with
  | [] -> []
  | first :: rest ->
    let (cur, count, acc) = List.fold_left (fun (cur, count, acc) c ->
      if c = cur then (cur, count + 1, acc)
      else (c, 1, (count, cur) :: acc)) (first, 1, []) rest in
    List.rev ((count, cur) :: acc)

let decode pairs =
  let buf = Buffer.create 64 in
  List.iter (fun (n, c) ->
    for _ = 1 to n do Buffer.add_char buf c done) pairs;
  Buffer.contents buf

OCaml's List.fold_left is equivalent to Rust's Iterator::fold; the accumulator pattern is identical.

Full Source

#![allow(clippy::all)]
//! # 500. String Compression — Run-Length Encoding
//! Encode/decode strings using run-length encoding via iterator fold.

/// Encode a string into (count, char) pairs.
/// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
fn encode(s: &str) -> Vec<(usize, char)> {
    let mut chars = s.chars();
    let first = match chars.next() {
        None => return vec![],
        Some(c) => c,
    };
    let (cur, count, mut acc) =
        chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
            if c == cur {
                (cur, count + 1, acc)
            } else {
                acc.push((count, cur));
                (c, 1, acc)
            }
        });
    acc.push((count, cur));
    acc
}

/// Decode (count, char) pairs back into a string.
fn decode(pairs: &[(usize, char)]) -> String {
    pairs.iter().fold(String::new(), |mut s, &(n, c)| {
        for _ in 0..n {
            s.push(c);
        }
        s
    })
}

/// Format encoded pairs as human-readable "2a3b2c"
fn show_encoded(pairs: &[(usize, char)]) -> String {
    pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode_basic() {
        assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
    }

    #[test]
    fn test_encode_all_same() {
        assert_eq!(encode("aaaa"), vec![(4, 'a')]);
    }

    #[test]
    fn test_encode_no_repeats() {
        assert_eq!(
            encode("abcde"),
            vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
        );
    }

    #[test]
    fn test_encode_empty() {
        assert_eq!(encode(""), vec![]);
    }

    #[test]
    fn test_roundtrip() {
        let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
        for s in &inputs {
            assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
        }
    }
}

(* Run-length encoding in OCaml *)

(* Encode: "aabbbcc" -> [(2,'a');(3,'b');(2,'c')] *)
let encode (s : string) : (int * char) list =
  let chars = List.of_seq (String.to_seq s) in
  match chars with
  | [] -> []
  | first :: rest ->
    let (cur, count, acc) =
      List.fold_left
        (fun (cur, count, acc) c ->
          if c = cur then (cur, count + 1, acc)
          else (c, 1, (count, cur) :: acc))
        (first, 1, [])
        rest
    in
    List.rev ((count, cur) :: acc)

(* Decode: [(2,'a');(3,'b')] -> "aabbb" *)
let decode (pairs : (int * char) list) : string =
  List.fold_left
    (fun acc (n, c) -> acc ^ String.make n c)
    ""
    pairs

(* Pretty print encoded form *)
let show_encoded pairs =
  let parts = List.map (fun (n, c) -> Printf.sprintf "%d%c" n c) pairs in
  String.concat "" parts

let () =
  let tests = ["aabbbcc"; "aaaa"; "abcde"; ""; "aabbcc"] in
  List.iter (fun s ->
    let enc = encode s in
    let dec = decode enc in
    Printf.printf "Input:   %S\n" s;
    Printf.printf "Encoded: %s\n" (show_encoded enc);
    Printf.printf "Decoded: %S\n" dec;
    Printf.printf "Match:   %b\n\n" (s = dec)
  ) tests

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode_basic() {
        assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
    }

    #[test]
    fn test_encode_all_same() {
        assert_eq!(encode("aaaa"), vec![(4, 'a')]);
    }

    #[test]
    fn test_encode_no_repeats() {
        assert_eq!(
            encode("abcde"),
            vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
        );
    }

    #[test]
    fn test_encode_empty() {
        assert_eq!(encode(""), vec![]);
    }

    #[test]
    fn test_roundtrip() {
        let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
        for s in &inputs {
            assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
        }
    }
}

Exercises

Encode to string: Implement fn encode_str(s: &str) -> String that formats directly as "3a3b2c" without an intermediate Vec.

Decode from string: Implement fn decode_str(s: &str) -> Option<String> that parses "3a3b2c" by alternating between parse::<usize>() runs and char reads.

Compression ratio: Write a benchmark comparing RLE compression on (a) random ASCII text, (b) repeated patterns like "abcabcabc", and (c) binary-like data — measure when RLE expands vs. compresses.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust