063 Intermediate

063 — Run-Length Encoding

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "063 — Run-Length Encoding" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Run-length encoding (RLE) compresses consecutive repeated characters: "AABCCCDEEEE" → "2AB3CD4E". Key difference from OCaml: 1. **`chars()` vs `String.get`**: Rust's `s.chars()` returns a `Chars` iterator over Unicode scalar values. `s.as_bytes()[i]` gives bytes. OCaml's `s.[i]` or `String.get s i` gives bytes. Both should use character

Tutorial

The Problem

Run-length encoding (RLE) compresses consecutive repeated characters: "AABCCCDEEEE" → "2AB3CD4E". It is one of the oldest and simplest compression algorithms, used in fax machines (CCITT T.4 standard, 1980), BMP image files, PCX format, and as the basis for more sophisticated codecs. The decode-recode round-trip test is a classic example of property-based testing.

This version operates on strings rather than generic lists, demonstrating character-level iteration in Rust: collecting to Vec<char> for indexed access, building output with String::push_str, and decoding by parsing runs of digits followed by a character.

🎯 Learning Outcomes

• Iterate over a string as Vec<char> for indexed access

• Detect run boundaries and emit count+char pairs

• Decode by parsing alternating digit sequences and characters

• Handle the edge case: count=1 (single character, no prefix)

• Implement the round-trip invariant: decode(encode(s)) == s

• Implement both two-pass (pack then count) and single-pass (fold with run tracking) RLE encoding

• Use encode_fold with acc.last_mut() for single-pass encoding that avoids intermediate allocation

Code Example

#![allow(clippy::all)]
/// # Run-Length Encoding
///
/// Compress consecutive runs of characters into count+char pairs.
/// "AABCCCDEEEE" → "2AB3CD4E"

/// Idiomatic Rust using iterators and fold.
pub fn encode(s: &str) -> String {
    if s.is_empty() {
        return String::new();
    }

    let chars: Vec<char> = s.chars().collect();
    let mut result = String::new();
    let mut count = 1;

    for i in 1..chars.len() {
        if chars[i] == chars[i - 1] {
            count += 1;
        } else {
            if count > 1 {
                result.push_str(&count.to_string());
            }
            result.push(chars[i - 1]);
            count = 1;
        }
    }
    // Don't forget the last run
    if count > 1 {
        result.push_str(&count.to_string());
    }
    result.push(*chars.last().unwrap());
    result
}

/// Decode: expand "2AB3CD4E" → "AABCCCDEEEE"
pub fn decode(s: &str) -> String {
    let mut result = String::new();
    let mut count = 0;
    for c in s.chars() {
        if c.is_ascii_digit() {
            count = count * 10 + (c as u32 - '0' as u32) as usize;
        } else {
            let repeat = if count == 0 { 1 } else { count };
            for _ in 0..repeat {
                result.push(c);
            }
            count = 0;
        }
    }
    result
}

/// Recursive encode
pub fn encode_recursive(s: &str) -> String {
    fn go(chars: &[char], idx: usize, count: usize, result: &mut String) {
        if idx >= chars.len() {
            return;
        }
        if idx + 1 < chars.len() && chars[idx] == chars[idx + 1] {
            go(chars, idx + 1, count + 1, result);
        } else {
            if count > 1 {
                result.push_str(&count.to_string());
            }
            result.push(chars[idx]);
            go(chars, idx + 1, 1, result);
        }
    }
    let chars: Vec<char> = s.chars().collect();
    let mut result = String::new();
    go(&chars, 0, 1, &mut result);
    result
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode() {
        assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
    }

    #[test]
    fn test_encode_no_repeats() {
        assert_eq!(encode("ABCDE"), "ABCDE");
    }

    #[test]
    fn test_encode_empty() {
        assert_eq!(encode(""), "");
    }

    #[test]
    fn test_encode_single() {
        assert_eq!(encode("A"), "A");
    }

    #[test]
    fn test_decode() {
        assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
    }

    #[test]
    fn test_roundtrip() {
        let original = "AABCCCDEEEE";
        assert_eq!(decode(&encode(original)), original);
    }

    #[test]
    fn test_recursive() {
        assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
    }
}

(* Run-Length Encoding — String Compression *)

let encode s =
  let n = String.length s in
  if n = 0 then "" else
  let buf = Buffer.create n in
  let rec go i c count =
    if i = n then begin
      if count > 1 then Buffer.add_string buf (string_of_int count);
      Buffer.add_char buf c
    end else if s.[i] = c then go (i+1) c (count+1)
    else begin
      if count > 1 then Buffer.add_string buf (string_of_int count);
      Buffer.add_char buf c;
      go (i+1) s.[i] 1
    end
  in
  go 1 s.[0] 1;
  Buffer.contents buf

let decode s =
  let buf = Buffer.create (String.length s) in
  let rec go i count =
    if i >= String.length s then ()
    else
      let c = s.[i] in
      if c >= '0' && c <= '9' then
        go (i+1) (count * 10 + Char.code c - Char.code '0')
      else begin
        let n = if count = 0 then 1 else count in
        for _ = 1 to n do Buffer.add_char buf c done;
        go (i+1) 0
      end
  in
  go 0 0;
  Buffer.contents buf

let () =
  assert (encode "AABCCCDEEEE" = "2AB3CD4E");
  assert (decode "2AB3CD4E" = "AABCCCDEEEE");
  assert (encode "" = "");
  Printf.printf "All RLE tests passed!\n"

Key Differences

**chars() vs String.get**: Rust's s.chars() returns a Chars iterator over Unicode scalar values. s.as_bytes()[i] gives bytes. OCaml's s.[i] or String.get s i gives bytes. Both should use character-level access for Unicode safety.

String building: Rust uses String::push and String::push_str. OCaml's Buffer is equivalent — append-efficient string accumulation. Both avoid O(n²) repeated concatenation.

Digit parsing: Rust uses c.is_ascii_digit() and arithmetic c as u32 - '0' as u32. OCaml: Char.code c - Char.code '0'. Same approach.

**count * 10 + digit**: Multi-digit run lengths (10+) require accumulating digits: count = count * 10 + digit. Both implementations handle this the same way.

**dedup vs RLE:** dedup() removes all consecutive duplicates. RLE counts them. Different operations for different use cases — compression uses RLE; uniqueness uses dedup.

**Itertools::chunk_by:** The itertools crate provides chunk_by(|a, b| a == b) which groups consecutive equal elements. This is the iterator-based pack operation from example 009.

Generic vs concrete: Making encode<T: PartialEq + Clone> generic handles any element type, not just characters. OCaml's 'a rle is also polymorphic.

Performance: Single-pass encoding with fold is faster than two-pass (group then count) because it avoids intermediate allocation. For string RLE specifically, byte-level iteration (bytes()) is faster than chars() for ASCII input.

OCaml Approach

OCaml's string version: let encode s = let n = String.length s in let buf = Buffer.create n in (* ... iterate with a counter ... *) Buffer.contents buf. OCaml's String.get accesses characters by index. For decoding: String.to_seq s |> Seq.fold_left (fun (count, result) c -> ...). The Buffer type accumulates the output string efficiently.

Full Source

#![allow(clippy::all)]
/// # Run-Length Encoding
///
/// Compress consecutive runs of characters into count+char pairs.
/// "AABCCCDEEEE" → "2AB3CD4E"

/// Idiomatic Rust using iterators and fold.
pub fn encode(s: &str) -> String {
    if s.is_empty() {
        return String::new();
    }

    let chars: Vec<char> = s.chars().collect();
    let mut result = String::new();
    let mut count = 1;

    for i in 1..chars.len() {
        if chars[i] == chars[i - 1] {
            count += 1;
        } else {
            if count > 1 {
                result.push_str(&count.to_string());
            }
            result.push(chars[i - 1]);
            count = 1;
        }
    }
    // Don't forget the last run
    if count > 1 {
        result.push_str(&count.to_string());
    }
    result.push(*chars.last().unwrap());
    result
}

/// Decode: expand "2AB3CD4E" → "AABCCCDEEEE"
pub fn decode(s: &str) -> String {
    let mut result = String::new();
    let mut count = 0;
    for c in s.chars() {
        if c.is_ascii_digit() {
            count = count * 10 + (c as u32 - '0' as u32) as usize;
        } else {
            let repeat = if count == 0 { 1 } else { count };
            for _ in 0..repeat {
                result.push(c);
            }
            count = 0;
        }
    }
    result
}

/// Recursive encode
pub fn encode_recursive(s: &str) -> String {
    fn go(chars: &[char], idx: usize, count: usize, result: &mut String) {
        if idx >= chars.len() {
            return;
        }
        if idx + 1 < chars.len() && chars[idx] == chars[idx + 1] {
            go(chars, idx + 1, count + 1, result);
        } else {
            if count > 1 {
                result.push_str(&count.to_string());
            }
            result.push(chars[idx]);
            go(chars, idx + 1, 1, result);
        }
    }
    let chars: Vec<char> = s.chars().collect();
    let mut result = String::new();
    go(&chars, 0, 1, &mut result);
    result
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode() {
        assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
    }

    #[test]
    fn test_encode_no_repeats() {
        assert_eq!(encode("ABCDE"), "ABCDE");
    }

    #[test]
    fn test_encode_empty() {
        assert_eq!(encode(""), "");
    }

    #[test]
    fn test_encode_single() {
        assert_eq!(encode("A"), "A");
    }

    #[test]
    fn test_decode() {
        assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
    }

    #[test]
    fn test_roundtrip() {
        let original = "AABCCCDEEEE";
        assert_eq!(decode(&encode(original)), original);
    }

    #[test]
    fn test_recursive() {
        assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
    }
}

(* Run-Length Encoding — String Compression *)

let encode s =
  let n = String.length s in
  if n = 0 then "" else
  let buf = Buffer.create n in
  let rec go i c count =
    if i = n then begin
      if count > 1 then Buffer.add_string buf (string_of_int count);
      Buffer.add_char buf c
    end else if s.[i] = c then go (i+1) c (count+1)
    else begin
      if count > 1 then Buffer.add_string buf (string_of_int count);
      Buffer.add_char buf c;
      go (i+1) s.[i] 1
    end
  in
  go 1 s.[0] 1;
  Buffer.contents buf

let decode s =
  let buf = Buffer.create (String.length s) in
  let rec go i count =
    if i >= String.length s then ()
    else
      let c = s.[i] in
      if c >= '0' && c <= '9' then
        go (i+1) (count * 10 + Char.code c - Char.code '0')
      else begin
        let n = if count = 0 then 1 else count in
        for _ = 1 to n do Buffer.add_char buf c done;
        go (i+1) 0
      end
  in
  go 0 0;
  Buffer.contents buf

let () =
  assert (encode "AABCCCDEEEE" = "2AB3CD4E");
  assert (decode "2AB3CD4E" = "AABCCCDEEEE");
  assert (encode "" = "");
  Printf.printf "All RLE tests passed!\n"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_encode() {
        assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
    }

    #[test]
    fn test_encode_no_repeats() {
        assert_eq!(encode("ABCDE"), "ABCDE");
    }

    #[test]
    fn test_encode_empty() {
        assert_eq!(encode(""), "");
    }

    #[test]
    fn test_encode_single() {
        assert_eq!(encode("A"), "A");
    }

    #[test]
    fn test_decode() {
        assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
    }

    #[test]
    fn test_roundtrip() {
        let original = "AABCCCDEEEE";
        assert_eq!(decode(&encode(original)), original);
    }

    #[test]
    fn test_recursive() {
        assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
    }
}

Deep Comparison

Run-Length Encoding — OCaml vs Rust Comparison

Core Insight

Both languages handle string building through a mutable buffer pattern. OCaml uses Buffer.t, Rust uses String (which is essentially a growable UTF-8 buffer). The recursive structure maps directly between both languages.

OCaml Approach

Uses Buffer.create for efficient string building, with a recursive go function that tracks current character and count. Character access via s.[i] is O(1) since OCaml strings are byte arrays. Buffer operations are imperative but wrapped in a functional recursive structure.

Rust Approach

Uses String::push and String::push_str for building. Characters must first be collected into Vec<char> for indexed access since Rust strings are UTF-8 (variable-width). The iterative version is more idiomatic in Rust than the recursive one due to ownership considerations.

Comparison Table

Aspect	OCaml	Rust
Memory	`Buffer.t` (growable)	`String` (growable Vec<u8>)
Null safety	N/A	`unwrap()` on last char
Errors	Index out of bounds	Panics on OOB, or use `.get()`
Iteration	Recursive with index	For loop over indices or windows
Strings	Byte array (O(1) index)	UTF-8 (must collect to Vec<char>)

Things Rust Learners Should Notice

Strings aren't indexable — s[i] doesn't work on &str; collect to Vec<char> first

**String is a buffer** — push() for char, push_str() for &str, like OCaml's Buffer

**to_string() on numbers** — count.to_string() allocates; consider write! macro for formatting

Roundtrip testing — decode(encode(s)) == s is a property-based testing pattern

Exercises

Generic RLE: Adapt to work on &[T] instead of &str, returning a String-like encoding. What type should the output be for generic T?

Streaming codec: Write an RleEncoder struct with a push(c: char) -> Option<String> method that emits encoded chunks when runs complete, enabling streaming use.

Benchmark: Compare encoding performance on "AAAAABBBCCCCC..." (long runs) vs "ABCDEFGH..." (no runs). When is RLE beneficial vs harmful?

String RLE: Implement rle_encode_str(s: &str) -> String that encodes a string in a compact format like "aaa bb c" → "3a 2b c" and rle_decode_str that reverses it.

Image RLE: Implement encode_image(pixels: &[u8]) -> Vec<(u8, u8)> for a grayscale image encoded as a flat byte array — this is the actual BMP/PCX compression format.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust

Tutorial Video

Tutorial

The Problem

🎯 Learning Outcomes

Code Example

Key Differences

OCaml Approach

Full Source

Deep Comparison

Run-Length Encoding — OCaml vs Rust Comparison

Core Insight

OCaml Approach

Rust Approach

Comparison Table

Things Rust Learners Should Notice

Further Reading

Exercises

Open Source Repos