ExamplesBy LevelBy TopicLearning Paths
063 Intermediate

063 — Run-Length Encoding

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "063 — Run-Length Encoding" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Run-length encoding (RLE) compresses consecutive repeated characters: "AABCCCDEEEE" → "2AB3CD4E". Key difference from OCaml: 1. **`chars()` vs `String.get`**: Rust's `s.chars()` returns a `Chars` iterator over Unicode scalar values. `s.as_bytes()[i]` gives bytes. OCaml's `s.[i]` or `String.get s i` gives bytes. Both should use character

Tutorial

The Problem

Run-length encoding (RLE) compresses consecutive repeated characters: "AABCCCDEEEE" → "2AB3CD4E". It is one of the oldest and simplest compression algorithms, used in fax machines (CCITT T.4 standard, 1980), BMP image files, PCX format, and as the basis for more sophisticated codecs. The decode-recode round-trip test is a classic example of property-based testing.

This version operates on strings rather than generic lists, demonstrating character-level iteration in Rust: collecting to Vec<char> for indexed access, building output with String::push_str, and decoding by parsing runs of digits followed by a character.

🎯 Learning Outcomes

  • • Iterate over a string as Vec<char> for indexed access
  • • Detect run boundaries and emit count+char pairs
  • • Decode by parsing alternating digit sequences and characters
  • • Handle the edge case: count=1 (single character, no prefix)
  • • Implement the round-trip invariant: decode(encode(s)) == s
  • • Implement both two-pass (pack then count) and single-pass (fold with run tracking) RLE encoding
  • • Use encode_fold with acc.last_mut() for single-pass encoding that avoids intermediate allocation
  • Code Example

    #![allow(clippy::all)]
    /// # Run-Length Encoding
    ///
    /// Compress consecutive runs of characters into count+char pairs.
    /// "AABCCCDEEEE" → "2AB3CD4E"
    
    /// Idiomatic Rust using iterators and fold.
    pub fn encode(s: &str) -> String {
        if s.is_empty() {
            return String::new();
        }
    
        let chars: Vec<char> = s.chars().collect();
        let mut result = String::new();
        let mut count = 1;
    
        for i in 1..chars.len() {
            if chars[i] == chars[i - 1] {
                count += 1;
            } else {
                if count > 1 {
                    result.push_str(&count.to_string());
                }
                result.push(chars[i - 1]);
                count = 1;
            }
        }
        // Don't forget the last run
        if count > 1 {
            result.push_str(&count.to_string());
        }
        result.push(*chars.last().unwrap());
        result
    }
    
    /// Decode: expand "2AB3CD4E" → "AABCCCDEEEE"
    pub fn decode(s: &str) -> String {
        let mut result = String::new();
        let mut count = 0;
        for c in s.chars() {
            if c.is_ascii_digit() {
                count = count * 10 + (c as u32 - '0' as u32) as usize;
            } else {
                let repeat = if count == 0 { 1 } else { count };
                for _ in 0..repeat {
                    result.push(c);
                }
                count = 0;
            }
        }
        result
    }
    
    /// Recursive encode
    pub fn encode_recursive(s: &str) -> String {
        fn go(chars: &[char], idx: usize, count: usize, result: &mut String) {
            if idx >= chars.len() {
                return;
            }
            if idx + 1 < chars.len() && chars[idx] == chars[idx + 1] {
                go(chars, idx + 1, count + 1, result);
            } else {
                if count > 1 {
                    result.push_str(&count.to_string());
                }
                result.push(chars[idx]);
                go(chars, idx + 1, 1, result);
            }
        }
        let chars: Vec<char> = s.chars().collect();
        let mut result = String::new();
        go(&chars, 0, 1, &mut result);
        result
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_encode() {
            assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
        }
    
        #[test]
        fn test_encode_no_repeats() {
            assert_eq!(encode("ABCDE"), "ABCDE");
        }
    
        #[test]
        fn test_encode_empty() {
            assert_eq!(encode(""), "");
        }
    
        #[test]
        fn test_encode_single() {
            assert_eq!(encode("A"), "A");
        }
    
        #[test]
        fn test_decode() {
            assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
        }
    
        #[test]
        fn test_roundtrip() {
            let original = "AABCCCDEEEE";
            assert_eq!(decode(&encode(original)), original);
        }
    
        #[test]
        fn test_recursive() {
            assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
        }
    }

    Key Differences

  • **chars() vs String.get**: Rust's s.chars() returns a Chars iterator over Unicode scalar values. s.as_bytes()[i] gives bytes. OCaml's s.[i] or String.get s i gives bytes. Both should use character-level access for Unicode safety.
  • String building: Rust uses String::push and String::push_str. OCaml's Buffer is equivalent — append-efficient string accumulation. Both avoid O(n²) repeated concatenation.
  • Digit parsing: Rust uses c.is_ascii_digit() and arithmetic c as u32 - '0' as u32. OCaml: Char.code c - Char.code '0'. Same approach.
  • **count * 10 + digit**: Multi-digit run lengths (10+) require accumulating digits: count = count * 10 + digit. Both implementations handle this the same way.
  • **dedup vs RLE:** dedup() removes all consecutive duplicates. RLE counts them. Different operations for different use cases — compression uses RLE; uniqueness uses dedup.
  • **Itertools::chunk_by:** The itertools crate provides chunk_by(|a, b| a == b) which groups consecutive equal elements. This is the iterator-based pack operation from example 009.
  • Generic vs concrete: Making encode<T: PartialEq + Clone> generic handles any element type, not just characters. OCaml's 'a rle is also polymorphic.
  • Performance: Single-pass encoding with fold is faster than two-pass (group then count) because it avoids intermediate allocation. For string RLE specifically, byte-level iteration (bytes()) is faster than chars() for ASCII input.
  • OCaml Approach

    OCaml's string version: let encode s = let n = String.length s in let buf = Buffer.create n in (* ... iterate with a counter ... *) Buffer.contents buf. OCaml's String.get accesses characters by index. For decoding: String.to_seq s |> Seq.fold_left (fun (count, result) c -> ...). The Buffer type accumulates the output string efficiently.

    Full Source

    #![allow(clippy::all)]
    /// # Run-Length Encoding
    ///
    /// Compress consecutive runs of characters into count+char pairs.
    /// "AABCCCDEEEE" → "2AB3CD4E"
    
    /// Idiomatic Rust using iterators and fold.
    pub fn encode(s: &str) -> String {
        if s.is_empty() {
            return String::new();
        }
    
        let chars: Vec<char> = s.chars().collect();
        let mut result = String::new();
        let mut count = 1;
    
        for i in 1..chars.len() {
            if chars[i] == chars[i - 1] {
                count += 1;
            } else {
                if count > 1 {
                    result.push_str(&count.to_string());
                }
                result.push(chars[i - 1]);
                count = 1;
            }
        }
        // Don't forget the last run
        if count > 1 {
            result.push_str(&count.to_string());
        }
        result.push(*chars.last().unwrap());
        result
    }
    
    /// Decode: expand "2AB3CD4E" → "AABCCCDEEEE"
    pub fn decode(s: &str) -> String {
        let mut result = String::new();
        let mut count = 0;
        for c in s.chars() {
            if c.is_ascii_digit() {
                count = count * 10 + (c as u32 - '0' as u32) as usize;
            } else {
                let repeat = if count == 0 { 1 } else { count };
                for _ in 0..repeat {
                    result.push(c);
                }
                count = 0;
            }
        }
        result
    }
    
    /// Recursive encode
    pub fn encode_recursive(s: &str) -> String {
        fn go(chars: &[char], idx: usize, count: usize, result: &mut String) {
            if idx >= chars.len() {
                return;
            }
            if idx + 1 < chars.len() && chars[idx] == chars[idx + 1] {
                go(chars, idx + 1, count + 1, result);
            } else {
                if count > 1 {
                    result.push_str(&count.to_string());
                }
                result.push(chars[idx]);
                go(chars, idx + 1, 1, result);
            }
        }
        let chars: Vec<char> = s.chars().collect();
        let mut result = String::new();
        go(&chars, 0, 1, &mut result);
        result
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_encode() {
            assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
        }
    
        #[test]
        fn test_encode_no_repeats() {
            assert_eq!(encode("ABCDE"), "ABCDE");
        }
    
        #[test]
        fn test_encode_empty() {
            assert_eq!(encode(""), "");
        }
    
        #[test]
        fn test_encode_single() {
            assert_eq!(encode("A"), "A");
        }
    
        #[test]
        fn test_decode() {
            assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
        }
    
        #[test]
        fn test_roundtrip() {
            let original = "AABCCCDEEEE";
            assert_eq!(decode(&encode(original)), original);
        }
    
        #[test]
        fn test_recursive() {
            assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_encode() {
            assert_eq!(encode("AABCCCDEEEE"), "2AB3CD4E");
        }
    
        #[test]
        fn test_encode_no_repeats() {
            assert_eq!(encode("ABCDE"), "ABCDE");
        }
    
        #[test]
        fn test_encode_empty() {
            assert_eq!(encode(""), "");
        }
    
        #[test]
        fn test_encode_single() {
            assert_eq!(encode("A"), "A");
        }
    
        #[test]
        fn test_decode() {
            assert_eq!(decode("2AB3CD4E"), "AABCCCDEEEE");
        }
    
        #[test]
        fn test_roundtrip() {
            let original = "AABCCCDEEEE";
            assert_eq!(decode(&encode(original)), original);
        }
    
        #[test]
        fn test_recursive() {
            assert_eq!(encode_recursive("AABCCCDEEEE"), "2AB3CD4E");
        }
    }

    Deep Comparison

    Run-Length Encoding — OCaml vs Rust Comparison

    Core Insight

    Both languages handle string building through a mutable buffer pattern. OCaml uses Buffer.t, Rust uses String (which is essentially a growable UTF-8 buffer). The recursive structure maps directly between both languages.

    OCaml Approach

    Uses Buffer.create for efficient string building, with a recursive go function that tracks current character and count. Character access via s.[i] is O(1) since OCaml strings are byte arrays. Buffer operations are imperative but wrapped in a functional recursive structure.

    Rust Approach

    Uses String::push and String::push_str for building. Characters must first be collected into Vec<char> for indexed access since Rust strings are UTF-8 (variable-width). The iterative version is more idiomatic in Rust than the recursive one due to ownership considerations.

    Comparison Table

    AspectOCamlRust
    MemoryBuffer.t (growable)String (growable Vec<u8>)
    Null safetyN/Aunwrap() on last char
    ErrorsIndex out of boundsPanics on OOB, or use .get()
    IterationRecursive with indexFor loop over indices or windows
    StringsByte array (O(1) index)UTF-8 (must collect to Vec<char>)

    Things Rust Learners Should Notice

  • Strings aren't indexables[i] doesn't work on &str; collect to Vec<char> first
  • **String is a buffer** — push() for char, push_str() for &str, like OCaml's Buffer
  • **to_string() on numbers** — count.to_string() allocates; consider write! macro for formatting
  • Roundtrip testingdecode(encode(s)) == s is a property-based testing pattern
  • Further Reading

  • • [String in Rust](https://doc.rust-lang.org/std/string/struct.String.html)
  • • [Exercism: Run-Length Encoding](https://exercism.org/tracks/rust/exercises/run-length-encoding)
  • Exercises

  • Generic RLE: Adapt to work on &[T] instead of &str, returning a String-like encoding. What type should the output be for generic T?
  • Streaming codec: Write an RleEncoder struct with a push(c: char) -> Option<String> method that emits encoded chunks when runs complete, enabling streaming use.
  • Benchmark: Compare encoding performance on "AAAAABBBCCCCC..." (long runs) vs "ABCDEFGH..." (no runs). When is RLE beneficial vs harmful?
  • String RLE: Implement rle_encode_str(s: &str) -> String that encodes a string in a compact format like "aaa bb c""3a 2b c" and rle_decode_str that reverses it.
  • Image RLE: Implement encode_image(pixels: &[u8]) -> Vec<(u8, u8)> for a grayscale image encoded as a flat byte array — this is the actual BMP/PCX compression format.
  • Open Source Repos