ExamplesBy LevelBy TopicLearning Paths
500 Fundamental

String Compression — Run-Length Encoding

Functional Programming

Tutorial

The Problem

Run-length encoding is one of the simplest lossless compression schemes: instead of storing "aaabbbcc" (8 bytes), store [(3,'a'), (3,'b'), (2,'c')] or "3a3b2c" (6 chars). It is used in TIFF/BMP image formats (runs of same-color pixels), fax transmission (ITU-T T.4, runs of black/white pixels), PCX graphics, and DNA sequence compression. While LZ77/Huffman/DEFLATE are more powerful for general text, RLE is O(N) time/space and trivially invertible — ideal for data with long runs.

🎯 Learning Outcomes

  • • Encode a string into Vec<(usize, char)> using fold with a (current_char, count, acc) accumulator
  • • Decode by folding (count, char) pairs into a String with repeated push
  • • Format encoded output as human-readable "3a3b2c" with map + collect
  • • Handle the edge case of an empty string (no fold initial value)
  • • Recognise the pattern of seeding fold with the first element to avoid an Option
  • Code Example

    #![allow(clippy::all)]
    //! # 500. String Compression — Run-Length Encoding
    //! Encode/decode strings using run-length encoding via iterator fold.
    
    /// Encode a string into (count, char) pairs.
    /// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
    fn encode(s: &str) -> Vec<(usize, char)> {
        let mut chars = s.chars();
        let first = match chars.next() {
            None => return vec![],
            Some(c) => c,
        };
        let (cur, count, mut acc) =
            chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
                if c == cur {
                    (cur, count + 1, acc)
                } else {
                    acc.push((count, cur));
                    (c, 1, acc)
                }
            });
        acc.push((count, cur));
        acc
    }
    
    /// Decode (count, char) pairs back into a string.
    fn decode(pairs: &[(usize, char)]) -> String {
        pairs.iter().fold(String::new(), |mut s, &(n, c)| {
            for _ in 0..n {
                s.push(c);
            }
            s
        })
    }
    
    /// Format encoded pairs as human-readable "2a3b2c"
    fn show_encoded(pairs: &[(usize, char)]) -> String {
        pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_encode_basic() {
            assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
        }
    
        #[test]
        fn test_encode_all_same() {
            assert_eq!(encode("aaaa"), vec![(4, 'a')]);
        }
    
        #[test]
        fn test_encode_no_repeats() {
            assert_eq!(
                encode("abcde"),
                vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
            );
        }
    
        #[test]
        fn test_encode_empty() {
            assert_eq!(encode(""), vec![]);
        }
    
        #[test]
        fn test_roundtrip() {
            let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
            for s in &inputs {
                assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
            }
        }
    }

    Key Differences

  • **fold accumulator**: Both Rust and OCaml use a three-element fold accumulator (current, count, result_so_far); the seeding pattern (extract first element before fold) is the same.
  • Decode strategy: Rust's fold accumulates into a String via push; OCaml uses Buffer + iter. Both are O(N) with one allocation.
  • **String::with_capacity**: The decode result size is known (pairs.iter().map(|(n,_)| n).sum()); pre-allocating avoids reallocation. OCaml's Buffer.create hint is advisory.
  • **Iterator::flat_map decode**: An alternative Rust decode: pairs.iter().flat_map(|(n,c)| std::iter::repeat(*c).take(*n)).collect() — more functional but slightly less efficient.
  • OCaml Approach

    let encode s =
      match String.to_seq s |> List.of_seq with
      | [] -> []
      | first :: rest ->
        let (cur, count, acc) = List.fold_left (fun (cur, count, acc) c ->
          if c = cur then (cur, count + 1, acc)
          else (c, 1, (count, cur) :: acc)) (first, 1, []) rest in
        List.rev ((count, cur) :: acc)
    
    let decode pairs =
      let buf = Buffer.create 64 in
      List.iter (fun (n, c) ->
        for _ = 1 to n do Buffer.add_char buf c done) pairs;
      Buffer.contents buf
    

    OCaml's List.fold_left is equivalent to Rust's Iterator::fold; the accumulator pattern is identical.

    Full Source

    #![allow(clippy::all)]
    //! # 500. String Compression — Run-Length Encoding
    //! Encode/decode strings using run-length encoding via iterator fold.
    
    /// Encode a string into (count, char) pairs.
    /// "aabbbcc" -> vec![(2,'a'), (3,'b'), (2,'c')]
    fn encode(s: &str) -> Vec<(usize, char)> {
        let mut chars = s.chars();
        let first = match chars.next() {
            None => return vec![],
            Some(c) => c,
        };
        let (cur, count, mut acc) =
            chars.fold((first, 1usize, Vec::new()), |(cur, count, mut acc), c| {
                if c == cur {
                    (cur, count + 1, acc)
                } else {
                    acc.push((count, cur));
                    (c, 1, acc)
                }
            });
        acc.push((count, cur));
        acc
    }
    
    /// Decode (count, char) pairs back into a string.
    fn decode(pairs: &[(usize, char)]) -> String {
        pairs.iter().fold(String::new(), |mut s, &(n, c)| {
            for _ in 0..n {
                s.push(c);
            }
            s
        })
    }
    
    /// Format encoded pairs as human-readable "2a3b2c"
    fn show_encoded(pairs: &[(usize, char)]) -> String {
        pairs.iter().map(|(n, c)| format!("{}{}", n, c)).collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_encode_basic() {
            assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
        }
    
        #[test]
        fn test_encode_all_same() {
            assert_eq!(encode("aaaa"), vec![(4, 'a')]);
        }
    
        #[test]
        fn test_encode_no_repeats() {
            assert_eq!(
                encode("abcde"),
                vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
            );
        }
    
        #[test]
        fn test_encode_empty() {
            assert_eq!(encode(""), vec![]);
        }
    
        #[test]
        fn test_roundtrip() {
            let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
            for s in &inputs {
                assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
            }
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_encode_basic() {
            assert_eq!(encode("aabbbcc"), vec![(2, 'a'), (3, 'b'), (2, 'c')]);
        }
    
        #[test]
        fn test_encode_all_same() {
            assert_eq!(encode("aaaa"), vec![(4, 'a')]);
        }
    
        #[test]
        fn test_encode_no_repeats() {
            assert_eq!(
                encode("abcde"),
                vec![(1, 'a'), (1, 'b'), (1, 'c'), (1, 'd'), (1, 'e')]
            );
        }
    
        #[test]
        fn test_encode_empty() {
            assert_eq!(encode(""), vec![]);
        }
    
        #[test]
        fn test_roundtrip() {
            let inputs = ["aabbbcc", "aaaa", "abcde", "aabbcc", "z"];
            for s in &inputs {
                assert_eq!(decode(&encode(s)), *s, "roundtrip failed for {:?}", s);
            }
        }
    }

    Exercises

  • Encode to string: Implement fn encode_str(s: &str) -> String that formats directly as "3a3b2c" without an intermediate Vec.
  • Decode from string: Implement fn decode_str(s: &str) -> Option<String> that parses "3a3b2c" by alternating between parse::<usize>() runs and char reads.
  • Compression ratio: Write a benchmark comparing RLE compression on (a) random ASCII text, (b) repeated patterns like "abcabcabc", and (c) binary-like data — measure when RLE expands vs. compresses.
  • Open Source Repos