499 Fundamental

String Escaping

Functional Programming

Tutorial

The Problem

Injecting unescaped user input into HTML enables XSS; into SQL, SQL injection; into shell commands, command injection. Every output context has its own set of special characters that must be escaped. The inverse, unescaping, is equally important when reading stored or transmitted data. A robust implementation processes escape sequences in a single pass as a state machine (when unescaping, \ introduces a two-character sequence) rather than as a series of replace calls, which can double-escape or double-unescape when applied in the wrong order.
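To make the double-escaping risk concrete, here is a minimal sketch that escapes with chained replace calls in two different orders. The function names are hypothetical and only three characters are handled for brevity.

```rust
/// Chained `replace` in an unlucky order: `<` is rewritten first, and the
/// `&` it introduces is then escaped a second time by the last call.
fn escape_replace_amp_last(s: &str) -> String {
    s.replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('&', "&amp;")
}

/// The same calls with `&` handled first. The result is correct, but only
/// because of the call order.
fn escape_replace_amp_first(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}
```

Under these definitions, escape_replace_amp_last("a < b") yields a &amp;lt; b, while escape_replace_amp_first("a < b") yields a &lt; b. A single pass over the input characters avoids the ordering question entirely, because output that has already been written is never scanned again.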

🎯 Learning Outcomes

  • Escape HTML entities (<, >, &, ", ') using flat_map over chars
  • Unescape HTML entities using sequential replace calls (order matters: &amp; last)
  • Escape control characters (\n, \t, \r, \\, ") with a push_str pattern
  • Unescape with a stateful character iterator using Peekable
  • Understand why unescape_html must replace &amp; last to avoid double-unescaping
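The ordering rule for &amp; can be checked with a small experiment. The sketch below defines two hypothetical variants of a replace-based unescaper, restricted to three entities for brevity, that differ only in when &amp; is decoded.

```rust
/// Hypothetical variant that decodes `&amp;` FIRST. The `&` it produces can
/// join the text after it to form a new entity, which a later `replace`
/// then decodes a second time.
fn unescape_amp_first(s: &str) -> String {
    s.replace("&amp;", "&")
        .replace("&lt;", "<")
        .replace("&gt;", ">")
}

/// Hypothetical variant that decodes `&amp;` LAST, so no later pass can
/// reinterpret the `&` it produces.
fn unescape_amp_last(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&amp;", "&")
}
```

The string &amp;lt; is the escaped form of the literal text &lt;. With these definitions, unescape_amp_first("&amp;lt;") returns <, which is one decoding step too many, while unescape_amp_last("&amp;lt;") returns the intended &lt;.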

Code Example

    #![allow(clippy::all)]
    // 499. Escaping and unescaping strings
    fn escape_html(s: &str) -> String {
        s.chars()
            .flat_map(|c| match c {
                '<' => "&lt;".chars().collect::<Vec<_>>(),
                '>' => "&gt;".chars().collect(),
                '&' => "&amp;".chars().collect(),
                '"' => "&quot;".chars().collect(),
                '\'' => "&#39;".chars().collect(),
                c => vec![c],
            })
            .collect()
    }
    
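    fn unescape_html(s: &str) -> String {
        // Decode `&amp;` last: decoding it earlier would turn `&amp;quot;`
        // into `&quot;`, which a later replace would decode a second time.
        s.replace("&lt;", "<")
            .replace("&gt;", ">")
            .replace("&quot;", "\"")
            .replace("&#39;", "'")
            .replace("&amp;", "&")
    }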
    
    fn escape_control(s: &str) -> String {
        let mut out = String::with_capacity(s.len());
        for c in s.chars() {
            match c {
                '\n' => out.push_str("\\n"),
                '\t' => out.push_str("\\t"),
                '\r' => out.push_str("\\r"),
                '\\' => out.push_str("\\\\"),
                '"' => out.push_str("\\\""),
                c => out.push(c),
            }
        }
        out
    }
    
    fn unescape_control(s: &str) -> String {
        let mut out = String::with_capacity(s.len());
        let mut iter = s.chars().peekable();
        while let Some(c) = iter.next() {
            if c == '\\' {
                match iter.next() {
                    Some('n') => out.push('\n'),
                    Some('t') => out.push('\t'),
                    Some('r') => out.push('\r'),
                    Some('\\') => out.push('\\'),
                    Some('"') => out.push('"'),
                    Some(c) => {
                        out.push('\\');
                        out.push(c);
                    }
                    None => out.push('\\'),
                }
            } else {
                out.push(c);
            }
        }
        out
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
        #[test]
        fn test_html_escape() {
            assert_eq!(escape_html("<b>hi</b>"), "&lt;b&gt;hi&lt;/b&gt;");
        }
        #[test]
        fn test_html_unescape() {
            assert_eq!(unescape_html("&lt;b&gt;"), "<b>");
        }
        #[test]
        fn test_roundtrip_html() {
            let s = "<div>&amp;</div>";
            assert_eq!(unescape_html(&escape_html(s)), s);
        }
        #[test]
        fn test_control_esc() {
            assert_eq!(escape_control("a\nb"), "a\\nb");
        }
        #[test]
        fn test_control_unesc() {
            assert_eq!(unescape_control("a\\nb"), "a\nb");
        }
    }

    Key Differences

  • flat_map vs. Buffer: Rust's flat_map is declarative but allocates an intermediate Vec<char> for every input character; the Buffer/push_str approach (like OCaml's) is more allocation-efficient for hot paths.
  • Unescape ordering: In both Rust and OCaml, a multi-pass HTML unescape must decode &amp; last; Rust's replace chain corresponds to running one OCaml Buffer scan per entity.
  • Peekable iterator: Rust's chars().peekable() offers lookahead through peek() for a two-character escape state machine, although the unescape_control shown here only consumes the character after a backslash with next(), so a plain chars() iterator would also work; OCaml uses integer index advancement.
  • Context-specific escaping: Rust's html-escape crate and OCaml's tyxml provide well-tested context-aware escaping; handwritten implementations are error-prone for edge cases.
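One way to remove the ordering constraint altogether is to decode in a single left-to-right pass, so that decoded output is never rescanned. The following is a sketch under that assumption; ENTITIES and unescape_html_single_pass are hypothetical names, and only the five entities used in this example are recognised, whereas real HTML defines many more.

```rust
/// Entities understood by this sketch, paired with the character they decode to.
const ENTITIES: [(&str, char); 5] = [
    ("&lt;", '<'),
    ("&gt;", '>'),
    ("&amp;", '&'),
    ("&quot;", '"'),
    ("&#39;", '\''),
];

/// Decode entities in one left-to-right pass. Decoded output is never
/// rescanned, so the order of the table above does not matter.
fn unescape_html_single_pass(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(pos) = rest.find('&') {
        // Copy everything before the `&` unchanged.
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        match ENTITIES.iter().find(|(name, _)| rest.starts_with(*name)) {
            Some((name, ch)) => {
                // Known entity: emit the decoded character and skip past it.
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Unknown entity or bare `&`: keep it literally and move on.
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    // Whatever remains contains no `&` and is copied as-is.
    out.push_str(rest);
    out
}
```

With this version, unescape_html_single_pass("&amp;quot;") returns &quot; regardless of how the table is ordered, and input such as a & b &unknown; passes through unchanged.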

    OCaml Approach

    let escape_html s =
      let buf = Buffer.create (String.length s) in
      String.iter (fun c -> match c with
        | '<' -> Buffer.add_string buf "&lt;"
        | '>' -> Buffer.add_string buf "&gt;"
        | '&' -> Buffer.add_string buf "&amp;"
        | '"' -> Buffer.add_string buf "&quot;"
        | '\'' -> Buffer.add_string buf "&#39;"
        | c -> Buffer.add_char buf c) s;
      Buffer.contents buf
    

    OCaml's Buffer-based approach avoids intermediate allocation. The tyxml library provides HTML-safe string escaping as part of its typed HTML API; yojson and ezjsonm handle JSON escaping.
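The Buffer pattern translates directly to Rust. As a sketch (escape_html_buf is a hypothetical name, not part of the example's source), the version below writes into one pre-sized String with push_str instead of building a Vec<char> for each character:

```rust
/// Buffer-style HTML escaping: one output `String`, no per-character `Vec`.
fn escape_html_buf(s: &str) -> String {
    // Escaped output is never shorter than the input, so `s.len()` is a
    // lower bound on the capacity needed; the String grows if required.
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            other => out.push(other),
        }
    }
    out
}
```

It should produce the same output as the flat_map version, for example &lt;b&gt;hi&lt;/b&gt; for the input <b>hi</b>, while performing at most a few reallocations of the single output buffer.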

    Exercises

  • JSON string escaping: Extend escape_control to produce a valid JSON string (wrap in ", escape all required characters per RFC 8259 including Unicode escapes \uXXXX for control chars).
  • SQL escaping: Implement escape_sql_string(s: &str) -> String that escapes single quotes by doubling them (' becomes ''), suitable for SQL string literals (note that parameterised queries are always preferred).
  • Round-trip property test: Write a proptest test that verifies unescape_html(escape_html(s)) == s for all strings not containing existing HTML entities.

    Open Source Repos