499 Fundamental

String Escaping

Functional Programming

Tutorial

The Problem

Injecting unescaped user input into HTML causes XSS; into SQL, SQL injection; into shell commands, command injection. Every output context has its own set of special characters that must be escaped. The inverse operation, unescaping, matters just as much when reading stored or transmitted data. Unescaping is best treated as a state machine (a backslash introduces a two-character sequence), because a chain of replace calls can rescan text it has already decoded and unescape it twice.
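
The double-unescape hazard is easy to reproduce. The sketch below uses two illustrative helpers (the names unescape_amp_first and unescape_amp_last are assumptions made for this example, not part of the tutorial's code) that differ only in where the &amp; replacement sits in the chain.

```rust
// Illustrative helpers; the names are assumptions made for this sketch.

/// Replaces `&amp;` first, so the `&` it produces can join the text that
/// follows it and be decoded a second time by the next replace.
fn unescape_amp_first(s: &str) -> String {
    s.replace("&amp;", "&").replace("&lt;", "<")
}

/// Replaces `&amp;` last, so each entity is decoded exactly once.
fn unescape_amp_last(s: &str) -> String {
    s.replace("&lt;", "<").replace("&amp;", "&")
}
```

The literal text &lt; is stored in escaped form as &amp;lt;. Decoding that with unescape_amp_first yields <, which is not what was stored; unescape_amp_last returns the original &lt;.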

🎯 Learning Outcomes

• Escape HTML entities (<, >, &, ", ') using flat_map over chars

• Unescape HTML entities using sequential replace calls (order matters: &amp; last)

• Escape control characters (\n, \t, \r, \\, ") with a push_str pattern

• Unescape with a stateful character iterator using Peekable

• Understand why unescape_html must replace &amp; last to avoid double-unescaping

Code Example

#![allow(clippy::all)]
// 499. Escaping and unescaping strings
fn escape_html(s: &str) -> String {
    s.chars()
        .flat_map(|c| match c {
            '<' => "&lt;".chars().collect::<Vec<_>>(),
            '>' => "&gt;".chars().collect(),
            '&' => "&amp;".chars().collect(),
            '"' => "&quot;".chars().collect(),
            '\'' => "&#39;".chars().collect(),
            c => vec![c],
        })
        .collect()
}

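fn unescape_html(s: &str) -> String {
    // &amp; must be replaced last: decoding it earlier would create new
    // entity text (e.g. "&amp;quot;" -> "&quot;") that a later replace
    // would then decode a second time.
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&#39;", "'")
        .replace("&amp;", "&")
}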

fn escape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            c => out.push(c),
        }
    }
    out
}

fn unescape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut iter = s.chars().peekable();
    while let Some(c) = iter.next() {
        if c == '\\' {
            match iter.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('r') => out.push('\r'),
                Some('\\') => out.push('\\'),
                Some('"') => out.push('"'),
                Some(c) => {
                    out.push('\\');
                    out.push(c);
                }
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_html_escape() {
        assert_eq!(escape_html("<b>hi</b>"), "&lt;b&gt;hi&lt;/b&gt;");
    }
    #[test]
    fn test_html_unescape() {
        assert_eq!(unescape_html("&lt;b&gt;"), "<b>");
    }
    #[test]
    fn test_roundtrip_html() {
        let s = "<div>&amp;</div>";
        assert_eq!(unescape_html(&escape_html(s)), s);
    }
    #[test]
    fn test_control_esc() {
        assert_eq!(escape_control("a\nb"), "a\\nb");
    }
    #[test]
    fn test_control_unesc() {
        assert_eq!(unescape_control("a\\nb"), "a\nb");
    }
}

(* 499. String escaping – OCaml *)
let escape_html s =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> match c with
    | '<' -> Buffer.add_string buf "&lt;"
    | '>' -> Buffer.add_string buf "&gt;"
    | '&' -> Buffer.add_string buf "&amp;"
    | '"' -> Buffer.add_string buf "&quot;"
    | '\'' -> Buffer.add_string buf "&#39;"
    | c   -> Buffer.add_char buf c
  ) s;
  Buffer.contents buf

let escape_backslash s =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> match c with
    | '\n' -> Buffer.add_string buf "\\n"
    | '\t' -> Buffer.add_string buf "\\t"
    | '\\' -> Buffer.add_string buf "\\\\"
    | c   -> Buffer.add_char buf c
  ) s;
  Buffer.contents buf

let () =
  let html = "<div class=\"hello\">Hello & World!</div>" in
  Printf.printf "%s\n" (escape_html html);
  let raw = "line1\nline2\ttab\\slash" in
  Printf.printf "%s\n" (escape_backslash raw)

Key Differences

flat_map vs. Buffer: Rust's flat_map is declarative but allocates an intermediate Vec<char> for every input character; writing into a single buffer with push_str (as OCaml's Buffer does) avoids those allocations and is the better choice on hot paths.
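
The buffer-based alternative can be sketched in Rust as follows. The name escape_html_buf is chosen for this example, and the entity set simply mirrors escape_html; treat it as a minimal sketch rather than a tuned implementation.

```rust
/// Buffer-based variant of HTML escaping: one output String and no
/// per-character Vec. `escape_html_buf` is a name chosen for this sketch.
fn escape_html_buf(s: &str) -> String {
    // The capacity is a lower bound; the String grows when entities are added.
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            c => out.push(c),
        }
    }
    out
}
```

This is the same push_str pattern that escape_control already uses, applied to the HTML character set.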

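Unescape ordering: A chain of replace calls must handle &amp; last in any language, because each replace rescans the output of the previous one. The OCaml code here only escapes; an OCaml unescaper built from sequential substitutions would need the same ordering.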
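
One way to sidestep the ordering question is to decode in a single pass, so that decoded output is never scanned again. The following is a minimal sketch under that assumption; unescape_html_once and the ENTITIES table are hypothetical names, and only the five entities produced by escape_html are recognised.

```rust
/// Entities recognised by this sketch (the same five that escape_html emits).
static ENTITIES: [(&str, char); 5] = [
    ("&lt;", '<'),
    ("&gt;", '>'),
    ("&amp;", '&'),
    ("&quot;", '"'),
    ("&#39;", '\''),
];

/// Single-pass HTML unescape: each `&` in the input is examined once,
/// so text that has already been decoded is never rescanned.
fn unescape_html_once(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    while let Some(pos) = rest.find('&') {
        // Copy everything before the ampersand unchanged.
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        let hit = ENTITIES.iter().find(|(name, _)| rest.starts_with(*name));
        match hit {
            Some((name, ch)) => {
                out.push(*ch);
                rest = &rest[name.len()..];
            }
            None => {
                // Bare or unknown ampersand: keep it literally.
                out.push('&');
                rest = &rest[1..];
            }
        }
    }
    out.push_str(rest);
    out
}
```

Because the order of the table no longer affects correctness, new entities can be added without revisiting the decoding logic.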

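Peekable iterator: The Rust code wraps chars() in peekable(), but the two-character state machine only ever calls next(): after a backslash it consumes the following character directly. peek() earns its place when the lookahead should not be consumed, for example when an unknown escape must be left for the main loop or when longer escapes are parsed. An OCaml version would typically advance an integer index instead.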
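
To show what the lookahead buys, here is a sketch of the same unescaper written with peek(). The name unescape_with_peek is hypothetical; the recognised escapes are the ones handled by unescape_control, and the output is intended to match it.

```rust
/// Variant of the control-character unescaper that uses `peek()`: the
/// character after a backslash is inspected first and consumed only when
/// it completes a known escape. `unescape_with_peek` is a hypothetical name.
fn unescape_with_peek(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut iter = s.chars().peekable();
    while let Some(c) = iter.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // Look ahead without consuming anything yet.
        let decoded = match iter.peek() {
            Some('n') => Some('\n'),
            Some('t') => Some('\t'),
            Some('r') => Some('\r'),
            Some('\\') => Some('\\'),
            Some('"') => Some('"'),
            _ => None,
        };
        match decoded {
            Some(ch) => {
                out.push(ch);
                iter.next(); // consume the second character of the escape
            }
            // Unknown escape or trailing backslash: keep the backslash and
            // let the main loop handle whatever follows.
            None => out.push('\\'),
        }
    }
    out
}
```

The design difference is small here, but it means the fallback branch no longer has to re-emit a character it has already taken from the iterator.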

Context-specific escaping: Rust's html-escape crate and OCaml's tyxml provide well-tested context-aware escaping; handwritten implementations are error-prone for edge cases.

OCaml Approach

let escape_html s =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> match c with
    | '<' -> Buffer.add_string buf "&lt;"
    | '>' -> Buffer.add_string buf "&gt;"
    | '&' -> Buffer.add_string buf "&amp;"
    | '"' -> Buffer.add_string buf "&quot;"
    | '\'' -> Buffer.add_string buf "&#39;"
    | c -> Buffer.add_char buf c) s;
  Buffer.contents buf

OCaml's Buffer-based approach avoids intermediate allocation. The tyxml library provides HTML-safe string escaping as part of its typed HTML API; yojson and ezjsonm handle JSON escaping.

Exercises

JSON string escaping: Extend escape_control to produce a valid JSON string (wrap in ", escape all required characters per RFC 8259 including Unicode escapes \uXXXX for control chars).

SQL escaping: Implement escape_sql_string(s: &str) -> String that escapes single quotes by doubling them (' → ''), suitable for SQL string literals (but note: parameterised queries are always preferred).

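Round-trip property test: Write a proptest test that verifies unescape_html(&escape_html(s)) == s. The property holds for arbitrary strings only when unescape_html decodes &amp; last; with any other order, restrict the generator to strings that contain no existing HTML entities.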

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.
