String Escaping
The Problem
Injecting unescaped user input into HTML causes XSS; into SQL, SQL injection; into shell commands, command injection. Every output context has its own set of special characters that must be escaped. The inverse operation, unescaping, is equally important when reading stored or transmitted data. A correct unescaper treats escape sequences as a small state machine (a backslash introduces a two-character sequence that is consumed as a unit) rather than as a series of replace calls, which can double-unescape when the output of one replacement is matched by a later one.
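To make the double-unescaping hazard concrete, here is a minimal sketch. The function names (`unescape_amp_first`, `unescape_amp_last`, `unescape_single_pass`) are hypothetical and only a small set of entities is handled; it is an illustration of the idea under those assumptions, not a complete HTML decoder.

```rust
/// Hypothetical naive unescaper that handles `&amp;` FIRST (the wrong order).
/// The `&` it produces can combine with the following text and be matched
/// again by a later `replace` call.
fn unescape_amp_first(s: &str) -> String {
    s.replace("&amp;", "&")
        .replace("&lt;", "<")
        .replace("&gt;", ">")
}

/// The same chain with `&amp;` handled LAST.
fn unescape_amp_last(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&amp;", "&")
}

/// Single-pass alternative: scan for `&`, decode at most one entity at that
/// position, and never re-examine text that has already been emitted.
fn unescape_single_pass(s: &str) -> String {
    // Assumed entity table for this sketch; a real decoder knows many more.
    const ENTITIES: [(&str, char); 5] = [
        ("&lt;", '<'),
        ("&gt;", '>'),
        ("&amp;", '&'),
        ("&quot;", '"'),
        ("&#39;", '\''),
    ];
    let mut out = String::with_capacity(s.len());
    let mut rest = s;
    'scan: while let Some(pos) = rest.find('&') {
        // Copy everything before the `&` unchanged.
        out.push_str(&rest[..pos]);
        rest = &rest[pos..];
        for &(name, ch) in ENTITIES.iter() {
            if let Some(tail) = rest.strip_prefix(name) {
                out.push(ch);
                rest = tail;
                continue 'scan;
            }
        }
        // Not a known entity: keep the `&` literally and move past it.
        out.push('&');
        rest = &rest[1..];
    }
    out.push_str(rest);
    out
}
```

The input `&amp;lt;` is the escaped form of the literal text `&lt;`. The `&amp;`-first chain decodes it twice and returns `<`, while both the `&amp;`-last chain and the single-pass scanner return `&lt;`.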
🎯 Learning Outcomes
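- Escape HTML special characters (<, >, &, ", ') using flat_map over chars
- Unescape HTML entities with chained replace calls (order matters: &amp; last)
- Escape control characters (\n, \t, \r, \\, ") with a push_str pattern
- Unescape backslash sequences with a Peekable character iterator
- Understand why unescape_html must replace &amp; last to avoid double-unescaping
Code Example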
#![allow(clippy::all)]
// 499. Escaping and unescaping strings
// Maps each HTML-special character to its entity; other characters pass through.
fn escape_html(s: &str) -> String {
    s.chars()
        .flat_map(|c| match c {
            '<' => "&lt;".chars().collect::<Vec<_>>(),
            '>' => "&gt;".chars().collect(),
            '&' => "&amp;".chars().collect(),
            '"' => "&quot;".chars().collect(),
            '\'' => "&#39;".chars().collect(),
            c => vec![c],
        })
        .collect()
}
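// Replaces `&amp;` last so that text such as `&amp;lt;` decodes to `&lt;`, not `<`.
fn unescape_html(s: &str) -> String {
    s.replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&#39;", "'")
        .replace("&amp;", "&")
}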
// Escapes control characters, backslashes, and double quotes into one output buffer.
fn escape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            c => out.push(c),
        }
    }
    out
}

// Two-state scanner: a backslash consumes the next character as part of the escape.
fn unescape_control(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut iter = s.chars().peekable();
    while let Some(c) = iter.next() {
        if c == '\\' {
            match iter.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('r') => out.push('\r'),
                Some('\\') => out.push('\\'),
                Some('"') => out.push('"'),
                // Unknown escape: keep both characters unchanged.
                Some(c) => {
                    out.push('\\');
                    out.push(c);
                }
                // Trailing backslash at end of input: keep it.
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_html_escape() {
        assert_eq!(escape_html("<b>hi</b>"), "&lt;b&gt;hi&lt;/b&gt;");
    }

    #[test]
    fn test_html_unescape() {
        assert_eq!(unescape_html("&lt;b&gt;"), "<b>");
    }

    #[test]
    fn test_roundtrip_html() {
        let s = "<div>&</div>";
        assert_eq!(unescape_html(&escape_html(s)), s);
    }

    #[test]
    fn test_control_esc() {
        assert_eq!(escape_control("a\nb"), "a\\nb");
    }

    #[test]
    fn test_control_unesc() {
        assert_eq!(unescape_control("a\\nb"), "a\nb");
    }
}
Key Differences
- **flat_map vs. Buffer**: the flat_map version here is declarative but allocates an intermediate Vec<char> per character; the buffer/push_str approach (like OCaml's Buffer) is more allocation-efficient on hot paths.
- **Replacement order**: both versions must replace &amp; last in HTML unescaping; the replace chain is equivalent to sequential OCaml Buffer scans.
- **Peekable iterator**: Rust's chars().peekable() provides lookahead for the two-character escape state machine; OCaml uses integer index advancement.
- **Production libraries**: Rust's html-escape crate and OCaml's tyxml provide well-tested, context-aware escaping; handwritten implementations are error-prone for edge cases.
OCaml Approach
let escape_html s =
  let buf = Buffer.create (String.length s) in
  String.iter (fun c -> match c with
    | '<' -> Buffer.add_string buf "&lt;"
    | '>' -> Buffer.add_string buf "&gt;"
    | '&' -> Buffer.add_string buf "&amp;"
    | '"' -> Buffer.add_string buf "&quot;"
    | '\'' -> Buffer.add_string buf "&#39;"
    | c -> Buffer.add_char buf c) s;
  Buffer.contents buf
OCaml's Buffer-based approach avoids intermediate allocation. The tyxml library provides HTML-safe string escaping as part of its typed HTML API; yojson and ezjsonm handle JSON escaping.
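For comparison, the same buffer-based shape can be written in Rust by reusing the push_str pattern from escape_control. The sketch below is one possible variant, not the tutorial's own implementation: the name `escape_html_buf` is hypothetical, and the choice of `&#39;` for the single quote is an assumption (`&#x27;` is an equally common choice).

```rust
/// Buffer-based HTML escaping: writes into a single output `String`
/// instead of building a `Vec<char>` for every input character.
fn escape_html_buf(s: &str) -> String {
    // The input length is a lower bound on the output length.
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '&' => out.push_str("&amp;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            // Everything else is copied through unchanged.
            c => out.push(c),
        }
    }
    out
}
```

Because each input character is visited exactly once and the output is never rescanned, this form cannot double-escape, and the only allocation beyond the initial buffer is whatever growth the output `String` needs.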
Exercises
1. Extend escape_control to produce a valid JSON string (wrap in ", escape all required characters per RFC 8259, including \uXXXX Unicode escapes for control characters).
2. Write escape_sql_string(s: &str) -> String that escapes single quotes by doubling them (' → ''), suitable for SQL string literals (but note: parameterised queries are always preferred).
3. Write a proptest test that verifies unescape_html(escape_html(s)) == s for all strings not containing existing HTML entities.