String Parser
Tutorial
The Problem
Many grammars require matching fixed string literals: keywords like "true" and "false", operators like "+=", or delimiters like "<!--". Matching character by character would work but is verbose and error-prone for multi-character strings. The tag combinator matches an entire expected string at once, using str::starts_with for efficiency. Case-insensitive variants handle HTTP methods, SQL keywords, and HTML tag names where casing is not significant.
🎯 Learning Outcomes
tag (exact string match) and tag_no_case (case-insensitive match)&'a str (a slice of the input) rather than String (an allocation)Code Example
fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_owned = expected.to_string();
Box::new(move |input: &'a str| {
if input.starts_with(&expected_owned) {
let rest = &input[expected_owned.len()..];
Ok((&input[..expected_owned.len()], rest))
} else {
Err(format!("Expected \"{}\"", expected_owned))
}
})
}Key Differences
tag returns a &'a str slice — no allocation; OCaml typically returns a new string (allocation under GC).'a lifetime annotation enforces the validity relationship between input and returned slice; OCaml has no lifetime concept.tag_no_case in Rust must be careful with Unicode case folding (.to_lowercase() allocates); OCaml's String.lowercase_ascii handles only ASCII.OCaml Approach
OCaml's angstrom provides string : string -> string t and string_ci : string -> string t. Because OCaml strings are immutable byte sequences, string_ci may allocate for the lowercased comparison. The returned value is a newly allocated string, not a slice of the input — allocation is less of a concern under GC.
Full Source
#![allow(clippy::all)]
// Example 154: String Parser
// Parse exact string literals
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
// ============================================================
// Approach 1: Parse exact string (tag)
// ============================================================
fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_owned = expected.to_string();
Box::new(move |input: &'a str| {
if input.starts_with(&expected_owned) {
let rest = &input[expected_owned.len()..];
Ok((&input[..expected_owned.len()], rest))
} else {
Err(format!("Expected \"{}\"", expected_owned))
}
})
}
// ============================================================
// Approach 2: Case-insensitive string match
// ============================================================
fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_lower = expected.to_lowercase();
let len = expected.len();
Box::new(move |input: &'a str| {
if input.len() >= len && input[..len].to_lowercase() == expected_lower {
Ok((&input[..len], &input[len..]))
} else {
Err(format!(
"Expected \"{}\" (case insensitive)",
expected_lower
))
}
})
}
// ============================================================
// Approach 3: Build from character-by-character matching
// ============================================================
fn string_from_chars<'a>(expected: &str) -> Parser<'a, String> {
let expected = expected.to_string();
Box::new(move |input: &'a str| {
let mut remaining = input;
for expected_char in expected.chars() {
match remaining.chars().next() {
Some(c) if c == expected_char => {
remaining = &remaining[c.len_utf8()..];
}
Some(c) => return Err(format!("Expected '{}', got '{}'", expected_char, c)),
None => return Err(format!("Expected '{}', got EOF", expected_char)),
}
}
Ok((expected.clone(), remaining))
})
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_tag_match() {
let p = tag("hello");
assert_eq!(p("hello world"), Ok(("hello", " world")));
}
#[test]
fn test_tag_exact() {
let p = tag("hello");
assert_eq!(p("hello"), Ok(("hello", "")));
}
#[test]
fn test_tag_no_match() {
let p = tag("hello");
assert!(p("world").is_err());
}
#[test]
fn test_tag_too_short() {
let p = tag("hello");
assert!(p("hel").is_err());
}
#[test]
fn test_tag_no_case_upper() {
let p = tag_no_case("Hello");
assert_eq!(p("HELLO world"), Ok(("HELLO", " world")));
}
#[test]
fn test_tag_no_case_mixed() {
let p = tag_no_case("hello");
assert_eq!(p("HeLLo!"), Ok(("HeLLo", "!")));
}
#[test]
fn test_string_from_chars() {
let p = string_from_chars("abc");
assert_eq!(p("abcdef"), Ok(("abc".to_string(), "def")));
}
#[test]
fn test_string_from_chars_fail() {
let p = string_from_chars("abc");
assert!(p("axc").is_err());
}
#[test]
fn test_tag_empty_string() {
let p = tag("");
assert_eq!(p("anything"), Ok(("", "anything")));
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_tag_match() {
let p = tag("hello");
assert_eq!(p("hello world"), Ok(("hello", " world")));
}
#[test]
fn test_tag_exact() {
let p = tag("hello");
assert_eq!(p("hello"), Ok(("hello", "")));
}
#[test]
fn test_tag_no_match() {
let p = tag("hello");
assert!(p("world").is_err());
}
#[test]
fn test_tag_too_short() {
let p = tag("hello");
assert!(p("hel").is_err());
}
#[test]
fn test_tag_no_case_upper() {
let p = tag_no_case("Hello");
assert_eq!(p("HELLO world"), Ok(("HELLO", " world")));
}
#[test]
fn test_tag_no_case_mixed() {
let p = tag_no_case("hello");
assert_eq!(p("HeLLo!"), Ok(("HeLLo", "!")));
}
#[test]
fn test_string_from_chars() {
let p = string_from_chars("abc");
assert_eq!(p("abcdef"), Ok(("abc".to_string(), "def")));
}
#[test]
fn test_string_from_chars_fail() {
let p = string_from_chars("abc");
assert!(p("axc").is_err());
}
#[test]
fn test_tag_empty_string() {
let p = tag("");
assert_eq!(p("anything"), Ok(("", "anything")));
}
}
Deep Comparison
Comparison: Example 154 — String Parser
tag (exact match)
OCaml:
let tag (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len && String.sub input 0 len = expected then
Ok (expected, String.sub input len (String.length input - len))
else
Error (Printf.sprintf "Expected \"%s\"" expected)
Rust:
fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_owned = expected.to_string();
Box::new(move |input: &'a str| {
if input.starts_with(&expected_owned) {
let rest = &input[expected_owned.len()..];
Ok((&input[..expected_owned.len()], rest))
} else {
Err(format!("Expected \"{}\"", expected_owned))
}
})
}
tag_no_case
OCaml:
let tag_no_case (expected : string) : string parser = fun input ->
let len = String.length expected in
if String.length input >= len &&
String.lowercase_ascii (String.sub input 0 len) = String.lowercase_ascii expected then
Ok (String.sub input 0 len, String.sub input len (String.length input - len))
else
Error (Printf.sprintf "Expected \"%s\" (case insensitive)" expected)
Rust:
fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
let expected_lower = expected.to_lowercase();
let len = expected.len();
Box::new(move |input: &'a str| {
if input.len() >= len && input[..len].to_lowercase() == expected_lower {
Ok((&input[..len], &input[len..]))
} else {
Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
}
})
}
Exercises
tag_byte_slice(expected: &[u8]) -> Parser<&[u8]> that works on raw bytes instead of UTF-8 strings.bool_parser() -> Parser<bool> using tag("true") and tag("false") combined with choice.keyword(kw: &str) -> Parser<&str> that matches a string followed by a non-identifier character (word boundary).