Introduction to Parser Combinators
Tutorial
The Problem
Parsing transforms raw text into structured data. Hand-written parsers for each format are tedious, error-prone, and hard to compose. Parser combinators solve this by representing parsers as first-class values (functions) that can be combined using higher-order functions: sequence two parsers, try alternatives, repeat zero or more times. This functional approach, pioneered in Haskell by Parsec, produces parsers that closely mirror the grammar they parse, making them readable and maintainable.
🎯 Learning Outcomes
&str to Result<(value, remaining), error>char_p, pure, fail) that form the building blocksCode Example
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;Key Differences
Box<dyn Fn> due to lack of implicit heap allocation; OCaml uses plain functions or named types.'a lifetime ties the parser to the input string's lifetime; OCaml has no lifetime concept — the GC manages the input string.>>=, <|>) for sequencing and choice; Rust uses method chaining or named functions.&str slices; OCaml's allocate more intermediate values.OCaml Approach
OCaml's parser combinator tradition predates Rust: the angstrom library and Menhir use similar ideas. The core type is typically type 'a parser = string -> int -> ('a * int, string) result. OCaml's lighter syntax for closures (fun input -> ...) and partial application make combinator definitions more concise. The let ( >>= ) and let ( <|> ) operators integrate naturally with OCaml's infix operator system.
Full Source
#![allow(clippy::all)]
// Example 151: Introduction to Parser Combinators
// Defining the core Parser type and running basic parsers
/// Core type: a parse result is either (value, remaining_input) or an error
type ParseResult<'a, T> = Result<(T, &'a str), String>;
// ============================================================
// Approach 1: Parser as a plain function
// ============================================================
fn parse_char_a(input: &str) -> ParseResult<char> {
match input.chars().next() {
Some('a') => Ok(('a', &input[1..])),
Some(c) => Err(format!("Expected 'a', got '{}'", c)),
None => Err("Expected 'a', got end of input".to_string()),
}
}
// ============================================================
// Approach 2: Parser as a boxed closure (our standard type)
// ============================================================
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
/// Create a parser that always succeeds with the given value
fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
Box::new(move |input| Ok((value.clone(), input)))
}
/// Create a parser that always fails with the given message
fn fail<'a, T: 'a>(msg: &str) -> Parser<'a, T> {
let msg = msg.to_string();
Box::new(move |_input| Err(msg.clone()))
}
/// Create a parser that matches a specific character
fn char_p<'a>(expected: char) -> Parser<'a, char> {
Box::new(move |input: &'a str| match input.chars().next() {
Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
None => Err(format!("Expected '{}', got end of input", expected)),
})
}
/// Run a parser on input
fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
parser(input)
}
// ============================================================
// Approach 3: Parser as a trait object (alternative design)
// ============================================================
trait Parse<T> {
fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, T>;
}
struct CharParser {
expected: char,
}
impl Parse<char> for CharParser {
fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, char> {
match input.chars().next() {
Some(c) if c == self.expected => Ok((c, &input[c.len_utf8()..])),
Some(c) => Err(format!("Expected '{}', got '{}'", self.expected, c)),
None => Err(format!("Expected '{}', got EOF", self.expected)),
}
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_plain_function_success() {
assert_eq!(parse_char_a("abc"), Ok(('a', "bc")));
}
#[test]
fn test_plain_function_failure() {
assert!(parse_char_a("xyz").is_err());
}
#[test]
fn test_plain_function_empty() {
assert!(parse_char_a("").is_err());
}
#[test]
fn test_char_parser_success() {
let p = char_p('h');
assert_eq!(run_parser(&p, "hello"), Ok(('h', "ello")));
}
#[test]
fn test_char_parser_failure() {
let p = char_p('h');
assert!(run_parser(&p, "world").is_err());
}
#[test]
fn test_pure() {
let p = pure(42);
assert_eq!(run_parser(&p, "hello"), Ok((42, "hello")));
}
#[test]
fn test_fail() {
let p: Parser<i32> = fail("oops");
assert_eq!(run_parser(&p, "hello"), Err("oops".to_string()));
}
#[test]
fn test_trait_parser() {
let p = CharParser { expected: 'z' };
assert_eq!(p.parse("zoo"), Ok(('z', "oo")));
}
#[test]
fn test_trait_parser_failure() {
let p = CharParser { expected: 'z' };
assert!(p.parse("abc").is_err());
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_plain_function_success() {
assert_eq!(parse_char_a("abc"), Ok(('a', "bc")));
}
#[test]
fn test_plain_function_failure() {
assert!(parse_char_a("xyz").is_err());
}
#[test]
fn test_plain_function_empty() {
assert!(parse_char_a("").is_err());
}
#[test]
fn test_char_parser_success() {
let p = char_p('h');
assert_eq!(run_parser(&p, "hello"), Ok(('h', "ello")));
}
#[test]
fn test_char_parser_failure() {
let p = char_p('h');
assert!(run_parser(&p, "world").is_err());
}
#[test]
fn test_pure() {
let p = pure(42);
assert_eq!(run_parser(&p, "hello"), Ok((42, "hello")));
}
#[test]
fn test_fail() {
let p: Parser<i32> = fail("oops");
assert_eq!(run_parser(&p, "hello"), Err("oops".to_string()));
}
#[test]
fn test_trait_parser() {
let p = CharParser { expected: 'z' };
assert_eq!(p.parse("zoo"), Ok(('z', "oo")));
}
#[test]
fn test_trait_parser_failure() {
let p = CharParser { expected: 'z' };
assert!(p.parse("abc").is_err());
}
}
Deep Comparison
Comparison: Example 151 — Parser Combinator Introduction
Core Parser Type
OCaml:
type 'a parse_result = Ok of 'a * string | Error of string
type 'a parser = string -> 'a parse_result
Rust:
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
Character Parser
OCaml:
let char (c : char) : char parser = fun input ->
if String.length input > 0 && input.[0] = c then
Ok (c, String.sub input 1 (String.length input - 1))
else
Error (Printf.sprintf "Expected '%c'" c)
Rust:
fn char_p<'a>(expected: char) -> Parser<'a, char> {
Box::new(move |input: &'a str| {
match input.chars().next() {
Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
None => Err(format!("Expected '{}', got end of input", expected)),
}
})
}
Pure / Return
OCaml:
let return (x : 'a) : 'a parser = fun input -> Ok (x, input)
Rust:
fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
Box::new(move |input| Ok((value.clone(), input)))
}
Running a Parser
OCaml:
let run (p : 'a parser) (input : string) = p input
Rust:
fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
parser(input)
}
Exercises
always_succeed<T: Clone>(val: T) -> Parser<T> and verify it consumes no input.one_of_chars(chars: Vec<char>) -> Parser<char> that matches any character in the set.expect_empty() -> Parser<()> that succeeds only at the end of input.