ExamplesBy LevelBy TopicLearning Paths
151 Advanced

Introduction to Parser Combinators

Functional Programming

Tutorial

The Problem

Parsing transforms raw text into structured data. Hand-written parsers for each format are tedious, error-prone, and hard to compose. Parser combinators solve this by representing parsers as first-class values (functions) that can be combined using higher-order functions: sequence two parsers, try alternatives, repeat zero or more times. This functional approach, pioneered in Haskell by Parsec, produces parsers that closely mirror the grammar they parse, making them readable and maintainable.

🎯 Learning Outcomes

  • • Understand the core type: a parser is a function from &str to Result<(value, remaining), error>
  • • Learn to create primitive parsers (char_p, pure, fail) that form the building blocks
  • • See how parsers compose to handle complex grammars
  • • Understand why parser combinators are preferred over hand-written recursive descent in functional code
  • Code Example

    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

    Key Differences

  • Core type: Both represent parsers as functions on string input; Rust uses Box<dyn Fn> due to lack of implicit heap allocation; OCaml uses plain functions or named types.
  • Lifetime annotations: Rust's 'a lifetime ties the parser to the input string's lifetime; OCaml has no lifetime concept — the GC manages the input string.
  • Operator syntax: OCaml naturally defines custom infix operators (>>=, <|>) for sequencing and choice; Rust uses method chaining or named functions.
  • Performance: Rust's parser combinators are typically faster due to zero-allocation &str slices; OCaml's allocate more intermediate values.
  • OCaml Approach

    OCaml's parser combinator tradition predates Rust: the angstrom library and Menhir use similar ideas. The core type is typically type 'a parser = string -> int -> ('a * int, string) result. OCaml's lighter syntax for closures (fun input -> ...) and partial application make combinator definitions more concise. The let ( >>= ) and let ( <|> ) operators integrate naturally with OCaml's infix operator system.

    Full Source

    #![allow(clippy::all)]
    // Example 151: Introduction to Parser Combinators
    // Defining the core Parser type and running basic parsers
    
    /// Core type: a parse result is either (value, remaining_input) or an error
    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    
    // ============================================================
    // Approach 1: Parser as a plain function
    // ============================================================
    
    fn parse_char_a(input: &str) -> ParseResult<char> {
        match input.chars().next() {
            Some('a') => Ok(('a', &input[1..])),
            Some(c) => Err(format!("Expected 'a', got '{}'", c)),
            None => Err("Expected 'a', got end of input".to_string()),
        }
    }
    
    // ============================================================
    // Approach 2: Parser as a boxed closure (our standard type)
    // ============================================================
    
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    
    /// Create a parser that always succeeds with the given value
    fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
        Box::new(move |input| Ok((value.clone(), input)))
    }
    
    /// Create a parser that always fails with the given message
    fn fail<'a, T: 'a>(msg: &str) -> Parser<'a, T> {
        let msg = msg.to_string();
        Box::new(move |_input| Err(msg.clone()))
    }
    
    /// Create a parser that matches a specific character
    fn char_p<'a>(expected: char) -> Parser<'a, char> {
        Box::new(move |input: &'a str| match input.chars().next() {
            Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
            Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
            None => Err(format!("Expected '{}', got end of input", expected)),
        })
    }
    
    /// Run a parser on input
    fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
        parser(input)
    }
    
    // ============================================================
    // Approach 3: Parser as a trait object (alternative design)
    // ============================================================
    
    trait Parse<T> {
        fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, T>;
    }
    
    struct CharParser {
        expected: char,
    }
    
    impl Parse<char> for CharParser {
        fn parse<'a>(&self, input: &'a str) -> ParseResult<'a, char> {
            match input.chars().next() {
                Some(c) if c == self.expected => Ok((c, &input[c.len_utf8()..])),
                Some(c) => Err(format!("Expected '{}', got '{}'", self.expected, c)),
                None => Err(format!("Expected '{}', got EOF", self.expected)),
            }
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_plain_function_success() {
            assert_eq!(parse_char_a("abc"), Ok(('a', "bc")));
        }
    
        #[test]
        fn test_plain_function_failure() {
            assert!(parse_char_a("xyz").is_err());
        }
    
        #[test]
        fn test_plain_function_empty() {
            assert!(parse_char_a("").is_err());
        }
    
        #[test]
        fn test_char_parser_success() {
            let p = char_p('h');
            assert_eq!(run_parser(&p, "hello"), Ok(('h', "ello")));
        }
    
        #[test]
        fn test_char_parser_failure() {
            let p = char_p('h');
            assert!(run_parser(&p, "world").is_err());
        }
    
        #[test]
        fn test_pure() {
            let p = pure(42);
            assert_eq!(run_parser(&p, "hello"), Ok((42, "hello")));
        }
    
        #[test]
        fn test_fail() {
            let p: Parser<i32> = fail("oops");
            assert_eq!(run_parser(&p, "hello"), Err("oops".to_string()));
        }
    
        #[test]
        fn test_trait_parser() {
            let p = CharParser { expected: 'z' };
            assert_eq!(p.parse("zoo"), Ok(('z', "oo")));
        }
    
        #[test]
        fn test_trait_parser_failure() {
            let p = CharParser { expected: 'z' };
            assert!(p.parse("abc").is_err());
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_plain_function_success() {
            assert_eq!(parse_char_a("abc"), Ok(('a', "bc")));
        }
    
        #[test]
        fn test_plain_function_failure() {
            assert!(parse_char_a("xyz").is_err());
        }
    
        #[test]
        fn test_plain_function_empty() {
            assert!(parse_char_a("").is_err());
        }
    
        #[test]
        fn test_char_parser_success() {
            let p = char_p('h');
            assert_eq!(run_parser(&p, "hello"), Ok(('h', "ello")));
        }
    
        #[test]
        fn test_char_parser_failure() {
            let p = char_p('h');
            assert!(run_parser(&p, "world").is_err());
        }
    
        #[test]
        fn test_pure() {
            let p = pure(42);
            assert_eq!(run_parser(&p, "hello"), Ok((42, "hello")));
        }
    
        #[test]
        fn test_fail() {
            let p: Parser<i32> = fail("oops");
            assert_eq!(run_parser(&p, "hello"), Err("oops".to_string()));
        }
    
        #[test]
        fn test_trait_parser() {
            let p = CharParser { expected: 'z' };
            assert_eq!(p.parse("zoo"), Ok(('z', "oo")));
        }
    
        #[test]
        fn test_trait_parser_failure() {
            let p = CharParser { expected: 'z' };
            assert!(p.parse("abc").is_err());
        }
    }

    Deep Comparison

    Comparison: Example 151 — Parser Combinator Introduction

    Core Parser Type

    OCaml:

    type 'a parse_result = Ok of 'a * string | Error of string
    type 'a parser = string -> 'a parse_result
    

    Rust:

    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    

    Character Parser

    OCaml:

    let char (c : char) : char parser = fun input ->
      if String.length input > 0 && input.[0] = c then
        Ok (c, String.sub input 1 (String.length input - 1))
      else
        Error (Printf.sprintf "Expected '%c'" c)
    

    Rust:

    fn char_p<'a>(expected: char) -> Parser<'a, char> {
        Box::new(move |input: &'a str| {
            match input.chars().next() {
                Some(c) if c == expected => Ok((c, &input[c.len_utf8()..])),
                Some(c) => Err(format!("Expected '{}', got '{}'", expected, c)),
                None => Err(format!("Expected '{}', got end of input", expected)),
            }
        })
    }
    

    Pure / Return

    OCaml:

    let return (x : 'a) : 'a parser = fun input -> Ok (x, input)
    

    Rust:

    fn pure<'a, T: Clone + 'a>(value: T) -> Parser<'a, T> {
        Box::new(move |input| Ok((value.clone(), input)))
    }
    

    Running a Parser

    OCaml:

    let run (p : 'a parser) (input : string) = p input
    

    Rust:

    fn run_parser<'a, T>(parser: &Parser<'a, T>, input: &'a str) -> ParseResult<'a, T> {
        parser(input)
    }
    

    Exercises

  • Implement always_succeed<T: Clone>(val: T) -> Parser<T> and verify it consumes no input.
  • Write one_of_chars(chars: Vec<char>) -> Parser<char> that matches any character in the set.
  • Implement expect_empty() -> Parser<()> that succeeds only at the end of input.
  • Open Source Repos