ExamplesBy LevelBy TopicLearning Paths
770 Advanced

770-recursive-descent-parser — Recursive Descent Parser

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "770-recursive-descent-parser — Recursive Descent Parser" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Recursive descent parsing is the simplest technique for parsing context-free grammars. Key difference from OCaml: 1. **Mutual recursion**: Both languages handle mutual recursion naturally; Rust requires `fn parse_expr(&mut self)` calling `self.parse_term()`, while OCaml uses `let rec ... and ...`.

Tutorial

The Problem

Recursive descent parsing is the simplest technique for parsing context-free grammars. It was invented in the 1960s and remains the dominant approach for hand-written parsers in production compilers: Clang, GCC, rustc, and Go's parser are all recursive descent. The idea is elegant: each grammar rule becomes a function; parsing is calling that function. Operator precedence is handled by a hierarchy of mutually recursive functions (or the Pratt technique in the next example).

🎯 Learning Outcomes

  • • Implement a lexer that tokenizes an arithmetic expression into Token variants
  • • Build a recursive descent parser for expr → term ('+' | '-' term)* grammar
  • • Handle operator precedence via grammar rule hierarchy: expr → term → factor → atom
  • • Evaluate expressions directly during parsing (no separate AST phase)
  • • Understand why left-recursive grammars cannot be directly parsed by recursive descent
  • Code Example

    /// expr = term (('+' | '-') term)*
    pub fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut left = self.parse_term()?;
        loop {
            match &self.current {
                Token::Plus => {
                    self.advance();
                    let right = self.parse_term()?;
                    left = Expr::BinOp { op: '+', left: Box::new(left), right: Box::new(right) };
                }
                _ => break,
            }
        }
        Ok(left)
    }

    Key Differences

  • Mutual recursion: Both languages handle mutual recursion naturally; Rust requires fn parse_expr(&mut self) calling self.parse_term(), while OCaml uses let rec ... and ....
  • Error recovery: Rust parsers typically panic on errors in simple implementations; production parsers return Result and synchronize on recovery tokens.
  • Parser generators: Rust has lalrpop, pest, and nom; OCaml has Menhir (LALR), sedlex (lexer), and angstrom (combinator).
  • Grammar expression: OCaml's Menhir uses BNF-like rules; Rust's lalrpop uses a similar notation with action code in Rust.
  • OCaml Approach

    OCaml is an excellent language for recursive descent parsers because functions are first-class and tail-call optimization prevents stack overflow on deep recursion. Menhir is OCaml's standard LALR parser generator, used in the OCaml compiler itself. Hand-written recursive descent parsers in OCaml look nearly identical to Rust: let rec parse_expr () = ... and parse_term () = .... The angstrom combinator library provides an alternative that avoids hand-rolling lexers.

    Full Source

    #![allow(clippy::all)]
    //! # Recursive Descent Parser
    //!
    //! A simple expression parser using recursive descent.
    
    /// Token types
    #[derive(Debug, Clone, PartialEq)]
    pub enum Token {
        Number(f64),
        Plus,
        Minus,
        Star,
        Slash,
        LParen,
        RParen,
        Eof,
    }
    
    /// Lexer
    pub struct Lexer<'a> {
        input: &'a str,
        pos: usize,
    }
    
    impl<'a> Lexer<'a> {
        pub fn new(input: &'a str) -> Self {
            Lexer { input, pos: 0 }
        }
    
        fn peek_char(&self) -> Option<char> {
            self.input[self.pos..].chars().next()
        }
    
        fn advance(&mut self) {
            if let Some(c) = self.peek_char() {
                self.pos += c.len_utf8();
            }
        }
    
        fn skip_whitespace(&mut self) {
            while let Some(c) = self.peek_char() {
                if c.is_whitespace() {
                    self.advance();
                } else {
                    break;
                }
            }
        }
    
        pub fn next_token(&mut self) -> Token {
            self.skip_whitespace();
    
            match self.peek_char() {
                None => Token::Eof,
                Some('+') => {
                    self.advance();
                    Token::Plus
                }
                Some('-') => {
                    self.advance();
                    Token::Minus
                }
                Some('*') => {
                    self.advance();
                    Token::Star
                }
                Some('/') => {
                    self.advance();
                    Token::Slash
                }
                Some('(') => {
                    self.advance();
                    Token::LParen
                }
                Some(')') => {
                    self.advance();
                    Token::RParen
                }
                Some(c) if c.is_ascii_digit() || c == '.' => {
                    let start = self.pos;
                    while let Some(c) = self.peek_char() {
                        if c.is_ascii_digit() || c == '.' {
                            self.advance();
                        } else {
                            break;
                        }
                    }
                    let num_str = &self.input[start..self.pos];
                    Token::Number(num_str.parse().unwrap_or(0.0))
                }
                Some(c) => panic!("Unexpected character: {}", c),
            }
        }
    }
    
    /// AST nodes
    #[derive(Debug, Clone)]
    pub enum Expr {
        Number(f64),
        BinOp {
            op: char,
            left: Box<Expr>,
            right: Box<Expr>,
        },
        UnaryMinus(Box<Expr>),
    }
    
    /// Recursive descent parser
    pub struct Parser<'a> {
        lexer: Lexer<'a>,
        current: Token,
    }
    
    impl<'a> Parser<'a> {
        pub fn new(input: &'a str) -> Self {
            let mut lexer = Lexer::new(input);
            let current = lexer.next_token();
            Parser { lexer, current }
        }
    
        fn advance(&mut self) {
            self.current = self.lexer.next_token();
        }
    
        /// expr = term (('+' | '-') term)*
        pub fn parse_expr(&mut self) -> Result<Expr, String> {
            let mut left = self.parse_term()?;
    
            loop {
                match &self.current {
                    Token::Plus => {
                        self.advance();
                        let right = self.parse_term()?;
                        left = Expr::BinOp {
                            op: '+',
                            left: Box::new(left),
                            right: Box::new(right),
                        };
                    }
                    Token::Minus => {
                        self.advance();
                        let right = self.parse_term()?;
                        left = Expr::BinOp {
                            op: '-',
                            left: Box::new(left),
                            right: Box::new(right),
                        };
                    }
                    _ => break,
                }
            }
    
            Ok(left)
        }
    
        /// term = factor (('*' | '/') factor)*
        fn parse_term(&mut self) -> Result<Expr, String> {
            let mut left = self.parse_factor()?;
    
            loop {
                match &self.current {
                    Token::Star => {
                        self.advance();
                        let right = self.parse_factor()?;
                        left = Expr::BinOp {
                            op: '*',
                            left: Box::new(left),
                            right: Box::new(right),
                        };
                    }
                    Token::Slash => {
                        self.advance();
                        let right = self.parse_factor()?;
                        left = Expr::BinOp {
                            op: '/',
                            left: Box::new(left),
                            right: Box::new(right),
                        };
                    }
                    _ => break,
                }
            }
    
            Ok(left)
        }
    
        /// factor = '-' factor | number | '(' expr ')'
        fn parse_factor(&mut self) -> Result<Expr, String> {
            match &self.current {
                Token::Minus => {
                    self.advance();
                    let expr = self.parse_factor()?;
                    Ok(Expr::UnaryMinus(Box::new(expr)))
                }
                Token::Number(n) => {
                    let n = *n;
                    self.advance();
                    Ok(Expr::Number(n))
                }
                Token::LParen => {
                    self.advance();
                    let expr = self.parse_expr()?;
                    if self.current != Token::RParen {
                        return Err("Expected ')'".to_string());
                    }
                    self.advance();
                    Ok(expr)
                }
                _ => Err(format!("Unexpected token: {:?}", self.current)),
            }
        }
    }
    
    /// Evaluate an expression
    pub fn eval(expr: &Expr) -> f64 {
        match expr {
            Expr::Number(n) => *n,
            Expr::UnaryMinus(e) => -eval(e),
            Expr::BinOp { op, left, right } => {
                let l = eval(left);
                let r = eval(right);
                match op {
                    '+' => l + r,
                    '-' => l - r,
                    '*' => l * r,
                    '/' => l / r,
                    _ => 0.0,
                }
            }
        }
    }
    
    /// Parse and evaluate
    pub fn calculate(input: &str) -> Result<f64, String> {
        let mut parser = Parser::new(input);
        let expr = parser.parse_expr()?;
        Ok(eval(&expr))
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_number() {
            assert_eq!(calculate("42").unwrap(), 42.0);
        }
    
        #[test]
        fn test_addition() {
            assert_eq!(calculate("1 + 2").unwrap(), 3.0);
        }
    
        #[test]
        fn test_precedence() {
            assert_eq!(calculate("1 + 2 * 3").unwrap(), 7.0);
        }
    
        #[test]
        fn test_parens() {
            assert_eq!(calculate("(1 + 2) * 3").unwrap(), 9.0);
        }
    
        #[test]
        fn test_unary_minus() {
            assert_eq!(calculate("-5").unwrap(), -5.0);
            assert_eq!(calculate("--5").unwrap(), 5.0);
        }
    
        #[test]
        fn test_complex() {
            assert_eq!(calculate("(10 - 2) / 4 + 1").unwrap(), 3.0);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_number() {
            assert_eq!(calculate("42").unwrap(), 42.0);
        }
    
        #[test]
        fn test_addition() {
            assert_eq!(calculate("1 + 2").unwrap(), 3.0);
        }
    
        #[test]
        fn test_precedence() {
            assert_eq!(calculate("1 + 2 * 3").unwrap(), 7.0);
        }
    
        #[test]
        fn test_parens() {
            assert_eq!(calculate("(1 + 2) * 3").unwrap(), 9.0);
        }
    
        #[test]
        fn test_unary_minus() {
            assert_eq!(calculate("-5").unwrap(), -5.0);
            assert_eq!(calculate("--5").unwrap(), 5.0);
        }
    
        #[test]
        fn test_complex() {
            assert_eq!(calculate("(10 - 2) / 4 + 1").unwrap(), 3.0);
        }
    }

    Deep Comparison

    OCaml vs Rust: Recursive Descent Parser

    Grammar Implementation

    Rust

    /// expr = term (('+' | '-') term)*
    pub fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut left = self.parse_term()?;
        loop {
            match &self.current {
                Token::Plus => {
                    self.advance();
                    let right = self.parse_term()?;
                    left = Expr::BinOp { op: '+', left: Box::new(left), right: Box::new(right) };
                }
                _ => break,
            }
        }
        Ok(left)
    }
    

    OCaml

    (* expr = term (('+' | '-') term)* *)
    let rec parse_expr lexer =
      let left = parse_term lexer in
      match peek lexer with
      | Plus ->
          advance lexer;
          let right = parse_term lexer in
          BinOp { op = '+'; left; right }
      | _ -> left
    

    AST Definition

    Rust

    pub enum Expr {
        Number(f64),
        BinOp { op: char, left: Box<Expr>, right: Box<Expr> },
        UnaryMinus(Box<Expr>),
    }
    

    OCaml

    type expr =
      | Number of float
      | BinOp of { op: char; left: expr; right: expr }
      | UnaryMinus of expr
    

    Key Differences

    AspectOCamlRust
    RecursionNaturalSame
    Box for recursionNot needed (GC)Required
    Error handlingException or resultResult<T, E>
    Pattern matchmatchmatch

    Exercises

  • Add a ^ power operator (right-associative, highest precedence) by adding a parse_power level between parse_factor and parse_primary.
  • Extend the parser to build an AST (Expr enum) instead of evaluating directly, then write a separate eval(expr: &Expr) -> f64 function.
  • Add error recovery: instead of panicking on unexpected tokens, emit an error message and try to continue parsing from the next + or ) token.
  • Open Source Repos