ExamplesBy LevelBy TopicLearning Paths
096 Advanced

096 — Recursive Descent Parser

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "096 — Recursive Descent Parser" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Parse arithmetic expressions with addition and multiplication into an AST, respecting operator precedence (`*` binds tighter than `+`). Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

Parse arithmetic expressions with addition and multiplication into an AST, respecting operator precedence (* binds tighter than +). Implement a three-function recursive descent parser (parse_expr, parse_term, parse_atom) on a token slice, returning (Expr, remaining_tokens). Compare with OCaml's mutually recursive and-joined functions.

🎯 Learning Outcomes

  • • Represent ASTs with recursive enums Expr::Num | Add(Box<Expr>, Box<Expr>) | Mul(...)
  • • Use slice.split_first() to consume one token at a time without mutation
  • • Return (Expr, &[&str]) — the parsed subtree plus the unconsumed token slice
  • • Apply lifetime annotations 'a to express that the returned slice borrows the input
  • • Map Rust's separate function definitions to OCaml's let rec … and … mutual recursion
  • • Understand how operator precedence is encoded in the call hierarchy
  • Code Example

    fn parse_expr<'a>(tokens: &'a [&str]) -> Result<(Expr, &'a [&str]), String> {
        let (left, rest) = parse_term(tokens)?;
        if let Some(("+", rest)) = rest.split_first() {
            let (right, rest) = parse_expr(rest)?;
            Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
        } else { Ok((left, rest)) }
    }

    Key Differences

    AspectRustOCaml
    Token consumptionslice.split_first()List pattern token :: rest
    Mutual recursionSeparate fn definitionslet rec … and …
    Error handlingResult<…, String>failwith exception
    Remaining tokens&'a [&'a str] with lifetimestring list (by value)
    AST node allocationBox::new(left)Native recursive type
    Precedence encodingCall hierarchySame call hierarchy

    The call hierarchy encodes precedence: parse_expr calls parse_term, which calls parse_atom. Higher-priority operators are parsed deeper in the call stack — multiplication is resolved before addition because parse_term is called from parse_expr.

    OCaml Approach

    OCaml uses let rec parse_expr tokens = … and parse_term tokens = … and parse_atom = … for mutually recursive functions. Pattern matching on "+" :: rest' consumes tokens from the list. The list-based approach is natural in OCaml — cons :: deconstruction maps directly to the recursive descent pattern. The eval function is a separate catamorphism over the expr AST.

    Full Source

    #![allow(clippy::all)]
    //! # Simple Recursive Descent Parser
    //!
    //! Parse arithmetic expressions into an AST with correct precedence.
    //! OCaml's mutual recursion with `and` maps to Rust functions calling each other.
    
    #[derive(Debug, PartialEq)]
    pub enum Expr {
        Num(i64),
        Add(Box<Expr>, Box<Expr>),
        Mul(Box<Expr>, Box<Expr>),
    }
    
    // ---------------------------------------------------------------------------
    // Approach A: Slice-based parser (mirrors OCaml's list consumption)
    // ---------------------------------------------------------------------------
    
    pub fn parse<'a>(tokens: &'a [&'a str]) -> Result<Expr, String> {
        let (expr, rest) = parse_expr(tokens)?;
        if rest.is_empty() {
            Ok(expr)
        } else {
            Err(format!("unexpected tokens: {:?}", rest))
        }
    }
    
    fn parse_expr<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
        let (left, rest) = parse_term(tokens)?;
        if let Some((&"+", rest)) = rest.split_first() {
            let (right, rest) = parse_expr(rest)?;
            Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
        } else {
            Ok((left, rest))
        }
    }
    
    fn parse_term<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
        let (left, rest) = parse_atom(tokens)?;
        if let Some((&"*", rest)) = rest.split_first() {
            let (right, rest) = parse_term(rest)?;
            Ok((Expr::Mul(Box::new(left), Box::new(right)), rest))
        } else {
            Ok((left, rest))
        }
    }
    
    fn parse_atom<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
        match tokens.split_first() {
            Some((token, rest)) => {
                let n: i64 = token
                    .parse()
                    .map_err(|_| format!("not a number: {}", token))?;
                Ok((Expr::Num(n), rest))
            }
            None => Err("unexpected end of input".to_string()),
        }
    }
    
    // ---------------------------------------------------------------------------
    // Approach B: Evaluator (mirrors OCaml's eval)
    // ---------------------------------------------------------------------------
    
    pub fn eval(expr: &Expr) -> i64 {
        match expr {
            Expr::Num(n) => *n,
            Expr::Add(a, b) => eval(a) + eval(b),
            Expr::Mul(a, b) => eval(a) * eval(b),
        }
    }
    
    // ---------------------------------------------------------------------------
    // Approach C: Index-based parser (avoids slice lifetimes)
    // ---------------------------------------------------------------------------
    
    pub fn parse_and_eval(tokens: &[&str]) -> Result<i64, String> {
        let expr = parse(tokens)?;
        Ok(eval(&expr))
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_simple_add() {
            assert_eq!(parse_and_eval(&["2", "+", "3"]), Ok(5));
        }
    
        #[test]
        fn test_simple_mul() {
            assert_eq!(parse_and_eval(&["2", "*", "3"]), Ok(6));
        }
    
        #[test]
        fn test_precedence() {
            assert_eq!(parse_and_eval(&["2", "+", "3", "*", "4"]), Ok(14));
        }
    
        #[test]
        fn test_single_number() {
            assert_eq!(parse_and_eval(&["42"]), Ok(42));
        }
    
        #[test]
        fn test_empty() {
            assert!(parse_and_eval(&[]).is_err());
        }
    
        #[test]
        fn test_complex() {
            assert_eq!(parse_and_eval(&["1", "+", "2", "+", "3"]), Ok(6));
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_simple_add() {
            assert_eq!(parse_and_eval(&["2", "+", "3"]), Ok(5));
        }
    
        #[test]
        fn test_simple_mul() {
            assert_eq!(parse_and_eval(&["2", "*", "3"]), Ok(6));
        }
    
        #[test]
        fn test_precedence() {
            assert_eq!(parse_and_eval(&["2", "+", "3", "*", "4"]), Ok(14));
        }
    
        #[test]
        fn test_single_number() {
            assert_eq!(parse_and_eval(&["42"]), Ok(42));
        }
    
        #[test]
        fn test_empty() {
            assert!(parse_and_eval(&[]).is_err());
        }
    
        #[test]
        fn test_complex() {
            assert_eq!(parse_and_eval(&["1", "+", "2", "+", "3"]), Ok(6));
        }
    }

    Deep Comparison

    Comparison: Recursive Descent Parser — OCaml vs Rust

    Core Insight

    The parser structure is nearly identical: parse_expr calls parse_term calls parse_atom, with each level consuming tokens and returning (AST, remaining_tokens). The key Rust difference is Box<Expr> — recursive enum variants must be heap-allocated because Rust needs compile-time size. OCaml's GC handles this transparently.

    OCaml

    type expr = Num of int | Add of expr * expr | Mul of expr * expr
    
    let rec parse_expr tokens = 
      let left, rest = parse_term tokens in
      match rest with
      | "+" :: rest' -> let right, rest'' = parse_expr rest' in (Add (left, right), rest'')
      | _ -> (left, rest)
    and parse_term tokens =
      let left, rest = parse_atom tokens in
      match rest with
      | "*" :: rest' -> let right, rest'' = parse_term rest' in (Mul (left, right), rest'')
      | _ -> (left, rest)
    

    Rust

    fn parse_expr<'a>(tokens: &'a [&str]) -> Result<(Expr, &'a [&str]), String> {
        let (left, rest) = parse_term(tokens)?;
        if let Some(("+", rest)) = rest.split_first() {
            let (right, rest) = parse_expr(rest)?;
            Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
        } else { Ok((left, rest)) }
    }
    

    Comparison Table

    AspectOCamlRust
    Recursive typeexpr * expr directlyBox<Expr> required
    Mutual recursionand keywordFunctions call each other
    Token consumption"+" :: rest' list matchsplit_first() on slice
    Error handlingfailwith exceptionResult<T, String>
    Tuple return(ast, rest)(Expr, &[&str]) with lifetime
    Parse intint_of_stringstr::parse::<i64>()

    Learner Notes

  • • **Box<Expr>**: The #1 surprise for OCaml devs. Rust enums are stack-allocated, so recursive variants need indirection
  • Lifetimes in parsers: &'a [&str] — the output slice borrows from the input, and Rust tracks this
  • • **split_first()**: Returns Option<(&T, &[T])> — Rust's equivalent of OCaml's list head/tail destructuring
  • • **No and keyword**: Rust doesn't need it. Forward references work naturally for functions (not for types — use Box)
  • • **? operator**: Propagates parse errors elegantly — each ? is like OCaml's match ... with Error -> ...
  • Exercises

  • Add subtraction and division to the parser, keeping + - at lower precedence than * /.
  • Add parentheses support: parse_atom should handle "(" by calling parse_expr and then expecting ")".
  • Add a unary negation operator: parse_atom should handle "-" followed by another atom.
  • Write a tokeniser tokenise(s: &str) -> Vec<String> that splits an arithmetic expression string into tokens.
  • In OCaml, extend the parser with let x = e in e' expressions and implement a substitution-based evaluator.
  • Open Source Repos