096 — Recursive Descent Parser
Tutorial Video
Text description (accessibility)
This video demonstrates the "096 — Recursive Descent Parser" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Parse arithmetic expressions with addition and multiplication into an AST, respecting operator precedence (`*` binds tighter than `+`). Key difference from OCaml: | Aspect | Rust | OCaml |
Tutorial
The Problem
Parse arithmetic expressions with addition and multiplication into an AST, respecting operator precedence (* binds tighter than +). Implement a three-function recursive descent parser (parse_expr, parse_term, parse_atom) on a token slice, returning (Expr, remaining_tokens). Compare with OCaml's mutually recursive and-joined functions.
🎯 Learning Outcomes
Expr::Num | Add(Box<Expr>, Box<Expr>) | Mul(...)slice.split_first() to consume one token at a time without mutation(Expr, &[&str]) — the parsed subtree plus the unconsumed token slice'a to express that the returned slice borrows the inputlet rec … and … mutual recursionCode Example
fn parse_expr<'a>(tokens: &'a [&str]) -> Result<(Expr, &'a [&str]), String> {
let (left, rest) = parse_term(tokens)?;
if let Some(("+", rest)) = rest.split_first() {
let (right, rest) = parse_expr(rest)?;
Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
} else { Ok((left, rest)) }
}Key Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Token consumption | slice.split_first() | List pattern token :: rest |
| Mutual recursion | Separate fn definitions | let rec … and … |
| Error handling | Result<…, String> | failwith exception |
| Remaining tokens | &'a [&'a str] with lifetime | string list (by value) |
| AST node allocation | Box::new(left) | Native recursive type |
| Precedence encoding | Call hierarchy | Same call hierarchy |
The call hierarchy encodes precedence: parse_expr calls parse_term, which calls parse_atom. Higher-priority operators are parsed deeper in the call stack — multiplication is resolved before addition because parse_term is called from parse_expr.
OCaml Approach
OCaml uses let rec parse_expr tokens = … and parse_term tokens = … and parse_atom = … for mutually recursive functions. Pattern matching on "+" :: rest' consumes tokens from the list. The list-based approach is natural in OCaml — cons :: deconstruction maps directly to the recursive descent pattern. The eval function is a separate catamorphism over the expr AST.
Full Source
#![allow(clippy::all)]
//! # Simple Recursive Descent Parser
//!
//! Parse arithmetic expressions into an AST with correct precedence.
//! OCaml's mutual recursion with `and` maps to Rust functions calling each other.
#[derive(Debug, PartialEq)]
pub enum Expr {
Num(i64),
Add(Box<Expr>, Box<Expr>),
Mul(Box<Expr>, Box<Expr>),
}
// ---------------------------------------------------------------------------
// Approach A: Slice-based parser (mirrors OCaml's list consumption)
// ---------------------------------------------------------------------------
pub fn parse<'a>(tokens: &'a [&'a str]) -> Result<Expr, String> {
let (expr, rest) = parse_expr(tokens)?;
if rest.is_empty() {
Ok(expr)
} else {
Err(format!("unexpected tokens: {:?}", rest))
}
}
fn parse_expr<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
let (left, rest) = parse_term(tokens)?;
if let Some((&"+", rest)) = rest.split_first() {
let (right, rest) = parse_expr(rest)?;
Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
} else {
Ok((left, rest))
}
}
fn parse_term<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
let (left, rest) = parse_atom(tokens)?;
if let Some((&"*", rest)) = rest.split_first() {
let (right, rest) = parse_term(rest)?;
Ok((Expr::Mul(Box::new(left), Box::new(right)), rest))
} else {
Ok((left, rest))
}
}
fn parse_atom<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
match tokens.split_first() {
Some((token, rest)) => {
let n: i64 = token
.parse()
.map_err(|_| format!("not a number: {}", token))?;
Ok((Expr::Num(n), rest))
}
None => Err("unexpected end of input".to_string()),
}
}
// ---------------------------------------------------------------------------
// Approach B: Evaluator (mirrors OCaml's eval)
// ---------------------------------------------------------------------------
pub fn eval(expr: &Expr) -> i64 {
match expr {
Expr::Num(n) => *n,
Expr::Add(a, b) => eval(a) + eval(b),
Expr::Mul(a, b) => eval(a) * eval(b),
}
}
// ---------------------------------------------------------------------------
// Approach C: Index-based parser (avoids slice lifetimes)
// ---------------------------------------------------------------------------
pub fn parse_and_eval(tokens: &[&str]) -> Result<i64, String> {
let expr = parse(tokens)?;
Ok(eval(&expr))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_simple_add() {
assert_eq!(parse_and_eval(&["2", "+", "3"]), Ok(5));
}
#[test]
fn test_simple_mul() {
assert_eq!(parse_and_eval(&["2", "*", "3"]), Ok(6));
}
#[test]
fn test_precedence() {
assert_eq!(parse_and_eval(&["2", "+", "3", "*", "4"]), Ok(14));
}
#[test]
fn test_single_number() {
assert_eq!(parse_and_eval(&["42"]), Ok(42));
}
#[test]
fn test_empty() {
assert!(parse_and_eval(&[]).is_err());
}
#[test]
fn test_complex() {
assert_eq!(parse_and_eval(&["1", "+", "2", "+", "3"]), Ok(6));
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_simple_add() {
assert_eq!(parse_and_eval(&["2", "+", "3"]), Ok(5));
}
#[test]
fn test_simple_mul() {
assert_eq!(parse_and_eval(&["2", "*", "3"]), Ok(6));
}
#[test]
fn test_precedence() {
assert_eq!(parse_and_eval(&["2", "+", "3", "*", "4"]), Ok(14));
}
#[test]
fn test_single_number() {
assert_eq!(parse_and_eval(&["42"]), Ok(42));
}
#[test]
fn test_empty() {
assert!(parse_and_eval(&[]).is_err());
}
#[test]
fn test_complex() {
assert_eq!(parse_and_eval(&["1", "+", "2", "+", "3"]), Ok(6));
}
}
Deep Comparison
Comparison: Recursive Descent Parser — OCaml vs Rust
Core Insight
The parser structure is nearly identical: parse_expr calls parse_term calls parse_atom, with each level consuming tokens and returning (AST, remaining_tokens). The key Rust difference is Box<Expr> — recursive enum variants must be heap-allocated because Rust needs compile-time size. OCaml's GC handles this transparently.
OCaml
type expr = Num of int | Add of expr * expr | Mul of expr * expr
let rec parse_expr tokens =
let left, rest = parse_term tokens in
match rest with
| "+" :: rest' -> let right, rest'' = parse_expr rest' in (Add (left, right), rest'')
| _ -> (left, rest)
and parse_term tokens =
let left, rest = parse_atom tokens in
match rest with
| "*" :: rest' -> let right, rest'' = parse_term rest' in (Mul (left, right), rest'')
| _ -> (left, rest)
Rust
fn parse_expr<'a>(tokens: &'a [&str]) -> Result<(Expr, &'a [&str]), String> {
let (left, rest) = parse_term(tokens)?;
if let Some(("+", rest)) = rest.split_first() {
let (right, rest) = parse_expr(rest)?;
Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
} else { Ok((left, rest)) }
}
Comparison Table
| Aspect | OCaml | Rust |
|---|---|---|
| Recursive type | expr * expr directly | Box<Expr> required |
| Mutual recursion | and keyword | Functions call each other |
| Token consumption | "+" :: rest' list match | split_first() on slice |
| Error handling | failwith exception | Result<T, String> |
| Tuple return | (ast, rest) | (Expr, &[&str]) with lifetime |
| Parse int | int_of_string | str::parse::<i64>() |
Learner Notes
Box<Expr>**: The #1 surprise for OCaml devs. Rust enums are stack-allocated, so recursive variants need indirection&'a [&str] — the output slice borrows from the input, and Rust tracks thissplit_first()**: Returns Option<(&T, &[T])> — Rust's equivalent of OCaml's list head/tail destructuringand keyword**: Rust doesn't need it. Forward references work naturally for functions (not for types — use Box)? operator**: Propagates parse errors elegantly — each ? is like OCaml's match ... with Error -> ...Exercises
+ - at lower precedence than * /.parse_atom should handle "(" by calling parse_expr and then expecting ")".parse_atom should handle "-" followed by another atom.tokenise(s: &str) -> Vec<String> that splits an arithmetic expression string into tokens.let x = e in e' expressions and implement a substitution-based evaluator.