096 Advanced

096 — Recursive Descent Parser

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "096 — Recursive Descent Parser" functional Rust example. Difficulty level: Advanced. Key concepts covered: Functional Programming. Parse arithmetic expressions with addition and multiplication into an AST, respecting operator precedence (`*` binds tighter than `+`). Key difference from OCaml: | Aspect | Rust | OCaml |

Tutorial

The Problem

Parse arithmetic expressions with addition and multiplication into an AST, respecting operator precedence (* binds tighter than +). Implement a three-function recursive descent parser (parse_expr, parse_term, parse_atom) on a token slice, returning (Expr, remaining_tokens). Compare with OCaml's mutually recursive and-joined functions.

🎯 Learning Outcomes

• Represent ASTs with recursive enums Expr::Num | Add(Box<Expr>, Box<Expr>) | Mul(...)

• Use slice.split_first() to consume one token at a time without mutation

• Return (Expr, &[&str]) — the parsed subtree plus the unconsumed token slice

• Apply lifetime annotations 'a to express that the returned slice borrows the input

• Map Rust's separate function definitions to OCaml's let rec … and … mutual recursion

• Understand how operator precedence is encoded in the call hierarchy

Code Example

fn parse_expr<'a>(tokens: &'a [&str]) -> Result<(Expr, &'a [&str]), String> {
    let (left, rest) = parse_term(tokens)?;
    if let Some(("+", rest)) = rest.split_first() {
        let (right, rest) = parse_expr(rest)?;
        Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
    } else { Ok((left, rest)) }
}

type expr = Num of int | Add of expr * expr | Mul of expr * expr

let rec parse_expr tokens = 
  let left, rest = parse_term tokens in
  match rest with
  | "+" :: rest' -> let right, rest'' = parse_expr rest' in (Add (left, right), rest'')
  | _ -> (left, rest)
and parse_term tokens =
  let left, rest = parse_atom tokens in
  match rest with
  | "*" :: rest' -> let right, rest'' = parse_term rest' in (Mul (left, right), rest'')
  | _ -> (left, rest)

Key Differences

Aspect	Rust	OCaml
Token consumption	`slice.split_first()`	List pattern `token :: rest`
Mutual recursion	Separate `fn` definitions	`let rec … and …`
Error handling	`Result<…, String>`	`failwith` exception
Remaining tokens	`&'a [&'a str]` with lifetime	`string list` (by value)
AST node allocation	`Box::new(left)`	Native recursive type
Precedence encoding	Call hierarchy	Same call hierarchy

The call hierarchy encodes precedence: parse_expr calls parse_term, which calls parse_atom. Higher-priority operators are parsed deeper in the call stack — multiplication is resolved before addition because parse_term is called from parse_expr.

OCaml Approach

OCaml uses let rec parse_expr tokens = … and parse_term tokens = … and parse_atom = … for mutually recursive functions. Pattern matching on "+" :: rest' consumes tokens from the list. The list-based approach is natural in OCaml — cons :: deconstruction maps directly to the recursive descent pattern. The eval function is a separate catamorphism over the expr AST.

Full Source

#![allow(clippy::all)]
//! # Simple Recursive Descent Parser
//!
//! Parse arithmetic expressions into an AST with correct precedence.
//! OCaml's mutual recursion with `and` maps to Rust functions calling each other.

#[derive(Debug, PartialEq)]
pub enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// ---------------------------------------------------------------------------
// Approach A: Slice-based parser (mirrors OCaml's list consumption)
// ---------------------------------------------------------------------------

pub fn parse<'a>(tokens: &'a [&'a str]) -> Result<Expr, String> {
    let (expr, rest) = parse_expr(tokens)?;
    if rest.is_empty() {
        Ok(expr)
    } else {
        Err(format!("unexpected tokens: {:?}", rest))
    }
}

fn parse_expr<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
    let (left, rest) = parse_term(tokens)?;
    if let Some((&"+", rest)) = rest.split_first() {
        let (right, rest) = parse_expr(rest)?;
        Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
    } else {
        Ok((left, rest))
    }
}

fn parse_term<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
    let (left, rest) = parse_atom(tokens)?;
    if let Some((&"*", rest)) = rest.split_first() {
        let (right, rest) = parse_term(rest)?;
        Ok((Expr::Mul(Box::new(left), Box::new(right)), rest))
    } else {
        Ok((left, rest))
    }
}

fn parse_atom<'a>(tokens: &'a [&'a str]) -> Result<(Expr, &'a [&'a str]), String> {
    match tokens.split_first() {
        Some((token, rest)) => {
            let n: i64 = token
                .parse()
                .map_err(|_| format!("not a number: {}", token))?;
            Ok((Expr::Num(n), rest))
        }
        None => Err("unexpected end of input".to_string()),
    }
}

// ---------------------------------------------------------------------------
// Approach B: Evaluator (mirrors OCaml's eval)
// ---------------------------------------------------------------------------

pub fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

// ---------------------------------------------------------------------------
// Approach C: Index-based parser (avoids slice lifetimes)
// ---------------------------------------------------------------------------

pub fn parse_and_eval(tokens: &[&str]) -> Result<i64, String> {
    let expr = parse(tokens)?;
    Ok(eval(&expr))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_simple_add() {
        assert_eq!(parse_and_eval(&["2", "+", "3"]), Ok(5));
    }

    #[test]
    fn test_simple_mul() {
        assert_eq!(parse_and_eval(&["2", "*", "3"]), Ok(6));
    }

    #[test]
    fn test_precedence() {
        assert_eq!(parse_and_eval(&["2", "+", "3", "*", "4"]), Ok(14));
    }

    #[test]
    fn test_single_number() {
        assert_eq!(parse_and_eval(&["42"]), Ok(42));
    }

    #[test]
    fn test_empty() {
        assert!(parse_and_eval(&[]).is_err());
    }

    #[test]
    fn test_complex() {
        assert_eq!(parse_and_eval(&["1", "+", "2", "+", "3"]), Ok(6));
    }
}

type expr = Num of int | Add of expr * expr | Mul of expr * expr

let rec parse_expr tokens = 
  let left, rest = parse_term tokens in
  match rest with
  | "+" :: rest' ->
    let right, rest'' = parse_expr rest' in
    (Add (left, right), rest'')
  | _ -> (left, rest)
and parse_term tokens =
  let left, rest = parse_atom tokens in
  match rest with
  | "*" :: rest' ->
    let right, rest'' = parse_term rest' in
    (Mul (left, right), rest'')
  | _ -> (left, rest)
and parse_atom = function
  | n :: rest -> (Num (int_of_string n), rest)
  | [] -> failwith "unexpected end"

let rec eval = function
  | Num n -> n | Add (a,b) -> eval a + eval b | Mul (a,b) -> eval a * eval b

let () =
  let tokens = ["2";"+";"3";"*";"4"] in
  let ast, _ = parse_expr tokens in
  Printf.printf "2+3*4 = %d\n" (eval ast)

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_simple_add() {
        assert_eq!(parse_and_eval(&["2", "+", "3"]), Ok(5));
    }

    #[test]
    fn test_simple_mul() {
        assert_eq!(parse_and_eval(&["2", "*", "3"]), Ok(6));
    }

    #[test]
    fn test_precedence() {
        assert_eq!(parse_and_eval(&["2", "+", "3", "*", "4"]), Ok(14));
    }

    #[test]
    fn test_single_number() {
        assert_eq!(parse_and_eval(&["42"]), Ok(42));
    }

    #[test]
    fn test_empty() {
        assert!(parse_and_eval(&[]).is_err());
    }

    #[test]
    fn test_complex() {
        assert_eq!(parse_and_eval(&["1", "+", "2", "+", "3"]), Ok(6));
    }
}

Deep Comparison

Comparison: Recursive Descent Parser — OCaml vs Rust

Core Insight

The parser structure is nearly identical: parse_expr calls parse_term calls parse_atom, with each level consuming tokens and returning (AST, remaining_tokens). The key Rust difference is Box<Expr> — recursive enum variants must be heap-allocated because Rust needs compile-time size. OCaml's GC handles this transparently.

OCaml

type expr = Num of int | Add of expr * expr | Mul of expr * expr

let rec parse_expr tokens = 
  let left, rest = parse_term tokens in
  match rest with
  | "+" :: rest' -> let right, rest'' = parse_expr rest' in (Add (left, right), rest'')
  | _ -> (left, rest)
and parse_term tokens =
  let left, rest = parse_atom tokens in
  match rest with
  | "*" :: rest' -> let right, rest'' = parse_term rest' in (Mul (left, right), rest'')
  | _ -> (left, rest)

Rust

fn parse_expr<'a>(tokens: &'a [&str]) -> Result<(Expr, &'a [&str]), String> {
    let (left, rest) = parse_term(tokens)?;
    if let Some(("+", rest)) = rest.split_first() {
        let (right, rest) = parse_expr(rest)?;
        Ok((Expr::Add(Box::new(left), Box::new(right)), rest))
    } else { Ok((left, rest)) }
}

Comparison Table

Aspect	OCaml	Rust
Recursive type	`expr * expr` directly	`Box<Expr>` required
Mutual recursion	`and` keyword	Functions call each other
Token consumption	`"+" :: rest'` list match	`split_first()` on slice
Error handling	`failwith` exception	`Result<T, String>`
Tuple return	`(ast, rest)`	`(Expr, &[&str])` with lifetime
Parse int	`int_of_string`	`str::parse::<i64>()`

Learner Notes

• **Box<Expr>**: The #1 surprise for OCaml devs. Rust enums are stack-allocated, so recursive variants need indirection

• Lifetimes in parsers: &'a [&str] — the output slice borrows from the input, and Rust tracks this

• **split_first()**: Returns Option<(&T, &[T])> — Rust's equivalent of OCaml's list head/tail destructuring

• **No and keyword**: Rust doesn't need it. Forward references work naturally for functions (not for types — use Box)

• **? operator**: Propagates parse errors elegantly — each ? is like OCaml's match ... with Error -> ...

Exercises

Add subtraction and division to the parser, keeping + - at lower precedence than * /.

Add parentheses support: parse_atom should handle "(" by calling parse_expr and then expecting ")".

Add a unary negation operator: parse_atom should handle "-" followed by another atom.

Write a tokeniser tokenise(s: &str) -> Vec<String> that splits an arithmetic expression string into tokens.

In OCaml, extend the parser with let x = e in e' expressions and implement a substitution-based evaluator.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust