771 Fundamental

771-pratt-parser — Pratt Parser

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "771-pratt-parser — Pratt Parser" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Recursive descent parsers encode operator precedence as a grammar rule hierarchy, requiring a separate parsing function per precedence level. Key difference from OCaml: 1. **Binding power encoding**: Rust's `(left_bp, right_bp)` tuple is the standard Pratt encoding; OCaml's `Menhir` uses declaration

Tutorial

The Problem

Recursive descent parsers encode operator precedence as a grammar rule hierarchy, requiring a separate parsing function per precedence level. Vaughan Pratt's 1973 "Top-Down Operator Precedence" technique encodes precedence as integer "binding powers" on operators, enabling a single compact parsing loop to handle any precedence and associativity. It is used in rustc's expression parser, TypeScript's compiler, and the Crafting Interpreters book. Pratt parsing is often the first thing compiler engineers reach for when hand-writing expression parsers.

🎯 Learning Outcomes

• Understand binding power as a mechanism for encoding precedence and associativity

• Implement infix_binding_power returning (left_bp, right_bp) pairs

• Implement prefix_binding_power for unary operators

• Write a single parse_expr(min_bp) loop that handles all binary and unary operators

• See how left-associativity (1+2+3 = (1+2)+3) and right-associativity (2^3^4 = 2^(3^4)) emerge naturally from the binding power values

Code Example

fn infix_binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),   // left associative
        '*' | '/' => Some((3, 4)),   // left associative
        '^' => Some((6, 5)),         // right associative
        _ => None,
    }
}

let infix_binding_power = function
  | '+' | '-' -> Some (1, 2)
  | '*' | '/' -> Some (3, 4)
  | '^' -> Some (6, 5)
  | _ -> None

Key Differences

Binding power encoding: Rust's (left_bp, right_bp) tuple is the standard Pratt encoding; OCaml's Menhir uses declaration-based precedence at the grammar level.

Nud/Led terminology: Pratt's original paper used "nud" (null denotation) and "led" (left denotation); modern presentations use "prefix_bp" and "infix_bp" as in this example.

Flexibility: Pratt parsers trivially add new operators with custom precedence; Menhir requires grammar modifications and recompilation.

Ternary and postfix: Pratt handles ternary (?:) and postfix (function calls f(x)) naturally by treating them as infix/postfix operators with appropriate binding powers.

OCaml Approach

OCaml's Menhir handles precedence via %left, %right, %nonassoc declarations — equivalent to Pratt's binding power pairs. Hand-written Pratt parsers in OCaml look similar to Rust: a recursive parse_expr min_bp function. The angstrom library's chainl1 and chainr1 combinators implement left- and right-associative infix parsing, abstracting over Pratt's core idea.

Full Source

#![allow(clippy::all)]
//! # Pratt Parser
//!
//! Operator precedence parsing using the Pratt algorithm.

/// Token types
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Number(f64),
    Ident(String),
    Plus,
    Minus,
    Star,
    Slash,
    Caret,
    LParen,
    RParen,
    Eof,
}

/// AST
#[derive(Debug, Clone)]
pub enum Expr {
    Number(f64),
    Ident(String),
    Prefix {
        op: char,
        expr: Box<Expr>,
    },
    Infix {
        op: char,
        left: Box<Expr>,
        right: Box<Expr>,
    },
}

/// Get binding power (precedence) for infix operators
fn infix_binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)), // left associative
        '*' | '/' => Some((3, 4)), // left associative
        '^' => Some((6, 5)),       // right associative
        _ => None,
    }
}

/// Get binding power for prefix operators
fn prefix_binding_power(op: char) -> Option<u8> {
    match op {
        '+' | '-' => Some(5),
        _ => None,
    }
}

/// Lexer
pub struct Lexer<'a> {
    input: &'a str,
    pos: usize,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Self {
        Lexer { input, pos: 0 }
    }

    fn peek_char(&self) -> Option<char> {
        self.input[self.pos..].chars().next()
    }

    fn advance(&mut self) {
        if let Some(c) = self.peek_char() {
            self.pos += c.len_utf8();
        }
    }

    fn skip_whitespace(&mut self) {
        while let Some(c) = self.peek_char() {
            if c.is_whitespace() {
                self.advance();
            } else {
                break;
            }
        }
    }

    pub fn next_token(&mut self) -> Token {
        self.skip_whitespace();

        match self.peek_char() {
            None => Token::Eof,
            Some('+') => {
                self.advance();
                Token::Plus
            }
            Some('-') => {
                self.advance();
                Token::Minus
            }
            Some('*') => {
                self.advance();
                Token::Star
            }
            Some('/') => {
                self.advance();
                Token::Slash
            }
            Some('^') => {
                self.advance();
                Token::Caret
            }
            Some('(') => {
                self.advance();
                Token::LParen
            }
            Some(')') => {
                self.advance();
                Token::RParen
            }
            Some(c) if c.is_ascii_digit() => {
                let start = self.pos;
                while let Some(c) = self.peek_char() {
                    if c.is_ascii_digit() || c == '.' {
                        self.advance();
                    } else {
                        break;
                    }
                }
                Token::Number(self.input[start..self.pos].parse().unwrap())
            }
            Some(c) if c.is_alphabetic() => {
                let start = self.pos;
                while let Some(c) = self.peek_char() {
                    if c.is_alphanumeric() {
                        self.advance();
                    } else {
                        break;
                    }
                }
                Token::Ident(self.input[start..self.pos].to_string())
            }
            Some(c) => panic!("Unexpected: {}", c),
        }
    }
}

/// Pratt parser
pub struct Parser<'a> {
    lexer: Lexer<'a>,
    current: Token,
}

impl<'a> Parser<'a> {
    pub fn new(input: &'a str) -> Self {
        let mut lexer = Lexer::new(input);
        let current = lexer.next_token();
        Parser { lexer, current }
    }

    fn advance(&mut self) {
        self.current = self.lexer.next_token();
    }

    fn token_to_op(token: &Token) -> Option<char> {
        match token {
            Token::Plus => Some('+'),
            Token::Minus => Some('-'),
            Token::Star => Some('*'),
            Token::Slash => Some('/'),
            Token::Caret => Some('^'),
            _ => None,
        }
    }

    /// Main Pratt parsing function
    pub fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
        // Handle prefix (atoms and prefix operators)
        let mut lhs = match &self.current.clone() {
            Token::Number(n) => {
                let n = *n;
                self.advance();
                Expr::Number(n)
            }
            Token::Ident(s) => {
                let s = s.clone();
                self.advance();
                Expr::Ident(s)
            }
            Token::LParen => {
                self.advance();
                let expr = self.parse_expr(0)?;
                if self.current != Token::RParen {
                    return Err("Expected ')'".to_string());
                }
                self.advance();
                expr
            }
            Token::Plus | Token::Minus => {
                let op = Self::token_to_op(&self.current).unwrap();
                self.advance();
                let bp = prefix_binding_power(op).unwrap();
                let rhs = self.parse_expr(bp)?;
                Expr::Prefix {
                    op,
                    expr: Box::new(rhs),
                }
            }
            _ => return Err(format!("Unexpected token: {:?}", self.current)),
        };

        // Handle infix operators
        loop {
            let op = match Self::token_to_op(&self.current) {
                Some(op) => op,
                None => break,
            };

            let (l_bp, r_bp) = match infix_binding_power(op) {
                Some(bp) => bp,
                None => break,
            };

            if l_bp < min_bp {
                break;
            }

            self.advance();
            let rhs = self.parse_expr(r_bp)?;
            lhs = Expr::Infix {
                op,
                left: Box::new(lhs),
                right: Box::new(rhs),
            };
        }

        Ok(lhs)
    }
}

/// Evaluate expression
pub fn eval(expr: &Expr) -> f64 {
    match expr {
        Expr::Number(n) => *n,
        Expr::Ident(_) => 0.0, // Variables not supported
        Expr::Prefix { op, expr } => {
            let v = eval(expr);
            match op {
                '-' => -v,
                '+' => v,
                _ => v,
            }
        }
        Expr::Infix { op, left, right } => {
            let l = eval(left);
            let r = eval(right);
            match op {
                '+' => l + r,
                '-' => l - r,
                '*' => l * r,
                '/' => l / r,
                '^' => l.powf(r),
                _ => 0.0,
            }
        }
    }
}

pub fn calculate(input: &str) -> Result<f64, String> {
    let mut parser = Parser::new(input);
    let expr = parser.parse_expr(0)?;
    Ok(eval(&expr))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_precedence() {
        assert_eq!(calculate("1 + 2 * 3").unwrap(), 7.0);
        assert_eq!(calculate("2 * 3 + 1").unwrap(), 7.0);
    }

    #[test]
    fn test_power_right_assoc() {
        assert_eq!(calculate("2 ^ 3 ^ 2").unwrap(), 512.0); // 2^(3^2) = 2^9
    }

    #[test]
    fn test_unary() {
        assert_eq!(calculate("-5").unwrap(), -5.0);
        assert_eq!(calculate("--5").unwrap(), 5.0);
    }

    #[test]
    fn test_parens() {
        assert_eq!(calculate("(1 + 2) * 3").unwrap(), 9.0);
    }

    #[test]
    fn test_complex() {
        assert_eq!(calculate("2 ^ 2 + 3 * 4").unwrap(), 16.0);
    }
}

(* Pratt parser in OCaml — handles (1+2)*3, unary minus, right-assoc ^ *)

(* ── Tokens ─────────────────────────────────────────────────────────────────── *)
type token = TNum of float | TPlus | TMinus | TStar | TSlash | TCaret
           | TLParen | TRParen | TEof

(* ── AST ─────────────────────────────────────────────────────────────────────── *)
type expr =
  | Num of float
  | Unary  of char * expr
  | Binary of char * expr * expr

let rec eval = function
  | Num n -> n
  | Unary  ('-', e) -> -. (eval e)
  | Unary  (_, e)   -> eval e
  | Binary ('+', a, b) -> eval a +. eval b
  | Binary ('-', a, b) -> eval a -. eval b
  | Binary ('*', a, b) -> eval a *. eval b
  | Binary ('/', a, b) -> eval a /. eval b
  | Binary ('^', a, b) -> Float.pow (eval a) (eval b)
  | Binary _ -> 0.0

(* ── Lexer ─────────────────────────────────────────────────────────────────── *)
let tokenize s =
  let tokens = ref [] in
  let i = ref 0 in
  let len = String.length s in
  while !i < len do
    let c = s.[!i] in
    incr i;
    match c with
    | ' ' | '\t' -> ()
    | '+' -> tokens := TPlus   :: !tokens
    | '-' -> tokens := TMinus  :: !tokens
    | '*' -> tokens := TStar   :: !tokens
    | '/' -> tokens := TSlash  :: !tokens
    | '^' -> tokens := TCaret  :: !tokens
    | '(' -> tokens := TLParen :: !tokens
    | ')' -> tokens := TRParen :: !tokens
    | c when c >= '0' && c <= '9' ->
      let start = !i - 1 in
      while !i < len && (s.[!i] >= '0' && s.[!i] <= '9' || s.[!i] = '.') do incr i done;
      tokens := TNum (float_of_string (String.sub s start (!i - start))) :: !tokens
    | _ -> ()
  done;
  List.rev (TEof :: !tokens)

(* ── Pratt parser ─────────────────────────────────────────────────────────── *)
let tokens = ref []
let current () = match !tokens with t :: _ -> t | [] -> TEof
let consume () = match !tokens with _ :: rest -> tokens := rest | [] -> ()

let infix_bp = function
  | TPlus | TMinus -> (10, 11)
  | TStar | TSlash -> (20, 21)
  | TCaret          -> (30, 29)  (* right-associative: right bp < left bp *)
  | _ -> (0, 0)

let op_char = function
  | TPlus -> '+' | TMinus -> '-' | TStar -> '*' | TSlash -> '/' | TCaret -> '^'
  | _ -> '?'

let rec parse_bp min_bp =
  let left = ref (parse_nud ()) in
  let continue = ref true in
  while !continue do
    let tok = current () in
    let (lbp, rbp) = infix_bp tok in
    if lbp <= min_bp then continue := false
    else begin
      consume ();
      let right = parse_bp rbp in
      left := Binary (op_char tok, !left, right)
    end
  done;
  !left

and parse_nud () =
  match current () with
  | TNum n -> consume (); Num n
  | TMinus ->
    consume ();
    let e = parse_bp 25 in  (* higher than * to bind tight *)
    Unary ('-', e)
  | TLParen ->
    consume ();
    let e = parse_bp 0 in
    (match current () with TRParen -> consume () | _ -> failwith "expected ')'");
    e
  | t -> failwith (Printf.sprintf "unexpected token: %d" (Obj.tag (Obj.repr t)))

let parse s =
  tokens := tokenize s;
  parse_bp 0

let () =
  let tests = [
    "(1 + 2) * 3",       (* = 9 *)
    "2 ^ 3 ^ 2",         (* = 512, right-assoc: 2^(3^2) = 2^9 *)
    "-2 * 3",            (* = -6 *)
    "1 + 2 * 3 - 4 / 2", (* = 5 *)
  ] in
  List.iter (fun s ->
    Printf.printf "%s = %g\n" s (eval (parse s))
  ) tests

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_precedence() {
        assert_eq!(calculate("1 + 2 * 3").unwrap(), 7.0);
        assert_eq!(calculate("2 * 3 + 1").unwrap(), 7.0);
    }

    #[test]
    fn test_power_right_assoc() {
        assert_eq!(calculate("2 ^ 3 ^ 2").unwrap(), 512.0); // 2^(3^2) = 2^9
    }

    #[test]
    fn test_unary() {
        assert_eq!(calculate("-5").unwrap(), -5.0);
        assert_eq!(calculate("--5").unwrap(), 5.0);
    }

    #[test]
    fn test_parens() {
        assert_eq!(calculate("(1 + 2) * 3").unwrap(), 9.0);
    }

    #[test]
    fn test_complex() {
        assert_eq!(calculate("2 ^ 2 + 3 * 4").unwrap(), 16.0);
    }
}

Deep Comparison

OCaml vs Rust: Pratt Parser

Binding Power

Rust

fn infix_binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),   // left associative
        '*' | '/' => Some((3, 4)),   // left associative
        '^' => Some((6, 5)),         // right associative
        _ => None,
    }
}

OCaml

let infix_binding_power = function
  | '+' | '-' -> Some (1, 2)
  | '*' | '/' -> Some (3, 4)
  | '^' -> Some (6, 5)
  | _ -> None

Core Loop

Rust

pub fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
    let mut lhs = self.parse_prefix()?;
    
    loop {
        let (l_bp, r_bp) = match infix_binding_power(op) {
            Some(bp) => bp,
            None => break,
        };
        if l_bp < min_bp { break; }
        
        self.advance();
        let rhs = self.parse_expr(r_bp)?;
        lhs = Expr::Infix { op, left: Box::new(lhs), right: Box::new(rhs) };
    }
    Ok(lhs)
}

Key Differences

Aspect	OCaml	Rust
Binding power	Tuple `int * int`	`(u8, u8)`
Recursion	Tail-recursive	Loop + recursion
Box	Not needed	Required for AST
Option handling	Pattern match	`?` + match

Exercises

Add the ternary ?: operator (lowest precedence, right-associative) using Pratt's technique: treat ? as an infix operator that reads the : and right expression inside its led.

Extend the parser to support function calls f(a, b) by treating ( after an identifier as a postfix operator with high precedence.

Add a Postfix { op, expr } AST node for the unary postfix ! (factorial) operator with the highest binding power of all.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust