771-pratt-parser — Pratt Parser
Tutorial Video
Text description (accessibility)
This video demonstrates the "771-pratt-parser — Pratt Parser" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Recursive descent parsers encode operator precedence as a grammar rule hierarchy, requiring a separate parsing function per precedence level. Key difference from OCaml: 1. **Binding power encoding**: Rust's `(left_bp, right_bp)` tuple is the standard Pratt encoding; OCaml's `Menhir` uses declaration
Tutorial
The Problem
Recursive descent parsers encode operator precedence as a grammar rule hierarchy, requiring a separate parsing function per precedence level. Vaughan Pratt's 1973 "Top-Down Operator Precedence" technique encodes precedence as integer "binding powers" on operators, enabling a single compact parsing loop to handle any precedence and associativity. It is used in rustc's expression parser, TypeScript's compiler, and the Crafting Interpreters book. Pratt parsing is often the first thing compiler engineers reach for when hand-writing expression parsers.
🎯 Learning Outcomes
infix_binding_power returning (left_bp, right_bp) pairsprefix_binding_power for unary operatorsparse_expr(min_bp) loop that handles all binary and unary operators1+2+3 = (1+2)+3) and right-associativity (2^3^4 = 2^(3^4)) emerge naturally from the binding power valuesCode Example
fn infix_binding_power(op: char) -> Option<(u8, u8)> {
match op {
'+' | '-' => Some((1, 2)), // left associative
'*' | '/' => Some((3, 4)), // left associative
'^' => Some((6, 5)), // right associative
_ => None,
}
}Key Differences
(left_bp, right_bp) tuple is the standard Pratt encoding; OCaml's Menhir uses declaration-based precedence at the grammar level.?:) and postfix (function calls f(x)) naturally by treating them as infix/postfix operators with appropriate binding powers.OCaml Approach
OCaml's Menhir handles precedence via %left, %right, %nonassoc declarations — equivalent to Pratt's binding power pairs. Hand-written Pratt parsers in OCaml look similar to Rust: a recursive parse_expr min_bp function. The angstrom library's chainl1 and chainr1 combinators implement left- and right-associative infix parsing, abstracting over Pratt's core idea.
Full Source
#![allow(clippy::all)]
//! # Pratt Parser
//!
//! Operator precedence parsing using the Pratt algorithm.
/// Token types
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
Number(f64),
Ident(String),
Plus,
Minus,
Star,
Slash,
Caret,
LParen,
RParen,
Eof,
}
/// AST
#[derive(Debug, Clone)]
pub enum Expr {
Number(f64),
Ident(String),
Prefix {
op: char,
expr: Box<Expr>,
},
Infix {
op: char,
left: Box<Expr>,
right: Box<Expr>,
},
}
/// Get binding power (precedence) for infix operators
fn infix_binding_power(op: char) -> Option<(u8, u8)> {
match op {
'+' | '-' => Some((1, 2)), // left associative
'*' | '/' => Some((3, 4)), // left associative
'^' => Some((6, 5)), // right associative
_ => None,
}
}
/// Get binding power for prefix operators
fn prefix_binding_power(op: char) -> Option<u8> {
match op {
'+' | '-' => Some(5),
_ => None,
}
}
/// Lexer
pub struct Lexer<'a> {
input: &'a str,
pos: usize,
}
impl<'a> Lexer<'a> {
pub fn new(input: &'a str) -> Self {
Lexer { input, pos: 0 }
}
fn peek_char(&self) -> Option<char> {
self.input[self.pos..].chars().next()
}
fn advance(&mut self) {
if let Some(c) = self.peek_char() {
self.pos += c.len_utf8();
}
}
fn skip_whitespace(&mut self) {
while let Some(c) = self.peek_char() {
if c.is_whitespace() {
self.advance();
} else {
break;
}
}
}
pub fn next_token(&mut self) -> Token {
self.skip_whitespace();
match self.peek_char() {
None => Token::Eof,
Some('+') => {
self.advance();
Token::Plus
}
Some('-') => {
self.advance();
Token::Minus
}
Some('*') => {
self.advance();
Token::Star
}
Some('/') => {
self.advance();
Token::Slash
}
Some('^') => {
self.advance();
Token::Caret
}
Some('(') => {
self.advance();
Token::LParen
}
Some(')') => {
self.advance();
Token::RParen
}
Some(c) if c.is_ascii_digit() => {
let start = self.pos;
while let Some(c) = self.peek_char() {
if c.is_ascii_digit() || c == '.' {
self.advance();
} else {
break;
}
}
Token::Number(self.input[start..self.pos].parse().unwrap())
}
Some(c) if c.is_alphabetic() => {
let start = self.pos;
while let Some(c) = self.peek_char() {
if c.is_alphanumeric() {
self.advance();
} else {
break;
}
}
Token::Ident(self.input[start..self.pos].to_string())
}
Some(c) => panic!("Unexpected: {}", c),
}
}
}
/// Pratt parser
pub struct Parser<'a> {
lexer: Lexer<'a>,
current: Token,
}
impl<'a> Parser<'a> {
pub fn new(input: &'a str) -> Self {
let mut lexer = Lexer::new(input);
let current = lexer.next_token();
Parser { lexer, current }
}
fn advance(&mut self) {
self.current = self.lexer.next_token();
}
fn token_to_op(token: &Token) -> Option<char> {
match token {
Token::Plus => Some('+'),
Token::Minus => Some('-'),
Token::Star => Some('*'),
Token::Slash => Some('/'),
Token::Caret => Some('^'),
_ => None,
}
}
/// Main Pratt parsing function
pub fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
// Handle prefix (atoms and prefix operators)
let mut lhs = match &self.current.clone() {
Token::Number(n) => {
let n = *n;
self.advance();
Expr::Number(n)
}
Token::Ident(s) => {
let s = s.clone();
self.advance();
Expr::Ident(s)
}
Token::LParen => {
self.advance();
let expr = self.parse_expr(0)?;
if self.current != Token::RParen {
return Err("Expected ')'".to_string());
}
self.advance();
expr
}
Token::Plus | Token::Minus => {
let op = Self::token_to_op(&self.current).unwrap();
self.advance();
let bp = prefix_binding_power(op).unwrap();
let rhs = self.parse_expr(bp)?;
Expr::Prefix {
op,
expr: Box::new(rhs),
}
}
_ => return Err(format!("Unexpected token: {:?}", self.current)),
};
// Handle infix operators
loop {
let op = match Self::token_to_op(&self.current) {
Some(op) => op,
None => break,
};
let (l_bp, r_bp) = match infix_binding_power(op) {
Some(bp) => bp,
None => break,
};
if l_bp < min_bp {
break;
}
self.advance();
let rhs = self.parse_expr(r_bp)?;
lhs = Expr::Infix {
op,
left: Box::new(lhs),
right: Box::new(rhs),
};
}
Ok(lhs)
}
}
/// Evaluate expression
pub fn eval(expr: &Expr) -> f64 {
match expr {
Expr::Number(n) => *n,
Expr::Ident(_) => 0.0, // Variables not supported
Expr::Prefix { op, expr } => {
let v = eval(expr);
match op {
'-' => -v,
'+' => v,
_ => v,
}
}
Expr::Infix { op, left, right } => {
let l = eval(left);
let r = eval(right);
match op {
'+' => l + r,
'-' => l - r,
'*' => l * r,
'/' => l / r,
'^' => l.powf(r),
_ => 0.0,
}
}
}
}
pub fn calculate(input: &str) -> Result<f64, String> {
let mut parser = Parser::new(input);
let expr = parser.parse_expr(0)?;
Ok(eval(&expr))
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_precedence() {
assert_eq!(calculate("1 + 2 * 3").unwrap(), 7.0);
assert_eq!(calculate("2 * 3 + 1").unwrap(), 7.0);
}
#[test]
fn test_power_right_assoc() {
assert_eq!(calculate("2 ^ 3 ^ 2").unwrap(), 512.0); // 2^(3^2) = 2^9
}
#[test]
fn test_unary() {
assert_eq!(calculate("-5").unwrap(), -5.0);
assert_eq!(calculate("--5").unwrap(), 5.0);
}
#[test]
fn test_parens() {
assert_eq!(calculate("(1 + 2) * 3").unwrap(), 9.0);
}
#[test]
fn test_complex() {
assert_eq!(calculate("2 ^ 2 + 3 * 4").unwrap(), 16.0);
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_precedence() {
assert_eq!(calculate("1 + 2 * 3").unwrap(), 7.0);
assert_eq!(calculate("2 * 3 + 1").unwrap(), 7.0);
}
#[test]
fn test_power_right_assoc() {
assert_eq!(calculate("2 ^ 3 ^ 2").unwrap(), 512.0); // 2^(3^2) = 2^9
}
#[test]
fn test_unary() {
assert_eq!(calculate("-5").unwrap(), -5.0);
assert_eq!(calculate("--5").unwrap(), 5.0);
}
#[test]
fn test_parens() {
assert_eq!(calculate("(1 + 2) * 3").unwrap(), 9.0);
}
#[test]
fn test_complex() {
assert_eq!(calculate("2 ^ 2 + 3 * 4").unwrap(), 16.0);
}
}
Deep Comparison
OCaml vs Rust: Pratt Parser
Binding Power
Rust
fn infix_binding_power(op: char) -> Option<(u8, u8)> {
match op {
'+' | '-' => Some((1, 2)), // left associative
'*' | '/' => Some((3, 4)), // left associative
'^' => Some((6, 5)), // right associative
_ => None,
}
}
OCaml
let infix_binding_power = function
| '+' | '-' -> Some (1, 2)
| '*' | '/' -> Some (3, 4)
| '^' -> Some (6, 5)
| _ -> None
Core Loop
Rust
pub fn parse_expr(&mut self, min_bp: u8) -> Result<Expr, String> {
let mut lhs = self.parse_prefix()?;
loop {
let (l_bp, r_bp) = match infix_binding_power(op) {
Some(bp) => bp,
None => break,
};
if l_bp < min_bp { break; }
self.advance();
let rhs = self.parse_expr(r_bp)?;
lhs = Expr::Infix { op, left: Box::new(lhs), right: Box::new(rhs) };
}
Ok(lhs)
}
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Binding power | Tuple int * int | (u8, u8) |
| Recursion | Tail-recursive | Loop + recursion |
| Box | Not needed | Required for AST |
| Option handling | Pattern match | ? + match |
Exercises
?: operator (lowest precedence, right-associative) using Pratt's technique: treat ? as an infix operator that reads the : and right expression inside its led.f(a, b) by treating ( after an identifier as a postfix operator with high precedence.Postfix { op, expr } AST node for the unary postfix ! (factorial) operator with the highest binding power of all.