415: Token Tree Munching
Tutorial Video
Text description (accessibility)
This video demonstrates the "415: Token Tree Munching" functional Rust example. Difficulty level: Fundamental. Key concepts covered: Functional Programming. Complex macro DSLs need to parse arbitrary syntax that doesn't fit standard repetition patterns. Key difference from OCaml: 1. **Token level**: Rust TT munching operates on raw tokens; OCaml PPX operates on parsed AST nodes, making it more structured.
Tutorial
The Problem
Complex macro DSLs need to parse arbitrary syntax that doesn't fit standard repetition patterns. Token tree (TT) munching processes input one token tree at a time: one arm peels off the first $tt and processes it, recursing with the remainder. This enables parsing heterogeneous sequences, implementing mini-parsers within macros, and supporting complex field definition syntax like field: Type = default. TT munching is the technique behind the bitflags!, clap::arg!, and pest grammar macros.
Understanding TT munching unlocks the ability to write DSL macros that parse natural, human-readable syntax rather than forcing callers into rigid comma-separated patterns.
🎯 Learning Outcomes
$tt per recursive step@internal_name naming conventions mark internal macro armsdefine_config! parses field: Type = default, syntax incrementallyCode Example
// Parse: struct Name { field: Type = default, ... }
macro_rules! define_config {
// Done: emit the struct
(@fields $name:ident {} -> { $($fields:tt)* }) => {
struct $name { $($fields)* }
};
// Munch one field
(@fields $name:ident {
$field:ident : $ty:ty = $default:expr,
$($rest:tt)*
} -> { $($fields:tt)* }) => {
define_config!(@fields $name { $($rest)* } -> {
$($fields)*
$field: $ty,
});
};
// Entry point
(struct $name:ident { $($body:tt)* }) => {
define_config!(@fields $name { $($body)* } -> {});
};
}
define_config!(struct Config {
port: u16 = 8080,
debug: bool = false,
});Key Differences
$($acc:tt)* accumulator arms; OCaml PPX uses mutable buffers or immutable list accumulation in OCaml code.Location.error.OCaml Approach
OCaml's PPX approach to DSL parsing is more direct: it operates on the already-parsed OCaml AST. A PPX extension receives a Parsetree.structure (sequence of items) and transforms it. For custom syntax, OCaml uses Menhir parser generators or angstrom combinator parsers at runtime. True DSL parsing during OCaml compilation requires camlp5 or ppx extensions, which are more powerful than macro_rules! TT munching but require more infrastructure.
Full Source
#![allow(clippy::all)]
//! Token Tree Munching
//!
//! Parsing complex syntax by consuming token trees one at a time.
/// Parse a struct definition DSL with defaults.
#[macro_export]
macro_rules! define_config {
// Done: emit the struct
(@fields $name:ident {} -> { $($fields:tt)* } defaults: { $($defaults:tt)* }) => {
#[derive(Debug, Clone)]
pub struct $name {
$($fields)*
}
impl Default for $name {
fn default() -> Self {
$name {
$($defaults)*
}
}
}
};
// Munch one field: name: Type = default,
(@fields $name:ident {
$field:ident : $ty:ty = $default:expr,
$($rest:tt)*
} -> { $($fields:tt)* } defaults: { $($defaults:tt)* }) => {
define_config!(@fields $name { $($rest)* } -> {
$($fields)*
pub $field: $ty,
} defaults: {
$($defaults)*
$field: $default,
});
};
// Entry point
(struct $name:ident { $($body:tt)* }) => {
define_config!(@fields $name { $($body)* } -> {} defaults: {});
};
}
/// Simple calculator DSL.
#[macro_export]
macro_rules! calc {
// Base: single literal
($n:literal) => { $n };
// Addition
($a:literal + $($rest:tt)+) => {
$a + calc!($($rest)+)
};
// Subtraction
($a:literal - $($rest:tt)+) => {
$a - calc!($($rest)+)
};
// Multiplication (simple case)
($a:literal * $b:literal) => {
$a * $b
};
}
/// Parse key=value pairs into a HashMap.
#[macro_export]
macro_rules! parse_pairs {
// Done
(@acc $map:ident) => {};
// Munch one pair
(@acc $map:ident $key:ident = $val:expr; $($rest:tt)*) => {
$map.insert(stringify!($key).to_string(), $val.to_string());
parse_pairs!(@acc $map $($rest)*);
};
// Entry
($($key:ident = $val:expr;)*) => {{
let mut map = ::std::collections::HashMap::new();
parse_pairs!(@acc map $($key = $val;)*);
map
}};
}
/// Process command-like DSL.
#[macro_export]
macro_rules! commands {
// Done
(@acc $results:ident) => {};
// set command
(@acc $results:ident set $var:ident = $val:expr; $($rest:tt)*) => {
$results.push(format!("SET {} = {}", stringify!($var), $val));
commands!(@acc $results $($rest)*);
};
// get command
(@acc $results:ident get $var:ident; $($rest:tt)*) => {
$results.push(format!("GET {}", stringify!($var)));
commands!(@acc $results $($rest)*);
};
// delete command
(@acc $results:ident delete $var:ident; $($rest:tt)*) => {
$results.push(format!("DELETE {}", stringify!($var)));
commands!(@acc $results $($rest)*);
};
// Entry
($($cmd:tt)*) => {{
let mut results = Vec::new();
commands!(@acc results $($cmd)*);
results
}};
}
/// Count specific tokens.
#[macro_export]
macro_rules! count_tokens {
// Done
(@counting $count:expr,) => { $count };
// Skip commas, count everything else
(@counting $count:expr, , $($rest:tt)*) => {
count_tokens!(@counting $count, $($rest)*)
};
(@counting $count:expr, $head:tt $($rest:tt)*) => {
count_tokens!(@counting $count + 1, $($rest)*)
};
// Entry
($($tokens:tt)*) => {
count_tokens!(@counting 0usize, $($tokens)*)
};
}
define_config!(
struct ServerConfig {
host: String = "localhost".to_string(),
port: u16 = 8080,
timeout_secs: u32 = 30,
}
);
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_define_config() {
let cfg = ServerConfig::default();
assert_eq!(cfg.host, "localhost");
assert_eq!(cfg.port, 8080);
assert_eq!(cfg.timeout_secs, 30);
}
#[test]
fn test_define_config_override() {
let cfg = ServerConfig {
port: 9090,
..Default::default()
};
assert_eq!(cfg.port, 9090);
assert_eq!(cfg.host, "localhost");
}
#[test]
fn test_calc_add() {
assert_eq!(calc!(2 + 3), 5);
}
#[test]
fn test_calc_sub() {
assert_eq!(calc!(10 - 4), 6);
}
#[test]
fn test_calc_mul() {
assert_eq!(calc!(3 * 4), 12);
}
#[test]
fn test_parse_pairs() {
let pairs = parse_pairs! {
name = "Alice";
age = 30;
city = "NYC";
};
assert_eq!(pairs["name"], "Alice");
assert_eq!(pairs["age"], "30");
assert_eq!(pairs["city"], "NYC");
}
#[test]
fn test_commands() {
let cmds = commands! {
set x = 10;
get y;
delete z;
};
assert_eq!(cmds.len(), 3);
assert!(cmds[0].contains("SET x"));
assert!(cmds[1].contains("GET y"));
assert!(cmds[2].contains("DELETE z"));
}
#[test]
fn test_count_tokens() {
assert_eq!(count_tokens!(a b c), 3);
assert_eq!(count_tokens!(1 + 2 + 3), 5);
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_define_config() {
let cfg = ServerConfig::default();
assert_eq!(cfg.host, "localhost");
assert_eq!(cfg.port, 8080);
assert_eq!(cfg.timeout_secs, 30);
}
#[test]
fn test_define_config_override() {
let cfg = ServerConfig {
port: 9090,
..Default::default()
};
assert_eq!(cfg.port, 9090);
assert_eq!(cfg.host, "localhost");
}
#[test]
fn test_calc_add() {
assert_eq!(calc!(2 + 3), 5);
}
#[test]
fn test_calc_sub() {
assert_eq!(calc!(10 - 4), 6);
}
#[test]
fn test_calc_mul() {
assert_eq!(calc!(3 * 4), 12);
}
#[test]
fn test_parse_pairs() {
let pairs = parse_pairs! {
name = "Alice";
age = 30;
city = "NYC";
};
assert_eq!(pairs["name"], "Alice");
assert_eq!(pairs["age"], "30");
assert_eq!(pairs["city"], "NYC");
}
#[test]
fn test_commands() {
let cmds = commands! {
set x = 10;
get y;
delete z;
};
assert_eq!(cmds.len(), 3);
assert!(cmds[0].contains("SET x"));
assert!(cmds[1].contains("GET y"));
assert!(cmds[2].contains("DELETE z"));
}
#[test]
fn test_count_tokens() {
assert_eq!(count_tokens!(a b c), 3);
assert_eq!(count_tokens!(1 + 2 + 3), 5);
}
}
Deep Comparison
OCaml vs Rust: Token Tree Munching
What is TT Munching?
Token Tree Munching (TTM) is a macro technique that:
It's the macro equivalent of recursive descent parsing.
Example: Parsing a DSL
// Parse: struct Name { field: Type = default, ... }
macro_rules! define_config {
// Done: emit the struct
(@fields $name:ident {} -> { $($fields:tt)* }) => {
struct $name { $($fields)* }
};
// Munch one field
(@fields $name:ident {
$field:ident : $ty:ty = $default:expr,
$($rest:tt)*
} -> { $($fields:tt)* }) => {
define_config!(@fields $name { $($rest)* } -> {
$($fields)*
$field: $ty,
});
};
// Entry point
(struct $name:ident { $($body:tt)* }) => {
define_config!(@fields $name { $($body)* } -> {});
};
}
define_config!(struct Config {
port: u16 = 8080,
debug: bool = false,
});
The Pattern
macro_rules! muncher {
// Base case: input exhausted
(@internal $acc:tt) => { /* emit result */ };
// Recursive case: consume one token, accumulate, recurse
(@internal $acc:tt $head:tt $($tail:tt)*) => {
muncher!(@internal (/* new acc with $head */) $($tail)*)
};
// Entry point
($($input:tt)*) => {
muncher!(@internal () $($input)*)
};
}
OCaml Equivalent
OCaml doesn't have macros, but parser combinators achieve similar goals:
(* Using parser combinators *)
let rec parse_fields = function
| [] -> []
| Field(name, ty, default) :: rest ->
(name, ty, default) :: parse_fields rest
When to Use TTM
5 Takeaways
Consume tokens left-to-right, accumulate results.
@internal prefix for helper rules.**Keeps the public API clean.
$($rest:tt)* captures remaining tokens.**The "rest of input" to process recursively.
Build up output as you consume input.
Simple $(...)* repetition can't handle context-sensitive syntax.
Exercises
define_enum!(Status { Active => "active", Inactive => "inactive" }) that generates an enum and a fn as_str(&self) -> &str method using TT munching to parse each variant-to-string mapping.build_struct!(Point { x: f64 required, y: f64 required, label: String optional = "".to_string() }) where required fields must be set and optional fields have defaults.state_machine!(start: Idle { on(Event::Start) => Running }, Running { on(Event::Stop) => Idle }) using TT munching to generate a state enum and transition method.