Keyword Parser
Functional Programming
Tutorial
The Problem
Keywords like if, else, while, and return must be distinguished from identifiers. A naive string parser would match "if" at the start of "ifstream" — failing to recognize that "ifstream" is an identifier containing the prefix "if". Keyword parsers enforce word boundary checking: the keyword must not be followed by an identifier character. This is the difference between lexer-level and parser-level thinking in grammar design.
🎯 Learning Outcomes
tag("if") alone is insufficient for keywordskeyword as tag followed by a negative lookahead (no identifier character)Code Example
fn keyword<'a>(kw: &str) -> Parser<'a, &'a str> {
let kw_owned = kw.to_string();
Box::new(move |input: &'a str| {
if !input.starts_with(&kw_owned) {
return Err(format!("Expected \"{}\"", kw_owned));
}
let rest = &input[kw_owned.len()..];
match rest.chars().next() {
Some(c) if is_ident_char(c) => Err(format!("not a complete keyword")),
_ => Ok((&input[..kw_owned.len()], rest)),
}
})
}Key Differences
peek_char.HashSet and check identifiers against it; this is an alternative to individual keyword parsers.OCaml Approach
In OCaml's angstrom, the common pattern is:
let keyword kw =
string kw *> peek_char >>= function
| Some c when is_ident_char c -> fail ("expected end of keyword '" ^ kw ^ "'")
| _ -> return kw
peek_char looks at the next character without consuming it, providing the lookahead. This is equivalent to Rust's approach and equally necessary for correct keyword parsing.
Full Source
#![allow(clippy::all)]
// Example 165: Keyword Parser
// Keywords with word boundary checking
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
fn is_ident_char(c: char) -> bool {
c.is_ascii_alphanumeric() || c == '_'
}
// ============================================================
// Approach 1: keyword — match string + word boundary
// ============================================================
fn keyword<'a>(kw: &str) -> Parser<'a, &'a str> {
let kw_owned = kw.to_string();
Box::new(move |input: &'a str| {
if !input.starts_with(&kw_owned) {
return Err(format!("Expected \"{}\"", kw_owned));
}
let rest = &input[kw_owned.len()..];
match rest.chars().next() {
Some(c) if is_ident_char(c) => Err(format!(
"\"{}\" not a complete keyword (followed by '{}')",
kw_owned, c
)),
_ => Ok((&input[..kw_owned.len()], rest)),
}
})
}
// ============================================================
// Approach 2: keyword mapped to a token type
// ============================================================
#[derive(Debug, Clone, PartialEq)]
enum Token {
If,
Then,
Else,
Let,
In,
Fn,
}
fn keyword_token<'a>(kw: &str, tok: Token) -> Parser<'a, Token> {
let kw_owned = kw.to_string();
Box::new(move |input: &'a str| {
if !input.starts_with(&kw_owned) {
return Err(format!("Expected \"{}\"", kw_owned));
}
let rest = &input[kw_owned.len()..];
match rest.chars().next() {
Some(c) if is_ident_char(c) => Err(format!("\"{}\" not a complete keyword", kw_owned)),
_ => Ok((tok.clone(), rest)),
}
})
}
// ============================================================
// Approach 3: any_keyword — try multiple, longest first
// ============================================================
fn any_keyword<'a>(keywords: Vec<(&str, Token)>) -> Parser<'a, Token> {
let mut kws: Vec<(String, Token)> = keywords
.into_iter()
.map(|(s, t)| (s.to_string(), t))
.collect();
// Sort longest first to avoid prefix ambiguity
kws.sort_by(|a, b| b.0.len().cmp(&a.0.len()));
Box::new(move |input: &'a str| {
for (kw, tok) in &kws {
if input.starts_with(kw.as_str()) {
let rest = &input[kw.len()..];
match rest.chars().next() {
Some(c) if is_ident_char(c) => continue,
_ => return Ok((tok.clone(), rest)),
}
}
}
Err("Expected keyword".to_string())
})
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_keyword_match() {
assert_eq!(keyword("if")("if x"), Ok(("if", " x")));
}
#[test]
fn test_keyword_boundary() {
assert!(keyword("if")("iffy").is_err());
}
#[test]
fn test_keyword_paren() {
assert_eq!(keyword("if")("if("), Ok(("if", "(")));
}
#[test]
fn test_keyword_eof() {
assert_eq!(keyword("if")("if"), Ok(("if", "")));
}
#[test]
fn test_keyword_token() {
assert_eq!(
keyword_token("let", Token::Let)("let x"),
Ok((Token::Let, " x"))
);
}
#[test]
fn test_any_keyword() {
let p = any_keyword(vec![
("if", Token::If),
("in", Token::In),
("let", Token::Let),
]);
assert_eq!(p("let x"), Ok((Token::Let, " x")));
assert_eq!(p("in "), Ok((Token::In, " ")));
}
#[test]
fn test_any_keyword_none() {
let p = any_keyword(vec![("if", Token::If)]);
assert!(p("hello").is_err());
}
#[test]
fn test_keyword_no_match() {
assert!(keyword("else")("elseif").is_err());
}
}
✓ Tests
Rust test suite
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_keyword_match() {
assert_eq!(keyword("if")("if x"), Ok(("if", " x")));
}
#[test]
fn test_keyword_boundary() {
assert!(keyword("if")("iffy").is_err());
}
#[test]
fn test_keyword_paren() {
assert_eq!(keyword("if")("if("), Ok(("if", "(")));
}
#[test]
fn test_keyword_eof() {
assert_eq!(keyword("if")("if"), Ok(("if", "")));
}
#[test]
fn test_keyword_token() {
assert_eq!(
keyword_token("let", Token::Let)("let x"),
Ok((Token::Let, " x"))
);
}
#[test]
fn test_any_keyword() {
let p = any_keyword(vec![
("if", Token::If),
("in", Token::In),
("let", Token::Let),
]);
assert_eq!(p("let x"), Ok((Token::Let, " x")));
assert_eq!(p("in "), Ok((Token::In, " ")));
}
#[test]
fn test_any_keyword_none() {
let p = any_keyword(vec![("if", Token::If)]);
assert!(p("hello").is_err());
}
#[test]
fn test_keyword_no_match() {
assert!(keyword("else")("elseif").is_err());
}
}
Deep Comparison
Comparison: Example 165 — Keyword Parser
keyword with boundary
OCaml:
let keyword (kw : string) : string parser = fun input ->
match tag kw input with
| Error e -> Error e
| Ok (matched, rest) ->
if String.length rest > 0 && is_ident_char rest.[0] then
Error (Printf.sprintf "\"%s\" is not a complete keyword" kw)
else Ok (matched, rest)
Rust:
fn keyword<'a>(kw: &str) -> Parser<'a, &'a str> {
let kw_owned = kw.to_string();
Box::new(move |input: &'a str| {
if !input.starts_with(&kw_owned) {
return Err(format!("Expected \"{}\"", kw_owned));
}
let rest = &input[kw_owned.len()..];
match rest.chars().next() {
Some(c) if is_ident_char(c) => Err(format!("not a complete keyword")),
_ => Ok((&input[..kw_owned.len()], rest)),
}
})
}
Token enum
OCaml:
type token = If | Then | Else | Let | In | Fun
Rust:
#[derive(Debug, Clone, PartialEq)]
enum Token { If, Then, Else, Let, In, Fn }
Exercises
keywords combinator that tries a list of keywords in order: keywords(["if", "else", "while"])."for" (keyword) from "formula" (identifier) and test both.reserved_word(words: &[&str]) -> impl Fn(&str) -> bool as a fast lookup for keyword exclusion in identifier parsing.