Whitespace Parser
The Problem
Most text formats are whitespace-insensitive: {"key": "value"} and { "key" : "value" } are equivalent JSON. Parsers must skip whitespace between tokens without interfering with token recognition. ws0 (zero or more whitespace characters) and ws1 (one or more) are standard utilities. Wrapping content parsers with ws_wrap allows callers to ignore whitespace concerns entirely, keeping individual token parsers clean and focused.
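As a minimal sketch of that idea, the block below wraps a literal-matching parser with ws_wrap and reads the five tokens of a one-pair object. It assumes the Parser type alias used later in this tutorial; tag is simplified here to take a &'static str, and tokens is a hypothetical helper introduced only for illustration.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Match a literal string at the start of the input.
fn tag<'a>(expected: &'static str) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| {
        if input.starts_with(expected) {
            Ok((&input[..expected.len()], &input[expected.len()..]))
        } else {
            Err(format!("Expected \"{}\"", expected))
        }
    })
}

/// Run `parser` with optional whitespace skipped on both sides.
fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    Box::new(move |input: &'a str| {
        let (value, rest) = parser(input.trim_start())?;
        Ok((value, rest.trim_start()))
    })
}

/// Hypothetical helper: read the five tokens of a one-pair object.
/// The token parsers never mention whitespace; ws_wrap handles it.
fn tokens<'a>(mut input: &'a str) -> Result<Vec<&'a str>, String> {
    let mut out = Vec::new();
    for lit in ["{", "\"key\"", ":", "\"value\"", "}"] {
        let (tok, rest) = ws_wrap(tag(lit))(input)?;
        out.push(tok);
        input = rest;
    }
    Ok(out)
}
```

With this sketch, tokens yields the same token list for the compact and the spaced-out spelling of the object, because whitespace is consumed only between tokens and never inside them.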
🎯 Learning Outcomes
ws0, ws1, and ws_wrap as standard whitespace-handling utilities
Code Example
fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        let trimmed = input.trim_start();
        Ok(((), trimmed))
    })
}
Key Differences
1. OCaml's skip_while skips bytes without constructing intermediate values; a Rust many0(satisfy(...)) builds a Vec<char> only to discard it.
2. Rust can call input.trim_start() directly, or scan with str::find(|c: char| !c.is_whitespace()), bypassing the combinator overhead.
3. Neither ws0 tracks line numbers; adding line/column tracking requires threading a position state through the parser.
4. Real-world parsers often extend ws0 to also skip comments; the typical approach is many0(choice([whitespace, line_comment, block_comment])).
OCaml Approach
Angstrom provides skip_while : (char -> bool) -> unit t, which skips a whole run of matching characters in one step instead of parsing them one at a time:
let ws = skip_while (fun c -> c = ' ' || c = '\t' || c = '\n' || c = '\r')
let ws_wrap p = ws *> p <* ws
OCaml's skip_while advances through the buffer without building an intermediate list of characters, which makes whitespace skipping cheaper than many0(satisfy(...)).
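A rough Rust counterpart can be sketched with str::find, which returns the byte offset of the first character that fails the predicate. The skip_while and ws names below are hypothetical and only mirror the Angstrom names; this is not an existing Rust library API.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Skip the longest prefix whose characters all satisfy `pred`.
/// Always succeeds and allocates nothing: it only computes an offset.
fn skip_while<'a, F>(pred: F) -> Parser<'a, ()>
where
    F: Fn(char) -> bool + 'a,
{
    Box::new(move |input: &'a str| {
        // Byte offset of the first char that fails `pred`;
        // if every char matches, the whole input is skipped.
        let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
        Ok(((), &input[end..]))
    })
}

/// Same character set as the OCaml `ws` above.
fn ws<'a>() -> Parser<'a, ()> {
    skip_while(|c| matches!(c, ' ' | '\t' | '\n' | '\r'))
}
```

Unlike trim_start, which follows the Unicode White_Space property, this version skips exactly the four characters listed, which may be what a strict grammar such as JSON calls for.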
Full Source
#![allow(clippy::all)]
// Example 163: Whitespace Parser
// Parse and skip whitespace: ws0, ws1, ws_wrap
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
// ============================================================
// Approach 1: ws0 — skip zero or more whitespace (always succeeds)
// ============================================================
fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        let trimmed = input.trim_start();
        Ok(((), trimmed))
    })
}
// ============================================================
// Approach 2: ws1 — require at least one whitespace
// ============================================================
fn ws1<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| match input.chars().next() {
        // Same Unicode-aware test that trim_start applies, so ws1 accepts
        // exactly the characters it then skips.
        Some(c) if c.is_whitespace() => {
            let trimmed = input.trim_start();
            Ok(((), trimmed))
        }
        _ => Err("Expected whitespace".to_string()),
    })
}
// ============================================================
// Approach 3: ws_wrap — parse with surrounding whitespace
// ============================================================
fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    Box::new(move |input: &'a str| {
        let trimmed = input.trim_start();
        let (value, rest) = parser(trimmed)?;
        let trimmed_rest = rest.trim_start();
        Ok((value, trimmed_rest))
    })
}
/// Line comment: skip from '#' to end of line
fn line_comment<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        if input.starts_with('#') {
            match input.find('\n') {
                Some(pos) => Ok(((), &input[pos..])),
                None => Ok(((), "")),
            }
        } else {
            Err("Expected '#'".to_string())
        }
    })
}
fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
    let exp = expected.to_string();
    Box::new(move |input: &'a str| {
        if input.starts_with(&exp) {
            Ok((&input[..exp.len()], &input[exp.len()..]))
        } else {
            Err(format!("Expected \"{}\"", exp))
        }
    })
}
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_ws0_spaces() {
        assert_eq!(ws0()(" hello"), Ok(((), "hello")));
    }

    #[test]
    fn test_ws0_no_spaces() {
        assert_eq!(ws0()("hello"), Ok(((), "hello")));
    }

    #[test]
    fn test_ws0_empty() {
        assert_eq!(ws0()(""), Ok(((), "")));
    }

    #[test]
    fn test_ws0_tabs_newlines() {
        assert_eq!(ws0()("\t\n x"), Ok(((), "x")));
    }

    #[test]
    fn test_ws1_success() {
        assert_eq!(ws1()(" hello"), Ok(((), "hello")));
    }

    #[test]
    fn test_ws1_fail() {
        assert!(ws1()("hello").is_err());
    }

    #[test]
    fn test_ws_wrap() {
        let p = ws_wrap(tag("hello"));
        assert_eq!(p(" hello rest"), Ok(("hello", "rest")));
    }

    #[test]
    fn test_ws_wrap_no_spaces() {
        let p = ws_wrap(tag("hello"));
        assert_eq!(p("hello"), Ok(("hello", "")));
    }

    #[test]
    fn test_line_comment() {
        assert_eq!(line_comment()("# comment\ncode"), Ok(((), "\ncode")));
    }

    #[test]
    fn test_line_comment_eof() {
        assert_eq!(line_comment()("# comment"), Ok(((), "")));
    }

    #[test]
    fn test_line_comment_not_hash() {
        assert!(line_comment()("code").is_err());
    }
}
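The listing above defines line_comment but does not combine it with ws0. One possible combination, sketched below under the assumption that comments start with '#' as in line_comment, alternates between trimming whitespace and dropping a comment until neither applies. The name ws0_comments is hypothetical.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Skip any mix of whitespace and '#' line comments.
/// Always succeeds, like ws0.
fn ws0_comments<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        let mut rest = input.trim_start();
        // A comment runs from '#' to the end of the line (or of the input).
        // Each pass consumes at least the '#', so the loop terminates.
        while rest.starts_with('#') {
            let end = rest.find('\n').unwrap_or(rest.len());
            rest = rest[end..].trim_start();
        }
        Ok(((), rest))
    })
}
```

A '#' that appears after the first real token is left alone, so a trailing comment is only skipped the next time the parser is applied.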
Deep Comparison
Comparison: Example 163 — Whitespace Parser
ws0
OCaml:
let ws0 : unit parser = fun input ->
  match many0 (satisfy is_ws "whitespace") input with
  | Ok (_, rest) -> Ok ((), rest)
  | Error e -> Error e
Rust:
fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| {
        let trimmed = input.trim_start();
        Ok(((), trimmed))
    })
}
ws_wrap
OCaml:
let ws_wrap (p : 'a parser) : 'a parser = fun input ->
  match ws0 input with
  | Ok ((), r1) ->
    (match p r1 with
     | Ok (v, r2) ->
       (match ws0 r2 with
        | Ok ((), r3) -> Ok (v, r3)
        | Error e -> Error e)
     | Error e -> Error e)
  | Error e -> Error e
Rust:
fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    Box::new(move |input: &'a str| {
        let trimmed = input.trim_start();
        let (value, rest) = parser(trimmed)?;
        let trimmed_rest = rest.trim_start();
        Ok((value, trimmed_rest))
    })
}
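The Rust ws_wrap above inlines trim_start, whereas the OCaml version calls ws0 on each side. For a closer side-by-side reading, the sketch below composes ws0 in the same three steps, with the ? operator standing in for the nested Error branches. ws_wrap_composed is a hypothetical name, and tag is simplified to take a &'static str.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| Ok(((), input.trim_start())))
}

/// Match a literal string at the start of the input.
fn tag<'a>(expected: &'static str) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| {
        if input.starts_with(expected) {
            Ok((&input[..expected.len()], &input[expected.len()..]))
        } else {
            Err(format!("Expected \"{}\"", expected))
        }
    })
}

/// Hypothetical variant of ws_wrap that calls ws0 on both sides,
/// following the shape of the OCaml version.
fn ws_wrap_composed<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    let before = ws0();
    let after = ws0();
    Box::new(move |input: &'a str| {
        let ((), r1) = before(input)?; // leading whitespace
        let (value, r2) = parser(r1)?; // the wrapped parser
        let ((), r3) = after(r2)?; // trailing whitespace
        Ok((value, r3))
    })
}
```

Each ? returns early on Err, which is what the three "| Error e -> Error e" arms do by hand in the OCaml code.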
Exercises
1. Extend ws0 to also skip line comments: // ... until end of line.
2. Implement ws_between(open: Parser<A>, sep: Parser<B>, close: Parser<C>) -> Parser<Vec<B>> that handles whitespace around separators.
3. Write a lexeme(p: Parser<T>) -> Parser<T> combinator that skips whitespace after p (a common pattern in language parsers).