163 Advanced

Whitespace Parser

Functional Programming

Tutorial

The Problem

Many text formats ignore whitespace between tokens: {"key": "value"} and { "key" : "value" } are equivalent JSON. A parser must therefore skip whitespace between tokens without interfering with token recognition. Two standard utilities handle this: ws0 skips zero or more whitespace characters, and ws1 requires at least one. Wrapping content parsers with ws_wrap lets callers ignore whitespace entirely, keeping individual token parsers clean and focused.
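To make the JSON claim concrete, here is a minimal sketch that parses a single `{ "k": "v" }` pair by wrapping every token in ws_wrap. The `quoted` and `pair` helpers are hypothetical (no escape handling, one pair only), and `tag`/`ws_wrap` are re-declared with explicit closure return types so the block compiles on its own.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn tag<'a>(expected: &'static str) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        if input.starts_with(expected) {
            Ok((&input[..expected.len()], &input[expected.len()..]))
        } else {
            Err(format!("Expected \"{}\"", expected))
        }
    })
}

// Skip whitespace before and after the inner parser.
fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
    Box::new(move |input: &'a str| -> ParseResult<'a, T> {
        let (value, rest) = parser(input.trim_start())?;
        Ok((value, rest.trim_start()))
    })
}

// Hypothetical: a double-quoted string with no escape sequences.
fn quoted<'a>() -> Parser<'a, &'a str> {
    Box::new(|input: &'a str| -> ParseResult<'a, &'a str> {
        let body = input.strip_prefix('"').ok_or("Expected '\"'")?;
        let end = body.find('"').ok_or("Unterminated string")?;
        Ok((&body[..end], &body[end + 1..]))
    })
}

// Hypothetical: `{ "key" : "value" }`. No token parser mentions whitespace;
// ws_wrap handles it for every token.
fn pair<'a>() -> Parser<'a, (&'a str, &'a str)> {
    let open = ws_wrap(tag("{"));
    let key = ws_wrap(quoted());
    let colon = ws_wrap(tag(":"));
    let value = ws_wrap(quoted());
    let close = ws_wrap(tag("}"));
    Box::new(move |input: &'a str| -> ParseResult<'a, (&'a str, &'a str)> {
        let (_, rest) = open(input)?;
        let (k, rest) = key(rest)?;
        let (_, rest) = colon(rest)?;
        let (v, rest) = value(rest)?;
        let (_, rest) = close(rest)?;
        Ok(((k, v), rest))
    })
}

fn main() {
    // Both spellings from the text yield the same pair.
    assert_eq!(pair()(r#"{"key": "value"}"#), Ok((("key", "value"), "")));
    assert_eq!(pair()(r#"{ "key" : "value" }"#), Ok((("key", "value"), "")));
}
```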

🎯 Learning Outcomes

  • Implement ws0, ws1, and ws_wrap as standard whitespace-handling utilities
  • Understand why ws0 always succeeds (matching zero whitespace is valid) while ws1 can fail
  • Learn the "wrap with whitespace" pattern for building whitespace-insensitive parsers
  • See how comment skipping extends whitespace handling for real languages

Code Example

    fn ws0<'a>() -> Parser<'a, ()> {
        Box::new(|input: &'a str| {
            let trimmed = input.trim_start();
            Ok(((), trimmed))
        })
    }
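The ws0 above returns only the remaining input. The line-counting note under Key Differences says that position tracking means threading extra state through the parser. A minimal sketch of that idea, written as a plain function rather than a Parser (the `Pos` struct and `ws0_pos` are hypothetical names), could look like this:

```rust
// Hypothetical position type: 1-based line and column.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pos {
    line: usize,
    col: usize,
}

// Skip leading whitespace like ws0, and also return the updated position.
// Columns count chars, not bytes or grapheme clusters.
fn ws0_pos(input: &str, mut pos: Pos) -> (&str, Pos) {
    for (i, c) in input.char_indices() {
        if !c.is_whitespace() {
            return (&input[i..], pos);
        }
        if c == '\n' {
            pos.line += 1;
            pos.col = 1;
        } else {
            pos.col += 1;
        }
    }
    // Input was empty or entirely whitespace.
    (&input[input.len()..], pos)
}

fn main() {
    let start = Pos { line: 1, col: 1 };
    let (rest, pos) = ws0_pos("  \n\t x", start);
    assert_eq!(rest, "x");
    assert_eq!(pos, Pos { line: 2, col: 3 });
}
```

Folding this into the Parser type would mean changing the alias so that each parser takes and returns a `(&str, Pos)` pair instead of a bare `&str`.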

    Key Differences

  • Efficiency: OCaml's skip_while advances past matching characters without building a collection; a Rust ws0 written as many0(satisfy(...)) would build a Vec<char> only to discard it.
  • Optimization: Calling input.trim_start() directly, as this example does, or scanning with str::find(|c: char| !c.is_whitespace()) bypasses the combinator overhead; a production Rust parser would typically do one of these.
  • Line counting: Neither basic ws0 tracks line numbers; adding line/column tracking requires threading a position state through the parser.
  • Comment handling: Both can extend ws0 to also skip comments; the typical approach is many0(choice([whitespace, line_comment, block_comment])).
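The comment-handling bullet describes many0(choice([whitespace, line_comment, block_comment])). A minimal sketch of the same behaviour as a single loop is below; `#` line comments match the tutorial's line_comment, while the `/* ... */` block syntax (non-nesting) is an assumption chosen for illustration, and `ws0_comments` is a hypothetical name.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Skip any mix of whitespace, `#` line comments and (assumed) `/* */` block
// comments. Like ws0 it succeeds on zero matches; it fails only when a
// block comment is never closed.
fn ws0_comments<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| -> ParseResult<'a, ()> {
        let mut rest = input;
        loop {
            let before = rest.len();
            rest = rest.trim_start();
            if rest.starts_with('#') {
                // Drop up to (not including) '\n'; the next pass's
                // trim_start consumes the newline itself.
                rest = match rest.find('\n') {
                    Some(pos) => &rest[pos..],
                    None => "",
                };
            } else if let Some(body) = rest.strip_prefix("/*") {
                // Non-nesting: the comment ends at the first "*/".
                match body.find("*/") {
                    Some(end) => rest = &body[end + 2..],
                    None => return Err("Unterminated block comment".to_string()),
                }
            }
            // Stop once a full pass consumed nothing.
            if rest.len() == before {
                return Ok(((), rest));
            }
        }
    })
}

fn main() {
    let ws = ws0_comments();
    assert_eq!(ws("  # note\n /* block */ x"), Ok(((), "x")));
    assert_eq!(ws("x"), Ok(((), "x")));
    assert!(ws("/* never closed").is_err());
}
```

Looping until a pass makes no progress plays the role of many0, and the if/else-if chain plays the role of choice.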
    OCaml Approach

    Angstrom provides skip_while : (char -> bool) -> unit t, which skips matching characters without allocating a list of them:

    open Angstrom

    let ws = skip_while (fun c -> c = ' ' || c = '\t' || c = '\n' || c = '\r')
    let ws_wrap p = ws *> p <* ws
    

    skip_while still tests each character with the predicate, but it builds no intermediate list, which makes it cheaper than a many0 (satisfy ...) loop that collects characters and then discards them.
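For a closer Rust counterpart to Angstrom's skip_while, here is a sketch of a `skip_while` combinator (a hypothetical helper, not part of the tutorial's source) that advances with str::find and returns a slice, so nothing is allocated. The `ws` built from it uses the same four ASCII characters as the OCaml snippet, which is narrower than str::trim_start: trim_start also skips other Unicode whitespace such as U+00A0 NO-BREAK SPACE.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Advance past every leading char satisfying `pred`. Always succeeds.
fn skip_while<'a, F>(pred: F) -> Parser<'a, ()>
where
    F: Fn(char) -> bool + 'a,
{
    Box::new(move |input: &'a str| -> ParseResult<'a, ()> {
        // Byte offset of the first char that fails `pred`, or the end.
        let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
        Ok(((), &input[end..]))
    })
}

// Same character set as the OCaml `ws`: space, tab, LF, CR.
fn ws<'a>() -> Parser<'a, ()> {
    skip_while(|c| matches!(c, ' ' | '\t' | '\n' | '\r'))
}

fn main() {
    assert_eq!(ws()(" \t\r\nx"), Ok(((), "x")));
    // A no-break space is not in the ASCII set, so it is left in place...
    assert_eq!(ws()("\u{a0}x"), Ok(((), "\u{a0}x")));
    // ...whereas trim_start treats it as whitespace.
    assert_eq!("\u{a0}x".trim_start(), "x");
}
```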

    Full Source

    #![allow(clippy::all)]
    // Example 163: Whitespace Parser
    // Parse and skip whitespace: ws0, ws1, ws_wrap
    
    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    
    // ============================================================
    // Approach 1: ws0 — skip zero or more whitespace (always succeeds)
    // ============================================================
    
    fn ws0<'a>() -> Parser<'a, ()> {
        Box::new(|input: &'a str| {
            let trimmed = input.trim_start();
            Ok(((), trimmed))
        })
    }
    
    // ============================================================
    // Approach 2: ws1 — require at least one whitespace
    // ============================================================
    
    fn ws1<'a>() -> Parser<'a, ()> {
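        Box::new(|input: &'a str| match input.chars().next() {
            // char::is_whitespace is the test trim_start uses, so ws1
            // accepts exactly the characters that ws0 would skip.
            Some(c) if c.is_whitespace() => {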
                let trimmed = input.trim_start();
                Ok(((), trimmed))
            }
            _ => Err("Expected whitespace".to_string()),
        })
    }
    
    // ============================================================
    // Approach 3: ws_wrap — parse with surrounding whitespace
    // ============================================================
    
    fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
        Box::new(move |input: &'a str| {
            let trimmed = input.trim_start();
            let (value, rest) = parser(trimmed)?;
            let trimmed_rest = rest.trim_start();
            Ok((value, trimmed_rest))
        })
    }
    
    /// Line comment: skip from '#' to end of line
    fn line_comment<'a>() -> Parser<'a, ()> {
        Box::new(|input: &'a str| {
            if input.starts_with('#') {
                match input.find('\n') {
                    Some(pos) => Ok(((), &input[pos..])),
                    None => Ok(((), "")),
                }
            } else {
                Err("Expected '#'".to_string())
            }
        })
    }
    
    fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
        let exp = expected.to_string();
        Box::new(move |input: &'a str| {
            if input.starts_with(&exp) {
                Ok((&input[..exp.len()], &input[exp.len()..]))
            } else {
                Err(format!("Expected \"{}\"", exp))
            }
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_ws0_spaces() {
            assert_eq!(ws0()("  hello"), Ok(((), "hello")));
        }
    
        #[test]
        fn test_ws0_no_spaces() {
            assert_eq!(ws0()("hello"), Ok(((), "hello")));
        }
    
        #[test]
        fn test_ws0_empty() {
            assert_eq!(ws0()(""), Ok(((), "")));
        }
    
        #[test]
        fn test_ws0_tabs_newlines() {
            assert_eq!(ws0()("\t\n  x"), Ok(((), "x")));
        }
    
        #[test]
        fn test_ws1_success() {
            assert_eq!(ws1()("  hello"), Ok(((), "hello")));
        }
    
        #[test]
        fn test_ws1_fail() {
            assert!(ws1()("hello").is_err());
        }
    
        #[test]
        fn test_ws_wrap() {
            let p = ws_wrap(tag("hello"));
            assert_eq!(p("  hello  rest"), Ok(("hello", "rest")));
        }
    
        #[test]
        fn test_ws_wrap_no_spaces() {
            let p = ws_wrap(tag("hello"));
            assert_eq!(p("hello"), Ok(("hello", "")));
        }
    
        #[test]
        fn test_line_comment() {
            assert_eq!(line_comment()("# comment\ncode"), Ok(((), "\ncode")));
        }
    
        #[test]
        fn test_line_comment_eof() {
            assert_eq!(line_comment()("# comment"), Ok(((), "")));
        }
    
        #[test]
        fn test_line_comment_not_hash() {
            assert!(line_comment()("code").is_err());
        }
    }
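ws1 earns its place where whitespace is mandatory, for example between a keyword and a name. The sketch below is a hypothetical `let_binding` parser: with ws1 as the separator, `let x` parses while `letx` is rejected. The `ident` helper (ASCII letters only) is an assumption for illustration, and `ws1` is re-declared to require that trim_start consume something, so it accepts the same characters ws0 skips.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn tag<'a>(expected: &'static str) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        if input.starts_with(expected) {
            Ok((&input[..expected.len()], &input[expected.len()..]))
        } else {
            Err(format!("Expected \"{}\"", expected))
        }
    })
}

// Succeeds only if trim_start removed at least one character.
fn ws1<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| -> ParseResult<'a, ()> {
        let trimmed = input.trim_start();
        if trimmed.len() < input.len() {
            Ok(((), trimmed))
        } else {
            Err("Expected whitespace".to_string())
        }
    })
}

// Hypothetical identifier: one or more ASCII letters.
fn ident<'a>() -> Parser<'a, &'a str> {
    Box::new(|input: &'a str| -> ParseResult<'a, &'a str> {
        let end = input
            .find(|c: char| !c.is_ascii_alphabetic())
            .unwrap_or(input.len());
        if end == 0 {
            Err("Expected identifier".to_string())
        } else {
            Ok((&input[..end], &input[end..]))
        }
    })
}

// `let`, then at least one whitespace char, then a name.
fn let_binding<'a>() -> Parser<'a, &'a str> {
    let kw = tag("let");
    let sep = ws1();
    let name = ident();
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        let (_, rest) = kw(input)?;
        let (_, rest) = sep(rest)?;
        name(rest)
    })
}

fn main() {
    assert_eq!(let_binding()("let   x = 1"), Ok(("x", " = 1")));
    assert!(let_binding()("letx").is_err());
}
```

Swapping ws1 for ws0 here would make `letx` parse as the keyword `let` followed by the name `x`, which is usually not what a language intends.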

    Deep Comparison

    Comparison: Example 163 — Whitespace Parser

    ws0

    OCaml:

    let ws0 : unit parser = fun input ->
      match many0 (satisfy is_ws "whitespace") input with
      | Ok (_, rest) -> Ok ((), rest)
      | Error e -> Error e
    

    Rust:

    fn ws0<'a>() -> Parser<'a, ()> {
        Box::new(|input: &'a str| {
            let trimmed = input.trim_start();
            Ok(((), trimmed))
        })
    }
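The two ws0 versions above differ in structure as well as syntax: the OCaml one is assembled from many0 and satisfy, while the Rust one calls trim_start. A line-by-line Rust analogue of the OCaml version might look like the sketch below; `satisfy` and `many0` are hypothetical local definitions in the spirit of the OCaml helpers, and `char::is_whitespace` stands in for the OCaml `is_ws` so the result matches trim_start. Note the Vec<char> that is built and then thrown away, which is the cost discussed under Key Differences.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

// Match one char satisfying `pred`.
fn satisfy<'a, F>(pred: F, desc: &'static str) -> Parser<'a, char>
where
    F: Fn(char) -> bool + 'a,
{
    Box::new(move |input: &'a str| -> ParseResult<'a, char> {
        match input.chars().next() {
            Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
            _ => Err(format!("Expected {}", desc)),
        }
    })
}

// Apply `p` until it fails, collecting the results. Always succeeds.
// (Would loop forever if `p` could succeed without consuming input.)
fn many0<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Vec<T>> {
    Box::new(move |mut input: &'a str| -> ParseResult<'a, Vec<T>> {
        let mut out = Vec::new();
        while let Ok((v, rest)) = p(input) {
            out.push(v);
            input = rest;
        }
        Ok((out, input))
    })
}

// Mirrors the OCaml ws0: collect whitespace chars, then discard them.
fn ws0<'a>() -> Parser<'a, ()> {
    let inner = many0(satisfy(|c: char| c.is_whitespace(), "whitespace"));
    Box::new(move |input: &'a str| -> ParseResult<'a, ()> {
        let (_chars, rest) = inner(input)?;
        Ok(((), rest))
    })
}

fn main() {
    assert_eq!(ws0()("\t\n  x"), Ok(((), "x")));
    assert_eq!(ws0()(""), Ok(((), "")));
}
```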
    

    ws_wrap

    OCaml:

    let ws_wrap (p : 'a parser) : 'a parser = fun input ->
      match ws0 input with
      | Ok ((), r1) ->
        (match p r1 with
         | Ok (v, r2) ->
           (match ws0 r2 with
            | Ok ((), r3) -> Ok (v, r3)
            | Error e -> Error e)
         | Error e -> Error e)
      | Error e -> Error e
    

    Rust:

    fn ws_wrap<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, T> {
        Box::new(move |input: &'a str| {
            let trimmed = input.trim_start();
            let (value, rest) = parser(trimmed)?;
            let trimmed_rest = rest.trim_start();
            Ok((value, trimmed_rest))
        })
    }
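The OCaml ws_wrap calls ws0 on both sides, while the Rust one inlines trim_start twice. A sketch of a Rust ws_wrap built from ws0 itself, in the shape of Angstrom's `ws *> p <* ws`, is below. The `preceded` and `terminated` helpers are hypothetical local definitions standing in for `*>` and `<*`.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn ws0<'a>() -> Parser<'a, ()> {
    Box::new(|input: &'a str| -> ParseResult<'a, ()> { Ok(((), input.trim_start())) })
}

fn tag<'a>(expected: &'static str) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        if input.starts_with(expected) {
            Ok((&input[..expected.len()], &input[expected.len()..]))
        } else {
            Err(format!("Expected \"{}\"", expected))
        }
    })
}

// Run `a` then `b`, keep b's value (like OCaml's `*>`).
fn preceded<'a, A: 'a, B: 'a>(a: Parser<'a, A>, b: Parser<'a, B>) -> Parser<'a, B> {
    Box::new(move |input: &'a str| -> ParseResult<'a, B> {
        let (_, rest) = a(input)?;
        b(rest)
    })
}

// Run `a` then `b`, keep a's value (like OCaml's `<*`).
fn terminated<'a, A: 'a, B: 'a>(a: Parser<'a, A>, b: Parser<'a, B>) -> Parser<'a, A> {
    Box::new(move |input: &'a str| -> ParseResult<'a, A> {
        let (value, rest) = a(input)?;
        let (_, rest) = b(rest)?;
        Ok((value, rest))
    })
}

// ws_wrap assembled from ws0 rather than inlined trim_start calls.
fn ws_wrap<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, T> {
    terminated(preceded(ws0(), p), ws0())
}

fn main() {
    let p = ws_wrap(tag("hello"));
    assert_eq!(p("  hello  rest"), Ok(("hello", "rest")));
    assert_eq!(p("hello"), Ok(("hello", "")));
}
```

The payoff of this structure is that swapping ws0 for a comment-aware skipper changes whitespace handling in one place; the inlined version would need both trim_start calls edited.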
    

    Exercises

  • Extend ws0 to also skip line comments: // ...until end of line.
  • Implement ws_between(open: Parser<A>, item: Parser<T>, sep: Parser<B>, close: Parser<C>) -> Parser<Vec<T>> that parses open, zero or more items separated by sep, then close, handling whitespace around every token.
  • Write a lexeme(p: Parser<T>) -> Parser<T> combinator that skips whitespace after p (a common pattern in language parsers).