ExamplesBy LevelBy TopicLearning Paths
155 Advanced

Many Parser

Functional Programming

Tutorial

The Problem

Most grammars involve repetition: a sequence of digits forms a number, a sequence of letters forms an identifier, a sequence of statements forms a block. many0 (zero or more) and many1 (one or more) generalize repetition for any parser. These combinators run a parser repeatedly until it fails, collecting all results into a Vec<T>. many0 always succeeds (empty input is zero repetitions); many1 fails if the first application fails.

🎯 Learning Outcomes

  • • Understand many0 and many1 as the standard repetition combinators
  • • Learn why many0 cannot fail (the empty case is valid)
  • • See how many combines with character parsers to build word and number parsers
  • • Understand the potential infinite loop: using many0 with a parser that always succeeds
  • Code Example

    fn many0<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
        Box::new(move |mut input: &'a str| {
            let mut results = Vec::new();
            while let Ok((value, rest)) = parser(input) {
                results.push(value);
                input = rest;
            }
            Ok((results, input))
        })
    }

    Key Differences

  • Return type: Rust collects into Vec<T> (O(1) amortized push); OCaml traditionally uses lists (O(n) for reversing if using cons).
  • Termination: Both must ensure the inner parser consumes input to avoid infinite loops; this is a convention, not a type-system guarantee.
  • Lazy vs. eager: OCaml's fix-based many terminates naturally in a lazy context; Rust's explicit loop is strictly eager.
  • Error handling: many0 swallows failures silently (treating them as termination); many1 propagates the first failure.
  • OCaml Approach

    OCaml's angstrom provides many : 'a t -> 'a list t and many1 : 'a t -> 'a list t. The implementation uses fix (the fixed-point combinator) for laziness:

    let many p = fix (fun m -> (lift2 (fun x xs -> x :: xs) p m) <|> return [])
    

    This recursive definition handles the termination naturally in OCaml's lazy evaluation model. The list-based return type in OCaml means repeated prepending — less efficient than Rust's Vec::push for long sequences.

    Full Source

    #![allow(clippy::all)]
    // Example 155: Many Parser
    // many0 and many1: parse zero or more / one or more
    
    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    
    fn satisfy<'a, F>(pred: F, desc: &str) -> Parser<'a, char>
    where
        F: Fn(char) -> bool + 'a,
    {
        let desc = desc.to_string();
        Box::new(move |input: &'a str| match input.chars().next() {
            Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
            _ => Err(format!("Expected {}", desc)),
        })
    }
    
    // ============================================================
    // Approach 1: many0 — zero or more, always succeeds
    // ============================================================
    
    fn many0<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
        Box::new(move |mut input: &'a str| {
            let mut results = Vec::new();
            while let Ok((value, rest)) = parser(input) {
                results.push(value);
                input = rest;
            }
            Ok((results, input))
        })
    }
    
    // ============================================================
    // Approach 2: many1 — one or more, fails if zero
    // ============================================================
    
    fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
        Box::new(move |input: &'a str| {
            let (first, mut remaining) = parser(input)?;
            let mut results = vec![first];
            while let Ok((value, rest)) = parser(remaining) {
                results.push(value);
                remaining = rest;
            }
            Ok((results, remaining))
        })
    }
    
    // ============================================================
    // Approach 3: many_till — parse until terminator succeeds
    // ============================================================
    
    fn many_till<'a, T: 'a, U: 'a>(
        parser: Parser<'a, T>,
        stop: Parser<'a, U>,
    ) -> Parser<'a, (Vec<T>, U)> {
        Box::new(move |mut input: &'a str| {
            let mut results = Vec::new();
            loop {
                if let Ok((term, rest)) = stop(input) {
                    return Ok(((results, term), rest));
                }
                let (value, rest) = parser(input)?;
                results.push(value);
                input = rest;
            }
        })
    }
    
    /// Collect many0 chars into a String
    fn many0_str<'a>(parser: Parser<'a, char>) -> Parser<'a, String> {
        Box::new(move |mut input: &'a str| {
            let mut s = String::new();
            while let Ok((c, rest)) = parser(input) {
                s.push(c);
                input = rest;
            }
            Ok((s, input))
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_many0_some() {
            let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("123abc").unwrap();
            assert_eq!(digits, vec!['1', '2', '3']);
            assert_eq!(rest, "abc");
        }
    
        #[test]
        fn test_many0_none() {
            let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("abc").unwrap();
            assert!(digits.is_empty());
            assert_eq!(rest, "abc");
        }
    
        #[test]
        fn test_many0_empty_input() {
            let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("").unwrap();
            assert!(digits.is_empty());
            assert_eq!(rest, "");
        }
    
        #[test]
        fn test_many1_some() {
            let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("123abc").unwrap();
            assert_eq!(digits, vec!['1', '2', '3']);
            assert_eq!(rest, "abc");
        }
    
        #[test]
        fn test_many1_none_fails() {
            let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
            assert!(p("abc").is_err());
        }
    
        #[test]
        fn test_many_till() {
            let letter = satisfy(|c| c.is_ascii_alphabetic(), "letter");
            let dot = satisfy(|c| c == '.', "dot");
            let p = many_till(letter, dot);
            let ((letters, term), rest) = p("abc.rest").unwrap();
            assert_eq!(letters, vec!['a', 'b', 'c']);
            assert_eq!(term, '.');
            assert_eq!(rest, "rest");
        }
    
        #[test]
        fn test_many0_str() {
            let p = many0_str(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (s, rest) = p("456xy").unwrap();
            assert_eq!(s, "456");
            assert_eq!(rest, "xy");
        }
    
        #[test]
        fn test_many1_single() {
            let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("5abc").unwrap();
            assert_eq!(digits, vec!['5']);
            assert_eq!(rest, "abc");
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_many0_some() {
            let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("123abc").unwrap();
            assert_eq!(digits, vec!['1', '2', '3']);
            assert_eq!(rest, "abc");
        }
    
        #[test]
        fn test_many0_none() {
            let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("abc").unwrap();
            assert!(digits.is_empty());
            assert_eq!(rest, "abc");
        }
    
        #[test]
        fn test_many0_empty_input() {
            let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("").unwrap();
            assert!(digits.is_empty());
            assert_eq!(rest, "");
        }
    
        #[test]
        fn test_many1_some() {
            let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("123abc").unwrap();
            assert_eq!(digits, vec!['1', '2', '3']);
            assert_eq!(rest, "abc");
        }
    
        #[test]
        fn test_many1_none_fails() {
            let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
            assert!(p("abc").is_err());
        }
    
        #[test]
        fn test_many_till() {
            let letter = satisfy(|c| c.is_ascii_alphabetic(), "letter");
            let dot = satisfy(|c| c == '.', "dot");
            let p = many_till(letter, dot);
            let ((letters, term), rest) = p("abc.rest").unwrap();
            assert_eq!(letters, vec!['a', 'b', 'c']);
            assert_eq!(term, '.');
            assert_eq!(rest, "rest");
        }
    
        #[test]
        fn test_many0_str() {
            let p = many0_str(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (s, rest) = p("456xy").unwrap();
            assert_eq!(s, "456");
            assert_eq!(rest, "xy");
        }
    
        #[test]
        fn test_many1_single() {
            let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
            let (digits, rest) = p("5abc").unwrap();
            assert_eq!(digits, vec!['5']);
            assert_eq!(rest, "abc");
        }
    }

    Deep Comparison

    Comparison: Example 155 — Many Parser

    many0

    OCaml:

    let many0 (p : 'a parser) : 'a list parser = fun input ->
      let rec go acc remaining =
        match p remaining with
        | Ok (v, rest) -> go (v :: acc) rest
        | Error _ -> Ok (List.rev acc, remaining)
      in
      go [] input
    

    Rust:

    fn many0<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
        Box::new(move |mut input: &'a str| {
            let mut results = Vec::new();
            while let Ok((value, rest)) = parser(input) {
                results.push(value);
                input = rest;
            }
            Ok((results, input))
        })
    }
    

    many1

    OCaml:

    let many1 (p : 'a parser) : 'a list parser = fun input ->
      match p input with
      | Error e -> Error e
      | Ok (first, rest) ->
        match many0 p rest with
        | Ok (others, remaining) -> Ok (first :: others, remaining)
        | Error e -> Error e
    

    Rust:

    fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
        Box::new(move |input: &'a str| {
            let (first, mut remaining) = parser(input)?;
            let mut results = vec![first];
            while let Ok((value, rest)) = parser(remaining) {
                results.push(value);
                remaining = rest;
            }
            Ok((results, remaining))
        })
    }
    

    many_till

    OCaml:

    let many_till (p : 'a parser) (stop : 'b parser) : ('a list * 'b) parser = fun input ->
      let rec go acc remaining =
        match stop remaining with
        | Ok (term, rest) -> Ok ((List.rev acc, term), rest)
        | Error _ ->
          match p remaining with
          | Ok (v, rest) -> go (v :: acc) rest
          | Error e -> Error e
      in
      go [] input
    

    Rust:

    fn many_till<'a, T: 'a, U: 'a>(
        parser: Parser<'a, T>,
        stop: Parser<'a, U>,
    ) -> Parser<'a, (Vec<T>, U)> {
        Box::new(move |mut input: &'a str| {
            let mut results = Vec::new();
            loop {
                if let Ok((term, rest)) = stop(input) {
                    return Ok(((results, term), rest));
                }
                let (value, rest) = parser(input)?;
                results.push(value);
                input = rest;
            }
        })
    }
    

    Exercises

  • Implement many_sep<A, B>(item: Parser<A>, sep: Parser<B>) -> Parser<Vec<A>> that parses a, a, a where , is the separator.
  • Write at_least<T>(n: usize, p: Parser<T>) -> Parser<Vec<T>> that requires at least n successful applications.
  • Implement at_most<T>(n: usize, p: Parser<T>) -> Parser<Vec<T>> that applies the parser at most n times.
  • Open Source Repos