Many Parser
Tutorial
The Problem
Most grammars involve repetition: a sequence of digits forms a number, a sequence of letters forms an identifier, a sequence of statements forms a block. many0 (zero or more) and many1 (one or more) generalize repetition for any parser. These combinators run a parser repeatedly until it fails, collecting all results into a Vec<T>. many0 always succeeds (empty input is zero repetitions); many1 fails if the first application fails.
🎯 Learning Outcomes
many0 and many1 as the standard repetition combinatorsmany0 cannot fail (the empty case is valid)many0 with a parser that always succeedsCode Example
fn many0<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
Box::new(move |mut input: &'a str| {
let mut results = Vec::new();
while let Ok((value, rest)) = parser(input) {
results.push(value);
input = rest;
}
Ok((results, input))
})
}Key Differences
Vec<T> (O(1) amortized push); OCaml traditionally uses lists (O(n) for reversing if using cons).fix-based many terminates naturally in a lazy context; Rust's explicit loop is strictly eager.many0 swallows failures silently (treating them as termination); many1 propagates the first failure.OCaml Approach
OCaml's angstrom provides many : 'a t -> 'a list t and many1 : 'a t -> 'a list t. The implementation uses fix (the fixed-point combinator) for laziness:
let many p = fix (fun m -> (lift2 (fun x xs -> x :: xs) p m) <|> return [])
This recursive definition handles the termination naturally in OCaml's lazy evaluation model. The list-based return type in OCaml means repeated prepending — less efficient than Rust's Vec::push for long sequences.
Full Source
#![allow(clippy::all)]
// Example 155: Many Parser
// many0 and many1: parse zero or more / one or more
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
fn satisfy<'a, F>(pred: F, desc: &str) -> Parser<'a, char>
where
F: Fn(char) -> bool + 'a,
{
let desc = desc.to_string();
Box::new(move |input: &'a str| match input.chars().next() {
Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
_ => Err(format!("Expected {}", desc)),
})
}
// ============================================================
// Approach 1: many0 — zero or more, always succeeds
// ============================================================
fn many0<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
Box::new(move |mut input: &'a str| {
let mut results = Vec::new();
while let Ok((value, rest)) = parser(input) {
results.push(value);
input = rest;
}
Ok((results, input))
})
}
// ============================================================
// Approach 2: many1 — one or more, fails if zero
// ============================================================
fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
Box::new(move |input: &'a str| {
let (first, mut remaining) = parser(input)?;
let mut results = vec![first];
while let Ok((value, rest)) = parser(remaining) {
results.push(value);
remaining = rest;
}
Ok((results, remaining))
})
}
// ============================================================
// Approach 3: many_till — parse until terminator succeeds
// ============================================================
fn many_till<'a, T: 'a, U: 'a>(
parser: Parser<'a, T>,
stop: Parser<'a, U>,
) -> Parser<'a, (Vec<T>, U)> {
Box::new(move |mut input: &'a str| {
let mut results = Vec::new();
loop {
if let Ok((term, rest)) = stop(input) {
return Ok(((results, term), rest));
}
let (value, rest) = parser(input)?;
results.push(value);
input = rest;
}
})
}
/// Collect many0 chars into a String
fn many0_str<'a>(parser: Parser<'a, char>) -> Parser<'a, String> {
Box::new(move |mut input: &'a str| {
let mut s = String::new();
while let Ok((c, rest)) = parser(input) {
s.push(c);
input = rest;
}
Ok((s, input))
})
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_many0_some() {
let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("123abc").unwrap();
assert_eq!(digits, vec!['1', '2', '3']);
assert_eq!(rest, "abc");
}
#[test]
fn test_many0_none() {
let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("abc").unwrap();
assert!(digits.is_empty());
assert_eq!(rest, "abc");
}
#[test]
fn test_many0_empty_input() {
let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("").unwrap();
assert!(digits.is_empty());
assert_eq!(rest, "");
}
#[test]
fn test_many1_some() {
let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("123abc").unwrap();
assert_eq!(digits, vec!['1', '2', '3']);
assert_eq!(rest, "abc");
}
#[test]
fn test_many1_none_fails() {
let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
assert!(p("abc").is_err());
}
#[test]
fn test_many_till() {
let letter = satisfy(|c| c.is_ascii_alphabetic(), "letter");
let dot = satisfy(|c| c == '.', "dot");
let p = many_till(letter, dot);
let ((letters, term), rest) = p("abc.rest").unwrap();
assert_eq!(letters, vec!['a', 'b', 'c']);
assert_eq!(term, '.');
assert_eq!(rest, "rest");
}
#[test]
fn test_many0_str() {
let p = many0_str(satisfy(|c| c.is_ascii_digit(), "digit"));
let (s, rest) = p("456xy").unwrap();
assert_eq!(s, "456");
assert_eq!(rest, "xy");
}
#[test]
fn test_many1_single() {
let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("5abc").unwrap();
assert_eq!(digits, vec!['5']);
assert_eq!(rest, "abc");
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_many0_some() {
let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("123abc").unwrap();
assert_eq!(digits, vec!['1', '2', '3']);
assert_eq!(rest, "abc");
}
#[test]
fn test_many0_none() {
let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("abc").unwrap();
assert!(digits.is_empty());
assert_eq!(rest, "abc");
}
#[test]
fn test_many0_empty_input() {
let p = many0(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("").unwrap();
assert!(digits.is_empty());
assert_eq!(rest, "");
}
#[test]
fn test_many1_some() {
let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("123abc").unwrap();
assert_eq!(digits, vec!['1', '2', '3']);
assert_eq!(rest, "abc");
}
#[test]
fn test_many1_none_fails() {
let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
assert!(p("abc").is_err());
}
#[test]
fn test_many_till() {
let letter = satisfy(|c| c.is_ascii_alphabetic(), "letter");
let dot = satisfy(|c| c == '.', "dot");
let p = many_till(letter, dot);
let ((letters, term), rest) = p("abc.rest").unwrap();
assert_eq!(letters, vec!['a', 'b', 'c']);
assert_eq!(term, '.');
assert_eq!(rest, "rest");
}
#[test]
fn test_many0_str() {
let p = many0_str(satisfy(|c| c.is_ascii_digit(), "digit"));
let (s, rest) = p("456xy").unwrap();
assert_eq!(s, "456");
assert_eq!(rest, "xy");
}
#[test]
fn test_many1_single() {
let p = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
let (digits, rest) = p("5abc").unwrap();
assert_eq!(digits, vec!['5']);
assert_eq!(rest, "abc");
}
}
Deep Comparison
Comparison: Example 155 — Many Parser
many0
OCaml:
let many0 (p : 'a parser) : 'a list parser = fun input ->
let rec go acc remaining =
match p remaining with
| Ok (v, rest) -> go (v :: acc) rest
| Error _ -> Ok (List.rev acc, remaining)
in
go [] input
Rust:
fn many0<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
Box::new(move |mut input: &'a str| {
let mut results = Vec::new();
while let Ok((value, rest)) = parser(input) {
results.push(value);
input = rest;
}
Ok((results, input))
})
}
many1
OCaml:
let many1 (p : 'a parser) : 'a list parser = fun input ->
match p input with
| Error e -> Error e
| Ok (first, rest) ->
match many0 p rest with
| Ok (others, remaining) -> Ok (first :: others, remaining)
| Error e -> Error e
Rust:
fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
Box::new(move |input: &'a str| {
let (first, mut remaining) = parser(input)?;
let mut results = vec![first];
while let Ok((value, rest)) = parser(remaining) {
results.push(value);
remaining = rest;
}
Ok((results, remaining))
})
}
many_till
OCaml:
let many_till (p : 'a parser) (stop : 'b parser) : ('a list * 'b) parser = fun input ->
let rec go acc remaining =
match stop remaining with
| Ok (term, rest) -> Ok ((List.rev acc, term), rest)
| Error _ ->
match p remaining with
| Ok (v, rest) -> go (v :: acc) rest
| Error e -> Error e
in
go [] input
Rust:
fn many_till<'a, T: 'a, U: 'a>(
parser: Parser<'a, T>,
stop: Parser<'a, U>,
) -> Parser<'a, (Vec<T>, U)> {
Box::new(move |mut input: &'a str| {
let mut results = Vec::new();
loop {
if let Ok((term, rest)) = stop(input) {
return Ok(((results, term), rest));
}
let (value, rest) = parser(input)?;
results.push(value);
input = rest;
}
})
}
Exercises
many_sep<A, B>(item: Parser<A>, sep: Parser<B>) -> Parser<Vec<A>> that parses a, a, a where , is the separator.at_least<T>(n: usize, p: Parser<T>) -> Parser<Vec<T>> that requires at least n successful applications.at_most<T>(n: usize, p: Parser<T>) -> Parser<Vec<T>> that applies the parser at most n times.