160 Advanced

FlatMap Parser

Functional Programming

Tutorial

The Problem

Sometimes the choice of what to parse next depends on what was just parsed — a context-sensitive grammar. For example, a length-prefixed string "3:abc" requires first parsing the length 3, then using that to parse exactly 3 more characters. flat_map (also called bind or and_then) enables this: it runs a parser, passes the result to a function that returns a new parser, and runs that parser on the remaining input. This is the monadic bind (>>=) for parsers, enabling context-sensitive parsing.

🎯 Learning Outcomes

• Understand flat_map as monadic bind for parsers, enabling context-sensitive parsing

• Learn why flat_map is more powerful than map (it can choose the next parser dynamically)

• See the length-prefixed string pattern as a canonical flat_map example

• Understand the relationship between parser monads and other Rust monads (Option, Result)

Code Example

fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a {
    Box::new(move |input: &'a str| {
        let (value, rest) = parser(input)?;
        (f(value))(rest)
    })
}

let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
  match p input with
  | Error e -> Error e
  | Ok (v, rest) -> (f v) rest

let ( >>= ) = and_then

Key Differences

Notation: OCaml's let* desugars to >>=, giving near-imperative parser code; Rust uses closure-based chaining, which becomes nested for long sequences.

Power: flat_map makes parsers a full monad — every combinator (map, pair, opt) is derivable from flat_map and pure; some libraries do exactly this.

Performance: flat_map prevents certain parser optimizations (streaming, streaming allocation) because the next step is unknown until the first is complete.

Context sensitivity: Both angstrom and Rust's flat_map handle context-sensitive grammars; PEG parsers without flat_map cannot.

OCaml Approach

OCaml's angstrom provides bind : 'a t -> ('a -> 'b t) -> 'b t and the infix >>=:

let length_prefixed = uint_parser >>= fun n -> take n

Monadic do-notation via let* (OCaml 4.08+) makes context-sensitive parsers readable:

let length_prefixed =
  let* n = uint_parser in
  take n

This is cleaner than Rust's closure-based flat_map for complex sequences.

Full Source

#![allow(clippy::all)]
// Example 160: FlatMap Parser
// flat_map / and_then: monadic chaining of parsers

type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

fn satisfy<'a, F>(pred: F, desc: &str) -> Parser<'a, char>
where
    F: Fn(char) -> bool + 'a,
{
    let desc = desc.to_string();
    Box::new(move |input: &'a str| match input.chars().next() {
        Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
        _ => Err(format!("Expected {}", desc)),
    })
}

fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
    Box::new(move |input: &'a str| {
        let (first, mut rem) = parser(input)?;
        let mut results = vec![first];
        while let Ok((v, rest)) = parser(rem) {
            results.push(v);
            rem = rest;
        }
        Ok((results, rem))
    })
}

fn map<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where
    F: Fn(A) -> B + 'a,
{
    Box::new(move |input: &'a str| {
        let (v, rest) = parser(input)?;
        Ok((f(v), rest))
    })
}

// ============================================================
// Approach 1: and_then / bind — monadic chaining
// ============================================================

fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where
    F: Fn(A) -> Parser<'a, B> + 'a,
{
    Box::new(move |input: &'a str| {
        let (value, rest) = parser(input)?;
        (f(value))(rest)
    })
}

// ============================================================
// Approach 2: Context-sensitive — length-prefixed string "3:abc"
// ============================================================

fn parse_nat<'a>() -> Parser<'a, usize> {
    map(many1(satisfy(|c| c.is_ascii_digit(), "digit")), |digits| {
        digits
            .iter()
            .fold(0usize, |acc, &d| acc * 10 + (d as usize - '0' as usize))
    })
}

fn length_prefixed<'a>() -> Parser<'a, &'a str> {
    and_then(parse_nat(), |n| {
        Box::new(move |input: &'a str| {
            if let Some(rest) = input.strip_prefix(':') {
                if rest.len() >= n {
                    Ok((&rest[..n], &rest[n..]))
                } else {
                    Err("Not enough characters".to_string())
                }
            } else {
                Err("Expected ':'".to_string())
            }
        })
    })
}

// ============================================================
// Approach 3: Conditional parsing based on tag
// ============================================================

fn conditional_parser<'a>() -> Parser<'a, String> {
    and_then(satisfy(|c| c == 'i' || c == 's', "type tag"), |tag_char| {
        if tag_char == 'i' {
            map(many1(satisfy(|c| c.is_ascii_digit(), "digit")), |chars| {
                chars.into_iter().collect()
            })
        } else {
            map(
                many1(satisfy(|c| c.is_ascii_lowercase(), "letter")),
                |chars| chars.into_iter().collect(),
            )
        }
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_and_then_basic() {
        let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |d| {
            map(satisfy(|c| c.is_ascii_digit(), "digit"), move |d2| {
                format!("{}{}", d, d2)
            })
        });
        assert_eq!(p("12x"), Ok(("12".to_string(), "x")));
    }

    #[test]
    fn test_length_prefixed() {
        let p = length_prefixed();
        assert_eq!(p("3:abcrest"), Ok(("abc", "rest")));
    }

    #[test]
    fn test_length_prefixed_5() {
        let p = length_prefixed();
        assert_eq!(p("5:helloworld"), Ok(("hello", "world")));
    }

    #[test]
    fn test_length_prefixed_too_short() {
        let p = length_prefixed();
        assert!(p("5:hi").is_err());
    }

    #[test]
    fn test_conditional_int() {
        let p = conditional_parser();
        assert_eq!(p("i42rest"), Ok(("42".to_string(), "rest")));
    }

    #[test]
    fn test_conditional_string() {
        let p = conditional_parser();
        assert_eq!(p("sabcREST"), Ok(("abc".to_string(), "REST")));
    }

    #[test]
    fn test_conditional_invalid_tag() {
        let p = conditional_parser();
        assert!(p("x123").is_err());
    }

    #[test]
    fn test_and_then_error_propagation() {
        let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |_| {
            satisfy(|c| c.is_ascii_uppercase(), "upper")
        });
        assert!(p("1a").is_err()); // second parser fails
    }
}

(* Example 160: FlatMap Parser *)
(* flat_map / and_then: monadic chaining of parsers *)

type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result

let satisfy pred desc : char parser = fun input ->
  if String.length input > 0 && pred input.[0] then
    Ok (input.[0], String.sub input 1 (String.length input - 1))
  else Error (Printf.sprintf "Expected %s" desc)

let tag expected : string parser = fun input ->
  let len = String.length expected in
  if String.length input >= len && String.sub input 0 len = expected then
    Ok (expected, String.sub input len (String.length input - len))
  else Error (Printf.sprintf "Expected \"%s\"" expected)

let many1 (p : 'a parser) : 'a list parser = fun input ->
  match p input with
  | Error e -> Error e
  | Ok (v, rest) ->
    let rec go acc rem = match p rem with
      | Ok (v, r) -> go (v :: acc) r
      | Error _ -> Ok (List.rev acc, rem)
    in match go [v] rest with
    | Ok (vs, r) -> Ok (vs, r)
    | Error e -> Error e

let map f p : 'b parser = fun input ->
  match p input with Ok (v, r) -> Ok (f v, r) | Error e -> Error e

(* Approach 1: and_then / bind — monadic chaining *)
let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
  match p input with
  | Error e -> Error e
  | Ok (v, rest) -> (f v) rest

(* Infix operator for bind *)
let ( >>= ) = and_then

(* Approach 2: flat_map with context-sensitive parsing *)
(* Parse a length-prefixed string: "3:abc" *)
let parse_nat : int parser =
  map (fun chars ->
    List.fold_left (fun acc c -> acc * 10 + (Char.code c - Char.code '0')) 0 chars
  ) (many1 (satisfy (fun c -> c >= '0' && c <= '9') "digit"))

let length_prefixed : string parser =
  parse_nat >>= fun n ->
  (satisfy (fun c -> c = ':') "colon") >>= fun _ ->
  (fun input ->
    if String.length input >= n then
      Ok (String.sub input 0 n, String.sub input n (String.length input - n))
    else Error "Not enough characters")

(* Approach 3: Conditional parsing based on first result *)
let conditional_parser : string parser = fun input ->
  let type_parser = satisfy (fun c -> c = 'i' || c = 's') "type tag" in
  (type_parser >>= fun tag ->
    if tag = 'i' then
      map (fun ds -> String.init (List.length ds) (List.nth ds))
        (many1 (satisfy (fun c -> c >= '0' && c <= '9') "digit"))
    else
      map (fun ls -> String.init (List.length ls) (List.nth ls))
        (many1 (satisfy (fun c -> c >= 'a' && c <= 'z') "letter"))
  ) input

(* Tests *)
let () =
  (* and_then basic *)
  let p = and_then
    (satisfy (fun c -> c >= '0' && c <= '9') "digit")
    (fun d -> tag (String.make 1 d)) in
  assert (p "11rest" = Ok ("1", "rest"));

  (* length_prefixed *)
  assert (length_prefixed "3:abcrest" = Ok ("abc", "rest"));
  assert (length_prefixed "5:helloworld" = Ok ("hello", "world"));

  (* conditional *)
  assert (conditional_parser "i42rest" = Ok ("42", "rest"));
  assert (conditional_parser "sabcREST" = Ok ("abc", "REST"));

  print_endline "✓ All tests passed"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_and_then_basic() {
        let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |d| {
            map(satisfy(|c| c.is_ascii_digit(), "digit"), move |d2| {
                format!("{}{}", d, d2)
            })
        });
        assert_eq!(p("12x"), Ok(("12".to_string(), "x")));
    }

    #[test]
    fn test_length_prefixed() {
        let p = length_prefixed();
        assert_eq!(p("3:abcrest"), Ok(("abc", "rest")));
    }

    #[test]
    fn test_length_prefixed_5() {
        let p = length_prefixed();
        assert_eq!(p("5:helloworld"), Ok(("hello", "world")));
    }

    #[test]
    fn test_length_prefixed_too_short() {
        let p = length_prefixed();
        assert!(p("5:hi").is_err());
    }

    #[test]
    fn test_conditional_int() {
        let p = conditional_parser();
        assert_eq!(p("i42rest"), Ok(("42".to_string(), "rest")));
    }

    #[test]
    fn test_conditional_string() {
        let p = conditional_parser();
        assert_eq!(p("sabcREST"), Ok(("abc".to_string(), "REST")));
    }

    #[test]
    fn test_conditional_invalid_tag() {
        let p = conditional_parser();
        assert!(p("x123").is_err());
    }

    #[test]
    fn test_and_then_error_propagation() {
        let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |_| {
            satisfy(|c| c.is_ascii_uppercase(), "upper")
        });
        assert!(p("1a").is_err()); // second parser fails
    }
}

Deep Comparison

Comparison: Example 160 — FlatMap Parser

and_then / bind

OCaml:

let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
  match p input with
  | Error e -> Error e
  | Ok (v, rest) -> (f v) rest

let ( >>= ) = and_then

Rust:

fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
where F: Fn(A) -> Parser<'a, B> + 'a {
    Box::new(move |input: &'a str| {
        let (value, rest) = parser(input)?;
        (f(value))(rest)
    })
}

Length-prefixed string

OCaml:

let length_prefixed : string parser =
  parse_nat >>= fun n ->
  (satisfy (fun c -> c = ':') "colon") >>= fun _ ->
  (fun input ->
    if String.length input >= n then
      Ok (String.sub input 0 n, String.sub input n (String.length input - n))
    else Error "Not enough characters")

Rust:

fn length_prefixed<'a>() -> Parser<'a, &'a str> {
    and_then(parse_nat(), |n| {
        Box::new(move |input: &'a str| {
            if input.starts_with(':') {
                let rest = &input[1..];
                if rest.len() >= n {
                    Ok((&rest[..n], &rest[n..]))
                } else {
                    Err("Not enough characters".to_string())
                }
            } else {
                Err("Expected ':'".to_string())
            }
        })
    })
}

Exercises

Parse a Pascal-style string 'n:content' where n is the length: "5:hello" → "hello".

Implement a parser that reads a type tag "i" or "s" and then parses either an integer or a string accordingly.

Write flat_map in terms of pure and a hypothetical join : Parser<Parser<T>> -> Parser<T> to demonstrate the monadic structure.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust