ExamplesBy LevelBy TopicLearning Paths
160 Advanced

FlatMap Parser

Functional Programming

Tutorial

The Problem

Sometimes the choice of what to parse next depends on what was just parsed — a context-sensitive grammar. For example, a length-prefixed string "3:abc" requires first parsing the length 3, then using that to parse exactly 3 more characters. flat_map (also called bind or and_then) enables this: it runs a parser, passes the result to a function that returns a new parser, and runs that parser on the remaining input. This is the monadic bind (>>=) for parsers, enabling context-sensitive parsing.

🎯 Learning Outcomes

  • • Understand flat_map as monadic bind for parsers, enabling context-sensitive parsing
  • • Learn why flat_map is more powerful than map (it can choose the next parser dynamically)
  • • See the length-prefixed string pattern as a canonical flat_map example
  • • Understand the relationship between parser monads and other Rust monads (Option, Result)
  • Code Example

    fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
    where F: Fn(A) -> Parser<'a, B> + 'a {
        Box::new(move |input: &'a str| {
            let (value, rest) = parser(input)?;
            (f(value))(rest)
        })
    }

    Key Differences

  • Notation: OCaml's let* desugars to >>=, giving near-imperative parser code; Rust uses closure-based chaining, which becomes nested for long sequences.
  • Power: flat_map makes parsers a full monad — every combinator (map, pair, opt) is derivable from flat_map and pure; some libraries do exactly this.
  • Performance: flat_map prevents certain parser optimizations (streaming, streaming allocation) because the next step is unknown until the first is complete.
  • Context sensitivity: Both angstrom and Rust's flat_map handle context-sensitive grammars; PEG parsers without flat_map cannot.
  • OCaml Approach

    OCaml's angstrom provides bind : 'a t -> ('a -> 'b t) -> 'b t and the infix >>=:

    let length_prefixed = uint_parser >>= fun n -> take n
    

    Monadic do-notation via let* (OCaml 4.08+) makes context-sensitive parsers readable:

    let length_prefixed =
      let* n = uint_parser in
      take n
    

    This is cleaner than Rust's closure-based flat_map for complex sequences.

    Full Source

    #![allow(clippy::all)]
    // Example 160: FlatMap Parser
    // flat_map / and_then: monadic chaining of parsers
    
    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    
    fn satisfy<'a, F>(pred: F, desc: &str) -> Parser<'a, char>
    where
        F: Fn(char) -> bool + 'a,
    {
        let desc = desc.to_string();
        Box::new(move |input: &'a str| match input.chars().next() {
            Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
            _ => Err(format!("Expected {}", desc)),
        })
    }
    
    fn many1<'a, T: 'a>(parser: Parser<'a, T>) -> Parser<'a, Vec<T>> {
        Box::new(move |input: &'a str| {
            let (first, mut rem) = parser(input)?;
            let mut results = vec![first];
            while let Ok((v, rest)) = parser(rem) {
                results.push(v);
                rem = rest;
            }
            Ok((results, rem))
        })
    }
    
    fn map<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
    where
        F: Fn(A) -> B + 'a,
    {
        Box::new(move |input: &'a str| {
            let (v, rest) = parser(input)?;
            Ok((f(v), rest))
        })
    }
    
    // ============================================================
    // Approach 1: and_then / bind — monadic chaining
    // ============================================================
    
    fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
    where
        F: Fn(A) -> Parser<'a, B> + 'a,
    {
        Box::new(move |input: &'a str| {
            let (value, rest) = parser(input)?;
            (f(value))(rest)
        })
    }
    
    // ============================================================
    // Approach 2: Context-sensitive — length-prefixed string "3:abc"
    // ============================================================
    
    fn parse_nat<'a>() -> Parser<'a, usize> {
        map(many1(satisfy(|c| c.is_ascii_digit(), "digit")), |digits| {
            digits
                .iter()
                .fold(0usize, |acc, &d| acc * 10 + (d as usize - '0' as usize))
        })
    }
    
    fn length_prefixed<'a>() -> Parser<'a, &'a str> {
        and_then(parse_nat(), |n| {
            Box::new(move |input: &'a str| {
                if let Some(rest) = input.strip_prefix(':') {
                    if rest.len() >= n {
                        Ok((&rest[..n], &rest[n..]))
                    } else {
                        Err("Not enough characters".to_string())
                    }
                } else {
                    Err("Expected ':'".to_string())
                }
            })
        })
    }
    
    // ============================================================
    // Approach 3: Conditional parsing based on tag
    // ============================================================
    
    fn conditional_parser<'a>() -> Parser<'a, String> {
        and_then(satisfy(|c| c == 'i' || c == 's', "type tag"), |tag_char| {
            if tag_char == 'i' {
                map(many1(satisfy(|c| c.is_ascii_digit(), "digit")), |chars| {
                    chars.into_iter().collect()
                })
            } else {
                map(
                    many1(satisfy(|c| c.is_ascii_lowercase(), "letter")),
                    |chars| chars.into_iter().collect(),
                )
            }
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_and_then_basic() {
            let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |d| {
                map(satisfy(|c| c.is_ascii_digit(), "digit"), move |d2| {
                    format!("{}{}", d, d2)
                })
            });
            assert_eq!(p("12x"), Ok(("12".to_string(), "x")));
        }
    
        #[test]
        fn test_length_prefixed() {
            let p = length_prefixed();
            assert_eq!(p("3:abcrest"), Ok(("abc", "rest")));
        }
    
        #[test]
        fn test_length_prefixed_5() {
            let p = length_prefixed();
            assert_eq!(p("5:helloworld"), Ok(("hello", "world")));
        }
    
        #[test]
        fn test_length_prefixed_too_short() {
            let p = length_prefixed();
            assert!(p("5:hi").is_err());
        }
    
        #[test]
        fn test_conditional_int() {
            let p = conditional_parser();
            assert_eq!(p("i42rest"), Ok(("42".to_string(), "rest")));
        }
    
        #[test]
        fn test_conditional_string() {
            let p = conditional_parser();
            assert_eq!(p("sabcREST"), Ok(("abc".to_string(), "REST")));
        }
    
        #[test]
        fn test_conditional_invalid_tag() {
            let p = conditional_parser();
            assert!(p("x123").is_err());
        }
    
        #[test]
        fn test_and_then_error_propagation() {
            let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |_| {
                satisfy(|c| c.is_ascii_uppercase(), "upper")
            });
            assert!(p("1a").is_err()); // second parser fails
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_and_then_basic() {
            let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |d| {
                map(satisfy(|c| c.is_ascii_digit(), "digit"), move |d2| {
                    format!("{}{}", d, d2)
                })
            });
            assert_eq!(p("12x"), Ok(("12".to_string(), "x")));
        }
    
        #[test]
        fn test_length_prefixed() {
            let p = length_prefixed();
            assert_eq!(p("3:abcrest"), Ok(("abc", "rest")));
        }
    
        #[test]
        fn test_length_prefixed_5() {
            let p = length_prefixed();
            assert_eq!(p("5:helloworld"), Ok(("hello", "world")));
        }
    
        #[test]
        fn test_length_prefixed_too_short() {
            let p = length_prefixed();
            assert!(p("5:hi").is_err());
        }
    
        #[test]
        fn test_conditional_int() {
            let p = conditional_parser();
            assert_eq!(p("i42rest"), Ok(("42".to_string(), "rest")));
        }
    
        #[test]
        fn test_conditional_string() {
            let p = conditional_parser();
            assert_eq!(p("sabcREST"), Ok(("abc".to_string(), "REST")));
        }
    
        #[test]
        fn test_conditional_invalid_tag() {
            let p = conditional_parser();
            assert!(p("x123").is_err());
        }
    
        #[test]
        fn test_and_then_error_propagation() {
            let p = and_then(satisfy(|c| c.is_ascii_digit(), "digit"), |_| {
                satisfy(|c| c.is_ascii_uppercase(), "upper")
            });
            assert!(p("1a").is_err()); // second parser fails
        }
    }

    Deep Comparison

    Comparison: Example 160 — FlatMap Parser

    and_then / bind

    OCaml:

    let and_then (p : 'a parser) (f : 'a -> 'b parser) : 'b parser = fun input ->
      match p input with
      | Error e -> Error e
      | Ok (v, rest) -> (f v) rest
    
    let ( >>= ) = and_then
    

    Rust:

    fn and_then<'a, A: 'a, B: 'a, F>(parser: Parser<'a, A>, f: F) -> Parser<'a, B>
    where F: Fn(A) -> Parser<'a, B> + 'a {
        Box::new(move |input: &'a str| {
            let (value, rest) = parser(input)?;
            (f(value))(rest)
        })
    }
    

    Length-prefixed string

    OCaml:

    let length_prefixed : string parser =
      parse_nat >>= fun n ->
      (satisfy (fun c -> c = ':') "colon") >>= fun _ ->
      (fun input ->
        if String.length input >= n then
          Ok (String.sub input 0 n, String.sub input n (String.length input - n))
        else Error "Not enough characters")
    

    Rust:

    fn length_prefixed<'a>() -> Parser<'a, &'a str> {
        and_then(parse_nat(), |n| {
            Box::new(move |input: &'a str| {
                if input.starts_with(':') {
                    let rest = &input[1..];
                    if rest.len() >= n {
                        Ok((&rest[..n], &rest[n..]))
                    } else {
                        Err("Not enough characters".to_string())
                    }
                } else {
                    Err("Expected ':'".to_string())
                }
            })
        })
    }
    

    Exercises

  • Parse a Pascal-style string 'n:content' where n is the length: "5:hello""hello".
  • Implement a parser that reads a type tag "i" or "s" and then parses either an integer or a string accordingly.
  • Write flat_map in terms of pure and a hypothetical join : Parser<Parser<T>> -> Parser<T> to demonstrate the monadic structure.
  • Open Source Repos