161 Advanced

Digit Parser

Functional Programming

Tutorial

The Problem

Numbers are ubiquitous in data formats: configuration files, JSON, CSV, protocol messages. Parsing integers and floats correctly requires handling signs, leading zeros (JSON, for example, forbids them: 0 and 0.5 are valid, 01 is not), and overflow. Building a number parser from primitives demonstrates the full combinator pipeline: match sign, match digits, collect and convert, handle errors. Number parsers are among the most widely used parsers in real-world applications.

🎯 Learning Outcomes

  • Build a complete integer parser from digit primitives: sign → digits → conversion
  • Handle edge cases: empty input, sign with no digits, overflow
  • See how many1, map, and opt combine for number parsing
  • Understand the difference between parsing (string recognition) and interpretation (numeric value)
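
The split between recognition and interpretation can be sketched with two plain functions. This is a minimal illustration, not part of the example's source: the names recognize_integer and interpret_integer are hypothetical, and the sketch works on string slices directly rather than on the Parser type used later.

```rust
/// Recognition: returns the matched text `[+-]?[0-9]+` and the remaining input,
/// or None when the input does not start with an integer literal.
fn recognize_integer(input: &str) -> Option<(&str, &str)> {
    let bytes = input.as_bytes();
    // An optional sign occupies exactly one byte.
    let sign_len = match bytes.first().copied() {
        Some(b'+') | Some(b'-') => 1,
        _ => 0,
    };
    // Count the ASCII digits that follow the sign.
    let digit_len = bytes[sign_len..]
        .iter()
        .take_while(|b| b.is_ascii_digit())
        .count();
    if digit_len == 0 {
        return None; // empty input, or a sign with no digits
    }
    // Sign and digits are ASCII, so this byte offset is a valid char boundary.
    Some(input.split_at(sign_len + digit_len))
}

/// Interpretation: turns recognized text into a value; overflow surfaces here.
fn interpret_integer(text: &str) -> Result<i64, std::num::ParseIntError> {
    text.parse::<i64>()
}
```

Under these assumptions, recognize_integer("-42rest") yields Some(("-42", "rest")), while "99999999999999999999" is recognized as integer text but rejected by interpret_integer because it does not fit in an i64.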
    Code Example

    fn digit<'a>() -> Parser<'a, u32> {
        map(satisfy(|c| c.is_ascii_digit(), "digit"), |c| c as u32 - '0' as u32)
    }

    Key Differences

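  • Overflow handling: Rust's i64::from_str (reached through str::parse::<i64>) returns Err on overflow, whereas the hand-written fold in the full source performs no overflow check; OCaml's int_of_string raises Failure (exception); Zarith in OCaml never overflows.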
  • Digit range: Rust's is_ascii_digit() handles ASCII 0-9; OCaml's c >= '0' && c <= '9' is equivalent; Unicode digit handling requires additional work in both.
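  • Intermediate representation: One Rust approach collects Vec<char>, joins to String, then parses (three steps); the full source instead folds the digit characters directly into a u64. OCaml similarly needs String.init or Buffer.t for the intermediate when going through int_of_string.
  • Float parsing: Both typically delegate the conversion to their standard-library float parsers (str::parse::<f64> in Rust, float_of_string in OCaml); the main challenge is recognizing the float format (sign, integer part, fraction, exponent) with combinators.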
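
The float case can be sketched as "recognize the format by hand, then delegate the conversion". The grammar below (optional sign, required integer part, optional fraction, optional exponent) is an assumption chosen for illustration, and float_prefix_len and parse_float are hypothetical names that do not appear in the example's source.

```rust
/// Returns the byte length of the longest prefix matching
/// [+-]? digits ('.' digits)? ([eE] [+-]? digits)?, or None if there is none.
fn float_prefix_len(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    // Counts consecutive ASCII digits starting at byte offset `from`.
    let digits = |from: usize| b[from..].iter().take_while(|c| c.is_ascii_digit()).count();

    let mut i = 0;
    if matches!(b.first().copied(), Some(b'+') | Some(b'-')) {
        i += 1;
    }
    let int_len = digits(i);
    if int_len == 0 {
        return None; // this sketch requires an integer part
    }
    i += int_len;
    // Fraction: consumed only when '.' is followed by at least one digit.
    if b.get(i).copied() == Some(b'.') {
        let frac_len = digits(i + 1);
        if frac_len > 0 {
            i += 1 + frac_len;
        }
    }
    // Exponent: consumed only when complete, e.g. "e-3" but not a bare "e".
    if matches!(b.get(i).copied(), Some(b'e') | Some(b'E')) {
        let mut j = i + 1;
        if matches!(b.get(j).copied(), Some(b'+') | Some(b'-')) {
            j += 1;
        }
        let exp_len = digits(j);
        if exp_len > 0 {
            i = j + exp_len;
        }
    }
    Some(i)
}

/// Recognizes a float prefix, then delegates conversion to the standard library.
fn parse_float(input: &str) -> Option<(f64, &str)> {
    let len = float_prefix_len(input)?;
    let (text, rest) = input.split_at(len);
    text.parse::<f64>().ok().map(|v| (v, rest))
}
```

With this grammar, an incomplete fraction or exponent is left in the remaining input rather than treated as an error, so "12.x" parses as 12.0 with ".x" left over. A stricter format such as JSON would need additional checks.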
    OCaml Approach

    OCaml's standard library provides int_of_string and float_of_string. With the Angstrom library:

    open Angstrom
    let digit = satisfy (fun c -> c >= '0' && c <= '9')
    (* many1 yields a char list; String.of_seq rebuilds the string in linear time *)
    let uint = many1 digit >>| (fun cs -> int_of_string (String.of_seq (List.to_seq cs)))
    

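    OCaml's native int is fixed-width (63 bits on 64-bit platforms), and int_of_string raises Failure when a literal does not fit. The Zarith library adds arbitrary-precision integers that avoid overflow altogether, whereas Rust code working with i64 or u64 must check bounds explicitly.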
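
An explicit bounds check in Rust can be sketched with checked arithmetic. The helper names digits_to_u64 and apply_sign are hypothetical and are not part of the example's source; the sketch assumes the digits arrive as a slice of chars, as many1 would produce them.

```rust
/// Converts a run of ASCII digits to u64, returning None on overflow
/// or if any character is not a decimal digit.
fn digits_to_u64(digits: &[char]) -> Option<u64> {
    digits.iter().try_fold(0u64, |acc, &d| {
        let value = d.to_digit(10)? as u64;
        // Each step fails instead of wrapping when the result exceeds u64::MAX.
        acc.checked_mul(10)?.checked_add(value)
    })
}

/// Applies a sign to a magnitude, rejecting values that do not fit in i64.
fn apply_sign(negative: bool, magnitude: u64) -> Option<i64> {
    if negative {
        // i64::MIN has magnitude i64::MAX + 1, so negate in a wider type first.
        i64::try_from(-(magnitude as i128)).ok()
    } else {
        i64::try_from(magnitude).ok()
    }
}
```

Note that an empty slice folds to Some(0); in a combinator pipeline, many1 would already have rejected empty input before this conversion runs.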

    Full Source

    #![allow(clippy::all)]
    // Example 161: Digit Parser
    // Parse digits: single digit, multi-digit integer, positive/negative
    
    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    
    fn satisfy<'a, F>(pred: F, desc: &str) -> Parser<'a, char>
    where
        F: Fn(char) -> bool + 'a,
    {
        let desc = desc.to_string();
        Box::new(move |input: &'a str| match input.chars().next() {
            Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
            _ => Err(format!("Expected {}", desc)),
        })
    }
    
    fn many1<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Vec<T>> {
        Box::new(move |input: &'a str| {
            let (first, mut rem) = p(input)?;
            let mut v = vec![first];
            while let Ok((val, r)) = p(rem) {
                v.push(val);
                rem = r;
            }
            Ok((v, rem))
        })
    }
    
    fn map<'a, A: 'a, B: 'a, F>(p: Parser<'a, A>, f: F) -> Parser<'a, B>
    where
        F: Fn(A) -> B + 'a,
    {
        Box::new(move |input: &'a str| {
            let (v, r) = p(input)?;
            Ok((f(v), r))
        })
    }
    
    fn opt<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Option<T>> {
        Box::new(move |input: &'a str| match p(input) {
            Ok((v, r)) => Ok((Some(v), r)),
            Err(_) => Ok((None, input)),
        })
    }
    
    // ============================================================
    // Approach 1: Single digit → u32
    // ============================================================
    
    fn digit<'a>() -> Parser<'a, u32> {
        map(satisfy(|c| c.is_ascii_digit(), "digit"), |c| {
            c as u32 - '0' as u32
        })
    }
    
    // ============================================================
    // Approach 2: Natural number (unsigned) → u64
    // ============================================================
    
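    // Note: the fold below does not check for overflow. For inputs above u64::MAX,
    // `acc * 10 + ...` panics in debug builds and wraps in release builds.
    fn natural<'a>() -> Parser<'a, u64> {
        map(many1(satisfy(|c| c.is_ascii_digit(), "digit")), |digits| {
            digits
                .iter()
                .fold(0u64, |acc, &d| acc * 10 + (d as u64 - '0' as u64))
        })
    }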
    
    // ============================================================
    // Approach 3: Signed integer → i64
    // ============================================================
    
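    // Note: `n as i64` wraps for n > i64::MAX, so an out-of-range magnitude
    // produces a wrong value instead of a parse error.
    fn integer<'a>() -> Parser<'a, i64> {
        Box::new(|input: &'a str| {
            // The sign is optional; `opt` never fails, it yields None instead.
            let (sign, rest) = opt(satisfy(|c| c == '+' || c == '-', "sign"))(input)?;
            let (n, rem) = natural()(rest)?;
            let value = match sign {
                Some('-') => -(n as i64),
                _ => n as i64,
            };
            Ok((value, rem))
        })
    }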
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_digit() {
            assert_eq!(digit()("5rest"), Ok((5, "rest")));
        }
    
        #[test]
        fn test_digit_zero() {
            assert_eq!(digit()("0x"), Ok((0, "x")));
        }
    
        #[test]
        fn test_digit_fail() {
            assert!(digit()("abc").is_err());
        }
    
        #[test]
        fn test_natural() {
            assert_eq!(natural()("42rest"), Ok((42, "rest")));
        }
    
        #[test]
        fn test_natural_zero() {
            assert_eq!(natural()("0"), Ok((0, "")));
        }
    
        #[test]
        fn test_natural_large() {
            assert_eq!(natural()("123456"), Ok((123456, "")));
        }
    
        #[test]
        fn test_integer_positive() {
            assert_eq!(integer()("42"), Ok((42, "")));
        }
    
        #[test]
        fn test_integer_negative() {
            assert_eq!(integer()("-42"), Ok((-42, "")));
        }
    
        #[test]
        fn test_integer_plus() {
            assert_eq!(integer()("+42"), Ok((42, "")));
        }
    
        #[test]
        fn test_integer_zero() {
            assert_eq!(integer()("0"), Ok((0, "")));
        }
    
        #[test]
        fn test_integer_fail() {
            assert!(integer()("abc").is_err());
        }
    }

    Deep Comparison

    Comparison: Example 161 — Digit Parser

    Single digit

    OCaml:

    let digit : int parser =
      map (fun c -> Char.code c - Char.code '0')
        (satisfy (fun c -> c >= '0' && c <= '9') "digit")
    

    Rust:

    fn digit<'a>() -> Parser<'a, u32> {
        map(satisfy(|c| c.is_ascii_digit(), "digit"), |c| c as u32 - '0' as u32)
    }
    

    Natural number

    OCaml:

    let natural : int parser =
      map (fun digits -> List.fold_left (fun acc d -> acc * 10 + d) 0 digits)
        (many1 digit)
    

    Rust:

    fn natural<'a>() -> Parser<'a, u64> {
        map(
            many1(satisfy(|c| c.is_ascii_digit(), "digit")),
            |digits| digits.iter().fold(0u64, |acc, &d| acc * 10 + (d as u64 - '0' as u64)),
        )
    }
    

    Signed integer

    OCaml:

    let integer : int parser = fun input ->
      match opt (satisfy (fun c -> c = '+' || c = '-') "sign") input with
      | Ok (sign, rest) ->
        (match natural rest with
         | Ok (n, rem) ->
           let value = match sign with Some '-' -> -n | _ -> n in
           Ok (value, rem)
         | Error e -> Error e)
      | Error e -> Error e
    

    Rust:

    fn integer<'a>() -> Parser<'a, i64> {
        Box::new(|input: &'a str| {
            let (sign, rest) = opt(satisfy(|c| c == '+' || c == '-', "sign"))(input)?;
            let (n, rem) = natural()(rest)?;
            let value = match sign {
                Some('-') => -(n as i64),
                _ => n as i64,
            };
            Ok((value, rem))
        })
    }
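
Both versions above convert the magnitude without a range check. One way to surface overflow as a parse error inside the combinator style is a mapping combinator whose function may fail. The sketch below is an assumption-laden illustration: try_map, integer_text, and checked_integer are hypothetical names that do not exist in the example's source, and the type aliases are repeated only to keep the block self-contained.

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Hypothetical combinator: like `map`, but the conversion may fail,
/// and its error becomes the parser's error.
fn try_map<'a, A: 'a, B: 'a, F>(p: Parser<'a, A>, f: F) -> Parser<'a, B>
where
    F: Fn(A) -> Result<B, String> + 'a,
{
    Box::new(move |input: &'a str| {
        let (v, rest) = p(input)?;
        Ok((f(v)?, rest))
    })
}

/// Recognizes `[+-]?[0-9]+` and returns the matched slice without converting it.
fn integer_text<'a>() -> Parser<'a, &'a str> {
    Box::new(|input: &'a str| {
        let sign_len = usize::from(input.starts_with('+') || input.starts_with('-'));
        let digit_len = input[sign_len..]
            .bytes()
            .take_while(|b| b.is_ascii_digit())
            .count();
        if digit_len == 0 {
            return Err("Expected digit".to_string());
        }
        Ok(input.split_at(sign_len + digit_len))
    })
}

/// Signed integer that reports overflow as a parse error instead of wrapping.
fn checked_integer<'a>() -> Parser<'a, i64> {
    try_map(integer_text(), |text: &str| {
        text.parse::<i64>()
            .map_err(|e| format!("Invalid integer {:?}: {}", text, e))
    })
}
```

Here the conversion is delegated to str::parse::<i64>, so the full range including i64::MIN is accepted and anything outside it becomes an Err carrying the offending text.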
    

    Exercises

  • Add parsing for hexadecimal integers: "0x1F" → 31.
  • Implement bounded_int<'a, const MIN: i64, const MAX: i64>() -> Parser<'a, i64> that fails if the parsed value is out of range.
  • Write a binary number parser: "0b1010" → 10.