164 Advanced

Number Parser

Functional Programming

Tutorial

The Problem

Floating-point numbers in text formats (JSON, CSV, scientific data) consist of an optional sign, integer digits, an optional decimal point with fractional digits, and an optional exponent (as in 1.5e-10). The components are optional or required only in specific combinations: "42", "3.14" and "1e3" are numbers, while "." and "e5" are not. This example builds a floating-point parser from combinators, showing how a complex lexical rule reduces to simple rules that are each clear and testable on their own.
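One way to pin down those combinations is to write them as grammar rules and give each rule its own small function. The Rust sketch below does that with our own naming (`digits`, `sign`, `mantissa`, `exponent` and `number` are illustrative, not the example's API). Each function reports how many bytes of the input its rule matched, so every component can be tested in isolation.

```rust
// Grammar (one reading of the rules above):
//   number   ::= sign? mantissa exponent?
//   mantissa ::= digits ('.' digits?)? | '.' digits
//   exponent ::= ('e' | 'E') sign? digits
//   sign     ::= '+' | '-'
//   digits   ::= [0-9]+
// Each function recognizes one rule at the start of `s` and returns the
// number of bytes matched, or None if the rule does not match there.
// Function names are illustrative, not taken from the example's source.

fn digits(s: &str) -> Option<usize> {
    let n = s.bytes().take_while(u8::is_ascii_digit).count();
    (n > 0).then_some(n)
}

// The sign is always optional, so this returns 0 or 1 instead of an Option.
fn sign(s: &str) -> usize {
    if s.starts_with('+') || s.starts_with('-') { 1 } else { 0 }
}

fn exponent(s: &str) -> Option<usize> {
    if !(s.starts_with('e') || s.starts_with('E')) {
        return None;
    }
    let head = 1 + sign(&s[1..]); // the 'e' plus an optional sign
    digits(&s[head..]).map(|n| head + n)
}

fn mantissa(s: &str) -> Option<usize> {
    let int_len = digits(s).unwrap_or(0);
    match s[int_len..].strip_prefix('.') {
        // "." alone is rejected: at least one side of the point needs digits.
        Some(after_dot) => {
            let frac_len = digits(after_dot).unwrap_or(0);
            (int_len + frac_len > 0).then_some(int_len + 1 + frac_len)
        }
        None => (int_len > 0).then_some(int_len),
    }
}

fn number(s: &str) -> Option<usize> {
    let mut len = sign(s);
    len += mantissa(&s[len..])?;
    // The exponent is optional: when it does not match it consumes nothing,
    // so "1e" yields the number "1" and leaves "e" for the next parser.
    len += exponent(&s[len..]).unwrap_or(0);
    Some(len)
}

fn main() {
    for input in ["42rest", "-3.14!", ".5", "1.5e-10", "1e", "e5", "."] {
        match number(input) {
            Some(n) => println!("{input:?} -> number {:?}, rest {:?}", &input[..n], &input[n..]),
            None => println!("{input:?} -> no number"),
        }
    }
}
```

Treating the exponent as a rule that consumes nothing on failure is what keeps "1e" from being an error: the number ends before the `e`.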

🎯 Learning Outcomes

• Build a complete floating-point parser with sign, integral, fractional, and exponent parts

• Learn how opt and many1 combine to handle optional and required components

• Understand the string-then-convert pattern: collect the number string, then call str::parse

• See how combinator parsers map directly to BNF grammar rules
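The first three outcomes fit in one small program. This sketch assumes the `Parser<'a, T>` type returned by the code example is a boxed closure yielding the parsed value and the remaining input; `PResult`, `satisfy`, `opt`, `many1` and `integer` are illustrative names, not necessarily those of the example's Rust source. `integer` composes `opt` and `many1`, then applies string-then-convert to produce an `i64`.

```rust
// Assumed parser shape: a boxed closure from input to (value, rest).
type PResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> PResult<'a, T> + 'a>;

/// Match one character that satisfies `pred`.
fn satisfy<'a, F>(pred: F, desc: &'a str) -> Parser<'a, char>
where
    F: Fn(char) -> bool + 'a,
{
    Box::new(move |input: &'a str| -> PResult<'a, char> {
        match input.chars().next() {
            Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..])),
            _ => Err(format!("Expected {desc}")),
        }
    })
}

/// Zero or one: never fails; on a miss it yields None and consumes nothing.
fn opt<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Option<T>> {
    Box::new(move |input: &'a str| -> PResult<'a, Option<T>> {
        match p(input) {
            Ok((v, rest)) => Ok((Some(v), rest)),
            Err(_) => Ok((None, input)),
        }
    })
}

/// One or more: the first match is required, further matches are greedy.
fn many1<'a, T: 'a>(p: Parser<'a, T>) -> Parser<'a, Vec<T>> {
    Box::new(move |input: &'a str| -> PResult<'a, Vec<T>> {
        let (first, mut rest) = p(input)?;
        let mut items = vec![first];
        while let Ok((v, next)) = p(rest) {
            items.push(v);
            rest = next;
        }
        Ok((items, rest))
    })
}

/// integer ::= sign? digit+, using string-then-convert: collect the
/// matched characters into a String, then hand it to str::parse.
fn integer<'a>() -> Parser<'a, i64> {
    let sign = opt(satisfy(|c| c == '+' || c == '-', "sign"));
    let digits = many1(satisfy(|c| c.is_ascii_digit(), "digit"));
    Box::new(move |input: &'a str| -> PResult<'a, i64> {
        let (s, rest) = sign(input)?;
        let (ds, rest) = digits(rest)?;
        let text: String = s.into_iter().chain(ds).collect();
        let n = text.parse::<i64>().map_err(|e| format!("{text}: {e}"))?;
        Ok((n, rest))
    })
}

fn main() {
    let p = integer();
    println!("{:?}", p("-42rest"));
    println!("{:?}", p("abc"));
}
```

Because `opt` turns a failure into `None` without consuming input, the sign may be absent; `many1` is the point where a missing digit becomes a real error.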

Code Example

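// Assumed parser type (the snippet relies on it but does not define it):
type Parser<'a, T> = Box<dyn Fn(&'a str) -> Result<(T, &'a str), String> + 'a>;

fn float_string<'a>() -> Parser<'a, &'a str> {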
    Box::new(|input: &'a str| {
        let bytes = input.as_bytes();
        let mut pos = 0;
        if pos < bytes.len() && (bytes[pos] == b'+' || bytes[pos] == b'-') { pos += 1; }
        while pos < bytes.len() && bytes[pos].is_ascii_digit() { pos += 1; }
        // ... decimal, exponent ...
        Ok((&input[..pos], &input[pos..]))
    })
}

let float_string : string parser = fun input ->
  let buf = Buffer.create 16 in
  let pos = ref 0 in
  let len = String.length input in
  if !pos < len && (input.[!pos] = '+' || input.[!pos] = '-') then begin
    Buffer.add_char buf input.[!pos]; incr pos end;
  while !pos < len && is_digit input.[!pos] do
    Buffer.add_char buf input.[!pos]; incr pos done;
  (* ... decimal, exponent ... *)
  Ok (Buffer.contents buf, String.sub input !pos (len - !pos))
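Both snippets elide the decimal and exponent steps. A complete version of the Rust scanner might look like the sketch below, which assumes the same `Parser` type as a boxed closure and adds a hypothetical `float_value` parser that applies the string-then-convert pattern. Unlike the snippet, it fails when no mantissa digit is present, and it consumes an exponent only when digits follow the `e`.

```rust
// Assumed parser type: the snippets above do not show its definition.
type Parser<'a, T> = Box<dyn Fn(&'a str) -> Result<(T, &'a str), String> + 'a>;

/// Number of ASCII digits at the start of `bytes[from..]`.
fn digit_run(bytes: &[u8], from: usize) -> usize {
    bytes[from..].iter().take_while(|b| b.is_ascii_digit()).count()
}

/// sign? digits? ('.' digits?)? (('e' | 'E') sign? digits)?
/// with at least one mantissa digit; returns the matched slice, no allocation.
fn float_string<'a>() -> Parser<'a, &'a str> {
    Box::new(|input: &'a str| -> Result<(&'a str, &'a str), String> {
        let bytes = input.as_bytes();
        let mut pos = 0;
        if matches!(bytes.first(), Some(b'+' | b'-')) {
            pos += 1;
        }
        let int_digits = digit_run(bytes, pos);
        pos += int_digits;
        let mut frac_digits = 0;
        if bytes.get(pos) == Some(&b'.') {
            frac_digits = digit_run(bytes, pos + 1);
            pos += 1 + frac_digits;
        }
        if int_digits + frac_digits == 0 {
            return Err("Expected number".to_string());
        }
        // Exponent: consumed only if at least one digit follows e/E and its sign.
        if matches!(bytes.get(pos), Some(b'e' | b'E')) {
            let mut exp = pos + 1;
            if matches!(bytes.get(exp), Some(b'+' | b'-')) {
                exp += 1;
            }
            let exp_digits = digit_run(bytes, exp);
            if exp_digits > 0 {
                pos = exp + exp_digits;
            }
        }
        Ok((&input[..pos], &input[pos..]))
    })
}

/// String-then-convert: scan the text, then let str::parse build the f64.
/// `float_value` is a hypothetical name used only in this sketch.
fn float_value<'a>() -> Parser<'a, f64> {
    let scan = float_string();
    Box::new(move |input: &'a str| -> Result<(f64, &'a str), String> {
        let (text, rest) = scan(input)?;
        let value = text.parse::<f64>().map_err(|e| format!("{text}: {e}"))?;
        Ok((value, rest))
    })
}

fn main() {
    let f = float_value();
    for input in ["3.14!", "-2.5x", "1.5e-3", "1e", "e5"] {
        println!("{input:?} -> {:?}", f(input));
    }
}
```

Keeping the scan and the conversion separate means the scanner only decides where the number ends, while `str::parse` does the numerically delicate work of rounding to the nearest `f64`.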

Key Differences

Strictness vs. brevity: OCaml's take_while1 + float_of_string shortcut is concise but permissive; a Rust combinator parser that follows the grammar is strict but more verbose.

Exception vs. Result: OCaml's float_of_string raises Failure on invalid input; Rust's str::parse::<f64>() returns Result, propagated via ?.

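Buffer efficiency: Angstrom's take_while1 hands back the matched span as a single string; a Rust combinator version built from many1(digit) collects a Vec<char> before converting, while the slice-based float_string above returns &input[..pos] without allocating.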

Locale: both expect . as the decimal separator, as in the C locale; Rust's str::parse never consults the system locale. Accepting forms such as 3,14 requires additional handling.
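The exception-vs-Result difference is easiest to see when several conversions happen in a row. In this sketch (the function `sum_fields` is hypothetical), `?` returns the first `ParseFloatError` to the caller; the equivalent OCaml loop would wrap float_of_string in `try ... with Failure _ -> ...` or switch to float_of_string_opt.

```rust
use std::num::ParseFloatError;

/// Sum a comma-separated list of floats.
fn sum_fields(line: &str) -> Result<f64, ParseFloatError> {
    let mut total = 0.0;
    for field in line.split(',') {
        // `?` returns early with the first field that fails to parse.
        total += field.trim().parse::<f64>()?;
    }
    Ok(total)
}

fn main() {
    println!("{:?}", sum_fields("1.5, 2.5, 1e1")); // Ok(14.0)
    match sum_fields("1.5, oops") {
        Ok(v) => println!("sum = {v}"),
        Err(e) => println!("error: {e}"), // prints "error: invalid float literal"
    }
}
```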

OCaml Approach

Angstrom provides a direct approach:

open Angstrom

let number =
  take_while1 (function
    | '0' .. '9' | '.' | 'e' | 'E' | '+' | '-' -> true
    | _ -> false)
  >>| float_of_string

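This is a common shortcut, but the character-class scan also accepts malformed text such as "1.2.3"; float_of_string then raises Failure, and that exception propagates out of the parser instead of becoming an ordinary parse error. A stricter combinator parser follows the BNF more closely.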
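The same shortcut in Rust (a sketch; `number_shortcut` is our own name) shows the difference in failure handling: malformed text still gets through the scan, but `str::parse` reports it as an `Err` value. It also exposes a second cost of greedy character classes: they can swallow characters that belong to the next token.

```rust
/// Rust analogue of the Angstrom shortcut: take every character that could
/// appear in a float, then convert. Validation is left to str::parse,
/// which reports a malformed number as an Err value instead of raising.
fn number_shortcut(input: &str) -> Result<(f64, &str), String> {
    let is_float_char =
        |c: char| c.is_ascii_digit() || matches!(c, '.' | 'e' | 'E' | '+' | '-');
    // Byte index of the first character outside the class (all class chars are ASCII).
    let end = input.find(|c: char| !is_float_char(c)).unwrap_or(input.len());
    if end == 0 {
        // Like take_while1, require at least one matching character.
        return Err("Expected number".to_string());
    }
    let (text, rest) = input.split_at(end);
    text.parse::<f64>()
        .map(|v| (v, rest))
        .map_err(|e| format!("{text:?}: {e}"))
}

fn main() {
    for input in ["3.14 rest", "1.2.3", "1-2", "2east"] {
        println!("{input:?} -> {:?}", number_shortcut(input));
    }
}
```

A grammar-following scanner would instead return 1.0 for "1-2" and 2.0 for "2east", leaving "-2" and "east" for the next parser.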

Full Source

//! # Number Parser
//!
//! Parse integers and floats from `&str` with validation and error handling.
//!
//! Four approaches, from delegating to the standard library to scanning digits by hand:
//!   * [`parse_int_safe`] — delegate to the standard library's `str::parse`.
//!   * [`parse_int_custom`] — scan digits ourselves, rejecting any non-digit.
//!   * [`parse_int_with_sign`] — extend the custom scanner with an optional `+`/`-` prefix.
//!   * [`parse_float_safe`] — standard-library float parsing, same shape as the integer version.
//!
//! Each function returns `Result<T, String>` so callers can distinguish success from
//! a malformed input and get a human-readable message back.

/// Parse a decimal integer, with an optional sign, using the standard library.
///
/// Returns `Err` with a human-readable message if the input is not a valid `i64`.
pub fn parse_int_safe(s: &str) -> Result<i64, String> {
    s.parse::<i64>()
        .map_err(|_| format!("Not a valid integer: {s}"))
}

/// Parse an unsigned decimal integer by scanning each character.
///
/// Any non-digit (including a leading sign or a trailing letter) is rejected.
/// An empty input is also rejected, matching the OCaml version, and so is a
/// value too large for `i64`.
pub fn parse_int_custom(s: &str) -> Result<i64, String> {
    if s.is_empty() || !s.bytes().all(|b| b.is_ascii_digit()) {
        return Err(format!("Invalid characters: {s}"));
    }
    s.bytes()
        .try_fold(0i64, |acc, b| {
            acc.checked_mul(10)?.checked_add(i64::from(b - b'0'))
        })
        .ok_or_else(|| format!("Out of range for i64: {s}"))
}

/// Parse a decimal integer with an optional leading `+` or `-`.
///
/// Digits are accumulated with the sign already applied, so the whole `i64`
/// range is accepted, including `i64::MIN` (whose magnitude does not fit in `i64`).
pub fn parse_int_with_sign(s: &str) -> Result<i64, String> {
    let (negative, digits) = match s.as_bytes().first() {
        Some(b'-') => (true, &s[1..]),
        Some(b'+') => (false, &s[1..]),
        _ => (false, s),
    };
    let error = || {
        if negative {
            format!("Invalid negative number: {s}")
        } else {
            format!("Invalid positive number: {s}")
        }
    };

    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(error());
    }

    digits
        .bytes()
        .try_fold(0i64, |acc, b| {
            let d = i64::from(b - b'0');
            let acc = acc.checked_mul(10)?;
            if negative {
                acc.checked_sub(d)
            } else {
                acc.checked_add(d)
            }
        })
        .ok_or_else(error)
}

/// Parse a floating-point number using the standard library.
pub fn parse_float_safe(s: &str) -> Result<f64, String> {
    s.parse::<f64>()
        .map_err(|_| format!("Not a valid float: {s}"))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn int_safe_accepts_valid_input() {
        assert_eq!(parse_int_safe("42"), Ok(42));
        assert_eq!(parse_int_safe("-17"), Ok(-17));
        assert_eq!(parse_int_safe("0"), Ok(0));
    }

    #[test]
    fn int_safe_rejects_invalid_input() {
        assert_eq!(
            parse_int_safe("abc"),
            Err("Not a valid integer: abc".to_string())
        );
        assert_eq!(parse_int_safe(""), Err("Not a valid integer: ".to_string()));
        assert!(parse_int_safe("1.5").is_err());
    }

    #[test]
    fn int_custom_accepts_only_digits() {
        assert_eq!(parse_int_custom("123"), Ok(123));
        assert_eq!(parse_int_custom("0"), Ok(0));
    }

    #[test]
    fn int_custom_rejects_non_digits() {
        assert_eq!(
            parse_int_custom("12a3"),
            Err("Invalid characters: 12a3".to_string())
        );
        assert_eq!(
            parse_int_custom(""),
            Err("Invalid characters: ".to_string())
        );
        assert!(
            parse_int_custom("-5").is_err(),
            "sign belongs to parse_int_with_sign"
        );
    }

    #[test]
    fn int_with_sign_handles_prefixes() {
        assert_eq!(parse_int_with_sign("+5"), Ok(5));
        assert_eq!(parse_int_with_sign("-5"), Ok(-5));
        assert_eq!(parse_int_with_sign("5"), Ok(5));
    }

    #[test]
    fn int_with_sign_reports_direction() {
        assert_eq!(
            parse_int_with_sign("-abc"),
            Err("Invalid negative number: -abc".to_string())
        );
        assert_eq!(
            parse_int_with_sign("+abc"),
            Err("Invalid positive number: +abc".to_string())
        );
        assert_eq!(
            parse_int_with_sign("abc"),
            Err("Invalid positive number: abc".to_string())
        );
    }

    #[test]
    fn float_safe_accepts_valid_input() {
        assert_eq!(parse_float_safe("2.5"), Ok(2.5));
        assert_eq!(parse_float_safe("-2.0"), Ok(-2.0));
        assert_eq!(parse_float_safe("1e10"), Ok(1e10));
    }

    #[test]
    fn float_safe_rejects_invalid_input() {
        assert_eq!(
            parse_float_safe("abc"),
            Err("Not a valid float: abc".to_string())
        );
    }

    #[test]
    fn int_custom_detects_overflow() {
        // 2^63 = 9223372036854775808, one past i64::MAX.
        assert!(parse_int_custom("9223372036854775808").is_err());
    }
}

(* Example 164: Number Parser *)
(* Parse floating point numbers with optional sign and decimal *)

type 'a parse_result = ('a * string, string) result
type 'a parser = string -> 'a parse_result

let satisfy pred desc : char parser = fun input ->
  if String.length input > 0 && pred input.[0] then
    Ok (input.[0], String.sub input 1 (String.length input - 1))
  else Error (Printf.sprintf "Expected %s" desc)

let many0 p : 'a list parser = fun input ->
  let rec go acc r = match p r with Ok (v, r') -> go (v::acc) r' | Error _ -> Ok (List.rev acc, r)
  in go [] input

let many1 p : 'a list parser = fun input ->
  match p input with Error e -> Error e
  | Ok (v, r) -> match many0 p r with Ok (vs, r') -> Ok (v::vs, r') | Error e -> Error e

let opt p : 'a option parser = fun input ->
  match p input with Ok (v, r) -> Ok (Some v, r) | Error _ -> Ok (None, input)

let is_digit c = c >= '0' && c <= '9'
let digit = satisfy is_digit "digit"

(* Approach 1: Float as string collection *)
let float_string : string parser = fun input ->
  let buf = Buffer.create 16 in
  let pos = ref 0 in
  let len = String.length input in
  let mantissa_digits = ref 0 in
  (* optional sign *)
  if !pos < len && (input.[!pos] = '+' || input.[!pos] = '-') then begin
    Buffer.add_char buf input.[!pos]; incr pos end;
  (* integer part *)
  while !pos < len && is_digit input.[!pos] do
    Buffer.add_char buf input.[!pos]; incr pos; incr mantissa_digits done;
  (* decimal part *)
  if !pos < len && input.[!pos] = '.' then begin
    Buffer.add_char buf '.'; incr pos;
    while !pos < len && is_digit input.[!pos] do
      Buffer.add_char buf input.[!pos]; incr pos; incr mantissa_digits done end;
  (* exponent: taken only after a mantissa digit and only if digits follow *)
  if !mantissa_digits > 0 && !pos < len
     && (input.[!pos] = 'e' || input.[!pos] = 'E') then begin
    let saved_pos = !pos and saved_len = Buffer.length buf in
    Buffer.add_char buf input.[!pos]; incr pos;
    if !pos < len && (input.[!pos] = '+' || input.[!pos] = '-') then begin
      Buffer.add_char buf input.[!pos]; incr pos end;
    let exp_start = !pos in
    while !pos < len && is_digit input.[!pos] do
      Buffer.add_char buf input.[!pos]; incr pos done;
    (* "1e" or "1e+": no exponent digits, so give the 'e' back *)
    if !pos = exp_start then begin
      pos := saved_pos; Buffer.truncate buf saved_len end end;
  (* reject "", "-", "." and "e5": at least one mantissa digit is required *)
  if !mantissa_digits = 0 then Error "Expected number"
  else Ok (Buffer.contents buf, String.sub input !pos (len - !pos))

(* Approach 2: Combinator-based *)
let chars_to_string chars = String.of_seq (List.to_seq chars)

let number_combinator : float parser = fun input ->
  match opt (satisfy (fun c -> c = '+' || c = '-') "sign") input with
  | Ok (sign, r1) ->
    (match many1 digit r1 with
     | Ok (int_part, r2) ->
       (match opt (satisfy (fun c -> c = '.') "dot") r2 with
        | Ok (Some _, r3) ->
          (match many0 digit r3 with
           | Ok (frac_part, r4) ->
             let s = (match sign with Some c -> String.make 1 c | None -> "") ^
                     chars_to_string int_part ^ "." ^ chars_to_string frac_part in
             Ok (float_of_string s, r4)
           | Error e -> Error e)
        | Ok (None, r3) ->
          let s = (match sign with Some c -> String.make 1 c | None -> "") ^
                  chars_to_string int_part in
          Ok (float_of_string s, r3)
        | Error e -> Error e)
     | Error _ ->
       (match satisfy (fun c -> c = '.') "dot" r1 with
        | Ok (_, r2) ->
          (match many1 digit r2 with
           | Ok (frac_part, r3) ->
             let s = (match sign with Some c -> String.make 1 c | None -> "") ^
                     "0." ^ chars_to_string frac_part in
             Ok (float_of_string s, r3)
           | Error e -> Error e)
        | Error _ -> Error "Expected number"))
  | Error e -> Error e

(* Tests *)
let () =
  assert (float_string "42rest" = Ok ("42", "rest"));
  assert (float_string "3.14!" = Ok ("3.14", "!"));
  assert (float_string "-2.5x" = Ok ("-2.5", "x"));
  assert (float_string "1e10" = Ok ("1e10", ""));
  assert (float_string "1.5e-3" = Ok ("1.5e-3", ""));

  assert (number_combinator "42" = Ok (42.0, ""));
  assert (number_combinator "3.14" = Ok (3.14, ""));
  assert (number_combinator "-2.5" = Ok (-2.5, ""));
  assert (number_combinator ".5" = Ok (0.5, ""));

  print_endline "✓ All tests passed"

Deep Comparison

Comparison: Example 164 — Number Parser

Imperative scanner

OCaml:

let float_string : string parser = fun input ->
  let buf = Buffer.create 16 in
  let pos = ref 0 in
  let len = String.length input in
  if !pos < len && (input.[!pos] = '+' || input.[!pos] = '-') then begin
    Buffer.add_char buf input.[!pos]; incr pos end;
  while !pos < len && is_digit input.[!pos] do
    Buffer.add_char buf input.[!pos]; incr pos done;
  (* ... decimal, exponent ... *)
  Ok (Buffer.contents buf, String.sub input !pos (len - !pos))

Rust:

fn float_string<'a>() -> Parser<'a, &'a str> {
    Box::new(|input: &'a str| {
        let bytes = input.as_bytes();
        let mut pos = 0;
        if pos < bytes.len() && (bytes[pos] == b'+' || bytes[pos] == b'-') { pos += 1; }
        while pos < bytes.len() && bytes[pos].is_ascii_digit() { pos += 1; }
        // ... decimal, exponent ...
        Ok((&input[..pos], &input[pos..]))
    })
}

String to float conversion

OCaml:

float_of_string "3.14"  (* 3.14 *)

Rust:

"3.14".parse::<f64>()  // Ok(3.14)
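The one-line conversion still follows a precise grammar, and that grammar decides how much checking a hand-written scanner must do before handing text over. The checks below record what Rust's `str::parse::<f64>` accepts and rejects; `accepts` is only a helper for the demonstration.

```rust
/// True if the whole string is accepted by Rust's f64 parser.
fn accepts(s: &str) -> bool {
    s.parse::<f64>().is_ok()
}

fn main() {
    // Optional sign, digits on at least one side of '.', optional exponent,
    // and the special values inf / infinity / nan (case-insensitive).
    for s in ["3.14", "+3.14", ".5", "5.", "2.0E+3", "inf", "NaN"] {
        assert!(accepts(s), "{s:?} should parse");
    }
    // No digits, a dangling exponent, surrounding whitespace, digit
    // separators, or a second decimal point are all rejected.
    for s in ["", ".", "e5", "1e", " 3.14", "1_000", "1.2.3"] {
        assert!(!accepts(s), "{s:?} should be rejected");
    }
    println!("all checks passed");
}
```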

Exercises

Add exponent parsing to number_combinator so that "1.5e-10" and "2.0E+3" parse correctly.

Implement a strict JSON number parser that rejects leading zeros ("01" is invalid in JSON).

Write a parser for rational numbers in the form "3/4" → (3, 4) as a pair of integers.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.
