164 Advanced

Number Parser

Functional Programming

Tutorial

The Problem

Floating-point numbers in text formats (JSON, CSV, scientific data) require parsing an optional sign, integer digits, an optional decimal point with fractional digits, and an optional exponent (as in 1.5e-10). Each component is optional or required only in specific combinations: a number needs digits somewhere, and an exponent marker is meaningful only when digits follow it. This example builds a full floating-point parser using combinators, demonstrating how complex lexical rules reduce to a composition of simple rules that can each be tested on their own.

🎯 Learning Outcomes

  • Build a complete floating-point parser with sign, integral, fractional, and exponent parts
  • Learn how opt and many1 combine to handle optional and required components
  • Understand the string-then-convert pattern: collect the number string, then call str::parse
  • See how combinator parsers map directly to BNF grammar rules
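The outcomes above mention opt and many1, so here is a minimal sketch of how such combinators could be written against the same Parser type used in the full source. The helpers satisfy, opt, many1, seq, digits, sign, and strict_number are hypothetical names introduced for illustration and are not part of the article's code. The grammar in the comment is an assumed simplification that requires integer digits, so it rejects ".5".

```rust
type ParseResult<'a, T> = Result<(T, &'a str), String>;
type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;

/// Match a single char accepted by `pred` and return it as a slice of the input.
fn satisfy<'a>(pred: fn(char) -> bool, what: &'static str) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        match input.chars().next() {
            Some(c) if pred(c) => Ok(input.split_at(c.len_utf8())),
            _ => Err(format!("Expected {}", what)),
        }
    })
}

/// opt: never fails; yields an empty slice and the untouched input when `p` fails.
fn opt<'a>(p: Parser<'a, &'a str>) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        match p(input) {
            Ok(matched) => Ok(matched),
            Err(_) => Ok(("", input)),
        }
    })
}

/// many1: one or more repetitions of `p`, returned as one consumed slice.
fn many1<'a>(p: Parser<'a, &'a str>) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        let (_, mut rest) = p(input)?; // the first match is required
        while let Ok((_, next)) = p(rest) {
            if next.len() == rest.len() {
                break; // guard: stop if `p` succeeded without consuming input
            }
            rest = next;
        }
        Ok(input.split_at(input.len() - rest.len()))
    })
}

/// seq: run the parsers in order and return everything they consumed as one slice.
fn seq<'a>(parsers: Vec<Parser<'a, &'a str>>) -> Parser<'a, &'a str> {
    Box::new(move |input: &'a str| -> ParseResult<'a, &'a str> {
        let mut rest = input;
        for p in &parsers {
            let (_, next) = p(rest)?;
            rest = next;
        }
        Ok(input.split_at(input.len() - rest.len()))
    })
}

fn digits<'a>() -> Parser<'a, &'a str> {
    many1(satisfy(|c: char| c.is_ascii_digit(), "digit"))
}

fn sign<'a>() -> Parser<'a, &'a str> {
    opt(satisfy(|c: char| c == '+' || c == '-', "sign"))
}

/// Assumed grammar: number ::= sign? digit+ ('.' digit+)? (('e' | 'E') sign? digit+)?
fn strict_number<'a>() -> Parser<'a, f64> {
    let fraction = opt(seq(vec![satisfy(|c: char| c == '.', "'.'"), digits()]));
    let exponent = opt(seq(vec![
        satisfy(|c: char| c == 'e' || c == 'E', "exponent marker"),
        sign(),
        digits(),
    ]));
    let text = seq(vec![sign(), digits(), fraction, exponent]);
    Box::new(move |input: &'a str| -> ParseResult<'a, f64> {
        let (s, rest) = text(input)?;
        // string-then-convert: the grammar only decides where the number ends
        match s.parse::<f64>() {
            Ok(n) => Ok((n, rest)),
            Err(_) => Err(format!("Invalid number: {}", s)),
        }
    })
}
```

Because opt wraps the whole exponent sequence, an input such as "1erest" falls back to parsing 1 and leaves "erest" unconsumed. Each grammar rule corresponds to one small parser that can be tested in isolation.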

    Code Example

    fn float_string<'a>() -> Parser<'a, &'a str> {
        Box::new(|input: &'a str| {
            let bytes = input.as_bytes();
            let mut pos = 0;
            if pos < bytes.len() && (bytes[pos] == b'+' || bytes[pos] == b'-') { pos += 1; }
            while pos < bytes.len() && bytes[pos].is_ascii_digit() { pos += 1; }
            // ... decimal, exponent ...
            Ok((&input[..pos], &input[pos..]))
        })
    }

    Key Differences

  • Precision shortcut: OCaml's take_while1 + float_of_string is concise but permissive; the Rust parsers check the structure of the number before converting, at the cost of more code.
  • Exception vs. Result: OCaml's float_of_string raises Failure on invalid input; Rust's str::parse::<f64>() returns a Result, which the parser matches on and converts into its own String error.
  • Buffer efficiency: OCaml's take_while1 works directly on the buffer, and Rust's float_string likewise slices the input bytes without copying; only number_combinator collects a Vec<char> before converting.
  • Locale: Both treat . as the decimal separator, and Rust's str::parse does not consult the system locale at all; locale-aware parsing (for example, a decimal comma) requires additional handling.
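The buffer-efficiency point can be sketched in Rust without the Vec<char> copy. The helper below is hypothetical: its name is borrowed from Angstrom's take_while1, and it is not part of the article's source. It relies only on str::find with a closure pattern and str::split_at from the standard library.

```rust
/// Split `input` into the longest non-empty prefix whose chars satisfy `pred`
/// and the remainder, without allocating.
fn take_while1(input: &str, pred: impl Fn(char) -> bool) -> Result<(&str, &str), String> {
    // `find` returns the byte offset of the first char that fails `pred`,
    // which is always a valid char boundary for `split_at`.
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    if end == 0 {
        return Err("Expected at least one matching character".to_string());
    }
    Ok(input.split_at(end))
}
```

Since the offset comes back in bytes, there is no need to convert a char count into a byte length afterwards, which is the extra step number_combinator performs.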

    OCaml Approach

    Angstrom provides a direct approach:

    let number =
      take_while1 (fun c -> Char.is_digit c || c = '.' || c = 'e' || c = 'E'
                            || c = '+' || c = '-')
      >>| float_of_string
    

    This is a common shortcut, but it is permissive: take_while1 also collects malformed strings such as "1.2.3", which float_of_string then rejects by raising an exception. A stricter combinator parser follows the BNF more closely.
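For comparison, a rough Rust analogue of this shortcut might look like the sketch below. The function permissive_number is a hypothetical name, not part of the article's source, and it illustrates the trade-off rather than a recommended design: the character filter decides where the token ends, and only the conversion step notices that the token is malformed.

```rust
/// Collect every char that could appear in a number, then convert the result.
fn permissive_number(input: &str) -> Result<(f64, &str), String> {
    let is_number_char =
        |c: char| c.is_ascii_digit() || matches!(c, '.' | 'e' | 'E' | '+' | '-');
    // Byte offset of the first char that cannot belong to a number.
    let end = input.find(|c: char| !is_number_char(c)).unwrap_or(input.len());
    let (text, rest) = input.split_at(end);
    // Malformed tokens such as "1.2.3" are only detected here.
    text.parse::<f64>()
        .map(|n| (n, rest))
        .map_err(|e| format!("Invalid number {:?}: {}", text, e))
}
```

Where OCaml raises an exception, this version reports the same failure as an Err value. It also rejects "1-2" outright, because the filter swallows the whole string as a single token.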

    Full Source

    #![allow(clippy::all)]
    // Example 164: Number Parser
    // Parse floating point numbers with optional sign and decimal
    
    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    
    // ============================================================
    // Approach 1: Imperative scanner — collect number string
    // ============================================================
    
    fn float_string<'a>() -> Parser<'a, &'a str> {
        Box::new(|input: &'a str| {
            let bytes = input.as_bytes();
            let len = bytes.len();
            let mut pos = 0;
            // optional sign
            if pos < len && (bytes[pos] == b'+' || bytes[pos] == b'-') {
                pos += 1;
            }
            let int_start = pos;
            // integer part
            while pos < len && bytes[pos].is_ascii_digit() {
                pos += 1;
            }
            let has_int = pos > int_start;
            // decimal part
            let mut has_frac = false;
            if pos < len && bytes[pos] == b'.' {
                pos += 1;
                let frac_start = pos;
                while pos < len && bytes[pos].is_ascii_digit() {
                    pos += 1;
                }
                has_frac = pos > frac_start;
            }
            // a number needs digits before or after the dot;
            // this rejects "", "-", "." and "e5"
            if !has_int && !has_frac {
                return Err("Expected number".to_string());
            }
            // exponent: consumed only when at least one digit follows,
            // so "1e" yields "1" and leaves "e" in the remaining input
            if pos < len && (bytes[pos] == b'e' || bytes[pos] == b'E') {
                let mut end = pos + 1;
                if end < len && (bytes[end] == b'+' || bytes[end] == b'-') {
                    end += 1;
                }
                let exp_start = end;
                while end < len && bytes[end].is_ascii_digit() {
                    end += 1;
                }
                if end > exp_start {
                    pos = end;
                }
            }
            Ok((&input[..pos], &input[pos..]))
        })
    }
    
    // ============================================================
    // Approach 2: Parse to f64 directly
    // ============================================================
    
    fn number<'a>() -> Parser<'a, f64> {
        Box::new(|input: &'a str| {
            let (s, rest) = float_string()(input)?;
            match s.parse::<f64>() {
                Ok(n) => Ok((n, rest)),
                Err(_) => Err(format!("Invalid number: {}", s)),
            }
        })
    }
    
    // ============================================================
    // Approach 3: Char-indexed scan that requires integer or fractional digits
    // ============================================================
    
    fn number_combinator<'a>() -> Parser<'a, f64> {
        Box::new(|input: &'a str| {
            let mut pos = 0;
            let chars: Vec<char> = input.chars().collect();
            let len = chars.len();
    
            // optional sign
            if pos < len && (chars[pos] == '+' || chars[pos] == '-') {
                pos += 1;
            }
    
            let digit_start = pos;
            // integer part
            while pos < len && chars[pos].is_ascii_digit() {
                pos += 1;
            }
            let has_int = pos > digit_start;
    
            // decimal part
            let mut has_frac = false;
            if pos < len && chars[pos] == '.' {
                pos += 1;
                let frac_start = pos;
                while pos < len && chars[pos].is_ascii_digit() {
                    pos += 1;
                }
                has_frac = pos > frac_start;
            }
    
            if !has_int && !has_frac {
                return Err("Expected number".to_string());
            }
    
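            // exponent: consumed only when at least one digit follows,
            // so "1e" parses as 1 and leaves "e" in the remaining input
            if pos < len && (chars[pos] == 'e' || chars[pos] == 'E') {
                let mut end = pos + 1;
                if end < len && (chars[end] == '+' || chars[end] == '-') {
                    end += 1;
                }
                let exp_start = end;
                while end < len && chars[end].is_ascii_digit() {
                    end += 1;
                }
                if end > exp_start {
                    pos = end;
                }
            }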
    
            let byte_len: usize = chars[..pos].iter().map(|c| c.len_utf8()).sum();
            let num_str = &input[..byte_len];
            match num_str.parse::<f64>() {
                Ok(n) => Ok((n, &input[byte_len..])),
                Err(_) => Err(format!("Invalid number: {}", num_str)),
            }
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_integer() {
            assert_eq!(float_string()("42rest"), Ok(("42", "rest")));
        }
    
        #[test]
        fn test_float() {
            assert_eq!(float_string()("3.14!"), Ok(("3.14", "!")));
        }
    
        #[test]
        fn test_negative() {
            assert_eq!(float_string()("-2.5x"), Ok(("-2.5", "x")));
        }
    
        #[test]
        fn test_exponent() {
            assert_eq!(float_string()("1e10"), Ok(("1e10", "")));
        }
    
        #[test]
        fn test_full_scientific() {
            assert_eq!(float_string()("1.5e-3rest"), Ok(("1.5e-3", "rest")));
        }
    
        #[test]
        fn test_number_f64() {
            let (n, _) = number()("3.14").unwrap();
            assert!((n - 3.14).abs() < 1e-10);
        }
    
        #[test]
        fn test_number_negative() {
            assert_eq!(number()("-42"), Ok((-42.0, "")));
        }
    
        #[test]
        fn test_number_combinator() {
            let (n, _) = number_combinator()("3.14").unwrap();
            assert!((n - 3.14).abs() < 1e-10);
        }
    
        #[test]
        fn test_number_fail() {
            assert!(number()("abc").is_err());
        }
    
        #[test]
        fn test_leading_dot() {
            let (n, _) = number_combinator()(".5").unwrap();
            assert!((n - 0.5).abs() < 1e-10);
        }
    }

    Deep Comparison

    Comparison: Example 164 — Number Parser

    Imperative scanner

    OCaml:

    let float_string : string parser = fun input ->
      let buf = Buffer.create 16 in
      let pos = ref 0 in
      let len = String.length input in
      if !pos < len && (input.[!pos] = '+' || input.[!pos] = '-') then begin
        Buffer.add_char buf input.[!pos]; incr pos end;
      while !pos < len && is_digit input.[!pos] do
        Buffer.add_char buf input.[!pos]; incr pos done;
      (* ... decimal, exponent ... *)
      Ok (Buffer.contents buf, String.sub input !pos (len - !pos))
    

    Rust:

    fn float_string<'a>() -> Parser<'a, &'a str> {
        Box::new(|input: &'a str| {
            let bytes = input.as_bytes();
            let mut pos = 0;
            if pos < bytes.len() && (bytes[pos] == b'+' || bytes[pos] == b'-') { pos += 1; }
            while pos < bytes.len() && bytes[pos].is_ascii_digit() { pos += 1; }
            // ... decimal, exponent ...
            Ok((&input[..pos], &input[pos..]))
        })
    }
    

    String to float conversion

    OCaml:

    float_of_string "3.14"  (* 3.14 *)
    

    Rust:

    "3.14".parse::<f64>()  // Ok(3.14)
    
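    To make the conversion step concrete, here is a small sketch of which strings str::parse::<f64>() accepts. The wrapper to_f64 is a hypothetical helper, not part of the article's source; it only maps the standard library's error into the String error type used by the parsers.

```rust
/// Convert a scanned number string, mapping the standard library's
/// ParseFloatError into a String error.
fn to_f64(text: &str) -> Result<f64, String> {
    text.parse::<f64>()
        .map_err(|e| format!("Invalid number {:?}: {}", text, e))
}

/// A few inputs and whether the conversion accepts them.
fn conversion_examples() -> Vec<(&'static str, bool)> {
    ["3.14", ".5", "1.", "+2", "1e10", "inf", ".", "1e", "1.2.3", " 1", ""]
        .iter()
        .map(|s| (*s, to_f64(s).is_ok()))
        .collect()
}
```

    The conversion accepts a leading or trailing dot as long as digits appear on one side, and it rejects surrounding whitespace. It also accepts special values such as "inf", so a scanner that should admit only digit-based numbers has to filter those out before converting, as the scanners in this example do.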

    Exercises

  • Add exponent parsing: "1.5e-10", "2.0E+3" should parse correctly.
  • Implement a strict JSON number parser that rejects leading zeros ("01" is invalid in JSON).
  • Write a parser for rational numbers in the form "3/4" that returns the pair of integers (3, 4).