ExamplesBy LevelBy TopicLearning Paths
154 Advanced

String Parser

Functional Programming

Tutorial

The Problem

Many grammars require matching fixed string literals: keywords like "true" and "false", operators like "+=", or delimiters like "<!--". Matching character by character would work but is verbose and error-prone for multi-character strings. The tag combinator matches an entire expected string at once, using str::starts_with for efficiency. Case-insensitive variants handle HTTP methods, SQL keywords, and HTML tag names where casing is not significant.

🎯 Learning Outcomes

  • • Implement tag (exact string match) and tag_no_case (case-insensitive match)
  • • Understand why string parsers return &'a str (a slice of the input) rather than String (an allocation)
  • • See how string parsers combine with other combinators to parse complex patterns
  • • Learn the lifetime relationships between the parser, input, and returned slice
  • Code Example

    fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
        let expected_owned = expected.to_string();
        Box::new(move |input: &'a str| {
            if input.starts_with(&expected_owned) {
                let rest = &input[expected_owned.len()..];
                Ok((&input[..expected_owned.len()], rest))
            } else {
                Err(format!("Expected \"{}\"", expected_owned))
            }
        })
    }

    Key Differences

  • Zero-copy returns: Rust's tag returns a &'a str slice — no allocation; OCaml typically returns a new string (allocation under GC).
  • Lifetime tracking: Rust's 'a lifetime annotation enforces the validity relationship between input and returned slice; OCaml has no lifetime concept.
  • Case folding: tag_no_case in Rust must be careful with Unicode case folding (.to_lowercase() allocates); OCaml's String.lowercase_ascii handles only ASCII.
  • Error messages: Both should include the expected string in the error; convention differs between libraries.
  • OCaml Approach

    OCaml's angstrom provides string : string -> string t and string_ci : string -> string t. Because OCaml strings are immutable byte sequences, string_ci may allocate for the lowercased comparison. The returned value is a newly allocated string, not a slice of the input — allocation is less of a concern under GC.

    Full Source

    #![allow(clippy::all)]
    // Example 154: String Parser
    // Parse exact string literals
    
    type ParseResult<'a, T> = Result<(T, &'a str), String>;
    type Parser<'a, T> = Box<dyn Fn(&'a str) -> ParseResult<'a, T> + 'a>;
    
    // ============================================================
    // Approach 1: Parse exact string (tag)
    // ============================================================
    
    fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
        let expected_owned = expected.to_string();
        Box::new(move |input: &'a str| {
            if input.starts_with(&expected_owned) {
                let rest = &input[expected_owned.len()..];
                Ok((&input[..expected_owned.len()], rest))
            } else {
                Err(format!("Expected \"{}\"", expected_owned))
            }
        })
    }
    
    // ============================================================
    // Approach 2: Case-insensitive string match
    // ============================================================
    
    fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
        let expected_lower = expected.to_lowercase();
        let len = expected.len();
        Box::new(move |input: &'a str| {
            if input.len() >= len && input[..len].to_lowercase() == expected_lower {
                Ok((&input[..len], &input[len..]))
            } else {
                Err(format!(
                    "Expected \"{}\" (case insensitive)",
                    expected_lower
                ))
            }
        })
    }
    
    // ============================================================
    // Approach 3: Build from character-by-character matching
    // ============================================================
    
    fn string_from_chars<'a>(expected: &str) -> Parser<'a, String> {
        let expected = expected.to_string();
        Box::new(move |input: &'a str| {
            let mut remaining = input;
            for expected_char in expected.chars() {
                match remaining.chars().next() {
                    Some(c) if c == expected_char => {
                        remaining = &remaining[c.len_utf8()..];
                    }
                    Some(c) => return Err(format!("Expected '{}', got '{}'", expected_char, c)),
                    None => return Err(format!("Expected '{}', got EOF", expected_char)),
                }
            }
            Ok((expected.clone(), remaining))
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_tag_match() {
            let p = tag("hello");
            assert_eq!(p("hello world"), Ok(("hello", " world")));
        }
    
        #[test]
        fn test_tag_exact() {
            let p = tag("hello");
            assert_eq!(p("hello"), Ok(("hello", "")));
        }
    
        #[test]
        fn test_tag_no_match() {
            let p = tag("hello");
            assert!(p("world").is_err());
        }
    
        #[test]
        fn test_tag_too_short() {
            let p = tag("hello");
            assert!(p("hel").is_err());
        }
    
        #[test]
        fn test_tag_no_case_upper() {
            let p = tag_no_case("Hello");
            assert_eq!(p("HELLO world"), Ok(("HELLO", " world")));
        }
    
        #[test]
        fn test_tag_no_case_mixed() {
            let p = tag_no_case("hello");
            assert_eq!(p("HeLLo!"), Ok(("HeLLo", "!")));
        }
    
        #[test]
        fn test_string_from_chars() {
            let p = string_from_chars("abc");
            assert_eq!(p("abcdef"), Ok(("abc".to_string(), "def")));
        }
    
        #[test]
        fn test_string_from_chars_fail() {
            let p = string_from_chars("abc");
            assert!(p("axc").is_err());
        }
    
        #[test]
        fn test_tag_empty_string() {
            let p = tag("");
            assert_eq!(p("anything"), Ok(("", "anything")));
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_tag_match() {
            let p = tag("hello");
            assert_eq!(p("hello world"), Ok(("hello", " world")));
        }
    
        #[test]
        fn test_tag_exact() {
            let p = tag("hello");
            assert_eq!(p("hello"), Ok(("hello", "")));
        }
    
        #[test]
        fn test_tag_no_match() {
            let p = tag("hello");
            assert!(p("world").is_err());
        }
    
        #[test]
        fn test_tag_too_short() {
            let p = tag("hello");
            assert!(p("hel").is_err());
        }
    
        #[test]
        fn test_tag_no_case_upper() {
            let p = tag_no_case("Hello");
            assert_eq!(p("HELLO world"), Ok(("HELLO", " world")));
        }
    
        #[test]
        fn test_tag_no_case_mixed() {
            let p = tag_no_case("hello");
            assert_eq!(p("HeLLo!"), Ok(("HeLLo", "!")));
        }
    
        #[test]
        fn test_string_from_chars() {
            let p = string_from_chars("abc");
            assert_eq!(p("abcdef"), Ok(("abc".to_string(), "def")));
        }
    
        #[test]
        fn test_string_from_chars_fail() {
            let p = string_from_chars("abc");
            assert!(p("axc").is_err());
        }
    
        #[test]
        fn test_tag_empty_string() {
            let p = tag("");
            assert_eq!(p("anything"), Ok(("", "anything")));
        }
    }

    Deep Comparison

    Comparison: Example 154 — String Parser

    tag (exact match)

    OCaml:

    let tag (expected : string) : string parser = fun input ->
      let len = String.length expected in
      if String.length input >= len && String.sub input 0 len = expected then
        Ok (expected, String.sub input len (String.length input - len))
      else
        Error (Printf.sprintf "Expected \"%s\"" expected)
    

    Rust:

    fn tag<'a>(expected: &str) -> Parser<'a, &'a str> {
        let expected_owned = expected.to_string();
        Box::new(move |input: &'a str| {
            if input.starts_with(&expected_owned) {
                let rest = &input[expected_owned.len()..];
                Ok((&input[..expected_owned.len()], rest))
            } else {
                Err(format!("Expected \"{}\"", expected_owned))
            }
        })
    }
    

    tag_no_case

    OCaml:

    let tag_no_case (expected : string) : string parser = fun input ->
      let len = String.length expected in
      if String.length input >= len &&
         String.lowercase_ascii (String.sub input 0 len) = String.lowercase_ascii expected then
        Ok (String.sub input 0 len, String.sub input len (String.length input - len))
      else
        Error (Printf.sprintf "Expected \"%s\" (case insensitive)" expected)
    

    Rust:

    fn tag_no_case<'a>(expected: &str) -> Parser<'a, &'a str> {
        let expected_lower = expected.to_lowercase();
        let len = expected.len();
        Box::new(move |input: &'a str| {
            if input.len() >= len && input[..len].to_lowercase() == expected_lower {
                Ok((&input[..len], &input[len..]))
            } else {
                Err(format!("Expected \"{}\" (case insensitive)", expected_lower))
            }
        })
    }
    

    Exercises

  • Implement tag_byte_slice(expected: &[u8]) -> Parser<&[u8]> that works on raw bytes instead of UTF-8 strings.
  • Write a bool_parser() -> Parser<bool> using tag("true") and tag("false") combined with choice.
  • Implement keyword(kw: &str) -> Parser<&str> that matches a string followed by a non-identifier character (word boundary).
  • Open Source Repos