ExamplesBy LevelBy TopicLearning Paths
170 Advanced

Error Recovery

Functional Programming

Tutorial

The Problem

Production parsers must provide actionable error messages, not "parse error at offset 423." Users need to know what went wrong, where (line and column), and what was expected. Error recovery goes further: instead of stopping at the first error, the parser continues, collecting multiple errors. This requires tracking position (line/column from offset), enriching errors with expected-token information, and synchronizing on error tokens to resume parsing.

🎯 Learning Outcomes

  • • Understand position tracking (offset, line, column) in a parser
  • • Learn how to enrich error messages with "expected X, got Y" information
  • • See the panic-mode error recovery strategy: skip to a synchronization token
  • • Appreciate why error quality is a significant differentiator in production parsers
  • Code Example

    #[derive(Debug, Clone)]
    struct Position { offset: usize, line: usize, col: usize }
    
    fn advance_pos(pos: &Position, c: char) -> Position {
        if c == '\n' {
            Position { offset: pos.offset + 1, line: pos.line + 1, col: 1 }
        } else {
            Position { offset: pos.offset + 1, line: pos.line, col: pos.col + 1 }
        }
    }

    Key Differences

  • Position threading: Rust passes Position as an explicit value; OCaml often uses a state monad or implicit position tracking in the input type.
  • Error continuation: Both languages' simple parsers stop at the first error; production parsers in both use panic-mode recovery or grammar-level error tokens.
  • Error type: Rust's ParseError is a structured value; OCaml's errors are typically strings or custom error types in the result type.
  • Generator support: Menhir provides automatic error recovery and localization; hand-written parsers in both languages must implement this manually.
  • OCaml Approach

    OCaml's Menhir provides built-in error recovery via error tokens and recovery rules in the grammar. Hand-written OCaml parsers use angstrom's Consume.No and position tracking via pos combinators. angstrom reports byte offsets; converting to line/column requires scanning the input up to the offset. Error reporting in production OCaml parsers (like merlin) tracks position via a state monad.

    Full Source

    #![allow(clippy::all)]
    // Example 170: Error Recovery
    // Parser error messages with position info and expected tokens
    
    type ParseResult<'a, T> = Result<(T, &'a str, Position), ParseError>;
    
    #[derive(Debug, Clone, PartialEq)]
    struct Position {
        offset: usize,
        line: usize,
        col: usize,
    }
    
    #[derive(Debug, Clone)]
    struct ParseError {
        pos: Position,
        expected: Vec<String>,
        got: String,
    }
    
    impl std::fmt::Display for ParseError {
        fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
            write!(
                f,
                "Error at line {}, col {}: expected {}, got {}",
                self.pos.line,
                self.pos.col,
                self.expected.join(" or "),
                self.got
            )
        }
    }
    
    fn init_pos() -> Position {
        Position {
            offset: 0,
            line: 1,
            col: 1,
        }
    }
    
    fn advance_pos(pos: &Position, c: char) -> Position {
        if c == '\n' {
            Position {
                offset: pos.offset + 1,
                line: pos.line + 1,
                col: 1,
            }
        } else {
            Position {
                offset: pos.offset + 1,
                line: pos.line,
                col: pos.col + 1,
            }
        }
    }
    
    // ============================================================
    // Approach 1: Satisfy with position tracking
    // ============================================================
    
    fn satisfy_pos<'a>(
        pred: impl Fn(char) -> bool,
        desc: &str,
        input: &'a str,
        pos: &Position,
    ) -> ParseResult<'a, char> {
        match input.chars().next() {
            Some(c) if pred(c) => Ok((c, &input[c.len_utf8()..], advance_pos(pos, c))),
            Some(c) => Err(ParseError {
                pos: pos.clone(),
                expected: vec![desc.to_string()],
                got: format!("'{}'", c),
            }),
            None => Err(ParseError {
                pos: pos.clone(),
                expected: vec![desc.to_string()],
                got: "EOF".to_string(),
            }),
        }
    }
    
    fn tag_pos<'a>(expected: &str, input: &'a str, pos: &Position) -> ParseResult<'a, &'a str> {
        if input.starts_with(expected) {
            let mut new_pos = pos.clone();
            for c in expected.chars() {
                new_pos = advance_pos(&new_pos, c);
            }
            Ok((&input[..expected.len()], &input[expected.len()..], new_pos))
        } else {
            let got_len = expected.len().min(input.len());
            Err(ParseError {
                pos: pos.clone(),
                expected: vec![format!("\"{}\"", expected)],
                got: format!("\"{}\"", &input[..got_len]),
            })
        }
    }
    
    // ============================================================
    // Approach 2: Error merging for alternatives
    // ============================================================
    
    fn alt_pos<'a, T>(
        r1: ParseResult<'a, T>,
        r2: impl FnOnce() -> ParseResult<'a, T>,
    ) -> ParseResult<'a, T> {
        match r1 {
            Ok(r) => Ok(r),
            Err(e1) => match r2() {
                Ok(r) => Ok(r),
                Err(e2) => {
                    if e1.pos.offset == e2.pos.offset {
                        let mut expected = e1.expected;
                        expected.extend(e2.expected);
                        Err(ParseError {
                            pos: e1.pos,
                            expected,
                            got: e1.got,
                        })
                    } else if e1.pos.offset > e2.pos.offset {
                        Err(e1)
                    } else {
                        Err(e2)
                    }
                }
            },
        }
    }
    
    // ============================================================
    // Approach 3: Error recovery — skip to sync point
    // ============================================================
    
    fn recover_until<'a, T>(
        sync: impl Fn(char) -> bool,
        result: ParseResult<'a, T>,
        input: &'a str,
        pos: &Position,
    ) -> Result<(Option<T>, &'a str, Position), ParseError> {
        match result {
            Ok((v, rest, new_pos)) => Ok((Some(v), rest, new_pos)),
            Err(_) => {
                let mut remaining = input;
                let mut current_pos = pos.clone();
                while let Some(c) = remaining.chars().next() {
                    if sync(c) {
                        return Ok((None, remaining, current_pos));
                    }
                    remaining = &remaining[c.len_utf8()..];
                    current_pos = advance_pos(&current_pos, c);
                }
                Ok((None, "", current_pos))
            }
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_error_position() {
            let pos = init_pos();
            let err = satisfy_pos(|c| c.is_ascii_digit(), "digit", "abc", &pos).unwrap_err();
            assert_eq!(err.pos.line, 1);
            assert_eq!(err.pos.col, 1);
            assert_eq!(err.expected, vec!["digit"]);
            assert_eq!(err.got, "'a'");
        }
    
        #[test]
        fn test_error_merge() {
            let pos = init_pos();
            let r1 = satisfy_pos(|c| c.is_ascii_digit(), "digit", "!", &pos);
            let err = alt_pos(r1, || {
                satisfy_pos(|c| c.is_ascii_alphabetic(), "letter", "!", &pos)
            })
            .unwrap_err();
            assert_eq!(err.expected.len(), 2);
            assert!(err.expected.contains(&"digit".to_string()));
            assert!(err.expected.contains(&"letter".to_string()));
        }
    
        #[test]
        fn test_position_tracking() {
            let pos = init_pos();
            let (_, _, new_pos) = tag_pos("ab", "ab\ncd", &pos).unwrap();
            assert_eq!(new_pos.line, 1);
            assert_eq!(new_pos.col, 3);
        }
    
        #[test]
        fn test_newline_tracking() {
            let pos = init_pos();
            let (_, rest, new_pos) = tag_pos("ab\n", "ab\ncd", &pos).unwrap();
            assert_eq!(new_pos.line, 2);
            assert_eq!(new_pos.col, 1);
            assert_eq!(rest, "cd");
        }
    
        #[test]
        fn test_recovery() {
            let pos = init_pos();
            let bad = satisfy_pos(|c| c.is_ascii_digit(), "digit", "abc;rest", &pos);
            let (val, rest, _) = recover_until(|c| c == ';', bad, "abc;rest", &pos).unwrap();
            assert!(val.is_none());
            assert_eq!(rest, ";rest");
        }
    
        #[test]
        fn test_display_error() {
            let err = ParseError {
                pos: Position {
                    offset: 5,
                    line: 2,
                    col: 3,
                },
                expected: vec!["digit".into(), "letter".into()],
                got: "'!'".into(),
            };
            assert_eq!(
                format!("{}", err),
                "Error at line 2, col 3: expected digit or letter, got '!'"
            );
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_error_position() {
            let pos = init_pos();
            let err = satisfy_pos(|c| c.is_ascii_digit(), "digit", "abc", &pos).unwrap_err();
            assert_eq!(err.pos.line, 1);
            assert_eq!(err.pos.col, 1);
            assert_eq!(err.expected, vec!["digit"]);
            assert_eq!(err.got, "'a'");
        }
    
        #[test]
        fn test_error_merge() {
            let pos = init_pos();
            let r1 = satisfy_pos(|c| c.is_ascii_digit(), "digit", "!", &pos);
            let err = alt_pos(r1, || {
                satisfy_pos(|c| c.is_ascii_alphabetic(), "letter", "!", &pos)
            })
            .unwrap_err();
            assert_eq!(err.expected.len(), 2);
            assert!(err.expected.contains(&"digit".to_string()));
            assert!(err.expected.contains(&"letter".to_string()));
        }
    
        #[test]
        fn test_position_tracking() {
            let pos = init_pos();
            let (_, _, new_pos) = tag_pos("ab", "ab\ncd", &pos).unwrap();
            assert_eq!(new_pos.line, 1);
            assert_eq!(new_pos.col, 3);
        }
    
        #[test]
        fn test_newline_tracking() {
            let pos = init_pos();
            let (_, rest, new_pos) = tag_pos("ab\n", "ab\ncd", &pos).unwrap();
            assert_eq!(new_pos.line, 2);
            assert_eq!(new_pos.col, 1);
            assert_eq!(rest, "cd");
        }
    
        #[test]
        fn test_recovery() {
            let pos = init_pos();
            let bad = satisfy_pos(|c| c.is_ascii_digit(), "digit", "abc;rest", &pos);
            let (val, rest, _) = recover_until(|c| c == ';', bad, "abc;rest", &pos).unwrap();
            assert!(val.is_none());
            assert_eq!(rest, ";rest");
        }
    
        #[test]
        fn test_display_error() {
            let err = ParseError {
                pos: Position {
                    offset: 5,
                    line: 2,
                    col: 3,
                },
                expected: vec!["digit".into(), "letter".into()],
                got: "'!'".into(),
            };
            assert_eq!(
                format!("{}", err),
                "Error at line 2, col 3: expected digit or letter, got '!'"
            );
        }
    }

    Deep Comparison

    Comparison: Example 170 — Error Recovery

    Position tracking

    OCaml:

    type position = { offset: int; line: int; col: int }
    
    let advance_pos pos c =
      if c = '\n' then { offset = pos.offset + 1; line = pos.line + 1; col = 1 }
      else { offset = pos.offset + 1; line = pos.line; col = pos.col + 1 }
    

    Rust:

    #[derive(Debug, Clone)]
    struct Position { offset: usize, line: usize, col: usize }
    
    fn advance_pos(pos: &Position, c: char) -> Position {
        if c == '\n' {
            Position { offset: pos.offset + 1, line: pos.line + 1, col: 1 }
        } else {
            Position { offset: pos.offset + 1, line: pos.line, col: pos.col + 1 }
        }
    }
    

    Error merging

    OCaml:

    if e1.pos.offset = e2.pos.offset then
      Error { pos = e1.pos; expected = e1.expected @ e2.expected; got = e1.got }
    

    Rust:

    if e1.pos.offset == e2.pos.offset {
        let mut expected = e1.expected;
        expected.extend(e2.expected);
        Err(ParseError { pos: e1.pos, expected, got: e1.got })
    }
    

    Exercises

  • Add a collect_errors wrapper that runs a parser and returns all errors found, continuing past each one using skip-to-semicolon recovery.
  • Implement line/column tracking by scanning the input up to the error offset and counting newlines.
  • Write an expected combinator that wraps any parser and adds a human-readable "expected X" to its error message.
  • Open Source Repos