765 Intermediate

765-csv-parsing-pattern — CSV Parsing Pattern

Functional Programming


Tutorial

The Problem

CSV (Comma-Separated Values) is the most common format for tabular data exchange — spreadsheets, database exports, analytics pipelines. Despite its simplicity, CSV has edge cases that trip up naive implementations: quoted fields containing commas, escaped quotes inside quoted fields, variable column counts, and different line endings (CRLF vs LF). Building a correct CSV parser teaches state machine design and the importance of handling edge cases explicitly.
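To make the first edge case concrete, the sketch below contrasts a naive split on commas with a minimal quote-aware splitter. Both `naive_split` and `quote_aware_split` are hypothetical helpers introduced here for illustration; they are not part of the example's source, and the quote-aware version deliberately ignores escaped quotes.

```rust
/// Naive approach: split on every comma, ignoring quotes entirely.
fn naive_split(line: &str) -> Vec<String> {
    line.split(',').map(String::from).collect()
}

/// Minimal quote-aware split: a comma inside double quotes does not end a field.
/// Escaped quotes ("") are intentionally not handled in this sketch.
fn quote_aware_split(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for ch in line.chars() {
        match ch {
            // A quote toggles the state and is not copied into the field.
            '"' => in_quotes = !in_quotes,
            // Only an unquoted comma separates fields.
            ',' if !in_quotes => fields.push(std::mem::take(&mut current)),
            _ => current.push(ch),
        }
    }
    // The last field has no trailing comma.
    fields.push(current);
    fields
}
```

For the input `"hello, world",42`, the naive version yields three fragments with stray quote characters, while the quote-aware version yields the two intended fields.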

🎯 Learning Outcomes

  • Implement a CSV parser that handles quoted fields with embedded commas and escaped quotes
  • Detect and report inconsistent column counts across rows
  • Parse rows as Vec<String> with proper unquoting and un-escaping
  • Return typed CsvError variants for unterminated quotes and column count mismatches
  • Understand why the csv crate exists and what it handles that this example omits (BOM, quoted fields that span multiple lines, etc.)
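As a sketch of what typed errors buy you, the block below repeats the `CsvError` enum from the full source and adds `Display` and `std::error::Error` implementations. These two impls are an addition for illustration, not part of the original example, and the message wording is an assumption. Because the example takes line numbers from `enumerate()`, they are zero-based, so the sketch adds 1 when printing.

```rust
use std::fmt;

/// Same shape as the example's error type.
#[derive(Debug, PartialEq)]
pub enum CsvError {
    UnterminatedQuote(usize),
    InconsistentColumns {
        expected: usize,
        got: usize,
        line: usize,
    },
}

// Assumed addition: human-readable messages with one-based line numbers.
impl fmt::Display for CsvError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CsvError::UnterminatedQuote(line) => {
                write!(f, "line {}: unterminated quoted field", line + 1)
            }
            CsvError::InconsistentColumns { expected, got, line } => write!(
                f,
                "line {}: expected {} columns, got {}",
                line + 1,
                expected,
                got
            ),
        }
    }
}

// Lets the error be used with `Box<dyn std::error::Error>` and `?`.
impl std::error::Error for CsvError {}
```

Callers can still match on the variant to react programmatically, and use `to_string()` only when reporting to a user.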
Code Example

    pub fn parse_csv(input: &str) -> Result<Vec<Row>, CsvError> {
        let mut rows = Vec::new();
        for (line_num, line) in input.lines().enumerate() {
            let row = parse_row(line, line_num)?;
            rows.push(row);
        }
        Ok(rows)
    }

    Key Differences

  • State machine: Both languages implement CSV parsing as a character-by-character state machine with explicit quote tracking.
  • Buffer accumulation: Rust uses a String and push for field building; OCaml uses Buffer.t and Buffer.add_char — equivalent patterns.
  • Error types: Rust's typed CsvError enum; OCaml's Csv library raises exceptions with string messages.
  • Streaming: The csv crate supports streaming (iterator over rows); OCaml's Csv.of_channel does the same; this example parses the whole string at once.
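The streaming point can be sketched without the csv crate: instead of collecting every row into a Vec, return an iterator that parses one line each time it is advanced. The function name `rows` is hypothetical, the error type is reduced to the one variant this sketch needs, and the row parser is a simplified copy of the example's state machine without field trimming or column-count checks.

```rust
#[derive(Debug, PartialEq)]
pub enum CsvError {
    UnterminatedQuote(usize),
}

/// Simplified row parser: quote tracking and "" un-escaping only.
fn parse_row(line: &str, line_num: usize) -> Result<Vec<String>, CsvError> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    let mut chars = line.chars().peekable();
    while let Some(ch) = chars.next() {
        if in_quotes {
            if ch != '"' {
                current.push(ch);
            } else if chars.peek() == Some(&'"') {
                chars.next(); // "" inside quotes is an escaped quote
                current.push('"');
            } else {
                in_quotes = false;
            }
        } else {
            match ch {
                '"' => in_quotes = true,
                ',' => fields.push(std::mem::take(&mut current)),
                _ => current.push(ch),
            }
        }
    }
    if in_quotes {
        return Err(CsvError::UnterminatedQuote(line_num));
    }
    fields.push(current);
    Ok(fields)
}

/// Lazily parse rows: no line is parsed until the iterator is advanced.
pub fn rows<'a>(input: &'a str) -> impl Iterator<Item = Result<Vec<String>, CsvError>> + 'a {
    input
        .lines()
        .enumerate()
        .filter(|(_, line)| !line.trim().is_empty())
        .map(|(i, line)| parse_row(line, i))
}
```

A caller can stop at the first error with `rows(input).collect::<Result<Vec<_>, _>>()`, or process rows one by one and skip bad ones. This sketch still takes the whole input as a `&str`; reading from a file line by line would need a `BufRead`-based variant.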
OCaml Approach

    OCaml's Csv library handles CSV parsing with RFC 4180 compliance. The csv-sxml library converts parsed CSV to XML. Csvfields (Jane Street) generates typed record accessors from CSV headers. The parser pattern in OCaml uses Buffer.t for field accumulation and explicit state variables, structurally identical to the Rust implementation.

    Full Source

    #![allow(clippy::all)]
    //! # CSV Parsing Pattern
    //!
    //! Simple CSV parser without external dependencies.
    
    /// A parsed CSV row
    pub type Row = Vec<String>;
    
    /// CSV parse error
    #[derive(Debug, PartialEq)]
    pub enum CsvError {
        UnterminatedQuote(usize),
        InconsistentColumns {
            expected: usize,
            got: usize,
            line: usize,
        },
    }
    
    /// Parse a CSV string into rows
    pub fn parse_csv(input: &str) -> Result<Vec<Row>, CsvError> {
        let mut rows = Vec::new();
        let mut expected_cols = None;
    
        for (line_num, line) in input.lines().enumerate() {
            if line.trim().is_empty() {
                continue;
            }
            let row = parse_row(line, line_num)?;
    
            match expected_cols {
                None => expected_cols = Some(row.len()),
                Some(n) if row.len() != n => {
                    return Err(CsvError::InconsistentColumns {
                        expected: n,
                        got: row.len(),
                        line: line_num,
                    });
                }
                _ => {}
            }
    
            rows.push(row);
        }
    
        Ok(rows)
    }
    
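    /// Parse a single CSV row.
    ///
    /// `line_num` is the zero-based index produced by `enumerate()` in `parse_csv`;
    /// it is only used for error reporting. Every field is trimmed of surrounding
    /// whitespace, including whitespace that was inside quotes.
    fn parse_row(line: &str, line_num: usize) -> Result<Row, CsvError> {
        let mut fields = Vec::new();
        let mut current = String::new(); // field being accumulated
        let mut in_quotes = false; // true while inside a quoted field
        let mut chars = line.chars().peekable();
    
        while let Some(ch) = chars.next() {
            if in_quotes {
                if ch == '"' {
                    if chars.peek() == Some(&'"') {
                        // `""` inside quotes is an escaped quote: emit one `"`.
                        chars.next();
                        current.push('"');
                    } else {
                        // A lone `"` closes the quoted section.
                        in_quotes = false;
                    }
                } else {
                    // Commas and other characters are literal inside quotes.
                    current.push(ch);
                }
            } else {
                match ch {
                    '"' => in_quotes = true,
                    ',' => {
                        // Field separator: finish this field and start a new one.
                        fields.push(current.trim().to_string());
                        current = String::new();
                    }
                    _ => current.push(ch),
                }
            }
        }
    
        // Reaching the end of the line while still inside quotes is an error.
        if in_quotes {
            return Err(CsvError::UnterminatedQuote(line_num));
        }
    
        // The last field has no trailing comma, so push it explicitly.
        fields.push(current.trim().to_string());
        Ok(fields)
    }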
    
    /// Format rows as CSV
    pub fn format_csv(rows: &[Row]) -> String {
        rows.iter()
            .map(|row| {
                row.iter()
                    .map(|field| {
                        if field.contains(',') || field.contains('"') || field.contains('\n') {
                            format!("\"{}\"", field.replace('"', "\"\""))
                        } else {
                            field.clone()
                        }
                    })
                    .collect::<Vec<_>>()
                    .join(",")
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
    
    /// Parse CSV with headers, returning maps
    pub fn parse_csv_with_headers(
        input: &str,
    ) -> Result<Vec<std::collections::HashMap<String, String>>, CsvError> {
        let rows = parse_csv(input)?;
        if rows.is_empty() {
            return Ok(Vec::new());
        }
    
        let headers = &rows[0];
        let mut result = Vec::new();
    
        for row in rows.iter().skip(1) {
            let mut map = std::collections::HashMap::new();
            for (i, value) in row.iter().enumerate() {
                if let Some(header) = headers.get(i) {
                    map.insert(header.clone(), value.clone());
                }
            }
            result.push(map);
        }
    
        Ok(result)
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_simple_csv() {
            let input = "a,b,c\n1,2,3\n4,5,6";
            let rows = parse_csv(input).unwrap();
            assert_eq!(rows.len(), 3);
            assert_eq!(rows[0], vec!["a", "b", "c"]);
            assert_eq!(rows[1], vec!["1", "2", "3"]);
        }
    
        #[test]
        fn test_quoted_field() {
            let input = r#"name,value
    "hello, world",42"#;
            let rows = parse_csv(input).unwrap();
            assert_eq!(rows[1][0], "hello, world");
        }
    
        #[test]
        fn test_escaped_quote() {
            let input = "text\n\"say \"\"hello\"\"\"";
            let rows = parse_csv(input).unwrap();
            assert_eq!(rows[1][0], "say \"hello\"");
        }
    
        #[test]
        fn test_inconsistent_columns() {
            let input = "a,b,c\n1,2";
            let result = parse_csv(input);
            assert!(matches!(result, Err(CsvError::InconsistentColumns { .. })));
        }
    
        #[test]
        fn test_format_csv() {
            let rows = vec![
                vec!["a".to_string(), "b".to_string()],
                vec!["1".to_string(), "2".to_string()],
            ];
            let output = format_csv(&rows);
            assert_eq!(output, "a,b\n1,2");
        }
    
        #[test]
        fn test_with_headers() {
            let input = "name,age\nAlice,30\nBob,25";
            let records = parse_csv_with_headers(input).unwrap();
            assert_eq!(records.len(), 2);
            assert_eq!(records[0].get("name").unwrap(), "Alice");
            assert_eq!(records[0].get("age").unwrap(), "30");
        }
    }
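
The quoting rule inside `format_csv` can be looked at in isolation. The sketch below pulls it into a standalone function; `quote_field` is a hypothetical name used only here, and the body mirrors the closure in the source above.

```rust
/// Quote a single field the way `format_csv` does: wrap it in double quotes
/// when it contains a comma, a quote, or a newline, and double any inner quotes.
fn quote_field(field: &str) -> String {
    if field.contains(',') || field.contains('"') || field.contains('\n') {
        format!("\"{}\"", field.replace('"', "\"\""))
    } else {
        // Plain fields are written unchanged.
        field.to_string()
    }
}
```

Two round-trip caveats follow from the source as written. `format_csv` quotes a field that contains a newline, but `parse_csv` splits the input with `lines()` before looking at quotes, so such a field does not parse back. `parse_row` also trims every field, so leading or trailing spaces are lost on the way back in.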

    Deep Comparison

    OCaml vs Rust: CSV Parsing Pattern

    CSV Parser

    Rust

    pub fn parse_csv(input: &str) -> Result<Vec<Row>, CsvError> {
        let mut rows = Vec::new();
        for (line_num, line) in input.lines().enumerate() {
            let row = parse_row(line, line_num)?;
            rows.push(row);
        }
        Ok(rows)
    }
    

    OCaml

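    let parse_csv input =
      input
      |> String.split_on_char '\n'
      |> List.mapi (fun i line -> parse_row line i)
      (* The standard library has no Result.all, so fold the row results by hand. *)
      |> List.fold_left
           (fun acc r -> Result.bind acc (fun rows -> Result.map (fun row -> row :: rows) r))
           (Ok [])
      |> Result.map List.rev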
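
The "map every line, then combine the results" shape of the OCaml pipeline has a direct Rust counterpart: collecting an iterator of `Result` values into a `Result<Vec<_>, _>`. In the sketch below, `parse_row` is a hypothetical stand-in that only checks for balanced quotes, and `parse_all` is an illustrative name, not a function from the example.

```rust
/// Hypothetical stand-in for the real row parser: splits on commas and
/// rejects any line with an odd number of double quotes.
fn parse_row(line: &str, line_num: usize) -> Result<Vec<String>, String> {
    if line.matches('"').count() % 2 != 0 {
        return Err(format!("unbalanced quote at line index {}", line_num));
    }
    Ok(line.split(',').map(String::from).collect())
}

/// Collecting into `Result<Vec<_>, _>` returns the first error, or all rows.
pub fn parse_all(input: &str) -> Result<Vec<Vec<String>>, String> {
    input
        .lines()
        .enumerate()
        .map(|(i, line)| parse_row(line, i))
        .collect()
}
```

Because Rust iterators are lazy, `collect` stops pulling lines at the first `Err`, so later lines are never parsed. The OCaml pipeline, by contrast, runs `List.mapi` over every line before the results are combined.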
    

    Quoted Fields

    Rust

    if ch == '"' {
        if chars.peek() == Some(&'"') {
            chars.next();
            current.push('"');
        } else {
            in_quotes = false;
        }
    }
    

    OCaml

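    | '"' :: '"' :: rest when in_quotes ->
        (* Escaped quote: emit one literal quote and stay inside the quoted field. *)
        Buffer.add_char buf '"';
        parse rest ~in_quotes buf
    | '"' :: rest when in_quotes ->
        (* Closing quote: leave the quoted field. *)
        parse rest ~in_quotes:false buf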
    

    Key Differences

    Aspect                OCaml                   Rust
    Line iteration        String.split_on_char    .lines()
    Character lookahead   Pattern match           .peek()
    String building       Buffer.t                String
    Error type            result                  Result<T, E>

    Exercises

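  • Confirm support for Windows line endings (CRLF): str::lines() already strips the \r of a \r\n terminator, so add a test for it, and strip a trailing \r yourself only if you switch to splitting on \n manually.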
  • Implement parse_csv_with_headers that treats the first row as column names and returns Vec<HashMap<String, String>> instead of Vec<Vec<String>>.
  • Write a streaming CSV parser using Iterator that yields one row at a time, suitable for processing files larger than available memory.
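
As a starting point for the line-ending exercise, the sketch below compares splitting on `\n` by hand with the standard library's `str::lines()`. The function names `manual_lines` and `std_lines` are hypothetical and exist only for this comparison.

```rust
/// Split on '\n' by hand and strip one trailing '\r' from each line.
/// Unlike `lines()`, this yields a final empty string when the input
/// ends with a newline.
fn manual_lines(input: &str) -> Vec<&str> {
    input
        .split('\n')
        .map(|line| line.strip_suffix('\r').unwrap_or(line))
        .collect()
}

/// `str::lines()` treats both "\n" and "\r\n" as line terminators and
/// does not include them in the returned slices.
fn std_lines(input: &str) -> Vec<&str> {
    input.lines().collect()
}
```

Since the example iterates with `input.lines()`, CRLF input is already split cleanly. The manual version only matters if the line splitting is replaced, for instance to support quoted fields that span lines.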