724 Fundamental

Zero-Copy Parsing

The Problem

A naive parser copies every token from the input buffer into a freshly allocated String. For a 100 MB JSON file, this means allocating millions of strings, putting pressure on the allocator (or on the garbage collector in managed languages), and roughly doubling memory traffic, because each token is read once and then written again. Zero-copy parsing eliminates these allocations by returning borrowed references (&str, &[u8]) into the original input buffer. The parse result's lifetime is tied to the input's lifetime, which prevents the input from being freed while tokens are still accessible.
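As a rough sketch of the difference described above, the hypothetical helpers below tokenize the same input on whitespace. The names tokens_owned, tokens_borrowed, and points_into are illustrative and not part of this example's source. The borrowed version still allocates the Vec that holds the slices, but none of the token bytes are copied.

```rust
/// Naive tokenizer: every token is copied into a fresh heap-allocated String.
pub fn tokens_owned(input: &str) -> Vec<String> {
    input.split_whitespace().map(|t| t.to_string()).collect()
}

/// Zero-copy tokenizer: every token is a borrowed slice of `input`.
pub fn tokens_borrowed(input: &str) -> Vec<&str> {
    input.split_whitespace().collect()
}

/// Returns true if the bytes of `token` lie inside the memory range of `input`.
pub fn points_into(input: &str, token: &str) -> bool {
    let start = input.as_ptr() as usize;
    let end = start + input.len();
    let t = token.as_ptr() as usize;
    t >= start && t + token.len() <= end
}
```

Calling points_into on each borrowed token should return true, while each owned String lives in its own allocation outside the input.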

The pattern is a staple of high-frequency trading systems (parsing FIX protocol messages millions of times per second), network proxies (forwarding HTTP headers without copying), and embedded systems (parsing sensor data from a DMA buffer). Rust's lifetime system makes zero-copy safe by statically ensuring that borrowed parse results cannot outlive their input. In languages with GC (Java, Python, OCaml), the collector keeps the buffer alive, but avoiding accidental copies and keeping the buffer unmodified while views exist depends on careful discipline; Rust enforces the borrowing rules at compile time.
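A minimal sketch of what that compile-time check looks like, assuming a hypothetical first_word helper that is not part of this example's source. The explicit 'a states that the returned slice may not outlive input, and the commented-out line marks where the borrow checker would reject freeing the buffer too early.

```rust
/// Returns the first space-delimited word of `input`.
/// The explicit lifetime `'a` ties the returned slice to `input`.
pub fn first_word<'a>(input: &'a str) -> &'a str {
    // `split` yields sub-slices of `input`; nothing is copied.
    input.split(' ').next().unwrap_or("")
}

/// Hypothetical caller showing where the borrow checker steps in.
pub fn first_word_len() -> usize {
    let owned = String::from("hello world");
    let word = first_word(&owned);
    // drop(owned); // would not compile: `owned` is still borrowed by `word`
    word.len()
}
```

The lifetime could be elided here, since there is only one input reference; it is spelled out to make the contract visible.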

🎯 Learning Outcomes

• Implement a zero-copy parser that returns &str / &[u8] slices into input

• Use lifetime annotations to tie parse output lifetimes to input lifetimes

• Represent parse errors with enum ParseError without heap allocation

• Apply split_once, splitn, and manual byte scanning to avoid allocation

• Understand when zero-copy is impossible (e.g., unescaping, base64 decoding)
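For the last outcome, one common compromise is std::borrow::Cow: borrow when the input can be used as-is, and allocate only when the bytes must change. The sketch below assumes a simple hypothetical escape scheme (backslash followed by n, t, or any other character) and a hypothetical unescape function; it is not taken from this example's source.

```rust
use std::borrow::Cow;

/// Unescape backslash sequences, borrowing when there is nothing to unescape.
pub fn unescape(input: &str) -> Cow<'_, str> {
    // Fast path: no backslash means the input is already the answer.
    if !input.contains('\\') {
        return Cow::Borrowed(input);
    }
    // Slow path: the output differs from the input, so a copy is unavoidable.
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some(other) => out.push(other),
                // A trailing lone backslash is kept verbatim.
                None => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```

Callers can treat both variants as &str through Deref, so the allocation only shows up on inputs that actually contain escapes.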

Code Example

pub fn take_until(buf: &[u8], delimiter: u8) -> Result<(&[u8], &[u8]), ParseError> {
    buf.iter()
        .position(|&b| b == delimiter)
        .map(|pos| (&buf[..pos], &buf[pos + 1..]))
        .ok_or(ParseError::MissingDelimiter(delimiter))
}

pub fn parse_key_value(line: &[u8]) -> Result<KeyValue<'_>, ParseError> {
    let (key_bytes, value_bytes) = take_until(line, b'=')?;
    Ok(KeyValue { key: as_str(key_bytes)?, value: as_str(value_bytes)? })
}

(* OCaml tracks positions as (start, length) pairs over a shared Bytes buffer *)
type span = { buf: bytes; start: int; len: int }

let span_split_at sep s =
  let rec find i =
    if i >= s.len then None
    else if Bytes.get s.buf (s.start + i) = sep then
      Some ({ s with len = i },
            { s with start = s.start + i + 1; len = s.len - i - 1 })
    else find (i + 1)
  in find 0

let span_to_string s = Bytes.sub_string s.buf s.start s.len

Key Differences

Aspect	Rust	OCaml
Borrowing input	Lifetime-annotated `&str`	GC-managed; `String.sub` copies
Safety enforcement	Compile-time lifetime check	Runtime / discipline
Binary frames	`&[u8]` slices, no copy	`Bigstring` with `Angstrom`
Error type	Enum, stack-allocated	`string` or exception
Parser libraries	`nom`, `winnow` (zero-copy)	`Angstrom` (Bigstring)

OCaml Approach

OCaml strings are immutable and the GC manages their lifetime, but the standard library has no borrowed substring type: String.sub and Bytes.sub_string both copy. Avoiding copies means tracking offsets into a shared buffer by hand, or using Bigstring/Bigarray as the Angstrom parser combinator library does:

(* Copies substring — not zero-copy *)
let parse_key_value s =
  match String.split_on_char '=' s with
  | [k; v] -> Ok (String.trim k, String.trim v)
  | _       -> Error "invalid"

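(* Angstrom parses from a Bigstring buffer; take_while itself returns a copied string *)
(* let record_parser = ... Angstrom.take_while ... *)

The Angstrom library uses Bigstring (a Bigarray.Array1 of char) as the backing buffer, so the input lives outside the OCaml heap. Its ordinary combinators, such as take_while, still return freshly allocated strings; access without copying goes through the Angstrom.Unsafe module, which hands a callback the buffer together with an offset and a length.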

Full Source

#![allow(clippy::all)]
// 724. Zero-copy parsing with byte slices
//
// Returns &str / &[u8] slices into the input buffer — no allocation.
// Lifetimes tie parsed references to the original input.

use std::str;

// ── Error type ────────────────────────────────────────────────────────────────

#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnexpectedEof,
    InvalidUtf8,
    MissingDelimiter(u8),
    InvalidFormat(&'static str),
}

// ── Low-level byte-slice combinators ─────────────────────────────────────────

/// Take `n` bytes from the front of `buf`, returning (taken, rest).
pub fn take(buf: &[u8], n: usize) -> Result<(&[u8], &[u8]), ParseError> {
    if buf.len() < n {
        Err(ParseError::UnexpectedEof)
    } else {
        Ok((&buf[..n], &buf[n..]))
    }
}

/// Consume bytes until `delimiter` (exclusive), returning (before, after_delim).
pub fn take_until(buf: &[u8], delimiter: u8) -> Result<(&[u8], &[u8]), ParseError> {
    buf.iter()
        .position(|&b| b == delimiter)
        .map(|pos| (&buf[..pos], &buf[pos + 1..]))
        .ok_or(ParseError::MissingDelimiter(delimiter))
}

/// Interpret a byte slice as UTF-8 `&str` — zero-copy, zero allocation.
pub fn as_str(buf: &[u8]) -> Result<&str, ParseError> {
    str::from_utf8(buf).map_err(|_| ParseError::InvalidUtf8)
}

/// Skip leading ASCII whitespace, returning the trimmed slice.
pub fn skip_whitespace(buf: &[u8]) -> &[u8] {
    let pos = buf
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(buf.len());
    &buf[pos..]
}

// ── Span — index-pair view into a shared buffer ───────────────────────────────

/// A lightweight window into a byte buffer: start index + length.
/// Mirrors the OCaml `span` record, but without copying.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub len: usize,
}

impl Span {
    pub fn new(start: usize, len: usize) -> Self {
        Self { start, len }
    }

    /// Resolve the span against the original buffer — still zero-copy.
    pub fn slice<'a>(&self, buf: &'a [u8]) -> &'a [u8] {
        &buf[self.start..self.start + self.len]
    }

    pub fn as_str<'a>(&self, buf: &'a [u8]) -> Result<&'a str, ParseError> {
        as_str(self.slice(buf))
    }
}

/// Split `buf` at the first `sep` byte, returning two `Span`s (no allocation).
pub fn span_split_at(buf: &[u8], start: usize, len: usize, sep: u8) -> Option<(Span, Span)> {
    let slice = &buf[start..start + len];
    slice.iter().position(|&b| b == sep).map(|pos| {
        let left = Span::new(start, pos);
        // +1 to skip the separator itself
        let right = Span::new(start + pos + 1, len - pos - 1);
        (left, right)
    })
}

// ── HTTP request-line parser ──────────────────────────────────────────────────

/// Parsed HTTP request line.  All fields borrow from the input buffer.
#[derive(Debug, PartialEq)]
pub struct RequestLine<'a> {
    pub method: &'a str,
    pub path: &'a str,
    pub version: &'a str,
}

/// Parse `"METHOD /path HTTP/1.x\r\n"` without allocating.
///
/// Every `&str` in the returned struct points directly into `buf`.
pub fn parse_request_line(buf: &[u8]) -> Result<RequestLine<'_>, ParseError> {
    // Consume up to first space → method
    let (method_bytes, rest) = take_until(buf, b' ')?;
    let method = as_str(method_bytes)?;

    let rest = skip_whitespace(rest);

    // Consume up to second space → path
    let (path_bytes, rest) = take_until(rest, b' ')?;
    let path = as_str(path_bytes)?;

    let rest = skip_whitespace(rest);

    // Consume up to \r\n or end of slice → version
    let version_bytes = rest
        .iter()
        .position(|&b| b == b'\r' || b == b'\n')
        .map(|pos| &rest[..pos])
        .unwrap_or(rest);
    let version = as_str(version_bytes)?;

    Ok(RequestLine {
        method,
        path,
        version,
    })
}

// ── CSV field iterator — yields &str slices, zero-copy ───────────────────────

/// Iterator over comma-separated fields in a single CSV row.
/// Yields `&str` slices borrowed from the original input.
pub struct CsvFields<'a> {
    remaining: &'a [u8],
    done: bool,
}

impl<'a> CsvFields<'a> {
    pub fn new(row: &'a [u8]) -> Self {
        Self {
            remaining: row,
            done: false,
        }
    }
}

impl<'a> Iterator for CsvFields<'a> {
    type Item = Result<&'a str, ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        match self.remaining.iter().position(|&b| b == b',') {
            Some(pos) => {
                let field = &self.remaining[..pos];
                self.remaining = &self.remaining[pos + 1..];
                Some(as_str(field))
            }
            None => {
                // Last field — consume everything
                self.done = true;
                if self.remaining.is_empty() {
                    None
                } else {
                    let field = self.remaining;
                    self.remaining = &[];
                    Some(as_str(field))
                }
            }
        }
    }
}

/// Collect all CSV fields from a row into a `Vec<&str>`, zero-copy.
pub fn parse_csv_row(row: &[u8]) -> Result<Vec<&str>, ParseError> {
    CsvFields::new(row).collect()
}

// ── Key=Value line parser ─────────────────────────────────────────────────────

/// A single `key=value` pair, both halves borrowing from the input.
#[derive(Debug, PartialEq)]
pub struct KeyValue<'a> {
    pub key: &'a str,
    pub value: &'a str,
}

pub fn parse_key_value(line: &[u8]) -> Result<KeyValue<'_>, ParseError> {
    let (key_bytes, value_bytes) = take_until(line, b'=')?;
    Ok(KeyValue {
        key: as_str(key_bytes)?,
        value: as_str(value_bytes)?,
    })
}

// ─────────────────────────────────────────────────────────────────────────────

#[cfg(test)]
mod tests {
    use super::*;

    // ── take ─────────────────────────────────────────────────────────────────

    #[test]
    fn take_splits_correctly() {
        let buf = b"Hello, world!";
        let (head, tail) = take(buf, 5).unwrap();
        assert_eq!(head, b"Hello");
        assert_eq!(tail, b", world!");
    }

    #[test]
    fn take_eof_returns_error() {
        let buf = b"Hi";
        assert_eq!(take(buf, 10), Err(ParseError::UnexpectedEof));
    }

    #[test]
    fn take_zero_returns_empty_head() {
        let buf = b"abc";
        let (head, tail) = take(buf, 0).unwrap();
        assert_eq!(head, b"");
        assert_eq!(tail, b"abc");
    }

    // ── take_until ───────────────────────────────────────────────────────────

    #[test]
    fn take_until_finds_delimiter() {
        let buf = b"key=value";
        let (before, after) = take_until(buf, b'=').unwrap();
        assert_eq!(before, b"key");
        assert_eq!(after, b"value");
    }

    #[test]
    fn take_until_missing_delimiter_errors() {
        let buf = b"nodot";
        assert_eq!(
            take_until(buf, b'.'),
            Err(ParseError::MissingDelimiter(b'.'))
        );
    }

    // ── Span ─────────────────────────────────────────────────────────────────

    #[test]
    fn span_slice_is_zero_copy() {
        let buf = b"Hello, world!";
        let span = Span::new(7, 5);
        assert_eq!(span.slice(buf), b"world");
        assert_eq!(span.as_str(buf).unwrap(), "world");
    }

    #[test]
    fn span_split_at_produces_two_windows() {
        let buf = b"left:right";
        let (l, r) = span_split_at(buf, 0, buf.len(), b':').unwrap();
        assert_eq!(l.as_str(buf).unwrap(), "left");
        assert_eq!(r.as_str(buf).unwrap(), "right");
    }

    #[test]
    fn span_split_at_missing_sep_returns_none() {
        let buf = b"nodot";
        assert!(span_split_at(buf, 0, buf.len(), b'.').is_none());
    }

    // ── HTTP request-line ─────────────────────────────────────────────────────

    #[test]
    fn parse_request_line_get() {
        let input = b"GET /index.html HTTP/1.1\r\n";
        let req = parse_request_line(input).unwrap();
        assert_eq!(req.method, "GET");
        assert_eq!(req.path, "/index.html");
        assert_eq!(req.version, "HTTP/1.1");
    }

    #[test]
    fn parse_request_line_post_no_crlf() {
        let input = b"POST /api/data HTTP/2";
        let req = parse_request_line(input).unwrap();
        assert_eq!(req.method, "POST");
        assert_eq!(req.path, "/api/data");
        assert_eq!(req.version, "HTTP/2");
    }

    #[test]
    fn parse_request_line_missing_path_errors() {
        let input = b"GET";
        assert!(parse_request_line(input).is_err());
    }

    // ── CSV row ───────────────────────────────────────────────────────────────

    #[test]
    fn parse_csv_row_three_fields() {
        let row = b"alice,30,engineer";
        let fields = parse_csv_row(row).unwrap();
        assert_eq!(fields, vec!["alice", "30", "engineer"]);
    }

    #[test]
    fn parse_csv_row_single_field() {
        let row = b"only";
        let fields = parse_csv_row(row).unwrap();
        assert_eq!(fields, vec!["only"]);
    }

    #[test]
    fn parse_csv_row_empty_fields() {
        let row = b"a,,c";
        let fields = parse_csv_row(row).unwrap();
        assert_eq!(fields, vec!["a", "", "c"]);
    }

    // ── key=value ─────────────────────────────────────────────────────────────

    #[test]
    fn parse_key_value_basic() {
        let line = b"host=localhost";
        let kv = parse_key_value(line).unwrap();
        assert_eq!(kv.key, "host");
        assert_eq!(kv.value, "localhost");
    }

    #[test]
    fn parse_key_value_value_with_equals() {
        // Only splits on the FIRST '='
        let line = b"url=http://x?a=1";
        let kv = parse_key_value(line).unwrap();
        assert_eq!(kv.key, "url");
        assert_eq!(kv.value, "http://x?a=1");
    }

    #[test]
    fn parse_key_value_missing_equals_errors() {
        let line = b"noequals";
        assert!(parse_key_value(line).is_err());
    }

    // ── Lifetime safety (compile-time) ────────────────────────────────────────

    #[test]
    fn parsed_fields_borrow_from_input() {
        let input = b"name=Ferris";
        let kv = parse_key_value(input).unwrap();
        // Both &str slices point into `input` — no heap allocation occurred.
        assert!(std::ptr::eq(kv.key.as_bytes().as_ptr(), input.as_ptr()));
        assert!(std::ptr::eq(kv.value.as_bytes().as_ptr(), unsafe {
            input.as_ptr().add(5)
        }));
    }
}

(* OCaml: zero-copy-style parsing with Bytes and index ranges.
   String.sub, Bytes.sub, and Bytes.sub_string all allocate and copy, so the
   parser below tracks (start, length) spans over one shared buffer instead. *)

(* --- Manual zero-copy-ish parsing with index ranges --- *)

(* Instead of allocating substrings, track (start, length) pairs *)
type span = { buf: bytes; start: int; len: int }

let span_of_bytes buf = { buf; start = 0; len = Bytes.length buf }

let span_to_string s =
  Bytes.sub_string s.buf s.start s.len

let span_get s i =
  Bytes.get s.buf (s.start + i)

(* Split a span at the first occurrence of byte `sep` — no allocation *)
let span_split_at sep s =
  let rec find i =
    if i >= s.len then None
    else if Bytes.get s.buf (s.start + i) = sep then Some i
    else find (i + 1)
  in
  match find 0 with
  | None -> None
  | Some i ->
    let left  = { s with len = i } in
    let right = { s with start = s.start + i + 1; len = s.len - i - 1 } in
    Some (left, right)

(* Skip leading whitespace *)
let span_trim_start s =
  let i = ref 0 in
  while !i < s.len && span_get s !i = ' ' do incr i done;
  { s with start = s.start + !i; len = s.len - !i }

(* Parse "METHOD /path HTTP/1.1" without allocating substrings *)
type request_line = { method_: span; path: span; version: span }

let parse_request_line buf =
  let s = span_of_bytes buf in
  match span_split_at ' ' s with
  | None -> Error "missing method"
  | Some (method_, rest) ->
    let rest = span_trim_start rest in
    match span_split_at ' ' rest with
    | None -> Error "missing path"
    | Some (path, version) ->
      Ok { method_; path; version = span_trim_start version }

let () =
  let input = Bytes.of_string "GET /index.html HTTP/1.1" in
  match parse_request_line input with
  | Error e -> Printf.printf "Error: %s\n" e
  | Ok r ->
    Printf.printf "Method:  %s\n" (span_to_string r.method_);
    Printf.printf "Path:    %s\n" (span_to_string r.path);
    Printf.printf "Version: %s\n" (span_to_string r.version)

(* --- Key-value binary format --- *)
(* Format: [u8: key_len][key_bytes][u16_be: val_len][val_bytes] *)

let parse_kv buf pos =
  if pos >= Bytes.length buf then None
  else
    let key_len = Char.code (Bytes.get buf pos) in
    let key_start = pos + 1 in
    (* Need the key bytes plus the 2-byte value length that follows them *)
    if key_start + key_len + 2 > Bytes.length buf then None
    else
      let val_len_hi = Char.code (Bytes.get buf (key_start + key_len)) in
      let val_len_lo = Char.code (Bytes.get buf (key_start + key_len + 1)) in
      let val_len = (val_len_hi lsl 8) lor val_len_lo in
      let val_start = key_start + key_len + 2 in
      if val_start + val_len > Bytes.length buf then None
      else
        let key = Bytes.sub_string buf key_start key_len in
        let value = Bytes.sub_string buf val_start val_len in
        Some (key, value, val_start + val_len)

let () =
  (* Build a small KV buffer *)
  let buf = Buffer.create 32 in
  let add_kv k v =
    Buffer.add_char buf (Char.chr (String.length k));
    Buffer.add_string buf k;
    let vl = String.length v in
    Buffer.add_char buf (Char.chr ((vl lsr 8) land 0xFF));
    Buffer.add_char buf (Char.chr (vl land 0xFF));
    Buffer.add_string buf v
  in
  add_kv "name" "Rust";
  add_kv "version" "1.85";
  let raw = Bytes.of_string (Buffer.contents buf) in
  let pos = ref 0 in
  while !pos < Bytes.length raw do
    match parse_kv raw !pos with
    | None -> pos := Bytes.length raw
    | Some (k, v, next) ->
      Printf.printf "  %s = %s\n" k v;
      pos := next
  done

Deep Comparison

OCaml vs Rust: Zero-Copy Parsing with Byte Slices

Side-by-Side Code

OCaml

(* OCaml tracks positions as (start, length) pairs over a shared Bytes buffer *)
type span = { buf: bytes; start: int; len: int }

let span_split_at sep s =
  let rec find i =
    if i >= s.len then None
    else if Bytes.get s.buf (s.start + i) = sep then
      Some ({ s with len = i },
            { s with start = s.start + i + 1; len = s.len - i - 1 })
    else find (i + 1)
  in find 0

let span_to_string s = Bytes.sub_string s.buf s.start s.len

Rust (idiomatic — slice references)

pub fn take_until(buf: &[u8], delimiter: u8) -> Result<(&[u8], &[u8]), ParseError> {
    buf.iter()
        .position(|&b| b == delimiter)
        .map(|pos| (&buf[..pos], &buf[pos + 1..]))
        .ok_or(ParseError::MissingDelimiter(delimiter))
}

pub fn parse_key_value(line: &[u8]) -> Result<KeyValue<'_>, ParseError> {
    let (key_bytes, value_bytes) = take_until(line, b'=')?;
    Ok(KeyValue { key: as_str(key_bytes)?, value: as_str(value_bytes)? })
}

Rust (Span — index-pair approach, mirrors OCaml)

#[derive(Debug, Clone, Copy)]
pub struct Span { pub start: usize, pub len: usize }

impl Span {
    pub fn new(start: usize, len: usize) -> Self {
        Self { start, len }
    }

    pub fn slice<'a>(&self, buf: &'a [u8]) -> &'a [u8] {
        &buf[self.start..self.start + self.len]
    }
}

pub fn span_split_at(buf: &[u8], start: usize, len: usize, sep: u8) -> Option<(Span, Span)> {
    let slice = &buf[start..start + len];
    slice.iter().position(|&b| b == sep).map(|pos| {
        (Span::new(start, pos), Span::new(start + pos + 1, len - pos - 1))
    })
}

Type Signatures

Concept	OCaml	Rust
Buffer view	`type span = { buf: bytes; start: int; len: int }`	`&[u8]` (fat pointer: ptr + len)
Split result	`span * span` (tuple of spans)	`(&[u8], &[u8])` (tuple of slices)
UTF-8 view	`Bytes.sub_string` — copies	`str::from_utf8` — borrows
Lifetime contract	Implicit — GC keeps buffer alive	`<'a>` annotation — compiler enforced
Optional result	`'a option`	`Option<T>`
Fallible result	`result`, `option`, or exception	`Result<T, ParseError>`

Key Insights

OCaml's String.sub allocates; Rust's &[u8] slice never does.

In OCaml, extracting a substring almost always copies bytes into a new heap object. Rust &[u8] and &str are fat pointers (address + length) into existing memory — the parsed value is the original bytes, viewed differently.
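The fat-pointer claim can be checked directly. The sketch below uses two hypothetical helpers, slice_ref_words and offset_in, neither of which is part of this example's source. It assumes the two-word layout that current Rust compilers use for slice references.

```rust
use std::mem::size_of;

/// Size of a `&[u8]` measured in machine words (pointer + length).
pub fn slice_ref_words() -> usize {
    size_of::<&[u8]>() / size_of::<usize>()
}

/// Byte offset of `sub` from the start of `buf`, if `sub` lies inside `buf`.
pub fn offset_in(buf: &[u8], sub: &[u8]) -> Option<usize> {
    let base = buf.as_ptr() as usize;
    let start = sub.as_ptr() as usize;
    if start >= base && start + sub.len() <= base + buf.len() {
        Some(start - base)
    } else {
        None
    }
}
```

For a slice taken as &buf[4..], offset_in should report Some(4): the slice is the same memory, addressed four bytes further in.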

Lifetimes replace garbage collection as the safety mechanism.

OCaml's GC ensures the underlying bytes buffer is kept alive as long as any span references it. Rust achieves the same guarantee at compile time through lifetime annotations: struct RequestLine<'a> cannot outlive the &'a [u8] it was parsed from. Use-after-free is rejected before the binary is produced.

The Span struct is the OCaml idiom; slice references are the Rust idiom.

OCaml must carry buf inside every span because the language has no built-in slice type that can point into the interior of a bytes value. Rust fat-pointer slices already carry both address and length, so the idiomatic Rust equivalent of a span is just &[u8], with no wrapper struct required.

Iterator-based field parsers compose without allocation.

CsvFields is a lazy Iterator<Item = Result<&str, _>> that yields slices into the original row buffer. In OCaml a comparable implementation would either allocate a list of substrings or thread an explicit index through a recursive function.
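To illustrate the composition point, the sketch below uses the standard library's slice::split instead of CsvFields so that it stands alone; sum_fields is a hypothetical helper, not part of this example's source. Each field is validated and parsed as it is produced, and no intermediate Vec or String is built.

```rust
/// Sum the comma-separated unsigned integers in `row`.
/// Returns None if any field is not valid UTF-8 or not a valid u64.
pub fn sum_fields(row: &[u8]) -> Option<u64> {
    row.split(|&b| b == b',') // lazy iterator over sub-slices of `row`
        .map(|field| {
            // Zero-copy view of the field as text.
            let text = std::str::from_utf8(field).ok()?;
            text.trim().parse::<u64>().ok()
        })
        .sum() // short-circuits to None on the first bad field
}
```

Because an empty row still yields one empty field, sum_fields(b"") returns None under this sketch rather than Some(0).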

The ? operator plus Result makes zero-copy parsers as ergonomic as exception-based ones.

OCaml parsers often raise exceptions for error paths. Rust's ? propagates Result::Err up the call stack with the same brevity, but without hidden control flow and with explicit error types that the caller can inspect or recover from.
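A small sketch of that propagation, assuming a hypothetical FieldError enum and parse_port function (neither appears in this example's source). The From impls let ? convert standard-library errors into the parser's own error type, and the returned key still borrows from the input.

```rust
use std::num::ParseIntError;
use std::str::Utf8Error;

#[derive(Debug, PartialEq)]
pub enum FieldError {
    InvalidUtf8,
    MissingEquals,
    InvalidNumber,
}

impl From<Utf8Error> for FieldError {
    fn from(_: Utf8Error) -> Self {
        FieldError::InvalidUtf8
    }
}

impl From<ParseIntError> for FieldError {
    fn from(_: ParseIntError) -> Self {
        FieldError::InvalidNumber
    }
}

/// Parse `key=<u16>`; the returned key is a slice of `line`.
pub fn parse_port(line: &[u8]) -> Result<(&str, u16), FieldError> {
    let text = std::str::from_utf8(line)?; // Utf8Error -> FieldError
    let (key, value) = text.split_once('=').ok_or(FieldError::MissingEquals)?;
    let port = value.parse::<u16>()?; // ParseIntError -> FieldError
    Ok((key, port))
}
```

Every early return is visible as a ? in the body, and the caller can match on FieldError to decide how to recover.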

When to Use Each Style

Use idiomatic Rust &[u8] / &str slices when: you control the full parser pipeline in one crate and want maximum ergonomics; the compiler infers lifetimes in most cases and the code reads like a sequence of combinator calls.

Use the Span index-pair style when: you need to store multiple parsed views alongside the buffer in a single struct (a self-referential pattern that Rust slices cannot express directly without unsafe), or when passing parsed results across FFI boundaries where raw pointer + length pairs are expected.
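A sketch of the first Span use case, under the assumption of a hypothetical OwnedRecord type that owns its buffer. Because the struct stores indices rather than references, it can be moved or returned freely, and slices are re-derived on demand.

```rust
/// Index-pair view, as in the Span struct shown earlier.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub len: usize,
}

/// Hypothetical record that owns its buffer and remembers where the fields are.
pub struct OwnedRecord {
    buf: Vec<u8>,
    key: Span,
    value: Span,
}

impl OwnedRecord {
    /// Parse `key=value`, taking ownership of the buffer.
    pub fn parse(buf: Vec<u8>) -> Option<Self> {
        let pos = buf.iter().position(|&b| b == b'=')?;
        let key = Span { start: 0, len: pos };
        // Skip the '=' separator itself.
        let value = Span { start: pos + 1, len: buf.len() - pos - 1 };
        Some(Self { buf, key, value })
    }

    fn slice(&self, span: Span) -> &[u8] {
        &self.buf[span.start..span.start + span.len]
    }

    /// Borrowed view of the key; no bytes are copied.
    pub fn key(&self) -> &[u8] {
        self.slice(self.key)
    }

    /// Borrowed view of the value; no bytes are copied.
    pub fn value(&self) -> &[u8] {
        self.slice(self.value)
    }
}
```

Storing &[u8] fields that point into buf would make the struct self-referential; the index pairs sidestep that at the cost of a bounds check on each access.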

Exercises

• Implement a zero-copy HTTP/1.1 request-line parser returning (&str, &str, &str) for method, path, and version. Write property tests verifying no allocation occurs (use bumpalo as allocator oracle).

• Extend parse_frame to return an iterator over multiple consecutive frames in a buffer, with each frame borrowing from the original &[u8].

• Implement a zero-copy JSON string tokenizer that returns &str slices for unescaped strings but falls back to String for strings containing \uXXXX escapes (use Cow<'_, str>).

• Benchmark your CSV field parser vs one that collects into Vec<String>. Measure allocations with heaptrack or the dhat allocator.

• Write a nom-based parser for a simple binary format and compare its generated code to your hand-rolled version using cargo asm.
