768 Fundamental

768-zero-copy-deserialize — Zero-Copy Deserialize

Functional Programming

Tutorial

The Problem

Deserialization normally copies data: the parser reads the input bytes and builds new heap-allocated strings and vectors from them. Zero-copy deserialization avoids this by returning references (&str, &[u8]) that point directly into the input buffer. For high-throughput network servers processing thousands of requests per second, skipping these copies removes one allocation and one memory copy per field, which can noticeably reduce allocator pressure and memory bandwidth use. In serde, &str and &[u8] fields borrow from the input when the format allows it, and the #[serde(borrow)] attribute extends this to other borrowing field types such as Cow<'a, str>.
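To make the contrast concrete, the sketch below parses the same header/body input twice: once into owned String fields and once into borrowed &str fields. The names OwnedMessage, BorrowedMessage, parse_owned, parse_borrowed, and points_into are hypothetical, chosen only for this illustration.

```rust
/// Copying version: every field is a fresh heap allocation.
#[derive(Debug, PartialEq)]
pub struct OwnedMessage {
    pub header: String,
    pub body: String,
}

/// Zero-copy version: every field is a view into the input.
#[derive(Debug, PartialEq)]
pub struct BorrowedMessage<'a> {
    pub header: &'a str,
    pub body: &'a str,
}

pub fn parse_owned(input: &str) -> Option<OwnedMessage> {
    let (header, body) = input.split_once('\n')?;
    Some(OwnedMessage {
        header: header.to_string(), // copies the header bytes
        body: body.to_string(),     // copies the body bytes
    })
}

pub fn parse_borrowed(input: &str) -> Option<BorrowedMessage<'_>> {
    let (header, body) = input.split_once('\n')?;
    Some(BorrowedMessage { header, body }) // pointer + length only
}

/// Returns true when `part` lies inside the memory occupied by `whole`.
pub fn points_into(whole: &str, part: &str) -> bool {
    let start = whole.as_ptr() as usize;
    let end = start + whole.len();
    let p = part.as_ptr() as usize;
    p >= start && p + part.len() <= end
}
```

Calling points_into on the borrowed fields confirms that they are views into the input, while the owned fields live in separate allocations.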

🎯 Learning Outcomes

  • Return &str slices from parsing functions that borrow from the input
  • Understand lifetime parameters on parsed types: Message<'a>, KeyValue<'a>
  • Implement parse_message, parse_kv, and parse_csv_rows returning borrowed references
  • See how Rust's lifetime system prevents use-after-free from zero-copy parsing
  • Understand the trade-off: zero-copy requires the input buffer to live as long as the parsed value

Code Example

    pub struct Message<'a> {
        pub header: &'a str,  // Borrows from input
        pub body: &'a str,    // Borrows from input
    }
    
    pub fn parse_message(input: &str) -> Option<Message<'_>> {
        let pos = input.find('\n')?;
        Some(Message {
            header: &input[..pos],      // No allocation!
            body: &input[pos + 1..],    // No allocation!
        })
    }
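The borrow has a practical consequence: a Message cannot be returned from a function that owns the buffer. A minimal sketch of the usual way out follows, assuming a hypothetical helper header_of that copies only the part that must survive. Message and parse_message are repeated so the block stands alone.

```rust
pub struct Message<'a> {
    pub header: &'a str,
    pub body: &'a str,
}

pub fn parse_message(input: &str) -> Option<Message<'_>> {
    let pos = input.find('\n')?;
    Some(Message {
        header: &input[..pos],
        body: &input[pos + 1..],
    })
}

/// Hypothetical helper: takes ownership of a buffer, parses it, and returns
/// only what must outlive the buffer. The copy happens once, at the boundary.
pub fn header_of(buffer: String) -> Option<String> {
    let msg = parse_message(&buffer)?; // `msg` borrows from `buffer`
    let header = msg.header.to_owned(); // explicit copy to escape the borrow
    // Returning `msg.header` itself would not compile: it would outlive
    // `buffer`, which is dropped when this function returns.
    Some(header)
}
```

Parsing stays allocation-free; the single to_owned call marks the point where data leaves the buffer's lifetime.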

    Key Differences

  • Lifetime tracking: Rust checks the 'a lifetime on returned references at compile time. OCaml has no equivalent: the GC guarantees memory safety but cannot catch logical errors, such as pairing an offset with the wrong buffer.
  • String representation: Rust's &str is a fat pointer (ptr + len) into an existing buffer; OCaml's String.sub always allocates a new string.
  • Production use: serde supports zero-copy JSON parsing. serde_json::from_str::<Message<'_>> allocates nothing for &str fields, provided the JSON strings contain no escape sequences; fields that may need unescaping can use Cow<'a, str> with #[serde(borrow)].
  • Buffer lifetime: Rust enforces that the parsed value cannot outlive the input buffer. In OCaml, strings produced by String.sub are independent copies, while slice views such as Cstruct keep the underlying buffer alive through the GC for as long as any view exists.
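A borrowed field can only point into the input when the input already contains exactly the bytes wanted. When a field may need transformation, such as unescaping, a common pattern is std::borrow::Cow: borrow when nothing changes, allocate only when something does. The sketch below assumes a hypothetical unescape function that handles just two escapes; it is not serde's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical unescaper: turns backslash-n into a newline and a doubled
/// backslash into a single one. Allocates only if a backslash is present.
pub fn unescape(input: &str) -> Cow<'_, str> {
    if !input.contains('\\') {
        return Cow::Borrowed(input); // nothing to rewrite: zero-copy
    }
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('\\') => out.push('\\'),
            // Unknown escape: keep both characters unchanged.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Trailing backslash: keep it.
            None => out.push('\\'),
        }
    }
    Cow::Owned(out)
}
```

Callers treat the result as a &str either way through Deref, so the allocation cost is paid only by inputs that actually contain escapes.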
    OCaml Approach

    In OCaml, taking a substring of a string means copying: String.sub allocates a new string. Zero-copy parsing therefore works on buffer types instead. Bigstringaf provides mutable, Bigarray-backed byte buffers, over which a substring can be represented as an offset-length pair without copying. Angstrom uses this for zero-copy network parsing. The Cstruct library in MirageOS provides zero-copy buffer slices for network protocols.
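The offset-length idea can be sketched in Rust, to stay in one language with the rest of this page. This is not the Bigstringaf or Cstruct API; Span, resolve, and parse_spans are hypothetical names used only to show the representation.

```rust
/// A field described by position rather than by pointer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub offset: usize,
    pub len: usize,
}

impl Span {
    /// Resolve the span against the buffer it was produced from.
    /// Returns `None` if the span is out of range or splits a UTF-8 character.
    pub fn resolve<'a>(&self, buffer: &'a str) -> Option<&'a str> {
        let end = self.offset.checked_add(self.len)?;
        buffer.get(self.offset..end)
    }
}

/// Split a header/body input into two spans. The spans hold no reference
/// to the buffer, only positions within it.
pub fn parse_spans(buffer: &str) -> Option<(Span, Span)> {
    let pos = buffer.find('\n')?;
    let header = Span { offset: 0, len: pos };
    let body = Span {
        offset: pos + 1,
        len: buffer.len() - pos - 1,
    };
    Some((header, body))
}
```

Because a Span carries no lifetime, nothing stops it from being resolved against the wrong buffer. That is the kind of logical error a borrowed &str rules out at compile time.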

    Full Source

    #![allow(clippy::all)]
    //! # Zero-Copy Deserialize
    //!
    //! Borrowing from input data instead of copying.
    
    /// A message that borrows from input
    #[derive(Debug)]
    pub struct Message<'a> {
        pub header: &'a str,
        pub body: &'a str,
    }
    
    /// Parse a message without copying
    pub fn parse_message(input: &str) -> Option<Message<'_>> {
        let input = input.trim();
        let newline_pos = input.find('\n')?;
    
        Some(Message {
            header: &input[..newline_pos],
            body: &input[newline_pos + 1..],
        })
    }
    
    /// Key-value pair that borrows
    #[derive(Debug)]
    pub struct KeyValue<'a> {
        pub key: &'a str,
        pub value: &'a str,
    }
    
    /// Parse key=value without copying
    pub fn parse_kv(input: &str) -> Option<KeyValue<'_>> {
        let eq_pos = input.find('=')?;
        Some(KeyValue {
            key: input[..eq_pos].trim(),
            value: input[eq_pos + 1..].trim(),
        })
    }
    
    /// CSV row that borrows
    #[derive(Debug)]
    pub struct CsvRow<'a> {
        fields: Vec<&'a str>,
    }
    
    impl<'a> CsvRow<'a> {
        pub fn parse(line: &'a str) -> Self {
            CsvRow {
                fields: line.split(',').map(str::trim).collect(),
            }
        }
    
        pub fn get(&self, index: usize) -> Option<&'a str> {
            self.fields.get(index).copied()
        }
    
        pub fn len(&self) -> usize {
            self.fields.len()
        }
    
        pub fn is_empty(&self) -> bool {
            self.fields.is_empty()
        }
    }
    
    /// Parse multiple CSV rows, borrowing from input
    pub fn parse_csv_rows(input: &str) -> Vec<CsvRow<'_>> {
        input
            .lines()
            .filter(|line| !line.trim().is_empty())
            .map(CsvRow::parse)
            .collect()
    }
    
    /// A JSON-like path reference
    #[derive(Debug)]
    pub struct JsonPath<'a> {
        segments: Vec<&'a str>,
    }
    
    impl<'a> JsonPath<'a> {
        pub fn parse(path: &'a str) -> Self {
            JsonPath {
                segments: path.split('.').collect(),
            }
        }
    
        pub fn segments(&self) -> &[&'a str] {
            &self.segments
        }
    }
    
    /// Header-body protocol message
    #[derive(Debug)]
    pub struct HttpLikeMessage<'a> {
        pub method: &'a str,
        pub path: &'a str,
        pub headers: Vec<(&'a str, &'a str)>,
        pub body: &'a str,
    }
    
    /// Parse HTTP-like message
    pub fn parse_http_like(input: &str) -> Option<HttpLikeMessage<'_>> {
        let mut lines = input.lines();
    
        // Request line
        let request_line = lines.next()?;
        let mut parts = request_line.split_whitespace();
        let method = parts.next()?;
        let path = parts.next()?;
    
        // Headers
        let mut headers = Vec::new();
        let mut body_start = request_line.len() + 1;
    
        for line in lines.by_ref() {
            body_start += line.len() + 1;
            if line.is_empty() {
                break;
            }
            if let Some((key, value)) = line.split_once(':') {
                headers.push((key.trim(), value.trim()));
            }
        }
    
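        // Body is the rest of the input. The running offset assumes every line
        // ends in a single '\n'; with "\r\n" endings it undercounts, so use the
        // checked `get` to avoid panicking on a non-boundary index.
        let body = input.get(body_start..).unwrap_or("");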
    
        Some(HttpLikeMessage {
            method,
            path,
            headers,
            body,
        })
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_parse_message() {
            let input = "Hello\nWorld";
            let msg = parse_message(input).unwrap();
            assert_eq!(msg.header, "Hello");
            assert_eq!(msg.body, "World");
        }
    
        #[test]
        fn test_parse_kv() {
            let input = "name = Alice";
            let kv = parse_kv(input).unwrap();
            assert_eq!(kv.key, "name");
            assert_eq!(kv.value, "Alice");
        }
    
        #[test]
        fn test_csv_row() {
            let line = "a, b, c";
            let row = CsvRow::parse(line);
            assert_eq!(row.len(), 3);
            assert_eq!(row.get(0), Some("a"));
            assert_eq!(row.get(1), Some("b"));
        }
    
        #[test]
        fn test_json_path() {
            let path = "user.profile.name";
            let jp = JsonPath::parse(path);
            assert_eq!(jp.segments(), &["user", "profile", "name"]);
        }
    
        #[test]
        fn test_http_like() {
            let input = "GET /api/users\nContent-Type: application/json\n\n{\"id\": 1}";
            let msg = parse_http_like(input).unwrap();
            assert_eq!(msg.method, "GET");
            assert_eq!(msg.path, "/api/users");
            assert_eq!(msg.headers.len(), 1);
        }
    
        #[test]
        fn test_zero_copy_addresses() {
            let input = "key=value";
            let kv = parse_kv(input).unwrap();
            // Key and value point into original input
            assert!(input.as_ptr() <= kv.key.as_ptr());
            assert!(kv.key.as_ptr() < input.as_ptr().wrapping_add(input.len()));
        }
    }

    Deep Comparison

    OCaml vs Rust: Zero-Copy Deserialize

    Borrowing vs Copying

    Rust (Zero-Copy)

    pub struct Message<'a> {
        pub header: &'a str,  // Borrows from input
        pub body: &'a str,    // Borrows from input
    }
    
    pub fn parse_message(input: &str) -> Option<Message<'_>> {
        let pos = input.find('\n')?;
        Some(Message {
            header: &input[..pos],      // No allocation!
            body: &input[pos + 1..],    // No allocation!
        })
    }
    

    OCaml (Copying)

    type message = {
      header: string;  (* Owned copy *)
      body: string;    (* Owned copy *)
    }
    
    let parse_message input =
      match String.index_opt input '\n' with
      | None -> None
      | Some pos ->
          Some {
            header = String.sub input 0 pos;       (* Allocates new string *)
            body = String.sub input (pos + 1) ...; (* Allocates new string *)
          }
    

    Lifetime Annotations

    Rust

    // The elided lifetime '_ ties the output's lifetime to `input`
    pub fn parse_kv(input: &str) -> Option<KeyValue<'_>> {
        let pos = input.find('=')?;
        Some(KeyValue {
            key: &input[..pos],
            value: &input[pos + 1..],
        })
    }
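The elided '_ works because the function has exactly one reference parameter. A sketch of the same signature with the lifetime spelled out follows, together with a hypothetical two-parameter function, value_for, where elision no longer applies and 'a must be written.

```rust
pub struct KeyValue<'a> {
    pub key: &'a str,
    pub value: &'a str,
}

/// Same signature as `Option<KeyValue<'_>>`, with the lifetime spelled out.
pub fn parse_kv<'a>(input: &'a str) -> Option<KeyValue<'a>> {
    let (key, value) = input.split_once('=')?;
    Some(KeyValue {
        key: key.trim(),
        value: value.trim(),
    })
}

/// Hypothetical lookup with two reference parameters. Elision cannot choose
/// between them, so `'a` states that the result borrows from `input` only.
pub fn value_for<'a>(input: &'a str, wanted: &str) -> Option<&'a str> {
    input
        .lines()
        .filter_map(parse_kv)
        .find(|kv| kv.key == wanted)
        .map(|kv| kv.value)
}
```

Since the result is tied to input alone, the wanted string may be dropped while the returned value is still in use.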
    

    Key Differences

    Aspect                  OCaml                         Rust
    String slicing          String.sub always copies      Zero-copy with &str
    Lifetime tracking       GC handles it                 Explicit 'a
    Extra memory per field  Copy of the field's bytes     One pointer + length
    Input lifetime          Independent of the result     Must outlive the result
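The memory row can be checked directly. The sketch below compares the footprint of a borrowed struct with an owned one; OwnedMessage, borrowed_footprint, and owned_footprint are hypothetical names, and the sizes reflect how current Rust compilers lay out these types rather than a language guarantee.

```rust
use std::mem::size_of;

pub struct Message<'a> {
    pub header: &'a str,
    pub body: &'a str,
}

pub struct OwnedMessage {
    pub header: String,
    pub body: String,
}

/// Bytes occupied by the borrowed struct: two (pointer, length) pairs,
/// regardless of how long the referenced text is.
pub fn borrowed_footprint() -> usize {
    size_of::<Message<'static>>()
}

/// Bytes occupied by the owned struct plus the heap bytes its fields hold.
pub fn owned_footprint(msg: &OwnedMessage) -> usize {
    size_of::<OwnedMessage>() + msg.header.capacity() + msg.body.capacity()
}
```

The borrowed footprint stays constant as the input grows, while the owned footprint grows with the length of every field.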

    Exercises

  • Implement parse_http_request<'a>(input: &'a str) -> Option<HttpRequest<'a>> where HttpRequest borrows method, path, and header values from the input.
  • Add a split_fields<'a>(s: &'a str, delim: char) -> impl Iterator<Item = &'a str> that returns borrowed field slices without allocating a Vec.
  • Write a benchmark comparing parse_message (zero-copy, returns &str) against a copying version (returns String) for 1 million parses. Measure allocation count with a custom allocator.