767 Fundamental

767-versioned-data-format — Versioned Data Format

Functional Programming


Tutorial

The Problem

Long-lived systems must evolve their data formats without breaking existing data. A V1 file written in 2020 must still be readable by a V3 application in 2025. This requires explicit version negotiation, backward-compatible additions (new optional fields with defaults), forward-compatible reading (ignoring unknown fields), and migration functions for format upgrades. Protocol Buffers, Avro, and Thrift all have sophisticated solutions; this example shows the principles in pure Rust.
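Forward-compatible reading, as mentioned above, can be sketched with tagged, length-prefixed fields: because every field declares its own length, a reader can skip tags it does not recognize. This is a minimal sketch, not the format used by the example below; the `Record` type, the tag constants, and the helper names are hypothetical and chosen only for illustration.

```rust
/// Hypothetical field tags; not part of the original example.
pub const TAG_NAME: u8 = 1;
pub const TAG_VALUE: u8 = 2;

#[derive(Debug, Default, PartialEq)]
pub struct Record {
    pub name: String,
    pub value: i32,
}

/// Append one field as: tag (1 byte), length (u32, little-endian), payload.
pub fn write_field(buf: &mut Vec<u8>, tag: u8, payload: &[u8]) {
    buf.push(tag);
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(payload);
}

/// Read every field, filling in known ones and skipping unknown tags.
pub fn read_record(bytes: &[u8]) -> Result<Record, String> {
    let mut rec = Record::default(); // absent fields keep their defaults
    let mut pos = 0;
    while pos < bytes.len() {
        let header = bytes.get(pos..pos + 5).ok_or("truncated header")?;
        let tag = header[0];
        let len = u32::from_le_bytes([header[1], header[2], header[3], header[4]]) as usize;
        pos += 5;
        let end = pos.checked_add(len).ok_or("length overflow")?;
        let payload = bytes.get(pos..end).ok_or("truncated payload")?;
        pos = end;
        match tag {
            TAG_NAME => {
                rec.name = String::from_utf8(payload.to_vec()).map_err(|_| "invalid UTF-8")?;
            }
            TAG_VALUE => {
                if payload.len() != 4 {
                    return Err("bad value length".to_string());
                }
                rec.value = i32::from_le_bytes([payload[0], payload[1], payload[2], payload[3]]);
            }
            _ => {} // written by a newer version: skip the payload
        }
    }
    Ok(rec)
}
```

A buffer that contains an extra field with an unrecognized tag still parses into a `Record`; the unknown payload is stepped over using its length prefix. The cost of this design is five bytes of overhead per field.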

🎯 Learning Outcomes

• Model multiple format versions as distinct structs (DataV1, DataV2, DataV3)

• Implement migration functions: DataV1 -> DataV2, DataV2 -> DataV3

• Use a Version struct for compatibility checking (same major = compatible)

• Implement a unified Data enum that can hold any version

• Write tests that verify both backward compatibility and migration correctness
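The stepwise migrations named in the outcomes above could look like the following sketch. The struct definitions repeat those in the full source; the functions `v1_to_v2`, `v2_to_v3`, and `v1_to_v3` are assumptions for illustration and do not appear in the full source, which instead maps each version directly to `DataV3`.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct DataV1 {
    pub name: String,
    pub value: i32,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataV2 {
    pub name: String,
    pub value: i32,
    pub tags: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataV3 {
    pub name: String,
    pub value: f64,
    pub tags: Vec<String>,
    pub metadata: HashMap<String, String>,
}

/// V1 -> V2: the new `tags` field starts empty.
pub fn v1_to_v2(v1: DataV1) -> DataV2 {
    DataV2 {
        name: v1.name,
        value: v1.value,
        tags: Vec::new(),
    }
}

/// V2 -> V3: widen `value` to f64 and start with empty metadata.
pub fn v2_to_v3(v2: DataV2) -> DataV3 {
    DataV3 {
        name: v2.name,
        value: f64::from(v2.value), // lossless: every i32 fits in an f64
        tags: v2.tags,
        metadata: HashMap::new(),
    }
}

/// V1 -> V3 by composing the two single-step migrations.
pub fn v1_to_v3(v1: DataV1) -> DataV3 {
    v2_to_v3(v1_to_v2(v1))
}
```

With single-step functions, adding a V4 means writing one new `v3_to_v4` function rather than revisiting every older version's upgrade path.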

Code Example

pub enum Data {
    V1(DataV1),
    V2(DataV2),
    V3(DataV3),
}

impl Data {
    pub fn upgrade(self) -> DataV3 {
        match self {
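            // `..Default::default()` fills the remaining fields; this assumes DataV3
            // implements Default (the full source spells out every field instead).
            Data::V1(v1) => DataV3 { name: v1.name, value: v1.value as f64, ..Default::default() },
            Data::V2(v2) => DataV3 { name: v2.name, value: v2.value as f64, tags: v2.tags, ..Default::default() },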
            Data::V3(v3) => v3,
        }
    }
}

type data =
  | V1 of data_v1
  | V2 of data_v2
  | V3 of data_v3

let upgrade = function
  | V1 v1 -> { name = v1.name; value = float_of_int v1.value; tags = []; metadata = [] }
  | V2 v2 -> { name = v2.name; value = float_of_int v2.value; tags = v2.tags; metadata = [] }
  | V3 v3 -> v3

Key Differences

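Version field: Both implementations put the version explicitly on the wire. The Rust code writes a two-byte (major, minor) header, modeled as Version(u8, u8); the OCaml code writes a textual version=N field. Both carry the same information.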

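Migration chain: The Rust source folds migration into a single Data::upgrade method that maps every variant straight to DataV3, while the OCaml source uses an explicit migrate_v1_to_v2 function. Stepwise v1_to_v2/v2_to_v3 functions are the Rust equivalent of the OCaml pattern and scale better as versions accumulate.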

Schema evolution: Protocol Buffers and Avro handle versioning at the schema level; this example handles it in application code.

Backward compatibility: Both languages can model fields that are missing from older versions as optional values (Rust's Option<T>, OCaml's 'a option), and both force the reader to handle the absent case explicitly.
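The optional-field approach can be sketched as follows. The `Settings` struct, its `timeout_ms` field, and the default value are hypothetical and serve only to illustrate the pattern; none of them appear in the example's source.

```rust
/// Hypothetical record: `timeout_ms` was added in a later format version,
/// so data written by older versions carries no value for it.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub name: String,
    pub timeout_ms: Option<u32>,
}

/// Default applied when the field is absent (an assumed value).
pub const DEFAULT_TIMEOUT_MS: u32 = 5_000;

impl Settings {
    /// Resolve the optional field, falling back to the default.
    pub fn effective_timeout_ms(&self) -> u32 {
        self.timeout_ms.unwrap_or(DEFAULT_TIMEOUT_MS)
    }
}
```

Keeping the field as `Option<u32>` rather than substituting the default at parse time preserves the distinction between "not set by an old writer" and "explicitly set to the default value", which matters if the default changes later.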

OCaml Approach

In OCaml codebases built on Bin_prot, versioning is typically handled with versioned modules: each version defines its own type with its own bin_read_t, and migrations between versions are explicit functions. Jane Street uses this pattern pervasively in its trading infrastructure. ppx_sexp_conv can generate S-expression serializers for each version's type; a custom deserializer reads the version field first and dispatches on it. Protobuf bindings for OCaml (ocaml-protoc) provide language-agnostic versioning.

Full Source

#![allow(clippy::all)]
//! # Versioned Data Format
//!
//! Forward and backward compatible data serialization.

/// Data version
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Version(pub u8, pub u8);

impl Version {
    pub fn new(major: u8, minor: u8) -> Self {
        Version(major, minor)
    }

    pub fn is_compatible(&self, other: &Version) -> bool {
        self.0 == other.0 // Same major version
    }
}

/// V1 of our data format
#[derive(Debug, Clone, PartialEq)]
pub struct DataV1 {
    pub name: String,
    pub value: i32,
}

/// V2 adds a new field with default
#[derive(Debug, Clone, PartialEq)]
pub struct DataV2 {
    pub name: String,
    pub value: i32,
    pub tags: Vec<String>, // New in V2
}

/// V3 changes value type and adds metadata
#[derive(Debug, Clone, PartialEq)]
pub struct DataV3 {
    pub name: String,
    pub value: f64, // Changed from i32
    pub tags: Vec<String>,
    pub metadata: std::collections::HashMap<String, String>, // New in V3
}

/// Unified data representation
#[derive(Debug, Clone, PartialEq)]
pub enum Data {
    V1(DataV1),
    V2(DataV2),
    V3(DataV3),
}

impl Data {
    /// Upgrade to latest version
    pub fn upgrade(self) -> DataV3 {
        match self {
            Data::V1(v1) => DataV3 {
                name: v1.name,
                value: v1.value as f64,
                tags: Vec::new(),
                metadata: std::collections::HashMap::new(),
            },
            Data::V2(v2) => DataV3 {
                name: v2.name,
                value: v2.value as f64,
                tags: v2.tags,
                metadata: std::collections::HashMap::new(),
            },
            Data::V3(v3) => v3,
        }
    }
}

/// Simple binary serialization
pub fn serialize_v3(data: &DataV3) -> Vec<u8> {
    let mut buf = Vec::new();

    // Version header
    buf.push(3); // major
    buf.push(0); // minor

    // Name (length-prefixed)
    buf.extend_from_slice(&(data.name.len() as u32).to_le_bytes());
    buf.extend_from_slice(data.name.as_bytes());

    // Value (f64)
    buf.extend_from_slice(&data.value.to_le_bytes());

    // Tags (count + items)
    buf.extend_from_slice(&(data.tags.len() as u32).to_le_bytes());
    for tag in &data.tags {
        buf.extend_from_slice(&(tag.len() as u32).to_le_bytes());
        buf.extend_from_slice(tag.as_bytes());
    }

    // Metadata (count + pairs)
    buf.extend_from_slice(&(data.metadata.len() as u32).to_le_bytes());
    for (k, v) in &data.metadata {
        buf.extend_from_slice(&(k.len() as u32).to_le_bytes());
        buf.extend_from_slice(k.as_bytes());
        buf.extend_from_slice(&(v.len() as u32).to_le_bytes());
        buf.extend_from_slice(v.as_bytes());
    }

    buf
}

/// Deserialize with version detection
pub fn deserialize(bytes: &[u8]) -> Result<Data, String> {
    if bytes.len() < 2 {
        return Err("Too short".to_string());
    }

    let major = bytes[0];
    let _minor = bytes[1];
    let rest = &bytes[2..];

    match major {
        1 => deserialize_v1(rest).map(Data::V1),
        2 => deserialize_v2(rest).map(Data::V2),
        3 => deserialize_v3(rest).map(Data::V3),
        v => Err(format!("Unknown version: {}", v)),
    }
}

fn read_string(bytes: &[u8], pos: &mut usize) -> Result<String, String> {
    if *pos + 4 > bytes.len() {
        return Err("Truncated".to_string());
    }
    let len = u32::from_le_bytes([
        bytes[*pos],
        bytes[*pos + 1],
        bytes[*pos + 2],
        bytes[*pos + 3],
    ]) as usize;
    *pos += 4;
    if *pos + len > bytes.len() {
        return Err("Truncated string".to_string());
    }
    let s = String::from_utf8(bytes[*pos..*pos + len].to_vec()).map_err(|_| "Invalid UTF-8")?;
    *pos += len;
    Ok(s)
}

fn deserialize_v1(bytes: &[u8]) -> Result<DataV1, String> {
    let mut pos = 0;
    let name = read_string(bytes, &mut pos)?;
    if pos + 4 > bytes.len() {
        return Err("Truncated".to_string());
    }
    let value = i32::from_le_bytes([bytes[pos], bytes[pos + 1], bytes[pos + 2], bytes[pos + 3]]);
    Ok(DataV1 { name, value })
}

fn deserialize_v2(bytes: &[u8]) -> Result<DataV2, String> {
    let v1 = deserialize_v1(bytes)?;
    // V2 would have tags after the v1 data
    Ok(DataV2 {
        name: v1.name,
        value: v1.value,
        tags: Vec::new(), // Simplified
    })
}

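/// Read a little-endian u32 and advance `pos`.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Result<u32, String> {
    if *pos + 4 > bytes.len() {
        return Err("Truncated".to_string());
    }
    let n = u32::from_le_bytes([
        bytes[*pos],
        bytes[*pos + 1],
        bytes[*pos + 2],
        bytes[*pos + 3],
    ]);
    *pos += 4;
    Ok(n)
}

fn deserialize_v3(bytes: &[u8]) -> Result<DataV3, String> {
    let mut pos = 0;
    let name = read_string(bytes, &mut pos)?;

    if pos + 8 > bytes.len() {
        return Err("Truncated".to_string());
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&bytes[pos..pos + 8]);
    let value = f64::from_le_bytes(raw);
    pos += 8;

    // Tags (count + items), mirroring serialize_v3
    let tag_count = read_u32(bytes, &mut pos)?;
    let mut tags = Vec::new();
    for _ in 0..tag_count {
        tags.push(read_string(bytes, &mut pos)?);
    }

    // Metadata (count + key/value pairs), mirroring serialize_v3
    let pair_count = read_u32(bytes, &mut pos)?;
    let mut metadata = std::collections::HashMap::new();
    for _ in 0..pair_count {
        let key = read_string(bytes, &mut pos)?;
        let val = read_string(bytes, &mut pos)?;
        metadata.insert(key, val);
    }

    Ok(DataV3 { name, value, tags, metadata })
}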

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_version_compatible() {
        let v1 = Version::new(1, 0);
        let v1_1 = Version::new(1, 1);
        let v2 = Version::new(2, 0);

        assert!(v1.is_compatible(&v1_1));
        assert!(!v1.is_compatible(&v2));
    }

    #[test]
    fn test_upgrade_v1_to_v3() {
        let v1 = DataV1 {
            name: "test".to_string(),
            value: 42,
        };
        let v3 = Data::V1(v1).upgrade();
        assert_eq!(v3.name, "test");
        assert_eq!(v3.value, 42.0);
        assert!(v3.tags.is_empty());
    }

    #[test]
    fn test_upgrade_v2_to_v3() {
        let v2 = DataV2 {
            name: "test".to_string(),
            value: 100,
            tags: vec!["a".to_string(), "b".to_string()],
        };
        let v3 = Data::V2(v2).upgrade();
        assert_eq!(v3.value, 100.0);
        assert_eq!(v3.tags, vec!["a", "b"]);
    }

    #[test]
    fn test_serialize_deserialize() {
        let data = DataV3 {
            name: "hello".to_string(),
            value: 3.14,
            tags: vec![],
            metadata: std::collections::HashMap::new(),
        };
        let bytes = serialize_v3(&data);
        let parsed = deserialize(&bytes).unwrap();

        if let Data::V3(d) = parsed {
            assert_eq!(d.name, "hello");
            assert!((d.value - 3.14).abs() < 0.001);
        } else {
            panic!("Expected V3");
        }
    }
}

(* Versioned serialization with migration in OCaml *)

(* ── V1 schema: name + age *)
type user_v1 = { name: string; age: int }

(* ── V2 schema: name + age + email (new field) *)
type user_v2 = { name: string; age: int; email: string }

(* ── Migration: V1 → V2 *)
let migrate_v1_to_v2 (u: user_v1) : user_v2 =
  { name  = u.name;
    age   = u.age;
    email = u.name ^ "@example.com" }  (* synthesized default *)

(* ── Versioned union *)
type versioned_user =
  | V1User of user_v1
  | V2User of user_v2

(* ── Serialize *)
let serialize_v2 u =
  Printf.sprintf "version=2|name=%s|age=%d|email=%s" u.name u.age u.email

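(* The annotation is required: without it, u.name resolves to user_v2,
   the most recently defined record with a name field. *)
let serialize_v1 (u : user_v1) =
  Printf.sprintf "version=1|name=%s|age=%d" u.name u.age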

(* ── Deserialize with migration *)
let field pairs key =
  match List.assoc_opt key pairs with
  | Some v -> Ok v
  | None   -> Error ("missing field: " ^ key)

let parse_pairs s =
  String.split_on_char '|' s
  |> List.filter_map (fun p ->
    match String.split_on_char '=' p with
    | [k; v] -> Some (k, v)
    | _ -> None)

let deserialize s =
  let pairs = parse_pairs s in
  match field pairs "version" with
  | Error e -> Error e
  | Ok "1" ->
    (match field pairs "name", field pairs "age" with
     | Ok name, Ok age_s ->
       (try
         let u1 = V1User { name; age = int_of_string age_s } in
         Ok u1
        with Failure e -> Error e)
     | Error e, _ | _, Error e -> Error e)
  | Ok "2" ->
    (match field pairs "name", field pairs "age", field pairs "email" with
     | Ok name, Ok age_s, Ok email ->
       (try Ok (V2User { name; age = int_of_string age_s; email })
        with Failure e -> Error e)
     | Error e, _, _ | _, Error e, _ | _, _, Error e -> Error e)
  | Ok v -> Error ("unsupported version: " ^ v)

(* Normalize to V2 (migrating if needed) *)
let to_v2 = function
  | V1User u1 -> migrate_v1_to_v2 u1
  | V2User u2 -> u2

let () =
  (* Write in old format, read as new *)
  let old_data = serialize_v1 { name = "Alice"; age = 30 } in
  Printf.printf "Old wire: %s\n" old_data;
  match deserialize old_data with
  | Ok v ->
    let u2 = to_v2 v in
    Printf.printf "Migrated: name=%s age=%d email=%s\n" u2.name u2.age u2.email
  | Error e -> Printf.printf "Error: %s\n" e


Deep Comparison

OCaml vs Rust: Versioned Data Format

Version Handling

Rust

pub enum Data {
    V1(DataV1),
    V2(DataV2),
    V3(DataV3),
}

impl Data {
    pub fn upgrade(self) -> DataV3 {
        match self {
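            // `..Default::default()` fills the remaining fields; this assumes DataV3
            // implements Default (the full source spells out every field instead).
            Data::V1(v1) => DataV3 { name: v1.name, value: v1.value as f64, ..Default::default() },
            Data::V2(v2) => DataV3 { name: v2.name, value: v2.value as f64, tags: v2.tags, ..Default::default() },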
            Data::V3(v3) => v3,
        }
    }
}

OCaml

type data =
  | V1 of data_v1
  | V2 of data_v2
  | V3 of data_v3

let upgrade = function
  | V1 v1 -> { name = v1.name; value = float_of_int v1.value; tags = []; metadata = [] }
  | V2 v2 -> { name = v2.name; value = float_of_int v2.value; tags = v2.tags; metadata = [] }
  | V3 v3 -> v3

Key Differences

Aspect	OCaml	Rust
Enum variants	`V1 of type`	`V1(Type)`
Upgrade	Function	Method
Default fields	Must specify all	Struct update syntax (`..Default::default()`, requires `Default`)
Type conversion	`float_of_int`	`as f64`
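
The struct update syntax in the "Default fields" row only works when the target type implements `Default`. The sketch below assumes a `DataV3` that derives `Default`, which the full source does not do; `upgrade_v1` is a hypothetical free function used to keep the example small.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct DataV1 {
    pub name: String,
    pub value: i32,
}

/// Same fields as `DataV3` in the full source, plus a derived `Default`
/// (an assumption made for this sketch).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct DataV3 {
    pub name: String,
    pub value: f64,
    pub tags: Vec<String>,
    pub metadata: HashMap<String, String>,
}

/// Upgrade V1 to V3, letting `Default` supply the fields V1 never had.
pub fn upgrade_v1(v1: DataV1) -> DataV3 {
    DataV3 {
        name: v1.name,
        value: f64::from(v1.value),
        ..Default::default() // `tags` and `metadata` take their empty defaults
    }
}
```

The trade-off is that a field added to `DataV3` later is silently defaulted in every migration, whereas listing all fields by hand, as the full source does, turns a forgotten field into a compile error.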

Exercises

Implement DataV4 that adds a priority: u8 field with a default of 0 and write a v3_to_v4 migration.

Add binary serialization for each version and implement read_any_version(bytes: &[u8]) -> Result<DataV3, Error> that reads the version byte, deserializes, and migrates.

Write a compatibility matrix test that verifies: V1 data can migrate to V3, V2 data can migrate to V3, but V3 data cannot be downgraded to V1 (return an error).

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.