709 Fundamental

Unions in Rust

Functional Programming

Tutorial

The Problem

This example covers one feature of Rust's unsafe programming model: the `union` type, whose fields all share the same memory. Reading a union field requires `unsafe`, because the compiler does not track which field was written last. Unions matter mainly in systems programming (OS components, device drivers, game engines) and in any code that must match the layout of a C union or control memory layout precisely. Rust's unsafe system is designed to confine unsafety to small, auditable regions while keeping the surrounding code safe; in this example that region is a handful of accessor methods on a tagged wrapper.
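
As a minimal sketch of the mechanism itself (the `Bits` union and both helper functions are hypothetical names chosen for illustration, not part of this example's source), the following shows that union fields overlap, that constructing a union is safe, and that reading a field needs `unsafe`:

```rust
use std::mem::size_of;

/// Hypothetical union: all three fields start at offset 0 and share the same bytes.
#[repr(C)]
pub union Bits {
    pub as_u32: u32,
    pub as_f32: f32,
    pub as_bytes: [u8; 4],
}

/// Reinterpret the bits of an `f32` as a `u32` by going through the union.
pub fn f32_bits_via_union(x: f32) -> u32 {
    // Constructing a union is safe: exactly one field is initialised.
    let b = Bits { as_f32: x };
    // SAFETY: `as_f32` initialised all four bytes, and every 4-byte
    // pattern is a valid `u32`, so reading `as_u32` is sound.
    unsafe { b.as_u32 }
}

/// The union is as large as its largest field, not the sum of its fields.
pub fn union_size() -> usize {
    size_of::<Bits>()
}
```

In real code `f32::to_bits` performs the same reinterpretation without `unsafe`; the union version is shown only to make the overlap visible.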

🎯 Learning Outcomes

  • The specific unsafe feature demonstrated: unions in Rust
  • When this feature is necessary vs when safe alternatives exist
  • How to use it correctly with appropriate SAFETY documentation
  • The invariants that must be maintained for the operation to be sound
  • Real-world contexts: embedded systems, OS kernels, C FFI, performance-critical code

    Code Example

    /// Idiomatic Rust: the compiler generates the tag and dispatch for you.
    #[derive(Debug, Clone, PartialEq)]
    pub enum ValueEnum {
        Int(i64),
        Float(f64),
        Bool(bool),
    }
    
    impl ValueEnum {
        pub fn describe(&self) -> String {
            match self {
                ValueEnum::Int(n)   => format!("Int({n})"),
                ValueEnum::Float(f) => format!("Float({f})"),
                ValueEnum::Bool(b)  => format!("Bool({b})"),
            }
        }
    }

    Key Differences

  • Safety model: Rust requires explicit unsafe for these operations; OCaml achieves safety through the GC and type system without explicit unsafe regions.
  • FFI approach: Rust uses raw C types directly with extern "C"; OCaml uses ctypes which wraps C types in OCaml values.
  • Memory control: Rust allows complete control over memory layout (#[repr(C)], custom allocators); OCaml's GC manages memory layout automatically.
  • Auditability: Rust unsafe regions are syntactically visible and easy to find with tooling; OCaml unsafe operations (Obj.magic, direct C calls) are also explicit but less common.
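
To make the FFI point concrete, here is a hedged sketch of a `#[repr(C)]` union crossing a C-ABI boundary. The function `raw_value_double_int` is hypothetical and is defined in Rust so the sketch compiles without a C toolchain; in real FFI it would live in a C library and be declared in an `extern "C"` block instead.

```rust
/// Layout-compatible with a C declaration such as
/// `union raw_value { int64_t int_val; double float_val; uint8_t bool_val; };`
#[repr(C)]
#[derive(Clone, Copy)]
pub union RawValue {
    pub int_val: i64,
    pub float_val: f64,
    pub bool_val: u8,
}

/// Hypothetical C-ABI function that doubles the integer payload.
///
/// # Safety
/// The caller must have written `int_val` last. Nothing in the type
/// records that fact, which is why the function is `unsafe`.
pub unsafe extern "C" fn raw_value_double_int(v: RawValue) -> i64 {
    // SAFETY: guaranteed by the caller, see `# Safety` above.
    let n = unsafe { v.int_val };
    n.wrapping_mul(2)
}

/// Safe caller: it knows which field it wrote, so it can uphold the contract.
pub fn double_via_c_abi(n: i64) -> i64 {
    let v = RawValue { int_val: n };
    // SAFETY: `int_val` is the field written on the line above.
    unsafe { raw_value_double_int(v) }
}
```

The union itself carries no tag, so the contract has to live in documentation; pairing it with a discriminant moves that contract into the type.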

    OCaml Approach

    OCaml's GC and type system eliminate most of the need for these unsafe operations. The equivalent functionality typically uses:

  • C FFI via the ctypes library for external function calls
  • Bigarray for controlled raw memory access
  • The GC for memory management (no manual allocators needed)
  • Bytes.t for mutable byte sequences

    OCaml programs rarely need operations equivalent to these Rust unsafe patterns.

    Full Source

    #![allow(clippy::all)]
    //! 709 — Unions in Rust: C-style Tagged Unions
    //!
    //! Raw `union` + enum tag = safe tagged union.
    //! This is conceptually what OCaml's algebraic data types are at the machine level,
    //! except OCaml hides the tag and dispatch from you. Here we write it explicitly.
    
    // ---------------------------------------------------------------------------
    // Raw union — all fields overlap at the same memory address.
    // Constructing it or writing a field is safe; reading a field requires `unsafe`.
    // ---------------------------------------------------------------------------
    
    /// Untagged union: all fields share the same memory location.
    /// Reading a field other than the one last written reinterprets the bytes and
    /// is undefined behaviour if they are uninitialised or invalid for that type.
    #[repr(C)]
    union RawValue {
        int_val: i64,
        float_val: f64,
        bool_val: u8,
    }
    
    // ---------------------------------------------------------------------------
    // Tag enum — tracks which field of the union is currently valid.
    // ---------------------------------------------------------------------------
    
    /// Discriminant tracking which field is active.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Tag {
        Int,
        Float,
        Bool,
    }
    
    // ---------------------------------------------------------------------------
    // Safe tagged union — pairs the raw union with its discriminant.
    // All unsafe access is confined to these methods.
    // ---------------------------------------------------------------------------
    
    /// Safe tagged union: an enum tag guards all reads of the raw union.
    pub struct Value {
        tag: Tag,
        data: RawValue,
    }
    
    impl Value {
        /// Construct a `Value` holding an integer.
        pub fn int(n: i64) -> Self {
            Value {
                tag: Tag::Int,
                data: RawValue { int_val: n },
            }
        }
    
        /// Construct a `Value` holding a float.
        pub fn float(f: f64) -> Self {
            Value {
                tag: Tag::Float,
                data: RawValue { float_val: f },
            }
        }
    
        /// Construct a `Value` holding a boolean.
        pub fn bool(b: bool) -> Self {
            Value {
                tag: Tag::Bool,
                data: RawValue { bool_val: b as u8 },
            }
        }
    
        /// Return the integer if the tag is `Int`, otherwise `None`.
        pub fn as_int(&self) -> Option<i64> {
            if self.tag == Tag::Int {
                // SAFETY: we just checked the tag is Int, so int_val was the last
                // field written and its bits are valid for i64.
                Some(unsafe { self.data.int_val })
            } else {
                None
            }
        }
    
        /// Return the float if the tag is `Float`, otherwise `None`.
        pub fn as_float(&self) -> Option<f64> {
            if self.tag == Tag::Float {
                // SAFETY: tag is Float, so float_val is the active field.
                Some(unsafe { self.data.float_val })
            } else {
                None
            }
        }
    
        /// Return the bool if the tag is `Bool`, otherwise `None`.
        pub fn as_bool(&self) -> Option<bool> {
            if self.tag == Tag::Bool {
                // SAFETY: tag is Bool; u8 non-zero → true, zero → false.
                Some(unsafe { self.data.bool_val != 0 })
            } else {
                None
            }
        }
    
        /// The active tag for this value.
        pub fn tag(&self) -> Tag {
            self.tag
        }
    
        /// Human-readable description — mirrors the OCaml `describe` function.
        pub fn describe(&self) -> String {
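            // SAFETY: the constructors set `tag` together with the matching field,
            // so each arm reads the field that was actually written.
            match self.tag {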
                Tag::Int => format!("Int({})", unsafe { self.data.int_val }),
                Tag::Float => format!("Float({})", unsafe { self.data.float_val }),
                Tag::Bool => format!("Bool({})", unsafe { self.data.bool_val != 0 }),
            }
        }
    
        /// Size in bytes of the stored value — mirrors OCaml `size_of_value`.
        pub fn size_of_stored(&self) -> usize {
            match self.tag {
                Tag::Int => 8,
                Tag::Float => 8,
                Tag::Bool => 1,
            }
        }
    }
    
    // ---------------------------------------------------------------------------
    // Idiomatic Rust equivalent: just use an enum.
    // In most Rust code you would never touch a raw union directly.
    // ---------------------------------------------------------------------------
    
    /// Idiomatic Rust: the compiler generates the tag and dispatch for you.
    #[derive(Debug, Clone, PartialEq)]
    pub enum ValueEnum {
        Int(i64),
        Float(f64),
        Bool(bool),
    }
    
    impl ValueEnum {
        pub fn describe(&self) -> String {
            match self {
                ValueEnum::Int(n) => format!("Int({n})"),
                ValueEnum::Float(f) => format!("Float({f})"),
                ValueEnum::Bool(b) => format!("Bool({b})"),
            }
        }
    
        pub fn size_of_stored(&self) -> usize {
            match self {
                ValueEnum::Int(_) => 8,
                ValueEnum::Float(_) => 8,
                ValueEnum::Bool(_) => 1,
            }
        }
    }
    
    // ---------------------------------------------------------------------------
    // Tests
    // ---------------------------------------------------------------------------
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // --- Tagged-union (manual) tests ---
    
        #[test]
        fn test_int_value_round_trip() {
            let v = Value::int(42);
            assert_eq!(v.tag(), Tag::Int);
            assert_eq!(v.as_int(), Some(42));
            assert_eq!(v.as_float(), None);
            assert_eq!(v.as_bool(), None);
        }
    
        #[test]
        fn test_float_value_round_trip() {
            let v = Value::float(3.14);
            assert_eq!(v.tag(), Tag::Float);
            assert!(v.as_float().is_some());
            assert!((v.as_float().unwrap() - 3.14).abs() < f64::EPSILON);
            assert_eq!(v.as_int(), None);
            assert_eq!(v.as_bool(), None);
        }
    
        #[test]
        fn test_bool_value_round_trip() {
            let t = Value::bool(true);
            assert_eq!(t.tag(), Tag::Bool);
            assert_eq!(t.as_bool(), Some(true));
    
            let f = Value::bool(false);
            assert_eq!(f.as_bool(), Some(false));
            assert_eq!(f.as_int(), None);
        }
    
        #[test]
        fn test_negative_int() {
            let v = Value::int(-7);
            assert_eq!(v.as_int(), Some(-7));
            assert_eq!(v.describe(), "Int(-7)");
        }
    
        #[test]
        fn test_describe_and_size() {
            let vals = [Value::int(42), Value::float(3.14), Value::bool(true)];
            let descriptions: Vec<String> = vals.iter().map(|v| v.describe()).collect();
            assert_eq!(descriptions[0], "Int(42)");
            assert!(descriptions[1].starts_with("Float("));
            assert_eq!(descriptions[2], "Bool(true)");
    
            assert_eq!(vals[0].size_of_stored(), 8);
            assert_eq!(vals[1].size_of_stored(), 8);
            assert_eq!(vals[2].size_of_stored(), 1);
        }
    
        #[test]
        fn test_cross_field_isolation() {
            // Writing int then reading float must return None (tag guard prevents it).
            let v = Value::int(100);
            assert_eq!(v.as_float(), None);
            assert_eq!(v.as_bool(), None);
        }
    
        // --- Idiomatic enum tests ---
    
        #[test]
        fn test_enum_describe() {
            assert_eq!(ValueEnum::Int(42).describe(), "Int(42)");
            assert_eq!(ValueEnum::Bool(false).describe(), "Bool(false)");
        }
    
        #[test]
        fn test_enum_size_of_stored() {
            assert_eq!(ValueEnum::Int(0).size_of_stored(), 8);
            assert_eq!(ValueEnum::Float(0.0).size_of_stored(), 8);
            assert_eq!(ValueEnum::Bool(true).size_of_stored(), 1);
        }
    
        #[test]
        fn test_enum_equality() {
            assert_eq!(ValueEnum::Int(1), ValueEnum::Int(1));
            assert_ne!(ValueEnum::Int(1), ValueEnum::Int(2));
        }
    }
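
`Value` in the listing above implements neither `Debug` nor `PartialEq`, and neither can be derived for a type containing a union, because the compiler does not know which field to print or compare. A possible hand-written version is sketched below; it repeats a compact copy of the types so that it compiles on its own, and the trait impls are an addition for illustration, not part of the original source.

```rust
use std::fmt;

#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag { Int, Float, Bool }

pub struct Value {
    tag: Tag,
    data: RawValue,
}

impl Value {
    pub fn int(n: i64) -> Self { Value { tag: Tag::Int, data: RawValue { int_val: n } } }
    pub fn float(f: f64) -> Self { Value { tag: Tag::Float, data: RawValue { float_val: f } } }
    pub fn bool(b: bool) -> Self { Value { tag: Tag::Bool, data: RawValue { bool_val: b as u8 } } }
}

impl fmt::Debug for Value {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // SAFETY (all arms): constructors set `tag` and the matching field
        // together, so the field selected by `tag` is the one that was written.
        match self.tag {
            Tag::Int => write!(f, "Int({})", unsafe { self.data.int_val }),
            Tag::Float => write!(f, "Float({})", unsafe { self.data.float_val }),
            Tag::Bool => write!(f, "Bool({})", unsafe { self.data.bool_val != 0 }),
        }
    }
}

impl PartialEq for Value {
    fn eq(&self, other: &Self) -> bool {
        // Different tags mean different variants; never compare raw bytes.
        if self.tag != other.tag {
            return false;
        }
        // SAFETY: both values carry the same tag, so the same field is
        // initialised in both unions.
        unsafe {
            match self.tag {
                Tag::Int => self.data.int_val == other.data.int_val,
                Tag::Float => self.data.float_val == other.data.float_val,
                Tag::Bool => (self.data.bool_val != 0) == (other.data.bool_val != 0),
            }
        }
    }
}
```

With these impls in place, `assert_eq!` works directly on `Value`, which the derived traits on `ValueEnum` provide for free.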

    Deep Comparison

    OCaml vs Rust: Unions / Tagged Unions

    Side-by-Side Code

    OCaml

    (* OCaml: algebraic variants ARE safe tagged unions.
       The compiler tracks the discriminant and guarantees exhaustive matching. *)
    
    type value =
      | Int   of int
      | Float of float
      | Bool  of bool
    
    let describe (v : value) : string =
      match v with
      | Int   n -> Printf.sprintf "Int(%d)"   n
      | Float f -> Printf.sprintf "Float(%g)" f
      | Bool  b -> Printf.sprintf "Bool(%b)"  b
    
    let size_of_value (v : value) : int =
      match v with
      | Int   _ -> 8
      | Float _ -> 8
      | Bool  _ -> 1
    
    let () =
      let vals = [Int 42; Float 3.14; Bool true; Int (-7)] in
      List.iter (fun v ->
        Printf.printf "%s (size=%d)\n" (describe v) (size_of_value v)
      ) vals
    

    Rust — idiomatic enum (OCaml-equivalent)

    /// Idiomatic Rust: the compiler generates the tag and dispatch for you.
    #[derive(Debug, Clone, PartialEq)]
    pub enum ValueEnum {
        Int(i64),
        Float(f64),
        Bool(bool),
    }
    
    impl ValueEnum {
        pub fn describe(&self) -> String {
            match self {
                ValueEnum::Int(n)   => format!("Int({n})"),
                ValueEnum::Float(f) => format!("Float({f})"),
                ValueEnum::Bool(b)  => format!("Bool({b})"),
            }
        }
    }
    

    Rust — explicit tagged union (raw union + enum tag)

    #[repr(C)]
    union RawValue {
        int_val:   i64,
        float_val: f64,
        bool_val:  u8,
    }
    
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Tag { Int, Float, Bool }
    
    pub struct Value {
        tag:  Tag,
        data: RawValue,
    }
    
    impl Value {
        pub fn int(n: i64) -> Self {
            Value { tag: Tag::Int, data: RawValue { int_val: n } }
        }
    
        pub fn as_int(&self) -> Option<i64> {
            if self.tag == Tag::Int {
                // SAFETY: tag confirmed, int_val is the active field.
                Some(unsafe { self.data.int_val })
            } else {
                None
            }
        }
    }
    

    Type Signatures

    | Concept | OCaml | Rust (enum) | Rust (raw union) |
    |---|---|---|---|
    | Variant type | type value = Int of int \| Float of float \| Bool of bool | enum ValueEnum { Int(i64), Float(f64), Bool(bool) } | union RawValue { int_val: i64, float_val: f64, bool_val: u8 } |
    | Accessor | pattern match | pattern match | unsafe { union.int_val } guarded by tag |
    | Tag tracking | implicit (compiler) | implicit (compiler) | explicit enum Tag field |
    | Safety | always safe | always safe | requires unsafe to read |
    | C-ABI compatible | no | not by default | yes (with #[repr(C)]) |
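
The layout consequences of the three representations can be inspected with `std::mem::size_of`. The sketch below redeclares the types so that it stands alone; the `sizes` helper is a hypothetical name. Only the union's size is stated as a fact in the comments, since the size of the tagged forms depends on the target's alignment rules.

```rust
use std::mem::size_of;

#[repr(C)]
pub union RawValue {
    pub int_val: i64,
    pub float_val: f64,
    pub bool_val: u8,
}

#[derive(Clone, Copy)]
pub enum Tag { Int, Float, Bool }

/// Manual tagged union: discriminant stored next to the payload.
pub struct Value {
    pub tag: Tag,
    pub data: RawValue,
}

/// Compiler-managed tagged union.
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

/// Returns (type name, size in bytes) for each representation.
pub fn sizes() -> [(&'static str, usize); 4] {
    [
        // 8 bytes: a union is as large as its largest field.
        ("RawValue", size_of::<RawValue>()),
        ("Tag", size_of::<Tag>()),
        // Payload plus tag plus padding; the exact figure is target-dependent.
        ("Value", size_of::<Value>()),
        ("ValueEnum", size_of::<ValueEnum>()),
    ]
}
```

Both tagged forms need room for a discriminant on top of the 8-byte payload, so neither can be as small as the bare union.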

    Key Insights

  • OCaml variants are tagged unions under the hood. A constructor that carries data, such as Int of int here, is stored as a heap block whose header holds the tag; constant constructors are stored as immediate integers. The compiler manages this invisibly; you only see safe pattern matching.
  • Rust enum is the idiomatic equivalent. For almost all Rust code, enum is the right choice: the compiler handles the tag, guarantees exhaustive matching, and the code is always safe.
  • Raw union exists for C interop. When you need a type that maps byte-for-byte to a C union definition, you use Rust's union with #[repr(C)]. Every field read requires unsafe because the compiler does not track which field was written last; writing a Copy field is safe.
  • The safe-wrapper pattern. Pair the raw union with an enum discriminant in an outer struct and expose Option-returning methods. All unsafe stays inside these methods; callers never see it. This is the Rust analogue of what OCaml's runtime does automatically.
  • Memory layout control. #[repr(C)] unions guarantee a specific layout, enabling zero-cost FFI with C libraries that use union fields — something OCaml variants cannot provide directly.
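
The safe-wrapper pattern only holds if the tag and the payload always change together. As a sketch under that assumption, the hypothetical setters below (not part of the original source) replace both inside one method, so callers can never observe a mismatched pair:

```rust
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag { Int, Float, Bool }

pub struct Value {
    tag: Tag,
    data: RawValue,
}

impl Value {
    pub fn int(n: i64) -> Self {
        Value { tag: Tag::Int, data: RawValue { int_val: n } }
    }

    /// Hypothetical setter: swaps payload and tag in one step.
    pub fn set_float(&mut self, f: f64) {
        // Assigning a fresh union value is safe; only reads need `unsafe`.
        self.data = RawValue { float_val: f };
        self.tag = Tag::Float;
    }

    /// Hypothetical setter for the boolean variant.
    pub fn set_bool(&mut self, b: bool) {
        self.data = RawValue { bool_val: b as u8 };
        self.tag = Tag::Bool;
    }

    pub fn tag(&self) -> Tag {
        self.tag
    }

    pub fn as_int(&self) -> Option<i64> {
        match self.tag {
            // SAFETY: `tag` is Int, so `int_val` was the field written.
            Tag::Int => Some(unsafe { self.data.int_val }),
            _ => None,
        }
    }

    pub fn as_float(&self) -> Option<f64> {
        match self.tag {
            // SAFETY: `tag` is Float, so `float_val` was the field written.
            Tag::Float => Some(unsafe { self.data.float_val }),
            _ => None,
        }
    }

    pub fn as_bool(&self) -> Option<bool> {
        match self.tag {
            // SAFETY: `tag` is Bool, so `bool_val` was the field written.
            Tag::Bool => Some(unsafe { self.data.bool_val != 0 }),
            _ => None,
        }
    }
}
```

Because both fields are private, these methods are the only code that has to be audited to trust every `unsafe` read.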

    When to Use Each Style

    Use enum (idiomatic Rust) when: you are writing pure Rust and need a type-safe sum type. This is the default and almost always the right choice.

    Use raw union when: you are writing FFI bindings that must match a C union layout exactly, or building low-level data structures (e.g., a JIT compiler's value representation) where you need to control every byte of memory and are prepared to manage the tag yourself.
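
The two styles can coexist: keep the raw union at the boundary and convert to the enum as soon as the value enters ordinary Rust code. The `From` impl below is a sketch of that idea and an assumption of this note, not something the original source defines; the types are repeated so the block compiles on its own.

```rust
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

#[derive(Clone, Copy)]
enum Tag { Int, Float, Bool }

pub struct Value {
    tag: Tag,
    data: RawValue,
}

impl Value {
    pub fn int(n: i64) -> Self { Value { tag: Tag::Int, data: RawValue { int_val: n } } }
    pub fn float(f: f64) -> Self { Value { tag: Tag::Float, data: RawValue { float_val: f } } }
    pub fn bool(b: bool) -> Self { Value { tag: Tag::Bool, data: RawValue { bool_val: b as u8 } } }
}

#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

/// Convert the manual tagged union into the idiomatic enum.
impl From<&Value> for ValueEnum {
    fn from(v: &Value) -> Self {
        // SAFETY (all arms): constructors set `tag` and the matching field
        // together, so the field selected by `tag` is the one that was written.
        match v.tag {
            Tag::Int => ValueEnum::Int(unsafe { v.data.int_val }),
            Tag::Float => ValueEnum::Float(unsafe { v.data.float_val }),
            Tag::Bool => ValueEnum::Bool(unsafe { v.data.bool_val != 0 }),
        }
    }
}
```

After the conversion, downstream code gets exhaustive `match` and derived traits without touching `unsafe` again.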

    Exercises

  • Minimize unsafe: Find the smallest possible unsafe region in the source and verify that all safe code is outside the unsafe block.
  • Safe alternative: Identify if a safe alternative exists for the demonstrated technique (e.g., bytemuck for transmute, CString for FFI strings) and implement it.
  • SAFETY documentation: Write a complete SAFETY comment for each unsafe block listing preconditions, invariants, and what would break if violated.