Unions in Rust
Tutorial
The Problem
This example covers one part of Rust's unsafe programming model: the raw `union`, a type whose fields all share the same memory, as used for FFI interop and precise control of memory layout. Such tools are essential for systems programming: writing OS components, device drivers, game engines, and any code that must interact with C libraries. Rust's unsafe system is designed to confine unsafety to small, auditable regions while keeping the surrounding code safe.
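To make "fields that share the same memory" concrete, here is a minimal sketch. The `Bits` union and the `float_bits` helper are hypothetical names used only for illustration; they are not part of the tutorial's source. The sketch assumes nothing beyond the standard library.

```rust
use std::mem::size_of;

/// Hypothetical two-field union, used only for this illustration.
#[repr(C)]
union Bits {
    float: f64,
    int: u64,
}

/// Reinterpret the bytes of an `f64` as a `u64` by going through the union.
fn float_bits(x: f64) -> u64 {
    // Constructing a union is safe; only reading a field needs `unsafe`.
    let b = Bits { float: x };
    // SAFETY: both fields are 8 bytes wide and fully initialised by the
    // write above, and every bit pattern is a valid `u64`.
    unsafe { b.int }
}

fn main() {
    // The fields overlap, so the union is as large as its largest field.
    assert_eq!(size_of::<Bits>(), 8);
    // The union read agrees with the safe standard-library conversion.
    assert_eq!(float_bits(1.0), 1.0f64.to_bits());
    println!("{:#x}", float_bits(1.0));
}
```

The single `unsafe` block is the only place where the compiler cannot check the access, which is the "small, auditable region" idea in miniature.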
🎯 Learning Outcomes
Code Example
/// Idiomatic Rust: the compiler generates the tag and dispatch for you.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl ValueEnum {
    pub fn describe(&self) -> String {
        match self {
            ValueEnum::Int(n) => format!("Int({n})"),
            ValueEnum::Float(f) => format!("Float({f})"),
            ValueEnum::Bool(b) => format!("Bool({b})"),
        }
    }
}
Key Differences
1. Rust requires `unsafe` for these operations; OCaml achieves safety through the GC and type system without explicit unsafe regions.
2. Rust declares foreign functions with `extern "C"`; OCaml uses ctypes, which wraps C types in OCaml values.
3. Rust allows explicit control of memory layout (`#[repr(C)]`, custom allocators); OCaml's GC manages memory layout automatically.
OCaml Approach
OCaml's GC and type system eliminate most of the need for these unsafe operations. The equivalent functionality typically uses:
- ctypes library for external function calls
- Bigarray for controlled raw memory access
- Bytes.t for mutable byte sequences
OCaml programs rarely need operations equivalent to these Rust unsafe patterns.
Full Source
#![allow(clippy::all)]
//! 709: Unions in Rust, C-style Tagged Unions
//!
//! Raw `union` + enum tag = safe tagged union.
//! This is conceptually what OCaml's algebraic data types do for you:
//! OCaml hides the tag and dispatch, whereas here we write them explicitly.

// ---------------------------------------------------------------------------
// Raw union: all fields overlap at the same memory address.
// Constructing or writing a union is safe; reading a field requires `unsafe`.
// ---------------------------------------------------------------------------

/// Untagged union: all fields share the same memory location.
/// Reading a field reinterprets the stored bytes as that field's type, so
/// reading a field other than the one last written can be undefined behaviour
/// (for example, reading `int_val` when only the single byte of `bool_val`
/// was initialised).
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}
// ---------------------------------------------------------------------------
// Tag enum: tracks which field of the union is currently valid.
// ---------------------------------------------------------------------------

/// Discriminant tracking which field is active.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    Int,
    Float,
    Bool,
}

// ---------------------------------------------------------------------------
// Safe tagged union: pairs the raw union with its discriminant.
// All unsafe access is confined to these methods.
// ---------------------------------------------------------------------------

/// Safe tagged union: an enum tag guards all reads of the raw union.
pub struct Value {
    tag: Tag,
    data: RawValue,
}
impl Value {
    /// Construct a `Value` holding an integer.
    pub fn int(n: i64) -> Self {
        Value {
            tag: Tag::Int,
            data: RawValue { int_val: n },
        }
    }

    /// Construct a `Value` holding a float.
    pub fn float(f: f64) -> Self {
        Value {
            tag: Tag::Float,
            data: RawValue { float_val: f },
        }
    }

    /// Construct a `Value` holding a boolean.
    pub fn bool(b: bool) -> Self {
        Value {
            tag: Tag::Bool,
            data: RawValue { bool_val: b as u8 },
        }
    }

    /// Return the integer if the tag is `Int`, otherwise `None`.
    pub fn as_int(&self) -> Option<i64> {
        if self.tag == Tag::Int {
            // SAFETY: we just checked the tag is Int, so int_val was the last
            // field written and its bits are valid for i64.
            Some(unsafe { self.data.int_val })
        } else {
            None
        }
    }

    /// Return the float if the tag is `Float`, otherwise `None`.
    pub fn as_float(&self) -> Option<f64> {
        if self.tag == Tag::Float {
            // SAFETY: tag is Float, so float_val is the active field.
            Some(unsafe { self.data.float_val })
        } else {
            None
        }
    }

    /// Return the bool if the tag is `Bool`, otherwise `None`.
    pub fn as_bool(&self) -> Option<bool> {
        if self.tag == Tag::Bool {
            // SAFETY: tag is Bool, so bool_val was the last field written.
            // The payload is stored as u8: non-zero is true, zero is false.
            Some(unsafe { self.data.bool_val != 0 })
        } else {
            None
        }
    }

    /// The active tag for this value.
    pub fn tag(&self) -> Tag {
        self.tag
    }

    /// Human-readable description, mirroring the OCaml `describe` function.
    pub fn describe(&self) -> String {
        // SAFETY (all arms): every constructor stores the tag that matches the
        // field it writes, so each arm reads the field named by the tag.
        match self.tag {
            Tag::Int => format!("Int({})", unsafe { self.data.int_val }),
            Tag::Float => format!("Float({})", unsafe { self.data.float_val }),
            Tag::Bool => format!("Bool({})", unsafe { self.data.bool_val != 0 }),
        }
    }

    /// Size in bytes of the stored value, mirroring OCaml `size_of_value`.
    pub fn size_of_stored(&self) -> usize {
        match self.tag {
            Tag::Int => 8,
            Tag::Float => 8,
            Tag::Bool => 1,
        }
    }
}
// ---------------------------------------------------------------------------
// Idiomatic Rust equivalent: just use an enum.
// In most Rust code you would never touch a raw union directly.
// ---------------------------------------------------------------------------

/// Idiomatic Rust: the compiler generates the tag and dispatch for you.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl ValueEnum {
    pub fn describe(&self) -> String {
        match self {
            ValueEnum::Int(n) => format!("Int({n})"),
            ValueEnum::Float(f) => format!("Float({f})"),
            ValueEnum::Bool(b) => format!("Bool({b})"),
        }
    }

    pub fn size_of_stored(&self) -> usize {
        match self {
            ValueEnum::Int(_) => 8,
            ValueEnum::Float(_) => 8,
            ValueEnum::Bool(_) => 1,
        }
    }
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
#[cfg(test)]
mod tests {
    use super::*;

    // --- Tagged-union (manual) tests ---

    #[test]
    fn test_int_value_round_trip() {
        let v = Value::int(42);
        assert_eq!(v.tag(), Tag::Int);
        assert_eq!(v.as_int(), Some(42));
        assert_eq!(v.as_float(), None);
        assert_eq!(v.as_bool(), None);
    }

    #[test]
    fn test_float_value_round_trip() {
        let v = Value::float(3.14);
        assert_eq!(v.tag(), Tag::Float);
        assert!(v.as_float().is_some());
        assert!((v.as_float().unwrap() - 3.14).abs() < f64::EPSILON);
        assert_eq!(v.as_int(), None);
        assert_eq!(v.as_bool(), None);
    }

    #[test]
    fn test_bool_value_round_trip() {
        let t = Value::bool(true);
        assert_eq!(t.tag(), Tag::Bool);
        assert_eq!(t.as_bool(), Some(true));
        let f = Value::bool(false);
        assert_eq!(f.as_bool(), Some(false));
        assert_eq!(f.as_int(), None);
    }

    #[test]
    fn test_negative_int() {
        let v = Value::int(-7);
        assert_eq!(v.as_int(), Some(-7));
        assert_eq!(v.describe(), "Int(-7)");
    }

    #[test]
    fn test_describe_and_size() {
        let vals = [Value::int(42), Value::float(3.14), Value::bool(true)];
        let descriptions: Vec<String> = vals.iter().map(|v| v.describe()).collect();
        assert_eq!(descriptions[0], "Int(42)");
        assert!(descriptions[1].starts_with("Float("));
        assert_eq!(descriptions[2], "Bool(true)");
        assert_eq!(vals[0].size_of_stored(), 8);
        assert_eq!(vals[1].size_of_stored(), 8);
        assert_eq!(vals[2].size_of_stored(), 1);
    }

    #[test]
    fn test_cross_field_isolation() {
        // Writing int then reading float must return None (tag guard prevents it).
        let v = Value::int(100);
        assert_eq!(v.as_float(), None);
        assert_eq!(v.as_bool(), None);
    }

    // --- Idiomatic enum tests ---

    #[test]
    fn test_enum_describe() {
        assert_eq!(ValueEnum::Int(42).describe(), "Int(42)");
        assert_eq!(ValueEnum::Bool(false).describe(), "Bool(false)");
    }

    #[test]
    fn test_enum_size_of_stored() {
        assert_eq!(ValueEnum::Int(0).size_of_stored(), 8);
        assert_eq!(ValueEnum::Float(0.0).size_of_stored(), 8);
        assert_eq!(ValueEnum::Bool(true).size_of_stored(), 1);
    }

    #[test]
    fn test_enum_equality() {
        assert_eq!(ValueEnum::Int(1), ValueEnum::Int(1));
        assert_ne!(ValueEnum::Int(1), ValueEnum::Int(2));
    }
}
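The two representations in the source can be bridged. The sketch below condenses the tagged union and adds a hypothetical `to_enum` method, which is not part of the original source, that decodes a `Value` into the idiomatic enum in one place. It is one possible design, shown under the assumption that the constructors are the only way a `Value` is built.

```rust
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tag {
    Int,
    Float,
    Bool,
}

struct Value {
    tag: Tag,
    data: RawValue,
}

#[derive(Debug, Clone, PartialEq)]
enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl Value {
    fn int(n: i64) -> Self {
        Value { tag: Tag::Int, data: RawValue { int_val: n } }
    }

    fn float(f: f64) -> Self {
        Value { tag: Tag::Float, data: RawValue { float_val: f } }
    }

    fn bool(b: bool) -> Self {
        Value { tag: Tag::Bool, data: RawValue { bool_val: b as u8 } }
    }

    /// Hypothetical helper: decode the tagged union into the safe enum.
    fn to_enum(&self) -> ValueEnum {
        // SAFETY: each constructor above writes exactly the field that
        // matches the tag it stores, so every arm reads initialised bytes.
        unsafe {
            match self.tag {
                Tag::Int => ValueEnum::Int(self.data.int_val),
                Tag::Float => ValueEnum::Float(self.data.float_val),
                Tag::Bool => ValueEnum::Bool(self.data.bool_val != 0),
            }
        }
    }
}

fn main() {
    let vals = [Value::int(42), Value::float(2.5), Value::bool(true)];
    for v in &vals {
        // After decoding, callers use ordinary exhaustive pattern matching.
        println!("{:?}", v.to_enum());
    }
}
```

With a conversion like this, the per-field accessors and `describe` could be written against `ValueEnum`, leaving a single `unsafe` block to audit.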
Deep Comparison
OCaml vs Rust: Unions / Tagged Unions
Side-by-Side Code
OCaml
(* OCaml: algebraic variants ARE safe tagged unions.
   The compiler tracks the discriminant and guarantees exhaustive matching. *)
type value =
  | Int of int
  | Float of float
  | Bool of bool

let describe (v : value) : string =
  match v with
  | Int n -> Printf.sprintf "Int(%d)" n
  | Float f -> Printf.sprintf "Float(%g)" f
  | Bool b -> Printf.sprintf "Bool(%b)" b

let size_of_value (v : value) : int =
  match v with
  | Int _ -> 8
  | Float _ -> 8
  | Bool _ -> 1

let () =
  let vals = [Int 42; Float 3.14; Bool true; Int (-7)] in
  List.iter (fun v ->
    Printf.printf "%s (size=%d)\n" (describe v) (size_of_value v)
  ) vals
Rust — idiomatic enum (OCaml-equivalent)
/// Idiomatic Rust: the compiler generates the tag and dispatch for you.
#[derive(Debug, Clone, PartialEq)]
pub enum ValueEnum {
    Int(i64),
    Float(f64),
    Bool(bool),
}

impl ValueEnum {
    pub fn describe(&self) -> String {
        match self {
            ValueEnum::Int(n) => format!("Int({n})"),
            ValueEnum::Float(f) => format!("Float({f})"),
            ValueEnum::Bool(b) => format!("Bool({b})"),
        }
    }
}
Rust — explicit tagged union (raw union + enum tag)
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag { Int, Float, Bool }

pub struct Value {
    tag: Tag,
    data: RawValue,
}

impl Value {
    pub fn int(n: i64) -> Self {
        Value { tag: Tag::Int, data: RawValue { int_val: n } }
    }

    pub fn as_int(&self) -> Option<i64> {
        if self.tag == Tag::Int {
            // SAFETY: tag confirmed, int_val is the active field.
            Some(unsafe { self.data.int_val })
        } else {
            None
        }
    }
}
Type Signatures
| Concept | OCaml | Rust (enum) | Rust (raw union) |
|---|---|---|---|
| Variant type | type value = Int of int \| Float of float \| Bool of bool | enum ValueEnum { Int(i64), Float(f64), Bool(bool) } | union RawValue { int_val: i64, float_val: f64, bool_val: u8 } |
| Accessor | pattern match | pattern match | unsafe { union.int_val } guarded by tag |
| Tag tracking | implicit (compiler) | implicit (compiler) | explicit enum Tag field |
| Safety | always safe | always safe | requires unsafe |
| C-ABI compatible | no | not by default (layout is unspecified without a repr attribute) | yes (with #[repr(C)]) |
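The layout claim in the last row can be checked with `std::mem`. This is a small sketch: `CValue` is a hypothetical C-compatible wrapper with a one-byte tag, introduced here for illustration and not taken from the source. The size formula in the second assertion relies on `#[repr(C)]` placing fields in declaration order with C-style padding.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
union RawValue {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

/// Hypothetical C-compatible wrapper: a one-byte tag followed by the union.
#[allow(dead_code)]
#[repr(C)]
struct CValue {
    tag: u8,
    data: RawValue,
}

fn union_size() -> usize {
    size_of::<RawValue>()
}

fn wrapper_size() -> usize {
    size_of::<CValue>()
}

fn main() {
    // All three fields overlap, so the union is as large as its largest field.
    assert_eq!(union_size(), 8);
    // With repr(C), the tag sits at offset 0 and the union starts at the next
    // multiple of its alignment, so the bytes after the tag are padding.
    assert_eq!(wrapper_size(), align_of::<RawValue>() + size_of::<RawValue>());
    println!("union: {} bytes, wrapper: {} bytes", union_size(), wrapper_size());
}
```

The wrapper's exact size depends on the target's alignment for `i64` and `f64`, which is why the second assertion is written in terms of `align_of` rather than a fixed number.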
Key Insights
1. **Rust's `enum` is the idiomatic equivalent of an OCaml variant.** For almost all Rust code, `enum` is the right choice: the compiler handles the tag, guarantees exhaustive matching, and the code is always safe.
2. **Raw `union` exists for C interop.** When you need a `#[repr(C)]` type that maps byte-for-byte to a C union definition, you use Rust's `union`. Every field read requires `unsafe` because the compiler cannot know which field was last written; constructing a union or writing to a field is safe.
3. Pair the raw `union` with an enum discriminant in an outer struct and expose `Option`-returning methods. All `unsafe` stays inside these methods; callers never see it. This is the Rust analogue of what OCaml's runtime does automatically.
4. `#[repr(C)]` unions guarantee a specific layout, enabling zero-cost FFI with C libraries that use union fields, something OCaml variants cannot provide directly.
When to Use Each Style
**Use enum (idiomatic Rust) when:** you are writing pure Rust and need a type-safe sum type. This is the default and almost always the right choice.
**Use raw union when:** you are writing FFI bindings that must match a C union layout exactly, or building low-level data structures (e.g., a JIT compiler's value representation) where you need to control every byte of memory and are prepared to manage the tag yourself.
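As a sketch of the FFI case, the code below mirrors a hypothetical C declaration with a `#[repr(C)]` struct and union. The names `CValue`, `CPayload`, `value_as_double`, and the `TAG_*` constants are all assumptions made for illustration. The `extern "C"` function is written in Rust only so that the example runs without linking C code; in real bindings it would be declared in an `extern` block and implemented by the C library.

```rust
/// Payload layout shared with the (hypothetical) C side.
#[repr(C)]
#[derive(Clone, Copy)]
union CPayload {
    int_val: i64,
    float_val: f64,
    bool_val: u8,
}

/// Mirrors a hypothetical C struct holding a 32-bit tag and the union above.
#[repr(C)]
#[derive(Clone, Copy)]
struct CValue {
    tag: u32,
    data: CPayload,
}

const TAG_INT: u32 = 0;
const TAG_FLOAT: u32 = 1;
const TAG_BOOL: u32 = 2;

/// Stand-in for a function a C library might export.
extern "C" fn value_as_double(v: *const CValue) -> f64 {
    if v.is_null() {
        return f64::NAN;
    }
    // SAFETY: null was ruled out above; the caller guarantees the pointer is
    // aligned and points to an initialised `CValue`.
    let v = unsafe { &*v };
    match v.tag {
        TAG_INT => {
            // SAFETY: the tag says the integer field was written.
            let n = unsafe { v.data.int_val };
            n as f64
        }
        // SAFETY: the tag says the float field was written.
        TAG_FLOAT => unsafe { v.data.float_val },
        TAG_BOOL => {
            // SAFETY: the tag says the one-byte bool field was written.
            let b = unsafe { v.data.bool_val };
            if b != 0 { 1.0 } else { 0.0 }
        }
        // Unknown tag: read no field at all.
        _ => f64::NAN,
    }
}

/// Safe wrapper: a reference is always non-null, aligned, and initialised.
fn as_double(v: &CValue) -> f64 {
    value_as_double(v as *const CValue)
}

fn main() {
    let v = CValue { tag: TAG_INT, data: CPayload { int_val: 7 } };
    println!("{}", as_double(&v));
}
```

Because a tag arriving from C is a plain integer, it may hold values outside the known set, so the sketch matches on `u32` constants with a fallback arm instead of converting to a Rust `enum` up front.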
Exercises
bytemuck for transmute, CString for FFI strings) and implement it.