552 Intermediate

Arena Allocation Pattern

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Arena Allocation Pattern" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Arena allocation (also called region-based memory management) is a technique where all allocations live in a single region and are freed all at once when the region is dropped. Key difference from OCaml: 1. **Lifetime enforcement**: Rust arena references are tied to the arena's lifetime at compile time — use

Tutorial

The Problem

Arena allocation (also called region-based memory management) is a technique where all allocations live in a single region and are freed all at once when the region is dropped. This is dramatically faster than individual heap allocations: no per-object malloc overhead, excellent cache locality, and zero fragmentation. Compilers (LLVM, GCC), game engines, and web browsers use arenas for AST nodes, parse results, and per-frame allocations. In Rust, arenas elegantly tie allocated objects to the arena's lifetime, preventing use-after-arena-drop at compile time.

🎯 Learning Outcomes

• How StringArena stores all strings in a Vec<String> and returns &str references tied to the arena's lifetime

• Why arena-allocated references cannot outlive the arena (lifetime enforcement)

• How arenas eliminate per-object allocation overhead for batch workloads

• How the bumpalo crate implements a production arena allocator

• Where arenas are used: compilers (LLVM), parsers (nom ASTs), game engines, HTTP request processing

Code Example

#![allow(clippy::all)]
//! Arena Allocation Pattern
//!
//! All allocations tied to arena lifetime.

/// Simple arena for strings.
pub struct StringArena {
    storage: Vec<String>,
}

impl StringArena {
    pub fn new() -> Self {
        StringArena {
            storage: Vec::new(),
        }
    }

    pub fn alloc(&mut self, s: &str) -> &str {
        self.storage.push(s.to_string());
        self.storage.last().unwrap()
    }

    pub fn len(&self) -> usize {
        self.storage.len()
    }

    pub fn is_empty(&self) -> bool {
        self.storage.is_empty()
    }
}

impl Default for StringArena {
    fn default() -> Self {
        Self::new()
    }
}

/// Typed arena.
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { items: Vec::new() }
    }

    pub fn alloc(&mut self, item: T) -> &T {
        self.items.push(item);
        self.items.last().unwrap()
    }
}

impl<T> Default for Arena<T> {
    fn default() -> Self {
        Self::new()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_string_arena() {
        let mut arena = StringArena::new();
        arena.alloc("hello");
        arena.alloc("world");
        assert_eq!(arena.len(), 2);
        // Note: can't keep refs across alloc calls with &mut self
    }

    #[test]
    fn test_typed_arena() {
        let mut arena: Arena<i32> = Arena::new();
        arena.alloc(1);
        arena.alloc(2);
        // Refs would require interior mutability for real arena
    }
}

(* Arena allocation in OCaml *)
(* OCaml's GC is essentially a generational arena *)
(* Simulating with a list that holds all allocated nodes *)

type 'a arena = 'a list ref

let make_arena () = ref []

let alloc arena value =
  arena := value :: !arena;
  value  (* returns the value -- in real arena, returns pointer *)

type node = {
  id: int;
  label: string;
  mutable edges: node list;
}

let make_node arena id label =
  alloc arena { id; label; edges = [] }

let add_edge from_node to_node =
  from_node.edges <- to_node :: from_node.edges

let () =
  let arena = make_arena () in
  let n1 = make_node arena 1 "start" in
  let n2 = make_node arena 2 "middle" in
  let n3 = make_node arena 3 "end" in
  add_edge n1 n2; add_edge n2 n3; add_edge n1 n3;
  Printf.printf "n1 edges: %d\n" (List.length n1.edges);
  Printf.printf "total nodes: %d\n" (List.length !arena);
  (* "Free" the arena: *)
  arena := []

Key Differences

Lifetime enforcement: Rust arena references are tied to the arena's lifetime at compile time — use-after-free is impossible; OCaml relies on the GC to prevent this at runtime.

Batch free: Rust arenas (like bumpalo) support O(1) batch deallocation by dropping the arena; OCaml's GC defers collection until pressure warrants it.

Cache performance: Bumpalo's bump allocation keeps objects contiguous in memory; OCaml's copying GC (in its minor heap) achieves similar cache locality for young objects.

Vec reallocation hazard: The simple StringArena implementation can invalidate references if the Vec reallocates — production Rust arenas use slab allocation to prevent this.

OCaml Approach

OCaml's GC naturally acts as an arena — objects allocated during processing are collected together when they become unreachable. For performance-sensitive arenas, OCaml uses Bigarray or manual memory management via Bigarray.Array1:

(* OCaml GC is effectively an automatic arena *)
let parse_and_process input =
  let ast = parse input in  (* AST nodes live until parse_and_process returns *)
  analyze ast
  (* ast and all its nodes are GC-collected after this *)

Explicit arenas in OCaml typically use Buffer.t for strings or custom allocators for performance.

Full Source

#![allow(clippy::all)]
//! Arena Allocation Pattern
//!
//! All allocations tied to arena lifetime.

/// Simple arena for strings.
pub struct StringArena {
    storage: Vec<String>,
}

impl StringArena {
    pub fn new() -> Self {
        StringArena {
            storage: Vec::new(),
        }
    }

    pub fn alloc(&mut self, s: &str) -> &str {
        self.storage.push(s.to_string());
        self.storage.last().unwrap()
    }

    pub fn len(&self) -> usize {
        self.storage.len()
    }

    pub fn is_empty(&self) -> bool {
        self.storage.is_empty()
    }
}

impl Default for StringArena {
    fn default() -> Self {
        Self::new()
    }
}

/// Typed arena.
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { items: Vec::new() }
    }

    pub fn alloc(&mut self, item: T) -> &T {
        self.items.push(item);
        self.items.last().unwrap()
    }
}

impl<T> Default for Arena<T> {
    fn default() -> Self {
        Self::new()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_string_arena() {
        let mut arena = StringArena::new();
        arena.alloc("hello");
        arena.alloc("world");
        assert_eq!(arena.len(), 2);
        // Note: can't keep refs across alloc calls with &mut self
    }

    #[test]
    fn test_typed_arena() {
        let mut arena: Arena<i32> = Arena::new();
        arena.alloc(1);
        arena.alloc(2);
        // Refs would require interior mutability for real arena
    }
}

(* Arena allocation in OCaml *)
(* OCaml's GC is essentially a generational arena *)
(* Simulating with a list that holds all allocated nodes *)

type 'a arena = 'a list ref

let make_arena () = ref []

let alloc arena value =
  arena := value :: !arena;
  value  (* returns the value -- in real arena, returns pointer *)

type node = {
  id: int;
  label: string;
  mutable edges: node list;
}

let make_node arena id label =
  alloc arena { id; label; edges = [] }

let add_edge from_node to_node =
  from_node.edges <- to_node :: from_node.edges

let () =
  let arena = make_arena () in
  let n1 = make_node arena 1 "start" in
  let n2 = make_node arena 2 "middle" in
  let n3 = make_node arena 3 "end" in
  add_edge n1 n2; add_edge n2 n3; add_edge n1 n3;
  Printf.printf "n1 edges: %d\n" (List.length n1.edges);
  Printf.printf "total nodes: %d\n" (List.length !arena);
  (* "Free" the arena: *)
  arena := []

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_string_arena() {
        let mut arena = StringArena::new();
        arena.alloc("hello");
        arena.alloc("world");
        assert_eq!(arena.len(), 2);
        // Note: can't keep refs across alloc calls with &mut self
    }

    #[test]
    fn test_typed_arena() {
        let mut arena: Arena<i32> = Arena::new();
        arena.alloc(1);
        arena.alloc(2);
        // Refs would require interior mutability for real arena
    }
}

Deep Comparison

OCaml vs Rust: lifetime arena pattern

See example.rs and example.ml for implementations.

Key Differences

OCaml uses garbage collection

Rust uses ownership and borrowing

Both support the core concept

Exercises

Fixed-capacity arena: Implement an arena that pre-allocates a Vec<String> with a fixed capacity and returns Err when full, ensuring no reallocation can invalidate references.

AST arena: Build a simple arena for AST nodes using Vec<Box<AstNode>> where each node is a heap allocation whose reference is valid for the arena's lifetime.

Benchmark comparison: Compare allocating 10,000 strings with (a) individual String::from, (b) a simple arena, and (c) direct Vec<String> — measure and explain the performance differences.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust