ExamplesBy LevelBy TopicLearning Paths
119 Advanced

Zero-Cost Abstractions

Functional Programming

Tutorial

The Problem

High-level code often runs slower than hand-written low-level code in languages without zero-cost guarantees — intermediate allocations, dynamic dispatch, and interpreter overhead add up. Rust's zero-cost abstraction principle guarantees that iterator chains, closures, and newtype wrappers compile to identical machine code as their hand-written equivalents. This enables writing expressive, composable code without sacrificing performance — a core design goal of the language.

🎯 Learning Outcomes

  • • Understand what "zero-cost abstraction" means concretely in terms of generated code
  • • See how iterator chains fuse into a single loop with no intermediate allocation
  • • Learn how closures are monomorphized and inlined, eliminating indirect calls
  • • Understand how newtypes provide compile-time safety with literally zero runtime overhead
  • Code Example

    pub fn sum_even_squares(n: i64) -> i64 {
        (0..n).filter(|x| x % 2 == 0).map(|x| x * x).sum()
    }

    Key Differences

  • Iterator fusion: Rust's iterator adaptors fuse at compile time (no intermediate collections); OCaml's eager list functions allocate at each step unless using Seq.
  • Closure cost: Rust closures are monomorphized with impl Fn — zero heap allocation, inlined call; OCaml closures are heap-allocated function records.
  • Newtype overhead: Rust newtypes are zero-overhead by construction (#[repr(transparent)]); OCaml has no equivalent — modules wrapping types may add indirection.
  • Guarantee vs. best-effort: Rust makes zero-cost a language-level guarantee for these patterns; OCaml relies on the optimizer (flambda) for similar effects.
  • OCaml Approach

    OCaml does not guarantee zero-cost abstractions by default. List.filter |> List.map allocates two intermediate lists. OCaml closures carry a heap-allocated environment record. The flambda optimizer can eliminate some overhead in batch compilation, and Seq provides lazy iteration, but these are opt-in rather than guaranteed by the language design.

    Full Source

    #![allow(clippy::all)]
    //! Example 119: Zero-Cost Abstractions
    //!
    //! Iterator chains, closures, and newtypes compile to the same machine code
    //! as hand-written equivalents — abstraction with no runtime penalty.
    
    // ── Approach 1: Iterator chains ──────────────────────────────────────────────
    //
    // `.filter().map().sum()` fuses into a single loop at compile time.
    // No intermediate `Vec` is ever allocated.
    
    /// Sum of squares of all even numbers in `0..n`.
    pub fn sum_even_squares(n: i64) -> i64 {
        (0..n).filter(|x| x % 2 == 0).map(|x| x * x).sum()
    }
    
    /// Same computation written as an explicit loop — produces identical assembly.
    pub fn sum_even_squares_manual(n: i64) -> i64 {
        let mut acc = 0i64;
        for x in 0..n {
            if x % 2 == 0 {
                acc += x * x;
            }
        }
        acc
    }
    
    // ── Approach 2: Closures monomorphised at the call site ───────────────────────
    //
    // `impl Fn(f64) -> f64` is a zero-sized type; LLVM inlines the closure body
    // completely. No heap allocation, no indirect call.
    
    /// Returns a closure that evaluates the polynomial `cā‚€ + c₁x + cā‚‚x² + …`.
    pub fn make_polynomial(coeffs: Vec<f64>) -> impl Fn(f64) -> f64 {
        move |x| {
            coeffs
                .iter()
                .enumerate()
                .map(|(i, &c)| c * x.powi(i as i32))
                .sum()
        }
    }
    
    // ── Approach 3: Newtypes — compile-time safety, zero runtime cost ─────────────
    //
    // `Meters` and `Seconds` are distinct types that the compiler enforces,
    // yet at runtime each is just a bare `f64`.
    
    /// Distance measured in metres.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Meters(pub f64);
    
    /// Time measured in seconds.
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct Seconds(pub f64);
    
    /// Speed measured in metres per second (derived from newtypes).
    #[derive(Debug, Clone, Copy, PartialEq)]
    pub struct MetersPerSecond(pub f64);
    
    impl Meters {
        pub fn value(self) -> f64 {
            self.0
        }
    }
    
    impl Seconds {
        pub fn value(self) -> f64 {
            self.0
        }
    }
    
    /// Compute speed — the type system prevents accidentally passing `Seconds`
    /// where `Meters` is expected, at zero runtime cost.
    pub fn speed(distance: Meters, time: Seconds) -> MetersPerSecond {
        MetersPerSecond(distance.0 / time.0)
    }
    
    // ── Approach 4: Higher-order pipeline — idiomatic functional style ────────────
    
    /// Apply a pipeline of transformations to a slice, returning a collected `Vec`.
    /// Each `fn` pointer is monomorphised; the chain is inlined by the optimiser.
    pub fn pipeline<T, U, F>(data: &[T], transform: F) -> Vec<U>
    where
        T: Copy,
        F: Fn(T) -> U,
    {
        data.iter().copied().map(transform).collect()
    }
    
    // ─────────────────────────────────────────────────────────────────────────────
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_iterator_chain_equals_manual_loop() {
            // The two implementations must produce identical results for all n.
            for n in [0, 1, 10, 100, 1000] {
                assert_eq!(
                    sum_even_squares(n),
                    sum_even_squares_manual(n),
                    "mismatch at n={n}"
                );
            }
        }
    
        #[test]
        fn test_sum_even_squares_known_value() {
            // 0..10 even numbers: 0,2,4,6,8  → squares: 0,4,16,36,64 → sum=120
            assert_eq!(sum_even_squares(10), 120);
        }
    
        #[test]
        fn test_polynomial_closure() {
            // p(x) = 1 + 2x + 3x²
            // p(0) = 1, p(1) = 6, p(2) = 17
            let poly = make_polynomial(vec![1.0, 2.0, 3.0]);
            assert!((poly(0.0) - 1.0).abs() < 1e-10);
            assert!((poly(1.0) - 6.0).abs() < 1e-10);
            assert!((poly(2.0) - 17.0).abs() < 1e-10);
        }
    
        #[test]
        fn test_polynomial_constant() {
            // p(x) = 5  (single coefficient)
            let poly = make_polynomial(vec![5.0]);
            assert!((poly(0.0) - 5.0).abs() < 1e-10);
            assert!((poly(99.0) - 5.0).abs() < 1e-10);
        }
    
        #[test]
        fn test_newtype_speed() {
            let d = Meters(100.0);
            let t = Seconds(9.58); // Usain Bolt world record
            let s = speed(d, t);
            assert!((s.0 - 100.0 / 9.58).abs() < 1e-10);
        }
    
        #[test]
        fn test_newtype_zero_cost_value_access() {
            // Newtype wrapper adds no overhead — .value() is an identity function.
            let m = Meters(42.0);
            let s = Seconds(7.0);
            assert_eq!(m.value(), 42.0);
            assert_eq!(s.value(), 7.0);
        }
    
        #[test]
        fn test_pipeline_transform() {
            let data = [1i32, 2, 3, 4, 5];
            let doubled = pipeline(&data, |x| x * 2);
            assert_eq!(doubled, vec![2, 4, 6, 8, 10]);
        }
    
        #[test]
        fn test_pipeline_empty() {
            let data: [i32; 0] = [];
            let result = pipeline(&data, |x| x + 1);
            assert_eq!(result, Vec::<i32>::new());
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_iterator_chain_equals_manual_loop() {
            // The two implementations must produce identical results for all n.
            for n in [0, 1, 10, 100, 1000] {
                assert_eq!(
                    sum_even_squares(n),
                    sum_even_squares_manual(n),
                    "mismatch at n={n}"
                );
            }
        }
    
        #[test]
        fn test_sum_even_squares_known_value() {
            // 0..10 even numbers: 0,2,4,6,8  → squares: 0,4,16,36,64 → sum=120
            assert_eq!(sum_even_squares(10), 120);
        }
    
        #[test]
        fn test_polynomial_closure() {
            // p(x) = 1 + 2x + 3x²
            // p(0) = 1, p(1) = 6, p(2) = 17
            let poly = make_polynomial(vec![1.0, 2.0, 3.0]);
            assert!((poly(0.0) - 1.0).abs() < 1e-10);
            assert!((poly(1.0) - 6.0).abs() < 1e-10);
            assert!((poly(2.0) - 17.0).abs() < 1e-10);
        }
    
        #[test]
        fn test_polynomial_constant() {
            // p(x) = 5  (single coefficient)
            let poly = make_polynomial(vec![5.0]);
            assert!((poly(0.0) - 5.0).abs() < 1e-10);
            assert!((poly(99.0) - 5.0).abs() < 1e-10);
        }
    
        #[test]
        fn test_newtype_speed() {
            let d = Meters(100.0);
            let t = Seconds(9.58); // Usain Bolt world record
            let s = speed(d, t);
            assert!((s.0 - 100.0 / 9.58).abs() < 1e-10);
        }
    
        #[test]
        fn test_newtype_zero_cost_value_access() {
            // Newtype wrapper adds no overhead — .value() is an identity function.
            let m = Meters(42.0);
            let s = Seconds(7.0);
            assert_eq!(m.value(), 42.0);
            assert_eq!(s.value(), 7.0);
        }
    
        #[test]
        fn test_pipeline_transform() {
            let data = [1i32, 2, 3, 4, 5];
            let doubled = pipeline(&data, |x| x * 2);
            assert_eq!(doubled, vec![2, 4, 6, 8, 10]);
        }
    
        #[test]
        fn test_pipeline_empty() {
            let data: [i32; 0] = [];
            let result = pipeline(&data, |x| x + 1);
            assert_eq!(result, Vec::<i32>::new());
        }
    }

    Deep Comparison

    OCaml vs Rust: Zero-Cost Abstractions

    Side-by-Side Code

    OCaml

    (* OCaml: filter + map + fold — creates intermediate lists *)
    let sum_even_squares n =
      List.init n Fun.id
      |> List.filter (fun x -> x mod 2 = 0)
      |> List.map (fun x -> x * x)
      |> List.fold_left ( + ) 0
    
    (* Closure-returning function — may allocate a closure record on the heap *)
    let make_polynomial coeffs =
      fun x ->
        List.mapi (fun i c -> c *. (x ** float_of_int i)) coeffs
        |> List.fold_left ( +. ) 0.0
    

    Rust (idiomatic — iterator chain)

    pub fn sum_even_squares(n: i64) -> i64 {
        (0..n).filter(|x| x % 2 == 0).map(|x| x * x).sum()
    }
    

    Rust (functional/recursive — explicit fold)

    pub fn sum_even_squares_recursive(n: i64) -> i64 {
        fn go(x: i64, limit: i64, acc: i64) -> i64 {
            if x >= limit { return acc; }
            let next = if x % 2 == 0 { acc + x * x } else { acc };
            go(x + 1, limit, next)
        }
        go(0, n, 0)
    }
    

    Rust (newtype — zero-cost phantom type)

    pub struct Meters(pub f64);
    pub struct Seconds(pub f64);
    
    pub fn speed(d: Meters, t: Seconds) -> f64 {
        d.0 / t.0
    }
    

    Type Signatures

    ConceptOCamlRust
    Filter+map+sum'a list -> int (intermediate lists)impl Iterator (lazy, fused)
    Closure-returning fnfloat list -> float -> float (may box closure)fn(...) -> impl Fn(f64) -> f64 (monomorphised)
    Newtypetype meters = Meters of float (constructor overhead in some backends)struct Meters(f64) (transparent repr)
    Polymorphic pipeline('a -> 'b) -> 'a list -> 'b listfn<T,U,F: Fn(T)->U>(&[T], F) -> Vec<U>

    Key Insights

  • Iterator fusion: OCaml's List.filter and List.map each traverse and allocate a new list. Rust's Iterator adapters are lazy; the compiler fuses the entire chain into a single loop with no heap traffic.
  • Closure monomorphisation: OCaml closures are heap-allocated records with a function pointer (indirect call). Rust closures each get a unique anonymous struct type; at the call site the compiler sees the exact type and can inline the body completely, eliminating the indirect call and enabling further optimisation.
  • Newtype transparency: Both languages support newtypes for type-safe wrappers. In OCaml a constructor call may require a tag word and an allocation in boxed contexts. In Rust #[repr(transparent)] (the default for single-field structs) guarantees the wrapper is bit-for-bit identical to its inner type — the type information disappears entirely after type-checking.
  • Abstraction as compiler hints: Rust's higher-level constructs are instructions to the compiler, not the CPU. The Iterator protocol, impl Trait return types, and newtype wrappers give the optimiser the information it needs to produce the same (or better) assembly as a hand-written C loop — without asking the programmer to give up expressiveness.
  • Verification approach: Because Rust cannot directly expose assembly in unit tests, we verify zero-cost empirically by asserting that the iterator-chain result equals the manual-loop result for all inputs. In a release build with cargo asm or Godbolt, the two functions produce identical machine code.
  • When to Use Each Style

    Use the iterator chain when: you want readable, composable data transformation — filter, map, flat_map, take_while, sum, fold. The compiler handles fusion; you get clarity for free.

    Use explicit loops when: the transformation requires mutable state that doesn't fit cleanly into a fold (e.g., sliding-window state machines), or when readability genuinely benefits from imperative style. The performance will be equivalent.

    Use newtypes when: you want the compiler to enforce unit correctness (metres vs seconds, user IDs vs product IDs) at zero runtime cost. Prefer them over type aliases (type Meters = f64) which are transparent to the type checker.

    Exercises

  • Compare assembly output (via cargo asm or Godbolt) of sum_even_squares and sum_even_squares_manual — verify they produce identical loops.
  • Implement a Celsius newtype and a to_fahrenheit conversion; confirm the newtype has size_of::<Celsius>() == size_of::<f64>().
  • Chain five iterator adaptors (filter, map, take, zip, fold) and verify in a benchmark that performance matches a manually unrolled loop.
  • Open Source Repos