530 Intermediate

Closures in Benchmarking

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Closures in Benchmarking" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Micro-benchmarking is notoriously tricky: compilers optimize aggressively, CPUs speculate and prefetch, and without careful technique, what you measure may not reflect what actually runs in production. Key difference from OCaml: 1. **Dead code prevention**: Rust has `std::hint::black_box` as a stable stdlib function; OCaml uses `Sys.opaque_identity` (internal, not guaranteed stable) or `ignore` which the optimizer may eliminate.

Tutorial

The Problem

Micro-benchmarking is notoriously tricky: compilers optimize aggressively, CPUs speculate and prefetch, and without careful technique, what you measure may not reflect what actually runs in production. Closures are central to benchmarking APIs because they encapsulate the code under test while providing the benchmark harness control over setup, teardown, and iteration. std::hint::black_box prevents the optimizer from eliminating computations whose results are discarded. This example shows how to build closure-based benchmarking utilities similar to criterion.

🎯 Learning Outcomes

• How FnMut() -> T encapsulates the code under test in a reusable way

• Why std::hint::black_box is essential for preventing dead code elimination

• How warmup iterations reduce JIT and cache effects in micro-benchmarks

• How bench_compare takes two FnMut closures and reports their relative performance

• How setup and teardown closures enable fair measurement of just the code under test

Code Example

use std::hint::black_box;

pub fn bench<T, F: FnMut() -> T>(name: &str, iters: usize, mut f: F) {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());  // prevent optimization
    }
    println!("{}: {:?}", name, start.elapsed());
}

let bench name iterations f =
  let start = Unix.gettimeofday () in
  for _ = 1 to iterations do
    ignore (f ())
  done;
  let elapsed = Unix.gettimeofday () -. start in
  Printf.printf "%s: %f sec\n" name elapsed

let () = bench "sum" 1000 (fun () -> List.fold_left (+) 0 [1;2;3])

Key Differences

Dead code prevention: Rust has std::hint::black_box as a stable stdlib function; OCaml uses Sys.opaque_identity (internal, not guaranteed stable) or ignore which the optimizer may eliminate.

Timing granularity: Rust's std::time::Instant has nanosecond resolution; OCaml's Unix.gettimeofday has microsecond resolution on most platforms.

Generic test body: Rust's FnMut() -> T allows the benchmark to observe the return value via black_box; OCaml's unit -> unit discards the result, potentially allowing optimization.

Library ecosystem: Rust has criterion (statistics-based) and divan as mature benchmark frameworks; OCaml has Core_bench from Jane Street and Bechamel.

OCaml Approach

OCaml benchmarking uses the Core_bench or Bechamel library. Benchmarks are registered as closures:

let bench name f =
  let t0 = Unix.gettimeofday () in
  for _ = 1 to 1000 do ignore (f ()) done;
  let t1 = Unix.gettimeofday () in
  Printf.printf "%s: %.2fus\n" name ((t1 -. t0) /. 1000.0 *. 1e6)

OCaml lacks a black_box equivalent in stdlib — ignore or Sys.opaque_identity is used instead.

Full Source

#![allow(clippy::all)]
//! Closures in Benchmarking
//!
//! Patterns for measuring performance with closures and black_box.

use std::hint::black_box;
use std::time::{Duration, Instant};

/// Simple benchmark runner: time a closure over N iterations.
pub fn bench<T, F: FnMut() -> T>(name: &str, iterations: usize, mut f: F) -> Duration {
    // Warmup
    for _ in 0..iterations / 10 {
        black_box(f());
    }

    let start = Instant::now();
    for _ in 0..iterations {
        black_box(f());
    }
    let elapsed = start.elapsed();
    let per_iter = elapsed / iterations as u32;
    println!(
        "{}: {:?} per iteration ({} iters)",
        name, per_iter, iterations
    );
    elapsed
}

/// Compare two implementations.
pub fn bench_compare<T, F1, F2>(name1: &str, f1: F1, name2: &str, f2: F2, iterations: usize)
where
    F1: FnMut() -> T,
    F2: FnMut() -> T,
{
    let t1 = bench(name1, iterations, f1);
    let t2 = bench(name2, iterations, f2);

    let ratio = t1.as_nanos() as f64 / t2.as_nanos() as f64;
    println!("Ratio {}/{}: {:.2}x", name1, name2, ratio);
}

/// Prevent value from being optimized away.
pub fn consume<T>(value: T) -> T {
    black_box(value)
}

/// Benchmark with setup and teardown closures.
pub fn bench_with_setup<S, T, Setup, Test, Teardown>(
    name: &str,
    iterations: usize,
    mut setup: Setup,
    mut test: Test,
    mut teardown: Teardown,
) -> Duration
where
    Setup: FnMut() -> S,
    Test: FnMut(S) -> T,
    Teardown: FnMut(T),
{
    // Warmup
    for _ in 0..iterations / 10 {
        let s = setup();
        let t = test(s);
        teardown(t);
    }

    let start = Instant::now();
    for _ in 0..iterations {
        let s = black_box(setup());
        let t = black_box(test(s));
        teardown(t);
    }
    let elapsed = start.elapsed();
    println!("{}: {:?} total ({} iters)", name, elapsed, iterations);
    elapsed
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_bench_basic() {
        let duration = bench("simple_add", 1000, || 1 + 1);
        assert!(duration.as_nanos() > 0);
    }

    #[test]
    fn test_consume() {
        let value = consume(42);
        assert_eq!(value, 42);
    }

    #[test]
    fn test_bench_with_closure_state() {
        let mut counter = 0;
        let _ = bench("counter", 100, || {
            counter += 1;
            counter
        });
        assert_eq!(counter, 110); // 10 warmup + 100 iterations
    }

    #[test]
    fn test_bench_with_setup() {
        let duration = bench_with_setup(
            "vec_sum",
            100,
            || vec![1, 2, 3, 4, 5],
            |v| v.iter().sum::<i32>(),
            |_| {},
        );
        assert!(duration.as_nanos() > 0);
    }

    #[test]
    fn test_black_box_prevents_optimization() {
        // Without black_box, compiler might optimize this away
        let result = black_box(vec![1, 2, 3].iter().sum::<i32>());
        assert_eq!(result, 6);
    }
}

(* Benchmarking with closures in OCaml *)
let time_it label f =
  let start = Unix.gettimeofday () in
  let result = f () in
  let elapsed = Unix.gettimeofday () -. start in
  Printf.printf "%s: %.6f seconds\n" label elapsed;
  result

(* Without Unix, simulate: *)
let benchmark label iterations f =
  let sum = ref 0 in
  for _ = 1 to iterations do
    sum := !sum + (f ())
  done;
  Printf.printf "%s: ran %d iterations (sum=%d to prevent opt)\n" label iterations !sum

let () =
  benchmark "sum_1000" 1000 (fun () ->
    let s = ref 0 in
    for i = 1 to 1000 do s := !s + i done;
    !s
  );
  benchmark "string_ops" 100 (fun () ->
    let s = String.concat "" (List.init 10 (fun i -> string_of_int i)) in
    String.length s
  )

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_bench_basic() {
        let duration = bench("simple_add", 1000, || 1 + 1);
        assert!(duration.as_nanos() > 0);
    }

    #[test]
    fn test_consume() {
        let value = consume(42);
        assert_eq!(value, 42);
    }

    #[test]
    fn test_bench_with_closure_state() {
        let mut counter = 0;
        let _ = bench("counter", 100, || {
            counter += 1;
            counter
        });
        assert_eq!(counter, 110); // 10 warmup + 100 iterations
    }

    #[test]
    fn test_bench_with_setup() {
        let duration = bench_with_setup(
            "vec_sum",
            100,
            || vec![1, 2, 3, 4, 5],
            |v| v.iter().sum::<i32>(),
            |_| {},
        );
        assert!(duration.as_nanos() > 0);
    }

    #[test]
    fn test_black_box_prevents_optimization() {
        // Without black_box, compiler might optimize this away
        let result = black_box(vec![1, 2, 3].iter().sum::<i32>());
        assert_eq!(result, 6);
    }
}

Deep Comparison

OCaml vs Rust: Benchmark Closures

OCaml

let bench name iterations f =
  let start = Unix.gettimeofday () in
  for _ = 1 to iterations do
    ignore (f ())
  done;
  let elapsed = Unix.gettimeofday () -. start in
  Printf.printf "%s: %f sec\n" name elapsed

let () = bench "sum" 1000 (fun () -> List.fold_left (+) 0 [1;2;3])

Rust

use std::hint::black_box;

pub fn bench<T, F: FnMut() -> T>(name: &str, iters: usize, mut f: F) {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());  // prevent optimization
    }
    println!("{}: {:?}", name, start.elapsed());
}

Key Differences

Rust: black_box prevents compiler from optimizing away results

OCaml: ignore discards result but doesn't prevent optimization

Both: Closures enable flexible benchmark setup

Rust: Setup/teardown closures for stateful benchmarks

Rust: Criterion crate for production benchmarking

Exercises

Statistics bench: Extend bench to return not just total duration but also mean, standard deviation, and min/max per iteration over multiple runs.

Comparison macro: Write a macro bench_vs!(name1 => expr1, name2 => expr2, iterations: N) that benchmarks two expressions and asserts the ratio is within a given tolerance.

Setup benchmark: Use bench_with_setup to benchmark Vec::sort vs Vec::sort_unstable on random data — ensure the setup closure generates a new random vector each iteration so sorting is never pre-sorted.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust