732 Intermediate

732-benchmarking-harness — Benchmarking Harness

Functional Programming

Tutorial

The Problem

Micro-benchmarking is surprisingly hard to do correctly. The compiler may eliminate "dead" computations, the CPU may boost frequency during warmup, and a single outlier can skew the mean. Production benchmark frameworks like Criterion address these problems with warmup phases, statistical analysis, and outlier rejection. This example builds a Criterion-inspired harness using only std, demonstrating the core techniques that make benchmarks trustworthy.
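To make the dead-code problem concrete, here is a minimal std-only sketch of the black_box pattern the harness relies on. Without it, the optimizer is free to constant-fold the sum or delete the loop entirely, and the timing would measure nothing.

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let n: u64 = 1_000_000;
    let t0 = Instant::now();
    // black_box on the input stops constant-folding of the range bound;
    // black_box on the output stops the optimizer from deleting the
    // "unused" sum after the measurement.
    let sum: u64 = (0..black_box(n)).sum();
    black_box(sum);
    println!("summed {} values in {:?}", n, t0.elapsed());
}
```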

🎯 Learning Outcomes

  • Use std::hint::black_box to prevent dead-code elimination of benchmark subjects
  • Implement a warmup phase to stabilize CPU frequency and fill caches before measuring
  • Collect per-iteration Duration samples and compute mean, min, max, and standard deviation
  • Understand why the standard deviation matters more than the mean for latency-sensitive code
  • Structure benchmark results in a BenchResult struct for comparison across runs

Code Example

    #![allow(clippy::all)]
    /// 732: Benchmarking Harness — Criterion-style, std-only
    use std::hint::black_box;
    use std::time::{Duration, Instant};
    
    // ── Core Harness ──────────────────────────────────────────────────────────────
    
    struct BenchResult {
        label: &'static str,
        mean: Duration,
        min: Duration,
        max: Duration,
        stddev_ns: f64,
        iters: u64,
    }
    
    impl BenchResult {
        fn print(&self) {
            println!(
                "{:40} mean={:>10.2?} min={:>10.2?} max={:>10.2?} σ={:.0}ns  (n={})",
                self.label, self.mean, self.min, self.max, self.stddev_ns, self.iters,
            );
        }
    }
    
    fn bench<F, R>(label: &'static str, warmup: u64, iters: u64, mut f: F) -> BenchResult
    where
        F: FnMut() -> R,
    {
        assert!(iters > 0, "need at least one measured iteration");

        // Warmup — fill CPU caches, allow the CPU to ramp up frequency
        for _ in 0..warmup {
            black_box(f());
        }
    
        let mut samples = Vec::with_capacity(iters as usize);
    
        for _ in 0..iters {
            let t0 = Instant::now();
            let result = f();
            let elapsed = t0.elapsed();
            black_box(result); // prevent dead-code elimination
            samples.push(elapsed);
        }
    
        let total_ns: u128 = samples.iter().map(|d| d.as_nanos()).sum();
        let mean_ns = total_ns / iters as u128;
        let mean = Duration::from_nanos(mean_ns as u64);
    
        let min = *samples.iter().min().unwrap();
        let max = *samples.iter().max().unwrap();
    
        let variance_ns: f64 = samples
            .iter()
            .map(|d| {
                let diff = d.as_nanos() as f64 - mean_ns as f64;
                diff * diff
            })
            .sum::<f64>()
            / iters as f64;
    
        BenchResult {
            label,
            mean,
            min,
            max,
            stddev_ns: variance_ns.sqrt(),
            iters,
        }
    }
    
    // ── Functions to Benchmark ────────────────────────────────────────────────────
    
    fn sum_naive(n: u64) -> u64 {
        (0..n).sum()
    }
    
    fn sum_formula(n: u64) -> u64 {
        // Closed form for 0 + 1 + … + (n-1); the guard avoids u64 underflow at n = 0.
        if n == 0 { 0 } else { n * (n - 1) / 2 }
    }
    
    fn string_push(n: usize) -> String {
        let mut s = String::with_capacity(n);
        for _ in 0..n {
            s.push('x');
        }
        s
    }
    
    fn vec_collect(n: usize) -> Vec<u64> {
        (0..n as u64).collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn bench_runs_warmup_and_iters() {
            let mut call_count = 0u64;
            let result = bench("test", 5, 10, || {
                call_count += 1;
                call_count
            });
            // warmup(5) + iters(10) = 15 calls
            assert_eq!(call_count, 15);
            assert_eq!(result.iters, 10);
        }
    
        #[test]
        fn bench_min_le_mean_le_max() {
            let r = bench("sum_naive_100", 2, 20, || sum_naive(black_box(100)));
            assert!(r.min <= r.mean);
            assert!(r.mean <= r.max);
        }
    
        #[test]
        fn sum_naive_correct() {
            assert_eq!(sum_naive(5), 10); // 0+1+2+3+4
            assert_eq!(sum_naive(0), 0);
        }
    
        #[test]
        fn sum_formula_matches_naive() {
            for n in 1..=20u64 {
                assert_eq!(sum_naive(n), sum_formula(n), "n={}", n);
            }
        }
    }
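
The harness's point about σ can be seen directly: one outlier barely moves the mean of a large sample but inflates the standard deviation dramatically. A standalone sketch (not part of the harness above) using the same mean/variance formulas:

```rust
use std::time::Duration;

/// Population mean and standard deviation, both in nanoseconds.
fn stats(samples: &[Duration]) -> (f64, f64) {
    let mean = samples.iter().map(|d| d.as_nanos() as f64).sum::<f64>()
        / samples.len() as f64;
    let variance = samples
        .iter()
        .map(|d| {
            let diff = d.as_nanos() as f64 - mean;
            diff * diff
        })
        .sum::<f64>()
        / samples.len() as f64;
    (mean, variance.sqrt())
}

fn main() {
    // 99 "fast" samples at 100ns plus a single 10µs outlier.
    let mut samples = vec![Duration::from_nanos(100); 99];
    samples.push(Duration::from_nanos(10_000));
    let (mean, stddev) = stats(&samples);
    // mean ≈ 199ns (roughly doubled), σ ≈ 985ns (the outlier dominates)
    println!("mean = {:.0}ns, σ = {:.0}ns", mean, stddev);
}
```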

Key Differences

  • Dead-code prevention: Rust has std::hint::black_box; OCaml's Sys.opaque_identity serves the same purpose in core_bench.
  • GC noise: OCaml benchmarks must account for GC pauses; Rust has no GC, so samples are more consistent but cache warm-up still matters.
  • Closures: Both languages pass closures to the harness; Rust distinguishes capturing by reference from capturing by move (with the move keyword), while OCaml closures simply capture the values of the bindings in scope — OCaml bindings are immutable, so the distinction does not arise.
  • Ecosystem: Rust has criterion (statistical, HTML reports) and divan; OCaml has core_bench and bechamel.
OCaml Approach

    OCaml's standard library has no built-in benchmarking framework. The benchmark opam package and Jane Street's core_bench library fill this role. core_bench uses a similar warmup + sample approach with GC-pause awareness: it forces a minor GC collection before each measurement to reduce noise from accumulated garbage. OCaml's Sys.time and Unix.gettimeofday are the primitives; Mtime_clock provides monotonic wall-clock time similar to Instant.


Exercises

  • Add a p99 latency field to BenchResult by sorting samples and indexing at 0.99 * iters.
  • Implement a compare function that takes two BenchResult values and prints the speedup ratio and whether the difference is statistically significant (|Δmean| > 2σ).
  • Extend the harness to detect and discard outliers (samples more than 3 standard deviations from the mean) and recompute statistics on the cleaned dataset.
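
For the p99 exercise, one possible shape of the helper (a sketch; percentile is a hypothetical name, not part of the harness above). Sorting in place with sort_unstable is fine here because Duration is Copy and Ord:

```rust
use std::time::Duration;

/// Hypothetical helper for exercise 1: the p-th percentile of a sample set
/// (p in 0.0..=1.0), using nearest-rank indexing into the sorted samples.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    samples.sort_unstable();
    let idx = ((samples.len() as f64 - 1.0) * p).round() as usize;
    samples[idx]
}

fn main() {
    // 100 evenly spaced samples: 1ms, 2ms, …, 100ms.
    let mut samples: Vec<Duration> = (1..=100).map(Duration::from_millis).collect();
    let p99 = percentile(&mut samples, 0.99);
    println!("p99 = {:?}", p99); // prints "p99 = 99ms"
}
```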