732 Intermediate

732-benchmarking-harness — Benchmarking Harness

Functional Programming

Tutorial

The Problem

Micro-benchmarking is surprisingly hard to do correctly. The compiler may eliminate "dead" computations, the CPU may boost frequency during warmup, and a single outlier can skew the mean. Production benchmark frameworks like Criterion address these problems with warmup phases, statistical analysis, and outlier rejection. This example builds a Criterion-inspired harness using only std, demonstrating the core techniques that make benchmarks trustworthy.
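To make the dead-code problem concrete, here is a minimal std-only sketch of the black_box pattern the harness relies on. Without it, the optimizer is free to constant-fold the sum or delete the loop entirely, and the timing would measure nothing.

```rust
use std::hint::black_box;
use std::time::Instant;

fn main() {
    let n: u64 = 1_000_000;
    let t0 = Instant::now();
    // black_box on the input stops constant-folding of the range bound;
    // black_box on the output stops the optimizer from deleting the
    // "unused" sum after the measurement.
    let sum: u64 = (0..black_box(n)).sum();
    black_box(sum);
    println!("summed {} values in {:?}", n, t0.elapsed());
}
```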

🎯 Learning Outcomes

  • Use std::hint::black_box to prevent dead-code elimination of benchmark subjects
  • Implement a warmup phase to stabilize CPU frequency and fill caches before measuring
  • Collect per-iteration Duration samples and compute mean, min, max, and standard deviation
  • Understand why the standard deviation matters more than the mean for latency-sensitive code
  • Structure benchmark results in a BenchResult struct for comparison across runs

Code Example

    #![allow(clippy::all)]
    /// 732: Benchmarking Harness — Criterion-style, std-only
    use std::hint::black_box;
    use std::time::{Duration, Instant};
    
    // ── Core Harness ──────────────────────────────────────────────────────────────
    
    struct BenchResult {
        label: &'static str,
        mean: Duration,
        min: Duration,
        max: Duration,
        stddev_ns: f64,
        iters: u64,
    }
    
    impl BenchResult {
        fn print(&self) {
            println!(
                "{:40} mean={:>10.2?} min={:>10.2?} max={:>10.2?} σ={:.0}ns  (n={})",
                self.label, self.mean, self.min, self.max, self.stddev_ns, self.iters,
            );
        }
    }
    
    fn bench<F, R>(label: &'static str, warmup: u64, iters: u64, mut f: F) -> BenchResult
    where
        F: FnMut() -> R,
    {
        assert!(iters > 0, "need at least one measured iteration");

        // Warmup — fill CPU caches, allow the CPU to ramp up frequency
        for _ in 0..warmup {
            black_box(f());
        }
    
        let mut samples = Vec::with_capacity(iters as usize);
    
        for _ in 0..iters {
            let t0 = Instant::now();
            let result = f();
            let elapsed = t0.elapsed();
            black_box(result); // prevent dead-code elimination
            samples.push(elapsed);
        }
    
        let total_ns: u128 = samples.iter().map(|d| d.as_nanos()).sum();
        let mean_ns = total_ns / iters as u128;
        let mean = Duration::from_nanos(mean_ns as u64);
    
        let min = *samples.iter().min().unwrap();
        let max = *samples.iter().max().unwrap();
    
        let variance_ns: f64 = samples
            .iter()
            .map(|d| {
                let diff = d.as_nanos() as f64 - mean_ns as f64;
                diff * diff
            })
            .sum::<f64>()
            / iters as f64;
    
        BenchResult {
            label,
            mean,
            min,
            max,
            stddev_ns: variance_ns.sqrt(),
            iters,
        }
    }
    
    // ── Functions to Benchmark ────────────────────────────────────────────────────
    
    fn sum_naive(n: u64) -> u64 {
        (0..n).sum()
    }
    
    fn sum_formula(n: u64) -> u64 {
        // Closed form for 0 + 1 + … + (n-1); the guard avoids u64 underflow at n = 0.
        if n == 0 { 0 } else { n * (n - 1) / 2 }
    }
    
    fn string_push(n: usize) -> String {
        let mut s = String::with_capacity(n);
        for _ in 0..n {
            s.push('x');
        }
        s
    }
    
    fn vec_collect(n: usize) -> Vec<u64> {
        (0..n as u64).collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn bench_runs_warmup_and_iters() {
            let mut call_count = 0u64;
            let result = bench("test", 5, 10, || {
                call_count += 1;
                call_count
            });
            // warmup(5) + iters(10) = 15 calls
            assert_eq!(call_count, 15);
            assert_eq!(result.iters, 10);
        }
    
        #[test]
        fn bench_min_le_mean_le_max() {
            let r = bench("sum_naive_100", 2, 20, || sum_naive(black_box(100)));
            assert!(r.min <= r.mean);
            assert!(r.mean <= r.max);
        }
    
        #[test]
        fn sum_naive_correct() {
            assert_eq!(sum_naive(5), 10); // 0+1+2+3+4
            assert_eq!(sum_naive(0), 0);
        }
    
        #[test]
        fn sum_formula_matches_naive() {
            for n in 1..=20u64 {
                assert_eq!(sum_naive(n), sum_formula(n), "n={}", n);
            }
        }
    }
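
The harness's point about σ can be seen directly: one outlier barely moves the mean of a large sample but inflates the standard deviation dramatically. A standalone sketch (not part of the harness above) using the same mean/variance formulas:

```rust
use std::time::Duration;

/// Population mean and standard deviation, both in nanoseconds.
fn stats(samples: &[Duration]) -> (f64, f64) {
    let mean = samples.iter().map(|d| d.as_nanos() as f64).sum::<f64>()
        / samples.len() as f64;
    let variance = samples
        .iter()
        .map(|d| {
            let diff = d.as_nanos() as f64 - mean;
            diff * diff
        })
        .sum::<f64>()
        / samples.len() as f64;
    (mean, variance.sqrt())
}

fn main() {
    // 99 "fast" samples at 100ns plus a single 10µs outlier.
    let mut samples = vec![Duration::from_nanos(100); 99];
    samples.push(Duration::from_nanos(10_000));
    let (mean, stddev) = stats(&samples);
    // mean ≈ 199ns (roughly doubled), σ ≈ 985ns (the outlier dominates)
    println!("mean = {:.0}ns, σ = {:.0}ns", mean, stddev);
}
```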

Key Differences

  • Dead-code prevention: Rust has std::hint::black_box; OCaml's Sys.opaque_identity serves the same purpose in core_bench.
  • GC noise: OCaml benchmarks must account for GC pauses; Rust has no GC, so samples are more consistent but cache warm-up still matters.
  • Closures: Both languages pass closures to the harness; Rust distinguishes capturing by reference from capturing by move (with the move keyword), while OCaml closures simply capture the values of the bindings in scope — OCaml bindings are immutable, so the distinction does not arise.
  • Ecosystem: Rust has criterion (statistical, HTML reports) and divan; OCaml has core_bench and bechamel.
OCaml Approach

    OCaml's standard library has no built-in benchmarking framework. The benchmark opam package and Jane Street's core_bench library fill this role. core_bench uses a similar warmup + sample approach with GC-pause awareness: it forces a minor GC collection before each measurement to reduce noise from accumulated garbage. OCaml's Sys.time and Unix.gettimeofday are the primitives; Mtime_clock provides monotonic wall-clock time similar to Instant.


Exercises

  • Add a p99 latency field to BenchResult by sorting samples and indexing at 0.99 * iters.
  • Implement a compare function that takes two BenchResult values and prints the speedup ratio and whether the difference is statistically significant (|Δmean| > 2σ).
  • Extend the harness to detect and discard outliers (samples more than 3 standard deviations from the mean) and recompute statistics on the cleaned dataset.
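
For the p99 exercise, one possible shape of the helper (a sketch; percentile is a hypothetical name, not part of the harness above). Sorting in place with sort_unstable is fine here because Duration is Copy and Ord:

```rust
use std::time::Duration;

/// Hypothetical helper for exercise 1: the p-th percentile of a sample set
/// (p in 0.0..=1.0), using nearest-rank indexing into the sorted samples.
fn percentile(samples: &mut [Duration], p: f64) -> Duration {
    samples.sort_unstable();
    let idx = ((samples.len() as f64 - 1.0) * p).round() as usize;
    samples[idx]
}

fn main() {
    // 100 evenly spaced samples: 1ms, 2ms, …, 100ms.
    let mut samples: Vec<Duration> = (1..=100).map(Duration::from_millis).collect();
    let p99 = percentile(&mut samples, 0.99);
    println!("p99 = {:?}", p99); // prints "p99 = 99ms"
}
```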