ExamplesBy LevelBy TopicLearning Paths
723 Advanced

Branchless Programming

Functional Programming

Tutorial

The Problem

Modern CPUs speculatively execute both sides of a branch before knowing which path is taken, relying on a branch predictor trained on past behavior. When the prediction is correct, branches are nearly free. When the predictor is wrongβ€”either because the branch outcome is data-dependent and unpredictable, or because the branch alternates irregularlyβ€”a misprediction flushes the pipeline and costs 10–20 cycles. For tight loops over large, unsorted data (sorting networks, cryptography, parsers, games), branch mispredictions can dominate execution time.

Branchless programming replaces conditional jumps with arithmetic: boolean conditions are converted to 0/1 integers, and the desired value is selected via multiplication or bitwise masking. The CPU computes both outcomes and selects one without speculating. The technique is standard in constant-time cryptographic code (preventing timing side channels), SIMD kernels, and performance-critical inner loops.

🎯 Learning Outcomes

  • β€’ Explain branch prediction, misprediction penalties, and pipeline stalls
  • β€’ Convert conditional expressions to arithmetic using sign-extension masks
  • β€’ Implement branchless min, max, abs, clamp, and select for integers
  • β€’ Recognize when the compiler already generates branchless code (check with objdump)
  • β€’ Understand constant-time programming and why branches leak secret data
  • Code Example

    #[inline(always)]
    pub fn min_idiomatic(a: i64, b: i64) -> i64 { a.min(b) }
    
    #[inline(always)]
    pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 { x.clamp(lo, hi) }
    
    #[inline(always)]
    pub fn abs_idiomatic(x: i64) -> i64 { x.wrapping_abs() }

    Key Differences

    AspectRustOCaml
    Integer representationUntagged i64/i32Tagged 63-bit int
    Arithmetic right shift>> 63 fills all bitsasr 62 needed (tag bit)
    Compiler branchless outputa.min(b) often branchlessStandard min may branch
    Constant-time stdlibsubtle crateExternal C (Hacl-star)
    SIMD branchless blend_mm_blendv_epi8 via std::archRequires C FFI

    OCaml Approach

    OCaml integers are tagged (63-bit on 64-bit systems), so arithmetic right-shift tricks require care with the tag bit. The standard library uses branching min/max. For constant-time crypto, OCaml code typically relies on external C via FFI:

    (* Tagged int: arithmetic shift includes tag bit β€” avoid for branchless tricks *)
    let branchless_min_unsafe a b =
      (* Only safe if a and b are untagged ints (Bigarray, Bytes) *)
      let diff = a - b in
      let mask = diff asr 62 in   (* asr 62 to avoid tag bit *)
      b + (mask land diff)
    
    (* Safe OCaml β€” uses compare, may branch *)
    let safe_min a b = if a < b then a else b
    

    For constant-time code, OCaml projects use Hacl-star bindings or C stubs.

    Full Source

    #![allow(clippy::all)]
    //! # Branchless Programming
    //!
    //! Demonstrates replacing conditional branches with arithmetic operations to
    //! avoid branch misprediction penalties on modern out-of-order CPUs.
    //!
    //! The core technique: arithmetic right-shift on signed integers sign-extends
    //! the MSB to all bits, producing a bitmask of 0 (if positive) or -1 (if negative).
    //! This mask selects between two precomputed values without a branch.
    
    // ── Branchless integer min/max ────────────────────────────────────────────────
    
    /// Branchless min using arithmetic right-shift bitmask.
    ///
    /// `(a - b) >> 63` produces 0x0000...0000 when a >= b,
    /// or 0xFFFF...FFFF when a < b. ANDing `diff` with this mask
    /// gives either 0 or `diff`, which we add to `b`.
    ///
    /// **Note:** inputs must not differ by more than `i64::MAX` (i.e., both values
    /// should lie in the same signed half-range) to avoid wrapping in the final add.
    /// For general use, prefer `i64::min` which LLVM compiles to a CMOV.
    #[inline(always)]
    pub fn min_branchless(a: i64, b: i64) -> i64 {
        let diff = a.wrapping_sub(b);
        let mask = diff >> 63; // arithmetic shift: 0 or -1 (all bits set)
        b.wrapping_add(diff & mask)
    }
    
    /// Branchless max using arithmetic right-shift bitmask.
    ///
    /// Same overflow caveat as `min_branchless`.
    #[inline(always)]
    pub fn max_branchless(a: i64, b: i64) -> i64 {
        let diff = a.wrapping_sub(b);
        let mask = diff >> 63;
        a.wrapping_sub(diff & mask)
    }
    
    /// Idiomatic Rust min β€” LLVM compiles this to a CMOV (conditional move)
    /// instruction on x86-64, which is already branchless at the hardware level.
    #[inline(always)]
    pub fn min_idiomatic(a: i64, b: i64) -> i64 {
        a.min(b)
    }
    
    /// Idiomatic Rust max β€” same CMOV story as min_idiomatic.
    #[inline(always)]
    pub fn max_idiomatic(a: i64, b: i64) -> i64 {
        a.max(b)
    }
    
    // ── Clamp ─────────────────────────────────────────────────────────────────────
    
    /// Branchless clamp: compose min/max without any conditional jumps.
    #[inline(always)]
    pub fn clamp_branchless(lo: i64, hi: i64, x: i64) -> i64 {
        min_branchless(hi, max_branchless(lo, x))
    }
    
    /// Idiomatic Rust clamp β€” also CMOV-friendly via LLVM.
    #[inline(always)]
    pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 {
        x.clamp(lo, hi)
    }
    
    // ── Absolute value ────────────────────────────────────────────────────────────
    
    /// Branchless absolute value using the sign-mask technique.
    ///
    /// `mask = x >> 63` is 0 for non-negative, -1 for negative.
    /// `(x + mask) ^ mask` negates x when negative, leaves it alone otherwise.
    #[inline(always)]
    pub fn abs_branchless(x: i64) -> i64 {
        let mask = x >> 63;
        (x + mask) ^ mask
    }
    
    /// Idiomatic Rust abs β€” wrapping_abs avoids UB on i64::MIN.
    #[inline(always)]
    pub fn abs_idiomatic(x: i64) -> i64 {
        x.wrapping_abs()
    }
    
    // ── Select without branch ─────────────────────────────────────────────────────
    
    /// Return `a` if `cond` is true, `b` otherwise β€” no branch.
    ///
    /// `cond as i64` is 0 or 1; negating gives 0 or -1 (the mask).
    #[inline(always)]
    pub fn select_branchless(cond: bool, a: i64, b: i64) -> i64 {
        let mask = -(cond as i64); // 0 or -1
        (a & mask) | (b & !mask)
    }
    
    // ── Sign function ─────────────────────────────────────────────────────────────
    
    /// Returns -1, 0, or 1 depending on the sign of x, branchlessly.
    #[inline(always)]
    pub fn sign_branchless(x: i64) -> i64 {
        // positive: (x > 0) as i64 = 1, -(x < 0) as i64 = 0  β†’ 1
        // negative: (x > 0) as i64 = 0, -(x < 0) as i64 = -1 β†’ -1
        // zero:     both 0 β†’ 0
        (x > 0) as i64 - (x < 0) as i64
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // ── min ───────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_min_branchless_basic() {
            assert_eq!(min_branchless(3, 7), 3);
            assert_eq!(min_branchless(7, 3), 3);
            assert_eq!(min_branchless(5, 5), 5);
        }
    
        #[test]
        fn test_min_branchless_negatives() {
            assert_eq!(min_branchless(-10, -3), -10);
            assert_eq!(min_branchless(-3, -10), -10);
            assert_eq!(min_branchless(-1, 1), -1);
        }
    
        #[test]
        fn test_min_matches_idiomatic() {
            // Pairs must not differ by more than i64::MAX (documented limitation of
            // the bitmask technique). For arbitrary-range inputs, use min_idiomatic.
            let pairs = [
                (0_i64, 0),
                (42, -42),
                (-100, 100),
                (i64::MAX, i64::MAX - 1),
                (i64::MIN + 1, i64::MIN),
            ];
            for (a, b) in pairs {
                assert_eq!(min_branchless(a, b), min_idiomatic(a, b), "min({a}, {b})");
            }
        }
    
        // ── max ───────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_max_branchless_basic() {
            assert_eq!(max_branchless(3, 7), 7);
            assert_eq!(max_branchless(7, 3), 7);
            assert_eq!(max_branchless(5, 5), 5);
        }
    
        #[test]
        fn test_max_matches_idiomatic() {
            // Same range restriction as min_branchless.
            let pairs = [
                (0_i64, 0),
                (42, -42),
                (i64::MAX, i64::MAX - 1),
                (i64::MIN + 1, i64::MIN),
            ];
            for (a, b) in pairs {
                assert_eq!(max_branchless(a, b), max_idiomatic(a, b), "max({a}, {b})");
            }
        }
    
        // ── clamp ─────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_clamp_within_range() {
            assert_eq!(clamp_branchless(0, 10, 5), 5);
        }
    
        #[test]
        fn test_clamp_below_lo() {
            assert_eq!(clamp_branchless(0, 10, -5), 0);
        }
    
        #[test]
        fn test_clamp_above_hi() {
            assert_eq!(clamp_branchless(0, 10, 15), 10);
        }
    
        #[test]
        fn test_clamp_matches_idiomatic() {
            let cases = [(-5_i64, 5, -10), (-5, 5, 0), (-5, 5, 10), (0, 0, 0)];
            for (lo, hi, x) in cases {
                assert_eq!(
                    clamp_branchless(lo, hi, x),
                    clamp_idiomatic(lo, hi, x),
                    "clamp({lo}, {hi}, {x})"
                );
            }
        }
    
        // ── abs ───────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_abs_branchless_positive() {
            assert_eq!(abs_branchless(42), 42);
        }
    
        #[test]
        fn test_abs_branchless_negative() {
            assert_eq!(abs_branchless(-42), 42);
        }
    
        #[test]
        fn test_abs_branchless_zero() {
            assert_eq!(abs_branchless(0), 0);
        }
    
        #[test]
        fn test_abs_matches_idiomatic() {
            for x in [-1000_i64, -1, 0, 1, 1000, i64::MAX] {
                assert_eq!(abs_branchless(x), abs_idiomatic(x), "abs({x})");
            }
        }
    
        // ── select ────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_select_true() {
            assert_eq!(select_branchless(true, 100, 200), 100);
        }
    
        #[test]
        fn test_select_false() {
            assert_eq!(select_branchless(false, 100, 200), 200);
        }
    
        // ── sign ──────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_sign_positive() {
            assert_eq!(sign_branchless(42), 1);
        }
    
        #[test]
        fn test_sign_negative() {
            assert_eq!(sign_branchless(-42), -1);
        }
    
        #[test]
        fn test_sign_zero() {
            assert_eq!(sign_branchless(0), 0);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // ── min ───────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_min_branchless_basic() {
            assert_eq!(min_branchless(3, 7), 3);
            assert_eq!(min_branchless(7, 3), 3);
            assert_eq!(min_branchless(5, 5), 5);
        }
    
        #[test]
        fn test_min_branchless_negatives() {
            assert_eq!(min_branchless(-10, -3), -10);
            assert_eq!(min_branchless(-3, -10), -10);
            assert_eq!(min_branchless(-1, 1), -1);
        }
    
        #[test]
        fn test_min_matches_idiomatic() {
            // Pairs must not differ by more than i64::MAX (documented limitation of
            // the bitmask technique). For arbitrary-range inputs, use min_idiomatic.
            let pairs = [
                (0_i64, 0),
                (42, -42),
                (-100, 100),
                (i64::MAX, i64::MAX - 1),
                (i64::MIN + 1, i64::MIN),
            ];
            for (a, b) in pairs {
                assert_eq!(min_branchless(a, b), min_idiomatic(a, b), "min({a}, {b})");
            }
        }
    
        // ── max ───────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_max_branchless_basic() {
            assert_eq!(max_branchless(3, 7), 7);
            assert_eq!(max_branchless(7, 3), 7);
            assert_eq!(max_branchless(5, 5), 5);
        }
    
        #[test]
        fn test_max_matches_idiomatic() {
            // Same range restriction as min_branchless.
            let pairs = [
                (0_i64, 0),
                (42, -42),
                (i64::MAX, i64::MAX - 1),
                (i64::MIN + 1, i64::MIN),
            ];
            for (a, b) in pairs {
                assert_eq!(max_branchless(a, b), max_idiomatic(a, b), "max({a}, {b})");
            }
        }
    
        // ── clamp ─────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_clamp_within_range() {
            assert_eq!(clamp_branchless(0, 10, 5), 5);
        }
    
        #[test]
        fn test_clamp_below_lo() {
            assert_eq!(clamp_branchless(0, 10, -5), 0);
        }
    
        #[test]
        fn test_clamp_above_hi() {
            assert_eq!(clamp_branchless(0, 10, 15), 10);
        }
    
        #[test]
        fn test_clamp_matches_idiomatic() {
            let cases = [(-5_i64, 5, -10), (-5, 5, 0), (-5, 5, 10), (0, 0, 0)];
            for (lo, hi, x) in cases {
                assert_eq!(
                    clamp_branchless(lo, hi, x),
                    clamp_idiomatic(lo, hi, x),
                    "clamp({lo}, {hi}, {x})"
                );
            }
        }
    
        // ── abs ───────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_abs_branchless_positive() {
            assert_eq!(abs_branchless(42), 42);
        }
    
        #[test]
        fn test_abs_branchless_negative() {
            assert_eq!(abs_branchless(-42), 42);
        }
    
        #[test]
        fn test_abs_branchless_zero() {
            assert_eq!(abs_branchless(0), 0);
        }
    
        #[test]
        fn test_abs_matches_idiomatic() {
            for x in [-1000_i64, -1, 0, 1, 1000, i64::MAX] {
                assert_eq!(abs_branchless(x), abs_idiomatic(x), "abs({x})");
            }
        }
    
        // ── select ────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_select_true() {
            assert_eq!(select_branchless(true, 100, 200), 100);
        }
    
        #[test]
        fn test_select_false() {
            assert_eq!(select_branchless(false, 100, 200), 200);
        }
    
        // ── sign ──────────────────────────────────────────────────────────────────
    
        #[test]
        fn test_sign_positive() {
            assert_eq!(sign_branchless(42), 1);
        }
    
        #[test]
        fn test_sign_negative() {
            assert_eq!(sign_branchless(-42), -1);
        }
    
        #[test]
        fn test_sign_zero() {
            assert_eq!(sign_branchless(0), 0);
        }
    }

    Deep Comparison

    OCaml vs Rust: Branchless Programming

    Side-by-Side Code

    OCaml

    (* Branchless min: arithmetic right-shift produces 0 or all-ones mask *)
    let min_branchless (a : int) (b : int) =
      let diff = a - b in
      b + (diff land (diff asr 62))
    
    (* Branchless abs using sign-mask technique *)
    let abs_branchless (x : int) =
      let mask = x asr 62 in
      (x + mask) lxor mask
    
    (* Clamp via composition β€” no branches *)
    let clamp_branchless lo hi x =
      min_branchless hi (max_branchless lo x)
    

    Rust (idiomatic β€” LLVM emits CMOV)

    #[inline(always)]
    pub fn min_idiomatic(a: i64, b: i64) -> i64 { a.min(b) }
    
    #[inline(always)]
    pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 { x.clamp(lo, hi) }
    
    #[inline(always)]
    pub fn abs_idiomatic(x: i64) -> i64 { x.wrapping_abs() }
    

    Rust (explicit branchless bitmask)

    #[inline(always)]
    pub fn min_branchless(a: i64, b: i64) -> i64 {
        let diff = a.wrapping_sub(b);
        let mask = diff >> 63; // 0 or -1 (all bits set)
        b + (diff & mask)
    }
    
    #[inline(always)]
    pub fn abs_branchless(x: i64) -> i64 {
        let mask = x >> 63;
        (x + mask) ^ mask
    }
    
    #[inline(always)]
    pub fn select_branchless(cond: bool, a: i64, b: i64) -> i64 {
        let mask = -(cond as i64); // 0 or -1
        (a & mask) | (b & !mask)
    }
    

    Type Signatures

    ConceptOCamlRust
    Integer typeint (63-bit on 64-bit platform)i64 (explicit 64-bit)
    Arithmetic right-shiftasr operator>> 63 (arithmetic on signed)
    Bitwise ANDland&
    Bitwise XORlxor^
    Safe subtraction- (may overflow on edge cases)wrapping_sub (defined overflow)
    Absabs (stdlib, may branch)wrapping_abs()
    Clampcustom compositionx.clamp(lo, hi)

    Key Insights

  • Arithmetic right-shift is the core primitive. Both OCaml (asr) and Rust (>> 63 on i64) sign-extend the MSB to produce a bitmask of 0 or -1. The difference: OCaml's int is 63-bit, so asr 62 is used; Rust's i64 is 64-bit, so >> 63 is correct.
  • Rust's idiomatic path is already branchless. LLVM compiles a.min(b), x.clamp(lo, hi), and x.wrapping_abs() to CMOV (conditional move) instructions on x86-64 β€” no conditional jump, no pipeline flush. The explicit bitmask technique is a teaching tool and a fallback for targets where CMOV is unavailable.
  • **wrapping_sub vs bare subtraction.** OCaml integers are 63-bit and overflow is technically undefined in some contexts; Rust makes the overflow semantics explicit with wrapping_sub, which is essential for the bitmask trick to be well-defined.
  • **select_branchless generalises the pattern.** Casting bool to i64 gives 0 or 1; negating gives 0 or -1. This is a portable, safe way to implement conditional selection without a branch β€” useful when the compiler fails to emit a CMOV on its own.
  • When branchy code wins. Branchless code always executes both computation paths. For data where one branch dominates (e.g., 99% positive inputs for abs), the CPU branch predictor wins and branchless is slower. For truly unpredictable data (random min/max of arbitrary inputs), branchless eliminates misprediction stalls and wins decisively.
  • When to Use Each Style

    **Use idiomatic Rust (a.min(b), x.clamp(lo, hi)):** Always as the default β€” readable, safe, and LLVM produces CMOV. Profile before reaching for explicit bitmasks.

    Use explicit branchless bitmasks when: Profiling confirms misprediction stalls, the target architecture lacks CMOV, or you need a select primitive that the compiler fails to optimise into a conditional move.

    Exercises

  • Verify that i64::min(a, b) compiles to a cmov (conditional move) instruction
  • by inspecting cargo rustc --release -- --emit asm output. Does branchless change with -C target-cpu=native?

  • Implement a branchless clamp(x: i64, lo: i64, hi: i64) -> i64 using two
  • applications of branchless_min/branchless_max.

  • Benchmark sorting an array of 10,000 random i32 values using sort_unstable vs
  • a hand-written sorting network for N=8 elements. Measure branch mispredictions with perf stat -e branch-misses.

  • Implement ct_select from the subtle crate interface: a Choice type wrapping
  • a secret boolean and a ConditionallySelectable trait for u64.

  • Apply the arithmetic right-shift min to a scalar loop over Vec<i32> and compare
  • the generated assembly with the LLVM auto-vectorized version.

    Open Source Repos