Branchless Programming
Tutorial
The Problem
Modern CPUs speculatively execute both sides of a branch before knowing which path is taken, relying on a branch predictor trained on past behavior. When the prediction is correct, branches are nearly free. When the predictor is wrongβeither because the branch outcome is data-dependent and unpredictable, or because the branch alternates irregularlyβa misprediction flushes the pipeline and costs 10β20 cycles. For tight loops over large, unsorted data (sorting networks, cryptography, parsers, games), branch mispredictions can dominate execution time.
Branchless programming replaces conditional jumps with arithmetic: boolean conditions are converted to 0/1 integers, and the desired value is selected via multiplication or bitwise masking. The CPU computes both outcomes and selects one without speculating. The technique is standard in constant-time cryptographic code (preventing timing side channels), SIMD kernels, and performance-critical inner loops.
🎯 Learning Outcomes
min, max, abs, clamp, and select for integersobjdump)Code Example
#[inline(always)]
pub fn min_idiomatic(a: i64, b: i64) -> i64 { a.min(b) }
#[inline(always)]
pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 { x.clamp(lo, hi) }
#[inline(always)]
pub fn abs_idiomatic(x: i64) -> i64 { x.wrapping_abs() }Key Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Integer representation | Untagged i64/i32 | Tagged 63-bit int |
| Arithmetic right shift | >> 63 fills all bits | asr 62 needed (tag bit) |
| Compiler branchless output | a.min(b) often branchless | Standard min may branch |
| Constant-time stdlib | subtle crate | External C (Hacl-star) |
| SIMD branchless blend | _mm_blendv_epi8 via std::arch | Requires C FFI |
OCaml Approach
OCaml integers are tagged (63-bit on 64-bit systems), so arithmetic right-shift tricks
require care with the tag bit. The standard library uses branching min/max. For
constant-time crypto, OCaml code typically relies on external C via FFI:
(* Tagged int: arithmetic shift includes tag bit β avoid for branchless tricks *)
let branchless_min_unsafe a b =
(* Only safe if a and b are untagged ints (Bigarray, Bytes) *)
let diff = a - b in
let mask = diff asr 62 in (* asr 62 to avoid tag bit *)
b + (mask land diff)
(* Safe OCaml β uses compare, may branch *)
let safe_min a b = if a < b then a else b
For constant-time code, OCaml projects use Hacl-star bindings or C stubs.
Full Source
#![allow(clippy::all)]
//! # Branchless Programming
//!
//! Demonstrates replacing conditional branches with arithmetic operations to
//! avoid branch misprediction penalties on modern out-of-order CPUs.
//!
//! The core technique: arithmetic right-shift on signed integers sign-extends
//! the MSB to all bits, producing a bitmask of 0 (if positive) or -1 (if negative).
//! This mask selects between two precomputed values without a branch.
// ββ Branchless integer min/max ββββββββββββββββββββββββββββββββββββββββββββββββ
/// Branchless min using arithmetic right-shift bitmask.
///
/// `(a - b) >> 63` produces 0x0000...0000 when a >= b,
/// or 0xFFFF...FFFF when a < b. ANDing `diff` with this mask
/// gives either 0 or `diff`, which we add to `b`.
///
/// **Note:** inputs must not differ by more than `i64::MAX` (i.e., both values
/// should lie in the same signed half-range) to avoid wrapping in the final add.
/// For general use, prefer `i64::min` which LLVM compiles to a CMOV.
#[inline(always)]
pub fn min_branchless(a: i64, b: i64) -> i64 {
let diff = a.wrapping_sub(b);
let mask = diff >> 63; // arithmetic shift: 0 or -1 (all bits set)
b.wrapping_add(diff & mask)
}
/// Branchless max using arithmetic right-shift bitmask.
///
/// Same overflow caveat as `min_branchless`.
#[inline(always)]
pub fn max_branchless(a: i64, b: i64) -> i64 {
let diff = a.wrapping_sub(b);
let mask = diff >> 63;
a.wrapping_sub(diff & mask)
}
/// Idiomatic Rust min β LLVM compiles this to a CMOV (conditional move)
/// instruction on x86-64, which is already branchless at the hardware level.
#[inline(always)]
pub fn min_idiomatic(a: i64, b: i64) -> i64 {
a.min(b)
}
/// Idiomatic Rust max β same CMOV story as min_idiomatic.
#[inline(always)]
pub fn max_idiomatic(a: i64, b: i64) -> i64 {
a.max(b)
}
// ββ Clamp βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
/// Branchless clamp: compose min/max without any conditional jumps.
#[inline(always)]
pub fn clamp_branchless(lo: i64, hi: i64, x: i64) -> i64 {
min_branchless(hi, max_branchless(lo, x))
}
/// Idiomatic Rust clamp β also CMOV-friendly via LLVM.
#[inline(always)]
pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 {
x.clamp(lo, hi)
}
// ββ Absolute value ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
/// Branchless absolute value using the sign-mask technique.
///
/// `mask = x >> 63` is 0 for non-negative, -1 for negative.
/// `(x + mask) ^ mask` negates x when negative, leaves it alone otherwise.
#[inline(always)]
pub fn abs_branchless(x: i64) -> i64 {
let mask = x >> 63;
(x + mask) ^ mask
}
/// Idiomatic Rust abs β wrapping_abs avoids UB on i64::MIN.
#[inline(always)]
pub fn abs_idiomatic(x: i64) -> i64 {
x.wrapping_abs()
}
// ββ Select without branch βββββββββββββββββββββββββββββββββββββββββββββββββββββ
/// Return `a` if `cond` is true, `b` otherwise β no branch.
///
/// `cond as i64` is 0 or 1; negating gives 0 or -1 (the mask).
#[inline(always)]
pub fn select_branchless(cond: bool, a: i64, b: i64) -> i64 {
let mask = -(cond as i64); // 0 or -1
(a & mask) | (b & !mask)
}
// ββ Sign function βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
/// Returns -1, 0, or 1 depending on the sign of x, branchlessly.
#[inline(always)]
pub fn sign_branchless(x: i64) -> i64 {
// positive: (x > 0) as i64 = 1, -(x < 0) as i64 = 0 β 1
// negative: (x > 0) as i64 = 0, -(x < 0) as i64 = -1 β -1
// zero: both 0 β 0
(x > 0) as i64 - (x < 0) as i64
}
#[cfg(test)]
mod tests {
use super::*;
// ββ min βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_min_branchless_basic() {
assert_eq!(min_branchless(3, 7), 3);
assert_eq!(min_branchless(7, 3), 3);
assert_eq!(min_branchless(5, 5), 5);
}
#[test]
fn test_min_branchless_negatives() {
assert_eq!(min_branchless(-10, -3), -10);
assert_eq!(min_branchless(-3, -10), -10);
assert_eq!(min_branchless(-1, 1), -1);
}
#[test]
fn test_min_matches_idiomatic() {
// Pairs must not differ by more than i64::MAX (documented limitation of
// the bitmask technique). For arbitrary-range inputs, use min_idiomatic.
let pairs = [
(0_i64, 0),
(42, -42),
(-100, 100),
(i64::MAX, i64::MAX - 1),
(i64::MIN + 1, i64::MIN),
];
for (a, b) in pairs {
assert_eq!(min_branchless(a, b), min_idiomatic(a, b), "min({a}, {b})");
}
}
// ββ max βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_max_branchless_basic() {
assert_eq!(max_branchless(3, 7), 7);
assert_eq!(max_branchless(7, 3), 7);
assert_eq!(max_branchless(5, 5), 5);
}
#[test]
fn test_max_matches_idiomatic() {
// Same range restriction as min_branchless.
let pairs = [
(0_i64, 0),
(42, -42),
(i64::MAX, i64::MAX - 1),
(i64::MIN + 1, i64::MIN),
];
for (a, b) in pairs {
assert_eq!(max_branchless(a, b), max_idiomatic(a, b), "max({a}, {b})");
}
}
// ββ clamp βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_clamp_within_range() {
assert_eq!(clamp_branchless(0, 10, 5), 5);
}
#[test]
fn test_clamp_below_lo() {
assert_eq!(clamp_branchless(0, 10, -5), 0);
}
#[test]
fn test_clamp_above_hi() {
assert_eq!(clamp_branchless(0, 10, 15), 10);
}
#[test]
fn test_clamp_matches_idiomatic() {
let cases = [(-5_i64, 5, -10), (-5, 5, 0), (-5, 5, 10), (0, 0, 0)];
for (lo, hi, x) in cases {
assert_eq!(
clamp_branchless(lo, hi, x),
clamp_idiomatic(lo, hi, x),
"clamp({lo}, {hi}, {x})"
);
}
}
// ββ abs βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_abs_branchless_positive() {
assert_eq!(abs_branchless(42), 42);
}
#[test]
fn test_abs_branchless_negative() {
assert_eq!(abs_branchless(-42), 42);
}
#[test]
fn test_abs_branchless_zero() {
assert_eq!(abs_branchless(0), 0);
}
#[test]
fn test_abs_matches_idiomatic() {
for x in [-1000_i64, -1, 0, 1, 1000, i64::MAX] {
assert_eq!(abs_branchless(x), abs_idiomatic(x), "abs({x})");
}
}
// ββ select ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_select_true() {
assert_eq!(select_branchless(true, 100, 200), 100);
}
#[test]
fn test_select_false() {
assert_eq!(select_branchless(false, 100, 200), 200);
}
// ββ sign ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_sign_positive() {
assert_eq!(sign_branchless(42), 1);
}
#[test]
fn test_sign_negative() {
assert_eq!(sign_branchless(-42), -1);
}
#[test]
fn test_sign_zero() {
assert_eq!(sign_branchless(0), 0);
}
}#[cfg(test)]
mod tests {
use super::*;
// ββ min βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_min_branchless_basic() {
assert_eq!(min_branchless(3, 7), 3);
assert_eq!(min_branchless(7, 3), 3);
assert_eq!(min_branchless(5, 5), 5);
}
#[test]
fn test_min_branchless_negatives() {
assert_eq!(min_branchless(-10, -3), -10);
assert_eq!(min_branchless(-3, -10), -10);
assert_eq!(min_branchless(-1, 1), -1);
}
#[test]
fn test_min_matches_idiomatic() {
// Pairs must not differ by more than i64::MAX (documented limitation of
// the bitmask technique). For arbitrary-range inputs, use min_idiomatic.
let pairs = [
(0_i64, 0),
(42, -42),
(-100, 100),
(i64::MAX, i64::MAX - 1),
(i64::MIN + 1, i64::MIN),
];
for (a, b) in pairs {
assert_eq!(min_branchless(a, b), min_idiomatic(a, b), "min({a}, {b})");
}
}
// ββ max βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_max_branchless_basic() {
assert_eq!(max_branchless(3, 7), 7);
assert_eq!(max_branchless(7, 3), 7);
assert_eq!(max_branchless(5, 5), 5);
}
#[test]
fn test_max_matches_idiomatic() {
// Same range restriction as min_branchless.
let pairs = [
(0_i64, 0),
(42, -42),
(i64::MAX, i64::MAX - 1),
(i64::MIN + 1, i64::MIN),
];
for (a, b) in pairs {
assert_eq!(max_branchless(a, b), max_idiomatic(a, b), "max({a}, {b})");
}
}
// ββ clamp βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_clamp_within_range() {
assert_eq!(clamp_branchless(0, 10, 5), 5);
}
#[test]
fn test_clamp_below_lo() {
assert_eq!(clamp_branchless(0, 10, -5), 0);
}
#[test]
fn test_clamp_above_hi() {
assert_eq!(clamp_branchless(0, 10, 15), 10);
}
#[test]
fn test_clamp_matches_idiomatic() {
let cases = [(-5_i64, 5, -10), (-5, 5, 0), (-5, 5, 10), (0, 0, 0)];
for (lo, hi, x) in cases {
assert_eq!(
clamp_branchless(lo, hi, x),
clamp_idiomatic(lo, hi, x),
"clamp({lo}, {hi}, {x})"
);
}
}
// ββ abs βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_abs_branchless_positive() {
assert_eq!(abs_branchless(42), 42);
}
#[test]
fn test_abs_branchless_negative() {
assert_eq!(abs_branchless(-42), 42);
}
#[test]
fn test_abs_branchless_zero() {
assert_eq!(abs_branchless(0), 0);
}
#[test]
fn test_abs_matches_idiomatic() {
for x in [-1000_i64, -1, 0, 1, 1000, i64::MAX] {
assert_eq!(abs_branchless(x), abs_idiomatic(x), "abs({x})");
}
}
// ββ select ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_select_true() {
assert_eq!(select_branchless(true, 100, 200), 100);
}
#[test]
fn test_select_false() {
assert_eq!(select_branchless(false, 100, 200), 200);
}
// ββ sign ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
#[test]
fn test_sign_positive() {
assert_eq!(sign_branchless(42), 1);
}
#[test]
fn test_sign_negative() {
assert_eq!(sign_branchless(-42), -1);
}
#[test]
fn test_sign_zero() {
assert_eq!(sign_branchless(0), 0);
}
}
Deep Comparison
OCaml vs Rust: Branchless Programming
Side-by-Side Code
OCaml
(* Branchless min: arithmetic right-shift produces 0 or all-ones mask *)
let min_branchless (a : int) (b : int) =
let diff = a - b in
b + (diff land (diff asr 62))
(* Branchless abs using sign-mask technique *)
let abs_branchless (x : int) =
let mask = x asr 62 in
(x + mask) lxor mask
(* Clamp via composition β no branches *)
let clamp_branchless lo hi x =
min_branchless hi (max_branchless lo x)
Rust (idiomatic β LLVM emits CMOV)
#[inline(always)]
pub fn min_idiomatic(a: i64, b: i64) -> i64 { a.min(b) }
#[inline(always)]
pub fn clamp_idiomatic(lo: i64, hi: i64, x: i64) -> i64 { x.clamp(lo, hi) }
#[inline(always)]
pub fn abs_idiomatic(x: i64) -> i64 { x.wrapping_abs() }
Rust (explicit branchless bitmask)
#[inline(always)]
pub fn min_branchless(a: i64, b: i64) -> i64 {
let diff = a.wrapping_sub(b);
let mask = diff >> 63; // 0 or -1 (all bits set)
b + (diff & mask)
}
#[inline(always)]
pub fn abs_branchless(x: i64) -> i64 {
let mask = x >> 63;
(x + mask) ^ mask
}
#[inline(always)]
pub fn select_branchless(cond: bool, a: i64, b: i64) -> i64 {
let mask = -(cond as i64); // 0 or -1
(a & mask) | (b & !mask)
}
Type Signatures
| Concept | OCaml | Rust |
|---|---|---|
| Integer type | int (63-bit on 64-bit platform) | i64 (explicit 64-bit) |
| Arithmetic right-shift | asr operator | >> 63 (arithmetic on signed) |
| Bitwise AND | land | & |
| Bitwise XOR | lxor | ^ |
| Safe subtraction | - (may overflow on edge cases) | wrapping_sub (defined overflow) |
| Abs | abs (stdlib, may branch) | wrapping_abs() |
| Clamp | custom composition | x.clamp(lo, hi) |
Key Insights
asr) and Rust (>> 63 on i64) sign-extend the MSB to produce a bitmask of 0 or -1. The difference: OCaml's int is 63-bit, so asr 62 is used; Rust's i64 is 64-bit, so >> 63 is correct.a.min(b), x.clamp(lo, hi), and x.wrapping_abs() to CMOV (conditional move) instructions on x86-64 β no conditional jump, no pipeline flush. The explicit bitmask technique is a teaching tool and a fallback for targets where CMOV is unavailable.wrapping_sub vs bare subtraction.** OCaml integers are 63-bit and overflow is technically undefined in some contexts; Rust makes the overflow semantics explicit with wrapping_sub, which is essential for the bitmask trick to be well-defined.select_branchless generalises the pattern.** Casting bool to i64 gives 0 or 1; negating gives 0 or -1. This is a portable, safe way to implement conditional selection without a branch β useful when the compiler fails to emit a CMOV on its own.When to Use Each Style
**Use idiomatic Rust (a.min(b), x.clamp(lo, hi)):** Always as the default β readable, safe, and LLVM produces CMOV. Profile before reaching for explicit bitmasks.
Use explicit branchless bitmasks when: Profiling confirms misprediction stalls, the target architecture lacks CMOV, or you need a select primitive that the compiler fails to optimise into a conditional move.
Exercises
i64::min(a, b) compiles to a cmov (conditional move) instruction by inspecting cargo rustc --release -- --emit asm output. Does branchless change
with -C target-cpu=native?
clamp(x: i64, lo: i64, hi: i64) -> i64 using two applications of branchless_min/branchless_max.
i32 values using sort_unstable vs a hand-written sorting network for N=8 elements. Measure branch mispredictions
with perf stat -e branch-misses.
ct_select from the subtle crate interface: a Choice type wrapping a secret boolean and a ConditionallySelectable trait for u64.
Vec<i32> and comparethe generated assembly with the LLVM auto-vectorized version.