1033-binary-heap-topk — Top-K with BinaryHeap
Tutorial
The Problem
Finding the k largest (or smallest) elements from a large stream is a common problem in data processing: top-10 most visited pages from a billion-row log, top-k nearest neighbors in machine learning, k most recent events. Sorting the entire dataset is O(n log n), but maintaining a heap of size k gives O(n log k) — much better when k is small relative to n.
Rust's BinaryHeap is a max-heap by default. For top-k (keeping the k largest), use a min-heap of size k with Reverse<T>: when a new element is larger than the heap minimum, evict the minimum and insert the new element.
🎯 Learning Outcomes
Reverse<T> to turn BinaryHeap into a min-heapBinaryHeap as a priority queue for task schedulingCode Example
#![allow(clippy::all)]
// 1033: Top-K Elements with BinaryHeap
// BinaryHeap is a max-heap; use Reverse<T> for min-heap behavior
use std::cmp::Reverse;
use std::collections::BinaryHeap;
/// Top-K using a min-heap of size K
/// Keep only the K largest elements by evicting the smallest
fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
for &val in data {
heap.push(Reverse(val));
if heap.len() > k {
heap.pop(); // Remove smallest (which is the max in Reverse heap)
}
}
let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
result.sort_unstable_by(|a, b| b.cmp(a)); // Descending
result
}
fn test_top_k() {
let data = vec![3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
let result = top_k(3, &data);
assert_eq!(result, vec![9, 6, 5]);
let result = top_k(1, &data);
assert_eq!(result, vec![9]);
}
/// BinaryHeap as a max-heap: natural ordering
fn max_heap_demo() {
let mut heap = BinaryHeap::new();
heap.push(3);
heap.push(1);
heap.push(4);
heap.push(1);
heap.push(5);
// pop() returns largest first
assert_eq!(heap.pop(), Some(5));
assert_eq!(heap.pop(), Some(4));
assert_eq!(heap.pop(), Some(3));
// peek without removing
assert_eq!(heap.peek(), Some(&1));
}
/// Top-K with custom key function
fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
where
K: Ord,
F: Fn(&T) -> K,
{
let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
for (i, item) in data.iter().enumerate() {
heap.push(Reverse((key_fn(item), i)));
if heap.len() > k {
heap.pop();
}
}
let mut indices: Vec<usize> = heap.into_iter().map(|Reverse((_, i))| i).collect();
indices.sort_by(|&a, &b| key_fn(&data[b]).cmp(&key_fn(&data[a])).then(b.cmp(&a)));
indices.iter().map(|&i| &data[i]).collect()
}
fn test_top_k_by() {
let words = vec!["hi", "hello", "hey", "howdy", "h"];
let longest3 = top_k_by(3, &words, |w| w.len());
assert_eq!(longest3.len(), 3);
assert_eq!(*longest3[0], "howdy");
assert_eq!(*longest3[1], "hello");
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_top_k_basic() {
test_top_k();
}
#[test]
fn test_max_heap() {
max_heap_demo();
}
#[test]
fn test_custom_key() {
test_top_k_by();
}
#[test]
fn test_from_vec() {
let v = vec![5, 3, 8, 1, 9];
let heap: BinaryHeap<_> = v.into_iter().collect();
let sorted: Vec<_> = heap.into_sorted_vec();
assert_eq!(sorted, vec![1, 3, 5, 8, 9]); // ascending
}
#[test]
fn test_min_heap() {
let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
min_heap.push(Reverse(5));
min_heap.push(Reverse(1));
min_heap.push(Reverse(3));
assert_eq!(min_heap.pop(), Some(Reverse(1))); // smallest first
}
}Key Differences
BinaryHeap is a max-heap by default; a min-heap requires Reverse<T>. OCaml's CCHeap takes a custom comparison function.BinaryHeap is mutable; OCaml's functional priority queues (CCFQueue) are persistent.BinaryHeap is in Rust's std; OCaml requires a third-party crate for heap structures.Reverse wrapper**: Rust's Reverse<T> inverts the Ord comparison without allocating; OCaml inverts comparison via a custom functor argument.OCaml Approach
OCaml lacks a standard heap. The algorithmic pattern uses sorting or a functional priority queue:
(* Simple O(n log n) approach via sort *)
let top_k k data =
List.sort (fun a b -> compare b a) data
|> (fun sorted -> List.filteri (fun i _ -> i < k) sorted)
The containers library provides CCHeap and CCFQueue (functional priority queues) for the O(n log k) approach.
Full Source
#![allow(clippy::all)]
// 1033: Top-K Elements with BinaryHeap
// BinaryHeap is a max-heap; use Reverse<T> for min-heap behavior
use std::cmp::Reverse;
use std::collections::BinaryHeap;
/// Top-K using a min-heap of size K
/// Keep only the K largest elements by evicting the smallest
fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
for &val in data {
heap.push(Reverse(val));
if heap.len() > k {
heap.pop(); // Remove smallest (which is the max in Reverse heap)
}
}
let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
result.sort_unstable_by(|a, b| b.cmp(a)); // Descending
result
}
fn test_top_k() {
let data = vec![3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
let result = top_k(3, &data);
assert_eq!(result, vec![9, 6, 5]);
let result = top_k(1, &data);
assert_eq!(result, vec![9]);
}
/// BinaryHeap as a max-heap: natural ordering
fn max_heap_demo() {
let mut heap = BinaryHeap::new();
heap.push(3);
heap.push(1);
heap.push(4);
heap.push(1);
heap.push(5);
// pop() returns largest first
assert_eq!(heap.pop(), Some(5));
assert_eq!(heap.pop(), Some(4));
assert_eq!(heap.pop(), Some(3));
// peek without removing
assert_eq!(heap.peek(), Some(&1));
}
/// Top-K with custom key function
fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
where
K: Ord,
F: Fn(&T) -> K,
{
let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
for (i, item) in data.iter().enumerate() {
heap.push(Reverse((key_fn(item), i)));
if heap.len() > k {
heap.pop();
}
}
let mut indices: Vec<usize> = heap.into_iter().map(|Reverse((_, i))| i).collect();
indices.sort_by(|&a, &b| key_fn(&data[b]).cmp(&key_fn(&data[a])).then(b.cmp(&a)));
indices.iter().map(|&i| &data[i]).collect()
}
fn test_top_k_by() {
let words = vec!["hi", "hello", "hey", "howdy", "h"];
let longest3 = top_k_by(3, &words, |w| w.len());
assert_eq!(longest3.len(), 3);
assert_eq!(*longest3[0], "howdy");
assert_eq!(*longest3[1], "hello");
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_top_k_basic() {
test_top_k();
}
#[test]
fn test_max_heap() {
max_heap_demo();
}
#[test]
fn test_custom_key() {
test_top_k_by();
}
#[test]
fn test_from_vec() {
let v = vec![5, 3, 8, 1, 9];
let heap: BinaryHeap<_> = v.into_iter().collect();
let sorted: Vec<_> = heap.into_sorted_vec();
assert_eq!(sorted, vec![1, 3, 5, 8, 9]); // ascending
}
#[test]
fn test_min_heap() {
let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
min_heap.push(Reverse(5));
min_heap.push(Reverse(1));
min_heap.push(Reverse(3));
assert_eq!(min_heap.pop(), Some(Reverse(1))); // smallest first
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_top_k_basic() {
test_top_k();
}
#[test]
fn test_max_heap() {
max_heap_demo();
}
#[test]
fn test_custom_key() {
test_top_k_by();
}
#[test]
fn test_from_vec() {
let v = vec![5, 3, 8, 1, 9];
let heap: BinaryHeap<_> = v.into_iter().collect();
let sorted: Vec<_> = heap.into_sorted_vec();
assert_eq!(sorted, vec![1, 3, 5, 8, 9]); // ascending
}
#[test]
fn test_min_heap() {
let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
min_heap.push(Reverse(5));
min_heap.push(Reverse(1));
min_heap.push(Reverse(3));
assert_eq!(min_heap.pop(), Some(Reverse(1))); // smallest first
}
}
Deep Comparison
Top-K Elements with BinaryHeap — Comparison
Core Insight
The top-K problem is a classic priority queue application. Maintain a min-heap of size K; for each element, push it and evict the smallest if heap exceeds K. Result: only the K largest remain. Rust has BinaryHeap in std; OCaml has no built-in heap.
OCaml Approach
Rust Approach
BinaryHeap<T> — max-heap by defaultBinaryHeap<Reverse<T>> — min-heap via newtype wrapperpush + pop for bounded heap patterninto_sorted_vec() for sorted extractionpeek() for O(1) max/min accessComparison Table
| Feature | OCaml | Rust (BinaryHeap) |
|---|---|---|
| Stdlib heap | No | Yes |
| Default order | N/A | Max-heap |
| Min-heap | N/A | Reverse<T> wrapper |
| Top-K complexity | O(n log n) sort | O(n log k) heap |
| Heap sort | Manual | into_sorted_vec() |
| Peek | N/A | peek() O(1) |
| Custom ordering | N/A | Implement Ord |
Exercises
bottom_k(k: usize, data: &[i32]) -> Vec<i32> (k smallest elements) using a max-heap instead of a min-heap.kth_largest(k: usize, data: &[i32]) -> Option<i32> function that returns only the kth largest element without collecting all k.BinaryHeap<(Priority, Task)> where higher priority numbers are processed first.