1033 Advanced

1033-binary-heap-topk — Top-K with BinaryHeap

Functional Programming

Tutorial

The Problem

Finding the k largest (or smallest) elements from a large stream is a common problem in data processing: top-10 most visited pages from a billion-row log, top-k nearest neighbors in machine learning, k most recent events. Sorting the entire dataset is O(n log n), but maintaining a heap of size k gives O(n log k) — much better when k is small relative to n.

Rust's BinaryHeap is a max-heap by default. For top-k (keeping the k largest), use a min-heap of size k with Reverse<T>: when a new element is larger than the heap minimum, evict the minimum and insert the new element.

🎯 Learning Outcomes

• Understand max-heap vs min-heap semantics in Rust

• Use Reverse<T> to turn BinaryHeap into a min-heap

• Implement the top-k algorithm with O(n log k) time complexity

• Use BinaryHeap as a priority queue for task scheduling

• Compare sorting vs heap approaches for small k vs large k

Code Example

//! 1033: Top-K Elements with `BinaryHeap`
//!
//! Rust ships a binary heap in the standard library — `std::collections::BinaryHeap`
//! is a max-heap. Wrap values in `std::cmp::Reverse` to flip it into a min-heap,
//! which is exactly what the Top-K pattern needs: keep a bounded min-heap and
//! evict the smallest whenever a new, larger element arrives.

use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `k` largest elements from `data` in descending order.
///
/// Uses a bounded min-heap of size `k`, giving `O(n log k)` time.
/// If `k == 0` the result is empty; if `k >= data.len()` all elements are returned sorted.
pub fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
    if k == 0 {
        return Vec::new();
    }

    let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
    for &val in data {
        heap.push(Reverse(val));
        if heap.len() > k {
            heap.pop();
        }
    }

    let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
    result.sort_unstable_by(|a, b| b.cmp(a));
    result
}

/// Return the `k` elements of `data` with the largest keys, ordered by key descending.
///
/// `key_fn` extracts the ordering key from each element. Ties are broken by the
/// original index so the result is deterministic.
pub fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
where
    K: Ord,
    F: Fn(&T) -> K,
{
    if k == 0 {
        return Vec::new();
    }

    let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
    for (i, item) in data.iter().enumerate() {
        heap.push(Reverse((key_fn(item), i)));
        if heap.len() > k {
            heap.pop();
        }
    }

    let mut entries: Vec<(K, usize)> = heap.into_iter().map(|Reverse(pair)| pair).collect();
    entries.sort_by(|(ka, ia), (kb, ib)| kb.cmp(ka).then(ia.cmp(ib)));
    entries.into_iter().map(|(_, i)| &data[i]).collect()
}

/// Demonstrate `BinaryHeap` behaving as a max-heap with natural ordering.
pub fn max_heap_demo() {
    let mut heap = BinaryHeap::new();
    for x in [3, 1, 4, 1, 5] {
        heap.push(x);
    }

    assert_eq!(heap.pop(), Some(5));
    assert_eq!(heap.pop(), Some(4));
    assert_eq!(heap.pop(), Some(3));
    assert_eq!(heap.peek(), Some(&1));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn top_k_basic() {
        let data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
        assert_eq!(top_k(3, &data), vec![9, 6, 5]);
        assert_eq!(top_k(1, &data), vec![9]);
        assert_eq!(top_k(5, &data), vec![9, 6, 5, 5, 4]);
    }

    #[test]
    fn top_k_edge_cases() {
        assert_eq!(top_k(0, &[1, 2, 3]), Vec::<i32>::new());
        assert_eq!(top_k(3, &[]), Vec::<i32>::new());
        assert_eq!(top_k(10, &[2, 1]), vec![2, 1]);
    }

    #[test]
    fn top_k_by_length() {
        let words = ["hi", "hello", "hey", "howdy", "h"];
        let longest = top_k_by(3, &words, |w| w.len());
        assert_eq!(longest.len(), 3);
        assert_eq!(*longest[0], "howdy");
        assert_eq!(*longest[1], "hello");
    }

    #[test]
    fn max_heap_works() {
        max_heap_demo();
    }

    #[test]
    fn into_sorted_vec_is_ascending() {
        let heap: BinaryHeap<_> = [5, 3, 8, 1, 9].into_iter().collect();
        assert_eq!(heap.into_sorted_vec(), vec![1, 3, 5, 8, 9]);
    }

    #[test]
    fn reverse_gives_min_heap() {
        let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
        for x in [5, 1, 3] {
            min_heap.push(Reverse(x));
        }
        assert_eq!(min_heap.pop(), Some(Reverse(1)));
    }
}

(* 1033: Top-K Elements with Binary Heap *)
(* OCaml has no built-in heap — we use sorted insertion or a manual heap *)

(* Approach 1: Sort and take top-k *)
let top_k_sort k lst =
  lst |> List.sort (fun a b -> compare b a) |> List.filteri (fun i _ -> i < k)

let sort_approach () =
  let data = [3; 1; 4; 1; 5; 9; 2; 6; 5; 3] in
  let top3 = top_k_sort 3 data in
  assert (top3 = [9; 6; 5])

(* Approach 2: Maintain a min-heap of size k using a sorted list *)
(* Keep only k largest elements *)
let top_k_linear k lst =
  let insert_bounded heap x =
    if List.length heap < k then
      List.sort compare (x :: heap)
    else
      let min_val = List.hd heap in
      if x > min_val then
        List.sort compare (x :: List.tl heap)
      else
        heap
  in
  List.fold_left (fun heap x -> insert_bounded heap x) [] lst

let bounded_heap_approach () =
  let data = [3; 1; 4; 1; 5; 9; 2; 6; 5; 3] in
  let top3 = top_k_linear 3 data |> List.rev in
  assert (top3 = [9; 6; 5])

(* Approach 3: Top-k with key function *)
let top_k_by k key_fn lst =
  lst
  |> List.map (fun x -> (key_fn x, x))
  |> List.sort (fun (a, _) (b, _) -> compare b a)
  |> List.filteri (fun i _ -> i < k)
  |> List.map snd

let top_k_by_test () =
  let words = ["hi"; "hello"; "hey"; "howdy"; "h"] in
  let longest3 = top_k_by 3 String.length words in
  assert (List.length longest3 = 3);
  assert (List.hd longest3 = "howdy")

let () =
  sort_approach ();
  bounded_heap_approach ();
  top_k_by_test ();
  Printf.printf "✓ All tests passed\n"

Key Differences

Max vs min: Rust's BinaryHeap is a max-heap by default; a min-heap requires Reverse<T>. OCaml's CCHeap takes a custom comparison function.

Mutable vs functional: Rust's BinaryHeap is mutable; OCaml's functional priority queues (CCFQueue) are persistent.

Standard library: BinaryHeap is in Rust's std; OCaml requires a third-party crate for heap structures.

**Reverse wrapper**: Rust's Reverse<T> inverts the Ord comparison without allocating; OCaml inverts comparison via a custom functor argument.

OCaml Approach

OCaml lacks a standard heap. The algorithmic pattern uses sorting or a functional priority queue:

(* Simple O(n log n) approach via sort *)
let top_k k data =
  List.sort (fun a b -> compare b a) data
  |> (fun sorted -> List.filteri (fun i _ -> i < k) sorted)

The containers library provides CCHeap and CCFQueue (functional priority queues) for the O(n log k) approach.

Full Source

//! 1033: Top-K Elements with `BinaryHeap`
//!
//! Rust ships a binary heap in the standard library — `std::collections::BinaryHeap`
//! is a max-heap. Wrap values in `std::cmp::Reverse` to flip it into a min-heap,
//! which is exactly what the Top-K pattern needs: keep a bounded min-heap and
//! evict the smallest whenever a new, larger element arrives.

use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `k` largest elements from `data` in descending order.
///
/// Uses a bounded min-heap of size `k`, giving `O(n log k)` time.
/// If `k == 0` the result is empty; if `k >= data.len()` all elements are returned sorted.
pub fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
    if k == 0 {
        return Vec::new();
    }

    let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
    for &val in data {
        heap.push(Reverse(val));
        if heap.len() > k {
            heap.pop();
        }
    }

    let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
    result.sort_unstable_by(|a, b| b.cmp(a));
    result
}

/// Return the `k` elements of `data` with the largest keys, ordered by key descending.
///
/// `key_fn` extracts the ordering key from each element. Ties are broken by the
/// original index so the result is deterministic.
pub fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
where
    K: Ord,
    F: Fn(&T) -> K,
{
    if k == 0 {
        return Vec::new();
    }

    let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
    for (i, item) in data.iter().enumerate() {
        heap.push(Reverse((key_fn(item), i)));
        if heap.len() > k {
            heap.pop();
        }
    }

    let mut entries: Vec<(K, usize)> = heap.into_iter().map(|Reverse(pair)| pair).collect();
    entries.sort_by(|(ka, ia), (kb, ib)| kb.cmp(ka).then(ia.cmp(ib)));
    entries.into_iter().map(|(_, i)| &data[i]).collect()
}

/// Demonstrate `BinaryHeap` behaving as a max-heap with natural ordering.
pub fn max_heap_demo() {
    let mut heap = BinaryHeap::new();
    for x in [3, 1, 4, 1, 5] {
        heap.push(x);
    }

    assert_eq!(heap.pop(), Some(5));
    assert_eq!(heap.pop(), Some(4));
    assert_eq!(heap.pop(), Some(3));
    assert_eq!(heap.peek(), Some(&1));
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn top_k_basic() {
        let data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
        assert_eq!(top_k(3, &data), vec![9, 6, 5]);
        assert_eq!(top_k(1, &data), vec![9]);
        assert_eq!(top_k(5, &data), vec![9, 6, 5, 5, 4]);
    }

    #[test]
    fn top_k_edge_cases() {
        assert_eq!(top_k(0, &[1, 2, 3]), Vec::<i32>::new());
        assert_eq!(top_k(3, &[]), Vec::<i32>::new());
        assert_eq!(top_k(10, &[2, 1]), vec![2, 1]);
    }

    #[test]
    fn top_k_by_length() {
        let words = ["hi", "hello", "hey", "howdy", "h"];
        let longest = top_k_by(3, &words, |w| w.len());
        assert_eq!(longest.len(), 3);
        assert_eq!(*longest[0], "howdy");
        assert_eq!(*longest[1], "hello");
    }

    #[test]
    fn max_heap_works() {
        max_heap_demo();
    }

    #[test]
    fn into_sorted_vec_is_ascending() {
        let heap: BinaryHeap<_> = [5, 3, 8, 1, 9].into_iter().collect();
        assert_eq!(heap.into_sorted_vec(), vec![1, 3, 5, 8, 9]);
    }

    #[test]
    fn reverse_gives_min_heap() {
        let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
        for x in [5, 1, 3] {
            min_heap.push(Reverse(x));
        }
        assert_eq!(min_heap.pop(), Some(Reverse(1)));
    }
}

(* 1033: Top-K Elements with Binary Heap *)
(* OCaml has no built-in heap — we use sorted insertion or a manual heap *)

(* Approach 1: Sort and take top-k *)
let top_k_sort k lst =
  lst |> List.sort (fun a b -> compare b a) |> List.filteri (fun i _ -> i < k)

let sort_approach () =
  let data = [3; 1; 4; 1; 5; 9; 2; 6; 5; 3] in
  let top3 = top_k_sort 3 data in
  assert (top3 = [9; 6; 5])

(* Approach 2: Maintain a min-heap of size k using a sorted list *)
(* Keep only k largest elements *)
let top_k_linear k lst =
  let insert_bounded heap x =
    if List.length heap < k then
      List.sort compare (x :: heap)
    else
      let min_val = List.hd heap in
      if x > min_val then
        List.sort compare (x :: List.tl heap)
      else
        heap
  in
  List.fold_left (fun heap x -> insert_bounded heap x) [] lst

let bounded_heap_approach () =
  let data = [3; 1; 4; 1; 5; 9; 2; 6; 5; 3] in
  let top3 = top_k_linear 3 data |> List.rev in
  assert (top3 = [9; 6; 5])

(* Approach 3: Top-k with key function *)
let top_k_by k key_fn lst =
  lst
  |> List.map (fun x -> (key_fn x, x))
  |> List.sort (fun (a, _) (b, _) -> compare b a)
  |> List.filteri (fun i _ -> i < k)
  |> List.map snd

let top_k_by_test () =
  let words = ["hi"; "hello"; "hey"; "howdy"; "h"] in
  let longest3 = top_k_by 3 String.length words in
  assert (List.length longest3 = 3);
  assert (List.hd longest3 = "howdy")

let () =
  sort_approach ();
  bounded_heap_approach ();
  top_k_by_test ();
  Printf.printf "✓ All tests passed\n"

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn top_k_basic() {
        let data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
        assert_eq!(top_k(3, &data), vec![9, 6, 5]);
        assert_eq!(top_k(1, &data), vec![9]);
        assert_eq!(top_k(5, &data), vec![9, 6, 5, 5, 4]);
    }

    #[test]
    fn top_k_edge_cases() {
        assert_eq!(top_k(0, &[1, 2, 3]), Vec::<i32>::new());
        assert_eq!(top_k(3, &[]), Vec::<i32>::new());
        assert_eq!(top_k(10, &[2, 1]), vec![2, 1]);
    }

    #[test]
    fn top_k_by_length() {
        let words = ["hi", "hello", "hey", "howdy", "h"];
        let longest = top_k_by(3, &words, |w| w.len());
        assert_eq!(longest.len(), 3);
        assert_eq!(*longest[0], "howdy");
        assert_eq!(*longest[1], "hello");
    }

    #[test]
    fn max_heap_works() {
        max_heap_demo();
    }

    #[test]
    fn into_sorted_vec_is_ascending() {
        let heap: BinaryHeap<_> = [5, 3, 8, 1, 9].into_iter().collect();
        assert_eq!(heap.into_sorted_vec(), vec![1, 3, 5, 8, 9]);
    }

    #[test]
    fn reverse_gives_min_heap() {
        let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
        for x in [5, 1, 3] {
            min_heap.push(Reverse(x));
        }
        assert_eq!(min_heap.pop(), Some(Reverse(1)));
    }
}

Deep Comparison

Top-K Elements with BinaryHeap — Comparison

Core Insight

The top-K problem is a classic priority queue application. Maintain a min-heap of size K; for each element, push it and evict the smallest if heap exceeds K. Result: only the K largest remain. Rust has BinaryHeap in std; OCaml has no built-in heap.

OCaml Approach

• No stdlib heap — must use sorted list or manual implementation

• Sort-and-take: O(n log n) — simple but not optimal

• Bounded sorted list: maintain sorted list of size K, insert with binary-ish placement

• Functorized priority queue libraries exist but aren't in stdlib

Rust Approach

• BinaryHeap<T> — max-heap by default

• BinaryHeap<Reverse<T>> — min-heap via newtype wrapper

• push + pop for bounded heap pattern

• into_sorted_vec() for sorted extraction

• peek() for O(1) max/min access

Comparison Table

Feature	OCaml	Rust (`BinaryHeap`)
Stdlib heap	No	Yes
Default order	N/A	Max-heap
Min-heap	N/A	`Reverse<T>` wrapper
Top-K complexity	O(n log n) sort	O(n log k) heap
Heap sort	Manual	`into_sorted_vec()`
Peek	N/A	`peek()` O(1)
Custom ordering	N/A	Implement `Ord`

Exercises

Implement bottom_k(k: usize, data: &[i32]) -> Vec<i32> (k smallest elements) using a max-heap instead of a min-heap.

Write a kth_largest(k: usize, data: &[i32]) -> Option<i32> function that returns only the kth largest element without collecting all k.

Implement a task scheduler using BinaryHeap<(Priority, Task)> where higher priority numbers are processed first.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust