ExamplesBy LevelBy TopicLearning Paths
1033 Advanced

1033-binary-heap-topk — Top-K with BinaryHeap

Functional Programming

Tutorial

The Problem

Finding the k largest (or smallest) elements from a large stream is a common problem in data processing: top-10 most visited pages from a billion-row log, top-k nearest neighbors in machine learning, k most recent events. Sorting the entire dataset is O(n log n), but maintaining a heap of size k gives O(n log k) — much better when k is small relative to n.

Rust's BinaryHeap is a max-heap by default. For top-k (keeping the k largest), use a min-heap of size k with Reverse<T>: when a new element is larger than the heap minimum, evict the minimum and insert the new element.

🎯 Learning Outcomes

  • • Understand max-heap vs min-heap semantics in Rust
  • • Use Reverse<T> to turn BinaryHeap into a min-heap
  • • Implement the top-k algorithm with O(n log k) time complexity
  • • Use BinaryHeap as a priority queue for task scheduling
  • • Compare sorting vs heap approaches for small k vs large k
  • Code Example

    #![allow(clippy::all)]
    // 1033: Top-K Elements with BinaryHeap
    // BinaryHeap is a max-heap; use Reverse<T> for min-heap behavior
    
    use std::cmp::Reverse;
    use std::collections::BinaryHeap;
    
    /// Top-K using a min-heap of size K
    /// Keep only the K largest elements by evicting the smallest
    fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
        let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
    
        for &val in data {
            heap.push(Reverse(val));
            if heap.len() > k {
                heap.pop(); // Remove smallest (which is the max in Reverse heap)
            }
        }
    
        let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
        result.sort_unstable_by(|a, b| b.cmp(a)); // Descending
        result
    }
    
    fn test_top_k() {
        let data = vec![3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
        let result = top_k(3, &data);
        assert_eq!(result, vec![9, 6, 5]);
    
        let result = top_k(1, &data);
        assert_eq!(result, vec![9]);
    }
    
    /// BinaryHeap as a max-heap: natural ordering
    fn max_heap_demo() {
        let mut heap = BinaryHeap::new();
        heap.push(3);
        heap.push(1);
        heap.push(4);
        heap.push(1);
        heap.push(5);
    
        // pop() returns largest first
        assert_eq!(heap.pop(), Some(5));
        assert_eq!(heap.pop(), Some(4));
        assert_eq!(heap.pop(), Some(3));
    
        // peek without removing
        assert_eq!(heap.peek(), Some(&1));
    }
    
    /// Top-K with custom key function
    fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
    where
        K: Ord,
        F: Fn(&T) -> K,
    {
        let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
    
        for (i, item) in data.iter().enumerate() {
            heap.push(Reverse((key_fn(item), i)));
            if heap.len() > k {
                heap.pop();
            }
        }
    
        let mut indices: Vec<usize> = heap.into_iter().map(|Reverse((_, i))| i).collect();
        indices.sort_by(|&a, &b| key_fn(&data[b]).cmp(&key_fn(&data[a])).then(b.cmp(&a)));
        indices.iter().map(|&i| &data[i]).collect()
    }
    
    fn test_top_k_by() {
        let words = vec!["hi", "hello", "hey", "howdy", "h"];
        let longest3 = top_k_by(3, &words, |w| w.len());
        assert_eq!(longest3.len(), 3);
        assert_eq!(*longest3[0], "howdy");
        assert_eq!(*longest3[1], "hello");
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_top_k_basic() {
            test_top_k();
        }
    
        #[test]
        fn test_max_heap() {
            max_heap_demo();
        }
    
        #[test]
        fn test_custom_key() {
            test_top_k_by();
        }
    
        #[test]
        fn test_from_vec() {
            let v = vec![5, 3, 8, 1, 9];
            let heap: BinaryHeap<_> = v.into_iter().collect();
            let sorted: Vec<_> = heap.into_sorted_vec();
            assert_eq!(sorted, vec![1, 3, 5, 8, 9]); // ascending
        }
    
        #[test]
        fn test_min_heap() {
            let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
            min_heap.push(Reverse(5));
            min_heap.push(Reverse(1));
            min_heap.push(Reverse(3));
            assert_eq!(min_heap.pop(), Some(Reverse(1))); // smallest first
        }
    }

    Key Differences

  • Max vs min: Rust's BinaryHeap is a max-heap by default; a min-heap requires Reverse<T>. OCaml's CCHeap takes a custom comparison function.
  • Mutable vs functional: Rust's BinaryHeap is mutable; OCaml's functional priority queues (CCFQueue) are persistent.
  • Standard library: BinaryHeap is in Rust's std; OCaml requires a third-party crate for heap structures.
  • **Reverse wrapper**: Rust's Reverse<T> inverts the Ord comparison without allocating; OCaml inverts comparison via a custom functor argument.
  • OCaml Approach

    OCaml lacks a standard heap. The algorithmic pattern uses sorting or a functional priority queue:

    (* Simple O(n log n) approach via sort *)
    let top_k k data =
      List.sort (fun a b -> compare b a) data
      |> (fun sorted -> List.filteri (fun i _ -> i < k) sorted)
    

    The containers library provides CCHeap and CCFQueue (functional priority queues) for the O(n log k) approach.

    Full Source

    #![allow(clippy::all)]
    // 1033: Top-K Elements with BinaryHeap
    // BinaryHeap is a max-heap; use Reverse<T> for min-heap behavior
    
    use std::cmp::Reverse;
    use std::collections::BinaryHeap;
    
    /// Top-K using a min-heap of size K
    /// Keep only the K largest elements by evicting the smallest
    fn top_k(k: usize, data: &[i32]) -> Vec<i32> {
        let mut heap: BinaryHeap<Reverse<i32>> = BinaryHeap::with_capacity(k + 1);
    
        for &val in data {
            heap.push(Reverse(val));
            if heap.len() > k {
                heap.pop(); // Remove smallest (which is the max in Reverse heap)
            }
        }
    
        let mut result: Vec<i32> = heap.into_iter().map(|Reverse(x)| x).collect();
        result.sort_unstable_by(|a, b| b.cmp(a)); // Descending
        result
    }
    
    fn test_top_k() {
        let data = vec![3, 1, 4, 1, 5, 9, 2, 6, 5, 3];
        let result = top_k(3, &data);
        assert_eq!(result, vec![9, 6, 5]);
    
        let result = top_k(1, &data);
        assert_eq!(result, vec![9]);
    }
    
    /// BinaryHeap as a max-heap: natural ordering
    fn max_heap_demo() {
        let mut heap = BinaryHeap::new();
        heap.push(3);
        heap.push(1);
        heap.push(4);
        heap.push(1);
        heap.push(5);
    
        // pop() returns largest first
        assert_eq!(heap.pop(), Some(5));
        assert_eq!(heap.pop(), Some(4));
        assert_eq!(heap.pop(), Some(3));
    
        // peek without removing
        assert_eq!(heap.peek(), Some(&1));
    }
    
    /// Top-K with custom key function
    fn top_k_by<T, K, F>(k: usize, data: &[T], key_fn: F) -> Vec<&T>
    where
        K: Ord,
        F: Fn(&T) -> K,
    {
        let mut heap: BinaryHeap<Reverse<(K, usize)>> = BinaryHeap::new();
    
        for (i, item) in data.iter().enumerate() {
            heap.push(Reverse((key_fn(item), i)));
            if heap.len() > k {
                heap.pop();
            }
        }
    
        let mut indices: Vec<usize> = heap.into_iter().map(|Reverse((_, i))| i).collect();
        indices.sort_by(|&a, &b| key_fn(&data[b]).cmp(&key_fn(&data[a])).then(b.cmp(&a)));
        indices.iter().map(|&i| &data[i]).collect()
    }
    
    fn test_top_k_by() {
        let words = vec!["hi", "hello", "hey", "howdy", "h"];
        let longest3 = top_k_by(3, &words, |w| w.len());
        assert_eq!(longest3.len(), 3);
        assert_eq!(*longest3[0], "howdy");
        assert_eq!(*longest3[1], "hello");
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_top_k_basic() {
            test_top_k();
        }
    
        #[test]
        fn test_max_heap() {
            max_heap_demo();
        }
    
        #[test]
        fn test_custom_key() {
            test_top_k_by();
        }
    
        #[test]
        fn test_from_vec() {
            let v = vec![5, 3, 8, 1, 9];
            let heap: BinaryHeap<_> = v.into_iter().collect();
            let sorted: Vec<_> = heap.into_sorted_vec();
            assert_eq!(sorted, vec![1, 3, 5, 8, 9]); // ascending
        }
    
        #[test]
        fn test_min_heap() {
            let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
            min_heap.push(Reverse(5));
            min_heap.push(Reverse(1));
            min_heap.push(Reverse(3));
            assert_eq!(min_heap.pop(), Some(Reverse(1))); // smallest first
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn test_top_k_basic() {
            test_top_k();
        }
    
        #[test]
        fn test_max_heap() {
            max_heap_demo();
        }
    
        #[test]
        fn test_custom_key() {
            test_top_k_by();
        }
    
        #[test]
        fn test_from_vec() {
            let v = vec![5, 3, 8, 1, 9];
            let heap: BinaryHeap<_> = v.into_iter().collect();
            let sorted: Vec<_> = heap.into_sorted_vec();
            assert_eq!(sorted, vec![1, 3, 5, 8, 9]); // ascending
        }
    
        #[test]
        fn test_min_heap() {
            let mut min_heap: BinaryHeap<Reverse<i32>> = BinaryHeap::new();
            min_heap.push(Reverse(5));
            min_heap.push(Reverse(1));
            min_heap.push(Reverse(3));
            assert_eq!(min_heap.pop(), Some(Reverse(1))); // smallest first
        }
    }

    Deep Comparison

    Top-K Elements with BinaryHeap — Comparison

    Core Insight

    The top-K problem is a classic priority queue application. Maintain a min-heap of size K; for each element, push it and evict the smallest if heap exceeds K. Result: only the K largest remain. Rust has BinaryHeap in std; OCaml has no built-in heap.

    OCaml Approach

  • • No stdlib heap — must use sorted list or manual implementation
  • • Sort-and-take: O(n log n) — simple but not optimal
  • • Bounded sorted list: maintain sorted list of size K, insert with binary-ish placement
  • • Functorized priority queue libraries exist but aren't in stdlib
  • Rust Approach

  • BinaryHeap<T> — max-heap by default
  • BinaryHeap<Reverse<T>> — min-heap via newtype wrapper
  • push + pop for bounded heap pattern
  • into_sorted_vec() for sorted extraction
  • peek() for O(1) max/min access
  • Comparison Table

    FeatureOCamlRust (BinaryHeap)
    Stdlib heapNoYes
    Default orderN/AMax-heap
    Min-heapN/AReverse<T> wrapper
    Top-K complexityO(n log n) sortO(n log k) heap
    Heap sortManualinto_sorted_vec()
    PeekN/Apeek() O(1)
    Custom orderingN/AImplement Ord

    Exercises

  • Implement bottom_k(k: usize, data: &[i32]) -> Vec<i32> (k smallest elements) using a max-heap instead of a min-heap.
  • Write a kth_largest(k: usize, data: &[i32]) -> Option<i32> function that returns only the kth largest element without collecting all k.
  • Implement a task scheduler using BinaryHeap<(Priority, Task)> where higher priority numbers are processed first.
  • Open Source Repos