ExamplesBy LevelBy TopicLearning Paths
893 Intermediate

893-group-by-iter — Group By Iterator

Functional Programming

Tutorial

The Problem

Run-length encoding, log analysis, and data aggregation all need to group consecutive equal elements or elements sharing a common key. SQL's GROUP BY aggregates all rows matching a key regardless of position. A consecutive group-by collapses adjacent equal elements — used in run-length encoding, detecting intervals of the same event type, and aggregating time-series data where consecutive observations belong to the same period. Haskell's Data.List.groupBy and OCaml's own group pattern both handle consecutive grouping. Rust's standard library has .chunk_by() (1.77+); for earlier versions or custom key logic, it must be implemented manually.

🎯 Learning Outcomes

  • • Implement group-consecutive using a peekable iterator pattern
  • • Implement group-by-key that produces (key, group) pairs
  • • Use HashMap for non-consecutive grouping (all elements sharing a key)
  • • Compare consecutive grouping (run-length style) with global grouping (SQL style)
  • • Recognize the difference between Itertools::group_by (consecutive) and HashMap::entry (global)
  • Code Example

    pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
        let mut groups: Vec<Vec<T>> = Vec::new();
        let mut iter = data.iter();
        let Some(first) = iter.next() else { return groups; };
        let mut current = vec![first.clone()];
    
        for item in iter {
            if *item == *current[0] {
                current.push(item.clone());
            } else {
                groups.push(current);
                current = vec![item.clone()];
            }
        }
        groups.push(current);
        groups
    }

    Key Differences

  • Consecutive vs global: Both languages have consecutive grouping (run-length style); global grouping requires a HashMap in Rust or Hashtbl in OCaml.
  • Standard library gap: Rust's chunk_by was added in 1.77; before that, it required the itertools crate or manual implementation; OCaml requires a library.
  • Output structure: Rust produces Vec<Vec<T>> or Vec<(K, Vec<T>)>; OCaml produces 'a list list or association lists.
  • Peekable vs recursive: Rust uses a peekable loop for efficiency; OCaml uses recursive pattern matching.
  • OCaml Approach

    OCaml implements group-by recursively: let rec group_by eq = function | [] -> [] | x :: rest -> let (same, others) = List.partition (eq x) rest in (x :: same) :: group_by eq others. This is O(n²) for the partition; the efficient consecutive version uses pattern matching on the head. For consecutive grouping: let rec group_consecutive = function | [] -> [] | [x] -> [[x]] | x :: y :: rest when x = y -> ... . Libraries like Base.List.group_by and Core.List.group provide this.

    Full Source

    #![allow(clippy::all)]
    // Example 099: Group By Iterator
    // Group consecutive equal (or same-key) elements without external crates.
    
    // === Approach 1: Idiomatic Rust — slice-based, uses windows/peekable pattern ===
    // Groups consecutive equal elements; returns Vec of grouped Vecs.
    pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
        let mut groups: Vec<Vec<T>> = Vec::new();
        let mut iter = data.iter();
        let Some(first) = iter.next() else {
            return groups;
        };
        let mut current = vec![first.clone()];
    
        for item in iter {
            if *item == current[0] {
                current.push(item.clone());
            } else {
                groups.push(current);
                current = vec![item.clone()];
            }
        }
        groups.push(current);
        groups
    }
    
    // === Approach 2: Group by key function — functional style, returns (key, group) pairs ===
    pub fn group_by_key<T: Clone, K: PartialEq>(data: &[T], key: impl Fn(&T) -> K) -> Vec<(K, Vec<T>)> {
        let mut groups: Vec<(K, Vec<T>)> = Vec::new();
        let mut iter = data.iter();
        let Some(first) = iter.next() else {
            return groups;
        };
        let mut current_key = key(first);
        let mut current_group = vec![first.clone()];
    
        for item in iter {
            let k = key(item);
            if k == current_key {
                current_group.push(item.clone());
            } else {
                groups.push((current_key, current_group));
                current_key = k;
                current_group = vec![item.clone()];
            }
        }
        groups.push((current_key, current_group));
        groups
    }
    
    // === Approach 3: Run-length encoding — encodes as (element, count) pairs ===
    pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
        group_consecutive(data)
            .into_iter()
            .map(|g| {
                let len = g.len();
                // Safe: group_consecutive never produces empty groups
                (g.into_iter().next().unwrap(), len)
            })
            .collect()
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // --- group_consecutive ---
    
        #[test]
        fn test_group_consecutive_empty() {
            let result = group_consecutive::<i32>(&[]);
            assert_eq!(result, Vec::<Vec<i32>>::new());
        }
    
        #[test]
        fn test_group_consecutive_single() {
            assert_eq!(group_consecutive(&[42]), vec![vec![42]]);
        }
    
        #[test]
        fn test_group_consecutive_multiple() {
            let input = vec![1, 1, 2, 2, 2, 3, 1, 1];
            let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        #[test]
        fn test_group_consecutive_all_same() {
            assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
        }
    
        #[test]
        fn test_group_consecutive_all_different() {
            assert_eq!(
                group_consecutive(&[1, 2, 3]),
                vec![vec![1], vec![2], vec![3]]
            );
        }
    
        #[test]
        fn test_group_consecutive_strings() {
            let input = vec!["a", "a", "b", "c", "c"];
            let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        // --- group_by_key ---
    
        #[test]
        fn test_group_by_key_empty() {
            let result = group_by_key::<i32, bool>(&[], |x| *x > 0);
            assert!(result.is_empty());
        }
    
        #[test]
        fn test_group_by_key_sign() {
            let input = vec![1, 2, -1, -2, 3];
            let result = group_by_key(&input, |x| *x > 0);
            assert_eq!(result.len(), 3);
            assert_eq!(result[0], (true, vec![1, 2]));
            assert_eq!(result[1], (false, vec![-1, -2]));
            assert_eq!(result[2], (true, vec![3]));
        }
    
        #[test]
        fn test_group_by_key_parity() {
            let input = vec![2, 4, 1, 3, 6];
            let result = group_by_key(&input, |x| x % 2);
            assert_eq!(result[0], (0, vec![2, 4]));
            assert_eq!(result[1], (1, vec![1, 3]));
            assert_eq!(result[2], (0, vec![6]));
        }
    
        // --- run_length_encode ---
    
        #[test]
        fn test_rle_empty() {
            let result = run_length_encode::<char>(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn test_rle_typical() {
            let input = vec!['a', 'a', 'a', 'b', 'b', 'c'];
            let expected = vec![('a', 3), ('b', 2), ('c', 1)];
            assert_eq!(run_length_encode(&input), expected);
        }
    
        #[test]
        fn test_rle_no_repeats() {
            let input = vec![1, 2, 3];
            let expected = vec![(1, 1), (2, 1), (3, 1)];
            assert_eq!(run_length_encode(&input), expected);
        }
    
        #[test]
        fn test_rle_single_run() {
            assert_eq!(run_length_encode(&[7, 7, 7, 7]), vec![(7, 4)]);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // --- group_consecutive ---
    
        #[test]
        fn test_group_consecutive_empty() {
            let result = group_consecutive::<i32>(&[]);
            assert_eq!(result, Vec::<Vec<i32>>::new());
        }
    
        #[test]
        fn test_group_consecutive_single() {
            assert_eq!(group_consecutive(&[42]), vec![vec![42]]);
        }
    
        #[test]
        fn test_group_consecutive_multiple() {
            let input = vec![1, 1, 2, 2, 2, 3, 1, 1];
            let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        #[test]
        fn test_group_consecutive_all_same() {
            assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
        }
    
        #[test]
        fn test_group_consecutive_all_different() {
            assert_eq!(
                group_consecutive(&[1, 2, 3]),
                vec![vec![1], vec![2], vec![3]]
            );
        }
    
        #[test]
        fn test_group_consecutive_strings() {
            let input = vec!["a", "a", "b", "c", "c"];
            let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
            assert_eq!(group_consecutive(&input), expected);
        }
    
        // --- group_by_key ---
    
        #[test]
        fn test_group_by_key_empty() {
            let result = group_by_key::<i32, bool>(&[], |x| *x > 0);
            assert!(result.is_empty());
        }
    
        #[test]
        fn test_group_by_key_sign() {
            let input = vec![1, 2, -1, -2, 3];
            let result = group_by_key(&input, |x| *x > 0);
            assert_eq!(result.len(), 3);
            assert_eq!(result[0], (true, vec![1, 2]));
            assert_eq!(result[1], (false, vec![-1, -2]));
            assert_eq!(result[2], (true, vec![3]));
        }
    
        #[test]
        fn test_group_by_key_parity() {
            let input = vec![2, 4, 1, 3, 6];
            let result = group_by_key(&input, |x| x % 2);
            assert_eq!(result[0], (0, vec![2, 4]));
            assert_eq!(result[1], (1, vec![1, 3]));
            assert_eq!(result[2], (0, vec![6]));
        }
    
        // --- run_length_encode ---
    
        #[test]
        fn test_rle_empty() {
            let result = run_length_encode::<char>(&[]);
            assert!(result.is_empty());
        }
    
        #[test]
        fn test_rle_typical() {
            let input = vec!['a', 'a', 'a', 'b', 'b', 'c'];
            let expected = vec![('a', 3), ('b', 2), ('c', 1)];
            assert_eq!(run_length_encode(&input), expected);
        }
    
        #[test]
        fn test_rle_no_repeats() {
            let input = vec![1, 2, 3];
            let expected = vec![(1, 1), (2, 1), (3, 1)];
            assert_eq!(run_length_encode(&input), expected);
        }
    
        #[test]
        fn test_rle_single_run() {
            assert_eq!(run_length_encode(&[7, 7, 7, 7]), vec![(7, 4)]);
        }
    }

    Deep Comparison

    OCaml vs Rust: Group By Iterator

    Side-by-Side Code

    OCaml

    (* Group consecutive equal elements *)
    let group_consecutive lst =
      match lst with
      | [] -> []
      | first :: rest ->
        let groups, current = List.fold_left (fun (groups, current) x ->
          match current with
          | [] -> (groups, [x])
          | hd :: _ when hd = x -> (groups, x :: current)
          | _ -> (List.rev current :: groups, [x])
        ) ([], [first]) rest in
        List.rev (List.rev current :: groups)
    
    (* Group by key function *)
    let group_by_key key lst =
      match lst with
      | [] -> []
      | first :: rest ->
        let groups, current_key, current = List.fold_left (fun (groups, k, current) x ->
          let new_key = key x in
          if new_key = k then (groups, k, x :: current)
          else (List.rev current :: groups, new_key, [x])
        ) ([], key first, [first]) rest in
        List.rev (List.rev current :: groups)
    

    Rust (idiomatic)

    pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
        let mut groups: Vec<Vec<T>> = Vec::new();
        let mut iter = data.iter();
        let Some(first) = iter.next() else { return groups; };
        let mut current = vec![first.clone()];
    
        for item in iter {
            if *item == *current[0] {
                current.push(item.clone());
            } else {
                groups.push(current);
                current = vec![item.clone()];
            }
        }
        groups.push(current);
        groups
    }
    

    Rust (functional/recursive — run-length encoding via iterator composition)

    pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
        group_consecutive(data)
            .into_iter()
            .map(|g| {
                let len = g.len();
                (g.into_iter().next().unwrap(), len)
            })
            .collect()
    }
    

    Type Signatures

    ConceptOCamlRust
    Group consecutive'a list -> 'a list listfn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>>
    Group by key('a -> 'b) -> 'a list -> 'a list listfn group_by_key<T: Clone, K: PartialEq>(data: &[T], key: impl Fn(&T) -> K) -> Vec<(K, Vec<T>)>
    Run-length encode'a list -> ('a * int) listfn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)>
    List type'a list&[T] (borrowed slice)
    Grouped result'a list listVec<Vec<T>>

    Key Insights

  • Fold vs imperative accumulation: OCaml uses List.fold_left with an immutable accumulator tuple (groups, current), building the result purely functionally. Rust naturally reaches for mutable Vec accumulators inside a for loop — equally idiomatic and avoids repeated allocation overhead.
  • Structural pattern matching vs. index comparison: OCaml matches on the head of current with hd :: _ when hd = x, which is elegant list deconstruction. Rust compares via current[0] on a Vec, trading structural elegance for direct, bounds-clear indexing.
  • Key function as higher-order argument: Both languages accept a key function (Fn(&T) -> K in Rust, 'a -> 'b in OCaml). Rust's trait bounds make the constraint explicit — K: PartialEq — while OCaml infers equality capability automatically via polymorphic =.
  • **Ownership and Clone:** Rust requires T: Clone to copy values into groups from a borrowed slice. OCaml's garbage collector and persistent lists make this cost invisible — values are always shared. The Clone bound is the honest cost of building owned sub-collections from a borrowed input.
  • Iterator composition: Rust's run_length_encode naturally composes group_consecutive with .map().collect(), mirroring functional pipeline style. OCaml achieves the same with List.map over the grouped result — both idioms express the same dataflow clearly.
  • When to Use Each Style

    Use idiomatic Rust (mutable accumulator) when: performance matters and you want to avoid repeated intermediate allocations; the grouping logic is straightforward and clarity trumps minimalism.

    Use iterator composition when: you want to build higher-level operations from simpler ones (e.g., run_length_encode built on top of group_consecutive), keeping each function small, pure, and independently testable.

    Exercises

  • Implement run_length_encode<T: Eq + Clone>(data: &[T]) -> Vec<(T, usize)> using group_consecutive.
  • Write group_by_all<T: Clone, K: Eq + Hash>(data: &[T], key: impl Fn(&T) -> K) -> HashMap<K, Vec<T>> for non-consecutive grouping.
  • Implement group_by_ranges(data: &[i32], range_size: i32) -> Vec<(i32, Vec<i32>)> that groups numbers into buckets of size range_size.
  • Open Source Repos