893-group-by-iter — Group By Iterator
Tutorial
The Problem
Run-length encoding, log analysis, and data aggregation all need to group consecutive equal elements or elements sharing a common key. SQL's GROUP BY aggregates all rows matching a key regardless of position. A consecutive group-by collapses adjacent equal elements — used in run-length encoding, detecting intervals of the same event type, and aggregating time-series data where consecutive observations belong to the same period. Haskell's Data.List.groupBy and OCaml's own group pattern both handle consecutive grouping. Rust's standard library has .chunk_by() (1.77+); for earlier versions or custom key logic, it must be implemented manually.
🎯 Learning Outcomes
(key, group) pairsHashMap for non-consecutive grouping (all elements sharing a key)Itertools::group_by (consecutive) and HashMap::entry (global)Code Example
pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
let mut groups: Vec<Vec<T>> = Vec::new();
let mut iter = data.iter();
let Some(first) = iter.next() else { return groups; };
let mut current = vec![first.clone()];
for item in iter {
if *item == *current[0] {
current.push(item.clone());
} else {
groups.push(current);
current = vec![item.clone()];
}
}
groups.push(current);
groups
}Key Differences
HashMap in Rust or Hashtbl in OCaml.chunk_by was added in 1.77; before that, it required the itertools crate or manual implementation; OCaml requires a library.Vec<Vec<T>> or Vec<(K, Vec<T>)>; OCaml produces 'a list list or association lists.OCaml Approach
OCaml implements group-by recursively: let rec group_by eq = function | [] -> [] | x :: rest -> let (same, others) = List.partition (eq x) rest in (x :: same) :: group_by eq others. This is O(n²) for the partition; the efficient consecutive version uses pattern matching on the head. For consecutive grouping: let rec group_consecutive = function | [] -> [] | [x] -> [[x]] | x :: y :: rest when x = y -> ... . Libraries like Base.List.group_by and Core.List.group provide this.
Full Source
#![allow(clippy::all)]
// Example 099: Group By Iterator
// Group consecutive equal (or same-key) elements without external crates.
// === Approach 1: Idiomatic Rust — slice-based, uses windows/peekable pattern ===
// Groups consecutive equal elements; returns Vec of grouped Vecs.
pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
let mut groups: Vec<Vec<T>> = Vec::new();
let mut iter = data.iter();
let Some(first) = iter.next() else {
return groups;
};
let mut current = vec![first.clone()];
for item in iter {
if *item == current[0] {
current.push(item.clone());
} else {
groups.push(current);
current = vec![item.clone()];
}
}
groups.push(current);
groups
}
// === Approach 2: Group by key function — functional style, returns (key, group) pairs ===
pub fn group_by_key<T: Clone, K: PartialEq>(data: &[T], key: impl Fn(&T) -> K) -> Vec<(K, Vec<T>)> {
let mut groups: Vec<(K, Vec<T>)> = Vec::new();
let mut iter = data.iter();
let Some(first) = iter.next() else {
return groups;
};
let mut current_key = key(first);
let mut current_group = vec![first.clone()];
for item in iter {
let k = key(item);
if k == current_key {
current_group.push(item.clone());
} else {
groups.push((current_key, current_group));
current_key = k;
current_group = vec![item.clone()];
}
}
groups.push((current_key, current_group));
groups
}
// === Approach 3: Run-length encoding — encodes as (element, count) pairs ===
pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
group_consecutive(data)
.into_iter()
.map(|g| {
let len = g.len();
// Safe: group_consecutive never produces empty groups
(g.into_iter().next().unwrap(), len)
})
.collect()
}
#[cfg(test)]
mod tests {
use super::*;
// --- group_consecutive ---
#[test]
fn test_group_consecutive_empty() {
let result = group_consecutive::<i32>(&[]);
assert_eq!(result, Vec::<Vec<i32>>::new());
}
#[test]
fn test_group_consecutive_single() {
assert_eq!(group_consecutive(&[42]), vec![vec![42]]);
}
#[test]
fn test_group_consecutive_multiple() {
let input = vec![1, 1, 2, 2, 2, 3, 1, 1];
let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
assert_eq!(group_consecutive(&input), expected);
}
#[test]
fn test_group_consecutive_all_same() {
assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
}
#[test]
fn test_group_consecutive_all_different() {
assert_eq!(
group_consecutive(&[1, 2, 3]),
vec![vec![1], vec![2], vec![3]]
);
}
#[test]
fn test_group_consecutive_strings() {
let input = vec!["a", "a", "b", "c", "c"];
let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
assert_eq!(group_consecutive(&input), expected);
}
// --- group_by_key ---
#[test]
fn test_group_by_key_empty() {
let result = group_by_key::<i32, bool>(&[], |x| *x > 0);
assert!(result.is_empty());
}
#[test]
fn test_group_by_key_sign() {
let input = vec![1, 2, -1, -2, 3];
let result = group_by_key(&input, |x| *x > 0);
assert_eq!(result.len(), 3);
assert_eq!(result[0], (true, vec![1, 2]));
assert_eq!(result[1], (false, vec![-1, -2]));
assert_eq!(result[2], (true, vec![3]));
}
#[test]
fn test_group_by_key_parity() {
let input = vec![2, 4, 1, 3, 6];
let result = group_by_key(&input, |x| x % 2);
assert_eq!(result[0], (0, vec![2, 4]));
assert_eq!(result[1], (1, vec![1, 3]));
assert_eq!(result[2], (0, vec![6]));
}
// --- run_length_encode ---
#[test]
fn test_rle_empty() {
let result = run_length_encode::<char>(&[]);
assert!(result.is_empty());
}
#[test]
fn test_rle_typical() {
let input = vec!['a', 'a', 'a', 'b', 'b', 'c'];
let expected = vec![('a', 3), ('b', 2), ('c', 1)];
assert_eq!(run_length_encode(&input), expected);
}
#[test]
fn test_rle_no_repeats() {
let input = vec![1, 2, 3];
let expected = vec![(1, 1), (2, 1), (3, 1)];
assert_eq!(run_length_encode(&input), expected);
}
#[test]
fn test_rle_single_run() {
assert_eq!(run_length_encode(&[7, 7, 7, 7]), vec![(7, 4)]);
}
}#[cfg(test)]
mod tests {
use super::*;
// --- group_consecutive ---
#[test]
fn test_group_consecutive_empty() {
let result = group_consecutive::<i32>(&[]);
assert_eq!(result, Vec::<Vec<i32>>::new());
}
#[test]
fn test_group_consecutive_single() {
assert_eq!(group_consecutive(&[42]), vec![vec![42]]);
}
#[test]
fn test_group_consecutive_multiple() {
let input = vec![1, 1, 2, 2, 2, 3, 1, 1];
let expected = vec![vec![1, 1], vec![2, 2, 2], vec![3], vec![1, 1]];
assert_eq!(group_consecutive(&input), expected);
}
#[test]
fn test_group_consecutive_all_same() {
assert_eq!(group_consecutive(&[5, 5, 5]), vec![vec![5, 5, 5]]);
}
#[test]
fn test_group_consecutive_all_different() {
assert_eq!(
group_consecutive(&[1, 2, 3]),
vec![vec![1], vec![2], vec![3]]
);
}
#[test]
fn test_group_consecutive_strings() {
let input = vec!["a", "a", "b", "c", "c"];
let expected = vec![vec!["a", "a"], vec!["b"], vec!["c", "c"]];
assert_eq!(group_consecutive(&input), expected);
}
// --- group_by_key ---
#[test]
fn test_group_by_key_empty() {
let result = group_by_key::<i32, bool>(&[], |x| *x > 0);
assert!(result.is_empty());
}
#[test]
fn test_group_by_key_sign() {
let input = vec![1, 2, -1, -2, 3];
let result = group_by_key(&input, |x| *x > 0);
assert_eq!(result.len(), 3);
assert_eq!(result[0], (true, vec![1, 2]));
assert_eq!(result[1], (false, vec![-1, -2]));
assert_eq!(result[2], (true, vec![3]));
}
#[test]
fn test_group_by_key_parity() {
let input = vec![2, 4, 1, 3, 6];
let result = group_by_key(&input, |x| x % 2);
assert_eq!(result[0], (0, vec![2, 4]));
assert_eq!(result[1], (1, vec![1, 3]));
assert_eq!(result[2], (0, vec![6]));
}
// --- run_length_encode ---
#[test]
fn test_rle_empty() {
let result = run_length_encode::<char>(&[]);
assert!(result.is_empty());
}
#[test]
fn test_rle_typical() {
let input = vec!['a', 'a', 'a', 'b', 'b', 'c'];
let expected = vec![('a', 3), ('b', 2), ('c', 1)];
assert_eq!(run_length_encode(&input), expected);
}
#[test]
fn test_rle_no_repeats() {
let input = vec![1, 2, 3];
let expected = vec![(1, 1), (2, 1), (3, 1)];
assert_eq!(run_length_encode(&input), expected);
}
#[test]
fn test_rle_single_run() {
assert_eq!(run_length_encode(&[7, 7, 7, 7]), vec![(7, 4)]);
}
}
Deep Comparison
OCaml vs Rust: Group By Iterator
Side-by-Side Code
OCaml
(* Group consecutive equal elements *)
let group_consecutive lst =
match lst with
| [] -> []
| first :: rest ->
let groups, current = List.fold_left (fun (groups, current) x ->
match current with
| [] -> (groups, [x])
| hd :: _ when hd = x -> (groups, x :: current)
| _ -> (List.rev current :: groups, [x])
) ([], [first]) rest in
List.rev (List.rev current :: groups)
(* Group by key function *)
let group_by_key key lst =
match lst with
| [] -> []
| first :: rest ->
let groups, current_key, current = List.fold_left (fun (groups, k, current) x ->
let new_key = key x in
if new_key = k then (groups, k, x :: current)
else (List.rev current :: groups, new_key, [x])
) ([], key first, [first]) rest in
List.rev (List.rev current :: groups)
Rust (idiomatic)
pub fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> {
let mut groups: Vec<Vec<T>> = Vec::new();
let mut iter = data.iter();
let Some(first) = iter.next() else { return groups; };
let mut current = vec![first.clone()];
for item in iter {
if *item == *current[0] {
current.push(item.clone());
} else {
groups.push(current);
current = vec![item.clone()];
}
}
groups.push(current);
groups
}
Rust (functional/recursive — run-length encoding via iterator composition)
pub fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> {
group_consecutive(data)
.into_iter()
.map(|g| {
let len = g.len();
(g.into_iter().next().unwrap(), len)
})
.collect()
}
Type Signatures
| Concept | OCaml | Rust |
|---|---|---|
| Group consecutive | 'a list -> 'a list list | fn group_consecutive<T: PartialEq + Clone>(data: &[T]) -> Vec<Vec<T>> |
| Group by key | ('a -> 'b) -> 'a list -> 'a list list | fn group_by_key<T: Clone, K: PartialEq>(data: &[T], key: impl Fn(&T) -> K) -> Vec<(K, Vec<T>)> |
| Run-length encode | 'a list -> ('a * int) list | fn run_length_encode<T: PartialEq + Clone>(data: &[T]) -> Vec<(T, usize)> |
| List type | 'a list | &[T] (borrowed slice) |
| Grouped result | 'a list list | Vec<Vec<T>> |
Key Insights
List.fold_left with an immutable accumulator tuple (groups, current), building the result purely functionally. Rust naturally reaches for mutable Vec accumulators inside a for loop — equally idiomatic and avoids repeated allocation overhead.current with hd :: _ when hd = x, which is elegant list deconstruction. Rust compares via current[0] on a Vec, trading structural elegance for direct, bounds-clear indexing.Fn(&T) -> K in Rust, 'a -> 'b in OCaml). Rust's trait bounds make the constraint explicit — K: PartialEq — while OCaml infers equality capability automatically via polymorphic =.Clone:** Rust requires T: Clone to copy values into groups from a borrowed slice. OCaml's garbage collector and persistent lists make this cost invisible — values are always shared. The Clone bound is the honest cost of building owned sub-collections from a borrowed input.run_length_encode naturally composes group_consecutive with .map().collect(), mirroring functional pipeline style. OCaml achieves the same with List.map over the grouped result — both idioms express the same dataflow clearly.When to Use Each Style
Use idiomatic Rust (mutable accumulator) when: performance matters and you want to avoid repeated intermediate allocations; the grouping logic is straightforward and clarity trumps minimalism.
Use iterator composition when: you want to build higher-level operations from simpler ones (e.g., run_length_encode built on top of group_consecutive), keeping each function small, pure, and independently testable.
Exercises
run_length_encode<T: Eq + Clone>(data: &[T]) -> Vec<(T, usize)> using group_consecutive.group_by_all<T: Clone, K: Eq + Hash>(data: &[T], key: impl Fn(&T) -> K) -> HashMap<K, Vec<T>> for non-consecutive grouping.group_by_ranges(data: &[i32], range_size: i32) -> Vec<(i32, Vec<i32>)> that groups numbers into buckets of size range_size.