948 Merge Sort
Tutorial
The Problem
Implement functional merge sort in Rust: recursively split a slice in half, sort each half, then merge two sorted slices. Implement two merge variants — one with explicit index cursors and one using peekable iterators — to demonstrate the functional style. Compare with OCaml's list-based merge sort that uses pattern matching.
🎯 Learning Outcomes
merge_sort<T: Ord + Clone>(list: &[T]) -> Vec<T> using slice splittingmerge<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> with two-pointer mergemerge_iter using Peekable iterators for a more functional-style mergeClone is required because merge sort produces new Vec allocations at each level[] | [_] base case, List.init splitCode Example
#![allow(clippy::all)]
/// Merge Sort — Functional Divide and Conquer
///
/// Pure functional merge sort: split the list, sort each half, merge.
/// OCaml uses pattern matching on lists; Rust uses slices and Vec.
/// Merge two sorted slices into a new sorted Vec.
pub fn merge<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len());
let (mut i, mut j) = (0, 0);
while i < left.len() && j < right.len() {
if left[i] <= right[j] {
result.push(left[i].clone());
i += 1;
} else {
result.push(right[j].clone());
j += 1;
}
}
result.extend_from_slice(&left[i..]);
result.extend_from_slice(&right[j..]);
result
}
/// Functional merge sort — recursive, creates new Vecs at each level.
pub fn merge_sort<T: Ord + Clone>(list: &[T]) -> Vec<T> {
if list.len() <= 1 {
return list.to_vec();
}
let mid = list.len() / 2;
let left = merge_sort(&list[..mid]);
let right = merge_sort(&list[mid..]);
merge(&left, &right)
}
/// Merge using iterators — more functional style.
pub fn merge_iter<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len());
let mut li = left.iter().peekable();
let mut ri = right.iter().peekable();
loop {
match (li.peek(), ri.peek()) {
(Some(l), Some(r)) => {
if l <= r {
result.push((*li.next().unwrap()).clone());
} else {
result.push((*ri.next().unwrap()).clone());
}
}
(Some(_), None) => {
result.extend(li.cloned());
break;
}
(None, Some(_)) => {
result.extend(ri.cloned());
break;
}
(None, None) => break,
}
}
result
}
/// Custom comparator version (like OCaml's `cmp` parameter).
pub fn merge_sort_by<T: Clone>(list: &[T], cmp: &impl Fn(&T, &T) -> std::cmp::Ordering) -> Vec<T> {
if list.len() <= 1 {
return list.to_vec();
}
let mid = list.len() / 2;
let left = merge_sort_by(&list[..mid], cmp);
let right = merge_sort_by(&list[mid..], cmp);
let mut result = Vec::with_capacity(left.len() + right.len());
let (mut i, mut j) = (0, 0);
while i < left.len() && j < right.len() {
if cmp(&left[i], &right[j]) != std::cmp::Ordering::Greater {
result.push(left[i].clone());
i += 1;
} else {
result.push(right[j].clone());
j += 1;
}
}
result.extend_from_slice(&left[i..]);
result.extend_from_slice(&right[j..]);
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
assert_eq!(merge_sort(&[5, 2, 8, 1, 9, 3]), vec![1, 2, 3, 5, 8, 9]);
}
#[test]
fn test_empty() {
assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
}
#[test]
fn test_single() {
assert_eq!(merge_sort(&[42]), vec![42]);
}
#[test]
fn test_already_sorted() {
assert_eq!(merge_sort(&[1, 2, 3, 4, 5]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_reverse_sorted() {
assert_eq!(merge_sort(&[5, 4, 3, 2, 1]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_duplicates() {
assert_eq!(merge_sort(&[3, 1, 2, 1, 3]), vec![1, 1, 2, 3, 3]);
}
#[test]
fn test_strings() {
assert_eq!(
merge_sort(&["banana", "apple", "cherry"]),
vec!["apple", "banana", "cherry"]
);
}
#[test]
fn test_custom_cmp() {
// Sort descending
let result = merge_sort_by(&[1, 5, 3, 2, 4], &|a: &i32, b: &i32| b.cmp(a));
assert_eq!(result, vec![5, 4, 3, 2, 1]);
}
}Key Differences
| Aspect | Rust | OCaml |
|---|---|---|
| Input type | &[T] slice — zero-copy split at mid | 'a list — must copy for split |
| Clone requirement | T: Clone — explicit | GC handles sharing transparently |
| Base case | list.len() <= 1 | Pattern match [] \| [_] |
| Merge style | Index cursor or Peekable | Pattern match on pairs |
| Memory | One Vec per merge step | One list cons per element |
Merge sort is naturally recursive and allocation-heavy. For production use, Rust's slice::sort (pdqsort) is significantly faster. Functional merge sort here is for algorithmic clarity.
OCaml Approach
let rec merge xs ys = match xs, ys with
| [], ys -> ys
| xs, [] -> xs
| x :: xs', y :: ys' ->
if x <= y then x :: merge xs' ys
else y :: merge xs ys'
let split xs =
let n = List.length xs in
let k = n / 2 in
(List.filteri (fun i _ -> i < k) xs,
List.filteri (fun i _ -> i >= k) xs)
let rec merge_sort = function
| [] | [_] as xs -> xs
| xs ->
let (left, right) = split xs in
merge (merge_sort left) (merge_sort right)
OCaml's pattern match on (xs, ys) pairs makes the merge function read as three cases: one side empty, other side empty, or both non-empty. The recursive merge in OCaml consumes the lists directly — no index tracking needed.
Full Source
#![allow(clippy::all)]
/// Merge Sort — Functional Divide and Conquer
///
/// Pure functional merge sort: split the list, sort each half, merge.
/// OCaml uses pattern matching on lists; Rust uses slices and Vec.
/// Merge two sorted slices into a new sorted Vec.
pub fn merge<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len());
let (mut i, mut j) = (0, 0);
while i < left.len() && j < right.len() {
if left[i] <= right[j] {
result.push(left[i].clone());
i += 1;
} else {
result.push(right[j].clone());
j += 1;
}
}
result.extend_from_slice(&left[i..]);
result.extend_from_slice(&right[j..]);
result
}
/// Functional merge sort — recursive, creates new Vecs at each level.
pub fn merge_sort<T: Ord + Clone>(list: &[T]) -> Vec<T> {
if list.len() <= 1 {
return list.to_vec();
}
let mid = list.len() / 2;
let left = merge_sort(&list[..mid]);
let right = merge_sort(&list[mid..]);
merge(&left, &right)
}
/// Merge using iterators — more functional style.
pub fn merge_iter<T: Ord + Clone>(left: &[T], right: &[T]) -> Vec<T> {
let mut result = Vec::with_capacity(left.len() + right.len());
let mut li = left.iter().peekable();
let mut ri = right.iter().peekable();
loop {
match (li.peek(), ri.peek()) {
(Some(l), Some(r)) => {
if l <= r {
result.push((*li.next().unwrap()).clone());
} else {
result.push((*ri.next().unwrap()).clone());
}
}
(Some(_), None) => {
result.extend(li.cloned());
break;
}
(None, Some(_)) => {
result.extend(ri.cloned());
break;
}
(None, None) => break,
}
}
result
}
/// Custom comparator version (like OCaml's `cmp` parameter).
pub fn merge_sort_by<T: Clone>(list: &[T], cmp: &impl Fn(&T, &T) -> std::cmp::Ordering) -> Vec<T> {
if list.len() <= 1 {
return list.to_vec();
}
let mid = list.len() / 2;
let left = merge_sort_by(&list[..mid], cmp);
let right = merge_sort_by(&list[mid..], cmp);
let mut result = Vec::with_capacity(left.len() + right.len());
let (mut i, mut j) = (0, 0);
while i < left.len() && j < right.len() {
if cmp(&left[i], &right[j]) != std::cmp::Ordering::Greater {
result.push(left[i].clone());
i += 1;
} else {
result.push(right[j].clone());
j += 1;
}
}
result.extend_from_slice(&left[i..]);
result.extend_from_slice(&right[j..]);
result
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
assert_eq!(merge_sort(&[5, 2, 8, 1, 9, 3]), vec![1, 2, 3, 5, 8, 9]);
}
#[test]
fn test_empty() {
assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
}
#[test]
fn test_single() {
assert_eq!(merge_sort(&[42]), vec![42]);
}
#[test]
fn test_already_sorted() {
assert_eq!(merge_sort(&[1, 2, 3, 4, 5]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_reverse_sorted() {
assert_eq!(merge_sort(&[5, 4, 3, 2, 1]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_duplicates() {
assert_eq!(merge_sort(&[3, 1, 2, 1, 3]), vec![1, 1, 2, 3, 3]);
}
#[test]
fn test_strings() {
assert_eq!(
merge_sort(&["banana", "apple", "cherry"]),
vec!["apple", "banana", "cherry"]
);
}
#[test]
fn test_custom_cmp() {
// Sort descending
let result = merge_sort_by(&[1, 5, 3, 2, 4], &|a: &i32, b: &i32| b.cmp(a));
assert_eq!(result, vec![5, 4, 3, 2, 1]);
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic() {
assert_eq!(merge_sort(&[5, 2, 8, 1, 9, 3]), vec![1, 2, 3, 5, 8, 9]);
}
#[test]
fn test_empty() {
assert_eq!(merge_sort::<i32>(&[]), Vec::<i32>::new());
}
#[test]
fn test_single() {
assert_eq!(merge_sort(&[42]), vec![42]);
}
#[test]
fn test_already_sorted() {
assert_eq!(merge_sort(&[1, 2, 3, 4, 5]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_reverse_sorted() {
assert_eq!(merge_sort(&[5, 4, 3, 2, 1]), vec![1, 2, 3, 4, 5]);
}
#[test]
fn test_duplicates() {
assert_eq!(merge_sort(&[3, 1, 2, 1, 3]), vec![1, 1, 2, 3, 3]);
}
#[test]
fn test_strings() {
assert_eq!(
merge_sort(&["banana", "apple", "cherry"]),
vec!["apple", "banana", "cherry"]
);
}
#[test]
fn test_custom_cmp() {
// Sort descending
let result = merge_sort_by(&[1, 5, 3, 2, 4], &|a: &i32, b: &i32| b.cmp(a));
assert_eq!(result, vec![5, 4, 3, 2, 1]);
}
}
Deep Comparison
Merge Sort — Functional Divide and Conquer: OCaml vs Rust
The Core Insight
Merge sort is the quintessential functional sorting algorithm because it's naturally recursive and doesn't require mutation. The comparison reveals how OCaml's linked lists and Rust's contiguous Vecs lead to different implementation strategies with the same O(n log n) complexity but different constant factors.
OCaml Approach
OCaml's merge sort is textbook elegant. split alternates elements into two lists using pattern matching on pairs (a :: b :: rest). merge recursively cons-es the smaller head onto the merged tail. The compare function is passed as a first-class value for custom ordering. All intermediate lists share structure through the GC — no copying needed. The ([] | [_]) as l -> l or-pattern cleanly handles base cases.
Rust Approach
Rust uses slices (&[T]) for input and Vec<T> for output. Splitting is trivial with slice indexing: &list[..mid] and &list[mid..]. Merging uses index-based iteration with push and extend_from_slice for efficient bulk copying. Vec::with_capacity pre-allocates the exact needed size, avoiding reallocation. The Ord trait bound replaces OCaml's compare function parameter. A peekable iterator variant shows the more functional style.
Side-by-Side
| Concept | OCaml | Rust |
|---|---|---|
| Split | Pattern match a :: b :: rest | Slice indexing &list[..mid] |
| Merge | Recursive cons h1 :: merge ... | Vec::push + extend_from_slice |
| Base case | ([] \| [_]) as l -> l | if list.len() <= 1 |
| Comparator | cmp function parameter | Ord trait bound or closure |
| Memory | GC, structural sharing | Vec allocation, contiguous |
| Stability | Stable (preserves order) | Stable (preserves order) |
What Rust Learners Should Notice
Vec::with_capacity(n) is a crucial optimization — it avoids O(log n) reallocations during the merge phase&[T]) give you zero-cost views into data without copying — similar to how OCaml lists share tailsextend_from_slice copies a contiguous block efficiently, much faster than pushing elements one by oneOrd trait bound means any type that implements comparison can be sorted — no need to pass a comparator function (though merge_sort_by with a closure also works)slice::sort() or slice::sort_unstable() — they use Timsort / pdqsort respectively, both highly optimizedFurther Reading
Exercises
merge_sort_by<T, F: Fn(&T, &T) -> Ordering>(list, cmp) with a custom comparator.merge_sort work on Vec<T> in-place with O(n log n) time and O(n) auxiliary space.natural_merge_sort that detects existing sorted runs and merges them (Timsort inspiration).merge_sort vs slice::sort on a 10,000-element Vec<i32>.merge_k_sorted(lists: Vec<Vec<T>>) that merges k sorted lists in O(n log k) using a BinaryHeap.