String Splitting
Functional Programming
Tutorial
The Problem
Parsing structured text (CSV records, HTTP headers, key=value config lines, shell command tokens) requires splitting strings on delimiters. A good split API must handle four cases: unlimited splits, capped splits, a single split at the first delimiter, and whitespace-normalised tokenisation. Rust's split family covers all of them with a consistent iterator-based interface. Each method yields &str slices borrowed from the original string, so nothing is allocated (no Vec<&str>, let alone a Vec<String>) unless the caller explicitly collects.
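A minimal sketch of that iterator-based interface. The helper nth_field and the sample record are illustrative assumptions, not part of the lesson. The sketch shows that the pieces are borrowed &str slices, that consuming them lazily avoids building a Vec, and that split accepts several pattern types.

```rust
/// Hypothetical helper: returns field `n` of a comma-separated record.
/// `split` is lazy, so `nth` stops pulling pieces once field `n` is found
/// and no Vec is allocated.
fn nth_field(record: &str, n: usize) -> Option<&str> {
    record.split(',').nth(n)
}

fn main() {
    let record = "alice,30,london,uk";
    assert_eq!(nth_field(record, 2), Some("london"));
    assert_eq!(nth_field(record, 9), None);

    // The same method accepts different pattern types.
    assert_eq!("a-b-c".split('-').collect::<Vec<_>>(), ["a", "b", "c"]); // char
    assert_eq!("a::b::c".split("::").collect::<Vec<_>>(), ["a", "b", "c"]); // &str
    assert_eq!("a-b_c".split(&['-', '_'][..]).collect::<Vec<_>>(), ["a", "b", "c"]); // &[char]
    assert_eq!(
        "a1b2c".split(|c: char| c.is_ascii_digit()).collect::<Vec<_>>(),
        ["a", "b", "c"]
    ); // closure predicate

    // Allocate owned Strings only when the pieces must outlive the input.
    let owned: Vec<String> = record.split(',').map(String::from).collect();
    assert_eq!(owned.len(), 4);
    assert_eq!(owned[2], "london");
}
```

Because every piece borrows from record, the slices cannot outlive it; the final map(String::from) step is where the Vec<String> cost appears, and only when the caller asks for it.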
🎯 Learning Outcomes
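- Split on a char, &str, or closure predicate with .split()
- Cap the number of pieces with .splitn(n, pat), leaving the remainder unsplit
- Split once at the first delimiter with .split_once(pat), which returns Option<(&str, &str)>
- Tokenise on whitespace with .split_whitespace(), which handles runs of multiple spaces
Code Example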
#![allow(clippy::all)]
// 476. split(), splitn(), split_once()
#[cfg(test)]
mod tests {
#[test]
fn test_split() {
assert_eq!("a,b,c".split(',').collect::<Vec<_>>(), ["a", "b", "c"]);
}
#[test]
fn test_splitn() {
let v: Vec<_> = "a:b:c:d".splitn(3, ':').collect();
assert_eq!(v, ["a", "b", "c:d"]);
}
#[test]
fn test_split_once() {
assert_eq!("k=v".split_once('='), Some(("k", "v")));
assert_eq!("noeq".split_once('='), None);
}
#[test]
fn test_whitespace() {
let w: Vec<_> = " a b c ".split_whitespace().collect();
assert_eq!(w, ["a", "b", "c"]);
}
}
Key Differences
- Rust's split returns a lazy Split iterator; OCaml's split_on_char returns an allocated string list immediately.
- Rust's standard library includes split_once; OCaml requires a manual index_opt + sub combination or an external library.
- Rust's split accepts a char, &str, &[char], or any Pattern (including closures); OCaml's standard split_on_char accepts only a char.
- Rust's split yields empty strings between adjacent delimiters, while split_whitespace skips them. OCaml's split_on_char also yields empty strings.
OCaml Approach
OCaml 4.04+ has String.split_on_char:
String.split_on_char ',' "a,b,c" (* ["a"; "b"; "c"] *)
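Like split_on_char, Rust's split keeps the empty pieces produced by adjacent or trailing delimiters; split_whitespace is the variant that drops them. A small sketch of the contrast, where fields and words are illustrative helper names rather than library functions:

```rust
/// Splits on commas exactly like `str::split`, keeping empty fields.
fn fields(s: &str) -> Vec<&str> {
    s.split(',').collect()
}

/// Whitespace tokenisation: a run of whitespace counts as one separator,
/// and leading/trailing whitespace produces no empty tokens.
fn words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}

fn main() {
    // Adjacent and trailing delimiters yield empty strings.
    assert_eq!(fields("a,,b,"), ["a", "", "b", ""]);
    // Splitting on a single space keeps the empties between repeated spaces...
    assert_eq!("a  b".split(' ').collect::<Vec<_>>(), ["a", "", "b"]);
    // ...whereas split_whitespace skips them (tabs included).
    assert_eq!(words("  a  b \t c "), ["a", "b", "c"]);
    // Filtering empties is a common fix when a delimiter may repeat.
    let non_empty: Vec<&str> = "a,,b,".split(',').filter(|f| !f.is_empty()).collect();
    assert_eq!(non_empty, ["a", "b"]);
}
```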
For splitn-equivalent behaviour, the Str module provides Str.bounded_split:
Str.bounded_split (Str.regexp ",") "a,b,c,d" 3 (* ["a"; "b"; "c,d"] *)
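For comparison, a sketch of the Rust side using the same input as the Str.bounded_split call, plus the edge cases of the cap. The helper name bounded is hypothetical.

```rust
/// Hypothetical helper: splits `s` on `sep` into at most `n` pieces;
/// the last piece keeps any remaining separators.
fn bounded(s: &str, sep: char, n: usize) -> Vec<&str> {
    s.splitn(n, sep).collect()
}

fn main() {
    assert_eq!(bounded("a,b,c,d", ',', 3), ["a", "b", "c,d"]);
    // n = 1 returns the whole string unsplit.
    assert_eq!(bounded("a,b,c,d", ',', 1), ["a,b,c,d"]);
    // A cap larger than the number of fields behaves like plain split.
    assert_eq!(bounded("a,b", ',', 10), ["a", "b"]);
    // rsplitn counts from the right, so the unsplit remainder comes last.
    let v: Vec<&str> = "a,b,c,d".rsplitn(2, ',').collect();
    assert_eq!(v, ["d", "a,b,c"]);
}
```

The rsplitn variant is handy for things like splitting a file extension off a name that may itself contain dots.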
split_once has no direct equivalent; the idiom is match String.index_opt s '=' with Some i -> .... The astring library provides a richer split API.
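The OCaml idiom above (find the index, then slice both sides) is essentially what split_once packages up. A sketch of the hand-written version in Rust, checked against the library method; split_at_first is a hypothetical name used only for illustration.

```rust
/// Hypothetical helper: the find-then-slice idiom written by hand.
fn split_at_first(s: &str, delim: char) -> Option<(&str, &str)> {
    let i = s.find(delim)?; // byte index of the first delimiter
    // Skip the delimiter itself; len_utf8 handles multi-byte chars.
    Some((&s[..i], &s[i + delim.len_utf8()..]))
}

fn main() {
    for line in ["k=v", "url=a=b", "noeq"] {
        assert_eq!(split_at_first(line, '='), line.split_once('='));
    }
    // split_once stops at the first delimiter; rsplit_once uses the last.
    assert_eq!("url=a=b".split_once('='), Some(("url", "a=b")));
    assert_eq!("url=a=b".rsplit_once('='), Some(("url=a", "b")));
}
```

Returning Option makes the "no delimiter" case explicit, which is why the idiomatic config parser is a filter_map over split_once.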
Exercises
- Write parse_csv_line(s: &str) -> Vec<&str> that splits on commas and trims whitespace from each field, returning slices into the original string.
- Write parse_config(text: &str) -> HashMap<&str, &str> that processes each line with split_once('='), ignoring lines without =.
- Split "a/b/c/d" on /, uppercase each component, then rejoin with ::. Implement this without an intermediate Vec<String> allocation.
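One possible set of answers, offered as a sketch rather than a reference solution. Several assumptions go beyond the exercises. Config keys and values are trimmed, and a repeated key keeps its last value. The CSV parser ignores quoting. The path helper is given the hypothetical name path_to_module.

```rust
use std::collections::HashMap;

/// Naive CSV field split: no support for quoted fields containing commas.
fn parse_csv_line(s: &str) -> Vec<&str> {
    s.split(',').map(str::trim).collect()
}

/// Lines without '=' are skipped; keys and values are trimmed (an assumption).
/// A repeated key keeps the last value, since collect inserts in order.
fn parse_config(text: &str) -> HashMap<&str, &str> {
    text.lines()
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.trim(), v.trim()))
        .collect()
}

/// Hypothetical name: builds the result in one String, with no Vec<String>
/// and no per-component temporary String.
fn path_to_module(path: &str) -> String {
    let mut out = String::new();
    for (i, part) in path.split('/').enumerate() {
        if i > 0 {
            out.push_str("::");
        }
        out.extend(part.chars().flat_map(char::to_uppercase));
    }
    out
}

fn main() {
    println!("{:?}", parse_csv_line(" a , b,c "));
    println!("{:?}", parse_config("host = example.org\n# comment\nport=8080"));
    println!("{}", path_to_module("a/b/c/d"));
}
```

Using char::to_uppercase per character (rather than str::to_uppercase per component) is what avoids the temporary Strings; both handle non-ASCII letters.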