476 Fundamental

String Splitting

The Problem

Parsing structured text (CSV records, HTTP headers, key=value config lines, shell command tokens) requires splitting strings on delimiters. A good split API must handle four cases: unlimited splits, capped splits, a single split at the first delimiter, and whitespace-normalised tokenisation. Rust's split family covers all of them with a consistent interface: each method yields or returns borrowed &str slices of the input, so no Vec or String is allocated unless the caller explicitly collects.

🎯 Learning Outcomes

• Split on a char, &str, or closure predicate with .split()

• Limit the number of parts with .splitn(n, pat), leaving the remainder unsplit

• Extract a key-value pair efficiently with .split_once(pat) returning Option<(&str, &str)>

• Tokenise whitespace-separated input with .split_whitespace(), handling multiple spaces

• Chain split results with iterator adaptors before collecting
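As a minimal sketch of the last outcome, the helpers below chain adaptors directly onto the split iterator, so the only allocation is the final collect. The function names parse_numbers and command_and_rest are hypothetical and chosen purely for illustration.

```rust
/// Parse a comma-separated list of integers, trimming each field and
/// silently skipping fields that are not valid integers.
fn parse_numbers(line: &str) -> Vec<i32> {
    line.split(',')
        .map(str::trim)
        .filter_map(|field| field.parse::<i32>().ok())
        .collect()
}

/// Split a command line into its first word and the unsplit remainder.
/// `splitn(2, ' ')` yields at most two items, so the arguments stay intact.
fn command_and_rest(line: &str) -> (Option<&str>, Option<&str>) {
    let mut parts = line.splitn(2, ' ');
    (parts.next(), parts.next())
}
```

Skipping unparsable fields is a design choice made for brevity here; a stricter parser would probably return a Result and report the offending field.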

Code Example

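#![allow(clippy::all)]
// 476. split(), splitn(), split_once()

#[cfg(test)]
mod tests {
    // split: one item per delimiter-separated field, with no upper bound.
    #[test]
    fn test_split() {
        assert_eq!("a,b,c".split(',').collect::<Vec<_>>(), ["a", "b", "c"]);
    }
    // splitn(3, ..): at most three items; the last one keeps the unsplit remainder.
    #[test]
    fn test_splitn() {
        let v: Vec<_> = "a:b:c:d".splitn(3, ':').collect();
        assert_eq!(v, ["a", "b", "c:d"]);
    }
    // split_once: splits at the first match only; None if the pattern is absent.
    #[test]
    fn test_split_once() {
        assert_eq!("k=v".split_once('='), Some(("k", "v")));
        assert_eq!("noeq".split_once('='), None);
    }
    // split_whitespace: ignores leading, trailing, and repeated whitespace.
    #[test]
    fn test_whitespace() {
        let w: Vec<_> = "  a  b  c  ".split_whitespace().collect();
        assert_eq!(w, ["a", "b", "c"]);
    }
}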

(* 476. String splitting – OCaml *)
let () =
  let csv = "alice,30,amsterdam" in
  List.iter (fun p -> Printf.printf "'%s'\n" p) (String.split_on_char ',' csv);

  (* split_once equivalent *)
  let split_once sep s =
    match String.split_on_char sep s with
    | [] | [_] -> None
    | h::t -> Some(h, String.concat (String.make 1 sep) t)
  in
  (match split_once '=' "key=value=extra" with
   | Some(k,v) -> Printf.printf "k=%s v=%s\n" k v | None->());

  (* split_whitespace equivalent: splits on ' ' only, then drops empty tokens *)
  let words = List.filter ((<>) "") (String.split_on_char ' ' "  a  b  c  ") in
  Printf.printf "words: %d\n" (List.length words)

Key Differences

Lazy vs. eager: Rust's split returns a lazy Split iterator; OCaml's split_on_char returns an allocated string list immediately.
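The laziness can be illustrated with a short sketch: the iterator only scans as far as the caller pulls, so asking for one field never materialises the others. The helper names first_field and nth_field are assumptions made for this example.

```rust
/// Return the first comma-separated field without scanning the rest of the record.
fn first_field(record: &str) -> Option<&str> {
    record.split(',').next()
}

/// Return the field at zero-based index `n`, or None if the record is too short.
/// The iterator stops as soon as the requested field has been produced.
fn nth_field(record: &str, n: usize) -> Option<&str> {
    record.split(',').nth(n)
}
```

Neither helper allocates: both return a slice borrowed from the input record.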

split_once: Rust's standard library includes split_once; OCaml requires manual index_opt + sub or an external library.
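A small sketch of split_once on a config-style line follows. It splits at the first = only, so any later = stays in the value. The name parse_pair and the decision to trim both halves are assumptions for illustration.

```rust
/// Split a `key=value` line at the first '=' and trim both halves.
/// Returns None when the line contains no '='.
fn parse_pair(line: &str) -> Option<(&str, &str)> {
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}
```

For the input "key=value=extra" this returns the key "key" and the value "value=extra", matching what the OCaml helper in the code example prints.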

Pattern types: Rust's split accepts char, &str, &[char], or any Pattern (including closures); OCaml's standard split_on_char accepts only char.
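The pattern kinds listed above can be sketched side by side. The wrapper functions below are hypothetical and exist only to give each pattern type a name.

```rust
/// Split on a single char.
fn by_char(s: &str) -> Vec<&str> {
    s.split(',').collect()
}

/// Split on a multi-character string slice.
fn by_str(s: &str) -> Vec<&str> {
    s.split("::").collect()
}

/// Split on any char from a set, passed as a &[char] slice.
fn by_char_set(s: &str) -> Vec<&str> {
    s.split(&[',', ';'][..]).collect()
}

/// Split wherever a closure predicate returns true (here: on ASCII digits).
fn by_closure(s: &str) -> Vec<&str> {
    s.split(|c: char| c.is_ascii_digit()).collect()
}
```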

Empty tokens: Rust's split yields empty strings between adjacent delimiters; split_whitespace skips them. OCaml's split_on_char also yields empty strings.
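The empty-token behaviour is easy to get wrong, so here is a sketch contrasting the three options. The function names are illustrative assumptions, not part of any library.

```rust
/// Plain split keeps empty fields, including a trailing one after a final delimiter.
fn raw_fields(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

/// Drop empty fields explicitly when they carry no meaning.
fn non_empty_fields(line: &str) -> Vec<&str> {
    line.split(',').filter(|field| !field.is_empty()).collect()
}

/// split_whitespace never yields empty tokens and treats any Unicode
/// whitespace (spaces, tabs, newlines) as a separator.
fn words(line: &str) -> Vec<&str> {
    line.split_whitespace().collect()
}
```

For CSV-like data the empty fields are usually significant (a missing column), which is one reason split keeps them rather than filtering silently.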

OCaml Approach

OCaml 4.04+ has String.split_on_char:

String.split_on_char ',' "a,b,c"  (* ["a"; "b"; "c"] *)

For splitn-equivalent behaviour, the Str module provides Str.bounded_split:

Str.bounded_split (Str.regexp ",") "a,b,c,d" 3  (* ["a"; "b"; "c,d"] *)

split_once has no direct equivalent; the idiom is match String.index_opt s '=' with Some i -> .... The astring library provides a richer split API.

Exercises

CSV line parser: Write parse_csv_line(s: &str) -> Vec<&str> that splits on commas and trims whitespace from each field, returning slices into the original string.

Key-value config: Write parse_config(text: &str) -> HashMap<&str, &str> that processes each line with split_once('='), ignoring lines without =.

Re-join with modified parts: Split a path "a/b/c/d" on /, uppercase each component, then rejoin with :: — implement without intermediate Vec<String> allocation.
