476 Fundamental

String Splitting

Functional Programming

Tutorial

The Problem

Parsing structured text (CSV records, HTTP headers, key=value config lines, shell command tokens) requires splitting strings on delimiters. A good split API must handle four cases: unlimited splits, capped splits, a single split at the first delimiter, and whitespace-normalised tokenisation. Rust's split family covers all four with a consistent interface: split, splitn, and split_whitespace return lazy iterators over borrowed &str slices, and split_once returns an Option holding two slices, so nothing is allocated unless the caller explicitly collects.
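As a rough illustration of two of the cases above, the sketch below parses an HTTP-style header and picks one field out of a CSV record. The helper names parse_header and nth_field are hypothetical, introduced only for this example, and the inputs are made up.

```rust
/// Hypothetical helper: splits a "Name: value" line at the FIRST colon only,
/// so colons inside the value (such as a port number) are left untouched.
fn parse_header(line: &str) -> Option<(&str, &str)> {
    let (name, value) = line.split_once(':')?;
    Some((name.trim(), value.trim()))
}

/// Hypothetical helper: returns the n-th comma-separated field (0-based).
/// The Split iterator is consumed lazily, so no Vec is built and the
/// fields after the requested one are never scanned.
fn nth_field(record: &str, n: usize) -> Option<&str> {
    record.split(',').nth(n)
}
```

With these definitions, parse_header("Host: example.com:8080") should give Some(("Host", "example.com:8080")), and nth_field("alice,30,paris", 1) should give Some("30"). Both results are slices borrowed from the input string.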

🎯 Learning Outcomes

  • Split on a char, &str, or closure predicate with .split()
  • Limit the number of parts with .splitn(n, pat), leaving the remainder unsplit
  • Extract a key-value pair efficiently with .split_once(pat) returning Option<(&str, &str)>
  • Tokenise whitespace-separated input with .split_whitespace(), handling multiple spaces
  • Chain split results with iterator adaptors before collecting

Code Example

    #![allow(clippy::all)]
    // 476. split(), splitn(), split_once()
    
    #[cfg(test)]
    mod tests {
        #[test]
        fn test_split() {
            assert_eq!("a,b,c".split(',').collect::<Vec<_>>(), ["a", "b", "c"]);
        }
        #[test]
        fn test_splitn() {
            let v: Vec<_> = "a:b:c:d".splitn(3, ':').collect();
            assert_eq!(v, ["a", "b", "c:d"]);
        }
        #[test]
        fn test_split_once() {
            assert_eq!("k=v".split_once('='), Some(("k", "v")));
            assert_eq!("noeq".split_once('='), None);
        }
        #[test]
        fn test_whitespace() {
            let w: Vec<_> = "  a  b  c  ".split_whitespace().collect();
            assert_eq!(w, ["a", "b", "c"]);
        }
    }
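
The tests above do not exercise closure predicates or adaptor chaining, which the learning outcomes mention. A minimal sketch of both follows; words_lower and sum_fields are hypothetical names chosen for illustration.

```rust
/// Hypothetical helper: splits on any non-alphanumeric character using a
/// closure predicate, drops the empty pieces that appear between adjacent
/// delimiters, and lowercases each word.
fn words_lower(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(|w| w.to_lowercase())
        .collect()
}

/// Hypothetical helper: sums the integers in a comma-separated record,
/// silently skipping fields that do not parse. No intermediate Vec is
/// created; the iterator chain feeds straight into sum().
fn sum_fields(record: &str) -> i64 {
    record
        .split(',')
        .filter_map(|field| field.trim().parse::<i64>().ok())
        .sum()
}
```

Here words_lower allocates because to_lowercase produces new Strings, while sum_fields stays allocation-free. Whether to skip or reject unparseable fields is a design choice; skipping is used here only to keep the sketch short.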

    Key Differences

  • Lazy vs. eager: Rust's split returns a lazy Split iterator; OCaml's split_on_char returns an allocated string list immediately.
  • split_once: Rust's standard library includes split_once; OCaml requires manual index_opt + sub or an external library.
  • Pattern types: Rust's split accepts char, &str, &[char], or any Pattern (including closures); OCaml's standard split_on_char accepts only char.
  • Empty tokens: Rust's split yields empty strings between adjacent delimiters; split_whitespace skips them. OCaml's split_on_char also yields empty strings.
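
The pattern-type and empty-token points can be checked with a short sketch on the Rust side. The function names below are hypothetical and exist only to make each behaviour testable.

```rust
/// char pattern: adjacent or trailing delimiters produce empty strings.
fn split_on_comma(s: &str) -> Vec<&str> {
    s.split(',').collect()
}

/// &[char] pattern: splits on any one of the listed characters.
fn split_on_separators(s: &str) -> Vec<&str> {
    s.split(&['-', '_'][..]).collect()
}

/// &str pattern: the delimiter may be longer than one character.
fn split_on_double_colon(s: &str) -> Vec<&str> {
    s.split("::").collect()
}

/// split_whitespace: runs of whitespace count as one separator, and
/// leading or trailing whitespace yields no empty tokens.
fn tokens(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}
```

For example, split_on_comma("a,,b,") should return ["a", "", "b", ""], whereas tokens("  a  b ") should return ["a", "b"].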

    OCaml Approach

    OCaml 4.04+ has String.split_on_char:

    String.split_on_char ',' "a,b,c"  (* ["a"; "b"; "c"] *)
    

    For splitn-equivalent behaviour, the Str module provides Str.bounded_split:

    Str.bounded_split (Str.regexp ",") "a,b,c,d" 3  (* ["a"; "b"; "c,d"] *)
    

    split_once has no direct equivalent; the idiom is match String.index_opt s '=' with Some i -> .... The astring library provides a richer split API.


    Exercises

  • CSV line parser: Write parse_csv_line(s: &str) -> Vec<&str> that splits on commas and trims whitespace from each field, returning slices into the original string.
  • Key-value config: Write parse_config(text: &str) -> HashMap<&str, &str> that processes each line with split_once('='), ignoring lines without =.
  • Re-join with modified parts: Split a path "a/b/c/d" on /, uppercase each component, then rejoin with :: — implement without intermediate Vec<String> allocation.