String Slices
The Problem
In many languages, str[2] gives you the third character. Rust does not allow indexing a string by a single integer at all, and its string slices are byte ranges. UTF-8 encodes non-ASCII characters in 2–4 bytes, so slicing at an arbitrary byte offset can land in the middle of a multi-byte character. The str::get method returns Option<&str>, yielding None if the range is out of bounds or does not start and end on a char boundary, while direct indexing with &s[range] panics at runtime in the same cases. Correct Unicode handling requires iterating characters, not bytes.
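The boundary rules above can be sketched with two small helpers. Both names, truncate_to_boundary and slice_chars, are hypothetical and not part of the standard library; they only combine is_char_boundary, char_indices, and get. This is a minimal illustration under those assumptions rather than a recommended API.

```rust
/// Shortens `s` to at most `max_bytes` bytes without splitting a character.
/// Hypothetical helper: backs up from `max_bytes` to the nearest char boundary.
fn truncate_to_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Offset 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Returns the slice covering characters `start..end` (counted in chars,
/// not bytes), or `None` if the range is reversed or runs past the end.
/// Hypothetical helper built on `char_indices`.
fn slice_chars(s: &str, start: usize, end: usize) -> Option<&str> {
    if start > end {
        return None;
    }
    // Byte offset where each char starts, followed by s.len() as the end marker.
    let mut offsets = s
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()));
    let from = offsets.nth(start)?;
    let to = if end == start {
        from
    } else {
        // `nth` already consumed `start + 1` items, so skip the remainder.
        offsets.nth(end - start - 1)?
    };
    // Both offsets are char boundaries, so `get` succeeds here.
    s.get(from..to)
}
```

For "café", where é occupies bytes 3 and 4, truncate_to_boundary(s, 4) yields "caf" and slice_chars(s, 1, 4) yields "afé", whereas s.get(0..4) is None. Note that slice_chars walks the string from the start, so it costs time proportional to the character position rather than constant time.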
🎯 Learning Outcomes
Understand why "café".len() == 5 (bytes) but "café".chars().count() == 4 (chars)
Use .get(range) for safe slicing that returns None on boundary violations
Use char_indices() to map character positions to byte offsets
Distinguish [byte_range] slicing from multi-byte safe .chars() iteration
Code Example
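#![allow(clippy::all)]
// 472. String slices and byte boundaries
#[cfg(test)]
mod tests {
    // Byte-range indexing is safe here because ASCII characters are one byte each.
    #[test]
    fn test_ascii() {
        assert_eq!(&"hello"[0..3], "hel");
    }
    // get() returns None instead of panicking when the range is invalid.
    #[test]
    fn test_safe_get() {
        assert_eq!("hello".get(1..4), Some("ell"));
        assert_eq!("hello".get(0..99), None);
    }
    // 'é' occupies two bytes in UTF-8, so byte length and char count differ.
    #[test]
    fn test_utf8() {
        assert_eq!("café".len(), 5);
        assert_eq!("café".chars().count(), 4);
    }
    // char_indices() pairs each char with the byte offset where it starts.
    #[test]
    fn test_char_idx() {
        let v: Vec<_> = "abc".char_indices().collect();
        assert_eq!(v, vec![(0, 'a'), (1, 'b'), (2, 'c')]);
    }
}
Key Differences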
Boundary checking: Rust's &s[range] panics when the range is out of bounds or does not fall on char boundaries; s.get(range) returns an Option instead. OCaml's String.sub raises Invalid_argument on out-of-bounds.
Length: Both len and String.length count bytes; character counting requires chars().count() in Rust and a library or manual decoding in OCaml.
char_indices: Rust provides char_indices() as a standard iterator; OCaml requires Uutf.String.fold_utf_8 or manual UTF-8 decoding.
Character type: Rust distinguishes char (a Unicode scalar value, 4 bytes) from u8 (a byte); OCaml's char is a single byte, so treating it as a character is silently wrong for non-ASCII text.
OCaml Approach
OCaml's standard string is a byte string: String.length "café" returns 5, matching Rust's .len(). Character-level operations are usually done with the Uutf or Camomile library:
(* Byte-level slicing *)
let sub = String.sub "hello" 1 3 (* "ell" *)
(* Character count via Uutf *)
let char_count s =
  Uutf.String.fold_utf_8 (fun acc _ _ -> acc + 1) 0 s
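OCaml 5's standard library offers only low-level UTF-8 primitives (String.get_utf_8_uchar and String.is_valid_utf_8, added in 4.14). It has no character iterator, normalization, or grapheme segmentation, so practical Unicode handling still relies on an external package.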
Exercises
Implement nth_char(s: &str, n: usize) -> Option<char> using chars().nth(n) and benchmark it against a byte-indexed approach on ASCII-only input.
Write is_char_boundary_range(s: &str, start: usize, end: usize) -> bool without using str::get: check s.is_char_boundary(start) && s.is_char_boundary(end), and make sure start <= end.
Use the unicode-segmentation crate to split "e\u{0301}" (e + combining accent) correctly and compare the grapheme count to .chars().count().
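One possible starting point for the first two exercises is sketched below. The function names come from the exercise text; the start <= end guard is an added assumption, chosen so the result mirrors how str::get rejects reversed ranges. The benchmark and the grapheme exercise are left open.

```rust
/// Returns the n-th character (zero-based), or None if the string is shorter.
/// Walks the string from the start, so the cost grows with `n`.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

/// Reports whether `start..end` is a byte range that `&s[start..end]`
/// could slice without panicking.
fn is_char_boundary_range(s: &str, start: usize, end: usize) -> bool {
    // is_char_boundary returns false for offsets past s.len(),
    // so out-of-bounds ranges are rejected as well.
    start <= end && s.is_char_boundary(start) && s.is_char_boundary(end)
}
```

With "café", nth_char(s, 3) is Some('é'), while is_char_boundary_range(s, 0, 4) is false because byte 4 falls inside the two-byte é.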