String Chars
The Problem
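Strings in memory are byte sequences, but humans think in characters. For ASCII text, bytes and characters coincide; for the rest of Unicode they diverge, because UTF-8 encodes each character in one to four bytes. Iterating bytes and assuming each is a character corrupts emoji, accented letters, CJK characters, and anything else outside ASCII. Rust's char type is a Unicode scalar value (U+0000 to U+10FFFF, excluding the surrogates U+D800 to U+DFFF), and .chars() decodes UTF-8 on the fly, yielding the right unit for scalar-level counting, filtering, reversal, and indexing. A scalar value is not always what a reader perceives as one character: a base letter followed by a combining mark is two chars, which is why grapheme-aware code needs a crate such as unicode-segmentation.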
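A minimal sketch of the difference, using only the standard library: treating each UTF-8 byte of "café" as a char produces mojibake, while .chars() decodes the two-byte 'é' as one scalar. The helper names naive_byte_chars and scalar_chars are hypothetical, introduced only for this illustration.

```rust
/// Hypothetical helper: wrongly treats every UTF-8 byte as a character.
/// `b as char` maps each byte 0x00..=0xFF to U+0000..=U+00FF, so every byte
/// of a multi-byte sequence turns into its own unrelated char.
fn naive_byte_chars(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}

/// Hypothetical helper: decodes the UTF-8 into Unicode scalar values.
fn scalar_chars(s: &str) -> Vec<char> {
    s.chars().collect()
}

fn main() {
    let s = "café";
    // 'é' (U+00E9) is encoded as the two bytes 0xC3 0xA9.
    println!("{:?}", s.as_bytes()); // [99, 97, 102, 195, 169]
    println!("{}", naive_byte_chars(s)); // cafÃ©
    println!("{:?}", scalar_chars(s)); // ['c', 'a', 'f', 'é']

    // A scalar takes 1 to 4 bytes in UTF-8.
    for c in ['a', 'é', '中', '🦀'] {
        println!("{c}: {} byte(s)", c.len_utf8());
    }

    // char covers U+0000..=U+10FFFF except the surrogates U+D800..=U+DFFF.
    assert!(char::from_u32(0x10FFFF).is_some());
    assert!(char::from_u32(0xD800).is_none());
    assert!(char::from_u32(0x110000).is_none());

    // Scalars are not user-perceived characters: reversing an 'e' followed by
    // a combining acute accent (U+0301) moves the accent onto the next letter.
    let rev: String = "e\u{301}x".chars().rev().collect();
    assert_eq!(rev, "x\u{301}e");
}
```

The naive version compiles and runs without complaint; the damage only shows up in the output, which is what makes byte-as-char bugs easy to miss.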
🎯 Learning Outcomes
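- .chars() vs. iterating bytes with .bytes()
- Counting characters with .chars().count() (compare .len(), which counts bytes)
- Collecting a char iterator back into a String
- Reversing with .chars().rev().collect()
- Indexing with .chars().nth(n) (O(N), not O(1))
Code Example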
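#![allow(clippy::all)]
// 480. chars() and char-level operations
#[cfg(test)]
mod tests {
    #[test]
    fn test_count() {
        // 4 scalars, but 'é' takes 2 bytes in UTF-8, so len() is 5.
        assert_eq!("café".chars().count(), 4);
        assert_eq!("café".len(), 5);
    }
    #[test]
    fn test_filter() {
        // String implements FromIterator<char>, so collect() rebuilds a String.
        let s: String = "Hello123".chars().filter(|c| c.is_ascii_digit()).collect();
        assert_eq!(s, "123");
    }
    #[test]
    fn test_rev() {
        // Chars is double-ended, so rev() walks the scalars from the end.
        let s: String = "abcde".chars().rev().collect();
        assert_eq!(s, "edcba");
    }
    #[test]
    fn test_nth() {
        // nth(1) decodes from the start of the string: O(n), not O(1).
        assert_eq!("hello".chars().nth(1), Some('e'));
    }
}
Key Differences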
- char semantics: Rust's char is a 4-byte Unicode scalar value; OCaml's char is a 1-byte value (0–255). True Unicode characters in OCaml require Uchar.t.
- UTF-8 decoding: Rust decodes UTF-8 with .chars() without any external crate; OCaml before 4.14 requires Uutf or similar (4.14 added String.get_utf_8_uchar to the standard library).
- Collecting: FromIterator<char> for String enables .chars().filter(...).collect::<String>(); OCaml's String.of_seq (4.07+) builds a string from a sequence of char, so it works on bytes, not Unicode scalars.
- Reversal: .chars().rev().collect() reverses scalar by scalar; reversing bytes, for example with OCaml's Bytes, can corrupt multi-byte sequences.
OCaml Approach
OCaml 4.07+ provides String.to_seq, which yields char values (single bytes, not Unicode scalars):
String.to_seq "Hello123"
|> Seq.filter (fun c -> c >= '0' && c <= '9')
|> String.of_seq (* "123"; String.of_seq is in the standard library since 4.07 *)
Before OCaml 4.14, iterating over Unicode scalars required a library such as Uutf:
Uutf.String.fold_utf_8
  (fun acc _ d -> match d with `Uchar u -> u :: acc | _ -> acc)
  [] "café"
(* the resulting Uchar.t list is reversed, since each scalar is consed onto acc *)
OCaml's char is a single byte; Uchar.t, part of the standard library since OCaml 4.03 (the uchar package backports it to older versions), is the Unicode scalar equivalent.
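For comparison with the Uutf fold, here is a hedged Rust sketch of related scalar-level operations using only the standard library: mapping each char to its code point, locating the byte range of the n-th char with char_indices, and paying the O(N) decoding cost once by collecting into a Vec<char> when random access is needed repeatedly. The function name nth_char_byte_range is hypothetical.

```rust
use std::ops::Range;

/// Hypothetical helper: byte range of the n-th char, found by a linear
/// scan with char_indices (the same O(N) cost as .chars().nth(n)).
fn nth_char_byte_range(s: &str, n: usize) -> Option<Range<usize>> {
    s.char_indices()
        .nth(n)
        .map(|(start, c)| start..start + c.len_utf8())
}

fn main() {
    let s = "café";

    // Counterpart of the Uutf fold: the code point of each scalar, in order
    // (the OCaml fold conses, so its list comes out reversed).
    let code_points: Vec<u32> = s.chars().map(|c| c as u32).collect();
    println!("{code_points:X?}"); // [63, 61, 66, E9]

    // char_indices yields (byte offset, char), so the range can slice the &str.
    let r = nth_char_byte_range(s, 3).unwrap();
    println!("{:?} -> {}", r, &s[r.clone()]); // 3..5 -> é

    // Each .chars().nth(i) re-decodes from the start; for repeated random
    // access, decode once into a Vec<char> and index it in O(1).
    let chars: Vec<char> = s.chars().collect();
    println!("{}", chars[3]); // é
}
```

Collecting into Vec<char> trades memory (4 bytes per scalar) for O(1) indexing; char_indices keeps the original &str and is the better fit when you need to slice it.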
Exercises
- Write is_palindrome(s: &str) -> bool that compares the string to its character-reversed form, handling Unicode correctly.
- Build a HashMap<char, usize> counting character occurrences in a &str using .chars() and .entry().and_modify().or_insert().
- Use the unicode-segmentation crate's graphemes iterator to correctly reverse "e\u{0301}nde" (e + combining accent + nde) and compare the result to .chars().rev().collect().