String Bytes
Tutorial
The Problem
Network protocols, file formats, and cryptographic functions operate on bytes, not characters. A Rust String is a Vec<u8> that is guaranteed to hold valid UTF-8. Sometimes you need the raw bytes instead: to serialise to a binary protocol, to compute a checksum, or to interface with a C library that returns *const u8. Getting the bytes out is cheap: as_bytes borrows them and into_bytes takes ownership, and neither copies. The reverse direction, constructing a String from bytes, requires validation because not every byte sequence is valid UTF-8. Rust makes this validation explicit with two functions. from_utf8 is strict and returns an error on invalid input. from_utf8_lossy replaces each invalid sequence with U+FFFD, the Unicode replacement character.
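The following sketch shows both directions. The checksum is a hypothetical wrapping byte sum that stands in for a real algorithm. decode is a hypothetical helper, not a std API. It tries strict decoding first and falls back to lossy decoding only when the bytes are invalid, recovering the original buffer from the error so nothing is lost.

```rust
/// Hypothetical toy checksum: the wrapping sum of all bytes.
/// A real protocol would use CRC32, Adler-32, a cryptographic hash, etc.
fn checksum(data: &[u8]) -> u8 {
    data.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}

/// Hypothetical helper: decode bytes as UTF-8, strictly if possible.
/// Returns the decoded text and `true` if any bytes had to be replaced.
fn decode(bytes: Vec<u8>) -> (String, bool) {
    match String::from_utf8(bytes) {
        // Valid UTF-8: the Vec's allocation is reused as the String's buffer.
        Ok(s) => (s, false),
        // Invalid UTF-8: take the original bytes back out of the error
        // and fall back to lossy decoding (invalid sequences become U+FFFD).
        Err(e) => {
            let bytes = e.into_bytes();
            (String::from_utf8_lossy(&bytes).into_owned(), true)
        }
    }
}
```

For example, checksum("hi".as_bytes()) sums 104 and 105 to 209. decode(vec![104, 0xFF, 105]) returns ("h\u{FFFD}i".to_string(), true).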
🎯 Learning Outcomes
- Iterate over a string's raw bytes with .bytes(), yielding u8 values
- Convert a Vec<u8> to a String with String::from_utf8, which returns a Result
- Validate a &[u8] slice as UTF-8 with std::str::from_utf8
- Use String::from_utf8_lossy to convert potentially invalid bytes, substituting replacement characters
- Understand the relationships between &str, &[u8], String, and Vec<u8>
Code Example
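The four types form two pairs. &str and &[u8] are borrowed views, and String and Vec<u8> are their owned counterparts. This sketch shows the conversions between them, separately from the tests below. The wrapper names are hypothetical, but each body is a single std call. Going from text to bytes never copies and cannot fail. Going from bytes to text runs UTF-8 validation and returns a Result, and it still avoids copying on success.

```rust
use std::str::Utf8Error;
use std::string::FromUtf8Error;

/// &str -> &[u8]: a zero-copy view of the same memory; cannot fail.
fn str_to_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// &[u8] -> &str: validates, then borrows the same memory (no copy).
fn bytes_to_str(b: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(b)
}

/// String -> Vec<u8>: consumes the String and hands over its buffer; cannot fail.
fn string_to_vec(s: String) -> Vec<u8> {
    s.into_bytes()
}

/// Vec<u8> -> String: validates, then reuses the Vec's buffer on success.
fn vec_to_string(v: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(v)
}
```

.bytes() is the iterator form of the first conversion. "hé".bytes() yields 0x68, 0xC3, 0xA9, because é takes two bytes in UTF-8.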
#![allow(clippy::all)]
// 481. bytes() and byte-level operations
#[cfg(test)]
mod tests {
    #[test]
    fn test_bytes() {
        // .bytes() yields the raw UTF-8 code units as u8
        assert_eq!("hi".bytes().collect::<Vec<_>>(), vec![104, 105]);
    }

    #[test]
    fn test_from() {
        // Valid UTF-8: strict decoding succeeds
        assert_eq!(String::from_utf8(vec![104, 105]).unwrap(), "hi");
    }

    #[test]
    fn test_invalid() {
        // 0xFF never appears in UTF-8, so strict decoding fails
        assert!(String::from_utf8(vec![0xFF]).is_err());
    }

    #[test]
    fn test_lossy() {
        // Lossy decoding replaces the invalid byte with U+FFFD
        let s = String::from_utf8_lossy(&[104, 0xFF, 105]);
        assert_eq!(s, "h\u{FFFD}i");
    }
}
Key Differences
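- **UTF-8 guarantee**: Rust's String/&str guarantee UTF-8 validity; OCaml's string is unchecked bytes.
- **Validation**: Rust requires from_utf8 (returning Result) to go from bytes to string; OCaml's Bytes.to_string is unconditional (it copies but never validates).
- **from_utf8_lossy**: Rust provides a built-in lossy decoder that replaces invalid sequences with U+FFFD. OCaml has no single-call equivalent, so you need Uutf or a manual loop; since OCaml 4.14, String.get_utf_8_uchar makes that loop straightforward.
- **&str as &[u8]**: Rust's str::as_bytes() gives a &[u8] view with no copy; OCaml's String.to_bytes allocates a new Bytes.t.
OCaml Approach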
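OCaml's Bytes.t is a mutable byte sequence, and string is an immutable one. Neither type enforces UTF-8. Validation is opt-in: the standard library has offered String.is_valid_utf_8 since OCaml 4.14, and the Uutf library provides a streaming decoder: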
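(* Bytes to string: copies the bytes but performs no UTF-8 validation *)
let bytes = Bytes.of_string "hi"
let s = Bytes.to_string bytes
(* UTF-8 validation with Uutf; `Malformed carries the offending bytes *)
let is_valid_utf8 s =
  Uutf.String.fold_utf_8
    (fun ok _ d -> match d with `Malformed _ -> false | `Uchar _ -> ok)
    true s
(* OCaml >= 4.14: the standard library validates directly *)
let is_valid_utf8_std s = String.is_valid_utf_8 s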
OCaml makes no UTF-8 guarantees at the string type level — it is the programmer's responsibility.
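OCaml leaves decoding to the programmer, but Rust's std::str::from_utf8 does the bookkeeping. On failure, its Utf8Error reports two things. valid_up_to() gives the length of the valid prefix. error_len() gives the length of the invalid sequence, or None if the input ends in the middle of a character. The hypothetical decode_lossy below rebuilds lossy decoding from those two methods, roughly the "manual implementation" an OCaml programmer would write. In real Rust code you would simply call String::from_utf8_lossy.

```rust
/// Hypothetical re-implementation of lossy UTF-8 decoding, built on
/// std::str::from_utf8 and Utf8Error; in practice use String::from_utf8_lossy.
fn decode_lossy(mut input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    loop {
        match std::str::from_utf8(input) {
            // The rest of the input is valid: append it and stop.
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(e) => {
                // Everything before valid_up_to() is known to be valid UTF-8,
                // so re-validating it here cannot fail.
                let (valid, rest) = input.split_at(e.valid_up_to());
                out.push_str(std::str::from_utf8(valid).expect("prefix was validated"));
                // One replacement character per invalid sequence.
                out.push('\u{FFFD}');
                match e.error_len() {
                    // Skip the invalid sequence and keep decoding.
                    Some(len) => input = &rest[len..],
                    // Input ended in the middle of a multibyte character.
                    None => return out,
                }
            }
        }
    }
}
```

Each invalid sequence collapses to a single U+FFFD. A truncated four-byte sequence such as b"\xF0\x90\x80" therefore produces one replacement character, not three.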
Full Source
#![allow(clippy::all)]
// 481. bytes() and byte-level operations
#[cfg(test)]
mod tests {
    #[test]
    fn test_bytes() {
        // .bytes() yields the raw UTF-8 code units as u8
        assert_eq!("hi".bytes().collect::<Vec<_>>(), vec![104, 105]);
    }

    #[test]
    fn test_from() {
        // Valid UTF-8: strict decoding succeeds
        assert_eq!(String::from_utf8(vec![104, 105]).unwrap(), "hi");
    }

    #[test]
    fn test_invalid() {
        // 0xFF never appears in UTF-8, so strict decoding fails
        assert!(String::from_utf8(vec![0xFF]).is_err());
    }

    #[test]
    fn test_lossy() {
        // Lossy decoding replaces the invalid byte with U+FFFD
        let s = String::from_utf8_lossy(&[104, 0xFF, 105]);
        assert_eq!(s, "h\u{FFFD}i");
    }
}
Exercises
1. Write to_hex(s: &str) -> String that formats each byte as two lowercase hex digits using .bytes() and format!("{:02x}", b).
2. Implement is_valid_utf8(bytes: &[u8]) -> bool using std::str::from_utf8, and write tests for valid ASCII, valid multibyte sequences, and truncated multibyte sequences.
3. Write to_c_str(s: &str) that appends a null byte to produce a C-compatible byte string and treats any embedded null as an error. A bare Vec<u8> cannot report that error, so return something like Result<Vec<u8>, String>.