String Owning References
Difficulty level: Fundamental. Key concepts covered: Functional Programming.
Tutorial
The Problem
A common parser pattern is to own an input string and cache the positions of tokens within it: { source: String, tokens: Vec<&str> }. This is a self-referential struct: tokens would borrow from source in the same struct, and safe Rust has no lifetime that can express that borrow. The safe workaround stores (usize, usize) byte offsets instead of &str references and reconstructs each slice as &self.source[start..end] when needed. Rust parsing crates such as logos and nom take a similar route: they work with byte spans or slices borrowed from caller-owned input rather than self-referential structs.
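The offset workaround can be sketched in a few lines. This is a minimal illustration that assumes whitespace-separated tokens; TokenCache and its methods are hypothetical names chosen for the example, not part of the original source. Because the struct holds only a String and plain integers, it carries no lifetime parameter and can be returned or moved like any other owned value.

```rust
/// Hypothetical illustration: owns the input and records token spans as byte offsets.
pub struct TokenCache {
    source: String,
    spans: Vec<(usize, usize)>, // (start, end) byte offsets into `source`
}

impl TokenCache {
    pub fn new(input: &str) -> Self {
        let source = input.to_string();
        let mut spans = Vec::new();
        let mut start = None;
        for (i, c) in source.char_indices() {
            match (c.is_whitespace(), start) {
                // First non-whitespace character opens a token.
                (false, None) => start = Some(i),
                // Whitespace after an open token closes it.
                (true, Some(s)) => {
                    spans.push((s, i));
                    start = None;
                }
                _ => {}
            }
        }
        // A token that runs to the end of the input is closed here.
        if let Some(s) = start {
            spans.push((s, source.len()));
        }
        Self { source, spans }
    }

    /// Rebuilds each `&str` on demand; the slices borrow from `self`.
    pub fn tokens(&self) -> impl Iterator<Item = &str> + '_ {
        self.spans.iter().map(move |&(s, e)| &self.source[s..e])
    }
}

/// Returning the struct by value moves it; the stored offsets stay meaningful.
pub fn build() -> TokenCache {
    TokenCache::new("let x = 42")
}
```

Calling build().tokens() yields the four tokens let, x, = and 42, each a slice of the owned source rather than a copy.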
🎯 Learning Outcomes
- How to store (usize, usize) byte offsets as a safe alternative to cached &str
- How to reconstruct &str slices from stored offsets on demand
- Cow-based tri-variant string ownership (Static, Owned, Borrowed)
- When Pin<Box<T>> is needed for genuinely self-referential data
Code Example
#![allow(clippy::all)]
//! # String Owning References: Self-Referential Patterns
//!
//! Patterns for owning data while referencing into it.

/// Owned string with cached word boundaries.
pub struct ParsedString {
    source: String,
    words: Vec<(usize, usize)>, // (start, end) byte offsets into source
}

impl ParsedString {
    pub fn new(s: &str) -> Self {
        let source = s.to_string();
        // Record the (start, end) byte offsets of each alphanumeric run.
        let mut words = Vec::new();
        let mut start = None;
        for (i, c) in source.char_indices() {
            if c.is_alphanumeric() {
                if start.is_none() {
                    start = Some(i);
                }
            } else if let Some(word_start) = start {
                words.push((word_start, i));
                start = None;
            }
        }
        // Close a word that runs to the end of the input.
        if let Some(word_start) = start {
            words.push((word_start, source.len()));
        }
        Self { source, words }
    }

    /// Rebuilds the slice for word `index` from its stored offsets.
    pub fn get_word(&self, index: usize) -> Option<&str> {
        self.words
            .get(index)
            .map(|(start, end)| &self.source[*start..*end])
    }

    pub fn word_count(&self) -> usize {
        self.words.len()
    }

    pub fn source(&self) -> &str {
        &self.source
    }
}

// Cow-based approach
use std::borrow::Cow;

/// A string that is static, owned, or borrowed for the lifetime 'a.
pub enum StringOrStatic<'a> {
    Static(&'static str),
    Owned(String),
    Borrowed(&'a str),
}

impl<'a> StringOrStatic<'a> {
    pub fn as_str(&self) -> &str {
        match self {
            Self::Static(s) => s,
            Self::Owned(s) => s,
            Self::Borrowed(s) => s,
        }
    }

    /// Converts to an owned String, copying unless already owned.
    pub fn into_owned(self) -> String {
        match self {
            Self::Static(s) => s.to_string(),
            Self::Owned(s) => s,
            Self::Borrowed(s) => s.to_string(),
        }
    }
}

/// Using Cow for zero-copy when possible: borrows the input when it has no
/// lowercase characters, otherwise allocates an uppercased copy.
pub fn maybe_uppercase(s: &str) -> Cow<'_, str> {
    if s.chars().all(|c| !c.is_lowercase()) {
        Cow::Borrowed(s)
    } else {
        Cow::Owned(s.to_uppercase())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_parsed_string() {
        let ps = ParsedString::new("hello world rust");
        assert_eq!(ps.word_count(), 3);
        assert_eq!(ps.get_word(0), Some("hello"));
        assert_eq!(ps.get_word(1), Some("world"));
        assert_eq!(ps.get_word(2), Some("rust"));
        assert_eq!(ps.get_word(3), None);
    }

    #[test]
    fn test_string_or_static() {
        let s = StringOrStatic::Static("hello");
        assert_eq!(s.as_str(), "hello");
        let owned = StringOrStatic::Owned(String::from("world"));
        assert_eq!(owned.as_str(), "world");
    }

    #[test]
    fn test_cow_no_alloc() {
        let s = "ALREADY UPPER";
        let result = maybe_uppercase(s);
        assert!(matches!(result, Cow::Borrowed(_)));
    }

    #[test]
    fn test_cow_with_alloc() {
        let s = "needs uppercase";
        let result = maybe_uppercase(s);
        assert!(matches!(result, Cow::Owned(_)));
        assert_eq!(&*result, "NEEDS UPPERCASE");
    }
}
Key Differences
1. **Self-referential structs**: Rust cannot build them without unsafe; OCaml allows them freely because the GC manages all lifetimes.
2. **Zero-copy slices**: Rust rebuilds a &str from stored offsets with no allocation; OCaml's String.sub always copies.
3. **Pin**: Rust's Pin<Box<T>> prevents a self-referential struct from moving in memory (which would invalidate internal pointers), provided the type is !Unpin; OCaml moves objects during GC compaction but updates all pointers automatically.
4. **Cow lifetime**: Rust's Cow<'a, str> carries a lifetime parameter tying the borrowed variant to its source; OCaml has no equivalent because every string lives for as long as the GC can reach it.
OCaml Approach
OCaml's GC makes self-referential structures straightforward: the GC follows all pointers, so a record can hold both the source string and values derived from it. The offset-based version looks like this:
type parsed = {
  source: string;
  words: (int * int) list; (* (start, length) pairs; or store the strings directly *)
}

let get_word p i =
  let (start, len) = List.nth p.words i in
  String.sub p.source start len (* allocates: OCaml has no slice type *)
OCaml's lack of a zero-copy slice type means get_word always allocates with String.sub; Rust's approach is zero-copy.
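The zero-copy claim on the Rust side can be checked directly: a slice rebuilt from offsets should point into the buffer owned by the String rather than into a fresh allocation. The sketch below assumes a pared-down stand-in for ParsedString; the Words type and the points_into helper are hypothetical names introduced for illustration.

```rust
/// Hypothetical stand-in for `ParsedString`: owned text plus (start, end) byte offsets.
pub struct Words {
    source: String,
    spans: Vec<(usize, usize)>,
}

impl Words {
    pub fn new(source: String, spans: Vec<(usize, usize)>) -> Self {
        Self { source, spans }
    }

    /// Rebuilds the slice on demand. Returns `None` for an unknown index or
    /// for offsets that do not form a valid range in `source`.
    pub fn get(&self, index: usize) -> Option<&str> {
        let &(start, end) = self.spans.get(index)?;
        self.source.get(start..end)
    }

    /// True if `slice` lies entirely inside the buffer owned by `self.source`.
    pub fn points_into(&self, slice: &str) -> bool {
        let base = self.source.as_ptr() as usize;
        let ptr = slice.as_ptr() as usize;
        ptr >= base && ptr + slice.len() <= base + self.source.len()
    }
}
```

For a slice returned by get, points_into reports true; for a copy made with to_string it reports false, which is the allocation String.sub performs on every call.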
Deep Comparison
String Owning Ref: Comparison
See src/lib.rs for the Rust implementation.
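For contrast with the offset approach, here is a minimal sketch of what a genuinely self-referential struct behind Pin<Box<T>> might look like. SelfRef and via_pointer are hypothetical names, and the construction follows the general pattern of pinning first and filling in the internal pointer afterwards; it needs unsafe, which the offset approach avoids entirely.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;
use std::ptr::NonNull;

/// Hypothetical self-referential type: `source_ptr` points at the `source`
/// field of the same struct, so moving the struct would leave it dangling.
pub struct SelfRef {
    source: String,
    source_ptr: NonNull<String>,
    _pin: PhantomPinned, // makes the type !Unpin
}

impl SelfRef {
    pub fn new(text: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(SelfRef {
            source: text.to_string(),
            // Placeholder until the value has its final heap address.
            source_ptr: NonNull::dangling(),
            _pin: PhantomPinned,
        });
        let ptr = NonNull::from(&boxed.source);
        // SAFETY: only a field is overwritten; the pinned value is not moved.
        unsafe {
            boxed.as_mut().get_unchecked_mut().source_ptr = ptr;
        }
        boxed
    }

    /// Reads `source` through the internal pointer.
    pub fn via_pointer(&self) -> &str {
        // SAFETY: `source_ptr` was set to the address of `self.source` after
        // pinning, and a !Unpin value cannot be moved out of its pin in safe code.
        unsafe { self.source_ptr.as_ref().as_str() }
    }
}
```

Moving the Pin<Box<SelfRef>> handle itself is fine, since only the box pointer moves while the heap allocation stays where it is.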
Exercises
1. Implement a LineIndex { source: String, line_starts: Vec<usize> } that precomputes newline positions and provides fn line(&self, n: usize) -> &str.
2. Build a Tokenizer that stores the source String and a Vec<(TokenKind, usize, usize)> for token type, start, and end byte offsets. Implement an iterator that yields (TokenKind, &str) by slicing on demand.
3. **ouroboros**: Use the ouroboros crate to create a SelfRefParsed struct that safely stores both the String and Vec<&str> references, and compare ergonomics against the offset approach.
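As one possible starting point for the first exercise, the sketch below precomputes the byte offset at which each line begins. It makes two assumptions that go beyond the exercise text: line returns Option<&str> instead of &str so an out-of-range index does not panic, and only '\n' is treated as a line terminator (a preceding '\r' stays in the returned slice).

```rust
/// Sketch for the first exercise; field names follow the exercise text.
pub struct LineIndex {
    source: String,
    line_starts: Vec<usize>, // byte offset of the first character of each line
}

impl LineIndex {
    pub fn new(source: String) -> Self {
        // Line 0 starts at offset 0; every later line starts just after a '\n'.
        let mut line_starts = vec![0];
        line_starts.extend(source.match_indices('\n').map(|(i, _)| i + 1));
        Self { source, line_starts }
    }

    /// Returns line `n` (0-based) without its trailing '\n', or `None` if out of range.
    pub fn line(&self, n: usize) -> Option<&str> {
        let start = *self.line_starts.get(n)?;
        // The line ends one byte before the next line starts, or at end of input.
        let end = self
            .line_starts
            .get(n + 1)
            .map_or(self.source.len(), |&next| next - 1);
        Some(&self.source[start..end])
    }
}
```

With this layout, input that ends in a newline reports a final empty line; whether that is desirable is a design choice left open by the exercise.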