488 Fundamental

String Owning References

Functional Programming


Tutorial

The Problem

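A common parser pattern is to own an input string and cache slices of the tokens found in it: { source: String, tokens: Vec<&str> }. This is a self-referential struct: tokens would borrow from source in the same struct, and safe Rust has no lifetime that can express "borrowed from a sibling field", so the borrow checker rejects it. The safe workaround stores (usize, usize) byte offsets instead of &str references and reconstructs slices with &self.source[start..end] when needed. Many Rust lexers take the same approach; logos, for example, exposes each token's position as a byte-range span. Parser-combinator libraries such as nom sidestep the problem differently: they never own the input and return &str slices borrowed from the caller's string.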
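A minimal sketch of the offset workaround, assuming std::ops::Range<usize> spans in place of (usize, usize) tuples. The Spanned type and its whitespace-only tokenizer are hypothetical illustrations, not part of the example's code. Slices are rebuilt on demand with str::get, which returns None instead of panicking if a span is out of bounds or does not fall on UTF-8 character boundaries.

```rust
use std::ops::Range;

/// Hypothetical token store: owns the source and records each token's
/// byte span instead of borrowing `&str` from it.
pub struct Spanned {
    source: String,
    spans: Vec<Range<usize>>,
}

impl Spanned {
    /// Splits on ASCII whitespace. UTF-8 continuation bytes are never ASCII
    /// whitespace, so every recorded span starts and ends on a char boundary.
    pub fn new(source: String) -> Self {
        let mut spans = Vec::new();
        let mut start = None;
        for (i, b) in source.bytes().enumerate() {
            if b.is_ascii_whitespace() {
                if let Some(s) = start.take() {
                    spans.push(s..i);
                }
            } else if start.is_none() {
                start = Some(i);
            }
        }
        // A token that runs to the end of the input has no trailing whitespace.
        if let Some(s) = start {
            spans.push(s..source.len());
        }
        Spanned { source, spans }
    }

    /// Reconstructs token `n` as a borrowed slice; no allocation.
    pub fn token(&self, n: usize) -> Option<&str> {
        let span = self.spans.get(n)?.clone();
        self.source.get(span)
    }

    /// Yields every token as a slice of the owned source.
    pub fn tokens(&self) -> impl Iterator<Item = &str> + '_ {
        self.spans.iter().filter_map(|r| self.source.get(r.clone()))
    }
}
```

Storing Range<usize> rather than a tuple lets the span be passed straight to str::get or used for indexing, at the cost of a clone() because Range is not Copy.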

🎯 Learning Outcomes

  • Understand why Rust forbids self-referential structs with borrowed fields
  • Store byte offset pairs (usize, usize) as a safe alternative to cached &str
  • Reconstruct &str slices from stored offsets on demand
  • Understand three-variant string ownership (Static, Owned, Borrowed) and how it relates to the two-variant std::borrow::Cow
  • Recognise when Pin<Box<T>> is needed for genuinely self-referential data

Code Example

    #![allow(clippy::all)]
    //! # String Owning References: Self-Referential Patterns
    //!
    //! Patterns for owning data while referencing into it.

    use std::borrow::Cow;

    /// Owned string with a cached parse result.
    ///
    /// Storing `Vec<&str>` here would borrow from `source` inside the same
    /// struct, so each word is kept as a `(start, end)` byte-offset pair.
    pub struct ParsedString {
        source: String,
        words: Vec<(usize, usize)>, // (start, end) byte offsets into `source`
    }

    impl ParsedString {
        pub fn new(s: &str) -> Self {
            let source = s.to_string();

            // Scan once, recording the byte range of each alphanumeric run.
            // `char_indices` yields byte offsets that always fall on char
            // boundaries, so the stored ranges are safe to slice with later.
            let mut words = Vec::new();
            let mut start = None;

            for (i, c) in source.char_indices() {
                if c.is_alphanumeric() {
                    if start.is_none() {
                        start = Some(i);
                    }
                } else if let Some(s) = start {
                    words.push((s, i));
                    start = None;
                }
            }
            // A word that runs to the end of the input has no trailing separator.
            if let Some(s) = start {
                words.push((s, source.len()));
            }

            Self { source, words }
        }

        /// Reconstructs the `index`-th word as a slice of `source` (no allocation).
        pub fn get_word(&self, index: usize) -> Option<&str> {
            self.words
                .get(index)
                .map(|&(start, end)| &self.source[start..end])
        }

        pub fn word_count(&self) -> usize {
            self.words.len()
        }

        pub fn source(&self) -> &str {
            &self.source
        }
    }

    /// Hand-rolled three-variant ownership enum, similar to `Cow<'a, str>`
    /// but with a separate variant for `'static` data.
    pub enum StringOrStatic<'a> {
        Static(&'static str),
        Owned(String),
        Borrowed(&'a str),
    }

    impl<'a> StringOrStatic<'a> {
        pub fn as_str(&self) -> &str {
            match self {
                Self::Static(s) => s,
                Self::Owned(s) => s,
                Self::Borrowed(s) => s,
            }
        }

        pub fn into_owned(self) -> String {
            match self {
                Self::Static(s) => s.to_string(),
                Self::Owned(s) => s,
                Self::Borrowed(s) => s.to_string(),
            }
        }
    }

    /// Uses `Cow` to avoid allocating when the input has no lowercase letters.
    pub fn maybe_uppercase(s: &str) -> Cow<'_, str> {
        if s.chars().all(|c| !c.is_lowercase()) {
            Cow::Borrowed(s)
        } else {
            Cow::Owned(s.to_uppercase())
        }
    }

    #[cfg(test)]
    mod tests {
        use super::*;

        #[test]
        fn test_parsed_string() {
            let ps = ParsedString::new("hello world rust");
            assert_eq!(ps.word_count(), 3);
            assert_eq!(ps.get_word(0), Some("hello"));
            assert_eq!(ps.get_word(1), Some("world"));
            assert_eq!(ps.get_word(2), Some("rust"));
            assert_eq!(ps.get_word(3), None);
        }

        #[test]
        fn test_string_or_static() {
            let s = StringOrStatic::Static("hello");
            assert_eq!(s.as_str(), "hello");

            let owned = StringOrStatic::Owned(String::from("world"));
            assert_eq!(owned.as_str(), "world");
        }

        #[test]
        fn test_cow_no_alloc() {
            let s = "ALREADY UPPER";
            let result = maybe_uppercase(s);
            assert!(matches!(result, Cow::Borrowed(_)));
        }

        #[test]
        fn test_cow_with_alloc() {
            let s = "needs uppercase";
            let result = maybe_uppercase(s);
            assert!(matches!(result, Cow::Owned(_)));
            assert_eq!(&*result, "NEEDS UPPERCASE");
        }
    }
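StringOrStatic keeps 'static data in its own variant, while std::borrow::Cow has only Borrowed and Owned. A minimal sketch of how the three variants could collapse into Cow's two, assuming a From conversion that is not part of the example above (the enum is redeclared so the block compiles on its own). Because a &'static str is also a valid &'a str, Static maps to Cow::Borrowed.

```rust
use std::borrow::Cow;

// Redeclared here so the sketch is self-contained.
pub enum StringOrStatic<'a> {
    Static(&'static str),
    Owned(String),
    Borrowed(&'a str),
}

// Hypothetical conversion: three ownership variants folded into Cow's two.
impl<'a> From<StringOrStatic<'a>> for Cow<'a, str> {
    fn from(s: StringOrStatic<'a>) -> Self {
        match s {
            // 'static outlives 'a, so the reference shortens without copying.
            StringOrStatic::Static(s) => Cow::Borrowed(s),
            StringOrStatic::Borrowed(s) => Cow::Borrowed(s),
            // Ownership of the String moves into the Cow; no reallocation.
            StringOrStatic::Owned(s) => Cow::Owned(s),
        }
    }
}
```

The trade-off is that Cow loses the information that a borrowed string happens to be 'static, which is what the separate Static variant preserves.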

    Key Differences

  • Self-referential structs: safe Rust rejects them (the borrow checker cannot express a field that borrows from a sibling field), so they need unsafe code or a helper crate such as ouroboros; OCaml allows them freely because the GC keeps every reachable value alive.
  • Zero-copy slices: Rust's offset-based approach returns &str with no allocation; OCaml's String.sub always copies.
  • Pin: Rust's Pin<Box<T>> guarantees that a !Unpin value will not be moved out of its heap allocation, which would invalidate pointers into it; the OCaml GC does move objects (during minor collections and compaction) but updates every pointer to them automatically.
  • Cow lifetime: Rust's Cow<'a, str> carries a lifetime parameter tying the Borrowed variant to its source; OCaml has no equivalent, since every string lives as long as the GC can reach it.
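A minimal sketch of genuinely self-referential data behind Pin<Box<T>>, assuming a hypothetical SelfRef type whose raw pointer targets its own source field. Note that ParsedString does not need this: moving a String moves only its (pointer, length, capacity) header, not the heap buffer. Pin matters when a pointer targets the struct's own fields, as here. This is an illustration of the mechanism, not production code; crates such as ouroboros generate equivalent code behind a safe API.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

/// Hypothetical self-referential type: `source_ptr` points at this struct's
/// own `source` field, so moving the struct would leave it dangling.
/// `PhantomPinned` makes the type `!Unpin`: once pinned, safe code can
/// neither move it out nor obtain `&mut SelfRef`.
pub struct SelfRef {
    source: String,
    source_ptr: *const String,
    _pin: PhantomPinned,
}

impl SelfRef {
    pub fn new(text: &str) -> Pin<Box<Self>> {
        let mut boxed = Box::pin(SelfRef {
            source: text.to_string(),
            source_ptr: std::ptr::null(),
            _pin: PhantomPinned,
        });
        // SAFETY: we only write one field through the `&mut`; the value is
        // never moved out of its pinned heap allocation.
        unsafe {
            let this = boxed.as_mut().get_unchecked_mut();
            this.source_ptr = std::ptr::addr_of!(this.source);
        }
        boxed
    }

    /// Reads `source` through the stored self-pointer.
    pub fn via_ptr<'a>(self: Pin<&'a Self>) -> &'a str {
        // SAFETY: `source_ptr` was set in `new` to point at `self.source`,
        // which cannot move while the value stays pinned.
        unsafe { (*self.source_ptr).as_str() }
    }

    /// Reads `source` directly, for comparison.
    pub fn direct<'a>(self: Pin<&'a Self>) -> &'a str {
        self.get_ref().source.as_str()
    }
}
```

Moving the Pin<Box<SelfRef>> itself only moves the box pointer, so the stored address stays valid after the move.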
    OCaml Approach

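    OCaml's GC makes self-referential structures straightforward: it keeps every value it can reach alive, so a record can hold the source string alongside data derived from it. OCaml has no slice type, though, so the derived words are stored either as offsets or as copied substrings: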

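    type parsed = {
      source: string;
      words: (int * int) list;  (* (start, length) pairs; could also store copied substrings *)
    }

    (* Returns an option, mirroring Rust's get_word. *)
    let get_word p i =
      match List.nth_opt p.words i with
      | Some (start, len) -> Some (String.sub p.source start len)  (* allocates: no slice type *)
      | None -> None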
    

    OCaml's lack of a zero-copy slice type means get_word always allocates with String.sub; Rust's approach is zero-copy.
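The zero-copy claim can be checked directly. A minimal sketch, assuming two hypothetical helpers: word_at slices a word out of its source from stored offsets (as ParsedString::get_word does), and is_view_into tests whether a &str starts inside another string's byte buffer. Copying with to_string, the Rust analogue of String.sub, produces a separate allocation.

```rust
/// Hypothetical helper: true if `part` starts inside `whole`'s byte buffer,
/// meaning it is a borrowed view rather than a separate allocation.
fn is_view_into(whole: &str, part: &str) -> bool {
    whole.as_bytes().as_ptr_range().contains(&part.as_ptr())
}

/// Hypothetical helper: rebuilds a word from stored byte offsets, the way
/// ParsedString::get_word does; no bytes are copied.
fn word_at(source: &str, (start, end): (usize, usize)) -> &str {
    &source[start..end]
}
```

A slice produced by word_at passes the check, while its to_string() copy does not, because two live allocations never overlap.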



    Exercises

  • Line-number tracker: Build LineIndex { source: String, line_starts: Vec<usize> } that precomputes newline positions and provides fn line(&self, n: usize) -> &str.
  • Token stream: Build Tokenizer that stores the source String and a Vec<(TokenKind, usize, usize)> for token type, start, and end byte offsets. Implement an iterator that yields (TokenKind, &str) by slicing on demand.
  • Genuine self-reference with ouroboros: Use the ouroboros crate to create a SelfRefParsed struct that safely stores both the String and a Vec<&str> borrowing from it, and compare its ergonomics against the offset approach.