730 Fundamental

730-small-string-optimization — Small String Optimization

The Problem

Many strings in real applications are short: identifiers, keys, tags, status codes. Yet every non-empty String stores its bytes on the heap, which costs a pointer indirection, an allocator round-trip, and extra cache pressure. Small String Optimization (SSO) stores strings up to a threshold length (here 23 bytes) directly inside the value itself, in this tutorial an enum variant, and allocates only for longer strings. The technique is used by the major C++ standard library implementations of std::string, by Rust's smol_str and compact_str crates, and by many database engines for short column values.
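
To make the cost concrete, the following sketch inspects what a standard String keeps in its handle and how much heap capacity it holds. The helper name `describe` is made up for illustration. The three-word handle size is what current standard library implementations produce, not a documented guarantee.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (handle size in bytes, heap capacity in bytes)
/// for a `String` built from `s`.
pub fn describe(s: &str) -> (usize, usize) {
    let owned = String::from(s);
    // On current implementations the handle is pointer + capacity + length,
    // i.e. three machine words, regardless of how short the text is.
    // The text itself lives in a separate heap buffer of `capacity()` bytes.
    (size_of::<String>(), owned.capacity())
}
```

For example, `describe("")` reports a capacity of 0 because an empty String does not allocate, while `describe("status")` reports at least 6 bytes of heap capacity for a six-byte key.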

🎯 Learning Outcomes

  • Implement an SSO string type as a Rust enum with Inline and Heap variants
  • Store inline bytes in a [u8; 23] array with a separate len: u8 field, giving a 24-byte inline payload
  • Understand how the enum discriminant adds to that payload, so this enum ends up larger than the 24-byte String on 64-bit targets unless the tag is packed into the payload
  • Recognize when to fall back to heap allocation for longer strings
  • See how is_inline() can guide hot-path decisions in query engines
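
The last outcome can be sketched as an equality check that takes a fast path when both keys are inline. This is an illustration under stated assumptions, not the tutorial's code: `keys_equal` and `as_bytes` are hypothetical additions, the type is a trimmed copy of SsoString so the block compiles on its own, and all values are assumed to be built through `new`.

```rust
const INLINE_CAP: usize = 23;

/// Trimmed copy of the tutorial's type, repeated so the sketch compiles alone.
pub enum SsoString {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    Heap(Box<str>),
}

impl SsoString {
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline { buf, len: s.len() as u8 }
        } else {
            SsoString::Heap(s.into())
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SsoString::Inline { .. })
    }

    /// Hypothetical accessor used by the slow path below.
    pub fn as_bytes(&self) -> &[u8] {
        match self {
            SsoString::Inline { buf, len } => &buf[..*len as usize],
            SsoString::Heap(s) => s.as_bytes(),
        }
    }
}

/// Hypothetical hot-path comparison, e.g. for a join or filter operator.
pub fn keys_equal(a: &SsoString, b: &SsoString) -> bool {
    match (a, b) {
        // Both inline: compare the fixed-size buffers directly. Unused bytes
        // are always zero because `new` zero-fills, so no pointer is followed.
        (
            SsoString::Inline { buf: x, len: lx },
            SsoString::Inline { buf: y, len: ly },
        ) => lx == ly && x == y,
        // One inline, one heap: assuming both came from `new`, their lengths
        // fall on opposite sides of INLINE_CAP, so they cannot be equal.
        _ if a.is_inline() != b.is_inline() => false,
        // Both on the heap: fall back to a byte-wise comparison.
        _ => a.as_bytes() == b.as_bytes(),
    }
}
```

Only the last arm dereferences a heap pointer, so a workload dominated by short keys stays within the values it already has in cache.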

Code Example

    #![allow(clippy::all)]
    /// 730: Small String Optimization
    /// Stores ≤23 bytes inline; falls back to `Box<str>` for longer strings.
    
    const INLINE_CAP: usize = 23;
    
    /// An SSO string. The inline payload is 24 bytes; the enum tag comes on top,
    /// so the whole type is larger than `String` (typically 32 bytes on 64-bit).
    #[derive(Debug)]
    enum SsoString {
        Inline { buf: [u8; INLINE_CAP], len: u8 },
        Heap(Box<str>),
    }
    
    impl SsoString {
        pub fn new(s: &str) -> Self {
            if s.len() <= INLINE_CAP {
                let mut buf = [0u8; INLINE_CAP];
                buf[..s.len()].copy_from_slice(s.as_bytes());
                SsoString::Inline {
                    buf,
                    len: s.len() as u8,
                }
            } else {
                SsoString::Heap(s.into())
            }
        }
    
        pub fn as_str(&self) -> &str {
            match self {
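                // `buf[..len]` was copied from a `&str` in `new`, so it is valid UTF-8.
                SsoString::Inline { buf, len } => std::str::from_utf8(&buf[..*len as usize])
                    .expect("inline bytes are valid UTF-8"),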
                SsoString::Heap(s) => s,
            }
        }
    
        pub fn len(&self) -> usize {
            match self {
                SsoString::Inline { len, .. } => *len as usize,
                SsoString::Heap(s) => s.len(),
            }
        }
    
        pub fn is_empty(&self) -> bool {
            self.len() == 0
        }
    
        pub fn is_inline(&self) -> bool {
            matches!(self, SsoString::Inline { .. })
        }
    }
    
    impl std::fmt::Display for SsoString {
        fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
            f.write_str(self.as_str())
        }
    }
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        #[test]
        fn empty_is_inline() {
            let s = SsoString::new("");
            assert!(s.is_inline());
            assert_eq!(s.len(), 0);
            assert_eq!(s.as_str(), "");
        }
    
        #[test]
        fn short_string_inline() {
            let s = SsoString::new("hello");
            assert!(s.is_inline());
            assert_eq!(s.as_str(), "hello");
        }
    
        #[test]
        fn boundary_23_bytes_is_inline() {
            let s23 = "a".repeat(INLINE_CAP);
            let sso = SsoString::new(&s23);
            assert!(sso.is_inline());
            assert_eq!(sso.as_str(), s23);
        }
    
        #[test]
        fn boundary_24_bytes_is_heap() {
            let s24 = "a".repeat(INLINE_CAP + 1);
            let sso = SsoString::new(&s24);
            assert!(!sso.is_inline());
            assert_eq!(sso.as_str(), s24);
        }
    
        #[test]
        fn long_string_heap() {
            let long = "this is a long string that exceeds the inline capacity";
            let sso = SsoString::new(long);
            assert!(!sso.is_inline());
            assert_eq!(sso.as_str(), long);
        }
    }
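
Size claims about an enum like this are easy to check with `std::mem::size_of`. The sketch below is an experiment, not part of the original source: `Sso23` mirrors the layout used above, and `Sso22` is a hypothetical variant with one byte less of inline capacity. Rust does not guarantee the layout of default-repr enums, so treat the results as what a given compiler and target produce (typically 32 and 24 bytes on 64-bit), not as a contract.

```rust
use std::mem::size_of;

/// Same shape as the tutorial's SsoString: a 24-byte inline payload.
#[allow(dead_code)]
pub enum Sso23 {
    Inline { buf: [u8; 23], len: u8 },
    Heap(Box<str>),
}

/// Hypothetical variant: a 23-byte inline payload, leaving one byte for the tag.
#[allow(dead_code)]
pub enum Sso22 {
    Inline { buf: [u8; 22], len: u8 },
    Heap(Box<str>),
}

/// Returns the sizes in bytes of (String, Sso23, Sso22) on the current target.
pub fn sizes() -> (usize, usize, usize) {
    (size_of::<String>(), size_of::<Sso23>(), size_of::<Sso22>())
}
```

Every bit pattern of a 24-byte `[u8; 23]` plus `u8` payload is a valid value, so the compiler has nowhere inside it to record which variant is active. It adds a separate tag, and alignment padding rounds the total up. Shrinking the buffer by one byte, or hand-packing the tag into the payload as production crates tend to do, is what brings the type back to the size of String.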

    Key Differences

  • Allocation model: Rust SSO avoids the heap entirely for short strings; OCaml relies on the GC's minor heap to make short allocations cheap rather than avoiding them.
  • Size control: Rust lets you choose the inline capacity and check the resulting size with std::mem::size_of; OCaml's string size is determined by the runtime block header format.
  • Mutability: Rust's SsoString is immutable after construction; OCaml's Bytes.t is a mutable byte array, distinct from the immutable string type.
  • Crate ecosystem: Rust has smol_str, compact_str, and inline-str crates for production SSO; OCaml has no widely adopted equivalent.

    OCaml Approach

    OCaml's string type is heap-allocated via the GC but represented as a flat byte array with no separate length word overhead (length is stored in the GC block header). For very short strings OCaml's minor GC makes allocation nearly free. The Bytes module provides mutable string buffers. There is no standard SSO type, but libraries like Base use compact representations for identifiers.


    Exercises

  • Extend SsoString to support push_str that transitions Inline to Heap when the result exceeds 23 bytes.
  • Add a concat associated function that combines two SsoString values without allocating when the combined result fits inline.
  • Benchmark SsoString::new against String::from for strings of length 1, 12, 23, and 50 bytes. Plot the crossover point.

    Open Source Repos