730-small-string-optimization — Small String Optimization
Tutorial
The Problem
Most strings in real applications are short: identifiers, keys, tags, status codes. Yet every non-empty String heap-allocates, adding a pointer indirection, an allocator round-trip, and cache pressure. Small String Optimization (SSO) stores strings up to a threshold length (here 23 bytes) directly inside the value itself (here, inside an enum variant), so short strings need no heap allocation at all. The technique is used by C++ std::string implementations, by Rust's smol_str and compact_str crates, and by many database engines for short column values.
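The cost described above can be observed directly. The following is a minimal sketch using only the standard library; the printed address varies from run to run, and the three-word size reflects how String is implemented today rather than a documented guarantee.

```rust
use std::mem::size_of;

fn main() {
    // A String handle is three machine words: pointer, capacity, length
    // (24 bytes on 64-bit targets).
    println!("size_of::<String>() = {}", size_of::<String>());

    // An empty String has not touched the allocator yet.
    let empty = String::new();
    assert_eq!(empty.capacity(), 0);

    // Any non-empty String owns a heap buffer, even for two bytes.
    let short = String::from("ok");
    assert!(short.capacity() >= short.len());
    println!("\"ok\" is stored on the heap at {:p}", short.as_ptr());
}
```

Reading the two bytes of "ok" therefore means following a pointer away from the 24-byte handle, which is exactly the indirection SSO removes.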
🎯 Learning Outcomes
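- Modeling an SSO string as an enum with Inline and Heap variants
- Packing a [u8; 23] array with a separate len: u8 field so the inline payload totals 24 bytes
- Comparing the resulting layout with the standard String
- Seeing how is_inline() can guide hot-path decisions in query engines
Code Example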
#![allow(clippy::all)]
/// 730: Small String Optimization
/// Stores ≤23 bytes inline; falls back to `Box<str>` for longer strings.
const INLINE_CAP: usize = 23;

/// An SSO string. The inline payload is 24 bytes (23 data bytes + 1 length
/// byte), the size of a `String` on 64-bit targets. The enum also needs a
/// discriminant, so `size_of::<SsoString>()` is larger than 24 bytes
/// (typically 32 on 64-bit targets).
#[derive(Debug)]
enum SsoString {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    Heap(Box<str>),
}

impl SsoString {
    /// Copies `s` into the inline buffer if it fits, otherwise onto the heap.
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline {
                buf,
                // Cannot truncate: s.len() <= 23 here.
                len: s.len() as u8,
            }
        } else {
            SsoString::Heap(s.into())
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            // The first `len` bytes were copied from a valid &str in `new`,
            // so this unwrap cannot fail.
            SsoString::Inline { buf, len } => std::str::from_utf8(&buf[..*len as usize]).unwrap(),
            SsoString::Heap(s) => s,
        }
    }

    pub fn len(&self) -> usize {
        match self {
            SsoString::Inline { len, .. } => *len as usize,
            SsoString::Heap(s) => s.len(),
        }
    }

    pub fn is_empty(&self) -> bool {
        self.len() == 0
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SsoString::Inline { .. })
    }
}

impl std::fmt::Display for SsoString {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str(self.as_str())
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty_is_inline() {
        let s = SsoString::new("");
        assert!(s.is_inline());
        assert_eq!(s.len(), 0);
        assert_eq!(s.as_str(), "");
    }

    #[test]
    fn short_string_inline() {
        let s = SsoString::new("hello");
        assert!(s.is_inline());
        assert_eq!(s.as_str(), "hello");
    }

    #[test]
    fn boundary_23_bytes_is_inline() {
        let s23 = "a".repeat(INLINE_CAP);
        let sso = SsoString::new(&s23);
        assert!(sso.is_inline());
        assert_eq!(sso.as_str(), s23);
    }

    #[test]
    fn boundary_24_bytes_is_heap() {
        let s24 = "a".repeat(INLINE_CAP + 1);
        let sso = SsoString::new(&s24);
        assert!(!sso.is_inline());
        assert_eq!(sso.as_str(), s24);
    }

    #[test]
    fn long_string_heap() {
        let long = "this is a long string that exceeds the inline capacity";
        let sso = SsoString::new(long);
        assert!(!sso.is_inline());
        assert_eq!(sso.as_str(), long);
    }
}
Key Differences
- Rust fixes the SsoString layout in the type definition; OCaml's string size is determined by the runtime block header format.
- This SsoString is immutable after construction; OCaml's Bytes.t is a mutable byte array, distinct from the immutable string type.
- Rust has the smol_str, compact_str, and inline-str crates for production SSO; OCaml has no widely adopted equivalent.
OCaml Approach
OCaml's string type is allocated on the GC-managed heap and represented as a flat byte array with no separate length field: the length is derived from the GC block header, which records the block size in words, together with a padding byte at the end of the block. For very short strings, allocation in the minor heap is nearly free, since it amounts to bumping a pointer. The Bytes module provides mutable byte buffers. There is no standard SSO type, but libraries like Base use compact representations for identifiers.
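For comparison with the OCaml representation described above, the Rust layout can be inspected with std::mem::size_of. The sketch below repeats the enum shape from the code example. Every bit pattern of the 24-byte Inline payload is valid, so the compiler has to store the discriminant separately, and the printed size is expected to exceed 24 bytes (32 where Box<str> forces 8-byte alignment). Exact enum layout is not guaranteed by the language, so treat the numbers as observations, not a contract.

```rust
use std::mem::{align_of, size_of};

const INLINE_CAP: usize = 23;

// Same shape as the tutorial's type; only the layout matters here,
// so the variants are never constructed.
#[allow(dead_code)]
enum SsoString {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    Heap(Box<str>),
}

fn main() {
    // Box<str> is a fat pointer: data pointer plus byte length.
    println!("Box<str>:  {} bytes", size_of::<Box<str>>());
    println!("String:    {} bytes", size_of::<String>());
    println!(
        "SsoString: {} bytes, align {}",
        size_of::<SsoString>(),
        align_of::<SsoString>()
    );

    // 23 data bytes + 1 length byte leave no spare bits for the tag.
    assert!(size_of::<SsoString>() > INLINE_CAP + 1);
}
```

Production crates such as compact_str reach 24 bytes by encoding the inline/heap distinction inside the last byte instead of relying on a separate enum tag.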
Exercises
1. Extend SsoString to support push_str that transitions Inline to Heap when the result exceeds 23 bytes.
2. Add a concat associated function that combines two SsoString values without allocating when the combined result fits inline.
3. Benchmark SsoString::new against String::from for strings of length 1, 12, 23, and 50 bytes. Plot the crossover point.
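As a possible starting point for the push_str exercise, here is a minimal sketch. The push_str method is hypothetical (it is not part of the original listing), and the type is repeated in trimmed form so the block compiles on its own. It appends in place while the result fits in 23 bytes and otherwise rebuilds the value on the heap.

```rust
const INLINE_CAP: usize = 23;

#[derive(Debug)]
enum SsoString {
    Inline { buf: [u8; INLINE_CAP], len: u8 },
    Heap(Box<str>),
}

impl SsoString {
    fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SsoString::Inline { buf, len: s.len() as u8 }
        } else {
            SsoString::Heap(s.into())
        }
    }

    fn as_str(&self) -> &str {
        match self {
            SsoString::Inline { buf, len } => {
                std::str::from_utf8(&buf[..*len as usize]).expect("inline bytes are valid UTF-8")
            }
            SsoString::Heap(s) => s,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, SsoString::Inline { .. })
    }

    /// Hypothetical exercise solution: append `extra`, staying inline
    /// when the combined length still fits.
    fn push_str(&mut self, extra: &str) {
        if let SsoString::Inline { buf, len } = &mut *self {
            let old = *len as usize;
            let new_len = old + extra.len();
            if new_len <= INLINE_CAP {
                // Valid UTF-8 appended to valid UTF-8 stays valid.
                buf[old..new_len].copy_from_slice(extra.as_bytes());
                *len = new_len as u8;
                return;
            }
        }
        // Already on the heap, or about to outgrow the inline buffer:
        // build the joined string and store it as a Box<str>.
        let mut joined = String::with_capacity(self.as_str().len() + extra.len());
        joined.push_str(self.as_str());
        joined.push_str(extra);
        *self = SsoString::Heap(joined.into_boxed_str());
    }
}

fn main() {
    let mut s = SsoString::new("hello");
    s.push_str(", world"); // 12 bytes in total, still inline
    assert!(s.is_inline());

    s.push_str(" and everyone else"); // 30 bytes in total, moves to the heap
    assert!(!s.is_inline());
    assert_eq!(s.as_str(), "hello, world and everyone else");
    println!("{}", s.as_str());
}
```

Because the heap variant is a Box<str> with no spare capacity, every push onto a heap string reallocates in this sketch; swapping the variant to hold a String would trade 8 more bytes of handle for amortized growth.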