ffi string conversion
Tutorial
The Problem
This example covers one aspect of Rust's unsafe programming model: converting strings across the FFI boundary, between Rust's &str / String and C's null-terminated char*. The topic matters for systems programming (OS components, device drivers, game engines) and for any code that must interact with C libraries. Rust's unsafe system is designed to confine unsafety to small, auditable regions while maintaining safety in the surrounding code.
🎯 Learning Outcomes
Code Example
use std::ffi::{CStr, CString};
// Rust → C: allocate a heap-owned, null-terminated buffer.
fn rust_to_cstring(s: &str) -> Result<CString, std::ffi::NulError> {
    CString::new(s)
}
// C → Rust: borrow the C buffer as a &CStr, then validate UTF-8.
unsafe fn ptr_to_str<'a>(ptr: *const std::os::raw::c_char) -> &'a str {
    CStr::from_ptr(ptr).to_str().expect("not valid UTF-8")
}
Key Differences
unsafe for these operations; OCaml achieves safety through the GC and type system without explicit unsafe regions.
extern "C"; OCaml uses ctypes which wraps C types in OCaml values.
#[repr(C)], custom allocators); OCaml's GC manages memory layout automatically.
OCaml Approach
OCaml's GC and type system eliminate most of the need for these unsafe operations. The equivalent functionality typically uses:
ctypes library for external function calls
Bigarray for controlled raw memory access
Bytes.t for mutable byte sequences
OCaml programs rarely need operations equivalent to these Rust unsafe patterns.
Full Source
#![allow(clippy::all)]
//! 712: String / CString / CStr Conversion for FFI
//!
//! Rust strings (`&str`, `String`) are UTF-8 and carry an explicit length
//! instead of a terminator.
//! C strings (`char*`) are null-terminated and encoding-agnostic.
//! `CString` and `CStr` bridge these two worlds without leaking memory
//! or invoking undefined behaviour.
//!
//! Two directions:
//!   Rust → C: `CString::new(s)`, heap-allocated, null-terminated, owned.
//!   C → Rust: `CStr::from_ptr(ptr)`, borrows the C buffer, zero-copy.
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;
// ── Rust → C direction ────────────────────────────────────────────────────
/// Convert a Rust `&str` into a heap-allocated, null-terminated `CString`.
///
/// Returns `Err` if `s` contains an interior NUL byte, which would silently
/// truncate the string from C's perspective.
pub fn rust_to_cstring(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}
/// Get the raw `*const c_char` pointer from a `CString` for passing to C.
///
/// The pointer is valid only as long as the `CString` is alive, so store the
/// `CString` in a local variable for the duration of the FFI call.
///
/// # Safety
/// The returned pointer must not be dereferenced after `cs` is dropped;
/// the compiler does not check this.
pub fn cstring_as_ptr(cs: &CString) -> *const c_char {
    cs.as_ptr()
}
// ── C → Rust direction ────────────────────────────────────────────────────
/// Borrow a null-terminated C string as a `&CStr`.
///
/// # Safety
/// `ptr` must be non-null and point to a valid, null-terminated C string
/// for at least the lifetime of the returned `&CStr`.
pub unsafe fn ptr_to_cstr<'a>(ptr: *const c_char) -> &'a CStr {
    // SAFETY: caller guarantees ptr is non-null and null-terminated.
    unsafe { CStr::from_ptr(ptr) }
}
/// Convert a `&CStr` to a Rust `&str`, returning an error if the bytes are
/// not valid UTF-8.
pub fn cstr_to_str(cs: &CStr) -> Result<&str, std::str::Utf8Error> {
    cs.to_str()
}
/// Full round-trip: C pointer → owned `String`, validating UTF-8.
///
/// # Safety
/// `ptr` must be non-null and point to a valid, null-terminated C string.
pub unsafe fn ptr_to_string(ptr: *const c_char) -> Result<String, std::str::Utf8Error> {
    // SAFETY: propagated from caller guarantee.
    let cstr = unsafe { CStr::from_ptr(ptr) };
    cstr.to_str().map(str::to_owned)
}
// ── Simulated C functions (self-contained, no external linker needed) ─────
/// Simulated C: returns a static greeting string (null-terminated C literal).
///
/// The `c"..."` literal (Rust 1.77+) has `'static` lifetime; `.as_ptr()` yields
/// a `*const c_char` valid for the process lifetime.
#[no_mangle]
pub extern "C" fn c_greeting() -> *const c_char {
    c"Hello from the C side!".as_ptr()
}
/// Simulated C: compute the length of a null-terminated string.
/// Returns 0 for a null pointer.
///
/// # Safety
/// `s` must be either null or a pointer to a valid, null-terminated string.
#[no_mangle]
pub unsafe extern "C" fn c_strlen(s: *const c_char) -> usize {
    if s.is_null() {
        return 0;
    }
    // SAFETY: s is non-null (checked above); caller guarantees it is null-terminated.
    unsafe { CStr::from_ptr(s) }.to_bytes().len()
}
// ── Safe wrapper over the simulated C functions ───────────────────────────
/// Retrieve the greeting from the simulated C library as an owned `String`.
pub fn get_greeting() -> String {
    let ptr = c_greeting();
    // SAFETY: c_greeting() returns a pointer to a 'static null-terminated
    // C string literal. It is non-null and valid for the process lifetime.
    unsafe { CStr::from_ptr(ptr).to_string_lossy().into_owned() }
}
/// Compute the byte length of a Rust string via the simulated C strlen.
pub fn string_len_via_c(s: &str) -> Result<usize, NulError> {
    let cs = CString::new(s)?;
    // SAFETY: cs is alive for the duration of this call; c_strlen only reads
    // until the null terminator.
    Ok(unsafe { c_strlen(cs.as_ptr()) })
}
// ── Tests ─────────────────────────────────────────────────────────────────
#[cfg(test)]
mod tests {
    use super::*;
    // ── rust_to_cstring ──────────────────────────────────────────────────
    #[test]
    fn test_rust_to_cstring_happy_path() {
        let cs = rust_to_cstring("hello").unwrap();
        // The CString should convert back to the original text.
        assert_eq!(cs.to_str().unwrap(), "hello");
    }
    #[test]
    fn test_rust_to_cstring_interior_nul_is_error() {
        // A NUL byte inside the string must produce an error, not silent truncation.
        assert!(rust_to_cstring("hel\0lo").is_err());
    }
    #[test]
    fn test_rust_to_cstring_empty_string() {
        let cs = rust_to_cstring("").unwrap();
        assert_eq!(cs.to_str().unwrap(), "");
        // Even an empty CString is null-terminated: length in bytes == 1 (the NUL).
        assert_eq!(cs.as_bytes_with_nul().len(), 1);
    }
    #[test]
    fn test_rust_to_cstring_unicode() {
        // UTF-8 content survives the round-trip as long as there's no interior NUL.
        let cs = rust_to_cstring("こんにちは").unwrap();
        assert_eq!(cs.to_str().unwrap(), "こんにちは");
    }
    // ── ptr_to_cstr / ptr_to_string ──────────────────────────────────────
    #[test]
    fn test_ptr_to_cstr_from_static_literal() {
        let ptr = b"static\0".as_ptr() as *const c_char;
        // SAFETY: ptr points to a NUL-terminated byte literal with 'static lifetime.
        let s = unsafe { ptr_to_cstr(ptr) };
        assert_eq!(s.to_str().unwrap(), "static");
    }
    #[test]
    fn test_ptr_to_string_round_trip() {
        let original = "round-trip";
        let cs = CString::new(original).unwrap();
        // SAFETY: cs is alive for the duration of this block.
        let recovered = unsafe { ptr_to_string(cs.as_ptr()) }.unwrap();
        assert_eq!(recovered, original);
    }
    // ── cstr_to_str UTF-8 validation ─────────────────────────────────────
    #[test]
    fn test_cstr_to_str_invalid_utf8_returns_error() {
        // 0xFF is not valid UTF-8.
        let bytes = b"\xff\0";
        // SAFETY: bytes ends with a NUL and contains no interior NUL.
        let cs = unsafe { CStr::from_bytes_with_nul_unchecked(bytes) };
        assert!(cstr_to_str(cs).is_err());
    }
    // ── simulated C functions ────────────────────────────────────────────
    #[test]
    fn test_c_greeting_returns_valid_string() {
        let greeting = get_greeting();
        assert_eq!(greeting, "Hello from the C side!");
    }
    #[test]
    fn test_c_strlen_empty() {
        assert_eq!(string_len_via_c("").unwrap(), 0);
    }
    #[test]
    fn test_c_strlen_ascii() {
        assert_eq!(string_len_via_c("hello").unwrap(), 5);
    }
    #[test]
    fn test_c_strlen_null_pointer_returns_zero() {
        // Direct call with null; the safe wrapper is not involved here.
        // SAFETY: c_strlen explicitly checks for null before dereferencing.
        assert_eq!(unsafe { c_strlen(std::ptr::null()) }, 0);
    }
    // ── cstring_as_ptr lifetime discipline ───────────────────────────────
    #[test]
    fn test_cstring_as_ptr_is_null_terminated() {
        let cs = CString::new("test").unwrap();
        let ptr = cstring_as_ptr(&cs);
        // SAFETY: cs is alive; ptr is null-terminated by CString invariant.
        let back = unsafe { CStr::from_ptr(ptr) };
        assert_eq!(back.to_str().unwrap(), "test");
    }
}
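The doc comment on cstring_as_ptr notes that the pointer is valid only while the CString is alive. The following is a minimal sketch of the two ownership patterns that follow from this: borrowing for the duration of a call, and handing ownership to the C side with CString::into_raw / CString::from_raw. The helper names (make_c_string, free_c_string, len_while_alive) are hypothetical and not part of the listing above.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: hand ownership of a string to C-style code.
/// The pointer must eventually be passed to `free_c_string`, or it leaks.
pub fn make_c_string(s: &str) -> Option<*mut c_char> {
    // `into_raw` consumes the CString without freeing its buffer.
    CString::new(s).ok().map(CString::into_raw)
}

/// Hypothetical helper: reclaim and free a pointer from `make_c_string`.
///
/// # Safety
/// `ptr` must be null or come from `make_c_string`, and must not be used
/// again after this call.
pub unsafe fn free_c_string(ptr: *mut c_char) {
    if !ptr.is_null() {
        // SAFETY: ptr was produced by CString::into_raw (caller guarantee).
        drop(unsafe { CString::from_raw(ptr) });
    }
}

/// Borrowing pattern: keep the CString in a named local so the pointer
/// stays valid for as long as it is used.
pub fn len_while_alive(s: &str) -> Option<usize> {
    let owned = CString::new(s).ok()?; // lives until the end of this function
    let ptr = owned.as_ptr();
    // SAFETY: `owned` is still alive and is null-terminated by construction.
    Some(unsafe { CStr::from_ptr(ptr) }.to_bytes().len())
}
```

The borrowing pattern fits calls where C only reads the string; the into_raw / from_raw pair fits cases where C stores the pointer, at the cost of a manual free.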
Deep Comparison
OCaml vs Rust: String/CString/CStr Conversion for FFI
Side-by-Side Code
OCaml
(* OCaml: manual null-termination using Bytes *)
let to_c_string (s : string) : bytes =
  let n = String.length s in
  let b = Bytes.create (n + 1) in
  Bytes.blit_string s 0 b 0 n;
  Bytes.set b n '\000';
  b
let c_strlen (b : bytes) : int =
  let rec go i =
    if i >= Bytes.length b || Bytes.get b i = '\000' then i else go (i + 1)
  in
  go 0
let from_c_string (b : bytes) : string =
  Bytes.sub_string b 0 (c_strlen b)
let () =
  let s = "Hello, FFI!" in
  let cs = to_c_string s in
  assert (c_strlen cs = String.length s);
  assert (from_c_string cs = s);
  print_endline "ok"
Rust (idiomatic: using CString / CStr)
use std::ffi::{CStr, CString};
// Rust → C: allocate a heap-owned, null-terminated buffer.
fn rust_to_cstring(s: &str) -> Result<CString, std::ffi::NulError> {
    CString::new(s)
}
// C → Rust: borrow the C buffer as a &CStr, then validate UTF-8.
unsafe fn ptr_to_str<'a>(ptr: *const std::os::raw::c_char) -> &'a str {
    CStr::from_ptr(ptr).to_str().expect("not valid UTF-8")
}
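Rust (manual / functional: mirrors the OCaml recursive strlen)
// Recursive C strlen, mirroring the recursion of OCaml's `go i` helper.
// Unlike `go`, it is not tail-recursive: each byte adds a stack frame.
// Safety: `ptr` must be non-null and point to a NUL-terminated byte sequence.
unsafe fn manual_strlen(ptr: *const u8) -> usize {
    if *ptr == 0 { 0 } else { 1 + manual_strlen(ptr.add(1)) }
}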
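The recursive manual_strlen uses one stack frame per byte, so a very long string could exhaust the stack. As a sketch of the same scan written as a loop (the name iter_strlen is hypothetical and not from the original source):

```rust
/// Hypothetical iterative counterpart to the recursive `manual_strlen`.
///
/// # Safety
/// `ptr` must be non-null and point to a NUL-terminated byte sequence.
pub unsafe fn iter_strlen(ptr: *const u8) -> usize {
    let mut len = 0;
    loop {
        // SAFETY: the caller guarantees every byte up to and including
        // the terminating NUL is readable.
        let byte = unsafe { *ptr.add(len) };
        if byte == 0 {
            return len;
        }
        len += 1;
    }
}
```

In real code, CStr::from_ptr(ptr).to_bytes().len() does the same job without a hand-written loop.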
Type Signatures
| Concept | OCaml | Rust |
|---|---|---|
| Rust-owned C string | bytes (manual) | CString |
| Borrowed C string | bytes slice | &CStr |
| Raw C pointer | 'a Bigarray / nativeint | *const c_char |
| Conversion to string | Bytes.sub_string | CStr::to_str() -> Result<&str, Utf8Error> |
| Interior NUL guard | runtime String.contains '\000' (manual) | CString::new returns Err(NulError) |
| UTF-8 validation | no built-in (OCaml is byte-agnostic) | CStr::to_str() enforces UTF-8 |
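The last two rows of the table can be exercised directly. A minimal sketch, assuming two hypothetical helpers (interior_nul_position and decode) that are not part of the original source:

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: if `s` cannot become a CString, report the index
/// of the first interior NUL byte.
pub fn interior_nul_position(s: &str) -> Option<usize> {
    // CString::new fails with NulError, which records the offending index.
    CString::new(s).err().map(|e| e.nul_position())
}

/// Hypothetical helper: decode C bytes (including the trailing NUL).
/// Returns `(true, text)` for valid UTF-8, `(false, lossy_text)` otherwise,
/// and `None` if the slice is not a well-formed C string.
pub fn decode(bytes_with_nul: &[u8]) -> Option<(bool, String)> {
    // Safe constructor: rejects a missing terminator or an interior NUL.
    let cstr = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    Some(match cstr.to_str() {
        Ok(s) => (true, s.to_owned()),
        // Invalid UTF-8: fall back to replacement characters (U+FFFD).
        Err(_) => (false, cstr.to_string_lossy().into_owned()),
    })
}
```

Both checks happen at run time and surface as ordinary Option / Result values, so a caller has to decide what to do with the failure case.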
Key Insights
In OCaml, to_c_string returns a bytes value whose lifetime is managed by the GC, so there is no dangling-pointer risk. In Rust, CString is a heap-allocated RAII type; calling .as_ptr() yields a raw pointer that carries no lifetime, so the CString must outlive every use of that pointer. The compiler does not enforce this: keeping the CString alive (for example, in a named local) is the programmer's responsibility.
In OCaml, bytes is just bytes: the programmer manually appends '\000'. Rust's CString guarantees null-termination by construction; you cannot create one without the terminator, and safe code cannot obtain a &CStr from bytes that aren't null-terminated.
If the input contains an interior '\0', CString::new returns Err(NulError) instead of silently truncating the C string at the first NUL, a common source of FFI security bugs.
&str is always UTF-8. When reading a *const c_char from C, CStr::to_str() validates UTF-8 and returns Err(Utf8Error) rather than producing a corrupted &str.
CStr::from_ptr borrows the C buffer directly: no allocation, no copy. OCaml's from_c_string always allocates a new string. Rust pays for allocation only when you call .to_owned() or .to_string_lossy().into_owned().
When to Use Each Style
**Use CString / CStr (idiomatic Rust) when:** calling real C libraries (libc, system calls, C extensions). These types guarantee null-termination by construction and report interior-NUL and UTF-8 problems as explicit Result errors at run time instead of letting them pass silently, which makes FFI audits easier.
**Use manual byte manipulation when:** you need precise control over the buffer layout (e.g., fixed-size stack buffers, MaybeUninit patterns for output parameters) or when interoperating with non-UTF-8 encodings, where CStr::to_str() typically fails on non-ASCII input and either decoding the raw bytes explicitly or calling to_string_lossy() better reflects intent.
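The fixed-size stack buffer case can be sketched as follows. This is an illustration under stated assumptions: fill_name is a hypothetical stand-in for a C function that writes into a caller-provided buffer, and read_name is a hypothetical safe wrapper; neither appears in the original source. The payload is deliberately Latin-1 rather than UTF-8 to show the lossy path.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that writes a NUL-terminated
/// name into a caller-provided buffer of `cap` bytes. Returns 0 on
/// success and -1 if the buffer is null or too small.
///
/// # Safety
/// `buf` must be null or valid for writes of `cap` bytes.
pub unsafe extern "C" fn fill_name(buf: *mut c_char, cap: usize) -> i32 {
    let src = b"caf\xe9\0"; // Latin-1 "café": 0xE9 is not valid UTF-8
    if buf.is_null() || cap < src.len() {
        return -1;
    }
    // SAFETY: buf has room for src.len() bytes (checked above) and the
    // two regions cannot overlap.
    unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), buf.cast::<u8>(), src.len()) };
    0
}

/// Read the name through a fixed-size stack buffer, decoding lossily.
pub fn read_name() -> Option<String> {
    let mut buf = [0u8; 16];
    // SAFETY: buf is valid for writes of buf.len() bytes.
    let rc = unsafe { fill_name(buf.as_mut_ptr().cast::<c_char>(), buf.len()) };
    if rc != 0 {
        return None;
    }
    // Safe terminator lookup; fails if the buffer contains no NUL byte.
    let cstr = CStr::from_bytes_until_nul(&buf).ok()?;
    // Invalid UTF-8 bytes become U+FFFD instead of causing an error.
    Some(cstr.to_string_lossy().into_owned())
}
```

CStr::from_bytes_until_nul (Rust 1.69+) keeps the read side in safe code: the buffer length bounds the scan, so a missing terminator becomes an error rather than an out-of-bounds read.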
Exercises
bytemuck for transmute, CString for FFI strings) and implement it.