712 Fundamental

FFI string conversion

Functional Programming

Tutorial

The Problem

This example covers FFI string conversion, a small but heavily used corner of Rust's unsafe programming model. Rust strings (&str, String) are UTF-8 and store their length alongside the data. C strings (char *) are NUL-terminated byte sequences with no encoding guarantee. Converting between the two is routine in systems programming: OS components, device drivers, game engines, and any code that calls C libraries. Getting it wrong leads to silent truncation, dangling pointers, or invalid UTF-8. Rust's unsafe system confines the risky step, dereferencing a raw *const c_char, to small, auditable regions, while the surrounding code stays safe.
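The sketch below shows the two layouts using only the standard library. A &str knows its length without any terminator, while CString appends exactly one trailing NUL byte. The helper name show_layouts is hypothetical.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns (Rust byte length, C buffer including the NUL).
fn show_layouts(s: &str) -> (usize, Vec<u8>) {
    // A &str carries its length in the (pointer, length) pair; no terminator is stored.
    let rust_len = s.len();
    // CString::new copies the bytes and appends a single trailing NUL.
    let c = CString::new(s).expect("no interior NUL in this example");
    (rust_len, c.as_bytes_with_nul().to_vec())
}
```

For "hi", the Rust length is 2, and the C buffer is the three bytes h, i, NUL.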

🎯 Learning Outcomes

• The specific unsafe feature demonstrated: FFI string conversion with CString and CStr

• When this feature is necessary vs when safe alternatives exist

• How to use it correctly with appropriate SAFETY documentation

• The invariants that must be maintained for the operation to be sound

• Real-world contexts: embedded systems, OS kernels, C FFI, performance-critical code

Code Example

use std::ffi::{CStr, CString};

// Rust → C: allocate a heap-owned, null-terminated buffer.
fn rust_to_cstring(s: &str) -> Result<CString, std::ffi::NulError> {
    CString::new(s)
}

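// C → Rust: borrow the C buffer as a &CStr, then validate UTF-8.
// Safety: `ptr` must be non-null and point to a NUL-terminated buffer that
// outlives 'a. Panics if the bytes are not valid UTF-8.
unsafe fn ptr_to_str<'a>(ptr: *const std::os::raw::c_char) -> &'a str {
    CStr::from_ptr(ptr).to_str().expect("not valid UTF-8")
}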

(* OCaml: manual null-termination using Bytes *)
let to_c_string (s : string) : bytes =
  let n = String.length s in
  let b = Bytes.create (n + 1) in
  Bytes.blit_string s 0 b 0 n;
  Bytes.set b n '\000';
  b

let c_strlen (b : bytes) : int =
  let rec go i =
    if i >= Bytes.length b || Bytes.get b i = '\000' then i else go (i + 1)
  in go 0

let from_c_string (b : bytes) : string =
  Bytes.sub_string b 0 (c_strlen b)

let () =
  let s = "Hello, FFI!" in
  let cs = to_c_string s in
  assert (c_strlen cs = String.length s);
  assert (from_c_string cs = s);
  print_endline "ok"
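For comparison, the OCaml `let ()` check above can be mirrored in Rust with the standard CString and CStr types. This is a sketch. The name round_trip is hypothetical, and the function returns Option instead of panicking.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper mirroring the OCaml test: Rust string -> C buffer -> Rust string.
fn round_trip(s: &str) -> Option<(usize, String)> {
    // Rust -> C: yields None if `s` contains an interior NUL.
    let cs = CString::new(s).ok()?;
    // Length as C's strlen would see it: the bytes before the terminator.
    let c_len = cs.as_bytes().len();
    // C -> Rust: we already own the buffer, so borrowing it as &CStr needs no unsafe.
    let cstr: &CStr = cs.as_c_str();
    // Validate UTF-8, then copy into an owned String.
    let back = cstr.to_str().ok()?.to_owned();
    Some((c_len, back))
}
```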

Key Differences

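Safety model: Rust requires an explicit unsafe block to dereference a raw C pointer (CStr::from_ptr). Building a CString and validating UTF-8 are safe operations. The OCaml example needs no unsafe code at all, because it only simulates C strings inside the GC-managed heap.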

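FFI approach: Rust declares C functions in extern "C" blocks and passes raw C types such as *const c_char directly. OCaml binds C code in one of two ways: hand-written stubs declared with external, or the ctypes library, which wraps C types in OCaml values.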

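Memory control: a CString owns its heap buffer and frees it deterministically when dropped. More generally, Rust allows full control over memory layout (#[repr(C)], custom allocators). OCaml's bytes value is reclaimed by the GC, which also manages layout automatically.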

Auditability: Rust unsafe regions are syntactically visible and toolable; OCaml unsafe operations (Obj.magic, direct C calls) are also explicit but less common.
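The sketch below makes the FFI difference concrete by calling the C library's strlen directly. It rests on two assumptions. First, the platform's Rust standard library already links the C runtime, which holds for the common Linux, macOS and Windows targets. Second, the compiler is recent enough (Rust 1.82+) to accept unsafe extern blocks. The wrapper name len_via_libc is hypothetical.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Declaration of the C standard library's `size_t strlen(const char *s)`.
// The compiler cannot check that this signature matches C, hence `unsafe extern`.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical safe wrapper: returns None if `s` has an interior NUL.
fn len_via_libc(s: &str) -> Option<usize> {
    // Keep the CString in a named binding so the pointer stays valid for the call.
    let cs = CString::new(s).ok()?;
    // SAFETY: `cs` is alive for the whole call and is NUL-terminated by construction.
    Some(unsafe { strlen(cs.as_ptr()) })
}
```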

OCaml Approach

OCaml's GC and type system eliminate most of the need for these unsafe operations. The equivalent functionality typically uses:

• C FFI via the ctypes library for external function calls

• Bigarray for controlled raw memory access

• The GC for memory management (no manual allocators needed)

• Bytes.t for mutable byte sequences

OCaml programs rarely need operations equivalent to these Rust unsafe patterns.

Full Source

#![allow(clippy::all)]
//! 712 — String / CString / CStr Conversion for FFI
//!
//! Rust strings (`&str`, `String`) are UTF-8 and carry an explicit length (no terminator).
//! C strings (`char*`) are null-terminated and encoding-agnostic.
//! `CString` and `CStr` bridge these two worlds without leaking memory
//! or invoking undefined behaviour.
//!
//! Two directions:
//!   Rust → C: `CString::new(s)` — heap-allocated, null-terminated, owned.
//!   C → Rust: `CStr::from_ptr(ptr)` — borrows the C buffer, zero-copy.

use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

// ── Rust → C direction ────────────────────────────────────────────────────

/// Convert a Rust `&str` into a heap-allocated, null-terminated `CString`.
///
/// Returns `Err` if `s` contains an interior NUL byte, which would silently
/// truncate the string from C's perspective.
pub fn rust_to_cstring(s: &str) -> Result<CString, NulError> {
    CString::new(s)
}

/// Get the raw `*const c_char` pointer from a `CString` for passing to C.
///
/// The pointer is valid only as long as the `CString` is alive, so store the
/// `CString` in a local variable for the duration of the FFI call.
///
/// This function is safe: obtaining a raw pointer is safe, dereferencing it is not.
/// Code that dereferences the result must ensure the pointer does not outlive `cs`.
pub fn cstring_as_ptr(cs: &CString) -> *const c_char {
    cs.as_ptr()
}

// ── C → Rust direction ────────────────────────────────────────────────────

/// Borrow a null-terminated C string as a `&CStr`.
///
/// # Safety
/// `ptr` must be non-null and point to a valid, null-terminated C string
/// for at least the lifetime of the returned `&CStr`.
pub unsafe fn ptr_to_cstr<'a>(ptr: *const c_char) -> &'a CStr {
    // SAFETY: caller guarantees ptr is non-null and null-terminated.
    CStr::from_ptr(ptr)
}

/// Convert a `&CStr` to a Rust `&str`, returning an error if the bytes are
/// not valid UTF-8.
pub fn cstr_to_str(cs: &CStr) -> Result<&str, std::str::Utf8Error> {
    cs.to_str()
}

/// Full round-trip: C pointer → owned `String`, validating UTF-8.
///
/// # Safety
/// `ptr` must be non-null and point to a valid, null-terminated C string.
pub unsafe fn ptr_to_string(ptr: *const c_char) -> Result<String, std::str::Utf8Error> {
    // SAFETY: propagated from caller guarantee.
    let cstr = CStr::from_ptr(ptr);
    cstr.to_str().map(str::to_owned)
}

// ── Simulated C functions (self-contained, no external linker needed) ─────

/// Simulated C: returns a static greeting string (null-terminated C literal).
///
/// The `c"..."` literal (stable since Rust 1.77, edition 2021+) has type
/// `&'static CStr`, so `.as_ptr()` yields a `*const c_char` valid for the whole program.
#[no_mangle]
pub extern "C" fn c_greeting() -> *const c_char {
    c"Hello from the C side!".as_ptr()
}

/// Simulated C: compute the length of a null-terminated string.
///
/// # Safety
/// `s` must be either null (treated as an empty string) or a pointer to a
/// valid null-terminated string.
#[no_mangle]
pub unsafe extern "C" fn c_strlen(s: *const c_char) -> usize {
    if s.is_null() {
        return 0;
    }
    // SAFETY: s is non-null (checked above); caller guarantees null-termination.
    CStr::from_ptr(s).to_bytes().len()
}

// ── Safe wrapper over the simulated C functions ───────────────────────────

/// Retrieve the greeting from the simulated C library as an owned `String`.
pub fn get_greeting() -> String {
    let ptr = c_greeting();
    // SAFETY: c_greeting() returns a pointer to a 'static null-terminated
    // byte literal. It is non-null and valid for the process lifetime.
    unsafe { CStr::from_ptr(ptr).to_string_lossy().into_owned() }
}

/// Compute the byte length of a Rust string via the simulated C strlen.
pub fn string_len_via_c(s: &str) -> Result<usize, NulError> {
    let cs = CString::new(s)?;
    // SAFETY: cs is alive for the duration of this call; c_strlen only reads
    // until the null terminator.
    Ok(unsafe { c_strlen(cs.as_ptr()) })
}

// ── Tests ─────────────────────────────────────────────────────────────────

#[cfg(test)]
mod tests {
    use super::*;

    // ── rust_to_cstring ────────────────────────────────────────────────────

    #[test]
    fn test_rust_to_cstring_happy_path() {
        let cs = rust_to_cstring("hello").unwrap();
        // The CString's contents should convert back to the original &str.
        assert_eq!(cs.to_str().unwrap(), "hello");
    }

    #[test]
    fn test_rust_to_cstring_interior_nul_is_error() {
        // A NUL byte inside the string must produce an error, not silent truncation.
        assert!(rust_to_cstring("hel\0lo").is_err());
    }

    #[test]
    fn test_rust_to_cstring_empty_string() {
        let cs = rust_to_cstring("").unwrap();
        assert_eq!(cs.to_str().unwrap(), "");
        // Even an empty CString is null-terminated: length in bytes == 1 (the NUL).
        assert_eq!(cs.as_bytes_with_nul().len(), 1);
    }

    #[test]
    fn test_rust_to_cstring_unicode() {
        // UTF-8 content survives the round-trip as long as there's no interior NUL.
        let cs = rust_to_cstring("こんにちは").unwrap();
        assert_eq!(cs.to_str().unwrap(), "こんにちは");
    }

    // ── ptr_to_cstr / ptr_to_string ───────────────────────────────────────

    #[test]
    fn test_ptr_to_cstr_from_static_literal() {
        let ptr = b"static\0".as_ptr() as *const c_char;
        // SAFETY: ptr points to a NUL-terminated byte literal with 'static lifetime.
        let s = unsafe { ptr_to_cstr(ptr) };
        assert_eq!(s.to_str().unwrap(), "static");
    }

    #[test]
    fn test_ptr_to_string_round_trip() {
        let original = "round-trip";
        let cs = CString::new(original).unwrap();
        // SAFETY: cs is alive for the duration of this block.
        let recovered = unsafe { ptr_to_string(cs.as_ptr()) }.unwrap();
        assert_eq!(recovered, original);
    }

    // ── cstr_to_str UTF-8 validation ─────────────────────────────────────

    #[test]
    fn test_cstr_to_str_invalid_utf8_returns_error() {
        // 0xFF is not valid UTF-8.
        let bytes = b"\xff\0";
        // SAFETY: bytes ends with its only NUL byte (no interior NUL).
        let cs = unsafe { CStr::from_bytes_with_nul_unchecked(bytes) };
        assert!(cstr_to_str(cs).is_err());
    }

    // ── simulated C functions ─────────────────────────────────────────────

    #[test]
    fn test_c_greeting_returns_valid_string() {
        let greeting = get_greeting();
        assert_eq!(greeting, "Hello from the C side!");
    }

    #[test]
    fn test_c_strlen_empty() {
        assert_eq!(string_len_via_c("").unwrap(), 0);
    }

    #[test]
    fn test_c_strlen_ascii() {
        assert_eq!(string_len_via_c("hello").unwrap(), 5);
    }

    #[test]
    fn test_c_strlen_null_pointer_returns_zero() {
        // Direct call with null — safe wrapper is not involved here.
        // SAFETY: c_strlen explicitly checks for null before dereferencing.
        assert_eq!(unsafe { c_strlen(std::ptr::null()) }, 0);
    }

    // ── cstring_as_ptr lifetime discipline ────────────────────────────────

    #[test]
    fn test_cstring_as_ptr_is_null_terminated() {
        let cs = CString::new("test").unwrap();
        let ptr = cstring_as_ptr(&cs);
        // SAFETY: cs is alive; ptr is null-terminated by CString invariant.
        let back = unsafe { CStr::from_ptr(ptr) };
        assert_eq!(back.to_str().unwrap(), "test");
    }
}
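The simulated c_strlen above treats a null pointer as an empty string. A common alternative, sketched here, surfaces nullability in the return type with Option, so callers must handle a missing string explicitly. The name opt_ptr_to_string is hypothetical and not part of the example's source. It is a nullable variant of ptr_to_string.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical nullable variant of `ptr_to_string`.
///
/// # Safety
/// `ptr` must be null, or point to a NUL-terminated buffer that stays valid
/// for the duration of the call.
unsafe fn opt_ptr_to_string(ptr: *const c_char) -> Option<Result<String, std::str::Utf8Error>> {
    if ptr.is_null() {
        return None; // C's "no string" becomes Rust's None instead of UB.
    }
    // SAFETY: ptr is non-null (checked above); caller guarantees NUL-termination.
    let cstr = unsafe { CStr::from_ptr(ptr) };
    Some(cstr.to_str().map(str::to_owned))
}
```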

(* OCaml: String is UTF-8 bytes + length; C strings are null-terminated. *)

(** Simulate a C-style null-terminated string as a Bytes. *)
let to_c_string (s : string) : bytes =
  let n = String.length s in
  let b = Bytes.create (n + 1) in
  Bytes.blit_string s 0 b 0 n;
  Bytes.set b n '\000';
  b

(** Read until null terminator — simulate C strlen. *)
let c_strlen (b : bytes) : int =
  let rec go i =
    if i >= Bytes.length b || Bytes.get b i = '\000' then i else go (i + 1)
  in go 0

(** Convert C string (bytes) back to OCaml string. *)
let from_c_string (b : bytes) : string =
  Bytes.sub_string b 0 (c_strlen b)

let () =
  let s = "Hello, FFI!" in
  let cs = to_c_string s in
  Printf.printf "Original:        '%s' (len=%d)\n" s (String.length s);
  Printf.printf "C string strlen: %d\n" (c_strlen cs);
  Printf.printf "C->OCaml:        '%s'\n" (from_c_string cs)


Deep Comparison

OCaml vs Rust: String/CString/CStr Conversion for FFI

Side-by-Side Code

OCaml

(* OCaml: manual null-termination using Bytes *)
let to_c_string (s : string) : bytes =
  let n = String.length s in
  let b = Bytes.create (n + 1) in
  Bytes.blit_string s 0 b 0 n;
  Bytes.set b n '\000';
  b

let c_strlen (b : bytes) : int =
  let rec go i =
    if i >= Bytes.length b || Bytes.get b i = '\000' then i else go (i + 1)
  in go 0

let from_c_string (b : bytes) : string =
  Bytes.sub_string b 0 (c_strlen b)

let () =
  let s = "Hello, FFI!" in
  let cs = to_c_string s in
  assert (c_strlen cs = String.length s);
  assert (from_c_string cs = s);
  print_endline "ok"

Rust (idiomatic — using `CString` / `CStr`)

use std::ffi::{CStr, CString};

// Rust → C: allocate a heap-owned, null-terminated buffer.
fn rust_to_cstring(s: &str) -> Result<CString, std::ffi::NulError> {
    CString::new(s)
}

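// C → Rust: borrow the C buffer as a &CStr, then validate UTF-8.
// Safety: `ptr` must be non-null and point to a NUL-terminated buffer that
// outlives 'a. Panics if the bytes are not valid UTF-8.
unsafe fn ptr_to_str<'a>(ptr: *const std::os::raw::c_char) -> &'a str {
    CStr::from_ptr(ptr).to_str().expect("not valid UTF-8")
}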

Rust (manual / functional — mirrors the OCaml recursive strlen)

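// Recursive C strlen mirroring OCaml's `go i` index accumulator.
// Safety: `ptr` must be non-null and point to a NUL-terminated buffer.
unsafe fn manual_strlen(ptr: *const u8) -> usize {
    unsafe fn go(ptr: *const u8, i: usize) -> usize {
        if *ptr.add(i) == 0 { i } else { go(ptr, i + 1) }
    }
    go(ptr, 0)
}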
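Rust does not guarantee tail-call elimination, so production code usually writes this scan as a loop. The sketch below uses the hypothetical name manual_strlen_loop. It has the same safety contract as the recursive version, and its result can be cross-checked against the standard CStr::to_bytes().len().

```rust
/// Hypothetical loop-based strlen with the same contract as the recursive version.
///
/// # Safety
/// `ptr` must be non-null and point to a NUL-terminated buffer.
unsafe fn manual_strlen_loop(ptr: *const u8) -> usize {
    let mut i = 0;
    loop {
        // SAFETY: every offset up to and including the terminator is inside the buffer.
        let byte = unsafe { *ptr.add(i) };
        if byte == 0 {
            return i;
        }
        i += 1;
    }
}
```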

Type Signatures

Concept	OCaml	Rust
Rust-owned C string	`bytes` (manual)	`CString`
Borrowed C string	`bytes` (no separate borrowed type)	`&CStr`
Raw C pointer	`char Ctypes.ptr` (ctypes) or `nativeint`	`*const c_char`
Conversion to string	`Bytes.sub_string`	`CStr::to_str() -> Result<&str, Utf8Error>`
Interior NUL guard	manual runtime check, e.g. `String.contains s '\000'`	`CString::new` returns `Err(NulError)`
UTF-8 validation	`String.is_valid_utf_8` (OCaml 4.14+); strings carry no encoding guarantee	`CStr::to_str()` enforces UTF-8

Key Insights

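Ownership encodes lifetime: In OCaml, to_c_string returns a bytes value whose lifetime is managed by the GC, so there is no dangling-pointer risk. In Rust, CString is a heap-allocated RAII type, and .as_ptr() returns a raw pointer into its buffer, so the CString must outlive the pointer. The compiler does not enforce this, because raw pointers carry no lifetime. rustc warns about the most common mistake (calling .as_ptr() on a temporary CString). Beyond that, keeping the owner alive is the programmer's responsibility.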
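The sketch below shows the rule in practice. The commented-out line is the well-known mistake: calling .as_ptr() on a temporary CString. The temporary is dropped at the end of that statement, so the pointer dangles. The working version binds the CString to a name first. Both read_len (a stand-in for a C function) and len_of are hypothetical names.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that reads a NUL-terminated string.
///
/// # Safety
/// `p` must be non-null and NUL-terminated for the duration of the call.
unsafe fn read_len(p: *const c_char) -> usize {
    let cstr = unsafe { CStr::from_ptr(p) };
    cstr.to_bytes().len()
}

fn len_of(s: &str) -> usize {
    // WRONG (do not do this): the CString temporary is freed at the `;`,
    // so `p` would dangle before read_len runs.
    // let p = CString::new(s).unwrap().as_ptr();

    // RIGHT: the named binding keeps the buffer alive until the end of the scope.
    let owned = CString::new(s).expect("no interior NUL");
    // SAFETY: `owned` outlives this call and is NUL-terminated by construction.
    unsafe { read_len(owned.as_ptr()) }
}
```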

Null-termination is a type invariant: OCaml's bytes is just bytes, so the programmer appends '\000' manually. Rust's CString guarantees null-termination by construction, so you cannot create one without the terminator. No safe API will produce a &CStr from bytes that lack a terminator; the unsafe constructors (CStr::from_ptr, CStr::from_bytes_with_nul_unchecked) shift that obligation to the caller.
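With the safe APIs, the invariant is checked rather than assumed. The small sketch below uses CStr::from_bytes_with_nul, which returns an error when the terminator is missing or an interior NUL is present. The wrapper name parse_c_buffer is hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical wrapper: accept a byte slice only if it is a well-formed C string.
fn parse_c_buffer(bytes: &[u8]) -> Option<&CStr> {
    // Requires exactly one NUL, and it must be the last byte.
    CStr::from_bytes_with_nul(bytes).ok()
}
```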

Interior NUL is reported, not ignored: If the Rust string contains '\0', CString::new returns Err(NulError) at runtime. It does not silently produce a C string that ends at the first NUL, which is a common source of FFI security bugs.
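The error value also reports where the offending NUL sits, which helps when logging or sanitizing input. The sketch below uses NulError::nul_position and NulError::into_vec, both standard-library methods. The name to_c_or_report is hypothetical.

```rust
use std::ffi::CString;

/// Hypothetical helper: convert, or report the byte offset of the NUL that caused the failure.
fn to_c_or_report(s: &str) -> Result<CString, usize> {
    CString::new(s).map_err(|e| {
        // nul_position() is the index of the NUL byte that made CString::new fail.
        let pos = e.nul_position();
        // into_vec() hands back the original bytes, so nothing is lost.
        let original = e.into_vec();
        debug_assert_eq!(original[pos], 0);
        pos
    })
}
```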

UTF-8 validation is needed in one direction only: OCaml strings are byte sequences without encoding guarantees, while a Rust &str is always UTF-8. Passing a &str to C therefore needs only the interior-NUL check. Bytes read from a *const c_char, however, must be validated: CStr::to_str() returns Err(Utf8Error) rather than producing an invalid &str.
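The sketch below puts the two standard ways to read possibly non-UTF-8 bytes from C side by side. to_str refuses invalid input, while to_string_lossy substitutes U+FFFD for each invalid sequence. The name decode_both is hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical helper returning (strict result, lossy result) for a C string.
fn decode_both(cs: &CStr) -> (Option<&str>, String) {
    // Strict: None if any byte sequence is invalid UTF-8.
    let strict = cs.to_str().ok();
    // Lossy: invalid sequences are replaced with U+FFFD.
    let lossy = cs.to_string_lossy().into_owned();
    (strict, lossy)
}
```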

Zero-copy on the C→Rust path: CStr::from_ptr borrows the C buffer directly, with no allocation and no copy. It does scan the buffer once to find the terminator. OCaml's from_c_string always allocates a new string. Rust allocates only in two cases: when you call .to_owned(), or when .to_string_lossy() must replace invalid bytes or is followed by .into_owned().
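You can observe when allocation happens through to_string_lossy, which returns a Cow<str>. The standard library documents that the Cow stays Borrowed for valid UTF-8 and becomes Owned only when replacement characters were inserted. A sketch follows; the name allocates is hypothetical.

```rust
use std::borrow::Cow;
use std::ffi::CStr;

/// Hypothetical probe: does lossy conversion of `cs` allocate a new String?
fn allocates(cs: &CStr) -> bool {
    // Valid UTF-8: Cow::Borrowed points into the original C buffer (no copy).
    // Invalid UTF-8: Cow::Owned holds a fresh String with U+FFFD substitutions.
    matches!(cs.to_string_lossy(), Cow::Owned(_))
}
```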

When to Use Each Style

Use CString / CStr (idiomatic Rust) when: calling real C libraries (libc, system calls, C extensions). These types turn missing terminators, interior NULs, and invalid UTF-8 into explicit errors at the conversion boundary, and they make FFI audits easier.

Use manual byte manipulation when: you need precise control over the buffer layout (e.g., fixed-size stack buffers, MaybeUninit patterns for output parameters). It also fits interop with non-UTF-8 encodings such as Latin-1. There, CStr::to_str() rejects legitimate text and to_string_lossy() would mangle it, so working on CStr::to_bytes() and decoding explicitly better reflects intent.
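For the fixed-size buffer case, here is a sketch. C APIs often fill a caller-provided char buf[N] and write the terminator somewhere inside it. CStr::from_bytes_until_nul (stable since Rust 1.69) reads up to the first NUL without requiring it to be the last byte. fill_like_c is a hypothetical stand-in for such a C function, and the value "eth0" is made up.

```rust
use std::ffi::CStr;

/// Hypothetical stand-in for a C API like `int get_name(char *buf, size_t len)`:
/// copies `src` into `buf`, truncating if needed, and NUL-terminates if there is room.
fn fill_like_c(buf: &mut [u8], src: &[u8]) {
    let n = src.len().min(buf.len().saturating_sub(1));
    buf[..n].copy_from_slice(&src[..n]);
    if n < buf.len() {
        buf[n] = 0;
    }
}

/// Read the string back out of a fixed-size stack buffer without unsafe.
fn read_name() -> Option<String> {
    let mut buf = [0u8; 16]; // zero-initialized, so no MaybeUninit is needed here
    fill_like_c(&mut buf, b"eth0");
    // Stops at the first NUL; errors if the buffer contains no NUL at all.
    let cstr = CStr::from_bytes_until_nul(&buf).ok()?;
    Some(cstr.to_string_lossy().into_owned())
}
```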

Exercises

Minimize unsafe: Find the smallest possible unsafe region in the source and verify that all safe code is outside the unsafe block.

Safe alternative: Identify if a safe alternative exists for the demonstrated technique (e.g., bytemuck for transmute, CString for FFI strings) and implement it.

SAFETY documentation: Write a complete SAFETY comment for each unsafe block listing preconditions, invariants, and what would break if violated.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.
