712 Fundamental

FFI String Conversion

Functional Programming

Tutorial

The Problem

This example covers one part of Rust's unsafe programming model: converting strings across the FFI boundary. Rust strings (&str, String) are UTF-8 and carry an explicit length, while C strings (char*) are null-terminated and encoding-agnostic, so every call into a C library that takes or returns text needs a conversion. This matters throughout systems programming: OS components, device drivers, game engines, and any code that must interact with C libraries. Rust's unsafe system is designed to confine unsafety to small, auditable regions while maintaining safety in the surrounding code.

🎯 Learning Outcomes

  • The specific unsafe feature demonstrated: FFI string conversion
  • When this feature is necessary vs when safe alternatives exist
  • How to use it correctly with appropriate SAFETY documentation
  • The invariants that must be maintained for the operation to be sound
  • Real-world contexts: embedded systems, OS kernels, C FFI, performance-critical code

    Code Example

    use std::ffi::{CStr, CString};
    
    // Rust → C: allocate a heap-owned, null-terminated buffer.
    fn rust_to_cstring(s: &str) -> Result<CString, std::ffi::NulError> {
        CString::new(s)
    }
    
    // C → Rust: borrow the C buffer as a &CStr, then validate UTF-8.
    // SAFETY (caller): `ptr` must be non-null and point to a null-terminated
    // buffer that stays valid for `'a`. Panics if the bytes are not valid UTF-8.
    unsafe fn ptr_to_str<'a>(ptr: *const std::os::raw::c_char) -> &'a str {
        CStr::from_ptr(ptr).to_str().expect("not valid UTF-8")
    }
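
The snippet above panics on invalid UTF-8. As a minimal sketch of a non-panicking alternative, the following returns a Result instead. The names try_ptr_to_str and round_trip are hypothetical and do not appear in the original source.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::str::Utf8Error;

/// Hypothetical fallible variant of `ptr_to_str` (not part of the original source).
///
/// # Safety
/// `ptr` must be non-null and point to a null-terminated buffer that stays
/// valid and unmodified for the lifetime `'a`.
pub unsafe fn try_ptr_to_str<'a>(ptr: *const c_char) -> Result<&'a str, Utf8Error> {
    // SAFETY: forwarded from this function's own contract.
    let cstr: &'a CStr = unsafe { CStr::from_ptr(ptr) };
    cstr.to_str()
}

/// Safe round trip: &str -> CString -> raw pointer -> &str -> String.
/// Returns `None` if `s` contains an interior NUL byte.
pub fn round_trip(s: &str) -> Option<String> {
    let owned = CString::new(s).ok()?;
    // SAFETY: `owned` lives until the end of this function, and the borrowed
    // `&str` is copied into a `String` before `owned` is dropped.
    let borrowed = unsafe { try_ptr_to_str(owned.as_ptr()) };
    borrowed.ok().map(str::to_owned)
}
```

Returning Result leaves the decision about invalid input to the caller, which is usually preferable at an FFI boundary where the C side controls the bytes.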

    Key Differences

  • Safety model: Rust requires explicit unsafe for these operations; OCaml achieves safety through the GC and type system without explicit unsafe regions.
  • FFI approach: Rust uses raw C types directly with extern "C"; OCaml uses ctypes which wraps C types in OCaml values.
  • Memory control: Rust allows complete control over memory layout (#[repr(C)], custom allocators); OCaml's GC manages memory layout automatically.
  • Auditability: Rust unsafe regions are syntactically visible and toolable; OCaml unsafe operations (Obj.magic, direct C calls) are also explicit but less common.
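
To make the "raw C types" and "#[repr(C)]" points concrete, here is a small sketch. The Label struct and both functions are hypothetical illustrations; label_name_len is a stand-in for a C function, written in Rust so that the example needs no C compiler.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

/// Hypothetical C-compatible record. `#[repr(C)]` lays the fields out
/// using C's ordering and padding rules.
#[repr(C)]
pub struct Label {
    pub id: c_int,
    pub name: *const c_char,
}

/// Stand-in for a C function (assumption: a real one would live in a C library).
///
/// # Safety
/// `label` must be null or point to a valid `Label` whose `name` is null or
/// a null-terminated string.
pub unsafe extern "C" fn label_name_len(label: *const Label) -> usize {
    if label.is_null() {
        return 0;
    }
    // SAFETY: `label` is non-null and valid per the contract above.
    let name = unsafe { (*label).name };
    if name.is_null() {
        return 0;
    }
    // SAFETY: `name` is non-null and null-terminated per the contract above.
    let cstr = unsafe { CStr::from_ptr(name) };
    cstr.to_bytes().len()
}

/// Safe wrapper: the only unsafe region is the single call below.
pub fn name_len(id: c_int, name: &str) -> Option<usize> {
    let owned = CString::new(name).ok()?;
    let label = Label { id, name: owned.as_ptr() };
    // SAFETY: `label` and `owned` are both alive for the duration of the call.
    Some(unsafe { label_name_len(&label) })
}
```

The wrapper keeps the CString in a local variable so the pointer stored in the struct stays valid until the call returns.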

    OCaml Approach

    OCaml's GC and type system eliminate most of the need for these unsafe operations. The equivalent functionality typically uses:

  • C FFI via the ctypes library for external function calls
  • Bigarray for controlled raw memory access
  • The GC for memory management (no manual allocators needed)
  • Bytes.t for mutable byte sequences

    OCaml programs rarely need operations equivalent to these Rust unsafe patterns.

    Full Source

    #![allow(clippy::all)]
    //! 712: String / CString / CStr Conversion for FFI
    //!
    //! Rust strings (`&str`, `String`) are UTF-8 and carry an explicit length
    //! (they are not null-terminated).
    //! C strings (`char*`) are null-terminated and encoding-agnostic.
    //! `CString` and `CStr` bridge these two worlds without leaking memory
    //! or invoking undefined behaviour.
    //!
    //! Two directions:
    //!   Rust → C: `CString::new(s)`: heap-allocated, null-terminated, owned.
    //!   C → Rust: `CStr::from_ptr(ptr)`: borrows the C buffer, zero-copy.
    
    use std::ffi::{CStr, CString, NulError};
    use std::os::raw::c_char;
    
    // ── Rust → C direction ────────────────────────────────────────────────────
    
    /// Convert a Rust `&str` into a heap-allocated, null-terminated `CString`.
    ///
    /// Returns `Err` if `s` contains an interior NUL byte, which would silently
    /// truncate the string from C's perspective.
    pub fn rust_to_cstring(s: &str) -> Result<CString, NulError> {
        CString::new(s)
    }
    
    /// Get the raw `*const c_char` pointer from a `CString` for passing to C.
    ///
    /// The pointer is valid only as long as the `CString` is alive, so store the
    /// `CString` in a local variable for the duration of the FFI call.
    ///
    /// This function is safe to call, but the compiler does not track the raw
    /// pointer: dereferencing it after `cs` is dropped is undefined behaviour.
    pub fn cstring_as_ptr(cs: &CString) -> *const c_char {
        cs.as_ptr()
    }
    
    // ── C → Rust direction ────────────────────────────────────────────────────
    
    /// Borrow a null-terminated C string as a `&CStr`.
    ///
    /// # Safety
    /// `ptr` must be non-null and point to a valid, null-terminated C string
    /// for at least the lifetime of the returned `&CStr`.
    pub unsafe fn ptr_to_cstr<'a>(ptr: *const c_char) -> &'a CStr {
        // SAFETY: caller guarantees ptr is non-null and null-terminated.
        CStr::from_ptr(ptr)
    }
    
    /// Convert a `&CStr` to a Rust `&str`, returning an error if the bytes are
    /// not valid UTF-8.
    pub fn cstr_to_str(cs: &CStr) -> Result<&str, std::str::Utf8Error> {
        cs.to_str()
    }
    
    /// Full conversion: C pointer → owned `String`, validating UTF-8.
    ///
    /// # Safety
    /// `ptr` must be non-null and point to a valid, null-terminated C string.
    pub unsafe fn ptr_to_string(ptr: *const c_char) -> Result<String, std::str::Utf8Error> {
        // SAFETY: propagated from caller guarantee.
        let cstr = CStr::from_ptr(ptr);
        cstr.to_str().map(str::to_owned)
    }
    
    // ── Simulated C functions (self-contained, no external linker needed) ─────
    
    /// Simulated C: returns a static greeting string (null-terminated C literal).
    ///
    /// The `c"..."` literal (Rust 1.77+) is placed in `.rodata`; `.as_ptr()` yields
    /// a `*const c_char` valid for the process lifetime.
    #[no_mangle]
    pub extern "C" fn c_greeting() -> *const c_char {
        c"Hello from the C side!".as_ptr()
    }
    
    /// Simulated C: compute the length of a null-terminated string.
    /// Returns 0 for a null pointer.
    ///
    /// # Safety
    /// `s` must be either null or point to a valid, null-terminated string.
    #[no_mangle]
    pub unsafe extern "C" fn c_strlen(s: *const c_char) -> usize {
        if s.is_null() {
            return 0;
        }
        // SAFETY: s is non-null (checked above) and the caller guarantees it is
        // null-terminated.
        CStr::from_ptr(s).to_bytes().len()
    }
    
    // ── Safe wrapper over the simulated C functions ───────────────────────────
    
    /// Retrieve the greeting from the simulated C library as an owned `String`.
    pub fn get_greeting() -> String {
        let ptr = c_greeting();
        // SAFETY: c_greeting() returns a pointer to a 'static null-terminated
        // byte literal. It is non-null and valid for the process lifetime.
        unsafe { CStr::from_ptr(ptr).to_string_lossy().into_owned() }
    }
    
    /// Compute the byte length of a Rust string via the simulated C strlen.
    pub fn string_len_via_c(s: &str) -> Result<usize, NulError> {
        let cs = CString::new(s)?;
        // SAFETY: cs is alive for the duration of this call; c_strlen only reads
        // until the null terminator.
        Ok(unsafe { c_strlen(cs.as_ptr()) })
    }
    
    // ── Tests ─────────────────────────────────────────────────────────────────
    
    #[cfg(test)]
    mod tests {
        use super::*;
    
        // ── rust_to_cstring ────────────────────────────────────────────────────
    
        #[test]
        fn test_rust_to_cstring_happy_path() {
            let cs = rust_to_cstring("hello").unwrap();
            // CStr should compare equal to the original bytes.
            assert_eq!(cs.to_str().unwrap(), "hello");
        }
    
        #[test]
        fn test_rust_to_cstring_interior_nul_is_error() {
            // A NUL byte inside the string must produce an error, not silent truncation.
            assert!(rust_to_cstring("hel\0lo").is_err());
        }
    
        #[test]
        fn test_rust_to_cstring_empty_string() {
            let cs = rust_to_cstring("").unwrap();
            assert_eq!(cs.to_str().unwrap(), "");
            // Even an empty CString is null-terminated: length in bytes == 1 (the NUL).
            assert_eq!(cs.as_bytes_with_nul().len(), 1);
        }
    
        #[test]
        fn test_rust_to_cstring_unicode() {
            // UTF-8 content survives the round-trip as long as there's no interior NUL.
            let cs = rust_to_cstring("こんにちは").unwrap();
            assert_eq!(cs.to_str().unwrap(), "こんにちは");
        }
    
        // ── ptr_to_cstr / ptr_to_string ───────────────────────────────────────
    
        #[test]
        fn test_ptr_to_cstr_from_static_literal() {
            let ptr = b"static\0".as_ptr() as *const c_char;
            // SAFETY: ptr points to a NUL-terminated byte literal with 'static lifetime.
            let s = unsafe { ptr_to_cstr(ptr) };
            assert_eq!(s.to_str().unwrap(), "static");
        }
    
        #[test]
        fn test_ptr_to_string_round_trip() {
            let original = "round-trip";
            let cs = CString::new(original).unwrap();
            // SAFETY: cs is alive for the duration of this block.
            let recovered = unsafe { ptr_to_string(cs.as_ptr()) }.unwrap();
            assert_eq!(recovered, original);
        }
    
        // ── cstr_to_str UTF-8 validation ─────────────────────────────────────
    
        #[test]
        fn test_cstr_to_str_invalid_utf8_returns_error() {
            // 0xFF is not valid UTF-8.
            let bytes = b"\xff\0";
            // SAFETY: bytes is null-terminated.
            let cs = unsafe { CStr::from_bytes_with_nul_unchecked(bytes) };
            assert!(cstr_to_str(cs).is_err());
        }
    
        // ── simulated C functions ─────────────────────────────────────────────
    
        #[test]
        fn test_c_greeting_returns_valid_string() {
            let greeting = get_greeting();
            assert_eq!(greeting, "Hello from the C side!");
        }
    
        #[test]
        fn test_c_strlen_empty() {
            assert_eq!(string_len_via_c("").unwrap(), 0);
        }
    
        #[test]
        fn test_c_strlen_ascii() {
            assert_eq!(string_len_via_c("hello").unwrap(), 5);
        }
    
        #[test]
        fn test_c_strlen_null_pointer_returns_zero() {
            // Direct call with null; the safe wrapper is not involved here.
            // SAFETY: c_strlen explicitly checks for null before dereferencing.
            assert_eq!(unsafe { c_strlen(std::ptr::null()) }, 0);
        }
    
        // ── cstring_as_ptr lifetime discipline ────────────────────────────────
    
        #[test]
        fn test_cstring_as_ptr_is_null_terminated() {
            let cs = CString::new("test").unwrap();
            let ptr = cstring_as_ptr(&cs);
            // SAFETY: cs is alive; ptr is null-terminated by CString invariant.
            let back = unsafe { CStr::from_ptr(ptr) };
            assert_eq!(back.to_str().unwrap(), "test");
        }
    }

    Deep Comparison

    OCaml vs Rust: String/CString/CStr Conversion for FFI

    Side-by-Side Code

    OCaml

    (* OCaml: manual null-termination using Bytes *)
    let to_c_string (s : string) : bytes =
      let n = String.length s in
      let b = Bytes.create (n + 1) in
      Bytes.blit_string s 0 b 0 n;
      Bytes.set b n '\000';
      b
    
    let c_strlen (b : bytes) : int =
      let rec go i =
        if i >= Bytes.length b || Bytes.get b i = '\000' then i else go (i + 1)
      in go 0
    
    let from_c_string (b : bytes) : string =
      Bytes.sub_string b 0 (c_strlen b)
    
    let () =
      let s = "Hello, FFI!" in
      let cs = to_c_string s in
      assert (c_strlen cs = String.length s);
      assert (from_c_string cs = s);
      print_endline "ok"
    

    Rust (idiomatic, using CString / CStr)

    use std::ffi::{CStr, CString};
    
    // Rust → C: allocate a heap-owned, null-terminated buffer.
    fn rust_to_cstring(s: &str) -> Result<CString, std::ffi::NulError> {
        CString::new(s)
    }
    
    // C → Rust: borrow the C buffer as a &CStr, then validate UTF-8.
    // SAFETY (caller): `ptr` must be non-null and point to a null-terminated
    // buffer that stays valid for `'a`. Panics if the bytes are not valid UTF-8.
    unsafe fn ptr_to_str<'a>(ptr: *const std::os::raw::c_char) -> &'a str {
        CStr::from_ptr(ptr).to_str().expect("not valid UTF-8")
    }
    

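    Rust (manual / functional, mirrors the OCaml recursive strlen)

    // Recursive C strlen. OCaml's `go i` carries the index forward; this version
    // counts on the way back up instead.
    // SAFETY (caller): `ptr` must be non-null and point to a null-terminated buffer.
    unsafe fn manual_strlen(ptr: *const u8) -> usize {
        if *ptr == 0 { 0 } else { 1 + manual_strlen(ptr.add(1)) }
    }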
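
Rust does not guarantee tail-call elimination, so the recursive version above uses one stack frame per byte and could overflow the stack on a long string. A loop avoids that. This is a minimal sketch; iterative_strlen is a hypothetical name that is not in the original source.

```rust
/// Iterative counterpart to the recursive strlen (hypothetical helper).
///
/// # Safety
/// `ptr` must be non-null and point to a null-terminated buffer.
pub unsafe fn iterative_strlen(ptr: *const u8) -> usize {
    let mut len = 0;
    // SAFETY: the caller guarantees a NUL terminator exists, so every offset
    // read before reaching it lies inside the buffer.
    while unsafe { *ptr.add(len) } != 0 {
        len += 1;
    }
    len
}
```

In practice CStr::from_ptr(ptr).to_bytes().len() does the same job and is the form used in the full source.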
    

    Type Signatures

    Concept              | OCaml                                   | Rust
    Rust-owned C string  | bytes (manual)                          | CString
    Borrowed C string    | bytes slice                             | &CStr
    Raw C pointer        | 'a Bigarray / nativeint                 | *const c_char
    Conversion to string | Bytes.sub_string                        | CStr::to_str() -> Result<&str, Utf8Error>
    Interior NUL guard   | runtime String.contains '\000' (manual) | CString::new returns Err(NulError)
    UTF-8 validation     | no built-in (OCaml is byte-agnostic)    | CStr::to_str() enforces UTF-8
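
The Rust column of the table can be walked through in a single function. This is an illustrative sketch; walk_types is a hypothetical helper, and it returns the byte length including the terminator alongside the recovered text.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;

/// Hypothetical helper that touches each Rust type from the table once.
pub fn walk_types(input: &str) -> Result<(usize, String), NulError> {
    // Owned C string: `CString::new` rejects interior NUL bytes.
    let owned: CString = CString::new(input)?;
    // Borrowed C string: a view into `owned`, no copy.
    let borrowed: &CStr = owned.as_c_str();
    // Raw C pointer: the value that would actually cross the FFI boundary.
    let raw: *const c_char = borrowed.as_ptr();
    // SAFETY: `raw` points into `owned`, which is alive and null-terminated.
    let view: &CStr = unsafe { CStr::from_ptr(raw) };
    // Conversion to string: the input was a `&str`, so the bytes are valid
    // UTF-8 and the lossy conversion replaces nothing.
    let text = view.to_string_lossy().into_owned();
    Ok((view.to_bytes_with_nul().len(), text))
}
```

For the input "abc" the first tuple element is 4, because the terminator is counted.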

    Key Insights

  • Ownership encodes lifetime: In OCaml, to_c_string returns a bytes value whose lifetime is managed by the GC, so there is no dangling-pointer risk. In Rust, CString is a heap-allocated RAII type that frees its buffer on drop. Calling .as_ptr() yields a raw pointer that carries no lifetime, so the compiler does not check that the CString outlives it; keeping the CString bound to a variable for the duration of the FFI call is the programmer's responsibility. Borrowing as &CStr, by contrast, is checked statically.
  • Null-termination is a type invariant: OCaml's bytes is just bytes, and the programmer manually appends '\000'. Rust's CString guarantees null-termination by construction; you cannot create one without the terminator, and the safe constructors (CStr::from_bytes_with_nul, for example) refuse bytes that aren't null-terminated.
  • Interior NUL is surfaced as an error: If the Rust string contains '\0', CString::new returns Err(NulError) at runtime instead of silently truncating the C string at the first NUL, a common source of FFI security bugs. The Result return type forces the caller to handle that case.
  • UTF-8 is validated in one direction only: OCaml strings are byte sequences without encoding guarantees. Rust &str is always UTF-8, so going from Rust to C needs no encoding check. When reading a *const c_char from C, CStr::to_str() validates UTF-8 and returns Err(Utf8Error) rather than producing a corrupted &str.
  • Zero-copy on the C→Rust path: CStr::from_ptr borrows the C buffer directly, with no allocation and no copy (though the current implementation scans for the terminator to compute the length). OCaml's from_c_string always allocates a new string. Rust pays for allocation only when you call .to_owned() or .to_string_lossy().into_owned().
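
Two of these points can be observed directly from the standard library. This is a minimal sketch and both function names are hypothetical. NulError::nul_position reports where the interior NUL sits, and to_string_lossy returns a borrowed Cow when the bytes are already valid UTF-8.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString};

/// Byte offset of the first interior NUL, or `None` if `s` converts cleanly.
pub fn interior_nul_position(s: &str) -> Option<usize> {
    CString::new(s).err().map(|e| e.nul_position())
}

/// True when `to_string_lossy` could borrow the C buffer instead of
/// allocating, which happens when the bytes are valid UTF-8.
pub fn lossy_is_borrowed(cs: &CStr) -> bool {
    matches!(cs.to_string_lossy(), Cow::Borrowed(_))
}
```

The borrowed case is the zero-copy path. An allocation happens only if invalid bytes have to be replaced or the caller asks for an owned String.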

    When to Use Each Style

    Use CString / CStr (idiomatic Rust) when: calling real C libraries (libc, system calls, C extensions). These types guarantee null-termination by construction, turn interior-NUL and UTF-8 problems into explicit Result errors instead of silent corruption, and make FFI audits easier.

    Use manual byte manipulation when: you need precise control over the buffer layout (e.g., fixed-size stack buffers, MaybeUninit patterns for output parameters) or when interoperating with non-UTF-8 encodings, where CStr::to_str() would fail and working with the raw bytes (or to_string_lossy() for display) better reflects intent.
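
The fixed-size stack buffer case might look like the sketch below. fill_name is a hypothetical stand-in for a C API that writes into a caller-provided buffer, written in Rust so that the example links without C. The buffer size of 16 and the string "device0" are arbitrary illustration values.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Stand-in for a C function that fills an output buffer (assumption: a real
/// one would come from a C library). Writes at most `cap - 1` bytes plus a
/// terminator and returns the number of bytes written before the terminator.
///
/// # Safety
/// `buf` must be null or valid for writes of `cap` bytes.
pub unsafe extern "C" fn fill_name(buf: *mut c_char, cap: usize) -> usize {
    let name = b"device0";
    if buf.is_null() || cap == 0 {
        return 0;
    }
    let n = name.len().min(cap - 1);
    // SAFETY: `n + 1 <= cap`, so both the copy and the terminator write stay
    // inside the caller's buffer.
    unsafe {
        std::ptr::copy_nonoverlapping(name.as_ptr(), buf as *mut u8, n);
        *buf.add(n) = 0;
    }
    n
}

/// Safe wrapper: stack buffer in, owned `String` out.
pub fn read_name() -> String {
    let mut buf = [0u8; 16];
    // SAFETY: `buf` is a live 16-byte array and its true capacity is passed.
    unsafe { fill_name(buf.as_mut_ptr() as *mut c_char, buf.len()) };
    // `from_bytes_until_nul` (Rust 1.69+) stops at the first NUL without
    // requiring it to be the last byte of the slice.
    match CStr::from_bytes_until_nul(&buf) {
        Ok(cs) => cs.to_string_lossy().into_owned(),
        Err(_) => String::new(),
    }
}
```

Reading the result back through CStr::from_bytes_until_nul keeps the parsing step safe even though the write happened through a raw pointer.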

    Exercises

  • Minimize unsafe: Find the smallest possible unsafe region in the source and verify that all safe code is outside the unsafe block.
  • Safe alternative: Identify if a safe alternative exists for the demonstrated technique (e.g., bytemuck for transmute, CString for FFI strings) and implement it.
  • SAFETY documentation: Write a complete SAFETY comment for each unsafe block listing preconditions, invariants, and what would break if violated.

    Open Source Repos