512 Intermediate

Closure State Machine

Functional Programming

Tutorial Video

Text description (accessibility)

This video demonstrates the "Closure State Machine" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. State machines appear in: lexers (tokenising source code), protocol parsers (HTTP state machine), UI event systems (button: idle → pressed → released), and regex engines. Key difference from OCaml: 1. **`Box` overhead**: Each state transition in Rust allocates a `Box`; OCaml's closures are GC

Tutorial

The Problem

State machines appear in: lexers (tokenising source code), protocol parsers (HTTP state machine), UI event systems (button: idle → pressed → released), and regex engines. The traditional implementation uses an enum of states with a match on transitions. A functional alternative represents each state as a closure that takes input and returns the next state — the continuation-passing style. This approach is compositional: states are values that can be passed, stored, and combined.

🎯 Learning Outcomes

• Represent states as Box<dyn Fn(char) -> StateResult> — closures as transitions

• Define StateResult as an enum: Accept | Reject | Continue(Box<dyn Fn(char) -> StateResult>)

• Drive the machine with a run_machine(input: &str) -> bool loop

• Understand how Continue(next_state) is the continuation-passing equivalent of goto state

• Recognise that free functions state_start, state_after_a, state_after_b serve as state closures

Code Example

pub enum StateResult {
    Accept,
    Reject,
    Continue(Box<dyn Fn(char) -> StateResult>),
}

pub fn state_start(c: char) -> StateResult {
    match c {
        'a' => StateResult::Continue(Box::new(state_after_a)),
        'b' => StateResult::Continue(Box::new(state_after_b)),
        _ => StateResult::Reject,
    }
}

type state_result = Accept | Reject | Continue of (char -> state_result)

let rec state_start c = match c with
  | 'a' -> Continue state_after_a
  | 'b' -> Continue state_after_b
  | _ -> Reject

Key Differences

**Box overhead**: Each state transition in Rust allocates a Box; OCaml's closures are GC-managed without explicit boxing.

Mutual recursion: OCaml uses let rec ... and ... for mutually recursive state functions; Rust uses separately defined free functions (no mutual fn recursion syntax needed).

Type-state alternative: For compile-time state machine verification, Rust's type-state pattern (using distinct types for each state) is preferred over runtime closures.

**Debug for StateResult**: Rust must manually implement Debug for StateResult because Box<dyn Fn> is not Debug; OCaml's polymorphic printing handles this automatically.

OCaml Approach

OCaml's algebraic types and first-class functions make this pattern natural:

type result = Accept | Reject | Continue of (char -> result)

let rec state_start c = match c with
  | 'a' -> Continue state_after_a
  | 'b' -> Continue state_after_b
  | _ -> Reject

and state_after_b c = match c with
  | 'b' -> Continue state_after_b
  | _ -> Reject

let run input =
  String.fold_left (fun state c -> match state c with
    | Continue next -> next
    | other -> Fun.const other) state_start input = Accept

OCaml's and (mutually recursive definitions) is cleaner than Rust's free functions here.

Full Source

#![allow(clippy::all)]
//! Closures as State Machine Transitions
//!
//! Pattern: a*b+ — zero-or-more 'a' followed by one-or-more 'b'

/// State machine result.
pub enum StateResult {
    Accept,
    Reject,
    Continue(Box<dyn Fn(char) -> StateResult>),
}

impl std::fmt::Debug for StateResult {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            StateResult::Accept => write!(f, "Accept"),
            StateResult::Reject => write!(f, "Reject"),
            StateResult::Continue(_) => write!(f, "Continue(<fn>)"),
        }
    }
}

/// State: start — expects 'a' or 'b'
pub fn state_start(c: char) -> StateResult {
    match c {
        'a' => StateResult::Continue(Box::new(state_after_a)),
        'b' => StateResult::Continue(Box::new(state_after_b)),
        _ => StateResult::Reject,
    }
}

/// State: seen at least one 'a', expecting more 'a' or 'b'
pub fn state_after_a(c: char) -> StateResult {
    match c {
        'a' => StateResult::Continue(Box::new(state_after_a)),
        'b' => StateResult::Continue(Box::new(state_after_b)),
        _ => StateResult::Reject,
    }
}

/// State: seen at least one 'b' after a's — expecting more 'b' or end
pub fn state_after_b(c: char) -> StateResult {
    match c {
        'b' => StateResult::Continue(Box::new(state_after_b)),
        _ => StateResult::Reject,
    }
}

/// Run the state machine on input.
pub fn run_machine(input: &str) -> bool {
    let mut state: Box<dyn Fn(char) -> StateResult> = Box::new(state_start);
    let mut saw_b = false;

    for c in input.chars() {
        match state(c) {
            StateResult::Accept => return true,
            StateResult::Reject => return false,
            StateResult::Continue(next) => {
                if c == 'b' {
                    saw_b = true;
                }
                state = next;
            }
        }
    }

    // Must have seen at least one 'b' to accept
    saw_b
}

/// Alternative: enum-based state machine (more idiomatic Rust)
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Start,
    AfterA,
    AfterB,
    Reject,
}

pub fn transition(state: State, c: char) -> State {
    match (state, c) {
        (State::Start, 'a') => State::AfterA,
        (State::Start, 'b') => State::AfterB,
        (State::AfterA, 'a') => State::AfterA,
        (State::AfterA, 'b') => State::AfterB,
        (State::AfterB, 'b') => State::AfterB,
        _ => State::Reject,
    }
}

pub fn run_enum_machine(input: &str) -> bool {
    let final_state = input.chars().fold(State::Start, transition);
    final_state == State::AfterB
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_accepts_b() {
        assert!(run_machine("b"));
    }

    #[test]
    fn test_accepts_ab() {
        assert!(run_machine("ab"));
    }

    #[test]
    fn test_accepts_aab() {
        assert!(run_machine("aab"));
    }

    #[test]
    fn test_accepts_aaabbb() {
        assert!(run_machine("aaabbb"));
    }

    #[test]
    fn test_rejects_a_only() {
        assert!(!run_machine("a"));
        assert!(!run_machine("aaa"));
    }

    #[test]
    fn test_rejects_empty() {
        assert!(!run_machine(""));
    }

    #[test]
    fn test_enum_machine_matches() {
        for input in &["b", "ab", "aab", "aaabbb", "bbb"] {
            assert_eq!(run_machine(input), run_enum_machine(input));
        }
        for input in &["a", "aaa", "", "ba", "aba"] {
            assert_eq!(run_machine(input), run_enum_machine(input));
        }
    }
}

(* State machine using closures in OCaml *)
(* Parsing a simple pattern: a*b+ (zero or more 'a', one or more 'b') *)

type result = Accept | Reject | Continue of (char -> result)

let rec state_start () : char -> result =
  fun c -> match c with
    | 'a' -> Continue (state_a ())
    | 'b' -> Continue (state_b ())
    | _   -> Reject

and state_a () : char -> result =
  fun c -> match c with
    | 'a' -> Continue (state_a ())
    | 'b' -> Continue (state_b ())
    | _   -> Reject

and state_b () : char -> result =
  fun c -> match c with
    | 'b' -> Continue (state_b ())
    | _   -> Reject

let run_machine input =
  let chars = List.of_seq (String.to_seq input) in
  match chars with
  | [] -> false  (* empty string doesn't match a*b+ *)
  | first :: rest ->
    let initial = state_start () first in
    let rec step state chars =
      match chars with
      | [] -> (match state with Accept -> true | Continue _ -> false | Reject -> false)
      | c :: cs ->
        match state with
        | Continue f -> step (f c) cs
        | Accept -> false
        | Reject -> false
    in
    (* Check if last state after processing all is Accept or we need final check *)
    let final_state = List.fold_left (fun st c ->
      match st with
      | Continue f -> f c
      | other -> other
    ) initial rest in
    match final_state with
    | Continue _ -> false  (* not in accept state at end *)
    | Accept -> true
    | Reject -> false

let () =
  let tests = ["b"; "ab"; "aab"; "abb"; "aabb"; ""; "a"; "ba"; "abc"] in
  List.iter (fun s ->
    Printf.printf "%S -> %b\n" s (run_machine s)
  ) tests

✓ Tests Rust test suite

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_accepts_b() {
        assert!(run_machine("b"));
    }

    #[test]
    fn test_accepts_ab() {
        assert!(run_machine("ab"));
    }

    #[test]
    fn test_accepts_aab() {
        assert!(run_machine("aab"));
    }

    #[test]
    fn test_accepts_aaabbb() {
        assert!(run_machine("aaabbb"));
    }

    #[test]
    fn test_rejects_a_only() {
        assert!(!run_machine("a"));
        assert!(!run_machine("aaa"));
    }

    #[test]
    fn test_rejects_empty() {
        assert!(!run_machine(""));
    }

    #[test]
    fn test_enum_machine_matches() {
        for input in &["b", "ab", "aab", "aaabbb", "bbb"] {
            assert_eq!(run_machine(input), run_enum_machine(input));
        }
        for input in &["a", "aaa", "", "ba", "aba"] {
            assert_eq!(run_machine(input), run_enum_machine(input));
        }
    }
}

Deep Comparison

OCaml vs Rust: Closure State Machines

OCaml

type state_result = Accept | Reject | Continue of (char -> state_result)

let rec state_start c = match c with
  | 'a' -> Continue state_after_a
  | 'b' -> Continue state_after_b
  | _ -> Reject

Rust

pub enum StateResult {
    Accept,
    Reject,
    Continue(Box<dyn Fn(char) -> StateResult>),
}

pub fn state_start(c: char) -> StateResult {
    match c {
        'a' => StateResult::Continue(Box::new(state_after_a)),
        'b' => StateResult::Continue(Box::new(state_after_b)),
        _ => StateResult::Reject,
    }
}

Key Differences

OCaml: Functions stored directly in variants

Rust: Need Box<dyn Fn> for dynamic dispatch

Both: State transitions as function returns

Rust: Enum-based approach often more idiomatic

Both support modeling complex state machines functionally

Exercises

Extend the DFA: Add StateResult::Error(String) for states that can provide error messages, and implement a new state that recognises (a|b)*c (any sequence ending in c).

State machine builder: Design a DfaBuilder that accepts a HashMap<(StateId, char), StateId> transition table and generates the corresponding closure-based state machine.

Type-state comparison: Rewrite the a*b+ recogniser using the type-state pattern (each state is a distinct struct type, transitions are methods) and compare the compile-time guarantees vs. the closure approach.

Open Source Repos

functional-rust

View the source for this example on GitHub — OCaml and Rust side by side in the repo.

Rust