794-word-break-dp — Word Break
Tutorial Video
Text description (accessibility)
This video demonstrates the "794-word-break-dp — Word Break" functional Rust example. Difficulty level: Intermediate. Key concepts covered: Functional Programming. Given a string and a dictionary of words, determine if the string can be segmented into a sequence of dictionary words. Key difference from OCaml: 1. **HashSet vs Hashtbl**: Rust uses `HashSet<&str>` for O(1) lookup; OCaml's `Hashtbl.mem` provides the same.
Tutorial
The Problem
Given a string and a dictionary of words, determine if the string can be segmented into a sequence of dictionary words. This is used in natural language processing (Chinese word segmentation — Chinese text has no spaces), search engine query parsing, and code tokenization. The DP approach avoids the exponential backtracking of a naive recursive search by caching whether each prefix is breakable.
🎯 Learning Outcomes
dp[i] = whether s[:i] can be formed from dictionary wordsdp[i] = any j: dp[j] && s[j..i] in dictHashSet for O(1) word lookups vs. O(k) linear scanCode Example
#![allow(clippy::all)]
//! # Word Break
use std::collections::HashSet;
pub fn word_break(s: &str, dict: &[&str]) -> bool {
let words: HashSet<&str> = dict.iter().copied().collect();
let n = s.len();
let mut dp = vec![false; n + 1];
dp[0] = true;
for i in 1..=n {
for j in 0..i {
if dp[j] && words.contains(&s[j..i]) {
dp[i] = true;
break;
}
}
}
dp[n]
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_word_break() {
assert!(word_break("leetcode", &["leet", "code"]));
}
#[test]
fn test_no_break() {
assert!(!word_break(
"catsandog",
&["cats", "dog", "sand", "and", "cat"]
));
}
}Key Differences
HashSet<&str> for O(1) lookup; OCaml's Hashtbl.mem provides the same.s[j..i] string slice is a &str; OCaml's String.sub creates a new string allocation on each check — a minor performance difference.break exits the inner loop once dp[i] is true; OCaml uses a boolean flag or exception-based early exit.OCaml Approach
OCaml implements with Array.make (n+1) false and uses Hashtbl for the dictionary. The inner loop uses try with Hashtbl.find or Hashtbl.mem. Functional style uses Array.init + List.exists over the dictionary. The words list can be sorted by length to prune impossible substrings early. OCaml's Re library provides regex-based word segmentation as an alternative approach.
Full Source
#![allow(clippy::all)]
//! # Word Break
use std::collections::HashSet;
pub fn word_break(s: &str, dict: &[&str]) -> bool {
let words: HashSet<&str> = dict.iter().copied().collect();
let n = s.len();
let mut dp = vec![false; n + 1];
dp[0] = true;
for i in 1..=n {
for j in 0..i {
if dp[j] && words.contains(&s[j..i]) {
dp[i] = true;
break;
}
}
}
dp[n]
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_word_break() {
assert!(word_break("leetcode", &["leet", "code"]));
}
#[test]
fn test_no_break() {
assert!(!word_break(
"catsandog",
&["cats", "dog", "sand", "and", "cat"]
));
}
}#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_word_break() {
assert!(word_break("leetcode", &["leet", "code"]));
}
#[test]
fn test_no_break() {
assert!(!word_break(
"catsandog",
&["cats", "dog", "sand", "and", "cat"]
));
}
}
Deep Comparison
OCaml vs Rust: Word Break Dp
Overview
See the example.rs and example.ml files for detailed implementations.
Key Differences
| Aspect | OCaml | Rust |
|---|---|---|
| Type system | Hindley-Milner | Ownership + traits |
| Memory | GC | Zero-cost abstractions |
| Mutability | Explicit ref | mut keyword |
| Error handling | Option/Result | Result<T, E> |
See README.md for detailed comparison.
Exercises
word_break_all(s, dict) -> Vec<String> that returns all valid space-separated segmentations (e.g., ["cat sand dog", "cats and dog"]).min_word_len and max_word_len in the dictionary, reducing the inner loop iterations.