ExamplesBy LevelBy TopicLearning Paths
478 Fundamental

String Searching

Functional Programming

Tutorial

The Problem

Searching strings is fundamental: checking whether a URL starts with "https://", finding the position of a delimiter for manual parsing, counting occurrences of a substring, validating file extensions. A well-designed API avoids returning raw byte indices without validation, distinguishes first-occurrence (find) from last-occurrence (rfind), and supports both exact substring and character-predicate matching through a single interface.

🎯 Learning Outcomes

  • • Check substring presence with .contains(pat)
  • • Find the byte offset of the first match with .find(pat) returning Option<usize>
  • • Find the last occurrence with .rfind(pat)
  • • Check prefix/suffix with .starts_with(pat) / .ends_with(pat)
  • • Count non-overlapping occurrences with .matches(pat).count()
  • Code Example

    #![allow(clippy::all)]
    // 478. contains(), find(), starts_with()
    
    #[cfg(test)]
    mod tests {
        #[test]
        fn test_contains() {
            assert!("hello world".contains("world"));
            assert!(!"hello".contains("xyz"));
        }
        #[test]
        fn test_starts() {
            assert!("hello".starts_with("hel"));
            assert!(!"hello".ends_with("hel"));
        }
        #[test]
        fn test_find() {
            assert_eq!("hello".find('l'), Some(2));
            assert_eq!("hello".find('z'), None);
        }
        #[test]
        fn test_rfind() {
            assert_eq!("hello".rfind('l'), Some(3));
        }
        #[test]
        fn test_matches() {
            assert_eq!("aaabaa".matches('a').count(), 5);
        }
    }

    Key Differences

  • Pattern abstraction: Rust's Pattern trait unifies char, &str, &[char], and closures under one API; OCaml has separate functions for each (index_opt, contains, Str.search_forward).
  • Byte vs. char offset: Rust's find returns a byte offset (always a valid char boundary for the match start); OCaml's String.index_opt returns a byte offset (same caveat for non-ASCII).
  • **matches iterator**: Rust's .matches(pat) returns a lazy iterator; OCaml has no standard equivalent — count requires a loop.
  • **starts_with/ends_with**: Available in OCaml only since 4.13; Rust has had them since 1.0.
  • OCaml Approach

    (* contains *)
    let contains_sub s sub =
      let n = String.length sub in
      let found = ref false in
      for i = 0 to String.length s - n do
        if String.sub s i n = sub then found := true
      done; !found
    
    (* find — String.index_opt for single char *)
    String.index_opt "hello" 'l'   (* Some 2 *)
    
    (* starts_with — OCaml 4.13+ *)
    String.starts_with ~prefix:"hel" "hello"  (* true *)
    

    OCaml 4.13 added String.starts_with and String.ends_with. For older versions and substring search, Str.search_forward or astring were the common choices.

    Full Source

    #![allow(clippy::all)]
    // 478. contains(), find(), starts_with()
    
    #[cfg(test)]
    mod tests {
        #[test]
        fn test_contains() {
            assert!("hello world".contains("world"));
            assert!(!"hello".contains("xyz"));
        }
        #[test]
        fn test_starts() {
            assert!("hello".starts_with("hel"));
            assert!(!"hello".ends_with("hel"));
        }
        #[test]
        fn test_find() {
            assert_eq!("hello".find('l'), Some(2));
            assert_eq!("hello".find('z'), None);
        }
        #[test]
        fn test_rfind() {
            assert_eq!("hello".rfind('l'), Some(3));
        }
        #[test]
        fn test_matches() {
            assert_eq!("aaabaa".matches('a').count(), 5);
        }
    }
    ✓ Tests Rust test suite
    #[cfg(test)]
    mod tests {
        #[test]
        fn test_contains() {
            assert!("hello world".contains("world"));
            assert!(!"hello".contains("xyz"));
        }
        #[test]
        fn test_starts() {
            assert!("hello".starts_with("hel"));
            assert!(!"hello".ends_with("hel"));
        }
        #[test]
        fn test_find() {
            assert_eq!("hello".find('l'), Some(2));
            assert_eq!("hello".find('z'), None);
        }
        #[test]
        fn test_rfind() {
            assert_eq!("hello".rfind('l'), Some(3));
        }
        #[test]
        fn test_matches() {
            assert_eq!("aaabaa".matches('a').count(), 5);
        }
    }

    Exercises

  • All occurrences: Write find_all(haystack: &str, needle: &str) -> Vec<usize> that returns the byte offset of every non-overlapping occurrence using find in a loop.
  • Fuzzy prefix match: Write matches_prefix_ci(s: &str, prefix: &str) -> bool that does case-insensitive prefix checking without allocating a lowercase copy.
  • **Regex vs. find**: Benchmark .contains("error") vs. a compiled Regex pattern for scanning 10,000 log lines; measure when the regex overhead becomes worthwhile.
  • Open Source Repos