Lesson 4
String Searching Algorithms
Lesson Overview

Welcome to this insightful lesson on string searching algorithms, a core part of programming often used in software development, database design, and information retrieval. Today, we’ll introduce you to one of the most efficient approaches — the Knuth-Morris-Pratt (KMP) algorithm.

Example: Using the KMP Algorithm for Pattern Search

The KMP algorithm optimizes string searching by avoiding unnecessary backtracking. Rather than re-checking characters in the text, it uses a precomputed array called the Longest Prefix Suffix (LPS) array to skip over parts that have already been matched. Here’s the code:

Ruby
1def kmp_search(text, pattern) 2 # Helper function to compute the Longest Prefix Suffix (LPS) array 3 def compute_lps(pattern) 4 lps = Array.new(pattern.length, 0) 5 length = 0 6 i = 1 7 8 # Build the LPS array for the pattern 9 while i < pattern.length 10 if pattern[i] == pattern[length] 11 length += 1 12 lps[i] = length 13 i += 1 14 else 15 # If there's a mismatch and we have a prefix match, use the LPS 16 length = length != 0 ? lps[length - 1] : 0 17 i += 1 if length == 0 18 end 19 end 20 lps 21 end 22 23 # Compute the LPS array and initialize pointers for text and pattern 24 lps = compute_lps(pattern) 25 i = j = 0 26 27 # Search for the pattern within the text 28 while i < text.length 29 if pattern[j] == text[i] 30 i += 1 31 j += 1 32 end 33 34 # If we reached the end of the pattern, we found a match 35 return i - j if j == pattern.length 36 37 # If there's a mismatch after some matches 38 if i < text.length && pattern[j] != text[i] 39 j = j != 0 ? lps[j - 1] : 0 40 i += 1 if j == 0 41 end 42 end 43 44 -1 # Return -1 if no match is found 45end 46 47# Test 48puts kmp_search("abxabcabcaby", "abcaby") # Output: 6

Here's what's happening in the above code:

  1. The compute_lps function builds the Longest Prefix Suffix (LPS) array for the pattern, helping determine skips in case of mismatches. For each character, if it matches the previous prefix, length is incremented; otherwise, it resets to avoid redundant checks.

  2. The main function then uses this array to match characters in pattern and text. On mismatches, it leverages lps to skip parts of the pattern, avoiding unnecessary comparisons.

This example shows how KMP efficiently finds the first occurrence of pattern in text. Let me know if you have any questions about a specific part of the algorithm!

What’s Next? Practice Makes Perfect

Ready to deepen your understanding? We’ll dive into hands-on exercises to solidify your grasp of these concepts and help you see how these algorithms can be applied in real-world programming tasks. Remember, mastering the "why" behind each step will set you up for success in tackling complex problems!

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.