Welcome to this insightful lesson on string searching algorithms, a core part of programming often used in software development, database design, and information retrieval. Today, we’ll introduce you to one of the most efficient approaches — the Knuth-Morris-Pratt (KMP) algorithm.
The KMP algorithm optimizes string searching by avoiding unnecessary backtracking. Rather than re-checking characters in the text, it uses a precomputed array called the Longest Prefix Suffix (LPS) array to skip over parts that have already been matched. Here’s the code:
Ruby1def kmp_search(text, pattern) 2 # Helper function to compute the Longest Prefix Suffix (LPS) array 3 def compute_lps(pattern) 4 lps = Array.new(pattern.length, 0) 5 length = 0 6 i = 1 7 8 # Build the LPS array for the pattern 9 while i < pattern.length 10 if pattern[i] == pattern[length] 11 length += 1 12 lps[i] = length 13 i += 1 14 else 15 # If there's a mismatch and we have a prefix match, use the LPS 16 length = length != 0 ? lps[length - 1] : 0 17 i += 1 if length == 0 18 end 19 end 20 lps 21 end 22 23 # Compute the LPS array and initialize pointers for text and pattern 24 lps = compute_lps(pattern) 25 i = j = 0 26 27 # Search for the pattern within the text 28 while i < text.length 29 if pattern[j] == text[i] 30 i += 1 31 j += 1 32 end 33 34 # If we reached the end of the pattern, we found a match 35 return i - j if j == pattern.length 36 37 # If there's a mismatch after some matches 38 if i < text.length && pattern[j] != text[i] 39 j = j != 0 ? lps[j - 1] : 0 40 i += 1 if j == 0 41 end 42 end 43 44 -1 # Return -1 if no match is found 45end 46 47# Test 48puts kmp_search("abxabcabcaby", "abcaby") # Output: 6
Here's what's happening in the above code:
-
The
compute_lps
function builds the Longest Prefix Suffix (LPS) array for thepattern
, helping determine skips in case of mismatches. For each character, if it matches the previous prefix,length
is incremented; otherwise, it resets to avoid redundant checks. -
The main function then uses this array to match characters in
pattern
andtext
. On mismatches, it leverageslps
to skip parts of the pattern, avoiding unnecessary comparisons.
This example shows how KMP efficiently finds the first occurrence of pattern
in text
. Let me know if you have any questions about a specific part of the algorithm!
Ready to deepen your understanding? We’ll dive into hands-on exercises to solidify your grasp of these concepts and help you see how these algorithms can be applied in real-world programming tasks. Remember, mastering the "why" behind each step will set you up for success in tackling complex problems!