String Searching Algorithms

Lesson 4

Lesson Overview

Welcome to this insightful lesson on string searching algorithms, a core part of programming often used in software development, database design, and information retrieval. Today, we’ll introduce you to one of the most efficient approaches — the Knuth-Morris-Pratt (KMP) algorithm.

Example: Using the KMP Algorithm for Pattern Search

The KMP algorithm optimizes string searching by avoiding unnecessary backtracking. Rather than re-checking characters in the text, it uses a precomputed array called the Longest Prefix Suffix (LPS) array to skip over parts that have already been matched. Here’s the code:

Ruby
1def kmp_search(text, pattern)
2  # Helper function to compute the Longest Prefix Suffix (LPS) array
3  def compute_lps(pattern)
4    lps = Array.new(pattern.length, 0)
5    length = 0
6    i = 1
7
8    # Build the LPS array for the pattern
9    while i < pattern.length
10      if pattern[i] == pattern[length]
11        length += 1
12        lps[i] = length
13        i += 1
14      else
15        # If there's a mismatch and we have a prefix match, use the LPS
16        length = length != 0 ? lps[length - 1] : 0
17        i += 1 if length == 0
18      end
19    end
20    lps
21  end
22
23  # Compute the LPS array and initialize pointers for text and pattern
24  lps = compute_lps(pattern)
25  i = j = 0
26
27  # Search for the pattern within the text
28  while i < text.length
29    if pattern[j] == text[i]
30      i += 1
31      j += 1
32    end
33
34    # If we reached the end of the pattern, we found a match
35    return i - j if j == pattern.length
36
37    # If there's a mismatch after some matches
38    if i < text.length && pattern[j] != text[i]
39      j = j != 0 ? lps[j - 1] : 0
40      i += 1 if j == 0
41    end
42  end
43
44  -1  # Return -1 if no match is found
45end
46
47# Test
48puts kmp_search("abxabcabcaby", "abcaby")  # Output: 6

Here's what's happening in the above code:

The compute_lps function builds the Longest Prefix Suffix (LPS) array for the pattern, helping determine skips in case of mismatches. For each character, if it matches the previous prefix, length is incremented; otherwise, it resets to avoid redundant checks.
The main function then uses this array to match characters in pattern and text. On mismatches, it leverages lps to skip parts of the pattern, avoiding unnecessary comparisons.

This example shows how KMP efficiently finds the first occurrence of pattern in text. Let me know if you have any questions about a specific part of the algorithm!

What’s Next? Practice Makes Perfect

Ready to deepen your understanding? We’ll dive into hands-on exercises to solidify your grasp of these concepts and help you see how these algorithms can be applied in real-world programming tasks. Remember, mastering the "why" behind each step will set you up for success in tackling complex problems!

Enjoy this lesson? Now it's time to practice with Cosmo!

Practice is how you turn knowledge into actual skills.