Day 56: String Algorithms - KMP
Table of Contents
Day 56: String Algorithms - Knuth-Morris-Pratt (KMP) #
Welcome to Day 56 of our 60 Days of Coding Algorithm Challenge! Today, we’ll explore the Knuth-Morris-Pratt (KMP) algorithm, an efficient string matching algorithm that improves upon the naive approach for finding a pattern within a text.
What is the KMP Algorithm? #
The Knuth-Morris-Pratt (KMP) algorithm is a string matching algorithm that searches for occurrences of a “word” W within a main “text string” S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters.
How KMP Works #
The key idea behind KMP is to utilize the information gained by previous match attempts to skip redundant comparisons. It does this by pre-processing the pattern to create a “failure function” or “partial match table” that is used to skip characters in the input string.
The algorithm consists of two main steps:
- Preprocessing: Create the failure function for the pattern.
- Searching: Use the failure function to perform the actual search.
Implementation #
Here’s a Python implementation of the KMP algorithm:
def compute_lps(pattern):
lps = [0] * len(pattern)
length = 0
i = 1
while i < len(pattern):
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
def kmp_search(text, pattern):
M = len(pattern)
N = len(text)
lps = compute_lps(pattern)
i = j = 0
results = []
while i < N:
if pattern[j] == text[i]:
i += 1
j += 1
if j == M:
results.append(i - j)
j = lps[j - 1]
elif i < N and pattern[j] != text[i]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return results
# Example usage
text = "ABABDABACDABABCABAB"
pattern = "ABABCABAB"
matches = kmp_search(text, pattern)
print(f"Pattern found at indices: {matches}")
Time Complexity #
- Preprocessing: O(m), where m is the length of the pattern
- Searching: O(n), where n is the length of the text
The total time complexity is O(m + n), which is a significant improvement over the naive approach’s O(mn) complexity.
Space Complexity #
The space complexity is O(m) for storing the failure function.
Advantages of KMP #
- Efficient for long patterns and texts
- No backtracking in the main text
- Performs well when the pattern has many repeating subpatterns
Disadvantages of KMP #
- Requires preprocessing of the pattern
- More complex to implement compared to naive approach
- Not as efficient for short patterns or texts
Applications of KMP #
- Text editors for finding and replacing patterns
- Intrusion detection in computer networks
- Computational biology for finding DNA sequences
- Spam filtering in email systems
Variations and Related Algorithms #
- Boyer-Moore algorithm: Another efficient string matching algorithm
- Rabin-Karp algorithm: Uses hashing to find patterns
- Aho-Corasick algorithm: Finds multiple patterns simultaneously
Exercise #
- Modify the KMP algorithm to find all non-overlapping occurrences of the pattern in the text.
- Implement a function that uses KMP to determine if a string is a rotation of another string.
- Use KMP to solve the problem of finding the longest prefix which is also a suffix in a given string.
Summary #
Today, we explored the Knuth-Morris-Pratt (KMP) algorithm, an efficient string matching algorithm that improves upon the naive approach. We implemented the algorithm, discussed its time and space complexity, and looked at its advantages, disadvantages, and applications.
Understanding string matching algorithms like KMP is crucial for many text processing tasks and forms the basis for more advanced string algorithms. As we progress through this challenge, you’ll see how these concepts can be applied to solve various string-related problems efficiently.
Tomorrow, we’ll dive into another important string matching algorithm: the Rabin-Karp Algorithm. Stay tuned!