Longest Common Subsequence (LCS)
Welcome to Day 32 of our 60 Days of Coding Algorithm Challenge! Today, we’ll explore the Longest Common Subsequence (LCS) problem, a classic example of dynamic programming.
What is the Longest Common Subsequence?
The Longest Common Subsequence problem is to find the longest subsequence common to all sequences in a set of sequences. Unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.
For example, the LCS of “ABCDGH” and “AEDFHR” is “ADH” of length 3.
Naive Recursive Approach
Let’s start with a naive recursive implementation:
1def lcs_naive(X, Y, m, n):
2 if m == 0 or n == 0:
3 return 0
4 elif X[m-1] == Y[n-1]:
5 return 1 + lcs_naive(X, Y, m-1, n-1)
6 else:
7 return max(lcs_naive(X, Y, m, n-1), lcs_naive(X, Y, m-1, n))
8
9# Example usage
10X = "ABCDGH"
11Y = "AEDFHR"
12print(f"Length of LCS is {lcs_naive(X, Y, len(X), len(Y))}")
This approach has a time complexity of O(2^n) in the worst case, making it inefficient for longer sequences.
Dynamic …
Longest Common Subsequence (LCS)
Welcome to Day 32 of our 60 Days of Coding Algorithm Challenge! Today, we’ll explore the Longest Common Subsequence (LCS) problem, a classic example of dynamic programming.
What is the Longest Common Subsequence?
The Longest Common Subsequence problem is to find the longest subsequence common to all sequences in a set of sequences. Unlike substrings, subsequences are not required to occupy consecutive positions within the original sequences.
For example, the LCS of “ABCDGH” and “AEDFHR” is “ADH” of length 3.
Naive Recursive Approach
Let’s start with a naive recursive implementation:
1def lcs_naive(X, Y, m, n):
2 if m == 0 or n == 0:
3 return 0
4 elif X[m-1] == Y[n-1]:
5 return 1 + lcs_naive(X, Y, m-1, n-1)
6 else:
7 return max(lcs_naive(X, Y, m, n-1), lcs_naive(X, Y, m-1, n))
8
9# Example usage
10X = "ABCDGH"
11Y = "AEDFHR"
12print(f"Length of LCS is {lcs_naive(X, Y, len(X), len(Y))}")
This approach has a time complexity of O(2^n) in the worst case, making it inefficient for longer sequences.
Dynamic Programming Approach
We can significantly improve the efficiency using Dynamic Programming. Let’s implement both top-down (memoization) and bottom-up (tabulation) approaches.
Top-Down Approach (Memoization)
1def lcs_memo(X, Y):
2 def lcs_helper(m, n, memo):
3 if m == 0 or n == 0:
4 return 0
5 if (m, n) in memo:
6 return memo[(m, n)]
7 if X[m-1] == Y[n-1]:
8 memo[(m, n)] = 1 + lcs_helper(m-1, n-1, memo)
9 else:
10 memo[(m, n)] = max(lcs_helper(m, n-1, memo), lcs_helper(m-1, n, memo))
11 return memo[(m, n)]
12
13 return lcs_helper(len(X), len(Y), {})
14
15# Example usage
16X = "ABCDGH"
17Y = "AEDFHR"
18print(f"Length of LCS is {lcs_memo(X, Y)}")
Bottom-Up Approach (Tabulation)
1def lcs_tab(X, Y):
2 m, n = len(X), len(Y)
3 L = [[0] * (n + 1) for _ in range(m + 1)]
4
5 for i in range(1, m + 1):
6 for j in range(1, n + 1):
7 if X[i-1] == Y[j-1]:
8 L[i][j] = L[i-1][j-1] + 1
9 else:
10 L[i][j] = max(L[i-1][j], L[i][j-1])
11
12 return L[m][n]
13
14# Example usage
15X = "ABCDGH"
16Y = "AEDFHR"
17print(f"Length of LCS is {lcs_tab(X, Y)}")
Both of these Dynamic Programming approaches have a time complexity of O(mn) and a space complexity of O(mn), where m and n are the lengths of the input sequences.
Printing the Longest Common Subsequence
We can modify our bottom-up approach to not only find the length of the LCS but also print the LCS itself:
1def print_lcs(X, Y):
2 m, n = len(X), len(Y)
3 L = [[0] * (n + 1) for _ in range(m + 1)]
4
5 # Fill L[][] in bottom up fashion
6 for i in range(1, m + 1):
7 for j in range(1, n + 1):
8 if X[i-1] == Y[j-1]:
9 L[i][j] = L[i-1][j-1] + 1
10 else:
11 L[i][j] = max(L[i-1][j], L[i][j-1])
12
13 # Following steps build LCS string from L[][]
14 lcs = []
15 i, j = m, n
16 while i > 0 and j > 0:
17 if X[i-1] == Y[j-1]:
18 lcs.append(X[i-1])
19 i -= 1
20 j -= 1
21 elif L[i-1][j] > L[i][j-1]:
22 i -= 1
23 else:
24 j -= 1
25
26 return ''.join(reversed(lcs))
27
28# Example usage
29X = "ABCDGH"
30Y = "AEDFHR"
31print(f"LCS of {X} and {Y} is {print_lcs(X, Y)}")
Space-Optimized Solution
We can optimize the space complexity to O(min(m,n)) by only keeping track of the current and previous rows of the DP table:
1def lcs_space_optimized(X, Y):
2 m, n = len(X), len(Y)
3
4 # Make sure X is the shorter string
5 if m > n:
6 return lcs_space_optimized(Y, X)
7
8 # Previous and current row of the DP table
9 prev = [0] * (m + 1)
10 curr = [0] * (m + 1)
11
12 for j in range(1, n + 1):
13 for i in range(1, m + 1):
14 if X[i-1] == Y[j-1]:
15 curr[i] = prev[i-1] + 1
16 else:
17 curr[i] = max(curr[i-1], prev[i])
18 prev, curr = curr, prev
19
20 return prev[m]
21
22# Example usage
23X = "ABCDGH"
24Y = "AEDFHR"
25print(f"Length of LCS is {lcs_space_optimized(X, Y)}")
This optimized version has a space complexity of O(min(m,n)) while maintaining the time complexity of O(mn).
Applications of LCS
- DNA sequence alignment in bioinformatics
- File comparison (diff utility)
- Plagiarism detection
- Version control systems
Exercise
Implement a function to find the Longest Common Substring (contiguous) of two strings using dynamic programming.
Modify the LCS algorithm to find the Longest Palindromic Subsequence of a given string.
Implement a function to find the Shortest Common Supersequence of two strings using the LCS algorithm.
Summary
Today, we explored the Longest Common Subsequence problem, a classic example of dynamic programming. We implemented various approaches, from naive recursion to optimized dynamic programming solutions, and even a space-optimized version.
Understanding the LCS problem and its solutions provides valuable insights into dynamic programming techniques and their applications in string manipulation and comparison problems.
Tomorrow, we’ll dive into another fundamental dynamic programming problem: the Knapsack Problem. Stay tuned!