+1 vote
Given two DNA sequences (sequences of A, C, T, G characters), the longest common subsequence (LCS) problem is to find the longest subsequence (not nescessarily contiguous) that exists in both of the input strings. For example, given strings "ACTGACT" and "CAAGCATA", the subsequence "CGCT" is common in both. Given two DNA sequences of sizes n1 and n2 respectively, find a dynamic programming algorithm to find the longest common subsequence in O(n1n2) time.

From the midterm practice
in Dynamic Programming by AlgoMeister (684 points)

3 Answers

+1 vote

notation: LCS[i][j] 

Recurrence Relation: LCS[i][j] = max{lcs[i-1][j], lcs[i][j-1]}

Optimality:  Suppose 2 strings m and n.  If the last character does not match m[i-1] != n[i-1] than LCS[[i-1][j-1] = MAX{lcs[m-2][n-1], lcs[m-1][n-2]} 

however, if the last character m[i-1] == n[i-1] than LCS[i-1][j-1] = LCS[i-2][j-2] + 1. 

For example given the strings ACTGACT and CAAGCAT if the last character T matches than the LCS at the last position would increase.  However, if the last character is not equal to each other than we would select either the LCS of either the previous character of string m or string n. 

Visually speaking this would look like this.

ACTGACT
C0100010
A1000200
A1000200
G0002000
C0200030
A1000300
T0030004

I included the below table to mirror what would be happening in the algorithm

ACTGACT
C011111
A11122
A1111222
G1112222
C12223
A1222233
T12333

Giving the LCS of 4 which can be selected from any value to the top left of the largest value at that row.

So the LCS would be AGAT or ACAT or CGAT 

Algorithm:

LCS[n][m]
for( int i = 1; i < n; i++) {
    for(int j = 1; j < m, j++) {
        if(n[i-1] == m[i-1]) {
            LCS[n][m] = LCS[n-1][m-1] + 1
        }
        else {
            LCS[n][m] = MAX{LCS[i-1][j], LCS[j][i-1]}
        }
    }
}
by AlgoMeister (684 points)
edited by
Can someone check my notations / recurrence relation / optimality?  I am still attempting to figure out exactly what is expected on these steps.
+1 vote

Notation: LCS(i,j) represents the length of maximum common subsequence of the respective strings A(0...i), B(0....j) 

Optimality: To get the Longest Common Subsequence (LCS) at position (i)(j) 

when the final characters of two strings, A and B, do not match (A(i) is not equal to B(i)), take the maximum of two values: LCS(i)(j) = MAX(LCS(i-1)(j), LCS(i)(j-1)). 

if A(i) equal to B(i) then LCS(i)(j) = 1+ LCS(i-1)(j-1)

This illustrates the notion that the LCS is affected by the LCS values derived by removing the final character from each string when the last characters are not identical.

Recurrence Relation: if A(i)!=B(i) then LCS(i, j) =max( LCS(i-1,j) ,(i, j-1)

                                   if A(i)=B(j) then LCS(i, j) = 1 + LCS(i-1,j-1)

Algorithm:

def find_lcs(A, B):

    dp = [[0] * (len(B) + 1) for _ in range(len(A) + 1)]

    for i in range(1, len(A) + 1):

          for j in range(1, len(B) + 1):

            dp[i][j] = dp[i-1][j-1] + 1 if A[i-1] == B[j-1] else max(dp[i-1][j], dp[i][j-1])

        lcs = ''.join([s1[i-1] for i, j in zip(range(len(A), 0, -1), range(len(B), 0, -1)) if dp[i][j] !=              dp[i-1][j] and dp[i][j] != dp[i][j-1]])

 return lcs

s1 = "ACTGACT" 

s2 = "CAAGCATA"

result = find_lcs(s1, s2)

print(result)  

As we are storing the results of sub-problem in the array of size n1*n2 so the time and space complexity of this problem is O(n1*n2)

by (220 points)
0 votes
notation: dp[i][j] to represent the max result ends at the  position ith, jth elements.

if (n1[0]=n2[0]){

dp[0][0]=1;

}else {

dp[0][0]=0;

}
for( i =1; i<n1.length;i++){

if(n1[0]=n2[i]){

dp[0][i]=dp[0][i-1]+1;

}else {

dp[0][i]=dp[0][i-1];

}

for(i=1;i<=n2.length;i++){

if(n1[i]=n2[0]){

dp[i][0]=dp[i-1][0]+1;

}

else{

dp[i][0]=dp[i-1][0];

}

}

for( i =1; i<n1.length;i++)

(for j=1;j<n2.length;j++)

if (n1[i]=n2[i])

 dp[i][j]=dp[i-1][j-1]+1;

if n1[i]!=n2[i]

 dp[i][j]=Math.max(dp[i-1][j],dp[i][j-1])

return dp[n1.length-1][n2.length-1];
by Active (320 points)
edited by

Related questions

+2 votes
2 answers
+1 vote
3 answers
asked Mar 3, 2018 in Dynamic Programming by jmlitfi AlgoMeister (684 points)
+2 votes
2 answers
asked Mar 3, 2018 in Dynamic Programming by jmlitfi AlgoMeister (684 points)
0 votes
1 answer
The Book: Analysis and Design of Algorithms | Presentations on Slideshare | Lecture Notes, etc
...