# Longest Common Subsequence The **longest common subsequence** (**LCS**) is the most popular dynamic programming problem. It's the problem of finding the longest subsequence common to all sequences in a set of sequences. **It differs from the longest common substring problem**. LCS for input Sequences “ABCDGH” and “AEDFHR” is “ADH” of length 3. LCS for input Sequences “AGGTAB” and “GXTXAYB” is “GTAB” of length 4. The LCS is the basis the usual `diff` tool. The naïve solution is exponential : the complexity is $2^n$ with $n$ length of the string. The running time of the dynamic programming approach is $O (nm)$ with $n$ and $m$ the lengths of the two sequences. The LCS has an optimal substructure: the LCS(z) contains also the LCS of the prefix of Z. Since the algorithm builds a matrix to store the partial results we can called this algorithm a '2D dynamic program'. We can define the LCS algorithm: - LCS of 2 empty sequences is the empty sequence. - LCS of “{prefix1}A” and “{prefix2}A” is LCS({prefix1}, {prefix2}) + A - LCS of “{prefix1}A” and “{prefix2}B” is the **longest** of LCS({prefix1}A, {prefix2}) and LCS({prefix1}, {prefix2}B) So it's the solution iteratively starting from the simple base cases. Note that the algorithm can work like this because the problem has so-called “optimal” structure, meaning that it can be built by reusing previous memoized steps. The algorithm is 'divided' in two parts: the first one is focused on find the length of the longest common subsequence. $ \begin{array}{|c|c|c|c|c|c|c|} \hline & \boldsymbol{\varnothing} & \mathbf{A} & \mathbf{G} & \mathbf{C} & \mathbf{A} & \mathbf{T} \\ \hline \boldsymbol{\varnothing} & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline \mathbf{G} & 0 & \leftarrow \uparrow0 & \nwarrow 1 & \leftarrow 1 & \leftarrow 1 & \leftarrow 1 \\ \hline \mathbf{A} & 0 & \nwarrow 1 & \leftarrow \uparrow 1 & \leftarrow \uparrow 1 & \nwarrow 2 & \leftarrow 2 \\ \hline \mathbf{C} & 0 & \uparrow 1 & \leftarrow \uparrow 1 & \nwarrow 2 & \leftarrow \uparrow 2 & \leftarrow \uparrow 2 \\ \hline \end{array}$ This matrix will be used later to reconstruct LCS by tracing backwards: $ \begin{array}{|c|c|c|c|c|c|c|} \hline & \varnothing & \mathbf{A} & \mathbf{G} & \mathbf{C} & \mathbf{A} & \mathbf{T} \\ \hline \boldsymbol{\varnothing} & \varnothing & \varnothing & \varnothing & \varnothing & \varnothing & \varnothing \\ \hline \mathbf{G} & \varnothing & \leftarrow \varnothing & \nwarrow(\mathrm{G}) & \leftarrow(\mathrm{G}) & \leftarrow(\mathrm{G}) & \leftarrow(\mathrm{G}) \\ \hline \mathbf{A} & \varnothing & \nwarrow(\mathrm{A}) & \leftarrow\uparrow(\mathrm{A}) \&(\mathrm{G}) & \leftarrow\uparrow(\mathrm{A}) \&(\mathrm{G}) & \nwarrow(\mathrm{GA}) & \leftarrow(\mathrm{GA}) \\ \hline \mathbf{C} & \varnothing & \uparrow(\mathrm{A}) & \leftarrow\uparrow(\mathrm{A}) \&(\mathrm{G}) & \nwarrow(\mathrm{AC}) \&(\mathrm{GC}) & \leftarrow\uparrow(\mathrm{AC}) \&(\mathrm{GC}) \&(\mathrm{GA}) & \leftarrow \uparrow(\mathrm{AC}) \&(\mathrm{GC}) \&(\mathrm{GA}) \\ \hline \end{array}$ Super useful video: [Longest common subsequence algorithm](https://www.youtube.com/watch?v=P-mMvhfJhu8)