Section 5.1 Dynamic Programming
What makes hard problems hard? One of the challenges that one faces when trying to develop an algorithm to solve a hard problem is that any “decision” that you make during the procedure, as intelligent as it may seem, may come back to bite you later on. For example, when trying to properly colour a graph \(G\) with as few colours as possible, you may want to start by colouring a subgraph \(H\) of \(G\) with as few colours as possible, and then extend this colouring to the rest of \(G\text{.}\) However, for some graphs \(G\) and some subgraphs \(H\) of \(G\text{,}\) it may be the case that every optimal colouring for \(G\) uses a suboptimal number of colours on \(H\text{.}\) Therefore, if you build up a proper colouring from colouring substructures in an optimal way without seeing the “big picture,” then it is possible that the decisions that you make early on will lead to poor outcomes.
Dynamic programming is an approach for solving problems that are “better behaved” in the sense that every instance can be reduced to
- Finding an optimal solution to several smaller subproblems (recursively) and
- Choosing the optimal way of extending the solution of one of these subproblems to a solution to the original problem.
To be clear, given an instance of the problem, it is not always obvious which subproblem would be the best one to consider. In fact, in dynamic programming, we will typically explore every possible subproblem of a given type and, for each of them, compute the optimal solution of the subproblem and the optimal way of extending it to solve the original problem. Of course, as we discussed above, not all problems are amenable to this approach. So, it is not only important to understand how dynamic programming works, but also to be able to recognize when a problem has the right “structure” to be attacked via dynamic programming.
Let us make this more concrete by describing a specific problem that can be solved via dynamic programming. As we shall see below and in the exercises of this section, many natural optimization problems can be reduced directly to this problem. Say that a digraph is acyclic if it does not contain a directed cycle (including a cycle of length 2); an acyclic digraph is often referred to as a DAG (standing for “directed acyclic graph”).
Problem 5.1.1. Longest Path in a DAG.
- Input: A DAG \(D\) and a length function \(\ell:V(D)\cup A(D)\to [-\infty,\infty]\text{.}\)
- Problem: Find a longest directed path in \(D\) with respect to \(\ell\text{,}\) where the length of a path is the sum of the lengths of the vertices and arcs within it.
The assumption that the digraph is acyclic is important here, since finding a longest path in a general graph is NP-hard (because the Hamiltonian Path Problem is NP-complete). Let us now see how to apply the idea of dynamic programming to solve this problem in polynomial time. The key principle that makes this work is that a longest path that ends at a given vertex \(v\) in a DAG can be computed easily if we know the length of the longest path that ends at each of its in-neighbours (the DAG structure is important here). Moreover, once a longest path that ends at a given vertex \(u\) has been computed, we can simply store that data and re-use it without recomputing it. In terms of the paradigm described earlier, computing the longest path ending at \(v\) can be broken down into solving a subproblem for each in-neighbour of \(v\) and then determining which of these subproblems extends optimally to a solution of the original problem. Let us now explain this more rigorously.
Theorem 5.1.2.
There is an algorithm which, given a DAG \(D\) and a length function \(\ell:V(D)\cup A(D)\to[-\infty,\infty]\text{,}\) outputs, for every \(v\in V(D)\text{,}\) the length of the longest directed path that ends at \(v\) and the predecessor of \(v\) on such a path, in time \(O(|V(D)|+|A(D)|)\text{.}\) In particular, the Longest Path in a DAG problem can be solved in time \(O(|V(D)|+|A(D)|)\text{.}\)
Proof.
First, we observe that, since \(D\) is acyclic, every directed walk in \(D\) is a directed path since, if a walk were to self-intersect, then it would contain a directed cycle. So, finding a longest path is equivalent to finding a longest directed walk. Take an arbitrary ordering \(v_1,\dots,v_n\) of the vertices of \(D\text{.}\) Throughout the algorithm, we will update two functions \(h:V(D)\to[-\infty,\infty]\cup\{\text{null}\}\) and \(p:V(D)\to V(D)\cup \{\text{null}\}\text{,}\) where \(h(v)\) and \(p(v)\) denote our current “best guess” for the length of the longest path that ends at \(v\) and the predecessor of \(v\) on one such path. (For lack of a better symbol, we use \(h\) to stand for “height.”) The “large scale” structure of the algorithm is as follows, where the subroutine \(\text{ComputeLength}(v,h,p)\) will be described later.
Algorithm 5.1.3. Dynamic Programming Algorithm for Longest Path in a DAG.
- Initialize: \(h(v)=\text{null}\) and \(p(v)=\text{null}\) for all \(v\in V(D)\text{.}\)
- For \(i=1,\dots,n\text{:}\) run \(\text{ComputeLength}(v_i,h,p)\text{.}\)
- Output: The values \(h(v)\) and \(p(v)\) for every vertex \(v\text{.}\)
Clearly, the initialization step takes time \(O(|V(D)|)\) and, after completing the for-loop, it takes time \(O(|V(D)|)\) to produce the output. To complete the proof, we need to show that the for-loop can be executed in time \(O(|V(D)|+|A(D)|)\) and that the output that we obtain is always an optimal solution to the problem. To this end, let us now describe the subroutine \(\text{ComputeLength}(v,h,p)\text{.}\) Note that this subroutine takes in the functions \(h\) and \(p\) as input and may modify them during its execution. The modifications that are made in early iterations of the for-loop in Algorithm 5.1.3 will affect later iterations. This observation is important for bounding the total running time at the end.
Algorithm 5.1.4. \(\text{ComputeLength}(v,h,p)\).
- If \(h(v)\neq \text{null}\text{:}\) do nothing and return.
- If \(h(v)=\text{null}\text{:}\) update \(h(v)\) to be equal to \(\ell(v)\) and \(p(v)\) to be equal to \(v\text{,}\) and then do the following.
- For each \(u\in N^-(v)\text{:}\) run \(\text{ComputeLength}(u,h,p)\text{.}\)
- If \(N^-(v)\neq\emptyset\text{,}\) choose \(u\in N^-(v)\) such that \(h(u)+\ell(uv)\) is maximized and, subject to this, \(u\) comes earliest in the ordering \(v_1,\dots,v_n\) of \(V(D)\text{.}\)
- If \(h(u)+\ell(uv)>0\text{:}\) set \(h(v)=h(u)+\ell(uv)+\ell(v)\) and \(p(v)=u\text{.}\)
Because of the first if-statement in the description of \(\text{ComputeLength}(v,h,p)\text{,}\) it is clear that the values of \(h(v)\) and \(p(v)\) for a vertex \(v\) are only updated once during the entire algorithm, and that this update occurs at the first time that \(\text{ComputeLength}(v,h,p)\) is called.
Let us prove that Algorithm 5.1.3 correctly identifies the longest path ending at every vertex \(v\) and, therefore, correctly solves the Longest Path in a DAG problem. We proceed by induction on the maximum number of arcs in a directed walk in \(D\) that ends at \(v\text{.}\) Note that this quantity is well-defined since \(D\) is acyclic, and so no directed walk can have a repeated vertex and therefore must have a finite number of arcs. If the maximum number of arcs in a directed walk ending at \(v\) is zero (i.e. \(v\) is a source vertex, meaning a vertex of in-degree zero), then, the first time that the algorithm runs the subroutine \(\text{ComputeLength}(v,h,p)\text{,}\) the second if-statement executes and sets \(h(v)\) to be \(\ell(v)\) and \(p(v)\) to be \(v\text{.}\) This signifies that the maximum length directed path that ends at \(v\) is the trivial path that also starts at \(v\text{,}\) which is indeed true because there are no other directed paths that end at \(v\text{.}\) Now, suppose that the maximum number of arcs in a directed walk that ends at \(v\) is \(k\geq1\text{.}\) In this case, a maximum length directed walk that ends at \(v\) is either the trivial walk, which has length \(\ell(v)\text{,}\) or passes through one of the in-neighbours \(u\) of \(v\) before reaching \(v\text{.}\) The first time that \(\text{ComputeLength}(v,h,p)\) is called during the algorithm, it ensures that the subroutines \(\text{ComputeLength}(u,h,p)\) for each \(u\in N^-(v)\) have all been run at least once, and so the values of \(h(u)\) and \(p(u)\) have been computed, before it makes a decision about the quantities \(h(v)\) and \(p(v)\text{.}\) For every such \(u\text{,}\) the maximum number of arcs in any directed walk that ends at \(u\) is at most \(k-1\) for, if not, then we could get a directed walk of more than \(k\) arcs ending at \(v\text{.}\) So, by the induction hypothesis, the algorithm correctly identifies the maximum length \(h(u)\) of a directed path ending at \(u\) and the predecessor of \(u\) on such a path. From this, we see that the algorithm also correctly identifies the maximum length of a directed path ending at \(v\) and the predecessor of \(v\) on such a path as well.
Finally, we analyze the running time of the for-loop in Algorithm 5.1.3. In doing so, we will associate each step of the algorithm to a vertex \(v\) in such a way that each vertex is associated to \(O(1+d^-(v)+d^+(v))\) steps. We need to be a bit careful here. To avoid over-counting, for each vertex \(v\text{,}\) during any given execution of \(\text{ComputeLength}(v,h,p)\text{,}\) we will treat each recursive call to \(\text{ComputeLength}(u,h,p)\) for \(u\in N^-(v)\) as contributing only \(O(1)\) steps to the vertex \(v\text{.}\) This is not cheating because all of the steps within that recursive call will be associated with \(u\) (or with an in-neighbour of \(u\text{,}\) etc., if further recursive calls are made) and so they will be accounted for eventually. Given this, we observe that, for each vertex \(v\text{,}\) the subroutine \(\text{ComputeLength}(v,h,p)\) is called exactly \(1+d^+(v)\) times. In each call to this subroutine, except for the first one, the first if-statement is triggered and the subroutine executes in time \(O(1)\text{.}\) On the first call to this subroutine, we do \(O(1)\) steps to determine which of the three cases we are in. If we are in either of the first two cases, then the entire call executes in time \(O(1)\text{.}\) If we are in the third case, then the algorithm runs for \(O(d^-(v))\) steps (as we said before, the steps within each recursive call are associated to the vertex \(u\text{,}\) not \(v\)). So, in total, each vertex \(v\) indeed contributes \(O(1+d^-(v)+d^+(v))\) steps to the algorithm. Summing this over all \(v\) yields \(O(|V(D)|+|A(D)|)\text{,}\) as desired.
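Before moving on, let us record a minimal Python sketch of Algorithms 5.1.3 and 5.1.4 (the function and variable names are our own, we assume all lengths are finite for simplicity, and for very deep DAGs one would replace the recursion with an explicit stack).

```python
def longest_paths(vertices, in_nbrs, ell_v, ell_a):
    """Compute h(v) and p(v) for every vertex v of a DAG.

    vertices: the vertices in an arbitrary order v_1, ..., v_n
    in_nbrs:  dict mapping each vertex v to a list of its in-neighbours
    ell_v:    dict mapping each vertex to its length
    ell_a:    dict mapping each arc (u, v) to its length
    """
    h = {v: None for v in vertices}  # best known length of a path ending at v
    p = {v: None for v in vertices}  # predecessor of v on such a path

    def compute_length(v):
        if h[v] is not None:          # first if-statement: nothing to do
            return
        h[v], p[v] = ell_v[v], v      # trivial path starting and ending at v
        for u in in_nbrs[v]:          # solve all subproblems first
            compute_length(u)
        best = None                   # best in-neighbour to extend from;
        for u in in_nbrs[v]:          # strict > keeps the earliest maximizer
            if best is None or h[u] + ell_a[(u, v)] > best[0]:
                best = (h[u] + ell_a[(u, v)], u)
        if best is not None and best[0] > 0:
            h[v] = best[0] + ell_v[v]
            p[v] = best[1]

    for v in vertices:
        compute_length(v)
    return h, p
```

A longest path ending at a given vertex \(v\) can then be read off by repeatedly replacing \(v\) with \(p(v)\) until \(p(v)=v\text{.}\)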
Given the above theorem, we recognize that, for any problem, if we can reduce it (in an efficient way) to the problem of finding a longest path in a DAG, then we immediately get an efficient algorithm for that problem. Let us look at an example of this. First, we observe that Theorem 5.1.2 can be applied to solve the longest or shortest path problem in DAGs with a better running time than the algorithms presented for general digraphs and conservative length functions in Section 1.3.
Theorem 5.1.5.
There is an algorithm which, given a DAG \(D\text{,}\) a length function \(\ell:V(D)\cup A(D)\to[-\infty,\infty]\) and a vertex \(s\in V(D)\text{,}\) finds the lengths of the longest and/or shortest directed paths in \(D\) from \(s\) to every vertex \(v\) of \(D\) for which such a path exists, as well as the predecessor of \(v\) on such a path, in time \(O(|V(D)|+|A(D)|)\text{.}\)
Proof.
Let us first explain how to find the longest path from \(s\) to every other vertex. Add a new vertex, called \(v_0\text{,}\) of length zero, and add one arc from \(v_0\) to \(s\text{,}\) where the length of the arc \(v_0s\) is equal to 10 times the sum of the absolute values of the lengths of all vertices and arcs of \(D\text{.}\) Now, run the algorithm from the proof of Theorem 5.1.2 to compute the quantities \(h(v)\) and \(p(v)\) for all vertices \(v\text{.}\) If \(h(v)\) is less than, say, \((9/10)\ell(v_0s)\text{,}\) then we can conclude that there does not exist any directed path from \(s\) to \(v\) in \(D\text{;}\) so, the longest such path for this vertex \(v\) has length \(-\infty\text{.}\) On the other hand, if \(h(v)\geq (9/10)\ell(v_0s)\text{,}\) then the longest path ending at \(v\) in the augmented digraph must use the arc \(v_0s\text{.}\) It is now easy to see that the length of the longest path from \(s\) to \(v\) is \(h(v)-\ell(v_0s)\text{.}\)
The proof for the shortest path problem is very similar. The only difference is that, prior to adding the vertex \(v_0\text{,}\) we multiply all lengths by \(-1\) and, at the end, we multiply our final answers by \(-1\) again.
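To illustrate the proof above, here is a sketch of the reduction, reusing the longest_paths function from the earlier sketch and again assuming that all lengths are finite; the "+ 1" below is our own small tweak so that the threshold also makes sense when every length is zero.

```python
from math import inf

def longest_from_source(vertices, in_nbrs, ell_v, ell_a, s):
    """Longest path lengths from s to every vertex, via the vertex v_0."""
    # A length larger than any path in D could achieve on its own.
    big = 10 * (sum(abs(x) for x in ell_v.values())
                + sum(abs(x) for x in ell_a.values())) + 1
    v0 = object()                        # a fresh vertex not occurring in D
    aug_vertices = [v0] + list(vertices)
    aug_in = {v: list(in_nbrs[v]) for v in vertices}
    aug_in[v0] = []
    aug_in[s] = aug_in[s] + [v0]         # the single arc v_0 -> s
    aug_ell_v = dict(ell_v)
    aug_ell_v[v0] = 0                    # give v_0 itself length zero
    aug_ell_a = dict(ell_a)
    aug_ell_a[(v0, s)] = big             # heavy arc forces paths through s
    h, _ = longest_paths(aug_vertices, aug_in, aug_ell_v, aug_ell_a)
    # h(v) >= 0.9 * big exactly when some directed path from s reaches v
    return {v: h[v] - big if h[v] >= 0.9 * big else -inf for v in vertices}
```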
Let us now discuss how the idea behind the algorithm in the proof of Theorem 5.1.2 can be generalized beyond the specific problem that we were solving there. In that algorithm, we computed two functions \(h(v)\) and \(p(v)\) for every vertex \(v\) in a DAG. The main features of the pair \((h,p)\) of functions that we required were that
- \((h(v),p(v))\) can be computed exactly and efficiently for any source vertex \(v\text{,}\) without requiring any additional information, and
- if we know the value of both functions on every in-neighbour of \(v\text{,}\) then we can compute \((h(v),p(v))\) exactly and efficiently, without requiring any additional information.
More generally, if an ensemble \((f_1,\dots,f_k)\) of functions defined on the vertex set of a DAG has the above two properties, then it is always possible to compute \((f_1(v),\dots,f_k(v))\) for every \(v\in V(D)\) fairly efficiently by following exactly the same idea as in the proof of Theorem 5.1.2. Of course, this still only scratches the surface of the types of problems that dynamic programming can be applied to. For example, in the next section, we will see an application of dynamic programming in which DAGs are not the right structure to work with.
To illustrate the first point that we were making in the previous paragraph, let us investigate another example of a problem that can be reduced to a dynamic programming problem on a DAG, but does not naturally reduce to the Longest Path Problem specifically. Suppose that you have access to an algorithm that can multiply two matrices of dimensions \(a\times b\) and \(b\times c\) in time \(t(a,b,c)\text{.}\) For example, in the standard matrix multiplication algorithm, the number of steps is \(t(a,b,c)=O(abc)\text{,}\) but some faster algorithms exist. Now, suppose that, for some sequence \((a_1,\dots,a_{n+1})\) of positive integers, your boss has given you matrices \(A_1,\dots, A_n\) where the size of \(A_i\) is \(a_i\times a_{i+1}\) and they ask you to compute the product \(A_1A_2\cdots A_n\) using the pairwise matrix multiplication algorithm that you have access to. However, you quickly notice that applying this algorithm in different ways can take a very different amount of time. For example, suppose that \(n=3\) and we are trying to compute \(A_1A_2A_3\text{.}\) You have the option of computing \(A_1A_2\) first, which results in an \(a_1\times a_3\) matrix, and then multiplying the result by \(A_3\) on the right. Doing it this way would take time
\begin{equation*}
t(a_1,a_2,a_3) + t(a_1,a_3,a_4).
\end{equation*}
However, if you first compute \(A_2A_3\) and then multiply by \(A_1\) on the left, it will take time
\begin{equation*}
t(a_2,a_3,a_4) + t(a_1,a_2,a_4)
\end{equation*}
which could be much different. For example, for the naive matrix multiplication algorithm, the first would take time roughly \(a_1a_2a_3 + a_1a_3a_4 = a_1a_3(a_2+a_4)\) compared to \(a_2a_3a_4 + a_1a_2a_4 = a_2a_4(a_1+a_3)\) for the second. Since you are lazy (or smart... or both?), before you embark on this quest, you would first like to determine the best way of multiplying the matrices to minimize the amount of time that it will take you. Of course, one wants to calculate this minimum as quickly as possible since, if it takes too long to figure out the best order to multiply the matrices, then that defeats the purpose!
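To give a concrete instance of this phenomenon (with numbers of our own choosing): if \((a_1,a_2,a_3,a_4)=(10,1,10,1)\) and we use the naive algorithm, then the first order costs roughly \(10\cdot 1\cdot 10+10\cdot 10\cdot 1=200\) operations, while the second costs roughly \(1\cdot 10\cdot 1+10\cdot 1\cdot 1=20\text{;}\) already a factor of 10 difference.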
Let us now see how this can be tackled via dynamic programming. First, we create a DAG \(D\) as follows. For each \(i,j\in \{1,\dots,n\}\) with \(i\leq j\text{,}\) create a vertex which is labelled by \((i,j)\text{.}\) So, there are \(n+\binom{n}{2} = O(n^2)\) vertices in total. Add an arc to each vertex \((i,j)\) from every vertex of the form \((i,k)\) with \(i\leq k< j\) and from every vertex of the form \((k,j)\) with \(i< k\leq j\text{;}\) that is, the in-neighbours of \((i,j)\) are precisely the proper sub-intervals of \((i,j)\) that share one of its endpoints. We want to compute, for each \((i,j)\text{,}\) the minimum length of time (i.e. minimum “duration”), say \(d(i,j)\text{,}\) that it would take to compute the product \(A_i\cdots A_j\text{.}\)
Here is how we do this via dynamic programming. For each vertex \((i,j)\text{,}\) initialize \(d(i,j)=\text{null}\text{.}\) Then, one by one, for each \((i,j)\) we run a procedure similar to Algorithm 5.1.4. That is, if \(d(i,j)\neq\text{null}\text{,}\) then we do nothing. If \(d(i,j)=\text{null}\) and \((i,j)\) has no in-neighbours (i.e. \(i=j\)), then we set \(d(i,j)=0\text{;}\) this corresponds to the fact that computing a product of just one matrix takes time 0. And, finally, if \(d(i,j)=\text{null}\) and \((i,j)\) has some in-neighbours, then we first run this subroutine on all in-neighbours. Then, after doing so, we compute the minimum of \(d(i,k)+d(k+1,j) + t(a_i,a_{k+1},a_{j+1})\) over all \(i\leq k\leq j-1\text{,}\) which represents the time it would take to compute \(A_i\cdots A_k\) and then \(A_{k+1}\cdots A_j\) and then multiply the two together. Set \(d(i,j)\) to be equal to this minimum value.
In the end, the value that we want is \(d(1,n)\text{.}\) If we are careful in how we implement this algorithm, then the running time is \(O(|A(D)|)=O(n^3)\text{.}\)
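Here is a minimal Python sketch of this procedure, specialized to the naive cost \(t(a,b,c)=abc\) (the names are ours; memoization plays the role of the null checks and ensures that all sub-intervals are solved before they are needed).

```python
from functools import lru_cache

def min_multiplication_time(a):
    """Minimum time to compute A_1 ... A_n, where A_i is a_i x a_{i+1}.

    a is the sequence (a_1, ..., a_{n+1}), indexed from 0 in Python,
    and we use the naive pairwise cost t(x, y, z) = x * y * z.
    """
    n = len(a) - 1

    @lru_cache(maxsize=None)           # remembers d(i, j) once computed
    def d(i, j):                       # minimum time to compute A_i ... A_j
        if i == j:                     # a product of one matrix takes time 0
            return 0
        # split as (A_i ... A_k)(A_{k+1} ... A_j); here a_{k+1} is a[k]
        return min(d(i, k) + d(k + 1, j) + a[i - 1] * a[k] * a[j]
                   for k in range(i, j))

    return d(1, n)
```

For the example above, min_multiplication_time((10, 1, 10, 1)) returns 20.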
Let us give one more nice example.
Problem 5.1.6. Longest Increasing Subsequence.
- Input: A sequence of real numbers \(x_1,\dots,x_n\text{.}\)
- Problem: Find a strictly increasing subsequence of maximum length.
It is straightforward to reduce this to Problem 5.1.1 in time \(O(n^2)\text{.}\) That is, we can create a digraph with vertex set \(\{1,\dots,n\}\) where \(ij\) is an arc if \(i< j\) and \(x_i< x_j\text{,}\) and assign every vertex length 1 and every arc length 0. Given this, it is easy to see that this digraph is acyclic and that a longest path corresponds to a longest increasing subsequence. So, Problem 5.1.6 can be solved in time \(O(n^2)\text{.}\)
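Concretely, this reduction might be implemented as follows (a sketch reusing the longest_paths function from earlier in this section).

```python
def lis_via_dag(xs):
    """O(n^2) reduction of Longest Increasing Subsequence to Problem 5.1.1."""
    n = len(xs)
    vertices = list(range(n))
    # arc i -> j whenever i < j and x_i < x_j
    in_nbrs = {j: [i for i in range(j) if xs[i] < xs[j]] for j in range(n)}
    ell_v = {i: 1 for i in vertices}                  # each vertex counts 1
    ell_a = {(i, j): 0 for j in vertices for i in in_nbrs[j]}  # arcs count 0
    h, p = longest_paths(vertices, in_nbrs, ell_v, ell_a)
    return max(h.values()) if n > 0 else 0        # length of a longest path
```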
However, we can do better than this by taking greater advantage of the structure of the problem. We process each element \(x_i\) of the sequence one by one where, throughout the algorithm, for each \(k\in\{1,\dots,n\}\) we store the value \(f(k)\) which is equal to the minimum of \(x_j\) over all \(j< i\) such that there exists an increasing subsequence of length \(k\) ending with \(x_j\) (or \(\infty\) if no such subsequence exists). Also, for convenience, we will always have \(f(0)=-\infty\text{.}\) Start by initializing \(f(1)=\cdots =f(n)=\infty\text{.}\) Now, when we process \(x_i\text{,}\) note that the sequence \(f(0),\dots,f(n)\) is non-decreasing (and strictly increasing among its finite values). So, using a binary search, we can find the maximum \(k\in\{0,\dots,n\}\) such that \(f(k)< x_i\) in time \(O(\log(n))\text{.}\) In other words, there exists an increasing subsequence of length \(k\) in \(x_1,\dots,x_{i-1}\) whose largest element is less than \(x_i\text{,}\) but every increasing subsequence of length \(k+1\) in \(x_1,\dots,x_{i-1}\) has largest entry at least \(x_i\text{.}\) We update \(f(k+1)\) to be \(x_i\text{.}\)
We claim that this is correct. First, it is indeed true that there exists an increasing subsequence of length \(k+1\) in \(x_1,\dots,x_i\) whose largest element is \(x_i\text{,}\) as we can take an increasing subsequence of length \(k\) whose largest element is less than \(x_i\) and append \(x_i\) to it. Also, there is no increasing subsequence of length \(k+2\) or longer in \(x_1,\dots,x_{i}\) whose largest element is \(x_i\text{,}\) as deleting \(x_i\) from it would give an increasing subsequence of length at least \(k+1\) in \(x_1,\dots,x_{i-1}\) whose largest element is less than \(x_i\text{,}\) contradicting the maximality of \(k\text{.}\) Therefore, none of the values \(f(k+2),\dots,f(n)\) should be updated. The values \(f(0),\dots,f(k)\) also should not be updated because each of them is already less than \(x_i\text{.}\)
At the end of the algorithm, we simply output the largest \(k\) such that \(f(k)< \infty\) and we are done. The running time is clearly \(O(n\log(n))\text{.}\)
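Here is a minimal Python sketch of this \(O(n\log(n))\) algorithm, using the standard library's bisect module for the binary search (the names are ours).

```python
from bisect import bisect_left

def longest_increasing_subsequence_length(xs):
    """Length of a longest strictly increasing subsequence of xs."""
    # f[k-1] stores the current value of f(k): the minimum possible last
    # element of an increasing subsequence of length k found so far.
    f = []
    for x in xs:
        k = bisect_left(f, x)   # the maximum k with f(k) < x, since f is sorted
        if k == len(f):
            f.append(x)         # x extends the longest subsequence found so far
        else:
            f[k] = x            # update f(k+1) to the smaller value x
    return len(f)               # the largest k with f(k) < infinity
```

For example, longest_increasing_subsequence_length([3, 1, 4, 1, 5, 9, 2, 6]) returns 4, corresponding to the increasing subsequence \(1,4,5,9\text{.}\)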