Algorithm II
6. Dynamic Programming II
WU Xiaokun 吴晓堃
xkun.wu [at] gmail
Q. How similar are two strings?
Ex. ocurrance and occurrence.
o | c | u | r | r | a | n | c | e | – |
o | c | c | u | r | r | e | n | c | e |
(6 mismatches, 1 gap)
o | c | – | u | r | r | a | n | c | e |
o | c | c | u | r | r | e | n | c | e |
(1 mismatch, 1 gap)
o | c | – | u | r | r | – | a | n | c | e |
o | c | c | u | r | r | e | – | n | c | e |
(0 mismatches, 3 gaps)
Edit distance. [Levenshtein 1966, Needleman–Wunsch 1970]
C | T | – | G | A | C | C | T | A | C | G |
C | T | G | G | A | C | G | A | A | C | G |
(1 gap, 2 mismatches: cost = \delta + \alpha_{CG} + \alpha_{TA})
Applications. Bioinformatics, spell correction, machine translation, speech recognition, information extraction, etc.
Goal. Given two strings x_1 x_2 ... x_m and y_1 y_2 ... y_n, find a min-cost alignment.
Def. An alignment M is a set of ordered pairs x_i – y_j such that each character appears in at most one pair and no crossings.
Def. The cost of an alignment M is the total mismatch and gap penalty:
cost(M) = \sum_{(x_i, y_j) \in M} \alpha_{x_i y_j} + \sum_{i\,:\,x_i\ \text{unmatched}} \delta + \sum_{j\,:\,y_j\ \text{unmatched}} \delta
Def. OPT(i, j) = min cost of aligning prefix strings x_1 x_2 ... x_i and y_1 y_2 ... y_j.
Goal. OPT(m, n).
Case 1. OPT(i, j) matches x_i – y_j.
Case 2a. OPT(i, j) leaves x_i unmatched.
Case 2b. OPT(i, j) leaves y_j unmatched.
Bellman equation.
OPT(i, j) = \left\{ \begin{array}{ll} j \delta & \text{if}\ i = 0 \\ i \delta & \text{if}\ j = 0 \\ \min \{ \alpha_{x_i,y_j} + OPT(i-1,j-1), \delta + OPT(i-1,j), \delta + OPT(i,j-1) \} & \text{otherwise} \\ \end{array}\right.
SEQUENCE-ALIGNMENT (m, n, x_1, \ldots, x_m, y_1, \ldots, y_n, \delta, \alpha)
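The SEQUENCE-ALIGNMENT procedure can be rendered directly in Python by evaluating the Bellman equation bottom-up; a minimal sketch (function and parameter names are illustrative, not from the slides):

```python
def alignment_cost(x, y, delta, alpha):
    """Min-cost alignment value via the Bellman equation: Theta(mn) time and space.
    delta: gap penalty; alpha(p, q): mismatch penalty for aligning p with q."""
    m, n = len(x), len(y)
    M = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):            # OPT(i, 0) = i * delta
        M[i][0] = i * delta
    for j in range(1, n + 1):            # OPT(0, j) = j * delta
        M[0][j] = j * delta
    for i in range(1, m + 1):
        for j in range(1, n + 1):        # three cases of the recurrence
            M[i][j] = min(alpha(x[i - 1], y[j - 1]) + M[i - 1][j - 1],
                          delta + M[i - 1][j],
                          delta + M[i][j - 1])
    return M[m][n]

# With delta = 1 and mismatch penalty 1, the cost is the Levenshtein edit distance.
lev = lambda p, q: 0 if p == q else 1
```

Tracing back through M recovers an optimal alignment itself, not just its cost.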
Ex. Matching mean and name, with gap penalty \delta = 2 and mismatch penalty 1 (per the column costs below; total cost = 1 + 2 + 0 + 1 + 2 = 6):
m | e | a | n | – |
n | – | a | m | e |
1 | 2 | 0 | 1 | 2 |
Theorem. The DP algorithm computes the edit distance (and an optimal alignment) of two strings of lengths m and n in \Theta(mn) time and space.
Pf.
Correctness. The Bellman equation considers all three cases for the last column of an optimal alignment.
Time. M has \Theta(mn) entries, each computed in O(1) time; an optimal alignment is recovered by tracing back through M.
Theorem. [Hirschberg] There exists an algorithm to find an optimal alignment in O(mn) time and O(m + n) space.
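The key subroutine behind Hirschberg’s bound is computing alignment values in linear space: each row of the DP table depends only on the previous row, so two rows suffice. A minimal sketch (names are illustrative):

```python
def alignment_cost_linear_space(x, y, delta, alpha):
    """Same value as the Theta(mn)-space DP, but keeping only the previous row."""
    m, n = len(x), len(y)
    prev = [j * delta for j in range(n + 1)]          # row i = 0
    for i in range(1, m + 1):
        cur = [i * delta] + [0.0] * n                 # cur[0] = OPT(i, 0)
        for j in range(1, n + 1):
            cur[j] = min(alpha(x[i - 1], y[j - 1]) + prev[j - 1],
                         delta + prev[j],
                         delta + cur[j - 1])
        prev = cur
    return prev[n]
```

Hirschberg’s algorithm runs this kind of pass forward from (0, 0) and backward from (m, n) to locate the split node (q, n/2), then recurses on the two pieces.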
Edit distance graph.
Lemma. Let f(i, j) denote the length of a shortest path from (0, 0) to (i, j) in the edit distance graph. Then f(i, j) = OPT(i, j) for all i and j.
Pf of Lemma. [ by strong induction on i + j ] Base case: f(0, 0) = OPT(0, 0) = 0. Inductive step: assuming the claim for all pairs with smaller i + j,
\begin{aligned} f(i, j) & = \min \{ \alpha_{x_i,y_j} + f(i-1,j-1), \delta + f(i-1,j), \delta + f(i,j-1) \} \\ & = \min \{ \alpha_{x_i,y_j} + OPT(i-1,j-1), \delta + OPT(i-1,j), \delta + OPT(i,j-1) \} \\ & = OPT(i, j) \\ \end{aligned}
Edit distance graph. By symmetry, let g(i, j) denote the length of a shortest path from (i, j) to (m, n); it satisfies an analogous Bellman equation and is computed the same way on the reversed graph.
Observation 1. The length of a shortest path that uses (i, j) is f(i, j) + g(i, j).
Observation 2. Let q be an index that minimizes f(q, n / 2) + g(q, n / 2). Then, there exists a shortest path from (0, 0) to (m, n) that uses (q, n / 2).
Divide. Find index q that minimizes f(q, n / 2) + g(q, n / 2); save node q–n/2 as part of the solution.
Conquer. Recursively compute an optimal alignment in each piece.
Theorem. Hirschberg’s algorithm uses \Theta(m + n) space.
Pf. Each recursive call computes f(\cdot, n/2) and g(\cdot, n/2) keeping only two columns of the DP table; beyond that, only the saved solution nodes are stored.
Theorem. Let T(m, n) = max running time of Hirschberg’s algorithm on strings of lengths at most m and n. Then, T(m, n) = O(m n \log n).
Pf. T(m, n) \le T(q, n/2) + T(m – q, n/2) + O(mn) \le 2\,T(m, n/2) + O(mn) \Rightarrow T(m, n) = O(m n \log n).
Remark. Analysis is not tight because two sub-problems are of size (q, n / 2) and (m – q, n / 2). Next, we prove T(m, n) = O(m n).
Theorem. Let T(m, n) = max running time of Hirschberg’s algorithm on strings of lengths at most m and n. Then, T(m, n) = O(m n).
Pf. [ by strong induction on m + n ]
\begin{aligned} T(m, n) & \le cmn + T(q, n/2) + T(m – q, n/2) \\ T(m, 2) & \le cm \\ T(2, n) & \le cn \\ \end{aligned}
Claim. T(m, n) \le 2 c m n.
\begin{aligned} T(m, n) & \le T(q, n / 2) + T(m – q, n / 2) + c m n \\ & \le 2 c q n / 2 + 2 c (m – q) n / 2 + c m n \\ & = c q n + c m n – c q n + c m n \\ & = 2 c m n \\ \end{aligned}
Problem. Given two strings x_1 x_2 ... x_m and y_1 y_2 ... y_n, find a common subsequence that is as long as possible.
Alternative viewpoint. Delete some characters from x and some characters from y; the two deletions yield a common subsequence if they result in the same string.
Ex. LCS(GGCACCACG, ACGGCGGATACG) = GGCAACG.
Applications. Unix diff, git, bioinformatics.
Def. OPT(i, j) = length of LCS of prefix strings x_1 x_2 ... x_i and y_1 y_2 ... y_j.
Goal. OPT(m, n).
Case 1. x_i = y_j.
Case 2. x_i \ne y_j.
Bellman equation.
OPT(i, j) = \left\{ \begin{array}{ll} 0 & \text{if}\ i = 0\ \text{or}\ j = 0\\ 1 + OPT(i-1, j-1) & \text{if}\ x_i = y_j \\ \max \{ OPT(i-1,j), OPT(i,j-1) \} & \text{if}\ x_i \ne y_j \\ \end{array}\right.
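The LCS Bellman equation translates directly into a bottom-up table; a minimal sketch in Python (names are illustrative):

```python
def lcs_length(x, y):
    """Length of a longest common subsequence of x and y: Theta(mn) time/space."""
    m, n = len(x), len(y)
    opt = [[0] * (n + 1) for _ in range(m + 1)]       # opt[i][0] = opt[0][j] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:                  # case x_i = y_j
                opt[i][j] = 1 + opt[i - 1][j - 1]
            else:                                     # case x_i != y_j
                opt[i][j] = max(opt[i - 1][j], opt[i][j - 1])
    return opt[m][n]
```

Tracing back through opt recovers a longest common subsequence itself (e.g., GGCAACG for the example above).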
Solution 2. Reduce to finding a min-cost alignment of x and y with gap penalty \delta = 1, \alpha_{pq} = 0 if p = q, and \alpha_{pq} = 2 (in effect forbidding mismatched pairs) if p \ne q; then the LCS has length (m + n – cost) / 2.
Analysis. O(mn) time and O(m + n) space.
Shortest-path problem. Given a digraph G = (V, E), with arbitrary edge lengths l_{vw}, find shortest path from source node s to destination node t.
Dijkstra. May not produce shortest paths when edge lengths are negative.
Re-weighting. Adding a constant to every edge length does not necessarily make Dijkstra’s algorithm produce shortest paths.
Def. A negative cycle is a directed cycle for which the sum of its edge lengths is negative.
Lemma 1. If some v \leadsto t path contains a negative cycle, then there does not exist a shortest v \leadsto t path.
Pf. If there exists such a cycle W, then we can build a v \leadsto t path of arbitrarily negative length by detouring around W as many times as desired.
Lemma 2. If G has no negative cycles, then there exists a shortest v \leadsto t path that is simple (and has \le n – 1 edges).
Pf. Among all shortest v \leadsto t paths, consider one using the fewest edges. If it contained a directed cycle W, deleting W would yield a v \leadsto t path with fewer edges and no greater length (since l(W) \ge 0).
Single-destination shortest-paths problem. Given a digraph G = (V, E) with edge lengths l_{vw} (but no negative cycles) and a distinguished node t, find a shortest v \leadsto t path for every node v.
Negative-cycle problem. Given a digraph G = (V, E) with edge lengths l_{vw}, find a negative cycle (if one exists).
Which subproblems should we use to find shortest v \leadsto t paths for every node v?
A. OPT(i, v) = length of shortest v \leadsto t path that uses exactly i edges.
B. OPT(i, v) = length of shortest v \leadsto t path that uses at most i edges.
C. Neither A nor B.
B. Choice A cannot eliminate longer paths as it goes: a later negative edge may greatly reduce a path’s length and overturn earlier comparisons, so all paths would have to be kept, reducing to brute force.
Def. OPT(i, v) = length of shortest v \leadsto t path that uses \le i edges.
Goal. OPT(n – 1, v) for each v.
Case 1. Shortest v \leadsto t path uses \le i – 1 edges.
Case 2. Shortest v \leadsto t path uses exactly i edges.
Bellman equation.
OPT(i, v) = \left\{ \begin{array}{ll} 0 & \text{if}\ i = 0\ \text{and}\ v = t\\ \infty & \text{if}\ i = 0\ \text{and}\ v \ne t\\ \min \{OPT(i-1, v),\min_{(v,w) \in E} \{OPT(i-1,w) + l_{vw}\} \} & \text{if}\ i > 0 \\ \end{array}\right.
SHORTEST-PATHS (V, E, l, t)
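Evaluating the Bellman equation bottom-up over i gives the DP algorithm; a minimal sketch in Python, using a rolling 1D table instead of the full 2D one (graph representation and names are illustrative):

```python
import math

def shortest_path_lengths(nodes, edges, t):
    """OPT(n - 1, v) for every node v.  edges: dict mapping (v, w) -> l_vw.
    Assumes no negative cycles."""
    nodes = list(nodes)
    out = {v: [] for v in nodes}                    # adjacency: edges leaving v
    for (v, w), l in edges.items():
        out[v].append((w, l))
    opt = {v: (0.0 if v == t else math.inf) for v in nodes}   # i = 0
    for i in range(1, len(nodes)):                  # i = 1 .. n-1
        opt = {v: min([opt[v]] + [opt[w] + l for w, l in out[v]])
               for v in nodes}
    return opt

# Example with a negative edge (no negative cycle):
d = shortest_path_lengths(["a", "b", "t"],
                          {("a", "b"): 2, ("b", "t"): -1, ("a", "t"): 5}, "t")
```

Note the direct edge a → t of length 5 loses to the path a → b → t of length 1, which only emerges in the second iteration.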
Theorem 1. Given a digraph G = (V, E) with no negative cycles, the DP algorithm computes the length of a shortest v \leadsto t path for every node v in \Theta(mn) time and \Theta(n^2) space.
Pf. Iteration i examines each edge once, taking \Theta(m) time, and there are n – 1 iterations; the table OPT(i, v) has \Theta(n^2) entries.
Finding the shortest paths. Maintain successor[i, v] = next node on a shortest v \leadsto t path using \le i edges.
It is easy to modify the DP algorithm for shortest paths to …
A. Compute lengths of shortest paths in O(mn) time and O(m + n) space.
B. Compute shortest paths in O(mn) time and O(m + n) space.
C. Both A and B.
D. Neither A nor B.
Space optimization. Maintain two 1D arrays (instead of the 2D array): d[v] = length of a shortest v \leadsto t path found so far, and successor[v] = next node on that path.
Performance optimization. If d[w] was not updated in iteration i – 1, then there is no reason to consider edges entering w in iteration i.
BELLMAN–FORD–MOORE (V, E, l, t)
FOREACH node v:
    d[v] = INF; successor[v] = null.
d[t] = 0.
FOR i = 1 .. n – 1:
    FOREACH node w:
        IF (d[w] was updated in previous pass):
            FOREACH edge (v, w):
                IF (d[v] > d[w] + l_vw):
                    d[v] = d[w] + l_vw; successor[v] = w.
    IF (no d[.] value changed in pass i): STOP.
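A direct Python rendering of the pseudocode with both optimizations (two 1D arrays; skip nodes not updated in the previous pass). The set-based frontier bookkeeping is an implementation choice, not from the slides:

```python
import math

def bellman_ford_moore(nodes, edges, t):
    """Single-destination shortest paths.  edges: dict mapping (v, w) -> l_vw.
    Assumes no negative cycles; returns (d, successor)."""
    into = {w: [] for w in nodes}                 # edges grouped by head node w
    for (v, w), l in edges.items():
        into[w].append((v, l))
    d = {v: math.inf for v in nodes}
    successor = {v: None for v in nodes}
    d[t] = 0.0
    updated = {t}
    for _ in range(len(nodes) - 1):               # passes i = 1 .. n-1
        frontier, updated = updated, set()
        for w in frontier:                        # only w updated in previous pass
            for v, l in into[w]:
                if d[v] > d[w] + l:
                    d[v] = d[w] + l
                    successor[v] = w
                    updated.add(v)
        if not updated:                           # early termination
            break
    return d, successor

d, successor = bellman_ford_moore(["a", "b", "t"],
                                  {("a", "b"): 2, ("b", "t"): -1, ("a", "t"): 5}, "t")
```

In the example, pass 1 sets d[a] = 5 via the direct edge; pass 2 improves it to 1 via b, rewiring successor[a] to b.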
Which properties must hold after pass i of Bellman–Ford–Moore?
A. d[v] = length of a shortest v \leadsto t path using \le i edges.
B. d[v] = length of a shortest v \leadsto t path using exactly i edges.
C. Both A and B.
D. Neither A nor B.
D. With the performance optimization, i is just a counter of the n – 1 passes; d[v] may already account for paths using more than i edges, and in general is only an upper bound (Lemma 5).
Lemma 3. For each node v: d[v] is the length of some v \leadsto t path.
Lemma 4. For each node v: d[v] is monotone non-increasing.
Lemma 5. After pass i, d[v] \le length of a shortest v \leadsto t path using \le i edges.
Pf. [ by induction on i ] Let P be any v \leadsto t path with \le i edges, let (v, w) be its first edge, and let P′ be the remainder of P. By the inductive hypothesis, after pass i – 1 we had d[w] \le l(P′); d[w] never increases (Lemma 4), and edge (v, w) is considered in pass i, so afterwards:
\begin{aligned} d[v] & \le l_{vw} + d[w] \\ & \le l_{vw} + l(P') \\ & = l(P) \\ \end{aligned}
Theorem 2. Assuming no negative cycles, Bellman–Ford–Moore computes the lengths of the shortest v \leadsto t paths in O(mn) time and \Theta(n) extra space.
Pf. Lemma 2 + Lemma 5: a shortest path with \le n – 1 edges exists, and after n – 1 passes d[v] is at most its length (and, by Lemma 3, the length of some path).
Remark. Bellman–Ford–Moore is typically faster in practice than its worst-case O(mn) bound suggests.
Assuming no negative cycles, which properties must hold throughout Bellman–Ford–Moore?
A. Following successor[v] pointers gives a directed v \leadsto t path.
B. If following successor[v] pointers gives a directed v \leadsto t path, then the length of that v \leadsto t path is d[v].
C. Both A and B.
D. Neither A nor B.
Claim. Throughout Bellman–Ford–Moore, following the successor[v] pointers gives a directed path from v to t of length d[v].
Counterexample. Claim is false! The successor v \leadsto t path may be strictly shorter than d[v] (d[w] may decrease after successor[v] = w is set); worse, the successor pointers may temporarily form a directed cycle.
Lemma 6. Any directed cycle W in the successor graph is a negative cycle.
Pf. If successor[v] = w, we must have d[v] \ge d[w] + l_{vw} (when the pointer was set, d[v] = d[w] + l_{vw}, and d[w] can only decrease afterwards). Let v_1 \to v_2 \to \cdots \to v_k \to v_1 be the nodes of W, where successor[v_k] = v_1 is the pointer being updated now (so that inequality is strict):
\begin{aligned} d[v_1] & \ge l(v_1, v_2) + d[v_2] \\ d[v_2] & \ge l(v_2, v_3) + d[v_3] \\ & \vdots \\ d[v_{k-1}] & \ge l(v_{k-1}, v_k) + d[v_k] \\ d[v_k] & > l(v_k, v_1) + d[v_1] \\ \end{aligned}
Summing these inequalities, the d[\cdot] terms cancel, leaving 0 > l(v_1, v_2) + \cdots + l(v_k, v_1) = l(W).
Theorem 3. Assuming no negative cycles, Bellman–Ford–Moore finds shortest v \leadsto t paths for every node v in O(mn) time and \Theta(n) extra space.
Pf. Upon termination, if successor[v] = w, we must have d[v] = d[w] + l_{vw} (otherwise a further pass would update d[v]). By Lemma 6 the successor graph is acyclic, so following successor pointers from v_1 = v reaches t along some path v_1 \to v_2 \to \cdots \to v_k = t with
\begin{aligned} d[v_1] & = l(v_1, v_2) + d[v_2] \\ d[v_2] & = l(v_2, v_3) + d[v_3] \\ & \vdots \\ d[v_{k-1}] & = l(v_{k-1}, v_k) + d[v_k] \\ \end{aligned}
Summing, and using d[v_k] = d[t] = 0, this path has length d[v].
Communication network.
Dijkstra’s algorithm. Requires global information of network.
Bellman–Ford–Moore. Uses only local knowledge of neighboring nodes.
Synchronization. We don’t expect routers to run in lockstep.
Distance-vector routing protocols. [ “routing by rumor” ]
Ex. Routing Information Protocol (RIP), Xerox XNS RIP, Novell’s IPX RIP, Cisco’s IGRP, DEC’s DNA Phase IV, AppleTalk’s RTMP.
Caveat. Edge lengths may change during algorithm (or fail completely).
Link-state routing protocols.
Ex. Border Gateway Protocol (BGP), Open Shortest Path First (OSPF).
Negative cycle detection problem. Given a digraph G = (V, E), with edge lengths l_{vw}, find a negative cycle (if one exists).
Currency conversion. Given n currencies and exchange rates between pairs of currencies, is there an arbitrage opportunity?
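The reduction: an arbitrage opportunity is a directed cycle whose product of exchange rates exceeds 1; taking edge lengths l_vw = –log(rate) turns such a cycle into a negative cycle, since log turns products into sums. A minimal sketch with hypothetical rates:

```python
import math

# Hypothetical exchange rates (illustrative numbers, not market data).
rates = {("USD", "EUR"): 0.9, ("EUR", "GBP"): 0.9, ("GBP", "USD"): 1.3}

# Edge lengths for negative-cycle detection: l_vw = -log(rate_vw).
lengths = {e: -math.log(r) for e, r in rates.items()}

# This 3-cycle multiplies rates to 0.9 * 0.9 * 1.3 = 1.053 > 1 ...
cycle = [("USD", "EUR"), ("EUR", "GBP"), ("GBP", "USD")]
product = math.prod(rates[e] for e in cycle)

# ... so its total -log length is negative: an arbitrage opportunity.
total_length = sum(lengths[e] for e in cycle)
```

Running any negative-cycle detector on the transformed lengths then finds arbitrage cycles directly.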
Remark. Fastest algorithm very valuable!
Lemma 7. If OPT(n, v) = OPT(n – 1, v) for every node v, then there are no negative cycles (that can reach t).
Pf. The OPT(n, v) values have converged \Rightarrow a shortest v \leadsto t path exists for every node v \Rightarrow no v \leadsto t path contains a negative cycle (Lemma 1).
Lemma 8. If OPT(n, v) < OPT(n – 1, v) for some node v, then (any) shortest v \leadsto t path of length \le n contains a cycle W. Moreover, W is a negative cycle.
Pf. [ by contradiction ] Since OPT(n, v) < OPT(n – 1, v), a shortest path P using \le n edges has exactly n edges, hence n + 1 nodes, so some node repeats: P contains a cycle W. Deleting W yields a v \leadsto t path with \le n – 1 edges, so l(P) – l(W) \ge OPT(n – 1, v) > OPT(n, v) = l(P), and therefore l(W) < 0.
Theorem 4. Can find a negative cycle in \Theta(mn) time and \Theta(n^2) space.
Pf.
Construct augmented graph G′: add a new sink node t and connect every node to t with a 0-length edge.
Case 1. [ OPT(n, v) = OPT(n – 1, v) for every node v ] By Lemma 7, G has no negative cycles.
Case 2. [ OPT(n, v) < OPT(n – 1, v) for some node v ] By Lemma 8, any shortest v \leadsto t path using \le n edges contains a negative cycle, which is extracted by tracing back through the DP table.
Theorem 5. Can find a negative cycle in O(mn) time and O(n) extra space.
Pf.
Run Bellman–Ford–Moore on G′ for n′ = n + 1 passes (instead of n′ – 1).
Maintain pass(v) = last pass in which d[v] was updated.
If d[s] is updated in pass n′ for some node s, then pass(s) = n′ and pass(successor[v]) \ge pass(v) – 1 for each v. Following successor pointers from s, the pass values drop by at most 1 per step, so the chain cannot reach t (never updated) within n′ steps; hence it repeats a node, and the resulting successor cycle is a negative cycle by Lemma 6.
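A sketch of the Theorem 5 procedure in Python, simplified to plain edge relaxation rather than the pass-skipping variant (names are illustrative). It runs n′ = n + 1 passes on the augmented graph and, if the last pass still updates some node, walks the successor pointers until a node repeats:

```python
import math

def find_negative_cycle(nodes, edges):
    """Return the nodes of some negative cycle, or None.
    edges: dict mapping (v, w) -> l_vw."""
    sink = object()                                   # augmented sink t'
    all_nodes = list(nodes) + [sink]
    all_edges = dict(edges)
    for v in nodes:
        all_edges[(v, sink)] = 0.0                    # 0-length edge v -> t'
    d = {v: math.inf for v in all_nodes}
    successor = {v: None for v in all_nodes}
    d[sink] = 0.0
    start = None
    for _ in range(len(all_nodes)):                   # n' passes (one extra)
        start = None
        for (v, w), l in all_edges.items():
            if d[w] + l < d[v]:
                d[v] = d[w] + l
                successor[v] = w
                start = v
    if start is None:                                 # last pass made no update
        return None
    seen, v = {}, start                               # walk successors to a repeat
    while v not in seen:
        seen[v] = len(seen)
        v = successor[v]
    return [u for u, idx in seen.items() if idx >= seen[v]]
```

The walk is safe because a node updated in the last pass cannot reach the never-updated sink within n′ successor steps, as argued in the proof.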