Algorithm II
5. Divide and Conquer I
WU Xiaokun 吴晓堃
xkun.wu [at] gmail
Divide-and-conquer.
Most common usage.
Benefit. Closest Pair Problem:
Problem. Given a list L of n elements from a totally ordered universe, rearrange them in ascending order.
Obvious applications.
Some problems become easier once elements are sorted.
Non-obvious applications.
Goal. Combine two sorted lists A and B into a sorted whole C.
Input. List L of
n elements from a totally ordered
universe.
Output. The n elements
in ascending order.
MERGE-SORT
(A): T(n/2);MERGE-SORT
(B): T(n/2);MERGE
(A, B): \Theta(n);Def. T(n) = max number of compares to mergesort a list of length n.
Recurrence.
T(n) \le \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + cn & \text{if} & n > 1 \\ \end{array}\right.
Solution. T(n) is O(n \log_2 n).
Assorted proofs. several ways to solve this recurrence (following slides).
Proposition. If T(n) satisfies the following recurrence, then T(n) = cn \log_2 n.
T(n) = \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ 2T(n/2) + cn & \text{if} & n > 1 \\ \end{array}\right.
Pf. [ by identifying a pattern ]
cn per level, \log_2 n levels.
Proposition. If T(n) satisfies the following recurrence, then T(n) = cn \log_2 n.
T(n) = \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ 2T(n/2) + cn & \text{if} & n > 1 \\ \end{array}\right.
Pf. [ by induction on n ]
\begin{aligned} T(n) & = 2T(n/2) + cn \\ & = 2c(n/2) \log_2(n/2) + cn \\ & = cn[(\log_2 n) − 1]+ cn \\ & = (cn \log_2 n) − cn + cn \\ & = cn \log_2 n \\ \end{aligned}
Proposition. If T(n) satisfies the following recurrence, then T(n) = cn \log_2 n.
T(n) = \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ 2T(n/2) + cn & \text{if} & n > 1 \\ \end{array}\right.
Pf. [ guess T(n) = kn \log_b n ]
Which is the exact solution of the following recurrence?
T(n) = \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + n - 1 & \text{if} & n > 1 \\ \end{array}\right.
A. T(n) = n \lfloor \log_2
n \rfloor
B. T(n) = n \lceil \log_2 n
\rceil
C. T(n) = n \lfloor \log_2 n
\rfloor + 2^{\lfloor \log_2 n \rfloor} − 1
D. T(n) = n \lceil \log_2 n
\rceil − 2^{\lceil \log_2 n \rceil} + 1
E. Not even Knuth knows.
\begin{aligned} T(2n) & = 2T(n) + 2n - 1 \\ (D) & = 2n \lceil \log_2 n \rceil − 2^{\lceil \log_2 n \rceil + 1} + 2 + 2n - 1 \\ & = 2n \lceil \log_2 n \rceil − 2^{\lceil \log_2 n \rceil + 1} + 2n + 1 \\ \end{aligned}
\begin{aligned} T(2n) & = 2n \lceil \log_2 2n \rceil − 2^{\lceil \log_2 2n \rceil} + 1 \\ & = 2n \lceil \log_2 n + 1 \rceil - 2^{\lceil \log_2 n + 1\rceil} + 1 \\ & = 2n (\lceil \log_2 n \rceil + 1) - 2^{\lceil \log_2 n \rceil + 1} + 1 \\ & = 2n \lceil \log_2 n \rceil - 2^{\lceil \log_2 n \rceil + 1} + 2n + 1 \\ \end{aligned}
Proposition. If T(n) satisfies the following recurrence, then T(n) \le n \lceil \log_2 n \rceil.
T(n) \le \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + n & \text{if} & n > 1 \\ \end{array}\right.
Pf. [ by strong induction on n ]
Induction step: assume true for 1, 2, \ldots,
n – 1.
Let n_1 = \lfloor n / 2 \rfloor and
n_2 = \lceil n / 2 \rceil: note that
n = n_1 + n_2.
\begin{aligned} T(n) & \le T(n_1) + T(n_2) + n \\ & \le n_1 \lceil \log_2 n_1 \rceil + n_2 \lceil \log_2 n_2 \rceil + n \\ & \le n_1 \lceil \log_2 n_2 \rceil + n_2 \lceil \log_2 n_2 \rceil + n \\ & = n \lceil \log_2 n_2 \rceil + n \\ (\ast) & \le n (\lceil \log_2 n \rceil – 1) + n \\ & = n \lceil \log_2 n \rceil \\ \end{aligned}
\begin{aligned} n_2 & = \lceil n/2 \rceil \\ & \le \lceil 2^{\lceil \log_2 n \rceil} /2 \rceil \\ & = 2^{\lceil \log_2 n \rceil} /2 \\ & \Downarrow \\ (\ast) \log_2 n_2 & \le \lceil \log_2 n \rceil - 1 \\ \end{aligned}
Challenge. How to prove a lower bound for all conceivable algorithms?
Model of computation. Comparison trees.
Cost. Number of compares.
Q. Realistic model?
A1. Yes. Java, Python, C++, etc.
A2. Yes. Mergesort, insertion sort, quicksort,
heapsort, etc.
A3. No. Bucket sort, radix sorts, etc.
Theorem. Any deterministic compare-based
sorting algorithm must make \Theta(n \log
n) compares in the worst-case.
Pf. [ information theoretic ]
Theorem. Any deterministic compare-based
sorting algorithm must make \Theta(n \log
n) compares in the worst-case.
Pf. [ information theoretic ]
\begin{aligned} 2^h & \ge n! \\ \Rightarrow h & \ge \log_2 n! \\ (Stirling’s) & \ge n \log_2 n - n / \ln 2 \\ \end{aligned}
More general formulation: recursively solve q sub-problems of size n/2 each:
T(n) \le \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ qT(n/2) + cn & \text{if} & n > 1 \\ \end{array}\right.
Proposition. If T(n) satisfies the following recurrence with q \ge 2, then T(n) = O(n^{\log_2 q}).
T(n) \le qT(n/2) + cn
Pf. Geometric sum: \sum_{j=0}^{\log_2 n - 1}(r)^j = \frac{r^{\log_2 n} - 1}{r - 1}.
Especially, O(n^{\log_2 3}) = O(n^{1.59}), O(n^{\log_2 4}) = O(n^2).
Proposition. If T(n) satisfies the following recurrence with q = 1, then T(n) = O(n).
T(n) \le qT(n/2) + cn
Pf. Geometric sum: \sum_{j=0}^{\log_2 n - 1}(\frac{1}{2^j}) = 2.
Special case: spending quadratic time for the initial division and final recombining.
T(n) \le \left\{ \begin{array}{lcl} 0 & \text{if} & n = 1 \\ 2T(n/2) + cn^2 & \text{if} & n > 1 \\ \end{array}\right.
Solution. T(n) is
O(n^2 \log_2 n).
Pf. Geometric sum: \sum_{j=0}^{\log_2 n - 1}(\frac{1}{2^j}) =
2.
Music recommendation. Music site tries to match your preference with others.
Similarity metric: number of inversions between two rankings.
A | B | C | D | E | |
---|---|---|---|---|---|
me | 1 | 2 | 3 | 4 | 5 |
you | 1 | 3 | 4 | 2 | 5 |
Brute force: check all \Theta(n^2) pairs.
1 | 5 | 4 | 8 | 10 | 2 | 6 | 9 | 3 | 7 |
---|---|---|---|---|---|---|---|---|---|
1 | 5 | 4 | 8 | 10 | |||||
2 | 6 | 9 | 3 | 7 |
Output. 1 + 3 + 13 = 17.
Q. How to count inversions (a, b) with a \in
A, b \in B?
A. Easy if A and B are sorted!
Warmup algorithm.
7 | 10 | 18 | 3 | 14 | 20 | 23 | 2 | 11 | 16 |
---|---|---|---|---|---|---|---|---|---|
3 | 7 | 10 | 14 | 18 | |||||
2 | 11 | 16 | 20 | 23 |
Output. 5 + 2 + 1 + 0 + 0 = 8.
Count inversions (a, b) with a \in A, b \in B, assuming A and B are sorted.
Scan A and B from left to right:
7 | 10 | 18 | 3 | 14 | 20 | 23 | 2 | 11 | 16 |
---|---|---|---|---|---|---|---|---|---|
3 | 7 | 10 | 14 | 18 | |||||
a_i | 18 | ||||||||
2 | 11 | 16 | 20 | 23 | |||||
5 | 2 | b_j | 20 | 23 | |||||
2 | 3 | 7 | 10 | 11 |
Input. List L.
Output. Number of inversions in L and L in
sorted order.
SORT-AND-COUNT
(A): T(n/2);SORT-AND-COUNT
(B): T(n/2);MERGE-AND-COUNT
(A, B): \Theta(n);Proposition. The sort-and-count algorithm counts the
number of inversions in a permutation of size n in O(n \log
n) time.
Pf. The worst-case running time T(n) satisfies the recurrence:
T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.
Goal. Given an array A and pivot element p, partition array so that:
7 | 6 | 12 | 3 | 11 | 8 | 9 | 1 | 4 | 10 | 2 |
---|---|---|---|---|---|---|---|---|---|---|
L | 6 | R | ||||||||
3 | 1 | 4 | 2 | 6 | 7 | 12 | 11 | 8 | 9 | 10 |
Challenge. O(n) time and O(1) space.
7 | 6 | 12 | 3 | 11 | 8 | 9 | 1 | 4 | 10 | 2 | |
---|---|---|---|---|---|---|---|---|---|---|---|
partition | 3 | 1 | 4 | 2 | 6 | 7 | 12 | 11 | 8 | 9 | 10 |
sort L | 1 | 2 | 3 | 4 | 6 | ||||||
sort R | 6 | 7 | 8 | 9 | 10 | 11 | 12 | ||||
sorted | 1 | 2 | 3 | 4 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
Input. List A.
Output. A in sorted
order.
PARTITION-3-WAY
(A, p):
\Theta(n);RANDOMIZED-QUICKSORT
(L): T(i);RANDOMIZED-QUICKSORT
(R): T(n-i-1);Proposition. The expected number of compares to
quicksort an array of n distinct
elements a_1 < a_2 < \ldots <
a_n is O(n \log n).
Pf. Consider BST representation of pivot elements.
Given an array of n \ge 8 distinct elements a_1 < a_2 < \ldots < a_n, what is the probability that a_7 and a_8 are compared during randomized quicksort?
A. 0
B. 1 / n
C. 2 / n
D. 1
D: ancestry
Given an array of n \ge 2 distinct elements a_1 < a_2 < \ldots < a_n, what is the probability that a_1 and a_n are compared during randomized quicksort?
A. 0
B. 1 / n
C. 2 / n
D. 1
C: compared iff either is chosen as pivot before any of the other
Proposition. The expected number of compares to
quicksort an array of n distinct
elements a_1 < a_2 < \ldots <
a_n is O(n \log n).
Pf. Consider BST representation of pivot elements.
Proposition. The expected number of compares to
quicksort an array of n distinct
elements a_1 < a_2 < \ldots <
a_n is O(n \log n).
Pf. Consider BST representation of pivot elements.
Expected number of compares:
\begin{aligned} \sum_{i=1}^n\sum_{j=i+1}^n \frac{2}{j-i+1} & = 2 \sum_{i=1}^n\sum_{j=2}^{n-i+1} \frac{1}{j} \\ & \le 2n \sum_{j=1}^{n} \frac{1}{j} \\ (harmonic) & \le 2n (\ln n + 1) \\ \end{aligned}
Remark. Number of compares only decreases on equal elements.
Closest Pair Problem. Given n points in the plane, find a pair of points with the smallest Euclidean distance between them.
Fundamental geometric primitive.
Brute force. Check all pairs with (n^2) distance calculations.
Sorting solution.
Divide. Subdivide region into 4 quadrants.
Obstacle. Impossible to ensure n / 4 points in each piece.
Find closest pair with one point in each side, assuming that distance < \delta.
Observation: suffices to consider only those points within \delta of line L.
Find closest pair with one point in each side, assuming that distance < \delta.
Observation: suffices to consider only those points within \delta of line L.
Def. Let s_i be the point in the 2 \delta-strip, with the i^{th} smallest y-coordinate.
Claim. If |j – i| >
7, then the distance between s_i
and s_j is at least \delta.
Pf.
Consider the 2\delta-by-\delta rectangle R in strip whose min y-coordinate is y-coordinate of s_i.
Input. n points
P = p_1, p_2, \ldots, p_n.
Output. distance \delta.
CLOSEST-PAIR
(points in left half): T(n/2);CLOSEST-PAIR
(points in right half): T(n/2);What is the solution to the following recurrence?
T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + \Theta(n \log n) & \text{if} & n > 1 \\ \end{array}\right.
A. T(n) =
\Theta(n).
B. T(n) = \Theta(n \log
n).
C. T(n) = \Theta(n \log^2
n).
D. T(n) =
\Theta(n^2).
C
Q. How to improve to O(n
\log n) ?
A. Don’t sort points in strip from scratch each
time.
Theorem. [Shamos 1975] The divide-and-conquer
algorithm for finding a closest pair of points in the plane can be
implemented in O(n \log n) time.
Pf.
T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ T(\lfloor n/2 \rfloor) + T(\lceil n/2 \rceil) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.
Theorem. [Ben-Or 1983, Yao 1989] In quadratic decision tree model, any algorithm for closest pair (even in 1D) requires \Omega(n \log n) quadratic tests.
Theorem. [Rabin 1976] There exists an algorithm to find the closest pair of points in the plane whose expected running time is O(n).
Ingenious divide-and-conquer algorithms for core geometric problems.
problem | brute | clever |
---|---|---|
closest pair | O(n^2) | O(n \log n) |
farthest pair | O(n^2) | O(n \log n) |
convex hull | O(n^2) | O(n \log n) |
Delaunay/Voronoi | O(n^4) | O(n \log n) |
Euclidean MST | O(n^2) | O(n \log n) |