Algorithm II
5. Divide and Conquer II
WU Xiaokun 吴晓堃
xkun.wu [at] gmail
Goal. Recipe for solving general divide-and-conquer recurrences:
T(n) = a T(\frac{n}{b}) + f(n)
with T(0) = 0 and T(1) = \Theta(1).
Terms. a = number of subproblems; n / b = size of each subproblem; f(n) = cost of dividing and combining.
Suppose T(n) = a T (n / b) + n^c, with T(1) = 1 and n a power of b.
Note the contribution of the leaves (level \log_b n of the recursion tree): a^{\log_b n} (\frac{n}{b^{\log_b n}})^c = n^{\log_b a} (\frac{n}{n})^c = n^{\log_b a}.
Geometric series. Summing the levels of the recursion tree gives, with ratio r = a / b^c:
T(n) = n^c \sum_{i=0}^{\log_b n} r^i = \left\{ \begin{array}{lcl} \Theta(n^c) & \text{if} & r < 1, \text{i.e. cost dominated by root}\\ \Theta(n^c \log n) & \text{if} & r = 1, \text{i.e. cost evenly distributed}\\ \Theta(n^{\log_b a}) & \text{if} & r > 1, \text{i.e. cost dominated by leaves}\\ \end{array}\right.
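As a sanity check, the recurrence can be unrolled directly and compared against the geometric-series closed form (a minimal sketch; `T` and `closed_form` are illustrative names, and n must be a power of b):

```python
import math

def T(n, a, b, c):
    """Direct evaluation of T(n) = a*T(n/b) + n**c with T(1) = 1."""
    if n == 1:
        return 1
    return a * T(n // b, a, b, c) + n ** c

def closed_form(n, a, b, c):
    """Recursion-tree sum: T(n) = n**c * sum_{i=0}^{log_b n} r**i, r = a / b**c."""
    levels = round(math.log(n, b))
    r = a / b ** c
    return n ** c * sum(r ** i for i in range(levels + 1))
```

For instance, with a = 2, b = 2, c = 1 (the r = 1 case) both give T(8) = 32.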
Master theorem. Let a \ge 1, b \ge 2, c \ge 0 and suppose T(n) is a function on the non-negative integers that satisfies the recurrence T(n) = a T (n / b) + \Theta(n^c) with T(0) = 0, T(1) = \Theta(1), where n / b means either \lfloor n / b \rfloor or \lceil n / b \rceil. Then,
Case 1. If c > \log_b a, then T(n) = \Theta(n^c).
Case 2. If c = \log_b a, then T(n) = \Theta(n^c \log n).
Case 3. If c < \log_b a, then T(n) = \Theta(n^{\log_b a}).
Pf sketch. The recursion-tree argument above extends to this general setting; rounding n / b up or down changes subproblem sizes by at most a constant per level, which does not affect the \Theta bounds.
Extensions.
Ex. [Case 1] T(n) = 48 T(\lfloor n / 4 \rfloor) + n^3. Here a = 48, b = 4, c = 3 and \log_4 48 \approx 2.79 < c, so T(n) = \Theta(n^3).
Ex. [Case 2] T(n) = T(\lfloor n / 2 \rfloor) + T(\lceil n / 2 \rceil) + 17 n. Here a = 2, b = 2, c = 1 and \log_2 2 = 1 = c, so T(n) = \Theta(n \log n).
Ex. [Case 3] T(n) = 3 T(\lfloor n / 2 \rfloor) + 5 n. Here a = 3, b = 2, c = 1 and \log_2 3 \approx 1.58 > c, so T(n) = \Theta(n^{\log_2 3}).
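The case analysis above reduces to comparing c with the critical exponent \log_b a, which is easy to mechanize (a minimal sketch; `master_case` is an illustrative name):

```python
import math

def master_case(a, b, c):
    """Classify T(n) = a*T(n/b) + Theta(n**c) by the master theorem."""
    crit = math.log(a, b)          # critical exponent log_b a
    if math.isclose(c, crit):
        return 2                   # T(n) = Theta(n**c * log n)
    return 1 if c > crit else 3    # 1: Theta(n**c); 3: Theta(n**crit)
```

The three examples land in cases 1, 2, and 3 respectively.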
Gaps in master theorem.
Theorem. [Akra–Bazzi] Let a_i > 0, 0 < b_i < 1, functions |h_i(n)| = O(n / \log^2 n) and g(n) = O(n^c). If T(n) satisfies the recurrence:
T(n) = \sum_{i=1}^k a_i T(b_i n + h_i(n)) + g(n)
then, T(n) = \Theta\left(n^p \left(1 + \int_1^n \frac{g(u)}{u^{p+1}}\,du\right)\right), where p satisfies \sum_{i=1}^k a_i b_i^p = 1.
Ex. T(n) = T(\lfloor n / 5 \rfloor) + T(n - 3\lfloor n / 10 \rfloor) + 11/5 n, with T(0) = 0, T(1) = 0.
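The exponent p has no closed form in general, but since \sum_i a_i b_i^p is strictly decreasing in p it can be found by bisection (a minimal sketch; `akra_bazzi_p` is an illustrative name). For the example, the terms are T(n/5) and T(7n/10), so p solves (1/5)^p + (7/10)^p = 1, giving p \approx 0.84 < 1; the integral \int_1^n u^{-p}\,du = \Theta(n^{1-p}) then yields T(n) = \Theta(n).

```python
def akra_bazzi_p(coeffs, tol=1e-12):
    """Solve sum a_i * b_i**p = 1 for p by bisection.
    coeffs: list of (a_i, b_i) pairs with a_i > 0 and 0 < b_i < 1,
    so the left-hand side is strictly decreasing in p."""
    def f(p):
        return sum(a * b ** p for a, b in coeffs) - 1
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```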
Addition. Given two n-bit integers a and b, compute a + b.
Subtraction. Given two n-bit integers a and b, compute a - b.
Grade-school algorithm. \Theta(n) bit operations.
carries | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | |
---|---|---|---|---|---|---|---|---|---|
| | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |
+ | | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
= | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
Remark. Grade-school addition and subtraction algorithms are optimal.
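The grade-school algorithm can be sketched as a ripple-carry loop over the bits, LSB to MSB (a minimal sketch; `ripple_add` is an illustrative name):

```python
def ripple_add(a_bits, b_bits):
    """Grade-school binary addition of MSB-first bit lists; Theta(n) bit ops."""
    n = max(len(a_bits), len(b_bits))
    a = [0] * (n - len(a_bits)) + a_bits   # pad to equal length
    b = [0] * (n - len(b_bits)) + b_bits
    carry, out = 0, []
    for i in range(n - 1, -1, -1):         # sweep from least significant bit
        s = a[i] + b[i] + carry
        out.append(s & 1)
        carry = s >> 1
    if carry:
        out.append(1)
    return out[::-1]
```

On the worked example, 11010101 + 01111101 = 101010010 (i.e. 213 + 125 = 338).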
Multiplication. Given two n-bit integers a and b, compute a \times b.
Grade-school algorithm (long multiplication). \Theta(n^2) bit operations.
Conjecture. [Kolmogorov 1956] Grade-school algorithm is optimal.
Theorem. [Karatsuba 1960] Conjecture is false.
To multiply two n-bit integers x and y: write x = 2^m a + b and y = 2^m c + d, where b and d are the low-order m = \lceil n / 2 \rceil bits. Then
x y = (2^m a + b) (2^m c + d) = 2^{2m} ac + 2^m (bc + ad) + bd
Ex. x = 1000\,1101 (a = 1000, b = 1101); y = 1110\,0001 (c = 1110, d = 0001).
The warmup algorithm makes 4 recursive calls: MULTIPLY(a, c, m), MULTIPLY(b, d, m), MULTIPLY(b, c, m), MULTIPLY(a, d, m).
Time. Splitting and combining: \Theta(n); recursive calls: 4 T(\lceil n/2 \rceil).
Q. How many bit operations to multiply two n-bit integers using the divide-and-conquer multiplication algorithm?
T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ 4 T(\lceil n/2 \rceil) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.
A. T(n) = \Theta(n^{1/2}).
B. T(n) = \Theta(n \log n).
C. T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585}).
D. T(n) = \Theta(n^2).
D: T(n) = \Theta(n^{\log_2 4}) = \Theta(n^2)
To multiply two n-bit integers x and y:
\begin{aligned} x y = (2^m a + b) (2^m c + d) & = 2^{2m} ac + 2^m (bc + ad) + bd \\ & = 2^{2m} ac + 2^m (ac + bd - (a - b) (c - d)) + bd \\ \end{aligned}
Karatsuba’s algorithm makes 3 recursive calls: KARATSUBA-MULTIPLY(a, c, m), KARATSUBA-MULTIPLY(b, d, m), KARATSUBA-MULTIPLY(|a - b|, |c - d|, m).
Time. Splitting and combining: \Theta(n); recursive calls: 3 T(\lceil n/2 \rceil).
Proposition. Karatsuba’s algorithm requires O(n^{1.585}) bit operations to multiply two n-bit integers.
Pf. Apply Case 3 of the master theorem to the recurrence:
T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ 3 T(\lceil n/2 \rceil) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.
\Rightarrow T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585})
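The three-multiplication recursion can be sketched directly on Python integers (a minimal sketch, not a production routine; the sign bookkeeping handles the fact that a - b and c - d may be negative):

```python
def karatsuba(x, y, n):
    """Multiply nonnegative integers x, y < 2**n with 3 recursive multiplications."""
    if n <= 1:
        return x * y
    m = (n + 1) // 2                       # split off the low m = ceil(n/2) bits
    a, b = x >> m, x & ((1 << m) - 1)      # x = 2^m * a + b
    c, d = y >> m, y & ((1 << m) - 1)      # y = 2^m * c + d
    ac = karatsuba(a, c, n - m)
    bd = karatsuba(b, d, m)
    sign = 1 if (a >= b) == (c >= d) else -1
    e = karatsuba(abs(a - b), abs(c - d), m)
    mid = ac + bd - sign * e               # = b*c + a*d
    return (ac << (2 * m)) + (mid << m) + bd
```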
Practice.
arithmetic problem | formula | bit complexity |
---|---|---|
integer multiplication | a \times b | M(n) |
integer square | a^2 | \Theta(M(n)) |
integer division | \lfloor a / b \rfloor, a \mod b | \Theta(M(n)) |
integer square root | \lfloor \sqrt{a} \rfloor | \Theta(M(n)) |
Integer arithmetic problems have the same bit complexity M(n) as integer multiplication.
Dot product. Given two length-n vectors a and b, compute c = a \cdot b = \sum_{i=1}^n a_i b_i.
Grade-school. \Theta(n) arithmetic operations.
Ex. a = [.70 .20 .10 ], b = [.30 .40 .30 ]:
a \cdot b = (.70 \times .30) + (.20 \times .40) + (.10 \times .30) = .32
Remark. “Grade-school” dot product algorithm is asymptotically optimal.
Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB.
Grade-school. \Theta(n^3) arithmetic operations.
\left[ \begin{array}{c} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \\ \end{array} \right] = \left[ \begin{array}{c} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \\ \end{array} \right] \times \left[ \begin{array}{c} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \\ \end{array} \right]
Q. Is “grade-school” matrix multiplication algorithm asymptotically optimal?
\begin{aligned} C_{11} & = A_{11} \times B_{11} + A_{12} \times B_{21} \\ & = \left[ \begin{array}{c} 0 & 1 \\ 4 & 5 \\ \end{array} \right] \times \left[ \begin{array}{c} 16 & 17 \\ 20 & 21 \\ \end{array} \right] + \left[ \begin{array}{c} 2 & 3 \\ 6 & 7 \\ \end{array} \right] \times \left[ \begin{array}{c} 24 & 25 \\ 28 & 29 \\ \end{array} \right] \\ & = \left[ \begin{array}{c} 152 & 158 \\ 504 & 526 \\ \end{array} \right] \\ \end{aligned}
To multiply two n-by-n matrices A and B:
\begin{aligned} C & = A \times B \\ \left[ \begin{array}{c} C_{11} & C_{12} \\ C_{21} & C_{22} \\ \end{array} \right] & = \left[ \begin{array}{c} A_{11} & A_{12} \\ A_{21} & A_{22} \\ \end{array} \right] \times \left[ \begin{array}{c} B_{11} & B_{12} \\ B_{21} & B_{22} \\ \end{array} \right] \\ \end{aligned}
\begin{aligned} C_{11} & = A_{11} \times B_{11} + A_{12} \times B_{21} \\ C_{12} & = A_{11} \times B_{12} + A_{12} \times B_{22} \\ C_{21} & = A_{21} \times B_{11} + A_{22} \times B_{21} \\ C_{22} & = A_{21} \times B_{12} + A_{22} \times B_{22} \\ \end{aligned}
Running time. Apply Case 3 of the master theorem.
T(n) = 8 T(n/2) + \Theta(n^2) \Rightarrow T(n) = \Theta(n^3)
Key idea. Can multiply two 2-by-2 matrices via 7 scalar multiplications (plus 11 additions and 7 subtractions).
\begin{aligned} \left[ \begin{array}{c} C_{11} & C_{12} \\ C_{21} & C_{22} \\ \end{array} \right] & = \left[ \begin{array}{c} A_{11} & A_{12} \\ A_{21} & A_{22} \\ \end{array} \right] \times \left[ \begin{array}{c} B_{11} & B_{12} \\ B_{21} & B_{22} \\ \end{array} \right] \\ \end{aligned} \begin{aligned} C_{11} & = P_5 + P_4 - P_2 + P_6 \\ C_{12} & = P_1 + P_2 \\ C_{21} & = P_3 + P_4 \\ C_{22} & = P_1 + P_5 - P_3 - P_7 \\ \end{aligned}
\begin{aligned} P_1 & = A_{11} \times (B_{12} - B_{22}) \\ P_2 & = (A_{11} + A_{12}) \times B_{22} \\ P_3 & = (A_{21} + A_{22}) \times B_{11} \\ P_4 & = A_{22} \times (B_{21} - B_{11}) \\ P_5 & = (A_{11} + A_{22}) \times (B_{11} + B_{22}) \\ P_6 & = (A_{12} - A_{22}) \times (B_{21} + B_{22}) \\ P_7 & = (A_{11} - A_{21}) \times (B_{11} + B_{12}) \\ \end{aligned}
Pf. C_{12} = P_1 + P_2 = A_{11} \times (B_{12} - B_{22}) + (A_{11} + A_{12}) \times B_{22} = A_{11} \times B_{12} + A_{12} \times B_{22}
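All four block equations can be checked numerically in the scalar (1-by-1 block) case, where the seven products are ordinary multiplications (a minimal sketch; `strassen_2x2` is an illustrative name):

```python
def strassen_2x2(A, B):
    """Multiply 2x2 matrices (tuples of row tuples) via Strassen's 7 products."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    p1 = a11 * (b12 - b22)
    p2 = (a11 + a12) * b22
    p3 = (a21 + a22) * b11
    p4 = a22 * (b21 - b11)
    p5 = (a11 + a22) * (b11 + b22)
    p6 = (a12 - a22) * (b21 + b22)
    p7 = (a11 - a21) * (b11 + b12)
    return ((p5 + p4 - p2 + p6, p1 + p2),     # C11, C12
            (p3 + p4, p1 + p5 - p3 - p7))     # C21, C22
```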
Strassen’s algorithm makes 7 recursive calls:
P_1 = STRASSEN(n / 2, A_{11}, (B_{12} - B_{22}));
P_2 = STRASSEN(n / 2, (A_{11} + A_{12}), B_{22});
P_3 = STRASSEN(n / 2, (A_{21} + A_{22}), B_{11});
P_4 = STRASSEN(n / 2, A_{22}, (B_{21} - B_{11}));
P_5 = STRASSEN(n / 2, (A_{11} + A_{22}), (B_{11} + B_{22}));
P_6 = STRASSEN(n / 2, (A_{12} - A_{22}), (B_{21} + B_{22}));
P_7 = STRASSEN(n / 2, (A_{11} - A_{21}), (B_{11} + B_{12}));
Time. Recursive calls: 7 T(n/2) + \Theta(n^2); matrix additions and subtractions: \Theta(n^2).
Theorem. [Strassen 1969] Strassen’s algorithm requires O(n^{2.81}) arithmetic operations to multiply two n-by-n matrices.
Pf. Apply Case 3 of the master theorem to T(n) = 7 T(n/2) + \Theta(n^2) \Rightarrow T(n) = \Theta(n^{\log_2 7}) = O(n^{2.81}).
Implementation issues.
Common misperception. “Strassen’s algorithm is only a theoretical curiosity.”
Suppose that you could multiply two 3-by-3 matrices with 21 scalar multiplications. How fast could you multiply two n-by-n matrices?
A. \Theta(n^{\log_3 21})
B. \Theta(n^{\log_2 21})
C. \Theta(n^{\log_9 21})
D. \Theta(n^2)
A: T(n) = 21 T(n/3) + \Theta(n^2) \Rightarrow T(n) = \Theta(n^{\log_3 21}) = O(n^{2.78})
Q. Multiply two 2-by-2 matrices with 7 scalar multiplications?
A. Yes! [Strassen 1969] \Theta(n^{\log_2 7}) = O(n^{2.81}).
Q. Multiply two 2-by-2 matrices with 6 scalar multiplications?
A. Impossible. [Hopcroft–Kerr, Winograd 1971] Six multiplications would have given \Theta(n^{\log_2 6}) = O(n^{2.59}).
linear algebra problem | expression | arithmetic complexity |
---|---|---|
matrix multiplication | A × B | MM(n) |
matrix squaring | A^2 | \Theta(MM(n)) |
matrix inversion | A^{-1} | \Theta(MM(n)) |
determinant | \det A | \Theta(MM(n)) |
rank | rank(A) | \Theta(MM(n)) |
system of linear equations | Ax = b | \Theta(MM(n)) |
LU decomposition | A = L U | \Theta(MM(n)) |
least squares | \min \Vert Ax - b\Vert_2 | \Theta(MM(n)) |
Numerical linear algebra problems have the same arithmetic complexity MM(n) as matrix multiplication.
Fourier theorem. [Fourier, Dirichlet, Riemann] Any (sufficiently smooth) periodic function can be expressed as the sum of a series of sinusoids.
Euler’s identity. e^{ix} = \cos x + i \sin x.
Sinusoids. Sum of sine and cosines = sum of complex exponentials.
FFT. Fast way to convert between time domain and frequency domain.
Alternate viewpoint. Fast way to multiply and evaluate polynomials.
“ If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it. ” — Numerical Recipes
Univariate polynomial. [ coefficient representation ]
Addition. O(n) arithmetic operations.
Evaluation. O(n) using Horner’s method.
v = 0;
for j = n-1 .. 0: v = a[j] + (x * v);
Multiplication (linear convolution). O(n^2) using brute-force.
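Horner’s rule as a runnable sketch (`horner` is an illustrative name): it evaluates a_0 + a_1 x + \ldots + a_{n-1} x^{n-1} with n - 1 multiplications and n - 1 additions.

```python
def horner(a, x):
    """Evaluate the polynomial with coefficient list a (a[j] is the x**j term)."""
    v = 0
    for j in range(len(a) - 1, -1, -1):   # j = n-1 .. 0
        v = a[j] + x * v
    return v
```

For example, with a = [1, 2, 3] and x = 2: 1 + 2·2 + 3·4 = 17.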
Fundamental theorem of algebra. A degree n univariate polynomial with complex coefficients has exactly n complex roots.
Corollary. A degree n - 1 univariate polynomial A(x) is uniquely specified by its evaluation at n distinct values of x.
Univariate polynomial. [ point-value representation ]
Addition. O(n) arithmetic operations.
Multiplication. O(n), but represent A(x) and B(x) using 2n points.
Evaluation. O(n^2) using Lagrange’s formula.
Tradeoff. Either fast evaluation or fast multiplication. We want both!
representation | multiply | evaluate |
---|---|---|
coefficient | O(n^2) | O(n) |
point-value | O(n) | O(n^2) |
Goal. Efficient conversion between two representations \Rightarrow all ops fast.
Application. Polynomial multiplication (coefficient representation).
1. FFT: convert coefficient representation to point-value, O(n \log n).
2. Multiply point values pairwise, O(n).
3. Inverse FFT: convert point-value representation back to coefficients, O(n \log n).
Coefficient \Rightarrow Point-value. Given a polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, ..., x_{n-1}.
\left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ \vdots \\ y_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & x_{0} & x_{0}^2 & \cdots & x_{0}^{n-1} \\ 1 & x_{1} & x_{1}^2 & \cdots & x_{1}^{n-1} \\ 1 & x_{2} & x_{2}^2 & \cdots & x_{2}^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \\ \end{array} \right] \times \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{n-1} \\ \end{array} \right]
Running time. O(n^2) via matrix-vector multiply (Horner’s method costs O(n) per point, so n points also cost O(n^2)).
Point-value \Rightarrow Coefficient. Given n distinct points x_0, ..., x_{n-1} and values y_0, ..., y_{n-1}, find unique polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, that has given values at given points.
\left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ \vdots \\ y_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & x_{0} & x_{0}^2 & \cdots & x_{0}^{n-1} \\ 1 & x_{1} & x_{1}^2 & \cdots & x_{1}^{n-1} \\ 1 & x_{2} & x_{2}^2 & \cdots & x_{2}^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \\ \end{array} \right] \times \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{n-1} \\ \end{array} \right]
Running time. O(n^3) via Gaussian elimination.
Decimation in time. Divide into even- and odd- degree terms.
Decimation in frequency. Divide into low- and high-degree terms.
Coefficient \Rightarrow Point-value. Given a polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, ..., x_{n-1}.
Divide. Break up polynomial into even- and odd-degree terms: A_{even}(x) = a_0 + a_2 x + a_4 x^2 + \cdots and A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + \cdots, so that A(x) = A_{even}(x^2) + x\, A_{odd}(x^2).
Need 4 evaluations.
Intuition. Choose four complex points to be \pm 1, \pm i.
What if n \ge 8?
Def. An n^{th} root of unity is a complex number x such that x^n = 1.
Fact. The n^{th} roots of unity are: \omega^0, \omega^1, \ldots, \omega^{n-1} where \omega = e^{2π i / n}.
Pf. (\omega^k)^n = (e^{2π i k / n})^n = (e^{π i})^{2k} = (−1)^{2k} = 1.
Fact. The (n/2)^{th} roots of unity are: \nu^0, \nu^1, \ldots, \nu^{n/2-1} where \nu = \omega^2 = e^{4π i / n}.
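Both facts are easy to verify numerically (a minimal sketch; `roots_of_unity` is an illustrative name):

```python
import cmath

def roots_of_unity(n):
    """The n-th roots of unity: omega**k for omega = e^{2*pi*i/n}."""
    omega = cmath.exp(2j * cmath.pi / n)
    return [omega ** k for k in range(n)]
```

Each root raised to the n-th power gives 1, and squaring the n-th roots lands on the (n/2)-th roots.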
Coefficient \Rightarrow Point-value. Given a polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, ..., x_{n-1}.
Key idea. Choose x_k = \omega^k where \omega is principal n^{th} root of unity.
\left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^2 & \omega^3 & \cdots & \omega^{(n-1)} \\ 1 & \omega^{2} & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^{3} & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \\ \end{array} \right] \times \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \\ \vdots \\ a_{n-1} \\ \end{array} \right]
Goal. Evaluate a degree n - 1 polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1} at its n^{th} roots of unity: \omega^0, \omega^1, \ldots, \omega^{n-1}.
Divide. Break up polynomial into even- and odd-degree terms: A_{even}(x) = a_0 + a_2 x + a_4 x^2 + \cdots and A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + \cdots, so that A(x) = A_{even}(x^2) + x\, A_{odd}(x^2).
Conquer. Evaluate A_{even}(x) and A_{odd}(x) at (n/2)^{th} roots of unity: \nu^0, \nu^1, \ldots, \nu^{n/2-1}
Combine. y_k = A_{even}(\nu^k) + \omega^k A_{odd}(\nu^k) and y_{k + n/2} = A_{even}(\nu^k) - \omega^k A_{odd}(\nu^k), for 0 \le k < n/2 (using \omega^{n/2} = -1).
The FFT makes 2 recursive calls: FFT(n / 2, a_0, a_2, a_4, \ldots, a_{n-2}) and FFT(n / 2, a_1, a_3, a_5, \ldots, a_{n-1}).
Time. Recursive calls: 2 T(n/2); combining: \Theta(n).
Theorem. The FFT algorithm evaluates a degree n - 1 polynomial at each of the n^{th} roots of unity in O(n \log n) arithmetic operations and O(n) extra space.
Pf.
T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ 2 T(n/2) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.
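The divide/conquer/combine steps translate directly into a recursive sketch (illustrative names; n must be a power of 2, and \omega = e^{2\pi i / n} as above):

```python
import cmath

def fft(a):
    """Evaluate the polynomial with coefficients a at the n-th roots of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])                    # A_even at the (n/2)-th roots
    odd = fft(a[1::2])                     # A_odd  at the (n/2)-th roots
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(2j * cmath.pi * k / n)
        y[k] = even[k] + w * odd[k]        # combine step
        y[k + n // 2] = even[k] - w * odd[k]
    return y
```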
When computing the FFT of (a_0, a_1, a_2, \dots, a_7), which are the first two coefficients involved in an arithmetic operation?
A. a_0 and a_1.
B. a_0 and a_2.
C. a_0 and a_4.
D. a_0 and a_7.
E. None of the above.
C: first leaf of the FFT tree.
Alternative viewpoint. FFT is a recursive decomposition of Fourier matrix.
F_n = \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^2 & \omega^3 & \cdots & \omega^{(n-1)} \\ 1 & \omega^{2} & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^{3} & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \\ \end{array} \right] a = \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{n-1} \\ \end{array} \right]
I_n = \left[ \begin{array}{c} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \end{array} \right] D_n = \left[ \begin{array}{c} \omega^{0} & 0 & 0 & \cdots & 0 \\ 0 & \omega^{1} & 0 & \cdots & 0 \\ 0 & 0 & \omega^{2} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \omega^{n-1} \\ \end{array} \right]
y = F_n a = \left[ \begin{array}{c} I_{n/2} & D_{n/2} \\ I_{n/2} & -D_{n/2} \\ \end{array} \right] \left[ \begin{array}{c} F_{n/2} a_{even} \\ F_{n/2} a_{odd} \\ \end{array} \right]
Point-value \Rightarrow Coefficient. Given n distinct points x_0, ..., x_{n-1} and values y_0, ..., y_{n-1}, find unique polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, that has given values at given points.
\left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \\ \vdots \\ a_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^2 & \omega^3 & \cdots & \omega^{(n-1)} \\ 1 & \omega^{2} & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^{3} & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \\ \end{array} \right]^{-1} \left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n-1} \\ \end{array} \right]
Claim. Inverse of Fourier matrix F_n is given by following formula:
G_n = \frac{1}{n} \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{-1} & \omega^{-2} & \omega^{-3} & \cdots & \omega^{-(n-1)} \\ 1 & \omega^{-2} & \omega^{-4} & \omega^{-6} & \cdots & \omega^{-2(n-1)} \\ 1 & \omega^{-3} & \omega^{-6} & \omega^{-9} & \cdots & \omega^{-3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{-(n-1)} & \omega^{-2(n-1)} & \omega^{-3(n-1)} & \cdots & \omega^{-(n-1)(n-1)} \\ \end{array} \right]
Consequence. To compute the inverse FFT, apply the same algorithm but use \omega^{-1} = e^{-2π i / n} as principal n^{th} root of unity (and divide the result by n).
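That consequence can be checked with a round trip: run the same butterfly with \omega^{-1} and divide by n (a minimal sketch; the `sign` parameter is an illustrative way to share the forward and inverse code):

```python
import cmath

def fft(a, sign=1):
    """Butterfly FFT; sign=-1 uses omega^{-1} as the root of unity."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], sign), fft(a[1::2], sign)
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        y[k] = even[k] + w * odd[k]
        y[k + n // 2] = even[k] - w * odd[k]
    return y

def inverse_fft(y):
    """Apply the FFT with omega^{-1}, then divide by n."""
    n = len(y)
    return [v / n for v in fft(y, sign=-1)]
```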
Claim. F_n and G_n are inverses.
Pf.
(F_n G_n)_{k k'} = \frac{1}{n} \sum_{j=0}^{n-1} \omega^{kj} \omega^{-jk'} = \frac{1}{n} \sum_{j=0}^{n-1} \omega^{(k-k')j} = \left\{ \begin{array}{lll} 1 & \text{if}\ k = k' \\ 0 & \text{otherwise} \\ \end{array}\right.
Summation lemma. Let \omega be a principal n^{th} root of unity. Then
\sum_{j=0}^{n-1} \omega^{kj} = \left\{ \begin{array}{lll} n & \text{if}\ k \equiv 0 \pmod{n} \\ 0 & \text{otherwise} \\ \end{array}\right.
Pf. If k \equiv 0 \pmod{n}, every term equals 1, so the sum is n. Otherwise \omega^k \ne 1, and the geometric series sums to \frac{\omega^{kn} - 1}{\omega^k - 1} = 0 since \omega^{kn} = 1.
The inverse FFT makes 2 recursive calls: INVERSE-FFT(n / 2, y_0, y_2, y_4, \ldots, y_{n-2}) and INVERSE-FFT(n / 2, y_1, y_3, y_5, \ldots, y_{n-1}).
Time. Recursive calls: 2 T(n/2); combining: \Theta(n).
Theorem. The inverse FFT algorithm interpolates a degree n - 1 polynomial at each of the n^{th} roots of unity in O(n \log n) arithmetic operations.
Corollary. Can convert between coefficient and point-value representations in O(n \log n) arithmetic operations.
1. FFT: convert coefficient representation to point-value, O(n \log n).
2. Multiply point values pairwise, O(n).
3. Inverse FFT: convert point-value representation back to coefficients, O(n \log n).
Theorem. Given two polynomials A(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_{n−1} x^{n−1} and B(x) = b_0 + b_1 x + b_2 x^2 + \ldots + b_{n−1} x^{n−1} of degree n - 1, can multiply them in O(n \log n) arithmetic operations.
Pf.
1. FFT: convert coefficient representation to point-value, O(n \log n).
2. Multiply point values pairwise, O(n).
3. Inverse FFT: convert point-value representation back to coefficients, O(n \log n).
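The three-step pipeline can be sketched end to end (illustrative names; coefficient vectors are zero-padded to a power of two at least 2n - 1 so all product coefficients fit):

```python
import cmath

def fft(a, sign=1):
    """Butterfly FFT; sign=-1 uses omega^{-1} for the inverse transform."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], sign), fft(a[1::2], sign)
    y = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        y[k] = even[k] + w * odd[k]
        y[k + n // 2] = even[k] - w * odd[k]
    return y

def poly_multiply(a, b):
    """Coefficient vectors -> coefficients of the product, O(n log n)."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2                                  # pad to a power of two
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    c = fft([u * v for u, v in zip(fa, fb)], sign=-1)
    return [round((v / size).real) for v in c[:len(a) + len(b) - 1]]
```

The final rounding assumes integer coefficients; for real coefficients, return the real parts directly.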
Convolution. A vector with 2n - 1 coordinates, where c_k = \sum_{(i,j):\, i+j=k} a_i b_j.
\left[ \begin{array}{c} a_0 b_0 & a_0 b_1 & a_0 b_2 & a_0 b_3 & \cdots & a_0 b_{n-1} \\ a_1 b_0 & a_1 b_1 & a_1 b_2 & a_1 b_3 & \cdots & a_1 b_{n-1} \\ a_2 b_0 & a_2 b_1 & a_2 b_2 & a_2 b_3 & \cdots & a_2 b_{n-1} \\ a_3 b_0 & a_3 b_1 & a_3 b_2 & a_3 b_3 & \cdots & a_3 b_{n-1} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n-1} b_0 & a_{n-1} b_1 & a_{n-1} b_2 & a_{n-1} b_3 & \cdots & a_{n-1} b_{n-1} \\ \end{array} \right]
Integer multiplication. Given two n-bit integers a = a_{n-1} … a_1 a_0 and b = b_{n-1} … b_1 b_0, compute their product a \cdot b.
Convolution algorithm. Form A(x) = \sum_i a_i x^i and B(x) = \sum_j b_j x^j, multiply them in O(n \log n) with the FFT, then propagate the carries in O(n).
Practice. [GNU Multiple Precision Arithmetic Library]
3-SUM. Given three sets X, Y, and Z of n integers each, determine whether there is a triple i \in X, j \in Y, k \in Z such that i + j = k.
Assumption. All integers are between 0 and m.
Goal. O(m \log m + n \log n) time.
Ex.
m = 19, n = 3
4 + 15 = 19
An O(m \log m + n) solution. Form indicator polynomials A(x) = \sum_{i \in X} x^i and B(x) = \sum_{j \in Y} x^j, compute C(x) = A(x) B(x) with the FFT, and check whether C has a nonzero coefficient at some exponent k \in Z.
\begin{aligned} A(x) & = x^4 + x^7 + x^{10} \\ B(x) & = x^{5} + x^{8} + x^{15} \\ C(x) & = x^{9} + 2x^{12} + 2x^{15} + x^{18} + x^{19} + x^{22} + x^{25} \\ \end{aligned}
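The reduction can be sketched as follows (illustrative name; the inner double loop is a direct O(m^2) convolution standing in for the FFT-based O(m \log m) polynomial product):

```python
def three_sum_convolution(X, Y, Z, m):
    """Decide whether some i in X, j in Y, k in Z has i + j = k.
    All integers lie in [0, m].  C[k] counts pairs (i, j) with i + j = k."""
    A = [0] * (m + 1)                      # indicator coefficients of A(x)
    B = [0] * (m + 1)                      # indicator coefficients of B(x)
    for i in X:
        A[i] = 1
    for j in Y:
        B[j] = 1
    C = [0] * (2 * m + 1)                  # convolution: coefficients of A(x)*B(x)
    for i, ai in enumerate(A):
        if ai:
            for j, bj in enumerate(B):
                if bj:
                    C[i + j] += 1
    return any(C[k] > 0 for k in Z)
```

On the example, X = {4, 7, 10}, Y = {5, 8, 15} and 19 \in Z succeed via 4 + 15 = 19, matching the coefficient x^{19} in C(x).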