Algorithm II


5. Divide and Conquer II


WU Xiaokun 吴晓堃

Master theorem

DnC recurrences

Goal. Recipe for solving general divide-and-conquer recurrences:

T(n) = a T(\frac{n}{b}) + f(n)

with T(0) = 0 and T(1) = \Theta(1).

 

Terms.

  • a \ge 1: number of sub-problems.
  • b \ge 2: factor by which the subproblem size decreases.
  • f(n) \ge 0: work to divide and combine sub-problems.

Recursion tree

Suppose T(n) = a T (n / b) + n^c, with T(1) = 1 and n a power of b.

  • a: branching factor.
  • a^i: number of sub-problems at level i.
  • n / b^i: size of subproblem at level i.
  • 1 + \log_b n levels.

Recursion tree (cont.)

Suppose T(n) = a T (n / b) + n^c, with T(1) = 1 and n a power of b.

  • Let r = a / b^c, then T(n) = n^c \sum_{i=0}^{\log_b n} r^i.

Note the last one on the right: a^{\log_b n} (\frac{n}{b^{\log_b n}})^c = n^{\log_b a} (\frac{n}{n})^c = n^{\log_b a}.

Recursion tree: analysis

Suppose T(n) = a T (n / b) + n^c, with T(1) = 1 and n a power of b.

  • Let r = a / b^c. Note that r < 1 iff c > log_b a.

T(n) = n^c \sum_{i=0}^{\log_b n} r^i = \left\{ \begin{array}{lcl} \Theta(n^c) & \text{if} & r < 1, \text{ie. cost dominated by root}\\ \Theta(n^c \log n) & \text{if} & r = 1, \text{ie. cost evenly distributed}\\ \Theta(n^{\log_b a}) & \text{if} & r > 1, \text{ie. cost dominated by leaves}\\ \end{array}\right.

 

Geometric series.

  • If 0 < r < 1, then 1 + r + r^2 + r^3 + \ldots + r^k = 1 / (1 − r).
  • If r = 1, then 1 + r + r^2 + r^3 + \ldots + r^k = k + 1.
  • If r > 1, then 1 + r + r^2 + r^3 + \ldots + r^k = (r^{k+1} − 1) / (r − 1).

DnC: Master theorem

Master theorem

Let a \ge 1, b \ge 2, c \ge 0 and suppose T(n) is a function on non-negative integers that satisfies the recurrence T(n) = a T (n / b) + \Theta(n^c) with T(0) = 0, T(1) = \Theta(1), where n / b means either \lfloor n / b \rfloor or \lceil n / b \rceil. Then,

Case 1. If c > \log_b a, then T(n) = \Theta(n^c).
Case 2. If c = \log_b a, then T(n) = \Theta(n^c \log n).
Case 3. If c < \log_b a, then T(n) = \Theta(n^{\log_b a}).

Pf sketch.

  • Prove when b is an integer and n is an exact power of b.
  • Extend domain of recurrences to reals (or rationals).
  • Deal with floors and ceilings.

DnC: Master theorem extensions

Master theorem

Let a \ge 1, b \ge 2, c \ge 0 and suppose T(n) is a function on non-negative integers that satisfies the recurrence T(n) = a T (n / b) + \Theta(n^c) with T(0) = 0, T(1) = \Theta(1), where n / b means either \lfloor n / b \rfloor or \lceil n / b \rceil. Then,

Case 1. If c > \log_b a, then T(n) = \Theta(n^c).
Case 2. If c = \log_b a, then T(n) = \Theta(n^c \log n).
Case 3. If c < \log_b a, then T(n) = \Theta(n^{\log_b a}).

Extensions.

  • Can replace \Theta with O everywhere.
  • Can replace \Theta with \Omega everywhere.
  • Can replace initial conditions so T(n) = \Theta(1) for all n \le n_0 and require recurrence to hold only for all n > n_0.

DnC: Master theorem - case 1

Master theorem

Let a \ge 1, b \ge 2, c \ge 0 and suppose T(n) is a function on non-negative integers that satisfies the recurrence T(n) = a T (n / b) + \Theta(n^c) with T(0) = 0, T(1) = \Theta(1), where n / b means either \lfloor n / b \rfloor or \lceil n / b \rceil. Then,

Case 1. If c > \log_b a, then T(n) = \Theta(n^c).
Case 2. If c = \log_b a, then T(n) = \Theta(n^c \log n).
Case 3. If c < \log_b a, then T(n) = \Theta(n^{\log_b a}).

Ex. [Case 1] T(n) = 48 T(\lfloor n / 4 \rfloor) + n^3.

  • a = 48, b = 4, c = 3 > \log_b a = 2.7924....
  • T(n) = \Theta(n^3).

DnC: Master theorem - case 2

Master theorem

Let a \ge 1, b \ge 2, c \ge 0 and suppose T(n) is a function on non-negative integers that satisfies the recurrence T(n) = a T (n / b) + \Theta(n^c) with T(0) = 0, T(1) = \Theta(1), where n / b means either \lfloor n / b \rfloor or \lceil n / b \rceil. Then,

Case 1. If c > \log_b a, then T(n) = \Theta(n^c).
Case 2. If c = \log_b a, then T(n) = \Theta(n^c \log n).
Case 3. If c < \log_b a, then T(n) = \Theta(n^{\log_b a}).

Ex. [Case 2] T(n) = T(\lfloor n / 2 \rfloor) + T(\lceil n / 2 \rceil) + 17 n.

  • a = 2, b = 2, c = 1 = \log_b a.
  • T(n) = \Theta(n \log n).

DnC: Master theorem - case 3

Master theorem

Let a \ge 1, b \ge 2, c \ge 0 and suppose T(n) is a function on non-negative integers that satisfies the recurrence T(n) = a T (n / b) + \Theta(n^c) with T(0) = 0, T(1) = \Theta(1), where n / b means either \lfloor n / b \rfloor or \lceil n / b \rceil. Then,

Case 1. If c > \log_b a, then T(n) = \Theta(n^c).
Case 2. If c = \log_b a, then T(n) = \Theta(n^c \log n).
Case 3. If c < \log_b a, then T(n) = \Theta(n^{\log_b a}).

Ex. [Case 3] T(n) = 3 T(\lfloor n / 2 \rfloor) + 5 n.

  • a = 3, b = 2, c = 1 < \log_b a = 1.5849....
  • T(n) = \Theta(n^{log_2 3}) = O(n^{1.58}).

DnC: Master theorem - exceptions

Gaps in master theorem.

  • Number of sub-problems is not a constant.
    • T(n) = n T(n/2) + n^2
  • Number of sub-problems is less than 1.
    • T(n) = \frac{1}{2} T(n/2) + n^2
  • Work to divide and combine sub-problems is not \Theta(n^c).
    • T(n) = 2 T(n/2) + n \log n

Akra–Bazzi theorem

Theorem. [Akra–Bazzi 1998]

Let a_1 > 0, 0 < b_i < 1, functions |h_i(n)| = O(n / \log^2 n) and g(n) = O(n^c). If T(n) satisfies the recurrence:

T(n) = \sum_{i=1}^k a_i T(b_i n + h_i(n)) + g(n)

then, T(n) = (n^p (1 + \int_1^n \frac{g(u)}{u^{p+1}}\,du)), where p satisfies \sum_{i=1}^k a_i b_i^p = 1.

Ex. T(n) = T(\lfloor n / 5 \rfloor) + T(n - 3\lfloor n / 10 \rfloor) + 11/5 n, with T(0) = 0, T(1) = 0.

  • a_1 = 1, b_1 = 1/5, a_2 = 1, b2 = 7/10 \Rightarrow p = 0.83978... < 1.
  • h_1(n) = \lfloor n / 5 \rfloor - n / 5, h_2(n) = 3/10 n - 3\lfloor n / 10 \rfloor.
  • g(n) = 11/5 n \Rightarrow T(n) = \Theta(n).

Integer multiplication

Integer addition and subtraction

Addition. Given two n-bit integers a and b, compute a + b.

Subtraction. Given two n-bit integers a and b, compute a - b.

 

Grade-school algorithm. \Theta(n) bit operations.

= 1 0 1 0 1 0 0 1 0
1 1 0 1 0 1 0 1
+ 0 1 1 1 1 1 0 1
1 1 1 1 1 1 0 1

Remark. Grade-school addition and subtraction algorithms are optimal.

Integer Multiplication Problem

Multiplication. Given two n-bit integers a and b, compute a \times b.
Grade-school algorithm (long multiplication). \Theta(n^2) bit operations.

Conjecture. [Kolmogorov 1956] Grade-school algorithm is optimal.
Theorem. [Karatsuba 1960] Conjecture is false.

DnC multiplication

To multiply two n-bit integers x and y:

  • Divide x and y into low- and high-order bits.
    • m = \lceil n/2 \rceil
    • a = \lfloor x / 2^m \rfloor, b = x \mod 2^m
    • c = \lfloor y / 2^m \rfloor, d = y \mod 2^m
  • Multiply four (n/2)-bit integers, recursively.
  • Add and shift to obtain result.

x y = (2^m a + b) (2^m c + d) = 2^{2m} ac + 2^m (bc + ad) + bd

Ex.

a b
x = 1 0 0 0 1 1 0 1
y = 1 1 1 0 0 0 0 1

DnC multiplication: algorithm

  1. IF (n = 1): RETURN x \times y;
  2. ELSE:
    1. m = \lceil n/2 \rceil;
    2. a = \lfloor x / 2^m \rfloor, b = x \mod 2^m;
    3. c = \lfloor y / 2^m \rfloor, d = y \mod 2^m;
    4. e = MULTIPLY(a, c, m);
    5. f = MULTIPLY(b, d, m);
    6. g = MULTIPLY(b, c, m);
    7. h = MULTIPLY(a, d, m);
  3. RETURN 2^{2m} e + 2^m (g + h) + f;

Time.

  • 2.1-2.3: \Theta(n)
  • 2.4-2.7: 4 T(\lceil n/2 \rceil)
  • 3: \Theta(n)

Quiz: DnC multiplication

How many bit operations to multiply two n-bit integers using the divide-and-conquer multiplication algorithm?

T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ 4 T(\lceil n/2 \rceil) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.

A. T(n) = \Theta(n^{1/2}).
B. T(n) = \Theta(n \log n).
C. T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585}).
D. T(n) = \Theta(n^2).

 

D: T(n) = \Theta(n^{\log_2 4}) = \Theta(n^2)

Karatsuba trick

To multiply two n-bit integers x and y:

  • Divide x and y into low- and high-order bits.
    • m = \lceil n/2 \rceil
    • a = \lfloor x / 2^m \rfloor, b = x \mod 2^m
    • c = \lfloor y / 2^m \rfloor, d = y \mod 2^m
  • To compute middle term bc + ad, use identity:
    • bc + ad = ac + bd - (a - b) (c - d)
  • Multiply only three (n/2)-bit integers, recursively.
    • reuse: ac and bd.

\begin{aligned} x y = (2^m a + b) (2^m c + d) & = 2^{2m} ac + 2^m (bc + ad) + bd \\ & = 2^{2m} ac + 2^m (ac + bd - (a - b) (c - d)) + bd \\ \end{aligned}

Karatsuba multiplication

  1. IF (n = 1): RETURN x \times y;
  2. ELSE:
    1. m = \lceil n/2 \rceil;
    2. a = \lfloor x / 2^m \rfloor, b = x \mod 2^m;
    3. c = \lfloor y / 2^m \rfloor, d = y \mod 2^m;
    4. e = KARATSUBA-MULTIPLY(a, c, m);
    5. f = KARATSUBA-MULTIPLY(b, d, m);
    6. g = KARATSUBA-MULTIPLY(|a - b|, |c - d|, m);
    7. Flip sign of g if needed.
  3. RETURN 2^{2m} e + 2^m (e + f - g) + f;

Time.

  • 2.1-2.3: \Theta(n)
  • 2.4-2.6: 3 T(\lceil n/2 \rceil)
  • 3: \Theta(n)

Karatsuba analysis

Proposition. Karatsuba’s algorithm requires O(n^{1.585}) bit operations to multiply two n-bit integers.
Pf. Apply Case 3 of the master theorem to the recurrence:

T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ 3 T(\lceil n/2 \rceil) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.

\Rightarrow T(n) = \Theta(n^{\log_2 3}) = O(n^{1.585})

Practice.

  • Use base-32 or -64 (instead of base-2).
  • Faster than grade-school algorithm for about 320 – 640 bits.

Integer arithmetic reductions

arithmetic problem formula bit complexity
integer multiplication a \times b M(n)
integer square a^2 \Theta(M(n))
integer division \lfloor a / b \rfloor, a \mod b \Theta(M(n))
integer square root \lfloor \sqrt{a} \rfloor \Theta(M(n))

Integer arithmetic problems have the same bit complexity M(n) as integer multiplication.

Matrix multiplication

Dot product

Dot product. Given two length-n vectors a and b, compute c = a \cdot b = \sum_{i=1}^n a_i b_i.
Grade-school. \Theta(n) arithmetic operations.

 

Ex. a = [.70 .20 .10 ], b = [.30 .40 .30 ]:

a \cdot b = (.70 \times .30) + (.20 \times .40) + (.10 \times .30) = .32

 

Remark. “Grade-school” dot product algorithm is asymptotically optimal.

Matrix Multiplication Problem

Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB.
Grade-school. \Theta(n^3) arithmetic operations.

 

\left[ \begin{array}{c} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \\ \end{array} \right] = \left[ \begin{array}{c} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \\ \end{array} \right] \times \left[ \begin{array}{c} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nn} \\ \end{array} \right]

 

Q. Is “grade-school” matrix multiplication algorithm asymptotically optimal?

Block matrix multiplication

\begin{aligned} C_{11} & = A_{11} \times B_{11} + A_{12} \times B_{21} \\ & = \left[ \begin{array}{c} 0 & 1 \\ 4 & 5 \\ \end{array} \right] \times \left[ \begin{array}{c} 16 & 17 \\ 20 & 21 \\ \end{array} \right] + \left[ \begin{array}{c} 2 & 3 \\ 6 & 7 \\ \end{array} \right] \times \left[ \begin{array}{c} 24 & 25 \\ 28 & 29 \\ \end{array} \right] \\ & = \left[ \begin{array}{c} 152 & 158 \\ 504 & 526 \\ \end{array} \right] \\ \end{aligned}

Block matrix multiplication: warmup

To multiply two n-by-n matrices A and B:

  • Divide: partition A and B into n/2-by-n/2 blocks.
  • Conquer: multiply 8 pairs of n/2-by-n/2 matrices, recursively.
  • Combine: add appropriate products using 4 matrix additions.

\begin{aligned} C & = A \times B \\ \left[ \begin{array}{c} C_{11} & C_{12} \\ C_{21} & C_{22} \\ \end{array} \right] & = \left[ \begin{array}{c} A_{11} & A_{12} \\ A_{21} & A_{22} \\ \end{array} \right] \times \left[ \begin{array}{c} B_{11} & B_{12} \\ B_{21} & B_{22} \\ \end{array} \right] \\ \end{aligned}

\begin{aligned} C_{11} & = A_{11} \times B_{11} + A_{12} \times B_{21} \\ C_{12} & = A_{11} \times B_{12} + A_{12} \times B_{22} \\ C_{21} & = A_{21} \times B_{11} + A_{22} \times B_{21} \\ C_{22} & = A_{21} \times B_{12} + A_{22} \times B_{22} \\ \end{aligned}

Running time. Apply Case 3 of the master theorem.

T(n) = 8 T(n/2) + \Theta(n^2) \Rightarrow T(n) = \Theta(n^3)

Strassen’s trick

Key idea. Can multiply two 2-by-2 matrices via 7 scalar multiplications (plus 11 additions and 7 subtractions).

\begin{aligned} \left[ \begin{array}{c} C_{11} & C_{12} \\ C_{21} & C_{22} \\ \end{array} \right] & = \left[ \begin{array}{c} A_{11} & A_{12} \\ A_{21} & A_{22} \\ \end{array} \right] \times \left[ \begin{array}{c} B_{11} & B_{12} \\ B_{21} & B_{22} \\ \end{array} \right] \\ \end{aligned} \begin{aligned} C_{11} & = P_5 + P_4 - P_2 + P_6 \\ C_{12} & = P_1 + P_2 \\ C_{21} & = P_3 + P_4 \\ C_{22} & = P_1 + P_5 - P_3 - P_7 \\ \end{aligned}

\begin{aligned} P_1 & = A_{11} \times (B_{12} - B_{22}) \\ P_2 & = (A_{11} + A_{12}) \times B_{22} \\ P_3 & = (A_{21} + A_{22}) \times B_{11} \\ P_4 & = A_{22} \times (B_{21} - B_{11}) \\ P_5 & = (A_{11} + A_{22}) \times (B_{11} + B_{22}) \\ P_6 & = (A_{12} - A_{22}) \times (B_{21} + B_{22}) \\ P_7 & = (A_{11} - A_{21}) \times (B_{11} + B_{12}) \\ \end{aligned}

Pf. C_{12} = P_1 + P_2 = A_{11} \times (B_{12} - B_{22}) + (A_{11} + A_{12}) \times B_{22} = A_{11} \times B_{12} + A_{12} \times B_{22}

Strassen’s algorithm

  1. IF (n = 1): RETURN A \times B;
  2. Partition A and B into n/2-by-n/2 blocks;
  3. P_{1} = STRASSEN(n / 2, A_{11}, (B_{12} - B_{22}));
  4. P_{2} = STRASSEN(n / 2, (A_{11} + A_{12}), B_{22});
  5. P_{3} = STRASSEN(n / 2, (A_{21} + A_{22}), B_{11});
  6. P_{4} = STRASSEN(n / 2, A_{22}, (B_{21} - B_{11}));
  7. P_{5} = STRASSEN(n / 2, (A_{11} + A_{22}), (B_{11} + B_{22}));
  8. P_{6} = STRASSEN(n / 2, (A_{12} - A_{22}), (B_{21} + B_{22}));
  9. P_{7} = STRASSEN(n / 2, (A_{11} - A_{21}), (B_{11} + B_{12}));
  10. C_{11} = P_{5} + P_{4} - P_{2} + P_{6};
  11. C_{12} = P_{1} + P_{2};
  12. C_{21} = P_{3} + P_{4};
  13. C_{22} = P_{1} + P_{5} - P_{3} - P_{7};
  14. RETURN C;

Time.

  • 3-9: 7 T(n/2) + \Theta(n^2)
  • 10-13: \Theta(n^2)

Strassen’s algorithm: analysis

Theorem. [Strassen 1968] Strassen’s algorithm requires O(n^{2.81}) arithmetic operations to multiply two n-by-n matrices.
Pf.

  • When n is a power of 2, apply Case 1 of the master theorem:
    • T(n) = 7 T(n/2) + \Theta(n^2) \Rightarrow T(n) = \Theta(n^{\log_2 7}) = \Theta(n^{2.81})
  • When n is not a power of 2, pad matrices with zeros to be -by-, where n \le nʹ < 2n and is a power of 2.

Strassen’s algorithm: practice

Implementation issues.

  • Sparsity.
  • Caching.
  • n not a power of 2.
  • Numerical stability.
  • Non-square matrices.
  • Storage for intermediate sub-matrices.
  • Crossover to classical algorithm when n is “small.”
  • Parallelism for multi-core and many-core architectures.

Common misperception. “Strassen’s algorithm is only a theoretical curiosity.”

  • Apple reports 8x speedup when n \approx 2,048.
  • Range of instances where it’s useful is a subject of controversy.

Quiz: matrix multiplication

Suppose that you could multiply two 3-by-3 matrices with 21 scalar multiplications. How fast could you multiply two n-by-n matrices?

A. \Theta(n^{\log_3 21})
B. \Theta(n^{\log_2 21})
C. \Theta(n^{\log_9 21})
D. \Theta(n^2)

 

A

Fast matrix multiplication: theory

Q. Multiply two 2-by-2 matrices with 7 scalar multiplications?
A. Yes! [Strassen 1969] \Theta(n^{\log_2 7}) = O(n^{2.81})

Q. Multiply two 2-by-2 matrices with 6 scalar multiplications?
A. Impossible. [Hopcroft–Kerr, Winograd 1971] \Theta(n^{\log_2 6}) = O(n^{2.59})

Numeric linear algebra reductions

linear algebra problem expression arithmetic complexity
matrix multiplication A × B MM(n)
matrix squaring A^2 \Theta(MM(n))
matrix inversion A^{-1} \Theta(MM(n))
determinant \|A\| \Theta(MM(n))
rank rank(A) \Theta(MM(n))
system of linear equations Ax = b \Theta(MM(n))
LU decomposition A = L U \Theta(MM(n))
least squares \min \Vert Ax - b\Vert_2 \Theta(MM(n))

Numerical linear algebra problems have the same arithmetic complexity MM(n) as matrix multiplication

Convolution and FFT

Fourier analysis

Fourier theorem. [Fourier, Dirichlet, Riemann] Any (sufficiently smooth) periodic function can be expressed as the sum of a series of sinusoids.

  • transform: time domain –> frequency domain.

Euler’s identity

Euler’s identity. e^{ix} = \cos x + i \sin x.

Sinusoids. Sum of sine and cosines = sum of complex exponentials.

  • s_N(x) = \frac{a_0}{2} + \sum_{n=1}^N (a_n \cos(2\pi nx) + b_n \sin(2\pi nx))
  • s_N(x) = \sum_{n=-N}^N c_n \cdot e^{i 2\pi nx}

Fast Fourier transform

FFT. Fast way to convert between time domain and frequency domain.

Alternate viewpoint. Fast way to multiply and evaluate polynomials.

 

“ If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it. ” — Numerical Recipes

Polynomials: coefficient representation

Univariate polynomial. [ coefficient representation ]

  • A(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_{n−1} x^{n−1}: a_0, a_1, \ldots + a_{n−1}
  • B(x) = b_0 + b_1 x + b_2 x^2 + \ldots + b_{n−1} x^{n−1}: b_0, b_1, \ldots + b_{n−1}

Addition. O(n) arithmetic operations.

  • A(x) + B(x) = (a_0 + b_0) + (a_1 + b_1) x + \ldots + (a_{n−1} + b_{n−1}) x^{n−1}

Evaluation. O(n) using Horner’s method.

  • A(x) = a_0 + (x(a_1 + x(a_2 + \ldots + x(a_{n−2} + x(a_{n−1})) \ldots))
    • for j = n-1 .. 0: v = a[j] + (x * v);

Multiplication (linear convolution). O(n^2) using brute-force.

  • A(x) \times B(x) = \sum_{i=0}^{2n-2} c_i x^i, where c_i = \sum_{j=0}^i a_j b_{i-j}.

Polynomials: point-value representation

Fundamental theorem of algebra. A degree n univariate polynomial with complex coefficients has exactly n complex roots.

Corollary. A degree n - 1 univariate polynomial A(x) is uniquely specified by its evaluation at n distinct values of x.

Polynomials: point-value (cont.)

Univariate polynomial. [ point-value representation ]

  • A(x): (x_0, y_0), \ldots, (x_{n-1}, y_{n-1})
  • B(x): (x_0, z_0), \ldots, (x_{n-1}, z_{n-1})

Addition. O(n) arithmetic operations.

  • A(x) + B(x): (x_0, y_0 + z_0), \ldots, (x_{n-1}, y_{n-1} + z_{n-1})

Multiplication. O(n), but represent A(x) and B(x) using 2n points.

  • A(x) \times B(x): (x_0, y_0 \times z_0), \ldots, (x_{2n−1}, y_{2n−1} \times z_{2n−1})

Evaluation. O(n^2) using Lagrange’s formula.

  • A(x) = \sum_{k=0}^{n-1} y_k \frac{\prod_{j \ne k} (x - x_j)}{\prod_{j \ne k} (x_k - x_j)}

Representations: tradeoff

Tradeoff. Either fast evaluation or fast multiplication. We want both!

representation multiply evaluate
coefficient O(n^2) O(n)
point-value O(n) O(n^2)

Goal. Efficient conversion between two representations \Rightarrow all ops fast.

Converting between two representations

Application. Polynomial multiplication (coefficient representation).

  • exactly the reason to do Fourier transform
coefficient transform point-value
coefficient FFT: O(n \log n) point-value
multiplication: O(n)
coefficient inv. FFT: O(n \log n) point-value

Converting representation: brute-force

Coefficient \Rightarrow Point-value. Given a polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, ..., x_{n-1}.

\left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ \vdots \\ y_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & x_{0} & x_{0}^2 & \cdots & x_{0}^{n-1} \\ 1 & x_{1} & x_{1}^2 & \cdots & x_{1}^{n-1} \\ 1 & x_{2} & x_{2}^2 & \cdots & x_{2}^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \\ \end{array} \right] \times \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{n-1} \\ \end{array} \right]

 

Running time. O(n^2) via matrix-vector multiply (or O(n) Horner’s).

Converting: brute-force (cont.)

Point-value \Rightarrow Coefficient. Given n distinct points x_0, ..., x_{n-1} and values y_0, ..., y_{n-1}, find unique polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, that has given values at given points.

\left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ \vdots \\ y_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & x_{0} & x_{0}^2 & \cdots & x_{0}^{n-1} \\ 1 & x_{1} & x_{1}^2 & \cdots & x_{1}^{n-1} \\ 1 & x_{2} & x_{2}^2 & \cdots & x_{2}^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n-1} & x_{n-1}^2 & \cdots & x_{n-1}^{n-1} \\ \end{array} \right] \times \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{n-1} \\ \end{array} \right]

  • Vandermonde matrix is invertible iff x_i distinct.

Running time. O(n^3) via Gaussian elimination.

  • or O(n^{2.38}) via fast matrix multiplication

Divide-and-conquer

Decimation in time. Divide into even- and odd- degree terms.

  • A(x) = a_0+ a_1 x + a_2 x^2 + a_3 x^3+ a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7.
  • A_{even}(x) = a_0+ a_2 x + a_4 x^2 + a_6 x^3.
  • A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + a_7 x^3.
  • A(x) = A_{even}(x^2) + x A_{odd}(x^2).

 

Decimation in frequency. Divide into low- and high-degree terms.

  • A(x) = a_0+ a_1 x + a_2 x^2 + a_3 x^3+ a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7.
  • A_{low}(x) = a_0+ a_1 x + a_2 x^2 + a_3 x^3.
  • A_{high}(x) = a_4 + a_5 x + a_6 x^2 + a_7 x^3.
  • A(x) = A_{low}(x) + x^4 A_{high}(x).

Coefficient \Rightarrow Point-value: intuition

Coefficient \Rightarrow Point-value. Given a polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, ..., x_{n-1}.

Divide. Break up polynomial into even- and odd-degree terms.

  • A(x) = a_0+ a_1 x + a_2 x^2 + a_3 x^3+ a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7.
    • A_{even}(x) = a_0+ a_2 x + a_4 x^2 + a_6 x^3.
    • A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + a_7 x^3.
  • A(x) = A_{even}(x^2) + x A_{odd}(x^2).
    • A(-x) = A_{even}(x^2) - x A_{odd}(x^2).

Need 4 evaluations.

Coefficient \Rightarrow Point-value: intuition (cont.)

Intuition. Choose four complex points to be \pm 1, \pm i.

  • A(1) = A_{even}(1) + 1 A_{odd}(1).
  • A(-1) = A_{even}(1) - 1 A_{odd}(1).
  • A(i) = A_{even}(-1) + i A_{odd}(-1).
  • A(-i) = A_{even}(-1) - i A_{odd}(-1).

What if n \ge 8?

Roots of unity

Def. An n^{th} root of unity is a complex number x such that x^n = 1.

Fact. The n^{th} roots of unity are: \omega^0, \omega^1, \ldots, \omega^{n-1} where \omega = e^{2π i / n}.
Pf. (\omega^k)^n = (e^{2π i k / n})^n = (e^{π i})^{2k} = (−1)^{2k} = 1.

Fact. The (n/2)^{th} roots of unity are: \nu^0, \nu^1, \ldots, \nu^{n/2-1} where \nu = \omega^2 = e^{4π i / n}.

Discrete Fourier transform

Coefficient \Rightarrow Point-value. Given a polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, evaluate it at n distinct points x_0, ..., x_{n-1}.

Key idea. Choose x_k = \omega^k where \omega is principal n^{th} root of unity.

\left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^2 & \omega^3 & \cdots & \omega^{(n-1)} \\ 1 & \omega^{2} & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^{3} & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \\ \end{array} \right] \times \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \\ \vdots \\ a_{n-1} \\ \end{array} \right]

Fast Fourier transform: steps

Goal. Evaluate a degree n - 1 polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1} at its n^{th} roots of unity: \omega^0, \omega^1, \ldots, \omega^{n-1}.

Divide. Break up polynomial into even- and odd-degree terms.

  • A_{even}(x) = a_0+ a_2 x + a_4 x^2 + \ldots + a_{n-2} x^{n/2-1}.
  • A_{odd}(x) = a_1 + a_3 x + a_5 x^2 + \ldots + a_{n-1} x^{n/2-1}.
  • A(x) = A_{even}(x^2) + x A_{odd}(x^2).
  • A(-x) = A_{even}(x^2) - x A_{odd}(x^2).

Conquer. Evaluate A_{even}(x) and A_{odd}(x) at (n/2)^{th} roots of unity: \nu^0, \nu^1, \ldots, \nu^{n/2-1}

Combine.

  • y_k = A(\omega^k) = A_{even}(\nu^k) + \omega^k A_{odd}(\nu^k), 0 \le k < n/2.
  • y_{k+ n/2} = A(\omega^{k+ n/2}) = A_{even}(\nu^k) - \omega^k A_{odd}(\nu^k), 0 \le k < n/2.

Fast Fourier transform: algorithm

  1. IF (n = 1): RETURN a_0;
  2. (e_0, e_1, \ldots, e_{n/2-1}) = FFT(n / 2, a_0, a_2, a_4, \ldots, a_{n-2});
  3. (d_0, d_1, \ldots, d_{n/2-1}) = FFT(n / 2, a_1, a_3, a_5, \ldots, a_{n-1});
  4. FOR k = 0 .. n / 2 - 1:
    1. \omega^k = e^{2π i k/n};
    2. y_k = e_k + \omega^k d_k;
    3. y_{k + n/2} = e_k - \omega^k d_k;
  5. RETURN (y_0, y_1, y_2, \ldots, y_{n-1}).

Time.

  • 2-3: 2 T(n/2)
  • 4.1-4.3: \Theta(n)

FFT: analysis

Theorem. The FFT algorithm evaluates a degree n - 1 polynomial at each of the n^{th} roots of unity in O(n \log n) arithmetic operations and O(n) extra space.
Pf.

T(n) = \left\{ \begin{array}{lcl} \Theta(1) & \text{if} & n = 1 \\ 2 T(n/2) + \Theta(n) & \text{if} & n > 1 \\ \end{array}\right.

Quiz: FFT tree

When computing the FFT of (a_0, a_1, a_2, \dots, a_7), which are the first two coefficients involved in an arithmetic operation?

A. a_0 and a_1.
B. a_0 and a_2.
C. a_0 and a_4.
D. a_0 and a_7.
E. None of the above.

C: first leaf of the FFT tree.

FFT: Fourier matrix decomposition

Alternative viewpoint. FFT is a recursive decomposition of Fourier matrix.

F_n = \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^2 & \omega^3 & \cdots & \omega^{(n-1)} \\ 1 & \omega^{2} & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^{3} & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \\ \end{array} \right] a = \left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ \vdots \\ a_{n-1} \\ \end{array} \right]

I_n = \left[ \begin{array}{c} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \end{array} \right] D_n = \left[ \begin{array}{c} \omega^{0} & 0 & 0 & \cdots & 0 \\ 0 & \omega^{1} & 0 & \cdots & 0 \\ 0 & 0 & \omega^{2} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \omega^{n-1} \\ \end{array} \right]

y = F_n a = \left[ \begin{array}{c} I_{n/2} & D_{n/2} \\ I_{n/2} & -D_{n/2} \\ \end{array} \right] \left[ \begin{array}{c} F_{n/2} a_{even} \\ F_{n/2} a_{odd} \\ \end{array} \right]

Inverse discrete Fourier transform

Point-value \Rightarrow Coefficient. Given n distinct points x_0, ..., x_{n-1} and values y_0, ..., y_{n-1}, find unique polynomial A(x) = a_0 + a_1 x + ... + a_{n-1} x^{n-1}, that has given values at given points.

\left[ \begin{array}{c} a_{0} \\ a_{1} \\ a_{2} \\ a_{3} \\ \vdots \\ a_{n-1} \\ \end{array} \right] = \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^2 & \omega^3 & \cdots & \omega^{(n-1)} \\ 1 & \omega^{2} & \omega^4 & \omega^6 & \cdots & \omega^{2(n-1)} \\ 1 & \omega^{3} & \omega^6 & \omega^9 & \cdots & \omega^{3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \cdots & \omega^{(n-1)(n-1)} \\ \end{array} \right]^{-1} \left[ \begin{array}{c} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n-1} \\ \end{array} \right]

Inverse FFT

Claim. Inverse of Fourier matrix F_n is given by following formula:

G_n = \frac{1}{n} = \left[ \begin{array}{c} 1 & 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{-1} & \omega^{-2} & \omega^{-3} & \cdots & \omega^{-(n-1)} \\ 1 & \omega^{-2} & \omega^{-4} & \omega^{-6} & \cdots & \omega^{-2(n-1)} \\ 1 & \omega^{-3} & \omega^{-6} & \omega^{-9} & \cdots & \omega^{-3(n-1)} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{-(n-1)} & \omega^{-2(n-1)} & \omega^{-3(n-1)} & \cdots & \omega^{-(n-1)(n-1)} \\ \end{array} \right]

Consequence. To compute the inverse FFT, apply the same algorithm but use \omega^{-1} = e^{-2π i / n} as principal n^{th} root of unity (and divide the result by n).

Inverse FFT: correctness

Claim. F_n and G_n are inverses.
Pf.

(F_n G_n)_{k k'} = \frac{1}{n} \sum_{j=0}^{n-1} \omega^{kj} \omega^{-jk'} = \frac{1}{n} \sum_{j=0}^{n-1} \omega^{(k-k')j} = \left\{ \begin{array}{lll} 1 & \text{if}\ k = k \\ 0 & \text{otherwise} \\ \end{array}\right.

Inverse FFT: correctness (cont.)

Summation lemma. Let \omega be a principal n^{th} root of unity. Then

\sum_{j=0}^{n-1} \omega^{kj} = \left\{ \begin{array}{lll} n & \text{if}\ k \equiv 0 \pmod{0} \\ 0 & \text{otherwise} \\ \end{array}\right.

Pf.

  • If k is a multiple of n, then \omega^k = 1 \Rightarrow series sums to n.
  • Each n^{th} root of unity \omega^k is a root of x^n - 1 = (x - 1) (1 + x + x^2 + ... + x^{n−1}).
  • if \omega^k \ne 1, then 1 + \omega^k + \omega^{k(2)} + … + \omega^{k(n−1)} = 0 \Rightarrow series sums to 0.

Inverse FFT: algorithm

  1. IF (n = 1): RETURN y_0;
  2. (e_0, e_1, \ldots, e_{n/2-1}) = INVERSE-FFT(n / 2, y_0, y_2, y_4, \ldots, y_{n-2});
  3. (d_0, d_1, \ldots, d_{n/2-1}) = INVERSE-FFT(n / 2, y_1, y_3, y_5, \ldots, y_{n-1});
  4. FOR k = 0 .. n / 2 - 1:
    1. \omega^k = e^{-2π i k/n};
    2. a_k = e_k + \omega^k d_k;
    3. a_{k + n/2} = e_k - \omega^k d_k;
  5. RETURN (a_0, a_1, a_2, \ldots, a_{n-1});

Time.

  • 2-3: 2 T(n/2)
  • 4.1-4.3: \Theta(n)

Inverse FFT: analysis

Theorem. The inverse FFT algorithm interpolates a degree n - 1 polynomial at each of the n^{th} roots of unity in O(n \log n) arithmetic operations.

  • assumes n is a power of 2

Corollary. Can convert between coefficient and point-value representations in O(n \log n) arithmetic operations.

coefficient transform point-value
coefficient FFT: O(n \log n) point-value
multiplication: O(n)
coefficient inv. FFT: O(n \log n) point-value

Polynomial multiplication

Theorem. Given two polynomials A(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_{n−1} x^{n−1} and B(x) = b_0 + b_1 x + b_2 x^2 + \ldots + b_{n−1} x^{n−1} of degree n - 1, can multiply them in O(n \log n) arithmetic operations.

  • pad with 0s to make n a power of 2

Pf.

coefficient transform point-value
coefficient FFT: O(n \log n) point-value
multiplication: O(n)
coefficient inv. FFT: O(n \log n) point-value

Convolution

Convolution. A vector with 2n -1 coordinates, where c_k = \sum_{(i,j):i+j=k} a_i b_j.

  • a \ast b = (a_0 b_0, a_0 b_1 + a_1 b_0, a_0 b_2 + a_1 b_1 + a_2 b_0, ..., a_{n−2} b_{n−1} + a_{n−1} b_{n−2}, a_{n−1} b_{n−1}).
    • exactly the coordinates of polynomial multiplication.
  • summing along anti-diagonals of the following matrix.

\left[ \begin{array}{c} a_0 b_0 & a_0 b_1 & a_0 b_2 & a_0 b_3 & \cdots & a_0 b_{n-1} \\ a_1 b_0 & a_1 b_1 & a_1 b_2 & a_1 b_3 & \cdots & a_1 b_{n-1} \\ a_2 b_0 & a_2 b_1 & a_2 b_2 & a_2 b_3 & \cdots & a_2 b_{n-1} \\ a_3 b_0 & a_3 b_1 & a_3 b_2 & a_3 b_3 & \cdots & a_3 b_{n-1} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n-1} b_0 & a_{n-1} b_1 & a_{n-1} b_2 & a_{n-1} b_3 & \cdots & a_{n-1} b_{n-1} \\ \end{array} \right]

Integer multiplication, revisit

Integer multiplication. Given two n-bit integers a = a_{n-1} … a_1 a_0 and b = b_{n-1} … b_1 b_0, compute their product a \cdot b.

Convolution algorithm.

  • Form two polynomials.
    • A(x) = a_0 + a_1 x + a_2 x^2 + \ldots + a_{n−1} x^{n−1}
    • B(x) = b_0 + b_1 x + b_2 x^2 + \ldots + b_{n−1} x^{n−1}
    • Note: a = A(2), b = B(2).
  • Compute C(x) = A(x) \cdot B(x).
  • Evaluate C(2) = a \cdot b.
  • Running time: O(n \log n) floating-point operations.

Practice. [GNU Multiple Precision Arithmetic Library]

  • Switches to FFT-based algorithm when n is large (\ge 5 – 10K).

3-Sum: revisit

3-SUM. Given three sets X, Y, and Z of n integers each, determine whether there is a triple i \in X, j \in Y, k \in Z such that i + j = k.

Assumption. All integers are between 0 and m.

Goal. O(m \log m + n \log n) time.

Ex.

m = 19, n = 3

  • X = \{ 4, 7, 10 \}
  • Y = \{ 5, 8, 15 \}
  • Z = \{ 4, 13, 19 \}

4 + 15 = 19

3-Sum: solution

An O(m \log m + n) solution.

  • Form polynomial A(x) = a_0 + a_1 x + \ldots + a_m x^m with a_i = 1 iff i \in X.
  • Form polynomial B(x) = b_0 + b_1 x + \ldots + b_m x^m with b_j = 1 iff j \in Y.
  • Compute product/convolution C(x) = A(x) \times B(x).
  • The coefficient c_k = number of ways to choose an integer i \in X and an integer j \in Y that sum to exactly k.
  • For each k \in Z: check whether c_k > 0.

Ex.

m = 19, n = 3

  • X = \{ 4, 7, 10 \}
  • Y = \{ 5, 8, 15 \}
  • Z = \{ 4, 13, 19 \}

4 + 15 = 19

\begin{aligned} A(x) & = x^4 + x^7 + x^{10} \\ B(x) & = x^{5} + x^{8} + x^{15} \\ C(x) & = x^{9} + 2x^{12} + 2x^{15} + x^{18} + x^{19} + x^{22} + x^{25} \\ \end{aligned}