Content
- Computational Tractability
- Asymptotic Order of Growth
- Implementing Gale–Shapley
- Common Running Times
- Recap: Priority Queue
Algorithm II
2. Algorithm Analysis
WU Xiaokun 吴晓堃
xkun.wu [at] gmail
Precise assessment leads to better understanding.
We focus on the efficiency of algorithms now.
Loosely speaking: determine whether a problem can be solved in practice.
An intractable problem may be solvable in theory, but in practice any solution takes too many resources to be useful.
So efficiency is about: resource requirements vs. computational power.
“By what course of calculation can these results be arrived at by the machine in the shortest time?” — Charles Babbage (1864)
Consider a 64-bit system: memory use depends on the sizes of the types involved (e.g., an int, or a struct of several fields).
Intuition. When implemented, the algorithm runs fast and uses little memory on real inputs.
We need a measure of the algorithm itself, rather than external indicators.
Can we measure efficiency when the input size is fixed (on the same PC)?
Better measure: how does the algorithm scale with problem size?
How resource requirements grow with increasing input size.
So we study and compare growth of functions.
Worst-Case Running Times: longest possible running time.
Average-case analysis: averaged over “random” instances.
Now we consider and compare the worst-case running time T(N).
Brute-Force Search: the most natural last-resort solution.
Define efficient: achieves qualitatively better worst-case performance, at an analytical level, than brute-force search.
What is “qualitatively better”? Better scalability.
Desirable scaling property. When the input size doubles, the algorithm slows down by at most some multiplicative constant factor C.
There exist constants c > 0 and d > 0 such that, for every input of size N, the running time of the algorithm is bounded above by c N^d primitive computational steps.
Def. An algorithm is efficient if it has a polynomial running time.
Exceptions: galactic constants and/or huge exponents
Assume: one million (10^6) high-level instructions per second.
Notice the huge difference between polynomial and exponential running times.
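For instance, at this assumed rate (a worked example, not from the slides): an n^2 algorithm on input size n = 10^4 needs 10^8 / 10^6 = 100 seconds, while a 2^n algorithm on n = 50 needs 2^{50} / 10^6 \approx 1.1 \times 10^9 seconds, roughly 36 years.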
We have now found a way to compare the growth of functions.
Mathematically, asymptotics describe limiting behavior.
Limits are natural bounds for analysis.
Caution. In CS, we deal with discrete quantities.
Ex. T(n) = pn^2 + qn + r (with p, q, r > 0) is O(n^2): for all n \ge 1, T(n) \le pn^2 + qn^2 + rn^2 = (p + q + r) n^2, so c = p + q + r and n_0 = 1 work.
One-way “equality”. O(g(n)) is a set of functions.
Note. O(\cdot) expresses only an upper bound.
Domain and Range. T and f are real-valued functions.
Reflexivity. f is O(f).
Constants. If f is O(g) and c > 0, then c f is O(g).
Products. If f_1 is O(g_1) and f_2 is O(g_2), then f_1 f_2 is O(g_1 g_2).
Pf. If f_1 \le c_1 g_1 and f_2 \le c_2 g_2 for large n, then f_1 f_2 \le (c_1 c_2)(g_1 g_2).
Sums. If f_1 is O(g_1) and f_2 is O(g_2), then f_1 + f_2 is O(\max\{g_1, g_2\}).
Transitivity. If f is O(g) and g is O(h), then f is O(h).
Ex. f(n) = 5n^3 + 3n^2 + n + 1234 is O(n^3).
Ex. T(n) = 32n^2 + 17n + 1 is O(n^2): take c = 50 and n_0 = 1.
Ex. T(n) = 32n^2 + 17n + 1 is also \Omega(n^2): take c = 32 and n_0 = 1. Hence T(n) is \Theta(n^2).
Compute a tight bound: close the gap between the upper bound and the lower bound.
Proposition. If \lim_{n \to \infty} \frac{f(n)}{g(n)} = c for some constant 0 < c < \infty, then f(n) is \Theta(g(n)).
Pf. For all sufficiently large n, the ratio lies in [c/2, 2c], so \tfrac{c}{2}\, g(n) \le f(n) \le 2c\, g(n).
Proposition. If \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0, then f(n) is O(g(n)) but not \Omega(g(n)).
Proposition. If \lim_{n \to \infty} \frac{f(n)}{g(n)} = \infty, then f(n) is \Omega(g(n)) but not O(g(n)).
Polynomials. Let f(n) = a_0 + a_1 n + \ldots + a_d n^d with a_d > 0. Then f(n) is \Theta(n^d).
Pf. \lim_{n \to \infty} \frac{a_0 + a_1 n + \ldots + a_d n^d}{n^d} = a_d > 0.
Logarithms. \log_a n = \Theta(\log_b n) for every a > 1 and every b > 1.
Pf. \frac{\log_a n}{\log_b n} = \frac{1}{\log_b a}.
Logarithms and polynomials. \log_a n = O(n^d) for every a > 1 and every d > 0.
Pf. \lim_{n \to \infty} \frac{\log_a n}{n^d} = 0.
Exponentials and polynomials. n^d = O(r^n) for every r > 1 and every d > 0.
Pf. \lim_{n \to \infty} \frac{n^d}{r^n} = 0.
Factorials. n! = 2^{\Theta(n \log n)}.
Pf. Stirling’s formula: n! \sim \sqrt{2 \pi n} \left(\frac{n}{e}\right)^n, hence \log_2 n! = \Theta(n \log n).
Upper bounds. f(m, n) = O(g(m, n)) if there exist constants c > 0, m_0 \ge 0, and n_0 \ge 0 such that 0 \le f(m, n) \le c g (m, n) for all n \ge n_0, m \ge m_0.
Ex. f(m, n) = 32mn^2 + 17mn + 32n^3 is O(mn^2 + n^3).
Compute a tight bound for Gale–Shapley: close the gap between the upper bound and the lower bound.
Recall: the algorithm terminates after at most n^2 iterations of the proposal loop.
INPUT: M, W, R_m, R_w
P = \varnothing; mark every m \in M and w \in W free;
WHILE some m \in M is free
    w = highest-ranked woman in R_m to whom m has not yet proposed;
    IF w is free THEN add (m, w) to P;
    ELSE IF w prefers m to her current partner m' THEN replace (m', w) by (m, w) in P; mark m' free;
    ELSE w rejects m, and m remains free;
RETURN P;
Goal: the following operations take constant time:
1. Identify a free man m.
2. For a man m, find the highest-ranked woman in R_m to whom he has not yet proposed.
3. For a woman w, decide whether w is matched, and if so, find her current partner.
4. For a woman w and two men m and m', decide which of m and m' w prefers.
A list NF containing the indices of free men in M (a queue or stack also works).
Reuse the preference list R_m: only ever check its head.
If R_m must stay constant, maintain a pointer to the next proposal instead.
Index M, W: \{1, 2, \ldots, n\}.
Matching. Arrays P_m, P_w.
P_w[w] records whether w is unmatched, or matched and to whom: O(1).
So far, operations 1–3 can each be implemented in O(1) time.
Operation 4, naive implementation: walk R_w to find which of two men appears first, O(n) per query.
Alternative: trade space for time.
For each w \in W, maintain an array I_w containing the inverse of R_w: I_w[m] is m’s rank in w’s preference list, so comparing two men takes O(1).
Theorem. Can implement Gale–Shapley to run in O(n^2) time.
Pf. There are at most n^2 proposals, and each iteration of the loop uses only the O(1) operations above.
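A minimal Python sketch of this O(n^2) implementation (illustrative, not from the slides; 0-based indices, and the names next_prop, partner_w, free_men are assumptions):

def gale_shapley(Rm, Rw):
    """O(n^2) Gale-Shapley; Rm[m] and Rw[w] are 0-indexed preference lists."""
    n = len(Rm)
    # I_w: inverse of R_w, so comparing two men takes O(1) (operation 4).
    Iw = [[0] * n for _ in range(n)]
    for w in range(n):
        for rank, m in enumerate(Rw[w]):
            Iw[w][m] = rank
    next_prop = [0] * n        # pointer into R_m: next woman m proposes to
    partner_w = [None] * n     # P_w: current partner of each woman (None = free)
    free_men = list(range(n))  # NF: free men (stack; a queue also works)
    while free_men:
        m = free_men.pop()                    # operation 1: O(1)
        w = Rm[m][next_prop[m]]               # operation 2: O(1)
        next_prop[m] += 1
        if partner_w[w] is None:              # operation 3: O(1)
            partner_w[w] = m                  # w was free: engage (m, w)
        elif Iw[w][m] < Iw[w][partner_w[w]]:  # operation 4: O(1)
            free_men.append(partner_w[w])     # w trades up; old partner freed
            partner_w[w] = m
        else:
            free_men.append(m)                # w rejects m; m remains free
    return {partner_w[w]: w for w in range(n)}  # man -> woman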
Theorem. In the worst case, any algorithm to find a stable matching must query R_m \Omega(n^2) times.
[proof skipped.]
Conclusion. Gale–Shapley runs in \Theta(n^2) time.
Constant time. Running time is O(1).
Examples. Arithmetic operations, array accesses, following a pointer, comparing two values.
Linear time. Running time is O(n).
Computing the maximum. Scan the n numbers once, tracking the largest seen so far (see the sketch below).
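A linear-scan sketch (illustrative, not from the slides):

def maximum(a):
    """Maximum of a non-empty sequence: one comparison per element, O(n)."""
    best = a[0]
    for x in a[1:]:
        if x > best:
            best = x
    return best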
Merge two sorted lists. Combine two sorted linked lists A = a_1, a_2, \ldots, a_n and B = b_1, b_2, \ldots, b_n into a sorted whole.
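A sketch of the merge step in Python (illustrative; A and B are assumed sorted ascending):

def merge(A, B):
    """Merge two sorted lists into one sorted list in O(n) time."""
    out, i, j = [], 0, 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            out.append(A[i]); i += 1   # smaller head comes from A
        else:
            out.append(B[j]); j += 1
    out.extend(A[i:])                  # at most one tail is non-empty
    out.extend(B[j:])
    return out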
Target-Sum. Given a sorted array of n distinct integers and an integer T, find two that sum to exactly T.
Hint: move two indices from opposite ends toward each other (see the sketch below).
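A two-pointer sketch of this hint (illustrative; A is assumed sorted ascending):

def target_sum(A, T):
    """Find two entries of sorted A summing to T, in O(n) time."""
    lo, hi = 0, len(A) - 1
    while lo < hi:
        s = A[lo] + A[hi]
        if s == T:
            return A[lo], A[hi]
        if s < T:
            lo += 1   # need a larger sum
        else:
            hi -= 1   # need a smaller sum
    return None       # no such pair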
Logarithmic time. Running time is O(\log n).
Search in a sorted array. Given a sorted array A of n distinct integers and an integer x, find the index of x in the array.
Binary search.
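A Python sketch (illustrative, not from the slides; the names lo, hi match the invariant below):

def binary_search(A, x):
    """Index of x in sorted array A, or -1 if absent: O(log n)."""
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if A[mid] == x:
            return mid
        if A[mid] < x:
            lo = mid + 1   # x can only lie in the right half
        else:
            hi = mid - 1   # x can only lie in the left half
    return -1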
Invariant. After k iterations of the WHILE loop, (hi − lo + 1) \le n / 2^k \Rightarrow k \le 1 + \log_2 n.
Linearithmic time. Running time is O(n \log n).
Sorting. Given an array of n elements, rearrange them in ascending order.
Quadratic time. Running time is O(n^2).
Closest pair of points. Given a list of n points in the plane (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n), find the pair that is closest to each other.
O(n^2) algorithm.
Enumerate all pairs of points (with i < j).
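A brute-force sketch (illustrative; comparing squared distances avoids sqrt):

def closest_pair(pts):
    """Closest pair among n >= 2 points, by checking all O(n^2) pairs."""
    best = None
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            d2 = dx * dx + dy * dy
            if best is None or d2 < best[0]:
                best = (d2, pts[i], pts[j])
    return best[1], best[2]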
Remark. \Omega(n^2) seems inevitable, but this is just an illusion: a divide-and-conquer algorithm achieves O(n \log n).
Cubic time. Running time is O(n^3).
3-SUM. Given an array of n distinct integers, find three that sum to 0.
O(n^3) algorithm.
Enumerate all triples (with i < j < k).
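A brute-force sketch (illustrative):

def three_sum(a):
    """Find three entries summing to 0 by checking all O(n^3) triples."""
    n = len(a)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                if a[i] + a[j] + a[k] == 0:
                    return a[i], a[j], a[k]
    return None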
Remark. \Omega(n^3) seems inevitable, but O(n^2) is not hard.
Polynomial time. Running time is O(n^k) for some constant k > 0.
Independent set of size k. Given a graph, find k nodes such that no two are joined by an edge.
O(n^k) algorithm.
Enumerate all subsets of k nodes.
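A sketch of this enumeration (illustrative; the node-list and edge-list form is an assumed graph representation):

from itertools import combinations

def independent_set_of_size(nodes, edges, k):
    """Try all C(n, k) subsets; each check costs O(k^2) edge lookups."""
    E = set(map(frozenset, edges))
    for S in combinations(nodes, k):
        if all(frozenset((u, v)) not in E for u, v in combinations(S, 2)):
            return S   # no two nodes of S are joined by an edge
    return None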
Exponential time. Running time is O(2^{n^k}) for some constant k > 0.
Independent set. Given a graph, find an independent set of maximum cardinality.
O(n^2 2^n) algorithm.
Enumerate all subsets of the n nodes.
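A sketch of the exhaustive search (illustrative; same assumed graph representation as above):

from itertools import combinations

def max_independent_set(nodes, edges):
    """Enumerate all 2^n subsets; each independence check costs O(n^2)."""
    E = set(map(frozenset, edges))
    best = ()
    for r in range(len(nodes) + 1):          # subsets in increasing size
        for S in combinations(nodes, r):
            if all(frozenset(p) not in E for p in combinations(S, 2)):
                best = S                     # the last hit has maximum size
    return best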
Which is an equivalent definition of exponential time?
Neither: compare them by taking the limit of the ratio.
Primary goal. Seek algorithms that improve qualitatively on brute-force search.
Priority Queue. Each element has a priority value (key).
Maintains elements so that the one with the smallest key can be found and removed efficiently.
Conceptually, think of a heap as a balanced binary tree.
Heap order: for every element v at a node i, the element w at i’s parent satisfies key(w) \le key(v).
Heapify-up: fix the heap by swapping the out-of-place element upward with its parent.
Heapify-down: swap the out-of-place element downward with its smaller child, proceeding down the tree recursively.
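A compact array-based min-heap sketch (illustrative, not from the slides; 0-based, node i has parent (i − 1) // 2 and children 2i + 1, 2i + 2):

class MinHeap:
    def __init__(self):
        self.a = []

    def push(self, key):
        self.a.append(key)                 # may violate heap order at the end
        self._heapify_up(len(self.a) - 1)

    def pop_min(self):
        a = self.a                         # assumes the heap is non-empty
        a[0], a[-1] = a[-1], a[0]          # move the last element to the root
        smallest = a.pop()
        if a:
            self._heapify_down(0)
        return smallest

    def _heapify_up(self, i):
        a = self.a
        while i > 0 and a[(i - 1) // 2] > a[i]:   # parent violates heap order
            a[(i - 1) // 2], a[i] = a[i], a[(i - 1) // 2]
            i = (i - 1) // 2

    def _heapify_down(self, i):
        a, n = self.a, len(self.a)
        while True:
            smallest = i
            for c in (2 * i + 1, 2 * i + 2):      # compare with both children
                if c < n and a[c] < a[smallest]:
                    smallest = c
            if smallest == i:
                return                            # heap order restored
            a[i], a[smallest] = a[smallest], a[i]
            i = smallest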