Algorithm II
12. Local Search
WU Xiaokun 吴晓堃
xkun.wu [at] gmail
Q. Suppose I need to solve an NP-hard problem. What should I do?
A. Sacrifice one of three desired features: solve the problem to optimality, solve it in poly-time, or solve arbitrary instances. (Local search trades away the optimality guarantee.)
Vertex cover. Given a graph G = (V, E), find a subset of nodes S of minimum cardinality such that for each (u, v) \in E, either u or v (or both) is in S.
Neighbor relation. S \sim S' if S' can be obtained from S by adding or deleting a single node. Each vertex cover S has at most n neighbors.
Gradient descent. Start with S = V. If there is a neighbor S' that is a vertex cover and has lower cardinality, replace S with S'.
Remark. Algorithm terminates after at most n steps since each update decreases the size of the cover by one.
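This scheme can be sketched in Python (a minimal illustration; the graph is assumed to be given as a node list and an edge list, and the function name is made up):

```python
def gradient_descent_vertex_cover(nodes, edges):
    """Start with S = V; while some single-node deletion leaves a
    vertex cover, delete that node (a strictly smaller neighbor)."""
    S = set(nodes)

    def is_cover(C):
        return all(u in C or v in C for u, v in edges)

    improved = True
    while improved:
        improved = False
        for u in sorted(S):
            if is_cover(S - {u}):     # neighbor: delete one node
                S = S - {u}
                improved = True
                break
    return S
```

On the star K_{1,3} this sketch deletes the center first and then halts at the three leaves (size 3), although the center alone (size 1) is an optimal cover: gradient descent can stop at a poor local optimum.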
Local optimum. No neighbor is strictly better.
Local search. Algorithm that explores the space of possible solutions in sequential fashion, moving from a current solution to a “nearby” one.
Neighbor relation. Let S \sim S' be a neighbor relation for the problem.
Gradient descent. Let S denote current solution. If there is a neighbor S' of S with strictly lower cost, replace S with the neighbor whose cost is as small as possible. Otherwise, terminate the algorithm.
Problem. Gradient descent can get stuck in a local optimum.
Metropolis algorithm.
Gibbs-Boltzmann function. The probability of finding a physical system in a state with energy E is proportional to e^{-E / (kT)}, where T > 0 is temperature and k is a constant.
Metropolis algorithm.
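The dynamics can be sketched in Python (the `energy` and `random_neighbor` callbacks are placeholders for a concrete problem; the Boltzmann constant k is absorbed into T):

```python
import math
import random

def metropolis(init, energy, random_neighbor, T, steps, seed=0):
    """Metropolis dynamics at fixed temperature T.  Downhill moves
    are always accepted; an uphill move that increases the energy
    by dE is accepted with probability e^{-dE/T}."""
    rng = random.Random(seed)
    s = init
    for _ in range(steps):
        t = random_neighbor(s, rng)
        dE = energy(t) - energy(s)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            s = t
    return s
```

For example, minimizing E(x) = (x - 3)^2 over the integers with neighbors x ± 1 drifts toward x = 3, with occasional uphill excursions at higher T.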
Theorem. Let f_S(t) be fraction of first t steps in which simulation is in state S. Then, assuming some technical conditions, with probability 1:
\lim_{t \to \infty} f_S(t) = \frac{1}{Z} e^{-E(S)/(kT)},\ \text{where}\ Z = \sum_{S'} e^{-E(S')/(kT)} \ \text{(sum over all states } S' \text{)}
Intuition. Simulation spends roughly the right amount of time in each state, according to Gibbs-Boltzmann equation.
Simulated annealing. Run the Metropolis dynamics with a temperature T(i) that decreases gradually over time (a cooling schedule): high temperature early on permits broad exploration; low temperature later forces convergence.
Physical analog. Slowly cooling a heated material lets it settle into a low-energy crystalline state; cooling too quickly freezes in defects.
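A sketch of the resulting procedure, with a hypothetical cooling schedule passed in as a function of the step index (all names are illustrative):

```python
import math
import random

def simulated_annealing(init, energy, random_neighbor, schedule,
                        steps, seed=0):
    """Metropolis-style acceptance with a decreasing temperature
    schedule(i); tracks the best state seen.  A sketch, not a
    tuned solver."""
    rng = random.Random(seed)
    s = init
    best = s
    for i in range(steps):
        T = schedule(i)
        t = random_neighbor(s, rng)
        dE = energy(t) - energy(s)
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            s = t
        if energy(s) < energy(best):
            best = s
    return best
```

The choice of schedule matters: cooling too fast reduces to plain gradient descent, cooling too slowly wastes steps.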
Hopfield networks. Simple model of an associative memory, in which a large collection of units are connected by an underlying network, and neighboring units try to correlate their states.
Input: Graph G = (V, E) with integer (positive or negative) edge weights w.
Configuration. Node assignment s_u = \pm 1.
Intuition. If w_{uv} < 0, then u and v want to have the same state; if w_{uv} > 0, then u and v want different states.
Note. In general, no configuration respects all constraints.
Def. With respect to a configuration S, edge e = (u, v) is good if w_e \times s_u \times s_v < 0. That is, if w_e < 0 then s_u = s_v; if w_e > 0, then s_u \ne s_v.
Def. With respect to a configuration S, a node u is satisfied if the weight of incident good edges \ge weight of incident bad edges.
Def. A configuration is stable if all nodes are satisfied.
Goal. Find a stable configuration, if such a configuration exists.
State-flipping algorithm. Repeatedly flip the state of an unsatisfied node.
HOPFIELD-FLIP(G, w)
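The state-flipping loop can be sketched as follows (edge weights given as a dict from node pairs to integers; an illustrative sketch, not the lecture's exact pseudocode):

```python
def state_flipping(nodes, weights):
    """HOPFIELD-FLIP sketch.  weights: dict {(u, v): w_uv} with
    integer weights.  Repeatedly flips an unsatisfied node; the
    potential argument bounds the number of flips by sum_e |w_e|."""
    adj = {u: [] for u in nodes}
    for (u, v), w in weights.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    s = {u: 1 for u in nodes}          # initial configuration: all +1

    def unsatisfied(u):
        # edge e = (u, v) is good iff w_e * s_u * s_v < 0, so u is
        # unsatisfied exactly when this signed incident sum is > 0
        return sum(w * s[u] * s[v] for v, w in adj[u]) > 0

    flipped = True
    while flipped:
        flipped = False
        for u in nodes:
            if unsatisfied(u):
                s[u] = -s[u]           # flipping makes u satisfied
                flipped = True
    return s
```

On a single positive-weight edge, for instance, the two endpoints end up in different states, as the good-edge definition demands.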
Theorem. The state-flipping algorithm terminates with a stable configuration after at most W = \sum_e | w_e | iterations.
Pf attempt. Consider measure of progress \Phi(S) = \# satisfied nodes. This fails: flipping an unsatisfied node u can make previously satisfied neighbors of u unsatisfied, so \Phi need not increase.
Pf. Consider measure of progress \Phi(S) = \sum_{e\ \text{good}} | w_e |.
When an unsatisfied node u flips state, its incident good edges become bad and vice versa:
\Phi(S') = \Phi(S) - \sum_{e\ \text{good, incident to}\ u} |w_e| + \sum_{e\ \text{bad, incident to}\ u} |w_e| \ge \Phi(S) + 1
since u unsatisfied means its incident bad weight exceeds its incident good weight, by at least 1 because weights are integers. As 0 \le \Phi(S) \le W, there are at most W flips.
Hopfield network search problem. Given a weighted graph, find a stable configuration if one exists.
Hopfield network decision problem. Given a weighted graph, does there exist a stable configuration?
Remark. The decision problem is trivially solvable (always yes), but there is no known poly-time algorithm for the search problem.
Maximum cut. Given an undirected graph G = (V, E) with positive integer edge weights w_e, find a cut (A, B) such that the total weight of edges crossing the cut is maximized.
Toy application.
Real applications. Circuit layout, statistical physics.
Single-flip neighborhood. Cuts (A, B) and (A', B') are neighbors if one is obtained from the other by moving a single node from A to B, or from B to A.
Greedy algorithm.
MAX-CUT-LOCAL(G, w)
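A sketch of this greedy single-flip local search in Python (graph as a dict from edges to positive weights; names are illustrative):

```python
def max_cut_local(nodes, weights):
    """Single-flip local search for MAX-CUT.  weights: dict mapping
    an edge (u, v) to a positive weight.  Returns a locally optimal
    cut (A, B): moving any single node cannot increase the cut."""
    adj = {u: [] for u in nodes}
    for (u, v), w in weights.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    side = {u: 0 for u in nodes}       # 0 = A, 1 = B

    improved = True
    while improved:
        improved = False
        for u in nodes:
            # gain of flipping u: same-side edges start crossing,
            # crossing edges stop crossing
            gain = sum(w if side[v] == side[u] else -w
                       for v, w in adj[u])
            if gain > 0:
                side[u] = 1 - side[u]
                improved = True
    A = {u for u in nodes if side[u] == 0}
    B = {u for u in nodes if side[u] == 1}
    return A, B
```

On a unit-weight triangle (total weight 3) the locally optimal cut has weight 2, consistent with the factor-1/2 guarantee below.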
Theorem. Let (A, B) be a locally optimal cut and let (A^\ast, B^\ast) be an optimal cut. Then w(A, B) \ge \frac{1}{2} \sum_e w_e \ge \frac{1}{2} w(A^\ast, B^\ast).
Pf. Local optimality means no single flip helps: for each u \in A, \sum_{v \in A} w_{uv} \le \sum_{v \in B} w_{uv}, and symmetrically for u \in B. Summing the first family over u \in A counts each edge inside A twice, so 2\, w(A) \le w(A, B); likewise 2\, w(B) \le w(A, B), where w(A), w(B) denote the total weight of edges with both endpoints in A, resp. B. Hence \sum_e w_e = w(A) + w(B) + w(A, B) \le \frac{1}{2} w(A, B) + \frac{1}{2} w(A, B) + w(A, B) = 2\, w(A, B).
Local search. Within a factor of 2 for MAX-CUT, but not poly-time!
Big-improvement-flip algorithm. Only choose a node which, when flipped, increases the cut value by at least \frac{2 \epsilon}{n} w(A,B).
Claim. Upon termination, the big-improvement-flip algorithm returns a cut (A, B) such that (2 + \epsilon)\, w(A, B) \ge w(A^\ast, B^\ast).
Pf idea. Add \frac{2 \epsilon}{n} w(A,B) to each inequality in the original proof.
Claim. Big-improvement-flip algorithm terminates after O(\epsilon^{-1} n \log W) flips, where W = \sum_e w_e.
Theorem. [Sahni-Gonzalez 1976] There exists a \frac{1}{2}-approximation algorithm for MAX-CUT.
Theorem. [Goemans-Williamson 1995] There exists a 0.878-approximation algorithm for MAX-CUT.
Theorem. Unless \mathcal{P} = \mathcal{NP}, there is no 0.942-approximation algorithm for MAX-CUT.
1-flip neighborhood. Cuts (A, B) and (Aʹ, Bʹ) differ in exactly one node.
k-flip neighborhood. Cuts (A, B) and (Aʹ, Bʹ) differ in at most k nodes.
KL-neighborhood. [Kernighan-Lin 1970]
Multicast routing. Given a directed graph G = (V, E) with edge costs c_e \ge 0, a source node s, and k agents located at terminal nodes t_1, .., t_k. Agent j must construct a path P_j from node s to its terminal t_j.
Fair share. If x agents use edge e, they each pay c_e / x.
Ex. Two agents, each choosing between its own outer path and a shared middle path:

| agent 1 | agent 2 | cost 1 | cost 2 |
|---|---|---|---|
| outer | outer | 4 | 8 |
| outer | middle | 4 | 5 + 1 |
| middle | outer | 5 + 1 | 8 |
| middle | middle | 5/2 + 1 | 5/2 + 1 |
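The table's entries can be reproduced by a small fair-share computation. The topology here is an assumption read off the costs: private outer paths of cost 4 and 8, a shared middle edge of cost 5, and private final edges of cost 1 each; all names are made up.

```python
from fractions import Fraction

def fair_share_costs(paths):
    """Each path is a list of (edge, cost) pairs; an agent using
    edge e pays c_e / x_e, where x_e agents use e in total."""
    usage = {}
    for path in paths:
        for edge, _ in path:
            usage[edge] = usage.get(edge, 0) + 1
    return [sum(Fraction(cost, usage[edge]) for edge, cost in path)
            for path in paths]

# the four routes from the table (assumed topology)
outer1  = [("outer1", 4)]
outer2  = [("outer2", 8)]
middle1 = [("shared", 5), ("private1", 1)]
middle2 = [("shared", 5), ("private2", 1)]
```

When both agents take the middle route, each pays 5/2 + 1 = 7/2, matching the last row of the table.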
Best response dynamics. Each agent is continually prepared to improve its solution in response to changes made by other agents.
Nash equilibrium. Solution where no agent has an incentive to switch.
Fundamental question. When do Nash equilibria exist?
Local search algorithm. Algorithm that explores the space of possible solutions in sequential fashion, moving from a current solution to a “nearby” one.
Analogies. Best-response dynamics is a local search procedure; a Nash equilibrium plays the role of a local optimum.
Contrast. Best-response dynamics need not terminate since no single objective function is being optimized.
Price of stability. Ratio of best Nash equilibrium to social optimum.
Fundamental question. What is price of stability?
Ex: Price of stability = \Theta(\log k).
Social optimum. Everyone takes bottom paths.
Unique Nash equilibrium. Everyone takes top paths.
Price of stability. H(k) / (1 + \epsilon).
Theorem. The following algorithm terminates with a Nash equilibrium.
BEST-RESPONSE-DYNAMICS(G, c, k)
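Best-response dynamics can be sketched over explicit candidate path sets (each path given as a tuple of edge names; a simplification of the graph version, with illustrative names):

```python
from fractions import Fraction

def best_response_dynamics(candidates, edge_cost):
    """candidates[j]: list of candidate paths (tuples of edge names)
    for agent j; edge_cost[e] = c_e.  Repeatedly let an agent switch
    to a strictly cheaper path under fair-share pricing; terminates
    because each switch strictly decreases the potential Phi."""
    choice = [paths[0] for paths in candidates]      # arbitrary start

    def cost_to(j, path):
        # fair share agent j would pay on `path`, others held fixed
        total = Fraction(0)
        for e in path:
            users = 1 + sum(1 for i, p in enumerate(choice)
                            if i != j and e in p)
            total += Fraction(edge_cost[e], users)
        return total

    improved = True
    while improved:
        improved = False
        for j, paths in enumerate(candidates):
            best = min(paths, key=lambda p: cost_to(j, p))
            if cost_to(j, best) < cost_to(j, choice[j]):
                choice[j] = best
                improved = True
    return choice
```

On the two-agent outer/middle example, agent 2 first abandons its cost-8 outer path for the middle (cost 6), after which the middle becomes cheap enough (5/2 + 1) to attract agent 1 as well; the dynamics stop at the all-middle Nash equilibrium.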
Pf. Consider a set of paths P_1, .., P_k. Let x_e denote the number of these paths containing edge e, let E^+ be the set of edges used by at least one path, and define the potential \Phi(P_1, .., P_k) = \sum_{e \in E^+} c_e \cdot H(x_e), where H(i) = 1 + 1/2 + .. + 1/i and H(0) = 0.
Pf. [ continued ] When agent j switches to a strictly cheaper path, \Phi changes by exactly the change in j's fair-share cost: joining an edge e adds c_e / (x_e + 1) = c_e \, (H(x_e + 1) - H(x_e)) to \Phi, and leaving e subtracts c_e / x_e. Hence every best-response move strictly decreases \Phi; since there are only finitely many sets of paths, the dynamics terminate, and the final state is a Nash equilibrium.
Lemma. Let C(P_1, .., P_k) denote the total cost of selecting paths P_1, .., P_k. For any set of paths P_1, .., P_k, we have
C(P_1, .., P_k) \le \Phi(P_1, .., P_k) \le H(k) \cdot C(P_1, .., P_k)
Pf. Let x_e denote the number of paths containing edge e; for each e \in E^+ we have x_e \ge 1, hence 1 \le H(x_e) \le H(k).
C(P_1, .., P_k) = \sum_{e \in E^+} c_e \le \underbrace{\sum_{e \in E^+} c_e H(x_e)}_{\Phi(P_1, .., P_k)} \le \sum_{e \in E^+} c_e H(k) = H(k) \cdot C(P_1, .., P_k)
Theorem. There is a Nash equilibrium for which the total cost to all agents exceeds that of the social optimum by at most a factor of H(k).
Pf. Run best-response dynamics starting from the social optimum P_1^\ast, .., P_k^\ast. The potential \Phi never increases along the way, so at the resulting Nash equilibrium P_1, .., P_k:
C(P_1, .., P_k) \le \Phi(P_1, .., P_k) \le \Phi(P_1^\ast, .., P_k^\ast) \le H(k) \cdot C(P_1^\ast, .., P_k^\ast)
Existence. Nash equilibria always exist for k-agent multicast routing with fair sharing.
Price of stability. Best Nash equilibrium is never more than a factor of H(k) worse than the social optimum.
Fundamental open problem. Find a Nash equilibrium in poly-time.
Social optimum. Minimizes total cost to all agents.
Observation. In general, there can be many Nash equilibria.
Even when the Nash equilibrium is unique, it does not necessarily equal the social optimum.
social optimum = 7
unique Nash equilibrium = 8
social optimum = (1 + \epsilon)/k
Nash equilibrium A = (1 + \epsilon)/k
Nash equilibrium B = 1