Section 7.2 SDP Approximation Algorithm for MAXCUT
In this section, we present a beautiful application of semi-definite programming to the MAXCUT problem, described below. For disjoint sets \(S,T\text{,}\) define \(E(S,T)\) to be the set of edges with one endpoint in \(S\) and one endpoint in \(T\) and let \(e(S,T):=|E(S,T)|\text{.}\) We define the size of a cut \((S,V(G)\setminus S)\) to be equal to \(e(S,V(G)\setminus S)\text{.}\)
Problem 7.2.1. MAXCUT.
- Input: A graph \(G\text{.}\)
- Problem: Find a cut of maximum size.
Let \(\maxcut(G)\) be the maximum size of a cut in \(G\text{.}\) The MAXCUT problem is well-known to be NP-hard. Therefore, we should not expect to have an efficient way to solve it precisely. However, miraculously, it is possible to get a very good approximation using semi-definite programming.
Theorem 7.2.2. Goemans and Williamson.
Define
\begin{equation*}
\alpha:=\frac{2}{\pi}\min_{0\leq \theta\leq \pi}\frac{\theta}{1-\cos\theta}.
\end{equation*}
There exists a randomized polynomial-time algorithm for producing a cut of average size at least \(\alpha\cdot \maxcut(G)\text{.}\)
Note that, in the above theorem, it turns out that \(\alpha\approx0.878\text{.}\) So, while it is NP-hard to solve the MAXCUT problem, we can get a cut which is, on average, at least \(87.8\%\) of the size of the maximum cut in polynomial time! Let’s work through the proof of this.
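As a quick numerical sanity check, the constant \(\alpha\) can be approximated by minimizing \(\theta/(1-\cos\theta)\) on a fine grid (a sketch using numpy; the grid resolution is an arbitrary choice):

```python
import numpy as np

# Approximate alpha = (2/pi) * min_{0 < theta <= pi} theta / (1 - cos(theta))
# by evaluating the ratio on a fine grid over (0, pi].
theta = np.linspace(1e-6, np.pi, 1_000_000)
alpha = (2 / np.pi) * np.min(theta / (1 - np.cos(theta)))
print(round(alpha, 4))  # approximately 0.8786
```

The minimizing angle is roughly \(\theta^*\approx 2.33\text{,}\) which gives \(\alpha\approx 0.878\) as claimed.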
The first step is to come up with an appropriate ILP for computing \(\maxcut(G)\text{.}\) Label the vertices of \(G\) by \(v_1,\dots,v_n\) and, given a set \(S\text{,}\) let \(z_S=(z_1,\dots,z_n)\) be a vector where \(z_i=1\) if \(v_i\in S\) and \(z_i=-1\) otherwise. Then it is easy to see that an edge \(v_iv_j\in E(G)\) is counted by \(e(S,V(G)\setminus S)\) if and only if \(z_iz_j=-1\text{.}\) From this, we see that \(\maxcut(G)\) is equal to
\begin{align*}
\text{max } \amp \amp \sum_{v_iv_j\in E(G)}\frac{1-z_iz_j}{2}\\
\text{subject to } \amp \amp z_i\in \{-1,1\} \text{ for all }1\leq i\leq n.
\end{align*}
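To see the objective in action, here is a small numpy sketch (the 5-cycle and the set \(S\) are arbitrary choices for illustration) verifying that each edge contributes \(1\) to the sum exactly when it crosses the cut:

```python
import numpy as np

# A 5-cycle on vertices 0..4 (an arbitrary example graph).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
S = {0, 2}  # an arbitrary side of the cut

# z_i = 1 if v_i is in S, and z_i = -1 otherwise.
z = np.array([1 if i in S else -1 for i in range(5)])

# The ILP objective: an edge contributes 1 exactly when z_i * z_j = -1.
objective = sum((1 - z[i] * z[j]) / 2 for i, j in edges)

# Direct count of edges with exactly one endpoint in S.
cut_size = sum(1 for i, j in edges if (i in S) != (j in S))
print(objective, cut_size)  # both equal 4
```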
Of course, it is NP-hard to solve this ILP because the MAXCUT problem is NP-hard. So, rather than focusing on this problem directly, it is wise to consider a relaxation of it. As a first step toward this, let \(u\) be any unit vector in \(\mathbb{R}^n\text{.}\) Then, since \(u^Tu=1\) and \(u^T(-u)=-1\text{,}\) the above ILP is clearly equivalent to
\begin{align*}
\text{max } \amp \amp \sum_{v_iv_j\in E(G)}\frac{1-z_i^Tz_j}{2}\\
\text{subject to } \amp \amp z_i\in \{-u,u\} \text{ for all }1\leq i\leq n.
\end{align*}
Now, let us “relax” this problem by allowing \(z_1,\dots,z_n\) to be arbitrary unit vectors. That is, we let \(\maxcut'(G)\) be the value of the optimization problem
\begin{align*}
\text{max } \amp \amp \sum_{v_iv_j\in E(G)}\frac{1-z_i^Tz_j}{2}\\
\text{subject to } \amp \amp z_i\in \mathbb{S}^{n-1} \text{ for all }1\leq i\leq n
\end{align*}
where \(\mathbb{S}^{n-1}=\{z\in \mathbb{R}^n: \|z\|_2=1\}\text{.}\) Now, clearly, \(\maxcut(G)\leq \maxcut'(G)\) because the constraints of this optimization problem are less restrictive than those of the problem that came before it. Our goal now is to show that
- \(\maxcut'(G)\) can be computed by solving an SDP with a polynomial (in terms of \(n\)) number of variables and constraints and
- Given an optimal choice of vectors \(z_1,\dots,z_n\) for the above optimization problem, there is a polynomial-time randomized algorithm to find a cut in \(G\) of average size at least \(\alpha\cdot\maxcut'(G)\text{.}\)
These two statements together will imply Theorem 7.2.2 since SDPs can be approximately solved in polynomial time. Let us explain why the above optimization problem boils down to an SDP. Given unit vectors \(z_1,\dots,z_n\text{,}\) let \(R\) be the \(n\times n\) matrix in which the \(i\)th column of \(R\) is \(z_i\) and let \(X=R^TR\text{.}\) Then the matrix \(X\) is positive semidefinite. Note that, by definition, the \((i,j)\) entry of \(X\) is precisely \(z_i^Tz_j\text{.}\) Also, by Condition 4 of Lemma 7.1.2, every positive semidefinite matrix \(X\) can be expressed in the form \(R^TR\) for some \(n\times n\) matrix \(R\text{.}\) So, the above optimization problem is equivalent to
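The correspondence between unit vectors and the matrix \(X\) can be checked numerically; here is a sketch with randomly chosen unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Random unit vectors z_1, ..., z_n as the columns of R.
R = rng.standard_normal((n, n))
R /= np.linalg.norm(R, axis=0)

X = R.T @ R  # Gram matrix: X[i, j] = z_i^T z_j

# X has unit diagonal (each z_i is a unit vector) and is
# positive semidefinite (all eigenvalues nonnegative, up to rounding).
print(np.allclose(np.diag(X), 1.0))
print(np.min(np.linalg.eigvalsh(X)) >= -1e-9)
```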
\begin{align*}
\text{max } \amp \amp \sum_{v_iv_j\in E(G)}\frac{1-X_{i,j}}{2}\\
\text{subject to } \amp \amp X_{i,i}=1 \text{ for all }1\leq i\leq n,\\
\amp \amp X\succeq 0
\end{align*}
where the constraint \(X_{i,i}=1\) comes from the fact that each of \(z_1,\dots,z_n\) is a unit vector. Now, clearly, this is an SDP and so we have achieved our first goal. Also, since \(\langle A_G,X\rangle=\sum_{i,j}(A_G)_{i,j}X_{i,j}=2\sum_{v_iv_j\in E(G)}X_{i,j}\text{,}\) the objective above is equal to \(\frac{|E(G)|}{2}-\frac{1}{4}\langle A_G,X\rangle\text{,}\) and so solving this SDP is equivalent to solving
\begin{align*}
\text{min } \amp \amp \langle A_G,X\rangle\\
\text{subject to } \amp \amp X_{i,i}=1 \text{ for all }1\leq i\leq n,\\
\amp \amp X\succeq 0
\end{align*}
where \(A_G\) is the adjacency matrix of \(G\text{.}\) This is certainly a semi-definite program.
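A quick numpy sketch confirms the equivalence of the two objectives on a random feasible \(X\) (the 4-cycle is an arbitrary example graph):

```python
import numpy as np

rng = np.random.default_rng(1)
# A 4-cycle (arbitrary example); A is its adjacency matrix A_G.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n, m = 4, len(edges)
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# A random feasible X: the Gram matrix of random unit vectors.
R = rng.standard_normal((n, n))
R /= np.linalg.norm(R, axis=0)
X = R.T @ R

maximized = sum((1 - X[i, j]) / 2 for i, j in edges)
inner = np.sum(A * X)  # <A_G, X>

# The maximized objective equals |E|/2 - <A_G, X>/4, so maximizing it
# is the same as minimizing <A_G, X>.
print(np.isclose(maximized, m / 2 - inner / 4))
```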
Now, suppose that \(z_1,\dots,z_n\) are (approximately) optimal vectors for computing \(\maxcut'(G)\text{,}\) which can be computed in polynomial time by solving the above semi-definite program. We want to devise a randomized algorithm which finds a cut of average size at least \(\alpha \cdot \maxcut'(G)\) in polynomial time. The clever idea of Goemans and Williamson is to take a random hyperplane through the origin in \(\mathbb{R}^n\text{,}\) define \(S\) to be the set of vertices \(v_i\) of \(G\) such that the vector \(z_i\) lies on one side of the hyperplane, and let \(V(G)\setminus S\) be the set of vertices for which the vector lies on the other side. That is, we choose \(w\in\mathbb{S}^{n-1}\) uniformly at random and define
\begin{equation*}
S:=\{v_i: w^Tz_i\geq 0\}.
\end{equation*}
Now, let’s think about how large this cut will be. Given an edge \(v_iv_j\in E(G)\text{,}\) the probability that this edge goes across the cut is equal to the probability that the line perpendicular to the projection of \(w\) onto the plane spanned by \(z_i\) and \(z_j\) passes between \(z_i\) and \(z_j\text{.}\) If \(\theta_{i,j}\) is the angle between \(z_i\) and \(z_j\text{,}\) then this probability is \(\frac{2\theta_{i,j}}{2\pi}= \frac{\theta_{i,j}}{\pi}\text{.}\) Therefore,
\begin{equation*}
\mathbb{E}(e(S,V(G)\setminus S)) = \sum_{v_iv_j\in E(G)}\frac{\theta_{i,j}}{\pi}.
\end{equation*}
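The separation probability \(\theta_{i,j}/\pi\) can be estimated by simulation; here is a sketch for two vectors at angle \(\pi/2\) (the angle and sample count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two unit vectors at angle pi/2.
z1 = np.array([1.0, 0.0])
z2 = np.array([0.0, 1.0])

# Sample random hyperplane normals w; a Gaussian vector's direction
# is uniformly distributed on the sphere.
w = rng.standard_normal((200_000, 2))
separated = np.sign(w @ z1) != np.sign(w @ z2)

# Expected separation probability: theta / pi = (pi/2) / pi = 0.5.
print(separated.mean())
```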
By definition of \(\alpha\text{,}\) this is at least
\begin{equation*}
\sum_{v_iv_j\in E(G)}\alpha\cdot \left(\frac{1-\cos\theta_{i,j}}{2}\right) = \sum_{v_iv_j\in E(G)}\alpha\cdot \left(\frac{1-z_i^Tz_j}{2}\right) = \alpha\cdot \maxcut'(G)\geq \alpha\cdot\maxcut(G).
\end{equation*}
What a fantastic idea!
To recap, the Goemans and Williamson algorithm proceeds as follows. First, solve the SDP
\begin{align*}
\text{min } \amp \amp \langle A_G,X\rangle\\
\text{subject to } \amp \amp X_{i,i}=1 \text{ for all }1\leq i\leq n,\\
\amp \amp X\succeq 0.
\end{align*}
Then, take the optimal matrix \(X\) and find an \(n\times n\) matrix \(R\) such that \(R^TR=X\text{.}\) Let \(z_1,\dots,z_n\) be the columns of \(R\text{.}\) Take \(w\in\mathbb{S}^{n-1}\) uniformly at random and let \(S\) be the set of all vertices \(v_i\) such that \(w^Tz_i\geq0\text{.}\) On average, this will give you a cut that is at least \(87.8\%\) as big as the maximum one!
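As an illustration of the rounding step alone (a sketch: we skip the SDP solve and use optimal relaxation vectors for a 4-cycle, which alternate between \(u\) and \(-u\) since the graph is bipartite):

```python
import numpy as np

rng = np.random.default_rng(3)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n = 4

# Optimal relaxation vectors for the 4-cycle: alternate u and -u.
u = np.array([1.0, 0.0, 0.0, 0.0])
Z = np.stack([u, -u, u, -u])  # row i is z_i

# Hyperplane rounding: pick a random normal w and put v_i in S
# exactly when w^T z_i >= 0.
w = rng.standard_normal(n)
S = {i for i in range(n) if w @ Z[i] >= 0}

cut_size = sum(1 for i, j in edges if (i in S) != (j in S))
print(cut_size)
```

Because the vectors alternate between \(u\) and \(-u\text{,}\) every hyperplane (except the measure-zero event \(w^Tu=0\)) separates adjacent vertices, so the rounded cut has size \(4=\maxcut(C_4)\text{.}\)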
Here is a YouTube video on this topic.
Here is one more video from the same excellent series as the videos by Bachir El Khadir that were included at the end of Section 7.1.
