
Section 7.1 Semidefinite Matrices and Semidefinite Programming

Semidefinite programming is a powerful extension of linear programming in which the variables are arranged in a matrix and the non-negativity constraints appearing in the Standard or Canonical Forms of an LP (see Section 2.2) are replaced by the constraint that the matrix of variables is positive semidefinite (to be defined shortly). As we shall see, like linear programming, semidefinite programming provides a powerful tool for attacking a wide range of interesting optimization problems.
As we just said, semidefinite programs generalize linear programs by replacing “non-negative numbers” with “positive semidefinite matrices”. In fact, there is an intuitive way in which positive semidefinite matrices are a natural generalization of the concept of non-negative numbers. At the risk of insulting your intelligence, let us think about what happens when we multiply a real number \(x\) by a real number \(a\geq 0\text{.}\) Regardless of which number \(x\) we choose, the operation of multiplying by a non-negative number \(a\) simply scales the number \(x\) and, importantly, the resulting number \(ax\) remains on the “same side” of the origin as \(x\text{.}\) In contrast, if \(a\) is negative, then multiplying by \(a\) flips \(x\) to the other side of the origin.
Now, let us think of a vector \(x\in\mathbb{R}^n\) as opposed to a real number. A natural analogue of the concept of a non-negative number is an \(n\times n\) matrix \(A\) with the property that, for any \(x\in\mathbb{R}^n\text{,}\) the vector \(Ax\) is on the “same side” as the vector \(x\text{.}\) “The same side of what?” you may ask. In one dimension, we spoke about being on the same side of the origin, but this doesn’t make much sense in higher dimensions. In higher dimensions, it turns out that the right thing to think about is whether or not \(Ax\) is on the same side of the hyperplane \(x^{\perp}=\{y\in\mathbb{R}^n: x^Ty=0\}\) as \(x\text{.}\) Note that, in one dimension, if \(x\neq0\text{,}\) then \(x^{\perp}=\{0\}\) and so this notion of “sides” in dimension \(n\) coincides with “sides of the origin” in one dimension. What it means for a vector \(z\) to be on the “same side” of this hyperplane as \(x\) is that \(x^Tz\geq 0\text{.}\) This leads to a natural geometric intuition for the following definition of a positive semidefinite (positive definite) matrix, which generalizes the concept of a non-negative (positive) real number.

Definition 7.1.1.

An \(n\times n\) symmetric (i.e. \(A^T=A\)) real matrix \(A\) is positive semidefinite (PSD) if, for all \(x\in \mathbb{R}^n\text{,}\) it holds that \(x^TAx\geq 0\text{.}\) Moreover, if \(x^TAx>0\) for all non-zero vectors \(x\text{,}\) then we say that \(A\) is positive definite (PD). We write \(A\succeq 0\) to mean that the matrix \(A\) is PSD and \(A\succ 0\) to mean that it is PD.
There are many equivalent definitions of positive semidefinite matrices, as the following lemma shows.

Lemma 7.1.2.

Let \(A\) be an \(n\times n\) symmetric real matrix with entries \(a_{i,j}\text{.}\) Then the following conditions are equivalent:
  1. \(x^TAx\geq 0\) for all \(x\in\mathbb{R}^n\) (i.e. \(A\) is positive semidefinite),
  2. \(\sum_{i=1}^n\sum_{j=1}^na_{i,j}x_ix_j\geq 0\) for all \(x\in\mathbb{R}^n\text{,}\)
  3. every eigenvalue of \(A\) is non-negative,
  4. there is a real matrix \(R\) such that \(A=R^TR\text{.}\)

Proof.

The fact that Condition 1 \(\Longleftrightarrow\) Condition 2 is clear because \(x^TAx\) is precisely equal to \(\sum_{i=1}^n\sum_{j=1}^na_{i,j}x_ix_j\text{.}\) So, from here forward, we show that Condition 1 is equivalent to the other two conditions.
We prove that Condition 1 \(\Longrightarrow\) Condition 3 by contrapositive. Suppose that \(A\) has a negative eigenvalue \(\lambda\) and let \(x\) be an eigenvector for \(\lambda\text{.}\) Then
\begin{equation*} x^TAx = x^T(\lambda x)=\lambda x^Tx = \lambda\|x\|_2^2<0 \end{equation*}
because \(x\) is a non-zero vector and so it has non-zero Euclidean norm. Thus, Condition 1 is false.
We prove Condition 3 \(\Longrightarrow\) Condition 4. Let \(\lambda_1,\dots,\lambda_n\) be the eigenvalues of \(A\text{.}\) By standard linear algebra, \(A\) can be expressed as
\begin{equation*} A = QLQ^{-1} \end{equation*}
where \(Q\) is an \(n\times n\) real matrix whose columns are eigenvectors of \(A\) and \(L\) is a diagonal matrix whose diagonal entries are the corresponding eigenvalues of \(A\text{.}\) Since \(A\) is a symmetric real matrix, we can choose the eigenvectors to be orthonormal and so we can assume that \(Q^{-1}=Q^T\text{.}\) Let \(\sqrt{L}\) be the diagonal matrix in which each diagonal entry is the square root of the corresponding entry of \(L\text{.}\) Note that this is well-defined because the eigenvalues of \(A\) are non-negative. Now, let \(R=Q\sqrt{L}Q^{T}\text{.}\) Then, using the fact that \(Q^{T}=Q^{-1}\text{,}\) we have
\begin{equation*} R^TR = \left(Q\sqrt{L}Q^{T}\right)^TQ\sqrt{L}Q^{T} = Q\sqrt{L}^TQ^{T}Q\sqrt{L}Q^{T} = QLQ^T=A \end{equation*}
which completes the proof of the desired implication.
Finally, we prove Condition 4 \(\Longrightarrow\) Condition 1. Suppose that \(R\) is a real matrix and that \(A=R^TR\text{.}\) Then, for any \(x\in\mathbb{R}^n\text{,}\) we have
\begin{equation*} x^TAx = x^TR^TRx = (Rx)^TRx = \|Rx\|_2^2\geq0 \end{equation*}
which is precisely what we wanted.
Condition 4 in Lemma 7.1.2 is particularly natural in the context of our analogy between positive semidefinite matrices and non-negative real numbers. As you know, non-negative real numbers are the only numbers which have a real square root. The matrix \(R\) in Condition 4 can be naturally thought of as the “square root” of the matrix \(A\text{.}\)
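To make these equivalences concrete, here is a small numerical sketch in Python with NumPy (our own choice of tools; the text does not prescribe any). It builds a PSD matrix as in Condition 4, checks its eigenvalues as in Condition 3, and constructs the “square root” \(R=Q\sqrt{L}Q^T\) exactly as in the proof above.

import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B.T @ B                      # PSD by Condition 4 of Lemma 7.1.2

# Condition 3: a symmetric PSD matrix has only non-negative eigenvalues.
w, Q = np.linalg.eigh(A)         # A = Q diag(w) Q^T with orthonormal columns of Q
assert np.all(w >= -1e-10)

# The "square root" R = Q sqrt(L) Q^T from the proof of Condition 3 => Condition 4.
R = Q @ np.diag(np.sqrt(np.clip(w, 0, None))) @ Q.T
assert np.allclose(R.T @ R, A)

# Condition 1: x^T A x >= 0, checked here on a random vector x.
x = rng.standard_normal(4)
assert x @ A @ x >= -1e-10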
Now that we have a good understanding of semidefinite matrices, it is time to introduce the general form of a semidefinite program (SDP). For two \(n\times n\) matrices \(A\) and \(B\) with entries \(a_{i,j}\) and \(b_{i,j}\text{,}\) respectively, define the inner product of \(A\) and \(B\) to be
\begin{equation*} \langle A,B\rangle = \sum_{i=1}^n\sum_{j=1}^n a_{i,j}b_{i,j}. \end{equation*}
Given real symmetric \(n\times n\) matrices \(C,A_1,\dots,A_m\text{,}\) a real vector \(b=(b_1,\dots,b_m)^T\) and an \(n\times n\) matrix \(X\) of variables, the following is a typical form of a semidefinite program (SDP):
\begin{align*} \text{min } \amp \amp \langle C,X\rangle\\ \text{subject to } \amp \amp \langle A_1,X\rangle = b_1\\ \amp \amp \langle A_2,X\rangle = b_2\\ \amp \amp \vdots\\ \amp \amp \langle A_m,X\rangle = b_m\\ \amp \amp X\succeq 0. \end{align*}
In this form, the dual has a vector \(y=(y_1,\dots,y_m)^T\in\mathbb{R}^m\) of variables and is given by
\begin{align*} \text{max } \amp \amp b^Ty\\ \text{subject to } \amp \amp C - \sum_{i=1}^my_iA_i \succeq 0. \end{align*}
Note that, here, there is no non-negativity constraint on the entries of \(y\text{.}\) This is similar in structure to the form of the dual obtained for the Standard Equality Form of a linear program in Exercise 19.
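Before moving on, here is a small sketch of this primal–dual pair on random data. It is written in Python with the CVXPY modelling library, which is purely our choice (the text does not prescribe any software), and the data is generated so that both problems are feasible and the primal is bounded.

import cvxpy as cp
import numpy as np

n, m = 4, 3
rng = np.random.default_rng(0)
sym = lambda M: (M + M.T) / 2

B = rng.standard_normal((n, n))
C = B.T @ B + np.eye(n)                       # positive definite, so the primal is bounded below
A = [sym(rng.standard_normal((n, n))) for _ in range(m)]

# Choose b so that the primal is feasible: b_i = <A_i, X0> for a fixed PSD matrix X0.
R0 = rng.standard_normal((n, n))
X0 = R0.T @ R0
b = np.array([np.trace(Ai @ X0) for Ai in A])

# Primal: minimize <C, X> subject to <A_i, X> = b_i and X PSD.
X = cp.Variable((n, n), symmetric=True)
primal = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                    [X >> 0] + [cp.trace(A[i] @ X) == b[i] for i in range(m)])
primal.solve()

# Dual: maximize b^T y subject to C - sum_i y_i A_i being PSD.
y = cp.Variable(m)
S = y[0] * A[0]
for i in range(1, m):
    S = S + y[i] * A[i]
dual = cp.Problem(cp.Maximize(b @ y), [C - S >> 0])
dual.solve()

print(primal.value, dual.value)   # the dual value never exceeds the primal value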
We have the following Weak Duality Theorem for semidefinite programming: if \(X\) is a feasible solution to the primal SDP above and \(y\) is a feasible solution to its dual, then \(b^Ty\leq \langle C,X\rangle\text{.}\)

Proof.

Let \(X\) be feasible for the primal and let \(y\) be feasible for the dual. Since \(\langle A_i,X\rangle=b_i\) for each \(i\text{,}\) we have
\begin{equation*} \langle C,X\rangle - b^Ty = \langle C,X\rangle - \sum_{i=1}^my_i\langle A_i,X\rangle = \left\langle C-\sum_{i=1}^my_iA_i, X\right\rangle. \end{equation*}
The matrix \(M=C-\sum_{i=1}^my_iA_i\) is positive semidefinite by dual feasibility and, by Lemma 7.1.2, we can write \(X=R^TR\) for some real matrix \(R\text{.}\) Since \(M\) is symmetric, \(\langle M,X\rangle = \operatorname{tr}(MX)=\operatorname{tr}(MR^TR)=\operatorname{tr}(RMR^T)=\sum_{i} r_i^TMr_i\geq 0\text{,}\) where \(r_i^T\) denotes the \(i\)th row of \(R\text{.}\) Therefore, \(b^Ty\leq \langle C,X\rangle\text{.}\)
Given our experience with LP duality, we might expect semidefinite programs to satisfy a Strong Duality Theorem as well. However, in SDP, strong duality does not always hold; see Exercise 7. Under certain additional conditions, however, strong duality does hold. Say that a primal SDP in the form given above is strictly feasible if there exists a feasible solution such that \(X\succ0\) and that the dual SDP is strictly feasible if it has a feasible solution such that \(C - \sum_{i=1}^my_iA_i \succ 0\text{.}\) As it turns out, if the primal and dual are both strictly feasible, then strong duality holds; this follows from a result known as “Slater’s condition” which we will not go into in depth here.
There exist efficient algorithms for approximately solving SDPs. Here, “approximately” means that they can be solved up to any desired degree of accuracy, where the running time of the algorithm depends on an error parameter. As with linear programs, we will not discuss the efficient algorithms that are used to solve SDPs in practice. However, it is important to be aware that such algorithms exist and so, whenever we can reduce a problem to solving an SDP, we can immediately conclude that the problem can be approximately solved in polynomial time.
You may remember that, back in Chapter 2, the concept of convexity was crucial in the study of linear programming. Convexity is also the key concept underpinning semidefinite programming (in fact, SDP is part of a broader theory known as convex programming). Let us look at one way in which convexity comes into play. Consider three real variables \(x,y,z\) and let \(A\) be the matrix
\begin{equation*} A=\begin{bmatrix}x\amp z\\ z\amp y\end{bmatrix}. \end{equation*}
In the case of a simple \(2\times 2\) matrix like this, being PSD simply means that the diagonal entries are non-negative and that the determinant is non-negative (see Exercise 4). So, \(A\) is positive semidefinite if and only if \(x,y\geq0\) and \(xy\geq z^2\text{.}\) In the video below, we have plotted all points for which \(0\leq x,y\leq 4\) and \(z^2=xy\text{.}\) If you imagine “filling in” the points enclosed by this surface, you will see that it yields a convex set. More generally, semidefinite programming extends linear programming by allowing us to optimize over more general convex sets than just polyhedra. If you have MATLAB, then you can download the figure in the video below here and the file that was used to generate it here.
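As a quick sanity check of this \(2\times 2\) criterion against the eigenvalue condition of Lemma 7.1.2, here is a short Python/NumPy sketch (again, the choice of language is ours).

import numpy as np

rng = np.random.default_rng(0)
for _ in range(10000):
    x, y, z = rng.uniform(-4, 4, size=3)
    A = np.array([[x, z], [z, y]])
    psd_by_eigenvalues = np.all(np.linalg.eigvalsh(A) >= -1e-9)    # Condition 3
    psd_by_criterion = (x >= 0) and (y >= 0) and (x * y >= z * z)  # 2x2 criterion
    assert psd_by_eigenvalues == psd_by_criterion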

Subsection 7.1.1 A First Application of SDP: Maximizing Angles

Now, let us look at our first application of semi-definite programming. Suppose that we would like to solve the following geometry problem.

Problem 7.1.5. Maximizing Angles.

In this problem, we want to find three non-zero vectors in \(\mathbb{R}^3\) which are as “spread out” as possible. More precisely, we solve the following.
\begin{align*} \text{max } \amp \amp \theta\\ \text{subject to } \amp \amp u_1,u_2,u_3\in\mathbb{R}^3\setminus\{0\},\\ \amp\amp\text{the angle between any two distinct vectors in }\{u_1,u_2,u_3\}\text{ is equal to }\theta. \end{align*}
Before we can attack this with semidefinite programming, we need to first convert it into an equivalent problem involving a semidefinite matrix. First, note that Problem 7.1.5 does not change if we assume that \(u_1,u_2,u_3\) are unit vectors, and so we will assume this as it will be convenient later. We recall that the angle \(\varphi\) between two unit vectors \(u,v\) can be determined via
\begin{equation*} \cos(\varphi) = u^Tv. \end{equation*}
Also, observe that the function \(\cos(\varphi)\) is decreasing for \(0\leq \varphi\leq \pi\text{,}\) and the angle between two vectors always lies in this range. So, the problem of maximizing the smallest angle between any two of the three vectors can be rephrased as minimizing the largest dot product among any pair of them. That is, our problem is equivalent to
\begin{align*} \text{min } \amp \amp x\\ \text{subject to } \amp\amp u_1^Tu_1=1,\\ \amp\amp u_2^Tu_2=1,\\ \amp\amp u_3^Tu_3=1,\\ \amp\amp u_1^Tu_2= x,\\ \amp\amp u_1^Tu_3= x,\\ \amp\amp u_2^Tu_3= x. \end{align*}
We can remove the variable \(x\) as follows:
\begin{align*} \text{min } \amp \amp u_1^Tu_2\\ \text{subject to } \amp\amp u_1^Tu_1=1,\\ \amp\amp u_2^Tu_2=1,\\ \amp\amp u_3^Tu_3=1,\\ \amp\amp u_1^Tu_3- u_1^Tu_2=0 ,\\ \amp\amp u_2^Tu_3-u_1^Tu_2= 0. \end{align*}
Note that the first three constraints come from our assumption that all three vectors have length one. Now, this still isn’t in quite the right format, but it is closer. Our next step is to use Condition 4 of Lemma 7.1.2 to convert this into an SDP. Indeed, imagine that we let \(R\) be the \(3\times 3\) matrix whose three columns are \(u_1,u_2,u_3\text{,}\) respectively, and let \(X=R^TR\text{.}\) Note that the entry in the \(i\)th row and \(j\)th column of \(X\) is precisely \(u_i^Tu_j\text{.}\) Also, by Lemma 7.1.2, not only is the matrix \(X\) positive semidefinite, but every positive semidefinite matrix can be represented as such a product. So, letting the entries of \(X\) be the variables of our program, we get the following:
\begin{align*} \text{min } \amp \amp x_{1,2}\\ \text{subject to } \amp\amp x_{1,1}=1,\\ \amp\amp x_{2,2}=1,\\ \amp\amp x_{3,3}=1,\\ \amp\amp x_{1,3}-x_{1,2}=0 ,\\ \amp\amp x_{2,3}-x_{1,2}= 0,\\ \amp\amp X\succeq 0. \end{align*}
Now, we need to be careful here, as there is still a subtle issue. In an SDP, the matrix \(C\) in the objective function, as well as all of the matrices \(A_1,\dots,A_m\) in the constraints, are supposed to be symmetric. The above optimization problem fails that condition. However, it is always easy to “symmetrize” it by replacing every instance of \(x_{i,j}\) with \(\frac{1}{2}\left(x_{i,j}+x_{j,i}\right)\text{,}\) which does not change anything because the matrix \(X\) is symmetric anyway. In this case, after doing these substitutions, we will also re-scale some of our constraints to get rid of fractions. So, the above optimization problem is equivalent to
\begin{align*} \text{min } \amp \amp \frac{1}{2}x_{1,2}+\frac{1}{2} x_{2,1}\\ \text{subject to } \amp\amp x_{1,1}=1,\\ \amp\amp x_{2,2}=1,\\ \amp\amp x_{3,3}=1,\\ \amp\amp x_{1,3}+x_{3,1}-x_{1,2}-x_{2,1}=0 ,\\ \amp\amp x_{2,3}+x_{3,2}-x_{1,2}-x_{2,1}= 0,\\ \amp\amp X\succeq 0. \end{align*}
Okay, now, after all of this messing around, let’s try to solve this thing. There is a video at the end of this section which does this via computing the eigenvalues of \(X\) and determining when they are non-negative (which is a pretty easy calculation). To give ourselves some practice with SDP duality, let’s instead solve this by taking the dual and analyzing it. First, we observe that the matrix
\begin{equation*} \begin{bmatrix}1\amp -1/2\amp -1/2\\ -1/2\amp 1\amp -1/2\\-1/2\amp -1/2\amp 1\end{bmatrix} \end{equation*}
is primal feasible (its eigenvalues are \(3/2,3/2,0\)). Therefore, the value of the primal SDP is at most \(-1/2\text{.}\) Let’s now take the dual. By definition, it is
\begin{align*} \text{max } \amp \amp y_1+y_2+y_3\\ \amp\amp \begin{bmatrix}-y_1\amp \frac{1}{2}+y_4+y_5\amp -y_4\\ \frac{1}{2}+y_4+y_5\amp -y_2\amp -y_5\\-y_4\amp -y_5\amp-y_3\end{bmatrix}\succeq 0. \end{align*}
Now, if we plug in \(y_1=y_2=\cdots=y_5=-1/6\text{,}\) we get a feasible point (the matrix has eigenvalues \(1/2,0,0\)) which gives a value of \(-1/2\text{.}\) So, by weak duality for SDPs, the optimal values of the primal and the dual SDPs must both be equal to \(-1/2\text{.}\) So, the maximum possible angle is \(\arccos(-1/2)=2\pi/3\text{.}\)
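For comparison with the hand analysis above, here is a sketch that solves the symmetrized SDP numerically. It is written in Python with CVXPY, which is simply our choice of tools, not the text's; it recovers the optimal value \(-1/2\text{,}\) the angle \(2\pi/3\text{,}\) and three unit vectors attaining it.

import cvxpy as cp
import numpy as np

X = cp.Variable((3, 3), symmetric=True)
constraints = [
    X >> 0,
    X[0, 0] == 1, X[1, 1] == 1, X[2, 2] == 1,
    X[0, 2] - X[0, 1] == 0,
    X[1, 2] - X[0, 1] == 0,
]
# Since X is symmetric, X[0, 1] equals the symmetrized objective (x_{1,2} + x_{2,1}) / 2.
prob = cp.Problem(cp.Minimize(X[0, 1]), constraints)
prob.solve()

print(prob.value)                              # approximately -0.5
print(np.arccos(np.clip(prob.value, -1, 1)))   # approximately 2*pi/3

# Recover u_1, u_2, u_3 as the columns of R, where X = R^T R (Condition 4).
w, Q = np.linalg.eigh(X.value)
R = np.diag(np.sqrt(np.clip(w, 0, None))) @ Q.T
print(R)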

Subsection 7.1.2 Computing the Chromatic Number of a Perfect Graph

In Section 4.1, we showed that the complement of a perfect graph is perfect and that the stable set polytope of a perfect graph is equal to the set of its fractional stable sets. Unfortunately, the naive LP approach using Theorem 4.1.7 does not yield a polynomial algorithm for computing the chromatic number of a perfect graph because the LP will have exponential size. Our next goal is to show how the chromatic number can be computed in polynomial time via an SDP.
To this end, let \(G\) be a perfect graph with \(n\) vertices and, for each \(v\in V(G)\text{,}\) let \(z_v\in \mathbb{R}^n\) be a vector of variables corresponding to \(v\text{.}\) We consider the following optimization problem:
\begin{align*} \text{min } \amp \amp x\\ \text{subject to } \amp\amp z_v^Tz_v=1\amp\amp \forall v\in V(G),\\ \amp\amp z_u^Tz_v\leq x \amp\amp \forall uv\in E(G) \end{align*}
Let \(x^*(G)\) be the value of the above optimization problem. In order to get a polynomial-time algorithm for computing the chromatic number of a perfect graph, we prove the following three things:
  1. The above optimization problem can be expressed as a semi-definite program where the number of variables and constraints is bounded by a polynomial in \(|V(G)|\text{.}\)
  2. Every graph \(G\) satisfies \(x^*(G)\leq \frac{-1}{\chi(G)-1}\text{.}\)
  3. Every graph \(G\) satisfies \(\frac{-1}{\omega(G)-1}\leq x^*(G)\text{.}\)
Recall that perfect graphs satisfy \(\chi(G)=\omega(G)\text{.}\) Combining this with the second and third statements, if \(G\) is perfect, then we have \(x^*(G)= \frac{-1}{\chi(G)-1}\) and so \(\chi(G)=\frac{x^*(G)-1}{x^*(G)}\text{.}\) Now, let’s see how the first statement allows us to compute \(\chi(G)\) for any perfect graph. The key observation is that, for any \(k\geq2\text{,}\) in order to determine whether \(\chi(G)\leq k\) or \(\chi(G)\geq k+1\text{,}\) it suffices to compute \(x^*(G)\) to within a precision of less than \(\frac{1}{2k(k-1)}\text{.}\) So, since the chromatic number is at most \(n\text{,}\) it suffices to solve the SDP to a precision of, say, \(\frac{0.9999}{2(n-2)(n-1)}\text{,}\) which can be done in polynomial time.
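To make the rounding step concrete, here is a short Python sketch (our own illustration, not part of the text) that converts an approximate value of \(x^*(G)\) into the chromatic number of a perfect graph.

def chi_from_sdp_value(x_approx: float) -> int:
    """Recover chi(G) for a perfect graph G from an approximation of x*(G).

    If the approximation error is below the precision discussed above, then
    1 - 1/x_approx differs from the integer chi(G) by less than 1/2, so
    rounding to the nearest integer is safe.
    """
    return round(1 - 1 / x_approx)

# Example: a perfect graph with chi(G) = 3 has x*(G) = -1/2, and any
# approximation within 1/12 = 1/(2*3*2) of it still rounds correctly.
print(chi_from_sdp_value(-0.45))   # prints 3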
Let us show that the above optimization problem can be phrased as an SDP. This is fairly similar to the angle minimization problem that we analyzed above, except that it is useful to introduce some slack variables to turn the inequalities into equalities. That is, we let \(t_{uv}\) be a real variable for each \(uv\in E(G)\) and observe that the following optimization problem is equivalent to the original one:
\begin{align*} \text{min } \amp \amp x\\ \text{subject to } \amp\amp z_v^Tz_v=1\amp\amp \forall v\in V(G),\\ \amp\amp z_u^Tz_v+ t_{uv}= x \amp\amp \forall uv\in E(G)\\ \amp\amp t_{uv}\geq 0 \amp\amp \forall uv\in E(G) \end{align*}
Now, let \(ab\) be a fixed edge of \(G\text{.}\) We can eliminate the variable \(x\) as follows:
\begin{align*} \text{min } \amp \amp z_a^Tz_b+ t_{ab}\\ \text{subject to } \amp\amp z_v^Tz_v=1\amp\amp \forall v\in V(G),\\ \amp\amp z_u^Tz_v+ t_{uv}- z_a^Tz_b- t_{ab}=0 \amp\amp \forall uv\in E(G)\\ \amp\amp t_{uv}\geq 0 \amp\amp \forall uv\in E(G) \end{align*}
Now, let \(R\) be the \(n\times n\) matrix with columns \(z_v\) for \(v\in V(G)\) and let \(X=R^TR\text{.}\) As in the angle maximization problem, we get
\begin{align*} \text{min } \amp \amp x_{a,b}+ t_{ab}\\ \text{subject to } \amp\amp x_{v,v}=1\amp\amp \forall v\in V(G),\\ \amp\amp x_{u,v}+ t_{uv}- x_{a,b}- t_{ab}=0 \amp\amp \forall uv\in E(G)\\ \amp\amp t_{uv}\geq 0 \amp\amp \forall uv\in E(G) \end{align*}
This is almost the right form, except that we need to deal with the slack variables. For this, we add \(|E(G)|\) additional rows and columns to the matrix \(X\text{,}\) one for each edge. Then, we let the diagonal entries of these rows/columns play the role of the variables \(t_{uv}\) for \(uv\in E(G)\) and add constraints to ensure that all of the off-diagonal entries in these rows/columns are zero. Subject to these constraints, the new matrix will be PSD if and only if the original matrix \(X\) is PSD and all of the new diagonal entries \(t_{uv}\) are non-negative. Thus, the problem of computing \(x^*(G)\) can be phrased as an SDP.
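Here is a sketch of this computation on a small perfect graph, again in Python with CVXPY (our choice of tools). For simplicity it keeps the auxiliary variable \(x\) and the inequality constraints \(z_u^Tz_v\leq x\) in the model and lets the modelling layer introduce the slack variables, rather than building the enlarged matrix by hand.

import cvxpy as cp
import itertools

# Example perfect graph: the complete graph K_4, whose chromatic number is 4.
n = 4
edges = list(itertools.combinations(range(n), 2))

X = cp.Variable((n, n), symmetric=True)
x = cp.Variable()
constraints = [X >> 0] + [X[v, v] == 1 for v in range(n)] \
                       + [X[u, v] <= x for (u, v) in edges]
prob = cp.Problem(cp.Minimize(x), constraints)
prob.solve()

print(prob.value)                  # approximately -1/3, the value x*(K_4)
print(round(1 - 1 / prob.value))   # 4, the chromatic number of K_4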
Okay, now, let’s prove the other two statements that we require, beginning with the second one: every graph \(G\) satisfies \(x^*(G)\leq \frac{-1}{\chi(G)-1}\text{.}\)

Proof.

Let \(k=\chi(G)\) and let \(y_1,\dots,y_k\in\mathbb{R}^n\) be unit vectors forming the vertices of a regular simplex centred at the origin; such vectors exist because \(k=\chi(G)\leq n\text{.}\) In particular, we have that \(\sum_{i=1}^ky_i=0\) and that all of the angles between distinct vectors \(y_i\) and \(y_j\) are the same. So, we can let \(s\in\mathbb{R}\) be such that \(y_i^Ty_j =s\) for all \(i\neq j\text{.}\) Now, let us compute \(s\text{.}\) Since \(\sum_{i=1}^ky_i=0\) and \(y_1,\dots,y_k\) are unit vectors, we have
\begin{equation*} 0=\left\|\sum_{i=1}^ky_i\right\|_2^2 = \sum_{i=1}^ky_i^Ty_i + \sum_{i\neq j}y_i^Ty_j = k + k(k-1)s \end{equation*}
and therefore \(s=\frac{-1}{k-1}\text{.}\) Now, we let \(f:V(G)\to\{1,\dots,k\}\) be a proper \(k\)-colouring of \(G\) and let \(z_v:=y_{f(v)}\) for all \(v\in V(G)\text{.}\) Since \(f\) is proper, adjacent vertices receive distinct colours and so \(z_u^Tz_v=s\) for every edge \(uv\in E(G)\text{.}\) Thus, these vectors form a feasible solution with \(x=s\) and it follows that \(x^*(G)\leq s=\frac{-1}{k-1}=\frac{-1}{\chi(G)-1}\text{.}\)
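The simplex construction used in this proof is easy to realize explicitly: take the standard basis vectors of \(\mathbb{R}^k\text{,}\) subtract their centroid and rescale. The short Python sketch below (our own illustration) produces \(k\) unit vectors that sum to zero and have pairwise inner product \(\frac{-1}{k-1}\text{;}\) since \(k=\chi(G)\leq n\text{,}\) they can be embedded into \(\mathbb{R}^n\text{.}\)

import numpy as np

def simplex_vectors(k: int) -> np.ndarray:
    """Rows are k unit vectors (k >= 2) summing to zero with pairwise inner product -1/(k-1)."""
    Y = np.eye(k) - np.ones((k, k)) / k              # shift the standard basis by its centroid
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)    # rescale each row to unit length
    return Y

Y = simplex_vectors(4)
print(Y @ Y.T)         # ones on the diagonal, -1/3 everywhere else
print(Y.sum(axis=0))   # the zero vector, up to rounding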

Next, we prove the third statement: every graph \(G\) satisfies \(\frac{-1}{\omega(G)-1}\leq x^*(G)\text{.}\)

Proof.

Let \(z_v\) for \(v\in V(G)\) be optimal vectors for the SDP computing \(x^*(G)\text{.}\) Let \(k=\omega(G)\text{,}\) let \(v_1,\dots,v_k\) be vertices of a clique of size \(k\) in \(G\text{,}\) and let \(z_i:=z_{v_i}\) for \(1\leq i\leq k\text{.}\) We have
\begin{equation*} 0\leq \left\|\sum_{i=1}^kz_i\right\|_2^2 = \sum_{i=1}^kz_i^Tz_i + \sum_{i\neq j}z_i^Tz_j \leq k + k(k-1)x^*(G) \end{equation*}
where the last inequality uses the fact that \(v_iv_j\in E(G)\) for all \(i\neq j\) and so \(z_i^Tz_j\leq x^*(G)\text{.}\) Therefore, \(x^*(G)\geq \frac{-1}{k-1}=\frac{-1}{\omega(G)-1}\text{.}\)
For another application of semi-definite programming, see the notes on the Shannon capacity of a graph from this course on Extremal Combinatorics.
Here are two YouTube videos on this topic.
Here are three excellent short videos on semi-definite matrices and semi-definite programming by Bachir El Khadir. They do not contain many technical details or proofs but are excellent for building intuition.