Section 6.3 Shannon Capacity
Suppose that you are a spy with an alphabet of \(5\) distinct symbols \(v_1,\dots,v_5\) and you want to develop a language of “code words” that you can use to secretly communicate with your fellow spies in Vancouver. The only way to safely communicate is by sending signals through a rusty old wire that runs along the bottom of the Salish Sea.
Unfortunately, because of the poor quality of the wire, the signals that you send will sometimes get corrupted. However, on the bright side, the corruption only happens in a small number of specific ways. For example, the letter \(v_1\) sometimes gets swapped with \(v_2\) or \(v_5\text{.}\) The letter \(v_3\) sometimes gets swapped with \(v_2\) or \(v_4\text{,}\) etc. We assume that this relation is symmetric, in the sense that if \(v_5\) can be swapped for \(v_4\text{,}\) then \(v_4\) can be swapped for \(v_5\text{.}\) We can represent all of the possible “mistakes” with a graph where we add an edge from \(v_i\) to \(v_j\) if and only if \(i\neq j\) and \(v_i\) can be swapped for \(v_j\text{.}\) For example, the graph may look like this:

Now, since the information that you are trying to communicate is very sensitive, it is very important that there is no possible confusion. So, given two possible code words such as
\begin{equation*}
v_1v_4v_5 \text{ and } v_5v_3v_5\text{,}
\end{equation*}
it is important that your set of code words does not include both of them: you may send the word \(v_1v_4v_5\) down the wire, the first letter might get switched to a \(v_5\) and the second might get switched to a \(v_3\text{,}\) so the recipient will receive \(v_5v_3v_5\) instead. If this word has a different meaning than \(v_1v_4v_5\text{,}\) then it could have devastating consequences! So, a natural combinatorial problem arises: how many code words of length \(k\) can you create using the letters from this alphabet such that no two of these code words can be confused with one another?
This can be modeled as a problem regarding independent sets in a specific type of graph. Specifically, suppose that \(G\) is a graph on vertex set \(\{v_1,\dots,v_n\}\) where \(v_i\) is adjacent to \(v_j\) if symbol \(v_i\) can be confused with symbol \(v_j\text{.}\) Then, for \(k\geq1\text{,}\) we can create a new graph where the vertex set is \(V(G)^k\text{,}\) the set of all \(k\)-letter words in our alphabet, and two distinct words are adjacent if, for all \(1\leq i\leq k\text{,}\) the \(i\)th letter in the first word is either equal to or can be confused with the \(i\)th letter in the second word. Then the maximum possible number of \(k\)-letter code words that can be formed is equal to the independence number of this graph. A good way to formalize this construction is with the notion of the “strong product” of a graph.
Definition 6.3.1.
Given two graphs \(G\) and \(H\text{,}\) the strong product of \(G\) and \(H\text{,}\) denoted \(G\boxtimes H\text{,}\) is the graph with vertex set \(V(G)\times V(H)\) where \((u,v)\) is adjacent to \((x,y)\) in \(G\boxtimes H\) if and only if the following three conditions hold:
\(u=x\) or \(ux\in E(G)\text{,}\)
\(v=y\) or \(vy\in E(H)\) and
\((u,v)\neq (x,y)\text{.}\)
Definition 6.3.2.
For a graph \(G\text{,}\) define \(G^1:=G\) and, for \(k\geq2\text{,}\) let \(G^k:=G^{k-1}\boxtimes G\text{.}\)
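To make these definitions concrete, here is a small sketch (our own illustration, not part of the text) that builds the strong product and graph powers by brute force for tiny graphs; the helper names `strong_product` and `graph_power` are ad hoc.

```python
from itertools import product

def strong_product(Vg, Eg, Vh, Eh):
    """Strong product G ⊠ H: vertices are pairs, and (u,v) ~ (x,y) iff
    (u == x or ux is an edge of G), (v == y or vy is an edge of H),
    and (u,v) != (x,y). Edges are stored as frozensets of endpoints."""
    V = list(product(Vg, Vh))
    E = set()
    for (u, v), (x, y) in product(V, V):
        if (u, v) == (x, y):
            continue
        if (u == x or frozenset((u, x)) in Eg) and \
           (v == y or frozenset((v, y)) in Eh):
            E.add(frozenset(((u, v), (x, y))))
    return V, E

def graph_power(V, E, k):
    """G^k = G^(k-1) ⊠ G, with G^1 = G."""
    Vk, Ek = V, E
    for _ in range(k - 1):
        Vk, Ek = strong_product(Vk, Ek, V, E)
    return Vk, Ek

# K_2: an alphabet of two mutually confusable symbols.
V = [0, 1]
E = {frozenset((0, 1))}
V2, E2 = graph_power(V, E, 2)
print(len(V2), len(E2))  # K_2 ⊠ K_2 = K_4: 4 vertices, 6 edges
```

Note that \(K_2\boxtimes K_2=K_4\text{:}\) with two confusable symbols, no two distinct words of any length are safe, matching the intuition that \(\alpha(K_2^k)=1\text{.}\)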
In this language, the largest possible number of code words that one can construct with no possibility of confusion is precisely \(\alpha(G^k)\text{.}\) What we would like to determine is the approximate growth rate of \(\alpha(G^k)\text{.}\) Of course, if \(G\) had no edges at all, i.e., if no confusion was possible, then \(\alpha(G^k)\) would be simply \(|V(G)|^k\text{.}\) For example, in the English language, we have 26 letters, and if we assume that no two letters can be confused with one another, then there are \(26^k\) possible words of length \(k\text{.}\) Therefore, in general, we should expect the number of words to grow at an exponential rate with respect to \(k\text{.}\) The main problem that we will be investigating is to determine the base of that exponential; this is captured by the following definition.
Definition 6.3.3.
The Shannon capacity of a graph is defined to be
\begin{equation*}
\Theta(G):=\lim_{k\to\infty}\alpha(G^k)^{1/k}\text{.}
\end{equation*}
Given this definition, the first question that should come to mind is: does the limit exist? As it turns out, it does exist, and so the Shannon capacity is well-defined. We will prove this shortly. Before that, let us get a bit of intuition about \(\Theta(G)\) by computing it in a simple example.
Example 6.3.4.
Consider the graph \(C_4\) where the vertices are labelled \(v_1,v_2,v_3\) and \(v_4\) in the order that they appear on the cycle. For each \(k\geq 1\text{,}\) consider the set of all \(k\)-tuples whose entries are in \(\{v_1,v_3\}\text{.}\) This is easily seen to be an independent set in \(C_4^k\) and so
\begin{equation*}
\alpha(C_4^k)\geq 2^k\text{.}
\end{equation*}
Now, for an upper bound, suppose that \(S\) is an independent set in \(C_4^k\) for \(k\geq1\text{.}\) For each \(k\)-tuple \(x=(x_1,\dots,x_k)\in \{0,1\}^k\text{,}\) let \(T_x\) be the subset of \(V(C_4)^k\) consisting of \(k\)-tuples \((u_1,\dots,u_k)\) such that \(u_i\in \{v_1,v_2\}\) if \(x_i=0\) and \(u_i\in\{v_3,v_4\}\) if \(x_i=1\text{.}\) Then \(T_x\) is a clique in \(C_4^k\) and so \(|S\cap T_x|\leq 1\) for all \(x\in\{0,1\}^k\text{.}\) Therefore,
\begin{equation*}
|S|\leq\sum_{x\in\{0,1\}^k}|S\cap T_x|\leq \sum_{x\in\{0,1\}^k}1 = 2^k\text{.}
\end{equation*}
Hence, \(\alpha(C_4^k)\leq2^k\text{.}\) Combining this with the lower bound in the previous paragraph, we get \(\alpha(C_4^k)=2^k\text{.}\) Therefore, \(\Theta(C_4)=2\text{.}\)
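The equality \(\alpha(C_4^k)=2^k\) can be confirmed by brute force for small \(k\text{.}\) The sketch below (our own, with the ad hoc helper `alpha_c4k`) enumerates all vertex subsets of \(C_4^k\text{,}\) which is feasible for \(k\leq 2\text{:}\)

```python
from itertools import product, combinations

def adj_c4(x, y):
    """Adjacency on the 4-cycle with vertices 0..3 in cyclic order."""
    return (x - y) % 4 in (1, 3)

def adj_c4k(a, b):
    """Adjacency in C_4^k: distinct tuples whose coordinates are
    pairwise equal or adjacent in C_4."""
    return a != b and all(x == y or adj_c4(x, y) for x, y in zip(a, b))

def alpha_c4k(k):
    """Brute-force independence number of C_4^k (feasible for k <= 2)."""
    verts = list(product(range(4), repeat=k))
    n = len(verts)
    best = 0
    for mask in range(1 << n):
        chosen = [verts[i] for i in range(n) if mask >> i & 1]
        if len(chosen) <= best:
            continue
        if all(not adj_c4k(a, b) for a, b in combinations(chosen, 2)):
            best = len(chosen)
    return best

print(alpha_c4k(1), alpha_c4k(2))  # 2 4
```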
Next, we focus our attention on proving that \(\Theta(G)\) is well-defined in general. To this end, we will show that the sequence \((\alpha(G^k))_{k=1}^\infty\) is “supermultiplicative” and apply a very commonly used tool known as Fekete’s Lemma. The first thing that we need is the following lemma, whose proof has been left as an exercise.
Lemma 6.3.5.
For any graphs \(G\) and \(H\text{,}\)
\begin{equation*}
\alpha(G)\alpha(H)\leq \alpha(G\boxtimes H)\leq |V(G)||V(H)|\text{.}
\end{equation*}
Proof.
Next, we state and prove Fekete’s Lemma
[230]. For this, we need the following definition.
Definition 6.3.6.
A sequence \((a_k)_{k=1}^\infty\) is said to be subadditive if \(a_{n+m}\leq a_n+a_m\) for all \(n,m\geq1\text{.}\)
Lemma 6.3.7. Fekete’s Lemma.
For every subadditive sequence \((a_k)_{k=1}^\infty\text{,}\)
\begin{equation*}
\lim_{k\to\infty} a_k/k=\inf_{k\in\mathbb{N}}a_k/k\text{.}
\end{equation*}
In particular, the limit exists (but may be \(-\infty\)).
Proof.
Define \(a:=\inf_{k\in\mathbb{N}}a_k/k\text{.}\) Then, clearly, \(\liminf_{k\to\infty}a_k/k \geq a\text{.}\) We will be done if we can prove that \(\limsup_{k\to\infty}a_k/k \leq a\text{.}\)
Suppose, to the contrary, that \(\limsup_{k\to\infty}a_k/k > a\text{.}\) Then there exists a subsequence \(k_1,k_2,\dots\) and \(\varepsilon>0\) such that \(a_{k_n}/k_n\geq a+\varepsilon\) for all \(n\geq1\text{.}\) By definition of \(a\text{,}\) there must exist some index \(m\) such that \(a_m/m \leq a+\varepsilon/2\text{.}\) By the pigeonhole principle, there must exist some \(\ell\in\{0,\dots,m-1\}\) such that infinitely many of the integers in the sequence \(k_1,k_2,\dots\) are congruent to \(\ell\) modulo \(m\text{.}\) So, we can assume that this subsequence was originally chosen so that all of its elements are congruent to \(\ell\bmod m\text{.}\) Write \(k_n:=t_nm+\ell\) for \(n\geq1\) where the sequence \(t_1,t_2,\dots\) is increasing. Then, for any \(n\geq1\text{,}\)
\begin{equation*}
\frac{a_{k_n}}{k_n} = \frac{a_{\ell+t_nm}}{\ell+t_nm}\text{.}
\end{equation*}
By subadditivity and definition of \(m\text{,}\) this is at most
\begin{equation*}
\frac{a_\ell+t_na_m}{\ell+t_nm} \leq \frac{a_\ell+t_nm(a+\varepsilon/2)}{\ell+t_nm}\text{.}
\end{equation*}
However, this final expression converges to \(a+\varepsilon/2\) as \(n\) tends to infinity. Thus, there exists \(n\) such that \(a_{k_n}/k_n\lt a+\varepsilon\text{,}\) which contradicts the choice of this subsequence and completes the proof.
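Fekete's Lemma is easy to illustrate numerically. In the toy example below (our own, not from the text), we use the subadditive sequence \(a_k = k+\sqrt{k}\) (subadditive because \(\sqrt{\cdot}\) is) and watch \(a_k/k\) decrease towards \(\inf_k a_k/k = 1\text{:}\)

```python
import math

# A subadditive toy sequence: a_{n+m} <= a_n + a_m since sqrt is subadditive.
a = lambda k: k + math.sqrt(k)

# Fekete's Lemma: a_k / k converges to inf_k a_k / k, which here equals 1.
ks = (1, 10, 100, 10_000, 1_000_000)
ratios = [a(k) / k for k in ks]
print(ratios)  # decreasing towards the infimum 1
```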
While we will not use Fekete’s Lemma in the above form, we will use the following variant of it. Say that a sequence \((a_k)_{k=1}^\infty\) is supermultiplicative if \(a_{m+n}\geq a_ma_n\) for all \(m,n\geq1\text{.}\)
Corollary 6.3.8.
For every supermultiplicative sequence \((a_k)_{k=1}^\infty\text{,}\)
\begin{equation*}
\lim_{k\to\infty} a_k^{1/k}=\sup_{k\in\mathbb{N}}a_k^{1/k}\text{.}
\end{equation*}
In particular, the limit exists (but may be \(\infty\)).
Proof.
Finally, we can deduce that the Shannon capacity of a graph is well-defined.
Corollary 6.3.9.
For any graph \(G\text{,}\) the limit of \(\alpha(G^k)^{1/k}\) as \(k\to\infty\) is equal to \(\sup_{k\geq1}\alpha(G^k)^{1/k}\text{.}\) Therefore, the Shannon capacity of any graph exists and is finite.
Proof.
Let
\(a_k=\alpha(G^k)\) for all
\(k\geq1\text{.}\) By the lower bound in
Lemma 6.3.5, we have
\begin{equation*}
\alpha(G^{n+m})=\alpha(G^n\boxtimes G^m) \geq \alpha(G^n)\alpha(G^m)
\end{equation*}
for all
\(n,m\geq1\text{.}\) Therefore,
\((a_k)_{k=1}^\infty\) is supermultiplicative and so the limit of
\(a_k^{1/k}\) as
\(k\to\infty\) exists by
Corollary 6.3.8. By the upper bound in
Lemma 6.3.5, the limit is at most
\(|V(G)|\text{.}\) Therefore,
\(\Theta(G)\) exists and is finite.
For those of you who are interested in computational complexity, one very natural problem is to determine the computational hardness of computing \(\Theta(G)\) for a given input graph \(G\text{.}\) You may be familiar with many other combinatorial problems which tend to be either polynomial-time solvable or, at the very least, solvable in exponential time via a trivial brute force algorithm. In contrast, any natural brute force approach to computing the Shannon capacity would run for an infinite number of steps, rendering it completely useless. It is a major open problem to determine whether or not there exists an algorithm for computing the Shannon capacity that terminates in a finite number of steps for any given input graph \(G\text{.}\) To elaborate further on how difficult it is to compute the Shannon capacity of a graph, we note that one of the major open problems in the field is to compute the Shannon capacity of the \(7\)-cycle!
The Shannon capacity of the
\(5\)-cycle was determined in a groundbreaking paper of Lovász
[202] and is the main focus of the rest of this section. First, we derive a lower bound on
\(\Theta(C_5)\) from the following. A graph
\(G\) is
self-complementary if
\(\overline{G}\) is isomorphic to
\(G\text{.}\)
Lemma 6.3.10.
If \(G\) is self-complementary, then \(\alpha(G\boxtimes G)\geq |V(G)|\text{.}\)
Proof.
Corollary 6.3.11.
\(\Theta(C_5)\geq \sqrt{5}\text{.}\)
Proof.
\begin{equation*}
\Theta(C_5)\geq \alpha(C_5\boxtimes C_5)^{1/2} \geq\sqrt{5}\text{.}
\end{equation*}
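The lower bound can also be certified explicitly: the set \(\{(i,2i\bmod 5):0\leq i\leq 4\}\) is a standard example (going back to Shannon; it is not constructed in the text above) of an independent set of size \(5\) in \(C_5\boxtimes C_5\text{,}\) as the following sketch verifies:

```python
from itertools import combinations

def adj_c5(x, y):
    """Adjacency on the 5-cycle with vertices 0..4 in cyclic order."""
    return (x - y) % 5 in (1, 4)

def adj_strong(a, b):
    """Adjacency in C_5 ⊠ C_5."""
    return a != b and all(x == y or adj_c5(x, y) for x, y in zip(a, b))

# The classical independent set of size 5: pairs (i, 2i mod 5).
S = [(i, (2 * i) % 5) for i in range(5)]
independent = all(not adj_strong(a, b) for a, b in combinations(S, 2))
print(independent, len(S))  # True 5
```

Independence holds because if \(i\neq j\) then \(2(i-j)\equiv\pm2\pmod 5\text{,}\) so the second coordinates are distinct and non-adjacent on the cycle.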
Our aim now is to show that the bound in
Corollary 6.3.11 is actually tight; that is, that
\(\Theta(C_5)\leq \sqrt{5}\) as well. To present the proof, we need a few definitions. In everything that follows,
\(\langle \cdot,\cdot\rangle\) is the standard inner (dot) product on
\(\mathbb{R}^n\) and
\(\|\cdot\|_2\) is the standard Euclidean norm. Recall that
\(\mathbb{S}_{n-1}\) is the unit sphere in
\(\mathbb{R}^n\text{.}\)
Definition 6.3.12.
Let \(G\) be a graph with \(n\) vertices. A function \(f:V(G)\to \mathbb{S}_{n-1}\) is called an orthonormal representation of \(G\) if \(\langle f(u),f(v)\rangle=0\) whenever \(uv\notin E(G)\text{.}\) Let \(\mathcal{O}_{G}\) be the set of all orthonormal representations \(f\) of \(G\text{.}\)
Definition 6.3.13.
For an \(n\)-vertex graph \(G\text{,}\) define
\begin{equation*}
\vartheta(G) = \min_{\substack{f\in\mathcal{O}_{G}\\c\in \mathbb{S}_{n-1}} }\max_{v\in V(G)}\frac{1}{\langle c,f(v)\rangle^2}\text{.}
\end{equation*}
We call \(\vartheta\) the Lovász theta function.
Since this definition is quite complicated, let us try to unpack it a little bit. Let \(z\) be a real number and suppose that we are trying to prove that \(\vartheta(G)\leq z\text{.}\) To certify this inequality, we need to provide an orthonormal representation \(f\) of \(G\) and a vector \(c\in \mathbb{S}_{n-1}\text{.}\) If, for our choice of \(f\) and \(c\text{,}\) we have that \(\langle c,f(v)\rangle^2\geq 1/z\) for all \(v\in V(G)\text{,}\) then we get that \(\vartheta(G)\leq z\text{.}\) If our choice of \(f\) and \(c\) fails to prove that \(\vartheta(G)\leq z\text{,}\) then this means that there exists \(v\in V(G)\) such that \(\langle c,f(v)\rangle^2\lt 1/z\text{.}\) If we think of two elements of \(\mathbb{S}_{n-1}\) as being “far apart” when their inner product is close to zero (i.e. when they are nearly orthogonal), then, when bounding the Lovász theta function, our goal is to find an orthonormal representation \(f\) and a vector \(c\) so that \(c\) is “close to” all of the vectors \(f(v)\) for \(v\in V(G)\) simultaneously. One can think of the letter \(c\) as standing for “close” or “centre.”
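As a toy sanity check (our own example, not from the text): for the empty graph on \(n\) vertices, every pair of vertices is a non-edge, so \(f\) must map the vertices to pairwise orthogonal unit vectors; taking \(f(v_i)=e_i\) and \(c\) the normalised all-ones vector certifies \(\vartheta\leq n\text{,}\) which is tight since \(\alpha=n\leq\vartheta\) (as we will see in Lemma 6.3.18).

```python
import math

# Empty graph on n vertices: f(v_i) = e_i, the standard basis vectors.
n = 4
f = [tuple(1.0 if i == j else 0.0 for j in range(n)) for i in range(n)]
# The "centre" vector c: equally close to every f(v_i).
c = tuple(1 / math.sqrt(n) for _ in range(n))

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
# The certified upper bound on theta: max over v of 1 / <c, f(v)>^2.
bound = max(1 / dot(c, fv) ** 2 for fv in f)
print(bound)  # 4.0, i.e. theta <= n for the empty graph
```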
Now that we have a feel for what the Shannon capacity and Lovász theta function are, let us link them together. We will prove the following theorem at the end of this section.
Theorem 6.3.15.
For any graph \(G\text{,}\)
\begin{equation*}
\Theta(G)\leq \vartheta(G)\text{.}
\end{equation*}
Let us take a moment to explain why this theorem is so valuable. Proving a lower bound on the Shannon capacity of
\(G\) just requires one to exhibit a large independent set in
\(G^k\) for some
\(k\text{;}\) that is, lower bounds can be proven constructively. On the other hand, proving an upper bound requires showing that
\(G^k\) does not have a large independent set for any
\(k\text{,}\) which is, at first glance, a much harder problem, since it does not boil down to coming up with a nice construction. However, as we discussed above, upper bounding the Lovász theta function is an inherently constructive problem. Even better,
\(\vartheta(G)\) can be estimated to arbitrary precision in polynomial time; we discuss this further at the end of the section. Thus, amazingly,
Theorem 6.3.15 provides us with a constructive approach to bounding the Shannon capacity which can be carried out in polynomial time!
Before delving into the proof of
Theorem 6.3.15, let us see how it can be used to compute the Shannon capacity of
\(C_5\text{.}\)

Theorem 6.3.16. Lovász [202].
\(\Theta(C_5)=\sqrt{5}\text{.}\)
Proof.
Recall that
\(\Theta(C_5)\geq\sqrt{5}\) by
Corollary 6.3.11 and so it suffices to prove the upper bound. We show that
\(\vartheta(C_5)\leq \sqrt{5}\text{,}\) from which the result will follow by
Theorem 6.3.15.
So, let us come up with an orthonormal representation
\(f\) of
\(C_5\) and a vector
\(c\) such that
\(\langle c,f(v)\rangle^2\geq 1/\sqrt{5}\) for all
\(v\in V(C_5)\text{.}\) By
Remark 6.3.14, we can work in
\(\mathbb{S}_2\) (i.e. the unit sphere in
\(\mathbb{R}^3\)).
First, imagine that we map all 5 of the vertices of \(C_5\) to the unit circle in the \(xy\)-plane so that they are equally spaced and adjacent vertices are mapped to consecutive vectors. When you take 5 vectors in \(\mathbb{R}^2\) spaced equally around the unit circle, then the angle between consecutive vectors is \(2\pi/5\) which is less than \(\pi/2\) and the angle between non-consecutive vectors is \(4\pi/5\) which is greater than \(\pi/2\text{.}\) See the picture below.

Now, imagine that these five vectors are the ribs of an open umbrella, and imagine that you close all of the ribs of the umbrella at the same rate until the non-consecutive vectors become orthogonal. In other words, the “cylindrical coordinates” of the five vectors are \((\rho,\theta,z)\) where \(\theta\) is of the form \(2\pi k/5\) for \(k\in\{1,2,3,4,5\}\) and \(\rho\) and \(z\) are fixed so that all of the vectors have Euclidean length one and pairs of non-consecutive vectors are orthogonal. Thus, mapping \(v_k\) to the vector corresponding to the angle \(\theta=2\pi k/5\) for \(1\leq k\leq 5\) gives us an orthonormal representation of \(C_5\text{.}\) Finally, take \(c\) to be the unit vector on the \(z\)-axis. Note that \(c\) has the same inner product with each of the five vectors of the orthonormal representation, and so, to complete the proof, it suffices to show that the square of the inner product of \(c\) and any of these five vectors is equal to \(1/\sqrt{5}\text{.}\)
This last step requires a bit of trigonometry. Let \(y\) be the length of the line segment between the endpoints of two non-consecutive vectors of the orthonormal representation. These two vectors have length one and are orthogonal to one another. Thus, in the plane spanned by these two vectors, the vectors and the line segment between their endpoints form a right triangle with two sides of length one and one side of length \(y\text{.}\) Therefore, \(y=\sqrt{2}\text{.}\)
Now, let us compute \(\rho\text{,}\) the radius of the cylinder that contains all of the endpoints of the vectors of our orthonormal representation. Consider the triangle whose corners are the origin and the endpoints of the projections of two non-consecutive vectors of the orthonormal representation into the \(xy\)-plane. The angle of this triangle at the origin is \(4\pi/5\text{,}\) the lengths of the sides which contain the origin are both equal to \(\rho\) and the length of the other side is \(y=\sqrt{2}\text{.}\) By the Law of Cosines,
\begin{equation*}
\sqrt{2}^2 = \rho^2 + \rho^2 - 2\rho^2\cos(4\pi/5)
\end{equation*}
or, in other words,
\begin{equation*}
\rho^2 = \frac{2}{2-2\cos(4\pi/5)} = \frac{2}{2-2\left(\frac{-1}{4}+\frac{-\sqrt{5}}{4}\right)} = \frac{4}{5+\sqrt{5}} = 1-\frac{1}{\sqrt{5}}\text{.}
\end{equation*}
Finally, we compute the inner product of \(c\) and any of the five vectors of the orthonormal representation. Consider the triangle whose corners are the origin, the endpoint of \(f(v_1)\) and the endpoint of the projection of \(f(v_1)\) into the \(xy\)-plane. Then this is a right triangle such that the length of the hypotenuse is 1 and one of the other sides has length \(\rho\text{.}\) So, letting \(\alpha\) denote the angle of this triangle at the endpoint of \(f(v_1)\text{,}\) we get \(\sin(\alpha)=\rho\text{.}\) Note that the angle between \(f(v_1)\) and \(c\) is also equal to \(\alpha\) and, therefore,
\begin{equation*}
\langle c,f(v_1)\rangle^2 = \cos^2(\alpha) = 1-\sin^2(\alpha) = 1-\rho^2 = \frac{1}{\sqrt{5}}
\end{equation*}
which completes the proof.
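The whole “umbrella” construction can be verified numerically. The sketch below (our own check, not part of the proof) builds the five vectors from their cylindrical coordinates, with \(\rho^2=1-1/\sqrt{5}\) as computed above, and confirms that they form an orthonormal representation of \(C_5\) with \(\langle c,f(v)\rangle^2=1/\sqrt{5}\text{:}\)

```python
import math

s5 = math.sqrt(5)
rho = math.sqrt(1 - 1 / s5)  # cylinder radius computed via the Law of Cosines
z = math.sqrt(1 / s5)        # common height; c = (0, 0, 1) is the z-axis

# f[k] is the umbrella rib at angle 2*pi*k/5.
f = [(rho * math.cos(2 * math.pi * k / 5),
      rho * math.sin(2 * math.pi * k / 5), z) for k in range(5)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for i in range(5):
    assert abs(dot(f[i], f[i]) - 1) < 1e-9        # unit vectors
    for d in (2, 3):                              # non-consecutive pairs...
        assert abs(dot(f[i], f[(i + d) % 5])) < 1e-9  # ...are orthogonal
    assert abs(f[i][2] ** 2 - 1 / s5) < 1e-9      # <c, f(v)>^2 = 1/sqrt(5)
print("umbrella verified")
```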
We now turn our attention to proving
Theorem 6.3.15. We will need the following fact from linear algebra. Note that this fact is obvious when
\(v_1,\dots,v_n\) are the standard basis vectors, and so it should not be surprising that a similar phenomenon holds for general orthonormal bases.
Fact 6.3.17.
If \(\{v_1,\dots,v_n\}\) is an orthonormal basis of \(\mathbb{R}^n\text{,}\) then, for any \(c\in\mathbb{R}^n\text{,}\)
\begin{equation*}
\sum_{i=1}^n\langle v_i,c\rangle^2 = \langle c,c\rangle\text{.}
\end{equation*}
Proof.
Let \(A\) be the \(n\times n\) matrix whose columns are the vectors \(v_1,\dots,v_n\text{.}\) Since \(\{v_1,\dots,v_n\}\) is an orthonormal basis, we have that \(AA^T=I\text{.}\) We have
\begin{equation*}
\sum_{i=1}^n\langle v_i,c\rangle^2 =\langle c^TA, c^TA\rangle = (c^TA)(c^TA)^T = (c^TA)(A^Tc) = c^Tc = \langle c,c\rangle
\end{equation*}
which completes the proof.
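A quick numerical check of the fact (our own example): the columns of a \(2\times2\) rotation matrix form an orthonormal basis of \(\mathbb{R}^2\text{,}\) and the sum of squared inner products with any vector \(c\) recovers \(\langle c,c\rangle\text{:}\)

```python
import math

theta = 0.7  # any angle: the columns of a rotation matrix are orthonormal
basis = [(math.cos(theta), math.sin(theta)),
         (-math.sin(theta), math.cos(theta))]
c = (3.0, -2.0)

dot = lambda a, b: sum(x * y for x, y in zip(a, b))
lhs = sum(dot(v, c) ** 2 for v in basis)  # sum_i <v_i, c>^2
print(abs(lhs - dot(c, c)) < 1e-9)  # True
```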
Let us now bound \(\vartheta(G)\) below by \(\alpha(G)\text{.}\)
Lemma 6.3.18.
Every graph \(G\) satisfies \(\alpha(G)\leq \vartheta(G)\text{.}\)
Proof.
Let \(S\) be a maximum independent set in \(G\text{,}\) so that \(|S|=\alpha(G)\text{,}\) and let \(f\) be an orthonormal representation of \(G\) and \(c\) a unit vector achieving the minimum in the definition of \(\vartheta(G)\) (the minimum is attained by compactness). Then \(\langle f(v),c\rangle^2\geq 1/\vartheta(G)\) for all \(v\in V(G)\text{.}\) Therefore,
\begin{equation*}
\sum_{v\in S}\langle f(v),c\rangle^2\geq \alpha(G)/\vartheta(G)\text{.}
\end{equation*}
On the other hand, since \(S\) is independent, the vectors in \(\{f(v): v\in S\}\) are pairwise orthogonal unit vectors, i.e., they form an orthonormal system (not necessarily a basis). Extending this system to an orthonormal basis of \(\mathbb{R}^n\) and discarding the extra non-negative terms, Fact 6.3.17 implies the following:
\begin{equation*}
\sum_{v\in S}\langle f(v),c\rangle^2 \leq \langle c,c\rangle=1\text{.}
\end{equation*}
Putting these two bounds together completes the proof.
Earlier, we saw that, for any graph \(G\text{,}\) the sequence \((a_k)_{k=1}^{\infty}\) defined by \(a_k=\alpha(G^k)\) is supermultiplicative. The next lemma says that, in contrast, the Lovász theta function satisfies a submultiplicativity condition.
Lemma 6.3.19.
For every pair \(G\) and \(H\) of graphs, \(\vartheta(G\boxtimes H)\leq \vartheta(G)\vartheta(H)\text{.}\)
Proof.
Proof of Theorem 6.3.15.
Let
\(G\) be a graph and
\(k\) be a positive integer. By
Lemma 6.3.18, we have
\begin{equation*}
\alpha(G^k)\leq \vartheta(G^k)\text{.}
\end{equation*}
By Lemma 6.3.19 and induction on \(k\text{,}\)
\begin{equation*}
\vartheta(G^k)\leq \vartheta(G)^k\text{.}
\end{equation*}
Putting these two inequalities together gives us
\begin{equation*}
\alpha(G^k)^{1/k}\leq \vartheta(G)\text{.}
\end{equation*}
Since this holds for all \(k\geq1\text{,}\) we get \(\Theta(G)\leq \vartheta(G)\text{.}\)
As we have seen, the Lovász theta function gives a tight upper bound on the Shannon capacity of \(C_5\text{.}\) Sadly, this is not the case for all graphs. For example, as was alluded to earlier, we do not know the Shannon capacity of any odd cycle apart from \(K_3\) and \(C_5\text{,}\) despite the fact that the Lovász theta function has been computed for all odd cycles.
Turning back to algorithmic questions for a moment, recall that there is no known algorithm for computing the Shannon capacity of a graph. As we alluded to earlier, the Lovász theta function is much easier to compute. We close this section by explaining, on a surface level, why this is the case. Let \(G\) be a graph with vertices \(v_1,\dots,v_n\text{,}\) let \(f\) be an orthonormal representation of \(G\) and let \(c\in\mathbb{S}_{n-1}\) be a vector. For convenience, we let \(x_i:=f(v_i)\) for \(1\leq i\leq n\) and let \(x_{n+1}:=c\text{.}\) Let \(M=(m_{i,j})_{1\leq i,j\leq n+1}\) be the \((n+1)\times (n+1)\) matrix where \(m_{i,j}\) is equal to \(\langle x_i,x_j\rangle\) for all \(1\leq i,j\leq n+1\text{.}\) In the literature, such a matrix (i.e. a matrix of inner products of real vectors) is known as a Gram matrix. We observe that
\(m_{i,i}=1\) for all \(1\leq i\leq n+1\) (because \(x_i\) is a unit vector),
\(m_{i,j}=0\) whenever \(1\leq i,j\leq n\) and \(v_i\) and \(v_j\) are not adjacent.
We can now think about the problem of computing \(\vartheta(G)\) in terms of an optimization problem involving the entries of the Gram matrix. That is, \(\vartheta(G)\) is equal to the value of the following optimization problem:
\begin{align*}
\min\quad & z\\
\text{subject to}\quad & z\geq \frac{1}{m_{i,n+1}^2} \text{ for all } 1\leq i\leq n,\\
& m_{i,i}=1 \text{ for all } 1\leq i\leq n+1,\\
& m_{i,j}=0 \text{ for all } 1\leq i,j\leq n \text{ such that } v_iv_j\notin E(G),\\
& M \text{ is a Gram matrix.}
\end{align*}
Let us try to get this optimization problem into a more convenient form. Rather than trying to compute \(\vartheta(G)\) directly, the following optimization problem computes \(1/\sqrt{\vartheta(G)}\text{.}\)
\begin{align*}
\max\quad & z\\
\text{subject to}\quad & z\leq m_{i,n+1} \text{ for all } 1\leq i\leq n,\\
& m_{i,i}=1 \text{ for all } 1\leq i\leq n+1,\\
& m_{i,j}=0 \text{ for all } 1\leq i,j\leq n \text{ such that } v_iv_j\notin E(G),\\
& M \text{ is a Gram matrix.}
\end{align*}
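As a numerical sanity check (our own experiment, not from the text), the “umbrella” representation of \(C_5\) from Theorem 6.3.16, together with \(c=(0,0,1)\text{,}\) yields a feasible Gram matrix for this program whose value is \(5^{-1/4}=1/\sqrt{\vartheta(C_5)}\text{:}\)

```python
import math

s5 = math.sqrt(5)
rho, zc = math.sqrt(1 - 1 / s5), math.sqrt(1 / s5)
# x[0..4]: the umbrella vectors f(v_i); x[5]: the centre vector c.
x = [(rho * math.cos(2 * math.pi * k / 5),
      rho * math.sin(2 * math.pi * k / 5), zc) for k in range(5)]
x.append((0.0, 0.0, 1.0))

dot = lambda a, b: sum(p * q for p, q in zip(a, b))
M = [[dot(u, v) for v in x] for u in x]  # the Gram matrix of x[0..5]

assert all(abs(M[i][i] - 1) < 1e-9 for i in range(6))        # unit diagonal
assert all(abs(M[i][(i + 2) % 5]) < 1e-9 for i in range(5))  # non-edges of C_5
z = min(M[i][5] for i in range(5))  # the objective value of this feasible point
print(abs(z - 5 ** (-0.25)) < 1e-9)  # True
```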
The constraint that \(M\) is a Gram matrix seems like it is rather strange and probably hard to work with. However, as it turns out, there is a nice way to express this in terms of the eigenvalues of \(M\text{.}\) Consider the following definition.
Definition 6.3.20.
A real square matrix \(M\) is positive semidefinite if and only if \(M=M^T\) and all of the eigenvalues of \(M\) are non-negative. We write \(M\succcurlyeq0\) to mean that \(M\) is positive semidefinite.
As it turns out, Gram matrices and positive semidefinite matrices are precisely the same thing.
Fact 6.3.21.
A real matrix \(M\) is positive semidefinite if and only if it is a Gram matrix.
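Both directions of this fact can be illustrated in code. The sketch below (our own, with hypothetical helper names `gram` and `cholesky_psd`) builds a Gram matrix from vectors, and tests positive semidefiniteness via a Cholesky-type factorisation \(M=LL^T\text{;}\) when the factorisation succeeds, the rows of \(L\) are vectors whose Gram matrix is \(M\text{,}\) which is exactly the converse direction.

```python
import math

def gram(vectors):
    """Gram matrix of a list of real vectors."""
    return [[sum(x * y for x, y in zip(u, v)) for v in vectors]
            for u in vectors]

def cholesky_psd(M, tol=1e-10):
    """Attempt an L L^T factorisation of symmetric M. Success certifies
    that M is PSD (and exhibits it as a Gram matrix); a negative pivot
    proves it is not."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s < -tol:
                    return None  # negative pivot: not PSD
                L[i][i] = math.sqrt(max(s, 0.0))
            else:
                L[i][j] = s / L[j][j] if L[j][j] > tol else 0.0
    return L

M = gram([(1.0, 0.0), (0.6, 0.8)])  # Gram matrices are always PSD...
L = cholesky_psd(M)
assert L is not None
# ...but this symmetric matrix has eigenvalues 3 and -1, so it is not:
assert cholesky_psd([[1.0, 2.0], [2.0, 1.0]]) is None
print("Gram <=> PSD illustrated")
```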
So, putting this all together, the following optimization problem computes \(1/\sqrt{\vartheta(G)}\text{.}\)
\begin{align*}
\max\quad & z\\
\text{subject to}\quad & z\leq m_{i,n+1} \text{ for all } 1\leq i\leq n,\\
& m_{i,i}=1 \text{ for all } 1\leq i\leq n+1,\\
& m_{i,j}=0 \text{ for all } 1\leq i,j\leq n \text{ such that } v_iv_j\notin E(G),\\
& M\succcurlyeq0\text{.}
\end{align*}
The above is an example of something known as a “semidefinite program,” which is a class of optimization problems that generalize linear programs. Such problems can be solved to within a precision of \(\varepsilon\) in time that is polynomial in the number of constraints and variables and \(1/\varepsilon\text{.}\) Thus, \(\vartheta(G)\) can be approximately computed in an efficient manner.