
Section 7.2 Shearer’s Lemma and Some Applications

By combining Lemma 7.1.9 and Lemma 7.1.11, we get that, for any random variables \(X_1,\dots,X_n\text{,}\)
\begin{equation*} \mathbb{H}(X_1,\dots,X_n)=\mathbb{H}(X_1)+\sum_{i=2}^n\mathbb{H}(X_i\mid X_1,\dots,X_{i-1}) \leq \sum_{i=1}^n\mathbb{H}(X_i)\text{.} \end{equation*}
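This subadditivity bound is easy to sanity-check numerically. The following Python sketch (our own illustration, not part of the text; the helper `H` and the random joint distribution are assumptions) verifies it for a randomly generated joint distribution of two variables.

```python
from math import log2
import itertools
import random

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as a dict of probabilities."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

random.seed(0)
outcomes = list(itertools.product(range(2), range(3)))  # range of (X1, X2)
w = [random.random() for _ in outcomes]
joint = {x: wi / sum(w) for x, wi in zip(outcomes, w)}

# Marginal distributions of X1 and X2.
pX1, pX2 = {}, {}
for (a, b), p in joint.items():
    pX1[a] = pX1.get(a, 0.0) + p
    pX2[b] = pX2.get(b, 0.0) + p

# Subadditivity: H(X1, X2) <= H(X1) + H(X2).
assert H(joint) <= H(pX1) + H(pX2) + 1e-12
```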
The following result, known as Shearer’s Lemma, can be seen as a generalization of this simple fact. We will use it to prove two nice theorems shortly.

Lemma 7.2.1 (Shearer’s Lemma). Let \(X_1,\dots,X_n\) be random variables and let \(\mathcal{F}\) be a collection of subsets of \([n]\) such that each \(i\in[n]\) is contained in at least \(t\) members of \(\mathcal{F}\text{.}\) Then
\begin{equation*} t\mathbb{H}(X_1,\dots,X_n)\leq \sum_{F\in\mathcal{F}}\mathbb{H}(X_i: i\in F)\text{.} \end{equation*}

Proof.

For each \(i\in[n]\text{,}\) let \(\mathcal{F}_i:=\{F\in\mathcal{F}: i\in F\}\text{.}\) By the Chain Rule,
\begin{equation*} t\mathbb{H}(X_1,\dots,X_n) = \sum_{i=1}^n t\cdot \mathbb{H}(X_i\mid X_1,\dots,X_{i-1}). \end{equation*}
By the hypotheses of the lemma, we have that \(|\mathcal{F}_i|\geq t\) for all \(i\in[n]\text{.}\) Since conditional entropy is non-negative, the above expression is at most
\begin{equation*} \sum_{i=1}^n \sum_{F\in \mathcal{F}_i} \mathbb{H}(X_i\mid X_1,\dots,X_{i-1}) = \sum_{F\in\mathcal{F}}\sum_{i\in F}\mathbb{H}(X_i\mid X_1,\dots,X_{i-1}). \end{equation*}
Now, by the Deconditioning Lemma (Lemma 7.1.11), this is at most
\begin{equation*} \sum_{F\in\mathcal{F}}\sum_{i\in F}\mathbb{H}(X_i\mid X_j: j < i \text{ and }j\in F) = \sum_{F\in\mathcal{F}}\mathbb{H}(X_i:i\in F) \end{equation*}
where the final equality is the Chain Rule applied to the variables \((X_i: i\in F)\) for each \(F\in\mathcal{F}\text{.}\) This completes the proof.
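Shearer’s Lemma can also be checked on a small concrete example. The sketch below (our own; the helpers `H` and `marginal` and the random distribution are assumptions) takes \(n=3\) and lets \(\mathcal{F}\) consist of all \(2\)-element subsets of the index set, so every index is covered \(t=2\) times.

```python
from math import log2
import itertools
import random

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as a dict."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    """Marginal pmf of the coordinates listed in coords."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(1)
support = list(itertools.product([0, 1], repeat=3))
w = [random.random() for _ in support]
joint = {x: wi / sum(w) for x, wi in zip(support, w)}

# F consists of all 2-element subsets of {0, 1, 2}; every index lies in t = 2 of them.
F, t = [(0, 1), (0, 2), (1, 2)], 2

# Shearer: t * H(X1, X2, X3) <= sum over F of H(X_i : i in F).
assert t * H(joint) <= sum(H(marginal(joint, S)) for S in F) + 1e-12
```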
The following is a simple consequence of Shearer’s Lemma, known as Han’s Inequality.

Lemma 7.2.2 (Han’s Inequality). For any random variables \(X_1,\dots,X_n\text{,}\)
\begin{equation*} \mathbb{H}(X_1,\dots,X_n)\leq \frac{1}{n-1}\sum_{j=1}^n\mathbb{H}(X_i: i\neq j)\text{.} \end{equation*}

Proof.

Apply Shearer’s Lemma to the collection \(\mathcal{F}:=\{[n]\setminus\{j\}: 1\leq j\leq n\}\text{.}\) Clearly, each \(i\in [n]\) is contained in exactly \(n-1\) of the sets in \(\mathcal{F}\text{,}\) so the lemma applies with \(t=n-1\) and yields \((n-1)\mathbb{H}(X_1,\dots,X_n)\leq \sum_{j=1}^n\mathbb{H}(X_i: i\neq j)\text{.}\)
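Han’s Inequality, i.e. Shearer’s Lemma for the leave-one-out collection, can likewise be verified numerically. This Python sketch (our own; helper names and the random distribution are assumptions) checks the equivalent form \((n-1)\mathbb{H}(X_1,\dots,X_n)\leq\sum_{j}\mathbb{H}(X_i:i\neq j)\) for \(n=4\).

```python
from math import log2
import itertools
import random

def H(pmf):
    """Shannon entropy (in bits) of a pmf given as a dict."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def marginal(joint, coords):
    """Marginal pmf of the coordinates listed in coords."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

random.seed(2)
n = 4
support = list(itertools.product([0, 1], repeat=n))
w = [random.random() for _ in support]
joint = {x: wi / sum(w) for x, wi in zip(support, w)}

# Leave-one-out marginals, as in the collection F = {[n] \ {j} : 1 <= j <= n}.
rhs = sum(H(marginal(joint, [i for i in range(n) if i != j])) for j in range(n))
assert (n - 1) * H(joint) <= rhs + 1e-12
```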
As a few quick applications of Han’s Inequality, let us prove the well-known Loomis–Whitney Inequality and an edge-isoperimetric inequality in the hypercube.
Image: “Loomis–Whitney.”

Theorem 7.2.3 (Loomis–Whitney Inequality). Let \(B\subseteq\mathbb{R}^n\) be a compact body and, for each \(1\leq j\leq n\text{,}\) let \(\pi_j(B)\) denote the projection of \(B\) onto the hyperplane \(\{x_j=0\}\text{.}\) Then
\begin{equation*} Vol_{n}(B)\leq \prod_{j=1}^nVol_{n-1}(\pi_j(B))^{1/(n-1)}\text{.} \end{equation*}

Proof.

For each integer \(k\geq1\text{,}\) let
\begin{equation*} \mathcal{C}_k:=\left\{\prod_{j=1}^n\left[\frac{\ell_j}{k},\frac{\ell_j+1}{k}\right):\ell_1,\dots,\ell_n\in\mathbb{Z}\right\}\text{.} \end{equation*}
That is, \(\mathcal{C}_k\) is the set of all “half-open” cubes of side length \(1/k\) whose bottom corner lies in the set \(\{\ell/k: \ell\in\mathbb{Z}\}^n\text{.}\) Let \(B_k\) be the union of all cubes \(C\in\mathcal{C}_k\) such that \(C\cap B\neq \emptyset\text{.}\) Then
\begin{equation} Vol_{n}(B) = \lim_{k\to\infty}Vol_{n}(B_k)\text{.}\tag{7.2.1} \end{equation}
A similar phenomenon holds for all of the projections of \(B\) as well. The rest of the proof will focus on \(B_k\) as opposed to \(B\text{;}\) at the end, we will use (7.2.1) to translate the result back to \(B\text{.}\)
Let \(B_k=\bigsqcup_{i=1}^N C_i\) where \(C_i\in\mathcal{C}_k\) for all \(1\leq i\leq N\text{.}\) Each cube \(C_i\) has measure \(k^{-n}\) and so \(Vol_{n}(B_k)=N\cdot k^{-n}\text{.}\) Choose \(i\in [N]\) uniformly at random and let \(X=(X_1,\dots,X_n)\) where \(X_j\) is the \(j\)th coordinate of the bottom corner of the cube \(C_i\text{.}\) Then \(X\) is a uniform random variable whose range has cardinality \(N=k^nVol_{n}(B_k)\text{.}\) Thus, by Lemma 7.1.4,
\begin{equation*} \log_2(k^nVol_{n}(B_k)) = \mathbb{H}(X)\text{.} \end{equation*}
By Han’s Inequality,
\begin{equation*} \mathbb{H}(X)\leq \frac{1}{n-1}\sum_{j=1}^n\mathbb{H}(X_i: i\neq j)\text{.} \end{equation*}
Now, for fixed \(1\leq j\leq n\text{,}\) we have that \((X_i: i\neq j)\) is a random variable whose range has cardinality precisely equal to \(k^{n-1}Vol_{n-1}(\pi_j(B_k))\text{.}\) Applying Lemma 7.1.4 again, we get
\begin{equation*} \mathbb{H}(X_i:i\neq j) \leq \log_2\left(k^{n-1}Vol_{n-1}(\pi_j(B_k))\right)\text{.} \end{equation*}
Putting this all together, we get that
\begin{equation*} k^nVol_{n}(B_k)\leq \prod_{j=1}^n\left(k^{n-1}Vol_{n-1}(\pi_j(B_k))\right)^{1/(n-1)} \end{equation*}
\begin{equation*} \Rightarrow Vol_{n}(B_k)\leq \prod_{j=1}^nVol_{n-1}(\pi_j(B_k))^{1/(n-1)}\text{.} \end{equation*}
The result follows by taking the limit as \(k\to \infty\) and applying (7.2.1) and the analogous statements for the projections of \(B\text{.}\)
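The combinatorial core of the proof is a discrete Loomis–Whitney inequality for finite unions of grid cells: \(|B|^{n-1}\leq\prod_{j=1}^n|\pi_j(B)|\text{.}\) The following Python sketch (our own; the random cell set is an assumption) checks this for a random set of cells in \(\mathbb{Z}^3\).

```python
from math import prod
import random

random.seed(3)
n = 3
# A random finite "body": a set of cells in the n-dimensional integer grid.
B = {tuple(random.randrange(5) for _ in range(n)) for _ in range(40)}

# pi_j deletes the j-th coordinate, mirroring the projections in the proof.
projections = [{tuple(x[i] for i in range(n) if i != j) for x in B}
               for j in range(n)]

# Discrete Loomis-Whitney: |B|^(n-1) <= product of the projection sizes.
assert len(B) ** (n - 1) <= prod(len(P) for P in projections)
```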

Example 7.2.4.

If \(B=\prod_{i=1}^n[0,a_i]\) is an axis-aligned box, then \(Vol_{n}(B)=\prod_{i=1}^na_i\) and \(Vol_{n-1}(\pi_j(B))=\prod_{i\neq j}a_i\) for all \(1\leq j\leq n\text{.}\) Therefore, every such \(B\) satisfies the bound in Theorem 7.2.3 with equality.
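The equality claim here reduces to the integer identity \(\prod_{j=1}^n\prod_{i\neq j}a_i=\left(\prod_{i=1}^na_i\right)^{n-1}\text{,}\) which can be checked exactly (the particular side lengths below are our own choice).

```python
from math import prod

a = [2, 3, 5]            # side lengths of the box (an arbitrary example)
n = len(a)
vol = prod(a)            # Vol_n(B)
proj_vols = [prod(a[i] for i in range(n) if i != j) for j in range(n)]

# Raising the Loomis-Whitney bound to the (n-1)-st power avoids fractional
# exponents: equality means vol^(n-1) equals the product of projection volumes.
assert vol ** (n - 1) == prod(proj_vols)
```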

Theorem 7.2.5. For any set \(A\) of vertices of the hypercube \(Q_d\text{,}\) the number \(e(A)\) of edges of \(Q_d\) with both endpoints in \(A\) satisfies
\begin{equation*} e(A)\leq \frac{|A|}{2}\log_2(|A|)\text{.} \end{equation*}

Proof.

Let \(X=(X_1,\dots,X_d)\) be a uniformly random vertex of \(A\) where, for \(1\leq i\leq d\text{,}\) \(X_i\) is the \(i\)th coordinate of \(X\text{.}\) By Lemma 7.1.4  and Han’s Inequality, we have
\begin{equation*} \log_2(|A|) = \mathbb{H}(X) = d\mathbb{H}(X) -(d-1) \mathbb{H}(X) \end{equation*}
\begin{equation*} \geq d\mathbb{H}(X) - \sum_{j=1}^d\mathbb{H}(X_i: i\neq j) = \sum_{j=1}^d[\mathbb{H}(X) - \mathbb{H}(X_i: i\neq j)]\text{.} \end{equation*}
By Lemma 7.1.9, this is equal to
\begin{equation*} \sum_{j=1}^d\mathbb{H}(X_j\mid X_i: i\neq j)\text{.} \end{equation*}
Note that, for each \(1\leq j\leq d\text{,}\) \(\rng(X_i:i\neq j)\) is the set of all sequences of the form \((x_1,\dots,x_{j-1},x_{j+1},\dots,x_d)\) where \((x_1,\dots,x_d)\in A\text{.}\) Going all the way back to the definition of conditional entropy, we get that
\begin{equation*} \mathbb{H}(X_j\mid X_i: i\neq j) = \sum_{(x_i: i\neq j)\in \rng(X_i:i\neq j)}\mathbb{P}((X_i: i\neq j)=(x_i: i\neq j))\mathbb{H}(X_j\mid (X_i: i\neq j)=(x_i: i\neq j))\text{.} \end{equation*}
Given \(x = (x_1,\dots,x_d)\in \{0,1\}^d\text{,}\) let \(x^j\) be the vertex of the hypercube obtained from \(x\) by changing the \(j\)th coordinate from \(0\) to \(1\) or vice versa. Then, for any such pair \(\{x,x^j\}\subseteq \{0,1\}^d\text{,}\) we have
\begin{equation*} \mathbb{P}((X_i: i\neq j)=(x_i: i\neq j)) = \frac{|\{x, x^j\}\cap A|}{|A|}\text{.} \end{equation*}
Also, the number of choices for \(X_j\) given that \((X_i:i\neq j)=(x_i: i\neq j)\) is equal to \(|\{x, x^j\}\cap A|\text{.}\) Thus, assuming that \((x_i:i\neq j)\in \rng(X_i:i\neq j)\text{,}\) we have
\begin{equation*} \mathbb{H}(X_j\mid (X_i: i\neq j)=(x_i: i\neq j))=\log_2(|\{x, x^j\}\cap A|) \end{equation*}
which is equal to \(\log_2(1)=0\) if \(|\{x, x^j\}\cap A|=1\) and \(\log_2(2)=1\) otherwise. Thus, \(\mathbb{H}(X_j\mid X_i: i\neq j)\) is equal to \(\frac{2}{|A|}\) times the number of pairs \(\{x,x^j\}\subseteq \{0,1\}^d\) such that both \(x\) and \(x^j\) belong to \(A\text{.}\) In other words, \(\mathbb{H}(X_j\mid X_i: i\neq j)\) is precisely equal to \(\frac{2}{|A|}\) times the number of edges within \(A\) whose endpoints differ on the \(j\)th coordinate. Therefore,
\begin{equation*} \sum_{j=1}^d\mathbb{H}(X_j\mid X_i: i\neq j) = \frac{2e(A)}{|A|}\text{.} \end{equation*}
Putting this all together gives us
\begin{equation*} \log_2(|A|) \geq\frac{2e(A)}{|A|} \end{equation*}
and we are done by solving for \(e(A)\text{.}\)
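The edge bound \(e(A)\leq\frac{|A|}{2}\log_2(|A|)\) can be tested directly by brute force. The Python sketch below (our own; the helper `edges_within` and the random vertex set are assumptions) counts the edges induced by a random subset of \(Q_5\).

```python
from math import log2
import itertools
import random

def edges_within(A):
    """Number of hypercube edges with both endpoints in A."""
    A = set(A)
    count = 0
    for x in A:
        for j in range(len(x)):
            y = x[:j] + (1 - x[j],) + x[j + 1:]  # flip the j-th coordinate
            if y in A:
                count += 1
    return count // 2  # each edge was counted from both endpoints

random.seed(4)
d = 5
vertices = list(itertools.product([0, 1], repeat=d))
A = random.sample(vertices, 12)

# Edge-isoperimetric bound: e(A) <= (|A| / 2) * log2(|A|).
assert edges_within(A) <= (len(A) / 2) * log2(len(A)) + 1e-9
```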

Example 7.2.6.

If \(A\) induces a hypercube of dimension \(k\) within \(Q_d\text{,}\) then \(|A|=2^k\) and
\begin{equation*} e(A) = k2^{k-1} = \frac{|A|}{2}\log_2(|A|) \end{equation*}
and so \(A\) satisfies the bound in Theorem 7.2.5 with equality.
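The equality in this example can also be confirmed by enumeration; the sketch below (our own, with a particular choice of \(d=5\) and \(k=3\)) realizes a \(k\)-dimensional subcube by fixing the last \(d-k\) coordinates to \(0\).

```python
from math import log2
import itertools

d, k = 5, 3
# A k-dimensional subcube of Q_d: fix the last d - k coordinates to 0.
A = {x + (0,) * (d - k) for x in itertools.product([0, 1], repeat=k)}

e = 0
for x in A:
    for j in range(d):
        y = x[:j] + (1 - x[j],) + x[j + 1:]  # flip the j-th coordinate
        if y in A:
            e += 1
e //= 2  # each edge was counted from both endpoints

# e(A) = k * 2^(k-1) = (|A| / 2) * log2(|A|), matching the bound with equality.
assert e == k * 2 ** (k - 1)
assert e == (len(A) // 2) * round(log2(len(A)))
```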