
Section 5.3 Proof of the Regularity Lemma

Now that we have seen a few applications of the Regularity Lemma (Theorem 5.2.1), let’s finally prove it. None of the individual steps of the proof are terribly difficult, but the whole proof is a bit long and technical. It is probably the most “involved” proof that we will see in this course. However, it is a powerful tool and, therefore, worth proving (and worth seeing the proof at least once in your life).
[Comic: “Mathematician trying to prove the regularity lemma.”]

High-Level Overview.

Before diving into details, let us take a moment to discuss the basic strategy of the proof. Let \(\varepsilon>0\) and \(t\geq 1\) and define \(t_0:=t\text{.}\) Let \(G\) be any graph and let \(n:=|V(G)|\text{.}\) Let \(\mathcal{P}^0 = (X_1^0,\dots,X_{t_0}^0)\) be an arbitrary partition of \(V(G)\) into \(t_0\) sets, each of cardinality \(\lfloor n/t_0\rfloor\) or \(\lceil n/t_0\rceil\text{.}\)
Obviously, a silly arbitrary partition \(\mathcal{P}^0\) will usually fail to do the job. In particular, it will fail to satisfy property a of Theorem 5.2.1. The first order of business will be to define a parameter which “measures” how poor the partition \(\mathcal{P}^0\) (or, any other partition) is at doing this job. The way that this will be measured is via the “mean square density” function \(f\) defined in Definition 5.3.1. The function \(f\) will have the property that \(f(\mathcal{P})\) is between \(0\) and \(1\) for any partition \(\mathcal{P}\text{.}\) We think of a partition \(\mathcal{P}'\) as being “better” than a partition \(\mathcal{P}\) from the perspective of the mean-squared density function if \(f(\mathcal{P}')>f(\mathcal{P})\text{.}\)
The key to the proof is that, if many pairs of sets in the partition \(\mathcal{P}^0\) fail to be \(\varepsilon\)-regular, then it is possible to refine \(\mathcal{P}^0\) to obtain a partition \(\mathcal{Q}^1\) of \(V(G)\) into at most \(t_02^{t_0-1}\) sets such that \(f(\mathcal{Q}^1)\geq f(\mathcal{P}^0)+\varepsilon^5/2\text{.}\) The sets in the partition \(\mathcal{Q}^1\) may not all have cardinalities close to one another. However, by refining the partition further and doing a bit of “cleaning up,” we will get a partition \(\mathcal{P}^1\) with \(f(\mathcal{P}^1)\geq f(\mathcal{P}^0)+\varepsilon^5/4\) such that the number \(t_1\) of classes of \(\mathcal{P}^1\) is at most \(\left\lceil\frac{8t_{0}2^{t_{0}-1}}{\varepsilon^5}\right\rceil\) and all the classes of \(\mathcal{P}^1\) have size \(\lfloor n/t_1\rfloor\) or \(\lceil n/t_1\rceil\text{.}\) If a large number of the pairs of sets in \(\mathcal{P}^1\) still fail to be \(\varepsilon\)-regular, then we repeat this process, obtaining a partition \(\mathcal{P}^2\) into \(t_2\leq\left\lceil\frac{8t_{1}2^{t_{1}-1}}{\varepsilon^5}\right\rceil\) parts with mean square density at least \(\varepsilon^5/4\) greater than that of \(\mathcal{P}^1\text{,}\) and so on. Since every partition of \(V(G)\) has mean square density between zero and one, this procedure must terminate after at most \(4/\varepsilon^5\) steps, yielding a partition which satisfies condition a of Theorem 5.2.1. Letting \(m=\lfloor 4/\varepsilon^5\rfloor\) be an upper bound on the number of iterations, the number of sets in the final partition is at most \(t_m\) where, for \(1\leq i\leq m\text{,}\) the quantity \(t_i\) is defined by the recurrence \(t_0=t\) and \(t_i:=\left\lceil\frac{8t_{i-1}2^{t_{i-1}-1}}{\varepsilon^5}\right\rceil\text{.}\) Letting \(k_0(\varepsilon,t):=t_m\text{,}\) we see that property c of Theorem 5.2.1 holds as well.
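To make the growth of this recurrence concrete, here is a short Python sketch (the function name `partition_size_bound` is ours, not the text's) that iterates \(t_i:=\left\lceil\frac{8t_{i-1}2^{t_{i-1}-1}}{\varepsilon^5}\right\rceil\) for \(m=\lfloor 4/\varepsilon^5\rfloor\) steps. For any realistic \(\varepsilon\in(0,1)\) the values explode in a tower-like fashion, so the demonstration uses an artificially large \(\varepsilon\) purely to keep the numbers small.

```python
import math

def partition_size_bound(eps: float, t: int) -> int:
    """Iterate t_0 = t, t_i = ceil(8 * t_{i-1} * 2**(t_{i-1} - 1) / eps**5)
    for m = floor(4 / eps**5) steps, following the recurrence above."""
    m = math.floor(4 / eps ** 5)
    t_i = t
    for _ in range(m):
        t_i = math.ceil(8 * t_i * 2 ** (t_i - 1) / eps ** 5)
    return t_i

# Toy value only: eps = 1.2 forces m = 1, i.e. a single refinement round.
print(partition_size_bound(1.2, 3))  # ceil(8 * 3 * 2**2 / 1.2**5) = 39
```

Even two rounds with a small \(t_0\) are already astronomically large, which foreshadows the tower-type bound discussed at the end of the section.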
Now, let’s get into the details. Our first step is to formally define the mean square density of a partition.

Definition 5.3.1.

Given a partition \(\mathcal{P}=(X_1,\dots,X_t)\) of \(V(G)\text{,}\) define the mean square density of \(\mathcal{P}\) to be
\begin{equation*} f(\mathcal{P}):=\frac{1}{n^2}\sum_{\substack{(i,j)\in [t]^2\\ i\neq j} }|X_i||X_j|d(X_i,X_j)^2\text{.} \end{equation*}

Remark 5.3.2.

Let us try to explain why \(f(\mathcal{P})\) is called the “mean square density.” Let \(x,y\in V(G)\) be two vertices chosen uniformly at random, independently of one another, and let
\begin{equation*} D:=\begin{cases}d(X_i,X_j)^2 & \text{ if } (x,y)\in X_i\times X_j\text{ for } i\neq j,\\ 0 & \text{ otherwise.} \end{cases} \end{equation*}
Then it is easy to see that \(f(\mathcal{P})\) is precisely the expected value (in other words, the mean) of \(D\text{.}\) So, in some sense, \(f(\mathcal{P})\) computes the mean of the square of the density from \(X_i\) to \(X_j\text{,}\) where the pair \((X_i,X_j)\) is chosen with probability proportional to \(|X_i||X_j|\text{.}\)
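As a sanity check on Definition 5.3.1, the mean square density can be computed directly for a small graph. The sketch below is our own illustration (the helper names `density` and `mean_square_density` are not from the text). On the \(4\)-cycle with the bipartition into \(\{0,2\}\) and \(\{1,3\}\), every cross pair is an edge, so \(f(\mathcal{P})=\frac{1}{16}(4\cdot 1+4\cdot 1)=1/2\text{.}\)

```python
from itertools import permutations

def density(edges, X, Y):
    """d(X, Y) = e(X, Y) / (|X||Y|) for disjoint vertex sets X and Y."""
    e = sum(1 for x in X for y in Y if frozenset((x, y)) in edges)
    return e / (len(X) * len(Y))

def mean_square_density(edges, parts):
    """f(P) = (1/n^2) * sum over ordered pairs (X_i, X_j) with i != j
    of |X_i||X_j| d(X_i, X_j)^2, as in Definition 5.3.1."""
    n = sum(len(X) for X in parts)
    return sum(len(X) * len(Y) * density(edges, X, Y) ** 2
               for X, Y in permutations(parts, 2)) / n ** 2

# The 4-cycle 0-1-2-3-0, partitioned into its two sides.
C4 = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(mean_square_density(C4, [{0, 2}, {1, 3}]))  # 0.5
```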

Observation 5.3.3.

Clearly, \(|X_i||X_j|d(X_i,X_j)^2\geq 0\) for any pair \((X_i,X_j)\text{,}\) so every term in the sum defining \(f\) is non-negative. Therefore, \(f(\mathcal{P})\geq0\) for any partition \(\mathcal{P}\) of \(V(G)\text{.}\)
A basic fact is that any partition has mean square density at most one.

Proof.

Since \(0\leq d(X_i,X_j)\leq 1\) for every pair \((X_i,X_j)\text{,}\) we have
\begin{equation*} f(\mathcal{P})\leq \frac{1}{n^2}\sum_{\substack{(i,j)\in [t]^2\\ i\neq j} }|X_i||X_j| \leq \frac{1}{n^2}\left(\sum_{i=1}^t|X_i|\right)^2 = \frac{n^2}{n^2} = 1\text{.}
\end{equation*}

Recall that, for partitions \(\mathcal{P} = (X_1,\dots,X_m)\) and \(\mathcal{P}'=(X_1',\dots,X_\ell')\) of \(V(G)\text{,}\) we say that \(\mathcal{P}'\) refines \(\mathcal{P}\) if for every \(1\leq i\leq \ell\) there exists \(1\leq j\leq m\) such that \(X_i'\subseteq X_j\text{.}\) Another important fact which we will use is that refining a partition does not decrease its mean square density. The following lemma will be used to compare the “contribution” to the mean square density from a pair \((X_i,X_j)\) in a partition of \(V(G)\) with the contribution from the subsets of \(X_i\) and \(X_j\) in a refinement of the partition.

Proof.

By a simple double-counting argument, we see that
\begin{equation} e(A,B) = \sum_{i=1}^s\sum_{j=1}^t e(A_i,B_j) = \sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)\text{.}\tag{5.3.1} \end{equation}
Squaring both sides yields
\begin{equation*} e(A,B)^2 = \left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)\right)^2\text{.} \end{equation*}
For each \((i,j)\in [s]\times[t]\text{,}\) let
\begin{equation*} x_{i,j}:=\sqrt{|A_i||B_j|} \end{equation*}
and
\begin{equation*} y_{i,j}:=d(A_i,B_j)\sqrt{|A_i||B_j|}\text{.} \end{equation*}
By Hölder’s Inequality (Lemma C.0.4), we get that
\begin{equation*} \left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)\right)^2 = \left(\sum_{i=1}^s\sum_{j=1}^t|x_{i,j}y_{i,j}|\right)^2\leq \left(\sum_{i=1}^s\sum_{j=1}^tx_{i,j}^2\right)\left(\sum_{i=1}^s\sum_{j=1}^ty_{i,j}^2\right) \end{equation*}
\begin{equation*} =\left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|\right)\left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)^2\right)\text{.} \end{equation*}
It is easily observed that
\begin{equation} \sum_{i=1}^s\sum_{j=1}^t|A_i||B_j| = |A||B|\text{.}\tag{5.3.2} \end{equation}
Therefore,
\begin{equation*} e(A,B)^2\leq |A||B|\left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)^2\right)\text{.} \end{equation*}
Dividing both sides by \(|A||B|\) and recalling that \(d(A,B)=\frac{e(A,B)}{|A||B|}\) completes the proof.
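The inequality just proved, \(|A||B|\,d(A,B)^2\leq\sum_{i,j}|A_i||B_j|\,d(A_i,B_j)^2\text{,}\) is easy to test numerically. The snippet below is our own sanity check on a random bipartite graph; it is not part of the text's argument.

```python
import random

random.seed(0)
A, B = [0, 1, 2, 3], [4, 5, 6, 7]
# A random bipartite graph between A and B with edge probability 1/2.
edges = {(a, b) for a in A for b in B if random.random() < 0.5}

def d(X, Y):
    return sum((x, y) in edges for x in X for y in Y) / (len(X) * len(Y))

A_parts, B_parts = [A[:2], A[2:]], [B[:2], B[2:]]
lhs = len(A) * len(B) * d(A, B) ** 2
rhs = sum(len(Ai) * len(Bj) * d(Ai, Bj) ** 2
          for Ai in A_parts for Bj in B_parts)
# The lemma guarantees lhs <= rhs (tolerance only for float rounding).
assert lhs <= rhs + 1e-12
print(lhs, rhs)
```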

Proof.

Write \(\mathcal{P}=(X_1,\dots,X_m)\) and \(\mathcal{P}'=(X_1',\dots,X_\ell')\text{.}\) For \(1\leq i\leq m\text{,}\) let
\begin{equation*} R_i:=\{r: 1\leq r\leq \ell\text{ and } X_r'\subseteq X_i\}\text{.} \end{equation*}
Then, applying Lemma 5.3.5 to every pair \((X_i,X_j)\) with \(i\neq j\text{,}\) we get
\begin{equation*} f(\mathcal{P}) = \frac{1}{n^2}\sum_{i\neq j}|X_i||X_j|d(X_i,X_j)^2 \leq \frac{1}{n^2}\sum_{i\neq j}\sum_{r\in R_i}\sum_{s\in R_j}|X_r'||X_s'|d(X_r',X_s')^2 \leq f(\mathcal{P}')\text{,} \end{equation*}
where the last inequality holds because \(f(\mathcal{P}')\) additionally includes the non-negative contributions from pairs of classes contained in the same class of \(\mathcal{P}\text{,}\)
as desired.
Note that the inequality
\begin{equation*} \left(\sum_{i=1}^n|x_iy_i|\right)^2\leq \left(\sum_{i=1}^nx_i^2\right)\left(\sum_{i=1}^ny_i^2\right) \end{equation*}
in Lemma C.0.4 is tight if and only if the vectors \((x_1,\dots,x_n)\) and \((y_1,\dots,y_n)\) are constant multiples of one another. What does this tell us in the context of the proof of Lemma 5.3.5? It tells us that the “gain” in the mean square density will be zero if and only if there exists \(d\) such that \(d(A_i,B_j)=d\) for all \((i,j)\in [s]\times [t]\text{.}\) Of course, if such a \(d\) exists, then it must be the case that \(d=d(A,B)\text{.}\) Thus, if \((A,B)\) is not \(\varepsilon\)-regular and we refine \(A\) and \(B\) in a way that “pulls out” some subsets of \(A\) and \(B\) of density different from \(d(A,B)\text{,}\) then we should be able to get a “boost” in mean square density. The following lemma quantifies the size of this “boost.”

Proof.

We have
\begin{equation*} |A_1||B_1|(d(A_1,B_1)-d(A,B))^2\leq \sum_{1\leq i,j\leq 2}|A_i||B_j|(d(A_i,B_j)-d(A,B))^2\text{.} \end{equation*}
Expanding \((d(A_i,B_j)-d(A,B))^2\) in the sum on the right side (for each \(1\leq i,j\leq 2\) individually) yields the following expression:
\begin{equation*} \sum_{1\leq i,j\leq 2}|A_i||B_j|d(A_i,B_j)^2 - 2d(A,B)\sum_{1\leq i,j\leq 2}|A_i||B_j|d(A_i,B_j) + d(A,B)^2\sum_{1\leq i,j\leq 2}|A_i||B_j|\text{.} \end{equation*}
Applying (5.3.1) to the second summation and (5.3.2) to the third, we get that this is equal to
\begin{equation*} \sum_{1\leq i,j\leq 2}|A_i||B_j|d(A_i,B_j)^2 -|A||B|d(A,B)^2\text{.} \end{equation*}
We now add \(|A||B|d(A,B)^2\) to both sides to get the desired bound.
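The algebraic identity at the heart of this step, \(\sum_{i,j}|A_i||B_j|(d(A_i,B_j)-d(A,B))^2=\sum_{i,j}|A_i||B_j|d(A_i,B_j)^2-|A||B|d(A,B)^2\text{,}\) can also be verified numerically. The following is our own check on a random instance, not part of the text.

```python
import random

random.seed(1)
A, B = [0, 1, 2, 3], [4, 5, 6, 7]
edges = {(a, b) for a in A for b in B if random.random() < 0.5}

def d(X, Y):
    return sum((x, y) in edges for x in X for y in Y) / (len(X) * len(Y))

# Split A and B in half, as in the lemma's statement with s = t = 2.
pairs = [(Ai, Bj) for Ai in (A[:2], A[2:]) for Bj in (B[:2], B[2:])]
lhs = sum(len(Ai) * len(Bj) * (d(Ai, Bj) - d(A, B)) ** 2 for Ai, Bj in pairs)
rhs = (sum(len(Ai) * len(Bj) * d(Ai, Bj) ** 2 for Ai, Bj in pairs)
       - len(A) * len(B) * d(A, B) ** 2)
assert abs(lhs - rhs) < 1e-9  # the expansion used in the proof above
print(lhs, rhs)
```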
Okay, let’s describe an algorithm for producing the partition in Theorem 5.2.1. Recall that \(t_0:=t\) and \(\mathcal{P}^0\) is an arbitrary partition of \(V(G)\) into sets \(X_1^0,\dots,X_{t_0}^0\text{,}\) each of size roughly \(n/t_0\text{.}\) If \(\mathcal{P}^0\) already satisfies condition a of Theorem 5.2.1, then we have nothing more to do, and we simply output it.
So, assume that condition a of Theorem 5.2.1 is not satisfied. For each pair \((X_i^0,X_j^0)\) which is not \(\varepsilon\)-regular and such that \(i\neq j\text{,}\) take \(X_{i,j}\subseteq X_i^0\) and \(X_{j,i}\subseteq X_j^0\) such that \(|X_{i,j}|\geq \varepsilon|X_i^0|\text{,}\) \(|X_{j,i}|\geq \varepsilon|X_j^0|\) and \(|d(X_{i,j},X_{j,i}) - d(X_i^0,X_j^0)|\geq \varepsilon\text{.}\) Let \(X_{i,j}':=X_i^0\setminus X_{i,j}\) and \(X_{j,i}':=X_j^0\setminus X_{j,i}\text{.}\) By Lemma 5.3.7, if we replace \(X_i^0\) and \(X_j^0\) with four parts \(X_{i,j},X_{i,j}',X_{j,i},X_{j,i}'\text{,}\) then the contribution of the pair \((X_i^0,X_j^0)\) to the mean square density increases by at least
(the \(n^2\) in the denominator here comes from the original definition of the mean square density in Definition 5.3.1)
\begin{equation*} \frac{(d(X_{i,j},X_{j,i}) - d(X_i^0,X_j^0))^2|X_{i,j}||X_{j,i}|}{n^2}\geq \frac{\varepsilon^2\cdot \varepsilon|X_i^0|\varepsilon|X_j^0|}{n^2} \geq \frac{\varepsilon^4\left\lfloor n/t_0\right\rfloor^2}{n^2}>\frac{\varepsilon^4}{2t_0^2}\text{.} \end{equation*}
Also, crucially, by Lemma 5.3.5, any partitions of \(X_i^0\) and \(X_j^0\) that refine the partitions \((X_{i,j},X_{i,j}')\) and \((X_{j,i},X_{j,i}')\) will also give us an extra contribution of at least \(\frac{\varepsilon^4}{2t_0^2}\) to the mean square density compared with the contribution from \((X_i^0,X_j^0)\text{.}\) So, if there are more than \(\varepsilon t_0^2\) non-\(\varepsilon\)-regular pairs and we refine every non-\(\varepsilon\)-regular pair in the way described above (at the same time), then we get a total boost of at least
\begin{equation*} \left(\frac{\varepsilon^4}{2t_0^2}\right)\left(\varepsilon t_0^2\right) \geq \frac{\varepsilon^5}{2} \end{equation*}
to the mean square density.
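This refinement step can be phrased algorithmically: each class \(X_i^0\) is cut by the at most \(t_0-1\) witness sets \(X_{i,j}\) carved out of it, and the resulting atoms number at most \(2^{t_0-1}\) per class, which is where the bound \(t_02^{t_0-1}\) on the number of sets in \(\mathcal{Q}^1\) comes from. Here is a minimal sketch (the function `atoms` and its input format are our own assumptions, not from the text):

```python
def atoms(X, witnesses):
    """Split the class X into the atoms of its witness subsets: vertices
    are grouped by exactly which witness sets contain them, giving at
    most 2**len(witnesses) non-empty parts."""
    parts = {}
    for v in X:
        key = frozenset(i for i, W in enumerate(witnesses) if v in W)
        parts.setdefault(key, set()).add(v)
    return list(parts.values())

# A class of 6 vertices cut by two witness sets -> at most 2**2 = 4 atoms.
print(sorted(len(p) for p in atoms({1, 2, 3, 4, 5, 6}, [{1, 2, 3}, {3, 4}])))
```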
Now, let \(\mathcal{Q}^1\) be the partition obtained from the refinement procedure described in the previous paragraph. The following lemma explains how to refine \(\mathcal{Q}^1\) further and rearrange it so that all sets have roughly the same size without decreasing the mean square density by very much.

Proof.

Let \(t= \left\lceil\frac{2m}{\delta}\right\rceil\) and let \(0\leq r\leq t-1\) be such that
\begin{equation*} |V(G)| = t\left\lfloor \frac{n}{t}\right\rfloor + r\text{.} \end{equation*}
Let \(\tilde{\mathcal{Q}}\) be an arbitrary refinement of \(\mathcal{Q}\) obtained by breaking up each class \(X_i\) of \(\mathcal{Q}\) into sets of size \(\lfloor n/t\rfloor\) or \(\lceil n/t\rceil\) and one “leftover” set \(Y_i\subseteq X_i\) with \(|Y_i|\lt n/t\text{.}\) When doing so, one needs to be careful to maintain the restriction that the total number of sets of size \(\lfloor n/t\rfloor\) is at most \(t-r\) and the number of sets of size \(\lceil n/t\rceil\) is at most \(r\text{.}\) By Lemma 5.3.5,
\begin{equation*} f(\tilde{\mathcal{Q}})\geq f(\mathcal{Q})\text{.} \end{equation*}
Now, we let \(\mathcal{P}\) be a partition obtained from \(\tilde{\mathcal{Q}}\) by putting all of the sets of leftover vertices together and breaking them up into sets of size \(\lfloor n/t\rfloor\) or \(\lceil n/t\rceil\) arbitrarily. Note that it is possible to do this since our partitioning of the rest of the vertices uses at most \(t-r\) sets of size \(\lfloor n/t\rfloor\) and at most \(r\) of size \(\lceil n/t\rceil\text{.}\)
The last step is to show that the mean square density of \(\mathcal{P}\) cannot be much smaller than that of \(\tilde{\mathcal{Q}}\text{.}\) For each \(1\leq i\leq m\text{,}\) the number of leftover vertices in the partition of \(X_i\) is at most \(n/t\text{.}\) Since \(d(A,B)\leq 1\) for any two disjoint non-empty sets \(A\) and \(B\text{,}\) the contribution of these leftover sets to the mean square density of \(\tilde{\mathcal{Q}}\) is at most
\begin{equation*} 2\sum_{i=1}^m\frac{|Y_i|n}{n^2} = \frac{2}{n}\sum_{i=1}^m|Y_i|\leq \frac{2}{n}\cdot m\cdot\frac{n}{t} = \frac{2m}{t}\leq \delta\text{.} \end{equation*}
Thus, when we put together all these leftover vertices and divide them up differently, the most that we can lose from the mean square density is \(\delta\text{.}\) This completes the proof of the lemma.
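The equalization procedure of this lemma can be sketched in code. The implementation below is our own (it assumes classes are given as lists of vertices): it carves size-\(\lfloor n/t\rfloor\) chunks out of each class, pools the leftovers, and then tops up \(r\) chunks to size \(\lceil n/t\rceil\text{.}\)

```python
def equalize(classes, t):
    """Repartition the vertices into t classes of size floor(n/t) or
    ceil(n/t), reusing pieces of the original classes where possible."""
    n = sum(len(X) for X in classes)
    q, r = divmod(n, t)                 # n = t*q + r; target sizes q, q + 1
    chunks, pool = [], []
    for X in classes:
        X = list(X)
        # carve out size-q pieces (capped at t chunks); the rest is leftover
        while len(X) >= q and len(chunks) < t:
            chunks.append(X[:q])
            X = X[q:]
        pool.extend(X)
    while len(chunks) < t:              # re-cut the pooled leftovers
        chunks.append(pool[:q])
        pool = pool[q:]
    for i in range(r):                  # hand one extra vertex to r chunks
        chunks[i].append(pool.pop())
    return chunks

# 10 vertices into t = 4 classes: sizes floor(10/4) = 2 and ceil(10/4) = 3.
out = equalize([list(range(10))], 4)
print(sorted(len(c) for c in out))  # [2, 2, 3, 3]
```

Unlike the lemma, this sketch does not track which chunk came from which original class, but the counting is the same: exactly \(t-r\) classes of size \(\lfloor n/t\rfloor\) and \(r\) of size \(\lceil n/t\rceil\text{.}\)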
So, the partition \(\mathcal{P}^1\) is obtained from \(\tilde{\mathcal{Q}}^1\) by applying Lemma 5.3.8 with \(\delta=\frac{\varepsilon^5}{4}\text{.}\) To finish the proof, all that we need to notice is that the procedure for obtaining \(\mathcal{P}^1\) from \(\mathcal{P}^0\) that we have just described can be applied in exactly the same way to obtain \(\mathcal{P}^2\) from \(\mathcal{P}^1\text{,}\) and so on, until we finally reach a partition which satisfies property a of Theorem 5.2.1. Since every iteration increases the mean square density by at least \(\varepsilon^5/4\text{,}\) this will terminate after at most \(4/\varepsilon^5\) steps. The total number of sets in the final partition will be bounded above by \(t_m\) where \(m=\lfloor 4/\varepsilon^5\rfloor\text{,}\) \(t_0=t\) and \(t_i=\left\lceil\frac{8t_{i-1}2^{t_{i-1}-1}}{\varepsilon^5}\right\rceil\) for all \(i\geq1\text{.}\)

Remark 5.3.9.

Notice, here, that the number of sets in the final partition is, in the worst case, completely enormous. In particular, we have
\begin{equation*} k_0(\varepsilon,t) = t_m \geq 2^{t_{m-1}} \geq 2^{2^{t_{m-2}}}\geq 2^{2^{2^{t_{m-3}}}}\geq 2^{2^{2^{2^{\dots^{2^{t_0}}}}}}\text{.} \end{equation*}
That is, the number of sets in the final partition may be as large as a “tower of \(2\)s” of height \(\varepsilon^{-O(1)}\text{.}\)
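For concreteness, the tower function can be written out explicitly; the short sketch below (our own, not from the text) shows how quickly it escapes any reasonable scale.

```python
def tower(height, top=2):
    """Iterated exponentiation: apply x -> 2**x `height` times, starting
    from `top`. tower(3) is 2^(2^(2^2))."""
    x = top
    for _ in range(height):
        x = 2 ** x
    return x

print(tower(3))            # 65536
print(len(str(tower(4))))  # tower(4) = 2**65536 already has 19729 digits
```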