Section 5.3 Proof of the Regularity Lemma
Now that we have seen a few applications of the Regularity Lemma (
Theorem 5.2.1), let’s finally prove it. None of the individual steps of the proof are terribly difficult, but the whole proof is a bit long and technical. It is probably the most “involved” proof that we will see in this course. However, it is a powerful tool and, therefore, worth proving (and worth seeing the proof at least once in your life).

“Mathematician trying to prove the regularity lemma.”
High-Level Overview.
Before diving into details, let us take a moment to discuss the basic strategy of the proof. Let \(\varepsilon>0\) and \(t\geq 1\) and define \(t_0:=t\text{.}\) Let \(G\) be any graph and let \(n:=|V(G)|\text{.}\) Let \(\mathcal{P}^0 = (X_1^0,\dots,X_{t_0}^0)\) be an arbitrary partition of \(V(G)\) into \(t_0\) sets, each of cardinality \(\lfloor n/t_0\rfloor\) or \(\lceil n/t_0\rceil\text{.}\)
Obviously, a silly arbitrary partition
\(\mathcal{P}^0\) will usually fail to do the job. In particular, it will fail to satisfy
property a of
Theorem 5.2.1. The first order of business will be to define a parameter which “measures” how poor the partition
\(\mathcal{P}^0\) (or, any other partition) is at doing this job. The way that this will be measured is via the “mean square density” function
\(f\) defined in
Definition 5.3.1. The function
\(f\) will have the property that
\(f(\mathcal{P})\) is between
\(0\) and
\(1\) for any partition
\(\mathcal{P}\text{.}\) We think of a partition
\(\mathcal{P}'\) as being “better” than a partition
\(\mathcal{P}\) from the perspective of the mean-squared density function if
\(f(\mathcal{P}')>f(\mathcal{P})\text{.}\)
The key to the proof is that, if many pairs of sets in the partition
\(\mathcal{P}^0\) fail to be
\(\varepsilon\)-regular, then it is possible to refine
\(\mathcal{P}^0\) to obtain a partition
\(\mathcal{Q}^1\) of
\(V(G)\) into at most
\(t_02^{t_0-1}\) sets such that
\(f(\mathcal{Q}^1)\geq f(\mathcal{P}^0)+\varepsilon^5/2\text{.}\) The sets in the partition
\(\mathcal{Q}^1\) may not be of close to the same cardinality as one another. However, by refining the partition further and doing a bit of “cleaning up,” we will get a partition
\(\mathcal{P}^1\) with
\(f(\mathcal{P}^1)\geq f(\mathcal{P}^0)+\varepsilon^5/4\) such that the number
\(t_1\) of classes of
\(\mathcal{P}^1\) is at most
\(\left\lceil\frac{8t_{0}2^{t_{0}-1}}{\varepsilon^5}\right\rceil\) and all the classes of
\(\mathcal{P}^1\) have size
\(\lfloor n/t_1\rfloor\) or
\(\lceil n/t_1\rceil\text{.}\) If a large number of the pairs of sets in
\(\mathcal{P}^1\) still fail to be
\(\varepsilon\)-regular, then we repeat this process, obtaining a partition
\(\mathcal{P}^2\) into
\(t_2\leq\left\lceil\frac{8t_{1}2^{t_{1}-1}}{\varepsilon^5}\right\rceil\) parts with mean square density at least
\(\varepsilon^5/4\) greater than that of
\(\mathcal{P}^1\text{,}\) and so on. Since every partition of
\(V(G)\) has mean square density between zero and one, this procedure must terminate after at most
\(4/\varepsilon^5\) steps, yielding a partition which satisfies
condition a of
Theorem 5.2.1. Letting
\(m=\lfloor 4/\varepsilon^5\rfloor\) be an upper bound on the number of iterations, the number of sets in the final partition is at most
\(t_m\) where, for
\(1\leq i\leq m\text{,}\) the quantity
\(t_i\) is defined by the recurrence
\(t_0=t\) and
\(t_i:=\left\lceil\frac{8t_{i-1}2^{t_{i-1}-1}}{\varepsilon^5}\right\rceil\text{.}\) Letting
\(k_0(\varepsilon,t):=t_m\text{,}\) we see that
property c of
Theorem 5.2.1 holds as well.
Now, let’s get into the details. Our first step is to formally define the mean square density of a partition.
Definition 5.3.1.
Given a partition \(\mathcal{P}=(X_1,\dots,X_t)\) of \(V(G)\text{,}\) define the mean square density of \(\mathcal{P}\) to be
\begin{equation*}
f(\mathcal{P}):=\frac{1}{n^2}\sum_{\substack{(i,j)\in [t]^2\\ i\neq j} }|X_i||X_j|d(X_i,X_j)^2\text{.}
\end{equation*}
A basic fact is that any partition has mean square density at most one.
Lemma 5.3.4.
For any partition \(\mathcal{P}\) of \(V(G)\text{,}\)
\begin{equation*}
f(\mathcal{P})\leq 1\text{.}
\end{equation*}
Proof.
Recall that, for partitions \(\mathcal{P} = (X_1,\dots,X_m)\) and \(\mathcal{P}'=(X_1',\dots,X_\ell')\) of \(V(G)\text{,}\) we say that \(\mathcal{P}'\) refines \(\mathcal{P}\) if for every \(1\leq i\leq \ell\) there exists \(1\leq j\leq m\) such that \(X_i'\subseteq X_j\text{.}\) Another important fact which we will use is that refining a partition does not decrease its mean square density. The following lemma will be used to compare the “contribution” from a pair \((X_i,X_j)\) in a partition of \(V(G)\) to the mean square density to the contribution of the subsets of \(X_i\) and \(X_j\) in a refinement of the partition.
Lemma 5.3.5.
If \(A\) and \(B\) are disjoint subsets of \(V(G)\) and \((A_1,\dots,A_s)\) and \((B_1,\dots,B_t)\) are partitions of \(A\) and \(B\text{,}\) respectively, then
\begin{equation*}
|A||B|d(A,B)^2 \leq \sum_{i=1}^s\sum_{j=1}^t |A_i||B_j|d(A_i,B_j)^2\text{.}
\end{equation*}
Proof.
By a simple double-counting argument, we see that
\begin{equation}
e(A,B) = \sum_{i=1}^s\sum_{j=1}^t e(A_i,B_j) = \sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)\text{.}\tag{5.3.1}
\end{equation}
Squaring both sides yields
\begin{equation*}
e(A,B)^2 = \left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)\right)^2\text{.}
\end{equation*}
For each \((i,j)\in [s]\times[t]\text{,}\) let
\begin{equation*}
x_{i,j}:=\sqrt{|A_i||B_j|}
\end{equation*}
and
\begin{equation*}
y_{i,j}:=d(A_i,B_j)\sqrt{|A_i||B_j|}\text{.}
\end{equation*}
\begin{equation*}
\left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)\right)^2 = \left(\sum_{i=1}^s\sum_{j=1}^t|x_{i,j}y_{i,j}|\right)^2\leq \left(\sum_{i=1}^s\sum_{j=1}^tx_{i,j}^2\right)\left(\sum_{i=1}^s\sum_{j=1}^ty_{i,j}^2\right)
\end{equation*}
\begin{equation*}
=\left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|\right)\left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)^2\right)
\end{equation*}
It is easily observed that
\begin{equation}
\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j| = |A||B|\text{.}\tag{5.3.2}
\end{equation}
Therefore,
\begin{equation*}
e(A,B)^2\leq |A||B|\left(\sum_{i=1}^s\sum_{j=1}^t|A_i||B_j|d(A_i,B_j)^2\right)\text{.}
\end{equation*}
Dividing both sides by \(|A||B|\) and recalling that \(d(A,B)=\frac{e(A,B)}{|A||B|}\) completes the proof.
Lemma 5.3.6.
If \(\mathcal{P}'\) and \(\mathcal{P}\) are partitions of \(V(G)\) such that \(\mathcal{P}'\) refines \(\mathcal{P}\text{,}\) then
\begin{equation*}
f(\mathcal{P})\leq f(\mathcal{P}')\text{.}
\end{equation*}
Proof.
Write \(\mathcal{P}=(X_1,\dots,X_m)\) and \(\mathcal{P}'=(X_1',\dots,X_\ell')\text{.}\) For \(1\leq i\leq m\text{,}\) let
\begin{equation*}
R_i:=\{j: 1\leq r\leq \ell\text{ and } X_r'\subseteq X_i\}\text{.}
\end{equation*}
Then, by
Lemma 5.3.5 to every pair
\((X_i,X_j)\) with
\(i\neq j\text{,}\) we get
\begin{equation*}
f(\mathcal{P}) = \frac{1}{n^2}\sum_{i\neq j}|X_i||X_j|d(X_i,X_j)^2 \leq \frac{1}{n^2}\sum_{i\neq j}\sum_{r\in R_i}\sum_{s\in R_j}|X_r'||X_s'|d(X_r',X_s')^2 = f(\mathcal{P}')
\end{equation*}
as desired.
Note that the inequality
\begin{equation*}
\left(\sum_{i=1}^n|x_iy_i|\right)^2\leq \left(\sum_{i=1}^nx_i^2\right)\left(\sum_{i=1}^ny_i^2\right)
\end{equation*}
in
Lemma C.0.4 is tight if and only if the vectors
\((x_1,\dots,x_n)\) and
\((y_1,\dots,y_n)\) are constant multiples of one another. What does this tell us in the context of the proof of
Lemma 5.3.5? It tells us that the “gain” in the mean square density will be zero if and only if there exists
\(d\) such that
\(d(A_i,B_j)=d\) for all
\((i,j)\in [s]\times [t]\text{.}\) Of course, if such a
\(d\) exists, then it must be the case that
\(d=d(A,B)\text{.}\) Thus, if
\((A,B)\) is not
\(\varepsilon\)-regular and we refine
\(A\) and
\(B\) in a way that “pulls out” some subsets of
\(A\) and
\(B\) of density different from
\(d(A,B)\text{,}\) then we should be able to get a “boost” in mean square density. The following lemma quantifies the size of this “boost.”
Lemma 5.3.7.
Suppose \(A\) and \(B\) are disjoint subsets of \(V(G)\) and let \(A_1\subseteq A\) and \(B_1\subseteq B\text{.}\) Define \(A_2:=A\setminus A_1\) and \(B_2:=B\setminus B_1\text{.}\) Then
\begin{equation*}
\sum_{1\leq i,j\leq 2}|A_i||A_j|d(A_i,A_j)^2\geq |A||B|d(A,B)^2 + (d(A_1,B_1)-d(A,B))^2|A_1||B_1|\text{.}
\end{equation*}
Proof.
We have
\begin{equation*}
|A_1||B_1|(d(A_1,B_1)-d(A,B))^2\leq \sum_{1\leq i,j\leq 2}|A_i||B_j|(d(A_i,B_j)-d(A,B))^2\text{.}
\end{equation*}
Expanding \((d(A_i,B_j)-d(A,B))^2\) in the sum on the right side (for each \(1\leq i,j\leq 2\) individually) yields the following expression:
\begin{equation*}
\sum_{1\leq i,j\leq 2}|A_i||B_j|d(A_i,B_j)^2 - 2d(A,B)\sum_{1\leq i,j\leq 2}|A_i||B_j|d(A_i,B_j) + d(A,B)^2\sum_{1\leq i,j\leq 2}|A_i||B_j|\text{.}
\end{equation*}
Applying
(5.3.1) to the second summation and
(5.3.2) to the third, we get that this is equal to
\begin{equation*}
\sum_{1\leq i,j\leq 2}|A_i||B_j|d(A_i,B_j)^2 -|A||B|d(A,B)^2\text{.}
\end{equation*}
We now add \(|A||B|d(A,B)^2\) to both sides to get the desired bound.
Okay, let’s describe an algorithm for producing the partition in
Theorem 5.2.1. Recall that
\(t_0:=t\) and
\(\mathcal{P}^0\) is an arbitrary partition of
\(V(G)\) into sets
\(X_1^0,\dots,X_{t_0}^0\text{,}\) each of size roughly
\(n/t_0\text{.}\) If
\(\mathcal{P}^0\) already satisfies
condition a of
Theorem 5.2.1, then we have nothing more to do, and we simply output it.
So, assume that
condition a of
Theorem 5.2.1 is not satisfied. For each pair
\((X_i^0,X_j^0)\) which is not
\(\varepsilon\)-regular and such that
\(i\neq j\text{,}\) take
\(X_{i,j}\subseteq X_i^0\) and
\(X_{j,i}\subseteq X_j^0\) such that
\(|X_{i,j}|\geq \varepsilon|X_i^0|\text{,}\) \(|X_{j,i}|\geq \varepsilon|X_j^0|\) and
\(|d(X_{i,j},X_{j,i}) - d(X_i^0,X_j^0)|\geq \varepsilon\text{.}\) Let
\(X_{i,j}':=X_i^0\setminus X_{i,j}\) and
\(X_{j,i}':=X_j^0\setminus X_{j,i}\text{.}\) By
Lemma 5.3.7, if we replace
\(X_i^0\) and
\(X_j^0\) with four parts
\(X_{i,j},X_{i,j}',X_{j,i},X_{j,i}'\text{,}\) then the contribution of the pair
\((X_i^0,X_j^0)\) to the mean square density increases by at least
\begin{equation*}
\frac{(d(X_{i,j},X_{j,i}) - d(X_i^0,X_j^0))^2|X_{i,j}||X_{j,i}|}{n^2}\geq \frac{\varepsilon^2\cdot \varepsilon|X_i^0|\varepsilon|X_j^0|}{n^2} \geq \frac{\varepsilon^4\left\lfloor n/t_0\right\rfloor^2}{n^2}>\frac{\varepsilon^4}{2t_0^2}\text{.}
\end{equation*}
Also, crucially, by
Lemma 5.3.5, any partitions of
\(X_i^0\) and
\(X_j^0\) that refine the partitions
\((X_{i,j},X_{i,j}')\) and
\((X_{j,i},X_{j,i}')\) will also give us an extra contribution of at least
\(\frac{\varepsilon^4}{2t_0^2}\) to the mean square density compared with the contribution from
\((X_i^0,X_j^0)\text{.}\) So, if there are more than
\(\varepsilon t_0^2\) non-
\(\varepsilon\)-regular pairs and we refine every non-
\(\varepsilon\)-regular pair in the way described above (at the same time), then we get a total boost of at least
\begin{equation*}
\left(\frac{\varepsilon^4}{2t_0^2}\right)\left(\varepsilon t_0^2\right) \geq \frac{\varepsilon^5}{2}
\end{equation*}
to the mean square density.
Now, let \(\mathcal{Q}^1\) be the partition obtained from the refinement procedure described in the previous paragraph. The following lemma explains how to refine \(\mathcal{Q}^1\) further and rearrange it so that all sets have roughly the same size without decreasing the mean square density by very much.
Lemma 5.3.8.
Let \(\delta>0\) and \(\mathcal{Q}=(X_1,\dots,X_m)\) be a partition of \(V(G)\text{.}\) Then there exists a partition \(\mathcal{P}=(X_1',\dots,X_t')\) of \(V(G)\) such that
\(t= \left\lceil\frac{2m}{\delta}\right\rceil\text{,}\)
\(||X_i'|-|X_j'||\leq 1\) for all \(1\leq i,j\leq t\) and
\(f(\mathcal{P})\geq f(\mathcal{Q})-\delta\text{.}\)
Proof.
Let \(t= \left\lceil\frac{2m}{\delta}\right\rceil\) and let \(0\leq r\leq t-1\) such that
\begin{equation*}
|V(G)| = t\left\lfloor \frac{n}{t}\right\rfloor + r\text{.}
\end{equation*}
Let
\(\tilde{\mathcal{Q}}\) be an arbitrary refinement of
\(\mathcal{Q}\) obtained by breaking up each class
\(X_i\) of
\(\mathcal{Q}\) into sets of size
\(\lfloor n/t\rfloor\) or
\(\lceil n/t\rceil\) and one “leftover” set
\(Y_i\subseteq X_i\) with
\(|Y_i|\lt n/t\text{.}\) When doing so, one needs to be careful to maintain the restriction that the total number of sets of size
\(\lfloor n/t\rfloor\) is at most
\(t-r\) and the number of size
\(\lceil n/t\rceil\) is at most
\(r\text{.}\) By
Lemma 5.3.5,
\begin{equation*}
f(\tilde{\mathcal{Q}})\geq f(\mathcal{Q})\text{.}
\end{equation*}
Now, we let \(\mathcal{P}\) be a partition obtained from \(\tilde{\mathcal{Q}}\) by putting all of the sets of leftover vertices together and breaking them up into sets of size \(\lfloor n/t\rfloor\) or \(\lceil n/t\rceil\) abitrarily. Note that it is possible to do this since our partitioning of the rest of the vertices uses at most \(t-r\) sets of size \(\lfloor n/t\rfloor\) and at most \(r\) of size \(\lceil n/t\rceil\text{.}\)
The last step is to show that the mean square density of \(\mathcal{P}\) cannot be much smaller than that of \(\tilde{\mathcal{Q}}\text{.}\) For each \(1\leq i\leq m\text{,}\) the number of leftover vertices in the partition of \(X_i\) is at most \(n/t\text{.}\) Since \(d(A,B)\leq 1\) for any two disjoint non-empty sets \(A\) and \(B\text{,}\) the contribution of these leftover sets to the mean square density of \(\tilde{\mathcal{Q}}\) is at most
\begin{equation*}
2\sum_{i=1}^m\frac{|Y_i|n}{n^2} \leq \frac{2}{n}\sum_{i=1}^m|Y_i|\leq 2m\left(\frac{n}{t}\right)\leq \delta\text{.}
\end{equation*}
Thus, when we put together all these leftover vertices and divide them up differently, the most that we can lose from the mean square density is \(\delta\text{.}\) This completes the proof of the lemma.
So, the partition
\(\mathcal{P}^1\) is obtained from
\(\tilde{\mathcal{Q}^1}\) by applying
Lemma 5.3.8 with
\(\delta=\frac{\varepsilon^5}{4}\text{.}\) To finish the proof, all that we need to notice is that the procedure for obtaining
\(\mathcal{P}^1\) from
\(\mathcal{P}^0\) that we have just described can be applied in exactly the same way to obtain
\(\mathcal{P}^2\) from
\(\mathcal{P}^1\text{,}\) and so on, until we finally reach a partition which satisfies
property a of
Theorem 5.2.1. Since every iteration increases the mean square density by at least
\(\varepsilon^5/4\text{,}\) this will terminate after at most
\(4/\varepsilon^5\) steps. The total number of sets in the final partition will be bounded above by
\(t_m\) where
\(m=\lfloor 4/\varepsilon^5\rfloor\text{,}\) \(t_0=t\) and
\(t_i=\left\lceil\frac{8t_{i-1}2^{kt{i-1}-1}}{\varepsilon^5}\right\rceil\) for all
\(i\geq1\text{.}\)
Here are a couple of YouTube videos related to the proof of the Regularity Lemma: