Section 5.1 Regular Pairs and Counting
The Regularity Lemma (stated in the next section) roughly says that the vertex set of any graph can be partitioned into a bounded number of parts such that the edges between most pairs of parts are “well behaved.” The precise meaning of this is given by
Definition 5.1.2 below.
Definition 5.1.1.
Given a graph \(G\) and disjoint non-empty subsets \(X\) and \(Y\) of \(V(G)\text{,}\) the density between \(X\) and \(Y\) is defined by
\begin{equation*}
d(X,Y):=\frac{e(X,Y)}{|X||Y|}\text{.}
\end{equation*}
Definition 5.1.2.
Given \(\varepsilon> 0\text{,}\) a graph \(G\) and disjoint non-empty subsets \(X\) and \(Y\) of \(V(G)\text{,}\) say that the pair \((X,Y)\) is \(\varepsilon\)-regular if, for every \(A\subseteq X\) and \(B\subseteq Y\) such that \(|A|\geq \varepsilon|X|\) and \(|B|\geq \varepsilon |Y|\text{,}\)
\begin{equation*}
\left|d(A,B)-d(X,Y)\right|\leq \varepsilon\text{.}
\end{equation*}
In other words, \((X,Y)\) is \(\varepsilon\)-regular if, for any “large enough” subsets \(A\) and \(B\) of \(X\) and \(Y\text{,}\) respectively, the density between \(A\) and \(B\) is the same as the density between \(X\) and \(Y\text{,}\) up to an error of at most \(\varepsilon\text{.}\) Before moving forward, let us make a quick remark and think through a few examples.
Example 5.1.4.
Suppose that \(X\) and \(Y\) are disjoint sets of vertices and that \(G\) contains every edge from \(X\) to \(Y\text{;}\) e.g., if \(X\) and \(Y\) are two parts of a complete multipartite graph. Then, for every \(A\subseteq X\) and \(B\subseteq Y\text{,}\) we have \(d(A,B)=1\text{.}\) Therefore, the pair \((X,Y)\) is \(\varepsilon\)-regular for every \(\varepsilon>0\text{.}\)
Example 5.1.5.
Suppose now that \(X\) and \(Y\) are disjoint sets of vertices and that every vertex in \(X\) has at most \(d\) non-neighbours in \(Y\text{.}\) We claim that, if \(\varepsilon > \sqrt{d/|Y|}\text{,}\) then \((X,Y)\) is \(\varepsilon\)-regular. For any non-empty sets \(A\subseteq X\) and \(B\subseteq Y\text{,}\) we have
\begin{equation*}
|A|(|B|-d)\leq e(A,B)\leq |A||B|
\end{equation*}
which implies that
\begin{equation*}
(|B|-d)/|B|\leq d(A,B)\leq 1\text{.}
\end{equation*}
Thus, if \(|B|\geq \varepsilon |Y|\text{,}\) then
\begin{equation*}
|d(A,B)-d(X,Y)|\leq d/|B| \leq d/(\varepsilon|Y|) \leq \varepsilon\text{.}
\end{equation*}
Thus, \((X,Y)\) is \(\varepsilon\)-regular.
Example 5.1.6.
Suppose that \(X\) and \(Y\) are disjoint sets such that \(|X|=|Y|\) and that, for each \(x\in X\) and \(y\in Y\text{,}\) the edge \(xy\) is included in \(G\) with probability \(1/2\text{,}\) independently of all other such edges. For \(\varepsilon>0\text{,}\) if \(|X|=|Y|\) is large with respect to \(\varepsilon\text{,}\) then \((X,Y)\) is \(\varepsilon\)-regular with high probability.
Here are the details. Given any
\(A\subseteq X\) and
\(B\subseteq Y\text{,}\) by linearity of expectation, the expected number of edges from
\(X\) to
\(Y\) is
\(|A||B|/2\text{.}\) By the Chernoff Bound (
Theorem B.0.10) and independence, for any
\(t\geq0\text{,}\) the probability that the number of edges deviates from
\(|A||B|/2\) by at least
\(t\) is at most
\(2e^{-2t^2/|A||B|}\text{.}\) Letting
\(t=\varepsilon|A||B|/2\text{,}\) we get that
\begin{equation*}
\mathbb{P}(|d(A,B)-1/2|>\varepsilon/2) \leq 2e^{-\varepsilon^2|A||B|/4}\text{.}
\end{equation*}
If each of \(|A|\) and \(|B|\) is at least \(\varepsilon|X|=\varepsilon|Y|\text{,}\) then this probability is at most \(2e^{-\varepsilon^4|X||Y|/4}\text{.}\) The number of possible choices for \((A,B)\) is at most \(2^{|X|}2^{|Y|} = 4^{|X|}\text{.}\) Thus, the probability that there exists such a pair \((A,B)\) with \(|d(A,B)-1/2|>\varepsilon/2\) goes to zero (very quickly) as \(|X|\) goes to infinity. So, with probability close to one, there are no such pairs. Therefore, with probability close to one, \((X,Y)\) is \(\varepsilon\)-regular.
Example 5.1.6 is particularly relevant to the way that we will generally “think about”
\(\varepsilon\)-regular pairs. That is, intuitively speaking, if
\((X,Y)\) is
\(\varepsilon\)-regular, then we will think of the edges between
\(X\) and
\(Y\) as being distributed in a “random-like” way, up to small errors which are bounded in terms of
\(\varepsilon\text{.}\)
Now that we are comfortable with the definition of an \(\varepsilon\)-regular pair, let us turn our attention to their main application; namely, in approximately counting copies of graphs. As a simple special case, let us consider the case of triangles.
Lemma 5.1.7. The Counting Lemma For Triangles.
Let \(G\) be a graph and let \((X_1,X_2,X_3)\) be a partition of \(V(G)\text{.}\) Let \(0\lt \varepsilon\lt 1\) and suppose that, for all \(1\leq i\neq j\leq 3\text{,}\) the pair \((X_i,X_j)\) is \(\varepsilon\)-regular and \(d(X_i,X_j)>2\varepsilon\text{.}\) Then the number of triangles in \(G\) with one vertex in each of the sets \(X_1,X_2\) and \(X_3\) is at least
\begin{equation*}
(1-2\varepsilon)\left(\prod_{i\neq j}(d(X_i,X_{j})-\varepsilon)\right)|X_1||X_2||X_3|\text{.}
\end{equation*}
Proof.
Given \(i\in \{2,3\}\text{,}\) say that a vertex \(x\in X_1\) is bad for \(X_i\) if it has fewer than \((d(X_1,X_i)-\varepsilon)|X_i|\) neighbours in \(X_i\text{.}\) Let \(X_{1,i}'\) be the set of all vertices that are bad for \(X_i\text{.}\) Then, by definition,
\begin{equation*}
e(X_{1,i}',X_i) \lt \left(d(X_1,X_i) - \varepsilon\right)|X_{1,i}'||X_{i}|
\end{equation*}
and so \(|d(X_{1,i}',X_{i})-d(X_1,X_{i})|>\varepsilon\text{.}\) Since \((X_1,X_{2})\) is \(\varepsilon\)-regular, and \(|X_i|>\varepsilon|X_i|\text{,}\) this implies that \(|X_{1,i}'| \lt \varepsilon|X_1|\text{.}\)
Now, let \(Y_1:=X_1\setminus(X_{1,2}'\cup X_{1,3}')\text{.}\) As we have shown above, \(|Y_1|\geq |X_1|-2\varepsilon|X_1|=(1-2\varepsilon)|X_1\text{.}\) Note that any \(x\in Y_1\) is contained in exactly
\begin{equation*}
e(N(x)\cap X_2,N(x)\cap X_3)
\end{equation*}
\begin{equation*}
=|N(x)\cap X_2||N(x)\cap X_3|\cdot d(N(x)\cap X_2,N(x)\cap X_3)
\end{equation*}
triangles in \(G\text{.}\) Since \(x\in Y_1\text{,}\) it has at least \((d(X_1,X_{2})-\varepsilon)|X_{2}| > \varepsilon |X_2|\) neighbours in \(X_2\) and at least \((d(X_1,X_{3})-\varepsilon)|X_{3}|>\varepsilon|X_3|\) in \(X_3\text{.}\) So, since \((X_2,X_3)\) is \(\varepsilon\)-regular, the number of triangles containing \(x\) is at least
\begin{equation*}
(d(X_1,X_{2})-\varepsilon)(d(X_1,X_{3})-\varepsilon)(d(X_2,X_3)-\varepsilon)|X_{2}||X_{3}|\text{.}
\end{equation*}
Thus, we are now done by summing the number of triangles containing \(x\) over all \(x\in Y_1\text{.}\)
The main message of
Lemma 5.1.7 is that, if
\((X_1,X_2),(X_1,X_3)\) and
\((X_2,X_3)\) are all
\(\varepsilon\)-regular, then the number of triangles between these three sets is almost as large as it would be if the edges from
\(X_i\) to
\(X_j\) were placed randomly with probability
\(d(X_i,X_j)\) for all distinct
\(i,j\in\{1,2,3\}\text{.}\) That is, from the perspective of counting triangles,
\(\varepsilon\)-regular pairs for small
\(\varepsilon\) can be thought of as being “random-like.”
Of course, there is nothing particularly special about finding copies of triangles here. A very similar argument works for bounding the number of copies of
\(K_r\) in a graph whose vertex set is partitioned into
\(r\) sets
\(X_1,\dots,X_r\) such that the pairs
\((X_i,X_j)\) with
\(i\neq j\) are
\(\varepsilon\)-regular. In fact, there is also nothing special about complete graphs. By complementing the edges between pairs
\((X_i,X_j)\) and counting appropriately, one can also obtain a lower bound on the number of induced copies of any graph
\(H\text{.}\) Thus, since this holds for all possible subgraphs, by “counting the complement,” we can also get upper bounds on the number of induced copies of
\(H\text{.}\) These extensions will be explored in exercises
Exercise 3 and
Exercise 4 in
Section 5.4.
Here are a couple of YouTube videos related to regular pairs and counting: