### Definition 5.1.1.

Given a graph \(G\) and disjoint non-empty subsets \(X\) and \(Y\) of \(V(G)\text{,}\) the

*density*between \(X\) and \(Y\) is defined by
\begin{equation*}
d(X,Y):=\frac{e(X,Y)}{|X||Y|}\text{.}
\end{equation*}

The Regularity Lemma (stated in the next section) roughly says that the vertex set of any graph can be partitioned into a bounded number of parts such that the edges between most pairs of parts are “well behaved.” The precise meaning of this is given by Definition 5.1.2 below.

Given a graph \(G\) and disjoint non-empty subsets \(X\) and \(Y\) of \(V(G)\text{,}\) the *density* between \(X\) and \(Y\) is defined by

\begin{equation*}
d(X,Y):=\frac{e(X,Y)}{|X||Y|}\text{.}
\end{equation*}

Given \(\varepsilon> 0\text{,}\) a graph \(G\) and disjoint non-empty subsets \(X\) and \(Y\) of \(V(G)\text{,}\) say that the pair \((X,Y)\) is *\(\varepsilon\)-regular * if, for every \(A\subseteq X\) and \(B\subseteq Y\) such that \(|A|\geq \varepsilon|X|\) and \(|B|\geq \varepsilon |Y|\text{,}\)

\begin{equation*}
\left|d(A,B)-d(X,Y)\right|\leq \varepsilon\text{.}
\end{equation*}

In other words, \((X,Y)\) is \(\varepsilon\)-regular if, for any “large enough” subsets \(A\) and \(B\) of \(X\) and \(Y\text{,}\) respectively, the density between \(A\) and \(B\) is the same as the density between \(X\) and \(Y\text{,}\) up to an error of at most \(\varepsilon\text{.}\) Before moving forward, let us make a quick remark and think through a few examples.

The condition that \(A\) and \(B\) are “large enough” is important for making Definition 5.1.2 meaningful for densities between zero and one. For example, if we allow \(A\) and \(B\) to be singletons, then \(d(A,B)\) is either \(0\) or \(1\text{,}\) depending on whether or not the unique vertex of \(A\) is adjacent to the unique vertex of \(B\text{.}\) Thus, if there was no restriction on \(|A|\) and \(|B|\text{,}\) then the only way in which there could exist \(0\lt \varepsilon\lt 1/2\) such that \((X,Y)\) is \(\varepsilon\)-regular is if \(G\) contains no edges from \(X\) to \(Y\) or it contained every edge from \(X\) to \(Y\text{.}\)

Suppose that \(X\) and \(Y\) are disjoint sets of vertices and that \(G\) contains every edge from \(X\) to \(Y\text{;}\) e.g., if \(X\) and \(Y\) are two parts of a complete multipartite graph. Then, for every \(A\subseteq X\) and \(B\subseteq Y\text{,}\) we have \(d(A,B)=1\text{.}\) Therefore, the pair \((X,Y)\) is \(\varepsilon\)-regular for every \(\varepsilon>0\text{.}\)

Suppose now that \(X\) and \(Y\) are disjoint sets of vertices and that every vertex in \(X\) has at most \(d\) non-neighbours in \(Y\text{.}\) We claim that, if \(\varepsilon > \sqrt{d/|Y|}\text{,}\) then \((X,Y)\) is \(\varepsilon\)-regular. For any non-empty sets \(A\subseteq X\) and \(B\subseteq Y\text{,}\) we have

\begin{equation*}
|A|(|B|-d)\leq e(A,B)\leq |A||B|
\end{equation*}

which implies that

\begin{equation*}
(|B|-d)/|B|\leq d(A,B)\leq 1\text{.}
\end{equation*}

Thus, if \(|B|\geq \varepsilon |Y|\text{,}\) then

\begin{equation*}
|d(A,B)-d(X,Y)|\leq d/|B| \leq d/(\varepsilon|Y|) \leq \varepsilon\text{.}
\end{equation*}

Thus, \((X,Y)\) is \(\varepsilon\)-regular.

Suppose that \(X\) and \(Y\) are disjoint sets such that \(|X|=|Y|\) and that, for each \(x\in X\) and \(y\in Y\text{,}\) the edge \(xy\) is included in \(G\) with probability \(1/2\text{,}\) independently of all other such edges. For \(\varepsilon>0\text{,}\) if \(|X|=|Y|\) is large with respect to \(\varepsilon\text{,}\) then \((X,Y)\) is \(\varepsilon\)-regular with high probability.

Here are the details. Given any \(A\subseteq X\) and \(B\subseteq Y\text{,}\) by linearity of expectation, the expected number of edges from \(X\) to \(Y\) is \(|A||B|/2\text{.}\) By the Chernoff Bound (Theorem B.0.10) and independence, for any \(t\geq0\text{,}\) the probability that the number of edges deviates from \(|A||B|/2\) by at least \(t\) is at most \(2e^{-2t^2/|A||B|}\text{.}\) Letting \(t=\varepsilon|A||B|/2\text{,}\) we get that

\begin{equation*}
\mathbb{P}(|d(A,B)-1/2|>\varepsilon/2) \leq 2e^{-\varepsilon^2|A||B|/4}\text{.}
\end{equation*}

If each of \(|A|\) and \(|B|\) is at least \(\varepsilon|X|=\varepsilon|Y|\text{,}\) then this probability is at most \(2e^{-\varepsilon^4|X||Y|/4}\text{.}\) The number of possible choices for \((A,B)\) is at most \(2^{|X|}2^{|Y|} = 4^{|X|}\text{.}\) Thus, the probability that there exists such a pair \((A,B)\) with \(|d(A,B)-1/2|>\varepsilon/2\) goes to zero (very quickly) as \(|X|\) goes to infinity. So, with probability close to one, there are no such pairs. Therefore, with probability close to one, \((X,Y)\) is \(\varepsilon\)-regular.

Example 5.1.6 is particularly relevant to the way that we will generally “think about” \(\varepsilon\)-regular pairs. That is, intuitively speaking, if \((X,Y)\) is \(\varepsilon\)-regular, then we will think of the edges between \(X\) and \(Y\) as being distributed in a “random-like” way, up to small errors which are bounded in terms of \(\varepsilon\text{.}\)

Now that we are comfortable with the definition of an \(\varepsilon\)-regular pair, let us turn our attention to their main application; namely, in approximately counting copies of graphs. As a simple special case, let us consider the case of triangles.

Let \(G\) be a graph and let \((X_1,X_2,X_3)\) be a partition of \(V(G)\text{.}\) Let \(0\lt \varepsilon\lt 1\) and suppose that, for all \(1\leq i\neq j\leq 3\text{,}\) the pair \((X_i,X_j)\) is \(\varepsilon\)-regular and \(d(X_i,X_j)>2\varepsilon\text{.}\) Then the number of triangles in \(G\) with one vertex in each of the sets \(X_1,X_2\) and \(X_3\) is at least

\begin{equation*}
(1-2\varepsilon)\left(\prod_{i\neq j}(d(X_i,X_{j})-\varepsilon)\right)|X_1||X_2||X_3|\text{.}
\end{equation*}

Given \(i\in \{2,3\}\text{,}\) say that a vertex \(x\in X_1\) is *bad for \(X_i\)* if it has fewer than \((d(X_1,X_i)-\varepsilon)|X_i|\) neighbours in \(X_i\text{.}\) Let \(X_{1,i}'\) be the set of all vertices that are bad for \(X_i\text{.}\) Then, by definition,

\begin{equation*}
e(X_{1,i}',X_i) \lt \left(d(X_1,X_i) - \varepsilon\right)|X_{1,i}'||X_{i}|
\end{equation*}

and so \(|d(X_{1,i}',X_{i})-d(X_1,X_{i})|>\varepsilon\text{.}\) Since \((X_1,X_{2})\) is \(\varepsilon\)-regular, and \(|X_i|>\varepsilon|X_i|\text{,}\) this implies that \(|X_{1,i}'| \lt \varepsilon|X_1|\text{.}\)

Now, let \(Y_1:=X_1\setminus(X_{1,2}'\cup X_{1,3}')\text{.}\) As we have shown above, \(|Y_1|\geq |X_1|-2\varepsilon|X_1|=(1-2\varepsilon)|X_1\text{.}\) Note that any \(x\in Y_1\) is contained in exactly

\begin{equation*}
e(N(x)\cap X_2,N(x)\cap X_3)
\end{equation*}

\begin{equation*}
=|N(x)\cap X_2||N(x)\cap X_3|\cdot d(N(x)\cap X_2,N(x)\cap X_3)
\end{equation*}

triangles in \(G\text{.}\) Since \(x\in Y_1\text{,}\) it has at least \((d(X_1,X_{2})-\varepsilon)|X_{2}| > \varepsilon |X_2|\) neighbours in \(X_2\) and at least \((d(X_1,X_{3})-\varepsilon)|X_{3}|>\varepsilon|X_3|\) in \(X_3\text{.}\) So, since \((X_2,X_3)\) is \(\varepsilon\)-regular, the number of triangles containing \(x\) is at least

\begin{equation*}
(d(X_1,X_{2})-\varepsilon)(d(X_1,X_{3})-\varepsilon)(d(X_2,X_3)-\varepsilon)|X_{2}||X_{3}|\text{.}
\end{equation*}

Thus, we are now done by summing the number of triangles containing \(x\) over all \(x\in Y_1\text{.}\)

The main message of Lemma 5.1.7 is that, if \((X_1,X_2),(X_1,X_3)\) and \((X_2,X_3)\) are all \(\varepsilon\)-regular, then the number of triangles between these three sets is almost as large as it would be if the edges from \(X_i\) to \(X_j\) were placed randomly with probability \(d(X_i,X_j)\) for all distinct \(i,j\in\{1,2,3\}\text{.}\) That is, from the perspective of counting triangles, \(\varepsilon\)-regular pairs for small \(\varepsilon\) can be thought of as being “random-like.”

Of course, there is nothing particularly special about finding copies of triangles here. A very similar argument works for bounding the number of copies of \(K_r\) in a graph whose vertex set is partitioned into \(r\) sets \(X_1,\dots,X_r\) such that the pairs \((X_i,X_j)\) with \(i\neq j\) are \(\varepsilon\)-regular. In fact, there is also nothing special about complete graphs. By complementing the edges between pairs \((X_i,X_j)\) and counting appropriately, one can also obtain a lower bound on the number of induced copies of any graph \(H\text{.}\) Thus, since this holds for all possible subgraphs, by “counting the complement,” we can also get upper bounds on the number of induced copies of \(H\text{.}\) These extensions will be explored in exercises Exercise 3 and Exercise 4 in Section 5.4.

Here is a YouTube video related to regular pairs and counting: