Appendix B Basic Discrete Probability

Everything in this appendix should be familiar to anyone who has taken a probability or statistics course before. In these notes, all probability spaces that we require are discrete (i.e. countable). In fact, they are all finite. Therefore, we will not need the full measure-theoretic definition of a probability space; the following will suffice.

Definition B.0.1.

A discrete probability space is a pair \((\Omega,\mathbb{P})\) where \(\Omega\) is a countable set and \(\mathbb{P}:2^\Omega \to [0,1]\) is a function satisfying
\begin{equation*} \mathbb{P}(\Omega)=1 \end{equation*}
and, for any \(A,B\subseteq \Omega\) with \(A\cap B=\emptyset\text{,}\)
\begin{equation*} \mathbb{P}(A\cup B) = \mathbb{P}(A) + \mathbb{P}(B)\text{.} \end{equation*}
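For example, a fair coin flip is modelled by taking \(\Omega=\{H,T\}\) and
\begin{equation*} \mathbb{P}(\emptyset)=0,\quad \mathbb{P}(\{H\})=\mathbb{P}(\{T\})=\frac{1}{2},\quad \mathbb{P}(\Omega)=1\text{.} \end{equation*}
Note that, by additivity, \(\mathbb{P}\) is determined by its values on singletons: \(\mathbb{P}(A)=\sum_{x\in A}\mathbb{P}(\{x\})\) for every event \(A\) (recall that all of our probability spaces are finite).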

Definition B.0.2.

Given a discrete probability space \((\Omega,\mathbb{P})\text{,}\) the subsets \(E\) of \(\Omega\) are called events.

Definition B.0.3.

Given a discrete probability space \((\Omega,\mathbb{P})\text{,}\) a random variable is a function \(X:\Omega\to \mathbb{R}\text{.}\)

Definition B.0.4.

The expectation or expected value of a random variable \(X\) on a discrete probability space \((\Omega,\mathbb{P})\) is given by
\begin{equation*} \mathbb{E}(X) := \sum_{x\in \Omega}X(x) \mathbb{P}(\{x\})\text{.} \end{equation*}
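For example, if \(\Omega=\{1,2,\dots,6\}\) models the roll of a fair die, with \(\mathbb{P}(\{i\})=\frac{1}{6}\) for each \(i\text{,}\) and \(X(i)=i\text{,}\) then
\begin{equation*} \mathbb{E}(X) = \sum_{i=1}^{6} i\cdot\frac{1}{6} = \frac{7}{2}\text{.} \end{equation*}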

Observation B.0.5. Linearity of Expectation.

For any two random variables \(X\) and \(Y\) on a discrete probability space \((\Omega,\mathbb{P})\) and \(a,b\in\mathbb{R}\text{,}\)
\begin{equation*} \mathbb{E}(aX+bY) = a\mathbb{E}(X) + b\mathbb{E}(Y)\text{.} \end{equation*}

Proof.

Easily deduced from the definition of expectation:
\begin{equation*} \mathbb{E}(aX+bY) = \sum_{x\in\Omega}\left(aX(x)+bY(x)\right)\mathbb{P}(\{x\}) = a\sum_{x\in\Omega}X(x)\mathbb{P}(\{x\}) + b\sum_{x\in\Omega}Y(x)\mathbb{P}(\{x\}) = a\mathbb{E}(X)+b\mathbb{E}(Y)\text{.} \end{equation*}

Definition B.0.6.

Given a discrete probability space \((\Omega,\mathbb{P})\) and an event \(E\text{,}\) the indicator random variable of \(E\) is the random variable \(1_E\) such that
\begin{equation*} 1_E(x)=\begin{cases}1 & \text{ if } x\in E,\\ 0 & \text{ otherwise}. \end{cases} \end{equation*}

Observation B.0.7.

For any event \(E\) in a discrete probability space \((\Omega,\mathbb{P})\text{,}\) it holds that
\begin{equation*} \mathbb{E}(1_E) = \mathbb{P}(E)\text{.} \end{equation*}

Proof.

Easily deduced from the definitions of expectation and of \(\mathbb{P}\text{:}\)
\begin{equation*} \mathbb{E}(1_E) = \sum_{x\in\Omega}1_E(x)\mathbb{P}(\{x\}) = \sum_{x\in E}\mathbb{P}(\{x\}) = \mathbb{P}(E)\text{,} \end{equation*}
where the last equality uses the additivity of \(\mathbb{P}\text{.}\)
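To illustrate, take the fair die above and let \(E=\{2,4,6\}\) be the event that the roll is even. Then
\begin{equation*} \mathbb{E}(1_E) = \frac{1}{6}+\frac{1}{6}+\frac{1}{6} = \frac{1}{2} = \mathbb{P}(E)\text{.} \end{equation*}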
The simplest tool from the toolbox of the “probabilistic method” in combinatorics, and the only one that we will use in this course, is the First Moment Method. This is based on the following simple principle.

Observation B.0.8. First Moment Principle.

If \(X\) is a random variable such that \(\mathbb{E}(X)\leq \mu\text{,}\) then
\begin{equation*} \mathbb{P}(X\leq \mu)>0\text{.} \end{equation*}

Proof.

We prove the contrapositive. Suppose that \(\mathbb{P}(X\leq \mu)=0\text{.}\) That is,
\begin{equation*} \mathbb{P}(\{x\in \Omega: X(x)\leq \mu\})=0\text{.} \end{equation*}
Then every \(x\in\Omega\) with \(\mathbb{P}(\{x\})>0\) satisfies \(X(x)>\mu\text{.}\) Since the probabilities of the singletons sum to one, at least one of them is positive, and so
\begin{equation*} \mathbb{E}(X) = \sum_{x\in\Omega}X(x)\mathbb{P}(\{x\}) > \sum_{x\in\Omega}\mu\,\mathbb{P}(\{x\}) = \mu\text{.} \end{equation*}
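In particular, taking \(\mu = \mathbb{E}(X)\) shows that for any random variable \(X\) there is an outcome \(x\in\Omega\) with
\begin{equation*} X(x) \leq \mathbb{E}(X)\text{,} \end{equation*}
and the symmetric argument produces an outcome with \(X(x)\geq \mathbb{E}(X)\text{.}\)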
The way in which we typically apply the First Moment Principle is as follows. Suppose that we choose a certain type of mathematical object (e.g. a graph, a function or a set) at random from a particular collection of objects. If the randomly chosen object satisfies a particular inequality on average, then there must be at least one element of the collection which satisfies the inequality. Let us demonstrate the method with a quick and easy application of it due to Erdős [84].

Theorem B.0.9.

Let \(\mathcal{F}\) be a finite collection of sets, each of size \(n\text{,}\) whose elements are drawn from a set \(X\text{.}\) If \(|\mathcal{F}|\lt 2^{n-1}\text{,}\) then the elements of \(X\) can be coloured red and blue in such a way that no set in \(\mathcal{F}\) is monochromatic.

Proof.

We flip a fair coin independently for each element of \(X\text{.}\) If the coin flip for \(x\) comes up heads, then we colour \(x\) red; if it comes up tails, then we colour it blue. For a given set \(S\subseteq X\) of size \(n\text{,}\) the probability that every element of \(S\) is red is
\begin{equation*} \left(\frac{1}{2}\right)^n\text{.} \end{equation*}
Likewise, the probability that every element of \(S\) is blue is the same. Thus, by linearity of expectation applied to the indicator variables of these events, the expected number of sets in \(\mathcal{F}\) whose elements all receive the same colour is
\begin{equation*} |\mathcal{F}|\left(\left(\frac{1}{2}\right)^n + \left(\frac{1}{2}\right)^n\right) = \frac{|\mathcal{F}|}{2^{n-1}} \end{equation*}
which is less than one since \(|\mathcal{F}|\lt 2^{n-1}\text{.}\) Therefore, by the First Moment Principle, there must exist a colouring in which fewer than one set in \(\mathcal{F}\) is monochromatic. Since the number of monochromatic sets is a non-negative integer, it must be zero.
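To make the calculation concrete: if \(n=3\) and \(\mathcal{F}\) consists of three \(3\)-element sets, then \(|\mathcal{F}| = 3 \lt 4 = 2^{n-1}\text{,}\) and the expected number of monochromatic sets under a random colouring is
\begin{equation*} 3\cdot\left(\frac{1}{8}+\frac{1}{8}\right) = \frac{3}{4} \lt 1\text{,} \end{equation*}
so some colouring leaves every set of \(\mathcal{F}\) non-monochromatic.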
Let us also mention one more classical result, known as the Chernoff Bound. It will only be used in the analysis of some simple examples.
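Several equivalent versions of this bound exist; one standard form, stated here for concreteness, is the following: if \(X\) is a sum of independent indicator random variables and \(\mu = \mathbb{E}(X)\text{,}\) then for any \(0\lt \delta\leq 1\text{,}\)
\begin{equation*} \mathbb{P}(X\geq (1+\delta)\mu) \leq e^{-\delta^2\mu/3} \quad\text{and}\quad \mathbb{P}(X\leq (1-\delta)\mu) \leq e^{-\delta^2\mu/2}\text{.} \end{equation*}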
For further reading on the probabilistic method in combinatorics, see the books of Alon and Spencer [13] and Molloy and Reed [220].