We will actually prove the stronger statement that every non-empty collection \(\mathcal{F}\subseteq 2^{[n]}\) shatters at least \(|\mathcal{F}|\) distinct subsets of \([n]\text{.}\) Given this, the result will follow as there are at most \(\sum_{i=0}^{k-1}\binom{n}{i}\) sets of cardinality less than \(k\text{.}\)
We proceed by induction on \(|\mathcal{F}|\text{.}\) Every non-empty family shatters the empty set, which proves the case \(|\mathcal{F}|=1\text{.}\) Now, suppose that \(|\mathcal{F}|\geq2\) and let \(\mathcal{S}\) be the collection of subsets of \([n]\) shattered by \(\mathcal{F}\text{.}\) Our goal is to prove that \(|\mathcal{S}|\geq|\mathcal{F}|\text{.}\)
Since \(|\mathcal{F}|\geq2\text{,}\) we can let \(F_1,F_2\in\mathcal{F}\) such that \(F_1\setminus F_2\neq\emptyset\text{.}\) Choose \(x\in F_1\setminus F_2\) and let
\begin{equation*}
\mathcal{F}_1:= \{F\in\mathcal{F}: x\in F\}
\end{equation*}
and
\begin{equation*}
\mathcal{F}_2:= \{F\in\mathcal{F}: x\notin F\}\text{.}
\end{equation*}
For \(i\in\{1,2\}\text{,}\) let \(\mathcal{S}_i\) be the collection of subsets of \([n]\) shattered by \(\mathcal{F}_i\text{.}\) The families \(\mathcal{F}_1\) and \(\mathcal{F}_2\) partition \(\mathcal{F}\) and, by our choice of \(x\text{,}\) neither of them is empty. So, we can apply the inductive hypothesis to each of them. That is, \(|\mathcal{S}_1|\geq |\mathcal{F}_1|\) and \(|\mathcal{S}_2|\geq |\mathcal{F}_2|\text{.}\)
Now, suppose that \(S\subseteq [n]\) is shattered by either \(\mathcal{F}_1\) or by \(\mathcal{F}_2\text{.}\) Then, by definition of shattering, we immediately get that \(S\) is also shattered by \(\mathcal{F}\text{.}\) Moreover, we observe that \(x\) cannot be an element of \(S\text{;}\) this is because every set in \(\mathcal{F}_1\) contains \(x\) and no set in \(\mathcal{F}_2\) contains \(x\text{.}\) Therefore,
\begin{equation}
\mathcal{S}_1\cup \mathcal{S}_2\subseteq \{S\in \mathcal{S}: x\notin S\}\text{.}\tag{1.3.1}
\end{equation}
Of course, this is not quite enough to finish the proof, since \(\mathcal{S}_1\) and \(\mathcal{S}_2\) may have common elements. However, given any \(S\in \mathcal{S}_1\cap \mathcal{S}_2\text{,}\) recall that \(x\notin S\text{.}\) Since both \(\mathcal{F}_1\) and \(\mathcal{F}_2\) shatter \(S\text{,}\) we get that \(\mathcal{F}\) shatters the set \(S\cup \{x\}\text{.}\) Therefore,
\begin{equation}
\{S\cup\{x\}: S\in \mathcal{S}_1\cap \mathcal{S}_2\}\subseteq \{S\in \mathcal{S}: x\in S\}\text{.}\tag{1.3.2}
\end{equation}
Putting
(1.3.1) and
(1.3.2) together, and applying the inductive hypothesis, we get that
\begin{equation*}
|\mathcal{S}|= |\{S\in \mathcal{S}: x\notin S\}| + |\{S\in \mathcal{S}: x\in S\}|
\end{equation*}
\begin{equation*}
\geq |\mathcal{S}_1\cup\mathcal{S}_2| + |\mathcal{S}_1\cap \mathcal{S}_2|= |\mathcal{S}_1|+|\mathcal{S}_2|
\end{equation*}
\begin{equation*}
\geq |\mathcal{F}_1|+|\mathcal{F}_2|=|\mathcal{F}|
\end{equation*}
and we are done.