First, suppose that
\((X,\mathcal{I})\) is a matroid. Clearly, since the weight function is non-negative, the optimum is achieved by a maximum weight basis. We prove that, at every stage of the algorithm, if
\(Y\) is the set of points chosen so far, then there exists a basis
\(B\) of maximum weight such that
\(Y\subseteq B\text{.}\) This is certainly true at the beginning of the algorithm when
\(Y=\emptyset\text{.}\) Now, suppose that it is true up to a certain stage, let
\(Y\) be the set of points chosen so far, and let
\(x\) be the next point to be added to
\(Y\text{.}\) Note that, by definition of the greedy algorithm,
\(x\) is a point such that
\(Y\cup\{x\}\) is independent and the weight of
\(x\) is maximized subject to this. Since the property holds so far, there exists a basis
\(B\) of maximum weight containing
\(Y\text{.}\) If
\(x\in B\text{,}\) then we are happy. If not, then, by
Property 3, we can continue to add elements from
\(B\) to
\(Y\cup\{x\}\) while staying independent until we reach an independent set
\(B'\) of the same cardinality as
\(B\text{.}\) Since
\(x\) is the only non-element of
\(B\) added during this construction, we have
\(B'=(B\setminus\{z\})\cup\{x\}\) for some
\(z\in B\text{.}\) However, recall that the greedy algorithm decided to add
\(x\) to
\(Y\) rather than
\(z\text{,}\) despite the fact that
\(Y\cup\{z\}\subseteq B\) is independent, and so we must have
\(w(x)\geq w(z)\text{.}\) This means that
\(w(B')=w(B)-w(z)+w(x)\geq w(B)\) and so, since
\(B\) is a maximum weight basis, so is
\(B'\text{.}\) Thus, at every stage, the set of points selected is contained in a maximum weight basis and so this is true when the algorithm terminates.
Now, suppose that \((X,\mathcal{I})\) is not a matroid and let \(Y,Z\in\mathcal{I}\) such that \(|Y|<|Z|\) and there does not exist \(z\in Z\) such that \(Y\cup\{z\}\in\mathcal{I}\text{.}\) We would like to set up the weight function in such a way that the best strategy would be to select all of the elements of \(Z\) but, unfortunately, the greedy algorithm is tricked into choosing the elements of \(Y\text{.}\) For this, we pick \(\varepsilon>0\) to be a small constant to be chosen later and define
\begin{equation*}
w(x):=\begin{cases} 1+\varepsilon \amp \text{if }x\in Y,\\ 1\amp \text{if }x\in Z\setminus Y,\\ 0 \amp \text{otherwise}.\end{cases}
\end{equation*}
Clearly, in the first
\(|Y|\) steps, the greedy algorithm will simply select the elements of
\(Y\) one by one. Note that, throughout these steps, the set of points selected so far remains a subset of
\(Y\) and so it is in
\(\mathcal{I}\) by
Property 2. After this stage, the algorithm may be able to select additional points of weight zero but, by our assumption on
\(Z\text{,}\) it cannot add any points from
\(Z\setminus Y\text{.}\) So, it ends up with a final weight of
\((1+\varepsilon)|Y|\text{.}\) However, the set
\(Z\) certifies that there exists a set in
\(\mathcal{I}\) of weight at least
\(|Z|\text{.}\) So, since
\(|Z|> |Y|\text{,}\) if
\(\varepsilon\) is small enough, then we get that the greedy algorithm fails to find the optimal solution to the problem.