
Section 2.3 Farkas’ Lemma and Strong Duality for Linear Programs

Our goal in this section is to prove the remarkable Strong Duality Theorem for linear programming.
The key idea underpinning the proof of the Strong Duality Theorem comes from a statement known as Farkas’ Lemma. To prove Farkas’ Lemma, we will need the following result known as the Hyperplane Separation Theorem. Recall that a set \(C\subseteq \mathbb{R}^n\) is closed if it contains all of its limit points; a limit point for \(C\) is a point \(x\in \mathbb{R}^n\) for which there exists \(x_1,x_2,\dots\in C\) converging to \(x\text{.}\) The closure \(\overline{S}\) of a set \(S\subseteq\mathbb{R}^n\) is the union of \(S\) and the set of all limit points of \(S\text{.}\) A subset of \(\mathbb{R}^n\) is compact if it is closed and bounded. An affine hyperplane is a set of the form \(\{x\in\mathbb{R}^n:w^Tx = \alpha\}\) for some fixed vector \(w\in\mathbb{R}^n\) and real number \(\alpha\text{.}\) In this sense, the next theorem says that if a point \(y\) lies outside of a closed convex set, then there is an affine hyperplane such that separates \(y\) from \(C\text{;}\) i.e. they lie on different “sides” of the hyperplane.

Theorem (Hyperplane Separation Theorem). Let \(C\subseteq\mathbb{R}^n\) be a non-empty closed convex set and let \(y\in\mathbb{R}^n\setminus C\text{.}\) Then there exist \(w\in\mathbb{R}^n\) and \(\alpha\in\mathbb{R}\) such that \(w^Ty>\alpha\) and \(w^Tx<\alpha\) for all \(x\in C\text{.}\)

Proof.

First, let us show that there exists a point \(z\in C\) such that \(\|y-z\|_2\leq \|y-x\|_2\) for all \(x\in C\text{;}\) that is, there is a point of \(C\) at minimum distance from \(y\text{.}\) Start by taking \(x_0\in C\) arbitrarily and define \(r=\|y-x_0\|_2\) and \(C' = C\cap \overline{B_r(y)}\) where \(B_r(y)=\{x: \|y-x\|_2<r\}\) is the ball of radius \(r\) around \(y\text{.}\) Then the set \(C'\) is closed and bounded, and so it is compact. Define \(f:C'\to \mathbb{R}\) by \(f(x)=\|y-x\|_2\text{.}\) This is a continuous function on a compact set, and so there exists some \(z\in C'\) at which \(f\) is minimized (by the Extreme Value Theorem, see en.wikipedia.org/wiki/Extreme_value_theorem). For any point \(x\) of \(C\setminus C'\text{,}\) the distance from \(x\) to \(y\) is greater than \(r\) by definition of \(C'\text{,}\) and so \(z\) minimizes the distance to \(y\) among all points of \(C\text{.}\)
Now, let us define the hyperplane which separates \(y\) from \(C\text{.}\) We set \(w:=2y-2z\) and \(\alpha:=y^Ty -z^Tz\text{.}\) We verify the two conditions in turn. First, we show that \(w^Ty>\alpha\text{.}\) Since \(y\neq z\) and no non-zero vector is orthogonal to itself, we have
\begin{equation*} 0< (y-z)^T(y-z) = y^Ty - 2y^Tz + z^Tz. \end{equation*}
Rearranging this yields
\begin{equation} \alpha = y^Ty -z^Tz < 2y^Ty - 2z^Ty = w^Ty\tag{2.3.1} \end{equation}
as desired.
Now, we need to show that every \(x\in C\) satisfies \(w^Tx < \alpha\text{.}\) Suppose not and let \(x\in C\) such that
\begin{equation*} 2y^Tx - 2z^Tx \geq y^Ty - z^Tz. \end{equation*}
Rearranging this yields
\begin{equation} (x-z)^T(y-z)\geq (x-y)^T(z-y).\tag{2.3.2} \end{equation}
By our choice of \(z\text{,}\) we must have
\begin{equation*} (x-y)^T(x-y) \geq (z-y)^T(z-y) \end{equation*}
which can be rearranged to yield
\begin{equation} (y-x)^T(z-x)\geq (x-z)^T(y-z).\tag{2.3.3} \end{equation}
Now, observe that the convex hull of \(\{x,y,z\}\) forms a triangle. Each triangle has at most one angle of 90 degrees or more, and so at most one of the quantities \((x-z)^T(y-z)\text{,}\) \((y-x)^T(z-x)\) or \((x-y)^T(z-y)\) can be non-positive. So, by Equation (2.3.2) and Equation (2.3.3), the first two of these must be strictly positive. Thus, the angles of the triangle at \(x\) and \(z\) are both acute. Hence, there exists a point \(v\) on the line segment between \(x\) and \(z\text{,}\) distinct from both \(z\) and \(x\text{,}\) such that the line from \(y\) to \(v\) is perpendicular to the line segment between \(x\) and \(z\text{.}\) Since \(C\) is convex and both \(x\) and \(z\) are in \(C\text{,}\) the point \(v\) lies within \(C\) as well. The convex hull of \(\{v,z,y\}\) forms a triangle with a right angle at \(v\) which, by the Pythagorean Theorem, implies that \(v\) is closer to \(y\) than \(z\) is, contradicting the choice of \(z\) and completing the proof.
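The construction in the proof can be checked numerically on a small example. The sketch below (plain Python, not part of the text) takes \(C\) to be the unit square \([0,1]^2\), a closed convex set whose nearest-point map is a coordinate-wise clamp, and builds \(w=2y-2z\) and \(\alpha=y^Ty-z^Tz\) exactly as in the proof:

```python
# Minimal numeric sketch of the separating hyperplane construction,
# with C = [0,1]^2 (closed and convex). The nearest point of a box to y
# is obtained by clamping each coordinate of y.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def clamp_to_unit_square(p):
    # closest point of C = [0,1]^2 to p
    return [min(max(t, 0.0), 1.0) for t in p]

y = [2.0, 0.5]                              # a point outside C
z = clamp_to_unit_square(y)                 # nearest point of C to y
w = [2 * (a - b) for a, b in zip(y, z)]     # w = 2y - 2z
alpha = dot(y, y) - dot(z, z)               # alpha = y^T y - z^T z

print(dot(w, y) > alpha)                    # y is strictly on one side: True

# Since w^T x is linear, checking the corners of the square suffices
# to verify w^T x < alpha on all of C.
corners = [[0, 0], [1, 0], [0, 1], [1, 1]]
print(all(dot(w, x) < alpha for x in corners))   # True
```

Here the hyperplane is \(2x_1 = 3\), which indeed separates \(y=(2,0.5)\) from the unit square.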

Lemma (Farkas’ Lemma). Let \(A\in\mathbb{R}^{m\times n}\) and \(b\in\mathbb{R}^m\text{.}\) Then exactly one of the following alternatives holds: either there exists \(x\in\mathbb{R}^n\) with \(x\geq0\) and \(Ax=b\text{,}\) or there exists \(y\in\mathbb{R}^m\) with \(y^TA\geq0\) and \(y^Tb<0\text{.}\)

Proof.

We need to prove two things: that the two alternatives cannot both be true, and that they cannot both be false. Let us start by showing that they cannot both be true. Suppose that such \(x\) and \(y\) both exist. We start with the equation
\begin{equation*} Ax=b \end{equation*}
and multiply both sides on the left by \(y^T\) to yield
\begin{equation*} y^T(Ax)=y^Tb \end{equation*}
which, by associativity, is
\begin{equation*} (y^TA)x=y^Tb. \end{equation*}
However, by definition of \(y\) and \(x\text{,}\) the left side is the inner product of two vectors with non-negative coefficients, and is therefore non-negative, whereas the right side is negative. This is a contradiction, and so the alternatives cannot both be true.
Now, let us show that at least one of the alternatives must hold. Let \(a_1,\dots,a_n\) denote the columns of \(A\) and take
\begin{equation*} C:=\left\{\sum_{j=1}^n\lambda_ja_j: \lambda_1,\dots,\lambda_n\geq0\right\}. \end{equation*}
The set \(C\) is known as the convex cone generated by \(\{a_1,\dots,a_n\}\text{.}\) The set \(C\) is easily shown to be closed and convex. If we have \(b\in C\text{,}\) then the first alternative holds by choosing \(\lambda_1,\dots,\lambda_n\geq0\) such that \(\sum_{j=1}^n\lambda_ja_j = b\) and then taking \(x_j=\lambda_j\) for \(1\leq j\leq n\text{.}\) So we may assume that \(b\notin C\text{.}\) By the Hyperplane Separation Theorem, there exist \(w\in\mathbb{R}^m\) and \(\alpha\in\mathbb{R}\) such that \(w^Tb>\alpha\) and \(w^Tx<\alpha\) for all \(x\in C\text{.}\) Since the zero vector is in \(C\text{,}\) we have \(0=w^T0<\alpha\) and so \(\alpha>0\text{.}\) Also, there cannot be any \(x\in C\) satisfying \(w^Tx>0\text{:}\) if such an \(x\) existed, then we could take \(\lambda> \frac{\alpha}{w^Tx}\) and get that \(\lambda x\in C\) and \(w^T(\lambda x) >\alpha\text{,}\) contradicting our choice of \(w\) and \(\alpha\text{.}\) Thus, we must have that \(w^Tx\leq 0\) for all \(x\in C\text{.}\) Now, to complete the proof, we take \(y=-w\text{.}\) We get
\begin{equation*} y^TA = -w^TA \geq 0 \end{equation*}
since each column \(a_j\) of \(A\) lies in \(C\) and \(w^Tx\leq 0\) for all \(x\in C\text{,}\) and
\begin{equation*} y^Tb = -w^Tb < -\alpha \leq 0 \end{equation*}
and so the proof is complete.
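The two alternatives can be illustrated numerically. The sketch below (plain Python, not part of the text) takes \(A\) to be the \(2\times2\) identity matrix, so that the cone \(C\) is the non-negative quadrant and the certificates for both alternatives can be written down by hand:

```python
# Minimal illustration of the two alternatives in Farkas' Lemma with
# A = I_2, whose column cone C is the non-negative quadrant of R^2.

def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def vec_mat(y, A):
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1, 0], [0, 1]]

# First alternative: b1 lies in the cone of the columns of A,
# so some x >= 0 solves Ax = b1.
b1 = [3, 2]
x = [3, 2]
print(mat_vec(A, x) == b1 and all(t >= 0 for t in x))   # True

# Second alternative: b2 has a negative entry, so it is not a non-negative
# combination of the standard basis vectors; y is a certificate of this.
b2 = [-1, 2]
y = [1, 0]
print(all(t >= 0 for t in vec_mat(y, A)))   # y^T A >= 0: True
print(dot(y, b2) < 0)                       # y^T b2 < 0: True
```

As the first half of the proof shows, no single \(b\) can admit both certificates at once.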

Theorem (Theorem of the Alternative). Let \(A\in\mathbb{R}^{m\times n}\) and \(b\in\mathbb{R}^m\text{.}\) Then exactly one of the following holds: either there exists \(x\in\mathbb{R}^n\) with \(Ax\leq b\text{,}\) or there exists \(y\in\mathbb{R}^m\) with \(y\geq0\text{,}\) \(y^TA=0\) and \(y^Tb<0\text{.}\)

Proof.

Apply Farkas’ Lemma to the matrix \(A' = \begin{bmatrix}A \amp -A \amp I\end{bmatrix}\) where \(I\) is the \(m\times m\) identity matrix. Think about what the two alternatives give you.
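The encoding in the hint can be sketched numerically: a solution of \(Ax\leq b\) (with \(x\) free) corresponds to a non-negative solution of \(A'z=b\) via \(x=u-v\) and slack \(s=b-Ax\). The example below (plain Python, with an \(A\), \(b\) and \(x\) chosen only for illustration) builds \(A'=\begin{bmatrix}A \amp -A \amp I\end{bmatrix}\) and checks the correspondence:

```python
# Decoding Ax <= b into a non-negative solution of [A -A I] z = b:
# split x into positive and negative parts and append the slack s = b - Ax.

def mat_vec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1, 1], [-1, 2]]
b = [4, 3]
x = [1, 1]                                  # satisfies Ax = [2, 1] <= b

u = [max(t, 0) for t in x]                  # positive part of x
v = [max(-t, 0) for t in x]                 # negative part of x
s = [bi - ri for bi, ri in zip(b, mat_vec(A, x))]   # slack, non-negative

# build A' = [A  -A  I] row by row
m = len(A)
identity = [[1 if i == j else 0 for j in range(m)] for i in range(m)]
A_prime = [A[i] + [-a for a in A[i]] + identity[i] for i in range(m)]
z = u + v + s                               # candidate for Farkas' first alternative

print(all(t >= 0 for t in z))               # True
print(mat_vec(A_prime, z) == b)             # A'z = b: True
```

Conversely, a non-negative solution \(z=(u,v,s)\) of \(A'z=b\) yields \(x=u-v\) with \(Ax=b-s\leq b\text{.}\)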
Finally, we prove the Strong Duality Theorem for Linear Programming.

Proof of the Strong Duality Theorem for Linear Programming.

We already know that the value of the primal is at most the value of the dual by the Weak Duality Theorem. So, all that remains is to prove that the value of the primal is at least the value of the dual. We claim that it suffices to find a primal feasible \(x\) and a dual feasible \(y\) such that
\begin{equation*} c^Tx \geq b^Ty. \end{equation*}
Indeed, if such a pair of vectors can be found, then the maximum value of the primal objective function is immediately at least as large as the minimum value of the dual, which is exactly what we want. So, what we want to show is that the following polyhedron in the variables \(x_1,\dots,x_n,y_1,\dots,y_m\) is non-empty:
\begin{gather*} Ax\leq b\\ x\geq 0\\ A^Ty\geq c\\ y\geq 0\\ c^Tx \geq b^Ty. \end{gather*}
Rearranging these constraints to put the variables on the left side gives us
\begin{gather*} Ax\leq b\\ -A^Ty\leq -c\\ -c^Tx + b^Ty \leq 0\\ -x\leq 0\\ -y\leq 0. \end{gather*}
Let us now define a matrix \(M\) which allows us to express all of these constraints simultaneously. This will be a \((2n+2m+1)\times (n+m)\) matrix. The first \(m\) rows consist of a copy of \(A\) followed by \(m\) columns of zeros. The next \(n\) rows consist of \(n\) columns of zeros followed by the matrix \(-A^T\text{.}\) Next, we have a row consisting of the vector \(-c^T\) followed by the vector \(b^T\text{.}\) Finally, the last \(n+m\) rows are the negative of the identity matrix of dimension \(n+m\text{.}\) Also, we let \(d\in \mathbb{R}^{2n+2m+1}\) be the vector whose first \(m\) entries are equal to \(b\text{,}\) whose next \(n\) entries are equal to \(-c\text{,}\) and whose remaining entries are zero. If there exists a solution \(z\in\mathbb{R}^{n+m}\) to \(Mz\leq d\text{,}\) then taking \(x\) and \(y\) to be the vectors formed by the first \(n\) and last \(m\) entries of \(z\text{,}\) respectively, satisfies all of the above constraints, which gives us what we want. Otherwise, by the Theorem of the Alternative, there exists a vector \(w\in\mathbb{R}^{2n+2m+1}\) such that \(w^TM=0\text{,}\) \(w^Td< 0\) and \(w\geq0\text{.}\) Let us write \(w=\begin{bmatrix}p^T\amp q^T\amp r\amp s^T\amp t^T\end{bmatrix}^T\) where \(p,t\in\mathbb{R}^m\text{,}\) \(q,s\in\mathbb{R}^n\) and \(r\in\mathbb{R}\text{.}\) Our conditions on \(w\) tell us that
\begin{equation*} w^TM=0 \Longrightarrow \begin{cases}p^TA -rc^T-s^T=0\\ -q^TA^T+rb^T-t^T=0\end{cases} \end{equation*}
\begin{equation*} w^Td<0\Longrightarrow p^Tb - q^Tc < 0. \end{equation*}
By taking transposes, we can rewrite these equalities and inequalities as
\begin{equation*} A^Tp -rc-s=0 \end{equation*}
\begin{equation*} -Aq+rb-t=0 \end{equation*}
\begin{equation*} b^Tp - c^Tq < 0. \end{equation*}
We divide the proof into cases based on the value of \(r\text{.}\)

Case.

\(r=0.\)
In this case, we have \(A^Tp=s\) and \(Aq=-t\text{.}\) Let \(x\) and \(y\) be any feasible points for the primal and dual, respectively. Then, since the entries of \(q\) and \(t\) are non-negative, for any \(\alpha\geq0\text{,}\) we have \(A(x+\alpha q)= Ax+\alpha Aq\leq b\) and so \(x+\alpha q\) is primal feasible. Similarly, \(y+\alpha p\) is dual feasible. Combining this with the Weak Duality Theorem, we have that the following holds for all \(\alpha\geq0\)
\begin{equation*} c^T(x + \alpha q)\leq b^T(y+\alpha p). \end{equation*}
Rearranging, we get
\begin{equation*} c^Tx -b^Ty\leq \alpha (b^Tp - c^Tq). \end{equation*}
Now, recall that \(b^Tp - c^Tq<0\text{.}\) This is a contradiction: the left side is a fixed number while, by choosing \(\alpha\) sufficiently large, the right side can be made an arbitrarily large negative number. This contradiction completes the proof in this case.

Case.

\(r\neq0.\)
Note that \(r>0\) since the entries of \(w\) are all non-negative. We claim that the vector \(r^{-1} q\) is primal feasible. Indeed, \(r^{-1}q\geq0\) and \(A(r^{-1}q) = b - r^{-1}t\) where \(r\) and the entries of \(t\) are non-negative, and so \(A(r^{-1}q)\leq b\text{.}\) Similarly, \(A^T(r^{-1}p) = c + r^{-1}s\geq c\) and so \(r^{-1}p\) is dual feasible. However, dividing the inequality \(b^Tp - c^Tq < 0\) deduced earlier by \(r\) yields \(c^T(r^{-1}q) > b^T(r^{-1}p)\text{,}\) which contradicts the Weak Duality Theorem, and so the proof is complete.
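The block matrix \(M\) and vector \(d\) from the proof can be built and checked numerically. The sketch below (plain Python, not part of the text) uses the tiny LP \(\max x_1+x_2\) subject to \(x\leq(1,1)\text{,}\) \(x\geq0\text{,}\) whose optimal primal and dual solutions are \(x=(1,1)\) and \(y=(1,1)\) (easy to verify by hand), and confirms that \(z=(x,y)\) satisfies \(Mz\leq d\text{:}\)

```python
# Build M and d as described in the proof for a tiny LP and verify that
# stacking an optimal primal-dual pair gives a point of the polyhedron.

def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def identity(k):
    return [[1 if i == j else 0 for j in range(k)] for i in range(k)]

A = [[1, 0], [0, 1]]
b = [1, 1]
c = [1, 1]
m, n = len(A), len(A[0])
AT = [list(col) for col in zip(*A)]                       # transpose of A

M = (
    [A[i] + [0] * m for i in range(m)]                    # [ A    0  ]
    + [[0] * n + [-a for a in AT[j]] for j in range(n)]   # [ 0  -A^T ]
    + [[-ci for ci in c] + b]                             # [-c^T  b^T]
    + [[-e for e in row] for row in identity(n + m)]      # [   -I    ]
)
d = b + [-ci for ci in c] + [0] * (1 + n + m)

z = [1, 1, 1, 1]   # the optimal x and y stacked into one vector
print(len(M) == 2 * n + 2 * m + 1)                            # True
print(all(lhs <= rhs for lhs, rhs in zip(mat_vec(M, z), d)))  # Mz <= d: True
```

The middle row of \(Mz\) evaluates to \(-c^Tx+b^Ty=0\text{,}\) reflecting the fact that the primal and dual optima coincide.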
The problem of showing that the value of a linear program is at least a particular value, say \(\alpha\text{,}\) is clearly in NP because one can simply take any feasible point achieving that value as a certificate. Interestingly, the Strong Duality Theorem tells us that, for feasible LPs with feasible duals, this problem is also in co-NP. As we mentioned in our discussion of Problem 1.4.8, a problem being in NP and co-NP is very good evidence that it is polynomial-time solvable. So, it is perhaps not surprising (but still, not easy to prove) that linear programming problems can be solved in polynomial time.