Lagrangian duality in convex analysis: A (not so) summarized account

I've been dabbling in some convex optimization and decided to write this post to summarize the ideas in a (hopefully) clear and linear fashion. The material is mostly straight out of Boyd and Vandenberghe (freely available), inspired by this very helpful MSE post. Let's get started.

We'll work with $\mathbb{R}^n$ for concreteness, though the theory extends to much more general contexts (such as Banach spaces). I presume you know a convex set $C \subseteq \mathbb{R}^n$ is one for which $x, y \in C$ and $0 \leq t \leq 1$ implies $tx + (1-t)y \in C$, and that a real-valued function $f$ is convex if its domain is convex and \[ f(tx + (1-t)y) \leq tf(x) + (1-t)f(y) \] for $x, y \in \text{dom}(f)$ and $0 \leq t \leq 1$ as before. I said $f$ is real-valued, but we also allow it to take on $\pm \infty$, adopting the standard conventions for arithmetic and inequalities with infinities, which will be useful later. Because of that, we define a proper convex function to be one which never equals $-\infty$ and attains at least one finite value (i.e. it's not constant equal to $+\infty$). These are the functions for which the problem of minimizing is sensible.
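To make the definition concrete, here is a quick numerical sanity check of the convexity inequality (a small sketch of my own in Python; the function $f(x) = \|x\|^2$ and the random sampling are just stand-ins, not anything canonical):

```python
import numpy as np

# Check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) on random segments,
# using the convex function f(x) = ||x||^2 as a stand-in example.
def f(x):
    return np.dot(x, x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    t = rng.uniform()
    assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9
```

Of course, such a check can only falsify convexity, never prove it.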

Hyperplanes, halfspaces and separation

A very important aspect of convex sets is that they can be (more or less) characterized by halfspaces which contain them. We think of a hyperplane as a set of the form $\{x\ |\ a^T x = b\}$ for $a \neq 0$ a vector and $b$ a scalar[1]. A halfspace is then of the form $\{x\ |\ a^T x \leq b\}$ (or $a^T x \geq b$, of course). Since halfspaces are closed and convex, it follows easily that an arbitrary intersection of halfspaces is a closed convex set. It turns out the converse is true: every closed convex set is an intersection of the halfspaces containing it! This follows from the

Separating hyperplane theorem: If $C, D$ are nonempty disjoint convex sets, there exists a hyperplane which separates them; that is, there exists a nonzero vector $a$ and a scalar $b$ such that $\{x\ |\ a^T x \leq b\}$ contains $C$ and $\{x\ |\ a^T x \geq b\}$ contains $D$.
Proof: Take \[ K = C + (-D) = \{c - d\ |\ c \in C, d \in D\} \] (which is convex). The set $\overline{K}$, being closed and non-empty, has a vector $v \in \overline{K}$ of minimum norm. It then holds that[2] \[ w^T v \geq |v|^2 \] for all $w \in \overline{K}$. This means \[ c^T v \geq \left( \sup_{d \in D} d^T v \right) + |v|^2 \] for all $c \in C$ (since the inequality without the supremum holds for each $d \in D$), and certainly \[ d^T v \leq \left( \sup_{d \in D} d^T v \right) \] for all $d \in D$. Thus if $v \neq 0$ we have struck gold and separated them already: take $a = v$ and $b = \sup_{d \in D} d^T v + |v|^2/2$ (this corresponds to the case where the minimum possible distance between points of $C$ and $D$ is achieved). In this case, note we separated them strictly (i.e. neither set intersects the hyperplane $\{x\ |\ a^T x = b\}$), since $|v|^2 > 0$.

For the general case, we do a trick. If $\text{int}\ K \neq \emptyset$, write \[ \text{int}\ K = \bigcup_{n=1}^{\infty} K_n \] as the increasing union of compact convex sets $K_n$[3]. Let $v_n \in K_n$ be a vector of minimum norm for each of them; we know $v_n \neq 0$ (since $0 \notin K$, as $C$ and $D$ are disjoint) and $w^T v_n \geq |v_n|^2$ for $w \in K_n$. Normalizing the $v_n$'s, we may pass to a convergent subsequence (since the unit sphere is compact -- hooray for finite dimensions), say $v_n/|v_n| \to v$. We have $v \neq 0$ and $w^T v \geq 0$ for all $w \in K$: each $w \in \text{int}\ K$ lies in $K_n$ for all large $n$, so letting $n \to \infty$ in $w^T (v_n/|v_n|) \geq |v_n| \geq 0$ gives $w^T v \geq 0$, and this extends to $K \subseteq \text{cl}(\text{int}\ K)$ by continuity. Then, again, we have \[ c^T v \geq \left( \sup_{d \in D} d^T v \right) \] and \[ d^T v \leq \left( \sup_{d \in D} d^T v \right) \] for all $c \in C$ and $d \in D$, thus separating them. Finally, if $\text{int}\ K = \emptyset$, then $K$ must be contained in a hyperplane (otherwise there would be enough independent points to span a simplex). This means there is a non-zero vector $v$ and a scalar $\beta$ such that $w^T v \geq \beta$ for all $w \in K$ (in fact with equality), and the proof finishes exactly as before. $\square$

This theorem implies that if $C$ is closed and convex, and $p \not\in C$, then there is a hyperplane strictly separating them (we land in the first case of the proof: $K = C - p$ is already closed and does not contain $0$, so $v \neq 0$). Intersecting corresponding halfspaces over all points $p \not\in C$ shows that $C$ is an intersection of halfspaces, as we claimed. Another consequence is the
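Here is a small numerical illustration of strictly separating a point from a closed convex set, in the spirit of the minimum-norm argument above (a sketch of my own; the closed unit ball and the point $p$ are arbitrary choices):

```python
import numpy as np

# Separate p from the closed unit ball C: project p onto C, take the
# normal direction a = p - proj, and place the hyperplane a^T x = b midway.
p = np.array([2.0, 1.0])
proj = p / np.linalg.norm(p)                  # projection of p onto C
a = p - proj                                  # normal vector of the hyperplane
b = a @ (p + proj) / 2                        # offset: hyperplane sits midway

rng = np.random.default_rng(1)
samples = rng.normal(size=(1000, 2))
norms = np.linalg.norm(samples, axis=1, keepdims=True)
samples = samples / np.maximum(norms, 1.0)    # push the samples into C
assert np.all(samples @ a < b) and a @ p > b  # C and p on opposite (open) sides
```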

Supporting hyperplane theorem: If $C$ is convex and $x_0 \in \text{bd}\ C = \text{cl}\ C - \text{int}\ C$ then there exists a supporting hyperplane at $x_0$, i.e. there exists a vector $a \neq 0$ such that $\{x\ |\ a^T x \leq a^T x_0\}$ contains $C$ (the supporting hyperplane is then $\{x\ |\ a^T x = a^T x_0\}$).
Proof: If $\text{int}\ C$ is non-empty, it suffices to separate $\text{int}\ C$ and $\{x_0\}$. If $\text{int}\ C = \emptyset$ then $C$ must be contained in a hyperplane, and so just pick that hyperplane. $\square$

You should picture the supporting hyperplane as tangent to $C$ at $x_0$, with $a$ as an outward normal vector.

Understanding convex functions

One relationship between the concepts of convex function and convex set is via the epigraph of a function $f$, which is the region above the graph of $f$: \[ \text{epi}(f) = \{(x,t)\ |\ x \in \text{dom}(f), t \geq f(x)\}\] You may check that a function is convex if and only if its epigraph is a convex set (incidentally, this leads to a nice proof of the fact that the pointwise supremum of an arbitrary family of convex functions is convex -- do you see why?). Under the philosophy of characterizing a closed convex set by the halfspaces which contain it, the following fact really ties the room together:

Fact: Given a convex $f$, define $\widetilde{f}$ by \[ \widetilde{f}(x) = \sup \{ g(x)\ |\ g\ \text{affine,}\ g(z) \leq f(z)\ \text{for all}\ z\} \] Then $f = \widetilde{f}$ if and only if $f$ is proper and closed (i.e. $\text{epi}(f)$ is a closed set).[4]

Building on this idea of characterizing a convex function by the affine maps that globally underestimate it, we can ask: given an $f$, what is the best affine global underestimator of slope $y$? Such a function would be of the form $y^T x - \alpha$, and it's an underestimator if (and only if) \[ f(x) \geq y^T x - \alpha \] for all $x \in \text{dom}(f)$ (or we can say for all $x$, with the tacit understanding that $f$ is $\infty$ wherever undefined). This happens if and only if \[ \alpha \geq \sup_{x} y^T x - f(x) \] and therefore it is sensible to define the conjugate of $f$ as \[ f^*(y) = \sup_{x} y^T x - f(x) \] Then the best global affine underestimator for $f$ with slope $y$ is \[ y^T x - f^*(y) \] Since $f^*(y)$ is the pointwise supremum of a family of affine functions of $y$, it follows $f^*$ is convex. The transform $f \mapsto f^*$ is also known as the Legendre transform (or Legendre-Fenchel transform). Note \[ f^{**}(x) = \sup_{y} y^T x - f^*(y) = \widetilde{f}(x) \] and so we obtain the fact that, for a convex $f$, one has $f = f^{**}$ if and only if $f$ is proper and closed (while $f^{**} \leq f$ always holds).
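As a quick numerical illustration of these definitions (a sketch of my own on a 1-D grid; the function $f(x) = x^2/2$ is chosen because its conjugate is known in closed form), one can approximate $f^*$ and $f^{**}$ directly from the defining suprema:

```python
import numpy as np

# f(x) = x^2/2 on a grid; its conjugate is f*(y) = y^2/2 (f is
# self-conjugate), and since f is proper and closed, f** recovers f.
xs = np.linspace(-10, 10, 4001)
f = xs**2 / 2

ys = np.linspace(-3, 3, 61)
f_star = np.array([np.max(y * xs - f) for y in ys])     # f*(y) = sup_x (y x - f(x))
assert np.allclose(f_star, ys**2 / 2, atol=1e-3)

f_ss = np.array([np.max(x * ys - f_star) for x in ys])  # f**(x) on the same grid
assert np.allclose(f_ss, ys**2 / 2, atol=1e-3)          # f** = f
```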

Convexity shines when you bring in calculus. It's not hard to show that a differentiable function $f$ is convex if and only if \[ f(y) \geq f(x) + \nabla f(x)^T (y - x) \] for all $x, y \in \text{dom}(f)$ (this is saying the graphs of linear approximations to $f$ form supporting hyperplanes to its epigraph)[5]. Hence if $\nabla f(x) = 0$ it follows $f(x)$ is a global minimum of $f$. That's why convex functions are so pleasant to minimize: they are the calculus student's dream, where differentiating and setting equal to 0 is guaranteed (if a solution exists!) to give you a global minimum. It is similarly pleasant to maximize differentiable concave functions $f$ (i.e. those for which $-f$ is convex).
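For instance (a minimal sketch of my own, with a least-squares objective standing in for a generic differentiable convex function), setting the gradient of $f(x) = \|Ax - b\|^2$ to zero gives the normal equations, and the resulting stationary point really is a global minimizer:

```python
import numpy as np

# f(x) = ||Ax - b||^2 is convex; gradient zero means A^T A x = A^T b.
rng = np.random.default_rng(2)
A, b = rng.normal(size=(20, 5)), rng.normal(size=20)

x_star = np.linalg.solve(A.T @ A, A.T @ b)          # stationary point
f_star = np.sum((A @ x_star - b) ** 2)

trials = x_star + rng.normal(size=(1000, 5))        # random competitors
f_trials = np.sum((trials @ A.T - b) ** 2, axis=1)
assert np.all(f_trials >= f_star - 1e-9)            # no competitor does better
```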

Optimization problems

An optimization problem is of the form: \begin{equation*} \begin{split} \text{minimize}\ & f_0(x) \\ \text{subject to}\ & f_i(x) \leq 0,\ i = 1, \ldots, n \\ & h_k(x) = 0,\ k = 1, \ldots, m \end{split} \end{equation*} The function $f_0$ to be minimized is called the objective function, and the $f_i$'s and $h_k$'s are the constraints. The domain $\mathcal{D}$ of the problem is the intersection of the domains of $f_0, f_1, \ldots, f_n, h_1, \ldots, h_m$. The points of $\mathcal{D}$ satisfying the constraints are called feasible, and the goal is to find an optimal point $x^{*}$, i.e. one which attains the optimal value \[ p^* = \inf \{ f_0(x)\ |\ f_i(x) \leq 0, h_k(x) = 0 \} \] Thus $p^* = -\infty$ if $f_0$ is unbounded below over the feasible set, and $p^* = \infty$ if the feasible set is empty. A convex optimization problem is one for which $f_0, f_1, \ldots, f_n$ are convex and $h_1, \ldots, h_m$ are affine, so that it is of the form \begin{equation*} \begin{split} \text{minimize}\ & f_0(x) \\ \text{subject to}\ & f_i(x) \leq 0,\ i = 1, \ldots, n \\ & a^T_k x = b_k,\ k = 1, \ldots, m \end{split} \end{equation*} The last constraint could equivalently be stated as $Ax = b$ for $A = [a^T_k]$ and $b = (b_k)$. Note the domain of the problem $\mathcal{D}$ is convex; therefore, at a conceptual level, a convex optimization problem is one of minimizing a convex function over a convex set. Similarly, the problem of maximizing a concave function $f_0$ (over a convex set) is also considered convex, since it is equivalent to minimizing $-f_0$.
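In code, such a problem can be stated almost verbatim. Here is a sketch using the cvxpy modeling package (my choice of tooling, not the post's): minimize $\|x\|^2$ subject to $x \succcurlyeq 0$ and $\mathbf{1}^T x = 1$, i.e. $f_0(x) = \|x\|^2$, $f_i(x) = -x_i$, and a single affine equality constraint.

```python
import cvxpy as cp

# minimize ||x||^2  subject to  -x_i <= 0 (i = 1..n)  and  1^T x = 1.
n = 5
x = cp.Variable(n)
constraints = [-x <= 0, cp.sum(x) == 1]
prob = cp.Problem(cp.Minimize(cp.sum_squares(x)), constraints)
p_star = prob.solve()
print(p_star, x.value)   # optimal value 1/n = 0.2, attained at x = (0.2, ..., 0.2)
```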

An example of particular interest is a linear program, which is of the form: \begin{equation*} \begin{split} \text{minimize}\ & c^T x \\ \text{subject to}\ & Gx \preccurlyeq h \\ & Ax = b \end{split} \end{equation*} where $G, A$ are linear maps and $\preccurlyeq$ refers to coordinate-wise inequality of vectors. We could have taken the objective function to be $c^T x + d$, since it would not affect what the optimal points are.
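For concreteness, here is a toy linear program in exactly this form, solved with scipy (the data and the tooling are my own choices):

```python
import numpy as np
from scipy.optimize import linprog

# minimize -x1 - 2*x2  subject to  x1 + x2 <= 4, x1 <= 3, -x1 <= 0, -x2 <= 0,
# i.e. Gx <= h coordinate-wise, with no equality constraints.
c = np.array([-1.0, -2.0])
G = np.array([[1.0, 1.0], [1.0, 0.0], [-1.0, 0.0], [0.0, -1.0]])
h = np.array([4.0, 3.0, 0.0, 0.0])
res = linprog(c, A_ub=G, b_ub=h, bounds=[(None, None)] * 2)
print(res.fun, res.x)    # optimal value -8, attained at x = (0, 4)
```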

Lagrangians and duality

We may associate to an arbitrary optimization problem of the form \begin{equation*} \begin{split} \text{minimize}\ & f_0(x) \\ \text{subject to}\ & f_i(x) \leq 0,\ i = 1, \ldots, n \\ & h_k(x) = 0,\ k = 1, \ldots, m \end{split} \end{equation*} the Lagrangian \[ L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{n} \lambda_i f_i(x) + \sum_{k=1}^{m} \nu_k h_k(x) \] This is a sort of linearization of the constraints (but I strongly recommend reading the aforementioned post to understand its motivation). To see the usefulness, we further define the dual function \[ g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \] (recall $\mathcal{D}$ is the domain of the optimization problem, i.e. where the objective and constraint functions are defined). Since $g$ is the pointwise infimum of a family of affine functions of $(\lambda, \nu)$, it follows $g$ is concave. Its interest comes from the fact that, if $\lambda \succcurlyeq 0$ (and $\nu$ is arbitrary), then \[ g(\lambda, \nu) \leq p^* \] (where $p^*$ is the optimal value of the optimization problem). Indeed, for any feasible point $\widetilde{x}$, we have \[L(\widetilde{x}, \lambda, \nu) = f_0(\widetilde{x}) + \sum_{i=1}^{n} \lambda_i f_i(\widetilde{x}) \leq f_0(\widetilde{x}) \] since $h_k(\widetilde{x}) = 0$, $f_i(\widetilde{x}) \leq 0$ and $\lambda_i \geq 0$. Thus \[ g(\lambda, \nu) \leq L(\widetilde{x}, \lambda, \nu) \leq f_0(\widetilde{x}) \] and since $g(\lambda, \nu) \leq f_0(\widetilde{x})$ for every feasible $\widetilde{x}$ it follows $g(\lambda, \nu) \leq p^*$ as desired.
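As a tiny worked example (a standard toy problem, chosen by me for illustration): minimize $x^2$ over $x \in \mathbb{R}$ subject to $1 - x \leq 0$, so that $p^* = 1$. The Lagrangian is \[ L(x, \lambda) = x^2 + \lambda(1 - x) \] and minimizing over $x$ (the minimum is at $x = \lambda/2$) gives \[ g(\lambda) = \lambda - \frac{\lambda^2}{4} \] Since $\lambda - \lambda^2/4 = 1 - (\lambda/2 - 1)^2$, we indeed have $g(\lambda) \leq 1 = p^*$ for every $\lambda$, with equality exactly at $\lambda = 2$.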

The dual function thereby allows us to obtain lower bounds on $p^*$. At this stage we ask: what is the best lower bound one can obtain in this way? This naturally leads us to the dual problem: \begin{equation*} \begin{split} \text{maximize}\ & g(\lambda, \nu) \\ \text{subject to}\ & \lambda \succcurlyeq 0 \end{split} \end{equation*} Note this is a convex optimization problem (since $g$ is concave) regardless of whether the original (also called primal) problem was so. Denoting by $d^*$ the optimal value of the dual problem, we have already established that \[ d^* \leq p^* \] This fact is known as weak duality. When equality holds, we unsurprisingly call it strong duality. We'd very much like to know when strong duality holds.
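Continuing the toy example above (minimize $x^2$ subject to $1 - x \leq 0$, with $p^* = 1$ and $g(\lambda) = \lambda - \lambda^2/4$), here is a quick numerical check of weak and strong duality (a sketch of my own):

```python
import numpy as np

# g(lambda) = lambda - lambda^2/4 for "minimize x^2 subject to x >= 1" (p* = 1).
lambdas = np.linspace(0, 10, 100001)
g = lambdas - lambdas**2 / 4
d_star = g.max()                        # dual optimal value, attained at lambda = 2
assert np.all(g <= 1.0 + 1e-12)         # weak duality: g(lambda) <= p*
assert np.isclose(d_star, 1.0)          # strong duality: d* = p* = 1
```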

Slater's condition

Slater's condition gives us a (fairly weak) sufficient condition for a convex optimization problem to satisfy strong duality.

Theorem (Slater's condition): Consider a convex optimization problem of the form \begin{equation*} \begin{split} \text{minimize}\ & f_0(x) \\ \text{subject to}\ & f_i(x) \leq 0,\ i = 1, \ldots, n \\ & A x = b \end{split} \end{equation*} If there exists a point $x \in \text{relint}\ \mathcal{D}$ (the interior of $\mathcal{D}$ within its affine hull, the smallest affine subspace containing it) such that \[ \begin{split} f_i(x) & < 0,\ i = 1, \ldots, n \\ Ax & = b \end{split} \] (i.e. it satisfies the inequality constraints strictly), then strong duality holds, i.e. $d^* = p^*$.

Remark: Such a point $x$ is called strictly feasible. Furthermore, strong duality still holds if some $x \in \text{relint}\ \mathcal{D}$ is (non-strictly) feasible and satisfies $f_i(x) < 0$ for the non-affine $f_i$. This is known as the weak form of Slater's condition.[6]

Proof: We may without loss of generality restrict the problem to the affine hull of $\mathcal{D}$ by pre-composing with an affine isomorphism from a (potentially) lower-dimensional Euclidean space, maintaining convexity. We thus assume $\text{relint}\ \mathcal{D} = \text{int}\ \mathcal{D}$. We may also assume the matrix $A$ has full rank $r$, since we might as well restrict its codomain to its image (and post-compose with a linear isomorphism accordingly).

Define \[ \mathcal{A} = \{(u,v,t)\ |\ \exists x \in \mathcal{D},\ f_i(x) \leq u_i\ (1 \leq i \leq n),\ Ax - b = v,\ f_0(x) \leq t\} \subseteq \mathbb{R}^n \times \mathbb{R}^r \times \mathbb{R}\] which is convex, and \[ \mathcal{B} = \{(0, 0, s)\ |\ s < p^*\} = \{0\} \times \{0\} \times (-\infty, p^*) \subseteq \mathbb{R}^n \times \mathbb{R}^r \times \mathbb{R} \] which is also convex. Clearly $\mathcal{A}$ is non-empty (assuming $\mathcal{D}$ is, otherwise we wouldn't have a problem to begin with), and $\mathcal{B}$ is only empty if $p^* = -\infty$; but in that case strong duality is automatic from weak duality. So we assume $\mathcal{A}, \mathcal{B}$ are non-empty; note also they are disjoint (a point in the intersection would yield a feasible $x$ with $f_0(x) < p^*$, contradicting the optimality of $p^*$). Note, finally, that $p^*$ is a finite number: the Slater point guarantees the non-emptiness of the feasible set, so $p^* < \infty$, and we have already dispensed with the case $p^* = -\infty$.

There then exists a separating hyperplane, i.e. $(\widetilde{\lambda}, \widetilde{\nu}, \mu) \neq 0$ and a scalar $\alpha$ such that \[ \begin{split} (u, v, t) \in \mathcal{A} & \implies \widetilde{\lambda}^T u + \widetilde{\nu}^T v + \mu t \geq \alpha \\ (u,v,t) \in \mathcal{B} & \implies \widetilde{\lambda}^T u + \widetilde{\nu}^T v + \mu t \leq \alpha \end{split} \] If $(u,v,t) \in \mathcal{A}$, we also have $(u',v,t') \in \mathcal{A}$ for any $u' \succcurlyeq u$ and $t' \geq t$. Therefore, we must have $\widetilde{\lambda} \succcurlyeq 0$ and $\mu \geq 0$; otherwise, the left hand side of the first inequality above would be unbounded below over $\mathcal{A}$. Now, for $(u,v,t) = (0,0,t) \in \mathcal{B}$, we have \[ \alpha \geq \widetilde{\lambda}^T u + \widetilde{\nu}^T v + \mu t = \mu t \] Taking the supremum over $t < p^*$, this is equivalent to $\mu p^* \leq \alpha$. If $x \in \mathcal{D}$, then $(f_1(x), \ldots, f_n(x), Ax - b, f_0(x)) \in \mathcal{A}$, and therefore \[ \sum_{i=1}^{n} \widetilde{\lambda}_i f_i(x) + \widetilde{\nu}^T (Ax - b) + \mu f_0(x) \geq \alpha \geq \mu p^* \] This is great if $\mu > 0$; in that case, dividing through we get \[ L(x, \widetilde{\lambda}/\mu, \widetilde{\nu}/\mu) \geq p^* \] which holds for all $x \in \mathcal{D}$. Therefore $g(\widetilde{\lambda}/\mu, \widetilde{\nu}/\mu) \geq p^*$, and thus strong duality holds (note that, in fact, the dual optimal value is achieved).

The point of Slater's condition is to rule out the $\mu = 0$ case. In that case, our previous argument shows \[ \sum_{i=1}^{n} \widetilde{\lambda}_i f_i(x) + \widetilde{\nu}^T (Ax - b) \geq 0 \] for $x \in \mathcal{D}$. Now (finally) let $\widetilde{x} \in \text{int}\ \mathcal{D}$ be a point satisfying the Slater conditions, i.e. $f_i(\widetilde{x}) < 0$ for $i = 1, \ldots, n$ and $A\widetilde{x} = b$. Thus we have \[ \sum_{i=1}^{n} \widetilde{\lambda}_i f_i(\widetilde{x}) \geq 0 \] Since $\widetilde{\lambda}_i \geq 0$ and $f_i(\widetilde{x}) < 0$ it follows $\widetilde{\lambda} = 0$. Thus \[ \widetilde{\nu}^T (Ax - b) \geq 0 \] for all $x \in \mathcal{D}$. We must have $\widetilde{\nu} \neq 0$ (since $(\widetilde{\lambda}, \widetilde{\nu}, \mu) \neq 0$ and the other two are zero). But $\widetilde{x}$ satisfies \[ \widetilde{\nu}^T (A\widetilde{x} - b) = 0 \] while belonging to the interior of $\mathcal{D}$. Given the full rank of $A$, we may perturb $\widetilde{x}$ by an arbitrarily small amount to make it so that \[ \widetilde{\nu}^T (A\widetilde{x}' - b) < 0 \] causing a contradiction. Therefore we could not have had $\mu = 0$, finishing the proof. $\square$
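To see the theorem in action, here is a sketch (my own, assuming the cvxpy package) of a problem with an obvious strictly feasible point, for which the solver-reported multiplier indeed closes the duality gap: minimize $\|x - \mathbf{1}\|^2$ subject to $\mathbf{1}^T x \leq 1$ (the point $x = 0$ is strictly feasible, so Slater's condition applies).

```python
import numpy as np
import cvxpy as cp

# minimize ||x - 1||^2  subject to  1^T x - 1 <= 0  (x = 0 is strictly feasible).
n = 5
a = np.ones(n)
x = cp.Variable(n)
ineq = cp.sum(x) - 1 <= 0
prob = cp.Problem(cp.Minimize(cp.sum_squares(x - a)), [ineq])
p_star = prob.solve()

lam = float(ineq.dual_value)            # optimal Lagrange multiplier (>= 0)
# g(lam) = inf_x ||x - a||^2 + lam*(1^T x - 1), attained at x = a - (lam/2)*1:
g = -n * lam**2 / 4 + lam * (n - 1)
print(p_star, g)                        # both ~ (n-1)^2/n = 3.2: d* = p*, no gap
```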

KKT conditions

To recap, we started with an arbitrary optimization problem \begin{equation*} \begin{split} \text{minimize}\ & f_0(x) \\ \text{subject to}\ & f_i(x) \leq 0,\ i = 1, \ldots, n \\ & h_k(x) = 0,\ k = 1, \ldots, m \end{split} \end{equation*} and associated the Lagrangian \[ L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{n} \lambda_i f_i(x) + \sum_{k=1}^{m} \nu_k h_k(x) \] from which we extract the dual function \[ g(\lambda, \nu) = \inf_{x \in \mathcal{D}} L(x, \lambda, \nu) \] The dual problem is to maximize $g$ over $\lambda \succcurlyeq 0$. Suppose both the primal and dual optimal values are attained and equal; take $x^*$ a primal feasible point such that $f_0(x^*) = p^*$ and $(\lambda^*, \nu^*)$ dual feasible such that $g(\lambda^*, \nu^*) = d^* = p^*$. Then \[ \begin{split} f_0(x^*) & = g(\lambda^*, \nu^*) \\ & = \inf_{x} L(x, \lambda^*, \nu^*) \\ & \leq f_0(x^*) + \sum_{i=1}^{n} \lambda^*_i f_i(x^*) + \sum_{k=1}^{m} \nu^*_k h_k(x^*) \\ & \leq f_0(x^*) \end{split} \] where the last inequality holds because the sum involving $h_k$ is zero (since $h_k(x^*) = 0$) and the sum involving $f_i$ is non-positive (since $\lambda^*_i \geq 0$ and $f_i(x^*) \leq 0$). As the chain starts and ends at $f_0(x^*)$, every inequality in it must actually be an equality. Thus we see \[ g(\lambda^*, \nu^*) = L(x^*, \lambda^*, \nu^*) \] and \[ \sum_{i=1}^{n} \lambda^*_i f_i(x^*) = 0 \] Since each term is non-positive, this means $\lambda^*_i f_i(x^*) = 0$ for $i = 1, \ldots, n$. In other words, the (dual) optimal Lagrange multiplier must be $0$ for each inactive inequality constraint (i.e. whenever $f_i(x^*) < 0$); this is known as complementary slackness.

Now assume further that $f_0, f_1, \ldots, f_n, h_1, \ldots, h_m$ are differentiable (with open domains). As we've seen, $x^*$ minimizes $L(x, \lambda^*, \nu^*)$ over $x$, and thus \[ 0 = \nabla_x L(x^*, \lambda^*, \nu^*) = \nabla f_0(x^*) + \sum_{i=1}^{n} \lambda^*_i \nabla f_i(x^*) + \sum_{k=1}^{m} \nu^*_k \nabla h_k(x^*) \] We can conclude that in any optimization problem with differentiable objective and constraint functions where strong duality holds (and is attained), primal and dual optimal points $x^*, \lambda^*, \nu^*$ must satisfy the KKT conditions: \[ \begin{split} f_i(x^*) & \leq 0,\ i = 1, \ldots, n \\ h_k(x^*) & = 0,\ k = 1, \ldots, m \\ \lambda^*_i & \geq 0,\ i = 1, \ldots, n \\ \lambda^*_i f_i(x^*) & = 0,\ i = 1, \ldots, n \end{split} \] \[ \nabla f_0(x^*) + \sum_{i=1}^{n} \lambda^*_i \nabla f_i(x^*) + \sum_{k=1}^{m} \nu^*_k \nabla h_k(x^*) = 0 \] The main interest in these conditions is that they turn out to be sufficient for convex optimization problems, in the sense that if $(x^*, \lambda^*, \nu^*)$ satisfy the above conditions, then $x^*$ is feasible and primal optimal, $(\lambda^*, \nu^*)$ is feasible and dual optimal, and strong duality holds. Indeed, the first two conditions say $x^*$ is primal feasible. Due to the convexity of $L(x, \lambda^*, \nu^*)$ in $x$ (since $\lambda^*_i \geq 0$ and the $h_k$'s are affine), the last condition guarantees $x^*$ is a minimum, i.e. \[ \begin{split} g(\lambda^*, \nu^*) & = L(x^*, \lambda^*, \nu^*) \\ & = f_0(x^*) + \sum_{i=1}^{n} \lambda^*_i f_i(x^*) + \sum_{k=1}^{m} \nu^*_k h_k(x^*) \\ & = f_0(x^*) \end{split} \] since $\lambda^*_i f_i(x^*) = 0$ and $h_k(x^*) = 0$. Since $x^*$ is primal feasible, $f_0(x^*) \geq p^*$, and since $(\lambda^*, \nu^*)$ is dual feasible, $d^* \geq g(\lambda^*, \nu^*)$; therefore \[ d^* \geq g(\lambda^*, \nu^*) = f_0(x^*) \geq p^* \] By weak duality, $d^* \leq p^*$, so all of these are equalities: strong duality holds, and $x^*$ and $(\lambda^*, \nu^*)$ are optimal.
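As a closing sanity check (again a sketch of my own, assuming cvxpy, and reusing the problem from the Slater example), one can solve a convex problem numerically and verify that the reported primal-dual pair satisfies all of the KKT conditions, including complementary slackness:

```python
import numpy as np
import cvxpy as cp

# minimize ||x - 1||^2  subject to  1^T x - 1 <= 0, then check KKT at the solution.
n = 5
a = np.ones(n)
x = cp.Variable(n)
ineq = cp.sum(x) - 1 <= 0
cp.Problem(cp.Minimize(cp.sum_squares(x - a)), [ineq]).solve()

x_star, lam = x.value, float(ineq.dual_value)
print("primal feasibility:", np.sum(x_star) - 1)          # <= 0 (up to solver tolerance)
print("dual feasibility:  ", lam)                         # >= 0
print("compl. slackness:  ", lam * (np.sum(x_star) - 1))  # ~ 0
print("stationarity:      ", 2 * (x_star - a) + lam)      # ~ 0 componentwise
```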

Notes

[1] We use $a^T x$ in place of $\langle a, x \rangle$ since it's a shorter and often more convenient notation. It would be conceptually better to think of $a$ as a dual vector, and write $a(x) = b$ instead, but let's keep things as simple as possible.

[2] Here's a nice 'proof': writing $w = v + w'$, it suffices to show $w'^Tv \geq 0$. The convexity of $\overline{K}$ means it contains \[ (1-t)v + t(v+w') = v + tw' \] for $0 \leq t \leq 1$. This means we can start at $v$ and walk along the direction of $w'$ for a while and remain inside $\overline{K}$. Now imagine $w'^Tv < 0$. We can split the motion of walking along $w'$ as a sum of walking perpendicular to $v$ and walking along $-v$. Infinitesimally, starting at $v$ and walking perpendicular to it doesn't change your norm, but (even infinitesimally) walking along $-v$ certainly will reduce it. That contradicts the minimality of $|v|^2$. (Check Wikipedia for a formal proof.)

[3] You can obtain this by starting with any compact exhaustion of $\text{int}\ K$, which is convex, and taking the convex hull of each set, since in finite dimensions the convex hull of a compact set is compact.

[4] You may find the key aspect of the proof on page 83 of Boyd (see also exercise 3.28 on page 119). The idea is basically to obtain affine global underestimators for $f$ at every point by obtaining their graphs as supporting hyperplanes for $\text{epi}(f)$. The technicality is that in principle this could be a vertical hyperplane, which is why the theorem doesn't work in general. For an example of a convex function that isn't closed, take for instance $f : [0, 1] \to \mathbb{R}$ given by $f(x) = 0$ for $0 \leq x < 1$ and $f(1) = 1$. As you can imagine, reasonable functions will usually be closed.

[5] The convexity condition is easiest to check when $f$ is twice differentiable. In that case, $f$ is convex if and only if $\nabla^2 f(x) \succcurlyeq 0$ (for all $x \in \text{dom}(f)$, which is convex), where $\nabla^2 f(x)$ is the Hessian matrix of $f$ at $x$, and saying it is $\succcurlyeq 0$ means it is positive semidefinite (this is easily shown by restricting $f$ to an arbitrary line; the Hessian condition ends up meaning the second derivative of $f$ is non-negative, from which it's not hard to get the previous characterization in terms of the first derivative).

[6] To see this, first get rid of the equality constraints by just restricting the problem to $\{x\ |\ Ax = b\}$. We can also assume the affine inequalities are given by $c_j^T x \leq d_j$ for a linearly independent set of $c_j$'s, since any linear dependence would give rise to an inequality that is either always true or always false (given the others). Then take any point satisfying the weak form of Slater's condition, pick a vector $c$ with $c^T c_j < 0$ for all $c_j$ (possible since they are linearly independent) and walk a little bit in the direction of $c$. That will make the affine inequalities strict, and you won't lose the other ones.
