Stationarity Beginnings
Adding support for verifying stationarity (for differentiable functions) in CVXPY:
In this post, I want to talk about my PR, where we take the first steps for my Google Summer of Code '23 project by adding support for verifying stationarity for problems with differentiable components within CVXPY.
To reiterate, the goal of my project this summer is to take the first steps towards easy verification of the Karush-Kuhn-Tucker optimality conditions for problems solved within CVXPY. For a quick recap, informally, the following four statements comprise the KKT conditions:
Stationarity of the Lagrangian:
\begin{equation*} \nabla\mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\nu})=0 \end{equation*}The stationarity condition tells us that for a given dual pair \((\lambda,\nu)\), the point \(x\) minimizes the Lagrangian \(\mathcal{L}(x,\lambda,\nu)\). We will later use a more general version of stationarity to verify the same w.r.t. certain variables.
Complementary Slackness:
\begin{equation*} \lambda_{i}\,f_{i}(x)=0,\quad i=1,2,\ldots,m \end{equation*}This can be derived easily by assuming strong duality.
Primal feasibility:
\begin{align*} f_{i}(x)&\leq 0, \quad i=1,2,\ldots,m\\ h_{i}(x)&=0, \quad i=1,2,\ldots,p \end{align*}
Dual feasibility:
\begin{equation*} \lambda_{i}\geq 0,i=1,2,\ldots,m \end{equation*}
Conditions 3 and 4 simply ensure feasibility of the computed solution, and are pretty intuitive.
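To make the recap concrete, here is a minimal standalone sketch (a toy example of my own, not part of the PR) of numerically verifying all four conditions for the problem of minimizing \(x^2\) subject to \(1 - x \leq 0\), whose solution is \(x^\ast = 1\) with multiplier \(\lambda^\ast = 2\):

```python
import numpy as np

# Toy problem: minimize x^2 subject to f(x) = 1 - x <= 0.
# Analytical solution: x* = 1, lambda* = 2.
x_star, lam_star = 1.0, 2.0
f = lambda x: 1.0 - x

# d/dx [x^2 + lam * (1 - x)] evaluated at the solution
grad_L = 2 * x_star + lam_star * (-1.0)

assert abs(grad_L) < 1e-9                 # 1. stationarity of the Lagrangian
assert abs(lam_star * f(x_star)) < 1e-9   # 2. complementary slackness
assert f(x_star) <= 1e-9                  # 3. primal feasibility
assert lam_star >= 0                      # 4. dual feasibility
print("all four KKT conditions hold")
```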
Since CVXPY does not explicitly construct the Lagrangian, we had to do as much in our verification of its stationarity. The major deliverable of this PR is the `check_stationarity_lagrangian` method, which for the time being has been defined within the `SolverTestHelper` class (which lives in the `solver_test_helpers.py` file). The reason for this is that the class also has several other methods which can be used to verify the other KKT conditions (including a more or less working implementation for verifying complementarity and a partial implementation for verifying dual feasibility, which will be the next subject of our work).
A first attempt at the problem — immediate concerns:
In this subsection, I want to discuss one of our early implementations of the method. Before I do that, I want to talk about how we will be computing and representing derivatives of the Lagrangian.
Derivatives are computed using the `grad` facility in CVXPY. The `grad` method is defined for (almost) every `Atom` within CVXPY (a current exception is `von_neumann_entr`, for instance). As far as the construction of the Lagrangian is concerned, we need to include the contribution from the objective function, which can be accessed via `cp.Problem.objective.expr` (the objective of any CVXPY problem is a function of the decision variables, constructed using a combination of several `Atom` classes in CVXPY; care must be taken to ensure that the resulting problem is DCP in nature).
There are two major issues that we faced at this juncture:
- Differing signs of the contributions from different `Constraint` objects
- Being able to move constraint functions into the Lagrangian
Moving the objective function into the Lagrangian is easy enough, but what about the constraints? Constraints are represented in CVXPY via the `Constraint` class. What we need to move into the Lagrangian are the constraint functions, i.e., for some constraint \(f_{i}(x) \leq g_{i}(x)\), we would add the contribution \((f_{i}(x) - g_{i}(x))\) (with its corresponding Lagrange multiplier, \(\lambda_{i}\)); consequently, for verifying stationarity, we would be expected to differentiate this particular contribution.
However, in CVXPY, there is no easy way to access these constraint functions from constructed `Constraint` objects. Fortunately, a fairly convenient path already exists within CVXPY for our purpose. For this, we make the following observation (here the Lagrange multiplier corresponding to \(h_{i}\) is \(\lambda_{i}\)):
\[
\nabla_{\boldsymbol{x}}\, \lambda_{i}\, h_{i}(\text{aff}(\boldsymbol{x})) = 0
\quad\Updownarrow\quad
\nabla_{\boldsymbol{x}}\, \lambda_{i}^{\top}\text{aff}(\boldsymbol{x}) = 0,
\quad \lambda_{i}\in K
\]
where \(K\) is the cone defined via \(h_{i}\leq 0\).
(NOTE: the reason we assume that all constrained expressions are affine is due to DCP reasons.)
This means that we can now move constraint functions into the Lagrangian for verifying stationarity very simply as `cp.scalar_product(con.expr, con.dual_value)`. Here, `con.expr` is the affine expression that was constrained to lie in the cone (where `con` is an instance of a `Constraint` subclass), and the dual values that CVXPY recovers already satisfy \(\lambda_{i}\in K\).
Also, to check which gradients are problematic within the final computation, we compute the standard Frobenius norm of each (hence the choice of the variable name `bad_fro_norms`). Other choices of matrix norms could potentially have worked equally well.
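The normalization convention matters here: the Frobenius norm is divided by the square root of the variable's size, so the tolerance is effectively per-entry rather than growing with the dimension. A quick sketch of that behaviour:

```python
import numpy as np

# A uniform per-entry error of 1e-7 in a 3x3 gradient:
g = np.full((3, 3), 1e-7)

# ||g||_F = sqrt(9) * 1e-7 = 3e-7; dividing by sqrt(size) recovers 1e-7
scaled = np.linalg.norm(g) / np.sqrt(g.size)
print(np.isclose(scaled, 1e-7))   # True: the scaled norm stays per-entry
```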
To support new `Constraint` sets in CVXPY for stationarity calculations, we would require dual variable recovery to be implemented for them (for example, `PowConeND`, `RelEntrConeQuad` and `OpRelEntrConeQuad` do not have computation of dual variables implemented as of this date).
Similarly, to support new `Atom` functions for stationarity calculations, we would need to be able to compute their derivatives (or, more generally, their subgradients, which is a major addition we plan on working on later via the introduction of a new `ConvexSet` class) via the `grad` facility.
Here is our first implementation. It should read fairly easily with the above prose in mind.
```python
def check_stationarity_lagrangian(self, places) -> None:
    L = self.prob.objective.expr
    for con in self.constraints:
        if isinstance(con, (cp.constraints.Inequality, cp.constraints.Equality)):
            dual_var_value = con.dual_value
            prim_var_expr = con.expr
            L = L + cp.scalar_product(dual_var_value, prim_var_expr)
        elif isinstance(con, (cp.constraints.ExpCone, cp.constraints.SOC,
                              cp.constraints.Zero, cp.constraints.NonNeg,
                              cp.constraints.PSD, cp.constraints.PowCone3D)):
            L = L - cp.scalar_product(con.args, con.dual_value)
        else:
            raise NotImplementedError()
    g = L.grad
    # compute per-variable scaled Frobenius norms of the gradient
    bad_fro_norms = []
    for (k, v) in g.items():
        # (k, v) = (cvxpy Variable, SciPy sparse matrix)
        norm = np.linalg.norm(v.data) / np.sqrt(k.size)
        if norm > 10**(-places):
            bad_fro_norms.append((norm, k.name()))
    if len(bad_fro_norms):
        msg = f"""\n
        The gradient of the Lagrangian with respect to the primal variables
        is above the threshold of 10^{-places}.
        The names of the problematic variables and the corresponding
        gradient norms are as follows:
        """
        for norm, varname in bad_fro_norms:
            msg += f"\n\t\t\t{varname} : {norm}"
        msg += '\n'
        self.tester.fail(msg)
```
The next big issue we face is that CVXPY allows users to pass certain constraints on variables implicitly via flags during their initialization. Note that, by our current convention of defining the Lagrangian, we only move the objective function and the constraints that have been explicitly passed to the problem by the user into the Lagrangian. However, we cannot do this for constraints imposed on variables via flags at initialization time; for these we had to interpret the stationarity condition in a slightly more general sense, as described in the next section.
The final implementation — handling flag constraints:
A more general way to interpret stationarity is via the characterization of the dual cone of the domain of the variable w.r.t. which we are differentiating (henceforth called the differentiat-er XD). Specifically, when there are no implicit constraints on the domain of the differentiat-er, i.e., when its domain is all of \(\mathbb{R}^{n\times n}\), the corresponding dual cone is the singleton set containing only the zero element of \(\mathbb{R}^{n\times n}\). Similarly, when the decision variable playing the role of the differentiat-er is constrained to lie in some non-trivial cone \(K\), we will instead check whether the resultant gradient lies in its corresponding dual cone \(K^{*}\).
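As a small numerical sketch of this generalized check (hypothetical, numpy-only): for a variable constrained to the nonnegative orthant, which is self-dual, membership of the gradient in \(K^{*}\) can be measured by the norm of its negative part.

```python
import numpy as np

g = np.array([[0.5, -1e-9],
              [2.0,  0.0]])        # a candidate gradient dL/dX
# Only negative entries violate K* = nonneg orthant, so the scaled norm
# of the negative part serves as a distance-to-cone proxy.
violation = np.linalg.norm(np.minimum(g, 0.0)) / np.sqrt(g.size)
print(violation < 1e-6)            # True: g lies in K* up to tolerance
```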
As of today, the different kinds of (real and conic) constraints that CVXPY allows users to impose via flags are the following:
nonneg : bool
nonpos : bool
symmetric : bool
diag : bool
PSD : bool
NSD : bool
pos : bool
neg : bool
CVXPY also supports `boolean` and `integer` flags, but MIPs (Mixed Integer Programs) cannot be differentiated through.
Of these, we do not currently support diagonal matrices via `diag` (the cone dual to the cone of diagonal matrices is the set of matrices whose diagonal elements are all zero, the so-called hollow matrices) because of their peculiar internal representation within CVXPY (via SciPy `CSC` matrices). The dual cones for all of the others, and how they've been implemented, are detailed in the code snippet below. For all flags with the exception of `symmetric`, we follow the same pattern of first constructing the constraint corresponding to the dual cone of the domain cone and then computing the violation of the gradient w.r.t. it by checking the `residual` property. For example, here is how the current code for `symmetric` would be adapted to the same pattern:
```python
if opt_var.is_symmetric():
    """The cone of skew-symmetric matrices is dual
    to the cone of symmetric matrices"""
    g_bad_mat = cp.Constant(np.reshape(g[opt_var].toarray(), opt_var.shape))
    tmp_con = g_bad_mat == -g_bad_mat.T
    dual_cone_violation = tmp_con.residual
    if dual_cone_violation > 10**(-places):
        # some code
        pass
```
Here is the final implementation of `check_stationarity_lagrangian`. Again, most of this should follow directly from the prose above; another point worthy of note is the order of the flag tests. We check for `PSD`/`NSD` before `symmetric` because when a user declares a variable with `PSD=True` or `NSD=True`, CVXPY additionally also sets `symmetric=True` (this is not done when the user explicitly passes in the constraint `X << 0` or `X >> 0`).
```python
def check_stationarity_lagrangian(self, places) -> None:
    objective = self.prob.objective
    if objective.NAME == 'minimize':
        L = objective.expr
    else:
        L = -objective.expr
    for con in self.constraints:
        if isinstance(con, (cp.constraints.Inequality, cp.constraints.Equality)):
            dual_var_value = con.dual_value
            prim_var_expr = con.expr
            L = L + cp.scalar_product(dual_var_value, prim_var_expr)
        elif isinstance(con, (cp.constraints.ExpCone, cp.constraints.SOC,
                              cp.constraints.Zero, cp.constraints.NonNeg,
                              cp.constraints.PSD, cp.constraints.PowCone3D)):
            L = L - cp.scalar_product(con.args, con.dual_value)
        else:
            raise NotImplementedError()
    try:
        g = L.grad
    except TypeError as e:
        assert 'is not subscriptable' in str(e)
        msg = """\n
        CVXPY problems with `diag` variables are not supported
        for stationarity checks as of now
        """
        self.tester.fail(msg)
    bad_norms = []
    """The convention that we follow for constructing the Lagrangian is:
    1) Move all constraints explicitly passed to the problem (via
       Problem.constraints) into the Lagrangian --- dLdX == 0 for any
       such variables
    2) For constraints that have been implicitly imposed on variables at
       declaration time via specific flags (e.g. PSD/symmetric etc.), we
       instead check `dLdX \in K^{*}`, where `K` is the convex cone
       corresponding to the implicit constraint on `X`
    """
    for (opt_var, v) in g.items():
        if not any(opt_var.attributes.values()):
            """Case when the variable doesn't have any special attributes"""
            norm = np.linalg.norm(v.data) / np.sqrt(opt_var.size)
            if norm > 10**(-places):
                bad_norms.append((norm, opt_var))
        else:
            if opt_var.is_psd():
                """The PSD cone is self-dual"""
                g_bad_mat = cp.Constant(np.reshape(g[opt_var].toarray(), opt_var.shape))
                tmp_con = g_bad_mat >> 0
                dual_cone_violation = tmp_con.residual
                if dual_cone_violation > 10**(-places):
                    bad_norms.append((dual_cone_violation, opt_var))
            elif opt_var.is_nsd():
                """The NSD cone is also self-dual"""
                g_bad_mat = cp.Constant(np.reshape(g[opt_var].toarray(), opt_var.shape))
                tmp_con = g_bad_mat << 0
                dual_cone_violation = tmp_con.residual
                if dual_cone_violation > 10**(-places):
                    bad_norms.append((dual_cone_violation, opt_var))
            elif opt_var.is_diag():
                """The dual cone to the set of diagonal matrices is the set of
                'hollow' matrices, i.e. matrices with diagonal entries zero"""
                g_bad_mat = np.reshape(g[opt_var].toarray(), opt_var.shape)
                diag_entries = np.diag(g_bad_mat)
                dual_cone_violation = np.linalg.norm(diag_entries) / np.sqrt(opt_var.size)
                if dual_cone_violation > 10**(-places):
                    bad_norms.append((dual_cone_violation, opt_var))
            elif opt_var.is_symmetric():
                """The dual cone to the set of symmetric matrices is the set of
                skew-symmetric matrices, so we check if dLdX lies in the set of
                skew-symmetric matrices (g[opt_var] is the gradient in question)"""
                g_bad_mat = np.reshape(g[opt_var].toarray(), opt_var.shape)
                mat = g_bad_mat + g_bad_mat.T
                dual_cone_violation = np.linalg.norm(mat) / np.sqrt(opt_var.size)
                if dual_cone_violation > 10**(-places):
                    bad_norms.append((dual_cone_violation, opt_var))
            elif opt_var.is_nonpos():
                """The cone of matrices with all entries nonpos is self-dual"""
                g_bad_mat = cp.Constant(np.reshape(g[opt_var].toarray(), opt_var.shape))
                tmp_con = g_bad_mat <= 0
                dual_cone_violation = np.linalg.norm(tmp_con.residual) / np.sqrt(opt_var.size)
                if dual_cone_violation > 10**(-places):
                    bad_norms.append((dual_cone_violation, opt_var))
            elif opt_var.is_nonneg():
                """The cone of matrices with all entries nonneg is self-dual"""
                g_bad_mat = cp.Constant(np.reshape(g[opt_var].toarray(), opt_var.shape))
                tmp_con = g_bad_mat >= 0
                dual_cone_violation = np.linalg.norm(tmp_con.residual) / np.sqrt(opt_var.size)
                if dual_cone_violation > 10**(-places):
                    bad_norms.append((dual_cone_violation, opt_var))
    if len(bad_norms):
        msg = f"""\n
        The gradient of the Lagrangian with respect to the primal variables
        is above the threshold of 10^{-places}.
        The names of the problematic variables and the corresponding
        gradient norms are as follows:
        """
        for norm, opt_var in bad_norms:
            msg += f"\n\t\t\t{opt_var.name()} : {norm}"
        msg += '\n'
        self.tester.fail(msg)
```