Optimization

This section describes several tools that serve as basic building blocks for higher level solvers.

We provide generators for:

  • Indicator functions

  • Projector functions

  • Proximal operators for prox-capable functions

  • Smooth functions for which gradients can be computed

Most of the descriptions focus on convex sets \(C \subset \RR^n\) and convex functions with signatures \(\RR^n \to \RR\). However, the implementations generalize well to the complex vector space \(\CC^n\) equipped with the real inner product \(\langle x, y \rangle = \Re(x^H y)\).

Also, while we describe the math for vectors \(x \in \RR^n\), these vectors may be laid out in memory as multi-dimensional arrays.

Indicator Function Generators

Let \(C\) be a subset of \(\RR^n\). An indicator function \(I_C(x)\) is defined as \(I_C(x) = 0\) for every \(x \in C\). For our purposes, the relevant form is its extended-value extension, defined as:

(1)\[\begin{split}I_C(x) = \begin{cases} 0 & \text{if } x \in C \\ \infty & \text{if } x \notin C \end{cases}\end{split}\]

An indicator function generator creates an indicator function from the specification of the set \(C\). For example, if \(C\) is a Euclidean ball, then its specification consists of a center and a radius. The available generators are listed below, followed by a short usage sketch.

indicator_zero()

Returns an indicator function for all zero arrays

indicator_singleton(c)

Returns an indicator function for a singleton set \(C = \{c\}\)

indicator_affine(A[, b])

Returns an indicator function for the linear system \(A x = b\)

indicator_box([l, u])

Returns an indicator function for the box \(l \preceq x \preceq u\)

indicator_box_affine(l, u, a[, alpha, tol])

Returns an indicator function for the constraints \(l \preceq x \preceq u\) and \(\langle a, x \rangle = \alpha\)

indicator_conic()

Returns an indicator function for the Lorentz/ice-cream cone \(\{(x,t): \| x \|_2 \leq t\}\)

indicator_l1_ball([q, b, A])

Returns an indicator function for the closed l1 ball \(\| A x - b \|_1 \leq q\)

indicator_l2_ball([q, b, A])

Returns an indicator function for the closed ball \(\| A x - b \|_2 \leq q\)
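
As a usage sketch, an indicator for an \(\ell_2\) ball can be created and evaluated as follows. This assumes the generators are exposed under cr.sparse.opt, return plain callables, and accept their parameters positionally as in the signatures above.

```python
import jax.numpy as jnp
from cr.sparse import opt

# indicator of the closed l2 ball || x ||_2 <= 1 centered at the origin
ball = opt.indicator_l2_ball(1.0)

print(ball(jnp.array([0.3, 0.4])))  # 0.0, the point is inside the ball
print(ball(jnp.array([3.0, 4.0])))  # inf, the point is outside the ball
```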

Projection Function Generators

A key idea in convex optimization is Projection On Convex Sets (POCS). Given a convex set \(C \subset \RR^n\) and a point \(x \in \RR^n\), the projection onto the convex set \(C\) is defined as:

(2)\[P_C(x) = \text{arg} \min_{v \in C} \| x - v \|_2.\]

i.e., find the point \(v\) in \(C\) which is closest to \(x\). If \(x\) lies inside \(C\), then the projection is \(x\) itself.

In general, computation of the projection for arbitrary convex sets can be very hard. However, for specific classes of convex sets, the projection can be computed efficiently. Here we provide projection function generators for some classes of convex sets.
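
For instance, the projection onto the \(\ell_2\) ball \(C = \{v : \| v \|_2 \leq r\}\) has the closed form:

\[\begin{split}P_C(x) = \begin{cases} x & \text{if } \| x \|_2 \leq r \\ r \frac{x}{\| x \|_2} & \text{otherwise} \end{cases}\end{split}\]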

A projection function generator creates a projection function based on the specification of the set \(C\).

proj_identity()

Projects onto \(\RR^n\) (the identity map \(x \mapsto x\))

proj_zero()

Projects every point of \(\RR^n\) to the zero vector

proj_singleton(c)

Returns a function which projects a point onto the singleton set \(C = \{c\}\)

proj_affine(A[, b])

Returns a function which projects a point onto the solution set of \(A x = b\)

proj_box([l, u])

Projector function for element-wise lower and upper bounds

proj_conic()

Projector function for the Lorentz/ice-cream cone \(\{(x,t): \| x \|_2 \leq t\}\)

proj_l1_ball([q, b])

Projector function for the \(\ell_1\) ball \(\| x - b \|_1 \leq q\)

proj_l2_ball([q, b, A])

Projector function for the closed ball \(\| A x - b \|_2 \leq q\)
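
A usage sketch (assuming the generators return plain callables, accept their parameters positionally as in the signatures above, and broadcast scalar bounds element-wise):

```python
import jax.numpy as jnp
from cr.sparse import opt

# projector onto the box 0 <= x <= 1 (element-wise)
box = opt.proj_box(0.0, 1.0)
print(box(jnp.array([-0.5, 0.3, 2.0])))  # [0.  0.3 1. ]

# projector onto the closed l2 ball || x ||_2 <= 1
ball = opt.proj_l2_ball(1.0)
print(ball(jnp.array([3.0, 4.0])))       # [0.6 0.8]
```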

Smooth Function Generators

The smoothness of a function is a property measured by the number of continuous derivatives it has over some domain.

We are concerned with convex functions \(f : \RR^n \to \RR\) defined over convex sets \(C \subset \RR^n\) and consider them to be smooth if the gradient \(g = \nabla f : \RR^n \to \RR^n\) exists over the interior of the domain.

For a smooth convex function \(f\), we need to be able to

  • Compute the value \(f(x)\) at \(x \in \RR^n\) using the extended-value extension (i.e., \(f(x) = \infty\) if \(x \notin C\)).

  • Compute the gradient \(g(x) = \nabla f(x)\) at \(x\)

  • Compute the pair \((g(x), f(x))\) at \(x\)

While JAX provides automatic gradient computation, for many of our functions of interest the automatically computed gradients do not match the expected values exactly at the boundaries of the domain \(C\) of \(f\). Hence, we provide hand-coded gradient implementations.

Sometimes, the gradient computation \(g(x)\) and the function computation \(f(x)\) share common sub-expressions, which can be exploited to improve efficiency. Hence, we also provide the ability to compute the pair together when an algorithm wishes to do so.

We represent smooth functions by the type cr.sparse.opt.SmoothFunction.

SmoothFunction

Represents a smooth function

Let op be a variable of type cr.sparse.opt.SmoothFunction which represents some smooth function \(f\). Then:

  • op.func(x) returns the function value \(f(x)\).

  • op.grad(x) returns the gradient of function \(g(x) = \nabla f(x)\).

  • op.grad_val(x) returns the pair \((g(x), f(x))\).

A smooth function generator creates an instance of cr.sparse.opt.SmoothFunction based on the specification of the smooth function \(f\).

Available smooth function generators:

smooth_constant([c])

A constant value function

smooth_entropy()

Entropy function \(f(x) = -\sum(x_i \log (x_i))\) and its gradient

smooth_huber([tau])

Huber penalty function and its gradient

smooth_linear(C[, D])

Linear function and its gradient \(f(x) = \langle c, x \rangle + d\)

smooth_logdet([q, C])

Log-det function and its gradient \(f(X) = -\log \det (X)\)

smooth_quad_matrix([P, q, r])

Quadratic function and its gradient \(f(x) = \frac{1}{2} x^T P x + \langle q, x \rangle + r\)

smooth_quad_error(A, b)

Quadratic error function and its gradient \(f(x) = \frac{1}{2} \| A x - b \|_2^2\)
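
As a sketch of how these generators fit the SmoothFunction interface (assuming the module path cr.sparse.opt and that the generators operate on JAX arrays):

```python
import jax.numpy as jnp
from cr.sparse import opt

A = jnp.array([[1.0, 2.0],
               [3.0, 4.0]])
b = jnp.array([1.0, 1.0])

# f(x) = 0.5 * || A x - b ||_2^2 with gradient g(x) = A^T (A x - b)
op = opt.smooth_quad_error(A, b)

x = jnp.array([0.5, 0.5])
print(op.func(x))       # function value f(x)
print(op.grad(x))       # gradient g(x)
g, v = op.grad_val(x)   # gradient and value computed together
```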

Operations on smooth functions:

smooth_func_translate(smooth_func, b)

Returns a smooth function \(g\) obtained by translating a smooth function \(f\) by a fixed vector \(b\)

We provide tools to build your own smooth functions.

  • You can provide just the definition of the smooth function and we will use JAX to compute the gradient.

  • You can provide definitions of the function \(f(x)\) and the gradient \(g(x)\).

  • You can provide definitions of \(f(x)\), \(g(x)\) as well as an efficient routine to compute the pair: \((g(x), f(x))\).

smooth_build(func)

Creates a smooth function based on function definition \(f(x)\)

smooth_build2(func, grad)

Creates a smooth function with user defined \(f(x)\) and gradient \(g(x)\)

smooth_build3(func, grad, grad_val)

Creates a smooth function with user-defined func, grad and grad_val functions
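
For example, a hand-rolled wrapper for \(f(x) = \frac{1}{2} \| x \|_2^2\) with gradient \(g(x) = x\) could be built as follows (a minimal sketch, assuming smooth_build2 accepts any pair of JAX-compatible callables):

```python
import jax.numpy as jnp
from cr.sparse import opt

def f(x):
    # f(x) = 0.5 * || x ||_2^2
    return 0.5 * jnp.vdot(x, x)

def g(x):
    # the gradient of f is the identity map
    return x

op = opt.smooth_build2(f, g)

x = jnp.array([1.0, -2.0, 3.0])
print(op.func(x))   # 7.0
print(op.grad(x))   # [ 1. -2.  3.]
```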

Utilities:

build_grad_val_func(func, grad)

Constructs a grad_val function from the definitions of function \(f(x)\) and gradient \(g(x)\)

smooth_value_grad(func, grad)

Returns a function which computes both the value and gradient of a smooth function at a specified point

Proximal Operator Generators

The proximal operator for a function \(f\) is defined as

(3)\[p_f(x, t) = \text{arg} \min_{z \in \RR^n} f(z) + \frac{1}{2t} \| z - x \|_2^2\]

The proximal operator is a mapping \(p_f : \RR^n \times \RR \to \RR^n\) which maps a vector \(x\) to its proximal vector \(z = p_f(x,t)\), where \(t\) can be thought of as a step size for computing the proximal vector.

The proximal operator can be thought of as a generalization of the projection operator. If \(f\) is an indicator function for some convex set \(C\), then the proximal operator is nothing but the projection onto the set \(C\). If we think of \(f\) as a cost function over \(\RR^n\), then indicator functions are an extreme case: the cost is zero inside \(C\) and infinite outside.

For smooth functions, the proximal operator reduces to the gradient step.

We informally call a function prox-capable if its proximal operator can be computed efficiently.
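
For example, the \(\ell_1\) norm \(f(x) = \| x \|_1\) is prox-capable: its proximal operator is the element-wise soft-thresholding operator

\[p_f(x, t)_i = \text{sgn}(x_i) \max(|x_i| - t, 0).\]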

For a prox-capable function \(f\), in typical proximal algorithms, we need to be able to

  • Compute the value \(f(x)\) at \(x \in \RR^n\) using the extended-value extension

  • Compute the proximal vector for the step size \(t\) as \(p_f(x, t)\)

  • Compute the pair \(p_f(x, t), f(p_f(x, t))\)

Note that in the last case, we first compute \(z = p_f(x, t)\) and then compute the value \(v = f(z)\).

We represent prox-capable functions by the type cr.sparse.opt.ProxCapable.

ProxCapable

Represents a function which is prox capable

Let op be a variable of type cr.sparse.opt.ProxCapable which represents some prox-capable function \(f\). Then:

  • op.func(x) returns the function value \(f(x)\).

  • op.prox_op(x, t) returns the proximal vector for step size \(t\): \(z = p_f(x, t)\).

  • op.prox_vec_val(x, t) returns the pair \((z, v)\) where \(z = p_f(x, t)\) and \(v = f(z)\).

A proximal operator generator creates an instance of cr.sparse.opt.ProxCapable based on the specification of the prox-capable function \(f\).

Available proximal operator generators:

prox_zero()

Returns a prox-capable wrapper for the function \(f(x)=0\)

prox_l1([q])

Returns a prox-capable wrapper for the function \(f(x) = \| q x \|_1\)

prox_l2([q])

Returns a prox-capable wrapper for the function \(f(x) = \| q x \|_2\)

prox_l1_pos([q])

Returns a prox-capable wrapper for the function \(f(x) = \| q x \|_1 + I(\{x \geq 0\})\)

prox_l1_ball([q])

Returns a prox-capable wrapper for the l1-ball \(\{ x : \| x \|_1 \leq q \}\) indicator

prox_owl1([lambda_])

Returns a prox-capable wrapper for the ordered weighted \(\ell_1\) norm \(f(x) = \sum_i \lambda_i |x|_{(i)}\), where \(|x|_{(i)}\) are the entries of \(|x|\) sorted in descending order
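
A usage sketch (assuming the module path cr.sparse.opt and that prox_op and prox_vec_val take the point and the step size as separate arguments):

```python
import jax.numpy as jnp
from cr.sparse import opt

# prox-capable wrapper for f(x) = || x ||_1
op = opt.prox_l1()

x = jnp.array([1.5, -0.2, 0.7])
t = 0.5

print(op.func(x))             # 2.4, the l1 norm of x
z = op.prox_op(x, t)          # element-wise soft thresholding with threshold t
print(z)                      # [ 1.  -0.   0.2]
z, v = op.prox_vec_val(x, t)  # proximal vector and f evaluated at it
```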

You can build your own cr.sparse.opt.ProxCapable wrappers by providing the definition of the function \(f(x)\) and its proximal operator \(p_f(x, t)\).

prox_build(func, prox_op)

Creates a prox-capable wrapper from a user-defined function \(f(x)\) and its proximal operator \(p_f(x, t)\)

build_from_ind_proj(func, prox_op)

Creates a prox-capable wrapper from an indicator function and the corresponding projection function
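
For instance, \(f(x) = \frac{1}{2} \| x \|_2^2\) has the closed-form proximal operator \(p_f(x, t) = x / (1 + t)\), so a custom wrapper could be sketched as follows (assuming prox_build accepts a function and a proximal operator with signature (x, t)):

```python
import jax.numpy as jnp
from cr.sparse import opt

def f(x):
    # f(x) = 0.5 * || x ||_2^2
    return 0.5 * jnp.vdot(x, x)

def prox(x, t):
    # the minimizer of 0.5 || z ||^2 + (1/(2t)) || z - x ||^2 is x / (1 + t)
    return x / (1.0 + t)

op = opt.prox_build(f, prox)
print(op.prox_op(jnp.array([2.0, -4.0]), 1.0))  # [ 1. -2.]
```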

Simpler Projection Functions

project_to_ball(x[, radius])

Projects a vector to the \(\ell_2\) ball of a specified radius.

project_to_box(x[, radius])

Projects a vector to the box (\(\ell_{\infty}\) ball) of a specified radius.

project_to_real_upper_limit(x[, limit])

Projects a (possibly complex) vector to its real part with an upper limit on each entry.

Shrinkage

shrink(a, kappa)

Shrinks each entry of a vector by \(\kappa\).
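
A quick sketch of the shrinkage operation (assuming shrink is exposed under cr.sparse.opt and implements element-wise soft thresholding):

```python
import jax.numpy as jnp
from cr.sparse import opt

a = jnp.array([1.5, -0.2, 0.7])
# each entry moves towards zero by kappa = 0.5; entries within kappa of zero become 0
print(opt.shrink(a, 0.5))  # [ 1.  -0.   0.2]
```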

Conjugate Gradient Methods

Normal Conjugate Gradients on Matrices

cg.solve_from(A, b, x_0[, max_iters, …])

Solves the problem \(Ax = b\) for a symmetric positive definite \(A\) via conjugate gradient iterations, starting from an initial guess \(x_0\).

cg.solve_from_jit(A, b, x_0[, max_iters, …])

JIT-compiled version of cg.solve_from.

cg.solve(A, b[, max_iters, res_norm_rtol])

Solves the problem \(Ax = b\) for a symmetric positive definite \(A\) via conjugate gradient iterations.

cg.solve_jit(A, b[, max_iters, res_norm_rtol])

JIT-compiled version of cg.solve.
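
A minimal usage sketch (assuming the cg module is importable from cr.sparse.opt and that solve accepts a dense symmetric positive definite JAX array; the exact fields of the returned solution state are library specific):

```python
import jax.numpy as jnp
from cr.sparse.opt import cg

# a small symmetric positive definite system A x = b
A = jnp.array([[4.0, 1.0],
               [1.0, 3.0]])
b = jnp.array([1.0, 2.0])

# the exact solution is x = [1/11, 7/11]
sol = cg.solve(A, b, max_iters=20)
print(sol)  # returned state carrying the approximate solution
```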

Preconditioned Normal Conjugate Gradients on Linear Operators

These are more general purpose: they operate on linear operators rather than explicit matrices and support preconditioning.

pcg.solve_from(A, b, x0[, max_iters, tol, …])

Solves the problem \(Ax = b\) for a symmetric positive definite \(A\) via preconditioned conjugate gradient iterations, starting from an initial guess and using a preconditioner.

pcg.solve_from_jit(A, b, x0[, max_iters, …])

JIT-compiled version of pcg.solve_from.

pcg.solve(A, b[, max_iters, tol, atol, M])

Solves the problem \(Ax = b\) for a symmetric positive definite \(A\) via preconditioned conjugate gradient iterations with a preconditioner.

pcg.solve_jit(A, b[, max_iters, tol, atol, M])

JIT-compiled version of pcg.solve.