Iterative Shrinkage and Thresholding Algorithm
Our description is based on [DDDM04, Ela10, ZE10].
ISTA can be used to solve problems of the form:

\[
\widehat{x} = \underset{x}{\text{arg}\,\min}\; \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1 \tag{1}
\]

and its variants (with different regularizations).

Here the objective function is:

\[
f(x) = \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1 \tag{2}
\]
Derivation
For an identity \(A\), the problem reduces to:

\[
f(x) = \frac{1}{2}\| b - x \|_2^2 + \lambda \| x \|_1 \tag{3}
\]

The term on the R.H.S.:

\[
\frac{1}{2}\| b - x \|_2^2 + \lambda \| x \|_1 = \sum_i \left[ \frac{1}{2} (b_i - x_i)^2 + \lambda | x_i | \right] \tag{4}
\]

is separable in the components of \(x\).

The scalar function

\[
g(x) = \frac{1}{2} (b - x)^2 + \lambda | x | \tag{5}
\]

has a minimizer given by the soft thresholding function:

\[
x^{*} = \mathbf{ST}_{\lambda}(b) = \begin{cases} b - \lambda \operatorname{sgn}(b) & \text{if } | b | > \lambda \\ 0 & \text{otherwise} \end{cases} \tag{6}
\]
[ZE10] show that a similar solution emerges when \(A\) is unitary too.
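As a quick illustration, here is a minimal NumPy sketch of the soft thresholding operator (the vectorized form and the name `soft_threshold` are our own):

```python
import numpy as np

def soft_threshold(t, threshold):
    """Elementwise soft thresholding: shrink every entry of t towards
    zero by `threshold`; entries with |t| <= threshold become 0."""
    return np.sign(t) * np.maximum(np.abs(t) - threshold, 0.0)

# Entries below the threshold vanish; the rest are shrunk towards zero.
soft_threshold(np.array([-2.0, -0.3, 0.0, 0.5, 3.0]), 1.0)
# -> array([-1., -0.,  0.,  0.,  2.])
```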
[DDDM04] introduced a surrogate term:

\[
d(x, x_0) = \frac{c}{2} \| x - x_0 \|_2^2 - \frac{1}{2} \| A x - A x_0 \|_2^2 \tag{7}
\]

For this function to be strictly convex w.r.t. \(x\), its Hessian \(c I - A^H A\) must be positive definite. This holds if:

\[
c > \| A \|_2^2 \tag{8}
\]

where \(\| A \|_2^2\) is the largest eigenvalue of \(A^H A\).
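In an implementation, a valid \(c\) can be obtained from the spectral norm of \(A\); a sketch (the safety factor 1.01 is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))

# ||A||_2 is the largest singular value, so ||A||_2^2 is the largest
# eigenvalue of A^H A; any c strictly above it satisfies the condition.
c = 1.01 * np.linalg.norm(A, 2) ** 2
```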
The new surrogate objective function is defined as:

\[
\widetilde{f}(x, x_0) = \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1 + \frac{c}{2} \| x - x_0 \|_2^2 - \frac{1}{2} \| A x - A x_0 \|_2^2 \tag{9}
\]

Introducing:

\[
\nu_0 = x_0 + \frac{1}{c} A^H (b - A x_0) \tag{10}
\]

we can rewrite the surrogate objective (up to a constant independent of \(x\)) as:

\[
\widetilde{f}(x, x_0) = \text{const} + \frac{c}{2} \| x - \nu_0 \|_2^2 + \lambda \| x \|_1 \tag{11}
\]

As discussed above, this surrogate objective can be minimized using the soft thresholding operator:

\[
x^{*} = \mathbf{ST}_{\lambda / c}(\nu_0) = \mathbf{ST}_{\lambda / c}\left( x_0 + \frac{1}{c} A^H (b - A x_0) \right) \tag{12}
\]

Starting with some \(x_0\), we can propose an iterative method over a sequence \(\{ x_0, x_1, x_2, \dots \}\) as:

\[
x_{i+1} = \mathbf{ST}_{\lambda / c}\left( x_i + \frac{1}{c} A^H (b - A x_i) \right) \tag{13}
\]
This is the IST algorithm.
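A sketch of this iteration in NumPy, reusing `soft_threshold` from above (the problem instance, sizes, and \(\lambda\) are arbitrary choices for illustration):

```python
import numpy as np

def ista(A, b, lam, c, num_iters=200):
    """IST iteration: x <- ST_{lam/c}( x + (1/c) A^H (b - A x) )."""
    x = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(num_iters):
        x = soft_threshold(x + (A.conj().T @ (b - A @ x)) / c, lam / c)
    return x

# Recover a synthetic sparse vector from compressive measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
b = A @ x_true
c = 1.01 * np.linalg.norm(A, 2) ** 2
x_hat = ista(A, b, lam=0.1, c=c, num_iters=500)
```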
By changing the regularization in (1), we can derive different IST algorithms with different thresholding functions. The version below considers a generalized thresholding function \(\mathbf{T}\) which depends on the regularizer:

\[
x_{i+1} = \mathbf{T}\left( x_i + \frac{1}{c} A^H (b - A x_i) \right) \tag{14}
\]
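For example, hard thresholding (associated with an \(\ell_0\)-type regularizer) can be plugged in as \(\mathbf{T}\) in place of soft thresholding; a sketch:

```python
import numpy as np

def hard_threshold(t, threshold):
    """Elementwise hard thresholding: keep entries with magnitude above
    the threshold unchanged, zero out the rest."""
    return np.where(np.abs(t) > threshold, t, 0.0)
```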
Sparsifying Basis
Often, the signal \(x\) (e.g. an image) may not itself be sparse or compressible, but it may have a sparse representation in some basis \(B\). We have

\[
\alpha = B^H x \tag{15}
\]

as the representation of \(x\) in the basis \(B\).

The regularization is then applied to the representation \(\alpha\), and (1) becomes:

\[
\widehat{\alpha} = \underset{\alpha}{\text{arg}\,\min}\; \frac{1}{2}\| b - A B \alpha \|_2^2 + \lambda \| \alpha \|_1 \tag{16}
\]
We can rewrite this as a problem of the form (1) with the combined operator \(\widetilde{A} = A B\):

\[
\widehat{\alpha} = \underset{\alpha}{\text{arg}\,\min}\; \frac{1}{2}\| b - \widetilde{A} \alpha \|_2^2 + \lambda \| \alpha \|_1 \tag{17}
\]

(14) changes to:

\[
\alpha_{i+1} = \mathbf{T}\left( \alpha_i + \frac{1}{c} B^H A^H (b - A B \alpha_i) \right) \tag{18}
\]

By substituting \(\alpha = B^H x\) and \(x = B \alpha\), we get:

\[
B^H x_{i+1} = \mathbf{T}\left( B^H x_i + \frac{1}{c} B^H A^H (b - A x_i) \right) \tag{19}
\]

Simplifying further:

\[
x_{i+1} = B\, \mathbf{T}\left( B^H \left( x_i + \frac{1}{c} A^H (b - A x_i) \right) \right) \tag{20}
\]
This is the version of the IST algorithm with an operator \(A\) and a sparsifying basis \(B\).
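A sketch of this variant with an explicit orthonormal matrix \(B\) (a random orthonormal basis here, purely to show the mechanics; in practice \(B\) would be a genuinely sparsifying transform such as a wavelet basis), reusing `soft_threshold` from above:

```python
import numpy as np

def ista_basis(A, B, b, lam, c, num_iters=200):
    """IST with a sparsifying basis:
    x <- B ST_{lam/c}( B^H ( x + (1/c) A^H (b - A x) ) )."""
    x = np.zeros(A.shape[1], dtype=A.dtype)
    for _ in range(num_iters):
        nu = x + (A.conj().T @ (b - A @ x)) / c           # gradient step
        alpha = soft_threshold(B.conj().T @ nu, lam / c)  # threshold in basis
        x = B @ alpha                                     # back to signal domain
    return x

# A random orthonormal B (illustration only, not actually sparsifying);
# A, b, c as in the earlier sketch.
rng = np.random.default_rng(0)
B, _ = np.linalg.qr(rng.standard_normal((100, 100)))
```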
Implementation
We introduce the current residual:

\[
r_i = b - A x_i \tag{21}
\]

and a step size parameter \(\gamma = \frac{1}{c}\) (we reserve \(\alpha\) for the basis representation). We also assume that the thresholding function \(\mathbf{T}\) is user defined.

This simplifies the iteration to:

\[
x_{i+1} = B\, \mathbf{T}\left( B^H \left( x_i + \gamma A^H r_i \right) \right) \tag{22}
\]
Algorithm state
The current state of the algorithm is described by the following quantities:

Term | Description
---|---
`x` | Current estimate of \(x\)
`r` | Current residual \(r = b - A x\)
`r_norm_sqr` | Squared norm of the residual \(\| r \|_2^2\)
`x_change_norm` | Norm of the change in \(x\), given by \(\| x_{i+1} - x_{i} \|_2\)
`iterations` | Number of iterations of the algorithm run so far
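One possible container for this state, with names taken from the table (a sketch):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ISTAState:
    x: np.ndarray          # current estimate of the solution
    r: np.ndarray          # current residual b - A x
    r_norm_sqr: float      # squared norm of the residual
    x_change_norm: float   # norm of the change in x in the last iteration
    iterations: int        # number of iterations run so far
```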
Algorithm initialization
We initialize the algorithm with:
- \(x \leftarrow x_0\), an initial estimate of the solution given by the user; it can be \(0\)
- \(r \leftarrow b - A x_0\)
- Compute `r_norm_sqr` \(\leftarrow \| r \|_2^2\)
- Give a very high value to `x_change_norm` (since there is no \(x_{-1}\))
- `iterations` \(\leftarrow 0\)
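Using the `ISTAState` container above, the initialization could look like this (a sketch):

```python
import numpy as np

def initialize(A, b, x0=None):
    """Build the initial algorithm state from a user supplied x0 (or 0)."""
    x = np.zeros(A.shape[1]) if x0 is None else np.asarray(x0)
    r = b - A @ x
    return ISTAState(
        x=x,
        r=r,
        r_norm_sqr=float(np.vdot(r, r).real),
        x_change_norm=np.inf,  # no x_{-1} yet, so use a very high value
        iterations=0,
    )
```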
Algorithm iteration
The following steps are involved in each iteration. They are directly based on (20), in the simplified form (22):

1. Compute the gradient step: \(g \leftarrow \gamma A^H r\)
2. Update the estimate: \(x \leftarrow x + g\)
3. Transform: \(\alpha \leftarrow B^H x\)
4. Threshold: \(\alpha \leftarrow \mathbf{T} (\alpha)\)
5. Inverse transform: \(x \leftarrow B \alpha\)
6. Update the residual: \(r \leftarrow b - A x\)
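Put together, one update of the algorithm state could look like this sketch (step size \(\gamma\), basis \(B\), and a user supplied thresholding function `T`; it reuses the `ISTAState` container defined above):

```python
import numpy as np

def iterate(state, A, B, b, gamma, T):
    """One IST iteration, following the steps above."""
    g = gamma * (A.conj().T @ state.r)   # gradient step
    x = state.x + g                      # update estimate
    alpha = B.conj().T @ x               # transform to the sparsifying basis
    alpha = T(alpha)                     # threshold the representation
    x_new = B @ alpha                    # inverse transform
    r = b - A @ x_new                    # update residual
    return ISTAState(
        x=x_new,
        r=r,
        r_norm_sqr=float(np.vdot(r, r).real),
        x_change_norm=float(np.linalg.norm(x_new - state.x)),
        iterations=state.iterations + 1,
    )
```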