Iterative Shrinkage and Thresholding Algorithm

Our description is based on [Ela10, DDDM04, ZE10].

ISTA can be used to solve problems of the form:

(1)\[\widehat{x} = \text{arg} \min_{x} \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1\]

and its variants (with different regularizations).

Here the objective function is:

(2)\[f(x) = \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1\]

Derivation

When \(A\) is the identity operator, the problem reduces to:

(3)\[\widehat{x} = \text{arg} \min_{x} \frac{1}{2}\| b - x \|_2^2 + \lambda \| x \|_1\]

The objective function on the right hand side:

(4)\[f(x) = \frac{1}{2}\| b - x \|_2^2 + \lambda \| x \|_1 = \sum_{i=1}^n \left [\frac{1}{2} | b_i - x_i|^2 + \lambda |x_i| \right ]\]

is separable in the components of \(x\).

The scalar function

(5)\[g(\tau) = \frac{1}{2} | \gamma - \tau|^2 + \lambda |\tau|\]

has a minimizer given by the soft thresholding function:

(6)\[\begin{split}\tau^* = \mathbf{ST}_{\lambda}(\gamma) = \begin{cases} \gamma\left ( 1 - \frac{\lambda}{|\gamma|} \right ) & \text{for} & |\gamma| \gt \lambda\\ 0 & \text{for} & |\gamma| \le \lambda \end{cases}\end{split}\]

[ZE10] show that a similar closed-form solution emerges when \(A\) is unitary as well.
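
As a concrete illustration, here is a minimal NumPy sketch of the soft thresholding operator (6), applied elementwise; the function name soft_threshold is ours.

    import numpy as np

    def soft_threshold(gamma, lam):
        # Soft thresholding ST_lam from (6), applied elementwise:
        # shrink each entry towards zero by lam; entries with |gamma| <= lam become 0.
        return np.sign(gamma) * np.maximum(np.abs(gamma) - lam, 0.0)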

[DDDM04] introduced a surrogate term:

(7)\[\text{dist}(x, x_0) = \frac{c}{2} \| x - x_0 \|_2^2 - \frac{1}{2} \| A x - A x_0 \|_2^2\]

For this function to be strictly convex w.r.t. \(x\), its Hessian must be positive definite. This holds if:

(8)\[c > \| A^H A \|_2 = \lambda_{\max}(A^H A)\]

i.e., \(c\) must exceed the largest eigenvalue of \(A^H A\).
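
As a quick numerical check (the names rng and A below are purely illustrative), the bound in (8) can be evaluated as the squared spectral norm of \(A\), since \(\lambda_{\max}(A^H A) = \sigma_{\max}(A)^2\):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 100))

    # the largest eigenvalue of A^H A equals the squared largest singular value of A
    c = 1.01 * np.linalg.norm(A, 2) ** 2   # choose c slightly above the bound in (8)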

The new surrogate objective function is defined as:

(9)\[\widetilde{f}(x, x_0) = f(x) + \text{dist}(x, x_0) = \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1 + \frac{c}{2} \| x - x_0 \|_2^2 - \frac{1}{2} \| A x - A x_0 \|_2^2\]

Introducing:

(10)\[\nu_0 = x_0 + \frac{1}{c} A^H (b - A x_0)\]

We can rewrite the surrogate objective as:

(11)\[\widetilde{f}(x, x_0) = \text{Const} + \lambda \| x \|_1 + \frac{c}{2} \| x - \nu_0 \|_2^2\]

As discussed above, this surrogate objective can be minimized using the soft thresholding operator:

(12)\[x^* = \mathbf{ST}_{\frac{\lambda}{c} }(\nu_0) = \mathbf{ST}_{\frac{\lambda}{c} }\left (x_0 + \frac{1}{c} A^H (b - A x_0) \right )\]

Starting with some \(x_0\), we can propose an iterative method that generates a sequence \(\{x_0, x_1, x_2, \dots \}\) as:

(13)\[x_{i+1} = \mathbf{ST}_{\frac{\lambda}{c} }\left (x_i + \frac{1}{c} A^H (b - A x_i)\right )\]

This is the IST algorithm.
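
The following is a minimal NumPy sketch of iteration (13), with \(c\) chosen as in (8); the function name ista and the parameters max_iter and tol are ours, not part of the derivation above.

    import numpy as np

    def soft_threshold(gamma, lam):
        return np.sign(gamma) * np.maximum(np.abs(gamma) - lam, 0.0)

    def ista(A, b, lam, max_iter=500, tol=1e-6):
        # Iteration (13): x <- ST_{lam/c}( x + (1/c) A^H (b - A x) )
        c = 1.01 * np.linalg.norm(A, 2) ** 2      # c > ||A^H A||_2, see (8)
        x = np.zeros(A.shape[1])
        for _ in range(max_iter):
            x_new = soft_threshold(x + (A.conj().T @ (b - A @ x)) / c, lam / c)
            if np.linalg.norm(x_new - x) <= tol:
                return x_new
            x = x_new
        return x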

By changing the regularization in (1), we can derive different IST algorithms with different thresholding functions. The version below considers a generalized thresholding function which depends on the regularizer.

(14)\[x_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (x_i + \frac{1}{c} A^H (b - A x_i)\right )\]
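
For instance, replacing the \(\ell_1\) penalty with an \(\ell_0\)-style penalty leads to a hard thresholding function. A minimal sketch of such an alternative \(\mathbf{T}\) (the name hard_threshold is ours):

    import numpy as np

    def hard_threshold(gamma, thr):
        # keep entries whose magnitude exceeds thr, zero out the rest
        return np.where(np.abs(gamma) > thr, gamma, 0.0)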

Sparsifying Basis

Often, the signal \(x\) (e.g. an image) is not itself sparse or compressible, but it has a sparse representation in some basis \(B\). We have

(15)\[\alpha = B^H x\]

as the representation of \(x\) in the basis \(B\).

The regularization is then applied to the representation \(\alpha\). (1) becomes:

(16)\[\widehat{x} = \text{arg} \min_{x} \frac{1}{2}\| b - A x \|_2^2 + \lambda \| B^H x \|_1\]

Assuming \(B\) is unitary, so that \(x = B \alpha\), we can rewrite this in terms of the representation:

(17)\[\widehat{\alpha} = \text{arg} \min_{\alpha} \frac{1}{2}\| b - A B \alpha \|_2^2 + \lambda \| \alpha \|_1\]

(14) changes to:

(18)\[\alpha_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\alpha_i + \frac{1}{c} B^H A^H (b - A B \alpha_i)\right )\]

Substituting \(\alpha_i = B^H x_i\), we get:

(19)\[\alpha_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (B^H x_i + \frac{1}{c} B^H A^H (b - A x_i)\right )\]

Multiplying both sides by \(B\) (so that \(x_{i+1} = B \alpha_{i+1}\)):

(20)\[x_{i+1} = B \mathbf{T}_{\frac{\lambda}{c} }\left ( B^H \left (x_i + \frac{1}{c} A^H (b - A x_i)\right ) \right )\]

This is the version of IST algorithm with an operator \(A\) and a basis \(B\).
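
Below is a minimal NumPy sketch of iteration (20), taking \(B\) to be an orthonormal DCT basis so that \(B^H x\) and \(B \alpha\) can be applied with scipy.fft.dct and scipy.fft.idct (norm='ortho'); the function name ista_basis and its parameters are ours.

    import numpy as np
    from scipy.fft import dct, idct

    def soft_threshold(gamma, lam):
        return np.sign(gamma) * np.maximum(np.abs(gamma) - lam, 0.0)

    def ista_basis(A, b, lam, max_iter=500, tol=1e-6):
        # Iteration (20) with B^H x = dct(x, norm='ortho') and B alpha = idct(alpha, norm='ortho')
        c = 1.01 * np.linalg.norm(A, 2) ** 2
        x = np.zeros(A.shape[1])
        for _ in range(max_iter):
            v = x + (A.conj().T @ (b - A @ x)) / c                  # gradient step
            alpha = soft_threshold(dct(v, norm='ortho'), lam / c)   # threshold in the basis
            x_new = idct(alpha, norm='ortho')                       # back to the signal domain
            if np.linalg.norm(x_new - x) <= tol:
                return x_new
            x = x_new
        return x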

Implementation

We introduce the current residual:

(21)\[r = b - A x\]

and a step size parameter \(\alpha = \frac{1}{c}\) (not to be confused with the representation \(\alpha\) above). We also assume that the thresholding function \(\mathbf{T}\) is user defined.

This simplifies the iteration to:

(22)\[x_{i+1} = B \mathbf{T}\left ( B^H \left (x_i + \alpha A^H r_i \right ) \right )\]

Algorithm state

The current state of the algorithm is described by the following quantities (a code sketch of a state container follows the list):

  • x: current estimate of \(x\)

  • r: current residual \(r = b - A x\)

  • r_norm_sqr: squared norm of the residual, \(\| r \|_2^2\)

  • x_change_norm: norm of the change in \(x\), \(\|x_{i+1} - x_{i} \|_2\)

  • iterations: number of iterations of the algorithm run so far
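
One possible way to hold this state in code is a small container; the class name ISTAState below is ours.

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ISTAState:
        x: np.ndarray          # current estimate of x
        r: np.ndarray          # current residual r = b - A x
        r_norm_sqr: float      # squared norm of the residual ||r||_2^2
        x_change_norm: float   # norm of the change in x, ||x_{i+1} - x_i||_2
        iterations: int        # number of iterations run so far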

Algorithm initialization

We initialize the algorithm as follows (a code sketch is given after this list):

  • \(x \leftarrow x_0\), an initial estimate of the solution provided by the user; it can be 0.

  • \(r \leftarrow b - A x_0\)

  • Compute r_norm_sqr

  • Set x_change_norm to a very high value (since there is no \(x_{-1}\)).

  • iterations = 0
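
A sketch of this initialization, assuming the ISTAState container from the previous sketch; the function name ista_init is ours.

    import numpy as np

    def ista_init(A, b, x0=None):
        # build the initial state described above
        x = np.zeros(A.shape[1]) if x0 is None else x0
        r = b - A @ x
        return ISTAState(
            x=x,
            r=r,
            r_norm_sqr=float(np.vdot(r, r).real),   # ||r||_2^2
            x_change_norm=np.inf,                   # no x_{-1} yet, so use a very high value
            iterations=0,
        )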

Algorithm iteration

The following steps are performed in each iteration. They are directly based on (20), in the simplified form (22); a code sketch of a single iteration follows the list.

  • Compute the gradient step: \(g \leftarrow \alpha A^H r\)

  • Update estimate: \(x \leftarrow x + g\)

  • Transform \(\alpha \leftarrow B^H x\)

  • Threshold: \(\alpha \leftarrow \mathbf{T} (\alpha)\)

  • Inverse transform \(x \leftarrow B \alpha\)

  • Update residual: \(r \leftarrow b - A x\)
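
A sketch of one such iteration, again taking \(B\) to be the orthonormal DCT and reusing the ISTAState container and a user supplied thresholding function; the name ista_step is ours.

    import numpy as np
    from scipy.fft import dct, idct

    def ista_step(state, A, b, threshold, step_size):
        # One iteration of (22), with B^H = dct(., norm='ortho') and B = idct(., norm='ortho')
        x_prev = state.x
        v = x_prev + step_size * (A.conj().T @ state.r)   # gradient step
        alpha = threshold(dct(v, norm='ortho'))           # transform and threshold
        x = idct(alpha, norm='ortho')                     # inverse transform
        r = b - A @ x                                     # update the residual
        return ISTAState(
            x=x,
            r=r,
            r_norm_sqr=float(np.vdot(r, r).real),
            x_change_norm=float(np.linalg.norm(x - x_prev)),
            iterations=state.iterations + 1,
        )

For instance, threshold could be lambda a: soft_threshold(a, lam * step_size), with step_size = 1 / c as above.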