Iterative Shrinkage and Thresholding Algorithm

Our description is based on [DDDM04, Ela10, ZE10].

ISTA can be used to solve problems of the form:

(1)\[\widehat{\bx} = \text{arg} \min_{\bx} \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1\]

and its variants (with different regularizations).

Here the objective function is:

(2)\[f(\bx) = \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1\]
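As a concrete reference, here is a minimal NumPy sketch for evaluating this objective (the helper name `objective` is our own):

```python
import numpy as np

def objective(A, b, x, lmbda):
    """Objective (2): quadratic data-fit term plus the l1 penalty."""
    return 0.5 * np.linalg.norm(b - A @ x) ** 2 + lmbda * np.linalg.norm(x, 1)
```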

Derivation

When \(\bA\) is the identity matrix, the problem reduces to:

(3)\[\widehat{\bx} = \text{arg} \min_{\bx} \frac{1}{2}\| \bb - \bx \|_2^2 + \lambda \| \bx \|_1\]

The objective on the R.H.S.:

(4)\[f(\bx) = \frac{1}{2}\| \bb - \bx \|_2^2 + \lambda \| \bx \|_1 = \sum_{i=1}^n \left [\frac{1}{2} | b_i - x_i|^2 + \lambda |x_i| \right ]\]

is separable in the components of \(\bx\).

The scalar function

(5)\[g(\tau) = \frac{1}{2} | \gamma - \tau|^2 + \lambda |\tau|\]

has a minimizer given by the soft thresholding function:

(6)\[\begin{split}\tau^* = \mathbf{ST}_{\lambda}(\gamma) = \begin{cases} \gamma\left ( 1 - \frac{\lambda}{|\gamma|} \right ) & \text{for} & |\gamma| > \lambda\\ 0 & \text{for} & |\gamma| \le \lambda \end{cases}\end{split}\]
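A minimal NumPy sketch of this operator for real-valued inputs (the name `soft_threshold` is our own):

```python
import numpy as np

def soft_threshold(gamma, lmbda):
    """Soft thresholding ST_lambda(gamma) from (6), applied elementwise."""
    # gamma * (1 - lmbda/|gamma|) equals sign(gamma) * (|gamma| - lmbda) when |gamma| > lmbda
    return np.sign(gamma) * np.maximum(np.abs(gamma) - lmbda, 0.0)
```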

Zibulevsky and Elad in [ZE10] show that a similar solution emerges also when \(\bA\) is unitary.

Daubechies et al. in [DDDM04] introduced a surrogate term:

(7)\[\text{dist}(\bx, \bx_0) = \frac{c}{2} \| \bx - \bx_0 \|_2^2 - \frac{1}{2} \| \bA \bx - \bA \bx_0 \|_2^2\]

For this function to be strictly convex w.r.t. \(\bx\), its Hessian must be positive definite. This holds if:

(8)\[c > \| \bA^H \bA \|_2 = \lambda_{\max}(\bA^H \bA)\]

where \(\lambda_{\max}(\bA^H \bA)\) denotes the largest eigenvalue of \(\bA^H \bA\).
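Since \(\bA^H \bA\) is Hermitian, its spectral norm equals its largest eigenvalue, which is \(\sigma_{\max}^2(\bA)\). One simple way to pick a valid \(c\) is therefore via the largest singular value of \(\bA\); the helper below is a sketch (the name `choose_c` and the safety margin are our own choices):

```python
import numpy as np

def choose_c(A, margin=1.01):
    """Return a c satisfying (8): slightly above ||A^H A||_2 = sigma_max(A)^2."""
    sigma_max = np.linalg.norm(A, 2)   # largest singular value of A
    return margin * sigma_max ** 2
```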

The new surrogate objective function is defined as:

(9)\[\widetilde{f}(\bx, \bx_0) = f(\bx) + \text{dist}(\bx, \bx_0) = \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1 + \frac{c}{2} \| \bx - \bx_0 \|_2^2 - \frac{1}{2} \| \bA \bx - \bA \bx_0 \|_2^2\]

Introducing:

(10)\[\nu_0 = \bx_0 + \frac{1}{c} \bA^H (\bb - \bA \bx_0)\]

expanding the quadratic terms in (9), completing the square in \(\bx\), and absorbing all terms independent of \(\bx\) into a constant, we can rewrite the surrogate objective as:

(11)\[\widetilde{f}(\bx, \bx_0) = \text{Const} + \lambda \| \bx \|_1 + \frac{c}{2} \| \bx - \nu_0 \|_2^2\]

As discussed above, this surrogate objective can be minimized using the soft thresholding operator:

(12)\[\bx^* = \mathbf{ST}_{\frac{\lambda}{c} }(\nu_0) = \mathbf{ST}_{\frac{\lambda}{c} }\left (\bx_0 + \frac{1}{c} \bA^H (\bb - \bA \bx_0) \right )\]

Starting with some \(\bx_0\), we can define an iterative method that produces a sequence \(\{ \bx_i \} = \{\bx_0, \bx_1, \bx_2, \dots \}\):

(13)\[\bx_{i+1} = \mathbf{ST}_{\frac{\lambda}{c} }\left (\bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i)\right )\]

This is the IST algorithm.
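A compact sketch of this iteration for real-valued data, with the soft thresholding step written inline (the name `ista_l1`, the fixed iteration count, and the absence of a stopping rule are our own simplifications):

```python
import numpy as np

def ista_l1(A, b, lmbda, c, num_iters=100, x0=None):
    """Iteration (13) for the l1-regularized problem (1)."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    for _ in range(num_iters):
        # gradient step with step size 1/c
        v = x + A.conj().T @ (b - A @ x) / c
        # soft thresholding ST_{lambda/c}, applied elementwise
        x = np.sign(v) * np.maximum(np.abs(v) - lmbda / c, 0.0)
    return x
```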

By changing the regularization in (1), we can derive different IST algorithms with different thresholding functions. The version below uses a generalized thresholding function \(\mathbf{T}\) which depends on the regularizer.

(14)\[\bx_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i)\right )\]

Sparsifying Basis

Often, the signal \(\bx\) (e.g. an image) may not itself be sparse or compressible, but it may have a sparse representation in some basis \(\bB\). We have

(15)\[\ba = \bB^H \bx\]

as the representation of \(\bx\) in the basis \(\bB\).

The regularization is then applied to the representation \(\ba\). (1) becomes:

(16)\[\widehat{\bx} = \text{arg} \min_{\bx} \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bB^H \bx \|_1\]

We can rewrite this as:

(17)\[\widehat{\ba} = \text{arg} \min_{\ba} \frac{1}{2}\| \bb - \bA \bB \ba \|_2^2 + \lambda \| \ba \|_1\]

(14) changes to:

(18)\[\ba_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\ba_i + \frac{1}{c} \bB^H \bA^H (\bb - \bA \bB \ba_i)\right )\]

Substituting \(\ba_i = \bB^H \bx_i\) and \(\bx_i = \bB \ba_i\), we get:

(19)\[\ba_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\bB^H \bx_i + \frac{1}{c} \bB^H \bA^H (\bb - \bA \bx_i)\right )\]

Multiplying both sides by \(\bB\), so that \(\bx_{i+1} = \bB \ba_{i+1}\), and simplifying:

(20)\[\bx_{i+1} = \bB \mathbf{T}_{\frac{\lambda}{c} } \left ( \bB^H \left (\bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i)\right ) \right )\]

This is the version of the IST algorithm with a general operator \(\bA\) and a sparsifying basis \(\bB\).

Implementation

We introduce the current residual:

(21)\[\br = \bb - \bA \bx\]

and a step size parameter \(\alpha = \frac{1}{c}\). We also assume that the thresholding function \(\mathbf{T}\) is provided by the user.

This simplifies the iteration to:

(22)\[\bx_{i+1} = \bB \mathbf{T}\left ( \bB^H \left (\bx_i + \alpha \bA^H \br_i \right ) \right )\]

Algorithm state

The current state of the algorithm is described by the following quantities:

  • x : Current estimate of \(\bx\)

  • r : Current residual \(\br\)

  • r_norm_sqr : Squared norm of the residual \(\| \br \|_2^2\)

  • x_change_norm : Norm of the change in the estimate, \(\|\bx_{i+1} - \bx_{i} \|_2\)

  • iterations : Number of iterations of the algorithm run so far

Algorithm initialization

We initialize the algorithm with the following quantities (a sketch in code follows the list):

  • \(\bx \leftarrow \bx_0\), an initial estimate of the solution given by the user. It can be \(\bzero\).

  • \(\br \leftarrow \bb - \bA \bx_0\)

  • r_norm_sqr \(\leftarrow \| \br \|_2^2\)

  • Set x_change_norm to a very high value (since there is no \(\bx_{-1}\) to compare against).

  • iterations \(\leftarrow 0\)
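A minimal sketch of this initialization, assuming the same NumPy setting as the earlier sketches (the dictionary keys mirror the state table above; `np.inf` stands in for the "very high value"):

```python
import numpy as np

def ista_init(A, b, x0=None):
    """Build the initial state of the IST algorithm."""
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    r = b - A @ x                          # initial residual
    return {
        "x": x,
        "r": r,
        "r_norm_sqr": np.linalg.norm(r) ** 2,
        "x_change_norm": np.inf,           # no previous estimate to compare against
        "iterations": 0,
    }
```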

Algorithm iteration

The following steps are involved in each iteration. They directly implement (22); a sketch in code follows the list.

  • Gradient step: \(\bg \leftarrow \alpha \bA^H \br\)

  • Update estimate: \(\bx \leftarrow \bx + \bg\)

  • Transform: \(\ba \leftarrow \bB^H \bx\)

  • Threshold: \(\ba \leftarrow \mathbf{T} (\ba)\)

  • Inverse transform: \(\bx \leftarrow \bB \ba\)

  • Update residual: \(\br \leftarrow \bb - \bA \bx\)
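A sketch of one such iteration in the same setting, assuming \(\bB\) is an orthonormal basis stored as a matrix and \(\mathbf{T}\) is a user-supplied thresholding function; it consumes and returns the state dictionary from the initialization sketch above:

```python
import numpy as np

def ista_iteration(state, A, B, b, alpha, T):
    """One iteration of (22), following the steps listed above."""
    x, r = state["x"], state["r"]
    g = alpha * A.conj().T @ r             # gradient step
    x_new = x + g                          # update the estimate
    a = B.conj().T @ x_new                 # transform into the sparsifying basis
    a = T(a)                               # threshold the representation
    x_new = B @ a                          # inverse transform back to signal space
    r_new = b - A @ x_new                  # update the residual
    return {
        "x": x_new,
        "r": r_new,
        "r_norm_sqr": np.linalg.norm(r_new) ** 2,
        "x_change_norm": np.linalg.norm(x_new - x),
        "iterations": state["iterations"] + 1,
    }
```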