.. _sls:ista: Iterative Shrinkage and Thresholding Algorithm ====================================================== .. contents:: :depth: 2 :local: Our description is based on :cite:`daubechies2004iterative, elad2010sparse, zibulevsky2010l1`. ISTA can be used to solve problems of the form: .. math:: :label: ista_l1_l2_minimization \widehat{\bx} = \text{arg} \min_{x} \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1 and its variants (with different regularizations). Here the objective function is: .. math:: f(\bx) = \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1 Derivation ------------------ For an identity :math:`\bA`, the problem reduces to: .. math:: \widehat{\bx} = \text{arg} \min_{x} \frac{1}{2}\| \bb - \bx \|_2^2 + \lambda \| \bx \|_1 The term on the R.H.S. : .. math:: f(\bx) = \frac{1}{2}\| \bb - \bx \|_2^2 + \lambda \| \bx \|_1 = \sum_{i=1}^n \left [\frac{1}{2} | b_i - x_i|^2 + \lambda |x_i| \right ] is separable in the components of :math:`\bx`. The scalar function .. math:: g(\tau) = \frac{1}{2} | \gamma - \tau|^2 + \lambda |\tau| has a minimizer given by the soft thresholding function: .. math:: \tau^* = \mathbf{ST}_{\lambda}(\gamma) = \begin{cases} \gamma\left ( 1 - \frac{\lambda}{|\gamma|} \right ) & \text{for} & |\gamma| \gt \lambda\\ 0 & \text{for} & |\gamma| \le \lambda \end{cases} Zibulevsky and Elad in :cite:`zibulevsky2010l1` show that a similar solution emerges when :math:`\bA` is unitary also. Daubechies et al. in :cite:`daubechies2004iterative` introduced a surrogate term: .. math:: \text{dist}(\bx, \bx_0) = \frac{c}{2} \| \bx - \bx_0 \|_2^2 - \frac{1}{2} \| \bA \bx - \bA \bx_0 \|_2^2 For this function to be strictly convex w.r.t. :math:`\bx`, its Hessian must be positive definite. This holds if: .. math:: c > \| \bA^H \bA \|_2 = \lambda_{\max}(\bA^H \bA) the largest eigen value :math:`\bA^H \bA`. The new surrogate objective function is defined as: .. math:: \widetilde{f}(\bx, \bx_0) = f(\bx) + \text{dist}(\bx, \bx_0) = \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bx \|_1 + \frac{c}{2} \| \bx - \bx_0 \|_2^2 - \frac{1}{2} \| \bA \bx - \bA \bx_0 \|_2^2 Introducing: .. math:: \nu_0 = \bx_0 + \frac{1}{c} \bA^H (\bb - \bA \bx_0) we can rewrite the surrogate objective as: .. math:: \widetilde{f}(\bx, \bx_0) = \text{Const} + \lambda \| \bx \|_1 + \frac{c}{2} \| \bx - \nu_0 \|_2^2 As discussed above, this surrogate objective can be minimized using the soft thresholding operator: .. math:: \bx^* = \mathbf{ST}_{\frac{\lambda}{c} }(\nu_0) = \mathbf{ST}_{\frac{\lambda}{c} }\left (\bx_0 + \frac{1}{c} \bA^H (\bb - \bA \bx_0) \right ) Starting with some :math:`\bx_0`, we can propose an iterative method over a sequence :math:`\{ \bx_i \} = \{\bx_0, \bx_1, \bx_2, \dots \}` as: .. math:: :label: ista_iteration_soft_thresholding \bx_{i+1} = \mathbf{ST}_{\frac{\lambda}{c} }\left (\bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i)\right ) This is the IST algorithm. By changing the regularization in :eq:`ista_l1_l2_minimization`, we can derive different IST algorithms with different thresholding functions. The version below considers a generalized thresholding function which depends on the regularizer. .. math:: :label: ista_iteration_thresholding \bx_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i)\right ) Sparsifying Basis ''''''''''''''''''''''' Often, the signal :math:`\bx` (e.g. an image) may not be sparse or compressible but it has a sparse representation in some basis :math:`\bB`. We have .. math:: \ba = \bB^H \bx as the representation of :math:`\bx` in the basis :math:`\bB`. The regularization is then applied to the representation :math:`\ba`. :eq:`ista_l1_l2_minimization` becomes: .. math:: \widehat{\bx} = \text{arg} \min_{\bx} \frac{1}{2}\| \bb - \bA \bx \|_2^2 + \lambda \| \bB^H \bx \|_1 We can rewrite this as: .. math:: \widehat{\ba} = \text{arg} \min_{\ba} \frac{1}{2}\| \bb - \bA \bB \ba \|_2^2 + \lambda \| \ba \|_1 :eq:`ista_iteration_thresholding` changes to: .. math:: \ba_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\ba_i + \frac{1}{c} \bB^H \bA^H (\bb - \bA \bB \ba_i)\right ) By substituting :math:`\ba = \bB^H \bx` and :math:`\bx = \bB \ba`, we get: .. math:: \ba_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\bB^H \bx_i + \frac{1}{c} \bB^H \bA^H (\bb - \bA \bx_i)\right ) Simplifying further: .. math:: :label: ista_iteration_thresholding_basis \bx_{i+1} = \bB \mathbf{T}_{\frac{\lambda}{c} } \left ( \bB^H \left (\bx_i + \frac{1}{c} \bA^H (\bb - \bA \bx_i)\right ) \right ) This is the version of IST algorithm with an operator :math:`\bA` and a sparsifying basis :math:`\bB`. Implementation ---------------------- We introduce the current residual: .. math:: \br = \bb - \bA \bx and a step size parameter :math:`\alpha = \frac{1}{c}`. We also assume that a thresholding function :math:`\mathbf{T}` will be user defined. This simplifies the iteration to: .. math:: \bx_{i+1} = \bB \mathbf{T}\left ( \bB^H \left (\bx_i + \alpha \bA^H \br_i \right ) \right ) .. rubric:: Algorithm state Current state of the algorithm is described by following quantities: .. list-table:: :header-rows: 1 * - Term - Description * - ``x`` - Current estimate of :math:`\bx` * - ``r`` - Current residual :math:`\br` * - ``r_norm_sqr`` - Squared norm of the residual :math:`\| \br \|_2^2` * - ``x_change_norm`` - Change in the norm of :math:`x` given by :math:`\|\bx_{i+1} - \bx_{i} \|_2` * - ``iterations`` - Number of iterations of algorithm run so far .. rubric:: Algorithm initialization We initialize the algorithm with: * :math:`\bx \leftarrow \bx_0` an initial estimate of solution given by user. It can be :math:`\bzero`. * :math:`\br \leftarrow \bb - \bA \bx_0` * Compute ``r_norm_sqr`` * Give a very high value to ``x_change_norm`` (since there is no :math:`\bx_{-1}`). * ``iterations = 0`` .. rubric:: Algorithm iteration Following steps are involved in each iteration. These are directly based on :eq:`ista_iteration_thresholding_basis`. * Compute gradient: :math:`\bg \leftarrow \alpha \bA^H \br` * Update estimate: :math:`\bx \leftarrow \bx + \bg` * Transform :math:`\alpha \leftarrow \bB^H \bx` * Threshold: :math:`\alpha \leftarrow \mathbf{T} (\alpha)` * Inverse transform :math:`\bx \leftarrow \bB \alpha` * Update residual: :math:`\br \leftarrow \bb - \bA \bx`