.. _sls:ista: Iterative Shrinkage and Thresholding Algorithm ====================================================== .. contents:: :depth: 2 :local: Our description is based on :cite:`daubechies2004iterative, elad2010sparse, zibulevsky2010l1`. ISTA can be used to solve problems of the form: .. math:: :label: ista_l1_l2_minimization \widehat{x} = \text{arg} \min_{x} \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1 and its variants (with different regularizations). Here the objective function is: .. math:: f(x) = \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1 Derivation ------------------ For an identity :math:`A`, the problem reduces to: .. math:: \widehat{x} = \text{arg} \min_{x} \frac{1}{2}\| b - x \|_2^2 + \lambda \| x \|_1 The term on the R.H.S. : .. math:: f(x) = \frac{1}{2}\| b - x \|_2^2 + \lambda \| x \|_1 = \sum_{i=1}^n \left [\frac{1}{2} | b_i - x_i|^2 + \lambda |x_i| \right ] is separable in the components of :math:`x`. The scalar function .. math:: g(\tau) = \frac{1}{2} | \gamma - \tau|^2 + \lambda |\tau| has a minimizer given by the soft thresholding function: .. math:: \tau^* = \mathbf{ST}_{\lambda}(\gamma) = \begin{cases} \gamma\left ( 1 - \frac{\lambda}{|\gamma|} \right ) & \text{for} & |\gamma| \gt \lambda\\ 0 & \text{for} & |\gamma| \le \lambda \end{cases} :cite:`zibulevsky2010l1` show that a similar solution emerges when :math:`A` is unitary too. :cite:`daubechies2004iterative` introduced a surrogate term: .. math:: \text{dist}(x, x_0) = \frac{c}{2} \| x - x_0 \|_2^2 - \frac{1}{2} \| A x - A x_0 \|_2^2 For this function to be strictly convex w.r.t. :math:`x`, its Hessian must be positive definite. This holds if: .. math:: c > \| A^H A \|_2 = \lambda_{\max}(A^H A) the largest eigen value :math:`A^H A`. The new surrogate objective function is defined as: .. math:: \widetilde{f}(x, x_0) = f(x) + \text{dist}(x, x_0) = \frac{1}{2}\| b - A x \|_2^2 + \lambda \| x \|_1 + \frac{c}{2} \| x - x_0 \|_2^2 - \frac{1}{2} \| A x - A x_0 \|_2^2 Introducing: .. math:: \nu_0 = x_0 + \frac{1}{c} A^H (b - A x_0) We can rewrite the surrogate objective as: .. math:: \widetilde{f}(x, x_0) = \text{Const} + \lambda \| x \|_1 + \frac{c}{2} \| x - \nu_0 \|_2^2 As discussed above, this surrogate objective can be minimized using the soft thresholding operator: .. math:: x^* = \mathbf{ST}_{\frac{\lambda}{c} }(\nu_0) = \mathbf{ST}_{\frac{\lambda}{c} }\left (x_0 + \frac{1}{c} A^H (b - A x_0) \right ) Starting with some :math:`x_0`, we can propose an itarative method over a sequence :math:`x_i = \{x_0, x_1, x_2, \dots \}` as: .. math:: :label: ista_iteration_soft_thresholding x_{i+1} = \mathbf{ST}_{\frac{\lambda}{c} }\left (x_i + \frac{1}{c} A^H (b - A x_i)\right ) This is the IST algorithm. By changing the regularization in :eq:`ista_l1_l2_minimization`, we can derive different IST algorithms with different thresholding functions. The version below considers a generalized thresholding function which depends on the regularizer. .. math:: :label: ista_iteration_thresholding x_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (x_i + \frac{1}{c} A^H (b - A x_i)\right ) Sparsifying Basis ''''''''''''''''''''''' Often, the signal :math:`x` (e.g. an image) may not be sparse or compressible but it has a sparse representation in some basis :math:`B`. We have .. math:: \alpha = B^H x as the representation of :math:`x` in the basis :math:`B`. The regularization is then applied to the representation :math:`\alpha`. :eq:`ista_l1_l2_minimization` becomes: .. math:: \widehat{x} = \text{arg} \min_{x} \frac{1}{2}\| b - A x \|_2^2 + \lambda \| B^H x \|_1 We can rewrite this as: .. math:: \widehat{\alpha} = \text{arg} \min_{\alpha} \frac{1}{2}\| b - A B \alpha \|_2^2 + \lambda \| \alpha \|_1 :eq:`ista_iteration_thresholding` changes to: .. math:: \alpha_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (\alpha_i + \frac{1}{c} B^H A^H (b - A B \alpha_i)\right ) By substituting :math:`\alpha = B^H x` and :math:`x = B \alpha`, we get: .. math:: \alpha_{i+1} = \mathbf{T}_{\frac{\lambda}{c} }\left (B^H x_i + \frac{1}{c} B^H A^H (b - A x_i)\right ) Simplifying further: .. math:: :label: ista_iteration_thresholding_basis x_{i+1} = B \mathbf{T}_{\frac{\lambda}{c} }\left ( B^H \left (x_i + \frac{1}{c} A^H (b - A x_i)\right ) \right ) This is the version of IST algorithm with an operator :math:`A` and a basis :math:`B`. Implementation ---------------------- We introduce the current residual: .. math:: r = b - A x and a step size parameter :math:`\alpha = \frac{1}{c}`. We also assume that a thresholding function :math:`\mathbf{T}` will be user defined. This simplifies the iteration to: .. math:: x_{i+1} = B \mathbf{T}\left ( B^H \left (x_i + \alpha A^H r_i \right ) \right ) .. rubric:: Algorithm state Current state of the algorithm is described by following quantities: .. list-table:: :header-rows: 1 * - Term - Description * - ``x`` - Current estimate of :math:`x` * - ``r`` - Current residual :math:`r` * - ``r_norm_sqr`` - Squared norm of the residual :math:`\| r \|_2^2` * - ``x_change_norm`` - Change in the norm of :math:`x` given by :math:`\|x_{i+1} - x_{i} \|_2` * - ``iterations`` - Number of iterations of algorithm run so far .. rubric:: Algorithm initialization We initialize the algorithm with: * :math:`x \leftarrow x_0` an initial estimate of solution given by user. It can be 0. * :math:`r \leftarrow b - A x_0` * Compute ``r_norm_sqr`` * Give a very high value to ``x_change_norm`` (since there is no :math:`x_{-1}`). * ``iterations = 0`` .. rubric:: Algorithm iteration Following steps are involved in each iteration. These are directly based on :eq:`ista_iteration_thresholding_basis`. * Compute gradient: :math:`g \leftarrow \alpha A^H r` * Update estimate: :math:`x \leftarrow x + g` * Transform :math:`\alpha \leftarrow B^H x` * Threshold: :math:`\alpha \leftarrow \mathbf{T} (\alpha)` * Inverse transform :math:`x \leftarrow B \alpha` * Update residual: :math:`r \leftarrow b - A x`