# Introduction to Sparse Subspace Clustering

Consider a dataset of \(S\) points in the ambient data space \(\mathbb{R}^M\) which have been assembled in a matrix \(Y\) of shape \(M \times S\).

In many applications, if we *group* or *segment* the data set \(Y\) into
multiple disjoint subsets (clusters):
\(Y = Y_1 \cup \dots \cup Y_K\),
then each subset can be modeled sufficiently well by a low dimensional
subspace of \(\mathbb{R}^M\) with dimension \(D \ll M\).
Some of the applications include:
motion segmentation [BB91, CK98, Gea98, Kan01, PK97, TK91, TK92],
face clustering [BJ03, HYL+03, LHK05]
and handwritten digit recognition [ZSWL12].

*Subspace clustering* is a clustering framework which assumes
that the data set can be segmented into clusters where points in
different clusters are drawn from different subspaces. Subspace clustering
algorithms are able to simultaneously segment the data into
clusters corresponding to different subspaces as well as estimate
the subspaces from the data itself.
A comprehensive review of subspace clustering can be found in
[Vid10].
Several state-of-the-art algorithms build
subspace preserving representations of individual data points
by treating the data set itself as a (self-expressive) dictionary.
To create subspace preserving representations, one resorts to
sparse coding algorithms developed in the sparse representation and
compressive sensing literature.

Two common algorithms are
*Sparse Subspace Clustering* using \(\ell_1\) *regularization*
(SSC-\(\ell_1\)) [EV09, EV13]
and *Sparse Subspace Clustering using Orthogonal
Matching Pursuit* (SSC-OMP) [DSB13, YRV16, YV15].
While SSC-\(\ell_1\) is guaranteed to give correct clustering under
broad conditions (arbitrary subspaces and corrupted data), it
requires solving a large scale convex optimization problem. On
the other hand, SSC-OMP
is computationally efficient, but its clustering accuracy can be
poor (especially when the density of data points per subspace is low).

The dataset \(Y\) is modeled as being sampled from a collection or arrangement \(\mathcal{U}\) of linear (or affine) subspaces \(\mathcal{U}_k \subset \mathbb{R}^M\) : \(\mathcal{U} = \{ \mathcal{U}_1 , \dots , \mathcal{U}_K \}\). The union of the subspaces is denoted as \(Z_{\mathcal{U}} = \mathcal{U}_1 \cup \dots \cup \mathcal{U}_K\).

Let the data set be \(\{ y_j \in \mathbb{R}^M \}_{j=1}^S\)
drawn from the union of subspaces under consideration.
\(S\) is the total number of data points being analyzed
simultaneously.
We put the data points together in a *data matrix* as

\[
Y = \begin{bmatrix} y_1 & y_2 & \dots & y_S \end{bmatrix}.
\]

Let the vectors be drawn from a set of \(K\) (linear or affine) subspaces. The subspaces are indexed by a variable \(k\) with \(1 \leq k \leq K\). The \(k\)-th subspace is denoted by \(\mathcal{U}_k\). Let the (linear or affine) dimension of the \(k\)-th subspace be \(\dim(\mathcal{U}_k) = D_k\) with \(D_k \leq D \ll M\).

The vectors in \(Y\) can be grouped (or segmented or clustered) as submatrices \(Y_1, Y_2, \dots, Y_K\) such that all vectors in \(Y_k\) are drawn from the subspace \(\mathcal{U}_k\). Thus, we can write

\[
Y = \begin{bmatrix} Y_1 & Y_2 & \dots & Y_K \end{bmatrix} \Gamma,
\]

where \(\Gamma\) is an \(S \times S\) unknown permutation matrix placing each vector to the right subspace.

Let there be \(S_k\) vectors in \(Y_k\) with \(S = S_1 + \dots + S_K\). Let \(Q_k\) be an orthonormal basis for the subspace \(\mathcal{U}_k\). Then, the subspaces can be described as

\[
\mathcal{U}_k = \{ y \in \mathbb{R}^M : y = \mu_k + Q_k \alpha \text{ for some } \alpha \in \mathbb{R}^{D_k} \},
\]

where \(\mu_k \in \mathbb{R}^M\) is an offset vector and \(Q_k \in \mathbb{R}^{M \times D_k}\).

For linear subspaces, \(\mu_k = 0\).
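As a concrete illustration, the following standalone NumPy sketch (not part of `cr.sparse`; all names and sizes here are made up) samples a data matrix \(Y\) from a union of \(K\) random linear subspaces using orthonormal bases \(Q_k\):

```python
import numpy as np

rng = np.random.default_rng(0)
M, D, K = 20, 3, 4            # ambient dimension, subspace dimension, number of subspaces
S_k = 50                      # points per subspace

Ys = []
for k in range(K):
    # Orthonormal basis Q_k for a random D-dimensional subspace of R^M
    Q_k, _ = np.linalg.qr(rng.standard_normal((M, D)))
    # Points y = Q_k @ alpha (mu_k = 0 since the subspaces are linear)
    alpha = rng.standard_normal((D, S_k))
    Ys.append(Q_k @ alpha)

# Data matrix of shape M x S with S = K * S_k (here the points arrive
# already grouped, i.e. Gamma is the identity permutation)
Y = np.concatenate(Ys, axis=1)
print(Y.shape)                # (20, 200)
```

Each block \(Y_k\) has rank \(D_k = 3\) even though the points live in \(\mathbb{R}^{20}\), which is exactly the low dimensional structure subspace clustering exploits.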

A dataset where each point can be expressed as a linear combination
of other points in the dataset is said to satisfy
*self-expressiveness property*. The self-expressive
representation of a point \(y_s\) in \(Y\) is given by

\[
y_s = Y c_s, \quad c_{ss} = 0,
\]

where \(C = \begin{bmatrix}c_1, \dots, c_S \end{bmatrix} \in \mathbb{R}^{S \times S}\) is the matrix of representation coefficients.

Let \(y_s\) belong to \(k\)-th subspace \(\mathcal{U}_k\).
Let \(Y^{-s}\) denote the dataset \(Y\) excluding the point \(y_s\)
and \(Y_k^{-s}\) denote the
set of points in \(Y_k\) excluding \(y_s\). If \(Y_k^{-s}\) spans the subspace
\(\mathcal{U}_k\), then a representation of \(y_s\) can be constructed entirely
from the points in \(Y_k^{-s}\). A representation is called
*subspace preserving* if it consists of points within the same subspace.

If \(c_i\) is a subspace preserving representation of \(y_i\) and \(y_j\) belongs to a different subspace, then the coefficient of \(y_j\) in \(c_i\) is zero. Thus, if \(C\) consists entirely of subspace preserving representations, then \(C_{ij} = 0\) whenever \(y_i\) and \(y_j\) belong to different subspaces. In other words, if \(Y_{-k}\) denotes the set of points from all subspaces excluding the set \(Y_k\) corresponding to the point \(y_i\), then the points in \(Y_{-k}\) do not participate in the representation \(c_i\).
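The claim that \(y_s\) can be represented from its own subspace alone is easy to check numerically. This small NumPy sketch (illustrative only; all names are hypothetical) draws points from a single subspace \(\mathcal{U}_k\) and verifies that \(y_s\) has an exact representation over \(Y_k^{-s}\):

```python
import numpy as np

rng = np.random.default_rng(1)
M, D, S_k = 10, 2, 8
Q, _ = np.linalg.qr(rng.standard_normal((M, D)))   # orthonormal basis of U_k
Y_k = Q @ rng.standard_normal((D, S_k))            # points drawn from U_k

y_s = Y_k[:, 0]                                    # the point to represent
Y_k_minus_s = Y_k[:, 1:]                           # Y_k excluding y_s

# Least squares coefficients over the same-subspace points only;
# since Y_k^{-s} spans U_k, the residual is (numerically) zero
c, *_ = np.linalg.lstsq(Y_k_minus_s, y_s, rcond=None)
residual = np.linalg.norm(Y_k_minus_s @ c - y_s)
```

The residual is zero up to floating point error, so padding \(c\) with zeros at the positions of points from other subspaces gives a subspace preserving representation of \(y_s\).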

In the `cr.sparse.cluster.ssc` package, we provide a version of
OMP which can be used to construct the sparse self-expressive representations
\(C\) of \(Y\). Once the representation has been constructed, we compute an
affinity matrix \(W = |C| + |C^T|\).
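This step can be mimicked outside JAX with a plain NumPy sketch. The `omp_self_expressive` function below is a hypothetical, simplified stand-in for the OMP solver in `cr.sparse.cluster.ssc`, shown only to illustrate how \(C\) and \(W = |C| + |C^T|\) are formed:

```python
import numpy as np

def omp_self_expressive(Y, sparsity):
    """Sparse self-expressive codes: each column y_s is approximated by a
    `sparsity`-term combination of the *other* columns of Y (c_ss = 0)."""
    M, S = Y.shape
    C = np.zeros((S, S))
    for s in range(S):
        y, support = Y[:, s], []
        r = y.copy()
        for _ in range(sparsity):
            corr = np.abs(Y.T @ r)
            corr[s] = 0.0                  # exclude y_s itself
            corr[support] = 0.0            # exclude atoms already chosen
            support.append(int(np.argmax(corr)))
            coef, *_ = np.linalg.lstsq(Y[:, support], y, rcond=None)
            r = y - Y[:, support] @ coef   # residual after refitting
        C[support, s] = coef
    return C

rng = np.random.default_rng(2)
# Two 2-dimensional subspaces in R^6, 10 points each
blocks = []
for _ in range(2):
    Q, _ = np.linalg.qr(rng.standard_normal((6, 2)))
    blocks.append(Q @ rng.standard_normal((2, 10)))
Y = np.concatenate(blocks, axis=1)
Y = Y / np.linalg.norm(Y, axis=0)          # normalize columns

C = omp_self_expressive(Y, sparsity=2)
W = np.abs(C) + np.abs(C.T)                # symmetric affinity matrix
```

Each column of \(C\) has at most two nonzero entries and a zero diagonal; when the representations are subspace preserving, \(W\) is (close to) block diagonal under the ground-truth ordering.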

We then apply spectral clustering on \(W\) to complete SSC-OMP.
For this, we have written a JAX version of spectral clustering
in the `cr.sparse.cluster.spectral` package. In particular, it
uses our own Lanczos Bidiagonalization with Partial Orthogonalization (LANBPRO)
algorithm to compute the \(K\) largest singular values of the
normalized affinity matrix
in as few iterations as possible. The intermediate variables
\(C\) and \(W\) are maintained as experimental JAX sparse matrices
stored in the BCOO format.
The LANBPRO algorithm also works on sparse matrices directly.
Thus, even though \(C\) is of size \(S \times S\), it can be stored
efficiently in \(O(DS)\) storage. This enables us to process
hundreds of thousands of points efficiently.
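For intuition, the spectral step can be sketched in NumPy with a dense eigendecomposition. The actual implementation uses LANBPRO on sparse matrices; `spectral_embedding` here is a hypothetical helper, not the `cr.sparse` API:

```python
import numpy as np

def spectral_embedding(W, K):
    """Rows of the returned matrix are K-dimensional spectral coordinates
    obtained from the symmetrically normalized affinity D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    W_norm = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Eigenvectors for the K largest eigenvalues (eigh sorts ascending);
    # the sparse implementation computes these via LANBPRO instead
    vals, vecs = np.linalg.eigh(W_norm)
    return vecs[:, -K:]

# Block-diagonal affinity with two obvious clusters: {0, 1} and {2, 3}
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
emb = spectral_embedding(W, K=2)
print(emb.shape)  # (4, 2)
```

Points in the same block get identical embedding rows here, so any standard clustering of the rows (e.g. k-means, which completes the SSC-OMP pipeline) recovers the two groups.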