Underdetermined Linear Systems

The discussion in this section is largely based on chapter 1 of [Ela10].

Consider a matrix \(\Phi \in \CC^{M \times N}\) with \(M < N\).

Define an under-determined system of linear equations:

(1)\[\Phi x = y\]

where \(y \in \CC^M\) is known and \(x \in \CC^N\) is unknown.

This system has \(N\) unknowns and \(M\) linear equations. There are more unknowns than equations.

Let the columns of \(\Phi\) be given by \(\phi_1, \phi_2, \dots, \phi_N\).

Column space of \(\Phi\) (vector space spanned by all columns of \(\Phi\)) is denoted by \(\ColSpace(\Phi)\) i.e.

(2)\[\ColSpace(\Phi) = \sum_{i=1}^{N} c_i \phi_i, \quad c_i \in \CC.\]

We know that \(\ColSpace(\Phi) \subset \CC^M\).

Clearly \(\Phi x \in \ColSpace(\Phi)\) for every \(x \in \CC^N\). Thus if \(y \notin \ColSpace(\Phi)\) then we have no solution. But, if \(y \in \ColSpace(\Phi)\) then we have infinite number of solutions.

Let \(\NullSpace(\Phi)\) represent the null space of \(\Phi\) given by

(3)\[\NullSpace(\Phi) = \{ x \in \CC^N : \Phi x = 0\}.\]

Let \(\widehat{x}\) be a solution of \(y = \Phi x\). And let \(z \in \NullSpace(\Phi)\). Then

(4)\[\Phi (\widehat{x} + z) = \Phi \widehat{x} + \Phi z = y + 0 = y.\]

Thus the set \(\widehat{x} + \NullSpace(\Phi)\) forms the complete set of infinite solutions to the problem \(y = \Phi x\) where

(5)\[\widehat{x} + \NullSpace(\Phi) = \{\widehat{x} + z \quad \Forall z \in \NullSpace(\Phi)\}.\]

Regularization

One way to pick a specific solution from the set of all possible solutions is to introduce regularization.

We define a cost function \(J(x) : \CC^N \to \RR\) which defines the desirability of a given solution \(x\) out of infinitely possible solutions. The higher the cost, lower is the desirability of the solution.

Thus the goal of the optimization problem is to find a desired \(x\) with minimum possible cost.

We can write this optimization problem as

(6)\[\begin{split}\begin{aligned} & \underset{x}{\text{minimize}} & & J(x) \\ & \text{subject to} & & y = \Phi x. \end{aligned}\end{split}\]

If \(J(x)\) is convex, then its possible to find a global minimum cost solution over the solution set.

If \(J(x)\) is not convex, then it may not be possible to find a global minimum, we may have to settle with a local minimum.

A variety of such cost function based criteria can be considered.

\(l_2\) Regularization

One of the most common criteria is to choose a solution with the smallest \(l_2\) norm.

The problem can then be reformulated as an optimization problem

(7)\[\begin{split}\begin{aligned} & \underset{x}{\text{minimize}} & & \| x \|_2 \\ & \text{subject to} & & y = \Phi x. \end{aligned}\end{split}\]

In fact minimizing \(\| x \|_2\) is same as minimizing its square \(\| x \|_2^2 = x^H x\).

So an equivalent formulation is

(8)\[\begin{split}\begin{aligned} & \underset{x}{\text{minimize}} & & x^H x \\ & \text{subject to} & & y = \Phi x. \end{aligned}\end{split}\]

A formal solution to \(l_2\) norm minimization problem can be easily obtained using Lagrange multipliers.

We define the Lagrangian

(9)\[\mathcal{L}(x) = \|x\|_2^2 + \lambda^H (\Phi x - y)\]

with \(\lambda \in \CC^M\) being the Lagrange multipliers for the (equality) constraint set.

Differentiating \(\mathcal{L}(x)\) w.r.t. \(x\) we get

(10)\[\frac{\partial \mathcal{L}(x)} {\partial x} = 2 x + \Phi^H \lambda.\]

By equating the derivative to \(0\) we obtain the optimal value of \(x\) as

(11)\[x^* = - \frac{1}{2} \Phi^H \lambda. \label{eq:ssm:underdetermined_l2_optimal_value_expression_1}\]

Plugging this solution back into the constraint \(\Phi x= y\) gives us

(12)\[\Phi x^* = - \frac{1}{2} (\Phi \Phi^H) \lambda= y\implies \lambda = -2(\Phi \Phi^H)^{-1} y.\]

In above we are implicitly assuming that \(\Phi\) is a full rank matrix thus, \(\Phi \Phi^H\) is invertible and positive definite.

Putting \(\lambda\) back in above we obtain the well known closed form least squares solution using pseudo-inverse solution

(13)\[x^* = \Phi^H (\Phi \Phi^H)^{-1} y = \Phi^{\dag} y.\]

Convexity

Convex optimization problems have a unique feature that it is possible to find the global optimal solution if such a solution exists.

The solution space \(\Omega = \{x : \Phi x = y\}\) is convex. Thus the feasible set of solutions for the optimization problem is also convex. All it remains is to make sure that we choose a cost function \(J(x)\) which happens to be convex. This will ensure that a global minimum can be found through convex optimization techniques. Moreover, if \(J(x)\) is strictly convex, then it is guaranteed that the global minimum solution is unique. Thus even though, we may not have a nice looking closed form expression for the solution of a strictly convex cost function minimization problem, the guarantee of the existence and uniqueness of solution as well as well developed algorithms for solving the problem make it very appealing to choose cost functions which are convex.

We recall that all \(l_p\) norms with \(p \geq 1\) are convex functions. In particular \(l_{\infty}\) and \(l_1\) norms are very interesting and popular where

(14)\[l_{\infty}(x) = \max(|x_i|), \, 1 \leq i \leq N\]

and

(15)\[l_1(x) = \sum_{i=1}^{N} |x_i|.\]

In the following section we will attempt to find a unique solution to our optimization problem using \(l_1\) norm.

\(l_1\) Regularization

In this section we will restrict our attention to the Euclidean space case where \(x \in \RR^N\), \(\Phi \in \RR^{M \times N}\) and \(y \in \RR^M\).

We choose our cost function \(J(x) = l_1(x)\).

The cost minimization problem can be reformulated as

(16)\[\begin{split}\begin{aligned} & \underset{x}{\text{minimize}} & & \| x \|_1 \\ & \text{subject to} & & \Phi x = y. \end{aligned}\end{split}\]

It’s time to have a closer look at our cost function \(J(x) = \|x \|_1\). This function is convex yet not strictly convex.

For the \(l_1\) norm minimization problem since \(J(x)\) is not strictly convex, hence a unique solution may not be guaranteed. In specific cases, there may be infinitely many solutions. Yet what we can claim is

  • these solutions are gathered in a set that is bounded and convex, and

  • among these solutions, there exists at least one solution with at most \(M\) non-zeros (as the number of constraints in \(\Phi x = y\)).

Theorem

Let \(S\) denote the solution set of \(l_1\) norm minimization problem. \(S\) contains at least one solution \(\widehat{x}\) with \(\| \widehat{x} \|_0 = M\).

See [Ela10] for proof.

We thus note that \(l_1\) norm has a tendency to prefer sparse solutions. This is a well known and fundamental property of linear programming.

\(l_1\) norm minimization problem as a linear programming problem

We now show that \(l_1\) norm minimization problem in \(\RR^N\) is in fact a linear programming problem.

Recalling the problem:

(17)\[\begin{split}\begin{aligned} & \underset{x \in \RR^N}{\text{minimize}} & & \| x \|_1 \\ & \text{subject to} & & y = \Phi x. \end{aligned}\end{split}\]

Let us write \(x\) as \(u - v\) where \(u, v \in \RR^N\) are both non-negative vectors such that \(u\) takes all positive entries in \(x\) while \(v\) takes all the negative entries in \(x\).

We note here that by definition

(18)\[\supp(u) \cap \supp(v) = \EmptySet\]

i.e. support of \(u\) and \(v\) do not overlap.

We now construct a vector

(19)\[\begin{split}z = \begin{bmatrix} u \\ v \end{bmatrix} \in \RR^{2N}.\end{split}\]

We can now verify that

(20)\[\| x \|_1 = \|u\|_1 + \| v \|_1 = 1^T z.\]

And

(21)\[\begin{split}\Phi x = \Phi (u - v) = \Phi u - \Phi v = \begin{bmatrix} \Phi & -\Phi \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} \Phi & -\Phi \end{bmatrix} z\end{split}\]

where \(z \succeq 0\).

Hence the optimization problem (17) can be recast as

(22)\[\begin{split}\begin{aligned} & \underset{z \in \RR^{2N}}{\text{minimize}} & & 1^T z \\ & \text{subject to} & & \begin{bmatrix} \Phi & -\Phi \end{bmatrix} z = y\\ & \text{and} & & z \succeq 0. \end{aligned}\end{split}\]

This optimization problem has the classic Linear Programming structure since the objective function is affine as well as constraints are affine.

Let \(z^* =\begin{bmatrix} u^* \\ v^* \end{bmatrix}\) be an optimal solution to the problem (22).

In order to show that the two optimization problems are equivalent, we need to verify that our assumption about the decomposition of \(x\) into positive entries in \(u\) and negative entries in \(v\) is indeed satisfied by the optimal solution \(u^*\) and \(v^*\). i.e. support of \(u^*\) and \(v^*\) do not overlap.

Since \(z \succeq 0\) hence \(\langle u^* , v^* \rangle \geq 0\). If support of \(u^*\) and \(v^*\) don’t overlap, then we have \(\langle u^* , v^* \rangle = 0\). And if they overlap then \(\langle u^* , v^* \rangle > 0\).

Now for the sake of contradiction, let us assume that support of \(u^*\) and \(v^*\) do overlap for the optimal solution \(z^*\).

Let \(k\) be one of the indices at which both \(u_k \neq 0\) and \(v_k \neq 0\). Since \(z \succeq 0\), hence \(u_k > 0\) and \(v_k > 0\).

Without loss of generality let us assume that \(u_k > v_k > 0\).

In the equality constraint

(23)\[\begin{split}\begin{bmatrix} \Phi & -\Phi \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = y\end{split}\]

Both of these coefficients multiply the same column of \(\Phi\) with opposite signs giving us a term

(24)\[\phi_k (u_k - v_k).\]

Now if we replace the two entries in \(z^*\) by

(25)\[u_k' = u_k - v_k\]

and

(26)\[v_k' = 0\]

to obtain an new vector \(z'\), we see that there is no impact in the equality constraint

(27)\[\begin{bmatrix} \Phi & -\Phi \end{bmatrix} z = y.\]

Also the positivity constraint

(28)\[z \succeq 0\]

is satisfied. This means that \(z'\) is a feasible solution.

On the other hand, the objective function \(1^T z\) value reduces by \(2 v_k\) for \(z'\). This contradicts our assumption that \(z^*\) is the optimal solution.

Hence for the optimal solution of (22) we have

(29)\[\supp(u^*) \cap \supp(v^*) = \EmptySet\]

thus

(30)\[x^* = u^* - v^*\]

is indeed the desired solution for the optimization problem (17).