Background
In this background page, we explore in more detail the geometric properties of the short-and-sparse deconvolution (SaSD) problem. Typical optimization formulations of SaSD are nonconvex. This nonconvexity can be attributed to the signed shift symmetry of the short-and-sparse model: because of symmetry there are many distinct minimizers. Understanding the geometry of SaSD will enable us to give provable algorithms for certain model problem instances, and, more importantly, to develop better practical algorithms.
Bilinear Lasso and Approximations
Our study centers on the “bilinear Lasso” formulation

$$ \min_{a \in \mathbb{S}^{p-1},\, x} \; \tfrac{1}{2} \| y - a \circledast x \|_2^2 + \lambda \| x \|_1, $$

which balances sparsity of $x$ against fidelity to the observed data $y$.
To study its geometry rigorously, we develop an approximation to this objective function, by approximating the loss as

$$ \tfrac{1}{2} \| y - a \circledast x \|_2^2 \;\approx\; \tfrac{1}{2} \|a\|_2^2 \|x\|_2^2 - \langle a \circledast x,\, y \rangle + \tfrac{1}{2} \|y\|_2^2. $$

Since both $\|a\|_2 = 1$ (on the sphere) and $\|y\|_2$ are constants, the simplified objective becomes

$$ \Psi(a, x) \;=\; \tfrac{1}{2} \|x\|_2^2 - \langle a \circledast x,\, y \rangle + \lambda \|x\|_1. $$
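The step behind this simplification is the approximation $\|a \circledast x\|_2^2 \approx \|a\|_2^2 \|x\|_2^2$, which holds when the shifts of the unit-norm kernel $a$ are nearly orthogonal. A minimal numpy sketch (sizes and rates below are arbitrary choices of ours) checks this numerically for a random, hence shift-incoherent, kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
p0, n, theta = 32, 4096, 0.005           # kernel length, sparse-map length, sparsity rate

a = rng.standard_normal(p0)
a /= np.linalg.norm(a)                   # unit-norm short signal; random => shift-incoherent
x = rng.standard_normal(n) * (rng.random(n) < theta)   # Bernoulli-Gaussian sparse map

# ||a conv x||^2 = sum_{tau, tau'} x_tau x_tau' <s_tau[a], s_tau'[a]>;
# for an incoherent kernel the cross terms are small, so this is close to ||x||^2
lhs = np.linalg.norm(np.convolve(a, x)) ** 2
rhs = np.linalg.norm(x) ** 2
print(lhs / rhs)                         # close to 1
```

For a smooth, highly coherent kernel the ratio would deviate much further from one, which is exactly why the approximation demands stronger conditions on the signals.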
This objective still solves short-and-sparse deconvolution, but it requires more stringent conditions on the signals and a larger sample size (a longer observation $y$); nevertheless, its geometry represents that of the bilinear Lasso well, which is why we study it. Specifically, we will investigate the effect of the symmetric solutions, the shifts of the short signal $a_0$, on the landscape of the objective $\varphi$, the marginal minimization of the simplified objective over $x$, considered over the sphere $\mathbb{S}^{p-1}$:

$$ \varphi(a) \;=\; \min_{x} \; \tfrac{1}{2} \|x\|_2^2 - \langle a \circledast x,\, y \rangle + \lambda \|x\|_1. $$
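A convenient property of this marginal objective is that the inner minimization separates per coordinate: with $b_i = \langle s_i[a], y \rangle$ (the correlation of $y$ with the $i$-th shift of $a$), each coordinate solves $\min_t \tfrac12 t^2 - b_i t + \lambda |t|$, whose minimum value is $-\tfrac12\,\mathrm{soft}_\lambda(b_i)^2$. A short numpy sketch (function names `soft` and `phi` are ours):

```python
import numpy as np

def soft(z, lam):
    """Entrywise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def phi(a, y, lam):
    """Marginal objective: min over x of 0.5*||x||^2 - <a conv x, y> + lam*||x||_1.
    With b_i = <s_i[a], y>, the problem separates per coordinate and the minimum
    value is -0.5 * sum_i soft(b_i, lam)^2."""
    b = np.correlate(y, a, mode="valid")   # correlations of y with the shifts of a
    return -0.5 * np.sum(soft(b, lam) ** 2)

# per-coordinate sanity check: closed form vs. a fine grid search
b0, lam = 1.3, 0.5
ts = np.linspace(-3.0, 3.0, 20001)
grid_min = np.min(0.5 * ts**2 - b0 * ts + lam * np.abs(ts))
closed = -0.5 * soft(np.array([b0]), lam)[0] ** 2
print(grid_min, closed)    # both approximately -0.32
```

This closed form makes $\varphi$ cheap to evaluate anywhere on the sphere, which is what makes landscape plots like the ones below feasible.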
As it turns out, the geometry of this objective is indeed dictated by the solutions of the problem, that is, by the shifts of the short ground truth $a_0$.
From Shifts to Geometry
Shifts of $a_0$
From now on, we assume the true short signal $a_0$ has length $p_0$, and that we optimize over a space of dimension $p \geq p_0$. Write the shift of $a_0$ by $\tau$ samples, zero-padded to length $p$, as

$$ s_\tau[a_0] \in \mathbb{R}^p, $$

where $\tau$ ranges over the shifts that keep $a_0$ inside the length-$p$ window. As we mentioned, these shifts are the solutions to short-and-sparse deconvolution; they are also the local minimizers of the objective $\varphi$ and dictate its overall landscape.
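For concreteness, a zero-padded shift can be constructed as follows (a minimal sketch of ours; it only handles shifts that keep $a_0$ fully inside the window, ignoring the truncated shifts that arise at the boundary):

```python
import numpy as np

def shift(a0, tau, p):
    """s_tau[a0] in R^p: copy a0 into a zero vector starting at index tau."""
    p0 = len(a0)
    assert 0 <= tau <= p - p0, "shift must keep a0 inside the length-p window"
    s = np.zeros(p)
    s[tau:tau + p0] = a0
    return s

a0 = np.array([1.0, -2.0, 1.0])
print(shift(a0, 2, 6))    # [0, 0, 1, -2, 1, 0]
```

Note that shifting preserves the norm of $a_0$, so all shifts of a unit-norm kernel live on the same sphere.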
Geometry over subspace of shifts
Specifically, the geometry of $\varphi$ over a subspace formed by linear combinations of a few shifts exhibits a “symmetric structure”. For a set of shifts $\mathcal{T}$, we define the subspace spanned by the corresponding shifts of $a_0$ as

$$ \mathcal{S}_{\mathcal{T}} \;=\; \mathrm{span}\{\, s_\tau[a_0] : \tau \in \mathcal{T} \,\}. $$

As it turns out, this symmetric structure of the landscape of $\varphi$ over $\mathcal{S}_{\mathcal{T}}$ enables efficient methods for finding the ground truth, provided certain conditions on the ground-truth signals are satisfied.
We demonstrate three pictorial examples of the landscape of $\varphi$ over the sphere near subspaces spanned by one, two, and three shifts:
Firstly, the figure on the left shows that the landscape of $\varphi$ is strongly convex near a single shift, and, unsurprisingly, its unique local minimizer is close to that exact shift. Though this local geometry is certainly ideal for a minimization algorithm solving the deconvolution problem, the region is very small compared to the whole sphere $\mathbb{S}^{p-1}$, meaning that it can be computationally difficult to locate such regions without further understanding of the landscape of $\varphi$.
Secondly, the figure in the middle shows that the landscape near the subspace spanned by two shifts exhibits a symmetric structure. The landscape consists of two convex regions, one near each of the two shifts; in between the two shifts, the negative curvature of $\varphi$ breaks this symmetric structure; and away from this subspace, the function value grows. Though this geometry is nonconvex, a minimization algorithm that starts within this region is expected to solve the deconvolution problem, for two reasons: (1) the iterates are very unlikely to get stuck in between the shifts, since the saddle point there has measure zero and large negative-curvature directions pointing toward either solution, and (2) the iterates stay trapped near the subspace, since the function value always grows away from it.
Lastly, the symmetric structure of $\varphi$ extends to higher-dimensional subspaces, as we show in the figure on the right, which demonstrates the landscape over the span of three shifts. Noticeably, all local minimizers are near shifts, and in between all three shifts (the peak right in the middle), all curvatures along directions inside this subspace are negative; meanwhile, the substructure formed by any pair of shifts exhibits the same symmetric structure as in the second figure. Therefore, minimization started within this subspace can also successfully find the shifts. More importantly, this geometric structure extends to subspaces of much higher dimension than three, and to subspaces formed by any combination of shifts.
Finally, we list the key findings from this investigation of the geometry over shift subspaces:
- Minimizer: within the subspace, all local minimizers are close to one of the shifts, and all of these local regions share the symmetric structure.
- Negative Curvature: within the subspace and in between the shifts, the negative curvature of $\varphi$ breaks the symmetric structure, so the minimization algorithm will not get stuck in between and will eventually find one of the good local minimizers.
- Positive Curvature: away from the subspace, the function value of $\varphi$ grows, so the iterates of the minimization algorithm stay trapped near the subspace.
Geometry over union of subspaces
The benign geometry of $\varphi$ holds not just near a single subspace, but near every subspace formed by the span of a combination of a few shifts, that is, over the union

$$ \bigcup_{\mathcal{T}} \mathrm{span}\{\, s_\tau[a_0] : \tau \in \mathcal{T} \,\}, $$

where $\mathcal{T}$ ranges over small sets of shifts.
We provide a pictorial example once again: it shows that the geometry of $\varphi$ over the sphere exhibits a similar symmetric structure for every pair among three different shifts, and that all local minimizers of $\varphi$ over this union of subspaces are close to the shifts that span each subspace. Similar geometric properties hold for unions of shift subspaces of much higher dimension. In theory, depending on the length of $a_0$ and the sparsity rate of $x_0$, this preferable geometry holds over the union of subspaces spanned by any combination of shifts.
When is SaSD Solvable?
Short-and-sparse deconvolution is impossible in general, but it can be solved efficiently whenever the following conditions are satisfied:
- Short: $a_0$ is short enough compared to the observed signal $y$.
- Incoherence: the signed shifts of $a_0$ are not close to one another in distance.
- Sparse: the support of $x_0$ is sufficiently sparse, and somewhat irregular.
There is a catch here: if we push these conditions to the extreme, say the pattern $a_0$ is so short and the map $x_0$ so sparse that most copies of $a_0$ do not overlap, then the short-and-sparse deconvolution problem becomes trivial; it simply boils down to looking for an isolated copy of the short signal $a_0$ as the ground truth. This extreme case does not apply to most practical scenarios, and a well-designed algorithm should be able to solve short-and-sparse deconvolution under more complicated, non-trivial conditions.
Shift coherence of $a_0$
Short-and-sparse deconvolution becomes an easier problem when the short signal $a_0$ is more shift-incoherent. The coherence is simply defined as the largest absolute inner product between all pairs of distinct shifts, namely

$$ \mu(a_0) \;=\; \max_{\tau \neq \tau'} \left| \langle s_\tau[a_0],\, s_{\tau'}[a_0] \rangle \right|. $$

The coherence characterizes how distinguishable the solutions of short-and-sparse deconvolution are: if the coherence is larger, then some pair of shifts is closer on the sphere, and it is harder for the algorithm to decide which shift of $a_0$ fits the data. Geometrically speaking, the closer these shifts are, the more easily a bad local minimizer, a linear combination of the shifts, can be created in between them, and the algorithm could wrongly report this minimizer as the solution.
Generally speaking, if the spectrum of the short signal $a_0$ is biased ($a_0$ is lowpass, highpass, etc.), then the coherence of $a_0$ tends to be larger, and vice versa.
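For a unit-norm kernel, the coherence is just the largest off-center value of its autocorrelation, which makes the lowpass effect easy to verify numerically (the helper name `shift_coherence` is ours):

```python
import numpy as np

def shift_coherence(a0):
    """mu(a0): largest |<s_tau[a0], s_tau'[a0]>| over distinct shift pairs,
    i.e. the largest off-center autocorrelation of the normalized kernel."""
    a = a0 / np.linalg.norm(a0)
    ac = np.correlate(a, a, mode="full")
    center = len(a) - 1                  # tau = tau' sits at the center, value 1
    return np.max(np.abs(np.delete(ac, center)))

rng = np.random.default_rng(0)
a_rand = rng.standard_normal(64)                        # generic kernel
a_low = np.convolve(a_rand, np.ones(16), mode="same")   # smoothed, lowpass kernel
print(shift_coherence(a_rand), shift_coherence(a_low))  # lowpass is markedly larger
```

Smoothing makes neighboring shifts overlap heavily, so the lag-one autocorrelation, and hence the coherence, approaches one.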
Sparsity of $x_0$
We characterize the sparsity of $x_0$ by modeling it as a random vector whose entries follow a Bernoulli-Gaussian distribution with support activation rate $\theta$.
Concretely, the random vector can be written as $x_0 = \omega \odot g$, where $g$ is a Gaussian vector with $g_i \sim \mathcal{N}(0, 1)$ and $\omega$ is a Bernoulli vector with $\omega_i \sim \mathrm{Ber}(\theta)$.
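Sampling such a vector is a one-liner in numpy (the length and rate below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 100_000, 0.01                 # signal length and activation rate

g = rng.standard_normal(n)               # g_i ~ N(0, 1)
omega = rng.random(n) < theta            # omega_i ~ Ber(theta)
x0 = omega * g                           # Bernoulli-Gaussian: x0 = omega . g

print(np.mean(x0 != 0))                  # empirical activation rate, approximately theta
```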
As expected, when $x_0$ is denser (larger $\theta$), the short-and-sparse deconvolution problem becomes harder, and vice versa. Notice that if the sparsity rate is low enough, roughly $\theta \lesssim 1/p_0$, the observed signal contains many isolated copies of $a_0$ that can be found via a simple algorithm, which makes short-and-sparse deconvolution a trivial problem. The algorithm we study works under the more complicated signal condition, namely when $\theta \gtrsim 1/p_0$.
Sparsity-coherence tradeoff
We can further articulate these conditions for solving short-and-sparse deconvolution through a “sparsity-coherence tradeoff”: the deconvolution problem becomes easier to solve when either the short signal $a_0$ has lower coherence or the sparsity rate $\theta$ of $x_0$ is lower. If $a_0$ has low shift coherence, then $x_0$ is allowed to be denser; if $a_0$ turns out to be smooth and lowpass, then the allowable $\theta$ has to be lower.
From Geometry to Algorithm
Understanding how the geometry of $\varphi$ is shaped by the shifts of $a_0$ suggests that it is possible to design an algorithm with guarantees for solving short-and-sparse deconvolution. The algorithm is straightforward: it starts near one of the shift subspaces and stays near that subspace through the course of the minimization.
Initialization in union of subspaces
Let us consider the case where the noiseless observation $y$ consists of isolated, non-overlapping copies of $a_0$. In this trivial case, solving the deconvolution problem reduces to finding an isolated copy of the short signal $a_0$, which can be done by locating the shortest continuous segment of $y$, as simple as it can possibly be. This approach, though, is unrealistic in even slightly more general cases, when $y$ contains noise or $x_0$ is denser.
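In this trivial case the whole “algorithm” fits in a few lines: find the shortest contiguous nonzero run of $y$. The helper name `shortest_segment` is ours; the sketch assumes noiseless data and a kernel with no interior zeros:

```python
import numpy as np

def shortest_segment(y, tol=0.0):
    """Shortest contiguous run of entries with |y_i| > tol; in the trivial
    noiseless, non-overlapping case, each run is a scaled copy of a0."""
    mask = (np.abs(y) > tol).astype(int)
    edges = np.flatnonzero(np.diff(np.concatenate(([0], mask, [0]))))
    starts, ends = edges[::2], edges[1::2]   # half-open run boundaries
    k = np.argmin(ends - starts)
    return y[starts[k]:ends[k]]

a0 = np.array([1.0, -1.0, 2.0])
y = np.zeros(20)
y[3:6] = 2.0 * a0        # two isolated, scaled copies of a0
y[12:15] = -0.5 * a0
print(shortest_segment(y))    # [2, -2, 4], i.e. 2 * a0
```

The recovered segment is a signed, scaled copy of $a_0$, which is exactly the symmetry ambiguity inherent to the problem.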
Nevertheless, we can still adapt this idea to realistic scenarios, where $y$ contains noise and overlaps, by observing that, due to the sparsity of $x_0$, a chunk of the data contains only a few shifts of $a_0$. This observation implies that a chunk of $y$ is close to the set of shifts, and hence close to a shift subspace where $\varphi$ has nice geometry.
The figure shows an example where a chunk of the data contains a few truncated shifts of $a_0$; with a simple modification (taking a gradient step), we can easily obtain an initial vector that stays near the subspace spanned by these shifts. When the sparsity rate of $x_0$ is $\theta$, the expected number of shifts participating in this initialization scheme is roughly $\theta p_0$. Since $\varphi$ is known to have nice geometry over shift subspaces of dimension up to this order, minimization starting from this initialization is expected to solve the deconvolution problem.
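A simplified version of this initialization, which normalizes an energetic chunk of $y$ onto the sphere and omits the gradient refinement mentioned above, might look as follows (all sizes here, including the window length, are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
p0, n, theta = 32, 8192, 0.005
a0 = rng.standard_normal(p0)
a0 /= np.linalg.norm(a0)
x0 = rng.standard_normal(n) * (rng.random(n) < theta)
y = np.convolve(a0, x0)                      # observation y = a0 conv x0

# draw several random length-p windows of y and keep the most energetic one;
# by sparsity of x0 it contains only a few (possibly truncated) shifts of a0
p = 3 * p0 - 2                               # window somewhat longer than p0
ts = rng.integers(0, len(y) - p, size=20)
chunks = np.stack([y[t:t + p] for t in ts])
norms = np.linalg.norm(chunks, axis=1)
a_init = chunks[np.argmax(norms)] / norms.max()   # normalized onto the sphere
print(a_init.shape)                          # (94,)
```

Because the chosen chunk is itself a sum of a few shifted copies of $a_0$, the normalized vector already lies near a low-dimensional shift subspace.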
As it turns out, this initialization using a chunk of the data $y$ can also be applied to practical problems using the bilinear Lasso.
Minimization within the subspace
Finally, since the function value of $\varphi$ grows away from the subspace, a small-step gradient descent method on this objective will not leave the subspace, and will eventually converge to a local minimizer within it, which is exactly a solution: a shift of $a_0$.
To adopt this phenomenon in practice when solving the bilinear Lasso for deconvolution, recall that the studied objective is the marginal minimization over $x$ of the Lasso-like objective. It is therefore advisable to initialize $x$ as the minimizer of the objective with $a$ fixed at its initialization; alternating minimization over $a$ and $x$ then well approximates gradient descent on $\varphi$ within the subspace, and the correct solution can be attained.
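An illustrative sketch of such an alternating scheme for the bilinear Lasso, with an ISTA inner loop for $x$ and a projected gradient step for $a$, is below. This is our simplified rendition, not the authors' exact algorithm; the step sizes are conservative bounds and the initialization is a crude energetic chunk of $y$:

```python
import numpy as np

def soft(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def alt_min(y, p, lam=0.1, outer=20, inner=30):
    """Alternating minimization for 0.5*||y - a conv x||^2 + lam*||x||_1 with a on
    the sphere: ISTA steps in x, then a gradient step in a retracted to the sphere."""
    n = len(y) - p + 1
    t = min(int(np.argmax(np.abs(y))), len(y) - p)
    a = y[t:t + p] / np.linalg.norm(y[t:t + p])    # initialize from an energetic chunk
    x = np.zeros(n)
    for _ in range(outer):
        L = max(np.sum(np.abs(a)) ** 2, 1.0)       # ||a||_1^2 bounds the Lipschitz const.
        for _ in range(inner):                     # x-step: proximal gradient (ISTA)
            r = np.convolve(a, x) - y
            x = soft(x - np.correlate(r, a, mode="valid") / L, lam / L)
        La = max(np.sum(np.abs(x)) ** 2, 1.0)      # a-step: gradient step + retraction
        r = np.convolve(a, x) - y
        a = a - np.correlate(r, x, mode="valid") / La
        a = a / np.linalg.norm(a)
    return a, x

rng = np.random.default_rng(0)
p0, n, theta = 16, 2048, 0.01
a0 = rng.standard_normal(p0)
a0 /= np.linalg.norm(a0)
x0 = rng.standard_normal(n) * (rng.random(n) < theta)
y = np.convolve(a0, x0)

a_hat, x_hat = alt_min(y, p=p0)
# overlap of a_hat with the best signed shift of a0 (1.0 would be a perfect match)
print(np.max(np.abs(np.correlate(a_hat, a0, mode="full"))))
```

Because of the signed shift symmetry, success is measured by correlation against all signed shifts of $a_0$ rather than by direct comparison.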