The goal of this project is to provide a c library for solving compressed sensing and related L1-regularized problems. A great deal of research has developed methods to efficiently solve various formulations of the compressed sensing problem, and to their great credit, many of those researchers have released code along with their publications. See, for example, l1-magic, NESTA, SparseLab, and Split Bregman. So, a good question is, why re-do it in c?
- Much of the published code is written in Matlab. If you want to use that code from Python, R, Julia, or a c/c++ project, you must translate it to your language. By contrast, c is something like the lingua franca of computing: it is relatively straightforward to interface c code with most languages, including Matlab.
- CS optimizations are computationally intensive. Thus, for those who wish to apply CS to real-world problems, it is beneficial to have optimized implementations of these algorithms, and c fits that requirement (Fortran might be better, but I have forgotten it).
- c is fun (probably the main reason).
Present capabilities and features include
- Solving the basis pursuit denoising problem
  \(\min_{x} ||W^{T}x||_{1} \quad \text{s.t.} \quad ||Ax - b||_{2} < \epsilon,\)
  using Nesterov's algorithm, which is based on the paper NESTA: A Fast and Accurate First-order Method for Sparse Recovery, by Stephen Becker, Jerome Bobin, and Emmanuel J. Candes. The NESTA algorithm can solve problems in both synthesis mode (\(W^{T}=I\)) and analysis mode, where \(W\) may be an overcomplete dictionary. This allows the inclusion of several regularizers. NESTA is much faster than the log-barrier algorithm.
- Solving the basis pursuit denoising problem
  \(\min_{x} ||x||_{1} \quad \text{s.t.} \quad ||Ax - b||_{2} < \epsilon,\)
  using a log-barrier algorithm, which is based on the implementation in l1-magic, with a few modifications.
- Solving the anisotropic total variation (TV) denoising problem using Bregman splitting. Given a noisy \(n\) by \(m\) image \(f\), this solves the problem
  \(\min_{u} ||\nabla_{x}u||_{1} + ||\nabla_{y}u||_{1} + \mu ||u-f||_{2}.\)
- 1D and 2D discrete cosine transforms (using FFTW); see the sketch after this list.
- Python and Matlab bindings for the high-level optimization routines.
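For reference, here is a minimal sketch of a 1D DCT built on FFTW's real-to-real interface. The plan functions and flags are standard FFTW API; the rest is illustrative and is not l1C's actual code.

```c
#include <fftw3.h>
#include <stdio.h>

/* Sketch: a 1D DCT-II of a length-n signal via FFTW's r2r interface.
   FFTW_REDFT10 is FFTW's name for the (unnormalized) DCT-II;
   FFTW_REDFT01 (DCT-III) inverts it up to a factor of 2n. */
int main(void) {
  const int n = 8;
  double *x = fftw_malloc(sizeof(double) * n);
  double *y = fftw_malloc(sizeof(double) * n);

  fftw_plan fwd = fftw_plan_r2r_1d(n, x, y, FFTW_REDFT10, FFTW_ESTIMATE);

  for (int i = 0; i < n; i++)      /* some test data: a step signal */
    x[i] = (i < n / 2) ? 1.0 : 0.0;

  fftw_execute(fwd);               /* y now holds the unnormalized DCT-II of x */

  for (int i = 0; i < n; i++)
    printf("y[%d] = %8.4f\n", i, y[i]);

  fftw_destroy_plan(fwd);
  fftw_free(x);
  fftw_free(y);
  return 0;
}
```

Compile with `-lfftw3` (and `-lfftw3_omp -fopenmp` for the threaded variant).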
So far, using l1C gives a speedup of roughly 2x to 7x over the original Matlab code, depending on the problem and the machine it runs on.
If you compile with FFTW+OpenBLAS, it is important that both libraries are compiled with OpenMP support. I do not fully understand why, but if this is not the case, only a single processor gets used and performance suffers dramatically.
If you have a CPU with hyperthreading, it is important to export the environment variable
export OMP_NUM_THREADS=N
where N is the number of physical cores. With hyperthreading, this is half the number of processors shown in a resource monitor, which reports logical cores. The code currently cannot detect this, and for number-crunching applications like this one, hyperthreading is detrimental.
Setting OMP_PROC_BIND=true
seems to cost about 1 second.
Please see either examples/c or, when building with the bindings, interfaces/python/examples or interfaces/mex/examples.
I have made a few changes (improvements?) to the original l1-magic algorithm, both pertaining to the line search. These changes address numerical, rather than mathematical, problems. As the l1-magic authors note, in the later stages of the optimization, numerical difficulties arise and the line search can fail. These modifications push that failure point later, enabling more iterations.
- In the original code, I noticed that at some point the data become complex when they should have been purely real. One of the places where this occurs is in the code that computes the maximum step-size which still satisfies the constraints (i.e., lines XX in the original code). In particular, the code computes the largest number \(s\) such that, for \(x_{k+1} = x_{k} + sd_{x_k}\), \(||Ax_{k+1} - b|| < \epsilon\) still holds. To do this, we expand into a scalar equation quadratic in \(s\):
  \( \begin{aligned} ||A(x+sd_{x})-b||^{2} - \epsilon^{2} &= 0 \\ s^{2}(d_x^{T}A^{T}Ad_x) + 2s\,r^{T}Ad_x + r^{T}r - \epsilon^{2} &= 0 \end{aligned}\)
  where \(r = Ax - b\). Although the roots should always be real, they become complex in the later stages, whether because \(d_{x}\) is computed with insufficient accuracy (it is obtained via conjugate gradients) or otherwise. In Matlab, the promotion to a complex type happens silently and we start optimizing complex data, which is undesirable. In c, the `sqrt` operation simply returns NaN, which is also undesirable. When this happens, the modification is to set \(s=1\) and let the line search deal with it (a sketch of both modifications follows this list). This works fine in c because taking the log of a negative number results in NaN. In Matlab, we need something like `log(max(0, x))`.
- The goal of the line search is to find (approximately) the largest step-size \(s\) such that
  \( f(z + sd_{z}) < f(z) + \alpha s \nabla f \cdot d_{z}. \)
  In the original code, the functional \(f(z)\) is evaluated explicitly only at the start of each log-barrier iteration, and afterwards the value of \(f(z_{k})\) is updated from derived quantities, e.g., \(r_{k+1} = r_{k} + sAd_{x}\). Mathematically, this is fine. Numerically, it is problematic: after enough iterations the explicit value of \(f(z_{k})\) becomes infinite (due to the barrier functions) even though the recursively updated value is still finite. Thus, although it is less efficient, this code evaluates the functional explicitly at each iteration of the line search, and that value is then passed to the next Newton iteration.
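The following is a minimal sketch of both modifications for a generic dense problem. The function names, signatures, and the fixed backtracking limit are illustrative assumptions, not l1c's actual internals.

```c
#include <math.h>

/* Modification 1 (sketch): find the largest s with ||A(x + s*dx) - b|| <= eps,
   i.e., the positive root of
       s^2 (dx'A'A dx) + 2s (r'A dx) + (r'r - eps^2) = 0,
   where r = Ax - b.  The caller is assumed to have precomputed r and
   Adx = A*dx (both length m). */
static double max_step(const double *r, const double *Adx, int m, double eps)
{
    double a = 0.0, b = 0.0, c = -eps * eps;
    for (int i = 0; i < m; i++) {
        a += Adx[i] * Adx[i];      /* dx' A'A dx      */
        b += 2.0 * r[i] * Adx[i];  /* 2 r' A dx       */
        c += r[i] * r[i];          /* accumulates r'r */
    }
    double disc = b * b - 4.0 * a * c;
    if (disc < 0.0 || a <= 0.0) {
        /* Roots have gone complex (e.g., dx came from an inexact CG solve).
           Rather than letting sqrt(disc) produce NaN, fall back to s = 1
           and let the line search reject or shrink the step. */
        return 1.0;
    }
    return (-b + sqrt(disc)) / (2.0 * a);  /* the positive root */
}

/* Modification 2 (sketch): a backtracking line search that re-evaluates the
   barrier functional explicitly at every trial point instead of updating it
   from derived quantities.  f_eval is assumed to return NaN or +inf when a
   log-barrier argument goes non-positive; since NaN comparisons are false,
   the sufficient-decrease test below then fails and we keep backtracking. */
static double line_search(double (*f_eval)(const double *z),
                          const double *z, const double *dz, int n,
                          double fz, double grad_dot_dz, /* f(z), grad f . dz */
                          double s, double alpha, double beta,
                          double *z_trial)
{
    for (int k = 0; k < 32; k++) {
        for (int i = 0; i < n; i++)
            z_trial[i] = z[i] + s * dz[i];
        double f_new = f_eval(z_trial);            /* explicit evaluation */
        if (f_new < fz + alpha * s * grad_dot_dz)
            return s;                              /* sufficient decrease */
        s *= beta;                                 /* shrink and retry    */
    }
    return -1.0;  /* signal line-search failure to the caller */
}
```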
- Enable detection of hyperthreading, and set `OMP_NUM_THREADS` to half the number of reported cores.
- Documentation!
- Examples via the bindings.
- Other optimization routines. On the list are
  - The isotropic TV-denoising problem, using Bregman iteration.
  - FISTA
- Other transforms, such as
  - Wavelets. I think we could just pull in the wavelet implementation from GSL.
  - Fourier transforms. I believe the algorithms should still work if we just reshape complex data of size `n` to a `double*` of size `2n` (see the sketch below).
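That reshape is legal in C99: a `double complex` array is guaranteed to have the same memory layout as an interleaved `double` array of twice the length. A minimal sketch (the variable names are illustrative):

```c
#include <complex.h>
#include <stdio.h>

int main(void) {
  /* C99 guarantees double complex has the layout of double[2] = {real, imag},
     so a length-n complex array can be viewed as a length-2n double array
     without copying. */
  double complex z[3] = {1.0 + 2.0 * I, 3.0 + 4.0 * I, 5.0 + 6.0 * I};
  double *d = (double *)z;  /* interleaved: re0, im0, re1, im1, ... */

  for (int i = 0; i < 6; i++)
    printf("d[%d] = %g\n", i, d[i]);  /* prints 1 2 3 4 5 6 */
  return 0;
}
```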