Blind Compressed Sensing Using Sparsifying Transforms
Saiprasad Ravishankar and Yoram Bresler
Department of Electrical and Computer Engineering and Coordinated Science Laboratory
University of Illinois at Urbana-Champaign
May 29, 2015
Key Topics of Talk
Non-adaptive Compressed Sensing (CS)
Synthesis dictionary learning-based blind compressed sensing
Transform learning vs. dictionary learning
Transform learning-based blind compressed sensing
Application to magnetic resonance imaging (MRI)
Transform learning-based MRI (TLMRI)
Conclusions
Compressed Sensing (CS)
CS enables accurate recovery of images from far fewer measurements than the number of unknowns, provided that:
the image is sparse in a transform domain or dictionary, and
the measurement procedure is incoherent with the transform.
Reconstruction is non-linear and expensive.
Reconstruction problem (NP-hard):

$$\min_{x} \; \underbrace{\|Ax - y\|_2^2}_{\text{data fidelity}} \; + \; \lambda \underbrace{\|\Psi x\|_0}_{\text{regularizer}} \qquad (1)$$
x ∈ C^P: image as a vector; y ∈ C^m: measurements; A ∈ C^{m×P}: sensing matrix (m < P); Ψ: transform (wavelets, contourlets, total variation). The ℓ0 "norm" counts non-zeros.
Iterative algorithms for CS reconstruction are usually expensive.
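For concreteness, a minimal sketch (not from the talk) of one standard iterative solver for the ℓ1-relaxed version of (1): ISTA, under the simplifying assumption that Ψ is unitary so the problem can be posed over transform coefficients z = Ψx. The function name and step-size choice are illustrative.

```python
# Hedged sketch: ISTA for the l1-relaxed version of (1), assuming a
# unitary sparsifying transform Psi (so x = Psi^H z). Illustrative only.
import numpy as np

def ista_cs(A, Psi, y, lam, n_iters=200):
    M = A @ Psi.conj().T                         # effective sensing matrix on coefficients
    step = 1.0 / (2 * np.linalg.norm(M, 2)**2)   # step < 1/L, L = Lipschitz const. of gradient
    z = np.zeros(M.shape[1], dtype=complex)
    for _ in range(n_iters):
        grad = 2 * M.conj().T @ (M @ z - y)      # gradient of ||Mz - y||_2^2
        u = z - step * grad
        # complex soft-thresholding: proximal operator of lam * ||.||_1
        mag = np.maximum(np.abs(u) - step * lam, 0)
        z = mag * np.exp(1j * np.angle(u))
    return Psi.conj().T @ z                      # image estimate x = Psi^H z
```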
Application: Compressed Sensing MRI (CSMRI)
Data: samples in k-space (the spatial Fourier transform of the object), acquired sequentially.
Acquisition rate limited by MR physics and by physiological constraints on RF energy deposition.
CS accelerates the data acquisition in MRI.
CSMRI with non-adaptive transforms or dictionaries is limited to 2.5-3 fold undersampling [Ma et al. ’08].
Two directions to improve CSMRI:
better or adaptive sparse modeling
better choice of sampling pattern (Fu) [EMBC, 2011]
Fig. from Lustig et al. ’07
Synthesis Model for Sparse Representation
Given a signal y ∈ R^n and a dictionary D ∈ R^{n×K}, we assume y = Dx with ‖x‖0 ≪ K ⇒ a union-of-subspaces model.
Real-world signals are modeled as y = Dx + e, where e is a deviation term.
Given D and sparsity level s, the synthesis sparse coding problem is

$$\hat{x} = \arg\min_{x} \|y - Dx\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le s$$
This problem is NP-hard.
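In practice, greedy heuristics such as orthogonal matching pursuit (OMP) are the usual workhorse for this NP-hard problem. A minimal sketch (illustrative, assuming unit-norm columns of D; not the talk's algorithm):

```python
# Minimal orthogonal matching pursuit (OMP) sketch for
# min_x ||y - Dx||_2 s.t. ||x||_0 <= s. Assumes unit-norm columns of D.
import numpy as np

def omp(D, y, s):
    residual = y.astype(complex)
    support = []
    for _ in range(s):
        # greedily pick the atom most correlated with the residual
        j = int(np.argmax(np.abs(D.conj().T @ residual)))
        support.append(j)
        # least-squares re-fit of the coefficients on the current support
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1], dtype=complex)
    x[support] = coeffs
    return x
```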
Synthesis Dictionary Learning
The DL problem (NP-hard):

$$\min_{D,B} \sum_{j=1}^{N} \|R_j x - D b_j\|_2^2 \quad \text{s.t.} \quad \|d_k\|_2 = 1 \ \forall k, \ \|b_j\|_0 \le s \ \forall j \qquad (2)$$

R_j x ∈ C^n: √n × √n patch, indexed by its location in the image.
R_j ∈ C^{n×P} extracts the patch.
D ∈ C^{n×K}: patch-based dictionary.
b_j ∈ C^K: sparse code, R_j x ≈ D b_j.
s: sparsity level; B = [b_1 | b_2 | ... | b_N].
DL minimizes the fitting error of all patches using sparse representations w.r.t. D.
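To make the patch operators R_j concrete, here is a deliberately naive sketch (an assumption for illustration; efficient implementations use strided views rather than loops) that collects all maximally overlapping patches as columns of an n × N matrix:

```python
# Naive sketch of the patch operators R_j: collect all overlapping
# sqrt(n) x sqrt(n) patches of an image as the columns of an n x N matrix.
import numpy as np

def extract_patches(image, root_n):
    H, W = image.shape
    cols = []
    for i in range(H - root_n + 1):
        for j in range(W - root_n + 1):
            # R_j x: vectorize the patch at location (i, j)
            cols.append(image[i:i+root_n, j:j+root_n].reshape(-1))
    return np.stack(cols, axis=1)   # n x N, one column per patch
```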
Synthesis-based Blind Compressed Sensing (BCS)
$$(P0)\quad \min_{x,D,B} \; \underbrace{\sum_{j=1}^{N} \|R_j x - D b_j\|_2^2}_{\text{sparse fitting regularizer}} + \nu \underbrace{\|Ax - y\|_2^2}_{\text{data fidelity}} \quad \text{s.t.}\ \|d_k\|_2 = 1\ \forall k,\ \|b_j\|_0 \le s\ \forall j.$$
B ∈ C^{K×N}: matrix that has the sparse codes b_j as its columns.
(P0) learns D ∈ C^{n×K} and reconstructs x from only the undersampled y ⇒ a dictionary adaptive to the underlying image.
(P0) is NP-hard, and non-convex even if the ℓ0 "norm" is relaxed to ℓ1.
DLMRI¹ solves (P0) for MRI and works better than non-adaptive CS.
Synthesis BCS algorithms have no guarantees and are expensive.
1 [Ravishankar & Bresler ’11]
2D Random Sampling - 6 fold undersampling
[Figures] LDP² reconstruction (22 dB) with LDP error magnitude; DLMRI reconstruction (32 dB) with DLMRI error magnitude (error color scale 0–0.3).
MRI data from Miki Lustig. ² [Lustig et al. ’07]
Alternative: Sparsifying Transform Model
Given a signal y ∈ R^n and a transform W ∈ R^{m×n}, we model Wy = x + η with ‖x‖0 ≪ m, where η is an error term.
Natural signals are approximately sparse in wavelets, DCT.
Given W and sparsity s, transform sparse coding is

$$\hat{x} = \arg\min_{x} \|Wy - x\|_2^2 \quad \text{s.t.} \quad \|x\|_0 \le s$$

x̂ = H_s(Wy) is computed exactly by thresholding Wy to its s largest-magnitude elements. Sparse coding is cheap. The signal is recovered as W†x̂.
Sparsifying transforms exploited for compression (JPEG2000), etc.
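This exact sparse coding step takes only a few lines; a minimal sketch (the top-s selection via argsort is one of several equivalent choices):

```python
# Transform sparse coding: x_hat = H_s(Wy) keeps only the s
# largest-magnitude entries of Wy; recovery uses the pseudo-inverse.
import numpy as np

def transform_sparse_code(W, y, s):
    z = W @ y
    x_hat = np.zeros_like(z)
    keep = np.argsort(np.abs(z))[-s:]      # indices of the s largest magnitudes
    x_hat[keep] = z[keep]
    y_hat = np.linalg.pinv(W) @ x_hat      # signal estimate W^dagger x_hat
    return x_hat, y_hat
```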
Alternative: Sparsifying Transform Learning
Square Transform Models
Unstructured transform learning [IEEE TSP, 2013 & 2015]
Doubly sparse transform learning [IEEE TIP, 2013]
Online learning for Big Data [IEEE JSTSP, 2015]
Convex formulations for transform learning [ICASSP, 2014]
Overcomplete Transform Models
Unstructured overcomplete transform learning [ICASSP, 2013]
Learning structured overcomplete transforms with block cosparsity (OCTOBOS) [IJCV, 2014]
Applications: sparse representation, image & video denoising, classification, blind compressed sensing (BCS) for imaging.
Square Transform Learning Formulation
$$(P1)\quad \min_{W,B} \; \underbrace{\sum_{j=1}^{N} \|W R_j x - b_j\|_2^2}_{\text{sparsification error}} + \lambda \underbrace{\left(0.5\,\|W\|_F^2 - \log|\det W|\right)}_{\text{regularizer}} \quad \text{s.t.}\ \|b_j\|_0 \le s\ \forall j$$
Sparsification error: measures the deviation of the data in the transform domain from perfect sparsity.
The regularizer enables complete control over the conditioning & scaling of W.
If ∃ (W, B) such that the condition number κ(W) = 1, W R_j x = b_j, and ‖b_j‖0 ≤ s ∀ j, then it is globally identifiable by solving (P1).
(P1) favors both a low sparsification error and good conditioning.
The solution to (P1) is unitary as λ → ∞.
Transform-based Blind Compressed Sensing (BCS)
$$(P2)\quad \min_{x,W,B} \; \underbrace{\sum_{j=1}^{N} \|W R_j x - b_j\|_2^2}_{\text{sparsification error}} + \nu \underbrace{\|Ax - y\|_2^2}_{\text{data fidelity}} + \lambda \underbrace{v(W)}_{\text{regularizer}} \quad \text{s.t.}\ \sum_{j=1}^{N} \|b_j\|_0 \le s,\ \|x\|_2 \le C.$$
(P2) learns W ∈ C^{n×n} and reconstructs x from only the undersampled y ⇒ a transform adaptive to the underlying image.
v(W) ≜ −log|det W| + 0.5‖W‖²_F controls the scaling and condition number κ of W.
‖x‖2 ≤ C is an energy/range constraint, with C > 0.
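As a small numerical illustration (an assumption-labeled sketch, not from the talk), v(W) can be evaluated via a log-determinant; it blows up as W approaches singularity and grows with the Frobenius norm, so penalizing it steers W toward well-conditioned, well-scaled transforms:

```python
# Evaluate v(W) = -log|det W| + 0.5 * ||W||_F^2 (illustrative sketch).
import numpy as np

def v(W):
    _, logabsdet = np.linalg.slogdet(W)   # log|det W|, stable also for complex W
    return -logabsdet + 0.5 * np.linalg.norm(W, 'fro')**2

# e.g., v(np.eye(4)) = 2.0, while v(1e-3 * np.eye(4)) is roughly 27.6:
# the nearly singular (badly scaled) matrix is penalized far more heavily.
```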
Transform BCS: Identifiability & Uniqueness
Proposition 1
Let x ∈ C^p, and let y = Ax with A ∈ C^{m×p}. Suppose
‖x‖2 ≤ C,
W ∈ C^{n×n} is a unitary transform,
∑_{j=1}^{N} ‖W R_j x‖0 ≤ s.
Further, let B denote the matrix that has W R_j x as its columns. Then (x, W, B) is a global minimizer of Problem (P2), i.e., it is identifiable by solving (P2).

Given a minimizer (x, W, B) of (P2), (x, ΘW, ΘB) is another equivalent minimizer for every Θ such that Θ^H Θ = I and ∑_j ‖Θb_j‖0 ≤ s. The optimal x is invariant to such transformations of (W, B).
When W is constrained to be doubly sparse and unitary, uniqueness can be guaranteed under additional (e.g., spark) conditions.
Alternative Transform BCS Formulations
$$(P3)\quad \min_{x,W,B} \; \sum_{j=1}^{N} \|W R_j x - b_j\|_2^2 + \nu \|Ax - y\|_2^2 \quad \text{s.t.}\ W^H W = I,\ \sum_{j=1}^{N} \|b_j\|_0 \le s,\ \|x\|_2 \le C.$$

(P3) is also a unitary synthesis dictionary-based BCS problem, with W^H the synthesis dictionary.

$$(P4)\quad \min_{x,W,B} \; \sum_{j=1}^{N} \|W R_j x - b_j\|_2^2 + \nu \|Ax - y\|_2^2 + \lambda\, v(W) + \eta^2 \sum_{j=1}^{N} \|b_j\|_0 \quad \text{s.t.}\ \|x\|_2 \le C.$$
Block Coordinate Descent (BCD) Algorithm for (P2)
(P2) is solved by alternating between updating W, B, and x.
Alternate a few times between the W and B updates before performing an image update.
Sparse Coding Step solves (P2) for B with fixed x, W:

$$\min_{B} \sum_{j=1}^{N} \|W R_j x - b_j\|_2^2 \quad \text{s.t.}\ \sum_{j=1}^{N} \|b_j\|_0 \le s. \qquad (3)$$

Cheap solution: let Z ∈ C^{n×N} be the matrix with W R_j x as its columns. The solution B̂ = H_s(Z) is computed exactly by zeroing out all but the s largest-magnitude coefficients in Z.
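The global hard-thresholding H_s(Z) is a few lines; a minimal sketch (argpartition is one way to find the top-s magnitudes):

```python
# Sparse coding step (3): B_hat = H_s(Z) zeros out all but the s
# largest-magnitude entries of Z, jointly over the whole matrix.
import numpy as np

def hard_threshold(Z, s):
    B = np.zeros_like(Z)
    flat_idx = np.argpartition(np.abs(Z), -s, axis=None)[-s:]
    idx = np.unravel_index(flat_idx, Z.shape)
    B[idx] = Z[idx]
    return B
```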
BCD Algorithm for (P2)
Transform Update Step solves (P2) for W with fixed x, B:

$$\min_{W} \sum_{j=1}^{N} \|W R_j x - b_j\|_2^2 + 0.5\,\lambda \|W\|_F^2 - \lambda \log|\det W| \qquad (4)$$

Let X ∈ C^{n×N} be the matrix with R_j x as its columns.
Closed-form solution:

$$\hat{W} = 0.5\, R \left( \Sigma + \left( \Sigma^2 + 2\lambda I \right)^{\frac{1}{2}} \right) V^H L^{-1} \qquad (5)$$

where XX^H + 0.5λI = LL^H, and L^{-1}XB^H has a full SVD of VΣR^H.
The solution is unique if and only if XB^H is non-singular.
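A direct transcription of (5) in code, as a sketch (it assumes XX^H + 0.5λI is positive definite, which holds whenever λ > 0):

```python
# Transform update step: closed-form minimizer (5) of problem (4).
import numpy as np

def transform_update(X, B, lam):
    n = X.shape[0]
    # factor X X^H + 0.5*lam*I = L L^H (positive definite for lam > 0)
    L = np.linalg.cholesky(X @ X.conj().T + 0.5 * lam * np.eye(n))
    Linv = np.linalg.inv(L)
    # full SVD: L^{-1} X B^H = V Sigma R^H
    V, sig, RH = np.linalg.svd(Linv @ X @ B.conj().T)
    R = RH.conj().T
    middle = np.diag(0.5 * (sig + np.sqrt(sig**2 + 2 * lam)))
    return R @ middle @ V.conj().T @ Linv      # W_hat of Eq. (5)
```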
BCD Algorithm for (P2)
Image Update Step solves (P2) for x with fixed W, B:

$$\min_{x} \sum_{j=1}^{N} \|W R_j x - b_j\|_2^2 + \nu \|Ax - y\|_2^2 \quad \text{s.t.}\ \|x\|_2 \le C. \qquad (6)$$

Least squares problem with an ℓ2 norm constraint.
The solution is unique as long as the set of overlapping patches covers all image pixels.
Solve the least-squares Lagrangian formulation:

$$\min_{x} \sum_{j=1}^{N} \|W R_j x - b_j\|_2^2 + \nu \|Ax - y\|_2^2 + \mu \left( \|x\|_2^2 - C \right) \qquad (7)$$

The optimal multiplier μ ∈ R₊ is the smallest real such that ‖x̂‖2 ≤ C; μ and x̂ can be found cheaply.
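For a fixed μ, (7) is an unconstrained least-squares problem whose normal equations can be solved matrix-free, e.g. by conjugate gradients. A hedged sketch (the explicit list R_list of patch matrices is purely illustrative; practical code applies patch extraction directly and exploits the structure of A):

```python
# Image update sketch: solve the normal equations of (7) for fixed mu,
#   (sum_j R_j^H W^H W R_j + nu A^H A + mu I) x = sum_j R_j^H W^H b_j + nu A^H y,
# by conjugate gradients on the Hermitian positive definite system.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def image_update(W, A, B, y, R_list, nu, mu):
    P = A.shape[1]

    def apply_normal(x):
        out = mu * x + nu * (A.conj().T @ (A @ x))
        for Rj in R_list:                       # accumulate the patch terms
            out = out + Rj.conj().T @ (W.conj().T @ (W @ (Rj @ x)))
        return out

    rhs = nu * (A.conj().T @ y)
    for Rj, bj in zip(R_list, B.T):             # columns of B are the codes b_j
        rhs = rhs + Rj.conj().T @ (W.conj().T @ bj)

    op = LinearOperator((P, P), matvec=apply_normal, dtype=complex)
    x_hat, info = cg(op, rhs)
    return x_hat
```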
BCS Convergence Guarantees - Notations
Define the barrier function ψ_s(B) as

$$\psi_s(B) = \begin{cases} 0, & \sum_{j=1}^{N} \|b_j\|_0 \le s \\ +\infty, & \text{else} \end{cases}$$

χ_C(x) is the barrier function corresponding to ‖x‖2 ≤ C.
(P2) is equivalent to the problem of minimizing the objective

$$g(W, B, x) = \sum_{j=1}^{N} \|W R_j x - b_j\|_2^2 + \nu \|Ax - y\|_2^2 + \lambda\, v(W) + \psi_s(B) + \chi_C(x)$$
For H ∈ C^{p×q}, ρ_j(H) denotes the magnitude of the j-th largest element (by magnitude) of H.
X ∈ C^{n×N} denotes the matrix with R_j x, 1 ≤ j ≤ N, as its columns.
Transform BCS Convergence Guarantees
Theorem 1
For the sequence {W^t, B^t, x^t} generated by the BCD algorithm with initialization (W^0, B^0, x^0), we have:

{g(W^t, B^t, x^t)} → g* = g*(W^0, B^0, x^0).
{W^t, B^t, x^t} is bounded, and all its accumulation points are equivalent, i.e., they achieve the same value g* of the objective.
‖x^t − x^{t−1}‖₂ → 0 as t → ∞.
Every accumulation point (W, B, x) is a critical point of g satisfying the following partial global optimality conditions:

$$x \in \arg\min_{x} \; g(W, B, x) \qquad (8)$$
$$W \in \arg\min_{W} \; g(W, B, x), \qquad B \in \arg\min_{B} \; g(W, B, x) \qquad (9)$$
Transform BCS Convergence Guarantees
Theorem 2
Each accumulation point (W, B, x) of {W^t, B^t, x^t} also satisfies the following partial local optimality conditions:

$$g(W + \Delta W,\, B + \Delta B,\, x) \ge g(W, B, x) = g^* \qquad (10)$$
$$g(W,\, B + \Delta B,\, x + \Delta x) \ge g(W, B, x) = g^* \qquad (11)$$

The conditions each hold for all Δx ∈ C^p, all ΔW ∈ C^{n×n} satisfying ‖ΔW‖_F ≤ ε for some ε = ε(W) > 0, and all ΔB ∈ C^{n×N} in R1 ∪ R2:

R1. The half-space Re(tr{(WX − B)ΔB^H}) ≤ 0.
R2. The local region defined by ‖ΔB‖_∞ < ρ_s(WX).

Furthermore, if ‖WX‖0 ≤ s, then ΔB can be arbitrary.
Global Convergence Guarantees
Proposition 2
For each initialization, the iterate sequence in the BCD algorithm converges to an equivalence class (same objective values) of critical points of the objective that are also partial global/local minimizers.
Proposition 3
The BCD algorithm is globally convergent to a subset of the set of critical points of the objective. The subset includes all (W, B, x) that are at least partial global and partial local minimizers.
Computational Advantages of Transform BCS
Cost per iteration of transform BCS: O(p⁴NL)
N overlapping patches of size p × p; W ∈ C^{n×n}, n ≜ p².
L: number of inner alternations between the transform update & sparse coding.
Cost per iteration of the synthesis BCS method DLMRI³: O(p⁶NJ)
D ∈ C^{n×K}, n ≜ p², K ∝ n, sparsity s ∝ n.
J: number of inner iterations of dictionary learning using K-SVD⁴.
In practice, transform BCS converges quickly and is much cheaper for large p.
In 3D or 4D imaging, n = p³ or p⁴, and the gain in computation is about a factor of n in order.
³ [Ravishankar & Bresler ’11] ⁴ [Aharon et al. ’06]
TLMRI Convergence - 4x Undersampling (s = 3.4%)
[Figures] Reference image and sampling mask; objective function vs. iteration number; ‖x^t − x^{t−1}‖₂ vs. iteration number t.
Convergence & Learning - 4x Undersampling (s = 3.4%)
[Figures] Zero-filling reconstruction (28.94 dB) and zero-filling error (color scale 0–0.2); TLMRI reconstruction (32.66 dB); real (top) and imaginary (bottom) parts of the learnt 36 × 36 W.
Comparison (PSNR & Runtime) to Recent Methods
PSNR (dB) and average runtime:

Sampling scheme  | Undersampling | Zero-filling | LDP⁵ | PBDWS⁶ | DLMRI⁷ | PANO⁸ | TLMRI
2D random        | 4x            | 25.3         | 30.3 | 32.6   | 32.91  | 32.2  | 33.04
2D random        | 7x            | 25.3         | 27.3 | 31.3   | 31.46  | 30.2  | 31.81
Cartesian        | 4x            | 28.9         | 30.2 | 32.0   | 32.46  | 31.6  | 32.64
Cartesian        | 7x            | 27.9         | 25.5 | 30.1   | 30.72  | 30.4  | 31.04
Avg. runtime (s) |               |              | 251  | 794    | 2051   | 664   | 211
TLMRI is up to 5.5 dB better than LDP, which uses wavelets + TV.
TLMRI provides up to 1 dB improvement in PSNR over the PBDWS method, which uses redundant wavelets and trained patch-based geometric directions, and is up to 1.6 dB better than the non-local PANO method.
It is up to 0.35 dB better than DLMRI, which learns a 4x overcomplete dictionary.
TLMRI is 10x faster than DLMRI, and 4x faster than the PBDWS method.
TLMRI provides the best reconstructions, and is the fastest.
5 [Lustig et al. ’07] 6 [Ning et al. ’13] 7 [Ravishankar & Bresler ’11] 8 [Qu et al. ’14]
Example - 2D random 5x Undersampling
[Figures] Reference; DLMRI reconstruction (28.54 dB); TLMRI reconstruction (30.47 dB); sampling mask; DLMRI error; TLMRI error (error color scale 0–0.25).
Conclusions
We introduced a transform-based BCS framework.
The proposed BCS algorithms have a low computational cost.
We provided novel convergence guarantees for the algorithms that do not require any restrictive assumptions.
For CSMRI, the proposed approach outperforms leading image reconstruction methods while being much faster.
Future work: convergence of the algorithm to a global minimizer & convergence rate analysis.
Thank you! Questions??
Convergence Guarantees - Definitions
Definition 1
Let φ : R^q → (−∞, +∞] be a proper function and let z ∈ dom φ. The Fréchet sub-differential of the function φ at z is the following set:

$$\hat{\partial}\varphi(z) \triangleq \left\{ h \in \mathbb{R}^q : \liminf_{b \to z,\, b \neq z} \frac{1}{\|b - z\|} \big( \varphi(b) - \varphi(z) - \langle b - z,\, h \rangle \big) \ge 0 \right\} \qquad (12)$$

If z ∉ dom φ, then ∂̂φ(z) ≜ ∅. The sub-differential of φ at z is defined as

$$\partial\varphi(z) \triangleq \left\{ h \in \mathbb{R}^q : \exists\, z^k \to z,\ \varphi(z^k) \to \varphi(z),\ h^k \in \hat{\partial}\varphi(z^k) \to h \right\}. \qquad (13)$$
Lemma 1
A necessary condition for z ∈ R^q to be a minimizer of the function φ : R^q → (−∞, +∞] is that z is a critical point of φ, i.e., 0 ∈ ∂φ(z). If φ is a convex function, this condition is also sufficient.