
Page 1

Empirical Testing of Sparse Approximation and Matrix Completion Algorithms

Jared Tanner

Workshop on Sparsity, Compressed Sensing and Applications

University of Oxford
Joint with Blanchard, Donoho, and Wei

Page 2

Three sparse approximation questions to test

Sparse approximation:

$\min_x \|x\|_0 \ \text{ s.t. } \ \|Ax - b\|_2 \le \tau$, with $A \in \mathbb{R}^{m \times n}$

1. Are there algorithms that have the same behaviour for different A?
2. Which algorithm is fastest while retaining a high recovery probability?

Matrix completion:

$\min_X \operatorname{rank}(X) \ \text{ s.t. } \ \|\mathcal{A}(X) - b\|_2 \le \tau$, with $\mathcal{A}: \mathbb{R}^{m \times n} \to \mathbb{R}^p$

3. What is the largest rank that can be recovered with an efficient algorithm?

Information about each question can be gleaned from large-scale empirical testing. Let's use some HPC resources. (A minimal problem-instance sketch follows.)
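To make the testing protocol concrete, here is a minimal sketch of how a single random instance and its success criterion can be set up for question 1, using a Gaussian ensemble. It is written in Python/NumPy purely for illustration; the function names are mine and this is not the Matlab/GPU code used in the study.

```python
import numpy as np

def sparse_instance(m, n, k, rng):
    """Draw a Gaussian A, a k-sparse x0 with Gaussian nonzeros, and b = A @ x0."""
    A = rng.standard_normal((m, n)) / np.sqrt(m)      # roughly unit-norm columns
    x0 = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)    # random support of size k
    x0[support] = rng.standard_normal(k)
    return A, x0, A @ x0

def recovered(x_hat, x0, tol=1e-3):
    """Declare success when the relative l2 error is below tol."""
    return np.linalg.norm(x_hat - x0) <= tol * np.linalg.norm(x0)
```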

Page 5

Sparse approximation phase transition

- Problem characterized by three numbers: k ≤ m ≤ n
  • n, signal length, the "Nyquist" sampling rate
  • m, number of inner product measurements
  • k, signal complexity (sparsity), k := min_x ‖x‖_0

- Mixed under/over-sampling rates compared to naive/optimal:
  Undersampling: $\delta_m := m/n$,   Oversampling: $\rho_m := k/m$

- Testing model: for a given matrix ensemble and algorithm, draw A and a k-sparse x0, and let Π(k, m, n) be the probability of recovery.

- For fixed (δ_m, ρ_m), Π(k, m, n) converges to 1 or 0 with increasing m; the two regimes are separated by a phase transition curve ρ(δ).

- Is there an algorithm with ρ(δ) large whose Π(k, m, n) is insensitive to the matrix ensemble? (A sketch of such an empirical sweep follows.)
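As an illustration only (Python/NumPy again, with hypothetical names; the actual study used Matlab and GPU code over millions of instances), Π(k, m, n) can be estimated by Monte Carlo and the transition traced by sweeping k at fixed m and n:

```python
import numpy as np

def success_probability(solver, m, n, k, trials=100, seed=0):
    """Estimate Pi(k, m, n): the fraction of random instances `solver` recovers."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        A, x0, b = sparse_instance(m, n, k, rng)   # generator from the earlier sketch
        wins += bool(recovered(solver(A, b, k), x0))
    return wins / trials

# Fix delta = m/n, sweep rho = k/m, and look for the 50% crossing:
# pi = [success_probability(my_solver, m, n, k) for k in range(1, m + 1)]
```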

Page 6

Phase Transition: $\ell_1$ ball, $C^n$

- With overwhelming probability on the measurements $A_{m,n}$: for any ε > 0, as (k, m, n) → ∞
  • All k-sparse signals recovered if $k/m \le \rho_S(m/n, C)(1 - \varepsilon)$
  • Most k-sparse signals recovered if $k/m \le \rho_W(m/n, C)(1 - \varepsilon)$
  • Failure is typical if $k/m \ge \rho_W(m/n, C)(1 + \varepsilon)$

[Figure: strong and weak phase transitions ρ_S and ρ_W for the ℓ1 ball C^n, plotted as k/m versus δ = m/n; below ρ_S all signals are recovered, below ρ_W most signals are recovered.]

- Asymptotic behaviour as δ → 0: $\rho(m/n) \sim [2(e)\log(n/m)]^{-1}$
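As a rough numerical illustration (my reading, assuming natural logarithms, with the factor e left optional as bracketed on the slide): at n/m = 100,

\[ \rho \sim \frac{1}{2\ln 100} \approx 0.11, \qquad \text{or} \qquad \frac{1}{2e\ln 100} \approx 0.04 \ \text{with the factor } e. \]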

Page 7

Phase Transition: Simplex, $T^{n-1}$, x ≥ 0

- With overwhelming probability on the measurements $A_{m,n}$: for any ε > 0 and x ≥ 0, as (k, m, n) → ∞
  • All k-sparse signals recovered if $k/m \le \rho_S(m/n, T)(1 - \varepsilon)$
  • Most k-sparse signals recovered if $k/m \le \rho_W(m/n, T)(1 - \varepsilon)$
  • Failure is typical if $k/m \ge \rho_W(m/n, T)(1 + \varepsilon)$

[Figure: strong and weak phase transitions ρ_S and ρ_W for the simplex T^{n−1}, plotted as k/m versus δ = m/n; below ρ_S all signals are recovered, below ρ_W most signals are recovered.]

- Asymptotic behaviour as δ → 0: $\rho(m/n) \sim [2(e)\log(n/m)]^{-1}$

Page 8

$\ell_1$-Weak Phase Transitions: Visual agreement

- Testing beyond the proven theory, 6.4 CPU years later...
- Black: weak phase transition, x ≥ 0 (top), x signed (bottom)
- Overlaid: empirical evidence of the 50% success rate:

[Figure: weak phase transition ρ(δ, Q) (black curve) with overlaid empirical 50% success-rate points, ρ = k/n versus δ = n/N, for Gaussian, Bernoulli, Fourier, Ternary (p = 2/3, 2/5, 1/10), Hadamard, Expander (p = 1/5), and Rademacher ensembles.]

- Gaussian, Bernoulli, Fourier, Hadamard, Rademacher
- Ternary (p): P(0) = 1 − p and P(±1) = p/2
- Expander (p): ⌈p · n⌉ ones per column, otherwise zeros
- Rigorous statistical comparison shows $n^{-1/2}$ convergence
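For concreteness, here is a minimal sketch (Python/NumPy, illustrative names; the matrix here is m × n, so the slide's "⌈p·n⌉ ones per column" in its n × N convention becomes ⌈p·m⌉ below) of how the ternary, expander, and Rademacher ensembles can be drawn:

```python
import numpy as np

def ternary_matrix(m, n, p, rng):
    """Ternary(p): each entry is 0 with probability 1-p, and +1 or -1 each with p/2."""
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return signs * (rng.random((m, n)) < p)

def expander_matrix(m, n, p, rng):
    """Expander(p): ceil(p * m) ones placed in randomly chosen rows of each column."""
    d = int(np.ceil(p * m))
    A = np.zeros((m, n))
    for j in range(n):
        A[rng.choice(m, size=d, replace=False), j] = 1.0
    return A

def rademacher_matrix(m, n, rng):
    """Rademacher: i.i.d. +/-1 entries, scaled so columns have unit norm."""
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
```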

Page 10

Bulk Z-scores

[Figure: bulk Z-scores plotted against δ = n/N for four ensembles: (a) Bernoulli, (b) Fourier, (c) Ternary (1/3), (d) Rademacher.]

- n = 200, n = 400 and n = 1600
- Linear trend with δ = m/n, decaying at rate $n^{-1/2}$
- Proven for matrices with sub-Gaussian tails (Montanari 2012)

Page 11

Which algorithm is fastest and has a high phase transition?

State-of-the-art algorithms for sparse approximation

- Hard Thresholding, $H_k(A^T b)$, followed by a subspace-restricted linear solver: Conjugate Gradient
- Normalized IHT (NIHT): $H_k(x^t + \kappa A^T(b - Ax^t))$ (steepest descent)
- Hard Thresholding Pursuit (HTP): NIHT with a pseudo-inverse
- CSMPSP (a hybrid of CoSaMP and Subspace Pursuit):

  $v^{t+1} = H_{\alpha k}(x^t + \kappa A^T(b - Ax^t))$
  $I_t = \operatorname{supp}(v^{t+1}) \cup \operatorname{supp}(x^t)$   (join support sets)
  $w_{I_t} = (A_{I_t}^T A_{I_t})^{-1} A_{I_t}^T b$   (least-squares fit)
  $x^{t+1} = H_{\beta k}(w)$   (second threshold)

- SpaRSA [Lee and Wright '08]
- Testing environment with random problem generation, or with a user-supplied matrix and measurements
- Matrix ensembles: Discrete Cosine Transform, sparse matrices, Gaussian (a NIHT sketch follows below)
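A minimal NIHT iteration, written here in Python/NumPy for a dense A purely as an illustration (the GPU code works with DCT, sparse, and Gaussian ensembles and has more careful stepsize and support logic), might look like:

```python
import numpy as np

def hard_threshold(v, k):
    """H_k: keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def niht(A, b, k, iters=300, tol=1e-6):
    """Normalized IHT: x <- H_k(x + kappa * A.T @ (b - A @ x))."""
    x = hard_threshold(A.T @ b, k)
    for _ in range(iters):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        g = A.T @ r
        gs = np.zeros_like(g)
        support = np.flatnonzero(x)                   # Lambda_t, current support
        gs[support] = g[support]
        denom = np.linalg.norm(A @ gs) ** 2
        kappa = np.linalg.norm(gs) ** 2 / denom if denom > 0 else 1.0
        x = hard_threshold(x + kappa * g, k)
    return x
```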

Page 14

Ingredients of Greedy CS Algorithms:

- Descent: $\nu^t := x^t + \kappa A^T(b - Ax^t)$ with
  $\kappa = \dfrac{\|A_{\Lambda_t}^T (b - Ax^t)\|_2^2}{\|A_{\Lambda_t} A_{\Lambda_t}^T (b - Ax^t)\|_2^2}$;
  requires two matvecs, one transpose matvec, and vector additions.

- Support: identification of the support set for $x^{t+1} = H_k(\nu^t)$, hard thresholding, and calculating κ. Linear binning gives a fast parallel order-statistic calculation, and is only carried out when the support set could change; this reduced the support-set time to a small fraction of one DCT matvec.

- Generation: when testing millions of problems, problem generation can become slow, especially using Matlab's randn.

Total time (for large problems) is reduced to essentially the matvecs. (A sketch of the linear-binning selection follows.)
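A rough serial sketch of the linear-binning idea (Python/NumPy here; the real implementation is a parallel GPU kernel, and the bin count below is an arbitrary choice):

```python
import numpy as np

def threshold_by_linear_binning(v, k, nbins=4096):
    """Approximate the k-th largest magnitude of v with a single linear pass."""
    mags = np.abs(v)
    vmax = mags.max()
    if vmax == 0.0:
        return 0.0
    counts, edges = np.histogram(mags, bins=nbins, range=(0.0, vmax))
    total = 0
    for b in range(nbins - 1, -1, -1):               # walk bins from largest down
        total += counts[b]
        if total >= k:
            return edges[b]                           # lower edge of the k-th bin
    return 0.0

# Support estimate: entries at or above the approximate k-th largest magnitude.
# support = np.flatnonzero(np.abs(v) >= threshold_by_linear_binning(v, k))
```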

Page 15

Computing environment

CPU:

- Intel Xeon 5650 (released March 2010)

- 6 cores, 2.66 GHz

- 12 GB of DDR3-1066, 6.4 GT/s

- Matlab 2010a, 64-bit (built-in multi-core threading)

GPU:

- NVIDIA Tesla C2050 (released April 2010)

- 448 cores, peak performance 1.03 Tflop/s

- 3 GB GDDR5 (on-device memory)

- Error correction (ECC)

Is it faster?

Page 16

Multiplicative acceleration factor for NIHT: CPU/GPU

Matrix ensemble    n       nonZeros   Descent   Support   Generation
dct               2^14        -         63.21     42.16       1.04
dct               2^16        -         64.46     41.59       1.77
dct               2^18        -         54.11     38.45       3.20
dct               2^20        -         57.94     38.82       5.80
smv               2^12        4          0.52      4.10      32.32
smv               2^14        4          1.41     14.64     135.08
smv               2^16        4          4.29     43.04     521.60
smv               2^18        4         10.43     71.50    1630.08
smv               2^12        7          0.63      3.48      33.92
smv               2^14        7          1.86     12.86     142.53
smv               2^16        7          5.42     37.11     526.82
smv               2^18        7         10.80     55.60    1556.44
gen               2^10        -          1.06      2.07       0.34
gen               2^12        -         10.36      4.09       2.53
gen               2^14        -         16.75      6.17       5.85

Page 17

Algorithm Selection for DCT, map, n = 2^16

[Figure: algorithm selection map, n = 65536; k/m versus m/n, marking the fastest algorithm at each point: NIHT (circle), HTP (plus), CSMPSP (square), ThresholdCG (times).]

NIHT dominant near phase transition.

Page 18

Algorithm Selection for DCT, map, n = 2^18

[Figure: algorithm selection map, n = 262144; k/m versus m/n, marking the fastest algorithm at each point: NIHT (circle), HTP (plus), CSMPSP (square), ThresholdCG (times).]

NIHT dominant near phase transition.

Page 19

Algorithm Selection for DCT, map, n = 2^20

[Figure: algorithm selection map, n = 1048576; k/m versus m/n, marking the fastest algorithm at each point: NIHT (circle), HTP (plus), CSMPSP (square), ThresholdCG (times).]

NIHT dominant near phase transition, though HTP nearly as fast.

Page 20

HTP / best time for DCT, n = 2^20

[Figure: ratio of HTP time to the fastest algorithm's time, n = 1048576; k/m versus m/n, colour scale roughly 1 to 3.]

NIHT and HTP have essentially identical average case behaviour.

Page 21

Best time for DCT, n = 2^14

[Figure: time (ms) of the fastest algorithm, n = 16384; k/m versus m/n, roughly 25-45 ms.]

Page 22

Best time for DCT, n = 2^16

[Figure: time (ms) of the fastest algorithm, n = 65536; k/m versus m/n, roughly 25-60 ms.]

Page 23

Best time for DCT, n = 2^18

[Figure: time (ms) of the fastest algorithm, n = 262144; k/m versus m/n, roughly 40-110 ms.]

Page 24

Best time for DCT, n = 2^20

[Figure: time (ms) of the fastest algorithm, n = 1048576; k/m versus m/n, roughly 100-300 ms.]

Page 25

Concentration phenomenon: NIHT for DCT, δ = 0.25

- Logit fit, $\frac{\exp(\beta_0 + \beta_1 k)}{1 + \exp(\beta_0 + \beta_1 k)}$, to data collected from about $10^5$ tests

- $\rho_W^{\mathrm{niht}}(1/4) \approx 0.25967$ (note: $\rho_W(1/4, C) = 0.2674$)

- Transition width proportional to $n^{-1/2}$ (a sketch of such a fit follows)
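As a sketch of the fitting step only (Python/SciPy; the function name and choice of optimizer are mine, not the study's), one can maximize the binomial likelihood of the logistic model and read off the 50% crossing:

```python
import numpy as np
from scipy.optimize import minimize

def fit_logit_transition(k_vals, successes, trials):
    """Fit P(success|k) = exp(b0 + b1 k)/(1 + exp(b0 + b1 k)) by maximum likelihood."""
    k_vals = np.asarray(k_vals, dtype=float)
    s = np.asarray(successes, dtype=float)            # successes observed at each k
    t = np.asarray(trials, dtype=float)               # trials run at each k

    def neg_log_likelihood(beta):
        z = beta[0] + beta[1] * k_vals
        return -np.sum(s * z - t * np.logaddexp(0.0, z))   # binomial log-likelihood

    beta = minimize(neg_log_likelihood, x0=np.array([0.0, -0.01]),
                    method="Nelder-Mead").x
    k_half = -beta[0] / beta[1]                       # k where the fit crosses 50%
    return beta, k_half

# rho_half = k_half / m gives the empirical 50% transition at fixed delta = m/n.
```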

Page 26

Optimal order recovery - matrix completion

- Four defining numbers: r ≤ m ≤ n and p
  • m × n, matrix size; mn is the "Nyquist" sampling rate
  • p, number of inner product measurements
  • r, matrix complexity (rank)

- For what (r, m, n, p) does an encoder/decoder pair recover a suitable approximation of X from (b, A)?
  • p = r(m + n − r) is the optimal oracle rate
  • p ∼ r(m + n − r) is possible using efficient algorithms

- Mixed under/over-sampling rates compared to naive/optimal:
  Undersampling: $\delta := p/(mn)$,   Oversampling: $\rho := r(m + n - r)/p$
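For a hypothetical example of these rates: with m = n = 800 and r = 40, the oracle rate is r(m + n − r) = 40 · 1560 = 62,400 degrees of freedom out of mn = 640,000 entries; observing p = 128,000 entries then gives δ = 0.2 and ρ = 62,400/128,000 ≈ 0.49.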

Page 29

Largest rank recoverable with an efficient algorithm

- Compressed sensing algorithms all behave about the same

- How about matrix completion: do simple methods work well?

- NIHT: alternating projection with a column-subspace stepsize

$X^{j+1} = H_r\big(X^j + \mu_j \mathcal{A}^*(b - \mathcal{A}(X^j))\big)$

with

$\mu_j := \dfrac{\|P_U^j \mathcal{A}^*(b - \mathcal{A}(X^j))\|_F^2}{\|\mathcal{A}\big(P_U^j \mathcal{A}^*(b - \mathcal{A}(X^j))\big)\|_2^2}$

where $P_U^j := U_j U_j^*$. (Column and row projection does not work.)

- Contrast NIHT with nuclear norm minimization via semidefinite programming, and with simple Power Factorization. (A sketch of this NIHT iteration for entry sensing follows.)
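A minimal sketch of the iteration above for entry sensing (Python/NumPy, using a full SVD for the rank-r projection; illustrative only, not the tested implementation):

```python
import numpy as np

def rank_r_projection(Y, r):
    """H_r: nearest rank-r matrix via a truncated SVD; also return its column space."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :], U[:, :r]

def niht_matrix_completion(b, rows, cols, shape, r, iters=500, tol=1e-8):
    """NIHT for entry sensing: observations b = X0[rows, cols]."""
    X = np.zeros(shape)
    X[rows, cols] = b                                 # A*(b)
    X, U = rank_r_projection(X, r)
    for _ in range(iters):
        R = np.zeros(shape)
        R[rows, cols] = b - X[rows, cols]             # A*(residual)
        if np.linalg.norm(R[rows, cols]) <= tol * np.linalg.norm(b):
            break
        G = U @ (U.T @ R)                             # project onto current column space
        denom = np.linalg.norm(G[rows, cols]) ** 2
        mu = np.linalg.norm(G, 'fro') ** 2 / denom if denom > 0 else 1.0
        X, U = rank_r_projection(X + mu * R, r)
    return X
```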

Page 30

Three matrix completion algorithms to compare

- Nuclear norm minimization (the extension of $\ell_1$ in CS)

$\min_X \|X\|_* := \sum_i \sigma_i(X) \quad \text{subject to} \quad \mathcal{A}(X) = b$

- NIHT for matrix completion (how to select $\mu_j$?)

$X^{j+1} = H_r\big(X^j + \mu_j \mathcal{A}^*(b - \mathcal{A}(X^j))\big)$

- Power Factorization (alternating least squares over the factors of $X := RV$)

$\min_{R,V} \|\mathcal{A}(RV) - b\|_2$

Benchmark the algorithms' ability to recover low-rank matrices, and contrast speed and memory requirements. 4.3 CPU years later... (A Power Factorization sketch follows.)
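A corresponding Power Factorization sketch under the same entry-sensing setup (alternating least squares over R and V; again Python/NumPy and illustrative rather than the benchmarked code):

```python
import numpy as np

def power_factorization(b, rows, cols, shape, r, iters=200, tol=1e-8, seed=0):
    """Alternating least squares on X = R @ V for entry observations b = X0[rows, cols]."""
    m, n = shape
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((m, r))
    V = rng.standard_normal((r, n))
    for _ in range(iters):
        for j in range(n):                            # fix R, solve each column of V
            obs = np.flatnonzero(cols == j)
            if obs.size:
                V[:, j] = np.linalg.lstsq(R[rows[obs], :], b[obs], rcond=None)[0]
        for i in range(m):                            # fix V, solve each row of R
            obs = np.flatnonzero(rows == i)
            if obs.size:
                R[i, :] = np.linalg.lstsq(V[:, cols[obs]].T, b[obs], rcond=None)[0]
        X = R @ V
        if np.linalg.norm(b - X[rows, cols]) <= tol * np.linalg.norm(b):
            break
    return R @ V
```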

Page 31

NIHT vs “state of the art”, Gaussian sensing (m = n = 80)

[Figure: recovery phase transition (gamma = 1.000), rho versus p/mn, for Gaussian measurements: NIHT with column projection (0.999), Power Factorization, and Nuclear Norm Minimization.]

- Simple NIHT has nearly optimal recovery ability

- Convex relaxation is consistent with the theory of Hassibi et al.

Page 32

NIHT vs “state of the art”, entry sensing (m = n = 800)

[Figure: recovery phase transition (gamma = 1.000), rho versus p/mn, for entry measurements: NIHT with column projection (0.999), Power Factorization, and Nuclear Norm Minimization.]

- Simple NIHT has nearly optimal recovery ability

- Convex relaxation is slow and has a small recovery region

Page 33

Conclusions

- There are many algorithms for sparse approximation and matrix completion, all proven to achieve optimal-order recovery: $m \ge \mathrm{Const}\cdot k \log(n/m)$ and $p \ge \mathrm{Const}\cdot r(m + n - r)$.

- Empirical testing can suggest conjectures and point us to the "best" methods.

- High-performance computing tools allow testing large numbers of problems, and individual problems quickly: the GPU software solves problems of size $n = 10^6$ in under one second.

Two new findings:

- Near universality of the phase transitions of CS algorithms ($\ell_1$)

- Convexification is less effective for matrix completion; simple methods for rank minimization have a higher phase transition

Page 34

References

- Donoho and Tanner, Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing, Phil. Trans. Roy. Soc. A (2009).

- Blanchard and Tanner, GPU Accelerated Greedy Algorithms for compressed sensing (2012).

- Tanner and Wei, Normalized iterative hard thresholding for matrix completion (2012).

Thanks for your time
