singular values of tensors
TRANSCRIPT
Singular Values of Tensors
Harm Derksen
University of Michigan
CUNY/NYU, February 15, 2019
Harm Derksen Singular Values of Tensors
Tensor Rank
F a field, F = R or F = C
V^(i) ≅ F^{n_i} for i = 1, 2, ..., d
V = V^(1) ⊗ V^(2) ⊗ ... ⊗ V^(d) ≅ F^{n1×···×nd} tensor product space
Definition
A simple tensor is a tensor of the form v^(1) ⊗ v^(2) ⊗ ... ⊗ v^(d) (v^(i) ∈ V^(i)).
Definition (tensor rank)
The rank of T is the smallest positive integer r such that T is the sum of r simple tensors.
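In coordinates a simple tensor is just an outer product, so small instances are easy to build; a minimal numpy sketch (the vectors here are arbitrary illustrations, not from the talk):

```python
import numpy as np

# a simple (rank-1) tensor is an outer product v1 x v2 x v3
v1, v2, v3 = np.array([1., 2.]), np.array([0., 1.]), np.array([3., 1.])
simple = np.einsum('i,j,k->ijk', v1, v2, v3)      # shape (2, 2, 2)

# a sum of two simple tensors has rank at most 2
w1, w2, w3 = np.array([1., 0.]), np.array([1., 1.]), np.array([0., 2.])
T = simple + np.einsum('i,j,k->ijk', w1, w2, w3)
print(T.shape)
```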
Matrix Multiplication Tensor
M_d = ∑_{i=1}^d ∑_{j=1}^d ∑_{k=1}^d e_{i,j} ⊗ e_{j,k} ⊗ e_{k,i} ∈ F^{d×d} ⊗ F^{d×d} ⊗ F^{d×d}

clearly rank(M_d) ≤ d³

Theorem (Strassen)

If rank(M_d) = s, then the complexity of n × n matrix multiplication is O(n^{log_d(s)}) (the standard algorithm is O(n³)).

Theorem (Strassen)

rank(M_2) ≤ 7, so the complexity of n × n matrix multiplication is O(n^{log_2(7)}) = O(n^{2.81})
Current record: O(n2.373) (Le Gall)
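Strassen's seven products for the 2 × 2 case can be written out directly; a sketch using the standard m1–m7 formulation (not taken from the talk):

```python
import numpy as np

def strassen_2x2(A, B):
    """2x2 matrix product using 7 multiplications instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Applying this recursively to 2 × 2 block matrices gives the O(n^{log_2(7)}) bound.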
The Canonical Polyadic (CP) Model
aka PARAFAC, CANDECOMP
Problem
Given a tensor T ∈ V = V^(1) ⊗ ... ⊗ V^(d), write T = v1 + v2 + ... + vr where v1, v2, ..., vr are simple tensors and r is minimal.
- the CP-decomposition is sometimes unique, but not always
- not numerically stable: there is no upper bound for max{‖v1‖, ..., ‖vr‖} as a function of ‖T‖
- difficult to compute: for many tensors of interest the rank is unknown
Many numerical applications in psychometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience, chemometrics, etc.
The Canonical Polyadic (CP) Model
To make it more numerically stable, we could allow an error term.
Problem
Given a tensor T ∈ V and fixed r, write T = v1 + v2 + ... + vr + E where v1, v2, ..., vr are simple tensors and ‖E‖ is minimal.
This optimization problem might be ill-posed. Define
T = e2 ⊗ e1 ⊗ e1 + e1 ⊗ e2 ⊗ e1 + e1 ⊗ e1 ⊗ e2,
v1(t) = t⁻¹(e1 + te2) ⊗ (e1 + te2) ⊗ (e1 + te2), v2(t) = −t⁻¹ e1 ⊗ e1 ⊗ e1.

Then T = v1(t) + v2(t) + E(t) with ‖E(t)‖ = t√(3 + t²).

So we can write T = v1 + v2 + E with ‖E‖ = ε for every ε > 0, but we cannot write T = v1 + v2 + E with ‖E‖ = 0.
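The instability is easy to observe numerically; a sketch (the step values for t are arbitrary):

```python
import numpy as np

e1, e2 = np.eye(2)
outer = lambda x, y, z: np.einsum('i,j,k->ijk', x, y, z)

T = outer(e2, e1, e1) + outer(e1, e2, e1) + outer(e1, e1, e2)

for t in [1e-1, 1e-3, 1e-5]:
    u = e1 + t * e2
    v1 = outer(u, u, u) / t
    v2 = -outer(e1, e1, e1) / t
    err = np.linalg.norm(T - v1 - v2)     # equals t*sqrt(3 + t**2)
    print(t, err)
```

Note that while the error goes to 0, the norms of v1 and v2 blow up like 1/t, which is exactly the numerical-stability problem mentioned above.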
Motivation: Compressed Sensing and Convex Relaxation
For x ∈ R^n, its sparsity is measured by ‖x‖0 = #{i | xi ≠ 0}.
Problem
Given A ∈ R^{m×n}, b ∈ R^m, find a solution x ∈ R^n of Ax = b with ‖x‖0 minimal (a sparsest solution).
But ‖·‖0 is not convex and this optimization problem is difficult, so instead we consider:
Problem (Basis Pursuit)
Given A ∈ R^{m×n}, b ∈ R^m, find a solution x ∈ R^n of Ax = b with ‖x‖1 minimal.
Basis Pursuit can be solved by linear programming and is generally fast. Under reasonable assumptions, Basis Pursuit also gives the sparsest solutions (Candes-Tao, Donoho).
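Basis Pursuit becomes a linear program after the standard split x = u − v with u, v ≥ 0; a sketch using scipy.optimize.linprog (the random instance, sizes, and seed are arbitrary illustrations):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 12, 30
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17]] = [2.0, -1.5]          # a sparse signal
b = A @ x_true

# min ||x||_1  s.t.  Ax = b, as an LP via the split x = u - v, u, v >= 0
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]
# under suitable conditions on A (Candes-Tao, Donoho) this recovers x_true
print(np.linalg.norm(A @ x_hat - b))
```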
The Nuclear and Spectral Norms
Low-rank tensors are sparse in some sense, and by convex relaxation:
Definition
The nuclear norm ‖T‖⋆ is the smallest value of ∑_{i=1}^r ‖vi‖ where T = ∑_{i=1}^r vi and v1, ..., vr are simple tensors. (This is well-defined.)

V^(1), ..., V^(d), V are spaces with a positive definite bilinear/hermitian form 〈·, ·〉
Definition
The spectral norm is defined by
‖T‖σ = max{|〈T , v〉| | v simple tensor with ‖v‖ = 1}.
The spectral norm is dual to the nuclear norm; in particular,

|〈T, S〉| ≤ ‖T‖⋆‖S‖σ for all tensors S, T.
The Nuclear and Spectral Norms
If A ∈ R^{n×m} = R^n ⊗ R^m is an n × m matrix, then the tensor rank of A coincides with the matrix rank of A.

If λ1 ≥ λ2 ≥ ... ≥ λr are the singular values of A, then ‖A‖⋆ = λ1 + ... + λr, ‖A‖σ = λ1 (spectral/operator norm), and ‖A‖ = ‖A‖2 = ‖A‖F = √(λ1² + ... + λr²) (Euclidean/Frobenius norm).
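For matrices, all three norms fall out of one SVD; a numpy sketch (the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[3., 0., 0.],
              [4., 5., 0.]])
s = np.linalg.svd(A, compute_uv=False)    # singular values, descending

nuclear  = s.sum()                        # ||A||_* = lambda_1 + ... + lambda_r
spectral = s[0]                           # ||A||_sigma = lambda_1
frob     = np.sqrt((s ** 2).sum())        # ||A||_F
assert np.isclose(frob, np.linalg.norm(A))
print(nuclear, spectral, frob)
```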
Example: Determinant Tensor
D_n = ∑_{σ∈S_n} sgn(σ) e_{σ(1)} ⊗ e_{σ(2)} ⊗ ... ⊗ e_{σ(n)} ∈ C^n ⊗ ... ⊗ C^n.

Clearly ‖D_n‖⋆ ≤ n! and rank(D_n) ≤ n!.

‖D_n‖σ = max{|det(v1 v2 ... vn)| : ‖v1‖ = ... = ‖vn‖ = 1} = 1

‖D_n‖⋆ = ‖D_n‖⋆‖D_n‖σ ≥ 〈D_n, D_n〉 = n!, so ‖D_n‖⋆ ≥ n!.
Theorem (D.)
‖Dn‖? = n!
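Both facts used above are easy to check numerically for small n; a sketch (the helper name det_tensor and the choice n = 4 are mine):

```python
import numpy as np
from itertools import permutations

def det_tensor(n):
    """D_n = sum over sigma in S_n of sgn(sigma) e_{sigma(1)} x ... x e_{sigma(n)}."""
    D = np.zeros((n,) * n)
    for sigma in permutations(range(n)):
        inv = sum(sigma[i] > sigma[j]
                  for i in range(n) for j in range(i + 1, n))
        D[sigma] = (-1) ** inv            # sgn(sigma) via inversion count
    return D

n = 4
D = det_tensor(n)
print((D * D).sum())                      # <D_n, D_n> = n! = 24

# pairing D_n with v1 x ... x v4 evaluates det([v1 ... v4])
V = np.random.default_rng(1).standard_normal((n, n))
val = np.einsum('ijkl,i,j,k,l->', D, V[:, 0], V[:, 1], V[:, 2], V[:, 3])
assert np.isclose(val, np.linalg.det(V))
```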
Example: Permanent Tensor
P_n = ∑_{σ∈S_n} e_{σ(1)} ⊗ e_{σ(2)} ⊗ ... ⊗ e_{σ(n)} ∈ C^n ⊗ ... ⊗ C^n.

‖P_n‖σ = max{|perm(v1 v2 ... vn)| : ‖v1‖ = ... = ‖vn‖ = 1}

Theorem (Carlen, Lieb and Moss, 2006)

max{perm(v1 v2 ... vn) : ‖v1‖ = ... = ‖vn‖ = 1} = n!/n^{n/2}

(n!/n^{n/2}) ‖P_n‖⋆ = ‖P_n‖σ‖P_n‖⋆ ≥ 〈P_n, P_n〉 = n!, so ‖P_n‖⋆ ≥ n^{n/2}.
Example: Permanent Tensor
Theorem (Glynn 2010)

P_n = (1/2^{n−1}) ∑_δ (∏_{i=1}^n δi) (∑_{i=1}^n δi ei) ⊗ ... ⊗ (∑_{i=1}^n δi ei)

where δ runs over {1} × {−1, 1}^{n−1}.

In particular, (n choose ⌊n/2⌋) ≤ rank(P_n) ≤ 2^{n−1} and ‖P_n‖⋆ ≤ n^{n/2}, so:

Theorem (D.)

‖P_n‖⋆ = n^{n/2}
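Glynn's formula can be verified directly for small n; a numpy sketch (n = 3 chosen for speed):

```python
import numpy as np
from itertools import permutations, product

n = 3
# permanent tensor P_n: coefficient 1 on each permutation index
P = np.zeros((n,) * n)
for sigma in permutations(range(n)):
    P[sigma] = 1.0

# Glynn: P_n = 2^{1-n} sum_delta (prod_i delta_i) (sum_i delta_i e_i)^{x n}
G = np.zeros((n,) * n)
for tail in product([1.0, -1.0], repeat=n - 1):
    delta = np.array((1.0,) + tail)
    v = delta                              # sum_i delta_i e_i, in coordinates
    G += np.prod(delta) * np.einsum('i,j,k->ijk', v, v, v)
G /= 2 ** (n - 1)

assert np.allclose(P, G)
```

Each of the 2^{n−1} Glynn terms is a simple tensor of norm n^{n/2} with coefficient ±2^{1−n}, which is where the bound ‖P_n‖⋆ ≤ n^{n/2} comes from.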
The Convex Decomposition (CoDe) Model
Problem
Given a tensor T ∈ V, write T = v1 + v2 + ... + vr where v1, v2, ..., vr are simple tensors and ‖v1‖ + ‖v2‖ + ... + ‖vr‖ is minimal.
Well-defined but the decomposition may not be unique.
We can also allow an error term . . .
The Convex Decomposition (CoDe) Model
Problem
Given a tensor T ∈ V and fixed x ≥ 0, write T = v1 + v2 + ... + vr + E where r is a nonnegative integer, v1, v2, ..., vr are simple tensors, ‖v1‖ + ‖v2‖ + ... + ‖vr‖ ≤ x, and ‖E‖ is minimal.
The decomposition is not unique.
If A = v1 + v2 + ... + vr, then A is the unique tensor with ‖A‖⋆ ≤ x for which ‖T − A‖ is minimal.
Problem
Given a tensor T ∈ V and fixed y ≥ 0, write T = v1 + v2 + ... + vr + E where r is a nonnegative integer, v1, v2, ..., vr are simple tensors, ‖E‖ ≤ y, and ‖v1‖ + ‖v2‖ + ... + ‖vr‖ is minimal.
t-Orthogonality
Definition
Simple unit tensors v1, v2, ..., vr are t-orthogonal if

∑_{i=1}^r |〈vi, w〉|^{2/t} ≤ 1

for every simple tensor w with ‖w‖2 = 1.

1-orthogonal ⇔ orthogonal. If t > s, then t-orthogonal ⇒ s-orthogonal.
Theorem (D.)
If v1, ..., vr ∈ V are t-orthogonal, then r ≤ dim(V)^{1/t}.
Horizontal and Vertical Tensor Product
Theorem (“horizontal tensor product”, D.)
If v1, ..., vr are t-orthogonal and w1, ..., wr are s-orthogonal, then v1 ⊗ w1, ..., vr ⊗ wr are (s + t)-orthogonal.
If V = V (1) ⊗ · · · ⊗ V (d) and W = W (1) ⊗ · · · ⊗W (d), then
V �W := (V (1) ⊗W (1))⊗ · · · ⊗ (V (d) ⊗W (d)).
(v1⊗· · ·⊗vd)�(w1⊗· · ·⊗wd) = (v1⊗w1)⊗(v2⊗w2)⊗· · ·⊗(vd⊗wd)
Theorem (“vertical tensor product”, D.)
If v1, v2, ..., vr ∈ V and w1, ..., ws ∈ W are t-orthogonal, then {vi � wj | 1 ≤ i ≤ r, 1 ≤ j ≤ s} are t-orthogonal.
The Diagonal Singular Value Decomposition
Definition
Suppose that (⋆): T = ∑_{i=1}^r λi vi, where λ1 ≥ ... ≥ λr > 0 and v1, ..., vr are 2-orthogonal simple tensors of unit length; then (⋆) is called a diagonal singular value decomposition (DSVD) of T.

If d = 2 (a tensor product of 2 spaces), then the DSVD is the usual singular value decomposition. For d > 2, the DSVD is different from the Higher Order Singular Value Decomposition defined by De Lathauwer, De Moor, and Vandewalle. Not every tensor has a DSVD.
The Diagonal Singular Value Decomposition
Theorem (D.)
If T has a DSVD, then

‖T‖⋆ = ∑_i λi,  ‖T‖2 = √(∑_i λi²),  ‖T‖σ = λ1
Theorem (D.)
If λ1 > λ2 > · · · > λr then the DSVD is unique.
Theorem (D.)
If v1, . . . , vr are t-orthogonal with t > 2, then the DSVD is unique.
Example: Matrix Multiplication Tensor
e1, ..., en ∈ C^n are orthogonal
e1 ⊗ e1, ..., en ⊗ en ∈ C^n ⊗ C^n are 2-orthogonal

e1 ⊗ e1 ⊗ 1, ..., en ⊗ en ⊗ 1 ∈ C^n ⊗ C^n ⊗ C are 2-orthogonal
e1 ⊗ 1 ⊗ e1, ..., en ⊗ 1 ⊗ en ∈ C^n ⊗ C ⊗ C^n are 2-orthogonal
1 ⊗ e1 ⊗ e1, ..., 1 ⊗ en ⊗ en ∈ C ⊗ C^n ⊗ C^n are 2-orthogonal

Using the vertical tensor product, we get that

{(ei ⊗ ej) ⊗ (ej ⊗ ek) ⊗ (ek ⊗ ei) | 1 ≤ i, j, k ≤ n}

are 2-orthogonal.
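The 2-orthogonality condition ∑_i |〈vi, w〉| ≤ 1 for this family can be spot-checked against random simple unit tensors; a sketch (the random seed and n = 3 are arbitrary):

```python
import numpy as np

n = 3
rng = np.random.default_rng(2)

# Pair the n^3 simple unit tensors (e_i x e_j) x (e_j x e_k) x (e_k x e_i)
# with a random simple unit tensor a x b x c: the inner products are
# a[i, j] * b[j, k] * c[k, i].  2-orthogonality (t = 2, exponent 2/t = 1)
# says the absolute values of these inner products sum to at most 1.
a, b, c = (M / np.linalg.norm(M) for M in rng.standard_normal((3, n, n)))
total = sum(abs(a[i, j] * b[j, k] * c[k, i])
            for i in range(n) for j in range(n) for k in range(n))
print(total)
assert total <= 1 + 1e-9
```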
Example: Matrix Multiplication Tensor
Theorem (D.)

The matrix multiplication tensor

T_n = ∑_{i,j,k=1}^n e_{i,j} ⊗ e_{j,k} ⊗ e_{k,i}

is a DSVD. The singular values of T_n are 1, 1, ..., 1 (n³ of them). In particular,

‖T_n‖⋆ = ∑_{i=1}^{n³} 1 = n³.
Example: Group Algebra Multiplication Tensor
G is a group with n elements and CG ≅ C^n is the group algebra

T_G = ∑_{g,h∈G} g ⊗ h ⊗ h⁻¹g⁻¹.

The DFT case corresponds to G = Z/nZ.

Theorem (D.)

T_G has a DSVD and its singular values are

√(n/d1), ..., √(n/d1) (d1³ times), ..., √(n/ds), ..., √(n/ds) (ds³ times)

where d1, d2, ..., ds are the dimensions of the irreducible representations of G.
P_n has no DSVD

Suppose that P_n has a DSVD with singular values λ1, ..., λr.

‖P_n‖⋆ = n^{n/2} = ∑_{i=1}^r λi
‖P_n‖σ = n!/n^{n/2} = λ1
‖P_n‖2² = n! = ∑_{i=1}^r λi²

λ1 ∑_{i=1}^r λi = n! = ∑_{i=1}^r λi²

so λ1 = ... = λr, and r = (∑_i λi)/λ1 = ‖P_n‖⋆/‖P_n‖σ = n^n/n!.
If n > 2, then r is not an integer!
Noisy Signal Model
Suppose that V is a finite dimensional R-vector space of signals. Ifc ∈ V is a noisy signal, then there is a decomposition
c = a + b
where a ∈ V is a sparse original signal, and b ∈ V is additive noise.
Using convex relaxation, sparsity is measured by some ℓ1-type norm ‖·‖X. The noise is measured by some ℓ2-type norm ‖·‖Y.

For example: c is a tensor in V = V^(1) ⊗ ... ⊗ V^(d), ‖·‖X = ‖·‖⋆ is the nuclear norm, and ‖·‖Y = ‖·‖2 = ‖·‖ is the usual ℓ2-norm. (See the CoDe Model.)
The Pareto Frontier
V a finite-dimensional R-vector space
‖·‖X, ‖·‖Y norms on V
c ∈ V
Definition
A pair (x, y) ∈ R² is Pareto-efficient if there exists a decomposition c = a + b with ‖a‖X = x and ‖b‖Y = y, and for every decomposition c = a′ + b′ we have ‖a′‖X > x, or ‖b′‖Y > y, or (‖a′‖X, ‖b′‖Y) = (x, y). We call such a c = a + b an XY-decomposition. The Pareto frontier consists of all Pareto-efficient pairs.
Lemma
The Pareto frontier is the graph of a strictly decreasing, convex homeomorphism f^c_{YX} : [0, ‖c‖X] → [0, ‖c‖Y].
The Pareto Sub-Frontier
〈·, ·〉 a positive definite bilinear form
‖v‖2 := ‖v‖ = √〈v, v〉 the ℓ2-norm
‖·‖X, ‖·‖Y dual to each other
Theorem
The following are equivalent:

1. c = a + b is an X2-decomposition
2. c = a + b is a 2Y-decomposition
3. 〈a, b〉 = ‖a‖X‖b‖Y
Definition
The Pareto sub-frontier consists of all pairs (x, y) such that there exists an X2-decomposition c = a + b with ‖a‖X = x and ‖b‖Y = y.
The Pareto Sub-Frontier
Lemma
The Pareto sub-frontier is the graph of a strictly decreasing homeomorphism h^c_{YX} : [0, ‖c‖X] → [0, ‖c‖Y].
Clearly, f cYX ≤ hcYX .
Definition
A vector c ∈ V is called tight if f^c_{YX} = h^c_{YX}. If all vectors in V are tight, then the norms ‖·‖X and ‖·‖Y are called tight.
(Sub)-Pareto Example

c = (3 3)^t ∈ V = R²

‖(z1, z2)^t‖X = √(½ z1² + 2 z2²),  ‖(z1, z2)^t‖Y = √(2 z1² + ½ z2²)

(dual norms)

[Figure: plot of f_YX and h_YX on [0, 5] × [0, 5]; the two curves differ.]

So c is not tight.
The Area Below the Pareto Sub-Frontier
Suppose c = a + b is an X2-decomposition with ‖a‖X = x0 and ‖b‖Y = y0. The area under the Pareto sub-frontier is

½‖c‖² = ½‖a‖² + 〈a, b〉 + ½‖b‖².

[Figure: the graph of h_YX from (0, ‖c‖Y) to (‖c‖X, 0); the point (x0, y0) = (‖a‖X, ‖b‖Y) splits the area under the curve into three parts: ½‖a‖2², 〈a, b〉, and ½‖b‖2².]
The Area Below the Pareto Sub-Frontier
The sub-Pareto frontier applies to:

- denoising a sparse 1D signal: ‖·‖1 vs. ‖·‖∞ (tight!)
- denoising of a matrix by soft thresholding: ‖·‖⋆ vs. ‖·‖σ (tight!)
- denoising tensors: ‖·‖⋆ vs. ‖·‖σ
- compressive sensing (Basis Pursuit Denoising, LASSO, Dantzig Selector)
- Fatemi-Osher-Rudin image denoising: total variation norm vs. its dual
- 1D total variation denoising: the Taut String Method
Example: Singular Value Decomposition
R^{n×m} = R^n ⊗ R^m, r = min{n, m}
A ∈ R^{n×m} with singular values λ1 ≥ λ2 ≥ ... ≥ λr ≥ 0

‖A‖X = ‖A‖⋆ = λ1 + λ2 + ... + λr
‖A‖Y = ‖A‖σ = λ1
‖A‖ = √(λ1² + λ2² + ... + λr²)

Lemma

The norms ‖·‖⋆ and ‖·‖σ are tight.

XY-decompositions are related to soft thresholding.
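Soft thresholding of the singular values produces exactly such splits c = a + b; a numpy sketch (the helper name, the matrix, and the threshold τ = 1 are my illustrations):

```python
import numpy as np

def singular_value_soft_threshold(C, tau):
    """Split C = A + B by shrinking each singular value of C by tau (floor 0)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    A = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
    return A, C - A

C = np.diag([3.0, 2.0, 0.5])
A, B = singular_value_soft_threshold(C, 1.0)
# A keeps the shrunk large singular values; B has spectral norm <= tau
print(np.linalg.norm(B, 2))
```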
Example: Singular Value Decomposition
(note that the Pareto and sub-Pareto frontiers are the same here)
Lemma

If A has singular values λ1 ≥ λ2 ≥ ... ≥ λr ≥ 0, then the Pareto (sub-)frontier is the piecewise linear function through the points

(λ1 + λ2 + ... + λk − kλ_{k+1}, λ_{k+1}),  k = 0, 1, 2, ..., r

(with the convention λ_{r+1} = 0).
Corollary
The singular values are uniquely determined by the (sub-)Pareto curve, and vice versa.
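The breakpoints in the lemma are easy to compute from the singular values; a sketch (the helper name pareto_points and the sample values are mine):

```python
import numpy as np

def pareto_points(svals):
    """Breakpoints (x_k, y_k), k = 0..r, of the Pareto (sub-)frontier
    for ||.||_* vs ||.||_sigma, given the singular values (lemma above)."""
    lam = np.append(np.sort(np.asarray(svals, float))[::-1], 0.0)
    return [(float(lam[:k].sum() - k * lam[k]), float(lam[k]))
            for k in range(len(lam))]

print(pareto_points([3.0, 2.0, 1.0]))
# [(0.0, 3.0), (1.0, 2.0), (3.0, 1.0), (6.0, 0.0)]
```

The first coordinate of the last point is the nuclear norm, and the second coordinate of the first point is the spectral norm, matching the lemma.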
Slope decomposition
Definition

c = c1 + c2 + ... + cs is called a slope decomposition if 〈ci, cj〉 = ‖ci‖X‖cj‖Y for all i ≤ j and

‖c1‖X/‖c1‖Y < ‖c2‖X/‖c2‖Y < ... < ‖cs‖X/‖cs‖Y.

Theorem

c ∈ V has a slope decomposition if and only if c is tight.

Theorem

The slope decomposition is unique when it exists.
Singular values for tight vectors
Suppose c ∈ V is tight, with slope decomposition c = c1 + c2 + ... + cs.

Definition

Let μi = ‖ci‖Y and λi = μi + μ_{i+1} + ... + μs for all i. Then c has singular value λi with multiplicity ‖ci‖X/‖ci‖Y − ‖c_{i−1}‖X/‖c_{i−1}‖Y.

Multiplicities and singular values are positive, but not necessarily integers.

Theorem

If c has singular values λ1, ..., λs with multiplicities m1, ..., ms respectively, then ‖c‖X = ∑_i mi λi, ‖c‖Y = λ1, and ‖c‖ = √(∑_i mi λi²).
Singular Values for Tensors
Norms: ‖·‖X = ‖·‖⋆, ‖·‖Y = ‖·‖σ

Theorem

If T = ∑_{i=1}^r λi vi is a diagonal singular value decomposition, then the singular values of T are λ1, ..., λr.

The permanent tensor P_n is tight, but has no DSVD. It has singular value n!/n^{n/2} with multiplicity n^n/n!.
Singular Value Region of a Tensor
There is an easy procedure to go from the Pareto sub-frontier to the singular values and their multiplicities for tight vectors.

Even if a vector is not tight, we can use the same procedure to define a singular spectrum. Some singular values may have infinitesimal multiplicity.
Example: Singular Value Region of a Tensor
‖·‖X = ‖·‖⋆ and ‖·‖Y = ‖·‖σ are dual norms on R^{p×q×r}

C = e2 ⊗ e1 ⊗ e1 + e1 ⊗ e2 ⊗ e1 + e1 ⊗ e1 ⊗ e2 ∈ R² ⊗ R² ⊗ R²

has a continuous singular spectrum:

[Figure: the singular value region of C, plotted on [0, 4] × [0, 1.5].]

singular value 2/√3 with multiplicity 27/13, singular value 1/4 with multiplicity 4/3, and singular values between 1/4 and 2/√3 with infinitesimal multiplicity

‖C‖σ = 2/√3 (the largest singular value), the area of the region is ‖C‖⋆ = 3, and the integral of 2y over the region is ‖C‖2² = 3
Thanks
Thank You!