singular values of tensors
TRANSCRIPT
Singular Values of Tensors
Harm Derksen
University of Michigan
CUNY/NYU, February 15, 2019
Harm Derksen Singular Values of Tensors
Tensor Rank
F a field, F = R or F = C
V^(i) ≅ F^{n_i} for i = 1, 2, ..., d
V = V^(1) ⊗ V^(2) ⊗ ... ⊗ V^(d) ≅ F^{n1×···×nd} tensor product space
Definition
A simple tensor is a tensor of the form v^(1) ⊗ v^(2) ⊗ ... ⊗ v^(d) (v^(i) ∈ V^(i)).
Definition (tensor rank)
The rank of T is the smallest positive integer r such that T is the sum of r simple tensors.
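In coordinates a simple tensor is just an outer product, so small instances are easy to build; a minimal numpy sketch (the vectors here are arbitrary illustrations, not from the talk):

```python
import numpy as np

# a simple (rank-1) tensor is an outer product v1 x v2 x v3
v1, v2, v3 = np.array([1., 2.]), np.array([0., 1.]), np.array([3., 1.])
simple = np.einsum('i,j,k->ijk', v1, v2, v3)      # shape (2, 2, 2)

# a sum of two simple tensors has rank at most 2
w1, w2, w3 = np.array([1., 0.]), np.array([1., 1.]), np.array([0., 2.])
T = simple + np.einsum('i,j,k->ijk', w1, w2, w3)
print(T.shape)
```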
Matrix Multiplication Tensor
M_d = ∑_{i=1}^d ∑_{j=1}^d ∑_{k=1}^d e_{i,j} ⊗ e_{j,k} ⊗ e_{k,i} ∈ F^{d×d} ⊗ F^{d×d} ⊗ F^{d×d}

clearly rank(M_d) ≤ d³

Theorem (Strassen)

If rank(M_d) = s, then the complexity of n × n matrix multiplication is O(n^{log_d(s)}) (the standard algorithm is O(n³)).

Theorem (Strassen)

rank(M_2) ≤ 7, so the complexity of n × n matrix multiplication is O(n^{log_2(7)}) = O(n^{2.81})
Current record: O(n2.373) (Le Gall)
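Strassen's seven products for the 2 × 2 case can be written out directly; a sketch using the standard m1–m7 formulation (not taken from the talk):

```python
import numpy as np

def strassen_2x2(A, B):
    """2x2 matrix product using 7 multiplications instead of 8."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4, m1 - m2 + m3 + m6]])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```

Applying this recursively to 2 × 2 block matrices gives the O(n^{log_2(7)}) bound.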
The Canonical Polyadic (CP) Model
aka PARAFAC, CANDECOMP
Problem
Given a tensor T ∈ V = V^(1) ⊗ ... ⊗ V^(d), write T = v1 + v2 + ... + vr where v1, v2, ..., vr are simple tensors and r is minimal.
- the CP-decomposition is sometimes unique, but not always
- not numerically stable: there is no upper bound for max{‖v1‖, ..., ‖vr‖} as a function of ‖T‖
- difficult to compute: for many tensors of interest the rank is unknown
Many numerical applications in psychometrics, signal processing, numerical linear algebra, computer vision, numerical analysis, data mining, graph analysis, neuroscience, chemometrics, etc.
The Canonical Polyadic (CP) Model
To make it more numerically stable, we could allow an error term.
Problem
Given a tensor T ∈ V and fixed r, write T = v1 + v2 + ... + vr + E where v1, v2, ..., vr are simple tensors and ‖E‖ is minimal.
This optimization problem might be ill-posed. Define
T = e2 ⊗ e1 ⊗ e1 + e1 ⊗ e2 ⊗ e1 + e1 ⊗ e1 ⊗ e2,
v1(t) = t⁻¹(e1 + te2) ⊗ (e1 + te2) ⊗ (e1 + te2), v2(t) = −t⁻¹ e1 ⊗ e1 ⊗ e1.

Then T = v1(t) + v2(t) + E(t) with ‖E(t)‖ = t√(3 + t²).

So we can write T = v1 + v2 + E with ‖E‖ = ε for every ε > 0, but we cannot write T = v1 + v2 + E with ‖E‖ = 0.
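The instability is easy to observe numerically; a sketch (the step values for t are arbitrary):

```python
import numpy as np

e1, e2 = np.eye(2)
outer = lambda x, y, z: np.einsum('i,j,k->ijk', x, y, z)

T = outer(e2, e1, e1) + outer(e1, e2, e1) + outer(e1, e1, e2)

for t in [1e-1, 1e-3, 1e-5]:
    u = e1 + t * e2
    v1 = outer(u, u, u) / t
    v2 = -outer(e1, e1, e1) / t
    err = np.linalg.norm(T - v1 - v2)     # equals t*sqrt(3 + t**2)
    print(t, err)
```

Note that while the error goes to 0, the norms of v1 and v2 blow up like 1/t, which is exactly the numerical-stability problem mentioned above.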
Motivation: Compressed Sensing and Convex Relaxation
For x ∈ R^n, its sparsity is measured by ‖x‖0 = #{i | xi ≠ 0}.
Problem
Given A ∈ R^{m×n}, b ∈ R^m, find a solution x ∈ R^n of Ax = b with ‖x‖0 minimal (a sparsest solution).
But ‖·‖0 is not convex and this optimization problem is difficult, so instead we consider:
Problem (Basis Pursuit)
Given A ∈ R^{m×n}, b ∈ R^m, find a solution x ∈ R^n of Ax = b with ‖x‖1 minimal.
Basis Pursuit can be solved by linear programming and is generally fast. Under reasonable assumptions, Basis Pursuit also gives the sparsest solutions (Candes-Tao, Donoho).
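Basis Pursuit becomes a linear program after the standard split x = u − v with u, v ≥ 0; a sketch using scipy.optimize.linprog (the random instance, sizes, and seed are arbitrary illustrations):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 12, 30
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[3, 17]] = [2.0, -1.5]          # a sparse signal
b = A @ x_true

# min ||x||_1  s.t.  Ax = b, as an LP via the split x = u - v, u, v >= 0
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
              bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]
# under suitable conditions on A (Candes-Tao, Donoho) this recovers x_true
print(np.linalg.norm(A @ x_hat - b))
```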
The Nuclear and Spectral Norms
Low-rank tensors are sparse in some sense, and by convex relaxation:
Definition
The nuclear norm ‖T‖⋆ is the smallest value of ∑_{i=1}^r ‖vi‖ where T = ∑_{i=1}^r vi and v1, ..., vr are simple tensors. (This is well-defined.)

V^(1), ..., V^(d), V are spaces with a positive definite bilinear/hermitian form 〈·, ·〉
Definition
The spectral norm is defined by
‖T‖σ = max{|〈T , v〉| | v simple tensor with ‖v‖ = 1}.
The spectral norm is dual to the nuclear norm; in particular,

|〈T, S〉| ≤ ‖T‖⋆‖S‖σ for all tensors S, T.
The Nuclear and Spectral Norms
If A ∈ R^{n×m} = R^n ⊗ R^m is an n × m matrix, then the tensor rank of A coincides with the matrix rank of A.

If λ1 ≥ λ2 ≥ ... ≥ λr are the singular values of A, then ‖A‖⋆ = λ1 + ... + λr, ‖A‖σ = λ1 (spectral/operator norm), and ‖A‖ = ‖A‖2 = ‖A‖F = √(λ1² + ... + λr²) (Euclidean/Frobenius norm).
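For matrices, all three norms fall out of one SVD; a numpy sketch (the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[3., 0., 0.],
              [4., 5., 0.]])
s = np.linalg.svd(A, compute_uv=False)    # singular values, descending

nuclear  = s.sum()                        # ||A||_* = lambda_1 + ... + lambda_r
spectral = s[0]                           # ||A||_sigma = lambda_1
frob     = np.sqrt((s ** 2).sum())        # ||A||_F
assert np.isclose(frob, np.linalg.norm(A))
print(nuclear, spectral, frob)
```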
Example: Determinant Tensor
D_n = ∑_{σ∈S_n} sgn(σ) e_{σ(1)} ⊗ e_{σ(2)} ⊗ ... ⊗ e_{σ(n)} ∈ C^n ⊗ ... ⊗ C^n.

Clearly ‖D_n‖⋆ ≤ n! and rank(D_n) ≤ n!.

‖D_n‖σ = max{|det(v1 v2 ... vn)| : ‖v1‖ = ... = ‖vn‖ = 1} = 1

‖D_n‖⋆ = ‖D_n‖⋆‖D_n‖σ ≥ 〈D_n, D_n〉 = n!, so ‖D_n‖⋆ ≥ n!.
Theorem (D.)
‖Dn‖? = n!
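Both facts used above are easy to check numerically for small n; a sketch (the helper name det_tensor and the choice n = 4 are mine):

```python
import numpy as np
from itertools import permutations

def det_tensor(n):
    """D_n = sum over sigma in S_n of sgn(sigma) e_{sigma(1)} x ... x e_{sigma(n)}."""
    D = np.zeros((n,) * n)
    for sigma in permutations(range(n)):
        inv = sum(sigma[i] > sigma[j]
                  for i in range(n) for j in range(i + 1, n))
        D[sigma] = (-1) ** inv            # sgn(sigma) via inversion count
    return D

n = 4
D = det_tensor(n)
print((D * D).sum())                      # <D_n, D_n> = n! = 24

# pairing D_n with v1 x ... x v4 evaluates det([v1 ... v4])
V = np.random.default_rng(1).standard_normal((n, n))
val = np.einsum('ijkl,i,j,k,l->', D, V[:, 0], V[:, 1], V[:, 2], V[:, 3])
assert np.isclose(val, np.linalg.det(V))
```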
Example: Permanent Tensor
P_n = ∑_{σ∈S_n} e_{σ(1)} ⊗ e_{σ(2)} ⊗ ... ⊗ e_{σ(n)} ∈ C^n ⊗ ... ⊗ C^n.

‖P_n‖σ = max{|perm(v1 v2 ... vn)| : ‖v1‖ = ... = ‖vn‖ = 1}

Theorem (Carlen, Lieb and Moss, 2006)

max{perm(v1 v2 ... vn) : ‖v1‖ = ... = ‖vn‖ = 1} = n!/n^{n/2}

(n!/n^{n/2}) ‖P_n‖⋆ = ‖P_n‖σ‖P_n‖⋆ ≥ 〈P_n, P_n〉 = n!, so ‖P_n‖⋆ ≥ n^{n/2}.
Example: Permanent Tensor
Theorem (Glynn 2010)

P_n = (1/2^{n−1}) ∑_δ (∏_{i=1}^n δi) (∑_{i=1}^n δi ei) ⊗ ... ⊗ (∑_{i=1}^n δi ei)

where δ runs over {1} × {−1, 1}^{n−1}.

In particular, (n choose ⌊n/2⌋) ≤ rank(P_n) ≤ 2^{n−1} and ‖P_n‖⋆ ≤ n^{n/2}, so:

Theorem (D.)

‖P_n‖⋆ = n^{n/2}
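Glynn's formula can be verified directly for small n; a numpy sketch (n = 3 chosen for speed):

```python
import numpy as np
from itertools import permutations, product

n = 3
# permanent tensor P_n: coefficient 1 on each permutation index
P = np.zeros((n,) * n)
for sigma in permutations(range(n)):
    P[sigma] = 1.0

# Glynn: P_n = 2^{1-n} sum_delta (prod_i delta_i) (sum_i delta_i e_i)^{x n}
G = np.zeros((n,) * n)
for tail in product([1.0, -1.0], repeat=n - 1):
    delta = np.array((1.0,) + tail)
    v = delta                              # sum_i delta_i e_i, in coordinates
    G += np.prod(delta) * np.einsum('i,j,k->ijk', v, v, v)
G /= 2 ** (n - 1)

assert np.allclose(P, G)
```

Each of the 2^{n−1} Glynn terms is a simple tensor of norm n^{n/2} with coefficient ±2^{1−n}, which is where the bound ‖P_n‖⋆ ≤ n^{n/2} comes from.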
The Convex Decomposition (CoDe) Model
Problem
Given a tensor T ∈ V, write T = v1 + v2 + ... + vr where v1, v2, ..., vr are simple tensors and ‖v1‖ + ‖v2‖ + ... + ‖vr‖ is minimal.
Well-defined but the decomposition may not be unique.
We can also allow an error term . . .
The Convex Decomposition (CoDe) Model
Problem
Given a tensor T ∈ V and fixed x ≥ 0, write T = v1 + v2 + ... + vr + E where r is a nonnegative integer, v1, v2, ..., vr are simple tensors, ‖v1‖ + ‖v2‖ + ... + ‖vr‖ ≤ x, and ‖E‖ is minimal.
The decomposition is not unique.
If A = v1 + v2 + ... + vr, then A is the unique tensor with ‖A‖⋆ ≤ x for which ‖T − A‖ is minimal.
Problem
Given a tensor T ∈ V and fixed y ≥ 0, write T = v1 + v2 + ... + vr + E where r is a nonnegative integer, v1, v2, ..., vr are simple tensors, ‖E‖ ≤ y, and ‖v1‖ + ‖v2‖ + ... + ‖vr‖ is minimal.
t-Orthogonality
Definition
Simple unit tensors v1, v2, ..., vr are t-orthogonal if

∑_{i=1}^r |〈vi, w〉|^{2/t} ≤ 1

for every simple tensor w with ‖w‖2 = 1.

1-orthogonal ⇔ orthogonal. If t > s, then t-orthogonal ⇒ s-orthogonal.
Theorem (D.)
If v1, ..., vr ∈ V are t-orthogonal, then r ≤ dim(V)^{1/t}.
Horizontal and Vertical Tensor Product
Theorem (“horizontal tensor product”, D.)
If v1, ..., vr are t-orthogonal and w1, ..., wr are s-orthogonal, then v1 ⊗ w1, ..., vr ⊗ wr are (s + t)-orthogonal.
If V = V (1) ⊗ · · · ⊗ V (d) and W = W (1) ⊗ · · · ⊗W (d), then
V �W := (V (1) ⊗W (1))⊗ · · · ⊗ (V (d) ⊗W (d)).
(v1⊗· · ·⊗vd)�(w1⊗· · ·⊗wd) = (v1⊗w1)⊗(v2⊗w2)⊗· · ·⊗(vd⊗wd)
Theorem (“vertical tensor product”, D.)
If v1, v2, ..., vr ∈ V and w1, ..., ws ∈ W are t-orthogonal, then {vi � wj | 1 ≤ i ≤ r, 1 ≤ j ≤ s} are t-orthogonal.
The Diagonal Singular Value Decomposition
Definition
Suppose that (⋆): T = ∑_{i=1}^r λi vi, where λ1 ≥ ... ≥ λr > 0 and v1, ..., vr are 2-orthogonal simple tensors of unit length; then (⋆) is called a diagonal singular value decomposition (DSVD) of T.

If d = 2 (a tensor product of 2 spaces), then the DSVD is the usual singular value decomposition. For d > 2, the DSVD is different from the Higher Order Singular Value Decomposition defined by De Lathauwer, De Moor, and Vandewalle. Not every tensor has a DSVD.
The Diagonal Singular Value Decomposition
Theorem (D.)
If T has a DSVD, then

‖T‖⋆ = ∑_i λi,  ‖T‖2 = √(∑_i λi²),  ‖T‖σ = λ1
Theorem (D.)
If λ1 > λ2 > · · · > λr then the DSVD is unique.
Theorem (D.)
If v1, . . . , vr are t-orthogonal with t > 2, then the DSVD is unique.
Example: Matrix Multiplication Tensor
e1, ..., en ∈ C^n are orthogonal
e1 ⊗ e1, ..., en ⊗ en ∈ C^n ⊗ C^n are 2-orthogonal

e1 ⊗ e1 ⊗ 1, ..., en ⊗ en ⊗ 1 ∈ C^n ⊗ C^n ⊗ C are 2-orthogonal
e1 ⊗ 1 ⊗ e1, ..., en ⊗ 1 ⊗ en ∈ C^n ⊗ C ⊗ C^n are 2-orthogonal
1 ⊗ e1 ⊗ e1, ..., 1 ⊗ en ⊗ en ∈ C ⊗ C^n ⊗ C^n are 2-orthogonal

Using the vertical tensor product, we get that

{(ei ⊗ ej) ⊗ (ej ⊗ ek) ⊗ (ek ⊗ ei) | 1 ≤ i, j, k ≤ n}

are 2-orthogonal.
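The 2-orthogonality condition ∑_i |〈vi, w〉| ≤ 1 for this family can be spot-checked against random simple unit tensors; a sketch (the random seed and n = 3 are arbitrary):

```python
import numpy as np

n = 3
rng = np.random.default_rng(2)

# Pair the n^3 simple unit tensors (e_i x e_j) x (e_j x e_k) x (e_k x e_i)
# with a random simple unit tensor a x b x c: the inner products are
# a[i, j] * b[j, k] * c[k, i].  2-orthogonality (t = 2, exponent 2/t = 1)
# says the absolute values of these inner products sum to at most 1.
a, b, c = (M / np.linalg.norm(M) for M in rng.standard_normal((3, n, n)))
total = sum(abs(a[i, j] * b[j, k] * c[k, i])
            for i in range(n) for j in range(n) for k in range(n))
print(total)
assert total <= 1 + 1e-9
```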
Example: Matrix Multiplication Tensor
Theorem (D.)

The matrix multiplication tensor

T_n = ∑_{i,j,k=1}^n e_{i,j} ⊗ e_{j,k} ⊗ e_{k,i}

is a DSVD. The singular values of T_n are 1, 1, ..., 1 (n³ of them). In particular,

‖T_n‖⋆ = ∑_{i=1}^{n³} 1 = n³.
Example: Group Algebra Multiplication Tensor
G is a group with n elements and CG ≅ C^n is the group algebra

T_G = ∑_{g,h∈G} g ⊗ h ⊗ h⁻¹g⁻¹.

The DFT case corresponds to G = Z/nZ.

Theorem (D.)

T_G has a DSVD and its singular values are

√(n/d1), ..., √(n/d1) (d1³ times), ..., √(n/ds), ..., √(n/ds) (ds³ times)

where d1, d2, ..., ds are the dimensions of the irreducible representations of G.
P_n has no DSVD

Suppose that P_n has a DSVD with singular values λ1, ..., λr.

‖P_n‖⋆ = n^{n/2} = ∑_{i=1}^r λi
‖P_n‖σ = n!/n^{n/2} = λ1
‖P_n‖2² = n! = ∑_{i=1}^r λi²

λ1 ∑_{i=1}^r λi = n! = ∑_{i=1}^r λi²

so λ1 = ... = λr, and r = (∑_i λi)/λ1 = ‖P_n‖⋆/‖P_n‖σ = n^n/n!.
If n > 2, then r is not an integer!
Noisy Signal Model
Suppose that V is a finite dimensional R-vector space of signals. Ifc ∈ V is a noisy signal, then there is a decomposition
c = a + b
where a ∈ V is a sparse original signal, and b ∈ V is additive noise.
Using convex relaxation, sparsity is measured by some ℓ1-type norm ‖·‖X. The noise is measured by some ℓ2-type norm ‖·‖Y.

For example: c is a tensor in V = V^(1) ⊗ ... ⊗ V^(d), ‖·‖X = ‖·‖⋆ is the nuclear norm, and ‖·‖Y = ‖·‖2 = ‖·‖ is the usual ℓ2-norm. (See the CoDe Model.)
The Pareto Frontier
V a finite-dimensional R-vector space
‖·‖X, ‖·‖Y norms on V
c ∈ V
Definition
A pair (x, y) ∈ R² is Pareto-efficient if there exists a decomposition c = a + b with ‖a‖X = x and ‖b‖Y = y, and for every decomposition c = a′ + b′ we have ‖a′‖X > x, or ‖b′‖Y > y, or (‖a′‖X, ‖b′‖Y) = (x, y). We call such a c = a + b an XY-decomposition. The Pareto frontier consists of all Pareto-efficient pairs.
Lemma
The Pareto frontier is the graph of a strictly decreasing, convex homeomorphism f^c_{YX} : [0, ‖c‖X] → [0, ‖c‖Y].
The Pareto Sub-Frontier
〈·, ·〉 a positive definite bilinear form
‖v‖2 := ‖v‖ = √〈v, v〉 the ℓ2-norm
‖·‖X, ‖·‖Y dual to each other
Theorem
The following are equivalent:

1. c = a + b is an X2-decomposition
2. c = a + b is a 2Y-decomposition
3. 〈a, b〉 = ‖a‖X‖b‖Y
Definition
The Pareto sub-frontier consists of all pairs (x, y) such that there exists an X2-decomposition c = a + b with ‖a‖X = x and ‖b‖Y = y.
The Pareto Sub-Frontier
Lemma
The Pareto sub-frontier is the graph of a strictly decreasing homeomorphism h^c_{YX} : [0, ‖c‖X] → [0, ‖c‖Y].
Clearly, f cYX ≤ hcYX .
Definition
A vector c ∈ V is called tight if f^c_{YX} = h^c_{YX}. If all vectors in V are tight, then the norms ‖·‖X and ‖·‖Y are called tight.
(Sub)-Pareto Example

c = (3 3)^t ∈ V = R²

‖(z1, z2)^t‖X = √(½ z1² + 2 z2²),  ‖(z1, z2)^t‖Y = √(2 z1² + ½ z2²)

(dual norms)

[Figure: plot of f_YX and h_YX on [0, 5] × [0, 5]; the two curves differ.]

So c is not tight.
The Area Below the Pareto Sub-Frontier
Suppose c = a + b is an X2-decomposition with ‖a‖X = x0 and ‖b‖Y = y0. The area under the Pareto sub-frontier is

½‖c‖² = ½‖a‖² + 〈a, b〉 + ½‖b‖².

[Figure: the graph of h_YX from (0, ‖c‖Y) to (‖c‖X, 0); the point (x0, y0) = (‖a‖X, ‖b‖Y) splits the area under the curve into three parts: ½‖a‖2², 〈a, b〉, and ½‖b‖2².]
The Area Below the Pareto Sub-Frontier
The sub-Pareto frontier applies to:

- denoising a sparse 1D signal: ‖·‖1 vs. ‖·‖∞ (tight!)
- denoising of a matrix by soft thresholding: ‖·‖⋆ vs. ‖·‖σ (tight!)
- denoising tensors: ‖·‖⋆ vs. ‖·‖σ
- compressive sensing (Basis Pursuit Denoising, LASSO, Dantzig Selector)
- Fatemi-Osher-Rudin image denoising: total variation norm vs. its dual
- 1D total variation denoising: the Taut String Method
Example: Singular Value Decomposition
R^{n×m} = R^n ⊗ R^m, r = min{n, m}
A ∈ R^{n×m} with singular values λ1 ≥ λ2 ≥ ... ≥ λr ≥ 0

‖A‖X = ‖A‖⋆ = λ1 + λ2 + ... + λr
‖A‖Y = ‖A‖σ = λ1
‖A‖ = √(λ1² + λ2² + ... + λr²)

Lemma

The norms ‖·‖⋆ and ‖·‖σ are tight.

XY-decompositions are related to soft thresholding.
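Soft thresholding of the singular values produces exactly such splits c = a + b; a numpy sketch (the helper name, the matrix, and the threshold τ = 1 are my illustrations):

```python
import numpy as np

def singular_value_soft_threshold(C, tau):
    """Split C = A + B by shrinking each singular value of C by tau (floor 0)."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    A = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
    return A, C - A

C = np.diag([3.0, 2.0, 0.5])
A, B = singular_value_soft_threshold(C, 1.0)
# A keeps the shrunk large singular values; B has spectral norm <= tau
print(np.linalg.norm(B, 2))
```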
Example: Singular Value Decomposition
(note that the Pareto and sub-Pareto frontiers are the same here)
Lemma

If A has singular values λ1 ≥ λ2 ≥ ... ≥ λr ≥ 0, then the Pareto (sub-)frontier is the piecewise linear function through the points

(λ1 + λ2 + ... + λk − kλ_{k+1}, λ_{k+1}),  k = 0, 1, 2, ..., r

(with the convention λ_{r+1} = 0).
Corollary
The singular values are uniquely determined by the (sub-)Pareto curve, and vice versa.
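The breakpoints in the lemma are easy to compute from the singular values; a sketch (the helper name pareto_points and the sample values are mine):

```python
import numpy as np

def pareto_points(svals):
    """Breakpoints (x_k, y_k), k = 0..r, of the Pareto (sub-)frontier
    for ||.||_* vs ||.||_sigma, given the singular values (lemma above)."""
    lam = np.append(np.sort(np.asarray(svals, float))[::-1], 0.0)
    return [(float(lam[:k].sum() - k * lam[k]), float(lam[k]))
            for k in range(len(lam))]

print(pareto_points([3.0, 2.0, 1.0]))
# [(0.0, 3.0), (1.0, 2.0), (3.0, 1.0), (6.0, 0.0)]
```

The first coordinate of the last point is the nuclear norm, and the second coordinate of the first point is the spectral norm, matching the lemma.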
Slope decomposition
Definition

c = c1 + c2 + ... + cs is called a slope decomposition if 〈ci, cj〉 = ‖ci‖X‖cj‖Y for all i ≤ j and

‖c1‖X/‖c1‖Y < ‖c2‖X/‖c2‖Y < ... < ‖cs‖X/‖cs‖Y.

Theorem

c ∈ V has a slope decomposition if and only if c is tight.

Theorem

The slope decomposition is unique when it exists.
Singular values for tight vectors
Suppose c ∈ V is tight, with slope decomposition c = c1 + c2 + ... + cs.

Definition

Let μi = ‖ci‖Y and λi = μi + μ_{i+1} + ... + μs for all i. Then c has singular value λi with multiplicity ‖ci‖X/‖ci‖Y − ‖c_{i−1}‖X/‖c_{i−1}‖Y.

Multiplicities and singular values are positive, but not necessarily integers.

Theorem

If c has singular values λ1, ..., λs with multiplicities m1, ..., ms respectively, then ‖c‖X = ∑_i mi λi, ‖c‖Y = λ1, and ‖c‖ = √(∑_i mi λi²).
Singular Values for Tensors
Norms: ‖·‖X = ‖·‖⋆, ‖·‖Y = ‖·‖σ

Theorem

If T = ∑_{i=1}^r λi vi is a diagonal singular value decomposition, then the singular values of T are λ1, ..., λr.

The permanent tensor P_n is tight, but has no DSVD. It has singular value n!/n^{n/2} with multiplicity n^n/n!.
Singular Value Region of a Tensor
There is an easy procedure to go from the Pareto sub-frontier to the singular values and their multiplicities for tight vectors.

Even if a vector is not tight, we can use the same procedure to define a singular spectrum. Some singular values may have infinitesimal multiplicity.
Example: Singular Value Region of a Tensor
‖·‖X = ‖·‖⋆ and ‖·‖Y = ‖·‖σ are dual norms on R^{p×q×r}

C = e2 ⊗ e1 ⊗ e1 + e1 ⊗ e2 ⊗ e1 + e1 ⊗ e1 ⊗ e2 ∈ R² ⊗ R² ⊗ R²

has a continuous singular spectrum:

[Figure: the singular value region of C, plotted on [0, 4] × [0, 1.5].]

singular value 2/√3 with multiplicity 27/13, singular value 1/4 with multiplicity 4/3, and singular values between 1/4 and 2/√3 with infinitesimal multiplicity

‖C‖σ = 2/√3 (the largest singular value), the area of the region is ‖C‖⋆ = 3, and the integral of 2y over the region is ‖C‖2² = 3
Thanks
Thank You!