Source: people.csail.mit.edu/moitra/docs/Tensors.pdf
TRANSCRIPT
TENSOR DECOMPOSITIONS AND THEIR APPLICATIONS
ANKUR MOITRA MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SPEARMAN'S HYPOTHESIS

Charles Spearman (1904): There are two types of intelligence, eductive and reproductive.
eductive (adj.): the ability to make sense out of complexity
reproductive (adj.): the ability to store and reproduce information
To test this theory, he invented Factor Analysis:

[Figure: M ≈ A B^T, where M is tests (10) × students (1000) and the inner dimension is 2]
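As a concrete illustration of the factor-analysis picture, the best rank-2 fit M ≈ A B^T can be computed with a truncated SVD. A minimal numpy sketch; the score matrix here is synthetic, with the sizes from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic score matrix: tests (10) x students (1000), built with inner dimension 2
A_true = rng.standard_normal((10, 2))
B_true = rng.standard_normal((1000, 2))
M = A_true @ B_true.T

# A truncated SVD gives a best rank-2 approximation M ~ A B^T
U, s, Vt = np.linalg.svd(M, full_matrices=False)
A = U[:, :2] * s[:2]   # tests x inner-dimension
B = Vt[:2].T           # students x inner-dimension

print(np.allclose(M, A @ B.T))  # the fit is exact here, since M has rank 2
```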
Given: M = Σ_i a_i ⊗ b_i

When can we recover the factors a_i and b_i uniquely?

Claim: The factors {a_i} and {b_i} are not determined uniquely unless we impose additional conditions on them, e.g. that {a_i} and {b_i} are orthogonal, or rank(M) = 1:

M = A B^T = (A R)(R^-1 B^T)
    "correct" factors    alternative factorization

This is called the rotation problem; it is a major issue in factor analysis and motivates the study of tensor methods…
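The rotation problem is easy to see numerically: any invertible R yields different factors for the same M. A small sketch with synthetic factors and R a 2×2 rotation:

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((10, 2))
B = rng.standard_normal((1000, 2))
M = A @ B.T

# Any invertible R gives an alternative factorization of the same M
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A2 = A @ R
B2 = B @ np.linalg.inv(R).T   # so that A2 @ B2.T = A R R^{-1} B^T = M

print(np.allclose(M, A2 @ B2.T))               # same matrix...
print(np.allclose(A, A2), np.allclose(B, B2))  # ...different factors
```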
OUTLINE

The focus of this tutorial is on Algorithms/Applications/Models for tensor decompositions.

Part I: Algorithms
• The Rotation Problem
• Jennrich's Algorithm

Part II: Applications
• Phylogenetic Reconstruction
• Pure Topic Models

Part III: Smoothed Analysis
• Overcomplete Problems
• Kruskal Rank and the Khatri-Rao Product
MATRIX DECOMPOSITIONS

M = a_1 ⊗ b_1 + a_2 ⊗ b_2 + ··· + a_R ⊗ b_R

TENSOR DECOMPOSITIONS

T = a_1 ⊗ b_1 ⊗ c_1 + ··· + a_R ⊗ b_R ⊗ c_R

The (i, j, k) entry of x ⊗ y ⊗ z is x(i) · y(j) · z(k)
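A rank-R tensor built from its factors, following the entrywise definition above (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, R = 5, 3
a = rng.standard_normal((R, n))
b = rng.standard_normal((R, n))
c = rng.standard_normal((R, n))

# T = sum_r a_r (x) b_r (x) c_r, summing over the component index r
T = np.einsum('ri,rj,rk->ijk', a, b, c)

# Spot-check one entry against the definition: x(i) * y(j) * z(k), summed over r
i, j, k = 1, 2, 3
entry = sum(a[r, i] * b[r, j] * c[r, k] for r in range(R))
print(np.isclose(T[i, j, k], entry))
```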
When are tensor decompositions unique?

Theorem [Jennrich 1970]: Suppose {a_i} and {b_i} are linearly independent and no pair of vectors in {c_i} is a scalar multiple of each other. Then

T = a_1 ⊗ b_1 ⊗ c_1 + ··· + a_R ⊗ b_R ⊗ c_R

is unique up to permuting the rank-one terms and rescaling the factors.

Equivalently, the rank-one factors are unique.

There is a simple algorithm to compute the factors too!
JENNRICH'S ALGORITHM

Compute T( · , · , x ) = Σ_i x_i T_i, i.e. add up the matrix slices T_i of T weighted by the coordinates of x (x is chosen uniformly at random from S^(n-1)).

If T = a ⊗ b ⊗ c, then T( · , · , x ) = ⟨c, x⟩ a ⊗ b. In general,

T( · , · , x ) = Σ_i ⟨c_i, x⟩ a_i ⊗ b_i = A D_x B^T,  where D_x = Diag(⟨c_i, x⟩)
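The slice-contraction identity can be checked numerically. In this sketch the factors are synthetic and T( · , · , x ) is formed by summing slices exactly as above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, R = 6, 3
A = rng.standard_normal((n, R))
B = rng.standard_normal((n, R))
C = rng.standard_normal((n, R))
T = np.einsum('ir,jr,kr->ijk', A, B, C)

# Random unit vector x on the sphere S^{n-1}
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

# T(., ., x): add up the slices T[:, :, k] weighted by x(k)
Tx = np.einsum('ijk,k->ij', T, x)

# This equals A D_x B^T with D_x = Diag(<c_i, x>)
Dx = np.diag(C.T @ x)
print(np.allclose(Tx, A @ Dx @ B.T))
```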
JENNRICH'S ALGORITHM

Compute T( · , · , x ) = A D_x B^T
Compute T( · , · , y ) = A D_y B^T
Diagonalize T( · , · , x ) T( · , · , y )^-1 = A D_x B^T (B^T)^-1 D_y^-1 A^-1 = A D_x D_y^-1 A^-1

Claim: whp (over x, y) the eigenvalues are distinct, so the eigendecomposition is unique and recovers the a_i's
JENNRICH'S ALGORITHM

Compute T( · , · , x ) = A D_x B^T
Compute T( · , · , y ) = A D_y B^T
Diagonalize T( · , · , x ) T( · , · , y )^-1
Diagonalize T( · , · , y ) T( · , · , x )^-1
Match up the factors (their eigenvalues are reciprocals) and find {c_i} by solving a linear system.
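Putting the steps together, here is a minimal numpy sketch of Jennrich's algorithm. It assumes a square, full-rank instance (n = R), reads the b_i's off the eigenvectors of T(·,·,y)^T (T(·,·,x)^T)^{-1} = B D_y D_x^{-1} B^{-1} (whose eigenvalues are the reciprocals of those of T(·,·,x) T(·,·,y)^{-1}), and only checks recovery up to permutation and scaling; finding the {c_i} by solving the linear system is omitted:

```python
import numpy as np

rng = np.random.default_rng(4)
n = R = 4   # square, full-rank instance for simplicity
A = rng.standard_normal((n, R))
B = rng.standard_normal((n, R))
C = rng.standard_normal((n, R))
T = np.einsum('ir,jr,kr->ijk', A, B, C)

x = rng.standard_normal(n)
y = rng.standard_normal(n)
Tx = np.einsum('ijk,k->ij', T, x)   # = A D_x B^T
Ty = np.einsum('ijk,k->ij', T, y)   # = A D_y B^T

# Eigenvectors of T_x T_y^{-1} = A D_x D_y^{-1} A^{-1} are the a_i's (up to scale)
lam, A_hat = np.linalg.eig(Tx @ np.linalg.inv(Ty))
# Eigenvectors of T_y^T (T_x^T)^{-1} = B D_y D_x^{-1} B^{-1} are the b_i's
mu, B_hat = np.linalg.eig(Ty.T @ np.linalg.inv(Tx.T))

# Match up the factors: the two sets of eigenvalues are reciprocals
order = [int(np.argmin(np.abs(mu - 1.0 / l))) for l in lam]
mu, B_hat = mu[order], B_hat[:, order]

def max_cosine(v, M):
    # largest |cos angle| between v and any true factor column
    v = np.real(v)
    return np.max(np.abs(M.T @ v) / (np.linalg.norm(M, axis=0) * np.linalg.norm(v)))

ok_a = all(max_cosine(A_hat[:, r], A) > 0.999 for r in range(R))
ok_b = all(max_cosine(B_hat[:, r], B) > 0.999 for r in range(R))
print(ok_a and ok_b)
```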
Given: M = Σ_i a_i ⊗ b_i. When can we recover the factors a_i and b_i uniquely?

This is only possible if {a_i} and {b_i} are orthonormal, or rank(M) = 1.

Given: T = Σ_i a_i ⊗ b_i ⊗ c_i. When can we recover the factors a_i, b_i and c_i uniquely?

Jennrich: If {a_i} and {b_i} are full rank and no pair in {c_i} are scalar multiples of each other.
PHYLOGENETIC RECONSTRUCTION

[Figure: the "Tree of Life", with extinct internal nodes x, y, z and extant leaf nodes a, b, c, d]
Σ = alphabet
root: π : Σ → R_+, the "initial distribution"
R_{z,b} : "conditional distribution" along each edge

In each sample, we observe a symbol (from Σ) at each extant node, where we sample from π for the root and propagate it down the tree using R_{x,y}, etc.
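As a concrete (hypothetical) instance of this generative process, the sketch below draws one sample: a symbol is drawn from π at the root and propagated along each edge with a random row-stochastic conditional matrix. The tree shape and all matrices are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 4                           # |Sigma|, e.g. {A, C, G, T}
pi = np.full(k, 1.0 / k)        # initial distribution at the root

def random_markov_matrix(k):
    # Row-stochastic conditional distribution along an edge (illustrative)
    M = rng.random((k, k)) + np.eye(k)   # biased toward keeping the symbol
    return M / M.sum(axis=1, keepdims=True)

# Hypothetical tree: x -> {y, z}, y -> {a, b}, z -> {c, d}
children = {'x': ['y', 'z'], 'y': ['a', 'b'], 'z': ['c', 'd']}
cond = {(u, v): random_markov_matrix(k) for u in children for v in children[u]}

def sample(node, sym, out):
    for child in children.get(node, []):
        child_sym = rng.choice(k, p=cond[(node, child)][sym])
        sample(child, child_sym, out)
    if node not in children:    # extant leaf: this symbol is observed
        out[node] = int(sym)

obs = {}
sample('x', rng.choice(k, p=pi), obs)
print(sorted(obs))   # symbols are observed only at the extant nodes a, b, c, d
```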
HIDDEN MARKOV MODELS

[Figure: a chain of hidden states x → y → z → ··· , each emitting an observed symbol a, b, c, ···]
π : Σ_s → R_+, the "initial distribution"
R_{x,y} : "transition matrices"
O_{x,a} : "observation matrices"

In each sample, we observe a symbol (from Σ_o) at each observed node, where we sample a start state (from Σ_s) according to π, and propagate it using R_{x,y}, etc.
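A minimal forward-sampling sketch of this HMM generative process (alphabet sizes and chain length are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
ks, ko, L = 3, 5, 10   # |Sigma_s|, |Sigma_o|, chain length (illustrative)

def stochastic(shape):
    M = rng.random(shape)
    return M / M.sum(axis=-1, keepdims=True)

pi = stochastic((ks,))      # initial distribution over hidden states
Rt = stochastic((ks, ks))   # transition matrix: row s -> next hidden state
O = stochastic((ks, ko))    # observation matrix: row s -> observed symbol

hidden, observed = [], []
s = rng.choice(ks, p=pi)
for _ in range(L):
    hidden.append(int(s))
    observed.append(int(rng.choice(ko, p=O[s])))
    s = rng.choice(ks, p=Rt[s])

print(len(observed), all(0 <= o < ko for o in observed))
```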
Question: Can we reconstruct just the topology from random samples?

Usually, we assume R_{x,y}, etc. are full rank so that we can re-root the tree arbitrarily.

[Steel, 1994]: The following is a distance function on the edges:

d_{x,y} = −ln |det(P_{x,y})| + ½ Σ_{σ in Σ} ln π_{x,σ} + ½ Σ_{σ in Σ} ln π_{y,σ}

where P_{x,y} is the joint distribution, and the distance between leaves is the sum of the distances on the path in the tree.

(It's not even obvious it's nonnegative!)
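Steel's distance can be sanity-checked numerically: for a chain x → y → z with P_{x,y} = diag(π_x) R_1, it is additive along the path. A small sketch with synthetic matrices (note the ½ Σ ln π terms are negative, which is why nonnegativity is not obvious):

```python
import numpy as np

rng = np.random.default_rng(7)
k = 4

def steel_distance(P):
    # d = -ln|det P| + 1/2 sum_s ln pi_x(s) + 1/2 sum_s ln pi_y(s),
    # with the marginals pi_x, pi_y read off the joint distribution P
    px, py = P.sum(axis=1), P.sum(axis=0)
    return (-np.log(abs(np.linalg.det(P)))
            + 0.5 * np.log(px).sum() + 0.5 * np.log(py).sum())

def stochastic(shape):
    M = rng.random(shape)
    return M / M.sum(axis=-1, keepdims=True)

pi0 = stochastic((k,))                           # distribution at x
R1, R2 = stochastic((k, k)), stochastic((k, k))  # edges x -> y -> z

P_xy = np.diag(pi0) @ R1
P_yz = np.diag(pi0 @ R1) @ R2
P_xz = np.diag(pi0) @ (R1 @ R2)

# The distance is additive along the path x -> y -> z
print(np.isclose(steel_distance(P_xy) + steel_distance(P_yz),
                 steel_distance(P_xz)))
```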
[Erdos, Steel, Szekely, Warnow, 1997]: Used Steel's distance function and quartet tests to reconstruct the topology, from polynomially many samples.

[Figure: the candidate quartet topologies, e.g. (a, b | c, d) OR (a, c | d, b) OR …]

For many problems (e.g. HMMs) finding the transition matrices is the main issue…
[Chang, 1996]: The model is identifiable (if the R's are full rank).

[Figure: re-rooting at the hidden node z splits the tree into three subtrees, containing the leaves a, b and c]

Joint distribution over (a, b, c):

T = Σ_σ Pr[z = σ] · Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

where each vector Pr[b | z = σ] is a column of R_{z,b}
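In other words, the joint distribution over (a, b, c) is itself a tensor of rank at most |Σ|, with the conditional distributions as factors. A minimal sketch with synthetic parameters:

```python
import numpy as np

rng = np.random.default_rng(8)
k = 4   # |Sigma| (illustrative)

def stochastic(shape):
    M = rng.random(shape)
    return M / M.sum(axis=-1, keepdims=True)

pz = stochastic((k,))      # Pr[z = sigma]
Rza = stochastic((k, k))   # row sigma: Pr[a | z = sigma]
Rzb = stochastic((k, k))   # row sigma: Pr[b | z = sigma]
Rzc = stochastic((k, k))   # row sigma: Pr[c | z = sigma]

# T[i,j,l] = sum_sigma Pr[z=sigma] Pr[a=i|z=sigma] Pr[b=j|z=sigma] Pr[c=l|z=sigma]
T = np.einsum('s,si,sj,sl->ijl', pz, Rza, Rzb, Rzc)

print(np.isclose(T.sum(), 1.0))   # a valid joint distribution over (a, b, c)
```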
![Page 63: TENSORDECOMPOSITIONSANDTHEIR …people.csail.mit.edu/moitra/docs/Tensors.pdfSPEARMAN’SHYPOTHESIS CharlesSpearman(1904): There’are’two’types’of’intelligence,’ educve ’and’reproducve*](https://reader030.vdocuments.net/reader030/viewer/2022041103/5f02754c7e708231d4045cc1/html5/thumbnails/63.jpg)
[Mossel, Roch, 2006]: There is an algorithm to PAC learn a
phylogeneIc tree or an HMM (if its transiIon/output matrices are full rank) from polynomially many samples
![Page 64: TENSORDECOMPOSITIONSANDTHEIR …people.csail.mit.edu/moitra/docs/Tensors.pdfSPEARMAN’SHYPOTHESIS CharlesSpearman(1904): There’are’two’types’of’intelligence,’ educve ’and’reproducve*](https://reader030.vdocuments.net/reader030/viewer/2022041103/5f02754c7e708231d4045cc1/html5/thumbnails/64.jpg)
[Mossel, Roch, 2006]: There is an algorithm to PAC learn a
phylogeneIc tree or an HMM (if its transiIon/output matrices are full rank) from polynomially many samples
Ques=on: Is the full-‐rank assumpIon necessary?
![Page 65: TENSORDECOMPOSITIONSANDTHEIR …people.csail.mit.edu/moitra/docs/Tensors.pdfSPEARMAN’SHYPOTHESIS CharlesSpearman(1904): There’are’two’types’of’intelligence,’ educve ’and’reproducve*](https://reader030.vdocuments.net/reader030/viewer/2022041103/5f02754c7e708231d4045cc1/html5/thumbnails/65.jpg)
[Mossel, Roch, 2006]: There is an algorithm to PAC learn a
phylogeneIc tree or an HMM (if its transiIon/output matrices are full rank) from polynomially many samples
Ques=on: Is the full-‐rank assumpIon necessary?
[Mossel, Roch, 2006]: It is as hard as noisy-parity to learn the parameters of a general HMM
Noisy-parity is an infamous problem in learning, where O(n) samples suffice but the best known algorithm runs in time 2^(n/log n), due to [Blum, Kalai, Wasserman, 2003]
(It’s now used as a hard problem to build cryptosystems!)
THE POWER OF CONDITIONAL INDEPENDENCE
[Phylogenetic Trees/HMMs]:

Σ_σ Pr[z = σ] · Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

(joint distribution on leaves a, b, c)
PURE TOPIC MODELS
(figure: topic matrix A, with words (m) as rows and topics (r) as columns)
• Each topic is a distribution on words
• Each document is about only one topic (stochastically generated)
• For each document, we sample L words from its distribution
PURE TOPIC MODELS

M ≈ A W
[Anandkumar, Hsu, Kakade, 2012]: Algorithm for learning pure topic models from polynomially many samples (if A is full rank)
Question: Where can we find three conditionally independent random variables?
The first, second and third words are independent conditioned on the topic t (and are random samples from A_t)
THE POWER OF CONDITIONAL INDEPENDENCE

[Phylogenetic Trees/HMMs]:

Σ_σ Pr[z = σ] · Pr[a | z = σ] ⊗ Pr[b | z = σ] ⊗ Pr[c | z = σ]

(joint distribution on leaves a, b, c)

[Pure Topic Models/LDA]:

Σ_j Pr[topic = j] · A_j ⊗ A_j ⊗ A_j

(joint distribution on the first three words)

[Community Detection]:

Σ_j Pr[C_x = j] · (C_A Π)_j ⊗ (C_B Π)_j ⊗ (C_C Π)_j

(counting stars)
OUTLINE

The focus of this tutorial is on Algorithms/Applications/Models for tensor decompositions.

Part I: Algorithms
• The Rotation Problem
• Jennrich's Algorithm

Part II: Applications
• Phylogenetic Reconstruction
• Pure Topic Models

Part III: Smoothed Analysis
• Overcomplete Problems
• Kruskal Rank and the Khatri-Rao Product
So far, Jennrich's algorithm has been the key, but it has a crucial limitation.
Let

T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i

where the {a_i} are n-dimensional vectors.
Question: What if R is much larger than n?
This is called the overcomplete case, i.e. the number of factors is much larger than the number of observations…
In such cases, why stop at third-order tensors?
Consider a sixth-order tensor T:

T = Σ_{i=1}^{R} a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i ⊗ a_i
Question: Can we find its factors, even if R is much larger than n?
Let's flatten it by rearranging its entries into a third-order tensor:

flat(T) = Σ_{i=1}^{R} b_i ⊗ b_i ⊗ b_i,  where b_i = a_i ⊗_KR a_i

Here ⊗_KR is the Khatri-Rao product: b_i is the n²-dimensional vector whose (j, k)th entry is the product of the jth and kth entries of a_i.
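A small numpy sketch of this flattening (the sizes and random factors below are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n, R = 3, 5  # illustrative sizes, with R > n

A = rng.standard_normal((n, R))  # columns are the factors a_i

# Sixth-order tensor T = sum_i a_i (x) a_i (x) a_i (x) a_i (x) a_i (x) a_i
T = np.einsum('ia,ja,ka,la,ma,na->ijklmn', A, A, A, A, A, A)

# Flatten by pairing up the six modes: a third-order tensor of shape
# (n^2, n^2, n^2) whose factors are the Khatri-Rao squares b_i
flatT = T.reshape(n * n, n * n, n * n)
B = np.einsum('ia,ja->ija', A, A).reshape(n * n, R)  # b_i = a_i (x)_KR a_i
assert np.allclose(flatT, np.einsum('ia,ja,ka->ijk', B, B, B))

# Even though R > n, the new factors b_i are linearly independent
assert np.linalg.matrix_rank(B) == R
```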
Question: Can we apply Jennrich's Algorithm to flat(T)?
When are the new factors b_i = a_i ⊗_KR a_i linearly independent?
Example #1: Let {a_i} be all (n choose 2) vectors with exactly two ones.
Then the {b_i} are vectorizations of the rank-one matrices a_i a_iᵀ. If a_i has its two ones in positions j and k, then the (j, k)th entry of a_i a_iᵀ is non-zero in b_i alone, so the {b_i} are linearly independent.
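Example #1 can be verified directly; the sketch below builds all (n choose 2) two-ones vectors for a small, illustrative n and checks that their Khatri-Rao squares are linearly independent:

```python
import numpy as np
from itertools import combinations

n = 5
pairs = list(combinations(range(n), 2))  # the C(n,2) = 10 pairs of positions
A = np.zeros((n, len(pairs)))
for col, (j, k) in enumerate(pairs):     # a_i has ones at positions j and k
    A[j, col] = A[k, col] = 1.0

# b_i = a_i (x)_KR a_i is the vectorization of the matrix a_i a_i^T
B = np.einsum('ia,ja->ija', A, A).reshape(n * n, -1)

# The (j,k) entry of a_i a_i^T is non-zero for b_i alone, so the b_i
# are linearly independent: the rank equals C(n,2)
assert np.linalg.matrix_rank(B) == len(pairs)
```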
Example #2: Let {a_1, …, a_n} and {a_{n+1}, …, a_{2n}} be two random orthonormal bases.
Then there is a linear dependence with 2n terms:
Σ_{i=1}^{n} a_i ⊗_KR a_i − Σ_{i=n+1}^{2n} a_i ⊗_KR a_i = 0
(As matrices, both sums equal the identity.)
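Example #2 can likewise be checked numerically; the bases below are random draws (as in the slide), obtained via QR of Gaussian matrices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# Two random orthonormal bases
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = np.hstack([U, V])  # the 2n factor vectors a_1, ..., a_2n

B = np.einsum('ia,ja->ija', A, A).reshape(n * n, 2 * n)

# sum_i u_i u_i^T = I = sum_i v_i v_i^T, so the Khatri-Rao squares obey
# the 2n-term linear dependence, and (generically) only that one
coeffs = np.concatenate([np.ones(n), -np.ones(n)])
assert np.allclose(B @ coeffs, 0.0)
assert np.linalg.matrix_rank(B) == 2 * n - 1
```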
THE KRUSKAL RANK
Definition: The Kruskal rank (k-rank) of {b_i} is the largest k such that every set of k of the vectors is linearly independent.
For b_i = a_i ⊗_KR a_i with k-rank({a_i}) = n:
Example #1: k-rank({b_i}) = R = (n choose 2)
Example #2: k-rank({b_i}) = 2n − 1
The Kruskal rank always adds under the Khatri-Rao product, but sometimes it multiplies, and that can allow us to handle R >> n
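For intuition, the Kruskal rank can be computed by brute force on small instances. The helper below is an illustrative sketch (not from the slides) and is exponential in the number of vectors:

```python
import numpy as np
from itertools import combinations

def kruskal_rank(M, tol=1e-9):
    """Largest k such that EVERY set of k columns of M is linearly
    independent (brute force over all subsets)."""
    m, R = M.shape
    best = 0
    for size in range(1, min(m, R) + 1):
        if all(np.linalg.matrix_rank(M[:, list(S)], tol=tol) == size
               for S in combinations(range(R), size)):
            best = size
        else:
            break
    return best

rng = np.random.default_rng(3)
n, R = 3, 6
A = rng.standard_normal((n, R))
B = np.einsum('ia,ja->ija', A, A).reshape(n * n, R)

assert kruskal_rank(A) == n  # can never exceed the dimension n
assert kruskal_rank(B) == R  # almost surely multiplies: min(R, n^2) = R here
```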
[Allman, Matias, Rhodes, 2009]: Almost surely, the Kruskal rank multiplies under the Khatri-Rao product
Proof: The set of {a_i} for which the vectors b_i = a_i ⊗_KR a_i satisfy det({b_i}) = 0 has measure zero.
But this yields a very weak bound on the condition number of {b_i}…
…which is what we need in order to apply it in learning/statistics, where we only have an estimate of T.
Definition: The robust Kruskal rank (k-rank_γ) of {b_i} is the largest k such that every set of k of the vectors has condition number at most O(γ).
[Bhaskara, Charikar, Vijayaraghavan, 2013]: The robust Kruskal rank always adds under the Khatri-Rao product
[Bhaskara, Charikar, Moitra, Vijayaraghavan, 2014]: Suppose the vectors {a_i} are ε-perturbed. Then

k-rank_γ({b_i}) = R

for R = n²/2 and γ = poly(1/n, ε), with exponentially small failure probability (δ)
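A quick smoothed-analysis-style experiment (an illustrative sketch with hypothetical parameters, not the paper's construction): start from a highly degenerate family of vectors, ε-perturb it, and observe that R = n²/2 Khatri-Rao squares become linearly independent with a finite condition number:

```python
import numpy as np

rng = np.random.default_rng(4)
n, eps = 8, 0.1
R = n * n // 2  # R = n^2/2, far more factors than the dimension n

# A deliberately degenerate family (all columns identical), then eps-perturbed
A = np.ones((n, R)) + eps * rng.standard_normal((n, R))
B = np.einsum('ia,ja->ija', A, A).reshape(n * n, R)

# After the perturbation, all R Khatri-Rao squares are linearly independent
sigma = np.linalg.svd(B, compute_uv=False)
assert np.linalg.matrix_rank(B) == R
print('condition number:', sigma[0] / sigma[-1])
```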
Hence we can apply Jennrich’s Algorithm to flat(T) with R >> n
Note: These bounds are easy to prove with inverse-polynomial failure probability, but then γ depends on δ
This can be extended to Khatri-Rao products of any constant order
Sample application: an algorithm for learning mixtures of n^O(1) spherical Gaussians in R^n, if their means are ε-perturbed
This was also obtained independently by [Anderson, Belkin, Goyal, Rademacher, Voss, 2014]
Summary:
• Tensor decompositions are unique under much more general conditions, compared to matrix decompositions
• Jennrich's Algorithm (rediscovered many times!), and its many applications in learning/statistics
• Introduced new models to study overcomplete problems (R >> n)
• Are there algorithms for order-k tensors that work with R = n^(0.51k)?