MLconf NYC: Animashree Anandkumar
TRANSCRIPT
![Page 1: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/1.jpg)
Tensor Decompositions for Guaranteed Learning of Latent Variable Models
Anima Anandkumar
U.C. Irvine
![Page 2: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/2.jpg)
Application 1: Topic Modeling
Document modeling
Observed: words in document corpus.
Hidden: topics.
Goal: carry out document summarization.
![Page 3: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/3.jpg)
Application 2: Understanding Human Communities
Social Networks
Observed: network of social ties, e.g. friendships, co-authorships
Hidden: groups/communities of actors.
![Page 4: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/4.jpg)
Application 3: Recommender Systems
Recommender System
Observed: ratings of users for various products, e.g. Yelp reviews.
Goal: Predict new recommendations.
Modeling: Find groups/communities of users and products.
![Page 5: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/5.jpg)
Application 4: Feature Learning
Feature Engineering
Learn good features/representations for classification tasks, e.g. image and speech recognition.
Sparse representations, low dimensional hidden structures.
![Page 6: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/6.jpg)
Application 5: Computational Biology
Observed: gene expression levels
Goal: discover gene groups
Hidden variables: regulators controlling gene groups
“Unsupervised Learning of Transcriptional Regulatory Networks via Latent Tree Graphical Model” by A. Gitter, F. Huang, R. Valluvan, E. Fraenkel and A. Anandkumar. Submitted to BMC Bioinformatics, Jan. 2014.
![Page 9: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/9.jpg)
Statistical Framework
In all applications we discover hidden structure in data: unsupervised learning.
Latent Variable Models
Concise statistical description through graphical modeling.
Conditional independence relationships or hierarchy of variables.
(Diagram: latent tree with hidden nodes h1, h2, h3 over observed variables x1, …, x5.)
![Page 14: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/14.jpg)
Computational Framework
Challenge: Efficient Learning of Latent Variable Models
Maximum likelihood is NP-hard.
Practice: EM and Variational Bayes have no consistency guarantees.
Efficient computational and sample complexities?
Fast methods such as matrix factorization are not statistical. We cannot learn the latent variable model through such methods.
Tensor-based Estimation
Estimate moment tensors from data: higher-order relationships.
Compute a decomposition of the moment tensor.
Iterative updates, e.g. tensor power iterations, alternating minimization.
Non-convex: convergence to a local optimum. No guarantees in general.
Innovation: Guaranteed convergence to the correct model.
In this talk: tensor decompositions and applications.
![Page 15: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/15.jpg)
Outline
1 Introduction
2 Topic Models
3 Efficient Tensor Decomposition
4 Experimental Results
5 Conclusion
![Page 16: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/16.jpg)
Topic Models: Bag of Words
![Page 17: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/17.jpg)
Probabilistic Topic Models
Bag of words: the order of words does not matter.
Graphical model representation:
l words in a document: x1, …, xl.
h: proportions of topics in the document.
Word xi is generated from topic yi.
A(i, j) := P[xm = i | ym = j]: the topic-word matrix.
(Diagram: topic mixture h generates topics y1, …, y5, each generating a word x1, …, x5 through A.)
![Page 19: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/19.jpg)
Geometric Picture for Topic Models
Single topic (h)
Linear model: E[xi | h] = Ah.
Multiview model: h is fixed and multiple words (xi) are generated.
![Page 21: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/21.jpg)
Geometric Picture for Topic Models
Topic proportions vector (h)
Word generation (x1, x2, …): each word xi is drawn through the topic-word matrix A.
Linear model: E[xi | h] = Ah.
Multiview model: h is fixed and multiple words (xi) are generated.
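As a toy illustration of the multiview model, the sketch below fixes a single topic h, draws words i.i.d. from the corresponding column of A, and checks that the empirical word frequencies match E[xi | h]. The sizes and the numpy setup are illustrative assumptions, not part of the talk.

```python
import numpy as np

# Hypothetical toy sizes: 6-word vocabulary, 2 topics.
rng = np.random.default_rng(0)
d, k = 6, 2
A = rng.random((d, k))
A /= A.sum(axis=0)        # columns of A are distributions over words

h = rng.integers(k)       # fix one topic for the document (multiview model)
# Draw many words i.i.d. given h; one-hot encoding gives E[x_i | h] = A e_h.
words = rng.choice(d, size=20000, p=A[:, h])
x = np.eye(d)[words]
empirical_mean = x.mean(axis=0)
assert np.allclose(empirical_mean, A[:, h], atol=0.02)
```

With h fixed, every view xi shares the same conditional mean, which is exactly what the moment methods in the next slides exploit.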
![Page 24: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/24.jpg)
Moment Tensors
Consider the single topic model.
E[xi | h] = Ah. λ := E[h], i.e. λr = P[h = r].
Learn the topic-word matrix A and the vector λ.
M2: co-occurrence of two words in a document:
M2 := E[x1 x2⊤] = E[E[x1 x2⊤ | h]] = A E[h h⊤] A⊤ = ∑_{r=1}^{k} λr ar ar⊤
Tensor M3: co-occurrence of three words:
M3 := E[x1 ⊗ x2 ⊗ x3] = ∑_{r=1}^{k} λr ar ⊗ ar ⊗ ar
Matrix and tensor forms, with ar := the rth column of A:
M2 = ∑_{r=1}^{k} λr ar ⊗ ar,  M3 = ∑_{r=1}^{k} λr ar ⊗ ar ⊗ ar
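These moment forms can be checked numerically on a toy single-topic model; the sizes and topic proportions below are made up, and einsum stands in for the outer-product sums.

```python
import numpy as np

# Toy single-topic model (sizes hypothetical): k = 3 topics, d = 8 words.
rng = np.random.default_rng(0)
k, d = 3, 8
A = rng.random((d, k))
A /= A.sum(axis=0)                       # topic-word matrix, columns sum to 1
lam = np.array([0.5, 0.3, 0.2])          # lambda_r = P[h = r]

# Moment forms from the slide.
M2 = np.einsum('r,ir,jr->ij', lam, A, A)         # sum_r lam_r a_r a_r^T
M3 = np.einsum('r,ir,jr,kr->ijk', lam, A, A, A)  # sum_r lam_r a_r⊗a_r⊗a_r

# M2 equals A diag(lambda) A^T, and both moments are symmetric.
assert np.allclose(M2, A @ np.diag(lam) @ A.T)
assert np.allclose(M3, M3.transpose(1, 0, 2))
```

Since the entries of M2 and M3 are joint word probabilities, each tensor sums to 1, a quick sanity check on the construction.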
![Page 25: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/25.jpg)
Tensor Decomposition Problem
M2 = ∑_{r=1}^{k} λr ar ⊗ ar,  M3 = ∑_{r=1}^{k} λr ar ⊗ ar ⊗ ar
Tensor M3 = λ1 a1 ⊗ a1 ⊗ a1 + λ2 a2 ⊗ a2 ⊗ a2 + …
u ⊗ v ⊗ w is a rank-1 tensor whose (i, j, k)th entry is ui vj wk.
k topics, d words in the vocabulary.
M3: a d × d × d tensor of rank k.
Learning Topic Models through Tensor Decomposition
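A minimal check of the rank-1 definition, with made-up vectors u, v, w:

```python
import numpy as np

# Rank-1 tensor: (u ⊗ v ⊗ w)[i, j, k] = u[i] * v[j] * w[k].
u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])
w = np.array([5.0, 6.0])
T = np.einsum('i,j,k->ijk', u, v, w)
assert T.shape == (2, 2, 2)
assert T[0, 1, 1] == u[0] * v[1] * w[1]   # 1 * 4 * 6 = 24
```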
![Page 26: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/26.jpg)
Detecting Communities in Networks
![Page 30: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/30.jpg)
Detecting Communities in Networks
Stochastic Block Model
Non-overlapping
Mixed Membership Model
Overlapping
Unifying Assumption
Edges conditionally independent given community memberships
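The unifying assumption can be illustrated by sampling a small stochastic block model, where each edge is an independent coin flip given the community labels; the sizes and edge probabilities below are arbitrary.

```python
import numpy as np

# Toy stochastic block model (sizes and probabilities arbitrary).
rng = np.random.default_rng(0)
n, k = 90, 3
z = rng.integers(k, size=n)                    # hidden community labels
P = np.full((k, k), 0.05) + 0.40 * np.eye(k)   # within > across edge prob.

# Given z, every edge is an independent Bernoulli draw.
probs = P[z[:, None], z[None, :]]
A = (rng.random((n, n)) < probs).astype(int)
A = np.triu(A, 1)
A = A + A.T                                    # undirected, no self-loops
assert (A == A.T).all() and np.diag(A).sum() == 0
```

The mixed membership variant replaces the hard label z with a membership vector per node, but the conditional independence of edges is the same.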
![Page 31: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/31.jpg)
Multi-view Mixture Models
![Page 32: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/32.jpg)
Tensor Forms in Other Models
Independent Component Analysis
Independent sources, unknown mixing.
Blind source separation of speech, image, video, …
(Diagram: independent sources h1, h2, …, hk mixed through A into observations x1, x2, …, xd.)
Gaussian Mixtures; Hidden Markov Models/Latent Trees
(Diagram: latent tree with hidden nodes h1, h2, h3 over observed variables x1, …, x5.)
Reduction to similar moment forms.
![Page 34: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/34.jpg)
Tensor Decomposition Problem
M3 = ∑_{r=1}^{k} λr ar ⊗ ar ⊗ ar
Tensor M3 = λ1 a1 ⊗ a1 ⊗ a1 + λ2 a2 ⊗ a2 ⊗ a2 + …
u ⊗ v ⊗ w is a rank-1 tensor whose (i, j, k)th entry is ui vj wk.
k topics, d words in the vocabulary.
M3: a d × d × d tensor of rank k.
d: vocabulary size for topic models, or n: size of the network for community models.
![Page 35: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/35.jpg)
Dimensionality Reduction for Tensor Decomposition
M3 = ∑_{r=1}^{k} λr ar ⊗ ar ⊗ ar
Dimensionality Reduction (Whitening)
Convert M3, of size d × d × d, to a tensor T of size k × k × k.
Carry out the decomposition of T instead.
Dimensionality reduction through multilinear transforms, computed from data, e.g. pairwise moments.
T = ∑_i ρi ri⊗3 is a symmetric orthogonal tensor: the {ri} are orthonormal.
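A sketch of the whitening step under the moment forms from the earlier slides: a hypothetical rank-k M2 is used to build a whitening matrix W with W⊤ M2 W = I, and the multilinear transform M3(W, W, W) yields the k × k × k tensor T. All sizes are illustrative.

```python
import numpy as np

# Toy moments with hypothetical sizes: k = 3 components in d = 10 dimensions.
rng = np.random.default_rng(1)
k, d = 3, 10
A = rng.standard_normal((d, k))
lam = np.array([0.5, 0.3, 0.2])

M2 = np.einsum('r,ir,jr->ij', lam, A, A)
M3 = np.einsum('r,ir,jr,kr->ijk', lam, A, A, A)

# Whitening: the k leading eigenpairs of M2 give W with W^T M2 W = I_k.
s, U = np.linalg.eigh(M2)
s, U = s[-k:], U[:, -k:]
W = U / np.sqrt(s)                       # W = U diag(s)^{-1/2}
assert np.allclose(W.T @ M2 @ W, np.eye(k))

# Multilinear transform: T = M3(W, W, W) is a k x k x k tensor whose
# components v_r = sqrt(lam_r) * W^T a_r are orthonormal.
T = np.einsum('ijk,ia,jb,kc->abc', M3, W, W, W)
V = np.sqrt(lam) * (W.T @ A)
assert np.allclose(V.T @ V, np.eye(k))
```

The orthonormality of the whitened components is what makes the orthogonal decomposition on the next slides applicable.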
![Page 38: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/38.jpg)
Orthogonal/Eigen Decomposition
Orthogonal symmetric tensor: T = ∑_{j∈[k]} ρj rj⊗3
T(I, r1, r1) = ∑_{j∈[k]} ρj ⟨r1, rj⟩² rj = ρ1 r1
Obtaining eigenvectors through power iterations:
u ↦ T(I, u, u) / ‖T(I, u, u)‖
Basic Algorithm
Random initialization, run power iterations, and deflate.
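The basic algorithm can be sketched as follows for an exactly orthogonal synthetic tensor; this is a minimal illustration with arbitrary eigenvalues, not the robust version analyzed in the papers.

```python
import numpy as np

def tensor_power_method(T, k, n_iter=100, seed=0):
    """Recover (rho_j, r_j) from an orthogonally decomposable tensor
    T = sum_j rho_j r_j^(⊗3) via power iterations plus deflation."""
    rng = np.random.default_rng(seed)
    rhos, vecs = [], []
    for _ in range(k):
        u = rng.standard_normal(T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(n_iter):
            u = np.einsum('ijk,j,k->i', T, u, u)    # u -> T(I, u, u)
            u /= np.linalg.norm(u)                  # normalize
        rho = np.einsum('ijk,i,j,k->', T, u, u, u)  # eigenvalue T(u, u, u)
        T = T - rho * np.einsum('i,j,k->ijk', u, u, u)  # deflate
        rhos.append(rho)
        vecs.append(u)
    return np.array(rhos), np.array(vecs)

# Synthetic orthogonal tensor with known eigenvalues (values arbitrary).
Q, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((4, 3)))
rho_true = np.array([3.0, 2.0, 1.0])
T = np.einsum('r,ir,jr,kr->ijk', rho_true, Q, Q, Q)
rhos, vecs = tensor_power_method(T, k=3)
assert np.allclose(sorted(rhos), sorted(rho_true), atol=1e-5)
```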
![Page 39: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/39.jpg)
Practical Considerations
k communities, n nodes, k ≪ n.
Steps
k-SVD of an n × n matrix: randomized techniques.
Online k × k × k tensor decomposition: no tensor is explicitly formed.
Parallelization: inherently parallelizable; GPU deployment.
Sparse implementation: real-world networks are sparse.
Validation metric: p-value test based on “soft pairing”.
Parallel time complexity: O(nsk/c + k³), where s is the maximum degree in the graph and c is the number of cores.
Huang, Niranjan, Hakeem and Anandkumar, “Fast Detection of Overlapping Communities via Online Tensor Methods,” Preprint, Sept. 2013.
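The randomized k-SVD step can be sketched in the Halko et al. style (range finding with a random projection, then a small exact SVD); this is a generic sketch with arbitrary sizes, not the implementation used in the cited paper.

```python
import numpy as np

def randomized_svd(M, k, oversample=10, seed=0):
    """Randomized top-k SVD: sketch the range of M with a random
    projection, then take an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((M.shape[1], k + oversample))
    Q, _ = np.linalg.qr(M @ omega)                  # orthonormal range basis
    U_small, s, Vt = np.linalg.svd(Q.T @ M, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(2)
M = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 200))  # rank 5
U, s, Vt = randomized_svd(M, k=5)
assert np.allclose(U * s @ Vt, M, atol=1e-5)   # exact rank-5 recovery
```

The cost is dominated by the matrix-sketch product, which is why it scales to the n × n matrices above when k ≪ n.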
![Page 40: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/40.jpg)
Scaling Of The Stochastic Iterations
vi^{t+1} ← vi^t − 3θβ^t ∑_{j=1}^{k} [⟨vj^t, vi^t⟩² vj^t] + β^t ⟨vi^t, yA^t⟩ ⟨vi^t, yB^t⟩ yC^t + …
Parallelize across eigenvectors.
STGD is iterative: device code reuses buffers for updates.
(Diagram: standard vs. device interface; vi^t and yA^t, yB^t, yC^t exchanged between CPU and GPU.)
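One STGD update from the slide, written with numpy: θ and β^t are hyperparameters (the values below are arbitrary), and yA, yB, yC stand for whitened samples. This is a CPU sketch of the update rule, not the GPU implementation described in the talk.

```python
import numpy as np

def stgd_step(V, yA, yB, yC, theta=1.0, beta=0.01):
    """One stochastic update per the slide: rows of V are the current
    eigenvector estimates v_i; theta and beta are hyperparameters."""
    G = V @ V.T                                   # G[i, j] = <v_i, v_j>
    penalty = (G ** 2) @ V                        # sum_j <v_j, v_i>^2 v_j
    signal = ((V @ yA) * (V @ yB))[:, None] * yC  # <v_i,yA><v_i,yB> yC
    return V - 3 * theta * beta * penalty + beta * signal

rng = np.random.default_rng(0)
k = 3
V = rng.standard_normal((k, k))
yA, yB, yC = rng.standard_normal((3, k))  # stand-ins for whitened samples
V_next = stgd_step(V, yA, yB, yC)
assert V_next.shape == (k, k)
```

All k eigenvector rows are updated by the same dense operations, which is what makes the update easy to parallelize across eigenvectors.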
![Page 41: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/41.jpg)
(Plot: running time in seconds vs. number of communities k, on log-log axes, comparing MATLAB Tensor Toolbox, CULA Standard Interface, CULA Device Interface, and Eigen Sparse.)
![Page 43: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/43.jpg)
Experimental Results
Datasets:
Facebook: friendship network among users, n ∼ 20,000.
Yelp: user-business review network, n ∼ 40,000.
DBLP: co-authorship network among authors, n ∼ 1 million.
Error (E) and Recovery ratio (R):

| Dataset | k̂ | Method | Running Time | E | R |
|---|---|---|---|---|---|
| Facebook (k=360) | 500 | ours | 468 | 0.0175 | 100% |
| Facebook (k=360) | 500 | variational | 86,808 | 0.0308 | 100% |
| Yelp (k=159) | 100 | ours | 287 | 0.046 | 86% |
| Yelp (k=159) | 100 | variational | N.A. | N.A. | N.A. |
| DBLP (k=6000) | 100 | ours | 5407 | 0.105 | 95% |
![Page 45: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/45.jpg)
Experimental Results on Yelp
Lowest-error business categories and largest-weight businesses:

| Rank | Category | Business | Stars | Review Counts |
|---|---|---|---|---|
| 1 | Latin American | Salvadoreno Restaurant | 4.0 | 36 |
| 2 | Gluten Free | P.F. Chang’s China Bistro | 3.5 | 55 |
| 3 | Hobby Shops | Make Meaning | 4.5 | 14 |
| 4 | Mass Media | KJZZ 91.5FM | 4.0 | 13 |
| 5 | Yoga | Sutra Midtown | 4.5 | 31 |

Bridgeness: distance from the vector [1/k̂, …, 1/k̂]⊤.
Top-5 bridging nodes (businesses):

| Business | Categories |
|---|---|
| Four Peaks Brewing | Restaurants, Bars, American, Nightlife, Food, Pubs, Tempe |
| Pizzeria Bianco | Restaurants, Pizza, Phoenix |
| FEZ | Restaurants, Bars, American, Nightlife, Mediterranean, Lounges, Phoenix |
| Matt’s Big Breakfast | Restaurants, Phoenix, Breakfast & Brunch |
| Cornish Pasty Co | Restaurants, Bars, Nightlife, Pubs, Tempe |
![Page 47: MLconf NYC Animashree Anandkumar](https://reader034.vdocuments.net/reader034/viewer/2022051311/5454e615af795994188b45cb/html5/thumbnails/47.jpg)
Conclusion
Guaranteed Learning of Latent Variable Models
Guaranteed to recover the correct model.
Efficient sample and computational complexities.
Better performance compared to EM, Variational Bayes, etc.
Mixed membership communities, topic models, ICA, Gaussian mixtures, …
Current and Future Goals
Guaranteed online learning in high dimensions.
Large-scale cloud-based implementations of tensor approaches.
Code available on the website and on GitHub.