
Page 1:

School of Computer Science

Probabilistic Graphical Models

Kernel Graphical Models

Eric Xing, Lecture 23, April 9, 2014

Acknowledgement: slides first drafted by Ankur Parikh

Page 2:

Nonparametric Graphical Models

How do we make a conditional probability table out of this?

How to learn parameters? How to perform inference?

Hilbert Space Embeddings!!!!!

Page 3:

Important Notation for this Lecture

We will use the calligraphic 𝒫 to denote that a probability is being treated as a matrix/vector/tensor:

Probabilities: P(X), P(X, Y), P(X | Y)

Probability vectors/matrices/tensors: 𝒫_X, 𝒫_{X,Y}, 𝒫_{X|Y}

Page 4:

Review: Embedding the Distribution of One Variable [Smola et al. 2007]

The Hilbert space embedding of X represents its density by a single point, the mean map, in an RKHS.
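The slide's equations did not survive extraction. A standard reconstruction from Smola et al. 2007, writing φ for the feature map of the kernel k:

```latex
\mu_X \;=\; \mathbb{E}_X[\varphi(X)] \;=\; \int \varphi(x)\,p(x)\,dx ,
\qquad
\hat{\mu}_X \;=\; \frac{1}{N}\sum_{i=1}^{N}\varphi(x_i) .
```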

Page 5:

Review: Cross Covariance Operator [Smola et al. 2007]

Embed Joint Distribution of X and Y in the Tensor Product of two RKHS’s

Embedding of P(X, Y).
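The operator itself is missing from the transcript; the standard definition (Smola et al. 2007), with φ and ψ the feature maps for X and Y:

```latex
\mathcal{C}_{XY} \;=\; \mathbb{E}_{XY}\!\left[\varphi(X)\otimes\psi(Y)\right],
\qquad
\hat{\mathcal{C}}_{XY} \;=\; \frac{1}{N}\sum_{i=1}^{N}\varphi(x_i)\otimes\psi(y_i) .
```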

Page 6:

Review: Auto Covariance Operator [Smola et al. 2007]

Only take expectation over these

Embedding of P(X, X).
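Again the formula is lost; the standard auto-covariance operator, by analogy with the cross-covariance above (the expectation is taken over a single X appearing in both slots):

```latex
\mathcal{C}_{XX} \;=\; \mathbb{E}_{X}\!\left[\varphi(X)\otimes\varphi(X)\right],
\qquad
\hat{\mathcal{C}}_{XX} \;=\; \frac{1}{N}\sum_{i=1}^{N}\varphi(x_i)\otimes\varphi(x_i) .
```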

Page 7:

Review: Conditional Embedding Operator [Song et al. 2009]

Conditional Embedding Operator:

It has the following property:

Analogous to “Slicing” a Conditional Probability Table in the Discrete Case:
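The formulas are missing; the standard definitions from Song et al. 2009:

```latex
\mathcal{C}_{X|Y} \;=\; \mathcal{C}_{XY}\,\mathcal{C}_{YY}^{-1},
\qquad
\mu_{X|Y=y} \;=\; \mathbb{E}_{X|y}[\varphi(X)] \;=\; \mathcal{C}_{X|Y}\,\psi(y) .
```

In the discrete case, the analogous operation picks out one column: 𝒫_{X|Y} e_y = P(X | Y = y).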

Page 8:

Slicing the Conditional Probability Matrix

Page 9:

“Slicing” the Conditional Embedding Operator

Page 10:

Why we Like Hilbert Space Embeddings

We will prove these now

We can marginalize and use chain rule in Hilbert Space too!!!

Sum Rule:

Chain Rule:

Sum Rule in RKHS:

Chain Rule in RKHS:
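The four rules on this slide lost their equations; the standard statements (the RKHS versions are Song et al.'s kernel sum and chain rules):

```latex
\text{Sum rule:}\quad P(x) \;=\; \textstyle\sum_{y} P(x \mid y)\,P(y),
\qquad
\text{Chain rule:}\quad P(x, y) \;=\; P(x \mid y)\,P(y).
```

```latex
\text{Sum rule in RKHS:}\quad \mu_X \;=\; \mathcal{C}_{X|Y}\,\mu_Y,
\qquad
\text{Chain rule in RKHS:}\quad \mathcal{C}_{XY} \;=\; \mathcal{C}_{X|Y}\,\mathcal{C}_{YY}.
```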

Page 11:

Sum Rules

The sum rule can be expressed in two ways:

First way: P(x) = Σ_y P(x, y). This does not work in the RKHS, since there is no "sum over states" operation for an operator.

Second way: P(x) = Σ_y P(x | y) P(y). This works in the RKHS!!!

What is special about the second way? Intuitively, it can be expressed elegantly as a matrix multiplication.

Page 12:

Sum Rule

Equivalent view using Matrix Algebra

Sum Rule (Matrix Form)
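The matrix equation is missing; in the lecture's calligraphic notation it is presumably:

```latex
\mathcal{P}_X \;=\; \mathcal{P}_{X|Y}\,\mathcal{P}_Y ,
\qquad\text{i.e.}\quad
P(x) \;=\; \sum_{y} P(x \mid y)\,P(y)\ \text{for every entry } x .
```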

Page 13:

Chain Rule

Equivalent view using Matrix Algebra

Note how the diagonal is used to keep Y from being marginalized out.

Chain Rule (Matrix Form): the entries of 𝒫_Y are placed on the diagonal.
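Reconstructing the missing equation (entries of 𝒫_Y on the diagonal, as the slide notes):

```latex
\mathcal{P}_{X,Y} \;=\; \mathcal{P}_{X|Y}\,\operatorname{diag}(\mathcal{P}_Y),
\qquad\text{i.e.}\quad
P(x, y) \;=\; P(x \mid y)\,P(y) .
```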

Page 14:

Example: what about the following factorization?
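The candidate identity on the slide is lost; a reconstruction consistent with the answer below, using the lecture's running variables A, B, C:

```latex
\mathcal{P}_{B,C} \;\stackrel{?}{=}\; \mathcal{P}_{B|A}\,\operatorname{diag}(\mathcal{P}_A)\,\mathcal{P}_{C|A}^{\top},
\qquad\text{i.e.}\quad
P(b, c) \;\stackrel{?}{=}\; \sum_{a} P(b \mid a)\,P(a)\,P(c \mid a) .
```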

Only if B and C are conditionally independent given A!!!

Page 15:

Different Proof of Matrix Sum Rule with Expectations

Let's now derive the matrix sum rule differently. Let e_x denote an indicator vector that is 1 in the x-th position and 0 elsewhere.
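The derivation itself is missing; it presumably runs as follows (each step matches an annotation on the next two slides):

```latex
\mathcal{P}_X
\;=\; \mathbb{E}_X[e_X]
\;=\; \mathbb{E}_Y\!\big[\mathbb{E}_{X|Y}[e_X]\big]
\;=\; \mathbb{E}_Y\!\big[\mathcal{P}_{X|Y}\, e_Y\big]
\;=\; \mathcal{P}_{X|Y}\,\mathbb{E}_Y[e_Y]
\;=\; \mathcal{P}_{X|Y}\,\mathcal{P}_Y .
```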

Page 16:

Random Variables?

Remember, 𝒫_X = E[e_X] is a probability vector; it is not a random variable. Similarly, μ_X = E[φ(X)] is a fixed function in an RKHS; it is not a random variable.

By contrast, the indicator e_X is a random vector, and the feature map φ(X) is a random function.

Page 17:

Expectation Proof of Matrix Sum Rule Cont.

𝒫_{X|Y} is a conditional probability matrix, so it is not a random variable (despite the misleading notation), and thus the expectation can be pulled past it. The indicator e_Y, by contrast, is a random variable.

Page 18:

Proof of RKHS Sum Rule

Now apply the same technique to the RKHS case.

Move expectation outside

Property of conditional embedding

Property of Expectation

Definition of Mean Map
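The equation chain is missing; matching the four annotated steps above:

```latex
\mu_X
\;=\; \mathbb{E}_X[\varphi(X)]
\;=\; \mathbb{E}_Y\!\big[\mathbb{E}_{X|Y}[\varphi(X)]\big]
\;=\; \mathbb{E}_Y\!\big[\mathcal{C}_{X|Y}\,\psi(Y)\big]
\;=\; \mathcal{C}_{X|Y}\,\mathbb{E}_Y[\psi(Y)]
\;=\; \mathcal{C}_{X|Y}\,\mu_Y .
```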

Page 19:

Kernel Graphical Models [Song et al. 2010, Song et al. 2011]

The idea is to replace the CPTs with RKHS operators/functions. Let's do this for a simple example first. We would like to compute:

Page 20:

Consider the Discrete Case

Page 21:

Inference as Matrix Multiplication
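The worked example is lost; the pitfall it illustrates is just the matrix sum rule, e.g. (a sketch in the lecture's notation, with the variable names assumed):

```latex
\mathcal{P}_{B|A}\,\mathcal{P}_A \;=\; \mathcal{P}_B
\qquad\text{(plain matrix multiplication sums } A \text{ out).}
```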

Oops... we accidentally integrated out A.

Page 22:

Put A on Diagonal Instead
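Presumably the fix shown is the chain rule in matrix form: placing 𝒫_A on a diagonal keeps A as an index of the result instead of summing it away:

```latex
\mathcal{P}_{B|A}\,\operatorname{diag}(\mathcal{P}_A) \;=\; \mathcal{P}_{B,A} .
```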

Page 23:

Now it works

Page 24:

Introducing evidence

Introduce evidence with delta vectors
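The slide's expression is lost. Assuming the running example is a root A with observed children B and C (consistent with pages 14 and 49), evidence B = b, C = c enters through delta vectors δ_b, δ_c, and the resulting vector over A has entries

```latex
\Big[\big(\mathcal{P}_{B|A}^{\top}\delta_b\big)\odot\mathcal{P}_A\odot\big(\mathcal{P}_{C|A}^{\top}\delta_c\big)\Big]_a
\;=\; P(b \mid a)\,P(a)\,P(c \mid a)
\;\propto\; P(a \mid b, c) .
```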

Page 25:

Now with Kernels

Page 26:

Sum-Product with Kernels

Page 27:

Sum-Product with Kernels

Page 28:

Consider just evaluating one random variable X at a particular evidence value using the Gaussian RBF Kernel:

What does this look like?

What does it mean to evaluate the mean map at a point?
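By the reproducing property, evaluating the empirical mean map at a point x* is a sum of kernel values (this is what the next slide identifies as an unnormalized KDE):

```latex
\hat{\mu}_X(x^{*})
\;=\; \big\langle \hat{\mu}_X,\ \varphi(x^{*}) \big\rangle
\;=\; \frac{1}{N}\sum_{i=1}^{N} k(x_i, x^{*}) .
```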

Page 29:

Kernel Density Estimation!

Consider the kernel density estimate at a point x*, and its empirical estimate. Evaluating the mean map at a point is like an unnormalized kernel density estimate. To find the "MAP" assignment, we can evaluate on a grid of points and then pick the one with the highest value.
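The KDE formulas are lost; with a Gaussian RBF kernel k and Z its normalizing constant, the standard estimate is

```latex
\hat{p}(x^{*}) \;=\; \frac{1}{NZ}\sum_{i=1}^{N} k(x_i, x^{*})
\;\propto\; \hat{\mu}_X(x^{*}) .
```

A minimal numpy sketch of the grid-search "MAP" described above; all names and the toy data are our own, not the lecture's:

```python
import numpy as np

def rbf(x, y, sigma=1.0):
    # Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def mean_map_value(samples, x_star, sigma=1.0):
    # <mu_hat, phi(x_star)> = (1/N) sum_i k(x_i, x_star): an unnormalized KDE
    return np.mean([rbf(x_i, x_star, sigma) for x_i in samples])

samples = np.random.randn(200, 1)                 # toy 1-D data (assumption)
grid = np.linspace(-3.0, 3.0, 61).reshape(-1, 1)  # candidate evidence values
values = [mean_map_value(samples, g) for g in grid]
x_map = grid[int(np.argmax(values))]              # highest-scoring grid point
print("approximate MAP point:", x_map)
```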

Page 30:

Multiple Variables

Kernel density estimation with the Gaussian RBF kernel in multiple variables is like evaluating a "huge" covariance operator with the Gaussian RBF kernel (without normalization):
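A reconstruction of the missing formula, for d variables with a product of RBF kernels (one per variable):

```latex
\hat{p}(x_1^{*},\dots,x_d^{*})
\;\propto\;
\frac{1}{N}\sum_{i=1}^{N}\ \prod_{j=1}^{d} k\!\big(x_j^{(i)},\, x_j^{*}\big) ,
```

which is exactly the empirical embedding of the full joint, evaluated at the point (x_1*, …, x_d*).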

Page 31:

What is the problem with this?

The empirical estimate is very inaccurate because of the curse of dimensionality.

Empirically computing the “huge” covariance operator will have the same problem.

But then what is the point of Hilbert Space Embeddings?

Page 32:

We Can Factorize the "Huge" Covariance Operator

Hilbert space embeddings allow us to factorize the huge covariance operator using the graphical model structure, which kernel density estimation does not do. It factorizes into smaller covariance/conditional embedding operators, following the graphical model, that are more efficient to estimate.

Page 33:

Kernel Graphical Models: The Overall Picture

Naïve way to represent joint distribution of discrete variables is to store and manipulate a “huge” probability table.

Naïve way to represent joint distribution for many continuous variables is to use multivariate kernel density estimation.

Discrete Graphical Models allow us to factorize the “huge” joint distribution table into smaller factors.

Kernel Graphical Models allow us to factorize joint distributions of continuous variables into smaller factors.

Page 34:

Consider an Even Simpler Graphical Model

We are going to show how to estimate these operators from data.

Page 35:

The Kernel Matrix

[Figure: the N × N kernel matrix, whose (i, j) entry is k(x_i, x_j).]
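A minimal numpy sketch of building this matrix with a Gaussian RBF kernel (function and variable names are ours):

```python
import numpy as np

def gram_matrix(X, sigma=1.0):
    # K[i, j] = k(x_i, x_j) for the Gaussian RBF kernel; X has one sample per row.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.randn(5, 2)    # five toy samples in R^2
K = gram_matrix(X)
print(K.shape)               # (5, 5): symmetric, with ones on the diagonal
```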

Page 36:

Empirical Estimate Auto Covariance
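The estimate itself is missing; the standard form, writing Φ for the operator that collects the feature vectors φ(x_1), …, φ(x_N) (the symbol Φ is ours):

```latex
\hat{\mathcal{C}}_{XX}
\;=\; \frac{1}{N}\sum_{i=1}^{N}\varphi(x_i)\otimes\varphi(x_i)
\;=\; \frac{1}{N}\,\Phi\,\Phi^{\top} .
```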

(Φ is defined on the next slide.)

Page 37:

Conceptually, one can picture Φ as a matrix whose N columns are the feature vectors φ(x_1), …, φ(x_N), each column possibly infinite-dimensional.

Page 40:

Rigorously, Φ is an operator that maps vectors in ℝ^N to functions in the RKHS, such that the following holds; its adjoint (transpose) can then be derived as well:
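The defining equations are missing; the standard ones (the second follows from the reproducing property):

```latex
\Phi\,\alpha \;=\; \sum_{i=1}^{N}\alpha_i\,\varphi(x_i)\quad\text{for } \alpha\in\mathbb{R}^N ,
\qquad
\big(\Phi^{\top} f\big)_i \;=\; \langle \varphi(x_i),\, f\rangle \;=\; f(x_i) .
```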

Page 41:

Empirical Estimate Cross Covariance
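Again the formula is lost; by analogy with the auto-covariance case, with Ψ the feature operator for the y-samples (our symbol):

```latex
\hat{\mathcal{C}}_{XY}
\;=\; \frac{1}{N}\sum_{i=1}^{N}\varphi(x_i)\otimes\psi(y_i)
\;=\; \frac{1}{N}\,\Phi\,\Psi^{\top} .
```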

Page 42:

Getting the Kernel Matrix

It can then be shown that Φ^⊤Φ is the kernel matrix. This is finite and easy to compute!! However, note that the estimates of the covariance operators are not finite, since they live in the (typically infinite-dimensional) feature space:
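Reconstructing the two claims:

```latex
\big(\Phi^{\top}\Phi\big)_{ij} \;=\; \langle\varphi(x_i),\varphi(x_j)\rangle \;=\; k(x_i,x_j) \;=\; K_{ij}
\quad\text{(an } N\times N \text{ matrix)},
\qquad
\hat{\mathcal{C}}_{XX} \;=\; \tfrac{1}{N}\Phi\Phi^{\top}
\quad\text{(an operator on the RKHS)} .
```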

Page 43:

Intuition 1: Why the Kernel Trick works

This operator is infinite-dimensional, but it has rank at most N.

The kernel matrix is N × N, and thus the kernel trick exploits this low-rank structure.

©Eric Xing @ CMU, 2012-2014 43

Page 44: Probabilistic Graphical Modelsepxing/Class/10708-14/lectures/lecture23-KernelGM.pdfConsider Kernel Density Estimate at point : And its empirical estimate: So evaluating the mean map

Empirical Estimate of Conditional Embedding Operator

Sort of… we need to regularize so that the operator being inverted is invertible: add a λI term as a regularizer.
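The estimator is missing from the transcript; the standard regularized form (Song et al. 2009):

```latex
\hat{\mathcal{C}}_{X|Y} \;=\; \hat{\mathcal{C}}_{XY}\,\big(\hat{\mathcal{C}}_{YY} + \lambda I\big)^{-1} .
```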

Page 45:

Return of the Matrix Inversion Lemma

Using the matrix inversion identity, we get:
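A reconstruction of the derivation, with G = Ψ^⊤Ψ the Gram matrix of the y-samples and sample-size constants absorbed into λ:

```latex
\Psi^{\top}\big(\Psi\Psi^{\top} + \lambda I\big)^{-1}
\;=\; \big(\Psi^{\top}\Psi + \lambda I\big)^{-1}\Psi^{\top}
\;\;\Longrightarrow\;\;
\hat{\mathcal{C}}_{X|Y} \;=\; \Phi\,\big(G + \lambda I\big)^{-1}\Psi^{\top} .
```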

Page 46:

But Our Estimates Are Still Infinite…

Let's do inference and see what happens.

Page 47:

Running Inference

Page 48:

Incorporating the Evidence

Page 49:

Reparameterize the Model

[Figure: the running model over nodes A, B, and C, with the evidence indicated at the observed nodes.]

Finite!!!!!

Page 50:

Intuition 2: Why the Kernel Trick Works

[Figure: the N × N kernel matrix.]

Page 51:

Intuition 2: Why the Kernel Trick Works

[Figure: one row of the kernel matrix, highlighted.]

Evaluating a feature function at the N data points!!!

Page 52:

Intuition 2: Why the Kernel Trick Works

Generally, people interpret the kernel matrix as a similarity matrix.

However, we can also view each row of the kernel matrix as evaluating a function at the N data points.

Although the function may be continuous and not easily represented analytically, we only really care about what its value is on the N data points.

Thus, when we only have a finite amount of data, the computation should be inherently finite.

Page 53:

Protein Sidechains

http://t3.gstatic.com/images?q=tbn:ANd9GcS_nfJy1o9yrDt37YlpK7i5s0f7QFqhPrG7-1CLm2AfWNt5wCE50pIKNZd0

Goal is to predict the 3D configuration of each sidechain

Page 54:

3D configuration of the sidechain is determined by two angles (spherical coordinates).

Protein Sidechains

http://www.math24.net/images/triple-int23.jpg

Page 55:

The Graphical Model

Construct a Markov random field: each side-chain angle pair is a node, and there is an edge between side-chains that are nearby in the protein. Edge potentials are already determined by physics equations.

Page 56:

The Graphical Model

The goal is to find the MAP assignment of all the side-chain angle pairs. Note that this is not Gaussian, but it is easy to define a kernel between angle pairs. We can then run Kernel Belief Propagation.

Page 57:

References

Smola, A. J., Gretton, A., Song, L., and Schölkopf, B. A Hilbert space embedding for distributions. Algorithmic Learning Theory, E. Takimoto (Ed.), Lecture Notes in Computer Science, Springer, 2007.

Song, L. Learning via Hilbert space embedding of distributions. PhD thesis, 2008.

Song, L., Huang, J., Smola, A., and Fukumizu, K. Hilbert space embeddings of conditional distributions. International Conference on Machine Learning (ICML), 2009.

Song, L., Gretton, A., and Guestrin, C. Nonparametric tree graphical models via kernel embeddings. Artificial Intelligence and Statistics (AISTATS), 2010.

Song, L., Gretton, A., Bickson, D., Low, Y., and Guestrin, C. Kernel belief propagation. International Conference on Artificial Intelligence and Statistics (AISTATS), 2011.

Page 58:

Supplemental: Kernel Belief Propagation on Trees

Page 59:

Kernel Tree Graphical Models [Song et al. 2010]

The goal is to somehow replace the CPTs with RKHS operators/functions.

But we need to do this in a certain way so that we can still do inference.


Page 60:

Message Passing / Belief Propagation

We need to "matricize" message passing to apply the RKHS trick (but matrices are not enough; we need tensors).

Page 61:

Outline

Show how to represent discrete graphical models using higher-order tensors.

Derive tensor message passing.

Show how tensor message passing can also be derived using expectations.

Derive kernel message passing [Song et al. 2010] using the intuition from tensor message passing / expectations.

(For simplicity, we will assume a binary tree: all internal nodes have 2 children.)

Page 62:

Tensors

Tensors are multidimensional arrays. A tensor of order N has N modes (N indices). Each mode is associated with a dimension; in the pictured example (a 4 × 3 × 4 tensor), the dimension of mode 1 is 4, of mode 2 is 3, and of mode 3 is 4.

Page 63:

Diagonal Tensors
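The illustration is lost; the standard definition, in our notation: a diagonal 3rd-order tensor built from a vector v satisfies

```latex
\mathcal{T}_{iii} \;=\; v_i ,
\qquad
\mathcal{T}_{ijk} \;=\; 0 \ \ \text{whenever } i, j, k \text{ are not all equal.}
```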

Page 64:

Partially Diagonal Tensors

Page 65:

Tensor Vector Multiplication

Multiplying a 3rd-order tensor by a vector produces a matrix.

Page 66:

Tensor Vector Multiplication, Cont.

Multiplying a 3rd-order tensor by two vectors produces a vector.
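A minimal numpy sketch of both facts (the shapes are toy choices of ours):

```python
import numpy as np

T = np.random.rand(4, 3, 4)      # a 3rd-order tensor with modes of dimension 4, 3, 4
u = np.random.rand(3)            # vector multiplied along mode 2
v = np.random.rand(4)            # vector multiplied along mode 3

M = np.einsum('ijk,j->ik', T, u)       # one vector in: a 4 x 4 matrix comes out
w = np.einsum('ijk,j,k->i', T, u, v)   # two vectors in: a length-4 vector comes out
print(M.shape, w.shape)
```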

Page 67:

The Conditional Probability Table at a Leaf Is a Matrix

Page 68:

The CPT at an Internal (Non-Root) Node Is a 3rd-Order Tensor

Note that we have:

Page 69:

CPT at the Root

The CPT at the root is a matrix.

Page 70:

The Outgoing Message as a Vector (at Leaf)

“bar” denotes evidence

Page 71:

The Outgoing Message At Internal Node
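The update itself is missing. Given the binary-tree assumption and the tensor-vector facts above, a plausible reconstruction (our notation: 𝒯_i is the node's CPT tensor, ×_n the mode-n product, π(i) the parent, l and r the children):

```latex
m_{i \to \pi(i)}
\;=\; \mathcal{T}_i \times_2 m_{l \to i} \times_3 m_{r \to i} ,
```

i.e. contracting the 3rd-order tensor with the two incoming child messages leaves a vector to send to the parent.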

Page 72:

At the Root

Page 73:

Kernel Graphical Models [Song et al. 2010, Song et al. 2011]

The Tensor CPTs at each node are replaced with RKHS functions/operators

Leaf:

Internal (non-root):

Root:

Page 74:

Conditional Embedding Operator for Internal Nodes

Embedding of

What is ?

Page 75:

Embedding of Cross Covariance Operator in Different RKHS

Embedding of
