
Page 1:

CSC321 Tutorial 8: Assignment 3: Mixture of Gaussians

(K-means slides based on Maksims Volkovs's; many figures from the Bishop 2006 textbook: Pattern recognition and machine learning)

Yue Li
Email: [email protected]

Wed 11-12, March 12
Fri 10-11, March 14

Page 2:

Outline

K-means clustering

Mixture of Gaussians

Assignment 3

Page 3:

• Setting:
  • data: $\{x_1, \ldots, x_N\}$
  • goal: partition the data into $K$ clusters
  • objective function in K-means:

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \|x_n - \mu_k\|^2 \tag{1}$$

• Algorithm:

  1. Initialize $K$ cluster centers $\mu_1, \ldots, \mu_K$.

  2. Assign each point $x_n$ to the closest center $k$:

$$r_{nk} = \begin{cases} 1 & \text{if } k = \arg\min_j \|x_n - \mu_j\|^2 \\ 0 & \text{otherwise} \end{cases}$$

  3. Update the cluster centers:

$$\mu_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}$$

  4. Repeat 2 & 3 until convergence (i.e., little change in (1)).

Pattern recognition and machine learning (Bishop, 2006)
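
The algorithm above translates almost line for line into MATLAB (the language of the assignment). Below is a minimal sketch for illustration only; the function name and argument layout are ours, not part of the assignment code.

% Minimal K-means sketch (illustration only). X is N-by-D data, K the
% number of clusters, nIter the number of assignment/update rounds.
function [mu, r, J] = kmeans_sketch(X, K, nIter)
  [N, ~] = size(X);
  mu = X(randperm(N, K), :);                   % step 1: centers at K random points
  for iter = 1:nIter
    % Step 2: hard-assign each point to its closest center.
    dist = zeros(N, K);
    for k = 1:K
      dist(:, k) = sum((X - mu(k, :)).^2, 2);  % squared Euclidean distances
    end
    [~, idx] = min(dist, [], 2);
    r = zeros(N, K);
    r(sub2ind([N, K], (1:N)', idx)) = 1;       % one-hot responsibilities r_nk
    % Step 3: move each center to the mean of its assigned points.
    for k = 1:K
      if any(r(:, k))
        mu(k, :) = (r(:, k)' * X) / sum(r(:, k));
      end
    end
    J = sum(min(dist, [], 2));                 % objective (1); stop when it plateaus
  end
end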


Pages 5-13:

[Figure: nine panels (a)-(i) showing successive assignment and update steps of K-means on the two-cluster data; both axes run from -2 to 2 (Bishop, 2006).]

Page 14:

[Figure: the K-means objective J plotted after each step of iterations 1-4; J decreases from roughly 1000 toward convergence (Bishop, 2006).]

Page 15:

Mixture of Gaussians (MoG)

• A soft version of K-means that incorporates uncertainty in the cluster assignment of each data point (e.g., points on the boundary in slide 1):
  • $r_{nk} \in [0, 1]$ rather than $r_{nk} \in \{0, 1\}$

• A model-based method that assumes the data points are independently sampled from $K$ Gaussians $\mathcal{N}(\mu_k, \Sigma_k)$.

• Two important questions to address in MoG:
  • What is the objective function of MoG?
  • How do we fit a MoG to optimize that objective?

• To further motivate MoG, consider the following example from Bishop (2006), which models the same data as the previous K-means example.

Pages 16-21:

[Figure: EM fitting a two-Gaussian MoG to the same data: panels (a)-(f), where panels (c)-(f) are labelled L = 1, 2, 5, 20 for the number of completed EM cycles; both axes run from -2 to 2 (Bishop, 2006).]

Page 22:

A graphical model view of MoG

[Figure: plate diagram in which the mixing proportions $\pi$ generate the latent assignment $z_n$, which together with the means $\mu$ and covariances $\Sigma$ generates the observed $x_n$; $z_n$ and $x_n$ sit inside a plate replicated $N$ times.]

Pattern recognition and machine learning, Chapter 9, p. 433 (Bishop, 2006)

Page 23:

• Density function of one data point $x_n$:

$$p(x_n) = \sum_{k=1}^{K} p(z_k)\, p(x_n \mid \theta_k) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$

where $z_k$ is the latent variable and $\pi_k$ is the probability of choosing Gaussian $k$ to represent $x_n$ (i.e., the probability that $z_k = 1$). $\pi_k$ is called the mixing proportion.

• Objective function, the log likelihood of all data points:

$$\ln p(\mathbf{X}) = \ln \prod_{n=1}^{N} p(x_n) = \ln \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$
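
For concreteness, this log likelihood can be evaluated directly from the formula. Here is a sketch for the axis-aligned (diagonal-covariance) case used in the assignment; the function name and argument layout are ours:

% Sketch: evaluate ln p(X) for a MoG with axis-aligned (diagonal) covariances.
% X: N-by-D data; mu: K-by-D means; sigma2: K-by-D per-dimension variances;
% mixprop: 1-by-K mixing proportions pi_k. Illustration only.
function L = mog_loglik(X, mu, sigma2, mixprop)
  [N, ~] = size(X);
  K = size(mu, 1);
  p = zeros(N, K);                   % p(:, k) = pi_k * N(x_n | mu_k, Sigma_k)
  for k = 1:K
    quad = sum((X - mu(k, :)).^2 ./ sigma2(k, :), 2);
    logN = -0.5 * quad - 0.5 * sum(log(2 * pi * sigma2(k, :)));
    p(:, k) = mixprop(k) * exp(logN);
  end
  L = sum(log(sum(p, 2)));           % sum_n ln sum_k pi_k N(x_n | mu_k, Sigma_k)
end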


Page 25:

• Objective function:

$$\ln p(\mathbf{X}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$

• Maximum likelihood (ML) solutions for $\mu_k$, $\Sigma_k$, $\pi_k$:

$$\frac{\partial \ln p(\mathbf{X})}{\partial \mu_k} = 0 \implies \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n$$

$$\frac{\partial \ln p(\mathbf{X})}{\partial \Sigma_k} = 0 \implies \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T$$

$$\frac{\partial \ln p(\mathbf{X})}{\partial \pi_k} = 0 \implies \pi_k = \frac{N_k}{N}$$

where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$ and $\gamma(z_{nk}) = p(z_k = 1 \mid x_n)$.

Page 26:

In more detail, $\gamma(z_{nk})$ is the posterior probability (aka the responsibility of component $k$ for $x_n$):

$$\gamma(z_{nk}) = p(z_k = 1 \mid x_n) = \frac{p(z_k = 1)\, p(x_n \mid z_k = 1)}{p(x_n)} \quad \text{(Bayes' rule)}$$

$$= \frac{p(z_k = 1)\, p(x_n \mid z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1)\, p(x_n \mid z_j = 1)} = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$
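
A tiny worked example of this computation, with made-up numbers: one observation under two 1-D Gaussians with equal priors.

% Responsibilities of two 1-D Gaussians for the point x = 1 (illustrative
% numbers): equal priors, means -1 and 2, unit standard deviations.
x = 1;  mu = [-1, 2];  s = [1, 1];  prior = [0.5, 0.5];
dens = exp(-0.5 * ((x - mu) ./ s).^2) ./ (s * sqrt(2 * pi)); % N(x | mu_k, s_k^2)
num  = prior .* dens;            % numerator: pi_k * N(x | mu_k, Sigma_k)
resp = num / sum(num)            % posterior p(z_k = 1 | x), approx [0.18, 0.82]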

Page 27:

[Figure: the same plate diagram as on page 22, now annotated with the posterior $p(z_k = 1 \mid x_n)$.]

Pattern recognition and machine learning, Chapter 9, p. 433 (Bishop, 2006)

Page 28:

• Back to the ML solutions for $\mu_k$, $\Sigma_k$, $\pi_k$:

$$\frac{\partial \ln p(\mathbf{X})}{\partial \mu_k} = 0 \implies \mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n$$

$$\frac{\partial \ln p(\mathbf{X})}{\partial \Sigma_k} = 0 \implies \Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T$$

$$\frac{\partial \ln p(\mathbf{X})}{\partial \pi_k} = 0 \implies \pi_k = \frac{N_k}{N}$$

$$N_k = \sum_{n=1}^{N} \gamma(z_{nk}); \qquad \gamma(z_{nk}) = p(z_k = 1 \mid x_n) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$

• Because $\mu_k$, $\Sigma_k$, $\pi_k$ and $\gamma(z_{nk})$ all depend on each other, there is no analytical solution.

• We resort to a powerful optimization algorithm: Expectation Maximization (EM).

Page 29:

EM algorithm for MoG:

1. Initialize $\mu_k$, $\Sigma_k$, $\pi_k$.

2. E-step. Evaluate the responsibilities $\gamma(z_{nk})$ using the current $\mu_k$, $\Sigma_k$, $\pi_k$:

$$\gamma(z_{nk}) = \frac{\pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)} \tag{2}$$

3. M-step. Re-estimate $\mu_k$, $\Sigma_k$, $\pi_k$ based on the ML solutions:

$$\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n \tag{3}$$

$$\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T \tag{4}$$

$$\pi_k^{\text{new}} = \frac{N_k}{N} \tag{5}$$

where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$.

4. Evaluate the log likelihood

$$\ln p(\mathbf{X}) = \sum_{n=1}^{N} \ln \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \tag{6}$$

and check for convergence; if not converged, return to step 2.
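
These four steps map onto a short MATLAB loop. Below is a minimal sketch for the axis-aligned case, for illustration only: the assignment's mogem.m has its own structure and variable names, and a robust implementation would compute the E-step in the log domain (log-sum-exp) to avoid underflow.

% Minimal EM sketch for an axis-aligned MoG (illustration only).
% X: N-by-D data; K: number of Gaussians; nIter: number of EM cycles.
function [mu, sigma2, mixprop] = mog_em_sketch(X, K, nIter)
  [N, D] = size(X);
  mu      = X(randperm(N, K), :);     % step 1: initialize the parameters
  sigma2  = ones(K, D);
  mixprop = ones(1, K) / K;
  for iter = 1:nIter
    % E-step (Eq 2): responsibilities gamma(z_nk), an N-by-K matrix.
    p = zeros(N, K);
    for k = 1:K
      quad = sum((X - mu(k, :)).^2 ./ sigma2(k, :), 2);
      p(:, k) = mixprop(k) * exp(-0.5 * quad ...
                                 - 0.5 * sum(log(2 * pi * sigma2(k, :))));
    end
    resp = p ./ sum(p, 2);
    % M-step (Eqs 3-5): re-estimate the parameters.
    Nk = sum(resp, 1);                % 1-by-K effective counts N_k
    mu = (resp' * X) ./ Nk';          % Eq 3
    for k = 1:K
      sigma2(k, :) = resp(:, k)' * (X - mu(k, :)).^2 / Nk(k);  % Eq 4, diagonal
    end
    mixprop = Nk / N;                 % Eq 5
    % Step 4 (Eq 6): log likelihood under the E-step parameters, to monitor.
    fprintf('iter %d: ln p(X) = %.3f\n', iter, sum(log(sum(p, 2))));
  end
end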


Page 33:

Assignment 3

• This assignment is about fitting mixtures of axis-aligned Gaussians to two-dimensional data. First copy and unzip the archive below into your directory.

• http://www.cs.toronto.edu/~bonner/courses/2014s/csc321/assignments/hw3_matlab.zip

Page 34:

Assignment 3 PART 1 (5 points)

• Run moginit to create training and validation datasets from 4 random Gaussians.

• Then use the function mogem to fit various numbers of Gaussians to the training data.

• Using performance on the validation data, determine the optimal number of Gaussians to fit to the training data (the experiment loop is sketched below).

• Present your results as a graph that plots both the validation density and the training density as a function of the number of Gaussians.

• Include a brief statement of what you think the graph shows.

• Also include a brief statement about the effects of changing the initial standard deviation used in mogem.

• Please do not change the random seeds in mogem (changing them will produce different data).
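
The experiment loop for this part might look like the sketch below, wired to the toy functions from the earlier sketches (mog_em_sketch, mog_loglik) and stand-in random data. The real assignment calls moginit and mogem from hw3_matlab.zip, whose actual signatures differ; treat every name here as hypothetical.

% Hypothetical PART 1 driver using the toy sketches above, not the real
% moginit/mogem. Data here are placeholders, not the assignment's data.
Xtrain = randn(120, 2);  Xvalid = randn(120, 2);
Ks = 1:20;
trainLP = zeros(size(Ks));  validLP = zeros(size(Ks));
for i = 1:numel(Ks)
  [mu, s2, pk] = mog_em_sketch(Xtrain, Ks(i), 20);  % fit Ks(i) Gaussians
  trainLP(i) = mog_loglik(Xtrain, mu, s2, pk);
  validLP(i) = mog_loglik(Xvalid, mu, s2, pk);
end
plot(Ks, trainLP, Ks, validLP);
xlabel('number of Gaussians K');  ylabel('log density');
legend('Train log(p)', 'Valid log(p)');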

Page 35:

DEMO: Fitting 4 Gaussians

Page 36:

[Figure: log-likelihood versus number of Gaussians, K = 0-20, with three curves: Train log(p), Valid log(p), and Train log(p) − Valid log(p).]

Training/validation data generated from 4 Gaussians with 30 cases each (i.e., 120 cases in total).

Page 37:

Assignment 3 PART 2 (2 points)

• Change moginit.m to use only 12 cases per Gaussian and 12 axis-aligned Gaussians to generate the data, and repeat the experiment above (without changing the random seeds).

• Present your results as a graph and include a brief statement of what you think the graph shows and why it differs from the graph in PART 1.

Page 38:

[Figure: log-likelihood versus number of Gaussians, K = 0-20, with three curves: Train log(p), Valid log(p), and Train log(p) − Valid log(p).]

Training/validation data generated from 12 Gaussians with 12 cases each (i.e., 144 cases in total).

Page 39:

Assignment 3 PART 3 (3 points) - some programming

• Change mogem.m so that, in addition to fitting the means and axis-aligned variances, it also fits the mixing proportions (Hint: Eq 5).

• Currently, mogem does not mention mixing proportions, so it implicitly assumes they are all equal (which makes them cancel out when computing the posterior probability of each Gaussian for each datapoint).

• So the first thing to do is to include mixing proportions when computing the posterior, but keep them fixed (and not all equal).

Page 40:

Assignment 3 PART 3 (3 points) - some programming

• Associate the code in mogem.m with the EM algorithm on slide 27 or in Lecture 14 (p9 - 12).

• Add/modify the code at the three commented places in mogem.m (a hedged sketch of the mixing-proportion change follows below).
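
As a sketch of what "include mixing proportions when computing the posterior" amounts to (variable names here are ours; adapt them to mogem.m's own): given an N-by-K matrix of Gaussian densities, weight each column by its proportion before normalizing, and, once that is debugged, re-estimate the proportions from the responsibilities.

% Hypothetical helper showing the two PART 3 changes (names are ours).
% p(:, k) holds N(x_n | mu_k, Sigma_k) for every data point n.
function [post, mixpropNew] = posterior_with_mixprop(p, mixprop)
  weighted   = p .* mixprop;                  % Eq 2 numerator: pi_k * N(...)
  post       = weighted ./ sum(weighted, 2);  % responsibilities gamma(z_nk)
  mixpropNew = sum(post, 1) / size(post, 1);  % Eq 5 M-step: pi_k = N_k / N
end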

Page 41:

Improvement after adding mixing proportion

[Figure: bar plot of log p for the conditions Train−Mixprop, Train+Mixprop, Test−Mixprop and Test+Mixprop (y-axis roughly −50 to 100), showing the improvement named in the title.]

Page 42:

Initial and final mixing proportions

Once you have debugged this, try learning the mixing proportions. Make mogem print out the final mixing proportions, and hand in several different examples of the final mixing proportions you get when fitting 4 Gaussians to the data, starting from mixing proportions [0.25, 0.25, 0.25, 0.25], [0.3, 0.2, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4] and [0.9, 0.025, 0.025, 0.05]. Do not change the data. NB: you may get different results if your nIter and sdinit differ from those used below. Please report nIter and sdinit when reporting the final mixing proportions in your report.

[Figure: two horizontal bar charts of the four mixing proportions, "Initial Mixing proportions" and "Final Mixing proportions"; x-axis "Mixing proportion" from 0 to 1, one bar per Gaussian 1-4.]

Other initial parameters: k = 4; nIter = 20; sdinit = 0.15.

Use barh in Matlab to generate this plot.
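
For instance (the proportion values below are placeholders, not actual results):

% Example barh layout for the report; values are placeholders, not results.
initProp  = [0.25 0.25 0.25 0.25];   % one of the required starting points
finalProp = [0.2 0.3 0.1 0.4];       % stand-in for the learned proportions
subplot(1, 2, 1); barh(initProp);  title('Initial Mixing proportions');
xlabel('Mixing proportion');  xlim([0 1]);
subplot(1, 2, 2); barh(finalProp); title('Final Mixing proportions');
xlabel('Mixing proportion');  xlim([0 1]);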

Page 43:

ASSIGNMENT 3: due on March 18 at 3pm