Lecture 2: Generative Learning
Tuo Zhao
Schools of ISYE and CSE, Georgia Tech
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Generative Learning
Tuo Zhao — Lecture 2: Generative Learning 2/47
Modeling Dogs
Modeling Cats
Discriminative Learning
(Figure: side-by-side comparison of generative and discriminative approaches.)
Which One is Better for Classification?
Joint and Posterior Distributions
We consider a binary classification problem:
Feature: X ∈ R^d
Response: Y ∈ {0, 1}
Class Prior: P(Y = 1) = p and P(Y = 0) = 1 − p
Posterior: Conditional Probability of Y Given X, i.e.,

P(Y | X) = P(Y) P(X | Y) / P(X).
Discriminative Learning
Posterior is sufficient for prediction:
ŷ = argmax_y P(Y = y | X = x)
  = argmax_y P(Y = y) P(X = x | Y = y) / P(X = x)
  = argmax_y P(Y = y) P(X = x | Y = y)
  = argmax_y P(X = x, Y = y).
Which one to model?
Joint Distribution? Conditional Distribution?
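A minimal sketch of this prediction rule, with hypothetical one-dimensional class-conditional Gaussians (the parameter values are made up for illustration): since P(X = x) does not depend on y, comparing the joint probabilities P(Y = y) P(X = x | Y = y) suffices.

```python
import math

# Hypothetical one-dimensional generative model (illustration only):
# class prior p and two class-conditional Gaussian densities.
p = 0.3  # P(Y = 1)

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density.
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def predict(x, mu0=-1.0, mu1=2.0, sigma=1.0):
    # argmax_y P(Y = y) * P(X = x | Y = y); the evidence P(X = x) cancels.
    joint0 = (1 - p) * gauss_pdf(x, mu0, sigma)
    joint1 = p * gauss_pdf(x, mu1, sigma)
    return 1 if joint1 > joint0 else 0
```

Points near µ1 = 2 are assigned to class 1, points near µ0 = −1 to class 0, with the prior shifting the boundary.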
Gaussian Discriminant Analysis
Gaussian Discriminant Analysis
Multivariate Gaussian Distribution: X ∼ N(µ,Σ)
Probability Density Function
f(x; µ, Σ) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) )
Expectation: EX = µ
Covariance: E(X − µ)(X − µ)> = Σ
Standard Gaussian Distribution: µ = 0 and Σ = Id.
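The density formula above can be evaluated directly. This NumPy sketch (not from the slides) checks that the standard Gaussian in d = 2 has density 1/(2π) at the origin.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    # f(x; mu, Sigma) = exp(-0.5 (x-mu)^T Sigma^{-1} (x-mu)) / ((2 pi)^{d/2} |Sigma|^{1/2})
    d = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)  # (x - mu)^T Sigma^{-1} (x - mu)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm_const

# Standard Gaussian: mu = 0, Sigma = I_d, evaluated at the origin.
val = mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2))  # equals 1 / (2 pi) for d = 2
```

Using `np.linalg.solve` instead of explicitly inverting Σ is the usual numerically safer choice.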
The covariance can also be defined as Cov(Z) = E[ZZᵀ] − (E[Z])(E[Z])ᵀ. (You should be able to prove to yourself that these two definitions are equivalent.) If X ∼ N(µ, Σ), then

Cov(X) = Σ.

Here are some examples of what the density of a Gaussian distribution looks like:
(Figure: three surface plots of two-dimensional Gaussian densities over [−3, 3]².)
The left-most figure shows a Gaussian with mean zero (that is, the 2×1 zero vector) and covariance matrix Σ = I (the 2×2 identity matrix). A Gaussian with zero mean and identity covariance is also called the standard normal distribution. The middle figure shows the density of a Gaussian with zero mean and Σ = 0.6I, and the rightmost figure shows one with Σ = 2I. We see that as Σ becomes larger, the Gaussian becomes more "spread out," and as it becomes smaller, the distribution becomes more "compressed."
Let’s look at some more examples.
(Figure: three more surface plots of two-dimensional Gaussian densities.)
The figures above show Gaussians with mean 0 and with covariance matrices, respectively,
Σ = [1, 0; 0, 1];   Σ = [1, 0.5; 0.5, 1];   Σ = [1, 0.8; 0.8, 1].
The leftmost figure shows the familiar standard normal distribution, and we see that as we increase the off-diagonal entry in Σ, the density becomes more "compressed" toward the 45° line (given by x1 = x2). We can see this more clearly when we look at the contours of the same three densities:
(Figure: contour plots of the same three densities.)
Here’s one last set of examples generated by varying Σ:
(Figure: contour plots of three densities with varying Σ.)
The plots above used, respectively,
Σ = [1, −0.5; −0.5, 1];   Σ = [1, −0.8; −0.8, 1];   Σ = [3, 0.8; 0.8, 1].
From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes "compressed" again, but in the opposite direction. Lastly, as we vary the parameters, more generally the contours will form ellipses (the rightmost figure showing an example).

As our last set of examples, fixing Σ = I and varying µ, we can also move the mean of the density around.
(Figure: three surface plots of Gaussian densities with identity covariance and different means.)
The figures above were generated using Σ = I, and respectively
µ = [1; 0];   µ = [−0.5; 0];   µ = [−1; −1.5].
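The qualitative behavior above can be reproduced without the plots: draw samples with a Cholesky factor and check the empirical covariance. A sketch; NumPy, the seed, and the sample size are my choices, not the lecture's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw samples from N(0, Sigma) for one of the covariances above and
# check that the empirical covariance recovers Sigma.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
L = np.linalg.cholesky(Sigma)            # Sigma = L L^T
z = rng.standard_normal((100_000, 2))    # standard normal samples
x = z @ L.T                              # x ~ N(0, Sigma)
emp = np.cov(x, rowvar=False)
# emp is close to Sigma; the large positive off-diagonal entry is what
# "compresses" the density toward the line x1 = x2.
```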
Gaussian Discriminant Analysis
Generative View:
Y ∼ Bernoulli(p)
P(Y = y) = p^y (1 − p)^{1−y}

P(X | Y = 0) ∼ N(µ0, Σ), i.e.,

P(X = x | Y = 0) = exp( −(1/2) (x − µ0)ᵀ Σ⁻¹ (x − µ0) ) / ((2π)^{d/2} |Σ|^{1/2})

P(X | Y = 1) ∼ N(µ1, Σ), i.e.,

P(X = x | Y = 1) = exp( −(1/2) (x − µ1)ᵀ Σ⁻¹ (x − µ1) ) / ((2π)^{d/2} |Σ|^{1/2})
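The generative view is literally a sampling recipe: draw Y from Bernoulli(p), then draw X from the corresponding Gaussian. A sketch with assumed parameter values (d = 2, shared Σ; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative GDA parameters (assumed for the sketch).
p = 0.4
mu0 = np.array([-1.0, 0.0])
mu1 = np.array([1.5, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

def sample(n):
    # Generative view: Y ~ Bernoulli(p), then X | Y ~ N(mu_Y, Sigma).
    y = rng.binomial(1, p, size=n)
    means = np.where(y[:, None] == 1, mu1, mu0)
    x = means + rng.multivariate_normal(np.zeros(2), Sigma, size=n)
    return x, y

x, y = sample(10_000)
# The fraction of y = 1 samples concentrates around p.
```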
Generative Learning
Maximum Likelihood Estimation:
L(p, µ0, µ1, Σ) = log ∏_{i=1}^{n} f(xi, yi; p, µ0, µ1, Σ)
               = log ∏_{i=1}^{n} h(xi | yi; p, µ0, µ1, Σ) g(yi; p).
By maximizing ℓ with respect to the parameters, we find the maximum likelihood estimate of the parameters (see problem set 1) to be:

φ = (1/m) Σ_{i=1}^{m} 1{y^(i) = 1}

µ0 = ( Σ_{i=1}^{m} 1{y^(i) = 0} x^(i) ) / ( Σ_{i=1}^{m} 1{y^(i) = 0} )

µ1 = ( Σ_{i=1}^{m} 1{y^(i) = 1} x^(i) ) / ( Σ_{i=1}^{m} 1{y^(i) = 1} )

Σ = (1/m) Σ_{i=1}^{m} (x^(i) − µ_{y^(i)})(x^(i) − µ_{y^(i)})ᵀ.
Pictorially, what the algorithm is doing can be seen as follows:

(Figure: training set with the contours of the two fitted Gaussian distributions and the linear decision boundary.)
Shown in the figure are the training set, as well as the contours of the two Gaussian distributions that have been fit to the data in each of the two classes. Note that the two Gaussians have contours that are the same shape and orientation, since they share a covariance matrix Σ, but they have different means µ0 and µ1. Also shown in the figure is the straight line giving the decision boundary at which p(y = 1|x) = 0.5. On one side of the boundary, we'll predict y = 1 to be the most likely outcome, and on the other side, we'll predict y = 0.
1.3 Discussion: GDA and logistic regression

The GDA model has an interesting relationship to logistic regression. If we view the quantity p(y = 1|x; φ, µ0, µ1, Σ) as a function of x, we'll find that it can be expressed in the form p(y = 1|x) = 1/(1 + exp(−θᵀx)), where θ is an appropriate function of φ, µ0, µ1, Σ. This is exactly the form of a logistic regression model.
Generative Learning
Maximum Likelihood Estimation:

L(p, µ0, µ1, Σ) = log ∏_{i=1}^{n} h(xi | yi; p, µ0, µ1, Σ) g(yi; p).

Convex Minimization:

µ0 = ( Σ_{i=1}^{n} xi (1 − yi) ) / ( n − Σ_{i=1}^{n} yi )  and  µ1 = ( Σ_{i=1}^{n} xi yi ) / ( Σ_{i=1}^{n} yi ),

p = ( Σ_{i=1}^{n} yi ) / n  and  Σk = (1/nk) Σ_{i: yi = k} (xi − µ_{yi})(xi − µ_{yi})ᵀ.

d(d + 1) + 2d + 1 parameters to estimate.
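These closed-form estimators translate directly into code. A sketch following the slide's formulas (per-class Σk), with a tiny made-up dataset as a check; the function name and data are mine.

```python
import numpy as np

def gda_mle(x, y):
    # Closed-form MLE from the slide: p, class means, per-class covariances.
    n = len(y)
    p_hat = y.sum() / n                       # p = (sum_i y_i) / n
    mu0 = x[y == 0].mean(axis=0)              # sum_i x_i (1 - y_i) / (n - sum_i y_i)
    mu1 = x[y == 1].mean(axis=0)              # sum_i x_i y_i / sum_i y_i
    Sigmas = {}
    for k, mu in ((0, mu0), (1, mu1)):
        diff = x[y == k] - mu
        # (1/n_k) sum_{y_i = k} (x_i - mu_k)(x_i - mu_k)^T
        Sigmas[k] = diff.T @ diff / len(diff)
    return p_hat, mu0, mu1, Sigmas

# Made-up toy data (illustration only).
x = np.array([[0.0, 0.0], [2.0, 2.0], [1.0, 1.0], [3.0, 3.0]])
y = np.array([0, 0, 1, 1])
p_hat, mu0, mu1, Sigmas = gda_mle(x, y)
```

No iterative optimizer is needed: every estimate is a sample average over the relevant class.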
Gaussian Discriminant Analysis
Prediction: Given X ∈ R^d, we predict

Ŷ = argmax_{Y ∈ {0,1}} P(Y | X; p, µ0, µ1, Σ).

Since we have [Analytical Problem in HW3]

log( P(Y = 1|X) / (1 − P(Y = 1|X)) )
    = −(1/2) (µ1 + µ0)ᵀ Σ⁻¹ (µ1 − µ0) + (µ1 − µ0)ᵀ Σ⁻¹ X + log( p / (1 − p) ),
this is actually a logistic regression model!
But different from maximizing the conditional log likelihood!
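This equivalence is easy to verify numerically: build the logistic weights from the log-odds expression above and compare against the posterior computed directly from Bayes rule. The parameter values here are assumptions for the sketch, not from the homework.

```python
import numpy as np

# Assumed illustrative GDA parameters (d = 2, shared Sigma).
p = 0.3
mu0 = np.array([-1.0, 0.0])
mu1 = np.array([1.0, 0.5])
Sigma = np.array([[1.0, 0.2],
                  [0.2, 1.0]])
Sinv = np.linalg.inv(Sigma)

# Log-odds from the slide: linear in x, with weights w and intercept b.
w = Sinv @ (mu1 - mu0)
b = -0.5 * (mu1 + mu0) @ Sinv @ (mu1 - mu0) + np.log(p / (1 - p))

def posterior_direct(x):
    # P(Y = 1 | x) via Bayes rule with the Gaussian class conditionals;
    # the shared normalizing constant cancels in the ratio.
    def dens(mu):
        diff = x - mu
        return np.exp(-0.5 * diff @ Sinv @ diff)
    num = p * dens(mu1)
    return num / (num + (1 - p) * dens(mu0))

x = np.array([0.7, -0.3])
sigmoid = 1 / (1 + np.exp(-(w @ x + b)))
# sigmoid matches posterior_direct(x) up to floating-point error.
```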
GDA vs. Logistic Regression
Gaussian Discriminant Analysis
Modeling Assumption: Terrible
d(d+ 1)/2 + 2d+ 1 parameters: Terrible
Simple with a closed form solution: Not very useful!
Logistic Regression
Modeling Assumption: More Robust!
d parameters: Fewer!
Need an iterative optimization algorithm: Not bad!
Naive Bayes Classification
Naive Bayes Gaussian Discriminant Analysis
Generative View:
Y ∼ Bernoulli(p)
P(X|Y = 0) ∼ N(µ0,Σ)
P(X|Y = 1) ∼ N(µ1,Σ)
Σ = diag(σ1^2, σ2^2, …, σd^2)
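With a diagonal Σ, the features are modeled as conditionally independent given Y, so only d variances need estimating instead of d(d + 1)/2 covariance entries. A minimal sketch; the function names and toy data are mine.

```python
import numpy as np

def nb_gda_fit(x, y):
    # Naive Bayes GDA: class prior, class means, and pooled per-feature
    # variances (the diagonal of the shared covariance).
    p = y.mean()
    mu0, mu1 = x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)
    resid = np.where(y[:, None] == 1, x - mu1, x - mu0)
    var = (resid ** 2).mean(axis=0)          # sigma_1^2, ..., sigma_d^2
    return p, mu0, mu1, var

def nb_gda_predict(x, p, mu0, mu1, var):
    # Log joint for each class; the (2 pi var) normalizers are shared and cancel.
    ll0 = -0.5 * ((x - mu0) ** 2 / var).sum(axis=1) + np.log(1 - p)
    ll1 = -0.5 * ((x - mu1) ** 2 / var).sum(axis=1) + np.log(p)
    return (ll1 > ll0).astype(int)

# Made-up, well-separated toy data (illustration only).
x = np.array([[-2.0, -2.0], [-1.5, -2.5], [2.0, 2.0], [2.5, 1.5]])
y = np.array([0, 0, 1, 1])
p, mu0, mu1, var = nb_gda_fit(x, y)
preds = nb_gda_predict(x, p, mu0, mu1, var)
```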
4
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
Here’s one last set of examples generated by varying Σ:
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
The plots above used, respectively,
Σ =
!1 -0.5
-0.5 1
"; Σ =
!1 -0.8
-0.8 1
"; .Σ =
!3 0.8
0.8 1
".
From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes “com-pressed” again, but in the opposite direction. Lastly, as we vary the pa-rameters, more generally the contours will form ellipses (the rightmost figureshowing an example).
As our last set of examples, fixing Σ = I, by varying µ, we can also movethe mean of the density around.
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
The figures above were generated using Σ = I, and respectively
µ =
!10
"; µ =
!-0.50
"; µ =
!-1
-1.5
".
4
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
Here’s one last set of examples generated by varying Σ:
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
The plots above used, respectively,
Σ =
!1 -0.5
-0.5 1
"; Σ =
!1 -0.8
-0.8 1
"; .Σ =
!3 0.8
0.8 1
".
From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes “com-pressed” again, but in the opposite direction. Lastly, as we vary the pa-rameters, more generally the contours will form ellipses (the rightmost figureshowing an example).
As our last set of examples, fixing Σ = I, by varying µ, we can also movethe mean of the density around.
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
The figures above were generated using Σ = I, and respectively
µ =
!10
"; µ =
!-0.50
"; µ =
!-1
-1.5
".
Tuo Zhao — Lecture 2: Generative Learning 19/47
![Page 40: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/40.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Naive Bayes Gaussian Discriminant Analysis
Generative View:
Y ∼ Bernoulli(p)
P(X|Y = 0) ∼ N(µ0,Σ)
P(X|Y = 1) ∼ N(µ1,Σ)
Σ =
σ21σ22
. . .
σ2d
4
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
Here’s one last set of examples generated by varying Σ:
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
The plots above used, respectively,
Σ =
!1 -0.5
-0.5 1
"; Σ =
!1 -0.8
-0.8 1
"; .Σ =
!3 0.8
0.8 1
".
From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes “com-pressed” again, but in the opposite direction. Lastly, as we vary the pa-rameters, more generally the contours will form ellipses (the rightmost figureshowing an example).
As our last set of examples, fixing Σ = I, by varying µ, we can also movethe mean of the density around.
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
The figures above were generated using Σ = I, and respectively
µ =
!10
"; µ =
!-0.50
"; µ =
!-1
-1.5
".
4
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
Here’s one last set of examples generated by varying Σ:
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
−3 −2 −1 0 1 2 3−3
−2
−1
0
1
2
3
The plots above used, respectively,
Σ =
!1 -0.5
-0.5 1
"; Σ =
!1 -0.8
-0.8 1
"; .Σ =
!3 0.8
0.8 1
".
From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes “com-pressed” again, but in the opposite direction. Lastly, as we vary the pa-rameters, more generally the contours will form ellipses (the rightmost figureshowing an example).
As our last set of examples, fixing Σ = I, by varying µ, we can also movethe mean of the density around.
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
−3−2
−10
12
3
−3−2
−10
12
3
0.05
0.1
0.15
0.2
0.25
The figures above were generated using Σ = I, and respectively
µ =
!10
"; µ =
!-0.50
"; µ =
!-1
-1.5
".
Tuo Zhao — Lecture 2: Generative Learning 19/47
![Page 41: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/41.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Naive Bayes Gaussian Discriminant Analysis
Generative View:
Y ∼ Bernoulli(p)
P(X|Y = 0) ∼ N(µ0,Σ)
P(X|Y = 1) ∼ N(µ1,Σ)
Σ =
σ21σ22
. . .
σ2d
Tuo Zhao — Lecture 2: Generative Learning 19/47
![Page 43: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/43.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Naive Bayes Gaussian Discriminant Analysis
Conditional Independence:
P(X|Y) = ∏_{j=1}^{d} P(X_j|Y) = P(X₁|Y) P(X₂|Y) · · · P(X_d|Y)
A Simpler Decision Rule:
P(Y = 1|X) = P(X|Y = 1) P(Y = 1) / P(X)

= [∏_{j=1}^{d} P(X_j|Y = 1) P(Y = 1)] / [∏_{j=1}^{d} P(X_j|Y = 1) P(Y = 1) + ∏_{j=1}^{d} P(X_j|Y = 0) P(Y = 0)]
Tuo Zhao — Lecture 2: Generative Learning 20/47
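The decision rule above can be sketched numerically. A minimal illustration (not the lecture's code), assuming the per-feature likelihoods P(X_j|Y = k) have already been evaluated:

```python
import numpy as np

def naive_bayes_posterior(lik1, lik0, p1):
    """P(Y=1|X) from per-feature likelihoods P(X_j|Y=k) and prior p1 = P(Y=1)."""
    num = np.prod(lik1) * p1                 # prod_j P(X_j|Y=1) * P(Y=1)
    den = num + np.prod(lik0) * (1.0 - p1)   # + prod_j P(X_j|Y=0) * P(Y=0)
    return num / den

# Toy example with d = 2 features and a uniform prior p = 0.5.
post = naive_bayes_posterior([0.8, 0.6], [0.2, 0.4], 0.5)
```

In practice the products are computed as sums of log-likelihoods to avoid underflow when d is large.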
![Page 45: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/45.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Naive Bayes Gaussian Discriminant Analysis
Maximum Likelihood Estimation:
µ₀ = [∑_{i=1}^{n} x_i (1 − y_i)] / [n − ∑_{i=1}^{n} y_i] and µ₁ = [∑_{i=1}^{n} x_i y_i] / [∑_{i=1}^{n} y_i]

p = (1/n) ∑_{i=1}^{n} y_i and σ_j² = (1/n) ∑_{i=1}^{n} (x_{i,j} − µ_{y_i,j})²
3d+ 1 parameters to estimate.
Tuo Zhao — Lecture 2: Generative Learning 21/47
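A minimal NumPy sketch of these estimators (illustrative, not the lecture's code; the data here is synthetic):

```python
import numpy as np

def nb_gda_mle(X, y):
    """MLE for Naive Bayes GDA: class means, per-feature variances, prior."""
    n, d = X.shape
    p = y.mean()                                      # p = (1/n) sum_i y_i
    mu0 = X[y == 0].mean(axis=0)                      # class-conditional means
    mu1 = X[y == 1].mean(axis=0)
    resid = X - np.where(y[:, None] == 1, mu1, mu0)   # x_{i,j} - mu_{y_i, j}
    sigma2 = (resid ** 2).mean(axis=0)                # pooled per-feature variance
    return p, mu0, mu1, sigma2                        # 3d + 1 parameters in total

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 3)) + 2.0 * y[:, None]      # class 1 shifted by +2
p, mu0, mu1, s2 = nb_gda_mle(X, y)
```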
![Page 48: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/48.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Naive Bayes Gaussian Discriminant Analysis
Missing values?
Example: X = (X₁, ..., X_{d−1}), with X_d unobserved

P(Y = 1|X) = P(X|Y = 1) P(Y = 1) / P(X)

= [∏_{j=1}^{d−1} P(X_j|Y = 1) P(Y = 1)] / [∏_{j=1}^{d−1} P(X_j|Y = 1) P(Y = 1) + ∏_{j=1}^{d−1} P(X_j|Y = 0) P(Y = 0)]
Tuo Zhao — Lecture 2: Generative Learning 22/47
![Page 49: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/49.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
GDA vs. Naive Bayes GDA
Gaussian Discriminant Analysis
Stronger Modeling Assumption: Terrible
d(d+ 1)/2 + 2d+ 1 parameters: Terrible
A simple closed form solution: Not very useful!
Naive Bayes GDA
Even Stronger Modeling Assumption: Terrible!
3d+ 1 parameters: Good!
A super simple closed form solution: Useful sometimes!
Tuo Zhao — Lecture 2: Generative Learning 23/47
![Page 50: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/50.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Naive Bayes Bernoulli Discriminant Analysis
Generative View:
Y ∼ Bernoulli(p)
P(Xj |Y = 0) ∼ Bernoulli(γ(0)j )
P(Xj |Y = 1) ∼ Bernoulli(γ(1)j )
Conditional Independence:
P(X|Y) = ∏_{j=1}^{d} P(X_j|Y) = P(X₁|Y) P(X₂|Y) · · · P(X_d|Y)
Tuo Zhao — Lecture 2: Generative Learning 24/47
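A small sketch of the Bernoulli model's MLE and posterior (illustrative; the toy data and function names are ours, and no Laplace smoothing is applied):

```python
import numpy as np

def bernoulli_nb_fit(X, y):
    """MLE: gamma_j^(k) = P(X_j = 1 | Y = k) and class prior p = P(Y = 1)."""
    p = y.mean()
    gamma0 = X[y == 0].mean(axis=0)
    gamma1 = X[y == 1].mean(axis=0)
    return p, gamma0, gamma1

def bernoulli_nb_posterior(x, p, gamma0, gamma1):
    """P(Y = 1 | x) for a binary vector x, using conditional independence."""
    l1 = np.prod(gamma1 ** x * (1 - gamma1) ** (1 - x)) * p
    l0 = np.prod(gamma0 ** x * (1 - gamma0) ** (1 - x)) * (1 - p)
    return l1 / (l1 + l0)

X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
p, g0, g1 = bernoulli_nb_fit(X, y)
post = bernoulli_nb_posterior(np.array([1, 0]), p, g0, g1)
```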
![Page 54: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/54.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Naive Bayes Poisson Discriminant Analysis
Generative View:
Y ∼ Bernoulli(p)
P(Xj |Y = 0) ∼ Poisson(λ(0)j )
P(Xj |Y = 1) ∼ Poisson(λ(1)j )
Conditional Independence:
P(X|Y) = ∏_{j=1}^{d} P(X_j|Y) = P(X₁|Y) P(X₂|Y) · · · P(X_d|Y)
Tuo Zhao — Lecture 2: Generative Learning 25/47
![Page 58: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/58.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Example: Spam Email Classification
Data Set:
4601 email messages
Goal: predict whether an email message is spam or good.
Features: the frequencies in a message of 48 of the most commonly occurring words in all these email messages.
We coded spam as 1 and email as 0.
Tuo Zhao — Lecture 2: Generative Learning 26/47
![Page 63: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/63.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Example: Spam Email Classification
Transforming Features:
Naive Bayes GDA:
Relative Frequency of “free” = (# of “free” in this email) / (# of all words in this email)
Naive Bayes Bernoulli DA:
Indicator of “free” = 1 if “free” appears in this email
Naive Bayes Poisson DA: No transformation needed
Coding Problem in HW2.
Tuo Zhao — Lecture 2: Generative Learning 27/47
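The three feature transformations can be sketched in a few NumPy lines (toy counts; the variable names are ours):

```python
import numpy as np

counts = np.array([[3, 0, 1],
                   [0, 2, 5]])   # toy word counts: rows = emails, cols = words

rel_freq  = counts / counts.sum(axis=1, keepdims=True)  # Naive Bayes GDA input
indicator = (counts > 0).astype(int)                    # Bernoulli DA input
# Poisson DA needs no transformation: it models the raw counts directly.
```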
![Page 67: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/67.jpg)
Multiclass Fisher Discriminant Analysis
![Page 68: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/68.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Revisiting GDA
A Dimensionality Reduction Perspective:
Between-Class Scatter Matrix:
Γ = ∑_{k=0,1} (n_k/n) (µ_k − µ)(µ_k − µ)ᵀ,

where

µ = (1/n) ∑_{i=1}^{n} x_i, n₁ = ∑_{i=1}^{n} y_i and n₀ = n − n₁.
Rayleigh Quotient Formulation
w = argmax_w (wᵀΓw) / (wᵀΣw) = argmax_w wᵀΓw s.t. wᵀΣw = 1.
Tuo Zhao — Lecture 2: Generative Learning 29/47
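A short NumPy sketch of the between-class scatter matrix for binary labels (illustrative; synthetic data). Since n₀(µ₀ − µ) = −n₁(µ₁ − µ), Γ has rank one in the binary case:

```python
import numpy as np

def between_class_scatter(X, y):
    """Gamma = sum_{k=0,1} (n_k/n) (mu_k - mu)(mu_k - mu)^T."""
    n, d = X.shape
    mu = X.mean(axis=0)                    # overall mean
    Gamma = np.zeros((d, d))
    for k in (0, 1):
        Xk = X[y == k]
        diff = Xk.mean(axis=0) - mu        # class mean minus overall mean
        Gamma += (len(Xk) / n) * np.outer(diff, diff)
    return Gamma

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
X = rng.normal(size=(100, 3)) + y[:, None]
Gamma = between_class_scatter(X, y)
```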
![Page 70: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/70.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
FDA and Dimension Reduction
Tuo Zhao — Lecture 2: Generative Learning 30/47
![Page 71: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/71.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Multiclass Fisher Discriminant Analysis
Generative View:
Y ∼ Discrete(p₁, ..., p_m) with ∑_{k=1}^{m} p_k = 1
P(X|Y = k) ∼ N(µk,Σ)
Between-Class Scatter Matrix:
Γ = (1/m) ∑_{k=1}^{m} (n_k/n) (µ_k − µ)(µ_k − µ)ᵀ with n_k = ∑_{i=1}^{n} 1(y_i = k).
Rayleigh Quotient Formulation
W = argmax_{W ∈ ℝ^{d×r}} trace(WᵀΓW) s.t. WᵀΣW = I_r.
Tuo Zhao — Lecture 2: Generative Learning 31/47
![Page 76: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/76.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Multiclass Fisher Discriminant Analysis
Tuo Zhao — Lecture 2: Generative Learning 32/47
![Page 77: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/77.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Eigenvalue Problem (Rank-1)
Rayleigh Quotient Formulation
w = argmax_{w ∈ ℝ^d} wᵀAw s.t. wᵀw = 1.

Lagrange Multiplier Method: λ ∈ ℝ

L(w, λ) = wᵀAw − λ(wᵀw − 1).
We only need eigenvectors of A, since
∇_w L(w, λ) = 2Aw − 2λw = 0,

∇_λ L(w, λ) = wᵀw − 1 = 0.
Tuo Zhao — Lecture 2: Generative Learning 33/47
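Assuming a symmetric A, the stationarity condition Aw = λw says the maximizer is the top eigenvector of A; a quick NumPy check (illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
A = B @ B.T                       # symmetric PSD test matrix

vals, vecs = np.linalg.eigh(A)    # stationary points of the Lagrangian
w = vecs[:, -1]                   # top eigenvector (unit norm)
rayleigh = w @ A @ w              # attains the maximum: the top eigenvalue
```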
![Page 80: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/80.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Eigenvalue Problem (Rank-r)
Rayleigh Quotient Formulation
U = argmax_{U ∈ ℝ^{d×r}} trace(UᵀAU) s.t. UᵀU = I_r.

Lagrange Multiplier Method: Λ ∈ ℝ^{r×r} and Λ = Λᵀ

L(U, Λ) = trace(UᵀAU) − trace(Λᵀ(UᵀU − I_r)).
We only need eigenvectors of A, since
∇_U L(U, Λ) = 2AU − 2UΛ = 0,

∇_Λ L(U, Λ) = UᵀU − I_r = 0.
Tuo Zhao — Lecture 2: Generative Learning 34/47
![Page 83: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/83.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Generalized Eigenvalue Problem
Rayleigh Quotient Formulation
W = argmax_{W ∈ ℝ^{d×r}} trace(WᵀΓW) s.t. WᵀΣW = I_r.

Substituting U = Σ^{1/2} W:

U = argmax_{U ∈ ℝ^{d×r}} trace(UᵀAU) s.t. UᵀU = I_r,

where A = Σ^{−1/2} Γ Σ^{−1/2}.
Eigenvalue Problem!
Tuo Zhao — Lecture 2: Generative Learning 35/47
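A sketch of the whitening substitution in NumPy (synthetic Σ and Γ; Σ^{−1/2} is formed from the eigendecomposition of Σ):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
Sigma = M @ M.T + 3 * np.eye(3)        # positive definite (within) matrix
v = rng.normal(size=3)
Gamma = np.outer(v, v)                 # rank-1 (between) matrix

# Sigma^{-1/2} from the eigendecomposition of Sigma.
d, Q = np.linalg.eigh(Sigma)
S_inv_half = Q @ np.diag(d ** -0.5) @ Q.T
A = S_inv_half @ Gamma @ S_inv_half    # ordinary symmetric eigenvalue problem

lam, U = np.linalg.eigh(A)
W = S_inv_half @ U[:, -1:]             # map back: W = Sigma^{-1/2} U
```

The recovered W satisfies the generalized eigenvalue equation ΓW = λΣW with WᵀΣW = 1.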
![Page 86: Lecture 2: Generative Learningtzhao80/Lectures/Lecture_2.pdf · Lecture 2: Generative Learning Tuo Zhao Schools of ISYE and CSE, Georgia Tech](https://reader034.vdocuments.net/reader034/viewer/2022042310/5ed82a150fa3e705ec0df49f/html5/thumbnails/86.jpg)
CS7641/ISYE/CSE 6740: Machine Learning/Computational Data Analysis
Eigenvalue Problem
Power Iteration:
U^{(t+1)} = QR(Θ U^{(t)}), where Θ = Σ^{−1/2} Γ Σ^{−1/2}.

When r = 1, we have

u^{(t+1)} = Θ u^{(t)} / ‖Θ u^{(t)}‖₂.

We need T = O(gap · log(1/ε)) iterations to guarantee

|uᵀu^{(T)}| ≥ 1 − ε, where gap = λ₁/(λ₁ − λ₂).
Tuo Zhao — Lecture 2: Generative Learning 36/47
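A minimal rank-1 power iteration in NumPy (illustrative; the diagonal Θ makes the limit obvious):

```python
import numpy as np

def power_iteration(Theta, T=200, seed=0):
    """u <- Theta u / ||Theta u||_2, converging to the top eigenvector."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=Theta.shape[0])
    u /= np.linalg.norm(u)                 # random unit-norm start
    for _ in range(T):
        u = Theta @ u
        u /= np.linalg.norm(u)             # renormalize each iterate
    return u

Theta = np.diag([3.0, 1.0, 0.5])           # top eigenvector is e_1
u = power_iteration(Theta)
```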
Quadratic Discriminant Analysis

Generative View:

Y ∼ Bernoulli(p):  P(Y = y) = p^y (1 − p)^{1−y}

X | Y = 0 ∼ N(µ_0, Σ_0):

P(X = x | Y = 0) = exp(−(1/2)(x − µ_0)^T Σ_0^{-1} (x − µ_0)) / ((2π)^{d/2} |Σ_0|^{1/2})

X | Y = 1 ∼ N(µ_1, Σ_1):

P(X = x | Y = 1) = exp(−(1/2)(x − µ_1)^T Σ_1^{-1} (x − µ_1)) / ((2π)^{d/2} |Σ_1|^{1/2})
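To classify with this generative model, combine the class-conditional densities with the prior via Bayes' rule. A small sketch (my own illustration; the parameter values are arbitrary placeholders):

```python
import numpy as np

# Evaluate the QDA class-conditional Gaussian densities and the
# posterior P(Y = 1 | X = x) via Bayes' rule.
def gaussian_pdf(x, mu, Sigma):
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)   # (x - mu)^T Sigma^{-1} (x - mu)
    return np.exp(-0.5 * quad) / ((2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5)

p = 0.4                                      # prior P(Y = 1), placeholder value
mu0, Sigma0 = np.zeros(2), np.eye(2)
mu1, Sigma1 = np.ones(2), 2 * np.eye(2)

x = np.array([0.9, 1.1])
num = p * gaussian_pdf(x, mu1, Sigma1)
den = num + (1 - p) * gaussian_pdf(x, mu0, Sigma0)
posterior = num / den                        # P(Y = 1 | X = x)
print(posterior)
```

Predicting the class with the larger posterior gives the QDA classification rule; because Σ_0 ≠ Σ_1, the resulting decision boundary is quadratic in x.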
Quadratic Discriminant Analysis

Maximum Likelihood Estimation:

L(p, µ_0, µ_1, Σ_0, Σ_1) = log ∏_{i=1}^n h(x_i | y_i; p, µ_0, µ_1, Σ_0, Σ_1) g(y_i; p).

Convex minimization with closed-form solutions:

µ_0 = (∑_{i=1}^n x_i (1 − y_i)) / (n − ∑_{i=1}^n y_i)  and  µ_1 = (∑_{i=1}^n x_i y_i) / (∑_{i=1}^n y_i),

p = (∑_{i=1}^n y_i) / n  and  Σ_k = (1/n_k) ∑_{i: y_i = k} (x_i − µ_k)(x_i − µ_k)^T.

d(d + 1) + 2d + 1 parameters to estimate.
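The closed-form estimators above amount to per-class sample means and covariances plus the label frequency. A sketch on synthetic data (the data-generating shift is a placeholder, not from the slides):

```python
import numpy as np

# Closed-form MLE for QDA: class proportions, per-class means, and
# per-class covariance matrices.
rng = np.random.default_rng(2)
n, d = 200, 3
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, d)) + y[:, None] * 2.0   # class-1 points shifted by 2

p_hat = y.mean()                                     # p = (sum_i y_i) / n
mu0 = X[y == 0].mean(axis=0)                         # mean of class-0 points
mu1 = X[y == 1].mean(axis=0)                         # mean of class-1 points

def cov_mle(Xk, mu):
    # Sigma_k = (1/n_k) sum_{i: y_i = k} (x_i - mu_k)(x_i - mu_k)^T
    Z = Xk - mu
    return Z.T @ Z / len(Xk)

Sigma0 = cov_mle(X[y == 0], mu0)
Sigma1 = cov_mle(X[y == 1], mu1)
print(p_hat, mu0.round(2), mu1.round(2))
```

Note the MLE divides by n_k rather than n_k − 1, so these are the (slightly biased) maximum likelihood covariances rather than the unbiased sample covariances.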
GDA vs. QDA

Gaussian Discriminant Analysis:
- Stronger modeling assumption: terrible!
- d(d + 1)/2 + 2d + 1 parameters: terrible!
- A simple closed-form solution: not very useful!

Quadratic Discriminant Analysis:
- Weaker modeling assumption: still terrible!
- d(d + 1) + 2d + 1 parameters: even more terrible!
- A simple closed-form solution: not very useful!
Multiclass Classification
K-Nearest Neighbor Classification

Classify a point by a majority vote among its K nearest training examples. Very intuitive!
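The majority-vote rule can be sketched in a few lines (my own minimal illustration; the toy training points are placeholders):

```python
import numpy as np

# Minimal K-nearest-neighbor classifier: predict the most common label
# among the K training points closest to the query.
def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]              # indices of the K closest points
    votes = y_train[nearest]
    return np.bincount(votes).argmax()           # majority vote

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.05, 0.1])))   # -> 0
print(knn_predict(X_train, y_train, np.array([0.95, 1.0])))   # -> 1
```

There is no training phase at all: every prediction scans the full training set, which is why KNN is called a lazy or memory-based method.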
Model Complexity?

Is the model more flexible for larger K's? Not really! Larger K averages over more neighbors and smooths the decision boundary; flexibility actually grows as K shrinks (K = 1 gives the most jagged boundary).
Curse of Dimensionality

In high dimensions, pairwise distances concentrate: all points become nearly equidistant from a query, so the "nearest" neighbors are barely closer than the rest and KNN degrades.
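This concentration effect is easy to see in a quick simulation (my own illustration; uniform data and the relative-contrast statistic are assumptions, not from the slides):

```python
import numpy as np

# As dimension grows, the nearest and farthest neighbors of a query
# become nearly equidistant, so "nearest" loses its meaning.
rng = np.random.default_rng(3)

def relative_contrast(d, n=500):
    X = rng.uniform(size=(n, d))                 # n uniform points in [0, 1]^d
    q = rng.uniform(size=d)                      # a random query point
    dists = np.linalg.norm(X - q, axis=1)
    return (dists.max() - dists.min()) / dists.min()

print(relative_contrast(2))     # large: some neighbors are much closer
print(relative_contrast(500))   # small: all points are about equally far
```

The shrinking contrast is exactly what breaks distance-based methods like KNN in high dimensions.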
Local Linear Regression

Build linear regression models using ONLY the neighbors of the query point.
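A minimal sketch of the idea (my own illustration with assumed choices: 1-D inputs and Gaussian kernel weights rather than a hard K-neighbor cutoff):

```python
import numpy as np

# Local linear regression: at each query x0, fit a weighted least-squares
# line where weights favor training points near x0.
def local_linear_predict(x_train, y_train, x0, bandwidth=0.3):
    w = np.exp(-((x_train - x0) ** 2) / (2 * bandwidth ** 2))  # neighbor weights
    A = np.stack([np.ones_like(x_train), x_train - x0], axis=1)
    # Weighted least squares: solve (A^T W A) beta = (A^T W) y.
    AtW = A.T * w
    beta = np.linalg.solve(AtW @ A, AtW @ y_train)
    return beta[0]                     # intercept = fitted value at x0

x_train = np.linspace(0, 2 * np.pi, 50)
y_train = np.sin(x_train)
print(local_linear_predict(x_train, y_train, np.pi / 2))   # close to sin(pi/2) = 1
```

Centering the design matrix at x0 makes the intercept the prediction, which is the standard local-polynomial trick.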
Local Logistic Regression

Build logistic regression models using ONLY the neighbors of the query point.
Multinomial Regression

Given x_1, ..., x_n ∈ R^d, y_1, ..., y_n ∈ {1, 2, ..., m}, and θ*_1, ..., θ*_{m−1} ∈ R^d, for k = 1, ..., m − 1 and i = 1, ..., n,

P(y_i = k) = exp(−x_i^T θ*_k) / (1 + ∑_{j=1}^{m−1} exp(−x_i^T θ*_j)),

P(y_i = m) = 1 / (1 + ∑_{j=1}^{m−1} exp(−x_i^T θ*_j)).

Maximum Likelihood Estimation: still a convex problem.
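The class probabilities above can be evaluated directly (a sketch with placeholder parameter values; the minus sign in the exponent follows the slide's convention, with class m as the reference class):

```python
import numpy as np

# Multinomial regression class probabilities with m - 1 parameter vectors
# and class m as the reference class.
def multinomial_probs(x, thetas):
    # thetas has shape (m - 1, d); returns probabilities for classes 1..m.
    scores = np.exp(-thetas @ x)                   # exp(-x^T theta_k) for k < m
    denom = 1.0 + scores.sum()
    return np.append(scores / denom, 1.0 / denom)  # class m gets 1 / denom

x = np.array([0.5, -1.0])
thetas = np.array([[1.0, 0.0], [0.0, 1.0]])        # m = 3 classes, d = 2 (placeholders)
probs = multinomial_probs(x, thetas)
print(probs, probs.sum())                          # probabilities sum to 1
```

Only m − 1 parameter vectors are needed because the probabilities must sum to one; fixing the reference class removes the redundant degree of freedom.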