
Page 1: Classification: Probabilistic Generative Model

Classification: Probabilistic Generative Model

Page 2: Classification: Probabilistic Generative Model

Classification

• Credit Scoring

• Input: income, savings, profession, age, past financial history ……

• Output: accept or refuse

• Medical Diagnosis

• Input: current symptoms, age, gender, past medical history ……

• Output: which kind of disease

• Handwritten character recognition

• Input: image of a handwritten character, output: which character it is (e.g. 金)

• Face recognition

• Input: image of a face, output: person

Classification is a function: input x → output Class n

Page 3: Classification: Probabilistic Generative Model

Example Application

๐‘“ = ๐‘“ = ๐‘“ =

Page 4: Classification: Probabilistic Generative Model

Example Application

• HP: hit points, or health, defines how much damage a Pokémon can withstand before fainting

• Attack: the base modifier for normal attacks (e.g. Scratch, Punch)

• Defense: the base damage resistance against normal attacks

• SP Atk: special attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam)

• SP Def: the base damage resistance against special attacks

• Speed: determines which Pokémon attacks first each round

Can we predict the "type" of a Pokémon based on this information?

[Example stats shown on the slide: HP 35, Attack 55, Defense 40, SP Atk 50, SP Def 50, Speed 90]

(Pokémon games, NOT Pokémon cards or Pokémon Go)

Page 5: Classification: Probabilistic Generative Model

Example Application

Page 6: Classification: Probabilistic Generative Model

Ideal Alternatives

• Function (Model): $f(x)$: if $g(x) > 0$, output = class 1; else, output = class 2

• Loss function: $L(f) = \sum_n \delta\left(f(x^n) \neq \hat{y}^n\right)$, the number of times f gets incorrect results on the training data

• Find the best function: Example: Perceptron, SVM (Not Today)
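As a quick illustration, here is a minimal NumPy sketch of this 0/1 loss (the function name and example arrays are mine, not from the slides):

```python
import numpy as np

def zero_one_loss(predictions, targets):
    """L(f): the number of training examples on which f gives the wrong class."""
    return int(np.sum(predictions != targets))

# Hypothetical predicted classes vs. ground-truth classes
print(zero_one_loss(np.array([1, 1, 2, 2]), np.array([1, 2, 2, 2])))  # prints 1
```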

Page 7: Classification: Probabilistic Generative Model

How to do Classification

• Training data for Classification: $(x^1, \hat{y}^1), (x^2, \hat{y}^2), \ldots, (x^N, \hat{y}^N)$

Classification as Regression? Binary classification as an example:

Training: Class 1 means the target is 1; Class 2 means the target is -1

Testing: closer to 1 → class 1; closer to -1 → class 2

Page 8: Classification: Probabilistic Generative Model

• Multiple classes: Class 1 means the target is 1; Class 2 means the target is 2; Class 3 means the target is 3 …… This is problematic.

Regression penalizes the examples that are "too correct" … (Bishop, p. 186)

[Figure: two plots in the $(x_1, x_2)$ plane showing Class 1 (target 1) and Class 2 (target -1) points, the regression model $y = b + w_1 x_1 + w_2 x_2$ and its boundary $b + w_1 x_1 + w_2 x_2 = 0$; in the right plot, Class 1 points whose outputs are $\gg 1$ contribute large error, so regression shifts the boundary to decrease that error]

Page 9: Classification: Probabilistic Generative Model

Two Boxes

Box 1 Box 2

P(B1) = 2/3 P(B2) = 1/3

P(Blue|B1) = 4/5

P(Green|B1) = 1/5

P(Blue|B2) = 2/5

P(Green|B2) = 3/5

Given a Blue ball drawn from one of the boxes, where does it come from?

$P(B_1 \mid \text{Blue}) = \frac{P(\text{Blue} \mid B_1)\,P(B_1)}{P(\text{Blue} \mid B_1)\,P(B_1) + P(\text{Blue} \mid B_2)\,P(B_2)}$
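Plugging the numbers above into this formula, a minimal sketch just to verify the arithmetic:

```python
# Two-box example: posterior probability that a blue ball came from Box 1
p_b1, p_b2 = 2/3, 1/3            # priors P(B1), P(B2)
p_blue_b1, p_blue_b2 = 4/5, 2/5  # likelihoods P(Blue|B1), P(Blue|B2)

posterior_b1 = (p_blue_b1 * p_b1) / (p_blue_b1 * p_b1 + p_blue_b2 * p_b2)
print(posterior_b1)  # 0.8
```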

Page 10: Classification: Probabilistic Generative Model

Two Classes

Given an x, which class does it belong to?

$P(C_1 \mid x) = \frac{P(x \mid C_1)\,P(C_1)}{P(x \mid C_1)\,P(C_1) + P(x \mid C_2)\,P(C_2)}$

Generative Model: $P(x) = P(x \mid C_1)\,P(C_1) + P(x \mid C_2)\,P(C_2)$

Estimating the probabilities from training data: we need P(C1), P(C2), P(x|C1) and P(x|C2), estimated from the Class 1 and Class 2 training examples.

Page 11: Classification: Probabilistic Generative Model

Prior

Class 1: Water, Class 2: Normal; estimate the priors P(C1) and P(C2).

Water- and Normal-type Pokémons with ID < 400 are used for training, the rest for testing.

Training: 79 Water, 61 Normal

P(C1) = 79 / (79 + 61) = 0.56

P(C2) = 61 / (79 + 61) = 0.44

Page 12: Classification: Probabilistic Generative Model

Probability from Class

Water Type: 79 in total

P(a given Pokémon | Water) = ?

Each Pokémon is represented by a vector of its attributes, i.e. its feature vector.

P(x | C1) = ?

Page 13: Classification: Probabilistic Generative Model

Probability from Class - Feature

• Considering Defense and SP Defense

Water Type: $x^1, x^2, x^3, \ldots, x^{79}$ [example (Defense, SP Defense) points from the scatter plot: (65, 64), (48, 50), (103, 45)]

For a new point x that is not in the training set, is P(x | Water) = 0?

Assume the points are sampled from a Gaussian distribution.

Page 14: Classification: Probabilistic Generative Model

๐‘“๐œ‡,ฮฃ ๐‘ฅ =1

2๐œ‹ ๐ท/2

1

ฮฃ 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡ ๐‘‡ฮฃโˆ’1 ๐‘ฅ โˆ’ ๐œ‡

Input: vector x, output: probability of sampling x

The shape of the function determined by mean ๐ and covariance matrix ๐œฎ

Gaussian Distribution

๐‘ฅ1

https://blog.slinuxer.com/tag/pca

๐‘ฅ2๐‘ฅ1

๐‘ฅ2

๐œ‡ =00 ๐œ‡ =

23

=2 00 2

=2 00 2

Page 15: Classification: Probabilistic Generative Model

๐‘“๐œ‡,ฮฃ ๐‘ฅ =1

2๐œ‹ ๐ท/2

1

ฮฃ 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡ ๐‘‡ฮฃโˆ’1 ๐‘ฅ โˆ’ ๐œ‡

Input: vector x, output: probability of sampling x

The shape of the function determines by mean ๐ and covariance matrix ๐œฎ

Gaussian Distribution

๐‘ฅ1

https://blog.slinuxer.com/tag/pca

๐‘ฅ2๐‘ฅ1

๐‘ฅ2

๐œ‡ =00 ๐œ‡ =

00

=2 00 6

=2 โˆ’1โˆ’1 2
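A minimal NumPy sketch of this density, using the example μ and Σ from the slide (the helper name gaussian_density is mine):

```python
import numpy as np

def gaussian_density(x, mu, sigma):
    """Multivariate Gaussian density f_{mu,Sigma}(x) for a D-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(sigma) ** 0.5)
    return float(norm * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff))

mu = np.array([0.0, 0.0])
sigma = np.array([[2.0, 0.0], [0.0, 2.0]])
print(gaussian_density(np.array([0.0, 0.0]), mu, sigma))  # density at the mean
```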

Page 16: Classification: Probabilistic Generative Model

Probability from Class

Assume the points are sampled from a Gaussian distribution. Find the Gaussian distribution behind them, then use it to give a probability for new points.

Water Type: $x^1, x^2, x^3, \ldots, x^{79}$

$\mu = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix}$, $\quad \Sigma = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$

For a new x: $f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right\}$

How to find μ and Σ?

Page 17: Classification: Probabilistic Generative Model

Maximum Likelihood

The Gaussian with any mean μ and covariance matrix Σ can generate these points, but with different likelihood.

[Figure: two candidate Gaussians over the Water-type points, e.g. one with $\mu = \begin{bmatrix} 75 \\ 75 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}$ and one with $\mu = \begin{bmatrix} 150 \\ 140 \end{bmatrix}$, $\Sigma = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$]

Likelihood of a Gaussian with mean μ and covariance matrix Σ = the probability that this Gaussian samples $x^1, x^2, x^3, \ldots, x^{79}$:

$L(\mu, \Sigma) = f_{\mu,\Sigma}(x^1)\, f_{\mu,\Sigma}(x^2)\, f_{\mu,\Sigma}(x^3) \cdots f_{\mu,\Sigma}(x^{79})$

where $f_{\mu,\Sigma}(x) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right\}$

Page 18: Classification: Probabilistic Generative Model

Maximum Likelihood

๐‘“๐œ‡,ฮฃ ๐‘ฅ =1

2๐œ‹ ๐ท/2

1

ฮฃ 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡ ๐‘‡ฮฃโˆ’1 ๐‘ฅ โˆ’ ๐œ‡

๐‘ฅ1, ๐‘ฅ2, ๐‘ฅ3, โ€ฆโ€ฆ ,๐‘ฅ79We have the โ€œWaterโ€ type Pokรฉmons:

We assume ๐‘ฅ1, ๐‘ฅ2, ๐‘ฅ3, โ€ฆโ€ฆ ,๐‘ฅ79 generate from the Gaussian (๐œ‡โˆ—, ฮฃโˆ—) with the maximum likelihood

= ๐‘“๐œ‡,ฮฃ ๐‘ฅ1 ๐‘“๐œ‡,ฮฃ ๐‘ฅ2 ๐‘“๐œ‡,ฮฃ ๐‘ฅ3 โ€ฆโ€ฆ๐‘“๐œ‡,ฮฃ ๐‘ฅ79๐ฟ ๐œ‡, ฮฃ

๐œ‡โˆ—, ฮฃโˆ— = ๐‘Ž๐‘Ÿ๐‘”max๐œ‡,ฮฃ

๐ฟ ๐œ‡, ฮฃ

๐œ‡โˆ— =1

79

๐‘›=1

79

๐‘ฅ๐‘› ฮฃโˆ— =1

79

๐‘›=1

79

๐‘ฅ๐‘› โˆ’ ๐œ‡โˆ— ๐‘ฅ๐‘› โˆ’ ๐œ‡โˆ— ๐‘‡

average
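These two closed-form estimates are easy to compute directly; a minimal NumPy sketch (the array `water` is a placeholder for the 79 feature vectors, not the actual data):

```python
import numpy as np

def gaussian_mle(points):
    """Maximum-likelihood mean and covariance for an (N, D) array of points."""
    mu = points.mean(axis=0)              # mu* = average of the points
    diff = points - mu
    sigma = diff.T @ diff / len(points)   # Sigma* = (1/N) * sum (x - mu*)(x - mu*)^T
    return mu, sigma

# water = np.array([...])                 # 79 x D feature vectors of Water-type Pokémons
# mu_star, sigma_star = gaussian_mle(water)
```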

Page 19: Classification: Probabilistic Generative Model

Maximum Likelihood

Class 1: Water Class 2: Normal

Class 1 (Water): $\mu^1 = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix}$, $\Sigma^1 = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$

Class 2 (Normal): $\mu^2 = \begin{bmatrix} 55.6 \\ 59.8 \end{bmatrix}$, $\Sigma^2 = \begin{bmatrix} 847 & 422 \\ 422 & 685 \end{bmatrix}$

Page 20: Classification: Probabilistic Generative Model

Now we can do classification

=๐‘ƒ ๐‘ฅ|๐ถ1 ๐‘ƒ ๐ถ1

๐‘ƒ ๐‘ฅ|๐ถ1 ๐‘ƒ ๐ถ1 + ๐‘ƒ ๐‘ฅ|๐ถ2 ๐‘ƒ ๐ถ2๐‘ƒ ๐ถ1|๐‘ฅ

If ๐‘ƒ ๐ถ1|๐‘ฅ > 0.5 x belongs to class 1 (Water)

P(C1) = 79 / (79 + 61) =0.56

P(C2) = 61 / (79 + 61) =0.44

๐‘“๐œ‡1,ฮฃ1 ๐‘ฅ =1

2๐œ‹ ๐ท/2

1

ฮฃ1 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1

๐œ‡1 =75.071.3

๐œ‡2 =55.659.8

ฮฃ1 =874 327327 929

ฮฃ2 =847 422422 685

๐‘“๐œ‡2,ฮฃ2 ๐‘ฅ =1

2๐œ‹ ๐ท/2

1

ฮฃ2 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡2

Page 21: Classification: Probabilistic Generative Model

How's the result? With the two features (Defense, SP Defense), testing data: 47% accuracy.

Using all six features (HP, Att, SP Att, Def, SP Def, Speed): 64% accuracy …

($\mu^1, \mu^2$ are then 6-dim vectors and $\Sigma^1, \Sigma^2$ are 6 × 6 matrices.)

[Figure: $P(C_1 \mid x)$ over the two-feature plane; blue points: C1 (Water), red points: C2 (Normal); the regions $P(C_1 \mid x) > 0.5$ and $P(C_1 \mid x) < 0.5$ define the decision boundary]

Page 22: Classification: Probabilistic Generative Model

Modifying Model

Class 1 (Water): $\mu^1 = \begin{bmatrix} 75.0 \\ 71.3 \end{bmatrix}$, $\Sigma^1 = \begin{bmatrix} 874 & 327 \\ 327 & 929 \end{bmatrix}$

Class 2 (Normal): $\mu^2 = \begin{bmatrix} 55.6 \\ 59.8 \end{bmatrix}$, $\Sigma^2 = \begin{bmatrix} 847 & 422 \\ 422 & 685 \end{bmatrix}$

Modification: use the same Σ for both classes, which means fewer parameters.

Page 23: Classification: Probabilistic Generative Model

Modifying Model

• Maximum likelihood

"Water" type Pokémons: $x^1, x^2, x^3, \ldots, x^{79}$

"Normal" type Pokémons: $x^{80}, x^{81}, x^{82}, \ldots, x^{140}$

Find $\mu^1$, $\mu^2$, $\Sigma$ maximizing the likelihood $L(\mu^1, \mu^2, \Sigma)$:

$L(\mu^1, \mu^2, \Sigma) = f_{\mu^1,\Sigma}(x^1)\, f_{\mu^1,\Sigma}(x^2) \cdots f_{\mu^1,\Sigma}(x^{79}) \times f_{\mu^2,\Sigma}(x^{80})\, f_{\mu^2,\Sigma}(x^{81}) \cdots f_{\mu^2,\Sigma}(x^{140})$

$\mu^1$ and $\mu^2$ are calculated as before; the shared covariance is the count-weighted average $\Sigma = \frac{79}{140}\Sigma^1 + \frac{61}{140}\Sigma^2$

Ref: Bishop, chapter 4.2.2

Page 24: Classification: Probabilistic Generative Model

Modifying Model

With the same covariance matrix for both classes:

54% accuracy with the two features; 73% accuracy with all features (HP, Att, SP Att, Def, SP Def, Speed).

The boundary is linear.

Page 25: Classification: Probabilistic Generative Model

Three Steps

• Function Set (Model): $P(C_1 \mid x) = \frac{P(x \mid C_1)\,P(C_1)}{P(x \mid C_1)\,P(C_1) + P(x \mid C_2)\,P(C_2)}$; if $P(C_1 \mid x) > 0.5$, output class 1, otherwise output class 2

• Goodness of a function: the mean μ and covariance Σ that maximize the likelihood (the probability of generating the data)

• Find the best function: easy

Page 26: Classification: Probabilistic Generative Model

Probability Distribution

• You can always use the distribution you like for $P(x \mid C_1)$

For an input vector $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \\ \vdots \\ x_K \end{bmatrix}$, if you assume all the dimensions are independent,

$P(x \mid C_1) = P(x_1 \mid C_1)\, P(x_2 \mid C_1) \cdots P(x_k \mid C_1) \cdots P(x_K \mid C_1)$,

then you are using a Naive Bayes Classifier; each factor $P(x_k \mid C_1)$ can be a 1-D Gaussian.

For binary features, you may assume they are from Bernoulli distributions.
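For the Gaussian case, a minimal naive Bayes sketch: per-dimension 1-D Gaussians whose log densities are summed (the parameter values below are made up):

```python
import numpy as np

def naive_bayes_log_likelihood(x, means, stds):
    """log P(x|C) assuming independent 1-D Gaussians for each dimension of x."""
    # log of a product of 1-D Gaussian densities = sum of their logs
    return float(np.sum(-0.5 * np.log(2 * np.pi * stds**2) - (x - means)**2 / (2 * stds**2)))

means = np.array([75.0, 71.3])   # hypothetical per-dimension means for one class
stds = np.array([29.6, 30.5])    # hypothetical per-dimension standard deviations
print(naive_bayes_log_likelihood(np.array([80.0, 70.0]), means, stds))
```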

Page 27: Classification: Probabilistic Generative Model

Posterior Probability

๐‘ƒ ๐ถ1|๐‘ฅ =๐‘ƒ ๐‘ฅ|๐ถ1 ๐‘ƒ ๐ถ1

๐‘ƒ ๐‘ฅ|๐ถ1 ๐‘ƒ ๐ถ1 + ๐‘ƒ ๐‘ฅ|๐ถ2 ๐‘ƒ ๐ถ2

=1

1 +๐‘ƒ ๐‘ฅ|๐ถ2 ๐‘ƒ ๐ถ2๐‘ƒ ๐‘ฅ|๐ถ1 ๐‘ƒ ๐ถ1

=1

1 + ๐‘’๐‘ฅ๐‘ โˆ’๐‘ง

๐‘ง = ๐‘™๐‘›๐‘ƒ ๐‘ฅ|๐ถ1 ๐‘ƒ ๐ถ1๐‘ƒ ๐‘ฅ|๐ถ2 ๐‘ƒ ๐ถ2

= ๐œŽ ๐‘งSigmoid function

z

z

Page 28: Classification: Probabilistic Generative Model

Warning of Math

Page 29: Classification: Probabilistic Generative Model

Posterior Probability

๐‘ƒ ๐ถ1|๐‘ฅ = ๐œŽ ๐‘ง ๐‘ง = ๐‘™๐‘›๐‘ƒ ๐‘ฅ|๐ถ1 ๐‘ƒ ๐ถ1๐‘ƒ ๐‘ฅ|๐ถ2 ๐‘ƒ ๐ถ2

๐‘ƒ ๐‘ฅ|๐ถ1 =1

2๐œ‹ ๐ท/2

1

ฮฃ1 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1

๐‘ƒ ๐‘ฅ|๐ถ2 =1

2๐œ‹ ๐ท/2

1

ฮฃ2 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡2

๐‘ง = ๐‘™๐‘›๐‘ƒ ๐‘ฅ|๐ถ1๐‘ƒ ๐‘ฅ|๐ถ2

+ ๐‘™๐‘›๐‘ƒ ๐ถ1๐‘ƒ ๐ถ2

sigmoid

๐‘1๐‘1 +๐‘2๐‘2

๐‘1 +๐‘2

=๐‘1๐‘2

Page 30: Classification: Probabilistic Generative Model

= ๐‘™๐‘›ฮฃ2 1/2

ฮฃ1 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1 โˆ’ ๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’

๐‘™๐‘›

12๐œ‹ ๐ท/2

1ฮฃ1 1/2 ๐‘’๐‘ฅ๐‘ โˆ’

12๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1

12๐œ‹ ๐ท/2

1ฮฃ2 1/2 ๐‘’๐‘ฅ๐‘ โˆ’

12๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡2

= ๐‘™๐‘›ฮฃ2 1/2

ฮฃ1 1/2โˆ’1

2๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1 โˆ’ ๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡2

๐‘ƒ ๐‘ฅ|๐ถ1 =1

2๐œ‹ ๐ท/2

1

ฮฃ1 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1

๐‘ƒ ๐‘ฅ|๐ถ2 =1

2๐œ‹ ๐ท/2

1

ฮฃ2 1/2๐‘’๐‘ฅ๐‘ โˆ’

1

2๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡2

๐‘ง = ๐‘™๐‘›๐‘ƒ ๐‘ฅ|๐ถ1๐‘ƒ ๐‘ฅ|๐ถ2

+ ๐‘™๐‘›๐‘ƒ ๐ถ1๐‘ƒ ๐ถ2 ๐‘1

๐‘2

Page 31: Classification: Probabilistic Generative Model

๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1

= ๐‘ฅ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ โˆ’ 2 ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ + ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐œ‡1

๐‘ง = ๐‘™๐‘›ฮฃ2 1/2

ฮฃ1 1/2 โˆ’1

2๐‘ฅ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ + ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ โˆ’

1

2๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐œ‡1

+1

2๐‘ฅ๐‘‡ ฮฃ2 โˆ’1๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1๐‘ฅ +

1

2๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1๐œ‡2

= ๐‘™๐‘›ฮฃ2 1/2

ฮฃ1 1/2โˆ’1

2๐‘ฅ โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡1 โˆ’ ๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡2

๐‘ง = ๐‘™๐‘›๐‘ƒ ๐‘ฅ|๐ถ1๐‘ƒ ๐‘ฅ|๐ถ2

+ ๐‘™๐‘›๐‘ƒ ๐ถ1๐‘ƒ ๐ถ2

=๐‘1๐‘2

= ๐‘ฅ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ โˆ’ ๐‘ฅ๐‘‡ ฮฃ1 โˆ’1๐œ‡1 โˆ’ ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ + ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐œ‡1

๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1 ๐‘ฅ โˆ’ ๐œ‡2

= ๐‘ฅ๐‘‡ ฮฃ2 โˆ’1๐‘ฅ โˆ’ 2 ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1๐‘ฅ + ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1๐œ‡2

+๐‘™๐‘›๐‘1๐‘2

Page 32: Classification: Probabilistic Generative Model

End of Warning

Page 33: Classification: Probabilistic Generative Model

๐‘ƒ ๐ถ1|๐‘ฅ = ๐œŽ ๐‘ง

ฮฃ1 = ฮฃ2 = ฮฃ

๐‘ง = ๐œ‡1 โˆ’ ๐œ‡2 ๐‘‡ฮฃโˆ’1๐‘ฅ โˆ’1

2๐œ‡1 ๐‘‡ฮฃโˆ’1๐œ‡1 +

1

2๐œ‡2 ๐‘‡ฮฃโˆ’1๐œ‡2 + ๐‘™๐‘›

๐‘1๐‘2

๐’˜๐‘ปb

๐‘ƒ ๐ถ1|๐‘ฅ = ๐œŽ ๐‘ค โˆ™ ๐‘ฅ + ๐‘

๐‘ง = ๐‘™๐‘›ฮฃ2 1/2

ฮฃ1 1/2 โˆ’1

2๐‘ฅ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ + ๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐‘ฅ โˆ’

1

2๐œ‡1 ๐‘‡ ฮฃ1 โˆ’1๐œ‡1

+1

2๐‘ฅ๐‘‡ ฮฃ2 โˆ’1๐‘ฅ โˆ’ ๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1๐‘ฅ +

1

2๐œ‡2 ๐‘‡ ฮฃ2 โˆ’1๐œ‡2 +๐‘™๐‘›

๐‘1๐‘2

In generative model, we estimate ๐‘1, ๐‘2, ๐œ‡1, ๐œ‡2, ฮฃ

Then we have w and b

How about directly find w and b?
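A minimal sketch of recovering w and b from the estimated generative parameters, reusing the 2-feature estimates above with the count-weighted shared covariance:

```python
import numpy as np

mu1, mu2 = np.array([75.0, 71.3]), np.array([55.6, 59.8])
sigma1 = np.array([[874.0, 327.0], [327.0, 929.0]])
sigma2 = np.array([[847.0, 422.0], [422.0, 685.0]])
n1, n2 = 79, 61

sigma = (n1 * sigma1 + n2 * sigma2) / (n1 + n2)   # shared covariance
sigma_inv = np.linalg.inv(sigma)

w = (mu1 - mu2) @ sigma_inv                        # w^T = (mu1 - mu2)^T Sigma^{-1}
b = (-0.5 * mu1 @ sigma_inv @ mu1
     + 0.5 * mu2 @ sigma_inv @ mu2
     + np.log(n1 / n2))

def posterior_c1(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))      # P(C1|x) = sigma(w . x + b)

print(posterior_c1(np.array([80.0, 70.0])))        # hypothetical test point
```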

Page 34: Classification: Probabilistic Generative Model

Reference

• Bishop: Chapter 4.1 – 4.2

• Data: https://www.kaggle.com/abcsds/pokemon

• Useful posts:

• https://www.kaggle.com/nishantbhadauria/d/abcsds/pokemon/pokemon-speed-attack-hp-defense-analysis-by-type

• https://www.kaggle.com/nikos90/d/abcsds/pokemon/mastering-pokebars/discussion

• https://www.kaggle.com/ndrewgele/d/abcsds/pokemon/visualizing-pok-mon-stats-with-seaborn/discussion