MLDM Chapter 3: Parameter Estimation
Posted on 09-Jan-2022
MIMA Group
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University
Chapter 3: Parameter Estimation
MLDM
Contents
- Introduction
- Maximum-Likelihood Estimation
- Bayesian Estimation
Bayesian Theorem
$$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\,P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c}p(\mathbf{x}|\omega_j)\,P(\omega_j)$$
To compute the posterior probability $P(\omega_i|\mathbf{x})$, we need to know the class-conditional density $p(\mathbf{x}|\omega_i)$ and the prior $P(\omega_i)$.
How can we get these values?
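A minimal numeric sketch of the theorem; the likelihood and prior values below are made-up illustrative numbers for a two-class problem at one fixed $\mathbf{x}$:

```python
# Bayes theorem: P(w_i | x) = p(x | w_i) P(w_i) / p(x),
# with the evidence p(x) = sum_j p(x | w_j) P(w_j).
likelihoods = [0.6, 0.2]   # hypothetical p(x | w_1), p(x | w_2)
priors = [0.3, 0.7]        # hypothetical P(w_1), P(w_2)

evidence = sum(l * p for l, p in zip(likelihoods, priors))      # p(x)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
# The posteriors sum to 1 by construction.
```

The evidence term is just the normalizer that makes the posteriors a proper distribution over the classes.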
Samples
$$\mathcal{D} = \{\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_c\}$$
The samples in $\mathcal{D}_j$ are drawn independently according to the probability law $p(\mathbf{x}|\omega_j)$; that is, the examples in $\mathcal{D}_j$ are i.i.d. (independent and identically distributed) random variables.
It is easy to estimate the prior probability:
$$P(\omega_j) = \frac{|\mathcal{D}_j|}{\sum_{i=1}^{c}|\mathcal{D}_i|}$$
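The prior estimate is just the fraction of training samples in each class; a sketch with hypothetical sample-set sizes for $c = 3$ classes:

```python
# P(w_j) = |D_j| / sum_i |D_i|: class frequency in the training set.
sizes = {1: 50, 2: 30, 3: 20}   # hypothetical |D_1|, |D_2|, |D_3|

total = sum(sizes.values())
priors = {j: n / total for j, n in sizes.items()}
```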
Samples
For the class-conditional pdf:
Case I: $p(\mathbf{x}|\omega_j)$ has a certain parametric form, e.g. $p(\mathbf{x}|\omega_j) \sim N(\boldsymbol\mu_j, \boldsymbol\Sigma_j)$ with $\mathbf{x} \in \mathbb{R}^d$ and parameter vector $\boldsymbol\theta_j = (\theta_1, \theta_2, \dots, \theta_m)^T$. Here $\boldsymbol\theta_j$ contains $d + d(d+1)/2$ free parameters: $d$ for the mean and $d(d+1)/2$ for the symmetric covariance matrix.
Case II: $p(\mathbf{x}|\omega_j)$ doesn't have a parametric form (next chapter).
Goal
$\mathcal{D} = \{\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_c\}$, with $p(\mathbf{x}|\omega_j) = p(\mathbf{x}|\boldsymbol\theta_j)$ for each class $\omega_j$.
Use $\mathcal{D}_j$ to estimate the unknown parameter vector $\boldsymbol\theta_j = (\theta_1, \theta_2, \dots, \theta_m)^T$, yielding the estimates $\hat{\boldsymbol\theta}_1, \hat{\boldsymbol\theta}_2, \hat{\boldsymbol\theta}_3, \dots$
Estimation Under Parametric Form
Maximum-Likelihood Estimation:
- Views parameters as quantities whose values are fixed but unknown.
- Estimates parameter values by maximizing the likelihood (probability) of observing the actual examples.
Bayesian Estimation:
- Views parameters as random variables having some known prior distribution.
- Observation of the actual training examples transforms the parameters' prior distribution into a posterior distribution (via Bayes rule).
Maximum-Likelihood Estimation
Because each class is considered individually, the subscript $j$ used before is dropped.
Now the problem becomes: given a sample set $\mathcal{D}$ whose elements are drawn independently from a population with a known parametric form, say $p(\mathbf{x}|\boldsymbol\theta)$, choose the $\hat{\boldsymbol\theta}$ that makes $\mathcal{D}$ most likely to occur.
Maximum-Likelihood Estimation (Cont.)
Criterion of ML: let $\mathcal{D} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$. By the independence assumption, we have
$$p(\mathcal{D}|\boldsymbol\theta) = p(\mathbf{x}_1|\boldsymbol\theta)\,p(\mathbf{x}_2|\boldsymbol\theta)\cdots p(\mathbf{x}_n|\boldsymbol\theta) = \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\theta)$$
The likelihood function:
$$L(\boldsymbol\theta|\mathcal{D}) \equiv p(\mathcal{D}|\boldsymbol\theta) = \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\theta)$$
The maximum-likelihood estimate:
$$\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta} L(\boldsymbol\theta|\mathcal{D})$$
Maximum-Likelihood Estimation (Cont.)
Often, we instead maximize the log-likelihood function
$$l(\boldsymbol\theta|\mathcal{D}) = \ln L(\boldsymbol\theta|\mathcal{D}) = \sum_{k=1}^{n}\ln p(\mathbf{x}_k|\boldsymbol\theta)$$
$$\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta} L(\boldsymbol\theta|\mathcal{D}) = \arg\max_{\boldsymbol\theta} l(\boldsymbol\theta|\mathcal{D})$$
Why are the two maximizers the same? Because $\ln$ is strictly increasing, maximizing $l$ is equivalent to maximizing $L$, and the sum of logs is far easier to differentiate than the product.
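There is also a practical reason for the log: a product of many small densities underflows in floating point, while the sum of logs stays finite. A sketch with synthetic data and standard-normal likelihoods:

```python
import math

# 10,000 identical points; each standard-normal density value is < 0.4,
# so the raw product underflows to exactly 0.0 in double precision.
xs = [0.5] * 10_000

def pdf(x):
    """Standard normal density N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

likelihood = 1.0
for x in xs:
    likelihood *= pdf(x)           # underflows to 0.0 long before the end

log_likelihood = sum(math.log(pdf(x)) for x in xs)  # finite and usable
```

The raw likelihood is numerically useless here, while the log-likelihood is an ordinary finite number that an optimizer can work with.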
Maximum-Likelihood Estimation (Cont.)
Find the extreme values using the methods of differential calculus.
Gradient operator: let $f(\boldsymbol\theta)$ be a continuous function of $\boldsymbol\theta = (\theta_1, \theta_2, \dots, \theta_n)^T$, and define
$$\nabla_{\boldsymbol\theta} = \left(\frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}, \dots, \frac{\partial}{\partial\theta_n}\right)^T$$
Find the extreme values by solving $\nabla_{\boldsymbol\theta} f = \mathbf{0}$.
The Gaussian Case I
Case I: $\boldsymbol\mu$ unknown, $\boldsymbol\Sigma$ known.
$$p(\mathbf{x}|\boldsymbol\mu,\boldsymbol\Sigma) = \frac{1}{(2\pi)^{d/2}|\boldsymbol\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]$$
$$L(\boldsymbol\mu|\mathcal{D}) = p(\mathcal{D}|\boldsymbol\mu) = \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\mu) = \frac{1}{(2\pi)^{nd/2}|\boldsymbol\Sigma|^{n/2}}\exp\left[-\frac{1}{2}\sum_{k=1}^{n}(\mathbf{x}_k-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}_k-\boldsymbol\mu)\right]$$
$$l(\boldsymbol\mu|\mathcal{D}) = \ln L(\boldsymbol\mu|\mathcal{D}) = -\frac{nd}{2}\ln(2\pi) - \frac{n}{2}\ln|\boldsymbol\Sigma| - \frac{1}{2}\sum_{k=1}^{n}(\mathbf{x}_k-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}_k-\boldsymbol\mu)$$
The Gaussian Case I
Setting the gradient of the log-likelihood to zero:
$$\nabla_{\boldsymbol\mu}\, l(\boldsymbol\mu|\mathcal{D}) = \sum_{k=1}^{n}\boldsymbol\Sigma^{-1}(\mathbf{x}_k - \boldsymbol\mu) = \mathbf{0} \quad\Longrightarrow\quad \hat{\boldsymbol\mu} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k$$
Intuitive result: the maximum-likelihood estimate of the unknown $\boldsymbol\mu$ is just the arithmetic average of the training samples, i.e., the sample mean.
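The closed-form result can be checked numerically; a sketch with synthetic 2-D Gaussian data (all values here are illustrative):

```python
import numpy as np

# Draw a synthetic sample from a Gaussian with a known (illustrative) mean.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
X = rng.multivariate_normal(true_mu, np.eye(2), size=5_000)  # shape (n, d)

# ML estimate of mu with known covariance: the sample mean (1/n) sum_k x_k.
mu_hat = X.mean(axis=0)
```

With 5,000 samples the estimate lands very close to the true mean, as the law of large numbers suggests.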
The Gaussian Case II
Case II: both $\mu$ and $\sigma^2$ unknown. Consider the univariate case:
$$p(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad \boldsymbol\theta = (\theta_1,\theta_2)^T = (\mu,\sigma^2)^T$$
$$L(\boldsymbol\theta|\mathcal{D}) = p(\mathcal{D}|\boldsymbol\theta) = \prod_{k=1}^{n}p(x_k|\boldsymbol\theta) = \frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\left[-\sum_{k=1}^{n}\frac{(x_k-\mu)^2}{2\sigma^2}\right]$$
$$l(\boldsymbol\theta|\mathcal{D}) = \ln L(\boldsymbol\theta|\mathcal{D}) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\mu)^2 = -\frac{n}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}\sum_{k=1}^{n}(x_k-\theta_1)^2$$
The Gaussian Case II
$$l(\boldsymbol\theta|\mathcal{D}) = -\frac{n}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}\sum_{k=1}^{n}(x_k-\theta_1)^2$$
Setting $\nabla_{\boldsymbol\theta}\, l(\boldsymbol\theta|\mathcal{D}) = \mathbf{0}$:
$$\nabla_{\boldsymbol\theta}\, l(\boldsymbol\theta|\mathcal{D}) = \begin{pmatrix} \dfrac{1}{\theta_2}\displaystyle\sum_{k=1}^{n}(x_k-\theta_1) \\[6pt] -\dfrac{n}{2\theta_2} + \dfrac{1}{2\theta_2^2}\displaystyle\sum_{k=1}^{n}(x_k-\theta_1)^2 \end{pmatrix} = \mathbf{0}$$
Solving:
$$\hat\mu = \frac{1}{n}\sum_{k=1}^{n}x_k \quad\text{(unbiased)}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k-\hat\mu)^2 \quad\text{(biased)}$$
In the multivariate case, $\hat{\boldsymbol\mu}$ is the arithmetic average of the $n$ vectors $\mathbf{x}_k$, and $\hat{\boldsymbol\Sigma}$ is the arithmetic average of the $n$ matrices $(\mathbf{x}_k-\hat{\boldsymbol\mu})(\mathbf{x}_k-\hat{\boldsymbol\mu})^T$.
Unbiased estimator: $E[\hat{\boldsymbol\theta}] = \boldsymbol\theta$. Consistent estimator: $\lim_{n\to\infty}E[\hat{\boldsymbol\theta}] = \boldsymbol\theta$.
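The bias of the ML variance estimate can be seen directly by averaging the two estimators over many repeated samples; a sketch with standard-normal data and small $n = 5$ (all settings illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 200_000
# Many samples of size n from N(0, 1), whose true variance is 1.
X = rng.standard_normal((trials, n))
mu_hat = X.mean(axis=1, keepdims=True)

ml_var = ((X - mu_hat) ** 2).sum(axis=1) / n          # 1/n: the ML estimate
unbiased_var = ((X - mu_hat) ** 2).sum(axis=1) / (n - 1)  # 1/(n-1)

mean_ml = ml_var.mean()              # near (n-1)/n * 1 = 0.8: biased low
mean_unbiased = unbiased_var.mean()  # near 1.0: unbiased
```

The averaged ML estimate sits at roughly $(n-1)/n$ of the true variance, matching $E[\hat\sigma^2] = \frac{n-1}{n}\sigma^2$, while the $1/(n-1)$ version averages to the true value.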
MLE for Normal Population
$$\hat{\boldsymbol\mu} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k \quad\text{(sample mean)}, \qquad \hat{\boldsymbol\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol\mu})(\mathbf{x}_k-\hat{\boldsymbol\mu})^T$$
The sample mean is unbiased, but the ML covariance estimate is not:
$$E[\hat{\boldsymbol\mu}] = \boldsymbol\mu, \qquad E[\hat{\boldsymbol\Sigma}] = \frac{n-1}{n}\boldsymbol\Sigma \ne \boldsymbol\Sigma$$
The sample covariance matrix
$$\mathbf{C} = \frac{1}{n-1}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol\mu})(\mathbf{x}_k-\hat{\boldsymbol\mu})^T$$
is unbiased: $E[\mathbf{C}] = \boldsymbol\Sigma$.
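The two covariance estimates differ only by the factor $n/(n-1)$; a sketch comparing the ML estimate with the unbiased sample covariance matrix (note that `np.cov` uses the $1/(n-1)$ normalization by default):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))   # synthetic data: n = 10 samples in d = 3
n = X.shape[0]
mu_hat = X.mean(axis=0)            # sample mean

# ML estimate: (1/n) sum_k (x_k - mu_hat)(x_k - mu_hat)^T
diffs = X - mu_hat
sigma_ml = diffs.T @ diffs / n

# Unbiased sample covariance matrix C, with 1/(n-1) (np.cov's default).
C = np.cov(X, rowvar=False)
```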
Bayesian Estimation Settings
The parametric form of the likelihood function for each category is known.
However, $\boldsymbol\theta_j$ is considered to be a random variable instead of a fixed (but unknown) value.
In this case, we can no longer make a single ML estimate $\hat{\boldsymbol\theta}$ and then infer $P(\omega_i|\mathbf{x})$ from $P(\omega_i)$ and $p(\mathbf{x}|\omega_i)$.
How can we proceed? Fully exploit the training examples!
Posterior Probabilities from Samples
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x},\omega_i|\mathcal{D})}{p(\mathbf{x}|\mathcal{D})} = \frac{p(\mathbf{x}|\omega_i,\mathcal{D})\,P(\omega_i|\mathcal{D})}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D})\,P(\omega_j|\mathcal{D})}$$
Assumptions:
- Each class can be considered independently: $p(\mathbf{x}|\omega_i,\mathcal{D}) = p(\mathbf{x}|\omega_i,\mathcal{D}_i)$.
- The priors do not depend on the samples: $P(\omega_i|\mathcal{D}) = P(\omega_i)$.
With these assumptions,
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
Problem Formulation
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
The key problem is to determine $p(\mathbf{x}|\omega_i,\mathcal{D}_i)$. Treating each class independently, the problem becomes estimating $p(\mathbf{x}|\mathcal{D})$.
This is the central problem of Bayesian learning.
Class-Conditional Density Estimation
Assume $p(\mathbf{x})$ is unknown but has a known, fixed form with parameter vector $\boldsymbol\theta$, a random variable with respect to that parametric form:
$$p(\mathbf{x}|\mathcal{D}) = \int p(\mathbf{x},\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta = \int p(\mathbf{x}|\boldsymbol\theta,\mathcal{D})\,p(\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta = \int p(\mathbf{x}|\boldsymbol\theta)\,p(\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta$$
Here $\mathbf{x}$ is independent of $\mathcal{D}$ given $\boldsymbol\theta$; the form of $p(\mathbf{x}|\boldsymbol\theta)$ is assumed known, and $p(\boldsymbol\theta|\mathcal{D})$ is the posterior density we want to estimate.
Bayesian Estimation: General Procedure
Phase I: compute the parameter posterior $p(\boldsymbol\theta|\mathcal{D})$.
Bayesian Estimation: General Procedure
Phase II: $p(\mathbf{x}|\mathcal{D}) = \int p(\mathbf{x}|\boldsymbol\theta)\,p(\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta$
Phase III:
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
The Gaussian Case
The univariate Gaussian with unknown $\mu$ (and known $\sigma^2$).
Phase I: $p(\mu|\mathcal{D}) \propto p(\mathcal{D}|\mu)\,p(\mu)$, with likelihood and prior
$$p(x|\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \qquad p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right]$$
Other forms of the prior pdf could be assumed as well. In general,
$$p(\boldsymbol\theta|\mathcal{D}) \propto \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\theta)\,p(\boldsymbol\theta)$$
The Gaussian Case
$$p(\mu|\mathcal{D}) \propto \prod_{k=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x_k-\mu}{\sigma}\right)^2\right]\cdot\frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\sum_{k=1}^{n}\left(\frac{x_k-\mu}{\sigma}\right)^2 + \left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right)\right] \propto \exp\left[-\frac{1}{2}\left(\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n}x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right)\right]$$
The Gaussian Case
$$p(\mu|\mathcal{D}) \propto \exp\left[-\frac{1}{2}\left(\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n}x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right)\right]$$
$p(\mu|\mathcal{D})$ is an exponential function of a quadratic function of $\mu$; thus $p(\mu|\mathcal{D})$ is also normal, $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$:
$$p(\mu|\mathcal{D}) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_n}{\sigma_n}\right)^2\right]$$
Compare the coefficients of the two forms to identify $\mu_n$ and $\sigma_n^2$.
The Gaussian Case
Equating the coefficients in both forms, we have
$$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2}\,\hat\mu_n + \frac{\sigma^2}{n\sigma_0^2+\sigma^2}\,\mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\sigma^2}{n\sigma_0^2+\sigma^2}$$
where $\hat\mu_n = \frac{1}{n}\sum_{k=1}^{n}x_k$ is the sample mean.
The Gaussian Case
Phase II: with $p(x|\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$ and $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$,
$$p(x|\mathcal{D}) = \int p(x|\mu)\,p(\mu|\mathcal{D})\,d\mu$$
How would $p(x|\mathcal{D})$ look in this case?
The Gaussian Case
With $p(x|\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$ and $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$:
$$p(x|\mathcal{D}) = \int p(x|\mu)\,p(\mu|\mathcal{D})\,d\mu = \int \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_n}{\sigma_n}\right)^2\right]d\mu$$
$$= \frac{1}{2\pi\sigma\sigma_n}\exp\left[-\frac{(x-\mu_n)^2}{2(\sigma^2+\sigma_n^2)}\right]\int \exp\left[-\frac{1}{2}\,\frac{\sigma^2+\sigma_n^2}{\sigma^2\sigma_n^2}\left(\mu - \frac{\sigma_n^2 x + \sigma^2\mu_n}{\sigma^2+\sigma_n^2}\right)^2\right]d\mu$$
$p(x|\mathcal{D})$ is an exponential function of a quadratic function of $x$; thus, it is also a normal pdf. Which normal?
The Gaussian Case
Carrying out the integration above (the remaining integral over $\mu$ is a Gaussian integral independent of $x$) gives the predictive density
$$p(x|\mathcal{D}) \sim N(\mu_n,\ \sigma^2 + \sigma_n^2)$$
The predictive mean is $\mu_n$, and the predictive variance is the sum of the known noise variance $\sigma^2$ and the posterior uncertainty $\sigma_n^2$.
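The predictive variance $\sigma^2 + \sigma_n^2$ can be checked by Monte Carlo: sample $\mu$ from the posterior, then $x$ from the likelihood. The parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_n, sigma_n_sq, sigma_sq = 1.6, 0.2, 1.0   # illustrative posterior/noise

# Sample mu ~ N(mu_n, sigma_n^2), then x ~ N(mu, sigma^2).
mus = rng.normal(mu_n, np.sqrt(sigma_n_sq), size=500_000)
xs = rng.normal(mus, np.sqrt(sigma_sq))

# Empirical moments should match N(mu_n, sigma^2 + sigma_n^2) = N(1.6, 1.2).
emp_mean, emp_var = xs.mean(), xs.var()
```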
The Gaussian Case
Phase III:
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
Summary
Key issue: estimate the prior and the class-conditional pdf from the training set.
Basic assumption on training examples: i.i.d.
Two strategies for the key issue:
- Parametric form for the class-conditional pdf: maximum-likelihood estimation or Bayesian estimation.
- No parametric form for the class-conditional pdf: next chapter.
Summary
Maximum-likelihood estimation:
- Settings: parameters as fixed but unknown values.
- Objective function: the log-likelihood function.
- The gradient of the objective function is set to zero.
- The Gaussian case.
Bayesian estimation:
- Settings: parameters as random variables.
- General procedure: Phases I, II, III.
- The Gaussian case.
Project 3.2