MLDM Chapter 3: Parameter Estimation
Posted on 09-Jan-2022
MIMA Group
Xin-Shun Xu @ SDU School of Computer Science and Technology, Shandong University
Chapter 3: Parameter Estimation
MLDM
Contents
- Introduction
- Maximum-Likelihood Estimation
- Bayesian Estimation
Bayesian Theorem
$$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\,P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c}p(\mathbf{x}|\omega_j)\,P(\omega_j)$$
To compute the posterior probability $P(\omega_i|\mathbf{x})$, we need to know the class-conditional density $p(\mathbf{x}|\omega_i)$ and the prior $P(\omega_i)$.
How can we get these values?
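A minimal numeric sketch of the theorem; the likelihood and prior values below are made-up illustrative numbers for a two-class problem at one fixed $\mathbf{x}$:

```python
# Bayes theorem: P(w_i | x) = p(x | w_i) P(w_i) / p(x),
# with the evidence p(x) = sum_j p(x | w_j) P(w_j).
likelihoods = [0.6, 0.2]   # hypothetical p(x | w_1), p(x | w_2)
priors = [0.3, 0.7]        # hypothetical P(w_1), P(w_2)

evidence = sum(l * p for l, p in zip(likelihoods, priors))      # p(x)
posteriors = [l * p / evidence for l, p in zip(likelihoods, priors)]
# The posteriors sum to 1 by construction.
```

The evidence term is just the normalizer that makes the posteriors a proper distribution over the classes.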
Samples
$$\mathcal{D} = \{\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_c\}$$
The samples in $\mathcal{D}_j$ are drawn independently according to the probability law $p(\mathbf{x}|\omega_j)$; that is, the examples in $\mathcal{D}_j$ are i.i.d. (independent and identically distributed) random variables.
It is easy to estimate the prior probability:
$$P(\omega_j) = \frac{|\mathcal{D}_j|}{\sum_{i=1}^{c}|\mathcal{D}_i|}$$
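The prior estimate is just the fraction of training samples in each class; a sketch with hypothetical sample-set sizes for $c = 3$ classes:

```python
# P(w_j) = |D_j| / sum_i |D_i|: class frequency in the training set.
sizes = {1: 50, 2: 30, 3: 20}   # hypothetical |D_1|, |D_2|, |D_3|

total = sum(sizes.values())
priors = {j: n / total for j, n in sizes.items()}
```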
Samples
For the class-conditional pdf:
Case I: $p(\mathbf{x}|\omega_j)$ has a certain parametric form, e.g. $p(\mathbf{x}|\omega_j) \sim N(\boldsymbol\mu_j, \boldsymbol\Sigma_j)$ with $\mathbf{x} \in \mathbb{R}^d$ and parameter vector $\boldsymbol\theta_j = (\theta_1, \theta_2, \dots, \theta_m)^T$. Here $\boldsymbol\theta_j$ contains $d + d(d+1)/2$ free parameters: $d$ for the mean and $d(d+1)/2$ for the symmetric covariance matrix.
Case II: $p(\mathbf{x}|\omega_j)$ doesn't have a parametric form (next chapter).
Goal
$\mathcal{D} = \{\mathcal{D}_1, \mathcal{D}_2, \dots, \mathcal{D}_c\}$, with $p(\mathbf{x}|\omega_j) = p(\mathbf{x}|\boldsymbol\theta_j)$ for each class $\omega_j$.
Use $\mathcal{D}_j$ to estimate the unknown parameter vector $\boldsymbol\theta_j = (\theta_1, \theta_2, \dots, \theta_m)^T$, yielding the estimates $\hat{\boldsymbol\theta}_1, \hat{\boldsymbol\theta}_2, \hat{\boldsymbol\theta}_3, \dots$
Estimation Under Parametric Form
Maximum-Likelihood Estimation:
- Views parameters as quantities whose values are fixed but unknown.
- Estimates parameter values by maximizing the likelihood (probability) of observing the actual examples.
Bayesian Estimation:
- Views parameters as random variables having some known prior distribution.
- Observation of the actual training examples transforms the parameters' prior distribution into a posterior distribution (via Bayes rule).
Maximum-Likelihood Estimation
Because each class is considered individually, the subscript $j$ used before is dropped.
Now the problem becomes: given a sample set $\mathcal{D}$ whose elements are drawn independently from a population with a known parametric form, say $p(\mathbf{x}|\boldsymbol\theta)$, choose the $\hat{\boldsymbol\theta}$ that makes $\mathcal{D}$ most likely to occur.
Maximum-Likelihood Estimation (Cont.)
Criterion of ML: let $\mathcal{D} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n\}$. By the independence assumption, we have
$$p(\mathcal{D}|\boldsymbol\theta) = p(\mathbf{x}_1|\boldsymbol\theta)\,p(\mathbf{x}_2|\boldsymbol\theta)\cdots p(\mathbf{x}_n|\boldsymbol\theta) = \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\theta)$$
The likelihood function:
$$L(\boldsymbol\theta|\mathcal{D}) \equiv p(\mathcal{D}|\boldsymbol\theta) = \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\theta)$$
The maximum-likelihood estimate:
$$\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta} L(\boldsymbol\theta|\mathcal{D})$$
Maximum-Likelihood Estimation (Cont.)
Often, we instead maximize the log-likelihood function
$$l(\boldsymbol\theta|\mathcal{D}) = \ln L(\boldsymbol\theta|\mathcal{D}) = \sum_{k=1}^{n}\ln p(\mathbf{x}_k|\boldsymbol\theta)$$
$$\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta} L(\boldsymbol\theta|\mathcal{D}) = \arg\max_{\boldsymbol\theta} l(\boldsymbol\theta|\mathcal{D})$$
Why are the two maximizers the same? Because $\ln$ is strictly increasing, maximizing $l$ is equivalent to maximizing $L$, and the sum of logs is far easier to differentiate than the product.
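There is also a practical reason for the log: a product of many small densities underflows in floating point, while the sum of logs stays finite. A sketch with synthetic data and standard-normal likelihoods:

```python
import math

# 10,000 identical points; each standard-normal density value is < 0.4,
# so the raw product underflows to exactly 0.0 in double precision.
xs = [0.5] * 10_000

def pdf(x):
    """Standard normal density N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

likelihood = 1.0
for x in xs:
    likelihood *= pdf(x)           # underflows to 0.0 long before the end

log_likelihood = sum(math.log(pdf(x)) for x in xs)  # finite and usable
```

The raw likelihood is numerically useless here, while the log-likelihood is an ordinary finite number that an optimizer can work with.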
Maximum-Likelihood Estimation (Cont.)
Find the extreme values using the methods of differential calculus.
Gradient operator: let $f(\boldsymbol\theta)$ be a continuous function of $\boldsymbol\theta = (\theta_1, \theta_2, \dots, \theta_n)^T$, and define
$$\nabla_{\boldsymbol\theta} = \left(\frac{\partial}{\partial\theta_1}, \frac{\partial}{\partial\theta_2}, \dots, \frac{\partial}{\partial\theta_n}\right)^T$$
Find the extreme values by solving $\nabla_{\boldsymbol\theta} f = \mathbf{0}$.
The Gaussian Case I
Case I: $\boldsymbol\mu$ unknown, $\boldsymbol\Sigma$ known.
$$p(\mathbf{x}|\boldsymbol\mu,\boldsymbol\Sigma) = \frac{1}{(2\pi)^{d/2}|\boldsymbol\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right]$$
$$L(\boldsymbol\mu|\mathcal{D}) = p(\mathcal{D}|\boldsymbol\mu) = \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\mu) = \frac{1}{(2\pi)^{nd/2}|\boldsymbol\Sigma|^{n/2}}\exp\left[-\frac{1}{2}\sum_{k=1}^{n}(\mathbf{x}_k-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}_k-\boldsymbol\mu)\right]$$
$$l(\boldsymbol\mu|\mathcal{D}) = \ln L(\boldsymbol\mu|\mathcal{D}) = -\frac{nd}{2}\ln(2\pi) - \frac{n}{2}\ln|\boldsymbol\Sigma| - \frac{1}{2}\sum_{k=1}^{n}(\mathbf{x}_k-\boldsymbol\mu)^T\boldsymbol\Sigma^{-1}(\mathbf{x}_k-\boldsymbol\mu)$$
The Gaussian Case I
Setting the gradient of the log-likelihood to zero:
$$\nabla_{\boldsymbol\mu}\, l(\boldsymbol\mu|\mathcal{D}) = \sum_{k=1}^{n}\boldsymbol\Sigma^{-1}(\mathbf{x}_k - \boldsymbol\mu) = \mathbf{0} \quad\Longrightarrow\quad \hat{\boldsymbol\mu} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k$$
Intuitive result: the maximum-likelihood estimate of the unknown $\boldsymbol\mu$ is just the arithmetic average of the training samples, i.e., the sample mean.
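The closed-form result can be checked numerically; a sketch with synthetic 2-D Gaussian data (all values here are illustrative):

```python
import numpy as np

# Draw a synthetic sample from a Gaussian with a known (illustrative) mean.
rng = np.random.default_rng(0)
true_mu = np.array([1.0, -2.0])
X = rng.multivariate_normal(true_mu, np.eye(2), size=5_000)  # shape (n, d)

# ML estimate of mu with known covariance: the sample mean (1/n) sum_k x_k.
mu_hat = X.mean(axis=0)
```

With 5,000 samples the estimate lands very close to the true mean, as the law of large numbers suggests.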
The Gaussian Case II
Case II: both $\mu$ and $\sigma^2$ unknown. Consider the univariate case:
$$p(x|\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad \boldsymbol\theta = (\theta_1,\theta_2)^T = (\mu,\sigma^2)^T$$
$$L(\boldsymbol\theta|\mathcal{D}) = p(\mathcal{D}|\boldsymbol\theta) = \prod_{k=1}^{n}p(x_k|\boldsymbol\theta) = \frac{1}{(2\pi)^{n/2}\sigma^{n}}\exp\left[-\sum_{k=1}^{n}\frac{(x_k-\mu)^2}{2\sigma^2}\right]$$
$$l(\boldsymbol\theta|\mathcal{D}) = \ln L(\boldsymbol\theta|\mathcal{D}) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\mu)^2 = -\frac{n}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}\sum_{k=1}^{n}(x_k-\theta_1)^2$$
The Gaussian Case II
$$l(\boldsymbol\theta|\mathcal{D}) = -\frac{n}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}\sum_{k=1}^{n}(x_k-\theta_1)^2$$
Setting $\nabla_{\boldsymbol\theta}\, l(\boldsymbol\theta|\mathcal{D}) = \mathbf{0}$:
$$\nabla_{\boldsymbol\theta}\, l(\boldsymbol\theta|\mathcal{D}) = \begin{pmatrix} \dfrac{1}{\theta_2}\displaystyle\sum_{k=1}^{n}(x_k-\theta_1) \\[6pt] -\dfrac{n}{2\theta_2} + \dfrac{1}{2\theta_2^2}\displaystyle\sum_{k=1}^{n}(x_k-\theta_1)^2 \end{pmatrix} = \mathbf{0}$$
Solving:
$$\hat\mu = \frac{1}{n}\sum_{k=1}^{n}x_k \quad\text{(unbiased)}, \qquad \hat\sigma^2 = \frac{1}{n}\sum_{k=1}^{n}(x_k-\hat\mu)^2 \quad\text{(biased)}$$
In the multivariate case, $\hat{\boldsymbol\mu}$ is the arithmetic average of the $n$ vectors $\mathbf{x}_k$, and $\hat{\boldsymbol\Sigma}$ is the arithmetic average of the $n$ matrices $(\mathbf{x}_k-\hat{\boldsymbol\mu})(\mathbf{x}_k-\hat{\boldsymbol\mu})^T$.
Unbiased estimator: $E[\hat{\boldsymbol\theta}] = \boldsymbol\theta$. Consistent estimator: $\lim_{n\to\infty}E[\hat{\boldsymbol\theta}] = \boldsymbol\theta$.
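The bias of the ML variance estimate can be seen directly by averaging the two estimators over many repeated samples; a sketch with standard-normal data and small $n = 5$ (all settings illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 200_000
# Many samples of size n from N(0, 1), whose true variance is 1.
X = rng.standard_normal((trials, n))
mu_hat = X.mean(axis=1, keepdims=True)

ml_var = ((X - mu_hat) ** 2).sum(axis=1) / n          # 1/n: the ML estimate
unbiased_var = ((X - mu_hat) ** 2).sum(axis=1) / (n - 1)  # 1/(n-1)

mean_ml = ml_var.mean()              # near (n-1)/n * 1 = 0.8: biased low
mean_unbiased = unbiased_var.mean()  # near 1.0: unbiased
```

The averaged ML estimate sits at roughly $(n-1)/n$ of the true variance, matching $E[\hat\sigma^2] = \frac{n-1}{n}\sigma^2$, while the $1/(n-1)$ version averages to the true value.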
MLE for Normal Population
$$\hat{\boldsymbol\mu} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k \quad\text{(sample mean)}, \qquad \hat{\boldsymbol\Sigma} = \frac{1}{n}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol\mu})(\mathbf{x}_k-\hat{\boldsymbol\mu})^T$$
The sample mean is unbiased, but the ML covariance estimate is not:
$$E[\hat{\boldsymbol\mu}] = \boldsymbol\mu, \qquad E[\hat{\boldsymbol\Sigma}] = \frac{n-1}{n}\boldsymbol\Sigma \ne \boldsymbol\Sigma$$
The sample covariance matrix
$$\mathbf{C} = \frac{1}{n-1}\sum_{k=1}^{n}(\mathbf{x}_k-\hat{\boldsymbol\mu})(\mathbf{x}_k-\hat{\boldsymbol\mu})^T$$
is unbiased: $E[\mathbf{C}] = \boldsymbol\Sigma$.
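The two covariance estimates differ only by the factor $n/(n-1)$; a sketch comparing the ML estimate with the unbiased sample covariance matrix (note that `np.cov` uses the $1/(n-1)$ normalization by default):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((10, 3))   # synthetic data: n = 10 samples in d = 3
n = X.shape[0]
mu_hat = X.mean(axis=0)            # sample mean

# ML estimate: (1/n) sum_k (x_k - mu_hat)(x_k - mu_hat)^T
diffs = X - mu_hat
sigma_ml = diffs.T @ diffs / n

# Unbiased sample covariance matrix C, with 1/(n-1) (np.cov's default).
C = np.cov(X, rowvar=False)
```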
Bayesian Estimation Settings
The parametric form of the likelihood function for each category is known.
However, $\boldsymbol\theta_j$ is considered to be a random variable instead of a fixed (but unknown) value.
In this case, we can no longer make a single ML estimate $\hat{\boldsymbol\theta}$ and then infer $P(\omega_i|\mathbf{x})$ from $P(\omega_i)$ and $p(\mathbf{x}|\omega_i)$.
How can we proceed? Fully exploit the training examples!
Posterior Probabilities from Samples
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x},\omega_i|\mathcal{D})}{p(\mathbf{x}|\mathcal{D})} = \frac{p(\mathbf{x}|\omega_i,\mathcal{D})\,P(\omega_i|\mathcal{D})}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D})\,P(\omega_j|\mathcal{D})}$$
Assumptions:
- Each class can be considered independently: $p(\mathbf{x}|\omega_i,\mathcal{D}) = p(\mathbf{x}|\omega_i,\mathcal{D}_i)$.
- The priors do not depend on the samples: $P(\omega_i|\mathcal{D}) = P(\omega_i)$.
With these assumptions,
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
Problem Formulation
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
The key problem is to determine $p(\mathbf{x}|\omega_i,\mathcal{D}_i)$. Treating each class independently, the problem becomes estimating $p(\mathbf{x}|\mathcal{D})$.
This is the central problem of Bayesian learning.
Class-Conditional Density Estimation
Assume $p(\mathbf{x})$ is unknown but has a known, fixed form with parameter vector $\boldsymbol\theta$, a random variable with respect to that parametric form:
$$p(\mathbf{x}|\mathcal{D}) = \int p(\mathbf{x},\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta = \int p(\mathbf{x}|\boldsymbol\theta,\mathcal{D})\,p(\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta = \int p(\mathbf{x}|\boldsymbol\theta)\,p(\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta$$
Here $\mathbf{x}$ is independent of $\mathcal{D}$ given $\boldsymbol\theta$; the form of $p(\mathbf{x}|\boldsymbol\theta)$ is assumed known, and $p(\boldsymbol\theta|\mathcal{D})$ is the posterior density we want to estimate.
Bayesian Estimation: General Procedure
Phase I: compute the parameter posterior $p(\boldsymbol\theta|\mathcal{D})$.
Bayesian Estimation: General Procedure
Phase II: $p(\mathbf{x}|\mathcal{D}) = \int p(\mathbf{x}|\boldsymbol\theta)\,p(\boldsymbol\theta|\mathcal{D})\,d\boldsymbol\theta$
Phase III:
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
The Gaussian Case
The univariate Gaussian with unknown $\mu$ (and known $\sigma^2$).
Phase I: $p(\mu|\mathcal{D}) \propto p(\mathcal{D}|\mu)\,p(\mu)$, with likelihood and prior
$$p(x|\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \qquad p(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right]$$
Other forms of the prior pdf could be assumed as well. In general,
$$p(\boldsymbol\theta|\mathcal{D}) \propto \prod_{k=1}^{n}p(\mathbf{x}_k|\boldsymbol\theta)\,p(\boldsymbol\theta)$$
The Gaussian Case
$$p(\mu|\mathcal{D}) \propto \prod_{k=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x_k-\mu}{\sigma}\right)^2\right]\cdot\frac{1}{\sqrt{2\pi}\,\sigma_0}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right]$$
$$\propto \exp\left[-\frac{1}{2}\left(\sum_{k=1}^{n}\left(\frac{x_k-\mu}{\sigma}\right)^2 + \left(\frac{\mu-\mu_0}{\sigma_0}\right)^2\right)\right] \propto \exp\left[-\frac{1}{2}\left(\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n}x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right)\right]$$
The Gaussian Case
$$p(\mu|\mathcal{D}) \propto \exp\left[-\frac{1}{2}\left(\left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{1}{\sigma^2}\sum_{k=1}^{n}x_k + \frac{\mu_0}{\sigma_0^2}\right)\mu\right)\right]$$
$p(\mu|\mathcal{D})$ is an exponential function of a quadratic function of $\mu$; thus $p(\mu|\mathcal{D})$ is also normal, $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$:
$$p(\mu|\mathcal{D}) = \frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_n}{\sigma_n}\right)^2\right]$$
Compare the coefficients of the two forms to identify $\mu_n$ and $\sigma_n^2$.
The Gaussian Case
Equating the coefficients in both forms, we have
$$\mu_n = \frac{n\sigma_0^2}{n\sigma_0^2+\sigma^2}\,\hat\mu_n + \frac{\sigma^2}{n\sigma_0^2+\sigma^2}\,\mu_0, \qquad \sigma_n^2 = \frac{\sigma_0^2\sigma^2}{n\sigma_0^2+\sigma^2}$$
where $\hat\mu_n = \frac{1}{n}\sum_{k=1}^{n}x_k$ is the sample mean.
The Gaussian Case
Phase II: with $p(x|\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$ and $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$,
$$p(x|\mathcal{D}) = \int p(x|\mu)\,p(\mu|\mathcal{D})\,d\mu$$
How would $p(x|\mathcal{D})$ look in this case?
The Gaussian Case
With $p(x|\mu) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$ and $p(\mu|\mathcal{D}) \sim N(\mu_n, \sigma_n^2)$:
$$p(x|\mathcal{D}) = \int p(x|\mu)\,p(\mu|\mathcal{D})\,d\mu = \int \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]\frac{1}{\sqrt{2\pi}\,\sigma_n}\exp\left[-\frac{1}{2}\left(\frac{\mu-\mu_n}{\sigma_n}\right)^2\right]d\mu$$
$$= \frac{1}{2\pi\sigma\sigma_n}\exp\left[-\frac{(x-\mu_n)^2}{2(\sigma^2+\sigma_n^2)}\right]\int \exp\left[-\frac{1}{2}\,\frac{\sigma^2+\sigma_n^2}{\sigma^2\sigma_n^2}\left(\mu - \frac{\sigma_n^2 x + \sigma^2\mu_n}{\sigma^2+\sigma_n^2}\right)^2\right]d\mu$$
$p(x|\mathcal{D})$ is an exponential function of a quadratic function of $x$; thus, it is also a normal pdf. Which normal?
The Gaussian Case
Carrying out the integration above (the remaining integral over $\mu$ is a Gaussian integral independent of $x$) gives the predictive density
$$p(x|\mathcal{D}) \sim N(\mu_n,\ \sigma^2 + \sigma_n^2)$$
The predictive mean is $\mu_n$, and the predictive variance is the sum of the known noise variance $\sigma^2$ and the posterior uncertainty $\sigma_n^2$.
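The predictive variance $\sigma^2 + \sigma_n^2$ can be checked by Monte Carlo: sample $\mu$ from the posterior, then $x$ from the likelihood. The parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
mu_n, sigma_n_sq, sigma_sq = 1.6, 0.2, 1.0   # illustrative posterior/noise

# Sample mu ~ N(mu_n, sigma_n^2), then x ~ N(mu, sigma^2).
mus = rng.normal(mu_n, np.sqrt(sigma_n_sq), size=500_000)
xs = rng.normal(mus, np.sqrt(sigma_sq))

# Empirical moments should match N(mu_n, sigma^2 + sigma_n^2) = N(1.6, 1.2).
emp_mean, emp_var = xs.mean(), xs.var()
```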
The Gaussian Case
Phase III:
$$P(\omega_i|\mathbf{x},\mathcal{D}) = \frac{p(\mathbf{x}|\omega_i,\mathcal{D}_i)\,P(\omega_i)}{\sum_{j=1}^{c}p(\mathbf{x}|\omega_j,\mathcal{D}_j)\,P(\omega_j)}$$
Summary
Key issue: estimate the prior and the class-conditional pdf from the training set.
Basic assumption on training examples: i.i.d.
Two strategies for the key issue:
- Parametric form for the class-conditional pdf: maximum-likelihood estimation or Bayesian estimation.
- No parametric form for the class-conditional pdf: next chapter.
Summary
Maximum-likelihood estimation:
- Settings: parameters as fixed but unknown values.
- Objective function: the log-likelihood function.
- The gradient of the objective function is set to zero.
- The Gaussian case.
Bayesian estimation:
- Settings: parameters as random variables.
- General procedure: Phases I, II, III.
- The Gaussian case.
Project 3.2