Probability Theory and
Parameter Estimation II
Least Squares and Gauss
● How to solve the least-squares problem?
● Carl Friedrich Gauss's solution in 1794 (age 18)
● Why solve the least-squares problem?
● Carl Friedrich Gauss's solution in 1822 (age 46)
● The least-squares solution is optimal in the sense that it is the best linear unbiased estimator of the polynomial coefficients (the Gauss–Markov theorem)
● Assumptions: the errors have zero mean, equal variances, and are uncorrelated
http://en.wikipedia.org/wiki/Least_squares
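The least-squares problem the slides refer to can be solved directly in closed form. A minimal sketch for polynomial curve fitting (the data, degree, and variable names below are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hypothetical data: noisy samples of an underlying function.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Design matrix for a degree-3 polynomial: columns are x^0 ... x^3.
M = 3
Phi = np.vander(x, M + 1, increasing=True)

# Least-squares solution: w minimizes ||Phi @ w - t||^2.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
pred = Phi @ w
sse = np.sum((pred - t) ** 2)
```

At the optimum the residual is orthogonal to the columns of the design matrix, which is exactly the normal-equations condition Gauss derived.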
Three Approaches
posterior ∝ likelihood × prior
likelihood: p(Data | Parameters)
prior: p(Parameters)
posterior: p(Parameters | Data)
1. find parameters that maximize the (log) likelihood
2. find parameters that maximize the posterior (MAP)
3. find the full posterior (fully Bayesian)
Maximum Likelihood I
Maximize log likelihood
Surprise! Maximizing the log likelihood is equivalent to minimizing the sum-of-squares error function!
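The formulas were lost in extraction; a standard reconstruction, assuming a Gaussian noise model with mean y(x_n, w) and precision β (these symbols are assumptions, not present in the extracted text):

```latex
p(t_n \mid x_n, \mathbf{w}, \beta) = \mathcal{N}\!\bigl(t_n \mid y(x_n, \mathbf{w}),\, \beta^{-1}\bigr)
\qquad
\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)
  = -\frac{\beta}{2} \sum_{n=1}^{N} \bigl(y(x_n, \mathbf{w}) - t_n\bigr)^2
    + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)
```

Only the first term depends on w, so maximizing the log likelihood over w is the same as minimizing the sum of squared errors.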
Maximum Likelihood II
Determine w_ML by minimizing the sum-of-squares error.
Maximize log likelihood
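Maximizing the same log likelihood with respect to the noise precision β (a symbol assumed from the standard setup, not present in the extracted text) gives a closed-form estimate:

```latex
\frac{1}{\beta_{\mathrm{ML}}} = \frac{1}{N} \sum_{n=1}^{N} \bigl(y(x_n, \mathbf{w}_{\mathrm{ML}}) - t_n\bigr)^2
```

i.e. the ML noise variance is simply the mean squared residual of the fitted curve.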
Predictive Distribution
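The slide's formula was lost in extraction; under the Gaussian model above, plugging in the ML estimates gives the predictive distribution for a new input x:

```latex
p(t \mid x, \mathbf{w}_{\mathrm{ML}}, \beta_{\mathrm{ML}})
  = \mathcal{N}\!\bigl(t \mid y(x, \mathbf{w}_{\mathrm{ML}}),\, \beta_{\mathrm{ML}}^{-1}\bigr)
```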
MAP: A Step towards Bayes
Determine w_MAP by minimizing the regularized sum-of-squares error.
Maximize the posterior, which combines the likelihood with a prior over the parameters; the prior is governed by a hyper-parameter.
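A standard reconstruction of the lost formulas, assuming a zero-mean Gaussian prior with hyper-parameter α (symbols not present in the extracted text):

```latex
p(\mathbf{w} \mid \alpha) = \mathcal{N}\!\bigl(\mathbf{w} \mid \mathbf{0},\, \alpha^{-1}\mathbf{I}\bigr)
\qquad
p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
  \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha)
```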
MAP: A Step towards Bayes
Determine w_MAP by minimizing the regularized sum-of-squares error.
Surprise! Maximizing the posterior is equivalent to minimizing the regularized sum-of-squares error function!
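To see the equivalence (using the Gaussian likelihood and prior assumed above), take the negative log of the posterior:

```latex
-\ln p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}, \alpha, \beta)
  = \frac{\beta}{2} \sum_{n=1}^{N} \bigl(y(x_n, \mathbf{w}) - t_n\bigr)^2
    + \frac{\alpha}{2} \mathbf{w}^{\mathsf{T}} \mathbf{w} + \text{const}
```

so w_MAP minimizes a sum-of-squares error plus a quadratic penalty with effective regularization coefficient λ = α/β.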
Three Approaches
posterior ∝ likelihood × prior
likelihood: p(Data | Parameters)
prior: p(Parameters)
posterior: p(Parameters | Data)
1. find parameters that maximize the (log) likelihood
2. find parameters that maximize the posterior (MAP)
3. find the full posterior (fully Bayesian)
p(t₀ | X, x₀) = ∫ p(t₀ | X, x₀, Y) p(Y | X, x₀) dY
Bayesian Curve Fitting
Bayesian Predictive Distribution
Mean of predictive distribution
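The formulas for this slide were lost in extraction. One standard closed form for the Bayesian predictive distribution in curve fitting (assuming basis functions φ(x), hyper-parameters α and β, and a matrix S, none of which appear in the extracted text):

```latex
p(t \mid x, \mathbf{x}, \mathbf{t}) = \mathcal{N}\!\bigl(t \mid m(x),\, s^2(x)\bigr)
```
```latex
m(x) = \beta\, \boldsymbol{\phi}(x)^{\mathsf{T}} \mathbf{S} \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, t_n
\qquad
s^2(x) = \beta^{-1} + \boldsymbol{\phi}(x)^{\mathsf{T}} \mathbf{S}\, \boldsymbol{\phi}(x)
\qquad
\mathbf{S}^{-1} = \alpha \mathbf{I} + \beta \sum_{n=1}^{N} \boldsymbol{\phi}(x_n)\, \boldsymbol{\phi}(x_n)^{\mathsf{T}}
```

Note the predictive variance s²(x) has two parts: β⁻¹ from the noise on the targets, and a second term expressing the remaining uncertainty in the parameters, which the ML and MAP point estimates discard.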
Review
posterior ∝ likelihood × prior
1. maximize the (log) likelihood — yields point estimates of the parameters; can lead to over-fitting
2. maximize the posterior (MAP) — yields point estimates of the parameters; helps avoid over-fitting
3. find the full posterior — yields a distribution over the parameters (fully Bayesian)
Model Selection
Cross-Validation
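Cross-validation selects a model (e.g. the polynomial degree M) by measuring error on held-out data. A minimal k-fold sketch; the data, fold count, and candidate degrees are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)

def cv_error(x, t, degree, k=5):
    """Mean held-out squared error over k folds for a polynomial fit."""
    idx = np.arange(x.size)
    folds = np.array_split(idx, k)
    errs = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)          # all points not in this fold
        w = np.polyfit(x[train], t[train], degree)
        pred = np.polyval(w, x[fold])
        errs.append(np.mean((pred - t[fold]) ** 2))
    return float(np.mean(errs))

scores = {m: cv_error(x, t, m) for m in range(10)}
best_degree = min(scores, key=scores.get)
```

Training error always decreases with model complexity; the cross-validated error typically falls and then rises again, and the minimum picks the degree.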
Curse of Dimensionality
Polynomial curve fitting, M = 3
Gaussian Densities in higher dimensions
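One way to quantify the curse of dimensionality (the figures for these slides were lost): in high dimensions, almost all of the volume of a ball concentrates in a thin shell near its surface, so "local" neighborhoods stop being local. A small sketch (the shell thickness and dimensions below are illustrative choices):

```python
def shell_fraction(eps, D):
    """Fraction of a unit D-ball's volume lying within eps of its surface.

    Volume scales as r**D, so the inner ball of radius (1 - eps)
    contains (1 - eps)**D of the total volume; the rest is in the shell.
    """
    return 1.0 - (1.0 - eps) ** D

# Fraction of volume in a shell of thickness 0.05, for growing dimension.
fracs = {D: shell_fraction(0.05, D) for D in (1, 2, 10, 100)}
```

At D = 1 the 5% shell holds 5% of the volume; by D = 100 it holds nearly all of it. The same concentration effect makes high-dimensional Gaussian densities peak in a thin radial shell rather than near the mean.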
Decision Theory
Inference step: determine either p(x, C_k) or p(C_k | x).
Decision step: for a given x, determine the optimal action a (regression: predict a value; classification: assign a class).
To minimize the misclassification rate: choose the class with the maximum posterior probability.
Example (classification): p(cancer | image) = p(image | cancer) p(cancer) / p(image)
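The cancer example can be sketched directly as a Bayes-rule computation followed by a decision; the likelihoods and prior below are made-up illustrative numbers, not from the slides:

```python
def posterior(prior_cancer, lik_cancer, lik_normal):
    """p(cancer | image) via Bayes' rule for a two-class problem."""
    evidence = lik_cancer * prior_cancer + lik_normal * (1 - prior_cancer)
    return lik_cancer * prior_cancer / evidence

# Rare disease, moderately informative test image.
p = posterior(prior_cancer=0.01, lik_cancer=0.8, lik_normal=0.1)
decision = "cancer" if p > 0.5 else "normal"
```

Even though the image is eight times more likely under 'cancer' than 'normal', the low prior keeps the posterior well below 0.5, so the minimum-misclassification decision is 'normal'. This is why the prior must enter the decision.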
Minimum Misclassification Rate
Minimum Expected Loss
Example: classify medical images as ‘cancer’ or ‘normal’
Loss function L_kj (rows: true class k, columns: decision j)
Minimum Expected Loss
Decision regions R_j are chosen to minimize the expected loss, which sums the loss L_kj over elements assigned to region j whose real class is k.
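The formula this slide showed is the standard expected-loss criterion:

```latex
\mathbb{E}[L] = \sum_{k} \sum_{j} \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, \mathcal{C}_k)\, \mathrm{d}\mathbf{x}
```

It is minimized by assigning each x to the class j that minimizes Σ_k L_kj p(C_k | x), which reduces to maximizing the posterior when the loss is 0 for correct decisions and 1 for all errors.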
Reject Option
Why Separate Inference and Decision?
• Minimizing risk (the loss matrix may change over time)
• Reject option
• Unbalanced class priors
• Combining models
Decision Theory for Regression
Inference step: determine p(t | x).
Decision step: for a given x, make an optimal prediction y(x) for t.
Loss function: L(t, y(x)).
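The quantity to minimize is the expected loss, averaged over the joint distribution of inputs and targets:

```latex
\mathbb{E}[L] = \iint L\bigl(t, y(\mathbf{x})\bigr)\, p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t
```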
The Squared Loss Function
As expected, the prediction that minimizes the expected squared loss is the conditional mean, y(x) = E[t | x].
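The derivation this slide sketched: substitute the squared loss into the expected loss, set the functional derivative with respect to y(x) to zero, and solve:

```latex
\mathbb{E}[L] = \iint \bigl(y(\mathbf{x}) - t\bigr)^2 p(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\, \mathrm{d}t
\qquad
\frac{\delta \mathbb{E}[L]}{\delta y(\mathbf{x})} = 2 \int \bigl(y(\mathbf{x}) - t\bigr)\, p(\mathbf{x}, t)\, \mathrm{d}t = 0
```
```latex
\Rightarrow\quad y(\mathbf{x}) = \frac{\int t\, p(\mathbf{x}, t)\, \mathrm{d}t}{p(\mathbf{x})} = \mathbb{E}[t \mid \mathbf{x}]
```

So under squared loss the optimal regression function is exactly the mean of the predictive distribution computed in the inference step.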