PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 1: INTRODUCTION
Example
Handwritten Digit Recognition
Polynomial Curve Fitting
• The input data set x of 10 points was generated by choosing values of x_n, for n = 1, . . . , 10, spaced uniformly in the range [0, 1].
• The target data set t was obtained by first computing the corresponding values of the function sin(2πx) and then adding a small level of Gaussian random noise to each value to give the targets t_n.
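A minimal sketch of this data-generation procedure in NumPy; the noise standard deviation of 0.3 is an illustrative assumption, since the exact noise level is not stated above:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 10
# 10 input values x_n spaced uniformly in the range [0, 1]
x = np.linspace(0.0, 1.0, N)
# Targets t_n: sin(2*pi*x_n) plus a small amount of Gaussian noise
# (std = 0.3 is an assumed, illustrative noise level)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)
```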
Sum-of-Squares Error Function
Note: E(w) is a quadratic function of each w_i. Thus the partial derivative of E(w) with respect to each w_i is a linear function of the w's, so setting the derivatives to zero gives a system of linear equations whose solution minimizes E(w).
This method is also called Linear Least Squares
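As a sketch of what this means for the polynomial model y(x, w) = w_0 + w_1 x + ... + w_M x^M: because E(w) is quadratic in w, the minimizer can be found by solving a linear system. A minimal NumPy version, assuming the x and t arrays from the sketch above:

```python
import numpy as np

def fit_polynomial(x, t, M):
    """Fit an order-M polynomial y(x, w) = sum_j w_j * x**j by least squares."""
    # Design matrix: Phi[n, j] = x_n ** j
    Phi = np.vander(x, M + 1, increasing=True)
    # Minimizing E(w) = 0.5 * sum_n (y(x_n, w) - t_n)**2 is a linear least-squares problem
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

def predict(x, w):
    """Evaluate the fitted polynomial at the points in x."""
    return np.vander(x, len(w), increasing=True) @ w
```

For example, fit_polynomial(x, t, 3) gives a 3rd-order fit and fit_polynomial(x, t, 9) a 9th-order one.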
0th Order Polynomial
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial
Over-fitting
Root-Mean-Square (RMS) Error:
Test set: 100 data points generated using exactly the same procedure used to generate the training set points, but with new choices for the random noise values included in the target values t_n.
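The RMS error used here is conventionally defined from the sum-of-squares error evaluated at the fitted coefficients w*:

```latex
E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^{*}) / N}
```

The division by N lets training and test sets of different sizes be compared on an equal footing, and the square root puts the error on the same scale as the target values t.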
Polynomial Coefficients
Data Set Size: 9th Order Polynomial
Data Set Size: 9th Order Polynomial
One rough heuristic that is sometimes advocated is that the number of data points should be no less than some multiple (say 5 or 10) of the number of adaptive parameters in the model. (However, the number of parameters is not necessarily the most appropriate measure of model complexity, i.e. of how hard the model is to learn from limited data.)
Regularization
Penalize large coefficient values
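One standard way to do this adds a squared-norm penalty to the sum-of-squares error, Ẽ(w) = ½ Σ_n {y(x_n, w) − t_n}² + (λ/2)‖w‖², which keeps the problem quadratic and solvable in closed form. A minimal sketch, reusing the design-matrix setup from the earlier fitting sketch (whether the bias w_0 is penalized is a modeling choice):

```python
import numpy as np

def fit_polynomial_regularized(x, t, M, lam):
    """Minimize 0.5 * ||Phi w - t||^2 + 0.5 * lam * ||w||^2 in closed form."""
    Phi = np.vander(x, M + 1, increasing=True)
    # Regularized normal equations: (Phi^T Phi + lam * I) w = Phi^T t
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ t)
```

Larger λ shrinks the coefficients toward zero; λ = 0 recovers ordinary least squares.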
Regularization:
Regularization:
Regularization: vs.
Polynomial Coefficients
Probability Theory
Marginal Probability, Conditional Probability, Joint Probability
Probability Theory
Sum Rule
Product Rule
The Rules of Probability
Sum Rule: p(X) = Σ_Y p(X, Y)
Product Rule: p(X, Y) = p(Y | X) p(X). Also: p(X, Y) = p(X | Y) p(Y)
Hence p(Y | X) p(X) = p(X | Y) p(Y). Thus:
1. p(Y | X) = p(X | Y) p(Y) / p(X)
2. p(X | Y) = p(Y | X) p(X) / p(Y)
Bayes’ Theorem
posterior ∝ likelihood × prior
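Written out, with the normalizing denominator obtained from the sum and product rules:

```latex
p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},
\qquad
p(X) = \sum_{Y} p(X \mid Y)\, p(Y)
```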
Probability Theory
Apples and Oranges
The probability of picking the red box is 4/10 and of picking the blue box is 6/10.
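A small numeric sketch of these rules applied to the boxes-of-fruit example. The box probabilities come from the line above; the fruit proportions inside each box are illustrative assumptions, not values given in this transcript:

```python
# Probability of picking each box (from the slide above)
p_box = {"red": 4 / 10, "blue": 6 / 10}

# Assumed (illustrative) probabilities of drawing each fruit, given the box
p_fruit_given_box = {
    "red": {"apple": 0.25, "orange": 0.75},
    "blue": {"apple": 0.75, "orange": 0.25},
}

# Sum and product rules: p(fruit) = sum_box p(fruit | box) * p(box)
p_orange = sum(p_fruit_given_box[b]["orange"] * p_box[b] for b in p_box)

# Bayes' theorem: p(box | fruit) = p(fruit | box) * p(box) / p(fruit)
p_red_given_orange = p_fruit_given_box["red"]["orange"] * p_box["red"] / p_orange

print(f"p(orange) = {p_orange:.3f}")
print(f"p(red box | orange) = {p_red_given_orange:.3f}")
```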
Bayes’ Theorem Generalized
p(T | X) = Σ_W p(T | X, W) p(W | X)
p(W | T, X) = p(T | X, W) p(W | X) / p(T | X)
p(W | T) = p(T | W) p(W) / p(T)
Probability Densities
P'(x) = p(x)
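Here P denotes the cumulative distribution function, so the statement above is that the density is its derivative:

```latex
P(z) = \int_{-\infty}^{z} p(x)\,\mathrm{d}x,
\qquad
p(x \in (a, b)) = \int_{a}^{b} p(x)\,\mathrm{d}x,
\qquad
P'(x) = p(x)
```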
Transformed Densities
Suppose x = g(y)
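Under this change of variables the density picks up a Jacobian factor, because probability mass (not density values) must be preserved:

```latex
p_y(y) = p_x(x)\left|\frac{\mathrm{d}x}{\mathrm{d}y}\right| = p_x\bigl(g(y)\bigr)\,\bigl|g'(y)\bigr|
```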
Expectations
Conditional Expectation (discrete)
Approximate Expectation (discrete and continuous)
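For reference, these quantities are defined as follows (the last line is the sample-based approximation using points x_n drawn from p(x)):

```latex
\mathbb{E}[f] = \sum_{x} p(x)\, f(x) \quad\text{(discrete)},
\qquad
\mathbb{E}[f] = \int p(x)\, f(x)\,\mathrm{d}x \quad\text{(continuous)}

\mathbb{E}_x[f \mid y] = \sum_{x} p(x \mid y)\, f(x)

\mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)
```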
Variances and Covariances
(the square of the standard deviation)
(a symmetric matrix)
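The corresponding definitions, for a scalar function f and for vector-valued variables, are:

```latex
\operatorname{var}[f] = \mathbb{E}\bigl[(f(x) - \mathbb{E}[f(x)])^{2}\bigr] = \mathbb{E}[f(x)^{2}] - \mathbb{E}[f(x)]^{2}

\operatorname{cov}[x, y] = \mathbb{E}_{x,y}\bigl[(x - \mathbb{E}[x])(y - \mathbb{E}[y])\bigr] = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\,\mathbb{E}[y]

\operatorname{cov}[\mathbf{x}, \mathbf{y}] = \mathbb{E}_{\mathbf{x},\mathbf{y}}\bigl[(\mathbf{x} - \mathbb{E}[\mathbf{x}])(\mathbf{y}^{\mathsf{T}} - \mathbb{E}[\mathbf{y}^{\mathsf{T}}])\bigr]
```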
The Gaussian Distribution
Gaussian Mean and Variance
(mean)
(variance)
(precision: the larger the precision β is, the smaller the variance σ² is, and thus the more “precise” the distribution is.)
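For reference, the univariate Gaussian density with these parameters, and the precision referred to above, are:

```latex
\mathcal{N}(x \mid \mu, \sigma^{2}) = \frac{1}{(2\pi\sigma^{2})^{1/2}} \exp\!\left\{-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right\},
\qquad
\mathbb{E}[x] = \mu,
\quad
\operatorname{var}[x] = \sigma^{2},
\quad
\beta = \frac{1}{\sigma^{2}}
```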
The Gaussian Distribution
For the normal distribution, values within one standard deviation of the mean account for 68.27% of the probability mass; values within two standard deviations account for 95.45%; and values within three standard deviations account for 99.73%.
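These fractions can be checked numerically from the identity P(|x − µ| < kσ) = erf(k/√2); a minimal sketch using only Python's standard library:

```python
import math

for k in (1, 2, 3):
    # Probability mass within k standard deviations of the mean of a Gaussian
    frac = math.erf(k / math.sqrt(2))
    print(f"within {k} standard deviation(s): {100 * frac:.2f}%")
```

This reproduces the 68.27%, 95.45%, and 99.73% figures quoted above.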
The Multivariate Gaussian
Gaussian distribution defined over a D-dimensional vector x of continuous variables:
where the D-dimensional vector µ is called the mean, the D × D symmetric matrix Σ is called the covariance, and |Σ| denotes the determinant of Σ.
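Written out, this density is:

```latex
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
= \frac{1}{(2\pi)^{D/2}\,|\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right\}
```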