
Page 1: Pattern Recognition  and  Machine Learning

Source: Bishop book chapter 1 with modifications by Christoph F. Eick

PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 1: INTRODUCTION

Page 2: Pattern Recognition  and  Machine Learning

Polynomial Curve Fitting

What M should we choose? (Model Selection)

Given M, what w's should we choose? (Parameter Selection)

Experiment: given a function, create N training examples.
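A minimal sketch of this experiment in Python, assuming NumPy; the target sin(2πx) and the noise level follow Bishop's running example, and the exact values are illustrative:

import numpy as np

rng = np.random.default_rng(0)

# Bishop's running example: N noisy samples of sin(2*pi*x)
N = 10
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

# Fit polynomials of increasing order M and report the training RMS error
for M in (0, 1, 3, 9):
    w = np.polyfit(x, t, deg=M)          # least-squares coefficients
    e = np.polyval(w, x) - t             # residuals on the training set
    print(M, np.sqrt(np.mean(e ** 2)))   # training error shrinks as M grows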

Page 3: Pattern Recognition  and  Machine Learning

Sum-of-Squares Error Function
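The error function on this slide (an image in the original) is Bishop's Eq. (1.2):

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2

where y(x, \mathbf{w}) = \sum_{j=0}^{M} w_j x^j is the order-M polynomial.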

Page 4: Pattern Recognition  and  Machine Learning

0th Order Polynomial

How do M, the quality of the fit, and the ability to generalize relate to each other?

As N↑, the test error E↓. As the model complexity c(H)↑, E first decreases and then increases, while the training error decreases for some time and then stays constant (frequently at 0).

Page 5: Pattern Recognition  and  Machine Learning

1st Order Polynomial

Page 6: Pattern Recognition  and  Machine Learning

3rd Order Polynomial

Page 7: Pattern Recognition  and  Machine Learning

9th Order Polynomial

Page 8: Pattern Recognition  and  Machine Learning

Over-fitting

Root-Mean-Square (RMS) Error:
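The definition on the slide is Bishop's Eq. (1.3):

E_{RMS} = \sqrt{2 E(\mathbf{w}^*) / N}

which divides out the data set size N and puts the error on the scale of the target variable.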

Page 9: Pattern Recognition  and  Machine Learning

Polynomial Coefficients

Page 10: Pattern Recognition  and  Machine Learning

Data Set Size: N = 15

9th Order Polynomial

Page 11: Pattern Recognition  and  Machine Learning

Data Set Size: N = 100

9th Order Polynomial

Increasing the size of the data set alleviates the over-fitting problem.

Page 12: Pattern Recognition  and  Machine Learning

Regularization

Penalize large coefficient values

Idea: penalize high weights that contribute to high variance and sensitivity to outliers.
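The regularized error function (an image on the slide) is Bishop's Eq. (1.4):

\tilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \| \mathbf{w} \|^2

where \lambda controls the strength of the penalty; this quadratic form is known as ridge regression or weight decay.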

Page 13: Pattern Recognition  and  Machine Learning

Regularization: 9th Order Polynomial

Page 14: Pattern Recognition  and  Machine Learning

Regularization:

Page 15: Pattern Recognition  and  Machine Learning

Regularization: E_RMS vs. ln λ

Page 16: Pattern Recognition  and  Machine Learning

The example demonstrated:

As N↑, the test error E↓. As the model complexity c(H)↑, E first decreases and then increases, while the training error decreases for some time and then stays constant (frequently at 0).

Page 17: Pattern Recognition  and  Machine Learning

Polynomial Coefficients

Weight of regularization increases

Page 18: Pattern Recognition  and  Machine Learning

Probability Theory

Apples and Oranges

Page 19: Pattern Recognition  and  Machine Learning

Probability Theory

Marginal Probability

Conditional Probability

Joint Probability
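In Bishop's grid notation, with n_{ij} the number of trials in which X = x_i and Y = y_j, c_i the column total for x_i, and N the total number of trials, the three quantities (image equations on the slide) are:

p(X = x_i, Y = y_j) = n_{ij} / N (joint)

p(X = x_i) = c_i / N (marginal)

p(Y = y_j | X = x_i) = n_{ij} / c_i (conditional)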

Page 20: Pattern Recognition  and  Machine Learning

Probability Theory

Sum Rule

Product Rule

Page 21: Pattern Recognition  and  Machine Learning

The Rules of Probability

Sum Rule

Product Rule
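The two rules (image equations on the slide), in Bishop's notation:

Sum Rule: p(X) = \sum_Y p(X, Y)

Product Rule: p(X, Y) = p(Y \mid X) \, p(X)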

Page 22: Pattern Recognition  and  Machine Learning

Bayes’ Theorem

posterior ∝ likelihood × prior
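Written out, with the normalizer obtained from the sum and product rules:

p(Y \mid X) = \frac{p(X \mid Y) \, p(Y)}{p(X)}, \qquad p(X) = \sum_Y p(X \mid Y) \, p(Y)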

Page 23: Pattern Recognition  and  Machine Learning

Probability Densities

Cumulative Distribution Function

Usually in ML!
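The defining equations (images on the slide): a density p(x) satisfies

p(x \in (a, b)) = \int_a^b p(x) \, dx, \qquad p(x) \ge 0, \qquad \int_{-\infty}^{\infty} p(x) \, dx = 1

and the cumulative distribution function is P(z) = \int_{-\infty}^{z} p(x) \, dx.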

Page 24: Pattern Recognition  and  Machine Learning

Transformed Densities

[Note by Markus Svensén] This figure was taken from Solution 1.4 in the web-edition of the solutions manual for PRML, available at http://research.microsoft.com/~cmbishop/PRML. A more thorough explanation of what the figure shows is provided in the text of the solution.
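The rule the figure illustrates is the change-of-variables formula for densities: if x = g(y), then

p_y(y) = p_x(x) \left| \frac{dx}{dy} \right| = p_x(g(y)) \, |g'(y)|

so, unlike a simple function, the mode of a density depends on the choice of variable.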
Page 25: Pattern Recognition  and  Machine Learning

Expectations (of f under p(x))

Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)
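The corresponding (image) equations, in Bishop's notation:

E[f] = \sum_x p(x) f(x) \quad \text{(discrete)}, \qquad E[f] = \int p(x) f(x) \, dx \quad \text{(continuous)}

E_x[f \mid y] = \sum_x p(x \mid y) f(x)

E[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n) \quad \text{for samples } x_n \text{ drawn from } p(x)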

Page 26: Pattern Recognition  and  Machine Learning

Variances and Covariances
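The definitions shown on the slide:

var[f] = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - E[f(x)]^2

cov[x, y] = E_{x,y}[\{x - E[x]\}\{y - E[y]\}] = E_{x,y}[x y] - E[x] E[y]

and, for vectors, cov[\mathbf{x}, \mathbf{y}] = E_{\mathbf{x},\mathbf{y}}[\mathbf{x} \mathbf{y}^{\mathsf{T}}] - E[\mathbf{x}] E[\mathbf{y}^{\mathsf{T}}].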

Page 27: Pattern Recognition  and  Machine Learning

The Gaussian Distribution
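The univariate Gaussian (an image on the slide):

N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)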

Page 28: Pattern Recognition  and  Machine Learning

Gaussian Mean and Variance
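The moments shown on the slide (as images):

E[x] = \mu, \qquad E[x^2] = \mu^2 + \sigma^2, \qquad var[x] = E[x^2] - E[x]^2 = \sigma^2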

Page 29: Pattern Recognition  and  Machine Learning

The Multivariate Gaussian
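For a D-dimensional vector x, with mean vector μ and covariance matrix Σ:

N(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)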

Page 30: Pattern Recognition  and  Machine Learning

Gaussian Parameter Estimation

Likelihood function

Compare: for the observations 2, 2.1, 1.9, 2.05, 1.99, the likelihood under N(2,1) and under N(3,1).
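For i.i.d. data the likelihood is the product of the individual densities, p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} N(x_n \mid \mu, \sigma^2). A minimal sketch of the comparison in Python (reading the slide's second model as N(3,1)):

import math

def normal_pdf(x, mu, sigma2):
    # Univariate Gaussian density N(x | mu, sigma2)
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

xs = [2.0, 2.1, 1.9, 2.05, 1.99]

# i.i.d. likelihood: product of the individual densities
lik_a = math.prod(normal_pdf(x, 2.0, 1.0) for x in xs)  # under N(2,1)
lik_b = math.prod(normal_pdf(x, 3.0, 1.0) for x in xs)  # under N(3,1)
print(lik_a, lik_b)  # N(2,1) assigns the sample a far higher likelihood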

Page 31: Pattern Recognition  and  Machine Learning

Maximum (Log) Likelihood
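Taking the log of the likelihood and maximizing (the slide's image equations):

\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)

\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2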

Page 32: Pattern Recognition  and  Machine Learning

Properties of μ_ML and σ²_ML
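The properties in question:

E[\mu_{ML}] = \mu, \qquad E[\sigma^2_{ML}] = \frac{N-1}{N} \sigma^2

so the ML variance estimate is biased low; scaling it by N/(N-1) gives the unbiased estimator.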

Page 33: Pattern Recognition  and  Machine Learning

Curve Fitting Re-visited
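The probabilistic model on this slide: the target t is Gaussian around the polynomial prediction, with precision (inverse variance) β:

p(t \mid x, \mathbf{w}, \beta) = N(t \mid y(x, \mathbf{w}), \beta^{-1})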

Page 34: Pattern Recognition  and  Machine Learning

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error E(w).
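The log likelihood (an image on the slide) is

\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = -\frac{\beta}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)

Only the first term depends on w, so maximizing over w is exactly minimizing the sum-of-squares error; β then has the closed form 1/\beta_{ML} = \frac{1}{N} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}_{ML}) - t_n \}^2.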

Page 35: Pattern Recognition  and  Machine Learning

Predictive Distribution (skip initially)
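Plugging in the ML estimates gives the predictive distribution over t for a new input x:

p(t \mid x, \mathbf{w}_{ML}, \beta_{ML}) = N(t \mid y(x, \mathbf{w}_{ML}), \beta_{ML}^{-1})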

Page 36: Pattern Recognition  and  Machine Learning

Model Selection

Cross-Validation
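A minimal sketch of S-fold cross-validation for choosing the polynomial order M, assuming NumPy; the fold count, data, and noise level are illustrative:

import numpy as np

def s_fold_cv_rms(x, t, M, S=4):
    # Average held-out RMS error of an order-M polynomial over S folds
    idx = np.arange(len(x))
    errors = []
    for fold in np.array_split(idx, S):
        train = np.setdiff1d(idx, fold)
        w = np.polyfit(x[train], t[train], deg=M)  # fit on S-1 folds
        e = np.polyval(w, x[fold]) - t[fold]       # evaluate on the held-out fold
        errors.append(np.sqrt(np.mean(e ** 2)))
    return np.mean(errors)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 20)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=20)

# Pick the order with the lowest cross-validated error
for M in (0, 1, 3, 9):
    print(M, s_fold_cv_rms(x, t, M))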

Page 37: Pattern Recognition  and  Machine Learning

Entropy

Important quantity in
• coding theory
• statistical physics
• machine learning
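The definition (an image on the slide): for a discrete random variable x,

H[x] = -\sum_x p(x) \log_2 p(x)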

Page 38: Pattern Recognition  and  Machine Learning

Entropy

Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?

All states equally likely
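With 8 equally likely states, p(x) = 1/8 for each, so

H[x] = -8 \times \frac{1}{8} \log_2 \frac{1}{8} = 3 \text{ bits}

i.e. a uniform 3-bit code is optimal; a non-uniform distribution can be transmitted with a shorter average code length.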

Page 39: Pattern Recognition  and  Machine Learning

Entropy

Page 40: Pattern Recognition  and  Machine Learning

Entropy

In how many ways can N identical objects be allocated to M bins?

Entropy is maximized when all the p_i = 1/M (uniform).
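The counting argument (image equations on the slide): the number of allocations, the multiplicity, is

W = \frac{N!}{\prod_i n_i!}

and the entropy is H = \frac{1}{N} \ln W \simeq -\sum_i \frac{n_i}{N} \ln \frac{n_i}{N} = -\sum_i p_i \ln p_i by Stirling's approximation, which takes its maximum value \ln M at the uniform distribution.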

Page 41: Pattern Recognition  and  Machine Learning

Entropy

Page 42: Pattern Recognition  and  Machine Learning

Differential Entropy

Put bins of width Δ along the real line
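The discretization argument: with probability mass p(x_i)Δ in bin i,

H_\Delta = -\sum_i p(x_i) \Delta \ln(p(x_i) \Delta) = -\sum_i p(x_i) \Delta \ln p(x_i) - \ln \Delta

and dropping the divergent -\ln \Delta term as \Delta \to 0 leaves the differential entropy

H[x] = -\int p(x) \ln p(x) \, dx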

Differential entropy is maximized (for fixed variance σ²) when p(x) is Gaussian, in which case

H[x] = \frac{1}{2} \left( 1 + \ln(2\pi\sigma^2) \right)

Page 43: Pattern Recognition  and  Machine Learning

Conditional Entropy
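The definitions (image equations on the slide):

H[\mathbf{y} \mid \mathbf{x}] = -\iint p(\mathbf{x}, \mathbf{y}) \ln p(\mathbf{y} \mid \mathbf{x}) \, d\mathbf{y} \, d\mathbf{x}, \qquad H[\mathbf{x}, \mathbf{y}] = H[\mathbf{y} \mid \mathbf{x}] + H[\mathbf{x}]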

Page 44: Pattern Recognition  and  Machine Learning

The Kullback-Leibler Divergence
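The definition (an image on the slide):

KL(p \| q) = -\int p(\mathbf{x}) \ln \frac{q(\mathbf{x})}{p(\mathbf{x})} \, d\mathbf{x}

KL(p‖q) ≥ 0 with equality if and only if p = q, and it is not symmetric in p and q.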

Page 45: Pattern Recognition  and  Machine Learning

Mutual Information
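The definition (an image on the slide): mutual information is the KL divergence between the joint distribution and the product of the marginals,

I[\mathbf{x}, \mathbf{y}] \equiv KL\big( p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x}) p(\mathbf{y}) \big) = H[\mathbf{x}] - H[\mathbf{x} \mid \mathbf{y}] = H[\mathbf{y}] - H[\mathbf{y} \mid \mathbf{x}]

so I ≥ 0, with equality exactly when x and y are independent.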