PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 1: INTRODUCTION

Upload: fifi
Posted on 25-Feb-2016

TRANSCRIPT

Page 1: Pattern Recognition  and  Machine Learning

PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 1: INTRODUCTION

Page 2: Pattern Recognition  and  Machine Learning

Example

Handwritten Digit Recognition

Page 3: Pattern Recognition  and  Machine Learning

Polynomial Curve Fitting

Page 4: Pattern Recognition  and  Machine Learning

Sum-of-Squares Error Function
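
The fit in this running example minimizes the sum-of-squares error E(w) = 1/2 Σₙ (y(xₙ, w) − tₙ)². A minimal sketch; the data-generating details (sample size, noise level, seed) are assumptions for illustration, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of sin(2*pi*x), the chapter's running example (noise level assumed)
N = 10
x = np.linspace(0, 1, N)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

def fit_polynomial(x, t, M):
    """Least-squares fit of the order-M polynomial y(x, w) = sum_j w_j x^j."""
    X = np.vander(x, M + 1, increasing=True)   # design matrix [1, x, ..., x^M]
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2"""
    y = np.polyval(w[::-1], x)                 # polyval wants highest power first
    return 0.5 * np.sum((y - t) ** 2)

w3 = fit_polynomial(x, t, M=3)
print(sum_of_squares_error(w3, x, t))
```

Raising M can only lower the training error, which is exactly the over-fitting trap the next slides illustrate.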

Page 5: Pattern Recognition  and  Machine Learning

0th Order Polynomial

Page 6: Pattern Recognition  and  Machine Learning

1st Order Polynomial

Page 7: Pattern Recognition  and  Machine Learning

3rd Order Polynomial

Page 8: Pattern Recognition  and  Machine Learning

9th Order Polynomial

Page 9: Pattern Recognition  and  Machine Learning

Over-fitting

Root-Mean-Square (RMS) Error: E_RMS = sqrt( 2 E(w*) / N )
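
Dividing by N makes E_RMS comparable across data set sizes, so over-fitting shows up as a gap between training and test RMS. A sketch with assumed data generation:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(N, rng):
    # Noisy sin(2*pi*x) samples; sizes and noise scale are assumptions
    x = np.linspace(0, 1, N)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=N)

x_train, t_train = make_data(10, rng)
x_test, t_test = make_data(100, rng)

def rms_error(w, x, t):
    """E_RMS = sqrt(2 E(w) / N) = root of the mean squared residual."""
    y = np.polyval(w[::-1], x)
    return np.sqrt(np.mean((y - t) ** 2))

train_rms, test_rms = {}, {}
for M in (0, 1, 3, 9):
    X = np.vander(x_train, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X, t_train, rcond=None)
    train_rms[M] = rms_error(w, x_train, t_train)
    test_rms[M] = rms_error(w, x_test, t_test)

print(train_rms, test_rms)
```

Training RMS falls monotonically with M, while test RMS eventually rises again.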

Page 10: Pattern Recognition  and  Machine Learning

Polynomial Coefficients

Page 11: Pattern Recognition  and  Machine Learning

Data Set Size:

9th Order Polynomial

Page 12: Pattern Recognition  and  Machine Learning

Data Set Size:

9th Order Polynomial

Page 13: Pattern Recognition  and  Machine Learning

Regularization

Penalize large coefficient values
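
One sketch of this idea is ridge-style regularization, minimizing Ẽ(w) = 1/2 Σₙ (y(xₙ, w) − tₙ)² + (λ/2) ‖w‖², which has a closed-form solution. The data details are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

def fit_ridge(x, t, M, lam):
    """Closed-form minimizer of 0.5*||Xw - t||^2 + 0.5*lam*||w||^2."""
    X = np.vander(x, M + 1, increasing=True)
    return np.linalg.solve(X.T @ X + lam * np.eye(M + 1), X.T @ t)

w_small_lam = fit_ridge(x, t, M=9, lam=np.exp(-18))  # ln(lambda) = -18
w_big_lam = fit_ridge(x, t, M=9, lam=np.exp(0))      # ln(lambda) = 0

print(np.linalg.norm(w_small_lam), np.linalg.norm(w_big_lam))
```

Larger λ shrinks the coefficient vector, trading training error for smoothness.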

Page 14: Pattern Recognition  and  Machine Learning

Regularization:

Page 15: Pattern Recognition  and  Machine Learning

Regularization:

Page 16: Pattern Recognition  and  Machine Learning

Regularization: E_RMS vs. ln λ

Page 17: Pattern Recognition  and  Machine Learning

Polynomial Coefficients

Page 18: Pattern Recognition  and  Machine Learning

Probability Theory

Apples and Oranges

Page 19: Pattern Recognition  and  Machine Learning

Probability Theory

Marginal Probability

Conditional Probability

Joint Probability

Page 20: Pattern Recognition  and  Machine Learning

Probability Theory

Sum Rule

Product Rule

Page 21: Pattern Recognition  and  Machine Learning

The Rules of Probability

Sum Rule

Product Rule
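
The sum rule p(X) = Σ_Y p(X, Y) and product rule p(X, Y) = p(Y | X) p(X) can be checked on a small joint table. The fruit-and-box numbers below follow the chapter's example but are assumptions of this sketch: p(red) = 0.4, p(blue) = 0.6; the red box holds 2 apples and 6 oranges, the blue box 3 apples and 1 orange.

```python
import numpy as np

p_box = np.array([0.4, 0.6])                      # p(B): [red, blue]
p_fruit_given_box = np.array([[0.25, 0.75],       # p(F | B=red):  [apple, orange]
                              [0.75, 0.25]])      # p(F | B=blue)

joint = p_box[:, None] * p_fruit_given_box        # product rule: p(B, F) = p(F|B) p(B)
p_fruit = joint.sum(axis=0)                       # sum rule:     p(F) = sum_B p(B, F)
p_box_given_orange = joint[:, 1] / p_fruit[1]     # Bayes' theorem: p(B | F=orange)

print(p_fruit, p_box_given_orange)
```

The last line is Bayes' theorem in action: under these numbers, p(B = red | F = orange) = 2/3.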

Page 22: Pattern Recognition  and  Machine Learning

Bayes’ Theorem

posterior ∝ likelihood × prior

Page 23: Pattern Recognition  and  Machine Learning

Probability Densities

Page 24: Pattern Recognition  and  Machine Learning

Transformed Densities

Note (Markus Svensén): This figure was taken from Solution 1.4 in the web edition of the solutions manual for PRML, available at http://research.microsoft.com/~cmbishop/PRML. A more thorough explanation of what the figure shows is provided in the text of the solution.
Page 25: Pattern Recognition  and  Machine Learning

Expectations

Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)

Page 26: Pattern Recognition  and  Machine Learning

Variances and Covariances

Page 27: Pattern Recognition  and  Machine Learning

The Gaussian Distribution

Page 28: Pattern Recognition  and  Machine Learning

Gaussian Mean and Variance

Page 29: Pattern Recognition  and  Machine Learning

The Multivariate Gaussian

Page 30: Pattern Recognition  and  Machine Learning

Gaussian Parameter Estimation

Likelihood function

Page 31: Pattern Recognition  and  Machine Learning

Maximum (Log) Likelihood
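
For a univariate Gaussian the maximum-likelihood solutions are the sample mean and the (biased) sample variance: μ_ML = (1/N) Σₙ xₙ and σ²_ML = (1/N) Σₙ (xₙ − μ_ML)², with E[σ²_ML] = ((N−1)/N) σ². A sketch on simulated data (the true parameters and sample size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.5, size=1000)   # true mu = 2, sigma^2 = 2.25

# Maximum-likelihood estimates for a univariate Gaussian
mu_ml = data.mean()                          # mu_ML = (1/N) sum_n x_n
sigma2_ml = np.mean((data - mu_ml) ** 2)     # sigma^2_ML, biased by (N-1)/N

# Log likelihood at the ML point: -N/2 ln(2*pi*sigma^2) - N/2,
# since sum_n (x_n - mu_ML)^2 = N * sigma^2_ML
N = len(data)
log_lik = -0.5 * N * np.log(2 * np.pi * sigma2_ml) - 0.5 * N

print(mu_ml, sigma2_ml)
```

With N = 1000 the bias factor (N−1)/N is negligible, but for small N it systematically underestimates the variance.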

Page 32: Pattern Recognition  and  Machine Learning

Properties of μ_ML and σ²_ML

Page 33: Pattern Recognition  and  Machine Learning

Curve Fitting Re-visited

Page 34: Pattern Recognition  and  Machine Learning

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error E(w).

Page 35: Pattern Recognition  and  Machine Learning

Predictive Distribution

Page 36: Pattern Recognition  and  Machine Learning

MAP: A Step towards Bayes

Determine w_MAP by minimizing the regularized sum-of-squares error Ẽ(w).

Page 37: Pattern Recognition  and  Machine Learning

Bayesian Curve Fitting

Page 38: Pattern Recognition  and  Machine Learning

Bayesian Predictive Distribution

Page 39: Pattern Recognition  and  Machine Learning

Model Selection

Cross-Validation
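
S-fold cross-validation scores each candidate model order on held-out data, then picks the order with the lowest average error. A sketch; the data generation and S = 5 are assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=30)

def cv_score(x, t, M, S=5):
    """Average held-out squared error of an order-M polynomial over S folds."""
    idx = np.arange(len(x))
    folds = np.array_split(idx, S)
    errs = []
    for k in range(S):
        test = folds[k]
        train = np.setdiff1d(idx, test)               # all points not in fold k
        coeffs = np.polyfit(x[train], t[train], M)    # fit on S-1 folds
        errs.append(np.mean((np.polyval(coeffs, x[test]) - t[test]) ** 2))
    return np.mean(errs)

scores = {M: cv_score(x, t, M) for M in (0, 1, 3, 9)}
best_M = min(scores, key=scores.get)
print(scores, best_M)
```

The held-out score penalizes both under-fitting (M too small) and over-fitting (M too large), unlike training error.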

Page 40: Pattern Recognition  and  Machine Learning

Curse of Dimensionality

Page 41: Pattern Recognition  and  Machine Learning

Curse of Dimensionality

Polynomial curve fitting, M = 3

Gaussian Densities in higher dimensions

Page 42: Pattern Recognition  and  Machine Learning

Decision Theory

Inference step: determine either p(x, C_k) or p(C_k | x).

Decision step: for given x, determine the optimal t.

Page 43: Pattern Recognition  and  Machine Learning

Minimum Misclassification Rate

Page 44: Pattern Recognition  and  Machine Learning

Minimum Expected Loss

Example: classify medical images as ‘cancer’ or ‘normal’

Loss matrix: columns indexed by decision, rows by truth.

Page 45: Pattern Recognition  and  Machine Learning

Minimum Expected Loss

Decision regions R_j are chosen to minimize the expected loss E[L] = Σ_k Σ_j ∫_{R_j} L_kj p(x, C_k) dx.

Page 46: Pattern Recognition  and  Machine Learning

Reject Option

Page 47: Pattern Recognition  and  Machine Learning

Why Separate Inference and Decision?

• Minimizing risk (loss matrix may change over time)
• Reject option
• Unbalanced class priors
• Combining models

Page 48: Pattern Recognition  and  Machine Learning

Decision Theory for Regression

Inference step: determine p(t | x).

Decision step: for given x, make an optimal prediction, y(x), for t.

Loss function: expected loss E[L] = ∬ L(t, y(x)) p(x, t) dx dt

Page 49: Pattern Recognition  and  Machine Learning

The Squared Loss Function
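
For the squared loss L(t, y(x)) = (y(x) − t)², the expected loss is minimized by the conditional mean, y(x) = E[t | x]. A numerical check at a single x (the sampling distribution is an assumption): sweeping candidate predictions shows the minimum lands at the sample mean.

```python
import numpy as np

rng = np.random.default_rng(5)
t = rng.normal(loc=1.0, scale=2.0, size=10000)   # samples of t for a fixed x

# Empirical expected squared loss E[(y - t)^2] over a grid of predictions y
candidates = np.linspace(-2, 4, 601)
losses = np.array([np.mean((y - t) ** 2) for y in candidates])
y_best = candidates[losses.argmin()]

print(y_best, t.mean())   # grid minimizer sits next to the sample mean
```

Expanding E[(y − t)²] = (y − E[t])² + var[t] makes the same point analytically: only the first term depends on y.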

Page 50: Pattern Recognition  and  Machine Learning

Generative vs Discriminative

Generative approach: model the joint p(x, C_k) (e.g. via p(x | C_k) and p(C_k)), then use Bayes' theorem to obtain p(C_k | x).

Discriminative approach: model p(C_k | x) directly.

Page 51: Pattern Recognition  and  Machine Learning

Entropy

Important quantity in
• coding theory
• statistical physics
• machine learning

Page 52: Pattern Recognition  and  Machine Learning

Entropy

Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?

All states equally likely: H[x] = -8 × (1/8) log2(1/8) = 3 bits
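
The entropy H[x] = −Σᵢ p(xᵢ) log₂ p(xᵢ) gives the expected number of bits. With 8 equally likely states it is 3 bits; a skewed distribution with probabilities (1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64) drops to 2 bits, matching the shortest achievable average code length.

```python
import numpy as np

def entropy_bits(p):
    """H[x] = -sum_i p_i log2 p_i (terms with p_i = 0 contribute nothing)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return -np.sum(nz * np.log2(nz))

uniform = np.full(8, 1 / 8)
skewed = np.array([1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64])

print(entropy_bits(uniform))   # all 8 states equally likely: 3 bits
print(entropy_bits(skewed))    # a shorter average code exists: 2 bits
```

The uniform distribution maximizes entropy over a fixed number of states; any skew can be exploited by shorter codes for likelier states.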

Page 53: Pattern Recognition  and  Machine Learning

Entropy

Page 54: Pattern Recognition  and  Machine Learning

Entropy

In how many ways can N identical objects be allocated to M bins?

Entropy is maximized when the objects are spread evenly across the bins, i.e. p_i = 1/M for all i.

Page 55: Pattern Recognition  and  Machine Learning

Entropy

Page 56: Pattern Recognition  and  Machine Learning

Differential Entropy

Put bins of width Δ along the real line.

Differential entropy is maximized (for fixed variance σ²) when p(x) is Gaussian,

in which case H[x] = (1/2) { 1 + ln(2πσ²) }.

Page 57: Pattern Recognition  and  Machine Learning

Conditional Entropy

Page 58: Pattern Recognition  and  Machine Learning

The Kullback-Leibler Divergence
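
KL(p‖q) = −∫ p(x) ln( q(x) / p(x) ) dx measures the extra message length incurred by coding data from p with a code built for q; it is non-negative, zero only when p = q, and asymmetric. A discrete sketch (the example distributions are made up):

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i ln(p_i / q_i), in nats; needs q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

print(kl_divergence(p, q), kl_divergence(q, p))   # positive and unequal
```

The asymmetry is why KL is a divergence rather than a distance.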

Page 59: Pattern Recognition  and  Machine Learning

Mutual Information
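
Mutual information is the KL divergence between the joint and the product of marginals, I[x, y] = KL( p(x, y) ‖ p(x) p(y) ); it vanishes exactly when x and y are independent. A sketch with assumed joint tables:

```python
import numpy as np

def mutual_information(joint):
    """I[x, y] = KL( p(x, y) || p(x) p(y) ), in nats."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
    mask = joint > 0
    return np.sum(joint[mask] * np.log(joint[mask] / (px * py)[mask]))

independent = np.outer([0.3, 0.7], [0.6, 0.4])   # p(x, y) = p(x) p(y)
dependent = np.array([[0.4, 0.1],
                      [0.1, 0.4]])

print(mutual_information(independent))   # ~0: independence
print(mutual_information(dependent))     # > 0: knowing x tells us about y
```

Equivalently, I[x, y] = H[x] − H[x | y]: the reduction in uncertainty about x once y is observed.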