Pattern Recognition and Machine Learning


Source: Bishop's book, Chapter 1, with modifications by Christoph F. Eick

PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 1: INTRODUCTION

Polynomial Curve Fitting

What M should we choose? Model Selection

Given M, what w’s should we choose? Parameter Selection

Experiment: given a function, create N training examples

Sum-of-Squares Error Function
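
For reference, the model and error function used throughout these slides (Bishop, Ch. 1) are the M-th order polynomial and the sum-of-squares error over the N training examples (x_n, t_n):

y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2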

0th Order Polynomial

How do M, the quality of the fit, and the ability to generalize relate to each other?

As N increases, the generalization error E decreases. As the model complexity c(H) increases, E first decreases and then increases, while the training error decreases for some time and then stays constant (frequently at 0).

1st Order Polynomial

3rd Order Polynomial

9th Order Polynomial

Over-fitting

Root-Mean-Square (RMS) Error:
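
With \mathbf{w}^* the minimizer of E(\mathbf{w}), the RMS error used in the plots is

E_{RMS} = \sqrt{2 E(\mathbf{w}^*) / N}

where dividing by N allows data sets of different size to be compared, and the square root puts the error on the same scale as the targets t.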

Polynomial Coefficients

Data Set Size:

9th Order Polynomial

Data Set Size:

9th Order Polynomial

Increasing the size of the data sets alleviates the over-fitting problem.
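
A minimal NumPy sketch of this experiment (not part of the slides; the noise level, seeds, and data-set sizes are illustrative assumptions). It fits a 9th-order polynomial to noisy samples of sin(2πx), Bishop's running example, and reports training and test RMS error as N grows:

import numpy as np

def make_data(n, noise_std=0.3, seed=0):
    # Noisy samples of sin(2*pi*x), the running example in Bishop Ch. 1.
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, size=n)
    return x, t

def rms_error(w, x, t):
    # E_RMS = sqrt(2 E(w) / N) for the sum-of-squares error E.
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

x_test, t_test = make_data(200, seed=1)
for n in (10, 15, 100):                 # illustrative data-set sizes
    x, t = make_data(n)
    w = np.polyfit(x, t, deg=9)         # 9th-order least-squares fit
    print(n, rms_error(w, x, t), rms_error(w, x_test, t_test))

Over-fitting shows up as a small training error paired with a large test error for small N; both shrink together as N increases.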

Regularization

Penalize large coefficient values

Idea: penalize high weights that contribute to high variance and sensitivity to outliers.
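
The regularized error function, in the weight-decay (ridge) form used by Bishop, is

\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2

where the coefficient λ controls how strongly large weights are penalized.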

Regularization: 9th Order Polynomial

Regularization:

Regularization: E_RMS vs. ln λ

The example demonstrated:

As N increases, the generalization error E decreases. As the model complexity c(H) increases, E first decreases and then increases, while the training error decreases for some time and then stays constant (frequently at 0).

Polynomial Coefficients

Weight of regularization increases

Probability Theory

Apples and Oranges

Probability Theory

Marginal Probability

Conditional Probability
Joint Probability

Probability Theory

Sum Rule

Product Rule

The Rules of Probability

Sum Rule

Product Rule

Bayes’ Theorem

posterior ∝ likelihood × prior
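
Written out for discrete variables X and Y, these rules are

Sum rule:       p(X) = \sum_Y p(X, Y)
Product rule:   p(X, Y) = p(Y \mid X)\, p(X)
Bayes' theorem: p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)},  with  p(X) = \sum_Y p(X \mid Y)\, p(Y)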

Probability Densities
Cumulative Distribution Function

Usually in ML!

Transformed Densities

Note (Markus Svensén): This figure was taken from Solution 1.4 in the web edition of the solutions manual for PRML, available at http://research.microsoft.com/~cmbishop/PRML. A more thorough explanation of what the figure shows is provided in the text of the solution.
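
The rule that the figure illustrates: under a change of variable x = g(y), a probability density picks up a Jacobian factor,

p_y(y) = p_x(g(y)) \left| \frac{dg}{dy} \right|

so, unlike an ordinary function, the location of the maximum of a density is not invariant under a nonlinear change of variable.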

Expectations (of f under p(x))

Conditional Expectation (discrete)

Approximate Expectation (discrete and continuous)
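
The expectations referred to on these slides take the forms

Discrete:      \mathbb{E}[f] = \sum_x p(x) f(x)
Continuous:    \mathbb{E}[f] = \int p(x) f(x)\, dx
Conditional:   \mathbb{E}_x[f \mid y] = \sum_x p(x \mid y) f(x)
Approximate (N samples x_n drawn from p(x)):  \mathbb{E}[f] \approx \frac{1}{N} \sum_{n=1}^{N} f(x_n)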

Variances and Covariances
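
In the same notation,

var[f] = \mathbb{E}\big[ (f(x) - \mathbb{E}[f(x)])^2 \big] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2
cov[x, y] = \mathbb{E}_{x,y}\big[ (x - \mathbb{E}[x])(y - \mathbb{E}[y]) \big] = \mathbb{E}_{x,y}[x y] - \mathbb{E}[x]\,\mathbb{E}[y]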

The Gaussian Distribution

Gaussian Mean and Variance

The Multivariate Gaussian
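
The univariate and D-dimensional Gaussian densities are

\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{ -\frac{(x - \mu)^2}{2\sigma^2} \Big\}
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{D/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\Big\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \Big\}

with mean \mathbb{E}[x] = \mu and variance var[x] = \sigma^2 in the univariate case.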

Gaussian Parameter Estimation

Likelihood function

Compare: the likelihood of the observations 2, 2.1, 1.9, 2.05, 1.99 under N(2,1) and under N(3,1)
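
A short Python sketch of this comparison (using scipy.stats; not part of the slides): the observations cluster around 2, so their joint likelihood under N(2,1) is much larger than under N(3,1).

import numpy as np
from scipy.stats import norm

data = np.array([2.0, 2.1, 1.9, 2.05, 1.99])

# Likelihood of the i.i.d. sample under each candidate Gaussian.
lik_n21 = np.prod(norm.pdf(data, loc=2.0, scale=1.0))
lik_n31 = np.prod(norm.pdf(data, loc=3.0, scale=1.0))
print(lik_n21, lik_n31)   # N(2,1) assigns the data a far higher likelihood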

Maximum (Log) Likelihood
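
For an i.i.d. sample x_1, ..., x_N from a Gaussian, the log likelihood and its maximizers are

\ln p(\mathbf{x} \mid \mu, \sigma^2) = -\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2 - \frac{N}{2} \ln \sigma^2 - \frac{N}{2} \ln(2\pi)
\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad \sigma^2_{ML} = \frac{1}{N} \sum_{n=1}^{N} (x_n - \mu_{ML})^2

The mean estimate is unbiased, while \mathbb{E}[\sigma^2_{ML}] = \frac{N-1}{N} \sigma^2, i.e. the ML variance systematically underestimates the true variance.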

Properties of μ_ML and σ²_ML

Curve Fitting Re-visited

Maximum Likelihood

Determine w_ML by minimizing the sum-of-squares error E(w).

Predictive Distribution (skip initially)

Model Selection

Cross-Validation
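
A minimal sketch of S-fold cross-validation for choosing the polynomial order M (plain NumPy; the fold count, candidate orders, and reuse of np.polyfit are illustrative assumptions, not from the slides):

import numpy as np

def cv_rms(x, t, degree, n_folds=4, seed=0):
    # Average held-out RMS error of a polynomial fit of the given degree.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    errors = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        w = np.polyfit(x[train], t[train], deg=degree)
        resid = np.polyval(w, x[val]) - t[val]
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return np.mean(errors)

# Example use, given training arrays x and t:
# best_M = min(range(10), key=lambda m: cv_rms(x, t, m))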

Entropy

Important quantity in:
• coding theory
• statistical physics
• machine learning

Entropy

Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?

All states equally likely
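
Worked out: with 8 equally likely states,

H[x] = -\sum_x p(x) \log_2 p(x) = -8 \times \frac{1}{8} \log_2 \frac{1}{8} = 3 \text{ bits}

so 3 bits are needed; a non-uniform distribution has lower entropy and admits a shorter average code length.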

Entropy

Entropy

In how many ways can N identical objects be allocated to M bins?

Entropy maximized when all states are equally likely, p(x_i) = 1/M
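
The counting argument behind these slides: placing N identical objects into M bins with n_i objects in bin i gives multiplicity W = N! / \prod_i n_i!, and with Stirling's approximation the entropy per object becomes

H = \frac{1}{N} \ln W \;\longrightarrow\; -\sum_{i=1}^{M} p_i \ln p_i, \qquad p_i = n_i / N

which is maximized by the uniform distribution p_i = 1/M, giving H = \ln M.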

Entropy

Differential Entropy

Put bins of width Δ along the real line
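
Discretizing a density p(x) into bins of width Δ and letting Δ → 0 gives the differential entropy:

H_\Delta = -\sum_i p(x_i)\Delta \, \ln\big( p(x_i)\Delta \big) = -\sum_i p(x_i)\Delta \, \ln p(x_i) \;-\; \ln \Delta
\lim_{\Delta \to 0} \Big\{ H_\Delta + \ln \Delta \Big\} = -\int p(x) \ln p(x)\, dx \;\equiv\; H[x]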

Differential entropy maximized (for fixed variance σ²) when p(x) is Gaussian, in which case H[x] = ½{1 + ln(2πσ²)}

Conditional Entropy
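
For a joint density p(x, y),

H[\mathbf{y} \mid \mathbf{x}] = -\iint p(\mathbf{x}, \mathbf{y}) \ln p(\mathbf{y} \mid \mathbf{x})\, d\mathbf{y}\, d\mathbf{x}, \qquad H[\mathbf{x}, \mathbf{y}] = H[\mathbf{y} \mid \mathbf{x}] + H[\mathbf{x}]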

The Kullback-Leibler Divergence
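
The KL divergence between a true distribution p and an approximating distribution q is

\mathrm{KL}(p \,\|\, q) = -\int p(\mathbf{x}) \ln \frac{q(\mathbf{x})}{p(\mathbf{x})}\, d\mathbf{x} \;\ge\; 0

with equality if and only if p = q.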

Mutual Information
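
Mutual information measures how far a joint distribution is from independence, and relates to entropy via

I[\mathbf{x}, \mathbf{y}] = \mathrm{KL}\big( p(\mathbf{x}, \mathbf{y}) \,\|\, p(\mathbf{x})\, p(\mathbf{y}) \big) = H[\mathbf{x}] - H[\mathbf{x} \mid \mathbf{y}] = H[\mathbf{y}] - H[\mathbf{y} \mid \mathbf{x}]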
