Prof. Matteo Matteucci Machine Learning Statistical Learning


Prof. Matteo Matteucci

Machine Learning: Statistical Learning


Statistical Learning Outline

o What Is Statistical Learning?
   Why estimate f?
   How do we estimate f?
   The trade-off between prediction accuracy & model interpretability
o Some important taxonomies (I expect you’ll know this by heart!)
   Prediction vs. Inference
   Parametric vs. Non-parametric models
   Regression vs. Classification problems
   Supervised vs. Unsupervised learning

[Diagram: X → f → Y/G]

Example: Increasing Sales by Advertising

What is Statistical Learning?

[Figure: two plots of y against x over [0, 1]; the annotations mark the function f and the errors εi]

o Suppose we observe $Y_i$ and $X_i = (X_{i1}, \dots, X_{ip})$ for $i = 1, \dots, n$
   Assume a relationship exists between Y and at least one of the observed X’s
   Assume we can model this relationship as $Y_i = f(X_i) + \varepsilon_i$
     • f : unknown systematic function
     • εi : zero-mean random error
o The term statistical learning refers to using the data to “learn” f
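A minimal sketch of this data-generating model may help. Everything specific in it is an assumption made for illustration: the sine-shaped `f`, the noise level `sd`, the single predictor, and the names `f` and `simulate` do not come from the slides.

```python
import math
import random


def f(x):
    # Hypothetical "true" systematic function; the slides do not specify one.
    return 0.1 * math.sin(2 * math.pi * x)


def simulate(n, sd, seed=0):
    """Draw n pairs (x_i, y_i) with y_i = f(x_i) + eps_i, eps_i ~ N(0, sd^2)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]          # one predictor in [0, 1)
    ys = [f(x) + rng.gauss(0.0, sd) for x in xs]   # systematic part + noise
    return xs, ys


xs, ys = simulate(n=2000, sd=0.03)

# In a simulation we know f, so we can recover the errors eps_i directly
# and check that they average out close to zero.
residuals = [y - f(x) for x, y in zip(xs, ys)]
mean_eps = sum(residuals) / len(residuals)
print(f"sample mean of eps: {mean_eps:.5f}")
```

With real data f is unknown, so the errors cannot be separated from the systematic part this way; that is exactly what estimating f is for.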


Reducible vs Irreducible Error

o The error of our estimate has two components
   Reducible error, due to the choice of the model for f (model complexity)
   Irreducible error, due to the presence of εi in the training set

$Y_i = f(X_i) + \varepsilon_i$

[Diagram: X → f → Y/G]


Because noise matters …

[Figure: eight panels of y against x over [0, 1], two for each noise level sd = 0.001, 0.005, 0.01, and 0.03]


Reducible vs Irreducible Error (Part 2)

o The error of our estimate has two components
   Reducible error, due to the choice of the model for f (model complexity)
   Irreducible error, due to the presence of εi in the training set

$Y_i = f(X_i) + \varepsilon_i$

o Let us assume $\hat{f}$ and $X$ are fixed for the time being


Reducible vs Irreducible Error (Part 3)

o Can you derive this?

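$$
\begin{aligned}
E[(Y - \hat{Y})^2] &= E[(f(X) + \varepsilon - \hat{f}(X))^2] \\
&= E[f(X)^2 + \varepsilon^2 + \hat{f}(X)^2 + 2\varepsilon f(X) - 2\varepsilon \hat{f}(X) - 2 f(X)\hat{f}(X)] \\
&= f(X)^2 + E[\varepsilon^2] + \hat{f}(X)^2 + 2E[\varepsilon] f(X) - 2E[\varepsilon]\hat{f}(X) - 2 f(X)\hat{f}(X) \\
&= f(X)^2 + E[\varepsilon^2] + \hat{f}(X)^2 - 2 f(X)\hat{f}(X) \\
&= (f(X)^2 - 2 f(X)\hat{f}(X) + \hat{f}(X)^2) + E[\varepsilon^2] \\
&= (f(X) - \hat{f}(X))^2 + E[(\varepsilon - 0)^2] \\
&= (f(X) - \hat{f}(X))^2 + \mathrm{Var}(\varepsilon)
\end{aligned}
$$

[Figure: four panels of y against x over [0, 1] with sd = 0.001, 0.005, 0.01, and 0.03; the annotations mark $X$, $\hat{Y} = \hat{f}(X)$, $Y = f(X) + \varepsilon$, $Y_1 = f(X) + \varepsilon_1$, and $Y_2 = f(X) + \varepsilon_2$]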
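The decomposition can also be checked numerically. The sketch below is a Monte Carlo check under assumed values: the numbers chosen for f(X), f̂(X), and the noise standard deviation are hypothetical, as is the helper name `expected_squared_error`. With X and f̂ held fixed, only ε is random.

```python
import random


def expected_squared_error(f_x, fhat_x, sd, n_draws=200_000, seed=0):
    """Estimate E[(Y - Yhat)^2] with X and fhat fixed, Y = f(X) + eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        y = f_x + rng.gauss(0.0, sd)    # a new noisy response at the same X
        total += (y - fhat_x) ** 2      # squared prediction error
    return total / n_draws


# Hypothetical values, chosen only for illustration.
f_x, fhat_x, sd = 0.05, 0.02, 0.03

mc = expected_squared_error(f_x, fhat_x, sd)
reducible = (f_x - fhat_x) ** 2    # (f(X) - fhat(X))^2
irreducible = sd ** 2              # Var(eps)

print(f"Monte Carlo estimate:    {mc:.6f}")
print(f"reducible + irreducible: {reducible + irreducible:.6f}")
```

Improving f̂ can shrink the first term to zero, but no choice of f̂ removes the Var(ε) term.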


Example: Income vs. Education and Seniority

o The function f might also involve multiple variables …

Why do we estimate f?

o There are two reasons for estimating f
   Prediction
   Inference
o Prediction
   If we can produce a good estimate of f (and the variance of ε is not too large), we can make accurate predictions for the response, Y/G, based on a new value of X.
o Inference
   We may be interested in the type of relationship between Y/G and the X's, in order to control or influence Y/G.
     • Which particular predictors actually affect the response?
     • Is the relationship positive or negative?
     • Is the relationship a simple linear one, or is it more complicated?

[Diagram: X → f → Y/G]


Examples for Prediction & Inference

o Direct Mail Prediction
   We are interested in predicting how much money an individual will donate, based on observations from 90,000 people for whom we have recorded over 400 different characteristics.
   We don’t care too much about each individual characteristic.
   We just want to know: for a given individual, should I send out a mailing?
o Median House Price
   Which factors have the biggest effect on the response?
   How big is the effect?
   We want to know: how much impact does a river view have on the house value?

How Do We Estimate f?

o We have observed a set of training data $\{(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)\}$
o Use a statistical method/model to estimate f so that $Y \approx \hat{f}(X)$ for any $(X, Y)$
o Statistical methods/models are usually divided into
   Parametric Methods/Models
   Non-parametric Methods/Models

[Figure: plot of y against x over [0, 1]]

Parametric Methods (Part 1)

o Parametric methods rely on an assumption about the model underlying f
   They reduce the problem of estimating f to that of estimating a set of parameters
   They involve a two-step, model-based approach
o STEP 1: Make some assumption about the functional form of f, i.e., come up with a model (e.g., a linear model)
   $f(X_i) = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip}$
o STEP 2: Use the training data to fit the model, i.e., estimate f through the unknown parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_p$


Parametric Methods (Part 2)

o Parametric methods rely on an assumption about the model underlying f
   They reduce the problem of estimating f to that of estimating a set of parameters
   They involve a two-step, model-based approach
o STEP 1: In this course we will examine models for f that are far more complicated, and flexible, than linear ones. In a sense, the more flexible the model, the more realistic it is.
o STEP 2: The most common approach for estimating the parameters in a linear model is Ordinary Least Squares (OLS), but there are often superior approaches.
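The two steps can be sketched for the simplest case. This is an illustration under stated assumptions, not the course's implementation: it uses a single predictor (so the linear model is f(x) = β0 + β1 x), the closed-form OLS solution for that case, and simulated data whose "true" coefficients are hypothetical. The name `ols_fit` is also hypothetical.

```python
import random


def ols_fit(xs, ys):
    """Closed-form OLS for one predictor: minimises sum((y - b0 - b1*x)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx                 # slope estimate
    b0 = mean_y - b1 * mean_x      # intercept estimate
    return b0, b1


# STEP 1: assume a linear form. The coefficients below are hypothetical.
true_b0, true_b1 = 2.0, 0.5

# Simulated training data: y = b0 + b1*x + noise.
rng = random.Random(1)
xs = [rng.uniform(0.0, 10.0) for _ in range(500)]
ys = [true_b0 + true_b1 * x + rng.gauss(0.0, 0.1) for x in xs]

# STEP 2: estimate f through its two unknown parameters.
b0, b1 = ols_fit(xs, ys)
print(f"estimated intercept {b0:.3f}, estimated slope {b1:.3f}")
```

Estimating f has been reduced to estimating two numbers; with p predictors the same idea applies to p + 1 parameters, usually solved with a linear-algebra routine rather than by hand.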


Example: A Linear Regression Estimate

o Even if the standard deviation is low, we will still get a bad answer if we use the wrong model.

$f = \beta_0 + \beta_1 \times \text{Education} + \beta_2 \times \text{Seniority}$
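A small simulation illustrates the point about using the wrong model. It is a sketch under assumptions: the true function is taken to be a quadratic (a hypothetical choice, not the income surface from the slide), there is one predictor, and the noise is deliberately tiny. The helper name `fit_line` is hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares straight line through (xs, ys); returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


rng = random.Random(2)
sd = 0.001                                   # very low noise
xs = [i / 199 for i in range(200)]
# Hypothetical true function: a parabola, which no straight line can follow.
ys = [(x - 0.5) ** 2 + rng.gauss(0.0, sd) for x in xs]

b0, b1 = fit_line(xs, ys)
mse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(f"noise variance:         {sd ** 2:.2e}")
print(f"MSE of the linear fit:  {mse:.2e}")
```

The fit error is dominated by the mismatch between the assumed form and the true f, so lowering the noise further would not help.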


Non-parametric Methods

o Sometimes referred to as “sample-based” or “instance-based” methods, they do not make explicit assumptions about the functional form of f; they exploit the training data “directly”
o Advantages:
   They accurately fit a wider range of possible shapes of f
   They do not require a “training” phase
o Disadvantages:
   A very large number of observations is required to obtain an accurate estimate of f
   Higher computational cost at “testing” time
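One instance-based method is k-nearest-neighbour regression, used here purely as an illustration; the slides do not name a specific method at this point. The sketch assumes one predictor, a hypothetical sine-shaped true function, and a hypothetical helper name `knn_predict`.

```python
import math
import random


def knn_predict(x_query, xs, ys, k=10):
    """Average the responses of the k training points closest to x_query.

    There is no training phase: all the work happens at prediction time,
    where every query scans the stored training data.
    """
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))[:k]
    return sum(ys[i] for i in nearest) / k


# Simulated training data from a hypothetical non-linear f.
rng = random.Random(3)
xs = [rng.random() for _ in range(1000)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.05) for x in xs]

# No functional form was assumed, yet the prediction tracks the sine curve.
pred = knn_predict(0.25, xs, ys, k=10)
print(f"prediction at x = 0.25: {pred:.3f} (true f is {math.sin(math.pi / 2):.3f})")
```

Both listed disadvantages show up here: the estimate is only good where training points are dense, and each prediction costs a pass over the whole training set.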


Example: A Thin-Plate Spline Estimate

o Non-parametric regression methods are more flexible, thus they can potentially provide more accurate estimates

[Figure: smooth thin-plate spline fit]

Prediction Accuracy vs Model Interpretability

o Why not just use a more flexible method if it is more realistic?
   Reason 1:
   A simple method such as linear regression produces a model that is much easier to interpret (the Inference part is better).
   E.g., in a linear model, βj is the average increase in Y for a one-unit increase in Xj, holding all other variables constant.
   Reason 2:
   Even if you are only interested in prediction, it is often possible to get more accurate predictions with a simple model than with a complicated one.
   This seems counterintuitive, but it has to do with the fact that a more flexible model is harder to fit properly.
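The interpretation of βj given under Reason 1 can be made concrete. In the sketch below the coefficient values and the inputs are hypothetical, and `linear_f` is a hypothetical helper; the two predictors are labelled Education and Seniority only to echo the income example.

```python
def linear_f(x, beta):
    """Linear model: beta[0] is the intercept, beta[j] multiplies x[j - 1]."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


# Hypothetical coefficients: intercept, Education, Seniority.
beta = [20.0, 5.0, 0.5]

x = [12.0, 40.0]         # Education = 12, Seniority = 40
x_plus = [13.0, 40.0]    # one more unit of Education, Seniority held constant

delta = linear_f(x_plus, beta) - linear_f(x, beta)
print(f"change in predicted Y: {delta}")   # equals beta[1]
```

The change equals the Education coefficient regardless of the starting point, which is what makes a linear model easy to read; a flexible non-parametric fit offers no such single number.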


A Poor Estimate

o Non-parametric regression methods can also be too flexible and produce poor estimates of f

[Figure: thin-plate spline fit with zero training error]
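The same effect can be reproduced with a simpler method. The sketch below swaps the thin-plate spline for k-nearest-neighbour regression, which is an assumption made for brevity: with k = 1 the fit passes through every training point (zero training error), while a larger k gives a smoother fit. The linear true function, the noise level, and all names are hypothetical.

```python
import random


def f(x):
    # Hypothetical true function, chosen for this sketch only.
    return x


def knn_predict(x_query, xs, ys, k):
    """Average the responses of the k training points closest to x_query."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))[:k]
    return sum(ys[i] for i in nearest) / k


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


rng = random.Random(4)
sd = 0.1
train_x = [i / 399 for i in range(400)]                  # distinct grid points
train_y = [f(x) + rng.gauss(0.0, sd) for x in train_x]
test_x = [rng.random() for _ in range(600)]              # fresh observations
test_y = [f(x) + rng.gauss(0.0, sd) for x in test_x]

results = {}
for k in (1, 15):
    train_pred = [knn_predict(x, train_x, train_y, k) for x in train_x]
    test_pred = [knn_predict(x, train_x, train_y, k) for x in test_x]
    results[k] = (mse(train_pred, train_y), mse(test_pred, test_y))
    print(f"k={k:2d}  train MSE={results[k][0]:.4f}  test MSE={results[k][1]:.4f}")
```

The k = 1 fit reproduces the training noise exactly, so its perfect training error says nothing about how it behaves on new data.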

Flexibility vs Model Interpretability

Supervised vs. Unsupervised Learning

o Machine Learning usually makes a clear distinction between
   Supervised Models
   Unsupervised Models
o Supervised Learning:
   Both the predictors, Xi, and the response, Yi, are observed.

Supervised vs. Unsupervised Learning

o Machine Learning usually makes a clear distinction between
   Supervised Models
   Unsupervised Models
o Unsupervised Learning:
   Only the Xi’s are observed; we use them to build a high-level representation (possibly for modeling some Y)

Regression vs. Classification

o Supervised learning problems can be further divided into
   Regression problems, which cover situations where Y is continuous/numerical
     • Predicting the value of the Dow in 6 months
     • Predicting the value of a given house based on various inputs
   Classification problems, which cover situations where Y is categorical
     • Will the Dow be up (U) or down (D) in 6 months?
     • Is this email SPAM or not?
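The difference in response type also changes how predictions are scored. The sketch below uses made-up numbers for the Dow example; mean squared error and misclassification rate are assumed here as the usual scores for the two settings, and the function names are hypothetical.

```python
def mean_squared_error(y_true, y_pred):
    """Regression score: average squared distance between numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(g_true, g_pred):
    """Classification score: fraction of labels predicted wrongly."""
    return sum(t != p for t, p in zip(g_true, g_pred)) / len(g_true)


# Hypothetical six-month changes in the Dow (numerical response Y).
changes_true = [1.2, -0.4, 0.3, -2.0]
changes_pred = [0.8, 0.1, 0.5, -1.5]

# The same outcomes recast as a categorical response G: up (U) or down (D).
labels_true = ["U" if c > 0 else "D" for c in changes_true]
labels_pred = ["U" if c > 0 else "D" for c in changes_pred]

reg_score = mean_squared_error(changes_true, changes_pred)
cls_score = error_rate(labels_true, labels_pred)
print(f"regression MSE: {reg_score:.3f}")
print(f"classification error rate: {cls_score:.2f}")
```

A prediction can be numerically close and still land in the wrong class (the second entry), which is why the two problem types are treated separately.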

A Simple Clustering Example
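As one possible illustration of clustering, the sketch below runs k-means on simulated one-dimensional data. The choice of k-means, the two Gaussian groups, and the name `kmeans_1d` are all assumptions; the slide itself only shows a figure.

```python
import random


def kmeans_1d(points, k, n_iter=20, seed=0):
    """Plain k-means on numbers: alternate assignment and centre updates."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # start from k observed points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its closest current centre.
            j = min(range(k), key=lambda c: abs(p - centers[c]))
            clusters[j].append(p)
        # Move each centre to the mean of its points (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)


# Only X is observed: two hypothetical groups, with no labels given.
rng = random.Random(5)
points = ([rng.gauss(0.0, 0.5) for _ in range(200)]
          + [rng.gauss(10.0, 0.5) for _ in range(200)])

centers = kmeans_1d(points, k=2)
print("estimated cluster centres:", [round(c, 2) for c in centers])
```

No response Y is used anywhere; the group structure is recovered from the Xi's alone, which is what distinguishes this from the supervised examples.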

What about higher dimensions?

Wrap up!

o What Is Statistical Learning?
   Why estimate f?
   How do we estimate f?
   The trade-off between prediction accuracy & model interpretability
o Some important taxonomies (I expect you’ll know this by heart!)
   Prediction vs. Inference
   Parametric vs. Non-parametric models
   Regression vs. Classification problems
   Supervised vs. Unsupervised learning

[Diagram: X → f → Y/G]