Transcript
  • Introduction to Predictive Models

    The Bias-Variance Tradeoff

    Cross Validation

    Some of the figures in this presentation are taken from An Introduction to Statistical Learning, with Applications in R (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani

    Carlos Carvalho, Mladen Kolar, and Rob McCulloch

  • 1. Introduction to Predictive Models
    2. Measuring Accuracy
    3. Out-of-Sample Predictions
    4. Bias-Variance Trade-Off
    5. Cross-Validation
    6. More on k-Nearest Neighbors, p > 1
    7. Doing CV with a Bigger n

  • 1. Introduction to Predictive Models

    Simply put, the goal is to predict a target variable Y with input variables X!

    In data mining terminology this is known as supervised learning (also called predictive analytics).

    In general, a useful way to think about it is that Y and X are related in the following way:

    Y_i = f(X_i) + ε_i

    The main purpose of this part of the course is to learn or estimate f(·) from data.

  • Examples:

    - Y: will a customer respond to a promotion (target marketing).

    - Y: which customer is likely to cancel.

    - Y: the lifetime value of a customer (how much will they spend).

    - Y: pregnancy (from shopping behaviour) so you can target.

    - Y: will a customer defect.

    - Y: predict which products a customer will like (Pandora, Amazon).

    - Y: predict age of death (insurance companies).

    - ...

    See Tables 1-9 after page 142 of “Predictive Analytics” by Eric Siegel for many examples.

  • Y = f(X) + ε

    - f(x): the part of Y you learn from X, the signal.

    - ε: the part of Y you don’t learn from X, the noise.

    More generally, we want the conditional distribution of Y given X = x.
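    The signal-plus-noise decomposition can be illustrated with a small simulation. This is only a sketch: the function f, the noise level, the sample size, and the seed below are all assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible


def f(x):
    # Hypothetical signal, chosen only for illustration.
    return 30.0 * np.exp(-0.05 * x)


n = 500
x = rng.uniform(2.0, 38.0, size=n)   # inputs X
eps = rng.normal(0.0, 4.0, size=n)   # noise: the part of Y that X does not explain
y = f(x) + eps                       # Y = f(X) + eps

signal_var = np.var(f(x))
noise_var = np.var(eps)
print("variance of the signal f(X):", round(signal_var, 2))
print("variance of the noise eps:  ", round(noise_var, 2))
```

    In a real problem only x and y are observed; f and eps are available here only because the data were simulated.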

  • Example: Boston Housing

    We might be interested in predicting the median house value as a function of some measure of socioeconomic level. Here’s some data:

    Each observation corresponds to a town in the Boston area.

    medv: median house value (data is old).

    lstat: % lower status.

    [Figure: scatterplot of medv (vertical axis) against lstat (horizontal axis).]

    What should f(·) be?

  • How about this...

    [Figure: medv vs. lstat scatterplot with a candidate fit.]

    If lstat = 30, what is the prediction for medv?

  • or this?

    [Figure: medv vs. lstat scatterplot with another candidate fit.]

    If lstat = 30, what is the prediction for medv?

  • How do we estimate f (·)?

    - Using training data:

      {(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)}

    - We use a statistical method to estimate the function f(·)

    - Two general methodological strategies:

      1. simple parametric models (restricted assumptions about f(·))
      2. non-parametric models (flexibility in defining f(·))

    [Figure: two panels showing Income as a function of Years of Education and Seniority.]

  • Back to Boston Housing

    [Figure: two panels of medv against lstat. Left: parametric model (Y = α + βx + ε). Right: non-parametric model (k-nearest neighbors).]

  • Simple parametric model:

    Y_i = α + β x_i + ε_i

    Using the training data, we estimate f(x) as

    f̂(x) = α̂ + β̂ x

    where α̂ and β̂ are the linear regression estimates.
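    As a rough sketch of how these estimates can be computed, the code below uses the standard least-squares formulas for one input variable. The helper names fit_line and predict_line are hypothetical, and the data are synthetic stand-ins, not the Boston housing data.

```python
import numpy as np


def fit_line(x, y):
    """Return (alpha_hat, beta_hat) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sample covariance of x and y divided by sample variance of x.
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through (x_bar, y_bar).
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def predict_line(alpha_hat, beta_hat, x_new):
    """Evaluate f_hat(x) = alpha_hat + beta_hat * x."""
    return alpha_hat + beta_hat * np.asarray(x_new, dtype=float)


# Synthetic training data (assumed for illustration only).
rng = np.random.default_rng(1)
x_train = rng.uniform(2.0, 38.0, size=200)
y_train = 35.0 - 0.9 * x_train + rng.normal(0.0, 3.0, size=200)

alpha_hat, beta_hat = fit_line(x_train, y_train)
pred_30 = predict_line(alpha_hat, beta_hat, 30.0)
print("alpha_hat:", alpha_hat, "beta_hat:", beta_hat)
print("prediction at x = 30:", pred_30)
```

    The whole fitted function is summarized by two numbers, which is what makes this a parametric model.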

    [Figure: medv vs. lstat scatterplot with the fitted regression line.]

  • To get this estimate we used kNN: k-nearest neighbors.

    To estimate f(x_f), average the y values for the k training observations with x closest to x_f.

    [Figure: medv vs. lstat scatterplot with the kNN fit.]

    What do I mean by closest? We will choose the k = 50 points that are closest to the X value at which we are trying to predict.
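    The kNN rule for a single input variable can be sketched in a few lines. The function name knn_predict and the tiny data set are hypothetical, and closeness is taken to be the absolute difference |x_i - x_f|, which is an assumption that matches the one-dimensional setting here.

```python
import numpy as np


def knn_predict(x_train, y_train, x_f, k):
    """Average the y values of the k training points whose x is closest to x_f."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    if not 1 <= k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training points")
    dist = np.abs(x_train - x_f)                    # distance of each x_i to x_f
    nearest = np.argsort(dist, kind="stable")[:k]   # indices of the k smallest distances
    return y_train[nearest].mean()


# Tiny made-up training set, for illustration only.
x_train = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
y_train = np.array([10.0, 20.0, 30.0, 100.0, 110.0])

print(knn_predict(x_train, y_train, x_f=2.2, k=3))   # averages the y values at x = 1, 2, 3
print(knn_predict(x_train, y_train, x_f=10.4, k=2))  # averages the y values at x = 10, 11
```

    No parameters are estimated in advance: the training data themselves are consulted each time a prediction is made.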

  • [Figures, six slides: medv vs. lstat scatterplot, each labeled k = 50.]

  • Okay, that seems sensible, but should we use 2 neighbors or 200 neighbors?

    [Figure: medv vs. lstat scatterplot with the kNN fit, k = 2.]

  • [Figure: same plot, k = 10.]

  • [Figure: same plot, k = 50.]

  • [Figure: same plot, k = 100.]

  • [Figure: same plot, k = 150.]
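    How the choice of k changes the fit can also be explored numerically. The sketch below is an illustration on synthetic data (the signal, noise level, and seed are assumptions, not the Boston data); knn_curve is a hypothetical helper, and the "wiggliness" number is just an ad hoc summary of how much the fitted curve jumps between neighboring grid points.

```python
import numpy as np


def knn_curve(x_train, y_train, x_grid, k):
    """kNN prediction at every point of x_grid (one input variable)."""
    # dist[i, j] = |x_grid[i] - x_train[j]|
    dist = np.abs(x_grid[:, None] - x_train[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]   # k nearest training indices per grid point
    return y_train[nearest].mean(axis=1)


# Synthetic training data (assumed for illustration only).
rng = np.random.default_rng(2)
n = 200
x_train = rng.uniform(2.0, 38.0, size=n)
y_train = 30.0 * np.exp(-0.05 * x_train) + rng.normal(0.0, 4.0, size=n)
x_grid = np.linspace(2.0, 38.0, 100)

for k in (2, 10, 50, 100, 150, n):
    fit = knn_curve(x_train, y_train, x_grid, k)
    # Mean absolute jump between neighboring grid points.
    wiggle = np.mean(np.abs(np.diff(fit)))
    print(f"k={k:3d}  wiggliness={wiggle:.3f}  fit range=[{fit.min():.1f}, {fit.max():.1f}]")
```

    With a very small k the fit follows individual points closely, while with k equal to the number of training points every prediction is the overall average of y, so the fitted curve is flat.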

