Lecture 12: The Bootstrap
Reading: Chapter 5
STATS 202: Data mining and analysis
Jonathan Taylor, 10/19. Slide credits: Sergio Bacallado
Announcements
I Midterm is a week from today
I Topics: chapters 1-5 and 10 of the book — everything until and including today's lecture.
I We will post a practice exam.
I Notes: 1 page double sided or 2 pages single sided. Closed book.
I No calculators necessary.
I SCPD students: if you haven't chosen your proctor already, you must do it ASAP. For guidelines see:
http://scpd.stanford.edu/programs/courses/graduate-courses/exam-monitor-information
Cross-validation vs. the Bootstrap
Cross-validation: provides estimates of the (test) error.
The Bootstrap: provides the (standard) error of estimates.
I One of the most important techniques in all of Statistics.
I Computer intensive method.
I Popularized by Brad Efron, from Stanford.
Standard errors in linear regression
Standard error: SD of an estimate from a sample of size n.
Classical way to compute Standard Errors
Example: Estimate the variance of a sample x1, x2, . . . , xn:
σ̂² = 1/(n−1) · ∑_{i=1}^{n} (x_i − x̄)².
What is the Standard Error of σ̂2?
1. Assume that x1, . . . , xn are normally distributed with common mean µ and variance σ².
2. Then (n−1)σ̂²/σ² has a χ-squared distribution with n−1 degrees of freedom.
3. For large n, σ̂² is normally distributed around σ².
4. The SD of this sampling distribution is the Standard Error.
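As a sketch of step 4: under normality, Var(σ̂²) = 2σ⁴/(n−1), so the Standard Error is σ²√(2/(n−1)). The following numpy snippet (with hypothetical choices of n, µ, σ) checks this analytic value against the SD of σ̂² over many repeated samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 200, 0.0, 1.0  # hypothetical sample size and parameters

# Analytic standard error of the sample variance under normality:
# Var(sigma_hat^2) = 2*sigma^4/(n-1), so SE = sigma^2 * sqrt(2/(n-1)).
se_analytic = sigma**2 * np.sqrt(2 / (n - 1))

# Monte Carlo check: draw many samples and take the SD of sigma_hat^2.
reps = 20000
var_hats = np.array([
    rng.normal(mu, sigma, size=n).var(ddof=1) for _ in range(reps)
])
se_mc = var_hats.std()

print(se_analytic, se_mc)  # the two values should be close
```

The agreement between `se_analytic` and `se_mc` is what the classical derivation promises; the simulation route is what the bootstrap will generalize.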
Limitations of the classical approach
This approach has served statisticians well for many years; however, what happens if:
I The distributional assumption — for example, x1, . . . , xn being normal — breaks down?
I The estimator does not have a simple form and its sampling distribution cannot be derived analytically?
Example. Investing in two assets
Suppose that X and Y are the returns of two assets.
These returns are observed every day: (x1, y1), . . . , (xn, yn).
[Figure: four scatter plots of simulated daily returns, X vs. Y.]
Example. Investing in two assets
We have a fixed amount of money to invest and we will invest a fraction α on X and a fraction (1 − α) on Y. Therefore, our return will be

αX + (1 − α)Y.

Our goal will be to minimize the variance of our return as a function of α. One can show that the optimal α is:

α = (σ²_Y − Cov(X, Y)) / (σ²_X + σ²_Y − 2 Cov(X, Y)).

Proposal: Use an estimate:

α̂ = (σ̂²_Y − Ĉov(X, Y)) / (σ̂²_X + σ̂²_Y − 2 Ĉov(X, Y)).
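The plug-in estimate α̂ reads the required variances and covariance off the sample covariance matrix. A minimal numpy sketch (the simulated return parameters are made up for illustration):

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of the variance-minimizing weight alpha."""
    c = np.cov(x, y)  # 2x2 sample covariance matrix of (X, Y)
    var_x, var_y, cov_xy = c[0, 0], c[1, 1], c[0, 1]
    return (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)

# Hypothetical daily returns: bivariate normal with assumed parameters.
rng = np.random.default_rng(1)
xy = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.25]], size=100)
print(alpha_hat(xy[:, 0], xy[:, 1]))
```

With the assumed parameters the population α is (1.25 − 0.5)/(1 + 1.25 − 1) = 0.6, so the printed α̂ should land in that neighborhood.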
Example. Investing in two assets
Suppose we compute the estimate α̂ = 0.6 using the samples (x1, y1), . . . , (xn, yn).
I How sure can we be of this value?
I If we resampled the observations, would we get a wildly different α̂?
In this thought experiment, we know the actual joint distribution P(X, Y), so we can resample the n observations to our hearts' content.
Resampling the data from the true distribution
[Figure: four scatter plots of fresh samples of (X, Y) drawn from the true distribution.]
Computing the standard error of α̂
For each resampling of the data,
(x_1^{(1)}, . . . , x_n^{(1)})
(x_1^{(2)}, . . . , x_n^{(2)})
. . .
we can compute a value of the estimate α̂^{(1)}, α̂^{(2)}, . . . .
The Standard Error of α̂ is approximated by the standard deviation of these values.
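This thought experiment is easy to simulate. Assuming we know the true joint distribution (a bivariate normal with illustrative parameters below), we draw many fresh datasets of size n, compute α̂ on each, and take the standard deviation:

```python
import numpy as np

rng = np.random.default_rng(2)
true_cov = np.array([[1.0, 0.5], [0.5, 1.25]])  # assumed "true" P(X, Y)
n, reps = 100, 1000

def alpha_hat(x, y):
    c = np.cov(x, y)
    return (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2 * c[0, 1])

# Draw `reps` independent datasets from the true distribution; the SD of
# the resulting alpha-hats approximates the Standard Error of alpha-hat.
alphas = np.empty(reps)
for r in range(reps):
    xy = rng.multivariate_normal([0, 0], true_cov, size=n)
    alphas[r] = alpha_hat(xy[:, 0], xy[:, 1])

print(alphas.std())
```

Of course, this only works because we granted ourselves the true P(X, Y); the next slide drops that assumption.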
In reality, we only have n samples
[Figure: scatter plots of the n observed samples of (X, Y).]
I However, these samples can be used to approximate the joint distribution of X and Y.
I The Bootstrap: Resample from the empirical distribution:

P̂(X, Y) = 1/n · ∑_{i=1}^{n} δ_{(x_i, y_i)}.

I Equivalently, resample the data by drawing n samples with replacement from the actual observations.
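The bootstrap replaces fresh draws from P(X, Y) with draws from the empirical distribution, i.e. sampling rows of the observed data with replacement. A minimal numpy sketch (the single observed dataset is simulated here purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def alpha_hat(x, y):
    c = np.cov(x, y)
    return (c[1, 1] - c[0, 1]) / (c[0, 0] + c[1, 1] - 2 * c[0, 1])

# One observed dataset of n (X, Y) pairs (simulated for illustration).
n = 100
xy = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.25]], size=n)

# Bootstrap: draw n row indices with replacement, recompute alpha-hat
# on each resampled dataset, and take the SD of the B estimates.
B = 1000
boot_alphas = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # n rows sampled with replacement
    boot_alphas[b] = alpha_hat(xy[idx, 0], xy[idx, 1])

se_boot = boot_alphas.std()
print(se_boot)
```

Note the only change from the previous simulation: instead of new draws from the true distribution, each iteration reuses the one dataset we actually have.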
A schematic of the Bootstrap
Original Data (Z):

Obs   X     Y
1     4.3   2.4
2     2.1   1.1
3     5.3   2.8

Each bootstrap data set Z*b is obtained by drawing n = 3 observations with replacement from Z, and each yields an estimate α̂*b:

Z*1: Obs 3, 1, 3  →  α̂*1
Z*2: Obs 2, 3, 1  →  α̂*2
. . .
Z*B: Obs 2, 2, 1  →  α̂*B
Comparing Bootstrap resamplings to resamplings from the true distribution
[Figure: histograms of α̂ from resamplings of the true distribution (left) and from bootstrap resamplings (right), with side-by-side boxplots; both distributions are centered near the true α.]