
Quantitative Methods

Review—Bivariate Regression

What is the criterion that OLS uses to “fit” a line to your data?

What is a parameter? A parameter estimate?

What are independent variables—or rhs (right-hand-side) variables? Dependent variables?

What is the slope? The intercept? The error term (in the population) or the residual (in the sample)?


Review of notation—slope, intercept, error (estimated or sample VS. true or population)

Two possible consequences of violating an OLS assumption—“bias” (what does that mean?) and inflated / deflated standard errors (what does that mean?)


Assumptions:

No measurement error

Specification—include all relevant rhs variables, no irrelevant rhs variables, linear relationship

Is this likely what our data look like?

Homoskedastic error terms (no heteroskedasticity)

No autocorrelation

Bivariate Regression: Robustness

A discussion about standard errors, p-values, and statistical significance

Confidence intervals. What are they?

A confidence interval is a range constructed so that, across repeated samples, it would contain the true parameter a pre-specified percentage of the time.


The wider the confidence interval, the less certain you are of the estimate. (A relatively wide confidence interval means that you could gather another sample, and would not be confident that your new slope estimate would be relatively close to the one you have from this sample).

The wider the confidence interval....

The higher the p-value (farther away from .05 or .01)...

The less statistically significant the results....

The less confident you are that there is a non-zero effect of the independent variable on the dependent variable


The narrower the confidence interval, the more “robust”, “efficient”, “stable” the results are.

If you were to gather an infinite number of samples from your population, and calculate an infinite number of slope estimates (one from each sample), you don’t expect that the slope estimate will change much from sample to sample...


The standard error and variance of the slope estimate....

is a closely related concept. The larger the standard error, the less confident you are in your results (and the wider your confidence interval). Recall the image of the seesaw.


Let’s start with the variance of the residual. The formula for the variance of the residual (or the estimated variance of the error term):

$$\hat{\sigma}^2 = \frac{\sum_i (e_i - \bar{e})^2}{n - 2}$$


Note two elements of that equation--

First, what (by assumption) is the average residual?

And second, why are we subtracting 2 from our sample size?
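The calculation can be sketched in a few lines of Python. This is only an illustration: the five data points are made up, and the intercept (2.2) and slope (0.6) are the least-squares values worked out by hand for those points. All variable names are hypothetical.

```python
# Toy data (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # OLS intercept and slope for these five points

n = len(x)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# With an intercept in the model, OLS residuals average to zero.
e_bar = sum(residuals) / n

# Divide by n - 2 because two parameters (a and b) were estimated.
sigma2_hat = sum((e - e_bar) ** 2 for e in residuals) / (n - 2)

print(f"mean residual (zero up to rounding) = {e_bar:.6f}")
print(f"estimated error variance = {sigma2_hat:.3f}")
```

Because the average residual is zero, the numerator is just the sum of squared residuals, and here the estimate comes out at about 0.8.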


What is the variance of the slope estimate?

$$\hat{\sigma}_b^2 = \frac{\hat{\sigma}^2}{\sum_i (X_i - \bar{X})^2}$$


Note the numerator of the variance of the slope estimate b: it taps into the variance of the residuals (or, how well the data “fit” your estimated line)

Note the denominator of the variance of the slope estimate b: it taps into the range or variance of X.


So, if the data fit your line well....

The numerator of that equation is reduced

The variance of the slope estimate b is reduced; your results are more stable

The confidence interval for β is narrower

Your results are more statistically significant; your p-value is relatively low.

You are more likely to reject the null hypothesis that β=0


And, if you have relatively good variance in X...

The denominator of that equation is increased

The variance of the slope estimate b is reduced; your results are more stable

The confidence interval for β is narrower

Your results are more statistically significant; your p-value is relatively low.

You are more likely to reject the null hypothesis that β=0
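Both effects can be checked numerically. The sketch below is an illustration under assumed numbers: the residual variance of 0.8 and the X values are made up, and `slope_variance` is a hypothetical helper that applies the variance formula for b.

```python
def slope_variance(x, sigma2_hat):
    """Estimated variance of b: sigma2_hat / sum((Xi - Xbar)^2)."""
    x_bar = sum(x) / len(x)
    return sigma2_hat / sum((xi - x_bar) ** 2 for xi in x)

sigma2_hat = 0.8             # assumed residual variance
narrow_x = [1, 2, 3, 4, 5]
wide_x = [2, 4, 6, 8, 10]    # same pattern, twice the spread

var_narrow = slope_variance(narrow_x, sigma2_hat)
var_wide = slope_variance(wide_x, sigma2_hat)               # larger denominator
var_better_fit = slope_variance(narrow_x, sigma2_hat / 2)   # smaller numerator

print(f"baseline:        {var_narrow:.3f}")
print(f"more spread in X: {var_wide:.3f}")
print(f"better fit:       {var_better_fit:.3f}")
```

Doubling the spread of X quarters the variance of b, and halving the residual variance halves it.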


The equation above is for the variance of your estimated slope b.

Your computer printouts will generally give you the standard error of b.

How do we calculate a standard error based on a variance?
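The standard error is the square root of the variance. A minimal Python sketch, with an assumed variance of 0.08 for the slope estimate:

```python
import math

var_b = 0.08               # assumed variance of the slope estimate b
se_b = math.sqrt(var_b)    # standard error = square root of the variance

print(f"standard error of b = {se_b:.4f}")
```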


The confidence interval for b is analogous to the variance and standard error, as noted above...

$$\left[\, b - t_{\alpha/2,\,n-2}\,\hat{\sigma}_b,\;\; b + t_{\alpha/2,\,n-2}\,\hat{\sigma}_b \,\right]$$


So, the slope estimate plus / minus

The t-value * the standard error of the slope estimate

What is α? (It is 1 minus the confidence level, CL; for a 95% CL, α = .05. Our CL is predetermined.)

Why are we dividing α by 2?

Where can we find the t-value?

How do we interpret a confidence interval?
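A small Python sketch may help make the interval concrete. The slope, standard error, and sample size are assumed values chosen for illustration; the critical value 3.182 is the two-tailed 95% entry for 3 degrees of freedom in a standard t table.

```python
b = 0.6            # assumed slope estimate
se_b = 0.2828      # assumed standard error of b
n = 5              # assumed sample size, so df = n - 2 = 3
confidence_level = 0.95

# Critical value t_{alpha/2, n-2}, read from a t table
# (alpha/2 = .025 in each tail, 3 degrees of freedom).
t_crit = 3.182

lower = b - t_crit * se_b
upper = b + t_crit * se_b

print(f"{confidence_level:.0%} CI for beta: [{lower:.3f}, {upper:.3f}]")
```

With only five observations the interval is wide and spans zero, which is the situation described as a less certain, less stable estimate.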

Bivariate Regression: T values

Recall the central limit theorem, which said that for any population with known mean μ and known variance σ², random samples of size n can be drawn, and the means of these samples will be approximately normally distributed:

$$\bar{x} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$


We use the t distribution in probability testing. Suppose X is some random variable with a true mean of μ and a true variance of σ². Of course, in “real life”, we never know these “true” values. We estimate μ with

$$\bar{x}$$

And we don’t know σ², so we estimate it with s². So instead of saying that we approximate a normal distribution, we say....

$$\frac{\bar{x} - \mu}{s / \sqrt{n}} \sim T_{n-1}$$


As n gets larger, the t distribution gets closer to the N(0,1) distribution; the mean of the t distribution is always 0, and as n increases, the variance of the t distribution shrinks to 1.
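The shrinking variance can be seen directly from the standard result that a t distribution with ν > 2 degrees of freedom has variance ν / (ν − 2). The Python sketch below tabulates it for a few sample sizes; `t_variance` is a hypothetical helper name, and the sample sizes are arbitrary.

```python
def t_variance(df):
    """Variance of a t distribution with df > 2 degrees of freedom."""
    if df <= 2:
        raise ValueError("variance is finite only for df > 2")
    return df / (df - 2)

for n in (5, 10, 30, 100, 1000):
    df = n - 1  # degrees of freedom for T_{n-1}
    print(f"n = {n:4d}  df = {df:4d}  variance = {t_variance(df):.4f}")
```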



Our t value is calculated by setting μ to a hypothesized value (usually 0), and then taking our sample estimate, and dividing it by the standard error. Note that this corresponds to the formula below (although note that instead of the mean of X, we will be using “b” as our sample / estimated slope)

$$\frac{\bar{x} - \mu}{s / \sqrt{n}} \sim T_{n-1}$$
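Applied to a slope, the same recipe looks like the sketch below. The slope and standard error are assumed values for illustration, and the hypothesized value under H0 is 0.

```python
b = 0.6            # assumed slope estimate
se_b = 0.2828      # assumed standard error of b
beta_null = 0.0    # hypothesized value of beta under H0

# Sample estimate minus hypothesized value, divided by the standard error.
t_value = (b - beta_null) / se_b

print(f"t = {t_value:.3f}")
```

The resulting t value is then compared with the critical value from a t table for the relevant degrees of freedom.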

Bivariate Regression: Hypothesis Testing

If our 1-α confidence interval includes zero, then we do not reject (we “fail to reject”) our null hypothesis H0: β=0 at the 1-α level (2-tailed test).

Of course, if our 1-α CI does not include zero, then we reject H0: β=0 in favor of H1: β ≠ 0 at the 1-α confidence level.

Bivariate Regression: Prob-Values

Prob-values for slope coefficients are analogous to confidence intervals. Most computer packages will report these p-values for each slope coefficient. The universal decision rule indicates that we

Reject H0: β=0 with (1-α) confidence if p-value < α

Bivariate Regression: Prob-Values

Pre-Determined CI | 1-α | α | Reported p-value | Is p-value < α? | Conclusion
90% | 1-.10 | .10 | .0374 | Yes | Reject H0
95% | 1-.05 | .05 | .0374 | Yes | Reject H0
98% | 1-.02 | .02 | .0374 | No | Fail to Reject H0
99% | 1-.01 | .01 | .0374 | No | Fail to Reject H0
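The decision rule in the table can be reproduced with a short Python sketch. The p-value is the one reported in the table; `decide` is a hypothetical helper name.

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject H0 when the p-value is below alpha."""
    return "Reject H0" if p_value < alpha else "Fail to reject H0"

p_value = 0.0374  # reported p-value from the table

for alpha in (0.10, 0.05, 0.02, 0.01):
    confidence = 1 - alpha
    print(f"CL = {confidence:.0%}  alpha = {alpha:.2f}  {decide(p_value, alpha)}")
```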

Bivariate Regression: One-tailed versus two-tailed tests.

We use one-tailed tests when we have a directional hypothesis.

One-tailed tests make parameter estimates in the hypothesized direction more significant, because you are restricting H1 to a narrower set of possibilities.

In confidence intervals, the α remains the same, because you’ve picked a pre-determined confidence level—but you can think of the p-value (the area under the curve that represents greater than t) as being halved.
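The halving can be sketched as follows, assuming the software reports a two-tailed p-value and the t distribution is symmetric. `one_tailed_p` is a hypothetical helper, and the example numbers are for illustration only.

```python
def one_tailed_p(two_tailed_p, estimate, hypothesized_sign):
    """Convert a two-tailed p-value into a one-tailed p-value.

    hypothesized_sign is +1 if H1 says beta > 0, and -1 if H1 says beta < 0.
    """
    if estimate * hypothesized_sign > 0:
        # Estimate is in the predicted direction: the p-value is halved.
        return two_tailed_p / 2
    # Estimate points the other way: almost all of the area counts against H1.
    return 1 - two_tailed_p / 2

p_right_direction = one_tailed_p(0.0374, estimate=0.6, hypothesized_sign=+1)
p_wrong_direction = one_tailed_p(0.0374, estimate=0.6, hypothesized_sign=-1)

print(f"{p_right_direction:.4f}  {p_wrong_direction:.4f}")
```

The halving only helps when the estimate has the sign the directional hypothesis predicted.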

Bivariate Regression: Summary

We are estimating slopes and intercepts—and so we talk about the degree of confidence we have, based on our sample slopes and intercepts.

That concept of “confidence”, “robustness”, “efficiency”, “stability” is part of inferential statistics.

And, in general, a better fit of the data to the model—and more variance in the explanatory / independent variables – tends to make the findings more robust.


This makes sense—if you only ask a couple of people who they are voting for in the Democratic / Republican primaries, you will not have much variance—and you would not be confident if you tried to generalize to a larger population.

And research problems where there isn’t much variance in the independent variables (or dependent variable), and where the dependent variable is a “rare event” are just inherently difficult to predict (although there are ways to weight the observations so that one can address those issues).


Likewise, we are always going to be thinking about two possible problems—we can have deflated or inflated standard errors if we are violating OLS assumptions (so, our results are more or less significant than they would otherwise be).

Or our results are biased, which means that the estimated slope would not average out to the true slope, even if one collected an infinite number of samples, and an infinite number of estimated slopes.


These concepts—of confidence and bias—also carry over to all inferential methods.

And, keep in mind that there is a difference between statistical significance (as signaled by p-values or t-values or confidence intervals) and the magnitude of b.


You may have a very small effect, but it is “statistically significant” because it is very robust (remember what goes into the t—the value of b, divided by the standard error of b).

Or, you may have a very large effect, but cannot conclude that it is different from 0, because it is not very robust—you’re not that sure it would be large if you collected a different sample.

These concepts, too, carry over across methods. It is very important to interpret both statistical significance and magnitude, and to recognize they are not the same.
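A quick numeric contrast, using made-up slopes and standard errors, shows how the two can diverge. `t_value` is a hypothetical helper.

```python
def t_value(b, se_b):
    """t statistic for H0: beta = 0, i.e. b divided by its standard error."""
    return b / se_b

# Small effect, estimated very precisely.
small_but_stable = t_value(b=0.02, se_b=0.005)

# Large effect, estimated very imprecisely.
large_but_noisy = t_value(b=5.0, se_b=4.0)

print(f"small effect: t = {small_but_stable:.2f}")
print(f"large effect: t = {large_but_noisy:.2f}")
```

The small slope has the much larger t value, so it is the one more likely to be judged statistically significant, even though the other slope is 250 times bigger.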

Bivariate Regression: Residuals

True Model:

yi = α + βxi + εi

(Here α is the population intercept, not the significance level used earlier.)

Which is estimated with:

yi = a + bxi + ei

εi is the true error term for observation i. ei is the estimated error (residual) for observation i.

Bivariate Regression: Residuals

So,

ei = yi – (a + bxi)

Or

ei = observed Y – predicted Y

Think of the error term in the population as not an error, but as a disturbance or a stochastic shock, whose deviation from the “true” population line is due to randomness.
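In a sample, the residuals can be computed observation by observation. The Python sketch below uses made-up data together with the least-squares intercept (2.2) and slope (0.6) worked out by hand for those points.

```python
# Toy data (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # OLS intercept and slope for these five points

predicted = [a + b * xi for xi in x]

# Residual = observed Y - predicted Y
residuals = [yi - pi for yi, pi in zip(y, predicted)]

for yi, pi, ei in zip(y, predicted, residuals):
    print(f"observed = {yi}  predicted = {pi:.1f}  residual = {ei:+.1f}")
```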

Bivariate Regression: R2

Notice that observed Y – Mean Y = total deviation of Yi from mean Y.

Notice that predicted Y – mean of Y is the deviation of Yi from the mean of Y explained by OLS regression line

And notice that observed Y – predicted Y is the remaining unexplained deviation of Yi from mean Y (error)


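(O = observed Y, P = predicted Y, M = mean of Y)

Case (i) | Ordering | Total (observed - mean) | Explained (predicted - mean) | Unexplained (Error) (observed - predicted)
1 | P < O < M | | |
2 | O < P < M | | |
3 | P < M < O | | |
4 | O < M < P | | |
5 | M < P < O | | |
6 | M < O < P | | |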


Of course, in any sample we have n data points—

so we’ll have n total deviations

And n explained deviations

And n unexplained deviations


Suppose we square each individual total, explained, and unexplained deviation.

And then we sum up all of the squared total deviations, do the same for all the squared explained deviations, and the same for all the squared unexplained deviations.

We would see that

Sum of the Squared Total Deviations =

Sum of the Squared Explained Deviations +Sum of the Squared Unexplained Deviations


OR,

TSS = Total Sum of Squares =

RSS (Regression Sum of Squares / Explained Sum of Squares)

+

ESS (Error Sum of Squares / Residual Sum of Squares)


And R2 = RSS / TSS

(Note that this is the same as 1 – proportion of total deviation (TSS) of Y from the mean that is “unexplained” by OLS)

So, if R2 = .34, we can say that 34% of the total variation in Y has been accounted for by the OLS regression of Y and X
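The decomposition can be verified numerically. The sketch below uses made-up data and the hand-computed least-squares line for it, and follows this document's naming, where RSS is the regression (explained) sum of squares and ESS is the error sum of squares. Some textbooks swap those two acronyms.

```python
# Toy data (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # OLS intercept and slope for these five points

y_bar = sum(y) / len(y)
predicted = [a + b * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                  # total
rss = sum((pi - y_bar) ** 2 for pi in predicted)          # explained
ess = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted)) # unexplained

r2 = rss / tss

print(f"TSS = {tss:.2f}, RSS = {rss:.2f}, ESS = {ess:.2f}")
print(f"R2 = {r2:.2f}, 1 - ESS/TSS = {1 - ess / tss:.2f}")
```

Here about 60% of the total variation in Y is accounted for by the regression line.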


Why is R2 useful?

What are the limits of R2?



It is not really a measure of magnitude of the effect.

It is a measure of correlation, and so it depends in part on the standard deviation of X and Y—and cannot be compared across samples.

Models with high R2 are not necessarily “good”—and models with low R2 are not necessarily “bad”.

R2 can be biased, particularly in small samples.

And it can be a reflection of the number of variables on the right-hand side (although there are ways to account for this—adjusted R2)


The bottom line? – R2 is a measure of goodness of fit, and as such can be useful. It is not a measure of how good your results are.

And, when you think about it, what the R2 is doing is telling you how well the data fit the line—how good your prediction is compared to just using the mean. The mean isn’t a great predictor of Y, so the utility of the R2 is limited.

Bivariate Regression: Standard Error of Estimate

$$\hat{\sigma} = \sqrt{\frac{\sum_i e_i^2}{n - 2}}$$

Bivariate Regression: Some useful equations....

$$b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$$

$$a = \bar{Y} - b\bar{X}$$
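The slope and intercept formulas translate directly into code. This is a minimal Python sketch on made-up data; the variable names are illustrative.

```python
# Toy data (made up for illustration).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b = sum((Xi - Xbar)(Yi - Ybar)) / sum((Xi - Xbar)^2)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx

# a = Ybar - b * Xbar
a = y_bar - b * x_bar

print(f"b = {b:.3f}, a = {a:.3f}")
```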
