

Page 1: Dr. G. Johnson,  Data Analysis: Regression Research Methods for Public Administrators Dr. Gail Johnson

Dr. G. Johnson, www.researchdemystified.org

1

Data Analysis: Regression

Research Methods for Public Administrators

Dr. Gail Johnson


Making Sense of Regression

Regression analysis is an advanced analytical technique that can consider many different variables that might explain an outcome, such as differences in income or declining crime rates.


Making Sense of Regression

Why include it in an introductory research methods textbook?
– Because regression results are often reported in the news.
– Because regression is not hard to understand conceptually: it builds on what we know about relationships and measures of association, even if the actual equations look intimidating and unclear because they use so many symbols.


Back to the Premise of Demystifying Statistics

When advocates of particular policies try to persuade, they often use statistics. The fancier statistics might be appropriate, but they can also bedazzle or intimidate. Having an insider’s view of how relationships are measured with quantitative data may demystify these statistical techniques.


Making Sense of Regression

The emphasis here is on understanding the key elements of regression:
– Requirements
– Application
– Limitations


Regression Is A Powerful Analytical Technique

Enables researchers to do two things:
1. Determine the strength of the relationship
– The r-squared value
– Small “r” for regression with only one independent variable
– Capital “R” for regression with more than one independent variable


Regression Is A Powerful Analytical Technique

2. Determine the impact of the independent variable(s) on the dependent variable
– The regression coefficient is the predicted change in the dependent variable for every one-unit change in the independent variable.
– Collectively, the regression coefficients enable researchers to estimate how the dependent variable will change under different scenarios for the independent variables.


1. R-square And Its Companions

r = correlation coefficient (an overall measure of fit or association, also called Pearson’s r, the Pearson product-moment correlation coefficient, or the zero-order coefficient). We saw this in a prior chapter.

r-square = proportion of the variance in the dependent variable that is explained (also called the coefficient of determination)

1 minus r-square = proportion of unexplained variance in the dependent variable
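The three quantities on this slide can be computed by hand. The Python sketch below is a minimal illustration, not the procedure any particular statistics package uses internally; the GRE and GPA values are made-up numbers, in the same spirit as the deck's fake examples.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical GRE scores and graduate GPAs (fake data, for illustration only)
gre = [150, 155, 160, 165, 170]
gpa = [3.1, 3.0, 3.4, 3.3, 3.6]

r = pearson_r(gre, gpa)
r_squared = r ** 2             # share of the variance in GPA that is explained
unexplained = 1 - r_squared    # share left to other factors
print(f"r = {r:.2f}, r-square = {r_squared:.2f}, 1 - r-square = {unexplained:.2f}")
```

Squaring r always gives a value between 0 and 1, and r-square and 1 minus r-square always add up to 1 (100 percent of the variance).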


Interpreting R-Square Is Easy

Or at least as easy as any measure of association.

Fake example: Researchers look at GRE scores and academic performance in graduate school, as measured by grade point average (GPA).
– The hypothesis is that people who have high GRE scores will also have high GPAs.
– From an admissions committee’s perspective, the belief is that GRE scores are a good predictor of future academic success and are, therefore, a good criterion for admission decisions.

The researchers report an r-squared of .2.


Interpreting R-Square Is Easy

R-square is similar to a measure of association: it varies from 0 to 1, with 0 indicating no relationship and 1 indicating a perfect relationship.
– Except that it gives more information: it estimates how much of the variation in the dependent variable (in this case, GPAs) is explained by the independent variable (GRE scores).

Interpretation of the prior slide: GRE scores explain 20 percent of the variation in GPAs. This means that 80 percent of the variation in GPAs is explained by other factors.


Discussion

If you were making a recommendation to the admissions committee, how much emphasis should they give GRE scores in admission decisions?

Explain/defend your reasoning


A Different R-Squared, A Different Decision?

Suppose the researchers found an r-squared of .65. What would you recommend? Why?

What other factors might be important in predicting academic success in graduate school?


Paradox of High R-squares

Researchers want to obtain results with a high R-square.
– They want to build models that explain as much as possible about what affects the dependent variable.
– That is, they want to discover good predictive models.

But sophisticated users should be suspicious of results with a high R-squared.


Generating High R-squares

Problem of multicollinearity
– This means using independent variables that are highly correlated with each other, for example, including both median income and poverty rates.
– Such variables throw off the mathematics: the individual coefficients become unstable, and the model may show a misleadingly high r-squared.

Aggregating data in ways that reduce sample size can also generate high r-squares.
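One quick way to see multicollinearity is to correlate the independent variables with each other before running the regression. The Python sketch below does that for two predictors and also reports a variance inflation factor (VIF), a standard diagnostic that the slides do not mention; common rules of thumb treat values above roughly 5 or 10 as a warning sign. The neighborhood figures are made up for illustration, and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor for a model with exactly two predictors.

    With two predictors, regressing one on the other gives an R-square of r**2,
    so both predictors share VIF = 1 / (1 - r**2).
    """
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical neighborhood data (fake, for illustration only)
median_income = [32, 41, 45, 52, 60, 75]   # thousands of dollars
poverty_rate = [28, 22, 19, 15, 12, 6]     # percent

r = pearson_r(median_income, poverty_rate)
vif = vif_two_predictors(median_income, poverty_rate)
print(f"correlation between predictors: {r:.2f}, VIF: {vif:.1f}")
```

A correlation this close to -1 means the two variables carry almost the same information, so the regression has a hard time separating their individual effects.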


Generating High R-squares

Researchers might decide to get rid of “outliers”: the data points that are really, really far away from the bulk of the data.
– If a data point is truly incorrect (someone clearly typed it in wrong), it can be deleted.
– Otherwise, researchers should accept outliers as part of the way things are.

For more information, see J. Scott Armstrong, 1985, Long-Range Forecasting, 2nd ed., p. 487.


2. Regression Wizardry: Predicting Change

Regression builds on the same concepts as measures of relationship, then takes them to the next level: it allows researchers to predict the change in the dependent variable for every one-unit change in the independent variable.

This is the regression coefficient (or partial regression coefficient in multiple regression analysis).

If the regression coefficient is .05, it means that for every one-unit increase in the GRE score, GPA is predicted to increase by .05.

Assuming, of course, that there is a strong relationship.


Other Examples of the Regression Coefficient

For every one-unit change in years of education, there is a $2,000 change in yearly individual income.

For every one unit change in the age of a plane, there is a $500 change in maintenance costs.

For every one unit change in age, there is a .3 percent decrease in memory test scores among adults.

(note: these are all fake data)
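Reading a regression coefficient always comes down to one multiplication: the coefficient times the change in the independent variable. The Python sketch below applies that to the fake coefficients from the GRE slide and this slide. The slides give the $2,000 and $500 changes without a direction, so treating them as increases is an assumption, as is the 4-unit change used in the loop.

```python
def predicted_change(coefficient, change_in_x):
    """Predicted change in the dependent variable: coefficient times the change in X."""
    return coefficient * change_in_x


# Coefficients from the deck's fake examples. Treating the $2,000 and $500
# changes as increases is an assumption; the slides do not state a direction.
coefficients = {
    "GPA per GRE point": 0.05,
    "income ($) per year of education": 2000,
    "maintenance ($) per year of plane age": 500,
    "memory score (%) per year of age": -0.3,
}

# Predicted change for a hypothetical 4-unit increase in each independent variable
for label, b in coefficients.items():
    print(f"{label}: {predicted_change(b, 4):+g}")
```

A negative coefficient, as in the memory example, simply means the dependent variable is predicted to fall as the independent variable rises.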


Regression Requirements

Requirements:
– Assumes a linear relationship
– Uses random sample or census data
– Works with interval/ratio level data

It is possible to convert a nominal variable into a “dummy variable,” which has only two values, 0 and 1, to use as an independent variable.

– For example: Gender: female 0, male 1
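Creating a dummy variable is just a recode. The Python sketch below (the function name `to_dummy` is hypothetical) turns a two-category nominal variable into 0s and 1s, using the same coding as the slide.

```python
def to_dummy(values, category_coded_as_1):
    """Recode a two-category nominal variable as 0/1."""
    return [1 if v == category_coded_as_1 else 0 for v in values]


gender = ["female", "male", "male", "female"]
male = to_dummy(gender, "male")   # female -> 0, male -> 1, as on the slide
print(male)  # prints [0, 1, 1, 0]
```

With this coding, the regression coefficient on the dummy is the predicted difference in the dependent variable between the group coded 1 and the group coded 0, holding any other independent variables constant. Which category gets 0 is a choice; swapping the coding flips the sign of the coefficient.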


Ordinary Least Squares Regression

There are many types of regression tools. For our purposes, I am sticking with what is called “ordinary least squares” (OLS) regression, which is designed for interval/ratio level data (i.e., real numbers).

There are other types to handle other data situations.

– For example, logistic regression is used with a nominal dependent variable that has only two categories

– For example: Drug Use: yes or no
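To see why a yes/no dependent variable needs a different tool, note that a straight line can predict values below 0 or above 1, which make no sense as probabilities. Logistic regression treats a + bX as the log-odds of "yes" and passes it through the logistic function, which always lands between 0 and 1. The Python sketch below shows only that prediction step; the coefficients are hypothetical, and in practice they are usually estimated by maximum likelihood rather than least squares.

```python
from math import exp


def logistic_probability(a, b, x):
    """Predicted probability of 'yes' from a one-predictor logistic model.

    a + b*x is on the log-odds scale; the logistic function maps it into (0, 1).
    """
    log_odds = a + b * x
    return 1 / (1 + exp(-log_odds))


# Hypothetical coefficients, not estimated from any real data
a, b = -2.0, 0.5
for x in (0, 4, 8):
    print(x, round(logistic_probability(a, b, x), 3))
```

When a + bX equals 0, the predicted probability is exactly .5; larger values push it toward 1 and smaller values toward 0.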


The Concept of “Least Squares”

The regression analysis used here is based on the idea of “least squares.”

The computer creates an imaginary “best” straight line through a set of data, such that for any value of X, the value of Y can be predicted.


[Figure: Scatterplot of plane maintenance costs (Y axis, marked $500 and $1,000) against age of planes (X axis, marked 5, 10, and 20 years). The dots represent each plane’s age and maintenance cost from the prior year; a straight line shows the predicted values if the relationship were perfect.]


The Concept of “Least Squares”

This line is selected because it yields the smallest total of the squared vertical distances between the data points and the line. The distances are squared as part of the calculation, hence the name “least squares.”

The line is useful to the extent that the differences between the predicted line and the actual data points are small.
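The least-squares line has a simple closed form for one independent variable: the slope is the sum of cross-products divided by the sum of squares of X, and the line passes through the two means. The Python sketch below computes it for made-up plane data (not from the slides), checks that changing the slope increases the total squared error, and computes r-square as the share of variation the line accounts for. A statistics package would report the same line; this hand calculation is only meant to show what "least squares" means.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + bX with the smallest sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope: the regression coefficient
    a = mean_y - b * mean_x     # intercept: the line passes through (mean_x, mean_y)
    return a, b


def sum_squared_errors(a, b, xs, ys):
    """Total of the squared vertical distances between the data points and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical plane ages (years) and maintenance costs (dollars); fake data
ages = [5, 8, 10, 12, 15, 20]
costs = [520, 610, 700, 720, 850, 990]

a, b = least_squares_line(ages, costs)
best = sum_squared_errors(a, b, ages, costs)

# Changing the slope while keeping the intercept fixed leaves a larger squared error.
assert sum_squared_errors(a, b + 1, ages, costs) > best
assert sum_squared_errors(a, b - 1, ages, costs) > best

# r-square as the share of total variation the line accounts for
mean_cost = sum(costs) / len(costs)
total = sum((y - mean_cost) ** 2 for y in costs)
r_squared = 1 - best / total
print(f"Y = {a:.1f} + {b:.1f}X, r-square = {r_squared:.2f}")
```

The smaller the leftover squared error relative to the total variation, the closer r-square gets to 1.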


Simple Regression Equation

Y = a + bX + e

Where:
Y = the dependent variable (a + bX alone gives its predicted value)
a = the constant, or Y intercept (where the imaginary line crosses the Y axis)
b = the regression coefficient
X = the independent variable
e = error (the computer will estimate the likely error)


Applying Simple Regression

Researchers are asked to estimate maintenance costs for next year’s budget.
– This large state has a fleet of planes that public officials use to make it easy to visit all parts of the state.

Analysts believe that there is a relationship between maintenance costs and use of the planes (measured by miles flown).
– Y = plane maintenance costs measured in dollars (the dependent variable)
– X = miles flown (the independent variable)


How It Is Applied

Analysts collect data from the past two years and crunch it. The computer gives these results:

Y = 100 + .020X

The constant is 100:
– If they do not fly at all, the computer estimates there is still a cost of $100.

The .020 is the regression coefficient:
– This is interpreted as: for every mile flown, there is a $.02 change in maintenance costs.


Simple Regression

Y = 100 + .020X

Interpreting the regression coefficient:
– For every mile flown, maintenance costs go up by 2 cents.
– For every 100 miles flown, costs go up by $2.
– For every 1,000 miles, costs go up by $20.
– For every 100,000 miles, costs go up by $2,000.


Making Maintenance Cost Estimates

They can then solve the equation:

Assuming 100,000 miles will be flown, how much will they need to budget for maintenance?

100,000 multiplied by .020 = $2,000
Y = 100 + $2,000 + error

The estimated maintenance cost: $2,100 + error
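Plugging a planned mileage into the equation is simple enough to script, which also guards against slips in hand arithmetic. The Python sketch below hard-codes the slide's constant and coefficient as default arguments (the function name is a hypothetical choice). Like the slide, it leaves the error term out, so it returns only a point estimate.

```python
def predict_maintenance(miles_flown, constant=100.0, coefficient=0.020):
    """Point estimate from Y = 100 + .020X; the error term is not included."""
    return constant + coefficient * miles_flown


for miles in (0, 100, 1_000, 100_000):
    print(f"{miles:>7,} miles -> ${predict_maintenance(miles):,.2f}")
```

In a real budget request, the analysts would want to report a range around this point estimate, since the error term can be substantial when r-squared is modest.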


Yes, but…

How strong is the relationship between miles flown and maintenance costs?

Before we put too much faith in these budget estimates, we will want to look at the r-squared

Like any measure of association, deciding what is “good enough” involves judgment, since it would be exceedingly rare to get an r-squared close to a perfect 1.


Simple Regression: Another Example

Hypothesis: If schools have a higher percentage of poor children, then they will have lower test scores.

A regression analysis shows:
– A regression coefficient of -.04
– An r-squared value of .25


Simple Regression

Interpretation?
– Regression coefficient: For every one-percentage-point increase in the share of children in poverty within a school, the average test score goes down by .04.
– R-squared: 25% of the variation in test scores is explained by the percent of children in poverty in the school.

Researchers will ask: what other factors might explain differences in test scores across the schools?

They will want to build a bigger model that includes more factors.
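The same two readings, the coefficient as a predicted change and r-squared as a share of variation, can be written out in a few lines. The Python sketch below uses the slide's figures; the 10-percentage-point change is a hypothetical scenario, and the function name is illustrative.

```python
# Figures from the school example on the slides
COEFFICIENT = -0.04   # change in average test score per one-point rise in % of children in poverty
R_SQUARED = 0.25


def predicted_score_change(poverty_point_change):
    """Predicted change in a school's average test score."""
    return COEFFICIENT * poverty_point_change


# A hypothetical school whose share of children in poverty rises by 10 percentage points
print(round(predicted_score_change(10), 2))
print(f"Explained by poverty: {R_SQUARED:.0%}; left to other factors: {1 - R_SQUARED:.0%}")
```

The 75 percent left over is exactly the room that a bigger model with more factors tries to fill.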


Life Is More Complex

Rarely will any single variable cause big changes in another variable, especially in complex phenomena.

Warning bells should sound when anyone states that a single variable caused a complex problem:
– “The economic collapse is due to consumer debt.”
– “The economic collapse is due to corporate greed.”


Discussion: Complexity of Public Policy Issues

What are the possible causes of the 2008 economic downturn?

What are the possible explanations for the declining crime rate from 1991 to 2004? The national violent crime rate was:
– 1991: 753 per 100,000 population
– 2004: 463 per 100,000 population


What Are the Possible Causes for Urban Decay?
– Lack of jobs
– High % of absentee landlords
– Low % of homeowners
– Poor quality of schools
– Increased concentration of poor
– Increase in drugs, crime
– Aging housing stock
– Flight of middle class to suburbs
– Corruption
– Aging infrastructure
– Business flight to suburbs


Multiple Regression: Added Power

Multiple regression does four things:
– Provides an overall measure of the predictive strength of the model: the R-square
– Predicts the dependent variable based on the summed contributions of the independent variables
– Determines the impact of each independent variable on the dependent variable while controlling for the other variables (these are the partial regression coefficients)
– Determines the relative strength of each independent variable using the beta weights


Multiple Regression Equation

$$Y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + e$$

Y = dependent variable
X1 = independent variable 1, controlling for X2, X3, X4
X2 = independent variable 2, controlling for X1, X3, X4
X3 = independent variable 3, controlling for X1, X2, X4
X4 = independent variable 4, controlling for X1, X2, X3


Multiple Regression Equation

It has the same basic structure as simple regression:
– Y is still the dependent variable.
– There is still a constant (a) and some amount of error (e) that the computer calculates.
– But there are more Xs to represent the multiple independent variables.


Multiple Regression Equation

The b in front of each X is a partial regression coefficient:
– The separate impact of that variable on the dependent variable, controlling for all the other independent variables (sometimes called “holding them constant”)


Multiple Regression: An Example

Hypothesis: Income is a function of education and seniority.

We suggest that income (the dependent variable) will increase as both education and seniority increase (two independent variables).

Y (income) = a + b1X1 (education) + b2X2 (seniority) + error

Based on a Lewis-Beck example.


Multiple Regression: Interpretation

Results:
Y = 6000 + 400X1 (education) + 200X2 (seniority)
R-square = .67

First look at the R-square: this shows a strong relationship, so the analysis can continue.

Partial regression coefficients:

For every year of education, holding seniority constant, income increases by $400.

For every year of seniority, holding education constant, income increases by $200.
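Where do partial regression coefficients come from? For two predictors, least squares solves two equations built from the sums of squares and cross-products of the centered variables. The Python sketch below is a hand-rolled version for exactly two predictors (the function name is hypothetical; real analyses would use a statistics package). To check it, the income values are synthetic: generated exactly from the slide's equation with no error term, so the fit should recover 6000, 400, and 200 up to floating-point rounding, and the R-square would be 1 rather than the slide's .67.

```python
def fit_two_predictors(x1, x2, y):
    """OLS estimates (a, b1, b2) for Y = a + b1*X1 + b2*X2 + e.

    Solves the two normal equations in centered form:
        s11*b1 + s12*b2 = s1y
        s12*b1 + s22*b2 = s2y
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 ** 2   # zero if X1 and X2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2


# Synthetic data generated from the slide's equation (no error term added)
education = [8, 10, 12, 12, 14, 16, 16, 18]
seniority = [2, 5, 1, 8, 3, 10, 4, 6]
income = [6000 + 400 * e + 200 * s for e, s in zip(education, seniority)]

a, b1, b2 = fit_two_predictors(education, seniority, income)
print(round(a), round(b1), round(b2))
```

The s12 term is what does the "controlling": it removes the part of each predictor's relationship with Y that is shared with the other predictor. It is also why perfectly correlated predictors make the calculation impossible.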


Multiple Regression: Application

Estimate the income of someone who has 10 years of education and 5 years of seniority.

We solve the regression equation:
– Multiply the 10 years of education by the regression coefficient of 400: equals 4,000
– Multiply the 5 years of seniority by the regression coefficient of 200: equals 1,000
– Put it together with the constant and you have Y = 6000 + 400(10) + 200(5) + error

Y = $11,000 + error
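The same plug-in step works for any number of independent variables: multiply each value by its coefficient, add them up, and add the constant. The Python sketch below (the function name `predict` is hypothetical) returns only the point estimate, leaving out the error term as the slide does.

```python
def predict(constant, coefficients, values):
    """Point estimate a + b1*x1 + b2*x2 + ...; the error term is not included."""
    return constant + sum(b * x for b, x in zip(coefficients, values))


# Slide's equation: Y = 6000 + 400X1 (education) + 200X2 (seniority)
income = predict(6000, [400, 200], [10, 5])   # 10 years of education, 5 of seniority
print(f"${income:,}")   # prints $11,000
```

Changing the inputs lets analysts run different scenarios, for example comparing someone with 12 years of education against someone with 16, at the same seniority.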


Multiple Regression: Beta Weights

Relationship between contributions to political campaigns as a function of age and income?

Y= campaign contribution (dollars)

X1 = age (years)

X2 = income (dollars)


Multiple Regression

Relationship between contributions to political campaigns as a function of age and income. Computer generates this equation:

Y = 8 + 2X1 (age) + .010X2 (income)

Interpreting the partial regression coefficients:
– For every one-year increase in age, contributions go up by $2, holding income constant.
– For every dollar increase in income, contributions go up by $.01, holding age constant.


Multiple Regression: Beta Weights

But which is stronger? We cannot tell from the partial regression coefficients because age and income are measured differently (years versus dollars).

We need to look at the beta weights.
– Beta weights are standardized, making all variables comparable.
– But they have a very limited application.


Beta Weights

Returning to age and income as predictors of campaign contributions, the computer gives us these beta weights:
– Age = .15
– Income = .45

Which is the stronger of the two? Income has the higher beta weight and is, therefore, the stronger of the two.
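A beta weight is the partial regression coefficient re-expressed in standard-deviation units: multiply b by the standard deviation of its X and divide by the standard deviation of Y. The Python sketch below applies that conversion to the slide's coefficients ($2 per year of age, $.01 per dollar of income). The slides do not give the standard deviations, so the ones below are hypothetical, chosen only so the arithmetic lands on the slide's .15 and .45.

```python
def beta_weight(b, sd_x, sd_y):
    """Standardized coefficient: the unstandardized b rescaled by sd(X) / sd(Y)."""
    return b * sd_x / sd_y


# Hypothetical standard deviations, NOT from the slides; chosen only so the
# results match the slide's beta weights.
sd_contribution = 200    # dollars
sd_age = 15              # years
sd_income = 9_000        # dollars

beta_age = beta_weight(2, sd_age, sd_contribution)
beta_income = beta_weight(0.010, sd_income, sd_contribution)
print(f"age: {beta_age:.2f}, income: {beta_income:.2f}")
```

Read this way, a one-standard-deviation increase in income predicts a .45 standard-deviation increase in contributions, versus .15 for age. The comparison depends on the spread of each variable in this particular sample, which is one reason beta weights have limited application.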


Takeaway Lesson

When reading research results about relationships, my best advice is to exercise healthy skepticism and ask the tough questions before asserting—or believing—that research results are irrefutable facts merely because of sophisticated mathematics.


Takeaway Lesson

Knowing how difficult it is to demonstrate causality or program impacts, be mindful when people present research asserting they have found a cause-effect relationship.

Be especially cautious when people claim they have found a single cause for a complex phenomenon, even when they use advanced statistical techniques.


Takeaway Lesson

At the same time, be cautious in believing variables are not connected or that programs do not have an impact based on data from one study. “More research is needed” is not a self-employment program for researchers.

It is also important to know when statistics are just too frail to give a clear answer.


Ask the Tough Questions

– Are they using data that is likely to be unknown or difficult to measure? Do the proxy measures they use make sense?
– Do they state all of the assumptions they made in constructing the measures used in their calculations?
– Is the analysis appropriate to the situation?
– Do they provide measures of association, and are they strong enough?


Ask the Tough Questions

Is their design strong enough to rule out possible rival explanations?

Even with fancy statistics, the basic principles of good research design still must be met, especially when attempting to answer cause-effect questions.
– I might show a high r-square between stock market activity and sunspot activity, but I would still need a good theory to explain why they are connected.


Remember: It Is OK To Ask For Help

It is also important to recognize that statistics can be so technical that it is necessary to bring in experts to make sense of complex and confusing research results.

No one expects you to know it all from one required research methods course, or to remember it 10 years later.

My point: remember that it really is OK to bring in the experts to make sense of research that focuses on issues that matter.


Creative Commons

This PowerPoint is meant to be used and shared with attribution.

Please provide feedback. If you make changes, please share freely and send me a copy of changes: [email protected]

Visit www.creativecommons.org for more information