POLS 7000X Statistics in Political Science, Class 10, Brooklyn College-CUNY, Shang E. Ha


Post on 16-Feb-2016


DESCRIPTION

POLS 7000X Statistics in Political Science, Class 10, Brooklyn College-CUNY, Shang E. Ha. Leon-Guerrero and Frankfort-Nachmias, Essentials of Social Statistics for a Diverse Society. Chapter 9: Regression and Correlation. Topics: Overview; The Scatter Diagram; Linear Relations and Prediction Rules.

TRANSCRIPT

Page 1

POLS 7000X STATISTICS IN POLITICAL SCIENCE

CLASS 10

BROOKLYN COLLEGE-CUNY

SHANG E. HA

Leon-Guerrero and Frankfort-Nachmias, Essentials of Social Statistics for a Diverse Society

Page 2

Leon-Guerrero/Frankfort-Nachmias: Essentials of Social Statistics for a Diverse Society © 2012 SAGE Publications

Chapter 9: Regression and Correlation

• Overview

• The Scatter Diagram

• Linear Relations and Prediction Rules

• Methods for Assessing the Accuracy of Predictions

• Calculating $r^2$

• Pearson’s Correlation Coefficient r

Page 3

Overview

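What each analysis considers, by level of measurement of the dependent and independent variables:

• Nominal dependent variable, nominal independent variable: considers the distribution of one variable across the categories of another variable.

• Interval dependent variable, nominal independent variable: considers the difference between the mean of one group on a variable and the mean of another group.

• Nominal dependent variable, interval independent variable: considers how a change in a variable affects a discrete outcome.

• Interval dependent variable, interval independent variable: considers the degree to which a change in one variable results in a change in another.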

Page 4

Overview

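Techniques, by level of measurement of the dependent and independent variables:

• Nominal dependent variable, nominal independent variable: Lambda

• Interval dependent variable, nominal independent variable: Confidence Intervals, T-Test

• Nominal dependent variable, interval independent variable: Logistic Regression

• Interval dependent variable, interval independent variable: Regression, Correlation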

Page 5

General Examples

• Does a change in one variable significantly affect another variable?

• Do two scores tend to co-vary positively (high on one score, high on the other; low on one, low on the other)?

• Do two scores tend to co-vary negatively (high on one score, low on the other; low on one, high on the other)?


Page 6

Specific Examples

• Does getting older significantly influence a person’s political views?

• Does marital satisfaction increase with length of marriage?

• How does an additional year of education affect one’s earnings?


Page 7

Scatter Diagrams

• Scatter diagram (scatterplot): a visual method used to display a relationship between two interval-ratio variables.

• Typically, the independent variable is placed on the X-axis (horizontal axis), while the dependent variable is placed on the Y-axis (vertical axis).

Page 8

Scatter Diagram Example

Page 9

Scatter Diagram Example

Page 10

A Scatter Diagram Example of a Negative Relationship

Page 11

Linear Relationships

• Linear relationship – A relationship between two interval-ratio variables in which the observations displayed in a scatter diagram can be approximated with a straight line.

• Deterministic (perfect) linear relationship – A relationship between two interval-ratio variables in which all the observations (the dots) fall along a straight line. The line provides a predicted value of Y (the vertical axis) for any value of X (the horizontal axis).
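As a rough illustration of the difference between a deterministic and a merely approximate linear relationship, the sketch below checks whether every point falls exactly on one straight line. The function name `is_deterministic_linear` and both data sets are hypothetical and are not taken from the slides.

```python
def is_deterministic_linear(points, tolerance=1e-9):
    """Return True if every (x, y) point lies on one non-vertical straight line."""
    (x0, y0), (x1, y1) = points[0], points[1]
    slope = (y1 - y0) / (x1 - x0)  # assumes the first two X values differ
    intercept = y0 - slope * x0
    return all(abs(y - (intercept + slope * x)) <= tolerance for x, y in points)

# Hypothetical data: each year of seniority adds exactly 2,000 to a 12,000 base salary.
perfect = [(0, 12000), (1, 14000), (2, 16000), (3, 18000)]
# The same data with one observation moved off the line.
imperfect = [(0, 12000), (1, 14000), (2, 17500), (3, 18000)]

print(is_deterministic_linear(perfect))
print(is_deterministic_linear(imperfect))
```

Real social data almost never pass this check, which is why the rest of the chapter works with a line that only approximates the observations.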

Page 12

Graph the data below and examine the relationship:

Page 13

The Seniority-Salary Relationship

Page 14

Example: Education & Prestige

Does education predict occupational prestige? If so, then the higher the respondent’s level of education, as measured by number of years of schooling, the greater the prestige of the respondent’s occupation.

Take a careful look at the scatter diagram on the next slide and see if you think that there exists a relationship between these two variables…

Page 15

Take your best guess?

If you know nothing else about a person except that he or she lives in the United States, and I asked you to guess his or her age, what would you guess?

The mean age for U.S. residents.

Now if I tell you that this person owns a skateboard, would you change your guess? (Of course!)

With quantitative analyses we are generally trying to predict, or take our best guess at, the value of the dependent variable. One way to assess the relationship between two variables is to consider the degree to which the extra information from the second variable makes your guess better. If someone owns a skateboard, that is likely to indicate that he or she is younger, and we may be able to guess closer to the actual value.
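The skateboard example can be sketched numerically. The ages and ownership flags below are invented for illustration and do not come from the slides; the sketch only compares how far off our guesses are when we use the overall mean versus the mean of each person's own group.

```python
# Hypothetical ages and skateboard ownership, invented purely for illustration.
ages = [12, 15, 17, 20, 34, 41, 48, 55, 63, 70]
owns_skateboard = [True, True, True, True, False, False, False, False, False, False]

def mean(values):
    return sum(values) / len(values)

def sum_squared_error(actual, guesses):
    return sum((a - g) ** 2 for a, g in zip(actual, guesses))

# Guess 1: use the overall mean age for everyone.
overall_mean = mean(ages)
error_without_info = sum_squared_error(ages, [overall_mean] * len(ages))

# Guess 2: use the mean age of the person's own group (owner vs. non-owner).
owner_mean = mean([a for a, s in zip(ages, owns_skateboard) if s])
other_mean = mean([a for a, s in zip(ages, owns_skateboard) if not s])
group_guesses = [owner_mean if s else other_mean for s in owns_skateboard]
error_with_info = sum_squared_error(ages, group_guesses)

print(f"Squared error guessing the overall mean: {error_without_info:.1f}")
print(f"Squared error guessing the group mean:   {error_with_info:.1f}")
```

The smaller the second error is relative to the first, the more the extra variable has improved the guess.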

Page 16

Take your best guess?

Similar to the example of age and the skateboard, we can take a much better guess at someone’s occupational prestige if we have information about his or her years or level of education.

Page 17

Equation for a Straight Line

Y = a + bX

where

a = intercept

b = slope

Y = dependent variable

X = independent variable

[Figure: a straight line plotted against X and Y, crossing the Y-axis at a, with slope b = rise / run]
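A minimal sketch of the rise-over-run idea: given any two points on a line, the slope b is the change in Y divided by the change in X, and the intercept a follows from either point. The helper name `line_through` and the two points are hypothetical.

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y = a + bX through two points with different X values."""
    (x1, y1), (x2, y2) = p1, p2
    rise = y2 - y1
    run = x2 - x1
    b = rise / run      # slope
    a = y1 - b * x1     # intercept: the value of Y when X = 0
    return a, b

a, b = line_through((1, 5), (3, 11))
print(f"a = {a}, b = {b}")
```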

Page 18

Bivariate Linear Regression Equation

$\hat{Y} = a + bX$

• Y-intercept (a): the point where the regression line crosses the Y-axis, or the value of Y when X = 0.

• Slope (b): the change in variable Y (the dependent variable) with a unit change in X (the independent variable).

• Ordinary least squares (OLS) chooses the estimates of a and b that minimize the sum of the squared differences between the observed and predicted values of Y, $\sum (Y - \hat{Y})^2$. Under the standard Gauss-Markov assumptions, these estimates are the Best Linear Unbiased Estimators (BLUE) of the intercept and slope.

Page 19

Now let’s interpret the SPSS output...

SPSS Regression Output (GSS): Education & Prestige

Page 20

The Regression Equation

Prediction Equation:

$\hat{Y} = 13.874 + 1.384(X)$

This line represents the predicted values of Y for any and all values of X.

Page 21

Interpreting the regression equation

$\hat{Y} = 13.874 + 1.384(X)$

• If a respondent had zero years of schooling, this model predicts that his or her occupational prestige score would be 13.874 points.

• For each additional year of education, our model predicts a 1.384-point increase in occupational prestige.
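To make the interpretation concrete, the sketch below plugs a few values of X into the prediction equation. The coefficients are the ones shown on the slide; the function name `predicted_prestige` and the chosen years of education are only illustrative.

```python
INTERCEPT = 13.874  # a, from the prediction equation on the slide
SLOPE = 1.384       # b, from the prediction equation on the slide

def predicted_prestige(years_of_education):
    """Predicted occupational prestige score for a given number of years of schooling."""
    return INTERCEPT + SLOPE * years_of_education

for years in (0, 12, 16):
    print(f"{years:>2} years of education -> predicted prestige {predicted_prestige(years):.3f}")
```

For example, 12 years of schooling gives 13.874 + 1.384 × 12 = 30.482, and each further year raises the prediction by 1.384 points.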

Page 22

Ordinary Least Squares

Least-squares line (best-fitting line) – A line where the error sum of squares, $\sum e^2$, is at a minimum.

Least-squares method – The technique that produces the least-squares line.
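The minimizing property can be checked numerically. The sketch below uses invented data (not from the slides) and the closed-form estimates of b and a that are given later in these slides, then shows that nudging either coefficient away from the least-squares values increases the error sum of squares.

```python
# Hypothetical (X, Y) observations, invented for illustration.
xs = [8, 10, 12, 14, 16, 18]
ys = [22, 30, 29, 35, 38, 36]

def sum_squared_errors(a, b):
    """Error sum of squares (sum of e^2) for the line Y-hat = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Closed-form least-squares estimates of the slope and intercept.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
b_ols = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_ols = y_bar - b_ols * x_bar

best = sum_squared_errors(a_ols, b_ols)
print(f"least-squares line: a = {a_ols:.3f}, b = {b_ols:.3f}, SSE = {best:.3f}")

# Any other line, for example one with a nudged intercept or slope, has a larger SSE.
nudges = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.1), (0.0, -0.1)]
for da, db in nudges:
    other = sum_squared_errors(a_ols + da, b_ols + db)
    print(f"a {da:+.1f}, b {db:+.1f}: SSE = {other:.3f}")
```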

Page 23

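Estimating the slope: b

• The bivariate regression coefficient, or the slope of the regression line, can be obtained from the observed X and Y scores.

$$b = \frac{S_{YX}}{S_X^2} = \frac{\dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}}{\dfrac{\sum (X - \bar{X})^2}{N - 1}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$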

Page 24

Covariance and Variance

$$\text{Covariance} = S_{YX} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$$

$$\text{Variance of } X = S_X^2 = \frac{\sum (X - \bar{X})(X - \bar{X})}{N - 1} = \frac{\sum (X - \bar{X})^2}{N - 1}$$

Covariance of X and Y: a measure of how X and Y vary together. Covariance will be close to zero when X and Y are unrelated. It will be greater than zero when the relationship is positive and less than zero when the relationship is negative.

Variance of X: we have talked a lot about variance in the dependent variable. This is simply the variance of the independent variable.
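The covariance and variance definitions translate directly into code. The sketch below uses small invented data (labeled hypothetical in the comments) and computes the slope as the covariance of X and Y divided by the variance of X.

```python
xs = [8, 10, 12, 14, 16, 18]   # hypothetical years of education
ys = [22, 30, 29, 35, 38, 36]  # hypothetical prestige scores

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, divided by N - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def variance(x):
    """Sample variance, which is the covariance of a variable with itself."""
    return covariance(x, x)

s_yx = covariance(xs, ys)
s2_x = variance(xs)
b = s_yx / s2_x

print(f"covariance = {s_yx:.3f}, variance of X = {s2_x:.3f}, slope b = {b:.3f}")
```

Because both quantities share the same N - 1 divisor, it cancels in the ratio, which is why the slope can also be written with sums only.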

Page 25

Estimating the Intercept

The regression line always goes through the point corresponding to the means of both X and Y, by construction.

So we use this information to solve for a:

$$a = \bar{Y} - b\bar{X}$$
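As a quick check of this property, the sketch below estimates b and a from invented data (not from the slides) and confirms that the fitted line predicts the mean of Y at the mean of X.

```python
xs = [8, 10, 12, 14, 16, 18]   # hypothetical years of education
ys = [22, 30, 29, 35, 38, 36]  # hypothetical prestige scores

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Slope: sum of cross-products of deviations over the sum of squared deviations of X.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))

# Intercept: chosen so that the line passes through (x_bar, y_bar).
a = y_bar - b * x_bar

prediction_at_mean = a + b * x_bar
print(f"a = {a:.3f}, b = {b:.3f}")
print(f"prediction at mean of X = {prediction_at_mean:.3f}, mean of Y = {y_bar:.3f}")
```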

Page 26

Summary: Properties of the Regression Line

Represents the predicted values for Y for any and all values of X.

Always goes through the point corresponding to the mean of both X and Y.

It is the best fitting line in that it minimizes the sum of the squared deviations.

Has a slope that can be positive or negative; null hypothesis is that the slope is zero.

Page 27

Coefficient of Determination

Page 28

Coefficient of Determination

Page 29

The Correlation Coefficient

• Pearson’s Correlation Coefficient (r): the square root of $r^2$, carrying the sign of the slope b. It is a measure of association between two interval-ratio variables.

• Symmetrical measure: no specification of independent or dependent variables.

• Ranges from –1.0 to +1.0. The sign (±) indicates direction. The closer the number is to ±1.0, the stronger the association between X and Y.
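The link between r and $r^2$ can be sketched as follows. The data are invented, and the two formulas used are the standard textbook ones rather than formulas shown on these slides: r as the covariance divided by the product of the two standard deviations, and $r^2$ as the proportional reduction in error, (SST - SSE) / SST.

```python
import math

xs = [8, 10, 12, 14, 16, 18]   # hypothetical years of education
ys = [22, 30, 29, 35, 38, 36]  # hypothetical prestige scores

def mean(values):
    return sum(values) / len(values)

def covariance(x, y):
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def pearson_r(x, y):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

r = pearson_r(xs, ys)

# r^2 as a proportional reduction in error: (SST - SSE) / SST.
b = covariance(xs, ys) / covariance(xs, xs)
a = mean(ys) - b * mean(xs)
sst = sum((y - mean(ys)) ** 2 for y in ys)                 # error using only the mean of Y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # error using the regression line
r_squared = (sst - sse) / sst

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
print(f"r is symmetrical: {math.isclose(pearson_r(xs, ys), pearson_r(ys, xs))}")
```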

Page 30

The Correlation Coefficient

r = 0 means that there is no linear association between the two variables.

[Figure: scatter diagram of Y against X with r = 0]

Page 31

The Correlation Coefficient

r = 1 means a perfect positive correlation.

[Figure: scatter diagram of Y against X with r = 1]

Page 32

The Correlation Coefficient

r = –1 means a perfect negative correlation.

[Figure: scatter diagram of Y against X with r = –1]
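The three benchmark values of r can be reproduced with tiny invented data sets. The sketch below is illustrative only; the helper `pearson_r` and all values are hypothetical. Note that the r = 0 case uses a U-shaped pattern: r measures linear association, so a curved relationship can still produce r = 0.

```python
import math

def pearson_r(x, y):
    """Pearson's r from sums of deviations (the N - 1 divisors cancel)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
perfect_positive = [2 * x + 1 for x in xs]   # every point on an upward-sloping line
perfect_negative = [10 - 3 * x for x in xs]  # every point on a downward-sloping line
u_shaped = [(x - 3) ** 2 for x in xs]        # symmetric curve with no linear trend

for label, ys in [("perfect positive", perfect_positive),
                  ("perfect negative", perfect_negative),
                  ("U-shaped", u_shaped)]:
    print(f"{label}: r = {pearson_r(xs, ys):.2f}")
```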