Types of Bivariate Relationships and Associated Statistics

Upload: shepry

Post on 06-Jan-2016

DESCRIPTION

Types of Bivariate Relationships and Associated Statistics. Nominal/Ordinal and Nominal/Ordinal (including dichotomous): Crosstabulation (Lambda, Chi-Square, Gamma, etc.). Interval and Dichotomous: Difference of means test. Interval and Nominal/Ordinal: Analysis of Variance - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Types of Bivariate Relationships and Associated Statistics

Nominal/Ordinal and Nominal/Ordinal (including dichotomous): Crosstabulation (Lambda, Chi-Square, Gamma, etc.)

Interval and Dichotomous: Difference of means test

Interval and Nominal/Ordinal: Analysis of Variance

Interval and Interval: Regression and correlation

Page 2

Analysis of Variance

A crosstabulation is appropriate when both variables are nominal or ordinal. But when the dependent variable is interval or ratio (continuous), the table would have far too many rows and columns to analyze.

For these cases we have another technique: the analysis of variance (ANOVA).

Page 3

Analysis of Variance

ANOVA is a way of comparing the means of a quantitative variable across categories or groups formed by a categorical variable.

For example, suppose a variable (X) has three categories (A, B, C) and you have a sample of observations within each of those categories.

Page 4

Analysis of Variance

For each observation there is a measurement on a quantitative dependent variable (Y).

Therefore, within each category you can find the mean of Y.
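Here is a minimal Python sketch of this step. It uses a small hypothetical data set: the category labels A, B and C and the Y values are invented for illustration. The code groups the observations by their category of X and averages Y within each group.

```python
from collections import defaultdict

# Hypothetical data: (category of X, measurement on Y) for each observation
observations = [
    ("A", 4.0), ("A", 6.0), ("A", 5.0),
    ("B", 7.0), ("B", 9.0), ("B", 8.0),
    ("C", 2.0), ("C", 3.0), ("C", 4.0),
]

def group_means(pairs):
    """Return {category: mean of Y} for a list of (category, y) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, y in pairs:
        totals[category] += y
        counts[category] += 1
    return {c: totals[c] / counts[c] for c in totals}

means = group_means(observations)
print(means)  # {'A': 5.0, 'B': 8.0, 'C': 3.0}
```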

Page 5

Analysis of Variance

ANOVA looks through all of the data to discover:

1) whether there are any differences among the means
2) which specific means differ, and by how much
3) whether the observed differences could have arisen by chance or reflect real variation among the categories of X

Page 6

Difference of Means

When you are comparing only two means, you have a special case of ANOVA, which is simply referred to as a Difference of Means test.

Page 7

Difference of Means

Often, we are interested in the difference in the means of two populations.

For example:
What is the difference in the mean income for blacks and whites?
What is the difference in the average defense expenditure level for Republican and Democratic presidents?

Page 8

Difference of Means

Note that both of these questions are essentially asking whether two variables (one of which is interval and the other dichotomous) are related to one another.

Page 9

Imagine that you are running an experiment to test whether negative campaign advertising affects the likelihood of voting.

You have two randomly assigned groups (a control group and an experimental group):

EXP: watches a TV newscast with a campaign ad included

CON: watches a TV newscast without the ad

Page 10

After each newscast, the groups are asked to answer questions that measure their intent to vote.

To draw a conclusion about the effect of the ads, you can compare the responses before and after watching the ads. While the control group's measures stayed the same…

Page 11

The experimental group's measures did not. By comparing the mean response to the intention-to-vote question in the pre-test and the post-test, you can calculate the difference between the means.

This difference is called the EFFECT SIZE. It is one of the most basic measures in science.
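The effect size described above can be sketched as a simple difference of means. The scores below are hypothetical intent-to-vote ratings. The scale and values are assumptions, not data from the study. The control group's scores are left unchanged, mirroring the description of the experiment.

```python
from statistics import mean

# Hypothetical intent-to-vote scores (higher = more likely to vote),
# measured before (pre-test) and after (post-test) each newscast.
exp_pre, exp_post = [7, 8, 6, 7, 8], [5, 6, 5, 6, 6]   # newscast with the ad
con_pre, con_post = [7, 7, 8, 6, 7], [7, 7, 8, 6, 7]   # newscast without the ad

def effect_size(pre, post):
    """Difference in means: post-test mean minus pre-test mean."""
    return mean(post) - mean(pre)

print(f"experimental: {effect_size(exp_pre, exp_post):.1f}")  # experimental: -1.6
print(f"control: {effect_size(con_pre, con_post):.1f}")        # control: 0.0
```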

Page 12

The question, however, is whether the difference in means means anything. It could have arisen by chance, or it could truly reflect that the negative ads had a real impact on voting.

We can then interpret the difference of means test as follows:

Page 13

The larger the difference in means (relative to its standard error), the less likely the difference is due to chance and the more likely it is due to a relationship between the IV and the DV.

It is essential, then, to establish a way to determine when the difference is large enough to conclude that there was a meaningful effect.

Page 14

Difference of Means

The null hypothesis for a difference of means test is: There is no difference in the mean of Y across groups (Group 1 = Group 2), i.e. $\mu_1 - \mu_2 = 0$.

Page 15

Difference of Means

The alternative hypothesis for a difference of means test is: There is a difference in the mean of Y across groups (Group 1 ≠ Group 2), i.e. $\mu_1 - \mu_2 \neq 0$ (or, for a directional hypothesis, $\mu_1 - \mu_2 < 0$ or $\mu_1 - \mu_2 > 0$).

Page 16

Sampling Distribution for a Difference of Means

The sampling distribution for the difference of two means:

1. Is distributed normally (for large N)
2. Has mean $\mu_1 - \mu_2$
3. Has a variance (and thus an SE) that we can determine from information about the population variances; for independent samples the variance is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
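A quick simulation can illustrate points 2 and 3. This is a sketch under assumed population values: MU1, SIGMA1, N1 and the others are hypothetical. It repeatedly draws two independent samples from normal populations and records the difference of their means. It then compares the spread of those differences with the independent-samples formula for the standard error.

```python
import random
from statistics import stdev

random.seed(42)

# Hypothetical population parameters (assumptions for the simulation)
MU1, SIGMA1, N1 = 50.0, 10.0, 200
MU2, SIGMA2, N2 = 48.0, 12.0, 150
REPS = 2000

def sample_mean(mu, sigma, n):
    """Mean of one random sample of size n from a normal population."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# Sampling distribution: many independent replications of Ybar1 - Ybar2
diffs = [sample_mean(MU1, SIGMA1, N1) - sample_mean(MU2, SIGMA2, N2)
         for _ in range(REPS)]

simulated_mean = sum(diffs) / REPS
simulated_se = stdev(diffs)
theoretical_se = (SIGMA1**2 / N1 + SIGMA2**2 / N2) ** 0.5

print(f"mean of differences: {simulated_mean:.3f} (mu1 - mu2 = {MU1 - MU2})")
print(f"SE: simulated {simulated_se:.3f}, formula {theoretical_se:.3f}")
```

The simulated mean should land near $\mu_1 - \mu_2 = 2$, and the simulated SE near the formula value of about 1.208.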

Page 17

There are a variety of difference of means tests, and although their formulas differ slightly, they all share two properties:

The numerator is the difference of means, and the denominator is the standard error of the difference of means.

Page 18

Test Statistic for a Difference of Means

The test statistic (used to test the null hypothesis) for the difference of two means (for independent samples) is calculated as:

$t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}}$

The various formulas differ mainly in how the standard error is calculated.
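Here is a Python sketch of the test statistic with two common ways of computing the standard error for independent samples. The unpooled version is the one used in Welch's t-test. The pooled version assumes equal population variances. The samples are hypothetical. Which SE a given textbook or software package uses by default may differ.

```python
from math import sqrt
from statistics import mean, variance

def se_unpooled(y1, y2):
    """SE of Ybar1 - Ybar2 without assuming equal variances (Welch-style)."""
    return sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))

def se_pooled(y1, y2):
    """SE assuming both populations share one variance (pooled estimate)."""
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / (n1 + n2 - 2)
    return sqrt(sp2 * (1 / n1 + 1 / n2))

def t_statistic(y1, y2, se_func=se_unpooled):
    """Numerator: difference of means. Denominator: its standard error."""
    return (mean(y1) - mean(y2)) / se_func(y1, y2)

# Hypothetical samples of unequal size
group1 = [12, 15, 11, 14, 13, 16]
group2 = [10, 9, 12, 11, 8]
print(f"{t_statistic(group1, group2):.3f}")             # 3.363
print(f"{t_statistic(group1, group2, se_pooled):.3f}")  # 3.307
```

With equal sample sizes the two standard errors coincide. They can differ when the groups differ in both size and variance, as here.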

Page 19

The standard error of the difference of means summarizes in one number how much the difference of sample means would vary from sample to sample. It therefore tells us how closely that difference is likely to reflect the difference in the true population.

The standard error is partly a function of variability and partly a function of sample size.

Page 20

Both variability and sample size provide information about how much confidence we can have that the observed difference is representative of the population difference.

Variability is crucial to our confidence. Think of it like this…

Page 21

If we took a sample and every observation was equal to 5, our mean would also be 5. Because the variation is 0, we are exceptionally confident that this reflects the population mean.

But we can also have a mean of 5 with observations that are varied and none of which is exactly 5. Now we are less confident.

Page 22

Similarly, sample size is crucial. As we have discussed, a larger sample size means greater confidence.

Moral of the story? What is true for individual means is also true for the difference of means.
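The points about variability and sample size can be checked with a few lines of Python. The sketch uses the estimated standard error of a single mean, s/√n, on three hypothetical samples:

- a sample where every value is 5;
- a sample with the same mean of 5 but no value equal to 5;
- that second sample repeated ten times.

The same two factors drive the standard error of a difference of means.

```python
from math import sqrt
from statistics import stdev

def se_of_mean(sample):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

no_variation = [5, 5, 5, 5, 5, 5]
varied = [1, 9, 2, 8, 3, 7]        # mean is also 5, but no value equals 5
varied_bigger_n = varied * 10      # same spread, ten times as many observations

print(f"{se_of_mean(no_variation):.3f}")     # 0.000
print(f"{se_of_mean(varied):.3f}")           # 1.390
print(f"{se_of_mean(varied_bigger_n):.3f}")  # 0.405
```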

Page 23

Test Statistic for a Difference of Means

So, by calculating this test statistic, we can determine the probability of observing a t-value at least this large, assuming the null hypothesis is true (the P-value, or sig. level).
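For large samples the t distribution is close to the standard normal. A P-value can therefore be sketched with Python's standard-library `statistics.NormalDist`. This normal approximation is an assumption that suits the large-sample examples here. For small samples, a t distribution with the appropriate degrees of freedom should be used instead.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def p_value(test_stat, tails=2):
    """P-value under a standard normal sampling distribution.

    tails=2: probability of a statistic at least this far from 0 in either direction.
    tails=1: probability of a statistic at least this large (alternative: difference > 0).
    """
    if tails == 2:
        return 2 * (1 - STANDARD_NORMAL.cdf(abs(test_stat)))
    if tails == 1:
        return 1 - STANDARD_NORMAL.cdf(test_stat)
    raise ValueError("tails must be 1 or 2")

print(round(p_value(1.96), 3))            # 0.05
print(round(p_value(1.645, tails=1), 3))  # 0.05
```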

Page 24

Essentially, we are testing whether the population difference of means is 0 or (in a one-tailed test) positive.

Page 25

Example: NES and 2000 Election

1. Null hypothesis: there was no difference in gender between those who voted for Bush and those who voted for Gore (alternative hypothesis: there WAS a difference)

Since only values much greater than 0 are of interest, this is a one-tailed test.

Page 26

We would use a two-tailed test if we specified a hypothesis in which the difference could depart from 0 in either the positive or the negative direction.

Page 27

Example: NES and 2000 Election

2. Appropriate test statistic for a difference of means = t statistic (t-test)

3. What would the sampling distribution look like if the null hypothesis were true? (Normal, with a mean of 0 and an SE calculated by the researcher.) It is normal because we have a large sample!

Page 28

Example: NES and 2000 Election

5. Calculate test statistic

Sample Size: 5000
Mean for Gore voters: 49.63
Mean for Bush voters: 49.60
Difference: .033
SE: .98
T-statistic: 0.0337
P-value: 0.9732 (the probability of obtaining a sample difference of at least .033 if in fact there is no difference in the population)

Conclusion: ???
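The slide's numbers can be reproduced as a rough check, again using the normal approximation for a large sample. The two-tailed P-value from these rounded inputs comes out at about 0.973, in line with the reported 0.9732. The one-tailed P-value (alternative: difference > 0) is about 0.487.

```python
from statistics import NormalDist

# Figures reported on the slide
difference = 0.033   # mean for Gore voters minus mean for Bush voters
se = 0.98            # standard error of the difference

t = difference / se
z = NormalDist()                        # large N: standard normal approximation
p_two_tailed = 2 * (1 - z.cdf(abs(t)))
p_one_tailed = 1 - z.cdf(t)             # alternative: difference > 0

print(f"t = {t:.4f}")                          # t = 0.0337
print(f"two-tailed p = {p_two_tailed:.4f}")    # two-tailed p = 0.9731
print(f"one-tailed p = {p_one_tailed:.4f}")    # one-tailed p = 0.4866
```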

Page 29

Example: NES and 2000 Election

4. Let us now choose our alpha level. For the purposes here, let's choose α = .01: we will reject the null hypothesis if the P-value (sig. level) is less than .01.

This means that if we reject the null hypothesis, we may still be making a Type I error (falsely rejecting a true null), but the chance of doing so is 1 in 100.

Page 30

Because in this situation we have a large sample, we can go back to our z-table and use it. If our sample is smaller, we must use a t-table (the row for the appropriate degrees of freedom), but more on this later.

When we consult the z-table, we find that a z-score of about 2.33 (2.326 to three decimal places) is the critical value that cuts off the upper .01 (1 percent) of the area under the normal curve.

Page 31

Any observed test statistic greater than this value will lead to rejection of the null hypothesis.
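The critical value and the decision rule can be sketched with `statistics.NormalDist().inv_cdf`, assuming the one-tailed test at α = .01 described above. The `reject_null` helper is hypothetical and written only for this illustration.

```python
from statistics import NormalDist

ALPHA = 0.01

# One-tailed test: the critical value cuts off the upper ALPHA of the normal curve
critical_z = NormalDist().inv_cdf(1 - ALPHA)

def reject_null(test_stat, critical=critical_z):
    """Hypothetical helper: reject H0 if the statistic exceeds the critical value."""
    return test_stat > critical

print(round(critical_z, 3))   # 2.326
print(reject_null(0.0337))    # False (the NES t-statistic)
```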

Page 32

Zilber and Niven (SSQ)

Page 33

Zilber and Niven (SSQ)

Hypothesis: Whites will react less favorably to black leaders who use the label "African-American" instead of "black."

Page 34

Zilber and Niven (SSQ)

Simple 2-group posttest-only design
Sample: convenience sample from a Midwestern city; university students

R ("black")   M_BLACK

R ("A-A")   M_AFRICAN-AMERICAN

Page 35

Zilber and Niven (SSQ)

[Results table: *p < .05, **p < .01]

Page 36

Example

NES 2004: Republican Party Feeling Thermometer (537); Religious importance (51); Talk Radio (78)

Page 37

Analysis of Variance

Purpose: ANOVA is used to compare the means of more than 2 groups.

More specifically, ANOVA is used to test:

Null Hypothesis: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_g$

against

Alternative Hypothesis: At least one mean is different

Page 38

Analysis of Variance

Examples:

Comparing the differences in mean income among racial/ethnic groups (black, white, Hispanic, Asian)

Comparing the differences in feeling thermometer scores for Bush among Republicans, Democrats, and Independents

Page 39

Analysis of Variance

Essentially, ANOVA divides the total variation in Y (TSS) into two components, between-group and within-group, so that TSS = BSS + WSS.

TSS = Total sum of squares = total variation in Y:

$TSS = \sum_{i} (Y_i - \bar{Y})^2$

Page 40

Analysis of Variance

BSS = Between Sum of Squares = variation in Y due to differences between groups:

$BSS = \sum_{g} \sum_{i \in g} (\bar{Y}_g - \bar{Y})^2 = \sum_{g} n_g (\bar{Y}_g - \bar{Y})^2$, where $n_g$ is the number of observations in group $g$

Page 41

Analysis of Variance

WSS = Within Sum of Squares = variation in Y due to differences within groups:

$WSS = \sum_{g} \sum_{i \in g} (Y_{ig} - \bar{Y}_g)^2$
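Here is a minimal Python sketch of the three sums of squares, using a small hypothetical data set with three groups. It computes BSS with the group-size weight n_g and checks the identity TSS = BSS + WSS.

```python
def sums_of_squares(groups):
    """Return (TSS, BSS, WSS) for a dict {group label: list of Y values}."""
    all_y = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_y) / len(all_y)
    tss = sum((y - grand_mean) ** 2 for y in all_y)         # (Y_i - Ybar)^2
    bss = 0.0
    wss = 0.0
    for ys in groups.values():
        group_mean = sum(ys) / len(ys)
        bss += len(ys) * (group_mean - grand_mean) ** 2     # n_g * (Ybar_g - Ybar)^2
        wss += sum((y - group_mean) ** 2 for y in ys)       # (Y_ig - Ybar_g)^2
    return tss, bss, wss

# Hypothetical data: three groups of Y values
data = {"A": [4.0, 6.0, 5.0], "B": [7.0, 9.0, 8.0], "C": [2.0, 3.0, 4.0]}
tss, bss, wss = sums_of_squares(data)
print(f"TSS={tss:.1f}, BSS={bss:.1f}, WSS={wss:.1f}")  # TSS=44.0, BSS=38.0, WSS=6.0
```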

Page 42

Analysis of Variance

Test statistic:

$F_{g-1,\,N-g} = \dfrac{BSS/(g-1)}{WSS/(N-g)}$

where g = number of groups and N = total number of observations.
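The F-statistic follows directly from the sums of squares. This sketch plugs in values for a small hypothetical data set: groups A (4, 6, 5), B (7, 9, 8) and C (2, 3, 4). For these data, BSS = 38 and WSS = 6, with g = 3 and N = 9.

```python
def f_statistic(bss, wss, g, n):
    """F with (g - 1, N - g) degrees of freedom: [BSS/(g-1)] / [WSS/(N-g)]."""
    return (bss / (g - 1)) / (wss / (n - g))

print(f_statistic(38.0, 6.0, g=3, n=9))   # 19.0
```

If SciPy is available, `scipy.stats.f.sf(F, g - 1, N - g)` returns the corresponding P-value. `scipy.stats.f_oneway` computes F directly from the raw groups.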

Page 43

Analysis of Variance

Interpreting an ANOVA

If the null hypothesis is true (i.e. all means are equal), the F-statistic should be close to 1. In the population, the between-group and within-group mean squares then estimate the same variance.

If the F-statistic is judged to be "statistically significant" (and thus sufficiently greater than 1), we reject the null hypothesis.

Page 44

Analysis of Variance

Interpreting an ANOVA

We can also calculate a measure of the strength of the relationship:
• Eta-squared = the proportion of variation in the dependent variable explained by the independent variable ($\eta^2 = BSS/TSS$)
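Eta-squared is the ratio of the between-group sum of squares to the total sum of squares. The sketch below uses a small hypothetical three-group data set, groups A (4, 6, 5), B (7, 9, 8) and C (2, 3, 4), for which BSS = 38 and TSS = 44.

```python
def eta_squared(bss, tss):
    """Proportion of the variation in Y explained by group membership."""
    return bss / tss

print(round(eta_squared(38.0, 44.0), 3))   # 0.864
```

Here about 86% of the variation in Y is associated with the grouping variable.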

Page 45

Bivariate Statistics

Rows: Independent Variable. Columns: Dependent Variable.

| Independent \ Dependent | Nominal or Dichotomous | Ordinal | Interval or Ratio |
|---|---|---|---|
| Nominal or Dichotomous | Lambda, Chi-square, Cramér's V | Lambda, Chi-square, Cramér's V | Difference of Means Test (t-test) if IV dichotomous, ANOVA if IV nominal |
| Ordinal | Lambda, Chi-square, Cramér's V | Gamma, Tau-b, Tau-c, Somers' d | Analysis of Variance (ANOVA) |
| Interval or Ratio | Difference of Means Test (t-test) if DV dichotomous, ANOVA if DV nominal | Analysis of Variance (ANOVA) | Correlation, Hypothesis Testing, Regression |
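The table can be encoded as a small lookup function. `choose_statistic` is a hypothetical helper, not from any library, that simply mirrors the table's cells. The level names it accepts are assumptions of the sketch.

```python
CROSSTAB = "Lambda, Chi-square, Cramer's V"
LEVELS = {"dichotomous", "nominal", "ordinal", "interval"}  # "interval" = interval or ratio

def choose_statistic(iv: str, dv: str) -> str:
    """Return the statistic the table suggests for the given levels of
    measurement of the independent (iv) and dependent (dv) variable."""
    if iv not in LEVELS or dv not in LEVELS:
        raise ValueError("levels must be one of " + ", ".join(sorted(LEVELS)))
    if iv == "interval" and dv == "interval":
        return "Correlation, hypothesis testing, regression"
    if dv == "interval":                      # categorical IV, interval DV
        if iv == "dichotomous":
            return "Difference of means test (t-test)"
        return "Analysis of variance (ANOVA)"
    if iv == "interval":                      # interval IV, categorical DV
        if dv == "dichotomous":
            return "Difference of means test (t-test)"
        return "Analysis of variance (ANOVA)"
    if iv == "ordinal" and dv == "ordinal":
        return "Gamma, Tau-b, Tau-c, Somers' d"
    return CROSSTAB                           # remaining categorical pairs

print(choose_statistic("dichotomous", "interval"))  # Difference of means test (t-test)
```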