URBP 204A QUANTITATIVE METHODS I
Statistical Analysis Lecture IV
Gregory Newmark
San Jose State University
(This lecture is based on Chapters 5, 12, 13, & 15 of Neil Salkind’s Statistics for People who (Think They) Hate Statistics, 2nd Edition, which is also the source of many of the offered examples. All cartoons are from CAUSEweb.org by J.B. Landers.)
More Statistical Tests
• Factorial Analysis of Variance (ANOVA)
  – Tests between means of more than two groups for two or more factors (independent variables)
• Correlation Coefficient
  – Tests the association between two variables
• One Sample Chi-Square (χ²)
  – Tests if an observed distribution of frequencies for one factor is what one would expect by chance
• Two Factor Chi-Square (χ²)
  – Tests if an observed distribution of frequencies for two factors is what one would expect by chance
Factorial ANOVA
• Compares observations of a single variable among two or more groups which incorporate two or more factors.
• Examples:
  – Reading Skills
    • School (Elementary, Middle, High)
    • Academic Philosophy (Montessori, Waldorf)
  – Environmental Knowledge
    • Commute Mode (Car, Bus, Walking)
    • Age (Under 40, 40+)
  – Wealth
    • Favorite Team (A’s, Giants, Dodgers, Angels)
    • Home Location (Oakland, SF, LA)
  – Weight Loss
    • Gender (Male, Female)
    • Exercise (Biking, Running)
Factorial ANOVA
• Two Types of Effects
  – Main Effects: differences within one factor
  – Interaction Effects: differences across factors
• Example:
  – Weight Loss
    • Gender (Male, Female)
    • Exercise (Biking, Running)
  – Main Effects:
    • Does weight loss vary by exercise?
    • Does weight loss vary by gender?
  – Interaction Effects:
    • Does weight loss due to exercise vary by gender?
Factorial ANOVA
• Example:
  – “How is weight loss affected by exercise program and gender?”
• Steps:
  – State hypotheses
    • Null:
      H0 : µMale = µFemale
      H0 : µBiking = µRunning
      H0 : µMale-Biking = µFemale-Biking = µMale-Running = µFemale-Running
    • Research: What would these three be?
Factorial ANOVA
• Steps (Continued):
  – Set significance level
    • Level of risk of Type I Error = 5%
    • Level of Significance (p) = 0.05
  – Select statistical test
    • Factorial ANOVA
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula
    • (SPSS can expedite this step for us)
Factorial ANOVA
• Weight Loss Data
Male-Biking Male-Running Female-Biking Female-Running
76 88 65 65
78 76 90 67
76 76 65 67
76 76 90 87
76 56 65 78
74 76 90 56
74 76 90 54
76 98 79 56
76 88 70 54
55 78 90 56
Factorial ANOVA
• SPSS Outputs
Tests of Between-Subjects Effects
Dependent Variable: WeightLoss

Source              Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model     1522.875(a)               3    507.625       4.678      .007
Intercept           218892.025                1    218892.025    2017.386   .000
Exercise            265.225                   1    265.225       2.444      .127
Gender              207.025                   1    207.025       1.908      .176
Exercise * Gender   1050.625                  1    1050.625      9.683      .004
Error               3906.100                  36   108.503
Total               224321.000                40
Corrected Total     5428.975                  39
a. R Squared = .281 (Adjusted R Squared = .221)

Between-Subjects Factors

           Value   Value Label   N
Exercise   1       Running       20
           2       Biking        20
Gender     1       Male          20
           2       Female        20
Factorial ANOVA
• SPSS Outputs
  – Graph them!
[Figure: plot of mean WeightLoss (axis ticks 65 to 80) by Gender (Female, Male) and Exercise (Biking, Running)]
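The four cell means behind a plot like this can be recomputed from the weight-loss data table. The following is a minimal Python sketch using only the standard library; the dictionary layout and variable names are illustrative assumptions and are not part of the SPSS output.

```python
from statistics import mean

# Weight-loss observations copied from the data table (10 per group).
weight_loss = {
    ("Male", "Biking"): [76, 78, 76, 76, 76, 74, 74, 76, 76, 55],
    ("Male", "Running"): [88, 76, 76, 76, 56, 76, 76, 98, 88, 78],
    ("Female", "Biking"): [65, 90, 65, 90, 65, 90, 90, 79, 70, 90],
    ("Female", "Running"): [65, 67, 67, 87, 78, 56, 54, 56, 54, 56],
}

# One mean per Gender x Exercise cell; these are the points to plot.
cell_means = {group: mean(values) for group, values in weight_loss.items()}

for (gender, exercise), m in cell_means.items():
    print(f"{gender:6s} {exercise:8s} mean weight loss = {m:.1f}")
```

If the group that does better switches from one gender to the other, the plotted lines cross, which is the visual signature of an interaction effect.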
Factorial ANOVA
• Steps (Continued)
  – Computation of obtained test statistic value
    • Exercise F = 2.444, p = 0.127
    • Gender F = 1.908, p = 0.176
    • Interaction F = 9.683, p = 0.004
  – Look up the critical F score
    • df numerator = # of levels of the factor – 1 (for the interaction, the product of these values across the factors)
    • df denominator = # of Observations – # of Groups
    • What is the critical F score?
  – Comparison of obtained and critical values
    • If obtained > critical, reject the null hypothesis
    • If obtained < critical, stick with the null hypothesis
Factorial ANOVA
• Steps (Continued)
  – Therefore we reject the null hypothesis for the interaction effect only. Neither choice of exercise alone nor gender alone shows a significant effect on weight loss, but in combination they do differentially affect weight loss. According to these data, men should run and women should bike.
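The F values reported by SPSS can be reproduced by hand for this balanced 2 x 2 design. The sketch below is one way to do it in plain Python, not the SPSS procedure itself; the variable names are illustrative, and the critical F of about 4.11 for df = (1, 36) at the 0.05 level is an assumption read from a standard F table.

```python
# Weight-loss observations copied from the data table (10 per group).
data = {
    ("Male", "Biking"): [76, 78, 76, 76, 76, 74, 74, 76, 76, 55],
    ("Male", "Running"): [88, 76, 76, 76, 56, 76, 76, 98, 88, 78],
    ("Female", "Biking"): [65, 90, 65, 90, 65, 90, 90, 79, 70, 90],
    ("Female", "Running"): [65, 67, 67, 87, 78, 56, 54, 56, 54, 56],
}
genders = ["Male", "Female"]
exercises = ["Biking", "Running"]
n_per_cell = 10  # balanced design: same count in every cell


def mean(xs):
    return sum(xs) / len(xs)


all_values = [x for values in data.values() for x in values]
grand_mean = mean(all_values)

# Marginal means (main effects) and cell means.
gender_means = {g: mean([x for e in exercises for x in data[(g, e)]]) for g in genders}
exercise_means = {e: mean([x for g in genders for x in data[(g, e)]]) for e in exercises}
cell_means = {cell: mean(values) for cell, values in data.items()}

# Sums of squares for a balanced two-way design.
ss_gender = sum(len(exercises) * n_per_cell * (m - grand_mean) ** 2
                for m in gender_means.values())
ss_exercise = sum(len(genders) * n_per_cell * (m - grand_mean) ** 2
                  for m in exercise_means.values())
ss_cells = sum(n_per_cell * (m - grand_mean) ** 2 for m in cell_means.values())
ss_interaction = ss_cells - ss_gender - ss_exercise
ss_error = sum((x - cell_means[cell]) ** 2
               for cell, values in data.items() for x in values)

# Degrees of freedom.
df_gender = len(genders) - 1
df_exercise = len(exercises) - 1
df_interaction = df_gender * df_exercise
df_error = len(all_values) - len(data)  # observations minus groups

ms_error = ss_error / df_error
f_gender = (ss_gender / df_gender) / ms_error
f_exercise = (ss_exercise / df_exercise) / ms_error
f_interaction = (ss_interaction / df_interaction) / ms_error

F_CRITICAL = 4.11  # ASSUMPTION: approximate table value for df = (1, 36), p = 0.05

for name, f in [("Exercise", f_exercise), ("Gender", f_gender),
                ("Exercise * Gender", f_interaction)]:
    decision = "reject H0" if f > F_CRITICAL else "retain H0"
    print(f"{name:18s} F = {f:.3f} -> {decision}")
```

Only the interaction F exceeds the assumed critical value, which matches the conclusion drawn from the SPSS p values.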
Correlation Coefficient
• Tests whether changes in two variables are related
• Examples
  – “Are property values positively related to distance from waste dumps?”
  – “Is age correlated with height for minors?”
  – “Are apartment rents negatively related to commute time?”
  – “Does someone’s height relate to income?”
  – “How related are hand size and height?”
Correlation Coefficient
• Are Tastiness and Ease correlated for fruit?
• Is there directionality?
Correlation Coefficient
• Numeric index that reflects the linear relationship between two variables (bivariate correlation)
  – “How does the value of one variable change when another variable changes?”
  – Each case has two data points:
    • E.g. This study records each person’s height and weight to see if they are correlated.
  – Ranges from -1.0 to +1.0
  – Two types of possible correlations
    • Change in the same direction: positive or direct correlation
    • Change in opposite directions: negative or indirect correlation
  – Absolute value reflects strength of correlation
• Pearson Product-Moment Correlation
  – Both variables need to be ratio or interval
Correlation Coefficient
• Scatterplot
Correlation Coefficient
• Coefficient of Determination
  – Squaring the correlation coefficient (r²)
  – The percentage of variance in one variable that is accounted for by the variance in another variable
• Example: GPA and Time Spent Studying
  – r(GPA and Study Time) = 0.70; r²(GPA and Study Time) = 0.49
    • 49% of the variance in GPA can be explained by the variance in studying time
    • GPA and studying time share 49% of the variance between themselves
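To make the Pearson coefficient and the coefficient of determination concrete, here is a small Python sketch. The `pearson_r` helper and the study-hours/GPA numbers are hypothetical illustrations (the lecture gives only r = 0.70, not the underlying data), so the printed r will differ from the slide's value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data: weekly study hours and GPA for six students.
study_hours = [2, 4, 5, 7, 8, 10]
gpa = [2.4, 3.1, 2.6, 3.0, 3.6, 3.3]

r = pearson_r(study_hours, gpa)
r_squared = r ** 2  # coefficient of determination

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")

# The lecture's own figures: r = 0.70 gives r squared = 0.49.
print(f"0.70 squared = {0.70 ** 2:.2f}")
```

Each case contributes one (x, y) pair, which is why both lists must have the same length.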
Correlation Coefficient
• Example
  – “How related are hand size and height?”
• Steps
  – State hypotheses
    • Null: H0 : ρ(Hand Size and Height) = 0
    • Research: H1 : r(Hand Size and Height) ≠ 0
      – Non-directional
  – Set significance level
    • Level of risk of Type I Error = 5%
    • Level of Significance (p) = 0.05
Correlation Coefficient
• Steps (Continued)
  – Select statistical test
    • Correlation Coefficient (it is the test statistic!)
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula
Correlation Coefficient
• Plot the data: n = 30
Correlation Coefficient
• Steps (Continued)
  – Computation of obtained test statistic value
    • r(Hand Size and Height) = 0.736

Correlations
                                Height    Hand
Height   Pearson Correlation    1         .736**
         Sig. (2-tailed)                  .000
         N                      30        30
Hand     Pearson Correlation    .736**    1
         Sig. (2-tailed)        .000
         N                      30        30
**. Correlation is significant at the 0.01 level (2-tailed).
Correlation Coefficient
• Steps (Continued)
  – Computation of critical test statistic value
    • Value needed to reject null hypothesis
    • Look up p = 0.05 in critical value table
    • Consider degrees of freedom [df = n – 2]
    • Consider number of tails (is there directionality?)
    • r critical = ?
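When a table of critical r values is not at hand, the critical r can be derived from a t table, because r and t are linked by t = r·sqrt(df) / sqrt(1 – r²). The sketch below is an illustration under one stated assumption: the two-tailed critical t for df = 28 at the 0.05 level is taken as 2.048 from a standard t table.

```python
from math import sqrt

n = 30             # number of cases in the hand size / height example
df = n - 2         # degrees of freedom for a correlation
r_obtained = 0.736

# ASSUMPTION: two-tailed critical t for df = 28 at p = 0.05, from a t table.
T_CRITICAL = 2.048

# Convert the obtained r to a t statistic.
t_obtained = r_obtained * sqrt(df) / sqrt(1 - r_obtained ** 2)

# Invert the same relationship to get the critical r for this df.
r_critical = T_CRITICAL / sqrt(T_CRITICAL ** 2 + df)

print(f"t obtained = {t_obtained:.2f}, r critical = {r_critical:.3f}")
print("reject H0" if abs(r_obtained) > r_critical else "retain H0")
```

Because df sits in the denominator, fewer cases give a larger critical r, so a small sample needs a stronger correlation to reach significance.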
Correlation Coefficient
• What happens to the critical score when the number of cases (n) decreases? Why?
Correlation Coefficient
• Steps (Continued)
  – Comparison of obtained and critical values
    • If obtained > critical, reject the null hypothesis
    • If obtained < critical, stick with the null hypothesis
    • r obtained = 0.736 > r critical = 0.349 (the two-tailed value for df = 30; at df = n – 2 = 28 the critical value is about 0.361, and the conclusion is the same)
  – Therefore, we reject the null hypothesis and accept the research hypothesis that height and hand size are correlated.
    • Is there a directionality to that correlation?
Correlation Coefficient
• Significance vs. Meaning
  – Rules of Thumb
    • r = 0.8 to 1.0 Very strong relationship
    • r = 0.6 to 0.8 Strong relationship
    • r = 0.4 to 0.6 Moderate relationship
    • r = 0.2 to 0.4 Weak relationship
    • r = 0.0 to 0.2 Weak or no relationship
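The rules of thumb can be written as a small lookup. This is only a sketch: the function name `describe_strength` is hypothetical, and because the slide's ranges share their endpoints, treating each lower bound as inclusive is an assumption.

```python
def describe_strength(r):
    """Map a correlation coefficient to the lecture's rule-of-thumb label."""
    size = abs(r)  # strength depends on the absolute value, not the sign
    if size > 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    # ASSUMPTION: a value on a shared boundary takes the stronger label.
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Weak or no relationship"


# The hand size / height correlation from the lecture.
print(describe_strength(0.736))
```

A statistically significant r can still fall in a weak band, which is the distinction between significance and meaning.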
Correlation Coefficient
• Does correlation express causation?
• Classic Example:
  – Ice Cream Eaten
  – Crimes Committed
Correlation Coefficient
• Correlation expresses association only
Chi-Square (χ²)
• Non-Parametric Test
  – Does not rely on a given distribution
    • Useful for small sample sizes
  – Enables consideration of data that comes as ordinal or nominal frequencies
    • Number of children in different grades
    • Percentage of people by state receiving social security
One Sample Chi-Square (χ²)
• Tests whether an observed distribution of frequencies for one factor is likely to have occurred by chance
• Examples:
  – “Is this community evenly distributed among ethnic groups?”
  – “Are the 31 ice cream flavors at Baskin Robbins equally purchased?”
  – “Are commuting mode shares evenly spread out?”
  – “Did people report equal preferences for a school voucher policy?”
One Sample Chi-Square (χ²)
• Examples:
  – “Did people report equal preferences for a school voucher policy?”
  – Data (90 People split into 3 Categories)
    • For 23
    • Maybe 17
    • Against 50
  – Always try to have at least 5 responses per category
One Sample Chi-Square (χ²)
• Steps:
  – State hypotheses
    • Null:
      H0 : ProportionFor = ProportionMaybe = ProportionAgainst
    • Research:
      H1 : ProportionFor ≠ ProportionMaybe ≠ ProportionAgainst
  – Set significance level
    • Level of risk of Type I Error = 5%
    • Level of Significance (p) = 0.05
  – Select statistical test
    • Chi-Square (χ²)
One Sample Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula
    • (SPSS can expedite this step for us)
One Sample Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value

Category   O    E    (O-E)   (O-E)²   (O-E)²/E
For        23   30   -7      49       1.63
Maybe      17   30   -13     169      5.63
Against    50   30   20      400      13.33
Total      90   90   --      --       20.59
One Sample Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value
    • χ² obtained = 20.59
  – Computation of critical test statistic value
    • Value needed to reject null hypothesis
    • Look up p = 0.05 in χ² table
    • Consider degrees of freedom [df = # of categories – 1]
    • χ² critical = 5.99
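The obtained value can be checked with a few lines of Python. This is a minimal sketch of the one-sample computation, with illustrative variable names; the critical value 5.99 is the one given in the lecture for df = 2.

```python
# Observed votes on the school voucher policy (90 people, 3 categories).
observed = {"For": 23, "Maybe": 17, "Against": 50}

total = sum(observed.values())
# Under the null hypothesis every category is equally likely.
expected = total / len(observed)

# Chi-square: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1

CHI2_CRITICAL = 5.99  # lecture's table value for df = 2, p = 0.05
reject_null = chi_square > CHI2_CRITICAL

print(f"chi-square = {chi_square:.2f}, df = {df}, reject H0: {reject_null}")
```

Working without intermediate rounding gives 20.60, the value SPSS reports; the hand table's 20.59 is the sum of the rounded column entries.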
One Sample Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value

Votes
          Observed N   Expected N   Residual
For       23           30.0         -7.0
Maybe     17           30.0         -13.0
Against   50           30.0         20.0
Total     90

Test Statistics
                Votes
Chi-Square(a)   20.600
df              2
Asymp. Sig.     .000
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 30.0.
One Sample Chi-Square (χ²)
• Steps (Continued):
  – Comparison of obtained and critical values
    • If obtained > critical, reject the null hypothesis
    • If obtained < critical, stick with the null hypothesis
    • χ² obtained = 20.59 > χ² critical = 5.99
  – Therefore, we can reject the null hypothesis and conclude that the distribution of preferences regarding the school voucher policy is not even.
Two Factor Chi-Square (χ²)
• What if we want to see if gender affects the distribution of votes?
• How is this different from Factorial ANOVA?

Votes * Gender Crosstabulation
Count
                  Gender
Votes      Male   Female   Total
For        17     6        23
Maybe      7      10       17
Against    20     30       50
Total      44     46       90
Two Factor Chi-Square (χ²)
• Steps:
  – State hypotheses
    • Null:
      H0 : PFor*Male = PMaybe*Male = PAgainst*Male = PFor*Female = PMaybe*Female = PAgainst*Female
    • Research:
      H1 : PFor*Male ≠ PMaybe*Male ≠ PAgainst*Male ≠ PFor*Female ≠ PMaybe*Female ≠ PAgainst*Female
  – Set significance level
    • Level of risk of Type I Error = 5%
    • Level of Significance (p) = 0.05
  – Select statistical test
    • Chi-Square (χ²)
Two Factor Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value
    • Insert obtained data into appropriate formula
    • Same as for One Factor Chi-Square
Two Factor Chi-Square (χ²)
• How do we find the expected frequencies?
  – (Row Total * Column Total) / Grand Total
  – Expected Value [For*Male] = (23*44)/90 = 11.2

Votes * Gender Crosstabulation
                               Gender
Votes                       Male   Female   Total
For       Count             17     6        23
          Expected Count    11.2   11.8     23.0
Maybe     Count             7      10       17
          Expected Count    8.3    8.7      17.0
Against   Count             20     30       50
          Expected Count    24.4   25.6     50.0
Total     Count             44     46       90
          Expected Count    44.0   46.0     90.0
Two Factor Chi-Square (χ²)
• Steps (Continued):
  – Computation of obtained test statistic value
    • χ² obtained = 7.750

Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             7.750(a)   2    .021
Likelihood Ratio               7.984      2    .018
Linear-by-Linear Association   6.344      1    .012
N of Valid Cases               90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.31.
Two Factor Chi-Square (χ²)
• Steps (Continued):
  – Computation of critical test statistic value
    • Value needed to reject null hypothesis
    • Look up p = 0.05 in χ² table
    • Consider degrees of freedom
    • df = (# of rows – 1) * (# of columns – 1)
    • χ² critical = ?
Two Factor Chi-Square (χ²)
• Steps (Continued):
  – Comparison of obtained and critical values
    • If obtained > critical, reject the null hypothesis
    • If obtained < critical, stick with the null hypothesis
    • χ² obtained = 7.750 > χ² critical = 5.99
  – Therefore, we can reject the null hypothesis and conclude that gender affects the distribution of preferences regarding the school vouchers.
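The whole two-factor procedure, from expected counts to the decision, can be sketched in plain Python. Variable names are illustrative assumptions; the critical value 5.99 is the lecture's figure for df = 2 at the 0.05 level.

```python
# Observed counts: rows are For, Maybe, Against; columns are Male, Female.
observed = [
    [17, 6],
    [7, 10],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell: (row total * column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

CHI2_CRITICAL = 5.99  # lecture's table value for df = 2, p = 0.05
reject_null = chi_square > CHI2_CRITICAL

print(f"Expected [For*Male] = {expected[0][0]:.1f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_null}")
```

The smallest expected count here is about 8.31 (Maybe*Male), so the guideline of at least 5 per cell is met.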
Tutorial Time