Regression and Correlation
GTECH 201 Lecture 18
ANOVA
Analysis of Variance
- Continuation from the two-sample difference of means tests, but now for 3+ cases
- We still check whether the samples come from one or more distinct populations
- Variance is a descriptive parameter
- ANOVA compares group means and checks whether they differ sufficiently to reject H0
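ANOVA H0 and HA
- H0: the population means of all k groups are equal, $\mu_1 = \mu_2 = \dots = \mu_k$
- HA: at least one population mean differs from the others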
ANOVA Test Statistic
$$F = \frac{MS_B}{MS_W}$$

where $MS_B$ = between-group mean squares and $MS_W$ = within-group mean squares.

Between-group variability is calculated in three steps:

1. Calculate the overall mean as the weighted average of the sample means
2. Calculate the between-group sum of squares
3. Calculate the between-group mean squares ($MS_B$)
Between-group Variability
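1. Total or overall mean

$$\bar{X}_T = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{N}$$

2. Between-group sum of squares

$$SS_B = \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}_T\right)^2 = \sum_{i=1}^{k} n_i \bar{X}_i^2 - N\bar{X}_T^2$$

3. Between-group mean squares

$$MS_B = \frac{SS_B}{df_B} = \frac{SS_B}{k - 1}$$

where k is the number of groups, $n_i$ and $\bar{X}_i$ are the size and mean of sample i, and $N = \sum n_i$.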
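The three steps can be sketched in Python as follows. This is an illustrative sketch, not code from the lecture: the function name `between_group` and the toy samples are hypothetical. It uses the deviation form $\sum n_i(\bar{X}_i - \bar{X}_T)^2$, which equals the shortcut form but is less sensitive to rounding.

```python
def between_group(groups):
    """Return (overall mean, SS_B, MS_B) for a list of samples.

    `groups` is a list of lists of numbers, one list per group (hypothetical input format).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [sum(g) / len(g) for g in groups]
    n_total = sum(sizes)

    # Step 1: overall mean as the weighted average of the sample means
    overall_mean = sum(n * m for n, m in zip(sizes, means)) / n_total

    # Step 2: between-group sum of squares (deviation form)
    ss_b = sum(n * (m - overall_mean) ** 2 for n, m in zip(sizes, means))

    # Step 3: between-group mean squares, with df_B = k - 1
    ms_b = ss_b / (k - 1)
    return overall_mean, ss_b, ms_b


print(between_group([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (5.0, 54.0, 27.0)
```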
Within-group Variability
1. Within-group sum of squares

$$SS_W = \sum_{i=1}^{k} (n_i - 1)s_i^2$$

2. Within-group mean squares

$$MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k}$$
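Here is a matching sketch of the within-group steps. It assumes each sample is a Python list with at least two values, since Python's `statistics.variance` needs two data points. The function name `within_group` and the toy data are hypothetical.

```python
from statistics import variance  # sample variance s_i**2 (divisor n_i - 1)


def within_group(groups):
    """Return (SS_W, MS_W) for a list of samples, each with at least 2 values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Step 1: within-group sum of squares, SS_W = sum of (n_i - 1) * s_i**2
    ss_w = sum((len(g) - 1) * variance(g) for g in groups)

    # Step 2: within-group mean squares, with df_W = N - k
    ms_w = ss_w / (n_total - k)
    return ss_w, ms_w


print(within_group([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Because $(n_i - 1)s_i^2$ is simply the sum of squared deviations within sample i, $SS_W$ measures the scatter of observations around their own group means.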
Kruskal-Wallis Test
- Nonparametric equivalent of ANOVA
- Extension of the Wilcoxon rank sum W test to 3+ cases
- The average rank of sample i is $R_i / n_i$

Then the Kruskal-Wallis H test statistic is

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

with $N = n_1 + n_2 + \dots + n_k$ = total number of observations, and $R_i$ = sum of ranks in sample i.
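Here is a minimal sketch of the H statistic. It assumes that tied observations receive the average of the ranks they span. The sketch follows the formula above without a tie correction. Library routines such as SciPy's `scipy.stats.kruskal` also divide H by a tie-correction factor, so results may differ slightly when ties occur. The helper names `rank_with_ties` and `kruskal_wallis_h` are hypothetical.

```python
def rank_with_ties(values):
    """Assign ranks 1..N, giving tied values the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic for a list of samples, without tie correction."""
    pooled = [x for g in groups for x in g]
    ranks = rank_with_ties(pooled)  # rank all observations together
    n_total = len(pooled)
    total = 0.0
    start = 0
    for g in groups:
        r_i = sum(ranks[start:start + len(g)])  # R_i: sum of ranks in sample i
        total += r_i ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 4))  # 7.2
```

The resulting H is usually compared with a chi-square distribution with $k - 1$ degrees of freedom.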
ANOVA Example
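House prices by neighborhood (in thousands of dollars)

| A | B | C | D |
|---|---|---|---|
| 175 | 151 | 127 | 174 |
| 147 | 183 | 142 | 182 |
| 138 | 174 | 124 | 210 |
| 156 | 181 | 150 | 191 |
| 184 | 193 | 180 | |
| 148 | 205 | | |
| | 196 | | |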
ANOVA Example, continued
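Sample statistics

| Neighborhood | n | Mean $\bar{X}$ | s |
|---|---|---|---|
| A | 6 | 158.00 | 17.83 |
| B | 7 | 183.29 | 17.61 |
| C | 5 | 144.60 | 22.49 |
| D | 4 | 189.25 | 15.48 |
| Total | 22 | 168.68 | 24.85 |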
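The sample statistics can be reproduced from the raw prices with Python's standard `statistics` module. Its `stdev` function uses the $n - 1$ divisor, which matches the table. The variable names are illustrative.

```python
from statistics import mean, stdev

# House prices by neighborhood (thousands of dollars), from the example above
prices = {
    "A": [175, 147, 138, 156, 184, 148],
    "B": [151, 183, 174, 181, 193, 205, 196],
    "C": [127, 142, 124, 150, 180],
    "D": [174, 182, 210, 191],
}

# n, mean and sample standard deviation (divisor n - 1) per neighborhood
summary = {name: (len(v), mean(v), stdev(v)) for name, v in prices.items()}
for name, (n, m, s) in summary.items():
    print(f"{name}: n = {n}, mean = {m:.2f}, s = {s:.2f}")

# Pool all neighborhoods for the "Total" row
all_prices = [x for v in prices.values() for x in v]
total_sd = stdev(all_prices)
print(f"Total: n = {len(all_prices)}, mean = {mean(all_prices):.2f}, s = {total_sd:.2f}")
```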
Now fill in the six steps of the ANOVA calculation
The Six Steps
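All results below are computed from the unrounded sample statistics. Substituting the two-decimal values from the table changes the trailing digits. In step 2, the shortcut formula subtracts two large, nearly equal numbers, so the rounded values give roughly 6793.9 instead of 6769.394.

1. Overall mean

$$\bar{X}_T = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{N} = \frac{6(158.00) + 7(183.29) + 5(144.60) + 4(189.25)}{22} = 168.68$$

2. Between-group sum of squares

$$SS_B = \sum_{i=1}^{k} n_i \bar{X}_i^2 - N\bar{X}_T^2 = 6(158.00)^2 + 7(183.29)^2 + 5(144.60)^2 + 4(189.25)^2 - 22(168.68)^2 = 6769.394$$

3. Between-group mean squares

$$MS_B = \frac{SS_B}{k - 1} = \frac{6769.394}{3} = 2256.465$$

4. Within-group sum of squares

$$SS_W = \sum_{i=1}^{k} (n_i - 1)s_i^2 = 5(17.83)^2 + 6(17.61)^2 + 4(22.49)^2 + 3(15.48)^2 = 6193.379$$

5. Within-group mean squares

$$MS_W = \frac{SS_W}{N - k} = \frac{6193.379}{22 - 4} = 344.077$$

6. F statistic

$$F = \frac{MS_B}{MS_W} = \frac{2256.465}{344.077} = 6.558, \qquad p = .003$$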
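The six steps can be checked in Python from the raw prices. This sketch uses only the standard library and computes step 2 in deviation form. For comparison, it also repeats step 2 with the shortcut formula and two-decimal means. The variable names are illustrative.

```python
from statistics import mean, variance

# House prices by neighborhood (thousands of dollars), from the example above
prices = {
    "A": [175, 147, 138, 156, 184, 148],
    "B": [151, 183, 174, 181, 193, 205, 196],
    "C": [127, 142, 124, 150, 180],
    "D": [174, 182, 210, 191],
}
groups = list(prices.values())
k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # N

overall = sum(len(g) * mean(g) for g in groups) / n_total        # step 1
ss_b = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)    # step 2 (deviation form)
ms_b = ss_b / (k - 1)                                            # step 3
ss_w = sum((len(g) - 1) * variance(g) for g in groups)           # step 4
ms_w = ss_w / (n_total - k)                                      # step 5
f_stat = ms_b / ms_w                                             # step 6

print(f"overall mean = {overall:.2f}")
print(f"SS_B = {ss_b:.3f}, MS_B = {ms_b:.3f}")
print(f"SS_W = {ss_w:.3f}, MS_W = {ms_w:.3f}")
print(f"F = {f_stat:.3f}")

# Step 2 again, via the shortcut form with means rounded to two decimals
rounded_means = [round(mean(g), 2) for g in groups]
ss_b_rounded = (sum(len(g) * m ** 2 for g, m in zip(groups, rounded_means))
                - n_total * round(overall, 2) ** 2)
print(f"SS_B from rounded means = {ss_b_rounded:.1f}")
```

When run as written, it should print SS_B = 6769.394, MS_W = 344.077 and F = 6.558, with the rounded shortcut giving about 6793.9. If SciPy is installed, `scipy.stats.f_oneway(*groups)` should return the same F together with its p-value.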
Correlation
- Co-relatedness between 2+ variables
- As the values of one variable go up, those of the other change proportionally
- Two-step approach:
  1. Graphically: scatterplot
  2. Numerically: correlation coefficients
Is There a Correlation?
Scatterplots
- Exploratory analysis
Pearson’s Correlation Index
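Based on the concept of covariance:

$$CV_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$$

where
- $CV_{XY}$ = covariation between X and Y
- $(X - \bar{X})$ = deviation of X from its mean
- $(Y - \bar{Y})$ = deviation of Y from its mean

Pearson's correlation coefficient:

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y}) / N}{S_X S_Y}$$

where $S_X$ and $S_Y$ are the standard deviations of X and Y, computed with the same divisor N.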
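Below is a direct translation of the formula, as a sketch. The data are hypothetical illustrative values, not the lake effect snow data. $S_X$ and $S_Y$ use the divisor $N$ so that $r$ stays between −1 and 1.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r as on the slide: (CV_XY / N) / (S_X * S_Y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Covariation: sum of products of deviations from the means
    cv_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Standard deviations with divisor N, to match the /N in the numerator
    s_x = sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    return (cv_xy / n) / (s_x * s_y)


# Hypothetical illustrative values (not the lake effect snow data)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```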
Sample and Population
- r is the sample correlation coefficient
- Applying the t distribution, we can infer the correlation for the whole population

Test statistic for Pearson's r, with $n - 2$ degrees of freedom:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$
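The t statistic takes one line to compute. The values of r and n below are hypothetical. You would then compare the result with a t table, or with a library's t distribution, at $n - 2$ degrees of freedom.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for testing r against a population correlation of zero (df = n - 2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Hypothetical values: r = 0.6 from n = 27 pairs
t = t_for_r(0.6, 27)
print(f"t = {t:.2f} with {27 - 2} degrees of freedom")  # t = 3.75 with 25 degrees of freedom
```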
Correlation Example
- Lake effect snow
Spearman’s Rank Correlation
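- Non-parametric alternative to Pearson
- Logic similar to the Kruskal-Wallis and Wilcoxon tests

Spearman's rank correlation coefficient:

$$r_s = 1 - \frac{6\sum d^2}{N^3 - N}$$

where d is the difference between an observation's ranks on the two variables and N is the number of observations.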
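Here is a sketch of Spearman's coefficient for data without ties. The $6\sum d^2$ shortcut is exact only in that case, so the helper refuses tied values. The function names and data are hypothetical.

```python
def ranks_no_ties(values):
    """Rank values 1..N; this sketch assumes there are no tied values."""
    if len(set(values)) != len(values):
        raise ValueError("tied values: the 6*sum(d^2) formula is exact only without ties")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (N^3 - N)."""
    rx, ry = ranks_no_ties(x), ranks_no_ties(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # d = rank difference per observation
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Hypothetical paired observations
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(spearman_rs(x, y))  # 0.8
```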
Regression
In correlation we observe degrees of association but no causal or functional relationship
In regression analysis, we distinguish an independent from a dependent variable
Many forms of functional relationships: bivariate or multivariate, linear or non-linear (curvilinear)
Graphical Representation
In correlation analysis either variable could be depicted on either axis
In regression analysis, the independent variable is always on the X axis
Bivariate relationship is described by a best-fitting line through the scatterplot
Least-Squares Regression
Objective: minimize $\sum d_i^2$, where $d_i$ is the vertical deviation of observation i from the fitted line

$$Y = a + bX$$
Regression Equation
$$Y = a + bX$$

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y - b\sum X}{n}$$
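The computational formulas for $b$ and $a$ translate directly into Python. The function name `least_squares` and the toy data are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for Y = a + bX using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b


# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(f"Y = {a:.2f} + {b:.2f}X")  # Y = 2.20 + 0.60X
```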
Strength of Relationship
How much is explained by the regression equation?
Coefficient of Determination
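Total variation of Y (all the bucket water):

$$\sum y^2 = \sum (Y - \bar{Y})^2$$

- Large 'Y' = dependent variable; small 'y' = deviation of each value of Y from its mean
- e = explained; u = unexplained

$$\sum y^2 = \sum y_e^2 + \sum y_u^2$$

where $y_e = \hat{Y} - \bar{Y}$ is the part of each deviation explained by the regression line, $y_u = Y - \hat{Y}$ is the unexplained remainder, and $\hat{Y} = a + bX$ is the predicted value.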
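To see the bucket decomposition numerically, the sketch below fits the least-squares line to hypothetical toy data. It then splits the total variation into explained and unexplained parts. It computes $b$ in the equivalent deviation form $\sum xy / \sum x^2$.

```python
# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares line in deviation form: b = sum(xy) / sum(x^2), a = mean_y - b * mean_x
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]  # predicted values

total = sum((yi - mean_y) ** 2 for yi in y)                    # sum of y^2
explained = sum((yh - mean_y) ** 2 for yh in y_hat)            # sum of y_e^2
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of y_u^2
print(total, round(explained, 6), round(unexplained, 6))       # 6.0 3.6 2.4
```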
Explained Variation
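Ratio of the square of the covariation between X and Y to the variation in X:

$$\sum y_e^2 = \frac{\left(\sum xy\right)^2}{\sum x^2}$$

where $\sum xy$ = covariation between X and Y and $\sum x^2$ = total variation of X.

Coefficient of determination:

$$r^2 = \frac{\sum y_e^2}{\sum y^2}$$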
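The covariation shortcut for the explained variation gives the same $r^2$ as squaring Pearson's $r$. The sketch below checks this on hypothetical toy data.

```python
# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
dx = [xi - mean_x for xi in x]  # small x: deviations of X from its mean
dy = [yi - mean_y for yi in y]  # small y: deviations of Y from its mean

sum_xy = sum(u * v for u, v in zip(dx, dy))  # covariation between X and Y
sum_x2 = sum(u * u for u in dx)              # total variation of X
sum_y2 = sum(v * v for v in dy)              # total variation of Y

explained = sum_xy ** 2 / sum_x2             # sum of y_e^2
r_squared = explained / sum_y2
print(round(r_squared, 4))  # 0.6
```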
Error Analysis
- $r^2$ tells us what proportion of the variation in Y is accounted for by the independent variable
- This in turn lets us estimate the standard error of the estimate, which tells us, on average, how far off our predictions would be, in measurement units

$$SE_y = \sqrt{\frac{\sum y_u^2}{n}}$$

where $\sum y_u^2$ is the unexplained variation.
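Here is a sketch of the standard error of the estimate on hypothetical toy data. It uses the divisor $n$, as on the slide. Many statistics packages divide by $n - 2$ instead, which gives a somewhat larger value.

```python
from math import sqrt

# Hypothetical toy data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Least-squares line (deviation form of the slope)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

# Unexplained variation: squared residuals around the regression line
sum_yu2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_y = sqrt(sum_yu2 / n)   # divisor n, as on the slide
print(round(se_y, 4))      # 0.6928
```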