Multivariate Analysis of Variance (MANOVA)
$H_0: \mu_A = \mu_B = \mu_C$
$H_a: \mu_A \neq \mu_B \neq \mu_C$
The alternative could be true because all the means are different, or because just one of them is different from the others.
If we reject the null hypothesis, we need to perform further analysis to draw conclusions about which population means differ from the others and by how much.
Group means: $\bar{x}_A = 727.5$, $\bar{x}_B = 514.25$, $\bar{x}_C = 508$; overall mean: $\bar{x}_{ABC} = 583.25$
Consider Univariate ANOVA
Used when you have 3 or more samples

$$F = \frac{\text{signal}}{\text{noise}} = \frac{\text{variance between}}{\text{variance within}}$$

$$\text{variance between} = \frac{\sum_{i=1}^{k} (\bar{x}_i - \bar{x}_{ALL})^2}{k - 1} \cdot n$$

$$\text{variance within} = \frac{\sum_{i=1}^{k} \text{variance}_i}{k}$$

Here $k$ is the number of groups and $n$ is the number of observations per group.
A large F-value indicates a significant difference
Worked example for groups A, B and C (3 groups, 4 observations per group):

$$\text{variance between} = \frac{\sum_{i \in \{A,B,C\}} (\bar{x}_i - \bar{x}_{ABC})^2}{3 - 1} \cdot 4 = \frac{(727.5 - 583.25)^2 + (514.25 - 583.25)^2 + (508 - 583.25)^2}{2} \cdot 4 = 62463.25$$

$$\text{variance within} = \frac{\text{var}_A + \text{var}_B + \text{var}_C}{3} = \frac{891.6667 + 819.3333 + 305.5833}{3} = 672.1943$$

$$F = \frac{\text{variance between}}{\text{variance within}} = \frac{62463.25}{672.1943} \approx 92.92$$
One-way ANOVA in R:
anova(lm(YIELD~VARIETY))
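The hand calculation can also be checked in R from the summary statistics alone. This is a minimal sketch that assumes only the group means, group variances and group size quoted in the worked example; the variable names are illustrative and not taken from the slides.

```r
# Summary statistics quoted in the worked example (groups A, B, C)
group_means <- c(A = 727.5, B = 514.25, C = 508)
group_vars  <- c(A = 891.6667, B = 819.3333, C = 305.5833)
n <- 4                       # observations per group
k <- length(group_means)     # number of groups

# Grand mean; averaging the group means is valid because every group has the same n
grand_mean <- mean(group_means)

# Signal: spread of the group means around the grand mean
var_between <- sum((group_means - grand_mean)^2) / (k - 1) * n

# Noise: average of the within-group variances
var_within <- sum(group_vars) / k

F_value <- var_between / var_within
print(c(between = var_between, within = var_within, F = F_value))
```

With the raw data available, `anova(lm(YIELD~VARIETY))` reports the same F-value directly, so the sketch is mainly useful for seeing where that number comes from.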
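F-Distribution (a family of distributions; the shape depends on the degrees of freedom)
[Figure: F-distribution curve with "probability of observation" on the y-axis; regions labelled signal > noise and signal < noise; cumulative probabilities 0, 0.50 and 0.95 marked, with $\alpha = 0.05$ in the tail.]
P-value (percentiles, probabilities):
In R: pf(F, df1, df2) returns the cumulative probability (percentile) of an F-value; present 1 minus this value as the p-value.
In R: qf(p, df1, df2) returns the F-value at cumulative probability p.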
The larger the F-value, the further it lies into the tail and the smaller the probability of obtaining an F-value that large by chance alone if the group means were truly equal. A large F-value is therefore strong evidence that something is causing a significant difference between the groups.
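A short sketch of how `pf()` and `qf()` relate to the tail of the F-distribution. It assumes the F-value of about 92.92 from the worked ANOVA example with 3 groups of 4 observations, so the degrees of freedom are k - 1 = 2 and k(n - 1) = 9; the variable names are illustrative.

```r
# Values from the worked ANOVA example: k = 3 groups, n = 4 observations each
F_value <- 92.92
df1 <- 3 - 1          # between-group degrees of freedom: k - 1
df2 <- 3 * (4 - 1)    # within-group degrees of freedom: k * (n - 1)

# pf() gives the cumulative probability up to F, so the tail area is 1 - pf()
p_value <- 1 - pf(F_value, df1, df2)

# qf() goes the other way: the F-value that cuts off the upper 5% tail
critical_F <- qf(0.95, df1, df2)

print(p_value)
print(critical_F)
print(F_value > critical_F)   # TRUE means the F-value lies beyond the 5% cut-off
```

Comparing the F-value with `critical_F` and comparing `p_value` with 0.05 are two views of the same decision.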
Using DISCRIM to predict which group
Problem: A new skull is found, but we don't know whether it belongs to Homo erectus, to Homo habilis, or to a new group.
[Figure: skull measurements for Homo erectus and Homo habilis, with each group centroid and the new find (unknown origin) marked.]
How predictions work:
1. Calculate the centroid of each group
2. Find which centroid is closest to the unknown data point
New groups are defined when we find a significant difference between the new find and the predefined groups
Popular method in taxonomy and anthropology
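The two prediction steps can be sketched in a few lines of R. The skull measurements below are made-up numbers for illustration only, and plain Euclidean distance is used as a simplifying assumption; a full discriminant analysis would also account for the spread and correlation of the measurements.

```r
# Hypothetical measurements (made-up numbers, for illustration only)
skulls <- data.frame(
  group  = rep(c("erectus", "habilis"), each = 4),
  length = c(190, 195, 188, 192, 150, 155, 148, 152),
  width  = c(140, 145, 138, 142, 120, 118, 122, 121)
)

# 1. Calculate the centroid (mean of each measurement) for every group
centroids <- aggregate(cbind(length, width) ~ group, data = skulls, FUN = mean)

# 2. Euclidean distance from the new find to each centroid
new_find <- c(length = 185, width = 139)
dists <- apply(centroids[, c("length", "width")], 1,
               function(ctr) sqrt(sum((ctr - new_find)^2)))
names(dists) <- centroids$group

# The predicted group is the one with the nearest centroid
closest <- names(which.min(dists))
print(dists)
print(closest)
```

This sketch always assigns the find to one of the existing groups. Deciding that it belongs to a new group needs the significance test described above.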
Multivariate Analysis of Variance (MANOVA)
Is there a significant difference among groups based on multiple response variables? (i.e. ANOVA with multiple response variables)
MANOVA in R: output=manova(responseMatrix~predictorMatrix) (stats package)
When we calculate the centroid of a group, we build a probability distribution around the centroid for comparison.
You can then run repeated t-tests (with adjusted p-values for multiple comparisons) to compare the new data to the groups, but MANOVA does it all for you in one shot!
Another lab on MANOVA for reference: Laura's website, RENR 480, Lab 22
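A minimal sketch of the `manova()` call. It uses R's built-in `iris` data as a stand-in (an assumption, not the course data), with two flower measurements as the response variables and `Species` as the grouping variable.

```r
# iris ships with R; Species is the grouping variable
# cbind() binds the response variables into a matrix on the left-hand side
fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
print(fit)
```

The left-hand side plays the role of `responseMatrix` and `Species` the role of the predictor in the call shown on the slide.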
Assumptions of (MANOVA)
MANOVA is VERY sensitive to invalid assumptions and outliers
Within groups we need to have:
1. Normality: residuals have to be normally distributed
2. Homogeneity of variances: residuals need to have equal variances
The assumptions must be met in the univariate context before they can be met for multivariate analyses.
You therefore first have to check each individual measurement (response variable) for normality and homogeneity, e.g. by making boxplots or plotting ANOVA residuals for each variable.
[Figure: frequency distributions (y-axis: frequency) with the mean, median and mode marked, each also represented as a boxplot. Panels: left skewed (negatively skewed); normal (perfectly symmetric); right skewed (positively skewed); bi-modal (two different modes, not necessarily symmetric).]
Assumptions of (MANOVA)
Generate boxplots for each response variable and assess shape & whiskers
Boxplots in R (multiple plots): boxplot(ResponseVariable~Group)
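One way to produce the boxplots for every response variable at once is sketched below. It assumes the built-in `iris` data as a stand-in, with `Species` as the group; the 2 x 2 layout is just a convenience for four variables.

```r
# Response variables to check, one boxplot panel each (iris is a stand-in dataset)
responses <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

op <- par(mfrow = c(2, 2))   # 2 x 2 grid of plots
box_stats <- lapply(responses, function(v) {
  # boxplot() draws the panel and also returns the plotted statistics
  boxplot(iris[[v]] ~ iris$Species, main = v, xlab = "Species", ylab = v)
})
par(op)                      # restore the previous plotting layout
names(box_stats) <- responses

# Points drawn beyond the whiskers (candidate outliers) for one variable
print(box_stats$Sepal.Width$out)
```

Boxes of similar height across groups suggest equal variances, and a median near the middle of each box with whiskers of similar length suggests a symmetric distribution.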
Testing for Normality & Equal Variances: Residual Plots
Residual plots in R (multiple plots): plot(lm(ResponseVariable~Group)) (2nd plot)
[Figure: four residual-plot panels, each with predicted values (centred on 0) on the y-axis and observed values (original units) on the x-axis. Each group of bullets below describes one panel.]

• NORMAL distribution: equal number of points along observed
• EQUAL variances: equal spread on either side of the mean (predicted value = 0)
• Good to go!

• NON-NORMAL distribution: unequal number of points along observed
• EQUAL variances: equal spread on either side of the mean (predicted value = 0)
• Optional to fix

• NORMAL/NON-NORMAL: look at histogram or test
• UNEQUAL variances: cone shape, away from or towards zero
• This needs to be fixed for MANOVA (transformations)

• OUTLIERS: points that deviate from the majority of data points
• This needs to be fixed for MANOVA (transformations or removal)
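A sketch of drawing the diagnostic plots for a single response variable, again assuming the built-in `iris` data as a stand-in. `plot()` on an `lm` object offers several diagnostic plots; the `which` argument selects them, and the two shown here are the residuals-versus-fitted plot and the normal Q-Q plot.

```r
# One-way ANOVA model for a single response variable (iris is a stand-in dataset)
fit <- lm(Sepal.Length ~ Species, data = iris)

op <- par(mfrow = c(1, 2))
plot(fit, which = 1)  # residuals vs fitted values: look for equal spread around 0
plot(fit, which = 2)  # normal Q-Q plot: points close to the line suggest normal residuals
par(op)

# The residuals themselves, e.g. for a histogram or a formal test
res <- residuals(fit)
hist(res, main = "Residuals", xlab = "Residual")
```

Repeating this for each response variable covers the univariate checks that have to pass before running MANOVA.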
Assumptions of (MANOVA)
If you violate the assumptions of MANOVA:
1. Transform your data (follow examples we will discuss on the board)
2. Use non-parametric options (e.g. perMANOVA Lab 6)
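As an illustration of the first option, the sketch below shows how a log transformation can equalise variances when the spread grows with the group mean. The choice of a log transformation and the numbers are assumptions made for illustration; the transformations discussed in class may differ.

```r
# Made-up values in which the spread grows with the group mean
group_a <- c(1, 2, 4, 8)
group_b <- c(10, 20, 40, 80)   # same pattern, ten times larger

# Ratio of the group variances before and after a log transformation
raw_ratio <- var(group_b) / var(group_a)
log_ratio <- var(log(group_b)) / var(log(group_a))

print(raw_ratio)   # far from 1: unequal variances
print(log_ratio)   # close to 1: variances equalised
```

After any transformation, the boxplots and residual plots should be checked again on the transformed values.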
Multivariate Analysis of Variance (MANOVA) - output
You can see if there is a significant difference among groups across all response variables together using the Wilks' MANOVA test statistic
Or you can see if there is a significant difference among groups for each response variable separately
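Both views of the output can be requested from a fitted `manova` object. The sketch assumes the built-in `iris` data as a stand-in, with the four measurements as response variables and `Species` as the group.

```r
# Fit the MANOVA (iris is a stand-in dataset; columns 1 to 4 are the responses)
responses <- as.matrix(iris[, 1:4])
fit <- manova(responses ~ Species, data = iris)

# Overall test across all response variables, using Wilks' statistic
overall <- summary(fit, test = "Wilks")
print(overall)

# Separate univariate ANOVA table for each response variable
per_variable <- summary.aov(fit)
print(per_variable)

# The overall p-value can be pulled out of the summary table
wilks_p <- overall$stats["Species", "Pr(>F)"]
print(wilks_p)
```

`summary()` on a `manova` object also accepts other test statistics through the `test` argument; Wilks is requested explicitly here because it is the one named on the slide.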
P-value: the probability of observing a difference between groups this large or larger through random chance alone, i.e. if there were no real difference between the groups. Thus, a small p-value suggests that something is having an effect on the groups and causing the difference.