Meta-analysis
Sessions 1.2-1.3: Effect Size Calculation
Funded through the ESRC's Researcher Development Initiative
Department of Education, University of Oxford
-
The effect size makes meta-analysis possible
* It is based on the dependent variable (i.e., the outcome)
* It standardizes findings across studies such that they can be directly compared
* Any standardized index can be an effect size (e.g., standardized mean difference, correlation coefficient, odds-ratio), but it must
  * be comparable across studies (standardization)
  * represent magnitude & direction of the relationship
  * be independent of sample size
* Different studies in the same meta-analysis can be based on different statistics, but each has to be transformed to a standardized effect size that is comparable across studies
-
Sample size, significance and d effect size
Levene
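Definition
The Levene test is defined as:
H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
Ha: $\sigma_i^2 \neq \sigma_j^2$ for at least one pair (i,j).
Test Statistic: Given a variable Y with sample of size N divided into k subgroups, where $N_i$ is the sample size of the ith subgroup, the Levene test statistic is defined as:
$$W = \frac{(N-k)}{(k-1)} \cdot \frac{\sum_{i=1}^{k} N_i \left(\bar{Z}_{i.} - \bar{Z}_{..}\right)^2}{\sum_{i=1}^{k}\sum_{j=1}^{N_i} \left(Z_{ij} - \bar{Z}_{i.}\right)^2}$$
where $Z_{ij}$ can have one of the following three definitions:
1. $Z_{ij} = |Y_{ij} - \bar{Y}_{i.}|$, where $\bar{Y}_{i.}$ is the mean of the ith subgroup.
2. $Z_{ij} = |Y_{ij} - \tilde{Y}_{i.}|$, where $\tilde{Y}_{i.}$ is the median of the ith subgroup.
3. $Z_{ij} = |Y_{ij} - \bar{Y}'_{i.}|$, where $\bar{Y}'_{i.}$ is the 10% trimmed mean of the ith subgroup.
$\bar{Z}_{i.}$ are the group means of the $Z_{ij}$ and $\bar{Z}_{..}$ is the overall mean of the $Z_{ij}$.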
The three choices for defining Zij determine the robustness and power of Levene's test. By robustness, we mean the ability of the test to not falsely detect unequal variances when the underlying data are not normally distributed and the variances are in fact equal.
Levene's original paper only proposed using the mean. Brown and Forsythe (1974) extended Levene's test to use either the median or the trimmed mean in addition to the mean. Their Monte Carlo studies compared these choices under different underlying distributions, including a skewed one. Using the mean provided the best power for symmetric, moderate-tailed distributions.
Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good robustness against many types of non-normal data while retaining good power.
Significance Level: $\alpha$
Critical Region: The Levene test rejects the hypothesis that the variances are equal if
$$W > F_{\alpha,\,k-1,\,N-k}$$
where $F_{\alpha,\,k-1,\,N-k}$ is the upper critical value of the F distribution with k - 1 and N - k degrees of freedom at a significance level of $\alpha$.
In the above formulas for the critical regions, the Handbook follows the convention that $F_{\alpha}$ is the upper critical value from the F distribution and $F_{1-\alpha}$ is the lower critical value. Note that this is the opposite of some texts and software programs. In particular, Dataplot uses the opposite convention.
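As a rough illustration of the test statistic, the sketch below computes Levene's W as (N - k)/(k - 1) times the between-group sum of squares of the absolute deviations Zij divided by their within-group sum of squares. The function name `levene_w` and the toy data are hypothetical, and the median is used as the default center because the text recommends it. Comparing W with an F critical value is left out, since that needs an F-distribution routine.

```python
from statistics import mean, median

def levene_w(groups, center=median):
    """Levene's W for a list of groups (each a list of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij: absolute deviation of each observation from its group's center
    z = [[abs(y - center(g)) for y in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]               # group means of Z
    z_bar = sum(sum(zi) for zi in z) / n_total     # overall mean of Z
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical toy data: the second group is twice as spread out as the first
groups = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]
w = levene_w(groups)
print(round(w, 4))
```

Passing `center=mean` gives Levene's original mean-based variant instead of the median-based one.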
Sample Output
Dataplot generated the following output for Levene's test using the GEAR.DAT data set (by default, Dataplot performs the form of the test based on the median):
LEVENE F-TEST FOR SHIFT IN VARIATION
(CASE: TEST BASED ON MEDIANS)
1. STATISTICS
NUMBER OF OBSERVATIONS = 100
NUMBER OF GROUPS = 10
LEVENE F TEST STATISTIC = 1.705910
2. FOR LEVENE TEST STATISTIC
0 % POINT = 0.
50 % POINT = 0.9339308
75 % POINT = 1.296365
90 % POINT = 1.702053
95 % POINT = 1.985595
99 % POINT = 2.610880
99.9 % POINT = 3.478882
90.09152 % Point: 1.705910
3. CONCLUSION (AT THE 5% LEVEL):
THERE IS NO SHIFT IN VARIATION.
THUS: HOMOGENEOUS WITH RESPECT TO VARIATION.
Interpretation of Sample Output
We are testing the hypothesis that the group variances are equal. The output is divided into three sections.
1. The first section prints the number of observations (N), the number of groups (k), and the value of the Levene test statistic.
2. The second section prints the upper critical value of the F distribution corresponding to various significance levels. The value in the first column, the confidence level of the test, is equivalent to 100(1-α). We reject the null hypothesis at that significance level if the value of the Levene F test statistic printed in section one is greater than the critical value printed in the last column.
3. The third section prints the conclusion for a 95% test. For a different significance level, the appropriate conclusion can be drawn from the table printed in section two. For example, for α = 0.10, we look at the row for 90% confidence and compare the critical value 1.702 to the Levene test statistic 1.7059. Since the test statistic is greater than the critical value, we reject the null hypothesis at the α = 0.10 level.
Output from other statistical software may look somewhat different from the above output.
Question
Levene's test can be used to answer the following question:
Is the assumption of equal variances valid?
Related Techniques
Standard Deviation Plot
Box Plot
Bartlett Test
Chi-Square Test
Analysis of Variance
Software
The Levene test is available in some general purpose statistical software programs, including Dataplot.
mean_effect size
T-test with unpooled variances and effect sizes (with pooled variances)

Quantity | Samples A, B | Samples C, D
M (first, second sample) | 100, 105 | 100, 105
SD (first, second sample) | 15, 15 | 15, 15
N (first, second sample) | 5, 5 | 10, 10
M1 - M2 | -5.0000 | -5.0000
(s1)^2 / n1 | 45.0000 | 22.5000
(s2)^2 / n2 | 45.0000 | 22.5000
SQRT((s1)^2 / n1 + (s2)^2 / n2) | 9.4868 | 6.7082
T | -0.5270 | -0.7454
abs(T) | 0.5270 | 0.7454
one or two tailed | 2 | 2
DF (n1 + n2 - 2) | 8 | 18
sign | 0.6125 | 0.4657
d | 0.3333 | 0.3333

Quantity | Samples E, F | Samples G, H
M (first, second sample) | 5.5, 4 | 5.5, 4
SD (first, second sample) | 0.5, 0.5 | 0.5, 0.5
N (first, second sample) | 200, 200 | 20, 20
M1 - M2 | 1.5000 | 1.5000
(s1)^2 / n1 | 0.0013 | 0.0125
(s2)^2 / n2 | 0.0013 | 0.0125
SQRT((s1)^2 / n1 + (s2)^2 / n2) | 0.0500 | 0.1581
T | 30.0000 | 9.4868
abs(T) | 30.0000 | 9.4868
one or two tailed | 2 | 2
DF (n1 + n2 - 2) | 398 | 38
sign | 0.0000 | 0.0000
d | 3.0000 | 3.0000

Cohen's d (pooled variances):
$$d = \frac{m_1 - m_2}{\sqrt{\dfrac{(n_1 - 1)\,sd_1^2 + (n_2 - 1)\,sd_2^2}{n_1 + n_2 - 2}}}$$
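The spreadsheet's point, that t changes with sample size while d does not, can be checked with a short sketch. The function names are hypothetical; the formulas are the unpooled standard error and the pooled-variance d given in the sheet. Note that the code keeps the sign of m1 - m2, whereas the sheet reports d as a positive value.

```python
from math import sqrt

def t_unpooled(m1, s1, n1, m2, s2, n2):
    """t statistic using the unpooled standard error sqrt(s1^2/n1 + s2^2/n2)."""
    return (m1 - m2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var)

# Same means and SDs as samples A-D; only the group size changes
for n in (5, 10):
    t = t_unpooled(100, 15, n, 105, 15, n)
    d = cohens_d(100, 15, n, 105, 15, n)
    print(f"n per group = {n}: t = {t:.4f}, d = {d:.4f}")
```

Doubling the group size moves t further from zero, but d stays the same because it depends only on the means and standard deviations.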
T-test and effect sizes with pooled variances
* systematically change the mean-level differences, maintaining the variances and sample sizes
* systematically change the sample sizes, maintaining the variances and mean-level differences
* systematically change the variances, maintaining the mean-level differences and sample sizes
* "unbalance" the test, distort sample sizes and / or variances of the groups
* figure out some "rule of thumb" for your observations
chi2
This spreadsheet calculates the chi-square value for a 2 x 2 contingency table and transforms it into an effect size and a phi coefficient

Observed frequencies
Variable B | Variable A: no | Variable A: yes | Total
no | 265 | 204 | 469
yes | 32 | 75 | 107
Total | 297 | 279 | 576

Expected frequencies
Variable B | Variable A: no | Variable A: yes
no | 241.83 | 227.17
yes | 55.17 | 51.83

(Observed minus expected frequencies)^2
Variable B | Variable A: no | Variable A: yes
no | 536.94 | 536.94
yes | 536.94 | 536.94

(Observed minus expected frequencies)^2 / expected frequencies
Variable B | Variable A: no | Variable A: yes
no | 2.22 | 2.36
yes | 9.73 | 10.36

CHISQ = 24.6758732833
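The chi-square computation in this sheet can be sketched in a few lines. The function name is hypothetical, and the phi coefficient is taken as the square root of chi-square divided by the total N, which is the usual definition for a 2 x 2 table; the spreadsheet's own cell formulas are not visible in the transcript.

```python
from math import sqrt

def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table given as [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / N
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

# Observed frequencies from the sheet (rows: Variable B no/yes)
observed = [[265, 204], [32, 75]]
chi2, n = chi_square_2x2(observed)
phi = sqrt(chi2 / n)
print(round(chi2, 2), round(phi, 3))
```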
-
Sample size, significance and d effect size
ESRC RDI One Day Meta-analysis workshop (Marsh, O'Mara, Malmberg)
-
Simulate ds on homemade calculator (ES.xls)
* Change direction of effects
* Change Ns (equal or same?)
* Change SDs
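T-test and effect sizes

Quantity | First comparison | Second comparison (direction reversed)
M (Treatment, Control) | 105, 100 | 100, 105
Mean difference | 5.0000 | -5.0000
SD (Treatment, Control) | 15, 15 | 15, 15
pooled SD | 15 | 15
N (Treatment, Control) | 15, 15 | 15, 15
(s1)^2 / n1 | 15.0000 | 15.0000
(s2)^2 / n2 | 15.0000 | 15.0000
SQRT((s1)^2 / n1 + (s2)^2 / n2) | 5.4772 | 5.4772
T | 0.91 | -0.91
abs(T) | 0.91 | 0.91
DF | 28 | 28
sign | 0.3691 | 0.3691
d | 0.33 | -0.33
one or two tailed | 2 | 2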
-
Effect size as proportion in the Treatment group doing better than the average Control group person
* 57% of T above
* 69% of T above
* 79% of T above
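One common way to get such percentages is to evaluate the standard normal cumulative distribution at d, assuming both groups are normally distributed with equal variances. The slide does not say which d values it used; the values 0.2, 0.5 and 0.8 below are an assumption that gives roughly the slide's figures, and the function name is hypothetical.

```python
from statistics import NormalDist

def proportion_above_control_mean(d):
    """Share of the treatment group scoring above the control-group mean,
    assuming normal distributions with equal variances."""
    return NormalDist().cdf(d)

# Assumed d values (not stated on the slide)
for d in (0.2, 0.5, 0.8):
    share = proportion_above_control_mean(d)
    print(f"d = {d}: {100 * share:.1f}% of T above the average C person")
```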
-
Effect size as proportion of success in the Treatment versus Control group (Binomial Effect Size Display = BESD):
* Success: 55% of T, 45% of C
* Success: 62% of T, 38% of C
* Success: 68% of T, 32% of C
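In the BESD, a correlation r is displayed as a treatment success rate of 0.50 + r/2 against a control success rate of 0.50 - r/2. The sketch below applies that rule; the r values are assumptions chosen so that the output matches the percentages on the slide, which does not state them, and the function name is hypothetical.

```python
def besd(r):
    """Binomial Effect Size Display: (treatment, control) success rates for a correlation r."""
    return 0.5 + r / 2, 0.5 - r / 2

# Assumed correlations (not stated on the slide)
for r in (0.10, 0.24, 0.36):
    treatment, control = besd(r)
    print(f"r = {r:.2f}: success {treatment:.0%} of T, {control:.0%} of C")
```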
-
Why effect size?
Long focus on significance level (safeguarding against Type I (α) error); today the focus is on practical and meaningful significance.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
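interrater

Study | IV | Rater 1 | Rater 2
Study 1 | IV1 | 3 | 4
Study 1 | IV2 | 2 | 3
Study 1 | IV3 | 3 | 3
Study 1 | IV4 | 4 | 4
Study 2 | IV1 | 2 | 2
Study 2 | IV2 | 3 | 3
Study 2 | IV3 | 4 | 4
Study 2 | IV4 | 4 | 3
. | | |
Study k | IV1 | 3 | 3
Study k | IV2 | 2 | 2
Study k | IV3 | 1 | 2
Study k | IV4 | 3 | 3

IV1: Rater 1 (columns) by Rater 2 (rows)
Rater 2 | Cat1 | Cat2 | Cat3 | Cat4
Cat1 | 4 | 0 | 1 | 0
Cat2 | 1 | 5 | 0 | 0
Cat3 | 0 | 0 | 3 | 0
Cat4 | 0 | 0 | 1 | 5

Study | IV | Rater 1 | Rater 2 | . | Rater n
Study 1 | IV1 | 3 | 4 | | 4
Study 1 | IV2 | 2 | 3 | | 2
Study 1 | IV3 | 3 | 3 | | 3
Study 1 | IV4 | 4 | 4 | | 4
Study 2 | IV1 | 2 | 2 | | 2
Study 2 | IV2 | 3 | 3 | | 4
Study 2 | IV3 | 4 | 4 | | 4
Study 2 | IV4 | 4 | 3 | | 3
. | | | | |
Study k | IV1 | 3 | 3 | | 2
Study k | IV2 | 2 | 2 | | 1
Study k | IV3 | 1 | 2 | | 1
Study k | IV4 | 3 | 3 | | 3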
H0 H1
Study | Real world: H0 True | Real world: H1 True
Accept H0 | ok |
Accept H1 | | ok
-
A short history of the effect size (Huberty, 2002; see also Olejnik & Algina, 2000 for review of effect sizes)
-
Power and effect size
* Power: finding what is out there
* Type II (β) error: not finding what is out there
* Power (1 - β): the probability of rejecting a false H0 hypothesis
* Power of .80 or .90 in primary research
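To make the link between power, effect size and sample size concrete, here is a minimal sketch using the normal approximation n = 2(z_{1-α/2} + z_{power})² / d² per group for a two-tailed, two-group comparison. This is an approximation chosen for illustration, not necessarily the method behind the workshop's figures; exact t-based calculations give slightly larger numbers. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group for a two-tailed, two-sample comparison
    (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile corresponding to the target power
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

for d in (0.20, 0.25, 0.30):
    print(d, [n_per_group(d, power) for power in (0.50, 0.60, 0.70, 0.80, 0.90)])
```

Smaller sought effects and higher power both drive the required sample size up quickly, since n grows with 1/d².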
-
Power, sought effect size, at significance level α = .05 in primary research (prior to conducting study)

[Chart: Sample size for three effect sizes, α = .05. x-axis: power; y-axis: N needed per sample; series: effect .20, effect .25, effect .30]

wedn 22.09.2005
N needed per sample
Effect | Power 0.50 | Power 0.60 | Power 0.70 | Power 0.80 | Power 0.90
effect .20 | 194 | 246 | 310 | 394 | 527
effect .25 | 128 | 162 | 204 | 259 | 347
effect .30 | 87 | 110 | 139 | 176 | 235

Effect size = .33
Sample for each group
Power | 90% (α = .10) | 95% (α = .05) | 99%
0.70 | 86 | 113 | 175
0.75 | 98 | 126 | 192
0.80 | 112 | 143 | 212
0.85 | 131 | 163 | 237
0.90 | 155 | 191 | 270
0.95 | 196 | 235 | 323
-
How meaningful is a small effect size?
A small effect size changed the course of an RCT in 1987: placebo group participants were given aspirin instead (see Rosenthal, 1994, p. 242)
chi2
This spreadsheet calculates the chi-square value for a 2 x 2 contingency table and transforms it into an effect size and a phi coefficient

Observed frequencies
Variable B | Variable A: no | Variable A: yes | Total
no | 104 | 10933 | 11037
yes | 189 | 10845 | 11034
Total | 293 | 21778 | 22071

Expected frequencies
Variable B | Variable A: no | Variable A: yes
no | 146.52 | 10890.48
yes | 146.48 | 10887.52

(Observed minus expected frequencies)^2
Variable B | Variable A: no | Variable A: yes
no | 1807.94 | 1807.94
yes | 1807.94 | 1807.94

(Observed minus expected frequencies)^2 / expected frequencies
Variable B | Variable A: no | Variable A: yes
no | 12.34 | 0.17
yes | 12.34 | 0.17

CHISQ = 25.01
-
2 June 2008
-
Within one meta-analysis, you can include studies based on any combination of statistical analyses (e.g., t-tests, ANOVA, correlation, odds-ratio, chi-square, etc.).
The art of meta-analysis is how to compute effect sizes based on non-standard designs and studies that do not supply complete data (see Lipsey&Wilson_AppB.pdf).
Convert all effect sizes into a common metric based on the natural metric given research in the area, e.g., d, r, OR.
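As an illustration of moving between common metrics, the sketch below converts between d and r with the widely used formulas r = d / sqrt(d² + 4) and d = 2r / sqrt(1 - r²). These assume two groups of equal size; the function names are hypothetical, and other conversions may be preferable for unbalanced designs.

```python
from math import sqrt

def d_to_r(d):
    """Convert a standardized mean difference to a correlation (equal group sizes assumed)."""
    return d / sqrt(d ** 2 + 4)

def r_to_d(r):
    """Convert a correlation to a standardized mean difference (equal group sizes assumed)."""
    return 2 * r / sqrt(1 - r ** 2)

for d in (0.2, 0.5, 0.8):
    r = d_to_r(d)
    print(f"d = {d:.2f} -> r = {r:.3f} -> back to d = {r_to_d(r):.2f}")
```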
-
Standardized mean difference
* Group contrast research
  * Treatment groups
  * Naturally occurring groups
* Inherently continuous construct
Correlation coefficient
* Association between inherently continuous constructs
Odds-ratio
* Group contrast research
  * Treatment or naturally occurring groups
* Inherently dichotomous construct
Regression coefficients and other multivariate effects
* Requires access to covariance-variance (correlation) matrices for each included study
-
Calculating ds (1)
* Means and standard deviations
* Correlations
* P-values
* F-statistics
* t-statistics
* other test statistics
Almost all test statistics can be transformed into a standardized effect size d
-
Calculating ds (1)
* Represents a standardized group contrast on an inherently continuous measure
* Uses the pooled standard deviation
* Commonly called d
-
Various contrast effect sizes
* Cohen's d
* Hedges' g
* Glass's Δ
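The slide's formulas did not survive in the transcript, so the sketch below uses common textbook definitions as an assumption: a d standardized by the pooled standard deviation, and Glass's Δ standardized by the control group's standard deviation only. Naming conventions for d and g vary between authors, and the input numbers are hypothetical.

```python
from math import sqrt

def d_pooled(m_t, sd_t, n_t, m_c, sd_c, n_c):
    """Mean difference divided by the pooled standard deviation."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (m_t - m_c) / sqrt(pooled_var)

def glass_delta(m_t, m_c, sd_c):
    """Mean difference divided by the control group's standard deviation."""
    return (m_t - m_c) / sd_c

# Hypothetical study in which the treatment group is more variable than the control group
print(round(d_pooled(25, 6, 25, 20, 4, 30), 3))
print(round(glass_delta(25, 20, 4), 3))
```

When the two standard deviations differ, the choice of standardizer changes the effect size noticeably, which is why it should be coded consistently across studies.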
-
Calculating d (1) using Ms, SDs and ns
Remember to code treatment effect in positive direction!

T-test and effect sizes
Quantity | Treatment, Control | sample C, sample D
M (first, second group) | 25, 20 | 20, 25
Mean difference | 5.0000 | -5.0000
SD (first, second group) | 5, 5 | 5, 5
pooled SD | 5 | 5
N (first, second group) | 25, 30 | 25, 30
(s1)^2 / n1 | 1.0000 | 1.0000
(s2)^2 / n2 | 0.8333 | 0.8333
SQRT((s1)^2 / n1 + (s2)^2 / n2) | 1.3540 | 1.3540
T | 3.6927 | -3.6927
abs(T) | 3.6927 | 3.6927
DF | 53 | 53
sign | 0.0005 | 0.0005
d | 1.0000 | 1.0000
one or two tailed | 2 | 2
-
ES_calculator.xls
-
Calculating d (2) using ES calculator, using Ms, ns, and t-value
-
Calculating d (3) using ES calculator, using ns, and t-value
The treatment group scored higher than the control group at Time 2 (t[28]= 4.11; p
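When only the t-value and group sizes are reported, d for an independent-groups comparison can be recovered as d = t * sqrt((n1 + n2) / (n1 * n2)). The sketch below applies this to the reported t[28] = 4.11, assuming two groups of 15; equal group sizes are consistent with df = 28 but are not stated on the slide. The function name is hypothetical and the ES calculator's internals are not shown in the transcript.

```python
from math import sqrt

def d_from_t(t, n1, n2):
    """Standardized mean difference from an independent-samples t-value."""
    return t * sqrt((n1 + n2) / (n1 * n2))

# Assumption: n1 = n2 = 15, which gives df = n1 + n2 - 2 = 28
d = d_from_t(4.11, 15, 15)
print(round(d, 2))
```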
-
Calculating d (3) correcting for small sample bias
* Hedges proposed a correction for small sample size bias (ns < 20)
* Must be applied before analysis
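The correction formula itself is missing from the transcript. A commonly cited form, used here as an assumption, multiplies d by 1 - 3 / (4N - 9), where N is the total sample size. The function name and the example values (d = 1.50 from two groups of 15) are for illustration.

```python
def hedges_correction(d, n1, n2):
    """Small-sample bias correction: d * (1 - 3 / (4N - 9)) with N = n1 + n2."""
    n_total = n1 + n2
    return d * (1 - 3 / (4 * n_total - 9))

# Illustrative values: an uncorrected d of 1.50 from two groups of 15
print(round(hedges_correction(1.50, 15, 15), 2))
```

The correction shrinks d slightly, and the shrinkage becomes negligible as N grows.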
-
Calculating d (4) using ES calculator, using ns, and F-value
Remember: in a two-group ANOVA F = t²
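Because F = t² in a two-group ANOVA, d can be recovered from F as sqrt(F * (n1 + n2) / (n1 * n2)). F carries no sign, so the direction of the effect has to be supplied from the study report. This is a sketch with a hypothetical function name and illustrative numbers.

```python
from math import sqrt

def d_from_f(f, n1, n2, treatment_higher=True):
    """d from a two-group F-value; the sign must come from the reported group means."""
    d = sqrt(f * (n1 + n2) / (n1 * n2))
    return d if treatment_higher else -d

# Illustrative check: F = t^2 reproduces the d obtained from t = 4.11 with 15 per group
t = 4.11
print(round(d_from_f(t ** 2, 15, 15), 4), round(t * sqrt(30 / 225), 4))
```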
-
Calculating d (5) using ES calculator, using p-value
The mean-level comparison was not significant (p = .53)
-
T-test table
df = (n1 + n2 - 2)
Sometimes authors only report e.g., p
-
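Example dataset so far (1) (ES_enter.sav):
enter_w
study | es | Treat | Cntr | n | Groups | se | (n1+n2)/(n1*n2) | d^2 / (2(n1+n2)) | se | w | wes
1 | 1.00 | 25 | 30 | 55 | 2 | 0.2871 | 0.0733 | 0.0091 | 0.2871 | 12.13 | 12.13
2 | 0.80 | 40 | 40 | 80 | 2 | 0.2324 | 0.0500 | 0.0040 | 0.2324 | 18.52 | 14.81
3 | 1.46 | 15 | 15 | 30 | 2 | 0.4109 | 0.1333 | 0.0355 | 0.4109 | 5.92 | 8.65
4 | 0.40 | 20 | 20 | 40 | 2 | 0.3194 | 0.1000 | 0.0020 | 0.3194 | 9.80 | 3.92
5 | 0.10 | 80 | 75 | 155 | 2 | 0.1608 | 0.0258 | 0.0000 | 0.1608 | 38.66 | 3.87
6 | 0.10 | 60 | 60 | 120 | 2 | 0.1827 | 0.0333 | 0.0000 | 0.1827 | 29.96 | 3.00
7 | 0.17 | 45 | 35 | 80 | 2 | 0.2258 | 0.0508 | 0.0002 | 0.2258 | 19.62 | 3.34
8 | 0.43 | 45 | 40 | 85 | 2 | 0.2198 | 0.0472 | 0.0011 | 0.2198 | 20.70 | 8.90
9 | 0.40 | 100 | 120 | 220 | 2 | 0.1367 | 0.0183 | 0.0004 | 0.1367 | 53.48 | 21.39
10 | 0.60 | 200 | 160 | 360 | 2 | 0.1084 | 0.0113 | 0.0005 | 0.1084 | 85.11 | 51.06
11 | 0.05 | 55 | 65 | 120 | 2 | 0.1832 | 0.0336 | 0.0000 | 0.1832 | 29.78 | 1.49
12 | 0.10 | 70 | 80 | 150 | 2 | 0.1638 | 0.0268 | 0.0000 | 0.1638 | 37.29 | 3.73
13 | 0.70 | 80 | 0 | 80 | 1
14 | 1.20 | 90 | 0 | 90 | 1
15 | 0.80 | 20 | 0 | 20 | 1
Sums: w = 360.98, wes = 136.29
average es: 0.38
se of mean es: 0.05
C.I. Lower: 0.27
C.I. Upper: 0.48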
-
Use all available tools for calculating the following 5 effect sizes
ES 6: MT = 21, MC = 20, nT = 60, nC = 60, t = .55
ES 7: MT = 103.5, MC = 100, SDT = 22.0, SDC = 18.5, nT = 45, nC = 35
ES 8: nT = 45, nC = 40, p
-
Example dataset so far (2) (ES_enter.sav):
-
Calculating d (11) using ES calculator, using number of successful outcomes per group

Estimates of Covariance Parameters (a)
Parameter | Estimate | Std. Error
Residual | 0.2222222222 | 0.0641500299
Intercept [subject = study] Variance | 1.5446127946 | 0.6905415581 | 0.1257741782 | 0.874
a. Dependent Variable: rating.

Group | Success | Failure | Total
Treatment | 28 | 28 | 56
Control | 31 | 34 | 65
Total | 59 | 62 | 121
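One way to turn success counts into d is the logit method: compute the odds ratio and multiply its natural log by sqrt(3)/π. Whether the ES calculator uses this method or another (e.g., probit or arcsine) is not shown in the transcript, so treat this as an assumption; the function name is hypothetical. The counts are those from the Success/Failure table.

```python
from math import log, pi, sqrt

def d_from_counts(success_t, failure_t, success_c, failure_c):
    """Logit-method d from a 2 x 2 table of successes and failures."""
    odds_ratio = (success_t * failure_c) / (failure_t * success_c)
    return log(odds_ratio) * sqrt(3) / pi

# Treatment: 28 successes, 28 failures; Control: 31 successes, 34 failures
d = d_from_counts(28, 28, 31, 34)
print(round(d, 2))
```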
-
Calculating d (12) using ES calculator, using proportion of successes per group (53% vs. 48.5%)
-
Calculating d (13) using paired t-test (only one experimental group; each person their own control)
Don't use the SD of the change score!
r = correlation between Time 1 and Time 2
-
Calculating d (14) using paired t-test (only one experimental group)
n (pairs) = 90, t-value = 6.5, r = .70
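For a paired design, a frequently used conversion that avoids standardizing by the SD of the change score is d = t * sqrt(2(1 - r) / n), where n is the number of pairs and r the Time 1-Time 2 correlation. The slide's own formula is not in the transcript, so this is an assumed form; the function name is hypothetical.

```python
from math import sqrt

def d_from_paired_t(t, n_pairs, r):
    """d from a paired t-value, the number of pairs, and the pre-post correlation."""
    return t * sqrt(2 * (1 - r) / n_pairs)

# Values from the slide: n (pairs) = 90, t-value = 6.5, r = .70
d = d_from_paired_t(6.5, 90, 0.70)
print(round(d, 2))
```

The higher the pre-post correlation, the smaller the d implied by a given paired t-value.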
-
Calculating d (15)
The 20 participants increased .84 z-scores between time 1 and time 2 (p
-
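Example dataset so far 3 (ES_enter.sav): Method difference: mean contrast and gain scores
enter_w
study | es | Treat | Cntr | n | Groups | se | (n1+n2)/(n1*n2) | d^2 / (2(n1+n2)) | se | w | wes
1 | 1.00 | 25 | 30 | 55 | 2 | 0.2871 | 0.0733 | 0.0091 | 0.2871 | 12.13 | 12.13
2 | 0.80 | 40 | 40 | 80 | 2 | 0.2324 | 0.0500 | 0.0040 | 0.2324 | 18.52 | 14.81
3 | 1.46 | 15 | 15 | 30 | 2 | 0.4109 | 0.1333 | 0.0355 | 0.4109 | 5.92 | 8.65
4 | 0.40 | 20 | 20 | 40 | 2 | 0.3194 | 0.1000 | 0.0020 | 0.3194 | 9.80 | 3.92
5 | 0.10 | 80 | 75 | 155 | 2 | 0.1608 | 0.0258 | 0.0000 | 0.1608 | 38.66 | 3.87
6 | 0.10 | 60 | 60 | 120 | 2 | 0.1827 | 0.0333 | 0.0000 | 0.1827 | 29.96 | 3.00
7 | 0.17 | 45 | 35 | 80 | 2 | 0.2258 | 0.0508 | 0.0002 | 0.2258 | 19.62 | 3.34
8 | 0.43 | 45 | 40 | 85 | 2 | 0.2198 | 0.0472 | 0.0011 | 0.2198 | 20.70 | 8.90
9 | 0.40 | 100 | 120 | 220 | 2 | 0.1367 | 0.0183 | 0.0004 | 0.1367 | 53.48 | 21.39
10 | 0.60 | 200 | 160 | 360 | 2 | 0.1084 | 0.0113 | 0.0005 | 0.1084 | 85.11 | 51.06
11 | 0.05 | 56 | 65 | 121 | 2 | 0.1824 | 0.0332 | 0.0000 | 0.1824 | 30.07 | 1.50
12 | 0.10 | 70 | 80 | 150 | 2 | 0.1638 | 0.0268 | 0.0000 | 0.1638 | 37.29 | 3.69
13 | 0.70 | 80 | 0 | 80 | 1
14 | 0.53 | 90 | 0 | 90 | 1
15 | 0.80 | 20 | 0 | 20 | 1
Sums: w = 361.27, wes = 136.27
average es: 0.38
se of mean es: 0.05
C.I. Lower: 0.27
C.I. Upper: 0.48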
-
Summary of equations from Lipsey & Wilson (2001) (for more formulae see Lipsey & Wilson Appendix B)
-
Weighting for mean-level differences
The effect sizes are weighted by the inverse of the variance to give more weight to effects based on larger sample sizes
Variance for mean level comparison is calculated as
$$v_i = \frac{n_1 + n_2}{n_1 n_2} + \frac{d_i^2}{2(n_1 + n_2)}$$
The standard error of each effect size is given by the square root of the sampling variance
$$SE = \sqrt{v_i}$$
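The weighting can be sketched with the twelve two-group studies from the example dataset (ES_enter.sav). The variance is taken as (n1 + n2)/(n1 * n2) + d² / (2(n1 + n2)), matching the dataset's column headings; the weighted mean, its standard error sqrt(1 / Σw) and a 1.96-based 95% interval follow the usual fixed-effect formulas, which is an assumption about how the sheet computes its summary rows. The d values are the rounded ones shown in the table, so sums can differ slightly in the last digit.

```python
from math import sqrt

# (d, n_treatment, n_control) for studies 1-12 of the example dataset
studies = [
    (1.00, 25, 30), (0.80, 40, 40), (1.46, 15, 15), (0.40, 20, 20),
    (0.10, 80, 75), (0.10, 60, 60), (0.17, 45, 35), (0.43, 45, 40),
    (0.40, 100, 120), (0.60, 200, 160), (0.05, 55, 65), (0.10, 70, 80),
]

def sampling_variance(d, n1, n2):
    """Sampling variance of d for a two-group mean comparison."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

weights = [1 / sampling_variance(d, n1, n2) for d, n1, n2 in studies]
sum_w = sum(weights)
sum_wes = sum(w * d for w, (d, _, _) in zip(weights, studies))

mean_es = sum_wes / sum_w          # inverse-variance weighted mean effect size
se_mean = sqrt(1 / sum_w)          # standard error of the weighted mean
ci = (mean_es - 1.96 * se_mean, mean_es + 1.96 * se_mean)
print(round(sum_w, 2), round(sum_wes, 2), round(mean_es, 2), round(se_mean, 2))
print(round(ci[0], 2), round(ci[1], 2))
```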
-
*Enter_w.xls
enter_w
study  es    Treat  Cntr  n    d     Groups  r     se_1gr  se1     se2     se3     w1        (n1+n2)/(n1*n2)  d^2/(2(n1+n2))  se      w       wes
1      1.00  25     30    55   1.00  2       -     -       -       -       -       -         0.0733           0.0091          0.2871  12.13   12.13
2      0.80  40     40    80   0.80  2       -     -       -       -       -       -         0.0500           0.0040          0.2324  18.52   14.81
3      1.46  15     15    30   1.46  2       -     -       -       -       -       -         0.1333           0.0355          0.4109  5.92    8.65
4      0.40  20     20    40   0.40  2       -     -       -       -       -       -         0.1000           0.0020          0.3194  9.80    3.92
5      0.10  80     75    155  0.10  2       -     -       -       -       -       -         0.0258           0.0000          0.1608  38.66   3.87
6      0.10  60     60    120  0.10  2       -     -       -       -       -       -         0.0333           0.0000          0.1827  29.96   3.00
7      0.17  45     35    80   0.17  2       -     -       -       -       -       -         0.0508           0.0002          0.2258  19.62   3.34
8      0.43  45     40    85   0.43  2       -     -       -       -       -       -         0.0472           0.0011          0.2198  20.70   8.90
9      0.40  100    120   220  0.40  2       -     -       -       -       -       -         0.0183           0.0004          0.1367  53.48   21.39
10     0.60  200    160   360  0.60  2       -     -       -       -       -       -         0.0113           0.0005          0.1084  85.11   51.06
11     0.05  56     65    121  0.05  2       -     -       -       -       -       -         0.0332           0.0000          0.1824  30.07   1.50
12     0.10  70     80    150  0.10  2       -     -       -       -       -       -         0.0268           0.0000          0.1638  37.29   3.69
13     0.70  80     0     80   0.70  1       0.70  0.1028  0.0075  0.0031  0.1028  94.6746   -                -               0.1028  94.67   66.27
14     0.53  90     0     90   0.53  1       0.65  0.0966  0.0078  0.0016  0.0966  107.0855  -                -               0.0966  107.09  56.76
15     0.80  20     0     20   0.80  1       0.50  0.2569  0.0500  0.0160  0.2569  15.1515   -                -               0.2569  15.15   12.12
Sums (w, wes)    578.18  271.41
average es       0.4694
se of mean es    0.0416
95% C.I. Lower   0.3879
95% C.I. Upper   0.5509
[Chart: Effect sizes by sample size (d)]
Sheet3 (columns follow the enter_w layout)
es    Treat  Cntr  n   Groups  r     se      w
0.30  20     0     20  1       0.50  0.2236  20.0000
0.30  20     0     20  1       0.70  0.1796  31.0078
0.30  20     0     20  1       0.90  0.1107  81.6327
0.50  50     0     50  1       0.50  0.1446  47.8469
0.50  50     0     50  1       0.70  0.1204  68.9655
0.50  50     0     50  1       0.90  0.0806  153.8462
0.80  80     0     80  1       0.50  0.1186  71.1111
0.80  80     0     80  1       0.70  0.1072  86.9565
0.80  80     0     80  1       0.90  0.0806  153.8462
-
*Weighting for gain scores
T1 and T2 scores are dependent, so the correlation between T1 and T2 has to enter the equation (it is not always reported).
SE for gain scores: $SE_i = \sqrt{\frac{2(1 - r)}{n} + \frac{d_i^2}{2n}}$
Inverse variance for gain scores: $w_i = 1 / SE_i^2$
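A minimal sketch of the gain-score weighting, assuming the variance 2(1 - r)/n + d^2/(2n). This form is an assumption, chosen because it reproduces the se values listed for the one-group studies in the enter_w sheet. The function names are hypothetical.

```python
import math


def se_gain_score(d, n, r):
    """Standard error of d for a one-group gain score; r is the T1-T2 correlation."""
    variance = 2 * (1 - r) / n + d ** 2 / (2 * n)
    return math.sqrt(variance)


def inverse_variance_weight(se):
    """Weight of a study: the inverse of its sampling variance."""
    return 1 / se ** 2


# One-group rows of the sheet: (es, n, r)
for d, n, r in [(0.70, 80, 0.70), (0.53, 90, 0.65), (0.80, 20, 0.50)]:
    se = se_gain_score(d, n, r)
    print(f"d = {d:.2f}, n = {n}, r = {r:.2f}: se = {se:.4f}, w = {inverse_variance_weight(se):.2f}")
```

A higher T1-T2 correlation gives a smaller standard error and therefore a larger weight, which is why an unreported r is a practical problem.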
-
*Enter_w.xls
enter_w
study  es    Treat  Cntr  n    d     Groups  r     se_1gr  se1     se2     se3     w1        (n1+n2)/(n1*n2)  d^2/(2(n1+n2))  se      w       wes
1      1.00  25     30    55   1.00  2       -     -       -       -       -       -         0.0733           0.0091          0.2871  12.13   12.13
2      0.80  40     40    80   0.80  2       -     -       -       -       -       -         0.0500           0.0040          0.2324  18.52   14.81
3      1.46  15     15    30   1.46  2       -     -       -       -       -       -         0.1333           0.0355          0.4109  5.92    8.65
4      0.40  20     20    40   0.40  2       -     -       -       -       -       -         0.1000           0.0020          0.3194  9.80    3.92
5      0.10  80     75    155  0.10  2       -     -       -       -       -       -         0.0258           0.0000          0.1608  38.66   3.87
6      0.10  60     60    120  0.10  2       -     -       -       -       -       -         0.0333           0.0000          0.1827  29.96   3.00
7      0.17  45     35    80   0.17  2       -     -       -       -       -       -         0.0508           0.0002          0.2258  19.62   3.34
8      0.43  45     40    85   0.43  2       -     -       -       -       -       -         0.0472           0.0011          0.2198  20.70   8.90
9      0.40  100    120   220  0.40  2       -     -       -       -       -       -         0.0183           0.0004          0.1367  53.48   21.39
10     0.60  200    160   360  0.60  2       -     -       -       -       -       -         0.0113           0.0005          0.1084  85.11   51.06
11     0.05  56     65    121  0.05  2       -     -       -       -       -       -         0.0332           0.0000          0.1824  30.07   1.50
12     0.10  70     80    150  0.10  2       -     -       -       -       -       -         0.0268           0.0000          0.1638  37.29   3.69
13     0.70  80     0     80   0.70  1       0.65  0.1087  0.0088  0.0031  0.1087  84.6561   -                -               0.1087  84.66   59.26
14     0.53  90     0     90   0.53  1       0.70  0.0907  0.0067  0.0016  0.0907  121.5477  -                -               0.0907  121.55  64.42
15     0.80  20     0     20   0.80  1       0.50  0.2569  0.0500  0.0160  0.2569  15.1515   -                -               0.2569  15.15   12.12
Sums (w, wes)    582.63  272.07
average es       0.4670
se of mean es    0.0414
95% C.I. Lower   0.3858
95% C.I. Upper   0.5482
[Chart: Effect sizes by sample size (d)]
-
*Compute the weighted mean ES and s.e. of the ES in SPSS (var_ofES.sps) (1)
-
*Compute the weighted mean ES and s.e. of the ES in SPSS (var_ofES.sps) (2)
-
*Compute the weighted mean ES and s.e. of the ES
Weight each ES by the inverse of its squared s.e. (the inverse variance): $w_i = 1 / SE_i^2$
The average ES: $\overline{ES} = \sum_i w_i ES_i / \sum_i w_i$
Standard error of the average ES: $SE_{\overline{ES}} = \sqrt{1 / \sum_i w_i}$
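The same computation can be sketched outside SPSS. The block below assumes a fixed-effect inverse-variance mean and uses the es and se columns of the enter_w sheet (the version with r = .65 for study 13 and r = .70 for study 14); because the inputs are rounded to four decimals, the results match the sheet's summary rows only approximately. The function name is hypothetical.

```python
import math

# (es, se) per study, copied from the enter_w sheet
studies = [
    (1.00, 0.2871), (0.80, 0.2324), (1.46, 0.4109), (0.40, 0.3194),
    (0.10, 0.1608), (0.10, 0.1827), (0.17, 0.2258), (0.43, 0.2198),
    (0.40, 0.1367), (0.60, 0.1084), (0.05, 0.1824), (0.10, 0.1638),
    (0.70, 0.1087), (0.53, 0.0907), (0.80, 0.2569),
]


def weighted_mean_es(rows, z=1.96):
    """Return (mean ES, s.e. of the mean, C.I. lower, C.I. upper)."""
    weights = [1 / se ** 2 for _, se in rows]           # w = 1 / se^2
    sum_w = sum(weights)
    sum_wes = sum(w * es for w, (es, _) in zip(weights, rows))
    mean_es = sum_wes / sum_w                           # sum(w * es) / sum(w)
    se_mean = math.sqrt(1 / sum_w)                      # sqrt(1 / sum(w))
    return mean_es, se_mean, mean_es - z * se_mean, mean_es + z * se_mean


mean_es, se_mean, lower, upper = weighted_mean_es(studies)
print(f"average es = {mean_es:.4f}, se = {se_mean:.4f}, 95% C.I. = [{lower:.4f}, {upper:.4f}]")
```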
-
*Funnel plot for x = sample size, y = ES
Does the average of the ES converge toward the average of the largest (n) study?
95% C.I. = 1.96 * s.e.
99% C.I. = 2.58 * s.e.
99.9% C.I. = 3.29 * s.e.
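The confidence multipliers on this slide can be turned into funnel limits around the average ES. A minimal sketch, assuming limits of the form mean plus or minus z times the study's s.e.; the function name and the example inputs (average es 0.47, the s.e. of the smallest and largest two-group studies in the sheet) are chosen for illustration only.

```python
# z multipliers as listed on the slide
Z_VALUES = {"95%": 1.96, "99%": 2.58, "99.9%": 3.29}


def funnel_limits(mean_es, se, z):
    """Lower and upper funnel limits for a study with standard error se."""
    return mean_es - z * se, mean_es + z * se


for label, se in [("n = 30", 0.4109), ("n = 360", 0.1084)]:
    for level, z in Z_VALUES.items():
        low, high = funnel_limits(0.47, se, z)
        print(f"{label}, {level}: [{low:.2f}, {high:.2f}]")
```

The limits narrow as n grows, which produces the funnel shape when ES is plotted against sample size.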
[Chart2: Effect sizes by sample size (d)]
-
*Funnel plot including s.e. of ES
ES in a smaller sample has a larger standard error (s.e.)
[Chart1: Effect sizes by sample size (d), with the s.e. of each ES]
-
*Sample: n = size, m = mean, d = effect size
-
*Calculating rs
Means and standard deviations (d)
χ², φ
P-values
F-statistics
r
t-statistics
other test statistics
Almost all test statistics can be transformed into a standardized effect size r
-
*Correlations / relationships between variables
rxy: Pearson's product-moment coefficient (continuous x continuous)
rpb: point-biserial correlation (dichotomous x continuous)
χ² (dichotomous x dichotomous)
rs: Spearman's rank-order coefficient (ordinal x ordinal)
And others, e.g., φ coefficient, odds ratio (OR), Cramér's V, contingency coefficient C, tetrachoric and polychoric correlations (etc.)
-
*Bias when dichotomising continuous variables
X and Y are both truly continuous, but in the study one of them is dichotomised: with X continuous and Y split 50/50, rpb is about 80% of the value it would have had if Y had stayed continuous.
X and Y are both truly continuous, but both are dichotomised: the maximum value of φ for a 30/70 split on X and a 50/50 split on Y is φ = .33
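The 80% figure for a 50/50 split can be checked numerically. The sketch below assumes an underlying normal variable and uses the standard relation between the point-biserial and the biserial correlation, rpb = r * h / sqrt(p(1 - p)), where h is the normal density at the cut point. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def dichotomisation_factor(p):
    """Ratio rpb / r when a normal variable is cut so that proportion p falls below the cut."""
    cut = NormalDist().inv_cdf(p)      # z-value of the cut point
    height = NormalDist().pdf(cut)     # normal density at the cut point
    return height / math.sqrt(p * (1 - p))


for p in (0.5, 0.3, 0.1):
    print(f"split {p:.0%}/{1 - p:.0%}: rpb is {dichotomisation_factor(p):.0%} of r")
```

The loss grows as the split becomes more uneven, so extreme splits attenuate the correlation more than a median split.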
-
*Calculating rs from d (1)
r can be used in all situations where d can, but d cannot be used in all situations where r is appropriate
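The conversion formula itself did not survive on this slide. A minimal sketch, assuming the widely used conversion r = d / sqrt(d^2 + a) with a = (n1 + n2)^2 / (n1 * n2), which reduces to a = 4 for equal group sizes; the function name is hypothetical.

```python
import math


def r_from_d(d, n1=None, n2=None):
    """Convert d to r; group sizes are optional and assumed equal when omitted."""
    if n1 is None or n2 is None:
        a = 4.0                          # equal group sizes
    else:
        a = (n1 + n2) ** 2 / (n1 * n2)   # correction for unequal group sizes
    return d / math.sqrt(d ** 2 + a)


print(round(r_from_d(0.2), 3), round(r_from_d(0.5), 3), round(r_from_d(0.8), 3))
```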
mean_effect size
T-test with unpooled variances and effect sizes (with pooled variances)
Comparison          A vs B   C vs D   E vs F   G vs H
M (first sample)    100      100      5.5      5.5
M (second sample)   105      105      4        4
Mean difference     -5.0000  -5.0000  1.5000   1.5000
SD (each sample)    15       15       0.5      0.5
N (each sample)     5        10       200      20
SD^2 / N            45.0000  22.5000  0.0013   0.0125
SE of difference    9.4868   6.7082   0.0500   0.1581
T                   -0.5270  -0.7454  30.0000  9.4868
one or two tailed   2        2        2        2
DF                  8        18       398      38
sign                0.6125   0.4657   0.0000   0.0000
d                   0.3333   0.3333   3.0000   3.0000
T-test and effect sizes with pooled variances
* systematically change the mean-level differences, maintaining the variances and sample sizes
* systematically change the sample sizes, maintaining the variances and mean-level differences
* systematically change the variances, maintaining the mean-level differences and sample sizes
* "unbalance" the test, distort sample sizes and / or variances of the groups
* figure out some "rule of thumb" for your observations
ES
T-test and effect sizes
Treatment: M = 105, SD = 15, N = 100
Control: M = 100, SD = 15, N = 100
Pooled SD = 15; mean difference = 5.0000; SE of difference = 2.1213
T = 2.3570, DF = 198, one or two tailed = 2, sign = 0.0194
Cohen's d = 0.3333
Hedges' g = 0.331662479
Cohen's d from Hedges' g = 0.3333333333
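The numbers on the ES sheet can be reproduced with a short sketch, assuming a pooled-variance t-test. It shows why d stays fixed while t grows with sample size: t = d * sqrt(n1 * n2 / (n1 + n2)). The function names are hypothetical.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, m2, sd_pooled):
    """Standardized mean difference."""
    return (m1 - m2) / sd_pooled


def t_from_d(d, n1, n2):
    """Independent-samples t implied by d and the group sizes."""
    return d * math.sqrt(n1 * n2 / (n1 + n2))


# Treatment M = 105, Control M = 100, SD = 15 in both groups
for n in (5, 10, 100):
    d = cohens_d(105, 100, pooled_sd(15, n, 15, n))
    print(f"n per group = {n}: d = {d:.4f}, t = {t_from_d(d, n, n):.4f}")
```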
chi2
This spreadsheet calculates the chi-square value for a 2 x 2 contingency table and transforms it into an effect size and a phi coefficient
Observed frequencies (rows: Variable B; columns: Variable A)
        no     yes    total
no      28     28     56
yes     31     34     65
total   59     62     121
Expected frequencies
        no     yes
no      27.31  28.69
yes     31.69  33.31
(observed minus expected frequencies)^2
        no     yes
no      0.48   0.48
yes     0.48   0.48
(observed minus expected frequencies)^2 / expected frequencies
        no     yes
no      0.02   0.02
yes     0.02   0.01
CHISQ = 0.06
Unlabelled result cells: 1.0967741935, 0.0923733201, 0.0504772241, 0.0222680886
-
*Calculating rpb (2)
If X and Y are inherently continuous, a mean contrast is a better option than rpb
-
**Calculating r (3) from t-value
Appropriate for both independent and dependent samples t-test values
**Calculating r (4) from χ²-value
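The formulas for these two conversions are missing from the transcript. A minimal sketch, assuming the standard conversions r = sqrt(t^2 / (t^2 + df)) and, for a chi-square with one degree of freedom, r = sqrt(χ² / N); the function names are hypothetical.

```python
import math


def r_from_t(t, df):
    """r from a t-value; the sign of t is carried over to r."""
    magnitude = math.sqrt(t ** 2 / (t ** 2 + df))
    return math.copysign(magnitude, t)


def r_from_chi2(chi2, n):
    """r (phi) from a chi-square with 1 degree of freedom, i.e. a 2 x 2 table."""
    return math.sqrt(chi2 / n)


print(round(r_from_t(2.357, 198), 3))    # t and DF from the ES sheet
print(round(r_from_chi2(0.06, 121), 4))  # CHISQ and N from the chi2 sheet
```

The chi-square conversion gives only the magnitude of the association, so the direction has to be read from the cell frequencies.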
-
Sources of error
Cf. Structural Equation Model (circle = latent/unobserved construct, rectangle = manifest/observed variable)
[Figure: path diagram linking manifest (observed) variables x and y to latent (unobserved) variables X and Y; path labels rx*y*, rxx, ryy, rxy]
-
Alternatively: transform rs into Fisher's Zr, which is more normally distributed than r
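As an illustration of the transformation, here is a minimal Python sketch. Fisher's Zr is 0.5 × ln((1 + r) / (1 - r)), which equals the inverse hyperbolic tangent of r; the function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's Zr transformation of a correlation r (|r| < 1)."""
    return math.atanh(r)          # same as 0.5 * log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform Zr to the r metric."""
    return math.tanh(z)

for r in (0.10, 0.46, 0.90):
    z = fisher_z(r)
    print(f"r = {r:.2f}  ->  Zr = {z:.4f}  ->  r = {inverse_fisher_z(z):.2f}")
```

Small correlations are almost unchanged by the transformation, while large ones are stretched, which is what removes the skew in the sampling distribution of r. Results are usually back-transformed to r for reporting.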
ESRC RDI One Day Meta-analysis workshop (Marsh, O'Mara, Malmberg)
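Levene
Definition
The Levene test is defined as:
H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$
Ha: $\sigma_i^2 \neq \sigma_j^2$ for at least one pair (i, j).
Test Statistic:
Given a variable Y with sample of size N divided into k subgroups, where Ni is the sample size of the ith subgroup, the Levene test statistic is defined as:
$W = \frac{(N-k)}{(k-1)} \cdot \frac{\sum_{i=1}^{k} N_i (\bar{Z}_{i.} - \bar{Z}_{..})^2}{\sum_{i=1}^{k} \sum_{j=1}^{N_i} (Z_{ij} - \bar{Z}_{i.})^2}$
where Zij can have one of the following three definitions:
1. $Z_{ij} = |Y_{ij} - \bar{Y}_{i.}|$, where $\bar{Y}_{i.}$ is the mean of the ith subgroup.
2. $Z_{ij} = |Y_{ij} - \tilde{Y}_{i.}|$, where $\tilde{Y}_{i.}$ is the median of the ith subgroup.
3. $Z_{ij} = |Y_{ij} - \bar{Y}'_{i.}|$, where $\bar{Y}'_{i.}$ is the 10% trimmed mean of the ith subgroup.
$\bar{Z}_{i.}$ are the group means of the Zij and $\bar{Z}_{..}$ is the overall mean of the Zij.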
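The statistic can be sketched in plain Python as follows. The code computes W as (N - k) / (k - 1) times the between-group sum of squares of the absolute deviations Zij divided by their within-group sum of squares. The function name `levene_w` and the `center` parameter are hypothetical; the median is used as the default center, and passing `mean` gives the mean-based version.

```python
from statistics import mean, median

def levene_w(groups, center=median):
    """Levene's W for a list of groups, each a list of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij = |Y_ij - center of group i|
    z = [[abs(y - center(g)) for y in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]                  # group means of Z
    z_bar = sum(sum(zi) for zi in z) / n_total        # overall mean of Z
    between = sum(len(zi) * (zb - z_bar) ** 2
                  for zi, zb in zip(z, z_bar_i))
    within = sum((zij - zb) ** 2
                 for zi, zb in zip(z, z_bar_i) for zij in zi)
    return (n_total - k) / (k - 1) * between / within

# Second group has twice the spread of the first
print(levene_w([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]]))
```

The sketch does not guard against a zero within-group sum of squares (for example, when every group has identical absolute deviations from its center), so real data with that property would need a check before the division.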
Levene's original paper only proposed using the mean. Brown and Forsythe (1974) extended Levene's test to use either the median or the trimmed mean in addition to the mean. They performed Monte Carlo studies that indicated that using the trimmed mean per
(i.e., skewed) distribution. Using the mean provided the best power for symmetric, moderate-tailed, distributions.
Although the optimal choice depends on the underlying distribution, the definition based on the median is recommended as the choice that provides good robustness against many types of non-normal data while retaining good power. If you have knowledge of th
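Significance Level: $\alpha$
Critical Region:
The Levene test rejects the hypothesis that the variances are equal if
$W > F_{\alpha,\,k-1,\,N-k}$
where W is the Levene test statistic and $F_{\alpha,\,k-1,\,N-k}$ is the upper critical value of the F distribution with k-1 and N-k degrees of freedom at a significance level of $\alpha$.
In the above formulas for the critical regions, the Handbook follows the convention that $F_{\alpha}$ is the upper critical value from the F distribution and $F_{1-\alpha}$ is the lower critical value. Note that this is the opposite of some texts and software programs. In particular, Dataplot uses the opposite convention.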
Sample Output
Dataplot generated the following output for Levene's test using the GEAR.DAT data set (by default, Dataplot performs the form of the test based on the median):
LEVENE F-TEST FOR SHIFT IN VARIATION
(CASE: TEST BASED ON MEDIANS)
1. STATISTICS
NUMBER OF OBSERVATIONS = 100
NUMBER OF GROUPS = 10
LEVENE F TEST STATISTIC = 1.705910
2. FOR LEVENE TEST STATISTIC
0 % POINT = 0.
50 % POINT = 0.9339308
75 % POINT = 1.296365
90 % POINT = 1.702053
95 % POINT = 1.985595
99 % POINT = 2.610880
99.9 % POINT = 3.478882
90.09152 % Point: 1.705910
3. CONCLUSION (AT THE 5% LEVEL):
THERE IS NO SHIFT IN VARIATION.
THUS: HOMOGENEOUS WITH RESPECT TO VARIATION.
Interpretation of Sample Output
We are testing the hypothesis that the group variances are equal. The output is divided into three sections.
1. The first section prints the number of observations, the number of groups, and the value of the Levene test statistic.
2. The second section prints the upper critical value of the F distribution corresponding to various significance levels. The value in the first column, the confidence level of the test, is equivalent to 100(1-$\alpha$). We reject the null hypothesis at that significance level if the value of the Levene F test statistic printed in section one is greater than the critical value printed in the last column.
3. The third section prints the conclusion for a 95% test. For a different significance level, the appropriate conclusion can be drawn from the table printed in section two. For example, for $\alpha$ = 0.10, we look at the row for 90% confidence and compare the critical value 1.702 to the Levene test statistic 1.7059. Since the test statistic is greater than the critical value, we reject the null hypothesis at the $\alpha$ = 0.10 level.
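The decision rule can be written out as a small Python sketch that uses only the numbers printed in the sample output; the variable names are hypothetical.

```python
# Upper critical values from section 2 of the output, keyed by alpha
critical_values = {
    0.50: 0.9339308,
    0.25: 1.296365,
    0.10: 1.702053,
    0.05: 1.985595,
    0.01: 2.610880,
    0.001: 3.478882,
}
levene_statistic = 1.705910   # from section 1 of the output

# Reject equal variances when the statistic exceeds the critical value
for alpha, critical in critical_values.items():
    decision = "reject" if levene_statistic > critical else "do not reject"
    print(f"alpha = {alpha:<6} critical = {critical:<10} {decision} H0")

reject_at = {a: levene_statistic > c for a, c in critical_values.items()}
```

The statistic clears the 10% critical value only narrowly and falls short of the 5% value, which is why the printed conclusion at the 5% level reports no shift in variation.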
Output from other statistical software may look somewhat different from the above output.
Question
Levene's test can be used to answer the following question:
Is the assumption of equal variances valid?
Related Techniques
Standard Deviation Plot
Box Plot
Bartlett Test
Chi-Square Test
Analysis of Variance
Software
The Levene test is available in some general purpose statistical software programs, including Dataplot.
-
rr.xls
[Chart1: "Ten effect sizes (r)"; axes: N, ES (r); plotted r values: 0.46, 0.33, 0.25, -0.2, -0.25, -0.4, -0.1, 0.1, 0.275, 0.15]
2. The second section prints the upper critical value of the F