variance component analysis by paravayya c pujeri
VARIANCE COMPONENT ANALYSIS by using ANOVA method
By PARAVAYYA C PUJERI, PGS14AGR6424
Flow Of Seminar
• Basic concepts
• Introduction
• Applications
• Assumption
• Steps
• Examples
• Case study
• Conclusion
• References
Basic concepts
• Factor: An independent variable defining groups of cases.
• Fixed factors: variables whose values of interest are all represented in the data file.
• Random factors: variables whose values in the data file can be considered a random sample from a larger population of values. They are useful for explaining excess variability in the dependent variable.
• Fixed effects: the effects attributable to a finite set of levels of a factor that occur in the data and are there because we are interested in them.
• Random effects: the effects attributable to a (usually) infinite set of levels of a factor, of which only a random sample is deemed to occur in the data.
• Mixed effect model: a model that contains both fixed and random effects.
VARIANCE COMPONENT ANALYSIS
Other names: components of variation, sources of variance, variance analysis, intraclass correlation, random effects models, analysis of nested data.
Definition: A technique for partitioning the total variation into different components, of which some are known and some are completely unknown.
• Variance components : are a way to assess the amount of variation in a dependent variable that is associated with one or more random-effects variables.
The Variance Components procedure, for random-effects and mixed-effects models, estimates the contribution of each random effect to the variance of the dependent variable. This procedure is particularly interesting for the analysis of mixed models such as split plot, univariate repeated measures, and random block designs. By calculating variance components, we can determine where to focus attention in order to reduce the variance.
Example. At an agriculture school, weight gains for pigs in six different litters are measured after one month. The litter variable is a random factor with six levels. (The six litters studied are a random sample from a large population of pig litters.) The investigator finds that the variance in weight gain is attributable to the difference between litters much more than to the difference between pigs within a litter.
INTRODUCTION
Variance components models are a way to assess the amount of variation in a dependent variable that is associated with one or more random-effects variables. The variance components procedure estimates only variance components, not model regression coefficients.
Historical development
Variance component analysis can be traced back to the works of the astronomers Airy (1861) and Chauvenet (1863). A modern interpretation of a one-way random model was given by R. A. Fisher (1918). Later, Henderson (1953) estimated variance components by "equating mean sum squares to expected mean sum squares"; these methods are popularly known as Henderson's methods.
APPLICATIONS
• Components of variance have been used widely in agricultural genetics and animal breeding:
(1) to predict the breeding values of sires or dams and to predict the real producing abilities of cows,
(2) to indicate sources of variation which should be considered in analyzing production records.
• In plant breeding
• In epidemiology and psychometric testing
• In engineering
• In environmental science, etc.
Variance Components assumes:
• The observations are normally distributed (under some conditions this assumption can be relaxed), with each source of variance being constant for all subgroups (this may be true only after a transformation).
• The values of the errors are independent of each other and of the variables in the model.
• The errors have a normal distribution with a mean of 0.
• The data are completely balanced; this means that all similar subgroups have the same numbers of observations (more complex methods allow estimation of variance components from unbalanced data).
Four different methods are available for estimating the variance components:
• analysis of variance (ANOVA),
• maximum likelihood (ML),
• minimum norm quadratic unbiased estimator (MINQUE),
• restricted maximum likelihood (REML).
Estimating the variance components by using the ANOVA method
The ANOVA method is the oldest and simplest method of estimating variance components. It first computes sums of squares and expected mean squares for all effects following the general linear model approach. Then a system of linear equations is established by equating the mean squares of the random effects to their expected mean squares. The unknowns in the equations are the variance components and the residual variance. Any solution, if one exists, to this system of linear equations constitutes a set of estimates for the variance components.
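The ANOVA method described above can be sketched for the simplest case, a balanced one-way random-effects layout. This is a minimal illustration, not the procedure of any particular package; the function name `anova_variance_components` and the small data set are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from statistics import mean


def anova_variance_components(groups):
    """ANOVA-method estimates for a balanced one-way random-effects layout.

    groups: list of equal-length lists, one list of observations per group.
    Returns (var_between, var_error).
    """
    s = len(groups)          # number of groups
    n = len(groups[0])       # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("balanced data required: all groups must have the same size")

    grand_mean = mean(y for g in groups for y in g)
    group_means = [mean(g) for g in groups]

    # Sums of squares between groups and within groups (error).
    ssa = n * sum((m - grand_mean) ** 2 for m in group_means)
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    msa = ssa / (s - 1)
    mse = sse / (s * n - s)

    # Equate mean squares to expected mean squares and solve:
    #   MSA = n * var_between + var_error,  MSE = var_error
    var_error = mse
    var_between = (msa - mse) / n   # can come out negative with this method
    return var_between, var_error


# Hypothetical illustration data: 3 groups, 3 observations each.
demo = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
var_between, var_error = anova_variance_components(demo)
print(var_between, var_error)
```

For unbalanced data this simple solve no longer applies, which is why the balanced-data check raises an error instead of returning a misleading number.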
ADVANTAGES
• It is easy to calculate.
• It is easy to understand.
• It has very few basic assumptions; e.g., when the group effect is a random variable, the resulting estimators are unbiased.
• Variance components can be calculated with different software such as SPSS, SAS, Stata and MARK.
• The idea of variance component methods is very simple: to decompose the overall variance in a phenotype into particular sources.
Source    Df     Sum of Squares   Mean Squares   EMS
Mean      1      SSM
Between   s-1    SSA
Within    N-s    SSE
Total     N      SST
Calculation steps
Consider, for example, the completely randomized design (or 1-way classification) of s groups with n observations in each. The usual model equation for $y_{ij}$, the j-th observation in the i-th group, is
$y_{ij} = \mu + a_i + e_{ij}$
for i = 1, 2, …, s and j = 1, 2, …, n, with $\mu$ representing an overall mean, $a_i$ the effect of the i-th group, which is a random variable, and $e_{ij}$ the random error.
ANOVA
where
$y_{ij}$ is the value of the j-th observation in the i-th group.
The sum of squares for the mean is N times the squared grand mean.
The sum of squares due to a particular effect is the sum, over all observations, of the squared estimated effect in each observation.
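Written out for the one-way layout with s groups of n observations (N = sn), the verbal definitions above correspond to the following sums of squares. The bar-and-dot notation for group means and the grand mean is an assumption here, since the original formulas did not survive on the slide; the uncorrected total matches the way SST is computed in the worked example.

```latex
\begin{align*}
SSM &= N\,\bar{y}_{..}^{2}, \\
SSA &= n \sum_{i=1}^{s} \left(\bar{y}_{i.} - \bar{y}_{..}\right)^{2}, \\
SSE &= \sum_{i=1}^{s}\sum_{j=1}^{n} \left(y_{ij} - \bar{y}_{i.}\right)^{2}, \\
SST &= \sum_{i=1}^{s}\sum_{j=1}^{n} y_{ij}^{2} \;=\; SSM + SSA + SSE .
\end{align*}
```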
Form the ANOVA table
Variance estimates
• Var(between) = (MSA − MSE)/n
• Var(error) = MSE
Source    Df     Sum of Squares   Mean Squares   EMS
Mean      1      SSM              SSM
Between   s-1    SSA              MSA            $n\sigma_s^2 + \sigma_e^2$
Error     N-s    SSE              MSE            $\sigma_e^2$
Total     N      SST
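The step from the EMS column to the variance estimates can be made explicit: the observed mean squares are set equal to their expectations and the two equations are solved for the two unknowns. The hat notation for estimates is an assumption added for this sketch.

```latex
\begin{align*}
E(MSA) &= n\sigma_s^{2} + \sigma_e^{2}, \qquad E(MSE) = \sigma_e^{2}, \\
MSE &= \hat{\sigma}_e^{2}
  \;\Longrightarrow\; \hat{\sigma}_e^{2} = MSE, \\
MSA &= n\hat{\sigma}_s^{2} + \hat{\sigma}_e^{2}
  \;\Longrightarrow\; \hat{\sigma}_s^{2} = \frac{MSA - MSE}{n}.
\end{align*}
```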
EXAMPLE:
Three castings fabricated in the same facility were randomly selected. Each casting was broken into individual bars. Ten randomly selected bars from each casting were tested. The interest is in identifying the variation in tensile strength caused by castings in the facility and by bars within a casting, not in the mean differences among the three castings.
Row cast 1 cast 2 cast 3
1 88.0 85.9 94.2
2 88.0 88.6 91.5
3 94.8 90.0 92.0
4 90.0 87.1 96.5
5 93.0 85.6 95.6
6 89.0 86.0 93.8
7 86.0 91.0 92.5
8 92.9 89.6 93.2
9 89.0 93.0 96.2
10 93.0 87.5 92.5
The statistical model for identifying the two sources of variation in this random-effects experiment is
$y_{ij} = \mu + a_i + e_{ij}$, i = 1, 2, ..., t; j = 1, 2, ..., r,
where $\mu$ is the process mean, the $a_i$ are the random effects due to castings, and the $e_{ij}$ are the random errors due to bars within castings.
The distribution assumptions are: $a_i \sim N(0, \sigma_a^2)$; $e_{ij} \sim N(0, \sigma_e^2)$, and both are independent.
The total variance of an observation may be expressed by:
$\sigma_y^2 = \sigma_a^2 + \sigma_e^2$
$\sigma_a^2$ and $\sigma_e^2$ are the two variance components.
The ANOVA table and expected mean squares for the random effect model:
Source                       Df     SS     MS                 EMS
Mean                         1      SSM    SSM
Among castings               t-1    SSA    MSA = SSA/(t-1)    $\sigma_e^2 + r\sigma_a^2$
Among bars within casting    N-t    SSW    MSW = SSW/(N-t)    $\sigma_e^2$
Total                        N      SST
• Grand mean: $\bar{y}_{..} = \frac{1}{30}\sum_i\sum_j y_{ij} = 90.86$
• SSM = 30 × (90.86)² = 247666.188
• SST = 88² + 88² + … + 92.5² = 247970.4
• SSA = 10[(90.35 − 90.86)² + (88.43 − 90.86)² + (93.8 − 90.86)²] = 148.086
• SSW = 247970.4 − 247666.188 − 148.086 = 156.126
ANOVA TABLE
Source     DF    SS            MS           EMS
Mean       1     247666.188    247666.19
Casting    2     148.086       74.043       10.00
Error      27    156.126       5.78         5.78
Total      30    247970.4

Variance Components
Source     Est. Value    %
Casting    6.826         54.17
Error      5.78          45.83
Total      12.60

$\sigma_a^2$ is the variance due to casting.
$\sigma_e^2$ is the random error variance due to bars.
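The casting example can be recomputed directly from the data table as a check. The sketch below assumes the values exactly as listed in the table and follows the same steps (SSM, SST, SSA, SSW, then the two variance components); the variable names are illustrative. Because the slide works with means rounded to two decimals, the unrounded results from this code can differ slightly from the slide figures in the last digits.

```python
# Tensile strength data as listed in the example table (10 bars per casting).
cast1 = [88.0, 88.0, 94.8, 90.0, 93.0, 89.0, 86.0, 92.9, 89.0, 93.0]
cast2 = [85.9, 88.6, 90.0, 87.1, 85.6, 86.0, 91.0, 89.6, 93.0, 87.5]
cast3 = [94.2, 91.5, 92.0, 96.5, 95.6, 93.8, 92.5, 93.2, 96.2, 92.5]
groups = [cast1, cast2, cast3]

t = len(groups)            # number of castings
r = len(groups[0])         # bars per casting
N = t * r                  # total number of observations

grand_mean = sum(sum(g) for g in groups) / N
group_means = [sum(g) / r for g in groups]

ssm = N * grand_mean ** 2                                   # mean
sst = sum(y * y for g in groups for y in g)                 # uncorrected total
ssa = r * sum((m - grand_mean) ** 2 for m in group_means)   # among castings
ssw = sst - ssm - ssa                                       # bars within castings

msa = ssa / (t - 1)
msw = ssw / (N - t)

# ANOVA-method estimates: MSA = var_error + r * var_casting, MSW = var_error
var_error = msw
var_casting = (msa - msw) / r
total = var_casting + var_error

print(f"SSA={ssa:.3f}  SSW={ssw:.3f}  MSA={msa:.3f}  MSW={msw:.3f}")
print(f"casting: {var_casting:.3f} ({100 * var_casting / total:.2f}%)")
print(f"error:   {var_error:.3f} ({100 * var_error / total:.2f}%)")
```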
Variance component analysis in SPSS
► To run a Variance Components analysis, from the menus choose:
Analyze > General Linear Model > Variance Components
► Select Amount spent as the dependent variable.
► Select Who shopping for and Use coupons as fixed factors.
► Select Store ID as a random factor.
► Click Model.
► Select Interaction from the Build Term(s) drop-down list and add the interaction term to the model.
► Click Continue.
► Click Options in the Variance Components dialog box.
► Select ANOVA as the method.
► Select Sums of squares and Expected mean squares in the Display group.
► Click Continue.
► Click OK in the Variance Components dialog box.
This table displays variance estimates for each of the variance components.
We can use this table to figure out how much each component contributes to the total variance.
In this example Var(STOREID) = 665.237 and Var(Error) = 3835.388.
Thus, the store effect explains 665.237/(665.237 + 3835.388) = 14.78% of the random variation. Error accounts for the remaining 85.22% of the random variation.
RESULT in output of SPSS
Example in SPSS
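The percentage contributions quoted for the SPSS output are simple ratios of each variance estimate to their sum. A minimal sketch using the two estimates reported above (the dictionary and variable names are illustrative, not SPSS output fields):

```python
# Variance estimates reported in the SPSS example.
components = {"STOREID": 665.237, "Error": 3835.388}

total = sum(components.values())

# Share of the total random variation attributable to each component, in percent.
shares = {name: 100 * value / total for name, value in components.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```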
CASE STUDIES
CASE STUDY-1
A study of a chromatographic method for determining malathion (Wernimont, 1985).
In this study ten labs participated; each lab received a subsample of a technical-grade malathion (Tech), two wettable powders (25% WP and 50% WP), an emulsifiable concentrate (58% EC), and a dust.
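The statistical model is $y_{ij} = \mu + L_i + e_{ij}$, where $L_i$ is the random effect of the i-th laboratory and $e_{ij}$ is the random error.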
Row lab Rep WP25 WP50
1 1 1 26.17 50.76
2 1 2 26.22 50.67
3 1 3 25.85 50.81
4 1 4 25.80 50.72
5 2 1 26.44 50.82
6 2 2 26.57 50.90
7 2 3 25.80 51.04
8 2 4 26.06 50.96
9 3 1 26.95 52.53
10 3 2 26.91 52.54
11 3 3 26.98 52.55
12 3 4 26.91 52.47
13 5 1 26.23 50.20
14 5 2 26.00 50.47
15 5 3 26.22 50.39
16 5 4 26.18 50.43
17 6 1 25.45 51.65
18 6 2 25.62 51.67
19 6 3 27.01 51.72
20 6 4 25.72 52.07
21 7 1 26.14 50.53
22 7 2 26.78 50.75
23 7 3 26.04 49.99
24 7 4 25.97 50.92
25 8 1 25.70 50.00
26 8 2 25.90 50.30
27 8 3 25.80 50.50
28 8 4 25.70 50.60
29 9 1 26.13 50.26
30 9 2 26.13 50.36
31 9 3 25.91 50.97
32 9 4 25.86 50.44
33 10 1 26.22 50.23
34 10 2 26.20 50.27
35 10 3 25.84 50.29
36 10 4 25.84 49.97
Raw Data for the Malathion Interlaboratory Study
Analysis of Variance for WP50%_1
Source DF SS MS F P
laborato 8 19.1570 2.3946 50.958 0.000
Error 27 1.2688 0.0470
Total 35 20.4258
Variance Components
Source Var Comp. % of Total StDev
laborato 0.587 92.59 0.766
Error 0.047 7.41 0.217
Total 0.634 0.796
Expected Mean Squares
1 laborato 1.00(2) + 4.00(1)
2 Error 1.00(2)
$\sigma_y^2 = \sigma_L^2 + \sigma_e^2$
This clearly indicates that 92.6% of the total variance of each observation is between-lab variance; that is, the lab averages are very different.
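The variance components in this output follow from the two mean squares and the expected mean squares (4 replicates per laboratory). The sketch below reproduces that arithmetic from the reported mean squares only; the variable names are illustrative.

```python
from math import sqrt

# Mean squares from the analysis of variance for WP50%.
ms_lab = 2.3946      # laboratories, 8 df
ms_error = 0.0470    # error, 27 df
reps = 4             # replicates per laboratory (coefficient in the EMS)

# EMS: E(MS_lab) = var_error + 4 * var_lab,  E(MS_error) = var_error
var_error = ms_error
var_lab = (ms_lab - ms_error) / reps
var_total = var_lab + var_error

pct_lab = 100 * var_lab / var_total
pct_error = 100 * var_error / var_total

print(f"lab:   {var_lab:.3f} ({pct_lab:.2f}%), StDev {sqrt(var_lab):.3f}")
print(f"error: {var_error:.3f} ({pct_error:.2f}%), StDev {sqrt(var_error):.3f}")
print(f"total: {var_total:.3f}, StDev {sqrt(var_total):.3f}")
```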
CASE STUDY -2
Estimating variance components in Stata
Yulia Marchenko (2006)
The research problem is to estimate the variability of measurements among machines operated over several days. Four machines (b = 4) were selected for the study, with two measurements (r = 2) obtained from each machine on each of the 4 days (a = 4).
ANOVA TABLE
Variance estimates (%): 28.82, 37.23, 22.39, 11.54
CASE STUDY -3
Methods of variance component estimation
D. Rasch, O. Mašata(2006)
A data set was obtained by random number generation, with a = 100 sires and $n_i$ daughters per sire as given in Table 1. The data represent milk yields of heifers during the full first lactation, with an assumed heritability coefficient.
Table 1. Numbers of daughters of 100 sires
Estimates of the variance components from the data set by different methods
CONCLUSION
From this study we can conclude that variance component analysis helps to partition the total variation into different components. All the results are useful, naturally, but to be applicable to real-life situations they demand numerical values. It also helps to know the percentage contributions of the random factors to the variation of the dependent variable.
REFERENCES
Robinson, D. L. (1987). Estimation and use of variance components. Journal of the Royal Statistical Society, Series D (The Statistician), 36(1): 3-14.
Rasch, D. and Mašata, O. (2006). Methods of variance component estimation. Biometric Unit, Research Institute of Animal Production. Czech J. Anim. Sci., 51(6): 227–235.
Henderson, C. R. (1953). Estimation of variance and covariance components. Biometrics, 9: 226-252.
http://www.jstor.org/stable/2988267
Searle, S. R. (April 1994). An overview of variance component estimation. Biometrics Unit, Cornell University, Ithaca, N.Y., U.S.A., 14853.
Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. John Wiley and Sons, NY.
Marchenko, Y. (2006). Estimating variance components in Stata. The Stata Journal, 6(1): 1–21.
Solomon, P. J. (2005). Variance components. Encyclopedia of Biostatistics, Second Edition, Volume 8, pp. 5685–5697 (ISBN 0-470-84907-X).