variance component analysis by paravayya c pujeri

VARIANCE COMPONENT ANALYSIS by using ANOVA method

By PARAVAYYA C PUJERI, PGS14AGR6424

Flow Of Seminar

• Basic concepts
• Introduction
• Applications
• Assumptions
• Steps
• Examples
• Case study
• Conclusion
• References

• Factor: An independent variable defining groups of cases.

• Fixed factors: generally thought of as variables whose values of interest are all represented in the data file.

• Random factors: variables whose values in the data file can be considered a random sample from a larger population of values. They are useful for explaining excess variability in the dependent variable.

• Fixed effects: the effects attributable to a finite set of levels of a factor that occur in the data and that are there because we are interested in them.

• Random effects: the effects attributable to a (usually) infinite set of levels of a factor, of which only a random sample is deemed to occur in the data.

• Mixed effect model: Models that contain both fixed and random effects are called mixed effect models.

Basic concepts

VARIANCE COMPONENT ANALYSIS

Other names: components of variation, sources of variance, variance analysis, intraclass correlation, random effects models, analysis of nested data.

Definition: a technique for partitioning the total variation into different components, some of which are known and some of which are completely unknown.

• Variance components: a way to assess the amount of variation in a dependent variable that is associated with one or more random-effects variables.

The Variance Components procedure, for random-effects and mixed-effects models, estimates the contribution of each random effect to the variance of the dependent variable. This procedure is particularly interesting for the analysis of mixed models such as split plot, univariate repeated measures, and random block designs. By calculating variance components, we can determine where to focus attention in order to reduce the variance.

Example. At an agriculture school, weight gains for pigs in six different litters are measured after one month. The litter variable is a random factor with six levels. (The six litters studied are a random sample from a large population of pig litters.) The investigator finds that the variance in weight gain is attributable to the difference between litters much more than to the difference between pigs within a litter.

INTRODUCTION

Variance components models are a way to assess the amount of variation in a dependent variable that is associated with one or more random-effects variables. The variance components procedure estimates only variance components, not model regression coefficients.

Variance component analysis can be traced back to the works of the astronomers Airy (1861) and Chauvenet (1863). A modern interpretation of the one-way random model was given by R. A. Fisher (1918). Later, Henderson (1953) estimated variance components by "equating mean squares to their expected mean squares"; these methods are popularly known as Henderson's methods.

Historical development

APPLICATIONS

• Components of variance have been used widely in agricultural genetics and animal breeding:
  (1) to predict the breeding values of sires or dams and to predict the real producing abilities of cows,
  (2) to indicate sources of variation which should be considered in analyzing production records.
• In plant breeding
• In epidemiology and psychometric testing
• In engineering
• In environmental science, etc.

The observations are normally distributed (under some conditions this assumption can be relaxed) with each source of variance being constant for all subgroups (this may be true only after a transformation).

The values of the errors are independent of each other and of the variables in the model.

The errors have a normal distribution with a mean of 0.

The data are completely balanced; this means that all similar subgroups have the same number of observations (more complex methods allow estimation of variance components from unbalanced data).

Variance Components assumes:

• Analysis of variance (ANOVA)
• Maximum likelihood (ML)
• Minimum norm quadratic unbiased estimator (MINQUE)
• Restricted maximum likelihood (REML)

Four different methods are available for estimating the variance components:

The ANOVA method is the oldest and simplest method of estimating variance components. It first computes sums of squares and expected mean squares for all effects, following the general linear model approach. A system of linear equations is then established by equating the mean squares of the random effects to their expected mean squares. The unknowns in these equations are the variance components and the residual variance. Any solution to this system of linear equations, if one exists, constitutes a set of estimates for the variance components.

Estimating the variance components: - by using ANOVA method
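The ANOVA method described above can be sketched in a few lines for the simplest case, a balanced one-way random-effects layout. This is only an illustrative sketch: the function name `anova_variance_components` and the toy data are hypothetical and do not come from the slides.

```python
from statistics import mean


def anova_variance_components(groups):
    """ANOVA (method-of-moments) estimates for a balanced one-way random model.

    `groups` is a list of equally sized lists of observations (hypothetical input format).
    """
    s = len(groups)            # number of groups
    n = len(groups[0])         # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes balanced data")

    group_means = [mean(g) for g in groups]
    grand_mean = mean(group_means)  # equals the overall mean because groups are equal-sized

    # Sums of squares between groups and within groups
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    # Mean squares
    ms_between = ss_between / (s - 1)
    ms_within = ss_within / (s * (n - 1))

    # Equate mean squares to their expectations and solve:
    #   E(MS between) = n * var_between + var_error,  E(MS within) = var_error
    var_error = ms_within
    var_between = (ms_between - ms_within) / n  # can come out negative in small samples

    return {
        "ms_between": ms_between,
        "ms_within": ms_within,
        "var_between": var_between,
        "var_error": var_error,
    }


# Toy data (made up for illustration): 3 groups, 3 observations each
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
result = anova_variance_components(toy)
print(result)
```

Because the estimates come from solving linear equations rather than from a constrained optimisation, the between-group estimate can be negative; the sketch returns it unchanged rather than truncating it at zero.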

• It is easy to calculate.
• It is easy to understand.
• It has very few basic assumptions; for example, when the group effect is a random variable, the resulting estimators are unbiased.
• Variance components can be calculated with software such as SPSS, SAS, Stata and MARK.
• The idea behind variance component methods is very simple: to decompose the overall variance in a phenotype into particular sources.

ADVANTAGES

Source    Df     Sum of Squares   Mean Squares   EMS
Mean      1      SSM
Between   s-1    SSA
Within    N-s    SSE
Total     N      SST

Calculation steps

Consider, for example, the completely randomized design (or one-way classification) of s groups with n observations in each. The usual model equation for $y_{ij}$, the $j$th observation in the $i$th group, is

$$y_{ij} = \mu + \alpha_i + e_{ij}$$

for $i = 1, 2, \ldots, s$ and $j = 1, 2, \ldots, n$, with $\mu$ representing an overall mean, $\alpha_i$ the effect of the $i$th group (a random variable), and $e_{ij}$ the random error.

ANOVA

Here $y_{ij}$ is the value of the $j$th observation in the $i$th group.

The sum of squares due to the mean is N times the grand mean squared.

The sum of squares due to a particular effect is the sum, over all observations, of the squared estimated effect in each observation.
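Written out for the balanced one-way layout, these verbal definitions correspond to the formulas below. The formulas are not in the slides as transcribed; the notation $\bar{y}_{i.}$ (group mean) and $\bar{y}_{..}$ (grand mean) is assumed here, with $N = sn$.

```latex
\begin{aligned}
SSM &= N\,\bar{y}_{..}^{2}, \\
SSA &= n\sum_{i=1}^{s}\left(\bar{y}_{i.}-\bar{y}_{..}\right)^{2}, \\
SSE &= \sum_{i=1}^{s}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{i.}\right)^{2}, \\
SST &= \sum_{i=1}^{s}\sum_{j=1}^{n} y_{ij}^{2} \;=\; SSM + SSA + SSE .
\end{aligned}
```

With this convention the total sum of squares is uncorrected (it carries N degrees of freedom), so the within-group sum of squares can also be obtained by subtraction as SST − SSM − SSA.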

Form the ANOVA table

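Variance estimates
• Var(between) = (MSA − MSE)/n, obtained by equating MSA to its expected value $n\sigma_s^2 + \sigma_e^2$ and MSE to $\sigma_e^2$
• Var(error) = MSE, since E(MSE) = $\sigma_e^2$

Source    Df     Sum of Squares   Mean Squares   EMS
Mean      1      SSM              SSM
Between   s-1    SSA              MSA            $n\sigma_s^2 + \sigma_e^2$
Error     N-s    SSE              MSE            $\sigma_e^2$
Total     N      SST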

Three castings fabricated in the same facility were randomly selected. Each casting was broken into individual bars, and ten randomly selected bars from each casting were tested. The interest is in identifying the variation in tensile strength caused by castings in the facility and by bars within a casting, not in the mean differences among the three castings.

EXAMPLE:


Row cast 1 cast 2 cast 3

1 88.0 85.9 94.2

2 88.0 88.6 91.5

3 94.8 90.0 92.0

4 90.0 87.1 96.5

5 93.0 85.6 95.6

6 89.0 86.0 93.8

7 86.0 91.0 92.5

8 92.9 89.6 93.2

9 89.0 93.0 96.2

10 93.0 87.5 92.5

The statistical model identifying the two sources of variation in this random-effects experiment is

$$y_{ij} = \mu + \alpha_i + e_{ij}, \quad i = 1, 2, \ldots, t; \; j = 1, 2, \ldots, r,$$

where $\mu$ is the process mean, the $\alpha_i$ are the random effects due to castings, and the $e_{ij}$ are the random errors due to bars within castings.

The distributional assumptions are $\alpha_i \sim N(0, \sigma_s^2)$ and $e_{ij} \sim N(0, \sigma_e^2)$, with both independent.

The total variance of an observation may be expressed as

$$\sigma_y^2 = \sigma_s^2 + \sigma_e^2,$$

where $\sigma_s^2$ and $\sigma_e^2$ are the two variance components.

The ANOVA table and expected mean squares for the random-effects model:

Source                      Df    SS    MS                EMS
Mean                        1     SSM   SSM
Among castings              t-1   SSA   MSA = SSA/(t-1)   $\sigma_e^2 + r\sigma_s^2$
Among bars within casting   N-t   SSW   MSW = SSW/(N-t)   $\sigma_e^2$
Total                       N     SST

• Grand mean = ȳ.. = 90.86

• SSM = 30 × (90.86)² = 247666.188

• SST = 88² + 88² + … + 92.5² = 247970.4

• SSA = 10 × ((90.35 − 90.86)² + (88.43 − 90.86)² + (93.8 − 90.86)²) = 148.086

• SSW = 247970.4 − 247666.188 − 148.086 = 156.126

Source    DF   SS           MS          EMS
Mean      1    247666.188   247666.19
Casting   2    148.086      74.043      $\sigma_e^2 + 10.00\,\sigma_s^2$
Error     27   156.126      5.78        $\sigma_e^2$ (estimate 5.78)
Total     30   247970.4

Source    Est. value   %
Casting   6.826        54.14
Error     5.78         45.86
Total     12.61

Variance Components

$\sigma_s^2$ is the variance due to castings.

$\sigma_e^2$ is the random error variance due to bars.
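The step from the sums of squares to the variance components in this example can be retraced with a short calculation. The sketch below takes the sums of squares reported on the slides as given (it does not recompute them from the raw data); the variable names are chosen here for illustration.

```python
# Design of the casting example: t castings, r bars tested per casting
t, r = 3, 10

# Sums of squares as reported in the example
ss_among_castings = 148.086
ss_within_castings = 156.126

# Mean squares: divide by the degrees of freedom (t - 1 and N - t, with N = t * r)
ms_among = ss_among_castings / (t - 1)
ms_within = ss_within_castings / (t * r - t)

# Solve E(MSA) = var_error + r * var_casting and E(MSW) = var_error
var_error = ms_within
var_casting = (ms_among - ms_within) / r
var_total = var_casting + var_error

# Percentage contribution of each component to the total variance
pct_casting = 100 * var_casting / var_total
pct_error = 100 * var_error / var_total

print(f"MSA = {ms_among:.3f}, MSW = {ms_within:.3f}")
print(f"casting: {var_casting:.3f} ({pct_casting:.2f}%)")
print(f"error:   {var_error:.3f} ({pct_error:.2f}%)")
print(f"total:   {var_total:.3f}")
```

The percentages are computed from unrounded intermediate values, so they can differ in the second decimal place from a hand calculation that rounds the components or the total first.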

ANOVA TABLE

variance component analysis in SPSS

► To run a Variance Components analysis, from the menus choose:

Analyze > General Linear Model > Variance Components

► Select Amount spent as the dependent variable.
► Select Who shopping for and Use coupons as fixed factors.
► Select Store ID as a random factor.
► Click Model.
► Select Interaction from the Build Term(s) drop-down list and add the interaction term to the model.
► Click Continue.
► Click Options in the Variance Components dialog box.
► Select ANOVA as the method.
► Select Sums of squares and Expected mean squares in the Display group.
► Click Continue.
► Click OK in the Variance Components dialog box.

This table displays variance estimates for each of the variance components.

We can use this table to work out how much each component contributes to the total variance.

In this example, Var(STOREID) = 665.237 and Var(Error) = 3835.388.

Thus the store effect explains 665.237/(665.237 + 3835.388) = 14.78% of the random variation, and error accounts for the remaining 85.22%.
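The same percentage calculation applies to any set of estimated components. A minimal sketch, using the two estimates quoted above; the helper name `percent_of_total` is hypothetical and is not part of SPSS.

```python
def percent_of_total(components):
    """Return each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


# Estimates quoted from the SPSS output in this example
shares = percent_of_total({"STOREID": 665.237, "Error": 3835.388})

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```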

RESULT in output of SPSS

Example in SPSS

CASE STUDIES


A study of a chromatographic method for determining malathion (Wernimont, 1985).

CASE STUDY-1

In this study ten labs participated; each lab received a subsample of a technical-grade malathion (Tech), two wettable powders (25% WP and 50% WP), an emulsifiable concentrate (58% EC), and a dust.

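The statistical model is $y_{ij} = \mu + L_i + e_{ij}$, where $L_i$ is the random effect of the $i$th laboratory and $e_{ij}$ is the random error of the $j$th replicate.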


Row lab Rep WP25 WP50

1 1 1 26.17 50.76

2 1 2 26.22 50.67

3 1 3 25.85 50.81

4 1 4 25.80 50.72

5 2 1 26.44 50.82

6 2 2 26.57 50.90

7 2 3 25.80 51.04

8 2 4 26.06 50.96

9 3 1 26.95 52.53

10 3 2 26.91 52.54

11 3 3 26.98 52.55

12 3 4 26.91 52.47

13 5 1 26.23 50.20

14 5 2 26.00 50.47

15 5 3 26.22 50.39

16 5 4 26.18 50.43

17 6 1 25.45 51.65

18 6 2 25.62 51.67


19 6 3 27.01 51.72

20 6 4 25.72 52.07

21 7 1 26.14 50.53

22 7 2 26.78 50.75

23 7 3 26.04 49.99

24 7 4 25.97 50.92

25 8 1 25.70 50.00

26 8 2 25.90 50.30

27 8 3 25.80 50.50

28 8 4 25.70 50.60

29 9 1 26.13 50.26

30 9 2 26.13 50.36

31 9 3 25.91 50.97

32 9 4 25.86 50.44

33 10 1 26.22 50.23

34 10 2 26.20 50.27

35 10 3 25.84 50.29

36 10 4 25.84 49.97

Raw Data for the Malathion Interlaboratory Study


Analysis of Variance for WP50%_1

Source DF SS MS F P

laborato 8 19.1570 2.3946 50.958 0.000

Error 27 1.2688 0.0470

Total 35 20.4258

Variance Components

Source Var Comp. % of Total StDev

laborato 0.587 92.59 0.766

Error 0.047 7.41 0.217

Total 0.634 0.796

Expected Mean Squares

1 laborato 1.00(2) + 4.00(1)

2 Error 1.00(2)

$\sigma_y^2 = \sigma_L^2 + \sigma_e^2$

This clearly indicates that 92.6% of the total variance of each observation is between-lab variance; that is, the lab averages are very different.
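The figures in the output above can be retraced from the raw data. The sketch below applies the one-way ANOVA method to the WP50 column as listed in the data table (nine labs appear there, four replicates each); the variable names are chosen here for illustration.

```python
# WP50 measurements by lab, copied from the raw data table (4 replicates per lab)
wp50 = {
    1: [50.76, 50.67, 50.81, 50.72],
    2: [50.82, 50.90, 51.04, 50.96],
    3: [52.53, 52.54, 52.55, 52.47],
    5: [50.20, 50.47, 50.39, 50.43],
    6: [51.65, 51.67, 51.72, 52.07],
    7: [50.53, 50.75, 49.99, 50.92],
    8: [50.00, 50.30, 50.50, 50.60],
    9: [50.26, 50.36, 50.97, 50.44],
    10: [50.23, 50.27, 50.29, 49.97],
}

n = 4                      # replicates per lab
labs = len(wp50)           # number of labs in the table
N = labs * n               # total number of observations

grand_mean = sum(sum(values) for values in wp50.values()) / N
lab_means = {lab: sum(values) / n for lab, values in wp50.items()}

# Sums of squares between labs and within labs
ss_lab = n * sum((m - grand_mean) ** 2 for m in lab_means.values())
ss_error = sum((y - lab_means[lab]) ** 2 for lab, values in wp50.items() for y in values)

# Mean squares and F ratio
ms_lab = ss_lab / (labs - 1)
ms_error = ss_error / (N - labs)
f_ratio = ms_lab / ms_error

# ANOVA estimates: E(MS lab) = var_error + n * var_lab, E(MS error) = var_error
var_error = ms_error
var_lab = (ms_lab - ms_error) / n
pct_lab = 100 * var_lab / (var_lab + var_error)

print(f"SS lab = {ss_lab:.4f}, SS error = {ss_error:.4f}, F = {f_ratio:.2f}")
print(f"var(lab) = {var_lab:.3f}, var(error) = {var_error:.3f}, lab share = {pct_lab:.2f}%")
```

The coefficient n = 4 used to solve for the lab component is the same coefficient that appears in the expected mean squares line of the output, 1.00(2) + 4.00(1).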

CASE STUDY -2

Estimating variance components in Stata

Yulia Marchenko (2006)

The research problem is to estimate the variability of measurements among machines operated over several days. Four machines (b = 4) were selected for the study, with two measurements (r = 2) obtained from each machine on each of the 4 days (a = 4).

(%) 28.82 37.23 22.39 11.54

Variance estimates

ANOVA TABLE

CASE STUDY -3

Methods of variance component estimation

D. Rasch, O. Mašata(2006)

A data set was obtained by random number generation, with a = 100 sires and n_i daughters per sire as given in Table 1. The simulated trait is the milk yield of heifers during the full first lactation, with an assumed heritability coefficient.

Table 1. Numbers of daughters of 100 sires

Estimates of the variance components from the data set by different methods

CONCLUSION

From this study we can conclude that variance component analysis helps to partition the total variation into different components. All the results are useful, but to be applicable to real-life situations they require numerical values. The analysis also helps to determine the percentage contribution of each random factor to the variation of the dependent variable.

Dorothy L. Robinson (1987). Estimation and Use of Variance Components. Journal of the Royal Statistical Society, Series D (The Statistician), Vol. 36, No. 1, pp. 3-14.

D. Rasch, O. Mašata (2006). Methods of variance component estimation. Biometric Unit, Research Institute of Animal Production. Czech J. Anim. Sci., 51 (6): 227–235.

REFERENCES

Henderson, C.R. (1953). Estimation of variance and covariance components. Biometrics 9: 226-252.

http://www.jstor.org/stable/2988267 .


Shayle R. Searle (April 1994). An Overview of Variance Component Estimation. Biometrics Unit, Cornell University, Ithaca, N.Y., U.S.A., 14853.

Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. John Wiley and Sons, NY.

Yulia Marchenko (2006). Estimating variance components in Stata. The Stata Journal 6 (1): 1–21.

P.J. Solomon (2005). Variance Components. Encyclopedia of Biostatistics, Second Edition, Volume 8, pp. 5685–5697 (ISBN 0-470-84907-X).
