design of experiments notes_iit delhi



  • Analysis of Variance and Design of Experiments-I

    MODULE - I

    LECTURE - 1: SOME RESULTS ON LINEAR ALGEBRA, MATRIX THEORY AND DISTRIBUTIONS

    Dr. Shalabh
    Department of Mathematics and Statistics
    Indian Institute of Technology Kanpur


  • We need some basic knowledge to understand the topics in analysis of variance.

    Vectors

    A vector $Y$ is an ordered n-tuple of real numbers. A vector can be expressed as a row vector or a column vector:

    $Y = (y_1, y_2, \ldots, y_n)'$ is a column vector of order $n \times 1$, and $Y' = (y_1, y_2, \ldots, y_n)$ is a row vector of order $1 \times n$.

    If $y_i = 0$ for all $i = 1, 2, \ldots, n$, then $Y' = (0, 0, \ldots, 0)$ is called the null vector.

    If

    $X = (x_1, x_2, \ldots, x_n)', \quad Y = (y_1, y_2, \ldots, y_n)', \quad Z = (z_1, z_2, \ldots, z_n)',$

    then

    $X + Y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)' \quad \text{and} \quad kY = (ky_1, ky_2, \ldots, ky_n)'.$

  • For vectors $X$, $Y$, $Z$ of the same order and a scalar $k$:

    $(X + Y) + Z = X + (Y + Z)$

    $X'(Y + Z) = X'Y + X'Z$

    $(kX)'Y = k(X'Y) = X'(kY)$

    $k(X + Y) = kX + kY$

    $X'Y = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n.$

    Orthogonal vectors

    Two vectors $X$ and $Y$ are said to be orthogonal if $X'Y = Y'X = 0$. The null vector is orthogonal to every vector $X$ and is the only such vector.

    Linear combination

    If $x_1, x_2, \ldots, x_m$ are m vectors of the same order and $k_1, k_2, \ldots, k_m$ are scalars, then

    $t = \sum_{i=1}^{m} k_i x_i$

    is called a linear combination of $x_1, x_2, \ldots, x_m$.

  • Linear independence

    If $X_1, X_2, \ldots, X_m$ are m vectors, they are said to be linearly independent if the only scalars $k_1, k_2, \ldots, k_m$ satisfying

    $\sum_{i=1}^{m} k_i X_i = 0$

    are $k_i = 0$ for all $i = 1, 2, \ldots, m$. If there exist $k_1, k_2, \ldots, k_m$, with at least one $k_i$ nonzero, such that $\sum_{i=1}^{m} k_i x_i = 0$, then $x_1, x_2, \ldots, x_m$ are said to be linearly dependent.

    Any set of vectors containing the null vector is linearly dependent. Any set of non-null pair-wise orthogonal vectors is linearly independent. If m > 1 vectors are linearly dependent, it is always possible to express at least one of them as a linear combination of the others.
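    A quick numerical way to check linear independence (a sketch using NumPy, not part of the original notes): stack the vectors as columns of a matrix and compare its rank with the number of vectors.

        import numpy as np

        # Three vectors of order 3 x 1, stacked as columns of a matrix
        x1 = np.array([1.0, 0.0, 1.0])
        x2 = np.array([0.0, 1.0, 1.0])
        x3 = np.array([1.0, 1.0, 2.0])   # x3 = x1 + x2, so the set is dependent

        A = np.column_stack([x1, x2, x3])

        # The vectors are linearly independent iff rank equals the number of vectors
        independent = np.linalg.matrix_rank(A) == A.shape[1]
        print(independent)   # False, since x3 is a linear combination of x1 and x2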

  • Linear function

    Let $K = (k_1, k_2, \ldots, k_m)'$ be an $m \times 1$ vector of scalars and $X = (x_1, x_2, \ldots, x_m)'$ be an $m \times 1$ vector of variables. Then

    $K'X = \sum_{i=1}^{m} k_i x_i$

    is called a linear function or linear form. The vector $K$ is called the coefficient vector.

    For example, the mean of $x_1, x_2, \ldots, x_m$ can be expressed as

    $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i = \frac{1}{m}(1, 1, \ldots, 1) X = \frac{1}{m} 1_m' X$

    where $1_m$ is an $m \times 1$ vector with all elements unity.

    Contrast

    The linear function $K'X = \sum_{i=1}^{m} k_i x_i$ is called a contrast in $x_1, x_2, \ldots, x_m$ if $\sum_{i=1}^{m} k_i = 0$.

    For example, the linear functions $x_1 - x_2$, $x_1 + x_2 - 2x_3$ and $\frac{x_1 + x_2}{2} - x_3$ are contrasts.

    A linear function $K'X$ is a contrast if and only if it is orthogonal to the linear function $\sum_{i=1}^{m} x_i$ or to the linear function $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$.

    The contrasts $x_1 - x_j$ are linearly independent for all $j = 2, 3, \ldots, m$.

    Every contrast in $x_1, x_2, \ldots, x_m$ can be written as a linear combination of the (m - 1) contrasts $x_1 - x_2, x_1 - x_3, \ldots, x_1 - x_m$.
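    A small numerical illustration (a sketch, not from the notes): a coefficient vector defines a contrast exactly when its elements sum to zero, which is the same as being orthogonal to the vector of ones.

        import numpy as np

        ones = np.ones(3)

        # Coefficient vectors for the linear functions x1 - x2 and x1 + x2 - 2*x3
        k1 = np.array([1.0, -1.0, 0.0])
        k2 = np.array([1.0, 1.0, -2.0])

        for k in (k1, k2):
            # A contrast has coefficients summing to zero, i.e., k is orthogonal to 1_m
            print(np.isclose(k.sum(), 0.0), np.isclose(k @ ones, 0.0))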

  • Matrix

    A matrix is a rectangular array of real numbers. For example,

    $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$

    is a matrix of order $m \times n$ with m rows and n columns.

    If m = n, then A is called a square matrix.

    If $a_{ij} = 0$ for $i \neq j$, then A is a diagonal matrix and is denoted as $A = \mathrm{diag}(a_{11}, a_{22}, \ldots, a_{nn}).$

    If m = n (square matrix) and $a_{ij} = 0$ for i > j, then A is called an upper triangular matrix. On the other hand, if m = n and $a_{ij} = 0$ for i < j, then A is called a lower triangular matrix.

    If A is an $m \times n$ matrix, then the matrix obtained by writing the rows of A as columns and the columns of A as rows is called the transpose of A and is denoted as $A'$. If $A' = A$, then A is a symmetric matrix. If $A' = -A$, then A is a skew-symmetric matrix.

    A matrix whose elements are all equal to zero is called a null matrix.

    An identity matrix is a square matrix of order p whose diagonal elements are unity (ones) and all the off-diagonal elements are zero. It is denoted as $I_p$.

  • If A and B are matrices of order $m \times n$, then

    $(A + B)' = A' + B'.$

    If A and B are matrices of order $m \times n$ and $n \times p$ respectively and k is any scalar, then

    $(AB)' = B'A' \quad \text{and} \quad (kA)B = A(kB) = k(AB) = kAB.$

    If the orders of the matrices A, B and C are $m \times n$, $n \times p$ and $n \times p$ respectively, then

    $A(B + C) = AB + AC.$

    If the orders of the matrices A, B and C are $m \times n$, $n \times p$ and $p \times q$ respectively, then

    $(AB)C = A(BC).$

    If A is a matrix of order $m \times n$, then

    $I_m A = A I_n = A.$

  • Trace of a matrix

    The trace of an $n \times n$ matrix A, denoted as tr(A) or trace(A), is defined to be the sum of all the diagonal elements of A, i.e.,

    $tr(A) = \sum_{i=1}^{n} a_{ii}.$

    If A is of order $m \times n$ and B is of order $n \times m$, then

    $tr(AB) = tr(BA).$

    If A is an $n \times n$ matrix and P is any nonsingular $n \times n$ matrix, then

    $tr(A) = tr(P^{-1}AP).$

    If P is an orthogonal matrix, then

    $tr(A) = tr(P'AP).$

    If A and B are $n \times n$ matrices and a and b are scalars, then

    $tr(aA + bB) = a\,tr(A) + b\,tr(B).$

    If A is an $m \times n$ matrix, then

    $tr(A'A) = tr(AA') = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^2,$

    and $tr(A'A) = tr(AA') = 0$ if and only if A = 0.

    If A is an $n \times n$ matrix, then $tr(A') = tr(A).$
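    A quick numerical check of the trace identities above (a sketch using NumPy; the matrices are arbitrary examples, not from the notes):

        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((3, 4))
        B = rng.standard_normal((4, 3))

        # tr(AB) = tr(BA) even though AB (3x3) and BA (4x4) have different orders
        print(np.isclose(np.trace(A @ B), np.trace(B @ A)))

        # tr(A'A) = tr(AA') = sum of squared elements of A
        print(np.isclose(np.trace(A.T @ A), (A ** 2).sum()))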

  • Rank of a matrix

    The rank of a matrix A of order $m \times n$ is the number of linearly independent rows (equivalently, columns) in A.

    If B is another matrix of order $n \times q$, then $rank(AB) \leq \min(rank(A), rank(B))$. If A and B are of the same order, then $rank(A + B) \leq rank(A) + rank(B).$

    $rank(AA') = rank(A'A) = rank(A) = rank(A').$

    The rank of A is equal to the maximum order of all nonsingular square sub-matrices of A.

    A is of full row rank if rank(A) = m < n. A is of full column rank if rank(A) = n < m.

    A square matrix of order m is called non-singular if it has full rank.

  • Inverse of a matrix

    The inverse of a square matrix A of order m is a square matrix of order m, denoted as $A^{-1}$, such that $A^{-1}A = AA^{-1} = I_m$.

    The inverse of A exists if and only if A is non-singular.

    If A is non-singular, then $(A^{-1})^{-1} = A$ and $(A')^{-1} = (A^{-1})'.$

    If A and B are non-singular matrices of the same order, then their product, if defined, is also non-singular and $(AB)^{-1} = B^{-1}A^{-1}.$

    Idempotent matrix

    A square matrix A is called idempotent if $A^2 = AA = A.$

    The eigenvalues of an idempotent matrix are 1 or 0.

    If A is an $n \times n$ idempotent matrix with $rank(A) = r \leq n$, then $trace(A) = rank(A) = r.$

    If A is idempotent and of full rank n, then $A = I_n.$

    If A and B are idempotent and AB = BA, then AB is also idempotent.

    If A is idempotent, then (I - A) is also idempotent and $A(I - A) = (I - A)A = 0.$
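    The idempotent matrices that appear later in these notes are projection matrices of the form $X(X'X)^{-1}X'$. A small sketch (assuming an arbitrary full column rank X, not taken from the notes) verifying the properties listed above:

        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.standard_normal((6, 2))            # full column rank design-type matrix

        H = X @ np.linalg.inv(X.T @ X) @ X.T       # projection matrix, idempotent
        M = np.eye(6) - H                          # residual-maker matrix, also idempotent

        print(np.allclose(H @ H, H), np.allclose(M @ M, M))      # A^2 = A
        print(np.isclose(np.trace(H), np.linalg.matrix_rank(H))) # trace = rank
        print(np.allclose(H @ M, np.zeros((6, 6))))              # A(I - A) = 0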

  • Analysis of Variance and Design of Experiments-I

    MODULE - I

    LECTURE - 2: SOME RESULTS ON LINEAR ALGEBRA, MATRIX THEORY AND DISTRIBUTIONS

    Dr. Shalabh
    Department of Mathematics and Statistics
    Indian Institute of Technology Kanpur

  • Quadratic forms

    If A is a given matrix of order $m \times n$ and X and Y are two given vectors of orders $m \times 1$ and $n \times 1$ respectively, then the form is given by

    $X'AY = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} x_i y_j$

    where the $a_{ij}$'s are the nonstochastic elements of A.

    If A is a square matrix of order m and X = Y, then

    $X'AX = a_{11}x_1^2 + \cdots + a_{mm}x_m^2 + (a_{12} + a_{21})x_1 x_2 + \cdots + (a_{m-1,m} + a_{m,m-1})x_{m-1}x_m.$

    If A is also symmetric, then

    $X'AX = a_{11}x_1^2 + \cdots + a_{mm}x_m^2 + 2a_{12}x_1 x_2 + \cdots + 2a_{m-1,m}x_{m-1}x_m = \sum_{i=1}^{m}\sum_{j=1}^{m} a_{ij} x_i x_j$

    is called a quadratic form in the m variables $x_1, x_2, \ldots, x_m$, or a quadratic form in X.

    To every quadratic form corresponds a symmetric matrix and vice versa. The matrix A is called the matrix of the quadratic form.

    The quadratic form $X'AX$ and the matrix A of the form are called

    positive definite if $X'AX > 0$ for all $X \neq 0$;
    positive semi-definite if $X'AX \geq 0$ for all $X \neq 0$;
    negative definite if $X'AX < 0$ for all $X \neq 0$;
    negative semi-definite if $X'AX \leq 0$ for all $X \neq 0$.
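    A short sketch (not from the notes) that evaluates a quadratic form and checks positive definiteness through the eigenvalues of its symmetric matrix:

        import numpy as np

        A = np.array([[2.0, 1.0],
                      [1.0, 3.0]])          # symmetric matrix of the quadratic form
        x = np.array([1.0, -2.0])

        quad = x @ A @ x                    # X'AX = 2*1 + 3*4 + 2*1*1*(-2) = 10
        print(quad)

        # A symmetric matrix is positive definite iff all its eigenvalues are > 0
        eigvals = np.linalg.eigvalsh(A)
        print(np.all(eigvals > 0))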

  • If A is a positive semi-definite matrix, then $a_{ii} \geq 0$, and if $a_{ii} = 0$, then $a_{ij} = 0$ for all j and $a_{ji} = 0$ for all j.

    If P is any nonsingular matrix and A is any positive definite matrix (or positive semi-definite matrix), then $P'AP$ is also a positive definite matrix (or positive semi-definite matrix).

    A matrix A is positive definite if and only if there exists a non-singular matrix P such that $A = P'P.$

    A positive definite matrix is a nonsingular matrix.

    If A is an $m \times n$ matrix and $rank(A) = m < n$, then $AA'$ is positive definite and $A'A$ is positive semi-definite.

    If A is an $m \times n$ matrix and $rank(A) = k < m < n$, then both $A'A$ and $AA'$ are positive semi-definite.

  • Simultaneous linear equations

    The set of m linear equations in n unknowns $x_1, x_2, \ldots, x_n$ with scalars $a_{ij}$ and $b_i$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$) of the form

    $a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1$
    $a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2$
    $\vdots$
    $a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m$

    can be formulated as

    $AX = b$

    where

    $A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$

    is an $m \times n$ real matrix of known scalars called the coefficient matrix,

    $X = (x_1, x_2, \ldots, x_n)'$ is an $n \times 1$ vector of variables and $b = (b_1, b_2, \ldots, b_m)'$ is an $m \times 1$ real vector of known scalars.
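    A minimal sketch (assuming an arbitrary system, not taken from the notes) that checks consistency through rank(A) versus rank([A, b]) and then solves the system:

        import numpy as np

        A = np.array([[1.0, 2.0],
                      [3.0, 4.0]])
        b = np.array([5.0, 6.0])

        # A solution exists iff rank(A) = rank([A, b]); it is unique iff that rank = n
        augmented = np.column_stack([A, b])
        consistent = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented)
        print(consistent)

        if consistent:
            x = np.linalg.solve(A, b)       # unique solution since A is nonsingular
            print(x, np.allclose(A @ x, b))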

  • If A is an $n \times n$ nonsingular matrix, then AX = b has a unique solution.

    Let B = [A, b] be the augmented matrix. A solution to AX = b exists if and only if rank(A) = rank(B).

    If A is an $m \times n$ matrix of rank m, then AX = b has a solution.

    If AX = b is consistent, then AX = b has a unique solution if and only if rank(A) = n.

    The linear homogeneous system AX = 0 has a solution other than X = 0 if and only if rank(A) < n.

    Orthogonal matrix

    A square matrix A is called an orthogonal matrix if $A'A = AA' = I$, or equivalently if $A^{-1} = A'.$

    An orthogonal matrix is non-singular.

    If A is orthogonal, then $A'$ is also orthogonal.

    If A is an $n \times n$ matrix and P is an $n \times n$ orthogonal matrix, then the determinants of A and $P'AP$ are the same.

    If $a_{ii}$ is the ith diagonal element of an orthogonal matrix, then $-1 \leq a_{ii} \leq 1.$

    Let the $n \times n$ matrix A be partitioned as $A = [a_1, a_2, \ldots, a_n]$ where $a_i$ is an $n \times 1$ vector of the elements of the ith column of A. A necessary and sufficient condition for A to be an orthogonal matrix is given by the following:

    (i) $a_i'a_i = 1$ for $i = 1, 2, \ldots, n$;
    (ii) $a_i'a_j = 0$ for $i \neq j = 1, 2, \ldots, n.$

  • Random vectors

    Let $Y_1, Y_2, \ldots, Y_n$ be n random variables. Then $Y = (Y_1, Y_2, \ldots, Y_n)'$ is called a random vector.

    The mean vector of Y is

    $E(Y) = (E(Y_1), E(Y_2), \ldots, E(Y_n))'.$

    The covariance matrix or dispersion matrix of Y is

    $Var(Y) = \begin{pmatrix} Var(Y_1) & Cov(Y_1, Y_2) & \cdots & Cov(Y_1, Y_n) \\ Cov(Y_2, Y_1) & Var(Y_2) & \cdots & Cov(Y_2, Y_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(Y_n, Y_1) & Cov(Y_n, Y_2) & \cdots & Var(Y_n) \end{pmatrix}$

    which is a symmetric matrix.

    If $Y_1, Y_2, \ldots, Y_n$ are pair-wise uncorrelated, then the covariance matrix is a diagonal matrix.

    If $Var(Y_i) = \sigma^2$ for all $i = 1, 2, \ldots, n$, then $Var(Y) = \sigma^2 I_n.$
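    A simulation sketch (an illustration with an arbitrary dispersion matrix, not part of the notes) of the mean vector and dispersion matrix:

        import numpy as np

        rng = np.random.default_rng(2)

        mu = np.array([1.0, 2.0, 3.0])
        Sigma = np.array([[2.0, 0.5, 0.0],
                          [0.5, 1.0, 0.3],
                          [0.0, 0.3, 1.5]])     # symmetric dispersion matrix

        # Draw many realisations of the random vector Y with mean mu and dispersion Sigma
        Y = rng.multivariate_normal(mu, Sigma, size=200_000)

        print(np.round(Y.mean(axis=0), 2))          # approximates the mean vector
        print(np.round(np.cov(Y, rowvar=False), 2)) # approximates the dispersion matrix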

  • Linear function of random variables

    If $Y_1, Y_2, \ldots, Y_n$ are n random variables and $k_1, k_2, \ldots, k_n$ are scalars, then $\sum_{i=1}^{n} k_i Y_i$ is called a linear function of the random variables $Y_1, Y_2, \ldots, Y_n$.

    If $Y = (Y_1, Y_2, \ldots, Y_n)'$ and $K = (k_1, k_2, \ldots, k_n)'$, then $K'Y = \sum_{i=1}^{n} k_i Y_i$,

    the mean of $K'Y$ is $E(K'Y) = K'E(Y) = \sum_{i=1}^{n} k_i E(Y_i)$, and

    the variance of $K'Y$ is $Var(K'Y) = K'\,Var(Y)\,K.$

    Multivariate normal distribution

    A random vector $Y = (Y_1, Y_2, \ldots, Y_n)'$ has a multivariate normal distribution with mean vector $\mu = (\mu_1, \mu_2, \ldots, \mu_n)'$ and dispersion matrix $\Sigma$ if its probability density function is

    $f(Y \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(Y - \mu)'\Sigma^{-1}(Y - \mu)\right]$

    assuming $\Sigma$ is a nonsingular matrix.

  • Chi-square distribution

    If $Y_1, Y_2, \ldots, Y_k$ are identically and independently distributed random variables following the normal distribution with common mean 0 and common variance 1, then the distribution of $\sum_{i=1}^{k} Y_i^2$ is called the $\chi^2$-distribution with k degrees of freedom.

    The probability density function of the $\chi^2$-distribution with k degrees of freedom is given as

    $f(x) = \frac{1}{\Gamma(k/2)\,2^{k/2}}\, x^{\frac{k}{2}-1} \exp\left(-\frac{x}{2}\right); \quad 0 < x < \infty.$

    If $Y_1, Y_2, \ldots, Y_k$ are independently distributed following the normal distribution with common mean 0 and common variance $\sigma^2$, then $\frac{1}{\sigma^2}\sum_{i=1}^{k} Y_i^2$ has a $\chi^2$-distribution with k degrees of freedom.

    If the random variables $Y_1, Y_2, \ldots, Y_k$ are normally distributed with non-null means $\mu_1, \mu_2, \ldots, \mu_k$ but common variance 1, then the distribution of $\sum_{i=1}^{k} Y_i^2$ has a non-central $\chi^2$-distribution with k degrees of freedom and non-centrality parameter

    $\lambda = \sum_{i=1}^{k} \mu_i^2.$

    If $Y_1, Y_2, \ldots, Y_k$ are independently distributed following the normal distribution with means $\mu_1, \mu_2, \ldots, \mu_k$ but common variance $\sigma^2$, then $\frac{1}{\sigma^2}\sum_{i=1}^{k} Y_i^2$ has a non-central $\chi^2$-distribution with k degrees of freedom and non-centrality parameter $\lambda = \frac{1}{\sigma^2}\sum_{i=1}^{k} \mu_i^2.$
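    A small simulation (an illustration, not from the notes) of the defining property of the central chi-square distribution; its mean equals the degrees of freedom k and its variance equals 2k:

        import numpy as np

        rng = np.random.default_rng(3)
        k = 5                                        # degrees of freedom

        # Sum of squares of k independent standard normal variables ~ chi-square(k)
        Y = rng.standard_normal((100_000, k))
        U = (Y ** 2).sum(axis=1)

        print(round(U.mean(), 2))   # close to E(U) = k = 5
        print(round(U.var(), 2))    # close to Var(U) = 2k = 10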

  • If U has a chi-square distribution with k degrees of freedom, then $E(U) = k$ and $Var(U) = 2k.$

    If U has a non-central chi-square distribution with k degrees of freedom and non-centrality parameter $\lambda$, then $E(U) = k + \lambda$ and $Var(U) = 2k + 4\lambda.$

    If $U_1, U_2, \ldots, U_k$ are independently distributed random variables with each $U_i$ having a non-central chi-square distribution with $n_i$ degrees of freedom and non-centrality parameter $\lambda_i$, $i = 1, 2, \ldots, k$, then $\sum_{i=1}^{k} U_i$ has a non-central chi-square distribution with $\sum_{i=1}^{k} n_i$ degrees of freedom and non-centrality parameter $\sum_{i=1}^{k} \lambda_i.$

    Let $X = (X_1, X_2, \ldots, X_n)'$ have a multivariate normal distribution with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Then $X'AX$ is distributed as non-central $\chi^2$ with k degrees of freedom if and only if $A\Sigma$ is an idempotent matrix of rank k.

    Let $X = (X_1, X_2, \ldots, X_n)'$ have a multivariate normal distribution with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let the two quadratic forms be such that $X'A_1X$ is distributed as $\chi^2$ with $n_1$ degrees of freedom and non-centrality parameter $\mu'A_1\mu$, and $X'A_2X$ is distributed as $\chi^2$ with $n_2$ degrees of freedom and non-centrality parameter $\mu'A_2\mu$. Then $X'A_1X$ and $X'A_2X$ are independently distributed if $A_1\Sigma A_2 = 0.$

  • t-distribution

    If

    X has a normal distribution with mean 0 and variance 1,
    Y has a $\chi^2$-distribution with n degrees of freedom, and
    X and Y are independent random variables,

    then the distribution of the statistic

    $T = \frac{X}{\sqrt{Y/n}}$

    is called the t-distribution with n degrees of freedom. The probability density function of T is

    $f_T(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}; \quad -\infty < t < \infty.$

    If the mean of X is non-zero, say $\mu$, then the distribution of $\frac{X}{\sqrt{Y/n}}$ is called the non-central t-distribution with n degrees of freedom and non-centrality parameter $\mu.$

  • F-distribution

    If X and Y are independent random variables with $\chi^2$-distributions with m and n degrees of freedom respectively, then the distribution of the statistic

    $F = \frac{X/m}{Y/n}$

    is called the F-distribution with m and n degrees of freedom. The probability density function of F is

    $f_F(f) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right)\Gamma\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{m/2} f^{\frac{m}{2}-1} \left(1 + \frac{m}{n}f\right)^{-\frac{m+n}{2}}; \quad 0 < f < \infty.$

    If X has a non-central chi-square distribution with m degrees of freedom and non-centrality parameter $\lambda$, Y has a $\chi^2$-distribution with n degrees of freedom, and X and Y are independent random variables, then the distribution of $F = \frac{X/m}{Y/n}$ is the non-central F-distribution with m and n degrees of freedom and non-centrality parameter $\lambda.$

  • Analysis of Variance and Design of Experiments-I

    MODULE - I

    LECTURE - 3: SOME RESULTS ON LINEAR ALGEBRA, MATRIX THEORY AND DISTRIBUTIONS

    Dr. Shalabh
    Department of Mathematics and Statistics
    Indian Institute of Technology Kanpur

  • Linear model

    Suppose there are n observations. In the linear model, we assume that these observations are the values taken by n random variables $Y_1, Y_2, \ldots, Y_n$ satisfying the following conditions:

    $E(Y_i)$ is a linear combination of p unknown parameters $\beta_1, \beta_2, \ldots, \beta_p$ with

    $E(Y_i) = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p, \quad i = 1, 2, \ldots, n,$

    where the $x_{ij}$'s are known constants;

    $Y_1, Y_2, \ldots, Y_n$ are uncorrelated and normally distributed with variance $Var(Y_i) = \sigma^2.$

    The linear model can be rewritten by introducing independent normal random variables $\varepsilon_i$ following $N(0, \sigma^2)$ as

    $Y_i = x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p + \varepsilon_i, \quad i = 1, 2, \ldots, n.$

    These equations can be written using matrix notation as

    $Y = X\beta + \varepsilon$

    where Y is an $n \times 1$ vector of observations, X is an $n \times p$ matrix of the n observations on each of the p variables $X_1, X_2, \ldots, X_p$, $\beta$ is a $p \times 1$ vector of parameters and $\varepsilon$ is an $n \times 1$ vector of random error components with $\varepsilon \sim N(0, \sigma^2 I_n)$. Here Y is called the study or dependent variable, $X_1, X_2, \ldots, X_p$ are called explanatory or independent variables and $\beta_1, \beta_2, \ldots, \beta_p$ are called regression coefficients.

  • Alternatively, since $Y \sim N(X\beta, \sigma^2 I)$, the linear model can also be expressed in the expectation form as a normal random variable Y with

    $E(Y) = X\beta, \quad Var(Y) = \sigma^2 I.$

    Note that $\beta$ and $\sigma^2$ are unknown but X is known.

    Estimable function

    A linear parametric function $\lambda'\beta$ of the parameters is said to be an estimable parametric function (or estimable) if there exists a linear function $\ell'Y$ of the random variables $Y = (Y_1, Y_2, \ldots, Y_n)'$ such that

    $E(\ell'Y) = \lambda'\beta$

    with $\ell = (\ell_1, \ell_2, \ldots, \ell_n)'$ and $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_p)'$ being vectors of known scalars.

  • Best Linear Unbiased Estimates (BLUE)

    The unbiased minimum variance linear estimate $\ell'Y$ of an estimable function $\lambda'\beta$ is called the best linear unbiased estimate of $\lambda'\beta$.

    Suppose $\ell_1'Y$ and $\ell_2'Y$ are the BLUEs of $\lambda_1'\beta$ and $\lambda_2'\beta$ respectively. Then $(a_1\ell_1 + a_2\ell_2)'Y$ is the BLUE of $(a_1\lambda_1 + a_2\lambda_2)'\beta.$

    If $\lambda'\beta$ is estimable, its best estimate is $\lambda'\hat{\beta}$ where $\hat{\beta}$ is any solution of the equations

    $X'X\hat{\beta} = X'Y.$

    Least squares estimation

    The least squares estimate of $\beta$ in $Y = X\beta + \varepsilon$ is the value $\hat{\beta}$ of $\beta$ which minimizes the error sum of squares $\varepsilon'\varepsilon$. Let

    $S = \varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta) = Y'Y - 2\beta'X'Y + \beta'X'X\beta.$

    Minimizing S with respect to $\beta$ involves

    $\frac{\partial S}{\partial \beta} = 0 \;\Rightarrow\; X'X\hat{\beta} = X'Y$

    which is termed the normal equation.

  • This normal equation has a unique solution given by

    $\hat{\beta} = (X'X)^{-1}X'Y$

    assuming $rank(X) = p$. Note that $\frac{\partial^2 S}{\partial \beta\, \partial \beta'} = 2X'X$ is a positive definite matrix, so $\hat{\beta} = (X'X)^{-1}X'Y$ is the value of $\beta$ which minimizes $\varepsilon'\varepsilon$ and is termed the ordinary least squares estimator of $\beta$.

    In this case, $\beta_1, \beta_2, \ldots, \beta_p$ are estimable and consequently all the linear parametric functions $\lambda'\beta$ are estimable:

    $E(\hat{\beta}) = (X'X)^{-1}X'E(Y) = (X'X)^{-1}X'X\beta = \beta$

    $Var(\hat{\beta}) = (X'X)^{-1}X'\,Var(Y)\,X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$

    If $\lambda'\hat{\beta}$ and $\mu'\hat{\beta}$ are the estimates of $\lambda'\beta$ and $\mu'\beta$ respectively, then

    $Var(\lambda'\hat{\beta}) = \lambda'\,Var(\hat{\beta})\,\lambda = \sigma^2[\lambda'(X'X)^{-1}\lambda]$ and $Cov(\lambda'\hat{\beta}, \mu'\hat{\beta}) = \sigma^2[\lambda'(X'X)^{-1}\mu].$

    $Y - X\hat{\beta}$ is called the residual vector and $E(Y - X\hat{\beta}) = 0.$

  • Linear model with correlated observations

    In the linear model

    $Y = X\beta + \varepsilon$

    with $E(\varepsilon) = 0$, $Var(\varepsilon) = \Sigma$ and $\varepsilon$ normally distributed, we find

    $E(Y) = X\beta, \quad Var(Y) = \Sigma.$

    Assuming $\Sigma$ to be positive definite, we can write

    $\Sigma^{-1} = P'P$

    where P is a nonsingular matrix. Premultiplying $Y = X\beta + \varepsilon$ by P, we get

    $PY = PX\beta + P\varepsilon$, or $Y^* = X^*\beta + \varepsilon^*$

    where $Y^* = PY$, $X^* = PX$ and $\varepsilon^* = P\varepsilon$.

    Note that $\beta$ and $\sigma^2$ are unknown but X is known.

  • Distribution of $\lambda'Y$

    In the linear model $Y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$, consider a linear function $\lambda'Y$, which is normally distributed with

    $E(\lambda'Y) = \lambda'X\beta, \quad Var(\lambda'Y) = \sigma^2(\lambda'\lambda).$

    Then

    $\frac{\lambda'Y}{\sigma\sqrt{\lambda'\lambda}} \sim N\left(\frac{\lambda'X\beta}{\sigma\sqrt{\lambda'\lambda}},\, 1\right).$

    Further, $\frac{(\lambda'Y)^2}{\sigma^2\lambda'\lambda}$ has a non-central chi-square distribution with one degree of freedom and non-centrality parameter $\frac{(\lambda'X\beta)^2}{\sigma^2\lambda'\lambda}.$

    Degrees of freedom

    A linear function $\lambda'Y$ of the observations ($\lambda \neq 0$) is said to carry one degree of freedom. A set of r linear functions $L'Y$, where L is an $r \times n$ matrix, is said to have M degrees of freedom if there exist M linearly independent functions in the set and no more. Alternatively, the degrees of freedom carried by the set $L'Y$ equals rank(L). When the set $L'Y$ consists of the estimates of a set of linear parametric functions $\Lambda'\beta$, the degrees of freedom of the set $L'Y$ will also be called the degrees of freedom for the estimates of $\Lambda'\beta$.

  • Sum of squares

    If $\lambda'Y$ is a linear function of the observations, then the projection of Y on $\lambda$ is the vector $\frac{\lambda'Y}{\lambda'\lambda}\lambda$. The squared length of this projection is called the sum of squares (SS) due to $\lambda'Y$ and is given by $\frac{(\lambda'Y)^2}{\lambda'\lambda}$. Since $\lambda'Y$ has one degree of freedom, the SS due to $\lambda'Y$ has one degree of freedom.

    The sums of squares and the degrees of freedom arising out of mutually orthogonal sets of functions can be added together to give the sum of squares and degrees of freedom for the set of all the functions together, and vice versa.

    Let $X = (X_1, X_2, \ldots, X_n)'$ have a multivariate normal distribution with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let the two quadratic forms be such that $X'A_1X$ is distributed as $\chi^2$ with $n_1$ degrees of freedom and non-centrality parameter $\mu'A_1\mu$, and $X'A_2X$ is distributed as $\chi^2$ with $n_2$ degrees of freedom and non-centrality parameter $\mu'A_2\mu$. Then $X'A_1X$ and $X'A_2X$ are independently distributed if $A_1\Sigma A_2 = 0.$

  • Fisher-Cochran theorem

    If $X = (X_1, X_2, \ldots, X_n)'$ has a multivariate normal distribution with mean vector $\mu$ and positive definite covariance matrix $\sigma^2 I$, and

    $\frac{1}{\sigma^2}X'X = Q_1 + Q_2 + \cdots + Q_k$

    where $Q_i = \frac{1}{\sigma^2}X'A_iX$ with $rank(A_i) = N_i$, $i = 1, 2, \ldots, k$, then the $Q_i$'s are independently distributed with non-central chi-square distributions with $N_i$ degrees of freedom and non-centrality parameters $\frac{1}{\sigma^2}\mu'A_i\mu$ if and only if $\sum_{i=1}^{k} N_i = n$, in which case

    $\frac{1}{\sigma^2}\mu'\mu = \frac{1}{\sigma^2}\sum_{i=1}^{k}\mu'A_i\mu.$

  • Derivatives of quadratic and linear forms

    Let $X = (x_1, x_2, \ldots, x_n)'$ and let f(X) be any function of the n independent variables $x_1, x_2, \ldots, x_n$. Then

    $\frac{\partial f(X)}{\partial X} = \left(\frac{\partial f(X)}{\partial x_1}, \frac{\partial f(X)}{\partial x_2}, \ldots, \frac{\partial f(X)}{\partial x_n}\right)'.$

    If $K = (k_1, k_2, \ldots, k_n)'$ is a vector of constants, then

    $\frac{\partial K'X}{\partial X} = K.$

    If A is an $n \times n$ matrix, then

    $\frac{\partial X'AX}{\partial X} = (A + A')X,$ which equals $2AX$ when A is symmetric.

    Independence of linear and quadratic forms

    Let Y be an $n \times 1$ vector having a multivariate normal distribution $N(\mu, I)$ and B be an $m \times n$ matrix. Then the $m \times 1$ vector linear form BY is independent of the quadratic form $Y'AY$ if BA = 0, where A is a symmetric matrix of known elements.

    Let Y be an $n \times 1$ vector having a multivariate normal distribution $N(\mu, \Sigma)$ with $rank(\Sigma) = n$. If $B\Sigma A = 0$, then the quadratic form $Y'AY$ is independent of the linear form BY, where B is an $m \times n$ matrix.

  • Analysis of Variance and Design of Experiments-I

    MODULE - II

    LECTURE - 4: GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

    Dr. Shalabh
    Department of Mathematics and Statistics
    Indian Institute of Technology Kanpur

  • Regression model for the general linear hypothesis

    Let $Y_1, Y_2, \ldots, Y_n$ be a sequence of n independent random variables associated with responses. Then we can write

    $E(Y_i) = \sum_{j=1}^{p} \beta_j x_{ij}, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p,$
    $Var(Y_i) = \sigma^2.$

    This is the linear model in expectation form where $\beta_1, \beta_2, \ldots, \beta_p$ are the unknown parameters and the $x_{ij}$'s are the known values of the independent covariates $X_1, X_2, \ldots, X_p$.

    Alternatively, the linear model can be expressed as

    $Y_i = \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \ldots, n; \; j = 1, 2, \ldots, p,$

    where the $\varepsilon_i$'s are identically and independently distributed random error components with mean 0 and variance $\sigma^2$, i.e.,

    $E(\varepsilon_i) = 0, \quad Var(\varepsilon_i) = \sigma^2, \quad Cov(\varepsilon_i, \varepsilon_j) = 0 \; (i \neq j).$

    In matrix notation, the linear model can be expressed as

    $Y = X\beta + \varepsilon$

    where

    $Y = (Y_1, Y_2, \ldots, Y_n)'$ is an $n \times 1$ vector of observations on the response variable,

    $X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}$

    is the $n \times p$ matrix of n observations on the p independent covariates $X_1, X_2, \ldots, X_p$,

  • $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$ is a $p \times 1$ vector of unknown regression parameters (or regression coefficients) associated with $X_1, X_2, \ldots, X_p$ respectively, and

    $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random errors or disturbances.

    We assume that $E(\varepsilon) = 0$, the covariance matrix $V(\varepsilon) = E(\varepsilon\varepsilon') = \sigma^2 I$, and $rank(X) = p.$

    In the context of analysis of variance and design of experiments,

    the matrix X is termed the design matrix;

    the unknown $\beta_1, \beta_2, \ldots, \beta_p$ are termed effects;

    the covariates $X_1, X_2, \ldots, X_p$ are counter variables or indicator variables, where $x_{ij}$ counts the number of times the effect $\beta_j$ occurs in the ith observation $x_i$;

    $x_{ij}$ mostly takes the values 1 or 0, but not always.

    The value $x_{ij} = 1$ indicates the presence of the effect $\beta_j$ in $x_i$ and $x_{ij} = 0$ indicates the absence of the effect $\beta_j$ in $x_i$.

    Note that in the linear regression model, the covariates are usually continuous variables.

    When some of the covariates are counter variables and the rest are continuous variables, then the model is called a mixed model and is used in the analysis of covariance.

  • Relationship between the regression model and the analysis of variance model

    The same linear model is used in linear regression analysis as well as in the analysis of variance. So it is important to understand the role of the linear model in the context of linear regression analysis and analysis of variance.

    Consider the multiple linear model

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon.$

    In the case of the analysis of variance model,

    the one-way classification considers only one covariate,
    the two-way classification model considers two covariates,
    the three-way classification model considers three covariates, and so on.

    If $\alpha$, $\beta$ and $\gamma$ denote the effects associated with the covariates X, Z and W, which are counter variables, then

    One-way model: $Y = \mu + \alpha X + \varepsilon$
    Two-way model: $Y = \mu + \alpha X + \beta Z + \varepsilon$
    Three-way model: $Y = \mu + \alpha X + \beta Z + \gamma W + \varepsilon$, and so on.

    Consider an example of agricultural yield. The study variable denotes the yield, which depends on various covariates $X_1, X_2, \ldots, X_p$. In the case of regression analysis, the covariates are different variables like temperature, quantity of fertilizer, amount of irrigation, etc.

  • Now consider the case of the one-way model and try to understand its interpretation in terms of the multiple regression model.

    The covariate X is now measured at different levels, e.g., if X is the quantity of fertilizer then suppose there are p possible values, say 1 Kg., 2 Kg., ..., p Kg. Then $X_1, X_2, \ldots, X_p$ denote these p values in the following way.

    The linear model now can be expressed as

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$

    by defining

    $X_1 = 1$ if the effect of 1 Kg. fertilizer is present, $X_1 = 0$ if the effect of 1 Kg. fertilizer is absent;
    $X_2 = 1$ if the effect of 2 Kg. fertilizer is present, $X_2 = 0$ if the effect of 2 Kg. fertilizer is absent;
    $\vdots$
    $X_p = 1$ if the effect of p Kg. fertilizer is present, $X_p = 0$ if the effect of p Kg. fertilizer is absent.

    If the effect of 1 Kg. of fertilizer is present, then the other effects will obviously be absent and the linear model is expressible as

    $Y = \beta_0 + \beta_1(X_1 = 1) + \beta_2(X_2 = 0) + \cdots + \beta_p(X_p = 0) + \varepsilon = \beta_0 + \beta_1 + \varepsilon.$

    If the effect of 2 Kg. of fertilizer is present, then

    $Y = \beta_0 + \beta_1(X_1 = 0) + \beta_2(X_2 = 1) + \cdots + \beta_p(X_p = 0) + \varepsilon = \beta_0 + \beta_2 + \varepsilon.$

  • If the effect of p Kg. of fertilizer is present, then

    $Y = \beta_0 + \beta_1(X_1 = 0) + \beta_2(X_2 = 0) + \cdots + \beta_p(X_p = 1) + \varepsilon = \beta_0 + \beta_p + \varepsilon$

    and so on.

    If the experiment with 1 Kg. of fertilizer is repeated $n_1$ times, then $n_1$ observations on the response variable are recorded, which can be represented as

    $Y_{11} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \varepsilon_{11}$
    $Y_{12} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \varepsilon_{12}$
    $\vdots$
    $Y_{1n_1} = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \varepsilon_{1n_1}.$

    If $X_2 = 1$ is repeated $n_2$ times, then on the same lines $n_2$ observations on the response variable are recorded, which can be represented as

    $Y_{21} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \cdots + \beta_p \cdot 0 + \varepsilon_{21}$
    $Y_{22} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \cdots + \beta_p \cdot 0 + \varepsilon_{22}$
    $\vdots$
    $Y_{2n_2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 1 + \cdots + \beta_p \cdot 0 + \varepsilon_{2n_2}.$

  • The experiment is continued, and if $X_p = 1$ is repeated $n_p$ times, then on the same lines

    $Y_{p1} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 1 + \varepsilon_{p1}$
    $Y_{p2} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 1 + \varepsilon_{p2}$
    $\vdots$
    $Y_{pn_p} = \beta_0 + \beta_1 \cdot 0 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 1 + \varepsilon_{pn_p}.$

    All these $n_1, n_2, \ldots, n_p$ observations can be represented as

    $\begin{pmatrix} y_{11} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ \vdots \\ y_{2n_2} \\ \vdots \\ y_{p1} \\ \vdots \\ y_{pn_p} \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix} + \begin{pmatrix} \varepsilon_{11} \\ \vdots \\ \varepsilon_{1n_1} \\ \varepsilon_{21} \\ \vdots \\ \varepsilon_{2n_2} \\ \vdots \\ \varepsilon_{p1} \\ \vdots \\ \varepsilon_{pn_p} \end{pmatrix}$

    or $Y = X\beta + \varepsilon.$
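    A small sketch (not from the notes) that builds this kind of counter-variable design matrix for a one-way layout, assuming p = 3 levels with group sizes n1 = 2, n2 = 3, n3 = 2:

        import numpy as np

        group_sizes = [2, 3, 2]                      # n1, n2, n3 replications per level
        p = len(group_sizes)
        n = sum(group_sizes)

        X = np.zeros((n, p + 1))
        X[:, 0] = 1.0                                # column for the general mean beta_0

        row = 0
        for j, nj in enumerate(group_sizes):
            X[row:row + nj, j + 1] = 1.0             # indicator column for level j+1
            row += nj

        print(X)
        # Each row has a 1 in the intercept column and a 1 in exactly one level column,
        # reproducing the 0/1 pattern of the design matrix shown above.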

  • In the two-way analysis of variance model there are two covariates and the linear model is expressible as

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \gamma_1 Z_1 + \gamma_2 Z_2 + \cdots + \gamma_q Z_q + \varepsilon$

    where $X_1, X_2, \ldots, X_p$ denote, e.g., the p levels of the quantity of fertilizer, say 1 Kg., 2 Kg., ..., p Kg., and $Z_1, Z_2, \ldots, Z_q$ denote, e.g., the q levels of irrigation, say 10 Cms., 20 Cms., ..., 10q Cms. etc. The levels $X_1, X_2, \ldots, X_p$ and $Z_1, Z_2, \ldots, Z_q$ are defined as counter variables indicating the presence or absence of the effect, as in the earlier case. If the effects of $X_1$ and $Z_1$ are present, i.e., 1 Kg. of fertilizer and 10 Cms. of irrigation are used, then the linear model is written as

    $Y = \beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 0 + \cdots + \beta_p \cdot 0 + \gamma_1 \cdot 1 + \gamma_2 \cdot 0 + \cdots + \gamma_q \cdot 0 + \varepsilon = \beta_0 + \beta_1 + \gamma_1 + \varepsilon.$

    If $X_2 = 1$ and $Z_2 = 1$ are used, then the model is $Y = \beta_0 + \beta_2 + \gamma_2 + \varepsilon.$

    The design matrix can be written accordingly, as in the one-way analysis of variance case.

    In the three-way analysis of variance model,

    $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \gamma_1 Z_1 + \cdots + \gamma_q Z_q + \delta_1 W_1 + \cdots + \delta_r W_r + \varepsilon.$

  • The regression parameters $\beta_j$'s can be fixed or random.

    If all the $\beta_j$'s are unknown constants, they are called parameters of the model and the model is called a fixed-effects model or model I. The objective in this case is to make inferences about the parameters and the error variance $\sigma^2$.

    If $x_{ij} = 1$ for some j, for all $i = 1, 2, \ldots, n$, then $\beta_j$ is termed an additive constant. In this case, $\beta_j$ occurs with every observation and so it is also called the general mean effect.

    If all the $\beta_j$'s are observable random variables except the additive constant, then the linear model is termed a random-effects model, model II or variance components model. The objective in this case is to make inferences about the variances of the $\beta_j$'s, i.e., $\sigma^2_{\beta_1}, \sigma^2_{\beta_2}, \ldots, \sigma^2_{\beta_p}$, the error variance $\sigma^2$, and/or certain functions of them.

    If some parameters are fixed and some are random variables, then the model is called a mixed-effects model or model III. In a mixed-effects model, at least one $\beta_j$ is a constant and at least one $\beta_j$ is a random variable. The objective is to make inferences about the fixed-effect parameters, the variances of the random effects and the error variance $\sigma^2$.

  • Analysis of Variance and Design of Experiments-I

    MODULE - II

    LECTURE - 5: GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

    Dr. Shalabh
    Department of Mathematics and Statistics
    Indian Institute of Technology Kanpur

  • Analysis of variance

    Analysis of variance is a body of statistical methods for analyzing measurements assumed to be structured as

    $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$

    where the $x_{ij}$'s are integers, generally 0 or 1, usually indicating the absence or presence of the effects $\beta_j$, and the $\varepsilon_i$'s are assumed to be identically and independently distributed with mean 0 and variance $\sigma^2$. It may be noted that the $\varepsilon_i$'s can additionally be assumed to follow a normal distribution $N(0, \sigma^2)$. This is needed for the maximum likelihood estimation of parameters from the beginning of the analysis, but in least squares estimation it is needed only when conducting tests of hypotheses and constructing confidence intervals for the parameters. The least squares method does not require any knowledge of the distribution, such as normality, up to the stage of estimation of parameters.

    We need some basic concepts to develop the tools.

    Least squares estimate of $\beta$

    Let $y_1, y_2, \ldots, y_n$ be a sample of observations on $Y_1, Y_2, \ldots, Y_n$. The least squares estimate of $\beta$ is the value $\hat{\beta}$ of $\beta$ for which the sum of squares due to errors, i.e.,

    $S^2 = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta) = y'y + \beta'X'X\beta - 2\beta'X'y,$

  • is minimum, where $y = (y_1, y_2, \ldots, y_n)'$. Differentiating $S^2$ with respect to $\beta$ and setting it to zero, the normal equations are obtained as

    $\frac{dS^2}{d\beta} = 2X'X\beta - 2X'y = 0$

    or

    $X'X\beta = X'y.$

    If X has full rank, then $X'X$ has a unique inverse and the unique least squares estimate of $\beta$ is

    $\hat{\beta} = (X'X)^{-1}X'y,$

    which is the best linear unbiased estimator of $\beta$ in the sense of having minimum variance in the class of linear and unbiased estimators. If the rank of X is not full, then a generalized inverse is used for finding the inverse of $X'X$.

    If $L\beta$ is a linear parametric function, where $L = (\ell_1, \ell_2, \ldots, \ell_p)$ is a non-null vector, then the least squares estimate of $L\beta$ is $L\hat{\beta}$.

    A question arises: what are the conditions under which a linear parametric function $L\beta$ admits a unique least squares estimate in the general case?

    The concept of estimable function is needed to find such conditions.

  • Estimable functions

    A linear function $\lambda'\beta$ of the parameters with known $\lambda$ is said to be an estimable parametric function (or estimable) if there exists a linear function $L'Y$ of Y such that

    $E(L'Y) = \lambda'\beta$ for all $\beta \in R^p.$

    Note that not all parametric functions are estimable.

    The following results will be useful in understanding the further topics.

    Theorem 1

    A linear parametric function $L\beta$ admits a unique least squares estimate if and only if $L\beta$ is estimable.

    Theorem 2 (Gauss-Markov theorem)

    If the linear parametric function $L\beta$ is estimable, then the linear estimator $L\hat{\beta}$, where $\hat{\beta}$ is a solution of

    $X'X\hat{\beta} = X'Y,$

    is the best linear unbiased estimator of $L\beta$ in the sense of having minimum variance in the class of all linear and unbiased estimators of $L\beta$.

  • Theorem 3

    If the linear parametric functions $\ell_1'\beta, \ell_2'\beta, \ldots, \ell_k'\beta$ are estimable, then any linear combination of $\ell_1'\beta, \ell_2'\beta, \ldots, \ell_k'\beta$ is also estimable.

    Theorem 4

    All linear parametric functions in $\beta$ are estimable if and only if X has full rank.

    If X is not of full rank, then some linear parametric functions do not admit unbiased linear estimators and nothing can be inferred about them. The linear parametric functions which are not estimable are said to be confounded. A possible solution to this problem is to add linear restrictions on $\beta$ so as to reduce the linear model to full rank.

    Theorem 5

    Let $L_1\beta$ and $L_2\beta$ be two estimable parametric functions and let $L_1\hat{\beta}$ and $L_2\hat{\beta}$ be their least squares estimators. Then

    $Var(L_1\hat{\beta}) = \sigma^2 L_1(X'X)^{-1}L_1'$
    $Cov(L_1\hat{\beta}, L_2\hat{\beta}) = \sigma^2 L_1(X'X)^{-1}L_2'$

    assuming that X is a full rank matrix. If not, the generalized inverse of $X'X$ can be used in place of the unique inverse.

  • Estimator of $\sigma^2$ based on least squares estimation

    Consider an estimator of $\sigma^2$ as

    $\hat{\sigma}^2 = \frac{1}{n - p}(y - X\hat{\beta})'(y - X\hat{\beta})$
    $\quad = \frac{1}{n - p}[y - X(X'X)^{-1}X'y]'[y - X(X'X)^{-1}X'y]$
    $\quad = \frac{1}{n - p}\, y'[I - X(X'X)^{-1}X'][I - X(X'X)^{-1}X']y$
    $\quad = \frac{1}{n - p}\, y'[I - X(X'X)^{-1}X']y$

    where $[I - X(X'X)^{-1}X']$ is an idempotent matrix with trace

    $tr[I - X(X'X)^{-1}X'] = tr I_n - tr[X(X'X)^{-1}X']$
    $\quad = n - tr[(X'X)^{-1}X'X]$ (using the result $tr(AB) = tr(BA)$)
    $\quad = n - tr I_p = n - p.$

    Note that, using $E(y'Ay) = \mu'A\mu + \sigma^2 tr(A)$, we have

    $E(\hat{\sigma}^2) = \frac{\sigma^2}{n - p}\, tr[I - X(X'X)^{-1}X'] = \sigma^2$

    and so $\hat{\sigma}^2$ is an unbiased estimator of $\sigma^2.$
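    Continuing the earlier OLS sketch (simulated data, not from the notes), the unbiased estimator of $\sigma^2$ is the residual sum of squares divided by n - p:

        import numpy as np

        rng = np.random.default_rng(5)

        n, p = 200, 4
        X = rng.standard_normal((n, p))
        beta = np.array([2.0, 0.0, -1.0, 0.5])
        sigma = 1.5
        y = X @ beta + rng.normal(0.0, sigma, size=n)

        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ beta_hat

        sigma2_hat = (resid @ resid) / (n - p)     # unbiased estimator of sigma^2
        print(round(sigma2_hat, 2))                # should be near sigma^2 = 2.25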

  • Maximum likelihood estimation

    The least squares method does not use any distribution of the random variables in the estimation of parameters. We need the distributional assumption, in the case of least squares, only while constructing the tests of hypotheses and the confidence intervals. For maximum likelihood estimation, we need the distributional assumption from the beginning.

    Suppose $y_1, y_2, \ldots, y_n$ are independently and identically distributed following a normal distribution with mean $E(y_i) = \sum_{j=1}^{p}\beta_j x_{ij}$ and variance $Var(y_i) = \sigma^2$ ($i = 1, 2, \ldots, n$). Then the likelihood function of $y_1, y_2, \ldots, y_n$ is

    $L(y \mid \beta, \sigma^2) = \frac{1}{(2\pi)^{n/2}(\sigma^2)^{n/2}} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right]$

    where $y = (y_1, y_2, \ldots, y_n)'$. Then

    $\ln L(y \mid \beta, \sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$

    Differentiating the log-likelihood with respect to $\beta$ and $\sigma^2$, we have

    $\frac{\partial \ln L}{\partial \beta} = 0 \;\Rightarrow\; X'X\beta = X'y,$

    $\frac{\partial \ln L}{\partial \sigma^2} = 0 \;\Rightarrow\; \sigma^2 = \frac{1}{n}(y - X\beta)'(y - X\beta).$

  • Assuming the full rank of X, the normal equations are solved and the maximum likelihood estimators are obtained as

    $\tilde{\beta} = (X'X)^{-1}X'y$

    $\tilde{\sigma}^2 = \frac{1}{n}(y - X\tilde{\beta})'(y - X\tilde{\beta}) = \frac{1}{n}\, y'[I - X(X'X)^{-1}X']y.$

    The second order differentiation conditions can be checked and they are satisfied for $\tilde{\beta}$ and $\tilde{\sigma}^2$ to be the maximum likelihood estimators.

    Note that the maximum likelihood estimator $\tilde{\beta}$ is the same as the least squares estimator $\hat{\beta}$, and $\tilde{\beta}$ is an unbiased estimator of $\beta$, i.e., $E(\tilde{\beta}) = \beta$, like the least squares estimator, but $\tilde{\sigma}^2$ is not an unbiased estimator of $\sigma^2$, i.e., $E(\tilde{\sigma}^2) = \frac{n - p}{n}\sigma^2 \neq \sigma^2$, unlike the least squares estimator.

    Now we use the following theorems for developing the tests of hypotheses.

    Theorem 6

    Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Then $Y'AY$ follows a noncentral chi-square distribution with p degrees of freedom and noncentrality parameter $\mu'A\mu$, i.e., $\chi^2(p, \mu'A\mu)$, if and only if $A\Sigma$ is an idempotent matrix of rank p.

    Theorem 7

    Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(\mu, \Sigma)$ with mean vector $\mu$ and positive definite covariance matrix $\Sigma$. Let $Y'A_1Y$ follow $\chi^2(p_1, \mu'A_1\mu)$ and $Y'A_2Y$ follow $\chi^2(p_2, \mu'A_2\mu)$. Then $Y'A_1Y$ and $Y'A_2Y$ are independently distributed if $A_1\Sigma A_2 = 0.$

  • Theorem 8

    Let $Y = (Y_1, Y_2, \ldots, Y_n)'$ follow a multivariate normal distribution $N(X\beta, \sigma^2 I)$. Then the maximum likelihood (or least squares) estimator $L\hat{\beta}$ of the estimable linear parametric function $L\beta$ is independently distributed of $\hat{\sigma}^2$; $L\hat{\beta}$ follows $N(L\beta, \sigma^2 L(X'X)^{-1}L')$ and $\frac{n\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n - p)$, where rank(X) = p.

    Proof: Consider $\hat{\beta} = (X'X)^{-1}X'Y$; then

    $E(L\hat{\beta}) = L(X'X)^{-1}X'E(Y) = L(X'X)^{-1}X'X\beta = L\beta$

    $Var(L\hat{\beta}) = L\,Var(\hat{\beta})\,L' = \sigma^2 L(X'X)^{-1}L'.$

    Since $\hat{\beta}$ is a linear function of y and $L\hat{\beta}$ is a linear function of $\hat{\beta}$, $L\hat{\beta}$ follows a normal distribution $N(L\beta, \sigma^2 L(X'X)^{-1}L')$.

    Let $A = I - X(X'X)^{-1}X'$ and $B = L(X'X)^{-1}X'$; then $L\hat{\beta} = L(X'X)^{-1}X'Y = BY$ and

    $n\hat{\sigma}^2 = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = (Y - X\beta)'[I - X(X'X)^{-1}X'](Y - X\beta) = Y'AY$ (in the last form with $Y - X\beta$ in place of Y).

    So, using Theorem 6 with rank(A) = n - p, $\frac{n\hat{\sigma}^2}{\sigma^2}$ follows $\chi^2(n - p)$. Also

    $BA = L(X'X)^{-1}X'[I - X(X'X)^{-1}X'] = L(X'X)^{-1}X' - L(X'X)^{-1}X'X(X'X)^{-1}X' = 0.$

    So, using Theorem 7, $Y'AY$ and $BY$ are independently distributed.

  • Analysis of Variance and Design of Experiments-I

    MODULE - II

    LECTURE - 6: GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

    Dr. Shalabh
    Department of Mathematics and Statistics
    Indian Institute of Technology Kanpur

  • Tests of hypothesis in the linear regression model

    First we discuss the development of the tests of hypotheses concerning the parameters of a linear regression model. These tests of hypotheses will be used later in the development of tests based on the analysis of variance.

    Analysis of variance: the technique in the analysis of variance involves the breaking down of total variation into orthogonal components. Each orthogonal component represents the variation due to a particular factor contributing to the total variation.

    Model

    Let $Y_1, Y_2, \ldots, Y_n$ be independently distributed following a normal distribution with mean $E(Y_i) = \sum_{j=1}^{p}\beta_j x_{ij}$ and variance $\sigma^2$. Denoting $Y = (Y_1, Y_2, \ldots, Y_n)'$ as an $n \times 1$ column vector, such an assumption can be expressed in the form of a linear regression model

    $Y = X\beta + \varepsilon$

    where X is an $n \times p$ matrix, $\beta$ is a $p \times 1$ vector and $\varepsilon$ is an $n \times 1$ vector of disturbances with

    $E(\varepsilon) = 0$, $Cov(\varepsilon) = \sigma^2 I$, and $\varepsilon$ following a normal distribution.

    This implies that

    $E(Y) = X\beta, \quad E[(Y - X\beta)(Y - X\beta)'] = \sigma^2 I.$

    Now we consider four different types of tests of hypotheses.

    In the first two cases, we develop the likelihood ratio test for the null hypothesis related to the analysis of variance. Note that, later, we will derive the same test on the basis of the least squares principle also. An important idea behind the development of this test is to demonstrate that the test used in the analysis of variance can be derived using the least squares principle as well as the likelihood ratio test.

  • Consider the null hypothesis for testing $H_0: \beta = \beta^0$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)'$, $\beta^0 = (\beta_1^0, \beta_2^0, \ldots, \beta_p^0)'$ is specified and $\sigma^2$ is unknown.

    Case 1: Test of $H_0: \beta = \beta^0$

    This null hypothesis is equivalent to

    $H_0: \beta_1 = \beta_1^0, \; \beta_2 = \beta_2^0, \ldots, \beta_p = \beta_p^0.$

    Assume that all the $\beta_i$'s are estimable, i.e., rank(X) = p (full column rank). We now develop the likelihood ratio test.

    The (p + 1)-dimensional parametric space $\Omega$ is a collection of points $(\beta, \sigma^2)$ such that

    $\Omega = \{(\beta, \sigma^2): -\infty < \beta_i < \infty,\ \sigma^2 > 0,\ i = 1, 2, \ldots, p\}.$

    Under $H_0$, all the $\beta_i$'s are known, equal to the specified $\beta_i^0$, and $\Omega$ reduces to the one-dimensional space $\omega$ given by

    $\omega = \{(\beta^0, \sigma^2): \sigma^2 > 0\}.$

    The likelihood function of $y_1, y_2, \ldots, y_n$ is

    $L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$

  • The likelihood function is maximized over $\Omega$ when $\beta$ and $\sigma^2$ are substituted with their maximum likelihood estimators, i.e.,

    $\hat{\beta} = (X'X)^{-1}X'y, \quad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}).$

    Substituting $\hat{\beta}$ and $\hat{\sigma}^2$ in $L(y \mid \beta, \sigma^2)$ gives

    $\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}^2}(y - X\hat{\beta})'(y - X\hat{\beta})\right] = \left(\frac{n}{2\pi(y - X\hat{\beta})'(y - X\hat{\beta})}\right)^{n/2} \exp\left(-\frac{n}{2}\right).$

    Under $H_0$, the maximum likelihood estimator of $\sigma^2$ is

    $\hat{\sigma}_{\omega}^2 = \frac{1}{n}(y - X\beta^0)'(y - X\beta^0).$

    The maximum value of the likelihood function under $H_0$ is

    $\max_{\omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}_{\omega}^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}_{\omega}^2}(y - X\beta^0)'(y - X\beta^0)\right] = \left(\frac{n}{2\pi(y - X\beta^0)'(y - X\beta^0)}\right)^{n/2} \exp\left(-\frac{n}{2}\right).$

  • The likelihood ratio test statistic is

    $\lambda = \frac{\max_{\omega} L(y \mid \beta, \sigma^2)}{\max_{\Omega} L(y \mid \beta, \sigma^2)} = \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\beta^0)'(y - X\beta^0)}\right]^{n/2}$

    $= \left[\frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\hat{\beta} + X\hat{\beta} - X\beta^0)'(y - X\hat{\beta} + X\hat{\beta} - X\beta^0)}\right]^{n/2}$

    $= \left[\frac{1}{1 + \dfrac{(\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0)}{(y - X\hat{\beta})'(y - X\hat{\beta})}}\right]^{n/2}$

    $= \left(\frac{1}{1 + \dfrac{q_1}{q_2}}\right)^{n/2}$

    where

    $q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0)$ and $q_2 = (y - X\hat{\beta})'(y - X\hat{\beta}).$

    The expressions for $q_1$ and $q_2$ can be further simplified as follows.

  • Consider

    $q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0)$
    $\quad = [(X'X)^{-1}X'y - \beta^0]'X'X[(X'X)^{-1}X'y - \beta^0]$
    $\quad = [(X'X)^{-1}X'(y - X\beta^0)]'X'X[(X'X)^{-1}X'(y - X\beta^0)]$
    $\quad = (y - X\beta^0)'X(X'X)^{-1}X'X(X'X)^{-1}X'(y - X\beta^0)$
    $\quad = (y - X\beta^0)'X(X'X)^{-1}X'(y - X\beta^0),$

    $q_2 = (y - X\hat{\beta})'(y - X\hat{\beta})$
    $\quad = [y - X(X'X)^{-1}X'y]'[y - X(X'X)^{-1}X'y]$
    $\quad = y'[I - X(X'X)^{-1}X']y$
    $\quad = [(y - X\beta^0) + X\beta^0]'[I - X(X'X)^{-1}X'][(y - X\beta^0) + X\beta^0]$
    $\quad = (y - X\beta^0)'[I - X(X'X)^{-1}X'](y - X\beta^0).$

    The other two terms become zero using $[I - X(X'X)^{-1}X']X = 0.$

  • In order to find the decision rule for $H_0$ based on $\lambda$, first we need to find whether $\lambda$ is a monotonic increasing or decreasing function of $\frac{q_1}{q_2}$. So we proceed as follows.

    Let

    $g = \frac{q_1}{q_2}$, so that $\lambda = \left(\frac{1}{1 + g}\right)^{n/2} = (1 + g)^{-n/2};$

    then

    $\frac{d\lambda}{dg} = -\frac{n}{2}(1 + g)^{-\frac{n}{2} - 1}.$

    So as g increases, $\lambda$ decreases. Thus $\lambda$ is a monotonic decreasing function of $\frac{q_1}{q_2}$.

    The decision rule is to reject $H_0$ if $\lambda \leq \lambda_0$, where $\lambda_0$ is a constant to be determined on the basis of the size of the test. Let us simplify this in our context:

    $\lambda \leq \lambda_0$
    or $\left(\frac{1}{1 + g}\right)^{n/2} \leq \lambda_0$
    or $(1 + g)^{n/2} \geq \frac{1}{\lambda_0}$
    or $1 + g \geq \lambda_0^{-2/n}$
    or $g \geq \lambda_0^{-2/n} - 1$
    or $\frac{q_1}{q_2} \geq C$

    where C is a constant to be determined by the size condition of the test.

  • So reject $H_0$ whenever

    $\frac{q_1}{q_2} \geq C.$

    Note that the statistic $\frac{q_1}{q_2}$ can also be obtained by the least squares method, as follows. The least squares methodology will also be discussed in further lectures.

    $q_1 = (\hat{\beta} - \beta^0)'X'X(\hat{\beta} - \beta^0) = \min_{\omega}(y - X\beta)'(y - X\beta) - \min_{\Omega}(y - X\beta)'(y - X\beta),$

    i.e., $q_1$ is the sum of squares due to $H_0$ (the total sum of squares under $H_0$) minus the sum of squares due to error, and so $q_1$ is the sum of squares due to deviation from $H_0$, while $q_2$ is the sum of squares due to error.

  • Theorem 9

    Let

    $Z = Y - X\beta^0$
    $Q_1 = Z'X(X'X)^{-1}X'Z$
    $Q_2 = Z'[I - X(X'X)^{-1}X']Z.$

    Then $Q_1$ and $Q_2$ are independently distributed. Further, when $H_0$ is true, then $\frac{Q_1}{\sigma^2} \sim \chi^2(p)$ and $\frac{Q_2}{\sigma^2} \sim \chi^2(n - p)$, where $\chi^2(m)$ denotes the $\chi^2$ distribution with m degrees of freedom.

    Proof: Under $H_0$,

    $E(Z) = X\beta - X\beta^0 = 0$
    $Var(Z) = Var(Y) = \sigma^2 I.$

    Further, Z is a linear function of Y and Y follows a normal distribution. So $Z \sim N(0, \sigma^2 I).$

    The matrices $X(X'X)^{-1}X'$ and $[I - X(X'X)^{-1}X']$ are idempotent matrices. So

    $tr[X(X'X)^{-1}X'] = tr[(X'X)^{-1}X'X] = tr I_p = p$
    $tr[I - X(X'X)^{-1}X'] = tr I_n - tr[X(X'X)^{-1}X'] = n - p.$

    So, using Theorem 6, we can write that under $H_0$,

    $\frac{Q_1}{\sigma^2} \sim \chi^2(p)$ and $\frac{Q_2}{\sigma^2} \sim \chi^2(n - p),$

    where the degrees of freedom p and (n - p) are obtained from the traces of $X(X'X)^{-1}X'$ and $[I - X(X'X)^{-1}X']$ respectively.

    Since $[I - X(X'X)^{-1}X']X(X'X)^{-1}X' = 0$, using Theorem 7 the quadratic forms $Q_1$ and $Q_2$ are independent under $H_0$. Hence the theorem is proved.

  • Since $Q_1$ and $Q_2$ are independently distributed, under $H_0$

    $\frac{Q_1/p}{Q_2/(n - p)}$

    follows a central F-distribution, i.e.,

    $\frac{n - p}{p}\cdot\frac{Q_1}{Q_2} \sim F(p, n - p).$

    Hence the constant C in the likelihood ratio test is determined from $F_{1-\alpha}(p, n - p)$, where $F_{1-\alpha}(n_1, n_2)$ denotes the upper $100\alpha\%$ point of the F-distribution with $n_1$ and $n_2$ degrees of freedom; equivalently, $H_0$ is rejected when $\frac{n - p}{p}\cdot\frac{q_1}{q_2} \geq F_{1-\alpha}(p, n - p).$

    The computations of this test of hypothesis can be represented in the form of an analysis of variance table.

    ANOVA table for testing $H_0: \beta = \beta^0$

    Source of variation          Degrees of freedom    Sum of squares                       Mean squares         F-value
    Due to $\beta$ (deviation
    from $H_0$)                  p                     $q_1$                                $q_1/p$              $\frac{n-p}{p}\cdot\frac{q_1}{q_2}$
    Error                        n - p                 $q_2$                                $q_2/(n-p)$
    Total                        n                     $(y - X\beta^0)'(y - X\beta^0)$
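    A sketch (simulated data and SciPy, not from the notes) of the F statistic from this ANOVA table for testing $H_0: \beta = \beta^0$:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(6)

        n, p = 40, 3
        X = rng.standard_normal((n, p))
        beta_true = np.array([1.0, 0.5, -0.5])
        y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

        beta0 = np.zeros(p)                          # hypothesised value beta^0
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

        q1 = (beta_hat - beta0) @ (X.T @ X) @ (beta_hat - beta0)   # SS due to deviation from H0
        q2 = (y - X @ beta_hat) @ (y - X @ beta_hat)               # SS due to error

        F = (q1 / p) / (q2 / (n - p))
        p_value = stats.f.sf(F, p, n - p)
        print(round(F, 2), round(p_value, 4))        # reject H0 for large F / small p-value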

  • Analysis of Variance and Design of Experiments-I

    MODULE - II

    LECTURE - 7: GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE

    Dr. Shalabh
    Department of Mathematics and Statistics
    Indian Institute of Technology Kanpur

  • Case 2: Test of a subset of parameters, $H_0: \beta_k = \beta_k^0$ for $k = 1, 2, \ldots, r$ (with $r < p$), when the remaining parameters $\beta_{r+1}, \ldots, \beta_p$ and $\sigma^2$ are unknown

    The likelihood function is

    $L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right].$

    The maximum value of the likelihood function under $\Omega$ is obtained by substituting the maximum likelihood estimates of $\beta$ and $\sigma^2$, i.e.,

    $\hat{\beta} = (X'X)^{-1}X'y, \quad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta}),$

    as

    $\max_{\Omega} L(y \mid \beta, \sigma^2) = \left(\frac{1}{2\pi\hat{\sigma}^2}\right)^{n/2} \exp\left[-\frac{1}{2\hat{\sigma}^2}(y - X\hat{\beta})'(y - X\hat{\beta})\right] = \left(\frac{n}{2\pi(y - X\hat{\beta})'(y - X\hat{\beta})}\right)^{n/2} \exp\left(-\frac{n}{2}\right).$

• Now we find the maximum value of the likelihood function under H0. The model under H0 becomes

Y = X1β(1)⁰ + X2β(2) + ε.

The likelihood function under H0 is

L(y | β(2), σ²) = [1/(2πσ²)]^{n/2} exp[ -(1/(2σ²)) (y - X1β(1)⁰ - X2β(2))'(y - X1β(1)⁰ - X2β(2)) ]
                = [1/(2πσ²)]^{n/2} exp[ -(1/(2σ²)) (y* - X2β(2))'(y* - X2β(2)) ]

where y* = y - X1β(1)⁰. Note that β(2) and σ² are the unknown parameters. This likelihood function looks as if it were written for y* ~ N(X2β(2), σ²I). This helps in writing the maximum likelihood estimators of β(2) and σ² directly as

β̂(2) = (X2'X2)⁻¹X2'y*
σ̃² = (1/n) (y* - X2β̂(2))'(y* - X2β̂(2)).

Note that X2'X2 is a principal minor of X'X. Since X'X is a positive definite matrix, X2'X2 is also positive definite. Thus (X2'X2)⁻¹ exists and is unique.

Thus the maximum value of the likelihood function under H0 is obtained as

max_ω L(y* | β(2), σ²) = [1/(2πσ̃²)]^{n/2} exp[ -(1/(2σ̃²)) (y* - X2β̂(2))'(y* - X2β̂(2)) ]
                       = [ n / (2π (y* - X2β̂(2))'(y* - X2β̂(2))) ]^{n/2} exp(-n/2).

• The likelihood ratio test statistic for H0: β(1) = β(1)⁰ is

λ = max_ω L(y | β, σ²) / max_Ω L(y | β, σ²)
  = [ (y - Xβ̂)'(y - Xβ̂) / (y* - X2β̂(2))'(y* - X2β̂(2)) ]^{n/2}
  = [ (y - Xβ̂)'(y - Xβ̂) / { (y* - X2β̂(2))'(y* - X2β̂(2)) - (y - Xβ̂)'(y - Xβ̂) + (y - Xβ̂)'(y - Xβ̂) } ]^{n/2}
  = [ 1 / (1 + q1/q2) ]^{n/2}

where

q1 = (y* - X2β̂(2))'(y* - X2β̂(2)) - (y - Xβ̂)'(y - Xβ̂)   and   q2 = (y - Xβ̂)'(y - Xβ̂).

Now we simplify q1 and q2. Consider

(y* - X2β̂(2))'(y* - X2β̂(2)) = (y* - X2(X2'X2)⁻¹X2'y*)'(y* - X2(X2'X2)⁻¹X2'y*)
  = y*'[I - X2(X2'X2)⁻¹X2']y*
  = (y - X1β(1)⁰ - X2β(2))'[I - X2(X2'X2)⁻¹X2'](y - X1β(1)⁰ - X2β(2)).

The other terms become zero using the result X2'[I - X2(X2'X2)⁻¹X2'] = 0.

Consider

(y - Xβ̂)'(y - Xβ̂) = (y - X(X'X)⁻¹X'y)'(y - X(X'X)⁻¹X'y)
  = y'[I - X(X'X)⁻¹X']y
  = (y - X1β(1) - X2β(2))'[I - X(X'X)⁻¹X'](y - X1β(1) - X2β(2))

and the other terms become zero using the result X'[I - X(X'X)⁻¹X'] = 0. Note that under H0, the term X1β(1) + X2β(2) can be expressed as X1β(1)⁰ + X2β(2). Thus

q1 = (y* - X2β̂(2))'(y* - X2β̂(2)) - (y - Xβ̂)'(y - Xβ̂)
   = (y - X1β(1)⁰ - X2β(2))'[I - X2(X2'X2)⁻¹X2'](y - X1β(1)⁰ - X2β(2))
     - (y - X1β(1)⁰ - X2β(2))'[I - X(X'X)⁻¹X'](y - X1β(1)⁰ - X2β(2))
   = (y - X1β(1)⁰ - X2β(2))'[X(X'X)⁻¹X' - X2(X2'X2)⁻¹X2'](y - X1β(1)⁰ - X2β(2))

and

q2 = (y - Xβ̂)'(y - Xβ̂)
   = y'[I - X(X'X)⁻¹X']y
   = (y - X1β(1)⁰ - X2β(2))'[I - X(X'X)⁻¹X'](y - X1β(1)⁰ - X2β(2)).

The other terms become zero. Note that in simplifying q1 and q2, we tried to write them as quadratic forms in the same variable (y - X1β(1)⁰ - X2β(2)).

• Using the same argument as in Case 1, we can say that since λ is a monotonic decreasing function of q1/q2, the likelihood ratio test rejects H0 whenever

q1/q2 > C

where C is a constant to be determined by the size α of the test.

The likelihood ratio test statistic can also be obtained through the least squares method as follows:

(q1 + q2): minimum value of (y - Xβ)'(y - Xβ) when H0: β(1) = β(1)⁰ holds true, i.e., the sum of squares due to H0;
q2: sum of squares due to error;
q1: sum of squares due to the deviation from H0, or the sum of squares due to β(1) adjusted for β(2).

If β(1)⁰ = 0, then

q1 = (y* - X2β̂(2))'(y* - X2β̂(2)) - (y - Xβ̂)'(y - Xβ̂)
   = (y'y - β̂(2)'X2'y) - (y'y - β̂'X'y)
   = β̂'X'y - β̂(2)'X2'y

where β̂'X'y is the sum of squares due to β, β̂(2)'X2'y is the sum of squares due to β(2), and q1 is the reduction in the sum of squares, i.e., the sum of squares due to β(1) ignoring β(2).

• Now we have the following theorem based on Theorems 6 and 7.

Theorem 10: Let

Z = Y - X1β(1)⁰ - X2β(2)
Q1 = Z'AZ
Q2 = Z'BZ

where

A = X(X'X)⁻¹X' - X2(X2'X2)⁻¹X2'
B = I - X(X'X)⁻¹X'.

Then Q1 and Q2 are independently distributed. Further, Q1/σ² ~ χ²(r) and Q2/σ² ~ χ²(n - p).

Thus under H0,

[Q1/r] / [Q2/(n - p)] = [(n - p)/r] (Q1/Q2)

follows the F-distribution F(r, n - p). Hence the constant C in q1/q2 > C is

C = F_{1-α}(r, n - p)

where F_{1-α}(r, n - p) denotes the upper 100α% point of the F-distribution with r and (n - p) degrees of freedom.

• The analysis of variance table for this null hypothesis is as follows:

ANOVA table for testing H0: β(1) = β(1)⁰

Source of variation   Degrees of freedom   Sum of squares   Mean squares   F-value
Due to β(1)           r                    q1               q1/r           [(n - p)/r] (q1/q2)
Error                 n - p                q2               q2/(n - p)
Total                 n - (p - r)          q1 + q2
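The partial F-test summarised in this table can be sketched numerically as follows. This is an illustrative Python example, not from the notes; the partition into X1 and X2, the value β(1)⁰ = 0 and the simulated data are all hypothetical, assuming X has full column rank.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, r, p = 60, 2, 5                        # r parameters under test out of p in total
X = rng.normal(size=(n, p))
y = X @ np.array([0.0, 0.0, 1.0, -0.5, 0.3]) + rng.normal(size=n)

X1, X2 = X[:, :r], X[:, r:]
beta1_0 = np.zeros(r)                     # hypothesised value of beta_(1) under H0

def rss(Xm, ym):
    """Residual sum of squares after regressing ym on Xm."""
    b = np.linalg.solve(Xm.T @ Xm, Xm.T @ ym)
    e = ym - Xm @ b
    return float(e @ e)

q2 = rss(X, y)                            # sum of squares due to error
y_star = y - X1 @ beta1_0                 # y* = y - X1 beta_(1)^0
q1 = rss(X2, y_star) - q2                 # SS due to beta_(1) adjusted for beta_(2)

F = ((n - p) / r) * q1 / q2
print(F, stats.f.sf(F, r, n - p))         # reject H0 if F exceeds F_{1-alpha}(r, n-p)
```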

• Analysis of Variance and Design of Experiments-I
MODULE - II
LECTURE - 8
GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

• Case 3: Test of H0: L'β = δ

Let us consider the test of hypothesis related to a linear parametric function. Assume that the linear parametric function L'β is estimable, where L = (ℓ_1, ℓ_2, ..., ℓ_p)' is a p × 1 vector of known constants and β = (β_1, β_2, ..., β_p)'. The null hypothesis of interest is

H0: L'β = δ

where δ is some specified constant.

Consider the set up of the linear model Y = Xβ + ε where Y = (Y_1, Y_2, ..., Y_n)' follows N(Xβ, σ²I). The maximum likelihood estimators of β and σ² are

β̂ = (X'X)⁻¹X'y   and   σ̂² = (1/n) (y - Xβ̂)'(y - Xβ̂),

respectively.

The maximum likelihood estimate of the estimable L'β is L'β̂, with

E(L'β̂) = L'β
Var(L'β̂) = σ² L'(X'X)⁻¹L
L'β̂ ~ N( L'β, σ² L'(X'X)⁻¹L )

and

nσ̂²/σ² ~ χ²(n - p),

assuming X to be a full column rank matrix. Further, L'β̂ and σ̂² are also independently distributed.

• Under H0: L'β = δ, the statistic

t = √(n - p) (L'β̂ - δ) / √( n σ̂² L'(X'X)⁻¹L )

follows a t-distribution with (n - p) degrees of freedom. So the test for H0: L'β = δ against H1: L'β ≠ δ rejects H0 whenever

|t| ≥ t_{1-α/2}(n - p)

where t_{1-α}(n1) denotes the upper 100α% point of the t-distribution with n1 degrees of freedom.
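A minimal Python sketch of this t-test is given below; it is illustrative only, with a hypothetical design matrix, L and δ, and it uses the maximum likelihood estimate σ̂² exactly as in the statistic above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

L = np.array([1.0, -1.0, 0.0])            # tests H0: beta_1 - beta_2 = delta
delta = -1.0

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
sigma2_mle = float((y - X @ beta_hat) @ (y - X @ beta_hat)) / n   # ML estimate of sigma^2

t = np.sqrt(n - p) * (L @ beta_hat - delta) / np.sqrt(n * sigma2_mle * (L @ XtX_inv @ L))
print(t, 2 * stats.t.sf(abs(t), n - p))   # two-sided p-value; reject H0 if |t| >= t_{1-alpha/2}(n-p)
```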

• Case 4: Test of H0: φ_1 = δ_1, φ_2 = δ_2, ..., φ_k = δ_k

Now we develop the test of hypothesis related to more than one linear parametric function. Let the i-th estimable linear parametric function be φ_i = L_i'β, and let there be k such functions, with L_i and β both being p × 1 vectors as in Case 3. Our interest is to test the hypothesis

H0: φ_1 = δ_1, φ_2 = δ_2, ..., φ_k = δ_k

where δ_1, δ_2, ..., δ_k are known constants.

Let φ = (φ_1, φ_2, ..., φ_k)' and δ = (δ_1, δ_2, ..., δ_k)'. Then H0 is expressible as

H0: φ = Lβ = δ

where L is a k × p matrix of constants associated with L_1, L_2, ..., L_k.

The maximum likelihood estimator of φ_i is φ̂_i = L_i'β̂. Then φ̂ = (φ̂_1, φ̂_2, ..., φ̂_k)' = Lβ̂. Also

E(φ̂) = φ
Cov(φ̂) = σ²V

where V = (( L_i'(X'X)⁻¹L_j )), with L_i'(X'X)⁻¹L_j being the (i, j)-th element of V. Thus

(φ̂ - δ)'V⁻¹(φ̂ - δ) / σ²

follows a χ²-distribution with k degrees of freedom, and nσ̂²/σ² follows a χ²-distribution with (n - p) degrees of freedom, where σ̂² = (1/n)(y - Xβ̂)'(y - Xβ̂) is the maximum likelihood estimator of σ².

• Further, (φ̂ - δ)'V⁻¹(φ̂ - δ)/σ² and nσ̂²/σ² are also independently distributed. Thus under H0: φ = δ,

{ (φ̂ - δ)'V⁻¹(φ̂ - δ) / (σ²k) } / { nσ̂² / (σ²(n - p)) }

or, equivalently,

F = [(n - p) / (k n σ̂²)] (φ̂ - δ)'V⁻¹(φ̂ - δ)

follows the F-distribution with k and (n - p) degrees of freedom. So the hypothesis H0: φ = δ is rejected against H1: at least one φ_i ≠ δ_i, i = 1, 2, ..., k, whenever F ≥ F_{1-α}(k, n - p), where F_{1-α}(k, n - p) denotes the upper 100α% point of the F-distribution with k and (n - p) degrees of freedom.
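The sketch below illustrates this F-test for several linear parametric functions at once. It is not from the notes; the matrix L, the vector δ and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p = 50, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 1.0, 0.0, 2.0]) + rng.normal(size=n)

L = np.array([[1.0, -1.0, 0.0, 0.0],      # phi_1 = beta_1 - beta_2, tested against delta_1 = 0
              [0.0, 0.0, 1.0, 0.0]])      # phi_2 = beta_3, tested against delta_2 = 0
delta = np.zeros(2)
k = L.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
sigma2_mle = float((y - X @ beta_hat) @ (y - X @ beta_hat)) / n

phi_hat = L @ beta_hat
V = L @ XtX_inv @ L.T                      # V_ij = L_i'(X'X)^{-1} L_j
F = ((n - p) / (k * n * sigma2_mle)) * (phi_hat - delta) @ np.linalg.solve(V, phi_hat - delta)
print(F, stats.f.sf(F, k, n - p))          # reject H0 if F >= F_{1-alpha}(k, n-p)
```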

• One-way classification with fixed effect linear models of full rank

The objective in the one-way classification is to test the hypothesis about the equality of means on the basis of several samples which have been drawn from univariate normal populations with different means but the same variance.

Let there be p univariate normal populations and let samples of different sizes be drawn from each of the populations. Let y_ij (j = 1, 2, ..., n_i) be a random sample from the i-th normal population with mean β_i and variance σ², i = 1, 2, ..., p, i.e.,

Y_ij ~ N(β_i, σ²), j = 1, 2, ..., n_i; i = 1, 2, ..., p.

The random samples from different populations are assumed to be independent of each other.

These observations follow the set up of the linear model

Y = Xβ + ε

where

Y = (Y_11, Y_12, ..., Y_1n_1, Y_21, ..., Y_2n_2, ..., Y_p1, Y_p2, ..., Y_pn_p)'
y = (y_11, y_12, ..., y_1n_1, y_21, ..., y_2n_2, ..., y_p1, y_p2, ..., y_pn_p)'
β = (β_1, β_2, ..., β_p)'
ε = (ε_11, ε_12, ..., ε_1n_1, ε_21, ..., ε_2n_2, ..., ε_p1, ε_p2, ..., ε_pn_p)'.

• The design matrix X is of order n × p with elements

x_ij = 1 if the effect β_j is present in (occurs in) the i-th observation,
x_ij = 0 if the effect β_j is absent in the i-th observation,

so that the first n_1 rows of X are (1, 0, 0, ..., 0), the next n_2 rows of X are (0, 1, 0, ..., 0), and similarly the last n_p rows of X are (0, 0, ..., 0, 1):

X = [ 1 0 ... 0 ]  (n_1 rows)
    [ 0 1 ... 0 ]  (n_2 rows)
    [    ...    ]
    [ 0 0 ... 1 ]  (n_p rows).

Each n_i is fixed and n = Σ_{i=1}^{p} n_i.

Obviously, rank(X) = p, E(Y) = Xβ and Cov(Y) = σ²I.

This completes the representation of a fixed effect linear model of full rank.
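The block structure of X can be generated directly from the group sizes. The following short Python sketch (illustrative only; the group sizes are hypothetical) builds X and verifies that it has full column rank.

```python
import numpy as np

n_sizes = [4, 3, 5]                        # hypothetical sample sizes n_1, ..., n_p
p = len(n_sizes)
n = sum(n_sizes)

# Each observation gets a row with a single 1 in the column of its population.
X = np.zeros((n, p))
row = 0
for i, ni in enumerate(n_sizes):
    X[row:row + ni, i] = 1.0
    row += ni

print(X)
print(np.linalg.matrix_rank(X) == p)       # True: X has full column rank p
```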

• The null hypothesis of interest is

H0: β_1 = β_2 = ... = β_p = β (say)
H1: at least one β_i ≠ β_j (i ≠ j)

where β and σ² are unknown.

We would develop here the likelihood ratio test. It may be noted that the same test can also be derived through the least squares method. This will be demonstrated in the next module. This way the readers will understand both the methods.

We have already developed the likelihood ratio test for the hypothesis H0: β_1 = β_2 = ... = β_p in Case 1. The whole parametric space Ω is a (p + 1)-dimensional space

Ω = { (β, σ²) : -∞ < β_i < ∞, σ² > 0, i = 1, 2, ..., p }.

Note that there are (p + 1) parameters: β_1, β_2, ..., β_p and σ². Under H0, Ω reduces to the two-dimensional space

ω = { (β, σ²) : -∞ < β < ∞, σ² > 0 }.

The likelihood function under Ω is

L(y | β, σ²) = [1/(2πσ²)]^{n/2} exp[ -(1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - β_i)² ]

ln L(y | β, σ²) = -(n/2) ln(2πσ²) - (1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - β_i)².

Setting the partial derivatives equal to zero gives

∂ ln L/∂β_i = 0  ⟹  β̂_i = (1/n_i) Σ_{j=1}^{n_i} y_ij = ȳ_io
∂ ln L/∂σ² = 0  ⟹  σ̂² = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io)².

• The dot in ȳ_io indicates that the average has been taken over the second subscript j. The Hessian matrix of second order partial derivatives of ln L with respect to β_i and σ² is negative definite at β_i = ȳ_io and σ² = σ̂², which ensures that the likelihood function is maximized at these values.

Thus the maximum value of L(y | β, σ²) over Ω is

max_Ω L(y | β, σ²) = [1/(2πσ̂²)]^{n/2} exp[ -(1/(2σ̂²)) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io)² ]
 = [ n / (2π Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io)²) ]^{n/2} exp(-n/2).

The likelihood function under ω is

L(y | β, σ²) = [1/(2πσ²)]^{n/2} exp[ -(1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - β)² ]

and

ln L(y | β, σ²) = -(n/2) ln(2πσ²) - (1/(2σ²)) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - β)².

The normal equations and the least squares estimates are obtained as follows:

∂ ln L/∂β = 0  ⟹  β̂ = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{n_i} y_ij = ȳ_oo
∂ ln L/∂σ² = 0  ⟹  σ̃² = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_oo)².

• The maximum value of the likelihood function over ω under H0 is

max_ω L(y | β, σ²) = [1/(2πσ̃²)]^{n/2} exp[ -(1/(2σ̃²)) Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_oo)² ]
 = [ n / (2π Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_oo)²) ]^{n/2} exp(-n/2).

The likelihood ratio test statistic is

λ = max_ω L(y | β, σ²) / max_Ω L(y | β, σ²)
  = [ Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io)² / Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_oo)² ]^{n/2}.

We have

Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_oo)² = Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io + ȳ_io - ȳ_oo)²
 = Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io)² + Σ_{i=1}^{p} n_i (ȳ_io - ȳ_oo)².

• Thus

Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_oo)² = Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io)² + Σ_{i=1}^{p} n_i (ȳ_io - ȳ_oo)² = q2 + q1

and

λ = [ q2 / (q1 + q2) ]^{n/2} = [ 1 / (1 + q1/q2) ]^{n/2}

where

q1 = Σ_{i=1}^{p} n_i (ȳ_io - ȳ_oo)²
q2 = Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij - ȳ_io)².

Note that if the least squares principle is used, then

q1 : sum of squares due to deviations from H0, or the between population sum of squares,
q2 : sum of squares due to error, or the within population sum of squares,
q1 + q2 : sum of squares due to H0, or the total sum of squares.
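A quick numerical check of this sum of squares decomposition is given below; it is an illustrative Python snippet with hypothetical samples, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical samples from three populations with unequal sizes.
groups = [rng.normal(loc=m, size=ni) for m, ni in [(5.0, 4), (6.0, 6), (5.5, 5)]]

y_all = np.concatenate(groups)
y_oo = y_all.mean()                                           # grand mean
q1 = sum(len(g) * (g.mean() - y_oo) ** 2 for g in groups)     # between population SS
q2 = sum(((g - g.mean()) ** 2).sum() for g in groups)         # within population SS
total = ((y_all - y_oo) ** 2).sum()                           # total SS

print(np.isclose(total, q1 + q2))                             # True: total SS = q1 + q2
```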

Using Theorems 6 and 7, let

Q1 = Σ_{i=1}^{p} n_i (Ȳ_io - Ȳ_oo)²
Q2 = Σ_{i=1}^{p} S_i²

where

S_i² = Σ_{j=1}^{n_i} (Y_ij - Ȳ_io)²,  Ȳ_io = (1/n_i) Σ_{j=1}^{n_i} Y_ij,  Ȳ_oo = (1/n) Σ_{i=1}^{p} Σ_{j=1}^{n_i} Y_ij.

• Then under H0,

Q1/σ² ~ χ²(p - 1)
Q2/σ² ~ χ²(n - p)

and Q1/σ² and Q2/σ² are independently distributed. Thus under H0,

[ Q1/(p - 1) ] / [ Q2/(n - p) ] ~ F(p - 1, n - p).

The likelihood ratio test rejects H0 whenever

q1/q2 > C

where the constant C = F_{1-α}(p - 1, n - p).

• The analysis of variance table for the one-way classification in the fixed effect model is

Source of variation    Degrees of freedom   Sum of squares   Mean squares   F-value
Between populations    p - 1                q1               q1/(p - 1)     [(n - p)/(p - 1)] (q1/q2)
Within populations     n - p                q2               q2/(n - p)
Total                  n - 1                q1 + q2

Note that

E[ Q2/(n - p) ] = σ²
E[ Q1/(p - 1) ] = σ² + (1/(p - 1)) Σ_{i=1}^{p} n_i (β_i - β̄)²,  where β̄ = (1/p) Σ_{i=1}^{p} β_i.
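The computations in this ANOVA table can be carried out in a few lines of Python. The sketch below is illustrative (the group data are hypothetical); for real data the hand computation can also be cross-checked against scipy.stats.f_oneway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
groups = [rng.normal(loc=m, size=ni) for m, ni in [(10.0, 8), (11.0, 10), (10.5, 9)]]
p = len(groups)
n = sum(len(g) for g in groups)

y_oo = np.concatenate(groups).mean()
q1 = sum(len(g) * (g.mean() - y_oo) ** 2 for g in groups)     # between population SS
q2 = sum(((g - g.mean()) ** 2).sum() for g in groups)         # within population SS

F = (q1 / (p - 1)) / (q2 / (n - p))
print(F, stats.f.sf(F, p - 1, n - p))
print(stats.f_oneway(*groups))             # should agree with the hand computation above
```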

• Analysis of Variance and Design of Experiments-I
MODULE - II
LECTURE - 9
GENERAL LINEAR HYPOTHESIS AND ANALYSIS OF VARIANCE
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

• Case of rejection of H0

If F ≥ F_{1-α}(p - 1, n - p), then H0: β_1 = β_2 = ... = β_p is rejected. This means that at least one β_i is different from the other effects and is responsible for the rejection. So the objective is to investigate and find such β_i's and to divide the populations into groups such that the means of the populations within a group are the same. This can be done by pairwise testing of the β's.

Test H0: β_i = β_k (i ≠ k) against H1: β_i ≠ β_k. This can be tested using the following t-statistic

t = (ȳ_io - ȳ_ko) / √( s² (1/n_i + 1/n_k) )

which follows the t-distribution with (n - p) degrees of freedom under H0, where s² = q2/(n - p).

Thus the decision rule is to reject H0 at level α if the observed difference satisfies

| ȳ_io - ȳ_ko | > t_{1-α/2, n-p} √( s² (1/n_i + 1/n_k) ).

The quantity t_{1-α/2, n-p} √( s² (1/n_i + 1/n_k) ) is called the critical difference.

• Thus the following steps are followed:

1. Compute all possible critical differences arising out of all possible pairs (β_i, β_k), i ≠ k = 1, 2, ..., p.
2. Compare them with their observed differences.
3. Divide the p populations into different groups such that the populations in the same group have the same means.

The computations are simplified if n_i = n for all i. In such a case, the common critical difference (CCD) is

CCD = t_{1-α/2, n-p} √(2s²/n)

and the observed differences | ȳ_io - ȳ_ko | (i ≠ k) are compared with the CCD.

If | ȳ_io - ȳ_ko | > CCD, then the corresponding effects/means ȳ_io and ȳ_ko come from populations with different means.
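A sketch of this pairwise comparison using the common critical difference is shown below. It is illustrative Python, not from the notes; equal group sizes are assumed (as in the CCD formula) and the data are hypothetical.

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(6)
m = 12                                     # common sample size per population
groups = [rng.normal(loc=mu, size=m) for mu in (10.0, 10.2, 12.0)]
p = len(groups)
N = p * m                                  # total number of observations

s2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - p)   # s^2 = q2/(N - p)
ccd = stats.t.ppf(0.975, N - p) * np.sqrt(2 * s2 / m)             # common critical difference, alpha = 0.05

for i, k in combinations(range(p), 2):
    diff = abs(groups[i].mean() - groups[k].mean())
    print(i + 1, k + 1, diff > ccd)        # True: the pair is declared to have different means
```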

• Note: In general, if there are three effects β_1, β_2, β_3, we say that if H01: β_1 = β_2 is accepted (denote this as event A) and H02: β_2 = β_3 is accepted (denote this as event B), then H03: β_1 = β_3 (denote this as event C) will be accepted.

The question arises here: in what sense do we conclude such a statement about the acceptance of H03? The reason is as follows. Since the event A ∩ B ⊂ C,

P(A ∩ B) ≤ P(C).

In this sense, the probability of an event is higher than the probability of the intersection of events, i.e., the probability that H03 is accepted is higher than the probability of acceptance of both H01 and H02. So we conclude, in general, that the acceptance of H01 and H02 implies the acceptance of H03.

• Multiple comparison tests

One interest in the analysis of variance is to decide whether the population means are equal or not. If the hypothesis of equal means is rejected, one would like to divide the populations into subgroups such that all populations with the same means come to the same subgroup. This can be achieved by multiple comparison tests.

A multiple comparison test procedure conducts the test of hypothesis for all the pairs of effects and compares them at a significance level α, i.e., it works on a per-comparison basis. This is based mainly on the t-statistic. If we want to ensure that the significance level α holds simultaneously for all group comparisons of interest, the appropriate multiple test procedure is one that controls the error rate on a per-experiment basis.

There are various multiple comparison tests available. We will discuss some of them in the context of the one-way classification. In two-way or higher classifications, they can be used along similar lines.

• 1. Studentized range test

It is assumed in the Studentized range test that the p samples, each of size n, have been drawn from p normal populations. Let their sample means be ȳ_1o, ȳ_2o, ..., ȳ_po. These means are ranked and arranged in ascending order as ȳ_1*, ȳ_2*, ..., ȳ_p*, where ȳ_1* = min_i ȳ_io and ȳ_p* = max_i ȳ_io, i = 1, 2, ..., p.

Find the range

R = ȳ_p* - ȳ_1*.

The Studentized range is defined as

q_{p, n-p} = R√n / s

where s is as defined earlier (s² = q2/(n - p)), and q_{p,α,γ} denotes the upper 100α% point of the Studentized range when γ = n - p. The tables for q_{p,α,γ} are available.

The testing procedure involves the comparison of q_{p, n-p} with q_{p,α,n-p} in the usual way as follows:

if q_{p, n-p} < q_{p,α,n-p}, then conclude that β_1 = β_2 = ... = β_p;
if q_{p, n-p} > q_{p,α,n-p}, then all the β's in the group are not the same.
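A sketch of the Studentized range test in Python is given below. It is illustrative only: equal group sizes are assumed, and scipy.stats.studentized_range (available in recent SciPy releases) is used in place of the printed tables of q_{p,α,γ}.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
m = 10                                     # common sample size per population
groups = [rng.normal(loc=mu, size=m) for mu in (20.0, 20.5, 23.0, 21.0)]
p = len(groups)
N = p * m

means = np.array([g.mean() for g in groups])
s2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - p)   # pooled within-group variance
R = means.max() - means.min()              # range of the sample means

q_stat = R * np.sqrt(m) / np.sqrt(s2)      # observed Studentized range
q_crit = stats.studentized_range.ppf(0.95, p, N - p)              # upper 5% point of the Studentized range
print(q_stat, q_crit, q_stat > q_crit)     # True: the means are not all taken as equal
```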

• 2. Student-Newman-Keuls test

The Student-Newman-Keuls test is similar to the Studentized range test in the sense that the range is compared with the 100α% point of the critical Studentized range W_p given by

W_p = q_{p,α,γ} √(s²/n).

The observed range R = ȳ_p* - ȳ_1* is now compared with W_p. If R < W_p, then stop the process of comparison and conclude that β_1 = β_2 = ... = β_p. If R > W_p, then

i. divide the ranked means ȳ_1*, ȳ_2*, ..., ȳ_p* into two subgroups containing (ȳ_p*, ȳ_{p-1}*, ..., ȳ_2*) and (ȳ_{p-1}*, ȳ_{p-2}*, ..., ȳ_1*), respectively;

ii. compute the ranges R1 = ȳ_p* - ȳ_2* and R2 = ȳ_{p-1}* - ȳ_1*, and compare the ranges R1 and R2 with W_{p-1}.

If either range is smaller than W_{p-1}, then the means (or β_i's) in the corresponding group are taken to be equal.

If R1 and/or R2 are greater than W_{p-1}, then the (p - 1) means (or β_i's) in the group concerned are divided into two groups of (p - 2) means each, and the ranges of these two groups are compared with W_{p-2}.

Continue with this procedure until a group of i means (or β_i's) is found whose range does not exceed W_i.

By this method, the difference between any two means under test is significant when the range of the observed means of each and every subgroup containing the two means under test is significant according to the Studentized critical range. This procedure can be easily understood by the following flow chart.

• Flow chart (Student-Newman-Keuls test):

1. Arrange the sample means ȳ_io in increasing order as ȳ_1* ≤ ȳ_2* ≤ ... ≤ ȳ_p*.
2. Compute the range R = ȳ_p* - ȳ_1*.
3. Compare R with W_p = q_{p,α,n-p} √(s²/n).
4. If R < W_p, conclude β_1 = β_2 = ...
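The step-down logic of this flow chart can be sketched recursively. The following Python example is illustrative only: it assumes equal group sizes, uses scipy.stats.studentized_range for the critical points W_i, and the function and variable names are hypothetical; a full implementation would also merge the overlapping homogeneous subsets it reports.

```python
import numpy as np
from scipy import stats

def snk_groups(sorted_means, s2, m, df, alpha=0.05):
    """Return index ranges (over the sorted means) that the step-down procedure declares homogeneous."""
    k = len(sorted_means)
    if k < 2:
        return [(0, k - 1)]
    W_k = stats.studentized_range.ppf(1 - alpha, k, df) * np.sqrt(s2 / m)
    if sorted_means[-1] - sorted_means[0] < W_k:
        return [(0, k - 1)]                # range does not exceed W_k: means in this group taken as equal
    # Otherwise split into the two overlapping subgroups of size k - 1 and recurse on each.
    left = snk_groups(sorted_means[:-1], s2, m, df, alpha)
    right = [(i + 1, j + 1) for i, j in snk_groups(sorted_means[1:], s2, m, df, alpha)]
    return left + right

rng = np.random.default_rng(8)
m = 10                                     # common sample size per population
groups = [rng.normal(loc=mu, size=m) for mu in (20.0, 20.3, 23.0, 23.2)]
p, N = len(groups), len(groups) * m
s2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - p)   # pooled within-group variance
means = np.sort([g.mean() for g in groups])
print(snk_groups(means, s2, m, N - p))     # index ranges of means whose ranges stay below W_i
```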