U. S. FOREST SERVICE RESEARCH PAPER • FPL 17 • DECEMBER 1964

LINEAR REGRESSION

METHODS for FOREST RESEARCH

U. S. DEPARTMENT OF AGRICULTURE FOREST SERVICE

FOREST PRODUCTS LABORATORY MADISON, WIS.


SUMMARY

This Research Paper discusses the methods of linear regression analysis that have been found most useful in forest research. Among the topics treated are the fitting and testing of linear models, weighted regression, confidence limits, covariance analysis, and discriminant functions.

The discussions are kept at a fairly elementary level and the various methods are illustrated by presenting typical numerical examples and their solution. The logical basis of regression analysis is also presented to a limited extent.

ACKNOWLEDGMENTS

Appreciation is extended to Professor George W. Snedecor and the Iowa State University Press, Ames, Iowa, for their permission to reprint from their book Statistical Methods (ed. 5), the material in tables 1 and 8 of Appendix E of this Research Paper.

We are also indebted to the Literary Executor of the late Professor Sir Ronald A. Fisher, F.R.S., Cambridge, to Dr. Frank Yates, F.R.S., Rothamsted, and to Messrs. Oliver and Boyd Ltd., Edinburgh, Scotland, for their permission to reprint Table No. III from their book Statistical Tables for Biological, Agricultural, and Medical Research (Table 7 of Appendix E of this Research Paper); also Table 10.5.3 from Snedecor's Statistical Methods (ed. 5), shown as Table 6 in Appendix E of this Research Paper.


CONTENTS

INTRODUCTION

REGRESSION--THE GENERAL IDEA
    A Moving Average
    Fitting a Regression
    Confidence Limits and Tests of Significance
    Interpreting a Fitted Regression

THE MATHEMATICAL MODEL

FITTING A LINEAR MODEL
    The Least Squares Principle
    Problem I--Multiple Linear Regression With a Constant Term
    Problem II--Multiple Linear Regression Without a Constant Term
    Problem III--Simple Linear Regression With a Constant Term
    Problem IV--The Arithmetic Mean
    Problem V--Fitting a Curve
    Problem VI--A Conditioned Regression
    Requirements

FITTING A WEIGHTED REGRESSION
    Problem VII--A Weighted Regression With a Constant Term
    Problem VIII--Ratio Estimators
    Transformations

SOME ELEMENTS OF MATRIX ALGEBRA
    Definitions and Terminology
    Matrix Addition and Subtraction
    Matrix Multiplication
    The Inverse Matrix
    Matrix Algebra and Regression Analysis

ANALYSIS OF VARIANCE
    A General Test Procedure
    Degrees of Freedom
    Problem IX--Test of the Hypothesis that ß1 + ß2 = 1
    Problem X--Test of the Hypothesis that ß2 = 0
    Problem XI--Working With Corrected Sums of Squares and Products
    Problem XII--Test of the Hypothesis that ß2 = ß3 = 0
    Problem XIII--Test of the Hypothesis that ß1 + 2ß2 = 0
    Problem XIV--Hypothesis Testing in a Weighted Regression
    An Alternate Way to Compute the Gain Due to a Set of X Variables

THE t-TEST
    Problem XV--Test of a Non-Zero Hypothesis

CONFIDENCE LIMITS
    General
    Confidence Limits on Ŷ
    Problem XVI--Confidence Limits in Multiple Regression
    Problem XVII--Confidence Limits on a Simple Linear Regression
    Confidence Limits on Individual Values of Y

COVARIANCE ANALYSIS
    Problem XVIII--Covariance Analysis
    Covariance Analysis with Dummy Variables
    Problem XIX--Covariance Analysis with Dummy Variables

DISCRIMINANT FUNCTION
    Use and Interpretation of the Discriminant Function
    Testing a Fitted Discriminant
    Testing the Contribution of Individual Variables or Sets of Variables
    Reliability of Classifications
    Reducing the Probability of a Misclassification
    Basic Assumptions

ELECTRONIC COMPUTERS

CORRELATION COEFFICIENTS
    General
    The Simple Correlation Coefficient
    Partial Correlation Coefficients
    The Coefficient of Determination
    Tests of Significance

THE BEST OF TWO LINEAR REGRESSIONS

SELECTED REFERENCES

APPENDIX A.--THE SOLUTION OF NORMAL EQUATIONS
    Method I--Basic Procedure
    Method II--Forward Solution
    Method III--Stepwise Fitting

APPENDIX B.--MATRIX INVERSION

APPENDIX C.--SOME SIMPLE FUNCTIONS AND CURVE FORMS

APPENDIX D.--THE ANALYSIS OF DESIGNED EXPERIMENTS

APPENDIX E.--TABLES
    Table 6.--The F-Distribution
    Table 7.--The t-Distribution
    Table 8.--The Cumulative Normal Distribution


LINEAR REGRESSION METHODS for FOREST RESEARCH

by FRANK FREESE, Analytical Statistician
FOREST PRODUCTS LABORATORY1
FOREST SERVICE, U. S. DEPARTMENT OF AGRICULTURE

INTRODUCTION

Many researchers and administrators have discovered the usefulness of regression methods in deriving and testing empirical relationships among various observed phenomena. In the field of forestry, for example, tree volumes have been expressed as a function of diameter, merchantable height, and form class; the strength properties of wood have been related to such characteristics as specific gravity, age, and average rate of radial growth; studies have been made of how logging costs are affected by average tree size, total volume, and distance from hard-surfaced roads; and site index for various species has been related to certain properties of the soil and topography.

Regression analysis provides an objective and widely accepted routine for fitting mathematical models involving several variables. In addition, there are procedures that can often be used to evaluate the fitted equation, and, with the development of modern electronic computers, much of the computational drudgery has been eliminated.

Unfortunately, the obvious value and increased availability of regression methods have resulted in their use by people who have had a rather meager knowledge of the mechanism and its limitations. This is not necessarily a statistical catastrophe--many people drive a car without having the slightest notion of what makes it go. But the user of regression, like the driver of a car, will do a better job if he has learned the best operating procedures and knows something of what the machinery can and cannot do. The purpose of this paper is to provide some of this knowledge in relatively simple terms.

1Maintained at Madison, Wis., in cooperation with the University of Wisconsin.


The expression “relatively simple” is not very informative. To be more specific, it is necessary to spell out the level of knowledge that the reader is assumed to have. Mathematically, nothing is assumed beyond high-school algebra. Though the solution of simultaneous linear equations falls within this limit, a review of this topic is given in Appendix A. The use of subscripts and the summation notation (for example, Σ Σ xij, summed over i = 1, ..., n and j = 1, ..., m) will not be reviewed; information on this subject is given in (3, 8).2 A knowledge of matrix algebra is not assumed. However, the so-called c-multipliers play such an important role in regression analysis, and the term matrix appears so often in regression literature, that a few pages are devoted to some of the basic elements of matrix algebra.

The reader should have a knowledge of the elementary terms, concepts, and methods of statistics. He does not have to be an expert but should have some idea of the meaning of such terms as population, sample, mean, variance, standard deviation, degrees of freedom, correlation, and normal distribution. He should also know the rudiments of the analysis of variance and the “t” and “F” tests of significance. Those who need brushing up on these topics should review one of the many textbooks on statistical methods (1, 5, 7).

This research paper is not designed for statisticians but for research workers and administrators who want to use some of the tools that statisticians have devised. For this reason, the emphasis will be on “how” rather than “why.” No attempt will be made to give the theory, but for some of the methods described, a rather loose discussion of the rationale may be given. It is hoped that when the reader becomes comfortably familiar with some of the “hows,” he will find the time and inclination to take a closer look at the “whys.”

REGRESSION - THE GENERAL IDEA

A Moving Average

The concept of the arithmetic mean or average of a population is familiar to most people, particularly those who have had any exposure to statistical methods. Very briefly, we envision a population of units, each of which can be characterized by a variable (Y). There is a population mean (µy) around which the actual unit values are distributed in some manner. Thus, the Y value of a given unit can be represented by

2Underlined numbers in parentheses refer to Selected References at the end of this report.


Yi = µy + εi

where: Yi = the actual value of Y for the ith unit
       µy = the population mean of all Y values
       εi = the difference between the Y value of the ith unit and the population mean (Yi - µy). This is sometimes called a deviation or error.

A measure of how widely the individual values are spread around the mean is known as the variance, and the square root of the variance is called the standard deviation. For the population, the variance is the average squared deviation.

Now, think in terms of a series of such populations, each with its own mean and variance. Often there will be some other characteristic (X) that has the same value for all units within a given population but varies among populations. It also happens at times that there is some sort of functional relationship between the mean Y values for the populations and the associated X values. Graphically, such a relationship might appear as shown in figure 1.

Figure 1.--Y and X values for four populations. (M 124 614)

In the figure, a different plotting symbol marks the individual Y values of each population, and:

µ1 = mean Y for population 1; X1 = the value of X for population 1
µ2, X2 = similar values for population 2
µ3, X3 = similar values for population 3
µ4, X4 = similar values for population 4


The line showing the relationship between mean Y and X is called a regression line, and its mathematical expression is called a regression function. If the relationship between the mean value of Y (µy) and the value of X is a straight line, we could write

µy = a + bX

where: X = the value of X for the population having a mean Y value of µy
       a, b = constants, indicating the level and the slope of the straight line.

Thus, a regression can be thought of as a form of average that changes with changes in the value of the X variable--a moving average. One of the aims in regression analysis is to find an equation representing this relationship. In this relationship, Y is usually called the dependent variable and X the independent variable. This does not mean, however, that there has to be a cause and effect relationship; it only indicates that the Y values are associated with the X values in some manner that can be described approximately by some mathematical equation. Knowing the value of X gives us some information about the value of Y. The person concerned with the regression makes his own inferences as to what is implied by the indicated relationship.

The equation µy = a + bX specifies the relationship between the mean values of Y and the level of X. To indicate that the individual values of Y vary about the mean, we might write

Yi = µy + εi

Or, since µy varies linearly with the level of X,

Yi = a + bXi + εi

In other words, this says that the Y value of any individual unit is due to the regression of mean Y on X plus a deviation (εi) from the mean.

If the spread (as measured by the variance) of the Y values about their mean (µy) is the same for all of the populations, regardless of the value of the associated X variable, the variance is said to be homogeneous. If the variance is not the same for all populations, it is said to be heterogeneous.

Frequently, the populations can be characterized by more than one X variable (for example, X1, X2, and X3) and it may happen that the mean (µy) associated with each

combination of values of these variables is functionally related to these values. Thus, we might have the regression equation


µy = ß0 + ß1X1 + ß2X2 + ß3X3

where: ß0, ß1, ß2, and ß3 are constants (usually called regression coefficients)

X1, X2, and X3 are the numerical values of three associated characteristics.

This equation merely says that if we specify values for X1, X2, and X3, then we would, on the average, expect the characteristic labeled Y to have the value (µy) given by the equation. The relationship of µy to the independent variables is sometimes spoken of as a regression surface or a response surface, even though a direct geometric analogy breaks down beyond two X variables.

Since µy represents a mean value, some individual values of Y will have to be

higher and some lower than this. In short, we again write

Yi = µy + εi

or, in this case,

Yi = ß0 + ß1X1 + ß2X2 + ß3X3 + εi

And again, if the spread of the Y values about their mean is the same for all points on the regression surface (that is, at all combinations of the independent variables), the variance is said to be homogeneous. If the spread of Y values is not the same at all points, the variance is heterogeneous.

In this introduction to the idea of a regression, we have talked as though there were a number of separate populations--one for each value of X or one for each possible combination of values for several different X’s. It is also possible (and more common) to think in terms of a single population of units, each unit being characterized by a Y value and one or more X values. There is a regression “surface” representing the relationship of the Y value to the associated X values, but the Y values are not all right on the surface: some of them are above it and some below. A given point on the surface represents the mean Y value of all the units having the same X values as those associated with that point. The spread of Y values above and below the surface may be the same for all points (homogeneous variance) or it may differ from point to point (heterogeneous variance).

Fitting a Regression

If there is a relationship between µy and the independent variables (X1, X2, etc.),

it may be very desirable to know what the relationship is. To illustrate, it might be used in predicting the value of Y that would, on the average, be associated with any


particular combination of X values; it could also be useful in selecting a combination of X values that might be associated with some specified value of Y; and it could suggest how changes in Y are associated with changes in any of the X variables.

Ordinarily, the regression relationship will not be known but must be estimated from observations made on a sample of the individual units. On each of the selected units, we will observe the value of Y and each of the associated X’s. From these observations, we must derive estimates of the coefficients (ß0, ß1, etc.) in the regression equation. Usually, we will also want to obtain some measure of the reliability of these estimates.

A first step is to select a mathematical function or model which we think may represent the relationship. Two broad classes of functions should be recognized: those that are linear in the coefficients and those that are nonlinear in the coefficients. An equation in which the coefficients are raised to only the first power and are combined only by addition or subtraction is said to be linear in the coefficients. Some examples are:

(1) Y = a + bX

(2) Y = a + bX + cX²

(3) Y = ß0 + ß1X1 + ß2X2 + ß3X3

(4) Y = ß0 + ß1X1 + ß2(1/X1)

Note that the model can be linear in the coefficients even though it is nonlinear as far as the variables (Y and X) are concerned.

An equation in which the coefficients are raised to other than the first power, appear as exponents, or are combined other than by addition or subtraction is said to be nonlinear in the coefficients. The following are examples:

(1) Y = a + b^X

(2) Y = aX^b

(3) Y = a(X - b)^c

In some cases models that are nonlinear in the coefficients can be put into a linear form by a transformation of the variables. Thus, the second equation above could be converted to a linear form by taking the logarithm of both sides, giving

log Y = log a + b log X

or

Y' = a' + bX'


where: Y' = log Y
       X' = log X
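As a worked illustration of such a transformation, the short sketch below fits the power function Y = aX^b by taking logarithms of both sides and fitting the resulting straight line. Python with numpy is assumed here, and the data are hypothetical; neither is part of the original paper.

```python
import numpy as np

# Hypothetical observations (not from this paper), roughly following Y = a * X**b.
X = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
Y = np.array([2.1, 3.9, 8.2, 15.8, 32.5])

# Transform to the linear form  log Y = log a + b log X  and fit a straight line.
Xp = np.log(X)                    # X' = log X
Yp = np.log(Y)                    # Y' = log Y
b, log_a = np.polyfit(Xp, Yp, 1)  # slope b and intercept log a
print("a =", round(float(np.exp(log_a)), 4), " b =", round(float(b), 4))
```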

This Research Paper will be confined to the fitting and testing of linear models. The fitting of nonlinear models requires more mathematical skill than is assumed here. While this will be an inconvenient restriction at times, it will often be found that a linear model provides a very good approximation to the nonlinear relationship.

Having selected a mathematical model, we should next examine the variability of the Y values about the regression surface. Two aspects of this variability are of interest: (1) Is it the same or nearly so at all points of the regression surface (homogeneous variance), or does it vary (heterogeneous variance)? In the latter case, we would also like to know how the variance changes with changes in the independent variables.

(2) What is the form of the distribution of individual Y values about the regression surface? In many populations, the values will follow the familiar Normal or Gaussian Distribution.

The answer to the first question affects the method of estimating the regression coefficients. If the variance is homogeneous, we can use equal weighting of all observations. If the variance is not homogeneous, it will be more efficient (and more work) to use unequal weighting. The answer to the second question is needed if tests of significance are to be made and confidence limits obtained for the estimated regression coefficients or for functions of these coefficients.

These questions are not easily answered, and the less fastidious users of regression tend to bypass them by making a number of assumptions. Given sufficient critical

familiarity with a population, the assumptions as to homogeneity of variance and form of distribution may be quite valid. Without this familiarity, special studies may have to be made to obtain the necessary information.

Confidence Limits and Tests of Significance

When the mean of a population is estimated from a sample of the units of that population, it is well known that this sample estimate is subject to variation. Its value will depend on which units were, by chance, included in the sample. Such an estimate would be worthless without some means of determining how far it might be from the true value. Fortunately, the statisticians have shown that the variability of


the individual units in a sample can be used in obtaining an indication of the variability of the estimated mean. This in turn enables us to test some hypothesis about the value of the true mean or to determine confidence limits that have a specified probability of including the true mean.

In fitting a regression, the estimated regression coefficients are also subject to sampling variation. Again it is important to have a method of testing various hypotheses about the coefficients and of determining some limits within which the true coefficients or the true regression may be found. This would include testing whether or not any or all of the coefficients could actually be zero, which would imply no association between Y and a particular X or set of X variables. The statisticians have provided the means of doing this. The procedures that have been devised for testing various hypotheses about the regression coefficients or for setting confidence limits on the regression estimates will be discussed following the description of the fitting techniques.

Interpreting a Fitted Regression

Deriving the meaning of a fitted equation is one of the very difficult and dangerous phases. Here there are no strict rules to follow. On the assumption that it has not been copyrighted, 'THINK' is suggested as the guiding principle.

In searching for the meaning of a regression, the fact that it is man-made should never be overlooked. It is an attempt to describe some phenomenon that may be controlled by very complex biological, physical, or economic laws. It may, at times, be an excellent description, but it is not a law in itself; only a mathematical approximation.

Not only is the fitted regression an artificial description of a relationship, but it is also a description that may not be reliable beyond the range of the sample observations used in fitting the regression. For example, a straight line may be a very good approximation of the relationship between two variables over the range of the sample data, but this does not mean that outside of this range the relationship is

not curved. Similarly, a second-degree parabola (Y = a + bX + cX²) may give an excellent fit over a certain range in the data, but this does not prove the existence of the maximum or minimum point that will appear if this parabola is extended.

Finally, it must be remembered that a fitted regression is a sample-based estimate and, as such, is subject to sampling variation. It should not be used without giving due consideration to sampling error. Usually, this will mean computing confidence limits on any predictions made from the fitted regression.


THE MATHEMATICAL MODEL

The most common applications of regression methods have one or both of the following objectives:

(1) To find a mathematical function that can be used to describe the relationship between a dependent variable and one or more independent variables.

(2) To test some hypothesis about the relationship between a dependent variable and one or more independent variables.

This section will discuss some aspects of selecting a mathematical model to be fitted and tested as a description of the relationship between the dependent and the independent variables.

Throughout this Research Paper, we will be concerned with fitting and testing the general linear model

Y = ß0 + ß1X1 + ß2X2 + ... + ßkXk

where: Y = the dependent variable

Xi = an independent variable

ßi = the regression coefficient for Xi (to be estimated in fitting).

This does not mean that we will only be able to fit straight lines or flat surfaces. For example, the general equation for a second-degree parabola is

Y = a + bX + cX²

Graphically, this might look roughly like one of the curves shown in figure 2.

Figure 2.--The second degree parabola. M 124 619

To fit this curve with the general linear model, we merely let X1 = X and X2 = X², then fit the model

Y = ß0 + ß1X1 + ß2X2
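A short sketch may make the renaming concrete: the parabola is fitted as an ordinary linear model whose second "independent variable" is simply X². The data and the use of numpy below are assumptions for illustration only, not material from the paper.

```python
import numpy as np

# Hypothetical data (not the paper's); Y is roughly quadratic in X.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.3, 4.1, 7.2, 11.8, 18.1, 26.2])

# Rename the variables, X1 = X and X2 = X**2, and fit Y = b0 + b1*X1 + b2*X2
# as an ordinary linear model (a column of ones carries the constant term b0).
D = np.column_stack([np.ones_like(X), X, X**2])
(b0, b1, b2), *_ = np.linalg.lstsq(D, Y, rcond=None)
print(f"Y-hat = {b0:.4f} + {b1:.4f} X + {b2:.4f} X^2")
```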


As another example, we might want to fit a hyperbolic function with the general equation

Y = a + b(1/X)

The form of this curve is illustrated in figure 3.

Figure 3.--The hyperbola. M 124 619

If we let X1 = 1/X, then we can fit this function with the model

Y = ß0 + ß1X1

As a final example, the exponential curve represented by the function

Y = AB^X

has the form shown in figure 4.

Figure 4.--Exponential curves. M 124 622

The curve can be fitted by a linear model if we take the logarithm of both sides, giving log Y = log A + X log B

This is the same as the linear model

Y' = ß0 + ß1X

where we let Y' = log Y.

As noted earlier, when we speak of a linear model we are referring to a model that is linear in the coefficients. The above examples show that a linear model may be nonlinear as far as the variables are concerned.


There are, of course, some curvilinear functions that cannot be transformed to a linear model. The function Y = A + B^X, for example, cannot be transformed to the simple linear form, nor can the function Y = a(X - b)^c. There are procedures for fitting some nonlinear models, but they are generally too involved and laborious for inclusion here. It should be mentioned, however, that there are electronic computer programs available for fitting some nonlinear models.

Selecting the appropriate model can be both critical and difficult. It is one phase of regression analysis that has not been taken over by the electronic computers. The degree of difficulty and our probable success will depend to a considerable extent on how much we know about the behavior of the subject matter.

In some cases, the model can be derived by reasoning from basic principles. In formulating a model for the relationship between the specific gravity (S) of an annual increment of wood at a given point on a tree and the distance (T) of that point from the apex of the tree, Stage 3 reasoned as follows:

(1) The specific gravity is inversely related to the concentration of auxin per unit area of cambium (C) and directly proportional to the distance (T) from the apex. These effects are additive. This gives

S = a + b/C + dT

in which a, b, and d are constants.

(2) The tree bole is approximately a paraboloid, so that the diameter (DT) of the stem at a given distance (T) from the apex can be represented by

DT = g√T

in which g is a constant.

(3) Since auxin concentration would vary inversely with cambial area (and hence with diameter) we have

C = k/DT or C = k/(g√T)

in which k = a constant.

(4) This reasoning then led to the model

S = a + dT + (bg/k)√T

3Stage, Albert R. Specific gravity and tree weight of single-tree samples of grand fir. U.S. Forest Serv. Res. Paper INT-4, 11 pp., Intermountain Forest and Range Expt. Sta., Ogden, Utah. 1963.


or in terms of the general linear model

S = ß0 + ß1X1 + ß2X2

where: X1 = T
       X2 = √T

In many cases, our knowledge of the subject will be less specific, but the same line of development must still be followed. If we were studying the relationship between Y and two independent variables (X1 and X2), we might have an idea that the relationship

between Y and one of the variables (say X1) could be represented by a straight line (fig. 5).

Figure 5.--Y is a linear function of X1. (M 124 615)

Now, to work X2 into the model, we have to consider how changes in X2 might affect the relationship of Y to X1. Equal increments in X2 might result in a series of

equally spaced parallel straight lines for the relationship of Y to X1 (fig. 6).

Figure 6.--The relationship of Y to X1 and X2. (M 124 628)

This suggests that in the equation Y = a + bX1, the slope (b) remains unchanged, but the value of the Y intercept is a linear function of X2 (that is, a = a' + b'X2). Then, substituting for (a) in the relationship between Y and X1, we have

Y = a' + b'X2 + bX1


or the general model

Y = ß0 + ß1X1 + ß2X2

In this case, we say that the effects of X1 and X2 are additive.

If we have reason to believe that the Y intercept remains constant but the slope changes linearly (that is, b = a' + b'X2) with changes in X2, we would have the model

Y = a + (a' + b'X2)X1
  = a + a'X1 + b'X1X2

or

Y = ß0 + ß1X1 + ß2X'2

where: X'2 = X1X2

In cases such as this, we say that there is an interaction between the effects of X1 and X2, and the variable X'2 = X1X2 is called an interaction term. It implies that the

effect that one variable has on changes in Y depends on (interacts with) the level of the other variable.

Most likely, if the slope changes, the Y intercept will also change. If both of these changes are thought to be linear, then

a = a' + b'X2
b = a" + b"X2

and the model becomes

Y = a' + b'X2 + a"X1 + b"X1X2

or

Y = ß0 + ß1X1 + ß2X2 + ß3X3

where: X3 = X1X2
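As a concrete illustration of fitting a model with an interaction term, the sketch below builds the product column X3 = X1X2 and fits the linear model by ordinary least squares. The data and the use of Python with numpy are assumptions for illustration; they are not from the paper.

```python
import numpy as np

# Hypothetical data (not from the paper): Y depends on X1, X2, and their interaction.
X1 = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
X2 = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0])
Y  = np.array([3.1, 4.9, 7.2, 4.0, 7.1, 10.0, 5.2, 9.0, 13.1])

# X3 = X1*X2 is the interaction term; the model is still linear in the coefficients.
D = np.column_stack([np.ones_like(Y), X1, X2, X1 * X2])
b, *_ = np.linalg.lstsq(D, Y, rcond=None)
print("b0, b1, b2, b3 =", np.round(b, 4))
```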

If the relationship of the dependent variable to an independent variable is curvilinear, the problem is to select a mathematical function that gives a good approximation to the particular form of curve. This is largely a matter of learning the appearance of the various functions. Some of the forms associated with the more commonly fitted functions are shown in Appendix C. Those who will be doing considerable regression work would do well to maintain a “library of curve forms,” adding to it whenever a new form is encountered.


If absolutely nothing is known about the form of relationship, then the selection of a model gets to be a rather loose and frustrating process. There are no good rules to follow. Plotting the data will sometimes suggest the appropriate model to fit. For a single independent variable, plotting is no problem. For two independent variables (say X1 and X2), we can plot Y over X1 using different symbols to represent different

levels of X2. As an example, the following set of hypothetical observations has been

plotted in figure 7.

Figure 7.--Relationship of Y to X1 at various levels of X2. (M 124 617)

Each symbol represents a different class of X2 values as shown along the right side of the graph. The different lines represent the relationship of Y to X1 at the various levels of X2. The relationship of Y to X1 seems to be linear (Y = a + bX1), and it appears that both the Y intercept and the slope of the line increase linearly with X2 (a = a' + b'X2, b = a" + b"X2). This would suggest the model

Y = ß0 + ß1X1 + ß2X2 + ß3X3

where: X3 = X1X2.

Of course, real data will rarely behave so nicely.
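For readers working on a computer, the kind of diagnostic plot described above takes only a few lines of code. The sketch below is a hypothetical illustration (the data and the use of matplotlib are assumptions, not part of the paper): it plots Y over X1 with a different symbol for each X2 class.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical observations (the data plotted in figure 7 are not reproduced here).
X1 = np.tile([1.0, 2.0, 3.0, 4.0], 3)
X2 = np.repeat([1.0, 2.0, 3.0], 4)
Y  = 1.0 + 0.5 * X2 + (1.0 + 0.8 * X2) * X1 + rng.normal(0.0, 0.3, X1.size)

# Plot Y over X1, one symbol per X2 class, as suggested in the text.
for level, marker in zip(np.unique(X2), ["o", "s", "^"]):
    m = X2 == level
    plt.scatter(X1[m], Y[m], marker=marker, label=f"X2 = {level:g}")
plt.xlabel("X1")
plt.ylabel("Y")
plt.legend()
plt.show()
```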


If there are more than two independent variables, the graphics may not be much more illuminating than the basic data tabulation. The usual procedure is to plot the single variables in the hope of spotting some overall trend, and then perhaps to plot pairs of independent variables (as above) to reveal some of the two-variable interactions. There are other graphical manipulations that can be tried, but the probability of success is seldom high.

With the advent of electronic computers, much of the computational drudgery has been removed from the fitting of a regression. This has led to what might be called a 'shotgun' technique of fitting. A guess is made as to the most likely form for each variable, a number of interaction terms are introduced, usually up to the capacity of the program, and then a machine run is made. This may consist of fitting all possible linear combinations of a set of independent variables (as discussed in the section on Electronic Computers) or may employ a stepwise fitting technique (Appendix A, Method III). From the output of the machine, the variables that seem best are selected. The analysis may end here, or further trials may be made using new variables or new forms of the variables tried in the first run. Statistically, this technique has some flaws. Nonetheless, it is useful when little or nothing is known about the nature of the relationships involved. But it should be recognized and used strictly as an exploratory procedure.

FITTING A LINEAR MODEL

The Least Squares Principle

The most commonly used procedures for fitting a regression surface are derived from what is known as the least squares principle. To see what this principle is and what it leads to, suppose that a sample of n units has been selected from some population and on each unit a value has been observed for a dependent variable (Y) and several independent variables (X1, X2, ..., Xk). Suppose further that the relationship

between the dependent and the independent variables can be represented by the linear model

Yi = ß0 + ß1X1i + ß2X2i + ... + ßkXki + εi

where: Yi = the observed value of the dependent variable for the ith unit in the sample (i = 1, 2, ..., n)

Xji = the value of the jth independent variable (j = 1, 2, ..., k) on the ith sample unit.


ßj = the regression coefficient of the jth independent variable

εi = the deviation of the Y value from the regression surface (that is, εi = Yi - ß0 - ß1X1i - ß2X2i - ... - ßkXki).

We do not, of course, know the values of the coefficients, but must estimate them from the sample data. The principle of least squares says that under certain conditions, the best estimates of the coefficients are those that make the sum of squared deviations a minimum.

Now, for the ith sample unit, the deviation would be

εi = Yi - ß0 - ß1X1i - ß2X2i - ... - ßkXki

and the squared deviation is

εi² = (Yi - ß0 - ß1X1i - ß2X2i - ... - ßkXki)²

For all sample units, the sum of squared deviations is

Σεi² = Σ(Yi - ß0 - ß1X1i - ß2X2i - ... - ßkXki)²

In this quantity, we know what the values of Yi, X1i, X2i, ..., Xki are, because these were observed on the sample units. The magnitude of this sum of squared deviations therefore depends on what values are used for the regression coefficients (ßj). To distinguish them from the true but unknown coefficients, the estimates will be symbolized by ß̂0, ß̂1, ..., ß̂k.

It can be shown that the estimates that make the sum of squared deviations a minimum can be found by solving the following set of simultaneous equations:

ß0:   nß̂0 + (ΣX1)ß̂1 + (ΣX2)ß̂2 + ... + (ΣXk)ß̂k = ΣY
ß1:   (ΣX1)ß̂0 + (ΣX1²)ß̂1 + (ΣX1X2)ß̂2 + ... + (ΣX1Xk)ß̂k = ΣX1Y
ß2:   (ΣX2)ß̂0 + (ΣX1X2)ß̂1 + (ΣX2²)ß̂2 + ... + (ΣX2Xk)ß̂k = ΣX2Y
  .
  .
ßk:   (ΣXk)ß̂0 + (ΣX1Xk)ß̂1 + (ΣX2Xk)ß̂2 + ... + (ΣXk²)ß̂k = ΣXkY


These are known as least squares normal equations (LSNE), and the solutions are called the least squares estimates of the regression

coefficients. The first equation is called the ß0 equation, the second the ß1 equation, etc.

For those who are familiar with differential calculus, it can be mentioned that the ßj equation is obtained by taking the derivative of the sum of squared deviations with respect to ßj and setting it equal to zero: the familiar procedure for finding the value of a variable for which a function is a maximum or minimum. Thus,

∂(Σεi²)/∂ßj = -2ΣXji(Yi - ß0 - ß1X1i - ... - ßkXki)

Setting this equal to zero and moving the term with no coefficient to the righthand side gives the equation

(ΣXji)ß̂0 + (ΣX1iXji)ß̂1 + ... + (ΣXkiXji)ß̂k = ΣXjiYi

But, writing the normal equations for a particular linear model does not require a knowledge of calculus. Merely use the set of equations given above as a general set and select those needed to solve for the coefficients in the model to be fitted, eliminating unwanted coefficients from the selected equations. Thus for the model

Y = ß0 + ß1X1 + ß6X6

the equations, with unwanted coefficients eliminated, would be:

Coefficient   Equation

ß0:   nß̂0 + (ΣX1)ß̂1 + (ΣX6)ß̂6 = ΣY
ß1:   (ΣX1)ß̂0 + (ΣX1²)ß̂1 + (ΣX1X6)ß̂6 = ΣX1Y
ß6:   (ΣX6)ß̂0 + (ΣX1X6)ß̂1 + (ΣX6²)ß̂6 = ΣX6Y

For the model

Y = ß1X1 + ß2X2


the normal equations would be:

Coefficient   Equation

ß1:   (ΣX1²)ß̂1 + (ΣX1X2)ß̂2 = ΣX1Y
ß2:   (ΣX1X2)ß̂1 + (ΣX2²)ß̂2 = ΣX2Y
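As a computational aside, the normal equations for a model without a constant term can be assembled and solved directly from the uncorrected sums of squares and products. The minimal sketch below assumes Python with numpy and hypothetical data; neither belongs to the original paper.

```python
import numpy as np

def fit_without_constant(Y, X1, X2):
    """Solve the normal equations for the model Y = b1*X1 + b2*X2 (no constant
    term) from the uncorrected sums of squares and products."""
    Y, X1, X2 = (np.asarray(a, dtype=float) for a in (Y, X1, X2))
    A = np.array([[np.sum(X1 * X1), np.sum(X1 * X2)],
                  [np.sum(X1 * X2), np.sum(X2 * X2)]])
    g = np.array([np.sum(X1 * Y), np.sum(X2 * Y)])
    return np.linalg.solve(A, g)          # b1_hat, b2_hat

# Hypothetical data, for illustration only.
print(np.round(fit_without_constant([5, 9, 16, 20], [1, 2, 4, 5], [2, 3, 3, 4]), 4))
```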

When the model contains a constant term (ß0), it is possible to simplify the normal equations and their solution. The simplification arises from the fact that the solution of the normal equations will give as the estimate of ß0,

ß̂0 = Ȳ - ß̂1X̄1 - ß̂2X̄2 - ... - ß̂kX̄k

where: Ȳ, X̄1, etc. = the sample means of Y, X1, etc.

Using this value, we can rewrite the model as

(Y - Ȳ) = ß1(X1 - X̄1) + ß2(X2 - X̄2) + ... + ßk(Xk - X̄k) + ε

or

y = ß1x1 + ß2x2 + ... + ßkxk + ε

where: y = Y - Ȳ, x1 = X1 - X̄1, x2 = X2 - X̄2, etc.

The normal equations for this model are:

ß1:   (Σx1²)ß̂1 + (Σx1x2)ß̂2 + ... + (Σx1xk)ß̂k = Σx1y
ß2:   (Σx1x2)ß̂1 + (Σx2²)ß̂2 + ... + (Σx2xk)ß̂k = Σx2y
  .
  .
ßk:   (Σx1xk)ß̂1 + (Σx2xk)ß̂2 + ... + (Σxk²)ß̂k = Σxky

where:

Σx1² = ΣX1² - (ΣX1)²/n,   Σx1x2 = ΣX1X2 - (ΣX1)(ΣX2)/n,   Σx1y = ΣX1Y - (ΣX1)(ΣY)/n,

etc.


The usual procedure is to solve the normal equations for ß̂1, ß̂2, ..., ß̂k and then to use these values to solve for ß̂0. This may not appear to be much of a saving in labor, but it is. The saving arises from the fact that the normal equations have been reduced by one row and column.

The terms Σx1², Σx1x2, Σx1y, etc., are usually referred to as the corrected sums of squares and products, while ΣX1², ΣX1X2, ΣX1Y, etc., are called the uncorrected or raw sums of squares and products. Some details of the analysis of variance depend on which fitting procedure is used, as will be noted later on.

Problem I - Multiple Linear Regression With a Constant Term

A number of units (n = 13) were selected at random from a population. On each unit, measurements were made of a Y variable and three independent variables (X1, X2, and X3). The model to be fitted is of the form

Y = ß0 + ß1X1 + ß2X2 + ß3X3

The data were as follows:

Since the model contains a constant term, it will be simpler to work with the corrected sums of squares and products. For this method, the normal equations will be

Coefficient   Equation

ß1:   (Σx1²)ß̂1 + (Σx1x2)ß̂2 + (Σx1x3)ß̂3 = Σx1y
ß2:   (Σx1x2)ß̂1 + (Σx2²)ß̂2 + (Σx2x3)ß̂3 = Σx2y
ß3:   (Σx1x3)ß̂1 + (Σx2x3)ß̂2 + (Σx3²)ß̂3 = Σx3y


Calculating the corrected sums of squares and products:

and similarly,

In computing the sums of squares and products, it should be noted that a sum of products may be either positive or negative, but a sum of squares must always be positive; a negative sum of squares indicates a computational error.

Substituting the sums of squares and products into the normal equations, we have

The solution4 of the system yields

and from these we obtain

Therefore, the fitted regression is

4Appendix A reviews the method for solving a set of simultaneous equations.


In a fitted regression, the circumflex (^), which is also referred to as a caret or hat, is placed over the Y to indicate that we are dealing with an estimated value, just as we used ß̂i to symbolize the estimate of ßi. In this case, it will be recalled, the value being estimated is the mean of all Y values associated with some specified combination of values for the three independent variables.

Although the above method involves less work and is to be preferred when the model contains a constant term, the same model can be fitted using uncorrected sums of squares and products. The normal equations in this case would be:

Coefficient   Equation

ß0:   nß̂0 + (ΣX1)ß̂1 + (ΣX2)ß̂2 + (ΣX3)ß̂3 = ΣY
ß1:   (ΣX1)ß̂0 + (ΣX1²)ß̂1 + (ΣX1X2)ß̂2 + (ΣX1X3)ß̂3 = ΣX1Y
ß2:   (ΣX2)ß̂0 + (ΣX1X2)ß̂1 + (ΣX2²)ß̂2 + (ΣX2X3)ß̂3 = ΣX2Y
ß3:   (ΣX3)ß̂0 + (ΣX1X3)ß̂1 + (ΣX2X3)ß̂2 + (ΣX3²)ß̂3 = ΣX3Y

As before, the solutions are the same as those obtained with the corrected sums of squares and products.
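For readers who want to reproduce this kind of fit by machine, the sketch below assembles the corrected sums of squares and products, solves for ß̂1, ..., ß̂k, and then back-solves for ß̂0 from the sample means, following the procedure described above. Python with numpy, the function name, and the illustrative data are assumptions of this note; the paper's 13 observations are not reproduced here.

```python
import numpy as np

def fit_with_constant(Y, *Xs):
    """Fit Y = b0 + b1*X1 + ... + bk*Xk by least squares, working with the
    corrected sums of squares and products and back-solving for b0."""
    Y = np.asarray(Y, dtype=float)
    X = np.column_stack([np.asarray(x, dtype=float) for x in Xs])
    y = Y - Y.mean()                      # corrected y
    x = X - X.mean(axis=0)                # corrected x's
    A = x.T @ x                           # corrected sums of squares and products
    g = x.T @ y                           # corrected sums of products with y
    b = np.linalg.solve(A, g)             # b1_hat, ..., bk_hat
    b0 = Y.mean() - X.mean(axis=0) @ b    # b0_hat = Ybar - sum(bj_hat * Xjbar)
    return b0, b

# Hypothetical data for illustration (the paper's observations are not shown here).
X1 = [2, 4, 6, 8, 10, 12, 14]
X2 = [1, 3, 2, 5, 4, 6, 7]
Y  = [5, 11, 14, 22, 25, 32, 37]
b0, b = fit_with_constant(Y, X1, X2)
print("b0 =", round(float(b0), 4), " b1, b2 =", np.round(b, 4))
```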

Problem II - Multiple Linear Regression Without a Constant Term

Given the data of Problem I, fit the model

Y = ß1X1 + ß2X2


This presents no additional problems. Since the model contains no constant term, we will have to work with uncorrected sums of squares and products. The normal equations to be solved are:

Coefficient   Equation

ß1:   (ΣX1²)ß̂1 + (ΣX1X2)ß̂2 = ΣX1Y
ß2:   (ΣX1X2)ß̂1 + (ΣX2²)ß̂2 = ΣX2Y

or

1,235ß̂1 + 847ß̂2 = 2,554
  847ß̂1 + 809ß̂2 = 1,382

The solutions are ß̂1 = 3.1793 and ß̂2 = -1.6204, so the fitted equation is

Ŷ = 3.1793X1 - 1.6204X2

Problem III - Simple Linear Regression With a Constant Term

It is customary in elementary discussions of regression to start with the fitting of a simple linear equation and then go on to the fitting of multiple regressions. In these sample problems, the procedure has been reversed in the hope of emphasizing the generality of the fitting procedure. Thus, fitting the linear model

Y = ß0 + ß1X1

is just a simple case of the general methods used in Problems I and II. Since the model has a constant term, we can work with the corrected sums of squares and products. This results in the single normal equation

Coefficient   Equation

ß1:   (Σx1²)ß̂1 = Σx1y

or, using the data of Problem I,

182ß̂1 = 448

The solution is ß̂1 = 2.4615 and with this we find

ß̂0 = Ȳ - ß̂1X̄1 = 18 - (2.4615)(9) = -4.1535


so the fitted equation is

Ŷ = 2.4615X1 - 4.1535

Problem IV - The Arithmetic Mean

It may be of interest to the reader who has had little exposure to the methods of least squares to know that the sample mean is also a form of least squares regression. If we specify the model

Yi = ß0 + εi

we are merely saying that we want to estimate the mean value of Y, ignoring the values of the X variables. This is obviously the sample mean, Ȳ. Treating this as a regression problem, the normal equation would be:

Coefficient   Equation

ß0:   nß̂0 = ΣY

which has the familiar solution

ß̂0 = ΣY/n = Ȳ

Problem V - Fitting a Curve

Fitting a curve presents no new problems, provided the curve can be expressed by a linear model. To fit Y as a quadratic function of X1 (that is, Y = a + bX1 + cX1²), for example, we merely rename X1² (say X1² = X4) and fit the linear model

Y = ß0 + ß1X1 + ß4X4

The values of X4 would be


As the model contains a constant term, the normal equations can be written

Coefficient   Equation

ß1:   (Σx1²)ß̂1 + (Σx1x4)ß̂4 = Σx1y
ß4:   (Σx1x4)ß̂1 + (Σx4²)ß̂4 = Σx4y

The corrected sums of squares and products involving X4 are

so the normal equations are

Solving this set gives ß̂1 = 1.1663 and ß̂4 = 0.0708. Then,

ß̂0 = Ȳ - ß̂1X̄1 - ß̂4X̄4 = 18 - (1.1663)(9) - (0.0708)(95) = 0.7773

and the fitted quadratic is

Ŷ = 0.7773 + 1.1663X1 + 0.0708X1²

Problem VI - A Conditioned Regression

Sometimes there is a reason to impose certain restrictions on the values of the coefficients in a fitted regression. We have already seen one example of this in Problem II, where we fitted a model without a constant term. This is equivalent to imposing the restriction that ß0 = 0, that is, the regression surface passes through the origin.

Fitting a regression with linear restrictions on the coefficients usually involves nothing more than rewriting the model and sometimes a revision of the basic


variables. Suppose, for example, that we were going to fit the model

Y = ß0 + ß1X1 + ß2X2

to the data of Problem I, but we wished to impose the restriction that

ß1 + ß2 = 1.

This is equivalent to

ß2 = 1 - ß1

and writing this into the original model gives

Y = ß0 + ß1X1 + (1 - ß1)X2

or

(Y - X2) = ß0 + ß1(X1 - X2).

This is obviously a linear model

Y' = ß0 + ß1X'

where: Y' = Y - X2

X' = X1 - X2

The normal equation for fitting the revised model is

(Σx'²)ß̂1 = Σx'y'

There are two ways of getting the sums of squares and products of the revised variables; one is to compute revised values for each observation. Thus we would have

        Y'     X'
        12      2
        25      6
        -4     -2
        -2     -2
        34     11
        30      8
        30      9
         3     -2
        -2     -4
        10      2
        -6     -2
        -1     -2
        14      2
Sums   143     26
Means   11      2

and Σx'² = 298, Σx'y' = 848


Often it will be easier to work directly with the original values, thus:

Σx'² = ΣX'² - (ΣX')²/n

     = Σ(X1 - X2)² - [Σ(X1 - X2)]²/n

     = Σ(X1² - 2X1X2 + X2²) - (ΣX1 - ΣX2)²/n

     = ΣX1² - 2ΣX1X2 + ΣX2² - [(ΣX1)² - 2(ΣX1)(ΣX2) + (ΣX2)²]/n

Then, using the values that have already been computed for the original variables,

Σx'² = 1,235 - 2(847) + 809 - [(117)² - 2(117)(91) + (91)²]/13 = 298, as before.

Similarly,

Σx'y' = ΣX'Y' - (ΣX')(ΣY')/n

      = Σ(X1 - X2)(Y - X2) - [Σ(X1 - X2)][Σ(Y - X2)]/n

      = ΣX1Y - ΣX2Y - ΣX1X2 + ΣX2² - [(ΣX1)(ΣY) - (ΣX2)(ΣY) - (ΣX1)(ΣX2) + (ΣX2)²]/n

      = 2,554 - 1,382 - 847 + 809 - [(117)(234) - (91)(234) - (117)(91) + (91)²]/13

      = 848, as before.

Putting these values in the normal equation gives

298ß̂1 = 848
ß̂1 = 2.8456

and

ß̂0 = 11 - (2.8456)(2) = 5.3088.

In terms of the revised variables, the regression is

Ŷ' = 5.3088 + 2.8456X'.


This may be rewritten in terms of the original variables as

(Ŷ - X2) = 5.3088 + 2.8456(X1 - X2)

or

Ŷ = 5.3088 + 2.8456X1 - 1.8456X2

Note that the coefficients of X1 and X2 add up to 1 as required.
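The revised-variable computation of Problem VI is easy to check by machine. The sketch below (Python with numpy, an assumption of this note) uses the Y' and X' values listed above; because it carries full precision rather than the paper's rounded coefficient, the constant comes out 5.3087 rather than the printed 5.3088.

```python
import numpy as np

# Revised variables from Problem VI:  Y' = Y - X2  and  X' = X1 - X2.
Yp = np.array([12, 25, -4, -2, 34, 30, 30, 3, -2, 10, -6, -1, 14], dtype=float)
Xp = np.array([ 2,  6, -2, -2, 11,  8,  9, -2, -4,  2, -2, -2,  2], dtype=float)

# Single normal equation for the revised model Y' = b0 + b1*X'.
Sxx = np.sum((Xp - Xp.mean()) ** 2)                 # corrected sum of squares, 298
Sxy = np.sum((Xp - Xp.mean()) * (Yp - Yp.mean()))   # corrected sum of products, 848
b1 = Sxy / Sxx                                      # about 2.8456
b0 = Yp.mean() - b1 * Xp.mean()                     # about 5.3087 at full precision
print(round(b1, 4), round(b0, 4))

# Back in the original variables:  Y-hat = b0 + b1*X1 + (1 - b1)*X2.
print(f"Y-hat = {b0:.4f} + {b1:.4f} X1 {1 - b1:+.4f} X2")
```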

Requirements

In order to use the methods that have been described, the sample data must meet certain requirements. For one thing, it must be from a population for which the variance is homogeneous. That is, the variance of the Y values about the regression surface must be the same at all points (for all combinations of X values). If the variance is not homogeneous, it will usually be more efficient to use some weighting procedure as will be discussed later. In this connection, it should be noted that if the model to be fitted does not have a constant term, then the homogeneity of variance may be open to question. Absence of the constant term implies that when all X variables are equal to zero, then Y will also be zero. If Y cannot have negative values, then the variability of Y may be restricted near the origin.

A second requirement is that for the sample units the deviations (εi) of the Y values

from the regression surface must be independent of each other. That is, the size and direction (+ or -) of the error for one unit should have no relationship to the size and direction of the error for any of the other units in the sample, beyond the fact that they are from the same population. Independence of errors can usually be assumed if the sample units were randomly selected as far as the Y values are concerned (purposive selection of X values is usually permissible and often desirable). The errors may not be independent where a series of observations are made on a single unit. Thus, when growth bands are placed on trees and the diameter is observed on the same trees at intervals of time, the errors will probably not be independent. Also, if the units observed are clustered in some way, the errors may not be independent within clusters.

A final requirement is that the X values be measured with essentially no error. Procedures exist for fitting a regression when the dependent and the independent variables are both subject to error, but they are beyond the scope of this paper.


It should be noted that fitting a regression by the least squares principle does not require that the Y values be normally distributed about the regression surface. However, the commonly used procedures for computing confidence limits and making tests of significance (t and F tests) do assume normality.

FITTING A WEIGHTED REGRESSION

The regression fitting procedures that have been described will give unbiased estimates of the regression coefficients, whether the variance is homogeneous or not. However, if the variance is not homogeneous, a weighted regression procedure may give more precise estimates of the coefficients. In a weighted regression, each squared deviation is assigned a weight (wi), and the regression coefficients are estimated so as to minimize the weighted sum of squared deviations. That is, values are found for the ß̂j's so as to minimize

Σwi(Yi - ß0 - ß1X1i - ß2X2i - ... - ßkXki)²

This leads to the normal equations

Coefficient   Equation

ß0:   (Σwi)ß̂0 + (ΣwiX1)ß̂1 + ... + (ΣwiXk)ß̂k = ΣwiY
ß1:   (ΣwiX1)ß̂0 + (ΣwiX1²)ß̂1 + ... + (ΣwiX1Xk)ß̂k = ΣwiX1Y
  .
  .
ßk:   (ΣwiXk)ß̂0 + (ΣwiX1Xk)ß̂1 + ... + (ΣwiXk²)ß̂k = ΣwiXkY

The weights are usually made inversely proportional to the known (or assumed) variance of Y about the regression surface. To understand the reasoning behind this, refer to figure 8, in which a hypothetical regression of Y on X has been plotted along with a number of individual unit values.


Figure 8.--An example of non-homogeneous variance. M 124 616

It is obvious that the variance of Y about the regression line is not homogeneous: it is larger for large values of X than for small values. It is also fairly obvious that a single observation from the lower end of the line tells much more about the location of the line than does a single observation from the upper end. That is, units that are likely to vary less from the line (small variance) give more information about the location of the line than do the units that are subject to large variation. It stands to reason that in fitting this regression the units with small variance should be given more weight than the units with large variance. This can be accomplished by assigning weights that are inversely proportional to the variance. Thus, if the variance is known to be proportional to the value of one of the X variables (say Xj), then the weight could be

wi = 1/Xji

If the variance is proportional to the square of Xj, the weight could be

wi = 1/Xji²

Determining the appropriate weighting procedure can be a problem. If nothing is known about the magnitude of the variance at different points on the regression surface, special studies may have to be made.

It might be mentioned that if the variance is homogeneous, each observation is given equal weight (wi = 1). Notice that when wi = 1, the normal equations for a weighted

regression are the same as those for an unweighted regression.


Problem VII - A Weighted Regression With a Constant Term

To illustrate the weighted regression procedure, it will be assumed that in the data of Problem I, the variance of Y is proportional to X1 and that we want to fit the model

Y = ß0 + ß1X1.

The appropriate weighting would be

wi = 1/X1i

The basic data and weights give

ΣwiX1i² = 117,   ΣwiX1iYi = 234.

The normal equations would be:

Coefficient   Equation

ß0:   (Σwi)ß̂0 + (ΣwiX1i)ß̂1 = ΣwiYi
ß1:   (ΣwiX1i)ß̂0 + (ΣwiX1i²)ß̂1 = ΣwiX1iYi

or

1.9129ß̂0 + 13ß̂1 = 24.631
13ß̂0 + 117ß̂1 = 234.

The solutions are ß̂0 = -2.9224 and ß̂1 = 2.3247, so the fitted equation is

Ŷ = 2.3247X1 - 2.9224.
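The weighted normal equations of Problem VII form a small linear system that can be checked directly. The sketch below (Python with numpy assumed; not part of the paper) solves the system from the sums given in the text and reproduces the coefficients to four decimals.

```python
import numpy as np

# Weighted normal equations of Problem VII (weights w_i = 1/X1i), built from the
# sums given in the text: sum(w), sum(w*X1), sum(w*X1**2), sum(w*Y), sum(w*X1*Y).
A = np.array([[1.9129, 13.0],
              [13.0, 117.0]])
g = np.array([24.631, 234.0])

b0, b1 = np.linalg.solve(A, g)
print(round(b0, 4), round(b1, 4))   # approximately -2.9224 and 2.3247
```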


Problem VIII - Ratio Estimators

A situation is frequently encountered where we have observations on a Y and an associated X value, and we want to describe the relationship of Y to X by a ratio

Y = RX

This is equivalent to fitting the regression model

Y = ß1X1.

The appropriate estimate of ß1 will depend on how the variance of Y changes with the level of X1. Three situations will be considered: (1) the variance of Y is proportional to X1, (2) the variance of Y is proportional to X1², and (3) the variance is homogeneous.

(1) Variance of Y proportional to X1. In this case, we would fit a weighted regression using the weights

wi = 1/X1i

The normal equation for ß1 would be

(ΣwiX1i²)ß̂1 = ΣwiX1iYi

so that

ß̂1 = ΣwiX1iYi / ΣwiX1i²

However, wi = 1/X1i, so that ΣwiX1iYi = ΣYi and ΣwiX1i² = ΣX1i. Hence,

ß̂1 = ΣYi/ΣX1i = nȲ/nX̄1 = Ȳ/X̄1


In other words, if the variance of Y is proportional to X, then the ratio of Y to X is estimated by computing the ratio of Ȳ to X̄. In sampling literature, this is sometimes referred to as the “ratio-of-means estimator.”

(2) Variance of Y proportional to X1². The weights in this case would be

wi = 1/X1i²

As we've seen, the weighted estimate of ß1 is

ß̂1 = ΣwiX1iYi / ΣwiX1i²

But, if wi = 1/X1i², then ΣwiX1iYi = Σ(Yi/X1i) and ΣwiX1i² = Σ(X1i²/X1i²) = n. Hence,

ß̂1 = Σ(Yi/X1i) / n

So, if the variance of Y is proportional to X1², then the ratio of Y to X is estimated by computing the ratio of Y to X for each unit and then taking the average of these ratios. In sampling, this is called the “mean-of-ratios estimator.”

(3) Variance of Y is homogeneous. If the variance is homogeneous, we can fit an unweighted regression (that is, a weighted regression with equal weights), for which the normal equation is

(ΣX1²)ß̂1 = ΣX1Y

or

ß̂1 = ΣX1Y / ΣX1²

For the data of Problem I, the three estimates would be:

(1) Variance proportional to X1.

ß̂1 = ΣY/ΣX1 = 234/117 = 2.0000


(2) Variance proportional to X1².

ß̂1 = Σ(Y/X1)/n = 24.631/13 = 1.8947

(3) Homogeneous variance.

ß̂1 = ΣX1Y/ΣX1² = 2,554/1,235 = 2.0680
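The three ratio estimators are easily computed side by side. The sketch below is a minimal illustration in Python with numpy; the function name and the data are hypothetical, and Problem I's observations are not reproduced here.

```python
import numpy as np

def ratio_estimates(Y, X1):
    """The three least squares estimates of R in Y = R*X1, depending on how the
    variance of Y behaves (the three cases discussed above)."""
    Y, X1 = np.asarray(Y, dtype=float), np.asarray(X1, dtype=float)
    ratio_of_means = Y.sum() / X1.sum()              # variance proportional to X1
    mean_of_ratios = np.mean(Y / X1)                 # variance proportional to X1**2
    unweighted     = np.sum(X1 * Y) / np.sum(X1**2)  # homogeneous variance
    return ratio_of_means, mean_of_ratios, unweighted

# Hypothetical data, for illustration only.
print(np.round(ratio_estimates([4, 9, 13, 22], [2, 4, 6, 10]), 4))
```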

As many readers may know, fitting the model

Y = ß0 + ß1X1 + ... + ßkXk

by weighted regression methods with weights wi leads to the same results as an unweighted (or equal weighted) fitting of

Y' = ß0X0' + ß1X1' + ... + ßkXk'

where: Y' = √wi Y,  X0' = √wi,  X1' = √wi X1, ..., Xk' = √wi Xk.

Transformations

Fitting a regression in the presence of heterogeneous variance may be a lot of work. Special study of the variance is often required to select the proper weighting procedure, and the computations involved in a weighted fitting can be quite laborious.

To avoid the computations of a weighted regression, some workers resort to a transformation of the variables. The hope is that the transformation will largely eliminate the heterogeneity of variance, thus permitting the use of equal weighting procedures. The most common transformations are log Y, arcsin √Y (used where Y is a percentage), and √Y (frequently used if Y is a count rather than a measured variable).


This may be perfectly valid if the transformation does actually induce homogeneity. But there is some tendency to use transformations without really knowing what happens to the variance. Also, it should be remembered that the use of a transformation may change the implied relationship between Y and the X variables. Thus, if we fit

log Y = ß0 + ß1X1

we are implying that the relationship of Y to X1 is of the form

Y = ab^X1

Fitting

√Y = ß0 + ß1X1

implies the quadratic relationship

Y = a + bX1 + cX1²

SOME ELEMENTS OF MATRIX ALGEBRA

It is not necessary to know anything about matrix algebra (as such) in order to make a regression analysis. If you can compute and use the c-multipliers as discussed in the sections dealing with the t-test and confidence limits, then you have the essentials. However, an elementary knowledge of matrix algebra is very helpful in understanding certain procedures and terms that are used in regression work.

Definitions and Terminology

A matrix is simply a rectangular array of numbers (or letters). The array is usually enclosed in brackets. The dimensions of a matrix are specified by the number of rows and columns (in that order) that it contains. Thus in the matrices,


A is a 2 by 2 matrix, B is a 1 by 3, C is a 3 by 1, and D is a 3 by 2 matrix.

The individual numbers (or letters) in a matrix are referred to as elements. A particular element may be identified by subscripts designating the row and column (in that order) in which the element appears. Thus, we could represent a matrix by using subscripted letters in place of numerical elements. For example,

This is an m by n or (m x n) matrix.

A square matrix is one in which the number of rows equals the number of columns. In a square matrix, the elements along the line from the upper left corner to the lower right corner constitute the diagonal of the matrix (that is, a11, a22, ..., ann).

If the elements above the diagonal of a square matrix are a mirror image of those below the diagonal (that is, aij = aji for all values of i and j), the matrix is said to be

symmetrical. Some examples of symmetrical matrices are

A square matrix in which every element of the diagonal is a one and every other element a zero is called the identity matrix and is usually symbolized by the letter I. The last matrix above is an identity matrix.


Two matrices are equal if they have the same dimensions and if all corresponding elements are equal. Thus,

only if

The transpose of a matrix is formed by 'rotating' the matrix so that the rows become the columns and the columns become the rows. The transpose of

The transpose of

and of [3 1 4 2] is

The transpose of a matrix (A) is symbolized by a prime (A' ).


Matrix Addition and Subtraction

Two matrices having the same dimensions can be added (or subtracted) simply by adding (or subtracting) the corresponding elements. Thus,

Note that the sum (or difference) matrix has the same dimensions as the matrices that were added (or subtracted).

Matrix Multiplication

Two matrices can be multiplied only if the number of columns of the first matrix is equal to the number of rows of the second. If A is a (4 x 3) matrix, B is a (3 x 2), and C is a (2 x 3), then the multiplications AB, BC, and CB are possible, while the multiplications AC, BA, and CA are not possible.

The rule for matrix multiplication (when possible) is as follows: If A is an (r x n) matrix and B is an (n x m) matrix, then the ijth element of the product matrix (C) is

cij = ai1b1j + ai2b2j + . . . + ainbnj

The dimensions of the product matrix will be (r x m). In words, the above rule states

that the element in the ith row and the jth column of the product matrix is obtained as

the sum of the products of elements from the ith row of the first matrix and the

corresponding elements from the jth column of the second matrix.
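For readers with a computer at hand, the element rule can be written out directly. The short sketch below (Python with numpy; the two matrices are arbitrary examples) forms each element of the product as a sum of row-by-column products and checks the result against the built-in matrix product.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])        # a (2 x 3) matrix
    B = np.array([[7., 8.],
                  [9., 10.],
                  [11., 12.]])          # a (3 x 2) matrix

    r, n = A.shape
    n2, m = B.shape
    assert n == n2                      # columns of A must equal rows of B

    # Element rule: c_ij = sum over k of a_ik * b_kj
    C = np.zeros((r, m))
    for i in range(r):
        for j in range(m):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))

    print(C)
    print(np.allclose(C, A @ B))        # True: agrees with the built-in product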

Most persons find it easier to spot the pattern of matrix multiplication than to follow the above rule. A few examples may be helpful:


(1)

(2)

(3)

(4)

(5)

Multiplication is not possible; the number of columns in the first matrix does not equal the number of rows in the second.

(6)


In addition to observing the pattern of matrix multiplication in these examples, a few other points might be noted. For one thing, if the dimensions of a proposed matrix multiplication are written down, the two inner terms must be equal for multiplication to be possible. If multiplication is possible, the two outer terms tell the dimensions of the product matrix.

Thus, (3 x 2)(4 x 3) multiplication is not possible.

(4 x 3)(3 x 2) multiplication is possible; the product matrix will be (4 x 2).

(1 x 200)(200 x 1) multiplication is possible; the product matrix will be (1 x 1).

A second point to note is that even though the multiplications AB and BA may both be possible (if A and B are both square matrices), the products will generally not be the same. That is, matrix multiplication is, in general, not commutative. This is illustrated by examples (1) and (6). The identity matrix (I) is one exception to this rule; it will give the same results whether it is used in pre- or post-multiplication (IA = AI = A).

Finally, it should be noted that any matrix is unchanged when multiplied by the identity matrix, as in example (4).

The Inverse Matrix

In ordinary algebra, an equation such as

ab = c

can be solved for b by dividing c by a (if a is not equal to zero). In the case of matrices, this form of division is not possible. In place of division, we make use of the inverse matrix, which basically is not too different from ordinary algebraic division.

The inverse of a square matrix (A) is a matrix (called “A inverse” and symbolized by A-1) such that the product of the matrix and its inverse will be the identity matrix

A-1A = I


As an example, the inverse of

is

since

Finding the inverse of a matrix is not too complicated though it may be a lot of work if an electronic computer is not available. One method is to work from the basic definition. In the matrix (A) given above, we can symbolize the elements of the inverse matrix by the letter c, with subscripts to identify the row and column of the element. Thus we can write,

Then, by the definition of the inverse, we know that


or

Now, two matrices are equal only if all of their corresponding elements are equal; therefore, we have three sets of simultaneous equations, each involving three unknowns.

Solving these leads to the inverse,

as given before.

When the matrix to be inverted is symmetrical, the inversion process is not quite so laborious, for it turns out that the inverse of a symmetrical matrix will also be symmetrical. This means that only the elements in and above the diagonal will have to be computed; the elements below the diagonal can be determined from those above the diagonal. A calculating routine for inverting a symmetrical matrix is given in Appendix B.
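Where a computer is available, the inversion and its check take only a few lines. The sketch below (Python with numpy; the symmetric matrix is an arbitrary example, not one from the text) verifies the defining property and the symmetry of the inverse.

    import numpy as np

    # An arbitrary symmetric matrix of the kind met in regression work.
    A = np.array([[4., 2., 1.],
                  [2., 3., 0.],
                  [1., 0., 2.]])

    A_inv = np.linalg.inv(A)

    print(np.allclose(A_inv @ A, np.eye(3)))   # True: A-inverse times A is the identity
    print(np.allclose(A_inv, A_inv.T))         # True: the inverse of a symmetric matrix is symmetric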

As a final note, it may be mentioned that although matrix multiplication is not usually commutative (AB ≠ BA), the product of a matrix and its inverse does commute (A-1A = AA-1 = I).


Matrix Algebra and Regression Analysis

Matrix algebra provides a very useful tool in regression analysis. To illustrate, consider the set of normal equations for fitting a linear regression of Y on X1 and X2:

or with a small revision in notation

(Note that in this case aij = aji; this is a symmetric matrix)

Now, remembering what we learned about matrix multiplication, it will be noted that this set of equations can be written in matrix form as:

Or even better, as

Aß̂ = R

where: A is the matrix of sums, and sums of squares and products (computed from the data).

ß̂ is the matrix of estimated regression coefficients (to be computed).

R is the matrix of the right-hand sides of the normal equations (computed from the data).


In ordinary algebra the equation

Aß̂ = R

could be solved for ß̂ simply by dividing both sides by A, giving ß̂ = R/A. This does not work in matrix algebra, but there is a comparable process that will lead to the desired result. This is to multiply each side of the equation by the inverse of A (= A-1), giving

A-1Aß̂ = A-1R

By definition, the product A-1A is equal to the identity matrix I, so we have

Iß̂ = A-1R

We have seen that a matrix is unchanged when multiplied by the identity matrix, so Iß̂ = ß̂ and the above equation is

ß̂ = A-1R

If we represent the elements of the inverse matrix by cij the above equation is equivalent to

(Note again that cij = cji; that is, the inverse of a symmetric matrix is also symmetric.)

Writing this out more fully gives


Then, since two matrices are equal only if corresponding elements are equal, we have

or in general

To reassure ourselves that this procedure actually works, let us take a simple numerical example. Suppose we have the normal equations

By the more familiar simultaneous equation procedures, we find

In matrix form, the normal equations are

The matrix of sums of squares and products is

and its inverse is


The inverse can (and should) be checked by multiplying it and the original matrix to see that the result is the identity matrix

check

Then, the regression coefficients are given by

Or,

This is only one of the uses of the inverse matrix. It is, in fact, one of the least important uses, since the regression coefficients are just as easily computed by the more familiar simultaneous equation techniques. Other uses of the inverse will be discussed later under testing and computing confidence limits.
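A sketch of the matrix route to the coefficients, with made-up data rather than the data of the problems above, might run as follows (Python with numpy). It forms A and R from the observations and computes ß̂ = A-1R, then checks the answer against a direct solution of the normal equations.

    import numpy as np

    # Illustrative data only; the paper's own problems use different values.
    X1 = np.array([1., 2., 3., 4., 5.])
    X2 = np.array([2., 1., 4., 3., 5.])
    Y  = np.array([3., 4., 8., 9., 13.])

    # A = matrix of sums, and sums of squares and products; R = right-hand sides.
    Z = np.column_stack([np.ones_like(X1), X1, X2])
    A = Z.T @ Z
    R = Z.T @ Y

    beta_hat = np.linalg.inv(A) @ R                        # beta_hat = A-inverse times R
    print(beta_hat)
    print(np.allclose(beta_hat, np.linalg.solve(A, R)))    # True: same as solving the equations directly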

A word on notation is in order here. In regression work, the subscripting of the c-multipliers depends on the form of the normal equations. If the normal equations contain the constant term (ß0) so that

then the c-multipliers are usually subscripted in this manner


If the normal equations do not contain the constant term, the subscripts for the c-multipliers will be

It will be recalled that there are two situations in which the normal equations will not have a constant term. The first is, of course, when the model being fitted does not contain a constant term. The second is when the model being fitted does have a constant term but the fitting is being done with corrected sums of squares and products.

ANALYSIS OF VARIANCE

It is important to keep in mind that when a linear model such as

is fitted to a set of sample data, we are, in effect, obtaining sample estimates of the population regression coefficients ß0, ß1, ß2, . . ., ßk. These estimates will obviously be subject to sampling variation: their values will depend on which units were, by chance, selected for the sample.

If we have some hypothesis concerning what one or more of the coefficients should be, this leaves us with the problem of determining whether the differences between the observed and hypothesized values are real or could have occurred by chance. For example, suppose we have a hypothesis that the relationship between Y and X is linear. From a sample of Y and X values, we obtain the estimated equation

To test our hypothesis of a linear relationship, we would want to test whether the observed value of ß̂1 = 0.082 represents a real or only a chance departure from a true value of ß1 = 0.


Or, if we fitted the quadratic

and obtained the equation

we might ask whether ß̂2 = -0.11 represents a real or only a chance departure from a hypothesized value of ß2 = 0. If we find that a value of ß̂2 = -0.11 could arise by chance in sampling a population for which ß2 = 0, then we might infer that there is no evidence that the parabola is any better than the straight line for describing the relationship of Y to X.

It may be desired to test more than one coefficient or values other than zero. For example, we might have reason for believing that the ratio of Y to X is some constant K, which is equivalent to saying Y = KX. If we have fitted the linear regression

we would then want to test the joint hypothesis that ß0 = 0 and ß1 = K.

The exact form of the hypothesis will depend on the objectives of the research. The main requirements are that the hypothesis be specified before the equation is fitted and that it be meaningful in terms of the research objective.

This portion of the paper will deal with the use of analysis-of-variance procedures in testing hypotheses about the coefficients. Some hypotheses can also be tested by the t-test, and this will be described later.

A General Test Procedure

There is a basic procedure that may be used in all situations, but in practice the computational routine often varies with the method of fitting and the hypothesis to be tested. First, let’s look at the basic procedure and then illustrate the computations for some of the more common testing situations. To illustrate the discussion of the basic procedure, assume that we have a set of n observations on a Y-variable and


four associated X-variables, and that for the model

we want to test the joint hypothesis that

The first step is to fit the complete or maximum model to obtain estimates (ß̂0, ß̂1, . . ., ß̂4) of the regression coefficients. With the results of this fitting, we then compute the sum of the squared deviations of the individual Y-values from the corresponding values predicted by the fitted regression (Ŷ). We will call this the residual sum of squares (the sum of squared deviations or residuals).

Residual Sum of Squares = Σ(Yi - Ŷi)²

Rather than computing each value of Ŷi and each squared deviation (Yi - Ŷi)², the same result can be obtained by computing

Residual = ΣY² - Σß̂jRj

where: Rj = the right-hand side of the jth normal equation.

In this equation the first term (ΣY²) is called the total sum of squares. The second term (Σß̂jRj) is called the reduction or regression sum of squares.
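The equivalence of the two routes to the residual sum of squares can be sketched as follows (Python with numpy; the data are hypothetical). The sketch forms the total sum of squares and the reduction Σß̂jRj and checks their difference against the directly computed sum of squared deviations.

    import numpy as np

    # Hypothetical observations; any Y and X columns would do.
    X1 = np.array([1., 2., 3., 4., 5., 6.])
    X2 = np.array([2., 1., 3., 5., 4., 6.])
    Y  = np.array([2., 3., 6., 9., 8., 12.])

    Z = np.column_stack([np.ones_like(X1), X1, X2])
    A = Z.T @ Z                      # coefficients of the normal equations
    R = Z.T @ Y                      # right-hand sides R_j
    beta_hat = np.linalg.solve(A, R)

    total     = np.sum(Y**2)                 # total sum of squares
    reduction = np.sum(beta_hat * R)         # sum of beta_hat_j times R_j
    residual  = total - reduction

    # The same residual, computed directly from the squared deviations:
    print(np.isclose(residual, np.sum((Y - Z @ beta_hat)**2)))   # True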

The next step is to rewrite the basic model, imposing the conditions specified by the hypothesis. In this case the hypothesis is that ß2 = 1 and ß3 = 2ß , so we have

Rewriting the model we have,

or

where:


Now we fit this "hypothesis" model by the standard least squares procedures and again compute

Residual

In this equation, the Rk term is the right-hand side of the kth normal equation of the set used to fit the hypothesis model, and the ß̂k are the resulting solutions of that set.

The analysis of variance can now be outlined as follows:

Source                                Degrees of freedom   Sum of squares   Mean square

Residual about hypothesis model

Residual about maximum model

Difference for testing hypothesis

In this table, the residual sums of squares for the hypothesis and maximum models are computed according to the equations given above. The difference is obtained by subtraction. The degrees of freedom for a residual sum of squares will always be equal to the number of observations (n) minus the number of independently estimated coefficients. Thus, in the maximum model we estimated five coefficients, so the residual sum of squares would have n-5 degrees of freedom. In the hypothesis model, we estimated three coefficients, so the residuals will have n-3 degrees of freedom. The degrees of freedom for the difference (2) are then obtained by subtraction. The mean squares are equal to the sums of squares divided by the degrees of freedom.

Finally, the test of the hypothesis can be made by computing

F = (Difference Mean Square) / (Maximum Model Residual Mean Square)

This value is compared to the tabular value of F (table 6, Appendix E) with 2 and n-5 degrees of freedom (in this instance). If the computed value exceeds the tabular value at the selected probability level, the hypothesis is rejected.
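The whole procedure can be sketched in a few lines (Python with numpy and, only for the tabular F value, scipy; the data are simulated and the hypothesis tested here--ß2 = 0--is chosen purely for illustration).

    import numpy as np
    from scipy import stats            # used only for the tabular F value

    def residual_ss(Z, Y):
        # Residual sum of squares and its df after least-squares fitting of the columns of Z.
        beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        return np.sum((Y - Z @ beta)**2), len(Y) - Z.shape[1]

    rng = np.random.default_rng(1)
    X1, X2 = rng.normal(size=20), rng.normal(size=20)
    Y = 1.0 + 2.0 * X1 + rng.normal(size=20)

    Z_max = np.column_stack([np.ones(20), X1, X2])     # maximum model
    Z_hyp = np.column_stack([np.ones(20), X1])         # hypothesis model (beta2 = 0)

    rss_max, df_max = residual_ss(Z_max, Y)
    rss_hyp, df_hyp = residual_ss(Z_hyp, Y)

    q = df_hyp - df_max                                # df for the difference
    F = ((rss_hyp - rss_max) / q) / (rss_max / df_max)
    print(F, stats.f.ppf(0.95, q, df_max))             # reject if F exceeds the tabular value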


Loosely, the rationale of the test is this: given a sample of n observations on a variable Y, there will be a certain amount of variation among the Y-values (the total sum of Y-squares is a measure of this variation). When we fit a regression to these values we are stating that some portion of the variation in Y is associated with the regression and the remainder represents deviations from that regression (the total sum of squares is divided into a regression sum of squares and a residual sum of squares). Because the maximum model is subject to fewer restrictions than the hypothesis model, it will fit the data better; the sum of squared residuals should (and will) be smaller. If the hypothesis being tested is true, then the difference in residuals between the hypothesis model and the maximum model will be no larger than might be expected by chance. If the F-test indicates that this difference is larger than might be expected by chance, then the hypothesis is rejected.

Degrees of Freedom

When we select a random sample of n observations on some variable Y, then all n of the values are free to vary. In statistical terms these observations (or the squares of the observations) are said to have n degrees of freedom. If we estimate the mean (Ȳ) of this sample and calculate the deviation of each value from the mean (y = Y - Ȳ), then since the deviations must sum to zero, only (n-1) of them are free to vary, and the deviations (or squared deviations) are said to have (n-1) degrees of freedom. As shown previously, estimating the mean is equivalent to fitting the regression Y = ß0. If we fit a straight line to the data, we are imposing two restrictions on the variation in Y, so that the deviations from regression (or the squared deviations) will have n-2 degrees of freedom. A parabola (Y = ß0 + ß1X + ß2X²) imposes three restrictions on the variation in Y, so the residuals about a parabola would have n-3 degrees of freedom. Thus, we have the rule that the degrees of freedom for the residuals are equal to the number of observations minus the number of independently estimated coefficients.

Just as we partitioned the total sum of squares into a portion due to regression and a portion for deviations from regression, we can think of partitioning the total degrees of freedom into a part associated with the regression and a part associated with deviations from regression. If we fit a model with k independently estimated coefficients, we associate k degrees of freedom with the regression or reduction sum of squares, and n-k degrees of freedom with the residual sum of squares.

It might be mentioned that if we fit a model with n coefficients, the residuals will have n-n = 0 degrees of freedom. This is equivalent to saying that the residuals have no freedom to vary--that the model accounts for all the variation in Y; and, it will


turn out that every point will lie on the regression surface. The sum of squared residuals will be zero. This will be true regardless of the independent variables used in the model. This is a statistical form of the geometrical fact that two points may define a straight line, three points may define a plane in three-dimensional space, and n points define a "hyperplane" in n-dimensional space.

Problem IX - Test of the Hypothesis that ß1 + ß2 = 1

Whenever the hypothesis specifies that a coefficient or some linear function of the coefficients has a value other than zero, the basic test procedure must be used. To illustrate the test of this non-zero hypothesis we will assume that we have fitted the model (maximum model)

to the data of Problem I.

The normal equations for fitting this model are:

or, substituting numerical values from Problem I,

The solutions are:


The total sum of squares is

The reduction or regression sum of squares is

Reduction

Since three coefficients were fitted, the reduction sum of squares has 3 degrees of freedom.

The residual sum of squares can now be obtained as

Residual = Total - Reduction

The next step is to fit the hypothesis model. Under the hypothesis that ß1 + ß2 = 1 (or ß2 = 1 - ß1), the model becomes

or

This can be rewritten

where:


The normal equations for fitting this model are

At this stage individual values could be found for the new variables X1' and Y', and these could be used to compute the sums, and sums of squares and products needed. However, with a little algebraic manipulation, we can save ourselves a lot of work. Thus,

Substituting in the normal equations, we have

The solutions are


The reduction sum of squares is

Reduction = 3986.0545, with 2 degrees of freedom.

Then, since the total sum of squares for Y' is 4091, the residual sum of squares will be

Residual = Total - Reduction

         = 4091 - 3986.0545

         = 104.9455, with 13 - 2 = 11 df.

With these values we can now summarize the analysis of variance and F-test.

Source                         df   Sum of squares   Mean square

Residual--Hypothesis Model     11       104.9455

Residual--Maximum Model        10       101.7204       10.17204

Difference                      1         3.2251        3.2251

F = 3.2251/10.17204 = 0.32; the hypothesis would not be rejected at the .05 level.

Problem X - Test of the Hypothesis that ß2 = 0.

One of the most common situations in regression is to test whether the dependent variable (Y) is significantly related to a particular independent variable. We might want to test this hypothesis when the variable is fitted alone, in which instance (if the variable is X2) we might fit


and test the hypothesis that ß2 = 0.

Or, we might want to test the same hypothesis when the variable has been fitted in the presence of one or more other independent variables. We could, for example, fit

and test the hypothesis that ß2 = 0.

The latter situation will be illustrated with the data of Problem I. If we work with uncorrected sums of squares and products, the normal equations for fitting the maximum model are:

and the solutions are

Thus, the reduction sum of squares is

Reduction = 5948.5182, with 4 df.

The total sum of squares (uncorrected) for Y is

Total = 6046

so the residual sum of squares for the maximum model is


Residual = Total - Reduction

= 6046 - 5948.5182

= 97.4818, with 13 - 4 = 9 df.

Under the hypothesis that ß2 = 0, the model becomes

for which the normal equations are

giving the solutions

The reduction sum of squares will then be

Reduction

At this point we depart slightly from the basic procedure. Ordinarily, the residuals for the hypothesis model would next be computed and then the difference in residuals between the maximum and hypothesis models would be obtained. But, where the hypothesis results in no change in the Y-variable, the difference between the residuals is the same as the difference between the reductions for the two models.

Difference in residuals = Hypothesis model residuals - Maximum model residuals.


So, we can set up the analysis of variance in the following form:

Source                              df   Sum of squares   Mean square

Maximum model reduction              4       5948.5182

Hypothesis model reduction           3       5335.4130
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Difference for testing hypothesis    1        613.1052       613.1052

Residual about maximum model         9         97.4818        10.8313

Total (uncorrected)                 13       6046

F = 613.1052/10.8313 = 56.6. As F exceeds the tabular value at the .01 level, the hypothesis would be rejected at this level. We say that X2 makes a significant reduction (in the residuals) when fitted after X1 and X3.
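The arithmetic of this table can be retraced directly from the sums of squares quoted above; a minimal sketch in Python:

    # Sums of squares quoted in the text for Problem X.
    total   = 6046.0        # uncorrected total sum of squares for Y (13 df)
    red_max = 5948.5182     # reduction, maximum model (4 df)
    red_hyp = 5335.4130     # reduction, hypothesis model (3 df)

    residual_max = total - red_max          # 97.4818, with 9 df
    difference   = red_max - red_hyp        # 613.1052, with 1 df
    F = (difference / 1) / (residual_max / 9)

    print(residual_max, difference, round(F, 1))   # F is roughly 56.6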

Problem XI - Working With Corrected Sums of Squares and Products

It has been shown that if the model contains a constant term (ß0), the fitting may be

accomplished with less effort by working with the corrected sums of squares and products. That is, instead of fitting

we can fit

When this is done, it must be remembered that y = Y - Ȳ is the deviation of Y from its mean and that with n observations only n-1 of the deviations are free to vary (since the deviations must sum to zero). Thus, the total sum of squares (corrected sum of squares) will have only n-1 df. Also, in the maximum model we now estimate three rather than four coefficients, so the reduction will have only three degrees of freedom.


Despite these changes, the test will lead (as it should) to exactly the same conclusion that was obtained in working with uncorrected sums of squares and products.

Thus, using the data of Problem I and testing the same hypothesis that was tested in Problem X, the normal equations for the maximum model are:

The solutions are ß̂1 = 2.7288, ß̂2 = -1.9218, and ß̂3 = -0.0975. Therefore the reduction sum of squares is

Reduction = 1736.5182, with 3 df.

The total sum of squares (corrected) for y is

Total = 1834, with 12 df.

The residual sum of squares for the maximum model is therefore

Residual = Total - Reduction = 1834 - 1736.5182 = 97.4818, with 9 df.

Under the hypothesis that ß2 = 0, the reduced model would become y = ß1x1 + ß3x3, for which the normal equations are


The solutions are ß̂1 = 2.3990 and ß̂3 = -0.2149, so the reduction due to fitting this model is

Reduction = (2.3990)(448) + (-0.2149)(-226)

= 1123.3194, with 2 df.

Then in tabular form, the test of the hypothesis is as follows:

Source                              df   Sum of squares   Mean square

Maximum model reduction              3       1736.5182

Hypothesis model reduction           2       1123.3194
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Difference for testing hypothesis    1        613.1988       613.1988

Residual about maximum model         9         97.4818        10.8313

Total (corrected)                   12       1834

To show what variables are involved in the two models, the various sources in the analysis of variance may be relabeled as follows:

Source

Due to X1, X2, and X3

Due to X1 and X3
- - - - - - - - - - - - - - - - -
Gain due to X2 after X1 and X3

Residuals

Total

Problem XII - Test of the Hypothesis that ß2 = ß3 = 0.

A test of this hypothesis presents no new problems. For the model


we found (in Problem XI)

Reduction = 1736.5182, with 3 df

Total (corrected) = 1834, with 12 df

Residual = 97.4818, with 9 df

Under the hypothesis that ß2 = ß3 = 0 the model becomes

for which the normal equation is

giving

Then the reduction sum of squares is

Reduction = 2.4615(448) = 1102.7520, with 1 df.

Putting these values in the analysis of variance table we have

Source                              df   Sum of squares   Mean square

Reduction due to X1, X2, X3          3       1736.5182

Reduction due to X1                  1       1102.7520
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Gain due to X2 and X3 after X1       2        633.7662       316.8831

Residuals                            9         97.4818        10.8313

Total                               12       1834


Problem XIII - Test of the Hypothesis That ß1 + 2ß2 = 0

In Problem IX we tested the hypothesis that ß1 + ß2 = 1. We had to use the basic

procedure for this test because the maximum model and hypothesis model did not have the same Y-values. For a zero hypothesis (e.g., ß1 + ß2 = 0) some of the X-values may be changed, but the Y-values are unaffected and we may use the simpler procedure shown in Problems XI and XII.

Thus in Problem XI we fitted

y = ß1x1 + ß2x2 + ß3x3

and found

Reduction = 1736.5182, with 3 df

Total (corrected) = 1834, with 12 df

Residual = 97.4818, with 9 df.

Under the hypothesis that ß1 + 2ß2 = 0 (or ß1 = -2ß2), the model can be written

or

where:

The normal equations for this model are:

Coefficient Equation

Again, we could compute each value of the new variable separately and then get the sums of squares and products involving it from these individual values. It will be easier, however, to make use of the sums of squares and products that were computed in fitting the maximum model.


Thus,

So, the normal equations for the hypothesis model are:

The solutions to the normal equations are

so that the reduction sum of squares is

Reduction

The analysis of variance is as follows:

Source                              df   Sum of squares   Mean square

Maximum model reduction              3       1736.5182

Hypothesis model reduction           2       1688.3534
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Difference                           1         48.1648        48.1648

Residuals                            9         97.4818        10.8313

Total                               12       1834

F = 48.1648/10.8313 = 4.45; not significant at the .05 level.


Problem XIV - Hypothesis Testing in a Weighted Regression

The primary difference in the test procedure for a weighted regression is the use of a weighted total sum of squares in place of the unweighted (or equally weighted) sum of squares.

In Problem VII we fitted the model

giving each observation a weight inversely proportional to the value of X1. The normal equations were

and the solutions were obtained. This gave as the reduction sum of squares

Reduction = 471.998, with 2 df.

The weighted total sum of squares for Y was

Total = 553.741

so the residual sum of squares would be

Residual = 553.741 - 471.998 = 81.743, with 11 df.

Then to test the hypothesis that ß 1 = 0, we would fit (by a weighted regression) the model

for which the normal equation is

or


This model gives a reduction of 317.156 with 1 df, so the test of the hypothesis is:

Source                              df   Sum of squares   Mean square

Reduction due to maximum model       2        471.998

Reduction due to hypothesis model    1        317.156
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Difference for testing hypothesis    1        154.842        154.842

Residual                            11         81.743          7.431

Total                               13        553.741

F = 154.842/7.431 = 20.8; significant at the 0.01 level.

The hypothesis that ß1 = 0 would be rejected at the 0.01 level.
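A sketch of the weighted computations (Python with numpy; the data are hypothetical and the weights w = 1/X1 merely follow the pattern of Problem VII) shows where the weighted total, reduction, and residual sums of squares come from.

    import numpy as np

    X1 = np.array([1., 2., 3., 4., 5., 6., 7., 8.])
    Y  = np.array([2., 5., 7., 8., 11., 12., 15., 17.])
    w  = 1.0 / X1                          # weights inversely proportional to X1

    # Weighted least squares, carried out by scaling each row with sqrt(w).
    Z  = np.column_stack([np.ones_like(X1), X1])
    Zw = Z * np.sqrt(w)[:, None]
    Yw = Y * np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Zw, Yw, rcond=None)

    total     = np.sum(w * Y**2)           # weighted total sum of squares
    reduction = beta @ (Zw.T @ Yw)         # beta-hat times the weighted right-hand sides
    residual  = total - reduction
    print(beta, residual)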

An Alternative Way to Compute The Gain Due to a Set of X Variables

To test the hypothesis that one or more of the ßj = 0, the difference in reduction sum of squares between the maximum model and the hypothesis model must be obtained. We have done this by first solving the normal equations and computing the reductions for each model and then taking the difference. If each model has several independent variables, this can be a very laborious process.

When the c-multipliers have been computed for the maximum model, there is a method of obtaining the difference in reduction between the two models that is in some cases less work. If we have fitted the regression of Y on X1, X2, X3, X4, and X5 and we want to test X4 and X5 in the presence of X1, X2, and X3 (that is, the hypothesis that ß4 = ß5 = 0), then the difference in reduction sum of squares between the maximum model (Y = ß0 + ß1X1 + ß2X2 + ß3X3 + ß4X4 + ß5X5) and the hypothesis model (Y = ß0 + ß1X1 + ß2X2 + ß3X3) can be obtained by the matrix multiplication:

Difference in Reduction

That is, we take the portion of the inverse associated with the variables to be tested, invert it, and then pre- and post-multiply by a matrix of the coefficients being tested.
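The matrix multiplication just described can be sketched as follows (Python with numpy, with simulated data; the function name gain_from_inverse is ours, not the paper's). The sketch also checks the shortcut against the direct difference of two reductions.

    import numpy as np

    def gain_from_inverse(C, beta_hat, test_idx):
        # Portion of the inverse for the tested variables, inverted, then pre- and
        # post-multiplied by the corresponding fitted coefficients.
        C_qq = C[np.ix_(test_idx, test_idx)]
        b_q  = beta_hat[test_idx]
        return b_q @ np.linalg.inv(C_qq) @ b_q

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(25, 4))                   # columns X1 ... X4 (no constant, for brevity)
    Y = Z @ np.array([1.0, -0.5, 0.0, 0.0]) + rng.normal(size=25)

    A = Z.T @ Z
    R = Z.T @ Y
    C = np.linalg.inv(A)                           # the c-multipliers
    beta_hat = C @ R

    gain = gain_from_inverse(C, beta_hat, [2, 3])  # test X3 and X4 after X1 and X2

    red_full    = beta_hat @ R
    b_red, *_   = np.linalg.lstsq(Z[:, :2], Y, rcond=None)
    red_reduced = b_red @ (Z[:, :2].T @ Y)
    print(np.isclose(gain, red_full - red_reduced))   # True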


Two examples that illustrate the method will be given. In Problem XII we fitted the model

and tested the hypothesis that ß2 = ß3 = 0. We found ß̂2 = -1.9218 and ß̂3 = -0.0975, and the gain due to X2 and X3 after X1 was found to be 633.7662.

To compute this gain using the c-multipliers, we must first find the inverse of the matrix of sums of squares and products

The inverse is

The portion of the inverse associated with X2 and X3 is

The inverse of this is


Hence, the gain due to X2 and X3 after X1 is

= 633.7408 (as before, except for rounding errors).

In Problem XI we tested the hypothesis that ß2 = 0. We found ß̂2 = -1.9218, and the gain due to X2 after X1 and X3 was 613.1988. By the present method, the gain could be computed as

Whether or not this method saves any time or labor will depend on the number of variables being tested, the number of variables in the hypothesis model, and the individual's facility at solving simultaneous equations and inverting matrices. If only one of the variables is being tested (as in the last example), the t-test described in the next chapter will usually be the easiest. If the hypothesis model involves only one or two variables, or if several variables are to be tested, it may be easiest to fit each model and find the reduction due to each rather than to work with the c-multipliers. For the beginner, the best method will usually be the one with which he is most familiar.

THE t-TEST

Many regression hypotheses can be tested by means of the t-distribution (table 7, Appendix E). Setting up these tests requires the following bits of statistical knowledge:

1. The coefficients of a fitted regression are sample estimates which, like all sample estimates, are subject to sampling variation. The statistical measure of the variation of a variable is the variance, and the measure of the association in the variation of two variables is the covariance.


The variance of an estimated regression coefficient is

Variance of ß̂j = cjj(Residual Mean Square)

where: cjj is an element of the inverse of the matrix of coefficients of the normal equations.

The covariance of two coefficients estimated from the same set of normal equations is

Covariance of ß̂i and ß̂j = cij(Residual Mean Square)

2. The variance of a linear function of several estimated coefficients (for example, ß̂1 + 2ß̂2), which can be obtained from these variances and covariances by the rule given later in the section on confidence limits.

3. The general equation for the t-test is

t = (estimated value - hypothesized value) / √(estimated variance of the estimate)

where the numerator is the difference between the estimated value of some function of normally distributed variables and the true or hypothesized value of that function, and the denominator is the square root of the variance of the sample estimate.

The t computed by this equation will have degrees of freedom equal to those of the mean square used in the denominator.

Putting these three items together, we could, for example, test the hypothesis that ß1 + 2ß2 = 8 by


Or, we could test a more familiar hypothesis such as ß2 = 0 by

Since this last is the hypothesis that was tested in Problem XI, the t-test can be illustrated with the same data. In that example, we had ß̂2 = -1.9218, and the residual mean square with 9 degrees of freedom was 10.8313.

The matrix of coefficients from the normal equations was

and the inverse would be

From the inverse, we find c22 = 0.006 022 9, so that the t-test of the hypothesis that ß2 = 0 is

t = -1.9218 / √((10.8313)(0.006 022 9)) = -7.525

If the absolute value (algebraic sign ignored) of t exceeds the tabular (table 7, Appendix E) value for 9 df at the desired probability level, then the hypothesis is rejected. In this case tabular t = 2.262 (at the .05 level), so we would reject the hypothesis that ß2 = 0.
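The arithmetic of this t-test, using the values quoted above, is simply:

    import math

    beta2_hat = -1.9218
    c22       = 0.0060229
    ms_resid  = 10.8313

    t = beta2_hat / math.sqrt(c22 * ms_resid)
    print(round(t, 2))        # roughly -7.52, to be compared with tabular t for 9 df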

The F-test of this hypothesis also leads to a rejection. Those who are not familiar with the relationship between the “t” and “F” distributions sometimes ask which is


the best test. The answer is that where both are applicable, the easiest one is best because essentially there is no difference between them. If a given set of data is used to test some hypothesis by both the t and F tests, it will be found that F with 1 and k degrees of freedom is equal to the square of t with k degrees of freedom. In the example given above, we found t = -7.525, so t² = 56.63. The F value for testing the same hypothesis was 56.61; except for rounding errors, they should be identical.

Problem XV - Test of a Non-Zero Hypothesis

The main advantage of the t-test over the analysis of variance is in the test of a non-zero hypothesis. It will be recalled that an F-test of this type of hypothesis required the computation of separate total and residual sums of squares for both the maximum and the hypothesis model. With the t-test, a non-zero hypothesis is handled as easily as any other. This can be illustrated using the data in Problem XI for a test of the hypothesis that ß1 + ß2 = 1.

In that problem we had ß̂1 = 2.7288, ß̂2 = -1.9218, and Residual Mean Square = 10.8313 with 9 df. As we have seen, the inverse matrix is:

Therefore,


The absolute value of t is less than the tabular value (2.262) at the .05 level, so the hypothesis would not be rejected.

It should be noted that this is a test of the hypothesis that ß1 + ß2 = 1, when fitted in the model

This is not the same as the hypothesis that was tested in Problem IX. In that problem, ß1 + ß2 = 1 was tested when fitted in the model

CONFIDENCE LIMITS

General

If you have ever asked for estimates on the cost of repairing a car or a TV set, you are probably well aware of the fact that there are good and there are bad estimates. Sample-based regression estimates can also be good or bad, and it is important to provide some indication of just how good or bad they might be.

The variation that may be encountered in fitting sample regressions can be illustrated by five separate samples of 10 units each, selected at random from a population in which Y was known to have no relationship to the X variable. The simple linear regressions that resulted from the five samples were as follows:


The sample regressions have been plotted in figure 9. The heavy horizontal line represents the mean value of Y for the population.

Figure 9.--Plotting of 5 sample linear regressions.

These sample regressions illustrate two points that should never be forgotten by those who work with regression. The first is that the fitting procedure can be applied to any set of data. Put in the numbers, turn the crank, and out will come an equation that expresses one variable in terms of one or more other variables. But no matter how hard or how many times the crank is turned, it is impossible to induce relationships that did not exist to start with. It may all look very scientific, but the mere existence of an equation with coefficients computed to eight decimal places on a $3 million computer does not prove that there is a relationship.

The second point is that sample estimates are subject to variation. The variation in these regressions may be quite startling to those who have had little experience with the behavior of sample estimates. These results should not, however, be allowed to shatter the beginner’s hopes for regression analysis. Ten units from this population is far too light a sample for fitting even a simple linear regression, and the erratic results are no more than might be expected.


For properly designed sampling and estimating procedures, it is possible to compute statistical confidence limits--values which will bracket the thing being estimated a specifiable percentage of the time. If we compute 95-percent confidence limits for the mean, these limits will include the population mean unless a one-in-twenty chance has occurred in sampling. That is, about one time in 20 we will get a poor sample, and the confidence limits computed from that sample will fail to include the mean. We have no way of knowing which is the bad sample, but we can say that over the long run, only one time in 20 will our 95-percent confidence limits fail to include the mean.

Similar confidence limits can be computed for regression coefficients and for regression predictions. For any estimate that can be assumed to follow the normal distribution, the general equation for the confidence limits is:

Confidence limits = (sample estimate) ± t√(estimated variance of the estimate)

where t is the value of t for the desired probability level (table 7, Appendix E).

Applying this to an estimated regression coefficient ß̂j, we would have

Confidence limits = ß̂j ± t√((cjj)(Residual Mean Square))

The t would have degrees of freedom equal to those for the residual mean square.

In the section on the analysis of variance, we fitted a regression of Y on X1, X2, and X3 (Problem XI). The estimated value of ß2 was ß̂2 = -1.9218. The residual mean square was 10.8313 with 9 degrees of freedom, and the c-multiplier was c22 = 0.006 022 9 (Problem XV). The 95-percent confidence limits for ß2 would then be given by

-1.9218 ± 2.262√((10.8313)(0.006 022 9))

or

-1.9218 ± 0.578; that is, from -2.500 to -1.344.

The confidence limits can be used as a test of some hypothesized value of the coefficient. Since these limits do not include zero as a possible value, we would reject the hypothesis that ß2 = 0. This is the same conclusion that we reached by the


F and t tests. The confidence limit approach is usually more informative than F or t tests of a hypothesis.

If we wish to place confidence limits on a linear function of the regression coefficients, we must remember the rule for the variance of a linear function. This rule was given previously, but will be repeated here.

If ß̂1, ß̂2, . . ., ß̂k are a set of estimated regression coefficients, then the variance of the linear function (a1ß̂1 + a2ß̂2 + . . . + akß̂k) will be estimated by

Or, in more abbreviated form,

where:

This is nowhere near as difficult as it appears. For example, the variance of the function 2ß̂1 - 3ß̂2 would be

4(variance of ß̂1) + 9(variance of ß̂2) - 12(covariance of ß̂1 and ß̂2)

or

(4c11 + 9c22 - 12c12)(Residual Mean Square)

Then if we are given a linear function of the regression coefficients the confidence limits on this function will be


If then the confidence limits would be

where t has degrees of freedom equal to the df for the residual mean square and is selected for the desired level of confidence.
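A sketch of this rule, and of the confidence limits built on it (Python with numpy), follows. Only c22, the coefficients, and the residual mean square are taken from the text; the remaining c-multipliers shown are placeholders, so the numerical output is illustrative only.

    import numpy as np

    def linear_function_ci(a, beta_hat, C, ms_resid, t_tab):
        # Variance of a'beta_hat: sum over i and j of a_i * a_j * c_ij, times the residual MS.
        estimate = a @ beta_hat
        variance = (a @ C @ a) * ms_resid
        half = t_tab * np.sqrt(variance)
        return estimate - half, estimate + half

    beta_hat = np.array([2.7288, -1.9218])     # coefficients quoted for Problem XI
    C = np.array([[0.0100, -0.0020],           # c11 and c12 are placeholders;
                  [-0.0020, 0.0060229]])       # c22 is the value quoted in the text
    print(linear_function_ci(np.array([2.0, -3.0]), beta_hat, C, 10.8313, 2.262))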

Confidence Limits on Ŷ (Predicted Mean Y)

The preceding rule may be applied to the problem of determining confidence limits for Ŷ (a predicted value of mean Y for a given set of values for the X variables). If

the predicted value of mean Y is

then the confidence limits can be obtained by treating the specified values of the X’s as constants and applying the preceding rule. Thus,

Confidence Limits

An abbreviated way of writing this is

Confidence Limits

where Xi or Xj = 1, if i or j = 0

This again looks more difficult than it is. Thus, for the simple linear regression

the confidence limits are


If the fitted regression is of the form (that is, no constant term), then the confidence limits would be

Note that in this case where the fitted model has no constant term, we will have no c-multipliers with a zero in the subscript.

All of the above equations would apply for weighted as well as unweighted regressions.
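The limit computations for a predicted mean Y, and for an individual Y (treated in a later section by adding one residual mean square under the radical), can be sketched as follows (Python with numpy; the coefficients and inverse shown are hypothetical, so only the structure of the calculation is intended).

    import numpy as np

    def mean_y_limits(x0, beta_hat, C, ms_resid, t_tab):
        # Limits on the mean of Y at the X values in x0 (x0[0] = 1 for the constant term);
        # C is the inverse of the matrix of the normal equations (uncorrected form).
        y_hat = x0 @ beta_hat
        half = t_tab * np.sqrt((x0 @ C @ x0) * ms_resid)
        return y_hat - half, y_hat + half

    def individual_y_limits(x0, beta_hat, C, ms_resid, t_tab):
        # Limits for an individual Y: one residual mean square is added under the radical.
        y_hat = x0 @ beta_hat
        half = t_tab * np.sqrt((x0 @ C @ x0) * ms_resid + ms_resid)
        return y_hat - half, y_hat + half

    # Hypothetical coefficients and inverse; the structure, not the numbers, is the point.
    beta_hat = np.array([1.0, 2.0])
    C = np.array([[0.50, -0.10],
                  [-0.10, 0.05]])
    print(mean_y_limits(np.array([1.0, 7.0]), beta_hat, C, 4.0, 2.262))
    print(individual_y_limits(np.array([1.0, 7.0]), beta_hat, C, 4.0, 2.262))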

The reader who has always used corrected sums of squares and products in regression work may find that the confidence limit equations given are not quite the same as those with which he is familiar. They will, however, give exactly the same confidence limits. The somewhat familiar equation (for unweighted regression) for the confidence limits is

In the following problem, we will see that these equations lead to the same result.

Problem XVI - Confidence Limits In Multiple Regression

In Problem I, we fitted a regression of Y as a linear function of X1, X2, and X3. The fitted equation was

The same result was obtained whether we used corrected or uncorrected sums of squares and products. The residual mean square (Problem XI) was 10.8313 with 9 degrees of freedom.

Suppose now that we predicted the mean value of Y associated with the values X1 = 6, X2 = 8, and X3 = 12. We would have


The method of computing the confidence interval for this estimate will depend on whether corrected or uncorrected sums of squares and products were used in the fitting. We will assume first that the fitting was done with uncorrected terms. In this instance the normal equations were

The matrix of sums and uncorrected sums of squares and products is

The inverse is

The equation for the confidence limits in this case will be


or, for 95 percent confidence limits,

Thus, unless a 1 in 20 chance occurred in sampling, we can say that the true mean of Y associated with X1 = 6, X2 = 8, and X3 = 12 is somewhere between 4.0948 and

Note that this does not imply that individual values of Y will be found between these limits. This is a confidence interval on regression Ŷ, which is the mean value of Y associated with a specified combination of X values.

Suppose now that in fitting this equation, we had used the corrected sums of squares and products so that the normal equations were

The inverse of the matrix

is


Note that this is the same as the inverse of the matrix of uncorrected sums of squares and products with the first row and column deleted.

The confidence limits can be computed by the equation

Limits on the mean value of Y associated with X1 = 6, X2 = 8, and X3 = 12 would be:

Problem XVII - Confidence Limits On a Simple Linear Regression

If we fit a simple linear regression(Y = ß0 + ß1X1) using uncorrected sums of

squares and products, the equation for the confidence limits will be

as may be determined from the general formula,

If corrected sums of squares and products are used in the fitting, the equation for the confidence limits boils down to


The normal equation for the simple linear regression fitted with corrected sums of squares and products is

and the inverse of the (1x1) matrix

is simply

so this equation can also be written

which is the form that appears in many textbooks.

Calculation of the limits can be illustrated with the simple linear regression fitted in Problem III. The equation was

Ŷ = -4.1535 + 2.4615X1

We had n = 13, X̄1 = 9, and the residual mean square was 66.4771 with 11 degrees of freedom. The normal equation was

or

For a value of X1 = 7, we would have

Ŷ = -4.1535 + 2.4615(7) = 13.077

and the 95 percent confidence limits would be


or

If regression Y and the confidence limits are computed for several values of X, the confidence limits can be displayed graphically. In the above example, we would have

In figure 10 these points have been plotted and connected by smooth curves.

Figure 10.--A linear regression with 95-percent confidence limits.


Confidence Limits on Individual Values of Y

It was mentioned, and should be emphasized, that the limits previously discussed are limits for the regression line (the mean value of Y for a specified X), not limits on individual values of Y. Often, however, having estimated a value of Y by means of a regression equation, we would like to have some idea as to the limits which might include most of the individual Y values. These limits can be obtained by adding one times the residual mean square to the term under the radical in the equations given for the limits on regression Ŷ.

This would make the general formula for the limits on an individual value of Y

The formula that can be used when the corrected sums of squares and products have been used in the fitting would be

For the simple linear regression, this last formula reduces to

COVARIANCE ANALYSIS

It frequently happens that the unit observations to be analyzed can be classified into two or more groups. A set of tree heights and diameters might, for example, be grouped according to tree species. This raises the question of whether separate prediction equations should be used for each group or whether some or all of the groups could be represented by a single equation. Covariance analysis provides a means of answering this question.

In the case of simple linear equations, group regressions may differ either because they have different slopes or, if the slopes are the same, because they differ in level (fig. 11).


Figure 11.--Variation among linear regressions.

The standard covariance analysis first tests the hypothesis of no difference in slope. Then, if there is no evidence of a difference in slopes, the hypothesis of no difference in levels is tested. If no significant difference is found in either the slopes or levels, then a single regression may be fitted, ignoring the groups.

The following set of data will be used in the problems illustrating the analysis of covariance.

Group A (n = 11)

  X1:   0.8   3.1   4.4   1.6   4.6   2.6   5.5   1.1   3.9   4.9   1.4      Sum = 33.9    Mean = 3.0818
  Y :   5.9  10.7  11.4   9.6  12.6   8.0  12.8   7.5  12.5  14.2   8.4      Sum = 113.6   Mean = 10.3273

  ΣY² = 1242.72    ΣX1² = 132.73    ΣX1Y = 390.91
  Σy² = 69.541 819    Σx1² = 28.256 364    Σx1y = 40.815 455

Group B (n = 9)

  X1:   1.6   5.8   3.6   2.0   4.3   5.8   4.8   3.3   2.6                  Sum = 33.8    Mean = 3.7556
  Y :   5.2  13.4  10.0   7.5  10.1  11.9  10.7   6.8   9.0                  Sum = 84.6    Mean = 9.4

  ΣY² = 848.20    ΣX1² = 145.98    ΣX1Y = 346.69
  Σy² = 52.96    Σx1² = 19.042 222    Σx1y = 28.97

Group C (n = 10)

  X1:   0.6   3.4   1.5   0.7   4.5   4.1   2.3   1.3   3.1   4.6            Sum = 26.1    Mean = 2.61
  Y :   7.8  12.4  10.9   9.9  16.8  13.9  11.4   8.9  13.7  16.0            Sum = 121.7   Mean = 12.17

  ΣY² = 1559.73    ΣX1² = 89.47    ΣX1Y = 356.57
  Σy² = 78.641    Σx1² = 21.349    Σx1y = 38.933


Pooled values (ignoring groups):

n = 30; ΣY = 319.9; ΣX1 = 93.8; ΣY² = 3650.65; Σy² = 239.449 667; ΣX1Y = 1094.17; Σx1y = 93.949 334; ΣX1² = 368.18; Σx1² = 74.898 667.

Using corrected sums of squares and products, the normal equation for a linear regression with a constant term is:

(Σx1²)ß1 = Σx1y

If a separate regression were fitted for each group, we would have:

Group A:  28.256 364 ß̂1 = 40.815 455
          ß̂1 = 1.444 469

Reduction = (1.444 469)(40.815 455) = 58.956 659, with 1 df

Residual = Σy² - Reduction = 69.541 819 - 58.956 659 = 10.585 160, with 9 df.

Group B:  19.042 222 ß̂1 = 28.97
          ß̂1 = 1.521 356

Reduction = (1.521 356)(28.97) = 44.073 683, with 1 df

Residual = 52.96 - 44.073 683 = 8.886 317, with 7 df.

Group C:  21.349 ß̂1 = 38.933
          ß̂1 = 1.823 645

Reduction = (1.823 645)(38.933) = 70.999 971, with 1 df

Residual = 78.641 - 70.999 971 = 7.641 029, with 8 df.

If a single regression were fitted (ignoring groups), we would have:

74.898 667 ß̂1 = 93.949 334
ß̂1 = 1.254 353

Reduction = (1.254 353)(93.949 334) = 117.845 629, with 1 df

Residual = 239.449 667 - 117.845 629

         = 121.604 038, with 28 df.


Two approaches to the analysis of covariance will be illustrated. The first method is that given by Snedecor (7), while the second is a general method involving the introduction of dummy variables.

Problem XVIII - Covariance Analysis

Snedecor (7) presents the analysis of covariance in a very neat form. The steps in this procedure are summarized in table 1.

Table 1.--Analysis of covariance

For the test of difference in slopes: F = 0.925 415/1.129 688 = 0.82; not significant at the 0.05 level.

For the test of levels (assuming common slopes): F = 46.320 351/1.113 974 = 41.6; significant.

The first three lines in this table summarize the results of the fitting of separate linear regressions for each group. In line 4, the residuals about the separate regressions and the associated degrees of freedom are pooled. This pooled term can be thought of as the sum of squared residuals about the maximum model; it represents the smallest sum of squares that can be obtained by fitting straight lines to these observations.


Skipping to line 6 for the moment, the first four columns are the pooled degrees of freedom and corrected sums of squares and products for the groups. The last three columns summarize the result of using the pooled sums of squares and products to fit a straight line. The normal equation and solution for this fitting would be:

68.647 586 ß̂1 = 108.718 455

ß̂1 = 1.583 719

Thus the reduction sum of squares with 1 degree of freedom is

Reduction = 1.583 719(108.718 455) = 172.179 483

And the residual sum of squares is

Residual = Σy² - Reduction

         = 201.142 819 - 172.179 483

         = 28.963 336, with 26 df.

This represents the residual that we would get by forcing the regressions for all groups to have the same slope even though they were at different levels. Since this is a more restrictive model, the residual sum of squares will be larger than that obtained by letting each group regression have its own slope. The mean square difference in these residuals (line 5) can be used to test the hypothesis of common slopes. The error term for this test is the mean square of the pooled residuals for separate regressions (line 4). The F test gives no indication that the hypothesis of common slopes should be rejected. If the hypothesis of common slopes is rejected, we would usually go no further.

Having shown no significant difference in slopes, the next question would be whether the regressions differ in level. Under the hypothesis of no difference in levels (or slopes) we would, in effect, ignore the groups and use all of the data to fit a single regression. The results are summarized in line 8. Because of the added restriction (common levels) that has been imposed on this regression, the residuals will be larger than those obtained where we let the group regressions assume separate levels but force them to have a common slope (line 6). The mean square difference (line 7) provides a test of the hypothesis of common levels. The error for this test is the residual mean square for the model assuming common slopes (line 6).
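The arithmetic behind lines 4 through 8 of the covariance analysis can be retraced from the corrected sums of squares and products given above; a minimal sketch in Python:

    # Within-group corrected sums: (sum x1x1, sum x1y, sum yy, n) for each group.
    groups = {
        "A": (28.256364, 40.815455, 69.541819, 11),
        "B": (19.042222, 28.970000, 52.960000,  9),
        "C": (21.349000, 38.933000, 78.641000, 10),
    }

    # Line 4: pooled residuals about the separate within-group regressions.
    resid_sep = sum(syy - sxy**2 / sxx for sxx, sxy, syy, n in groups.values())
    df_sep    = sum(n - 2 for *_, n in groups.values())

    # Line 6: residual about a single slope fitted to the pooled within-group sums.
    sxx = sum(g[0] for g in groups.values())
    sxy = sum(g[1] for g in groups.values())
    syy = sum(g[2] for g in groups.values())
    resid_common = syy - sxy**2 / sxx
    df_common    = sum(n for *_, n in groups.values()) - len(groups) - 1

    # Line 8: residual about one regression ignoring the groups (pooled values in the text).
    resid_single = 239.449667 - 93.949334**2 / 74.898667

    F_slopes = ((resid_common - resid_sep) / 2) / (resid_sep / df_sep)
    F_levels = ((resid_single - resid_common) / 2) / (resid_common / df_common)
    print(round(resid_sep, 6), round(resid_common, 6), round(F_slopes, 2), round(F_levels, 1))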


The significant value of F suggests that the group regressions are different. The difference is mostly due to a difference in levels. There is no evidence of a real difference in slopes.

If we are not interested in finding out whether the difference (if any) in the group regressions is in the slopes or the levels, an overall test could be made using the difference between lines 8 and 4. We would have:

It is possible to test more complex hypotheses than these. We could, for example, test for difference in slope and level between groups A and C, or for the average of groups A and B versus group C. We could also deal with multiple or curvilinear regressions and test for differences between specified coefficients or sets of coefficients. It is probably safe to say that readers who have sufficient understanding of regression to derive meaningful interpretations of such tests will usually know how to make them.

Covariance Analysis With Dummy Variables

In fitting a regression within a single group, some workers are accustomed to dealing with the model

where X0 is a dummy variable, defined to be equal to 1 for all observations. The normal equations would, of course, be

Since X0 is equal to 1 for all observations, the normal equations are equivalent to:


So, the end result will be the same as that given by the methods previously described.

The idea of a dummy variable comes in quite handy in dealing with the problem of group regressions. There are several ways of applying the idea, but the most easily understood is to introduce a dummy variable for each group. The dummy variable would be defined as equal to 1 for every observation in that particular group, and equal to zero for any observation that is in a different group. As we are interested in linear regressions of Y on X1, the dummy variables could (in the case of three

groups) be labeled X2, X3 , and X4 where

X2 = 1 for any observation falling in group A.

= 0 for an observation falling in any other group.

X3 = 1 for any observation falling in group B.

= 0 for an observation falling in any other group.

X4 = 1 for any observation falling in group C.

= 0 for an observation falling in any other group.

It is also necessary to introduce three variables representing the interactions between the measured variable (X1) and the three dummy variables. These could be labeled X5, X6, and X7, where

With these variables, we can now express the idea of separate linear regressions for each group by the general linear model

After solving for the coefficients, the regression for any group can be obtained by assigning the appropriate value to each dummy variable. Thus for Group A, X2 =1,

X3 = 0, and X4 = 0.

So we have:

The equation will be exactly the same as that we would get by fitting a simple linear regression of Y on X1 for the observations in group A only.


Under the hypothesis that the three groups have regressions that differ in level but not in slope, the model would be

In this model, ß1 is the common slope, while ß2, ß3, and ß4 represent the different levels.

The difference in reduction sum of squares for these two models could then be used in a test of the hypothesis of common slopes.

Under the hypothesis that there is no difference in either slope or level, the model becomes

The difference in reduction between this model and the model assuming common slopes can be used to test the hypothesis that there is no difference in levels.
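A sketch of the dummy-variable formulation (Python with numpy; the data here are simulated, not those of the table above) builds the dummy and interaction variables and computes the reduction sum of squares for each of the three models.

    import numpy as np

    def fit_reduction(Z, Y):
        # Least-squares fit; returns the reduction (regression) sum of squares and its df.
        beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        return beta @ (Z.T @ Y), Z.shape[1]

    rng = np.random.default_rng(2)
    group = np.repeat([0, 1, 2], 8)                      # three groups of 8 observations
    X1 = rng.uniform(1, 6, size=24)
    Y  = 2.0 + 1.5 * X1 + np.array([0.0, 2.0, 4.0])[group] + rng.normal(size=24)

    D     = np.eye(3)[group]                  # dummy variables X2, X3, X4 (one per group)
    inter = D * X1[:, None]                   # interaction variables X5, X6, X7

    Z_separate = np.column_stack([D, inter])          # separate slope and level for each group
    Z_common   = np.column_stack([D, X1])             # common slope, separate levels
    Z_single   = np.column_stack([np.ones(24), X1])   # one regression, ignoring groups

    for Z in (Z_separate, Z_common, Z_single):
        print(fit_reduction(Z, Y))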

Problem XIX - Covariance Analysis With Dummy Variables

The use of dummy variables for a covariance analysis will lead to exactly the same result as the method of Snedecor (7). Applying the procedure to the data of the previous example, the values of the variables would be as follows:


Sums of Squares and Products (Uncorrected):

The model for separate regressions is

and the normal equations would be:


or

The solutions, which are easily obtained by working with pairs of equations involving the same coefficients, are:

Thus, the separate regressions would be

These are the same as the equations that would have been obtained by fitting separate regressions in each group.

The reduction due to this maximum model would be:

Reduction

This gives a residual sum of squares of

Residual


Under the hypothesis for common slopes but different levels, the model becomes

The normal equations are:

or,

The solutions are:

giving a reduction of

Reduction

Then to test the hypothesis of common slopes we have


as before.

Now to test the hypothesis of no difference in levels (assuming no difference in slopes), we must fit the model

The normal equations are:

or

The solutions are:

so the reduction sum of squares is

Reduction

Then the test for common levels (assuming common slopes) is

as before.


Although the two procedures will lead to exactly the same results, the computational routine of Snedecor’s procedure (7) is probably easier to follow and, therefore, better for the beginner. The advantage (if any) of the dummy variable approach might be that it gives a somewhat clearer picture of the hypotheses being tested. Once the dummy variable approach has been learned, it may be easier to work out its extension to the testing of more complex hypotheses.

Dummy variables are also useful in introducing the relationship between regression analysis and the analysis of variance for various experimental designs. Those interested in this subject will find a brief discussion in Appendix D.

DISCRIMINANT FUNCTION

Closely related to the methods of regression analysis is the problem of finding a linear function of one or more variables which will permit us to classify individuals as belonging to one of two groups. The function is known as a discriminant. In forestry, it might be used, for example, to find a function of several measurements which would enable us to assign fire- or insect-damaged trees to one of two classes: “will live” or “will die.”

The methods will be illustrated by the intentionally trivial example of classifying an individual as male or female by means of the individual’s height (X1), weight (X2),

and age (X3). To develop the discriminant, measurements of these three variables

were made on 10 men and 10 women.


The first step is to compute the difference in the group means for each variable and the corrected sums of squares and products for each group.

Mean differences:

Corrected Within-Group Sums of Squares and Products:

The next step is to compute the pooled variances and covariances. The pooled variance for Xj will be symbolized by sjj and computed as

sjj = (within-group sum of squares for Xj, males + within-group sum of squares for Xj, females) / (nm + nf - 2)

where: nm = number of males
       nf = number of females

The pooled covariance of Xj and Xk will be symbolized by sjk and computed in the same manner from the corrected within-group sums of products.


The computed values of the pooled variances and covariances are

Now what we wish to do is fit a function of the form

such that the value of Y (for measured values of X1, X2, and X3) will enable us to classify an individual as male or female. Fisher has shown that the bi values can be determined by solving the normal equations

Entering the calculated values, we have

for which solutions are

Use and Interpretation of the Discriminant Function

Assuming for the moment that we are satisfied with the fitted discriminant, it could be used as follows:

(1) Compute the mean value of the discriminant for males and for females


(2) The mean of these two values serves as a criterion for classifying individuals as male or female. Any individual for whom Y is greater than 25.4388 would be classified as male, and any individual for whom Y is less than 25.4388 would be classified as female.
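As a computational sketch of the steps just described, the function below (in Python; the names `group1`, `group2`, and `fit_discriminant` are purely illustrative, and no data from the text are reproduced) forms the pooled within-group variances and covariances, solves the normal equations for the b's, and returns the group means of Y together with their midpoint for use as the classification criterion.

```python
import numpy as np

def fit_discriminant(group1, group2):
    """Fisher discriminant for two groups given as (n, p) arrays of measurements."""
    d = group1.mean(axis=0) - group2.mean(axis=0)            # mean differences
    ss1 = np.cov(group1, rowvar=False) * (len(group1) - 1)   # corrected SS and products
    ss2 = np.cov(group2, rowvar=False) * (len(group2) - 1)
    s = (ss1 + ss2) / (len(group1) + len(group2) - 2)        # pooled variances/covariances
    b = np.linalg.solve(s, d)                                # normal equations  s b = d
    y1 = (group1 @ b).mean()                                 # mean discriminant, group 1
    y2 = (group2 @ b).mean()                                 # mean discriminant, group 2
    return b, y1, y2, (y1 + y2) / 2                          # midpoint = criterion

# To classify a new individual x (a 1-D array of the same measurements):
#   assign to group 1 if x @ b is above the midpoint, to group 2 if below.
```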

Testing a Fitted Discriminant

Before using the discriminant function for classification purposes, we should test its significance. This can be done with an F test having p and (N - p - 1) degrees of freedom:

F = [N1·N2·(N - p - 1) / (N(N - 2)p)] D²

where: p = Number of variables fitted.

N1 = Number of observations in the first group.

N2 = Number of observations in the second group.

N = N1 + N2.

D² = The generalized distance between the two groups, computed from the fitted coefficients and the mean differences as D² = b1d1 + b2d2 + b3d3.

For the previously fitted discriminant, we have p = 3, N1 = N2 = 10, and D² = 6.0127. Thus,

F = [(10)(10)(16) / ((20)(18)(3))] (6.0127) = 8.908; significant at the .01 level.
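The computation can be checked in a couple of lines; the constants below are those of the worked example, and the expression is the F statistic given above.

```python
# Check of the F test: p = 3 variables, N1 = N2 = 10 observations, D-squared = 6.0127.
N1, N2, p, D2 = 10, 10, 3, 6.0127
N = N1 + N2
F = (N1 * N2 * (N - p - 1)) / (N * (N - 2) * p) * D2
print(round(F, 3))   # 8.908, with 3 and 16 degrees of freedom
```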


This test tells us that there is a significant difference in the mean values of the discriminant between males and females. Looking at it another way, we have shown a significant difference between the two groups using measurements on several characteristics. This is analogous to the familiar t and F tests where a significant difference is shown between two groups using only a single variable. In fact, if we fit and test a discriminant function using just a single variable, for example, weight, we will get the same F value (29.824) as we would by testing the difference in weight between male and female using an F test of a completely randomized experimental design.

Testing the Contribution of Individual Variables or Sets of Variables

To test the contribution of any set of q variables in the presence of some other set of p variables, first fit a discriminant to the p variables and compute D²(p). Then fit all p + q variables and compute D²(p + q). The test of the contribution of the q variables in the presence of the p variables is:

F (with q and N - p - q - 1 degrees of freedom)

Thus, to test the contribution of weight and age in the presence of height, we first fit a discriminant function for height alone. The single equation is:

The discriminant for all three variables gave a value of D²(3) = 6.0127. The test is:

Hence, weight and age do not make a significant contribution to the discrimination between male and female when used after height.


The contribution of single variables can be similarly tested. For example, we could test the contribution of age when fitted after height and weight. To determine D²(2) for the discriminant using height and weight, we must solve the normal equations:

The solutions are:

from which we find D²(2) = 5.9660

For all three variables, we found D²(3) = 6.0127

Then,

Similar tests of the contribution of weight in the presence of height (significant at the .05 level) and the contribution of height after weight (not significant) suggest that weight alone provides about as good a means of discrimination as does the use of all three variables.

Reliability of Classifications

Using a discriminant function, we will misclassify some individuals. The probability of a misclassification can be estimated by using K = D/2 as a standard normal deviate and determining from a table of the cumulative normal distribution (table 8, Appendix E) the probability of τ > K.

Using all three variables, the value of D² was 6.0127, giving D = 2.452 and K = 1.226. The probability of getting a standard normal deviate larger than 1.226 is found to be about P = 0.1101. About 11 percent of the individuals classified using this function would be assigned to the wrong group. For the data used to develop the discriminant, it actually turns out that 2 out of 20 (10 percent) would have been misclassified.
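The same probability can be read by machine from the cumulative normal distribution (here via the complementary error function rather than table 8):

```python
import math

# K = D/2 treated as a standard normal deviate; P is the upper-tail area beyond K.
D2 = 6.0127
K = math.sqrt(D2) / 2
p_misclass = 0.5 * math.erfc(K / math.sqrt(2))   # P(standard normal deviate > K)
print(round(K, 3), round(p_misclass, 4))          # about 1.226 and 0.110
```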


For a discriminant involving height and weight but not age, the probability of a misclassification would be about 0.11096. For the discriminant involving weight alone, the probability of a misclassification is about 0.11098. Thus we see (as previous tests indicated) that our classifications using weight alone would be almost as reliable as those using weight, height, and age. For a discriminant involving height alone, the probability of a misclassification is about 0.18, and with a discriminant using age alone, about 0.368 of our classifications would be in error.

Reducing the Probability of a Misclassification

There are two possible procedures for reducing the proportion of misclassifications. One of these is to look for more or better variables to be used in the discriminant function. The second possibility is to set up a doubtful region within which no classification will be made. This requires determining two values, Ym and Yf. All individuals for which Y is greater than Ym will be classified as male, while all those for which Y is less than Yf will be classified as female. For values of Y between Yf and Ym, no classification will be made.

To determine Ym and Yf it is first necessary to decide the proportion of misclassifications we are willing to tolerate. Suppose, for example, we will use our three-variable discriminant function but we wish to make no more than 5-percent misclassifications. The procedure is to look in a table of the cumulative normal for a value of τ such that the probability of getting a standard normal deviate greater than τ is 0.05. The value of τ meeting this requirement is τ = 1.645. Then the appropriate limit values are

Ym = (mean discriminant value for females) + τD

Yf = (mean discriminant value for males) - τD

For the three-variable discriminant, we have:

Therefore,

An individual with a Y value greater than 26.4659 would be classified as a male while an individual with Y less than 24.4116 would be classified as female. No classification would be given for individuals with a Y value between 24.4116 and 26.4659.
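A short sketch of the limit computation follows. It assumes (as a property of this scaling of the discriminant, not something quoted from the text) that the two group means straddle the midpoint 25.4388 and differ by D², so they can be recovered from quantities already given.

```python
import math

D2 = 6.0127
D = math.sqrt(D2)
midpoint = 25.4388
y_female = midpoint - D2 / 2       # assumed: group means straddle the midpoint
y_male = midpoint + D2 / 2         #          and differ by D-squared
tau = 1.645                        # 5-percent point of the standard normal

Y_m = y_female + tau * D           # classify as male above this value
Y_f = y_male - tau * D             # classify as female below this value
print(round(Y_f, 4), round(Y_m, 4))   # approximately 24.41 and 26.47
```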


Basic Assumptions

The methods described here assume that for each variable, the within-group variance is the same in both groups. Different variables can, of course, have different variances. Also, any given pair of variables is assumed to have the same within-group covariance in each group. All variables must follow (within the group) a multivariate normal distribution. Since the methods are based on large sample theory, it is ordinarily desirable to have at least 30 observations in each group.

ELECTRONIC COMPUTERS

The present popularity of regression analysis is due in no small way to the advances that have been made in electronic computers. The computations involved in fitting regressions with more than two or three independent variables are quite tedious, and with a large number of observations, fitting even a simple linear regression may be an unpleasant task. Also, the possibilities for simple but devastating arithmetical mistakes are great. Modern electronic computers have overcome both of these obstacles. They can handle huge masses of raw data and subject it to numerous mathematical operations in a matter of minutes, and they seldom make mistakes.

Nearly every phase of regression analysis can be handled by one or more of the computers and almost every computing center has programs for obtaining sums of squares and products, fitting multiple regressions, inverting matrices, computing reduction and residual sums of squares, etc. Despite the high per hour rental on these computers, the cost of doing a particular regression computation will usually be a small fraction of what it would cost to do the same job with a desk calculator, and the work will rarely contain serious errors.

Because of the numerous variations in these programs and the rate at which new ones are being produced, no attempt will be made to list everything that is available or to describe the use of such programs. This information can best be obtained by first learning what regression is, how and why it works, and then discussing your needs with a computer specialist.

To merely indicate what can be done by a computer, a brief description will be given of a few of the existing programs.

TV REM is the designation of a regression program for the IBM 704 computer. It will take up to 586 sets of observations on a Y and up to 9 independent (X) variables and compute the mean of each variable and the corrected sums of squares and products


for all variables. It will also fit the regressions of Y on all possible linear combinations of up to nine independent variables (a total of 511 different equations) and compute the reduction sum of squares associated with each fitted equation. The cost of this may vary from $40 to $200, depending largely on the machine rental rate and to a lesser extent on the volume of data. This program is described in a publication by L. R. Grosenbaugh (6).

SS XXR is another program for the IBM 704. It will take up to 999,999 sets of up to 41 variables and compute their means, all possible uncorrected sums of squares and products, the corrected sums of squares and products, and the simple correlation coefficients for all possible pairs of variables. The cost may run from $5 to $50, depending again on machine rental rates and on the number of observations and variables.

These give just a faint idea of what is available. Other programs will compute sums of squares and products and give the inverse matrix for 40 or 50 variables, fit regressions for as many as 60 independent variables, or fit weighted regressions and regressions subject to various constraints. One program will follow what is known as a stepwise fitting procedure (see Appendix A, Method III), in which the best single independent variable will be fitted and tested first; then from the remaining variables, the program will select and fit the variable that will give the greatest reduction in the residual sum of squares. This will continue until a variable is encountered that does not make a significant reduction. The program can also be altered so as to introduce a particular variable at any stage of the fitting.

No space need be devoted in this Research Paper to encouraging the reader to look into the computer possibilities, for he will be a convert the first time he has occasion to fit a four-variable regression and compute the c-multipliers--if not sooner.

CORRELATION COEFFICIENTS

General

In earlier literature, there is frequent reference to and use of various forms of correlation coefficients. They were used as a guide in the selection of independent variables to be fitted, and many of the regression computations were expressed in terms of correlation coefficients. In recent years, however, their role has been considerably diminished, and in at least one of the major texts on regression analysis, correlation is mentioned less than a half-dozen times. The subject will be touched upon lightly here so that the reader will not be entirely mystified by references to correlation in the earlier literature.


The Simple Correlation Coefficient

A measure of the degree of association between two normally distributed variables (Y and X) is the simple correlation coefficient, symbolized by ρ and defined as

ρ = σXY / (σX σY)

where σXY is the covariance of X and Y, and σX and σY are their standard deviations.

The correlation coefficient can have values from -1 to +1. A value approaching +1 would indicate a strong positive relationship between Y and X, while a value approaching -1 would indicate a strong negative relationship. A value approaching 0 would suggest that there is little or no relationship between Y and X.

For a random sample, the correlation between X and Y can be estimated by

r = Σxy / √((Σx²)(Σy²))

where x and y denote deviations of X and Y from their respective sample means.

In regression work we will seldom be dealing with strictly random samples. Usually we try to get a wide range of values of the independent variable (X) in order to have more precise estimates of the regression coefficients or to spot the existence of curvilinear relationships. In addition, the data may not be from a normal population. For these reasons, the sample correlation coefficient computed from regression data will usually not be a valid estimate of the population correlation coefficient.

It will, however, give a measure of the degree of linear association between the sample values of Y and X, and this has been one of its primary uses in regression. If we have observations on a Y and several X variables, the X variable having the strongest correlation with Y (nearest to +1 or -1) will give the best association with Y in a simple linear regression. That is, a linear regression of Y on this X will have a smaller residual sum of squares than the simple linear regression of Y on any of the other X variables.
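As a small screening sketch (the arrays below are placeholders, not data from the text), the correlation of Y with each candidate X can be computed and the variable with the largest absolute correlation noted:

```python
import numpy as np

y = np.array([4.1, 5.0, 6.2, 7.1, 8.3, 9.0])
xs = {"X1": np.array([1.0, 2.0, 3.1, 4.2, 5.1, 6.0]),
      "X2": np.array([2.5, 2.0, 3.5, 3.0, 4.8, 4.1])}

# Simple correlation of Y with each X; the largest |r| marks the best single predictor.
corr = {name: np.corrcoef(x, y)[0, 1] for name, x in xs.items()}
best = max(corr, key=lambda name: abs(corr[name]))
print(corr, "->", best, "gives the smallest residual SS in a simple linear regression")
```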

In this use of the correlation coefficient, it must be remembered that it is a measure of linear association. A low correlation coefficient may suggest that there is little or no linear relationship between the observed values of the two variables. There may, however, be a very strong curvilinear relationship. The simple correlation between Y and the X variables and among the X variables themselves may also be used as a somewhat confusing guide in the selection of independent variables to be used in the fitting of a multiple regression. In general, when two independent variables are highly correlated with each other, it is unlikely that a linear regression involving both of these variables will be very much better than a linear regression involving only one of them. If we had, for example, ry1 = .84, ry2 = .78, and r12 = .92,


then the regression Y = ß0 + ß1X1 + ß2X2 would probably not be much better than either Y1 = ß0 + ß1X1 or Y2 = ß0 + ß2X2. Of the two simple regressions, Y1 = ß0 + ß1X1 would give the better fit, since the correlation of Y and X1 is greater than the correlation of Y and X2. In practice, the correlations usually are not so large or the indications so clearcut. When a number of X variables are under consideration for use in a multiple regression, inspection of the simple correlation coefficients between Y and each X and between pairs of X's provides little more than a rough screening.

Partial Correlation Coefficients

In the previous paragraph we considered an approach to the problem of which variables to use in a multiple regression. In the case of a Y and two X variables, this reduces to the question of whether or not a linear regression involving X1 and X2 would be any better than a simple regression involving only X1 or X2 as the independent variable. The simple correlation coefficients sometimes shed some light on this, but they are just as likely to confuse the issue.

The partial correlation coefficient may give a better answer. Having fitted a regression of Y on one or more X variables, the partial correlation coefficient indicates the degree of linear association between the regression residuals (deviations of Y from the regression) and some other X variable. Thus ry2·1 would be a measure of the linear relationship between Y and X2 after adjustment for the linear relationship between Y and X1. The value of ry2·1 is given by

ry2·1 = (ry2 - ry1 r12) / √((1 - ry1²)(1 - r12²))

In the example where we had ry1 = .84, ry2 = .78, and r21 (= r12) = .92, we would have

ry2·1 = (.78 - (.84)(.92)) / √((1 - .84²)(1 - .92²)) = .0072/.2127 = .034


This tells us that after fitting the linear regression of Y on X1, there would be little association between Y and X2. A more exact way of putting this is that the correlation between X2 and the residuals about the regression of Y on X1 is very low (.034).

The general equation for the partial correlation between Y and Xj after fitting the linear regression of Y on Xk is

ryj·k = (ryj - ryk rjk) / √((1 - ryk²)(1 - rjk²))

This is sometimes referred to as the first partial correlation coefficient.

If we wished to know the correlation between a variable (say X3) and the residuals of Y about the multiple regression (say Y on X1 and X2), the formula would be

ry3·12 = (ry3·1 - ry2·1 r32·1) / √((1 - ry2·1²)(1 - r32·1²))

This is sometimes referred to as the second partial correlation coefficient. In order to compute the second partial it would be necessary to first compute the first partials (ry2·1, ry3·1, etc.) by means of the previous formula.

The process can be extended to the extent of the individual's inclination and energy. The correlation of X4 with the residuals of Y after fitting the regression of Y on X1, X2, and X3 would be

ry4·123 = (ry4·12 - ry3·12 r43·12) / √((1 - ry3·12²)(1 - r43·12²))

The general equation for the correlation between Xj and the residuals of Y after fitting the regression of Y on X1, X2, ..., Xk is

ryj·12...k = (ryj·12...(k-1) - ryk·12...(k-1) rjk·12...(k-1)) / √((1 - ryk·12...(k-1)²)(1 - rjk·12...(k-1)²))
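The recursion above is easy to mechanize. The sketch below (the function name and the storage scheme are choices of this sketch, not of the text) computes a partial correlation of any order from a table of simple correlations, and reproduces the .034 of the earlier example:

```python
def partial_r(r, a, b, controls):
    """Correlation of a and b after adjusting for the variables in `controls`.

    `r` is a dict of simple correlations keyed by frozenset pairs of variable names.
    """
    if not controls:
        return r[frozenset((a, b))]
    k = controls[-1]            # peel off the last control variable
    rest = controls[:-1]
    r_ab = partial_r(r, a, b, rest)
    r_ak = partial_r(r, a, k, rest)
    r_bk = partial_r(r, b, k, rest)
    return (r_ab - r_ak * r_bk) / ((1 - r_ak**2) * (1 - r_bk**2)) ** 0.5

r = {frozenset(("y", "1")): 0.84,
     frozenset(("y", "2")): 0.78,
     frozenset(("1", "2")): 0.92}
print(round(partial_r(r, "y", "2", ["1"]), 3))   # 0.034, as in the example above
```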

The use of partial correlation coefficients as an aid in the selection of the best independent variables to be fitted in a multiple regression has lost much of its popularity since the advent of electronic computers. With these machines it has become fairly easy to fit regressions involving many or all possible combinations of a set of independent variables and then select the best combination by an inspection of the residual mean squares.

The Coefficient of Determination

A commonly used measure of how well a regression fits a set of data is the coefficient of determination, symbolized by r² if the regression involves only one independent variable and by R² if it involves more than one independent variable. For the common case of a regression with a constant term (ß0) which has been fitted with corrected sums of squares and products, the coefficient of determination is calculated as

R² = (Reduction sum of squares) / (Corrected sum of squares for Y)

Thus, R² represents the proportion of the variation in Y that is associated with the regression on the independent variables.

If the regression has been fitted and the reduction computed with uncorrected sums of squares, the formula for R² is

R² = (Reduction - CT) / (ΣY² - CT), where CT = (ΣY)²/n is the correction term.

The relationship between the coefficient of determination and the correlation coefficient can be seen by an inspection of how r² is computed. For a simple linear regression (writing x and y for deviations from the means), the normal equation and its solution are

(Σx²)b1 = Σxy, so that b1 = Σxy / Σx²

Then, since the reduction sum of squares is equal to the estimated coefficient times the right-hand side of its normal equation, we have

Reduction = b1(Σxy) = (Σxy)² / Σx²

Thus,

r² = Reduction / Σy² = (Σxy)² / (Σx² Σy²)


This can be recognized as the square of the simple correlation coefficient

By analogy, R is called the coefficient of multiple correlation.
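A brief numerical check of the identity just derived (the data below are placeholders):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

xd, yd = x - x.mean(), y - y.mean()             # deviations from the means
b1 = (xd @ yd) / (xd @ xd)                      # normal-equation solution
reduction = b1 * (xd @ yd)                      # reduction SS = coefficient x RHS
r_squared = reduction / (yd @ yd)               # coefficient of determination
r = (xd @ yd) / np.sqrt((xd @ xd) * (yd @ yd))  # simple correlation coefficient
print(round(r_squared, 6), round(r**2, 6))      # identical
```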

Tests of Significance

The simple and multiple correlation coefficients (r and R) are sometimes used to test a fitted regression. The distribution of these sample variables has been tabulated and the test consists of comparing the sample value with the tabulated value. If the sample value is greater than the tabular value at a specified probability level, the regression is said to be significant.

In the case of a simple linear regression Y = ß0 + ß1X1, r has degrees of freedom equal to the degrees of freedom for the residual mean square. The test of the hypothesis that ρ = 0 is equivalent to the previously described tests of the hypothesis that ß1 = 0. It is possible to test other hypotheses about ρ or to test the difference between two sample r values, but these require a transformation of r to Z. The details are given by Snedecor (7).

In the case of a multiple regression Y = ß0 + ß1X1 + ß2X2 + ... + ßkXk, testing the significance of R is equivalent to testing the hypothesis that ß1 = ß2 = ... = ßk = 0. R has degrees of freedom equal to the degrees of freedom for the residual mean square. The tables of R take into account the number of variables fitted.

Using r or R for tests of significance seems to offer no advantages over the appropriate F- or t-test.

THE BEST OF TWO LINEAR REGRESSIONS

When a single set of observations has been used to fit simple linear regressions of Y on each of two independent variables, it is often desirable to know which of these regressions is the better. That is, we have Y1 = ß01 + ß11X1 and Y2 = ß02 + ß12X2, both of which are significant, and we want to know whether one is significantly better than the other.


A test credited to Hotelling and described by W. D. Baten in the Journal of the American Society of Agronomy (Vol. 33: pp. 695-699) is to compare

with tabular t with n - 3 degrees of freedom.

In this equation,

|r| = the absolute value (sign ignored) of the simple correlation coefficient (ry1 = the correlation between Y and X1, etc.).

To illustrate, suppose we have the following set of observations.


Then,

not significant at the .05 level.

Thus, the linear regression of Y on X2 is not significantly better (from the standpoint of precision) than the regression of Y on X1. If X2 were significantly better than X1 but X1 were more easily measured, then selecting the better of the two regressions becomes a matter of deciding how much the extra precision of X2 is worth.

By working with partial correlation coefficients it is possible to extend this test to the problem of which of two variables is better, when fitted after some specified set of independent variables. The test cannot, unfortunately, be extended to the comparison of two sets of independent variables.


SELECTED REFERENCES

1. Dixon, W. J., and Massey, F. J., Jr. 1957. Introduction to statistical analysis. 488 pp. New York: McGraw-Hill Book Co.

2. Ezekiel, M., and Fox, K. A. 1959. Methods of correlation and regression analysis: linear and curvilinear. Ed. 3, 548 pp. New York: John Wiley & Sons.

3. Freese, F. 1962. Elementary forest sampling. Forest Serv., U.S. Dept. of Agriculture, Agr. Handbook No. 232, 91 pp. Washington, D.C.

4. Friedman, J., and Foote, R. J. Computational methods for handling systems of simultaneous equations. Marketing Serv., U.S. Dept. of Agriculture, Agr. Handbook No. 94, 109 pp. U.S. Government Printing Office, Washington, D.C.

5. Goulden, C. H. 1952. Methods of statistical analysis. Ed. 2, 467 pp. New York: John Wiley & Sons.

6. Grosenbaugh, L. R. 1958. The elusive formula of best fit: a comprehensive new machine program. U.S. Forest Serv., Southern Forest Expt. Sta. Paper No. 158, 9 pp. New Orleans, La.

7. Snedecor, G. W. 1956. Statistical methods. Ed. 5, 534 pp. Ames, Iowa: Iowa State University Press.

8. Walker, H. M. 1951. Mathematics essential for elementary statistics. Ed. 2, New York: Henry Holt & Co.

9. Williams, E. J. Regression analysis. 214 pp. New York: John Wiley & Sons.


APPENDIX A-The solution of normal equations

There are numerous routines available for solving a set of normal equations. One of these is to use the c-multipliers as described in the section on matrix algebra. However, if the c-multipliers are not needed for setting confidence limits or testing hypotheses, then there are less laborious procedures available. Three of these will be illustrated by solving the normal equations that appear in the first part of Problem I:

In each of these methods, and throughout this Paper, more digits are carried than are warranted by the rules for significant digits. Unless this is done it is usually impossible to get any sort of check on the computations. After the computations have been checked, the coefficients should be rounded off to a number of digits commensurate with the precision of the original data.

Method I. --Basic Procedure. Basically, all methods involve manipulating the equations so as to eliminate all but one unknown, and then solving for this unknown. Solutions for the other unknowns are then obtained by substitution in the equations that arise at intermediate stages. This may be illustrated by the following direct approach which may be applied to any set of simultaneous equations.

Step 1. Divide through each equation by the coefficient of ß̂1, giving

Step 2. Eliminate ß̂1 by subtracting any one of the equations (say the first) from each of the others

FPL 17 -111-

Page 116: fplrp17

Step 3. Divide through each equation by the coefficient of ß̂2, giving

Step 4. Subtract either equation (say the first) from the other to eliminate ß̂2.

Step 5. Solve for ß̂3

Step 6. To solve for ß̂2, substitute the solution for ß̂3 in one of the equations (say the second) of Step 3.

so,

Step 7. To solve for ß̂1, substitute for ß̂2 and ß̂3 in one of the equations (say the third) of Step 1.

Step 8. As a check, add up all of the original normal equations giving

Now substitute the solutions for ß̂1, ß̂2, and ß̂3 in this equation,

Check.
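When a machine is available, the whole routine above collapses to a single call. The sketch below uses placeholder coefficients, since the numerical normal equations of Problem I are not reproduced in this listing; the final line performs the same sort of check as Step 8.

```python
import numpy as np

# Placeholder normal equations A b = g (A = sums of squares and products, g = RHS).
A = np.array([[10.0, 4.0, 2.0],
              [ 4.0, 8.0, 3.0],
              [ 2.0, 3.0, 6.0]])
g = np.array([20.0, 14.0, 9.0])

b = np.linalg.solve(A, g)             # the fitted coefficients
print(b, np.allclose(A @ b, g))       # check: the original equations are satisfied
```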

Method II.--Forward Solution. A systematic procedure for solving the normal equations is the so-called “Forward Solution.” It is a more mechanical routine and perhaps a bit more difficult to learn and remember, but it has the advantage of providing some supplementary information along the way. The steps will be described using the symbols of table 2. The numerical results of these steps will be presented


in table 3. In this example, the columns headed “Coefficients” and “Reduction” give supplementary information. If only the final regression coefficients are desired, these columns may be omitted.

In any of the mechanical computation systems there is a pattern to the computations. Once this pattern has been recognized, the systems are easily applied and extended to larger or smaller problems. Learning the system is primarily a matter of recognizing the pattern.

Step 1. Write the upper right half of the matrix of sums of squares and products along with the sums of products involving Y; in table 2 these are the elements a11, a12, a13, etc., together with the corresponding sums of products with Y. In the column headed “Coefficients” are the regression coefficients that would be obtained by fitting a simple linear regression of Y on each of the X variables. The coefficients are computed as

Table 2.--The forward solution in symbols.


Table 3.--The forward solution--numerical example.


In the last column are the reduction sums of squares that would be obtained by fitting the linear regression.

These are computed as

or,

Step 2. Rewrite the sums of squares and products from the X1 row.

Step 3. Divide each element in row 2 by the first element in that row (a11). Thus, q12 = a12/a11

Step 4. Compute the matrix of sums of squares and products adjusted for the regression of Y on X1. The general equation is

Thus,

The coefficients obtained at this stage are those that would be obtained for X2 or X3 when fitted along with X1. To indicate this, the symbols often used are b2·1 (or sometimes bY2·1) and b3·1 (or sometimes bY3·1). In the last column are the reductions that would be attributable to X2 (or X3) when fitted after X1. In this example the reduction due to X1 alone is 1102.7690 and the reduction due to X2 after X1 (i.e., the gain due to X2) is 629.5758; so the total reduction due to fitting X1 and X2 would be the sum of these, or 1732.3448.

At this stage we could, if desired, compute a residual sum of squares and mean square and test whether X2 or X3 made a significant reduction when fitted after X1. If neither did, we might not wish to continue the fitting.


Step 5. Copy the adjusted sums of squares and products in the first row of Step 4.

Step 6. Divide each element of Step 5 by the value of the first element (a22·1).

Step 7. Compute the matrix of sums of squares and products adjusted for the regression of Y on X1 and X2.

The regression coefficient b3·12 is the coefficient for X3 fitted in the presence of X1 and X2 and is one of the terms we are seeking (previously we labelled it ß̂3, but we use b3·12 here to distinguish it from b3·1 and b3). The other two terms, b2·13 (or ß̂2) and b1·23 (or ß̂1), are easily obtained from lines 6 and 3.

Thus, from line 6

and from line 3,

The reduction obtained in Step 7 is the gain due to X3 after fitting X1 and X2. Since the reduction due to X1 and X2 was 1732.3448 and the gain due to X3 after X1 and X2 is 4.1860, the total reduction due to X1, X2, and X3 is 1736.5308 (as given in Problem I). We could, at this stage, test the gain due to X3 and decide whether to retain it as a variable in the regression. If we decided to drop X3, the coefficients for the regression of Y on X1 and X2 could be obtained from Steps 1 through 4 simply by ignoring all the terms having a 3 in the subscript.

Method III.--Stepwise Fitting. This method is merely a modification of the second method. At each stage of the fitting, the sums of squares and products (original or adjusted) are rearranged so that the variable giving the largest reduction of the residuals is on the left and will be the next one fitted. Also, at each stage, the reduction due to the best variable is tested, and if the gain is not significant the fitting is stopped.


The procedure is helpful for screening a large number of independent variables in order to select those that are likely to give a good fit to the sample data. It should be noted, however, that the procedure is strictly exploratory. The probabilities associated with tests of hypotheses that are selected by examination of the data are not what they seem to be. Significance tests made in this way do not have the same meaning that they have when applied to a single preselected hypothesis.

It might also be noted that though the stepwise procedure will frequently lead to the linear combination of the independent variables that will result in the smallest residual mean square, it does not always do so. This can only be done by fitting all possible combinations and then comparing their residuals. Here again, tests of significance may be informative, but the exact probabilities are unknown.
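A rough sketch of the stepwise idea in Python follows; the F-to-enter cutoff `f_in`, the function names, and the use of a single matrix `x_all` holding all candidate X columns are illustrative choices, not part of the text.

```python
import numpy as np

def reduction(x_cols, y):
    """Reduction SS (b'X'Y) for a regression of y on a constant plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + x_cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b @ X.T @ y

def forward_select(x_all, y, f_in=4.0):
    """Forward stepwise selection: returns indices of the chosen columns of x_all."""
    chosen, remaining = [], list(range(x_all.shape[1]))
    while remaining:
        base = reduction([x_all[:, j] for j in chosen], y)
        gains = {j: reduction([x_all[:, k] for k in chosen + [j]], y) - base
                 for j in remaining}
        j_best = max(gains, key=gains.get)             # largest additional reduction
        df_res = len(y) - (len(chosen) + 2)            # constant + chosen + candidate
        resid = y @ y - (base + gains[j_best])
        if df_res <= 0 or gains[j_best] / (resid / df_res) < f_in:
            break                                      # gain judged not worth keeping
        chosen.append(j_best)
        remaining.remove(j_best)
    return chosen

# Usage: chosen = forward_select(x_all, y), where x_all is an (n, k) array of X values.
```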


APPENDIX B-Matrix inversion

The inversion of a matrix is a common mathematical problem and dozens of computational schemes have been devised for this purpose. The job is not particularly complex, but it can be quite tedious and it is very easy to make simple but disastrous arithmetical mistakes. To avoid a load of unpleasant labor and the possibility of some frustrating mistakes, it is best to let an electronic computer handle the work (this is true of all regression calculations). For a few dollars, the computer will do a job that might take days on a desk calculator.

There will be times, however, when electronic computer facilities are not immediately available and hand computation is necessary. One of the many computational routines for inverting a symmetrical matrix is known as the “Abbreviated Doolittle Method.” To illustrate the procedure we will obtain the inverse of the matrix of uncorrected sums of squares and products from the second part of Problem I. The matrix is

In describing this method, the elements of the matrix to be inverted will be symbolized by aij, where i indicates the row and j the column in which the element appears. Since the matrix is symmetrical, aij = aji. The elements of the inverse matrix will be symbolized by cij. Since the original matrix is symmetrical, the inverse will also be symmetrical and hence cij = cji. As this is a matrix of uncorrected sums of squares and products, we will let i and j start at zero. If we were working with corrected sums of squares and products we would usually let i and j start at one.

The results of each step in the method will be shown symbolically in table 4 and numerically in table 5. In following these steps it is important to notice the pattern of the computations. Once this has been recognized, the extension to a matrix of any size will be obvious.


Table 4.--Inverting a symmetric matrix


Table 5.--Inverting a symmetric matrix--numerical example.


Step 1. In the A Columns write the upper right half of the matrix to be inverted.

Step 2. In the I Columns write a complete identity matrix of the same dimensions as the matrix to be inverted.

Step 3. In the check column perform the indicated summations. For row 0 the sum will be a00 + a01 + a02 + a03 + 1. For row 1 the sum will be a10 + a11 + a12 + a13 + 1, and so forth. Note that a10 = a01 (the matrix is symmetrical).

Step 4. Copy the entries from row 0. In table 4, the entry in the first I Column (=1) has been symbolized by d00.

Step 5. Divide each element (including the check sum) of line 4 by the first element (a00) in that line. The sum of all of the elements in the A and I Columns will equal the value in the check column if no error has been made.

Step 6. The elements in this line (including the check) are obtained by multiplying each element of line 4 (except the first) by b01 and subtracting this quantity from the corresponding elements of row 1. Thus, a11·0 = a11 - b01a01 and a12·0 = a12 - b01a02. The sum of these elements must equal the value in the check column.

Step 7. Divide each element in line 6 by the first element in that line (a11·0). Check.

Step 8. The elements in this line are obtained by subtracting two quantities from each element of row 2. The two quantities are b02 times (the element in line 4 below the row 2 element) and b12·0 times (the element in line 6 below the row 2 element). Thus,

and

The sum of the elements in line 8 must equal the value computed for the check column.

Step 9. Divide each element of line 8 by the first element in this line. Check.

Step 10. The elements of this line are obtained by subtracting three quantities from each element of row 3. The three quantities are b03 times (the line 4 element below the row 3 element), b13·0 times (the line 6 element below the row 3 element), and b23·01 times (the line 8 element below the row 3 element). Thus,


and

The sum of the elements in line 10 must equal the computed value in the check column.

Step 11. Divide each element in line 10 by the first element in that line (a33·012).

Step 12. Compute the c-multipliers by the following formulae:

Step 13. As a final check, multiply the original matrix by the inverse. The product should be the identity matrix.
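By machine, the inversion and the final check of Step 13 are immediate. The matrix below is a placeholder standing in for the matrix of the example:

```python
import numpy as np

# A placeholder symmetric matrix of uncorrected sums of squares and products.
A = np.array([[12.0,  30.0,  66.0],
              [30.0, 100.0, 210.0],
              [66.0, 210.0, 480.0]])

C = np.linalg.inv(A)                   # the c-multipliers
print(np.allclose(A @ C, np.eye(3)))   # final check: should print True
```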


APPENDIX C-Some simple functions and curve forms

I. Y = a + bX Straight line

Linear Model: Y = b0 + b1X

a = Y-intercept (Value of Y when X = 0)

b = Slope (Change in Y per unit change in X)

Estimates:

II. (Y - a) = k(X - b)²   Second-degree parabola

Linear Model: Y = b0 + b1X + b2X²

Y-intercept is at Y = kb² + a

X-intercepts are at X = b ± √(-a/k) (complex if -a/k is negative)

Estimates:


III. (Y - a) = k/X   Hyperbola

a = Level of horizontal asymptote

Linear Model: Y = b0 + b1(1/X)

Estimates:

IV.

Estimates:


V.

Estimates:


VI.


Estimates: a = anti-log b0
           b = b1
           c = anti-log b2


VII. Y = a·b^((X - c)²)

Linear Model: log Y = b0 + b1X + b2X²

Estimates:

VIII. 10^Y = aX^b

Linear Model: Y = b0 + b1(log X)

Estimates: a = anti-log b0
           b = b1
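As an illustration of using one of these transformed forms, form VIII can be fitted by regressing Y on log X and then recovering a and b as indicated; the data below are placeholders.

```python
import numpy as np

X = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
Y = np.array([0.31, 0.62, 0.89, 1.22, 1.48])

# Fit the linear model Y = b0 + b1(log X), then back-transform the estimates.
design = np.column_stack([np.ones(len(X)), np.log10(X)])
(b0, b1), *_ = np.linalg.lstsq(design, Y, rcond=None)
a_hat = 10 ** b0          # a = anti-log b0
b_hat = b1                # b = b1
print(round(a_hat, 3), round(b_hat, 3))
```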


APPENDIX D-The analysis of designed experiments

In the section on covariance analysis we encountered the use of dummy variables in a regression analysis. This leaves us very close to the connection between regression analysis and the analysis of variance in designed experiments. Those who have some familiarity with the analysis of variance of designed experiments may be interested in taking a look at this connection.

As a simple example, suppose that we have a completely randomized design comparing three treatments with four replications of each. The yields might be as follows:

Treatment        Yields           Sums
    I        12  17  16  15        60
   II        14   9  13  12        48
  III        11  20  18  13        62
                          Total   170

For the standard analysis of variance of this design we first calculate the correction term and the sums of squares for total, treatment, and error.

Correction term (CT) = (170)²/12 = 2408.3333

Total = ΣY² - CT = 2518 - 2408.3333 = 109.6667

Treatment = (60² + 48² + 62²)/4 - CT = 2437.0000 - 2408.3333 = 28.6667

Error = Total - Treatment = 109.6667 - 28.6667 = 81.0000


The completed analysis is

Source           df        SS         MS        F
Treatments        2      28.6667    14.3333    1.593
Error             9      81.0000     9.0000
Total            11     109.6667
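The same figures can be reproduced directly from the yields:

```python
import numpy as np

yields = np.array([[12, 17, 16, 15],
                   [14,  9, 13, 12],
                   [11, 20, 18, 13]], dtype=float)   # one row per treatment

CT = yields.sum() ** 2 / yields.size                 # (170)^2 / 12
total = (yields ** 2).sum() - CT                     # total SS
treatment = (yields.sum(axis=1) ** 2).sum() / 4 - CT # treatment SS (4 plots per treatment)
error = total - treatment
print(round(CT, 4), round(total, 4), round(treatment, 4), round(error, 4))
# 2408.3333  109.6667  28.6667  81.0
```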

These computations are merely a simplified form of regression analysis. To see this, we first represent the yield for each plot in the experiment by the linear model

Y = ß0 + ß1X1 + ß2X2 + ß3X3

where: X1 = 1 for any plot receiving treatment I; = 0 otherwise

X2 = 1 for any plot receiving treatment II; = 0 otherwise

X3 = 1 for any plot receiving treatment III; = 0 otherwise

ß1, ß2, and ß3 = the effects of treatments I, II, and III, expressed as deviations from the overall mean (represented by ß0).

Because the treatment effects are expressed as deviations from the mean, they will sum to zero (i.e., ß1 + ß2 + ß3 = 0), so that we can express one coefficient in terms of the other two (say ß3 = -ß1 - ß2) and rewrite the model

Y = ß0 + ß1(X1 - X3) + ß2(X2 - X3)

or

Y = ß0 + ß1X'1 + ß2X'2

where: X'1 = X1 - X3 and X'2 = X2 - X3.


For any plot receiving treatment I, the independent variables will have values X'1 = 1 and X'2 = 0; for treatment II plots, the values are X'1 = 0 and X'2 = 1; and for treatment III plots, X'1 = -1 and X'2 = -1. Thus the study data can be listed as follows:

Treatment    Y = Yield    X'1    X'2
    I            12         1      0
                 17         1      0
                 16         1      0
                 15         1      0
   II            14         0      1
                  9         0      1
                 13         0      1
                 12         0      1
  III            11        -1     -1
                 20        -1     -1
                 18        -1     -1
                 13        -1     -1
 Sums           170         0      0

The normal equations for fitting the revised model (with uncorrected sums of squares and products) are:

nß̂0 + (ΣX'1)ß̂1 + (ΣX'2)ß̂2 = ΣY
(ΣX'1)ß̂0 + (ΣX'1²)ß̂1 + (ΣX'1X'2)ß̂2 = ΣX'1Y
(ΣX'2)ß̂0 + (ΣX'1X'2)ß̂1 + (ΣX'2²)ß̂2 = ΣX'2Y

or

12ß̂0 + 0ß̂1 + 0ß̂2 = 170
0ß̂0 + 8ß̂1 + 4ß̂2 = -2
0ß̂0 + 4ß̂1 + 8ß̂2 = -14

The solutions are:

ß̂0 = 14.1667,  ß̂1 = .8333,  ß̂2 = -2.1667


The reduction sum of squares for this model is therefore

Reduction = (14.1667)(170) + (.8333)(-2) + (-2.1667)(-14)

= 2437.0062, with 3 df,

and the residual sum of squares is

Residual = ΣY² - Reduction

= 2518 - 2437.0062 = 80.9938, with 12 - 3 = 9 df.

The hypothesis of no difference among treatments is equivalent to the hypothesis that ß1 = ß2 = ß3 = 0, and the model becomes

Y = ß0

for which the normal equation is

12ß̂0 = 170

The solution is ß̂0 = 14.1667, so the reduction sum of squares is

Reduction = (14.1667)(170) = 2408.3390, with 1 df.

Then the analysis of variance for testing this hypothesis is

Source                                  df       SS          MS
Reduction due to maximum model           3     2437.0062
Reduction due to hypothesis model        1     2408.3390
-----------------------------------------------------------------
Difference for testing hypothesis        2       28.6672    14.3336
Residuals about maximum model            9       80.9938     8.9993
Total                                   12     2518

F = 14.3336 / 8.9993 = 1.593, with 2/9 df

Except for rounding errors and differences in terminology, this is the same result as the standard test procedure.
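The dummy-variable fit above can be checked by machine; using the yields and the coded X'1 and X'2 of the listing, a least-squares routine gives the same reductions and the same F (apart from the rounding just noted).

```python
import numpy as np

y = np.array([12, 17, 16, 15, 14, 9, 13, 12, 11, 20, 18, 13], dtype=float)
x1 = np.array([1, 1, 1, 1, 0, 0, 0, 0, -1, -1, -1, -1], dtype=float)   # X'1
x2 = np.array([0, 0, 0, 0, 1, 1, 1, 1, -1, -1, -1, -1], dtype=float)   # X'2

X_max = np.column_stack([np.ones(12), x1, x2])      # maximum model
X_hyp = np.ones((12, 1))                            # hypothesis model (mean only)

def reduction(X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b @ X.T @ y                              # reduction SS = b'X'Y

red_max, red_hyp = reduction(X_max), reduction(X_hyp)
residual = y @ y - red_max
F = ((red_max - red_hyp) / 2) / (residual / 9)
print(round(red_max, 4), round(red_hyp, 4), round(F, 3))
# 2437.0  2408.3333  1.593  (compare 2437.0062, 2408.3390, and 1.593 above)
```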


In illustrating the regression basis for the analysis of a designed experiment, we have made use of dummy variables so as to avoid too much of a departure from the familiar regression situation. But in most textbooks on experimental design the dummy variables are present by implication only. The model is written in terms of the coefficients. Thus, for the completely randomized design the model might be written

Yij = µ + τi + εij

where: Yij = the observed yield of the jth plot of the ith treatment

µ = the overall mean yield

τi = the effect of treatment i, expressed as a departure from the overall mean (so that Στi = 0)

εij = the error associated with the jth plot of the ith treatment.

For the randomized block design with one replication of each treatment in each block, the model is

Yij = µ + ßi + τj + εij

where ßi = the effect of block i, expressed as a departure from the overall mean, so that Σßi = 0.

Other terms are as previously defined.

Thus, each experimental design is defined by some linear model. The analysis of variance for the design involves a least-squares fitting of the model under various hypotheses and testing the differences in residuals. As in any regression analysis, the hypothesis to be tested should be specified prior to examination of the data.


APPENDIX E-Tables

Table 6.--The distribution of F


Table 6.--The distribution of F (cont.)


Reproduced by permission of the author and publishers from table 10.5.3 of Snedecor's Statistical Methods (ed. 5), © 1956, Iowa State University Press, Ames, Iowa. Permission has also been granted by the literary executor of the late Professor Sir Ronald A. Fisher and Oliver and Boyd Ltd., publishers, for the portion of the table computed from Dr. Fisher's table VI in Statistical Methods for Research Workers.


Table 7.--The distribution of t


Table reproduced in part from table III of Fisher and Yates' Statistical Tables for Biological, Agricultural, and Medical Research, published by Oliver and Boyd Ltd., Edinburgh, Scotland. Permission has been given by Dr. F. Yates, by the literary executor of the late Professor Sir Ronald A. Fisher, and by the publishers.


Table 8.--The cumulative normal distribution (Probability of a standard normal deviate being greater than 0 and less than τ)

 τ     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2   .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9   .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0   .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
3.1   .4990  .4991  .4991  .4991  .4992  .4992  .4992  .4992  .4993  .4993
3.2   .4993  .4993  .4994  .4994  .4994  .4994  .4994  .4995  .4995  .4995
3.3   .4995  .4995  .4995  .4996  .4996  .4996  .4996  .4996  .4996  .4997
3.4   .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4997  .4998
3.6   .4998  .4998  .4999  .4999  .4999  .4999  .4999  .4999  .4999  .4999
3.9   .5000


Reprinted from Table 8.8.1 of Statistical Methods (ed. 5) by G. W. Snedecor, © 1956, published by the Iowa State University Press, Ames, Iowa, and by permission of the author and publisher.


[Map: Forest Service regional experiment stations and Forest Products Laboratory]


PUBLICATION LISTS ISSUED BY THE

FOREST PRODUCTS LABORATORY

The following lists of publications deal with investigative projects of the Forest Products Laboratory or relate to special interest groups and are avail-able upon request:

Box, Crate, and Packaging Data

Chemistry of Wood

Drying of Wood

Fire Protection

Fungus and Insect Defects in Forest Products

Glue and Plywood

Growth, Structure, and Identification of Wood

Furniture Manufacturers, Woodworkers, and Teachers of Woodshop Practice

Logging, Milling, and Utilization of Timber Products

Mechanical Properties of Timber

Pulp and Paper

Structural Sandwich, Plastic Laminates, and Wood-Base Components

Thermal Properties of Wood

Wood Finishing Subjects

Wood Preservation

Architects, Builders, Engineers, and Retail Lumbermen

Note: Since Forest Products Laboratory publications are so varied in subject matter, no single catalog of titles is issued. Instead, a listing is made for each area of Laboratory research. Twice a year, December 31 and June 30, a list is compiled showing new reports for the previous 6 months. This is the only item sent regularly to the Laboratory's mailing roster, and it serves to keep current the various subject matter listings. Names may be added to the mailing roster upon request.


The Forest Service, U.S. Department of Agriculture, is dedicated to the principle of multiple use management of the Nation's forest resources for sustained yields of wood, water, forage, wildlife, and recreation. Through for-estry research, cooperation with the States and private forest owners, and management of the National Forests and National Grasslands, it strives--as directed by Congress--to provide increasingly greater service to a growing Nation.

U.S. Forest Products Laboratory. Linear regression methods for forest research, by Frank Freese. Madison, Wis., F.P.L., 1964. 136 pp., illus. (U.S. FS res. paper FPL 17)

A presentation and discussion of the methods of linear regression analysis that have been found most useful in forest research. Topics treated include the fitting and testing of linear models, weighted regression, confidence limits, covariance analysis, and discriminant functions. The various methods are illustrated by typical numerical examples and their solution.


FOREST PRODUCTS LABORATORY

U.S. DEPARTMENT OF AGRICULTURE

FOREST SERVICE--MADISON, WIS.

In Cooperation with the University of Wisconsin