

The Concise Encyclopedia of Statistics


Yadolah Dodge

The Concise Encyclopedia of Statistics

With 247 Tables



Author: Yadolah Dodge, Honorary Professor, University of Neuchâtel, [email protected]

A C.I.P. catalog record for this book is available from the Library of Congress.

ISBN: 978-0-387-32833-1

This publication is also available as: Print publication under ISBN 978-0-387-31742-7 and Print and electronic bundle under ISBN 978-0-387-33828-6.

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer is part of Springer Science+Business Media

springer.com

© 2008 Springer Science + Business Media, LLC.

The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Printed on acid-free paper    SPIN: 10944523 2109 – 5 4 3 2 1 0


To the memory of my beloved wife K, my caring mother,
my hard-working father
and
to my two kind and warm-hearted sons, Ali and Arash


Preface

With this concise volume we hope to satisfy the needs of a large scientific community previously served mainly by huge encyclopedic references. Rather than aiming at a comprehensive coverage of our subject, we have concentrated on the most important topics, but explained those as deeply as space has allowed. The result is a compact work which we trust leaves no central topics out.

Entries have a rigid structure to facilitate the finding of information. Each term introduced here includes a definition, history, mathematical details, limitations in using the terms followed by examples, references and relevant literature for further reading. The reference is arranged alphabetically to provide quick access to the fundamental tools of statistical methodology and biographies of famous statisticians, including some current ones who continue to contribute to the science of statistics, such as Sir David Cox, Bradley Efron and T.W. Anderson, just to mention a few. The criteria for selecting these statisticians, whether living or absent, are of course rather personal, and it is very possible that some famous persons deserving of an entry are absent. I apologize sincerely for any such unintentional omissions.

In addition, an attempt has been made to present the essential information about statistical tests, concepts, and analytical methods in language that is accessible to practitioners and students and the vast community using statistics in medicine, engineering, physical science, life science, social science, and business/economics.

The primary steps of writing this book were taken in 1983. In 1993 the first French-language version was published by the Dunod publishing company in Paris. Later, in 2004, the updated and longer version in French was published by Springer France, and in 2007 a student edition of the French edition was published by Springer.

In this encyclopedia, just as with the Oxford Dictionary of Statistical Terms, published for the International Statistical Institute in 2003, for each term one or more references are given, in some cases to an early source, and in others to a more recent publication. While some care has been taken in the choice of references, the establishment of historical priorities is notoriously difficult and the historical assignments are not to be regarded as authoritative.

For more information on terms not found in this encyclopedia, short articles can be found in the following encyclopedias and dictionaries:


International Encyclopedia of Statistics, eds. William Kruskal and Judith M. Tanur (The Free Press, 1978).

Encyclopedia of Statistical Sciences, eds. Samuel Kotz, Norman L. Johnson and Campbell Read (John Wiley and Sons, 1982).

The Encyclopedia of Biostatistics, eds. Peter Armitage and Ted Colton (Chichester: John Wiley and Sons, 1998).

The Encyclopedia of Environmetrics, eds. A.H. El-Shaarawi and W.W. Piegorsch (John Wiley and Sons, 2001).

The Encyclopedia of Statistics in Quality and Reliability, eds. F. Ruggeri, R.S. Kenett and F.W. Faltin (John Wiley and Sons, 2008).

Dictionnaire Encyclopédique en Statistique, Yadolah Dodge (Springer, 2004).

Between the publication of the first version of the current book in French in 1993 and the later edition in 2004, up to the current one, the manuscript has undergone many corrections. Special care has been taken in choosing suitable translations for terms in order to achieve sound meaning in both the English and French languages. If in some cases this has not happened, I apologize. I would be very grateful to readers for any comments regarding inaccuracies, corrections, and suggestions for the inclusion of new terms, or any matter that could improve the next edition. Please send your comments to Springer-Verlag.

I wish to thank the many people who helped me throughout these many years to bring this manuscript to its current form, starting with my former assistants from 1983 to 2004: Nicole Rebetez, Sylvie Gonano-Weber, Maria Zegami, Jurg Schmid, Severine Pfaff, Jimmy Brignony, Elisabeth Pasteur, Valentine Rousson, Alexandra Fragnieire, and Theiry Murrier. To my colleagues Joe Whittaker of the University of Lancaster, Ludevic Lebart of France Telecom, and Bernard Fisher, University of Marseille, for reading parts of the manuscript. Special thanks go to Gonna Serbinenko and Thanos Kondylis for their remarkable cooperation in translating some of the terms from the French version to English. Working with Thanos, my former Ph.D. student, was a wonderful experience. To my colleague Shahriar Huda, whose helpful comments, criticisms, and corrections contributed greatly to this book. Finally, I thank Springer-Verlag, especially John Kimmel, Andrew Spencer, and Oona Schmid, for their meticulous care in the production of this encyclopedia.

January 2008

Yadolah Dodge
Honorary Professor
University of Neuchâtel, Switzerland


About the Author

Founder of the Master in Statistics program in 1989 for the University of Neuchâtel in Switzerland, Professor Yadolah Dodge earned his Master in Applied Statistics from Utah State University in 1970 and his Ph.D. in Statistics with a minor in Biometry from Oregon State University in 1973. He has published numerous articles and authored, co-authored, and edited several books in the English and French languages, including Mathematical Programming in Statistics (John Wiley 1981, Classic Edition 1993), Analysis of Experiments with Missing Data (John Wiley 1985), Alternative Methods of Regression (John Wiley 1993), Premier Pas en Statistique (Springer 1999), Adaptive Regression (Springer 2000), The Oxford Dictionary of Statistical Terms (2003), Statistique: Dictionnaire encyclopédique (Springer 2004), and Optimisation appliquée (Springer 2005). Professor Dodge is an elected member of the International Statistical Institute (1976) and a Fellow of the Royal Statistical Society.


A

Acceptance Region

The acceptance region is the interval within the sampling distribution of the test statistic that is consistent with the null hypothesis H0 in hypothesis testing.
It is the complementary region to the rejection region.
The acceptance region is associated with a probability 1 − α, where α is the significance level of the test.

MATHEMATICAL ASPECTS
See rejection region.

EXAMPLES
See rejection region.

FURTHER READING
► Critical value
► Hypothesis testing
► Rejection region
► Significance level

Accuracy

The general meaning of accuracy is the proximity of a value or a statistic to a reference value. More specifically, it measures the proximity of the estimator T of the unknown parameter θ to the true value of θ.

The accuracy of an estimator can be measured by the expected value of the squared deviation between T and θ, in other words:

E[(T − θ)²] .
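As a small illustration (not part of the original entry), this quantity can be estimated by simulation; the sketch below takes the sample mean as the estimator T of a population mean θ, with invented values for θ, σ and n.

```python
# Hypothetical sketch: estimating the accuracy E[(T - theta)^2] by simulation,
# where T is the sample mean and theta the true population mean.
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, n = 5.0, 2.0, 20          # invented population values

estimates = rng.normal(theta, sigma, size=(50_000, n)).mean(axis=1)
accuracy = np.mean((estimates - theta) ** 2)

print(accuracy)                          # close to sigma**2 / n = 0.2
```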

Accuracy should not be confused with the term precision, which indicates the degree of exactness of a measure and is usually indicated by the number of decimals given after the decimal point.

FURTHER READING
► Bias
► Estimator
► Parameter
► Statistics

Algorithm

An algorithm is a process that consists of a sequence of well-defined steps that lead to the solution of a particular type of problem. This process can be iterative, meaning that it is repeated several times. It is generally a numerical process.

HISTORY
The term algorithm comes from the Latin pronunciation of the name of the ninth-century mathematician al-Khwarizmi, who lived in Baghdad and was the father of algebra.


DOMAINS AND LIMITATIONS
The word algorithm has taken on a different meaning in recent years due to the advent of computers. In the field of computing, it refers to a process that is described in a way that can be used in a computer program.
The principal goal of statistical software is to develop a programming language capable of incorporating statistical algorithms, so that these algorithms can then be presented in a form that is comprehensible to the user. The advantage of this approach is that the user understands the results produced by the algorithm and trusts the precision of the solutions. Among various statistical reviews that discuss algorithms, the Journal of Algorithms from the Academic Press (New York), the part of the Journal of the Royal Statistical Society Series C (Applied Statistics) that focuses on algorithms, Computational Statistics from Physica-Verlag (Heidelberg) and Random Structures and Algorithms edited by Wiley (New York) are all worthy of special mention.

EXAMPLES
We present here an algorithm that calculates the absolute value of a nonzero number, in other words |x|.
Process:

Step 1. Identify the algebraic sign of the given number.

Step 2. If the sign is negative, go to step 3. If the sign is positive, specify the absolute value of the number as the number itself:

|x| = x

and stop the process.

Step 3. Specify the absolute value of the given number as its opposite number:

|x| = −x

and stop the process.
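A minimal sketch of this process in Python (the function name abs_value is ours, not from the original text):

```python
def abs_value(x):
    """Return |x| following the three-step process above."""
    # Step 1: identify the algebraic sign of the given number.
    if x >= 0:
        # Step 2: the sign is positive, so |x| is the number itself.
        return x
    # Step 3: the sign is negative, so |x| is the opposite number.
    return -x

print(abs_value(-3.5))   # 3.5
print(abs_value(2.0))    # 2.0
```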

FURTHER READING
► Statistical software
► Yates’ algorithm

REFERENCES
Chambers, J.M.: Computational Methods for Data Analysis. Wiley, New York (1977)
Khwarizmi, Musa ibn Meusba (9th cent.): Jabr wa-al-muqeabalah. The algebra of Mohammed ben Musa, Rosen, F. (ed. and transl.). Georg Olms Verlag, Hildesheim (1986)
Rashed, R.: La naissance de l’algèbre. In: Noël, E. (ed.) Le Matin des Mathématiciens. Belin-Radio France, Paris (1985)

Alternative Hypothesis

An alternative hypothesis is the hypothesis which differs from the hypothesis being tested.
The alternative hypothesis is usually denoted by H1.

HISTORY
See hypothesis and hypothesis testing.

MATHEMATICAL ASPECTS
During the hypothesis testing of a parameter of a population, the null hypothesis is presented in the following way:

H0 : θ = θ0 ,


where θ is the parameter of the population that is to be estimated, and θ0 is the presumed value of this parameter. The alternative hypothesis can then take three different forms:
1. H1 : θ > θ0
2. H1 : θ < θ0
3. H1 : θ ≠ θ0
In the first two cases, the hypothesis test is called one-sided, whereas in the third case it is called two-sided.
The alternative hypothesis can also take three different forms during the hypothesis testing of parameters of two populations. If the null hypothesis treats the two parameters θ1 and θ2 equally, then:

H0 : θ1 = θ2   or   H0 : θ1 − θ2 = 0 .

The alternative hypothesis could then be:
• H1 : θ1 > θ2 or H1 : θ1 − θ2 > 0
• H1 : θ1 < θ2 or H1 : θ1 − θ2 < 0
• H1 : θ1 ≠ θ2 or H1 : θ1 − θ2 ≠ 0
During the comparison of more than two populations, the null hypothesis supposes that the values of all of the parameters are identical. If we want to compare k populations, the null hypothesis is the following:

H0 : θ1 = θ2 = . . . = θk .

The alternative hypothesis will then be formulated as follows:

H1 : the values of θi (i = 1, . . . , k) are not all identical.

This means that only one parameter needs to have a different value from those of the other parameters in order to reject the null hypothesis and accept the alternative hypothesis.

EXAMPLES
We are going to examine the alternative hypotheses for three examples of hypothesis testing:
1. Hypothesis testing on the percentage of a population
An election candidate wants to know if he will receive more than 50% of the votes. The null hypothesis for this problem can be written as follows:

H0 : π = 0.5 ,

where π is the percentage of the population to be estimated. We carry out a one-sided test on the right-hand side that allows us to answer the candidate’s question. The alternative hypothesis will therefore be:

H1 : π > 0.5 .

2. Hypothesis testing on the mean of a population
A bolt maker wants to test the precision of a new machine that should make bolts 8 mm in diameter. We can use the following null hypothesis:

H0 : μ = 8 ,

where μ is the mean of the population that is to be estimated. We carry out a two-sided test to check whether the bolt diameter is too small or too big. The alternative hypothesis can be formulated in the following way:

H1 : μ ≠ 8 .

3. Hypothesis testing on a comparison of the means of two populations
An insurance company decided to equip its offices with microcomputers. It wants to buy these computers from two different companies so long as there is no significant difference in durability between the two brands. It therefore tests the time that passes before the first breakdown on a sample of microcomputers from each brand. According to the null hypothesis, the mean of the elapsed time before the first breakdown is the same for each brand:

H0 : μ1 − μ2 = 0 .

Here μ1 and μ2 are the respective means of the two populations. Since we do not know which mean will be the highest, we carry out a two-sided test. Therefore the alternative hypothesis will be:

H1 : μ1 − μ2 ≠ 0 .
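As a hedged illustration of example 2 (not part of the original entry), a two-sided one-sample t test can be carried out with SciPy; the bolt diameters below are invented for the purpose of the sketch.

```python
# Hypothetical data for example 2: testing H0: mu = 8 against H1: mu != 8.
from scipy import stats

diameters = [8.02, 7.97, 8.05, 7.94, 8.01, 7.99, 8.03, 7.96]   # invented sample
t_stat, p_value = stats.ttest_1samp(diameters, popmean=8)

# Reject H0 in favour of the two-sided alternative when p_value < alpha.
print(t_stat, p_value)
```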

FURTHER READING
► Analysis of variance
► Hypothesis
► Hypothesis testing
► Null hypothesis

REFERENCES
Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses, 3rd edn. Springer, New York (2005)

Analysis of Binary Data

The analysis of binary data is the study of how the probability of success depends on explanatory variables and on the grouping of the material.
The analysis of binary data also involves goodness-of-fit tests of a sample of binary variables to a theoretical distribution, as well as the study of 2 × 2 contingency tables and their subsequent analysis. In the latter case we note especially independence tests between attributes, and homogeneity tests.

HISTORY
See data analysis.

MATHEMATICAL ASPECTS
Let Y be a binary random variable and X1, X2, . . . , Xk be supplementary variables. The dependence of Y on the variables X1, X2, . . . , Xk is represented by the following models (the coefficients of which are estimated via maximum likelihood):
1. Linear model: P(Y = 1) is expressed as a linear function (in the parameters) of the Xi.
2. Log-linear model: log P(Y = 1) is expressed as a linear function (in the parameters) of the Xi.
3. Logistic model: log(P(Y = 1)/P(Y = 0)) is expressed as a linear function (in the parameters) of the Xi.
Models 1 and 2 are easier to interpret. Yet the last one has the advantage that the quantity to be explained can take all possible values of the linear function. It is also important to pay attention to the extrapolation of the model outside of the domain in which it is applied.
It is possible that among the independent variables (X1, X2, . . . , Xk) there are categorical variables (e.g. binary ones). In this case, it is necessary to treat the nonbinary categorical variables in the following way: let Z be a random variable with m categories. We enumerate the categories from 1 to m and we define m − 1 random variables Z1, Z2, . . . , Zm−1, such that Zi takes the value 1 if Z belongs to the category represented by this index. The variable Z is therefore replaced by these m − 1 variables, the coefficients of which express the influence of


the considered category. The reference category (used in order to avoid the situation of collinearity) will have (for the purposes of comparison with other categories) a parameter of zero.
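A minimal sketch (ours, not from the book) of fitting the logistic model (model 3) with m − 1 indicator variables for a categorical predictor, using pandas and scikit-learn; the data frame and all column values are invented for illustration.

```python
# Hypothetical illustration of model 3 with dummy coding of a categorical Z.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "y":  [0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1],                       # binary response
    "x1": [1.2, 2.5, 3.1, 2.8, 2.9, 1.4, 3.3, 2.0, 1.7, 2.7, 1.6, 3.0],
    "z":  ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c", "c", "c"],
})

# Replace the m-category variable z by m - 1 indicators; the dropped
# (reference) category implicitly receives a coefficient of zero.
X = pd.get_dummies(df[["x1", "z"]], columns=["z"], drop_first=True)

# Note: scikit-learn applies a mild ridge penalty by default, so this is only
# an approximation of the maximum-likelihood fit mentioned above.
model = LogisticRegression().fit(X, df["y"])
print(dict(zip(X.columns, model.coef_[0])))
```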

FURTHER READING
► Binary data
► Data analysis

REFERENCES
Cox, D.R., Snell, E.J.: The Analysis of Binary Data. Chapman & Hall (1989)

Analysis of Categorical Data

The analysis of categorical data involves the following methods:

(a) A study of the goodness-of-fit test;

(b) The study of a contingency table and its subsequent analysis, which consists of discovering and studying relationships between the attributes (if they exist);

(c) A homogeneity test of some populations, related to the distribution of a binary qualitative categorical variable;

(d) An examination of the independence hypothesis.

HISTORY
The term “contingency”, used in relation to cross tables of categorical data, was probably first used by Pearson, Karl (1904). The chi-square test was proposed by Bartlett, M.S. in 1937.

MATHEMATICAL ASPECTS
See goodness-of-fit and contingency table.

FURTHER READING
► Data
► Data analysis
► Categorical data
► Chi-square goodness of fit test
► Contingency table
► Correspondence analysis
► Goodness of fit test
► Homogeneity test
► Test of independence

REFERENCES
Agresti, A.: Categorical Data Analysis. Wiley, New York (1990)
Bartlett, M.S.: Properties of sufficiency and statistical tests. Proc. Roy. Soc. Lond. Ser. A 160, 268–282 (1937)
Cox, D.R., Snell, E.J.: Analysis of Binary Data, 2nd edn. Chapman & Hall, London (1990)
Haberman, S.J.: Analysis of Qualitative Data. Vol. I: Introductory Topics. Academic, New York (1978)
Pearson, K.: On the theory of contingency and its relation to association and normal correlation. Drapers’ Company Research Memoirs, Biometric Ser. I, pp. 1–35 (1904)

Analysis of Residuals

An analysis of residuals is used to test the validity of the statistical model and to check the assumptions made on the error term. It may also be used for outlier detection.

HISTORY
The analysis of residuals dates back to Euler (1749) and Mayer (1750) in the middle of the eighteenth century, who were confronted with the problem of the estimation of parameters from observations in the field of astronomy. Most of the methods used to analyze residuals are based on the works of Anscombe (1961) and Anscombe and Tukey (1963). In 1973, Anscombe also presented an interesting discussion on the reasons for using graphical methods of analysis. Cook and Weisberg (1982) dedicated a complete book to the analysis of residuals. Draper and Smith (1981) also addressed this problem in a chapter of their work Applied Regression Analysis.

MATHEMATICAL ASPECTS
Consider a general model of multiple linear regression:

Yi = β0 + Σ_{j=1}^{p−1} βj Xij + εi ,   i = 1, . . . , n ,

where εi is the nonobservable random error term.
The hypotheses for the errors εi are generally as follows:
• The errors are independent;
• They are normally distributed (they follow a normal distribution);
• Their mean is equal to zero;
• Their variance is constant and equal to σ².
Regression analysis gives an estimation for Yi, denoted Ŷi. If the chosen model is adequate, the distribution of the residuals or “observed errors” ei = Yi − Ŷi should confirm these hypotheses.
Methods used to analyze residuals are mainly graphical. Such methods include:
1. Representing the residuals by a frequency chart (for example a scatter plot).
2. Plotting the residuals as a function of time (if the chronological order is known).
3. Plotting the residuals as a function of the estimated values Ŷi.
4. Plotting the residuals as a function of the independent variables Xij.
5. Creating a Q–Q plot of the residuals.
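As an illustration of point 3 (not part of the original entry), the following sketch fits a simple regression with NumPy on invented data and plots the residuals against the fitted values with Matplotlib.

```python
# Hypothetical illustration of a residual-vs-fitted plot (method 3 above).
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])   # made-up data

b1, b0 = np.polyfit(x, y, deg=1)      # least-squares slope and intercept
y_hat = b0 + b1 * x                   # fitted values
residuals = y - y_hat                 # observed errors e_i

plt.scatter(y_hat, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```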

DOMAINS AND LIMITATIONS
To validate the analysis, some of the hypotheses need to hold (like, for example, the normality of the residuals in estimations based on the mean square).
Consider a plot of the residuals as a function of the estimated values Ŷi. This is one of the most commonly used graphical approaches to verifying the validity of a model. It consists of placing:
• The residuals ei = Yi − Ŷi in increasing order;
• The estimated values Ŷi on the abscissa.
If the chosen model is adequate, the residuals are uniformly distributed on a horizontal band of points.

However, if the hypotheses for the residuals are not verified, the shape of the plot can be different to this. The three figures below show the shapes obtained when:
1. The variance σ² is not constant. In this case, it is necessary to perform a transformation on the data Yi before tackling the regression analysis.
2. The chosen model is inadequate (for example, the model is linear but the constant term was omitted when it was necessary).
3. The chosen model is inadequate (a parabolic tendency is observed).

Different statistics have been proposed in order to permit numerical measurements that are complementary to the visual techniques presented above, which include those given by Anscombe (1961) and Anscombe and Tukey (1963).

EXAMPLES
In the nineteenth century, a Scottish physicist named Forbes, James D. wanted to estimate the altitude above sea level by measuring the boiling point of water. He knew that the altitude could be determined from the atmospheric pressure; he then studied the relation between pressure and the boiling point of water. Forbes suggested that for an interval of observed values, a plot of the logarithm of the pressure as a function of the boiling point of water should give a straight line. Since the logarithm of these pressures is small and varies little, we have multiplied these values by 100 below.

X (boiling point)    Y (100 · log(pressure))
194.5                131.79
194.3                131.79
197.9                135.02
198.4                135.55
199.4                136.46
199.9                136.83
200.9                137.82
201.1                138.00
201.4                138.06
201.3                138.05
203.6                140.04
204.6                142.44
209.5                145.47
208.6                144.34
210.7                146.30
211.9                147.54
212.2                147.80

The simple linear regression model for this problem is:

Yi = β0 + β1 Xi + εi ,   i = 1, . . . , 17 .


Using the least squares method, we can find the following estimation function:

Ŷi = −42.131 + 0.895 Xi ,

where Ŷi is the estimated value of variable Y for a given X.
For each of these 17 values of Xi, we have an estimated value Ŷi. We can calculate the residuals:

ei = Yi − Ŷi .

These results are presented in the following table:

i    Xi      Yi       Ŷi        ei = Yi − Ŷi
1    194.5   131.79   132.037   −0.247
2    194.3   131.79   131.857   −0.067
3    197.9   135.02   135.081   −0.061
4    198.4   135.55   135.529    0.021
5    199.4   136.46   136.424    0.036
6    199.9   136.83   136.872   −0.042
7    200.9   137.82   137.768    0.052
8    201.1   138.00   137.947    0.053
9    201.4   138.06   138.215   −0.155
10   201.3   138.05   138.126   −0.076
11   203.6   140.04   140.185   −0.145
12   204.6   142.44   141.081    1.359
13   209.5   145.47   145.469    0.001
14   208.6   144.34   144.663   −0.323
15   210.7   146.30   146.543   −0.243
16   211.9   147.54   147.618   −0.078
17   212.2   147.80   147.886   −0.086

Plotting the residuals as a function of the estimated values Ŷi gives the previous graph. It is apparent from this graph that, except for one observation (the 12th), where the value of the residual seems to indicate an outlier, the residuals are distributed in a very thin horizontal strip. In this case the residuals do not provide any reason to doubt the validity of the chosen model. By analyzing the standardized residuals we can determine whether the 12th observation is an outlier or not.
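The fit and residuals above can be recomputed directly from the data table; the following sketch (ours) does so with NumPy.

```python
# Recomputing the least-squares fit and residuals for the boiling-point data.
import numpy as np

x = np.array([194.5, 194.3, 197.9, 198.4, 199.4, 199.9, 200.9, 201.1, 201.4,
              201.3, 203.6, 204.6, 209.5, 208.6, 210.7, 211.9, 212.2])
y = np.array([131.79, 131.79, 135.02, 135.55, 136.46, 136.83, 137.82, 138.00,
              138.06, 138.05, 140.04, 142.44, 145.47, 144.34, 146.30, 147.54,
              147.80])

slope, intercept = np.polyfit(x, y, deg=1)    # least-squares estimates
residuals = y - (intercept + slope * x)

print(intercept, slope)       # approximately -42.131 and 0.895
print(residuals.round(3))     # observation 12 stands out (about +1.36)
```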

FURTHER READING
► Anderson–Darling test
► Least squares
► Multiple linear regression
► Outlier
► Regression analysis
► Residual
► Scatterplot
► Simple linear regression

REFERENCES
Anscombe, F.J.: Examination of residuals. Proc. 4th Berkeley Symp. Math. Statist. Prob. 1, 1–36 (1961)
Anscombe, F.J.: Graphs in statistical analysis. Am. Stat. 27, 17–21 (1973)
Anscombe, F.J., Tukey, J.W.: Analysis of residuals. Technometrics 5, 141–160 (1963)
Cook, R.D., Weisberg, S.: Residuals and Influence in Regression. Chapman & Hall, London (1982)
Cook, R.D., Weisberg, S.: An Introduction to Regression Graphics. Wiley, New York (1994)
Cook, R.D., Weisberg, S.: Applied Regression Including Computing and Graphics. Wiley, New York (1999)


Draper, N.R., Smith, H.: Applied Regression Analysis, 3rd edn. Wiley, New York (1998)
Euler, L.: Recherches sur la question des inégalités du mouvement de Saturne et de Jupiter, pièce ayant remporté le prix de l’année 1748, par l’Académie royale des sciences de Paris. Republié en 1960, dans Leonhardi Euleri, Opera Omnia, 2ème série. Turici, Bâle, 25, pp. 47–157 (1749)
Mayer, T.: Abhandlung über die Umwälzung des Monds um seine Achse und die scheinbare Bewegung der Mondflecken. Kosmographische Nachrichten und Sammlungen auf das Jahr 1748 1, 52–183 (1750)

Analysis of Variance

The analysis of variance is a technique that consists of separating the total variation of a data set into logical components associated with specific sources of variation, in order to compare the means of several populations. This analysis also helps us to test certain hypotheses concerning the parameters of the model, or to estimate the components of the variance. The sources of variation are globally summarized in a component called error variance, sometimes called the within-treatment mean square, and another component that is termed “effect” or treatment, sometimes called the between-treatment mean square.

HISTORY
Analysis of variance dates back to Fisher, R.A. (1925). He established the first fundamental principles in this field. Analysis of variance was first applied in the fields of biology and agriculture.

MATHEMATICAL ASPECTS
The analysis of variance compares the means of three or more random samples and determines whether there is a significant difference between the populations from which the samples are taken. This technique can only be applied if the random samples are independent, if the population distributions are approximately normal and all have the same variance σ².
Having established the null hypothesis, which assumes that the means are equal, and the alternative hypothesis, which affirms that at least one of them is different, we fix a significance level. We then make two estimates of the unknown variance σ²:
• The first, denoted s²E, corresponds to the mean of the variances of each sample;
• The second, s²Tr, is based on the variation between the means of the samples.
Ideally, if the null hypothesis is verified, these two estimates will be equal, and the F ratio (F = s²Tr/s²E, as used in the Fisher test and defined as the quotient of the second estimate of σ² to the first) will be equal to 1. The value of the F ratio, which is generally more than 1 because of sampling variation, must be compared to the value in the Fisher table corresponding to the fixed significance level. The decision rule consists of rejecting the null hypothesis if the calculated value is greater than or equal to the tabulated value; otherwise we conclude that the means are equal, which shows that the samples come from the same population.
Consider the following model:

Yij = μ + τi + εij ,   i = 1, 2, . . . , t ,   j = 1, 2, . . . , ni .

Here
Yij represents observation j receiving treatment i,
μ is the general mean common to all treatments,
τi is the actual effect of treatment i on the observation,
εij is the experimental error for observation Yij.

In this case, the null hypothesis is expressed in the following way:

H0 : τ1 = τ2 = . . . = τt ,

which means that the t treatments are identical.
The alternative hypothesis is formulated in the following way:

H1 : the values of τi (i = 1, 2, . . . , t) are not all identical.

The following formulae are used:

SSTr = Σ_{i=1}^{t} ni (Ȳi. − Ȳ..)² ,   s²Tr = SSTr / (t − 1) ,

SSE = Σ_{i=1}^{t} Σ_{j=1}^{ni} (Yij − Ȳi.)² ,   s²E = SSE / (N − t) ,

and

SST = Σ_{i=1}^{t} Σ_{j=1}^{ni} (Yij − Ȳ..)²   or   SST = SSTr + SSE ,

where

Ȳi. = (Σ_{j=1}^{ni} Yij) / ni   is the mean of the ith set,

Ȳ.. = (1/N) Σ_{i=1}^{t} Σ_{j=1}^{ni} Yij   is the global mean taken over all the observations, and

N = Σ_{i=1}^{t} ni   is the total number of observations,

and finally the value of the F ratio is

F = s²Tr / s²E .

It is customary to summarize the information from the analysis of variance in an analysis of variance table:

Source of variation   Degrees of freedom   Sum of squares   Mean of squares   F
Among treatments      t − 1                SSTr             s²Tr              s²Tr / s²E
Within treatments     N − t                SSE              s²E
Total                 N − 1                SST
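As a brief illustration (not part of the original entry), the quantities above can be computed with NumPy; the three samples below are invented, and SciPy's f_oneway returns the same F ratio directly.

```python
# A small sketch of the one-way ANOVA quantities above, with invented samples.
import numpy as np
from scipy import stats

samples = [np.array([5.1, 4.8, 5.5, 5.0]),
           np.array([6.2, 5.9, 6.4, 6.1]),
           np.array([5.6, 5.4, 5.8, 5.3])]

N = sum(len(s) for s in samples)               # total number of observations
t = len(samples)                               # number of treatments
grand_mean = np.concatenate(samples).mean()    # global mean

ss_tr = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)   # SSTr
ss_e = sum(((s - s.mean()) ** 2).sum() for s in samples)              # SSE

f_ratio = (ss_tr / (t - 1)) / (ss_e / (N - t))
print(f_ratio)

# The same F ratio (and its p-value) from SciPy:
print(stats.f_oneway(*samples))
```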

DOMAINS AND LIMITATIONS
An analysis of variance is always associated with a model. Therefore, there is a different analysis of variance in each distinct case. For example, consider the case where the analysis of variance is applied to factorial experiments with one or several factors, and these factorial experiments are linked to several designs of experiment.
We can distinguish not only the number of factors in the experiment but also the type of hypotheses linked to the effects of the treatments. We then have a model with fixed effects, a model with variable effects and a model with mixed effects. Each of these requires a specific analysis, but whichever model is used, the basic assumptions of additivity, normality, homoscedasticity and independence must be respected. This means that:
1. The experimental errors of the model are random variables that are independent of each other;
2. All of the errors follow a normal distribution with a mean of zero and an unknown variance σ².
All designs of experiment can be analyzed using analysis of variance. The most common designs are completely randomized designs, randomized block designs and Latin square designs.
An analysis of variance can also be performed with simple or multiple linear regression.
If during an analysis of variance the null hypothesis (the case for equality of means) is rejected, a least significant difference test is used to identify the populations that have significantly different means, which is something that an analysis of variance cannot do.

EXAMPLES
See two-way analysis of variance, one-way analysis of variance, multiple linear regression and simple linear regression.

FURTHER READING
► Design of experiments
► Factor
► Fisher distribution
► Fisher table
► Fisher test
► Least significant difference test
► Multiple linear regression
► One-way analysis of variance
► Regression analysis
► Simple linear regression
► Two-way analysis of variance

REFERENCES
Fisher, R.A.: Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh (1925)
Rao, C.R.: Advanced Statistical Methods in Biometric Research. Wiley, New York (1952)
Scheffé, H.: The Analysis of Variance. Wiley, New York (1959)

Anderson, Oskar

Anderson, Oskar (1887–1960) was an important member of the Continental School of Statistics; his contributions touched upon a wide range of subjects, including correlation, time series analysis, nonparametric methods and sample surveys, as well as econometrics and statistical applications in the social sciences.
Anderson, Oskar received a bachelor degree with distinction from the Kazan Gymnasium and then studied mathematics and physics for a year at the University of Kazan. He then entered the Faculty of Economics at the Polytechnic Institute of St. Petersburg, where he studied mathematics, statistics and economics.
The publications of Anderson, Oskar combine the traditions of the Continental School of Statistics with the concepts of the English Biometric School, particularly in two of his works: “Einführung in die mathematische Statistik” and “Probleme der statistischen Methodenlehre in den Sozialwissenschaften”.
In 1949, he founded the journal Mitteilungsblatt für Mathematische Statistik with Kellerer, Hans and Münzner, Hans.

Some principal works of Anderson, Oskar:

1935 Einführung in die Mathematische Statistik. Julius Springer, Wien

1954 Probleme der statistischen Methodenlehre in den Sozialwissenschaften. Physica-Verlag, Würzburg


Anderson, Theodore W.

Anderson, Theodore Wilbur was born on the 5th of June 1918 in Minneapolis, in the state of Minnesota in the USA. He became a Doctor of Mathematics in 1945 at the University of Princeton, and in 1946 he became a member of the Department of Mathematical Statistics at the University of Columbia, where he was named Professor in 1956. In 1967, he was named Professor of Statistics and Economics at Stanford University. He was, successively: Fellow of the Guggenheim Foundation between 1947 and 1948; Editor of the Annals of Mathematical Statistics from 1950 to 1952; President of the Institute of Mathematical Statistics in 1963; and Vice-President of the American Statistical Association from 1971 to 1973. He is a member of the American Academy of Arts and Sciences, of the National Academy of Sciences, of the Institute of Mathematical Statistics and of the Royal Statistical Society. Anderson’s most important contribution to statistics is surely in the domain of multivariate analysis. In 1958, he published the book entitled An Introduction to Multivariate Statistical Analysis. This book was the reference work in this domain for over forty years. It has even been translated into Russian.

Some of the principal works and articles of Theodore Wilbur Anderson:

1952 (with Darling, D.A.) Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 23, 193–212.

1958 An Introduction to Multivariate Statistical Analysis. Wiley, New York.

1971 The Statistical Analysis of Time Series. Wiley, New York.

1989 Linear latent variable models and covariance structures. J. Econometrics 41, 91–119.

1992 (with Kunitomo, N.) Asymptotic distributions of regression and autoregression coefficients with Martingale difference disturbances. J. Multivariate Anal. 40, 221–243.

1993 Goodness of fit tests for spectral distributions. Ann. Stat. 21, 830–847.

FURTHER READING
► Anderson–Darling test

Anderson–Darling Test

The Anderson–Darling test is a goodness-of-fit test which allows one to check the hypothesis that the distribution of a random variable observed in a sample follows a certain theoretical distribution. In particular, it allows us to test whether the empirical distribution obtained corresponds to a normal distribution.

HISTORY
Anderson, Theodore W. and Darling, D.A. initially used the Anderson–Darling statistic, denoted A², to test the conformity of a distribution with perfectly specified parameters (1952 and 1954). Later on, in the 1960s and especially the 1970s, some other authors (mostly Stephens) adapted the test to a wider range of distributions where some of the parameters may not be known.

MATHEMATICAL ASPECTS
Let us consider the random variable X, which follows the normal distribution with an expectation μ and a variance σ², and has a distribution function FX(x; θ), where θ is a parameter (or a set of parameters) that determines FX. We furthermore assume θ to be known.
An observation of a sample of size n issued from the variable X gives a distribution function Fn(x). The Anderson–Darling statistic, denoted by A², is then given by the weighted sum of the squared deviations FX(x; θ) − Fn(x):

A² = (1/n) Σ_{i=1}^{n} (FX(x; θ) − Fn(x))² .

Starting from the fact that A² is a random variable that follows a certain distribution over the interval [0; +∞[, it is possible to test, for a significance level that is fixed a priori, whether Fn(x) is the realization of the random variable FX(X; θ); that is, whether X follows the probability distribution with the distribution function FX(x; θ).

Computation of the A² Statistic
Arrange the observations x1, x2, . . . , xn in the sample issued from X in ascending order, i.e., x1 < x2 < . . . < xn. Note that zi = FX(xi; θ), (i = 1, 2, . . . , n). Then compute A² by:

A² = −(1/n) Σ_{i=1}^{n} (2i − 1) (ln(zi) + ln(1 − z_{n+1−i})) − n .

For the situation considered here (X follows the normal distribution with expectation μ and variance σ²), we can enumerate four cases, depending on the known parameters μ and σ² (F is the distribution function of the standard normal distribution):
1. μ and σ² are known, so FX(x; (μ, σ²)) is perfectly specified. Naturally we then have zi = F(wi), where wi = (xi − μ)/σ.
2. σ² is known but μ is unknown and is estimated using x̄ = (1/n) Σi xi, the mean of the sample. Then, let zi = F(wi), where wi = (xi − x̄)/σ.
3. μ is known but σ² is unknown and is estimated using s′² = (1/n) Σi (xi − μ)². In this case, let zi = F(wi), where wi = (x(i) − μ)/s′.
4. μ and σ² are both unknown and are estimated respectively using x̄ and s² = (1/(n − 1)) Σi (xi − x̄)². Then, let zi = F(wi), where wi = (xi − x̄)/s.
Asymptotic distributions were found for A² by Anderson and Darling for the first case, and by Stephens for the next two cases. For the last case, Stephens determined an asymptotic distribution for the transformation A* = A² (1.0 + 0.75/n + 2.25/n²).
Therefore, as shown below, we can construct a table that gives, depending on the case and the significance level (10%, 5%, 2.5% or 1% below), the limiting values of A² (and A* for case 4) beyond which the normality hypothesis is rejected:

           Significance level
Case:    0.10     0.050    0.025    0.01
1: A² =  1.933    2.492    3.070    3.857
2: A² =  0.894    1.087    1.285    1.551
3: A² =  1.743    2.308    2.898    3.702
4: A* =  0.631    0.752    0.873    1.035
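For practical work, SciPy provides an implementation of this test for several distributions; a minimal sketch (not part of the original entry), using simulated data:

```python
# Minimal use of SciPy's Anderson-Darling test on simulated normal data.
from scipy import stats
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=50)   # simulated observations

result = stats.anderson(sample, dist="norm")
print(result.statistic)            # the A^2 statistic
print(result.critical_values)      # critical values at 15%, 10%, 5%, 2.5%, 1%
print(result.significance_level)
```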

DOMAINS AND LIMITATIONS
As the distribution of A² is expressed asymptotically, the test needs the sample size n to be large. If this is not the case then, for the first two cases, the distribution of A² is not known and it is necessary to perform a transformation of the type A² → A*, from which A* can be determined. When n > 20, we can avoid such a transformation and so the data in the above table are valid.
The Anderson–Darling test has the advantage that it can be applied to a wide range of distributions (not just a normal distribution but also exponential, logistic and gamma distributions, among others). That allows us to try out a wide range of alternative distributions if the initial test rejects the null hypothesis for the distribution of a random variable.

EXAMPLES
The following data illustrate the application of the Anderson–Darling test for the normality hypothesis.
Consider a sample of the heights (in cm) of 25 male students. The following table shows the observations in the sample, and also wi and zi. We can also calculate x̄ and s from these data: x̄ = 177.36 and s = 4.98. Assuming that F is a standard normal distribution function, we have:

Obs.   xi    wi = (xi − x̄)/s   zi = F(wi)
1      169   −1.678            0.047
2      169   −1.678            0.047
3      170   −1.477            0.070
4      171   −1.277            0.100
5      173   −0.875            0.191
6      173   −0.875            0.191
7      174   −0.674            0.250
8      175   −0.474            0.318
9      175   −0.474            0.318
10     175   −0.474            0.318
11     176   −0.273            0.392
12     176   −0.273            0.392
13     176   −0.273            0.392
14     179    0.329            0.629
15     180    0.530            0.702
16     180    0.530            0.702
17     180    0.530            0.702
18     181    0.731            0.767
19     181    0.731            0.767
20     182    0.931            0.824
21     182    0.931            0.824
22     182    0.931            0.824
23     185    1.533            0.937
24     185    1.533            0.937
25     185    1.533            0.937

We then get A² ≅ 0.436, which gives

A* = A² · (1.0 + 0.75/25 + 2.25/625) = A² · 1.0336 ≅ 0.451 .

Since we have case 4, and a significance level fixed at 1%, the calculated value of A* is much less than the value shown in the table (1.035). Therefore, the normality hypothesis cannot be rejected at a significance level of 1%.
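The values above can be recomputed from the height data; the following sketch (ours) follows the computation steps for case 4.

```python
# Recomputing A^2 and A* for the 25 heights in the example (case 4).
import numpy as np
from scipy.stats import norm

heights = np.array([169, 169, 170, 171, 173, 173, 174, 175, 175, 175, 176, 176,
                    176, 179, 180, 180, 180, 181, 181, 182, 182, 182, 185, 185, 185])

n = len(heights)
x_bar = heights.mean()
s = heights.std(ddof=1)                               # sample standard deviation

z = norm.cdf(np.sort((heights - x_bar) / s))          # z_i = F(w_i), sorted
i = np.arange(1, n + 1)
a2 = -np.sum((2 * i - 1) * (np.log(z) + np.log(1 - z[::-1]))) / n - n
a_star = a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)

print(round(a2, 3), round(a_star, 3))                 # roughly 0.436 and 0.451
```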

FURTHER READING
► Goodness of fit test
► Histogram
► Nonparametric statistics
► Normal distribution
► Statistics

REFERENCES
Anderson, T.W., Darling, D.A.: Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Ann. Math. Stat. 23, 193–212 (1952)
Anderson, T.W., Darling, D.A.: A test of goodness of fit. J. Am. Stat. Assoc. 49, 765–769 (1954)
Durbin, J., Knott, M., Taylor, C.C.: Components of Cramér–von Mises statistics, II. J. Roy. Stat. Soc. Ser. B 37, 216–237 (1975)
Stephens, M.A.: EDF statistics for goodness of fit and some comparisons. J. Am. Stat. Assoc. 69, 730–737 (1974)


Arithmetic Mean

The arithmetic mean is a measure of central tendency. It allows us to characterize the center of the frequency distribution of a quantitative variable by considering all of the observations with the same weight afforded to each (in contrast to the weighted arithmetic mean).
It is calculated by summing the observations and then dividing by the number of observations.

HISTORY
The arithmetic mean is one of the oldest methods used to combine observations in order to give a unique approximate value. It appears to have been first used by Babylonian astronomers in the third century BC. The arithmetic mean was used by the astronomers to determine the positions of the sun, the moon and the planets. According to Plackett (1958), the concept of the arithmetic mean originated from the Greek astronomer Hipparchus.
In 1755 Thomas Simpson officially proposed the use of the arithmetic mean in a letter to the President of the Royal Society.

MATHEMATICAL ASPECTS
Let x1, x2, . . . , xn be a set of n quantities or n observations relating to a quantitative variable X.
The arithmetic mean x̄ of x1, x2, . . . , xn is the sum of these observations divided by the number n of observations:

x̄ = (Σ_{i=1}^{n} xi) / n .

When the observations are ordered in the form of a frequency distribution, the arithmetic mean is calculated in the following way:

x̄ = (Σ_{i=1}^{k} xi · fi) / (Σ_{i=1}^{k} fi) ,

where xi are the different values of the variable, fi are the frequencies associated with these values, k is the number of different values, and the sum of the frequencies equals the number of observations:

Σ_{i=1}^{k} fi = n .

To calculate the mean of a frequency distribution where the values of the quantitative variable X are grouped in classes, we consider that all of the observations belonging to a certain class take the central value of the class, assuming that the observations are uniformly distributed inside the classes (if this hypothesis is not correct, the arithmetic mean obtained will only be an approximation).
Therefore, in this case we have:

x̄ = (Σ_{i=1}^{k} xi · fi) / (Σ_{i=1}^{k} fi) ,

where the xi are the class centers, the fi are the frequencies associated with each class, and k is the number of classes.

Properties of the Arithmetic Mean
• The algebraic sum of the deviations between every value of the set and the arithmetic mean of this set equals 0:

Σ_{i=1}^{n} (xi − x̄) = 0 .

• The sum of squared deviations from every value to a given number “a” is smallest when “a” is the arithmetic mean:

Σ_{i=1}^{n} (xi − a)² ≥ Σ_{i=1}^{n} (xi − x̄)² .

Proof: We can write:

xi − a = (xi − x̄) + (x̄ − a) .

Squaring both members of the equality, summing them and then simplifying gives:

Σ_{i=1}^{n} (xi − a)² = Σ_{i=1}^{n} (xi − x̄)² + n · (x̄ − a)² .

As n · (x̄ − a)² is not negative, we have proved that:

Σ_{i=1}^{n} (xi − a)² ≥ Σ_{i=1}^{n} (xi − x̄)² .

• The arithmetic mean x̄ of a sample (x1, . . . , xn) is normally considered to be an estimator of the mean μ of the population from which the sample was taken.
• Assuming that the xi are independent random variables with the same distribution function with mean μ and variance σ², we can show that
1. E[x̄] = μ,
2. Var(x̄) = σ²/n,
if these moments exist. Since the mathematical expectation of x̄ equals μ, the arithmetic mean is an estimator without bias of the mean of the population.

• If the xi result from random sampling without replacement from a finite population with a mean μ, the identity

E[x̄] = μ

is still valid, but the variance of x̄ must be adjusted by a factor that depends on the size N of the population and the size n of the sample:

Var(x̄) = (σ²/n) · [(N − n)/(N − 1)] ,

where σ² is the variance of the population.

Relationship Between the Arithmetic Mean and Other Measures of Central Tendency
• The arithmetic mean is related to two principal measures of central tendency: the mode Mo and the median Md.
If the distribution is symmetric and unimodal:

x̄ = Md = Mo .

If the distribution is unimodal, it is normally true that:
x̄ ≥ Md ≥ Mo if the distribution is stretched to the right,
x̄ ≤ Md ≤ Mo if the distribution is stretched to the left.
For a unimodal, slightly asymmetric distribution, these three measures of central tendency often approximately satisfy the following relation:

(x̄ − Mo) = 3 · (x̄ − Md) .

• In the same way, for a unimodal distribution, if we consider a set of positive numbers, the geometric mean G is always smaller than or equal to the arithmetic mean x̄, and is always greater than or equal to the harmonic mean H. So we have:

H ≤ G ≤ x̄ .

These three means are identical only if all of the numbers are equal.

DOMAINS AND LIMITATIONS
The arithmetic mean is a simple measure of the central value of a set of quantitative observations. Finding the mean can sometimes lead to poor data interpretation:

If the monthly salaries (in Euros) of 5 people are 3000, 3200, 2900, 3500 and 6500, the arithmetic mean of the salary is 19100/5 = 3820. This mean gives us some idea of the sizes of the salaries sampled, since it is situated between the biggest and the smallest one. However, 80% of the salaries are smaller than the mean, so in this case it is not a particularly good representation of a typical salary.

This case shows that we need to pay attention to the form of the distribution and the reliability of the observations before we use the arithmetic mean as the measure of central tendency for a particular set of values. If an absurd observation occurs in the distribution, the arithmetic mean could provide an unrepresentative value for the central tendency.
If some observations are considered to be less reliable than others, it could be useful to make them less important. This can be done by calculating a weighted arithmetic mean, or by using the median, which is not strongly influenced by any absurd observations.

EXAMPLES
In company A, nine employees have the following monthly salaries (in Euros):

3000  3200  2900  3440  5050
4150  3150  3300  5200

The arithmetic mean of these monthly salaries is:

x̄ = (3000 + 3200 + · · · + 3300 + 5200) / 9 = 33390 / 9 = 3710 Euros.

We now examine a case where the data are presented in the form of a frequency distribution. The following frequency table gives the number of days that 50 employees were absent on sick leave during a period of one year:

xi: Days of illness   fi: Number of employees
0                     7
1                     12
2                     19
3                     8
4                     4
Total                 50

Let us try to calculate the mean number of days that the employees were absent due to illness.
The total number of sick days for the 50 employees equals the sum of the product of each xi by its respective frequency fi:

Σ_{i=1}^{5} xi · fi = 0·7 + 1·12 + 2·19 + 3·8 + 4·4 = 90 .

The total number of employees equals:

Σ_{i=1}^{5} fi = 7 + 12 + 19 + 8 + 4 = 50 .

The arithmetic mean of the number of sick days per employee is then:

x̄ = (Σ_{i=1}^{5} xi · fi) / (Σ_{i=1}^{5} fi) = 90 / 50 = 1.8 ,

which means that, on average, the 50 employees took 1.8 days off for sickness per year.
In the following example, the data are grouped in classes. We want to calculate the arithmetic mean of the daily profits from the sale of 50 types of grocery. The frequency distribution for the groceries is given in the following table:

Classes (profits in Euros)   Mid-points xi   Frequencies fi (number of groceries)   xi · fi
500–550                      525             3                                      1575
550–600                      575             12                                     6900
600–650                      625             17                                     10625
650–700                      675             8                                      5400
700–750                      725             6                                      4350
750–800                      775             4                                      3100
Total                                        50                                     31950

The arithmetic mean of the profits is:

x̄ = (Σ_{i=1}^{6} xi · fi) / (Σ_{i=1}^{6} fi) = 31950 / 50 = 639 ,

which means that, on average, each of the 50 groceries provides a daily profit of 639 Euros.
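The grouped-data mean of this example can be recomputed as a weighted average; a small sketch (ours) using NumPy:

```python
# Recomputing the grouped-data mean from the grocery example above.
import numpy as np

class_centers = np.array([525, 575, 625, 675, 725, 775])   # mid-points x_i
frequencies = np.array([3, 12, 17, 8, 6, 4])                # f_i

mean_profit = np.average(class_centers, weights=frequencies)
print(mean_profit)   # 639.0 Euros
```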

FURTHER READING
► Geometric mean
► Harmonic mean
► Mean
► Measure of central tendency
► Weighted arithmetic mean

REFERENCES
Plackett, R.L.: Studies in the history of probability and statistics. VII. The principle of the arithmetic mean. Biometrika 45, 130–135 (1958)
Simpson, T.: A letter to the Right Honorable George Earl of Macclesfield, President of the Royal Society, on the advantage of taking the mean of a number of observations in practical astronomy. Philos. Trans. Roy. Soc. Lond. 49, 82–93 (1755)
Simpson, T.: An attempt to show the advantage arising by taking the mean of a number of observations in practical astronomy. In: Miscellaneous Tracts on Some Curious and Very Interesting Subjects in Mechanics, Physical-Astronomy, and Speculative Mathematics. Nourse, London (1757), pp. 64–75

Arithmetic Triangle

The arithmetic triangle is used to determine the binomial coefficients that appear in the expansion of (a + b)^n and to calculate the number of possible combinations of k objects out of a total of n objects (C^k_n).

HISTORY
The notion of finding the number of combinations of k objects from n objects in total has been explored in India since the ninth century. Indeed, there are traces of it in the Meru Prastara written by Pingala in around 200 BC.
Between the fourteenth and the fifteenth centuries, al-Kashi, a mathematician from the Iranian city of Kashan, wrote The Key to Arithmetic. In this work he calls binomial coefficients “exponent elements”.
In his work Traité du Triangle Arithmétique, published in 1665, Pascal, Blaise (1654) defined the numbers in the “arithmetic triangle”, and so this triangle is also known as Pascal’s triangle.
We should also note that the triangle was made popular by Tartaglia, Niccolo Fontana in 1556, and so Italians often refer to it as Tartaglia’s triangle, even though Tartaglia did not actually study the arithmetic triangle.

MATHEMATICAL ASPECTS
The arithmetic triangle has the following form:

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
. . .

Each element is a binomial coefficient

C^k_n = n! / (k! (n − k)!) = (n · (n − 1) · . . . · (n − k + 1)) / (1 · 2 · . . . · k) .

This coefficient corresponds to the element k of the line n + 1, k = 0, . . . , n.
Any particular number is obtained by adding together its neighboring numbers in the previous line.

For example:

C^4_6 = C^3_5 + C^4_5 = 10 + 5 = 15 .

More generally, we have the relation:

C^k_n + C^{k+1}_n = C^{k+1}_{n+1} ,

because:

C^k_n + C^{k+1}_n = n! / ((n − k)! · k!) + n! / ((n − k − 1)! · (k + 1)!)
                  = n! · [(k + 1) + (n − k)] / ((n − k)! · (k + 1)!)
                  = (n + 1)! / ((n − k)! · (k + 1)!)
                  = C^{k+1}_{n+1} .
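The rows of the triangle can be generated directly from this recurrence; a short sketch (ours) in Python:

```python
# Generating the first rows of the arithmetic triangle using the recurrence
# C(n, k) + C(n, k+1) = C(n+1, k+1) described above.
def pascal_triangle(n_rows):
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # each inner entry is the sum of its two neighbours in the previous line
        rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
    return rows

for row in pascal_triangle(7):
    print(row)
# the last row printed is [1, 6, 15, 20, 15, 6, 1]
```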

FURTHER READING
► Binomial
► Binomial distribution
► Combination
► Combinatory analysis

REFERENCES
Pascal, B.: Traité du triangle arithmétique (publ. posthum. in 1665), Paris (1654)
Pascal, B.: Œuvres, vols. 1–14. Brunschvicg, L., Boutroux, P., Gazier, F. (eds.) Les Grands Ecrivains de France. Hachette, Paris (1904–1925)
Pascal, B.: Mesnard, J. (ed.) Œuvres complètes. Vol. 2. Desclée de Brouwer, Paris (1970)
Rashed, R.: La naissance de l’algèbre. In: Noël, E. (ed.) Le Matin des Mathématiciens. Belin-Radio France, Paris (1985) (Chap. 12)
Youschkevitch, A.P.: Les mathématiques arabes (VIIIème–XVème siècles). Partial translation by Cazenave, M., Jaouiche, K. Vrin, Paris (1976)

ARMA Models

ARMA models (sometimes called Box–Jenkins models) are autoregressive moving average models used in time series analysis. The autoregressive part, denoted AR, consists of a finite linear combination of previous observations. The moving average part, denoted MA, consists of a finite linear combination in t of the previous values of a white noise (a sequence of mutually independent and identically distributed random variables).

MATHEMATICAL ASPECTS
1. AR model (autoregressive)
In an autoregressive process of order p, the present observation y_t is generated by a weighted mean of the past observations up to the pth period. This takes the following form:
$$AR(1):\ y_t = \theta_1 y_{t-1} + \varepsilon_t\,,$$
$$AR(2):\ y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \varepsilon_t\,,$$
$$\vdots$$
$$AR(p):\ y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \dots + \theta_p y_{t-p} + \varepsilon_t\,,$$
where θ1, θ2, . . . , θp are the positive or negative parameters to be estimated and εt is the error factor, which follows a normal distribution.

2. MA model (moving average)
In a moving average process of order q, each observation y_t is randomly generated by a weighted arithmetic mean up to the qth period:
$$MA(1):\ y_t = \varepsilon_t - \alpha_1\varepsilon_{t-1}\,,$$
$$MA(2):\ y_t = \varepsilon_t - \alpha_1\varepsilon_{t-1} - \alpha_2\varepsilon_{t-2}\,,$$
$$\vdots$$
$$MA(q):\ y_t = \varepsilon_t - \alpha_1\varepsilon_{t-1} - \alpha_2\varepsilon_{t-2} - \dots - \alpha_q\varepsilon_{t-q}\,,$$
where α1, α2, . . . , αq are positive or negative parameters and εt is the Gaussian random error. The MA model represents a time series fluctuating about its mean in a random manner, which gives rise to the term "moving average", because it smoothes the series, subtracting the white noise generated by the randomness of the element.

3. ARMA model (autoregressive moving average model)
ARMA models represent processes generated from a combination of past values and past errors. They are defined by the following equation:
$$ARMA(p, q):\ y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \dots + \theta_p y_{t-p} + \varepsilon_t - \alpha_1\varepsilon_{t-1} - \alpha_2\varepsilon_{t-2} - \dots - \alpha_q\varepsilon_{t-q}\,,$$
with θp ≠ 0, αq ≠ 0, and (εt, t ∈ Z) is a weak white noise.
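As a minimal sketch of the defining recursion, an ARMA(1,1) series can be simulated directly with numpy; the parameter values θ1 = 0.6, α1 = 0.3 and the series length are arbitrary illustrative choices:

import numpy as np

# Simulate y_t = theta1 * y_(t-1) + eps_t - alpha1 * eps_(t-1), an ARMA(1,1) process.
rng = np.random.default_rng(0)
theta1, alpha1, T = 0.6, 0.3, 500
eps = rng.normal(size=T)      # Gaussian white noise
y = np.zeros(T)
for t in range(1, T):
    y[t] = theta1 * y[t - 1] + eps[t] - alpha1 * eps[t - 1]
print(y[:5])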


FURTHER READING
• Time series
• Weighted arithmetic mean

REFERENCES
Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control (Series in Time Series Analysis). Holden Day, San Francisco (1970)

Arrangement

Arrangements are a concept found in combinatory analysis. The number of arrangements is the number of ways of drawing k objects from n objects where the order in which the objects are drawn is taken into account (in contrast to combinations).

HISTORY
See combinatory analysis.

MATHEMATICAL ASPECTS
1. Arrangements without repetitions
An arrangement without repetition refers to the situation where the objects drawn are not placed back in for the next drawing. Each object can then only be drawn once during the k drawings. The number of arrangements of k objects amongst n without repetition is equal to:
$$A_n^k = \frac{n!}{(n-k)!}\,.$$

2. Arrangements with repetitions
Arrangements with repetition occur when each object pulled out is placed back in for the next drawing. Each object can then be drawn r times from k drawings, r = 0, 1, . . . , k. The number of arrangements of k objects amongst n with repetitions is equal to n to the power k:
$$A_n^k = n^k\,.$$

EXAMPLES
1. Arrangements without repetitions
Consider an urn containing six balls numbered from 1 to 6. We pull out four balls from the urn in succession, and we want to know how many numbers it is possible to form from the numbers of the balls drawn. We are then interested in the number of arrangements (since we take into account the order of the balls) without repetition (since each ball can be pulled out only once) of four objects amongst six. We obtain:
$$A_n^k = \frac{n!}{(n-k)!} = \frac{6!}{(6-4)!} = 360$$
possible arrangements. Therefore, it is possible to form 360 different numbers by drawing four numbers from the numbers 1, 2, 3, 4, 5, 6 when each number can appear only once in the four-digit number formed.
As a second example, let us investigate the arrangements without repetitions of two letters from the letters A, B and C. With n = 3 and k = 2 we have:
$$A_n^k = \frac{n!}{(n-k)!} = \frac{3!}{(3-2)!} = 6\,.$$
We then obtain: AB, AC, BA, BC, CA, CB.

2. Arrangements with repetitions
Consider the same urn as described previously. We perform four successive drawings, but this time we put each ball drawn back in the urn. We want to know how many four-digit numbers (or arrangements) are possible if four numbers are drawn. In this case, we are investigating the number of arrangements with repetition (since each ball is placed back in the urn before the next drawing). We obtain
$$A_n^k = n^k = 6^4 = 1296$$
different arrangements. It is possible to form 1296 four-digit numbers from the numbers 1, 2, 3, 4, 5, 6 if each number can appear more than once in the four-digit number.
As a second example we again take the three letters A, B and C and form an arrangement of two letters with repetitions. With n = 3 and k = 2, we have:
$$A_n^k = n^k = 3^2 = 9\,.$$
We then obtain: AA, AB, AC, BA, BB, BC, CA, CB, CC.

FURTHER READING
• Combination
• Combinatory analysis
• Permutation

REFERENCES
See combinatory analysis.

Attributable Risk

The attributable risk is the difference between the risk encountered by individuals exposed to a particular factor and the risk encountered by individuals who are not exposed to it. This is the opposite of the avoidable risk. It measures the absolute effect of a cause (that is, the excess risk or cases of illness).

HISTORY
See risk.

MATHEMATICAL ASPECTS
By definition we have:

attributable risk = risk for those exposed − risk for those not exposed.

DOMAINS AND LIMITATIONS
The confidence interval of an attributable risk is equivalent to the confidence interval of the difference between the proportions p_E and p_NE, where p_E and p_NE represent the risks encountered by individuals exposed and not exposed to the studied factor, respectively. Take n_E and n_NE to be, respectively, the sizes of the exposed and nonexposed populations. Then the confidence interval, for a confidence level of (1 − α), is given by:
$$(p_E - p_{NE}) \pm z_\alpha\sqrt{\frac{p_E(1 - p_E)}{n_E} + \frac{p_{NE}(1 - p_{NE})}{n_{NE}}}\,,$$
where z_α is the value obtained from the normal table (for example, for a confidence interval of 95%, α = 0.05 and z_α = 1.96). The confidence interval at level (1 − α) for an avoidable risk has bounds given by:
$$(p_{NE} - p_E) \pm z_\alpha\sqrt{\frac{p_E(1 - p_E)}{n_E} + \frac{p_{NE}(1 - p_{NE})}{n_{NE}}}\,.$$
Here, n_E and n_NE need to be large. If the confidence interval includes zero, we cannot rule out an absence of attributable risk.
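As an illustrative sketch of the confidence interval above, using hypothetical counts (50 cases among 10000 exposed and 20 cases among 15000 nonexposed individuals, not taken from this entry):

from math import sqrt

p_e, n_e = 50 / 10000, 10000        # risk and size of the exposed group (hypothetical)
p_ne, n_ne = 20 / 15000, 15000      # risk and size of the nonexposed group (hypothetical)
z = 1.96                            # 95% confidence level

ar = p_e - p_ne                     # attributable risk
half_width = z * sqrt(p_e * (1 - p_e) / n_e + p_ne * (1 - p_ne) / n_ne)
print(ar, ar - half_width, ar + half_width)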

EXAMPLES
As an example, we consider a study of the risk of breast cancer in women due to smoking:


Group             Incidence rate (A)   Risk attributable to smoking
                  (/100000/year)       (/100000/year)
Nonexposed        57.0                 57.0 − 57.0 = 0
Passive smokers   126.2                126.2 − 57.0 = 69.2
Active smokers    138.1                138.1 − 57.0 = 81.1
Total             114.7                114.7 − 57.0 = 57.7

The risks attributable to passive and active smoking are respectively 69 and 81 (/100000 per year). In other words, if the exposure to tobacco was removed, the incidence rate for active smokers (138/100000 per year) could be reduced by 81/100000 per year and that for passive smokers (126/100000 per year) by 69/100000 per year. The incidence rates in both categories of smokers would become equal to the rate for nonexposed women (57/100000 per year). Note that the incidence rate for nonexposed women is not zero, due to the influence of other factors aside from smoking.

Group             No. of individuals        Cases attrib. to smoking   Cases attrib. to smoking
                  observed over two years   (two-year period)          (per year)
Nonexposed        70160                     0.0                        0.0
Passive smokers   110860                    76.7                       38.4
Active smokers    118636                    96.2                       48.1
Total             299656                    172.9                      86.5

We can calculate the number of cases of breast cancer attributable to tobacco exposure by multiplying the number of individuals observed per year by the attributable risk. By dividing the number of incidents attributable to smoking in the two-year period by two, we obtain the number of cases attributable to smoking per year, and we can then determine the risk attributable to smoking in the population, denoted PAR, as shown in the following example. The previous table shows the details of the calculation.
We describe the calculation for the passive smokers here. In the two-year study, 110860 passive smokers were observed. The risk attributable to passive smoking was 69.2/100000 per year. This means that the number of cases attributable to smoking over the two-year period is (110860 · 69.2)/100000 = 76.7. If we want to calculate the number of cases attributable to passive smoking per year, we must then divide the last value by 2, obtaining 38.4. Moreover, we can calculate the risk attributable to smoking per year simply by dividing the number of cases attributable to smoking for the two-year period (172.9) by the number of individuals studied during these two years (299656 persons). We then obtain the risk attributable to smoking as 57.7/100000 per year. We note that we can get the same result by taking the difference between the total incidence rate (114.7/100000 per year, see the examples under the entries for incidence rate, prevalence rate) and the incidence rate of the nonexposed group (57.0/100000 per year).
The risk of breast cancer attributable to smoking in the population (PAR) is the ratio of the number of the cases of breast cancer attributable to exposure to tobacco and the number of cases of breast cancer diagnosed in the population (see the table below). The attributable risk in the population is 22.3% (38.4/172) for passive smoking and 28% (48.1/172) for active smoking. For both forms of exposure, it is 50.3% (22.3% + 28%). So, half of the cases of breast cancer diagnosed each year in this population are attributable to smoking (active or passive).

Group             Cases attrib. to smoking   Cases diagnosed   PAR
                  (per year)                 (per year)        (%)
Nonexposed        0.0                        20                0.0
Passive smokers   38.4                       70                22.3
Active smokers    48.1                       82                28.0
Total             86.5                       172               50.3

FURTHER READING
• Avoidable risk
• Cause and effect in epidemiology
• Incidence rate
• Odds and odds ratio
• Prevalence rate
• Relative risk
• Risk

REFERENCES
Cornfield, J.: A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. J. Natl. Cancer Inst. 11, 1269–75 (1951)

Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)

MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)

Morabia, A.: Epidemiologie Causale. Editions Médecine et Hygiène, Geneva (1996)

Morabia, A.: L'Épidémiologie Clinique. Editions "Que sais-je?". Presses Universitaires de France, Paris (1996)

Autocorrelation

Autocorrelation, denoted ρ_k, is a measure of the correlation of a particular time series with the same time series delayed by k lags (the distance between the observations that are so correlated). It is obtained by dividing the covariance between two observations of a time series separated by k lags (the autocovariance) by the standard deviations of y_t and y_{t−k}. If the autocorrelation is calculated for all values of k we obtain the autocorrelation function. For a time series that does not change over time, the autocorrelation function decreases exponentially to 0.

HISTORY
The first research into autocorrelation, the partial autocorrelation and the correlogram was performed in the 1920s and 1930s by Yule, George, who developed the theory of autoregressive processes.

MATHEMATICAL ASPECTS
We define the autocorrelation of a time series Y_t by:
$$\rho_k = \frac{\mathrm{cov}(y_t, y_{t-k})}{\sigma_{y_t}\,\sigma_{y_{t-k}}} = \frac{\sum_{t=k+1}^{T}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sqrt{\sum_{t=k+1}^{T}(y_t - \bar{y})^2}\,\sqrt{\sum_{t=k+1}^{T}(y_{t-k} - \bar{y})^2}}\,.$$

Here ȳ is the mean of the series calculated on T − k lags, where T is the number of observations. We find that:
$$\rho_0 = 1 \quad\text{and}\quad \rho_k = \rho_{-k}\,.$$

It is possible to estimate the autocorrelation (the estimate is denoted \hat{\rho}_k) provided the number of observations is large enough (T > 30) using the following formula:
$$\hat{\rho}_k = \frac{\sum_{t=k+1}^{T}(y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}\,.$$

The partial autocorrelation function for a delay of k lags is defined as the autocorrelation between y_t and y_{t−k} with the influence of the intermediate variables (y_{t−1}, y_{t−2}, . . . , y_{t−k+1}) removed.

Hypothesis Testing
When analyzing the autocorrelation function of a time series, it can be useful to know the terms ρ_k that are significantly different from 0. Hypothesis testing then proceeds as follows:
$$H_0: \rho_k = 0\,, \qquad H_1: \rho_k \neq 0\,.$$

For a large sample (T > 30), the coefficient ρ_k tends asymptotically to a normal distribution with a mean of 0 and a standard deviation of 1/√T. The Student test is based on the comparison of an empirical t and a theoretical t.

The confidence interval for the coefficient ρ_k is given by:
$$\rho_k = 0 \pm t_{\alpha/2}\,\frac{1}{\sqrt{T}}\,.$$

If the calculated coefficient ρ_k does not fall within this confidence interval, it is significantly different from 0 at the level α (generally α = 0.05 and t_{α/2} = 1.96).
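A minimal sketch of the estimator \hat{\rho}_k and of the 5% significance bound 1.96/√T, applied to a simulated AR(1) series (all numerical choices below are arbitrary):

import numpy as np

def autocorrelation(y, k):
    # rho_hat_k = sum_(t=k+1..T) (y_t - ybar)(y_(t-k) - ybar) / sum_(t=1..T) (y_t - ybar)^2
    y = np.asarray(y, dtype=float)
    ybar = y.mean()
    num = np.sum((y[k:] - ybar) * (y[:-k] - ybar))
    den = np.sum((y - ybar) ** 2)
    return num / den

rng = np.random.default_rng(1)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()   # AR(1) series, so rho_k decays geometrically

T = len(y)
for k in range(1, 4):
    rho = autocorrelation(y, k)
    print(k, round(rho, 3), abs(rho) > 1.96 / np.sqrt(T))   # True if significant at 5%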

DOMAINS AND LIMITATIONS
The partial autocorrelation function is principally used in studies of time series and, more specifically, when we want to adjust an ARMA model. These functions are also used in spatial statistics, although in the context of spatial autocorrelation, where we investigate the correlation of a variable with itself in space. If the presence of a phenomenon in a particular spatial region affects the probability of the phenomenon being present in neighboring regions, the phenomenon displays spatial autocorrelation. In this case, positive autocorrelation occurs when the neighboring regions tend to have identical properties or similar values (examples include homogeneous regions and regular gradients). Negative autocorrelation occurs when the neighboring regions have different qualities, or alternate between strong and weak values for the phenomenon. Autocorrelation measures depend on the scaling of the variables which are used in the analysis as well as on the grid that registers the observations.

EXAMPLES
We take as an example the national average wage in Switzerland from 1950 to 1994, measured every two years.


We calculate the autocorrelation function between the data; we would like to find a positive autocorrelation. The following figures show the presence of this autocorrelation.

We note that the correlation significance peaks between the observation at time t and the observation at time t−1, and also between the observation at time t and the observation at time t−2. This data configuration is typical of an autoregressive process. For the first two values, we can see that this autocorrelation is significant, because the Student statistic t for the T = 23 observations gives:
$$\rho_k = 0 \pm 1.96\,\frac{1}{\sqrt{23}}\,.$$

Year   National average wage
50     11999
52     12491
54     13696
56     15519
58     17128
60     19948
62     23362
64     26454
66     28231
68     30332
70     33955
72     40320
74     40839
76     37846
78     39507
80     41180
82     42108
84     44095
86     48323
88     51584
90     55480
92     54348
94     54316

Source: Swiss Federal Office of Statistics

FURTHER READING
• Student test
• Time series

REFERENCES
Bourbonnais, R.: Econométrie, manuel et exercices corrigés, 2nd edn. Dunod, Paris (1998)

Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control (Series in Time Series Analysis). Holden Day, San Francisco (1970)

Chatfield, C.: The Analysis of Time Series: An Introduction, 4th edn. Chapman & Hall (1989)

Avoidable Risk

The avoidable risk (which, of course, is avoidable if we neutralize the effect of exposure to a particular phenomenon) is the opposite of the attributable risk. In other words, it is the difference between the risk encountered by nonexposed individuals and that encountered by individuals exposed to the phenomenon.

HISTORY
See risk.


MATHEMATICAL ASPECTS
By definition we have:

avoidable risk = risk if not exposed − risk if exposed.

DOMAINS AND LIMITATIONS
The avoidable risk was introduced in order to avoid the need for defining a negative attributable risk. It allows us to calculate the number of patients that will need to be treated, because:
$$\text{number of patients to be treated} = \frac{1}{\text{avoidable risk}}\,.$$
See also attributable risk.

EXAMPLES
As an example, consider a study of the efficiency of a drug used to treat an illness. The 223 patients included in the study are all at risk of contracting the illness, but they have not yet done so. We separate them into two groups: patients in the first group (114 patients) received the drug; those in the second group (109 patients) were given a placebo. The study period was two years. In total, 11 cases of the illness are diagnosed in the first group and 27 in the placebo group.

Group       Cases of illness (A)   Number of patients in the group (B)   Risk for the two-year period (A/B in %)
1st group   11                     114                                   9.6%
2nd group   27                     109                                   24.8%

So, the avoidable risk due to the drug is 24.8 − 9.6 = 15.2% per two years.
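The figures of this example can be reproduced with a few lines of Python (an illustrative check only):

risk_treated = round(100 * 11 / 114, 1)    # first group (drug): 9.6 % over two years
risk_placebo = round(100 * 27 / 109, 1)    # second group (placebo): 24.8 % over two years

avoidable_risk = round(risk_placebo - risk_treated, 1)   # 15.2 percentage points
patients_to_treat = round(100 / avoidable_risk, 1)       # 1 / 0.152, about 6.6 patients
print(risk_treated, risk_placebo, avoidable_risk, patients_to_treat)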

FURTHER READING
• Attributable risk
• Cause and effect in epidemiology
• Incidence rate
• Odds and odds ratio
• Prevalence rate
• Relative risk
• Risk

REFERENCES
Cornfield, J.: A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. J. Natl. Cancer Inst. 11, 1269–75 (1951)

Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)

MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)

Morabia, A.: Epidemiologie Causale. Editions Médecine et Hygiène, Geneva (1996)

Morabia, A.: L'Épidémiologie Clinique. Editions "Que sais-je?". Presses Universitaires de France, Paris (1996)


Bar Chart

A bar chart is a type of quantitative graph. It consists of a series of vertical or horizontal bars of identical width but with lengths relative to the represented quantities. Bar charts are used to compare the categories of a categorical qualitative variable or to compare sets of data from different years or different places for a particular variable.

HISTORY
See graphical representation.

MATHEMATICAL ASPECTS
A vertical axis and a horizontal axis must be defined in order to construct a vertical bar chart. The horizontal axis is divided up into different categories; the vertical axis shows the value of each category. To construct a horizontal bar chart, the axes are simply inverted. The bars must all be of the same width since only their lengths are compared. Shading, hatching or color can be used to make it easier to understand the graphic.

DOMAINS AND LIMITATIONS
A bar chart can also be used to represent negative category values. To be able to do this, the scale of the axis showing the category values must extend below zero. There are several types of bar chart. The one described above is called a simple bar chart. A multiple bar chart is used to compare several variables. A composite bar chart is a multiple bar chart where the different sets of data are stacked on top of each other. This type of diagram is used when the different data sets can be combined into a total population, and we would like to compare the changes in the data sets and the total population over time. There is another way of representing the subsets of a total population. In this case, the total population represents 100% and the value given for each subset is a percentage of the total (also see pie chart).

EXAMPLES
Let us construct a bar chart divided into percentages for the data in the following frequency table:


Marital status in a sample of the Australian female population on the 30th June 1981 (in thousands)

Marital status   Frequency   Relative frequency   Percentage
Bachelor         6587.3      0.452                45.2
Married          6836.8      0.469                46.9
Divorced         403.5       0.028                2.8
Widow            748.7       0.051                5.1
Total            14576.3     1.000                100.0

Source: ABS (1984) Australian Pocket Year Book. Australian Bureau of Statistics, Canberra, p. 11
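A minimal sketch of such a percentage bar chart, drawn here with the matplotlib library (the styling choices are arbitrary):

import matplotlib.pyplot as plt

categories = ["Bachelor", "Married", "Divorced", "Widow"]
percentages = [45.2, 46.9, 2.8, 5.1]

plt.bar(categories, percentages, width=0.6)   # bars of identical width
plt.ylabel("Percentage of the sample")
plt.title("Marital status, Australian female population, 30 June 1981")
plt.show()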

FURTHER READING
• Graphical representation
• Quantitative graph

Barnard, George A.

Barnard, George Alfred was born in 1915 in Walthamstow, Essex, England. He gained a degree in mathematics from Cambridge University in 1936. Between 1942 and 1945 he worked in the Ministry of Supply as a scientific consultant. Barnard was in the Mathematics Department at Imperial College London from 1945 to 1966. From 1966 to 1975 he was Professor of Mathematics at the University of Essex, and from 1975 until his retirement in 1981 he was Professor of Statistics at the University of Waterloo, Canada.

Barnard, George Alfred received numerous distinctions, including a gold medal from the Royal Statistical Society and from the Institute of Mathematics and its Applications. In 1987 he was named an Honorary Member of the International Statistical Institute. He died in August 2002.

Some articles of Barnard, George Alfred:

1954 Sampling inspection and statistical decisions. J. Roy. Stat. Soc. Ser. B 16, 151–174.

1958 Thomas Bayes – A biographical note. Biometrika 45, 293–315.

1989 On alleged gains in power from lower p-values. Stat. Med. 8, 1469–1477.

1990 Must clinical trials be large? The interpretation of p-values and the combination of test results. Stat. Med. 9, 601–614.

Bayes’ Theorem

If we consider the set of the "reasons" that an event occurs, Bayes' theorem gives a formula for the probability that the event is the direct result of a particular reason. Therefore, Bayes' theorem can be interpreted as a formula for the conditional probability of an event.

HISTORY
Bayes' theorem is named after Bayes, Thomas, and was developed in the middle of the eighteenth century. However, Bayes did not publish the theorem during his lifetime; instead, it was presented by Price, R. on the 23rd December 1763, two years after his death, to the Royal Society of London, of which Bayes was a member during the last twenty years of his life.

MATHEMATICAL ASPECTS
Let {A_1, A_2, . . . , A_k} be a partition of the sample space Ω. We suppose that each event A_1, . . . , A_k has a nonzero probability. Let E be an event such that P(E) > 0. So, for every i (1 ≤ i ≤ k), Bayes' theorem (for the discrete case) gives:
$$P(A_i \mid E) = \frac{P(A_i)\,P(E \mid A_i)}{\sum_{j=1}^{k} P(A_j)\,P(E \mid A_j)}\,.$$

In the continuous case, where X is a random variable with density function f(x), said also to be an a priori density function, Bayes' theorem gives the a posteriori density according to
$$f(x \mid E) = \frac{f(x)\,P(E \mid X = x)}{\int_{-\infty}^{\infty} f(t)\,P(E \mid X = t)\,dt}\,.$$

DOMAINS AND LIMITATIONS
Bayes' theorem has been the object of much controversy, relating to the ability to use it when the values of the probabilities used to determine the probability function a posteriori are not generally established in a precise way.

EXAMPLES
Three urns contain red, white and black balls:
• Urn A contains 5 red balls, 2 white balls and 3 black balls;
• Urn B contains 2 red balls, 3 white balls and 1 black ball;
• Urn C contains 5 red balls, 2 white balls and 5 black balls.

Randomly choosing an urn, we draw a ball at random: it is white. We wish to determine the probability that it was taken from urn A. Let A_1 correspond to the event where we "choose urn A", A_2 be the event where we "choose urn B", and A_3 be the event where we "choose urn C". {A_1, A_2, A_3} forms a partition of the sample space. Let E be the event where "the ball taken is white", which has a strictly positive probability. We have:
$$P(A_1) = P(A_2) = P(A_3) = \frac{1}{3}\,, \quad P(E \mid A_1) = \frac{2}{10}\,, \quad P(E \mid A_2) = \frac{3}{6}\,, \quad P(E \mid A_3) = \frac{2}{12}\,.$$

Bayes' formula allows us to determine the probability that the drawn white ball comes from urn A:
$$P(A_1 \mid E) = \frac{P(A_1)\,P(E \mid A_1)}{\sum_{i=1}^{3} P(A_i)\,P(E \mid A_i)} = \frac{\frac{1}{3}\cdot\frac{2}{10}}{\frac{1}{3}\cdot\frac{2}{10} + \frac{1}{3}\cdot\frac{3}{6} + \frac{1}{3}\cdot\frac{2}{12}} = \frac{3}{13}\,.$$
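The same computation can be carried out with exact fractions in Python, as an illustrative check of the result 3/13:

from fractions import Fraction as F

priors = {"A": F(1, 3), "B": F(1, 3), "C": F(1, 3)}          # P(A_i): urn chosen at random
likelihoods = {"A": F(2, 10), "B": F(3, 6), "C": F(2, 12)}   # P(E | A_i): draw a white ball

evidence = sum(priors[u] * likelihoods[u] for u in priors)   # P(E)
print(priors["A"] * likelihoods["A"] / evidence)             # P(A_1 | E) = 3/13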

FURTHER READING
• Conditional probability
• Probability

REFERENCE
Bayes, T.: An essay towards solving a problem in the doctrine of chances. Philos. Trans. Roy. Soc. Lond. 53, 370–418 (1763). Published, by the instigation of Price, R., 2 years after his death. Republished with a biography by Barnard, George A. in 1958 and in Pearson, E.S., Kendall, M.G.: Studies in the History of Statistics and Probability. Griffin, London, pp. 131–153 (1970)


Bayes, Thomas

Bayes, Thomas (1702–1761) was the eldest son of Bayes, Joshua, who was one of the first six Nonconformist ministers to be ordained in England, and was a member of the Royal Society. He was privately schooled by professors, as was customary in Nonconformist families. In 1731 he became reverend of the Presbyterian chapel in Tunbridge Wells, a town located about 150 km south-west of London. Due to some religious publications he was elected a Fellow of the Royal Society in 1742. His interest in mathematics was well-known to his contemporaries, despite the fact that he had not written any technical publications, because he had been tutored by De Moivre, A., one of the founders of the theory of probability. In 1763, Price, R. sorted through the papers left by Bayes and had his principal work published:

1763 An essay towards solving a problem in the doctrine of chances. Philos. Trans. Royal Soc. London, 53, pp. 370–418. Republished with a biography by Barnard, G.A. (1958). In: Pearson, E.S. and Kendall, M. (1970). Studies in the History of Statistics and Probability. Griffin, London, pp. 131–153.

FURTHER READING
• Bayes' theorem

Bayesian Statistics

Bayesian statistics is a large domain in the field of statistics that is distinguished by an axiomatization of statistics that gives it a certain internal coherence.

The basic idea is to interpret the probability of an event as it is commonly used; in other words, as the uncertainty that is related to it. In contrast, the classical approach considers the probability of an event to be the limit of the relative frequency (see probability for a more formal approach). The most well-known aspect of Bayesian inference is the possibility of calculating the joint probability distribution (or density function) f(θ, X = x_1, . . . , X = x_n) of one or many parameters θ (one parameter or a vector of parameters) having observed the data x_1, . . . , x_n sampled independently from a random variable X on which θ depends. (It is worth noting that it also allows us to calculate the probability distribution for a new observation x_{n+1}.) Bayesian statistics treat the unknown parameters as random variables not because of possible variability (in reality, the unknown parameters are considered to be fixed), but because of our ignorance or uncertainty about them. The posterior distribution f(θ | X = x_1, . . . , X = x_n) is straightforward to compute since it is proportional to the prior f(θ) times the likelihood f(X = x_1, . . . , X = x_n | θ):

posterior ∝ prior × likelihood

The second expression does not cause problems, because it is a function that we often use in classical statistics, known as the likelihood (see maximum likelihood). In contrast, the first part supposes a prior distribution for θ. We often use the initial distribution of θ to incorporate possible supplementary information about the parameters of interest. In the absence of this information, we use a reference function that maximizes the lack of information (which is then the most "objective" or "noninformative" function, following the common but not precise usage). Once the distribution f(θ | x_1, . . . , x_n) is calculated, all of the information on the parameters of interest is available. Therefore, we can calculate plausible values for the unknown parameter (the mean, the median or some other measure of central tendency), its standard deviation, confidence intervals, or perform hypothesis testing on its value.

HISTORY
See Bayes, Thomas and Bayes' theorem.

MATHEMATICAL ASPECTS
Let D be the set of data X = x_1, . . . , X = x_n independently sampled from a random variable X of unknown distribution. We will consider the simple case where there is only one interesting parameter, θ, which depends on X. Then a standard Bayesian procedure can be expressed by:
1. Identify the known quantities x_1, . . . , x_n.
2. Specify a model for the data; in other words a parametric family f(x | θ) of distributions that describe the generation of the data.
3. Specify the uncertainty concerning θ by an initial distribution function f(θ).
4. We can then calculate the distribution f(θ | D) (called the final distribution) using Bayes' theorem.

The first two points are common to every statistical inference. The third point is more problematic. In the absence of supplementary information about θ, the idea is to calculate a reference distribution f(θ) by maximizing a function that specifies the missing information on the parameter θ. Once this problem is resolved, the fourth point is easily tackled with the help of Bayes' theorem. Bayes' theorem can be expressed, in its continuous form, by:
$$f(\theta \mid D) = \frac{f(D \mid \theta)\,f(\theta)}{f(D)} = \frac{f(D \mid \theta)\,f(\theta)}{\int f(D \mid \theta)\,f(\theta)\,d\theta}\,.$$

Since the x_i are independent, we can write:
$$f(\theta \mid D) = \frac{\prod_{i=1}^{n} f(x_i \mid \theta)\,f(\theta)}{\int \prod_{i=1}^{n} f(x_i \mid \theta)\,f(\theta)\,d\theta}\,.$$

Now we have the means to calculate the density function of a new (independent) observation x_{n+1}, given x_1, . . . , x_n:
$$f(X = x_{n+1} \mid D) = \frac{f(X = x_1, \dots, X = x_{n+1})}{f(X = x_1, \dots, X = x_n)} = \frac{\int f(X = x_1, \dots, X = x_{n+1} \mid \theta)\,f(\theta)\,d\theta}{\int f(D \mid \theta)\,f(\theta)\,d\theta} = \frac{\int \prod_{i=1}^{n+1} f(X = x_i \mid \theta)\,f(\theta)\,d\theta}{\int \prod_{i=1}^{n} f(X = x_i \mid \theta)\,f(\theta)\,d\theta} = \int f(X = x_{n+1} \mid \theta)\,f(\theta \mid D)\,d\theta\,.$$

We will now briefly explain the methods that allow us to:
• Find a value for the estimated parameter that is more probable than the others;
• Find a confidence interval for θ, and;
• Perform hypothesis testing.
These methods are strictly related to decision theory, which plays a considerable role in Bayesian statistics.


Point Estimation of the Parameter
To get a point estimation for the parameter θ, we specify a loss function l(\hat{\theta}, θ) that comes from using \hat{\theta} (the estimated value) instead of the true value (the unknown) θ. Then we minimize this function of \hat{\theta}. For example, if θ is real and "the loss is quadratic" (that is, l(\hat{\theta}, θ) = (\hat{\theta} − θ)²), then the point estimation of θ will be the mean of the calculated distribution.

Confidence Intervals
The concept of a confidence interval is replaced in Bayesian statistics by the concept of an α-credible region (where α is the "confidence level"), which is simply defined as an interval I such that:
$$\int_I f(\theta \mid D)\,d\theta = \alpha\,.$$
Often, we also require that the width of the interval is minimized.

Hypothesis Testing
The general approach to hypothesis testing:
$$H_0: \theta \in I \quad\text{versus}\quad H_1: \theta \notin I$$
is related to decision theory. Based on this, we define a loss function l(a_0, θ) associated with accepting H_0 (where the true value of the parameter is θ) and a loss function l(a_1, θ) associated with rejecting H_0. If the loss obtained by accepting H_0, that is,
$$\int l(a_0, \theta)\,d\theta\,,$$
is smaller than the one obtained by rejecting H_0, then we accept H_0; otherwise, we reject the restriction imposed on θ by the null hypothesis (θ ∈ I).

EXAMPLES
The following example involves estimating the parameter θ from the Bernoulli distribution X with the help of n independent observations x_1, . . . , x_n, taking the value 1 in the case of success and 0 in the case of failure. Let r be the number of successes and n − r be the number of failures among the observations. We then have:
$$L(\theta) = P(X = x_1, \dots, X = x_n \mid \theta) = \prod_{i=1}^{n} P(X = x_i \mid \theta) = \theta^r (1 - \theta)^{n-r}\,.$$

An estimation of the maximum likelihood of θ, denoted by \hat{\theta}_{mle}, maximizes this function. To do this, we consider the logarithm of this function, in other words the log-likelihood:
$$\log L(\theta) = r\log(\theta) + (n - r)\log(1 - \theta)\,.$$
We maximize this by setting its derivative with respect to θ equal to zero:
$$\frac{\partial(\log L)}{\partial\theta} = \frac{r}{\theta} - \frac{n - r}{1 - \theta} = 0\,.$$

This is equivalent to r(1 − θ) = (n − r)θ, and it simplifies to \hat{\theta}_{mle} = r/n. The maximum likelihood estimator of θ is then simply the proportion of observed successes.
Now we return to the Bayesian method. In the case of the Bernoulli distribution, the a priori reference distribution of the parameter θ is expressed by:
$$f(\theta) = c\,(h(\theta))^{\frac{1}{2}}\,,$$
where c is an appropriate constant (such that ∫ f(θ) dθ = 1) and where h(θ) is called the Fisher information of X:

$$h(\theta) = -E\left[\frac{\partial^2 \log L}{\partial\theta^2}\right] = -E\left[-\frac{X}{\theta^2} - \frac{1 - X}{(1 - \theta)^2}\right] = \frac{E[X]}{\theta^2} + \frac{1 - E[X]}{(1 - \theta)^2} = \frac{\theta}{\theta^2} + \frac{1 - \theta}{(1 - \theta)^2} = \frac{1}{\theta} + \frac{1}{1 - \theta} = \frac{1}{\theta(1 - \theta)}\,.$$

The distribution will then be:
$$f(\theta) = c\,\theta^{-\frac{1}{2}}(1 - \theta)^{-\frac{1}{2}}\,.$$

The distribution function of θ, given the observations x_1, . . . , x_n, can then be expressed by:
$$f(\theta \mid X = x_1, \dots, X = x_n) = \frac{1}{d}\prod_{i=1}^{n} P(X = x_i \mid \theta)\,f(\theta) = \frac{1}{d}\,\theta^r(1 - \theta)^{n-r}\,c\,\theta^{-\frac{1}{2}}(1 - \theta)^{-\frac{1}{2}} = \frac{c}{d}\,\theta^{r-\frac{1}{2}}(1 - \theta)^{n-r-\frac{1}{2}} = \frac{c}{d}\,\theta^{(r+\frac{1}{2})-1}(1 - \theta)^{(n-r+\frac{1}{2})-1}\,,$$

which is a beta distribution with parameters
$$\alpha = r + \frac{1}{2} \quad\text{and}\quad \beta = n - r + \frac{1}{2}\,,$$
and with a constant of
$$\frac{c}{d} = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,,$$
where Γ is the gamma function (see gamma distribution).
We now consider a concrete case, where we want to estimate the proportion of HIV-positive students. We test 200 students and none of them is HIV-positive. The proportion of HIV-positive students is therefore estimated to be 0 by the maximum likelihood. Confidence intervals are not very useful in this case, because (if we follow the usual approach) they are calculated by:
$$0 \pm 1.96\,\sqrt{p(1 - p)}\,.$$
In this case, as p is the proportion of observed successes, the confidence interval reduces to 0.

We can see from this that:• The probability that the proportion of

interest is smaller then 0.015 is almost 1;

Page 43: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

36 Bernoulli Distribution

• The probability that the proportion ofinterest is smaller then 0.005 is approx-imately 0.84, and;

• The median, which is chosen as the bestmeasure of the central tendency due tothe strong asymmetry of the distribution,is 0.011.

We remark that there is a qualitative difference between the classical result (the proportion can be estimated as zero) and the Bayesian solution to the problem, which allows us to calculate the probability distribution of the parameter, mathematically translating the uncertainty about it. This method tells us in particular that, given this information, the correct estimation for the proportion of interest is about 0.0011. Note that the Bayesian estimation (like the uncertainty) depends on the number of observed cases, whereas in the classical case observing 0 cases of HIV among 2, 200 or 20000 students leads to the same estimate.
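The summaries quoted above can be checked numerically from the Beta(0.5, 200.5) posterior, for example with scipy (an illustrative sketch):

from scipy.stats import beta

posterior = beta(0.5, 200.5)    # final reference distribution for theta
print(posterior.cdf(0.015))     # P(theta < 0.015), close to 1
print(posterior.cdf(0.005))     # P(theta < 0.005), about 0.84
print(posterior.median())       # posterior median, about 0.0011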

FURTHER READING
• Bayes' theorem
• Bayes, Thomas
• Conditional probability
• Inference
• Joint density function
• Maximum likelihood
• Probability

REFERENCES
Bernardo, J.M., Smith, A.F.M.: Bayesian Theory. Wiley, Chichester (1994)

Berry, D.A.: Statistics, a Bayesian Perspective. Wadsworth, Belmont, CA (1996)

Box, G.E.P., Tiao, G.P.: Bayesian Inference in Statistical Analysis. Addison-Wesley, Reading, MA (1973)

Carlin, B., Louis, T.A.: Bayes & Empirical Bayes Methods. Chapman & Hall/CRC, London (2000)

Finetti, B. de: Theory of Probability; A Critical Introductory Treatment. Trans. Machi, A., Smith, A. Wiley, New York (1975)

Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B.: Bayesian Data Analysis. Chapman & Hall/CRC, London (2003)

Lindley, D.V.: The philosophy of statistics. Statistician 49, 293–337 (2000)

Robert, C.P.: Le choix Bayesien. Springer, Paris (2006)

Bernoulli Distribution

A random variable X follows a Bernoulli distribution with parameter p if its probability function takes the form:
$$P(X = x) = \begin{cases} p & \text{for } x = 1\\ q = 1 - p & \text{for } x = 0\,, \end{cases}$$

where p and q represent, respectively, the probabilities of "success" and "failure", symbolized by the values 1 and 0.

Bernoulli’s law, p = 0.3, q = 0.7

The Bernoulli distribution is a discrete probability distribution.


HISTORY
See binomial distribution.

MATHEMATICAL ASPECTS
The expected value of the Bernoulli distribution is by definition:
$$E[X] = \sum_{x=0}^{1} x\,P(X = x) = 1\cdot p + 0\cdot q = p\,.$$

The variance of the Bernoulli distribution is by definition:
$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 1^2\cdot p + 0^2\cdot q - p^2 = p - p^2 = pq\,.$$
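These two moments can be checked empirically by simulation; the value p = 0.3 and the sample size below are arbitrary illustrative choices:

import numpy as np

p = 0.3
rng = np.random.default_rng(0)
x = rng.binomial(1, p, size=100_000)   # Bernoulli(p) draws (0s and 1s)
print(x.mean(), x.var())               # close to p = 0.3 and pq = 0.21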

DOMAINS AND LIMITATIONS
The Bernoulli distribution is used when a random experiment has only two possible results: "success" or "failure." These results are usually symbolized by 0 and 1.

FURTHER READING
• Binomial distribution
• Discrete probability distribution

Bernoulli Family

Originally from Basel, the Bernoulli family contributed several mathematicians to science. Bernoulli, Jacques (1654–1705) studied theology at the University of Basel, according to the will of his father, and then traveled for several years, teaching and continuing his own studies. Having resolutely steered himself towards mathematics, he became a professor at the University of Basel in 1687. According to Stigler (1986),

Bernoulli, Jacques (called Bernoulli, James in most English works) is the father of the quantification of uncertainty. It was only in 1713, seven years after his death, and on the instigation of his nephew Nicolas, that his main work Ars Conjectandi was published. This work is divided into four parts: in the first the author comments upon the famous treatise of Huygens; the second is dedicated to the theory of permutations and combinations; the third to solving diverse problems about games of chance; and finally the fourth discusses the application of the theory of probability to questions of moral interest and economic science. A great number of the works of Bernoulli, Jacques were never published. Jean Bernoulli (1667–1748), who was more interested in mathematics than the medical career his father intended for him, also became a university professor, first in Groningen in the Netherlands, and then in Basel, where he took over the chair left vacant by the death of his brother. The two brothers had worked on differential and integral calculus and minimization problems and had also studied functions.

REFERENCES
Stigler, S.: The History of Statistics, the Measurement of Uncertainty Before 1900. Belknap, London (1986) pp. 62–71

Bernoulli, Jakob

Bernoulli, Jakob (or Jacques or Jacob or James) (1655–1705) and his brother Jean were pioneers of Leibniz calculus. Jakob reformulated the problem of calculating an expectation into probability calculus. He also formulated the weak law of large numbers, upon which modern probability and statistics are based. Bernoulli, Jakob was nominated maître ès lettres in 1671. During his studies in theology, which he terminated in 1676, he studied mathematics and astronomy, contrary to the will of his father (Nikolaus Bernoulli).
In 1682, Leibniz published (in Acta Eruditorium) a method that could be used to determine the integrals of algebraic functions, a brief discourse on differential calculus in an algorithmic scheme, and some remarks on the fundamental idea of integral calculus. This paper attracted the attention of Jakob and his brother, and they considerably improved upon the work already done by Leibniz. Leibniz himself recognized that infinitesimal calculus was mostly founded by the Bernoulli brothers rather than himself. Indeed, in 1690 Jakob introduced the term "integral." In 1687, he was nominated Professor of Mathematics at the University of Basel, where he stayed until his death in 1705.
Ars conjectandi is the title of what is generally accepted as Bernoulli's most original work. It consists of four parts: the first contains a reprint of Huygen's De Ratiociniis in Ludo Aleae (published in 1657), which is completed via important modifications. In the second part of his work, Bernoulli addresses combinatory theory. The third part comprises 24 examples that help to illustrate the modified concept of the expected value. Finally, the fourth part is the most interesting and original, even though Bernoulli did not have the time to finish it. It is in this part that Bernoulli distinguishes two ways of defining (exactly or approximately) the classical measure of probability.
Around 1680, Bernoulli, Jakob also became interested in stochastics. The evolution of his ideas can be followed in his scientific journal Meditations. Some of the main works and articles of Bernoulli, Jakob include:

1677 Meditationes, Annotationes, Animadversiones Theologicae & Philosophicae, a me JB. concinnatae & collectae ab anno 1677. Universitätsbibliothek, Basel, L I a 3.

1713 Ars Conjectandi, Opus Posthumum. Accedit Tractatus de Seriebus infinitis, et Epistola Gallice scripta de ludo Pilae recticularis. Impensis Thurnisiorum, Fratrum, Basel.

Bernoulli Trial

The Bernoulli trials are repeated tests of an experiment that obey the following rules:
1. Each trial results in either success or failure;
2. The probability of success is the same for each trial; the probability of success is denoted by p, and the probability of failure by q = 1 − p;
3. The trials are independent.

HISTORY
The Bernoulli trials take their name from the Swiss mathematician Bernoulli, Jakob (1713). Bernoulli, Jakob (1654–1705) was the eldest of four brothers and it was his father's will that he should study theology. When he had finished his studies in theology in Basel in 1676, he briefly left the town only to return in 1680 in order to devote himself to mathematics. He obtained the Chair of Mathematics at the University of Basel in 1687.


EXAMPLES
The most simple example of a Bernoulli trial is the flipping of a coin. If obtaining "heads" is considered to be a success (S) while obtaining "tails" is considered to be a failure (F), we have:
$$p = P(S) = \frac{1}{2} \quad\text{and}\quad q = P(F) = 1 - p = \frac{1}{2}\,.$$

FURTHER READING
• Bernoulli distribution
• Binomial distribution

REFERENCES
Bernoulli, J.: Ars Conjectandi, Opus Posthumum. Accedit Tractatus de Seriebus infinitis, et Epistola Gallice scripta de ludo Pilae recticularis. Impensis Thurnisiorum, Fratrum, Basel (1713)

Ross, S.M.: Introduction to Probability Models, 8th edn. John Wiley, New York (2006)

Bernoulli’s Theorem

Bernoulli's theorem says that the relative frequency of success in a sequence of Bernoulli trials approaches the probability of success as the number of trials increases towards infinity. It is a simplified form of the law of large numbers and derives from the Chebyshev inequality.

HISTORY
Bernoulli's theorem, sometimes called the "weak law of large numbers," was first described by Bernoulli, Jakob (1713) in his work Ars Conjectandi, which was published (with the help of his nephew Nikolaus) seven years after his death.

MATHEMATICAL ASPECTS
If S represents the number of successes obtained during n Bernoulli trials, and if p is the probability of success, then we have:
$$\lim_{n\to\infty} P\left(\left|\frac{S}{n} - p\right| \geq \varepsilon\right) = 0\,, \quad\text{or}\quad \lim_{n\to\infty} P\left(\left|\frac{S}{n} - p\right| < \varepsilon\right) = 1\,,$$
where ε > 0 and arbitrarily small. In an equivalent manner, we can write:
$$\frac{S}{n} \xrightarrow[n\to\infty]{} p\,,$$
which means that the relative frequency of success tends to the probability of success when n tends to infinity.
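The convergence of the relative frequency S/n towards p can be illustrated by simulation (the choice p = 0.5 and the sample sizes are arbitrary):

import numpy as np

p = 0.5
rng = np.random.default_rng(0)
for n in (10, 100, 10_000, 1_000_000):
    s = rng.binomial(n, p)    # number of successes in n Bernoulli trials
    print(n, s / n)           # S/n gets closer and closer to p as n grows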

FURTHER READING
• Bernoulli distribution
• Bernoulli trial
• Convergence
• Law of large numbers

REFERENCES
Bernoulli, J.: Ars Conjectandi, Opus Posthumum. Accedit Tractatus de Seriebus infinitis, et Epistola Gallice scripta de ludo Pilae recticularis. Impensis Thurnisiorum, Fratrum, Basel (1713)

Rényi, A.: Probability Theory. North Holland, Amsterdam (1970)


Beta Distribution

A random variable X follows a beta distribution with parameters α and β if its density function is of the form:
$$f(x) = \frac{1}{B(\alpha, \beta)}\,(x - a)^{\alpha-1}(b - x)^{\beta-1}(b - a)^{-(\alpha+\beta-1)}\,, \qquad a \leq x \leq b\,,\ \alpha > 0\,,\ \beta > 0\,,$$
where
$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1 - t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}\,,$$
and Γ is the gamma function (see gamma distribution).

Beta distribution α = 2, β = 3, a = 3, b = 7

The beta distribution is a continuous probability distribution.

MATHEMATICAL ASPECTS
The density function of the standard beta distribution is obtained by performing the variable change Y = (X − a)/(b − a):
$$f(y) = \begin{cases} \dfrac{1}{B(\alpha, \beta)}\,y^{\alpha-1}(1 - y)^{\beta-1} & \text{if } 0 < y < 1\\ 0 & \text{otherwise.} \end{cases}$$

Consider X, a random variable that follows the standard beta distribution. The expected value and the variance of X are, respectively, given by:
$$E[X] = \frac{\alpha}{\alpha + \beta}\,, \qquad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}\,.$$
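For instance, with α = 2 and β = 3 these formulas give E[X] = 0.4 and Var(X) = 0.04, which can be checked against scipy (an illustrative sketch):

from scipy.stats import beta

a, b = 2, 3
print(a / (a + b), a * b / ((a + b) ** 2 * (a + b + 1)))   # 0.4 0.04
print(beta.mean(a, b), beta.var(a, b))                     # same values from scipy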

DOMAINS AND LIMITATIONS
The beta distribution is one of the most frequently used to adjust empirical distributions where the range (or variation interval) [a, b] is known.
Here are some particular cases where the beta distribution is used, related to other continuous probability distributions:
• If X_1 and X_2 are two independent random variables each distributed according to a gamma distribution with parameters (α_1, 1) and (α_2, 1), respectively, the random variable X_1/(X_1 + X_2) is distributed according to a beta distribution with parameters (α_1, α_2).
• The beta distribution becomes a uniform distribution when α = β = 1.
• When the parameters α and β tend towards infinity, the beta distribution tends towards the standard normal distribution.

FURTHER READING
• Continuous probability distribution
• Gamma distribution
• Normal distribution
• Uniform distribution


Bias

From a statistical point of view, the bias is defined as the difference between the expected value of a statistic and the true value of the corresponding parameter. Therefore, the bias is a measure of the systematic error of an estimator. If we calculate the mean of a large number of unbiased estimations, we will find the correct value. The bias indicates the distance of the estimator from the true value of the parameter.

HISTORY
The concept of an unbiased estimator comes from Gauss, C.F. (1821), during the time when he worked on the least squares method.

DOMAINS AND LIMITATIONS
We should not confuse the bias of an estimator of a parameter with its degree of precision, which is a measurement of the sampling error. There are several types of bias: selection bias (due to systematic differences between the groups compared), exclusion bias (due to the systematic exclusion of certain individuals from the study) or analytical bias (due to the way that the results are evaluated).

MATHEMATICAL ASPECTS
Consider a statistic T used to estimate a parameter θ. If E[T] = θ + b(θ) (where E[T] represents the expected value of T), then the quantity b(θ) is called the bias of the statistic T. If b(θ) = 0, we have E[T] = θ, and T is an unbiased estimator of θ.

EXAMPLES
Consider X_1, X_2, . . . , X_n, a sequence of independent random variables distributed according to the same law of probability with a mean μ and a finite variance σ². We can calculate the bias of the estimator
$$S^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\,,$$

used to estimate the variance σ² of the population in the following way:
$$E[S^2] = E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 - (\bar{x} - \mu)^2\right] = \frac{1}{n}E\left[\sum_{i=1}^{n}(x_i - \mu)^2\right] - E(\bar{x} - \mu)^2 = \frac{1}{n}\sum_{i=1}^{n}E(x_i - \mu)^2 - E(\bar{x} - \mu)^2 = \mathrm{Var}(x_i) - \mathrm{Var}(\bar{x}) = \sigma^2 - \frac{\sigma^2}{n}\,.$$
The bias of S² is then equal to −σ²/n.
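This result can be illustrated by simulation: for normal samples of size n = 5 with σ² = 4, the average of S² over many samples is close to σ² − σ²/n = 3.2 (all numerical choices here are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 5, 4.0
s2 = [np.var(rng.normal(0.0, np.sqrt(sigma2), size=n))   # np.var uses the 1/n definition
      for _ in range(200_000)]
print(np.mean(s2), sigma2 - sigma2 / n)                   # both close to 3.2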

FURTHER READING
• Estimation
• Estimator
• Expected value

REFERENCES
Gauss, C.F.: Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Parts 1, 2 and suppl. Werke 4, 1–108 (1821, 1823, 1826)

Gauss, C.F.: Méthode des Moindres Carrés. Mémoires sur la Combinaison des Observations. Traduction Française par J. Bertrand. Mallet-Bachelier, Paris (1855)

Bienaymé, Irénée-Jules

Bienaymé, Irénée-Jules (1796–1878) entered the French Ministry of Finance and became the Inspector General for Finance in 1836, although he lost his employment in 1848 for political reasons. Shortly after this, he began to give lessons in the Faculty of Science in Paris. A follower of Laplace, Pierre Simon de, Bienaymé proved the Bienaymé–Chebyshev inequality some years before Chebyshev, Pafnutii Lvovich. The reedited version of Bienaymé's paper from 1867 precedes the French version of Chebyshev's proof. This inequality was then used in Chebyshev's incomplete proof of the central limit theorem, which was later finished by Markov, Andrei Andreevich. Moreover, Bienaymé correctly formulated the theorem for branching processes in 1845. His most famous public work is probably the corrections he made to the use of Duvillard's mortality table.
Some principal works and articles of Bienaymé, Irénée-Jules:

1853 Considérations à l'appui de la découverte de Laplace sur la loi de probabilité dans la méthode des moindres carrés. Comptes Rendus de l'Académie des Sciences, Paris 37, 5–13; reedited in 1867 in the Journal de Liouville preceding the proof of the Bienaymé–Chebyshev inequality in J. Math. Pure. Appl., 12, 158–176.

Binary Data

Binary data occur when the variable of interest can only take two values. These two values are generally represented by 0 and 1, even if the variable is not quantitative. Gender, the presence or absence of a characteristic and the success or failure of an experiment are just a few examples of variables that result in binary data. These variables are called dichotomous variables.

EXAMPLES
A meteorologist wants to know how reliable his forecasts are. To do this, he studies a random variable representing the prediction. This variable can only take two values:
$$X = \begin{cases} 0 & \text{if the prediction was incorrect}\\ 1 & \text{if the prediction was correct.} \end{cases}$$

The meteorologist makes predictions for a period of 50 consecutive days. The prediction is found to be correct 32 times, and incorrect 18 times. To find out whether his predictions are better than the ones that could have been obtained by flipping a coin and predicting the weather based on whether heads or tails are obtained, he decides to test the null hypothesis H0, that the proportion p of correct predictions is equal to 0.5, against the alternative hypothesis H1, that this proportion is different from 0.5:

$$H_0: p = 0.5\,, \qquad H_1: p \neq 0.5\,.$$

Let us calculate the value of
$$\chi^2 = \sum\frac{(\text{observed frequency} - \text{theoretical frequency})^2}{\text{theoretical frequency}}\,,$$


where the observed frequencies are 18 and 32, and the theoretical frequencies are, respectively, 25 and 25:
$$\chi^2 = \frac{(32 - 25)^2 + (18 - 25)^2}{25} = \frac{49 + 49}{25} = \frac{98}{25} = 3.92\,.$$

By assuming that the central limit theorem applies, we compare this value with the value of the chi-square distribution with one degree of freedom. For a significance level of 5%, we find in the chi-square table:
$$\chi^2_{1,\,0.95} = 3.84\,.$$

Therefore, since 3.92 > 3.84, the null hypothesis is rejected, which means that the meteorologist predicts the weather better than a coin.
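The same test statistic (and an exact p-value) can be obtained with scipy, as an illustrative check:

from scipy.stats import chisquare

# Observed: 32 correct and 18 incorrect predictions; expected 25 and 25 under H0: p = 0.5.
statistic, p_value = chisquare([32, 18], f_exp=[25, 25])
print(statistic, p_value)   # 3.92 and a p-value just below 0.05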

FURTHER READING
• Bernoulli distribution
• Categorical data
• Contingency table
• Data
• Dichotomous variable
• Likelihood ratio test
• Logistic regression
• Variable

REFERENCES
Cox, D.R., Snell, E.J.: The Analysis of Binary Data. Chapman & Hall (1989)

Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA (1975)

Binomial

Algebraic sums containing variables are called polynomials (from the Greek "poly," meaning "several"). An expression that contains two terms is called a binomial (from the Latin "bi", meaning "double"). A monomial (from the Greek "mono", meaning "unique") is an expression with one term and a trinomial (from the Latin "tri", meaning "triple") contains three elements.

HISTORY
See arithmetic triangle.

MATHEMATICAL ASPECTS
The square of a binomial is easily calculated using
$$(a + b)^2 = a^2 + 2ab + b^2\,.$$
Binomial formulae for exponents higher than two also exist, such as:
$$(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3\,,$$
$$(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4\,.$$

We can write a generalized binomial formula in the following manner:
$$(a + b)^n = a^n + na^{n-1}b + \frac{n(n-1)}{2}a^{n-2}b^2 + \dots + \frac{n(n-1)}{2}a^2b^{n-2} + nab^{n-1} + b^n = C_n^0 a^n + C_n^1 a^{n-1}b + C_n^2 a^{n-2}b^2 + \dots + C_n^{n-1}ab^{n-1} + C_n^n b^n = \sum_{k=0}^{n} C_n^k\,a^{n-k}b^k\,.$$
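The coefficients C_n^k of such an expansion are readily obtained in Python; for example, for n = 4 they reproduce the corresponding line of the arithmetic triangle:

from math import comb

n = 4
print([comb(n, k) for k in range(n + 1)])   # [1, 4, 6, 4, 1], coefficients of (a + b)^4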


DOMAINS AND LIMITATIONS
The binomial (p + q) raised to the power n lends its name to the binomial distribution because it corresponds to the total probability obtained after n Bernoulli trials.

FURTHER READING
• Arithmetic triangle
• Binomial distribution
• Combination

Binomial Distribution

A random variable X follows a binomial distribution with parameters n and p if its probability function takes the form:
$$P(X = x) = C_n^x\,p^x q^{n-x}\,, \qquad x = 0, 1, 2, \dots, n\,.$$

Therefore, if an event comprises x "successes" and (n − x) "failures," where p is the probability of "success" and q = 1 − p the probability of "failure," the binomial distribution allows us to calculate the probability of obtaining x successes from n independent trials.

Binomial distribution, p = 0.3, q = 0.7, n = 3

The binomial distribution with parameters n and p, denoted B(n, p), is a discrete probability distribution.

Binomial distribution, p = 0.5, q = 0.5, n = 4

HISTORY
The binomial distribution is one of the oldest known probability distributions. It was discovered by Bernoulli, J. in his work entitled Ars Conjectandi (1713). This work is divided into four parts: in the first, the author comments on the treatise from Huygens; the second part is dedicated to the theory of permutations and combinations; the third is devoted to solving various problems related to games of chance; finally, in the fourth part, he proposes applying probability theory to moral questions and to the science of economics.

MATHEMATICAL ASPECTS
If X1, X2, . . . , Xn are n independent random variables following a Bernoulli distribution with a parameter p, then the random variable

X = X1 + X2 + . . . + Xn

follows a binomial distribution B(n, p).
To calculate the expected value of X, the following property will be used, where Y and Z are two random variables:

E[Y + Z] = E[Y] + E[Z] .

We therefore have:

E[X] = E[X1 + X2 + . . . + Xn] = E[X1] + E[X2] + . . . + E[Xn] = p + p + . . . + p = np .

To calculate the variance of X, the following property will be used, where Y and Z are two independent variables:

Var(Y + Z) = Var(Y) + Var(Z) .

We therefore have:

Var(X) = Var(X1 + X2 + . . . + Xn) = Var(X1) + Var(X2) + . . . + Var(Xn) = pq + pq + . . . + pq = npq .

Binomial distribution tables (for the probability distribution and the distribution function) have been published by Rao et al. (1966), as well as by the National Bureau of Standards (1950). Extended distribution function tables can be found in the Annals of the Computation Laboratory (1955).

EXAMPLES
A coin is flipped ten times. Consider the random variable X, which represents the number of times that the coin lands on "tails." We therefore have:

number of trials: n = 10
probability of one success (tails obtained): p = 1/2
probability of one failure (heads obtained): q = 1/2

The probability of obtaining tails x times amongst the ten trials is given by

P(X = x) = C^x_10 · (1/2)^x · (1/2)^(10−x) .

The probability of obtaining tails exactly eight times is therefore equal to:

P(X = 8) = C^8_10 · p^8 · q^(10−8) = [10! / (8!(10 − 8)!)] · (1/2)^8 · (1/2)² = 0.0439 .

The random variable X follows the binomial distribution B(10, 1/2).
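As a quick check (an added sketch, not part of the original entry), the probability P(X = 8) can be computed directly from the probability function:

```python
from math import comb

n, p, x = 10, 0.5, 8
# P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
prob = comb(n, x) * p ** x * (1 - p) ** (n - x)
print(round(prob, 4))  # 0.0439
```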

FURTHER READING
• Bernoulli distribution
• Binomial table
• Discrete probability distribution
• Negative binomial distribution

REFERENCES
Bernoulli, J.: Ars Conjectandi, Opus Posthumum. Accedit Tractatus de Seriebus infinitis, et Epistola Gallice scripta de ludo Pilae recticularis. Impensis Thurnisiorum, Fratrum, Basel (1713)

Harvard University: Tables of the Cumulative Binomial Probability Distribution, vol. 35. Annals of the Computation Laboratory, Harvard University Press, Cambridge, MA (1955)

National Bureau of Standards: Tables of the Binomial Probability Distribution. U.S. Department of Commerce. Applied Mathematics Series 6 (1950)

Rao, C.R., Mitra, S.K., Matthai, A., Ramamurthy, K.G.: Formulae and Tables for Statistical Work. Statistical Publishing Company, Calcutta, pp. 34–37 (1966)

Binomial Table

The binomial table gives the values for the distribution function of a random variable that follows a binomial distribution.

HISTORY
See binomial distribution.

MATHEMATICAL ASPECTS
Let the random variable X follow the binomial distribution with parameters n and p. Its probability function is given by:

P(X = x) = C^x_n · p^x · q^(n−x) , x = 0, 1, 2, . . . , n ,

where C^x_n is the binomial coefficient, equal to n! / (x!(n − x)!), parameter p is the probability of success, and q = 1 − p is the complementary probability that corresponds to the probability of failure (see normal distribution).
The distribution function of the random variable X is defined by:

P(X ≤ x) = Σ_{i=0}^{x} C^i_n · p^i · q^(n−i) , 0 ≤ x ≤ n .

The binomial table gives the value of P(X ≤ x) for various combinations of x, n and p.
For large n, this calculation becomes tedious. Thankfully, we can use some very good approximations instead. If min(np, n(1 − p)) > 10, we can approximate it with the normal distribution:

P(X ≤ x) ≈ Φ((x + 1/2 − np) / √(npq)) ,

where Φ is the distribution function of the standard normal distribution, and the continuity correction 1/2 is included.

DOMAINS AND LIMITATIONS
The binomial table is used to perform nonparametric tests on statistics that are distributed according to the binomial distribution, especially the sign test and the binomial test.
The National Bureau of Standards (1950) published individual and cumulative binomial distribution probabilities for n ≤ 49, while cumulative binomial distribution probabilities for n ≤ 1000 are given in the Annals of the Computation Laboratory (1955).

EXAMPLES
See Appendix B. We can verify that for n = 2 and p = 0.5:

P(X ≤ 1) = Σ_{i=0}^{1} C^i_2 (0.5)^i (0.5)^(2−i) = 0.75 ,

or that for n = 5 and p = 0.05:

P(X ≤ 3) = Σ_{i=0}^{3} C^i_5 (0.05)^i (0.95)^(5−i) = 1.0000 .
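These two values can be verified with a few lines of Python; the function name binom_cdf below is illustrative and computes the distribution function directly from its definition:

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for a binomial random variable with parameters n and p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))

print(round(binom_cdf(1, 2, 0.5), 4))   # 0.75
print(round(binom_cdf(3, 5, 0.05), 4))  # 1.0
```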

For an example of the application of the binomial table, see binomial test.

FURTHER READING
• Binomial distribution
• Binomial test
• Sign test
• Statistical table

REFERENCES
National Bureau of Standards: Tables of the Binomial Probability Distribution. U.S. Department of Commerce. Applied Mathematics Series 6 (1950)

Harvard University: Tables of the Cumulative Binomial Probability Distribution, vol. 35. Annals of the Computation Laboratory, Harvard University Press, Cambridge, MA (1955)

Binomial Test

The binomial test is a parametric hypothesis test that applies when the population can be divided into two classes: each observation of this population will belong to one or the other of these two categories.

MATHEMATICAL ASPECTS
We consider a sample of n independent trials. Each trial belongs to either the class C1 or the class C2. We note the number of observations n1 that fall into C1 and the number of observations n2 that fall into C2.
Each trial has a probability p of belonging to class C1, where p is identical for all n trials, and a probability q = 1 − p of belonging to class C2.

Hypotheses
The binomial test can be either a two-sided test or a one-sided test. If p0 is the presumed value of p (0 ≤ p0 ≤ 1), the hypotheses are expressed as follows:

A: Two-sided case
H0: p = p0 ,
H1: p ≠ p0 .

B: One-sided case
H0: p ≤ p0 ,
H1: p > p0 .

C: One-sided case
H0: p ≥ p0 ,
H1: p < p0 .

Decision Rules
Case A
For n < 25, we use the binomial table with n and p0 as parameters. Therefore, we initially look for the closest value to α/2 in this table (which we denote α1), where α is the significance level. We denote the value corresponding to α1 by t1. Next we find the value of 1 − α1 = α2 in the table. We denote the value corresponding to α2 by t2. We reject H0 at the level α if

n1 ≤ t1 or n1 > t2 ,

where n1 is the number of observations that fall into the class C1.
When n is bigger than 25, we can use the normal distribution as an approximation for the binomial distribution. The parameters for the binomial distribution are:

μ = n · p0 ,
σ = √(n · p0 · q0) .

The random variable Z that follows the standard normal distribution then equals:

Z = (X − μ) / σ = (X − n · p0) / √(n · p0 · q0) ,

where X is a random binomial variable with parameters n and p0. The approximations for the values of t1 and t2 are then:

t1 = n · p0 + z_{α1} · √(n · p0 · q0) ,
t2 = n · p0 + z_{α2} · √(n · p0 · q0) ,

where z_{α1} and z_{α2} are the values found in the normal table corresponding to the levels α1 and α2.

Case B
For n < 25, we take t to be the value in the binomial table corresponding to 1 − α, where α is the significance level (or the closest value), and n and p0 are the parameters described previously. We reject H0 at the level α if

n1 > t ,

where n1 is the number of the observations that fall into the class C1.
For n ≥ 25 we can make the approximation:

t = n · p0 + zα · √(n · p0 · q0) ,

where zα can be found in the normal table for 1 − α. The decision rule in this case is the same as that for n < 25.

Case C
For n < 25, t is the value in the binomial table corresponding to α, where α is the significance level (or the closest value), and with n and p0 the same parameters described previously. We reject H0 at the level α if

n1 ≤ t ,

where n1 is the number of observations that fall into the class C1.
For n ≥ 25, we make the approximation:

t = n · p0 + zα · √(n · p0 · q0) ,

where zα is found in the normal table for the significance level α. Then, the decision rule is the same as described previously.

DOMAINS AND LIMITATIONS
Two basic conditions that must be respected when performing the binomial test are:
1. The n observations must be mutually independent;
2. Every observation has a probability p of falling into the first class. This probability is also the same for all observations.

EXAMPLES
A machine is considered to be operational if a maximum of 5% of the pieces that it produces are defective. The null hypothesis, denoted H0, expresses this situation, while the alternative hypothesis, denoted H1, signifies that the machine is failing:

H0: p ≤ 0.05 ,
H1: p > 0.05 .

Performing the test, we take a sample of 10 pieces and we note that there are three defective pieces (n1 = 3). As the hypotheses correspond to the one-tailed test (case B), decision rule B is used. If we choose a significance level of α = 0.05, the value of t in the binomial table equals 1 (for n = 10 and p0 = 0.05).
We reject H0 because n1 = 3 > t = 1, and we conclude that the machine is failing.
Then we perform the test again, but on 100 pieces this time. We notice that there are 12 defective pieces. In this case, the value of t can be approximated by:

t = n · p0 + zα · √(n · p0 · q0) ,

where zα can be found from the normal table for 1 − α = 0.95. We then have:

t = 100 · 0.05 + 1.64 · √(100 · 0.05 · 0.95) = 8.57 .

We again reject H0, because n1 = 12 > t = 8.57.
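Both decisions in this example can be reproduced with a short Python sketch (added for illustration; the helper binom_cdf is not part of the encyclopedia, and the binomial table is replaced by a direct computation of the distribution function):

```python
from math import comb, sqrt

def binom_cdf(x, n, p):
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(x + 1))

# Small sample (case B, n < 25): n = 10, p0 = 0.05, n1 = 3 defective pieces.
n, p0, alpha, n1 = 10, 0.05, 0.05, 3
# t is the table value whose cumulative probability is closest to 1 - alpha.
t = min(range(n + 1), key=lambda x: abs(binom_cdf(x, n, p0) - (1 - alpha)))
print(t, n1 > t)  # 1 True -> reject H0

# Large sample (normal approximation): n = 100, n1 = 12, z_0.95 = 1.64.
n, n1, z = 100, 12, 1.64
t_approx = n * p0 + z * sqrt(n * p0 * (1 - p0))
print(round(t_approx, 2), n1 > t_approx)  # 8.57 True -> reject H0
```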

FURTHER READING
• Binomial table
• Goodness of fit test
• Hypothesis testing
• One-sided test
• Parametric test
• Two-sided test

REFERENCES
Abdi, H.: Binomial distribution: Binomial and Sign Tests. In: Salkind, N.J. (ed.) Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks (2007)

Biostatistics

Biostatistics is the scientific field where statistical methods are applied in order to answer questions related to human biology and medicine (the prefix "bio" comes from the Greek "bios," which means "life").
The domains of biostatistics are mainly epidemiology, clinical and biological trials, and it is also used when studying the ethics of these trials.

HISTORY
Biostatistics began in the middle of the seventeenth century, when Petty, Sir William (1623–1687) and Graunt, John (1620–1674) created new methods of analyzing the London Bills of Mortality. They applied these new methods of analysis to death rate, birthrate and census studies, creating the field of biometrics. Then, in the middle of the nineteenth century, the works of Mendel, Gregor examined inheritance in plants. His observations and results were based on systematically gathered data, and also on the application of numerical methods of describing the regularity of hereditary transmission.
Galton, Francis and Pearson, Karl were two of the most important individuals associated with the development of this science. They used the new concepts and statistical methods when investigating the resemblance in physical, psychological and behavioral data between parents and their children. Pearson, Galton and Weldon, Walter Frank Raphael (1860–1906) cofounded the journal Biometrika. Fisher, Ronald Aylmer, during his agricultural studies performed at Rothamsted Experimental Station, proposed a method of random sampling where animals were partitioned into different groups and allocated different treatments, marking the first studies into clinical trials.

DOMAINS AND LIMITATIONS
The domains of biostatistics are principally epidemiology, clinical trials and the ethical questions related to them, as well as biological trials. One of the areas where biostatistical concepts have been used to analyze a specific question was in indirect measurements of the persistence of a substance (for example vitamins and hormones) when administered to a living creature. Biostatistics therefore mainly refers to statistics used to solve problems that appear in the biomedical sciences.
The statistical methods most commonly used in biostatistics include resampling methods (bootstrap, jackknife), multivariate analysis, various regression methods, evaluations of uncertainty related to estimation, and the treatment of missing data.
The most well-known scientific journals related to biostatistics are: Statistics in Medicine, The Biometrical Journal, Controlled Clinical Trials, The Journal of Biopharmaceutical Statistics, Statistical Methods in Medical Research and Biometrics.

FURTHER READING
• Demography
• Epidemiology

REFERENCES
Armitage, P., Colton, T.: Encyclopedia of Biostatistics. Wiley, New York (1998)

Graunt, J.: Natural and political observations mentioned in a following index, and made upon the bills of mortality: with reference to the government, religion, trade, growth, ayre, diseases, and the several changes of the said city. Tho. Roycroft, for John Martin, James Allestry, and Tho. Dicas (1662)

Greenwood, M.: Medical Statistics from Graunt to Farr. Cambridge University Press, Cambridge (1948)

Petty, W.: Another essay in political arithmetic concerning the growth of the city of London. Kelley, New York (1682) (2nd edn. 1963)

Petty, W.: Observations Upon the Dublin-Bills of Mortality, 1681, and the State of That City. Kelley, New York (1683) (2nd edn. 1963)

Block

Blocks are sets where experimental units are grouped in such a way that the units are as similar as possible within each block. We can expect that the experimental error associated with a block will be smaller than that obtained if the same number of units were randomly located within the whole experimental space.
The blocks are generally determined by taking into account both controllable causes related to the factors studied and causes that may be difficult or impossible to keep constant over all of the experimental units. The variations between the blocks are then eliminated when we compare the effects of the factors.
Several types of regroupings can be used to reduce the effects of one or several sources of error. A randomized block design results if there is only one source of error.

HISTORY
The block concept used in the field of design of experiments originated in studies made by Fisher, R.A. (1925) when he was Head of the Rothamsted Experimental Station. When working with agricultural researchers, Fisher realized that the ground (field) chosen for the experiment was manifestly heterogeneous, in the sense that the fertility varies in a systematic way from one point on the ground to another.

FURTHER READING
• Design of experiments
• Experimental unit
• Factor
• Graeco-Latin square design
• Latin square designs
• Randomized block design

REFERENCES
Fisher, R.A.: Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh (1925)

Bonferroni, Carlo E.

Bonferroni, Carlo Emilio was born in 1892 in Bergamo, Italy. He obtained a degree in mathematics in Turin and completed his education by spending a year at university in Vienna and at the Eidgenössische Technische Hochschule in Zurich. He was a military officer during the First World War, and after the war had finished became an assistant professor at Turin Polytechnic. In 1923, he received the Financial Mathematics Chair at the Economics Institute in Bari, where he was the Rector for seven years. He finally transferred to Florence in 1933, where he held his chair until his death.
Bonferroni tackled various subjects, including actuarial mathematics, probability and statistical mathematics, analysis, geometry and mechanics. He gave his name to the two Bonferroni inequalities that facilitate the treatment of statistical dependences. These appeared for the first time in 1936, in the article Teoria statistica delle classi e calcolo delle probabilità. The development of these inequalities prompted a wide range of new literature in this area.
Bonferroni died in 1960 in Florence, Italy.
Principal article of Carlo Emilio Bonferroni:

1936 Teoria statistica delle classi e calcolo delle probabilità. Publ. R. Istit. Super. Sci. Econ. Commerc. Firenze, 8, 1–62.

Bootstrap

The term bootstrap describes a family of techniques that are principally used to estimate the standard error, the bias and the confidence interval of a parameter (or more than one parameter). It is based on n independent observations of a random variable with an unknown distribution function F, and is particularly useful when the parameters to be estimated relate to a complicated function of F.
The basic idea of the bootstrap (which is somewhat similar to the idea behind the jackknife method) is to estimate F using a plausible distribution F̂ and then to resample from F̂. Bootstrap procedures usually require the use of computers, since they can perform a large number of simulations in a relatively short time. In bootstrap methods, automatic simulations take the place of the analytical calculations used in the "traditional" methods of estimation, and in certain cases they can provide more freedom, for example when we do not want to (or we cannot) accept a hypothesis for the structure of the distribution of the data. Certain bootstrap methods are included in statistical software such as S-PLUS, SAS and MATLAB.

HISTORY
The origin of the use of the word "bootstrap" in relation to the methods described here neatly illustrates the reflexive nature of the secondary samples generated by them (the ones constructed from F̂). It originates from the literary character Baron Munchausen (from The Surprising Adventures of Baron Munchausen by Raspe, R.E.), who fell into a lake, but pulled himself out by his own bootstraps (his laces).
The history of the bootstrap, like that of many statistical techniques, starts in 1979 with the publication of an article by Efron (1979), which had a great impact on many researchers. This article triggered the publication of a huge number of articles on the theory and applications of the bootstrap.

MATHEMATICAL ASPECTS
Different types of bootstrap have been proposed; the goal here is not to review all of these exhaustively, but instead to give an idea of the types of methods used.

Let x1, x2, . . . , xn be n independent observations of a random variable with an unknown distribution function F. We are interested in the estimation of an unknown parameter δ that depends on F, and in the reliability of this estimation (in other words its bias, its variance and its confidence interval). The estimation of δ is afforded by the statistic t = t(F), which is dependent on F. Since F is unknown, we find an estimation F̂ for F based on the sample x1, x2, . . . , xn, and we estimate δ using t(F̂). Classical examples of t are the mathematical expectation, the variance and the quantiles of F. We denote the random variable corresponding to the statistic t (which depends on x1, x2, . . . , xn) by T.
We distinguish the following types of bootstrap:
• Parametric bootstrap. If F = Fθ is a member of a parametric family of distribution functions, we can estimate (the vector) θ using the maximum likelihood estimator θ̂. We naturally estimate F by Fθ̂.
• Nonparametric bootstrap. If F is not a member of a parametric family, it is estimated via the empirical distribution function calculated from a sample of size n. The empirical distribution function at x is defined by:

F̂(x) = (number of observations xi ≤ x) / n .

The idea of the bootstrap is the following.
1. Generate R independent samples x*1, x*2, . . . , x*n of a random variable with the distribution F̂.
2. Calculate the estimated values t*1, t*2, . . . , t*R of the parameter δ from the generated bootstrap samples: first calculate the estimation F̂*i based on sample i (i = 1, . . . , R), and then set:

t*i = t(F̂*i) .

Following the bootstrap method, the bias b and the variance v of t are estimated by:

b_boot = t̄* − t = ((1/R) · Σ_{i=1}^{R} t*i) − t ,

v_boot = (1/(R − 1)) · Σ_{i=1}^{R} (t*i − t̄*)² .

We should remark that two types of errors intervene here:
• A statistical error, arising from the fact that we proceed by considering the bias and variance for F̂ and not for the real F, which is unknown.
• A simulation error, arising from the fact that the number of simulations is not high enough. In practice, it is usually enough to resample between R = 50 and 200 times.

There are many possible methods of obtaining confidence intervals using the bootstrap; we present only a few here:
1. Normal intervals. Suppose that T follows the Gaussian (normal) distribution. Hoping that T* also follows the normal distribution, we then find the following confidence interval for the bias-corrected value tb = t − b of t:

[tb − √v_boot · z_{1−α/2} , tb + √v_boot · z_{1−α/2}] ,

where z_γ is the γ quantile of the Gaussian distribution N(0, 1). For a tolerance limit of α, we use the value z_{1−α/2} from the normal table, which for α = 5% gives z_{1−0.05/2} = 1.96.

2. Bootstrap intervals. Suppose that T − δ does not depend on any unknown quantity. In this case we calculate the α-quantile aα of T − δ. To do this, we arrange the values t*1 − t, t*2 − t, . . . , t*R − t in increasing order:

t*(1) − t, t*(2) − t, . . . , t*(R) − t ,

and then we read off the quantiles of interest aα = t*((R+1)α) − t and a_{1−α} = t*((R+1)(1−α)) − t. Since

1 − 2α = Pr(aα ≤ T − δ ≤ a_{1−α}) = Pr(2t − t*((R+1)(1−α)) ≤ δ ≤ 2t − t*((R+1)α)) ,

the confidence interval for t at the significance level α is:

[2t − t*((R+1)(1−α)) , 2t − t*((R+1)α)] .

3. Studentized bootstrap intervals. It is possible to improve on the previous estimation by considering

Z = (T − δ) / √v ,

where v is the variance of t, which must be estimated (normally using known methods such as the delta method or the jackknife method). The confidence interval for t is found in an analogous way to that described previously:

[t − √v · z*((R+1)(1−α)) , t + √v · z*((R+1)α)] ,

where z*((R+1)(1−α)) and z*((R+1)α) are "empirically" the (1 − α and α) quantiles of (z*i = (t*i − t)/√(v*i))_{i=1,...,R}.

EXAMPLES
We consider a typical example of nonparametric bootstrap (taken from the work of Davison, A. and Hinkley, D.V. (1997)). The data concern the populations of ten American cities in 1920 (U) and 1930 (X):

u  138   93   61  179   48   37   29   23   30    2
x  143  104   69  260   75   63   50   48  111   50

We are interested in the value δ of the statistic T = E(X)/E(U), which will allow us to deduce the populations in 1930 from those in 1920. We estimate δ using t = x̄/ū = 1.52, but what is the uncertainty in this estimation?
The bootstrap allows us to simulate the values t*i (i = 1, . . . , R) of δ sampled from a bivariate distribution Y = (U, X) with an empirical distribution function F̂ that gives a weight 1/10 to each observation.
In practice, this is a matter of resampling with replacement the ten data values and calculating the corresponding

t*i = (Σ_{j=1}^{10} x*ij) / (Σ_{j=1}^{10} u*ij) , i = 1, . . . , R ,

where x*ij and u*ij are the values of the variables X and U. This allows us to approximate the bias and the variance of t using the formulae given before. The following table summarizes the simulations obtained with R = 5:

        j=1   j=2   j=3   j=4   j=5   j=6   j=7   j=8   j=9   j=10    Sum
u*1j     93    93    93    61    61    37    29    29    30     2     528
x*1j    104   104   104    69    69    63    50    50   111    50     774
u*2j    138    61    48    48    37    37    29    30    30     2     460
x*2j    143    69    75    75    63    63    50   111   111    50     810
u*3j    138    93   179    37    30    30    30    30     2     2     571
x*3j    143   104   260    63   111   111   111   111    50    50    1114
u*4j     93    61    61    48    37    29    29    23    23     2     406
x*4j    104    69    69    75    63    50    50    48    48    50     626
u*5j    138   138   138   179    48    48    48    29    23    30     819
x*5j    143   143   143   260    75    75    75    50    48   111    1123

So, the t*i are:

t*1 = 1.466 , t*2 = 1.761 , t*3 = 1.951 , t*4 = 1.542 , t*5 = 1.371 .

From this, it is easy to calculate the bias and the variance of t using the formulae given previously:

b = 1.62 − 1.52 = 0.10 and v = 0.0553 .

It is possible to use these results to calculate the normal confidence intervals, supposing that T − δ is normally distributed, N(0.1, 0.0553). We obtain, for a significance level of 5%, the following confidence interval for t: [1.16, 2.08].
The small number of cities considered and simulations mean that we cannot have much confidence in the values obtained.
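The resampling scheme of this example is easy to program. The sketch below (an added illustration, not from the original text) redoes the nonparametric bootstrap of the ratio with R = 5; because R is so small, the bias and variance obtained vary considerably from run to run, which is exactly the point made above:

```python
import random

u = [138, 93, 61, 179, 48, 37, 29, 23, 30, 2]     # populations in 1920
x = [143, 104, 69, 260, 75, 63, 50, 48, 111, 50]  # populations in 1930
pairs = list(zip(u, x))

t = sum(x) / sum(u)  # original estimate, about 1.52

random.seed(1)
R = 5
t_star = []
for _ in range(R):
    # resample the ten (u, x) pairs with replacement
    sample = [random.choice(pairs) for _ in pairs]
    t_star.append(sum(xi for _, xi in sample) / sum(ui for ui, _ in sample))

t_bar = sum(t_star) / R
bias = t_bar - t
variance = sum((ti - t_bar) ** 2 for ti in t_star) / (R - 1)
print(round(t, 2), round(bias, 3), round(variance, 4))
```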

FURTHER READING
• Monte Carlo method
• Resampling
• Simulation

REFERENCES
Davison, A.C., Hinkley, D.V.: Bootstrap Methods and Their Application. Cambridge University Press, Cambridge (1997)

Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Stat. 7, 1–26 (1979)

Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. Chapman & Hall, New York (1993)

Hall, P.: The Bootstrap and Edgeworth Expansion. Springer, Berlin Heidelberg New York (1992)

Shao, J., Tu, D.: The Jackknife and Bootstrap. Springer, Berlin Heidelberg New York (1995)

Boscovich, Roger J.

Boscovich, Roger Joseph was born at Ragusa (now Dubrovnik, Croatia) in 1711. He attended the Collegium Ragusinum, then he went to Rome to study in the Collegio Romano; both colleges were Jesuit. He died in 1787, in Milano.
In 1760, Boscovich developed a geometric method for finding a simple regression line L1 (in order to correct errors in stellar observations). Laplace, Pierre Simon de disapproved of the fact that Boscovich used geometrical terminology and translated this method into an algebraic version in 1793.

The principal work of Boscovich, Roger Joseph:

1757 De Litteraria Expeditione per Pontificiam ditionem, et Synopsis amplioris Operis, ac habentur plura eius ex exemplaria etiam sensorum impressa. Bononiensi Scientiarium et Artium Instituto Atque Academia Commentarii, Tomus IV, pp. 353–396.

REFERENCES
Whyte, Lancelot Law: Roger Joseph Boscovich, Studies of His Life and Work on the 250th Anniversary of his Birth. Allen and Unwin, London (1961)

Box, E.P. George

Box, George E.P. was born in 1919 in England. He served as a chemist in the British Army Engineers during World War II. After the war he received a degree in mathematics and statistics from University College London. In the 1950s, he worked as a visiting professor in the US at the Institute of Statistics at the University of North Carolina. In 1960 he moved to Wisconsin, where he served as the first chairman of the statistics department. He received the British Empire Medal in 1946, and the Shewhart Medal in 1968.
His main interest was in experimental statistics and the design of experiments. His book Statistics for Experimenters, coauthored with Hunter, J. Stuart and Hunter, William G., is one of the most highly recommended texts in this field. Box also wrote on time series analysis.

Recommended publication of Box, George:

2005 (with Hunter, William G. and Hunter, J. Stuart) Statistics for Experimenters: Design, Innovation, and Discovery, 2nd edn. Wiley, New York

FURTHER READING
• Design of experiments

Box Plot

The box plot is a way to represent the following five quantities for a set of data: the median; the first quartile and the third quartile; the maximum and minimum values.
The box plot is a diagram (a box) that illustrates:
• The measure of central tendency (in principle the median);
• The variability;
• The symmetry.
It is often used to compare several sets of observations.

HISTORY
The "box-and-whisker plot," or box plot, was introduced by Tukey in 1972, along with other methods of representing data semigraphically, one of the most famous of which is the stem and leaf diagram.

MATHEMATICAL ASPECTS
Several ways of representing a box plot exist. We will present it in the following way:

The central rectangle represents 50% of the observations. It is known as the interquartile range. The lower limit of this rectangle is fixed at the first quartile, and the upper limit at the third quartile. The position of the median is indicated by a line through the rectangle.
A line segment (a "whisker") connects each quartile to the corresponding extreme (minimum or maximum, unless outliers are present; see below) value on each side of the rectangle.
In this representation, outliers are treated in a special way. When the observations are very spread out, we define two values called the "internal limits" or "whiskers" by:

int. lim. 1 = 1st quartile − (1.5 · interquartile range)
int. lim. 2 = 3rd quartile + (1.5 · interquartile range)

For each internal limit, we then select the data value that is closest to the limit but still inside the interval between the internal limits. These two data values are known as adjacent points. Now, when we construct the box plot, the line segments connect the quartiles to the adjacent points. Observations outside of the interval between the internal limits (outliers) are represented by stars in the plot.
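The quantities needed to draw a box plot can be computed as in the following Python sketch (added for illustration; the function name and the interpolation rule for the quartiles are choices of this sketch, and other quartile conventions give slightly different values):

```python
def box_plot_summary(data):
    """Median, quartiles, extremes, internal limits and outliers of a data set."""
    values = sorted(data)
    n = len(values)

    def quantile(q):
        # simple linear interpolation between order statistics
        pos = q * (n - 1)
        low = int(pos)
        frac = pos - low
        high = min(low + 1, n - 1)
        return values[low] * (1 - frac) + values[high] * frac

    q1, median, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    iqr = q3 - q1
    limits = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outliers = [v for v in values if v < limits[0] or v > limits[1]]
    return min(values), q1, median, q3, max(values), limits, outliers

print(box_plot_summary([80.3, 86.2, 87.9, 94.5, 101.4, 108.9, 170.2]))
```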

EXAMPLE
The following example presents the revenue indexes per inhabitant for each Swiss canton (Swiss revenue = 100) in 1993:

Canton            Index    Canton               Index
Zurich            125.7    Schaffhouse           99.2
Bern               86.2    Appenzell Rh.-Ext.    84.2
Luzern             87.9    Appenzell Rh.-Int.    72.6
Uri                88.2    Saint-Gall            89.3
Schwyz             94.5    Grisons               92.4
Obwald             80.3    Argovie               98.0
Nidwald           108.9    Thurgovie             87.4
Glaris            101.4    Tessin                87.4
Zoug              170.2    Vaud                  97.4
Fribourg           90.9    Valais                80.5
Soleure            88.3    Neuchâtel             87.3
Bâle-City         124.2    Genève               116.0
Bâle-Campaign     105.1    Jura                  75.1

Source: Federal Office for Statistics (1993)

We now calculate the different quartiles. The box plot gives us information on the central tendency, on the dispersion of the distribution and on the degree of symmetry:
• Central tendency:
The median equals 90.10.
• Dispersion:
The interquartile interval indicates the interval that contains the 50% of the observations that are closest to the center of the distribution. In our example, 50% of the cantons of Switzerland have an index that falls in the interval [87.03, 102.33] (with a width of 15.3).
The limits of the box plot are the following:

int. lim. 1 = 1st quartile − (1.5 · interquartile interval) = 87.03 − 1.5 · 15.3 = 64.08 ,
int. lim. 2 = 3rd quartile + (1.5 · interquartile interval) = 102.33 + 1.5 · 15.3 = 125.28 .

The values bigger than 125.28 are therefore outliers. There are two of them: 125.7 (Zurich) and 170.2 (Zoug). The box plot is then as follows:

FURTHER READING
• Graphical representation
• Exploratory data analysis
• Measure of location
• Measure of central tendency

REFERENCES
Tukey, J.W.: Some graphical and semigraphical displays. In: Bancroft, T.A. (ed.) Statistical Papers in Honor of George W. Snedecor. Iowa State University Press, Ames, IA, pp. 293–316 (1972)

Categorical Data

Categorical data consists of counts of observations falling into specified classes. We can distinguish between various types of categorical data:
• Binary, characterizing the presence or absence of a property;
• Unordered multicategorical (also called "nominal");
• Ordered multicategorical (also called "ordinal");
• Whole numbers.
We represent the categorical data in the form of a contingency table.

DOMAINS AND LIMITATIONS
Variables that are essentially continuous can also be presented as categorical variables. One example is "age", which is a continuous variable, but ages can still be grouped into classes so it can still be presented as categorical data.

EXAMPLES
In a public opinion survey for approving or disapproving a new law, the votes cast can be either "yes" or "no". We can represent the results in the form of a contingency table:

        Yes    No
Votes   8546   5455

If we divide up the employees of a business into professions (and at least three professions are represented), the data we obtain is unordered multicategorical data (there is no natural ordering of the professions).
In contrast, if we are interested in the number of people that have achieved various levels of education, there will probably be a natural ordering of the categories: "primary", "secondary" and then "university". Such data would therefore be an example of ordered multicategorical data.
Finally, if we group employees into categories based on the size of each employee's family (that is, the number of family members), we obtain categorical data where the categories are whole numbers.

FURTHER READING
• Analysis of categorical data
• Binary data
• Category
• Data
• Dichotomous variable
• Qualitative categorical variable
• Random variable

REFERENCES
See analysis of categorical data.


Category

A category represents a set of people or objects that have a common characteristic.
If we want to study the people in a population, we can sort them into "natural" categories, by gender (men and women) for example, or into categories defined by other criteria, such as vocation (managers, secretaries, farmers . . . ).

FURTHER READING
• Binary data
• Categorical data
• Dichotomous variable
• Population
• Random variable
• Variable

Cauchy Distribution

A random variable X follows a Cauchy distribution if its density function is of the form:

f(x) = (1 / (πθ)) · [1 + ((x − α)/θ)²]^(−1) , θ > 0 .

The parameters α and θ are the location and dispersion parameters, respectively. The Cauchy distribution is symmetric about x = α, which represents the median. The first quartile and the third quartile are given by α ± θ.
The Cauchy distribution is a continuous probability distribution.

Cauchy distribution, θ = 1, α = 0

MATHEMATICAL ASPECTS
The expected value E[X] and the variance Var(X) do not exist.
If α = 0 and θ = 1, the Cauchy distribution is identical to the Student distribution with one degree of freedom.

DOMAINS AND LIMITATIONS
Its importance in physics is mainly due to the fact that it is the solution to the differential equation describing forced resonance.

FURTHER READING
• Continuous probability distribution
• Student distribution

REFERENCES
Cauchy, A.L.: Sur les résultats moyens d'observations de même nature, et sur les résultats les plus probables. C.R. Acad. Sci. 37, 198–206 (1853)

Causal Epidemiology

The aim of causal epidemiology is to identify how cause is related to effect with regard to human health.
In other words, it is the study of the causes of illness, and involves attempting to find statistical evidence for a causal relationship or an association between the illness and the factor proposed to cause the illness.

HISTORY
See epidemiology.

MATHEMATICAL ASPECTS
See cause and effect in epidemiology, odds and odds ratio, relative risk, attributable risk, avoidable risk, incidence rate, prevalence rate.

DOMAINS AND LIMITATIONS
Studies of the relationship between tobacco smoking and the development of lung cancer, and of the relationship between HIV and AIDS, are examples of causal epidemiology. Research into a causal relation is often very complex, requiring many studies and the incorporation and combination of various data sets, from biological and animal experiments to clinical trials.
While causes cannot always be identified precisely, a knowledge of the risk factors associated with an illness, and therefore of the groups of people at risk, allows us to intervene with preventative measures that could preserve health.

EXAMPLES
As an example, we can investigate the relationship between smoking and the development of lung cancer. Consider a study of 2000 subjects: 1000 smokers and 1000 nonsmokers. The age distributions and male/female proportions are identical in both groups. Let us analyze a summary of data obtained over many years. The results are presented in the following table:

                                          Smokers   Nonsmokers
Number of cases of lung cancer (/1000)       50          10
Proportion of people in this group
that contracted lung cancer                   5%          1%

If we compare the proportion of smokers that contract lung cancer to the proportion of nonsmokers that do, we get 5%/1% = 5, so we can conclude that the risk of developing lung cancer is five times higher in smokers than in nonsmokers. We generally also evaluate the significance of this result by computing a p-value. Suppose that the chi-square test gave p < 0.001. It is normally accepted that if the p-value is smaller than 0.05, then the results obtained are statistically significant.
Suppose that we perform the same study, but the dimension of each group (smokers and nonsmokers) is 100 instead of 1000, and we observe the same proportions of people that contract lung cancer:

                                          Smokers   Nonsmokers
Number of cases of lung cancer (/100)         5           1
Proportion of individuals in this group
that contracted lung cancer                   5%          1%

If we perform the same statistical test, the p-value is found to be 0.212. Since this is greater than 0.05, we cannot draw any solid conclusion about the existence of a significant statistical relation between smoking and lung cancer. This illustrates that, in order to obtain a statistically significant level of difference between the results for different populations in epidemiological studies, it is usually necessary to study large samples.
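The two tests described above can be reproduced with SciPy, as sketched below (an added illustration; chi2_contingency applies a continuity correction by default, which is consistent with the p-values quoted in the text):

```python
from scipy.stats import chi2_contingency

# Groups of 1000: 50 of 1000 smokers vs. 10 of 1000 nonsmokers with lung cancer.
large_study = [[50, 950], [10, 990]]
# Groups of 100: 5 of 100 smokers vs. 1 of 100 nonsmokers with lung cancer.
small_study = [[5, 95], [1, 99]]

for table in (large_study, small_study):
    stat, p_value, dof, expected = chi2_contingency(table)
    print(round(stat, 2), round(p_value, 3))
# The first p-value is far below 0.05; the second is about 0.21, i.e. not significant.
```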

FURTHER READING
See epidemiology.

REFERENCES
See epidemiology.

Cause and Effect in Epidemiology

In epidemiology, the "cause" is an agent (microbial germs, polluted water, smoking, etc.) that modifies health, and the "effect" describes the way that the health is changed by the agent. The agent is often potentially pathogenic (in which case it is known as a "risk factor").
The effect is therefore effectively a risk comparison. We can define two different types of effect in this context:
• The absolute effect of a cause expresses the increase in the risk, or the additional number of cases of illness, that results or could result from exposure to this cause. It is measured by the attributable risk and its derivatives.
• The relative effect of a cause expresses the strength of the association between the causal agent and the illness.
A cause that produces an effect by itself is called sufficient.

HISTORY
The terms "cause" and "effect" were defined at the birth of epidemiology, which occurred in the seventeenth century.

MATHEMATICAL ASPECTS
Formally, we have:

Absolute effect = risk for the exposed − risk for the unexposed .

The absolute effect expresses the excess risk or cases of illness that result (or could result) from exposure to the cause.

Relative effect = risk for the exposed / risk for the unexposed .

The relative effect expresses the strength of the association between the illness and the cause. It is measured using the relative risk and the odds ratio.

DOMAINS AND LIMITATIONS
Strictly speaking, the strength of an association between a particular factor and an illness is not enough to establish a causal relationship between them. We also need to consider:
• The "temporality criterion" (we must be sure that the supposed cause precedes the effect), and;
• Fundamental and experimental research elements that allow us to be sure that the supposed causal factor is not actually a "confusion factor" (which is a factor that is not causal, but is statistically related to the unidentified real causal factor).
Two types of causality correspond to these relative and absolute effects. Relative causality is independent of the clinical or public health impact of the effect; it generally does not allow us to prejudge the clinical or public health impact of the associated effect. It can be strong when the risk for the exposed is very high or when the risk for the unexposed is very low.
Absolute causality expresses the clinical or public health impact of the associated effect, and therefore enables us to answer the question: if we had suppressed the cause, what level of impact on the population (in terms of cases of illness) would have been avoided? If the patient had stopped smoking, what would the reduction in his risk of developing lung cancer or having a myocardial infarction be?
A risk factor associated with a high relative effect, but which concerns only a small number of individuals, will cause fewer illnesses and deaths than a risk factor that is associated with a smaller relative effect but where many more individuals are exposed. It is therefore clear that the importance of a causal relation varies depending on whether we are considering relative or absolute causality.
We should also make an important point here about causal interactions. There can be many causes for the same illness. While all of these causes contribute to the same result, they can also interact. The main consequence of this causal interaction is that we cannot prejudge the effect of simultaneous exposure to causes A and B (denoted A+B) based on what we know about the effect of exposure to only A or only B. In contrast to the case for independent causes, we must estimate the joint effect, not restrict ourselves to the isolated analyses of the interacting causes.

EXAMPLES
The relative effect and the absolute effect are subject to different interpretations, as the following example shows.
Suppose we have two populations P1 and P2, each comprising 100000 individuals. In population P1, the risk of contracting a given illness is 0.2% for the exposed and 0.1% for the unexposed. In population P2, the risk for the exposed is 20% and that for the unexposed is 10%, as shown in the following table:

Population   Risk for the exposed (%)   Risk for the unexposed (%)
                        A                           B
P1                     0.2                         0.1
P2                     20                          10

Population   Relative effect   Absolute effect (%)   Avoidable cases
                  A/B               C = A − B           C × 100000
P1                2.0                  0.1                  100
P2                2.0                  10                 10000

The relative effect is the same for populations P1 and P2 (the ratio of the risk for the exposed to the risk for the unexposed is 2), but the impact of the same prevention measures would be very different in the two populations, because the absolute effect is one hundred times greater in P2: the number of potentially avoidable cases is therefore 100 in population P1 and 10000 in population P2.
Now consider the incidence rate of lung cancer in a population of individuals who smoke 35 or more cigarettes per day: 3.15/1000/year. While this rate may seem small, it masks the fact that there is a strong relative effect (the risk is 45 times bigger for smokers than for nonsmokers), due to the fact that lung cancer is very rare in nonsmokers (the incidence rate for nonsmokers is 0.07/1000/year).
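The quantities in the two tables follow directly from the risks, as the following short sketch (added for illustration) shows:

```python
# (risk for the exposed, risk for the unexposed), expressed as proportions
populations = {"P1": (0.002, 0.001), "P2": (0.20, 0.10)}

for name, (exposed, unexposed) in populations.items():
    relative_effect = exposed / unexposed          # A / B
    absolute_effect = exposed - unexposed          # C = A - B
    avoidable_cases = absolute_effect * 100_000    # in a population of 100000
    print(name, round(relative_effect, 1),
          f"{absolute_effect:.1%}", int(round(avoidable_cases)))
# P1 2.0 0.1% 100
# P2 2.0 10.0% 10000
```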

FURTHER READING
• Attributable risk
• Avoidable risk
• Incidence rate
• Odds and odds ratio
• Prevalence rate
• Relative risk

REFERENCES
Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)

MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)

Morabia, A.: Epidémiologie Causale. Editions Médecine et Hygiène, Geneva (1996)

Rothman, K.J.: Epidemiology: An Introduction. Oxford University Press (2002)

Census

A census is an operation that consists of observing all of the individuals in a population. The word census can refer to a population census, in other words a population count, but it can also refer to inquiries (called "exhaustive" inquiries) where we retrieve information about a population by observing all of the individuals in the population. Clearly, such inquiries will be very expensive for very large populations. That is why exhaustive inquiries are rarely performed; sampling, which consists of observing only a portion of the population (called the sample), is usually preferred instead.

HISTORY
Censuses originated with the great civilizations of antiquity, when the large areas of empires and the complexity associated with governing them required knowledge of the populations involved.
Among the most ancient civilizations, it is known that censuses were performed in Sumeria (between 5000 and 2000 BC), where the people involved reported lists of men and goods on clay tablets in cuneiform characters.

Censuses were also completed in Mesopotamia (about 3000 BC), as well as in ancient Egypt from the first dynasty onwards; these censuses were performed due to military and fiscal objectives. Under Amasis II, everybody had to (at the risk of death) declare their profession and source(s) of revenue.
The situation in Israel was more complex: censuses were sometimes compulsory and sometimes forbidden due to the Old Testament. This influence on Christian civilization lasted quite some time; in the Middle Ages, St. Augustin and St. Ambroise were still condemning censuses.
In China, censuses have been performed since at least 200 BC, in different forms and for different purposes. Hecht, J. (1978) reported the main censuses:
1. Han Dynasty (200 years BC to 200 years AD): population censuses were related to the system of conscription.

2. Three Kingdoms Period to Five Dynasties (221–959 AD): related to the system of territorial distribution.
3. Song and Yuan Dynasties (960–1368 AD): censuses were performed for fiscal purposes.
4. Ming Dynasty (1368–1644 AD): "yellow registers" were established for ten-year censuses. They listed the name, profession, gender and age of every person.
5. Qing Dynasty (since 1644 AD): censuses were performed in order to survey population migration.
In Japan during the Middle Ages, different types of census have been used. The first census was probably performed under the rule of Emperor Sujin (86 BC).
Finally, in India, a political and economic science treatise entitled "Arthasastra" (profit treaty) gives some information about the use of an extremely detailed record. This treatise was written by Kautilya, Prime Minister in the reign of Chandragupta Maurya (313–289 BC).
Another very important civilization, the Incans, also used censuses. They used a statistics system called "quipos". Each quipo was both an instrument and a registry of information. Formed from a series of cords, the colors, combinations and knots on the cords had precise meanings. The quipos were passed to specially initiated guards that gathered together all of these statistics.
In Europe, the ancient Greeks and Romans also practiced censuses. Aristotle reported that the Greeks donated a measure of wheat per birth and a measure of barley per death to the goddess Athéna. In Rome, the first census was performed at the behest of King Servius Tullius (578–534 BC) in order to monitor revenues, and consequently raise taxes.
Later, depending on the country, censuses were practiced with different frequencies and on different scales.

ioners that were maintained by pastors; in1668,adecreemadeitobligatorytobecount-ed in theseregisters,and insteadofbeingreli-gious, the registers became administrative.1736 saw the appearance of another decree,stating that the governor of each provincehad to report any changes in the populationof the province to parliament. Swedish pop-ulation statistics were officially recognizedon the 3rd of February 1748 due to creationof the“Tabellverket” (administrative tables).The first summary for all Swedish provinces,realized in 1749, can be considered to bethe first proper census in Europe, and the11th of November 1756 marked the creationof the “Tabellkommissionen,” the first offi-cial Division of Statistics. Since 1762, thesetables of figures have been maintained by theAcademy of Sciences.Initially, the Swedish censuses were orga-nized annually (1749–1751), and then everythreeyears (1754–1772),but since1775 theyhave been conducted every five years.At the end of the eighteenth century an offi-cial institute for censuses was created inFrance (in 1791). In 1787 the principle ofcensus has been registered in the Constitu-tional Charter of the USA (C. C. USA).See also official statistics.

FURTHER READING
• Data collection
• Demography
• Official statistics
• Population
• Sample
• Survey

REFERENCES
Hecht, J.: L'idée du dénombrement jusqu'à la Révolution. In: Pour une histoire de la statistique, tome 1, pp. 21–82. Economica/INSEE (1978)

Central Limit Theorem

The central limit theorem is a fundamental theorem of statistics. In its simplest form, it prescribes that the sum of a sufficiently large number of independent, identically distributed random variables approximately follows a normal distribution.

HISTORY
The central limit theorem was first established within the framework of the binomial distribution by Moivre, Abraham de (1733). Laplace, Pierre Simon de (1810) formulated the proof of the theorem. Poisson, Siméon Denis (1824) also worked on this theorem, and Chebyshev, Pafnutii Lvovich (1890–1891) gave a rigorous demonstration of it in the middle of the nineteenth century.
At the beginning of the twentieth century, the Russian mathematician Liapounov, Aleksandr Mikhailovich (1901) created the generally recognized form of the central limit theorem by introducing its characteristic functions. Markov, Andrei Andreevich (1908) also worked on it and was the first to generalize the theorem to the case of dependent variables.
According to Le Cam, L. (1986), the qualifier "central" was given to it by George Polyà (1920) due to the essential role that it plays in probability theory.

MATHEMATICAL ASPECTS
Let X1, X2, . . . , Xn be n independent random variables that are identically distributed (with any distribution) with a mean μ and a finite variance σ².
We define the sum Sn = X1 + X2 + . . . + Xn and we establish the ratio:

(Sn − n · μ) / (σ · √n) ,

where n · μ and σ · √n represent the mean and the standard deviation of Sn, respectively.
The central limit theorem establishes that the distribution of this ratio tends to the standard normal distribution when n tends to infinity. This means that:

P((Sn − n · μ) / (σ · √n) ≤ x) → Φ(x) as n → +∞ ,

where Φ(x) is the distribution function of the standard normal distribution, expressed by:

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) · exp(−t²/2) dt , −∞ < x < ∞ .

DOMAINS AND LIMITATIONS
The central limit theorem provides a simple method of approximately calculating probabilities related to sums of random variables.
Besides its interest in relation to sampling theory, where sums and means play an important role, the central limit theorem is used to approximate, by the normal distribution, distributions that are derived from summing identical distributions. We can, for example, with the help of the central limit theorem, use the normal distribution to approximate the binomial distribution, the Poisson distribution, the gamma distribution, the chi-square distribution, the Student distribution, the hypergeometric distribution, the Fisher distribution and the lognormal distribution.

EXAMPLES
In a large batch of electrical items, the probability of choosing a defective item equals p = 1/8. What is the probability that more than 4 defective items are chosen when 25 items are selected?
Let the dichotomous variable X be the result of a trial:

X = 1 if the selected item is defective; X = 0 if it is not.

The random variable X follows a Bernoulli distribution with parameter p. Consequently, the sum Sn = X1 + X2 + . . . + Xn follows a binomial distribution with mean np and variance np(1 − p), which, following the central limit theorem, can be approximated by a normal distribution with mean μ = np and variance σ² = np(1 − p). We evaluate these values:

μ = n · p = 25 · (1/8) = 3.125 ,
σ² = n · p(1 − p) = 25 · (1/8) · (1 − 1/8) = 2.734 .

We then calculate P(Sn > 4) in two different ways:
1. With the binomial distribution:
From the binomial table, the probability P(Sn ≤ 4) = 0.8047. The probability P(Sn > 4) is then:

P(Sn > 4) = 1 − P(Sn ≤ 4) = 0.1953 .

2. With the normal approximation (obtained from the central limit theorem):
In order to account for the discrete character of the random variable Sn, we must make a continuity correction; that is, we calculate the probability that Sn is greater than 4 + 1/2 = 4.5. We have:

z = (4.5 − n · p) / √(n · p(1 − p)) = (4.5 − 3.125) / 1.654 = 0.832 .

From the normal table, we obtain the probability:

P(Z > z) = P(Z > 0.832) = 1 − P(Z ≤ 0.832) = 1 − 0.7967 = 0.2033 .
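The comparison can be reproduced with the Python sketch below (added for illustration); it evaluates the exact binomial tail and the normal approximation with continuity correction, using only the standard library:

```python
from math import comb, erf, sqrt

n, p = 25, 1 / 8
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Exact tail probability P(Sn > 4) from the binomial distribution.
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(5, n + 1))

def phi(z):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Normal approximation with continuity correction at 4.5.
approx = 1 - phi((4.5 - mu) / sigma)
print(round(exact, 4), round(approx, 4))  # about 0.196 and 0.203
```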

FURTHER READING� Binomial distribution� Chi-square distribution� Convergence� Convergence theorem� Fisher distribution� Gamma distribution� Hypergeometric distribution� Law of large numbers� Lognormal distribution� Normal distribution� Poisson distribution� Probability� Probability distribution� Student distribution

REFERENCELaplace, P.S. de: Mémoire sur les approxi-

mations des formules qui sont fonctionsde très grands nombres et sur leur appli-cation aux probabilités. Mémoires del’Académie Royale des Sciences de Paris,10. Reproduced in: Œuvres de Laplace12, 301–347 (1810)

Le Cam, L.: The Central Limit Theoremaround 1935. Stat. Sci. 1, 78–96 (1986)


Liapounov, A.M.: Sur une proposition de la théorie des probabilités. Bulletin de l'Académie Impériale des Sciences de St.-Pétersbourg 8, 1–24 (1900)

Markov, A.A.: Extension des théorèmes limites du calcul des probabilités aux sommes des quantités liées en chaîne. Mem. Acad. Sci. St. Petersburg 8, 365–397 (1908)

Moivre, A. de: Approximatio ad summam terminorum binomii (a + b)^n, in seriem expansi. Supplementum II to Miscellanae Analytica, pp. 1–7 (1733). Photographically reprinted in a rare pamphlet on Moivre and some of his discoveries. Published by Archibald, R.C. Isis 8, 671–683 (1926)

Poisson, S.D.: Sur la probabilité des résultats moyens des observations. Connaissance des temps pour l'an 1827, pp. 273–302 (1824)

Polyà, G.: Ueber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentproblem. Mathematische Zeitschrift 8, 171–181 (1920)

Tchebychev, P.L.: Sur deux théorèmes relatifs aux probabilités. Acta Math. 14, 305–315 (1890–1891)

Chebyshev, Pafnutii Lvovich

Chebyshev, Pafnutii Lvovich (1821–1894) began studying at Moscow University in 1837, where he was influenced by Zernov, Nikolai Efimovich (the first Russian to get a doctorate in mathematical sciences) and Brashman, Nikolai Dmitrievich. After gaining his degree he could not find any teaching work in Moscow, so he went to St. Petersbourg, where he organized conferences on algebra and probability theory. In 1859, he took over the probability course given by Buniakovsky, Viktor Yakovlevich at St. Petersbourg University.
His name lives on through the Chebyshev inequality (also known as the Bienaymé–Chebyshev inequality), which he proved. This was published in French just after Bienaymé, Irénée-Jules had an article published on the same topic in the Journal de Mathématiques Pures et Appliquées (also called the Journal of Liouville).
He initiated rigorous work into establishing a general version of the central limit theorem and is considered to be the founder of the mathematical school of St. Petersbourg.

Some principal works and articles of Chebyshev, Pafnutii Lvovich:

1845 An Essay on Elementary Analysis of the Theory of Probabilities (thesis). Crelle's Journal.
1867 Preuve de l'inégalité de Tchebychev. J. Math. Pure. Appl., 12, 177–184.

FURTHER READING
• Central limit theorem

REFERENCES
Heyde, C.E., Seneta, E.: I.J. Bienaymé. Statistical Theory Anticipated. Springer, Berlin Heidelberg New York (1977)

Chebyshev’s InequalitySee law of large numbers.

Chi-Square Distance

Given a frequency table with n rows and p columns, it is possible to calculate row profiles and column profiles. Let us then plot


the n or p points from each profile. We can define the distances between these points. The Euclidean distance between the components of the profiles, on which a weighting is defined (each term has a weight that is the inverse of its frequency), is called the chi-square distance. The name of this distance derives from the fact that the mathematical expression defining it is identical to the one encountered in the elaboration of the chi-square goodness of fit test.

MATHEMATICAL ASPECTS
Let fij be the frequency of the ith row and jth column in a frequency table with n rows and p columns. The chi-square distance between two rows i and i′ is given by the formula:

d(i, i′) = √( Σ_{j=1}^{p} ( fij/fi. − fi′j/fi′. )² · (1/f.j) ) ,

where
fi. is the sum of the components of the ith row;
f.j is the sum of the components of the jth column;
[fij/fi.] is the ith row profile, for j = 1, 2, ..., p.

Likewise, the distance between two columns j and j′ is given by:

d(j, j′) = √( Σ_{i=1}^{n} ( fij/f.j − fij′/f.j′ )² · (1/fi.) ) ,

where [fij/f.j] is the jth column profile, for i = 1, ..., n.

DOMAINS AND LIMITATIONS
The chi-square distance incorporates a weight that is inversely proportional to the total of each row (or column), which increases the importance of small deviations in rows (or columns) with small totals relative to those with larger totals.
The chi-square distance has the property of distributional equivalence, meaning that the distances between rows and columns are invariant when two columns (or two rows) with identical profiles are aggregated.

EXAMPLES
Consider a contingency table charting how satisfied employees working for three different businesses are. Let us establish a distance table using the chi-square distance.
Values for the studied variable X can fall into one of three categories:
• X1: high satisfaction;
• X2: medium satisfaction;
• X3: low satisfaction.
The observations collected from samples of individuals from the three businesses are given below:

          Business 1   Business 2   Business 3   Total
X1            20           55           30        105
X2            18           40           15         73
X3            12            5            5         22
Total         50          100           50        200

The relative frequency table is obtained by dividing all of the elements of the table by 200, the total number of observations:

          Business 1   Business 2   Business 3   Total
X1           0.1          0.275        0.15       0.525
X2           0.09         0.2          0.075      0.365
X3           0.06         0.025        0.025      0.11
Total        0.25         0.5          0.25       1


We can calculate the difference in employee satisfaction between the three businesses. The column profile matrix is given below:

          Business 1   Business 2   Business 3   Total
X1           0.4          0.55         0.6        1.55
X2           0.36         0.4          0.3        1.06
X3           0.24         0.05         0.1        0.39
Total        1            1            1          3

This allows us to calculate the distances between the different columns:

d²(1, 2) = (1/0.525) · (0.4 − 0.55)² + (1/0.365) · (0.36 − 0.4)² + (1/0.11) · (0.24 − 0.05)² = 0.375423
d(1, 2)  = 0.613

We can calculate d(1, 3) and d(2, 3) in a similar way. The distances obtained are summarized in the following distance table:

              Business 1   Business 2   Business 3
Business 1        0           0.613        0.514
Business 2       0.613         0           0.234
Business 3       0.514        0.234         0

We can also calculate the distances between the rows, in other words the differences between the satisfaction categories; to do this we need the row profile table:

          Business 1   Business 2   Business 3   Total
X1           0.19         0.524        0.286       1
X2           0.246        0.548        0.206       1
X3           0.546        0.227        0.227       1
Total        0.982        1.299        0.719       3

This allows us to calculate the distances between the different rows:

d²(1, 2) = (1/0.25) · (0.19 − 0.246)² + (1/0.5) · (0.524 − 0.548)² + (1/0.25) · (0.286 − 0.206)² = 0.039296
d(1, 2)  = 0.198

We can calculate d(1, 3) and d(2, 3) in a similar way. The differences between the degrees of employee satisfaction are finally summarized in the following distance table:

        X1      X2      X3
X1      0       0.198   0.835
X2      0.198   0       0.754
X3      0.835   0.754   0
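The column distances above can be reproduced with a few lines of code. Below is a minimal sketch (assuming Python with NumPy; the function name chi2_distance_columns is an illustrative choice, not part of the original text) that computes the chi-square distances between the columns of the frequency table:

    import numpy as np

    # Observed frequencies: rows X1, X2, X3; columns Business 1, 2, 3
    f = np.array([[20, 55, 30],
                  [18, 40, 15],
                  [12,  5,  5]], dtype=float)

    def chi2_distance_columns(f):
        """Chi-square distances between the columns of a frequency table."""
        row_tot = f.sum(axis=1)       # f_i.
        col_tot = f.sum(axis=0)       # f_.j
        grand = f.sum()               # n..
        profiles = f / col_tot        # column profiles f_ij / f_.j
        weights = grand / row_tot     # weights 1/f_i. of the relative table, e.g. 1/0.525
        p = f.shape[1]
        d = np.zeros((p, p))
        for j in range(p):
            for k in range(p):
                d[j, k] = np.sqrt(np.sum(weights * (profiles[:, j] - profiles[:, k]) ** 2))
        return d

    print(np.round(chi2_distance_columns(f), 3))
    # Should match the distance table in the text: d(1,2) ~ 0.613, d(1,3) ~ 0.514, d(2,3) ~ 0.234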

FURTHER READING
• Contingency table
• Distance
• Distance table
• Frequency

Chi-square Distribution

A random variable X follows a chi-square distribution with n degrees of freedom if its density function is:

f(x) = x^(n/2 − 1) · exp(−x/2) / ( 2^(n/2) · Γ(n/2) ) ,   x ≥ 0 ,

where Γ is the gamma function (see gamma distribution).


[Figure: density of the χ² distribution with ν = 12 degrees of freedom]

The chi-square distribution is a continuous probability distribution.

HISTORY
According to Sheynin (1977), the chi-square distribution was discovered by Ernst Karl Abbe in 1863. Maxwell obtained it for three degrees of freedom a few years before (1860), and Boltzmann discovered the general case in 1881.
However, according to Lancaster (1966), Bienaymé obtained the chi-square distribution in 1838 as the limit of the discrete random variable

Σ_{i=1}^{k} (Ni − n·pi)² / (n·pi) ,

if (N1, N2, ..., Nk) follow a joint multinomial distribution with parameters n, p1, p2, ..., pk.
Ellis demonstrated in 1844 that the sum of k random variables distributed according to a chi-square distribution with two degrees of freedom follows a chi-square distribution with 2k degrees of freedom. The general result was demonstrated in 1852 by Bienaymé.
The works of Pearson, Karl are very important in this field. In 1900 he used the chi-square distribution to approximate the distribution of the chi-square statistic used in different tests based on contingency tables.

MATHEMATICAL ASPECTS
The chi-square distribution appears in the theory of random variables distributed according to a normal distribution. In this context, it is the distribution of the sum of squares of standardized normal random variables (with a mean equal to 0 and a variance equal to 1).
Consider Z1, Z2, ..., Zn, n independent, standard normal random variables. Their sum of squares:

X = Z1² + Z2² + ... + Zn² = Σ_{i=1}^{n} Zi²

is a random variable distributed according to a chi-square distribution with n degrees of freedom.
The expected value of the chi-square distribution is given by:

E[X] = n .

The variance is equal to:

Var(X) = 2n .

The chi-square distribution is related to other continuous probability distributions:
• The chi-square distribution is a particular case of the gamma distribution.
• If two random variables X1 and X2 follow a chi-square distribution with, respectively, n1 and n2 degrees of freedom, then the random variable

  Y = (X1/n1) / (X2/n2)

  follows a Fisher distribution with n1 and n2 degrees of freedom.
• When the number of degrees of freedom n tends towards infinity, the chi-square distribution tends (relatively slowly) towards a normal distribution.
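These properties are easy to check by simulation. The sketch below (assuming Python with NumPy; the sample size and seed are illustrative choices) builds chi-square variates as sums of squared standard normal variables and compares their empirical mean and variance with n and 2n:

    import numpy as np

    rng = np.random.default_rng(0)
    n_df = 12           # degrees of freedom
    n_sim = 100_000     # number of simulated chi-square variates

    # Sum of squares of n_df independent standard normal variables
    z = rng.standard_normal((n_sim, n_df))
    x = (z ** 2).sum(axis=1)

    print(f"empirical mean     = {x.mean():.2f}  (theory: {n_df})")
    print(f"empirical variance = {x.var():.2f}  (theory: {2 * n_df})")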


DOMAINS AND LIMITATIONS
The chi-square distribution is used in many approaches to hypothesis testing, the most important being the goodness of fit test, which involves comparing the observed frequencies and the hypothetical frequencies of specific classes.
It is also used for comparisons between the observed variance and the hypothetical variance of normally distributed samples, and to test the independence of two variables.

FURTHER READING
• Chi-square goodness of fit test
• Chi-square table
• Chi-square test
• Continuous probability distribution
• Fisher distribution
• Gamma distribution
• Normal distribution

REFERENCES
Lancaster, H.O.: Forerunners of the Pearson chi-square. Aust. J. Stat. 8, 117–126 (1966)

Pearson, K.: On the criterion, that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, pp. 339–357. First published in 1900 in Philos. Mag. (5th Ser) 50, 157–175 (1948)

Sheynin, O.B.: On the history of some statistical laws of distribution. In: Kendall, M., Plackett, R.L. (eds.) Studies in the History of Statistics and Probability, vol. II. Griffin, London (1977)

Chi-square Goodness of Fit Test

The chi-square goodness of fit test is, along with the Kolmogorov–Smirnov test, one of the most commonly used goodness of fit tests.
This test aims to determine whether it is possible to approximate an observed distribution by a particular probability distribution (normal distribution, Poisson distribution, etc.).

HISTORY
The chi-square goodness of fit test is the oldest and most well-known of the goodness of fit tests. It was first presented in 1900 by Pearson, Karl.

MATHEMATICAL ASPECTS
Let X1, ..., Xn be a sample of n observations. The steps used to perform the chi-square goodness of fit test are then as follows:
1. State the hypothesis. The null hypothesis will take the following form:

   H0 : F = F0 ,

   where F0 is the presumed distribution function of the distribution.
2. Distribute the observations into k disjoint classes:

   [ai−1, ai] .

   We denote the number of observations contained in the ith class, i = 1, ..., k, by ni.
3. Calculate the theoretical probabilities for every class on the basis of the presumed distribution function F0:

   pi = F0(ai) − F0(ai−1) ,   i = 1, ..., k .


4. Obtain the expected frequencies for every class:

   ei = n · pi ,   i = 1, ..., k ,

   where n is the size of the sample.
5. Calculate the χ² (chi-square) statistic:

   χ² = Σ_{i=1}^{k} (ni − ei)² / ei .

   If H0 is true, the χ² statistic follows a chi-square distribution with v degrees of freedom, where:

   v = k − 1 − (number of estimated parameters) .

   For example, when testing the goodness of fit to a normal distribution, the number of degrees of freedom equals:
   • k − 1 if the mean μ and the standard deviation σ of the population are known;
   • k − 2 if one out of μ or σ is unknown and will be estimated in order to proceed with the test;
   • k − 3 if both parameters μ and σ are unknown and both are estimated from the corresponding values of the sample.
6. Reject H0 if the deviation between the observed and expected frequencies is big; that is:

   if χ² > χ²_{v,α} ,

   where χ²_{v,α} is the value given in the chi-square table for a particular significance level α.

DOMAINS AND LIMITATIONS
To apply the chi-square goodness of fit test, it is important that n is big enough and that the expected frequencies ei are not too small. We normally require that the expected frequencies be greater than 5, except for extreme classes, where they can be smaller than 5 but greater than 1. If this constraint is not satisfied, we must regroup the classes in order to satisfy this rule.

EXAMPLES
Goodness of Fit to the Binomial Distribution
We throw a coin four times and count the number of times that "heads" appears. This experiment is performed 160 times. The observed frequencies are as follows:

Number of "heads" xi    Number of experiments (ni)
0                        17
1                        52
2                        54
3                        31
4                         6
Total                   160

1. If the experiment was performed correctly and the coin is not forged, the distribution of the number of "heads" obtained should follow the binomial distribution. We then state a null hypothesis that the observed distribution can be approximated by the binomial distribution, and we will proceed with a goodness of fit test in order to determine whether this hypothesis can be accepted or not.
2. In this example, the different numbers of "heads" that can be obtained per experiment (0, 1, 2, 3 and 4) are each considered to be a class.


3. The random variable X (number of "heads" obtained after four throws of a coin) follows a binomial distribution if

   P(X = x) = C(n, x) · p^x · q^(n−x) ,

   where:
   n is the number of independent trials = 4;
   p is the probability of a success ("heads") = 0.5;
   q is the probability of a failure ("tails") = 0.5;
   C(n, x) is the number of combinations of x objects chosen from n.

   We then have the following theoretical probabilities for four throws:

   P(X = 0) = 1/16
   P(X = 1) = 4/16
   P(X = 2) = 6/16
   P(X = 3) = 4/16
   P(X = 4) = 1/16

4. After the experiment has been performed 160 times, the expected number of heads for each possible value of X is given by:

   ei = 160 · P(X = xi) .

   We obtain the following table:

   Number of "heads" xi    Observed frequency (ni)    Expected frequency (ei)
   0                        17                         10
   1                        52                         40
   2                        54                         60
   3                        31                         40
   4                         6                         10
   Total                   160                        160

5. The χ² (chi-square) statistic is then:

   χ² = Σ_{i=1}^{k} (ni − ei)² / ei ,

   where k is the number of possible values of X.

   χ² = (17 − 10)²/10 + ... + (6 − 10)²/10 = 12.725 .

6. Choosing a significance level α of 5%, we find that the value of χ²_{v,α} for k − 1 = 4 degrees of freedom is:

   χ²_{4,0.05} = 9.488 .

   Since the calculated value of χ² is greater than the value obtained from the table, we reject the null hypothesis and conclude that the binomial distribution does not give a good approximation to our observed distribution. We can then conclude that the coin was probably forged, or that the throws were not carried out correctly.
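This test can be reproduced directly. The following is a minimal sketch (assuming Python with NumPy and SciPy; the data are those of the example) that computes the χ² statistic for the coin-throwing data and compares it with the critical value:

    import numpy as np
    from scipy.stats import binom, chi2

    observed = np.array([17, 52, 54, 31, 6])         # frequencies of 0..4 "heads" in 160 experiments
    n_exp = observed.sum()                            # 160 experiments

    # Expected frequencies under the binomial(4, 0.5) hypothesis
    probs = binom.pmf(np.arange(5), 4, 0.5)           # 1/16, 4/16, 6/16, 4/16, 1/16
    expected = n_exp * probs

    chi2_stat = ((observed - expected) ** 2 / expected).sum()
    critical = chi2.ppf(0.95, df=len(observed) - 1)   # table value for alpha = 5% and 4 df

    print(f"chi2 = {chi2_stat:.3f}, critical value = {critical:.3f}")
    # chi2 ~ 12.725 > 9.488, so H0 is rejected, as in the text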

Goodness of Fit to the Normal Distribution
The diameters of cables produced by a factory were studied. A frequency table of the observed distribution of diameters is given below:

Cable diameter (in mm)    Observed frequency ni
19.70–19.80                 5
19.80–19.90                12
19.90–20.00                35
20.00–20.10                42
20.10–20.20                28
20.20–20.30                14
20.30–20.40                 4
Total                     140


1. We perform a goodness of fit test for a normal distribution. The null hypothesis is therefore that the observed distribution can be approximated by a normal distribution.
2. The previous table shows the observed diameters divided up into classes.
3. If the random variable X (cable diameter) follows the normal distribution, the random variable

   Z = (X − μ) / σ

   follows a standard normal distribution. The mean μ and the standard deviation σ of the population are unknown and are estimated using the mean x̄ and the standard deviation S of the sample:

   x̄ = ( Σ_{i=1}^{7} δi · ni ) / n = 2806.40 / 140 = 20.05 ,

   S = √( Σ_{i=1}^{7} ni · (δi − x̄)² / (n − 1) ) = √( 2.477 / 139 ) = 0.134 ,

   where the δi are the centers of the classes (in this example, the mean diameter of a class; for i = 1: δ1 = 19.75) and n is the total number of observations.
   We can then calculate the theoretical probabilities associated with each class. The detailed calculations for the first two classes are:

   p1 = P(X ≤ 19.8)
      = P(Z ≤ (19.8 − x̄)/S)
      = P(Z ≤ −1.835)
      = 1 − P(Z ≤ 1.835)

   p2 = P(19.8 ≤ X ≤ 19.9)
      = P( (19.8 − x̄)/S ≤ Z ≤ (19.9 − x̄)/S )
      = P(−1.835 ≤ Z ≤ −1.089)
      = P(Z ≤ 1.835) − P(Z ≤ 1.089)

   These probabilities can be found by consulting the normal table. We get:

   p1 = P(X ≤ 19.8) = 0.03325
   p2 = P(19.8 ≤ X ≤ 19.9) = 0.10476
   p3 = P(19.9 ≤ X ≤ 20.0) = 0.22767
   p4 = P(20.0 ≤ X ≤ 20.1) = 0.29083
   p5 = P(20.1 ≤ X ≤ 20.2) = 0.21825
   p6 = P(20.2 ≤ X ≤ 20.3) = 0.09622
   p7 = P(X > 20.3) = 0.02902

4. The expected frequencies for the classes are then given by:

   ei = n · pi ,

   which yields the following table:

   Cable diameter (in mm)    Observed frequency ni    Expected frequency ei
   19.70–19.80                 5                        4.655
   19.80–19.90                12                       14.666
   19.90–20.00                35                       31.874
   20.00–20.10                42                       40.716
   20.10–20.20                28                       30.555
   20.20–20.30                14                       13.471
   20.30–20.40                 4                        4.063
   Total                     140                      140

5. The χ² (chi-square) statistic is then:

   χ² = Σ_{i=1}^{k} (ni − ei)² / ei ,


   where k = 7 is the number of classes.

   χ² = (5 − 4.655)²/4.655 + (12 − 14.666)²/14.666 + ... + (4 − 4.063)²/4.063 = 1.0927 .

6. Choosing a significance level α = 5%, we find that the value of χ²_{v,α} with k − 3 = 7 − 3 = 4 degrees of freedom in the chi-square table is:

   χ²_{4,0.05} = 9.49 .

   Since the calculated value of χ² is smaller than the value obtained from the chi-square table, we do not reject the null hypothesis and we conclude that the difference between the observed distribution and the normal distribution is not significant at a significance level of 5%.
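A numerical check of this example is also straightforward. The sketch below (assuming Python with NumPy and SciPy; the class boundaries and the grouped-data estimates of the mean and standard deviation are taken from the text) recomputes the class probabilities, the expected frequencies and the χ² statistic:

    import numpy as np
    from scipy.stats import norm, chi2

    edges = np.array([19.7, 19.8, 19.9, 20.0, 20.1, 20.2, 20.3, 20.4])
    observed = np.array([5, 12, 35, 42, 28, 14, 4])
    n = 140
    x_bar = 2806.40 / 140      # grouped-data mean, about 20.05
    s = 0.134                  # grouped-data standard deviation from the text

    # Class probabilities under N(x_bar, s), with open extreme classes as in the text
    cdf = norm.cdf(edges, loc=x_bar, scale=s)
    probs = np.diff(cdf)
    probs[0] = cdf[1]          # p1 = P(X <= 19.8)
    probs[-1] = 1 - cdf[-2]    # p7 = P(X > 20.3)

    expected = n * probs
    chi2_stat = ((observed - expected) ** 2 / expected).sum()
    critical = chi2.ppf(0.95, df=len(observed) - 3)   # k - 3 = 4 degrees of freedom

    print(f"chi2 = {chi2_stat:.2f}, critical value = {critical:.2f}")
    # chi2 ~ 1.1 < 9.49, so the normal fit is not rejected, as in the text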

FURTHER READING
• Chi-square distribution
• Chi-square table
• Goodness of fit test
• Hypothesis testing
• Kolmogorov–Smirnov test

REFERENCE
Pearson, K.: On the criterion, that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, pp. 339–357. First published in 1900 in Philos. Mag. (5th Ser) 50, 157–175 (1948)

Chi-Square Table

The chi-square table gives the values obtained from the distribution function of a random variable that follows a chi-square distribution.

HISTORY
One of the first chi-square tables was published in 1902, by Elderton. It contains distribution function values that are given to six decimal places.
In 1922, Pearson, Karl reported a table of values for the incomplete gamma function, down to seven decimals.

MATHEMATICAL ASPECTS
Let X be a random variable that follows a chi-square distribution with v degrees of freedom. The density function of the random variable X is given by:

f(t) = t^(v/2 − 1) · exp(−t/2) / ( 2^(v/2) · Γ(v/2) ) ,   t ≥ 0 ,

where Γ represents the gamma function (see gamma distribution).
The distribution function of the random variable X is defined by:

F(x) = P(X ≤ x) = ∫_{0}^{x} f(t) dt .

The chi-square table gives the values of the distribution function F(x) for different values of v.
We often use the chi-square table in the opposite way, to find the value of x that corresponds to a given probability.
We generally denote by χ²_{v,α} the value of the random variable X for which

P(X ≤ χ²_{v,α}) = 1 − α .

Note: the notation χ² is read "chi-square".


EXAMPLES
See Appendix F.
The chi-square table allows us, for a given number of degrees of freedom v, to determine:
1. The value of the distribution function F(x), given x.
2. The value of χ²_{v,α}, given the probability P(X ≤ χ²_{v,α}).
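Statistical software replaces such table lookups. Below is a minimal sketch (assuming Python with SciPy; the chosen degrees of freedom and significance level are illustrative) showing both directions of use, the value F(x) for a given x and the quantile χ²_{v,α} for a given α:

    from scipy.stats import chi2

    v = 4            # degrees of freedom
    alpha = 0.05     # significance level

    # Direction 1: value of the distribution function F(x) for a given x
    x = 9.488
    print(f"F({x}) = {chi2.cdf(x, df=v):.4f}")    # about 0.95

    # Direction 2: quantile q such that P(X <= q) = 1 - alpha (the table value chi2_{v,alpha})
    q = chi2.ppf(1 - alpha, df=v)
    print(f"chi2_(4, 0.05) = {q:.3f}")            # about 9.488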

FURTHER READING
• Chi-square distribution
• Chi-square goodness of fit test
• Chi-square test
• Chi-square test of independence
• Statistical table

REFERENCES
Elderton, W.P.: Tables for testing the goodness of fit of theory to observation. Biometrika 1, 155–163 (1902)

Pearson, K.: Tables of the Incomplete Γ-function. H.M. Stationery Office (Cambridge University Press, Cambridge since 1934), London (1922)

Chi-Square Test

There are a number of chi-square tests, all of which involve comparing the test results to the values from the chi-square distribution. The most well-known of these tests are introduced below:
• The chi-square test of independence is used to determine whether two qualitative categorical variables associated with a sample are independent.
• The chi-square goodness of fit test is used to determine whether the distribution observed for a sample can be approximated by a theoretical distribution. We might want to know, for example, whether the distribution observed for the sample corresponds to a particular probability distribution (normal distribution, Poisson distribution, etc.).
• The chi-square test for an unknown variance is used when we want to test whether this variance takes a particular constant value.
• The chi-square test is used to test for homogeneity of the variances calculated for many samples drawn from a normally distributed population.

HISTORY
In 1937, Bartlett, M.S. proposed a method of testing the homogeneity of the variance for many samples drawn from a normally distributed population.
See also chi-square test of independence and chi-square goodness of fit test.

MATHEMATICAL ASPECTS
The mathematical aspects of the chi-square test of independence and those of the chi-square goodness of fit test are dealt with in their corresponding entries.
The chi-square test used to check whether an unknown variance takes a particular constant value is the following:
Let (x1, ..., xn) be a random sample coming from a normally distributed population of unknown mean μ and of unknown variance σ².
We have good reason to believe that the variance of the population equals a presumed value σ0². The hypotheses for each case are described below.

A: Two-sided case:

   H0 : σ² = σ0²
   H1 : σ² ≠ σ0²


B: One-sided case:

   H0 : σ² ≤ σ0²
   H1 : σ² > σ0²

C: One-sided case:

   H0 : σ² ≥ σ0²
   H1 : σ² < σ0²

We then determine the statistic of the given chi-square test using:

   χ² = Σ_{i=1}^{n} (xi − x̄)² / σ0² .

This statistic is, under H0, chi-square distributed with n − 1 degrees of freedom. In other words, we look for the value of χ²_{n−1,α} in the chi-square table, and we then compare that value to the calculated value χ². The decision rules depend on the case, and are as follows.

Case A
If χ² ≥ χ²_{n−1,α1} or if χ² ≤ χ²_{n−1,1−α2} we reject the null hypothesis H0 for the alternative hypothesis H1, where we have split the significance level α into α1 and α2 such that α1 + α2 = α. Otherwise we do not reject the null hypothesis H0.

Case B
If χ² < χ²_{n−1,α} we do not reject the null hypothesis H0. If χ² ≥ χ²_{n−1,α} we reject the null hypothesis H0 for the alternative hypothesis H1.

Case C
If χ² > χ²_{n−1,1−α} we do not reject the null hypothesis H0. If χ² ≤ χ²_{n−1,1−α} we reject the null hypothesis H0 for the alternative hypothesis H1.

Other chi-square tests are proposed in the work of Ostle, B. (1963).

DOMAINS AND LIMITATIONS
The χ² (chi-square) statistic must be calculated using absolute frequencies and not relative ones.
Note that the chi-square test can be unreliable for small samples, especially when some of the expected frequencies are small (< 5). This issue can often be resolved by grouping categories together, if such grouping is sensible.

EXAMPLES
Consider a batch of items produced by a machine. They can be divided up into classes depending on their diameters (in mm), as in the following table:

Diameter (mm) xi    Number of items ni
59.5                  2
59.6                  6
59.7                  7
59.8                 15
59.9                 17
60.0                 25
60.1                 15
60.2                  7
60.3                  3
60.4                  1
60.5                  2
Total               N = 100

We have a random sample drawn from a normally distributed population where the mean and variance are not known. The vendor of these items would like the variance σ² to be smaller than or equal to 0.05. We test


the following hypotheses:

   null hypothesis H0 : σ² ≤ 0.05
   alternative hypothesis H1 : σ² > 0.05 .

In this case we use the one-tailed hypothesis test.
We start by calculating the mean for the sample:

   x̄ = ( Σ_{i=1}^{11} ni · xi ) / N = 5995 / 100 = 59.95 .

We can then calculate the χ² statistic:

   χ² = ( Σ_{i=1}^{11} ni · (xi − x̄)² ) / σ0²
      = ( 2 · (−0.45)² + ... + 2 · (0.55)² ) / 0.05
      = 3.97 / 0.05 = 79.4 .

Using a significance level of α = 5%, we then find the value of χ²_{99,0.05} (= 123.2) in the chi-square table.
As χ² = 79.4 < χ²_{99,0.05}, we do not reject the null hypothesis, which means that the vendor should be happy to sell these items since they are not significantly different in diameter.
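The computation can be scripted as follows. This is a minimal sketch (assuming Python with NumPy and SciPy; the data and the presumed variance σ0² = 0.05 come from the example):

    import numpy as np
    from scipy.stats import chi2

    diameters = np.array([59.5, 59.6, 59.7, 59.8, 59.9, 60.0, 60.1, 60.2, 60.3, 60.4, 60.5])
    counts = np.array([2, 6, 7, 15, 17, 25, 15, 7, 3, 1, 2])
    sigma0_sq = 0.05

    n = counts.sum()                                 # 100 items
    x_bar = (counts * diameters).sum() / n           # 59.95

    chi2_stat = (counts * (diameters - x_bar) ** 2).sum() / sigma0_sq
    critical = chi2.ppf(0.95, df=n - 1)              # chi2_{99, 0.05}

    print(f"chi2 = {chi2_stat:.1f}, critical value = {critical:.1f}")
    # chi2 = 79.4 < 123.2, so H0 (variance <= 0.05) is not rejected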

FURTHER READING
• Chi-square distribution
• Chi-square goodness of fit test
• Chi-square table
• Chi-square test of independence

REFERENCES
Bartlett, M.S.: Some examples of statistical methods of research in agriculture and applied biology. J. Roy. Stat. Soc. (Suppl.) 4, 137–183 (1937)

Ostle, B.: Statistics in Research: Basic Concepts and Techniques for Research Workers. Iowa State College Press, Ames, IA (1954)

Chi-square Test of Independence

The chi-square test of independence aims to determine whether two variables associated with a sample are independent or not. The variables studied are categorical qualitative variables.
The chi-square test of independence is performed using a contingency table.

HISTORY
The first contingency tables were used only for enumeration. However, encouraged by the work of Quetelet, Adolphe (1849), statisticians began to take an interest in the associations between the variables used in the tables. For example, Pearson, Karl (1900) performed fundamental work on contingency tables.
Yule, George Udny (1900) proposed a somewhat different approach to the study of contingency tables to Pearson's, which led to a disagreement between them. Pearson also argued with Fisher, Ronald Aylmer about the number of degrees of freedom to use in the chi-square test of independence. Everyone used different numbers until Fisher, R.A. (1922) was eventually proved to be correct.

MATHEMATICAL ASPECTS
Consider two qualitative categorical variables X and Y. We have a sample containing n observations of these variables.


These observations can be presented in a contingency table, whose rows are the categories of variable X and whose columns are the categories of variable Y. We denote the observed frequency of category i of variable X and category j of variable Y by nij.

          Y1     ...    Yc     Total
X1        n11    ...    n1c    n1.
...       ...    ...    ...    ...
Xr        nr1    ...    nrc    nr.
Total     n.1    ...    n.c    n..

The hypotheses to be tested are:

   Null hypothesis H0 : the two variables are independent,
   Alternative hypothesis H1 : the two variables are not independent.

Steps Involved in the Test
1. Compute the expected frequencies, denoted by eij, for each cell of the contingency table under the independence hypothesis:

   eij = (ni. · n.j) / n.. ,

   ni. = Σ_{k=1}^{c} nik   and   n.j = Σ_{k=1}^{r} nkj ,

   where r represents the number of rows (the number of categories of variable X in the contingency table) and c the number of columns (the number of categories of variable Y).
2. Calculate the value of the χ² (chi-square) statistic, which is really a measure of the deviation of the observed frequencies nij from the expected frequencies eij:

   χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (nij − eij)² / eij .

3. Choose the significance level α to be used in the test and compare the calculated value of χ² with the value obtained from the chi-square table, χ²_{v,α}. The number of degrees of freedom corresponds to the number of cells in the table that can take arbitrary values; the values taken by the other cells are imposed on them by the row and column totals. So, the number of degrees of freedom is given by:

   v = (r − 1)(c − 1) .

4. If the calculated χ² is smaller than the χ²_{v,α} from the table, we do not reject the null hypothesis. The two variables can be considered to be independent.
   However, if the calculated χ² is greater than the χ²_{v,α} from the table, we reject the null hypothesis in favor of the alternative hypothesis. We can then conclude that the two variables are not independent.

DOMAINS AND LIMITATIONS
Certain conditions must be fulfilled in order to be able to apply the chi-square test of independence:
1. The sample, which contains n observations, must be a random sample;
2. Each individual observation can only appear in one category for each variable. In other words, each individual observation can only appear in one row and one column of the contingency table.
Note that the chi-square test of independence is not very reliable for small samples, especially when the expected frequencies are small (< 5). To avoid this issue we can group categories together, but only when the groups obtained are sensible.


EXAMPLES
We want to determine whether the proportion of smokers is independent of gender. The two variables to be studied are categorical and qualitative and contain two categories each:
• Variable "gender": M or F;
• Variable "smoking status": "smokes" or "does not smoke".
The hypotheses are then:

   H0 : chance of being a smoker is independent of gender
   H1 : chance of being a smoker is not independent of gender.

The contingency table obtained from a sample of 100 individuals (n = 100) is shown below:

                     Smoking status
            "smokes"    "does not smoke"    Total
Gender  M      21             44              65
        F      10             25              35
Total          31             69             100

We now denote the observed frequencies by nij (i = 1, 2, j = 1, 2).
We then estimate all of the frequencies in the table based on the hypothesis that the two variables are independent of each other. We denote these estimated frequencies by eij:

   eij = (ni. · n.j) / n.. .

We therefore obtain:

   e11 = 65 · 31 / 100 = 20.15
   e12 = 65 · 69 / 100 = 44.85
   e21 = 35 · 31 / 100 = 10.85
   e22 = 35 · 69 / 100 = 24.15 .

The estimated frequency table is given below:

                     Smoking status
            "smokes"    "does not smoke"    Total
Gender  M     20.15           44.85           65
        F     10.85           24.15           35
Total         31              69             100

If the null hypothesis H0 is true, the statistic

   χ² = Σ_{i=1}^{2} Σ_{j=1}^{2} (nij − eij)² / eij

is chi-square-distributed with (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1 degree of freedom, and

   χ² = 0.036 + 0.016 + 0.066 + 0.030 = 0.148 .

If a significance level of 5% is selected, the value of χ²_{1,0.05} is 3.84, from the chi-square table.
Since the calculated value of χ² is smaller than the value found in the chi-square table, we do not reject the null hypothesis and we conclude that the two variables studied are independent.

FURTHER READING
• Chi-square distribution
• Chi-square table
• Contingency table
• Test of independence


REFERENCES
Fisher, R.A.: On the interpretation of χ² from contingency tables, and the calculation of P. J. Roy. Stat. Soc. Ser. A 85, 87–94 (1922)

Pearson, K.: On the criterion, that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, pp. 339–357. First published in 1900 in Philos. Mag. (5th Ser) 50, 157–175 (1948)

Quetelet, A.: Letters addressed to H.R.H. the Grand Duke of Saxe Coburg and Gotha, on the Theory of Probabilities as Applied to the Moral and Political Sciences. (French translation by Downs, Olinthus Gregory). Charles & Edwin Layton, London (1849)

Yule, G.U.: On the association of attributes in statistics: with illustration from the material of the childhood society. Philos. Trans. Roy. Soc. Lond. Ser. A 194, 257–319 (1900)

Classification

Classification is the grouping together of similar objects. If each object is characterized by p variables, classification can be performed according to rational criteria. Depending on the criteria used, an object could potentially belong to several classes.

HISTORY
Classifying the residents of a locality or a country according to their sex and other physical characteristics is an activity that dates back to ancient times. The Hindus, the ancient Greeks and the Romans all developed multiple typologies for human beings. The oldest comes from Galen (129–199 A.D.).
Later on, the concept of classification spread to the fields of biology and zoology; the works of Linné (1707–1778) should be mentioned in this regard.
The first ideas regarding actual methods of cluster analysis are attributed to Adanson (eighteenth century). Zubin (1938), Tryon (1939) and Thorndike (1953) also attempted to develop some methods, but the true development of classification methods coincides with the advent of the computer.

MATHEMATICAL ASPECTS
Classification methods can be divided into two large categories, one based on probabilities and the other not.
The first category contains, for example, discriminant analysis. The second category can be further subdivided into two groups. The first group contains what are known as optimal classification methods. In the second group, we can distinguish between several subtypes of classification method:
• Partition methods, which consist of distributing n objects among g groups in such a way that each object belongs exclusively to just one group. The number of groups g is fixed beforehand, and the partition applied most closely satisfies the classification criteria.
• Partition methods with overlapping classes, where an object is allowed to belong to several groups simultaneously.
• Hierarchical classification methods, where the structure of the data at different levels of classification is taken into account. We can then take into account the relationships that exist between the


different groups created during the partition. Two different kinds of hierarchical techniques exist: agglomerating techniques and dividing techniques.
  Agglomerating methods start with separated objects, meaning that the n objects are initially distributed into n groups. Two groups are agglomerated in each subsequent step until there is only one group left.
  In contrast, dividing methods start with all of the objects grouped together, meaning that all of the n objects are in one single group to start with. New groups are created at each subsequent step until there are n groups.
• Geometric classification, in which the objects are depicted on a scatter plot and then grouped according to their position on the plot. In a graphical representation, the proximities of the objects to each other in the graphic correspond to the similarities between the objects.
The first three types of classification are generally grouped together under the term cluster analysis. What they have in common is the fact that the objects to be classified must present a certain amount of structure that allows us to measure the degree of similarity between the objects.
Each type of classification contains a multitude of methods that allow us to create classes of similar objects.

DOMAINS AND LIMITATIONS
Classification can be used in two cases:
• Description cases;
• Prediction cases.
In the first case, the classification is done on the basis of some generally accepted standard characteristics. For example, professions can be classified into freelance, managers, workers, and so on, and one can calculate average salaries, average frequency of health problems, and so on, for each class.
In the second case, classification will lead to a prediction and then to an action. For example, if the foxes in a particular region exhibit apathetic behavior and excessive salivation, we can conclude that there is a new rabies epidemic. This should then prompt a vaccine campaign.

FURTHER READING
• Cluster analysis
• Complete linkage method
• Data analysis

REFERENCES
Everitt, B.S.: Cluster Analysis. Halstead, London (1974)

Gordon, A.D.: Classification. Methods for the Exploratory Analysis of Multivariate Data. Chapman & Hall, London (1981)

Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)

Thorndike, R.L.: Who belongs in a family? Psychometrika 18, 267–276 (1953)

Tryon, R.C.: Cluster Analysis. McGraw-Hill, New York (1939)

Zubin, J.: A technique for measuring like-mindedness. J. Abnorm. Social Psychol. 33, 508–516 (1938)

Cluster Analysis

Clustering is the partitioning of a data set into subsets or clusters, so that the degree of association is strong between members of the same cluster and weak between members of


different clusters according to some defined distance measure.
Several methods of performing cluster analysis exist:
• Partitional clustering
• Hierarchical clustering

HISTORY
See classification and data analysis.

MATHEMATICAL ASPECTS
To carry out cluster analysis on a set of n objects, we need to define a distance between the objects (or, more generally, a measure of the similarity between the objects) that need to be classified. The existence of some kind of structure within the set of objects is assumed.
To carry out a hierarchical classification of a set E of objects {x1, x2, ..., xn}, it is necessary to define a distance associated with E that can be used to obtain a distance table between the objects of E. Similarly, a distance must also be defined for any subsets of E.
One approach to hierarchical clustering is to use the agglomerating method. It can be summarized in the following algorithm:
1. Locate the pair of objects (xi, xj) which have the smallest distance between each other.
2. Aggregate the pair of objects (xi, xj) into a single element α and re-establish a new distance table. This is achieved by suppressing the rows and columns associated with xi and xj and replacing them with a row and a column associated with α. The new distance table will have one row and one column less than the previous table.
3. Repeat these two operations until the desired number of classes is obtained or until all of the objects are gathered into the same class.

Note that the distance between the group formed from aggregated elements and the other elements can be defined in different ways, leading to different methods. Examples include the single link method and the complete linkage method.
The single link method is a hierarchical classification method that uses the Euclidean distance to establish a distance table, and the distance between two classes is given by the Euclidean distance between the two closest elements (the minimum distance).
In the complete linkage method, the distance between two classes is given by the Euclidean distance between the two elements furthest away from each other (the maximum distance).
Given that the only difference between these two methods is whether the distance between two classes is the minimum or the maximum distance, only the single link method will be considered here.
For a set E = {X1, X2, ..., Xn}, the distance table for the elements of E is then established. Since this table is symmetric and null along its diagonal, only one half of the table is considered:

d(X1, X2)   d(X1, X3)   ...   d(X1, Xn)
            d(X2, X3)   ...   d(X2, Xn)
                        ...   ...
                              d(Xn−1, Xn)

where d(Xi, Xj) is the Euclidean distance between Xi and Xj for i < j, where the values of i and j are between 1 and n.
The algorithm for the single link method is as follows:
• Search for the minimum d(Xi, Xj) for i < j;


• The elements Xi and Xj are aggregated into a new group Ck = Xi ∪ Xj;
• The set E is then partitioned into

  {X1}, ..., {Xi−1}, {Xi, Xj}, {Xi+1}, ..., {Xj−1}, {Xj+1}, ..., {Xn} ;

• The distance table is then recreated without the rows and columns associated with Xi and Xj, and with a row and a column representing the distances between Xm and Ck, m = 1, 2, ..., n, m ≠ i and m ≠ j, given by:

  d(Ck, Xm) = min{ d(Xi, Xm) ; d(Xj, Xm) } .

The algorithm is repeated until the desired number of groups is attained or until there is only one group containing all of the elements.
In the general case, the distance between two groups is given by:

  d(Ck, Cm) = min{ d(Xi, Xj) with Xi belonging to Ck and Xj to Cm } ;

the formula quoted previously applies to the particular case when the groups are composed, respectively, of two elements and one single element.
This series of agglomerations can be represented by a dendrogram, where the abscissa shows the distance separating the objects. Note that we could find more than one pair when we search for the pair of closest elements. In this case, the pair that is selected for aggregation in the first step does not influence later steps (provided the algorithm does not finish at this step), because the other pair of closest elements will be aggregated in the following step. The aggregation order is not shown on the dendrogram, because the dendrogram reports the distance that separates two grouped objects.

DOMAINS AND LIMITATIONS
The choice of the distance between the group formed of aggregated elements and the other elements can be made in several ways, according to the method that is used, as for example in the single link method and in the complete linkage method.

EXAMPLES
Let us illustrate how the single link method of cluster analysis can be applied to the examination grades obtained by five students, each studying four courses: English, French, maths and physics.
We want to divide these five students into two groups using the single link method.
The grades obtained in the examinations, which range from 1 to 6, are summarized in the following table:

          English   French   Maths   Physics
Alain       5.0       3.5      4.0      4.5
Jean        5.5       4.0      5.0      4.5
Marc        4.5       4.5      4.0      3.5
Paul        4.0       5.5      3.5      4.0
Pierre      4.0       4.5      3.0      3.5

We then work out the Euclidean distances between the students and use them to create a distance table:

          Alain   Jean   Marc   Paul   Pierre
Alain      0      1.22   1.5    2.35   2
Jean       1.22   0      1.8    2.65   2.74
Marc       1.5    1.8    0      1.32   1.12
Paul       2.35   2.65   1.32   0      1.22
Pierre     2      2.74   1.12   1.22   0

By only considering the upper part of this symmetric table, we obtain:


          Jean   Marc   Paul   Pierre
Alain     1.22   1.5    2.35   2
Jean             1.8    2.65   2.74
Marc                    1.32   1.12
Paul                           1.22

The minimum distance is 1.12, between Marc and Pierre; we therefore form the first group from these two students. We then calculate the new distances.
For example, we calculate the new distance between Marc and Pierre on one side and Alain on the other by taking the minimum of the distance between Marc and Alain and the distance between Pierre and Alain:

d({Marc, Pierre}, Alain) = min{ d(Marc, Alain) ; d(Pierre, Alain) } = min{1.5 ; 2} = 1.5 ,

also

d({Marc, Pierre}, Jean) = min{ d(Marc, Jean) ; d(Pierre, Jean) } = min{1.8 ; 2.74} = 1.8 ,

and

d({Marc, Pierre}, Paul) = min{ d(Marc, Paul) ; d(Pierre, Paul) } = min{1.32 ; 1.22} = 1.22 .

The new distance table takes the following form:

                   Jean   Marc and Pierre   Paul
Alain              1.22        1.5           2.35
Jean                           1.8           2.65
Marc and Pierre                              1.22

The minimum distance is now 1.22, both between Alain and Jean and between the group of Marc and Pierre on the one side and Paul on the other side (in other words, two pairs exhibit the minimum distance); let us choose to regroup Alain and Jean first. The other pair will be aggregated in the next step. We rebuild the distance table and obtain:

d({Alain, Jean}, {Marc, Pierre}) = min{ d(Alain, {Marc, Pierre}) ; d(Jean, {Marc, Pierre}) } = min{1.5 ; 1.8} = 1.5

as well as:

d({Alain, Jean}, Paul) = min{ d(Alain, Paul) ; d(Jean, Paul) } = min{2.35 ; 2.65} = 2.35 .

This gives the following distance table:

                   Marc and Pierre   Paul
Alain and Jean          1.5           2.35
Marc and Pierre                       1.22

Notice that Paul must now be integrated into the group formed from Marc and Pierre, and the new distance will be:

d({Marc, Pierre, Paul}, {Alain, Jean}) = min{ d({Marc, Pierre}, {Alain, Jean}) ; d(Paul, {Alain, Jean}) } = min{1.5 ; 2.35} = 1.5 ,

which gives the following distance table:

                          Alain and Jean
Marc, Pierre and Paul          1.5

We finally obtain two groups, {Alain, Jean} and {Marc, Pierre, Paul}, which are separated by a distance of 1.5.
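The whole procedure can be reproduced with standard libraries. Below is a minimal sketch (assuming Python with NumPy and SciPy; the grades are those of the example) that performs single-link hierarchical clustering and cuts the tree into two groups:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    names = ["Alain", "Jean", "Marc", "Paul", "Pierre"]
    grades = np.array([[5.0, 3.5, 4.0, 4.5],
                       [5.5, 4.0, 5.0, 4.5],
                       [4.5, 4.5, 4.0, 3.5],
                       [4.0, 5.5, 3.5, 4.0],
                       [4.0, 4.5, 3.0, 3.5]])

    # Pairwise Euclidean distances, then single-link (minimum distance) agglomeration
    dist = pdist(grades, metric="euclidean")
    tree = linkage(dist, method="single")

    # Cut the dendrogram into two clusters
    labels = fcluster(tree, t=2, criterion="maxclust")
    for name, label in zip(names, labels):
        print(name, "-> group", label)
    # Expected grouping: {Alain, Jean} and {Marc, Paul, Pierre}, as in the text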


[Figure: dendrogram illustrating the successive aggregations]

FURTHER READING
• Classification
• Complete linkage method
• Dendrogram
• Distance
• Distance table

REFERENCES
Celeux, G., Diday, E., Govaert, G., Lechevallier, Y., Ralambondrainy, H.: Classification automatique des données: aspects statistiques et informatiques. Dunod, Paris (1989)

Everitt, B.S.: Cluster Analysis. Halstead, London (1974)

Gordon, A.D.: Classification. Methods for the Exploratory Analysis of Multivariate Data. Chapman & Hall, London (1981)

Jambu, M., Lebeaux, M.O.: Classification automatique pour l'analyse de données. Dunod, Paris (1978)

Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York (1990)

Lerman, L.C.: Classification et analyse ordinale des données. Dunod, Paris (1981)

Tomassone, R., Daudin, J.J., Danzart, M., Masson, J.P.: Discrimination et classement. Masson, Paris (1988)

Cluster Sampling

In cluster sampling, the first step is to divide the population into subsets called clusters. Each cluster consists of individuals that are supposed to be representative of the population.
Cluster sampling then involves choosing a random sample of clusters and then observing all of the individuals that belong to each of them.

HISTORY
See sampling.

MATHEMATICAL ASPECTS
Cluster sampling is the process of randomly extracting representative sets (known as clusters) from a larger population of units and then applying a questionnaire to all of the units in the clusters. The clusters often consist of geographical units, like city districts. In this case, the method involves dividing a city into districts, and then selecting the districts to be included in the sample. Finally, all of the people or households in the chosen districts are questioned.
There are two principal reasons to perform cluster sampling. In many inquiries, there is no complete and reliable list of the population units on which to base the sampling, or it may be that it is too expensive to create such a list. For example, in many countries, including industrialized ones, it is rare to have complete and up-to-date lists of all of the members of the population, households or rural estates. In this situation, sampling can be achieved in a geographical manner: each urban region is divided up into districts and each rural region into rural estates. The districts and the agricultural areas are considered to be clusters and we use the


complete list of clusters because we do not have a complete and up-to-date list of all population units. Therefore, we sample a requisite number of clusters from the list and then question all of the units in the selected clusters.

DOMAINS AND LIMITATIONS
The advantage of cluster sampling is that it is not necessary to have a complete, up-to-date list of all of the units of the population to perform the analysis.
For example, in many countries, there are no updated lists of people or housing. The costs of creating such lists are often prohibitive. It is therefore easier to analyze subsets of the population (known as clusters).
In general, cluster sampling provides estimations that are not as precise as simple random sampling, but this drop in accuracy is easily offset by the far lower cost of cluster sampling.
In order to perform cluster sampling as efficiently as possible:
• the clusters should not be too big, and there should be a large enough number of clusters;
• cluster sizes should be as uniform as possible;
• the individuals belonging to each cluster must be as heterogeneous as possible with respect to the parameter being observed.

Another reason to use cluster sampling is cost. Even when a complete and up-to-date list of all population units exists, it may be preferable to use cluster sampling from an economic point of view, since it is completed faster, involves fewer workers and minimizes transport costs. It is therefore more appropriate to use cluster sampling if the money saved by doing so is far more significant than the increase in sampling variance that will result.

EXAMPLES
Consider N, the size of the population of town X. We want to study the distribution of "Age" for town X without performing a census. The population is divided into G parts. Simple random sampling is performed amongst these G parts and we obtain g parts. The final sample will be composed of all the individuals in the g selected parts.

FURTHER READING
• Sampling
• Simple random sampling

REFERENCES
Hansen, M.H., Hurwitz, W.N., Madow, M.G.: Sample Survey Methods and Theory. Vol. I. Methods and Applications. Vol. II. Theory. Chapman & Hall, London (1953)

Coefficient of Determination

The coefficient of determination, denoted R², is the quotient of the explained variation (the sum of squares due to the regression) to the total variation (the total sum of squares, TSS) in a model of simple or multiple linear regression:

   R² = Explained variation / Total variation .

It equals the square of the correlation coefficient, and it can take values between 0 and 1. It is often expressed as a percentage.


MATHEMATICAL ASPECTS
Consider the following model for multiple regression:

   Yi = β0 + β1·Xi1 + ... + βp·Xip + εi ,   for i = 1, ..., n ,

where
Yi are the dependent variables,
Xij (i = 1, ..., n, j = 1, ..., p) are the independent variables,
εi are the random non-observable error terms,
βj (j = 0, 1, ..., p) are the parameters to be estimated.

Estimating the parameters β0, β1, ..., βp yields the estimation

   Ŷi = β̂0 + β̂1·Xi1 + ... + β̂p·Xip .

The coefficient of determination allows us to measure the quality of fit of the regression equation to the measured values.
To determine the quality of the fit of the regression equation, consider, for each observation of the sample, the gap (or residual) between the observed value and the estimated value. The total variation can then be decomposed in the following way:

   Σ_{i=1}^{n} (Yi − Ȳ)² = Σ_{i=1}^{n} (Yi − Ŷi)² + Σ_{i=1}^{n} (Ŷi − Ȳ)² ,

that is,

   TSS = RSS + REGSS ,

where
TSS is the total sum of squares,
RSS is the residual sum of squares and
REGSS is the sum of squares due to the regression.

[Figure: graphical representation of these concepts and the relationships between them]

Using these concepts, we can define the coefficient of determination R², which measures the proportion of the variation in the variable Y that is explained by the regression equation:

   R² = REGSS / TSS = Σ_{i=1}^{n} (Ŷi − Ȳ)² / Σ_{i=1}^{n} (Yi − Ȳ)² .

If the regression function is to be used to make predictions about subsequent observations, it is preferable to have a high value of R², because the higher the value of R², the smaller the unexplained variation.

EXAMPLES
The following table gives values for the Gross National Product (GNP) and the demand for domestic products covering the 1969–1980 period for a particular country.

Year    GNP X    Demand for domestic products Y
1969      50        6
1970      52        8
1971      55        9
1972      59       10
1973      57        8
1974      58       10
1975      62       12
1976      65        9
1977      68       11
1978      69       10
1979      70       11
1980      72       14

We will try to estimate the demand for domestic products as a function of GNP according to the model

   Yi = a + b·Xi + εi ,   i = 1, ..., 12 .

Estimating the parameters a and b by the least squares method yields the following estimators:

   b̂ = Σ_{i=1}^{12} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{12} (Xi − X̄)² = 0.226 ,
   â = Ȳ − b̂·X̄ = −4.047 .

The estimated line is written

   Ŷ = −4.047 + 0.226·X .

The quality of the fit of the measured points to the regression line is given by the coefficient of determination:

   R² = REGSS / TSS = Σ_{i=1}^{12} (Ŷi − Ȳ)² / Σ_{i=1}^{12} (Yi − Ȳ)² .

We can calculate the mean using:

   Ȳ = ( Σ_{i=1}^{12} Yi ) / n = 9.833 .

Yi      Ŷi        (Yi − Ȳ)²   (Ŷi − Ȳ)²
 6       7.260     14.694      6.622
 8       7.712      3.361      4.500
 9       8.390      0.694      2.083
10       9.294      0.028      0.291
 8       8.842      3.361      0.983
10       9.068      0.028      0.586
12       9.972      4.694      0.019
 9      10.650      0.694      0.667
11      11.328      1.361      2.234
10      11.554      0.028      2.961
11      11.780      1.361      3.789
14      12.232     17.361      4.754
Total              47.667     30.489

We therefore obtain:

   R² = 30.489 / 47.667 = 0.6396

or, in percent:

   R² = 63.96% .

We can therefore conclude that, according to the model chosen, 63.96% of the variation in the demand for domestic products is explained by the variation in the GNP.
Obviously the value of R² cannot exceed 100%. While 63.96% is relatively high, it is not close enough to 100% to rule out trying to modify the model further.
This analysis also shows that other variables apart from the GNP should be taken into account when determining the function corresponding to the demand for domestic products, since the GNP only partially explains the variation.


FURTHER READING
• Correlation coefficient
• Multiple linear regression
• Regression analysis
• Simple linear regression

Coefficient of Kurtosis

The coefficient of kurtosis is used to measure the peakedness or flatness of a curve. It is based on the moments of the distribution. This coefficient is one of the measures of kurtosis.

HISTORY
See coefficient of skewness.

MATHEMATICAL ASPECTS
The coefficient of kurtosis (β2) is based on the centered fourth-order moment of a distribution, which is equal to:

μ4 = E[(X − μ)⁴] .

In order to obtain a coefficient of kurtosis that is independent of the units of measurement, the fourth-order moment is divided by the standard deviation of the population σ raised to the fourth power. The coefficient of kurtosis is then equal to:

β2 = μ4 / σ⁴ .

For a sample (x1, x2, . . . , xn), the estimator of this coefficient is denoted by b2. It is equal to:

b2 = m4 / S⁴ ,

where m4 is the centered fourth-order moment of the sample, given by:

m4 = (1/n) ∑_{i=1}^{n} (xi − x̄)⁴ ,

where x̄ is the arithmetic mean, n is the total number of observations and S⁴ is the standard deviation of the sample raised to the fourth power.
For the case where a random variable X takes values xi with frequencies fi, i = 1, 2, . . . , h, the centered fourth-order moment of the sample is given by the formula:

m4 = (1/n) ∑_{i=1}^{h} fi·(xi − x̄)⁴ .

DOMAINS AND LIMITATIONS
For a normal distribution, the coefficient of kurtosis is equal to 3. Therefore a curve is called platykurtic (meaning flatter than the normal distribution) if it has a kurtosis coefficient smaller than 3. It is called leptokurtic (meaning sharper than the normal distribution) if β2 is greater than 3.
Let us now prove that the coefficient of kurtosis is equal to 3 for the normal distribution. We know that β2 = μ4 / σ⁴, meaning that the centered fourth-order moment is divided by the standard deviation raised to the fourth power. It can be proved that the centered sth-order moment, denoted μs, satisfies the following relation for a normal distribution:

μs = (s − 1)·σ²·μ_{s−2} .

This recursive formula expresses higher-order moments as a function of lower-order moments. Given that μ0 = 1 (the zero-order moment of any random variable is equal to 1, since it is the expected value of this variable raised to the power zero) and that μ1 = 0 (the centered first-order moment is zero for any random variable), we have:

μ2 = σ² ,
μ3 = 0 ,
μ4 = 3σ⁴ , etc.

The coefficient of kurtosis is then equal to:

β2 = μ4 / σ⁴ = 3σ⁴ / σ⁴ = 3 .

EXAMPLES
We want to calculate the kurtosis of the distribution of daily turnover for 75 bakeries. Let us calculate the coefficient of kurtosis β2 using the following data:

Table categorizing the daily turnovers of 75 bakeries

Turnover   Frequencies
215–235      4
235–255      6
255–275     13
275–295     22
295–315     15
315–335      6
335–355      5
355–375      4

The fourth-order moment of the sample is given by:

m4 = (1/n) ∑_{i=1}^{h} fi·(xi − x̄)⁴ ,

where n = 75, x̄ = 290.60 and xi is the center of class interval i. We can summarize the calculations in the following table:

xi      xi − x̄    fi    fi·(xi − x̄)⁴
225     −65.60     4      74075629.16
245     −45.60     6      25942428.06
265     −25.60    13       5583457.48
285      −5.60    22         21635.89
305      14.40    15        644972.54
325      34.40     6       8402045.34
345      54.40     5      43789058.05
365      74.40     4     122560841.32
Total                    281020067.84

Since S = 33.88, the coefficient of kurtosis is equal to:

β2 = (1/75)·(281020067.84) / (33.88)⁴ = 2.84 .

Since β2 is smaller than 3, we can conclude that the distribution of the daily turnover in the 75 bakeries is platykurtic, meaning that it is flatter than the normal distribution.
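As an illustration (not part of the original entry), the following Python sketch computes b2 from the grouped data above, using the class midpoints and frequencies and the sample standard deviation derived from them.

```python
# Sample kurtosis coefficient b2 from grouped data (class midpoints and frequencies).
midpoints = [225, 245, 265, 285, 305, 325, 345, 365]
freqs     = [4, 6, 13, 22, 15, 6, 5, 4]

n = sum(freqs)                                            # 75 observations
x_bar = sum(f * x for x, f in zip(midpoints, freqs)) / n  # ≈ 290.60

# Centered second- and fourth-order sample moments.
m2 = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, freqs)) / n
m4 = sum(f * (x - x_bar) ** 4 for x, f in zip(midpoints, freqs)) / n

s = m2 ** 0.5        # ≈ 33.87
b2 = m4 / s ** 4     # ≈ 2.85 here; the entry reports 2.84 after rounding S to 33.88
print(round(x_bar, 2), round(s, 2), round(b2, 2))
```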

FURTHER READING
• Measure of kurtosis
• Measure of shape

REFERENCES
See coefficient of skewness (Pearson's β1).

Coefficient of Skewness

The coefficient of skewness measures the skewness of a distribution. It is based on the notion of the moment of the distribution. This coefficient is one of the measures of skewness.

HISTORY
Between the end of the nineteenth century and the beginning of the twentieth century, Pearson, Karl studied large sets of data which sometimes deviated significantly from normality and exhibited considerable skewness. He first used the following coefficient as a measure of skewness:

skewness = (x̄ − mode) / S ,

where x̄ represents the arithmetic mean and S the standard deviation. This measure is equal to zero if the data are distributed symmetrically.


He discovered empirically that for a moderately asymmetric distribution (such as the gamma distribution):

Mo − x̄ ≈ 3·(Md − x̄) ,

where Mo and Md denote the mode and the median of the data set. By substituting this expression into the previous coefficient, the following alternative formula is obtained:

skewness = 3·(x̄ − Md) / S .

Following this, Pearson, K. (1894, 1895) introduced a coefficient of skewness, known as the β1 coefficient, based on calculations of the centered moments. This coefficient is more difficult to calculate but it is more descriptive and better adapted to large numbers of observations. Pearson, K. also created the coefficient of kurtosis (β2), which is used to measure the oblateness of a curve. This coefficient is also based on the moments of the distribution being studied. Tables giving the limit values of the coefficients β1 and β2 can be found in the works of Pearson and Hartley (1966, 1972). If the sample estimates fall outside the limits for β1 and β2, we conclude that the population is significantly curved or skewed.

MATHEMATICAL ASPECTS
The skewness coefficient is based on the centered third-order moment of the distribution in question, which is equal to:

μ3 = E[(X − μ)³] .

To obtain a coefficient of skewness that is independent of the measuring unit, the third-order moment is divided by the standard deviation of the population σ raised to the third power. The coefficient obtained, designated by √β1, is equal to:

√β1 = μ3 / σ³ .

The estimator of this coefficient, calculated for a sample (x1, x2, . . . , xn), is denoted by √b1. It is equal to:

√b1 = m3 / S³ ,

where m3 is the centered third-order moment of the sample, given by:

m3 = (1/n) ∑_{i=1}^{n} (xi − x̄)³ .

Here x̄ is the arithmetic mean, n is the total number of observations and S³ is the standard deviation raised to the third power.
For the case where a random variable X takes values xi with frequencies fi, i = 1, 2, . . . , h, the centered third-order moment of the sample is given by the formula:

m3 = (1/n) ∑_{i=1}^{h} fi·(xi − x̄)³ .

If the coefficient is positive, the distribution spreads to the right. If it is negative, the distribution spreads to the left. If it is close to zero, the distribution is approximately symmetric.
If the sample is taken from a normal population, the statistic √b1 roughly follows a normal distribution with a mean of 0 and a standard deviation of √(6/n). If the size of the sample n is bigger than 150, the normal table can be used to test the skewness hypothesis.

DOMAINS AND LIMITATIONS
This coefficient (like the other measures of skewness) is only of interest if it can be used to compare the shapes of two or more distributions.

EXAMPLES
Suppose that we want to compare the shapes of the daily turnover distributions obtained for 75 bakeries for two different years. We then calculate the skewness coefficient in both cases. The data are categorized in the table below:

Turnover   Frequencies for year 1   Frequencies for year 2
215–235         4                       25
235–255         6                       15
255–275        13                        9
275–295        22                        8
295–315        15                        6
315–335         6                        5
335–355         5                        4
355–375         4                        3

The third-order moment of the sample is given by:

m3 = (1/n) ∑_{i=1}^{h} fi·(xi − x̄)³ .

For year 1, n = 75, x̄ = 290.60 and xi is the center of each class interval i. The calculations are summarized in the following table:

xi      xi − x̄    fi    fi·(xi − x̄)³
225     −65.60     4     −1129201.664
245     −45.60     6      −568912.896
265     −25.60    13      −218103.808
285      −5.60    22        −3863.652
305      14.40    15        44789.760
325      34.40     6       244245.504
345      54.40     5       804945.920
365      74.40     4      1647323.136
Total                      821222.410

Since S = 33.88, the coefficient of skewness is equal to:

√b1 = (1/75)·(821222.41) / (33.88)³ = 10949.632 / 38889.307 = 0.282 .

For year 2, n = 75 and x̄ = 265.27. The calculations are summarized in the following table:

xi      xi − x̄    fi    fi·(xi − x̄)³
225     −40.27    25     −1632213.81
245     −20.27    15      −124864.28
265      −0.27     9           −0.17
285      19.73     8        61473.98
305      39.73     6       376371.09
325      59.73     5      1065663.90
345      79.73     4      2027588.19
365      99.73     3      2976063.94
Total                     4750082.83

Since S = 42.01, the coefficient of skewness is equal to:

√b1 = (1/75)·(4750082.83) / (42.01)³ = 63334.438 / 74140.933 = 0.854 .

The coefficient of skewness for year 1 is close to zero (√b1 = 0.282), so the daily turnover distribution for the 75 bakeries for year 1 is very close to being a symmetrical distribution. For year 2, the skewness coefficient is higher; this means that the distribution spreads towards the right.
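For readers who want to reproduce these figures, here is a small Python sketch (not from the original entry) that computes √b1 from grouped data; running it with each year's frequencies gives the two coefficients.

```python
# Sample skewness coefficient sqrt(b1) from grouped data.
def skewness_coefficient(midpoints, freqs):
    n = sum(freqs)
    x_bar = sum(f * x for x, f in zip(midpoints, freqs)) / n
    m2 = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, freqs)) / n
    m3 = sum(f * (x - x_bar) ** 3 for x, f in zip(midpoints, freqs)) / n
    s = m2 ** 0.5                    # sample standard deviation
    return m3 / s ** 3

midpoints = [225, 245, 265, 285, 305, 325, 345, 365]
year1 = [4, 6, 13, 22, 15, 6, 5, 4]
year2 = [25, 15, 9, 8, 6, 5, 4, 3]

print(round(skewness_coefficient(midpoints, year1), 3))   # ≈ 0.28
print(round(skewness_coefficient(midpoints, year2), 3))   # ≈ 0.85
```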

FURTHER READING
• Measure of shape
• Measure of skewness


REFERENCES
Pearson, E.S., Hartley, H.O.: Biometrika Tables for Statisticians, vols. I and II. Cambridge University Press, Cambridge (1966, 1972)

Pearson, K.: Contributions to the mathematical theory of evolution. I. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 1–40 (1948). First published in 1894 as: On the dissection of asymmetrical frequency curves. Philos. Trans. Roy. Soc. Lond. Ser. A 185, 71–110

Pearson, K.: Contributions to the mathematical theory of evolution. II: Skew variation in homogeneous material. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 41–112 (1948). First published in 1895 in Philos. Trans. Roy. Soc. Lond. Ser. A 186, 343–414

Coefficient of Variation

The coefficient of variation is a measure of relative dispersion. It describes the standard deviation as a percentage of the arithmetic mean. This coefficient can be used to compare the dispersions of quantitative variables that are not expressed in the same units (for example, when comparing the salaries in different countries, given in different currencies), or the dispersions of variables that have very different means.

MATHEMATICAL ASPECTS
The coefficient of variation CV is defined as the ratio of the standard deviation to the arithmetic mean for a set of observations; in other words:

CV = (S / x̄)·100

for a sample, where

S is the standard deviation of the sample, and
x̄ is the arithmetic mean of the sample,

or:

CV = (σ / μ)·100

for a population, where

σ is the standard deviation of the population, and
μ is the mean of the population.

This coefficient is independent of the unit of measurement used for the variable.

EXAMPLES
Let us study the salary distributions for two companies from two different countries. According to a survey, the arithmetic means and the standard deviations of the salaries are as follows:

Company A:
x̄ = 2500 CHF ,  S = 200 CHF ,  CV = (200 / 2500)·100 = 8% .

Company B:
x̄ = 1000 CHF ,  S = 50 CHF ,  CV = (50 / 1000)·100 = 5% .

The standard deviation represents 8% of the arithmetic mean for company A, and 5% for company B. The salary distribution is a bit more homogeneous for company B than for company A.
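A one-line computation suffices in practice; the following Python sketch (illustrative, not from the original) reproduces the two coefficients.

```python
# Coefficient of variation: the standard deviation as a percentage of the mean.
def coefficient_of_variation(std_dev, mean):
    return std_dev / mean * 100

print(coefficient_of_variation(200, 2500))   # company A -> 8.0
print(coefficient_of_variation(50, 1000))    # company B -> 5.0
```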


FURTHER READING
• Arithmetic mean
• Measure of dispersion
• Standard deviation

REFERENCES
Johnson, N.L., Leone, F.C.: Statistics and Experimental Design in Engineering and the Physical Sciences. Wiley, New York (1964)

Collinearity

Variables are said to be mathematically collinear if one of them is a linear combination of the other variables. They are said to be statistically collinear if one of them is approximately a linear combination of the other variables. In the case of a regression model where the explanatory variables are strongly correlated with each other, we say that there is collinearity (or multicollinearity) between the explanatory variables. In the first case, it is simply impossible to define least squares estimators; in the second case, these estimators can exhibit considerable variance.

HISTORY
The term "collinearity" was first used in mathematics at the beginning of the twentieth century, due to the rediscovery of the theorem of Pappus of Alexandria (a third-century mathematician). Let A, B, C be three points on a line and A', B', C' be three points on a different line. If we form the pairs of lines AB'·A'B, CA'·AC' and BC'·B'C, their intersections will lie on a line; in other words, the three intersection points will be collinear.

MATHEMATICAL ASPECTS
In the case of a matrix of explanatory variables X, collinearity means that one of the columns of X is (approximately) a linear combination of the other columns. This implies that X′X is almost singular. Consequently, the estimator obtained by the least squares method, β̂ = (X′X)⁻¹X′Y, is obtained by inverting an almost singular matrix, which causes its components to become unstable. The ridge regression technique was created in order to deal with these collinearity problems.
A collinear relation between more than two variables will not always be revealed by examining the pairwise correlations between the variables. A better indication of the presence of a collinearity problem is provided by the variance inflation factors, VIF. The variance inflation factor of an explanatory variable Xj is defined by:

VIFj = 1 / (1 − R²j) ,

where R²j is the coefficient of determination of the model

Xj = β0 + ∑_{k ≠ j} βk·Xk + ε .

The coefficient VIF takes values between 1 and ∞. If Xj is mathematically collinear with the other variables, we get R²j = 1 and VIFj = ∞. On the other hand, if the Xj are mutually independent, we have R²j = 0 and VIFj = 1. In practice, we consider that there is a real problem with collinearity when VIFj is greater than 100, which corresponds to an R²j greater than 0.99.
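As a sketch of how these factors can be obtained in practice (this code is not from the original entry, and the function name is illustrative), one can regress each explanatory variable on the others and apply the formula above. Only numpy is assumed to be available.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (observations in rows)."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]                              # regress column j ...
        others = np.delete(X, j, axis=1)         # ... on the remaining columns
        A = np.column_stack([np.ones(len(y)), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        vifs.append(1.0 / (1.0 - r2))
    return vifs
```

Applied to the four cement ingredients in the example that follows, the largest factors are obtained for X2 and X4, consistent with their strong negative correlation.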

DOMAINS AND LIMITATIONS
Inverting a singular matrix, like inverting 0, is not a valid operation. By the same principle, inverting an almost singular matrix is similar to inverting a very small number: some of the elements of the inverse must be very big. Consequently, when the explanatory variables are collinear, some elements of the matrix (X′X)⁻¹ used to compute β̂ will probably be very large. This is why collinearity leads to unstable regression estimators. Aside from this problem, collinearity also results in a calculation problem; it is difficult to precisely calculate the inverse of an almost singular matrix.

EXAMPLES
Thirteen portions of cement are examined in the following example. Each portion contains four ingredients, as described in the table. The goal of the experiment is to determine how the quantities X1, X2, X3 and X4, corresponding to the quantities of these four ingredients, affect the quantity Y of heat given out as the cement hardens.

Yi: quantity of heat given out during the hardening of the ith portion (in joules);
Xi1: quantity of ingredient 1 (tricalcium aluminate) in the ith portion;
Xi2: quantity of ingredient 2 (tricalcium silicate) in the ith portion;
Xi3: quantity of ingredient 3 (tetracalcium aluminoferrite) in the ith portion;
Xi4: quantity of ingredient 4 (dicalcium silicate) in the ith portion.

Table: Heat given out by the cement portions during hardening

Portion i   Ingredient 1 X1   Ingredient 2 X2   Ingredient 3 X3   Ingredient 4 X4   Heat Y
 1               7                26                 6                60              78.5
 2               1                29                12                52              74.3
 3              11                56                 8                20             104.3
 4              11                31                 8                47              87.6
 5               7                54                 6                33              95.9
 6              11                55                 9                22             109.2
 7               3                71                17                 6             102.7
 8               1                31                22                44              72.5
 9               2                54                18                22              93.1
10              21                48                 4                26             115.9
11               1                40                23                34              83.9
12              11                66                 9                12             113.3
13              10                68                 8                12             109.4

Source: Birkes & Dodge (1993)

We start by fitting a linear regression. The model used is:

Y = β0 + β1·X1 + β2·X2 + β3·X3 + β4·X4 + ε .

We obtain the following results:

Variable    β̂i        S.d.      tc
Constant    62.41     70.07      0.89
X1           1.5511    0.7448    2.08
X2           0.5102    0.7238    0.70
X3           0.1019    0.7547    0.14
X4          −0.1441    0.7091   −0.20

We can see that only the coefficient of X1 is significantly greater than zero; in the other cases tc < t(α/2, n−2) (value taken from the Student table) for a significance level of α = 0.1. Moreover, the standard deviations of the estimated coefficients β̂2, β̂3 and β̂4 are greater than the coefficients themselves. When the variables are strongly correlated, it is known that the effect of one can mask the effect of another. Because of this, the coefficients can appear to be insignificantly different from zero.
To verify the presence of multicollinearity for a pair of variables, we calculate the correlation matrix.

        X1       X2       X3       X4
X1      1        0.229   −0.824    0.245
X2      0.229    1       −0.139   −0.973
X3     −0.824   −0.139    1        0.030
X4      0.245   −0.973    0.030    1

We note that a strong negative correlation (−0.973) exists between X2 and X4. Looking at the data, we can see the reason for this. Apart from portions 1 to 5, the total quantity of silicates (X2 + X4) is almost constant across the portions, at approximately 77; therefore, X4 is approximately 77 − X2. This situation does not allow us to distinguish between the individual effects of X2 and those of X4. For example, we see that the four largest values of X4 (60, 52, 47 and 44) correspond to values of Y smaller than the mean heat 95.4. Therefore, at first sight it seems that large quantities of ingredient 4 lead to small amounts of heat. However, we also note that the four largest values of X4 correspond to the four smallest values of X2, giving a negative correlation between X2 and X4. This suggests that ingredient 4 taken alone has a small effect on variable Y, and that the small quantities of ingredient 2 taken alone can explain the small amount of heat emitted. Hence, the linear dependence between two explanatory variables (X2 and X4) makes it more complicated to see the effect of each variable alone on the response variable Y.

FURTHER READING
• Multiple linear regression
• Ridge regression

REFERENCES
Birkes, D., Dodge, Y.: Alternative Methods of Regression. Wiley, New York (1993)

Bock, R.D.: Multivariate Statistical Methods in Behavioral Research. McGraw-Hill, New York (1975)

Combination

A combination is an unordered collection of unique elements or objects. A k-combination is a subset with k elements; the number of k-combinations from a set of n elements equals the number of k-arrangements of those elements divided by k!.

HISTORY
See combinatory analysis.

MATHEMATICAL ASPECTS
1. Combinations without repetition
Combinations without repetition describe the situation where each object drawn is not placed back for the next drawing. Each object can therefore only occur once in each group. The number of combinations without repetition of k objects among n is given by:

C_n^k = (n choose k) = n! / [k!·(n − k)!] .

2. Combinations with repetition
Combinations with repetition (or with replacement) are used when each drawn object is placed back for the next drawing. Each object can then occur r times in each group, r = 0, 1, . . . , k. The number of combinations with repetition of k objects among n is given by:

K_n^k = (n + k − 1 choose k) = (n + k − 1)! / [k!·(n − 1)!] .

EXAMPLES
1. Combinations without repetition
Consider the situation where we must choose a committee of three people from an assembly of eight people. How many different committees could potentially be picked from this assembly, if each person can only be selected once in each group? Here we need to calculate the number of possible combinations of three people from eight:

C_8^3 = n! / [k!·(n − k)!] = 8! / [3!·(8 − 3)!] = 40320 / (6·120) = 56 .

Therefore, it is possible to form 56 different committees containing three people from an assembly of eight people.

2. Combinations with repetition
Consider an urn containing six numbered balls. We carry out four successive drawings, and place the drawn ball back into the urn after each drawing. How many different combinations could occur from this drawing? In this case we want to find the number of combinations with repetition (because each drawn ball is placed back in the urn before the next drawing). We obtain

K_6^4 = (n + k − 1)! / [k!·(n − 1)!] = 9! / [4!·(6 − 1)!] = 362880 / (24·120) = 126

different combinations.
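These counts can be checked directly; the Python sketch below (illustrative, not from the original) uses math.comb, which computes binomial coefficients.

```python
from math import comb

# Combinations without repetition: committees of 3 chosen from 8 people.
print(comb(8, 3))           # 56

# Combinations with repetition: 4 draws with replacement from 6 balls,
# counted as comb(n + k - 1, k).
n, k = 6, 4
print(comb(n + k - 1, k))   # 126
```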

FURTHER READING
• Arrangement
• Binomial distribution
• Combinatory analysis
• Permutation

Combinatory Analysis

Combinatory analysis refers to a group of techniques that can be used to determine the number of elements in a particular set without having to count them one by one. The elements in question could be the results from a scientific experiment or the different potential outcomes of a random event. Three particular concepts are important in combinatory analysis:
• Permutations;
• Combinations;
• Arrangements.

HISTORY
Combinatory analysis has interested mathematicians for centuries. According to Takács (1982), such analysis dates back to ancient Greece. However, the Hindus, the Persians (including the poet and mathematician Khayyâm, Omar) and (especially) the Chinese also studied such problems. A 3000-year-old Chinese book, the "I Ching", describes the possible arrangements of a set of n elements, where n ≤ 6. In 1303, Chu, Shih-chieh published a work entitled "Ssu Yuan Yü Chien" (Precious Mirror of the Four Elements). The cover of the book depicts a triangle that shows the combinations of k elements taken from a set of size n, where 0 ≤ k ≤ n. This arithmetic triangle was also explored by several European mathematicians such as Stifel, Tartaglia and Hérigone, and especially Pascal, who wrote the "Traité du triangle arithmétique" (Treatise on the Arithmetic Triangle) in 1654 (although it was not published until after his death, in 1665). Another document on combinations was published in 1617 by Puteanus, Erycius, called "Erycii Puteani Pietatis Thaumata in Bernardi Bauhusii è Societate Jesu Proteum Parthenium". However, combinatory analysis only revealed its true power with the works of Fermat (1601–1665) and Pascal (1623–1662). The term "combinatory analysis" was introduced by Leibniz (1646–1716) in 1666. In his work "Dissertatio de Arte Combinatoria," he systematically studied problems related to arrangements, permutations and combinations. Other works in this field should be mentioned here, such as those of Wallis, J. (1616–1703), reported in "The Doctrine of Permutations and Combinations" (an essential and fundamental part of the "Doctrines of Chances"), or those of Bernoulli, J., Moivre, A. de, Cardano, G. (1501–1576), and Galileo (1564–1642). In the second half of the nineteenth century, Cayley (1829–1895) solved some problems related to this type of analysis via graphics that he called "trees". Finally, we should also mention the important work of MacMahon (1854–1929), "Combinatory Analysis" (1915–1916).

EXAMPLES
See arrangement, combination and permutation.

FURTHER READING
• Arithmetic triangle
• Arrangement
• Binomial
• Binomial distribution
• Combination
• Permutation

REFERENCES
MacMahon, P.A.: Combinatory Analysis, vols. I and II. Cambridge University Press, Cambridge (1915–1916)

Stigler, S.: The History of Statistics, the Measurement of Uncertainty Before 1900. Belknap, London (1986)

Takács, L.: Combinatorics. In: Kotz, S., Johnson, N.L. (eds.) Encyclopedia of Statistical Sciences, vol. 2. Wiley, New York (1982)

Compatibility

Two events are said to be compatible if the occurrence of the first event does not prevent the occurrence of the second (or, in other words, if the intersection between the two events is not null):

P(A ∩ B) ≠ 0 .

Two events A and B are incompatible (or mutually exclusive) if the occurrence of A prevents the occurrence of B, or vice versa.


This means that the probability that these two events happen at the same time is zero:

A ∩ B = ∅  ⟹  P(A ∩ B) = 0 .

MATHEMATICAL ASPECTS
If two events A and B are compatible, the probability that at least one of the events occurs can be obtained using the following formula:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) .

On the other hand, if the two events A and B are incompatible, the probability that the two events A and B happen at the same time is zero:

P(A ∩ B) = 0 .

The probability that at least one of the events occurs can then be obtained by simply adding the individual probabilities of A and B:

P(A ∪ B) = P(A) + P(B) .

EXAMPLES
Consider a random experiment that involves drawing a card from a pack of 52 cards. We are interested in the three following events:

A = "draw a heart" ,  B = "draw a queen" ,  C = "draw a club" .

The probabilities associated with each of these events are:

P(A) = 13/52 ,  P(B) = 4/52 ,  P(C) = 13/52 .

The events A and B are compatible, because it is possible to draw both a heart and a queen at the same time (the queen of hearts). Therefore, the intersection between A and B is the queen of hearts. The probability of this event is given by:

P(A ∩ B) = 1/52 .

The probability of the union of the two events A and B (drawing either a heart or a queen) is then equal to:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 13/52 + 4/52 − 1/52 = 16/52 = 4/13 .

On the other hand, the events A and C are incompatible, because a card cannot be both a heart and a club! The intersection between A and C is an empty set:

A ∩ C = ∅ .

The probability of the union of the two events A and C (drawing a heart or a club) is simply given by the sum of the probabilities of each event:

P(A ∪ C) = P(A) + P(C) = 13/52 + 13/52 = 26/52 = 1/2 .
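The same additions can be checked exactly with rational arithmetic; this short Python sketch (not from the original) uses the standard fractions module.

```python
from fractions import Fraction

p_heart = Fraction(13, 52)
p_queen = Fraction(4, 52)
p_club  = Fraction(13, 52)
p_queen_of_hearts = Fraction(1, 52)

# Compatible events: subtract the probability of the intersection.
print(p_heart + p_queen - p_queen_of_hearts)   # 4/13

# Incompatible events: the intersection is empty, so the probabilities just add.
print(p_heart + p_club)                         # 1/2
```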

FURTHER READING
• Event
• Independence
• Probability


Complementary

Consider the sample space Ω for a random experiment. For any event A, an element of Ω, we can determine a new event B that contains all of the elements of the sample space Ω that are not included in A. This event B is called the complement of A with respect to Ω and is obtained by the negation of A.

MATHEMATICAL ASPECTS
Consider an event A, which is an element of the sample space Ω. The complement of A with respect to Ω is denoted Ā. It is given by the negation of A:

Ā = Ω − A = {w ∈ Ω; w ∉ A} .

EXAMPLES
Consider a random experiment that consists of flipping a coin three times. The sample space of this experiment is

Ω = {TTT, TTH, THT, THH, HTT, HTH, HHT, HHH} .

Consider the event

A = "Heads (H) occurs twice" = {THH, HTH, HHT} .

The complement of A with respect to Ω is equal to

Ā = {TTT, TTH, THT, HTT, HHH} = "Heads (H) does not occur twice" .
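Set difference makes this immediate to verify; the following Python sketch (illustrative only) lists the sample space and computes the complement.

```python
from itertools import product

# Sample space of three coin flips, e.g. 'THH' means tails, heads, heads.
omega = {''.join(flips) for flips in product('TH', repeat=3)}

# Event A: heads occurs exactly twice.
a = {outcome for outcome in omega if outcome.count('H') == 2}

complement_of_a = omega - a
print(sorted(complement_of_a))   # ['HHH', 'HTT', 'THT', 'TTH', 'TTT']
```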

FURTHER READING
• Event
• Random experiment
• Sample space

Complete Linkage Method

The complete linkage method is a hierarchical classification method where the distance between two classes is defined as the greatest distance that could be obtained if we select one element from each class and measure the distance between these elements. In other words, it is the distance between the most distant elements from each class. If, for example, the Euclidean distance is used to construct the distance table, then under the complete linkage method the distance between two classes is given by the Euclidean distance between their most distant elements (the maximum distance).

MATHEMATICAL ASPECTS
See cluster analysis.

FURTHER READING
• Classification
• Cluster analysis
• Dendrogram
• Distance
• Distance table

Completely Randomized Design

A completely randomized design is a type of experimental design where the experimental units are randomly assigned to the different treatments. It is used when the experimental units are believed to be "uniform"; that is, when there is no uncontrolled factor in the experiment.

HISTORY
See design of experiments.


EXAMPLES
We want to test five different drugs based on aspirin. To do this, we randomly distribute the five types of drug to 40 patients. Denoting the five drugs by A, B, C, D and E, we obtain the following random distribution:

A is attributed to 10 people;
B is attributed to 12 people;
C is attributed to 4 people;
D is attributed to 7 people;
E is attributed to 7 people.

We then have a completely randomized design where the treatments (drugs) are randomly attributed to the experimental units (patients), and each patient receives only one treatment. We also assume that the patients are "uniform": that there are no differences between them. Moreover, we assume that there is no uncontrolled factor that intervenes during the treatment. In this example, the completely randomized design is a factorial experiment that uses only one factor: the aspirin. The five types of aspirin are different levels of the factor.
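A random assignment of this kind is easy to generate; the following Python sketch (illustrative, not part of the original entry) shuffles 40 patients and allocates them to the five drugs in the group sizes given above.

```python
import random

drugs = ['A'] * 10 + ['B'] * 12 + ['C'] * 4 + ['D'] * 7 + ['E'] * 7
patients = list(range(1, 41))

random.shuffle(patients)                 # random order of the 40 patients
assignment = dict(zip(patients, drugs))  # patient number -> drug received

print(assignment[1])                     # drug assigned to patient 1
```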

FURTHER READING
• Design of experiments
• Experiment

Composite Index Number

Composite index numbers allow us to measure, with a single number, the relative variations within a group of variables upon moving from one situation to another. The consumer price index, the wholesale price index, the employment index and the Dow-Jones index are all examples of composite index numbers. The aim of using composite index numbers is to summarize all of the simple index numbers contained in a complex number (a value formed from a set of simple values) in just one index. The most commonly used composite index numbers are:
• The Laspeyres index;
• The Paasche index;
• The Fisher index.

HISTORY
See index number.

MATHEMATICAL ASPECTS
There are several methods of creating composite index numbers. To illustrate these methods, let us use a scenario where a price index is determined for the current period n with respect to a reference period 0.
1. Index number of the arithmetic means (the sum method):

I_{n/0} = (∑Pn / ∑P0)·100 ,

where ∑Pn is the sum of the prices of the items at the current period, and ∑P0 is the sum of the prices of the items at the reference period.

2. Arithmetic mean of simple index numbers:

I_{n/0} = (1/N)·∑(Pn / P0)·100 ,

where N is the number of goods considered and Pn/P0 is the simple index number of each item. In these two methods, each item has the same importance, a situation which often does not correspond to reality.
3. Index number of weighted arithmetic means (the weighted sum method):
The general formula for an index number calculated by the weighted sum method is as follows:

I_{n/0} = (∑Pn·Q / ∑P0·Q)·100 .

Choosing the quantity Q for each item considered could prove problematic here: Q must be the same for both the numerator and the denominator when calculating a price index. In the Laspeyres index, the value of Q corresponding to the reference year is used. In the Paasche index, the value of Q for the current year is used. Other statisticians have proposed using the value of Q for a given year.

EXAMPLES
Consider the following table indicating the (fictitious) prices of three consumer goods in the reference year (1970) and their current prices.

Goods     1970 price P0 (francs)   Current price Pn (francs)
Milk           0.20                     1.20
Bread          0.15                     1.10
Butter         0.50                     2.00

Using these numbers, we now examine the three main methods of constructing composite index numbers.
1. Index number of arithmetic means (the sum method):

I_{n/0} = (∑Pn / ∑P0)·100 = (4.30 / 0.85)·100 = 505.9 .

According to this method, the price index has increased by 405.9% (505.9 − 100) between the reference year and the current year.

2. Arithmetic mean of simple index numbers:

I_{n/0} = (1/N)·∑(Pn / P0)·100 = (1/3)·(1.20/0.20 + 1.10/0.15 + 2.00/0.50)·100 = 577.8 .

This method gives a slightly different result from the previous one, since we obtain an increase of 477.8% (577.8 − 100) in the price index between the reference year and the current year.

3. Index number of weighted arithmetic means (the weighted sum method):

I_{n/0} = (∑Pn·Q / ∑P0·Q)·100 .

This method is used in conjunction with the Laspeyres index or the Paasche index.
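The two unweighted indexes computed above can be reproduced with a few lines of Python; the sketch below (illustrative, not from the original) uses the 1970 and current prices from the table.

```python
# Prices of the three goods in the reference year (1970) and today, in francs.
p0 = [0.20, 0.15, 0.50]   # milk, bread, butter in 1970
pn = [1.20, 1.10, 2.00]   # current prices

# 1. Sum method: ratio of the summed prices.
sum_method = sum(pn) / sum(p0) * 100
print(round(sum_method, 1))          # 505.9

# 2. Arithmetic mean of the simple index numbers.
mean_of_relatives = sum(p_n / p_0 for p_n, p_0 in zip(pn, p0)) / len(p0) * 100
print(round(mean_of_relatives, 1))   # 577.8
```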

FURTHER READING
• Fisher index
• Index number
• Laspeyres index
• Paasche index
• Simple index number

Conditional Probability

A conditional probability is the probability of an event given that another event is known to have occurred. The conditional probability is denoted P(A|B), which is read as the "probability of A conditioned by B."


HISTORY
The concept of independence dominated probability theory until the beginning of the twentieth century. In 1933, Kolmogorov, Andrei Nikolaievich introduced the concept of conditional probability; this concept now plays an essential role in theoretical and applied probability and statistics.

MATHEMATICAL ASPECTS
Consider a random experiment for which we know the sample space Ω, and consider two events A and B from this space. The probability of A, P(A), depends on the set of possible events in the experiment (Ω). Now suppose that we have supplementary information concerning the experiment: the event B has occurred. The probability of the event A occurring will then be a function of the space B rather than a function of Ω. The probability of A conditioned by B is calculated as follows:

P(A|B) = P(A ∩ B) / P(B) .

If A and B are two incompatible events, the intersection between A and B is an empty set. We then have P(A|B) = P(B|A) = 0.

DOMAINS AND LIMITATIONS
The concept of conditional probability is one of the most important ones in probability theory. This importance is mainly due to the following points:
1. We are often interested in calculating the probability of an event when some information about the result is already known. In this case, the probability required is a conditional probability.
2. Even when partial information on the result is not known, conditional probabilities can be useful when calculating the probabilities required.

EXAMPLES
Consider a group of 100 cars distributed according to two criteria, comfort and speed. We will make the following distinctions: a car can be fast or slow, and a car can be comfortable or uncomfortable. A partition of the 100 cars based on these criteria is provided in the following table:

                 fast   slow   total
comfortable       40     10      50
uncomfortable     20     30      50
total             60     40     100

Consider the following events:

A = "a fast car is chosen"  and  B = "a comfortable car is chosen."

The probabilities of these two events are:

P(A) = 0.6 ,   P(B) = 0.5 .

The probability of choosing a fast car is then 0.6, and that of choosing a comfortable car is 0.5.


Now imagine that we are given supplementary information: a fast car was chosen. What, then, is the probability that this car is also comfortable? We calculate the probability of B knowing that A has occurred, that is, the conditional probability of B given A. We find in the table that

P(A ∩ B) = 0.4 ,

so that

P(B|A) = P(A ∩ B) / P(A) = 0.4 / 0.6 = 0.667 .

The probability that the car is comfortable, given that we know that it is fast, is therefore 2/3.
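The calculation follows directly from the joint counts; the Python sketch below (illustrative only) derives P(B|A) from the 2×2 table.

```python
# Counts from the table: keys are (comfort, speed) pairs.
counts = {('comfortable', 'fast'): 40, ('comfortable', 'slow'): 10,
          ('uncomfortable', 'fast'): 20, ('uncomfortable', 'slow'): 30}
total = sum(counts.values())                                             # 100 cars

p_fast = sum(v for (c, s), v in counts.items() if s == 'fast') / total   # P(A) = 0.6
p_fast_and_comfortable = counts[('comfortable', 'fast')] / total         # P(A ∩ B) = 0.4

p_comfortable_given_fast = p_fast_and_comfortable / p_fast
print(round(p_comfortable_given_fast, 3))                                # 0.667
```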

FURTHER READING
• Event
• Probability
• Random experiment
• Sample space

REFERENCES
Kolmogorov, A.N.: Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin Heidelberg New York (1933)

Kolmogorov, A.N.: Foundations of the Theory of Probability. Chelsea Publishing Company, New York (1956)

Confidence Interval

A confidence interval is any interval constructed around an estimator that has a particular probability of containing the true value of the corresponding parameter of a population.

HISTORY
According to Desrosières, A. (1988), Bowley, A.L. was one of the first to be interested in the concept of the confidence interval. Bowley presented his first confidence interval calculations to the Royal Statistical Society in 1906.

MATHEMATICAL ASPECTS
In order to construct a confidence interval that contains the true value of the parameter θ with a given probability, an equation of the following form must be solved:

P(Li ≤ θ ≤ Ls) = 1 − α ,

where
θ is the parameter to be estimated,
Li is the lower limit of the interval,
Ls is the upper limit of the interval and
1 − α is the given probability, called the confidence level of the interval.

The probability α measures the error risk of the interval, meaning the probability that the interval does not contain the true value of the parameter θ. In order to solve this equation, a function f(t, θ) must be defined, where t is an estimator of θ, for which the probability distribution is known. Defining this interval for f(t, θ) involves writing the equation:

P(k1 ≤ f(t, θ) ≤ k2) = 1 − α ,

where the constants k1 and k2 are given by the probability distribution of the function f(t, θ). Generally, the error risk α is divided into two equal parts of α/2 distributed on each side of the distribution of f(t, θ). If, for example, the function f(t, θ) follows a centered and reduced normal distribution, the constants k1 and k2 will be symmetric and can be represented by −z_{α/2} and +z_{α/2}.


Once the constants k1 and k2 have been found, the parameter θ must be isolated in the equation given above. The confidence interval for θ at the confidence level 1 − α is found in this way.

DOMAINS AND LIMITATIONS
One should be very careful when interpreting a confidence interval. If, for a confidence level of 95%, we find a confidence interval for a mean μ where the lower and upper limits are k1 and k2 respectively, we can conclude the following (for example): "On the basis of the studied sample, we can affirm that it is probable that the mean of the population can be found in the interval established." It would not be correct to conclude that there is a 95% chance of finding the mean of the population in the interval. Indeed, since μ and the limits k1 and k2 of the interval are constants, the interval either contains μ or it does not. However, if the statistician has the ability to repeat the experiment (which consists of drawing a sample from the population) several times, 95% of the intervals obtained will contain the true value of μ.

EXAMPLES
A business that fabricates lightbulbs wants to test the average lifespan of its lightbulbs. The distribution of the random variable X, which represents the lifespan in hours, is a normal distribution with mean μ and standard deviation σ = 30. In order to estimate μ, the business burns out n = 25 lightbulbs. It obtains an average lifespan of x̄ = 860 hours. It wants to establish a confidence interval around the estimator x̄ at a confidence level of 0.95. Therefore, the first step is to obtain a function f(t, θ) = f(x̄, μ) with a known distribution. Here we use:

f(t, θ) = f(x̄, μ) = (x̄ − μ) / (σ/√n) ,

which follows a centered and reduced normal distribution. The equation P(k1 ≤ f(t, θ) ≤ k2) = 1 − α becomes:

P(−z0.025 ≤ (x̄ − μ)/(σ/√n) ≤ z0.025) = 0.95 ,

because the error risk α has been divided into two equal parts of α/2 = 0.025. The table for the centered and reduced normal distribution, the normal table, gives z0.025 = 1.96. Therefore:

P(−1.96 ≤ (x̄ − μ)/(σ/√n) ≤ 1.96) = 0.95 .

To obtain the confidence interval for μ at a confidence level of 0.95, μ must be isolated in the equation above:

P(−1.96 ≤ (x̄ − μ)/(σ/√n) ≤ 1.96) = 0.95
P(−1.96·σ/√n ≤ x̄ − μ ≤ 1.96·σ/√n) = 0.95
P(x̄ − 1.96·σ/√n ≤ μ ≤ x̄ + 1.96·σ/√n) = 0.95 .


By replacing x̄, σ and n with their respective values, we obtain:

P(860 − 1.96·30/√25 ≤ μ ≤ 860 + 1.96·30/√25) = 0.95
P(848.24 ≤ μ ≤ 871.76) = 0.95 .

The confidence interval for μ at the confidence level 0.95 is therefore:

[848.24, 871.76] .

This means that we can affirm with a probability of 95% that this interval contains the true value of the parameter μ that corresponds to the average lifespan of the lightbulbs.
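The numerical interval is straightforward to reproduce; here is a brief Python sketch (not from the original entry) for a normal-based confidence interval when σ is known.

```python
from math import sqrt

def normal_confidence_interval(x_bar, sigma, n, z=1.96):
    """Two-sided confidence interval for the mean when sigma is known."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

print(normal_confidence_interval(860, 30, 25))   # ≈ (848.24, 871.76)
```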

FURTHER READING
• Confidence level
• Estimation
• Estimator

REFERENCES
Bowley, A.L.: Presidential address to the economic section of the British Association. J. Roy. Stat. Soc. 69, 540–558 (1906)

Desrosières, A.: La partie pour le tout: comment généraliser? La préhistoire de la contrainte de représentativité. Estimation et sondages. Cinq contributions à l'histoire de la statistique. Economica, Paris (1988)

Confidence Level

The confidence level is the probability that the confidence interval constructed around an estimator contains the true value of the corresponding parameter of the population. We designate the confidence level by (1 − α), where α corresponds to the risk of error; that is, to the probability that the confidence interval does not contain the true value of the parameter.

HISTORY
The first example of a confidence interval appears in the work of Laplace (1812). According to Desrosières, A. (1988), Bowley, A.L. was one of the first to become interested in the concept of the confidence interval.
See hypothesis testing.

MATHEMATICAL ASPECTS
Let θ be a parameter associated with a population. θ is to be estimated and T is its estimator from a random sample. We evaluate the precision of T as the estimator of θ by constructing a confidence interval around the estimator, which is often interpreted as an error margin. In order to construct this confidence interval, we generally proceed in the following manner. From the distribution of the estimator T, we determine an interval that is likely to contain the true value of the parameter. Let us denote this interval by (T − ε, T + ε) and the probability of the true value of the parameter being in this interval as (1 − α). We can then say that the error margin ε is related to α by the probability:

P(T − ε ≤ θ ≤ T + ε) = 1 − α .

The level of probability associated with an interval of estimation is called the confidence level or the confidence degree. The interval T − ε ≤ θ ≤ T + ε is called the confidence interval for θ at the confidence level 1 − α. Let us use, for example, α = 5%, which will give the confidence interval of the parameter θ at a probability level of 95%. This means that, if we use T as an estimator of θ, then the interval indicated will on average contain the true value of the parameter 95 times out of 100 samplings, and it will not contain it 5 times. The quantity ε of the confidence interval corresponds to half of the length of the interval. This parameter therefore gives us an idea of the error margin for the estimator. For a given confidence level 1 − α, the smaller the confidence interval, the more efficient the estimator.

DOMAINS AND LIMITATIONS
The most commonly used confidence levels are 90%, 95% and 99%. However, if necessary, other levels can be used instead. Although we would like to use the highest possible confidence level in order to maximize the probability that the confidence interval contains the true value of the parameter, increasing the confidence level also increases the length of the interval. Therefore, what we gain in terms of confidence is lost in terms of precision, and we have to find a compromise.

EXAMPLES
A company that produces lightbulbs wants to study the mean lifetime of its bulbs. The distribution of the random variable X that represents the lifetime in hours is a normal distribution with mean μ and standard deviation σ = 30. In order to estimate μ, the company burns out a random sample of n = 25 bulbs. The company obtains an average bulb lifetime of x̄ = 860 hours. It then wants to construct a 95% confidence interval for μ around the estimator x̄.
The standard deviation σ of the population is known; the value of ε is z_{α/2}·σ_X̄. The value of z_{α/2} is obtained from the normal table, and it depends on the probability attributed to the parameter α. We then deduce the confidence interval of the estimator of μ at the probability level 1 − α:

X̄ − z_{α/2}·σ_X̄ ≤ μ ≤ X̄ + z_{α/2}·σ_X̄ .

From the hypothesis that the bulb lifetime X follows a normal distribution with mean μ and standard deviation σ = 30, we deduce that the expression

(x̄ − μ) / (σ/√n)

follows a standard normal distribution. From this we obtain:

P(−z0.025 ≤ (x̄ − μ)/(σ/√n) ≤ z0.025) = 0.95 ,

where the risk of error α is divided into two parts that both equal α/2 = 0.025. The table of the standard normal distribution (the normal table) gives z0.025 = 1.96. We then have:

P(−1.96 ≤ (x̄ − μ)/(σ/√n) ≤ 1.96) = 0.95 .

To get the confidence interval for μ, we must isolate μ via the following transformations:

P(−1.96·σ/√n ≤ x̄ − μ ≤ 1.96·σ/√n) = 0.95 ,
P(x̄ − 1.96·σ/√n ≤ μ ≤ x̄ + 1.96·σ/√n) = 0.95 .

Substituting x̄, σ and n for their respective values, we evaluate the confidence interval:

860 − 1.96·30/√25 ≤ μ ≤ 860 + 1.96·30/√25
848.24 ≤ μ ≤ 871.76 .

We can affirm with a confidence level of 95% that this interval contains the true value of the parameter μ, which corresponds to the mean bulb lifetime.

FURTHER READING
• Confidence interval
• Estimation
• Estimator
• Hypothesis testing

Contingency Table

A contingency table is a crossed table containing various attributes of a population or an observed sample. Contingency table analysis consists of discovering and studying the relations (if they exist) between these attributes. A contingency table can be a two-dimensional table with r lines and c columns relating to two qualitative categorical variables possessing, respectively, r and c categories. It can also be multidimensional when the number of qualitative variables is greater than two: if, for example, the elements of a population or a sample are characterized by three attributes, the associated contingency table has the dimensions I×J×K, where I represents the number of categories defining the first attribute, J the number of categories of the second attribute and K the number of categories of the third attribute.

HISTORY
The term "contingency," used in relation to a crossed table of categorical data, seems to have originated with Pearson, Karl (1904), who used the term "contingency" to mean a measure of the total deviation relative to independence.
See also chi-square test of independence.

MATHEMATICAL ASPECTS
If we consider a two-dimensional table, containing entries for two qualitative categorical variables X and Y that have, respectively, r and c categories, the contingency table is:

Categories of X    Y1    . . .   Yc     Total
X1                 n11   . . .   n1c    n1.
. . .              . . .         . . .  . . .
Xr                 nr1   . . .   nrc    nr.
Total              n.1   . . .   n.c    n..

where
nij represents the observed frequency for category i of variable X and category j of variable Y;
ni. represents the sum of the frequencies observed for category i of variable X,
n.j represents the sum of the observed frequencies for category j of variable Y, and
n.. indicates the total number of observations.
In the case of a multidimensional table, the elements of the table are denoted by nijk, representing the observed frequency for category i of variable X, category j of variable Y and category k of variable Z.

DOMAINS AND LIMITATIONS
The independence of two categorical qualitative variables represented in the contingency table can be assessed by performing a chi-square test of independence.

FURTHER READING
• Chi-square test of independence
• Frequency distribution

REFERENCES
Fienberg, S.E.: The Analysis of Cross-Classified Categorical Data, 2nd edn. MIT Press, Cambridge, MA (1980)

Pearson, K.: On the theory of contingency and its relation to association and normal correlation. Drapers' Company Research Memoirs, Biometric Ser. I., pp. 1–35 (1904)

Continuous Distribution Function

The distribution function of a continuous random variable is defined to be the probability that the random variable takes a value less than or equal to a real number.

HISTORY
See probability.

MATHEMATICAL ASPECTS
The function defined by

F(b) = P(X ≤ b) = ∫_{−∞}^{b} f(x) dx

is called the distribution function of a continuous random variable. In other words, the density function f is the derivative of the continuous distribution function.

Properties of the Continuous Distribution Function
1. F(x) is a continually increasing function for all x;
2. F takes its values in the interval [0, 1];
3. lim_{b→−∞} F(b) = 0;
4. lim_{b→∞} F(b) = 1;
5. F(x) is a continuous and differentiable function.
This distribution function can be graphically represented on a system of axes. The different values of the random variable X are plotted on the abscissa and the corresponding values of F(x) on the ordinate.

DOMAINS AND LIMITATIONS
The probability that the continuous random variable X takes a value in the interval ]a, b] for all a < b, meaning P(a < X ≤ b), is equal to F(b) − F(a), where F is the distribution function of the random variable X.
Demonstration: The event {X ≤ b} can be written as the union of two mutually exclusive events, {X ≤ a} and {a < X ≤ b}:

{X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} .

By finding the probability on each side of the equation, we obtain:

P(X ≤ b) = P({X ≤ a} ∪ {a < X ≤ b}) = P(X ≤ a) + P(a < X ≤ b) .

The sum of probabilities results from the fact that the two events are exclusive. By subtracting P(X ≤ a) on each side, we have:

P(a < X ≤ b) = P(X ≤ b) − P(X ≤ a) .

Finally, from the definition of the distribution function, we obtain:

P(a < X ≤ b) = F(b) − F(a) .

EXAMPLES
Consider a continuous random variable X for which the density function is given by:

f(x) = 1 if 0 < x < 1, and f(x) = 0 otherwise.

The probability that X takes a value in the interval [a, b], with 0 < a and b < 1, is as follows:

P(a ≤ X ≤ b) = ∫_a^b f(x) dx = ∫_a^b 1 dx = b − a .

Therefore, for 0 < x < 1 the distribution function is:

F(x) = P(X ≤ x) = P(0 ≤ X ≤ x) = x .
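As a quick numerical check (not from the original), the following Python sketch approximates P(a < X ≤ b) for this uniform density by integration and compares it with F(b) − F(a).

```python
# Numerical check of P(a < X <= b) = F(b) - F(a) for the uniform density on (0, 1).
def f(x):
    return 1.0 if 0 < x < 1 else 0.0

def F(x):                        # distribution function: F(x) = x on (0, 1)
    return min(max(x, 0.0), 1.0)

def integrate(func, a, b, steps=10_000):
    """Simple midpoint-rule integration of func over [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (i + 0.5) * h) for i in range(steps)) * h

a, b = 0.2, 0.7
print(round(integrate(f, a, b), 4))   # ≈ 0.5
print(round(F(b) - F(a), 4))          # 0.5
```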

FURTHER READING
• Density function
• Probability
• Random experiment
• Random variable
• Value

Continuous Probability Distribution

Every random variable has a corresponding frequency distribution. For a continuous random variable, this distribution is continuous too. A continuous probability distribution is a model that represents the frequency distribution of a continuous variable in the best way.

MATHEMATICAL ASPECTS
The probability distribution of a continuous random variable X is given by its density function f(x) or its distribution function F(x). It can generally be characterized by its expected value:

E[X] = ∫_D x·f(x) dx = μ

and its variance:

Var(X) = ∫_D (x − μ)²·f(x) dx = E[(X − μ)²] = E[X²] − E[X]² ,


where D represents the interval covering the range of values that X can take. One essential property of a continuous random variable is that the probability that it will take a specific numerical value is zero, whereas the probability that it will take a value over an interval (finite or infinite) is usually nonzero.

DOMAINS AND LIMITATIONS
The most famous continuous probability distribution is the normal distribution. Continuous probability distributions are often used to approximate discrete probability distributions. They are used in model construction just as much as they are used when applying statistical techniques.

FURTHER READING
• Beta distribution
• Cauchy distribution
• Chi-square distribution
• Continuous distribution function
• Density function
• Discrete probability distribution
• Expected value
• Exponential distribution
• Fisher distribution
• Gamma distribution
• Laplace distribution
• Lognormal distribution
• Normal distribution
• Probability
• Probability distribution
• Random variable
• Student distribution
• Uniform distribution
• Variance of a random variable

REFERENCES
Johnson, N.L., Kotz, S.: Distributions in Statistics: Continuous Univariate Distributions, vols. 1 and 2. Wiley, New York (1970)

Contrast

In analysis of variance, a contrast is a linear combination of the observations, factor levels or treatments in a factorial experiment, where the sum of the coefficients is zero.

HISTORY
According to Scheffé, H. (1953), Tukey, J.W. (1949 and 1951) was the first to propose a method of simultaneously estimating all of the contrasts.

MATHEMATICAL ASPECTS
Consider T1, T2, . . . , Tk, which are the sums of n1, n2, . . . , nk observations. The linear function

cj = c1j·T1 + c2j·T2 + · · · + ckj·Tk

is a contrast if and only if

∑_{i=1}^{k} ni·cij = 0 .

If each ni = n, meaning that each Ti is the sum of the same number of observations, the condition reduces to:

∑_{i=1}^{k} cij = 0 .

DOMAINS AND LIMITATIONS
In most experiments involving several treatments, it is interesting for the experimenter to make comparisons between the different treatments. The statistician uses contrasts to carry out this type of comparison.


EXAMPLES
When an analysis of variance is carried out for a three-level factor, some contrasts of interest are:

c1 = T1 − T2
c2 = T1 − T3
c3 = T2 − T3
c4 = T1 − 2·T2 + T3 .
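A quick way to check the defining condition is to sum the coefficients; this Python sketch (illustrative only) verifies it for the four contrasts above, assuming equal group sizes.

```python
# Each contrast is given by its coefficients (c1j, c2j, c3j) applied to T1, T2, T3.
contrasts = {
    'c1': (1, -1, 0),
    'c2': (1, 0, -1),
    'c3': (0, 1, -1),
    'c4': (1, -2, 1),
}

# With equal group sizes, a linear combination is a contrast
# if and only if its coefficients sum to zero.
for name, coeffs in contrasts.items():
    print(name, sum(coeffs) == 0)   # True for all four
```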

FURTHER READING
• Analysis of variance
• Experiment
• Factorial experiment

REFERENCES
Ostle, B.: Statistics in Research: Basic Concepts and Techniques for Research Workers. Iowa State College Press, Ames, IA (1954)

Scheffé, H.: A method for judging all contrasts in the analysis of variance. Biometrika 40, 87–104 (1953)

Tukey, J.W.: Comparing individual means in the analysis of variance. Biometrics 5, 99–114 (1949)

Tukey, J.W.: Quick and Dirty Methods in Statistics. Part II: Simple Analyses for Standard Designs. Quality Control Conference Papers 1951. American Society for Quality Control, New York, pp. 189–197 (1951)

Convergence

In statistics, the term “convergence” is related to probability theory. This statistical convergence is often termed stochastic convergence in order to distinguish it from classical convergence.

MATHEMATICAL ASPECTS
Different types of stochastic convergence can be defined. Let $\{X_n\}_{n \in \mathbb{N}}$ be a set of random variables. The most important types of stochastic convergence are:
1. $\{X_n\}_{n \in \mathbb{N}}$ converges in distribution to a random variable $X$ if

$$\lim_{n \to \infty} F_{X_n}(z) = F_X(z) \quad \forall z ,$$

where $F_{X_n}$ and $F_X$ are the distribution functions of $X_n$ and $X$, respectively. This convergence is simply the pointwise convergence (well known in mathematics) of the set of the distribution functions of the $X_n$.
2. $\{X_n\}_{n \in \mathbb{N}}$ converges in probability to a random variable $X$ if

$$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0 \quad \text{for every } \varepsilon > 0 .$$

3. $\{X_n\}_{n \in \mathbb{N}}$ exhibits almost sure convergence to a random variable $X$ if

$$P\left(\left\{ w \;\middle|\; \lim_{n \to \infty} X_n(w) = X(w) \right\}\right) = 1 .$$

4. Suppose that all elements of $\{X_n\}$ have a finite expectation. The set $\{X_n\}_{n \in \mathbb{N}}$ converges in mean square to $X$ if

$$\lim_{n \to \infty} E\left[(X_n - X)^2\right] = 0 .$$

Note that:
• Almost sure convergence and mean square convergence both imply convergence in probability;
• Convergence in probability (weak convergence) implies convergence in distribution.


EXAMPLES
Let $X_i$ be independent random variables uniformly distributed over $[0, 1]$. We define the following set of random variables from the $X_i$:

$$Z_n = n \cdot \min_{i=1,\ldots,n} X_i .$$

We can show that the set $\{Z_n\}_{n \in \mathbb{N}}$ converges in distribution to an exponential distribution $Z$ with parameter 1 as follows:

$$1 - F_{Z_n}(t) = P(Z_n > t) = P\left(\min_{i=1,\ldots,n} X_i > \frac{t}{n}\right) = P\left(X_1 > \frac{t}{n} \text{ and } \ldots \text{ and } X_n > \frac{t}{n}\right) \stackrel{\text{ind.}}{=} \prod_{i=1}^{n} P\left(X_i > \frac{t}{n}\right) = \left(1 - \frac{t}{n}\right)^n .$$

Since $\lim_{n \to \infty} \left(1 - \frac{t}{n}\right)^n = \exp(-t)$, we obtain:

$$\lim_{n \to \infty} F_{Z_n}(t) = \lim_{n \to \infty} P(Z_n \leq t) = 1 - \exp(-t) = F_Z(t) .$$

Finally, let $S_n$ be the number of successes obtained during $n$ Bernoulli trials with a probability of success $p$. Bernoulli's theorem tells us that $S_n / n$ converges in probability to a “random” variable that takes the value $p$ with probability 1.

FURTHER READING
• Bernoulli's theorem
• Central limit theorem
• Convergence theorem
• De Moivre–Laplace theorem
• Law of large numbers
• Probability
• Random variable
• Stochastic process

REFERENCES
Le Cam, L.M., Yang, G.L.: Asymptotics in Statistics: Some Basic Concepts. Springer, Berlin Heidelberg New York (1990)
Staudte, R.G., Sheather, S.J.: Robust Estimation and Testing. Wiley, New York (1990)

Convergence Theorem

The convergence theorems lead to the most important theoretical results in probability theory. Among them, we find the law of large numbers and the central limit theorem.

EXAMPLES
The central limit theorem and the law of large numbers are both convergence theorems.
The law of large numbers states that the mean of a sum of identically distributed random variables converges to their common mathematical expectation.
On the other hand, the central limit theorem states that the distribution of the sum of a sufficiently large number of random variables tends to approximate the normal distribution.
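Both statements can be illustrated by simulation; the following Python sketch (an illustrative example with an arbitrarily chosen exponential distribution of mean 2) shows the sample mean approaching the expectation, and standardized sums taking an approximately standard normal shape:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 2.0                                   # expectation of the chosen exponential variable

# Law of large numbers: the sample mean approaches mu as n grows
for n in (10, 1000, 100000):
    print(n, rng.exponential(scale=mu, size=n).mean())

# Central limit theorem: standardized sums of n variables look roughly N(0, 1)
n, reps = 50, 10000
sums = rng.exponential(scale=mu, size=(reps, n)).sum(axis=1)
z = (sums - n * mu) / np.sqrt(n * mu**2)   # variance of this exponential is mu^2
print(z.mean(), z.std())                   # close to 0 and 1
print(np.mean(z <= 1.96))                  # close to 0.975, as for a standard normal
```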

FURTHER READING
• Central limit theorem
• Law of large numbers

Correlation Coefficient

The simple correlation coefficient is a measure of the strength of the linear relation between two random variables.


The correlation coefficient can take values in the interval [−1, 1]. The two extreme values of this interval represent a perfectly linear relation between the variables, “positive” in the first case and “negative” in the other. The value 0 (zero) implies the absence of a linear relation.
The correlation coefficient presented here is also called the Bravais–Pearson correlation coefficient.

HISTORY
The concept of correlation originated in the 1880s with the work of Galton, F. In his autobiography Memories of My Life (1908), he writes that he thought of this concept during a walk in the grounds of Naworth Castle, when a rain shower forced him to find shelter.
According to Stigler, S.M. (1989), Porter, T.M. (1986) was carrying out historical research when he found a forgotten article written by Galton in 1890 in The North American Review, under the title “Kinship and correlation”. In this article, which was republished shortly after its rediscovery, Galton (1890) explained the nature and the importance of the concept of correlation.
This discovery was related to previous works of the mathematician, notably those on heredity and linear regression. Galton had been interested in this field of study since 1860. He published a work entitled “Natural Inheritance” (1889), which was the starting point for his thoughts on correlation.
In 1888, in an article sent to the Royal Society entitled “Co-relations and their measurement chiefly from anthropometric data,” Galton used the term “correlation” for the first time, although he was still alternating between the terms “co-relation” and “correlation” and he spoke of a “co-relation index.” On the other hand, he did not invoke the concept of a negative correlation: according to Stigler (1989), Galton only appeared to suggest that correlation was a positive relationship.
Pearson, Karl wrote in 1920 that correlation had been discovered by Galton, whose work “Natural Inheritance” (1889) pushed him to study this concept too, along with two other researchers, Weldon and Edgeworth. Pearson and Edgeworth then developed the theory of correlation.
Weldon thought the correlation coefficient should be called the “Galton function.” However, Edgeworth replaced Galton's term “co-relation index” and Weldon's term “Galton function” by the term “correlation coefficient.”
According to Mudholkar (1982), Pearson, K. systemized the analysis of correlation and established a theory of correlation for three variables. Researchers from University College, most notably his assistant Yule, G.U., were also interested in developing multiple correlation.
Spearman published the first study on rank correlation in 1904.
Among the works carried out in this field, it is worth highlighting those of Yule, who in an article entitled “Why do we sometimes get nonsense-correlations between time-series” (1926) discussed the problem of interpreting correlation analysis. Finally, correlation robustness was investigated by Mosteller and Tukey (1977).

MATHEMATICAL ASPECTS
Simple Linear Correlation Coefficient
Simple linear correlation is the term used to describe a linear dependence between two quantitative variables X and Y (see simple linear regression).


If $X$ and $Y$ are random variables that follow an unknown joint distribution, then the simple linear correlation coefficient is equal to the covariance between $X$ and $Y$ divided by the product of their standard deviations:

$$\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y} .$$

Here $\operatorname{Cov}(X, Y)$ is the covariance between $X$ and $Y$; $\sigma_X$ and $\sigma_Y$ are the respective standard deviations of $X$ and $Y$.
Given a sample of size $n$, $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, from the joint distribution, the quantity

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

is an estimate of $\rho$; it is the sample correlation coefficient.
If we denote $(X_i - \bar{X})$ by $x_i$ and $(Y_i - \bar{Y})$ by $y_i$, we can write this equation as:

$$r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i^2\right)}} .$$

Test of Hypothesis
To test the null hypothesis

$$H_0 : \rho = 0$$

against the alternative hypothesis

$$H_1 : \rho \neq 0 ,$$

we calculate the statistic

$$t = \frac{r}{S_r} ,$$

where $S_r$ is the standard deviation of the estimator $r$:

$$S_r = \sqrt{\frac{1 - r^2}{n - 2}} .$$

Under $H_0$, the statistic $t$ follows a Student distribution with $n - 2$ degrees of freedom. For a given significance level $\alpha$, $H_0$ is rejected if $|t| \geq t_{\alpha/2,\, n-2}$; the value $t_{\alpha/2,\, n-2}$ is the critical value of the test given in the Student table.

Multiple Correlation Coefficient
The coefficient of determination, denoted by $R^2$, indicates how well the hyperplane estimated from a multiple linear regression fits the data points. Its value is equal to:

$$R^2 = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} .$$

It corresponds to the square of the multiple correlation coefficient. Notice that

$$0 \leq R^2 \leq 1 .$$

In the case of simple linear regression, the following relation can be derived:

$$r = \operatorname{sign}(\hat{\beta}_1) \sqrt{R^2} ,$$


where $\hat{\beta}_1$ is the estimator of the regression coefficient $\beta_1$, given by:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} .$$

DOMAINS AND LIMITATIONS
If there is a linear relation between two variables, the correlation coefficient is equal to 1 or −1.
A positive relation (+) means that the two variables vary in the same direction. If the individuals obtain high scores on the first variable (for example the independent variable), they will have a tendency to obtain high scores on the second variable (the dependent variable). The opposite is also true.
A negative relation (−) means that the individuals that obtain high scores on the first variable will have a tendency to obtain low scores on the second one, and vice versa.
Note that if the variables are independent, the correlation coefficient is equal to zero. The reciprocal conclusion is not necessarily true.
The fact that two or more variables are related in a statistical way is not sufficient to conclude that a cause-and-effect relation exists. The existence of a statistical correlation is not a proof of causality.
Statistics provides numerous correlation coefficients. The choice of which to use for a particular set of data depends on different factors, such as:
• The type of scale used to express the variables;
• The nature of the underlying distribution (continuous or discrete);
• The characteristics of the distribution of the scores (linear or nonlinear).

EXAMPLES
The data for two variables X and Y are shown in the table below:

No. of order |  X  |  Y  | x = X − X̄ | y = Y − Ȳ |   xy   |   x²   |   y²
      1      | 174 | 64  |   −1.5    |   −1.3    |  1.95  |  2.25  |   1.69
      2      | 175 | 59  |   −0.5    |   −6.3    |  3.15  |  0.25  |  39.69
      3      | 180 | 64  |    4.5    |   −1.3    | −5.85  | 20.25  |   1.69
      4      | 168 | 62  |   −7.5    |   −3.3    | 24.75  | 56.25  |  10.89
      5      | 175 | 51  |   −0.5    |  −14.3    |  7.15  |  0.25  | 204.49
      6      | 170 | 60  |   −5.5    |   −5.3    | 29.15  | 30.25  |  28.09
      7      | 170 | 68  |   −5.5    |    2.7    | −14.85 | 30.25  |   7.29
      8      | 178 | 63  |    2.5    |   −2.3    | −5.75  |  6.25  |   5.29
      9      | 187 | 92  |   11.5    |   26.7    | 307.05 | 132.25 | 712.89
     10      | 178 | 70  |    2.5    |    4.7    | 11.75  |  6.25  |  22.09
   Total     | 1755| 653 |           |           | 358.5  | 284.5  | 1034.1

$\bar{X} = 175.5$ and $\bar{Y} = 65.3$.

We now perform the necessary calculations to obtain the correlation coefficient between the two variables. Applying the formula gives:

$$r = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i^2\right)}} = \frac{358.5}{\sqrt{284.5 \cdot 1034.1}} = 0.66 .$$

Test of Hypothesis
We can calculate the estimated standard deviation of $r$:

$$S_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{0.56}{10 - 2}} = 0.266 .$$


Calculating the statistic $t$ gives:

$$t = \frac{r - 0}{S_r} = \frac{0.66}{0.266} = 2.485 .$$

If we choose a significance level $\alpha$ of 5%, the value from the Student table, $t_{0.025, 8}$, is equal to 2.306.
Since $|t| = 2.485 > t_{0.025, 8} = 2.306$, the null hypothesis

$$H_0 : \rho = 0$$

is rejected.
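The calculations of this example are easy to reproduce; the following Python sketch (assuming SciPy is available for the Student critical value) recomputes $r$, $S_r$ and $t$ from the table above and compares $|t|$ with the critical value:

```python
import numpy as np
from scipy import stats

X = np.array([174, 175, 180, 168, 175, 170, 170, 178, 187, 178], dtype=float)
Y = np.array([64, 59, 64, 62, 51, 60, 68, 63, 92, 70], dtype=float)
n = len(X)

x, y = X - X.mean(), Y - Y.mean()
r = np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2))   # about 0.66

S_r = np.sqrt((1 - r**2) / (n - 2))
t = r / S_r
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)                # about 2.306

print(round(r, 3), round(t, 3), round(t_crit, 3), abs(t) > t_crit)
```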

FURTHER READING
• Coefficient of determination
• Covariance
• Dependence
• Kendall rank correlation coefficient
• Multiple linear regression
• Regression analysis
• Simple linear regression
• Spearman rank correlation coefficient

REFERENCES
Galton, F.: Co-relations and their measurement, chiefly from anthropometric data. Proc. Roy. Soc. Lond. 45, 135–145 (1888)
Galton, F.: Natural Inheritance. Macmillan, London (1889)
Galton, F.: Kinship and correlation. North Am. Rev. 150, 419–431 (1890)
Galton, F.: Memories of My Life. Methuen, London (1908)
Mosteller, F., Tukey, J.W.: Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley, Reading, MA (1977)
Mudholkar, G.S.: Multiple correlation coefficient. In: Kotz, S., Johnson, N.L. (eds.) Encyclopedia of Statistical Sciences, vol. 5. Wiley, New York (1982)
Pearson, K.: Studies in the history of statistics and probability. Biometrika 13, 25–45 (1920). Reprinted in: Pearson, E.S., Kendall, M.G. (eds.) Studies in the History of Statistics and Probability, vol. I. Griffin, London
Porter, T.M.: The Rise of Statistical Thinking, 1820–1900. Princeton University Press, Princeton, NJ (1986)
Stigler, S.: Francis Galton's account of the invention of correlation. Stat. Sci. 4, 73–79 (1989)
Yule, G.U.: On the theory of correlation. J. Roy. Stat. Soc. 60, 812–854 (1897)
Yule, G.U.: Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time-series. J. Roy. Stat. Soc. (2) 89, 1–64 (1926)

Correspondence Analysis

Correspondence analysis is a data analysis technique used to describe contingency tables (or crossed tables). This analysis takes the form of a graphical representation of the associations and the “correspondence” between rows and columns.

HISTORY
The theoretical principles of correspondence analysis date back to the works of Hartley, H.O. (1935) (published under his original name Hirschfeld) and of Fisher, R.A. (1940) on contingency tables. They were first presented in the framework of inferential statistics.
The term “correspondence analysis” first appeared in the autumn of 1962, and the first presentation of this method that referred to


this term was given by Benzécri, J.P. in the winter of 1963. In 1976 the works of Benzécri, J.P., which retraced twelve years of his laboratory work, were published, and since then the algebraic and geometrical properties of this descriptive analytical tool have become more widely known and used.

MATHEMATICAL ASPECTS
Consider a contingency table relating to two categorical qualitative variables X and Y that have, respectively, r and c categories:

         Y1     Y2     ...    Yc     Total
X1       n11    n12    ...    n1c    n1.
X2       n21    n22    ...    n2c    n2.
...      ...    ...    ...    ...    ...
Xr       nr1    nr2    ...    nrc    nr.
Total    n.1    n.2    ...    n.c    n..

where
n_ij represents the observed frequency of category i of variable X together with category j of variable Y,
n_i. represents the sum of the observed frequencies for category i of variable X,
n_.j represents the sum of the observed frequencies for category j of variable Y,
n.. represents the total number of observations.

We will assume that r ≥ c; if not, we take the transpose of the initial table and use this transpose as the new contingency table. The correspondence analysis of a contingency table with more rows than columns is performed as follows:
1. Tables of row profiles X_I and column profiles X_J are constructed.
For a fixed row (column), the row (column) profile is the row (column) obtained by dividing each element in this row (column) by the sum of the elements in the row (column).
The row profile of row i is obtained by dividing each term of row i by n_i., which is the sum of the observed frequencies in the row. The table of row profiles is constructed by replacing each row of the contingency table with its profile:

         Y1          Y2          ...    Yc          Total
X1       n11/n1.     n12/n1.     ...    n1c/n1.     1
X2       n21/n2.     n22/n2.     ...    n2c/n2.     1
...      ...         ...         ...    ...         ...
Xr       nr1/nr.     nr2/nr.     ...    nrc/nr.     1
Total    n'.1        n'.2        ...    n'.c        r

It is also common to multiply each element of the table by 100 in order to convert the terms into percentages and to make the sum of the terms in each row 100%.
The column profile matrix is constructed in a similar way, but this time each column of the contingency table is replaced with its profile: the column profile of column j is obtained by dividing each term of column j by n_.j, which is the sum of the frequencies observed for the category corresponding to this column.

         Y1          Y2          ...    Yc          Total
X1       n11/n.1     n12/n.2     ...    n1c/n.c     n'1.
X2       n21/n.1     n22/n.2     ...    n2c/n.c     n'2.
...      ...         ...         ...    ...         ...
Xr       nr1/n.1     nr2/n.2     ...    nrc/n.c     n'r.
Total    1           1           ...    1           c

The tables of row profiles and column profiles correspond to a transformation of the contingency table that is used to make the rows and columns comparable.


2. Determine the inertia matrix V. This is done in the following way:
• The weighted mean of the r row points is calculated:

$$g_j = \sum_{i=1}^{r} \frac{n_{i.}}{n_{..}} \cdot \frac{n_{ij}}{n_{i.}} = \frac{n_{.j}}{n_{..}} , \quad j = 1, \ldots, c .$$

• The c values $g_j$ obtained in this way are written r times in the rows of a matrix G;
• The diagonal matrix $D_I$ is constructed with diagonal elements $n_{i.}/n_{..}$;
• Finally, the inertia matrix V is calculated using the following formula:

$$V = (X_I - G)' \cdot D_I \cdot (X_I - G) .$$

3. Using the matrix M, which has the terms $n_{..}/n_{.j}$ on its diagonal and zero terms elsewhere, we determine the matrix C:

$$C = \sqrt{M} \cdot V \cdot \sqrt{M} .$$

4. Find the eigenvalues (denoted $k_l$) and the eigenvectors (denoted $v_l$) of this matrix C.
The c eigenvalues $k_1 \geq k_2 \geq \ldots \geq k_c$ (written in decreasing order) are the inertias. The corresponding eigenvectors are called the factorial axes (or axes of inertia).
For each eigenvalue, we calculate the corresponding inertia explained by the factorial axis. For example, the first factorial axis explains

$$100 \cdot \frac{k_1}{\sum_{l=1}^{c} k_l} \quad \text{(in \%) of the inertia.}$$

In the same way, the first two factorial axes explain

$$100 \cdot \frac{k_1 + k_2}{\sum_{l=1}^{c} k_l} \quad \text{(in \%) of the inertia.}$$

If we want to know, for example, the number of eigenvalues, and therefore of factorial axes, that explain at least 3/4 of the total inertia, we sum the inertia explained by each of the eigenvalues until we reach 75%. We then calculate the main axes of inertia from these factorial axes.

5. The main axes of inertia, denoted $u_l$, are then given by:

$$u_l = \sqrt{M^{-1}} \cdot v_l ,$$

meaning that the jth component is:

$$u_{jl} = \sqrt{\frac{n_{.j}}{n_{..}}} \cdot v_{jl} .$$

6. We then calculate the main components, denoted $y_l$, which are the orthogonal projections of the row points onto the main axes of inertia: the ith coordinate of the lth main component takes the following value:

$$y_{il} = x_i \cdot M \cdot u_l ,$$

meaning that

$$y_{il} = \sum_{j=1}^{c} \frac{n_{ij}}{n_{i.}} \sqrt{\frac{n_{..}}{n_{.j}}} \cdot v_{jl}$$

is the coordinate of row i on the lth axis.
7. After the main components $y_l$ (of the row points) have been calculated, we determine the main components of the column points (denoted $z_l$) from the $y_l$, thanks to the transition formulae:

$$z_{jl} = \frac{1}{\sqrt{k_l}} \sum_{i=1}^{r} \frac{n_{ij}}{n_{.j}} \cdot y_{il} , \quad j = 1, 2, \ldots, c ;$$

$$y_{il} = \frac{1}{\sqrt{k_l}} \sum_{j=1}^{c} \frac{n_{ij}}{n_{i.}} \cdot z_{jl} , \quad i = 1, 2, \ldots, r .$$


8. The simultaneous representation of therow coordinates and the column coordi-nates on a scatter plot with two factori-al axes, gives a signification to the axisdepending on the points it is related to.Thequality of the representation is relatedto the proportion of the inertia explainedby the two main axes used. The closer theexplained inertia is to 1, the better thequality.

9. It can be useful to insert additional point-rows or column coordinates (illustrativevariables) along with the active variablesused for the correspondence analysis.Consider an extra row-coordinate:

(ns1, ns2, ns3, . . . , nsc)

of profile-row(

ns1

ns.,

ns2

ns., . . . ,

nsc

ns.

).

The lth main component of the extra row-dot is given by the second formula ofpoint 7):

yil = 1√kl

c∑j=1

nsj

ns.· zjl ,

for i = 1, 2, . . . , r .

We proceed in the same way for an extra column point, but apply the first formula of point 7 instead.

DOMAINS AND LIMITATIONS
When the row points and the column points are represented simultaneously, it is not advisable to interpret apparent proximities between a row and a column, because the two points do not lie in the same initial space. On the other hand, it is meaningful to interpret the position of a row point by comparing it to the set of column points (and vice versa).

EXAMPLES
Consider the example of a company that wants to find out how healthy its staff are. One of the subjects covered in the questionnaire concerns the number of medical visits (dentists included) per year. Three possible answers are proposed:
• Between 0 and 6 visits;
• Between 7 and 12 visits;
• More than 12 visits per year.
The staff questioned are divided into five categories:
• Managerial staff members over 40;
• Managerial staff members under 40;
• Employees over 40;
• Employees under 40;
• Office personnel.
The results are reported in the following contingency table, to which margins have been added:

                                     Number of visits per year
                                     0 to 6    7 to 12    > 12    Total
Managerial staff members > 40           5          7        3       15
Managerial staff members < 40           5          5        2       12
Employees > 40                          2         12        6       20
Employees < 40                         24         18       12       54
Office personnel                       12          8        4       24
Total                                  48         50       27      125

Using these data, we will describe the eight main steps that should be followed to obtain a graphical representation of employee health via correspondence analysis.
1. We first determine the table of row profiles X_I by dividing each element by the sum of the elements of the row in which it is located:

0.333   0.467   0.200   1
0.417   0.417   0.167   1
0.100   0.600   0.300   1
0.444   0.333   0.222   1
0.500   0.333   0.167   1
1.794   2.150   1.056   5

and the table of column profiles by dividing each element by the sum of the elements in the corresponding column:

0.104   0.140   0.111   0.355
0.104   0.100   0.074   0.278
0.042   0.240   0.222   0.504
0.500   0.360   0.444   1.304
0.250   0.160   0.148   0.558
1       1       1       3

2. We then calculate the inertia matrix V, weighting the rows by their frequencies $n_{i.}/n_{..}$ (for i from 1 to 5), by proceeding in the following way:
• We start by calculating the weighted mean of the five row points:

$$g_j = \sum_{i=1}^{5} \frac{n_{i.}}{n_{..}} \cdot \frac{n_{ij}}{n_{i.}} = \frac{n_{.j}}{n_{..}} \quad \text{for } j = 1, 2 \text{ and } 3 ;$$

• We then write these three values five times in the matrix G, as shown below:

$$G = \begin{bmatrix} 0.384 & 0.400 & 0.216 \\ 0.384 & 0.400 & 0.216 \\ 0.384 & 0.400 & 0.216 \\ 0.384 & 0.400 & 0.216 \\ 0.384 & 0.400 & 0.216 \end{bmatrix} .$$

• We then construct a diagonal matrix $D_I$ containing the $n_{i.}/n_{..}$;
• Finally, we calculate the inertia matrix V, given by the formula

$$V = (X_I - G)' \cdot D_I \cdot (X_I - G) ,$$

which, in this case, gives:

$$V = \begin{bmatrix} 0.0175 & -0.0127 & -0.0048 \\ -0.0127 & 0.0097 & 0.0029 \\ -0.0048 & 0.0029 & 0.0019 \end{bmatrix} .$$

3. We define a third-order square matrix M that contains the terms $n_{..}/n_{.j}$ on its diagonal and zero terms everywhere else:

$$M = \begin{bmatrix} 2.604 & 0 & 0 \\ 0 & 2.5 & 0 \\ 0 & 0 & 4.630 \end{bmatrix} .$$

The square root of M, denoted $\sqrt{M}$, is obtained by taking the square root of each diagonal element of M. Using this new matrix, we determine

$$C = \sqrt{M} \cdot V \cdot \sqrt{M} = \begin{bmatrix} 0.0455 & -0.0323 & -0.0167 \\ -0.0323 & 0.0243 & 0.0100 \\ -0.0167 & 0.0100 & 0.0087 \end{bmatrix} .$$


4. The eigenvalues of C are obtained by diagonalizing this matrix. Arranging these values in decreasing order gives:

$$k_1 = 0.0746 , \quad k_2 = 0.0039 , \quad k_3 = 0 .$$

The explained inertia is determined for each of these values. For example:

$$k_1 : \frac{0.0746}{0.0746 + 0.0039} = 95.03\% , \qquad k_2 : \frac{0.0039}{0.0746 + 0.0039} = 4.97\% .$$

For the last one, $k_3$, the explained inertia is zero.
The first two factorial axes, associated with the eigenvalues $k_1$ and $k_2$, therefore explain all of the inertia. Since the third eigenvalue is zero, it is not necessary to calculate the eigenvector associated with it. We then calculate the first two normalized eigenvectors:

$$v_1 = \begin{bmatrix} 0.7807 \\ -0.5576 \\ -0.2821 \end{bmatrix} \quad \text{and} \quad v_2 = \begin{bmatrix} 0.0807 \\ 0.5377 \\ -0.8393 \end{bmatrix} .$$

5. We then calculate the main axes of inertia:

$$u_i = \sqrt{M^{-1}} \cdot v_i , \quad i = 1 \text{ and } 2 ,$$

where $\sqrt{M^{-1}}$ is obtained by inverting the diagonal elements of $\sqrt{M}$. We find:

$$u_1 = \begin{bmatrix} 0.4838 \\ -0.3527 \\ -0.1311 \end{bmatrix} \quad \text{and} \quad u_2 = \begin{bmatrix} 0.0500 \\ 0.3401 \\ -0.3901 \end{bmatrix} .$$

6. We then calculate the main components by projecting the rows onto the main axes of inertia. Constructing an auxiliary matrix U formed from the two vectors $u_1$ and $u_2$, we define:

$$Y = X_I \cdot M \cdot U = \begin{bmatrix} -0.1129 & 0.0790 \\ 0.0564 & 0.1075 \\ -0.5851 & 0.0186 \\ 0.1311 & -0.0600 \\ 0.2349 & 0.0475 \end{bmatrix} .$$

The coordinates of the five rows are written horizontally in this matrix Y: the first column gives the components related to the first factorial axis and the second column those related to the second axis.
7. We then use the coordinates of the rows to find those of the columns via the transition formulae written in matrix form:

$$Z = K \cdot Y' \cdot X_J ,$$

where:
K is the second-order diagonal matrix with the terms $1/\sqrt{k_i}$ on its diagonal and zero terms elsewhere;
Y′ is the transpose of the matrix containing the coordinates of the rows; and
X_J is the column profile matrix.


We obtain the matrix:

$$Z = \begin{bmatrix} 0.3442 & -0.2409 & -0.1658 \\ 0.0081 & 0.0531 & -0.1128 \end{bmatrix} ,$$

where each column contains the components of one of the three column points: the first row corresponds to the coordinates on the first factorial axis and the second row to those on the second axis.
We can verify the transition formula that gives Y from Z:

$$Y = X_I \cdot Z' \cdot K ,$$

using the same notation as previously.

8. We can now represent the five categories of people questioned and the three categories of answers proposed on the same factorial plot.
We can study this factorial plot at three different levels of analysis, depending on:
• The set of categories for the people questioned;
• The set of modalities for the medical visits;
• Both at the same time.
In the first case, close proximity between two rows (between two categories of personnel) signifies similar medical-visit profiles. On the factorial plot, this is the case for the employees under 40 (Y4) and the office personnel (Y5). We can verify from the table of row profiles that the percentages for these two categories are indeed very similar.
Similarly, proximity between two columns (representing two categories related to the number of medical visits) indicates similar distributions of people within the business for these categories. This can be seen for the modalities Z2 (from 7 to 12 medical visits per year) and Z3 (more than 12 visits per year).
If we consider the rows and the columns simultaneously (and not separately as we did previously), it becomes possible to identify similarities between categories for certain modalities. For example, the employees under 40 (Y4) and the office personnel (Y5) seem to have the same behavior towards health: high proportions of them (0.44 and 0.5 respectively) go to the doctor at most 6 times per year (Z1).
In conclusion, axis 1 contrasts, on one side, the categories indicating an average or high number of visits (Z2 and Z3) together with the managerial staff members and employees over 40 (Y1 and Y3), with, on the other side, the modality associated with low numbers of visits (Z1) together with Y2, Y4 and Y5. The first factor can then be interpreted as reflecting the importance of medical check-ups according to age.
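The numerical steps of this example can be reproduced with a few lines of Python; the following sketch follows the procedure described above (NumPy is assumed to be available, and the signs of the eigenvectors, and hence of the coordinates, are determined only up to a factor of ±1):

```python
import numpy as np

# Contingency table (rows: staff categories, columns: 0-6, 7-12, >12 visits)
N = np.array([[5, 7, 3],
              [5, 5, 2],
              [2, 12, 6],
              [24, 18, 12],
              [12, 8, 4]], dtype=float)

n = N.sum()                                 # n.. = 125
row_tot, col_tot = N.sum(axis=1), N.sum(axis=0)

XI = N / row_tot[:, None]                   # table of row profiles
G = np.tile(col_tot / n, (N.shape[0], 1))   # weighted mean profile repeated
DI = np.diag(row_tot / n)
V = (XI - G).T @ DI @ (XI - G)              # inertia matrix
M = np.diag(n / col_tot)
C = np.sqrt(M) @ V @ np.sqrt(M)

k, v = np.linalg.eigh(C)                    # eigenvalues and eigenvectors
order = np.argsort(k)[::-1]
k, v = k[order], v[:, order]

U = np.sqrt(np.diag(col_tot / n)) @ v[:, :2]            # main axes of inertia
Y = XI @ M @ U                                          # row coordinates
Z = np.diag(1 / np.sqrt(k[:2])) @ Y.T @ (N / col_tot)   # column coordinates

print(np.round(k[:2], 4))   # close to 0.0746 and 0.0039
print(np.round(Y, 4))       # row coordinates on the first two axes (up to sign)
print(np.round(Z, 4))       # column coordinates (up to sign)
```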

FURTHER READING
• Contingency table
• Data analysis
• Eigenvalue
• Eigenvector
• Factorial axis
• Graphical representation
• Inertia matrix
• Matrix
• Scatterplot


REFERENCES
Benzécri, J.P.: L'Analyse des données. Vol. 1: La Taxinomie. Vol. 2: L'Analyse factorielle des correspondances. Dunod, Paris (1976)
Benzécri, J.P.: Histoire et préhistoire de l'analyse des données, les cahiers de l'analyse des données 1, no. 1–4. Dunod, Paris (1976)
Fisher, R.A.: The precision of discriminant functions. Ann. Eugen. (London) 10, 422–429 (1940)
Greenacre, M.: Theory and Applications of Correspondence Analysis. Academic, London (1984)
Hirschfeld, H.O.: A connection between correlation and contingency. Proc. Camb. Philos. Soc. 31, 520–524 (1935)
Lebart, L., Salem, A.: Analyse Statistique des Données Textuelles. Dunod, Paris (1988)
Saporta, G.: Probabilité, analyse de données et statistiques. Technip, Paris (1990)

Covariance

The covariance between two random variables X and Y is a measure of how much the two variables vary together.
If X and Y are independent random variables, the covariance of X and Y is zero. The converse, however, is not true.

MATHEMATICAL ASPECTS
Consider X and Y, two random variables defined in the same sample space Ω. The covariance of X and Y, denoted by Cov(X, Y), is defined by

$$\operatorname{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] ,$$

where E[.] is the expected value.
Developing the right side of the equation gives:

$$\operatorname{Cov}(X, Y) = E[XY - E[X]Y - XE[Y] + E[X]E[Y]] = E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y] .$$

Properties of Covariance
Consider X, Y and Z, random variables defined in the same sample space Ω, and a, b, c and d, which are constants. We find that:
1. Cov(X, Y) = Cov(Y, X)
2. Cov(X, c) = 0
3. Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z)
4. Cov(X, cY + dZ) = c Cov(X, Y) + d Cov(X, Z)
5. Cov(aX + b, cY + d) = ac Cov(X, Y).

Consequences of the Definition
1. If X and Y are independent random variables, Cov(X, Y) = 0. In fact $E[XY] = E[X]E[Y]$, meaning that:

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0 .$$

The reverse is not generally true: Cov(X, Y) = 0 does not necessarily imply that X and Y are independent.
2. Cov(X, X) = Var(X), where Var(X) represents the variance of X. In fact:

$$\operatorname{Cov}(X, X) = E[XX] - E[X]E[X] = E[X^2] - (E[X])^2 = \operatorname{Var}(X) .$$


DOMAINS AND LIMITATIONS
Consider two random variables X and Y, and their sum X + Y. We then have:

$$E[X + Y] = E[X] + E[Y]$$

and

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\,\operatorname{Cov}(X, Y) .$$

We now show these results for discrete variables. If $P_{ji} = P(X = x_i, Y = y_j)$, we have:

$$E[X + Y] = \sum_i \sum_j (x_i + y_j) P_{ji} = \sum_i \sum_j x_i P_{ji} + \sum_i \sum_j y_j P_{ji} = \sum_i x_i \Big(\sum_j P_{ji}\Big) + \sum_j y_j \Big(\sum_i P_{ji}\Big) = \sum_i x_i P_i + \sum_j y_j P_j = E[X] + E[Y] .$$

Moreover:

$$\operatorname{Var}(X + Y) = E[(X + Y)^2] - (E[X + Y])^2 = E[X^2] + 2E[XY] + E[Y^2] - (E[X] + E[Y])^2 = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\,(E[XY] - E[X]E[Y]) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\,\operatorname{Cov}(X, Y) .$$

These results can be generalized to n random variables $X_1, X_2, \ldots, X_n$, where $X_i$ has expected value $E(X_i)$ and variance $\operatorname{Var}(X_i)$. We then have:

$$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = \sum_{i=1}^{n} E[X_i] ,$$

$$\operatorname{Var}(X_1 + X_2 + \cdots + X_n) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum_{i=1}^{n-1} \sum_{j > i} \operatorname{Cov}(X_i, X_j) .$$
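The identity Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) is easy to verify numerically; the following Python sketch (using simulated, hypothetical data, with population-style variances computed with divisor n) checks it on a sample of deliberately correlated variables:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=100000)
y = 0.7 * x + rng.normal(size=100000)            # X and Y are correlated by construction

cov_xy = np.mean(x * y) - x.mean() * y.mean()    # Cov(X, Y) = E[XY] - E[X]E[Y]
print(cov_xy)                                    # close to 0.7
print(np.var(x + y), np.var(x) + np.var(y) + 2 * cov_xy)   # the two values agree
```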

EXAMPLES
Consider two psychological tests carried out in succession. Each subject receives a grade X of between 0 and 3 for the first test and a grade Y of between 0 and 2 for the second test. Given that the probabilities of X being equal to 0, 1, 2 and 3 are respectively 0.16, 0.3, 0.41 and 0.13, and that the probabilities of Y being equal to 0, 1 and 2 are respectively 0.55, 0.32 and 0.13, we have:

$$E[X] = 0 \cdot 0.16 + 1 \cdot 0.3 + 2 \cdot 0.41 + 3 \cdot 0.13 = 1.51$$
$$E[Y] = 0 \cdot 0.55 + 1 \cdot 0.32 + 2 \cdot 0.13 = 0.58$$
$$E[XY] = 0 \cdot 0 \cdot 0.16 \cdot 0.55 + 0 \cdot 1 \cdot 0.16 \cdot 0.32 + \ldots + 3 \cdot 2 \cdot 0.13 \cdot 0.13 = 0.88 .$$


We can then calculate the covariance of X and Y:

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]E[Y] = 0.88 - (1.51 \cdot 0.58) = 0.0042 .$$

FURTHER READING
• Correlation coefficient
• Expected value
• Random variable
• Variance of a random variable

Covariance Analysis

Covariance analysis is a method used to estimate and test the effects of treatments. It checks whether there is a significant difference between the means of several treatments by taking into account the values of the variable observed before the treatment. Covariance analysis is a precise way of performing treatment comparisons because it involves adjusting the response variable Y to a concomitant variable X, which corresponds to the values observed before the treatment.

HISTORY
Covariance analysis dates back to 1930. It was first developed by Fisher, R.A. (1932). After that, other authors applied covariance analysis to agricultural and medical problems. For example, Bartlett, M.S. (1937) applied covariance analysis to his studies on cotton cultivation in Egypt and on milk yields from cows in winter.
DeLury, D.B. (1948) used covariance analysis to compare the effects of different medications (atropine, quinidine, atrophine) on rat muscles.

MATHEMATICAL ASPECTS
We consider here a covariance analysis of a completely randomized design involving one factor. The linear model that we will consider is the following:

$$Y_{ij} = \mu + \tau_i + \beta X_{ij} + \varepsilon_{ij} , \quad i = 1, 2, \ldots, t , \quad j = 1, 2, \ldots, n_i ,$$

where
$Y_{ij}$ represents observation j receiving treatment i,
$\mu$ is the general mean common to all treatments,
$\tau_i$ is the actual effect of treatment i on the observation,
$X_{ij}$ is the value of the concomitant variable, and
$\varepsilon_{ij}$ is the experimental error in observation $Y_{ij}$.

Calculations
In order to calculate the F ratio that will help us to determine whether there is a significant difference between treatments, we need to work out sums of squares and sums of products. Therefore, if $\bar{X}_{i.}$ and $\bar{Y}_{i.}$ are respectively the means of the X values and the Y values for treatment i, and if $\bar{X}_{..}$ and $\bar{Y}_{..}$ are respectively the means of all the values of X and Y, we obtain the formulae given below.
1. The total sum of squares for X:

$$S_{XX} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{..})^2 .$$

2. The total sum of squares for Y ($S_{YY}$) is calculated in the same way as $S_{XX}$, but with X replaced by Y.


3. The total sum of products of X and Y:

$$S_{XY} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{..})(Y_{ij} - \bar{Y}_{..}) .$$

4. The sum of squares of the treatments for X:

$$T_{XX} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (\bar{X}_{i.} - \bar{X}_{..})^2 .$$

5. The sum of squares of the treatments for Y ($T_{YY}$) is calculated in the same way as $T_{XX}$, but with X replaced by Y.
6. The sum of the products of the treatments for X and Y:

$$T_{XY} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (\bar{X}_{i.} - \bar{X}_{..})(\bar{Y}_{i.} - \bar{Y}_{..}) .$$

7. The sum of squares of the errors for X:

$$E_{XX} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2 .$$

8. The sum of squares of the errors for Y ($E_{YY}$) is calculated in the same way as $E_{XX}$, but with X replaced by Y.
9. The sum of products of the errors for X and Y:

$$E_{XY} = \sum_{i=1}^{t} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.}) .$$

Substituting in appropriate values and calculating these formulae corresponds to an analysis of variance for each of X, Y and XY. The degrees of freedom associated with these different formulae are as follows:
1. For the total sum: $\sum_{i=1}^{t} n_i - 1$.
2. For the sum of the treatments: $t - 1$.
3. For the sum of the errors: $\sum_{i=1}^{t} n_i - t$.
Adjustment of the variable Y to the concomitant variable X yields two new sums of squares:
1. The adjusted total sum of squares:

$$SS_{tot} = S_{YY} - \frac{S_{XY}^2}{S_{XX}} .$$

2. The adjusted sum of the squares of the errors:

$$SS_{err} = E_{YY} - \frac{E_{XY}^2}{E_{XX}} .$$

The new degrees of freedom for these two sums are:
1. $\sum_{i=1}^{t} n_i - 2$;
2. $\sum_{i=1}^{t} n_i - t - 1$, where a degree of freedom is subtracted due to the adjustment.
The third adjusted sum of squares, the adjusted sum of the squares of the treatments, is given by:

$$SS_{tr} = SS_{tot} - SS_{err} .$$

This has the same number of degrees of freedom, $t - 1$, as before.

Covariance Analysis Table
We now have all of the elements needed to establish the covariance analysis table. The sums of squares divided by the number of degrees of freedom give the means of the squares.

Source of     Degrees of       Sum of squares and of products
variation     freedom          Σx²       Σxy       Σy²
Treatments    t − 1            TXX       TXY       TYY
Errors        Σni − t          EXX       EXY       EYY
Total         Σni − 1          SXX       SXY       SYY


Note: the numbers in the Σx² and Σy² columns cannot be negative; on the other hand, the numbers in the Σxy column can be negative.

Adjustment

Source of     Degrees of       Sum of       Mean of
variation     freedom          squares      squares
Treatments    t − 1            SStr         MCtr
Errors        Σni − t − 1      SSerr        MCerr
Total         Σni − 2          SStot

F Ratio: Testing the Treatments
The F ratio used to test the null hypothesis that there is no significant difference between the means of the treatments, once adjusted for the concomitant variable X, is given by:

$$F = \frac{MC_{tr}}{MC_{err}} .$$

The ratio follows a Fisher distribution with $t - 1$ and $\sum_{i=1}^{t} n_i - t - 1$ degrees of freedom.
The null hypothesis

$$H_0 : \tau_1 = \tau_2 = \ldots = \tau_t$$

will be rejected at the significance level α if the F ratio is greater than or equal to the value in the Fisher table, in other words if:

$$F \geq F_{t-1,\; \sum_{i=1}^{t} n_i - t - 1,\; \alpha} .$$

Clearly, we assume that the coefficient β is different from zero when performing covariance analysis. If this is not the case, a simple analysis of variance is sufficient.

Test Concerning the Slope β
We would also like to know whether the concomitant variable has a significant effect before the application of the treatment. To test this, we take the null hypothesis to be

$$H_0 : \beta = 0$$

and the alternative hypothesis to be

$$H_1 : \beta \neq 0 .$$

The F ratio can be established as:

$$F = \frac{E_{XY}^2 / E_{XX}}{MC_{err}} .$$

It follows a Fisher distribution with 1 and $\sum_{i=1}^{t} n_i - t - 1$ degrees of freedom. The null hypothesis will be rejected at the significance level α if the F ratio is greater than or equal to the value in the Fisher table, in other words if:

$$F \geq F_{1,\; \sum_{i=1}^{t} n_i - t - 1,\; \alpha} .$$

DOMAINS AND LIMITATIONS
The basic hypotheses that need to hold before initiating a covariance analysis are the same as those used for an analysis of variance or a regression analysis. These are the hypotheses of normality, homogeneity of variances (homoscedasticity) and independence.
In covariance analysis, as in an analysis of variance, the null hypothesis stipulates that the independent samples come from different populations that have identical means. Moreover, since there are always conditions associated with any statistical technique, those that apply to covariance analysis are as follows:
1. The population distributions must be approximately normal, if not completely normal.


2. The populations from which the samples are taken must have the same variance σ², meaning:

$$\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_k^2 ,$$

where k is the number of populations to be compared.
3. The samples must be chosen randomly and all of the samples must be independent.
We must also add a basic hypothesis specific to covariance analysis, which is that the treatments carried out must not influence the values of the concomitant variable X.

EXAMPLES
Consider an experiment consisting of comparing the effects of three different diets on a population of cows. The data are presented in the form of a table that includes the three different diets, each of which was administered to five cows. The initial weights are denoted by the concomitant variable X (in kg), and the gains in weight (after treatment) are denoted by Y:

     Diet 1          Diet 2          Diet 3
     X      Y        X      Y        X      Y
     32     167      26     182      36     158
     29     172      33     171      34     191
     22     132      22     173      37     140
     23     158      28     163      37     192
     35     169      22     182      32     162

We first calculate the various sums of squares and products:
1. The total sum of squares for X:

$$S_{XX} = \sum_{i=1}^{3} \sum_{j=1}^{5} (X_{ij} - \bar{X}_{..})^2 = (32 - 29.8667)^2 + \ldots + (32 - 29.8667)^2 = 453.73 .$$

2. The total sum of squares for Y:

$$S_{YY} = \sum_{i=1}^{3} \sum_{j=1}^{5} (Y_{ij} - \bar{Y}_{..})^2 = (167 - 167.4667)^2 + \ldots + (162 - 167.4667)^2 = 3885.73 .$$

3. The total sum of the products of X and Y:

$$S_{XY} = \sum_{i=1}^{3} \sum_{j=1}^{5} (X_{ij} - \bar{X}_{..})(Y_{ij} - \bar{Y}_{..}) = (32 - 29.8667)(167 - 167.4667) + \ldots + (32 - 29.8667)(162 - 167.4667) = 158.93 .$$

4. The sum of the squares of the treatments for X:

$$T_{XX} = \sum_{i=1}^{3} \sum_{j=1}^{5} (\bar{X}_{i.} - \bar{X}_{..})^2 = 5(28.2 - 29.8667)^2 + 5(26.2 - 29.8667)^2 + 5(35.2 - 29.8667)^2 = 223.33 .$$

5. The sum of the squares of the treatments for Y:

$$T_{YY} = \sum_{i=1}^{3} \sum_{j=1}^{5} (\bar{Y}_{i.} - \bar{Y}_{..})^2 = 5(159.6 - 167.4667)^2 + 5(174.2 - 167.4667)^2 + 5(168.6 - 167.4667)^2 = 542.53 .$$

6. The sum of the products of the treatments of X and Y:

$$T_{XY} = \sum_{i=1}^{3} \sum_{j=1}^{5} (\bar{X}_{i.} - \bar{X}_{..})(\bar{Y}_{i.} - \bar{Y}_{..}) = 5(28.2 - 29.8667)(159.6 - 167.4667) + \ldots + 5(35.2 - 29.8667)(168.6 - 167.4667) = -27.67 .$$

7. The sum of the squares of the errors for X:

$$E_{XX} = \sum_{i=1}^{3} \sum_{j=1}^{5} (X_{ij} - \bar{X}_{i.})^2 = (32 - 28.2)^2 + \ldots + (32 - 35.2)^2 = 230.40 .$$

8. The sum of the squares of the errors for Y:

$$E_{YY} = \sum_{i=1}^{3} \sum_{j=1}^{5} (Y_{ij} - \bar{Y}_{i.})^2 = (167 - 159.6)^2 + \ldots + (162 - 168.6)^2 = 3343.20 .$$

9. The sum of the products of the errors of X and Y:

$$E_{XY} = \sum_{i=1}^{3} \sum_{j=1}^{5} (X_{ij} - \bar{X}_{i.})(Y_{ij} - \bar{Y}_{i.}) = (32 - 28.2)(167 - 159.6) + \ldots + (32 - 35.2)(162 - 168.6) = 186.60 .$$

The degrees of freedom associated with these different calculations are as follows:
1. For the total sums: $\sum_{i=1}^{3} n_i - 1 = 15 - 1 = 14$.
2. For the sums of the treatments: $t - 1 = 3 - 1 = 2$.
3. For the sums of the errors: $\sum_{i=1}^{3} n_i - t = 15 - 3 = 12$.
Adjusting the variable Y to the concomitant variable X yields two new sums of squares:
1. The adjusted total sum of squares:

$$SS_{tot} = S_{YY} - \frac{S_{XY}^2}{S_{XX}} = 3885.73 - \frac{158.93^2}{453.73} = 3830.06 .$$

2. The adjusted sum of the squares of the errors:

$$SS_{err} = E_{YY} - \frac{E_{XY}^2}{E_{XX}} = 3343.20 - \frac{186.60^2}{230.40} = 3192.07 .$$

The new degrees of freedom for these two sums are:
1. $\sum_{i=1}^{3} n_i - 2 = 15 - 2 = 13$;
2. $\sum_{i=1}^{3} n_i - t - 1 = 15 - 3 - 1 = 11$.
The adjusted sum of the squares of the treatments is given by:

$$SS_{tr} = SS_{tot} - SS_{err} = 3830.06 - 3192.07 = 637.99 .$$


This has the same number of degrees of freedom as before: $t - 1 = 3 - 1 = 2$.
We now have all of the elements required to establish the covariance analysis table. The sums of squares divided by the degrees of freedom give the means of the squares.

Source of     Degrees of     Sum of squares and of products
variation     freedom        Σx²        Σxy        Σy²
Treatments    2              223.33     −27.67     542.53
Errors        12             230.40     186.60     3343.20
Total         14             453.73     158.93     3885.73

Adjustment

Source of     Degrees of     Sum of       Mean of
variation     freedom        squares      squares
Treatments    2              637.99       318.995
Errors        11             3192.07      290.188
Total         13             3830.06

The F ratio, which is used to test the null hypothesis that there is no significant difference between the means of the treatments once the response Y has been adjusted for the initial weight X, is given by:

$$F = \frac{MC_{tr}}{MC_{err}} = \frac{318.995}{290.188} = 1.099 .$$

If we choose a significance level of α = 0.05, the value of F in the Fisher table is equal to:

$$F_{2, 11, 0.05} = 3.98 .$$

Since $F < F_{2, 11, 0.05}$, we cannot reject the null hypothesis, and so we conclude that there is no significant difference between the responses to the three diets once the response Y has been adjusted for the initial weight X.
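The sums of squares and the F ratio of this example can be reproduced directly; the following Python sketch (NumPy assumed available) recomputes the adjusted analysis of covariance from the diet data:

```python
import numpy as np

# Initial weights X and weight gains Y for the three diets (5 cows each)
X = np.array([[32, 29, 22, 23, 35],
              [26, 33, 22, 28, 22],
              [36, 34, 37, 37, 32]], dtype=float)
Y = np.array([[167, 172, 132, 158, 169],
              [182, 171, 173, 163, 182],
              [158, 191, 140, 192, 162]], dtype=float)
t, N = X.shape[0], X.size

# Total and error sums of squares and products
Syy = np.sum((Y - Y.mean()) ** 2)
Sxx = np.sum((X - X.mean()) ** 2)
Sxy = np.sum((X - X.mean()) * (Y - Y.mean()))
Exx = np.sum((X - X.mean(axis=1, keepdims=True)) ** 2)
Eyy = np.sum((Y - Y.mean(axis=1, keepdims=True)) ** 2)
Exy = np.sum((X - X.mean(axis=1, keepdims=True)) * (Y - Y.mean(axis=1, keepdims=True)))

SS_tot = Syy - Sxy**2 / Sxx        # adjusted total sum of squares
SS_err = Eyy - Exy**2 / Exx        # adjusted error sum of squares
SS_tr = SS_tot - SS_err            # adjusted treatment sum of squares

F = (SS_tr / (t - 1)) / (SS_err / (N - t - 1))
print(round(SS_tr, 2), round(SS_err, 2), round(F, 3))   # about 637.99, 3192.07, 1.099
```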

FURTHER READING
• Analysis of variance
• Design of experiments
• Missing data analysis
• Regression analysis

REFERENCES
Bartlett, M.S.: Some examples of statistical methods of research in agriculture and applied biology. J. Roy. Stat. Soc. (Suppl.) 4, 137–183 (1937)
DeLury, D.B.: The analysis of covariance. Biometrics 4, 153–170 (1948)
Fisher, R.A.: Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh (1925)
Huitema, B.E.: The Analysis of Covariance and Alternatives. Wiley, New York (1980)
Wildt, A.R., Ahtola, O.: Analysis of Covariance (Sage University Papers Series on Quantitative Applications in the Social Sciences, Paper 12). Sage, Thousand Oaks, CA (1978)

Covariation

It is often interesting, particularly in economics, to compare two time series. Since we wish to measure the level of dependence between two variables, this is somewhat reminiscent of the concept of correlation. However, in this case, since the time series are bound by a third variable, time, computing the correlation coefficient would only lead to an artificial relation.
Indeed, if two time series $x_t$ and $y_t$ are considered, which represent completely independent phenomena and are linear functions


of time:

$$x_t = a \cdot t + b , \qquad y_t = c \cdot t + d ,$$

where a, b, c and d are constants, then it is always possible to eliminate the time factor t between the two equations and to obtain a functional relation of the type $y = e \cdot x + f$. This relation suggests that there is a linear dependence between the two time series, which is not the case.
Therefore, measuring the correlation between the evolutions of two phenomena over time does not imply the existence of a link between them. The term covariation is therefore used instead of correlation, and this dependence is measured using a covariation coefficient. We can distinguish between:
• The linear covariation coefficient;
• The tendency covariation coefficient.

HISTORY
See correlation coefficient and time series.

MATHEMATICAL ASPECTS
In order to compare two time series $y_t$ and $x_t$, the first step is to attempt to represent them on the same graphic. However, visual comparison is generally difficult, so the following change of variables is performed:

$$Y_t = \frac{y_t - \bar{y}}{S_y} \quad \text{and} \quad X_t = \frac{x_t - \bar{x}}{S_x} ,$$

which are the centered and reduced variables, where $S_y$ and $S_x$ are the standard deviations of the respective time series.
We can distinguish between the following covariation coefficients:
• The linear covariation coefficient
The form of this expression is identical to that of the correlation coefficient r, but here the calculation does not have the same meaning, because the goal is to detect the possible existence of a relation between variations that are themselves related to time, and to measure its order of magnitude:

$$C = \frac{\sum_{t=1}^{n} (x_t - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t=1}^{n} (x_t - \bar{x})^2 \cdot \sum_{t=1}^{n} (y_t - \bar{y})^2}} .$$

This yields values between −1 and +1. If it is close to ±1, there is a linear relation between the time evolutions of the two variables.
Notice that:

$$C = \frac{\sum_{t=1}^{n} X_t \cdot Y_t}{n} ,$$

where n is the number of observations, while $X_t$ and $Y_t$ are the centered and reduced series obtained by the change of variables above.
• The tendency covariation coefficient
The influence exerted by the means is eliminated by calculating:

$$K = \frac{\sum_{t=1}^{n} (x_t - T_{x_t})(y_t - T_{y_t})}{\sqrt{\sum_{t=1}^{n} (x_t - T_{x_t})^2 \cdot \sum_{t=1}^{n} (y_t - T_{y_t})^2}} .$$

The means $\bar{x}$ and $\bar{y}$ have simply been replaced with the values of the secular trends $T_{x_t}$ and $T_{y_t}$ of each time series.
The tendency covariation coefficient K also takes values between −1 and +1, and the closer it gets to ±1, the stronger the covariation between the time series.


DOMAINS AND LIMITATIONS
There are many examples of the need to compare two time series in economics: for example, when comparing the evolution of the price of a product to the evolution of the quantity of the product on the market, or the evolution of the national revenue to the evolution of real estate transactions. It is important to know whether there is some kind of dependence between the two phenomena that evolve over time: this is the goal of measuring the covariation.
Visually comparing two time series is an important operation, but it is often a difficult task because:
• The data undergoing comparison may come from very different domains and present very different orders of magnitude, so it is preferable to study the deviations from the mean.
• The peaks and troughs of two time series may have very different amplitudes; it is then preferable to homogenize the dispersions by relating the variations to the standard deviation of each time series.
Visual comparison is simplified if we consider the centered and reduced variables obtained via the following variable changes:

$$Y_t = \frac{y_t - \bar{y}}{S_y} \quad \text{and} \quad X_t = \frac{x_t - \bar{x}}{S_x} .$$

Also, in a similar way to the correlation coefficient, nonlinear relations can exist between two variables that give a C value close to zero. It is therefore important to be cautious during interpretation.
The tendency covariation coefficient is preferred when the relation between the time series is not linear.

EXAMPLES
Let us study the covariation between two time series. The variable $x_t$ represents the annual production of an agricultural product; the variable $y_t$ is its average annual price per unit in constant euros.

t     x_t     y_t
1     320     5.3
2     660     3.2
3     300     2.2
4     190     3.4
5     320     2.7
6     240     3.5
7     360     2.0
8     170     2.5

$$\sum_{t=1}^{8} x_t = 2560 , \qquad \sum_{t=1}^{8} y_t = 24.8 ,$$

giving $\bar{x} = 320$ and $\bar{y} = 3.1$.

x_t − x̄     (x_t − x̄)²     y_t − ȳ     (y_t − ȳ)²
   0              0             2.2          4.84
 340         115600             0.1          0.01
 −20            400            −0.9          0.81
−130          16900             0.3          0.09
   0              0            −0.4          0.16
 −80           6400             0.4          0.16
  40           1600            −1.1          1.21
−150          22500            −0.6          0.36

$$\sum_{t=1}^{8} (x_t - \bar{x})^2 = 163400 , \qquad \sum_{t=1}^{8} (y_t - \bar{y})^2 = 7.64 ,$$

giving $\sigma_x = 142.9$ and $\sigma_y = 0.98$. The centered and reduced values $X_t$ and $Y_t$ are then calculated.


t     X_t      Y_t      X_t·Y_t     X_t·Y_{t−1}
1     0.00     2.25     0.00
2     2.38     0.10     0.24        5.36
3    −0.14    −0.92     0.13       −0.01
4    −0.91     0.31    −0.28        0.84
5     0.00    −0.41     0.00        0.00
6    −0.56     0.41    −0.23        0.23
7     0.28    −1.13    −0.32        0.11
8    −1.05    −0.61     0.64        1.18

The linear covariation coefficient is then calculated:

$$C = \frac{\sum_{t=1}^{8} X_t \cdot Y_t}{8} = \frac{0.18}{8} = 0.0225 .$$

If the observations $X_t$ are compared with $Y_{t-1}$, meaning that the production this year is compared with the price of the previous year, we obtain:

$$C = \frac{\sum_{t=2}^{8} X_t \cdot Y_{t-1}}{8} = \frac{7.71}{8} = 0.964 .$$

The linear covariation coefficient for a shift of one year is very strong: there is a strong (positive) covariation with a shift of one year between the two variables.
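Both coefficients are straightforward to recompute; the following Python sketch (NumPy assumed available; the standard deviations are computed with divisor n, as above) reproduces the values of this example:

```python
import numpy as np

x = np.array([320, 660, 300, 190, 320, 240, 360, 170], dtype=float)
y = np.array([5.3, 3.2, 2.2, 3.4, 2.7, 3.5, 2.0, 2.5])
n = len(x)

# Centered and reduced series X_t and Y_t
X = (x - x.mean()) / x.std()
Y = (y - y.mean()) / y.std()

C = np.sum(X * Y) / n                  # linear covariation coefficient, about 0.02
C_lag = np.sum(X[1:] * Y[:-1]) / n     # same coefficient with a one-year shift, about 0.96

print(round(C, 4), round(C_lag, 3))
```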

FURTHER READING
• Correlation coefficient
• Moving average
• Secular trend
• Standard deviation
• Time series

REFERENCES
Kendall, M.G.: Time Series. Griffin, London (1973)
Py, B.: Statistique Déscriptive. Economica, Paris (1987)

Cox, David R.

Cox, David R. was born in 1924 in Birmingham, England. He studied mathematics at the University of Cambridge and obtained his doctorate in applied mathematics at the University of Leeds in 1949. From 1966 to 1988 he was a professor of statistics at Imperial College London, and from 1988 to 1994 he taught at Nuffield College, Oxford.
Cox, David is an eminent statistician. He was knighted by Queen Elizabeth II in 1982 in recognition of his contributions to statistical science, and has been named Doctor Honoris Causa by many universities in England and elsewhere. From 1981 to 1983 he was President of the Royal Statistical Society; he was also President of the Bernoulli Society from 1972 to 1983 and President of the International Statistical Institute from 1995 to 1997.
Due to the variety of subjects that he has studied and developed, Professor Cox has had a profound impact on his field. He was named Doctor Honoris Causa of the University of Neuchâtel in 1992.
Cox, Sir David is the author or coauthor of more than 250 articles and 16 books, and between 1966 and 1991 he was the editor of Biometrika.

Some principal works and articles of Cox, Sir David:

1964 (and Box, G.E.P.) An analysis of transformations (with discussion). J. Roy. Stat. Soc. Ser. B 26, 211–243.
1970 The Analysis of Binary Data. Methuen, London.


1973 (and Hinkley, D.V.) Theoretical Statistics. Chapman & Hall, London.
1974 (and Atkinson, A.C.) Planning experiments for discriminating between models. J. Roy. Stat. Soc. Ser. B 36, 321–348.
1978 (and Hinkley, D.V.) Problems and Solutions in Theoretical Statistics. Chapman & Hall, London.
1981 (and Snell, E.J.) Applied Statistics: Principles and Examples. Chapman & Hall.
1981 Theory and general principles in statistics. J. Roy. Stat. Soc. Ser. A 144, 289–297.
2000 (and Reid, N.) The Theory of the Design of Experiments. Chapman & Hall, London.

Cox, Gertrude Mary

Cox, Gertrude Mary was born in 1900, near Dayton, Iowa, USA. Her ambition was to help people, so she initially studied a social sciences course for two years. She then worked in an orphanage for young boys in Montana for two years. In order to become a director of the orphanage, she decided to continue her education at Iowa State College, from which she graduated in 1929. To pay for her studies, Cox, Gertrude worked with Snedecor, George Waddel, her professor, in his statistical laboratory, which led to her becoming interested in statistics. After graduating, she started studying for a doctorate in psychology. In 1933, before finishing her doctorate, Snedecor, George, then the director of the Iowa State Statistical Laboratory, convinced her to become his assistant, which she agreed to, although she remained in the field of psychology because she worked on the evaluation of statistical tests in psychology and the analysis of psychological data.
On 1st November 1940, she became the director of the Department of Experimental Statistics of the State of North Carolina. In 1945, the General Education Board gave her permission to create an institute of statistics at the University of North Carolina, with a department of mathematical statistics at Chapel Hill.
She was a founder member of the journal of the International Biometric Society in 1947; she was a director of it from 1947 to 1955 and president of it from 1968 to 1969. She was president of the American Statistical Association (ASA) in 1956. She died in 1978.

Principal work of Cox, Gertrude Mary:

1957 (and Cochran, W.) Experimental Designs, 2nd edn. Wiley, New York.

Cp Criterion

The Cp criterion is a model selection criterion in linear regression. For a linear regression model with p parameters (including any constant term), a rule of thumb is to select a model in which the value of Cp is close to p, the number of terms in the model.

HISTORY
Introduced by Mallows, Colin L. in 1964, the model selection criterion Cp has been used ever since as a criterion for evaluating the goodness of fit of a regression model.


MATHEMATICAL ASPECTS
Let Yi = β0 + ∑_{j=1}^{p−1} Xji βj + εi, i = 1, ..., n, be a multiple linear regression model. Denote the mean square error of ŷi as MSE(ŷi). The criterion introduced in this section can be used to choose the model with the minimal sum of mean square errors:

∑ MSE(ŷi) = ∑ [ E((ŷi − yi)²) − σ²(1 − 2hii) ]
           = E( ∑ (ŷi − yi)² ) − σ² ∑ (1 − 2hii)
           = E(RSS) − σ²(n − 2p) ,

where
n is the number of observations,
p is the number of estimated parameters,
hii are the diagonal elements of the hat matrix, and
ŷi is the estimator of yi.

Recall the following property of the hii:

∑ hii = p .

Define the coefficient

Γp = ∑ MSE(ŷi) / σ²
   = [ E(RSS) − σ²(n − 2p) ] / σ²
   = E(RSS)/σ² − n + 2p .

If the model is correct, we must have:

E(RSS) = (n − p) σ² ,

which implies

Γp = p .

In practice, we estimate Γp by

Cp = RSS/σ̂² − n + 2p ,

where σ̂² is an estimator of σ². Here we estimate σ² using the s² of the full model. For this full model, we actually obtain Cp = p, which is not an interesting result. However, for all of the other models, where we use only a subset of the explanatory variables, the coefficient Cp can have values that are different from p. From the models that incorporate only a subset of the explanatory variables, we then choose those for which the value of Cp is the closest to p.
If we have k explanatory variables, we can also define the coefficient Cp for a model that incorporates a subset X1, ..., Xp−1, p ≤ k, of the k explanatory variables in the following manner:

Cp = (n − k) · RSS(X1, ..., Xp−1) / RSS(X1, ..., Xk) − n + 2p ,

where RSS(X1, ..., Xp−1) is the sum of the squares of the residuals related to the model with p − 1 explanatory variables, and RSS(X1, ..., Xk) is the sum of the squares of the residuals related to the model with all k explanatory variables. Out of the models that incorporate p − 1 explanatory variables, we choose, according to the criterion, the one for which the value of the coefficient Cp is the closest to p.
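As an illustration of the formula above, here is a minimal Python sketch of how Cp might be computed for a candidate subset of explanatory variables. It is not part of the original entry: the function names (rss, mallows_cp) and the data are hypothetical, only numpy is used, and σ² is estimated by the usual s² of the full model.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least squares fit (constant term added)."""
    Xd = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least squares estimates
    resid = y - Xd @ beta
    return float(resid @ resid)

def mallows_cp(X_full, y, subset):
    """Cp = RSS(subset)/sigma2_hat - n + 2p, with sigma2_hat taken from the full model."""
    n, k = X_full.shape
    p = len(subset) + 1                              # parameters: subset variables + constant
    sigma2_hat = rss(X_full, y) / (n - k - 1)        # s^2 of the full model (assumption)
    return rss(X_full[:, subset], y) / sigma2_hat - n + 2 * p

# hypothetical usage: evaluate every subset of three explanatory variables
# X = np.array([...]); y = np.array([...])
# for subset in ([0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2]):
#     print(subset, round(mallows_cp(X, y, list(subset)), 3))
```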

DOMAINS AND LIMITATIONS
The Cp criterion is used when selecting variables. When used with the R² criterion, this criterion can tell us about the goodness of fit of the chosen model. The underlying idea of such a procedure is the following: instead of trying to explain one variable using all of the available explanatory variables, it is sometimes possible to determine an underlying model with a subset of these variables, whose explanatory power is almost the same as that of the model containing all of the explanatory variables. Another reason is that collinearity of the explanatory variables tends to decrease estimator precision, so it can be useful to delete some superfluous variables.

EXAMPLES
Consider some data related to the Chicago fires of 1975. We denote the variable corresponding to the logarithm of the number of fires per 1000 households per district i of Chicago in 1975 by Y, and the variables corresponding to the proportion of households constructed before 1940, to the number of thefts and to the median revenue per district i by X1, X2 and X3, respectively.
Since this set contains three explanatory variables, we have 2³ = 8 possible models. We divide the eight possible equations into four sets:
1. Set A contains the only equation without explanatory variables:

   Y = β0 + ε .

2. Set B contains three equations with one explanatory variable:

   Y = β0 + β1X1 + ε
   Y = β0 + β2X2 + ε
   Y = β0 + β3X3 + ε .

3. Set C contains three equations with two explanatory variables:

   Y = β0 + β1X1 + β2X2 + ε
   Y = β0 + β1X1 + β3X3 + ε
   Y = β0 + β2X2 + β3X3 + ε .

4. Set D contains one equation with three explanatory variables:

   Y = β0 + β1X1 + β2X2 + β3X3 + ε .

District X1 X2 X3 Y

i xi1 xi2 xi3 yi

1 0.604 29 11.744 1.825

2 0.765 44 9.323 2.251

3 0.735 36 9.948 2.351

4 0.669 37 10.656 2.041

5 0.814 53 9.730 2.152

6 0.526 68 8.231 3.529

7 0.426 75 21.480 2.398

8 0.785 18 11.104 1.932

9 0.901 31 10.694 1.988

10 0.898 25 9.631 2.715

11 0.827 34 7.995 3.371

12 0.402 14 13.722 0.788

13 0.279 11 16.250 1.740

14 0.077 11 13.686 0.693

15 0.638 22 12.405 0.916

16 0.512 17 12.198 1.099

17 0.851 27 11.600 1.686

18 0.444 9 12.765 0.788

19 0.842 29 11.084 1.974

20 0.898 30 10.510 2.715

21 0.727 40 9.784 2.803

22 0.729 32 7.342 2.912

23 0.631 41 6.565 3.589

24 0.830 147 7.459 3.681

25 0.783 22 8.014 2.918

26 0.790 29 8.177 3.148

27 0.480 46 8.212 2.501

28 0.715 23 11.230 1.723

29 0.731 4 8.330 3.082

30 0.650 31 5.583 3.073

31 0.754 39 8.564 2.197

32 0.208 15 12.102 1.281

33 0.618 32 11.876 1.609

34 0.781 27 9.742 3.353


35 0.686 32 7.520 2.856

36 0.734 34 7.388 2.425

37 0.020 17 13.842 1.224

38 0.570 46 11.040 2.477

39 0.559 42 10.332 2.351

40 0.675 43 10.908 2.370

41 0.580 34 11.156 2.380

42 0.152 19 13.323 1.569

43 0.408 25 12.960 2.342

44 0.578 28 11.260 2.747

45 0.114 3 10.080 1.946

46 0.492 23 11.428 1.960

47 0.466 27 13.731 1.589

Source: Andrews and Herzberg (1985)

We denote the resulting models in the following way: 1 for X1, 2 for X2, 3 for X3, 12 for X1 and X2, 13 for X1 and X3, 23 for X2 and X3, and 123 for the full model. The following table gives the value of the Cp criterion obtained for each model:

Model Cp

1 32.356

2 29.937

3 18.354

12 18.936

13 14.817

23 3.681

123 4.000

If we consider a model from group B (containing one explanatory variable), the number of estimated parameters is p = 2, and none of the Cp values for the three models approaches 2. If we now consider a model from group C (with two explanatory variables), p equals 3, and the Cp of model 23 approaches this. Finally, for the complete model we find that Cp = 4, which is also the number of estimated parameters, but this is not an interesting result, as previously explained. Therefore, the most reasonable choice for the model appears to be:

Y = β0 + β2X2 + β3X3 + ε .

FURTHER READING
→ Coefficient of determination
→ Collinearity
→ Hat matrix
→ Mean squared error
→ Regression analysis

REFERENCES
Andrews, D.F., Herzberg, A.M.: Data: A Collection of Problems from Many Fields for Students and Research Workers. Springer, Berlin Heidelberg New York (1985)
Mallows, C.L.: Choosing variables in a linear regression: A graphical aid. Presented at the Central Regional Meeting of the Institute of Mathematical Statistics, Manhattan, KS, 7–9 May 1964 (1964)
Mallows, C.L.: Some comments on Cp. Technometrics 15, 661–675 (1973)

Cramér, Harald

Cramér, Harald (1893–1985) entered the University of Stockholm in 1912 in order to study chemistry and mathematics; he became a student of Mittag-Leffler, Gösta and Riesz, Marcel. In 1919, Cramér was named assistant professor at the University of Stockholm. At the same time, he worked as an actuary for an insurance company, Svenska Life Assurance, which allowed him to study probability and statistics.


His main work in actuarial mathematics is Collective Risk Theory. In 1929, he was asked to create a new department in Stockholm, and he became the first Swedish professor of actuarial and statistical mathematics. At the end of the Second World War he wrote his principal work, Mathematical Methods of Statistics, which was published for the first time in 1945 and was recently (in 1999) republished.

Some principal works and articles of Cramér, Harald:

1946 Mathematical Methods of Statistics. Princeton University Press, Princeton, NJ.

1946 Collective Risk Theory: A Survey of the Theory from the Point of View of the Theory of Stochastic Processes. Skandia Jubilee Volume, Stockholm.

Criterion of Total Mean Squared Error

The criterion of total mean squared error is a way of comparing estimations of the parameters of a biased or unbiased model.

MATHEMATICAL ASPECTS
Let

β̂ = (β̂1, ..., β̂p−1)

be a vector of estimators for the parameters of a regression model. We define the total mean squared error, TMSE, of the vector β̂ of estimators as being the sum of the mean squared errors (MSE) of its components.
We recall that

MSE(β̂j) = E((β̂j − βj)²) = V(β̂j) + (E(β̂j) − βj)² ,

where E(·) and V(·) are the usual symbols used for the expected value and the variance. We define the total mean squared error as:

TMSE(β̂) = ∑_{j=1}^{p−1} MSE(β̂j)
         = ∑_{j=1}^{p−1} E((β̂j − βj)²)
         = ∑_{j=1}^{p−1} [ Var(β̂j) + (E(β̂j) − βj)² ]
         = (p − 1) σ² · Trace(V) + ∑_{j=1}^{p−1} (E(β̂j) − βj)² ,

where V is the variance–covariance matrix of β̂.

DOMAINS AND LIMITATIONS
Unfortunately, when we want to calculate the total mean squared error

TMSE(β̂) = ∑_{j=1}^{p−1} E((β̂j − βj)²)

of a vector of estimators

β̂ = (β̂1, ..., β̂p−1)

for the parameters of the model

Yi = β0 + β1 Xˢi1 + ... + βp−1 Xˢi,p−1 + εi ,

we need to know the values of the model parameters βj, which are obviously unknown for all real data. The notation Xˢj means that the data for the jth explanatory variable were standardized (see standardized data).
Therefore, in order to estimate these TMSE we generate data from a structurally similar model. Using a generator of pseudo-random numbers, we can simulate all of the data for the artificial model, analyze it with different regression methods (such as simple regression or ridge regression), and calculate what we call the total squared error, TSE, of the vector of estimators β̂ related to each method:

TSE(β̂) = ∑_{j=1}^{p−1} (β̂j − βj)² .

We repeat this operation 100 times, ensuring that the 100 data sets are pseudo-independent. For each method, the average of the 100 TSE values gives a good estimation of the TMSE. Note that some statisticians prefer the model obtained by selecting variables because of its simplicity. On the other hand, other statisticians prefer the ridge method because it uses all of the available information.
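The simulation procedure described above can be sketched in a few lines of Python. This is only an illustrative outline, not the encyclopedia's own code: the design matrix X_std, the "true" coefficients beta_true, the error standard deviation sigma, and the fitting function are hypothetical placeholders, and only ordinary least squares is shown as the estimation method.

```python
import numpy as np

rng = np.random.default_rng(0)

def tse(beta_hat, beta_true):
    """Total squared error of one estimated coefficient vector."""
    return float(np.sum((beta_hat - beta_true) ** 2))

def ols_fit(X, y):
    """Least squares coefficients (no intercept: X is assumed standardized)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def estimate_tmse(X_std, beta_true, sigma, fit=ols_fit, n_rep=100):
    """Average the TSE over n_rep pseudo-independent simulated samples."""
    n = X_std.shape[0]
    tses = []
    for _ in range(n_rep):
        eps = rng.normal(0.0, sigma, size=n)     # simulated error terms
        y = X_std @ beta_true + eps              # data from the artificial model
        tses.append(tse(fit(X_std, y), beta_true))
    return float(np.mean(tses))

# hypothetical usage, in the spirit of the cement example below:
# tmse_ls = estimate_tmse(X_std, beta_true, sigma=2.446)
```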

EXAMPLES
We can generally compare the following regression methods: linear regression by least squares, ridge regression, or the variable selection method.
In the following example, thirteen portions of cement have been examined. Each portion is composed of four ingredients, given in the table. The aim is to determine how the quantities xi1, xi2, xi3 and xi4 of these four ingredients influence the quantity yi, the heat given out due to the hardening of the cement.

Heat given out by the cement

Portion Ingredient Heat

1 2 3 4

i xi1 xi2 xi3 xi4 yi

1 7 26 6 60 78.5

2 1 29 15 52 74.3

3 11 56 8 20 104.3


4 11 31 8 47 87.6

5 7 52 6 33 95.9

6 11 55 9 22 109.2

7 3 71 17 6 102.7

8 1 31 22 44 72.5

9 2 54 18 22 93.1

10 21 47 4 26 115.9

11 1 40 23 34 83.9

12 11 66 9 12 113.3

13 10 68 8 12 109.4

yi  quantity of heat given out due to the hardening of the ith portion (in joules);
xi1 quantity of ingredient 1 (tricalcium aluminate) in the ith portion;
xi2 quantity of ingredient 2 (tricalcium silicate) in the ith portion;
xi3 quantity of ingredient 3 (tetracalcium alumino-ferrite) in the ith portion;
xi4 quantity of ingredient 4 (dicalcium silicate) in the ith portion.

In this paragraph we will compare the estimators obtained by least squares (LS) regression with those obtained by ridge regression (R) and those obtained with the variable selection method (SV), via the total mean squared error TMSE. The three estimated equations obtained from these methods are:

ŶLS = 95.4 + 9.12 Xˢ1 + 7.94 Xˢ2 + 0.65 Xˢ3 − 2.41 Xˢ4
ŶR  = 95.4 + 7.64 Xˢ1 + 4.67 Xˢ2 − 0.91 Xˢ3 − 5.84 Xˢ4
ŶSV = 95.4 + 8.64 Xˢ1 + 10.3 Xˢ2

We note here that all of the estimations were obtained from standardized explanatory variables. We compare these three estimations using the total mean squared errors of the three vectors:

β̂LS = (β̂LS1, β̂LS2, β̂LS3, β̂LS4)′
β̂R  = (β̂R1, β̂R2, β̂R3, β̂R4)′
β̂SV = (β̂SV1, β̂SV2, β̂SV3, β̂SV4)′ .

Here the subscript LS corresponds to the method of least squares, R to the ridge method and SV to the variable selection method, respectively. For this latter method, the estimations for the coefficients of the unselected variables in the model are considered to be zero. In our case we have:

β̂LS = (9.12, 7.94, 0.65, −2.41)′
β̂R  = (7.64, 4.67, −0.91, −5.84)′
β̂SV = (8.64, 10.3, 0, 0)′ .

We have chosen to approximate the underlying process that generated the cement data by the following least squares equation:

ŶiLS = 95.4 + 9.12 Xˢi1 + 7.94 Xˢi2 + 0.65 Xˢi3 − 2.41 Xˢi4 + εi .

The procedure consists of generating the 13 random error terms ε1, ..., ε13 100 times, based on a normal distribution with mean 0 and standard deviation 2.446 (recall that 2.446 is the least squares estimate of σ for the cement data). We then calculate Y1LS, ..., Y13LS using the Xˢij values in the data table. In this way, we generate 100 samples Y1LS, ..., Y13LS from the 100 samples ε1, ..., ε13.
The three methods are applied to each of these 100 samples Y1LS, ..., Y13LS (always using the same values for the Xˢij), which yields 100 estimators β̂LS, β̂R and β̂SV. Note that these three methods are applied to these 100 samples without any influence from the results of the equations

ŶiLS = 95.4 + 9.12 Xˢi1 + 7.94 Xˢi2 + 0.65 Xˢi3 − 2.41 Xˢi4 ,
ŶiR  = 95.4 + 7.64 Xˢi1 + 4.67 Xˢi2 − 0.91 Xˢi3 − 5.84 Xˢi4 and
ŶiSV = 95.4 + 8.64 Xˢi1 + 10.3 Xˢi2

obtained for the original sample. Despite the fact that the variable selection method chose the variables Xi1 and Xi2 in ŶiSV = 95.4 + 8.64 Xˢi1 + 10.3 Xˢi2, it is possible that, for one of these 100 samples, the method instead selects the variables Xi2 and Xi3, or only Xi3, or any other subset of the four available variables. In the same way, despite the fact that the equation ŶiR = 95.4 + 7.64 Xˢi1 + 4.67 Xˢi2 − 0.91 Xˢi3 − 5.84 Xˢi4 was obtained with k = 0.157, the value of k is recalculated for each of these 100 samples (for more details refer to the ridge regression example).
From these 100 estimations of β̂LS, β̂SV and β̂R, we can calculate 100 values of the total squared error TSE for each method, which we label as:

TSELS = (β̂LS1 − 9.12)² + (β̂LS2 − 7.94)² + (β̂LS3 − 0.65)² + (β̂LS4 + 2.41)²
TSER  = (β̂R1 − 9.12)² + (β̂R2 − 7.94)² + (β̂R3 − 0.65)² + (β̂R4 + 2.41)²
TSESV = (β̂SV1 − 9.12)² + (β̂SV2 − 7.94)² + (β̂SV3 − 0.65)² + (β̂SV4 + 2.41)² .

The means of the 100 values of TSELS, TSER and TSESV are the estimations of TMSELS, TMSER and TMSESV: the TMSE estimations for the three methods considered.
After this simulation was performed, the following estimations were obtained:

TMSELS = 270 ,
TMSER  = 75 ,
TMSESV = 166 .

These give the following differences:

TMSELS − TMSESV = 104 ,
TMSELS − TMSER = 195 ,
TMSESV − TMSER = 91 .

Since the standard deviations of the 100 observed differences are respectively 350, 290 and 280, we can calculate approximate 95% confidence intervals for the differences between the TMSEs of two methods:

TMSELS − TMSESV = 104 ± 2 · 350/√100 ,
TMSELS − TMSER = 195 ± 2 · 290/√100 ,
TMSESV − TMSER = 91 ± 2 · 280/√100 .

We get:

34 < TMSELS − TMSESV < 174 ,
137 < TMSELS − TMSER < 253 ,
35 < TMSESV − TMSER < 147 .

We can therefore conclude (at least for the particular model used to generate the simulated data, and taking into account our aim of obtaining a small TMSE) that the ridge method is the best of the methods considered, followed by the variable selection procedure.

FURTHER READING
→ Bias
→ Expected value
→ Hat matrix
→ Mean squared error
→ Ridge regression
→ Standardized data
→ Variance
→ Variance–covariance matrix
→ Weighted least-squares method

REFERENCES
Box, G.E.P., Draper, N.R.: A basis for the selection of response surface design. J. Am. Stat. Assoc. 54, 622–654 (1959)
Dodge, Y.: Analyse de régression appliquée. Dunod, Paris (1999)

Critical Value

In hypothesis testing, the critical value is the limit value at which we take the decision to reject the null hypothesis H0, for a given significance level.

HISTORY
The concept of a critical value was introduced by Neyman, Jerzy and Pearson, Egon Sharpe in 1928.

MATHEMATICAL ASPECTS
The critical value depends on the type of test used (two-sided test, or one-sided test on the right or the left), the probability distribution and the significance level α.

DOMAINS AND LIMITATIONS
The critical value is determined from the probability distribution of the statistic associated with the test. It is found by consulting the statistical table corresponding to this probability distribution (normal table, Student table, Fisher table, chi-square table, etc.).

EXAMPLES
A company produces steel cables. Using a sample of size n = 100, it wants to verify whether the diameters of the cables conform closely enough to the required diameter of 0.9 cm in general.
The standard deviation σ of the population is known and equals 0.05 cm.
In this case, hypothesis testing involves a two-sided test. The hypotheses are the following:

null hypothesis H0: μ = 0.9
alternative hypothesis H1: μ ≠ 0.9 .

For a significance level of α = 5%, by looking at the normal table we find that the critical value zα/2 equals 1.96.
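As a rough illustration of this last step, the critical value can also be obtained numerically rather than from a printed table. The short Python sketch below (an assumption on our part, not part of the original entry; it requires SciPy) returns the two-sided critical value for a standard normal test statistic.

```python
from scipy.stats import norm

alpha = 0.05                      # significance level
z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value of the standard normal
print(round(z_crit, 2))           # 1.96
```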

FURTHER READING
→ Confidence interval
→ Hypothesis testing
→ Significance level
→ Statistical table

REFERENCE
Neyman, J., Pearson, E.S.: On the use and interpretation of certain test criteria for purposes of statistical inference, Parts I and II. Biometrika 20A, 175–240, 263–294 (1928)

Cyclical Fluctuation

Cyclical fluctuation is a term used to describe oscillations that occur over long periods about the secular trend line or curve of a time series.

HISTORY
See time series.

MATHEMATICAL ASPECTS
Consider Yt, a time series given by its components; Yt can be written as:
• Yt = Tt · St · Ct · It (multiplicative model), or
• Yt = Tt + St + Ct + It (additive model),
where
Yt is the data at time t;
Tt is the secular trend at time t;
St is the seasonal variation at time t;
Ct is the cyclical fluctuation at time t, and
It is the irregular variation at time t.
The first step when investigating a time series is always to determine the secular trend Tt, and then to determine the seasonal variation St. It is then possible to adjust the initial data of the time series Yt according to these two components:
• Yt / (St · Tt) = Ct · It (multiplicative model);
• Yt − St − Tt = Ct + It (additive model).
To extract the cyclical fluctuations, a weighted moving average is established over a few months only. The use of moving averages allows us to smooth the irregular variations It while preserving the cyclical fluctuations Ct.


The choice of a weighted moving average allows us to give more weight to the central values compared to the extreme values, in order to reproduce the cyclical fluctuations in a more accurate way. Therefore, large weights will be given to the central values and small weights to the extreme values. For example, for a moving average considered for an interval of five months, the weights −0.1, 0.3, 0.6, 0.3 and −0.1 can be used; since their sum is 1, there will be no need for normalization.
If the values of Ct · It (resulting from the adjustments performed with respect to the secular trend and to the seasonal variations) are denoted by Xt, the value of the cyclical fluctuation for month t is determined by:

Ct = −0.1 · Xt−2 + 0.3 · Xt−1 + 0.6 · Xt + 0.3 · Xt+1 − 0.1 · Xt+2 .
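A minimal Python sketch of this five-month weighted moving average is given below. The function name and the sample values are hypothetical, not part of the original entry; the first two and last two months are left undefined because the centred window does not fit there, as in the worked example later in this entry.

```python
def weighted_moving_average(x, weights=(-0.1, 0.3, 0.6, 0.3, -0.1)):
    """Centred weighted moving average; returns None where the window does not fit."""
    half = len(weights) // 2
    out = [None] * len(x)
    for t in range(half, len(x) - half):
        window = x[t - half:t + half + 1]
        out[t] = sum(w * v for w, v in zip(weights, window))
    return out

# hypothetical usage on adjusted values X_t = C_t * I_t:
# c = weighted_moving_average([99.9, 100.4, 100.2, 99.0, 98.1, 99.0])
```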

DOMAINS AND LIMITATIONS
Estimating cyclical fluctuations allows us to:
• Determine the maxima or minima that a time series can attain;
• Perform short- or medium-term forecasting;
• Identify the cyclical components.
The limitations and advantages of the use of weighted moving averages when evaluating cyclical fluctuations are the following:
• Weighted moving averages can smooth a curve with cyclical fluctuations while still retaining most of the original fluctuation, because they preserve the amplitudes of the cycles in an accurate way.
• The use of an odd number of months to establish the moving average facilitates better centering of the values obtained.
• It is difficult to study the cyclical fluctuation of a time series because the cycles usually vary in length and amplitude. This is due to the presence of a multitude of factors, and the effects of these factors can change from one cycle to the next. None of the models used to explain and predict such fluctuations has been found to be completely satisfactory.

EXAMPLES
Let us establish a moving average considered over five months of data adjusted according to the secular trend and seasonal variations. Let us also use the weights −0.1, 0.3, 0.6, 0.3, −0.1; since their sum is equal to 1, there is no need for normalization.
Let Xi be the adjusted values of Ci · Ii:

Ci = −0.1 · Xi−2 + 0.3 · Xi−1 + 0.6 · Xi + 0.3 · Xi+1 − 0.1 · Xi+2 .

The table below shows the electrical power consumed by street lights every month, in millions of kilowatt hours, during the years 1952 and 1953. The data have been adjusted according to the secular trend and seasonal variations.

Year   Month   Data Xi   Moving average for 5 months Ci

1952 J 99.9

F 100.4

M 100.2 100.1

A 99.0 99.1

M 98.1 98.4

J 99.0 98.7

J 98.5 98.3

A 97.8 98.3

S 100.3 99.9

O 101.1 101.3

N 101.2 101.1

D 100.4 100.7


1953 J 100.6 100.3

F 100.1 100.4

M 100.5 100.1

A 99.2 99.5

M 98.9 98.7

J 98.2 98.2

J 98.4 98.5

A 99.7 99.5

S 100.4 100.5

O 101.0 101.0

N 101.1

D 101.1

No significant cyclical effects appear in these data, but nonsignificant effects do not mean that there are no effects. The beginning of an economic cycle is often sought, but such a cycle only appears every 20 years or so.

FURTHER READING
→ Forecasting
→ Irregular variation
→ Moving average
→ Seasonal variation
→ Secular trend
→ Time series

REFERENCES
Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control (Series in Time Series Analysis). Holden Day, San Francisco (1970)
Wold, H.O. (ed.): Bibliography on Time Series and Stochastic Processes. Oliver & Boyd, Edinburgh (1965)


D

Daniels, Henry E.

Daniels, Henry Ellis was born in London in October 1912. After graduating from Edinburgh University in 1933, he continued his studies at Cambridge University. After gaining his doctorate at Edinburgh University, he went back to Cambridge as a lecturer in mathematics in 1947. In 1957, Daniels, Henry became the first professor of mathematics at Birmingham University, a post he held until his retirement in 1978. The research fields that interested Daniels, Henry were inferential statistics, saddlepoint approximations in statistics, epidemiological modeling, and statistical theory as applied to textile technology. When he was in Birmingham, he founded the annual meeting of statisticians at Gregynog Hall in Powys. These annual meetings gradually became one of the most well-received among English statisticians.
From 1974 to 1975 he was president of the Royal Statistical Society, which awarded him the Guy Medal in silver in 1957 and in gold in 1984. In 1980 he was elected a member of the Royal Society, and in 1985 an honored member of the International Statistical Institute. He died in 2000.

Some principal works and articles of Daniels, Henry E.:

1954 Saddlepoint approximations in statistics. Ann. Math. Stat. 25, 631–650.
1955 Discussion of a paper by Box, G.E.P. and Anderson, S.L. J. Roy. Stat. Soc. Ser. B 17, 27–28.
1958 Discussion of a paper by Cox, D.R. J. Roy. Stat. Soc. Ser. B 20, 236–238.
1987 Tail probability approximations. Int. Stat. Rev. 55, 137–148.

Data

A datum (plural data) is the result of an observation made on a population or on a sample.
The word “datum” is Latin, and means “something given”; it is used in mathematics to denote an item of information (not necessarily numerical) from which a conclusion can be drawn.
Note that a number (or any other form of description) that does not necessarily contain any information should not be confused with a datum, which does contain information.


The data obtained from observations are related to the variable being studied. These data are quantitative, qualitative, discrete or continuous if the corresponding variable is quantitative, qualitative, discrete or continuous, respectively.

FURTHER READING
→ Binary data
→ Categorical data
→ Incomplete data
→ Observation
→ Population
→ Sample
→ Spatial data
→ Standardized data
→ Value
→ Variable

REFERENCES
Federer, W.T.: Data collection. In: Kotz, S., Johnson, N.L. (eds.) Encyclopedia of Statistical Sciences, vol. 2. Wiley, New York (1982)

Data Analysis

Often, one of the first steps performed in scientific research is to collect data. These data are generally organized into two-dimensional tables. This organization usually makes it easier to extract information about the data, in other words, to analyze them.
In its widest sense, data analysis can be considered to be the essence of statistics, to which all other aspects of the subject are linked.

HISTORY
Since data analysis encompasses many different methods of statistical analysis, it is difficult to briefly overview the history of data analysis. Nevertheless, we can rapidly review the chronology of the most fundamental aspects of the subject:
• The first publication on exploratory data analysis dates back to 1970–1971, and was written by Tukey, J.W. This was the first version of his work Exploratory Data Analysis, published in 1977.
• The theoretical principles of correspondence analysis are due to Hartley, H.O. (1935) (published under his original German name Hirschfeld) and to Fisher, R.A. (1940). However, the theory was largely developed in the 1970s by Benzécri, J.P.
• The first studies on classification were carried out in biology and in zoology. The oldest form of typology was conceived by Galen (129–199 A.D.). Numerical classification methods derive from the ideas of Adanson (in the eighteenth century), and were developed, amongst others, by Zubin (1938) and Thorndike (1953).

DOMAINS AND LIMITATIONS
The field of data analysis comprises many different statistical methods. Types of data analysis can be classified in the following way:
1. Exploratory data analysis, which involves (as its name implies) exploring the data, via:
   • Representing the data graphically,
   • Data transformation (if required),
   • Detecting outlier observations,
   • Elaborating research hypotheses that were not envisaged at the start of the experiment,
   • Robust estimation.
2. Initial data analysis, which involves:
   • Choosing the statistical methods to be applied to the data.
3. Multivariate data analysis, which includes:
   • Discriminant analysis,
   • Data transformation, which reduces the number of dimensions and facilitates interpretation,
   • Searching for structure.
4. Specific forms of data analysis that are suited to different analytical tasks; these forms include:
   • Correspondence analysis,
   • Multiple correspondence analysis,
   • Classification.
5. Confirmatory data analysis, which involves evaluating and testing analytical results; this includes:
   • Parameter estimation,
   • Hypothesis tests,
   • Generalization and conclusion.

FURTHER READING
→ Classification
→ Cluster analysis
→ Correspondence analysis
→ Data
→ Exploratory data analysis
→ Hypothesis testing
→ Statistics
→ Transformation

REFERENCES
Benzécri, J.P.: L'Analyse des données. Vol. 1: La Taxinomie. Vol. 2: L'Analyse factorielle des correspondances. Dunod, Paris (1976)
Benzécri, J.P.: Histoire et préhistoire de l'analyse des données, les cahiers de l'analyse des données 1, no. 1–4. Dunod, Paris (1976)
Everitt, B.S.: Cluster Analysis. Halstead, London (1974)
Fisher, R.A.: The precision of discriminant functions. Ann. Eugen. (London) 10, 422–429 (1940)
Gower, J.C.: Classification, geometry and data analysis. In: Bock, H.H. (ed.) Classification and Related Methods of Data Analysis. Elsevier, Amsterdam (1988)
Hirschfeld, H.O.: A connection between correlation and contingency. Proc. Camb. Philos. Soc. 31, 520–524 (1935)
Thorndike, R.L.: Who belongs in a family? Psychometrika 18, 267–276 (1953)
Tukey, J.W.: Exploratory Data Analysis, limited preliminary edition. Addison-Wesley, Reading, MA (1970–1971)
Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, Reading, MA (1977)
Zubin, J.: A technique for measuring like-mindedness. J. Abnorm. Social Psychol. 33, 508–516 (1938)

Data Collection

When collecting data, we need to consider several issues. First, it is necessary to define why we need to collect the data, and what the data will be used for. Second, we need to consider the type of data that should be collected: it is essential that the data collected are related to the goal of the study. Finally, we must consider how the data are to be collected. There are three approaches to data collection:


1. Register
2. Sampling and census
3. Experimental research

1. Register
One form of readily accessible data is registered data. For example, there are registered data on births, deaths, marriages, daily temperatures, monthly rainfall and car sales.
There are no general statistical methods that ensure that valuable conclusions are drawn from these data. Each set of data should be considered on its own merits, and one should be careful when making inferences about the population.

2. Sampling and census
Sampling methods can be divided into two categories: random methods and nonrandom methods.
Nonrandom methods involve constructing, by empirical means, a sample that resembles the population from which it is taken as much as possible. The most commonly used nonrandom method is quota sampling.
Random methods use a probabilistic procedure to derive a sample from the population. Given the fact that the probability that a given unit is selected in the sample is known, the error due to sampling can then be calculated. The main random methods are simple random sampling, stratified sampling, systematic sampling and batch sampling.
In a census, all of the objects in a population are observed, yielding data for the whole population. Clearly, this type of investigation is very costly when the population is very large. This is why censuses are not performed very often.

3. Experimental research
In an experiment, data collection is performed based upon a particular experimental design. These experimental designs are applied to data collection in all research fields.

HISTORY
It is likely that the oldest form of data collection dates back to the population censuses performed in antiquity.
Antille and Ujvari (1991) mention the existence of a position for an official statistician in China during the Chow dynasty (111–211 B.C.).
The Roman author Tacitus says that Emperor Augustus ordered all of the soldiers, ships and wealth in the Empire to be counted. Evidence for censuses can also be found in the Bible; Saint Luke reports that “Caesar Augustus ordered a decree prescribing a census of the whole world (. . .) and all went to be inscribed, each in his own town.”
A form of statistics can also be found during this time. Its name betrays its administrative origin, because it comes from the Latin word “status”: the State.

FURTHER READING
→ Census
→ Design of experiments
→ Sampling
→ Survey

REFERENCES
Antille, G., Ujvari, A.: Pratique de la Statistique Inférentielle. PAN, Neuchâtel, Switzerland (1991)
Boursin, J.-L.: Les structures du hasard. Seuil, Paris (1986)

Decile

Deciles are measures of position calculated on a set of data.


The deciles are the values that separate a distribution into ten equal parts, where each part contains the same number of observations. The decile is a member of the wider family of quantiles.
The xth decile indicates the value below which 10x% of the observations occur and above which (100 − 10x)% of the observations occur. For example, the eighth decile is the value below which 80% of the observations fall and above which 20% occur.
The fifth decile represents the median.
There will therefore be nine deciles for a given distribution.

MATHEMATICAL ASPECTS
The process used to calculate deciles is similar to that used to calculate the median or quartiles.
When all of the raw observations are available, the process used to calculate deciles is as follows:
1. Organize the n observations into a frequency distribution.
2. The deciles correspond to the observations for which the cumulative relative frequencies exceed 10%, 20%, 30%, ..., 80%, 90%.
Some authors propose using the following formula to precisely determine the values of the different deciles.
Calculating the jth decile:
Take i to be the integer part of j · (n + 1)/10 and k the fractional part of j · (n + 1)/10.
Take xi and xi+1 to be the values of the observations at the ith and (i + 1)th positions (when the observations are arranged in increasing order).
The jth decile is then equal to:

Dj = xi + k · (xi+1 − xi) .

When the observations are grouped into classes, the deciles are determined in the following way:
1. Determine the class containing the desired decile:
   • First decile: the first class for which the cumulative relative frequency exceeds 10%.
   • Second decile: the first class for which the cumulative relative frequency exceeds 20%.
   ...
   • Ninth decile: the first class for which the cumulative relative frequency exceeds 90%.
2. Calculate the value of the decile based on the hypothesis that the observations are uniformly distributed in each class:

   decile = L1 + [ (N · q − ∑ finf) / fdecile ] · c ,

where
L1 = lower limit of the class of the decile,
N = total number of observations,
q = 1/10 for the first decile, 2/10 for the second decile, ..., 9/10 for the ninth decile,
∑ finf = sum of the frequencies lower than the class of the decile,
fdecile = frequency of the class of the decile,
c = size of the interval of the class of the decile.
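A small Python sketch of both computations follows. It is only an illustration of the formulas above: the function names are hypothetical, and the grouped data are assumed to be given as (lower limit, upper limit, frequency) triples.

```python
def decile_raw(x, j):
    """jth decile from raw observations, using the i + k interpolation on sorted data."""
    x = sorted(x)
    pos = j * (len(x) + 1) / 10
    i, k = int(pos), pos - int(pos)          # integer and fractional parts
    if i == 0:
        return x[0]
    if i >= len(x):
        return x[-1]
    return x[i - 1] + k * (x[i] - x[i - 1])  # x_i and x_{i+1} in 1-based notation

def decile_grouped(classes, j):
    """jth decile from classes given as (lower, upper, frequency) triples."""
    n = sum(f for _, _, f in classes)
    target = n * j / 10
    cum = 0
    for lower, upper, f in classes:
        if cum + f >= target:                # first class whose cumulative frequency reaches j*10%
            return lower + (target - cum) / f * (upper - lower)
        cum += f

# hypothetical check against the bakery example below:
# classes = [(100, 150, 40), (150, 200, 50), (200, 250, 60), (250, 300, 70),
#            (300, 350, 100), (350, 400, 80), (400, 450, 60), (450, 500, 40)]
# decile_grouped(classes, 1)  # 160.0
```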


DOMAINS AND LIMITATIONS
The calculation of deciles only has meaning for a quantitative variable that can take values over a given interval.
The number of observations needs to be relatively high, because the calculation of deciles involves dividing the set of observations into ten parts.
Deciles are used relatively frequently in practice; for example, when interpreting the distribution of revenue in a city or state.

EXAMPLES
Consider an example where the deciles are calculated for a frequency distribution of a continuous variable where the observations are grouped into classes.
The following frequency table represents the profits (in thousands of euros) of 500 bakeries:

Profit (in thousands of euros)   Frequency   Cumulative frequency   Relative cumulative frequency

100–150 40 40 0.08

150–200 50 90 0.18

200–250 60 150 0.30

250–300 70 220 0.44

300–350 100 320 0.62

350–400 80 400 0.80

400–450 60 460 0.92

450–500 40 500 1.00

Total 500

The first decile is in the class 150–200 (this is where the cumulative relative frequency exceeds 10%). By assuming that the observations are distributed uniformly in each class, we obtain the following value for the first decile:

1st decile = 150 + [ (500 · 1/10 − 40) / 50 ] · 50 = 160 .

The second decile falls in the 200–250 class. The value of the second decile is equal to:

2nd decile = 200 + [ (500 · 2/10 − 90) / 60 ] · 50 = 208.33 .

We can calculate the other deciles in the same way, which yields:

Decile Class Value

1 150–200 160.00

2 200–250 208.33

3 200–250 250.00

4 250–300 285.71

5 300–350 315.00

6 300–350 340.00

7 350–400 368.75

8 350–400 400.00

9 400–450 441.67

We can conclude, for example, that 10% of the 500 bakeries make a profit of between 100000 and 160000 euros, 50% make a profit that is lower than 315000 euros, and 10% make a profit of between 441670 and 500000 euros.

FURTHER READING
→ Measure of location
→ Median
→ Percentile
→ Quantile
→ Quartile


Degree of Freedom

The number of degrees of freedom is a parameter of the chi-square distribution. It is also a parameter used in other probability distributions related to the chi-square distribution, such as the Student distribution and the Fisher distribution.
In another context, the number of degrees of freedom refers to the number of linearly independent terms involved when calculating a sum of squares based on n independent observations.

HISTORY
The term “degree of freedom” was introduced by Fisher, R.A. in 1925.

MATHEMATICAL ASPECTS
Let Y1, Y2, ..., Yn be a random sample of size n taken from a population with an unknown mean. The sum of the deviations of the n observations with respect to their arithmetic mean Ȳ is always equal to zero:

∑_{i=1}^{n} (Yi − Ȳ) = 0 .

This requirement is a constraint on each deviation Yi − Ȳ used when calculating the variance:

S² = ∑_{i=1}^{n} (Yi − Ȳ)² / (n − 1) .

This constraint implies that n − 1 deviations completely determine the nth deviation. The n deviations (and also the sum of their squares and the variance S² of the sample) therefore have n − 1 degrees of freedom.

FURTHER READING
→ Chi-square distribution
→ Fisher distribution
→ Parameter
→ Student distribution
→ Variance

REFERENCES
Fisher, R.A.: Applications of “Student’s” distribution. Metron 5, 90–104 (1925)

Deming, W. Edwards

Deming, W. Edwards was born in 1900 in Sioux City, Iowa. He studied science at Wyoming University, graduating in 1921, and at Colorado University, where he completed his Master degree in mathematics and physics in 1924. He then went to Yale University, where he received his doctorate in physics in 1928. After his doctorate, he worked for ten years in the Laboratory of the Ministry for Agriculture. In 1939, Deming moved to the Census Bureau in Washington. There he used his theoretical knowledge to initiate the first censuses to be performed by sampling. These censuses used techniques that later provided an example for similar censuses performed around the rest of the world. He left the Census Bureau in 1946 to become a consultant in statistical studies and a professor of statistics at New York University. He died in Washington D.C. in 1993.
For a long time (up to 1980), the theories of Deming, W.E. were ignored by American companies. In 1947, Deming went to Tokyo as a consultant to apply his sampling techniques. From 1950 onwards, Japanese industry adopted the management theories of Deming. Within ten years many Japanese products were being exported to America because they were better and less expensive than equivalent products manufactured in the United States.

Some principal works and articles of Deming, W. Edwards:

1938 Statistical Adjustment of Data. Wiley, New York.
1950 Some Theory of Sampling. Wiley, New York.
1960 Sample Design in Business Research. Wiley, New York.

Demography

Demography is the study of human populations. It involves analyzing phenomena such as births, deaths, migration, marriages, divorces, fertility rates, mortality rates and age pyramids.
These phenomena are treated from both biological and socio-economic points of view. The methods used in demography are advanced mathematics and statistics as well as many fields of social science.

HISTORY
The first demographic studies date back to the middle of the seventeenth century. In this period, Graunt, John, an English scientist, analyzed the only population data available: a list of deaths in the London area, classified according to their cause. In 1662 he published a study in which he tried to evaluate the average size of a family, the importance of migrational movement, and other elements related to the structure of the population.
In collaboration with Petty, Sir William, Graunt proposed that more serious studies of the population should be made and that a center where statistical data would be gathered should be created.
During the eighteenth century the same types of analyses were performed and progressively improved, but it wasn't until the nineteenth century that several European countries as well as the United States undertook national censuses and established regular records of births, marriages and deaths.
These reports showed that different regions offered different chances of survival for their inhabitants. These conclusions ultimately resulted in improved working and hygiene conditions.
Using the demographic data gathered, predictions became possible, and the first demographic journals and reviews appeared around the end of the century. These included “Demography” in the United States, “Population” in France and “Population Studies” in Great Britain.
In the middle of the twentieth century demographic studies began to focus on the global population, as the demographic problems of the Third World became an important issue.

FURTHER READING
→ Census
→ Population

REFERENCES
Cox, P.R.: Demography, 5th edn. Cambridge University Press, Cambridge (1976)
Hauser, P.M., Duncan, O.D.: The Study of Population: An Inventory and Appraisal. University of Chicago Press, Chicago, IL (1959)


Dendrogram

A dendrogram is a graphical representation of the different aggregations made during a cluster analysis. It consists of knots that correspond to groups and branches that represent the associations made at each step. The structure of the dendrogram is determined by the order in which the aggregations are made.
If a scale is added to the dendrogram, it is possible to represent the distances over which the aggregations took place.

HISTORY
In a more general sense, a dendrogram (from the Greek “dendron”, meaning tree) is a tree diagram that illustrates the relations that exist between the members of a set.
The first examples of dendrograms were the phylogenetic trees used by systematic specialists. The term “dendrogram” seems to have been used for the first time in the work of Mayr et al. (1953).

MATHEMATICAL ASPECTS
During cluster analysis on a set of objects, aggregations are achieved with the help of a distance table and the chosen linkage method (single linkage method, complete linkage method, etc.). Each extreme point of the dendrogram represents a different class produced by automatic classification.
The dendrogram is then constructed by representing each group by a knot placed at a particular position with respect to the horizontal scale, where the position depends upon the distance over which the aggregate is formed. The objects are placed at the zero level of the scale. To stop the branches connecting the knots from becoming entangled, the procedure of drawing a dendrogram is performed in a systematic way.
• The first two grouped elements are placed, one on top of the other, at the zero level of the scale. A horizontal line is then drawn next to each element, from the zero level to the aggregation distance for each element. The resulting class is then represented by a vertical line that connects the ends of the lines for the elements (forming a “knot”). The middle of this vertical line provides the starting point for a new horizontal line for this class when it is aggregated with another class. The dendrogram therefore consists of branches constructed from horizontal and vertical lines.
• Two special cases can occur during the next step:
  1. Two classes that each consist of one element are aggregated; these classes are placed at the zero level and the resulting knot is located at the aggregation distance, as described previously.
  2. An element is aggregated to a class formed previously. The element is aggregated from the zero level (and is inserted between existing classes if necessary). The knot representing this aggregation is placed at the (horizontal) distance corresponding to the aggregation distance between the class and the element.
This procedure continues until the desired configuration is obtained. If there are still classes consisting of a single element at the end of the aggregation process, they are simply added from the zero level of the dendrogram. When the classification has been completed, each class corresponds to a particular part of the dendrogram.


DOMAINS AND LIMITATIONS
The dendrogram describes the ordered path of the set of operations performed during cluster analysis. It illustrates this type of classification in a very precise manner.
This strictly defined approach to constructing a dendrogram is sometimes modified due to circumstances. For example, as we will see in one of the examples below, the aggregation distances of two or more successive steps may be the same, and so the procedure must then be changed to make sure that the branches of the dendrogram do not get entangled.
In the case that we have described above, the dendrogram is drawn horizontally. Obviously, it is also possible to plot a dendrogram from top to bottom, or from bottom to top. The branches can even consist of diagonal lines.

EXAMPLES
We will perform a cluster analysis using the single link method and establish the corresponding dendrogram, illustrating the different aggregations that may occur.
Consider the grades obtained by five students during examinations for four different courses: English, French, maths and physics. We would like to separate these five students into two groups using the single link method, in order to test a new teaching method.
These grades, which can take values from 1 to 6, are summarized in the following table:

English French Maths Physics

Alain 5.0 3.5 4.0 4.5

Jean 5.5 4.0 5.0 4.5

Marc 4.5 4.5 4.0 3.5

Paul 4.0 5.5 3.5 4.0

Pierre 4.0 4.5 3.0 3.5

The distance table is obtained by calculating the Euclidean distances between the students and adding them to the following table:

Alain Jean Marc Paul Pierre

Alain 0 1.22 1.5 2.35 2

Jean 1.22 0 1.8 2.65 2.74

Marc 1.5 1.8 0 1.32 1.12

Paul 2.35 2.65 1.32 0 1.22

Pierre 2 2.74 1.12 1.22 0

Using the single link method, based on the minimum distance between the objects in two classes, the following partitions are obtained at each step.
First step: Marc and Pierre are grouped at an aggregation distance of 1.12: {Alain}, {Jean}, {Marc, Pierre}, {Paul}.
Second step: We revise the distance table. The new distance table is given below:

                  Jean   Marc and Pierre   Paul
Alain             1.22   1.5               2.35
Jean                     1.8               2.65
Marc and Pierre                            1.22

Alain and Jean are then grouped at a distance of 1.22, and we obtain the following partition: {Alain, Jean}, {Marc, Pierre}, {Paul}.
Third step: The new distance table is given below:

Marc and Pierre Paul

Alain and Jean 1.5 2.35

Marc and Pierre 1.22

Paul is then grouped with Marc and Pierre at a distance of 1.22, yielding the following partition: {Alain, Jean}, {Marc, Pierre, Paul}.
Fourth step: The two groups are aggregated at a distance of 1.5.
This gives the following dendrogram:


Notice that Paul must be positioned before the group of Marc and Pierre in order to avoid entangling the branches of the diagram.
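For readers who want to reproduce such an analysis numerically, a minimal sketch with SciPy is shown below. It is not part of the original example: SciPy's linkage function with the 'single' method and its dendrogram routine are one common way to obtain a single linkage clustering and its dendrogram from the grade table.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# grades: English, French, Maths, Physics
names = ["Alain", "Jean", "Marc", "Paul", "Pierre"]
grades = np.array([
    [5.0, 3.5, 4.0, 4.5],
    [5.5, 4.0, 5.0, 4.5],
    [4.5, 4.5, 4.0, 3.5],
    [4.0, 5.5, 3.5, 4.0],
    [4.0, 4.5, 3.0, 3.5],
])

# single linkage on Euclidean distances between students
Z = linkage(grades, method="single", metric="euclidean")

dendrogram(Z, labels=names)   # knots appear at the aggregation distances
plt.show()
```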

FURTHER READING
→ Classification
→ Cluster analysis
→ Complete linkage method
→ Distance
→ Distance table

REFERENCES
Mayr, E., Linsley, E.G., Usinger, R.L.: Methods and Principles of Systematic Zoology. McGraw-Hill, New York (1953)
Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy: The Principles and Practice of Numerical Classification (A Series of Books in Biology). W.H. Freeman, San Francisco, CA (1973)
Sokal, R.R., Sneath, P.H.: Principles of Numerical Taxonomy. Freeman, San Francisco (1963)

Density Function

The density function of a continuous random variable allows us to determine the probability that a random variable X takes values in a given interval.

HISTORY
See probability.

MATHEMATICAL ASPECTS
Consider P(a ≤ X ≤ b), the probability that a continuous random variable X takes a value in the interval [a, b]. This probability is defined by:

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx ,

where f(x) is the density function of the random variable X.
The density function is graphically represented on an axis system. The different values of the random variable X are placed on the abscissa, and those taken by the function f are placed on the ordinate. The graph of the function f does not allow us to determine the probability for one particular point, but instead to visualize the probability for an interval as a surface. The total surface under the curve corresponds to a value of 1:

∫ f(x) dx = 1 .

The variable X is a continuous random variable if there is a non-negative function f that is defined for real numbers and for which the following property holds for every interval [a, b]:

P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx .

Here P(a ≤ X ≤ b) is the probability function. Therefore, the probability that the continuous random variable X takes a value in the interval [a, b] can be obtained by integrating the density function over [a, b]. There is also the following condition on f:

∫₋∞^∞ f(x) dx = P(−∞ ≤ X ≤ ∞) = 1 .

We can represent the density function graphically on a system of axes. The different values of the random variable X are placed on the abscissa, and those of the function f on the ordinate.

DOMAINS AND LIMITATIONS
If a = b in P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx, then

P(X = a) = ∫ₐᵃ f(x) dx = 0 .

This means that the probability that a continuous random variable takes an isolated fixed value is always zero.
It therefore does not matter whether we include or exclude the boundaries of the interval when calculating the probability associated with an interval.

EXAMPLES
Consider X, a continuous random variable, for which the density function is

f(x) = (3/4)(−x² + 2x)   if 0 < x < 2 ,
f(x) = 0                 otherwise.

We can calculate the probability of X being higher than 1. We obtain:

P(X > 1) = ∫₁^∞ f(x) dx ,

which can be divided into:

P(X > 1) = ∫₁² (3/4)(−x² + 2x) dx + ∫₂^∞ 0 dx
         = (3/4) [ −x³/3 + x² ]₁²
         = 1/2 .

The density function and the region where P(X > 1) can be represented graphically; the shaded surface under the curve represents the region where P(X > 1).
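The probability computed above can also be checked numerically. The short Python sketch below is only an illustration (it assumes SciPy is available); it evaluates the integral of this density over [1, 2] with SciPy's quad integrator.

```python
from scipy.integrate import quad

def f(x):
    """Density of the example: 3/4 * (-x^2 + 2x) on (0, 2), zero elsewhere."""
    return 0.75 * (-x**2 + 2 * x) if 0 < x < 2 else 0.0

prob, _ = quad(f, 1, 2)   # P(X > 1); the density is zero beyond x = 2
print(round(prob, 4))     # 0.5
```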

FURTHER READING
→ Continuous distribution function
→ Continuous probability distribution
→ Joint density function
→ Probability

Dependence

The concept of dependence can have two meanings. The first concerns the events of a random experiment: two events are said to be dependent if the occurrence of one depends on the occurrence of the other.
The second meaning of the word concerns the relation, generally a functional relation, that can exist between random variables. This dependence can be measured, and in most cases these measures involve the covariance of the random variables.
The most commonly used measure of dependence is the correlation coefficient. The null hypothesis, which states that the two random variables X and Y are independent, can be tested against the alternative hypothesis of dependence. Tests that are used for this purpose include the chi-square test of independence, as well as tests based on the Spearman rank correlation coefficient and the Kendall rank correlation coefficient.

FURTHER READING
→ Chi-square test of independence
→ Correlation coefficient
→ Covariance
→ Kendall rank correlation coefficient
→ Spearman rank correlation coefficient
→ Test of independence

Dependent Variable

The dependent variable (or response variable) in a regression model is the variable that is considered to vary depending on the other variables, called independent variables, incorporated into the analysis.

FURTHER READING
→ Independent variable
→ Multiple linear regression
→ Regression analysis
→ Simple linear regression

Descriptive Epidemiology

Descriptive epidemiology involves describing the frequencies and patterns of illnesses among the population.
It is based on the collection of health-related information (mortality tables, morbidity registers, illnesses that must be declared, etc.) and of information that may have an impact on the health of the population (data on atmospheric pollution, risk behavior, etc.), which are used to obtain a statistical picture of the general health of the population.

HISTORY
See epidemiology.

MATHEMATICAL ASPECTS
See cause and effect in epidemiology, odds and odds ratio, relative risk, attributable risk, avoidable risk, incidence rate, prevalence rate.

DOMAINS AND LIMITATIONS
Despite the fact that descriptive epidemiology only yields “elementary” information, it is still vitally important:
• It allows us to study the scales and the patterns of health phenomena (by studying their prevalence and their incidence), and it facilitates epidemiological surveillance;
• It aids decision-making related to the planning and administration of health establishments and programs;
• It can lead to hypotheses on the risk factors of an illness (for example, the increased incidence rate of skin cancers in the south of France, believed to be related to the greater intensity of the sun's rays in the south of France compared to the north).
Descriptive epidemiology cannot be used to relate a cause to an effect; it is not a predictive tool.

FURTHER READING
See epidemiology.

REFERENCES
See epidemiology.

Descriptive Statistics

The information gathered in a study can often take different forms, such as frequency data (for example, the number of votes cast for a candidate in elections) and scale data. These data are often initially arranged or organized in such a way that they are difficult to read and interpret.
Descriptive statistics offers us some procedures that allow us to represent data in a readable and worthwhile form.
Some of these procedures allow us to obtain a graphical representation of the data, for example in the following forms:
• Histogram,
• Bar chart,
• Pie chart,
• Stem and leaf,
• Box plot, etc.,
while others allow us to obtain a set of parameters that summarize important properties of the basic data:
• Mean,
• Standard deviation,
• Correlation coefficient,
• Index number, etc.
Descriptive statistics also encompasses methods of data analysis, such as correspondence analysis, that consist of graphically representing the associations between the rows and the columns of a contingency table.

HISTORY
The first known form of descriptive statistics was the census. Censuses were ordered by the rulers of ancient civilizations, who wanted to count their subjects and to monitor their professions and goods.
See statistics and graphical representation.

FURTHER READING
• Box plot
• Census
• Correspondence analysis
• Data analysis
• Dendrogram
• Graphical representation
• Histogram
• Index number
• Line chart
• Pie chart
• Stem-and-leaf diagram

Design of Experiments

Designing an experiment is, in some ways, like programming the experiment. Each factor involved in the experiment can take a certain number of different values (called levels), and the experimental design employed specifies the levels of the one or more factors (or combinations of factors) used in the experiment.

HISTORY
Experimental designs were first used in the 1920s, mostly in the agricultural domain. Sir Ronald Aylmer Fisher was the first to use mathematical statistics when designing experiments. In 1926 he wrote a paper outlining the principles of experimental design in non-mathematical terms.
Federer, W.T. and Balaam, L.N. (1973) provided a very detailed bibliography of literature related to experimental design before 1969, incorporating 8000 references.

DOMAINS AND LIMITATIONS
The goal of experimental design is to find the most efficient and economical methods that allow us to reach solid and adequate conclusions from the results of the experiment.


The most frequently applied experimental designs are the completely randomized design, the randomized block design and the Latin square design.
Each design implies a mathematical analysis different from those used for the other designs, since the designs really correspond to different mathematical models. Examples of these types of analysis include variance analysis, covariance analysis and regression analysis.

FURTHER READING
• Analysis of variance
• Experiment
• Model
• Optimal design
• Regression analysis

REFERENCES
Federer, W.T., Balaam, L.N.: Bibliography on Experiment and Treatment Design Pre-1968. Hafner, New York (1973)
Fisher, R.A.: The arrangement of field experiments. J. Ministry Agric. 33, 503–513 (1926)

Determinant

Any square matrix A of order n has a special number associated with it, known as the determinant of matrix A.

MATHEMATICAL ASPECTS
Consider A = (a_ij), a square matrix of order n. The determinant of A, denoted det(A) or |A|, is given by the following sum:

det(A) = Σ (±) a_{1i} · a_{2j} · . . . · a_{nr} ,

where the sum is made over all of the permutations of the second index. The sign is positive if the number of inversions in (i, j, . . . , r) is even (an even permutation) and it is negative if the number is odd (an odd permutation).
The determinant can also be defined by “developing” along a line or a column. For example, we can develop the matrix along the first line:

det(A) = Σ_{j=1}^{n} (−1)^{1+j} · a_{1j} · A_{1j} ,

where A_{1j} is the determinant of the square “submatrix” of order n − 1 obtained from A by erasing the first line and the jth column. A_{1j} is called the cofactor of element a_{1j}.
Since we can arbitrarily choose the line or column, we can write:

det(A) = Σ_{j=1}^{n} (−1)^{i+j} · a_{ij} · A_{ij}

for a fixed i (developing along the ith line), or

det(A) = Σ_{i=1}^{n} (−1)^{i+j} · a_{ij} · A_{ij}

for a fixed j (developing along the jth column).
This second way of defining the determinant is recursive, because the determinant of a square matrix of order n is calculated using the determinants of matrices of order n − 1.

Properties of the Determinant
• The determinant of a square matrix A is equal to the determinant of its transposed matrix A′:

det(A) = det(A′) .


• If two lines (or two columns) are exchanged in matrix A, the sign of the determinant changes.
• If all of the elements of a line (or a column) are zero, the determinant is also zero.
• If a multiple of a line from matrix A is added to another line from A, the determinant remains the same. This means that if two lines from A are identical, it is easy to obtain a line where all of the elements are zero simply by subtracting one line from the other; because of the previous property, the determinant of A is zero. The same situation occurs when one line is a multiple of another.
• The determinant of a matrix that only has zeros under (or over) the diagonal is equal to the product of the elements on the diagonal (such a matrix is called “triangular”).
• Consider A and B, two square matrices of order n. The determinant of the product of the two matrices is equal to the product of the determinants of the two matrices:

det(A · B) = det(A) · det(B) .

EXAMPLES
1) Consider A, a square matrix of order 2:

A = [ a  b ]
    [ c  d ]

det(A) = a · d − b · c .

2) Consider B, a square matrix of order 3:

B = [ 3   2  1 ]
    [ 4  −1  0 ]
    [ 2  −2  0 ] .

By developing along the last column:

det(B) = 1 · | 4  −1 | + 0 · | 3   2 | + 0 · | 3   2 |
             | 2  −2 |       | 2  −2 |       | 4  −1 |
       = 1 · {4 · (−2) − (−1) · 2}
       = −8 + 2
       = −6 .
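As a small illustration (not part of the original entry), the recursive cofactor expansion described above can be written as a short Python function; the helper name det is an arbitrary choice for this sketch.

def det(A):
    """Determinant by recursive cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Submatrix obtained by erasing the first row and the j-th column
        sub = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(sub)
    return total

B = [[3, 2, 1],
     [4, -1, 0],
     [2, -2, 0]]
print(det(B))  # -6, as in the example above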

FURTHER READING
• Inversion
• Matrix
• Permutation

Deviation

The concept of deviation describes the difference between an observed value and a fixed value from the set of possible values of a quantitative variable. This fixed value is often the arithmetic mean of the set of values or the median.

MATHEMATICAL ASPECTS
Consider the set of values x1, x2, . . . , xn. The arithmetic mean of these values is denoted by x̄. The deviation of a given value xi with respect to the arithmetic mean is equal to:

deviation = (xi − x̄) .

In a similar way, if the median of these same values is denoted by m, the deviation of a value xi with respect to the median is equal to:

deviation = (xi − m) .
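A one-line Python illustration (using made-up values, not part of the original entry), with the arithmetic mean as the fixed value:

values = [2, 4, 4, 6, 9]
mean = sum(values) / len(values)            # 5.0
deviations = [x - mean for x in values]
print(deviations)  # [-3.0, -1.0, -1.0, 1.0, 4.0]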

FURTHER READING
• Mean absolute deviation
• Standard deviation
• Variance


Dichotomous Variable

A variable is called dichotomous if it can take only two values.
The simplest example is that of the qualitative categorical variable “gender,” which can take two values, “male” and “female”.
Note that quantitative variables can always be reduced and dichotomized. The variable “revenue” can, for example, be reduced to two categories: “low revenue” and “high revenue”.

FURTHER READING
• Binary data
• Category
• Qualitative categorical variable
• Variable

Dichotomy

Dichotomy is the division of the individuals of a population or a sample into two groups, as a function of predetermined criteria.
A variable is called dichotomous when it can only take two values. Such data are called binary data.

FURTHER READING
• Binary data
• Dichotomous variable

Discrete Distribution Function

The distribution function of a discrete random variable is defined for all real numbers as the probability that the random variable takes a value less than or equal to this real number.

HISTORY
See probability.

MATHEMATICAL ASPECTS
The function F, defined by

F(b) = P(X ≤ b) ,

is called the distribution function of the discrete random variable X.
Given a real number b, the distribution function therefore corresponds to the probability that X is less than or equal to b.
The distribution function can be represented graphically on a system of axes. The different values of the discrete random variable X are displayed on the abscissa and the cumulative probabilities corresponding to the different values of X, F(x), are shown on the ordinate.
In the case where the possible values of the discrete random variable are b1, b2, . . . with b1 < b2 < . . . , the discrete distribution function is a step function: F(bi) is constant over the interval [bi, bi+1[.

Properties of the Discrete Distribution Function
1. F is a nondecreasing function; in other words, if a < b, then F(a) ≤ F(b).
2. F takes values from the interval [0, 1].


3. lim_{b→−∞} F(b) = 0.
4. lim_{b→+∞} F(b) = 1.

EXAMPLES
Consider a random experiment that consists of simultaneously throwing two dice, and the discrete random variable X corresponding to the total score from the two dice.
Let us search for the probability of the event {X ≤ 7}, which is by definition the value of the distribution function for x = 7.
The discrete random variable X takes its values from the set E = {2, 3, . . . , 12}. The probabilities associated with each value of X are given by the following table:

X       2     3     4     5     6     7     8     9     10    11    12
P(X)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

To establish the distribution function, we have to calculate, for each value b of X, the sum of the probabilities for all values less than or equal to b. We therefore create a new table containing the cumulative probabilities:

b          2     3     4     5      6      7      8      9      10     11     12
P(X ≤ b)   1/36  3/36  6/36  10/36  15/36  21/36  26/36  30/36  33/36  35/36  36/36 = 1

The probability of the event {X ≤ 7} is therefore equal to 21/36.
The distribution function of the discrete random variable X can then be represented graphically as a step function over these values.
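As an illustration (not part of the original entry), the following Python sketch reproduces the tables above: it enumerates the 36 equally likely outcomes of the two dice and accumulates P(X ≤ b).

from fractions import Fraction
from itertools import product

# Probability function of X = total score of two fair dice
p = {}
for d1, d2 in product(range(1, 7), repeat=2):
    s = d1 + d2
    p[s] = p.get(s, Fraction(0)) + Fraction(1, 36)

# Distribution function F(b) = P(X <= b)
F = {}
cumulative = Fraction(0)
for b in sorted(p):
    cumulative += p[b]
    F[b] = cumulative

print(F[7])  # 7/12, i.e. 21/36 as found above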

FURTHER READING
• Probability
• Probability function
• Random experiment
• Value

Discrete Probability Distribution

If each possible value of a discrete random variable is associated with a certain probability, we can obtain the discrete probability distribution of this random variable.

MATHEMATICAL ASPECTS
The probability distribution of a discrete random variable X is given by its probability function P(x) or its distribution function F(x).
It can generally be characterized by its expected value:

E[X] = Σ_{x∈D} x · P(X = x) ,

and its variance:

Var(X) = Σ_{x∈D} (x − E[X])² · P(X = x) ,

where D represents the set from which X can take its values.


EXAMPLES
The discrete probability distributions that are most commonly used are the Bernoulli distribution, the binomial distribution, the negative binomial distribution, the geometric distribution, the multinomial distribution, the hypergeometric distribution and the Poisson distribution.

FURTHER READING
• Bernoulli distribution
• Binomial distribution
• Continuous probability distribution
• Discrete distribution function
• Expected value
• Geometric distribution
• Hypergeometric distribution
• Joint probability distribution function
• Multinomial distribution
• Negative binomial distribution
• Poisson distribution
• Probability
• Probability distribution
• Probability function
• Random variable
• Variance of a random variable

REFERENCES
Johnson, N.L., Kotz, S.: Distributions in Statistics: Discrete Distributions. Wiley, New York (1969)

Discrete Uniform Distribution

The discrete uniform distribution is a discrete probability distribution. The corresponding continuous probability distribution is the (continuous) uniform distribution.
Consider n events, each of which has the same probability P(X = x) = 1/n; the random variable X then follows a discrete uniform distribution and its probability function is:

P(X = x) = 1/n , if x = x1, x2, . . . , xn
           0 ,   if not.

[Figure: Discrete uniform distribution, n = 4]

MATHEMATICAL ASPECTS
The expected value of the discrete uniform distribution over the set of the first n natural numbers is, by definition, given by:

E[X] = Σ_{x=1}^{n} x · P(X = x)
     = (1/n) · Σ_{x=1}^{n} x
     = (1/n) · (1 + 2 + · · · + n)
     = (1/n) · (n² + n)/2
     = (n + 1)/2 .

The variance of this discrete uniform distribution is equal to:

Var(X) = E[X²] − (E[X])² .

Since

E[X²] = Σ_{x=1}^{n} x² · P(X = x)
      = (1/n) · Σ_{x=1}^{n} x²
      = (1/n) · (1² + 2² + · · · + n²)
      = (1/n) · n(n + 1)(2n + 1)/6
      = (2n² + 3n + 1)/6 ,

and

(E[X])² = (n + 1)²/4 ,

we have:

Var(X) = (2n² + 3n + 1)/6 − (n + 1)²/4
       = (n² − 1)/12 .

DOMAINS AND LIMITATIONS
The discrete uniform distribution is often used to generate random numbers from any discrete or continuous probability distribution.

EXAMPLES
Consider the random variable X, the score obtained by throwing a die. If the die is not loaded, the probability of obtaining any particular score is equal to 1/6. Therefore, we have:

P(X = 1) = 1/6
P(X = 2) = 1/6
. . .
P(X = 6) = 1/6 .

The number of possible events is n = 6. We therefore have:

P(X = x) = 1/6 , for x = 1, 2, . . . , 6
           0 ,   if not.

In other words, the random variable X follows the discrete uniform distribution.
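A minimal Python check (illustrative, not from the original text) of the formulas E[X] = (n + 1)/2 and Var(X) = (n² − 1)/12 for the fair die (n = 6):

from fractions import Fraction

n = 6
values = range(1, n + 1)
p = Fraction(1, n)                              # P(X = x) = 1/n for each face

mean = sum(x * p for x in values)               # expected value
var = sum((x - mean) ** 2 * p for x in values)  # variance

print(mean)  # 7/2, i.e. (n + 1)/2
print(var)   # 35/12, i.e. (n**2 - 1)/12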

FURTHER READING
• Discrete probability distribution
• Uniform distribution

Dispersion

See measure of dispersion.

Distance

Distance is a numerical description of the spacing between two objects. A distance therefore corresponds to a real number:
• Zero, if both objects are the same;
• Strictly positive, if not.

MATHEMATICAL ASPECTS
Consider three objects X, Y and Z, and the distance between X and Y, d(X, Y). A distance has the following properties:
• It is positive,

d(X, Y) > 0 ,

or zero if and only if the objects are the same,

d(X, Y) = 0 ⇔ X = Y .

• It is symmetric, meaning that

d(X, Y) = d(Y, X) .

• It verifies the following inequality, called the triangular inequality:

d(X, Z) ≤ d(X, Y) + d(Y, Z) .

This says that the distance from one object to another is smaller than or equal to the distance obtained by passing through a third object.

Consider X and Y, two vectors with n components,

X = (xi)′ and Y = (yi)′ , for i = 1, 2, . . . , n .

A family of distances that is often used in such a case is given below:

d(X, Y) = ( Σ_{i=1}^{n} |xi − yi|^p )^{1/p} with p ≥ 1 .

These distances are called Minkowski distances.
The distance for p = 1 is called the absolute distance or the L1 norm.
The distance for p = 2 is called the Euclidean distance, and it is defined by:

d(X, Y) = ( Σ_{i=1}^{n} (xi − yi)² )^{1/2} .

The Euclidean distance is the most frequently used distance. When the type of distance being used is not specified, it is generally the Euclidean distance that is being referred to.
There are also the weighted Minkowski distances, defined by:

d(X, Y) = ( Σ_{i=1}^{n} wi · |xi − yi|^p )^{1/p} with p ≥ 1 ,

where the wi, i = 1, . . . , n, represent the different weights, which sum to 1. When p = 1 or 2, we obtain the weighted absolute and weighted Euclidean distances respectively.

DOMAINS AND LIMITATIONS
For a set of r distinct objects X1, X2, . . . , Xr, the distance between object i and object j can be calculated for each pair (Xi, Xj). These distances are used to create a distance table, or more generally a dissimilarity table, containing the terms

dij = d(Xi, Xj) , for i, j = 1, 2, . . . , r ,

as used in methods of cluster analysis.
Note that the term measure of dissimilarity is used if the triangular inequality is not satisfied.

EXAMPLES
Consider the example based on the grades obtained by five students in their English, French, maths and physics examinations. These grades, which can take any value from 1 to 6, are summarized in the following table:

         English  French  Maths  Physics
Alain    5.0      3.5     4.0    4.5
Jean     3.5      4.0     5.0    5.5
Marc     4.5      4.5     4.0    3.5
Paul     6.0      5.5     5.5    5.0
Pierre   4.0      4.5     2.5    3.5

The Euclidean distance between Alain and Jean, denoted by d2, is given by:

d2(Alain, Jean) = √((5.0 − 3.5)² + · · · + (4.5 − 5.5)²)
                = √(2.25 + 0.25 + 1 + 1)
                = √4.5 = 2.12 .

The absolute distance between Alain and Jean, denoted by d1, is given by:

d1(Alain, Jean) = |5.0 − 3.5| + · · · + |4.5 − 5.5|
                = 1.5 + 0.5 + 1 + 1
                = 4.0 .

Continuing the calculations, we obtain:

d2(Alain, Marc) = √2.25 = 1.5          d1(Alain, Marc) = 2.5
d2(Alain, Paul) = √7.5 = 2.74          d1(Alain, Paul) = 5.0
d2(Alain, Pierre) = √5.25 = 2.29       d1(Alain, Pierre) = 4.5
d2(Jean, Marc) = √6.25 = 2.5           d1(Jean, Marc) = 4.5
d2(Jean, Paul) = √9 = 3                d1(Jean, Paul) = 5.0
d2(Jean, Pierre) = √10.75 = 3.28       d1(Jean, Pierre) = 5.5
d2(Marc, Paul) = √7.75 = 2.78          d1(Marc, Paul) = 6.5
d2(Marc, Pierre) = √2.5 = 1.58         d1(Marc, Pierre) = 2.0
d2(Paul, Pierre) = √16.25 = 4.03       d1(Paul, Pierre) = 7.5 .

The Euclidean distance table is obtained by placing these results in the following table:

         Alain  Jean  Marc  Paul  Pierre
Alain    0      2.12  1.5   2.74  2.29
Jean     2.12   0     2.5   3     3.28
Marc     1.5    2.5   0     2.78  1.58
Paul     2.74   3     2.78  0     4.03
Pierre   2.29   3.28  1.58  4.03  0

Similarly, we can obtain the following absolute distance table:

         Alain  Jean  Marc  Paul  Pierre
Alain    0      4     2.5   5     4.5
Jean     4      0     4.5   5     5.5
Marc     2.5    4.5   0     6.5   2
Paul     5      5     6.5   0     7.5
Pierre   4.5    5.5   2     7.5   0

Note that the order of proximity of the individuals can vary depending on the chosen distance. For example, if the Euclidean distance is used, it is Alain who is closest to Marc (d2(Alain, Marc) = 1.5), whereas Alain comes in second place if the absolute distance is used (d1(Alain, Marc) = 2.5), and it is Pierre who is closest to Marc, with a distance of 2 (d1(Pierre, Marc) = 2).
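The two distances used in this example are easy to compute directly. The Python sketch below (an illustration, not part of the original entry; the helper name minkowski is arbitrary) reproduces d1 and d2 between Alain and Jean from their four grades.

def minkowski(x, y, p):
    """Minkowski distance of order p between two vectors of equal length."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

alain = [5.0, 3.5, 4.0, 4.5]
jean = [3.5, 4.0, 5.0, 5.5]

print(minkowski(alain, jean, 1))             # 4.0  (absolute distance)
print(round(minkowski(alain, jean, 2), 2))   # 2.12 (Euclidean distance)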

FURTHER READING
• Chi-square distance
• Cluster analysis
• Distance table
• Mahalanobis distance
• Measure of dissimilarity

REFERENCES
Abdi, H.: Distance. In: Salkind, N.J. (ed.) Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks (2007)

Distance Table

For a set of r individuals, the distance table is a square matrix of order r whose general term (i, j) equals the distance between the ith and the jth individual. Thanks to the properties of distance, the distance table is symmetrical and its diagonal is zero.

MATHEMATICAL ASPECTS
Suppose we have r individuals, denoted X1, X2, . . . , Xr; the distance table constructed from these r individuals is a square matrix D of order r whose general term is

dij = d(Xi, Xj) ,

where d(Xi, Xj) is the distance between the ith and the jth individual.
The properties of distance allow us to state that:


• The diagonal elements dii of D are zero, because d(Xi, Xi) = 0 (for i = 1, 2, . . . , r);
• D is symmetrical; that is, dij = dji for each i and for values of j from 1 to r.

DOMAINS AND LIMITATIONS
Using a distance table provides a basis for cluster analysis, where the aim is to identify the closest or most similar individuals in order to be able to group them.
The distances entered into a distance table are generally assumed to be Euclidean distances, unless a different type of distance has been specified.

EXAMPLES
Let us take the example of the grades obtained by five students in four examinations: English, French, mathematics and physics. These grades, which can take any multiple of 0.5 from 1 to 6, are summarized in the following table:

         English  French  Maths  Physics
Alain    5.0      3.5     4.0    4.5
Jean     3.5      4.0     5.0    5.5
Marc     4.5      4.5     4.0    3.5
Paul     6.0      5.5     5.5    5.0
Pierre   4.0      4.5     2.5    3.5

We attempt to measure the distances that separate pairs of students. To do this, we compare two lines of the table (corresponding to two students) at a time: to be precise, each grade in the second line is subtracted from the corresponding grade in the first line.
For example, if we calculate the distance from Alain to Jean using the Euclidean distance, we get:

d(Alain, Jean) = ((5.0 − 3.5)² + (3.5 − 4.0)² + (4.0 − 5.0)² + (4.5 − 5.5)²)^{1/2}
               = √(2.25 + 0.25 + 1 + 1)
               = √4.5 = 2.12 .

In the same way, the distance between Alain and Marc equals:

d(Alain, Marc) = ((5.0 − 4.5)² + (3.5 − 4.5)² + (4.0 − 4.0)² + (4.5 − 3.5)²)^{1/2}
               = √(0.25 + 1 + 0 + 1)
               = √2.25 = 1.5 .

And so on:

d(Alain, Paul) = √7.5 = 2.74
d(Alain, Pierre) = √5.25 = 2.29
d(Jean, Marc) = √6.25 = 2.5
d(Jean, Paul) = √9 = 3
d(Jean, Pierre) = √10.75 = 3.28
d(Marc, Paul) = √7.75 = 2.78
d(Marc, Pierre) = √2.5 = 1.58
d(Paul, Pierre) = √16.25 = 4.03 .

By organizing these results into a table, we obtain the following distance table:

         Alain  Jean  Marc  Paul  Pierre
Alain    0      2.12  1.5   2.74  2.29
Jean     2.12   0     2.5   3     3.28
Marc     1.5    2.5   0     2.78  1.58
Paul     2.74   3     2.78  0     4.03
Pierre   2.29   3.28  1.58  4.03  0


We can see that the values along the diagonal equal zero and that the matrix is symmetrical. This means that the distance from each student to himself is zero, and that the distance separating the first student from the second is identical to the distance separating the second from the first. So, d(Xi, Xi) = 0 and d(Xi, Xj) = d(Xj, Xi).
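A distance table like the one above can be produced mechanically. The Python sketch below (illustrative only; the helper name euclidean is arbitrary) computes the full symmetric matrix of Euclidean distances between the five students.

students = {
    "Alain":  [5.0, 3.5, 4.0, 4.5],
    "Jean":   [3.5, 4.0, 5.0, 5.5],
    "Marc":   [4.5, 4.5, 4.0, 3.5],
    "Paul":   [6.0, 5.5, 5.5, 5.0],
    "Pierre": [4.0, 4.5, 2.5, 3.5],
}

def euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

names = list(students)
for i in names:
    row = [round(euclidean(students[i], students[j]), 2) for j in names]
    print(f"{i:7s}", row)
# Each diagonal entry is 0 and the matrix is symmetric, as noted above.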

FURTHER READING
• Cluster analysis
• Dendrogram
• Distance
• Matrix

Distribution Function

The distribution function of a random variable is defined for each real number as the probability that the random variable takes a value less than or equal to this real number.
Depending on whether the random variable is discrete or continuous, we obtain a discrete distribution function or a continuous distribution function.

HISTORY
See probability.

MATHEMATICAL ASPECTS
We define the distribution function of a random variable X at the value b (with −∞ < b < ∞) by:

F(b) = P(X ≤ b) .

FURTHER READING
• Continuous distribution function
• Discrete distribution function
• Probability
• Random variable
• Value

Dot Plot

The dot plot is a type of frequency graph. When the data set is relatively small (up to about 25 data points), the data and their frequencies can be represented in a dot plot.
This type of graphical representation is particularly useful for identifying outliers.

MATHEMATICAL ASPECTS
The dot plot is constructed in the following way. A scale is established that contains all of the values in the data set. Each datum is then individually marked on the graph as a dot above the corresponding value; if there are several data with the same value, the dots are aligned one on top of the other.

EXAMPLES
Consider the following data:

5, 2, 4, 5, 3, 2, 4, 3, 5, 2, 3, 4, 3, 17 .

To construct the dot plot, we establish a scale from 0 to 20 and represent the data with dots above each value. Note that this series includes an outlier, at 17.
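A dot plot of this kind can be sketched even in plain text. The short Python example below (illustrative only) stacks one asterisk per observation for each value of the scale, here printed with one row per value; the isolated mark at 17 makes the outlier visible.

from collections import Counter

data = [5, 2, 4, 5, 3, 2, 4, 3, 5, 2, 3, 4, 3, 17]
counts = Counter(data)

# One line per value of the scale, with one mark per observation
for value in range(0, 21):
    print(f"{value:2d} {'*' * counts.get(value, 0)}")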


FURTHER READING
• Line chart
• Frequency distribution
• Histogram
• Graphical representation

Durbin–Watson Test

The Durbin–Watson test introduces a statistic d that is used to test for autocorrelation of the residuals obtained from a linear regression model. This problem often appears when a linear model is applied to a time series, and we then want to test the independence of the residuals obtained in this way.

HISTORY
Durbin, J. and Watson, G.S. invented this test in 1950.

MATHEMATICAL ASPECTS
Consider the case of a multiple linear regression model containing p − 1 independent variables. The model is written:

Yt = β0 + Σ_{j=1}^{p−1} βj Xjt + εt , t = 1, . . . , T ,

where
Yt is the dependent variable,
Xjt, with j = 1, . . . , p − 1, are the independent variables,
βj, with j = 1, . . . , p − 1, are the parameters to be estimated,
εt, with t = 1, . . . , T, is an unobservable random error term.

In matrix form, the model is written as:

Y = Xβ + ε ,

where
Y is the vector (n × 1) of the observations related to the dependent variable (n observations),
β is the vector (p × 1) of the parameters to be estimated,
ε is the vector (n × 1) of the errors, and

X = [ 1  X11  . . .  X1(p−1) ]
    [ .   .    .      .      ]
    [ 1  Xn1  . . .  Xn(p−1) ]

is the matrix (n × p) of the design associated with the independent variables.
The residuals, obtained by the method of least squares, are given by:

e = Y − Ŷ = [ I − X(X′X)^{−1}X′ ] Y .

The statistic d is defined as follows:

d = Σ_{t=2}^{T} (et − et−1)² / Σ_{t=1}^{T} et² .

The statistic d tests the hypothesis that the errors are independent against the alternative that they follow a first-order autoregressive process:

εt = ρ εt−1 + ut ,

where |ρ| < 1 and the ut are independent and normally distributed with a mean of zero and a variance of σ². We call the term ρ the autocorrelation. In terms of ρ, the hypothesis testing is written as follows:

H0 : ρ = 0
H1 : ρ ≠ 0 .

DOMAINS AND LIMITATIONS
The construction of this statistic means that it can take values between 0 and 4, with d = 2 when ρ = 0. We denote the sample estimate of ρ by ρ̂. In order to test the hypothesis H0, Durbin and Watson tabulated the critical values of d at a significance level of 5%; these critical values depend on the number of observations T and the number of explanatory variables (p − 1). To test for positive autocorrelation at the α level, the test statistic d is compared to lower and upper critical values (dL, dU), and
• reject H0 : ρ = 0 if d < dL;
• do not reject H0 if d > dU;
• for dL < d < dU the test is inconclusive.
Similarly, to test for negative autocorrelation at the α level, the statistic 4 − d is compared to lower and upper critical values (dL, dU), and
• reject H0 : ρ = 0 if 4 − d < dL;
• do not reject H0 if 4 − d > dU;
• for dL < 4 − d < dU the test is inconclusive.
Note that the model must contain a constant term, because established Durbin–Watson tables generally apply to models with a constant term. The variable to be explained should not be among the explanatory variables.

EXAMPLES
We consider an example of the number of highway accidents in the United States per million miles traveled by car between 1970 and 1984. The data for these years are given below:

4.9, 4.7, 4.5, 4.3, 3.6, 3.4, 3.3, 3.4, 3.4, 3.5, 3.5, 3.3, 2.9, 2.7, 2.7.

Plotting these data against time (with zero on the horizontal axis corresponding to the year 1970 and 14 to the year 1984) suggests fitting the model

Number of accidents = a + b · t ,

with t = 0, 1, . . . , 14. The equation for the simple linear regression line obtained for these data is given below:

Number of accidents = 4.59 − 0.141 · t .

We then obtain the following residuals:

0.31, 0.25, 0.19, 0.13, −0.43, −0.49, −0.45, −0.21, −0.07, 0.17, 0.32, 0.26, 0.00, −0.06, 0.08.

The value of the statistic d is 0.53. In the Durbin–Watson table for a one-variable regression, we find the values:

dL,0.05 = 1.08
dU,0.05 = 1.36 .

Therefore, we have the case where 0 < d < dL, so we reject the hypothesis H0 in favor of positive autocorrelation between the residuals of the model. A plot of the residuals against time illustrates this positive correlation.
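The statistic d of this example can be reproduced in a few lines of code. The sketch below, in Python with NumPy (an illustration, not part of the original entry), fits the simple regression by least squares and computes d from the residuals; it yields a value of about 0.53, as above.

import numpy as np

accidents = np.array([4.9, 4.7, 4.5, 4.3, 3.6, 3.4, 3.3, 3.4,
                      3.4, 3.5, 3.5, 3.3, 2.9, 2.7, 2.7])
t = np.arange(len(accidents), dtype=float)     # t = 0, 1, ..., 14

# Least squares fit of: accidents = a + b * t
X = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(X, accidents, rcond=None)
residuals = accidents - X @ coef

# Durbin-Watson statistic
d = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)
print(round(d, 2))  # approximately 0.53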


FURTHER READING
• Autocorrelation
• Multiple linear regression
• Residual
• Simple linear regression
• Statistics
• Time series

REFERENCES
Bourbonnais, R.: Econométrie, manuel et exercices corrigés, 2nd edn. Dunod, Paris (1998)
Durbin, J.: Alternative method to d-test. Biometrika 56, 1–15 (1969)
Durbin, J., Watson, G.S.: Testing for serial correlation in least squares regression, I. Biometrika 37, 409–428 (1950)
Durbin, J., Watson, G.S.: Testing for serial correlation in least squares regression, II. Biometrika 38, 159–177 (1951)
Harvey, A.C.: The Econometric Analysis of Time Series. Philip Allan, Oxford (Wiley, New York) (1981)


Econometrics

Econometrics concerns the application of statistical methods to economic data. Economists use statistics to test their theories or to make forecasts.
Since economic data are not experimental in nature, and often contain some randomness, econometrics uses stochastic models rather than deterministic models.

HISTORY
The advantages of applying mathematics and statistics to the field of economics were quickly realized. In the second part of the seventeenth century, Petty, Sir William (1676) published an important article introducing the methodological foundations of econometrics.
According to Jaffé, W. (1968), Walras, Léon, professor at the University of Lausanne, is recognized as having originated general equilibrium economic theory, which provides the theoretical basis for modern econometrics.
On the 29th of December 1930, in Cleveland, Ohio, a group of economists, statisticians and mathematicians founded the Econometric Society to promote research into mathematical and statistical theories associated with the field of economics. They created the bimonthly review “Econometrica,” which appeared for the first time in January 1933.
The development of simultaneous equation models that could be used in time series for economic forecasts was the moment at which econometrics emerged as a distinct field, and such models remain an important part of econometrics today.

FURTHER READING
• Multiple linear regression
• Simple linear regression
• Time series

REFERENCES
Jaffé, W.: Walras, Léon. In: Sills, D.L. (ed.) International Encyclopedia of the Social Sciences, vol. 16. Macmillan and Free Press, New York, pp. 447–453 (1968)
Petty, W.: Political arithmetic. In: William Petty, The Economic Writings . . . , vol. 1. Kelley, New York, pp. 233–313 (1676)
Wonnacott, R.J., Wonnacott, T.H.: Econometrics. Wiley, New York (1970)

Edgeworth, Francis Y.

Edgeworth, Francis Ysidro was born in 1845 at Edgeworthstown (Ireland) and died in 1926. He contributed to various subjects, such as morality, economic sciences, probability and statistics. His early and important work on indices and the theory of utility was later followed by statistical work that is now sometimes called “Bayesian”.
The terms “module” and “fluctuation” originated with him.
He became the first editor of the Economic Journal in 1891.

Principal work of Edgeworth, Francis Ysidro:

1881 Mathematical Psychics: An Essay on the Application of Mathematics to the Moral Sciences. Kegan Paul, London.

Efron, Bradley

Efron, Bradley was born in St. Paul, Minnesota in May 1938, to Efron, Esther and Efron, Miles, Jewish–Russian immigrants. He graduated in mathematics in 1960. During that year he arrived at Stanford, where he obtained his Ph.D. under the direction of Miller, Rupert and Solomon, Herb in the statistics department. He has taken up several positions at the university: Chair of Statistics, Associate Dean of Science, Chairman of the University Advisory Board, and Chair of the Faculty Senate. He is currently Professor of Statistics and Biostatistics at Stanford University.
He has been awarded many prizes, including the Ford Prize, the MacArthur Prize and the Wilks Medal, for his research work into the use of computer applications in statistics, particularly regarding the bootstrap and jackknife techniques. He is a Member of the National Academy of Sciences and the American Academy of Arts and Sciences. He holds a fellowship in the IMS and the ASA. He also became President of the American Statistical Association in 2004.
The term “computer-intensive statistical methods” originated with him and Diaconis, P. His research interests cover a wide range of statistical topics.

Some principal works and articles of Efron, Bradley:

1993 (with Tibshirani, R.J.) An Introduction to the Bootstrap. Chapman & Hall.
1983 Estimating the error rate of a prediction rule: improvement on cross-validation. J. Am. Stat. Assoc. 78, 316–331.
1982 The Jackknife, the Bootstrap and Other Resampling Plans. SIAM, Philadelphia, PA.

FURTHER READING
• Bootstrap
• Resampling

Eigenvalue

Let A be a square matrix of order n. If there is a vector x ≠ 0 that gives a multiple (k · x) of itself when multiplied by A, then this vector is called an eigenvector and the multiplicative factor k is called an eigenvalue of the matrix A.

MATHEMATICAL ASPECTS
Let A be a square matrix of order n and x a nonzero vector. Then k is an eigenvalue of A if

A · x = k · x ,


that is, (A − k · In) · x = 0, where In is the identity matrix of order n.
As x is nonzero, we can determine the possible values of k by finding the solutions of the equation |A − k · In| = 0.

DOMAINS AND LIMITATIONS
For a square matrix A of order n, the determinant |A − k · In| is a polynomial of degree n in k that has at most n, not necessarily unique, solutions. There will therefore be at most n eigenvalues; some of these may be zeros.

EXAMPLES
Consider the following square matrix of order 3:

A = [ 4   3  2 ]
    [ 0   1  0 ]
    [−2   2  0 ]

A − k · I3 = [ 4−k    3    2  ]
             [  0    1−k   0  ]
             [ −2     2   −k  ] .

The determinant of A − k · I3 is given by:

|A − k · I3| = (1 − k) · [(4 − k) · (−k) − 2 · (−2)]
             = (1 − k) · (k² − 4k + 4)
             = (1 − k) · (k − 2)² .

The eigenvalues are obtained by finding the solutions of the equation (1 − k) · (k − 2)² = 0. We find

k1 = 1 and k2 = k3 = 2 .

Therefore, two of the three eigenvalues coincide.
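In practice the eigenvalues of a small matrix are obtained numerically rather than by expanding the characteristic polynomial by hand. A quick check of the example with NumPy (an illustration, not from the original entry):

import numpy as np

A = np.array([[4, 3, 2],
              [0, 1, 0],
              [-2, 2, 0]])

eigenvalues = np.linalg.eigvals(A)
print(np.round(np.sort(eigenvalues.real), 6))  # approximately [1. 2. 2.]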

FURTHER READING
• Determinant
• Eigenvector
• Matrix
• Vector

Eigenvector

Let A be a square matrix of order n. An “eigenvector” of A is a vector x containing n components (not all zeros) such that the matrix product of A by x is a multiple of x, say k · x.
We say that the eigenvector x is associated with an eigenvalue k.

DOMAINS AND LIMITATIONS
Let A be a square matrix of order n; then the vector x (not zero) is an eigenvector of A if there is a number k such that

A · x = k · x .

As x is nonzero, we determine the possible values of k by finding solutions to the equation |A − k · In| = 0, where In is the identity matrix of order n.
We find the corresponding eigenvectors by resolving the system of equations defined by:

(A − k · In) · x = 0 .

Every nonzero multiple of an eigenvector can be considered to be an eigenvector. If c is a nonzero constant, then

A · x = k · x
c · (A · x) = c · (k · x)
A · (c · x) = k · (c · x) .

Therefore, c · x is also an eigenvector if c is not zero.
It is common to take as the eigenvector the vector of norm 1 (the unit vector) on the axis of c · x, and this eigenvector will lie on the factorial axis. This vector is unique (up to a sign).

EXAMPLES
We consider the following square matrix of order 3:

A = [ 4   3  2 ]
    [ 0   1  0 ]
    [−2   2  0 ] .

The eigenvalues of this matrix A are obtained by setting the determinant of (A − k · I3) equal to zero, where

A − k · I3 = [ 4−k    3    2  ]
             [  0    1−k   0  ]
             [ −2     2   −k  ] .

The determinant of A − k · I3 is given by:

|A − k · I3| = (1 − k) · [(4 − k) · (−k) − 2 · (−2)]
             = (1 − k) · (k² − 4k + 4)
             = (1 − k) · (k − 2)² .

Hence, the eigenvalues are equal to:

k1 = 1 and k2 = k3 = 2 .

We now compute the eigenvectors associated with the eigenvalue k1 = 1:

A − k1 · I3 = [ 4−1    3    2  ]   [  3   3   2 ]
              [  0    1−1   0  ] = [  0   0   0 ]
              [ −2     2   −1  ]   [ −2   2  −1 ] .

Denoting the components of the eigenvector x as

x = [ x1 ]
    [ x2 ]
    [ x3 ] ,

we then need to resolve the following system of equations:

(A − k1 · I3) · x = 0 ,

that is,

3x1 + 3x2 + 2x3 = 0
−2x1 + 2x2 − x3 = 0 ,

where the equation coming from the second row has been eliminated because it is trivial (0 = 0).

x =⎡⎣

7 · x2

x2

−12 · x2

⎤⎦ .

Thevectorx is an eigenvector of thematrixAfor every nonzero valueof x2.To find theunit

Page 185: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

E

Epidemiology 181

eigenvector, we let:

1 =‖ x ‖2= (7 · x2)2 + (x2)

2 + (−12 · x2)2

= (49+ 1+ 144) · x22 = 194 · x2

2 .

from where x22 =

1

194,

and so x2 = 1√194= 0.0718.

The unit eigenvector of the matrix A associ-ated with the eigenvalue k1 = 1 then equals:

x =⎡⎣

0.500.07−0.86

⎤⎦ .
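The same computation can be checked numerically. The Python/NumPy sketch below (illustrative only) extracts the null space of (A − k1·I3) and normalizes the result; up to sign it reproduces the unit eigenvector above.

import numpy as np

A = np.array([[4.0, 3.0, 2.0],
              [0.0, 1.0, 0.0],
              [-2.0, 2.0, 0.0]])
k1 = 1.0

# Null space of (A - k1*I): the right singular vector for the zero singular value
M = A - k1 * np.eye(3)
_, _, vt = np.linalg.svd(M)
x = vt[-1]                     # eigenvector associated with k1, already of unit norm

print(np.round(x, 2))          # approximately [0.50 0.07 -0.86], up to sign
print(np.round(M @ x, 10))     # approximately [0. 0. 0.]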

FURTHER READING
• Determinant
• Eigenvalue
• Eigenvector
• Factorial axis
• Matrix

REFERENCES
Dodge, Y.: Mathématiques de base pour économistes. Springer, Berlin Heidelberg New York (2002)
Meyer, C.D.: Matrix Analysis and Applied Linear Algebra. SIAM (2000)

Epidemiology

Epidemiology (the word derives from the Greek words “epidemos” and “logos,” meaning “study of epidemics”) is the study of the frequency and distribution of illness, and of the parameters and risk factors that define the state of health of a population, in relation to time, space and groups of individuals.

HISTORY
Epidemiology originated in the seventeenth century, when Petty, William (1623–1687), an English doctor, economist and scientist, collected information on the population, which he described in the work Political Arithmetic. Graunt, John (1620–1694) proposed the first rigorous analysis of causes of death in 1662. Quetelet, Adolphe (1796–1874), Belgian astronomer and mathematician, is considered to be the founder of modern population statistics, the mother discipline of epidemiology, statistics, econometrics and other quantitative disciplines describing social and biological characteristics of human life. With the work of Farr, William (1807–1883), epidemiology was recognized as a separate field from statistics, since he studied the causes of death and the way it varied with age, gender, season, place of residence and profession. The Scottish doctor Lind, James (1716–1794) proved that eating citrus fruit stopped scurvy, which marked an important step in the history of epidemiology. Snow, John (1813–1858), a doctor in London, showed that epidemics propagate by contagion, which was underlined by an in situ study proving that drinking Thames water polluted by refuse from sewers contributes to the dissemination of cholera.

MATHEMATICAL ASPECTS
See cause and effect in epidemiology, odds and odds ratio, relative risk, attributable risk, avoidable risk, incidence rate, prevalence rate.

EXAMPLES
Let us take an example involving the mortality rates in two fictitious regions. Suppose that we want to analyze the annual mortality in two regions A and B, each populated with 10000 inhabitants. The mortality data are indicated in the following table:

                                              Region A     Region B
Deaths / Total population                     500/10000    1000/10000
Total mortality per year                      5%           10%
Annual mortality rate among those < 60        5%           5%
Annual mortality rate among those > 60        15%          15%
% of population < 60 years old                100%         50%

From these data, can we state that the inhabitants of region B face a higher risk of mortality than the inhabitants of region A? In other words, would we recommend that people leave region B to live in region A? If we divide both populations by age, we note that, in spite of appearances, the different total mortality rates of regions A and B do not seem to be explained by anything other than the different age structures of A and B.
The previous example highlights the multitude of factors that should be considered during data analysis. The principal factors that we should consider during an epidemiological study include: age (as an absolute age and as the generation that the individual belongs to), gender and ethnicity, among others.

FURTHER READING
• Attributable risk
• Avoidable risk
• Biostatistics
• Cause and effect in epidemiology
• Incidence rate
• Odds and odds ratio
• Prevalence rate
• Relative risk

REFERENCES
Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)
MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)
Morabia, A.: Epidemiologie Causale. Editions Médecine et Hygiène, Geneva (1996)
Rothman, K.J.: Epidemiology: An Introduction. Oxford University Press (2002)

Error

The error is the difference between the estimated value and the true value (or reference value) of the quantity concerned.

HISTORY
The error distributions are the probability distributions which describe the error that appears during repeated measurements of the same quantity under the same conditions. They were introduced in the second half of the eighteenth century to illustrate how the arithmetic mean can be used to obtain a good approximation of the reference value of the studied quantity.
In a letter to the President of the Royal Society, Simpson, T. (1756) suggested that the probability of obtaining a certain error during an observation can be described by a discrete probability distribution. He proposed the first two error distributions. Other discrete distributions were proposed and studied by Lagrange, J.L. (1776).


Simpson, T. (1757) also proposed continuous error distributions, as did Laplace, P.S. (1774 and 1781). However, the most important error distribution, the normal distribution, was proposed by Gauss, C.F. (1809).

MATHEMATICAL ASPECTS
In the field of metrology, the following types of errors can be distinguished:
• The absolute error, which is the absolute value of the difference between the observed value and the reference value of a certain quantity;
• The relative error, which is the ratio between the absolute error and the value of the quantity itself. It characterizes the accuracy of a physical measure;
• The accidental error, which is not related to the measuring tool but to the experimenter himself;
• The experimental error, which is the error due to uncontrolled variables;
• The random error, which is the chance error resulting from a combination of errors due to the instrument and/or the user. The statistical properties of random errors and their estimation are studied using probabilities;
• The residual error, which corresponds to the difference between the estimated value and the observed value. The term residual is sometimes used for this error;
• The systematic error, the error that comes from consistent causes (for example the improper calibration of a measuring instrument), and which always happens in the same direction. The bias used in statistics is a particular example of this;
• The rounding error, which is the error that is created when a numerical value is replaced by a truncated value close to it. When performing calculations with numbers that have n decimal places, the value of the rounding error is located between

−(1/2) · 10^{−n} and (1/2) · 10^{−n} .

During long calculation procedures, rounding errors can accumulate and produce very imprecise results. This type of error is often encountered when using a computer.

Every error that occurs in a statistical problem is a combination of the different types of errors listed above.
In statistics, and more specifically during hypothesis testing, there is also:
• The Type I error, corresponding to the error committed when the null hypothesis is rejected when it is true;
• The Type II error, the error committed when the null hypothesis is not rejected when it is false.

FURTHER READING
• Analysis of residuals
• Bias
• Estimation
• Hypothesis testing
• Residual
• Type I error
• Type II error

REFERENCES
Gauss, C.F.: Theoria Motus Corporum Coelestium. Werke, 7 (1809)
Lagrange, J.L.: Mémoire sur l'utilité de la méthode de prendre le milieu entre les résultats de plusieurs observations; dans lequel on examine les avantages de cette méthode par le calcul des probabilités; et où l'on résoud différents problèmes relatifs à cette matière. Misc. Taurinensia 5, 167–232 (1776)
Laplace, P.S. de: Mémoire sur la probabilité des causes par les événements. Mem. Acad. Roy. Sci. (presented by various scientists) 6, 621–656 (1774) (or Laplace, P.S. de (1891) Œuvres complètes, vol. 8. Gauthier-Villars, Paris, pp. 27–65)
Laplace, P.S. de: Mémoire sur les probabilités. Mem. Acad. Roy. Sci. Paris, 227–332 (1781) (or Laplace, P.S. de (1891) Œuvres complètes, vol. 9. Gauthier-Villars, Paris, pp. 385–485)
Simpson, T.: A letter to the Right Honorable George Earl of Macclesfield, President of the Royal Society, on the advantage of taking the mean of a number of observations in practical astronomy. Philos. Trans. Roy. Soc. Lond. 49, 82–93 (1755)
Simpson, T.: Miscellaneous Tracts on Some Curious and Very Interesting Subjects in Mechanics, Physical-Astronomy and Speculative Mathematics. Nourse, London (1757)

Estimation

Estimation is the procedure used to determine the value of a particular parameter associated with a population. To estimate the parameter, a sample is drawn from the population and the value of the estimator of the unknown parameter is calculated.
Estimation is divided into two large categories: point estimation and interval estimation.

HISTORY
The concept of estimation dates back to the first works on mathematical statistics, notably by Bernoulli, Jacques (1713), Laplace, P.S. (1774) and Bernoulli, Daniel (1778).
The greatest advance in the theory of estimation, after the introduction of the least squares method, was probably the formulation of the method of moments by Pearson, K. (1894, 1898). However, the foundations of the theory of estimation are due to Fisher, R.A. In his first work of 1912, he introduced the maximum likelihood method. In 1922 he wrote a fundamental paper that clearly described, for the first time, what estimation really is.
Fisher, R.A. (1925) also introduced a set of definitions that were adopted to describe estimators. Terms such as “biased,” “efficient” and “sufficient” estimators were introduced by him in his estimation theory.

DOMAINS AND LIMITATIONS
Sampling and estimation are two complementary processes: “sampling” is the process of obtaining a sample from a population, while “estimation” is the reverse process, going from the sample back to the population.

FURTHER READING
• Confidence interval
• Estimator
• L1 estimation
• Point estimation
• Robust estimation
• Sample
• Sampling


REFERENCES
Armatte, M.: La construction des notions d'estimation et de vraisemblance chez Ronald A. Fisher. In: Mairesse, J. (ed.) Estimation et sondages. Economica, Paris (1988)
Bernoulli, D.: The most probable choice between several discrepant observations, translation and comments (1778). In: Pearson, E.S. and Kendall, M.G. (eds.) Studies in the History of Statistics and Probability. Griffin, London (1970)
Bernoulli, J.: Ars Conjectandi, Opus Posthumum. Accedit Tractatus de Seriebus infinitis, et Epistola Gallice scripta de ludo Pilae recticularis. Impensis Thurnisiorum, Fratrum, Basel (1713)
Fisher, R.A.: On an absolute criterium for fitting frequency curves. Mess. Math. 41, 155–160 (1912)
Fisher, R.A.: On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. Lond. Ser. A 222, 309–368 (1922)
Fisher, R.A.: Theory of statistical estimation. Proc. Camb. Philos. Soc. 22, 700–725 (1925)
Gauss, C.F.: Theoria Motus Corporum Coelestium. Werke, 7 (1809)
Laplace, P.S. de: Mémoire sur la probabilité des causes par les événements. Mem. Acad. Roy. Sci. (presented by various scientists) 6, 621–656 (1774) (or Laplace, P.S. de (1891) Œuvres complètes, vol. 8. Gauthier-Villars, Paris, pp. 27–65)
Laplace, P.S. de: Théorie analytique des probabilités. Courcier, Paris (1812)
Legendre, A.M.: Nouvelles méthodes pour la détermination des orbites des comètes. Courcier, Paris (1805)
Pearson, K.: Contributions to the mathematical theory of evolution. I. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 1–40 (1948). First published in 1894 as: On the dissection of asymmetrical frequency curves. Philos. Trans. Roy. Soc. Lond. Ser. A 185, 71–110
Pearson, K., Filon, L.N.G.: Mathematical contributions to the theory of evolution. IV: On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philos. Trans. Roy. Soc. Lond. Ser. A 191, 229–311 (1898)

Estimator

Any statistical function of a sample used to estimate an unknown parameter of the population is called an estimator.
Any value obtained from this estimator is an estimate of the parameter of the population.

HISTORY
See estimation.

MATHEMATICAL ASPECTS
Consider θ, an unknown parameter defined for a population, and (X1, X2, . . . , Xn), a sample taken from this population. A statistical function (or simply a statistic) of this sample, g(X1, . . . , Xn), is used to estimate θ. To distinguish the statistic g(X1, . . . , Xn), which is a random variable, from the value that it takes in a particular case, it is common to call the statistic g(X1, . . . , Xn) the estimator, while the value that it takes in a particular case is called an estimate.


An estimator should not be confused with a method of estimation.
A parameter of a population is usually denoted by a Greek letter, and its estimator by this same letter with a circumflex accent (ˆ), or hat, to distinguish it from the parameter of the population. Sometimes the corresponding Roman letter is used instead.

DOMAINS AND LIMITATIONS
When calculating an estimator, such as the arithmetic mean, it is expected that the value of this estimator will approach the value of the true parameter (the mean of the population in this case) as the sample size increases.
It is therefore clear that a good estimator must be close to the true parameter. The closer the estimators cluster around the true parameters, the better the estimators. In this sense, good estimators are strongly related to statistical measures of dispersion such as the variance.
In general, estimators should possess certain qualities, such as those described below.
1. Estimator without bias:
An estimator θ̂ of an unknown parameter θ is said to be without bias if its expected value is equal to θ,

E[θ̂] = θ .

The mean x̄ and the variance S² of a sample are estimators without bias of, respectively, the mean μ and the variance σ² of the population, with

x̄ = (1/n) · Σ_{i=1}^{n} xi and S² = (1/(n − 1)) · Σ_{i=1}^{n} (xi − x̄)² .

2. Efficient estimator:
An estimator θ̂ of a parameter θ is said to be efficient if the variance of θ̂ is smaller than the variance of any other estimator of θ. Therefore, for two estimators without bias of θ, one will be more efficient than the other if its variance is smaller.
3. Consistent estimator:
An estimator θ̂ of a parameter θ is said to be consistent if the probability that it differs from θ decreases as the sample size increases. Therefore, an estimator is consistent if

lim_{n→∞} P(|θ̂ − θ| < ε) = 1 ,

however small the number ε > 0 is.

EXAMPLES
Consider (x1, x2, . . . , xn), a sample drawn from a population with a mean of μ and a variance of σ².
The estimator x̄, calculated from the sample and used to estimate the parameter μ of the population, is an estimator without bias, since:

E[x̄] = E[(1/n) Σ_{i=1}^{n} xi]
      = (1/n) E[Σ_{i=1}^{n} xi]
      = (1/n) Σ_{i=1}^{n} E[xi]
      = (1/n) Σ_{i=1}^{n} μ
      = (1/n) · n · μ
      = μ .


To estimate the variance σ² of the population, it is tempting to use the estimator S² calculated from the sample (x1, x2, . . . , xn), which is defined here by:

S² = (1/n) Σ_{i=1}^{n} (xi − x̄)² .

This estimator is a biased estimator of σ², since

Σ_{i=1}^{n} (xi − x̄)² = Σ_{i=1}^{n} (xi − μ)² − n(x̄ − μ)² ,

which can be shown in the following way:

Σ_{i=1}^{n} (xi − x̄)²
  = Σ_{i=1}^{n} [(xi − μ) − (x̄ − μ)]²
  = Σ_{i=1}^{n} [(xi − μ)² − 2(xi − μ)(x̄ − μ) + (x̄ − μ)²]
  = Σ_{i=1}^{n} (xi − μ)² − 2(x̄ − μ) · Σ_{i=1}^{n} (xi − μ) + n(x̄ − μ)²
  = Σ_{i=1}^{n} (xi − μ)² − 2(x̄ − μ)(Σ_{i=1}^{n} xi − n · μ) + n(x̄ − μ)²
  = Σ_{i=1}^{n} (xi − μ)² − 2(x̄ − μ)(n · x̄ − n · μ) + n(x̄ − μ)²
  = Σ_{i=1}^{n} (xi − μ)² − 2n(x̄ − μ)² + n(x̄ − μ)²
  = Σ_{i=1}^{n} (xi − μ)² − n(x̄ − μ)² .

Hence:

E[S²] = E[(1/n) Σ_{i=1}^{n} (xi − x̄)²]
      = E[(1/n) Σ_{i=1}^{n} (xi − μ)² − (x̄ − μ)²]
      = (1/n) Σ_{i=1}^{n} E[(xi − μ)²] − E[(x̄ − μ)²]
      = Var(xi) − Var(x̄)
      = σ² − σ²/n
      = ((n − 1)/n) σ² ≠ σ² .

Therefore, to get an estimator of the variance σ² that is not biased, we avoid using S² and instead multiply it by n/(n − 1):

S′² = (n/(n − 1)) S²
    = (n/(n − 1)) · (1/n) Σ_{i=1}^{n} (xi − x̄)²
    = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)² .
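A small simulation makes the bias of S² visible. The Python sketch below (illustrative, not from the original entry) draws many samples of size n = 5 from a population with variance σ² = 1 and averages both estimators; S² clusters near (n − 1)/n · σ² = 0.8, while S′² clusters near 1.

import random

random.seed(1)
n, repetitions = 5, 50_000
sum_s2, sum_s2_unbiased = 0.0, 0.0

for _ in range(repetitions):
    sample = [random.gauss(0, 1) for _ in range(n)]       # population variance = 1
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)
    sum_s2 += ss / n                 # biased estimator S^2
    sum_s2_unbiased += ss / (n - 1)  # unbiased estimator S'^2

print(round(sum_s2 / repetitions, 3))           # close to 0.8 = (n - 1)/n
print(round(sum_s2_unbiased / repetitions, 3))  # close to 1.0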

FURTHER READING
• Bias
• Estimation
• Sample
• Sampling

REFERENCES
Hogg, R.V., Craig, A.T.: Introduction to Mathematical Statistics, 2nd edn. Macmillan, New York (1965)
Lehmann, E.L.: Theory of Point Estimation, 2nd edn. Wiley, New York (1983)

Event

In a random experiment, an event is a subset of the sample space. In other words, an event is a set of possible outcomes of a random experiment.

MATHEMATICAL ASPECTS
Consider Ω, the sample space of a random experiment. Any subset of Ω is called an event.

Different Types of Events
• Impossible event
An impossible event corresponds to the empty set. The probability of an impossible event is equal to 0:

if B = ∅ , P(B) = 0 .

• Sure event
A sure event is an event that corresponds to the sample space. The probability of a sure event is equal to 1:

P(Ω) = 1 .

• Simple event
A simple event is an event that only contains one outcome.

Operations on Events
• Negation of an event
Consider the event E. The negation of this event is another event, called the opposite event of E, which is realized when E is not realized and vice versa. The opposite event of E is denoted by the set Ē, the complement of E:

Ē = Ω \ E = {ω ∈ Ω; ω ∉ E} .

• Intersection of events
Consider the two events E and F. The intersection of these two events is another event that is realized when both events are realized simultaneously. It is called the “E and F” event, and it is denoted by E ∩ F:

E ∩ F = {ω ∈ Ω; ω ∈ E and ω ∈ F} .

In a more general way, if E1, E2, . . . , Er are r events, then

∩_{i=1}^{r} Ei = {ω ∈ Ω; ω ∈ Ei, ∀ i = 1, . . . , r}

is defined as the intersection of the r events, which is realized when all of the r events are realized simultaneously.
• Union of events
Consider the same two events E and F. The union of these two events is another event, which is realized when at least one of the two events E or F is realized. It is sometimes called the “E or F” event, and it is denoted by E ∪ F:

E ∪ F = {ω ∈ Ω; ω ∈ E or ω ∈ F} .

Properties of Operations on Events
Consider the three events E, F and G. The intersection and the union of these events satisfy the following properties:
• Commutativity:

  E ∩ F = F ∩ E ,
  E ∪ F = F ∪ E .

• Associativity:

  E ∩ (F ∩ G) = (E ∩ F) ∩ G ,
  E ∪ (F ∪ G) = (E ∪ F) ∪ G .


• Idempotence:

  E ∩ E = E ,
  E ∪ E = E .

The distributivity of the intersection with respect to the union is:

  E ∩ (F ∪ G) = (E ∩ F) ∪ (E ∩ G) .

The distributivity of the union with respect to the intersection is:

  E ∪ (F ∩ G) = (E ∪ F) ∩ (E ∪ G) .

The main negation relations (De Morgan’s laws) are:

$$\overline{E \cap F} = \bar{E} \cup \bar{F} , \qquad \overline{E \cup F} = \bar{E} \cap \bar{F} .$$

EXAMPLES
Consider a random experiment that involves simultaneously throwing a yellow die and a blue die. The sample space of this experiment is formed from the set of pairs of possible scores from the two dice:

  Ω = {(1, 1), (1, 2), (1, 3), . . . , (2, 1), (2, 2), . . . , (6, 5), (6, 6)} ,

where, for example, (1, 2) means “1” on the yellow die and “2” on the blue die. The number of pairs (or simple events) that can be formed is given by the number of arrangements with repetition of two objects among six, which is 36. For this sample space, it is possible to describe the following events, for example:
1. The sum of points is equal to six:
   A = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} .
2. The sum of points is even:
   B = {(1, 1), (1, 3), (1, 5), (2, 2), (2, 4), . . . , (6, 4), (6, 6)} .
3. The sum of points is less than 5:
   C = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1)} .
4. The sum of points is odd:
   D = {(1, 2), (1, 4), (1, 6), (2, 1), (2, 3), . . . , (6, 5)} = B̄ .
5. The sum of points is even and less than 5:
   E = {(1, 1), (1, 3), (2, 2), (3, 1)} = B ∩ C .
6. The blue die lands on “2”:
   F = {(1, 2), (2, 2), (3, 2), (4, 2), (5, 2), (6, 2)} .
7. The blue die lands on “2” or the total score is less than 5:
   G = {(1, 2), (2, 2), (3, 2), (4, 2), (5, 2), (6, 2), (1, 1), (1, 3), (2, 1), (3, 1)} = F ∪ C .
8. A and D are incompatible.
9. A implies B.
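These set operations translate directly into code. The following sketch (an added illustration, not part of the original entry) rebuilds the two-dice sample space and checks a few of the events above with Python sets:

# Sample space of the two-dice experiment: 36 ordered pairs (yellow, blue)
omega = {(y, b) for y in range(1, 7) for b in range(1, 7)}

A = {p for p in omega if sum(p) == 6}      # sum equal to six
B = {p for p in omega if sum(p) % 2 == 0}  # sum even
C = {p for p in omega if sum(p) < 5}       # sum less than 5

D = omega - B          # complement of B: sum odd
E = B & C              # intersection: even and less than 5
print(len(omega))      # 36
print(sorted(E))       # [(1, 1), (1, 3), (2, 2), (3, 1)]
print(A & D == set())  # True: A and D are incompatible
print(A <= B)          # True: A implies B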

FURTHER READING
• Compatibility
• Conditional probability
• Independence
• Probability
• Random experiment
• Sample space


Expected Value

The expected value of a random variable is the weighted mean of the values that the random variable can take, where the weight is the probability that a particular value is taken by the random variable.

HISTORY
The mathematical principle of the expected value first appeared in the work entitled De Ratiociniis in Aleae Ludo, published in 1657 by the Dutch scientist Huygens, Christiaan (1629–1695). His thoughts on the subject appear to have heavily influenced the later works of Pascal and Fermat on probability.

MATHEMATICAL ASPECTS
Depending on whether the random variable is discrete or continuous, we refer to either the expected value of a discrete random variable or the expected value of a continuous random variable.
Consider a discrete random variable X that has a probability function p(X). The expected value of X, denoted by E[X] or μ, is defined by

$$\mu = E[X] = \sum_{i=1}^{n} x_i\, p(x_i)$$

if X can take n values.
If the random variable X is continuous, the expected value becomes

$$\mu = E[X] = \int_{D} x\, f(x)\, dx ,$$

if X takes values over the interval D, where f(x) is the density function of X.

Properties of the Expected Value
Consider the two constants a and b, and the random variables X and Y. We then have:
1. E[aX + b] = aE[X] + b;
2. E[X + Y] = E[X] + E[Y];
3. E[X − Y] = E[X] − E[Y];
4. If X and Y are independent, then E[X · Y] = E[X] · E[Y].

EXAMPLES
We will consider two examples, one concerning a discrete random variable, the other concerning a continuous random variable.
Consider a game of chance where a die is thrown and the score noted. Suppose that you win one euro if it lands on an even number, two euros if it lands on a “1” or a “3,” and lose three euros if it lands on a “5.”
The random variable X considered here is the number of euros won or lost. The following table gives the different values of X and their respective probabilities:

  X        −3     1     2
  P(X)     1/6    3/6   2/6

The expected value is therefore equal to:

$$E[X] = \sum_{i=1}^{3} x_i\, p(x_i) = -3\cdot\frac{1}{6} + 1\cdot\frac{3}{6} + 2\cdot\frac{2}{6} = \frac{2}{3} .$$

In other words, the player wins, on average, 2/3 of a euro per throw.
Consider a continuous random variable X with the following density function:

$$f(x) = \begin{cases} 1 & \text{for } 0 < x < 1 \\ 0 & \text{elsewhere.} \end{cases}$$


The expected value of this random variable X is equal to:

$$E[X] = \int_{0}^{1} x \cdot 1\, dx = \left.\frac{x^2}{2}\right|_{0}^{1} = \frac{1}{2} .$$
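A quick numerical check of the discrete example (an added illustration, not from the original entry):

from fractions import Fraction

# Gains and their probabilities for the die game described above
values = [Fraction(-3), Fraction(1), Fraction(2)]
probs = [Fraction(1, 6), Fraction(3, 6), Fraction(2, 6)]

expected = sum(x * p for x, p in zip(values, probs))
print(expected)  # 2/3: the player wins two thirds of a euro per throw on average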

FURTHER READING
• Density function
• Probability function
• Weighted arithmetic mean

Experiment

An experiment is an operation conducted under controlled conditions in order to discover a previously unknown effect, to test or establish a hypothesis, or to demonstrate a known law.
The goal of experimenting is to clarify the relation between the controllable conditions and the result of the experiment.
Experimental analysis is performed on observations which are affected not only by the controllable conditions, but also by uncontrolled conditions and measurement errors.

DOMAINS AND LIMITATIONS
When the possible results of the experiment can be described, and it is possible to attribute a probability of realization to each possible outcome, the experiment is called a random experiment. The set of all possible outcomes of an experiment is called the sample space.
A factorial experiment is one in which the experimenter organizes the experiment with two or more factors. The factors are the controllable conditions of the experiment. Experimental errors come from uncontrolled conditions and from measurement errors that are present in any type of experiment.
An experimental design is established depending on the aim of the experiment or on the resources that are available.

EXAMPLES
See random experiment and factorial experiment.

FURTHER READING
• Design of experiments
• Factorial experiment
• Hypothesis testing
• Random experiment

REFERENCES
Box, G.E.P., Hunter, W.G., Hunter, J.S.: Statistics for Experimenters. An Introduction to Design, Data Analysis, and Model Building. Wiley, New York (1978)
Rüegg, A.: Probabilités et statistique. Presses Polytechniques Romandes, Lausanne, Switzerland (1985)

Experimental Unit

An experimental unit is the smallest part of the experimental material to which we apply a treatment.
The experimental design specifies the number of experimental units submitted to the different treatments.

FURTHER READING
• Design of experiments
• Experiment
• Factor
• Treatment


Exploratory Data Analysis

Exploratory data analysis is an approach to data analysis where the features and characteristics of the data are reviewed with an “open mind”; in other words, without attempting to apply any particular model to the data. It is often used upon first contact with the data, before any models have been chosen for the structural or stochastic components, and it is also used to look for deviations from common models.

HISTORY
Exploratory data analysis is a set of techniques principally developed by Tukey, John Wilder since 1970. The philosophy behind this approach is to examine the data before applying a specific probability model. According to Tukey, J.W., exploratory data analysis is similar to detective work: the clues it looks for can be numerical and (very often) graphical. Indeed, Tukey introduced several new semigraphical data representation tools to help with exploratory data analysis, including the “box and whisker plot” (also known as the box plot) in 1972, and the stem and leaf diagram in 1977. This diagram is similar to the histogram, which dates from the eighteenth century.

MATHEMATICAL ASPECTS
A good way to summarize the essential information in a data set with exploratory data analysis is provided by the five-number summary, which is presented in the form of a table as follows:

              Median
  First quartile      Third quartile
  Minimum             Maximum

Let n be the number of observations; we arrange the data in ascending order. We define:

  median rank = (n + 1)/2 ,
  quartile rank = (⌊median rank⌋ + 1)/2 ,

where ⌊x⌋ is the value of x truncated down to the nearest smaller whole number.
In his book, Tukey, J.W. calls the first and third quartiles “hinges”.

DOMAINS AND LIMITATIONS
In the exploratory data analysis defined by Tukey, J.W., there are four principal topics. These are graphical representation, re-expression (which is simply the transformation of variables), residuals and resistance (which is synonymous with the concept of robustness). The resistance is a measure of the sensitivity of the analysis or summary to “bad data”. The need to study resistance reflects the fact that even “good” data rarely contains less than 5% error, so it is important to be able to protect the analysis against the adverse effects of error. Tukey’s resistant line gives a robust fit to a set of points, which means that this line is not overly sensitive to any particular observation. The median is highly resistant, but the mean is not.
Graphical representations are used to analyze the behavior of the data, data fits, diagnostic measures and residuals. In this way we can spot any unexpected characteristics and recognizable irregularity in the data.
The development of exploratory data analysis is closely associated with an emphasis on visual representation and the use of a variety of relatively new graphical techniques, such as the stem and leaf diagram and the box plot.


EXAMPLES
Income indices for Swiss cantons in 1993:

  Canton         Index     Canton               Index
  Zurich         125.7     Schaffhouse           99.2
  Bern            86.2     Appenzell Rh.-Ext.    84.3
  Lucern          87.9     Appenzell Rh.-Int.    72.6
  Uri             88.2     Saint-Gall            89.3
  Schwytz         94.5     Grisons               92.4
  Obwald          80.3     Argovie               98.0
  Nidwald        108.9     Thurgovie             87.4
  Glaris         101.4     Tessin                87.4
  Zoug           170.2     Vaud                  97.4
  Fribourg        90.9     Valais                80.5
  Soleure         88.3     Neuchatel             87.3
  Basel-Stadt    124.2     Geneva               116.0
  Basel-Land     105.1     Jura                  75.1

The table provides the income indices of Swiss cantons per inhabitant (Switzerland = 100) in 1993. We can calculate the five-number summary statistics:

  median rank = (n + 1)/2 = (26 + 1)/2 = 13.5 .

Therefore, the median Md is the mean of the thirteenth and the fourteenth observations:

  Md = (89.3 + 90.9)/2 = 90.1 .

Then we calculate:

  quartile rank = (13 + 1)/2 = 7 .

The first quartile is the seventh observation from the start of the data (when arranged in ascending order) and the third quartile is the seventh observation from the end of the data:

  first quartile = 87.3 ,
  third quartile = 101.4 .

The minimum and maximum are, respectively, 72.6 and 170.2. This gives us the following five-number summary:

              90.1
  87.3                101.4
  72.6                170.2
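A minimal Python sketch of this five-number summary (added as an illustration; it follows the median-rank and quartile-rank rules given above, applied to the 26 canton indices):

import math

indices = sorted([125.7, 86.2, 87.9, 88.2, 94.5, 80.3, 108.9, 101.4, 170.2,
                  90.9, 88.3, 124.2, 105.1, 99.2, 84.3, 72.6, 89.3, 92.4,
                  98.0, 87.4, 87.4, 97.4, 80.5, 87.3, 116.0, 75.1])

def value_at(data, rank):
    # rank is 1-based and may end in .5, meaning the mean of the two neighbours
    lower = data[math.floor(rank) - 1]
    upper = data[math.ceil(rank) - 1]
    return (lower + upper) / 2

n = len(indices)
median_rank = (n + 1) / 2
quartile_rank = (math.floor(median_rank) + 1) / 2

print(value_at(indices, median_rank))            # ≈ 90.1 (median)
print(value_at(indices, quartile_rank))          # 87.3 (first quartile, lower hinge)
print(value_at(indices, n + 1 - quartile_rank))  # 101.4 (third quartile, upper hinge)
print(indices[0], indices[-1])                   # 72.6 170.2 (minimum and maximum)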

Resistance (or Robustness) of the Median with Respect to the Mean
We consider the following numbers: 3, 3, 7, 7, 11, 11. We first calculate the mean of these numbers:

  x̄ = (Σ xᵢ)/n = (3 + 3 + 7 + 7 + 11 + 11)/6 = 7 .

Then we calculate the median Md:

  median rank = (n + 1)/2 = (6 + 1)/2 = 3.5 ,
  Md = (7 + 7)/2 = 7 .

We note that, in this case, the mean and the median both equal 7.
Now suppose that we add another number: −1000. We therefore recalculate the mean and the median of the following numbers: −1000, 3, 3, 7, 7, 11, 11. The mean is:

  x̄ = (Σ xᵢ)/n = (−1000 + 3 + 3 + 7 + 7 + 11 + 11)/7 = −136.86 .

It is clear that the presence of even one “bad datum” can affect the mean to a large degree. The median is:

  median rank = (n + 1)/2 = (7 + 1)/2 = 4 ,
  Md = 7 .


Unlike the mean, the median does not change, which tells us that the median is more resistant to extreme values (“bad data”) than the mean.

FURTHER READING
• Box plot
• Graphical representation
• Residual
• Stem-and-leaf diagram
• Transformation

REFERENCES
Tukey, J.W.: Some graphical and semigraphical displays. In: Bancroft, T.A. (ed.) Statistical Papers in Honor of George W. Snedecor. Iowa State University Press, Ames, IA, pp. 293–316 (1972)
Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, Reading, MA (1977)

Exponential Distribution

A random variable X follows an exponential distribution with parameter θ if its density function is given by:

$$f(x) = \begin{cases} \theta\, e^{-\theta x} & \text{if } x \geq 0,\ \theta > 0 \\ 0 & \text{otherwise.} \end{cases}$$

Exponential distribution, θ = 1, σ = 2

The exponential distribution is also called the negative exponential. The exponential distribution is a continuous probability distribution.

MATHEMATICAL ASPECTS
The distribution function of a random variable X that follows an exponential distribution with parameter θ is as follows:

$$F(x) = \int_{0}^{x} \theta\, e^{-\theta t}\, dt = 1 - e^{-\theta x} .$$

The expected value is given by:

$$E[X] = \frac{1}{\theta} .$$

Since

$$E[X^2] = \frac{2}{\theta^2} ,$$

the variance is equal to:

$$\mathrm{Var}(X) = E[X^2] - (E[X])^2 = \frac{2}{\theta^2} - \frac{1}{\theta^2} = \frac{1}{\theta^2} .$$

The exponential distribution is the continuous probability distribution analog of the geometric distribution.
It is actually a particular case of the gamma distribution, where the parameter α of the gamma distribution is equal to 1.
It becomes a chi-square distribution with two degrees of freedom when the parameter θ of the exponential distribution is equal to 1/2.

DOMAINS AND LIMITATIONS
The exponential distribution is used to describe random events in time. For example, lifespan is a characteristic that is frequently represented by an exponential random variable.
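As an added illustration (not part of the original entry), the moments above can be checked by simulating exponential lifetimes; the standard-library function random.expovariate takes the rate θ directly:

import random

random.seed(2)
theta = 0.5                     # rate parameter
lifetimes = [random.expovariate(theta) for _ in range(100_000)]

mean = sum(lifetimes) / len(lifetimes)
var = sum((x - mean) ** 2 for x in lifetimes) / len(lifetimes)

print(mean)  # close to 1/theta = 2
print(var)   # close to 1/theta**2 = 4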


FURTHER READING
• Chi-square distribution
• Continuous probability distribution
• Gamma distribution
• Geometric distribution

Extrapolation

Statistically speaking, an extrapolation is an estimation of the dependent variable for values of the independent variable that are located outside of the set of observations.
Extrapolation is often used in time series when a model that is determined from the values u₁, u₂, . . . , uₙ observed at times t₁, t₂, . . . , tₙ (with t₁ < t₂ < . . . < tₙ) is used to predict the value of the variable at time tₙ₊₁.

Generally, a simple linear regression function based on the observations (Xᵢ, Yᵢ), i = 1, 2, . . . , n, can be used to estimate values of Y for values of X that are located outside of the set X₁, X₂, . . . , Xₙ. It is also possible to obtain extrapolations from a multiple linear regression model.
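A minimal sketch of such an extrapolation (an added illustration with made-up data), fitting a least-squares line and evaluating it beyond the observed X values:

# Observed data (hypothetical): X and Y
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept of the simple linear regression
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Extrapolation: evaluate the fitted line outside the observed range of X
x_new = 8.0
print(b0 + b1 * x_new)   # predicted Y at X = 8, outside the observed interval [1, 5]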

DOMAINS AND LIMITATIONS
Caution is required when using extrapolations because they are obtained under the hypothesis that the model does not change for the values of the independent variable used for the extrapolation.

FURTHER READING
• Forecasting
• Multiple linear regression
• Simple linear regression
• Time series


Factor

A factor is the term used to describe any controllable condition in an experiment. Each factor can take a predetermined number of values, called the levels of the factor.
Experiments that use factors are called factorial experiments.

EXAMPLES
A sociologist wants to find out the opinions of the inhabitants of a city about the drug problem. He is interested in the influence of gender and income on the attitude to this issue. The levels of the factor “gender” are male (M) and female (F); the levels of the factor “income” correspond to low (L), average (A) and high (H), where the divisions between these levels are set by the experimenter.
The factorial space G of this experiment is composed of the following six combinations of the levels of the two factors:

  G = {(M, L); (M, A); (M, H); (F, L); (F, A); (F, H)} .

In this example, the factor “gender” is qualitative and takes nominal values, whereas the factor “income” is quantitative and takes ordinal values.

FURTHER READING
• Analysis of variance
• Design of experiments
• Experiment
• Factorial experiment
• Treatment

REFERENCES
Raktoe, B.L., Hedayat, A., Federer, W.T.: Factorial Designs. Wiley, New York (1981)

Factor Level

The levels of a factor are the different values that it can take. The experimental design determines which levels, out of all of the possible combinations of such levels, are used in the experiment.

FURTHER READING
• Design of experiments
• Experiment
• Factor
• Treatment

Factorial Axis

Consider S, a square matrix of order n. If Xᵢ is the eigenvector of S associated with the eigenvalue kᵢ, we say that Xᵢ is the ith factorial axis; more precisely, it is the axis that carries the vector Xᵢ.
We generally arrange the eigenvalues in decreasing order, which explains the need to number the factorial axes.
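As an added illustration (not from the original entry), the factorial axes of a small symmetric matrix can be obtained by sorting its eigenvectors by decreasing eigenvalue; numpy.linalg.eigh is assumed to be available here:

import numpy as np

S = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# Eigen-decomposition of the symmetric matrix S
eigenvalues, eigenvectors = np.linalg.eigh(S)

# Arrange eigenvalues (and their eigenvectors) in decreasing order
order = np.argsort(eigenvalues)[::-1]
for rank, idx in enumerate(order, start=1):
    print(f"axis {rank}: eigenvalue {eigenvalues[idx]:.3f}, direction {eigenvectors[:, idx]}")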

HISTORY
See correspondence analysis.

DOMAINS AND LIMITATIONS
The concept of a factorial axis is used in correspondence analysis.

FURTHER READING
• Correspondence analysis
• Eigenvalue
• Eigenvector
• Inertia

REFERENCES
See correspondence analysis.

Factorial Experiment

A factorial experiment is an experiment in which all of the possible treatments that can be derived from two or more factors, where each factor has two or more levels, are studied in such a way that the main effects and the interactions can be investigated.
The term “factorial experiment” describes an experiment where all of the different factors are combined in all possible ways, but it does not specify the experimental design used to perform such an experiment.

MATHEMATICAL ASPECTS
In a factorial experiment, a model equation can be used to relate the dependent variable to the independent variables. If the factorial experiment involves two factors, the associated model is as follows:

$$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk} ,$$

  i = 1, 2, . . . , a (levels of factor A),
  j = 1, 2, . . . , b (levels of factor B),
  k = 1, 2, . . . , nᵢⱼ (number of experimental units receiving the treatment ij),

where:
  μ is the general mean common to all of the treatments,
  αᵢ is the effect of the ith level of factor A,
  βⱼ is the effect of the jth level of factor B,
  (αβ)ᵢⱼ is the effect of the interaction between the ith level of A and the jth level of B, and
  εᵢⱼₖ is the experimental error in the observation Yᵢⱼₖ.

In general, μ is a constant and εᵢⱼₖ is a random variable distributed according to the normal distribution with a mean of zero and a variance of σ².
If the interaction term (αβ)ᵢⱼ is suppressed, an additive model is obtained, meaning that the effect of each factor is a constant that adds to the general mean μ and does not depend on the level of the other factor. This is known as a model without interactions.

DOMAINS AND LIMITATIONS
In a factorial experiment where all of the factors have the same number of levels, the number of treatments employed in the experiment is usually given by the number of levels raised to a power equal to the number of factors. For example, an experiment that employs three factors, each with two levels, is known as a 2³ factorial experiment, which employs eight treatments.
The most commonly used notation for the combinations of treatments used in a factorial experiment can be described as follows. Use a capital letter to designate the factor and a numerical index to designate its level. Consider, for example, a factorial experiment with three factors: a factor A with two levels, a factor B with two levels, and a factor C with three levels. The 12 combinations will be:

  A1B1C1 , A1B1C2 , A1B1C3 ,
  A1B2C1 , A1B2C2 , A1B2C3 ,
  A2B1C1 , A2B1C2 , A2B1C3 ,
  A2B2C1 , A2B2C2 , A2B2C3 .

Sometimes only the indices (written in the same order as the factors) are stated:

  111 , 112 , 113 ,
  121 , 122 , 123 ,
  211 , 212 , 213 ,
  221 , 222 , 223 .
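The full set of treatment combinations can be enumerated mechanically; the following sketch (an added illustration) generates them for the 2 × 2 × 3 example with itertools.product:

from itertools import product

levels = {"A": 2, "B": 2, "C": 3}

# Every treatment is one combination of one level per factor
treatments = [
    "".join(f"{factor}{level}" for factor, level in combo)
    for combo in product(*[[(f, l) for l in range(1, n + 1)] for f, n in levels.items()])
]

print(len(treatments))   # 12
print(treatments[:3])    # ['A1B1C1', 'A1B1C2', 'A1B1C3']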

Double-factorial experiments, or more generally multiple-factorial experiments, are important for the following reasons:
• A double-factorial experiment uses resources more efficiently than two experiments that each employ a single factor. It takes less time and requires fewer experimental units to achieve a given level of precision.
• A double-factorial experiment allows the influence of one factor to be studied at each level of the other factor, because the levels of both factors are varied. This leads to conclusions that are valid over a larger range of experimental conditions than if a series of one-factor designs is used.
• Finally, simultaneous research on two factors is necessary when interactions between the factors are present, meaning that the effect of a factor depends on the level of the other factor.

FURTHER READING
• Design of experiments
• Experiment

REFERENCES
Dodge, Y.: Principaux plans d’expériences. In: Aeschlimann, J., Bonjour, C., Stocker, E. (eds.) Méthodologies et techniques de plans d’expériences: 28ème cours de perfectionnement de l’Association vaudoise des chercheurs en physique, Saas-Fee, 2–8 March 1986 (1986)

Fisher Distribution

The random variable X follows a Fisher distribution if its density function takes the form:

$$f(x) = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}} \cdot \frac{x^{\frac{m-2}{2}}}{\left(1 + \frac{m}{n}\,x\right)^{\frac{m+n}{2}}} ,$$

where Γ is the gamma function (see gamma distribution) and m, n are the degrees of freedom (m, n = 1, 2, . . .).

Fisher distribution, m = 12, n = 8


The Fisher distribution is a continuous probability distribution.

HISTORY
The Fisher distribution was discovered by Fisher, R.A. in 1925. The symbol F, used to denote the Fisher distribution, was introduced by Snedecor in 1934 in honor of Fisher, R.A.

MATHEMATICAL ASPECTS
If U and V are two independent random variables, each following a chi-square distribution with, respectively, m and n degrees of freedom, then the random variable

$$F = \frac{U/m}{V/n}$$

follows a Fisher distribution with m and n degrees of freedom.
The expected value of a random variable F that follows a Fisher distribution is given by:

$$E[F] = \frac{n}{n-2} \quad \text{for } n > 2 ,$$

and the variance is equal to:

$$\mathrm{Var}(F) = \frac{2n^2(m+n-2)}{m(n-2)^2(n-4)} \quad \text{for } n > 4 .$$

The Fisher distribution with 1 and v degrees of freedom is identical to the square of the Student distribution with v degrees of freedom.
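This construction from two chi-square variables is easy to check by simulation; the sketch below (an added illustration) uses numpy's chi-square generator and compares the sample mean of F with n/(n − 2):

import numpy as np

rng = np.random.default_rng(3)
m, n = 12, 8
size = 200_000

U = rng.chisquare(m, size)
V = rng.chisquare(n, size)
F = (U / m) / (V / n)        # Fisher distribution with m and n degrees of freedom

print(F.mean())              # close to n / (n - 2) = 8 / 6 ≈ 1.333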

DOMAINS AND LIMITATIONS
The importance of the Fisher distribution in statistical theory is related to its application to the distribution of the ratio of independent estimators of the variance. Currently, this distribution is most commonly used in the standard tests associated with the analysis of variance and with regression analysis.

FURTHER READING
• Analysis of variance
• Chi-square distribution
• Continuous probability distribution
• Fisher table
• Fisher test
• Student distribution

REFERENCES
Fisher, R.A.: Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh (1925)
Snedecor, G.W.: Calculation and Interpretation of Analysis of Variance and Covariance. Collegiate, Ames, IA (1934)

Fisher Index

The Fisher index is a composite index number which allows us to study the increase in the cost of living (inflation). It is the geometric mean of two index numbers:
• the Laspeyres index, and
• the Paasche index:

$$\text{Fisher index} = \sqrt{(\text{Laspeyres index}) \times (\text{Paasche index})} .$$

HISTORY
In 1922, the economist and mathematician Fisher, Irving established a model based on the Fisher index in order to circumvent some issues related to the use of the index numbers of Laspeyres and Paasche.
Since the Laspeyres index always uses the quantity of goods sold initially as a reference, it can overestimate any increase in the cost of living (because the consumption of more expensive goods may drop over time). Conversely, since the Paasche index always uses the quantity of goods sold in the current period as a reference, it can underestimate the increase in the cost of living.
Because the Fisher index is the geometric mean of these two indices, it can be considered to be an ideal compromise between them.

MATHEMATICAL ASPECTS
Using the definitions of the Laspeyres index and the Paasche index, the Fisher index F_{n/0} is calculated in the following way:

$$F_{n/0} = 100 \cdot \sqrt{\frac{\sum P_n Q_0 \cdot \sum P_n Q_n}{\sum P_0 Q_0 \cdot \sum P_0 Q_n}} ,$$

where Pₙ and Qₙ represent the price of goods and the quantity of them sold in the current period n, while P₀ and Q₀ are the price and quantity sold of the same goods during the reference period 0. Different values are obtained for different goods.
Notice that the Fisher index is reversible, meaning that:

$$F_{n/0} = \frac{1}{F_{0/n}} .$$

DOMAINS AND LIMITATIONS
Even though the Fisher index is a kind of “ideal compromise index number,” it is rarely used in practice.

EXAMPLES
Consider the following fictitious table, indicating the prices and the respective quantities sold of three consumer goods in the reference year 0 and in the current year n.

             Quantities (thousands)     Price (euros)
  Goods      1970 (Q0)    1988 (Qn)     1970 (P0)    1988 (Pn)
  Milk          50.5         85.5          0.20         1.20
  Bread         42.8         50.5          0.15         1.10
  Butter        15.5         40.5          0.50         2.00

From the following table, we have: ΣPₙQₙ = 239.15, ΣP₀Qₙ = 44.925, ΣPₙQ₀ = 138.68 and ΣP₀Q₀ = 24.27.

  Goods      Pn·Qn     P0·Qn     Pn·Q0     P0·Q0
  Milk       102.60    17.100    60.60     10.10
  Bread       55.55     7.575    47.08      6.42
  Butter      81.00    20.250    31.00      7.75
  Total      239.15    44.925   138.68     24.27

We can then find the Paasche index:

$$I_{n/0} = \frac{\sum P_n Q_n}{\sum P_0 Q_n} \cdot 100 = \frac{239.15}{44.925} \cdot 100 = 532.3 ,$$

and the Laspeyres index:

$$I_{n/0} = \frac{\sum P_n Q_0}{\sum P_0 Q_0} \cdot 100 = \frac{138.68}{24.27} \cdot 100 = 571.4 .$$

The Fisher index is the square root of the product of the index numbers of Paasche and Laspeyres (or the geometric mean of the two index numbers):

$$\text{Fisher index} = \sqrt{532.3 \times 571.4} = 551.5 .$$

According to the Fisher index, the price of the goods considered has risen by 451.5% (551.5 − 100) between the reference year and the current year.
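The same computation can be written out as a short script (an added illustration using the figures from the table above):

from math import sqrt

# (P0, Q0, Pn, Qn) for each good: price and quantity in the reference and current periods
goods = {
    "milk":   (0.20, 50.5, 1.20, 85.5),
    "bread":  (0.15, 42.8, 1.10, 50.5),
    "butter": (0.50, 15.5, 2.00, 40.5),
}

sum_PnQn = sum(pn * qn for _, _, pn, qn in goods.values())
sum_P0Qn = sum(p0 * qn for p0, _, _, qn in goods.values())
sum_PnQ0 = sum(pn * q0 for _, q0, pn, _ in goods.values())
sum_P0Q0 = sum(p0 * q0 for p0, q0, _, _ in goods.values())

paasche = 100 * sum_PnQn / sum_P0Qn     # ≈ 532.3
laspeyres = 100 * sum_PnQ0 / sum_P0Q0   # ≈ 571.4
fisher = sqrt(paasche * laspeyres)      # ≈ 551.5

print(round(paasche, 1), round(laspeyres, 1), round(fisher, 1))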

FURTHER READING
• Composite index number
• Index number
• Laspeyres index
• Paasche index
• Simple index number

REFERENCES
Fisher, I.: The Making of Index Numbers. Houghton Mifflin, Boston (1922)

Fisher, Irving

Fisher, Irving was born in New York on the 27th February 1867. He obtained his doctorate at Yale University in 1892. His mathematical approach to the theory of values and prices, first described in his thesis, resulted in him becoming known as one of the first American mathematical economists. He founded the Econometric Society in 1930 with Frisch, Ragnar and Roos, Charles F., and became its first president. Fisher, I. stayed at Yale University throughout his career. He began by teaching mathematics, then economics. After a stay in Europe, he was named Professor of Social Sciences in 1898, and died in New York in 1947.
Fisher was a prolific author: his list of publications contains more than 2000 titles. He was interested in both economic theory and scientific research. In 1920, he proposed econometrical and statistical methods for calculating indices. His “ideal index”, which we now call the Fisher index, is the geometric mean of the Laspeyres index and the Paasche index.

Some principal works and articles of Fisher, Irving:

1892 Mathematical Investigations in the Theory of Value and Prices. Connecticut Academy of Arts and Science, New Haven, CT.
1906 The Nature of Capital and Income. Macmillan, New York.
1907 The Rate of Interest: Its Nature, Determination and Relation to Economic Phenomena. Macmillan, New York.
1910 Introduction to Economic Science. Macmillan, New York.
1911 The Purchasing Power of Money. Macmillan, New York.
1912 Elementary Principles of Economics. Macmillan, New York.
1921 The best form of index number. Am. Stat. Assoc. Quart., 17, 533–537.
1922 The Making of Index Numbers. Houghton Mifflin, Boston, MA.
1926 A statistical relation between unemployment and price changes. Int. Labour Rev., 13, 785–792.
1927 A statistical method for measuring “marginal utility” and testing the justice of a progressive income tax. In: Hollander, J.H. (ed) Economic Essays Contributed in Honor of John Bates Clark. Macmillan, New York.
1930 The Theory of Interest as Determined by Impatience to Spend Income and Opportunity to Invest It. Macmillan, New York.
1932 Booms and Depressions. Adelphi, New York.
1933 The debt-deflation theory of great depressions. Econometrica, 1, 337–357.
1935 100% Money. Adelphi, New York.

FURTHER READING
• Fisher index
• Index number

Fisher, Ronald Aylmer

Born in 1890 in East Finchley, near London, Fisher, Ronald Aylmer studied mathematics at Harrow and at Gonville and Caius College, Cambridge. He began his statistical career in 1919 when he started work at the Institute of Agricultural Research at Rothamsted (“Rothamsted Experimental Station”). The time he spent there, which lasted until 1933, was very productive. In 1933, he moved to become Professor of Eugenics at University College London, and then from 1943 to 1957 he took over the Balfour Chair of Genetics at Cambridge. Even after he had retired, Fisher performed some research for the Mathematical Statistics Division of the Commonwealth Scientific and Industrial Research Organisation in Adelaide, Australia, where he died in 1962.
Fisher is recognized as being one of the founders of modern statistics. Well known in the field of genetics, he also contributed important work to the general theory of maximum likelihood estimation. His work on experimental design, required for the study of agricultural experimental data, is also worthy of mention, as is his development of the technique used for the analysis of variance.
Fisher originated the principles of randomization, randomized blocks, Latin square designs and factorial arrangements.
Some of the main works and articles of Fisher, R.A.:

1912 On an absolute criterion for fitting frequency curves. Mess. Math., 41, 155–160.
1918 The correlation between relatives on the supposition of Mendelian inheritance. Trans. Roy. Soc. Edinburgh, 52, 399–433.
1920 A mathematical examination of methods of determining the accuracy of an observation by the mean error and by the mean square error. Mon. Not. R. Astron. Soc., 80, 758–770.
1921 On the “probable error” of a coefficient of correlation deduced from a small sample. Metron, 1(4), 1–32.
1922 On the mathematical foundation of theoretical statistics. Phil. Trans. A, 222, 309–368.
1925 Theory of statistical estimation. Proc. Camb. Philos. Soc., 22, 700–725.
1925 Statistical Methods for Research Workers. Oliver & Boyd, London.
1934 Probability, likelihood and quantity of information in the logic of uncertain inference. Proc. Roy. Soc. A, 146, 1–8.
1935 The Design of Experiments. Oliver & Boyd, Edinburgh.
1971–1974 Bennett, J.H. (ed) Collected Papers of R.A. Fisher. Univ. Adelaide, Adelaide, Australia.

FURTHER READING
• Fisher distribution
• Fisher table
• Fisher test

REFERENCES
Fisher-Box, J.: R.A. Fisher, the Life of a Scientist. Wiley, New York (1978)

Fisher Table

The Fisher table gives the values of the distribution function of a random variable that follows a Fisher distribution.

HISTORY
See Fisher distribution.

MATHEMATICAL ASPECTS
Let the random variable X follow a Fisher distribution with m and n degrees of freedom. The density function of the random variable X is given by:

$$f(t) = \frac{\Gamma\!\left(\frac{m+n}{2}\right)}{\Gamma\!\left(\frac{m}{2}\right)\Gamma\!\left(\frac{n}{2}\right)}\left(\frac{m}{n}\right)^{\frac{m}{2}} \cdot \frac{t^{\frac{m-2}{2}}}{\left(1 + \frac{m}{n}\,t\right)^{\frac{m+n}{2}}} , \quad t \geq 0 ,$$

where Γ represents the gamma function (see gamma distribution).
Given the degrees of freedom m and n, the Fisher table allows us to determine the value x for which the probability P(X ≤ x) equals a particular value. The most commonly used Fisher tables give the values of x for

  P(X ≤ x) = 95% and P(X ≤ x) = 97.5% .

We generally use (x =) F_{m,n,α} to denote the value of the random variable X for which

  P(X ≤ F_{m,n,α}) = 1 − α .

DOMAINS AND LIMITATIONS
The Fisher table is used in hypothesis testing when it involves statistics distributed according to a Fisher distribution, and especially during the analysis of variance and in simple and multiple linear regression analysis.
Merrington and Thompson (1943) provided Fisher tables with values up to five decimal places, and Fisher and Yates (1938) produced tables with up to two decimal places.

EXAMPLES
If X follows a Fisher distribution with m = 10 and n = 15 degrees of freedom, the value of x that corresponds to a probability of 0.95 is 2.54:

  P(X ≤ 2.54) = 0.95 .

In the same way, the value of x that corresponds to a probability of 0.975 is 3.06:

  P(X ≤ 3.06) = 0.975 .

For an example of the use of the Fisher table, see Fisher test.
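Today such table values are usually read from software rather than from printed tables; as an added illustration (assuming scipy is available), the quantiles quoted above can be reproduced directly:

from scipy.stats import f

# Quantiles of the Fisher (F) distribution with m = 10 and n = 15 degrees of freedom
print(round(f.ppf(0.95, dfn=10, dfd=15), 2))    # 2.54
print(round(f.ppf(0.975, dfn=10, dfd=15), 2))   # 3.06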

FURTHER READING
• Analysis of variance
• Fisher distribution
• Multiple linear regression
• Simple linear regression
• Statistical table

REFERENCES
Fisher, R.A., Yates, F.: Statistical Tables for Biological, Agricultural and Medical Research. Oliver and Boyd, Edinburgh and London (1963)
Merrington, M., Thompson, C.M.: Tables of percentage points of the inverted beta (F) distribution. Biometrika 33, 73–88 (1943)


Fisher Test

A Fisher test is a hypothesis test on the observed value of a statistic of the form:

$$F = \frac{U/m}{V/n} ,$$

where U and V are independent random variables that each follow a chi-square distribution with, respectively, m and n degrees of freedom.

HISTORY
See Fisher distribution.

MATHEMATICAL ASPECTS
We consider a linear model with one factor applied at t levels:

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij} , \quad i = 1, 2, \ldots, t ;\ j = 1, 2, \ldots, n_i ,$$

where
  Yᵢⱼ represents the jth observation receiving the treatment i,
  μ is the general mean common to all treatments,
  τᵢ is the real effect of the treatment i on the observation, and
  εᵢⱼ is the experimental error in the observation Yᵢⱼ.

The random variables εᵢⱼ are independent and normally distributed with mean 0 and variance σ²: N(0, σ²).
In this case, the null hypothesis, which affirms that there is no significant difference between the t treatments, is as follows:

  H₀: τ₁ = τ₂ = . . . = τ_t .

The alternative hypothesis is as follows:

  H₁: the values of τᵢ (i = 1, 2, . . . , t) are not all identical.

To compare the variability within treatments (also called the error or residual variability) with the variability between the treatments, we construct a ratio whose numerator is an estimate of the variance between treatments and whose denominator is an estimate of the variance within treatments (the error). The Fisher test statistic, also called the F-ratio, is then given by:

$$F = \frac{\displaystyle\sum_{i=1}^{t} n_i\left(\bar{Y}_{i.} - \bar{Y}_{..}\right)^2 \big/ (t-1)}{\displaystyle\sum_{i=1}^{t}\sum_{j=1}^{n_i}\left(Y_{ij} - \bar{Y}_{i.}\right)^2 \big/ (N-t)} ,$$

where
  nᵢ is the number of observations in the ith treatment (i = 1, . . . , t),
  N = Σᵢ nᵢ is the total number of observations,
  Ȳᵢ. = Σⱼ Yᵢⱼ / nᵢ is the mean of the ith treatment, for i = 1, . . . , t, and
  Ȳ.. = (1/N) Σᵢ Σⱼ Yᵢⱼ is the global mean.

After choosing a significance level α for the test, we compare the calculated value F with the appropriate critical value in the Fisher table for t − 1 and N − t degrees of freedom, denoted F_{t−1,N−t,α}.
If F ≥ F_{t−1,N−t,α}, we reject the null hypothesis, which means that at least one of the treatments differs from the others.
If F < F_{t−1,N−t,α}, we do not reject the null hypothesis; in other words, the treatments present no significant difference, which also means that the t samples derive from the same population.
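A minimal sketch of this F-ratio (an added illustration with made-up data for three treatments), computing the between- and within-treatment mean squares exactly as in the formula above:

# Hypothetical observations for t = 3 treatments
treatments = [
    [8.0, 9.0, 7.5, 8.5],     # treatment 1
    [6.0, 5.5, 6.5, 7.0],     # treatment 2
    [9.5, 10.0, 9.0, 8.5],    # treatment 3
]

t = len(treatments)
N = sum(len(group) for group in treatments)
grand_mean = sum(sum(group) for group in treatments) / N

# Between-treatment and within-treatment sums of squares
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in treatments)
ss_within = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in treatments)

F = (ss_between / (t - 1)) / (ss_within / (N - t))
print(F)   # compare with the critical value F_{t-1, N-t, alpha} from the Fisher table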


DOMAINS AND LIMITATIONS
The Fisher test is most commonly used during the analysis of variance, in covariance analysis, and in regression analysis.

EXAMPLES
See one-way analysis of variance and two-way analysis of variance.

FURTHER READING
• Analysis of variance
• Fisher distribution
• Fisher table
• Hypothesis testing
• Regression analysis

Forecasting

Forecasting is the utilization of statistical data, economic theory and noneconomic conditions in order to form a reasonable opinion about future events.
Numerous methods are used to forecast the future. They are all based on a restricted number of basic hypotheses. One such hypothesis affirms that the significant tendencies that have been observed historically will continue to hold in the future. Another states that measurable fluctuations in a variable will be reproduced at regular intervals, so that these variations will become predictable.
The time series analysis performed in a forecasting process rests on these hypotheses:
1. Long-term forecast:
   When we make long-term forecasts (more than five years into the future), analysis and extrapolation of the secular tendency become important. Long-term forecasts based on the secular tendency normally do not take cyclic fluctuations into account. Moreover, seasonal variations do not influence annual data and are not considered in a long-term forecast.
2. Mean-term forecast:
   In order to take the probable effect of cyclic fluctuations into account when deriving a mean-term forecast (one to five years ahead), we multiply the value of the projection of the secular tendency by the cyclic fluctuation, which gives the forecasted value. This, of course, assumes that the same data structures that produce the cyclic variations will hold in the future, so that the observed variations will continue to recur regularly. In reality, subsequent cycles tend to vary enormously in terms of their periodicity, their amplitude and the models that they follow.
3. Short-term forecast:
   The seasonal index must be taken into account in a short-term forecast (a few months ahead), along with the projected values for the secular tendency and cyclic fluctuations. Generally, we do not try to forecast irregular variations.

MATHEMATICAL ASPECTS
Let Y be a time series. We can describe it using the following components:
• the secular tendency T;
• the seasonal variations S;
• the cyclic fluctuations C;
• the irregular variations I.
We distinguish between:
• the multiplicative model Y = T · S · C · I;
• the additive model Y = T + S + C + I.
For a value of t beyond the observation period, we specify that:
  Yₜ is a forecast of the value of Y at time t;
  Tₜ is a projection of the secular tendency at time t;
  Cₜ is the cyclic fluctuation at time t;
  Sₜ is the forecast for the seasonal variation at time t.

In this case:
1. For a long-term forecast, the value of the time series can be estimated using:
   Yₜ = Tₜ .
2. For a mid-term forecast, the value of the time series can be estimated using:
   Yₜ = Tₜ · Cₜ (multiplicative model)
   or Yₜ = Tₜ + Cₜ (additive model).
3. For a short-term forecast, the value of the time series can be estimated using:
   Yₜ = Tₜ · Cₜ · Sₜ (for the multiplicative model)
   or Yₜ = Tₜ + Cₜ + Sₜ (for the additive model).
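A toy numerical sketch of these three forecasting horizons (an added illustration; the trend, cyclic and seasonal values are invented):

# Hypothetical components projected for a future month t
T_t = 120.0   # projection of the secular tendency
C_t = 1.05    # cyclic fluctuation (multiplicative form, 5% above trend)
S_t = 0.90    # seasonal index for that month (10% below average)

long_term = T_t                 # long-term forecast: trend only
mid_term = T_t * C_t            # mid-term forecast, multiplicative model
short_term = T_t * C_t * S_t    # short-term forecast, multiplicative model

print(long_term, mid_term, short_term)   # 120.0 126.0 113.4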

DOMAINS AND LIMITATIONS
Planning and making decisions are two activities that involve predicting the future to some degree. In other words, the administrators that perform these tasks have to make forecasts.
It is worth noting that obtaining forecasts via time series analysis is worthwhile, even though the forecast usually turns out to be inaccurate. In reality, even when it is inaccurate, the forecast obtained in this way will probably be more accurate than another forecast based only on intuition. Therefore, despite its limitations and problems, time series analysis is a useful tool in the process of forecasting.

EXAMPLES
The concept of a “forecast” depends on the subject being investigated. For example:
• In meteorology:
  – a short-term forecast looks hours ahead,
  – a long-term forecast looks days ahead.
• In economics:
  – a short-term forecast looks months ahead;
  – a mid-term forecast looks one to five years ahead;
  – a long-term forecast looks more than five years ahead.

FURTHER READING
• Cyclical fluctuation
• Irregular variation
• Seasonal index
• Seasonal variation
• Secular trend
• Time series

Fractional Factorial Design

A fractional factorial experimental design is a factorial experiment in which only a fraction of the possible combinations of the factor levels is realized.
This type of design is used when an experiment contains a number of factors that are believed to be more important than the others, and/or there are a large number of factor levels. The design allows us to reduce the number of experimental units needed.

EXAMPLES
For a factorial experiment containing two factors, each with three levels, there are nine possible combinations. If four of these observations are suppressed, a fraction of the observations is left that can be schematized in the following way:

                  Factor B
  Factor A     1      2      3
     1        Y11    Y12    Y13
     2        Y21
     3        Y31

where Yᵢⱼ represents the observation taken at the ith level of factor A and the jth level of factor B.

FURTHER READING
• Design of experiments
• Experiment

REFERENCES
Yates, F.: The design and analysis of factorial experiments, Techn. comm. 35, Imperial Bureau of Soil Science, Harpenden (1937)

Frequency

The frequency, or absolute frequency, corresponds to the number of appearances of a particular observation or result in an experiment.
The absolute frequency can be distinguished from the relative frequency. The relative frequency of an observation is defined as the ratio of the number of appearances to the total number of observations.
For certain analyses, it is desirable to know the number of observations for which the value is, say, “less than or equal to” or “higher than” a given value. The number of values less than or equal to this given value is called the cumulative frequency.
The percentage of observations in which the value is “less than or equal to” or “more than” a given value may also be desired. The proportion of values less than or equal to this given value is called the cumulative relative frequency.

EXAMPLES
Consider, for example, the following 16 observations, which represent the heights (in cm) of a group of individuals:

  174  169  172  174
  171  179  174  176
  177  161  174  172
  177  168  171  172

Here, the frequency of observing 171 cm is 2, whereas the frequency of observing 174 cm is 4.
In this example, the relative frequency of 174 is 4/16 = 0.25 (or 25% of all of the observations). The absolute frequency is 4 and the total number of observations is 16.
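Counting frequencies like these is a one-liner with the standard library (an added illustration using the 16 heights above):

from collections import Counter

heights = [174, 169, 172, 174, 171, 179, 174, 176,
           177, 161, 174, 172, 177, 168, 171, 172]

freq = Counter(heights)
print(freq[171])                  # 2  (absolute frequency of 171 cm)
print(freq[174])                  # 4  (absolute frequency of 174 cm)
print(freq[174] / len(heights))   # 0.25 (relative frequency of 174 cm)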

FURTHER READING
• Frequency curve
• Frequency distribution
• Frequency table

Frequency Curve

If a frequency polygon is smoothed, a curve is obtained, called the frequency curve. This smoothing can be performed if the number of observations in the frequency distribution becomes infinitely large and the widths of the classes become infinitely small.
The frequency curve corresponds to the limit shape of a frequency polygon.

HISTORY
Towards the end of the nineteenth century, the general tendency was to consider all distributions to be normal distributions. Histograms that presented several modes were adjusted to form a set of normal frequency curves, and those that presented asymmetry were analyzed by transforming the data in such a way that the histogram resulting from this transformation could be compared to a normal curve.
In 1895 Pearson, K. proposed creating a set of various theoretical frequency curves in order to be able to obtain better approximations to the histograms.

FURTHER READING
• Continuous probability distribution
• Frequency distribution
• Frequency polygon
• Normal distribution

REFERENCES
Pearson, K.: Contributions to the mathematical theory of evolution. II: Skew variation in homogeneous material. In: Karl Pearson’s Early Statistical Papers. Cambridge University Press, Cambridge, pp. 41–112 (1948). First published in 1895 in Philos. Trans. Roy. Soc. Lond. Ser. A 186, 343–414

Frequency Distribution

The distribution of a population can be expressed in relative terms (for example as a percentage), in fractions or in absolute terms. In all of these cases, the result is called the frequency distribution.
We obtain either a discrete frequency distribution or a continuous frequency distribution, depending on whether the variable being studied is discrete or continuous.
The number of appearances of a specific value, or the number of observations in a specific class, is called the frequency of that value or class.
The frequency distribution obtained in this manner can be represented either as a frequency table or in a graphical way, for example as a histogram, a frequency polygon or a line chart.
Any frequency distribution has a corresponding relative frequency distribution, which is the distribution of each value or each class with respect to the total number of observations.

HISTORY
See graphical representation.

DOMAINS AND LIMITATIONS
In practice, frequency distributions can vary considerably. For example:
• Some are symmetric; others are asymmetric:

Symmetric distribution

Asymmetric distribution

• Some only have one mode (unimodal distributions); others have several (plurimodal distributions):


Unimodal distribution

Bimodal distribution

This means that frequency distributions can be classified into four main types:
1. Unimodal symmetric distributions;
2. Unimodal asymmetric distributions;
3. J-shaped distributions;
4. U-shaped distributions.
During statistical studies, it is often necessary to compare two distributions. If both distributions are of the same type, it is possible to compare them by examining their main characteristics, such as the measure of central tendency, the measure of dispersion or the measure of form.

EXAMPLES
The following data represent the heights, measured in centimeters, observed for a population of 27 students from a junior high-school class:

  169  177  178  181  173
  172  175  171  175  172
  173  174  176  170  172
  173  172  171  176  176
  175  168  167  166  170
  173  169

By ordering and regrouping these observations by value, we obtain the following frequency distribution:

  Height   Frequency     Height   Frequency
  166          1         174          1
  167          1         175          3
  168          1         176          3
  169          2         177          1
  170          2         178          1
  171          2         179          0
  172          4         180          0
  173          4         181          1

Here is another example, which concerns the frequency distribution of a continuous variable. After retrieving and classifying the personal incomes of the population of Neuchâtel (a canton in Switzerland) for the period 1975–1976, we obtain the following frequency table:

  Net revenue                   Frequency   Relative
  (thousands of Swiss Francs)               frequency
  0–10                             283       0.0047
  10–20                          13175       0.2176
  20–50                          40316       0.6661
  50–80                           5055       0.0835
  80–120                          1029       0.0170
  120+                             670       0.0111


FURTHER READING
• Frequency
• Frequency curve
• Frequency polygon
• Frequency table
• Histogram
• Interval
• Line chart

Frequency Polygon

A frequency polygon is a graphical representation of a frequency distribution, which fits to the histogram of the frequency distribution. It is a type of frequency plot. It takes the form of a segmented line that joins the midpoints of the tops of the rectangles in a histogram.

MATHEMATICAL ASPECTS
We construct a frequency polygon by plotting a point for each class of the frequency distribution. Each point corresponds to the midpoint of the class along the abscissa and the frequency of the class along the ordinate. Then we connect each point to its neighboring points by lines. The lines therefore effectively link the midpoints of the tops of the rectangles in the histogram of the frequency distribution.

Normally we close the frequency polygon at the two ends of the distribution by choosing appropriate points on the horizontal axis and connecting them to the points for the first and last classes.

DOMAINS AND LIMITATIONS
A frequency polygon is usually constructed from data that have been grouped into intervals. In this case, it is preferable to employ intervals of the same width. If we fit a smooth curve to a frequency polygon, we get a frequency curve. This smoothing process involves regrouping the data such that the class width is as small as possible, and then replotting the distribution.
Clearly, we can only perform this type of smoothing if the number of units in the population is large enough to give a significant number of observations in each class after regrouping.

EXAMPLES
The frequency table below gives the annual mean precipitation for 69 cities in the USA:

  Annual mean precipitation (inches)   Frequency
  0–10                                     4
  10–20                                    9
  20–30                                    5
  30–40                                   24
  40–50                                   21
  50–60                                    4
  60–70                                    2
  Total                                   69

The frequency polygon is constructed via the histogram of the frequency distribution.


To construct the frequency polygon, we plot a point for each class. The position of this point along the abscissa is given by the midpoint of the class, and its position along the ordinate is given by the frequency of the class. In other words, each point occurs at the midpoint of the top of the rectangle for the class in the histogram. Each point is then joined to its nearest neighbors by lines. In order to close the polygon, a point is placed on the abscissa one class-width before the midpoint of the first class, and another point is placed on the abscissa one class-width after the midpoint of the last class. These points are then joined by lines to the points for the first and last classes, respectively.
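As an added numerical sketch (not part of the original entry), the vertices of this polygon can be computed directly from the precipitation table above:

# Class limits and frequencies from the precipitation table (69 US cities)
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 24),
           (40, 50, 21), (50, 60, 4), (60, 70, 2)]

width = 10
points = [((low + high) / 2, freq) for low, high, freq in classes]

# Close the polygon one class-width before the first midpoint and after the last one
points = [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]

for midpoint, freq in points:
    print(midpoint, freq)   # vertices (-5, 0), (5, 4), (15, 9), ..., (65, 2), (75, 0)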

FURTHER READING
• Frequency distribution
• Frequency table
• Graphical representation
• Histogram
• Ogive

Frequency Table

The frequency table is a tool used to represent a frequency distribution. It provides the ability to represent statistical observations.
The frequency of a value of a variable is the number of times that this value appears in a population. A frequency distribution can then be defined as the list of the frequencies obtained for the different values taken by the variable.

MATHEMATICAL ASPECTS
Consider a set of n units described by a variable X that can take k values x₁, x₂, . . . , x_k. Let nᵢ be the number of units having the value xᵢ; nᵢ is then the frequency of the value xᵢ. The relative frequency of xᵢ is fᵢ = nᵢ/n.
Since the values are distinct and exhaustive, the sum of all of the frequencies nᵢ equals the total number n of units in the set, and the sum of the relative frequencies fᵢ equals unity; in other words:

$$\sum_{i=1}^{k} n_i = n \quad \text{and} \quad \sum_{i=1}^{k} f_i = 1 .$$

The corresponding frequency table for this scenario is as follows:

  Values of the       Frequencies   Relative
  variable X                        frequencies
  x1                     n1            f1
  x2                     n2            f2
  . . .                  . . .         . . .
  xk                     nk            fk
  Total                  n             1

Sometimes the concept of cumulative frequency is important (see the second example below, for instance). This is the sum of all of the frequencies up to and including the frequency fᵢ:

$$\sum_{j=1}^{i} f_j .$$

DOMAINS AND LIMITATIONS
Constructing a frequency table is the simplest and the most commonly used approach to representing data. However, there are some rules that should be respected when creating such a table:
• The table must have a comprehensive and concise title, which mentions the units employed;
• The names of the lines and columns must be precise and short;
• The column totals should be provided;
• The source of the data given in the table must be indicated.
In general, the table must be comprehensible enough to understand without needing to read the accompanying text.
The number of classes chosen to represent the data depends on the size of the set studied. Using a large number of classes for a small set will result in irregular frequencies due to the small number of units per class. On the other hand, using a small number of classes for a large set results in the loss of information about the structure of the data.
However, there is no general rule for determining the number of classes that should be used to construct a frequency table. For relatively small sets (n ≤ 200), between 7 and 15 classes are recommended, although this rule is not absolute.
To simplify, it is common to use classes of the same width, and the class intervals should not overlap. In certain cases, for example if we have a variable that can take a large range of values, but a certain interval of values is expected to be far more frequent than the rest, it can be useful to use an open class, such as “1000 or more.”

EXAMPLES
The frequency distribution of Australian residents (in thousands) according to their matrimonial status, on the 30th June 1981, is given in the following table.

  Matrimonial status   Frequency (thousands)   Relative frequency
  Single                     6587                  0.452
  Married                    6837                  0.469
  Divorced                    403                  0.028
  Widowed                     749                  0.051
  Total                     14576                  1.000

Source: ABS (1984) Pocket Year Book Australia. Australian Bureau of Statistics, Canberra, p. 11.

The data is presented in the form of a frequency table. The variable “matrimonial status” can take four different values: single, married, divorced or widowed. The frequency, expressed in thousands, and the relative frequency are given for each value.
In the following example, we consider the case where observations are grouped into classes. The following table represents the classification of 3014 people based on their incomes in 1955.

  Income           Frequency nᵢ   Relative     Relative cumulative
                                  frequency    frequency
  Less than 1000       261         0.0866        0.0866
  1000–1999            331         0.1098        0.1964
  2000–2999            359         0.1191        0.3155
  3000–3999            384         0.1274        0.4429
  4000–4999            407         0.1350        0.5780
  5000–7499            703         0.2332        0.8112
  7500–9999            277         0.0919        0.9031
  10000 or more        292         0.0969        1.0000
  Total               3014         1.0000

From this table, we can see that 57.8% of these 3014 people have incomes that are smaller than 5000 dollars.
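A small sketch (an added illustration) that rebuilds the relative and cumulative relative frequencies of this table from the raw counts:

counts = {
    "less than 1000": 261, "1000-1999": 331, "2000-2999": 359,
    "3000-3999": 384, "4000-4999": 407, "5000-7499": 703,
    "7500-9999": 277, "10000 or more": 292,
}

n = sum(counts.values())   # 3014
cumulative = 0.0
for income_class, count in counts.items():
    relative = count / n
    cumulative += relative
    print(f"{income_class:>14}  {count:>4}  {relative:.4f}  {cumulative:.4f}")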

FURTHER READING
• Frequency
• Frequency distribution


Galton, Francis

Galton, Francis was born in 1822 near Birmingham, England, to a family of intellectuals: Darwin, Charles was his cousin, and his grandfather was a member of the Royal Society.
His interest in science first manifested itself in the fields of geography and meteorology. Elected as a member of the Royal Society in 1860, he began to become interested in genetics and statistical methods in 1864.
Galton was close friends with Pearson, K. Indeed, Pearson came to his financial aid when Galton founded the journal "Biometrika." Galton's "Eugenics Record Office" merged with Pearson's biometry laboratory at University College London and became known as the "Galton Laboratory."
He died in 1911, leaving more than 300 publications including 17 books, most notably on statistical methods related to regression analysis and the concept of correlation, both of which are attributed to him.

Some of the main works and articles of Galton, F.:

1869 Hereditary Genius: An Inquiry into its Laws and Consequences. Macmillan, London.
1877 Typical laws in heredity. Nature, 15, 492–495, 512–514, 532–533.
1889 Natural Inheritance. Macmillan, London.
1907 Probability, the Foundation of Eugenics. Henry Froude, London.
1908 Memories of my Life. Methuen, London.
1914–1930 Pearson, K. (ed) The Life, Letters and Labours of Francis Galton. Cambridge University Press, Cambridge.

FURTHER READING
• Correlation coefficient
• Regression analysis

Gamma Distribution

A random variable X follows a gamma distribution with parameters α and β if its density function is given by

f(x) = (β^α / Γ(α)) · x^(α−1) · e^(−βx) ,   if x > 0 , α > 0 , β > 0 ,

where Γ(α) is the gamma function.


Gamma distribution, α = 2, β = 1

The standard form of the gamma distribution is obtained by putting β = 1, which gives

f(x) = (1 / Γ(α)) · x^(α−1) · e^(−x) ,   if x > 0 , α > 0 .

The gamma function Γ(·) appears frequently in statistical theory. It is defined by:

Γ(α) = ∫₀^∞ t^(α−1) e^(−t) dt .

In statistics, we are only interested in values of α > 0.
Integrating by parts, we obtain:

Γ(α + 1) = α · Γ(α) .

Since Γ(1) = 1, we have

Γ(α + 1) = α!   for every positive integer α.

HISTORY
According to Lancaster (1966), this continuous probability distribution originated with Laplace, P.S. (1836).

MATHEMATICAL ASPECTS
The expected value of the gamma distribution is given by:

E[X] = (1/β) · Γ(α + 1)/Γ(α) = α/β .

Since

E[X²] = (1/β²) · Γ(α + 2)/Γ(α) = α(α + 1)/β² ,

the variance is equal to:

Var(X) = E[X²] − (E[X])² = [α(α + 1) − α²]/β² = α/β² .

The chi-square distribution is a particular case of the gamma distribution where α = n/2 and β = 1/2, and n is the number of degrees of freedom of the chi-square distribution.
The gamma distribution with parameter α = 1 gives the exponential distribution:

f(x) = β · e^(−βx) .
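As an illustrative check (not from the original text), the moments derived above can be verified numerically. Note that scipy parameterizes the gamma distribution with a shape a and a scale, so with the rate parameterization used here the scale is 1/β.

```python
# Sketch: check E[X] = alpha/beta and Var(X) = alpha/beta^2, using the
# parameters of the figure above (alpha = 2, beta = 1).
from scipy.stats import gamma

alpha, beta = 2.0, 1.0
dist = gamma(a=alpha, scale=1.0 / beta)    # scipy's scale = 1/beta

print(dist.mean(), alpha / beta)            # both 2.0
print(dist.var(), alpha / beta ** 2)        # both 2.0
```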

FURTHER READING
• Chi-square distribution
• Continuous probability distribution
• Exponential distribution

REFERENCES
Lancaster, H.O.: Forerunners of the Pearson chi-square. Aust. J. Stat. 8, 117–126 (1966)
Laplace, P.S. de: Théorie analytique des probabilités, suppl. to 3rd edn. Courcier, Paris (1836)

Gauss, Carl Friedrich

Gauss, Carl Friedrich was born in 1777 in Brunswick in Germany. He quickly became a renowned astronomer and mathematician and is still considered to be on a par with Archimedes and Newton as one of the greatest mathematicians of all time.
He obtained his doctorate in 1799, and then worked at the University of Helmstedt. In 1807, he moved to Göttingen, where he became Laboratory Director. He spent the rest of his life in Göttingen, where he died in 1855.
His contributions to science, particularly physics, are of great importance. In statistics, his works concerned the theory of estimation; the least squares method and the application of the normal distribution to problems related to measurement errors both originated with him.

Some of the main works and articles of Gauss, C.F.:

1803–1809 Disquisitiones de elementis ellipticis pallidis. Werke, 6, 1–24.
1809 Theoria motus corporum coelestium. Werke, 7. (1963 English translation by Davis, C.H., published by Dover, New York).
1816 Bestimmung der Genauigkeit der Beobachtungen. Werke, 4, 109–117.
1821, 1823 and 1826 Theoria combinationis observationum erroribus minimis obnoxiae, parts 1, 2 and suppl. Werke, 4, 1–108.
1823 Anwendungen der Wahrscheinlichkeitsrechnung auf eine Aufgabe der praktischen Geometrie. Werke, 9, 231–237.
1855 Méthode des Moindres Carrés: Mémoires sur la Combinaison des Observations. French translation of the work of Gauss, C.F. by Bertrand, J. (authorized by Gauss, C.F. himself). Mallet-Bachelier, Paris.
1957 Gauss' Work (1803–1826). On the Theory of Least Squares. English translation by Trotter, H.F. Technical Report No. 5, Statistical Techniques Research Group, Princeton, NJ.

FURTHER READING
• De Moivre, Abraham
• Gauss–Markov theorem
• Normal distribution

Gauss–Markov Theorem

The Gauss–Markov theorem states that when the error probability distribution in a linear model is unknown, then, amongst all of the linear unbiased estimators of the parameters of the linear model, the estimator obtained using the method of least squares is the one that minimizes the variance. The mathematical expectation of each error is assumed to be zero, and all of the errors have the same (unknown) variance.

HISTORY
Gauss, Carl Friedrich provided a proof of this theorem in the first part of his work "Theoria combinationis observationum erroribus minimis obnoxiae" (1821). Markov, Andrei Andreyevich rediscovered this theorem in 1900.
A version of the Gauss–Markov theorem written in modern notation was provided by Graybill in 1976.

MATHEMATICAL ASPECTS
Consider the linear model

Y = X · β + ε ,

where
Y is the n × 1 vector of the observations,
X is the n × p matrix of the independent variables, which are considered fixed,
β is the p × 1 vector of the unknown parameters, and
ε is the n × 1 vector of the random errors.

If the error probability distribution is unknown but the following conditions are fulfilled:
1. The mathematical expectation E[ε] = 0,
2. The variance Var(ε) = σ² · In, where In is the identity matrix,
3. The matrix X has full rank,
then the estimator of β,

β̂ = (X′X)⁻¹ X′Y ,

derived via the least squares method is the unbiased linear estimator of β with the smallest variance.
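A minimal sketch (with made-up data, not from the original entry) of the least squares estimator referred to by the theorem:

```python
# Sketch: compute beta_hat = (X'X)^{-1} X'Y on simulated data with zero-mean,
# equal-variance errors and a full-rank design matrix X.
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # n x p, full rank
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)        # close to (1.0, 2.0, -0.5)
```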

FURTHER READING
• Estimator
• Least squares

REFERENCES
Gauss, C.F.: Theoria Combinationis Observationum Erroribus Minimis Obnoxiae, Parts 1, 2 and suppl. Werke 4, 1–108 (1821, 1823, 1826)
Graybill, F.A.: Theory and Applications of the Linear Model. Duxbury, North Scituate, MA (Waldsworth and Brooks/Cole, Pacific Grove, CA) (1976)
Rao, C.R.: Linear Statistical Inference and Its Applications, 2nd edn. Wiley, New York (1973)

Generalized Inverse

The generalized inverse is analogous to the inverse of a nonsingular square matrix, but it can be used for a matrix of any dimension and rank. The generalized inverse is used in the resolution of systems of linear equations.

HISTORY
It appears that Fredholm (1903) was the first to consider the concept of a generalized inverse. Moore defined a unique generalized inverse in his book General Analysis (1935), published after his death. However, his work was not used until the 1950s, when it experienced a surge in interest due to the application of the generalized inverse to problems related to least squares.
In 1955 Penrose, reusing and enlarging upon work published in 1951 by Bjerhammar, showed that Moore's generalized inverse is the unique matrix G that satisfies the following four equations:
1. A = A · G · A;
2. G = G · A · G;
3. (A · G)′ = A · G;
4. (G · A)′ = G · A.
This unique generalized inverse is known as the Moore–Penrose inverse, and is denoted by A⁺.

MATHEMATICAL ASPECTS
The n × m matrix G is the generalized inverse of the m × n matrix A if

A = A · G · A .

The matrix G is unique if and only if m = n and A is nonsingular.
The most common notation used for the generalized inverse of a matrix A is A⁻.
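As a minimal illustration (not from the original text), numpy's pinv returns the Moore–Penrose inverse, and the four Penrose conditions listed above can be checked directly:

```python
# Sketch: Moore-Penrose inverse of a rectangular matrix and a check of the
# four Penrose conditions.
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])              # rectangular, no ordinary inverse
G = np.linalg.pinv(A)                     # Moore-Penrose inverse A+

print(np.allclose(A @ G @ A, A))          # A = A G A
print(np.allclose(G @ A @ G, G))          # G = G A G
print(np.allclose((A @ G).T, A @ G))      # (A G)' = A G
print(np.allclose((G @ A).T, G @ A))      # (G A)' = G A
```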

FURTHER READING
• Inverse matrix
• Least squares

REFERENCES
Bjerhammar, A.: Rectangular reciprocal matrices with special reference to geodetic calculations. Bull. Geod. 20, 188–220 (1951)
Dodge, Y., Majumdar, D.: An algorithm for finding least square generalized inverses for classification models with arbitrary patterns. J. Stat. Comput. Simul. 9, 1–17 (1979)
Fredholm, J.: Sur une classe d'équations fonctionnelles. Acta Math. 27, 365–390 (1903)
Moore, E.H.: General Analysis, Part I. Mem. Am. Philos. Soc., Philadelphia, PA, pp. 147–209 (1935)
Penrose, R.: A generalized inverse for matrices. Proc. Camb. Philos. Soc. 51, 406–413 (1955)
Rao, C.R.: A note on generalized inverse of a matrix with applications to problems in mathematical statistics. J. Roy. Stat. Soc. Ser. B 24, 152–158 (1962)
Rao, C.R.: Calculus of generalized inverse of matrices. I. General theory. Sankhya A 29, 317–342 (1967)
Rao, C.R., Mitra, S.K.: Generalized Inverses of Matrices and its Applications. Wiley, New York (1971)

Generalized Linear Regression

An extension of the linear regression model to settings whose underlying assumptions are more flexible than those required by ordinary linear regression. For example, instead of assuming that the errors all have equal variances, we can allow for the following:
• Heteroscedasticity: the errors observed in the model (also called residuals) can have different variances among the observations (or among different groups of observations);
• Autocorrelation: there can be some correlation between the errors of different observations.

MATHEMATICAL ASPECTS
We consider a general model of multiple linear regression:

Yi = β0 + Σ_{j=1}^{p−1} βj·Xij + εi ,   i = 1, ..., n ,

where
Yi is the dependent variable,
Xij, j = 1, ..., p − 1, are the independent variables,
βj, j = 0, ..., p − 1, are the parameters to be estimated, and
εi is the random nonobservable error term.

In matrix form, we write:

Y = Xβ + ε ,

where:
Y is the vector (n × 1) of the observations related to the dependent variable (n observations),
β is the vector (p × 1) of the parameters to be estimated,
ε is the vector (n × 1) of the errors,
and X is the matrix (n × p) related to the independent variables, whose ith row is (1, Xi1, ..., Xi(p−1)).
In an ordinary regression model, the hypotheses used for the errors εi are generally as follows:
• Each expectation equals zero (in other words, the observations are not biased);
• They are independent and are not correlated;
• Their variance V(εi) is constant and equals σ² (we suppose that the errors are of the same size).

If the chosen model is appropriate, the distribution of the residuals must confirm these hypotheses (see analysis of variance).
If we find that this is not the case, or if we obtain supplementary information indicating that the errors are correlated, then we can use generalized linear regression to obtain a more precise estimation of the parameters β0, ..., βp−1. We then introduce the variance–covariance matrix V(ε) of the errors (of dimension n × n), whose entry in row i and column j is defined by:
• V(εi) if i = j (the variance of εi),
• Cov(εi, εj) (the covariance between εi and εj) if i ≠ j.
Often this matrix needs to be estimated, but we first consider the simplest case, where V(ε) is known. Using the usual notation for the variance, σ², we write V(ε) = σ²V.

Estimation of the Vector β when V Is Known
By transforming our general model into an ordinary regression model, it is possible to estimate the parameters of the generalized model using:

β̂ = (X′V⁻¹X)⁻¹ X′V⁻¹Y .

We can prove that:
1. The estimation is not biased; that is, E(β̂) = β.
2. The following equality is verified:

   V(β̂) = σ̂² · (X′V⁻¹X)⁻¹ ,   with σ̂² = ε′V⁻¹ε / (n − p) .

3. β̂ is the best (meaning that it has the smallest variance) unbiased estimator of β; this result is known as the generalized Gauss–Markov theorem.
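A minimal sketch (made-up data, assumed known V) of this generalized least squares estimator:

```python
# Sketch: generalized least squares beta_hat = (X'V^{-1}X)^{-1} X'V^{-1}Y
# with a known diagonal variance-covariance structure V.
import numpy as np

rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])
V = np.diag(rng.uniform(0.5, 3.0, n))                 # known, heteroscedastic
beta = np.array([2.0, 0.7])
Y = X @ beta + rng.multivariate_normal(np.zeros(n), V)

V_inv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
print(beta_gls)                                        # close to (2.0, 0.7)
```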

Estimation of the Vector β when V Is Unknown
When the variance–covariance matrix is not known, the model used is called generalized regression. The first step consists of expressing V as a function of a parameter θ, which allows us to estimate V using V̂ = V(θ̂). The estimations of the parameters are then obtained by substituting V̂ for V in the formula given above:

β̂ = (X′V̂⁻¹X)⁻¹ X′V̂⁻¹Y .

Transformation of V in Three Typical Cases

a) Heteroscedasticity
We assume that the observations have variance σ²/wi, where wi is a weight that can be different for each i = 1, ..., n. So:

V(ε) = σ²V = σ² · diag(1/w1, 1/w2, ..., 1/wn) .

By setting

Yw = WY ,   Xw = WX ,   εw = Wε ,

where W is the diagonal square matrix of order n

W = diag(√w1, √w2, ..., √wn) ,

such that W′W = V⁻¹, we obtain an equivalent model in which

V(εw) = V(Wε) = W·V(ε)·W′ = σ²·WVW = σ²·In ,

where In is the identity matrix of dimension n. Since the variances of all of the errors are the same in this new model, we can apply the least squares method, and we obtain the vector of the estimators:

β̂w = (X′w Xw)⁻¹ X′w Yw = (X′W′WX)⁻¹ X′W′WY = (X′V⁻¹X)⁻¹ X′V⁻¹Y ,

so the vector of the estimated values for Y = W⁻¹Yw is:

Ŷ = W⁻¹Ŷw = W⁻¹WX(X′V⁻¹X)⁻¹X′V⁻¹Y = Xβ̂w .

b) Heteroscedasticity by groups
We suppose that the observations fall into three groups (of sizes n1, n2, n3) with respective variances σ1², σ2² and σ3². Then V is the block-diagonal matrix with diagonal blocks σ1²·In1, σ2²·In2 and σ3²·In3, where Inj is the identity matrix of order nj, and with off-diagonal blocks equal to the null matrices Oni×nj of order ni × nj.
The variances σ1², σ2² and σ3² can be estimated by performing ordinary regressions on the different groups of observations and by estimating σ² using S² (see multiple linear regression).

c) First-order autocorrelation
We assume that there is correlation between the errors of different observations. In particular, we assume first-order autocorrelation; that is, the error εi for observation i depends on the error εi−1 for the previous observation. We then have:

εi = ρ·εi−1 + ui ,   i = 2, ..., n ,

where the usual hypotheses used for the errors of a regression model are applied to the ui.
In this case, the variance–covariance matrix of the errors can be expressed as:

V(ε) = σ²V = σ² · (1/(1 − ρ²)) · [ρ^|i−j|] ,   i, j = 1, ..., n ,

that is, a matrix with 1's on the diagonal, ρ on the first off-diagonals, and increasing powers of ρ up to ρ^(n−1) in the corners, multiplied by σ²/(1 − ρ²).
We can estimate V by replacing ρ in the matrix V with the estimate

ρ̂ = Σ_{i=2}^{n} ei·ei−1 / Σ_{i=2}^{n} e²i−1   (the empirical correlation),

where the ei are the estimated errors (residuals) from an ordinary regression.
We can test for first-order autocorrelation by performing a Durbin–Watson test on the residuals (as described under the entry for serial correlation).

EXAMPLES
Correction for heteroscedasticity when the data represent means
In order to test the efficiency of a new supplement on the growth of chickens, 40 chickens were divided into five groups (of sizes n1, n2, ..., n5) and given different doses of the supplement. The growth data obtained are summarized in the following table:

Group i    Number of chickens ni    Mean weight Yi    Mean dose Xi
1                   12                    1.7              5.8
2                    8                    1.9              6.4
3                    6                    1.2              4.8
4                    9                    2.3              6.9
5                    5                    1.8              6.2

Since the variables represent means, the variance of the error εi is σ²/ni.
The variance–covariance matrix of the errors is then:

V(ε) = σ² · diag(1/12, 1/8, 1/6, 1/9, 1/5) = σ²V .

The inverse matrix V⁻¹ is then:

V⁻¹ = diag(12, 8, 6, 9, 5) ,

and the estimations of the parameters β0 and β1 in the model

Yi = β0 + β1·Xi + εi

are:

β̂0 = −1.23 ,   β̂1 = 0.502 .

This result is the same as that obtained using the weighted least squares method.
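The calculation can be sketched as follows (illustrative code, not part of the original entry); since V⁻¹ = diag(ni), the generalized estimator is exactly weighted least squares with weights ni.

```python
# Sketch: reproduce the chicken example. V is diagonal with entries 1/n_i,
# so V^{-1} = diag(n_i).
import numpy as np

n_i = np.array([12, 8, 6, 9, 5], dtype=float)   # chickens per group
Y = np.array([1.7, 1.9, 1.2, 2.3, 1.8])         # mean weights
x = np.array([5.8, 6.4, 4.8, 6.9, 6.2])         # mean doses

X = np.column_stack([np.ones(5), x])
V_inv = np.diag(n_i)

beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ Y)
print(beta_hat)     # close to the estimates reported above (-1.23, 0.502)
```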

FURTHER READING
• Analysis of variance
• Gauss–Markov theorem
• Model
• Multiple linear regression
• Nonlinear regression
• Regression analysis

REFERENCES
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA (1975)
Graybill, F.A.: Theory and Applications of the Linear Model. Duxbury, North Scituate, MA (Waldsworth and Brooks/Cole, Pacific Grove, CA) (1976)
Rao, C.R.: Linear Statistical Inference and Its Applications, 2nd edn. Wiley, New York (1973)

Generation of Random Numbers

Random number generation involves producing a series of numbers with no recognizable patterns or regularities, that is, numbers that appear random. These random numbers are often then used to perform simulations and to solve problems that are difficult or impossible to resolve analytically, using the Monte Carlo method.

HISTORY
The first works concerning random number generation were those of von Neumann, John (1951), who introduced the "middle four square" method. Lehmer (1951) introduced the simple congruence method, which is still used today.
Sowey published three articles giving a classified bibliography on random number generation and testing (Sowey, E.R. 1972, 1978 and 1986).
In 1996, Dodge, Yadolah proposed a natural method of generating random numbers from the digits after the decimal point in π. Some decimals of π are presented in Appendix A.

MATHEMATICAL ASPECTS
There are several ways of generating random numbers. The first consists of pulling notes numbered from 0 to 9 out of an urn one by one, and then putting them back in the urn.
Certain physical systems can be used to generate random numbers: for example, the roulette wheel. The numbers obtained can be listed in a random number table. These tables are useful if several simulations need to be performed for different models in order to test which one is the best. In this case, the same series of random numbers can be reused.
Some computers use electronic circuits or electromechanical means to provide series of random numbers. Since these lists are not reproducible, von Neumann, John and Lehmer, D.H. proposed methods that are fully deterministic but give a series of numbers that have the appearance of being random. Such numbers are known as pseudorandom numbers.
These pseudorandom numbers are calculated from a predetermined algebraic formula. For example, the Lehmer method allows a series of pseudorandom numbers x0, x1, ..., xn, ... to be calculated from

xi = a · xi−1 (modulo m) ,   i = 1, 2, ... ,

where x0 = b and a, b, m are given.
The choice of a, b and m influences the quality of the sample of pseudorandom numbers. Independence between the pseudorandom numbers requires that the length of the cycle, meaning the quantity of numbers generated before the same sequence restarts, be long enough. To obtain the longest possible cycle, the following criteria must be satisfied:
- a must not divide m,
- x0 must not divide m,
- m must be large.

DOMAINS AND LIMITATIONS
If the simulation requires observations derived from nonuniform distributions, there are several techniques that can generate any law from a uniform distribution. Some of these methods are applicable to many distributions; others are more specific and are used for one particular distribution.

EXAMPLES
Let us use the Lehmer method to calculate the first pseudorandom numbers of the series created with a = 33, b = 7 and m = 1000:

x0 = 7
x1 = 7 · 33 (mod 1000) = 231
x2 = 231 · 33 (mod 1000) = 623
x3 = 623 · 33 (mod 1000) = 559
etc.

A series of numbers ri whose distribution does not significantly differ from a uniform distribution is obtained by dividing xi by m:

ri = xi / m ,   i = 1, 2, ...

Therefore:

r0 = 0.007
r1 = 0.231
r2 = 0.623
r3 = 0.559
etc.
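The computation above can be reproduced with a few lines of Python (an illustrative sketch, not part of the original text):

```python
# Sketch: Lehmer (multiplicative congruential) generator
# x_i = a * x_{i-1} (mod m), with pseudorandom numbers r_i = x_i / m.
def lehmer(a, x0, m, count):
    x = x0
    for _ in range(count):
        x = (a * x) % m
        yield x

a, x0, m = 33, 7, 1000
xs = list(lehmer(a, x0, m, 3))
print(xs)                     # [231, 623, 559], as in the example
print([x / m for x in xs])    # 0.231, 0.623, 0.559
```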

FURTHER READING
• Kendall, Maurice George
• Monte Carlo method
• Random number
• Simulation
• Uniform distribution

REFERENCES
Dodge, Y.: A natural random number generator. Int. Stat. Review 64, 329–344 (1996)
Ripley, B.D.: Thoughts on pseudorandom number generators. J. Comput. Appl. Math. 31, 153–163 (1990)

Genetic Statistics

In genetic statistics, genetic data analysis is performed using methods and concepts of classical statistics as well as those derived from the theory of stochastic processes.
Genetics is the study of hereditary characters and accidental variations; it is used to help explain transformism and, in the practical domain, for the improvement of breeds.

HISTORY
Genetics and statistics have been related for over 100 years. Sir Galton, Francis was very interested in studies of human genetics, and particularly in eugenics, the study of methods that could be used to improve the genetic quality of a human population, which obviously requires knowledge of heredity. Galton also invented regression and correlation coefficients in order to use them as statistical tools when investigating genetics. These methods were later developed further by Pearson, Karl. In 1900, the work of Mendel was rediscovered, and classical mathematical and statistical tools for investigating Mendelian genetics were put forward by Sir Fisher, Ronald Aylmer, Wright, Sewall and Haldane, J.B.S. in the period 1920–1950.

DOMAINS AND LIMITATIONS
Genetic statistics is used in domains such as biomathematics, bioinformatics, biology, epidemiology and genetics. Standard methods of estimation and hypothesis testing are used in order to estimate and test genetic parameters. The theory of stochastic processes is used to study the evolution of a population, taking into account the random changes in the frequencies of genes.

EXAMPLES
The Hardy–Weinberg law is the central theoretical model used in population genetics. Formulated independently by the English mathematician Hardy, Godfrey H. and the German doctor Weinberg, W. (1908), the equations of this law show that when genetic variations first appear in some individuals in a population, they do not disappear upon hereditary transmission, but are instead maintained in future generations in the same proportions, conforming to the Mendel laws.

The Hardy–Weinberg law allows us, under certain conditions, to calculate genotypic frequencies from allelic frequencies. By genotype, we mean the genetic properties of an individual that are received during hereditary transmission from the individual's parents. For example, identical twins have the same genotype. An allele is a possible version of a gene, constructed from a nucleotide chain. Therefore, a particular gene can occur in many different allelic forms in a given population. In an individual, if each gene is represented by two alleles that are composed of identical nucleotides, the individual is homozygous for this gene; if the alleles have different compositions, the individual is heterozygous for this gene.
In the original version, the law states that if a gene is controlled by two alleles A and B, which occur with frequencies p and q = 1 − p respectively in the population at generation t, then the frequencies of the genotypes at generation t + 1 are given by:

p²·AA + 2pq·AB + q²·BB .

The basic hypothesis is that the size of the population N is big enough to minimize sampling variations; there must also be no selection, no mutation, no migration (no acquisition/loss of an allele), and successive generations must be discrete (no generation crossing).
To get an AA genotype, the individual must receive one allele of type A from both parents. If this process is random (so the hypothesis of independence holds), this event occurs with probability:

P(AA) = p · p = p² .

The same logic applies to the probability of obtaining the genotype BB:

P(BB) = q · q = q² .

Finally, for the genotype AB, two cases are possible: the individual received A from their father and B from their mother, or vice versa, so:

P(AB) = pq + qp = 2pq .

Therefore, in an ideal population, the Hardy–Weinberg proportions are given by:

AA: p² ,   AB: 2pq ,   BB: q² .

This situation can be generalized to a gene with many alleles A1, A2, ..., Ak. The frequency of homozygotes (AiAi) equals:

f(AiAi) = pi² ,   i = 1, ..., k ,

and the frequency of heterozygotes (AiAj) equals:

f(AiAj) = 2·pi·pj ,   i ≠ j ,   i = 1, ..., k ,   j = 1, ..., k .

The frequency p′ of allele A in generation t + 1 may also be of interest. By simple counting we have:

p′ = (2p²N + 2pqN) / (2N) = p² + pq = p · (p + q) = p ,

so the frequency of the allele A in generation t + 1 is identical to the frequency of this allele in the previous generation and in the initial generation.
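A small numerical sketch (illustrative only, with an assumed allele frequency p = 0.6) of the Hardy–Weinberg computation:

```python
# Sketch: genotype frequencies under Hardy-Weinberg for two alleles A and B,
# and a check that the allele frequency of A is unchanged in the next generation.
p = 0.6
q = 1 - p

f_AA, f_AB, f_BB = p ** 2, 2 * p * q, q ** 2
print(f_AA, f_AB, f_BB)        # 0.36, 0.48, 0.16 (they sum to 1)

# each AA individual carries two A alleles, each AB individual carries one
p_next = f_AA + f_AB / 2
print(p_next)                  # 0.6 = p, as shown above
```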

FURTHER READING
• Data
• Epidemiology
• Estimation
• Hypothesis testing
• Independence
• Parameter
• Population
• Statistics
• Stochastic process

REFERENCES
Galton, F.: Typical laws of heredity. Proc. Roy. Inst. Great Britain 8, 282–301 (1877) (reprinted in: Stigler, S.M. (1986) The History of Statistics: The Measurement of Uncertainty Before 1900. Belknap Press, Cambridge, MA, p. 280)
Galton, F.: Co-relations and their measurement, chiefly from anthropological data. Proc. Roy. Soc. Lond. 45, 135–145 (1888)
Hardy, G.H.: Mendelian proportions in a mixed population. Science 28, 49–50 (1908)
Smith, C.A.B.: Statistics in human genetics. In: Kotz, S., Johnson, N.L. (eds.) Encyclopedia of Statistical Sciences, vol. 3. Wiley, New York (1983)

Geometric Distribution

Consider a process involving k + 1 Bernoulli trials with a probability of success p and a probability of failure q = 1 − p.
The random variable X, corresponding to the number of failures k that occur when the process is repeated until the first success, follows a geometric distribution with parameter p, and so we have:

P(X = k) = p · q^k ,   k = 0, 1, 2, ...

The geometric distribution is a discrete probability distribution.

Geometric distribution, p = 0.3, q = 0.7

Geometric distribution, p = 0.5, q = 0.5

This distribution is called "geometric" because the successive terms of the probability function described above form a geometric progression with ratio q.
The geometric distribution is a particular case of the negative binomial distribution where r = 1, meaning that the process continues until the first success.

MATHEMATICAL ASPECTS
The expected value of the geometric distribution is by definition equal to:

E[X] = Σ_{x=0}^{∞} x · P(X = x) = Σ_{x=0}^{∞} x · p · q^x = q/p .

Indeed,

E[X] = p · Σ_{x=1}^{∞} x · q^x = p · q · (1 + 2q + 3q² + ...) ,

and 1 + 2q + 3q² + ... = 1/(1 − q)², since

q + q² + q³ + ... = q/(1 − q)

and, differentiating both sides,

(q + q² + q³ + ...)′ = (q/(1 − q))′ ,   that is,   1 + 2q + 3q² + ... = 1/(1 − q)² .

Hence E[X] = p·q/(1 − q)² = p·q/p² = q/p.
The variance of the geometric distribution is by definition equal to:

Var(X) = E[X²] − (E[X])²
       = Σ_{x=0}^{∞} x² · P(X = x) − (q/p)²
       = Σ_{x=0}^{∞} x² · p · q^x − (q/p)²
       = p · q · (1 + 2²·q + 3²·q² + ...) − (q/p)²
       = q/p² .

Indeed, the sum 1 + 2²·q + 3²·q² + ... = (1 + q)/(1 − q)³, since

(q · (1 + 2q + 3q² + ...))′ = (q/(1 − q)²)′ ,   that is,   1 + 2²·q + 3²·q² + ... = (1 + q)/(1 − q)³ ,

so that

Var(X) = p·q·(1 + q)/p³ − q²/p² = [q(1 + q) − q²]/p² = q/p² .

DOMAINS AND LIMITATIONS
The geometric distribution is used relatively frequently in meteorological models. It is also used in stochastic processes and in the theory of waiting lines.
The geometric distribution can also be expressed in the following form:

P(X = k) = p · q^(k−1) ,   k = 1, 2, ... ,

where the random variable X represents the number of trials required to attain the first success (including the success itself).

EXAMPLES
A fair die is thrown and we want to know the probability that a "six" will be thrown for the first time on the fourth throw.
We therefore have:

p = 1/6 (probability that a six is thrown),
q = 5/6 (probability that a six is not thrown),
k = 3 (since the fourth throw is a success, we have three failures).

Therefore, the probability is:

P(X = 3) = (1/6) · (5/6)³ = 0.0965 .
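An illustrative check of this example (scipy's geometric distribution counts the number of trials up to and including the first success, so the fourth throw corresponds to k + 1 = 4):

```python
# Sketch: probability that the first "six" occurs on the fourth throw.
from scipy.stats import geom

p, q, k = 1 / 6, 5 / 6, 3
print(p * q ** k)              # 0.0965, as computed above
print(geom(p).pmf(k + 1))      # same value, in scipy's trial-counting form
```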

FURTHER READING
• Bernoulli distribution
• Discrete probability distribution
• Negative binomial distribution

Geometric Mean

The geometric mean is defined as the nth root of the product of n non-negative numbers.

HISTORY
According to Droesbeke, J.-J. and Tassi, Ph. (1990), the geometric mean and the harmonic mean were introduced by Jevons, William Stanley in 1874.

MATHEMATICAL ASPECTS
Let x1, x2, ..., xn be a set of n non-negative quantities, or of n observations related to a quantitative variable X. The geometric mean G of this set is:

G = (x1 · x2 · ... · xn)^(1/n) .

The geometric mean is sometimes expressed in logarithmic form:

ln(G) = (1/n) · Σ_{i=1}^{n} ln(xi) ,   or   G = exp[ (1/n) · Σ_{i=1}^{n} ln(xi) ] .

We note that the logarithm of the geometric mean of a set of positive numbers is the arithmetic mean of the logarithms of these numbers (or the weighted arithmetic mean in the case of grouped observations).

DOMAINS AND LIMITATIONS
In practice, the geometric mean is mainly used to calculate the mean of a group of ratios, and particularly the mean of a group of indices.
Just like the arithmetic mean, the geometric mean takes every observation into account individually. However, it decreases the influence of outliers on the mean, which is why it is sometimes preferred to the arithmetic mean.
One important aspect of the geometric mean is that it only applies to positive numbers.

EXAMPLES
Here we have an example of how the geometric mean is used: in 1987, 1988 and 1989, a businessman achieves profits of 10000, 20000 and 160000 euros, respectively. Based on this information, we want to determine the mean rate at which these profits are increasing.
From 1987 to 1988, the profit doubled, and from 1988 to 1989 it increased eight-fold. If we simply calculate the arithmetic mean of these two numbers, we get

x̄ = (2 + 8)/2 = 5 ,

and we conclude that, on average, the profit increased five-fold annually. However, if we take the profit in 1987 (10000 euros) and multiply it by five annually, we obtain profits of 50000 euros in 1988 and 250000 euros in 1989. These two profits are much too large compared to the real profits.
On the other hand, if we calculate the mean using the geometric mean of these two increases instead of the arithmetic mean, we get:

G = √(2 · 8) = √16 = 4 ,

and we can correctly say that, on average, the profits quadrupled annually. Applying this mean rate of increase to the initial profit of 10000 euros in 1987, we obtain 40000 euros in 1988 and 160000 euros in 1989. Even if the profit in 1988 is too high, it is less so than in the previous result, and the profit for 1989 is now correct.
To illustrate the use of the formula for the geometric mean in the logarithmic form, we will now find the mean of the following indices:

112, 99, 105, 96, 85, 100 .

We calculate (using a table of logarithms):

log(112) = 2.0492
log(99)  = 1.9956
log(105) = 2.0212
log(96)  = 1.9823
log(85)  = 1.9294
log(100) = 2.0000
Total    = 11.9777

We then get:

log(G) = 11.9777 / 6 = 1.9963 .

Therefore G = 99.15.
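Both forms of the geometric mean can be checked on these indices with a short sketch (illustrative only):

```python
# Sketch: geometric mean as the nth root of the product and as the
# exponential of the arithmetic mean of the logarithms.
import math

values = [112, 99, 105, 96, 85, 100]
n = len(values)

G_root = math.prod(values) ** (1 / n)
G_log = math.exp(sum(math.log(v) for v in values) / n)
print(G_root, G_log)           # both about 99.15
```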

FURTHER READING
• Arithmetic mean
• Harmonic mean
• Mean
• Measure of central tendency

REFERENCES
Droesbeke, J.J., Tassi, P.H.: Histoire de la Statistique. Editions "Que sais-je?", Presses universitaires de France, Paris (1990)
Jevons, W.S.: The Principles of Science: A Treatise on Logic and Scientific Methods (two volumes). Macmillan, London (1874)

Geometric Standard Deviation

The geometric standard deviation of a set of quantitative observations is a measure of dispersion. It corresponds to the deviation of the observations around the geometric mean.

HISTORY
See L1 estimation.

MATHEMATICAL ASPECTS
Let x1, x2, ..., xn be a set of n observations related to a quantitative variable X. The geometric standard deviation, denoted by σg, is calculated as follows:

log σg = [ (1/n) · Σ_{i=1}^{n} (log xi − log G)² ]^(1/2) ,

where G = (x1 · x2 · ... · xn)^(1/n) is the geometric mean of the observations.

FURTHER READING
• L1 estimation
• Measure of dispersion
• Standard deviation

Geostatistics

Geostatistics is the application of statistics to problems in geology and hydrology. Geostatistics naturally treats spatial data with known coordinates. Spatial hierarchical models follow the principle: model locally, analyze globally. Simple conditional models are constructed on all levels of the hierarchy (local modeling); the result is a joint model that can be very complex, but analysis is still possible (global analysis). Geostatistics encompasses the theory, techniques and statistical applications used to analyze and forecast the distribution of the values of a variable in space and (eventually) time.

HISTORY
The term geostatistics was first used by Hart (1954) in a geographical context, in order to highlight the application of particular statistical techniques to observations covering a regional distribution. The first geostatistical concepts were formulated by Matheron (1963) at the Ecole des Mines de Paris in order to estimate the reserves of a mineral from spatially distributed data (observations).

DOMAINS AND LIMITATIONS
Geostatistics is used in a wide variety of domains, including forecasting (of precipitation, ground porosity, concentrations of heavy metals) and the treatment of satellite images.

FURTHER READING
• Spatial data
• Spatial statistics

REFERENCES
Atteia, O., Dubois, J.-P., Webster, R.: Geostatistical analysis of soil contamination in the Swiss Jura. Environ. Pollut. 86, 315–327 (1994)
Cressie, N.A.C.: Statistics for Spatial Data. Wiley, New York (1991)
Goovaerts, P.: Geostatistics for Natural Resources Evaluation. Oxford University Press, Oxford (1997)
Isaaks, E.H., Srivastava, R.M.: An Introduction to Applied Geostatistics. Oxford University Press, Oxford (1989)
Journel, A., Huijbregts, C.J.: Mining Geostatistics. Academic, New York (1981)
Matheron, G.: La théorie des variables regionalisées et ses applications. Masson, Paris (1965)
Oliver, M.A., Webster, R.: Kriging: a method of interpolation for geographical information systems. Int. J. Geogr. Inf. Syst. 4(3), 313–332 (1990)
Pannatier, Y.: Variowin 2.2. Software for Spatial Data Analysis in 2D. Springer, Berlin Heidelberg New York (1996)

Gini, Corrado

Gini, Corrado (1884–1965) was born in Motta di Livenza, Italy. Although he graduated in law in 1905, he took courses in mathematics and biology during his studies. His subsequent inclinations towards both science and statistics led him to become a temporary professor of statistics at Cagliari University in 1909, and in 1920 he acceded to the Chair of Statistics at the same university. In 1913, he began teaching at Padua University, and he acceded to the Chair of Statistics at the University of Rome in 1927.
Between 1926 and 1932, he was also the President of the Central Institute of Statistics (ISTAT).
He founded two journals. The first one, Metron (founded in 1920), is an international journal of statistics, and the second one, Genus (founded in 1934), is the Journal of the Italian Committee for the Study of Population Problems.
The contributions of Gini to the field of statistics principally concern mean values and variability, as well as associations between random variables. He also contributed original works to economics, sociology, demography and biology.

Some principal works and articles of Gini, Corrado:

1910 Indici di concentrazione e di dependenza. Atti della III Riunione della Societa Italiana per il Progresso delle Scienze, R12, 3–210.
1912 Variabilità e mutabilità. Studi economico-giuridici, Università di Cagliari, III 2a, R12, 211–382.

Gini Index

The Gini index is the most commonly used inequality index; it measures the degree of inequality in the distribution of incomes in a given population. Graphically, the Gini index is represented by the surface area between the 45° line and the Lorenz curve (a graphical representation of the cumulative percentage of total income versus the cumulative percentage of the population receiving that income, where income increases from left to right). Dividing the surface area between the 45° line and the Lorenz curve (area A in the following diagram) by the total surface area under the line (A + B) gives the Gini coefficient:

IG = A / (A + B) .

The Gini coefficient is zero if there is no inequality in the income distribution and 1 if the income distribution is completely unequal, so 0 ≤ IG ≤ 1. The Gini index is simply the Gini coefficient multiplied by 100, in order to convert it into a percentage. Therefore, the closer the index is to 100%, the more unequal the income distribution is. In developed countries, values of the Gini index are around 40% (see the example).

HISTORY
The Gini coefficient was invented in 1912 by the Italian statistician and demographer Gini, Corrado.

MATHEMATICAL ASPECTS
The total surface area under the 45° line (A + B) equals 1/2, and so we can write:

IG = A / (A + B) = A / (1/2) = 2A = 2 · (1/2 − B) ,

which gives

IG = 1 − 2B .     (1)

To calculate the value of the Gini coefficient, we therefore need to work out B (the total surface area under the Lorenz curve).

a) Gini index: discrete case
For a discrete Lorenz curve where the income distribution is arranged in increasing order (x1 ≤ ... ≤ xn), the Lorenz curve has a piecewise-linear profile. To calculate B, we divide the surface under the Lorenz curve into a series of polygons (the first polygon is a triangle and the others are trapezia).
The surface of the triangle is:

(1/(2n)) · (x1/X) .

The surface of the jth trapezium, j = 2, ..., n, is:

(1/(2n)) · [ Σ_{i=1}^{j−1} xi / X + Σ_{i=1}^{j} xi / X ] = (1/(2nX)) · [ 2·Σ_{i=1}^{j−1} xi + xj ] .

We then have

B = (1/(2n)) · (x1/X) + Σ_{j=2}^{n} (1/(2nX)) · [ 2·Σ_{i=1}^{j−1} xi + xj ] ,

where x1, ..., xn are the incomes, X is the total revenue, and (x1 + ... + xj)/X, j = 1, ..., n, are the cumulated revenue shares (in %) plotted on the Lorenz curve.
After developing this equation, we get:

B = (1/(2nX)) · [ −X + 2·Σ_{i=1}^{n} (n − i + 1)·xi ] .     (2)

Using Eq. (1), we can then get the Gini index:

IG = 1 − 2B
   = 1 − (2/(2nX)) · [ −X + 2·Σ_{i=1}^{n} (n − i + 1)·xi ]
   = 1 + 1/n − 2·Σ_{i=1}^{n} (n − i + 1)·xi / (nX)
   = 1 + 1/n − (2/(n²·λ(x))) · Σ_{i=1}^{n} i·x_{n−i+1} ,

where λ(x) = (1/n)·Σ_{i=1}^{n} xi is the mean income of the distribution.
The Gini coefficient can be reformulated in many ways. We could use, for example:

IG = Σ_{i=1}^{n} Σ_{j=1}^{n} |xi − xj| / (2n²·λ(x)) .

IG = 1− 2B = 1− 2

1∫

0

L (P) dP ,

whereP = F (y) =y∫

0f (x) dx is theper-

centage of the population with incomessmaller then y. For example,

IG =1∫

0

(1− (1− P)

α−1α

)dP

= 1− α

2α − 1

= 1− 2

(1− α

2α − 1

).
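A minimal sketch (with made-up incomes, not from the original entry) of the discrete Gini coefficient, using the pairwise-difference form given above:

```python
# Sketch: Gini coefficient IG = sum_i sum_j |x_i - x_j| / (2 n^2 lambda(x)),
# where lambda(x) is the mean income.
import numpy as np

x = np.array([10000, 20000, 20000, 40000, 110000], dtype=float)  # incomes
n = len(x)

pairwise = np.abs(x[:, None] - x[None, :]).sum()
gini = pairwise / (2 * n ** 2 * x.mean())
print(gini)      # 0 means perfect equality, values near 1 strong inequality
```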

DOMAINS AND LIMITATIONS
The Gini index (or coefficient) is the parameter most commonly used to measure the extent of inequality in an income distribution. It can also be used to measure differences in unemployment between regions.

EXAMPLES
As an example of how to calculate the Gini coefficient (or index), let us take the discrete case. We have IG = 1 − 2B. Recall that, in this case, we find the surface B by calculating the sum of the areas of a series of polygons under the Lorenz curve.
The income data for a particular population in 1993–1994 are given in the following table, along with the areas associated with the income intervals:

Population Income Surface Area

0 0

94569 173100.4 0.000013733

37287 527244.8 0.000027323

186540 3242874.9 0.000726701

203530 4603346.0 0.00213263

266762 7329989.6 0.005465843

290524 9428392.6 0.010037284

297924 11153963.1 0.015437321

284686 12067808.2 0.02029755

251155 11898201.1 0.022956602

216164 11317647.7 0.023968454

182268 10455257.4 0.023539381

153903 9596182.1 0.022465371

127101 8560949.7 0.020488918

104788 7581237.4 0.018311098

85609 6621873.3 0.01597976

70108 5773641.3 0.013815408

57483 5021336.9 0.011848125

47385 4376126.7 0.010140353

39392 3835258.6 0.008701224

103116 11213361.3 0.024078927

71262 9463751.6 0.017876779

46387 7917846.5 0.012313061

50603 20181113.8 0.0114625049

So, the surface B equals:

B = (1/(2nX)) · [ −X + 2·Σ_{i=1}^{n} (n − i + 1)·xi ] = 0.315246894 .

We then have:

IG = 1 − 2B = 1 − 2 · 0.315246894 = 0.369506212 .

FURTHER READING
• Gini, Corrado
• Simple index number
• Uniform distribution

REFERENCES
Greenwald, D. (ed.): Encyclopédie économique. Economica, Paris, pp. 164–165 (1984)
Flückiger, Y.: Analyse socio-économique des différences cantonales de chômage. Le chômage en Suisse: bilan et perspectives. Forum Helvet. 6, 91–113 (1995)

Goodness of Fit Test

Performing a goodness of fit test on a sample allows us to determine whether the observed distribution corresponds to a particular probability distribution (such as the normal distribution or the Poisson distribution). This allows us to find out whether the observed sample was drawn from a population that follows this distribution.

HISTORY
See chi-square goodness of fit test and Kolmogorov–Smirnov test.

MATHEMATICAL ASPECTS
The goal of the goodness of fit test is to determine whether an observed sample was drawn from a population that follows a particular probability distribution.

The process usually involves hypothesis testing:

Null hypothesis         H0: F = F0
Alternative hypothesis  H1: F ≠ F0

where F is the unknown distribution function of the underlying population and F0 is the presumed distribution function.
The goodness of fit test involves comparing the empirical distribution with the presumed distribution. We reject the null hypothesis if the empirical distribution is not close enough to the presumed distribution. The precise rules that govern whether the null hypothesis is rejected or accepted depend on the type of test used.
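A minimal sketch (made-up die counts) of one such test, the chi-square goodness of fit test, where F0 is the uniform distribution of a fair die:

```python
# Sketch: chi-square goodness of fit test of H0: F = F0 (fair die) against
# H1: F != F0, on 120 hypothetical throws.
from scipy.stats import chisquare

observed = [18, 22, 16, 25, 19, 20]       # counts of faces 1..6 (sum = 120)
expected = [20] * 6                        # 120/6 expected under H0

stat, p_value = chisquare(observed, f_exp=expected)
print(stat, p_value)    # a large p-value: H0 is not rejected
```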

EXAMPLES
There are many goodness of fit tests, including the chi-square goodness of fit test and the Kolmogorov–Smirnov test.

FURTHER READING
• Chi-square goodness of fit test
• Hypothesis testing
• Kolmogorov–Smirnov test

Gosset, William Sealy

Gosset, William Sealy, better known by his pen name, "Student", was born in Canterbury, England in 1876. After studying mathematics and chemistry at New College, Oxford, he started working as a brewer for Guinness Breweries in Dublin in 1899. Guinness was a business that favored research, and made its laboratories available to its brewers and its chemists. Indeed, in 1900 it opened the "Guinness Research Laboratory," which had one of the greatest chemists around at that time, Horace Brown, as director. Work carried out in this laboratory was related to the quality and costs of the many varieties of barley and hops.
It was in this environment that Gosset developed his interest in statistics.
At Oxford, Gosset had studied mathematics, and so he was often called upon by his colleagues at Guinness when certain mathematical problems arose. That led him to study the theory of errors, and he consulted Pearson, K., whom he met in July 1905, on this topic.
The main difficulty encountered by Gosset during his work was small sample sizes. He therefore decided to attempt to find an appropriate method for handling data from these small samples.
In 1906–1907, Gosset spent a year collaborating with Pearson, K., in Pearson's laboratory at University College London, attempting to develop methods related to the probable error of the mean.
In 1907, Gosset was made responsible for Guinness' Experimental Brewery, and he used the Student table that he had previously derived experimentally in order to determine the best variety of barley to use for brewing. Guinness, wary of publishing trade secrets, only allowed Gosset to publish his work under the pseudonym of either "Pupil" or "Student", and so he chose the latter.
He died in 1937, leaving important works, which were all published under the pen name of "Student".
Gosset's work remained largely ignored for many years, with few scientists using his table aside from the researchers at Guinness' breweries and those at Rothamsted Experimental Station (where Fisher, R.A. worked as a statistician).

Some of the main works and articles of Gosset, W.S.:

1907 On the error of counting with a haemacytometer. Biometrika 5, 351–360.
1908 The probable error of a mean. Biometrika 6, 1–25.
1925 New tables for testing the significance of observations. Metron 5, 105–120.
1942 Pearson, E.S. and Wishart, J. (eds) Student's collected papers (foreword by McMullen, L.). Cambridge University Press, Cambridge (Issued by the Biometrika Office, University College London).

FURTHER READING
• Student distribution
• Student table
• Student test

REFERENCES
Fisher-Box, J.: Guinness, Gosset, Fisher, and small samples. Stat. Sci. 2, 45–52 (1987)
McMullen, L., Pearson, E.S.: William Sealy Gosset, 1876–1937. In: Pearson, E.S., Kendall, M. (eds.) Studies in the History of Statistics and Probability, vol. I. Griffin, London (1970)
Plackett, R.L., Barnard, G.A. (eds.): Student: A Statistical Biography of William Sealy Gosset (Based on writings of E.S. Pearson). Clarendon, Oxford (1990)

Graeco-Latin Squares

See Graeco-Latin square design.

Graeco-Latin Square Design

A Graeco-Latin square design is a design of experiment in which the experimental units are grouped in three different ways. It is obtained by superposing two Latin squares of the same size.
If every Latin letter coincides exactly once with a Greek letter, the two Latin square designs are orthogonal. Together they form a Graeco-Latin square design.
In this design, each treatment (Latin letter) appears just once in each line, once in each column and once with each Greek letter.

HISTORY
The construction of a Graeco-Latin square originated with Euler, Leonhard (1782). A book by Fisher, Ronald Aylmer and Yates, Frank (1963) gives Graeco-Latin tables from order 3 up to order 12 (not including order six). The book by Denes and Keedwell (1974) also contains comprehensive information on Graeco-Latin square designs. Dodge and Shah (1977) treat the case of estimation when data are missing.

DOMAINS AND LIMITATIONS
Graeco-Latin square designs are used to reduce the effects of three sources of systematic error.
There are Graeco-Latin square designs for any size n except for n = 1, 2 and 6.

EXAMPLES
We want to test three different types of gasoline. To do this, we have three drivers and three vehicles. However, the test is too involved to perform in one day: we have to perform the experiment over three days. In this case, using drivers and vehicles over three separate days could result in systematic errors. Therefore, in order to eliminate these errors, we construct a Graeco-Latin square experimental design. In this case, each type of gasoline (treatment) will be tested just once with each driver, once with each vehicle and once on each day.
In this Graeco-Latin square design, we represent the different types of gasoline by the Latin letters A, B and C and the different days by α, β and γ.

                Drivers
Vehicles     1     2     3
    1        Aα    Bβ    Cγ
    2        Bγ    Cα    Aβ
    3        Cβ    Aγ    Bα

An analysis of variance will then tell us whether, after eliminating the effects of the lines (vehicles), the columns (drivers) and the Greek letters (days), there is a significant difference between the types of gasoline.

FURTHER READING
• Analysis of variance
• Design of experiments
• Latin square designs

REFERENCES
Dénes, J., Keedwell, A.D.: Latin squares and their applications. Academic, New York (1974)
Dodge, Y., Shah, K.R.: Estimation of parameters in Latin squares and Graeco-Latin squares with missing observations. Commun. Stat. Theory A6(15), 1465–1472 (1977)
Euler, L.: Recherches sur une nouvelle espèce de carrés magiques. Verh. Zeeuw. Gen. Weten. Vlissengen 9, 85–239 (1782)
Fisher, R.A., Yates, F.: Statistical Tables for Biological, Agricultural and Medical Research, 6th edn. Hafner (Macmillan), New York (1963)

Graphical Representation

Graphical representations encompass a wide variety of techniques that are used to clarify, interpret and analyze data by plotting points and drawing line segments, surfaces and other geometric forms or symbols.
The purpose of a graph is the rapid visualization of a data set. For instance, it should clearly illustrate the general behavior of the phenomenon investigated and highlight any important factors. It can be used, for example, as a means to translate or to complete a frequency table.
Therefore, graphical representation is a form of data representation.

HISTORY
The concept of plotting a point in coordinate space dates back to at least the ancient Greeks, but we had to wait until the work of Descartes, René for mathematicians to investigate this concept.
According to Royston, E. (1970), a German mathematician named Crome, A.W. was among the first to use graphical representation in statistics. He initially used it as a teaching tool. In his works Geographisch-statistische Darstellung der Staatskräfte (1820) and Ueber die Grösse und Bevölkerung der sämtlichen Europäischen Staaten (1785), Crome employed different types of graphical representation, among them the pie chart.
Royston, E. (1970) also cites Playfair, W., whose work The Commercial and Political Atlas (1786) also referred to various graphical representations, especially the line chart. Playfair was very interested in the international trade balance, and illustrated his studies using different graphics such as the histogram and the pie chart.
The term histogram was used for the first time by Pearson, Karl, and Guerry (1833) appears to have been among the first to use the line chart; in his Essai sur la Statistique Morale de la France, published in Paris in 1833, he described the frequencies of crimes by their characteristics; this study constituted one of the first uses of a frequency distribution.
Schmid, Calvin F. (1954) illustrates and describes the different types of graphical representations in his Handbook of Graphic Presentation.

MATHEMATICAL ASPECTS
The type of graphic employed depends upon the kind of data to be presented, the nature of the variable studied, and the goal of the study:
• A quantitative graphic is particularly useful for representing qualitative categorical variables. Such graphics include the pie chart, the line chart and the pictogram.
• A frequency graphic allows us to represent the (discrete or continuous) frequency distribution of a quantitative variable. Such graphics include the histogram and the stem-and-leaf diagram.
• A cartesian graphic employs a system of axes that are perpendicular to each other and intersect at a point known as the "origin." Such a coordinate system is termed "cartesian."

FURTHER READING
• Frequency table
• Quantitative graph

REFERENCES
Crome, A.F.W.: Ueber die Grösse und Bevölkerung der sämtlichen Europäischen Staaten. Weygand, Leipzig (1785)
Crome, A.F.W.: Geographisch-statistische Darstellung der Staatskräfte. Weygand, Leipzig (1820)
Fienberg, S.E.: Graphical method in statistics. Am. Stat. 33, 165–178 (1979)
Guerry, A.M.: Essai sur la statistique morale de la France. Crochard, Paris (1833)
Playfair, W.: The Commercial and Political Atlas. Playfair, London (1786)
Royston, E.: A note on the history of the graphical presentation of data. In: Pearson, E.S., Kendall, M. (eds.) Studies in the History of Statistics and Probability, vol. I. Griffin, London (1970)
Schmid, C.F.: Handbook of Graphic Presentation. Ronald Press, New York (1954)


H

Hájek, Jaroslav

Hájek, Jaroslav was born in 1926 in Podebrady, Bohemia. A statistical engineer by profession, he obtained his doctorate in 1954. From 1954 to 1964, he worked as a researcher at the Institute of Mathematics of the Czechoslovakian Academy of Sciences. He then joined Charles University in Prague, where he was a professor from 1966 until his premature death in 1974. The principal works of Hájek, J. concern sampling probability theory and rank test theory. In particular, he developed an asymptotic theory of linear rank statistics. He was the first to apply the concept of invariance to the theory of rank testing.

Some principal works and articles of Hájek, Jaroslav:

1955 Some rank distributions and their applications. Cas. Pest. Mat. 80, 17–31 (in Czech); translation in (1960) Select. Transl. Math. Stat. Probab. 2, 41–61.
1958 Some contributions to the theory of probability sampling. Bull. Inst. Int. Stat. 36, 127–134.
1961 Some extensions of the Wald–Wolfowitz–Noether theorem. Ann. Math. Stat. 32, 506–523.
1964 Asymptotic theory of rejective sampling with varying probabilities from a finite population. Ann. Math. Stat. 35, 1419–1523.
1965 Extension of the Kolmogorov–Smirnov test to regression alternatives. In: Neyman, J. and LeCam, L. (eds) Bernoulli–Bayes–Laplace: Proceedings of an International Seminar, 1963. Springer, Berlin Heidelberg New York, pp. 45–60.
1981 Dupac, V. (ed) Sampling from a Finite Population. Marcel Dekker, New York.

Harmonic Mean

The harmonic mean of n observations is defined as n divided by the sum of the inverses of all of the observations.

HISTORY
See geometric mean.
The relationship between the harmonic mean, the geometric mean and the arithmetic mean is described by Mitrinovic, D.S. (1970).

MATHEMATICAL ASPECTS
Let x_1, x_2, \ldots, x_n be n nonzero quantities, or n observations related to a quantitative variable X. The harmonic mean H of these n quantities is calculated as follows:

H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} .

If \{x_i\}_{i=1,\ldots,n} represents a finite series of positive numbers, we have:

\min_i x_i \leq H \leq G \leq \bar{x} \leq \max_i x_i ,

where G is the geometric mean and \bar{x} the arithmetic mean.

DOMAINS AND LIMITATIONS
The harmonic mean is rarely used in statistics. However, it can sometimes be useful, such as in the following cases:
• If a set of investments is made at different interest rates, and they all give the same income, the unique rate at which all of the capital tied up in those investments must be invested to produce the same revenue as given by the set of investments is equal to the harmonic mean of the individual rates.
• Say we have a group of different materials, and each material can be purchased at a given price per amount of material (where the price per amount can be different for each material). We then buy a certain amount of each material, spending the same amount of money on each. In this case, the mean price per amount across all materials is given by the harmonic mean of the prices per amount for all of the materials.
• One property of the harmonic mean is that it is largely insensitive to outliers that have much larger values than the other data. For example, consider the following values: 1, 2, 3, 4, 5 and 100. Here the harmonic mean equals 2.62 and the arithmetic mean equals 19.17. However, the harmonic mean is much more sensitive to outliers when they have much smaller values than the rest of the data. So, for the observations 1, 6, 6, 6, 6, 6, we get H = 3.27 whereas the arithmetic mean equals 5.17.

EXAMPLES
Three investments that each yield the same income have the following interest rates: 5%, 10% and 15%.
The harmonic mean gives the interest rate at which all of the capital would need to be invested in order to produce the same total income as the three separate investments:

H = \frac{3}{\frac{1}{5} + \frac{1}{10} + \frac{1}{15}} = \frac{3}{\frac{11}{30}} = 8.18\% .

We note that this result is different from the arithmetic mean of 10% = (5 + 10 + 15)/3.
A representative buys three lots of coffee, each of a different grade (quality), at 3, 2 and 1.5 euros per kg respectively. He buys 200 euros of each grade.
The mean price per kg of coffee is then obtained by dividing the total cost by the total quantity bought:

\text{mean price} = \frac{\text{total cost}}{\text{total quantity}} = \frac{3 \cdot 200}{66.66 + 100 + 133.33} = 2 .

This corresponds to the harmonic mean of the prices of the three different grades of coffee:

\text{mean price} = \frac{3}{\frac{1}{3} + \frac{1}{2} + \frac{1}{1.5}} = \frac{6}{3} = 2 .
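These calculations are easy to reproduce numerically. The following minimal sketch (written in Python, which is assumed here purely for illustration and is not part of the original entry) verifies the two examples above.

```python
# Minimal sketch: harmonic mean of n nonzero observations, H = n / sum(1/x_i)

def harmonic_mean(values):
    return len(values) / sum(1.0 / x for x in values)

# Interest rates example: 5%, 10% and 15%
print(harmonic_mean([5, 10, 15]))   # 8.1818..., compare with the arithmetic mean 10

# Coffee example: prices of 3, 2 and 1.5 euros per kg
print(harmonic_mean([3, 2, 1.5]))   # 2.0 euros per kg
```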

FURTHER READING
• Arithmetic mean
• Geometric mean
• Mean
• Measure of central tendency

REFERENCES
Mitrinovic, D.S. (with Vasic, P.M.): Analytic Inequalities. Springer, Berlin Heidelberg New York (1970)

Hat Matrix

The hat matrix is a matrix used in regression analysis and analysis of variance. It is defined as the matrix that converts values from the observed variable into estimations obtained with the least squares method. Therefore, when performing linear regression in matrix form, if Ŷ is the vector of estimations calculated from the least squares parameters and Y is the vector of observations related to the dependent variable, then Ŷ is given by the vector Y multiplied by H; that is, Ŷ = HY, and H converts the Y's into Ŷ's.

HISTORY
The hat matrix H was introduced by Tukey, John Wilder in 1972. An article by Hoaglin, D.C. and Welsch, R.E. (1978) gives the properties of the matrix H and also many examples of its application.

MATHEMATICAL ASPECTS
Consider the following linear regression model:

Y = X \cdot \beta + \varepsilon ,

where
Y is the (n × 1) vector of observations on the dependent variable;
X is the (n × p) matrix of independent variables (there are p independent variables);
ε is the (n × 1) vector of errors; and
β is the (p × 1) vector of parameters to be estimated.
The estimation β̂ of the vector β is given by

\hat{\beta} = (X' \cdot X)^{-1} \cdot X' \cdot Y ,

and we can calculate the estimated values Ŷ of Y if we know β̂:

\hat{Y} = X \cdot \hat{\beta} = X \cdot (X' \cdot X)^{-1} \cdot X' \cdot Y .

The matrix H is then defined by:

H = X \cdot (X' \cdot X)^{-1} \cdot X' .

In particular, the diagonal element h_{ii} is defined by:

h_{ii} = x_i \cdot (X' \cdot X)^{-1} \cdot x_i' ,

where x_i is the ith line of X.

DOMAINS AND LIMITATIONS
The matrix H, which allows us to obtain n estimations of the dependent variable from n observations, is an idempotent symmetric square matrix of order n. The element (i, j) of this matrix measures the influence of the jth observation on the ith predicted value. In particular, the diagonal elements evaluate the effects of the observations on the corresponding estimations of the dependent variables. The value of each diagonal element of the matrix H ranges between 0 and 1.
Writing H = (h_{ij}) for i, j = 1, \ldots, n, we have the relation:

h_{ii} = h_{ii}^2 + \sum_{\substack{j=1 \\ j \neq i}}^{n} h_{ij}^2 ,

which is obtained from the idempotent nature of H; in other words, H = H^2.


Therefore

\operatorname{tr}(H) = \sum_{i=1}^{n} h_{ii} = p = \text{number of parameters to estimate.}

The matrix H is used to determine leverage points in regression analysis.

EXAMPLES
Consider the following table where Y is a dependent variable related to the independent variable X:

X Y

50 6

52 8

55 9

75 7

57 8

58 10

The model of simple linear regression is written in the following manner in matrix form:

Y = X \cdot \beta + \varepsilon ,

where

Y = \begin{bmatrix} 6 \\ 8 \\ 9 \\ 7 \\ 8 \\ 10 \end{bmatrix} , \quad X = \begin{bmatrix} 1 & 50 \\ 1 & 52 \\ 1 & 55 \\ 1 & 75 \\ 1 & 57 \\ 1 & 58 \end{bmatrix} ,

ε is the (6 × 1) vector of errors, and β is the (2 × 1) vector of parameters.
We find the matrix H using the result:

H = X \cdot (X' \cdot X)^{-1} \cdot X' .

By stepwise matrix calculations, we obtain:

(X' \cdot X)^{-1} = \frac{1}{2393} \cdot \begin{bmatrix} 20467 & -347 \\ -347 & 6 \end{bmatrix} = \begin{bmatrix} 8.5528 & -0.1450 \\ -0.1450 & 0.0025 \end{bmatrix} ,

and finally:

H = X \cdot (X' \cdot X)^{-1} \cdot X' = \begin{bmatrix}
 0.32 &  0.28 & 0.22 & -0.17 & 0.18 & 0.16 \\
 0.28 &  0.25 & 0.21 & -0.08 & 0.18 & 0.16 \\
 0.22 &  0.21 & 0.19 &  0.04 & 0.17 & 0.17 \\
-0.17 & -0.08 & 0.04 &  0.90 & 0.13 & 0.17 \\
 0.18 &  0.18 & 0.17 &  0.13 & 0.17 & 0.17 \\
 0.16 &  0.16 & 0.17 &  0.17 & 0.17 & 0.17
\end{bmatrix} .

We remark, for example, that the weight of y_1 used during the estimation of ŷ_1 is 0.32.
We then verify that the trace of H equals 2; in other words, it equals the number of parameters of the model:

\operatorname{tr}(H) = 0.32 + 0.25 + 0.19 + 0.90 + 0.17 + 0.17 = 2 .
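The example can be reproduced with a few lines of matrix computation. The following sketch (Python with NumPy, both assumed here only for illustration) rebuilds H = X(X′X)⁻¹X′ and checks that its trace equals the number of parameters.

```python
import numpy as np

# Data of the example: independent variable X and response Y
x = np.array([50, 52, 55, 75, 57, 58], dtype=float)
y = np.array([6, 8, 9, 7, 8, 10], dtype=float)

# Design matrix with a column of ones for the intercept (n x p, with p = 2)
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T

print(np.round(H, 2))   # h_11 is about 0.32, h_44 about 0.90, as in the entry
print(np.trace(H))      # 2.0 = number of estimated parameters
print(H @ y)            # fitted values Y_hat = H Y
```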

FURTHER READING
• Leverage point
• Matrix
• Multiple linear regression
• Regression analysis
• Simple linear regression

REFERENCES
Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics. Wiley, New York, pp. 16–19 (1980)
Hoaglin, D.C., Welsch, R.E.: The hat matrix in regression and ANOVA. Am. Stat. 32, 17–22 (and correction at 32, 146) (1978)
Tukey, J.W.: Some graphical and semigraphical displays. In: Bancroft, T.A. (ed.) Statistical Papers in Honor of George W. Snedecor. Iowa State University Press, Ames, IA, pp. 293–316 (1972)

Histogram

The histogram is a graphical representation of the distribution of data that has been grouped into classes. It consists of a series of rectangles, and is a type of frequency chart.
Each data value is sorted and placed in an appropriate class interval. The number of data values within each class interval dictates the frequency (or relative frequency) of that class interval.
Each rectangle in the histogram represents a class of data. The width of the rectangle corresponds to the width of the class interval, and the surface of the rectangle represents the weight of the class.

HISTORY
The term histogram was used for the first time by Pearson, Karl in 1895.
Also see graphical representation.

MATHEMATICAL ASPECTS
The first step in the construction of a histogram consists of presenting the data in the form of a frequency table. This requires that the class intervals are established and the data values are sorted and placed in the classes, which makes it possible to calculate the frequencies of the classes. The class intervals and frequencies are then added to the frequency table.
We then make use of the frequency table in order to construct the histogram. We divide the horizontal axis of the histogram into intervals, where the widths of these intervals correspond to those of the class intervals. We then draw the rectangles on the histogram. The width of each rectangle is the same as the width of the class that it corresponds to. The height of the rectangle is such that the surface area of the rectangle is equal to the relative frequency of the corresponding class. The sum of the surface areas of the rectangles must be equal to 1.

DOMAINS AND LIMITATIONS
Histograms are used to present a data set in a visual form that is easy to understand. They allow certain general characteristics (such as typical values, the range or the shape of the data) to be visualized and extracted.
Reviewing the shape of a histogram can allow us to detect the probability model followed by the data (normal distribution, log-normal distribution, . . . ).
It is also possible to detect unexpected behavior or abnormal values with a histogram.
This type of graphical representation is most frequently used in economics, but since it is an extremely simple way of visualizing a data set, it is used in many other fields too.
We can also illustrate relative frequency in a histogram. In this case, the height of each rectangle equals the relative frequency of the corresponding class divided by the length of the class interval. In this case, if we sum the surface areas of all of the rectangles in the histogram, we obtain unity.

EXAMPLES
The following table gives raw data on the average annual precipitation in 69 cities of the USA. We will use these data to establish a frequency table and then a corresponding histogram.

Annual average precipitation in 69 cities of the USA (in inches)

Mob. 67.0 Chic. 34.4 St.L. 35.9

Sacr. 17.2 Loui. 43.1 At.C. 45.5

Wash. 38.9 Detr. 31.0 Char. 42.7

Boise 11.5 K.C. 37.0 Col. 37.0

Wich. 30.6 Conc. 36.2 Prov. 42.8

Bost. 42.5 N.Y. 40.2 Dall. 35.9

Jack. 49.2 Clev. 35.0 Norf. 44.7

Reno 7.2 Pitt. 36.2 Chey. 14.6


Buff. 36.1 Nash. 46.0 L.R. 48.5

Cinc. 39.0 Burl. 32.5 Hart. 43.4

Phil. 39.9 Milw. 29.1 Atl. 48.3

Memp. 49.1 Phoen. 7.0 Indi. 38.7

S.L. 15.2 Denv. 13.0 Port. 40.8

Char. 40.8 Miami 59.8 Dul. 30.2

Jun. 54.7 Peor. 35.1 G.Fls 15.0

S.F. 20.7 N.Or. 56.8 Albq 7.8

Jack. 54.5 S.S.M. 31.7 Ral. 42.5

Spok. 17.4 L.A. 14.0 Omaha 30.2

Ok.C. 31.4 Wilm. 40.2 Alb. 33.4

Col. 46.4 Hon. 22.9 Bism. 16.2

El Paso 7.8 D.Mn. 30.8 Port. 37.6

S-Tac 38.8 Balt. 41.8 S.Fls 24.7

S.J. 59.2 Mn-SP 25.9 Hstn. 48.2

Source: U.S. Census Bureau (1975) Statistical Abstract of the United States. U.S. Census Bureau, Washington, DC.

These data can be represented by the following frequency table:

Class Frequency Relative Frequency

0–10 4 0.058

10–20 9 0.130

20–30 5 0.072

30–40 24 0.348

40–50 21 0.304

50–60 4 0.058

60–70 2 0.030

Total 69 1.000

We can now construct the histogram:

[Figure: Histogram of C1 — horizontal axis: C1 (precipitation classes, 0 to 70 inches); vertical axis: Density (0.00 to 0.04)]

The horizontal axis is divided up into classes, and in this case the relative frequencies are given by the heights of the rectangles because the classes all have the same width.
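A histogram of this kind can also be drawn directly from the frequency table. The following sketch (Python with Matplotlib, both assumed here purely for illustration) sets the height of each rectangle to the relative frequency divided by the class width, so that the areas of the rectangles sum to 1.

```python
import matplotlib.pyplot as plt

# Frequency table of the example: classes of width 10 and their frequencies
edges = [0, 10, 20, 30, 40, 50, 60, 70]
freq = [4, 9, 5, 24, 21, 4, 2]
n = sum(freq)  # 69 cities

# Height of each rectangle = relative frequency / class width
widths = [edges[i + 1] - edges[i] for i in range(len(freq))]
heights = [f / n / w for f, w in zip(freq, widths)]

plt.bar(edges[:-1], heights, width=widths, align='edge', edgecolor='black')
plt.xlabel('Average annual precipitation (inches)')
plt.ylabel('Density')
plt.title('Histogram of C1')
plt.show()
```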

FURTHER READING
• Frequency distribution
• Frequency polygon
• Frequency table
• Graphical representation
• Ogive
• Stem-and-leaf diagram

REFERENCES
Dodge, Y.: Some difficulties involving nonparametric estimation of a density function. J. Offic. Stat. 2(2), 193–202 (1986)
Freedman, D., Pisani, R., Purves, R.: Statistics. Norton, New York (1978)
Pearson, K.: Contributions to the mathematical theory of evolution. II: Skew variation in homogeneous material. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 41–112 (1948). First published in 1895 in Philos. Trans. Roy. Soc. Lond. Ser. A 186, 343–414

Homogeneity Test

One issue that often needs to be considered when analyzing categorical data obtained for many groups is that of the homogeneity of the groups. In other words, we need to find out whether there are significant differences between these groups in relation to one or many qualitative categorical variables. A homogeneity test can show whether the differences are significant or not.


MATHEMATICAL ASPECTS
We consider the chi-square homogeneity test here, which is a specific type of chi-square test.
Let I be the number of groups considered and J the number of categories considered. Let:
n_{i.} = \sum_{j=1}^{J} n_{ij} correspond to the size of group i;
n_{.j} = \sum_{i=1}^{I} n_{ij};
n = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij} be the total number of observations;
n_{ij} be the empirical frequency (that is, the number of occurrences observed) corresponding to group i and category j;
m_{ij} be the theoretical frequency corresponding to group i and category j, which, assuming homogeneity among the groups, equals:

m_{ij} = \frac{n_{i.} \cdot n_{.j}}{n} .

If we represent the data in the form of a contingency table with I lines and J columns, the n_{i.} correspond to the sums of the elements of the lines i and the n_{.j} to the sums of the elements of the columns j.
We calculate

\chi^2_c = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - m_{ij})^2}{m_{ij}} ,

and the chi-square homogeneity test is expressed in the following way: we reject the homogeneity hypothesis (at a significance level of 5%) if the value \chi^2_c is greater than the critical value of the \chi^2 (chi-square) distribution with (J − 1) · (I − 1) degrees of freedom.
Note: We have assumed here that the same number of units are tested for each combination of group and category. However, we may want to test different numbers of units for different combinations. In this case, if we have a proportion p_{ij} of units for group i and category j, it is enough to replace m_{ij} by n_{i.} \cdot p_{ij}.

EXAMPLES
We consider a study performed in the pharmaceutical domain that concerns 100 people suffering from a particular illness. In order to examine the effect of a medical treatment, 100 people were chosen at random. Half of them were placed in a control group and received a placebo. The other patients received the medical treatment. Then the number of healthy people in each group was monitored for 24 hours following administration of the treatment. The results are provided in the following table.

Observed frequency   Healthy for 24 hours   Not healthy   Total
Placebo              2                      48            50
Treatment            9                      41            50

The theoretical frequencies are obtained by assuming that the general state of health would have been the same for both groups if no treatment had been applied. In this case we obtain m_{11} = m_{21} = \frac{50 \cdot 11}{100} = 5.5 and m_{12} = m_{22} = \frac{50 \cdot 89}{100} = 44.5.
We calculate the value of \chi^2 by comparing the theoretical frequencies with the observed frequencies:

\chi^2_c = \frac{(2 - 5.5)^2}{5.5} + \frac{(48 - 44.5)^2}{44.5} + \frac{(9 - 5.5)^2}{5.5} + \frac{(41 - 44.5)^2}{44.5} = 5.005 .


If we then refer to the χ² distribution table for one degree of freedom, we obtain the value χ²_{0.05} = 3.84 for a significance level of 5%, which is smaller than the value we calculated, χ²_c = 5.005. We conclude that the groups were not homogeneous and that the treatment is effective.
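The same statistic can be obtained numerically. The sketch below (Python with SciPy, both assumed here only as an illustration) uses scipy.stats.chi2_contingency, which returns the chi-square statistic, the degrees of freedom and the theoretical frequencies; the option correction=False reproduces the uncorrected value 5.005 obtained above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed frequencies: rows = placebo / treatment, columns = healthy / not healthy
observed = np.array([[2, 48],
                     [9, 41]])

# correction=False gives the uncorrected chi-square statistic used in the example
chi2, pvalue, dof, expected = chi2_contingency(observed, correction=False)

print(expected)    # theoretical frequencies: [[ 5.5, 44.5], [ 5.5, 44.5]]
print(chi2, dof)   # 5.005..., 1 degree of freedom
print(pvalue)      # about 0.025 < 0.05, so homogeneity is rejected
```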

FURTHER READING
• Analysis of categorical data
• Analysis of variance
• Categorical data
• Chi-square distribution
• Chi-square test
• Frequency

REFERENCES
See analysis of categorical data.

Hotelling, Harold

Hotelling, Harold (1895–1973) is considered to be one of the pioneers in the field of mathematical economics over the period 1920–1930. He introduced the T² multivariate test, principal components analysis and canonical correlation analysis.
He studied at the University of Washington, where he obtained a B.A. in journalism in 1919. He then moved to Princeton University, obtaining a doctorate in mathematics from there in 1924. He began teaching at Stanford University that same year. His applications of mathematics to the social sciences initially concerned journalism and political science, and then he moved his focus to population and prediction.
In 1931, he moved to Columbia University, where he actively participated in the creation of its statistical department. During the Second World War, he performed statistical research for the military.
In 1946, he was hired by the University of North Carolina at Chapel Hill to create a statistics department there.

Some principal works and articles of Hotelling, Harold:

1933 Analysis of a complex of statistical variables with principal components. J. Educ. Psychol., 24, 417–441 and 498–520.

1936 Relation between two sets of variates. Biometrika 28, 321–377.

Huber, Peter J.

Huber, Peter J. was born at Wohlen (Switzerland) in 1934. He performed brilliantly during his studies and his doctorate in mathematics at the Federal Polytechnic School of Zurich, where he received the Silver Medal for the scientific quality of his thesis. He worked as Professor of Mathematical Statistics at the Federal Polytechnic School of Zurich. He then moved to the USA and worked at the most prestigious universities (Princeton, Yale, Berkeley) as an invited professor. In 1977 he was named Professor of Applied Mathematics at the Massachusetts Institute of Technology. He is a member of the prestigious American Academy of Arts and Sciences, the Bernoulli Society and the National Science Foundation in the USA, in which foreign members are extremely rare.
Since the publication of his article "Robust estimation of a location parameter" in 1964, he has been considered to be the founder of robust statistics.


Huber, Peter J. received the title of Docteur Honoris Causa from Neuchâtel University in 1994.

Some principal works and publications of Huber, Peter J.:

1964 Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101.

1968 Robust statistical procedures. SIAM, CBMS-NSF Reg. Conf. Ser. Appl. Math.

1981 Robust Statistics. Wiley, New York.

1995 Robustness: Where are we now? Student, Vol. 1, 75–86.

Hypergeometric Distribution

The hypergeometric distribution describes the probability of success if a series of objects are drawn from a population (which contains some objects that represent failure while the others represent success), without replacement.
It is therefore used to describe a random experiment where there are only two possible results: "success" and "failure."
Consider a set of N events in which there are M "successes" and N − M "failures." The random variable X, corresponding to the number of successes obtained if we draw n events without replacement, follows a hypergeometric distribution with parameters N, M and n, denoted by H(N, M, n).
The hypergeometric distribution is a discrete probability distribution.
The number of ways that n events can be drawn from N events is equal to:

C_N^n = \binom{N}{n} = \frac{N!}{n! \, (N - n)!} .

The number of elementary events depends on X and is:

C_M^x \cdot C_{N-M}^{n-x} ,

which gives the following probability function:

P(X = x) = \frac{C_M^x \cdot C_{N-M}^{n-x}}{C_N^n} , \quad \text{for } x = 0, 1, \ldots, n

(where, by convention, C_v^u = 0 if u > v).

Hypergeometric distribution, N = 12, M = 7,n = 5

MATHEMATICAL ASPECTS
Consider the random variable X = X_1 + X_2 + \ldots + X_n, where:

X_i = \begin{cases} 1 & \text{if the } i\text{th drawing is a success} \\ 0 & \text{if the } i\text{th drawing is a failure.} \end{cases}

In this case, the probability distribution for X_i is:

X_i :      0             1
P(X_i) :   (N − M)/N     M/N

The expected value of X_i is therefore given by:

E[X_i] = \sum_{j=1}^{2} x_j P(X_i = x_j) = 0 \cdot \frac{N - M}{N} + 1 \cdot \frac{M}{N} = \frac{M}{N} .

Utilizing the fact that X = X_1 + X_2 + \ldots + X_n, we have:

E[X] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} \frac{M}{N} = n \frac{M}{N} .

The variance of X_i is, by definition:

\operatorname{Var}(X_i) = E[X_i^2] - (E[X_i])^2 = \frac{M}{N} - \left(\frac{M}{N}\right)^2 = \frac{M(N - M)}{N^2} .

Since the X_i, i = 1, 2, \ldots, n are dependent random variables, the covariance should be taken into account when calculating the variance of X.
The probability that X_i and X_j (i ≠ j) are both successes is equal to:

P(X_i = 1, X_j = 1) = \frac{M(M - 1)}{N(N - 1)} .

If we put V = X_i \cdot X_j, the values of V and the associated probabilities are:

V :      0                                    1
P(V) :   1 − \frac{M(M - 1)}{N(N - 1)}        \frac{M(M - 1)}{N(N - 1)}

The expected value of V is therefore:

E[V] = 0 \cdot \left(1 - \frac{M(M - 1)}{N(N - 1)}\right) + 1 \cdot \frac{M(M - 1)}{N(N - 1)} = \frac{M(M - 1)}{N(N - 1)} .

The covariance of X_i and X_j is, by definition, equal to:

\operatorname{Cov}(X_i, X_j) = E[X_i \cdot X_j] - E[X_i] \cdot E[X_j]
= \frac{M(M - 1)}{N(N - 1)} - \left(\frac{M}{N}\right)^2
= -\frac{M(N - M)}{N^2(N - 1)}
= -\frac{1}{N - 1} \operatorname{Var}(X_i) .

We can now calculate the variance of X:

\operatorname{Var}(X) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2 \sum_{i<j} \operatorname{Cov}(X_i, X_j)
= \sum_{i=1}^{n} \operatorname{Var}(X_i) + n(n - 1) \operatorname{Cov}(X_i, X_j)
= n \left[\operatorname{Var}(X_i) - \frac{n - 1}{N - 1} \operatorname{Var}(X_i)\right]
= n \operatorname{Var}(X_i) \frac{N - n}{N - 1}
= n \frac{M(N - M)}{N^2} \cdot \frac{N - n}{N - 1}
= \frac{N - n}{N - 1} \, n \, \frac{M}{N} \left(1 - \frac{M}{N}\right) .

DOMAINS AND LIMITATIONS
The hypergeometric distribution is often used in quality control.
Suppose that a production line produces N products, which are then submitted to verification. A sample of size n is taken from this batch of products, and the number of defective products in this sample is noted. It is possible to use this to obtain (by inference) information on the probable total number of defective products in the whole batch.

EXAMPLESA box contains 30 fuses, and 12 of these aredefective. If we take five fuses at random, theprobability that none of them is defective isequal to:

P(X = 0) = \frac{C_M^x \cdot C_{N-M}^{n-x}}{C_N^n} = \frac{C_{12}^{0} \cdot C_{18}^{5}}{C_{30}^{5}} = 0.0601 .
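This probability, together with the mean and variance derived above, can be checked numerically. The sketch below (Python with SciPy, both assumed here only for illustration) uses scipy.stats.hypergeom, whose parameter order differs from the H(N, M, n) notation used in this entry.

```python
from scipy.stats import hypergeom

# Box of N = 30 fuses containing M = 12 defective ones; we draw n = 5 without replacement.
# SciPy's parameters are (population size, number of "successes", sample size).
N, M, n = 30, 12, 5
X = hypergeom(N, M, n)

print(X.pmf(0))    # P(X = 0) = 0.0601..., probability of no defective fuse in the sample
print(X.mean())    # n*M/N = 2.0
print(X.var())     # (N-n)/(N-1) * n * (M/N) * (1 - M/N) = 1.034...
```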


FURTHER READING
• Bernoulli distribution
• Binomial distribution
• Discrete probability distribution

Hypothesis

A statistical hypothesis is an assertion regarding the distribution(s) of one or several random variables. It may concern the parameters of a given distribution or the probability distribution of a population under study.
The validity of the hypothesis is examined by performing hypothesis testing on observations collected for a sample of the studied population.
When performing hypothesis testing on the probability distribution of the population being studied, the hypothesis that the studied population follows a given probability distribution is called the null hypothesis. The hypothesis that affirms that the population does not follow a given probability distribution is called the alternative hypothesis (or opposite hypothesis).
If we perform hypothesis testing on the parameters of a distribution, the hypothesis that the studied parameter is equal to a given value is called the null hypothesis. The hypothesis that states that the value of the parameter is different from this given value is called the alternative hypothesis.
The null hypothesis is usually denoted by H0 and the alternative hypothesis by H1.

HISTORY
In hypothesis testing, the hypothesis that is to be tested is called the null hypothesis. We owe the term "null" to Fisher, R.A. (1935). Introducing this concept, he mentioned the well-known tea tasting problem, where a lady claimed to be able to recognize by taste whether the milk or the tea was poured into her cup first. The hypothesis to be tested was that the taste was absolutely not influenced by the order in which the tea was made.
Originally, the null hypothesis was usually taken to mean that a particular treatment has no effect, or that there was no difference between the effects of different treatments. Nowadays the null hypothesis is mostly used to indicate the hypothesis that has to be tested, in contrast to the alternative hypothesis.
Also see hypothesis testing.

EXAMPLES
Many problems involve repeating an experiment that has two possible results.
One example of this is the gender of a newborn child. In this case we are interested in the proportion of boys and girls in a given population. Consider p, the proportion of girls, which we would like to estimate from an observed sample. To determine whether the proportions of newborn boys and girls are the same, we make the statistical hypothesis that p = 1/2.

FURTHER READING
• Alternative hypothesis
• Analysis of variance
• Hypothesis testing
• Null hypothesis

REFERENCES
Fisher, R.A.: The Design of Experiments. Oliver & Boyd, Edinburgh (1935)


Hypothesis Testing

Hypothesis testing is a procedure that allows us (depending on certain decision rules) to confirm a starting hypothesis, called the null hypothesis, or to reject this null hypothesis in favor of the alternative hypothesis.

HISTORY
The theory behind hypothesis testing developed gradually. The first steps were taken when works began to appear that discussed the significance (or insignificance) of a group of observations. Some examples of such works date from the eighteenth century, including those by Arbuthnott, J. (1710), Bernoulli, Daniel (1734) and Laplace, Pierre Simon de (1773). These works were seen more frequently in the nineteenth century, such as those by Gavarret (1840) and Edgeworth, Francis Y. (1885).
The development of hypothesis testing occurred in parallel with the theory of estimation. Hypothesis testing seems to have been first elaborated by workers in the experimental sciences and the management domain. For example, the Student test was developed by Gosset, William Sealy during his time working for Guinness.
Neyman, Jerzy and Pearson, Egon Sharpe developed the mathematical theory of hypothesis testing, which they presented in an article published in 1928 in the review Biometrika. They were the first to recognize that the rational choice to be made during hypothesis testing had to be between the null hypothesis that we want to test and an alternative hypothesis. A second fundamental article on the theory of hypothesis testing was published in 1933 by the same mathematicians, where they also distinguished between a type I error and a type II error.

The works resulting from the collaboration between Neyman, J. and Pearson, E.S. are described in Pearson (1966) and in the biography of Neyman, published by Reid (1982).

MATHEMATICAL ASPECTS
Hypothesis testing of a sample generally involves the following steps:
1. Formulate the hypotheses:
   • The null hypothesis H0,
   • The alternative hypothesis H1.
2. Determine the significance level α of the test.
3. Determine the probability distribution that corresponds to the sampling distribution.
4. Calculate the critical value of the null hypothesis and deduce the rejection region or the acceptance region.
5. Establish the decision rules:
   • If the statistics observed in the sample are located in the acceptance region, we do not reject the null hypothesis H0;
   • If the statistics observed on the sample are located in the rejection region, we reject the null hypothesis H0 for the alternative hypothesis H1.
6. Take the decision to accept or to reject the null hypothesis on the basis of the observed sample.
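These six steps can be illustrated with a small numerical sketch. The fragment below (Python with SciPy, both assumed here purely for illustration, with made-up numbers) tests H0: μ = μ0 against H1: μ ≠ μ0 for the mean of a sample drawn from a population whose standard deviation is assumed known.

```python
import numpy as np
from scipy.stats import norm

# Step 1: hypotheses H0: mu = 50 versus H1: mu != 50 (two-sided test)
mu0, sigma = 50.0, 4.0  # hypothetical values, sigma assumed known
sample = np.array([52.1, 49.8, 53.2, 51.5, 50.9, 54.0, 48.7, 52.8])

# Step 2: significance level
alpha = 0.05

# Step 3: sampling distribution of the mean -> standard normal for the z statistic
z = (sample.mean() - mu0) / (sigma / np.sqrt(len(sample)))

# Step 4: critical value; rejection region is |z| > z_crit
z_crit = norm.ppf(1 - alpha / 2)

# Steps 5 and 6: decision rule and decision
if abs(z) > z_crit:
    print(f"z = {z:.2f} > {z_crit:.2f}: reject H0 in favour of H1")
else:
    print(f"z = {z:.2f} <= {z_crit:.2f}: do not reject H0")
```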

DOMAINS AND LIMITATIONS
The most frequent types of hypothesis testing are described below.
1. Hypothesis testing of a sample: We want to test whether the value of a parameter θ of the population is identical to a presumed value. The hypotheses will be as follows:

   H0 : θ = θ0 ,
   H1 : θ ≠ θ0 ,

   where θ0 is the presumed value of the unknown parameter θ.
2. Hypothesis testing on two samples: The goal in this case is to find out whether two populations that are both described by a particular parameter are different. Let θ1 and θ2 be the parameters that describe populations 1 and 2 respectively. We can then formulate the following hypotheses:

   H0 : θ1 = θ2 ,
   H1 : θ1 ≠ θ2 ,

   or

   H0 : θ1 − θ2 = 0 ,
   H1 : θ1 − θ2 ≠ 0 .

3. Hypothesis testing of more than two samples: As for a test performed on two samples, hypothesis testing is performed on more than two samples to determine whether these populations are different, based on comparing the same parameter from all of the populations being tested. In this case, we test the following hypotheses:

   H0 : θ1 = θ2 = . . . = θk ,
   H1 : The values of θi (i = 1, 2, . . . , k) are not all identical.

   Here θ1, . . . , θk are the unknown parameters of the populations and k is the number of populations to be compared.
In hypothesis testing theory, and in practice, we can distinguish between two types of tests: a parametric test and a nonparametric test.

Parametric Tests
A parametric test is a hypothesis test that presupposes a particular form for each of the distributions related to the underlying populations. This case applies, for example, when these populations follow a normal distribution.
The Student test is an example of a parametric test. This test compares the means of two normally distributed populations.

Nonparametric Test
A nonparametric test is a hypothesis test where it is not necessary to specify the parametric form of the distribution of the underlying population.
There are many examples of this type of test, including the sign test, the Wilcoxon test, the signed Wilcoxon test, the Mann–Whitney test, the Kruskal–Wallis test, and the Kolmogorov–Smirnov test.

EXAMPLES
For examples of parametric hypothesis testing, see binomial test, Fisher test or Student test. For examples of nonparametric hypothesis testing, see Kolmogorov–Smirnov test, Kruskal–Wallis test, Mann–Whitney test, Wilcoxon test, signed Wilcoxon test and sign test.

FURTHER READING
• Acceptance region
• Alternative hypothesis
• Nonparametric test
• Null hypothesis
• One-sided test
• Parametric test
• Rejection region
• Sampling distribution
• Significance level
• Two-sided test

REFERENCES
Arbuthnott, J.: An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes. Philos. Trans. 27, 186–190 (1710)
Bernoulli, D.: Quelle est la cause physique de l'inclinaison des planètes (. . . ). Rec. Pièces Remport. Prix Acad. Roy. Sci. 3, 95–122 (1734)
Edgeworth, F.Y.: Methods of Statistics. Jubilee Volume of the Royal Statistical Society, London (1885)
Gavarret, J.: Principes généraux de statistique médicale. Beché Jeune & Labé, Paris (1840)
Laplace, P.S. de: Mémoire sur l'inclinaison moyenne des orbites des comètes. Mém. Acad. Roy. Sci. Paris 7, 503–524 (1773)
Lehmann, E.L.: Testing Statistical Hypotheses, 2nd edn. Wiley, New York (1986)
Neyman, J., Pearson, E.S.: On the use and interpretation of certain test criteria for purposes of statistical inference, Parts I and II. Biometrika 20A, 175–240, 263–294 (1928)
Neyman, J., Pearson, E.S.: On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. Lond. Ser. A 231, 289–337 (1933)
Pearson, E.S.: The Neyman-Pearson story: 1926–34. In: David, F.N. (ed.) Research Papers in Statistics: Festschrift for J. Neyman. Wiley, New York (1966)
Reid, C.: Neyman—From Life. Springer, Berlin Heidelberg New York (1982)


I

Idempotent Matrix

See matrix.

Identity Matrix

See matrix.

Incidence

See incidence rate.

Incidence Rate

The incidence of an illness is defined as being the number of new cases of illness appearing during a determined period among the individuals of a population. This notion is similar to that of "stream."
The incidence rate is defined as being relative to the dimension of the population and the time; this value is expressed in relation to a number of individuals and to a duration. The incidence rate is the incidence I divided by the number of people at risk for the illness.

HISTORY
See prevalence rate.

MATHEMATICAL ASPECTS
Formally we have:

\text{incidence rate} = \frac{\text{risk}}{\text{duration of observation}} .

DOMAINS AND LIMITATIONS
An accurate incidence rate calculated in a population and during a given period will be interpreted as the index of existence of an epidemic in this population.

EXAMPLES
Let us calculate the incidence rate of breast cancer in a population of 150000 women observed during 2 years, according to the following table:

Group             Number of cases [A]   Number of subjects [B]   Incidence rate per 100000 per year [A/(2B)]
Non-exposed       40                    35100                    57.0
Passive smokers   140                   55500                    126.1
Active smokers    164                   59400                    138.0
Total             344                   150000                   114.7

The numerator of the incidence rate is the same as that of the risk: the number of new cases (or incident cases) of breast cancer during 2 years of observation in the reference group. The denominator of the incidence rate is obtained by multiplying the number of individuals in the reference group by the duration of observation (here 2 years). Normally, the rate is expressed per year.
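The rates in the table can be reproduced with a short calculation. The sketch below (plain Python, assumed here only as an illustration) divides the number of new cases by the person-time at risk (number of subjects multiplied by the 2 years of observation) and expresses the result per 100000 per year.

```python
# Number of new cases and number of subjects observed during 2 years
groups = {
    "Non-exposed":     (40,  35100),
    "Passive smokers": (140, 55500),
    "Active smokers":  (164, 59400),
    "Total":           (344, 150000),
}
duration = 2  # years of observation

for name, (cases, subjects) in groups.items():
    rate = cases / (subjects * duration) * 100000  # per 100000 per year
    print(f"{name}: {rate:.1f}")
# Non-exposed: 57.0, Passive smokers: 126.1, Active smokers: 138.0, Total: 114.7
```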

FURTHER READING
• Attributable risk
• Avoidable risk
• Cause and effect in epidemiology
• Odds and odds ratio
• Prevalence rate
• Relative risk
• Risk

REFERENCES
Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)
MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)
Morabia, A.: Epidemiologie Causale. Editions Médecine et Hygiène, Geneva (1996)
Morabia, A.: L'Épidémiologie Clinique. Editions "Que sais-je?". Presses Universitaires de France, Paris (1996)

Incomplete Data

See missing data.

Independence

The notion of independence can have two meanings: the first concerns the events of a random experiment. Two events are said to be independent if the realization of the second event does not depend on the realization of the first.
The independence between two events A and B can be determined by probabilities: if the probability of event A when B is realized is identical to the probability of A when B is not realized, then A and B are called independent.
The second meaning of the word independence concerns the relation that can exist between random variables. The independence of two random variables can be tested with the Kendall test and the Spearman test.

HISTORY
The independence notion was implicitly used long before a formal set of probability axioms was established. According to Maistrov, L.E. (1974), Cardano was already using the multiplication rule of probabilities (P(A ∩ B) = P(A) · P(B)). Maistrov, L.E. also mentions that the notions of independence and dependence between events were very familiar to Pascal, Blaise, de Fermat, Pierre, and Huygens, Christiaan.

MATHEMATICAL ASPECTS
Mathematically, we can say that event A is independent of event B if

P(A) = P(A | B) ,

where P(A) is the probability of realizing event A and P(A | B) is the conditional probability of event A as a function of the realization of B, meaning the probability that A happens knowing that B has happened.
Using the definition of conditional probability, the previous equation can be written as:

P(A) = \frac{P(A \cap B)}{P(B)}

or

P(A) \cdot P(B) = P(A \cap B) ,

which gives

P(B) = \frac{P(A \cap B)}{P(A)} = P(B | A) .

With this simple transformation, we have demonstrated that if A is independent of B, then B is also independent of A. We will say that events A and B are independent.

DOMAINS AND LIMITATIONS
If two events A and B are independent, then they are always compatible. The probability that both events A and B happen simultaneously can be determined by multiplying the individual probabilities of the two events:

P (A ∩ B) = P (A) · P (B) .

But if A and B are dependent, they canbe compatible or incompatible. There is noabsolute rule.

EXAMPLES
Consider a box containing four notes numbered from 1 to 4 from which we will make two successive drawings without putting the first note back in the box before drawing the second note.
The sample space can be represented by the following 12 pairs:

Ω = {(1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (2, 4), (3, 1), (3, 2), (3, 4), (4, 1), (4, 2), (4, 3)} .

Consider the following two events:

A = "pull out note No. 3 on the first drawing" = {(3, 1), (3, 2), (3, 4)} ,
B = "pull out note No. 3 on the second drawing" = {(1, 3), (2, 3), (4, 3)} .

Are these two events dependent or independent? If they are independent, the probability of event B must be the same regardless of the result of event A.
If A happens, then the notes Nos. 1, 2 and 4 remain in the box for the second drawing. The probability that B happens is then null because note No. 3 is no longer in the box. On the other hand, if A is not fulfilled (for example, note No. 2 was drawn in the first drawing), the second drawing is made from a box containing the notes Nos. 1, 3 and 4, and the probability that B will be fulfilled is 1/3.

We can therefore conclude that the probability of event B is dependent on event A, or that events A and B are dependent.
We can verify our conclusion in the following way: if A and B are independent, the following equation must be verified:

P(A \cap B) = P(A) \cdot P(B) .

In our example, P(A) = 3/12 and P(B) = 3/12. The intersection between A and B is null. Therefore we have

P(A \cap B) = 0 \neq P(A) \cdot P(B) = \frac{1}{16} .

Since the equation is not confirmed, events A and B are not independent. Therefore they are dependent.
Consider now a similar experiment, but putting the drawn note back in the box before the second drawing.
The sample space of this new random experiment is composed of the following 16 pairs:

Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4), (4, 1), (4, 2), (4, 3), (4, 4)} .

In these conditions, the second drawing is made from a box that is identical to that of the first drawing, containing the notes Nos. 1 to 4.

Events A and B contain the following results:

A = {(3, 1), (3, 2), (3, 3), (3, 4)} ,
B = {(1, 3), (2, 3), (3, 3), (4, 3)} .

The probability of A, as well as of B, is therefore 4/16. The intersection between A and B is given by the event A ∩ B = {(3, 3)}; its probability is 1/16. Therefore we have

P(A \cap B) = \frac{1}{16} = P(A) \cdot P(B) .

Since P(A∩ B) = P(A) · P(B), we can con-clude that event B is independent of event A,or that the events A and B are independent.
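Both drawings can be checked by enumerating the sample space, as in the sketch below (plain Python, assumed here only as an illustration); the same comparison of P(A ∩ B) with P(A) · P(B) separates the dependent case (without replacement) from the independent one (with replacement).

```python
from itertools import product
from fractions import Fraction

def check(pairs):
    # A: note No. 3 on the first drawing, B: note No. 3 on the second drawing
    A = {p for p in pairs if p[0] == 3}
    B = {p for p in pairs if p[1] == 3}
    n = len(pairs)
    pA, pB, pAB = Fraction(len(A), n), Fraction(len(B), n), Fraction(len(A & B), n)
    print(f"P(A)={pA}, P(B)={pB}, P(A and B)={pAB}, P(A)*P(B)={pA * pB}")

notes = [1, 2, 3, 4]
without_replacement = [(i, j) for i, j in product(notes, notes) if i != j]  # 12 pairs
with_replacement = list(product(notes, notes))                              # 16 pairs

check(without_replacement)  # P(A and B) = 0, but P(A)*P(B) = 1/16 -> dependent
check(with_replacement)     # P(A and B) = 1/16 = P(A)*P(B)        -> independent
```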

FURTHER READING
• Chi-square test of independence
• Compatibility
• Conditional probability
• Dependence
• Event
• Kendall rank correlation coefficient
• Probability
• Spearman rank correlation coefficient
• Test of independence

REFERENCES
Maistrov, L.E.: Probability Theory—A Historical Sketch (transl. and edit. by Kotz, S.). Academic, New York (1974)

Independent Variable

In a regression model, the variables that are considered to influence the dependent variable, or to explain the variations of the latter, are the independent variables.

FURTHER READING
• Dependent variable
• Multiple linear regression
• Regression analysis
• Simple linear regression

Index Number

In its most general definition, an index number is a value representing the relative variation of a variable between two determined periods (or situations).
Simple index numbers should be distinguished from composite index numbers.


• Simple index numbers describe the relative change of a single variable.
• Composite index numbers allow one to describe with a single number the comparison of the set of values that several variables take in a certain situation with respect to the set of values of the same variables in a reference situation.
The reference situation defines the basis of the index number. We can say, for example, that for reference year (basis year) 1980, a certain index number has a value of 120 in 1982.

HISTORY
The first studies using index numbers date back to the early 18th century.
In 1707, the Anglican Bishop Fleetwood undertook the study of the evolution of prices between the years 1440 and 1700. This study, whose results are presented in his work Chronicon Preciosum (1745), was done for a very specific purpose: during the founding of a college in 1440–1460, an essential clause was established stipulating that any admitted member had to leave the college if his fortune exceeded £5 per year. Fleetwood wanted to know if, taking into account the price evolution, such a promise could still be kept three centuries later.
He considered four products—wheat, meat, drink, and clothing—and studied the evolution of their prices. At the end of his work, he came to the conclusion that 5 pounds (£) in 1440–1460 had the same value as 30 pounds (£) in 1700.
Later, in 1738, Dutot, C. studied the diminution of the value of money in a work entitled Réflexions politiques sur les finances et le commerce. For this he studied the incomes of two kings, Louis XII and Louis XV. The two kings had the following incomes:

Louis XII: £ 7650000 in 1515

Louis XV: £100000000 in 1735

To determine which of the sovereigns had the highest disposable income, Dutot, C. noted the different prices, at the considered dates, of a certain number of goods: a chicken, a rabbit, a pigeon, the value of a day's work, etc.
Dutot's measurement was the following:

I = \frac{\sum P_1}{\sum P_0} ,

where \sum P_1 and \sum P_0 are the sums of the noted prices at the respective dates. Notice that this index number is not weighted.
Dutot came to the conclusion that the wealth of Louis XII was lower than that of Louis XV.
The diminution of the value of money was also the object of a study by Carli, Gian Rinaldo. In 1764 this astronomy professor from Padua studied the evolution of prices since the discovery of the Americas.
He took three products as references—grains, wine, and cooking oil—and studied the prices of these products over the period 1500–1750.
Carli's measurement was the following:

I = \frac{1}{n} \cdot \sum \frac{P_1}{P_0} .

The index number obtained is an arithmetic mean of the relative prices for each observed product.
In 1798 a work entitled An Account of Some Endeavours to Ascertain a Standard of Weight and Measure, signed by Sir George Shuckburgh Evelyn, was published. This precursor of index numbers defined a price index number covering the years 1050 to 1800 by introducing the notions of reference year (1550) and relative prices. He also included the price of certain services in his observations.
It is Lowe, Joseph who should be seen, according to Kendall, M.G. (1977), as the true father of index numbers. His work, published in 1822, called The Present State of England, treated many problems relative to the creation of index numbers. Among other things, he introduced the notion of weighting the prices at the different dates (P_1 and P_0) by the quantities observed at the first date (Q_0).
The Lowe measurement is the following:

I = \frac{\sum (P_1 \cdot Q_0)}{\sum (P_0 \cdot Q_0)} .

Later on this index number would be known as the Laspeyres index.
In the second half of the 19th century, statisticians made many advances in this field. In 1863, with A Serious Fall in the Value of Gold Ascertained and Its Social Effects Set Forth, Jevons, W.S. studied the type of mean to be used during the creation of an index number. He recommended the use of the geometric mean.
Between 1864 and 1880, three German scientists, Laspeyres, E. (1864, 1871), Drobisch, M.W. (1871), and Paasche, H. (1874), worked on the evolution of the prices of material goods in Hamburg, without taking services into account. All three used calculation methods that were not weighted.
The Paasche index is determined by the following formula:

I = \frac{\sum (P_1 \cdot Q_1)}{\sum (P_0 \cdot Q_1)} .

Palgrave, R.H.I. (1886) proposed to weight the relative prices by the total value of the concerned good; this method leads to calculating the index number as follows:

I = \frac{\sum \left(P_1 \cdot Q_1 \cdot \frac{P_1}{P_0}\right)}{\sum (P_1 \cdot Q_1)} .

It was I. Fisher who defined in 1922 an index number that carries his name and that constitutes the geometric mean of the Laspeyres index and the Paasche index:

I = \sqrt{\frac{\sum (P_1 \cdot Q_0) \cdot \sum (P_1 \cdot Q_1)}{\sum (P_0 \cdot Q_0) \cdot \sum (P_0 \cdot Q_1)}} ,

whereas in the same period, Marshall, A. and Edgeworth, F.W. proposed the following formula:

I = \frac{\sum (P_1 (Q_1 + Q_0))}{\sum (P_0 (Q_1 + Q_0))} .

Notice that with this formula it is the mean of the weights that is calculated.
Modern statistical studies still largely call upon the index numbers that have been defined here.
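Given prices P0, P1 and quantities Q0, Q1 observed for the same goods at the base date and the current date, these classical formulas are easy to compute. The sketch below (plain Python, assumed here purely for illustration, with made-up data) implements the Laspeyres, Paasche and Fisher index numbers given above.

```python
from math import sqrt

# Made-up prices and quantities for three goods at the base date (0) and current date (1)
P0, Q0 = [2.0, 5.0, 1.5], [10, 4, 20]
P1, Q1 = [2.5, 5.5, 1.8], [12, 3, 22]

def laspeyres(P0, P1, Q0):
    return sum(p1 * q0 for p1, q0 in zip(P1, Q0)) / sum(p0 * q0 for p0, q0 in zip(P0, Q0))

def paasche(P0, P1, Q1):
    return sum(p1 * q1 for p1, q1 in zip(P1, Q1)) / sum(p0 * q1 for p0, q1 in zip(P0, Q1))

def fisher(P0, P1, Q0, Q1):
    # geometric mean of the Laspeyres and Paasche indices
    return sqrt(laspeyres(P0, P1, Q0) * paasche(P0, P1, Q1))

print(laspeyres(P0, P1, Q0))   # about 1.19 with these data, i.e. an index of 119 (base 100)
print(paasche(P0, P1, Q1))
print(fisher(P0, P1, Q0, Q1))
```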

DOMAINS AND LIMITATIONS
The calculation of index numbers is a usual method of describing the changes in economic variables. Even if the most current index numbers essentially concern the price variable and the quantity variable, they can also be calculated for values, qualities, etc.
Generally associated with the field of business and economy, index numbers are also widely used in other spheres of activity.

FURTHER READING
• Composite index number
• Fisher index
• Laspeyres index
• Paasche index
• Simple index number


REFERENCES
Carli, G.R.: Del valore etc. In: Opere Scelte di Carli, vol. I, p. 299. Custodi, Milan (1764)
Drobisch, M.W.: Über Mittelgrössen und die Anwendbarkeit derselben auf die Berechnung des Steigens und Sinkens des Geldwerts. Berichte über die Verhandlungen der König. Sachs. Ges. Wiss. Leipzig. Math-Phy. Klasse, 23, 25 (1871)
Dutot, C.: Réflexions politiques sur les finances, et le commerce. Vaillant and Nicolas Prevost, The Hague (1738)
Evelyn, Sir G.S.: An account of some endeavours to ascertain a standard of weight and measure. Philos. Trans. 113 (1798)
Fleetwood, W.: Chronicon Preciosum, 2nd edn. Fleetwood, London (1707)
Jevons, W.S.: A Serious Fall in the Value of Gold Ascertained and Its Social Effects Set Forth. Stanford, London (1863)
Kendall, M.G.: The early history of index numbers. In: Kendall, M., Plackett, R.L. (eds.) Studies in the History of Statistics and Probability, vol. II. Griffin, London (1977)
Laspeyres, E.: Hamburger Warenpreise 1850–1863 und die kalifornisch-australischen Geldentdeckung seit 1848. Jahrb. Natl. Stat. 3, 81–118, 209–236 (1864)
Laspeyres, E.: Die Berechnung einer mittleren Waarenpreissteigerung. Jahrb. Natl. Stat. 16, 296–314 (1871)
Lowe, J.: The Present State of England in Regard to Agriculture, Trade, and Finance. Kelley, London (1822)
Paasche, H.: Über die Preisentwicklung der letzten Jahre nach den Hamburger Borsen-notirungen. Jahrb. Natl. Stat. 23, 168–178 (1874)
Palgrave, R.H.I.: Currency and standard of value in England, France and India etc. (1886) In: IUP (1969) British Parliamentary Papers: Session 1886: Third Report and Final Report of the Royal Commission on the Depression in Trade and Industry, with Minutes of Evidence and Appendices, Appendix B. Irish University Press, Shannon

Indicator

An indicator is a statistic (official statistic) whose objective is to give an indication about the state, behavior, and changing nature, during some period, of an economic or political phenomenon. An indicator must inform about the variations in the values as well as changes in the nature of the observed phenomena and must serve to instruct the decision makers.
At its most accurate, an indicator reflects the social or economic process and suggests the changes that should be made. It is an alarm signal that attracts attention and serves as a call to action (pilotage). Many indicators are generally necessary for providing a global reflection of given phenomena and of the policies that one wishes to evaluate. A set of indicators provides information about the same subject and is called a system of indicators.
Not all statistics are indicators; indicators must:
1. Be quantifiable, which, strictly speaking, does not mean measurable.
2. Be unidimensional (avoid overlap with other indicators).
3. Cover the set of priority objectives.
4. Directly evaluate the performance of a policy.
5. Show ways to improve.
6. Refer to major aspects of the system and be changeable with changes in the political system.
7. Be verifiable.
8. Be relatable to each other.
9. Be as economic as possible.
10. Be readily available.
11. Be easily understood by the general public.
12. Be recognizable by everybody as valid and reliable.

DOMAINS AND LIMITATIONS
Construction of Indicators
Indicators, like all measures of social phenomena, cannot be exact and exhaustive. The construction of indicators involves philosophical considerations (value judgements) and technical and scientific knowledge, but it is also based on logic and often reflects political preoccupations.

Collection and Choice of Indicators
To collect good indicators, an appropriate model of economic, social, and educational systems is needed. The model generally adopted by the OCDE is a systemic model: in a given environment, we study a system's resources, process, and effects. But the choice of indicators united in a system of indicators also depends on the logistical feasibility of data collection. This explains the large difference among indicators. The number of indicators necessary for a good pilotage of an economic or social system is difficult to evaluate. If too numerous, the indicators lose importance and risk not playing an important role. But together they must represent the system to be evaluated. A too small number risks making the system invalid and unrepresentative.

Interpretation of Indicators
The interpretation of indicators is as important for the system as their choice, construction, and collection. Given the significance that the indicators can take on for political decision makers, and for the financing and success of certain projects, all precautions must be taken to assure the integrity, quality, and neutrality of their interpretation.

Validation of Indicators
Before using these statistics as indicators of the performance, quality, and health of an economic, social, or educational system, they should be validated. Although indicators' values are calculated scientifically using mathematical methods and instruments, their influence on the system to be described and their reliability as the foundation for a plan of action are not always scientifically proven. Moreover, when a set of indicators is used, it is still far from clear how these indicators combine their effects to produce the observed results. The methods for validating indicators have yet to be developed.

EXAMPLES
Economic Indicators
These include gross domestic product (GDP), national revenue, consumer price index, producer price index, unemployment rate, total employment, long-term and short-term interest rates, exports and imports of produced goods, and balance of payments.

Education Indicators
These include the cost of education, the people active in the teaching system, the school life expectancy, the geographical disparities in access to the undergraduate level, the general level of draftees, the level of education of young people following high school graduation, the benefits of an undergraduate degree with respect to finding employment, compulsory schooling of children from 2 to 5 years of age, and educational expenses for continuing education.

Science and Technology Indicators
These include gross domestic expenditure on research and development (GDERD), per-capita GDERD, GDERD as a percentage of GDP, total R&D personnel expenditures, total university diplomas, the percentage of GDERD financed by businesses, the percentage of GDERD financed by the government, the percentage of GDERD financed from abroad, and national patent applications.

FURTHER READING
• Official statistics

REFERENCES
OCDE: OCDE et les indicateurs internationaux de l'enseignement. Centre de la Recherche et l'Innovation dans l'Enseignement, OCDE, Paris (1992)
OCDE: Principaux indicateurs de la science et de la technologie. OCDE, Paris (1988)

Inertia

Inertia is a notion that is used largely in mechanics. The more inertia a body has around an axis, the more energy has to be used to put this body in rotation around this axis. The mass of the body and the (squared) distance from the body to the axis are used in the calculation of the inertia. The definition of variance can be recognized here if "mass" is replaced by "frequency" and "distance" by "deviation."

MATHEMATICAL ASPECTS
Suppose that we have a cloud of n points centered in a space of dimension p; the inertia matrix V can be constructed with the coordinates and frequencies of the points.
The eigenvalues k_1, \ldots, k_p (written in decreasing order) of matrix V are the inertia explained by the p factorial axes of the cloud of points. Their sum, equal to the trace of V, is equal to the total inertia of the cloud.
We will say, for example, that the first factorial axis (or inertia axis) explains:

100 \cdot \frac{k_1}{\sum_{i=1}^{p} k_i} \quad \text{(in \%) of the inertia.}

The first two inertia axes, for example, explain:

100 \cdot \frac{k_1 + k_2}{\sum_{i=1}^{p} k_i} \quad \text{(in \%) of the inertia.}

DOMAINS AND LIMITATIONS
The notion of inertia or, more specifically, the inertia matrix of a cloud of points is used, among other things, to perform a correspondence analysis between the considered points.

FURTHER READING
• Correspondence analysis
• Distance
• Eigenvalue
• Factorial axis
• Frequency
• Inertia matrix


Inertia Matrix

Consider a cloud of points in a space of p dimensions. We suppose that this cloud is centered at the origin of the reference system; if not, it is enough to make a change of variables by subtracting from each point the center of the cloud. Let us designate by A the matrix of the coordinates multiplied by the square root of the frequency associated with each point. Such a matrix will be, for n different points, of dimension (n × p). Suppose that p is smaller than or equal to n; otherwise we use the transpose of matrix A as the new matrix of coordinates.
We obtain the inertia matrix V by premultiplying A by its transpose A′, that is, V = A′ · A; V is a symmetric matrix of order p.
The trace of this matrix V is exactly the total inertia (or total variance).

MATHEMATICAL ASPECTSLet n be different points Xi of p components

Xi =(xi1, xi2, . . . , xip

), i = 1, 2, . . . , n ,

each point Xi has one frequency fi (i =1, 2, . . . , n).If the points are not centered around the ori-gin, then we should make the changing vari-ables correspond to the shift to the origin ofthe center of the cloud:

X′i = Xi − Y ,

where Y = (y1, y2, . . . , yp

)with

yj = 1n∑

i=1fi

·(

n∑i=1

fi · xij

).

The general term of inertia matrix V is then:

vij =n∑

k=1

fk · (xki − yi) ·(xkj − yj

).

In matrix form, V = A′ ·A, where the coef-ficients of matrix A are:

aki =√

fk · (xki − yi) .

The total inertia of the cloud of points isobtained by calculating the trace of the iner-tia matrix V, which equals:

Itot = tr (V) =p∑

i=1

vii

=p∑

i=1

n∑k=1

fk · (xki − yi)2 .

DOMAINS AND LIMITATIONSThe inertia matrix of a cloud of points isused to perform a correspondence analysisbetween two considered points.

EXAMPLESConsider the following points in the space ofthree dimensions:

(2; 0; 0.5)

(3; 1; 2)

(−2;−1; −1)

(−1;−1;−0.5)

(−1;−1;−0.5)

(−1;−1;−0.5)

There are four different frequenciesequalling 1, 1, 1, and 3.The center of the cloud formed by its pointsis given by:

y1 = 2+ 3− 2+ 3 · (−1)

6= 0 ,

y2 = 1− 1+ 3 · (−1)

6= −0.5 ,

y3 = 0.5+ 2− 1+ 3 · (−0.5)

6= 0 .


The new coordinates of our four different points will be given by:

(2; 0.5; 0.5)   (3; 1.5; 2)   (−2; −0.5; −1)
(−1; −0.5; −0.5) of frequency 3 ,

from which A, the matrix of the coordinates multiplied by the square root of the frequency, is calculated:

A = [  2       0.5      0.5  ]
    [  3       1.5      2    ]
    [ −2      −0.5     −1    ]
    [ −√3    −√3/2    −√3/2 ]

and V, the inertia matrix:

V = A′ · A = [ 20     8     10.5 ]
             [  8     3.5    4.5 ]
             [ 10.5   4.5    6   ] .

The inertia of the cloud of points equals:

Itot = tr(V) = 20 + 3.5 + 6 = 29.5 .
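As a brief numerical check of this example, the calculation can be reproduced as follows (a minimal sketch assuming NumPy is available; the coordinates and frequencies are those given above):

```python
import numpy as np

# Centered coordinates of the four distinct points and their frequencies
points = np.array([[ 2.0,  0.5,  0.5],
                   [ 3.0,  1.5,  2.0],
                   [-2.0, -0.5, -1.0],
                   [-1.0, -0.5, -0.5]])
freq = np.array([1, 1, 1, 3])

# A: each centered point multiplied by the square root of its frequency
A = np.sqrt(freq)[:, None] * points

# Inertia matrix V = A'A and total inertia = trace(V)
V = A.T @ A
print(V)              # [[20.   8.  10.5] [ 8.   3.5  4.5] [10.5  4.5  6. ]]
print(np.trace(V))    # 29.5
```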

FURTHER READING
• Correspondence analysis
• Frequency
• Matrix
• Trace
• Transpose
• Variance

REFERENCES
Lagarde, J. de: Initiation à l’analyse des données. Dunod, Paris (1983)

Inference

Inference is a form of reasoning by induction done on the basis of information collected on a sample, with the aim of generalizing this information to the population associated with the sample.

HISTORY
See inferential statistics.

FURTHER READING
• Inferential statistics

Inferential Statistics

Inferential or inductive statistics is complementary to descriptive statistics because the goal of most research is not to establish a certain number of indicators on a given sample but rather to estimate a certain number of parameters characterizing the population associated with the treated sample.
In inferential statistics, hypotheses are formulated about the parameters of a population. These hypotheses are then tested on the basis of observations made on a representative sample of the population.

HISTORY
The origins of inferential statistics coincide with those of probability theory and correspond to the works of Bayes, Thomas (1763), de Moivre, Abraham (1718), Gauss, Carl Friedrich (1809), and de Laplace, Pierre Simon (1812).
At that time, there were numerous researchers in the field of inferential statistics. We mention, for example, the works of Galton, Francis (1889) in relation to correlation, as well as the development of hypothesis testing due principally to Pearson, Karl (1900) and Gosset, William Sealy, who was known as “Student” (1908). Neyman, Jerzy, Pearson, Egon Sharpe (1928), and Fisher, Ronald Aylmer (1956) also contributed in a fundamental way to the development of inferential statistics.

FURTHER READING
• Bayes’ theorem
• Descriptive statistics
• Estimation
• Hypothesis testing
• Maximum likelihood
• Optimization
• Probability
• Sampling
• Statistics
• Survey

REFERENCES
Bayes, T.: An essay towards solving a problem in the doctrine of chances. Philos. Trans. Roy. Soc. Lond. 53, 370–418 (1763). Published, at the instigation of Price, R., 2 years after his death. Republished with a biography by Barnard, George A. in 1958 and in Pearson, E.S., Kendall, M.G.: Studies in the History of Statistics and Probability. Griffin, London, pp. 131–153 (1970)

Fisher, R.A.: Statistical Methods and Scientific Inference. Oliver & Boyd, Edinburgh (1956)

Galton, F.: Natural Inheritance. Macmillan, London (1889)

Gauss, C.F.: Theoria Motus Corporum Coelestium. Werke, 7 (1809)

Gosset, S.W. “Student”: The Probable Error of a Mean. Biometrika 6, 1–25 (1908)

Laplace, P.S. de: Théorie analytique des probabilités. Courcier, Paris (1812)

Moivre, A. de: The Doctrine of Chances: or, A Method of Calculating the Probability of Events in Play. Pearson, London (1718)

Neyman, J., Pearson, E.S.: On the use and interpretation of certain test criteria for purposes of statistical inference, Parts I and II. Biometrika 20A, 175–240, 263–294 (1928)

Pearson, K.: On the criterion, that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. In: Karl Pearson’s Early Statistical Papers. Cambridge University Press, pp. 339–357 (1948). First published in 1900 in Philos. Mag. (5th Ser) 50, 157–175

Inner Product

The inner product of x and y, denoted by x′ · y, is the sum of the products, component by component, of the two vectors.
Geometrically, the inner product of two vectors of dimension n corresponds, in the space of n dimensions, to the number obtained by multiplying the norm of one vector by the norm of the projection of the other onto it. This definition is equivalent to saying that the scalar product of two vectors is obtained by taking the product of their norms and the cosine of the angle determined by the two vectors.

MATHEMATICAL ASPECTS
Let x and y be two vectors of dimension n, given by their components:

x = (x1, x2, . . . , xn)′   and   y = (y1, y2, . . . , yn)′ .


The scalar product of x and y, denoted by x′ · y, is defined by:

x′ · y = x1 · y1 + x2 · y2 + . . . + xn · yn = ∑_{i=1}^{n} xi · yi .

The scalar product of a vector x with itself corresponds to its squared norm (or length), denoted by ‖x‖²:

‖x‖² = x′ · x = ∑_{i=1}^{n} xi² .

With the help of the norm, we can also write the scalar product as

x′ · y = ‖x‖ · ‖y‖ · cos α ,

where α is the angle between vectors x and y.
The scalar product of two vectors has the following properties:
• It is commutative, that is, for two vectors x and y of the same dimension, x′ · y = y′ · x. In other words, it makes no difference whether we project x onto y or vice versa.
• It can take negative values, for example when the projection of one vector onto the other points in the direction opposite to that vector.

EXAMPLES
We consider two vectors in the space of three dimensions, given by their components:

x = (1, 2, 3)′   and   y = (0, 2, −1)′ .

The scalar product of x and y, denoted by x′ · y, is:

x′ · y = 1 · 0 + 2 · 2 + 3 · (−1) = 4 − 3 = 1 .

The norm of x, denoted by ‖x‖, is given by the square root of the scalar product of x with itself. Thus:

x′ · x = 1² + 2² + 3² = 1 + 4 + 9 = 14 ,

from which ‖x‖ = √14 ≈ 3.74.
Vector x has a length of 3.74 in the space of three dimensions.
In the same way, we can calculate the norm of vector y:

‖y‖² = y′ · y = ∑_{i=1}^{3} yi² = 2² + (−1)² = 5 ,

so that ‖y‖ = √5.
Thus we can determine the angle α between the two vectors with the help of the relation:

cos α = x′ · y / (‖x‖ · ‖y‖) = 1 / (√14 · √5) ≈ 0.1195 .

This gives an approximate angle of 83.1° between vectors x and y.
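The same example can be checked numerically; the following is a minimal sketch assuming NumPy is available:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.0, 2.0, -1.0])

inner = x @ y                          # scalar product: 1.0
norm_x = np.sqrt(x @ x)                # ≈ 3.742
norm_y = np.sqrt(y @ y)                # ≈ 2.236
cos_alpha = inner / (norm_x * norm_y)  # ≈ 0.1195
alpha = np.degrees(np.arccos(cos_alpha))
print(inner, norm_x, cos_alpha, alpha)  # 1.0  3.74...  0.119...  83.1...
```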


FURTHER READING
• Matrix
• Transpose
• Vector

Interaction

The term interaction is used in design of experiments methods, more precisely in factorial experiments, where a certain number of factors can be studied simultaneously. It appears when the effect on the dependent variable of a change in the level of one factor depends on the levels of the other factors.

HISTORY
The interaction concept in design of experiments methodology is due to Fisher, R.A. (1925, 1935).

MATHEMATICAL ASPECTS
The model of a two-way classification design with interaction is as follows:

Yijk = μ + αi + βj + (αβ)ij + εijk ,
i = 1, 2, . . . , a ,   j = 1, 2, . . . , b ,   k = 1, 2, . . . , nij ,

where Yijk is the kth observation receiving treatment ij, μ is the general mean common to all treatments, αi is the effect of the ith level of factor A, βj is the effect of the jth level of factor B, (αβ)ij is the effect of the interaction between the ith level of factor A and the jth level of factor B, and εijk is the experimental error of observation Yijk.
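To illustrate the model, the following sketch simulates data from a small two-factor design (Python with NumPy; all effect values are hypothetical and chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 10.0
alpha = np.array([-1.0, 1.0])                 # factor A effects (a = 2 levels)
beta = np.array([-2.0, 0.0, 2.0])             # factor B effects (b = 3 levels)
ab = np.array([[ 0.5, 0.0, -0.5],
               [-0.5, 0.0,  0.5]])            # interaction effects (alpha*beta)_ij
n_rep = 4                                     # observations per treatment ij

for i in range(len(alpha)):
    for j in range(len(beta)):
        cell_mean = mu + alpha[i] + beta[j] + ab[i, j]
        y = cell_mean + rng.normal(0.0, 1.0, size=n_rep)  # add experimental error
        print(i + 1, j + 1, np.round(y, 2))
```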

DOMAINS AND LIMITATIONS
The interactions between two factors are called first-order interactions, those between three factors are called second-order interactions, and so forth.

EXAMPLES
See two-way analysis of variance.

FURTHER READING
• Analysis of variance
• Design of experiments
• Experiment
• Fisher distribution
• Fisher table
• Fisher test
• Two-way analysis of variance

REFERENCES
Cox, D.R.: Interaction. Int. Stat. Rev. 52, 1–25 (1984)

Fisher, R.A.: Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh (1925)

Fisher, R.A.: The Design of Experiments. Oliver & Boyd, Edinburgh (1935)

Interquartile Range

The interquartile range is a measure of dispersion corresponding to the difference between the first and the third quartiles. It therefore corresponds to the interval that contains the 50% most centered observations of the distribution.

MATHEMATICAL ASPECTS
Let Q1 and Q3 be the first and third quartiles of a distribution. The interquartile range is calculated as follows:

Interquartile range = Q3 − Q1 .


DOMAINS AND LIMITATIONS
The interquartile range is a measure of variability that does not depend on the number of observations. Moreover, unlike the range, this measure is less sensitive to outliers.

EXAMPLES
Consider the following set of ten observations:

0 1 1 2 2 2 3 3 4 5

The first quartile Q1 is equal to 1 and the third quartile Q3 to 3. The interquartile range is therefore equal to:

Interquartile range = Q3 − Q1 = 3 − 1 = 2 .

We can therefore conclude that 50% of the observations are located in an interval of length 2, namely the interval between 1 and 3.
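A small sketch of this calculation (assuming NumPy; note that NumPy's default quartile rule interpolates between order statistics, so its result can differ slightly from conventions that pick actual observations, as in the example above where Q1 = 1 and Q3 = 3):

```python
import numpy as np

data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5]

q1, q3 = np.percentile(data, [25, 75])  # default "linear" interpolation rule
iqr = q3 - q1
print(q1, q3, iqr)                      # 1.25 3.0 1.75 with NumPy's default rule
```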

FURTHER READING
• Measure of dispersion
• Quartile
• Range

Interval

An interval is determined by two boundaries, a and b. These are called the limit values of the interval.
Intervals can be closed, open, semiopen, and semiclosed.
– Closed interval: denoted [a, b], represents the set of x with a ≤ x ≤ b.
– Open interval: denoted ]a, b[, represents the set of x with a < x < b.
– Interval that is semiopen on the left (or semiclosed on the right): denoted ]a, b], represents the set of x with a < x ≤ b.
– Interval that is semiopen on the right (or semiclosed on the left): denoted [a, b[, represents the set of x with a ≤ x < b.

FURTHER READING
• Frequency distribution
• Histogram
• Ogive

Inverse Matrix

For a square matrix A of order n, we call the inverse matrix of A the square matrix of order n such that the matrix product of A and its inverse (and likewise of the inverse and A) equals the identity matrix of order n. For any square matrix, the inverse matrix, if it exists, is unique. For a square matrix A to be invertible, its determinant must be different from zero.

MATHEMATICAL ASPECTS
Let A be a square matrix of order n with nonzero determinant. We call the inverse of A the unique square matrix of order n, denoted A−1, such that:

A · A−1 = A−1 · A = In ,

where In is the identity matrix of order n. In this case we say that the matrix A is nonsingular. When det(A) = 0, that is, when the matrix A is not invertible, we say that it is singular.


The condition that the determinant of A cannot be zero is explained by the properties of the determinant:

1 = det(In) = det(A · A−1) = det(A) · det(A−1) ,

that is, det(A−1) = 1 / det(A) .

On the other hand, A−1 is obtained from A with the help of 1/det(A) (which must be defined) and equals:

A−1 = (1 / det(A)) · [(−1)^(i+j) · Dij] ,

where Dij is the determinant of order n − 1 that we obtain by deleting from A′ (the transpose of A) row i and column j. Dij is called the cofactor of the element aij.

DOMAINS AND LIMITATIONS
The inverse matrix is principally used in linear regression in matrix form to determine the parameter estimates.
More generally, the notion of the inverse matrix allows one to solve a system of linear equations.

EXAMPLES
Let us treat the general case of a square matrix of order 2. Let

A = [ a  b ]
    [ c  d ] .

The determinant of A is given by:

det(A) = a · d − b · c .

If the latter is not zero, we define:

A−1 = (1 / (a · d − b · c)) · [  d   −b ]
                              [ −c    a ] .

We can verify that A · A−1 = A−1 · A = I2 by direct calculation.

Let B be a square matrix of order 3:

B = [ 3    2   1 ]
    [ 4   −1   0 ]
    [ 2   −2   0 ] ,

and let us determine the inverse matrix B−1 of B, if it exists.
Let us first calculate the determinant of B by expanding along the last column:

det(B) = 1 · [4 · (−2) − (−1) · 2] = −8 + 2 = −6 .

The latter being nonzero, we can calculate B−1 by determining the Dij, that is, the determinants of order 2 obtained by deleting the ith row and the jth column of the transpose B′ of B:

B′ = [ 3    4    2 ]
     [ 2   −1   −2 ]
     [ 1    0    0 ]

D11 = −1 · 0 − (−2) · 0 = 0 ,
D12 = 2 · 0 − (−2) · 1 = 2 ,
D13 = 2 · 0 − (−1) · 1 = 1 ,
...
D33 = 3 · (−1) − 4 · 2 = −11 .

Thus we obtain the inverse matrix B−1 by attaching the correct sign to each element Dij [that is, (+) if the sum of i and j is even and (−) if it is odd]:

B−1 = (1 / −6) · [  0   −2     1 ]
                 [  0   −2     4 ]
                 [ −6   10   −11 ]

     = (1 / 6) · [ 0     2    −1 ]
                 [ 0     2    −4 ]
                 [ 6   −10    11 ] .
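A quick numerical check of this example (a sketch assuming NumPy is available):

```python
import numpy as np

B = np.array([[3.0,  2.0, 1.0],
              [4.0, -1.0, 0.0],
              [2.0, -2.0, 0.0]])

print(np.linalg.det(B))   # -6.0 (up to rounding)
B_inv = np.linalg.inv(B)
print(B_inv)              # (1/6) * [[0, 2, -1], [0, 2, -4], [6, -10, 11]]
print(B @ B_inv)          # identity matrix, up to floating-point error
```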


FURTHER READING
• Determinant
• Generalized inverse
• Matrix
• Multiple linear regression
• Transpose

Inversion

Consider a permutation of the integers 1, 2, . . . , n. When, in a pair of elements (not necessarily adjacent) of the permutation, the first element is bigger than the second, this pair is called an inversion. In this way it is possible to count the number of inversions in a permutation, which is the number of pairs in which the larger integer precedes the smaller one.

MATHEMATICAL ASPECTS
Consider a permutation of the integers 1, 2, . . . , n given by σ1, σ2, . . . , σn; the number of inversions is determined by counting the number of pairs (σi, σj) with i < j and σi > σj.
To make sure we do not forget any, we start by taking σ1 and count the σi, i > 1, that are smaller than σ1; then we take σ2 and count the elements inferior to it located to its right, proceeding like this until σn−1; the number of inversions of the considered permutation is then obtained.

DOMAINS AND LIMITATIONS
The number of inversions is used to calculate the determinant of a square matrix. The notation aij is used for the element of the matrix located at the intersection of the ith row and the jth column. When forming the product a1i · a2j · . . . · anr, for example, the parity of the number of inversions of the permutation i j . . . r determines the sign attributed to the product in the calculation of the determinant.
For a permutation of the elements 1, 2, . . . , n, the number of inversions is always an integer located between 0 and n · (n − 1)/2.

EXAMPLES
Consider the integers 1, 2, 3, 4, 5; a permutation of these five integers is given, for example, by:

1 4 5 2 3 .

The number of inversions of this permutation can be determined by writing down the pairs where there is an inversion:

(4, 2)   (4, 3)   (5, 2)   (5, 3) ,

which gives a number of inversions equal to 4.
For the permutation 1 2 3 4 5 (identity permutation) we find that there is no inversion. For the permutation 5 4 3 2 1 we obtain 10 inversions, because each pair is an inversion and therefore the number of inversions is equal to the number of combinations of two of the five elements chosen without repetition, meaning:

C₅² = 5! / (2! · 3!) = 10 .
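This counting procedure is easily sketched in Python:

```python
def count_inversions(perm):
    """Count pairs (i, j) with i < j and perm[i] > perm[j]."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])

print(count_inversions([1, 4, 5, 2, 3]))   # 4
print(count_inversions([1, 2, 3, 4, 5]))   # 0
print(count_inversions([5, 4, 3, 2, 1]))   # 10
```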

FURTHER READING
• Determinant
• Permutation

Irregular Variation

Irregular variations, or random variations, constitute one of the four components of a time series. They correspond to movements that appear irregularly and generally over short periods.
Irregular variations do not follow a particular model and are not predictable.
In practice, all the components of a time series that cannot be attributed to the influence of cyclic fluctuations or seasonal variations, or to that of the secular tendency, are classed as irregular.

HISTORY
See time series.

MATHEMATICAL ASPECTS
Let Yt be a time series; we can describe it with the help of its components:
• Secular tendency Tt
• Seasonal variations St
• Cyclic fluctuations Ct
• Irregular variations It
We distinguish:
• Multiplicative model: Yt = Tt · St · Ct · It.
• Additive model: Yt = Tt + St + Ct + It.
When the secular tendency, the seasonal variations, and the cyclic fluctuations are determined, it is possible to adjust the initial data of the time series according to these three components. We then obtain the values of the irregular variations at each time t from the following expressions:

It = Yt / (St · Tt · Ct)   (multiplicative model)
It = Yt − St − Tt − Ct   (additive model) .
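As a small illustration of the additive case, the following sketch builds a series from hypothetical components and then recovers the irregular part (Python with NumPy; all component values are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(48)

T = 100 + 0.5 * t                        # secular tendency
S = 10 * np.sin(2 * np.pi * t / 12)      # seasonal variations (period 12)
C = 5 * np.sin(2 * np.pi * t / 36)       # cyclic fluctuations
I = rng.normal(0.0, 2.0, size=t.size)    # irregular variations

Y = T + S + C + I                        # additive model

I_hat = Y - S - T - C                    # irregular component recovered
print(np.allclose(I_hat, I))             # True
```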

DOMAINS AND LIMITATIONS
The study of the components of time series shows that the irregular variations can be classed into two categories:
• The most numerous can be attributed to a large number of small causes, especially errors of measurement, that provoke variations of small amplitude.
• Those resulting from isolated accidental events of greater scale, such as strikes, administrative decisions, financial booms, natural catastrophes, etc. In this case, before applying the method of the moving average, a first treatment of the data is made to correct the raw data as well as possible.
Even if we normally suppose that the irregular variations do not produce durable variations, but act only over a short time interval, we can imagine that they are strong enough to distort the course of the cyclic or other fluctuations.
In practice, we find that the irregular variations tend to have a small amplitude and to follow a normal distribution. That means that there are frequently small deviations and rarely large ones.

FURTHER READING
• Cyclical fluctuation
• Normal distribution
• Seasonal variation
• Secular trend
• Time series

REFERENCES
Bowerman, B.L., O’Connel, R.T.: Time Series and Forecasting: An Applied Approach. Duxbury, Belmont, CA (1979)

Harnett, D.L., Murphy, J.L.: Introductory Statistical Analysis. Addison-Wesley, Reading, MA (1975)


J

Jackknife Method

During point estimation or estimation by confidence interval, it is often difficult to estimate the bias or the standard error of the estimator used. The reasons are generally summarized in two points: (1) the lack of information about the form of the theoretical data distribution and (2) the complexity of the theoretical model if the hypothesis is considered too strong. The jackknife method permits a reduction of the bias and a numerical estimation of the standard error as well as of a confidence interval. It usually proceeds by generating artificial subsamples from the data of the complete sample.

HISTORY
The jackknife method was developed by Quenouille (1949, 1956) and John Wilder Tukey (1958). It is now widely used thanks to computers, which can generate a large amount of data in a very short time.
The jackknife denomination is attributed to Tukey, J.W. (1958). The name of the method comes from a scout knife with many blades, ready to be used in a variety of situations.

MATHEMATICAL ASPECTS
Let x1, . . . , xn be independent observations of a random variable. We want to estimate a parameter θ of the population. The jackknife technique is the following: the estimator θ̂ of θ is calculated from the initial sample. The latter is separated into J groups with m = n/J observations in each of them (to simplify the problem, we suppose that n is a multiple of J).
The estimator of θ by the jackknife method is the mean of the pseudovalues:

θ̃ = (1/J) ∑_{j=1}^{J} θj ,

where θj is called the pseudovalue and is defined by:

θj = J · θ̂ − (J − 1) · θ̂(j)

and the estimators θ̂(j) (j = 1, . . . , J) are calculated in the same way as θ̂, but with the jth group deleted.
The estimation of the variance is defined by:

S² = Var(θ̃) = ∑_{j=1}^{J} (θj − θ̃)² / ((J − 1) · J) .


The estimator of the bias is:

B(θ̃) = (J − 1) · (θ̂(.) − θ̂) ,

where θ̂(.) is defined as follows:

θ̂(.) = (1/J) ∑_{j=1}^{J} θ̂(j) .

The confidence interval at the significance level α is given by:

[ θ̃ − S · t_{n−1, 1−α/2} / √n ;  θ̃ + S · t_{n−1, 1−α/2} / √n ] .

One of the important properties of the jackknife method is the fact that the term of order 1/n is deleted from the bias.
Suppose that a1, a2, . . . are functions of θ; we can write the mathematical expectation of θ̂ as:

E[θ̂] = θ + a1/n + a2/n² + . . .
     = θ + a1/(m · J) + a2/(m · J)² + . . .

and

E[θj] = J · E[θ̂] − (J − 1) · E[θ̂(j)]
      = J · [θ + a1/(m · J) + a2/(m · J)² + . . .] − (J − 1) · [θ + a1/(m · (J − 1)) + . . .]
      = J · θ + a1/m + a2/(m² · J) + . . . − J · θ + θ − a1/m − a2/(m² · (J − 1)) − . . .
      = θ − a2 / (m² · J · (J − 1)) + . . . .

DOMAINS AND LIMITATIONS
The idea behind the jackknife method, which is relatively old, has become practical thanks to computing. The jackknife generates samples from the original data and determines the confidence interval, at a chosen significance level, of an unknown statistic.
In practical cases, the groups often consist of a single observation each (m = 1). The estimators θ̂(j) are then calculated by deleting the jth observation.

EXAMPLES
A large chain of stores wants to determine the evolution of its turnover between 1950 and 1960. To obtain a correct estimation, we take a sample of 8 points of sale where we know the respective turnovers.
The data are summarized in the following table:

Point of sale     Turnover (mill $)
                  1950       1960
1                  6.03       7.59
2                 12.42      15.65
3                 11.89      14.95
4                  3.12       5.03
5                  8.17      11.08
6                 13.34      18.48
7                  5.30       7.98
8                  4.23       5.61

The total turnover of the 8 points of sale increased from 64.5 mill $ in 1950 to 86.37 mill $ in 1960. Thus the estimation of the increase factor equals:

θ̂ = 86.37 / 64.5 = 1.34 .

We want to calculate the bias of this estimation as well as its standard deviation. We calculate the θ̂(j) for j going from 1 to 8, that is, the increase factors calculated without the data of the jth point of sale. We obtain:

θ̂(1) = (86.37 − 7.59) / (64.5 − 6.03) = 1.347 ,
θ̂(2) = 1.358   θ̂(6) = 1.327
θ̂(3) = 1.358   θ̂(7) = 1.324
θ̂(4) = 1.325   θ̂(8) = 1.340
θ̂(5) = 1.337 .

So we can calculate θ̂(.):

θ̂(.) = (1/J) ∑_{j=1}^{J} θ̂(j) = (1/8) · (10.716) = 1.339 .

We calculate the pseudovalues θj. Recall that θj is calculated as follows:

θj = J · θ̂ − (J − 1) · θ̂(j) .

We find:

θ1 = 1.281   θ5 = 1.356
θ2 = 1.207   θ6 = 1.423
θ3 = 1.210   θ7 = 1.443
θ4 = 1.436   θ8 = 1.333 .

We calculate the estimator of θ by the jackknife method:

θ̃ = (1/J) ∑_{j=1}^{J} θj = (1/8) · (10.689) = 1.336 .

So we determine the bias:

B(θ̃) = (J − 1) · (θ̂(.) − θ̂) = (8 − 1) · (1.339 − 1.34) = −0.0035 .

The standard error is given by:

S = √Var(θ̃) = √( ∑_{j=1}^{J} (θj − θ̃)² / ((J − 1) · J) ) = √(0.0649 / (7 · 8)) = 0.034 .

So we can calculate the confidence interval of the increase factor of the turnover of the chain between 1950 and 1960. For a significance level α of 5%, the critical value of the Student table for 7 degrees of freedom is:

t7,0.975 = 2.365 .

The confidence interval for the estimation of the increase factor is:

[ θ̃ − S · t7,0.975 / √n ;  θ̃ + S · t7,0.975 / √n ]
[ 1.34 − 0.034 · 2.365 / √8 ,  1.34 + 0.034 · 2.365 / √8 ] ,

that is: [1.312, 1.368] .

So we can affirm that the mean increase in the turnover is approximately contained between 31% and 37% during the period 1950–1960.
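A compact sketch of this example in Python (assuming NumPy and SciPy are available; one observation per group, m = 1):

```python
import numpy as np
from scipy import stats

# Turnover of the 8 points of sale (from the table above), in millions of $
t1950 = np.array([6.03, 12.42, 11.89, 3.12, 8.17, 13.34, 5.30, 4.23])
t1960 = np.array([7.59, 15.65, 14.95, 5.03, 11.08, 18.48, 7.98, 5.61])

n = len(t1950)
theta_hat = t1960.sum() / t1950.sum()            # ≈ 1.339

# Leave-one-out estimators and pseudovalues
theta_loo = np.array([(t1960.sum() - t1960[j]) / (t1950.sum() - t1950[j])
                      for j in range(n)])
pseudo = n * theta_hat - (n - 1) * theta_loo

theta_tilde = pseudo.mean()                      # jackknife estimator ≈ 1.336
bias = (n - 1) * (theta_loo.mean() - theta_hat)  # jackknife bias estimate (small here)
se = np.sqrt(np.sum((pseudo - theta_tilde) ** 2) / ((n - 1) * n))  # ≈ 0.034

t_crit = stats.t.ppf(0.975, df=n - 1)            # ≈ 2.365
ci = (theta_hat - se * t_crit / np.sqrt(n),
      theta_hat + se * t_crit / np.sqrt(n))      # ≈ (1.31, 1.37)
print(theta_tilde, bias, se, ci)
```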

FURTHER READING
• Bias
• Confidence interval
• Estimation
• Estimator
• Point estimation
• Standard error

REFERENCES
Efron, B.: The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics, Philadelphia (1982)

Quenouille, M.H.: Approximate tests of correlation in time series. J. Roy. Stat. Soc. Ser. B 11, 68–84 (1949)


Quenouille, M.H.: Notes on bias in estimation. Biometrika 43, 353–360 (1956)

Tukey, J.W.: Bias and confidence in not quite large samples. Ann. Math. Stat. 29, 614 (1958)

Jeffreys, Harold

Jeffreys, Harold was born in Fatfield, England in 1891. He studied mathematics, physics, physical chemistry, and geology at Armstrong College in Newcastle, where he obtained his degree in 1910. He then joined St. John’s College at Cambridge University, where he became a Fellow in 1914 and stayed for 75 years. He died there in 1989.
He published his first paper (about photography) in 1910. History has preserved two of his works in particular:

1939 Theory of Probability. Oxford University Press, Oxford.
1946 (with Jeffreys, B.S.) Methods of Mathematical Physics. Cambridge University Press, Cambridge; Macmillan, New York.

Joint Density Function

Suppose X1, X2 are continuous random variables defined on the same sample space and that

F(x1, x2) = P(X1 ≤ x1, X2 ≤ x2) = ∫_{−∞}^{x2} ∫_{−∞}^{x1} f(t1, t2) dt1 dt2

for all x1, x2. Then f(x1, x2) is the joint density function of (X1, X2) provided f(x1, x2) ≥ 0 and that

∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2) dx1 dx2 = 1 .

Suppose (X1, . . . , Xk) are continuous random variables defined on the same sample space and

F(x1, . . . , xk) = P(X1 ≤ x1, . . . , Xk ≤ xk) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xk} f(t1, . . . , tk) dt1 . . . dtk

for all x1, . . . , xk. Then f(x1, . . . , xk) is the joint density function of (X1, . . . , Xk) provided f(x1, . . . , xk) ≥ 0 and

∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} f(x1, . . . , xk) dx1 . . . dxk = 1 .
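As a small numerical illustration, the following sketch checks these conditions for a hypothetical density f(x1, x2) = x1 + x2 on the unit square (assuming SciPy is available; both the density and the evaluation point are invented for the example):

```python
from scipy import integrate

# Hypothetical joint density on the unit square
def f(x1, x2):
    return x1 + x2 if (0 <= x1 <= 1 and 0 <= x2 <= 1) else 0.0

# Total probability: should integrate to 1
total, _ = integrate.dblquad(lambda x2, x1: f(x1, x2), 0, 1,
                             lambda x1: 0, lambda x1: 1)
print(total)   # ≈ 1.0

# F(0.5, 0.5) = P(X1 <= 0.5, X2 <= 0.5)
p, _ = integrate.dblquad(lambda x2, x1: f(x1, x2), 0, 0.5,
                         lambda x1: 0, lambda x1: 0.5)
print(p)       # 0.125
```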

REFERENCES
Knight, K.: Mathematical Statistics. Chapman & Hall/CRC, London (2000)

Joint Distribution Function

The joint distribution function of a pair of random variables is defined as the probability that the first random variable takes a value less than or equal to a first real number and that, simultaneously, the second variable takes a value less than or equal to a second real number.

HISTORY
See probability.

MATHEMATICAL ASPECTS
Consider two random variables X and Y. The joint distribution function is defined as follows:

FXY(a, b) = P(X ≤ a, Y ≤ b) ,   −∞ < a, b < ∞ .

FURTHER READING
• Distribution function
• Marginal distribution function
• Pair of random variables

REFERENCES
Hogg, R.V., Craig, A.T.: Introduction to Mathematical Statistics, 2nd edn. Macmillan, New York (1965)

Ross, S.M.: Introduction to Probability Models, 8th edn. John Wiley, New York (2006)

Joint Probability Distribution Function

Suppose that X1, X2 are discrete random variables defined on the same sample space. Then the joint probability distribution function of X = (X1, X2) is defined to be

f(x1, x2) = P(X1 = x1, X2 = x2) .

The joint probability distribution function must satisfy the following condition:

∑_{x1, x2} f(x1, x2) = 1 .

A multiple joint probability distribution function of X = (X1, . . . , Xk) is defined to be

f(x1, . . . , xk) = P(X1 = x1, . . . , Xk = xk)

satisfying

∑_{(x1, . . . , xk)} f(x1, . . . , xk) = 1 .
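A tiny illustration in Python of such a function and of the summation condition (the joint distribution of two hypothetical fair dice, invented for the example):

```python
# Joint probability distribution function of two independent fair dice:
# f(x1, x2) = 1/36 for x1, x2 in {1, ..., 6}
f = {(x1, x2): 1 / 36 for x1 in range(1, 7) for x2 in range(1, 7)}

print(sum(f.values()))   # 1.0 (the required condition)
print(f[(3, 5)])         # P(X1 = 3, X2 = 5) = 1/36 ≈ 0.0278
```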

REFERENCES
Casella, G., Berger, R.L.: Statistical Inference. Duxbury, Pacific Grove (2001)


K

Kendall, Maurice George

Kendall, Maurice was born in 1907 in Kettering, Northamptonshire, England. He studied mathematics at St. John’s College, Cambridge. After graduating as a Mathematics Wrangler in 1929, he joined the British Civil Service in the Ministry of Agriculture. He was elected a Fellow of the Society in 1934. In 1937, he worked with G. Udny Yule on the revision of his standard statistical textbook, Introduction to the Theory of Statistics. He also worked on the rank correlation coefficient that bears his name, Kendall’s tau, which eventually led to a monograph on rank correlation in 1948.
In 1938 and 1939 he began work, along with Bernard Babington Smith, on the problem of random number generation, developing one of the first early mechanical devices to produce random digits and formulating a series of tests, such as the frequency test, serial test, and poker test, for statistical randomness in a given set of digits.
During the war he managed to produce volume one of The Advanced Theory of Statistics in 1943 and a second volume in 1946.
In 1953, he published The Analytics of Economic Time Series. In 1957, he published Multivariate Analysis, and in the same year he also developed, with W.R. Buckland, a Dictionary of Statistical Terms. In 1961 he left the University of London and took a position as Managing Director of a consulting company, Scientific Control Systems, and in the same year began a two-year term as President of the Royal Statistical Society.
In 1972, he became Director of the World Fertility Survey, a project sponsored by the International Statistical Institute and the United Nations. He continued this work until 1980, when illness forced him to retire. He was knighted in 1974 for his services to the theory of statistics and received the Peace Medal of the United Nations in 1980 in recognition of his work on the World Fertility Survey. He was also elected a fellow of the British Academy and received the highest honor of the Royal Statistical Society, the Guy Medal in Gold. At the time of his death in 1983, he was Honorary President of the International Statistical Institute.

Some principal works and articles of Kendall, Maurice George:

1938 (with Babington Smith, B.) Randomness and Random Sampling Numbers. J. Roy. Stat. Soc. 101:1, 147–166.
1957 (with Buckland, W.R.) A Dictionary of Statistical Terms. International Statistical Institute, The Hague, Netherlands.
1973 Time Series. Griffin, London.
1979 (with Stuart, A.) The Advanced Theory of Statistics. Arnold.

FURTHER READING
• Kendall rank correlation coefficient
• Random number generation

Kendall Rank Correlation Coefficient

The Kendall rank correlation coefficient (Kendall τ) is a nonparametric measure of correlation.

HISTORY
This rank correlation coefficient was discussed as far back as the early 20th century by Fechner, G.T. (1897), Lipps, G.F. (1906), and Deuchler, G. (1914).
Kendall, M.G. (1938) not only rediscovered it independently but also studied it using a (nonparametric) approach. His 1970 monograph contains a complete and detailed presentation of the theory as well as a bibliography.

MATHEMATICAL ASPECTS
Consider two random variables (X, Y) observed on a sample of size n, with n pairs of observations (X1, Y1), (X2, Y2), . . . , (Xn, Yn). An indication of the correlation between X and Y can be obtained by ordering the values Xi in increasing order and by counting the number of corresponding values Yi not satisfying this order.
Q will denote the number of inversions among the values of Y that are required to obtain the same (increasing) order as the values of X.
Since there are n(n − 1)/2 distinct pairs that can be formed, 0 ≤ Q ≤ n(n − 1)/2; the value 0 is obtained when all the values Yi are already in increasing order, and the value n(n − 1)/2 is reached when all the values Yi are in the inverse order of the Xi, each pair having to be switched to obtain the desired order.
The Kendall rank correlation coefficient, denoted by τ, is defined by:

τ = 1 − 4Q / (n(n − 1)) .

If all the pairs are in increasing order, then:

τ = 1 − 4 · 0 / (n(n − 1)) = 1 .

If all the pairs are in reverse order, then:

τ = 1 − 4 · (n(n − 1)/2) / (n(n − 1)) = −1 .

An equivalent definition of the Kendall rank coefficient can be given as follows: two observations are called concording if the two members of one observation are larger than the respective members of the other observation. For example, (0.9, 1.1) and (1.5, 2.4) are two concording observations because 0.9 < 1.5 and 1.1 < 2.4. Two observations are said to be discording if the two members of one observation are in opposite order to the respective members of the other observation. For example, (0.8, 2.6) and (1.3, 2.1) are two discording observations because 0.8 < 1.3 and 2.6 > 2.1.
Let Nc and Nd denote the total number of pairs of concording and discording observations, respectively.


Pairs of observations for which Xi = Xj or Yi = Yj are neither concording nor discording and are therefore counted neither in Nc nor in Nd.
With this notation the Kendall rank coefficient is given by:

τ = 2(Nc − Nd) / (n(n − 1)) .

Notice that when there are no pairs for which Xi = Xj or Yi = Yj, the two formulations of τ are exactly the same. In the opposite situation, the values given by the two formulas can be different.

Hypothesis Test
The Kendall rank correlation coefficient is often used as a statistical test to determine if there is a relation between two random variables. The test can be a two-sided test or a one-sided test. The hypotheses are:

A: Two-sided case:
H0: X and Y are mutually independent.
H1: There is either a positive or a negative correlation between X and Y.
There is a positive correlation when the large values of X tend to be associated with the large values of Y and the small values of X with the small values of Y. There is a negative correlation when the large values of X tend to be associated with the small values of Y and vice versa.

B: One-sided case:
H0: X and Y are mutually independent.
H1: There is a positive correlation between X and Y.

C: One-sided case:
H0: X and Y are mutually independent.
H1: There is a negative correlation between X and Y.

The test statistic is defined as follows:

T = Nc − Nd .

Decision Rules
The decision rules are different depending on the hypotheses that are made. That is why there are decision rules A, B, and C relative to the previous cases.

Decision rule A
Reject H0 at the significance level α if

T > tn,1−α/2   or   T < tn,α/2 ,

where t is the critical value of the test given by the Kendall table; otherwise accept H0.

Decision rule B
Reject H0 at the significance level α if

T > tn,1−α ;

otherwise accept H0.

Decision rule C
Reject H0 at the significance level α if

T < tn,α ;

otherwise accept H0.
It is also possible to use

τ = 1 − 4Q / (n(n − 1))

as a test statistic. When X and Y are independently distributed in a population, the exact distribution of τ has an expected value of zero and a variance of:

σ²τ = 2(2n + 5) / (9n(n − 1))

and tends very quickly toward a normal distribution, the approximation being good enough for n ≥ 10.


In this case, to test independence at a 5% level, for example, it is enough to verify whether τ lies outside the bounds

±1.96 · στ

and to reject the independence hypothesis if that is the case.

DOMAINS AND LIMITATIONS
The Kendall rank correlation coefficient is used as a hypothesis test to study the dependence between two random variables. It can be considered as a test of independence. As a nonparametric correlation measurement, it can also be used with nominal or ordinal data.
A correlation measurement between two variables must satisfy the following points:
1. Its values are between −1 and +1.
2. There is a positive correlation between X and Y if the value of the correlation coefficient is positive; a perfect positive correlation corresponds to a value of +1.
3. There is a negative correlation between X and Y if the value of the correlation coefficient is negative; a perfect negative correlation corresponds to a value of −1.
4. There is a null correlation between X and Y when the correlation coefficient is close to zero; one can also say that X and Y are not correlated.
The Kendall rank correlation coefficient has the following advantages:
• The data can be nonnumerical observations as long as they can be classified according to a determined criterion.
• It is easy to calculate.
• The associated statistical test does not make any basic hypothesis about the shape of the distribution of the population from which the samples are taken.
The Kendall table gives the theoretical values of the test statistic of the Kendall rank correlation coefficient under the independence hypothesis of two random variables. A Kendall table can be found in Kaarsemaker and van Wijngaarden (1953).
Here is a sample of the Kendall table for n = 4, . . . , 10 and α = 0.01 and 0.05:

n     α = 0.01     α = 0.05
4        6             4
5        8             6
6       11             9
7       15            11
8       18            14
9       22            16
10      25            19

EXAMPLES
In this example eight pairs of identical twins take intelligence tests. The goal is to see if there is independence between the test results of the firstborn and those of the secondborn.
The data are given in the table below; the highest scores correspond to the best results.

Pair of twins    Firstborn Xi    Secondborn Yi
1                     90              88
2                     75              79
3                     99              98
4                     60              66
5                     72              64
6                     83              83
7                     83              88
8                     90              98


The pairs are then classified in increasing order of X, and the concording and discording pairs are determined. This gives:

Pair of twins (Xi, Yi)    Concording pairs    Discording pairs
(60, 66)                        6                   1
(72, 64)                        6                   0
(75, 79)                        5                   0
(83, 83)                        3                   0
(83, 88)                        2                   0
(90, 88)                        1                   0
(90, 98)                        0                   0
(99, 98)                        0                   0
                            Nc = 23             Nd = 1

The Kendall rank correlation coefficient is given by:

τ = 2(Nc − Nd) / (n(n − 1)) = 2(23 − 1) / (8 · 7) = 0.7857 .

Notice that since there are several observations for which Xi = Xj or Yi = Yj, the value of the coefficient given by:

τ = 1 − 4Q / (n(n − 1)) = 1 − 4 · 1 / 56 = 0.9286

is different.
In both cases, we notice a positive correlation between the intelligence tests.
We will now carry out the hypothesis test:

H0: There is independence between the intelligence tests of a pair of twins.
H1: There is a positive correlation between the intelligence tests.

We choose a significance level of α = 0.05. Since we are in case B, H0 is rejected if

T > t8,0.95 ,

where T = Nc − Nd and t8,0.95 is the value from the Kendall table. Since T = 22 and t8,0.95 = 14, H0 is rejected.
We can then conclude that there is a positive correlation between the results of the intelligence tests of a pair of twins.
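A direct sketch of this computation in Python, counting concording and discording pairs for the data above:

```python
x = [90, 75, 99, 60, 72, 83, 83, 90]   # firstborn
y = [88, 79, 98, 66, 64, 83, 88, 98]   # secondborn
n = len(x)

nc = nd = 0
for i in range(n):
    for j in range(i + 1, n):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            nc += 1          # concording pair
        elif prod < 0:
            nd += 1          # discording pair (ties count in neither)

T = nc - nd
tau = 2 * (nc - nd) / (n * (n - 1))
print(nc, nd, T, round(tau, 4))   # 23 1 22 0.7857
```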

FURTHER READING
• Hypothesis testing
• Nonparametric test
• Test of independence

REFERENCES
Deuchler, G.: Über die Methoden der Korrelationsrechnung in der Pädagogik und Psychologie. Zeitung für Pädagogische Psychologie und Experimentelle Pädagogik 15, 114–131, 145–159, 229–242 (1914)

Fechner, G.T.: Kollektivmasslehre. W. Engelmann, Leipzig (1897)

Kaarsemaker, L., van Wijngaarden, A.: Tables for use in rank correlation. Stat. Neerland. 7, 53 (1953)

Kendall, M.G.: A new measure of rank correlation. Biometrika 30, 81–93 (1938)

Kendall, M.G.: Rank Correlation Methods. Griffin, London (1948)

Lipps, G.F.: Die Psychischen Massmethoden. F. Vieweg und Sohn, Braunschweig, Germany (1906)

Kiefer, Jack Carl

Kiefer, Jack Carl was born in Cincinnati, Ohio in 1924. He entered the Massachusetts Institute of Technology in 1942, but after 1 year of studying engineering and economics he left to take on war-related work during World War II. His master’s thesis, Sequential Determination of the Maximum of a Function, was supervised by Harold Freeman. It has been the basis for his paper “Sequential minimax search for a maximum”, which appeared in 1953 in the Proceedings of the American Mathematical Society. In 1948 he went to the Department of Mathematical Statistics at Columbia University, where Abraham Wald was preeminent in a department that included Ted Anderson, Henry Scheffé, and Jack Wolfowitz. He wrote his doctoral thesis in decision theory under Wolfowitz and went to Cornell University in 1951 with Wolfowitz. In 1973 Kiefer was elected the first Horace White Professor at Cornell University, a position he held until 1979, when he retired and joined the faculty at the University of California at Berkeley. He died at the age of 57 in 1981.
Kiefer’s research area was the design of experiments. Most of his 100 publications dealt with that topic. He also wrote papers on topics in mathematical statistics including decision theory, stochastic approximation, queuing theory, nonparametric inference, estimation, sequential analysis, and conditional inference.
Kiefer was a fellow of the Institute of Mathematical Statistics and the American Statistical Association and president of the Institute of Mathematical Statistics (1969–1970). He was elected to the American Academy of Arts and Sciences in 1972 and to the National Academy of Sciences (USA) in 1975.

Selected works and publications of Jack Carl Kiefer:

1987 Introduction to Statistical Inference. Springer, Berlin Heidelberg New York.

FURTHER READING
• Design of experiments

REFERENCES
Kiefer, J.: Optimum experimental designs. J. Roy. Stat. Soc. Ser. B 21, 272–319 (1959)

Brown, L.D., Olkin, I., Sacks, J., Wynn, H.P. (eds.): Jack Karl Kiefer, Collected Papers. I: Statistical inference and probability (1951–1963). New York (1985)

Brown, L.D., Olkin, I., Sacks, J., Wynn, H.P. (eds.): Jack Karl Kiefer, Collected Papers. II: Statistical inference and probability (1964–1984). New York (1985)

Brown, L.D., Olkin, I., Sacks, J., Wynn, H.P. (eds.): Jack Karl Kiefer, Collected Papers. III: Design of experiments. New York (1985)

Kolmogorov, Andrei Nikolaevich

Born in Tambov, Russia in 1903, Kolmogorov, Andrei Nikolaevich is one of the founders of modern probability. In 1920, he entered Moscow State University and studied mathematics, history, and metallurgy. In 1925, he published his first article in probability, on the inequalities of the partial sums of random variables, which became the principal reference in the field of stochastic processes. He received his doctorate in 1929 and published 18 articles on the law of large numbers as well as on intuitive logic. He was named professor at Moscow State University in 1931. In 1933, he published his monograph on probability theory.
In 1939 he was elected member of the Academy of Sciences of the USSR. He received the Lenin Prize in 1965 and the Order of Lenin on six different occasions, as well as the Lobachevsky Prize in 1987. He was elected member of many other foreign academies including the Romanian Academy of Sciences (1956), the Royal Statistical Society of London (1956), the Leopoldina Academy of Germany (1959), the American Academy of Arts and Sciences (1959), the London Mathematical Society (1959), the American Philosophical Society (1961), the Indian Institute of Statistics (1962), the Holland Academy of Sciences (1963), the Royal Society of London (1964), the National Academy of the United States (1967), and the Académie Française des Sciences (1968).

Selected principal works of Kolmogorov, Andrei Nikolaevich:

1933 Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin Heidelberg New York.
1933 Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Instituto Italiano degli Attuari, 4, 83–91 (6.1).
1941 Local structure of turbulence in incompressible fluids with very high Reynolds number. Dan SSSR, 30, 229.
1941 Dissipation of energy in locally isotropic turbulence. Dokl. Akad. Nauk. SSSR, 32, 16–18.
1958 (with Uspenskii, V.A.) K opredeleniyu algoritma (Toward the definition of an algorithm). Uspekhi Matematicheskikh Nauk 13(4):3–28; American Mathematical Society Translations Series 2(29):217–245, 1963.
1961 (with Fomin, S.V.) Measure, Lebesgue Integrals and Hilbert Space. Natascha Artin Brunswick and Alan Jeffrey. Academic, New York.
1963 On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSR, 114, 953–956, 1957. English translation: Mathematical Society Transactions, 28, 55–59.
1965 Three approaches to the quantitative definition of information. Problems of Information Transmission, 1, 1–17. Translation of Problemy peredachi informatsii 1(1), 3–11 (1965).
1987 (with Uspenskii, V.A.) Algorithms and randomness. Teoria veroyatnostey i ee primeneniya (Probability Theory and Its Applications), 3(32):389–412.

FURTHER READING
• Kolmogorov–Smirnov test

Kolmogorov–Smirnov Test

The Kolmogorov–Smirnov test is a nonparametric goodness-of-fit test used to determine whether two distributions differ, or whether an underlying probability distribution differs from a hypothesized distribution. It is used when we have two samples coming from two populations that can be different. Unlike the Mann–Whitney test and the Wilcoxon test, where the goal is to detect a difference between two means or medians, the Kolmogorov–Smirnov test has the advantage of considering the distribution functions collectively. The Kolmogorov–Smirnov test can also be used as a goodness-of-fit test. In this case, we have only one random sample obtained from a population whose distribution function is specific and known.

HISTORY
The goodness-of-fit test for one sample was invented by Andrey Nikolaevich Kolmogorov (1933). The Kolmogorov–Smirnov test for two samples was invented by Nikolai Vasilievich Smirnov (1939).
In Massey (1952) we find a Smirnov table for the Kolmogorov–Smirnov test for two samples, and in Miller (1956) we find a Kolmogorov table for the goodness-of-fit test.

MATHEMATICAL ASPECTS
Consider two independent random samples: (X1, X2, . . . , Xn), a sample of size n coming from a population 1, and (Y1, Y2, . . . , Ym), a sample of size m coming from a population 2. We denote by F(x) and G(x), respectively, their unknown distribution functions.

Hypotheses
The hypotheses to test are as follows:

A: Two-sided case:
H0: F(x) = G(x) for each x
H1: F(x) ≠ G(x) for at least one value of x

B: One-sided case:
H0: F(x) ≤ G(x) for each x
H1: F(x) > G(x) for at least one value of x

C: One-sided case:
H0: F(x) ≥ G(x) for each x
H1: F(x) < G(x) for at least one value of x

In case A, we make the hypothesis that there is no difference between the distribution functions of the two populations. The two populations can then be seen as one population.
In case B, we make the hypothesis that the distribution function of population 1 is smaller than that of population 2. We sometimes say that, generally, X tends to be smaller than Y.
In case C, we make the hypothesis that X tends to be greater than Y.
We denote by H1(x) the empirical distribution function of the sample (X1, X2, . . . , Xn) and by H2(x) the empirical distribution function of the sample (Y1, Y2, . . . , Ym). The test statistics are defined as follows:

A: Two-sided case
The statistic T1 is defined as the greatest vertical distance between the two empirical distribution functions:

T1 = sup_x |H1(x) − H2(x)| .

B: One-sided case
The statistic T2 is defined as the greatest vertical distance when H1(x) is greater than H2(x):

T2 = sup_x [H1(x) − H2(x)] .

C: One-sided case
The statistic T3 is defined as the greatest vertical distance when H2(x) is greater than H1(x):

T3 = sup_x [H2(x) − H1(x)] .


Decision Rule
We reject H0 at the significance level α if the appropriate statistic (T1, T2, or T3) is greater than the value of the Smirnov table with parameters n, m, and 1 − α, which we denote by tn,m,1−α, that is, if

T1 (or T2 or T3) > tn,m,1−α .

If we want to test the goodness of fit of the unknown distribution function F(x) of a random sample from a population to a specific and known distribution function Fo(x), then the hypotheses are the same as those for testing two samples, except that F(x) and G(x) are replaced by F(x) and Fo(x). If H(x) is the empirical distribution function of the random sample, then the statistics T1, T2, and T3 are defined as follows:

T1 = sup_x |Fo(x) − H(x)| ,
T2 = sup_x [Fo(x) − H(x)] ,
T3 = sup_x [H(x) − Fo(x)] .

The decision rule is as follows: reject H0 at the significance level α if T1 (or T2 or T3) is greater than the value of the Kolmogorov table with parameters n and 1 − α, which we denote by tn,1−α, that is, if

T1 (or T2 or T3) > tn,1−α .

DOMAINS AND LIMITATIONS
To perform the Kolmogorov–Smirnov test, the following must be observed:
1. Both samples must be taken randomly from their respective populations.
2. There must be mutual independence between the two samples.
3. The measurement scale must be at least ordinal.
4. To perform an exact test, the random variables must be continuous; otherwise the test is less precise.

EXAMPLES
The first example treats the Kolmogorov–Smirnov test for two samples and the second one the goodness-of-fit test.
In a class, there are 25 pupils: 15 boys and 10 girls. We perform a test of mental calculation to see if the boys tend to be better than the girls in this domain.
The data are presented in the following table; the highest scores correspond to the best results.

Boys (Xi)              Girls (Yi)
19.8   17.5            17.7   14.1
12.3   17.9             7.1   23.6
10.6   21.1            21.0   11.1
11.3   16.4            10.7   20.3
13.3    7.7             8.6   15.7
14.0   15.2
 9.2   16.0
15.6

We test the hypothesis according to which the distributions of the results of the girls and those of the boys are identical. This means that the population from which the sample of X is taken has the same distribution function as the population from which the sample of Y is taken. Hence the null hypothesis:

H0 : F(x) = G(x) for each x .

Since the two-sided case applies here, we calculate:

T1 = sup_x |H1(x) − H2(x)| ,

where H1(x) and H2(x) are the empirical distribution functions of the samples (X1, X2, . . . , X15) and (Y1, Y2, . . . , Y10), respectively. In the following table, we have classed the observations of the two samples in increasing order to simplify the calculation of H1(x) − H2(x).

Xi      Yi      H1(x) − H2(x)
        7.1     0 − 1/10 = −0.1
7.7             1/15 − 1/10 = −0.0333
        8.6     1/15 − 2/10 = −0.1333
9.2             2/15 − 2/10 = −0.0667
10.6            3/15 − 2/10 = 0
        10.7    3/15 − 3/10 = −0.1
        11.1    3/15 − 4/10 = −0.2
11.3            4/15 − 4/10 = −0.1333
12.3            5/15 − 4/10 = −0.0667
13.3            6/15 − 4/10 = 0
14.0            7/15 − 4/10 = 0.0667
        14.1    7/15 − 5/10 = −0.0333
15.2            8/15 − 5/10 = 0.0333
15.6            9/15 − 5/10 = 0.1
        15.7    9/15 − 6/10 = 0
16.0            10/15 − 6/10 = 0.0667
16.4            11/15 − 6/10 = 0.1333
17.5            12/15 − 6/10 = 0.2
        17.7    12/15 − 7/10 = 0.1
17.9            13/15 − 7/10 = 0.1667
19.8            14/15 − 7/10 = 0.2333
        20.3    14/15 − 8/10 = 0.1333
        21.0    14/15 − 9/10 = 0.0333
21.1            1 − 9/10 = 0.1
        23.6    1 − 1 = 0

We then have:

T1 = sup_x |H1(x) − H2(x)| = 0.2333 .

The value of the Smirnov table for n = 15, m = 10, and 1 − α = 0.95 equals t15,10,0.95 = 0.5.
Thus T1 = 0.2333 < t15,10,0.95 = 0.5, and H0 cannot be rejected. This means that there is no significant difference in the level of mental calculation of girls and boys.

Consider the following random sample of size 10: X1 = 0.695, X2 = 0.937, X3 = 0.134, X4 = 0.222, X5 = 0.239, X6 = 0.763, X7 = 0.980, X8 = 0.322, X9 = 0.523, X10 = 0.578.
We want to verify by the Kolmogorov–Smirnov test whether this sample comes from a uniform distribution. The distribution function of the uniform distribution is given by:

Fo(x) = 0   if x < 0 ,
Fo(x) = x   if 0 ≤ x < 1 ,
Fo(x) = 1   otherwise .

The null hypothesis H0 is then as follows, where F(x) is the unknown distribution function of the population associated with the sample:

H0 : F(x) = Fo(x) for each x .

Since the two-sided case applies, we calculate:

T1 = sup_x |Fo(x) − H(x)| ,

where H(x) is the empirical distribution function of the sample (X1, X2, . . . , X10). In the following table, we class the 10 observations in increasing order to simplify the calculation of Fo(x) − H(x).

Xi       Fo(x)    H(x)    Fo(x) − H(x)
0.134    0.134    0.1     0.134 − 0.1 = 0.034
0.222    0.222    0.2     0.222 − 0.2 = 0.022
0.239    0.239    0.3     0.239 − 0.3 = −0.061
0.322    0.322    0.4     0.322 − 0.4 = −0.078
0.523    0.523    0.5     0.523 − 0.5 = 0.023
0.578    0.578    0.6     0.578 − 0.6 = −0.022
0.695    0.695    0.7     0.695 − 0.7 = −0.005
0.763    0.763    0.8     0.763 − 0.8 = −0.037
0.937    0.937    0.9     0.937 − 0.9 = 0.037
0.980    0.980    1.0     0.980 − 1.0 = −0.020


We then obtain:

T1 = sup_x |Fo(x) − H(x)| = 0.078 .

The value of the Kolmogorov table for n = 10 and 1 − α = 0.95 is t10,0.95 = 0.409.
Since T1 is smaller than t10,0.95 (0.078 < 0.409), H0 cannot be rejected. That means that the random sample could come from a uniformly distributed population.
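A short numerical sketch of the two-sample case above (assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy import stats

boys = np.array([19.8, 12.3, 10.6, 11.3, 13.3, 14.0, 9.2, 15.6,
                 17.5, 17.9, 21.1, 16.4, 7.7, 15.2, 16.0])
girls = np.array([17.7, 7.1, 21.0, 10.7, 8.6, 14.1, 23.6, 11.1, 20.3, 15.7])

# Two-sample statistic T1 = sup_x |H1(x) - H2(x)|, computed directly ...
pooled = np.sort(np.concatenate([boys, girls]))
H1 = np.searchsorted(np.sort(boys), pooled, side="right") / len(boys)
H2 = np.searchsorted(np.sort(girls), pooled, side="right") / len(girls)
T1 = np.max(np.abs(H1 - H2))
print(T1)                                      # ≈ 0.2333

# ... and with SciPy's built-in two-sample test
print(stats.ks_2samp(boys, girls).statistic)   # ≈ 0.2333
```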

FURTHER READING
• Goodness of fit test
• Hypothesis testing
• Nonparametric test

REFERENCES
Kolmogorov, A.N.: Sulla determinazione empirica di una legge di distribuzione. Giornale dell’Instituto Italiano degli Attuari 4, 83–91 (6.1) (1933)

Massey, F.J.: Distribution table for the deviation between two sample cumulatives. Ann. Math. Stat. 23, 435–441 (1952)

Miller, L.H.: Table of percentage points of Kolmogorov statistics. J. Am. Stat. Assoc. 31, 111–121 (1956)

Smirnov, N.V.: Estimate of deviation between empirical distribution functions in two independent samples (Russian). Bull. Moscow Univ. 2(2), 3–16 (6.1, 6.2) (1939)

Smirnov, N.V.: Table for estimating the goodness of fit of empirical distributions. Ann. Math. Stat. 19, 279–281 (6.1) (1948)

Kruskal-Wallis TableThe Kruskal–Wallis table gives the theoreti-cal values of the statistic H of the Kruskal–Wallis test under the hypothesis that there

is no difference among the k (k ≥ 2) popu-lations that we want to compare.

HISTORYSee Kruskal–Wallis test.

MATHEMATICAL ASPECTSLet k be the number of samples of probablydifferent sizes n1, n2, . . . , nk. We designateby N the total number of observations:

N =k∑

i=1

ni .

We class the N observations in increasingorder without taking into account whichsamples they belong to. We then give rank 1to the smallest value, rank 2 to the next great-est value, and so on until rank N, which isgiven to the greatest value.We denote by Ri the sum of the ranks givento the observations of sample i:

Ri =ni∑

j=1

R(Xij

), i = 1, 2, . . . , k ,

where Xij represents observation j of samplei and R(Xij) the corresponding rank. Whenmany observations are identical and of thesame rank, we give them a mean rank (seeKruskal–Wallis test). If there are no meanranks, the statistical test is defined in the fol-lowing way:

H =(

12

N(N + 1)

k∑i=1

R2i

ni

)− 3 (N + 1) .

On what to do if there are mean ranks, seeKruskal–Wallis test.The Kruskal–Wallis table gives the values ofthe statistic H of the Kruskal–Wallis test inthe case of three samples, for different valuesof n1, n2, and n3 (with n1, n2, n3 ≤ 5).

Page 289: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

288 Kruskal-Wallis Test

DOMAINS AND LIMITATIONSThe Kruskal–Wallis table is used for non-parametric tests that use ranks and particu-larly for tests with the same name.When the number of samples i is greaterthan 3, we can make an approximation of thevalue of the Kruskal–Wallis table by the chi-square table with k−1 degrees of freedom.

EXAMPLESSee Appendix D.For an example of the use of the Kruskal–Wallis table, see Kruskal–Wallis test.

FURTHER READING� Chi-square table� Kruskal-Wallis test� Statistical table

REFERENCESKruskal, W.H., Wallis, W.A.: Use of ranks

in one-criterion variance analysis. J. Am.Stat. Assoc. 47, 583–621 and errata, ibid.48, 907–911 (1952)

Kruskal-Wallis Test

The Kruskal–Wallis test is a nonparamet-ric test that has as its goal to determine if allk populations are identical or if at least oneof the populations tends to give observationsthat are different from those of other popu-lations.The test isusedwhenwehavek samples,withk ≥ 2, coming from k populations that canbe different.

HISTORYThe Kruskal–Wallis test was developed in1952 by Kruskal, W.H. and Wallis, W.A.

MATHEMATICAL ASPECTSThe data are represented in k samples. Wedesignate by ni the dimension of the samplei, for i = 1, . . . , k, and by N the total numberof observations:

N =k∑

i=1

ni .

We class the N observations in increasingorder without taking into account whether ornot they belong to thesamesamples.We thengive rank 1 to thesmallestvalue, rank 2 to thenext greatest value, and so on until N, whichis given to the greatest value.Let Xij be the jth observation of sample i, andset i = 1, . . . , k and j = 1, . . . , ni; we thendenote the rank given to Xij by R

(Xij

).

If many observations have the same value,we give them a mean rank. The sum of theranks given to the observations of sample iis denoted by Ri, and we have:

Ri =ni∑

j=1

R(Xij

), i = 1, . . . , k .

If there are no mean ranks (or if there is a lim-ited number of them), then the statistical testis defined as follows:

H =(

12

N(N + 1)

k∑i=1

R2i

ni

)− 3 (N + 1) .

If, on the contrary, there are many meanranks, it isnecessary tomakeacorrectionandto calculate:

H = H

1−∑g

i=1

(t3i − ti

)

N3 − N

,

where g is the number of groups of meanranks and ti the dimension of ith such group.

HypothesesThe goal of the Kruskal–Wallis test is todetermine if all the populations are identical

Page 290: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

K

Kruskal-Wallis Test 289

or if at least one of the populations tends togive observations different from other pop-ulations. The hypotheses are as follows:

H0: There isno differenceamongthek pop-ulations.

H1: At least one of the populations differsfrom the other populations.

Decision RuleIf there are 3 samples, each having a dimen-sion smaller or equal to 5, and if there are nomean ranks (that is, if H is calculated), thenwe use the Kruskal–Wallis table to test H0.The decision rule is the following: We rejectthe null hypothesis H0 at the significancelevel α if T is greater than the value of thetable with parameters ni, k − 1, and 1 − α,denoted hn1,n2,n3,1−α, and if there is no avail-able exact table or if there are mean ranks, wecan make an approximation of the value ofthe Kruskal–Wallis table by the distributionof the chi-square with k− 1 degrees of free-dom (chi-square distribution), that is, if:

H > hn1,n2,n3,1−α (Kruskal–Wallis table)

or H > χ2k−1,1−α (chi-square table) .

The corresponding decision rule is based onH > χ2

k−1,1−α (chi-quare table).

DOMAINS AND LIMITATIONSThe following rules should be respected tomake the Kruskal–Wallis test:1. All the samples must be random samples

taken from their respective populations.2. In addition to the independence inside

each sample, there must be mutual inde-pendence among the different samples.

3. The scale of measure must be at least ordi-nal.

If the Kruskal–Wallis test makes us reject thenull hypothesis H0, we can use the Wilcox-on test for all the samples taken in pairs todetermine which pairs of populations tend tobe different.

EXAMPLESWe cook potatoes in 4 different oils. We wantto verify if the quantity of fat absorbed bypotatoes depends on the type of oil used. Weconduct 5 different experiments with oil 1, 6with oil 2, 4 with oil 3, and 5 with oil 4, andwe obtain the following data:

Type of oil

1 2 3 4

64 78 75 55

72 91 93 66

68 97 78 49

77 82 71 64

56 85 70

77

In this example, the number of samplesequals 4 (k = 4) with the following respec-tive dimensions:

n1 = 5 ,

n2 = 6 ,

n3 = 4 ,

n4 = 5 .

The number of observations equals:

N = 5+ 6+ 4+ 5 = 20 .

We class the observations in increasing orderand give them a rank from 1 to 20 taking intoaccount the mean ranks. We obtain the fol-lowing tablewith therankoftheobservationsin parentheses and at the end of each samplethe sum of the ranks given to the correspond-ing sample:

Page 291: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

290 Kruskal-Wallis Test

Type of oil

1 2 3 4

64 (4.5) 78 (14.5) 75 (11) 55 (2)

72 (10) 91 (18) 93 (19) 66 (6)

68 (7) 97 (20) 78 (14.5) 49 (1)

77 (12.5) 82 (16) 71 (9) 64 (4.5)

56 (3) 85 (17) 70 (8)

77 (12.5)

R1 = 37 R2 = 98 R3 = 53.5 R4 = 21.5

We calculate:

H =(

12

N (N + 1)

k∑i=1

R2i

ni

)− 3 (N + 1)

= 12

20 (20+ 1)

·(

372

5+ 982

6+ 53.52

4+ 21.52

5

)

− 3 (20+ 1)

= 13.64 .

Ifwemakeanadjustment to take intoaccountthe mean ranks, we get:

H = H

1−∑g

i=1

(t3i − ti

)

N3 − N

= 13.64

1− 8− 2+ 8− 2+ 8− 2

203 − 20= 13.67 .

We see that the difference between H and His minimal.

The hypotheses are as follows:

H0: There is no difference between the fouroils.

H1: At least one of the oils differs from theothers.

The decision rule is as follows: Reject thenull hypothesis H0 at the significance levelα if

H > χ2k−1,1−α ,

where χ2k−1,1−α is the value of the chi-

square table at level 1 − α and k − 1 = 3degrees of freedom.If we choose α = 0.05, then the value ofχ2

3,0.95 is 7.81, and if H is greater than χ23,0.95

(13.64 > 7.81), then we reject the hypothe-sis H0.If H0 is rejected, we can use the procedureof multiple comparisons to see which pairsof oils are different.

FURTHER READING� Hypothesis testing� Nonparametric test� Wilcoxon test

REFERENCEKruskal, W.H., Wallis, W.A.: Use of ranks

in one-criterion variance analysis. J. Am.Stat. Assoc. 47, 583–621 and errata, ibid.48, 907–911 (1952)

Page 292: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

L1 EstimationThe L1 estimation method is an alternativeto the least-squares method used in linearregression for the estimation of the param-eters of a model. Instead of minimizing theresidual sum of squares, the sum of theresidual absolute values is minimized.The L1 estimators have the advantage of notbeing as sensitive to outliers as the least-squares method.

HISTORYThe use of the L1 estimation method, ina very basic form, dates back to Galileo in1632, who used it to determine the distancebetween the earth and a star that had recentlybeen discovered.Boscovich,R.J. (1757)proposed twocriteriafor adjusting a line:• The respective sums of the positive and

negative residuals must be equal, and theoptimal line will pass by the centroid ofthe observations.

• The sum of the residuals in absolute valuemust be minimal.

Boscovich, R.J. proposed a geometricalmethod to solve the equations resulting fromhis criteria. The analytical procedure usedfor the estimation of the parameters forthe two criteria of Boscovich, R.J. is dueto Laplace, P.S. (1793). Edgeworth, F.Y.

(1887) suppressed the first condition, say-ing that the sum of the residuals must benull, and developed a method that could beused to treat the case of multiple linearregression.During the first half of the 20th century,statisticians were not very interested in lin-ear regression based on L1 estimation. Themain reasons were:• The numerical difficulties due to the

absence of a closed formula.• The absence of asymptotic theory for L1

estimation in the regression model.• The estimation by the least-squares

method is satisfactory for small samples,and it is not certain that L1 estimation isany better.

Starting in 1970, the field of L1 estima-tion became the pole of research for a mul-titude of statisticians, and many workswere published. We mention Basset, G.and Koenker, R. (1978), who developed theasymptotic theory of L1 estimators. In 1987,a congress dedicated to these problems washeld in Neuchâtel (Switzerland): The FirstInternational Conference on Statistical DataAnalysis Based on the L1-norm and RelatedMethods.

MATHEMATICAL ASPECTSIn the one-dimensional (1D) case, the L1 es-timation method consists in estimating the

Page 293: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

292 Lagrange Multiplier

parameter β0 in the model

yi = β0 + εi ,

so that thesumof theerrors inabsolutevalueis minimal:

minβ0

n∑i=1

|εi| = minβ0

n∑i=1

|yi − β0| .

This problem boils down to findingβ0so thatn∑

i=1

|yi − β0| is minimal. The value ofβ0 that

minimizes this sum is the median of the 1Dsample y1, y2, . . . , yn.In the simple linear regression model

yi = β0 + β1 · xi + εi ,

the parameters β0 and β1 estimated usingthe L1 estimation method are usually calcu-lated by an iterative algorithm that is usedto minimize

n∑i=1

|yi − β0 − β1 · xi| .

Using linear programming techniques it ispossible today to calculate the L1 estima-tors in a relatively easy way, especially in themultiple linear regression model.

DOMAINS AND LIMITATIONSThe problem of L1 estimation lies in theabsence of a general formula for the estima-tion of the parameters, unlike the situationwith the least-squares method. For a set of1D data, the measure of central tendencydetermined by L1 estimation is the median,which is by definition the value that mini-mizes the sum of the absolute deviations.

FURTHER READING� Error� Estimation� Estimator� Least squares� Mean absolute deviation� Parameter� Residual� Robust estimation

REFERENCESBasset, G., Koenker, R.: Asymptotic theory

of least absolute error regression. J. Am.Stat. Assoc. 73, 618–622 (1978)

Boscovich, R.J.: De Litteraria Expeditioneper Pontificiam ditionem, et Synopsisamplioris Operis, ac habentur plura eiusex exemplaria etiam sensorum impres-sa. Bononiensi Scientiarium et ArtiumInstituto Atque Academia Commentarii,vol. IV, pp. 353–396 (1757)

Dodge, Y. (ed.): Statistical Data AnalysisBased on the L1-norm and Related Meth-ods. Elsevier, Amsterdam (1987)

Edgeworth F.Y.: A new method of reducingobservations relating to severalquantities.Philos. Mag. (5th Ser) 24, 222–223 (1887)

Lagrange Multiplier

The Lagrange multiplier method is a clas-sical optimization method that allows todetermine the local extremes of a functionsubject to certain constraints. It is namedafter the Italian-French mathematician andastronomer Joseph-Louis Lagrange.

MATHEMATICAL ASPECTSLet f (x, y) be the objective function tobe maximized or minimized subject to

Page 294: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Lagrange Multiplier 293

g(x, y) = 0. We form the auxiliary func-tion, the Lagrangian:

F(x, y, θ) = f (x, y)− θg(x, y) ,

where θ is an unknown variable called theLagrange multiplier. We search the minimaor the maxima of function F by equating tozero the partial derivatives with respect to x,y, and θ :

∂F

∂x= ∂ f

∂x− θ

∂g

∂x= 0

∂F

∂y= ∂ f

∂y− θ

∂g

∂y= 0

∂F

∂θ= g(x, y) = 0 .

These three equations can be solved for x, y,andθ .Thesolutionprovides thepointswherethe constrained function has a minimum ora maximum. We have to decide if there isa maximum or a minimum.We get a maximum if

[∂2F

∂x2

]·[∂2F

∂y2

]−

[∂2F

∂x∂y

]2

> 0 ,

∂2F

∂x2< 0 and

∂2F

∂y2< 0 .

We get a minimum if

[∂2F

∂x2

]·[∂2F

∂y2

]−

[∂2F

∂x∂y

]2

> 0 ,

∂2F

∂x2 > 0 and∂2F

∂y2 > 0 .

When

[∂2F

∂x2

]·[∂2F

∂y2

]−

[∂2F

∂x∂y

]2

≤ 0 ,

the test breaks down, and to determine ifthereisaminimumoramaximum,wehavetoexamine the functions neighboring in (x, y).

The Lagrange multiplier method isextended to a function of n variables,f (x1, x2, . . . , xn), that are subject to kconstraints gj (x1, x2, . . . , xn) = 0, j =1, 2, . . . , k with k ≤ n. Similarly, in thebivariate case, we define the Lagrangian:

F = f (x1, x2, . . . , xn)

−k∑

j=1

θj · gj (x1, x2, . . . , xn) ,

with the parameters θ1, . . . , θk being theLagrange multipliers. We establish the par-tial derivatives that lead to n + k equationson n + k unknown parameters. These leadto the coordinates of the function extremes(minimum or maximum), if they exist.

DOMAINS AND LIMITATIONSIt isnotpossible to apply theLagrangemulti-plier method when the functions are not dif-ferentiable or when the imposed constraintsare inequalities. In this case we can use oth-er optimization methods properly chosenfrom linear programming.

EXAMPLESLet

f (x, y) = 5x2 + 6y2 − x · ybe the objective function and

g (x, y) = x+ 2y− 24 = 0

the imposed constraint. We form the Lag-rangian

F (x, y, θ) = f (x, y)− θg (x, y)

= 5x2 + 6y2 − x · y− θ · (x+ 2y− 24) ,

and we set to zero the partial derivatives

∂F

∂x= 10x− y− θ = 0 , (1)

Page 295: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

294 Laplace Distribution

∂F

∂y= 12y− x− 2θ = 0 , (2)

∂F

∂θ= x+ 2y− 24 = 0 . (3)

By deleting θ of (1) and (2):

20x− 2y− 2θ = 0

−x+ 12y− 2θ = 0

21x− 14y = 0

3x− 2y = 0

and by replacing 2y with 3x in (3):

0 = x+ 2y− 24 = x+ 3x− 24

= 4x− 24

wegetx = 6.By replacing thisxvalue in (3),we get y = 9. The critical point has coordi-nates (6, 9). We compute the partial deriva-tives of second order to verify whether it isa maximum or a minimum. We get:

∂2F

∂x2 = 10 ,∂2F

∂y2 = 12 ,∂2F

∂x∂y= −1 ,

[∂2F

∂x2

]·[∂2F

∂y2

]−

[∂2F

∂x∂y

]2

= 10 · 12− (−1)2 = 119 > 0 ,

∂2F

∂x2 > 0 ,∂2F

∂y2 > 0 .

Note that the point (6, 9) corresponds toa minimum for the objective function f sub-ject to the constraints above. This minimumyields f (6, 9) = 180+ 486− 54 = 612.

FURTHER READING� Linear programming� Optimization

REFERENCESDodge, Y. (2004) Optimization Appliquée.

Springer, Paris

Laplace Distribution

The random variable X follows a Laplacedistribution if its density function is of thefollowing form:

f (x) = 1

2�· exp

[(−|x− θ |

)]� > 0 ,

where � is the dispersion parameter and θ

is the expected value.

Laplace distribution, θ = 0, � = 1

The Laplace distribution is symmetricaround its expected value θ , which is also themode and the median of the distribution.The Laplace distribution is a continuousprobability distribution.

HISTORYThis continuous probability distributioncarries the name of its author. In 1774,Laplace, P.S. wrote a fundamental articleabout symmetric distributions with a view todescribing the errors of measurement.

MATHEMATICAL ASPECTSThe expected value of a random variableX following a Laplace distribution is givenby:

E [X] = θ

and the variance is equal to:

Var (X) = 2 ·�2 .

Page 296: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Laspeyres Index 295

The standard form of the density functionof the Laplace distribution is written:

f (x) = 1

σ√

2· exp

[(−√2 · |x− μ|

σ

)]

σ > 0 ,

with

μ = θ and σ = √2 ·� .

This permits one to write the distributionfunction of the Laplace distribution givenby:

F (x) =

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎩

1

2exp

[(x− θ

)]

if x ≤ θ ,

1− 1

2exp

[(−x− θ

)]

if x ≥ θ .

FURTHER READING� Continuous probability distribution

REFERENCESLaplace, P.S. de: Mémoire sur la probabi-

lité des causes par les événements. Mem.Acad. Roy. Sci. (presented by various sci-entists) 6, 621–656 (1774) (or Laplace,P.S. de (1891) Œuvres complètes, vol 8.Gauthier-Villars, Paris, pp. 27–65)

Laplace, Pierre Simon DeBorn inFrance in1749toamiddle-classfam-ily, Marquis Pierre Simon de Laplace wasone of the pioneers of statistics. Interest-ed in mathematics, theoretical astronomy,probability, and statistics, his first publica-tionsappeared in theearly 1770s.Hebecamea member of the Academy of Science in

1785, and he also directed the LongitudesBureau. In 1799, he was named Minister ofInterior Affairs by Bonaparte, who in 1806conferred upon himthetitleof count.Electedto the French Academy in 1816, he pursuedhis scientific work until his death, in Paris,in 1827.

Some main works and articles of de La-place, P.S.:

1774 Mémoire sur la probabilité des caus-es par les événements. Oeuvres com-plètes, Vol. 8, pp. 27–65 (1891). Gau-thier-Villars, Paris.

1781 Mémoire sur les probabilités. Oeu-vres complètes, Vol. 9 pp. 383–485(1891). Gauthier-Villars, Paris.

1805 Traité de Mécanique Céleste. Vol. 1–4. Duprot, Paris. Vol. 5. Bachelier,Paris.

1812 Théorie analytique des probabilités.Courcier, Paris.

1878–1912 Oeuvres complètes. 14 vol-umes. Gauthier-Villars, Paris.

Laspeyres Index

The Laspeyres index is a composite indexnumber of price constructed by the weight-ed sum method. This index number repre-sents the ratio of the sum of prices in theactual period n to the price sum in the ref-erence period 0, these sums being weight-ed by the respective quantities of the ref-erence period. Therefore the index numbermeasures the relative price change of thegoods, the respective quantities being con-sideredunchanged.TheLaspeyres indexdif-fers from the Paasche index on the choice of

Page 297: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

296 Laspeyres Index

theweights: in thePaasche index, theweight-ing is done by the sold quantities of the cur-rent period Qn, whereas in the Laspeyresindex, it is done by the quantities sold in thereference period Q0.

HISTORYLaspeyres, Etienne, German economist andstatistician of French origin, reformulatedthe index number developed by Lowe thatwouldbecomeknownastheLaspeyres indexin the mid-19th century. During the years1864–1871 Laspeyres, Etienne worked ongoods and their prices in Hamburg (heexcluded services).

MATHEMATICAL ASPECTSThe Laspeyres index is calculated as fol-lows:

In/0 =∑

Pn · Q0∑P0 · Q0

,

where Pn and Qn are the prices and quantitiessold in the current period and P0 and Q0 arethe prices and quantities sold in the referenceperiod, expressed in base 100:

In/0 =∑

Pn · Q0∑P0 · Q0

· 100 .

The sum concerns the considered goods.The Laspeyres model can also be used tocalculate a quantity index (also called vol-ume index). In this case, it is the prices thatare constants and the quantities that are vari-ables:

In/0 =∑

Qn · P0∑Q0 · P0

· 100 .

EXAMPLESConsider the following table indicating therespective prices of three consumer goods inreference year 0 and in the current year n, as

well as the quantities sold in the referenceyear:

Product Quantity sold Price (euros)

in 1970 (Q0) 1970 1988

(thousands) (P0) (Pn )

Milk 50.5 0.20 1.20

Bread 42.8 0.15 1.10

Butter 15.5 0.50 2.00

From the following table we have

∑PnQ0 = 138.68 and

∑P0Q0 = 24.27 .

Product∑

PnQ0∑

P0Q0

Milk 60.60 10.10

Bread 47.08 6.42

Butter 31.00 7.75

Total 138.68 24.27

We can then find the Laspeyres index:

In/0 =∑

Pn · Q0∑P0 · Q0

· 100

= 138.68

24.27· 100 = 571.4 .

In other words, according to the Laspeyresindex, the price index of these goods hasincreased 471.4% (571.4− 100) during theconsidered period.

FURTHER READING� Composite index number� Fisher index� Index number� Paasche index

REFERENCESLaspeyres, E.: Hamburger Warenpreise

1850–1863 und die kalifornisch-austra-lischen Geldentdeckung seit 1848. Jahrb.Natl. Stat. 3, 81–118, 209–236 (1864)

Page 298: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Law of Large Numbers 297

Laspeyres, E.: Die Berechnung einer mit-tleren Waarenpreissteigerung. Jahrb.Natl. Stat. 16, 296–314 (1871)

Latin Square

See latin square designs.

Latin Square Designs

A Latin square design is the arrangement oft treatments, each one repeated t times, insuchawaythateach treatmentappearsexact-ly one time in each row and each column inthe design. We denote by Roman charactersthe treatments. Therefore the design is calleda Latin square design. This kind of design isused to reduce systematic error due to rows(treatments) and columns.

HISTORYAccording to Preece (1983), the history ofLatin squaredatesback to 1624. In statistics,Fisher, Ronald Aylmer (1925) introducedthe Latin square designs.

EXAMPLESA farmer has in his property two fields wherehe wants to cultivate corn. He wishes to con-duct an experiment involving his four differ-ent typesofcorn (treatments).Furthermore,he knows that his fields do not receive thesame sunshine and humidity. There are twosystematic error sources given that sunshineand humidity affect corn cultivation.The farmer associates to each small piece ofland a certain type of corn in such a way thateach one appears once in each line and col-

umn:A : grain 1

B : grain 2

C : grain 3

D : grain 4 .

The following figure shows the 4× 4 Latinsquare where the Roman letters A, B, C, andD represent the four treatments above:

A B C DB C D AC D A BD A B C

.

An analysis of variance will indicatewhether, after deleting the lines (sun) and thecolumn (humidity) effects, there exists a sig-nificant difference between the treatments.

FURTHER READING� Analysis of variance� Design of experiments

REFERENCESFisher, R.A.: Statistical Methods for

Research Workers. Oliver & Boyd, Edin-burgh (1925)

Preece, D.A.: Latin squares, Latin cubes,Latin rectangles, etc. In: Kotz, S., John-son, N.L. (eds.) Encyclopedia of Statis-tical Sciences, vol. 4. Wiley, New York(1983)

Law of Large NumbersThe law of large numbers stipulates that themean of a sum of independent and identical-ly distributed random variables convergestoward the expected value of their distri-bution when the sample size tends towardinfinity.

Page 299: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

298 Law of Large Numbers

HISTORYThe law of large numbers was first estab-lished by Bernoulli, J. in his work enti-tled Ars Conjectandi, published in 1713,in the framework of an empirical definitionof probability. He stated that the relativefrequency of an event converges, duringthe repetition of identical tasks (Bernoullidistribution), toward a number that consistsin its probability.The inequality of Bienaymé–Tchebychevwas initially discovered by Bienaymé, J.(1853), before Tchebychev, P.L. (1867) dis-covered completely independently a fewyears later.The general version of the law of large num-bers is attributed to the Russian mathemati-cian Khintchin, A. (1894–1959).

MATHEMATICAL ASPECTSLet us first present the inequality ofBienaymé–Tchebychev:Consider X a random variable with expect-ed value μ and variance σ 2, both finite. Forevery real ε > 0, there is:

P (|X − μ| ≥ ε) ≤ σ 2

ε2,

which means that the probability that a ran-dom variable X deviates from its expectedvalueby an amountε itsexpected valuecan-not be larger than σ 2/ε2.The importance of this inequality lies in thefact that it allows one to limit the value ofcertain probabilities in cases where only theexpected value of the distribution is known,ultimately its variance. It is obvious that ifthe distribution itself is known, there is noneed to use these limits since the exact valueof these probabilities can be computed. Withthis inequality it is possible to compute the

convergenceof the observed mean of a ran-dom variable toward its expected value.Consider n independent and identically dis-tributed random variables X1, X2, . . . , Xn

with expected value μ and variance σ 2;then for every ε > 0:

P

(∣∣∣∣X1 + X2 + . . .+ Xn

n−μ

∣∣∣∣>ε

)−→n→∞0.

DOMAINS AND LIMITATIONSThe law of large numbers underlines the factthat there persists a probability, even if verysmall, of substantial deviation between theempirical mean of a series of events and itsexpected value. It is for this reason that theterm weak law of large numbers is used.In fact, it ispossible todemonstratethat this isnot the case and that the convergence is qua-sicertain. This result, stronger than the pre-vious one, is called the strong law of largenumbers and is written:

P(

limn→∞ X = μ

)= 1 .

EXAMPLESThe law of large numbers states that it isalways possible to find a value for n so thattheprobability that X is included in an inter-val μ± ε is as large as desired.Let us give an example of a probabilitydistribution with variance σ 2 = 1. Bychoosing an interval ε = 0.5 and a proba-bility of 0.05, we can write:

P(∣∣X − μ

∣∣ ≥ 0.5) = 0.05 .

By the inequality of Tchebychev, we knowthat

P(∣∣X − μ

∣∣ ≥ ε) ≤ σ 2

nε2 .

We can therefore establish the followinginequality:

0.05 ≤ 1

n · 0.25,

Page 300: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Least Absolute Deviation Regression 299

from which:

0.05 · 0.25 ≤ 1/n ,

n ≥ 80 .

FURTHER READING� Central limit theorem� Convergence theorem(Théorème de convergence)

REFERENCESBernoulli, J.: Ars Conjectandi, Opus Posthu-

mum. Accedit Tractatus de Seriebusinfinitis, et Epistola Gallice scripta deludo Pilae recticularis. Impensis Thurni-siorum, Fratrum, Basel (1713)

Bienaymé, I.J.: Considérations à l’appuide la découverte de Laplace sur la loi deprobabilité dans la méthode des moindrescarrés. Comptes Rendus de l’Académiedes Sciences, Paris 37, 5–13 (1853);réédité en 1867 dans le journal de Liou-ville précédant la preuve des Inégalités deBienaymé-Chebyshev, Journal de Math-ématiques Pures et Appliquées 12, 158–176

Tchebychev, P.L.: Des valeurs moyennes.J. Math. Pures Appl. 12(2), 177–184(1867). Publié simultanément en Russe,dans Mat. Sbornik 2(2), 1–9

Least AbsoluteDeviation Regression

The least absolute deviations method (theLAD method) is one of the principal alter-natives to the least-squares methods whenone seeks to estimate regression parameters.The goal of the LAD regression is to providea robust estimator.

HISTORYThe least absolute deviation regression wasintroduced around 50 years before the least-squares method, in 1757, by Roger JosephBoscovich. He used this procedure whiletrying to reconcile incoherent measures thatwere used to estimate the shape of theearth. Pierre Simon de Laplace adopt-ed this method 30 years later, yet it wasobscured under the shadow of the least-squaresmethoddevelopedbyAdrienMarieLegendre and Carl Friedrich Gauss. Theeasy calculus of the least-squares methodmade least squares much more popular thantheLADmethod.Yet in recentyearsandwithadvances in statistical computing, the LADmethod can be easily used.

MATHEMATICAL ASPECTSWe treat here only simple LAD regressioncases.Consider the simple regression model: yi =β0 + β1xi + εi. The LAD method imple-ments the following criterion: choose β0 andβ1 such that they minimize the sum of theresidual absolute values, that is,

∑|ei| =

∑∣∣yi − β0 − β1xi∣∣ ,

where ei denotes the ith residual. The partic-ularity of this method is the fact that there isno explicit formula to compute β0 and β1.LAD estimates can be computed by apply-ing an iterative algorithm.

Simple LAD AlgorithmThe essential point that is used by the LADalgorithm is the fact that the regression linealways crosses at least two data points.Let (x1, y1) be our chosen data point. Wesearch the best line that passes through thispoint. This line crosses at least one otherpoint, (x2, y2).

Page 301: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

300 Least Absolute Deviation Regression

The next step is to search the best line, withrespect to the sum

∑ |ei|, that crosses thistime (x2, y2). This line crosses now at leastone other point, (x3, y3). We iteratively con-tinue and seek the best line that crosses(x3, y3) and so forth. During the iterations,the quantity

∑ |ei|decreases. If, for an itera-tion, the obtained line is identical to the pre-vious one, we can conclude that this is thebest line.

Construction of Best Line Crossing (x1, y1)

in Terms of∑ |ei |

Since this line crosses another data point, weshould find the point (xk, yk) for which theline

y (x) = y1 + yk − y1

xk − x1(x− x1) ,

with slope

β1 = yk − y1

xk − x1

and intercept

β0 = y1 − β1x1 ,

is the best one in terms of∑ |ei|. To find

that point, we rename the (n− 1) candidatepoints (x2, y2) , . . . , (xn, yn) in such a waythat

y2 − y1

x2 − x1≤ y3 − y1

x3 − x1≤ . . . ≤ yn − y1

xn − x1.

We define T =n∑

i=1|xi − x1|. The searched

point (xk, yk) is now determined by the indexk for which⎧⎪⎪⎨⎪⎪⎩

|x2 − x1| + . . .+ |xk−1 − x1| < T2

|x2 − x1| + . . .

+ |xk−1 − x1| + |xk − x1|> T

2 .

This condition guarantees that β1 minimizesthe quantity

n∑i=1

∣∣(yi − y1)− β1 (xi − x1)∣∣

analogously to∑ |ei| for the regression lines

passing through (x1, y1). The β0 is computedin such a way that the regression line crosses(x1, y1). We can equally verify that it passesthrough the data point (xk, yk). We just haveto rename the point (xk, yk) by (x2, y2) andrestart.

DOMAINS AND LIMITATIONSTwo particular cases can appear when oneruns the LAD regression: the nonuniquesolution and the degeneracy. Nonuniquesolutionmeansthat thereareat least twolinesminimizing the sum of the residual abso-lute values. Degeneracy means that the LADregression linepasses through more than twodata points.

EXAMPLESLet us take an example involving 14 NorthandCentralAmericancountries.Thefollow-ing table contains the data for each country,the birthrate (number of births per 1000 peo-ple) together with the urbanization rate (per-centage of population living in cities withmore than 100000 inhabitants) in 1980.

Table: Urbanization rate and birthrate

Obs. i Country Urbaniza-tion rate xi

Birth-rate yi

1 Canada 55.0 16.2

2 Costa Rica 27.3 30.5

3 Cuba 33.3 16.9

4 USA 56.5 16.0

5 El Salvador 11.5 40.2

Page 302: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Least Absolute Deviation Regression 301

Obs. i Country Urbaniza-tion rate xi

Birth-rate yi

6 Guatemala 14.2 38.4

7 Haiti 13.9 41.3

8 Honduras 19.0 43.9

9 Jamaica 33.1 28.3

10 Mexico 43.2 33.9

11 Nicaragua 28.5 44.2

12 Trinidad/Tobago

6.8 24.6

13 Panama 37.7 28.0

14 DominicanRepublic

37.1 33.1

Source: Birkes and Dodge (1993)

We try to estimate the birthrate yi as a func-tion of the urbanization rate xi using LADregression, which is to construct the LADregression model

yi = β0 + β1xi + εi .

To use the LAD algorithm to estimate β0 andβ1, in the fist step we must search for thebest linecrossingat leastonepoint, forexam-ple Canada (randomly chosen). We have(x1, y1) = (55.0, 16.2). We compute theslopes yi−16.2

xi−55.0 for all 13 remaining countries.For example, for Costa Rica, the obtainedresult is 30.5−16.2

27.3−55.0 = −0.5162. The 13 coun-tries are now reordered in ascending orderaccording to the computed slopes, as can beseen in the table below. The point (x2, y2)

is Mexico, the point (x3, y3) is Nicaragua,and so forth. In what follows we compute thequantity T = ∑ |xi − 55.0| = 355.9 andT2 = 177.95. This has to do with definingthe country k that satisfies the condition⎧⎪⎪⎨⎪⎪⎩

|x2 − x1| + . . .+ |xk−1 − x1| < T2 ,

|x2 − x1| + . . .

+ |xk−1 − x1| + |xk − x1|> T

2 .

Since8∑

j=2

∣∣xj − x1∣∣ = 172.5 <

T

2

and9∑

j=2

∣∣xj − x1∣∣ = 216.0 >

T

2,

we can conclude that k = 9 and that thebest regression line passes through Canada(x1, y1) and El Salvador (x9, y9). Hence:

β1 = y9 − y1

x9 − x1= 40.2− 16.2

11.5− 55.0= −0.552 ,

β0 = y1 − β1x1 = 16.2− (−0.552) · 55.0

= 46.54 ,

and the LAD regression line is:

y(x) = 46.54− 0.552x .

Table: Computation to find LAD regression line

Obs. i Country yi −y1xi −x1

|xi − x1|i∑

j=2

∣∣xj − x1

∣∣

2 Mexico −1.5000 11.8 11.8

3 Nicaragua −1.0566 26.5 38.3

4 Domin.Rep.

−0.9441 17.9 56.2

5 Honduras −0.7694 36.0 92.2

6 Panama −0.6821 17.3 109.5

7 Haiti −0.6107 41.1 150.6

8 Jamaica −0.5525 21.9 172.5

9 El Salvador −0.5517 43.5 216.0

10 Guatemala −0.5447 40.8 256.8

11 Costa Rica −0.5162 27.7 284.5

12 Trinidad/Tobago

−0.1743 48.2 332.7

13 USA −0.1333 1.5 334.2

14 Cuba −0.0323 21.7 355.9

To improve our first estimate, the next stepconsists in finding the best line crossing El

Page 303: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

302 Least Significant Difference Test

Salvador. We rename this point (x1, y1) =(11.5, 40.2) and reorder the 13 countries inascending order according to their slopesyi−11.5xi−40.2 , and, similarly, we obtain the bestregression line that passes through El Sal-vador as well as through the United States.Hence,

β1 = 40.2− 16.0

11.5− 56.5= −0.538 ,

β0 = 40.2− (−0.538) · 11.5 = 46.38 ,

and the regression line is

y(x) = 46.38− 0.538x .

The third step consists in finding the bestline passing through the United States. Sincewe find (in a similar way as above) that thisregression line passes through El Salvador,our algorithm stops and the regression liney(x) = 46.38 − 0.538x is the LAD regres-sion line of our problem.Wenotehere that the linecomputed using theleast-squares method for these data is givenby

y(x) = 42.99− 0.399x .

Note that with this regression line Cuba,Nicaragua, and Trinidad/Tobago haveabnormally large residual values. By delet-ing these three points and recomputing theleast-squares regression line, we obtain

y (x) = 48.52− 0.528x ,

which is very close to the LAD regressionline.

FURTHER READING� Analysis of variance� Least squares� Multiple linear regression� Regression analysis

� Residual� Robust estimation� Simple linear regression

REFERENCESBirkes, D., Dodge, Y.: Alternative Methods

of Regression. Wiley, New York (1993)

Least SignificantDifference Test

The least significant difference (LSD) test isused in the context of the analysis of vari-ance, when the F-ratio suggests rejection ofthe null hypothesis H0, that is, when the dif-ference between the population means is sig-nificant.This test helps to identify the populationswhose means are statistically different. Thebasic idea of the test is to compare the pop-ulations taken in pairs. It is then used to pro-ceed in a one-way or two-way analysis ofvariance, given that the null hypothesis hasalready been rejected.

HISTORYThe LSD test was developed by Fisher,Ronald Aylmer (1935), who wanted toknow which treatments had a significanteffect in an analysis of variance.

MATHEMATICAL ASPECTSIf in an analysis of variance the F-ratioleads to rejection of the null hypothesis H0,wecanperformtheLSDtest inorder todetectwhich means have led H0 to be rejected.The test consists in a pairwise comparisonof the means. In general terms, the stan-dard deviation of the difference betweenthe mean of group i and the mean of group j,

Page 304: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Least Significant Difference Test 303

for i �= j, is equal to:

√s2

I

(1

ni+ 1

nj

),

where s2I is the estimation of the variance

inside the groups

s2I =

SCI

N − k=

k∑i=1

ni∑j=1

(Yij − Yi.

)2

N − k,

where SCI is the sum of squares inside thegroups,N is the totalnumberofobservations,k is the number of groups, Yij is the jth obser-vation of group i, Yi. is the mean of group i,ni is the number of observations in group i,and nj is the number of observations in groupj.The ratio

Yi. − Yj.√s2

I

(1ni+ 1

nj

)

follows the Student distribution with N − kdegrees of freedom. The difference betweena pair of means is significant when

∣∣Yi. − Yj.∣∣ ≥

√s2

I

(1

ni+ 1

nj

)· tN−k,1− α

2,

with tN−k,1− α2

denoting the value of a Stu-dent variate with N − k degrees of freedomfor a significance level set to α.

The quantity√

s2I (

1ni+ 1

nj) · tN−k,1− α

2is

called the LSD.

DOMAINS AND LIMITATIONSLike the analysis of variance, the LSD testdemands independent and identically dis-tributed normal variables with constant vari-ance.

EXAMPLESDuring boiling, all-butter croissants absorbfat in variable quantities. We want to veri-fy whether the absorbed quantity depends onthe type of fat content (animal or vegetablefat). We prepare for our experiment four dif-ferent types of fat, and we boil six all-buttercroissants for each croissant type.Here are the results of our experiment:

Type of fat

1 2 3 4

64 78 75 55

72 91 93 66

68 97 78 49

77 82 71 64

56 85 63 70

95 77 76 68

We run a one-way analysis of variance thatpermits us to test the null hypothesis

H0 : μ1 = μ2 = μ3 = μ4 ,

which is evidently rejected.Since H0 is rejected, we run the LSD testseeking to identify which means caused therejection of H0.In this example, the number of observationsis the same for each group (n1 = n2 = n3 =n4 = n = 6). Consequently,

√s2

I

(1ni+ 1

nj

)

is simplified to√

2s2I

n .The value for the corresponding LSD test isequal to:

√2s2

I

n· tN−k,1− α

2.

The results of the analysis of variance pro-vide the computational elements needed to

Page 305: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

304 Least Squares

find the value of s2I :

s2I =

SCI

N − k=

k∑i=1

ni∑j=1

(Yij − Yi.

)2

N − k

= 2018

20= 100.9 .

If we choose a significance level α = 0.05,the Student value for N − k = 20 degreesof freedom is equal to:

tN−k,1− α2= t20,0.975 = 2.086 .

Hence the value for the LSD statistic is:

LSD =√

2s2I

n· t20,0.975

=√

2 · 100.9

6· 2.086

= 12.0976 .

The difference between a pair of means Yi

and Yj is significant if∣∣Yi. − Yj.

∣∣ ≥ 12.0976 .

We can summarize the results in a table con-taining the mean differences, the values forthe LSD statistic, and the conclusion.

Difference LSD Signi-ficant∣∣∣Y1.−Y2.

∣∣∣ = |72− 85| = 13 12.0976 Yes∣∣∣Y1.−Y3.

∣∣∣ = |72− 76| = 4 12.0976 No∣∣∣Y1.−Y4.

∣∣∣ = |72− 62| = 10 12.0976 No∣∣∣Y2.−Y3.

∣∣∣ = |85− 76| = 9 12.0976 No∣∣∣Y2.−Y4.

∣∣∣ = |85− 62| = 23 12.0976 Yes∣∣∣Y3.−Y4.

∣∣∣ = |76− 62| = 14 12.0976 Yes

We verify that only the differences Y1. −Y2., Y2. − Y4., and Y3. − Y4. are significant.

FURTHER READING� Analysis of variance� Student distribution� Student table

REFERENCESDodge, Y., Thomas, D.R.: On the perfor-

mance of non-parametric and normaltheory multiple comparison procedures.Sankhya B42, 11–27 (1980)

Fisher, R.A.: The Design of Experiments.Oliver & Boyd, Edinburgh (1935)

Miller, R.G., Jr.: Simultaneous StatisticalInference, 2nd edn. Springer, Berlin Hei-delberg New York (1981)

Least SquaresSee least-squares method.

Least-Squares MethodThe least-squares method consists in mini-mizing the sum of the squared residuals. Thelatter correspond to the squared deviationsbetween estimated and observed values.

HISTORYThe least-squares method was introduced byLegendre, Adrien-Marie (1805).

MATHEMATICAL ASPECTSConsider the multiple linear regressionmodel:

Yi = β0 +p−1∑j=1

βj · Xji + εi, i = 1, . . . , n ,

where Yi is the dependent variable, Xji, j =1, . . . , p − 1 are the explanatory or inde-pendent variables, βj, j = 0, . . . , p − 1 are

Page 306: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Least-Squares Method 305

the parameters to be estimated, and εi is theerror term.Parameter estimation using the least-squares method consists in minimizing thesum of the squared errors (residuals):

minimum f(β0, . . . , βp−1

),

where:

f(β0, . . . , βp−1

) =n∑

i=1

ε2i .

The residual sum of squares is equal to:

f(β0, . . . , βp−1

) =n∑

i=1

ε2i

=n∑

i=1

⎛⎝Yi − β0 −

p−1∑j=1

βj · Xji

⎞⎠

2

.

Wefind theestimated parametersof the func-tionbyequatingtozerothepartialderivativesof this function with respect to each param-eter:

∂ f(β0, . . . , βp−1

)

β0= 0 ,

∂ f(β0, . . . , βp−1

)

β1= 0 ,

...

∂ f(β0, . . . , βp−1

)

βp−1= 0 .

Suppose we have n observations and thatp = 2:

(X1, Y1) , (X2, Y2) , . . . , (Xn, Yn) .

TheequationrelatingYiandXi canbedefinedfrom the linear model:

Yi = β0 + β1 · Xi + εi , i = 1, . . . , n .

The sum of the squared deviations from theestimated regression line is:

f (β0, β1) =n∑

i=1

ε2i

=n∑

i=1

(yi − β0 − β1 · Xi)2 .

Mathematically, we determine β0 and β1 bytaking the partial derivatives of function fwith respect to parametersβ0 and β1 and set-ting them equal to zero.Partial derivative with respect to β0:

∂ f

∂β0= −2 ·

n∑i=1

(Yi − β0 − β1 · Xi) .

Partial derivative with respect to β1:

∂ f

∂β1= −2 ·

n∑i=1

Xi · (yi − β0 − β1 · Xi) .

Hence, the estimated values for β0 and β1,denotedas β0 and β1,aregivenasthesolutionto the following equations:

n∑i=1

(Yi − β0 − β1 · Xi

)= 0 ,

n∑i=1

Xi ·(

Yi − β0 − β1 · Xi

)= 0 .

Developing these two equations, we obtain:

n∑i=1

Yi = n · β0 + β1 ·n∑

i=1

Xi

and

n∑i=1

(Xi · Yi) = β0 ·n∑

i=1

Xi + β1 ·n∑

i=1

X2i ,

which we call the normal equations of theleast-squares regression line.

Page 307: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

306 Legendre, Adrien Marie

Given

X =∑n

i=1 Xi

nand Y =

∑ni=1 Yi

n,

the solution of thenormalequationsprovidesthe estimates of the following parameters:

β0 = Y − β1 · X ,

β1 =n ·∑n

i=1 (Xi · Yi)−(∑n

i=1 Xi) · (∑n

i=1 Yi)

n ·∑ni=1 X2

i −(∑n

i=1 Xi)2

=∑n

i=1

(Xi − X

) (Yi − Y

)∑n

i=1

(Xi − X

)2 .

The least-squares estimates for the depen-dent variable are:

Yi = β0 + β1 · Xi , i = 1, . . . , n

or

Yi = Y + β1 ·(Xi − X

), i = 1, . . . , n .

According to this equation, the least-squaresregression line passes through the point(X, Y

), which is called the barycenter or cen-

ter of gravity for the scatter cloud of the datapoints.We can, equally, express the multiple lin-ear regression model in terms of vectors andmatrices:

Y = X · β + ε ,

where Y is the (n× 1) response vector (thedependent variable), X is the (n × p) inde-pendent variables matrix, ε is the (n × 1)

vector of the error term, and β is the (p× 1)

vector of the parameters to be estimated.The application of the least-squares methodconsists in minimizing the following equa-

tion:

ε′ · ε = (Y− X · β)′ · (Y− X · β)

= Y′ ·Y− β ′ ·X′ ·Y− Y′ ·X · β+ β ′ ·X′ ·X · β= Y′ ·Y− 2 · β ′ · X′ · Y+ β ′ ·X′ ·X · β .

The normal equations are given, in matrixform, by:

X′ · X · β = X′ ·Y .

If X′ · X is invertible, the parameters β areestimated by:

β = (X′ · X)−1 · X′ · Y .

EXAMPLESSee simple linear regression.

FURTHER READING� Estimation� Multiple linear regression� Regression analysis� Simple linear regression

REFERENCESLegendre, A.M.: Nouvelles méthodes pour

la détermination des orbites des comètes.Courcier, Paris (1805)

Legendre, Adrien Marie

Born in 1752 in Paris, French mathemati-cian Legendre, Adrien Marie succeeded deLaplace, P.S. as professor in mathematics,first at the Ecole Militaire and then at theEcole Normale of Paris. He was interest-ed in the theory and practice of astronomyand geodesy. From 1792 he was part of the

Page 308: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Leverage Point 307

French commission in charge of measuringthe length of a meridian quadrant from theNorth Pole to the Equator passing throughParis.He died in 1833, but in the history of statis-tics his name remained attached to the pub-lication of his work in 1805, which includedan appendix titled ”On the least-squaresmethod.” This publication was at the rootof the controversy that opposed him toGauss, C.F., who vied with him over creditfor the discovery of this method.

Main work of Legendre, A.M.:

1805 Nouvelles méthodes pour la déter-mination des orbites des comètes.Courcier, Paris.

REFERENCESPlackett, R.L.: Studies in the history of

probability and statistics. In: Kendall, M.,Plackett, R.L. (eds.) The discovery of themethod of least squares. vol. II. Griffin,London (1977)

Lehmann, ErichLehmann, Erich was born in Strasbourg,France, in 1917 and went to the UnitedStates in 1940. He has made fundamentalcontributions to the theory of statistics. Hisresearch interests are the statistical theoryand history and philosophy of statistics. Heis the author of Basic Concepts of Probabi-lity and Statistics,Elements of Finite Proba-bility, Nonparametrics: Statistical Meth-ods Based on Ranks, Testing StatisticalHypotheses, and Theory of Point Estima-tion. He is a member of the American Acade-my of Arts and Sciences and of the Nation-al Academy of Science (NAS), having been

elected to the NAS in 1978. He has beeneditor of the Annals of Mathematical Statis-tics and President of the Institute of Mathe-matical Statistics.

Selected works and publications of ErichLehmann:

1983 Theory of Point Estimation, 2nd edn.Wiley, NY.

1986 Testing Statistical Hypotheses. 2ndedn. In: Series in Probability andMathematical Statistics. Wiley, NY.

1993 (withHodges,J.L.)TestingStatisticalHypotheses. Chapman & Hall, NY.

FURTHER READING� Estimation� Hypothesis testing

Level of Significance

See significance level.

Leverage Point

In regression analysis, we call the leveragepoint an observation i for which the esti-mated response value Yi is influenced bythe value of the corresponding independentvariable Xi. The notion of leverage point isequivalent to thatof leverage in physics.Thatis, even a small modification to the value ofobservation i can seriously affect the estima-tion of yi.

MATHEMATICAL ASPECTSThe diagonal elements of matrix H inregression analysis reflect the influenceof the observation on the estimation of

Page 309: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

308 Leverage Point

the response. We associate in principle thediagonal elements hii of matrix H with thenotion of the leverage point. Since the traceof matrix H is equal to the number of param-eters to estimate, we expect to have hii closeto p/n. The hii should be close to p/n if allthe observations have the same influenceon the estimates. If for an observation i weobtain hii larger than p/n, then observation iis considered a leverage point. In practice,an observation i is called a high leveragepoint if:

hii >2p

n.

Consider the following linear regressionmodel in matrix notation:

Y = X · β + ε ,

where Y is the (n× 1) response vector (thedependent variable), X is the (n × p) inde-pendent variables matrix, ε is the (n × 1)

vector of the error term, and β is the (p× 1)

vector of the parameters to be estimated.The estimation β of vector β is given by:

β = (X′X

)−1 X′Y .

Matrix H is defined by:

H = X(X′X

)−1 X′ .

In particular, the diagonal elements hii aredefined by:

hii = xi(X′X

)−1x′i ,

where xi is the ith line of X.We call the leverage point each observation ifor which:

hii = xi(X′X

)−1x′i > 2 · p

n.

DOMAINS AND LIMITATIONSHuber (1981) refers to leverage pointswhen the diagonal element of the matrixX

(X′X

)−1 X′ is larger than 0.2, the criticallimit not being dependent either on the num-ber of the parameters p or on the number ofobservations n.

EXAMPLESConsider the following table, where Y isa dependent variable explained by theindependent variable X:

X Y

50 6

52 8

55 9

75 7

57 8

58 10

The linear regression model in matrix nota-tion is:

Y = X · β + ε ,

where

Y =

⎡⎢⎢⎢⎢⎢⎢⎢⎣

68978

10

⎤⎥⎥⎥⎥⎥⎥⎥⎦

, X =

⎡⎢⎢⎢⎢⎢⎢⎢⎣

1 501 521 551 751 571 58

⎤⎥⎥⎥⎥⎥⎥⎥⎦

,

ε is the (6× 1) residual vector, and β is the(2× 1) parameter vector.We get matrix H by:

H = X(X′X

)−1 X′ .

We find:

H =

⎡⎢⎢⎢⎢⎢⎣

0.32 0.28 0.22 −0.17 0.18 0.160.28 0.25 0.21 −0.08 0.18 0.160.22 0.21 0.19 0.04 0.17 0.17−0.17 −0.08 0.04 0.90 0.13 0.17

0.18 0.18 0.17 0.13 0.17 0.170.16 0.16 0.17 0.17 0.17 0.17

⎤⎥⎥⎥⎥⎥⎦

.

Page 310: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

L

Likelihood Ratio Test 309

Wewrite thediagonalofmatrix Hon thevec-tor h:

h =

⎡⎢⎢⎢⎢⎢⎢⎢⎣

0.320.250.190.900.170.17

⎤⎥⎥⎥⎥⎥⎥⎥⎦

.

In this case there are two parameters (p =2) and six observations (n = 6), that is, thecritical value equals 4

6 = 0.67.Comparing the components of h to this val-ue, we notice that, with the exception ofobservation i = 4, all observations havehii < 0.67. The fourth observation is a lever-age point for the problem under study. Thefollowing figure,wherevariablesX and Yarerepresented on the two axes, illustrates thefact that the point (75, 7) is a high leveragepoint.

Inspecting this figure we notice that theregression line is highly influenced by thisleverage point and does not represent cor-rectly the data set under study.In the following figure we have deleted thispoint from the data set. We can see that thenew regression line gives a much better rep-resentation of the data set.

FURTHER READING� Hat matrix� Matrix� Multiple linear regression� Regression analysis� Simple linear regression

REFERENCESHuber, P.J.: Robust Statistics. Wiley, New

York (1981)

Mosteller, F., Tukey, J.W.: Data Analysisand Regression: A Second Course inStatistics. Addison-Wesley, Reading, MA(1977)

Likelihood Ratio Test

The likelihood ratio test is a hypothesis test.It allows to test general hypotheses concern-ing the parameters of interest of a paramet-ric family as well as to test two differentmodels built on the same data. The mainidea of this test is the following: compute theprobability of observing the data under thenull hypothesis H0 and under the alternativehypothesis using the likelihood function.

HISTORYIt is Neyman, Jerzy and Pearson, EgonSharpe (1928) who came up with the ideaof using the likelihood ratio statistic to test

Page 311: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

310 Likelihood Ratio Test

hypotheses. Wald, Abraham (1941) general-ized the likelihood ratio test to more com-plicated hypotheses. The asymptotic resultson the distribution of this statistic (subject tocertain regularity conditions) were first pre-sented by Wilks, Samuel Stanley (1962).

MATHEMATICAL ASPECTS
In the likelihood ratio test, the null hypothesis is rejected if the likelihood under the alternative hypothesis is significantly larger than the likelihood under the null hypothesis. Hence, the problem is not to find the most adequate parameter but to know whether, under the alternative hypothesis, we obtain a significantly better estimate.
The Neyman–Pearson theorem states that, for every significance level α, the likelihood ratio test (for testing a simple hypothesis against a simple alternative) is more powerful than any other test.
Let x1, x2, ..., xn be n independent observations (n > 1) of a random variable X following a probability distribution with parameter θ (or parameter vector θ). We denote by L(θ; x1, x2, ..., xn) the likelihood function which, for fixed xi, depends only upon θ, by Ω the space of all possible values of θ, and by Ω0 and Ω1 two subspaces of Ω.
We distinguish between:

Simple Null Hypothesis vs. a Simple Alternative Hypothesis

H0: θ = θ0   vs.   H1: θ = θ1 .

In this case, the likelihood ratio is:

$$\lambda = \lambda(x_1, x_2, \ldots, x_n) = \frac{L(\theta_0; x_1, x_2, \ldots, x_n)}{L(\theta_1; x_1, x_2, \ldots, x_n)} .$$

The likelihood ratio test at significance level α is defined by the decision rule:

Reject H0 if λ ≤ λ0 ,

where λ0 is the critical value determined by the significance level α and by the distribution of λ under the null hypothesis:

α = P(λ(x1, x2, ..., xn) ≤ λ0 | H0) .

The problem here is to find the critical region, since there is no generally accepted procedure for doing so. The idea is to search for a statistic whose distribution is known.

Simple Null Hypothesis vs. a General Alternative

H0: θ = θ0   vs.   H1: θ ∈ Ω1 (for example θ < θ0) .

We set:

$$\lambda = \frac{L(\theta_0; x_1, x_2, \ldots, x_n)}{\sup_{\theta \in \Omega_1} L(\theta; x_1, x_2, \ldots, x_n)} ,$$

where sup_{θ∈Ω1} L(θ; x1, x2, ..., xn) represents the value of the likelihood function at the parameter that maximizes it over the restricted parameter space (θ ∈ Ω1), and we again search for the distribution of λ under H0.
If the alternative hypothesis H1 is θ ≠ θ0, we can write λ as:

$$\lambda = \frac{L(\theta_0; x_1, x_2, \ldots, x_n)}{L(\hat{\theta}; x_1, x_2, \ldots, x_n)} ,$$

where θ̂ is the maximum likelihood estimate of θ.

General Hypothesis H0: θ ∈ Ω0 vs. Alternative Hypothesis H1: θ ∈ Ω\Ω0 = {θ ∈ Ω; θ ∉ Ω0}

Consider the following statistic:

$$\lambda = \frac{\sup_{\theta \in \Omega_0} L(\theta; x_1, x_2, \ldots, x_n)}{\sup_{\theta \in \Omega} L(\theta; x_1, x_2, \ldots, x_n)}$$

and search for the rejection region as in the previous cases.
The quantity sup_{θ∈Ω0} L(θ; x1, x2, ..., xn) is the value of the likelihood function at the parameter that maximizes it (the maximum likelihood estimate) over the restricted parameter space, while sup_{θ∈Ω} L(θ; x1, x2, ..., xn) is the same quantity over the whole parameter space. The larger the difference between these two values, the more inappropriate the restricted model is (and the more strongly H0 is rejected).
Under H0, the statistic −2 log(λ) asymptotically follows a chi-square distribution with u degrees of freedom, where u = dim Ω − dim Ω0 (that is, the number of parameters specified by H0). In other words:

$$-2 \log \lambda = 2 \log \frac{\sup_{\theta \in \Omega} L(\theta; x_1, x_2, \ldots, x_n)}{\sup_{\theta \in \Omega_0} L(\theta; x_1, x_2, \ldots, x_n)} \;\longrightarrow\; \chi^2(u) \quad \text{as } n \to \infty .$$

For the particular cases treated earlier, the result is:

−2 log λ ∼ χ² with 1 degree of freedom as n → ∞ .

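As a rough numerical illustration of this asymptotic result, the following Python sketch (hypothetical data and null value, not taken from this entry) computes −2 log λ for testing a fixed Poisson rate against the unrestricted maximum likelihood estimate and compares it with the chi-square critical value:

```python
import numpy as np
from scipy.stats import poisson, chi2

# hypothetical sample and null value (for illustration only)
x = np.array([3, 5, 4, 6, 2, 4, 5, 3, 4, 6])
theta0 = 3.0                       # rate specified by H0
theta_hat = x.mean()               # unrestricted MLE of the Poisson rate

loglik_0 = poisson.logpmf(x, theta0).sum()     # log-likelihood under H0
loglik_1 = poisson.logpmf(x, theta_hat).sum()  # log-likelihood over the whole space

lr_stat = -2 * (loglik_0 - loglik_1)           # -2 log(lambda)
critical = chi2.ppf(0.95, df=1)                # H0 fixes one parameter, so u = 1

print(lr_stat, critical, lr_stat > critical)   # reject H0 when lr_stat exceeds critical
```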
DOMAINS AND LIMITATIONS
If the data distribution is normal, the likelihood ratio test is equivalent to the Fisher test. The advantage of the latter (when it can be explicitly constructed) is that it guarantees the best performance and power (according to the Neyman–Pearson theorem) for testing the simple hypothesis.

EXAMPLES
Example 1
Let X be a normal random variable with mean μ and variance σ². We want to test the following hypothesis:

H0: μ = 0, σ² > 0
H1: μ ≠ 0, σ² > 0 .

By considering the parameter space Ω = {(μ, σ²); −∞ < μ < ∞, 0 < σ² < ∞} and the subspace Ω0 = {(μ, σ²) ∈ Ω; μ = 0}, the hypotheses can be written as:

H0: θ = (μ, σ²) ∈ Ω0
H1: θ = (μ, σ²) ∈ Ω\Ω0 .

Consider now x1, ..., xn, n observations of the variable X (n > 1). The likelihood function is (see the example under maximum likelihood):

$$L(\theta; x_1, \ldots, x_n) = L(\mu, \sigma^2; x_1, \ldots, x_n) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right) .$$

We estimate the mean and the variance by the maximum likelihood estimates:

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x} , \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu})^2 .$$

Hence the likelihood functions are, for the null hypothesis (where the restricted maximum likelihood estimate of σ² is (1/n)∑ xi²):

$$L(0, \hat{\sigma}_0^2; x_1, \ldots, x_n) = \left(\frac{n}{2\pi \sum_{i=1}^n x_i^2}\right)^{n/2} \exp\left(-\frac{n \sum_{i=1}^n x_i^2}{2 \sum_{i=1}^n x_i^2}\right) = \left(\frac{n \exp(-1)}{2\pi \sum_{i=1}^n x_i^2}\right)^{n/2} ,$$

and for the alternative hypothesis:

$$L(\hat{\mu}, \hat{\sigma}^2; x_1, \ldots, x_n) = \left(\frac{n}{2\pi \sum_{i=1}^n (x_i - \bar{x})^2}\right)^{n/2} \exp\left(-\frac{n \sum_{i=1}^n (x_i - \bar{x})^2}{2 \sum_{i=1}^n (x_i - \bar{x})^2}\right) = \left(\frac{n \exp(-1)}{2\pi \sum_{i=1}^n (x_i - \bar{x})^2}\right)^{n/2} ,$$

so the likelihood ratio is:

$$\lambda = \lambda(x_1, x_2, \ldots, x_n) = \frac{L(0, \hat{\sigma}_0^2; x_1, \ldots, x_n)}{L(\hat{\mu}, \hat{\sigma}^2; x_1, \ldots, x_n)} = \left(\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n x_i^2}\right)^{n/2} .$$

One must now find the distribution of λ. Recalling that ∑ xi² = ∑ (xi − x̄)² + n x̄², we obtain:

$$\lambda = \left(\frac{\sum_{i=1}^n (x_i - \bar{x})^2}{\sum_{i=1}^n (x_i - \bar{x})^2 + n\bar{x}^2}\right)^{n/2} = 1 \Bigg/ \left(1 + \frac{n\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right)^{n/2} .$$

We search for the critical value λ0 such that the null hypothesis is rejected when λ ≤ λ0, which we can write as:

$$\lambda \le \lambda_0 \iff \left(1 + \frac{n\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2}\right)^{n/2} \ge \lambda_0^{-1} \iff \frac{n\bar{x}^2}{\sum_{i=1}^n (x_i - \bar{x})^2} \ge \lambda_0^{-2/n} - 1 .$$

Multiplying each side of the last inequality by (n − 1) and taking the square root gives:

$$\lambda \le \lambda_0 \iff \frac{\sqrt{n}\,|\bar{x}|}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 / (n-1)}} \ge \sqrt{(n-1)\left(\lambda_0^{-2/n} - 1\right)} .$$

Under H0 (μ = 0), the random variable

$$T = \frac{\sqrt{n}\,\bar{X}}{\sqrt{\sum_{i=1}^n (X_i - \bar{X})^2 / (n-1)}}$$

follows a Student distribution with (n − 1) degrees of freedom. One must therefore find the value λ0 for which c := sqrt((n − 1)(λ0^(−2/n) − 1)) is the critical value of a Student variable at the chosen significance level α:

$$\lambda_0 = \left(\frac{c^2}{n-1} + 1\right)^{-n/2} .$$

For example, given a sample of size n = 6 and α = 0.05, we find c = t(5, 0.025) = 2.571, so it is sufficient to check whether

$$\lambda \le \left(\frac{(2.571)^2}{5} + 1\right)^{-3} \approx 0.08$$

for the null hypothesis to be rejected.

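A minimal Python check of the threshold used in Example 1 (only the Student quantile and λ0 are computed; no data are assumed):

```python
from scipy.stats import t

n, alpha = 6, 0.05
c = t.ppf(1 - alpha / 2, df=n - 1)            # two-sided Student critical value
lambda_0 = (c**2 / (n - 1) + 1) ** (-n / 2)   # rejection threshold for lambda
print(round(c, 3), round(lambda_0, 3))        # roughly 2.571 and 0.08
```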
Example 2
Two independent random samples X1, ..., Xn and Y1, ..., Ym (m + n > 2) follow normal distributions with parameters (μ1, σ²) and (μ2, σ²), respectively. We wish to test the following hypothesis:

H0: μ1 = μ2
H1: μ1 ≠ μ2 .

We define the sets:

Ω = {(μ1, μ2, σ²); −∞ < μ1, μ2 < ∞, 0 < σ² < ∞} ,
Ω0 = {(μ1, μ2, σ²) ∈ Ω; μ1 = μ2} .

We can now write the hypotheses as follows:

H0: (μ1, μ2, σ²) ∈ Ω0
H1: (μ1, μ2, σ²) ∈ Ω\Ω0 .

The likelihood functions are:

$$L(\mu_1, \mu_1, \sigma^2; x_1, \ldots, x_n, y_1, \ldots, y_m) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n+m} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu_1)^2 + \sum_{i=1}^m (y_i - \mu_1)^2}{2\sigma^2}\right) ,$$

$$L(\mu_1, \mu_2, \sigma^2; x_1, \ldots, x_n, y_1, \ldots, y_m) = \left(\frac{1}{\sqrt{2\pi\sigma^2}}\right)^{n+m} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu_1)^2 + \sum_{i=1}^m (y_i - \mu_2)^2}{2\sigma^2}\right) .$$

Using the maximum likelihood estimates, which are, under the null hypothesis:

$$\hat{\mu}_1 = \frac{\sum_{i=1}^n x_i + \sum_{i=1}^m y_i}{n+m} = \frac{n\bar{x} + m\bar{y}}{n+m} , \qquad \hat{\sigma}^2 = \frac{\sum_{i=1}^n (x_i - \hat{\mu}_1)^2 + \sum_{i=1}^m (y_i - \hat{\mu}_1)^2}{n+m} ,$$

and, under the alternative hypothesis:

$$\hat{\mu}'_1 = \frac{\sum_{i=1}^n x_i}{n} = \bar{x} , \qquad \hat{\mu}'_2 = \frac{\sum_{i=1}^m y_i}{m} = \bar{y} , \qquad \hat{\sigma}'^2 = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 + \sum_{i=1}^m (y_i - \bar{y})^2}{n+m} ,$$

we get the likelihood ratio:

$$\lambda = \frac{L(\hat{\mu}_1, \hat{\mu}_1, \hat{\sigma}^2; x_1, \ldots, x_n, y_1, \ldots, y_m)}{L(\hat{\mu}'_1, \hat{\mu}'_2, \hat{\sigma}'^2; x_1, \ldots, x_n, y_1, \ldots, y_m)} = \left(\frac{\sum_{i=1}^n (x_i - \bar{x})^2 + \sum_{i=1}^m (y_i - \bar{y})^2}{\sum_{i=1}^n \left(x_i - \frac{n\bar{x}+m\bar{y}}{n+m}\right)^2 + \sum_{i=1}^m \left(y_i - \frac{n\bar{x}+m\bar{y}}{n+m}\right)^2}\right)^{(n+m)/2} .$$

The statistic λ^(2/(n+m)) is therefore given by:

$$\lambda^{2/(n+m)} = \frac{\sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^m (Y_i - \bar{Y})^2}{\sum_{i=1}^n \left(X_i - \frac{n\bar{X}+m\bar{Y}}{n+m}\right)^2 + \sum_{i=1}^m \left(Y_i - \frac{n\bar{X}+m\bar{Y}}{n+m}\right)^2} .$$

To find the distribution of this statistic, we use the following arguments (note that ∑ (Xi − X̄) = 0):

$$\sum_{i=1}^n \left(X_i - \frac{n\bar{X}+m\bar{Y}}{n+m}\right)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n\left(\bar{X} - \frac{n\bar{X}+m\bar{Y}}{n+m}\right)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + \frac{nm^2}{(n+m)^2}(\bar{X} - \bar{Y})^2 .$$

The same holds for Y:

$$\sum_{i=1}^m \left(Y_i - \frac{n\bar{X}+m\bar{Y}}{n+m}\right)^2 = \sum_{i=1}^m (Y_i - \bar{Y})^2 + \frac{mn^2}{(n+m)^2}(\bar{X} - \bar{Y})^2 .$$

We can now write:

$$\lambda^{2/(n+m)} = \left(1 + \frac{\frac{nm}{n+m}(\bar{X} - \bar{Y})^2}{\sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^m (Y_i - \bar{Y})^2}\right)^{-1} .$$

If the null hypothesis is true, then the random variable

$$T = \frac{\sqrt{\frac{nm}{n+m}}\,(\bar{X} - \bar{Y})}{\sqrt{\dfrac{\sum_{i=1}^n (X_i - \bar{X})^2 + \sum_{i=1}^m (Y_i - \bar{Y})^2}{n+m-2}}}$$

follows the Student distribution with n + m − 2 degrees of freedom. Therefore we obtain:

$$\lambda^{2/(n+m)} = \frac{1}{1 + \dfrac{T^2}{n+m-2}} = \frac{n+m-2}{n+m-2+T^2} .$$

We search for the value λ0 for which the null hypothesis is rejected when λ ≤ λ0 ≤ 1:

$$\lambda = \left(\frac{n+m-2}{n+m-2+T^2}\right)^{(n+m)/2} \le \lambda_0 \iff |T| \ge \sqrt{\left(\lambda_0^{-2/(n+m)} - 1\right)(n+m-2)} := c .$$

It is now sufficient to take the critical value c from the Student table for the chosen significance level α and n + m − 2 degrees of freedom. For example, with n = 10, m = 6, and α = 0.05, we find:

c = t(14, 0.025) = 2.145 ,

$$\lambda_0 = \left(\frac{c^2}{n+m-2} + 1\right)^{-(n+m)/2} = \left(\frac{(2.145)^2}{14} + 1\right)^{-8} \approx 0.103 .$$

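The threshold in this two-sample setting can likewise be checked numerically; the sketch below only reproduces the quantile and λ0 computation for n = 10, m = 6, α = 0.05:

```python
from scipy.stats import t

n, m, alpha = 10, 6, 0.05
c = t.ppf(1 - alpha / 2, df=n + m - 2)                   # Student critical value, 14 df
lambda_0 = (c**2 / (n + m - 2) + 1) ** (-(n + m) / 2)    # rejection threshold for lambda
print(round(c, 3), round(lambda_0, 3))                   # roughly 2.145 and 0.103
```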
FURTHER READING
• Alternative hypothesis
• Fisher test
• Hypothesis
• Hypothesis testing
• Maximum likelihood
• Null hypothesis

REFERENCES
Cox, D.R., Hinkley, D.V.: Theoretical Statistics. Chapman & Hall, London (1973)
Edwards, A.W.F.: Likelihood. An account of the statistical concept of likelihood and its application to scientific inference. Cambridge University Press, Cambridge (1972)
Kendall, M.G., Stuart, A.: The Advanced Theory of Statistics, vol. 2. Griffin, London (1967)
Neyman, J., Pearson, E.S.: On the use and interpretation of certain test criteria for purposes of statistical inference, Parts I and II. Biometrika 20A, 175–240, 263–294 (1928)
Wald, A.: Asymptotically Most Powerful Tests of Statistical Hypotheses. Ann. Math. Stat. 12, 1–19 (1941)
Wald, A.: Some Examples of Asymptotically Most Powerful Tests. Ann. Math. Stat. 12, 396–408 (1941)
Wilks, S.S.: Mathematical Statistics. Wiley, New York (1962)

Line Chart

The line chart is a special type of frequency graph. It is useful for representing the frequency distribution of a discrete random variable. The values of the variable are given on the abscissa and the frequencies corresponding to these values are displayed on the ordinate.

HISTORY
See graphical representation.

MATHEMATICAL ASPECTS
The line chart has two axes. The values xi of the discrete random variable are given on the abscissa. A line is drawn over each value xi with a length equal to the frequency ni (or relative frequency fi) corresponding to this value.

EXAMPLES
The distribution of the shoe sizes of a sample of 84 men is given in the following table.

Shoe size | Frequency
5         | 3
6         | 1
7         | 13
8         | 16
9         | 21
10        | 19
11        | 11
Total     | 84

The line chart established from this table plots the shoe sizes on the abscissa and the corresponding frequencies on the ordinate.

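A minimal matplotlib sketch of such a line chart, using the shoe-size table above, could be:

```python
import matplotlib.pyplot as plt

sizes = [5, 6, 7, 8, 9, 10, 11]
freqs = [3, 1, 13, 16, 21, 19, 11]

# one vertical line per value, with height equal to its frequency
plt.vlines(sizes, ymin=0, ymax=freqs)
plt.xlabel("Shoe size")
plt.ylabel("Frequency")
plt.title("Line chart of shoe sizes (n = 84)")
plt.show()
```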
FURTHER READING
• Dot plot
• Frequency distribution
• Graphical representation
• Histogram

Linear Programming

Mathematical programming concerns the optimal allocation of limited resources in order to reach desired objectives. In linear programming the mathematical model under study is expressed using linear relations. A linear programming problem appears in the form of a set of linear equations (or inequalities), called constraints, and a linear function that states the objective, called the objective function.

HISTORY
Mathematical programming appeared in the field of economics around 1930. Some remarkable works on linear programming were published by John Von Neumann (1935–1936). In 1947, George B. Dantzig presented the simplex method for solving linear programming problems.

MATHEMATICAL ASPECTS
A linear programming problem can be defined as follows. Find a set of values for the variables x1, x2, ..., xn that maximizes (or minimizes) the linear objective function:

z = c1·x1 + c2·x2 + ... + cn·xn

subject to the linear constraints:

a11·x1 + a12·x2 + ... + a1n·xn {≤, =, ≥} b1 ,
a21·x1 + a22·x2 + ... + a2n·xn {≤, =, ≥} b2 ,
...
am1·x1 + am2·x2 + ... + amn·xn {≤, =, ≥} bm ,

and to the nonnegativity constraints:

xj ≥ 0 , j = 1, 2, ..., n ,

where the aij, bi, and cj are known constants. In other words, in a linear programming problem we seek a nonnegative solution that satisfies the posed constraints and optimizes the objective function.
We can write a linear programming problem using the following matrix notation:

Maximize (or minimize) z = c′·x
subject to A·x {≤, =, ≥} b and x ≥ 0 ,

where c′ is a (1 × n) row vector, x is an (n × 1) column vector, A is an (m × n) matrix, b is an (m × 1) column vector, and 0 is the null vector with n components.
The principal tool for solving a linear programming problem is the simplex method. It consists of a set of rules to follow in order to obtain the solution of a given problem. It is an iterative method that provides an exact solution in a finite number of iterations.

DOMAINS AND LIMITATIONS
Linear programming seeks an optimal allocation of limited resources that can produce one or more goods. This is done by maximizing (or minimizing) functions such as profit (or cost). Therefore, the coefficients of the objective function are often called the "prices" associated with the variables.

EXAMPLES
A firm manufactures tables and chairs using two machines, A and B. Each product is made on both machines. To make a chair, 2 h of machine A and 1 h of machine B are needed. To make a table, 1 h of machine A and 2 h of machine B are needed. The firm makes a profit of 3.00 CHF on each chair and 4.00 CHF on each table. The two machines are each available for a maximum of 12 h daily.
Given these data, we search for the daily production schedule that guarantees the maximum profit. In other words, the firm wants to know the optimal production program, that is, the number of chairs and tables that must be produced per day to reach the maximum profit.
We denote by x1 the number of chairs and by x2 the number of tables produced per day. The daily profit is given by the following function:

z = 3·x1 + 4·x2 .

This objective function has to be maximized. Each chair requires 2 h of machine A, while each table requires 1 h of machine A. Hence, the total daily use of machine A is:

2·x1 + 1·x2 ,

since x1 chairs and x2 tables are produced. Yet the total time on machine A cannot exceed 12 h. In mathematical terms this means:

2x1 + x2 ≤ 12 .

In the same manner, for machine B we find:

x1 + 2x2 ≤ 12 .

We thus have two constraints. Furthermore, we cannot produce a negative number of chairs or tables, so two additional constraints are added:

x1 ≥ 0 and x2 ≥ 0 .

We wish to find the values of the variables x1 and x2 that satisfy the constraints and maximize the profit. We can finally represent the mathematical problem in the following way:

max z = 3x1 + 4x2 ,
s.t. 2x1 + x2 ≤ 12 ,
     x1 + 2x2 ≤ 12 ,
     x1, x2 ≥ 0

(s.t. means subject to). The maximum profit is given by the pair (x1, x2) = (4, 4): a profit of 28.00 CHF from the production of 4 chairs and 4 tables. For this problem, the result can also be found by computing the profit for each feasible integer pair (there are fewer than 36 of them).

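As a sketch, the chairs-and-tables example can be solved with scipy's linear programming routine (which minimizes, so the profit is negated):

```python
from scipy.optimize import linprog

# maximize 3*x1 + 4*x2  <=>  minimize -3*x1 - 4*x2
c = [-3, -4]
A_ub = [[2, 1],   # hours on machine A
        [1, 2]]   # hours on machine B
b_ub = [12, 12]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal plan (4, 4) and maximum profit 28
```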
FURTHER READING
• Operations research
• Optimization

REFERENCES
Arthanari, T.S., Dodge, Y.: Mathematical Programming in Statistics. Wiley, New York (1981)
Dantzig, G.B.: Applications et prolongements de la programmation linéaire. Dunod, Paris (1966)
Hadley, G.: Linear Programming. Addison-Wesley, Reading, MA (1962)
Von Neumann, J.: Über ein ökonomisches Gleichungssystem und eine Verallgemeinerung des Brouwerschen Fixpunktsatzes. Ergebn. math. Kolloqu. Wien 8, 73–83 (1937)

Logarithmic Transformation

One of the most useful and common transformations is the logarithmic transformation. Indeed, before running a linear regression, it might be wise to replace the dependent variable Y by its logarithm log(Y) or by a linear combination of the latter, a + b·log(Y) (a and b being constants adjusted by least squares). Such an operation stabilizes the variance of Y and makes the distribution of the transformed variable closer to normal.
A logarithmic transformation also allows one to shift from a multiplicative to an additive model, with estimates that are easier to define and simpler to interpret. On the other hand, the logarithmic transformation of a gamma random variable Y results in a new random variable that is closer to the normal distribution; the latter can easily be accommodated in a regression analysis.

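A minimal illustration of the idea (hypothetical data, numpy only) is to fit a straight line to log(Y) rather than to Y:

```python
import numpy as np

# hypothetical positive responses growing multiplicatively with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.3, 8.6, 17.0, 35.1, 70.4])

# least-squares fit of log(y) = a + b*x  (np.polyfit returns slope first)
b, a = np.polyfit(x, np.log(y), 1)
print(a, b)   # corresponds to the multiplicative model y ~ exp(a) * exp(b*x)
```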

FURTHER READING
• Analysis of variance
• Dependent variable
• Regression analysis

Logistic Regression

Given a binary dependent variable, logistic regression offers the possibility of fitting a regression model. It permits one to study the relation between a proportion and a set of explanatory variables, either quantitative or qualitative.

MATHEMATICAL ASPECTS
The logistic regression model with dependent variable Y and independent variables X1, X2, ..., Xp is expressed by:

$$\mathrm{logit}\,\pi\big((X_{ij})_{i=1,\ldots,p}\big) = \log\left(\frac{P(Y_j = 1)}{P(Y_j = 0)}\right) = \beta_0 + \sum_{i=1}^p \beta_i X_{ij} , \qquad j = 1, \ldots, n ,$$

where Yj (j = 1, ..., n) corresponds to the observations of the variable Y, Xij to the observations of the independent variables, and π is the probability (of success) function of the logistic regression:

$$\pi\big((X_{ij})_{i=1,\ldots,p}\big) = \frac{\exp\left(\beta_0 + \sum_{i=1}^p \beta_i X_{ij}\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^p \beta_i X_{ij}\right)} .$$

Equivalently, we can write:

$$P(Y_j = 1) = \frac{\exp\left(\beta_0 + \sum_{i=1}^p \beta_i X_{ij}\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^p \beta_i X_{ij}\right)} ,$$

which permits one to compute the probability that the variable Y takes the value 1 depending upon the realization of the independent variables X1, X2, ..., Xp.
An interesting property is that the model parameters (β0, β1, ..., βp) can take any value, since all values of the logit transformation logit(π(Y)) = log(π(Y)/(1 − π(Y))) correspond to a number between 0 and 1 for P(Y = 1).
The model parameters can be estimated by the maximum likelihood method, that is, by maximizing the following function:

$$L(\beta_0, \beta_1, \ldots, \beta_p) = \prod_{j=1}^n \pi\big((X_{ij})_{i=1,\ldots,p}\big)^{Y_j} \cdot \Big(1 - \pi\big((X_{ij})_{i=1,\ldots,p}\big)\Big)^{1-Y_j}$$

or, more conveniently, by maximizing the log-likelihood function:

$$\log L(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{j=1}^n \Big[ Y_j \log \pi\big((X_{ij})_{i=1,\ldots,p}\big) + (1 - Y_j) \log\Big(1 - \pi\big((X_{ij})_{i=1,\ldots,p}\big)\Big) \Big] .$$

We can therefore test hypotheses on the parameter values (for example β0 = 0, β3 = β2, etc.) by using the likelihood ratio test.
The interpretation of the regression coefficients corresponding to binary independent variables is described in the next paragraph. When an independent variable X is continuous, the regression coefficient reflects the relative risk of a one-unit increase in X. When the independent variable X has more than two classes, we must transform it into binary variables. The interpretation of the coefficients is then clear and straightforward.

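A minimal sketch of this maximum likelihood fit (hypothetical data, numpy/scipy only; a real analysis would normally use a dedicated routine) maximizes the log-likelihood above numerically:

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical data: one explanatory variable x, binary response y
x = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])

def neg_log_likelihood(beta):
    eta = beta[0] + beta[1] * x          # linear predictor beta0 + beta1*x
    pi = 1.0 / (1.0 + np.exp(-eta))      # logistic probability of success
    return -np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))

fit = minimize(neg_log_likelihood, x0=np.zeros(2))
beta0, beta1 = fit.x
print(beta0, beta1, np.exp(beta1))       # exp(beta1) approximates the odds ratio per unit of x
```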

DOMAINS AND LIMITATIONS
Logistic regression is often used in epidemiology since:
• Its sigmoidal shape (of the expression for P(Yj = 1) given above) fits very well the relation usually observed between a dose X and the frequency of a disease Y: exp(x)/(1 + exp(x)).
• It is easy to interpret: the association measure between a disease and a risk factor M (corresponding to a binary variable Xi) is expressed by the odds ratio. This is a good approximation of the relative risk when the probabilities P(Y = 1 | Xi = 1) and P(Y = 1 | Xi = 0) are small, and it is computed simply as the exponential of the parameter associated with the variable Xi; thus:

odds ratio of the factor M = exp(βi) .

EXAMPLES
We take an example given by Hosmer and Lemeshow. The dependent variable indicates the presence or absence of heart disease for 100 subjects, and the independent variable is age.
The resulting logistic regression model is:

logit π(X) = −5.31 + 0.111 · X ,

with standard errors for the constant and for β (the coefficient of the independent variable X) equal to 1.1337 and 0.0241, respectively. The two estimated parameters are statistically significant.
The odds of having heart disease increase by a factor of exp(0.111) ≈ 1.12 with every additional year of age.
It is also possible to compute a confidence interval by taking the exponential of the interval for β:

(exp(0.111 − 1.96 · 0.0241), exp(0.111 + 1.96 · 0.0241)) = (1.065, 1.171) .

The following table gives the values of the log-likelihood as well as the parameter values estimated by maximizing the log-likelihood function in an iterative algorithm:

Iteration | Log-likelihood | Constant | β
1         | −54.24         | −4.15    | 0.0872
2         | −53.68         | −5.18    | 0.1083
3         | −53.67         | −5.31    | 0.1109
4         | −53.67         | −5.31    | 0.1109

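The odds ratio and its confidence interval from this example can be reproduced in a few lines (values as reported above):

```python
import numpy as np

beta, se = 0.111, 0.0241
odds_ratio = np.exp(beta)
ci = (np.exp(beta - 1.96 * se), np.exp(beta + 1.96 * se))
print(odds_ratio, ci)   # approximately 1.12 and (1.07, 1.17)
```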
FURTHER READING
• Binary data
• Contingency table
• Likelihood ratio test
• Maximum likelihood
• Model
• Odds and odds ratio
• Regression analysis

REFERENCES
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA (1975)
Cox, D.R., Snell, E.J.: The Analysis of Binary Data. Chapman & Hall (1989)
Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression. Wiley Series in Probability and Statistics. Wiley, New York (1989)
Altman, D.G.: Practical Statistics for Medical Research. Chapman & Hall, London (1991)


Lognormal Distribution

A random variable Y follows a lognormal distribution if the values of Y are a function of the values of X according to the equation:

y = exp(x) ,

where the random variable X follows a normal distribution.
The density function of the lognormal distribution is given by:

$$f(y) = \frac{1}{\sigma y \sqrt{2\pi}} \exp\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right) , \qquad y > 0 .$$

Lognormal distribution, μ = 0, σ = 1

The lognormal distribution is a continuous probability distribution.

HISTORY
The lognormal distribution is due to the works of Galton, F. (1879) and McAlister, D. (1879), who obtained expressions for the mean, median, mode, variance, and certain quantiles of the resulting distribution. Galton, F. started from n independent positive random variables, constructed their product and, with the help of the logarithm, passed from the product to a sum of new random variables.
Kapteyn, J.C. (1903) reconsidered the construction of a random variable following a lognormal distribution with van Uven, M.J. (1916). They developed a graphical method for the estimation of the parameters. Since that time, many works relative to the lognormal distribution have been published. An exhaustive list is given by Johnson, N.L. and Kotz, S. (1970).

MATHEMATICAL ASPECTS
The expected value of the lognormal distribution is given by:

E[Y] = exp(μ + σ²/2) .

The variance is equal to:

Var(Y) = exp(2μ) · [exp(2σ²) − exp(σ²)] .

If the random variable Y follows a lognormal distribution, then the random variable X = ln Y follows a normal distribution.

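A quick numerical check of these moment formulas (simulation with numpy; the parameter values and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, 0.8

y = np.exp(rng.normal(mu, sigma, size=200_000))   # lognormal sample built from a normal one

print(y.mean(), np.exp(mu + sigma**2 / 2))                                      # empirical vs. theoretical mean
print(y.var(),  np.exp(2 * mu) * (np.exp(2 * sigma**2) - np.exp(sigma**2)))     # empirical vs. theoretical variance
```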
DOMAINS AND LIMITATIONS
The lognormal distribution is widely applied in common statistical practice. For example, critical dosage in pharmaceutical applications, the length of a visit to the doctor, and the distribution of economic forces all follow a lognormal distribution.
The lognormal distribution can be used as an approximation of the normal distribution: as σ gets small, the lognormal distribution tends toward the normal distribution.

FURTHER READING
• Continuous probability distribution
• Normal distribution

REFERENCES
Galton, F.: The geometric mean in vital and social statistics. Proc. Roy. Soc. Lond. 29, 365–367 (1879)
Johnson, N.L., Kotz, S.: Distributions in Statistics: Continuous Univariate Distributions, vols. 1 and 2. Wiley, New York (1970)
Kapteyn, J.: Skew frequency curves in biology and statistics. Astronomical Laboratory, Noordhoff, Groningen (1903)
Kapteyn, J., van Uven, M.J.: Skew frequency curves in biology and statistics. Hoitsema Brothers, Groningen (1916)
McAlister, D.: The law of the geometric mean. Proc. Roy. Soc. Lond. 29, 367–376 (1879)

Longitudinal Data

Longitudinal data are data collected on subjects over time. They correspond to multiple observations per subject, as the same units are repeatedly measured on more than one occasion. Repeated measures, in biology and medicine, as well as panel studies, in the social and economic sciences, are characteristic examples of longitudinal data studies.

HISTORY
See history of panel data.

MATHEMATICAL ASPECTS
A very general way to write the linear model in longitudinal data analysis is the following:

yi = Xi β + Wi γi + εi ,

where yi is the (ni × 1) column response vector for subject i, Xi is an (ni × b) design matrix usually consisting of dummy coded variables, β is the (b × 1) vector of regression coefficients, Wi is an (ni × g) design matrix for the (g × 1) vector of random effects γi, and εi is the vector of error terms for subject i.
The general model above is a mixed model, since it includes both fixed (β) and random (γi) effects.
Parameter estimation is often done by means of maximum likelihood estimation, with the likelihood function strongly dependent on the assumptions made on the covariance structure of γi and εi. It is straightforward to construct statistical tests on the estimated parameters, to obtain confidence intervals, and to measure prediction. Model-selection techniques are also used to test nested models.

EXAMPLE
Two groups of people follow two different treatments for blood pressure, and they have been followed for a period of 5 months. Each group consists of five individuals. The measures of interest are given in the table below; each row corresponds to the observations taken over time for the same individual.

Month (Jan to May)
1st group:
105  68  95  86  93
100  92  98  74  35
102  85  95  62  82
 70  81  72  35  39
 98  34  34  37  40
2nd group:
 83  25  31  42  38
 69  44  41  30  32
 84  64  77  80  82
 63  54  55  47  44
 93  77  44  38  40

A figure of the two group mean curves ± 1 standard error summarizes these data.


The simplest approach in this case would be to ignore the random effects (γi = 0) and to construct a model with a fixed effect only, namely the group to which a subject belongs. In such a case the model reduces to

yij = βj + εij ,

with j indicating the group, coded by a simple dummy variable. If there is a significant difference between individuals, then the random effect γi is added to the model above.

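A small numpy sketch of the summary behind such a plot (group means and standard errors per month, using the table above):

```python
import numpy as np

group1 = np.array([[105, 68, 95, 86, 93],
                   [100, 92, 98, 74, 35],
                   [102, 85, 95, 62, 82],
                   [ 70, 81, 72, 35, 39],
                   [ 98, 34, 34, 37, 40]])
group2 = np.array([[ 83, 25, 31, 42, 38],
                   [ 69, 44, 41, 30, 32],
                   [ 84, 64, 77, 80, 82],
                   [ 63, 54, 55, 47, 44],
                   [ 93, 77, 44, 38, 40]])

for name, g in [("group 1", group1), ("group 2", group2)]:
    means = g.mean(axis=0)                        # mean per month
    se = g.std(axis=0, ddof=1) / np.sqrt(len(g))  # standard error per month
    print(name, np.round(means, 1), np.round(se, 1))
```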
DOMAINS AND LIMITATIONS
Longitudinal data are mostly used in epidemiology and medical statistics, as well as in socioeconomic studies. In the former cases longitudinal data represent repeated measurements for different treatment groups in clinical trials. In the latter we note the panel studies that usually measure social and economic indexes, such as income, consumption, and job satisfaction, in different time periods.
There are many possible complications in longitudinal data analysis. First, the data are correlated, since the same subjects are measured over time. This must be accounted for in the final analysis in order to provide proper results. Moreover, missing data often appear in such analyses, since not all of the subjects are measured during the whole time period. The latter becomes more problematic, for example, in survival analysis and follow-up studies where the subjects (usually the patients) die off or quit the trials and abandon the treatments.

FURTHER READING
• Panel

REFERENCES
Twisk, J.W.R.: Applied Longitudinal Data Analysis for Epidemiology. A Practical Guide. Cambridge University Press, Cambridge (2003)
Fitzmaurice, G., Laird, N., Ware, J.: Applied Longitudinal Analysis. Wiley, New York (2004)
Molenberghs, G., Verbeke, G.: Models for Discrete Longitudinal Data. Springer Series in Statistics. Springer, Berlin Heidelberg New York (2005)


Mahalanobis Distance

The Mahalanobis distance from X to the quantity μ is defined as:

$$d_M^2(X, \mu) = (X - \mu)^t\, \Sigma^{-1} (X - \mu) .$$

This distance is based on the correlation between variables, that is, on the variance–covariance matrix. It differs from the Euclidean distance in that it takes into account the correlations of the data set and does not depend on the scale of measurement. The Mahalanobis distance is widely used in cluster analysis and other classification methods.

HISTORY
The Mahalanobis distance was introduced by Mahalanobis, P.C. in 1936.

MATHEMATICAL ASPECTS
If μi denotes E(Xi), then by definition the expected value of the vector X = (X1, ..., Xp)′ is:

E[X] = μ = (μ1, μ2, ..., μp)′ .

The variance–covariance matrix Σ of X is defined by:

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \operatorname{cov}(X_1, X_2) & \cdots & \operatorname{cov}(X_1, X_p) \\ & \sigma_2^2 & \cdots & \vdots \\ & & \ddots & \vdots \\ & & & \sigma_p^2 \end{bmatrix} = E[XX'] - \mu\mu' .$$

It is a square symmetric matrix of order p. If the variables Xi are standardized, then Σ is identical to the correlation matrix:

$$\begin{bmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1p} \\ & 1 & \rho_{23} & \cdots & \rho_{2p} \\ & & 1 & \cdots & \rho_{3p} \\ & & & \ddots & \vdots \\ & & & & 1 \end{bmatrix} .$$

Then Y = Σ^(−1/2)(X − μ) is a standardized random vector with uncorrelated components. The random variable D² = (X − μ)′ Σ^(−1) (X − μ) has expected value p. In fact, D² = ∑_{i=1}^p Yi², where the Yi are random variables with mean 0 and variance 1. D is called the Mahalanobis distance from X to μ.

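A small numpy sketch of the computation (hypothetical data; the inverse covariance matrix could equally be passed to scipy.spatial.distance.mahalanobis):

```python
import numpy as np

# hypothetical sample of p = 2 correlated variables
rng = np.random.default_rng(1)
data = rng.multivariate_normal(mean=[0, 0], cov=[[4, 2], [2, 3]], size=500)

mu = data.mean(axis=0)
sigma_inv = np.linalg.inv(np.cov(data, rowvar=False))

x = np.array([2.0, 1.0])          # point whose distance from mu is wanted
diff = x - mu
d2 = diff @ sigma_inv @ diff      # squared Mahalanobis distance
print(np.sqrt(d2))
```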

FURTHER READING
• Classification
• Cluster analysis
• Correspondence analysis
• Covariance analysis
• Distance

REFERENCES
Mahalanobis, P.C.: On Tests and Measures of Group Divergence. Part I. Theoretical formulae. J. Asiatic Soc. Bengal 26, 541–588 (1930)
Mahalanobis, P.C.: On the generalized distance in statistics. Proc. Natn. Inst. Sci. India 2, 49–55 (1936)

Mahalanobis, Prasanta Chandra

Prasanta Chandra Mahalanobis (1893–1972) attended school in Calcutta until 1908. In 1912, he received his B.Sc. in physics at the Presidency College of Calcutta. In 1915, he went to England to earn a Tripos (B.Sc. received at Cambridge) in mathematics and physics at King's College, Cambridge. Before beginning his research, he returned to Calcutta on holiday, but he did not return to England because of the war.
As a statistician, Mahalanobis did not use statistics as an end in itself but mostly as a tool for comprehending and interpreting scientific data as well as for making decisions regarding society's well-being. He used statistics first in his study of anthropology (about 1917) and then of floods. In 1930, he published an article on the D² statistic called "Tests and Measures of Group Divergence".
His greatest contribution to statistics is surely the large-scale sample survey. He made many contributions in this area, such as the optimal choice of sampling using variance and cost functions. He was named president of the United Nations Subcommission on Statistical Sampling in 1947, a position he held until 1951.
His friendship with Ronald Aylmer Fisher, which lasted from 1926 until Fisher's death, is also notable.

Principal works and articles of Prasanta Chandra Mahalanobis:

1930 On tests and measures of group divergence. Part I. Theoretical formulae. Journal of the Asiatic Society of Bengal 26, pp. 541–588.
1936 On the generalized distance in statistics. Proc. Natn. Inst. Sci. India 2, 49–55.
1940 A sample survey of the acreage under jute in Bengal. Sankhya 4, 511–530.
1944 On large-scale sample surveys. Philos. Trans. Roy. Soc. Lond. Ser. B 231, 329–451.
1963 The Approach of Operational Research to Planning in India. Asia Publishing House, Bombay.

FURTHER READING
• Mahalanobis distance

Main Effect

In a factorial experiment, a main effect is the specific contribution of a factor level to the response variable.

MATHEMATICAL ASPECTS
The linear model that only takes into account the main effects of the factors and ignores the interactions among factors is called an additive model. Consider the additive model of a double classification design:

Yij = μ + αi + βj + εij ,   i = 1, 2, ..., a and j = 1, 2, ..., b ,

where Yij is the response receiving treatment ij, μ is the mean common to all the treatments, αi is the main effect of the ith level of factor A, βj is the main effect of the jth level of factor B, and εij is the experimental error of the observation Yij.
If a = b = 2, then the main effects of factors A and B, denoted a and b, respectively, are given by:

a = (α1 − α0)(β1 + β0) / 2 ,
b = (α1 + α0)(β1 − β0) / 2 .

FURTHER READING
• Analysis of variance
• Estimation
• Factor
• Factorial experiment
• Hypothesis testing
• Interaction

REFERENCES
Box, G.E.P., Hunter, W.G., Hunter, J.S.: Statistics for Experimenters. An Introduction to Design, Data Analysis, and Model Building. Wiley, New York (1978)

Mann–Whitney Test

The Mann–Whitney test is a nonparametric test that aims to test the equality of two populations. It is used when we have two samples coming from two populations.

HISTORY
The Mann–Whitney test was first introduced for the case m = n by Wilcoxon, Frank (1945). The Wilcoxon test was then extended to the case of samples of different sizes by White, C. (1952).
Note that a test equivalent to the Wilcoxon test was developed and introduced independently by Festinger, L. (1946). Mann, H.B. and Whitney, D.R. (1947) were the first to consider samples of different sizes and to provide tables adapted to small samples.

MATHEMATICAL ASPECTS
Let (X1, X2, ..., Xn) be a sample of size n coming from population 1, and let (Y1, Y2, ..., Ym) be a sample of size m coming from population 2.
We obtain N = n + m observations that we arrange in increasing order, noting each time the sample to which a given observation belongs.
The Mann–Whitney statistic, denoted by U, is defined as the total number of times that an Xi precedes a Yj in the increasing-order classification of the N observations. When the sample size is large, the direct determination of U becomes long, but we can use the following relation, which gives identical results:

U = mn + n(n + 1)/2 − T ,

where T is the sum of the ranks attributed to the X in the Wilcoxon test.

Hypotheses
The hypotheses corresponding to the Mann–Whitney test can be formulated as follows, according to the two-sided or one-sided case:

A: Two-sided case:
H0: P(X < Y) = 1/2
H1: P(X < Y) ≠ 1/2 .

B: One-sided case:
H0: P(X < Y) ≤ 1/2
H1: P(X < Y) > 1/2 .

C: One-sided case:
H0: P(X < Y) ≥ 1/2
H1: P(X < Y) < 1/2 .

In case A, the null hypothesis corresponds to the situation where there is no difference between the populations. In case B, the null hypothesis means that population 1 is greater than population 2. In case C, the null hypothesis indicates that population 1 is smaller than population 2.

Decision Rules
If m, n < 12, we compare the statistic U with the value found in the Mann–Whitney table to test H0. On the other hand, if m, n ≥ 12, then the sampling distribution of U approaches the normal distribution quite quickly, with mean

μ = mn/2

and standard deviation

σ = sqrt( mn(m + n + 1) / 12 ) .

Thus we can determine the significance of the observed value of U through

Z = (U − μ)/σ ,

where Z is a standard normal random variable. In other words, we can compare the value of Z thus obtained with the normal table.
Moreover, the decision rules differ depending on the hypotheses posed. We thus have rules A, B, and C relative to the previous cases A, B, and C.

Decision rule A
We reject the null hypothesis H0 at significance level α if U is smaller than the lower critical value of the Mann–Whitney table, denoted t(n, m, 1 − α/2), or greater than the upper critical value, denoted t(n, m, α/2); that is, H0 is rejected unless

t(n, m, 1 − α/2) < U < t(n, m, α/2) .

Decision rule B
We reject the null hypothesis H0 at significance level α if U is smaller than the value of the Mann–Whitney table with parameters n, m, and α, denoted t(n, m, α), that is, if

U < t(n, m, α) .

Decision rule C
We reject the null hypothesis H0 at significance level α if U is greater than the value of the Mann–Whitney table with parameters n, m, and 1 − α, denoted t(n, m, 1 − α), that is, if

U > t(n, m, 1 − α) .

DOMAINS AND LIMITATIONS
As there exists a linear relation between the Wilcoxon test and the Mann–Whitney test (U = mn + n(n+1)/2 − T), these two tests are equivalent. They therefore share the same assumptions, which must be respected:
1. Both samples are random samples taken from their respective populations.
2. In addition to the independence inside each sample, there is mutual independence between the samples.
3. The scale of measurement is at least ordinal.


EXAMPLES
In a class there are nine pupils: five boys and four girls. The class takes a test of mental calculation to see whether the boys are, in general, better than the girls in this domain.
The data are given in the following table, the highest score corresponding to the best test result.

Boys (Xi) | Girls (Yi)
9.2       | 11.1
11.3      | 15.6
19.8      | 20.3
21.1      | 23.6
24.3      |

We arrange the scores in increasing order, noting each time the sample to which a given observation belongs.

Score | Sample | Rank | Number of times X precedes Y
9.2   | X      | 1    | 4
11.1  | Y      | 2    | –
11.3  | X      | 3    | 3
15.6  | Y      | 4    | –
19.8  | X      | 5    | 2
20.3  | Y      | 6    | –
21.1  | X      | 7    | 1
23.6  | Y      | 8    | –
24.3  | X      | 9    | 0

We find U = 4 + 3 + 2 + 1 = 10. In the same way, if we use the relation U = mn + n(n+1)/2 − T, we have:

T = ∑ R(Xi) = 1 + 3 + 5 + 7 + 9 = 25

and U = 5·4 + (5·6)/2 − 25 = 10.
The hypotheses to be tested are as follows:

H0: Boys are not generally better than girls in mental calculation.
H1: Boys are generally better than girls in mental calculation.

The null hypothesis may be restated as H0: P(X < Y) ≥ 1/2 and corresponds to case C. The decision rule is then as follows: reject H0 at significance level α if

U > t(n, m, 1 − α) ,

where t(n, m, 1 − α) is the value of the Mann–Whitney table with parameters n, m, and 1 − α. Recall that in our case m = 4 and n = 5. As there is a relation between the Wilcoxon test and the Mann–Whitney test, instead of using the U table we use the Wilcoxon table for T. If we choose α = 0.05, the value of the Wilcoxon table equals 32. As T = 25 < 32, we do not reject H0, and we conclude that boys are not generally better than girls in mental calculation.

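A small numpy sketch reproducing U and T for these data:

```python
import numpy as np

boys  = np.array([9.2, 11.3, 19.8, 21.1, 24.3])   # X
girls = np.array([11.1, 15.6, 20.3, 23.6])        # Y

# U: number of pairs (Xi, Yj) in which Xi precedes Yj in the joint ordering
U = sum((x < girls).sum() for x in boys)

# T: sum of the ranks of the X observations in the pooled sample (no ties here)
pooled = np.sort(np.concatenate([boys, girls]))
T = sum(np.where(pooled == x)[0][0] + 1 for x in boys)

n, m = len(boys), len(girls)
print(U, T, m * n + n * (n + 1) / 2 - T)   # U = 10 both ways, T = 25
```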
FURTHER READING
• Hypothesis testing
• Nonparametric test
• Wilcoxon test

REFERENCES
Festinger, L.: The significance of difference between means without reference to the frequency distribution function. Psychometrika 11, 97–105 (1946)
Mann, H.B., Whitney, D.R.: On a test whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60 (1947)
White, C.: The use of ranks in a test of significance for comparing two treatments. Biometrics 8, 33–41 (1952)
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics 1, 80–83 (1945)


Marginal Density Function

In the case of a pair of random variables (X, Y), when the random variable X (or Y) is considered by itself, its density function is called the marginal density function.

HISTORY
See probability.

MATHEMATICAL ASPECTS
In the case of a pair of continuous random variables (X, Y) with joint density function f(x, y), the marginal density function of X is obtained by integration:

$$P(X \le a, -\infty < Y < \infty) = \int_{-\infty}^{a} \int_{-\infty}^{\infty} f(x, y)\, dy\, dx = \int_{-\infty}^{a} f_X(x)\, dx ,$$

where

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$$

is the marginal density of X.
The marginal density function of Y is obtained in the same way:

$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx .$$

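A quick numerical sketch (with the hypothetical joint density f(x, y) = x + y on the unit square) showing that integrating out y yields the marginal of X:

```python
from scipy.integrate import quad

def joint(x, y):
    return x + y          # a valid joint density on [0, 1] x [0, 1]

def marginal_x(x):
    value, _ = quad(lambda y: joint(x, y), 0, 1)
    return value          # analytically equal to x + 1/2

print(marginal_x(0.3))    # about 0.8
```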
FURTHER READING
• Density function
• Joint density function
• Pair of random variables

REFERENCES
Hogg, R.V., Craig, A.T.: Introduction to Mathematical Statistics, 2nd edn. Macmillan, New York (1965)
Ross, S.M.: Introduction to Probability Models, 8th edn. John Wiley, New York (2006)

Marginal Distribution Function

In the case of a pair of random variables (X, Y), when the random variable X (or Y) is considered by itself, its distribution function is called the marginal distribution function.

MATHEMATICAL ASPECTS
Consider a pair of random variables (X, Y) and their joint distribution function:

F(a, b) = P(X ≤ a, Y ≤ b) .

The marginal distribution function of X is equal to:

$$F_X(a) = P(X \le a) = P(X \le a, Y < \infty) = \lim_{b \to \infty} P(X \le a, Y \le b) = \lim_{b \to \infty} F(a, b) = F(a, \infty) .$$

In a similar way, the marginal distribution function of Y is equal to:

F_Y(b) = F(∞, b) .

FURTHER READING
• Distribution function
• Joint distribution function
• Pair of random variables

Markov, Andrei Andreevich

Markov, Andrei Andreevich was born in 1856 in Ryazan and died in 1922 in St. Petersburg.
This student of Tchebychev, Pafnutii Lvovich made important contributions to the field of probability calculus. He gave a rigorous proof of the central limit theorem. Through his works on Markov chains, the concept of Markovian dependence evolved into the modern theory and application of random processes. The breadth of his work has influenced the development of probability and international statistics.
He studied mathematics at St. Petersburg University, where he received his bachelor degree, and his doctorate in 1878. He then taught, continuing the courses of Tchebychev, Pafnutii Lvovich after Tchebychev, P.L. left the university. In 1886, he was elected member of the School of Mathematics founded by Tchebychev, P.L. at the St. Petersburg Academy of Sciences. He became a full member of the Academy in 1896 and in 1905 retired from the university, where he continued teaching.
Markov and Liapunov were famous students of Tchebychev, P.L. and were ever striving to establish probability as an exact and practical mathematical science.
The first appearance of Markov chains in Markov's work occurred in the Izvestiia (Bulletin) of the Physico-Mathematical Society of Kazan University. Markov completed the proof of the central limit theorem that Tchebychev had started but did not finish. He approached it using the method of moments, and the proof was published in the third edition of Ischislenie Veroiatnostei (Probability Calculus).

Principal work of Markov, Andrei Andreevich:

1899 Application des fonctions continues au calcul des probabilités. Kasan. Bull. 9(2), 29–34.

FURTHER READING
• Central limit theorem
• Gauss–Markov theorem

REFERENCES
Markov, A.A.: Application des fonctions continues au calcul des probabilités. Kasan. Bull. 9(2), 29–34 (1899)

Matrix

A rectangular table with m rows and n columns is called an (m × n) matrix. The elements figuring in the table are called the coefficients or components of the matrix.

MATHEMATICAL ASPECTS
A matrix can be represented as follows:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} .$$

An element is characterized by its value and its position: aij is the element of matrix A where i indicates the row number and j the column number. A matrix with m rows and n columns is of order (m × n). There are different types of matrices:
• Square matrix of order n: a matrix is called a square matrix of order n if it has as many rows as columns (denoted by n).
• Null matrix: the matrix in which all the elements are null is called the null matrix. It is denoted by O.
• Identity matrix: for any square matrix of order n, the elements that have the same index for the row as for the column [meaning the elements in position (i, i)] form the main diagonal. An identity matrix is a square matrix that has "1" on the main diagonal and "0" everywhere else. It is denoted by I or sometimes by In, where n is the order of the identity matrix.
• Symmetric matrix: a square matrix A is symmetric if A = A′, where A′ is the transpose of matrix A.
• Antisymmetric matrix: a square matrix A is antisymmetric if A = −A′, where A′ is the transpose of matrix A; this condition implies that the elements on the main diagonal (aij with i = j) are null.
• Scalar matrix: a square matrix A is called a scalar matrix if aij = β for i = j and aij = 0 for i ≠ j, where β is a scalar.
• Diagonal matrix: a square matrix A is called a diagonal matrix if aij = 0 for i ≠ j, the diagonal elements being arbitrary.
• Triangular matrix: a square matrix A is called upper triangular if aij = 0 for i > j, and lower triangular if aij = 0 for i < j.

Operations on Matrices
• Addition of matrices: if A = (aij) and B = (bij) are two matrices of order (m × n), then their sum A + B is the matrix C = (cij) of order (m × n) of which each element is the sum of the corresponding elements of A and B:

C = (cij) = (aij + bij) .

• Multiplication of a matrix by a scalar: if α is a scalar and A a matrix, the product αA is the matrix of the same order as A obtained by multiplying each element aij of A by the scalar α:

αA = Aα = (α · aij) .

• Multiplication of matrices: let A and B be two matrices; the product AB is defined if and only if the number of columns of A is equal to the number of rows of B. If A is of order (m × n) and B of order (n × q), then the product AB is the matrix C of order (m × q) whose elements are obtained by calculating:

cij = ∑_{k=1}^{n} aik · bkj ,   for i = 1, ..., m and j = 1, ..., q .

This means that the elements of the ith row of A are multiplied by the corresponding elements of the jth column of B, and the results are added.
• Integer power of square matrices: let A be a square matrix; if p is a positive integer, the pth power of A is defined by:

A^p = A · A · ... · A   (p times) .

By convention, for p = 0 we set A^0 = I, where I is the identity matrix of the same order as A.
• Square root of a diagonal matrix: let A be a diagonal matrix with nonnegative coefficients. We denote by √A the square root of A, defined as the diagonal matrix of the same order as A in which each element is replaced by its square root:

$$\sqrt{A} = \begin{bmatrix} \sqrt{a_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{a_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \sqrt{a_{nn}} \end{bmatrix} .$$


DOMAINS AND LIMITATIONS
The use of matrices, or more generally of matrix calculus, is widespread in regression analysis, in analysis of variance, and in many other domains of statistics. It allows a system of equations to be described in the most concise manner and greatly simplifies the notation. For the same reasons, matrices appear in data analysis, where large data tables are generally encountered.

EXAMPLES
Let A and B be two square matrices of order 2:

$$A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 6 & 2 \\ -2 & 1 \end{bmatrix} .$$

The sum of the two matrices is:

$$A + B = \begin{bmatrix} 1+6 & 2+2 \\ 1-2 & 3+1 \end{bmatrix} = \begin{bmatrix} 7 & 4 \\ -1 & 4 \end{bmatrix} .$$

The multiplication of matrix A by 3 gives:

$$3A = \begin{bmatrix} 3 \cdot 1 & 3 \cdot 2 \\ 3 \cdot 1 & 3 \cdot 3 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 3 & 9 \end{bmatrix} .$$

The matrix product of A and B equals:

$$A \cdot B = \begin{bmatrix} 1 \cdot 6 + 2 \cdot (-2) & 1 \cdot 2 + 2 \cdot 1 \\ 1 \cdot 6 + 3 \cdot (-2) & 1 \cdot 2 + 3 \cdot 1 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 0 & 5 \end{bmatrix} .$$

Remark: A · B and B · A are generally not equal, which illustrates the noncommutativity of matrix multiplication.
The square of matrix A is obtained by multiplying A by A. Thus:

$$A^2 = A \cdot A = \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 3 & 8 \\ 4 & 11 \end{bmatrix} .$$

Let D be the following diagonal matrix of order 2:

$$D = \begin{bmatrix} 9 & 0 \\ 0 & 2 \end{bmatrix} .$$

The square root of the diagonal matrix D, denoted √D, is:

$$\sqrt{D} = \begin{bmatrix} \sqrt{9} & 0 \\ 0 & \sqrt{2} \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & \sqrt{2} \end{bmatrix} .$$

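These numerical examples can be reproduced directly with numpy:

```python
import numpy as np

A = np.array([[1, 2], [1, 3]])
B = np.array([[6, 2], [-2, 1]])

print(A + B)                          # [[ 7  4] [-1  4]]
print(3 * A)                          # [[3 6] [3 9]]
print(A @ B)                          # [[2 4] [0 5]]
print(np.linalg.matrix_power(A, 2))   # [[ 3  8] [ 4 11]]

D = np.diag([9, 2])
print(np.sqrt(D))                     # element-wise square root of the diagonal matrix
```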
FURTHER READING
• Correspondence analysis
• Data analysis
• Determinant
• Hat matrix
• Inertia matrix
• Inverse matrix
• Multiple linear regression
• Regression analysis
• Trace
• Transpose

REFERENCES
Balestra, P.: Calcul matriciel pour économistes. Editions Castella, Albeuve, Switzerland (1972)
Graybill, F.A.: Theory and Applications of the Linear Model. Duxbury, North Scituate, MA (Wadsworth and Brooks/Cole, Pacific Grove, CA) (1976)
Searle, S.R.: Matrix Algebra Useful for Statistics. Wiley, New York (1982)

Maximum Likelihood

The term maximum likelihood refers to a method of estimating the parameters of a population from a random sample. It is applied when we know the general form of the distribution of the population but one or more parameters of this distribution are unknown. The method consists in choosing as estimator of the unknown parameters the values that maximize the probability of obtaining the observed sample.

HISTORY
Generally the maximum likelihood method is attributed to Fisher, Ronald Aylmer. However, the method can be traced back to the works of the 18th-century scientists Lambert, J.H. and Bernoulli, D. Fisher, R.A. introduced the method of maximum likelihood in his first statistical publications in 1912.

MATHEMATICAL ASPECTS
Consider a sample of n random variables X1, X2, ..., Xn, where each random variable is distributed according to a probability distribution with density function f(x, θ), with θ ∈ Θ, where Θ is the parameter space. The joint density function of X1, X2, ..., Xn is

f(x1, θ) · f(x2, θ) · ... · f(xn, θ) .

For fixed x, this function is a function of θ and is called the likelihood function L:

L(θ; x1, x2, ..., xn) = f(x1, θ) · f(x2, θ) · ... · f(xn, θ) .

An estimator is the maximum likelihood estimator of θ if it maximizes the likelihood function. Note that L can be maximized by setting to zero its first partial derivative with respect to θ and solving the equations thus obtained. Moreover, since L and ln L are maximized at the same value of θ, it is often easier to solve:

∂ ln L(θ; x1, x2, ..., xn) / ∂θ = 0 .

Finally, maximum likelihood estimators are not always unique, and this method does not always give unbiased estimators.

EXAMPLES
Let X1, X2, ..., Xn be a random sample coming from the normal distribution with mean μ and variance σ². The maximum likelihood method of estimation consists in choosing as estimators of the unknown parameters μ and σ² the values that maximize the probability of having observed the sample.
Maximizing the likelihood function

$$L(\mu, \sigma^2; x_1, \ldots, x_n) = \frac{1}{(\sigma\sqrt{2\pi})^n} \exp\left(-\frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2}\right)$$

is equivalent to maximizing its natural logarithm:

$$\ln L = -\frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{\sum_{i=1}^n (x_i - \mu)^2}{2\sigma^2} .$$

We set the partial derivatives ∂ln L/∂μ and ∂ln L/∂σ equal to zero:

$$\frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = 0 , \qquad (1)$$

$$\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^n (x_i - \mu)^2 = 0 . \qquad (2)$$

From (1) we deduce:

$$\sum_{i=1}^n (x_i - \mu) = 0 \;\Rightarrow\; \sum_{i=1}^n x_i - n\mu = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x} .$$

Thus, for the normal distribution, the arithmetic mean is the maximum likelihood estimator of μ.
From (2) we get:

$$\frac{1}{\sigma^3}\sum_{i=1}^n (x_i - \mu)^2 = \frac{n}{\sigma} \;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \hat{\mu})^2 .$$

In this case the maximum likelihood estimator of the variance is the empirical variance of the sample.
Note that the maximum likelihood estimator of the mean is unbiased, whereas that of the variance is biased (division by n instead of n − 1). We find that for the normal distribution the mean and variance of the sample are the maximum likelihood estimators of the mean and variance of the population. This is not necessarily true for other distributions: consider, for example, the case of the uniform distribution on the interval [a, b], where a and b are unknown and where the parameter to be estimated is the mean of the distribution μ = (a + b)/2. One can verify that the maximum likelihood estimator of μ equals:

μ̂ = (x1 + xn)/2 ,

where x1 and xn are the smallest and the largest observations of the sample (x1, x2, ..., xn), respectively. This estimator is the midpoint of the interval with limits x1 and xn. We see that in this case the maximum likelihood estimator is different from the mean of the sample:

μ̂ ≠ x̄ .

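A numerical sketch of this idea (hypothetical sample, numpy/scipy only): maximizing the normal log-likelihood recovers the sample mean and the biased sample variance:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=100)    # hypothetical sample

def neg_log_likelihood(params):
    mu, log_sigma = params                       # optimize log(sigma) to keep sigma > 0
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])

print(mu_hat, x.mean())                          # MLE of mu equals the sample mean
print(sigma_hat**2, x.var(ddof=0))               # MLE of sigma^2 equals the biased variance
```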
FURTHER READING
• Estimation
• Estimator
• Likelihood ratio test
• Robust estimation

REFERENCES
Fisher, R.A.: On an absolute criterion for fitting frequency curves. Mess. Math. 41, 155–160 (1912)

Mean

The mean of a random variable is a central tendency measure of this variable. It is also called the expected value. In practice, the term mean is often used in the sense of the arithmetic mean.

HISTORY
See arithmetic mean.

MATHEMATICAL ASPECTS
Let X be a continuous random variable with density function f(x). The mean μ of X, if it exists, is given by:

μ = ∫ x · f(x) dx .

The mean of a discrete random variable X with probability function P(X = x), if it exists, is given by:

μ = ∑ x · P(X = x) .

DOMAINS AND LIMITATIONS
Empirically, the determination of the arithmetic mean of a set of observations gives a measure of the central value of this set. This measure provides an estimate of the mean μ of the random variable X from which the observations come.

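For instance, the mean of a fair die (a discrete variable taking the values 1 to 6 with probability 1/6 each) is quickly checked:

```python
values = range(1, 7)
mean = sum(x * (1 / 6) for x in values)   # sum of x * P(X = x)
print(mean)                               # 3.5
```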
FURTHER READING
• Arithmetic mean
• Expected value
• Measure of central tendency
• Measure of kurtosis
• Measure of skewness
• Weighted arithmetic mean

Mean Absolute Deviation

The mean absolute deviation of a set of quantitative observations is a measure of dispersion: the mean of the absolute values of the deviations of each observation from a measure of position of the distribution.

HISTORY
See L1 estimation.

MATHEMATICAL ASPECTS
Let x1, x2, ..., xn be a set of n quantities or of n observations relating to a quantitative variable X. By definition, the mean absolute deviation is calculated as follows:

$$E_M = \frac{\sum_{i=1}^n |x_i - \bar{x}|}{n} ,$$

where x̄ is the arithmetic mean of the observations. When the observations are arranged in a frequency distribution, the mean absolute deviation is calculated as follows:

$$E_M = \frac{\sum_{i=1}^k f_i\,|x_i - \bar{x}|}{\sum_{i=1}^k f_i} ,$$

where xi represents the different values of the variable, fi the frequencies associated with these values, and k the number of different values.

DOMAINS AND LIMITATIONS
The mean absolute deviation is a measure of dispersion that is more robust than the standard deviation.

EXAMPLES
Five students successively passed two exams for which they obtained the following grades:

Exam 1: 3.5  4  4.5  3.5  4.5   (x̄ = 20/5 = 4)
Exam 2: 2.5  5.5  3.5  4.5  4   (x̄ = 20/5 = 4)

The arithmetic mean x̄ of the grades is identical for both exams. Nevertheless, the variability of the grades (or dispersion) is not the same.
For exam 1, the mean absolute deviation is equal to:

(|3.5 − 4| + |4 − 4| + |4.5 − 4| + |3.5 − 4| + |4.5 − 4|) / 5 = 2/5 = 0.4 .

For exam 2, the mean absolute deviation is much higher:

(|2.5 − 4| + |5.5 − 4| + |3.5 − 4| + |4.5 − 4| + |4 − 4|) / 5 = 4/5 = 0.8 .

The grades of the second exam are therefore more dispersed around the arithmetic mean than the grades of the first exam.

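The same computation with numpy:

```python
import numpy as np

exam1 = np.array([3.5, 4.0, 4.5, 3.5, 4.5])
exam2 = np.array([2.5, 5.5, 3.5, 4.5, 4.0])

for grades in (exam1, exam2):
    mad = np.mean(np.abs(grades - grades.mean()))   # mean absolute deviation from the mean
    print(grades.mean(), mad)                        # 4.0 0.4 and 4.0 0.8
```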
FURTHER READING
• L1 estimation
• Measure of dispersion
• Standard deviation
• Variance

Mean Squared Error

Let θ̂ be an estimator of the parameter θ. The squared error made in estimating θ by θ̂ is (θ̂ − θ)².
The mean squared error (MSE) is the mathematical expectation of this quantity:

MSE(θ̂) = E[(θ̂ − θ)²] .

One can prove that the mean squared error equals the sum of the variance of the estimator and of the square of its bias:

MSE(θ̂) = Var(θ̂) + (E[θ̂] − θ)² .

Thus, when the considered estimator is unbiased, the mean squared error corresponds exactly to the variance of the estimator.


HISTORY
In the early 19th century, de Laplace, Pierre Simon (1820) and Gauss, Carl Friedrich (1821) compared the estimation of an unknown quantity, based on observations with random errors, to a game of chance. They drew a parallel between the error of the estimated value and the loss due to such a game. Gauss, Carl Friedrich proposed as a measure of inexactitude the square of the error. He recognized that this choice was arbitrary, but he justified it by the mathematical simplicity of the squaring function.

MATHEMATICAL ASPECTS
Let X1, X2, ..., Xn be a sample of size n and θ̂ an estimator of the parameter θ; the squared deviation of the estimation is given by (θ̂ − θ)². The mean squared error is defined by:

MSE(θ̂) = E[(θ̂ − θ)²] .

We can prove that:

MSE(θ̂) = Var(θ̂) + (E[θ̂] − θ)² ,

that is, the mean squared error equals the sum of the variance of the estimator and the square of the bias. In fact:

MSE(θ̂) = E[(θ̂ − θ)²]
        = E[(θ̂ − E[θ̂] + E[θ̂] − θ)²]
        = E[(θ̂ − E[θ̂])² + (E[θ̂] − θ)² + 2 · (θ̂ − E[θ̂]) · (E[θ̂] − θ)]
        = E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)² + 2 · (E[θ̂] − θ) · E[θ̂ − E[θ̂]]
        = Var(θ̂) + (bias)² ,

where the bias is E[θ̂] − θ and the cross term vanishes because E[θ̂ − E[θ̂]] = 0.
Thus when the estimator θ̂ is unbiased, that is, E[θ̂] = θ, we get:

MSE(θ̂) = Var(θ̂) .

The efficiency of an estimator is inversely proportional to its mean squared error. We say that the estimator θ̂ is consistent if

lim (n→∞) MSE(θ̂) = 0 .

In this case the variance and the bias of the estimator tend to 0 when the number of observations n tends to infinity.
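The decomposition MSE = Var + (bias)² can be checked numerically. The following Python sketch, added here as an illustration, simulates two estimators of a normal mean: the sample mean and a deliberately shrunken (hence biased) version of it; the estimator 0.9·x̄ and all numerical settings are hypothetical choices made for the example:

import random

random.seed(1)
theta = 5.0                         # true parameter (mean of the population)
n, trials = 20, 10000
est_unbiased, est_biased = [], []
for _ in range(trials):
    sample = [random.gauss(theta, 2.0) for _ in range(n)]
    xbar = sum(sample) / n
    est_unbiased.append(xbar)       # unbiased estimator of theta
    est_biased.append(0.9 * xbar)   # shrunken estimator, bias = -0.1 * theta

def mse(est, theta):
    return sum((e - theta) ** 2 for e in est) / len(est)

def var_plus_bias2(est, theta):
    m = sum(est) / len(est)
    var = sum((e - m) ** 2 for e in est) / len(est)
    return var + (m - theta) ** 2

# For each estimator the two printed numbers agree: MSE = Var + (bias)^2
print(mse(est_unbiased, theta), var_plus_bias2(est_unbiased, theta))  # both near 4/20 = 0.2
print(mse(est_biased, theta), var_plus_bias2(est_biased, theta))      # both near 0.162 + 0.25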

DOMAINS AND LIMITATIONS
Certain authors prefer to use the mean absolute deviation defined by E[|θ̂ − θ|], where θ̂ is the estimator of the parameter θ. The disadvantage of this definition lies in its mathematical properties, because the absolute value function cannot be differentiated everywhere.

EXAMPLES
We consider a multiple linear regression in matrix form:

Y = X · β + ε ,

where
Y is a vector of order n (the n observations of the dependent variable),
X is a given n × p matrix,
β is the vector of order p of the coefficients to be determined, and


ε is the vector of errors, which we suppose are independent and each follow a normal distribution with mean 0 and variance σ².

The vector β̂ = (X′ · X)⁻¹ · X′ · Y gives an unbiased estimator of β. We then obtain:

MSE(β̂) = Var(β̂) .

FURTHER READING
• Bias
• Estimator
• Expected value
• Variance

REFERENCES
Gauss, C.F.: In: Gauss's Work (1803–1826) on the Theory of Least Squares. Trans. Trotter, H.F. Statistical Techniques Research Group, Technical Report No. 5. Princeton University Press, Princeton, NJ (1821)

Laplace, P.S. de: Théorie analytique des probabilités, 3rd edn. Courcier, Paris (1820)

Measure of Central Tendency

A measure of central tendency is a statistic that summarizes a set of data relative to a quantitative variable. More precisely, it allows one to determine a fixed value, called a central value, around which the set of data has a tendency to group. The principal measures of central tendency are:
• Arithmetic mean
• Median
• Mode

MATHEMATICAL ASPECTS
The mean μ, median Md, and mode Mo are measures of the center of a distribution.
In a symmetric distribution, the mean, median, and mode coincide in a common value. This value gives the center of symmetry of the distribution. For a symmetric unimodal distribution, we have:

μ = Md = Mo .

In this case, an estimation of the center of the distribution can be given by the mean, median, or mode.
In an asymmetric distribution, the different measures of central tendency have different values. (Their comparison can be used as a measure of skewness.)
For a unimodal distribution stretched to the right, we generally have:

μ ≥ Md ≥ Mo .

This relation is inverted when the distribution is unimodal and stretched to the left:

μ ≤ Md ≤ Mo .

For a unimodal, moderately asymmetric distribution, it has been empirically shown that


the mode Mo, the median Md, and the mean μ often satisfy, in an approximate manner, the following relation:

Mo − μ = 3 · (Md − μ) .

DOMAINS AND LIMITATIONS
In practice, the choice of the measure that best characterizes the center of a set of observations depends on the specific features of the statistical study and the information we wish to obtain.
Let us compare the arithmetic mean, the mode, and the median according to different criteria (the minimization properties quoted below are illustrated in the sketch after this list):
1. The arithmetic mean:
• Depends on the value of all observations.
• Is simple to interpret.
• Is the most familiar and the most used measure.
• Is frequently used as an estimator of the mean of the population.
• Has a value that can be distorted by outliers.
• In addition, the sum of squared deviations between each observation xi of a set of data and a value α is minimal when α equals the arithmetic mean:

min_α Σ_{i=1}^{n} (xi − α)²  ⇒  α = arithmetic mean.

2. The median:
• Is easy to determine because only one ranking of the data is needed.
• Is easy to understand but less used than the arithmetic mean.
• Is not influenced by outliers, which gives it an advantage over the arithmetic mean if the series really has outliers.
• Is used as an estimator of the central value of a distribution, especially when it is asymmetric or has outliers.
• The sum of deviations in absolute value between each observation xi of a set of data and a value α is minimal when α equals the median:

min_α Σ_{i=1}^{n} |xi − α|  ⇒  α = Md .

3. The mode:
• Has practical interest because it is the most represented value of a set.
• Is nevertheless a rarely used measure.
• Has a value that is little influenced by outliers.
• Has a value that is strongly influenced by sampling fluctuations; it can vary strongly from one sample to another.
In addition, there can be many (or no) modes in a data set.
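The following Python sketch, added here as an illustration with a small hypothetical data set, scans a grid of candidate values α and confirms that the sum of squared deviations is smallest near the arithmetic mean while the sum of absolute deviations is smallest near the median:

# Minimization properties of the arithmetic mean and the median
data = [1, 2, 2, 3, 10]                      # hypothetical observations with an outlier
mean = sum(data) / len(data)                 # 3.6
median = sorted(data)[len(data) // 2]        # 2 (odd number of observations)

def sum_sq(alpha):  return sum((x - alpha) ** 2 for x in data)
def sum_abs(alpha): return sum(abs(x - alpha) for x in data)

grid = [i / 100 for i in range(0, 1101)]     # candidate values alpha from 0 to 11
best_sq = min(grid, key=sum_sq)              # minimizer of the squared deviations
best_abs = min(grid, key=sum_abs)            # minimizer of the absolute deviations
print(mean, best_sq)                         # 3.6 3.6
print(median, best_abs)                      # 2 2.0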

FURTHER READING
• Arithmetic mean
• Expected value
• Mean
• Measure of skewness
• Median
• Mode
• Weighted arithmetic mean

REFERENCES
Stuart, A., Ord, J.K.: Kendall's Advanced Theory of Statistics. Vol. I. Distribution Theory. Wiley, New York (1943)

Dodge, Y.: Premiers pas en statistique. Springer, Paris (1999)


Measure of Dispersion

A measure of dispersion describes a set of data concerning a particular variable, giving an indication of the variability of the values inside the data set. The measure of dispersion completes the description given by the measure of central tendency of a distribution.
If we observe different distributions, we can see that for some of them all the data are grouped at a more or less short distance from the central value; for others the distance is much greater.
We can class the measures of dispersion into two groups:
1. Measures defined by the distance between two representative values of the distribution:
• Range, also called the interval of variation
• Interquartile range
2. Measures calculated from the deviations of each datum from a central value:
• Geometric deviation
• Median deviation
• Mean absolute deviation
• Standard deviation
Among the measures of dispersion, the most important and most used is the standard deviation.

DOMAINS AND LIMITATIONS
Beyond the descriptive aspect, which allows one to characterize a distribution or compare many distributions, knowledge of the dispersion or of the variability of a distribution can have considerable practical interest. What would happen if our decisions were based on the mean alone?
• Roads would be constructed to handle only the mean traffic, and the traffic jams during holidays would be immeasurable.
• Houses would be constructed in such a way as to resist the mean force of wind, with the consequences that would result in the case of a strong storm.
These examples show that in certain cases, having a measure of central tendency is not enough to be able to make a reliable decision. Moreover, the measure of dispersion is an indication of how representative the central tendency measure is. The smaller the variability of a given set, the more the value of the measure of central tendency will be representative of the data set.

FURTHER READING
• Coefficient of variation
• Interquartile range
• Mean absolute deviation
• Measure of central tendency
• Range
• Standard deviation
• Variance

REFERENCES
Dodge, Y.: Premiers pas en statistique. Springer, Paris (1999)

Measure of Dissimilarity

A measure of dissimilarity is a distance-like measure that assigns to each pair of objects a real positive (or zero) number. If we suppose, for example, that each object is characterized by n variables, a measure of dissimilarity between two objects could consist in giving the number of points on which the two considered objects differ.


This number is a whole number between 0 and n and equals n when the two objects have no common point.

MATHEMATICAL ASPECTS
Let Ω be a population on which we would like to define a measure of dissimilarity s′. We call a measure of dissimilarity s′ any mapping from Ω × Ω to the real positive (or zero) numbers satisfying the following two properties:
• For each w, w′ in Ω, s′(w, w′) = s′(w′, w); that is, s′ is symmetric.
• For each w in Ω, s′(w, w) = 0; that is, the measure of dissimilarity between an object and itself is zero.

DOMAINS AND LIMITATIONS
A measure of dissimilarity s′ on a finite population Ω is a bounded mapping, that is:

max{s′(w, w′) ; w, w′ ∈ Ω} = m < ∞ .

We can then define a measure of similarity on the population Ω by:

s(w, w′) = m − s′(w, w′) .

These two notions, measure of similarity and measure of dissimilarity, can therefore be considered complementary.
We use the measure of dissimilarity to solve problems of classification, instead of a distance, which is a particular case of it and which also has the so-called triangular inequality property.

EXAMPLES
We consider the population Ω composed of three flowers,

Ω = {w1, w2, w3} ,

on which the following three variables were measured:
• v1: color of flower
• v2: number of petals
• v3: number of leaves
where v1 is summarized here as a dichotomous variable because we consider only red (R) and yellow (Y) flowers, and v2 and v3 are discrete random variables taking their values in the cardinal numbers.
We have observed:

       v1    v2    v3
w1     R     5     2
w2     Y     4     2
w3     R     5     1

We determine for each pair of flowers the number of points that distinguish them. We see that we have a measure of dissimilarity for the set of three flowers:

s′(w1, w1) = s′(w2, w2) = s′(w3, w3) = 0

because a flower is not different from itself;

s′(w1, w2) = s′(w2, w1) = 2

because the color and the number of petals differ between w1 and w2;

s′(w1, w3) = s′(w3, w1) = 1

as only the number of leaves distinguishes w1 from w3;

s′(w2, w3) = s′(w3, w2) = 3

as w2 and w3 differ in each of the three studied characteristics.
With the help of these values, we can verify that s′ satisfies the two properties of a measure of dissimilarity.
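This count of differing characteristics is straightforward to compute. The Python sketch below, an added illustration using the flower data above, builds s′ as the number of variables on which two objects differ:

# Dissimilarity s' = number of variables on which two objects differ
flowers = {
    "w1": ("R", 5, 2),
    "w2": ("Y", 4, 2),
    "w3": ("R", 5, 1),
}

def dissimilarity(a, b):
    return sum(1 for x, y in zip(a, b) if x != y)

for i in ("w1", "w2", "w3"):
    for j in ("w1", "w2", "w3"):
        print(i, j, dissimilarity(flowers[i], flowers[j]))
# s'(w1,w2) = 2, s'(w1,w3) = 1, s'(w2,w3) = 3, and s'(w,w) = 0 for every flower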


FURTHER READING
• Classification
• Distance
• Measure of similarity
• Population

REFERENCES
Celeux, G., Diday, E., Govaert, G., Lechevallier, Y., Ralambondrainy, H.: Classification automatique des données—aspects statistiques et informatiques. Dunod, Paris (1989)

Hand, D.J.: Discrimination and classification. Wiley, New York (1981)

Measure of Kurtosis

The measures of kurtosis are part of the measures of shape and characterize an aspect of the form of a given distribution. More precisely, they characterize the degree of kurtosis of the distribution relative to a normal distribution. Certain distributions are close to the normal distribution without being totally identical to it, so it is very useful to be able to test whether the form of a distribution deviates in kurtosis from the normal distribution. We speak of a platykurtic distribution if the curve is more flattened than the normal curve and of a leptokurtic distribution if it is sharper than the normal curve. To test the kurtosis of a curve, we use the coefficient of kurtosis.

MATHEMATICAL ASPECTS
Given a random variable X with probability distribution fX, the coefficient of kurtosis is defined as:

γ2 = μ4/σ⁴ − 3 ,

where μ4 stands for the fourth moment about the mean and σ⁴ is the squared variance. The sample kurtosis is obtained by using the sampling analogues, that is:

γ̂2 = μ̂4/σ̂⁴ − 3 .
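As an added illustration (not part of the original entry), the sample coefficient of kurtosis can be computed from the central moments of the observations; a large normal sample should give a value near 0, while a uniform sample is platykurtic:

import random

def sample_kurtosis(obs):
    n = len(obs)
    m = sum(obs) / n
    m2 = sum((x - m) ** 2 for x in obs) / n      # second central moment (variance)
    m4 = sum((x - m) ** 4 for x in obs) / n      # fourth central moment
    return m4 / m2 ** 2 - 3

random.seed(0)
normal_sample = [random.gauss(0, 1) for _ in range(100000)]
uniform_sample = [random.random() for _ in range(100000)]
print(sample_kurtosis(normal_sample))    # close to 0
print(sample_kurtosis(uniform_sample))   # close to -1.2 (platykurtic)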

FURTHER READING
• Coefficient of kurtosis
• Measure of shape
• Measure of skewness

REFERENCES
Joanes, D.N., Gill, C.A.: Comparing measures of sample skewness and kurtosis. J. Roy. Stat. Soc. Ser. D (The Statistician) 47(1), 183–189 (1998)

Measure of Location

A measure of location (or of position) is a measure that synthesizes a set of statistical data by a single fixed value, highlighting a particular point of the studied variable. The most frequently used measures of position are:
• Measures of central tendency (mean, mode, median), which determine the center of a set of data.
• The quantiles (quartile, decile, centile), which determine not the center but a particular position of a subset of the data.

DOMAINS AND LIMITATIONS
The measures of location, as well as those of dispersion and of shape, are used especially to compare several sets of data or several frequency distributions.

FURTHER READING
• Measure of central tendency
• Quantile


Measure of Shape

We can try to characterize the form of a frequency distribution with the help of appropriate coefficients. Certain frequency distributions are close to the normal distribution ("bell curve"); other distributions can present an asymmetry or a kurtosis relative to the normal curve. The two measures of shape are:
• Measure of asymmetry (skewness)
• Measure of kurtosis.

HISTORY
Pearson, Karl (1894, 1895) was the first to test the differences between certain distributions and the normal distribution. He showed that deviations from the normal curve can be characterized by the moments of order 3 and 4 of a distribution. Before 1890, Gram, Jorgen Pedersen and Thiele, Thorvald Nicolai of Denmark developed a theory about the symmetry of frequency curves. After Pearson, Karl published his sophisticated and extremely interesting system (1894, 1895), many articles were published on this subject.

FURTHER READING
• Measure of dispersion
• Measure of kurtosis
• Measure of location
• Measure of skewness

REFERENCES
Pearson, K.: Contributions to the mathematical theory of evolution. I. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 1–40 (1948). First published in 1894 as: On the dissection of asymmetrical frequency curves. Philos. Trans. Roy. Soc. Lond. Ser. A 185, 71–110

Pearson, K.: Contributions to the mathematical theory of evolution. II: Skew variation in homogeneous material. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 41–112 (1948). First published in 1895 in Philos. Trans. Roy. Soc. Lond. Ser. A 186, 343–414

Measure of Similarity

A measure of similarity is a distance-like measure that assigns a real positive (or zero) number to each pair of objects of the studied population. If we suppose, for example, that each object is characterized by n variables, a measure of similarity between two objects could consist in giving the number of points common to the two considered objects. This number is a whole number between 0 and n and equals n when the two objects are identical.

MATHEMATICAL ASPECTS
Let Ω be a population on which we would like to define a measure of similarity s. We call a measure of similarity s any mapping from Ω × Ω to the real positive (or zero) numbers satisfying:
• For each w, w′ in Ω, s(w, w′) = s(w′, w); that is, s is symmetric.
• For each w, w′ in Ω, with w ≠ w′, s(w, w) = s(w′, w′) > s(w, w′); that is, an object is more "similar" to itself than to another object.

DOMAINS AND LIMITATIONS
A measure of similarity s on a population Ω is a bounded mapping, that is:


max{s(w, w′) ; w, w′ ∈ Ω} = s(w, w) < ∞ ,

where w is any object of Ω; if we set m = s(w, w), then we can define a measure of dissimilarity on the population Ω by:

s′(w, w′) = m − s(w, w′) .

These two notions, measure of similarity and measure of dissimilarity, can therefore be considered complementary.
We use measures of similarity, as well as measures of dissimilarity, for classification problems.

EXAMPLES
Let the population Ω be formed of three flowers,

Ω = {w1, w2, w3} ,

on which we measure the three variables:
• v1: color of flower
• v2: number of petals
• v3: number of leaves
where v1 is summarized here as a dichotomous variable because we consider only red (R) and yellow (Y) flowers, and v2 and v3 are discrete random variables taking their values in the cardinal numbers.
We have observed:

       v1    v2    v3
w1     R     5     2
w2     Y     4     2
w3     R     5     1

We determine for each pair of flowers the number of points that they have in common. We see that we have a measure of similarity on the set of three flowers:

s(w1, w1) = s(w2, w2) = s(w3, w3) = 3 ,

which corresponds to the number of variables;

s(w1, w2) = s(w2, w1) = 1

because the color and the number of petals are different for w1 and w2;

s(w1, w3) = s(w3, w1) = 2

as only the number of leaves allows us to distinguish w1 from w3;

s(w2, w3) = s(w3, w2) = 0

as w2 and w3 have no common points.
With the help of these values, we can verify that s satisfies the two properties of a measure of similarity.

FURTHER READING
• Distance
• Measure of dissimilarity
• Population

REFERENCES
See measure of dissimilarity.

Measure of Skewness

In a symmetric distribution, the median, arithmetic mean, and mode are located at the same central point. This is not true when the distribution is skewed. In that case, the mode is separated from the arithmetic mean, and the median lies between the two. In consequence, it is necessary to develop measures of skewness to study the degree of deviation of the form of the distribution from a symmetric distribution. The principal skewness measures are:
• Yule and Kendall coefficient
• Coefficient of skewness


MATHEMATICAL ASPECTS
Given a random variable X with probability distribution fX, the coefficient of skewness is defined as:

γ1 = μ3/σ³ ,

where μ3 stands for the third moment about the mean and σ³ is the cube of the standard deviation.

HISTORY
See coefficient of skewness.

FURTHER READING
• Coefficient of skewness
• Measure of kurtosis
• Measure of shape
• Yule and Kendall coefficient

Median

The median is a measure of central tendency defined as the value that lies in the center of a set of observations ordered in increasing or decreasing order.
For a random variable X, the median, denoted by Md, has the property that:

P(X ≤ Md) ≥ 1/2   and   P(X ≥ Md) ≥ 1/2 .

We then find 50% of the observations on each side of the median.
If we have an odd number of observations, the median corresponds to the value of the middle observation. If we have an even number of observations, there is no unique observation in the middle; the median is then given by the arithmetic mean of the values of the two middle observations of the ordered set.

HISTORY
In 1748, Euler, Leonhard and Mayer, Johann Tobias proposed, independently, a method that consists in dividing the observations of a data set into two equal parts.

MATHEMATICAL ASPECTS
When the observations are given individually, the process of calculating the median is simple:
1. Arrange the n observations in increasing or decreasing order.
2. If the number of observations is odd:
• Find the middle observation, of rank (n + 1)/2.
• The median equals the value of this middle observation.
3. If the number of observations is even:
• Find the two middle observations, of ranks n/2 and n/2 + 1.
• The median equals the arithmetic mean of the values of these two observations.
When we do not have individual observations but data grouped into classes, the median is determined as follows:
1. Determine the median class (the one that contains the (n/2)-th observation).
2. Calculate the median using the following formula:

Md = L + [(n/2 − Σ finf) / fMd] · c ,

where
L is the lower limit of the median class,
n is the total number of observations,


Σ finf is the sum of the frequencies of the classes below the median class,
fMd is the frequency of the median class, and
c is the length of the interval of the median class.
This formula is based on the hypothesis that the observations are uniformly distributed inside each class. It also supposes that the lower and upper limits of the median class are defined and known.

Properties
The sum of deviations in absolute value between each observation xi of a data set and a value α is minimal when α equals the median:

min_α Σ_{i=1}^{n} |xi − α|  ⇒  α = Md .

DOMAINS AND LIMITATIONS
In the calculation of the arithmetic mean, we take into account the value of each observation. Thus an outlier strongly influences the arithmetic mean. On the other hand, in the calculation of the median, we take into account only the rank of the observations; thus outliers do not influence the median. When we have a strongly skewed frequency distribution, we prefer to choose the median as a measure of central tendency rather than the arithmetic mean, in order to neutralize the effect of the extreme values.

EXAMPLES
The median of the set of five numbers 42, 25, 75, 40, and 20 is 40 because 40 is the middle number when the numbers are ordered:

20  25  40  42  75 ,   median = 40 .

If we add the number 62 to this set, the number of observations is even, and the median equals the arithmetic mean of the two middle observations:

20  25  40  42  62  75 ,   median = (40 + 42)/2 = 41 .

We consider now the median of data grouped into classes. Let us take, for example, the following frequency table, which represents the daily revenue of 50 grocery stores:

Class (revenue    Frequency fi (number    Cumulated
in euros)         of grocery stores)      frequency
500–550                    3                   3
550–600                   12                  15
600–650                   17                  32
650–700                    8                  40
700–750                    6                  46
750–800                    4                  50
Total                     50

The median class is the class 600–650, as it contains the middle observation (the 25th observation). Supposing that the observations are uniformly distributed in each class, the median equals:

Md = 600 + [(50/2 − 15)/17] · (650 − 600) = 629.41 .

This result means that 50% of the grocery stores have daily revenues in excess of 629.41 euros and the other 50% have revenues of less than 629.41 euros.
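The grouped-data formula lends itself to a short computation. The following Python sketch, added here as an illustration, reproduces the value 629.41 from the grocery-store table:

# Median of data grouped into classes, assuming uniform distribution within each class
classes = [(500, 550, 3), (550, 600, 12), (600, 650, 17),
           (650, 700, 8), (700, 750, 6), (750, 800, 4)]
n = sum(f for _, _, f in classes)            # 50

cum = 0
for lower, upper, f_md in classes:
    if cum + f_md >= n / 2:                  # median class found
        f_inf = cum                          # sum of frequencies below the median class
        md = lower + (n / 2 - f_inf) / f_md * (upper - lower)
        break
    cum += f_md
print(round(md, 2))                          # 629.41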


FURTHER READING
• Mean
• Measure of central tendency
• Mode
• Quantile

REFERENCES
Boscovich, R.J.: De Litteraria Expeditione per Pontificiam ditionem, et Synopsis amplioris Operis, ac habentur plura eius ex exemplaria etiam sensorum impressa. Bononiensi Scientiarium et Artium Instituto Atque Academia Commentarii, vol. IV, pp. 353–396 (1757)

Euler, L.: Recherches sur la question des inégalités du mouvement de Saturne et de Jupiter, pièce ayant remporté le prix de l'année 1748, par l'Académie royale des sciences de Paris. Republié en 1960, dans Leonhardi Euleri, Opera Omnia, 2ème série. Turici, Bâle, 25, pp. 47–157 (1749)

Mayer, T.: Abhandlung über die Umwälzung des Monds um seine Achse und die scheinbare Bewegung der Mondflecken. Kosmographische Nachrichten und Sammlungen auf das Jahr 1748 1, 52–183 (1750)

Median Absolute Deviation

The median absolute deviation of a set of quantitative observations is a measure of dispersion. It corresponds to the mean of the absolute values of the deviations of each observation relative to the median.

HISTORY
See L1 estimation.

MATHEMATICAL ASPECTS
Let x1, x2, ..., xn be a set of n observations relative to a quantitative variable X. The median absolute deviation, denoted by EMAD, is calculated as follows:

EMAD = (Σ_{i=1}^{n} |xi − Md|) / n ,

where Md is the median of the observations.

DOMAINS AND LIMITATIONS
The median absolute deviation is a fundamental notion of L1 estimation. The L1 estimation of a parameter of central tendency θ is a method of estimation based on minimizing the sum of the absolute deviations of the observations xi from the parameter θ:

Σ_{i=1}^{n} |xi − θ| .

The value that minimizes this expression is the median of the sample. In the case of data containing outliers, this method is more efficient than the least-squares method.

FURTHER READING
• L1 estimation
• Measure of dispersion
• Standard deviation
• Variance

Method of Moments

The method of moments is used for estimating the parameters of a distribution from a sample. The idea is as follows. If the number of parameters to be estimated equals k, we set the first k moments of the distribution (whose expressions depend on the unknown parameters) equal to the corresponding empirical moments, that is, to the


estimators of the moments of order 1 to k calculated on the sample. We then solve the resulting system of k equations in k unknowns.

HISTORY
In the late 19th century, Pearson, Karl used the method of moments to estimate the parameters of a normal mixture model (with parameters p1, p2, μ1, μ2, σ1², σ2²) given by

P(x) = Σ_{i=1}^{2} pi · gi(x) ,

where the gi are normal densities with mean μi and variance σi². The method was further developed and studied by Chuprov, A.A. (1874–1926), Thiele, Thorvald Nicolai, Fisher, Ronald Aylmer, and Pearson, Karl, among others.

MATHEMATICAL ASPECTS
Let θ1, ..., θk be the k parameters to be estimated for the random variable X with a given distribution. We denote by μj(θ1, ..., θk) = E[X^j] the moment of order j of X expressed as a function of the parameters to be estimated. The method of moments consists in solving the following system of equations:

μj(θ̂1, ..., θ̂k) = mj   for j = 1, ..., k ,

where mj is the empirical moment of order j. To distinguish between the real parameters and the estimators, we denote the latter by θ̂1, ..., θ̂k.

EXAMPLES
1. Let X be a random variable following an exponential distribution with unknown parameter θ. We have n observations x1, ..., xn of X, and we want to estimate θ. As the moment of order 1 is the mean of the distribution, to estimate θ we should equate the theoretical and the empirical mean. We know that (see exponential distribution) the (theoretical) mean equals 1/θ. The desired estimator θ̂ therefore satisfies:

1/θ̂ = (1/n) Σ_{i=1}^{n} xi .

2. According to the method of moments, the mean μ and the variance σ² of a random variable X following a normal distribution are estimated by the empirical mean x̄ and the variance

(1/n) Σ_{i=1}^{n} (xi − x̄)²

of a sample {x1, x2, ..., xn} of X.
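A small simulation makes the procedure concrete. In the following Python sketch, added as an illustration with hypothetical settings, an exponential sample generated with θ = 2 yields a moment estimator θ̂ = 1/x̄ close to 2:

import random

random.seed(0)
theta = 2.0                                    # true parameter of the exponential distribution
sample = [random.expovariate(theta) for _ in range(100000)]

# Method of moments: equate the theoretical mean 1/theta to the empirical mean
xbar = sum(sample) / len(sample)
theta_hat = 1 / xbar
print(theta_hat)                               # close to 2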

DOMAINS AND LIMITATIONS
First, we should check whether the system of equations given previously has solutions. It may have no solution if, for example, the parameters to be estimated are subject to one or more particular constraints. In any case, the method does not guarantee particularly efficient estimators. For example, the estimator found for the variance of a normal distribution is biased.

FURTHER READING
• Chi-square distance
• Estimator
• Least squares
• Maximum likelihood
• Moment
• Parameter
• Sample

REFERENCES
Fisher, R.A.: Moments and product moments of sampling distributions. Proc. Lond. Math. Soc. 30, 199–238 (1929)


Fisher, R.A.: The moments of the distribution for normal samples of measures of departure from normality. Proc. Roy. Soc. Lond. Ser. A 130, 16–28 (1930)

Kendall, M.G., Stuart, A.: The Advanced Theory of Statistics. Vol. 1. Distribution Theory, 4th edn. Griffin, London (1977)

Pearson, K.: Contributions to the mathematical theory of evolution. I. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 1–40 (1948). First published in 1894 as: On the dissection of asymmetrical frequency curves. Philos. Trans. Roy. Soc. Lond. Ser. A 185, 71–110

Thiele, T.N.: Theory of Observations. C. and E. Layton, London; Ann. Math. Stat. 2, 165–308 (1903)

Missing Data

Missing data occur in statistics when certain outcomes are not present. This happens when data are not available (the data were destroyed by a saving error, an animal died during an experiment, etc.).

HISTORY
Allan, F.E. and Wishart, J. (1930) were apparently the first to consider the problem of missing data in the analysis of experimental designs. Their work was followed by that of Yates, Frank (1933). Allan, F.E. and Wishart, J., as well as Yates, F., tried to find an estimate of missing data by the method of iteration. In a situation where many observations are missing, the proposed method does not give a satisfactory solution. Bartlett, M.S. (1937) used the analysis of covariance to solve the same problem.
Since then, many other methods have been proposed, among them the method of Rubin, D.B. (1972). Birkes, D. et al. (1976) presented an exact method for solving the problem of missing data using the incidence matrix. The generalized matrix approach was proposed by Dodge, Y. and Majumdar, D. (1979). More details concerning the historical aspects of missing data from 1933 to 1985 can be found in Dodge, Y. (1985).

MATHEMATICAL ASPECTS
We can solve the problem of missing data by repeating the experiment under the same conditions and so obtain values for the missing data. Such a solution, despite the fact that it is ideal, is not always economically and materially possible.
There also exist other approaches to solving the problem of missing data in experimental design. We mention the approach by the generalized inverse or by the incidence matrix, which indicates which observations are missing in the design.

FURTHER READING
• Data
• Design of experiments
• Experiment
• Observation

REFERENCES
Allan, F.E., Wishart, J.: A method of estimating the yield of a missing plot in field experimental work. J. Agricult. Sci. 20, 399–406 (1930)


Bartlett, M.S.: Some examples of statistical methods of research in agriculture and applied biology. J. Roy. Stat. Soc. (Suppl.) 4, 137–183 (1937)

Birkes, D., Dodge, Y., Seely, J.: Spanning sets for estimable contrasts in classification models. Ann. Stat. 4, 86–107 (1976)

Dodge, Y.: Analysis of Experiments with Missing Data. Wiley, New York (1985)

Dodge, Y., Majumdar, D.: An algorithm for finding least square generalized inverses for classification models with arbitrary patterns. J. Stat. Comput. Simul. 9, 1–17 (1979)

Rubin, D.B.: A non-iterative algorithm for least squares estimation of missing values in any analysis of variance design. Appl. Stat. 21, 136–141 (1972)

Yates, F.: The analysis of replicated experiments when the field results are incomplete. Empire J. Exp. Agricult. 1, 129–142 (1933)

Missing Data Analysis

See missing data.

Mode

The mode is a measure of central tendency. The mode of a set of observations is the value of the observations that has the highest frequency. According to this definition, a distribution can have a unique mode (a unimodal distribution). In some situations a distribution may have several modes (a bimodal, trimodal, or, more generally, multimodal distribution).

MATHEMATICAL ASPECTS
Consider a random variable X whose density function is f(x). Mo is a mode of the distribution if f(x) has a local maximum at Mo.
Empirically, the mode of a set of observations is determined as follows:
1. Establish the frequency distribution of the set of observations.
2. The mode or modes are the values whose frequency is greater than or equal to the frequency of all the other observations.
For a distribution of observations grouped into classes of the same length whose limits are known, the mode is determined as follows:
1. Determine the modal classes, that is, the classes with a frequency greater than or equal to the frequencies of the other classes. (If several classes have the same frequency, we group these classes in such a way as to calculate only one mode for this set of classes.)
2. Calculate the mode Mo taking into account the adjacent classes:

Mo = L + (d1/(d1 + d2)) · c ,

where
L is the lower limit of the modal class,
d1 is the difference between the frequency of the modal class and the frequency of the previous class,
d2 is the difference between the frequency of the modal class and the frequency of the next class, and
c is the length of the modal class, common to all the classes.


If the length of the classes is not identical, the mode must be calculated by modifying the division of the classes in order to obtain, if possible, classes of equal length.

DOMAINS AND LIMITATIONS
For a discrete variable, the mode has the advantage of being easy to determine and interpret. It is used especially when the distribution is not symmetric.
For a continuous variable, the determination of the mode is not always precise because it depends on the grouping into classes. Generally, the mode is a good indicator of the center of the data only if there is one dominant value in the set of data. If there are many dominant values, the distribution is called plurimodal; in this case, the modes are not good measures of central tendency. A bimodal distribution generally indicates that the considered population is in reality heterogeneous and is composed of two subpopulations having different central values.

EXAMPLES
Let the set be the following numbers:

1 2 2 3 3 3 4 4 4 5 5 5 5 6 6 7 7 8 9 .

The frequency table derived from this set is:

Values     1  2  3  4  5  6  7  8  9
Frequency  1  2  3  3  4  2  2  1  1

The mode of this set of numbers is 5 because it appears with the greatest frequency (4).
Consider now the following frequency table, which represents the daily revenues of 50 shops, grouped by class of revenue:

Class (revenue    Frequency fi
in euros)         (number of shops)
500–550                    3
550–600                   12
600–650                   17
650–700                    8
700–750                    6
750–800                    4
Total                     50

We can immediately see that the modal class is 600–650 because it has the greatest frequency. The mode is calculated as follows:

Mo = L + (d1/(d1 + d2)) · c = 600 + (5/(5 + 9)) · 50 = 617.86 .

Let us take once more the same example but grouping the numbers into different classes:

Class (revenue    Frequency fi
in euros)         (number of shops)
500–600                   15
600–650                   17
650–700                    8
700–800                   10
Total                     50

The classes do not have the same length. We will do a new grouping to obtain classes of the same length.


Case 1
Let us take, for example, as reference the class 600–650, whose length is 50, and suppose that the revenues are uniformly distributed within each class. The table resulting from this new grouping is the following:

Class (revenue    Frequency fi
in euros)         (number of shops)
500–550                  7.5
550–600                  7.5
600–650                 17
650–700                  8
700–750                  5
750–800                  5
Total                   50

The modal class is the class 600–650, and the value of the mode is:

Mo = 600 + (9.5/(9.5 + 9)) · 50 = 625.67 .

Case 2
We could also base our calculation on the length of the first class; this yields the following table:

Class (revenue    Frequency fi
in euros)         (number of shops)
500–600                 15
600–700                 25
700–800                 10
Total                   50

The modal class is the class 600–700, and the value of the mode is:

Mo = 600 + (10/(10 + 15)) · 100 = 640 .

These results show that we should be careful in the use of the mode because the results obtained differ depending on how the data are grouped into classes.
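The grouped-data formula for the mode can be written as a small function. The following Python sketch, added here as an illustration, reproduces the value 617.86 obtained with the original six classes:

# Mode of data grouped into classes of equal length
def grouped_mode(classes):
    # classes: list of (lower_limit, upper_limit, frequency), all of equal length
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                                # index of the modal class
    lower, upper, f_mod = classes[i]
    d1 = f_mod - (freqs[i - 1] if i > 0 else 0)                # gap to the previous class
    d2 = f_mod - (freqs[i + 1] if i + 1 < len(freqs) else 0)   # gap to the next class
    return lower + d1 / (d1 + d2) * (upper - lower)

classes = [(500, 550, 3), (550, 600, 12), (600, 650, 17),
           (650, 700, 8), (700, 750, 6), (750, 800, 4)]
print(round(grouped_mode(classes), 2))                         # 617.86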

FURTHER READING
• Mean
• Measure of central tendency
• Measure of skewness
• Median

Model

A model is a theoretical representation of a real situation, expressed with mathematical symbols. A mathematical model, based on a certain number of observations and hypotheses, tries to give the best possible description of the phenomenon under study. A mathematical model essentially contains two types of elements: the directly or indirectly observable variables that concern the studied phenomenon, and the parameters, which are fixed quantities, generally unknown, that relate the variables to one another inside the model.

HISTORY
The notion of "model" first appeared in the 18th century, in three major problems:
1. Understanding the observed inequality between the movements of Jupiter and Saturn.
2. Determining and mathematically describing the movements of the moon.
3. Determining the shape of the earth.
These three problems used observations from astronomy and some theoretical knowledge about gravitation. They occupied many scientists, especially Euler, Leonhard (1749), who first treated the problem, and


Mayer, Tobias (1750), who was interested in the movements of the moon. The third problem was treated by Boscovich, Roger Joseph and Maire, Christopher (1755).

MATHEMATICAL ASPECTS
There exist many types of models. Among them, the deterministic model is a model where the relations are considered to be exact. On the other hand, the stochastic model presupposes that the relations are not exact but established by a random process.
A stochastic model is principally determined by the nature of the independent variables (exogenous or explanatory variables). We speak about continuous models when the independent variables Xj are quantitative continuous variables. We try to express the dependent variable Y (also called the endogenous or response variable) by a function of the type:

Y = f(X1, ..., Xp | β0, β1, ..., βp−1) + ε ,

where the βj are the parameters (or coefficients) to be estimated and ε is a nonobservable random error. Within this type of model, we can distinguish linear models and nonlinear models; the linearity is relative to the coefficients. A linear model can be written in the form of a polynomial. For example:

Y = β0 + β1·X + β2·X² + ε

is a linear model called "of the second order" (the order comes from the greatest power of the variable X). The model

log V = β0 + β1/W

is also a linear model for the parameters β0 and β1. This model also becomes linear in the variables by setting:

Y = log V   and   X = 1/W .

It gives:

Y = β0 + β1 · X .

On the other hand, the model

Y = β0/(X + β1) + ε

is a nonlinear model for the parameter β1.
There are many methods for estimating the parameters of a model. Among these methods, we may cite the least-squares and the L1 estimation methods.
Besides the continuous models, we talk about discrete models when the studied population can be divided into subpopulations on the basis of qualitative categorical variables (or factors), each of which can have a certain number of levels. The type of model depends on the established experimental design. The objective of such a design is to determine how the experimental conditions influence the observed values of the response variable. We can distinguish different types of discrete models:
• A model with fixed effects (model I) is a linear model for which the factors are fixed, that is, they take a finite number of levels, all of which are represented in the established experimental design. In this case, the model can be written:

observed value = (linear function of unknown parameters) + error ;

the errors are independent random variables with mean equal to zero.
• A model with variable effects (model II) is a linear model for which the studied levels of each factor were randomly chosen


from a finite or infinite number of possible levels. In this case, we can write:

observed value = constant + (linear function of random variables) + error ;

the random variables are independent with mean equal to zero.
• A model with mixed effects (model III) is a linear model that contains at the same time fixed and random factors. Thus we can write:

observed value = (linear function of unknown parameters) + (linear function of random variables) + error .

When the experimental design contains a single factor, denoted by α, the model can be written as:

Yij = μ + αi + εij ,   i = 1, ..., I and j = 1, ..., ni ,

where I is the number of levels of the factor α, ni is the number of observations at each level, μ is the mean of the population, αi are the effects associated with each level of the factor α, and εij are independent random errors, normally distributed with mean zero and homogeneous variances σ².
Such a model can be of type I or type II, depending on the nature of the factor α: it is of type I if the levels of the factor α are fixed and if they all appear in the design. It is of type II if α is a random variable independent of ε, distributed according to any distribution with zero mathematical expectation and variance σα² (if the levels of α are randomly chosen from a finite or infinite number of possible levels).
When the experimental design contains several factors, for example:

Yijk = μ + αi + βj + (αβ)ij + εijk ,
i = 1, ..., I ,   j = 1, ..., J ,   and   k = 1, ..., nij ,

the model can be of type I (the levels of all the factors are fixed), type II (all the factors are random variables with the same restrictions as in the previous model), or type III (certain factors have fixed levels and others are random). The term (αβ)ij represents here the effect of the interaction between the I levels of the factor α and the J levels of the factor β.
A model of type I can be analyzed using an analysis of variance.
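To make the single-factor fixed-effects model concrete, the following Python sketch (an added illustration; the values of μ, the αi, and the error variance are hypothetical) simulates Yij = μ + αi + εij and recovers μ and the αi from the group means of a balanced design:

import random

random.seed(0)
mu = 10.0
alpha = [-1.0, 0.0, 1.0]          # hypothetical fixed effects of the 3 levels (sum to zero)
n_per_level = 1000

# Simulate Y_ij = mu + alpha_i + eps_ij with normal errors
data = {i: [mu + alpha[i] + random.gauss(0, 2.0) for _ in range(n_per_level)]
        for i in range(3)}

group_means = [sum(v) / len(v) for v in data.values()]
grand_mean = sum(group_means) / len(group_means)     # estimate of mu (balanced design)
effects = [m - grand_mean for m in group_means]      # estimates of the alpha_i
print(round(grand_mean, 2), [round(e, 2) for e in effects])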

FURTHER READING
• Analysis of variance
• Dependent variable
• Design of experiments
• Independent variable
• Parameter
• Random variable
• Regression analysis
• Variable

REFERENCES
Anderson, V.L., McLean, R.A.: Design of experiments: a realistic approach. Marcel Dekker, New York (1974)

Boscovich, R.J. et Maire, C.: De Litteraria Expeditione per Pontificiam ditionem ad dimetiendas duas Meridiani gradus. Palladis, Rome (1755)


Euler, L.: Recherches sur la question des inégalités du mouvement de Saturne et de Jupiter, pièce ayant remporté le prix de l'année 1748, par l'Académie royale des sciences de Paris. Republié en 1960, dans Leonhardi Euleri, Opera Omnia, 2ème série. Turici, Bâle, 25, pp. 47–157 (1749)

Mayer, T.: Abhandlung über die Umwälzung des Monds um seine Achse und die scheinbare Bewegung der Mondflecken. Kosmographische Nachrichten und Sammlungen auf das Jahr 1748 1, 52–183 (1750)

De Moivre, Abraham

Abraham de Moivre (1667–1754) first studied in Sedan and Saumur (both in France) before continuing his studies in mathematics and physics at the Sorbonne. Coming from a Protestant family, he was forced to leave France in 1685 due to the revocation of the Edict of Nantes and was exiled to London. After a rough start in his new homeland, he was able to study mathematics. He was elected a member of the Royal Society of London in 1697.
His most important works were The Doctrine of Chances, which appeared in 1718 and was dedicated to questions that continue to play a fundamental role in modern probability theory, and Miscellanea Analytica de Seriebus et Quadraturis, where the density function of the normal distribution appeared for the first time, in the context of probability calculations for games of chance. Carl Friedrich Gauss later took the same density function and applied it to problems of measurement in astronomy. De Moivre also made contributions to actuarial science and mathematical analysis.

Principal works of Abraham de Moivre:

1718 The Doctrine of Chances: or, A Method of Calculating the Probability of Events in Play. Pearson, London.
1725 Annuities upon Lives. Pearson, London.
1730 Miscellanea Analytica de Seriebus et Quadraturis. Tonson and Watts, London.

FURTHER READING
• De Moivre–Laplace theorem
• Normal distribution

De Moivre–Laplace Theorem

This theorem establishes the relation between the binomial distribution with parameters n and p and the normal distribution as n tends to infinity. Its utility consists especially in the estimation of the distribution function F(k) of the binomial distribution by the distribution function of the normal distribution, which is easier to evaluate.

HISTORY
This theorem dates back to 1810, when Pierre Simon de Laplace studied, with others (among them de Moivre, Abraham), problems related to the approximation of the binomial distribution and to the theory of errors.

MATHEMATICAL ASPECTS
Let X1, X2, ..., Xn, ... be a sequence of independent random variables, where Xn follows a binomial distribution with parameters n and p, with 0 < p < 1.


We consider the corresponding adjusted variable Zn, n = 1, 2, ..., defined by:

Zn = (Xn − np) / √(npq) ,

where np represents the mean of the binomial distribution and √(npq) its standard deviation (the variable Zn corresponds to the normalization of Xn). The de Moivre–Laplace theorem is expressed as follows:

P(Zn ≤ x) → Φ(x)   as n → ∞ ,

where Φ is the distribution function of the standard normal distribution.
The approximation can also be improved using the continuity correction factor of one half. Hence, the approximation

P(Xn ≥ x) ≅ 1 − Φ((x − 1/2 − np) / √(npq))

is better than the one obtained without taking into account the corrective factor 1/2.

DOMAINS AND LIMITATIONS
This theorem is a particular case of the central limit theorem. A binomial distribution (such as was defined earlier) can be written as a sum of n random variables distributed according to a Bernoulli distribution with parameter p. The mean of a Bernoulli random variable is equal to p and its variance to p(1 − p). The de Moivre–Laplace theorem is thus easily deduced from the central limit theorem.

EXAMPLES
We want to calculate the probability of at least 27 successes in a binomial experiment of 100 trials, each trial having a probability of success of p = 0.2. The desired binomial probability can be expressed by the sum:

P(Xn ≥ 27) = Σ_{k=27}^{100} C(100, k) · (0.2)^k (0.8)^(100−k) .

The direct calculation of this probability demands the evaluation of 74 terms, each of the form C(100, k)(0.2)^k(0.8)^(100−k). Using the de Moivre–Laplace theorem, this probability can be approximately evaluated taking into account the relation between the binomial and normal distributions. Thus:

P(Xn ≥ 27) ≅ 1 − Φ((27 − np) / √(npq)) = 1 − Φ((27 − 100 · 0.2) / √(100 · 0.2 · 0.8)) = 1 − Φ(1.75) .

Referring to the normal table, we obtain Φ(1.75) = 0.9599, and so:

P(Xn ≥ 27) ≅ 0.0401 .

The exact value of the probability, to 4 decimal places, is P(Xn ≥ 27) = 0.0558. A comparison of the two values 0.0401 and 0.0558 shows that the approximation by the normal distribution gives a result that is close to the exact value. We obtain the following approximation using the corrective factor:

P(Xn ≥ 27) ≅ 1 − Φ(1.625) = 0.0521 .

This result is closer to the exact value 0.0558 than the one obtained without using the corrective factor (0.0401).
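The numbers in this example are easy to verify. The following Python sketch, added here as an illustration, computes the exact binomial probability and the two normal approximations, with and without the continuity correction:

from math import comb, erf, sqrt

def Phi(z):                        # distribution function of the standard normal
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.2
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(27, n + 1))
approx = 1 - Phi((27 - n * p) / sqrt(n * p * (1 - p)))             # 1 - Phi(1.75)
corrected = 1 - Phi((27 - 0.5 - n * p) / sqrt(n * p * (1 - p)))    # 1 - Phi(1.625)
print(round(exact, 4), round(approx, 4), round(corrected, 4))
# exact ~ 0.0558, plain approximation ~ 0.0401, corrected approximation ~ 0.0521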


FURTHER READING
• Binomial distribution
• Central limit theorem
• Normal distribution
• Normalization

REFERENCES
Laplace, P.S. de: Mémoire sur les approximations des formules qui sont fonctions de très grands nombres et sur leur application aux probabilités. Mémoires de l'Académie Royale des Sciences de Paris, 10. Reproduced in: Œuvres de Laplace 12, 301–347 (1810)

Moivre, A. de: Approximatio ad summam terminorum binomii (a + b)^n, in seriem expansi. Supplementum II to Miscellanae Analytica, pp. 1–7 (1733). Photographically reprinted in a rare pamphlet on Moivre and some of his discoveries. Published by Archibald, R.C. Isis 8, 671–683 (1926)

Sheynin, O.B.: P.S. Laplace's work on probability. Archive for History of Exact Sciences 16, 137–187 (1976)

Sheynin, O.B.: P.S. Laplace's theory of errors. Archive for History of Exact Sciences 17, 1–61 (1977)

Moment

We call the moment of order k of the random variable X relative to any value x0 the mathematical expectation of the difference between the random variable and x0, raised to the power k. If x0 = 0, we refer to the moment (or initial moment) of order k. If x0 = μ, where μ is the expected value of the random variable X, we refer to the central moment of order k.

HISTORY
The concept of "moment" in statistics comes from the concept of moment in physics, which derives from Archimedes' (287–212 BC) discovery of the operating principle of the lever, where we speak of the moment of forces. The corrected formulas for the calculation of the moments of grouped data were adapted by Sheppard, W.F. (1898).

MATHEMATICAL ASPECTS
The initial moment of order k of a discrete random variable X is expressed by:

E[X^k] = Σ_{i=1}^{n} xi^k P(xi) ,

if it exists, that is, if E[X^k] < ∞, where P(x) is the probability function of the discrete random variable X taking the n values x1, ..., xn.
In the case of a continuous random variable, if the moment exists, it is defined by:

E[X^k] = ∫_{−∞}^{∞} x^k f(x) dx ,

where f(x) is the density function of the continuous random variable.
For k = 1, the initial moment is the same as the expected value of the random variable X (if E(X) exists):

(X discrete)     E[X] = Σ_{i=1}^{n} xi P(xi) ,
(X continuous)   E[X] = ∫_{−∞}^{∞} x f(x) dx .

The central moment of order k of a discrete random variable X, if it exists, is expressed by:

E[(X − E[X])^k] = Σ_{i=1}^{n} (xi − E[X])^k P(xi) .


In the same way, in the case of a continuous random variable, we have:

E[(X − E[X])^k] = ∫_{−∞}^{∞} (x − E[X])^k f(x) dx ,

if the integral is well defined.
It is evident that for any random variable X having a finite mathematical expectation, the central moment of order 1 is zero:

E[X − μ] = Σ_{i=1}^{n} (xi − μ) P(xi) = Σ_{i=1}^{n} xi P(xi) − μ Σ_{i=1}^{n} P(xi) = μ − μ = 0 .

On the other hand, if it exists, the central moment of order 2 is the same as the variance of the random variable:

(X discrete)     E[(X − μ)²] = Σ_{i=1}^{n} (xi − μ)² P(xi) = Var(X) ,
(X continuous)   E[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² f(x) dx = Var(X) .

DOMAINS AND LIMITATIONS
The estimator of the moment of order k calculated for a sample x1, x2, ..., xn is denoted by mk. It equals:

mk = (Σ_{i=1}^{n} (xi − x0)^k) / n ,

where n is the total number of observations. If x0 = 0, then we have the moment (or initial moment) of order k. If x0 = x̄, where x̄ is the arithmetic mean, we have the central moment of order k.
In the case where a random variable X takes the values xi with frequencies fi, i = 1, 2, ..., h, the moment of order k is given by:

mk = (Σ_{i=1}^{h} fi (xi − x0)^k) / (Σ_{i=1}^{h} fi) .
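The sample moments mk are straightforward to compute. The following Python sketch, added as an illustration with a small hypothetical sample, evaluates the initial and central moments of order 1 to 4:

# Sample moment of order k about a point x0
def moment(obs, k, x0=0.0):
    return sum((x - x0) ** k for x in obs) / len(obs)

data = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical observations
xbar = moment(data, 1)                       # initial moment of order 1 = arithmetic mean (5.0)
for k in (1, 2, 3, 4):
    print(k, moment(data, k), moment(data, k, xbar))   # initial and central moments
# The central moment of order 1 is 0 and the central moment of order 2 is the variance (4.0)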

FURTHER READING
• Arithmetic mean
• Density function
• Expected value
• Probability function
• Random variable
• Variance
• Weighted arithmetic mean

REFERENCES
Sheppard, W.F.: On the calculation of the most probable values of frequency constants for data arranged according to equidistant divisions of a scale. London Math. Soc. Proc. 29, 353–380 (1898)

Monte Carlo Method

Any numerical technique for solving mathematical problems that uses random or pseudorandom numbers is called a Monte Carlo method. Such mathematical or statistical problems typically have no analytical solution.

HISTORY
The name Monte Carlo comes from the city of the same name in Monaco, famous for its casinos; the roulette wheel is one of the simplest mechanisms for the generation of random numbers. According to Sobol, I.M. (1975), the Monte Carlo method owes its existence to the American mathematicians von Neumann,


John (1951), Ulam, Stanislaw Marcin, and Metropolis, Nicholas (1949). It was developed around 1949, but it became practical with the invention of computers.

EXAMPLES
Suppose that we would like to compute the surface of a figure S that is situated inside a square whose side equals 1. We generate N random points inside the square. To do this, we place the square on a system of perpendicular axes whose origin is the lower left corner of the square. This means that all the points we generate inside the square will have coordinates between 0 and 1. It is enough to take two uniform random numbers to obtain a point: the first random number gives the abscissa and the second the ordinate. In the following example, let us choose 40 random points; we therefore need two samples of 40 random numbers. The surface S that we would like to estimate is the square of side 0.75, in other words the exact value equals 0.5625. After producing the 40 random points in the unit square, it is sufficient to count the number of points inside the figure S (21 in this case). The estimate of the desired surface is obtained by the ratio 21/40 = 0.525. It is possible to improve this estimate by performing the experiment many times and taking the mean of the obtained surfaces.
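The experiment described above is easy to reproduce. The following Python sketch, added here as an illustration, assumes for simplicity that S is the square of side 0.75 placed in the lower-left corner of the unit square, throws N uniform random points, and counts those falling inside S:

import random

random.seed(0)
N = 40                                         # number of random points, as in the example
side = 0.75                                    # the figure S is a square of side 0.75 (area 0.5625)
inside = 0
for _ in range(N):
    x, y = random.random(), random.random()    # a random point of the unit square
    if x <= side and y <= side:                # does the point fall inside S?
        inside += 1
print(inside / N)                              # estimate of the area; improves as N grows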

FURTHER READING
• Generation of random numbers
• Jackknife method
• Random number
• Simulation

REFERENCES
Metropolis, N., Ulam, S.: The Monte Carlo method. J. Am. Stat. Assoc. 44, 335–341 (1949)

Sobol, I.M.: The Monte Carlo Method. Mir Publishers, Moscow (1975)

Von Neumann, J.: Various Techniques Used in Connection with Random Digits. National Bureau of Standards symposium, NBS, Applied Mathematics Series 12, pp. 36–38, National Bureau of Standards, Washington, D.C. (1951)

Moving Average

We call the moving average of order k the arithmetic mean calculated on k successive values of a time series. The set of these arithmetic means gives the series of moving averages of order k. The process that consists in replacing the initial series by this series is called smoothing of the time series.
For time series composed of annual or monthly data, the averages take, respectively, the name of moving averages over k years or over k months. When we have monthly data, for example, calculating a moving average over 12 months, the




obtained results correspond to the middle of the considered period, at the end of the sixth month (or on the first day of the seventh month), instead of falling in the middle of a month as in the original data. We correct this by taking a centered moving average over 12 months. Generally, a centered moving average of order k is calculated by taking the moving average of order 2 of the moving average of order k of the initial time series.

HISTORY
See time series.

MATHEMATICAL ASPECTS
We have the time series Y1, Y2, Y3, . . . , YN, and we define the series of moving averages of order k by the series of arithmetic means:

(Y1 + Y2 + · · · + Yk) / k ,   (Y2 + Y3 + · · · + Yk+1) / k ,   . . . ,   (YN−k+1 + YN−k+2 + · · · + YN) / k .

The sums in the numerators are called moving sums of order k.
We can also use weighted arithmetic means with previously specified weights; the series obtained in this way is called a series of weighted moving averages of order k.

DOMAINS AND LIMITATIONS
Moving averages are used in the study of time series for the estimation of the secular tendency, of seasonal variations, and of cyclic fluctuations.
Applying the "moving average" operation to a time series allows one to:
• Determine the seasonal variations within the limits where they are rigorously periodical,
• Smooth irregular variations, and
• Approximately conserve the extraseasonal movement.
Let us mention the disadvantages of the moving average method:
• The data at the beginning and the end of the series are missing.
• Moving averages can create cycles or other movements not present in the original data.
• Moving averages are strongly influenced by outliers and accidental values.
• When a graph of the studied time series shows an exponential secular tendency or, more generally, a strong curvature, the moving average method does not give very precise results for the estimation of the extraseasonal movement.

EXAMPLES
With the time series data 7, 2, 3, 4, 5, 0, 1, we obtain the moving averages of order 3:

(7 + 2 + 3)/3 ,  (2 + 3 + 4)/3 ,  (3 + 4 + 5)/3 ,  (4 + 5 + 0)/3 ,  (5 + 0 + 1)/3 ,

that is, 4, 3, 4, 3, 2.
In a moving average series, it is common to place each number in its position relative to the original data. In this example, we write:

Original data              7, 2, 3, 4, 5, 0, 1
Moving average
of order 3                    4, 3, 4, 3, 2



In the graph below, each number of the moving average series is the arithmetic mean of the three numbers immediately below it.

We now calculate the centered moving average over 4 years of the mean monthly production of oil in a country. In the first step, we calculate the moving sums over 4 years, adding the data of 4 successive years. Then, dividing each moving sum by 4, we obtain the moving averages over 4 years, each corresponding to 1 January of the third year considered in the sum. Finally, we obtain the centered moving averages over 4 years by calculating the arithmetic mean of two successive moving averages over 4 years.

Average monthly oil production in a country in millions of tons

Year   Datum   Moving sum    Moving average   Centered moving
               over 4 years  over 4 years     average over 4 years
1948   41.4
1949   36.0
               158.8         39.7
1950   39.3                                   39.6
               158.0         39.5
1951   42.1                                   40.4
               165.2         41.3
1952   40.6                                   40.7
               160.4         40.1
1953   43.2                                   39.9
               158.8         39.7
1954   34.5                                   39.3
               155.6         38.9
1955   40.5                                   38.5
               152.4         38.1
1956   37.4                                   38.2
               153.2         38.3
1957   40.0
1958   35.3
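A small illustrative sketch (not part of the original entry) that reproduces the moving averages and centered moving averages of this table:

```python
def moving_average(series, k):
    """Moving averages of order k of a time series."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]

def centered_moving_average(series, k):
    """Centered moving average of order k: the moving average of order 2
    applied to the moving averages of order k."""
    return moving_average(moving_average(series, k), 2)

oil = [41.4, 36.0, 39.3, 42.1, 40.6, 43.2, 34.5, 40.5, 37.4, 40.0, 35.3]
print(moving_average(oil, 4))           # 39.7, 39.5, 41.3, ... as in the table
print(centered_moving_average(oil, 4))  # 39.6, 40.4, 40.7, ... as in the table
```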

FURTHER READING
• Arithmetic mean
• Time series
• Weighted arithmetic mean

Multicollinearity

Multicollinearity is a term used to describe a situation where a variable is almost equal to a linear combination of other variables. Multicollinearity is often present in regression problems when there is a strong correlation between the explanatory variables.




The existence of linear or almost linear relations among the independent variables causes great imprecision in the estimation of the regression coefficients. If the relations are exactly linear, the coefficients cannot even be estimated uniquely.

MATHEMATICAL ASPECTS
See ridge regression.

FURTHER READING
• Correlation coefficient
• Multiple linear regression

REFERENCES
Belsley, D.A., Kuh, E., Welsch, R.E.: Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York (1980)

Multimodal Distribution

See frequency distribution.

Multinomial Distribution

Consider a random experiment with n independent trials. Suppose that each trial can result in only one of the events Ei, i = 1, . . . , s, with probability pi. The random variable Xi represents the number of occurrences of event Ei. The multinomial distribution with parameters n and p1, . . . , ps is defined by the following joint probability function:

P (X1 = x1, X2 = x2, . . . , Xs = xs) = [ n! / ∏_{i=1}^{s} xi! ] · ∏_{i=1}^{s} pi^{xi} ,

with the integers xi ≥ 0 such that Σ_{i=1}^{s} xi = n.

Indeed, the probability of obtaining the sequence

E1, . . . , E1 (x1 times); E2, . . . , E2 (x2 times); . . . ; Es, . . . , Es (xs times)

is equal to ∏_{i=1}^{s} pi^{xi}.
The number of possible orderings is the number of permutations of n objects of which x1 are of type E1, x2 of type E2, . . . , xs of type Es, that is:

n! / ∏_{i=1}^{s} xi! .

The multinomial distribution is a discrete probability distribution.

MATHEMATICAL ASPECTS
For the multinomial distribution

Σ_{i=1}^{s} pi = 1  and  Σ_{i=1}^{s} xi = n ,

with 0 < pi < 1.
The expected value is calculated for each random variable Xi; we therefore have:

E [Xi] = n · pi .

Indeed, each random variable Xi, considered individually, follows a binomial distribution. The same remark applies to the variance, which is equal to:

Var (Xi) = n · pi · qi ,

where qi = 1 − pi.

DOMAINS AND LIMITATIONS
The multinomial distribution has many applications in the analysis of statistical



data. It is used especially in situations where the data must be classified into several categories of events (for example, in the analysis of categorical data).

EXAMPLES
Consider the following random experiment. An urn contains nine balls: two white, three red, and four black. A ball is chosen at random, its color is noted, and it is put back in the urn. This is repeated 5 times.
We can describe this random experiment as follows. We have three possible events:

E1: choosing a white ball
E2: choosing a red ball
E3: choosing a black ball

and three random variables:

X1: number of white balls chosen
X2: number of red balls chosen
X3: number of black balls chosen

The probabilities associated with these three events are:

p1 = 2/9 ,  p2 = 3/9 ,  p3 = 4/9 .

We will calculate the probability that, of five chosen balls (n = 5), one will be white, two will be red, and two will be black.
The number of possible orderings is equal to:

5! / (1! 2! 2!) = 30 .

The probability of obtaining the sequence (w, r, r, b, b), where w, r, and b denote, respectively, a white, red, and black ball, is:

∏_{i=1}^{3} pi^{xi} = (2/9) · (3/9)² · (4/9)² = 0.0049 .

Hence,

P (X1 = x1, X2 = x2, X3 = x3) = [ 5! / ∏_{i=1}^{3} xi! ] · (2/9)^{x1} · (3/9)^{x2} · (4/9)^{x3} ,

and

P (X1 = 1, X2 = 2, X3 = 2) = 30 · 0.0049 = 0.147 .
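A brief computational check of this example (a sketch using only the Python standard library, not part of the original entry):

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1 = x1, ..., Xs = xs) for a multinomial distribution."""
    n = sum(counts)
    coeff = factorial(n) / prod(factorial(x) for x in counts)
    return coeff * prod(p ** x for p, x in zip(probs, counts))

# one white, two red, and two black balls in five draws with replacement
print(multinomial_pmf([1, 2, 2], [2/9, 3/9, 4/9]))  # ≈ 0.1463, i.e., about 0.147
```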

FURTHER READING
• Binomial distribution
• Discrete probability distribution

Multiple Correlation Coefficient
See correlation coefficient.

Multiple Linear Regression
A regression analysis where the dependent variable Y depends linearly on several independent variables X1, X2, . . . , Xk is called multiple linear regression.
A multiple linear regression equation is of the form:

Y = f (X1, X2, . . . , Xk) ,

where f (X1, X2, . . . , Xk) is a linear function of X1, X2, . . . , Xk.

HISTORY
See regression analysis.

MATHEMATICAL ASPECTS
A general model of multiple linear regression containing k = p − 1 independent variables (and n observations) is written as:

Yi = β0 + Σ_{j=1}^{p−1} βj Xji + εi ,   i = 1, . . . , n ,




where Yi is the dependent variable, Xji, j = 1, . . . , p − 1, are the independent variables, βj, j = 0, . . . , p − 1, are the parameters to be estimated, and εi is the random, nonobservable error term.
In matrix form, this model is written as:

Y = Xβ + ε ,

where Y is the (n × 1) vector of observations of the dependent variable (n observations), β is the (p × 1) vector of parameters to be estimated, ε is the (n × 1) vector of errors, and

X = [ 1  X11  . . .  X1(p−1)
      ⋮   ⋮            ⋮
      1  Xn1  . . .  Xn(p−1) ]

is the (n × p) matrix of the independent variables.

Estimation of Vector β

Starting from the model

Y = Xβ + ε ,

we obtain the estimate β̂ of the vector β by the least-squares method:

β̂ = (X′X)⁻¹ X′Y ,

and we calculate an estimated value Ŷ for Y:

Ŷ = X · β̂ .

At this step, we can calculate the residuals, denoted by the vector e, which we find in the following manner:

e = Y − Ŷ .

Measure of Reliability of the Estimation of Y
To know how much to trust the chosen linear model, it is useful to conduct an analysis of variance and to test the hypotheses on the vector β of the regression model. To conduct these tests, we must make the following assumptions:
• For each value of Xji and for all i = 1, . . . , n and j = 1, . . . , p − 1, there is a random variable Y distributed according to the normal distribution.
• The variance of Y is the same for all Xji; it equals σ² (unknown).
• The different observations on Y are independent of one another but conditioned by the values of Xji.

Analysis of Variance in Matrix Form
In matrix form, the analysis of variance table for the regression is as follows:

Analysis of variance

Source of    Degrees of   Sum of              Mean of
variation    freedom      squares             squares
Regression   p − 1        β̂′X′Y − nȲ²         (β̂′X′Y − nȲ²)/(p − 1)
Residual     n − p        Y′Y − β̂′X′Y         (Y′Y − β̂′X′Y)/(n − p)
Total        n − 1        Y′Y − nȲ²

If the model is correct, then

S² = (Y′Y − β̂′X′Y) / (n − p)

is an unbiased estimator of σ².
The analysis of variance allows us to test the null hypothesis:

H0 : βj = 0 for j = 1, . . . , p − 1

against the alternative hypothesis:

H1 : at least one of the parameters βj, j ≠ 0, is different from zero,

by calculating the statistic:

F = EMSE / RMSE = EMSE / S² ,



where EMSE is the mean of squares of the regression, RMSE is the mean of squares of the residuals, and TMSE is the total mean of squares. This F statistic must be compared to the value Fα,p−1,n−p of the Fisher table, where α is the significance level of the test:

if F ≤ Fα,p−1,n−p , we accept H0;
if F > Fα,p−1,n−p , we reject H0 in favor of H1.

The coefficient of determination R² is calculated in the following manner:

R² = ESS / TSS = (β̂′X′Y − nȲ²) / (Y′Y − nȲ²) ,

where ESS is the sum of squares of the regression and TSS is the total sum of squares.

DOMAINS AND LIMITATIONS
See regression analysis.

EXAMPLES
The following data concern ten companies in the chemical industry. We try to establish a relation between production, hours of work, and capital.

Production   Work   Capital
(100 tons)   (h)    (machine hours)
60           1100   300
120          1200   400
190          1430   420
250          1500   400
300          1520   510
360          1620   590
380          1800   600
430          1820   630
440          1800   610
490          1750   630

The matrix model is written as:

Y = Xβ + ε ,

where

Y = [ 60, 120, 190, 250, 300, 360, 380, 430, 440, 490 ]′ ,

X = [ 1  1100  300
      1  1200  400
      1  1430  420
      1  1500  400
      1  1520  510
      1  1620  590
      1  1800  600
      1  1820  630
      1  1800  610
      1  1750  630 ] ,

ε = [ ε1, ε2, . . . , ε10 ]′ ,   β = [ β0, β1, β2 ]′ .

The equations that form the model are the following:

60 = β0 + 1100β1 + 300β2 + ε1 ,

120 = β0 + 1200β1 + 400β2 + ε2 ,

. . .

490 = β0 + 1750β1 + 630β2 + ε10 .

We calculate the estimates β̂ using the result:

β̂ = (X′X)⁻¹ X′Y ,

with

(X′X) = [ 10       15540       5090
          15540    24734600    8168700
          5090     8168700     2720500 ] ,

(X′X)⁻¹ = [  6.304288   −0.007817    0.011677
            −0.007817    0.000015   −0.000029
             0.011677   −0.000029    0.000066 ] ,




and

X′Y = [ 3020
        5012000
        1687200 ] .

That gives:

β̂ = [ β̂0, β̂1, β̂2 ]′ = [ −439.269, 0.283, 0.591 ]′ .

Thus we obtain the equation of the estimated line:

Ŷi = −439.269 + 0.283 Xi1 + 0.591 Xi2 .

Analysis of Variance in Matrix Form
The calculation of the different elements of the table gives us:
1. Calculation of the degrees of freedom:

   dlreg = p − 1 = 3 − 1 = 2 ,
   dlres = n − p = 10 − 3 = 7 ,
   dltot = n − 1 = 10 − 1 = 9 ,

   where dlreg is the degrees of freedom of the regression, dlres is the degrees of freedom of the residuals, and dltot is the total degrees of freedom.

2. Calculation of the sums of squares:

   ESS = β̂′X′Y − nȲ²
       = [ −439.269  0.283  0.591 ] · [ 3020  5012000  1687200 ]′ − 912040
       = 179063.39 ,

   TSS = Y′Y − nȲ²
       = [ 60 . . . 490 ] · [ 60 . . . 490 ]′ − 912040
       = 187160 ,

   RSS = Y′Y − β̂′X′Y
       = TSS − ESS
       = 187160.00 − 179063.39
       = 8096.61 .

We obtain the following table:

Analysis of variance

Source of    Degrees of   Sum of       Mean of
variation    freedom      squares      squares
Regression   2            179063.39    89531.70
Residual     7            8096.61      1156.66 = S²
Total        9            187160.00

This allows us to test the null hypothesis:

H0 : β1 = β2 = 0 .

The calculated value of F is:

F = EMSE / RMSE = EMSE / S² = 89531.70 / 1156.66 = 77.41 .

The value of F in the Fisher table with a significance level α = 0.05 is

F0.05,2,7 = 4.74 .

Consequently, the null hypothesis H0 is rejected. The alternative hypothesis can be one of the three following possibilities:

H1 : β1 ≠ 0 and β2 ≠ 0 ,
H2 : β1 = 0 and β2 ≠ 0 ,
H3 : β1 ≠ 0 and β2 = 0 .



We can calculate the coefficient of determination R²:

R² = ESS / TSS = 179063.39 / 187160.00 ≅ 0.96 .
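A brief numerical sketch (using numpy, not part of the original entry) that reproduces these computations; the coefficients, F statistic, and R² should come out close to the values above, up to rounding:

```python
import numpy as np

# data of the ten chemical companies: production, hours of work, capital
y  = np.array([60, 120, 190, 250, 300, 360, 380, 430, 440, 490], dtype=float)
x1 = np.array([1100, 1200, 1430, 1500, 1520, 1620, 1800, 1820, 1800, 1750], dtype=float)
x2 = np.array([300, 400, 420, 400, 510, 590, 600, 630, 610, 630], dtype=float)

n = len(y)
X = np.column_stack([np.ones(n), x1, x2])      # design matrix with constant term

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least-squares estimate of beta
e = y - X @ beta_hat                           # residuals

p = X.shape[1]
ess = beta_hat @ X.T @ y - n * y.mean() ** 2   # regression sum of squares
tss = y @ y - n * y.mean() ** 2                # total sum of squares
rss = tss - ess                                # residual sum of squares

F = (ess / (p - 1)) / (rss / (n - p))
R2 = ess / tss
print(beta_hat)   # approximately [-439.3, 0.283, 0.591]
print(F, R2)      # approximately 77.4 and 0.96
```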

FURTHER READING
• Analysis of residuals
• Analysis of variance
• Coefficient of determination
• Collinearity
• Correlation coefficient
• Hat matrix
• Least squares
• Leverage point
• Normal equations
• Regression analysis
• Residual
• Simple linear regression

REFERENCES
Weisberg, S.: Applied Linear Regression. Wiley, New York (2006)

Chatterjee, S., Hadi, A.S.: Regression Analysis by Example. Wiley, New York (2006)



Negative Binomial Distribution

A random variable X follows a negative binomial distribution with parameters r and p if its probability function is of the form:

P (X = k) = C^{k}_{r+k−1} · p^r · q^k ,

where p is the probability of success, q = 1 − p is the probability of failure, and C^{x}_{n} is the number of combinations of x objects among n.
In the situation where a "success–failure" random experiment is repeated independently, the probability of success is denoted by p and the probability of failure by q = 1 − p. The experiment is repeated as many times as required to obtain r successes. The number of failures obtained before attaining this goal is a random variable following the negative binomial distribution described above. The negative binomial distribution is a discrete probability distribution.

HISTORY
The first to treat the negative binomial distribution was Pascal, Blaise (1679). Then de Montmort, P.R. (1714) applied the negative binomial distribution to represent the number of times that a coin should be flipped to obtain a certain number of heads. Student (1907) used the negative binomial distribution as an alternative to the Poisson distribution.
Greenwood, M. and Yule, G.U. (1920) and Eggenberger, F. and Polya, G. (1923) found further applications of the negative binomial distribution. Ever since, there has been an increasing number of applications of this distribution, and the statistical techniques based on it have developed in parallel.

MATHEMATICAL ASPECTS
If X1, X2, . . . , Xr are r independent random variables following a geometric distribution, then the random variable

X = X1 + X2 + . . . + Xr

follows a negative binomial distribution.
To calculate the expected value of X, the following property is used, where Y and Z are random variables:

E [Y + Z] = E [Y] + E [Z] .

We therefore have:

E [X] = E [X1 + X2 + . . . + Xr]
      = E [X1] + E [X2] + . . . + E [Xr]
      = q/p + q/p + . . . + q/p
      = r · q/p .



To calculate the variance of X, the following property is used, where Y and Z are independent variables:

Var (Y + Z) = Var (Y) + Var (Z) .

We therefore have:

Var (X) = Var (X1 + X2 + . . . + Xr)
        = Var (X1) + Var (X2) + . . . + Var (Xr)
        = q/p² + q/p² + . . . + q/p²
        = r · q/p² .

DOMAINS AND LIMITATIONS
Among the specific fields in which the negative binomial distribution has been applied are accident statistics, biological sciences, ecology, market studies, medical research, and psychology.

EXAMPLES
A coin is flipped several times. We are interested in the probability of obtaining heads for the third time on the sixth throw.
We then have:

Number of successes: r = 3
Number of failures: k = 6 − r = 3
Probability of one success: p = 1/2 (obtaining heads)
Probability of one failure: q = 1/2 (obtaining tails)

The probability of obtaining k tails before the third heads is given by:

P (X = k) = C^{k}_{3+k−1} · (1/2)³ · (1/2)^k .

The probability of obtaining the third heads on the sixth throw, that is, the probability of obtaining tails three times before obtaining the third heads, is therefore equal to:

P (X = 3) = C^{3}_{3+3−1} · (1/2)³ · (1/2)³
          = C^{3}_{5} · (1/2)³ · (1/2)³
          = [ 5! / (3!(5 − 3)!) ] · (1/2)³ · (1/2)³
          = 0.1563 .
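A quick check of this value (a sketch using only the Python standard library, not part of the original entry):

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """P(X = k): probability of k failures before the r-th success,
    with success probability p on each independent trial."""
    q = 1 - p
    return comb(r + k - 1, k) * p**r * q**k

# probability of three tails before the third heads (third heads on the sixth throw)
print(neg_binomial_pmf(3, 3, 0.5))   # 0.15625, i.e., about 0.1563
```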

FURTHER READING
• Bernoulli distribution
• Binomial distribution
• Discrete probability distribution
• Poisson distribution

REFERENCES
Eggenberger, F., Polya, G.: Über die Statistik verketteter Vorgänge. Zeitschrift für angewandte Mathematik und Mechanik 3, 279–289 (1923)

Greenwood, M., Yule, G.U.: An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. J. Roy. Stat. Soc. Ser. A 83, 255–279 (1920)

Montmort, P.R. de: Essai d'analyse sur les jeux de hasard, 2nd edn. Quillau, Paris (1713)

Pascal, B.: Varia Opera Mathematica. D. Petri de Fermat. Tolosae (1679)

Gosset, S.W. ("Student"): On the error of counting with a haemacytometer. Biometrika 5, 351–360 (1907)




Nelder, John A.
John Nelder was born in 1924 at Dulverton (Somerset, England). After receiving his diploma in mathematical statistics at Cambridge, he was named head of the Section of Statistics of the National Vegetable Research Station at Wellesbourne, a position he held from 1951 to 1968. He earned the title of Doctor of Science at Birmingham and was then named head of the Statistics Department at Rothamsted Experimental Station in Harpenden from 1968 to 1984. He is now visiting professor at the Imperial College of Science, Technology and Medicine in London. He was elected a Fellow of the Royal Society in 1984 and was president of two prestigious societies: the International Biometric Society and the Royal Statistical Society.
Nelder is the author of the statistical computing systems Genstat and GLIM, now used in more than 70 countries. Author of more than 140 articles published in statistical and biological journals, he is also the author of Computers in Biology and coauthor, with Peter McCullagh, of a book treating the generalized linear model. Nelder also developed the idea of generally balanced designs.

Selected works and articles of John A. Nelder:

1965 (with Mead, R.) A simplex method for function minimization. Computer Journal, 7, 303–333.
1971 Computers in Biology. Wykeham, London and Winchester, pp. 149.
1972 (with Wedderburn, R.W.M.) Generalized linear models. J. Roy. Stat. Soc. Ser. A 135, 370–384.
1974 Genstat: a statistical system. In: COMPSTAT: Proceedings in Computational Statistics. Physica, Vienna.
1983 (with McCullagh, P.) Generalized Linear Models. Chapman & Hall, London, pp. 261.

FURTHER READING
• Statistical software

Newcomb, Simon
Newcomb, Simon was born in 1835 in Nova Scotia (Canada) and died in 1909. This astronomer of Canadian origin contributed to the treatment of outliers in statistics, to the application of probability theory to the interpretation of data, and to the development of what we today call robust methods in statistics.
Until the age of 16 he studied essentially by consulting the books that his father found for him. After that, he began studying medicine with plans of becoming a doctor. In 1857, he was engaged by the American Ephemeris and Nautical Almanac in Cambridge, in the state of Massachusetts. At the same time, he studied at Harvard, from which he graduated in 1858. In 1861, he became professor of mathematics at the Naval Observatory.
The collection of Notes on the Theory of Probabilities (Mathematics Monthly, 1859–61) constitutes a work that is still considered modern today. Newcomb's most remarkable contribution to statistics is his approach to the treatment of outliers in astronomical data; affirming that the normal distribution does not fit such data, he invented the contaminated normal distribution.

Selected works and articles of Newcomb, Simon:

1859–61 Notes on the theory of probabilities. Mathematics Monthly, 1, 136–



139, 233–235, 331–355, 349–350; 2, 134–140, 272–275; 3, 68, 119–125, 341–349.

Neyman, Jerzy
Neyman, Jerzy (1894–1981) was one of the founders of modern statistics. One of his outstanding contributions was establishing, with Pearson, Egon Sharpe, the basis of the theory of hypothesis testing. Born in Russia, Neyman, Jerzy studied physics and mathematics at Kharkov University. In 1921, he went to Poland, his ancestral country of origin, where he worked for a while as a statistician at the National Institute of Agriculture in Bydgoszcz. In 1924, he spent some time in London, where he studied at University College under the direction of Pearson, Karl. At this time he met Pearson, Egon Sharpe; Gosset, William Sealy; and Fisher, Ronald Aylmer. In 1937, at the end of a trip to the United States, where he presented papers at many conferences, he accepted the position of professor at the University of California at Berkeley. He created the Department of Statistics there and finished his brilliant career at this university.

Selected articles of Neyman, Jerzy:

1928 (with Pearson, E.S.) On the use and interpretation of certain test criteria for purposes of statistical inference, I, II. Biometrika 20A, 175–240, 263–295.
1933 (with Pearson, E.S.) Testing of statistical hypotheses in relation to probabilities a priori. Proc. Camb. Philos. Soc. 29, 492–510.
1934 On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection. J. Roy. Stat. Soc. 97, 558–606, discussion pp. 607–625.
1938 Contribution to the theory of sampling human populations. J. Am. Stat. Assoc. 33, 101–116.

FURTHER READING
• Hypothesis testing

Nonlinear Regression

A regression analysis where the dependent variable Y depends on one or many independent variables X1, . . . , Xk is called a nonlinear regression if the regression equation Y = f (X1, . . . , Xk; β0, . . . , βp) is not linear in the parameters β0, . . . , βp.

HISTORY
Nonlinear regression dates back to the 1920s, to Fisher, Ronald Aylmer and Mackenzie, W.A. However, the use and more detailed investigation of these models had to wait for advances in automatic computation in the 1970s.

MATHEMATICAL ASPECTS
Let Y1, Y2, . . . , Yr and X11, . . . , X1r, X21, . . . , X2r, . . . , Xk1, . . . , Xkr be the r observations (respectively) of the dependent variable Y and the independent variables X1, . . . , Xk.
The goal is to estimate the parameters β0, . . . , βp of the model:

Yi = f (X1i, . . . , Xki; β0, . . . , βp) + εi ,   i = 1, . . . , r .




By the least-squares method and under the corresponding hypotheses concerning the errors (see analysis of residuals), we estimate the parameters β0, . . . , βp by the values β̂0, . . . , β̂p that minimize the sum of squared errors, denoted by

S (β0, . . . , βp) = Σ_{i=1}^{r} ( Yi − f (X1i, . . . , Xki; β0, . . . , βp) )² ,

that is:

min_{β0,...,βp} Σ_{i=1}^{r} εi² = min_{β0,...,βp} S (β0, . . . , βp) .

If f is nonlinear, the resolution of the normal equations (none of which is linear):

Σ_{i=1}^{r} ( Yi − f (X1i, . . . , Xki; β0, . . . , βp) ) · [ ∂ f (X1i, . . . , Xki; β0, . . . , βp) / ∂βj ] = 0

(j = 0, . . . , p) can be difficult. The problem is that these equations do not necessarily have a solution, or may have more than one.
In what follows, we discuss different approaches to the resolution of the problem.
1. Gauss–Newton

   The procedure, iterative in nature, consists in linearizing f with the help of a Taylor expansion and successively estimating the parameters by linear regression. We develop this method in more detail in the example below.
2. Steepest descent
   The idea of this method is to fix approximate values of the parameters βj and then to add an approximation of

   −∂S (β0, . . . , βp) / ∂βj ,

   which corresponds to the direction of steepest descent of the function S. Iteratively, we determine the parameters that minimize S.
3. Levenberg–Marquardt
   This method, not developed here, combines the advantages of the two procedures mentioned above, forcing and accelerating the convergence of the approximations of the parameters to be estimated.

DOMAINS AND LIMITATIONS
In multiple linear regression, there are functions that are not linear in the parameters but that become linear after a transformation. If this is not the case, we call these functions nonlinear and treat them with nonlinear methods.
An example of a nonlinear function is given by:

Y = [ β1 / (β1 − β2) ] · [ exp (−β2 X) − exp (−β1 X) ] + ε .

EXAMPLES
Let P1 = (x1, y1), P2 = (x2, y2), and P3 = (x3, y3) be three points on a graph and d1, d2, and d3 the approximate distances (the "approximation errors" being distributed according to the Gauss distribution with the same variance) between these points and a point P = (x, y) of unknown coordinates, which we seek.
A possible regression model to estimate the coordinates of P is as follows:

di = f (xi, yi; x, y)+ εi , i = 1, 2, 3 ,

where the function f represents the distance between point Pi and point P, that is:

f (xi, yi; x, y) = √[ (x − xi)² + (y − yi)² ] .



The distance function is clearly nonlinear and cannot be transformed into a linear form, so the appropriate regression is nonlinear. We apply the method of Gauss–Newton.
Considering f as a function of the parameters x and y to be estimated, the Taylor development of f up to the linear term, around (x0, y0), is given by:

f (xi, yi; x, y) ≅ f (xi, yi; x0, y0)
   + [ ∂ f (xi, yi; x, y) / ∂x ]_{x=x0, y=y0} · (x − x0)
   + [ ∂ f (xi, yi; x, y) / ∂y ]_{x=x0, y=y0} · (y − y0) ,

where

[ ∂ f (xi, yi; x, y) / ∂x ]_{x=x0, y=y0} = (xi − x0) / f (xi, yi; x0, y0)

and

[ ∂ f (xi, yi; x, y) / ∂y ]_{x=x0, y=y0} = (yi − y0) / f (xi, yi; x0, y0) .

So let (x0, y0) be a first estimate of P, found, for example, geometrically. The linearized regression model is thus expressed by:

di = f (xi, yi; x0, y0)
   + [ (xi − x0) / f (xi, yi; x0, y0) ] · (x − x0)
   + [ (yi − y0) / f (xi, yi; x0, y0) ] · (y − y0) + εi ,   i = 1, 2, 3 .

With the help of a multiple linear regression (without constant), we can estimate the parameters (Δx, Δy) = (x − x0, y − y0) of

the following model (the "observations" of the model appear in brackets):

[ di − f (xi, yi; x0, y0) ] = Δx · [ (xi − x0) / f (xi, yi; x0, y0) ]
                            + Δy · [ (yi − y0) / f (xi, yi; x0, y0) ] + εi ,   i = 1, 2, 3 .

Let (Δ̂x, Δ̂y) be the estimate of (Δx, Δy) found in this way. Thus we get a better estimate of the coordinates of P: (x0 + Δ̂x, y0 + Δ̂y).

Using this new point as our point of departure, we can apply the same procedure again. The successive coordinates converge to the desired point, that is, to the point that fits the distances best.
Let us now take a concrete example: P1 = (0, 5); P2 = (5, 0); P3 = (10, 10); d1 = 6; d2 = 4; d3 = 6.
If we choose as the initial point of the Gauss–Newton procedure the point P = ((0 + 5 + 10)/3, (5 + 0 + 10)/3) = (5, 5), we obtain the following sequence of coordinates:

(5, 5)

(6.2536, 4.7537)

(6.1704, 4.7711)

(6.1829, 4.7621)

(6.1806, 4.7639)

which converges to the point (6.1806, 4.7639). The distances between the point found and P1, P2, and P3 are respectively 6.1856, 4.9078, and 6.4811.

FURTHER READING
• Least squares




• Model
• Multiple linear regression
• Normal equations
• Regression analysis
• Simple linear regression

REFERENCES
Bates, D.M., Watts, D.G.: Nonlinear Regression Analysis and Its Applications. Wiley, New York (1988)

Draper, N.R., Smith, H.: Applied Regression Analysis, 3rd edn. Wiley, New York (1998)

Fisher, R.A.: On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. Lond. Ser. A 222, 309–368 (1922)

Fisher, R.A., Mackenzie, W.A.: The manurial response of different potato varieties. J. Agricult. Sci. 13, 311–320 (1923)

Fletcher, R.: Practical Methods of Optimization. Wiley, New York (1997)

Seber, G.A.F., Wild, C.J.: Nonlinear Regression. Wiley, New York (2003)

Nonparametric Statistics

Statistical procedures that allow us to process data from small samples, on variables about which nothing is known concerning their distribution. Specifically, nonparametric statistical methods were developed to be used in cases when the researcher knows nothing about the parameters of the variable of interest in the population (hence the name nonparametric). Nonparametric methods do not rely on the estimation of parameters (such as the mean or the standard deviation) describing the distribution of the variable of interest in the population. Nonparametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. Therefore, these methods are also sometimes called parameter-free methods or distribution-free methods.

HISTORY
The term nonparametric was first used by Wolfowitz in 1942.

See also nonparametric test.

DOMAINS AND LIMITATIONS
Nonparametric methods range from the analysis of a one-way classification model for comparing treatments to regression and curve-fitting problems.
The most frequently used nonparametric tests are the Anderson–Darling test, the chi-square test, Kendall's tau, the Kolmogorov–Smirnov test, the Kruskal–Wallis test, the Wilcoxon rank sum test, Spearman's rank correlation coefficient, and the Wilcoxon signed rank test.
Nonparametric tests have less power than the corresponding parametric tests, but they are more robust when the assumptions underlying the parametric test are not satisfied.

EXAMPLES
A histogram is a simple nonparametric estimate of a probability distribution. A generalization of the histogram is the kernel smoothing technique, by which a very smooth probability density function can be constructed from a given data set.
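A small sketch of this idea (an illustrative Gaussian kernel density estimator on arbitrary made-up data, not taken from the original entry):

```python
import math

def gaussian_kernel_density(data, x, bandwidth=1.0):
    """Kernel density estimate at point x from the sample `data`,
    using a Gaussian kernel with the given bandwidth."""
    n = len(data)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) /
                math.sqrt(2 * math.pi) for xi in data)
    return total / (n * bandwidth)

sample = [1.2, 1.9, 2.1, 2.8, 3.3, 4.0]      # arbitrary illustrative data
print(gaussian_kernel_density(sample, 2.5))  # smooth density estimate at x = 2.5
```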

See also Nonparametric tests.



FURTHER READING
• Nonparametric test

REFERENCES
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962)

Savage, I.R.: Bibliography of Nonparametric Statistics and Related Topics: Introduction. J. Am. Stat. Assoc. 48, 844–849 (1953)

Gibbons, J.D., Chakraborti, S.: Nonparametric Statistical Inference, 4th edn. CRC Press (2003)

Nonparametric Test

A nonparametric test is a type of hypothesis test in which it is not necessary to specify the form of the distribution of the population under study. We should nevertheless have independent observations, that is, the selection of the individuals that form the sample must not influence the choice of the other individuals.

HISTORY
The first nonparametric test appeared in the work of Arbuthnott, J. (1710), who introduced the sign test. But most nonparametric tests were developed between 1940 and 1955.
We make special mention of the articles of Kolmogorov, Andrey Nikolaevich (1933), Smirnov, Nikolai Vasilievich (1939), Wilcoxon, F. (1945, 1947), Mann, H.B. and Whitney, D.R. (1947), Mood, A.M. (1950), and Kruskal, W.H. and Wallis, W.A. (1952). Later, many other articles were added to this list. Savage, I.R. (1962) published a bibliography of about 3000 articles concerning nonparametric tests written before 1962.

DOMAINS AND LIMITATIONS
The fast development of nonparametric tests can be explained by the following points:
• Nonparametric methods require few assumptions concerning the population under study, such as the assumption of normality.
• Nonparametric methods are often easier to understand and to apply than the equivalent parametric tests.
• Nonparametric methods are applicable in situations where parametric tests cannot be used, for example, when the variables are measured only on ordinal scales.
• Despite the fact that, at first sight, nonparametric methods seem to sacrifice an essential part of the information contained in the samples, theoretical research has shown that nonparametric tests are only slightly inferior to their parametric counterparts when the distribution of the studied population is specified, for example, as the normal distribution. Nonparametric tests, on the other hand, are superior to parametric tests when the distribution of the population is far from the specified (normal) distribution.

EXAMPLES
See Kolmogorov–Smirnov test, Kruskal–Wallis test, Mann–Whitney test, Wilcoxon signed test, sign test.

FURTHER READING
• Goodness of fit test
• Hypothesis testing
• Kolmogorov–Smirnov test
• Kruskal–Wallis test




• Mann–Whitney test
• Sign test
• Spearman rank correlation coefficient
• Test of independence
• Wilcoxon signed test
• Wilcoxon test

REFERENCES
Arbuthnott, J.: An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes. Philos. Trans. 27, 186–190 (1710)

Kolmogorov, A.N.: Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Instituto Italiano degli Attuari 4, 83–91 (1933)

Kruskal, W.H., Wallis, W.A.: Use of ranks in one-criterion variance analysis. J. Am. Stat. Assoc. 47, 583–621 and errata, ibid. 48, 907–911 (1952)

Mann, H.B., Whitney, D.R.: On a test whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60 (1947)

Mood, A.M.: Introduction to the Theory of Statistics. McGraw-Hill, New York (1950), chap. 16

Savage, I.R.: Bibliography of Nonparametric Statistics. Harvard University Press, Cambridge, MA (1962)

Smirnov, N.V.: Estimate of deviation between empirical distribution functions in two independent samples (Russian). Bull. Moscow Univ. 2(2), 3–16 (1939)

Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics 1, 80–83 (1945)

Wilcoxon, F.: Some rapid approximate statistical procedures. American Cyanamid, Stamford Research Laboratories, Stamford, CT (1957)

Norm of a Vector

The norm of a vector indicates the length of the vector, defined from the scalar product of the vector with itself. The norm of a vector is obtained by taking the square root of this scalar product.
A vector of norm 1 is called a unit vector.

MATHEMATICAL ASPECTS
For a given vector

x = [ x1, x2, . . . , xn ]′ ,

we define the scalar product of x with itself by:

x′ · x = Σ_{i=1}^{n} xi² .

The norm of the vector x thus equals:

‖ x ‖ = √(x′ · x) = √( Σ_{i=1}^{n} xi² ) .

DOMAINS AND LIMITATIONS
As the norm of a vector is calculated with the help of the scalar product, the properties of the norm follow from those of the scalar product. Thus:
• The norm of a vector x is strictly positive if the vector is not zero, and zero if it is:

  ‖ x ‖ > 0 if x ≠ 0  and  ‖ x ‖ = 0 if and only if x = 0 .



• The norm of the product of a vector x by a scalar k equals the product of the norm of x and the absolute value of the scalar:

  ‖ k · x ‖ = |k| · ‖ x ‖ .

• For two vectors x and y, we have:

  ‖ x + y ‖ ≤ ‖ x ‖ + ‖ y ‖ ,

  that is, the norm of a sum of two vectors is smaller than or equal to the sum of the norms of these vectors.
• For two vectors x and y, the absolute value of the scalar product of x and y is smaller than or equal to the product of the norms:

  | x′ · y | ≤ ‖ x ‖ · ‖ y ‖ .

  This result is called the Cauchy–Schwarz inequality.
The norm of a vector is also used to obtain a unit vector having the same direction as the given vector. It is enough in this case to divide the initial vector by its norm:

  y = x / ‖ x ‖ .

EXAMPLES
Consider the following vector defined in the Euclidean space of three dimensions:

x = [ 12, 15, 16 ]′ .

The scalar product of this vector with itself equals:

x′ · x = 12² + 15² + 16² = 144 + 225 + 256 = 625 ,

from which we get the norm of x:

‖ x ‖ = √(x′ · x) = √625 = 25 .

The unit vector having the direction of x is obtained by dividing each component of x by 25:

y = [ 0.48, 0.6, 0.64 ]′ ,

and we can verify that ‖ y ‖ = 1. Indeed:

‖ y ‖² = 0.48² + 0.6² + 0.64² = 0.2304 + 0.36 + 0.4096 = 1 .
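A tiny numerical check of this example (a sketch using numpy, not part of the original entry):

```python
import numpy as np

x = np.array([12.0, 15.0, 16.0])
norm_x = np.sqrt(x @ x)        # square root of the scalar product x'x
print(norm_x)                  # 25.0
print(np.linalg.norm(x))       # same result with the built-in norm
y = x / norm_x                 # unit vector in the direction of x
print(y, np.linalg.norm(y))    # [0.48 0.6 0.64] and 1.0
```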

FURTHER READING
• Least squares
• L1 estimation
• Vector

REFERENCES
Dodge, Y.: Mathématiques de base pour économistes. Springer, Berlin Heidelberg New York (2002)

Dodge, Y., Rousson, V.: Multivariate L1-mean. Metrika 49, 127–134 (1999)

Normal Distribution
A random variable X is distributed according to a normal distribution if it has a density function of the form:

f (x) = [ 1 / (σ √(2π)) ] · exp( −(x − μ)² / (2σ²) ) ,   (σ > 0) .

Normal distribution, μ = 0, σ = 1




We will say that X follows a normal distribution of mean μ and of variance σ². The normal distribution is a continuous probability distribution.

HISTORY
The normal distribution is often attributed to Laplace, P.S. and Gauss, C.F., whose name it bears. However, its origin dates back to the works of Bernoulli, J., who in his work Ars Conjectandi (1713) provided the first basic elements of the law of large numbers.
In 1733, de Moivre, A. was the first to obtain the normal distribution, as an approximation of the binomial distribution. This work was written in Latin and published in an English version in 1738. De Moivre, A. called what he found a "curve"; he discovered this curve while calculating the probabilities of gain in different games of chance.
Laplace, P.S., after de Moivre, A., studied this distribution and obtained a more formal and general result than the de Moivre approximation. In 1774 he obtained the normal distribution as an approximation of the hypergeometric distribution.
Gauss, C.F. studied this distribution through problems of measurement in astronomy. His works of 1809 and 1816 established techniques based on the normal distribution that became standard methods during the 19th century. Note that even if the first approximation of this distribution is due to de Moivre, Galileo had already found that errors of observation were distributed in a symmetric way and tended to cluster around their true value.
Many names for the normal distribution can be found in the literature. Quetelet, A. (1796–1874) spoke of the "curve of possibilities" or the "distribution of possibilities".

Note also that Galton, F. spoke of the "frequency of error distribution" or of the "distribution of deviations from a mean". Stigler, S. (1980) presents a more detailed discussion of the different names of this curve.

MATHEMATICAL ASPECTS
The expected value of the normal distribution is given by:

E [X] = μ .

The variance is equal to:

Var (X) = σ² .

If the mean μ is equal to 0 and the variance σ² is equal to 1, then we obtain a centered and reduced normal distribution (or standard normal distribution) whose density function is given by:

f (x) = [ 1 / √(2π) ] · exp( −x² / 2 ) .

If a random variable X follows a normal distribution of mean μ and variance σ², then the random variable

Z = (X − μ) / σ

follows a centered and reduced normal distribution (of mean 0 and variance 1).

DOMAINS AND LIMITATIONS
The normal distribution is the most famous continuous probability distribution. It plays a central role in the theory of probability and its statistical applications. Many measurements, such as the size or weight of individuals, the diameter of a piece of machinery, the results of an IQ test, etc.,



approximately follow a normal distribution. The normal distribution is frequently used as an approximation, either when normality is attributed to a distribution in the construction of a model or when a known distribution is replaced by a normal distribution with the same expected value and variance. It is used for the approximation of the chi-square distribution, the Student distribution, and discrete probability distributions such as the binomial distribution and the Poisson distribution.
The normal distribution is also a fundamental element of the theory of sampling, where its role is important in the study of correlation, regression analysis, variance analysis, and covariance analysis.

FURTHER READING
• Continuous probability distribution
• Lognormal distribution
• Normal table

REFERENCES
Bernoulli, J.: Ars Conjectandi, Opus Posthumum. Accedit Tractatus de Seriebus infinitis, et Epistola Gallice scripta de ludo Pilae recticularis. Impensis Thurnisiorum, Fratrum, Basel (1713)

Gauss, C.F.: Theoria Motus Corporum Coelestium. Werke, 7 (1809)

Gauss, C.F.: Bestimmung der Genauigkeit der Beobachtungen, vol. 4, pp. 109–117 (1816). In: Gauss, C.F. Werke (published in 1880). Dieterichsche Universitäts-Druckerei, Göttingen

Laplace, P.S. de: Mémoire sur la probabilité des causes par les événements. Mem. Acad. Roy. Sci. (presented by various scientists) 6, 621–656 (1774) (or Laplace, P.S. de (1891) Œuvres complètes, vol. 8. Gauthier-Villars, Paris, pp. 27–65)

Moivre, A. de: Approximatio ad summam terminorum binomii (a + b)^n, in seriem expansi. Supplementum II to Miscellanae Analytica, pp. 1–7 (1733). Photographically reprinted in a rare pamphlet on Moivre and some of his discoveries, published by Archibald, R.C. Isis 8, 671–683 (1926)

Moivre, A. de: The Doctrine of Chances: or, A Method of Calculating the Probability of Events in Play. Pearson, London (1718)

Stigler, S.: Stigler's law of eponymy. Transactions of the New York Academy of Sciences, 2nd series 39, 147–157 (1980)

Normal Equations
Normal equations are the equations obtained by setting equal to zero the partial derivatives of the sum of squared errors (least squares); the normal equations allow one to estimate the parameters of a multiple linear regression.

HISTORY
See regression analysis.

MATHEMATICAL ASPECTS
Consider a general model of multiple linear regression:

Yi = β0 + Σ_{j=1}^{p−1} βj · Xji + εi ,   i = 1, . . . , n ,

where Yi is the dependent variable, Xji, j = 1, . . . , p − 1, are the independent variables, εi is the random, nonobservable error term,




βj, j = 0, . . . , p − 1, are the parameters to be estimated, and n is the number of observations.
To apply the method of least squares, we should minimize the sum of squared errors:

f (β0, β1, . . . , βp−1) = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} ( Yi − β0 − Σ_{j=1}^{p−1} βj · Xji )² .

Setting the p partial derivatives, relative to the parameters to be estimated, equal to zero, we get the p normal equations:

∂f (β0, β1, . . . , βp−1) / ∂β0 = 0 ,
∂f (β0, β1, . . . , βp−1) / ∂β1 = 0 ,
. . .
∂f (β0, β1, . . . , βp−1) / ∂βp−1 = 0 .

We can also express the same model in matrix form:

Y = Xβ + ε ,

where Y is the (n × 1) vector of observations of the dependent variable (n observations), X is the (n × p) design matrix of the independent variables, ε is the (n × 1) vector of errors, and β is the (p × 1) vector of the parameters to be estimated.
By the least-squares method, we should minimize:

ε′ε = (Y − Xβ)′ (Y − Xβ)
    = Y′Y − β′X′Y − Y′Xβ + β′X′Xβ
    = Y′Y − 2β′X′Y + β′X′Xβ .

Setting to zero the derivatives relative to β (in matrix form), corresponding to the partial derivatives but written in vector form, we find the normal equations:

X′Xβ = X′Y .

DOMAINS AND LIMITATIONS
We can generalize the concept of normal equations to the case of a nonlinear regression (with r observations and p + 1 parameters to estimate). They are expressed by:

Σ_{i=1}^{r} ∂ ( Yi − f (X1i, . . . , Xki; β0, . . . , βp) )² / ∂βj
   = −2 Σ_{i=1}^{r} ( Yi − f (X1i, . . . , Xki; β0, . . . , βp) ) · [ ∂ f (X1i, . . . , Xki; β0, . . . , βp) / ∂βj ] = 0 ,

where j = 0, . . . , p. Because f is nonlinear in the parameters to be estimated, the normal equations are also nonlinear and so can be very difficult to solve. Moreover, the system of equations can have more than one solution, corresponding to the possibility of several minima of the sum of squares. Normally, we try to develop an iterative procedure to solve the equations or refer to the techniques explained under nonlinear regression.

EXAMPLES
If the model contains, for example, 3 parameters β0, β1, and β2, the normal equations are the following:

∂ f (β0, β1, β2) / ∂β0 = 0 ,
∂ f (β0, β1, β2) / ∂β1 = 0 ,
∂ f (β0, β1, β2) / ∂β2 = 0 .



That is:

β0 · n + β1 · Σ_{i=1}^{n} X1i + β2 · Σ_{i=1}^{n} X2i = Σ_{i=1}^{n} Yi ,

β0 · Σ_{i=1}^{n} X1i + β1 · Σ_{i=1}^{n} X1i² + β2 · Σ_{i=1}^{n} X1i · X2i = Σ_{i=1}^{n} X1i · Yi ,

β0 · Σ_{i=1}^{n} X2i + β1 · Σ_{i=1}^{n} X2i · X1i + β2 · Σ_{i=1}^{n} X2i² = Σ_{i=1}^{n} X2i · Yi .

See simple linear regression.

FURTHER READING
• Error
• Least squares
• Multiple linear regression
• Nonlinear regression
• Parameter
• Simple linear regression

REFERENCES
Legendre, A.M.: Nouvelles méthodes pour la détermination des orbites des comètes. Courcier, Paris (1805)

Normal Probability Plot

A normal probability plot allows one to verify whether a data set is distributed according to a normal distribution. When this is the case, it is possible to estimate the mean and the standard deviation from the plot.
The cumulated frequencies are plotted on the ordinate on a Gaussian scale, graduated by putting the value of the distribution function F(t) of the (centered and reduced) normal distribution opposite the point located at a distance t from the origin. On the abscissa, the observations are represented on an arithmetic scale. In practice, the normal probability papers sold on the market are used.

MATHEMATICAL ASPECTS
On an axis system, the choice of the origin and of the size of the unit length is made according to the observations that are to be represented. The abscissa is then arithmetically graduated, like millimeter paper.
On the ordinate, the values of the distribution function F(t) of the normal distribution are transcribed at height t from a fictive origin placed at mid-height of the sheet of paper. Therefore the values:

F (t)     t
0.9772    2
0.8412    1
0.5000    0
0.1588   −1
0.0228   −2

will be placed, and only the scale on the left, graduated from 0 to 1 (or from 0 to 100 if written in %), will be kept.
Consider a set of n data points that are supposed to be sorted in increasing order. Let us call this series x1, x2, . . . , xn; for each abscissa xi (i = 1, . . . , n), the ordinate is calculated as

100 · (i − 3/8) / (n + 1/4)

and represented on a normal probability plot. It is also possible to plot

100 · (i − 1/2) / n

as a function of xi on the plot. This second approximation of the cumulated frequency is explained as follows. If the surface located under the normal curve (with area equal to




[Figure: normal probability paper. Vertical scale: 0.01–99.99 %]



1) is divided into n equal parts, it can be supposed that each of our n observations can be found in one of these parts. Therefore the ith observation xi (in increasing order) is located in the middle of the ith part, which in terms of cumulated frequency corresponds to

(i − 1/2) / n .

The normality of the observed distribution can then be checked by examining the alignment of the points: if the points are approximately aligned, then the distribution is normal.

DOMAINS AND LIMITATIONS
By representing the cumulated frequencies as a function of the values of the statistical variable on a normal probability plot, it is possible to verify whether the observed distribution follows a normal distribution. Indeed, if this is the case, the points are approximately aligned.
Therefore the graphical adjustment of a line to a set of points allows one to:
• Visualize the normal character of the distribution in a better way than with a histogram.
• Make a graphical estimation of the mean and the standard deviation of the observed distribution.

The least-squares line cuts the ordinate 0.50 (or 50%) at the estimate of the mean (which can be read on the abscissa). The estimated standard deviation is given by the inverse of the slope of the line.
It is easy to determine the slope by simply considering two points. Let us choose (μ, 0.5) and the point at the ordinate

0.9772  ( = F(2) = F[ (X0.9772 − μ) / σ ] ) ,

whose abscissa X0.9772 is read on the line. We have:

1/σ = [ F⁻¹(0.9772) − F⁻¹(0.5) ] / (X0.9772 − μ) = slope of the least-squares line ,

from which we finally obtain:

σ = (X0.9772 − μ) / 2 .

EXAMPLES
Consider ten pieces fabricated by a machine X, taken at random. They have the following diameters:

9.36 , 9.69 , 11.10 , 8.72 , 10.13 ,

9.98 , 11.42 , 10.33 , 9.71 , 10.96 .

These diameters, supposedly distributed according to a normal distribution, are then sorted in increasing order and, for each diameter, the value

fi = (i − 3/8) / (10 + 1/4)

is calculated, i being the rank of each observation:

xi       fi
8.72     5/82
9.36     13/82
9.69     21/82
9.71     29/82
9.98     37/82




xi       fi
10.13    45/82
10.33    53/82
10.96    61/82
11.10    69/82
11.42    77/82

The graphical representation of fi with respect to xi on a piece of normal probability paper provides, for our random sample: estimated mean of the diameters μ = 10.2, estimated standard deviation σ = 0.9.
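A small computational sketch of this construction (using numpy and scipy, not part of the original entry), which estimates the mean and standard deviation from the fitted line:

```python
import numpy as np
from scipy.stats import norm

diameters = np.sort(np.array([9.36, 9.69, 11.10, 8.72, 10.13,
                              9.98, 11.42, 10.33, 9.71, 10.96]))
n = len(diameters)
i = np.arange(1, n + 1)
f = (i - 3/8) / (n + 1/4)          # plotting positions f_i
z = norm.ppf(f)                    # corresponding normal quantiles (the "t" scale)

# Fit a straight line z = a * x + b; the slope a estimates 1/sigma, and the
# abscissa where z = 0 (i.e., F = 0.5) estimates the mean.
a, b = np.polyfit(diameters, z, 1)
sigma_hat = 1 / a
mu_hat = -b / a
print(mu_hat, sigma_hat)  # close to the graphical estimates of about 10.2 and 0.9
```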

FURTHER READING
• Distribution function
• Frequency
• Frequency distribution
• Normal distribution

Normal Table

The normal table, also called a Gauss table or a normal distribution table, gives the values of the distribution function of a random variable following a standard normal distribution, that is, with mean equal to 0 and variance equal to 1.

HISTORY
de Laplace, Pierre Simon (1774) obtained the normal distribution from the hypergeometric distribution, and in a second work (dated 1778 but published in 1781) he gave a normal table. Pearson, Egon Sharpe and Hartley, H.O. (1948) edited a normal table based on the values calculated by Sheppard, W.F. (1903, 1907). Exhaustive lists of the normal tables on the market were given by the National Bureau of Standards (until 1952) and by Greenwood, J.A. and Hartley, H.O. (until 1958).

DOMAINS AND LIMITATIONS
See central limit theorem.

MATHEMATICAL ASPECTS
Let the random variable Z follow a normal distribution of mean 0 and of variance 1. The density function f (t) then equals:

f (t) = [ 1 / √(2π) ] · exp( −t² / 2 ) .

The distribution function of the random variable Z is defined by:

F (z) = P (Z ≤ z) = ∫_{−∞}^{z} f (t) dt .

The normal table gives the values of F (z) for the different values of z between 0 and 3.5. Since the normal distribution is symmetric relative to the mean, we have:

F (z) = 1− F (−z) ,

which allows one to determine F (z) for a negative value of z.
Sometimes we must use the normal table in the inverse direction, to find the value of z corresponding to a given probability. We generally denote by z = zα the value of the random variable Z for which

P (Z ≤ zα) = 1− α .



EXAMPLES
See Appendix G. Examples of normal table use:

P (Z ≤ 2.5) = 0.9938 ,

P (1 ≤ Z ≤ 2) = P (Z ≤ 2) − P (Z ≤ 1)
             = 0.9772 − 0.8413 = 0.1359 ,

P (Z ≤ −1) = 1 − P (Z ≤ 1)
           = 1 − 0.8413 = 0.1587 ,

P (−0.5 ≤ Z ≤ 1.5) = P (Z ≤ 1.5) − P (Z ≤ −0.5)
                   = P (Z ≤ 1.5) − [1 − P (Z ≤ 0.5)]
                   = 0.9332 − [1 − 0.6915]
                   = 0.9332 − 0.3085 = 0.6247 .

Conversely, we can use the normal table to determine the limit zα for which the probability that the random variable Z takes a value smaller than that limit zα is equal to a fixed value (1 − α), that is:

P (Z ≤ zα) = 0.95 ⇒ zα = 1.645 ,

P (Z ≤ zα) = 0.975 ⇒ zα = 1.960 .

This allows us, in hypothesis testing, to determine the critical value zα relative to a given significance level α.

Example of Application
In the case of a one-tailed test, if the significance level α equals 5%, then we can determine the critical value z0.05 in the following manner:

P (Z ≤ z0.05) = 1 − α = 1 − 0.05 = 0.95  ⇒  z0.05 = 1.645 .

In the case of a two-tailed test, we should find the value zα/2. With a significance level of 5%, the critical value z0.025 is obtained in the following manner:

P (Z ≤ z0.025) = 1 − α/2 = 1 − 0.025 = 0.975  ⇒  z0.025 = 1.960 .
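The table values used above can be checked numerically; a brief sketch with scipy (not part of the original entry):

```python
from scipy.stats import norm

print(norm.cdf(2.5))                     # F(2.5)            ≈ 0.9938
print(norm.cdf(2) - norm.cdf(1))         # P(1 ≤ Z ≤ 2)      ≈ 0.1359
print(norm.cdf(1.5) - norm.cdf(-0.5))    # P(−0.5 ≤ Z ≤ 1.5) ≈ 0.6247
print(norm.ppf(0.95), norm.ppf(0.975))   # critical values ≈ 1.645 and 1.960
```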

FURTHER READING
• Hypothesis testing
• Normal distribution
• One-sided test
• Statistical table
• Two-sided test

REFERENCES
Greenwood, J.A., Hartley, H.O.: Guide to Tables in Mathematical Statistics. Princeton University Press, Princeton, NJ (1962)

Laplace, P.S. de: Mémoire sur la probabilité des causes par les événements. Mem. Acad. Roy. Sci. (presented by various scientists) 6, 621–656 (1774) (or Laplace, P.S. de (1891) Œuvres complètes, vol. 8. Gauthier-Villars, Paris, pp. 27–65)

Laplace, P.S. de: Mémoire sur les probabilités. Mem. Acad. Roy. Sci. Paris, 227–332 (1781) (or Laplace, P.S. de (1891) Œuvres complètes, vol. 9. Gauthier-Villars, Paris, pp. 385–485)

National Bureau of Standards: A Guide to Tables of the Normal Probability Integral. U.S. Department of Commerce, Applied Mathematics Series 21 (1952)

Pearson, E.S., Hartley, H.O.: Biometrika Tables for Statisticians, 1 (2nd edn.). Cambridge University Press, Cambridge (1948)


Sheppard, W.F.: New tables of the probability integral. Biometrika 2, 174–190 (1903)
Sheppard, W.F.: Table of deviates of the normal curve. Biometrika 5, 404–406 (1907)

Normalization

Normalization is the transformation of a normally distributed random variable into a random variable following a standard normal distribution. It allows us to calculate and compare values belonging to normal curves with different means and variances on the basis of a reference normal distribution, namely the standard normal distribution.

MATHEMATICAL ASPECTS
Let the random variable X follow the normal distribution N(μ, σ²). Its normalization gives us a random variable

Z = (X − μ) / σ

that follows a standard normal distribution N(0, 1). Each value of the distribution N(μ, σ²) can be transformed into the standard variable Z, each Z representing a deviation from the mean expressed in units of standard deviation.

EXAMPLES
The students of a professional school have had two exams. Each exam was graded on a scale of 1 to 60, and the grades are considered the realizations of two random variables of a normal distribution. Let us compare the grades received by the students on these two exams.
Here are the means and the standard deviations of each exam, calculated on all the students:

Exam 1: μ1 = 35 , σ1 = 4 ,
Exam 2: μ2 = 45 , σ2 = 1.5 .

A student named Marc got the following results:

Exam 1: X1 = 41 ,

Exam 2: X2 = 48 .

The question is to know on which exam Marc scored better relative to the other students. We cannot directly compare the results of the two exams because the results belong to distributions with different means and standard deviations. If we simply examine the difference of each score from the mean of its distribution, we get:

Exam 1: X1 − μ1 = 41− 35 = 6 ,

Exam 2: X2 − μ2 = 48− 45 = 3 .

Note that Marc's score was 6 points higher than the mean on exam 1 and only 3 points higher than the mean on exam 2. A hasty conclusion would suggest to us that Marc did better on exam 1 than on exam 2, relative to the other students. This conclusion takes into account only the difference of each result from the mean. It does not consider the dispersion of student grades around the mean. We divide the difference from the mean by the standard deviation to make the results comparable:

Exam 1: Z1 = (X1 − μ1)/σ1 = 6/4 = 1.5 ,
Exam 2: Z2 = (X2 − μ2)/σ2 = 3/1.5 = 2 .

By this calculation we have normalized the scores X1 and X2. We can conclude that the normalized value is greater for exam 2 (Z2 = 2) than for exam 1 (Z1 = 1.5) and that Marc therefore did better on exam 2, relative to the other students.
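As an illustration (not part of the original entry), the normalization of Marc's two scores can be reproduced with a few lines of Python:

# Normalization (z-scores) of Marc's two exam results
mu    = {1: 35, 2: 45}    # means of the two exams
sigma = {1: 4,  2: 1.5}   # standard deviations of the two exams
x     = {1: 41, 2: 48}    # Marc's scores

for exam in (1, 2):
    z = (x[exam] - mu[exam]) / sigma[exam]
    print(f"Exam {exam}: Z = {z}")   # Exam 1: Z = 1.5, Exam 2: Z = 2.0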

FURTHER READING
• Normal distribution

Null Hypothesis

In hypothesis testing, the null hypothesis is the hypothesis that is to be tested. It is designated by H0. The opposite hypothesis is called the alternative hypothesis. It is the alternative hypothesis that will be accepted if the test leads to rejecting the null hypothesis.

HISTORY
See hypothesis and hypothesis testing.

MATHEMATICAL ASPECTS
When performing hypothesis testing on a parameter of a population, the null hypothesis is usually a supposition about the presumed value of this parameter. The null hypothesis is then presented as:

H0 : θ = θ0 ,

θ being the unknown parameter of the population and θ0 the presumed value of this parameter. This parameter can be, for example, the mean of the population. When hypothesis testing aims at comparing two populations, the null hypothesis generally supposes that the parameters are equal:

H0 : θ1 = θ2 ,

where θ1 is the parameter of the population from which the first sample came and θ2 that of the population from which the second sample came.

EXAMPLES
We are going to examine the null hypothesis in three examples of hypothesis testing:
1. Hypothesis testing on the percentage of a sample
A candidate running for office wants to know if he will receive more than 50% of the vote. The null hypothesis for this problem can be posed thus:

H0 : π = 0.5 ,

where π is the percentage of the population to be estimated.

2. Hypothesis testing on the mean of a population
A manufacturer wants to test the precision of a new machine that should produce pieces 8 mm in diameter. We can pose the following null hypothesis:

H0 : μ = 8 ,

where μ is the mean of the population to be estimated.

3. Hypothesis testing on the comparison of the means of two populations
An insurance company has decided to equip its offices with computers. It wants to buy computers from two different suppliers if there is no significant difference between the reliability of the two brands. It draws a sample of PCs from each brand and records, for each PC, the time elapsed until its first problem. According to the null hypothesis, the mean time to the first problem is the same for each brand:

H0 : μ1 − μ2 = 0 ,


where μ1 and μ2 are the respective means of the two populations. This hypothesis can also be written as:

H0 : μ1 = μ2 .

FURTHER READING
• Alternative hypothesis
• Analysis of variance
• Hypothesis
• Hypothesis testing


O

Observation

An observation is the result of a scientific study assembling information on a statistical unit belonging to a given population.

FURTHER READING
• Data
• Outlier
• Population
• Sample

Odds and Odds Ratio

Odds are defined as the ratio of the probability of an event to the probability of the complementary event, that is, the ratio of the probability of an event occurring (in the particular case of epidemiology, an illness) to the probability of its not occurring.
The odds ratio is, like relative risk, a measure of the relative effect of a cause. More precisely, it is the ratio of the odds calculated within populations exposed to a given risk factor to varying degrees.

MATHEMATICAL ASPECTS
The odds attached to an event "having an illness" are, formally, the following ratio, where p denotes the probability of having the illness:

Odds = p / (1 − p) .

Let there be a risk factor to which a group E is exposed and a group NE is not exposed. The corresponding odds ratio, OR, is the ratio of the odds of group E to the odds of group NE:

OR = (odds of group E) / (odds of group NE) .

DOMAINS AND LIMITATIONS
Let us take as an example the calculation of the odds ratio from the data in the following table comparing two groups of individuals attacked by the same potentially dangerous illness; one group ("treated" group) receives medical treatment while the other does not ("reference" group):

Group      Deaths   Survivors   Odds    OR
Treated    5%       95%         0.053   0.47
Reference  10%      90%         0.111   1.0*

*) By definition the OR equals 1 for the reference group.

If we call here "risk" the risk attached to the event "death", then the OR is a good approximation of the relative risk, on the condition that the risks of death are similar or, if they are strongly different, are small. We generally consider a risk smaller than 10%, as in this case, as being small. Otherwise the odds ratio does not remain an approximation of the relative risk, because a bias of the OR appears when the risks are large. That is:

OR = [risk in treated group / survival rate in treated group] / [risk in reference group / survival rate in reference group]
   = (risk in treated group / risk in reference group) × (survival rate in reference group / survival rate in treated group)
   = relative risk × bias of odds ratio ,

with:

bias of OR = survival rate in reference group / survival rate in treated group .

The estimations of the relative risk and of the odds ratio are used to measure the strength of causal associations between the exposure to a risk factor and an illness. The two ratios can take values between 0 and infinity and admit the same general interpretation. We can distinguish two clearly different cases:
1. An OR value greater than 1 indicates a risk or an odds of illness that is greater when the studied group is exposed to the specific risk factor. In such a case there is a positive association.
2. An OR value smaller than 1 indicates a risk or an odds of illness that is reduced under exposure to the risk factor. In this case there is a negative association.

Confidence Interval at 95% for the Odds Ratio
The technique and notation used for a 95% confidence interval for the odds ratio are the same as those for relative risk:

exp{ (ln OR) ± 1.96 · √(1/a − 1/n + 1/c − 1/m) } .

a, b, c, and d must be large. If the confidence interval includes the value 1.0, then the odds ratio is not statistically significant.

EXAMPLES
Consider a study of the relation between the risk factor "smoking" and breast cancer. The odds of breast cancer is the ratio of the probability of developing this cancer to the probability of not having this illness during a given period. The odds of developing breast cancer for each group are defined by the group's level of exposure to smoking. The following table illustrates this example:

Group            Cancer              No cancer           Odds of breast       OR
                 (/100000/2 years)   (/100000/2 years)   cancer (/2 years)
Non-exposed      114                 99886               0.00114              1.0
Passive smokers  252                 99748               0.00252              2.2
Active smokers   276                 99724               0.00276              2.4

In this example, the odds of breast cancer of the nonexposed group are 0.00114 (114 : 99886), those of the passive smokers 0.00252 (252 : 99748), and those of the active smokers 0.00276 (276 : 99724). Taking as the reference group the nonexposed subjects, the odds ratio of breast cancer is, for the active smokers:

(276/99724) / (114/99886) = 2.4 = (276/114) · (99886/99724) .

The risk of breast cancer over 2 years being very small, the survival of active smokers as well as that of nonexposed subjects is close to 100%, and so the survival ratio (the bias of the odds ratio) is practically equal to 1.0, which is interpreted as an almost complete absence of OR bias. In this example, the odds ratio of breast cancer is identical to the relative risk of breast cancer in the passive smokers and the active smokers. Since breast cancer is a rare illness, the odds ratio is a good approximation of the relative risk.
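For illustration (not part of the original entry), the odds and the odds ratios of this example can be computed directly from the counts with a short Python sketch:

# Odds and odds ratios for the breast cancer example (counts per 100000 over 2 years)
cases = {"non-exposed": 114, "passive smokers": 252, "active smokers": 276}
n = 100000

odds = {group: k / (n - k) for group, k in cases.items()}
reference = odds["non-exposed"]

for group in cases:
    print(group, round(odds[group], 5), "OR =", round(odds[group] / reference, 1))
# ORs of about 1.0, 2.2 and 2.4, matching the table up to rounding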

FURTHER READING
• Attributable risk
• Avoidable risk
• Cause and effect in epidemiology
• Incidence
• Incidence rate
• Odds and odds ratio
• Prevalence
• Prevalence rate
• Relative risk
• Risk

REFERENCES
Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)
MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)
Morabia, A.: Epidemiologie Causale. Editions Médecine et Hygiène, Geneva (1996)
Morabia, A.: L'Épidémiologie Clinique. Editions "Que sais-je?". Presses Universitaires de France, Paris (1996)

Official Statistics

The adjective official means coming from the government or the administration; authorized by a proper authority. Thus the word official, in conjunction with the word statistics, indicates a relation with the state. Moreover, the etymology of the word statistics, from the Latin status (state), points toward governmental statistics: data collected by government establishments or by private agencies working for the government. The term official statistics refers to a very general concept, and such statistics can be obtained in different ways.

HISTORY
The census, the most widely known of official statistics, is a practice almost as old as the social organization of people. Since antiquity, the kings of Babylon and Egypt, as well as Chinese, Greek, Hebrew, Persian, and Roman chiefs, tried to determine the number of soldiers under their command and the goods they possessed. The first statistical trials date back more than 6000 years, but they were only sporadic trials of counting people and goods. Statisticians, in the modern sense of the term, appear only in the late 17th century, with the establishment of the modern state and the needs of colonial administration in France and England. In the 18th century, in Diderot and Alembert's encyclopedia, "arithmetical politics" is defined as "the application of arithmetical calculations to political subjects and usage". The French Revolution, and then the birth of democracy in the United States, justified the new developments in official statistics. The new states, to know the political weight of their different entities, needed to know about their populations. That is why the first regular decennial censuses started in the US in 1790.
Since the Second World War, empires, colonies, and tribal societies have been gradually replaced by national states. To organize elections and assemblies and to facilitate the development of modern corporations, these new states had to establish statistical systems that were essential conditions of their development. The United Nations, and in particular its Statistical Commission, expended considerable effort in promoting and developing official statistics in different states. Under their auspices, the first world population census was organized in 1950. Another followed in 1960.
Now official statistics are ubiquitous. The Organization for Economic Cooperation and Development, OECD, wants to produce statistics that could help in the formulation of a precise, comprehensive, and balanced assessment regarding the state of the principal sectors of society. These statistics, called indicators, have since the 1960s been the subject of much research in economics, health, and education. The goal is to create, for politicians, a system of information allowing them to determine the effects of their economic, social, and educational policies.

EXAMPLES
Official statistics treat many domains, of which the most important are:
• Population (families, households, births, adoptions, recognitions, deaths, marriages, divorces, migrations, etc.)
• Environment (land, climate, water, fauna, flora, etc.)
• Employment and active life (companies and organizations, employment, work conditions, etc.)
• National accounts (national accounting, balance of payments, etc.)
• Price (prices and price indices)
• Production, commerce, and consumption (performance of companies, production, turnover, foreign trade, etc.)
• Agriculture and silviculture (resource use, labor, machines, forests, fishing, etc.)
• Energy (balance, production, consumption, etc.)
• Construction and housing (building and housing structures, living conditions, etc.)
• Tourism (hotels, trips, etc.)
• Transportation and communications (vehicles, installations, traffic, highway accidents, etc.)
• Financial markets and banks (banking system, financial accounting, etc.)
• Insurance (illness insurance, private insurance, etc.)
• Health (healthcare system, mortality and morbidity causes, healthcare workers, equipment, healthcare costs, etc.)
• Education and science (primary and secondary education, higher education, research and development, etc.)
• Sports, culture, and standard of living (language, religion, cultural events, sports, etc.)
• Politics (elections, participation in political life, etc.)
• Public finance (federal and local state revenues and expenditures, taxation, etc.)
• Legal system (penal code, courts, recidivism, etc.)

DOMAINS AND LIMITATIONS
History shows the mutual interdependence between statistics and the evolution of the state. This interdependence explains, at least partly, the difficulty that statisticians have in defining clear limits of their discipline. Throughout history and even today, political power determines the limits of the study of official statistics, expanding or limiting the contents depending on its needs.
Political choice comes into play in many areas of official statistics: in definitions of such concepts as unemployment, poverty, wellness, etc.; in the definitions of variables such as professional categories, level of salary, etc. This problem of choice leads to two other problems:
1. The comparability of statistics

The needs and objectives of the state dictate choice in this domain. Because needs are not always the same, choices will differ in different places and international statistics will not always be comparable. The difference in different states' means for data collection and use also explains the difference in the results among countries.

2. Objectivity
Official statistics require objectivity. Even if political will cannot be completely eliminated, the evolution of statistical theories and improvements in tools allow for much larger and more reliable data collection. The volume of treated data has increased greatly, and information has improved in terms of objectivity.

FURTHER READING
• Census
• Data
• Index number
• Indicator
• Population

REFERENCES
Duncan, J.W., Gross, A.C.: Statistics for the 21st Century: Proposals for Improving Statistics for Better Decision Making. Irwin Professional Publishing, Chicago (1995)
Dupâquier, J., Dupâquier, M.: Histoire de la démographie. Académie Perrin, Paris (1985)
INSEE: Pour une histoire de la statistique. Vol. 1: Contributions. INSEE, Economica, Paris (1977)
INSEE: Pour une histoire de la statistique. Vol. 2: Matériaux. J. Affichard (ed.) INSEE, Economica, Paris (1987)
ISI: The Future of Statistics: An International Perspective. Voorburg (1994)
Stigler, S.: The History of Statistics, the Measurement of Uncertainty Before 1900. Belknap, London (1986)
United Nations Statistical Commission: Fundamental Principles of Official Statistics

Ogive

An ogive is a graphical representation of a cumulated frequency distribution. This is a type of frequency graphic and is also called a cumulated frequency polygon. It serves to give the number (or proportion) of observations smaller than or equal to a particular value.

MATHEMATICAL ASPECTS
An ogive is constructed on a system of perpendicular axes. We place on the horizontal axis the limits of the class intervals previously determined. Above each of these limit values, we determine the ordinate equal to the cumulated frequency corresponding to this value. Joining by line segments the successive points thus determined, we obtain a line called an ogive. We normally place the cumulated frequency on the left vertical axis and the percentage of cumulated frequency on the right vertical axis.

DOMAINS AND LIMITATIONS
Ogives are especially useful for estimating centiles in a distribution. For example, we can find the central point such that 50% of the observations would be below this point and 50% above. To do this, we draw a line from the point of 50% on the axis of percentages until it intersects with the curve. Then we vertically project the intersection onto the horizontal axis. The last intersection gives us the desired value. The frequency polygon and ogive are used to compare two statistical sets whose numbers of observations could be different.

EXAMPLES
From the table representing the distribution of the daily turnover of 75 stores we will construct an ogive.

Distribution of daily turnover of 75 stores (in dollars):

Turnover            Frequency   Cumulated   Percentage of
(class intervals)               frequency   cumulated frequency
215–235             4           4           5.3
235–255             6           10          13.3
255–275             13          23          30.7
275–295             22          45          60.0
295–315             15          60          80.0
315–335             6           66          88.0
335–355             5           71          94.7
355–375             4           75          100.0
Total               75

In this example, we obtain the estimate that 50% of the stores have a turnover smaller than or equal to about US$289.
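As a sketch (not part of the original entry), the graphical reading of this central point can be reproduced by linear interpolation within the class containing the 50% mark:

# Estimate the median daily turnover by linear interpolation on the ogive
limits  = [215, 235, 255, 275, 295, 315, 335, 355, 375]        # class limits
cum_pct = [0, 5.3, 13.3, 30.7, 60.0, 80.0, 88.0, 94.7, 100.0]  # cumulated percentages

target = 50.0
# first class limit whose cumulated percentage reaches the target
i = next(k for k in range(1, len(cum_pct)) if cum_pct[k] >= target)
median = limits[i - 1] + (target - cum_pct[i - 1]) / (cum_pct[i] - cum_pct[i - 1]) * (limits[i] - limits[i - 1])
print(round(median, 1))   # about 288, in line with the graphical reading of roughly 289 dollars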

FURTHER READING
• Frequency distribution
• Frequency polygon
• Graphical representation

Olkin, Ingram

Olkin, Ingram was born in 1924 in Waterbury, CT, USA. He received his diploma in 1947 at the City College of New York and continued his studies at Columbia University, where he received a master's degree in statistics (1949). He received his doctorate in statistical mathematics at the University of North Carolina in 1951. He then taught at Michigan State University and at the University of Minnesota before being named professor of statistics at Stanford University, a position he currently occupies.
Ingram Olkin has served as editor of the Annals of Statistics, associate editor of Psychometrika, Journal of Educational Statistics, Journal of the American Statistical Association, and Linear Algebra and Its Applications, among other mathematical and statistical journals. His research is focused on multivariate analysis, inequalities, and the theory and application of meta-analysis in the social sciences and biology. He is coauthor of many works including Inequalities: Theory of Majorization and Its Applications (1979), Selecting and Ordering Populations (1980), Probability Models and Applications (1980), and, more recently, Statistical Methods for Meta-Analysis (1985).

Selected works and articles of Ingram Olkin:

1979 (with Marshall, A.) Inequalities: Theory of Majorization and Its Applications. Academic, New York.
1984 (with Marsaglia, G.) Generating Correlation Matrices. SIAM Journal of Scientific and Statistical Computing 5, pp. 470–475.
1985 (with Hedges, L.V.) Statistical Methods for Meta-Analysis. Academic, New York.
1994 (with Gleser, L.J. and Derman, C.) Probability Models and Applications. Macmillan, New York.
1995 (with Moher, D.) Meta-analysis of randomised controlled trials. A concern for standards. J. Am. Med. Assoc. 274, 1962–1963.
1996 (with Begg, C., Cho, M., Eastwood, S., Horton, R., Moher, D., Pitkin, R., Rennie, D., Schulz, K.F., Simel, D., Stroup, D.F.) Improving the quality of reporting of randomized controlled trials. The CONSORT statement. J. Am. Med. Assoc. 276(8), 637–639.

One-Sided Test

A one-sided or one-tailed test on a population parameter is a type of hypothesis test in which the values for which we can reject the null hypothesis, denoted H0, are located entirely in one tail of the probability distribution.

MATHEMATICAL ASPECTS
A one-sided test on one sample is a type of hypothesis test where the hypotheses are of the form:

(1) H0: θ ≥ θ0 , H1: θ < θ0
or
(2) H0: θ ≤ θ0 , H1: θ > θ0 ,

where H0 is the null hypothesis, H1 the alternative hypothesis, θ an unknown parameter of the population, and θ0 the presumed value of this parameter.

In the case of a one-sided test on two samples, the hypotheses are as follows:

(1) H0: θ1 ≥ θ2 , H1: θ1 < θ2
or
(2) H0: θ1 ≤ θ2 , H1: θ1 > θ2 ,

where θ1 and θ2 are the unknown parameters of the two populations being compared.
The critical region, defined as the set of values of the test statistic for which the null hypothesis H0 is rejected, is for a one-sided test either the set of values smaller than the critical value of the test or the set of values greater than the critical value of the test.

EXAMPLES
One-sided Test on a Population Mean
A perfume manufacturer wants to make sure that their bottles really contain a minimum of 40 ml of perfume. A sample of 50 bottles gives a mean of 39 ml, with a standard deviation of 4 ml.
The hypotheses are as follows:

Null hypothesis H0 : μ ≥ 40 ,

Alternative hypothesis H1 : μ < 40 .

If the sample size is large enough, the sampling distribution of the mean can be approximated by a normal distribution. For a significance level α = 1%, we obtain the value of zα in the normal table: zα = 2.33. The critical value of the null hypothesis is calculated by the expression:

μx − zα · σx ,

where μx is the mean of the sampling distribution of the means,

μx = μ ,

and σx is the standard deviation of the sampling distribution of the means, or standard error of the mean,

σx = σ / √n .

We thus obtain the following critical value:

40 − 2.33 · 4/√50 = 38.68 .

As the sample mean x̄ = 39 is greater than the critical value of this left one-tailed test, the manufacturer cannot reject the null hypothesis, and so he can consider that his bottles of perfume do indeed contain at least 40 ml of perfume, the deviation of the sample mean from 40 ml being attributable to chance.
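The critical value of this one-tailed test can be checked numerically; the following minimal Python sketch (not part of the original entry, using scipy only for the normal quantile) reproduces the calculation:

from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 40, 4, 50, 0.01
z_alpha = norm.ppf(1 - alpha)                # 2.33 in the normal table
critical = mu0 - z_alpha * sigma / sqrt(n)   # left-tailed critical value
print(round(critical, 2))                    # about 38.68

x_bar = 39
print("reject H0" if x_bar < critical else "cannot reject H0")   # cannot reject H0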

One-sided Test on the Percentages of Two Populations
The proposer of a bill on road traffic believes that his bill will be more readily accepted by urban populations than by rural ones. An inquiry was made on two samples of 100 people from urban and rural environments. In the urban environment (population 1), 82 people responded favorably to his bill, while in the rural environment (population 2), only 69 people responded favorably. To confirm or invalidate the proposer's suspicion, we formulate a hypothesis test posing the following hypotheses:

Null hypothesis H0 : π1 ≤ π2

or π1 − π2 ≤ 0

Alt. hypothesis H1 : π1 > π2

or π1 − π2 > 0

where π1 and π2 represent the favorable proportions of the urban and rural populations, respectively.
Based on the measured percentages of the two samples (p1 = 0.82 and p2 = 0.69), we can estimate the value of the standard deviation of the sampling distribution of the percentage difference, or standard error of the percentage difference:

σp1−p2 = √( p1·(1 − p1)/n1 + p2·(1 − p2)/n2 )
       = √( 0.82·(1 − 0.82)/100 + 0.69·(1 − 0.69)/100 )
       = 0.06012 .

If we design a test with a significance level α of 5%, then the value of zα in the normal table equals 1.645. The critical value of the null hypothesis is calculated by:

π1 − π2 + zα · σp1−p2 = 0 + 1.645 · 0.06012 = 0.0989 .

Because the difference between the observed proportions, p1 − p2 = 0.82 − 0.69 = 0.13, is greater than the critical value, we have to reject the null hypothesis in favor of the alternative hypothesis, thus confirming the supposition of the proposer.
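The same calculation for this two-sample test on proportions, as a short Python sketch (again not part of the original entry):

from math import sqrt
from scipy.stats import norm

p1, n1 = 0.82, 100   # urban sample
p2, n2 = 0.69, 100   # rural sample
alpha = 0.05

se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # standard error of p1 - p2
critical = norm.ppf(1 - alpha) * se                  # 1.645 * 0.06012 = 0.0989
print(round(se, 5), round(critical, 4))

print("reject H0" if (p1 - p2) > critical else "cannot reject H0")   # reject H0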

FURTHER READING
• Acceptance region
• Alternative hypothesis
• Hypothesis testing
• Null hypothesis
• Rejection region
• Sampling distribution
• Significance level
• Two-sided test

REFERENCES
Miller, R.G., Jr.: Simultaneous Statistical Inference, 2nd edn. Springer, Berlin Heidelberg New York (1981)


One-Way Analysis of Variance

One-way analysis of variance for a factor with t levels (or t treatments) is a technique that helps to determine if there exists a significant difference between the t treatments (populations). The principle of one-way analysis of variance is to compare the variability within samples taken from the populations with the variability between these samples. The source of variation called "error" corresponds to the variability within the samples, and the source of variation called "effect" corresponds to the variability between the samples.

HISTORY
See analysis of variance.

MATHEMATICAL ASPECTS
The linear model for a factor with t levels (treatments) is as follows:

Yij = μ+ τi + εij ,

i = 1, 2, . . . , t , j = 1, 2, . . . , ni ,

where Yij represents observation j receiving treatment i, μ is the general mean common to all treatments, τi is the actual effect of treatment i on the observations, and εij is the experimental error of observation Yij.
This model is subjected to the basic hypotheses associated with analysis of variance if we suppose that the errors εij are independent random variables following a normal distribution N(0, σ²).
To see if there exists a difference between the t treatments, a test of hypothesis is done. The following null hypothesis:

H0 : τ1 = τ2 = . . . = τt

means that the t treatments are identical.

The alternative hypothesis is formulated as follows:

H1: not all values of τi (i = 1, 2, . . . , t) are identical.

The principle of analysis of variance is to compare the variability within each sample with the variability among the samples. For this, the Fisher test is used, in which a ratio is formed whose numerator is an estimation of the variance among treatments (samples) and whose denominator is an estimation of the variance within treatments. This ratio, denoted F, follows a Fisher distribution with t − 1 and N − t degrees of freedom (N being the total number of observations). The null hypothesis H0 will be rejected at the significance level α if the F ratio is greater than or equal to the value in the Fisher table, meaning if

F ≥ Ft−1,N−t,α .

If the null hypothesis H0 cannot be rejected, then we can consider that the t samples come from the same population.

Calculation of Variance Among Treatments
For the variance among treatments, the sum of squares among treatments is calculated as follows:

SSTr = Σ_{i=1}^{t} ni (Yi. − Y..)² ,

where Yi. is the mean of the ith treatment. The number of degrees of freedom associated with this sum is equal to t − 1, t being the number of samples (or groups). The variance among treatments is then equal to:

s²Tr = SSTr / (t − 1) .


Calculation of Variance Within Treatments
For the variance within treatments, the sum of squares within treatments, also called error, must be calculated as follows:

SSE = Σ_{i=1}^{t} Σ_{j=1}^{ni} (Yij − Yi.)² .

The number of degrees of freedom associated with this sum is equal to N − t, N being the total number of observations and t the number of samples. The variance within treatments is then equal to:

s²E = SSE / (N − t) .

The total sum of squares (SST) is the sum of squares of the deviations of each observation Yij from the general mean Y..:

SST = Σ_{i=1}^{t} Σ_{j=1}^{ni} (Yij − Y..)² .

The number of degrees of freedom associated with this sum is equal to N − 1, N being the total number of observations. We can also find SST in the following way:

SST = SSTr + SSE .

Fisher Test
We can now calculate the F ratio and determine if the samples can be considered as having been taken from the same population. The F ratio is equal to the estimation of the variance among treatments divided by the estimation of the variance within treatments, meaning

F = s²Tr / s²E .

If F is larger than or equal to the value in the Fisher table for t − 1 and N − t degrees of freedom, then we can conclude that the difference between the samples is due to the experimental treatments and not only to randomness.

Table of Analysis of Variance
It is customary to summarize the information of an analysis of variance in a table called an analysis of variance table, presented as follows:

Source of     Degrees of   Sum of    Mean of    F
variation     freedom      squares   squares
Among         t − 1        SSTr      s²Tr       s²Tr/s²E
treatments
Within        N − t        SSE       s²E
treatments
Total         N − 1        SST

Pairwise Comparison of Means
When the Fisher test rejects the null hypothesis, meaning there is a significant difference between the means of the samples, we can wonder where this difference is. Several tests have been developed to answer this question. Among them is the least significant difference (LSD) test (in French, test de la différence minimale), which compares the means taken in pairs. For this test to be used, the F ratio must indicate a significant difference between the means.

EXAMPLES
During cooking, croissants absorb grease in variable quantities. We want to see if the quantity absorbed depends on the type of grease. Four different types of grease are prepared, and six croissants are cooked per type of grease.
The data are in the following table (the numbers are quantities of grease absorbed per croissant):

Grease
 1    2    3    4
64   78   75   55
72   91   93   66
68   97   78   49
77   82   71   64
56   85   63   70
95   77   76   68

The null hypothesis is that there is no difference among the treatments, meaning that the quantity of grease absorbed during cooking does not depend on the type of grease:

H0 : τ1 = τ2 = τ3 = τ4 .

To test this hypothesis, the Fisher test is used, for which the F ratio has to be determined. Therefore the variance among treatments and the variance within treatments must be calculated.

Variance Among Treatments
The general mean Y.. and the means of the four sets, Yi. = (1/ni) Σ_{j=1}^{ni} Yij (i = 1, 2, 3, 4), have to be calculated. Since all ni are equal to 6 and N is equal to 24, we obtain:

Y.. = (64 + 72 + . . . + 70 + 68) / 24 = 1770/24 = 73.75 .

We also calculate:

Y1. = (1/6) Σ_{j=1}^{6} Y1j = 432/6 = 72 ,
Y2. = (1/6) Σ_{j=1}^{6} Y2j = 510/6 = 85 ,
Y3. = (1/6) Σ_{j=1}^{6} Y3j = 456/6 = 76 ,
Y4. = (1/6) Σ_{j=1}^{6} Y4j = 372/6 = 62 .

We can then calculate the sum of squares among treatments:

SSTr = Σ_{i=1}^{4} ni (Yi. − Y..)²
     = 6(72 − 73.75)² + 6(85 − 73.75)² + 6(76 − 73.75)² + 6(62 − 73.75)²
     = 1636.5 .

The number of degrees of freedom associated with this sum is equal to 4 − 1 = 3. The variance among treatments is then equal to:

s²Tr = SSTr / (t − 1) = 1636.5/3 = 545.5 .

Variance Within Treatments
The sum of squares within treatments, or error, is equal to:

SSE = Σ_{i=1}^{4} Σ_{j=1}^{6} (Yij − Yi.)²
    = (64 − 72)² + (72 − 72)² + . . . + (70 − 62)² + (68 − 62)²
    = 2018 .

The number of degrees of freedom associated with this sum is equal to 24 − 4 = 20. The variance within treatments is then equal to:

s²E = SSE / (N − t) = 2018/20 = 100.9 .

Fisher Test
All the elements necessary for the calculation of F are now known:

F = s²Tr / s²E = 545.5/100.9 = 5.4063 .

We have to find the value in the Fisher table with 3 and 20 degrees of freedom at a significance level α = 0.05:

F3,20,0.05 = 3.10 .

We notice that

F ≥ F3,20,0.05 ,

5.4063 ≥ 3.10 ,

which means that H0 must be rejected. This means that there is a significant difference between the treatments. Therefore the quantity of grease absorbed depends on the type of grease used.
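This analysis can be verified with a few lines of Python (a sketch, not part of the original entry; scipy.stats.f_oneway carries out exactly this one-way analysis of variance):

from scipy.stats import f_oneway, f

grease1 = [64, 72, 68, 77, 56, 95]
grease2 = [78, 91, 97, 82, 85, 77]
grease3 = [75, 93, 78, 71, 63, 76]
grease4 = [55, 66, 49, 64, 70, 68]

F_ratio, p_value = f_oneway(grease1, grease2, grease3, grease4)
print(round(F_ratio, 4))              # about 5.4063
print(round(f.ppf(0.95, 3, 20), 2))   # critical value F(3, 20, 0.05) = 3.10
print("reject H0" if F_ratio >= f.ppf(0.95, 3, 20) else "cannot reject H0")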

Table of Analysis of Variance
This information is summarized in the following table of analysis of variance:

Source of     Degrees of   Sum of    Mean of    F
variation     freedom      squares   squares
Among         3            1636.5    545.5      5.4063
treatments
Within        20           2018      100.9
treatments
Total         23           3654.5

FURTHER READING
• Analysis of variance
• Contrast
• Fisher table
• Fisher test
• Least significant difference test
• Two-way analysis of variance

REFERENCES
Cox, D.R., Reid, N.: Theory of the Design of Experiments. Chapman & Hall, London (2000)
Montgomery, D.C.: Design and Analysis of Experiments, 4th edn. Wiley, Chichester (1997)
Scheffé, H.: The Analysis of Variance. Wiley, New York (1959)

Operations Research

Operations research is a domain of applied mathematics that uses scientific methods to provide a necessary basis for decision making. It is generally applied to the complex problems of people and equipment organization in order to find the best solution to reach some goal. Included in operations research methods are simulation, linear programming, mathematical programming (nonlinear), game theory, etc.

HISTORY
Operations research originally grew out of military applications in the United Kingdom in the 1930s. These military applications were then largely extended, and since the Second World War the use of operations research has been extended to nonmilitary public applications and even to the private sector. The rapid development of computers has had a very important impact on operations research.

EXAMPLES
Optimization, linear programming, and simulation are examples of operations research.

FURTHER READING
• Linear programming
• Optimization
• Simulation

REFERENCES
Beale, E.M.L.: Introduction to Optimization. Wiley, New York (1988)
Collatz, L., Wetterling, W.: Optimization Problems. Springer, Berlin Heidelberg New York (1975)

Optimal Design

An optimal design is an experimental design that satisfies a certain criterion of optimality (for example, the minimization of the errors associated with the estimations that we must make). As the treatments applied to the different experimental units are fixed by the experimenter, we can try to improve the quality of the results. We can, for example, increase the precision of the estimators, reduce the confidence intervals, or increase the power of the hypothesis testing by choosing the optimal design within the category to which it belongs.

HISTORY
The theory of optimal designs was developed by Kiefer, Jack (1924–1981). One of the principal points of his work is based on the statistical efficiency of a design. In his article of 1959, he presents the principal criteria for obtaining the optimal design. Kiefer's collected articles in the area of experimental design were published by Springer with the collaboration of the Institute of Mathematical Statistics.

DOMAINS AND LIMITATIONS
As we cannot minimize all errors at the same time, the optimal design will be determined by the choice of optimality criteria. These different criteria are presented in Kiefer's 1959 article.

FURTHER READING
• Design of experiments

REFERENCES
Brown, L.D., Olkin, I., Sacks, J., Wynn, H.P.: Jack Carl Kiefer, Collected Papers III: Design of Experiments. Springer, Berlin Heidelberg New York (1985)
Fedorov, V.V.: Theory of Optimal Experiments. Academic, New York (1972)
Kiefer, J.: Optimum experimental designs. J. Roy. Stat. Soc. Ser. B 21, 272–319 (1959)

Optimization

A large number of statistical problems can be solved using optimization techniques. We use optimization techniques in statistical procedures such as least squares, maximum likelihood, L1 estimation, etc. In many other areas of statistics, such as regression analysis, hypothesis testing, experimental design, etc., optimization plays a hidden, but important, role.


Optimization methods can be classed in the following manner:
1. Classical optimization methods (differential calculus, Lagrange multipliers)
2. Mathematical programming methods (linear programming, nonlinear programming, dynamic programming)

HISTORY
The problems of optimization were formulated by Euclid, but only with the development of the differential calculus and the calculus of variations in the 17th and 18th centuries did mathematical tools capable of resolving such problems become available. These methods were employed for the first time in the resolution of certain problems of optimization in geometry and physics. Problems of this type were formulated and studied by well-known mathematicians such as Euler, Leonhard; Bernoulli, Jakob; Jacobi, Carl Gustav Jacob; and Lagrange, Joseph-Louis, to whom we owe the Lagrange multiplier. The discovery of important applications of optimization techniques in military problems during the Second World War and advances in technology (the computer) allowed for the resolution of increasingly complex problems of optimization in the most varied fields.

FURTHER READING
• Lagrange multiplier
• Linear programming
• Operations research

REFERENCES
Arthanari, T.S., Dodge, Y.: Mathematical Programming in Statistics. Wiley, New York (1981)

Outlier

An outlier is an observation which is well separated from the rest of the data. Outliers are defined with respect to a supposed underlying distribution or a theoretical model. If these change, the observation may no longer be outlying.

HISTORY
The problem of outliers has been addressed since 1750 by many scientists, among them Boscovich, Roger Joseph; de Laplace, Pierre Simon; and Legendre, Adrien Marie. According to Stigler, Stephen M. (1973), Legendre, A.M. proposed, in 1805, the rejection of outliers; and in 1852, Peirce, Benjamin established a first criterion for determining outliers. Criticisms appeared, especially from Airy, George Biddell (1856), against the use of such a criterion.

DOMAINS AND LIMITATIONS
In the framework of a statistical study, we can approach the question of outliers in two different ways:
• The first consists of adopting methods that are resistant to the presence of outliers in the sample.
• The second consists of identifying and eliminating the outliers: after identification, they are deleted from the sample before the statistical analysis is carried out.

Deletion of outliers from a data set is a controversial issue, especially in a small data set. The problem of outliers has led to research into new statistical methods called robust methods. Estimators not sensitive to outliers are said to be robust.


FURTHER READING
• Analysis of residuals
• Hat matrix
• L1 estimation
• Leverage point
• Observation
• Robust estimation

REFERENCES
Airy, G.B.: Letter from Professor Airy, Astronomer Royal, to the editor. Astronom. J. 4, 137–138 (1856)
Barnett, V., Lewis, T.: Outliers in Statistical Data. Wiley, New York (1984)
Peirce, B.: Criterion for the Rejection of Doubtful Observations. Astron. J. 2, 161–163 (1852)
Stigler, S.: Simon Newcomb, Percy Daniell, and the History of Robust Estimation 1885–1920. J. Am. Stat. Assoc. 68, 872–879 (1973)


P

Paasche Index

The Paasche index is a composite index number of price arrived at by the weighted sum method. This index number corresponds to the ratio of the sum of the prices of the current period n to the sum of the prices of the reference period 0, these sums being weighted by the respective quantities of the current period.
The Paasche index differs from the Laspeyres index only by the choice of the weighting method. Indeed, in the Laspeyres index, the weights are given using the quantities of the reference period Q0 rather than those of the current period Qn.

HISTORY
In the second half of the 19th century the German statistician Paasche, Hermann developed a formula for the index number that carries his name. Paasche, H. (1874) worked on prices recorded in Hamburg.

MATHEMATICAL ASPECTS
The Paasche index is calculated as follows:

In/0 = Σ Pn·Qn / Σ P0·Qn ,

where Pn and Qn are, respectively, the prices and sold quantities in the current period and P0 and Q0 are the prices and sold quantities in the reference period. The sums relate to the considered goods, and the index is expressed in base 100:

In/0 = ( Σ Pn·Qn / Σ P0·Qn ) · 100 .

The Paasche model can also be applied to calculate a quantity index (also called volume index). In this case, it is the prices that are held constant and the quantities that are variable:

In/0 = ( Σ Qn·Pn / Σ Q0·Pn ) · 100 .

EXAMPLES
Consider the following table indicating the respective prices of three food items in the reference year 0 and in the current year n, as well as the quantities sold in the current year:

Product   Quantity sold in 1988,   Price (euros)
          Qn (thousands)           1970 (P0)   1988 (Pn)
Milk      85.5                     0.20        1.20
Bread     50.5                     0.15        1.10
Butter    40.5                     0.50        2.00

From the following table we have:

∑PnQn = 239.15 and

∑P0Qn = 44.925 .


Product   Pn·Qn    P0·Qn
Milk      102.60   17.100
Bread     55.55    7.575
Butter    81.00    20.250
Total     239.15   44.925

We can then find the Paasche index:

In/0 = ( Σ Pn·Qn / Σ P0·Qn ) · 100 = (239.15/44.925) · 100 = 532.3 .

In other words, according to the Paasche index, the price index number of these products has risen by 432.3% (532.3 − 100) during the considered period.
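For illustration (not part of the original entry), the index can be computed directly from the table with a few lines of Python:

# Paasche price index for the three food items
p0 = {"milk": 0.20, "bread": 0.15, "butter": 0.50}   # prices in the reference year
pn = {"milk": 1.20, "bread": 1.10, "butter": 2.00}   # prices in the current year
qn = {"milk": 85.5, "bread": 50.5, "butter": 40.5}   # quantities sold in the current year

paasche = sum(pn[k] * qn[k] for k in qn) / sum(p0[k] * qn[k] for k in qn) * 100
print(round(paasche, 1))   # 532.3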

FURTHER READING
• Composite index number
• Fisher index
• Index number
• Laspeyres index
• Simple index number

REFERENCES
Paasche, H.: Über die Preisentwicklung der letzten Jahre nach den Hamburger Börsennotirungen. Jahrb. Natl. Stat. 23, 168–178 (1874)

Pair of Random Variables

A pair of variables whose values are determined by a random experiment is called a pair of random variables. There are two types of pairs:
• A pair of random variables is discrete if the set of values taken by each of the random variables is a finite or infinite countable set.
• A pair of random variables is continuous if the set of values taken by each of the random variables is an infinite noncountable set.

In the case of a pair of random variables, the set of all the possible results of the experiment comprises the sample space, which is located in a two-dimensional space.

MATHEMATICAL ASPECTS
A pair of random variables is generally denoted by capital letters such as X and Y. It is a two-variable function with real values in ℝ². If A is a subset of the sample space, then:

P(A) = Probability that (X, Y) ∈ A

= P[(X, Y) ∈ A] .

The case of a pair of random variables can be generalized to the case of n random variables: if n numbers are necessary to describe the realization of a random experiment, such a realization is represented by n random variables X1, X2, . . . , Xn. The sample space is then located in an n-dimensional space. If A is a subset of the sample space, then we can say that:

P(A) = P[(X1, X2, . . . , Xn) ∈ A] .

DOMAINS AND LIMITATIONS
Certain random situations force one to consider not one but several numerical entities simultaneously. Consider, for example, a system of two elements in series with random life spans X and Y; the study of the life span of the whole system must consider events concerning X and Y at the same time. Or consider a cylinder, fabricated by a machine, of diameter X and height Y, with these values being random. The imposed tolerances usually concern both entities simultaneously; the structure of their probabilistic dependence should be known.

FURTHER READING
• Covariance
• Joint density function
• Marginal density function
• Random variable

REFERENCES
Hogg, R.V., Craig, A.T.: Introduction to Mathematical Statistics, 2nd edn. Macmillan, New York (1965)

Paired Student's T-Test

The paired Student's test is a hypothesis test that is used to compare the means of two populations when each element of a population is related to an element from the other one.

MATHEMATICAL ASPECTS
Let xij be observation j for pair i (j = 1, 2 and i = 1, 2, . . . , n). For each pair of observations we calculate the difference

di = xi2 − xi1 .

We are looking for the estimated standard error of the mean of the di, denoted d̄:

sd̄ = (1/√n) · Sd ,

where Sd is the standard deviation of di:

Sd = √( Σ_{i=1}^{n} (di − d̄)² / (n − 1) ) .

The resulting test statistic is defined by:

T = d̄ / sd̄ .

Hypotheses
The paired Student's test is a two-sided test. The hypotheses are:

H0: δ = 0 (there is no difference between the treatments)
H1: δ ≠ 0 (there is a difference between the treatments) ,

where δ is the difference between the means of the two populations (δ = μ1 − μ2).

Decision Rules
We reject the null hypothesis at significance level α if

|T| > tn−1, α/2 ,

where tn−1, α/2 is the value in the Student table with n − 1 degrees of freedom.

EXAMPLES
Suppose that two treatments are applied to ten pairs of observations. The obtained data and the corresponding differences, denoted by di, are presented in the following table:

Pair i   Treatment 1   Treatment 2   di = xi2 − xi1
1        110           118            8
2        99            104            5
3        91            85            −6
4        107           108            1
5        82            81            −1
6        96            93            −3
7        100           102            2
8        87            101           14
9        75            84             9
10       108           111            3


The mean equals:

d̄ = (1/10) Σ_{i=1}^{10} di = 32/10 = 3.2 .

The standard deviation is calculated thus:

Sd = √( Σ_{i=1}^{10} (di − d̄)² / (10 − 1) ) = √(323.6/9) = 6.00 ,

and the standard error:

sd̄ = (1/√n) · Sd = (1/√10) · 6.00 = 1.90 .

The test statistic then equals:

T = d̄ / sd̄ = 3.2/1.90 = 1.69 .

If we choose a significance level α = 0.05, the value of t9,0.025 is 2.26. Thus the null hypothesis H0: δ = 0 cannot be rejected because |T| < t9,0.025.
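These computations correspond to a standard paired t-test; a short Python sketch (not part of the original entry) using scipy.stats.ttest_rel gives the same test statistic:

from scipy.stats import ttest_rel, t

treatment1 = [110, 99, 91, 107, 82, 96, 100, 87, 75, 108]
treatment2 = [118, 104, 85, 108, 81, 93, 102, 101, 84, 111]

# differences treatment2 - treatment1 are exactly the di of the table
T, p_value = ttest_rel(treatment2, treatment1)
print(round(T, 2))                 # about 1.69
print(round(t.ppf(0.975, 9), 2))   # t(9, 0.025) = 2.26
print("reject H0" if abs(T) > t.ppf(0.975, 9) else "cannot reject H0")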

FURTHER READING
• Hypothesis testing
• Student table
• Student test

Panel

The panel is a type of census that is repeated periodically. It is based on a permanent (or semipermanent) sample of individuals, households, etc., who are regularly questioned about their behavior or opinion. The panel offers the advantage of following the individual behavior of the questioned people or units and measuring any changes in their behavior over time. The information collected in a panel is often richer than in a simple census while retaining limited costs. We adapt the sample renewal rate depending on the goal of the inquiry. When the objective of the panel is to follow the evolution of individuals over time (e.g., following career paths), the sample stays the same; we call this a longitudinal inquiry. When the objective is to serve as a way of estimating characteristics of the population in different periods, the sample will be partially renewed from time to time; we call this a changing panel. Thus the data of a panel constitute a source of very important information because they contain an individual and a time dimension simultaneously.

HISTORY
The history of panel research dates back to 1759, when the French Count du Montbeillard, Philibert Guéneau began recording his son's stature at six-month intervals from birth to age 18. His records have little in common with contemporary panel studies, aside from the systematic nature of his repeated observations. The current concept of panel research was established around the 1920s and 1930s, when several studies of human growth and development began. Since the mid-20th century panel studies have proliferated across the social sciences.
Several factors have contributed to the growth of panel research in the social sciences. Much growth in the 1960s and 1970s came in response to increased federal funding for panel studies in the United States. For example, policy concerns regarding work and economic behavior led to funding for ongoing studies like the National Longitudinal Surveys (NLS) in 1966 and the Panel Study of Income Dynamics in 1968. The latter, for example, began with 4802 families with a list of variables exceeding 5000; poor households were oversampled. The NLS survey included 5020 elderly men, 5225 young men, 5083 mature women, 5159 young women, and 12686 youth.
Technological advances have facilitated data collection at a national scale and increased researchers' capacity for managing, analyzing, and sharing large, complex data sets. In addition, advances in statistical methods for analyzing longitudinal data have encouraged researchers not only to collect panel data but also to ask new questions about existing data sets. The growth of panel studies is evident in other industrialized countries, too.

DOMAINS AND LIMITATIONS
In practice, it is rare to have panels in the strict meaning of the term. We should thus take into account the following problems of the panel: difficulty of recruiting certain categories of panelists; fatigue of panelists, which is one of the main reasons for nonresponse; and evolution of the population, causing the deformation of the sample with respect to the population it should represent. In practice, we often try to eliminate this problem by preferring a changing panel to a strict one.
There are also specific measurement errors in panels:
(1) Telescope effect: an error of dating, often made by new panelists when indicating the date of the event about which they are being asked.
(2) Panel effect or conditioning bias: a change in the behavior of panelists that can appear over time as a result of repeated interviews. The panelists are asked about their behavior, and they eventually change it.

EXAMPLES
Inquiry about health and medical care: an inquiry conducted by INSEE in about 10000 households. Each household is followed for 3 months, during which members of the household are questioned every 3 weeks by a researcher about their medical care.
Panel of Sofres consumers: the Metascope panel of Sofres is based on a sample of 20000 households. Each month the panelists receive by mail a self-administered questionnaire on their different purchases. These questionnaires are essentially used by companies who want to know what their clients want.
Panel of audience: the Mediamat panel is based on a sample of 2300 households equipped with audimeters with push buttons. It allows researchers to measure the audience of TV stations.

FURTHER READING
• Bias
• Population
• Sample
• Survey

REFERENCES
Duncan, G.J.: Household Panel Studies: Prospects and Problems. Social Science Methodology, Trento, Italy (1992)
Kasprzyk, D., Duncan, G., et al. (eds.): Panel Surveys. Wiley, New York (1989)
Rose, D.: Household panel studies: an overview. Innovation 8(1), 7–24 (1995)
Diggle, P.J., Heagerty, P., Liang, K.-Y., Zeger, S.L.: Analysis of Longitudinal Data, 2nd edn. Oxford University Press, Oxford (2002)


Parameter

A parameter characterizes a quantitative aspect of a population.

DOMAINS AND LIMITATIONS
The parameters of a population are often unknown. However, we can estimate a parameter by a statistic calculated from a sample using a method of estimation.

EXAMPLES
A parameter is generally designated by a Greek letter:

μ: mean of the population
σ: standard deviation of the population
π: percentage relative to the population

FURTHER READING
• Estimation
• Estimator
• Least squares
• Maximum likelihood
• Moment
• Statistics

REFERENCES
Lehmann, E.L.: Theory of Point Estimation, 2nd edn. Wiley, New York (1983)

Parametric Test

A parametric test is a form of hypothesis testing in which assumptions are made about the underlying distribution of the observed data.

HISTORY
One of the first parametric tests was the chi-square test, introduced by Pearson, K. in 1900.

EXAMPLES
The Student test is an example of a parametric test. It aims to compare the means of two normally distributed populations. Among the best-known parametric tests we mention the Student t-test and the Fisher test.

FURTHER READING
• Binomial test
• Fisher test
• Hypothesis testing
• Student test

Partial Autocorrelation

The partial autocorrelation at lag k is the autocorrelation between Y_t and Y_{t−k} that is not accounted for by lags 1 through k − 1.

See autocorrelation.

Partial Correlation

The partial correlation between two variables is defined as the correlation of the two variables while controlling for a third variable or for several other variables. The partial correlation between variables X and Y given a third variable Z measures the direct relation between X and Y, leaving aside the linear relations of these two variables with Z. Calculating the partial correlation between X and Y given Z thus serves to identify to what extent the linear relation between X and Y can be attributed to the correlation of both variables with Z.

HISTORY
Interpreting analyses of correlation coefficients between two variables became an important question in statistics with the increasingly widespread use of correlation methods in the early 1900s. Pearson, Karl knew that a large correlation between two variables could be due to their correlation with a third variable. The full extent of this phenomenon was not recognized until 1926, when Yule, George Udny demonstrated it through an example involving the coefficients of correlation between time series.

MATHEMATICAL ASPECTS
From a sample of n observations (x1, y1, z1), (x2, y2, z2), . . . , (xn, yn, zn) from an unknown distribution of three random variables X, Y, and Z, the coefficient of partial correlation, which we denote by rxy.z, is defined as the coefficient of correlation calculated between the residuals

x̃i = xi − (β̂0x + β̂1x zi) ,
ỹi = yi − (β̂0y + β̂1y zi) ,

where β̂0x and β̂1x are the least-squares estimators obtained by regressing xi on zi, and β̂0y and β̂1y are the least-squares estimators obtained by regressing yi on zi. Thus by definition we have:

rxy.z = ∑ x̃i ỹi / [ √(∑ x̃i²) · √(∑ ỹi²) ]

(the residuals x̃i and ỹi having zero mean). We can prove the following result:

rxy.z = (rxy − rxz · ryz) / [ √(1 − rxz²) · √(1 − ryz²) ] ,

where rxy, rxz, and ryz are the coefficients of correlation respectively between xi and yi, xi and zi, and yi and zi. Note that when xi and yi are not correlated with zi, that is, when rxz = ryz = 0, we have:

rxy.z = rxy ,

that is, the coefficient of correlation equals the coefficient of partial correlation; in other words, the value of the correlation between xi and yi is not due at all to the presence of zi. On the other hand, if the correlations with zi are important, then we have rxy.z ≅ 0, which indicates that the observed correlation between xi and yi is only due to the correlation of these variables with zi. Thus we say that there is no direct relation between xi and yi (but rather an indirect relation, through zi).
More generally, we can calculate the partial correlation between xi and yi given the values of two variables Z1 and Z2 in the following manner:

rxy.z1z2 = (rxy.z1 − rxz2.z1 · ryz2.z1) / [ √(1 − rxz2.z1²) · √(1 − ryz2.z1²) ] .
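To make the identity above concrete, here is a small Python sketch (not part of the original entry) that computes rxy.z from the three pairwise correlations; the helper names pearson and partial_corr and the toy data are invented for the illustration.

from math import sqrt

def pearson(a, b):
    # plain Pearson correlation coefficient between two equal-length lists
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def partial_corr(x, y, z):
    # r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# toy data: x and y are both driven mainly by z
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
y = [0.9, 2.2, 2.9, 4.1, 5.2, 5.8, 7.1, 8.0]
print(pearson(x, y))          # marginal correlation between x and y
print(partial_corr(x, y, z))  # correlation after controlling for z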

DOMAINS AND LIMITATIONS
We must insist on the fact that a correlation measures only the linear relation between two variables, without taking into account functional models or the predictive or forecasting capacity of a given model.

EXAMPLES
We have observed, for example, a strong positive correlation between the number of registered infarcts during a certain period and the amount of ice cream sold in the same period. We should not conclude that high consumption of ice cream provokes infarcts or, conversely, that an infarct provokes a particular desire for ice cream. In this example, the large number of infarcts does not come from the amount of ice cream sold but from the heat. Thus we have a strong correlation between the number of infarcts and the temperature. The large amount of ice cream sold is also explained by the heat, which implies a strong correlation between the amount of ice cream sold and the temperature. The strong correlation observed between the number of infarcts and the amount of ice cream sold is then the consequence of two strong correlations: a third variable is hidden behind the apparent relation between the two previous variables.

FURTHER READING
• Correlation coefficient

REFERENCES
Yule, G.U.: Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time-series. J. Roy. Stat. Soc. 89, 1–64 (1926)

Partial Least Absolute Deviation Regression

The partial least absolute deviation (partial LAD) regression is a regression method that linearly relates a response vector to a set of predictors using derived components. It is an L1 regression modeling technique well suited to situations where the number of parameters in a linear regression model exceeds the number of observations. It is mostly used for prediction purposes rather than for inference on parameters. The partial LAD regression method extends partial least-squares regression to the L1 norm associated with LAD regression instead of the L2 norm, which is associated with partial least squares.
The partial LAD regression follows the structure of the univariate partial least-squares regression algorithm and extracts components (denoted by t) from directions (denoted by w) that depend upon the response variable. The directions are determined by a Gnanadesikan–Kettenring (GK) covariance estimate that replaces the usual variance, based on the L2 norm, with MAD, the median absolute deviation, based on L1. Therefore we use the notation w^mad_k for partial LAD directions.

HISTORY
The partial LAD regression method was introduced in Dodge, Y. et al. (2004). It was further developed in Dodge, Y. et al. (2005) and tested using the bootstrap in Kondylis, A. and Whittaker, J. (2005).

MATHEMATICAL ASPECTS
For i = 1, . . . , n and j = 1, . . . , p, denoting observation units and predictors, the partial LAD regression algorithm is given below:
1. Center or standardize both X and y.
2. For k = 1, . . . , kmax:
   Compute w^mad_k according to
   w^mad_{j,k} = (1/4) · [ mad²(x_{j,k−1} + y) − mad²(x_{j,k−1} − y) ] ,
   and scale w^mad_k to length 1.
   Extract the component
   t_k = ∑_{j=1}^{p} w^mad_{j,k} · x_{j,k−1} ,
   where w^mad_k = (w^mad_{1,k}, . . . , w^mad_{p,k}).
   Orthogonalize each x_{j,k−1} with respect to t_k: x_{j,k} = x_{j,k−1} − E(x_{j,k−1} | t_k).
3. Give the resulting sequence of fitted vectors ŷ^plad_k = T_k q_k, where T_k = (t_1, . . . , t_k) is the score matrix and q_k = (q_1, . . . , q_k) is the LAD regression coefficient vector.
4. Recover the implied partial LAD regression coefficients according to β^plad_k = W^mad_k q_k, where the matrix W^mad_k pools in its columns the vectors w^mad_k expressed in terms of the original x_j.
The univariate partial least-squares regression and the partial LAD method share the following properties:
1. y = q_1 t_1 + . . . + q_k t_k + ε ,
2. X = p_1 t_1 + . . . + p_k t_k + f ,
3. cor(t_i, t_j) = 0 for i ≠ j ,
where cor(·,·) denotes the Pearson correlation coefficient, ε and f correspond to residual terms, and p_k to the X-loadings.
The partial LAD method builds a regression model that relates the predictors to the response according to:

ŷ^plad_k = ∑_{j=1}^{p} β^plad_j x_j .
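As a rough illustration of the direction step only (not the full algorithm, and not the authors' own code), the GK-type weight of each predictor could be computed as follows in Python; the helpers mad and plad_direction and the toy data are assumptions made for this sketch.

import numpy as np

def mad(v):
    # median absolute deviation (not rescaled)
    v = np.asarray(v, dtype=float)
    return np.median(np.abs(v - np.median(v)))

def plad_direction(X, y):
    # w_j = 1/4 * (mad^2(x_j + y) - mad^2(x_j - y)), then scale w to unit length
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    w = np.array([0.25 * (mad(X[:, j] + y) ** 2 - mad(X[:, j] - y) ** 2)
                  for j in range(X.shape[1])])
    return w / np.linalg.norm(w)

# toy centered data: 5 observations, 3 predictors
X = np.array([[ 1.0, -0.5,  0.2],
              [-0.8,  0.3, -0.1],
              [ 0.4,  0.9, -0.6],
              [-0.2, -0.7,  0.8],
              [-0.4,  0.0, -0.3]])
y = np.array([0.9, -0.7, 0.5, -0.3, -0.4])

w1 = plad_direction(X, y)   # first partial LAD direction
t1 = X @ w1                 # first derived component
print(w1, t1)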

EXAMPLE
We give here an example from near-infrared experiments in spectroscopy. We use the octane data set, which consists of 39 gasoline samples for which the octanes have been measured at 225 wavelengths (in nanometers). The regressors are highly multicollinear spectra at the different wavelengths, and their number exceeds the sample size n.

In the figure, we give the lines for the 39 gasoline samples throughout their 225 wavelengths.
We use the partial LAD regression method to build a linear model based on two derived components. The resulting plot of the response y vs. the fitted values ŷ is given in one panel of the following figure under the title PLAD (partial LAD regression); the other panel is the corresponding plot of the response y vs. the fitted values ŷ for a univariate partial least-squares regression model based on two components.

The partial least-squares regression model will require an additional component in the final model in order to provide predictive results as good as those of the partial LAD regression model. This is due to a group of six outliers in the octane data set. These are observations 25, 26, 36, 37, 38, and 39, which contained alcohol. They are visible for wavelengths higher than 140 nm in the line plot above.

FURTHER READING
• Least absolute deviation regression
• Partial least-squares regression

REFERENCES
Dodge, Y., Kondylis, A., Whittaker, J.: Extending PLS1 to PLAD regression and the use of the L1 norm in soft modelling. COMPSTAT 2004, pp. 935–942. Physica-Verlag/Springer (2004)

Dodge, Y., Whittaker, J., Kondylis, A.: Progress on PLAD and PQR. In: Proceedings of the 55th Session of the ISI 2005, pp. 495–503 (2005)

Kondylis, A., Whittaker, J.: Using the bootstrap on PLAD regression. In: PLS and Related Methods, Proceedings of the PLS'05 International Symposium 2005, pp. 395–402. Edition Aluja et al. (2005)

Partial Least-Squares Regression

The partial least-squares regression (PLSR) is a statistical method that relates two data matrices X and Y, usually called blocks, via a latent linear structure. It is applied either to a single response vector y (univariate PLSR) or to a response matrix Y (multivariate PLSR). PLSR is commonly used when the recorded variables are highly correlated. It is a very suitable method in cases where the number of variables exceeds the number of available observations. This is due to the fact that it uses orthogonal derived components instead of the original variables. The use of a few orthogonal components instead of numerous correlated variables guarantees the reduction of the regression problem to a small subspace, which often stabilizes the variability of the estimated coefficients and provides better predictions.

HISTORY
The PLSR was initially developed within the NIPALS algorithm, though it has since been implemented in various algorithms, including the orthogonal-scores and the orthogonal-loadings PLSR, the SIMPLS algorithm for PLSR, the Helland PLSR algorithm, and the kernel PLSR algorithm.

MATHEMATICAL ASPECTS
PLSR methods solve the following maximization problem:

max_{w,q} { cov(X_{k−1} w_k, Y_{k−1} q_k) }

subject to

w_k^T w_k = 1 , q_k^T q_k = 1

and

t_k ⊥ t_j , u_k ⊥ u_j for each k ≠ j ,

where cov(·, ·) denotes covariance and w_k, q_k and t_k, u_k are loading vectors and components (or score vectors) for X and Y, respectively. The use of the subscript k in X and Y shows that PLSR deflates the data at each extracted dimension k and uses the residuals as new data for the following dimension.


Univariate PLSR is much easier to interpret. It is in fact a generalization of multiple linear regression that shrinks regression coefficients on directions of low covariance between variables and response. Given the following relation:

cov(Xw, y) ∝ cor(Xw, y) · var(Xw)^{1/2} ,

with cor and var denoting correlation and variance, respectively, it is easy to verify that the maximization criterion in PLSR is a compromise between least-squares regression and regression on principal components: the latter methods maximize cor(Xw, y) and var(Xw), respectively.
The univariate PLSR involves the following steps:
1. Center (X, y) jointly.
2. For k = 1, . . . , p:
   Store the covariances cov(x_{j,k−1}, y_{k−1}) in the vector w_k.
   Pool w_k in the matrix W (p × k).
   Extract the component t_k as t_k = X_{k−1} w_k.
   Orthogonalize X_{k−1} and y_{k−1} with respect to t_k.
3. Take the least-squares residuals as new data for k ← k + 1.
Once the components are extracted, least-squares regression is used to regress the response vector y on them. The number of components that should ultimately be retained is determined using bootstrap and cross-validation methods.
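A minimal Python/NumPy sketch of the univariate steps above, assuming centered data, covariance weights, and least-squares deflation; the function name pls1 is hypothetical and the code is only schematic, not a reference implementation.

import numpy as np

def pls1(X, y, n_components):
    # schematic univariate PLS regression on centered data
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    X -= X.mean(axis=0)
    y -= y.mean()
    T, W, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y                       # covariances cov(x_j, y) up to a constant
        w /= np.linalg.norm(w)            # scale the direction to length 1
        t = X @ w                         # extracted component (score vector)
        qk = (t @ y) / (t @ t)            # least-squares coefficient of y on t
        X = X - np.outer(t, (X.T @ t) / (t @ t))  # deflate X with respect to t
        y = y - qk * t                    # deflate y with respect to t
        T.append(t); W.append(w); q.append(qk)
    return np.array(T).T, np.array(W).T, np.array(q)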

EXAMPLE
The PLSR method is a generalization of multiple linear regression. This can be justified using a regression problem where no correlation between the predictors occurs, that is, an orthogonal design. The table below contains the results of an orthogonal design with two explanatory variables and one response.

Table: Orthogonal design

 y     x1    x2
18    −2     4
12     1     3
10     0    −6
16    −1    −5
11     2     3
 9     0    −3
11    −1     6
 8     1    −1
 7     1     0
12    −1    −1

Multiple linear regression analysis for the data in the table (y as the response and two explanatory variables x1 and x2) leads to the following least-squares estimates:

ŷ = 11.40 − 1.8571 x1 + 0.1408 x2 .

Using the PLSR method (the data were initially centered and scaled) and regressing the response on the derived components, we are led to exactly the same estimates with only one component used, that is, by ŷ = t1 · q1. We finally get

ŷ = T1 q1 = X w0 q1 = X βPLS .

Thus for the standardized data the implied regression coefficients are

βPLS = (−0.68, 0.164) ,

which, when transformed back to the original scale, equal (11.40, −1.8571, 0.1408). The latter are the multiple linear regression estimates.
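Since the design is orthogonal and the predictors are centered, the least-squares estimates reduce to simple ratios of sums; the following short check (added here for illustration only) reproduces the values quoted above.

y  = [18, 12, 10, 16, 11, 9, 11, 8, 7, 12]
x1 = [-2, 1, 0, -1, 2, 0, -1, 1, 1, -1]
x2 = [4, 3, -6, -5, 3, -3, 6, -1, 0, -1]

n = len(y)
b0 = sum(y) / n                                                    # 11.40, since x1 and x2 are centered
b1 = sum(a * b for a, b in zip(x1, y)) / sum(a * a for a in x1)    # -26/14  = -1.8571...
b2 = sum(a * b for a, b in zip(x2, y)) / sum(a * a for a in x2)    #  20/142 =  0.1408...
print(b0, b1, b2)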

DOMAINS AND LIMITATIONS
PLSR methods have been extensively used in near-infrared experiments in spectroscopy. They have long been used by chemometricians to do multivariate calibration and to measure chemical concentrations. PLSR methods have equally been used in many statistical applications with more variables than observations, for example in environmetrics, in microarray experiments in biostatistics, and in functional data analysis.
PLSR methods focus mainly on prediction accuracy and are mainly used in order to construct good predictive models.

FURTHER READING
• Bootstrap
• Least-squares method

REFERENCES
Tenenhaus, M.: La régression PLS. Théorie et pratique. Technip, Paris (1998)

Frank, I., Friedman, J.: A statistical view of some chemometrics regression tools. Technometrics 35, 109–135 (1993)

Martens, H., Naes, T.: Multivariate Calibration. Wiley, New York (1989)

Helland, I.S.: On the structure of PLS regression. Commun. Stat. Simul. Comput. 17, 581–607 (1988)

Pearson, Egon Sharpe

Pearson, Egon Sharpe (1895–1980), son of the statistician Pearson, Karl, studied mathematics at Trinity College, Cambridge. In 1921 he entered the Department of Statistics of University College London. He met Neyman, Jerzy, with whom he started to collaborate during the latter's visit to London in 1924–1925, and he also collaborated with Gosset, William Sealy. When his father left University College in 1933, he transferred to the Applied Statistics Department, and in 1936 he became director of the journal Biometrika following the death of his father, the journal's founder.
Besides the works that he published in collaboration with Neyman, J. and that led to the development of the theory of hypothesis testing, Pearson, E.S. also touched on problems of quality control and operations research.

Selected articles of Pearson, E.S.:

1928 (with Neyman, J.) On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika 20A, 175–240 and 263–295.
1931 The test of significance for the correlation coefficient. J. Am. Stat. Assoc. 26, 128–134.
1932 The percentage limits for the distribution of range in samples from a normal population (n ≤ 100). Biometrika 24, 404–417.
1933 (with Neyman, J.) On the testing of statistical hypotheses in relation to probability a priori. Proc. Camb. Philos. Soc. 29, 492–510.
1935 The Application of Statistical Methods to Industrial Standardization and Quality Control. Brit. Stand. 600. British Standards Institution, London.

FURTHER READING
• Hypothesis testing

Pearson, Karl

Born in London, Pearson, Karl (1857–1936) is known for his numerous contributions to statistics. After studying at King's College, Cambridge, he was named in 1885 chair of the Applied Mathematics Department of University College London. He spent his entire career at this university, where he was made chair of the Eugenics Department in 1911. In 1901, with the help of Galton, Francis, he founded the journal Biometrika and was its editor in chief until his death in 1936.
In 1906 he welcomed Gosset, William Sealy for one year in his laboratory, and with him he resolved problems related to samples of small size. He retired in 1933, and his department at the university was split into two: the Eugenics Department was entrusted to Fisher, Ronald Aylmer and the Statistics Department to his own son, Pearson, Egon Sharpe.
Selected articles of Pearson, Karl:

1894 (1948) Contributions to the mathematical theory of evolution. I. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 1–40. First published as: On the dissection of asymmetrical frequency curves. Philos. Trans. Roy. Soc. Lond. Ser. A 185, 71–110.
1895 (1948) Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 41–112. First published in Philos. Trans. Roy. Soc. Lond. Ser. A 186, 343–414.
1896 (1948) Mathematical contributions to the theory of evolution. III. Regression, heredity and panmixia. In: Karl Pearson's Early Statistical Papers. Cambridge University Press, Cambridge, pp. 113–178. First published in Philos. Trans. Roy. Soc. Lond. Ser. A 187, 253–318; Philosophical Magazine, 5th series, 50, pp. 157–175.

FURTHER READING
• Chi-square distribution
• Correlation coefficient

Percentage

Percentage is a measure describing the proportion of individuals or statistical units having a certain characteristic in a collection, expressed on a base of 100.

MATHEMATICAL ASPECTS
Percentage is generally denoted by p when it is measured on a sample and by π when it concerns a population.
Consider a sample of size n in which k individuals have a certain characteristic; the percentage of such individuals is given by the statistic p:

p = (k / n) · 100% .

The calculation of the same percentage on a population of size N gives us the parameter π:

π = (k / N) · 100% .

Percentile

Percentiles are measures of location computed on a data set. We call percentiles the values that divide a distribution into 100 equal parts (each part contains the same number of observations). The xth percentile is the value Cx such that x% of the observations are lower than Cx and (100 − x)% of the observations are greater than Cx. For example, the 20th percentile is the value that separates the 20% of values that are lower than it from the 80% that are higher than it. We thus have 99 percentiles for a given distribution.
Note, for example:
Percentile 10 = first decile
Percentile 25 = first quartile
Percentile 50 = median
This notion is part of the family of quantiles.

MATHEMATICAL ASPECTS
The process of calculation is similar to that of the median, quartiles, and deciles. When all the raw observations are given, the percentiles are calculated as follows:
1. Organize the n observations in the form of a frequency distribution.
2. The percentiles correspond to the observations for which the relative cumulated frequency exceeds respectively 1%, 2%, 3%, . . ., 98%, 99%.
Some authors propose the following formula, which permits one to determine the value of the different percentiles with precision.
Calculation of the jth percentile: let i be the integer part of j·(n + 1)/100 and k the fractional part of j·(n + 1)/100. Let xi and xi+1 be the values of the observations classified respectively in the ith and (i + 1)th position (when the n observations are sorted in increasing order).
The jth percentile is then equal to:

Cj = xi + k · (xi+1 − xi) .

When the observations are grouped into classes, the percentiles are determined as follows:
1. Determine the class in which the desired percentile lies:
• 1st percentile: the first class for which the relative cumulated frequency is over 1%.
• 2nd percentile: the first class for which the relative cumulated frequency is over 2%.
• 99th percentile: the first class for which the relative cumulated frequency is over 99%.
2. Calculate the value of the percentile under the hypothesis that the observations are uniformly distributed in each class:

percentile = L1 + [ (n · q − ∑f_inf) / f_percentile ] · c ,

where L1 is the lower limit of the percentile class, n is the total number of observations, q is 1/100 for the 1st percentile, 2/100 for the 2nd percentile, . . ., 99/100 for the 99th percentile, ∑f_inf is the sum of the frequencies of the classes below the percentile class, f_percentile is the frequency of the percentile class, and c is the width of the interval of the percentile class.
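A small Python sketch of the interpolation formula for raw data given above; the function name percentile_raw is hypothetical, and the code assumes 1 ≤ i < n so that both xi and xi+1 exist (it falls back to the extreme values otherwise).

def percentile_raw(data, j):
    # jth percentile of raw data using C_j = x_i + k (x_{i+1} - x_i),
    # where i and k are the integer and fractional parts of j (n + 1) / 100
    x = sorted(data)
    n = len(x)
    pos = j * (n + 1) / 100.0
    i = int(pos)            # integer part
    k = pos - i             # fractional part
    if i < 1:
        return x[0]
    if i >= n:
        return x[-1]
    return x[i - 1] + k * (x[i] - x[i - 1])   # x_i is the ith ordered value (1-based)

data = [12, 7, 3, 18, 9, 15, 6, 11, 4, 10]
print(percentile_raw(data, 25), percentile_raw(data, 50), percentile_raw(data, 90))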

DOMAINS AND LIMITATIONS
The calculation of percentiles only has true meaning for a quantitative variable that can take its values in a given interval. In practice, percentiles can be calculated only when there is a large number of observations, since this calculation consists in dividing the set of observations into 100 parts. This notion is seldom applied because in most descriptive analyses the determination of the deciles, or even the quartiles, is sufficient for the interpretation of the results.

EXAMPLES
Consider an example of the calculation of percentiles on the frequency distribution of a continuous variable where the observations are grouped into classes.
The following frequency table represents the profits (in thousands of euros) of 2000 bakeries:

Profit (thousands of euros)   Frequencies   Cumulated frequencies   Relative cumulated frequency
100–150                        160            160                    0.08
150–200                        200            360                    0.18
200–250                        240            600                    0.30
250–300                        280            880                    0.44
300–350                        400           1280                    0.64
350–400                        320           1600                    0.80
400–450                        240           1840                    0.92
450–500                        160           2000                    1.00
Total                         2000

The class containing the first percentile is the class 100–150 (the one for which the relative cumulated frequency is over 0.01, that is, 1%). Considering that the observations are uniformly distributed in each class, we obtain for the first percentile the following value:

1st percentile = 100 + [ (2000 · 1/100 − 0) / 160 ] · 50 = 106.25 .

The class containing the 10th percentile is the class 150–200. The value of the 10th percentile is equal to:

10th percentile = 150 + [ (2000 · 10/100 − 160) / 200 ] · 50 = 160 .

We can calculate all the percentiles in the same way. We can then conclude, for example, that 1% of the 2000 bakeries have a profit between 100000 and 106250 euros, or that 10% have a profit between 100000 and 160000 euros, etc.

FURTHER READING
• Decile
• Measure of location
• Median
• Quantile
• Quartile

Permutation

The term permutation belongs to combinatory analysis. It refers to an arrangement, or an ordering, of n objects. Theoretically, we should distinguish between the case where the n objects are fully distinct and the case where they are only partially distinct.

HISTORY
See combinatory analysis.

MATHEMATICAL ASPECTS
1. Number of permutations of different objects
The number of possible permutations of n objects equals n factorial:

number of permutations = n! = n · (n − 1) · . . . · 3 · 2 · 1 .

2. Number of permutations of partially distinct objects
The number of possible permutations of n objects among which n1, n2, . . ., nr are not distinguishable from one another equals:

number of permutations = n! / (n1! · n2! · . . . · nr!) ,

where n1 + n2 + . . . + nr = n.

EXAMPLES
If we have three objects A, B, and C, then the possible permutations are as follows:

A B C
A C B
B A C
B C A
C A B
C B A

Thus we have six possible permutations. Without enumerating them, we can find the number of permutations using the following formula:

number of permutations = n! ,

where n is the number of objects. This gives us in our example:

number of permutations = 3! = 3 · 2 · 1 = 6 .

Imagine that we compose signals by aligning flags. If we have five flags of different colors, the number of signals that we can compose is:

5! = 5 · 4 · 3 · 2 · 1 = 120 .

On the other hand, if among the flags there are two red, two white, and one black flag, we can compose

5! / (2! · 2! · 1!) = 30

different signals.
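The two counting rules can be checked with the Python standard library; the helper names below are invented for this illustration, and the last two lines reproduce the flag example above.

from math import factorial, prod

def permutations_distinct(n):
    # number of orderings of n distinct objects: n!
    return factorial(n)

def permutations_multiset(counts):
    # n! / (n1! * n2! * ... * nr!) for objects repeated counts[0], counts[1], ...
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)

print(permutations_distinct(3))          # 6 orderings of A, B, C
print(permutations_distinct(5))          # 120 signals with 5 distinct flags
print(permutations_multiset([2, 2, 1]))  # 30 signals with 2 red, 2 white, 1 black flag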

FURTHER READING
• Arrangement
• Combination
• Combinatory analysis

Pictogram

A pictogram is a symbol representing a concept, object, place, or event by illustration. Like the bar chart, the pictogram is used either to compare the categories of a qualitative variable or to compare data sets coming from different years or different places. Pictograms are mostly used in journals as advertisement graphs for comparisons and are of little statistical interest because they only provide rough approximations.

MATHEMATICAL ASPECTS
The figure that is chosen to illustrate the different quantities in a pictogram should, if possible, be divisible into four parts, so that it can represent halves and quarters. The scale is chosen as a function of the available space; therefore the figure is not necessarily equal to unity.

EXAMPLES
Consider wheat production in two countries A and B. The production of country A is equal to 450000 quintals and that of country B to 200000 quintals (1 quintal is equal to 100 kg).


We obtain the following pictogram:

The surface of each figure is proportional to the represented quantity.

FURTHER READING
• Bar chart
• Graphical representation
• Quantitative graph

Pie Chart

The pie chart is a type of quantitative graph. It is made of a circle divided into sectors, each sector having an angle proportional to the represented magnitude.

HISTORY
See graphical representation.

MATHEMATICAL ASPECTS
To establish a pie chart, the total of the represented frequencies is calculated and then the relative frequencies representing the different sectors are calculated. To draw these sectors on a circle, the relative frequencies are converted into degrees by the following transformation:

α = f · 360° ,

where α represents the angle at the center and f the relative frequency.

DOMAINS AND LIMITATIONS
Pie charts are used to give a visual representation of data that form the different parts of a whole population. They are often found in the media (television, journals, magazines, etc.), where they serve to explain in a simple and synthetic way a concept or situation that is difficult to understand with just numbers.

EXAMPLES
We will represent the distribution of marital status in Australia on 30 June 1981 using a pie chart. The data are the following:

Marital status in Australia on 30 June 1981 (in thousands)

Marital status   Frequency   Relative frequency
Bachelor           6587.3      0.452
Married            6836.8      0.469
Divorced            403.5      0.028
Widow               748.7      0.051
Total             14576.3      1.000

Source: Australian Bureau of Statistics, Australian Pocket Year Book 1984, p. 11

First the relative frequencies are transformed into degrees:

a) α = 0.452 · 360° = 162.72°
b) α = 0.469 · 360° = 168.84°
c) α = 0.028 · 360° = 10.08°
d) α = 0.051 · 360° = 18.36°

The pie chart is then drawn with these four sectors.

FURTHER READING
• Bar chart
• Graphical representation
• Pictogram
• Quantitative graph


Pitman, Edwin James George

Pitman, Edwin James George was born in Melbourne (Victoria), Australia, in 1897. He went to school at Kensington State School and South Melbourne College. During his last year of study he was awarded the Wyselaskie and Dixson scholarships in mathematics, as well as a scholarship at Ormond College. He obtained a degree in letters (similar to a B.A.) (1921), a degree in the sciences (similar to a B.S.) (1922), and a master's degree (1923). He was then named temporary professor of mathematics at Canterbury College, University of New Zealand (1922–1923). He returned to Australia, where he was named assistant professor at Trinity and Ormond Colleges as well as part-time lecturer in physics at Melbourne University (1924–1925). In 1926 Pitman, E.J.G. was named professor of mathematics at Tasmania University, where he stayed until his retirement in 1962. He died in 1993 in Kingston (Tasmania).
A major contribution of Pitman, E.J.G. to probability theory concerns the study of the behavior of a characteristic function in a neighborhood of zero.

Selected works of Pitman, Edwin James George:

1937 Significance tests which may be applied to samples from any populations. Supplement, J. Roy. Stat. Soc. Ser. B 4, 119–130.
1937 Significance tests which may be applied to samples from any populations. II. The correlation coefficient test. Supplement, J. Roy. Stat. Soc. Ser. B 4, 225–232.
1938 Significance tests which may be applied to samples from any populations. III. The analysis of variance test. Biometrika 29, 322–335.
1939 The estimation of location and scale parameters. Biometrika 30, 391–421.
1939 Tests of hypotheses concerning location and scale parameters. Biometrika 31, 200–215.

Point Estimation

Point estimation of a population parameter provides a unique value calculated from a sample. This value is considered the estimate of the unknown parameter.

MATHEMATICAL ASPECTS
Consider a population in which a trait X is studied; suppose that the shape of the distribution of X is known but that the parameter θ on which this distribution depends is unknown. To estimate θ, a sample of size n, (X1, X2, . . . , Xn), is taken from the population and a function G(X1, X2, . . . , Xn) is created that, for a particular realization (x1, x2, . . . , xn), will provide a unique value as an estimate of the value of the parameter θ.

DOMAINS AND LIMITATIONS
In general, the point estimation of a parameter of a population does not correspond exactly to the value of this parameter. There exists a sampling error that derives from the fact that part of the population has been omitted. It is possible to measure this error by calculating the variance or standard deviation of the estimator that has been used to evaluate the parameter of the population.

EXAMPLES
A company that manufactures light bulbs wants to study the average lifetime of its bulbs. It takes a sample of size n, (X1, X2, . . . , Xn), from the production of light bulbs.
The function G(X1, X2, . . . , Xn) that it will use to estimate the parameter θ, corresponding to the mean μ of the population, is as follows:

X̄ = ∑_{i=1}^{n} Xi / n .

This estimator X̄ of μ is a random variable, and for a particular realization (x1, x2, . . . , xn) of the sample it takes a precise value denoted by x̄. Therefore, for example, for n = 5, the following sample can be obtained (in hours):

(812, 1067, 604, 918, 895) .

The estimate x̄ becomes:

x̄ = (812 + 1067 + 604 + 918 + 895) / 5 = 4296 / 5 = 859.2 .

The value 859.2 is a point estimation of the average lifetime of these light bulbs.

FURTHER READING
• Error
• Estimation
• Estimator
• Sample
• Sampling

REFERENCES
Lehmann, E.L., Casella, G.: Theory of Point Estimation. Springer, New York (1998)

Poisson Distribution

The Poisson distribution is a discrete probability distribution. It is particularly useful for counting phenomena per unit of time or space.
The random variable X corresponding to the number of elements observed per unit of time or space follows a Poisson distribution of parameter θ, denoted P(θ), if its probability function is:

P(X = x) = exp(−θ) · θ^x / x! .

(Figure: the Poisson distribution with θ = 1.)

HISTORY
In 1837 Poisson, S.D. published the distribution that carries his name. He reached this distribution by considering the limits of the binomial distribution.


MATHEMATICAL ASPECTS
The expected value of the Poisson distribution is by definition:

E[X] = ∑_{x=0}^{∞} x · P(X = x)
     = ∑_{x=0}^{∞} x · e^{−θ} θ^x / x!
     = ∑_{x=1}^{∞} x · e^{−θ} θ^x / x!
     = θ · e^{−θ} · ∑_{x=1}^{∞} θ^{x−1} / (x − 1)!
     = θ · e^{−θ} · e^{θ}
     = θ .

The variance of X is equal to:

Var(X) = E[X²] − (E[X])²
       = E[X(X − 1) + X] − (E[X])²
       = E[X(X − 1)] + E[X] − (E[X])² .

Since

E[X(X − 1)] = ∑_{x=0}^{∞} x(x − 1) e^{−θ} θ^x / x!
            = ∑_{x=2}^{∞} x(x − 1) e^{−θ} θ^x / x!
            = e^{−θ} θ² · ∑_{x=2}^{∞} θ^{x−2} / (x − 2)!
            = e^{−θ} · θ² · e^{θ}
            = θ² ,

we obtain:

Var(X) = θ² + θ − θ² = θ .

DOMAINS AND LIMITATIONS
The Poisson distribution is used to describe, among others, the following events:
• Number of particles emitted by a radioactive substance.
• Number of phone calls recorded by a call center.
• Number of accidents happening to an insured person.
• Number of arrivals at a counter.
• Number of bacteria in a microscopic preparation.
• Number of plants or animals observed in a given surface area in nature.
When θ tends toward infinity, the Poisson distribution can be approximated by the normal distribution with mean θ and variance θ.

EXAMPLES
A secretary makes on average two mistakes per page. What is the probability of having three mistakes on one page?
The number of mistakes per page, X, follows a Poisson distribution of parameter θ = 2, and its probability function is then:

P(X = x) = exp(−θ) · θ^x / x! = exp(−2) · 2^x / x! .

The probability of having three mistakes on a page is equal to:

P(X = 3) = e^{−2} · 2³ / 3! = 0.1804 .
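The calculation can be reproduced directly from the probability function; the helper poisson_pmf below is written only for this illustration.

from math import exp, factorial

def poisson_pmf(x, theta):
    # P(X = x) = exp(-theta) * theta**x / x!
    return exp(-theta) * theta ** x / factorial(x)

# secretary example: theta = 2 mistakes per page on average
print(poisson_pmf(3, 2))                         # probability of exactly 3 mistakes, about 0.1804
print(sum(poisson_pmf(x, 2) for x in range(3)))  # probability of at most 2 mistakes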

FURTHER READING
• Binomial distribution
• Discrete probability distribution
• Normal distribution

REFERENCES
Poisson, S.D.: Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités. Bachelier, Imprimeur-Libraire pour les Mathématiques, Paris (1837)

Poisson, Siméon-Denis

Poisson, Siméon-Denis (1781–1840), despite coming from a modest provincial family, was able to pursue his studies at the Ecole Polytechnique in Paris, where he later became a professor. From 1815 he also taught at the Sorbonne, and he was elected to the Academy of Sciences that same year.
Poisson was interested in research in different fields: mechanics, physics, and probability theory. In 1837 he published a work called "Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités". In his numerous writings he presented the basis of a statistical method for the social sciences.
Selected works of Siméon-Denis Poisson:

1824 Sur la probabilité des résultats moyens des observations. Connaissance des temps pour l'an 1827, pp. 273–302.
1829 Suite du mémoire sur la probabilité du résultat moyen des observations, inséré dans la Connaissance des temps de l'année 1827. Connaissance des temps pour l'an 1832, pp. 3–22.
1836a Note sur la loi des grands nombres. Comptes rendus hebdomadaires des séances de l'Académie des sciences 2, pp. 377–382.
1836b Note sur le calcul des probabilités. Comptes rendus hebdomadaires des séances de l'Académie des sciences 2, pp. 395–400.
1837 Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités. Bachelier, Imprimeur-Libraire pour les Mathématiques, Paris.

FURTHER READING
• Poisson distribution

Pooled Variance

The pooled variance is used to estimate the variance of two or more populations when the respective variances of each population are unknown but can be considered equal.

MATHEMATICAL ASPECTS
On the basis of k samples of sizes n1, n2, . . . , nk, the pooled variance Sp² is estimated by:

Sp² = ∑_{i=1}^{k} ∑_{j=1}^{ni} (xij − x̄i.)² / ( ∑_{i=1}^{k} ni − k ) ,

where xij is the jth observation of sample i and x̄i. is the arithmetic mean of sample i.
Knowing the respective variances of each sample, the pooled variance can also be defined by:

Sp² = ∑_{i=1}^{k} (ni − 1) · Si² / ( ∑_{i=1}^{k} ni − k ) ,

where Si² is the variance of sample i:

Si² = ∑_{j=1}^{ni} (xij − x̄i.)² / (ni − 1) .

EXAMPLES
Consider two samples of computers of different brands for which we noted the time (in hours) before the first problem:

Brand 1   Brand 2
2800      2800
2700      2600
2850      2400
2650      2700
2700      2600
2800      2500
2900
3000

The number of samples k equals 2, and the sizes of the brand samples equal, respectively:

n1 = 8 , n2 = 6 .

The calculation of the variance of the first sample gives us

S1² = ∑_{j=1}^{n1} (x1j − x̄1.)² / (n1 − 1) ,

where

x̄1. = ∑_{j=1}^{n1} x1j / n1 = 22400 / 8 = 2800 ,

S1² = 95000 / (8 − 1) = 13571.428 .

For the second sample we get:

S2² = ∑_{j=1}^{n2} (x2j − x̄2.)² / (n2 − 1)

with

x̄2. = ∑_{j=1}^{n2} x2j / n2 = 15600 / 6 = 2600 ,

S2² = 70000 / (6 − 1) = 14000 .

If we assume that the unknown variances of the two populations σ1² and σ2² are identical, then we can calculate an estimate of σ² = σ1² = σ2² by the pooled variance:

Sp² = ∑_{i=1}^{2} (ni − 1) · Si² / ( ∑_{i=1}^{2} ni − 2 )
    = [ (8 − 1) · 13571.428 + (6 − 1) · 14000 ] / (8 + 6 − 2)
    = 13750 .
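The second formula (pooling the individual sample variances) can be sketched in Python as follows; the function name pooled_variance is hypothetical and the numbers are those quoted in the example.

def pooled_variance(sizes, variances):
    # S_p^2 = sum (n_i - 1) * S_i^2 / (sum n_i - k)
    k = len(sizes)
    num = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    return num / (sum(sizes) - k)

# the two computer brands of the example: n = (8, 6)
print(pooled_variance([8, 6], [13571.428, 14000]))   # about 13750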

FURTHER READING
• Variance

Population

A population is defined as a collection of statistical units of the same nature about which quantifiable information is of interest. The population constitutes the reference universe during the study of a given statistical problem.

EXAMPLES
The citizens of a state, a group of trees in a forest, the workers in a factory, and the prices of consumer goods all form discrete populations.


FURTHER READING
• Sample

Prevalence

See prevalence rate.

Prevalence Rate

The prevalence of a disease is the number of individuals affected in the statistical population at a given time. This notion is close to the notion of "stock".
The prevalence rate of an illness is the proportion of affected individuals in the population at a given moment.

HISTORY
Farr, William, a pioneer in the use of statistics in epidemiology and creator of the concept of the mortality rate, showed that the prevalence of an illness is equivalent to the product of the incidence and the duration of the illness.
MacMahon, B. and Pugh, T.F. (1970) illustrated the relation between the incidence rate and the prevalence rate using data collected between 1948 and 1952 on acute leukemia among the white population of Brooklyn, New York. The incidence rate was 32.5 cases per million inhabitants per year, and the prevalence rate was 6.7 cases per million inhabitants. The duration of acute leukemia thus had to be 0.21 years, or about 2.5 months:

duration of acute leukemia = prevalence rate / incidence rate = 6.7 / 32.5 year = 0.21 year .

MATHEMATICAL ASPECTS
The prevalence rate is defined as follows:

prevalence rate = number of afflicted / size of the population .

The prevalence rate and the incidence rate of an illness are approximately related to one another by the mean duration of the illness (that is, the mean duration of survival or the mean duration of the cure). This relation is written:

prevalence rate ≅ (incidence rate) · (mean duration of illness) .

DOMAINS AND LIMITATIONS
Prevalence is different from risk. A risk is a quantity that has a predictive value, as it concerns a fixed period of observation and therefore contains information that is homogeneous in time. In contrast, prevalence is tangential to the history of an illness in a population: it cannot give any indication about the risk insofar as it aggregates, at a given moment, recently diagnosed cases and others from the distant past. Note that the greater the mean survival, if the illness is incurable, or the longer the convalescence, if the illness is curable, the greater the prevalence of the illness.
Finally, note that the prevalence rate often refers not to a well-defined illness but to an interval of values of a biological parameter. For example, the prevalence rate of hypercholesterolemia can be defined as the proportion of individuals having a cholesterolemia greater than 6.5 mmol/l in a given population at a given moment. Thus the percentiles of a biological parameter, considered as measuring the proportion of individuals for which the parameter takes a value within a certain interval, can be interpreted as prevalence rates. For example, the prevalence rate relative to values smaller than or equal to the median (or percentile 50) of a biological parameter is 50%.

FURTHER READING
• Attributable risk
• Avoidable risk
• Cause and effect in epidemiology
• Incidence rate
• Odds and odds ratio
• Relative risk
• Risk

REFERENCES
Cornfield, J.: A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. J. Natl. Cancer Inst. 11, 1269–1275 (1951)

Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)

MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)

Morabia, A.: Epidemiologie Causale. Editions Médecine et Hygiène, Geneva (1996)

Morabia, A.: L'Épidémiologie Clinique. Editions "Que sais-je?". Presses Universitaires de France, Paris (1996)

Probability

We can define the probability of an event either by using relative frequencies or through an axiomatic approach.
In the first approach, we suppose that a random experiment is repeated many times under the same conditions. For each event A defined in the sample space Ω, we define nA as the number of times that event A occurred during the first n repetitions of the experiment. In this case, the probability of event A, denoted by P(A), is defined by:

P(A) = lim_{n→∞} nA / n ,

which means that P(A) is defined as the limit of the proportion of times that event A occurred relative to the total number of repetitions.
In the second approach, for each event A we accept that there exists a probability of A, P(A), satisfying the following three axioms:
1. 0 ≤ P(A) ≤ 1;
2. P(Ω) = 1;
3. for each sequence of mutually exclusive events A1, A2, . . . (that is, events with Ai ∩ Aj = ∅ if i ≠ j):

P( ∪_{i=1}^{∞} Ai ) = ∑_{i=1}^{∞} P(Ai) .

HISTORY
The first games of chance mark the beginning of the history of probability, and we can affirm that they date back to the emergence of Homo sapiens.
The origin of the word "hazard" offers less certitude. According to Kendall, Maurice George (1956), it was brought to Europe at the time of the third Crusade and derives from the Arabic word "al zhar", meaning a die.
Benzécri, J.P. (1982) gives a reference to a Frank of the Orient, named Guillaume de Tyr, according to whom the word hazard was related to the name of a castle in Syria. This castle was the object of a siege during which the aggressors invented a game of dice, giving its name to these games.
Traces of games of chance can be found in ancient civilizations. Drawings and objects found in Egyptian graves from the First Dynasty (3500 B.C.) show that these games were already being played at that time. According to the Greek historian Herodotus, games of chance were invented by Palamedes during the siege of Troy.
The game of astragals was one of the first games of chance. The astragal is a small foot bone, symmetric about its vertical axis. Much appreciated by the Greeks, and later by the Romans, it is possibly, according to Kendall, M.G. (1956), the original game of craps. Very old astragals were found in Egypt, but more ancient ones, produced in terra cotta, were discovered in northern Iraq. They date back to the third millennium B.C. Another, also in terra cotta and dating to the same time, was found in India.
There is a wealth of evidence demonstrating how games involving dice were played in ancient times. But nobody at the time thought to establish the equiprobability of the different faces.
For David, F.N. (1955), two hypotheses help to explain this gap. According to the first hypothesis, the dice were generally deformed. According to the second hypothesis, the rolling of dice was of a religious nature: it was a way to ask the gods questions, the result of a given roll of the dice revealing the answer to the questions that had been posed to the gods.
Among games of chance, playing cards are also of very ancient origin. We know that they were used in China, India, Arabia, and Egypt. In the West, we find the first evidence of cards in Venice in 1377, in Nuremberg in 1380, and in Paris in 1397. Tarot cards are the most ancient playing cards.
Until the 15th century, according to Kendall, M.G. (1956), the Church and kings fought against the practice of dice games. As evidence of this we may mention the laws of Louis IX, Edward III, and Henry VIII. Card games were added to the list of illegal activities, and we find a note on the prohibition of card games issued in Paris in 1397. Despite the interdictions, games of chance were present without interruption, from the Romans to the Renaissance, in all social classes.
It is not surprising to learn that they are at the origin of mathematical works on probability. In the West, the first writings on this subject come from Italy. They were written by Cardano in his treatise Liber de Ludo Aleae, published posthumously in 1663, and by Galileo Galilei in his Sopra le Scoperte dei Dadi. In the works of Galileo we find the principle of equiprobability for dice games.
In the rest of Europe, these problems related to games of chance also interested mathematicians. Huygens, from the Netherlands, published in 1657 a work entitled De Ratiociniis in Aleae Ludo that, according to Holgate, P. (1984), extended the interest in probabilities to other domains; game theory would be developed in parallel to this in relation to body dynamics. The works of Huygens considerably influenced those of two other well-known mathematicians: Bernoulli, Jakob and de Moivre, Abraham.
But it was in France, with Pascal, Blaise (1623–1662) and de Fermat, Pierre (1601–1665), that probability theory really took shape. According to Todhunter, I. (1949), Pascal, B. was asked by a famous gambler, Gombauld, Antoine, Knight of Méré, to resolve the following problem: in a game between two adversaries, the first has won n of m plays and the second p of m plays; what are the chances of each of them winning the game (given that the game is won by the first to reach m plays)?
Pascal, B. contacted de Fermat, P., who found the solution. Pascal himself discovered the recurrence formula leading to the same result.
According to Benzécri, J.P. (1982), at a time when the West was witnessing the fall of the Roman Empire and the advent of the Christian era, the East was engaged in intense scientific and artistic activity. In these circumstances, Eastern scholars, such as Omar Khayyam, probably discovered rules of probability.
There is one aspect of probability that is of particular interest to mathematicians: the problems of combinatory analysis.

MATHEMATICAL ASPECTS
Axiomatic Basis of Probability
We consider a random experiment with the sample space Ω containing n elements:

Ω = {x1, x2, . . . , xn} .

We can associate to each simple event xi a probability P(xi) having the following properties:
1. The probabilities P(xi) are nonnegative:
P(xi) ≥ 0 for i = 1, 2, . . . , n .
2. The sum of the probabilities of the xi for i going from 1 to n equals 1:
∑_{i=1}^{n} P(xi) = 1 .
3. The probability P is a function that assigns a number between 0 and 1 to each simple event of a random experiment:
P : Ω → [0, 1] .

Properties of Probabilities
1. The probability of the sample space is the highest probability that can be associated to an event:
P(Ω) = 1 .
2. The probability of an impossible event equals 0:
if A = ∅ , then P(A) = 0 .
3. Let Ā be the complement of A in Ω; then the probability of Ā equals 1 minus the probability of A:
P(Ā) = 1 − P(A) .
4. Let A and B be two incompatible events (A ∩ B = ∅). The probability of A ∪ B equals the sum of the probabilities of A and B:
P(A ∪ B) = P(A) + P(B) .
5. Let A and B be two events. The probability of A ∪ B equals:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) .

FURTHER READING
• Conditional probability
• Continuous probability distribution
• Discrete probability distribution
• Event
• Random experiment
• Sample space

REFERENCES
Benzécri, J.P.: Histoire et préhistoire de l'analyse des données. Dunod, Paris (1982)

David, F.N.: Dicing and gaming (a note on the history of probability). Biometrika 42, 1–15 (1955)

Holgate, P.: The influence of Huygens' work in dynamics on his contribution to probability. Int. Stat. Rev. 52, 137–140 (1984)

Kendall, M.G.: Studies in the history of probability and statistics: II. The beginnings of a probability calculus. Biometrika 43, 1–14 (1956)

Stigler, S.: The History of Statistics: The Measurement of Uncertainty Before 1900. Belknap, London (1986)

Todhunter, I.: A History of the Mathematical Theory of Probability. Chelsea Publishing Company, New York (1949)

Probability Distribution

Synonym for probability function, but used much more frequently than "probability function" to describe the distribution giving the probability that a random variable X takes the value x.

HISTORY
It is in the 17th century that the systematic study of problems related to random phenomena began. The eminent physicist Galileo had already tried to study the errors of physical measurements, considering them as random and estimating their probability. During this period, the theory of insurance also appeared; it was based on the analysis of the laws that govern random phenomena such as morbidity, mortality, accidents, etc. However, it was first necessary to study simpler phenomena such as games of chance. These provide particularly simple and clear models of random phenomena, allowing one to observe and study the specific laws that rule them; moreover, the possibility of repeating the same experiment many times allows for experimental verification of these laws.

MATHEMATICAL ASPECTS
The function P(b) = P(X = b), where b varies over the possible values of the discrete random variable X, is called the probability function of X. Since P(X = b) is always positive or zero, the probability function is also positive or zero.
The probability function is represented on a system of axes. The different values b of X are plotted as abscissae and the images P(b) as ordinates. The probability P(b) is represented by rectangles with a width equal to unity and a height equal to the probability of b. Since X must take at least one of the values b, the sum of the P(b) must be equal to 1, meaning that the sum of the areas of the rectangles must be equal to 1.

EXAMPLES
Consider a random experiment that consists in rolling a loaded die. Consider the random variable X corresponding to the number of points obtained. The probability function P(X = b) is given by the probabilities associated with each value of X:

b      1     2     3     4     5     6
P(b)  1/6   1/4   1/12  1/12  1/4   1/6
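A quick check, added here for illustration only, that the table defines a valid probability function (nonnegative values summing to 1), together with the implied expected value; exact fractions are used to avoid rounding.

from fractions import Fraction as F

p = {1: F(1, 6), 2: F(1, 4), 3: F(1, 12), 4: F(1, 12), 5: F(1, 4), 6: F(1, 6)}

total = sum(p.values())                     # must equal 1 for a probability function
mean = sum(b * pb for b, pb in p.items())   # expected value of the loaded die
print(total, mean)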


FURTHER READING
• Continuous probability distribution
• Density function
• Discrete distribution function
• Discrete probability distribution
• Distribution function
• Probability
• Probability function
• Random variable

REFERENCES
Johnson, N.L., Kotz, S.: Distributions in Statistics: Discrete Distributions. Wiley, New York (1969)

Johnson, N.L., Kotz, S.: Distributions in Statistics: Continuous Univariate Distributions, vols. 1 and 2. Wiley, New York (1970)

Johnson, N.L., Kotz, S., Balakrishnan, N.: Discrete Multivariate Distributions. Wiley, New York (1997)

Rothschild, V., Logothetis, N.: Probability Distributions. Wiley, New York (1986)

Probability Function

The probability function of a discrete random variable is a function that associates each value of this random variable with its probability.
See probability distribution.

HISTORY
See probability, probability distribution.

p-Value

The p-value is the probability, calculated under the null hypothesis, of obtaining an outcome at least as extreme as the value observed in the sample; in other words, it is the probability of obtaining a result at least as extreme as a given data point under chance alone.

HISTORY
The p-value and its interpretation were discussed in detail by Gibbons and Pratt in 1975.

MATHEMATICAL ASPECTSLet us illustrate the case of a hypothesis testmade on the estimator of mean; the sameprinciple is applied for any other estimator;only the notations are different.Suppose that we want to test the followinghypotheses:

H0 : μ = μ0

H1 : μ > μ0 ,

where μ represents the mean of a normallydistributed population with a known stan-dard deviation σ . A sample of dimension ngives an observed mean x.Consider the case where x exceeds μ0 andcalculate the probability of obtaining anestimation μgreater than or equal to xunderthe null hypothesis μ = μ0. The value pcorresponds to this probability. Thus:

p = P(μ ≥ x|μ = μ0) .

The standard random value Z given by

Z = μ− μ0σ√

n

follows normal distribution of

mean 0 with standard deviation 1. Intro-ducing this variable into the expression ofp, we find:

p = P

(Z ≥ x− μ0

σ√n

),

Page 433: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

P

p-Value 435

which in this form can be read from the normal table.
For a significance level α, comparing p to α allows one to decide whether to reject the null hypothesis. If:
• p ≤ α: we reject the null hypothesis H0 in favour of the alternative hypothesis H1;
• p > α: we do not reject the null hypothesis H0.
We can also calculate the upper limit of the acceptance region of H0 by finding the value Cα such that

P(X̄ ≥ Cα | μ = μ0) = α .

In terms of zα, the corresponding value of the standard normal random variable, the value Cα is expressed by:

Cα = μ0 + zα · σ/√n .

DOMAINS AND LIMITATIONS
The p-value is frequently used to interpret the result of a hypothesis test. It is used for a one-tailed test, when the hypotheses are of the form:

H0 : μ = μ0   against   H1 : μ > μ0 ,
or
H0 : μ = μ0   against   H1 : μ < μ0 ,
or
H0 : μ ≤ μ0   against   H1 : μ > μ0 ,
or
H0 : μ ≥ μ0   against   H1 : μ < μ0 .

The value p can be interpreted as the smallest significance level for which the null hypothesis can be rejected.

EXAMPLES
Suppose that we want to conduct a one-tailed hypothesis test

H0 : μ = 30

against H1 : μ > 30

for a normally distributed population with a standard deviation σ = 8. A sample of size n = 25 gives an observed mean of 34. As x̄ (= 34) clearly exceeds μ0 = 30, we are led to consider the alternative hypothesis μ > 30.
We calculate the probability of obtaining an observed mean of at least 34 under the null hypothesis μ = 30. This is the p-value:

p = P(X̄ ≥ 34 | μ = 30)
  = P( (X̄ − 30)/(8/5) ≥ (34 − 30)/(8/5) )
  = P(Z ≥ 2.5) ,

where Z = (X̄ − μ)/(σ/√n) is normally distributed with mean 0 and standard deviation 1; p = 0.0062 according to the normal table.
For a significance level α = 0.01, we reject the null hypothesis in favor of the alternative hypothesis μ > 30. We can also calculate the upper limit of the acceptance region of H0, that is, the value Cα such that

P(X̄ ≥ Cα | μ = 30) = 0.01 ,

and find Cα = 30 + 2.33 · 8/√25 = 33.73.


This confirms that we must indeed reject the null hypothesis μ = 30 when the observed mean equals 34.
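The computation of this example can be reproduced with the following minimal Python sketch, which assumes the SciPy library; the variable names are illustrative and not part of the original entry:

from scipy.stats import norm

mu0, sigma, n, xbar = 30, 8, 25, 34
z = (xbar - mu0) / (sigma / n ** 0.5)                    # standardized test statistic, here 2.5
p_value = norm.sf(z)                                     # P(Z >= 2.5), approximately 0.0062
c_alpha = mu0 + norm.ppf(1 - 0.01) * sigma / n ** 0.5    # upper limit of the acceptance region
print(z, p_value, c_alpha)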

FURTHER READING
• Acceptance region
• Hypothesis testing
• Normal table
• Significance level

REFERENCES
Gibbons, J.D., Pratt, J.W.: P-values: Interpretation and Methodology. Am. Stat. 29(1), 20–25 (1975)


Q

Q-Q Plot (Quantile to Quantile Plot)

The Q-Q plot, or quantile to quantile plot, is a graph that examines the conformity between the empirical distribution of the data and a given theoretical distribution. One of the methods used to verify the normality of the errors of a regression model is to construct a Q-Q plot of the residuals. If the points are aligned on the line x = y, then the data can be considered normally distributed.

HISTORY
The method of graphical representation known as the Q-Q plot appeared in the early 1960s. Since then, it has become an essential tool in the analysis of data and/or residuals because it provides a wealth of information and is easy to interpret.

MATHEMATICAL ASPECTS
To construct a Q-Q plot we follow two steps:
1. Arrange the data x1, x2, . . . , xn in increasing order:

   x[1] ≤ x[2] ≤ . . . ≤ x[n] .

2. Associate with each data point x[i] the i/(n+1)-quantile qi of the standard normal distribution. Plot on a graph the ordered data x[i] as ordinates and the quantiles qi as abscissae.
If the variables have the same distribution, then the plot of the quantiles of the first variable against the quantiles of the second distribution will be a line of slope 1. So if the data are normally distributed, the points on the graph must be almost aligned on the line of equation x[i] = qi.
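The two construction steps can be sketched in Python as follows (a minimal illustration assuming NumPy and SciPy; the data are invented for the example):

import numpy as np
from scipy.stats import norm

x = np.array([5.2, 4.8, 6.1, 5.5, 4.9, 5.0, 5.8, 6.3])   # illustrative observations
x_sorted = np.sort(x)                                     # step 1: order the data
n = len(x_sorted)
q = norm.ppf(np.arange(1, n + 1) / (n + 1))               # step 2: i/(n+1) quantiles of N(0,1)

# Plotting q (abscissae) against x_sorted (ordinates) gives the Q-Q plot;
# approximate alignment on a straight line suggests normality.
for qi, xi in zip(q, x_sorted):
    print(round(qi, 2), xi)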

DOMAINS AND LIMITATIONS
The Q-Q plot is used to verify whether data follow a particular distribution or whether two given data sets have the same distribution. If the distributions are the same, the graph is a straight line. The further the obtained result is from the 45° diagonal, the further the empirical distribution is from the theoretical one. The extreme points have a greater variability than those in the center of the distribution. Thus a U-shaped graph means that one distribution is skewed relative to the other, and an S-shaped graph indicates that one distribution is more strongly influenced by extreme values (has longer tails) than the other.


A frequent use of Q-Q plots is to examine the behavior of the residuals in a simple or multiple linear regression, more specifically, to check whether their distribution is normal.

EXAMPLES
The following table contains the residuals (in increasing order) of a linear regression with 13 observations. We associate them with the quantiles qi as described above. The resulting Q-Q plot shows that, except for two points that are probably outliers, the points are well aligned.

Table: Residuals and quantiles

  i      ei       qi
  1    −1.62    −1.47
  2    −0.66    −1.07
  3    −0.62    −0.79
  4    −0.37    −0.57
  5    −0.17    −0.37
  6    −0.12    −0.18
  7     0.04     0.00
  8     0.18     0.18
  9     0.34     0.37
 10     0.42     0.57
 11     0.64     0.79
 12     1.33     1.07
 13     2.54     1.47

FURTHER READING
• Analysis of residuals
• Data analysis
• Exploratory data analysis
• Multiple linear regression
• Probability distribution
• Quantile
• Residual
• Simple linear regression

REFERENCES
Chambers, J.M., Cleveland, W.S., Kleiner, B., Tukey, P.A.: Graphical Methods for Data Analysis. Wadsworth, Belmont, CA (1983)
Hoaglin, D.C., Mosteller, F., Tukey, J.W. (eds.): Understanding Robust and Exploratory Data Analysis. Wiley, New York (1983)


Wilk, M.B., Gnanadesikan, R.: Probability plotting methods for the analysis of data. Biometrika 55, 1–17 (1968)

Qualitative Categorical Variable

A qualitative categorical variable is a variable whose modalities are categories, such as, for example, "man" and "woman" for the variable "sex", or the categories "red", "orange", "green", "blue", "indigo", and "violet" for the variable "color". The modalities of a qualitative categorical variable can be represented on a nominal scale or on an ordinal scale.
An example of a qualitative categorical variable measured on an ordinal scale is the professional qualification variable, with categories "qualified", "semiqualified", and "nonqualified".

FURTHER READING
• Category
• Variable

Quantile

Quantiles are measures of position that do not necessarily try to determine the center of a distribution of observations, but rather to describe a particular position. The notion is an extension of the concept of the median (which divides a distribution of observations into two parts). The most frequently used quantiles are:
• Quartiles, which separate a collection of observations into four parts;
• Deciles, which separate a collection of observations into ten parts;
• Centiles, which separate a collection of observations into a hundred parts.

DOMAINS AND LIMITATIONS
The calculation of quantiles makes sense only for a quantitative variable that can take values on a determined interval.
The concept of quantile refers to the separation of a distribution of observations into an arbitrary number of parts. Note that the greater the number of observations, the finer the separation of the distribution can be. Quartiles can generally be used for any distribution; the calculation of deciles and, a fortiori, centiles requires a relatively large number of observations to obtain a valid interpretation.

FURTHER READING
• Decile
• Measure of location
• Median
• Percentile
• Quartile

Quantitative Graph

Quantitative graphs are used to present and summarize numerical information coming from the study of a quantitative variable. The most frequently used types of quantitative graphs are:
• Bar chart
• Pictogram
• Pie chart

HISTORY
Beniger, J.R. and Robyn, D.L. (1978) retrace the historical development of quantitative graphics since the 17th century. Schmid, C.F. and Schmid, S.E. (1979) discuss the numerous forms of quantitative graphs.


FURTHER READING
• Bar chart
• Graphical representation
• Pictogram
• Pie chart

REFERENCES
Beniger, J.R., Robyn, D.L.: Quantitative graphics in statistics: a brief history. Am. Stat. 32, 1–11 (1978)
Schmid, C.F., Schmid, S.E.: Handbook of Graphic Presentation, 2nd edn. Wiley, New York (1979)

Quantitative Variable

A quantitative variable is a variable with numerical modalities. For example, weight, dimension, age, speed, and time are quantitative variables. We distinguish discrete variables (e.g., number of children per family) from continuous variables (e.g., length of a jump).

FURTHER READING
• Variable

Quartile

Quartiles are location measures of a distribution of observations. Quartiles separate a distribution into four parts; thus there are three quartiles for a given distribution, and between consecutive quartiles we find 25% of the total observations.

Note that the second quartile equals the median.

MATHEMATICAL ASPECTS
Calculation of a quartile is similar to that of the median. When all the observations are available, quartiles are calculated as follows:
1. The n observations are arranged in the form of a frequency distribution.
2. Quartiles correspond to the observations for which the relative cumulated frequency exceeds 25%, 50%, and 75%, respectively. Some authors propose the following formula, which allows one to determine the value of the different quartiles with precision.
Computation of the jth quartile: let i be the integer part of j·(n+1)/4 and k the fractional part of j·(n+1)/4. Let xi and xi+1 be the values of the observations in the ith and (i+1)th positions (when the observations are arranged in increasing order). The jth quartile is

Qj = xi + k · (xi+1 − xi) .
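A minimal Python sketch of this formula (not part of the original entry) is given below; it reproduces the values obtained for the ten observations of the first example later in this entry:

def quartile(data, j):
    # jth quartile of ungrouped data, following the formula above (1-based positions)
    x = sorted(data)
    pos = j * (len(x) + 1) / 4
    i, k = int(pos), pos - int(pos)        # integer and fractional parts of j(n+1)/4
    return x[i - 1] + k * (x[i] - x[i - 1])

data = [1, 2, 4, 4, 5, 5, 5, 6, 7, 9]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))   # 3.5 5.0 6.25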

When the observations are grouped into classes, quartiles are determined as follows:
1. Determine the class containing the quartile:
   • First quartile: the class for which the relative cumulative frequency exceeds 25%.
   • Second quartile: the class for which the relative cumulative frequency exceeds 50%.
   • Third quartile: the class for which the relative cumulative frequency exceeds 75%.
2. Calculate the value of the quartile under the assumption that the observations are uniformly distributed in each class; the jth quartile Qj is:

   Qj = Lj + [ (n·j/4 − Σf_inf) / fj ] · cj ,


where Lj is the lower limit of the class of quartile Qj, n is the total number of observations, Σf_inf is the sum of the frequencies of the classes below the class of the quartile, fj is the frequency of the class of quartile Qj, and cj is the width of the class interval of quartile Qj.
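The grouped-data formula can likewise be sketched in Python (a minimal illustration, not part of the original entry); the classes and frequencies are those of the profit example given below:

def grouped_quartile(j, lower_limits, freqs, class_width):
    # jth quartile for observations grouped into classes of equal width
    n = sum(freqs)
    target = n * j / 4                       # n*j/4 observations lie below the quartile
    cum = 0
    for L, f in zip(lower_limits, freqs):
        if cum + f >= target:                # class containing the quartile
            return L + (target - cum) / f * class_width
        cum += f

lowers = [100, 200, 300, 400]
freqs = [10, 20, 40, 30]
print([grouped_quartile(j, lowers, freqs, 100) for j in (1, 2, 3)])   # 275.0, 350.0, about 416.67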

DOMAINS AND LIMITATIONS
The calculation of quartiles makes sense only for a quantitative variable that can take values on a determined interval.
The quartile is similar to the median: it is also based on observation ranks, not values. An outlier will therefore have only a small influence on the quartile values.

EXAMPLES
Let us first take an example where we have ten observations (n = 10):

1 2 4 4 5 5 5 6 7 9

Although quartiles should not normally be calculated for such a small number of observations (and should be interpreted very cautiously), we study this case in order to understand the principles behind the calculation.
The first quartile Q1 is found at position (n+1)/4 = 2.75, that is, three quarters of the distance between the second and third observations (which we call x2 and x3). We can calculate Q1 as follows:

Q1 = x2 + 0.75 · (x3 − x2) = 2 + 0.75 · (4 − 2) = 3.5 .

The second quartile Q2 (which is the median) is found at position 2·(n+1)/4 (or (n+1)/2), which equals 5.5 for our example:

Q2 = x5 + 0.5 · (x6 − x5) = 5 + 0.5 · (5 − 5) = 5 .

The third quartile Q3 is found at position 3·(n+1)/4 = 8.25:

Q3 = x8 + 0.25 · (x9 − x8) = 6 + 0.25 · (7 − 6) = 6.25 .

The values 3.5, 5, and 6.25 separate the observations into four equal parts.
For the second example, let us consider the following frequency table representing the number of children per family in 200 families:

Value           Frequency      Relative     Cumulated
(number of      (number of     frequency    relative
children)       families)                   frequency

0                  6             0.03         0.03
1                 38             0.19         0.22
2                 50             0.25         0.47
3                 54             0.27         0.74
4                 42             0.21         0.95
5                  8             0.04         0.99
6                  2             0.01         1.00

Total            200             1.00

The first quartile corresponds to the value whose relative cumulative frequency reaches 25%, which is two children (because the relative cumulative frequency for two children goes from 22 to 47%, which includes 25%). The second quartile equals three children because the relative cumulated frequency for three children goes from 47 to 74%, which includes 50%. The third quartile equals four children because the relative cumulated frequency for four children goes from 74 to 95%, which includes 75%.
The three quartiles separate the 200 families into quarters. We can group 50 of the 200 families into the first quarter with zero, one, or two children, 50 into the second quarter with two or three children, 50 into the third quarter with three or four children, and 50 into the fourth quarter with four, five, or six children.
Now consider an example involving the calculation of quartiles from the frequency distribution of a continuous variable where the observations are grouped into classes. The following frequency table represents the profits (in thousands of francs) of 100 stores:

Profit            Frequency    Cumulated    Relative
(thousands                     frequency    cumulated
of francs)                                  frequency

100–200              10           10          0.1
200–300              20           30          0.3
300–400              40           70          0.7
400–500              30          100          1.0

Total                100

The class containing the first quartile is the class 200–300 (the first class whose relative cumulative frequency exceeds 25%). Assuming that the observations are uniformly distributed in each class, we obtain for the first quartile the following value:

1st quartile = 200 + [ (100 · 1/4 − 10) / 20 ] · 100 = 275 .

The class containing the second quartile is the class 300–400. The value of the second quartile equals:

2nd quartile = 300 + [ (100 · 2/4 − 30) / 40 ] · 100 = 350 .

The class containing the third quartile is the class 400–500. The value of the third quartile equals:

3rd quartile = 400 + [ (100 · 3/4 − 70) / 30 ] · 100 = 416.66 .

We can conclude that 25 of the 100 stores have profits between 100000 and 275000 francs, 25 have profits between 275000 and 350000 francs, 25 have profits between 350000 and 416660 francs, and 25 have profits between 416660 and 500000 francs.

FURTHER READING
• Decile
• Measure of location
• Median
• Percentile
• Quantile

Quetelet, Adolphe

Quetelet, Adolphe was born in 1796 in Ghent, Belgium. In 1823, the minister of public education asked him to supervise the construction of an observatory for the city of Brussels. To this end, he was sent to Paris to study astronomy and to prepare plans for the construction of the observatory. In France, Bouvard, Alexis was charged with his education. Quetelet, Adolphe met, among others, Poisson, Siméon-Denis and de Laplace, Pierre Simon, who convinced him of the importance of probabilistic calculations. Over the next 4 years, he traveled widely to visit different observatories and to collect instruments. He wrote popular works on science: Astronomie élémentaire, Astronomie populaire, Positions de physique, and, in 1828, Instructions populaires sur le calcul des probabilités. Moreover, he founded with Garnier (his thesis advisor) a journal: Correspondance mathématique et physique. In 1828, he was named astronomer of the Brussels observatory and rose quickly to become director of the observatory, a position he would occupy for the next 25 years. In 1836, he became the tutor of Princes Ernest and Albert of Saxe-Cobourg and Gotha, which allowed him to pursue his interest in the calculation of probabilities. His lessons to the boys were published in 1846 under the title Lettres à S.A.R. le Duc régnant de Saxe-Cobourg et Gotha; they addressed the theory of probability as applied to ethical and political sciences. Quetelet died in 1874.

Principal works of Quetelet, Adolphe:

1826 Astronomie élémentaire. Malther, Paris.
1828 Positions de physique. Brussels.
1828 Instructions populaires sur le calcul des probabilités.
1846 Lettres à S.A.R. le Duc régnant de Saxe-Cobourg et Gotha sur la théorie des probabilités, appliquée aux sciences morales et politiques. Brussels.

Quota Sampling

Quota sampling is a nonrandom sampling method. A sample is chosen in such a way that it reproduces, as closely as possible, an image of the population. The quota method is based on the known distribution of the population for certain characteristics (gender, age, socioeconomic class, etc.).

HISTORY
See sampling.

MATHEMATICAL ASPECTS
Consider a population composed of N individuals, for which the distribution of the control characteristics A, B, . . . , Z, with respective modalities A1, A2, . . . , Aa; B1, B2, . . . , Bb; . . . ; Z1, Z2, . . . , Zz, is known. The desired sample is obtained by multiplying the number of individuals in each modality of every control trait by the sampling rate q, meaning the proportion of the population that one wants to sample.
The sample will be of size n = N · q and will have the following distribution:

q · XA1 , q · XA2 , . . . , q · XAa ;
q · XB1 , q · XB2 , . . . , q · XBb ; . . . ;
q · XZ1 , q · XZ2 , . . . , q · XZz ,

where X... represents the number of individuals in each modality of every control trait.
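A minimal Python sketch of this computation (the control-trait counts are taken from the gender column of the example below; the dictionary names are illustrative):

population_counts = {"Male": 162000, "Female": 188000}   # known distribution of one control trait
q = 1 / 300                                              # sampling rate
quotas = {modality: round(count * q) for modality, count in population_counts.items()}
print(quotas)   # {'Male': 540, 'Female': 627}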


DOMAINS AND LIMITATIONS
A control trait that will be used as a basis for the distribution of the sample must obey the following rules: (1) it must be closely correlated with the variables under study; (2) its distribution must be known for the whole population. Commonly used control traits are gender, age, socioprofessional class, region, etc.
The advantages of quota sampling are the following:
• Unlike random sampling, quota sampling does not require the existence of a sampling frame.
• The cost is much lower than for random sampling.
The main disadvantages are:
• The quota method does not have a sufficient theoretical foundation. It is based on the hypothesis that a correct distribution of the control traits ensures the representativity of the distribution of the observed traits.
• The quota method cannot quantify the precision of the estimations obtained from the sample. Because the investigators choose the people to be surveyed, it is impossible to know with what probability any given individual of the population may be part of the sample. It is therefore impossible to apply the calculation of probabilities that, in the case of random sampling, allows one to associate with each estimation a measure of the error that could have been committed.

EXAMPLES
Suppose we want to study the market possibilities for a consumer product in town X. The chosen control variables are gender, age, and socioprofessional class. The population distribution (in thousands of persons) is the following:

Gender            Age               Socioprofessional class

Male     162      0–20     70       Chief               19.25
Female   188      21–60   192.5     Independent Prof.   14.7
                  ≥ 61     87.5     Managers            79.1
                                    Workers             60.9
                                    Unemployed         176.05

Total    350      Total   350       Total              350

We choose a sampling rate equal to 1/300. The sample will therefore have the following size and distribution:

n = N · q = 350000 · 1/300 ≈ 1167 .

Gender            Age               Socioprofessional class

Male     540      0–20    233       Chief                64
Female   627      21–60   642       Independent Prof.    49
                  ≥ 61    292       Managers            264
                                    Workers             203
                                    Unemployed          587

Total   1167      Total  1167       Total              1167

The choice of individuals is then made by simple random sampling in each category.

FURTHER READING
• Estimation
• Simple random sampling
• Sampling

REFERENCES
Sukhatme, P.V.: Sampling Theory of Surveys with Applications. Iowa State University Press, Ames, IA (1954)


R

Random Experiment

An experiment whose outcome is not predictable in advance is called a random experiment. A random experiment can be characterized as follows:
1. It is possible to describe the set of all possible results (called the sample space of the random experiment).
2. It is not possible to predict the result with certainty.
3. It is possible to associate each possible outcome with a probability of occurrence.

DOMAINS AND LIMITATIONS
The random phenomena to which probabilities apply are encountered in many different fields such as economics, the social sciences, physics, medicine, psychology, etc.

EXAMPLES
Here are some examples of random experiments, characterized by their action and by the set of their possible results (their sample space).

Action                                      Sample space

Flipping a coin                             Ω = {Heads, Tails}
Rolling a die                               Ω = {1, 2, 3, 4, 5, 6}
Drawing a ball from a container with        Ω = {Blue, Green, Yellow}
one green ball, one blue ball, and
one yellow ball
Rainfall in a determined period             Ω = set of nonnegative real numbers
                                            (infinite noncountable set)

FURTHER READING
• Event
• Probability
• Sample space

Random Number

A random number is the realization of a random variable that is uniformly distributed on the interval [0, 1].

DOMAINS AND LIMITATIONS
If the conditions imposed above are not rigorously verified, that is, if the distribution from which the numbers are drawn is not exactly the uniform distribution or if the successive draws are not independent, then we say that the set of randomly obtained numbers forms a series of pseudorandom numbers.

EXAMPLES
A container holds 10 balls numbered from 0 to 9. If our goal is to obtain a random number between 0 and 1 to m decimal places, we draw m balls with replacement, forming a number of m digits. We then treat these draws as a single number and divide it by 10^m.
For example, suppose that we draw successively the balls 8, 7, 5, 4, and 7. The corresponding random number is 0.87547.
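A minimal Python sketch of this procedure (not part of the original entry), using the standard random module to draw the digits:

import random

m = 5
digits = [random.randint(0, 9) for _ in range(m)]     # m draws with replacement, e.g. [8, 7, 5, 4, 7]
u = int("".join(map(str, digits))) / 10 ** m          # the corresponding random number, e.g. 0.87547
print(digits, u)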

FURTHER READING
• Generation of random numbers
• Uniform distribution

Random Number Generation

See generation of random numbers.

Random Variable

A variable whose value is determined by the result of a random experiment is called a random variable.

MATHEMATICAL ASPECTS
A random variable is generally denoted by one of the last letters of the alphabet (in uppercase). It is a real-valued function defined on the sample space Ω. In other words, a real random variable X is a mapping of Ω onto ℝ:

X : Ω → ℝ .

We refer to:
• Discrete random variables, when the set of all possible values of the random variable is either finite or countably infinite;
• Continuous random variables, when the set of all possible values of the random variable is uncountably infinite.

EXAMPLES
Example of a Discrete Random Variable
If we roll two dice simultaneously, the 36 possible results that comprise Ω are the following:

Ω = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2, 6), . . . , (6, 1), (6, 2), . . . , (6, 6)} .

The random variable X = "sum of the two dice" is a discrete random variable. It takes its values in the finite set E = {2, 3, 4, . . . , 11, 12}, that is, the sum of the two dice can be 2, 3, 4, . . . or 12. In this example, X is a strictly positive random variable, and the mapping is from Ω to E.
It is possible to attribute a probability to the different values that the random variable X can take. The value "sum equal to 2" is obtained only by the result (1, 1). On the other hand, the value "sum equal to 8" is obtained by the results (2, 6), (3, 5), (4, 4), (5, 3), and (6, 2). Since Ω contains 36 equally probable results, the "chance" or the "probability" of obtaining a "sum equal to 2" is 1/36, while the probability of obtaining a "sum equal to 8" is 5/36, etc.
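These probabilities can be checked by enumerating the sample space, as in the following minimal Python sketch (not part of the original entry):

from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))      # the 36 equally probable results
def prob_sum(s):
    # probability that the sum of the two dice equals s
    return Fraction(sum(1 for a, b in omega if a + b == s), len(omega))

print(prob_sum(2), prob_sum(8))                   # 1/36 and 5/36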

Example of a Continuous Random Variable
Such random variables characterize measurements such as, for example, the duration of a telephone call, defined on the interval ]0, ∞[, or the wind direction, defined on the interval [0, 360[, etc.

FURTHER READING
• Density function
• Probability
• Random experiment
• Sample space
• Value

Randomization

Randomization is a procedure by which treatments are randomly assigned to experimental units. It is used to avoid subjective assignment of treatments. It makes it possible to eliminate systematic error caused by noncontrollable factors that persist even in repeated experiments, and it can be used to avoid any bias that might corrupt the data. Finally, randomization is necessary if we seek to estimate the experimental error.

HISTORY
In September 1919, Fisher, Ronald Aylmer began his career at the Rothamsted Experimental Station, where agricultural research was taking place. He started with historical data and then took into account data from agricultural trials, for which he developed the analysis of variance. These trials led him to develop experimental designs. He constructed highly efficient experimental designs using the principles of randomization, the Latin square, randomized blocks, and factorial arrangements. All these methods can be found in his work Statistical Methods for Research Workers (1925). His 1926 article on the arrangement of field experiments led to the book The Design of Experiments (1935).

FURTHER READING
• Completely randomized design
• Design of experiments
• Experiment
• Experimental unit
• Treatment

REFERENCES
Fisher, R.A.: Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh (1925)
Fisher, R.A.: The arrangement of field experiments. J. Ministry Agric. 33, 503–513 (1926)
Fisher, R.A.: The Design of Experiments. Oliver & Boyd, Edinburgh (1935)
Fisher, R.A.: Statistical Methods, Experimental Design and Scientific Inference. A re-issue of Statistical Methods for Research Workers, The Design of Experiments, and Statistical Methods and Scientific Inference. Bennett, J.H., Yates, F. (eds.) Oxford University Press, Oxford (1990)

Randomized Block Design

A randomized block design is an experimental design in which the experimental units are grouped into blocks. The treatments are randomly allocated to the experimental units inside each block. When all treatments appear at least once in each block, we have a complete randomized block design; otherwise, we have an incomplete randomized block design.
This kind of design is used to minimize the effects of systematic error. If the experimenter focuses exclusively on the differences between treatments, the effects due to variations between the different blocks should be eliminated.

HISTORY
See experimental design.

EXAMPLES
A farmer possesses five plots of land on which he wishes to cultivate corn. He wants to run an experiment since he has two varieties of corn and two types of fertilizer. Moreover, he knows that his plots are quite heterogeneous with respect to sunshine, and therefore a systematic error could arise if sunshine does indeed affect corn cultivation.
The farmer divides the land into five blocks and randomly assigns to each block the following four treatments:

A: corn 1 with fertilizer 1
B: corn 1 with fertilizer 2
C: corn 2 with fertilizer 1
D: corn 2 with fertilizer 2

He thereby eliminates the source of systematic error due to the sunshine factor. By randomizing the treatments inside each block, he favors neither a particular corn variety nor a particular type of fertilizer. The following table shows the random arrangement of the four treatments in the five blocks (a short sketch of how such an arrangement can be generated follows the table):

Block 1 D C A B

Block 2 C B A D

Block 3 A B C D

Block 4 A C B D

Block 5 D C A B
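A minimal Python sketch of how such a within-block randomization can be generated (the treatment labels follow the example; the output will differ from the table above because the assignment is random):

import random

treatments = ["A", "B", "C", "D"]
for block in range(1, 6):
    order = random.sample(treatments, k=len(treatments))   # random order of the four treatments
    print("Block", block, " ".join(order))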

An analysis of variance then permits one to draw conclusions, after block effects are removed, as to whether there exists a significant difference between the treatments.

FURTHER READING
• Analysis of variance
• Design of experiments

Range

The range is the easiest measure of dispersion to calculate. Consider a set of observations on a quantitative variable X. The range is defined as the difference between the highest observed value and the lowest observed value.

MATHEMATICAL ASPECTS
Consider a set of observations on a quantitative variable X. If we denote by Xmax the highest observed value and by Xmin the lowest, then the range is given by:

range = Xmax − Xmin .

When the observations are grouped into classes, the range is equal to the difference between the centers of the two extreme classes. Let δ1 be the center of the first class and δk the center of the last class. The range is then equal to:

range = δk − δ1 .

DOMAINS AND LIMITATIONS
The range is a measure of dispersion that is not used very often because it only provides an indication of the span of the values and gives no information on the distribution within this span. It nevertheless has the great advantage of being easy to calculate. However, one should be very careful with its interpretation.


The range is a measure of dispersion that takes into account only the two extreme values. It can therefore lead to wrong interpretations. For example, if the number of children in seven families is 0, 0, 1, 1, 2, 3, and 6, the range is 6, whereas all but one of the families have between 0 and 3 children; the variability is relatively low whereas the range is high.
Using the range to measure dispersion (or variability) presents another inconvenience: it can only increase when the number of observations increases. This is a disadvantage because ideally a measure of variability should be independent of the number of observations.
One must also exercise extreme caution when interpreting a range because there may be outliers, which have a direct influence on the value of the range.

EXAMPLES
Here are the ages of two groups of people taking an evening class:

Group 1: 33 34 34 37 40 40 40 43 44 45
Group 2: 22 25 31 40 40 41 44 45 50 52

The arithmetic mean x̄ of these two sets of observations is identical:

x̄ = 390/10 = 39 .

Even though the arithmetic mean of the ages of the two groups is the same, 39 years, the dispersion around the mean is very different. The range for the first group is equal to 12 (45 − 33), whereas the range for the second group is equal to 52 − 22 = 30 years. Notice that the range only gives an indicative measure of the variability of the values; it does not indicate the shape of the distribution within this span.

FURTHER READING
• Interquartile range
• Measure of dispersion

REFERENCES
Anderson, T.W., Sclove, S.L.: An Introduction to the Statistical Analysis of Data. Houghton Mifflin, Boston (1978)

Rank of a Matrix

The rank of a matrix is the maximum number of linearly independent rows or columns.

MATHEMATICAL ASPECTS
Let A be an m × n matrix. We define its rank, denoted by rA, as the maximum number of linearly independent rows or columns. In particular:

rA ≤ min (m, n) .

If rA = min(m, n), then we say that matrix A is of full rank. If the rank of a square matrix of order n equals n, then its determinant is nonzero; in other words, the matrix is invertible.

EXAMPLES
Let A be a 2 × 3 matrix:

A = [ −7  3  2 ]
    [  1  0  4 ] .

Its rank is less than or equal to min(2, 3) = 2. The number of linearly independent rows or columns equals 2; consequently, rA = 2, and A is of full rank.
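A minimal Python sketch using NumPy (not part of the original entry) confirms this rank:

import numpy as np

A = np.array([[-7, 3, 2],
              [ 1, 0, 4]])
print(np.linalg.matrix_rank(A))   # 2, so A is of full rank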

FURTHER READING
• Determinant
• Matrix


Rao, Calyampudi Radhakrishna

Rao, Calyampudi Radhakrishna was born in 1920 in Hadagali, Karnataka, India. He received his Ph.D. in 1948 at Cambridge University. Since 1989 he has received honorary doctorates from many universities around the world, notably the University of Neuchâtel, in Switzerland.
His contribution to the development of statistical theory and its applications is comparable to the works of Fisher, Ronald Aylmer, Neyman, Jerzy, and other important modern statisticians. Among his contributions, the most important are the Fisher–Rao theorem, the Rao–Blackwell theorem, the Cramér–Rao inequality, and Rao's U-test. He is the coauthor of 12 books and more than 250 scientific articles. His book Statistics and Truth: Putting Chance to Work deals with fundamental logic and statistical applications and has been translated into many languages.

Selected works and publications of Rao, Calyampudi Radhakrishna:

1973 Linear Statistical Inference and Its Applications, 2nd edn. Wiley, New York.
1989 (with Shanbhag, D.N.) Further extensions of the Choquet–Deny and Deny theorems with applications in characterization theory. Q. J. Math. 40, 333–350.
1992 (with Zhao, L.C.) Linear representation of R-estimates in linear models. Can. J. Stat. 20, 359–368.
1997 Statistics and Truth: Putting Chance to Work, 2nd edn. World Scientific, Singapore.
1999 (with Toutenburg, H.) Linear Models. Least Squares and Alternatives, 2nd edn. Springer Series in Statistics. Springer, Berlin Heidelberg New York.

FURTHER READING
• Gauss–Markov theorem
• Generalized inverse

Regression Analysis

Regression analysis is a technique that permits one to study and measure the relation between two or more variables. Starting from data recorded in a sample, regression analysis seeks to determine an estimate of a mathematical relation between two or more variables. The goal is to estimate the value of one variable as a function of one or more other variables. The estimated variable is called the dependent variable and is commonly denoted by Y. In contrast, the variables that explain the variations in Y are called independent variables, and they are denoted by X. When Y depends on only one X, we have simple regression analysis; when Y depends on more than one independent variable, we have multiple regression analysis. If the relation between the dependent and the independent variables is linear, then we have linear regression analysis.

HISTORY
The pioneer of linear regression analysis, Boscovich, Roger Joseph, an astronomer as well as a physicist, was one of the first to find a method for determining the coefficients of a regression line. To obtain a line passing as close as possible to all observations, two parameters should be determined in such a way that the two Boscovich, R.J. conditions are respected:
1. The sum of the deviations must equal zero.
2. The sum of the absolute values of the deviations must be minimal.
Boscovich, R.J. established these conditions around 1755–1757. He used a method of geometrical resolution that he applied when measuring the length of five of the earth's meridians. In 1789, de Laplace, Pierre Simon, in his work Sur les degrés mesurés des méridiens, et sur les longueurs observées sur pendule, adopted the two conditions imposed by Boscovich, R.J. and established an algebraic method for the resolution of Boscovich's algorithm.
Two other mathematicians, Gauss, Carl Friedrich and Legendre, Adrien Marie, seem to have discovered the least-squares method independently of each other. It seems that Gauss, C.F. had been using it since 1795, but it is Legendre, A.M. who published the least-squares method in 1805 in his work Nouvelles méthodes pour la détermination des orbites des comètes, where we find the appendix Sur la méthode des moindres carrés. A controversy about the true author of the discovery erupted between the two men in 1809, when Gauss, C.F. published his method citing references to his previous works. Note that in the same period, an American named Adrian, Robert, who had no knowledge of Gauss' and Legendre's work, was also working with least squares.
The 18th century marked the development of the least-squares method. According to Stigler, S. (1986), this development is

strongly associated with three problems from that period. The first concerned the mathematical representation and determination of the movements of the moon. The second was related to explaining the secular inequality observed in the movements of Jupiter and Saturn. Finally, the third concerned the determination of the shape of the earth.
According to the same author, the origin of the term linear regression is found in the works of Galton, Francis. His research focused on heredity: the notion of regression arose from an analysis of heredity among family members.
In 1875, Galton, F. conducted an experiment with sweet pea seeds distributed into seven groups by weight. For each group, he took ten seeds and asked seven friends to cultivate them; he thus cultivated 490 seeds. He discovered that the seeds of each weight group followed a normal distribution and that the curves, centered on different weights, were dispersed in an equally normal way; the variability of the different groups was identical. Galton, F. talked about "reversion" (in biology, "reversion" refers to a return to a primitive type). What Galton, F. had witnessed was a linear reversion, since the seven seed groups were normally distributed around a value close to the overall population mean rather than around their parents' weight.
Human heredity was a research topic for Galton, F. as well, who was trying to define the relation between the height of parents and that of their offspring. In 1889, he published his work under the title Natural Inheritance. In this field, from 1885 he used the term regression, and in his last works he arrived at the concept of correlation.


MATHEMATICAL ASPECTS
See LAD regression, multiple linear regression, and simple linear regression.

DOMAINS AND LIMITATIONS
The goal of regression analysis is not only to determine the relation between the dependent variable and the independent variable(s). Regression analysis also seeks to establish the reliability of the estimates and consequently the reliability of the predictions obtained from them. Regression analysis furthermore allows one to examine whether the results are statistically significant and whether the relation between the variables is real or only apparent.
Regression analysis has applications in all scientific fields. In fact, once we have determined a relation among variables, we can predict future values, provided that the conditions remain unchanged and bearing in mind that there is always an error. The validity of a regression analysis is commonly checked by an analysis of the residuals.

EXAMPLES
See linear regression (multiple or simple), LAD regression, and ridge regression.

FURTHER READING
• Analysis of residuals
• Analysis of variance
• Coefficient of determination
• Hat matrix
• Least squares
• Leverage point
• Multiple linear regression
• Normal equations
• Residual
• Simple linear regression

REFERENCES
Eisenhart, C.: Boscovich and the Combination of Observations. In: Kendall, M., Plackett, R.L. (eds.) Studies in the History of Statistics and Probability, vol. II. Griffin, London (1977)
Galton, F.: Natural Inheritance. Macmillan, London (1889)
Gauss, C.F.: Theoria Motus Corporum Coelestium. Werke, 7 (1809)
Laplace, P.S. de: Sur les degrés mesurés des méridiens, et sur les longueurs observées sur pendule. Histoire de l'Académie royale des inscriptions et belles lettres, avec les Mémoires de littérature tirées des registres de cette académie. Paris (1789)
Legendre, A.M.: Nouvelles méthodes pour la détermination des orbites des comètes. Courcier, Paris (1805)
Plackett, R.L.: The discovery of the method of least squares. In: Kendall, M., Plackett, R.L. (eds.) Studies in the History of Statistics and Probability, vol. II. Griffin, London (1977)
Stigler, S.: The History of Statistics, the Measurement of Uncertainty Before 1900. Belknap, London (1986)

Rejection Region

The rejection region is the interval, measured on the sampling distribution of the statistic under study, that leads to rejection of the null hypothesis H0 in a hypothesis test. The rejection region is complementary to the acceptance region and is associated with a probability α, called the significance level of the test or type I error.


MATHEMATICAL ASPECTS
Consider a hypothesis test on a parameter θ. We use a statistic T to estimate this parameter.

Case 1: One-sided Test (Right Tail)
The hypotheses are the following:

H0 : θ = θ0
H1 : θ > θ0 ,

where θ0 is the preassigned value for parameter θ. In a one-sided test on the right tail, the rejection region corresponds to the interval to the right of the critical value:

rejection region = [critical value; ∞[ ,
acceptance region = ]−∞; critical value[ ,

given that the critical value equals

μT + zα · σT ,

where μT is the mean of the sampling distribution of the statistic T, σT is the standard error of the statistic T, and α is the significance level of the test. The value zα is obtained from the statistical table of the distribution of T. If the value of T computed in the sample lies in the rejection region, that is, if T is greater than or equal to the critical value, the null hypothesis H0 must be rejected in favor of the alternative hypothesis H1. Otherwise, the null hypothesis H0 cannot be rejected.

Case 2: One-sided Test (Left Tail)
The hypotheses are as follows:

H0 : θ = θ0
H1 : θ < θ0 .

In a one-sided test on the left tail, the rejection region is the interval to the left of the critical value:

rejection region = ]−∞; critical value] ,

and the acceptance region is:

acceptance region = ]critical value; ∞[ ,

provided the critical value equals

μT − zα · σT .

If the value of T computed in the sample lies in the rejection region, that is, if T is less than or equal to the critical value, the null hypothesis H0 must be rejected in favor of the alternative hypothesis H1. Otherwise, the null hypothesis H0 cannot be rejected.

Case 3: Two-sided Test
The hypotheses are as follows:

H0 : θ = θ0
H1 : θ ≠ θ0 .

In a two-sided test the rejection region is divided into two intervals:

rejection region = ℝ \ ]lower critical value; upper critical value[ ,

and the acceptance region is:

acceptance region = ]lower critical value; upper critical value[ ,

where the critical values are equal to:

μT − zα/2 · σT   and   μT + zα/2 · σT .


If the value of T computed in the sample lies in the rejection region, that is, if T is greater than or equal to the upper critical value or if T is less than or equal to the lower critical value, the null hypothesis H0 must be rejected in favor of the alternative hypothesis H1. Otherwise, the null hypothesis H0 cannot be rejected.
The following figure illustrates the acceptance and rejection regions for the cases mentioned above.

DOMAINS AND LIMITATIONS
In a one-sided test, the rejection region is a single interval located:
• on the right side of the sampling distribution for a one-sided test on the right tail;
• on the left side of the sampling distribution for a one-sided test on the left tail.
In contrast, in a two-sided test the rejection region is divided into two parts, found at the extreme left and right of the sampling distribution. With each of these regions is associated a probability α/2, with α being the significance level of the test.

EXAMPLES
A company produces steel cables. Based on a sample of n = 100 units, it wants to verify whether the mean diameter of the cables is 0.9 cm. The standard deviation of the population is known to be equal to 0.05 cm. The hypothesis test in this case is a two-sided test. The hypotheses are as follows:

Null hypothesis        H0 : μ = 0.9
Alternative hypothesis H1 : μ ≠ 0.9

For a significance level set to α = 5%, the value zα/2 from the normal table equals 1.96. The critical values are then:

μx̄ ± zα/2 · σx̄ ,

where μx̄ is the mean of the sampling distribution of the means:

μx̄ = μ = 0.9 ,

and σx̄ is the standard error of the mean:

σx̄ = σ/√n = 0.05/√100 = 0.005 ,

so that

μx̄ ± zα/2 · σx̄ = 0.9 ± 1.96 · 0.005 = 0.9 ± 0.0098 .

The interval ]0.8902; 0.9098[ corresponds to the acceptance region of the null hypothesis. The rejection region is the complementary interval and is divided into two parts:

]−∞; 0.8902] and [0.9098;∞[ .

If the observed sample mean lies in this rejection region, the null hypothesis H0 must be rejected in favor of the alternative hypothesis H1.
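The critical values of this example can be reproduced with the following minimal Python sketch, assuming the SciPy library (not part of the original entry):

from scipy.stats import norm

mu0, sigma, n, alpha = 0.9, 0.05, 100, 0.05
se = sigma / n ** 0.5                          # standard error of the mean, 0.005
z = norm.ppf(1 - alpha / 2)                    # 1.96
lower, upper = mu0 - z * se, mu0 + z * se      # 0.8902 and 0.9098
print(lower, upper)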

FURTHER READING
• Acceptance region
• Critical value
• Hypothesis testing
• Significance level

REFERENCES
Bickel, R.D.: Selecting an optimal rejection region for multiple testing. Working paper, Office of Biostatistics and Bioinformation, Medical College of Georgia (2002)


Relation

The notion of relation expresses the connection that exists between two random variables. It is one of the most important notions in statistics and appears in concepts such as classification, correlation, dependence, regression analysis, time series, etc.

FURTHER READING
• Classification
• Correlation coefficient
• Dependence
• Model
• Regression analysis
• Time series

Relative Risk

The relative risk is defined as the ratio of the risk among individuals exposed to a factor to the risk among individuals not exposed to it. It measures the relative effect of a certain cause, that is, the degree of association between a disease and its cause.

HISTORY
See risk.

MATHEMATICAL ASPECTS
The relative risk is a probability ratio that is estimated by:

relative risk = (risk of the exposed) / (risk of the nonexposed)
              = (incidence rate for the exposed) / (incidence rate for the nonexposed) .

Note that the relative risk is a measure without a unit since it is defined as the ratio of two quantities measured in the same unit.

DOMAINS AND LIMITATIONS
The relative risk expresses the risk of the units exposed to a factor as a multiple of the risk of the units not exposed to this factor:

risk of exposed units = risk of nonexposed units · relative risk .

Suppose we want to study the relation between a risk factor and a disease. When there is no association, the relative risk should theoretically be equal to 1. If exposure to the risk factor increases the number of cases of the disease, then the relative risk becomes larger than 1. If, on the other hand, exposure to the factor decreases the number of cases of the disease, then the relative risk lies between 0 and 1.
It is easy to compute a P = (1 − α)% confidence interval for the relative risk. We consider here the following general case. The confidence interval is computed on the logarithmic scale and then transformed back to the arithmetic scale. The lowercase letters refer to the following table:

             Disease    Nondisease    Total

Exposed         a         n − a         n
Nonexposed      c         m − c         m

The P = (1 − α)% confidence interval for the relative risk, RR, is then given by:

exp{ ln(RR) ± zα · √(1/a − 1/n + 1/c − 1/m) } ,

where "exp" denotes the exponential of the expression in braces and zα is the corresponding value from the normal table. It is assumed that a, n, c, and m are large enough. If the confidence interval for the relative risk includes the value 1.0, then we conclude that the exposure factor does not play any statistically significant role in the probability of having the disease.
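A minimal Python sketch of this confidence interval (the counts a, n, c, and m are invented for illustration and are not part of the original entry):

from math import exp, log, sqrt

a, n = 30, 1000      # cases among the exposed / number of exposed (illustrative)
c, m = 15, 1200      # cases among the nonexposed / number of nonexposed (illustrative)
rr = (a / n) / (c / m)                                # estimated relative risk
z = 1.96                                              # normal value for a 95% confidence interval
half_width = z * sqrt(1 / a - 1 / n + 1 / c - 1 / m)
ci = (exp(log(rr) - half_width), exp(log(rr) + half_width))
print(rr, ci)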


EXAMPLES
We use here example 1 from the entry on attributable risk. The relative risk of breast cancer associated with exposure to smoke corresponds to the risk (or incidence rate) for each exposure level divided by the risk (or incidence rate) for the nonexposed:

Group              Incidence rate               Relative risk
                   (per 100000 per year) (A)    (A/57.0)

Nonexposed          57.0                         1.0
Passive smokers    126.2                         2.2
Active smokers     138.1                         2.4

The relative risk is 2.2 for passive smokers and 2.4 for active smokers. Compared to the risk of the nonexposed, the risk for active smokers and passive smokers is thus 2.4 and 2.2 times larger, respectively. This is a modest association, which underlines the fact that breast cancer also depends on factors other than smoking. The relative risk thus provides only a partial description of the relation between smoking and breast cancer: by itself it does not allow one to draw any firm conclusions, nor to conclude that smoking is responsible for 50% of breast cancer cases in the population.

FURTHER READING
• Attributable risk
• Avoidable risk
• Cause and effect in epidemiology
• Incidence rate
• Odds and odds ratio
• Prevalence rate
• Risk

REFERENCES
Cornfield, J.: A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. J. Natl. Cancer Inst. 11, 1269–1275 (1951)
Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)
MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)
Morabia, A.: Epidémiologie Causale. Editions Médecine et Hygiène, Geneva (1996)
Morabia, A.: L'Épidémiologie Clinique. Editions "Que sais-je?". Presses Universitaires de France, Paris (1996)

Resampling

While using a sample to estimate a certain parameter of the population from which the sample is drawn, we can use the same sample more than once to improve our statistical analysis and inference. This is done by sampling inside the initial sample, which is the idea behind resampling methods. Using resampling we may compute the bias or the standard deviation of an estimated parameter, or we can draw conclusions by constructing confidence intervals and testing hypotheses for the estimated parameter. Among the techniques that use resampling, the bootstrap, the jackknife, and cross validation can be mentioned, for example. They are of special interest in nonparametric settings because:
• the exact distribution of the data is not known, and
• the data may arise from a nonlinear design, which is very difficult to model.
On the other hand, a problem related to such procedures arises when the sample does not represent the population well, in which case we risk making completely false inferences.
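As a minimal illustration of the idea, the following Python sketch (assuming NumPy, with invented data) uses the bootstrap to estimate the standard error of the sample mean by resampling the initial sample with replacement:

import numpy as np

rng = np.random.default_rng(0)
sample = np.array([4.1, 5.3, 2.8, 6.0, 4.7, 5.5, 3.9, 4.4])   # illustrative sample
boot_means = [rng.choice(sample, size=len(sample), replace=True).mean()
              for _ in range(2000)]                            # 2000 bootstrap resamples
print(np.std(boot_means))   # bootstrap estimate of the standard error of the mean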

HISTORY
The jackknife method was introduced in the 1950s as a way to reduce the bias of an estimate. In the 1960s and 1970s, resampling was widely used in survey sampling as a way of estimating the variance. The bootstrap method was invented by Efron, B. in the late 1970s and has since been developed by numerous researchers in different fields.

MATHEMATICAL ASPECTS
See bootstrap, jackknife.

FURTHER READING
• Bootstrap
• Jackknife method
• Monte Carlo method
• Randomization

REFERENCES
Edington, E.S.: Randomisation Tests. Marcel Dekker, New York (1980)
Efron, B.: The jackknife, the bootstrap, and other resampling plans. CBMS Monograph No. 38. SIAM, Philadelphia (1982)
Fisher, R.A.: The Design of Experiments. Oliver & Boyd, Edinburgh (1935)
Pitman, E.J.G.: Significance tests which may be applied to samples from any populations. Supplement, J. Roy. Stat. Soc. 4, 119–130 (1937)

Residual

Residuals are defined as the differences between the observed values and the estimated (fitted) values of a regression model. They represent what is not explained by the regression equation.

HISTORY
See error.

MATHEMATICAL ASPECTS
The residuals ei are given by:

ei = Yi − Ŷi ,   i = 1, . . . , n ,

where Yi denotes an observation and Ŷi its fitted value, obtained by fitting a regression model. The following graph illustrates the residuals in a simple linear regression.

EXAMPLES
See residual analysis.

FURTHER READING
• Analysis of residuals
• Error
• Hat matrix
• Leverage point
• Multiple linear regression
• Regression analysis
• Simple linear regression


Rice, Stuart Arthur

Rice, Stuart Arthur (1889–1969) studied at the University of Washington (obtaining his bachelor of science degree in 1912 and his master's degree in 1915). His academic work was devoted to the development of methodology in the social sciences.
In the early 1930s, Rice, S.A. joined the administration of Franklin D. Roosevelt. He was assistant director of the census office from 1933 until the early 1950s. His major contribution to statistics consists in the modernization of the use of statistics in government. He promoted the development of COGSIS (Committee on Government Statistics and Information Services) and worked toward adapting modern techniques of sampling as well as mathematical statistics in federal agencies. In 1955, he created Stuart A. Rice Associates (later Surveys & Research Corporation), a statistical consulting agency for the public and private sectors. He retired in the 1960s.

FURTHER READING
• Census
• Official statistics

Ridge Regression

The main goal of the ridge method is to minimize the mean squared error of the estimates, that is, to strike a compromise between bias and variance. Considering collinearity as a problem of quasisingularity of W′W, where W denotes the matrix of the explanatory variables X suitably transformed (centered and/or scaled), ridge regression modifies the matrix W′W in order to remove the singularities.

HISTORY
Ridge regression was introduced in 1962 by Hoerl, A.E., who stated that the existence of correlation between explanatory variables can cause errors in estimation when the least-squares method is applied. As an alternative, he developed ridge regression, which computes estimates that are biased but of lower variance than the least-squares estimates. While the least-squares method provides unbiased estimates, the ridge method minimizes the mean squared error of the estimates.

MATHEMATICAL ASPECTS
Let us consider the multiple linear regression model:

Y = Xβ + ε ,

where Y = (y1, . . . , yn)′ is the (n × 1) vector of the observations of the dependent variable (n observations), β = (β0, . . . , βp−1)′ is the (p × 1) parameter vector to be estimated, ε = (ε1, . . . , εn)′ is the (n × 1) error vector, and

X = [ 1  X11  . . .  X1(p−1) ]
    [ .   .           .      ]
    [ 1  Xn1  . . .  Xn(p−1) ]

is the (n × p) matrix of the independent variables. The least-squares estimate of β is given by:

β̂ = (β̂0, . . . , β̂p−1)′ = (X′X)⁻¹ X′Y .

Consider the standardized data matrix:

W = [ Xs11  Xs12  · · ·  Xs1(p−1) ]
    [ Xs21  Xs22  · · ·  Xs2(p−1) ]
    [  .     .    . . .    .      ]
    [ Xsn1  Xsn2  · · ·  Xsn(p−1) ] .


Hence the model becomes:

Y = μ + Wγ + ε ,

where

μ = (γ_0, . . . , γ_0)′ = (β^s_0, . . . , β^s_0)′   and
γ = (γ_1, . . . , γ_(p−1))′ = (β^s_1, . . . , β^s_(p−1))′ .

The least-squares estimates are given by γ̂_0 = β̂^s_0 = Ȳ and by:

γ̂ = (γ̂_1, . . . , γ̂_(p−1))′ = (β̂^s_1, . . . , β̂^s_(p−1))′ = (W′W)⁻¹W′Y .

These estimates are unbiased, that is:

E(β̂_j) = β_j ,   j = 0, . . . , p − 1 ,
E(γ̂_j) = γ_j ,   j = 1, . . . , p − 1 .

Among all possible linear unbiased estimates, they are the estimates with minimal variance. We know that

Var(β̂) = σ²(X′X)⁻¹ ,
Var(γ̂) = σ²(W′W)⁻¹ ,

where σ² is the error variance. Note that W′W is equal to (n − 1) times the correlation matrix of the independent variables. Consequently, in the most favorable case no collinearity arises, since W′W is then a multiple of I. In order to have a matrix W′W as close as possible to this favorable case, we replace W′W by W′W + kI, where k is a positive scalar.
Ridge regression replaces the least-squares estimates

γ̂ = (W′W)⁻¹W′Y

by the ridge estimates

γ̂_R = (W′W + kI)⁻¹W′Y = (γ̂_R1, . . . , γ̂_R(p−1))′ .

There always exists a value of k for which the total mean squared error of the ridge estimates is smaller than the total mean squared error of the least-squares estimates. In 1975, Hoerl et al. suggested that k = (p − 1)σ²/(γ′γ) gives a lower total mean squared error, where (p − 1) is the number of explanatory variables. To compute k, we estimate σ and γ by s and γ̂. Then γ is reestimated by γ̂_R, using ridge regression with k = (p − 1)s²/(γ̂′γ̂). Note, though, that the k that has been used is a random variable, since it depends on the data.
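A minimal sketch of these formulas in Python follows; the function name, the use of NumPy, and the standardization by the sample standard deviation are illustrative choices, not part of the original text:

```python
import numpy as np

def ridge_estimates(X, y):
    """Least-squares and ridge estimates on standardized data, with the
    adaptive choice k = (p-1) * s^2 / (gamma_hat' gamma_hat) described above."""
    n, q = X.shape                       # q = p - 1 explanatory variables
    # Standardized data matrix W (columns centered and scaled)
    W = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    y_c = y - y.mean()

    # Least-squares estimates: gamma_hat = (W'W)^{-1} W'Y
    gamma_hat = np.linalg.solve(W.T @ W, W.T @ y_c)
    rss = np.sum((y_c - W @ gamma_hat) ** 2)
    s2 = rss / (n - q - 1)               # estimate of the error variance sigma^2

    # Adaptive k, then the ridge estimates: gamma_R = (W'W + kI)^{-1} W'Y
    k = q * s2 / (gamma_hat @ gamma_hat)
    gamma_R = np.linalg.solve(W.T @ W + k * np.eye(q), W.T @ y_c)
    return gamma_hat, gamma_R, k
```

Applied to the 13-sample cement data of the example below, such a computation should reproduce, up to rounding, the adaptive value k ≈ 0.157 obtained there.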

DOMAINS AND LIMITATIONS
Properties of Ridge Estimates
Ridge estimates are biased, since the expected values E(γ̂_Rj) of the estimates γ̂_Rj are the components of the vector:

E(γ̂_R) = (W′W + kI)⁻¹W′Wγ ≠ γ ,   for all k > 0 .

Note that the value k = 0 corresponds to the least-squares estimates. Ridge estimates trade some bias in order to reduce the variance of the estimates. We have:

Var(γ̂_R) = σ²V_R ,

with:

V_R = (W′W + kI)⁻¹W′W(W′W + kI)⁻¹ .

Note that the diagonal elements of the matrix V_R are smaller than the corresponding diagonal elements of (W′W)⁻¹; ridge estimates are therefore less variable than least-squares estimates. Is their bias acceptable?


To measure the compromise between the bias and the variance of an estimate, we generally compute its mean squared error, denoted MSE. For an estimate θ̂ of θ it is defined as:

MSE(θ̂) = E((θ̂ − θ)²) = Var(θ̂) + (E(θ̂) − θ)² .

Equivalently, we define the total mean squared error of an estimated vector as the sum of the mean squared errors of its components. We have:

TMSE(γ̂_R) = Σ_{j=1}^{p−1} MSE(γ̂_Rj)
           = Σ_{j=1}^{p−1} E((γ̂_Rj − γ_j)²)
           = Σ_{j=1}^{p−1} [ Var(γ̂_Rj) + (E(γ̂_Rj) − γ_j)² ]
           = σ² Trace(V_R) + Σ_{j=1}^{p−1} (E(γ̂_Rj) − γ_j)² .

The method with k estimated from the data is called adaptive ridge regression. The properties of the ridge estimates hold only when k is a constant, which is not the case for adaptive ridge regression. Nevertheless, the properties of the ridge estimates suggest that adaptive ridge estimates are good estimates. In fact, statistical studies have revealed that for many data sets adaptive ridge estimation is better, in terms of mean squared error, than least-squares estimation.

EXAMPLES
In the following example, 13 samples of cement were set. For each one, the percentages of four chemical ingredients were measured and are tabulated below. The amount of heat emitted was also measured. The goal was to determine how the quantities Xi1, Xi2, Xi3, and Xi4 affect Yi, the amount of heat emitted.

Table: Heat emitted (Xi1–Xi4: ingredients 1–4; Yi: heat)

Sample i   Xi1   Xi2   Xi3   Xi4     Yi
   1         7    26     6    60    78.5
   2         1    29    15    52    74.3
   3        11    56     8    20   104.3
   4        11    31     8    47    87.6
   5         7    52     6    33    95.9
   6        11    55     9    22   109.2
   7         3    71    17     6   102.7
   8         1    31    22    44    72.5
   9         2    54    18    22    93.1
  10        21    47     4    26   115.9
  11         1    40    23    34    83.9
  12        11    66     9    12   113.3
  13        10    68     8    12   109.4

where Yi is the heat evolved by sample i, Xi1 is the quantity of ingredient 1 in sample i, Xi2 the quantity of ingredient 2, Xi3 the quantity of ingredient 3, and Xi4 the quantity of ingredient 4 in sample i. We initially carry out a multiple linear regression. The linear regression model is:

Yi = β0 + β1·Xi1 + β2·Xi2 + β3·Xi3 + β4·Xi4 + εi .

We obtain the following results:


Variable    Coefficient   Std. dev.    tc
Constant      62.4100      70.0700    0.89
X1             1.5511       0.7448    2.08
X2             0.5102       0.7238    0.70
X3             0.1019       0.7547    0.14
X4            −0.1441       0.7091   −0.20

The estimated vector obtained from the standardized data is:

γ̂ = (9.12; 7.94; 0.65; −2.41)′ ,

with standard deviations of 4.381, 11.26, 4.834, and 11.87, respectively. The estimate s of σ is s = 2.446. We then carry out a ridge analysis using:

k = (p − 1)s² / (γ̂′γ̂)
  = 4 · (2.446)² / ((9.12)² + (7.94)² + (0.65)² + (2.41)²)
  = 0.157 .

This is an adaptive ridge regression, since k depends on the data; we nevertheless use the term ridge regression. We obtain:

γ̂_R = (7.64; 4.67; −0.91; −5.84)′ ,

with standard deviations of 1.189, 1.377, 1.169, and 1.391, respectively. We have the estimated values:

Ŷi = 95.4 + 7.64·X^s_i1 + 4.67·X^s_i2 − 0.91·X^s_i3 − 5.84·X^s_i4 .

The estimated value of the constant does not differ from the least-squares estimate.

FURTHER READING
• Collinearity
• Mean squared error
• Simulation
• Standardized data

REFERENCES
Hoerl, A.E.: Application of ridge analysis to regression problems. Chem. Eng. Prog. 58(3), 54–59 (1962)
Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970)
Hoerl, A.E., Kennard, R.W., Baldwin, K.F.: Ridge regression: some simulations. Commun. Stat. 4, 105–123 (1975)

Risk

In epidemiology, risk corresponds to the probability that a disease appears in a population during a given time interval. The risk is equally defined for either the population or a homogeneous subgroup of individuals. Homogeneity is used here in the sense that the individuals are exposed to a certain factor suspected of causing the disease being studied. The notion of risk is strongly related to that of incidence (the incidence corresponding to the frequency of a disease, the risk to its probability).

HISTORY
In 1951, Cornfield, Jerome clearly stated the notion of risk for the first time. It resulted from a follow-up study in which risk factors were compared for two groups, one consisting of sick individuals and the other of healthy individuals. Since then, risk has been widely used in epidemiologic studies as a measure of association between a disease and the related risk factors.


MATHEMATICAL ASPECTS
The risk is the probability of the disease appearing, in a fixed time period, in a group of individuals exposed to the risk factor under study. We estimate the risk by the ratio:

risk = (incident cases in the exposed population) / (population at risk)

during the given time period. The numerator corresponds to the number of new cases of the disease during the fixed time period in the studied population, while the denominator corresponds to the number of persons at risk in the same time period.

DOMAINS AND LIMITATIONS
The notion of risk is intuitively a probability measure, being defined for a given population and a determined time period. Insurance companies use it to predict the frequency of events such as death, disease, etc. in a defined population and according to factors such as sex, age, weight, etc. Risk can be interpreted as the probability for an individual to contract a disease, given that he has been exposed to a certain risk factor. We attribute to the individual the mean value of the risk of the individuals of his group. Insurance companies use the risk of each population to determine the contribution of each insured individual. Doctors use risk to estimate the probability that one of their patients develops a certain disease, has a complication, or recovers from the disease.
It is important to note that the notion of risk is always defined for a certain time period: the risk in 2, 3, or 10 years.
The minimal risk is 0%, the maximal risk 100%.

EXAMPLES
Suppose that on 1 January 1992 a group of 1000 60-year-old men is selected, and 10 cases of heart attack are observed during the year. The heart attack risk for this population of 60-year-old men equals 10/1000 = 1% in a year.
We consider now cases of breast cancer over the course of 2 years in a population of 150000 women, distributed into three groups depending on the level of exposure to the factor “smoke”. The following contingency table illustrates this example:

Group              Number of cases (incidents)   Number of individuals
Nonexposed                     40                        35100
Passive smokers               140                        55500
Active smokers                164                        59400
Total                         344                       150000

The risk for each group is computed as the ratio of the number of cases divided by the total number of individuals in the group. For example, the risk for the passive smokers is 140/55500 = 0.002523 = 252.3/100000 over 2 years. We have the following table:

Group              Number of cases   Number of      Risk in two years
                   (incidents)       individuals    (per 100000)
Nonexposed               40              35100           114.0
Passive smokers         140              55500           252.3
Active smokers          164              59400           276.1
Total                   344             150000           229.3
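A short sketch of this computation, with the counts taken from the table above:

```python
# (incident cases, individuals at risk) for each exposure group
groups = {
    "Nonexposed":      (40,  35100),
    "Passive smokers": (140, 55500),
    "Active smokers":  (164, 59400),
}

for name, (cases, at_risk) in groups.items():
    risk = cases / at_risk                          # risk over the 2-year period
    print(f"{name}: {100000 * risk:.1f} per 100000 in 2 years")
```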

The notion of risk implies the notion of prediction: among the women of a given population, what proportion of them will develop breast cancer in the next 2 years? From the table above we note that 229.3 per 100000 may develop this disease in the next 2 years.

FURTHER READING
• Attributable risk
• Avoidable risk
• Cause and effect in epidemiology
• Incidence rate
• Odds and odds ratio
• Prevalence rate
• Relative risk

REFERENCES
Cornfield, J.: A method of estimating comparative rates from clinical data. Applications to cancer of the lung, breast, and cervix. J. Natl. Cancer Inst. 11, 1269–1275 (1951)
Lilienfeld, A.M., Lilienfeld, D.E.: Foundations of Epidemiology, 2nd edn. Clarendon, Oxford (1980)
MacMahon, B., Pugh, T.F.: Epidemiology: Principles and Methods. Little Brown, Boston, MA (1970)
Morabia, A.: Epidémiologie Causale. Editions Médecine et Hygiène, Geneva (1996)
Morabia, A.: L'Épidémiologie Clinique. Editions “Que sais-je?”. Presses Universitaires de France, Paris (1996)

Robust Estimation

An estimation is called robust if the estimators used are not sensitive to outliers and to small departures from the idealized (model) assumptions. These estimation methods include maximum-likelihood-type estimation, known as M-estimates, linear combinations of order statistics, known as L-estimates, and methods based on statistical ranks, R-estimates. Outliers are frequently errors resulting from a bad reading, a bad recording, or some other cause related to the experimental environment. The arithmetic mean and the other least-squares estimators are very sensitive to such outliers. That is why it is preferable in these cases to call upon estimators that are more robust.

HISTORY
The term robust, as well as the field to which it is related, was introduced in 1953 by Box, G.E.P. It was recognized as a field of statistics only in the mid-1960s. Nevertheless, it is important to note that the concept is not new; in the late 19th century, several scientists already had a clear idea of it. In fact, the first mathematical work on robust estimation apparently dates back to 1818, with Laplace, P.S. and his work Le deuxième supplément à la théorie analytique des probabilités, in which the distribution of the median can be found.
It is from 1964, with the article of Huber, P.J., that this field became a separate field of statistics known as robust statistics.

MATHEMATICAL ASPECTS
Consider the following model:

Y = X · β + ε ,

where Y is the (n × 1) vector of observations of the dependent variable (n observations), β is the (p × 1) vector of parameters to be estimated, and ε is the (n × 1) vector of errors.
There exist several methods of robust estimation that allow one to estimate the vector of parameters β.


Some of the methods are:
• L1 estimation:

  Minimize Σ_{i=1}^{n} |εi| .

• In a more general way, the concept of M-estimators, based on the idea of replacing the square of the errors εi² (least-squares method) by a function δ(εi), where δ is a symmetric function having a unique minimum at zero:

  Minimize Σ_{i=1}^{n} δ(εi) .

  In 1964, Huber, P.J. proposed the following function:

  δ(x) = x²/2              if |x| ≤ k ,
         k·|x| − k²/2      if |x| > k ,

  where k is a positive constant. Notice that for δ(x) = |x| the L1 estimators are obtained.
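A minimal sketch of an M-estimate of a location parameter using Huber's function, computed by iteratively reweighted averaging; the data, the constant k, and the assumption of known (unit) scale are illustrative choices, since a full treatment would also estimate the scale:

```python
import numpy as np

def huber_location(x, k=1.5, tol=1e-8, max_iter=100):
    """M-estimate of a location parameter with Huber's delta function."""
    mu = np.median(x)                                   # robust starting value
    for _ in range(max_iter):
        r = x - mu
        # Weights implied by Huber's function: 1 on [-k, k], k/|r| outside
        w = np.minimum(1.0, k / np.maximum(np.abs(r), 1e-12))
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# One large outlier barely moves the Huber estimate,
# while it pulls the arithmetic mean upward.
sample = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 25.0])
print(huber_location(sample), sample.mean())
```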

FURTHER READING
• Estimation
• Estimator
• L1 estimation
• Maximum likelihood
• Outlier

REFERENCES
Box, G.E.P.: Non-normality and tests on variance. Biometrika 40, 318–335 (1953)
Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35, 73–101 (1964)
Huber, P.J.: Robust Statistics. Wiley, New York (1981)
Huber, P.J.: Robustness: Where are we now? Student 1(2), 75–86 (1995)
Laplace, P.S. de: Deuxième supplément à la théorie analytique des probabilités. Courcier, Paris (1818). pp. 531–580 in Œuvres complètes de Laplace, vol. 7. Gauthier-Villars, Paris (1886)


Sample

A sample is a subset of a population on which statistical studies are made in order to draw conclusions relative to the population.

EXAMPLES
A housewife is preparing a soup. To determine if the soup has enough salt, she tastes a spoonful. Based on what she observes in the sample, represented here by the spoonful, she can decide whether it is necessary to add salt to the soup, which symbolizes the studied population.

FURTHER READING
• Chebyshev's inequality
• Estimation
• Population
• Sample size
• Sampling

Sample Size

The sample size is the number of individuals or elements belonging to a sample.

HISTORY
Nordin, J.A. (1944) treats the problem of the estimation of the sample size within the framework of decision theory, through an example relating to the potential sales in a market where a seller wants to have a presence. Cornfield, J. (1951) illustrates in a precise manner the estimation of the sample size needed to form classes with known proportions. Sittig, J. (1951) discusses the choice of the sample size for a typical economic problem, taking into account the inspection and the cost incurred for defective pieces in an accepted lot and for correct pieces in a rejected lot. Numerous studies were also conducted by Cox, D.R. (1952), who was inspired by the works of Stein, C. (1945), to determine sample size in an ideal way.

MATHEMATICAL ASPECTS
Let X1, . . . , Xn be a random sample taken without replacement from a population of N observations having an unknown mean μ. We suppose that the sampling distribution of the statistic follows a normal distribution. The Chebyshev inequality depends on the number n of observations, and it allows one to estimate the mean μ from the empirical mean X̄n of the obtained values. When N is large enough, the size n of the sample can be determined using the Chebyshev inequality so that the probability that X̄n falls in the interval [μ − ε, μ + ε] (ε > 0 fixed at the required precision) is as great as we want. If we call α the significance level, we get:

P(|x̄ − μ| ≥ ε) = α   or   P(|x̄ − μ| < ε) = 1 − α ,

which we can also write as:

P( −zα/2 ≤ (x̄ − μ)/σx̄ ≤ zα/2 ) = 1 − α ,

where σx̄, the standard error of the mean, is estimated by σ/√n, σ being the standard deviation of the population, and zα/2 is the standard critical value associated with the confidence level 1 − α. We construct the confidence interval with the help of the normal table for the confidence level 1 − α:

−zα/2 · σ/√n ≤ x̄ − μ ≤ zα/2 · σ/√n .

As the normal distribution is symmetric, we are only interested in the inequality on the right. It indicates that zα/2 · σ/√n is the greatest value that x̄ − μ can take. This limit is also given by the desired precision ε. Hence:

ε = zα/2 · σ/√n ,

that is:

n = (zα/2 · σ/ε)² .

If the desired precision is expressed as a percentage k of the mean μ, that is ε = k · μ, we then have:

n = (zα/2 · σ/(k · μ))² ,

where σ/μ is called the coefficient of variation. It measures the relative dispersion of the studied variable. In cases where the sampling distribution is not normally distributed, these formulas are approximately valid when the sample size n is large enough (n ≥ 30).

DOMAINS AND LIMITATIONS
The determination of the sample size is an important step in organizing a statistical inquiry. It involves not only complex calculations but also taking into account the different requirements of the inquiry itself. Thus the desired precision of the inquiry results must be considered: generally, the greater the sample size, the greater the precision. The cost of the inquiry constitutes another important constraint; the smaller the budget, the more restricted the sample size. Moreover, the availability of other resources, such as the existence of data coming from a census or of a system of inquirers, the duration of the inquiry, as well as the territory to be covered, can have a dramatic impact on the structure of the sample.
Census theory places great importance on samples of “optimal” dimension, especially when optimizing (maximizing or minimizing) an objective function in which the principles are formulated mathematically. Moreover, it is important to underline that generally the optimal sample dimension does not depend (or depends only slightly) on the size of the original population.
In the area of experimental design, the problem of optimal sample size is also raised. In this case one should find the minimal number of observations needed to detect a significant difference (if there is one) between different treatments, such as several drugs or manufacturing processes.

EXAMPLES
A factory employs several thousand workers. Based on a previous experiment, the researcher knows that weekly salaries are normally distributed with a standard deviation of 40 CHF. The researcher wants to estimate the mean weekly salary with a precision of 10 CHF and a confidence level of 99%. What should the sample size be?
For a confidence level of 99% (α = 0.01 = 1 − 0.99), the critical value zα/2 that we find in the normal table equals 2.575. The direct application of the formula gives us:

n ≥ (zα/2 · σ/ε)² ,
n ≥ (2.575 · 40/10)² ,
n ≥ 106.09 .

We will have the desired precision if the sample size is greater than or equal to 107 workers.
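This calculation can be checked with a few lines of code; scipy's normal quantile function is used here only to obtain zα/2, and the numbers are those of the example:

```python
from math import ceil
from scipy.stats import norm

sigma = 40      # standard deviation of weekly salaries (CHF)
eps = 10        # desired precision (CHF)
conf = 0.99     # confidence level

z = norm.ppf(1 - (1 - conf) / 2)     # two-sided critical value, approx. 2.576
n = (z * sigma / eps) ** 2           # approx. 106.2
print(z, n, ceil(n))                 # a sample of at least 107 workers
```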

FURTHER READING
• Population
• Sample

REFERENCES
Cochran, W.G.: Sampling Techniques, 2nd edn. Wiley, New York (1963)
Cornfield, J.: Methods in the sampling of human populations. The determination of sample size. Amer. J. Publ. Health 41, 654–661 (1951)
Cox, D.R.: Estimation by double sampling. Biometrika 39, 217–227 (1952)
Harnett, D.L., Murphy, J.L.: Introductory Statistical Analysis. Addison-Wesley, Reading, MA (1975)
Nordin, J.A.: Determining sample size. J. Am. Stat. Assoc. 39, 497–506 (1944)
Sittig, J.: The economic choice of sampling system in acceptance sampling. Bull. Int. Stat. Inst. 33(V), 51–84 (1952)
Stein, C.: A two-sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Stat. 16, 243–258 (1945)

Sample Space

The sample space of a random experiment is the set of all possible outcomes of the experiment, usually denoted by Ω.

HISTORY
See probability.

MATHEMATICAL ASPECTS
A sample space can be:
• Finite (the number of possible outcomes is finite);
• Infinite countable (it is possible to associate each possible outcome with a positive integer in such a way that each possible outcome has a different number; however, the number of possible outcomes is infinite);
• Infinite noncountable (it is impossible to enumerate all the possible outcomes).

The sample space Ω of a random experiment is also called the sure event. The probability of the sure event is equal to 1:

P(Ω) = 1 .

EXAMPLES
Finite sample space:
– Flipping a coin: Ω = {Heads, Tails}
– Rolling a die: Ω = {1, 2, 3, 4, 5, 6}
Infinite countable sample space:
– Flipping a coin as many times as necessary to obtain “Heads” (H = heads and T = tails):
  Ω = {H, TH, TTH, TTTH, TTTTH, TTTTTH, TTTTTTH, . . .}


Infinite noncountable sample space:
– Location of the coordinates of the impact of a bullet on a panel of 10 × 10 cm. In this case, the sample space Ω is composed of an infinite noncountable set of coordinates, because the coordinates are measured on a scale of real numbers (it is impossible to give an exhaustive list of the possible results).

FURTHER READING
• Event
• Probability
• Random experiment

REFERENCES
Bailey, D.E.: Probability and Statistics: Models for Research. Wiley, New York (1971)

Sampling

From a general point of view, the term sampling means the selection of part of a population, called a sample, the study of certain characteristics x of this sample, and the making of inferences relative to the population. Sampling also denotes the set of operations required to obtain a sample from a given population.

HISTORY
The concept of the representativity of a sample is a very recent one. Even though the first trials of extrapolation appeared in the 13th century, notably in France, where the number of homes was used to estimate the population, the census remained more popular than sampling until the 19th century.
The principle of sampling with or without replacement appeared for the first time in the work “De ratiociniis in Aleae Ludo”, published in 1657 by the Dutch scientist Huygens, Christiaan (1629–1695).
It was in 1895, in Bern, during the congress of the International Institute of Statistics, that Kiaer, Anders Nicolai presented a memoir called “Observations et expériences concernant des dénombrements représentatifs”. Kiaer, as director of the Central Bureau of Statistics of the Kingdom of Norway at the time, compared in his presentation the structure of a sample with the structure of a population obtained by census. He was thereby defining the notion of control a posteriori, and for the first time he used the term “representative”. Kiaer, A.N. encountered the opposition of almost all congress participants, who were convinced that complete enumeration was the only way to proceed.
Nevertheless, in 1897, Kiaer, A.N. confirmed the notion of representativity and received the support of the Conference of Scandinavian Statisticians. He was given the opportunity to defend his point of view during several conferences: St. Petersburg (1897), Stockholm (1899), Budapest (1901), and Berlin (1903).
A new step was taken with the works of March, Lucien on the role of randomness in sampling; he was the first to develop the idea of probabilistic sampling, also called random sampling.
Other statisticians got involved with the problem. Von Bortkiewicz, L., a professor from Berlin, suggested the calculation of probabilities to test the deviation between the distributions of a sample and the total population on the key variables. Bowley, A.L. (1906) developed a random sampling procedure called stratification. According to Desrosières, A. (1988), he was also interested in the notion of the confidence interval, of which he presented the first calculations in 1906 to the Royal Statistical Society.


The year 1925 marked a new step in the history of sampling. It was the year of the congress of the International Institute of Statistics in Rome, during which random sampling (choice made by chance) and purposive sampling (judicious choice) were presented. Henceforth, the question was no longer whether to choose between sampling and complete enumeration, but rather which of the various sampling methods to use.
Of the different uses of sampling, the most frequent in the United States were the surveys carried out by the various public opinion institutes, notably preelectoral inquiries. The third of November 1936, the day the results of the presidential elections were made public, marks one of the most important days in this regard. For this poll, the Readers Digest predicted the victory of Landon, whereas, without consulting each other, pollsters Crossley, Roper, and Gallup announced the victory of his opponent, Roosevelt. Roosevelt got elected. The mistake of the Readers Digest was to base its prediction of the outcome of the election on a telephone survey; at that time, only rich people owned a telephone, so the sample was biased!
The year 1938 marks in France the birth of the IFOP (Institut Français d'Opinion Publique, the French Institute of Public Opinion), created by Stoetzel, Jean.
It is interesting to go back a little in time to emphasize the part that Russian statisticians played in the evolution of sampling techniques. Indeed, as early as the 19th century, these techniques were known and used in their country. According to Tassi, Ph. (1988), Chuprov, A.I. (1842–1908) was one of the precursors of modern sampling, and from 1910 his son Chuprov, A.A. used random sampling, made reference to cluster sampling and to stratified sampling with and without replacement, and, under the influence of Markov, A., encouraged the use of probabilities in statistics. Even if the Russian school became known for its sampling work only very late, the statisticians of this school apparently knew about these techniques long before those of western Europe.
Note that in 1924, Kovalevsky, A.G. had already rigorously worked on surveys by stratification and the optimal allocation by stratum, while this result was only rediscovered 10 years later by Neyman, J.
Finally, this history would be incomplete without mentioning the contributions of Neyman, Jerzy to sampling. Considered one of the founders of the statistical theory of surveys, he favored random sampling over purposive sampling, and he worked on sampling with and without replacement, stratified sampling, and cluster sampling. He was already using estimators by quotient, difference, and regression, techniques that would be developed later on by Horvitz, D.G., Cochran, W.G., Hansen, M.M., Hurwitz, W.N., and Madow, W.G.

MATHEMATICAL ASPECTS
There are several sampling methods. They can be divided into random methods and nonrandom methods.
The most widely used nonrandom procedure is quota sampling.
The random methods (or probabilistic sampling, sometimes also called statistical sampling) call upon a probabilistic mechanism to form a sample taken from a population.


The variables observed on the sample are random variables. From these variables the corresponding values for the population and the error due to sampling can be calculated.
The main random methods are:
• Simple random sampling
• Stratified sampling
• Systematic sampling
• Cluster sampling
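A minimal sketch contrasting two of these methods on a hypothetical sampling frame; the frame, the strata, and the sample sizes below are illustrative choices:

```python
import random

# Hypothetical frame of N = 1000 units, split into two strata for illustration.
frame = list(range(1000))
strata = {"A": frame[:300], "B": frame[300:]}
n = 50

# Simple random sampling without replacement:
# every subset of size n has the same probability of being drawn.
srs = random.sample(frame, k=n)

# Stratified sampling: an independent simple random sample in each stratum,
# here with allocation proportional to the stratum size.
stratified = {name: random.sample(units, k=round(n * len(units) / len(frame)))
              for name, units in strata.items()}

print(len(srs), {name: len(s) for name, s in stratified.items()})
```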

DOMAINS AND LIMITATIONS
The goal of sampling is to provide enough information to make inferences about the characteristics of a population. It is therefore necessary to select a sample that reproduces these characteristics as accurately as possible. Attempts are made to reduce the potential error due to sampling as much as possible. But even if this sampling error is inevitable, it is tolerated because it is largely compensated by the advantages of the sampling method. These are:
• Cost reduction: In view of all the costs that can be incurred during data collection, they can obviously be reduced by sampling.
• Time savings: Rapid decision making can only be ensured if the study is restricted to a sample.
• Increased possibilities: In certain fields, a complete analysis of an entire population is impossible; the choice, then, is between using sampling or shelving the study.
• Precise results: Since one of the goals of sampling is to recreate a representative image of the characteristics of the population, larger samples would not bring results that are significantly more precise.

Only random sampling methods allow one to evaluate the error due to sampling, because they are based on the calculation of probabilities.

FURTHER READING
• Cluster sampling
• Data collection
• Estimation
• Estimator
• Quota sampling
• Simple random sampling
• Stratified sampling
• Systematic sampling

REFERENCES
Bowley, A.L.: Presidential address to the economic section of the British Association. J. Roy. Stat. Soc. 69, 540–558 (1906)
Dalenius, T.: Elements of Survey Sampling. SAREC, Stockholm (1985)
Desrosières, A.: La partie pour le tout: comment généraliser? La préhistoire de la contrainte de représentativité. In: Estimation et sondages. Cinq contributions à l'histoire de la statistique. Economica, Paris (1988)
Foody, W., Hedayat, A.S.: On theory and applications of BIB designs with repeated blocks. Ann. Stat. 5, 932–945 (1977)
Hedayat, A.S., Sinha, B.K.: Design and Inference in Finite Population Sampling. Wiley, New York (1991)
Tassi, Ph.: De l'exhaustif au partiel: un peu d'histoire sur le développement des sondages. In: Estimation et sondages. Cinq contributions à l'histoire de la statistique. Economica, Paris (1988)


Sampling Distribution

A sampling distribution is the distribution of a statistic T(X1, . . . , Xn), where T(.) is a function of the observed values of the random variables X1, . . . , Xn.

HISTORY
See sampling.

MATHEMATICAL ASPECTS
Consider a population of size N with parameters μ and σ (μ being the unknown mean of the population and σ its standard deviation). We take a sample of size n from this population. The number of possible samples of size n is given by the number of combinations of n objects among N:

k = C_N^n = N! / ((N − n)! · n!) .

Each sample can be characterized, for example, by its sample mean, denoted by:

x̄1, x̄2, . . . , x̄k .

The set of these k means forms the sampling distribution of the means. We adopt the following notation to characterize the mean and standard deviation of the sampling distribution of the means:

mean: μx̄   and   standard deviation: σx̄ .

Mean of Sampling Distribution of Means
Consider X1, . . . , Xn independent random variables distributed according to a distribution with mean μ and variance σ². By definition of the arithmetic mean, the mean of the sampling distribution of the means is equal to:

μx̄ = ( Σ_{i=1}^{k} x̄i ) / k ,

where k is the number of samples of size n that it is possible to form from the population.
The expected value of the mean of the sampling distribution of the means is equal to the mean of the population μ:

E[μx̄] = μ ,

because:

E[μx̄] = E[ ( Σ_{i=1}^{k} x̄i ) / k ]
       = (1/k) · E[ Σ_{i=1}^{k} x̄i ]
       = (1/k) · Σ_{i=1}^{k} E[x̄i]
       = (1/k) · Σ_{i=1}^{k} E[ ( Σ_{j=1}^{n} xj ) / n ]
       = (1/k) · Σ_{i=1}^{k} ( (1/n) · Σ_{j=1}^{n} E[xj] )
       = (1/k) · Σ_{i=1}^{k} ( (1/n) · n · μ )
       = (1/k) · k · μ = μ .

Standard Deviation of Sampling Distribution of Means
By definition, the standard deviation of the sampling distribution of the means, also called the standard error, is equal to:

σx̄ = √( Σ_{i=1}^{k} (x̄i − μx̄)² / k ) .

In the same way that there exists a relation between the mean of the sampling distribution of the means and the mean of the population, there also exists a relation between the standard error and the standard deviation of the population:
• In the case of an exhaustive sampling (without replacement) from a finite population, the standard error of the mean is given by:

  σx̄ = (σ/√n) · √((N − n)/(N − 1)) ,

  where σ is the standard deviation of the population.
• If the population is infinite, or if the sampling is not exhaustive (with replacement), the standard error of the mean is equal to the standard deviation of the population divided by the square root of the sample size:

  σx̄ = σ/√n .

Characteristics of Sampling Distribution of Means
• If n is large enough (n ≥ 30), the sampling distribution of the means approximately follows a normal distribution, whatever the population distribution.
• If the population is distributed according to a normal distribution, then the sampling distribution of the means always follows a normal distribution, whatever the size of the sample.
• The sampling distribution of the means is always symmetric around its mean.

Similar considerations can be made for statistics other than the arithmetic mean. Consider the distribution of the variances obtained from all the possible samples of size n taken from a normal population with variance σ². This distribution is called the sampling distribution of the variances. One of the characteristics of this distribution is that it can only take nonnegative values, the variance s² being defined by a sum of squares. The sampling distribution of the variances is related to the chi-square distribution by the following relation:

s² = χ²σ²/v = χ²σ²/(n − 1) ,

where χ² is a random variable distributed according to a chi-square distribution with v = n − 1 degrees of freedom. Knowing the expected value and the variance of a chi-square distribution:

E[χ²] = v ,   Var(χ²) = 2v ,

we can determine these same characteristics for the distribution of s² in the case where the samples come from an infinite population.

Mean of Sampling Distribution of Variances
The mean of the distribution of s² is equal to the variance of the population:

μs² = E[s²] = E[χ²σ²/v]
    = (σ²/v) · E[χ²]   (because σ²/v is a constant)
    = (σ²/v) · v
    = σ² .

Variance of Sampling Distribution of Variances
The variance of the distribution of s² depends not only on the variance of the population σ², but also on the size of the samples n = v + 1:

σ²s² = Var(s²) = Var[χ²σ²/v]
     = (σ⁴/v²) · Var(χ²)
     = (σ⁴/v²) · 2v
     = 2σ⁴/v .

DOMAINS AND LIMITATIONS
The study of the sampling distribution of a statistic allows one to judge the proximity of the statistic, measured on a single sample, to an unknown parameter of the population. It therefore also allows one to specify the error margins of estimators calculated from the sample itself. The notion of the sampling distribution is thus at the base of the construction of confidence intervals, which indicate the degree of precision of an estimator, as well as of hypothesis testing on the unknown parameters of a population.

EXAMPLES
Using an example, we will show the relation existing between the mean of a sampling distribution and the mean of a population, and the relation between the standard error and the standard deviation of that population.
Consider a population of five stores whose average price for a certain product we want to know. The following table gives the price of the product for each store belonging to the population:

Store no.   Price
1           30.50
2           32.00
3           37.50
4           30.00
5           33.00

The mean of the population is equal to:

μ = ( Σ_{i=1}^{5} xi ) / 5 = (30.50 + 32 + 37.50 + 30 + 33)/5 = 32.60 .

By selecting samples of size 3, we can determine the different sample means that can result from a random selection of the sample. We obtain:

k = C_5^3 = 5! / (3! · 2!) = 10 possible samples.

The following table represents the different samples and their respective sample means:

      Stores in sample   Sample mean x̄i
 1.   1, 2, 3            33.333..
 2.   1, 2, 4            30.833..
 3.   1, 2, 5            31.833..
 4.   1, 3, 4            32.666..
 5.   1, 3, 5            33.666..
 6.   1, 4, 5            31.166..
 7.   2, 3, 4            33.166..
 8.   2, 3, 5            34.166..
 9.   2, 4, 5            31.666..
10.   3, 4, 5            33.500
      Total             326.000

From the data in this table we can calculate the mean of the sampling distribution of the means:

μx̄ = ( Σ_{i=1}^{k} x̄i ) / k = 326.00/10 = 32.60 .

In this example, we notice that the mean of the sampling distribution is equal to the mean of the population:

μx̄ = μ .

We can proceed in the same way to verify the relation between the standard error and the standard deviation of the population. By definition, the standard deviation of the population is calculated as follows:

σ = √( Σ_{i=1}^{N} (xi − μ)² / N ) .

According to the data of the first table, we obtain:

σ = √(35.70/5) = 2.6721 .

By the same definition, we can calculate the standard deviation of the sampling distribution of the means:

σx̄ = √( Σ_{i=1}^{k} (x̄i − μx̄)² / k ) = √(11.899/10) = 1.0908 .

The following table presents the details of the calculations:

      Stores in sample   Sample mean x̄i   (x̄i − μx̄)²
 1.   1, 2, 3            33.333..          0.538
 2.   1, 2, 4            30.833..          3.121
 3.   1, 2, 5            31.833..          0.588
 4.   1, 3, 4            32.666..          0.004
 5.   1, 3, 5            33.666..          1.138
 6.   1, 4, 5            31.166..          2.054
 7.   2, 3, 4            33.166..          0.321
 8.   2, 3, 5            34.166..          2.454
 9.   2, 4, 5            31.666..          0.871
10.   3, 4, 5            33.500            0.810
      Total             326.000           11.899

According to the obtained results:

σ = 2.6721   and   σx̄ = 1.0908 ,

we can verify the following equality:

σx̄ = (σ/√n) · √((N − n)/(N − 1)) ,

because (2.6721/√3) · √((5 − 3)/(5 − 1)) = 1.0908. These results allow us to conclude that, knowing the parameters μ and σ of the population, we can evaluate the corresponding characteristics of the sampling distribution of the means. Conversely, we can also determine the parameters of the population from the characteristics of the sampling distribution.

FURTHER READING
• Confidence interval
• Estimation
• Hypothesis testing
• Standard error


Scatterplot

A scatterplot is obtained by transcribing the data of a sample onto a graphic. In two dimensions, a scatterplot can represent n pairs of observations (xi; yi), i = 1, 2, . . . , n.

MATHEMATICAL ASPECTS
A scatterplot is obtained by placing the pairs of observed values in an axis system. For two variables X and Y, the pairs of corresponding points (x1, y1), (x2, y2), . . . , (xn, yn) are placed in a rectangular axis system. The dependent variable Y is usually plotted on the vertical axis (ordinate) and the independent variable X on the horizontal axis (abscissa).

DOMAINS AND LIMITATIONS
The scatterplot is a very useful and powerful tool, often used in regression analysis. Each point represents a pair of observed values of the dependent variable and of the independent variable. It allows one to determine graphically whether there exists a relation between two variables before choosing an appropriate model. Moreover, scatterplots are very useful in residual analysis, since they allow one to verify whether the model is appropriate or not.

EXAMPLES
Consider two examples of scatterplots representing pairs of observations (xi, yi) relative to two quantitative variables X and Y. In the first, the distribution of the observations seems to indicate a linear relation between the two variables; in the second, the distribution of the observations seems to indicate a nonlinear relation between X and Y.
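A minimal sketch of how such a scatterplot can be drawn; the data below are hypothetical and matplotlib is one possible plotting tool:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical pairs (x_i, y_i) exhibiting a roughly linear relation
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2.0 + 0.5 * x + rng.normal(scale=0.8, size=x.size)

plt.scatter(x, y)                       # one point per observed pair
plt.xlabel("X (independent variable)")  # abscissa
plt.ylabel("Y (dependent variable)")    # ordinate
plt.show()
```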

FURTHER READING
• Analysis of residuals
• Regression analysis

Scheffé, Henry

Scheffé, Henry was born in 1907 in New York. Scheffé's doctoral dissertation, “The Asymptotic Solutions of Certain Linear Differential Equations in Which the Coefficient of the Parameter May Have a Zero”, was supervised by Langer, Rudolph E. He immediately began his career as a university professor of mathematics at the University of Wisconsin, then at Oregon State University as well as at Reed College outside of Portland. In 1941 he joined Princeton University, where a statistics team had grown up. A second career started for him as a university professor of statistics. He was on the faculty at Syracuse University in the 1944–1945 academic year and at the University of California at Los Angeles from 1946 until 1948. He left Los Angeles to go to Columbia University, where he became chair of the Statistics Department. After 5 years at Columbia, Scheffé, Henry went to Berkeley as professor of statistics in 1953.
Scheffé, H.'s research was mainly concerned with the analysis of variance. One of his most important papers appeared in 1953, on the S-method of simultaneous confidence intervals for estimable functions in a subspace of the parameter space. He also studied paired comparisons and mixed models. In 1958 and again in 1963 he published papers on experiments with mixtures, and in 1973 he wrote on calibration methods.
Scheffé, H. became a fellow of the Institute of Mathematical Statistics in 1944, the American Statistical Association in 1952, and the International Statistical Institute in 1964. He was elected president of the International Statistical Institute and vice president of the American Statistical Association. He died in California in 1979.

Selected works of Henry Scheffé:

1959 The Analysis of Variance. Wiley, New York.
1952 An analysis of variance for paired comparisons. J. Am. Stat. Assoc. 47, 381–400.
1953 A method for judging all contrasts in the analysis of variance. Biometrika 40, 87–104.
1956 Alternative models for the analysis of variance. Ann. Math. Stat. 27, 251–271.
1973 A statistical theory of calibration. Ann. Stat. 1, 1–37.

FURTHER READING
• Analysis of variance
• Least significant difference test

Seasonal Index

Consider a time series Yt whose components are:
• Secular trend Tt
• Cyclical fluctuations Ct
• Seasonal variations St
• Irregular variations It
The influence of seasonal variations must be neutral over an entire year, and the seasonal variations St theoretically repeat themselves in an identical way from period to period (which is not verified in a real case). To satisfy the requirements of the theoretical model and to be able to study the real time series, periodic variations that are identical each year (month by month or trimester by trimester) must be estimated instead of the observed St. They are denoted by Sj, where j varies as follows:

j = 1 up to 12 for months (over n years) or
j = 1 up to 4 for trimesters (over n years).

Notice that over n years there exist only 12 seasonal index numbers concerning the months and 4 concerning the trimesters.

MATHEMATICAL ASPECTS
There are several methods for calculating the seasonal indices:


1. Percentage of mean
The data corresponding to each month are expressed as percentages of the mean. We compute the index using the mean (without taking into account extreme values) or the median of the percentages of each month over the different years. Taking the mean for each month, we get 12 percentages giving the indices. If their mean does not equal 100%, we should make an adjustment by multiplying the indices by a convenient factor.
Consider a monthly time series over n years, denoted by Yi,j, where j = 1, 2, . . . , 12 indicates the month and i = 1, . . . , n the year. First, we group the data in a table of the type:

Year   January (1)   . . .   December (12)
1      Y1,1          . . .   Y1,12
. . .  . . .         . . .   . . .
n      Yn,1          . . .   Yn,12

Then we calculate the annual means for the n years:

Mi = Yi,. / 12   for i = 1, . . . , n ,

where Yi,. denotes the sum of the 12 monthly values of year i. Thus we calculate for each datum:

Qi,j = Yi,j / Mi   for i = 1, . . . , n and j = 1, . . . , 12 ,

which we report in the following table:

Year   January (1)   . . .   December (12)
1      Q1,1          . . .   Q1,12
. . .  . . .         . . .   . . .
n      Qn,1          . . .   Qn,12

In each column we compute the arithmetic mean, denoted Q̄.,j, and we calculate the sum of these 12 means; if it does not equal 1200, we introduce a correction factor:

k = 1200 / Σ_{j=1}^{12} Q̄.,j .

Finally, the seasonal index of each month is given by the corresponding mean Q̄.,j multiplied by k.

2. Tendency ratio
The data of each month are expressed as percentages of the monthly value of the secular trend. The index is obtained from an appropriate mean of the percentages of the corresponding months. We should adjust the results if the mean does not equal 100% in order to obtain the desired seasonal indices.

3. Moving average ratio
We calculate a moving average over 12 months. As the corresponding results extend over several months, instead of being centered on the middle of a month as the original data are, we calculate a moving average over 2 months of the moving average over 12 months. This is what we call a centered moving average over 12 months.
Then we express the original data of each month as percentages of the corresponding centered moving average over 12 months. We take the mean of the corresponding percentages of each month, which gives the indices. We should then adjust to 100% to obtain the desired seasonal indices.

4. Relative chains
The data of each month are expressed as percentages of the data of the previous month (whence the expression relative chain). We calculate an appropriate mean of the chains for the corresponding months. We can then obtain the percentages of each month relative to those of January (= 100%). We find that the following January will ordinarily have a percentage equal to, less than, or greater than 100%; thus the tendency is either increasing or decreasing. The indices finally obtained, adjusted so that their mean equals 100%, give the desired seasonal indices.

EXAMPLES
We consider the time series of any phenomenon; we have trimestrial data over 6 years, summarized in the following table:

Year   1st trim.   2nd trim.   3rd trim.   4th trim.
1        19.65       16.35       21.30       14.90
2        28.15       25.00       29.85       23.40
3        36.75       33.60       38.55       32.10
4        45.30       42.25       47.00       40.65
5        54.15       51.00       55.75       49.50
6        62.80       59.55       64.40       58.05

We determine the seasonal indices using the percentage-of-the-mean method. First we calculate the trimestrial arithmetic means for the 6 years:

Year    Sum     Mean
1        72.2   18.05
2       106.4   26.60
3       141.0   35.25
4       175.2   43.80
5       210.4   52.60
6       244.8   61.20

We divide the rows of the initial table by the trimestrial mean of the corresponding year and multiply by 100.

Then, if the results show large enough deviations among the values obtained for the same trimester, it is better to take the median instead of the arithmetic mean to obtain significant numbers.

Year     1st trim.   2nd trim.   3rd trim.   4th trim.
1         108.86       90.58      118.01       82.55
2         105.83       93.98      112.22       87.97
3         104.26       95.32      109.36       91.06
4         103.42       96.46      107.31       92.81
5         102.95       96.96      105.99       94.11
6         102.61       97.30      105.23       94.85
Median    103.84       95.89      108.33       91.94

The median percentage for each trimester is given in the last line of the previous table. Since the sum of these percentages is exactly 400%, an adjustment is not necessary, and the numbers in the last line represent the desired seasonal indices.
We have seen that there are three other methods to determine the seasonal indices; they give the following values for the same problem:

                              Trimester
Method                      1        2        3        4
Percentage of mean        103.84    95.89   108.33    91.94
Tendency ratio            112.62    98.08   104.85    84.45
Ratio of moving average   111.56    98.87   106.14    83.44
Chains                    112.49    98.24   104.99    84.28

As we see, the results agree with one another, despite the diversity of the methods used to determine them.
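The percentage-of-the-mean computation above can be reproduced in a few lines; the data are those of the first table of this example:

```python
import numpy as np

# Trimestrial data, 6 years x 4 trimesters
Y = np.array([
    [19.65, 16.35, 21.30, 14.90],
    [28.15, 25.00, 29.85, 23.40],
    [36.75, 33.60, 38.55, 32.10],
    [45.30, 42.25, 47.00, 40.65],
    [54.15, 51.00, 55.75, 49.50],
    [62.80, 59.55, 64.40, 58.05],
])

annual_means = Y.mean(axis=1)               # one trimestrial mean per year
Q = 100 * Y / annual_means[:, None]         # percentages of the annual mean
indices = np.median(Q, axis=0)              # median percentage per trimester
indices *= 400 / indices.sum()              # adjustment (here the sum is already 400)
print(indices.round(2))                     # [103.84  95.89 108.33  91.94]
```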


FURTHER READING
• Cyclical fluctuation
• Irregular variation
• Seasonal variation
• Secular trend
• Time series

Seasonal Variation

Seasonal variations form one of the basic components of a time series. They are variations of a periodic nature that recur regularly. In an economic context, seasonal variations can be caused by the following:
• Climatic conditions
• Customs appropriate to a population
• Religious holidays
These are more or less regular periodic fluctuations that are superimposed on the extraseasonal movement.

HISTORY
See time series.

MATHEMATICAL ASPECTS
The evaluation of seasonal variations is made by determining an index of seasonal variation, called the seasonal index.

EXAMPLES
We consider the time series of any phenomenon; we have trimestrial data taken over 6 years, summarized in the following table:

Year   1st trim.   2nd trim.   3rd trim.   4th trim.
1        19.65       16.35       21.30       14.90
2        28.15       25.00       29.85       23.40
3        36.75       33.60       38.55       32.10
4        45.30       42.25       47.00       40.65
5        54.15       51.00       55.75       49.50
6        62.80       59.55       64.40       58.05

We determine the seasonal indices using the percentage-of-the-mean method. First, we calculate the trimestrial arithmetic means for the 6 years:

Year    Sum     Mean
1        72.2   18.05
2       106.4   26.60
3       141.0   35.25
4       175.2   43.80
5       210.4   52.60
6       244.8   61.20

We divide the rows of the initial table by the trimestrial mean of the corresponding year and multiply by 100. Then, if the results show large enough deviations among the values obtained for the same trimester, it is better to take the median instead of the arithmetic mean to obtain more significant numbers.

Year     1st trim.   2nd trim.   3rd trim.   4th trim.
1         108.86       90.58      118.01       82.55
2         105.83       93.98      112.22       87.97
3         104.26       95.32      109.36       91.06
4         103.42       96.46      107.31       92.81
5         102.95       96.96      105.99       94.11
6         102.61       97.30      105.23       94.85
Median    103.84       95.89      108.33       91.94

The median percentage for each trimester is given in the last line of the previous table. Since these percentages add up to exactly 400%, an adjustment is not necessary, and the numbers in the last line represent the desired seasonal indices.
We have seen that there exist other methods to determine the seasonal indices; they give the following values for the same problem:

                              Trimester
Method                      1        2        3        4
Percentage of mean        103.84    95.89   108.33    91.94
Tendency ratio            112.62    98.08   104.85    84.45
Ratio of moving average   111.56    98.87   106.14    83.44
Chains                    112.49    98.24   104.99    84.28

We see that the results agree with one another, despite the diversity of the methods used to determine them.

FURTHER READING
• Arithmetic mean
• Cyclical fluctuation
• Forecasting
• Index number
• Moving average
• Seasonal index
• Secular trend
• Time series

REFERENCES
Fuller, W.A.: Introduction to Statistical Time Series. Wiley, New York (1976)

Secular Trend

The secular trend forms one of the four basic components of a time series. It describes the long-term movement of a time series, which globally can be increasing, decreasing, or stable.

The secular trend can be linear or not. In the latter case, we should choose the type of function (exponential, quadratic, logistic, etc.) that is best adapted to the distribution observed on the graphical representation of the time series before estimating the secular trend.

HISTORY
See time series.

MATHEMATICAL ASPECTS
Let Yt be a time series. It can be decomposed with the help of four components:
• Secular trend Tt
• Cyclical fluctuation Ct
• Seasonal variations St
• Irregular variations It

During the study of the time series, we start by analyzing the secular trend, first deciding whether this trend can be supposed linear or not. If we can suppose it is linear, the evaluation of the secular trend can be made in several ways:
1. Graphical (freehand) method
   After making a graphical representation of the time series, this consists in fitting a trend line or curve freehand on the graph. This gives an approximation of the secular trend.
2. Least-squares method
   This allows one to identify mathematically the line that best represents the data, that is:

   Ŷt = a + b · t ,

   where Ŷt is the value of the trend line at a given time t, a is the value of the line when t is at the origin, and b is the slope of the line. The least-squares line Ŷt corresponds to the estimation of the secular trend Tt.


Calculation of the secular trend
Supposing that the time unit of the series is the year, the idea is to move the origin to the year in the middle of the years in which the measurements were made.
a. Odd number of years n
   Let j = (n + 1)/2, and let us set xi = ti − tj for i = 1, 2, . . . , n; the middle of year tj is the new origin and the unit of the new variable x is the year. Thus

   a = Σ_{i=1}^{n} Yi / n   and   b = Σ_{i=1}^{n} xi·Yi / Σ_{i=1}^{n} xi²

   give us Ŷt = a + b·x, or Ŷt = (a − b·tj) + b·t, where t is the year without changing the origin.
b. Even number of years n
   Let j = n/2 + 1, where tj is the new origin, and set xi = 2·(ti − tj) + 1 for i = 1, 2, . . . , n. Thus

   a = Σ_{i=1}^{n} Yi / n   and   b = Σ_{i=1}^{n} xi·Yi / Σ xi²

   give us Ŷt = a + b·x. The unit of x is the half year, counted from 1 January of year tj.

3. Moving average method
   Let Y1, Y2, Y3, . . . be a given time series. Construct the new series:

   (Y1 + . . . + Yn)/n ,  (Y2 + . . . + Yn+1)/n ,  . . . ,  (Yj + . . . + Yn+j−1)/n ,  . . . ,

   whose estimation of the secular trend is made by one of the two previous methods.

4. Semimeans method
   Let Y1, Y2, . . . , Yn be a given time series. Denote by k the integer part of n/2. We calculate:

   ȳ1 = (Y1 + . . . + Yk)/k   and   ȳ2 = (Yk+1 + . . . + Yn)/(n − k) ,
   x̄1 = (k + 1)/2   and   x̄2 = (n + k + 1)/2 ,

   and determine the line of the secular trend passing through the two points (x̄1; ȳ1) and (x̄2; ȳ2).

When the secular trend cannot be supposed linear, we adapt the least-squares method, applying it with Ŷt = a + b·t replaced by another trend curve, for example:
• An exponential: Ŷt = a · b^t
• A modified exponential: Ŷt = c + a · b^t
• A logistic curve: Ŷt = (c + a · b^t)⁻¹
• A Gompertz curve: Ŷt = c · a^(b^t)

DOMAINS AND LIMITATIONS
The analysis of the secular trend has the following goals:
• Create a descriptive model of a past situation.
• Make projections assuming a constant structure.
• Eliminate the secular trend in order to study the other components of the time series.
The four methods of estimation of the secular trend have certain disadvantages. Let us mention the principal ones for each:
1. Graphical method
   Too subjective.


2. Least-squares method
This method can be used only for a period where the movement goes entirely in the same direction. When a first part corresponds to an ascending movement and another part to a descending movement, two trend lines have to be fitted to the data, each referring to a period with a single direction.
These lines have some interesting properties:
• On the same graph we find the set of values Yi of the time series and the trend line. The sum of the deviations between the observed values Yi and the estimates Ŷi of the secular trend always equals zero:

Σ_{i=1}^{n} (Yi − Ŷi) = 0 .

• The trend line minimizes the sum of the squared deviations:

Σ_{i=1}^{n} (Yi − Ŷi)² = minimal value .

3. Moving average method
The data at the beginning and the end of the initial time series are “lost”. Moving averages can create cycles and other movements not present in the original data; moreover, they are strongly affected by accidental outliers. The method does, however, reveal changes in direction.
4. Semimeans method
Even if this method is simple to apply, it can give results of little value. Valid when the data can be divided into two groups whose tendencies are linear, the method is applicable only when the tendency is globally linear or approximately linear. Although the procedure is easy to apply, it is not very rigorous.

EXAMPLES
Let us estimate the secular trend of the annual sales of the Surfin cookie factory using the following data:

Annual sales of the Surfin cookie factory (in millions of euros)

Year    Sales
1975    7.6
1976    6.8
1977    8.4
1978    9.3
1979    12.1
1980    11.9
1981    12.3

Let us use the least-squares method. We move the origin to the middle of the 7 years, setting t = Xi − 1978 for i = 1, 2, . . . , 7; the middle of 1978 is the new origin and the unit of t is the year. We complete the following table:

Year Xi   Sales Yt   Code year t   t·Yt     t²
1975      7.6        −3            −22.8    9
1976      6.8        −2            −13.6    4
1977      8.4        −1            −8.4     1
1978      9.3        0             0.0      0
1979      12.1       1             12.1     1
1980      11.9       2             23.8     4
1981      12.3       3             36.9     9
Total     68.4       0             28.0     28

Thus:

a = Σ_{t=−3}^{3} Yt / 7 = 68.4/7 = 9.7714 (millions of euros)

and

b = Σ_{t=−3}^{3} t·Yt / Σ_{t=−3}^{3} t² = 28/28 = 1.0 (millions of euros per year) ,

which gives us Tt = Ŷt = 9.7714 + t for the equation of the line of secular trend. The data and the fitted trend line can then be represented on the same graph (figure not reproduced here).
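The least-squares calculation above can be sketched in a few lines of Python; the figures are those of the Surfin example, and the coding of the years follows the odd-number-of-years rule described under MATHEMATICAL ASPECTS.

```python
# Least-squares estimation of a linear secular trend (sketch).
years = [1975, 1976, 1977, 1978, 1979, 1980, 1981]
sales = [7.6, 6.8, 8.4, 9.3, 12.1, 11.9, 12.3]   # millions of euros

n = len(years)
t_mid = years[n // 2]                  # middle year (odd number of years): 1978
t = [year - t_mid for year in years]   # coded years: -3, ..., 3

a = sum(sales) / n                                                        # 9.7714
b = sum(ti * yi for ti, yi in zip(t, sales)) / sum(ti * ti for ti in t)   # 1.0

trend = [a + b * ti for ti in t]       # estimated secular trend T_t
print(f"T_t = {a:.4f} + {b:.4f}*t  (t = year - {t_mid})")
```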

FURTHER READING
• Cyclical fluctuation
• Graphical representation
• Least squares
• Moving average
• Time series

REFERENCES
Chatfield, C.: The Analysis of Time Series: An Introduction. Chapman & Hall, London (2003)

Semilogarithmic Plot

A semilogarithmic plot is a graphical representation of a time series. It is defined by an arithmetic scale of time t plotted on the abscissa axis and a logarithmic scale plotted on the ordinate axis, meaning that the logarithm of the observed value Yt is transcribed on the ordinate on an arithmetic scale. In general, printed semilogarithmic sheets of paper are used.

MATHEMATICAL ASPECTS
On an axis system, an origin and a unit length are arbitrarily chosen; the abscissa is arithmetically graduated as on millimeter paper.
On the ordinate axis, the unit length chosen by the maker of the semilogarithmic paper, which corresponds to the distance between two successive powers of 10 on the logarithmic scale, can be modified. This distance, which is the unit length of the corresponding arithmetic scale, is called the module.
The values printed in the margin of the logarithmic scale can all be multiplied by the same number; this multiplicative constant is chosen to be as simple as possible, in such a way that the values obtained are easy to use for the transcription of the points on the plot.
Commercially available semilogarithmic sheets of paper usually have one, two, three, or four modules.

DOMAINS AND LIMITATIONS
The semilogarithmic plot has the following properties.
• Exponential curves are represented by a line. If y(x) is an exponential,

  y(x) = c · a^x ,

  then by taking the logarithm,

  log y(x) = log c + x · log a ,   that is,   log y(x) = m · x + h ,   where m = log a and h = log c ,

  we find a linear expression in x (a numerical sketch of this property is given below).
• If a = 1 + i, where i is the annual interest rate of Y, and the initial sum of money is Y0, then

  Y = Y0 · (1 + i)^x

  gives the sum of money after x years.
  It is also possible to work in the Napierian base (e) since a^x = e^(x·ln a), and therefore y(x) = c · e^(x·ln a) or y(x) = c · e^(r·x), where r = ln a is called the instantaneous growth rate of y. In this case, it is the ratio of the instantaneous variation dy/dx to the value of y.
The use of the semilogarithmic plot is judicious in the following situations:
• When there exist great differences in the values of the variable, to avoid going outside the plot.
• When one wants to make relative variations appear.
• When accumulations of growth rates, which would make exponentials appear, must be represented in a linear fashion.
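The first property above (an exponential becomes a straight line on a logarithmic ordinate) can be checked numerically; the constants c and a below are arbitrary values chosen for the illustration.

```python
import math

c, a = 2.0, 1.5        # arbitrary constants of the exponential y(x) = c * a**x
xs = range(6)
log_y = [math.log10(c * a ** x) for x in xs]

# On the logarithmic scale the successive differences are constant and
# equal to the slope m = log a, so the curve is drawn as a straight line.
slopes = [log_y[i + 1] - log_y[i] for i in range(len(log_y) - 1)]
print(slopes)            # every difference equals log10(1.5), about 0.1761
print(math.log10(a))
```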

FURTHER READING
• Graphical representation
• Time series

Serial Correlation

We call serial correlation or autocorrelation the dependence between the observations of a time series.

HISTORY
See time series.

MATHEMATICAL ASPECTS
Let x1, . . . , xn be n observations of a random variable X that depends on time (we are speaking here of a time series). To measure the dependence between observations and time, we use the following coefficient:

d = Σ_{t=2}^{n} (xt − xt−1)² / Σ_{t=1}^{n} xt² ,

which indicates the relation between two consecutive observations. This coefficient takes values between 0 and 4. If it is close to 0, so that the differences between successive observations are small, we speak of a positive autocorrelation. If it is close to 4, then we have a negative autocorrelation: an observation with a large value has a tendency to be followed by a low value and vice versa. In the case where the observations are independent of time, the coefficient d has a value close to 2.
We can also measure the serial correlation of the observations x1, . . . , xn by calculating the coefficient of correlation between the series of the xt and the series of the xt shifted by k positions:

rk = Σ_{t=k+1}^{n} (xt − x̄)(xt−k − x̄) / Σ_{t=1}^{n} (xt − x̄)²

and, analyzing in the same way, measure the serial correlation through the coefficient of correlation of the two shifted series.
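A small sketch of the two coefficients defined above; the series used here is invented purely for the illustration.

```python
x = [3.0, 4.1, 4.8, 5.2, 6.0, 5.7, 6.5, 7.1]   # invented time series
n = len(x)
x_bar = sum(x) / n

# Coefficient d: close to 0 -> positive autocorrelation, close to 4 -> negative,
# close to 2 -> observations independent of time.
d = sum((x[t] - x[t - 1]) ** 2 for t in range(1, n)) / sum(v * v for v in x)

# Lag-k serial correlation coefficient r_k.
def r(k):
    num = sum((x[t] - x_bar) * (x[t - k] - x_bar) for t in range(k, n))
    den = sum((v - x_bar) ** 2 for v in x)
    return num / den

print(round(d, 3), round(r(1), 3), round(r(2), 3))
```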

DOMAINS AND LIMITATIONS
Autoregressive Models
When a serial correlation is observed between the residuals of a simple linear regression model, suitable models from the domain of time series can be used with success, for example models of the type

Yt = β0 + β1·Xt + εt ,

where the error terms εt are not independent but are described by a relation of the type

εt = ρs·εt−1 + δt ,

where ρs can be estimated by the serial correlation of the εt and where the δt are independent errors.

EXAMPLES
Graph of Residuals as Functions of Time
This type of graph is especially used in the analysis of time series, that is, when time is an explanatory variable of the model.
(Figure: residuals plotted against time.)
A graphical representation of the residuals against time makes any serial correlation, positive or negative, apparent, as in the case of the figure above. A positive or negative serial correlation implies that the errors are not independent.
(Figure: residuals showing a serial correlation.)

FURTHER READING
• Autocorrelation
• Hypothesis testing
• Time series

REFERENCES
Bartlett, M.S.: On the theoretical specification of sampling properties of autocorrelated time series. J. Roy. Stat. Soc. Ser. B 8, 27–41 (1946)

Box, G.E.P., Jenkins, G.M.: Time Series Analysis: Forecasting and Control (Series in Time Series Analysis). Holden Day, San Francisco (1970)

Durbin, J., Watson, G.S.: Testing for serial correlation in least squares regression, II. Biometrika 38, 159–177 (1951)

Sign Test

The sign test is a nonparametric test that can be used to test the hypothesis that there is no difference between the distributions of two random variables. It owes its name to the fact that it uses “+” and “−” signs instead of quantitative values. Thus it is applicable even when a series is measured on an ordinal scale.

HISTORY
In what was surely the first publication about nonparametric tests, Arbuthnott, J. (1710) studied the list of births registered in London over a period of 82 years. For each year he compared the number of children born of each sex. He denoted by “+” the event “more boys than girls are born” and by “−” the opposite event. (There was no tie.)
To his greatest surprise, 82 times he came up with a “+” sign and never a “−” sign; this made him reject the null hypothesis of equality of births relative to the sex of the child.

MATHEMATICAL ASPECTS
Consider n pairs of observations (x1, y1), (x2, y2), . . . , (xn, yn). For each pair (xi, yi), we make the following comparison according to the size of xi relative to yi:

“+” if xi < yi
“−” if xi > yi
“=” if xi = yi

We subtract from the sample size the number of “=” signs that appear, that is, we take into account only the pairs that present a positive or negative difference. We denote by m this number of pairs.
We then count the number of “+” signs, which we designate by T.

Hypotheses
Depending on whether the test is one-tailed or two-tailed, the null and alternative hypotheses corresponding to the test are:
A: Two-sided case:
H0: P(X < Y) = P(X > Y)
H1: P(X < Y) ≠ P(X > Y)
B: One-sided case:
H0: P(X < Y) ≤ P(X > Y)
H1: P(X < Y) > P(X > Y)
C: One-sided case:
H0: P(X < Y) ≥ P(X > Y)
H1: P(X < Y) < P(X > Y)
In case A, the null hypothesis (H0) is that the probability that X is smaller than Y is the same as the probability that X is greater than Y.
In case B, we suppose a priori that the probability that X is smaller than Y is smaller than or equal to the probability that X is greater than Y.
Finally, in case C, we suppose a priori that the probability that X is smaller than Y is greater than or equal to the probability that X is greater than Y.
Under the null hypothesis of the two-sided case, the probability that X is smaller than Y (like that of X > Y) equals 1/2. The sign test is a particular case of the binomial test with p = 1/2, that is, under the null hypothesis H0 we expect as many “+” as “−” signs.

Decision Rules
Case A
We use the binomial table for a binomial distribution with probability 1/2 and look up the value of the table closest to α/2, where α is the significance level.
We reject H0 at the level α if

T ≤ tα/2   or   T ≥ m − tα/2 ,

with T = number of “+” signs.
When m > 20, we can use the normal table as an approximation of the binomial distribution. We transform the statistic T into a random variable Z following a standard normal distribution:

Z = (T − μ)/σ ,

where

μ = m·p = m/2   and   σ = √(m·p·q) = √m / 2

are the mean and the standard deviation of the binomial distribution. Thus we obtain:

Z = (T − m/2) / (√m / 2) .

The approximation of the value of the binomial table is then:

tα/2 = (m + zα/2·√m) / 2 ,

where zα/2 is the quantile of order α/2 found in the normal table. The decision rule is then the same.
Case B
We reject H0 at the level α if

T ≥ m − tα ,

where tα is the value of the corresponding binomial table at the significance level α (or the closest value).
For m > 20, we make the approximation

tα = (m + zα·√m) / 2 ,

where zα is the quantile of order α found in the normal table.
Case C
We reject H0 at the level α if

T ≤ tα ,

where the value of tα is the same as for case B.

DOMAINS AND LIMITATIONS
The requirements for the use of the sign test are the following:
1. The pairs of random variables (Xi, Yi), i = 1, 2, . . . , n, must be mutually independent.
2. The scale of measure must be at least ordinal, that is, a “+”, “−”, or “=” sign can be associated with each pair.
3. The pairs (Xi, Yi), i = 1, 2, . . . , n, must be mutually consistent, that is, if P(“+”) > P(“−”) for one pair (Xi, Yi), then P(“+”) > P(“−”) for all the other pairs. The same principle applies if P(“+”) < P(“−”) or P(“+”) = P(“−”).

EXAMPLES
A manufacturer of chocolate is studying a new type of packaging for one of his products. To this end, he asks 10 consumers to rate the old and the new packaging on a scale of 1 to 6 (with 1 meaning “strongly dislike” and 6 meaning “strongly like”).
We perform a one-sided test (corresponding to case C) where the null hypothesis that we want to test is written:

H0: P(“+”) ≥ P(“−”)

relative to the alternative hypothesis:

H1: P(“+”) < P(“−”) .

The “+” sign means that the new packaging is preferred to the old one.
The results obtained are:

Consumer   Rank old packaging   Rank new packaging   Sign of difference
1          4                    5                    +
2          3                    5                    +
3          1                    1                    =
4          5                    4                    −
5          4                    4                    =
6          2                    6                    +
7          4                    6                    +
8          4                    5                    +
9          2                    3                    +
10         1                    4                    +

The table shows that two consumers saw no difference between the new and the old packaging. So there are eight differences (m = 8), among which the new packaging was, in seven cases, preferred over the old one. Thus we have T = 7 (number of “+” signs).
We reject H0 at the level α if

T ≤ tα ,

where tα is the value of the binomial table. We choose α = 0.05; the value of tα is 1 (for m = 8, with an actual level of 0.0352). Thus we have T > tα because T = 7 and tα = 1. We do not reject the hypothesis H0: P(“+”) ≥ P(“−”), and the chocolate manufacturer may conclude that consumers generally prefer the new packaging.
We take once more the same example, but this time the study is conducted on 100 consumers. The assumed results are the following:

83 people prefer the new packaging.
7 people prefer the old packaging.
10 people are indifferent.

Thus we have:

m = 100 − 10 = 90 ,
T = 83 .

The value of tα can be approximated with the normal distribution:

tα = (m + zα·√m) / 2 ,

where zα is the value of the normal table for α = 0.05, that is, zα = −1.64. This gives:

tα = (90 − 1.64·√90) / 2 = 37.22 .

Since T is greater than tα (83 > 37.22), the null hypothesis H0: P(“+”) ≥ P(“−”) is not rejected, and the chocolate manufacturer draws the same conclusion as before.
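Both calculations of the example can be reproduced with the Python standard library; the exact binomial tail is used for the 10-consumer study and the normal approximation for the 100-consumer study. The ratings are those of the hypothetical packaging data above.

```python
import math

# Small sample: exact binomial critical value t_alpha with p = 1/2 (case C).
old = [4, 3, 1, 5, 4, 2, 4, 4, 2, 1]
new = [5, 5, 1, 4, 4, 6, 6, 5, 3, 4]
signs = ["+" if o < n_ else "-" for o, n_ in zip(old, new) if o != n_]
m, T = len(signs), signs.count("+")                    # m = 8, T = 7

def binom_cdf(t, m):
    """P(T <= t) for a Binomial(m, 1/2) variable."""
    return sum(math.comb(m, k) for k in range(t + 1)) / 2 ** m

alpha = 0.05
t_alpha = max(t for t in range(m + 1) if binom_cdf(t, m) <= alpha)   # 1
print(T, t_alpha, "reject H0" if T <= t_alpha else "do not reject H0")

# Large sample: t_alpha ~ (m + z_alpha * sqrt(m)) / 2, with z_0.05 = -1.64.
m2, T2, z_alpha = 90, 83, -1.64
t_alpha2 = (m2 + z_alpha * math.sqrt(m2)) / 2          # about 37.22
print(round(t_alpha2, 2), "reject H0" if T2 <= t_alpha2 else "do not reject H0")
```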

FURTHER READING
• Binomial table
• Binomial test
• Hypothesis testing
• Nonparametric test
• Normal distribution

REFERENCES
Arbuthnott, J.: An argument for Divine Providence, taken from the constant regularity observed in the births of both sexes. Philos. Trans. 27, 186–190 (1710)

Significance Level

The significance level is a parameter of a hypothesis test, and its value is fixed by the user in advance.
The significance level of a hypothesis test, denoted by α, is the probability of rejecting the null hypothesis H0 when it is true:

α = P{reject H0 | H0 true} .

The significance level is also called the probability of the Type I error.

HISTORY
In 1928, Jerzy Neyman and Egon Pearson discussed the problems related to whether or not a sample may be judged as likely to have been drawn from a certain population. They identified two types of error associated with the possible decisions. The probability of the Type I error (rejecting the hypothesis given that it is true) corresponds to the significance level of the test.
See hypothesis testing.

DOMAINS AND LIMITATIONS
Concerned with the possibility of error in rejecting the null hypothesis when it is true, statisticians construct a hypothesis test such that the probability of an error of this type does not exceed a fixed value, given in advance. This value corresponds to the significance level of the test.


Note that statisticians conducting a hypothesis test must also confront a second type of error, called the Type II error. It takes place when the null hypothesis is not rejected when in reality it is false.
To decrease the global risk of error in decision making, it is not enough to decrease the significance level; a compromise should be found between the significance level and the probability of an error of the second type. One method is based on the study of the power of the test.

EXAMPLES
In the following example, we illustrate the influence of the choice of the significance level when carrying out a hypothesis test.
Consider a right one-sided test on the mean μ of a population whose variance σ² equals 36.
Let us state the following hypotheses:

Null hypothesis H0: μ = 30
Alternative hypothesis H1: μ > 30 .

Let us suppose that the mean x̄ of a sample of size n = 100 taken from this population equals 31.

Case 1
Choose a significance level α = 5%. We find the critical value zα = 1.65 in the normal table. The upper limit of the acceptance region is determined by

μx̄ + zα · σx̄ ,

where μx̄ is the mean of the sampling distribution of the means and σx̄ is the standard error (the standard deviation of the sampling distribution of the means):

σx̄ = σ/√n .

This upper limit equals:

30 + 1.65 · 6/√100 = 30.99 .

The rejection region of the null hypothesis corresponds to the interval [30.99; ∞[ and the acceptance region to the interval ]−∞; 30.99[.

Case 2
If we fix a smaller significance level, for example α = 1%, the value of zα in the normal table equals 2.33 and the upper limit of the acceptance region now equals:

μx̄ + zα · σx̄ = 30 + 2.33 · 6/√100 = 31.4 .

The rejection region of the null hypothesis corresponds to the interval [31.4; ∞[ and the acceptance region to the interval ]−∞; 31.4[.
From the observed mean x̄ = 31 we can see that in the first case (α = 5%) we must reject the null hypothesis because the sample mean (31) is in the rejection region. In the second case (α = 1%) we must make the converse decision: the null hypothesis cannot be rejected because the sample mean is in the acceptance region.
We see that the choice of the significance level has a direct influence on the decision about the null hypothesis and, in consequence, on the decision-making process. A smaller significance level (and thus a smaller probability of an error of the first type) restricts the rejection region and increases the acceptance region of the null hypothesis.
A smaller risk of error is thus associated with a smaller precision.


FURTHER READING
• Hypothesis testing
• Type I error

REFERENCES
Neyman, J., Pearson, E.S.: On the use and interpretation of certain test criteria for purposes of statistical inference, Part I (1928). Reprinted by Cambridge University Press in 1967

Simple Index Number

A simple index number is the ratio of two values representing the same variable, measured in two different situations or in two different periods. For example, a simple index number of price will give the relative variation of the price between the current period and a reference period. The most commonly used simple index numbers are those of price, quantity, and value.

HISTORY
See index.

MATHEMATICAL ASPECTS
The index number In/0, which is representative of a variable G in situation n with respect to the same variable in situation 0 (reference situation), is defined by:

In/0 = Gn / G0 ,

where Gn is the value of variable G in situation n and G0 is the value of variable G in situation 0.
Generally, a simple index number is expressed in base 100 in the reference situation 0:

In/0 = (Gn / G0) · 100 .

Properties of Simple Index Numbers
• Identity: If the two compared situations (or two periods) are identical, the value of the index number is equal to 1 (or 100):

  In/n = I0/0 = 1 .

• A simple index number is reversible:

  In/0 = 1 / I0/n .

• A simple index number is transferable. Consider two index numbers I1/0 and I2/0 representing two situations, 1 and 2, with respect to the same base situation 0. If we consider the index number I2/1, we can say that:

  I2/0 = I2/1 · I1/0 .

  Notice that transferability (also called circularity) implies reversibility:

  if I0/1 · I1/0 = I0/0 = 1, then I0/1 = 1 / I1/0 .

DOMAINS AND LIMITATIONS
In economics it is very rare that simple index numbers are usable in themselves; the information they contain is relatively limited. Nevertheless, they are of fundamental importance in the construction of composite index numbers.

EXAMPLES
Consider the following table representing the prices of a certain item and the quantities sold in two different periods:

         Price                 Quantity
         Period 1   Period 2   Period 1   Period 2
         P0 = 50    Pn = 70    Q0 = 25    Qn = 20

If Pn is the price of the period of interest and P0 is the price of the reference period, we will have:

In/0 = (Pn / P0) · 100 = (70/50) · 100 = 140 ,

which means that, in base 100 in period 1, the price index of the item is 140 in period 2. The price of the item increased by 40% (140 − 100) between the reference period and the current period.
We can also calculate a simple index number of quantity:

In/0 = (Qn / Q0) · 100 = (20/25) · 100 = 80 .

The quantity sold has therefore decreased by 20% (100 − 80) between the reference period and the current period.
Using these numbers, we can also calculate a value index number, value being defined as price multiplied by quantity:

In/0 = (Pn · Qn)/(P0 · Q0) · 100 = (70 · 20)/(50 · 25) · 100 = 112 .

The value of the considered item has therefore increased by 12% (112 − 100) between the two considered periods.
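The three indices of the example can be obtained directly from their definitions, as in the following minimal sketch.

```python
p0, pn = 50, 70   # prices in the reference period and the current period
q0, qn = 25, 20   # quantities in the reference period and the current period

price_index = pn / p0 * 100                 # 140.0 -> price up 40%
quantity_index = qn / q0 * 100              # 80.0  -> quantity down 20%
value_index = (pn * qn) / (p0 * q0) * 100   # 112.0 -> value up 12%

print(price_index, quantity_index, value_index)
```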

FURTHER READING
• Composite index number
• Fisher index
• Index number
• Laspeyres index
• Paasche index

Simple Linear Regression

Simple linear regression is a regression analysis where the dependent variable Y depends linearly on a single independent variable X.
Simple linear regression aims not only to estimate the regression function relative to the chosen model but also to test the reliability of the obtained estimations.

HISTORY
See analysis of regression.

MATHEMATICAL ASPECTS
The model of simple linear regression is of the form:

Y = β0 + β1·X + ε ,

where Y is the dependent variable (or explained variable), X is the independent variable (or explanatory variable), ε is the non-observable random error term, and β0 and β1 are the parameters to estimate.
If we have a set of n observations (X1, Y1), . . . , (Xn, Yn), where Yi depends linearly on the corresponding Xi, we can write

Yi = β0 + β1·Xi + εi , i = 1, . . . , n .

The problem of regression analysis consists in estimating the parameters β0 and β1, choosing values β̂0 and β̂1 such that the distance between Yi and (β̂0 + β̂1·Xi) is minimal. We want

εi = Yi − β̂0 − β̂1·Xi

to be small for all i = 1, . . . , n. To achieve this, we can choose among several criteria:
1. min over β0, β1 of max_i |εi|
2. min over β0, β1 of Σ_{i=1}^{n} |εi|
3. min over β0, β1 of Σ_{i=1}^{n} εi²


The most widely used method for estimating the parameters β0 and β1 is the third one, called least squares. It aims to minimize the sum of squared errors.
Estimation of the parameters by the least-squares method gives the following estimators β̂0 and β̂1:

β̂1 = Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{n} (Xi − X̄)²

β̂0 = Ȳ − β̂1·X̄ .

We can then write:

Ŷi = β̂0 + β̂1·Xi ,

where Ŷi is the estimated value of Yi for a given Xi when β̂0 and β̂1 are known.

Measure of the Reliability of the Estimation of Y
We have calculated an estimation of the value of Y, namely Ŷ, with the help of the least-squares method, basing our calculation on a linear model to translate the relation between Y and X. But how far can we trust this model?
To answer this question, it is useful to perform an analysis of variance and to test hypotheses on the parameters β0 and β1 of the regression line. To carry out these tests, we must make the following assumptions:
• For each value of X, Y is a random variable distributed according to the normal distribution.
• The variance of Y is the same for all X; it equals σ² (unknown).
• The different observations of Y are independent of each other but conditioned by the values of X.

Analysis of Variance
The table of the analysis of variance that we must construct is the following:

Analysis of variance
Source of variation   Degrees of freedom   Sum of squares            Mean of squares
Regression            1                    Σ_{i=1}^{n} (Ŷi − Ȳ)²     Σ_{i=1}^{n} (Ŷi − Ȳ)²
Residual              n − 2                Σ_{i=1}^{n} (Yi − Ŷi)²    Σ_{i=1}^{n} (Yi − Ŷi)² / (n − 2)
Total                 n − 1                Σ_{i=1}^{n} (Yi − Ȳ)²

If the model is correct, then

S² = Σ_{i=1}^{n} (Yi − Ŷi)² / (n − 2)

is an unbiased estimator of σ².
The analysis of variance allows us to test the null hypothesis:

H0: β1 = 0

against the alternative hypothesis:

H1: β1 ≠ 0

by calculating the statistic:

F = EMSE / RMSE = EMSE / S² ,

where EMSE is the explained (regression) mean square and RMSE = S² is the residual mean square. This statistic F must be compared with the value Fα,1,n−2 of the Fisher table, where α is the significance level of the test:

If F ≤ Fα,1,n−2 , then we accept H0.
If F > Fα,1,n−2 , then we reject H0 in favor of H1.

The coefficient of determination R² is calculated in the following manner:

R² = ESS / TSS = (β̂′X′Y − nȲ²) / (Y′Y − nȲ²) ,

where ESS is the sum of squares of the regression and TSS is the total sum of squares.


Hypothesis on the Slope β1
In the case of a simple linear regression, the statistic F allows a hypothesis on the parameters of the regression equation, namely β1, the slope of the line, to be tested. Another way to carry out the same test is as follows.
If H0 is true (β1 = 0), then the statistic

t = β̂1 / S_β̂1

follows a Student distribution with (n − 2) degrees of freedom. S_β̂1 is the standard deviation of β̂1, estimated from the sample:

S_β̂1 = √( S² / Σ_{i=1}^{n} (Xi − X̄)² ) .

The statistic t must be compared with the value tα/2,n−2 of the Student table, where α is the significance level and n − 2 the number of degrees of freedom. The decision rule is the following:

|t| ≤ tα/2,n−2 ⇒ we accept H0 (β1 = 0).
|t| > tα/2,n−2 ⇒ we reject H0 in favor of H1 (β1 ≠ 0).

It is possible to show that in the case of a simple linear regression, t² equals F. The statistic t also allows a confidence interval for β1 to be calculated.

Hypothesis on the Ordinate at the Origin β0
In a similar way, we can construct a confidence interval for β0 and test the hypotheses:

H0: β0 = 0
H1: β0 ≠ 0

The statistic

t = β̂0 / S_β̂0

also follows a Student distribution with (n − 2) degrees of freedom.
The estimated standard deviation of β̂0, denoted by S_β̂0, is defined by:

S_β̂0 = √( S² · Σ_{i=1}^{n} Xi² / ( n · Σ_{i=1}^{n} (Xi − X̄)² ) ) .

The decision rule is the following:

|t| ≤ tα/2,n−2 ⇒ we accept H0 (β0 = 0).
|t| > tα/2,n−2 ⇒ we reject H0 in favor of H1 (β0 ≠ 0).

tα/2,n−2 is obtained from the Student table for (n − 2) degrees of freedom and a significance level α.

DOMAINS AND LIMITATIONS
Simple linear regression is a particular case of multiple linear regression. The matrix approach presented under multiple linear regression (a model containing many independent variables) is also valid for the particular case where there is only one independent variable.
See analysis of regression.

EXAMPLES
The following table represents the gross national product (GNP) and the demand for staples for the period 1969 to 1980 in a certain country.

Year   GNP X   Demand for staples Y
1969   50      6
1970   52      8
1971   55      9
1972   59      10
1973   57      8
1974   58      10
1975   62      12
1976   65      9
1977   68      11
1978   69      10
1979   70      11
1980   72      14

We want to estimate the demand for staples depending on the GNP according to the model:

Yi = β0 + β1·Xi + εi , i = 1, . . . , n .

The estimation of the parameters β0 and β1 by the least-squares method gives us the following estimates:

β̂1 = Σ_{i=1}^{12} (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^{12} (Xi − X̄)² = 0.226

β̂0 = Ȳ − β̂1·X̄ = −4.04 .

The estimated line can be written as:

Ŷ = −4.04 + 0.226·X .

Analysis of Variance
We calculate the degrees of freedom as well as the sums of squares and the mean squares in order to establish the table of the analysis of variance:

dfreg = 1 ,
dfres = n − 2 = 10 ,
dftot = n − 1 = 11 ;

ESS = Σ_{i=1}^{12} (Ŷi − Ȳ)² = (7.20 − 9.833)² + · · · + (12.23 − 9.833)² = 30.457 ;

RSS = Σ_{i=1}^{12} (Yi − Ŷi)² = (6 − 7.20)² + · · · + (14 − 12.23)² = 17.21 ;

TSS = Σ_{i=1}^{12} (Yi − Ȳ)² = (6 − 9.833)² + · · · + (14 − 9.833)² = 47.667 ;

EMSE = ESS / dfreg = 30.457 ;

S² = RMSE = RSS / dfres = 1.721 .

Thus we have the following table:

Analysis of variance
Source of variation   Degrees of freedom   Sum of squares   Mean of squares   F
Regression            1                    30.457           30.457            17.687
Residual              10                   17.211           1.721
Total                 11                   47.667


With a significance level of α = 5%, we find in the Fisher table the value

Fα,1,n−2 = F0.05,1,10 = 4.96 .

Since F > F0.05,1,10, we reject the null hypothesis H0: β1 = 0, which means that β1 is significantly different from zero.
To calculate the coefficient of determination for this example, it is enough to look at the table of the analysis of variance because it contains all the required elements:

R² = ESS / TSS = 30.457 / 47.667 = 0.6390 = 63.90% .

We can conclude that, according to the chosen model, 63.90% of the variation of the demand for staples is explained by the variation of the GNP.
It is evident that the value of R² cannot exceed 100%; the value 63.90% is fairly large, but it is not close enough to 100% to discourage one from trying to improve the model.
This can mean:
• That besides the GNP, other variables should be taken into account for a finer determination of the function of the demand for staples, the GNP explaining only part of the variation.
• That we should test another model (one without a constant term, a nonlinear model, etc.).
Hypothesis testing on the parameters will allow us to determine whether they are significantly different from zero.

Hypothesis on the Slope β1
From the table of the analysis of variance, we have:

S² = RMSE = 1.721 ,

which allows us to calculate the standard deviation S_β̂1 of β̂1:

S_β̂1 = √( S² / Σ_{i=1}^{12} (Xi − X̄)² )
     = √( 1.721 / ((50 − 61.416)² + · · · + (72 − 61.416)²) )
     = √( 1.721 / 596.92 ) = 0.054 .

We can calculate the statistic:

t = β̂1 / S_β̂1 = 0.226 / 0.054 = 4.19 .

Choosing a significance level of α = 5%, we get:

tα/2,n−2 = t0.025,10 = 2.228 .

Comparing the computed value of t with the value in the table, we get:

|t| > tα/2,n−2 ,

which indicates that the null hypothesis

H0: β1 = 0

must be rejected in favor of the alternative hypothesis

H1: β1 ≠ 0 .

We conclude that there exists a linear relation between the GNP and the demand for staples. If H0 had been accepted, that would have meant that there was no linear relation between these two variables.
Remark: We can verify that F = t², since 17.69 ≈ 4.19² (up to rounding errors). In consequence, we can use either the t test or the F test to test the hypothesis H0: β1 = 0.


Hypothesis on the Ordinate at the Origin β0
The standard deviation S_β̂0 of β̂0 is:

S_β̂0 = √( S² · Σ_{i=1}^{12} Xi² / ( n · Σ_{i=1}^{12} (Xi − X̄)² ) )
     = √( 1.721 · (50² + · · · + 72²) / (12 · 596.92) )
     = √( 1.721 · 45861 / 7163.04 ) = 3.31 .

We can calculate the statistic:

t = β̂0 / S_β̂0 = −4.04 / 3.31 = −1.22 .

Comparing this value of t with the value of the Student table, t0.025,10 = 2.228, we get:

|t| < tα/2,n−2 ,

which indicates that the null hypothesis

H0: β0 = 0

must be accepted at the significance level α = 5%. We conclude that the value of β0 is not significantly different from zero, and the line passes through the origin.

New Model: Regression Through the Origin
The new model is the following:

Yi = β1·Xi + εi , i = 1, . . . , 12 .

Estimating the parameter β1 by the least-squares method, we minimize

Σ_{i=1}^{12} εi² = Σ_{i=1}^{12} (Yi − β1·Xi)² .

Setting the derivative with respect to β1 to zero, we get:

∂(Σ_{i=1}^{12} εi²)/∂β1 = −2·Σ_{i=1}^{12} Xi·(Yi − β1·Xi) = 0

or

Σ_{i=1}^{12} Xi·Yi − β1·Σ_{i=1}^{12} Xi² = 0 .

The value of β1 that satisfies this equation is the estimator β̂1 of β1:

β̂1 = Σ_{i=1}^{12} Xi·Yi / Σ_{i=1}^{12} Xi² = 7382 / 45861 = 0.161 .

In consequence, the new regression line is described by:

Ŷi = β̂1·Xi = 0.161·Xi .

The table of the analysis of variance relative to this new model is the following:

Analysis of variance
Source of variation   Degrees of freedom   Sum of squares   Mean of squares   F
Regression            1                    1188.2           1188.2            660.11
Residual              11                   19.8             1.8
Total                 12                   1208.0

Note that in a model without a constant, the number of degrees of freedom of the total variation equals the number of observations n, and the number of degrees of freedom of the residual variation equals n − 1.


Thus we can determine R²:

R² = ESS / TSS = 1188.2 / 1208.0 = 0.9836 = 98.36% .

With the first model we found the value R² = 63.90%. We can say that this new model without a constant term fits much better than the previous one.
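The main calculations of the example (least-squares estimates, analysis of variance, t statistic for the slope) can be reproduced with a short Python sketch using only the standard library.

```python
import math

# GNP (X) and demand for staples (Y), 1969-1980.
X = [50, 52, 55, 59, 57, 58, 62, 65, 68, 69, 70, 72]
Y = [6, 8, 9, 10, 8, 10, 12, 9, 11, 10, 11, 14]
n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n

sxx = sum((x - x_bar) ** 2 for x in X)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

b1 = sxy / sxx               # slope estimate, about 0.226
b0 = y_bar - b1 * x_bar      # intercept estimate, about -4.04
fitted = [b0 + b1 * x for x in X]

ess = sum((f - y_bar) ** 2 for f in fitted)          # regression sum of squares
rss = sum((y - f) ** 2 for y, f in zip(Y, fitted))   # residual sum of squares
s2 = rss / (n - 2)                                   # unbiased estimate of sigma^2
F = ess / s2                                         # about 17.7
r2 = ess / (ess + rss)                               # about 0.639
t_b1 = b1 / math.sqrt(s2 / sxx)                      # about 4.2; t_b1**2 equals F

print(round(b0, 2), round(b1, 3), round(F, 2), round(r2, 3), round(t_b1, 2))
```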

FURTHER READING
• Analysis of residuals
• Coefficient of determination
• Correlation coefficient
• Least squares
• Multiple linear regression
• Normal equations
• Regression analysis
• Residual

REFERENCES
Seber, G.A.F.: Linear Regression Analysis. Wiley, New York (1977)

Simple Random Sampling

Simple random sampling is a sampling method whereby one chooses n units among the N units of a population in such a way that each of the C_N^n possible samples has the same probability of being selected.

HISTORY
See sampling.

MATHEMATICAL ASPECTS
To obtain a simple random sample of size n, the individuals of a population are first numbered from 1 to N; then n numbers are drawn between 1 and N. The drawings are in principle done without replacement.
Tables of random numbers can also be used to obtain a simple random sample. The goal of a simple random sample is to provide unbiased estimations of the mean and of the variance of the population. Indeed, if we denote by y1, y2, . . . , yN the characteristics of a population and by y1, . . . , yn the corresponding values in the simple random sample, we obtain the following results:

            Population                                 Sample
Total       T = Σ_{i=1}^{N} yi                         t = Σ_{i=1}^{n} yi
Mean        Ȳ = T/N                                    ȳ = t/n
Variance    σ² = Σ_{i=1}^{N} (yi − Ȳ)² / (N − 1)       s² = Σ_{i=1}^{n} (yi − ȳ)² / (n − 1)

DOMAINS AND LIMITATIONS
Simple random sampling can be done with or without replacement. If the drawing is performed with replacement, then the population always remains the same. The sample is then characterized by a series of independent random variables that are identically distributed. If the population is sufficiently large, and if the size of the sample is relatively small with respect to the population, the random variables can be considered independent even if the drawings are done without replacement. It is then possible to estimate certain characteristics of the population by determining them from this sample. Simple random sampling is the basis of the theory of sampling.

EXAMPLES
Consider a population of 658 individuals. To form a simple random sample, we attribute a three-digit number to every individual: 001, 002, . . . , 658. Then, with a random number table, we obtain a sample of ten individuals. We randomly choose a starting digit and a reading direction, and we read the numbers three digits at a time until we obtain ten distinct numbers between 001 and 658, which define our sample.
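The drawing described in the example can be sketched with Python's random module taking the place of the random number table.

```python
import random

N, n = 658, 10
population = range(1, N + 1)            # individuals numbered 001, ..., 658
sample = random.sample(population, n)   # drawn without replacement
print(sorted(sample))
```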

FURTHER READING
• Cluster sampling
• Estimation
• Estimator
• Sampling
• Sampling distribution
• Stratified sampling
• Systematic sampling

REFERENCES
Cochran, W.G.: Sampling Techniques, 2nd edn. Wiley, New York (1963)

Simulation

Simulation is a method for analyzing, designing, and operating complex systems. It involves designing a model of a system and carrying out experiments on it as it progresses.
The fundamental problem of simulation is the construction of artificial samples relative to a statistically known distribution. These distributions are generally known empirically: they result from a statistical study from which we can determine the probability distributions of the random variables characterizing the phenomenon.
Once the distributions are known, the sample is constructed by generating random numbers.

HISTORY
The method of simulation has a long history. It was first introduced by Student (1908), who used it to discover the sampling distributions of the t statistic and of the coefficient of correlation.
More recently, thanks to computers, simulation methods and Monte Carlo methods have progressed rapidly.

EXAMPLES
Consider the phenomenon of waiting in the checkout line of a large store. Suppose that we want to know the best “system of cashiers”, that is, the one for which the sum of the costs of the cashiers' idle time and the costs of the customers' waiting time is smallest.
The evolution of the phenomenon essentially depends on the distribution of the arrivals of customers at the cashiers and the distribution of the service times. The simulation consists in constructing a sample of customer arrivals and a sample of service times. We can then calculate the cost of a system. The experiment can be repeated for different systems and the best system chosen. A simplified sketch of such a simulation is given below.
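A very simplified sketch for a single cashier: interarrival and service times are drawn from assumed exponential distributions, and the total customer waiting time and cashier idle time are accumulated. All numerical values (arrival rate, service rate, unit costs) are invented for the illustration.

```python
import random

random.seed(1)

def simulate_cashier(arrival_rate, service_rate, n_customers=10_000):
    """Single cashier, first come first served; returns total customer
    waiting time and total cashier idle time (same time unit as the rates)."""
    clock = free_at = waiting = idle = 0.0
    for _ in range(n_customers):
        clock += random.expovariate(arrival_rate)     # next customer arrives
        if clock >= free_at:
            idle += clock - free_at                   # cashier waits for a customer
            start = clock
        else:
            waiting += free_at - clock                # customer waits in line
            start = free_at
        free_at = start + random.expovariate(service_rate)
    return waiting, idle

waiting, idle = simulate_cashier(arrival_rate=1 / 2.0, service_rate=1 / 1.5)
cost = 0.30 * waiting + 0.20 * idle                   # invented unit costs
print(round(waiting), round(idle), round(cost))
```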

FURTHER READING
• Bootstrap
• Generation of random numbers
• Jackknife method
• Monte Carlo method
• Random number

REFERENCES
Gosset, S.W. “Student”: The probable error of a mean. Biometrika 6, 1–25 (1908)

Kleijnen, J.P.C.: Statistical Techniques in Simulation (in two parts). Vol. 9 of Statistics: Textbooks and Monographs. Marcel Dekker, New York (1974)

Kleijnen, J.P.C.: Statistical Tools for Simulation Practitioners. Vol. 76 of Statistics: Textbooks and Monographs. Marcel Dekker, New York (1987)

Naylor, T.H., Balintfy, J.L., Burdick, D.S., Chu, K.: Computer Simulation Techniques. Wiley, New York (1967)

Snedecor, George Waddel

Snedecor, George Waddel was born in 1881 in Memphis and died in 1974 in Amherst. In 1899, he entered the Alabama Polytechnic Institute in Auburn and stayed there for 2 years. He then spent 2 years earning a teacher's certificate. When his family moved to Tuscaloosa in 1903, Snedecor, George Waddel transferred to Alabama University, where he received, in 1905, his B.S. in mathematics and physics. He accepted his first academic position at the Selma Military Academy, where he taught from 1905 to 1907. From 1907 to 1910 he taught mathematics and Greek at Austin College in Sherman.
In 1910, Snedecor, George Waddel moved to Ann Arbor, MI, where he completed his education at the University of Michigan in 1913, having earned a master's degree. In the same year, he was hired at Iowa State College in Ames, where he stayed until 1958.
In 1927, the Mathematics Statistical Service was inaugurated in Ames with Snedecor, G.W. and Brandt, A.E. at the helm. This Service was the precursor of the Statistical Laboratory inaugurated in 1933, where Snedecor, G.W. was director and where Cox, Gertrude worked.
In 1931, Snedecor, G.W. was named professor in the Mathematics Department at Iowa State College. During this period, Snedecor, G.W. developed a friendship with Fisher, Ronald Aylmer. In 1947, Snedecor, G.W. left the Statistical Laboratory. He became professor of statistics at Iowa State College and remained in this position until his retirement in 1958. He died in February 1974.

Principal work of Snedecor, George Waddel:

1937 Statistical Methods Applied to Experiments in Agriculture and Biology. Collegiate Press, Ames, USA.

Spatial Data

Spatial data are a special kind of data (often called map data or geographic data) that refer to a spatial localization, such as, for example, the concentration of pollutants in a given area, clouds, etc., and all the information that can be represented in the form of a map. Included under spatial data are data with a denumerable collection of spatial sites, for example the distribution of child mortality in different cities and countries.
We can also consider that spatial data are the realization of a regionalized variable. The theory of the regionalized variable presupposes that any measure can be modeled (visualized) as the realization of a random function (or random process). For example, samples taken from the ground can be seen as realizations of a regionalized variable (see geostatistics).

HISTORY
With the development of territory information systems (TIS) and geographic information systems (GIS), the notion of spatial data became widely used in statistics.

DOMAINS AND LIMITATIONS
A GIS is a collection of information programs that facilitate, based on georeferences, the integration of spatial, nonspatial, qualitative, and quantitative data into databases that can be organized as a unique system. Spatial data come from widely differing fields and often demand a selection and an analysis according to criteria specific to each type of research. Satellites collect enormous quantities of data, and only a tiny fraction is analyzed. This wealth of data must be verified by the selection process. Data selection can be done according to the type of problem to resolve and the models used to make the analysis.

EXAMPLES
Generally, spatial data can be treated as realizations of random variables, as in an ordinary statistical analysis. Below we present, as an example of spatial data, data collected in the Swiss Jura: a set of 259 measurements of heavy-metal pollution in a region of 14.5 km², an extract of which is reproduced here:

X      Y      Ground   Rock   Cd      Co
2.386  3.077  3        3      1.740   9.32
2.544  1.972  2        2      1.335   10.00
2.807  3.347  2        3      1.610   10.60
4.308  1.933  3        2      2.150   11.92
4.383  1.081  3        5      1.565   16.32
3.244  4.519  3        5      1.145   3.50
3.925  3.785  3        5      0.894   15.08
2.116  3.498  3        1      0.525   4.20
1.842  0.989  3        1      0.240   4.52

Cr      Cu     Ni     Pb      Zn
38.32   25.72  21.32  77.36   92.56
40.20   24.76  29.72  77.88   73.56
47.00   8.88   21.40  30.80   64.80
43.52   22.70  29.72  56.40   90.00
38.52   34.32  26.20  66.40   88.40
40.40   31.28  22.04  72.40   75.20
30.52   27.44  21.76  60.00   72.40
25.40   66.12  9.72   141.00  72.08
27.96   22.32  11.32  52.40   56.40

“Ground”, which refers to the use of the ground, and “Rock”, which describes the type of underlying rock at the sampled location, are categorical data (qualitative categorical variables). The different types of rock are:
1. Argovian,
2. Kimmeridgian,
3. Sequanian,
4. Portlandian,
5. Quaternary.
The use of the ground corresponds to:
1. forest,
2. pasturage,
3. prairie,
4. plowed.

FURTHER READING
• Classification
• Geostatistics
• Histogram
• Sampling
• Spatial statistics

REFERENCES
Atteia, O., Dubois, J.-P., Webster, R.: Geostatistical analysis of soil contamination in the Swiss Jura. Environ. Pollut. 86, 315–327 (1994)

Goovaerts, P.: Geostatistics for Natural Resources Evaluation. Oxford University Press, Oxford (1997)

Matheron, G.: La théorie des variables régionalisées et ses applications. Masson, Paris (1965)

Spatial Statistics

Spatial statistics concerns the statistical analysis of spatial data. It covers all the techniques used to explore and prove the presence of a spatial dependence between observations distributed in space. Spatial data can come from geology, earth sciences, image processing, epidemiology, agriculture, ecology, astronomy, or forest sciences. The data are assumed to be random, and sometimes their locations are also assumed to be random.

HISTORY
The statistical treatment of spatial data dates back to Halley, Edmond (1686), who superposed on a single map the directions of the winds and monsoons between and around the tropics and tried to explain their causes.
Spatial statistics models appeared much later. Fisher, Ronald Aylmer (1920–1930), during his research at the Rothamsted Experimental Station in England, formulated the basic principles of random allocation, of analysis by blocks, and of replication.

MATHEMATICAL ASPECTS
The statistical analysis of spatial data forms the principal subject of spatial statistics. The principal tools of this statistical analysis are the histogram, the cumulative frequency curve, the Q-Q plot, the median, the mean of the distribution, the coefficients of variation, and the measure of skewness. In the case of many variables, the most common statistical approaches are to represent them in the form of a scatterplot and to study their correlation.
Spatial analysis has the distinguishing feature that it takes into account spatial information and the relation that exists between two spatial data points. Two data points close to one another are more likely to resemble one another than two that are farther apart. Thus when we analyze a map showing concentrations of pollutants on the ground, we do not see random values but, on the contrary, small values of pollutants grouped together as well as large values, with a continuous transition between the classes.

DOMAINS AND LIMITATIONS
Spatial representation in the form of maps and drawings and spatial statistics can be united to give a better visual understanding of the influence that neighboring values have on one another. Superposing statistical information on a map allows the information to be better analyzed and the appropriate conclusions to be drawn.

EXAMPLES
See spatial data.

FURTHER READING
• Autocorrelation
• Geostatistics
• Spatial data

REFERENCES
Cressie, N.A.C.: Statistics for Spatial Data. Wiley, New York (1991)

Goovaerts, P.: Geostatistics for Natural Resources Evaluation. Oxford University Press, Oxford (1997)

Isaaks, E.H., Srivastava, R.M.: An Introduction to Applied Geostatistics. Oxford University Press, Oxford (1989)

Journel, A., Huijbregts, C.J.: Mining Geostatistics. Academic, New York (1981)

Matheron, G.: La théorie des variables régionalisées et ses applications. Masson, Paris (1965)

Oliver, M.A., Webster, R.: Kriging: a method of interpolation for geographical information systems. Int. J. Geogr. Inf. Syst. 4(3), 313–332 (1990)

Spearman Rank Correlation Coefficient

The Spearman rank correlation coefficient (Spearman's ρ) is a nonparametric measure of correlation. It is used to determine the relation existing between two sets of data.

HISTORY
Spearman, Charles was a psychologist. In 1904 he introduced the rank correlation coefficient for the first time. Often called Spearman's ρ, it is one of the oldest rank statistics.

MATHEMATICAL ASPECTS
Let (X1, X2, . . . , Xn) and (Y1, Y2, . . . , Yn) be two samples of size n. RXi denotes the rank of Xi compared to the other values of the X sample, for i = 1, 2, . . . , n: RXi = 1 if Xi is the smallest value of X, RXi = 2 if Xi is the second smallest value, and so on up to RXi = n if Xi is the largest value of X. In the same way, RYi denotes the rank of Yi, for i = 1, 2, . . . , n.
The Spearman rank correlation coefficient, generally denoted by ρ, is defined by:

ρ = 1 − 6·Σ_{i=1}^{n} di² / (n(n² − 1)) ,

where di = RXi − RYi.
If several observations have exactly the same value, an average rank is given to these observations. If there are many average ranks, it is best to make a correction and to calculate:

ρ = (Sx + Sy − Σ_{i=1}^{n} di²) / (2·√(Sx·Sy)) ,

where

Sx = ( n(n² − 1) − Σ_{i=1}^{g} (ti³ − ti) ) / 12 ,

with g the number of groups with average ranks and ti the size of group i for the X sample, and

Sy = ( n(n² − 1) − Σ_{j=1}^{h} (tj³ − tj) ) / 12 ,

with h the number of groups with average ranks and tj the size of group j for the Y sample.
(If there are no average ranks, the observations are seen as groups of size 1, meaning that g = h = n and ti = tj = 1 for i, j = 1, 2, . . . , n, and Sx = Sy = n(n² − 1)/12.)

Hypothesis Testing
The Spearman rank correlation coefficient is often used as a statistical test to determine whether there exists a relation between two random variables. The test can be a bilateral test or a unilateral test. The hypotheses are:

A: Bilateral case
H0: X and Y are mutually independent.
H1: There is either a positive or a negative correlation between X and Y.

There is a positive correlation when the large values of X have a tendency to be associated with large values of Y and small values of X with small values of Y. There is a negative correlation when large values of X have a tendency to be associated with small values of Y and vice versa.

B: Unilateral case
H0: X and Y are mutually independent.
H1: There is a positive correlation between X and Y.

C: Unilateral case
H0: X and Y are mutually independent.
H1: There is a negative correlation between X and Y.

Decision Rules
The decision rules differ depending on the hypotheses. That is why there are decision rules A, B, and C relative to the previous cases.

Decision rule A
Reject H0 at the significance level α if

ρ > t_{n,1−α/2}   or   ρ < t_{n,α/2} ,

where t is the critical value of the test given by the Spearman table.

Decision rule B
Reject H0 at the significance level α if

ρ > t_{n,1−α} .

Decision rule C
Reject H0 at the significance level α if

ρ < t_{n,α} .

Remark: The notation t does not mean that the Spearman critical values are related to those of Student.

DOMAINS AND LIMITATIONS
The Spearman rank correlation coefficient is used as a hypothesis test to study the dependence between two random variables. It can be considered as a test of independence. As a nonparametric measure of correlation, it can also be used with nominal or ordinal data.
A measure of correlation between two variables must satisfy the following points:
1. It takes values between −1 and +1.
2. There is a positive correlation between X and Y if the value of the correlation coefficient is positive; a perfect positive correlation corresponds to a value of +1.
3. There is a negative correlation between X and Y if the value of the correlation coefficient is negative; a perfect negative correlation corresponds to a value of −1.
4. There is null correlation between X and Y when the correlation coefficient is close to zero; one can also say that X and Y are not correlated.
The Spearman rank correlation coefficient presents the following advantages:
• The data can be nonnumerical observations as long as they can be ranked according to certain criteria.
• It is easy to calculate.
• The associated statistical test makes no assumption about the shape of the distribution of the population from which the samples are taken.
The Spearman table gives the theoretical values of the Spearman rank correlation coefficient under the hypothesis of independence of the two random variables. Here is an extract of the Spearman table for n = 6, 7, and 8 and α = 0.05 and 0.025:

n    α = 0.05   α = 0.025
6    0.7714     0.8286
7    0.6786     0.7450
8    0.6190     0.7143

A complete Spearman table can be found in Glasser, G.J. and Winter, R.F. (1961).

EXAMPLES
In this example eight pairs of identical twins take intelligence tests. The goal is to see whether there is independence between the test score of the first-born twin and that of the twin born second. The data are given in the table below, the highest scores corresponding to the best results.

Pair of twins   Born 1st Xi   Born 2nd Yi
1               90            88
2               75            79
3               99            98
4               60            66
5               72            64
6               83            83
7               86            86
8               92            95

The X are ranked amongst themselves and the Y amongst themselves, and di is calculated. This gives:

Pair of twins   Born 1st Xi   RXi   Born 2nd Yi   RYi   di
1               90            6     88            6     0
2               75            3     79            3     0
3               99            8     98            8     0
4               60            1     66            2     −1
5               72            2     64            1     1
6               83            4     83            4     0
7               86            5     86            5     0
8               92            7     95            7     0

The Spearman rank correlation coefficient is then calculated:

ρ = 1 − 6·Σ_{i=1}^{8} di² / (n(n² − 1)) = 1 − 6·2 / (8(8² − 1)) = 0.9762 .

This shows that there is an (almost perfect) positive correlation between the intelligence tests.
Suppose that the results for pairs 7 and 8 are changed. We then obtain the following table:

Pair of twins   Born 1st Xi   RXi    Born 2nd Yi   RYi    di
1               90            6.5    88            5.5    1
2               75            3      79            3      0
3               99            8      98            7.5    0.5
4               60            1      66            2      −1
5               72            2      64            1      1
6               83            4.5    83            4      0.5
7               83            4.5    88            5.5    −1
8               90            6.5    98            7.5    −1

Since there are average ranks, we use the formula

ρ = (Sx + Sy − Σ_{i=1}^{8} di²) / (2·√(Sx·Sy))

to calculate the Spearman rank correlation coefficient. We first calculate Sx:

Sx = ( n(n² − 1) − Σ_{i=1}^{g} (ti³ − ti) ) / 12 = ( 8(8² − 1) − [2³ − 2 + 2³ − 2] ) / 12 = 41 .

Then Sy:

Sy = ( n(n² − 1) − Σ_{j=1}^{h} (tj³ − tj) ) / 12 = ( 8(8² − 1) − [2³ − 2 + 2³ − 2] ) / 12 = 41 .

The value of ρ becomes the following:

ρ = ( 41 + 41 − (1 + 0 + 0.25 + 1 + 1 + 0.25 + 1 + 1) ) / (2·√(41·41)) = 75.5 / 82 = 0.9207 .

We see that in this case there is also an almost perfect positive correlation.
We now carry out the hypothesis test:

H0: There is independence between the intelligence tests of a pair of twins.
H1: There is a positive correlation between the intelligence tests.

We choose a significance level of α = 0.05. Since we are in case B, H0 is rejected if

ρ > t_{8,0.95} ,

where t_{8,0.95} is the value of the Spearman table, meaning if

ρ > 0.6190 .

In both cases (ρ = 0.9762 and ρ = 0.9207), H0 is rejected. We can conclude that there is a positive correlation between the results of the intelligence tests of a pair of twins.
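The tie-corrected coefficient of the second calculation can be reproduced with the following sketch; the mid-ranks are computed directly from the data.

```python
def ranks(values):
    """Mid-ranks (average ranks) of a list of values."""
    ordered = sorted(values)
    return [(2 * ordered.index(v) + 1 + ordered.count(v)) / 2 for v in values]

def spearman(x, y):
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    def s(vals):
        sizes = [vals.count(v) for v in set(vals)]        # tie group sizes
        return (n * (n * n - 1) - sum(t ** 3 - t for t in sizes)) / 12
    sx, sy = s(x), s(y)
    return (sx + sy - d2) / (2 * (sx * sy) ** 0.5)

x = [90, 75, 99, 60, 72, 83, 83, 90]    # first-born twins (modified data)
y = [88, 79, 98, 66, 64, 83, 88, 98]    # second-born twins (modified data)
print(round(spearman(x, y), 4))          # 0.9207
```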

FURTHER READING
• Hypothesis testing
• Nonparametric test
• Test of independence

REFERENCES
Glasser, G.J., Winter, R.F.: Critical values of the coefficient of rank correlation for testing the hypothesis of independence. Biometrika 48, 444–448 (1961)

Spearman, C.: The Proof and Measurement of Association between Two Things. Am. J. Psychol. 15, 72–101 (1904)

Spearman Table

The Spearman table gives the theoretical values of the Spearman rank correlation coefficient under the hypothesis of independence of two random variables.

HISTORY
See Spearman rank correlation coefficient.

FURTHER READING
• Spearman rank correlation coefficient

Standard Deviation

The standard deviation is a measure of dispersion. It corresponds to the positive square root of the variance, where the variance is the mean of the squared deviations of each observation with respect to the mean of the set of observations.
It is usually denoted by σ when it is relative to a population and by S when it is relative to a sample.
In practice, the standard deviation σ of a population will be estimated by the standard deviation S of a sample of this population.


HISTORY
The term standard deviation is closely related to the work of two English mathematicians, Pearson, Karl and Gosset, W.S. It was indeed during a conference that he gave before the London Royal Society in 1893 that Pearson, K. used the term for the first time. He used it again in his article entitled “On the Dissection of Asymmetrical Frequency Curves” in 1894, and it was also Pearson, K. who introduced the symbol σ to denote the standard deviation. Gosset, W.S. (Student) also worked on these problems. He explained why it is important to distinguish S (standard deviation relative to a sample) from σ (standard deviation relative to a population).
In his article of March 1908, Gosset, W.S. defined the standard deviation of a sample by:

$$S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}}\,.$$

A question that came up was whether this expression should be divided by n or by n − 1. Pearson, K. asked the opinion of Gosset, W.S., who answered in a letter of 13 March 1927 that both formulas could be found in the literature. The one with an n − 1 denominator gives a mean value that is independent of the size of the sample and therefore of the population. On the other hand, with large samples, the difference between both formulas is negligible and the calculations are simpler if the formula with an n denominator is used. On the whole, Gosset, W.S. indicates that the use of n − 1 is probably more appropriate for small samples, whereas the use of n is preferable for large samples.
Note that the discovery of the standard deviation is to be placed in the context of the theory of estimation and of hypothesis testing. Also, at first the study of variability attracted the attention of astronomers because they were interested in discoveries related to the distribution of errors.

MATHEMATICAL ASPECTS
Consider a random variable X with an expected value E[X]. The variance σ² of the population is defined by:

$$\sigma^2 = E\left[(X - E[X])^2\right]$$

or

$$\sigma^2 = E\left[X^2\right] - E[X]^2\,.$$

The standard deviation σ is the positive square root of σ²:

$$\sigma = \sqrt{E\left[(X - E[X])^2\right]}\,.$$

To estimate the standard deviation σ of a population, the distribution of the population should be known, but in practice it is convenient to estimate σ from a random sample x1, . . . , xn with the formula:

$$S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}\,,$$

where S is the estimator of σ and x̄ is the arithmetic mean of the sample:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\,.$$


The standard deviation can also be calculated using the following alternative formula:

$$S = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}} = \sqrt{\frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)}}\,.$$

DOMAINS AND LIMITATIONS
The standard deviation is used as a scale of measurement in tests of hypotheses and confidence intervals. Other measures of dispersion such as the range or the mean deviation are also used in descriptive statistics, but they play a less important role in tests of hypotheses. It can be verified that:

$$\text{mean deviation} \;\leq\; S \;\leq\; \frac{\text{range}}{2}\cdot\sqrt{\frac{n}{n-1}}\,.$$

If we consider a random variable that follows a normal distribution of mean μ and standard deviation σ, we can verify that 68.26% of the observations are located in the interval μ ± σ. Also, 95.44% of the observations are located in the interval μ ± 2σ, and 99.74% of the observations are located in the interval μ ± 3σ.

EXAMPLES
Five students have successively passed two exams on which they obtained the following grades:

Exam 1:
3.5  4  4.5  3.5  4.5    x̄ = 20/5 = 4

Exam 2:
2.5  5.5  3.5  4.5  4    x̄ = 20/5 = 4

The arithmetic mean x̄ of these two sets of observations is identical. Nevertheless, the dispersion of the observations around the mean is not the same.
To calculate the standard deviation, we first have to calculate the deviations of each observation with respect to the arithmetic mean and then square these deviations:

Exam 1:

Grade (xi − x) (xi − x)2

3.5 −0.5 0.25

4 0.0 0.00

4.5 0.5 0.25

3.5 −0.5 0.25

4.5 0.5 0.25

Sum: ∑(xi − x̄)² = 1.00

Exam 2:

Grade (xi − x) (xi − x)2

2.5 −1.5 2.25

5.5 1.5 2.25

3.5 −0.5 0.25

4.5 0.5 0.25

4 0.0 0.00

Sum: ∑(xi − x̄)² = 5.00

The standard deviation for each exam is equal to:

Exam 1:
$$S = \sqrt{\frac{\sum_{i=1}^{5}(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{1}{4}} = 0.5$$

Exam 2:
$$S = \sqrt{\frac{\sum_{i=1}^{5}(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{5}{4}} = 1.118$$

Since the standard deviation of the grades in the second exam is larger, the grades are more dispersed around the arithmetic mean than for the first exam. The variability of the second exam is therefore larger than that of the first.
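As a small illustration, the following Python sketch (our own, not part of the entry) computes the sample standard deviation with the n − 1 denominator for the two exams:

```python
# Sketch only: sample standard deviation applied to the two exams above.
from math import sqrt

def sample_std(data):
    n = len(data)
    mean = sum(data) / n
    return sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

exam1 = [3.5, 4, 4.5, 3.5, 4.5]
exam2 = [2.5, 5.5, 3.5, 4.5, 4]
print(sample_std(exam1))   # 0.5
print(sample_std(exam2))   # about 1.118
```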

FURTHER READING
• Coefficient of variation
• Mean absolute deviation
• Measure of dispersion
• Variance

REFERENCES
Barnard, G.A., Plackett, R.L., Pearson, S.: “Student”: A Statistical Biography of William Sealy Gosset: Based on Writings by E.S. Pearson. Oxford University Press, Oxford (1990)

Pearson, K.: Contributions to the mathematical theory of evolution. I. In: Karl Pearson’s Early Statistical Papers. Cambridge University Press, Cambridge, pp. 1–40 (1948). First published in 1894 as: On the dissection of asymmetrical frequency curves. Philos. Trans. Roy. Soc. Lond. Ser. A 185, 71–110

Gosset, W.S. “Student”: The Probable Error of a Mean. Biometrika 6, 1–25 (1908)

Standard Error

The standard error is the square root of the estimated variance of a statistic, meaning the standard deviation of the sampling distribution of this statistic.

HISTORY
The standard error notion is attributed to Gauss, C.F. (1816), who apparently did not know the concept of the sampling distribution of a statistic used to estimate the value of a parameter of a population.

MATHEMATICAL ASPECTS
The standard error of the mean, or standard deviation of the sampling distribution of the means, denoted by σx̄, is calculated as a function of the standard deviation of the population and of the respective sizes of the finite population and of the sample:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N}}\,,$$

where σ is the standard deviation of the population, N is the size of the population, and n is the sample size.
If the size N of the population is infinite, the correction factor

$$\sqrt{\frac{N-n}{N}}$$

can be omitted because it tends toward 1 when N tends toward infinity.
If the population is infinite, or if the sampling is nonexhaustive (with replacement), then the standard error of the mean is equal to the standard deviation of the population divided by the square root of the size of the sample:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\,.$$

If the standard deviation σ of the population is unknown, it can be estimated by the standard deviation S of the sample given by the square root of the variance S²:

$$S^2 = \frac{\sum_{j=1}^{n}(x_j - \bar{x})^2}{n-1}\,.$$

The same formulas are valid for the standard error of a proportion, or standard deviation of the sampling distribution of the proportions, denoted by σp:

$$\sigma_p = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N}}$$


in the case of a finite population and

$$\sigma_p = \frac{\sigma}{\sqrt{n}}$$

in the case of an infinite population, where σ is the standard deviation of the population:

$$\sigma = \sqrt{p \cdot (1-p)} = \sqrt{p \cdot q}\,,$$

where p is the probability that an element possesses the studied trait and q = 1 − p is the probability that it does not.
The standard error of the difference between two independent quantities is the square root of the sum of the squared standard errors of both quantities. For example, the standard error of the difference between two means is equal to:

$$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\,,$$

where σ1² and σ2² are the respective variances of the two infinite populations to be compared and n1 and n2 the respective sizes of the two samples.
If the variances of the two populations are unknown, we can estimate them with the variances calculated on the two samples, which gives:

$$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}\,,$$

where S1² and S2² are the respective variances of the two samples.
If we consider that the variances of the two populations are equal, we can estimate the value of these variances with a pooled variance Sp² calculated as a function of S1² and S2²:

$$S_p^2 = \frac{(n_1-1) \cdot S_1^2 + (n_2-1) \cdot S_2^2}{n_1 + n_2 - 2}\,,$$

which gives:

$$\sigma_{\bar{x}_1-\bar{x}_2} = S_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\,.$$

The standard error of the difference between two proportions is calculated in the same way as the standard error of the difference between two means. Therefore:

$$\sigma_{p_1-p_2} = \sqrt{\sigma_{p_1}^2 + \sigma_{p_2}^2} = \sqrt{\frac{p_1 \cdot (1-p_1)}{n_1} + \frac{p_2 \cdot (1-p_2)}{n_2}}\,,$$

where p1 and p2 are the proportions for infinite populations calculated on the two samples of size n1 and n2.

EXAMPLES
(1) Standard error of the mean
A manufacturer wants to test the precision of a new machine to make bolts 8 mm in diameter. On a lot of N = 10000 pieces, a sample of n = 100 pieces is examined; the standard deviation S is 1.2 mm.
The standard error of the mean is equal to:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N}}\,.$$

Since the standard deviation σ of the population is unknown, we can estimate it using the standard deviation of the sample S, which gives:

$$\sigma_{\bar{x}} = \frac{S}{\sqrt{n}} \cdot \sqrt{\frac{N-n}{N}} = \frac{1.2}{10} \cdot \sqrt{\frac{10000-100}{10000}} = 0.12 \cdot 0.995 = 0.119\,.$$

Note that the factor

$$\sqrt{\frac{N-n}{N}} = 0.995$$

has little influence on the result as the size of the population is large enough.
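The following Python sketch (our own illustration; the function name is hypothetical) computes the standard error of the mean with and without the finite-population correction used in this example:

```python
# Sketch only: standard error of the mean, optionally with the
# finite-population correction factor sqrt((N - n) / N).
from math import sqrt

def se_mean(s, n, N=None):
    """s: sample standard deviation, n: sample size,
    N: population size (omit for an infinite population)."""
    se = s / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / N)
    return se

print(round(se_mean(1.2, 100, 10000), 3))  # 0.119, as in the bolt example
print(round(se_mean(1.2, 100), 3))         # 0.12 without the correction
```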

(2) Standard error of a proportion
A candidate conducts a survey on a sample of 200 people to know if he will have more than 50% (= π0) of the vote, π0 being the presumed value of the parameter π (proportion of the population).
The standard error of the proportion σp is equal to:

$$\sigma_p = \sqrt{\frac{\pi_0 \cdot (1-\pi_0)}{n}}\,,$$

where n is the sample size (200 in our example). We can consider that the population is infinite, which is why we do not take into account the corrective factor:

$$\sigma_p = \sqrt{\frac{0.5 \cdot (1-0.5)}{200}} = 0.0354\,.$$

(3) Standard error of the difference between two quantities
An insurance company decides to equip its offices with microcomputers. It wants to buy these microcomputers from two different suppliers as long as there is no significant difference in durability between the two brands. It tests a sample of 35 microcomputers of brand 1 and 32 of brand 2, noting the time that passed before the first breakdown. The observed data are the following:

Brand 1: Standard deviation S1 = 57.916

Brand 2: Standard deviation S2 = 57.247

Case 1:
We suppose that the variances of the two populations are equal. The pooled standard deviation is equal to:

$$S_p = \sqrt{\frac{(n_1-1) \cdot S_1^2 + (n_2-1) \cdot S_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{34 \cdot 57.916^2 + 31 \cdot 57.247^2}{35 + 32 - 2}} = 57.5979\,.$$

Knowing the pooled standard deviation, we can calculate the value of the standard deviation of the sampling distribution using the following formula:

$$\sigma_{\bar{x}_1-\bar{x}_2} = S_p \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 57.5979 \cdot \sqrt{\frac{1}{35} + \frac{1}{32}} = 14.0875\,.$$

Case 2:
If we suppose the variances of the two populations to be unequal, we must start by calculating the standard error of the mean for each sample:

$$\sigma_{\bar{x}_1} = \frac{S_1}{\sqrt{n_1}} = \frac{57.916}{\sqrt{35}} = 9.79\,,$$
$$\sigma_{\bar{x}_2} = \frac{S_2}{\sqrt{n_2}} = \frac{57.247}{\sqrt{32}} = 10.12\,,$$
$$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\sigma_{\bar{x}_1}^2 + \sigma_{\bar{x}_2}^2} = \sqrt{9.79^2 + 10.12^2} = 14.08\,.$$

We notice in this example that the two cases give values of the standard error that are almost identical. This is explained by the very slight difference between the standard deviations of the two samples S1 and S2.
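The two estimates of the standard error of the difference can be checked with a short program. The sketch below is our own (not from the entry) and simply applies the pooled and unpooled formulas to the data of case 1 and case 2:

```python
# Sketch only: standard error of a difference of means, pooled and unpooled.
from math import sqrt

def se_diff_pooled(s1, n1, s2, n2):
    """Assumes equal population variances (pooled standard deviation)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return sqrt(sp2) * sqrt(1 / n1 + 1 / n2)

def se_diff_unpooled(s1, n1, s2, n2):
    """No equal-variance assumption: combine the two standard errors."""
    return sqrt(s1**2 / n1 + s2**2 / n2)

s1, n1, s2, n2 = 57.916, 35, 57.247, 32
print(round(se_diff_pooled(s1, n1, s2, n2), 2))    # about 14.09 (14.0875 above)
print(round(se_diff_unpooled(s1, n1, s2, n2), 2))  # about 14.08
```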

FURTHER READING
• Hypothesis testing
• Sampling distribution
• Standard deviation


REFERENCES
Box, G.E.P., Hunter, W.G., Hunter, J.S.: Statistics for Experimenters. An Introduction to Design, Data Analysis, and Model Building. Wiley, New York (1978)

Cochran, W.G.: Sampling Techniques, 2nd edn. Wiley, New York (1963)

Gauss, C.F.: Bestimmung der Genauigkeit der Beobachtungen, vol. 4, pp. 109–117 (1816). In: Gauss, C.F. Werke (published in 1880). Dieterichsche Universitäts-Druckerei, Göttingen

Gauss, C.F.: Gauss’ Work (1803–1826) on the Theory of Least Squares. Trotter, H.F., trans. Statistical Techniques Research Group, Technical Report No. 5, Princeton University Press, Princeton, NJ (1957)

Standardized Data

Standardized data are data from which we subtract the mean of the observations related to the random variable (associated with the data) and then divide by the standard deviation of the observations. Thus standardized data have a mean of 0 and a variance of 1.

MATHEMATICAL ASPECTS
The standardization of a data series X with terms xi, i = 1, . . . , n is the same as replacing the term xi of the series by xis:

$$x_i^s = \frac{x_i - \bar{x}}{s}\,,$$

where x̄ and s are, respectively, the mean and the standard deviation of the xi.

DOMAINS AND LIMITATIONS
Certain statisticians have a habit of standardizing explanatory variables in a linear regression model to simplify the numerical difficulties arising from matrix calculations and to facilitate the interpretation and comparison of the regression coefficients. The application of the least-squares method using standardized data gives results equivalent to those obtained using the original data.
A regression is generally modeled by

$$Y_i = \sum_{j=1}^{p} \beta_j X_{ij}\,, \quad i = 1, \ldots, n\,,$$

where Xj is the nonstandardized data. If Xij is standardized (mean 0 and variance 1), βj is called the standardized regression coefficient and is equivalent to the correlation if Y is also standardized. The same terminology can be used if Xj is reduced to having a variance of 1, but not necessarily a zero mean. The coefficient β0 absorbs the difference.

EXAMPLES
In the following example, the observations relate to 13 mixtures of cement. Each mixture is composed of four ingredients given in the table. The goal of the experiment is to determine how the quantities X1, X2, X3, and X4 of these four ingredients influence the quantity of heat Y emitted by the hardening of the cement.

Table: Heat emitted by cement

Mixture i   X1   X2   X3   X4   Heat Y

1 7 26 6 60 78.5

2 1 29 15 52 74.3

3 11 56 8 20 104.3

4 11 31 8 47 87.6

5 7 52 6 33 95.9

6 11 55 9 22 109.2

7 3 71 17 6 102.7

8 1 31 22 44 72.5


9 2 54 18 22 93.1

10 21 47 4 26 115.9

11 1 40 23 34 83.9

12 11 66 9 12 113.3

13 10 68 8 12 109.4

Source: Birkes and Dodge (1993)

yi: quantity of heat given off by the hardening of the ith mixture (in joules);
xi1: quantity of ingredient 1 (tricalcium aluminate) in the ith mixture;
xi2: quantity of ingredient 2 (tricalcium silicate) in the ith mixture;
xi3: quantity of ingredient 3 (tetracalcium aluminoferrite) in the ith mixture;
xi4: quantity of ingredient 4 (dicalcium silicate) in the ith mixture.

The linear regression model obtained is:

$$Y = 62.4 + 1.55X_1 + 0.51X_2 + 0.102X_3 - 0.144X_4 + \varepsilon\,.$$

The following table presents the cement data with the standardized explanatory variables, which we denote by Xs1, Xs2, Xs3, and Xs4.

Data on cement with standardized explana-tory variables

Mixture i   Xs1   Xs2   Xs3   Xs4   Heat Y

1 −0.079 −1.424 −0.901 1.792 78.5

2 −1.099 −1.231 0.504 1.314 74.3

3 0.602 0.504 −0.589 −0.597 104.3

4 0.602 −1.102 −0.589 1.016 87.6

5 −0.079 0.247 −0.901 0.179 95.9

6 0.602 0.440 −0.432 −0.478 109.2

7 −0.759 1.468 0.817 −1.434 102.7

8 −1.099 −1.102 1.597 0.836 72.5


9 −0.929 0.376 0.973 −0.478 93.1

10 2.302 −0.074 −1.213 −0.239 115.9

11 −1.099 −0.524 1.753 0.239 83.9

12 0.602 1.147 −0.432 −1.075 113.3

13 0.432 1.275 −0.589 −1.075 109.4

The estimation by the least-squares method of the complete model using the standardized explanatory variables gives:

$$Y = 95.4 + 9.12X_1^s + 7.94X_2^s + 0.65X_3^s - 2.41X_4^s\,.$$

Note that the coefficient βs1 = 9.12 of Xs1 can be calculated by multiplying the coefficient β1 = 1.55 of X1 in the initial model by the standard deviation s1 = 5.882 of X1. Also note that the mean x̄1 = 7.46 of X1 is considered not in the relation between βs1 and β1 but in the relation between βs0 and β0. Taking only the first two standardized explanatory variables, we obtain the solution

$$Y = 95.4 + 8.64X_1^s + 10.3X_2^s\,.$$
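The equivalence of the two fits, and the relation βsj = βj · sj, can be verified numerically. The following Python sketch is our own (not from Birkes and Dodge) and assumes NumPy is available:

```python
# Sketch only: least squares on raw and on standardized explanatory
# variables, checking that beta_s[j] = beta[j] * std(X_j).
import numpy as np

X = np.array([[7, 26, 6, 60], [1, 29, 15, 52], [11, 56, 8, 20],
              [11, 31, 8, 47], [7, 52, 6, 33], [11, 55, 9, 22],
              [3, 71, 17, 6], [1, 31, 22, 44], [2, 54, 18, 22],
              [21, 47, 4, 26], [1, 40, 23, 34], [11, 66, 9, 12],
              [10, 68, 8, 12]], dtype=float)
y = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
              72.5, 93.1, 115.9, 83.9, 113.3, 109.4])

def fit_with_intercept(Z, y):
    A = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                     # [beta0, beta1, ..., betap]

beta = fit_with_intercept(X, y)
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized columns
beta_s = fit_with_intercept(Xs, y)

print(np.round(beta, 3))    # close to 62.4, 1.55, 0.51, 0.102, -0.144
print(np.round(beta_s, 2))  # close to 95.4, 9.12, 7.94, 0.65, -2.41
print(np.round(beta[1] * X[:, 0].std(ddof=1), 2))   # about 9.12 = 1.55 * 5.882
```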

FURTHER READING
• Data
• Ridge regression

REFERENCES
Birkes, D., Dodge, Y.: Alternative Methods of Regression. Wiley, New York (1993)

Statistical Software

Statistical software is a set of computer programs and procedures for the treatment of statistical data.


Libraries
A library is a series of programs or subroutines that are installed under the same operating system (DOS, Windows, or Unix) and can be used by entering individual commands.

IMSL (International Mathematics and Statistics Library): a collection of about 540 subroutines written in FORTRAN specifically concerning mathematics and statistics.

NAG (Numerical Algorithms Group): a library of algorithms written with the help of three languages (ALGOL 60, ALGOL 68, and FORTRAN ANSI). We find here more than 600 advanced subroutines concerning simulation, regression analysis, techniques of optimization, GLIMs (general linear models), GenStat, time series, and graphical representations. Moreover, we can treat in a simple manner answers obtained by census.

Software
A software package is a set of complex algorithms sharing a common data structure and requiring a minimum of previous programming on the part of the user.

BMDP: a set of subroutines written in FORTRAN that allow one to treat enquiries made by survey, to generate graphical representations, to apply multivariate techniques or nonparametric tests, or to carry out a linear or nonlinear regression analysis, without forgetting the study of time series.

IMSL (International Mathematics and Statistics Library): a set of subroutines written in FORTRAN specifically concerning the fields of mathematics and statistics.

MINITAB: by far the easiest system to use. Since it can be used in an interactive mode, it allows one to carry out analyses of various types: descriptive statistics, analysis of variance, regression analysis, nonparametric tests, random-number generation, or the study of time series, among other things.

NAG: a library of algorithms written with the help of three languages (ALGOL 60, ALGOL 68, FORTRAN ANSI). Advanced subroutines can be found concerning simulation, regression analysis, optimization techniques, time series, and graphical representations. Moreover, the answers obtained by survey can be easily treated, and multivariate analysis as well as nonparametric tests can be applied to them.

P-STAT: an interactive system that treats designs of experiments, graphical representations, nonparametric tests, and data analysis.

SAS (Statistical Analysis System): a set of software including a basic SAS language as well as specific programs. Surely the most complete package, it allows one to analyze data from very diverse fields.

SPSS (Statistical Package for the Social Sciences): one of the most commonly used systems. It offers interactive ways of treating large databases. Also, some algorithms for graphical representations are included in the program.


BMDP: basically for biomedical applications, BMDP is a complete software kit composed of subroutines written in FORTRAN that allows one to treat inquiries made by census, to generate graphical representations, to apply multivariate techniques and nonparametric tests, or to perform a linear or nonlinear regression analysis, without neglecting studies of time series. Each subprogram is based on highly competitive numerical algorithms.

DataDesk: an interactive software package for the statistical treatment and analysis of data. It depicts tools graphically, allowing one to see relations among data, tendencies, subgroups, and outliers.

EViews: software essentially for economists, EViews is the tool of choice for analysis, forecasting, and econometric modeling. EViews can be used for general statistical analysis, estimating time series, large-scale simulations, graphics, or simple data management.

Excel: allows the user to represent and analyze data with the help of basic statistical methods: descriptive statistics, analysis of variance, regression and correlation, and hypothesis testing.

Gauss: a matrix-oriented statistical program, Gauss is a powerful econometric and graphic tool.

JMP: statistical software made for experimental designs in particular, JMP also contains an exhaustive suite of statistical tools.

Lisrel: used in the social sciences and, more specifically, for factorial analysis and modeling, Lisrel is particularly useful for modeling the representation of latent variables.

Maple: a complete mathematical environment designed for the manipulation of algebraic expressions and the resolution of equations and integrals; it also has very powerful graphic tools in two and three dimensions. The user can also program in Maple.

Mathematica: a statistical suite that allows one to perform descriptive univariate and multivariate statistical analysis, data smoothing, classical hypothesis testing, confidence interval estimation, and linear and nonlinear regression.

Matlab: an interactive environment that can be used to analyze scientific and statistical data. The basic objects of Matlab are matrices. The user can perform numerical analyses, signal processing, and image processing, and can program functions using C and FORTRAN.

MIM: a program for graphical modeling of discrete and continuous data. Among the families of models available in this software suite, we mention log-linear models, Gaussian graphical models, standard Manova models, and many other models useful for multivariate analysis.

P-STAT: created in 1960 for the treatment of pharmacological data, P-STAT is now used by researchers in the social sciences. It allows one to carry out an analysis of censuses. The power of P-STAT lies in its macros, and it is very useful for statistical applications related to business. It allows one to treat large data sets and treats experimental designs, graphical representations, nonparametric tests, and data analysis.

R: a programming language similar to S, R is also free software widely used in academia. It represents an implementation different from S, but many S commands work in R. R contains a wide variety of statistical commands (linear and nonlinear modeling, execution of classical tests, analysis of time series, classification, segmentation, . . . ) as well as graphical techniques.

SAS (Statistical Analysis System): a suite of software programs including a basic SAS language as well as specific programs. No doubt the most complete package, it allows one to analyze data of various origins: research, engineering, medicine, and business applications. It includes a significant number of statistical tools: analysis of variance, regression analysis, categorical data analysis, multivariate analysis, survival analysis, decision trees, nonparametric analysis, and “data mining”.

S-plus: software written in a powerful and flexible object-oriented programming language called S, it allows for matrix, mathematical, and statistical calculations. It includes a tool for exploratory visual analysis that also allows for advanced data analysis as well as modeling. S-plus allows geostatisticians to put together a block “S+SpatialStats” containing different tools of spatial statistics.

Spad-T: based purely on statistical techniques of multivariate analysis. Contingency tables of the treated variables are created with the help of multifactorial methods giving way to graphical representations.

SPSS (Statistical Package for the Social Sciences): first released in 1968, SPSS is currently one of the most widely used programs by statisticians in various domains. It offers interactive means for the treatment of large databases. It contains algorithms for graphical representations.

Stata: a statistical software package allowing one to carry out statistical analysis from linear modeling to generalized linear models, treat binary data, use nonparametric methods, conduct multivariate analysis and analysis of time series, and utilize methods of graphical data exploration. Moreover, the software allows the user to program specific commands. Stata contains, among other things, statistical tools for epidemiologists.

Statistica: first released in 1993, Statistica is a comprehensive software suite that contains powerful exploration tools for large databases. The basic software includes basic statistics, analysis of variance, nonparametric statistics, linear and nonlinear models, techniques of exploratory multivariate analysis, neural nets, calculation of sample dimensions, experimental design analysis tools, quality control tools, etc.

StatView: statistical software for medical scientists and researchers. StatView allows one to produce descriptive statistics, perform parametric and nonparametric hypothesis testing, calculate correlations and covariances, carry out linear, nonlinear, and logistic regression, compile contingency tables, and perform factorial analysis, survival analysis, quality control, and Pareto analysis. A good graphical interface is available.

Systat: powerful graphics software. Allows for the visualization of data as well as the use of basic statistics to find an adequate model.

Solas: software for treating missing data. Solas integrates algorithms of multiple imputation.

Vista: designed for those starting out in statistics as well as for professors. Vista can be used as tutorial software in the domain of univariate, multivariate, graphical, and computational statistics.

WinBugs: a program for Bayesian data modeling based on Monte Carlo simulation techniques. It includes the representation of Bayesian models with commands that allow one to carry out simulations. Moreover, an interface is available allowing one to control the analysis at any moment. The graphic tools allow one to control the convergence of the simulation.

XploRe: a wide-ranging statistical environment made for exploratory data analysis. It is programmed in a matrix-oriented language, and as such it contains a large group of statistical operations and has a powerful interactive interface. The program supports user-defined macros.

HISTORY
The first effort made by statisticians to simplify their statistical calculations was the construction of statistical tables. With the arrival of computers in the mid-1940s, the encoding of statistical subroutines took place. The programming languages of this time were not widely known to researchers, and they had to wait until the beginning of the 1960s, when the programming language FORTRAN became widely available, for the development of statistical software programs. Among the first routines introduced by IBM, the best known is the pseudorandom number generator RANDU. The systematic development of statistical software began in the 1960s with the creation of the programs BMD and SPSS, followed by SAS.

REFERENCES
Asimov, D.: The grand tour: a tool for viewing multidimensional data. SIAM J. Sci. Stat. Comput. 6(1), 128–143 (1985)

Becker, R.A., Chambers, J.M., Wilks, A.R.: The New S Language. Wadsworth and Brooks/Cole, Pacific Grove, CA (1988)

Dixon, W.J. (chief ed.): BMDP Statistical Software Communications. University of California Press, Berkeley, CA (1981)

Chambers, J.M., Hastie, T.J. (eds.): Statistical Models in S. Wadsworth and Brooks/Cole, Pacific Grove, CA (1992)

Dodge, Y., Hand, D.: What should future statistical software look like? Comput. Stat. Data Anal. (1991)

Dodge, Y., Whittaker, J. (eds.): Computational Statistics, vols. 1 and 2. COMPSTAT, Neuchâtel, Switzerland, 24–28 August 1992 (English). Physica-Verlag, Heidelberg (1992)

Edwards, D.: Introduction to Graphical Modelling. Springer, Berlin Heidelberg New York (1995)

Francis, I.: Statistical Software: A Comparative Review. North Holland, New York (1981)


Hayes, A.R.: Statistical Software: A Survey and Critique of its Development. Office of Naval Research, Arlington, VA (1982)

International Mathematical and Statistical Libraries: IMSL Library Information. IMSL, Houston (1981)

Ripley, B.D., Venables, W.N.: S Programming. Springer, Berlin Heidelberg New York (2000)

Statistical Sciences: Statistical Analysis in S-Plus, Version 3.1. StatSci, a division of MathSoft, Seattle (1993)

Venables, W.N., Ripley, B.D.: Modern Applied Statistics with S-PLUS. Springer, Berlin Heidelberg New York (1994)

Wegman, E.J., Hayes, A.R.: Statistical software. Encycl. Stat. Sci. 8, 667–674 (1988)

Statistical Table

A statistical table gives values of the distribution function or of the individual probability of a random variable following a specific probability distribution.

DOMAINS AND LIMITATIONS
The necessity of a statistical table comes from the fact that certain distribution functions cannot be expressed in an easy-to-use mathematical form. In other cases, it is impossible (or too difficult) to find a primitive of the density function for the direct calculation of the corresponding distribution function. The values of the distribution function are calculated in advance and presented in a statistical table to simplify use.

EXAMPLES
The normal table, the Student table, the Fisher tables, and the chi-square table are examples of statistical tables.

FURTHER READING
• Binomial table
• Chi-square table
• Distribution function
• Fisher table
• Kruskal-Wallis table
• Normal table
• Student table
• Wilcoxon signed table
• Wilcoxon table

REFERENCES
Kokoska, S., Zwillinger, D.: CRC Standard Probability and Statistics Tables and Formulae. CRC Press (2000)

Statistic

A statistic is the result of applying a function to a sample of data, where the function itself is independent of the sampling distribution.

MATHEMATICAL ASPECTS
A statistic is an observable random variable; this differentiates it from a parameter. Completeness, sufficiency, and unbiasedness are important desirable properties of statistics.

EXAMPLES
The arithmetic mean, the variance, and the standard deviation are statistics.

FURTHER READING
• Chi-square test of independence
• Percentile


• p-value
• Quantile
• Student test

Statistics

The word statistics, derived from Latin, refers to the notion of state (status): “what is relative to the state”. Governments have a great need to count and measure numerous events and activities, such as changing demographics, births, immigration and emigration trends, changes in employment rates, businesses, etc. In this perspective, the term “statistics” is used to indicate a set of available data about a given phenomenon (e.g., unemployment statistics).
In the more modern and more accurate sense of the word, “statistics” is considered a discipline that concerns itself with numeric data. It is made up of a set of techniques for obtaining knowledge from incomplete data, from a rigorous scientific system for managing data collection, their organization, analysis, and interpretation, when it is possible to present them in numeric form.
In a population of individuals, statistical theory is not concerned with whether a given individual has a car or whether he smokes. Rather, we would like to know how many individuals have a car and are smokers, and whether there is a relation between possessing a car and smoking habits in the studied population. We would like to know the traits of the population globally, without concerning ourselves with each person or each object in the population.
We distinguish two subsets of techniques: (1) those involving descriptive statistics and (2) those involving inferential statistics. The essential goal of descriptive statistics is to represent information in a comprehensible and usable format. Inferential statistics, on the other hand, aims to facilitate the generalization of this information or, more specifically, to make inferences (concerning populations) based on samples of these populations.

HISTORY
The term “statistics”, derived from the Latin “status” (state), was used for the first time, according to Kendall, M.G. (1960), by a historian named Ghilini, Girolamo in 1589. According to Kendall, M.G. (1960) the true origin of modern statistics dates back to 1660.
Statistical methods and statistical distributions developed principally in the 19th century thanks to the important role of statistics in the experimental and human sciences. In the 20th century, statistics became a separate discipline because of the wealth and diversity of methods that it uses. This also explains the birth of specialized statistical encyclopedias such as the International Encyclopedia of Statistics (1978), published in two volumes, and the Encyclopedia of Statistical Sciences (1982–1988), in nine volumes.

DOMAINS AND LIMITATIONS
Statistics is used in many domains, all of them very different from one another. Thus we find it being used in industrial production, in the environmental and medical sciences, as well as in official agencies.
Let us look at some of the areas that might interest a statistician:


• Fundamental research in statistics: research in probabilities, statistical methods, and theory.
• Biology: fundamental research and experiments related to the principal phenomena of living organisms and biometry.
• Commerce: data management, sales volume, management, inventory methods, industrial planning, communication and theoretical control, and accounting procedures.
• Demography: study of the increase in human population (birth and death rates, migratory movement), study of the structure of populations (personal, social, and economic characteristics).
• Economy: measure of the volume of production, commerce, resources, employment, and standard of living; analysis of consumer and manufacturer behavior, responses of the market to price changes, impact of advertising, and government policies.
• Education: measures and tests related to the process of studying.
• Engineering: research and experiments including design and efficiency testing, improvements in test methods, questions related to inference and precision tests, improvements in quality control methods.
• Health: cost of accidents, medical care, hospitalization.
• Insurance: determining mortality and accident rates of insured people and of the general population; determining the premium rate for each category of risk; application of statistical techniques to insurance policy management.
• Marketing and research on consumption: problems related to markets, the distribution system, detection of trends; study of consumer preferences and behavior.
• Medicine: epidemiology, fundamental research and experiments on the causes, diagnosis, treatment, and prevention of illnesses.
• Management and administration: problems related to personnel management, equipment, methods of production, and work conditions.
• Psychology and psychometry: problems related to the evaluation of study capacity, intelligence, personal characteristics, and normal and abnormal behavior of an individual, as well as the establishment of scales of measurement and instruments of measure.
• Social sciences: establishment of techniques of sampling and theoretical tests concerning social systems and societal well-being. Analysis of the cost of the social sciences; analysis of cultural differences on the level of values and social behavior.
• Space research: interpretation of experiments and analysis of data collected during space missions.
• Sciences (in general): fundamental research in the natural and social sciences.

FURTHER READING
• Bayesian statistics
• Descriptive statistics
• Genetic statistics
• Inferential statistics
• Official statistics
• Spatial statistics


REFERENCES
Kendall, M.G.: Where shall the history of statistics begin? Biometrika 47, 447–449 (1960)

Kotz, S., Johnson, N.L.: Encyclopedia of Statistical Sciences, vols. 1–9. Wiley, New York (1982–1988)

Kruskal, W.H., Tanur, J.M.: International Encyclopedia of Statistics, vols. 1–2. The Free Press, New York (1978)

Stigler, S.: The History of Statistics: The Measurement of Uncertainty Before 1900. Belknap, London (1986)

Stem and Leaf

See stem-and-leaf diagram.

Stem-And-Leaf Diagram

The stem-and-leaf diagram is a graphical representation of frequencies. The basic idea is to provide information on the frequency distribution and retain the values of the data at the same time. Indeed, the stem corresponds to the class intervals and the leaf to the number of observations in the class, represented by the different data. It is then possible to directly read the values of the data.

HISTORY
The origin of the stem-and-leaf diagram is often associated with Tukey, J.W. (1977). Its concept is based on the histogram, which dates back to the 18th century.

MATHEMATICAL ASPECTS
To construct a stem-and-leaf diagram, each number must be cut into a main part (stem) and a secondary part (leaf). For example, 4.2, 42, or 420 can be separated into 4 for the stem and 2 for the leaf. The stems are then listed vertically (one line per stem) in order of magnitude and the leaves are written next to them, also in order of magnitude. If a set of data includes the values 4.2, 4.4, 4.8, 4.4, and 4.1, then the stem and leaf are as follows:

4|12448 .

DOMAINS AND LIMITATIONS
The stem-and-leaf diagram provides information about distribution, symmetry, concentration, empty sets, and outliers.
Sometimes there are too many numbers with the same stem. In this case it is not possible to put them all on the same line. The most common solution consists in separating the ten leaves into two groups of five leaves, e.g., 4|0122444 and 4|567889.
It is possible to compare two series of data by putting the two leaves to the left and right of the same stem:

012244|4|567889 .

EXAMPLES

Per-capita GNP – 1970 (in dollars)

AFRICA

Algeria 324 Mauritius 233

Angola 294 Morocco 224

Benin 81 Mozambique 228

Botswana 143 Niger 90

Burundi 67 Nigeria 145

Central African Rep. 127 Reunion 769

Chad 74 Rwanda 60

Comores 102 Senegal 236


Congo 238 Sierra Leone 167

Egypt 217 Somalia 89

Equat. Guinea 267 South Africa 758

Ethiopia 72 Rhodesia (South) 283

Gabon 670 Sudan 117

Gambia 101 Swaziland 270

Ghana 257 Togo 135

Guinea 82 Tunisia 281

Guinea-Bissau 259 Uganda 135

Ivory Coast 347 United Rep. of Cameroon 185
Kenya 143
Lesotho 74 United Rep. of Tanzania 100
Liberia 268

Madagascar 133 High-Volta 59

Malawi 74 Zaire 88

Mali 54 Zambia 421

Mauritania 165

Source: Annual Statistics of the UN, 1976

The above table gives the per-capita GNP, in dollars, for most African countries in 1970. To establish a stem-and-leaf diagram from these data, they are first classified in order of magnitude. The stem is then chosen: in this example we will use hundreds. The tens digit is then placed to the right of the corresponding hundred, which gives the leaf part of the diagram. The units are not taken into account.
The stem and leaf are as follows:

Unit: 1|1 = 110

0 | 5566777788889
1 | 00012333444668
2 | 12233355667889
3 | 24
4 | 2
5 |
6 | 7
7 | 56
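This display can also be built programmatically. The following Python sketch is our own (not part of the entry); it uses hundreds as stems and tens as leaves, dropping the units as above:

```python
# Sketch only: construct a stem-and-leaf display from the GNP data.
def stem_and_leaf(values, stem_unit=100, leaf_unit=10):
    stems = {}
    for v in sorted(values):
        stem, leaf = v // stem_unit, (v % stem_unit) // leaf_unit
        stems.setdefault(stem, []).append(str(leaf))
    for stem in range(min(stems), max(stems) + 1):
        print(f"{stem} | {''.join(stems.get(stem, []))}")

gnp = [324, 294, 81, 143, 67, 127, 74, 102, 238, 217, 267, 72, 670,
       101, 257, 82, 259, 347, 143, 74, 268, 133, 74, 54, 165, 233,
       224, 228, 90, 145, 769, 60, 236, 167, 89, 758, 283, 117, 270,
       135, 281, 135, 185, 100, 59, 88, 421]
stem_and_leaf(gnp)   # reproduces the display shown above
```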

FURTHER READING
• Frequency distribution
• Graphical representation
• Histogram

REFERENCES
Tukey, J.W.: Exploratory Data Analysis, limited preliminary edition. Addison-Wesley, Reading, MA (1970–1971)

Stochastic Process

A stochastic process is a family of random variables. In practice, it serves to model a large number of temporal phenomena where chance comes into play.
We distinguish many types of stochastic processes using certain mathematical properties. The best known are:
• Markov process (or Markov chain)
• Martingale
• Stationary process
• Process with independent increments

HISTORY
Stochastic processes originally resulted from advances in the early 20th century in certain applied branches of statistics such as statistical mechanics (by Gibbs, J.W., Boltzmann, L., Poincaré, H., Smoluchowski, M., and Langevin, P.). The theoretical foundations were formulated later by Doob, J.L., Kolmogorov, A.N., and others (1930–1940). During this period the word “stochastic”, from the Greek word for “guess, believe, imagine”, started to be used. Other advances were made starting from Brownian motion in physics (by Einstein, A., Levy, P., and Wiener, N.).
The Markov process was invented by Markov, Andrei Andreevich (1856–1922), who laid the foundation for the theory of Markov processes in finite time, generalized by the Russian mathematician Kolmogorov, A. (1936).

MATHEMATICAL ASPECTS
A stochastic process is a family (Xt)t∈I (where I is a discrete or continuous set) of random variables defined on the same probability space. Processes in discrete time are represented by families (Xt)t∈ℕ (that is, with I = ℕ) and processes in continuous time by families (Xt)t∈ℝ+ (that is, with I = ℝ+ = [0,∞)).
The term “stochastic process” is used if the set I is infinite. In physics, the elements of I represent time.
For an event w belonging to the fundamental set Ω, we define the trajectory of the studied process as the following family:

(Xt (w))t∈I ,

so that the “series” (for a process in discrete time) of the values of the random variables is taken for a chosen and fixed element.
The principal objects of the study of random processes are the properties of the random variables and of the trajectories. We characterize the different types of processes by the probabilistic conditions to which they are related.

1. Markov process
A Markov process is a random process (Xt)t∈ℕ in discrete time for which the probability of an event (at time t + 1) depends only on the immediate past (time t). This is mathematically expressed by the following condition: for each t ∈ ℕ and for all states i0, i1, . . . , it+1, we have:

P(Xt+1 = it+1 | Xt = it, Xt−1 = it−1, . . . , X0 = i0) = P(Xt+1 = it+1 | Xt = it) .

2. Martingale
A martingale is a random process (Xt)t∈ℕ in discrete time that verifies the following condition: for each t ∈ ℕ:

E(Xt+1 |Xt, Xt−1, . . . , X0 ) = Xt .

3. Stationary process
A stationary process (in the strict sense of the term) is a random process (Xt)t∈I for which the joint distributions of the process depend only on the time differences (between the variables considered in the joint distribution) and not on the “absolute” time. Mathematically: for each t1, t2, . . . , tk ∈ I (k ≥ 2) and for each a ∈ I, the variables Xt1, Xt2, . . . , Xtk have the same joint distribution as Xt1+a, Xt2+a, . . . , Xtk+a.

4. Process with independent increments
A process with independent increments is a random process (Xt)t∈I for which, for each t1 < t2 < . . . < tk ∈ I, the random variables Xt2 − Xt1, Xt3 − Xt2, . . . , Xtk − Xtk−1 are mutually independent.

DOMAINS AND LIMITATIONS
Theoretically, there is no difference between the study of stochastic processes and statistics itself. In practice, the study of stochastic processes focuses more on the structure and properties of models than on inferences from real data, which is properly the domain of statistics.

EXAMPLES
1. Here we give an example that shows the close mathematical relations between statistics and stochastic processes. We consider the random variable Xi = age of individual i of a population (potentially infinite). A typical problem is to estimate the mean E(X) under the following hypotheses:
• The random variables Xi are independent.
• All Xi are identically distributed.

2. Random walk. Let Zn, n ≥ 1, be independent and identically distributed random variables taking the value +1 with probability p and the value −1 with probability q = 1 − p.
We define the stochastic process of the random walk as follows:

$$X_0 = 0\,, \qquad X_t = \sum_{n=1}^{t} Z_n \quad \text{for } t \geq 1\,.$$

A sample path of a random walk is typically represented as follows:

(0, 1, 2, 1, 0,−1, 0,−1,−2,−3,−2, . . .) .

It is easy to see that the stochastic process thus defined is a Markov process (this fact is a direct consequence of the independence of the Zn) with the following probabilities:

P (Xt+1 = i+ 1 |Xt = i ) = p ,

P (Xt+1 = i− 1 |Xt = i ) = q .

We have that:

$$E(X_{t+1} \mid X_t) = X_t + (+1) \cdot p + (-1) \cdot q = X_t + (p - q)\,.$$

The random walk is a martingale if and only if p = q = 0.5.
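A random walk is easy to simulate, and the martingale condition p = q = 0.5 can be checked empirically on the simulated increments. The following Python sketch is our own illustration (function names hypothetical):

```python
# Sketch only: simulate the random walk and check that the average
# increment is close to p - q (zero in the martingale case p = q = 0.5).
import random

def random_walk(steps, p=0.5, seed=0):
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(steps):
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path

print(random_walk(10))                     # one short trajectory

p = 0.5
path = random_walk(10000, p)
increments = [b - a for a, b in zip(path, path[1:])]
print(sum(increments) / len(increments))   # close to p - q = 0
```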

FURTHER READING
• Convergence
• Random variable
• Statistics
• Time series

REFERENCES
Bartlett, M.S.: An Introduction to Stochastic Processes. Cambridge University Press, Cambridge (1960)

Billingsley, P.: Statistical Inference for Markov Processes. University of Chicago Press, Chicago (1961)

Cox, D.R., Miller, H.D.: Theory of Stochastic Processes. Wiley, New York; Methuen, London (1965)

Doob, J.L.: What is a stochastic process? Am. Math. Monthly 49, 648–653 (1942)

Doob, J.L.: Stochastic Processes. Wiley, New York (1953)

Kolmogorov, A.N.: Math. Sbornik. N.S. 1, 607–610 (1936)

Stratified Sampling

In stratified sampling, the population is first divided into subpopulations called strata. These must not overlap, and together they must constitute the whole population.
Once the strata have been determined, a random sample is taken (not necessarily of the same size) from each stratum, this sampling being done independently in the different strata.

HISTORY
See sampling.


MATHEMATICAL ASPECTS
With respect to determining the number of individuals to be taken from each stratum, there are mainly two approaches: the representative stratified sample and the optimal stratified sample.

Representative Stratified Sample
In each stratum, a number of individuals proportional to the magnitude of the stratum is taken.
Let Ni be the magnitude of stratum i, N the number of individuals in the population, and n the size of the global sample. The size of subsample ni is defined by:

$$n_i = \frac{N_i}{N} \cdot n\,, \quad i = 1, 2, \ldots, k\,,$$

where k is the number of strata in the population.

Optimal Stratified Sample
The goal is to determine the sizes of the subsamples to be taken in order to obtain the best possible estimation. This is done by minimizing the variance of the estimator m under the constraint

$$\sum_{i=1}^{k} n_i = n\,,$$

where m is the mean of the sample defined by:

$$m = \frac{1}{N} \sum_{i=1}^{k} N_i \bar{x}_i\,,$$

where x̄i is the mean of subsample i. This is a classical optimization problem and gives the following result:

$$n_i = \frac{n}{\sum_{i=1}^{k} N_i \cdot \sigma_i} \cdot N_i \cdot \sigma_i\,,$$

where σi is the standard deviation of stratum i.

Given the constant factor n / ∑ Ni·σi, the size of the subsample is proportional to the magnitude of the stratum and to its standard deviation. The larger the dispersion of the studied variable in the stratum and the larger the size Ni, the larger the size of subsample ni.

DOMAINS AND LIMITATIONS
Stratified sampling is based on the idea of obtaining, as a result of controlling certain variables, a sample that recreates an image that is as close as possible to the population. The dispersion in the population may be strong; in this case, to maintain the sampling error at an acceptable level, a very large sample must be created, which can increase the cost considerably. To avoid this problem, stratified sampling can be a good alternative. The population can be subdivided into strata in such a way that the variation within the strata is relatively low, which will reduce the required sample size for a given sampling error.
Stratified sampling may be justified for the following reasons:
1. If the population is too heterogeneous, it is preferable to have more homogeneous groups to obtain a sample that is more representative of the population.

2. If one wants to obtain information on particular aspects of a population (e.g., with respect to different states), stratified sampling is better suited for this.
A control variable used to define the strata must obey the following rules:
• It must be in close correlation with the studied variables.


• It must have a known value for every unit of the population.
Some of the most commonly used control variables are gender, age, region, socioprofessional class, etc.

EXAMPLES
Consider a population of 600 students divided into 4 strata. We are interested in the results of a test with scores ranging from 0 to 100. Suppose we have the following information:

Size Ni Standard deviation σi

Stratum 1 285 8.6

Stratum 2 150 5.2

Stratum 3 90 2.2

Stratum 4 75 1.4

If we want to take a stratified sample of 30 individuals, we obtain the following subsamples:

1. Representative stratified sample

$$n_i = \frac{N_i}{N} \cdot n$$

We then obtain the following values, rounded to the nearest integer:

$$n_1 = \frac{N_1}{N} \cdot n = \frac{285}{600} \cdot 30 = 14.25 \approx 14$$

n2 = 7.5 ≈ 7
n3 = 4.5 ≈ 5
n4 = 3.75 ≈ 4

2. Optimal stratified sample

$$n_i = \frac{n}{\sum_{i=1}^{k} N_i \cdot \sigma_i} \cdot N_i \cdot \sigma_i$$

We have:

N1 · σ1 = 2451

N2 · σ2 = 780

N3 · σ3 = 198

N4 · σ4 = 105

The constant factor n / ∑ Ni·σi is equal to:

$$\frac{n}{\sum_{i=1}^{4} N_i \cdot \sigma_i} = \frac{30}{3534} = 0.0085\,.$$

We obtain the following values, rounded to the nearest integer:

n1 = 0.0085 · 2451 = 20.81 ≈ 21

n2 = 0.0085 · 780 = 6.62 ≈ 6

n3 = 0.0085 · 198 = 1.68 ≈ 2

n4 = 0.0085 · 105 = 0.89 ≈ 1 .

If we compare the two methods, we can see that the second method takes a larger number of individuals in the first stratum (the most dispersed) and a smaller number of individuals in the more homogeneous strata.
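The two allocations can be computed with a short program. The following Python sketch is our own (not from the entry); it returns the unrounded allocations, which can then be rounded to integers as in the example:

```python
# Sketch only: proportional and optimal (Neyman) allocation of a sample
# of size n among strata of sizes N_i with standard deviations sigma_i.
def proportional_allocation(sizes, n):
    N = sum(sizes)
    return [n * Ni / N for Ni in sizes]

def neyman_allocation(sizes, sds, n):
    total = sum(Ni * si for Ni, si in zip(sizes, sds))
    return [n * Ni * si / total for Ni, si in zip(sizes, sds)]

sizes = [285, 150, 90, 75]
sds = [8.6, 5.2, 2.2, 1.4]
print(proportional_allocation(sizes, 30))               # [14.25, 7.5, 4.5, 3.75]
print([round(x, 2) for x in neyman_allocation(sizes, sds, 30)])
# about [20.81, 6.62, 1.68, 0.89], rounded to 21, 6, 2, 1 as above
```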

FURTHER READING
• Estimation
• Estimator
• Sampling
• Simple random sampling

REFERENCES
Yates, F.: Sampling Methods for Censuses and Surveys. Griffin, London (1949)

Student

See Gosset, William Sealy.


Student Distribution

A random variable T follows a Student distribution if its density function is of the form:

$$f(t) = \frac{\Gamma\!\left(\frac{v+1}{2}\right)}{\sqrt{v\pi}\,\Gamma\!\left(\frac{v}{2}\right)} \left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}\,,$$

where Γ is the gamma function (see gamma distribution) and v the number of degrees of freedom.

(Figure: Student distribution, ν = 7)

The Student distribution is a continuous probability distribution.

HISTORY
The Student distribution was created by Gosset, W.S., known as “Student”, who in 1908 published an article in which he described the density function of the difference between the mean of a sample and the mean of the population from which the sample was taken, divided by the standard deviation of the sample. He also provided in the same article the first table of the corresponding distribution function. Student continued his research and in 1925 published an article in which he proposed new tables that were more extensive and more precise.
Fisher, R.A. was interested in Gosset’s work. He wrote to him in 1912 to propose a geometric demonstration of the Student distribution and to introduce the notion of degrees of freedom. In 1925 he published an article in which he defines the ratio t:

$$t = \frac{Z}{\sqrt{\dfrac{X}{v}}}\,,$$

where X and Z are two independent random variables, X is distributed according to a chi-square distribution with v degrees of freedom, and Z is distributed according to the standard (centered and reduced) normal distribution. This ratio is called the Student t.

MATHEMATICAL ASPECTS
If X is a random variable that follows a chi-square distribution with v degrees of freedom, and Z is a random variable distributed according to the standard normal distribution, then the random variable

$$T = \frac{Z}{\sqrt{\dfrac{X}{v}}}$$

follows a Student distribution with v degrees of freedom if X and Z are independent.
Consider the random variable T:

$$T = \frac{\sqrt{n}\,(\bar{x} - \mu)}{S}\,,$$

where x̄ is the mean of a sample, μ is the mean of the population from which the sample was taken, and S is the standard deviation of the sample given by:

$$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}\,.$$

Then T follows a Student distribution with n − 1 degrees of freedom.


The expected value of a random variable T following a Student distribution is given by:

E [T] = 0 , v > 1 ,

and the variance is equal to:

$$\mathrm{Var}(T) = \frac{v}{v-2}\,, \quad v > 2\,,$$

where v is the number of degrees of freedom.
The Student distribution is related to other continuous probability distributions:
• When the number of degrees of freedom is sufficiently large, the Student distribution approaches a normal distribution.
• When the number of degrees of freedom v is equal to 1, the Student distribution is identical to a Cauchy distribution with α = 0 and θ = 1.
• If a random variable X follows a Student distribution with v degrees of freedom, then the random variable X² follows a Fisher distribution with 1 and v degrees of freedom.

DOMAINS AND LIMITATIONS
The Student distribution is a symmetric distribution with a mean equal to 0. The larger its number of degrees of freedom, the smaller its standard deviation becomes.
The Student distribution is used in inferential statistics in relation to the Student t ratio in hypothesis testing and in the construction of confidence intervals for the mean of a population. In tests of analysis of variance, the Student distribution can be used when the sum of squares between the groups (or the sum of squares of the factors) to be compared only has one degree of freedom.
The normal distribution can be used as an approximation of the Student distribution when the number of observations n is large (n > 30 according to most authors).

FURTHER READING
→ Cauchy distribution
→ Continuous probability distribution
→ Fisher distribution
→ Normal distribution
→ Student table
→ Student test

REFERENCES
Fisher, R.A.: Applications of "Student's" distribution. Metron 5, 90–104 (1925)
Gosset, S.W. "Student": The probable error of a mean. Biometrika 6, 1–25 (1908)
Gosset, S.W. "Student": New tables for testing the significance of observations. Metron 5, 105–120 (1925)

Student Table

The Student table gives the values of the distribution function of a random variable following a Student distribution.

HISTORY
Gosset, William Sealy, known as "Student", developed in 1908 the Student test and the table bearing his name.

MATHEMATICAL ASPECTS
Let the random variable T follow a Student distribution with v degrees of freedom. The density function of the random variable T is given by:

f(s) = Γ((v+1)/2) / (√(vπ) · Γ(v/2)) · (1 + s²/v)^(−(v+1)/2) ,


where Γ represents the gamma function (see gamma distribution).
The distribution function of the random variable T is defined by:

F(t) = P(T ≤ t) = ∫_{−∞}^{t} f(s) ds .

For each value of v, the Student table gives the value of the distribution function F(t). Since the Student distribution is symmetric relative to the origin, we have:

F(t) = 1 − F(−t) ,

which allows one to determine F(t) for each negative value of t.
The Student table is also used in the inverse sense: to find the values of t corresponding to a given probability. In this case, the Student table gives the critical value for the number of degrees of freedom v and the significance level α. We generally denote by t_{v,α} the value of the random variable T for which:

P(T ≤ t_{v,α}) = 1 − α .

DOMAINS AND LIMITATIONS
See Student test.

EXAMPLES
See Appendix E.
We consider a one-tailed Student test on a sample of size n. For v = n − 1 degrees of freedom, the Student table allows one to determine the critical value t_{v,α} of the test for the given significance level α.
For an example using the Student table, see Student test.
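In modern software the role of the Student table is played by the quantile function of the t distribution. A minimal Python sketch (the values v = 9 and α = 0.05 are assumptions chosen for illustration):

# Sketch: reading "the Student table" through SciPy's quantile function
from scipy import stats

v, alpha = 9, 0.05
print(stats.t.ppf(1 - alpha, df=v))        # one-tailed critical value t_{v,1-alpha}, about 1.833
print(stats.t.ppf(1 - alpha / 2, df=v))    # two-tailed critical value t_{v,1-alpha/2}, about 2.262
print(stats.t.cdf(2.262, df=v))            # the distribution function F(t), about 0.975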

FURTHER READING
→ Statistical table
→ Student distribution
→ Student test

REFERENCES
Fisher, R.A.: Applications of "Student's" distribution. Metron 5, 90–104 (1925)
Gosset, S.W. "Student": The probable error of a mean. Biometrika 6, 1–25 (1908)
Gosset, S.W. "Student": New tables for testing the significance of observations. Metron 5, 105–120 (1925)

Student Test

The Student test is a parametric hypothesis test on the mean of a sample or on the comparison of the means of two samples.

HISTORY
See the history of the Student table.

MATHEMATICAL ASPECTS
Let X be a random variable distributed according to a normal distribution. The random variable T defined below follows a Student distribution with n − 1 degrees of freedom:

T = (X̄ − μ) / (S/√n) ,

where X̄ is the arithmetic mean of the sample, μ is the mean of the population, S is the standard deviation of the sample, and n is the sample size.
The Student test consists in calculating this ratio for the sample data and comparing it with the theoretical value from the Student table.

A. Student test for one (single) sample
1. The hypotheses we want to test are:

– Null hypothesis:

H0 : μ = μ0 .


We want to test whether the sample comes from a population with a specified mean μ0, or whether there is a statistically significant difference between the mean observed in the sample and the hypothetical mean of the population (μ0) under H0.
– Alternative hypothesis:
The alternative hypothesis can take three forms:

H1 : μ > μ0   (one-sided test on the right)
H1 : μ < μ0   (one-sided test on the left)
H1 : μ ≠ μ0   (two-sided test) .

2. Calculate the t-statistic:

t = (x̄ − μ0) / (S/√n) ,

where μ0 is the mean of the population specified by H0, x̄ is the sample mean, S is the sample standard deviation, and n is the sample size.
3. Choose the significance level α of the test.
4. Compare the calculated value t with the appropriate critical value tn−1 (the value from the Student table with n − 1 degrees of freedom). Reject H0 if the absolute value of t is greater than the critical value tn−1.
The critical values for different degrees of freedom and different significance levels are given by the Student table.
For a one-sided test we take the value t_{n−1,1−α} from the table, and for a two-sided test we take the value t_{n−1,1−α/2}.
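A minimal Python sketch of steps 1–4 for the one-sample case, written directly from the formulas above (the function name and the default α = 0.05 are assumptions for illustration, not part of the original entry):

# Sketch: one-sample Student test, two-sided version
import numpy as np
from scipy import stats

def one_sample_t(x, mu0, alpha=0.05):
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)                                # sample standard deviation S
    t = (x.mean() - mu0) / (s / np.sqrt(n))          # the t ratio
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)    # critical value t_{n-1,1-alpha/2}
    return t, t_crit, abs(t) > t_crit                # reject H0 if |t| > critical value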

B. Student test for two samples
1. The hypotheses we want to test are:

– Null hypothesis:

H0 : μ1 = μ2 .

In a test designed to compare two samples, we want to know whether there exists a significant difference between the means of the two populations from which the samples were drawn.

– Alternative hypothesis:
It can take the following three forms:

H1 : μ1 > μ2   (one-sided test on the right)
H1 : μ1 < μ2   (one-sided test on the left)
H1 : μ1 ≠ μ2   (two-sided test) .

2. Compute the t-statistic by:

t = (x̄1 − x̄2) / (Sp · √(1/n1 + 1/n2)) ,

where n1 and n2 are the respective sample sizes and Sp is the pooled standard deviation:

Sp = √[ ((n1 − 1)·S1² + (n2 − 1)·S2²) / (n1 + n2 − 2) ]
   = √[ Σ_{i=1}^{2} Σ_{j=1}^{ni} (xij − x̄i)² / (n1 + n2 − 2) ] ,

where S1² represents the variance of sample 1, S2² that of sample 2, and xij observation j of sample i.
3. Choose the significance level α of the test.
4. Compare the calculated value of t with the critical value t_{n1+n2−2,1−α/2} (the value of the Student table with n1 + n2 − 2 degrees of freedom). Reject H0 if the absolute value of t is greater than the critical value.
If the test is one-sided, then take the value t_{n1+n2−2,1−α} from the Student table, and if it is two-sided, take the value t_{n1+n2−2,1−α/2}.


Test of Significance of a Regression Coefficient
The Student test can also be applied to regression problems. Consider a simple linear regression model for which the data consist of n pairs of observations (xi, yi), i = 1, . . . , n, modeled by

yi = β0 + β1 · xi + εi ,

where the εi are independent random errors, normally distributed with mean 0 and variance σ². The estimators of β0 and β1 are given by:

β̂0 = ȳ − β̂1 · x̄ ,

β̂1 = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ) / Σ_{i=1}^{n} (xi − x̄)² .

The estimator β̂1 is normally distributed with mean β1 and variance

Var(β̂1) = σ² / Σ_{i=1}^{n} (xi − x̄)² .

If σ² is estimated by the mean square of the residuals S² and replaced in the expression for the variance of β̂1, then the ratio

T = (β̂1 − β1) / √Var(β̂1)

is distributed according to the Student distribution with n − 2 degrees of freedom.
At a given significance level α we can then make a Student test on the coefficient β1. The hypotheses are the following:

Null hypothesis H0 : β1 = 0

Alternative hypothesis H1 : β1 ≠ 0 .

If the absolute value of the calculated ratio T is greater than the value t_{n−2,1−α/2} of the Student table, then we can conclude that the coefficient β1 is significantly different from zero. In the contrary case, we cannot reject the null hypothesis (β1 = 0), and we conclude that the slope of the regression line is not significantly different from zero.
A similar test can be made for the constant β0. The estimator β̂0 is normally distributed with mean β0 and variance

Var(β̂0) = σ² · Σ_{i=1}^{n} xi² / ( n · Σ_{i=1}^{n} (xi − x̄)² ) .

If σ² is estimated by the mean square of the residuals S² and is replaced in the expression for Var(β̂0), then the ratio

T = (β̂0 − β0) / √Var(β̂0)

is distributed according to the Student distribution with n − 2 degrees of freedom. The Student test concerns the following hypotheses:

Null hypothesis H0 : β0 = 0
Alternative hypothesis H1 : β0 ≠ 0 .

If the absolute value of the calculated ratio T is greater than the value t_{n−2,1−α/2} of the Student table, then we can conclude that the coefficient β0 is significantly different from zero. In the contrary case, we cannot reject the null hypothesis (β0 = 0), and we conclude that the intercept of the regression is not significantly different from zero.
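A short Python sketch of the t test on the slope β1, built directly from the formulas above (the function name and the default α are illustrative assumptions, not the book's code):

# Sketch: Student test for the slope of a simple linear regression
import numpy as np
from scipy import stats

def slope_t_test(x, y, alpha=0.05):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx    # estimator of beta_1
    b0 = y.mean() - b1 * x.mean()                         # estimator of beta_0
    resid = y - (b0 + b1 * x)
    s2 = np.sum(resid ** 2) / (n - 2)                     # estimate of sigma^2
    t = b1 / np.sqrt(s2 / sxx)                            # tests H0: beta_1 = 0
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return b1, t, abs(t) > t_crit                         # reject H0 if |t| > critical value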


DOMAINS AND LIMITATIONS
Generally, we should be very attentive to the choice of the test to perform. The Student test is often wrongly applied. In fact, it is meaningful only if the observations come from a normally distributed population, or one close to normally distributed. If this requirement is not satisfied, the critical value of the Student test does not provide an absolute guarantee.
If the normal distribution is applicable, the Student test on one sample is used when the standard deviation of the population is unknown and when the size of the sample is not large (n < 30).
Likewise, when there are two samples, the Student test is used when the standard deviations of the two populations are unknown and supposed equal, and when the sum of the sample sizes is relatively small (n1 + n2 < 30).

EXAMPLES
Student Test on One Sample
Consider a sample of size n = 10 where the observations are normally distributed and are the following:

47 51 48 49 48 52 47 49 46 47 .

We want to test the following hypotheses:

Null hypothesis H0 : μ = 50

Alternative hypothesis H1 : μ ≠ 50 .

The arithmetic mean of the observations equals:

x̄ = Σ_{i=1}^{10} xi / n = 484/10 = 48.4 .

We summarize the calculations in the following table:

xi      (xi − x̄)²
47      1.96
51      6.76
48      0.16
49      0.36
48      0.16
52      12.96
47      1.96
49      0.36
46      5.76
47      1.96
Total   32.4

The standard deviation equals:

S = √[ Σ_{i=1}^{10} (xi − x̄)² / (n − 1) ] = √(32.4/9) = 1.90 .

We calculate the ratio:

t = (x̄ − μ0) / (S/√n) = (48.4 − 50) / (1.90/√10) = −2.67 .

The number of degrees of freedom associated to the test equals n − 1 = 9.
If we choose a significance level α of 5%, then the value t_{n−1,1−α/2} of the Student table equals:

t_{9,0.975} = 2.26 .

As |t| = 2.67 > 2.26, we reject the null hypothesis and conclude that, at a significance level of 5%, the sample does not come from a population with mean μ = 50.
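The numbers of this example can be checked with SciPy (a verification sketch, not part of the original text):

# Sketch: reproducing the one-sample example with scipy.stats.ttest_1samp
import numpy as np
from scipy import stats

x = np.array([47, 51, 48, 49, 48, 52, 47, 49, 46, 47], dtype=float)
print(x.mean())                          # 48.4
print(x.std(ddof=1))                     # about 1.90
t, p = stats.ttest_1samp(x, popmean=50)
print(t)                                 # about -2.67
print(stats.t.ppf(0.975, df=9))          # critical value 2.26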

Student Test on Two Samples
A study is conducted to compare the mean lifetime of two brands of tires. A sample was taken for each brand. The results are:
• For brand 1:
Mean lifetime: x̄1 = 72000 km
Standard deviation: S1 = 3200 km
Sample size: n1 = 50


• For brand 2:
Mean lifetime: x̄2 = 74400 km
Standard deviation: S2 = 2400 km
Sample size: n2 = 40

We want to know whether there exists a significant difference in the lifetime of the two brands of tires, at a significance level α of 1%. This implies the following hypotheses:

Null hypothesis H0 : μ1 = μ2 , or μ1 − μ2 = 0
Alternative hypothesis H1 : μ1 ≠ μ2 , or μ1 − μ2 ≠ 0 .

The pooled (weighted) standard deviation Sp equals:

Sp = √[ ((n1 − 1)·S1² + (n2 − 1)·S2²) / (n1 + n2 − 2) ]
   = √[ (49 · 3200² + 39 · 2400²) / (50 + 40 − 2) ]
   = 2873.07 .

We can calculate the ratio t:

t = (x̄1 − x̄2) / (Sp · √(1/n1 + 1/n2))
  = (72000 − 74400) / (2873.07 · √(1/50 + 1/40))
  = −3.938 .

The number of degrees of freedom associated to the test equals:

v = n1 + n2 − 2 = 88 .

The value of t_{v,1−α/2} found in the Student table, for α = 1%, is:

t_{88,0.995} = 2.634 .

As the absolute value of the calculated ratio is greater than this value, |t| = |−3.938| > 2.634, we reject the null hypothesis in favor of the alternative hypothesis and conclude that, at the significance level of 1%, the two brands of tires do not have the same mean lifetime.
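The same numbers can be checked from the summary statistics with SciPy (a verification sketch, assuming equal variances as in the pooled formula above):

# Sketch: two-sample Student test from summary statistics
from scipy import stats

res = stats.ttest_ind_from_stats(mean1=72000, std1=3200, nobs1=50,
                                 mean2=74400, std2=2400, nobs2=40,
                                 equal_var=True)
print(res.statistic)                     # about -3.94
print(stats.t.ppf(0.995, df=88))         # critical value about 2.63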

FURTHER READING
→ Hypothesis testing
→ Parametric test
→ Student distribution
→ Student table

REFERENCES
Fisher, R.A.: Applications of "Student's" distribution. Metron 5, 90–104 (1925)
Gosset, S.W. "Student": The probable error of a mean. Biometrika 6, 1–25 (1908)
Gosset, S.W. "Student": New tables for testing the significance of observations. Metron 5, 105–120 (1925)

Sure Event

See sample space.

Survey

An inquiry by survey, or simply a survey, is an inquiry made on a restricted part of a population. This fraction of the population constitutes the sample, and the methods that allow one to construct this sample are called sampling methods.
Proceeding with the survey, we obtain information on the sampled units. Then, with the help of inferential statistics, we generalize this information to the entire population. This generalization introduces a certain error. The importance of this error will depend on the sample size, on the sampling scheme


used to draw the sample, and on the estimator that was used. The more representative the sample, the smaller the error. Sampling methods are very useful in order to draw samples that result in a minimal error.
Another technique consists in observing all individuals of a population; we refer here to a census. Surveys have cost and speed advantages over censuses that largely compensate for sampling error. Moreover, it is sometimes impossible to carry out a census, for example when observing a unit of a population means its destruction, or when the population is infinite.

FURTHER READING
→ Census
→ Data collection
→ Inferential statistics
→ Panel
→ Population
→ Sample
→ Sampling

REFERENCES
Gourieroux, C.: Théorie des sondages. Economica, Paris (1981)
Grosbras, J.M.: Méthodes statistiques des sondages. Economica, Paris (1987)

Systematic Sampling

Systematic sampling is a random type of sampling. Individuals are taken from a population at fixed intervals according to time, space, or order of occurrence, the first individual being drawn randomly.

HISTORY
See sampling.

MATHEMATICAL ASPECTS
Consider a population composed of N individuals. A systematic sample is made up of individuals whose numbers constitute an arithmetic progression.
A first number b is chosen randomly between 1 and r, where r = N/n, n is the size of the sample, and r is the common difference between successive terms of the arithmetic progression. Individuals sampled in this way will have the following numbers:

b, b+ r, b+ 2r, . . . , b+ (n− 1) r .

DOMAINS AND LIMITATIONS
Systematic sampling is possible only if the individuals are classified in a certain order. The main advantage of this method lies in its simplicity and, therefore, its low cost.
On the other hand, it has the disadvantage of not taking into account a possible periodicity of the studied trait. Serious errors could result from this, especially if the period is a submultiple of the common difference of the arithmetic progression.

EXAMPLES
Consider a population of 600 students grouped in alphabetical order from which we want to take a sample of 30 individuals according to systematic sampling. The common difference of the arithmetic progression is equal to:

r = N/n = 600/30 = 20 .

A first number is chosen randomly between 1 and 20, for example b = 17. The sample will be composed of the students with the numbers:

17, 37, 57, . . . , 597 .
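A minimal Python sketch of this procedure (the function name is an illustrative assumption; the population is simply represented by the unit numbers 1, . . . , N):

# Sketch: drawing a systematic sample of n units from a population of N units
import numpy as np

def systematic_sample(N, n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    r = N // n                           # sampling interval (assumed here to divide N)
    b = rng.integers(1, r + 1)           # first unit drawn at random between 1 and r
    return np.arange(b, b + n * r, r)    # b, b+r, b+2r, ..., b+(n-1)r

print(systematic_sample(600, 30))        # e.g. 17, 37, 57, ..., 597 if b = 17 is drawn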


FURTHER READING
→ Cluster sampling
→ Sampling

REFERENCES
Deming, W.E.: Sample Design in Business Research. Wiley, New York (1960)


T

Target Population

The target population is defined as the population to which we would like to apply the results of an inquiry.

EXAMPLES
We want to conduct an inquiry on foreigners living in city X. The population is represented by the list of foreigners registered in the city.
The target population is defined by the collection of foreigners living in city X, including those not on the list (unannounced, clandestine, etc.).

FURTHER READING
→ Population

Test of Independence

A test of independence is a hypothesis test whose objective is to determine whether two random variables are independent or not.

HISTORY
See Kendall rank correlation coefficient, Spearman rank correlation coefficient, and chi-square test of independence.

MATHEMATICAL ASPECTS
Let (X1, X2, . . . , Xn) and (Y1, Y2, . . . , Yn) be two samples of size n. The objective is to test the following two hypotheses:

Null hypothesis H0: The two variables X and Y are independent.
Alternative hypothesis H1: The two variables X and Y are not independent.

The test of independence consists in comparing the empirical distribution with the theoretical distribution by calculating an indicator.
The test of independence applied to two continuous variables is based on the ranks of the observations. Such is the case for the tests based on the Spearman rank correlation coefficient or on the Kendall rank correlation coefficient.
In the case of two categorical variables, the most widely used test is the chi-square test of independence.

DOMAINS AND LIMITATIONS
To make a test of independence, each couple (Xi, Yi), i = 1, 2, . . . , n, must come from the same bivariate population.

EXAMPLES
See chi-square test of independence.


FURTHER READING
→ Chi-square test of independence
→ Hypothesis testing
→ Kendall rank correlation coefficient
→ Nonparametric test
→ Spearman rank correlation coefficient

Time Series

A time series is a sequence of observations measured at successive times. Time series may be monthly, quarterly, or annual, sometimes weekly, daily, or hourly (study of road traffic, telephone traffic), or biennial or decennial. Time series analysis consists of methods that attempt to understand such series in order to make predictions.
A time series can be decomposed into four components, each expressing a particular aspect of the movement of the values of the time series. These four components are:
• Secular trend, which describes the long-term movement;
• Seasonal variations, which represent seasonal changes;
• Cyclical fluctuations, which correspond to periodical but not seasonal variations;
• Irregular variations, which are other nonrandom sources of variation in the series.
The analysis of time series consists in making mathematical descriptions of these elements, that is, estimating the four components separately.

HISTORY
In an article in 1936, Funkhauser, H.G. reproduced a 10th-century diagram that shows the inclination of the orbits of seven planets as a function of time. According to Kendall, this is the oldest time diagram known in the Western world.
It is natural to look to astronomy for the origins of time series. The observation of the stars was popular in antiquity. In the 16th century, astronomy was again a source of important discoveries: the collection of data on the movement of the planets by Brahe, Tycho (1546–1601) allowed Kepler, Johannes (1571–1630) to formulate his laws on planetary motion. A statistical analysis of data, as empirical as it could be at that time, was at the base of this great scientist's work.
The development of mathematics in the 18th and 19th centuries made it possible to go beyond graphical visualization and arrive at the first techniques of time series analysis. Using the frequency approach, scientists developed the first works in this domain. Frequency analysis (also called harmonic analysis) was originally designed to highlight one or more cyclical "components" of a time series.
In 1919, Persons, W.M. proposed a decomposition of time series in terms of tendency (secular trend), cyclical (cyclical fluctuations), seasonal (seasonal variation), and accidental (irregular variation) components. Many works have been devoted to the determination and elimination of one or another of these components.
The determination of tendencies has often been related to problems of regression analysis; researchers also worked on nonlinear tendencies, for example the Gompertz curve (established in 1825 by Gompertz, Benjamin) or the logistic curve (introduced in 1844 by Verhulst, Pierre François). Another approach to determining tendencies concerns the use of moving averages, because they allow one to treat the


problem of tendencies when they serve to eliminate irregular or periodic variations.
The use of moving averages was developed in two directions: for actuarial problems and for the study of time series in economics. The use of moving averages is particularly developed in the frame of seasonal adjustment methods. More precisely, the separation of seasonal variations from the other components of the time series was studied originally by Copeland, M.T. in 1915 and Persons, W.M. in 1919, and continued by Macauley, F.R. in 1930. Macauley, F.R. used the method of ratios to the moving average, which is still widely used today (see seasonal variation); see also Dufour (2006).

MATHEMATICAL ASPECTS
Let Yt be a time series that can be decomposed with the help of these four components:
• Secular trend Tt

• Seasonal variations St

• Cyclical fluctuations Ct

• Irregular variations It

We suppose that the values taken by the random variable Yt are determined by a relation between the four previous components. We distinguish three models of composition:
1. Additive model:

Yt = Tt + Ct + St + It .

The four components are assumed to be independent of one another.

2. Multiplicative model:

Yt = Tt · Ct · St · It .

With the help of logarithms we pass from the multiplicative model to the additive model.

3. Mixed model:

Yt = St + (Tt · Ct · It)

or

Yt = Ct + (Tt · St · It) .

This type of model is very little used.
We choose the additive model when the seasonal variations are almost constant and their influence on the tendency does not depend on its level. When the seasonal variations are of an amplitude almost proportional to that of the secular trend, we introduce the multiplicative model; this is generally the case.
The analysis of time series consists in determining the values taken by each component. We always start with the seasonal variations and end with the cyclical fluctuations. All fluctuations that cannot be attributed to one of the three components will be grouped with the irregular variations.
Based on the model used, the analysis can, after the components are determined, adjust the data by subtraction or division. Thus, when we have estimated the secular trend Tt at each time t, the time series is adjusted in the following manner:
– If we use the additive model Yt = Tt + Ct + St + It:

Yt − Tt = Ct + St + It .

– For the multiplicative model Yt = Tt · Ct · St · It:

Yt / Tt = Ct · St · It .

Successively evaluating each of the three components and adjusting the time series for them, we obtain the values attributed to the irregular variations.
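A hedged Python sketch of the additive decomposition Yt = Tt + St + It using statsmodels; the simulated monthly series below is an assumption for illustration and is not taken from the text:

# Sketch: additive decomposition and seasonal adjustment of a monthly series
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)
series = pd.Series(y, index=pd.date_range("2000-01", periods=t.size, freq="MS"))

result = seasonal_decompose(series, model="additive", period=12)
trend, seasonal, resid = result.trend, result.seasonal, result.resid
adjusted = series - seasonal        # series adjusted for the seasonal variations
# For a multiplicative model, use model="multiplicative" and divide instead of subtracting.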


DOMAINS AND LIMITATIONS
When the time series does not cover a long enough period, we group the secular trend and the cyclical fluctuations into one component called the extraseasonal movement.

Graphical Representation
The graphical representation of a time series helps to analyze the observed values as a function of time. To analyze these values, a certain number of precautions must be taken:

1. Each point must correspond to an observation. If the value represents a stock on a fixed date, then the point is placed at the reference date. If it is a flow, for example monthly, then the point is placed in the middle of the month.

2. Sometimes one has to report, on the same graph, annual data and monthly data. The points representing the monthly results are then 12 times more frequent than those relative to the annual results. Thus we have to pay attention to placing the representative points in the right position.

3. If many series are represented on the same graph, they must be identical in nature. Otherwise, we should use different scales or place the graphs below one another.

The principal objectives of the study of time series are:

• Description: determination or elimination of the different components
• Stochastic modeling
• Forecasting
• Filtering: elimination or conversion of certain characteristics
• Control: observation of the past of a process in order to react to its future evolution

For forecasting, the analysis of time series is based on the following hypotheses:

• The major historically observed tendencies are maintained in the more or less near future.
• The measurable fluctuations of a variable are reproduced at regular intervals.

The analysis of time series will be justified as long as it allows one to reduce the level of uncertainty in the elaboration of forecasts. We distinguish:
1. Long-term forecast:
Based on the secular trend; generally does not take into account cyclical fluctuations.
2. Medium-term forecast:
To take into account the probable effect of the cyclical component, we should multiply the forecasted value of the secular trend by an estimate of the relative variation attributable to the cyclical fluctuations.
3. Short-term forecast:
Generally we do not try to forecast the irregular variations. Thus we multiply the forecasted values of the secular trend by the estimate of the cyclical fluctuations and by the seasonal variation of the corresponding date.

During an analysis of time series, we face the following problems:
1. Is the classical model appropriate?
We expect the quality of the results of the analysis to be proportional to the precision and accuracy of the model itself. The multiplicative model Yt = Tt · St · Ct · It presupposes that the four components are independent, which is rarely true in reality.
2. Are our hypotheses of constancy and regularity valid?
The analysis must be adjusted for subjective and qualitative factors that exist in almost any situation.


3. Can we trust the available data?
The problem of breaks in the series may arise, resulting from insufficient data or a change in the quality of the variables.

EXAMPLES
A time series is generally represented as follows:

We distinguish four components:
• Secular trend, slightly increasing in the present case
• Seasonal variations, readily apparent
• Cyclical fluctuations, in the form of cycles of an approximate length of 27 units of time
• Irregular variations, generally weak enough, except at t = 38, which shows a very abrupt fall that cannot be explained by the other components

FURTHER READING
→ Cyclical fluctuation
→ Forecasting
→ Graphical representation
→ Irregular variation
→ Seasonal variation
→ Secular trend

REFERENCES
Copeland, M.T.: Statistical indices of business conditions. Q. J. Econ. 29, 522–562 (1915)
Dufour, J.M.: Histoire de l'analyse des séries chronologiques. Université de Montréal, Canada (2006)
Funkhauser, H.G.: A note on a 10th century graph. Osiris 1, 260–262 (1936)
Kendall, M.G.: Time Series. Griffin, London (1973)
Macauley, F.R.: The smoothing of time series. National Bureau of Economic Research, 121–136 (1930)
Persons, W.M.: An index of general business conditions. Rev. Econom. Stat. 1, 111–205 (1919)

Trace

For a square matrix A of order n, we define the trace of A as the sum of the terms situated on the diagonal. Thus the trace of a matrix is a scalar quantity.

MATHEMATICAL ASPECTS
Let A be a square matrix of order n, A = (aij), where i, j = 1, 2, . . . , n. We define the trace of A by:

tr(A) = Σ_{i=1}^{n} aii .

DOMAINS AND LIMITATIONS
As we are interested only in diagonal elements, the trace is defined only for square matrices.
1. The trace of the sum of two matrices equals the sum of the traces of the matrices:

tr (A+ B) = tr (A)+ tr (B) .

2. The trace of the product of two matrices does not change when the matrices are commuted. Mathematically:

tr (A · B) = tr (B · A) .


EXAMPLES
Let A and B be two square matrices of order 2:

A = | 1 2 |        B = |  5 2 |
    | 1 3 |            | −2 3 |

The calculation of traces gives:

tr(A) = 1 + 3 = 4 ,
tr(B) = 5 + 3 = 8 ,

and

tr(A + B) = tr | 6 4 |  = 6 + 6 = 12 .
               | −1 6 |

We verify that tr(A) + tr(B) = 12. On the other hand:

A · B = | 1  8 |        B · A = | 7 16 |
        | −1 11 |               | 1  5 |

tr(A · B) = 1 + 11 = 12 ,
tr(B · A) = 7 + 5 = 12 .

Thus tr(A · B) = tr(B · A) = 12.
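These two properties can be verified numerically with NumPy (a check on the example above, not part of the original text):

# Sketch: verifying the trace properties on the matrices of the example
import numpy as np

A = np.array([[1, 2], [1, 3]])
B = np.array([[5, 2], [-2, 3]])

print(np.trace(A), np.trace(B))             # 4 and 8
print(np.trace(A + B))                      # 12 = tr(A) + tr(B)
print(np.trace(A @ B), np.trace(B @ A))     # both 12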

FURTHER READING
→ Matrix

Transformation

A transformation is a change of one or many variables in a statistical study. We transform variables, for example, by replacing them with their logarithms (logarithmic transformation).

DOMAINS AND LIMITATIONS
We transform variables for many reasons:

1. When we are in the presence of a data set relative to many variables and we want to express one of them, Y (the dependent variable), with the help of the others (called independent variables), and we want to linearize the relation between the variables. For example, when the relation between the variables is multiplicative, a logarithmic transformation of the data makes it additive and allows the use of linear regression to estimate the parameters of the model (see the sketch after this list).

2. When we are in the presence of a data set relative to a variable and its variance is not constant, it is generally possible to make it almost constant by transforming this variable.

3. When a random variable follows some arbitrary distribution, an adequate transformation of it often allows one to obtain a new random variable that approximately follows a normal distribution.

The advantages of transforming a data set relative to a variable can be summarized as follows:

• It simplifies existing relations with other variables.
• It is a remedy for problems with outliers, linearity, and homoscedasticity.
• It produces a variable that is approximately normally distributed.
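A small Python sketch of reason 1 above; the simulated data and parameter values are assumptions for illustration, not taken from the text:

# Sketch: a logarithmic transformation makes a multiplicative relation linear
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 200)
y = 3.0 * x ** 2.0 * rng.lognormal(0, 0.1, 200)     # multiplicative model y = a * x^b * error

# After taking logs, log(y) = log(a) + b*log(x) + error is linear in log(x):
b, log_a = np.polyfit(np.log(x), np.log(y), deg=1)  # slope and intercept of the line
print(log_a, b)                                     # close to log(3) ~ 1.10 and b = 2.0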

FURTHER READING
→ Dependent variable
→ Independent variable
→ Logarithmic transformation
→ Normal distribution
→ Regression analysis
→ Variance


REFERENCES
Cox, D.R., Box, G.E.P.: An analysis of transformations (with discussion). J. Roy. Stat. Soc. Ser. B 26, 211–243 (1964)

Transpose

The (matrix) transpose of a matrix A of order (m × n) is a matrix of order (n × m), denoted by A′, obtained by writing the rows of A as the columns of A′ and vice versa. Thus we effect a simple "inversion" of the table.

MATHEMATICAL ASPECTS
If A = (aij), i = 1, 2, . . . , m and j = 1, 2, . . . , n, is a matrix of order (m × n), then the transpose of A is the matrix A′ of order (n × m) given by:

A′ = (aji) , with j = 1, 2, . . . , n and i = 1, 2, . . . , m .

Transposition is a reversible and reciprocal operation; thus, taking the transpose of the transpose of a matrix, we find the initial matrix.

DOMAINS AND LIMITATIONS
We use the transpose of a matrix, or more precisely the transpose of a vector, when calculating the scalar product.

EXAMPLES
Let A be the following matrix of order (3 × 2):

A = | 1 2 |
    | 0 3 |
    | 2 5 |

The transpose of A is the matrix of order (2 × 3):

A′ = | 1 0 2 |
     | 2 3 5 |

We can also verify that the transpose of A′ equals the initial matrix A:

(A′)′ = A .

FURTHER READING
→ Matrix
→ Vector

Treatment

In an experimental design, a treatment is a particular combination of levels of various factors.

EXAMPLES
Experiments are often carried out to compare two or more treatments, for example two different fertilizers on a particular type of plant, or several types of drugs to treat a certain illness. Another example consists in measuring the coagulation time of blood samples of 16 animals having received different regimes A, B, C, and D.
In these cases, the different examples of treatments are, respectively, the fertilizers, the drugs, and the regimes.
On the other hand, in experiments that test a particular fertilizer, for example nitrogen, on a wheat harvest, we can consider different quantities of the same fertilizer as different treatments. Here, we have one factor (nitrogen) at different levels (quantities), for example 30 kg, 50 kg, 100 kg, and 200 kg. Each of these levels corresponds to a treatment.


The goal of such an experiment is to determine whether there is a real (significant) difference between these treatments.

FURTHER READING
→ Design of experiments
→ Experiment
→ Experimental unit
→ Factor

REFERENCES
Senn, St.: Cross-over Trials in Clinical Research. Wiley (1993)

Tukey, John Wilder

Tukey, John Wilder was born in New Bedford, MA in 1915. He studied chemistry at Brown University and received his doctorate in mathematics at Princeton University in 1939. At the age of 35, he became professor of mathematics at Princeton. He was a key player in the formation of the Department of Statistics at Princeton University in 1966 and served as its chairman during 1966–1969. He is the author of Exploratory Data Analysis (1977) (translated into Russian) and eight volumes of articles. He is coauthor of Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male; Data Analysis and Regression (also translated into Russian); and Index to Statistics and Probability. He is coeditor of the following books: Understanding Robust and Exploratory Data Analysis; Exploring Data Tables, Trends, and Shapes; Configural Polysampling; and Fundamentals of Exploratory Analysis of Variance. He died in New Jersey in July 2000.

Selected principal works and articles of Tukey, John Wilder:

1977 Exploratory Data Analysis. 1st edn. Addison-Wesley, Reading, MA.
1980 We need both exploratory and confirmatory. Am. Stat. 34, 23–25.
1986 Data analysis and behavioral science or learning to bear the quantitative man's burden by shunning badmandments. In: Collected Works of John W. Tukey, Vol. III: Philosophy and Principles of Data Analysis, 1949–1964 (Jones, L.V., ed.). Wadsworth Advanced Books & Software, Monterey, CA.
1986 Collected Works of John W. Tukey, Vol. III: Philosophy and Principles of Data Analysis, 1949–1964 (Jones, L.V., ed.). Wadsworth Advanced Books & Software, Monterey, CA.
1986 Collected Works of John W. Tukey, Vol. IV: Philosophy and Principles of Data Analysis, 1965–1986 (Jones, L.V., ed.). Wadsworth Advanced Books & Software, Monterey, CA.
1988 Collected Works of John W. Tukey, Vol. V: Graphics, 1965–1985 (Cleveland, W.S., ed.). Wadsworth and Brooks/Cole, Pacific Grove, CA.

FURTHER READING
→ Box plot
→ Exploratory data analysis

Two-Sided Test

A two-sided test concerning a population is a hypothesis test that is applied when we want to compare an estimate of a parameter to a given value, against the alternative hypothesis that the parameter is not equal to the stated value.


MATHEMATICAL ASPECTS
A two-sided (two-tailed) test is a hypothesis test on a population of the type:

Null hypothesis H0 : θ = θ0

Alternative hypothesis H1 : θ ≠ θ0 ,

where θ is a parameter of the population whose value is unknown and θ0 is the presumed value of this parameter.
For a test concerning the comparison of two populations, the hypotheses are:

Null hypothesis H0 : θ1 = θ2

Alternative hypothesis H1 : θ1 ≠ θ2 ,

where θ1 and θ2 are the unknown parameters of the two underlying populations.

DOMAINS AND LIMITATIONS
In a two-sided test, the rejection region is divided into two parts, to the left and to the right of the considered parameter.

EXAMPLES
A company produces steel cables. It wants to verify whether the diameter μ of the produced cables conforms to the standard diameter of 0.9 cm.
It takes a sample of 100 cables whose mean diameter is 0.93 cm with a standard deviation of S = 0.12. The hypotheses are the following:

Null hypothesis H0 : μ = 0.9

Alternative hypothesis H1 : μ ≠ 0.9 .

As the sample size is large enough, the sampling distribution of the mean can be approximated by a normal distribution. For a significance level α = 5%, we obtain in the normal table the critical value zα/2 = 1.96.

We find the acceptance region of the null hypothesis (see rejection region):

μ ± zα/2 · σ/√n ,

where σ is the standard deviation of the population. Since the standard deviation σ is unknown, we estimate it using the standard deviation S of the sample (S = 0.12), which gives:

0.9 ± 1.96 · 0.12/√100 , that is, the bounds 0.8765 and 0.9235 .

The acceptance region of the null hypothesis is the interval [0.8765, 0.9235]. Because the mean of the sample (0.93 cm) is outside this interval, we must reject the null hypothesis in favor of the alternative hypothesis. We conclude that, at a significance level of 5% and on the basis of the studied sample, the diameter of the cables does not conform to the norm.
A study is carried out to compare the mean lifetime of two types of tires. A sample of 50 tires of brand 1 gives a mean lifetime of 72000 km, and a sample of 40 tires of brand 2 gives a mean lifetime of 74400 km. Supposing that the standard deviation for brand 1 is σ1 = 3200 km and that of brand 2 is σ2 = 2400 km, we want to know whether there is a significant difference in lifetimes between the two brands, at a significance level α = 1%. The hypotheses are the following:

Null hypothesis H0 : μ1 = μ2

Alternative hypothesis H1 : μ1 ≠ μ2 .

The sample sizes are large enough, and the standard deviations of the populations are known, so that we can approximate the


sampling distribution by a normal distribution. The acceptance region of the null hypothesis is given by (see rejection region):

μ1 − μ2 ± zα/2 · σ_{x̄1−x̄2} ,

where σ_{x̄1−x̄2} is the standard deviation of the sampling distribution of the difference of two means, or the standard error of the difference of two means. The value zα/2 is found in the normal table. For a two-sided test with α = 1%, we get zα/2 = 2.575, which gives:

0 ± 2.575 · √(σ1²/n1 + σ2²/n2)
  = 0 ± 2.575 · √(3200²/50 + 2400²/40)
  = 0 ± 2.575 · 590.59
  = ±1520.77 .

The acceptance region of the null hypothesis is given by the interval [−1520.77, 1520.77]. Because the difference of the sample means (72000 − 74400 = −2400) is outside the acceptance region, we must reject the null hypothesis in favor of the alternative hypothesis. Thus we can conclude that there is a significant difference between the lifetimes of the tires of the two tested brands.
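The second example can be checked with a short Python sketch based on the normal approximation described above (a verification, not part of the original text):

# Sketch: acceptance region for the difference of two means (known sigmas)
import numpy as np
from scipy import stats

diff = 72000 - 74400
se = np.sqrt(3200**2 / 50 + 2400**2 / 40)     # standard error of the difference
z_crit = stats.norm.ppf(1 - 0.01 / 2)          # about 2.575 for alpha = 1%
print(se, z_crit * se)                         # about 590.6 and 1521
print(abs(diff) > z_crit * se)                 # True: reject H0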

FURTHER READING
→ Acceptance region
→ Alternative hypothesis
→ Hypothesis testing
→ Null hypothesis
→ One-sided test
→ Rejection region
→ Sampling distribution
→ Significance level

Two-Way Analysis of Variance

The two-way analysis of variance is an extension of the one-way analysis of variance in which there are two independent factors (variables). Each factor has two or more levels, and treatments are formed by making all possible combinations of the levels of the two factors.

HISTORY
It is Fisher, R.A. (1925) who gave the name "factorial experiments" to complex experiments. Yates, F. (1935, 1937) developed the concept and analysis of these factorial experiments.
See analysis of variance.

MATHEMATICAL ASPECTS
Consider an experiment with two factors, factor A having a levels and factor B having b levels. If the design associated with this factorial experiment is a completely randomized design, the model is the following:

Yijk = μ+ αi + βj + (αβ)ij + εijk ,

i = 1, 2, . . . , a (levels of factor A)

j = 1, 2, . . . , b (levels of factor B)

k = 1, 2, . . . , c (number of observations receiving the treatment ij) ,

where μ is the general mean common to all treatments, αi is the effect of level i of factor A, βj is the effect of level j of factor B, (αβ)ij is the effect of the interaction between


αi and βj, and εijk is the experimental error of observation Yijk.
This model is subject to the basic assumptions associated with the analysis of variance if one supposes that the errors εijk are independent random variables following a normal distribution N(0, σ²).
Three hypotheses can then be tested:
1.

H0 : α1 = α2 = . . . = αa

H1 : At least one αi is different from αj, i ≠ j.

2.

H0 : β1 = β2 = . . . = βb

H1 : At least one βi is different from βj, i ≠ j.

3.

H0 : (αβ)11 = (αβ)12 = . . . = (αβ)1b

= (αβ)21 = . . . = (αβ)ab

H1 : At least one of the interactions is different from the others.

The Fisher distribution is used to test the first hypothesis. This requires the creation of a ratio whose numerator is an estimate of the variance of factor A and whose denominator is an estimate of the variance within treatments, also known as the error or residual variance. This ratio, denoted by F, follows a Fisher distribution with a − 1 and ab(c − 1) degrees of freedom.

Null hypothesis H0 : α1 = α2 = . . . = αa

will be rejected at the significance level α if the ratio F is greater than or equal to the value in the Fisher table, that is, if

F ≥ Fa−1,ab(c−1),α .

To test the second hypothesis, we create a ratio whose numerator is an estimate of the variance of factor B and whose denominator is an estimate of the variance within treatments. This ratio, denoted by F, follows a Fisher distribution with b − 1 and ab(c − 1) degrees of freedom.

Null hypothesis H0 : β1 = β2 = . . . = βb

will be rejected if the F ratio is greater than or equal to the value in the Fisher table, that is, if

F ≥ Fb−1,ab(c−1),α .

To test the third hypothesis, we create a ratio whose numerator is an estimate of the variance of the interaction between factors A and B and whose denominator is an estimate of the variance within treatments. This ratio, denoted by F, follows a Fisher distribution with (a − 1)(b − 1) and ab(c − 1)

degrees of freedom.

Null hypothesis H0 :

(αβ)11 = (αβ)12 = . . . = (αβ)1b

= (αβ)21 = . . . = (αβ)ab

will be rejected if the F ratio is greater than or equal to the value in the Fisher table, that is, if

F ≥ F(a−1)(b−1),ab(c−1),α .

Variance of Factor A
For the variance of factor A, the sum of squares for factor A (SSA) must be calculated; it is obtained as follows:

SSA = b · c · Σ_{i=1}^{a} (Ȳi.. − Ȳ...)² ,

where Ȳi.. is the mean of all the observations of level i of factor A and Ȳ... is the general mean of all the observations.


The number of degrees of freedom associated to this sum is equal to a − 1. The variance of factor A is then equal to:

s²A = SSA / (a − 1) .

Variance of Factor B
For the variance of factor B, the sum of squares for factor B (SSB) must be calculated; it is obtained as follows:

SSB = a · c · Σ_{j=1}^{b} (Ȳ.j. − Ȳ...)² ,

where Ȳ.j. is the mean of all the observations of level j of factor B and Ȳ... is the general mean of all the observations. The number of degrees of freedom associated to this sum is equal to b − 1. The variance of factor B is then equal to:

s²B = SSB / (b − 1) .

Variance of Interaction AB
For the variance of interaction AB, the sum of squares for interaction AB (SSAB) must be calculated; it is obtained as follows:

SSAB = c · Σ_{i=1}^{a} Σ_{j=1}^{b} (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)² ,

where Ȳij. is the mean of all the observations of cell ij of interaction AB, Ȳi.. is the mean of all the observations of level i of factor A, Ȳ.j. is the mean of all the observations of level j of factor B, and Ȳ... is the general mean of all the observations. The number of degrees of freedom associated to this sum is equal to (a − 1)(b − 1). The variance of interaction AB is then equal to:

s²AB = SSAB / ((a − 1)(b − 1)) .

Variance Within Treatments (Error)
For the variance within treatments, the sum of squares within treatments is obtained as follows:

SSE = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{c} (Yijk − Ȳij.)² ,

where a and b are, respectively, the numbers of levels of factors A and B, c is the number of observations receiving treatment ij (constant for all i and j), Yijk is observation k from level i of factor A and level j of factor B, and Ȳij. is the mean of all the observations of cell ij of interaction AB. The number of degrees of freedom associated to this sum is equal to ab(c − 1). The variance within treatments is thus equal to:

s²E = SSE / (ab(c − 1)) .

The total sum of squares (SST) is equal to:

SST = Σ_{i=1}^{a} Σ_{j=1}^{b} Σ_{k=1}^{c} Y²ijk .

The number of degrees of freedom associated to this sum is equal to N, the total number of observations. The sum of squares for the mean is equal to:

SSM = N · Ȳ²... .

The number of degrees of freedom associated to this sum is equal to 1. The total sum of squares (SST) can be expressed with all the other sums of squares in the following way:

SST = SSM + SSA + SSB + SSAB + SSE .

Fisher Tests
We can now calculate the different F ratios.


To test the null hypothesis

H0 : α1 = α2 = . . . = αa ,

the first F ratio is formed, whose numerator is the estimate of the variance of factor A and whose denominator is the estimate of the variance within treatments:

F = s²A / s²E .

If F is greater than or equal to the value in the Fisher table for a − 1 and ab(c − 1) degrees of freedom, the null hypothesis is rejected and it is concluded that at least one αi is different from αj, i ≠ j.
To test the null hypothesis

H0 : β1 = β2 = . . . = βb ,

the second F ratio is formed, whose numerator is the estimate of the variance of factor B and whose denominator is the estimate of the variance within treatments:

F = s²B / s²E .

If F is greater than or equal to the value in the Fisher table for b − 1 and ab(c − 1) degrees of freedom, the null hypothesis is rejected and it is concluded that H1 is true.
To test the null hypothesis

H0 : (αβ)11 = (αβ)12 = . . . = (αβ)1b

= (αβ)21 = . . . = (αβ)ab

the third F ratio is formed, whose numerator is the estimate of the variance of interaction AB and whose denominator is the estimate of the variance within treatments:

F = s²AB / s²E .

If F is greater than or equal to the value in the Fisher table for (a − 1)(b − 1) and ab(c − 1) degrees of freedom, the null hypothesis is rejected and it is concluded that H1 is true.

Table of Analysis of Variance
All the information required to calculate the three F ratios can be summarized in a table of analysis of variance:

Source of           Degrees of     Sum of     Mean of    F
variation           freedom        squares    squares
Mean                1              SSM
Factor A            a − 1          SSA        s²A        s²A/s²E
Factor B            b − 1          SSB        s²B        s²B/s²E
Interaction AB      (a−1)(b−1)     SSAB       s²AB       s²AB/s²E
Within treatments   ab(c−1)        SSE        s²E
Total               abc            SST

If c = 1, meaning that there is only one observation per treatment ij (or per cell ij), it is not possible to attribute part of the error "within the groups" to the interaction.

EXAMPLES
The director of a school wants to test four brands of typewriters. He asks five professional secretaries to try each brand and repeat the exercise the next day. They must all type the same text, and after 15 min the average number of words typed in 1 min is recorded. Here are the results:


Typewriter    Secretary
              1     2     3     4     5
1             33    31    34    34    31
              36    31    36    33    31
2             32    37    40    33    35
              35    35    36    36    36
3             37    35    34    31    37
              39    35    37    35    40
4             29    31    33    31    33
              31    33    34    27    33

Let us do a two-way analysis of variance, the first factor being the brand of typewriter and the second the secretary. The hypotheses are the following:
1.

H0 : α1 = α2 = α3 = α4

H1 : At least one αi is different from αj, i ≠ j.

2.

H0 : β1 = β2 = β3 = β4 = β5

H1 : At least one βi is different from βj, i ≠ j.

3.

H0 : (αβ)11 = (αβ)12 = . . . = (αβ)15

= (αβ)21 = . . . = (αβ)45

H1 : At least one of the interactions is different from the others.

Variance of Factor A "Brands of Typewriters"

The sum of squares of factor A is equal to:

SSA = 5 · 2 · Σ_{i=1}^{4} (Ȳi.. − Ȳ...)²
    = 5 · 2 · [(33 − 34)² + (35.5 − 34)² + (36 − 34)² + (31.5 − 34)²]
    = 135 .

The number of degrees of freedom associated to this sum is equal to 4 − 1 = 3. The variance of factor A (mean square) is then equal to:

s²A = SSA / (a − 1) = 135/3 = 45 .

Variance of Factor B "Secretaries"
The sum of squares of factor B is equal to:

SSB = 4 · 2 · Σ_{j=1}^{5} (Ȳ.j. − Ȳ...)²
    = 4 · 2 · [(34 − 34)² + (33.5 − 34)² + . . . + (35.5 − 34)² + (34.5 − 34)²]
    = 40 .

The number of degrees of freedom associated to this sum is equal to 5 − 1 = 4. The variance of factor B (mean square) is then equal to:

s²B = SSB / (b − 1) = 40/4 = 10 .

Variance of Interaction AB "Brands of Typewriters – Secretaries"
The sum of squares of interaction AB is equal to:

SSAB = 2 · Σ_{i=1}^{4} Σ_{j=1}^{5} (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)²
     = 2 · [(34.5 − 33 − 34 + 34)² + (31 − 33 − 33.5 + 34)² + . . . + (33 − 31.5 − 34.5 + 34)²]


     = 2 · [1.5² + (−1.5)² + . . . + 1²]
     = 83 .

The number of degrees of freedom associated to this sum is equal to (4 − 1)(5 − 1) = 12. The variance of interaction AB is then equal to:

s²AB = SSAB / ((a − 1)(b − 1)) = 83/12 = 6.92 .

Variance Within Treatments (Error)
The sum of squares within treatments is equal to:

SSE = Σ_{i=1}^{4} Σ_{j=1}^{5} Σ_{k=1}^{2} (Yijk − Ȳij.)²
    = (33 − 34.5)² + (36 − 34.5)² + (31 − 31)² + . . . + (33 − 33)²
    = 58 .

The number of degrees of freedom associated to this sum is equal to 4 · 5 · (2 − 1) = 20:

s²E = SSE / (ab(c − 1)) = 58/20 = 2.9 .

The total sum of squares is equal to:

SST = Σ_{i=1}^{4} Σ_{j=1}^{5} Σ_{k=1}^{2} Y²ijk
    = 33² + 36² + 31² + . . . + 27² + 33² + 33²
    = 46556 .

The number of degrees of freedom associated to this sum is equal to the number N of observations, that is, 40. The sum of squares for the mean is equal to:

SSM = N · Ȳ²... = 40 · 34² = 46240 .

The number of degrees of freedom associated to this sum is equal to 1.

Fisher Tests
To test the null hypothesis

H0 : α1 = α2 = α3 = α4 ,

we form the first F ratio, whose numerator is the estimate of the variance of factor A and whose denominator is the estimate of the variance within treatments:

F = s²A / s²E = 45/2.9 = 15.5172 .

The value of the Fisher table with 3 and 20 degrees of freedom for a significance level α = 0.05 is equal to 3.10. Since F ≥ F3,20,0.05, we reject the H0

hypothesis and conclude that the brand of typewriter has a significant effect on the number of words typed in 1 min.
To test the null hypothesis

H0 : β1 = β2 = β3 = β4 = β5 ,

we form the second F ratio, whose numerator is the estimate of the variance of factor B and whose denominator is the estimate of


the variance within treatments:

F = s²B / s²E = 10/2.9 = 3.4483 .

The value of the Fisher table with 4 and 20 degrees of freedom for α = 0.05 is equal to 2.87. Since F ≥ F4,20,0.05, we reject the H0

hypothesis and conclude that the secretaries have a significant effect on the number of words typed in 1 min.
To test the null hypothesis

H0 : (αβ)11 = (αβ)12 = . . . = (αβ)15

= (αβ)21 = . . . = (αβ)45 ,

we form the third F ratio, whose numerator is the estimate of the variance of interaction AB and whose denominator is the estimate of the variance within treatments:

F = s²AB / s²E = 6.92/2.9 = 2.385 .

The value of the Fisher table with 12 and 20 degrees of freedom for α = 0.05 is equal to 2.28. Since F ≥ F12,20,0.05, we reject the H0

hypothesis and conclude that the effect of the interaction between the secretary and the type of typewriter on the number of words typed in 1 min is significant.

Table of Analysis of Variance
All this information is summarized in the table of analysis of variance:

Source of           Degrees of    Sum of     Mean of    F
variation           freedom       squares    squares
Mean                1             46240
Factor A            3             135        45         15.5172
Factor B            4             40         10         3.4483
Interaction AB      12            83         6.92       2.385
Within treatments   20            58         2.9
Total               40            46556
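The same table can be reproduced with statsmodels; the sketch below is a hedged illustration (the data-frame layout and the column names "words", "machine", and "secretary" are assumptions, not part of the original text):

# Sketch: two-way ANOVA with interaction on the typewriter data
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = {
    1: {1: [33, 36], 2: [31, 31], 3: [34, 36], 4: [34, 33], 5: [31, 31]},
    2: {1: [32, 35], 2: [37, 35], 3: [40, 36], 4: [33, 36], 5: [35, 36]},
    3: {1: [37, 39], 2: [35, 35], 3: [34, 37], 4: [31, 35], 5: [37, 40]},
    4: {1: [29, 31], 2: [31, 33], 3: [33, 34], 4: [31, 27], 5: [33, 33]},
}
rows = [(m, s, y) for m, d in data.items() for s, ys in d.items() for y in ys]
df = pd.DataFrame(rows, columns=["machine", "secretary", "words"])

model = ols("words ~ C(machine) * C(secretary)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
# F for machine ~ 15.52, secretary ~ 3.45, interaction ~ 2.38, as in the table above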

FURTHER READING
→ Analysis of variance
→ Contrast
→ Fisher table
→ Fisher test
→ Interaction
→ Least significant difference test
→ One-way analysis of variance

REFERENCES
Cox, D.R., Reid, N.: The Theory of the Design of Experiments. Chapman & Hall, London (2000)
Fisher, R.A.: Statistical Methods for Research Workers. Oliver & Boyd, Edinburgh (1925)
Yates, F.: Complex experiments. Suppl. J. Roy. Stat. Soc. 2, 181–247 (1935)
Yates, F.: The design and analysis of factorial experiments. Technical Communication of the Commonwealth Bureau of Soils 35, Commonwealth Agricultural Bureaux, Farnham Royal (1937)

Type I Error

When hypothesis testing is carried out, a type I error occurs when one rejects the


null hypothesis when it is true; α is the probability of rejecting the null hypothesis H0 when it is true:

α = P(reject H0 | H0 is true) .

The probability of type I error α is equal to the significance level of the hypothesis test.

HISTORY
In 1928, Neyman, J. and Pearson, E.S. were the first authors to recognize that a rational choice among hypothesis tests must take into account not only the hypothesis that one wants to verify but also the alternative hypothesis. They introduced the type I error and the type II error.

DOMAINS AND LIMITATIONS
The type I error is one of the two errors the statistician is confronted with in hypothesis testing: it is the type of error that can occur in decision making if the null hypothesis is true.
If the null hypothesis is false, another type of error arises, called the type II error, meaning accepting the null hypothesis H0 when it is false.
The different types of errors can be represented by the following table:

Situation     Decision
              Accept H0    Reject H0
H0 true       1 − α        α
H0 false      β            1 − β

where α is the probability of the type I error and β is the probability of the type II error.

EXAMPLES
We will illustrate, using an example from everyday life, the different errors that one can commit in making a decision.

When we leave home in the morning, we wonder what the weather will be like. If we think it is going to rain, we take an umbrella. If we think it is going to be sunny, we do not take anything for bad weather. Therefore we are confronted with the following hypotheses:

Null hypothesis H0: It is going to rain.

Alternative hypothesis H1: It is going to be sunny.

Suppose we accept the rain hypothesis and take an umbrella. If it really does rain, we have made the right decision, but if it is sunny, we have made an error: the error of accepting a false hypothesis.
In the opposite case, if we reject the rain hypothesis, we have made a good decision if it is sunny, but we will have made an error if it rains: the error of rejecting a true hypothesis.
We can represent these different types of errors in the following table:

Situation Decision

Accept H0(I take anumbrella)

Reject H0(I do not takean umbrella)

H0 true(it rains)

Gooddecision

Type I error

H0 false(it is sunny)

Type II error β Gooddecision

Theprobabilityαofrejectingatruehypoth-esis is called level of significance.The probability β of accepting a falsehypothesis is the probability of type IIerror.

FURTHER READING� Hypothesis testing� Significance level� Type II error

Page 550: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

552 Type II Error

REFERENCESNeyman, J., Pearson, E.S.: On the use and

interpretation of certain test criteria forpurposes of statistical inference, Parts Iand II. Biometrika 20A, 175–240, 263–294 (1928)

Type II Error

When carrying out a hypothesis test,a type II error occurs when accepting thenull hypothesis given it is false and β is theprobability of accepting the null hypothe-sis H0 when it is false:

β = P(accept H0 | H0 is false) .

HISTORYSee type I error.

DOMAINS AND LIMITATIONSThe probability of type II error is denotedby β, the probability of rejecting the nullhypothesis when it is false is given by 1−β:

β = P(reject H0 | H0 is false) .

The probability 1−β is called the power ofthe test.Therefore it is always necessary to try andminimize the risk of having a type II error,which means increasing the power of thehypothesis test.

EXAMPLESSee type I error.

FURTHER READING� Hypothesis testing� Type I error

Page 551: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

U

Uniform DistributionA uniform distribution is used to describea population that is uniformly distributed inan interval.Arandom variableX is said to beuniformlydistributed in the interval [a, b] if its densityfunction is given by:

f (x) =⎧⎨⎩

1

b− aif a ≤ x ≤ b

0 otherwise .

Because of the rectangular shape of its den-sity function, the uniform distribution isalso sometimes called the rectangular distri-bution.

Uniform distribution, a = 1, b = 3

The uniform distribution is a continuousprobability distribution.

MATHEMATICAL ASPECTSThe distribution function of a uniformdistribution is the following:

F (x) =

⎧⎪⎪⎨⎪⎪⎩

0 if x ≤ ax− a

b− aif a < x < b

1 otherwise .

The expected value of a random variablethat is uniformly distributed in an interval[a, b] is equal to:

E [X] =∫ b

a

x

b− adx = b2 − a2

2 (b− a)= a+ b

2.

This result is intuitively very easy to under-stand because the density function is sym-metric around the middle point of the inter-val [a, b].We can calculate

E[X2

]=

∫ b

ax2f (x) dx = b2 + ab+ a2

3.

The variance is therefore equal to:

Var (X) = E[X2

]− (E [X])2 = (b− a)2

12.

The uniform distribution is a particular caseof the beta distribution when α = β = 1.

DOMAINS AND LIMITATIONSApplications of the uniform distribution arenumerous in the construction of models forphysical, biological, and social phenomena.The uniform distribution is often used as

Page 552: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

554 Unimodal Distribution

a model for errors of approximation dur-ing measurements.For example, if the lengthof some objects is measured with a 1-cmprecision, a length of 35 cm can representall the objects with a length between 34.5and 35.5 cm. The approximation errors fol-low a uniform distribution in the interval[−0.5, 0.5].Theuniform distribution isoftenused togen-erate random numbers of any discrete orcontinuous probability distribution. Themethod used to generate a continuous distri-bution is the following.Consider X a continuous random variableand F(x) its distribution function. Supposethat F is continuous and strictly increasing.Then U = F(X) is a uniform and con-tinuous random variable. U takes its valuesin the interval [0,1]. We then have X =F−1(U), where F−1 is the inverse functionof the distribution function.Therefore, if u1, u2, . . . , un is a series of ran-dom numbers, then:

x1 = F−1(u1) ,

x2 = F−1(u2) ,

. . . ,

xn = F−1(un)

isaseriesof randomnumbersgeneratedfromthe distribution of the random variable X.

EXAMPLESIf random variable X follows a negativeexponential distribution of mean equal

to 1, then its distribution function is

F (x) = 1− exp (−x) .

Since F(x) = u, we have u = 1− e−x.To find x, we search for F−1:

u = 1− e−x ,

u− 1 = −e−x ,

1− u = e−x ,

ln(1− u) = −x ,

− ln(1− u) = x .

Notice that if u is a random variable that isuniform in [0,1], then (1− u) is also a ran-dom variable that is uniform in [0,1].Byapplying theaforementioned toaseriesofrandom numbers, we find a sample takenfrom X:

U X

0.14 1.97

0.97 0.03

0.53 0.63

0.73 0.31

......

FURTHER READING� Beta distribution� Continuous probability distribution� Discrete uniform distribution

Unimodal Distribution

See frequency distribution.

Page 553: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

V

Value

A value is a quantitative or qualitative mea-sure associated to a variable.Wespeakaboutasetofvalues takenbyavari-able. The values can be cardinal numbers ifthey are associated to a quantitative vari-able or ordinal numbers if they are associat-ed to a qualitative categorical variable.

EXAMPLESThe values taken by a certain variable canbe of the following type:• Consider the quantitative variable

“number of children”. The values associ-ated to this variable are the natural num-bers 0, 1, 2, 3, 4, etc.

• Consider the qualitative categoricalvariable “sex.” The associated valuesare the categories “masculine” and “fem-inine”.

FURTHER READING� Qualitative categorical variable� Quantitative variable� Variable

Variable

A variable is a measurable characteristic towhich are attributable many different vari-ables.

We distinguish many types of variables:• Quantitative variable, of which we dis-

tinguish:– Discrete variable, e.g., the result of

a test– Continuous variable, e.g., a weight or

revenue• Qualitative categorical variable, e.g.,

marital status

FURTHER READING� Data� Dichotomous variable� Qualitative categorical variable� Quantitative variable� Random variable� Value

Variance

Variance is a measure of dispersion ofa distribution of a random variable.Empirically, the variance of a quantitativevariable X is defined as the sum of squareddeviations of each observation relative tothe arithmetic mean divided by the numberof observations.

HISTORYThe analysis of variance such as we under-stand and practice it today was principal-

Page 554: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

556 Variance

ly developed by Fisher, Ronald Aylmer(1918, 1925, 1935). He also introduced theterms variance and analysis of variance.

MATHEMATICAL ASPECTSVariance is generally denoted by S2 when itis relative to a sample and by σ 2 when it isrelative to a population.We also denote the variance by Var(X)whenwe speak about the variance of a randomvariable.LetapopulationofN observationsberelativeto a quantitative variable X. By definition,the variance of a population is calculated asfollows:

σ 2 =

N∑i=1

(xi − μ)2

N,

where N is the dimension of the populationand μ the mean of the observation:

μ =

N∑i=1

xi

N.

When the observations are ordered in theform of a frequency distribution, the cal-culation of the variance is made in the fol-lowing manner:

σ 2 =

k∑i=1

fi · (xi − μ)2

k∑i=1

fi

,

where xi are the different values of the vari-able, fi are thefrequenciesassociated tothesevalues, and k is the number of different val-ues.To calculate the variance of the frequencydistribution of aquantitativeXwhere theval-ues are grouped in classes, we consider that

all the observations belonging to a certainclass take the values of the center of theclass.It is correct only if the hypothesis specify-ing that the observations are uniformly dis-tributed inside each class is verified. If thishypothesis is not verified, the obtained val-ue of the variance will be only approxima-tive.For the values grouped in classes, we have:

σ 2 =

k∑i=1

fi · (δi − μ)2

k∑i=1

fi

,

where δi are the centers of the classes, fi arethe frequencies associated to each class, andk is the number of classes.The formula for calculating the variance canbe modified to decrease the time of calcula-tion and to increase the precision. Thus it isbetter to calculate the variance with the fol-lowing formula:

σ 2 =N ·

N∑i=1

x2i −

(N∑

i=1xi

)2

N2 .

The variance corresponds to the centeredmoment of order two.Thevariancemeasuredonasample isanesti-mator of the variance of the population.Consider a sample of n observations relativeto a quantitative variable X and denote by xthe arithmetic mean of this sample:

x =

n∑i=1

xi

n.

To let the variance of the sample be an esti-matorwithoutbiasof thevarianceof thepop-ulation, it must be calculated by dividing by

Page 555: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

V

Variance 557

(n− 1) and not by n:

S2 =

n∑i=1

(xi − x)2

n− 1.

EXAMPLESFive students passed two exams, earning thefollowing grades:

Exam 1:

3.5 4 4.5 3.5 4.5

Exam 2:

2.5 5.5 3.5 4.5 4

Note that thearithmetic mean xof these twosets of observations is identical:

x1 = x2 = 4 .

The dispersion of the observations aroundthe mean is not the same.To determine the variance, first we calculatethe deviations of each observation from thearithmetic mean and then square the devia-tions:

Exam 1:

Grade (xi − x1) (xi − x1)2

3.5 −0.5 0.25

4 0.0 0.00

4.5 0.5 0.25

3.5 −0.5 0.25

4.5 0.5 0.255∑

i=1

(xi − x1

)2 1.00

Exam 2:

Grade(xi − x2

) (xi − x2

)2

2.5 −1.5 2.25

5.5 1.5 2.25

3.5 −0.5 0.25

Grade(xi − x2

) (xi − x2

)2

4.5 0.5 0.25

4 0.0 0.005∑

i=1

(xi − x2

)2 5.00

The variance of the grades of each examequals:

Exam 1:

S21 =

5∑i=1

(xi − x1)2

n− 1= 1

4= 0.25

Exam 2:

S22 =

5∑i=1

(xi − x2)2

n− 1= 5

4= 1.25

Since the variance of the grades of the sec-ond exam is greater, on the second exam thegrades are more dispersed around the arith-metic mean than on the first exam.

FURTHER READING� Analysis of variance� Measure of dispersion� Standard deviation� Variance of a random variable

REFERENCESFisher, R.A.: The correlation between rel-

atives on the supposition of Mendelianinheritance. Trans. Roy. Soc. Edinb. 52,399–433 (1918)

Fisher, R.A.: Statistical Methods forResearch Workers. Oliver & Boyd, Edin-burgh (1925)

Fisher, R.A.: The Design of Experiments.Oliver & Boyd, Edinburgh (1935)

Page 556: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

558 Variance Analyses

Variance Analyses

See analysis of variance.

Variance of a Random Variable

The variance of a quantitative randomvariable measures the value of the meandeviations of the values of this random vari-able from the mathematical expectancy(that is, from its mean).

MATHEMATICAL ASPECTSDepending on whether the random vari-able is discrete or continuous, we speakabout thevarianceofadiscreteorcontinuousrandom variable.In the case where the random variable is dis-creteX taking n valuesx1, x2, . . . , xn with rel-ative frequencies f1, f2, . . . , fn, the varianceis defined by:

σ 2 = Var (X) = E[(X − μ)2

]

=n∑

i=1

fi (xi − μ)2 ,

where μ represents the mathematicalexpectancy of X.We can establish another formula for whichthe calculation of the variance is made thus:

Var (X) = E[(X − μ)2

]

= E[X2 − 2μX + μ2

]

= E[X2

]− E [2μX]+ E

[μ2

]

= E[X2

]− 2μE [X]+ μ2

= E[X2

]− 2μ2 + μ2

= E[X2

]− μ2

= E[X2

]− (E [X])2

=n∑

i=1

fix2i −

(n∑

i=1

fixi

)2

.

In practice, this formula is generally bettersuited for making the calculations.If random variable X is continuous ininterval D, the expression of the variancebecomes:

σ 2 = Var (X) = E[(X − μ)2

]

=∫

D(x− μ)2 f (x) dx .

We can also transform this formula by thesame operations as in the discrete case, andso the variance equals:

Var(X) =∫

Dx2f (x)dx−

(∫

Dxf (x)dx

)2

.

Properties of the Variance1. Let a and b be two constants and X a ran-

dom variable:

Var (aX + b) = a2 ·Var (X) .

2. Let X and Y be two random variables:

Var (X + Y) = Var (X)+ Var (Y)

+ 2Cov (X, Y) ,

Var (X − Y) = Var (X)+ Var (Y)

− 2Cov (X, Y) ,

where Cov (X, Y) is the covariancebetween X and Y. In particular, if X andY are independent:

Var (X + Y) = Var (X)+ Var (Y)

Var (X − Y) = Var (X)+ Var (Y) .

Page 557: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

V

Variance–Covariance Matrix 559

EXAMPLESWe consider two examples, one concerninga discrete random variable, another a con-tinuous random variable.We throw a die several times in a row. Sup-pose we win 1 euro if the result is even and2 euros if the result is 1 or 3, and that we lose3 euros if the result is 5.Random variable X describes the number ofeuros won or lost. The following table repre-sents different values of X and their respec-tive probabilities:

X −3 1 2

P (X ) 16

36

26

The variance of discrete random variable Xequals:

Var (X) =3∑

i=1

fix2i −

[3∑

i=1

fixi

]2

= (−3)2 1

6+ (1)2 3

6+ (2)2 2

6

−(−3

1

6+ 1

3

6+ 2

2

6

)2

= 26

9.

Consider a continuous random variable Xwhose density function is uniform on (0, 1):

f (x) ={

1 for 0 < x < 10 otherwise .

The mathematical expectancy of this ran-dom variable equals 1

2 .We can calculate the variance of randomvariable X:

Var (X) =∫ 1

0x2f (x) dx−

(∫ 1

0xf (x) dx

)2

=∫ 1

0x2 · 1dx−

(∫ 1

0x · 1dx

)2

= x3

3

∣∣∣∣1

0−

(x2

2

∣∣∣∣1

0

)2

= 1

3−

(1

2

)2

= 1

12.

FURTHER READING� Expected value� Random variable

Variance–Covariance MatrixThe variance–covariance matrix V betweenn variables X1, . . . , Xn is a matrix of order(n× n), where the diagonal contains thevariances of each variable Xi and outside thediagonals are the covariances between thepairs of variables (Xi, Xj) for each i �= j. For nvariables, we calculate n variances and n(n−1) covariances. The result is that matrix Vis a matrix of dimension n× n. In the liter-ature, we often encounter this matrix underanother name: dispersion matrix or covari-ance matrix.If the variables are standardized (see stan-dardized data), the variances of the vari-ables equal 1 and the covariances becomecorrelations, and so the matrix becomesa correlation matrix.

HISTORYSee variance and covariance.

MATHEMATICAL ASPECTSLet X1, . . . , Xn be n random variables.The variance–covariance matrix is a matrixoforder (n× n); it is symmetricand contains

Page 558: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

560 Variance–Covariance Matrix

on the diagonal the variances of each vari-able and as the other terms the covariancesbetween the variables. Matrix V is the fol-lowing matrix:⎡⎢⎢⎣

Var (X1) Cov (X1, X2) · · · Cov (X1, Xn)

Cov (X2, X1) Var (X2) · · · Cov (X2, Xn)

......

. . ....

Cov (Xn, X1) Cov (Xn, X2) · · · Var (Xn)

⎤⎥⎥⎦ ,

with

Var (Xi) = E[X2

i

]− (E [Xi])

2 and

Cov(Xi, Xj

)

= E[(Xi − E[Xi])

(Xj − E[Xj]

)].

EXAMPLESConsider the following three variablesX1, X2, and X3 summarized in the followingtable:

X1 X2 X3

1 2 3

2 3 4

1 2 3

5 4 3

4 4 4

In the case n = 3, each random variable con-sists of five terms, which we denote N = 5.We want to calculate the variance–covariance matrix V of these variables. Wecomplete the following steps.We estimate the expectancy of each variableby itsempiricalmean xi and get the followingresults:

X1 = 2.6

X2 = 3

X3 = 3.4 .

The variance for a variable Xj is given by:

N∑i=1

(xij − xj

)2

4.

ThecovariancebetweentwovariablesXr andXs is calculated as follows:

N∑i=1

(xir − xr) (xis − xs)

4.

Then we should subtract the mean Xi of eachvariable Xi. We get a new variable Yi = Xi−Xi.

Y1 Y2 Y3

−1.6 −1 −0.4

−0.6 0 0.6

−1.6 −1 −0.4

2.4 1 −0.4

1.4 1 0.6

Note that the mean of each new variable Yi

equals0.Wefindtheseresults inmatrixform:

Y =

⎡⎢⎢⎢⎢⎢⎣

−1.6 −1 −0.4−0.6 0 0.6−1.6 −1 −0.4

2.4 1 −0.41.4 1 0.6

⎤⎥⎥⎥⎥⎥⎦

.

From this matrix we calculate matrix Y′ ·Y,a symmetric matrix, in the following form:⎡⎢⎢⎢⎢⎢⎢⎢⎢⎣

N∑i=1

y2i1

N∑i=1

yi2 · yi1

N∑i=1

yi3 · yi1

N∑i=1

yi2 · yi1

N∑i=1

y2i2

N∑i=1

yi3 · yi2

N∑i=1

yi3 · yi1

N∑i=1

yi3 · yi2

N∑i=1

y2i3

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎦

.

This matrix contains the values of the sumof products of squares of each variable Yi

(on the diagonal) and the cross product ofeach pairofvariables (Yi, Yj)on the restof thematrix. For the original variables Xi, the fol-lowing matrix contains on the diagonal thesum of squared deviations from the mean for

Page 559: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

V

Vector 561

each variable and on the external terms thecross products of these deviations.Numerically we get:

⎡⎣

13.2 7 0.87 4 1

0.8 1 1.2

⎤⎦ .

The mean of the sum of squared deviationsfrom the mean of a random variable Xi of Nterms is known as the variance. The meanof the crossed products of these deviationsis called the covariance. Thus to get thevariance–covariance matrix V of the vari-ables X1, . . . , Xn, we should divide the ele-ments of matrix Y′ · Y by N − 1, which ismathematically the same as:

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎣

N∑i=1

y2i1

N−1

N∑i=1

yi2·yi1

N−1

N∑i=1

yi3·yi1

N−1N∑

i=1yi2·yi1

N−1

N∑i=1

y2i2

N−1

N∑i=1

yi3·yi2

N−1N∑

i=1yi3·yi1

N−1

N∑i=1

yi3·yi2

N−1

N∑i=1

y2i3

N−1

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎦

.

The fact of dividing by N−1 and not by N forobtaining the means is related to the fact thatwe generally work with a sample of randomvariables without knowing the real popula-tion and for which we make a nonbiased esti-mation of the variance and of the covariance.Numerically we get:

V =⎡⎣

3.3 1.75 1.751.75 1 0.20.2 0.25 0.3

⎤⎦ .

FURTHER READING� Correlation coefficient� Covariance� Standardized data� Variance

Vector

We call a vector of dimension n an (n ×1) matrix , that is, a table having only onecolumn and n rows. The elements in thistable are called coefficients or vector com-ponents.

MATHEMATICAL ASPECTSLet x = (xi) and y = (yi) for i = 1, 2, . . . , nbe two vectors of dimension n.We define:• The sum of two vectors made component

by component:

x+ y = (xi)+ (yi) = (xi + yi) .

• The product of vector x by the scalar k:

k · x = (k · xi) ,

where each component is multiplied by k.• The scalar product of x and y:

x′ · y = (xi)′ · (yi) =

n∑i=1

xi · yi .

• The norm of x:

‖ x ‖ = √x′ · x

‖ x ‖ =√√√√

n∑i=1

x2i .

We call a unit vector a vector of norm (orlength)1.Weobtainaunitvectorbydivid-ing a vector by its norm. Thus for exam-ple

1

‖ x ‖ (xi)

is a unit vector having the same directionas x.

Page 560: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

562 Von Mises, Richard

• The ith vector of the canonic bases inthe space of n dimensions is definedby the vector of dimension n havinga “1” on line i and “0” for other compo-nents.

• A vector is a matrix of a certain dimen-sion (n × 1). It is possible to multiplya vector by an (m×n)matrix according tothe multiplication rules defined for matri-ces.

DOMAINS AND LIMITATIONSA vector can be considered a column ofnum-bers, whence the term vector-column, just aswe use the term vector-line to designate thetranspose of a vector.Thus the inner product of two vectorscorresponds to a vector-row multiplied bya vector-column.

EXAMPLESIn a space of three dimensions, we considerthe vectors:

x =⎡⎣

123

⎤⎦ and y =

⎡⎣

010

⎤⎦ ,

where y is the second vector of the canonicbasis.– The norm of x is given by:

‖ x ‖2 =⎡⎣

123

⎤⎦′

·⎡⎣

123

⎤⎦

= [1 2 3] ·⎡⎣

123

⎤⎦

= 12 + 22 + 32 = 1+ 4+ 9

= 14 ,

from where ‖ x ‖= √14 ≈ 3.74.

– In the same way, we get ‖ y ‖= 1.– The sum of two vectors x and y equals:

x+ y =⎡⎣

123

⎤⎦+

⎡⎣

010

⎤⎦

=⎡⎣

1+ 02+ 13+ 0

⎤⎦ =

⎡⎣

133

⎤⎦ .

– The multiplication of vector x by thescalar product 3 gives:

3 · x = 3 ·⎡⎣

123

⎤⎦

=⎡⎣

3 · 13 · 23 · 3

⎤⎦ =

⎡⎣

369

⎤⎦ .

– The scalar product of x and y equals:

x′ · y =⎡⎣

123

⎤⎦′

·⎡⎣

010

⎤⎦

= [1 2 3] ·⎡⎣

010

⎤⎦

= 1 · 0+ 2 · 1+ 3 · 0 = 2 .

FURTHER READING� Matrix� Transpose

Von Mises, Richard

Von Mises, Richard was born in 1883 inLemberg in the Austro–Hungarian Empire(now L’viv, Ukraine) and died in 1953. Von

Page 561: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

V

Von Mises, Richard 563

Mises studied mechanical engineering at theTechnical University in Vienna until 1906.He became assistant to Georg Hamel, pro-fessor of mechanics, in Bruenn (now Brno,Czech Republic), where he received theVenia Legendi in 1908. He then workedas associate professor of applied mathe-matics. After the First World War, in 1920,hebecamedirectorof theInstituteofAppliedMathematics in Berlin.Von Mises, Richard is principally known forhis work on the foundations of probabilityand statistics, which were rehabilitated in

the 1960s. He founded the applied schoolof mathematics and wrote his first work onphilosophical positivism in 1939.

Selected principal works and articles of vonMises, Richard:

1928 Wahrscheinlichkeit, Statistik undWahrheit. Springer, Berlin Heidel-berg New York.

1939 Probability, Statistics and Truth,trans. Neyman, J., Scholl, D., andRabinowitsch, E. Hodge, Glasgow.

Page 562: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

W

Weighted Arithmetic Mean

The weighted arithmetic mean is a mea-sure of central tendency of a set of quan-titative observations when not all the obser-vations have the same importance.We must assign a weight to each observationdepending on its importance relative to otherobservations.The weighted arithmetic mean equalsthe sum of observations multiplied bytheir weights divided by the sum of theirweights.

HISTORYThe weighted arithmetic mean was intro-duced by Cotes, Roger in 1712. His workwas published in 1722, six years after hisdeath.

MATHEMATICAL ASPECTSLet x1, x2, . . . , xn be a set of n quantitiesor n observations relative to a quantitativevariable X to which we assign the weightsw1, w2, . . . , wn.The weighted arithmetic mean equals:

x =

n∑i=1

xi · wi

n∑i=1

wi

.

DOMAINS AND LIMITATIONSTheweighted arithmeticmean isnowused ineconomics, especially in consumer and pro-ducer price indices, etc.

EXAMPLESSuppose that during a course on the com-pany, management was composed of threerelatively different requirements of differentimportance:

Individual project 30%

Mid-term exam 20%

Final exam 50%

Each student receives a grade on a scale of 1to 10. One student receives an 8 for his indi-vidual project, a 9 on themid-term exam, anda 4 on the final exam.His final grade is calculated by the weightedarithmetic mean:

x = (8 · 30)+ (9 · 20)+ (4 · 50)

30+ 20+ 50= 6.2 .

In this example, the weights correspondto the relative importance of the differentrequirements of the course.

FURTHER READING� Arithmetic mean� Mean� Measure of central tendency

Page 563: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

566 Weighted Least-Squares Method

REFERENCESCotes, R.: Aestimatio Errorum in Mixta

Mathesi, per variationes partium Trian-guli plani et sphaerici. In: Smith, R. (ed.)Opera Miscellania, Cambridge (1722)

Weighted Least-SquaresMethod

The weighted least-squares method is usedwhen the variance of errors is not constant,that is, when the following hypothesis ofthe least-squares method is violated: thevariance of errors is constant (equal to theunknown value σ 2) for any observation i(that is, whatever the value of the concernedxij). Thus, instead of having, for each i =1, . . . , n, Var(εi) = σ 2, we have:

Var (εi) = σ 2wi ,

where the weights wi > 0 can be differentfor each i = 1, . . . , n.

MATHEMATICAL ASPECTSIn the matrix form, we have the model

Y = Xβ + ε ,

where Y is the vector (n × 1) of the obser-vations relative to the dependent variable(n observations), β is the vector (p × 1) ofparameters to be estimated, ε is the vector(n× 1) of errors, and

X =⎛⎜⎝

1 X11 . . . X1(p−1)

......

...1 Xn1 . . . Xn(p−1)

⎞⎟⎠

is the matrix (n × p) of the plan concern-ing the independent variables. Moreover, wehave:

Var (ε) = σ 2V ,

where

V =

⎛⎜⎜⎜⎝

1/w1 0 · · · 00 1/w2 · · · 0...

.... . .

...0 0 · · · 1/wn

⎞⎟⎟⎟⎠ .

StatingYw =WY ,

Xw =WX ,

εw =Wε ,

with

W =

⎛⎜⎜⎜⎝

√w1 0 · · · 00

√w2 · · · 0

......

. . ....

0 0 · · · √wn

⎞⎟⎟⎟⎠ ,

such as W′W = V−1, we obtain the equiv-alent model:

Yw= Xwβ + εw ,

where

Var (εw) = Var (Wε) =WVar (ε) W′

= σ 2 ·WVW = σ 2In ,

where In is the identity matrix of dimen-sion n. Since, for this new model, the vari-ance of errors is constant, the least-squaresmethod is used. The vector of estimators is:

βw =(X′wXw

)−1 X′wYw

= (X′W′WX

)−1 X′W′WY

=(

X′V−1X)−1

X′V−1Y .

The vectors of estimated values for Yw =WY become:

Yw = Xwβw

=WX(

X′V−1X)−1

X′V−1Y .

Page 564: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

W

Weighted Least-Squares Method 567

We can obtain a vector of estimated valuesfor Y =W−1Yw setting:

Y =W−1Yw

=W−1WX(

X′V−1X)−1

X′V−1Y

= Xβw .

The variance σ 2 is estimated by:

s2w =

(Yw − Yw

)′ (Yw − Yw

)

n− p

=(Y− Y

)′W′W

(Y− Y

)

n− p

=

n∑i=1

wi (yi − yi)2

n− p.

In the case of a simple regression, froma sample of n observations (xi, yi) andweights wi we have:

X′V−1X =

⎛⎜⎜⎝

n∑i=1

wi

n∑i=1

wixi

n∑i=1

wixi

n∑i=1

wix2i

⎞⎟⎟⎠ ,

XV−1Y =

⎛⎜⎜⎝

n∑i=1

wiyi

n∑i=1

wixiyi

⎞⎟⎟⎠ ,

and thus if we set βw =(

β0w

β1w

), we find:

β0w =

n∑i=1

wiyi

n∑i=1

wix2i −

n∑i=1

wixi

n∑i=1

wixiyi

n∑i=1

wi

n∑i=1

wix2i −

(n∑

i=1wixi

)2,

β1w =

n∑i=1

wi

n∑i=1

wixiyi −n∑

i=1wixi

n∑i=1

wiyi

n∑i=1

wi

n∑i=1

wix2i −

(n∑

i=1wixi

)2 .

Considering the ponderated weightedmeans:

xw =

n∑i=1

wixi

n∑i=1

wi

, yw =

n∑i=1

wiyi

n∑i=1

wi

,

we have the estimators:

β1w =

n∑i=1

wixiyi −n∑

i=1wixwyw

n∑i=1

wix2i −

n∑i=1

wix2w

=

n∑i=1

wi (xi − xw)(yi − yw

)

n∑i=1

wi (xi − xw)2

β0w = yw − β1wxw

that strongly resemble the least-squares esti-mators, except that we give more weightto the observations with a large wi (thatis, observation for which the errors havea smaller variance in the initial model). Theestimated values are given by:

yi = β0w + β1wxi = yw + β1w (xi − xw)

and the variance σ 2 is estimated by:

s2w =

n∑i=1

wi (yi − yi)2

n− 2

=

n∑i=1

wi(yi−yw

)2−β21w

n∑i=1

wi(xi−xw)2

n− 2.

Examples of weights wi chosen in simpleregression are given by wi = 1/xi in caseswhere the variance of εi is proportional to xi,or where wi = 1/x2

i when the variance of εi

is proportional to x2i .

Page 565: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

568 Weighted Least-Squares Method

DOMAINS AND LIMITATIONSThe greatest disadvantage of the weightedleast-squares method, which many peopleprefer not to know about, is probably thefact that it is based on the assumption thatthe weight is known exactly. This is almostnever the case in practice, and the estimat-ed weights are used instead. Generally, it isdifficult to evaluate the effect of the use ofestimated weights, but experience indicatesthat the results of most regression analysesarenotverysensitivetotheweightsused.Theadvantages of a weighted analysis are oftenobtained on a large scale, though not alwaysentirely, with the approximate weights. It isimportant to be aware of and avoid this prob-lem and to use only weights that can be esti-mated with precision relative to one anoth-er. The weighted least-squares regression, asother least-squares methods, is also sensitiveto outliers.

EXAMPLESConsider a data set containing 10 observa-tions with explanatory variable X taking val-ues Xi = i with i = 1, 2, . . . , 10. Variable Yis generated using the model:

yi = 3+ 2xi + εi ,

where the εi are normally distributed withE (εi) = 0 and Var (εi) = (0.2xi)

2. Thusthe variance of error of the first observationcorresponds to [(1) (0.2)]2 = 0.04, just asthose of the tenth observation correspond to[(10) (0.2)]2 = 4. The data thus generatedare presented in the following table:

Values xi and yi generated by model yi = 3 +2xi + εi

i xi yi

1 1 4.90

2 2 6.55

i xi yi

3 3 8.67

4 4 12.59

5 5 17.38

6 6 13.81

7 7 14.60

8 8 32.46

9 9 18.73

10 10 20.27

First we perform a nonweighted regression.We obtain the equation of regression:

yi = 3.49+ 2.09xi ,

β0 = 3.49 with tc = 0.99 ,

β1 = 2.09 with tc = 3.69 .

Analysis of variance

Sourceof var-iation

Degreesof free-dom

Sum ofsquares

Mean ofsquares

F

Regres-sion

1 360.68 360.68 13.59

Resi-dual

8 212.39 26.55

Total 9 573.07

The R2 of this regression is 62.94%. At firstglance, the results seem to be good. The esti-mated coefficient associated to the explana-tory variable is clearly significative. Thisresult is not at all surprising because thefact that the variance of errors is not con-stant rarely influences the estimated coeffi-cients. The mean square of residuals is diffi-cult toexplain,considering there isnouniquevariance to estimate. Moreover, the standarderrorsand the lengthsof theconfidence inter-vals at 95% for the estimated conditionalmeans are relatively constant for all obser-vations. The nonweighted analysis does not

Page 566: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

W

Wilcoxon, Frank 569

take into account the observed inequalitiesbetween the variances of the errors.We now make a weighted regression usingas weights the values 1/x2. The weightsare known because they are proportional tothe real variances, which equal (0.2x)2. Theresults are the following:

yi = 2.53+ 2.28xi ,

β0 = 2.53 with tc = 3.07 ,

β1 = 2.28 with tc = 7.04 .

Analysis of variance

Sourceofvariation

Degreesof free-dom

Sum ofsquares

Mean ofsquares

F

Regres-sion

1 23.29 23.29 49.6

Residual 8 3.75 0.47

Total 9 27.05

The R2 of this regression is 86.11%. On thisbasis, we can make the following comments:• All numbers related to the sum of squares

of the dependent variable are affected bythe weights and are not comparable tothose obtained by nonweighted regres-sion.

• The Fisher test and the R2 can be com-pared between two models. In our exam-ple, two values are greater than the non-weighted regression,but this isnotalwaysthe case.

• The summarized coefficients are rela-tively close to those of the nonweightedregression. This is often the case but nota generality.

• The standard errors and confidence inter-vals for the conditional means reflect thefact that the precision of the estimations

decreases with increasing values of x.This result is the first reason to use theweighted regression.

FURTHER READING� Generalized linear regression� Least squares� Regression analysis

REFERENCESSeber, G.A.F. (1977) Linear Regression

Analysis. Wiley, New York

Wilcoxon, Frank

Wilcoxon, Frank (1892–1965) received hismaster’s degree in chemistry at Rutgers Uni-versity in 1921 and his doctorate in physicsin 1924 at Cornell University, where he alsodid postdoctoral work from 1924 to 1925.From 1928 to 1929 he was employed byNichols Copper Company in Maspeth inQueens, New York and was then hired bythe Boyce Thompson Institute for PlantResearch as chief of a group studying theeffects of insecticides and fungicides. Heremain at the institute until the start of theSecond World War; at this time and for thenext 2 years, he moved to the Atlas PowderCompany in Ohio wherehedirected theCon-trol Laboratory.In 1943, Wilcoxon, Frank was appointedhead of the group at the American CyanamidCompany laboratory (in Stamford, CT) thatstudied insecticides and fungicides. He thendeveloped an interest in statistics: in 1950,hewas transferred to Lederle Laboratories divi-sion of Cyanamid in Pearl River, NY, wherehedeveloped astatistical consultationgroup.From 1960 until his death in 1965 he taught

Page 567: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

570 Wilcoxon Signed Table

applied statistics in the Department of Statis-tics to natural science majors at Florida StateUniversity.

Selected principal works and articles ofFrank Wilcoxon:

1945 Individual comparisons by rankingmethods. Biometrics 1, 80–83.

1957 Some rapid approximate statisti-cal procedures. American Cyanamid,Stamford Research Laboratories, Stam-ford, CT.

Wilcoxon Signed Table

The Wilcoxon signed table gives theoreticalvalues of statistic T of the Wilcoxon signedtest applied to two paired samples, on thehypothesis that two populations follow thesame distribution.

HISTORYOwen, D.B. (1962) published a Wilcoxonsigned table for n ≤ 20 (where n is the num-ber of paired observations) for the Wilcox-on signed test developed by Wilcoxon, F.(1945).The generally used table is that of Harter,H.L. and Owen, D.B. (1970) established forn ≤ 50.

MATHEMATICAL ASPECTSConsider a set of n pairs of observations((X1, Y1) , . . . , (Xn, Yn)). We calculate theabsolute differences |Xi − Yi| to which weassign a rank (that is, we associate rank 1 tothe smallest value, rank 2 to the next highestvalue, and so on, up to rank n for the great-

est value). Then we give to this rank the signcorresponding to the difference Xi−Yi. Thuswe calculate the sum of positive ranks:

T =k∑

i=1

Ri ,

where k is the number of pairs representinga positive difference (Xi − Yi > 0).For different values of m and α (the sig-nificance level), the Wilcoxon signed tablegives the theoretical values of statistic T ofthe Wilcoxon signed test, on the hypothe-sis that the two underlying populations areidentically distributed.

DOMAINS AND LIMITATIONSThe Wilcoxon signed table is used in non-parametric tests that use rank and especiallyin the Wilcoxon signed test.

EXAMPLESBelow is an extract of a Wilcoxon signedtable m going from 10 to 15 and α = 0.05 toα = 0.975:

m α = 0.05 α = 0.975

10 11 46

11 14 55

12 18 64

3 22 73

14 26 83

15 31 94

For an example of how the Wilcoxon signedtable is used, see Wilcoxon signed table.

FURTHER READING� Statistical table� Wilcoxon signed test

Page 568: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

W

Wilcoxon Signed Test 571

REFERENCESHarter, H.L., Owen, D.B.: Selected Tables in

Mathematical Statistics, vol. 1. Markham,Chicago (5.7 Appendix) (1970)

Owen, D.B.: Handbook of Statistical Tables.(3.3, 5.7, 5.8, 5.10). Addison-Wesley,Reading, MA (1962)

Wilcoxon, F.: Individual comparisons byranking methods. Biometrics 1, 80–83(1945)(5.1, 5.7).

Wilcoxon Signed Test

The Wilcoxon signed test is a nonparam-etric test. As a signed test, it serves to checkfor any difference between two populations.It tellsone if thedifferencebetweentwopairsof observations is positive or negative, tak-ing into account the amplitude of this differ-ence.

HISTORYThe Wilcoxon signed was created byWilcoxon, F. in 1945. In 1949, he proposedanother method to treat the zero differencesthat can appear between two observationsof considered samples.

MATHEMATICAL ASPECTSLet (X1, X2, . . . , Xn) and (Y1, Y2, . . . , Yn) betwo samples of size n. We consider n pairs ofobservations (x1, y1) , (x2, y2) , . . . , (xn, yn).We denote by |di| the difference in absolutevalue between xi and yi:

|di| = |yi − xi| , for i = 1, 2, . . . , n .

We subtract from the dimension all the pairsof observations that do not represent the dif-ference (di = 0). We denote by m the num-ber of pairs remaining. We assign then to

the pairs of observations a rank from 1 tom depending on the size of |di|, that is, wegive rank 1 to the smallest value of |di| andrank m to the largest value of |di|. We nameRi, i = 1, . . . , m the rank thus defined.If many pairs of variables represent the sameabsolute difference |di|, we assign to themthe mean rank. If there are mean ranks, thestatistic of the test is calculated by the fol-lowing expression:

T1 =

m∑i=1

Ri

√√√√m∑

i=1

R2i

,

whereRi is the rank of thepair (xi, yi)withoutthe sign of the difference di.If there are no mean ranks, it is more correctto use the sum of ranks associated to the pos-itive difference:

T =∑

(for i suchas di > 0)

Ri .

HypothesesThe underlying hypotheses in the Wilcoxonsigned test are based on dk, the median ofdi. According to whether it is a two-tail testor a one-tailed test, we have the followingcases:

A: Two-sided test:

H0 : dk = 0

H1 : dk �= 0

B: One-sided test:

H0 : dk ≤ 0

H1 : dk > 0

C: One-sided test:

H0 : dk ≥ 0

H1 : dk < 0

Page 569: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

572 Wilcoxon Signed Test

In case A, the hypothesis represents the situ-ation where there are no differences betweenthe two populations.In case B, we make the hypothesis that thevalues of the population of Xhave a tendencytobegreater thanthoseofthepopulationofY.In case C, we make the contrary hypothesis,i.e., that the values of the population of Xhave a tendency to be smaller than those ofthe population of Y.

Decision RulesIf T is used and if m < 50, then the Wilcox-on signed table will be used to test the nullhypothesis H0.The random variable T is a linear combi-nation of m independent Bernoulli randomvariables (but not identically distributed). Ithas as mean and standard deviation:

μ = m (m+ 1)

4and

σ =√

m (m+ 1) (2m+ 1)

24.

We deduce the random variable Z:

Z = T − μ

σ.

We denote by wα the value of the Wilcoxonsigned table with parameters m and α (whereα is the significance level).Ifm > 20, thenwα canbeapproximatedwiththe help of the corresponding values of thenormal distribution:

wα = m (m+ 1)

4

+ zα

√m (m+ 1) (2m+ 1)

24,

where zα is the values in the normal tableat level α.If T1 is used, we will base our calculationdirectly on the normal table to test the null

hypothesis H0. The decision rules are differ-ent depending on the hypotheses. We havethe decision rules A, B, and C relative to theprevious cases A, B, and C. If T1 is used, weshould replace T by T1 and wα by zα in therules that follow.

Case AWe reject H0 at the significance level α if Tis greater than w1−α/2 or smaller than wα/2,that is, if

T < wα/2 ou T > w1−α/2 .

Case BWe reject H0 at the significance level α if Tis greater than w1−α , that is, if

T > w1−α .

Case CWe reject H0 at the significance level α if Tis smaller than wα , that is, if

T < wα .

DOMAINS AND LIMITATIONSTo use the Wilcoxon signed test, the follow-ing criteria must be met:1. The distribution of di must be symmetric.2. di must be independent.3. di must be measured in true values.The Wilcoxon signed test is also used as thetest of the median. The data X1, X2, . . . , Xn

consist of a single sample of dimension n.The hypotheses corresponding to the previ-ous cases A, B, and C are the following:

A: Two-tail cases:

H0: The median of X equals a presumedvalue Md

H1: The median of X does not equal Md

Page 570: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

W

Wilcoxon Signed Test 573

B: One-tail case:

H0: The median of X is ≥ Md

H1: The median of X is < Md

C: One-tail case:

H0: The median of X is ≤ Md

H1: The median of X is > Md

Note that “median” could be replaced by“mean” because we accepted the hypothe-sis of the symmetry of the distribution of X.In what concerns the calculations, we form npairs (X1, Md) , (X2, Md) , . . . , (Xn, Md) andtreat them in the same way as before.The restof the test is identical.

EXAMPLESWe conduct tests on eight pairs of twins. Thegoal is to see if the first born is more intelli-gent than the second born.The data are presented in the following table,the higher scores corresponding to better testresults.

Rank ofPair oftwins

Firstborn Xi

Secondborn Yi di

∣∣di

∣∣ Ri

1 90 88 −2 2 −2

2 75 79 4 4 4

3 99 98 −1 1 −1

4 60 66 6 5 5

5 72 64 −8 6 −6

6 83 83 0 – –

7 86 86 0 – –

8 92 95 3 3 3

As pairs 6 and 7 represent no difference(d6 = d7 = 0), the number of remainingpairs equals m = 6. As the di are all differ-ent, the mean ranks must not be calculated,and we use the sum of positive ranks:

T =∑

(for i suchas di > 0)

Ri ,

where Ri is the rank of the pair (xi, yi) repre-senting a positive di. Thus we have:

T = 4+ 5+ 3 = 12 .

The hypotheses are the following:

H0: There is no difference in the level ofintelligencebetweenthefirstandsecondtwin (dm = 0)

H1: There is a difference (dm �= 0)

These hypotheses corresponding to case Aand the decision rule are the following:Reject H0 at level α if T < wα/2 or T >

w1−α/2.If we choose the significance level α =0.05, we obtain from the Wilcoxon signedtable for m = 6:

w1−α/2 = 20 and wα/2 = 1 ,

whereT isneithergreater than 20 nor smallerthan 1, and we cannot reject the null hypoth-esis H0.Now we suppose that the results for pairs 7and 8 have changed with the new hypothesisin the following table:

Rank ofPair oftwins

Firstborn Xi

Secondborn Yi di

∣∣di

∣∣ Ri

1 90 88 −2 4 −4

2 75 79 4 5 5

3 99 98 −1 2 −2

4 60 66 6 6 6

5 72 64 −8 7 −7

6 83 83 0 – –

7 86 87 1 2 2

8 92 91 −1 2 −2

Now, pair 6 does not represent a difference(d6 = 0); thus m equals 7.Pairs 3, 7, and 8 represent the same absolutedifference: |d3| = |d7| = |d8| = 1. We

Page 571: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

574 Wilcoxon Table

assign a mean rank to the pairs that equal therank of the ex aequo to which we add half thenumber of ex aequo minus one. Thus:

mean rank = rank of |d3| + 12 (3− 1)

= 2 .

In this case, we use the statistical test T1 cal-culated by:

T1 =

m∑i=1

Ri

√√√√m∑

i=1

R2i

= −2

11.75= −0.17 .

The decision rule is the following:Reject H0 at level α if T1 < zα/2 or T1 >

z1−α/2.If we choose the level α = 0.05, then thevalues of the normal table are:

z1−α/2 = −1.96 and zα/2 = 1.96 .

In consequence, T1 is neither greater than1.96 nor smaller than −1.96.We obtain the same result as in the previouscase and do not reject the null hypothesis H0.

FURTHER READING� Hypothesis testing� Nonparametric test� Wilcoxon signed table

REFERENCESWilcoxon, F.: Individual comparisons by

ranking methods. Biometrics 1, 80–83(1945)(5.1, 5.7).

Wilcoxon, F.: Some rapid approximate sta-tistical procedures. American Cyanamid,Stamford Research Laboratories, Stam-ford, CT (1957)

Wilcoxon Table

The Wilcoxon table gives the theoretical val-ues of statistic T of the Wilcoxon test underthe hypothesis that there is no differencebetween the distribution of two comparedpopulations.

HISTORYThe critical values for N ≤ 20 (where N isthe total number of observations) were cal-culated by Wilcoxon, F. (1947).

MATHEMATICAL ASPECTSLet (X1, X2, . . . , Xn) be a sample of dimen-sion n coming from population 1 and(Y1, Y2, . . . , Ym) a sample of dimension mcoming from population 2.Thus we obtain N = n+m observations thatwe want to class in increasing order with-out taking into account their belonging tothe samples. We assign a rank to each value:rank 1 to the smallest value, rank 2 to the nextvalue, and so on up to rank N to the largestvalue.We define the statistical test T in the follow-ing manner:

T =n∑

i=1

R (Xi) ,

where R(Xi) is the rank assigned to the obser-vation Xi, i = 1, 2, . . . , n relative to the set oftwo samples (X1, . . . , Xn) and (Y1, . . . , Ym).For different values of n, m, and α (the sig-nificance level), the Wilcoxon table givesthe theoretical values of statistic T of theWilcoxon test, under thehypothesis that thetwo underlying populations are identicallydistributed.

Page 572: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

W

Wilcoxon Test 575

DOMAINS AND LIMITATIONSThe Wilcoxon table is used in nonparam-etric tests that use rank, and especially in theWilcoxon test.

EXAMPLESHere is an extract of a Wilcoxon table for n=14, m = 11, and α = 0.05 equalling 0.95:

n m α critical

14 11 0.05 152

14 11 0.95 212

For an example of the use of the Wilcoxontable, see Wilcoxon test.

FURTHER READING� Statistical table� Wilcoxon test

REFERENCESWilcoxon,F.:Probability tablesfor individu-

al comparisons by ranking methods. Bio-metrics 3, 119–122 (1947)

Wilcoxon TestThe Wilcoxon test is a nonparametric test.It is used when we have two samples comingfrom two populations. The goal is to verifyif there is a difference between the popula-tions on the basis of the random samples tak-en from these populations.

HISTORYTheWilcoxon test isnamed for itsauthorandwas introduced in 1945.See also Mann–Whitney test.

MATHEMATICAL ASPECTSLet (X1, X2, . . . , Xn) be a sample of dimen-sion n coming from a population 1, and let

(Y1, Y2, . . . , Ym) be a sample of dimension mcoming from a population 2.Thus we obtain N = n+m observations thatwe will class in increasing order without tak-ing into account their belonging to the sam-ples. Then we assign a rank of 1 to the small-estvalue, a rankof2 to thenexthighestvalue,and so on up to rank N, which is assigned tothe highest value. If many observations haveexactly the same value, then we will assigna mean rank. We denote by R(Xi) the rankassigned to Xi, i = 1, . . . , n.Statistic W of the test is defined by:

W =n∑

i=1

R (Xi) .

If the dimension of the sample is large (N ≥12), then we use, according to Gibbons, J.D.(1971), an approximation using the statis-tic W1, which is supposed to follow a stan-dard normal distribution N (0, 1):

W1 = W − μ

σ,

where μ and σ are, respectively, the meanand the standard deviation of randomvariable W:

μ = n (N + 1)

2

and

σ =√

mn (N + 1)

12.

If we use the approximation for the largesamples W1, and if we use the fact that thereare ranks ex aequo among the N observa-tions, we replace the standard deviation by:

σ =

√√√√√√√√√√mn

12

⎛⎜⎜⎜⎜⎜⎝

N + 1−

g∑j=1

tj(

t2j − 1)

N (N − 1)

⎞⎟⎟⎟⎟⎟⎠

,

Page 573: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

576 Wilcoxon Test

where g is the number of groups of equal (tie)ranksand tj thedimension ofgroup j. (If thereare no equal (tie) ranks, the observations areseen as many groups of dimension 1. In con-sequence, g = N and tj = 1 for j = 1, . . . , N,

and σ reduces to

√mn (N + 1)

12.)

HypothesesThe Wilcoxon test can be made on the basisof a two-tail test or one-tailed test, accord-ing to the type of hypothesis:A: Two-tail test:

H0 : P (X < Y) = 12

H1 : P (X < Y) �= 12

B: One-tail test:

H0 : P (X < Y) ≤ 12

H1 : P (X < Y) > 12

C: One-tail test:

H0 : P (X < Y) ≥ 12

H1 : P (X < Y) < 12

Case A expresses the hypothesis that there isno difference between two populations.Case B represents the hypothesis that popu-lation 1 (from where we took the sample ofX) generally takes greater values than doespopulation2(fromwherewetookthesampleof Y).Case C expresses the contrary hypothesis,i.e., that thevaluesofpopulation 1havea ten-dency to be smaller than those of popula-tion 2.

Decision RulesCase AWe reject the null hypothesis H0 at the sig-nificance levelα if W is smaller than the val-ue of the Wilcoxon table with parameters n,

m, and α2 denoted by tn,m,α/2 or if W is greater

than the value in the table for n, m, and 1−α2 ,

by tn,m,1−α/2, that is, if

W < tn,m,α/2 or W > tn,m,1−α/2 .

If the test uses statistic W1, a comparison ismade with the normal table:

W1 < zα/2 or W1 > z1−α/2 ,

with zα/2 and z1−α/2 the values of the nor-mal table with, respectively, the parametersα2 and 1− α

2 .

Case BWereject thenullhypothesisH0 at the signif-icance level α if W is smaller than the valueof the Wilcoxon table with parameters n, m,and α, denoted by tn,m,α, that is, if

W < tn,m,α ,

and in the case where statistic W1 is used:

W1 < zα ,

with zα being the value of the normal tablewith parameter α.

Case CWereject thenullhypothesisH0 at the signif-icance level α if W is greater than the valuein the Wilcoxon table with parameters n, m,and 1− α, denoted by tn,m,1−α, that is, if

W > tn,m,1−α ,

and with statistic W1:

W1 > z1−α ,

with z1−α being the value of the normal tablewith parameter α.

Page 574: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

W

Wilcoxon Test 577

DOMAINS AND LIMITATIONSTo use the Wilcoxon test, the following cri-teria must be met:1. The two samples must be from random

samples taken from their respective pop-ulations.

2. In addition to the independence insideeach sample, there must be mutual inde-pendence between the two samples.

3. The scale of measure must be at least ordi-nal.

EXAMPLESIn a class, we count 25 students: 14 boys and11 girls. We test them on mental calculationto see if the boys in this class have a tendencyto be better than the girls.The data are in the following table, the high-est scores corresponding to the best testresults.

Boys (Xi ) Girls (Yi )

19.8 17.5 17.7 23.6

12.3 17.9 7.1 11.1

10.6 21.1 21.0 20.3

11.3 16.4 10.6 15.6

14.0 7.7 13.3

9.2 15.2 8.6

15.6 16.0 14.1

We class the scores in increasing order andassign a rank:

Scores Sample Rank R (Xi )

7.1 Y 1 –

7.7 X 2 2

8.6 Y 3 –

9.2 X 4 4

10.6 Y 5.5 –

10.6 X 5.5 5.5

11.1 Y 7 –

11.3 X 8 8

12.3 X 9 9

Scores Sample Rank R (Xi )

13.3 Y 10 –

14.0 X 11 11

14.1 Y 12 –

15.2 X 13 13

15.6 X 14.5 14.5

15.6 Y 14.5 –

16.0 X 16 16

16.4 X 17 17

17.5 X 18 18

17.7 Y 19 –

17.9 X 20 20

19.8 X 21 21

20.3 Y 22 –

21.0 Y 23 –

21.1 X 24 24

23.6 Y 25 –

We calculate statistic W:

W =14∑

i=1

R (Xi)

= 2+ 4+ 5.5+ 8+ 9+ 11+ 13

+ 14.5+ 16+ 17+ 18+ 20

+ 21+ 24 = 183 .

If we choose the approximation for the largesamples, taking into account the mean ranks,we make the adjustment:

W1 =W − n (N + 1)

2√√√√√√√√√√mn

12

⎛⎜⎜⎜⎜⎜⎝

m+ n+ 1−

g∑

j=1

tj(

t2j − 1)

(m+ n)(m+ n− 1)

⎞⎟⎟⎟⎟⎟⎠

.

Here g = 2 and tj = 2 for j = 1, 2. Thus wehave:

W1 =183− 14 (25+ 1)

2√11·14

12

(11+ 14 + 1− 2(22−1+2

(22−1

)(11+14)(11+14−1)

)

= 0.0548 .

Page 575: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

578 Willcox, Walter Francis

The hypotheses that we want to test are:

H0: The boys in this class do not have a ten-dency to be better than the girls in thisclass in mental calculation.

H1: The boys in this class have a tendencyto be better than the girls in this class inmental calculation.

This is expressed by H0: P (X < Y) ≥ 12 and

corresponds to case C. The decision rule isthe following:Reject H0 at significance level α if

W > tn,m,1−α or W1 > z1−α .

If we choose α = 0.05, the theoretical valuein the Wilcoxon table is tn,m,1−α = 212.The calculated value of W (W = 183) is notgreater than tn,m,1−α = 212, and we cannotreject the null hypothesis H0.The value in the normal table z1−α equals1.6449. In consequence, W1 is not greaterthan z1−α (0.0548 < 1.6449), and we drawthe same conclusion on the basis of statis-tic W: the boys in this class do not have a ten-dency to be better than the girls in this classin mental calculation.

FURTHER READING� Hypothesis testing� Mann–Whitney test� Nonparametric test� Wilcoxon signed test� Wilcoxon table

REFERENCESGibbons, J.D.: Nonparametric Statistical

Inference. McGraw-Hill, New York(1971)

Wilcoxon, F.: Individual comparisons byranking methods. Biometrics 1, 80–83(1945)

Willcox, Walter Francis

Willcox,Walter Francis (1861–1964)gradu-ated from AmherstCollege inMassachusetts(receiving his bachelor’s degree in 1884 andhis master’s degree in 1888). He continuedhis studies at Columbia University in NewYork in the late 1880s. He received his doc-torate from Columbia University in 1891.His Ph.D. thesis was called The DivorceProblem: A Study in Statistics. From1891, Willcox became professor of eco-nomics and of statistics at Cornell Univer-sity, where he stayed until his retirement in1931.His principal contributions in statistics werein thedomain ofdemography aswell as in thedevelopment of the system of federal statis-tics.From 1899 to 1902 he managed the 12thcensus of the United Nations and presidedover the American Statistical Association in1912, the American Economic Associationin 1915, and the International StatisticalInstitute in 1947.

Selected principal works and articles ofWillcox, Walther Francis:

1897 The Divorce Problem: A Study inStatistics. Studies in History, Eco-nomics and Public Law. Columbia Uni-versity Press, New York.

1940 Studies in American Demography,Cornell University Press, Ithaca, NY.

Page 576: The Concise Encyclopedia of Statisticsstewartschultz.com/statistics/books/The Concise Encyclopedia of... · The Concise Encyclopedia of Statistics With 247 Tables 123. Author YadolahDodge

Y

Yates’ Algorithm

Yates’ algorithm is a process used to com-pute the estimates of main effects and of theinteractions in a factorial experiment.

HISTORYYates’ algorithm was created by Yates,Franck in 1937.

MATHEMATICAL ASPECTSConsider a factorial experiment with n fac-tors, each having two levels. We denote thisexperiment by 2n.The factorsaredenoted byuppercase letters.We use a lowercase letter for the heigh levelof the factor and omit the letter for the lowerlevel. When all the factors are at the lowerlevel, we use the symbol (1).The algorithm presented in a table. In thefirst column we find all the combinations ofthe levels of different factors in a standardorder (the introduction of a letter is followedby combinations with all previous combi-nations). The second column contains theobservations of all the combinations.For the subsequent n columns, the procedureis the same: the first half of the column isobtained by summing the pairs of successivevalues of the previous column; the secondhalf of the column is obtained by subtract-

ing the first value from the second value ofeach pair.Column n contains the contrasts of the maineffectsand interactionsgiven in thefirstcol-umn. To find the estimators of these effects,we divide the nth column by m ·2n−1, wherem is the number of repetitions of the experi-ment and n the number of factors. Concern-ing the first line, we find in column n the sumof all the observations.

DOMAINS AND LIMITATIONSYates’ algorithm can be applied to expe-riments with factors that have more thantwo levels.One can also use this algorithm to find thesum of squares corresponding to the maineffects and interactions of the 2n factorialexperiment. They are obtained by squaringthe nth column and dividing the result by m ·2n (wherem is thenumber of replicationsandn the number of factors).

EXAMPLES
To compute the estimates of the main effects and those of the three interactions, let us consider a 2^3 factorial experiment (meaning three factors, each with two levels) relative to the production of sugar beet. The first factor, denoted A, concerns the type of fertilizer (manure or no manure); the second, denoted B, concerns the depth of plowing (20 or 30 cm); and the last factor, C, is the variety of sugar beet (variety 1 or 2). Since each factor has two levels, one can speak of a 2 × 2 × 2 experiment.
For example, the interaction between factor A and factor B is denoted by A × B, and the interaction between all three factors is written A × B × C.
The eight observations indicate the weight of the harvest in kilograms for half a hectare, giving the following table:

A               Manure               Nonmanure
B               20 cm     30 cm      20 cm     30 cm
C   Var. 1      80        96         84        88
    Var. 2      86        96         89        93

In doing this experiment, we want to quantify not only the effect of fertilizer, plowing, and the variety of beet (simple effects), but also the impact of combined effects (effects of interaction) on harvesting.
To establish the table of Yates' algorithm, the eight observations must be classified in Yates' order. The classification table is practical in this case; the symbols “−” and “+” represent, respectively, the lower and upper level of the factors. For example, the first line indicates the observation “80” corresponding to the lower levels for all three factors (A: manure, B: 20 cm, C: variety 1).

Observation classification table in Yates' order

Obs.     Factor                Corresponding
         A       B       C     observation
1        −       −       −     80
2        +       −       −     84
3        −       +       −     96
4        +       +       −     88
5        −       −       +     86
6        +       −       +     89
7        −       +       +     96
8        +       +       +     93

We can now establish the table of Yates' algorithm:

Table corresponding to Yates' algorithm

Combination   Obs.    Columns                   Estimation   Average effect
                      1       2       3
(1)           80      164     348     712       89           —
a             84      184     364     −4        −1           A
b             96      175     −4      34        8.5          B
ab            88      189     0       −18       −4.5         A × B
c             86      4       20      16        4            C
ac            89      −8      14      4         1            A × C
bc            96      3       −12     −6        −1.5         B × C
abc           93      −3      −6      6         1.5          A × B × C

The terms in column 2, for example, are calculated in the following way:
• First half of the column (sum of the elements of each pair of the previous column, in this case column 1):

  348 = 164 + 184
  364 = 175 + 189
  −4 = 4 + (−8)
  0 = 3 + (−3)

• Second half of the column (subtraction of the elements of each pair, in opposite order):

  20 = 184 − 164
  14 = 189 − 175
  −12 = −8 − 4
  −6 = −3 − 3


The estimation for the main effect of factor A is equal to −1. As one can see, the value −4 (column 3) has been divided by m · 2^(n−1) = 1 · 2^(3−1) = 4, where m is equal to 1 because the experiment was done only once (without repetition) and n corresponds to the number of factors, which is three in this case.
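The procedure lends itself to a few lines of code. The following sketch (in Python; the function name yates and its layout are illustrative additions, not part of the original entry) reproduces the contrasts and estimates of the table above from the eight observations listed in Yates' standard order, with m = 1 replication, and also computes the sums of squares mentioned under DOMAINS AND LIMITATIONS.

# Yates' algorithm for a 2^n factorial experiment (illustrative sketch).
# obs: observations in Yates' standard order: (1), a, b, ab, c, ac, bc, abc, ...
def yates(obs, m=1):
    n = len(obs).bit_length() - 1                      # number of factors, since len(obs) == 2**n
    col = list(obs)
    for _ in range(n):                                 # build columns 1, ..., n
        sums = [col[i] + col[i + 1] for i in range(0, len(col), 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, len(col), 2)]
        col = sums + diffs                             # first half: pair sums, second half: pair differences
    total, contrasts = col[0], col[1:]                 # first entry of column n is the grand total
    estimates = [c / (m * 2 ** (n - 1)) for c in contrasts]
    sums_of_squares = [c ** 2 / (m * 2 ** n) for c in contrasts]
    return total, contrasts, estimates, sums_of_squares

total, contrasts, estimates, ss = yates([80, 84, 96, 88, 86, 89, 96, 93])
# total = 712; contrasts = [-4, 34, -18, 16, 4, -6, 6]
# estimates = [-1.0, 8.5, -4.5, 4.0, 1.0, -1.5, 1.5]   (A, B, A×B, C, A×C, B×C, A×B×C)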

FURTHER READING
► Analysis of variance
► Contrast
► Design of experiments
► Factorial experiment
► Interaction
► Main effect

REFERENCES
Yates, F.: The design and analysis of factorial experiments. Technical Communication of the Commonwealth Bureau of Soils 35, Commonwealth Agricultural Bureaux, Farnham Royal (1937)

Youden, William John

Youden, William John (1900–1971) earned his master's degree in chemistry from Columbia University in 1924. From 1924 to 1948 he worked at the Boyce Thompson Institute for Plant Research in Yonkers, NY. From 1948 until his retirement in 1965, Youden, W.J. was a member of the Applied Mathematics Division of the National Bureau of Standards in Washington, D.C. During this period Youden developed a test and wrote Statistical Methods for Chemists.

Principal work of Youden, William John:

1951 Statistical Methods for Chemists. Wiley, New York.

Yule and Kendall Coefficient

The Yule coefficient is used to measure the skewness of a frequency distribution. It takes into account the relative positions of the quartiles with respect to the median, and compares the spreading of the curve to the right and left of the median.

MATHEMATICAL ASPECTS
In a symmetrical distribution, the quartiles are at an equal distance on each side of the median. This means that

(Q3 − M) − (M − Q1) = 0 ,

where M is the median, Q1 the first quartile, and Q3 the third quartile.
If the distribution is asymmetric, then the previous equality is not true.
The left part of the equation can be rearranged in the following way:

Q3 + Q1 − 2M .

To obtain a coefficient of skewness that is independent of the unit of measurement, this expression should be divided by the value (Q3 − Q1).
The Yule coefficient is then:

CY = (Q3 + Q1 − 2M) / (Q3 − Q1) .

If CY is positive, then the distribution spreads toward the right; if CY is negative, the distribution spreads toward the left.

DOMAINS AND LIMITATIONS
If the coefficient is positive, then it represents a curve that spreads toward the right. If it is negative, then it represents a curve that spreads toward the left. A coefficient that is close to zero means that the curve is approximately symmetric.
The Yule and Kendall coefficient can vary between −1 and +1. In fact, since the median M is always located between the first and third quartiles, the following two extreme possibilities occur:
1. M = Q1

2. M = Q3

In the first case the coefficient becomes:

CY = (Q3 + Q1 − 2Q1) / (Q3 − Q1) = 1 ,

and in the second case it becomes:

CY = (Q3 + Q1 − 2Q3) / (Q3 − Q1) = −1 .

Under these conditions, the coefficient should be close to zero for the distribution to be considered symmetric.
This coefficient, like the other measures of skewness, is of interest only when comparing the shapes of two or several distributions. The results vary considerably from one formula to another, so it is obvious that comparisons must be made using the same formula.

EXAMPLES
Suppose that we want to compare the shape of the distribution of the daily turnover in 75 bakeries over 2 years. We then calculate the Yule and Kendall coefficient for each of the two years. The data are the following:

Turnover     Frequency    Frequency
             year 1       year 2
215–235      4            25
235–255      6            15
255–275      13           9
275–295      22           8
295–315      15           6
315–335      6            5
335–355      5            4
355–375      4            3

For year 1, the first quartile, the median, and the third quartile are respectively:

Q1 = 268.46 ,

M = 288.18 ,

Q3 = 310 .

The coefficient is then the following:

CY = (Q3 + Q1 − 2M) / (Q3 − Q1)
   = (310 + 268.46 − 2(288.18)) / (310 − 268.46) = 0.05 .

For year 2, these values are respectively:

Q1 = 230 ,

M = 251.67 ,

Q3 = 293.12 .

The Yule and Kendall coefficient is then equal to:

CY = (Q3 + Q1 − 2M) / (Q3 − Q1)
   = (293.12 + 230 − 2(251.67)) / (293.12 − 230) = 0.31 .

For year 1, since the coefficient gives a result that is close to zero, we can admit that the distribution of the daily turnover in the 75 bakeries is very close to a symmetric distribution. For year 2, the coefficient is higher; this means that the distribution spreads toward the right.
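As an illustration (not part of the original entry), the following Python sketch recomputes both coefficients directly from the grouped data; the function names and the linear-interpolation rule used for quantiles of binned data are assumptions made for this example, chosen so that they reproduce the quartiles quoted above.

# Yule and Kendall coefficient from grouped (binned) data -- illustrative sketch.
def grouped_quantile(bounds, freqs, p):
    """Linear interpolation of the p-quantile within the class that contains it."""
    target = p * sum(freqs)
    cum = 0
    for (lo, hi), f in zip(bounds, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    return bounds[-1][1]

def yule_coefficient(bounds, freqs):
    q1 = grouped_quantile(bounds, freqs, 0.25)
    med = grouped_quantile(bounds, freqs, 0.50)
    q3 = grouped_quantile(bounds, freqs, 0.75)
    return (q3 + q1 - 2 * med) / (q3 - q1)

bounds = [(215, 235), (235, 255), (255, 275), (275, 295),
          (295, 315), (315, 335), (335, 355), (355, 375)]
year1 = [4, 6, 13, 22, 15, 6, 5, 4]
year2 = [25, 15, 9, 8, 6, 5, 4, 3]
print(round(yule_coefficient(bounds, year1), 2))   # 0.05
print(round(yule_coefficient(bounds, year2), 2))   # 0.31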


FURTHER READING
► Measure of shape
► Measure of skewness

REFERENCES
Yule, G.U., Kendall, M.G.: An Introduction to the Theory of Statistics, 14th edn. Griffin, London (1968)

Yule, George Udny

Yule, George Udny was born in 1871 in Beech Hill near Haddington (Scotland) and died in 1951 in Cambridge.
The Yule coefficient of association, the Yule paradox (later also called the Simpson paradox), and the Yule procedure bear his name in statistics; he also contributed to Mendelian theory and to time series.

Selected principal works of Yule, George Udny:

1897 On the Theory of Correlation. J. Roy. Stat. Soc. 60, 812–854.

1900 On the association of attributes in statistics: with illustration from the material of the Childhood Society. Philos. Trans. Roy. Soc. Lond. Ser. A 194, 257–319.

1926 Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time-series. J. Roy. Stat. Soc. (2) 89, 1–64.

1968 (with Kendall, M.G.) An Introduction to the Theory of Statistics. 14th edn. Griffin, London.

FURTHER READING
► Measure of skewness
► Yule and Kendall coefficient


Appendix


Appendix A: Table of Random Numbers, Decimals of π

1415926535 3305727036 5024459455 8583616035 8164706001 9465764078

8979323846 5759591953 3469083026 6370766010 6145249192 9512694683

2643383279 9218611739 4252230825 4710181942 1732172147 9835259570

5028841971 8193261179 3344685035 9555961989 7235014144 9825822620

6939937510 3105118548 2619311881 4676783744 1973568548 5224894077

5820974944 7446237999 7101000313 9448255379 1613611573 2671947826

5923078164 6274956735 7838752886 7747268471 5255213347 8482601476

6286208995 1885752724 5875332083 4047534640 5741849468 9909026401

8628034825 8912279381 8142061717 6208046684 4385233239 3639443745

3421170679 8301194912 7669147303 2590694912 7394143337 5305068203

8214808651 9833673362 5982534904 3313677028 4547762416 4962524517

3282306647 4406566430 2875546873 8989152104 8625189835 4939965143

9384460952 8602139494 1159562863 7521620569 6948556209 1429809190

5058223172 6395224737 8823537875 6602405803 9219222184 6592509372

5359408128 1907021798 9375195778 8150193511 2725502542 2169646151

4811174502 6094370277 1857780532 2533824300 5688767179 5709858387

8410270193 5392171760 1712268066 3558764024 4946016530 4105978859

8521105559 2931767523 1300192787 7496473263 4668049886 5977297549

6446229489 8467481846 6611195909 9141992726 2723279178 8930161753

5493038196 7669405132 2164201989 4269922792 6085784383 9284681382

4428810975 5681271702 3809525720 6782354781 8279679766 6868386894

6659334461 4526356082 1065485863 6360093417 8145410095 2774155991

2847564823 7785771342 2788659361 2164121992 3883786360 8559252459

3786783165 7517896091 5338182796 4586315030 9506800642 5395943104

2712019091 7363717872 8230301952 2861829745 2512520511 9972524680

4564856692 1468440901 3530185292 5570674983 7392984896 8459872736

3460348610 2249534301 6899577362 8505494588 8412848862 4469584865

4543266482 4654958537 2599413891 5869269956 2694560424 3836736222

1339360726 1050792279 2497217752 9092721079 1965285022 6260991246

2491412733 6892589235 8347913151 7509302955 2106611863 8051243888

7245870066 4201995611 5574857242 3211653449 6744278629 4390451244

6315588170 2129021960 4541500959 8720275596 2039194945 1365497627

4881520920 8640344181 5082953311 2364806657 4712371373 8079771569

9628292540 5981362977 6861727855 4991198818 8696095636 1435997700

9171536436 4771309960 8890750983 3479775356 4371917287 1296160894

7892590360 5187072113 8175463746 6369807426 4677646575 4169486855

1133053052 4999999837 4939319255 5425278625 7396241389 5848406353

4882046652 2978049951 6040092774 5181841757 8658326451 4220722258

1384146951 5973173280 1671139001 4672890977 9958133904 2848864815

9415116094 1609631859 9848824012 7727938000 7802759009 8456028506


Appendix B: Binomial Table

Number         Probability of success
of trials      0.01    0.02    0.03    0.04    0.05    0.1     0.2     0.3     0.5

0 0.9510 0.9039 0.8587 0.8154 0.7738 0.5905 0.3277 0.1681 0.0313

1 0.9990 0.9962 0.9915 0.9852 0.9774 0.9185 0.7373 0.5282 0.1875

2 1.0000 0.9999 0.9997 0.9994 0.9988 0.9914 0.9421 0.8369 0.5000

5 3 1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 0.9933 0.9692 0.8125

4 1 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9976 0.9688

5 1 1 1 1 1 1 1 1 1

0 0.9044 0.8171 0.7374 0.6648 0.5987 0.3487 0.1074 0.0282 0.0010

1 0.9957 0.9838 0.9655 0.9418 0.9139 0.7361 0.3758 0.1493 0.0107

2 0.9999 0.9991 0.9972 0.9938 0.9885 0.9298 0.6778 0.3828 0.0547

3 1.0000 1.0000 0.9999 0.9996 0.9990 0.9872 0.8791 0.6496 0.1719

4 1.0000 1.0000 1.0000 1.0000 0.9999 0.9984 0.9672 0.8497 0.3770

5 1 1.0000 1.0000 1.0000 1.0000 0.9999 0.9936 0.9527 0.6230

10 6 1 1 1.0000 1.0000 1.0000 1.0000 0.9991 0.9894 0.8281

7 1 1 1 1 1.0000 1.0000 0.9999 0.9984 0.9453

8 1 1 1 1 1 1.0000 1.0000 0.9999 0.9893

9 1 1 1 1 1 1 1.0000 1.0000 0.9990

10 1 1 1 1 1 1 1 1 1

0 0.8601 0.7386 0.6333 0.5421 0.4633 0.2059 0.0352 0.0047 0.0000

1 0.9904 0.9647 0.9270 0.8809 0.8290 0.5490 0.1671 0.0353 0.0005

2 0.9996 0.9970 0.9906 0.9797 0.9638 0.8159 0.3980 0.1268 0.0037

3 1.0000 0.9998 0.9992 0.9976 0.9945 0.9444 0.6482 0.2969 0.0176

4 1.0000 1.0000 0.9999 0.9998 0.9994 0.9873 0.8358 0.5155 0.0592

5 1.0000 1.0000 1.0000 1.0000 0.9999 0.9978 0.9389 0.7216 0.1509

6 1 1.0000 1.0000 1.0000 1.0000 0.9997 0.9819 0.8689 0.3036

15 7 1 1 1.0000 1.0000 1.0000 1.0000 0.9958 0.9500 0.5000

8 1 1 1 1.0000 1.0000 1.0000 0.9992 0.9848 0.6964

9 1 1 1 1 1 1.0000 0.9999 0.9963 0.8491

10 1 1 1 1 1 1.0000 1.0000 0.9993 0.9408

11 1 1 1 1 1 1 1.0000 0.9999 0.9824

12 1 1 1 1 1 1 1.0000 1.0000 0.9963

13 1 1 1 1 1 1 1.0000 1.0000 0.9995

14 1 1 1 1 1 1 1 1.0000 1.0000

15 1 1 1 1 1 1 1 1 1


Number         Probability of success
of trials      0.01    0.02    0.03    0.04    0.05    0.1     0.2     0.3     0.5

0 0.8179 0.6676 0.5438 0.4420 0.3585 0.1216 0.0115 0.0008 0.0000

1 0.9831 0.9401 0.8802 0.8103 0.7358 0.3917 0.0692 0.0076 0.0000

2 0.9990 0.9929 0.9790 0.9561 0.9245 0.6769 0.2061 0.0355 0.0002

3 1.0000 0.9994 0.9973 0.9926 0.9841 0.8670 0.4114 0.1071 0.0013

4 1.0000 1.0000 0.9997 0.9990 0.9974 0.9568 0.6296 0.2375 0.0059

5 1.0000 1.0000 1.0000 0.9999 0.9997 0.9887 0.8042 0.4164 0.0207

6 1.0000 1.0000 1.0000 1.0000 1.0000 0.9976 0.9133 0.6080 0.0577

7 1 1.0000 1.0000 1.0000 1.0000 0.9996 0.9679 0.7723 0.1316

8 1 1 1.0000 1.0000 1.0000 0.9999 0.9900 0.8867 0.2517

9 1 1 1 1.0000 1.0000 1.0000 0.9974 0.9520 0.4119

20 10 1 1 1 1 1.0000 1.0000 0.9994 0.9829 0.5881

11 1 1 1 1 1 1.0000 0.9999 0.9949 0.7483

12 1 1 1 1 1 1.0000 1.0000 0.9987 0.8684

13 1 1 1 1 1 1 1.0000 0.9997 0.9423

14 1 1 1 1 1 1 1.0000 1.0000 0.9793

15 1 1 1 1 1 1 1.0000 1.0000 0.9941

16 1 1 1 1 1 1 1.0000 1.0000 0.9987

17 1 1 1 1 1 1 1 1.0000 0.9998

18 1 1 1 1 1 1 1 1.0000 1.0000

19 1 1 1 1 1 1 1 1 1.0000

20 1 1 1 1 1 1 1 1 1


Number         Probability of success
of trials      0.01    0.02    0.03    0.04    0.05    0.1     0.2     0.3     0.5

0 0.7397 0.5455 0.4010 0.2939 0.2146 0.0424 0.0012 0.0000 0.0000

1 0.9639 0.8795 0.7731 0.6612 0.5535 0.1837 0.0105 0.0003 0.0000

2 0.9967 0.9783 0.9399 0.8831 0.8122 0.4114 0.0442 0.0021 0.0000

3 0.9998 0.9971 0.9881 0.9694 0.9392 0.6474 0.1227 0.0093 0.0000

4 1.0000 0.9997 0.9982 0.9937 0.9844 0.8245 0.2552 0.0302 0.0000

5 1.0000 1.0000 0.9998 0.9989 0.9967 0.9268 0.4275 0.0766 0.0002

6 1.0000 1.0000 1.0000 0.9999 0.9994 0.9742 0.6070 0.1595 0.0007

7 1 1.0000 1.0000 1.0000 0.9999 0.9922 0.7608 0.2814 0.0026

8 1 1.0000 1.0000 1.0000 1.0000 0.9980 0.8713 0.4315 0.0081

9 1 1 1.0000 1.0000 1.0000 0.9995 0.9389 0.5888 0.0214

10 1 1 1.0000 1.0000 1.0000 0.9999 0.9744 0.7304 0.0494

11 1 1 1 1.0000 1.0000 1.0000 0.9905 0.8407 0.1002

12 1 1 1 1 1.0000 1.0000 0.9969 0.9155 0.1808

13 1 1 1 1 1 1.0000 0.9991 0.9599 0.2923

14 1 1 1 1 1 1.0000 0.9998 0.9831 0.4278

30 15 1 1 1 1 1 1.0000 0.9999 0.9936 0.5722

16 1 1 1 1 1 1 1.0000 0.9979 0.7077

17 1 1 1 1 1 1 1.0000 0.9994 0.8192

18 1 1 1 1 1 1 1.0000 0.9998 0.8998

19 1 1 1 1 1 1 1.0000 1.0000 0.9506

20 1 1 1 1 1 1 1.0000 1.0000 0.9786

21 1 1 1 1 1 1 1 1.0000 0.9919

22 1 1 1 1 1 1 1 1.0000 0.9974

23 1 1 1 1 1 1 1 1.0000 0.9993

24 1 1 1 1 1 1 1 1.0000 0.9998

25 1 1 1 1 1 1 1 1 1.0000

26 1 1 1 1 1 1 1 1 1.0000

27 1 1 1 1 1 1 1 1 1.0000

28 1 1 1 1 1 1 1 1 1.0000

29 1 1 1 1 1 1 1 1 1.0000

30 1 1 1 1 1 1 1 1 1


Appendix C: Fisher Table for α = 0.05

ν1

ν2 1 2 3 4 5 6 7 8 9 10

1 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 241.9

2 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 19.40

3 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79

4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96

5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74

6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06

7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64

8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35

9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14

10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98

11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85

12 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75

13 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67

14 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60

15 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54

16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49

17 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45

18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41

19 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38

20 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35

21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32

22 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30

23 4.28 3.42 3.03 2.80 2.64 2.53 2.44 2.37 2.32 2.27

24 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25

25 4.24 3.39 2.99 2.76 2.60 2.49 2.40 2.34 2.28 2.24

26 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22

27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.31 2.25 2.20

28 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19

29 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 2.18

30 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16

40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08

60 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99

120 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91

∞ 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83


ν1

ν2     12     15     20     24     30     40     60     120    ∞
1      243.9  245.9  248.0  249.1  250.1  251.1  252.2  253.3  254.3

2 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50

3 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53

4 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63

5 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36

6 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67

7 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23

8 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93

9 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71

10 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54

11 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40

12 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30

13 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21

14 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13

15 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07

16 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01

17 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96

18 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92

19 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88

20 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84

21 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81

22 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78

23 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76

24 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73

25 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71

26 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69

27 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67

28 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65

29 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64

30 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62

40 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51

60 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39

120 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25

∞ 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00


Appendix D: Kruskal-Wallis Table

n1 n2 n3 Critical value α

2 1 1 2.7000 0.500

2 2 1 3.6000 0.267

2 2 2 4.5711 0.067

3.7143 0.200

3 1 1 3.2000 0.300

3 2 1 4.2857 0.100

3.8571 0.133

3 2 2 5.3572 0.029

4.7143 0.048

4.5000 0.067

4.4643 0.105

3 3 1 5.1429 0.043

4.5714 0.100

4.0000 0.129

3 3 2 6.2500 0.011

5.3611 0.032

5.1389 0.061

4.5556 0.100

4.2500 0.121

3 3 3 7.2000 0.004

6.4889 0.011

5.6889 0.029

5.0667 0.086

4.6222 0.100

4 1 1 3.5714 0.200

4 2 1 4.8214 0.057

4.5000 0.076

4.0179 0.114

4 2 2 6.0000 0.014

5.3333 0.033

5.1250 0.052

4.3750 0.100

4.1667 0.105

4 3 1 5.8333 0.021

5.2083 0.050

5.0000 0.057

4.0556 0.093

3.8889 0.129

n1 n2 n3 Critical value α

4 3 2 6.4444 0.009

6.4222 0.010

5.4444 0.047

5.4000 0.052

4.5111 0.098

4.4666 0.101

4 3 3 6.7455 0.010

6.7091 0.013

5.7909 0.046

5.7273 0.050

4.7091 0.094

4.7000 0.101

4 4 1 6.6667 0.010

6.1667 0.013

4.9667 0.046

4.8667 0.054

4.1667 0.082

4.0667 0.102

4 4 2 7.0364 0.006

6.8727 0.011

5.4545 0.046

5.2364 0.052

4.5545 0.098

4.4455 0.103

4 4 3 7.1739 0.010

7.1364 0.011

5.5985 0.049

5.5758 0.051

4.5455 0.099

4.4773 0.102

4 4 4 7.6538 0.008

7.5385 0.011

5.6923 0.049

5.6538 0.054

4.6539 0.097

4.5001 0.104


n1 n2 n3 Critical value α

5 1 1 3.8571 0.143

5 2 1 5.2500 0.036

5.0000 0.048

4.4500 0.071

4.2000 0.095

4.0500 0.119

5 2 2 6.5333 0.008

6.1333 0.013

5.1600 0.034

5.0400 0.056

4.3733 0.090

4.2933 0.122

5 3 1 6.4000 0.012

4.9600 0.048

4.8711 0.052

4.0178 0.095

3.8400 0.123

5 3 2 6.9091 0.009

6.8606 0.011

5.4424 0.048

5.3455 0.050

4.5333 0.097

4.4121 0.109

5 3 3 6.9818 0.010

6.8608 0.011

5.4424 0.048

5.3455 0.050

4.5333 0.097

4.4121 0.109

5 4 1 6.9545 0.008

6.8400 0.011

4.9855 0.044

4.8600 0.056

3.9873 0.098

3.9600 0.102

5 4 2 7.2045 0.009

7.1182 0.010

5.2727 0.049

5.2682 0.050

4.5409 0.098

4.5182 0.101

n1 n2 n3 Critical value α

5 4 3 7.4449 0.010

7.3949 0.011

5.6564 0.049

5.6308 0.051

4.5487 0.099

4.5231 0.103

5 4 4 7.7604 0.009

7.7440 0.011

5.6571 0.049

5.6116 0.050

4.6187 0.100

4.5527 0.102

5 5 1 7.3091 0.009

6.8364 0.011

5.1273 0.049

4.9091 0.053

4.1091 0.086

4.0364 0.105

5 5 2 7.3385 0.010

7.5429 0.010

5.7055 0.046

5.6264 0.051

4.5451 0.100

4.5363 0.102

5 5 4 7.8229 0.010

7.1914 0.010

5.6657 0.049

5.6429 0.050

4.5229 0.099

4.6200 0.101

5 5 5 8.0000 0.009

7.9800 0.010

5.7800 0.049

5.6600 0.051

4.5600 0.100

4.5000 0.102


Appendix E: Student Table

α

ν 0.100 0.050 0.025 0.010 0.005 0.001

1 3.078 6.314 12.706 31.821 63.656 318.289

2 1.886 2.920 4.303 6.965 9.925 22.328

3 1.638 2.353 3.182 4.541 5.841 10.214

4 1.533 2.132 2.776 3.747 4.604 7.173

5 1.476 2.015 2.571 3.365 4.032 5.894

6 1.440 1.943 2.447 3.143 3.707 5.208

7 1.415 1.895 2.365 2.998 3.499 4.785

8 1.397 1.860 2.306 2.896 3.355 4.501

9 1.383 1.833 2.262 2.821 3.250 4.297

10 1.372 1.812 2.228 2.764 3.169 4.144

11 1.363 1.796 2.201 2.718 3.106 4.025

12 1.356 1.782 2.179 2.681 3.055 3.930

13 1.350 1.771 2.160 2.650 3.012 3.852

14 1.345 1.761 2.145 2.624 2.977 3.787

15 1.341 1.753 2.131 2.602 2.947 3.733

16 1.337 1.746 2.120 2.583 2.921 3.686

17 1.333 1.740 2.110 2.567 2.898 3.646

18 1.330 1.734 2.101 2.552 2.878 3.610

19 1.328 1.729 2.093 2.539 2.861 3.579

20 1.325 1.725 2.086 2.528 2.845 3.552

21 1.323 1.721 2.080 2.518 2.831 3.527

22 1.321 1.717 2.074 2.508 2.819 3.505

23 1.319 1.714 2.069 2.500 2.807 3.485

24 1.318 1.711 2.064 2.492 2.797 3.467

25 1.316 1.708 2.060 2.485 2.787 3.450

26 1.315 1.706 2.056 2.479 2.779 3.435

27 1.314 1.703 2.052 2.473 2.771 3.421

28 1.313 1.701 2.048 2.467 2.763 3.408

29 1.311 1.699 2.046 2.462 2.756 3.396

30 1.310 1.697 2.042 2.457 2.750 3.385

40 1.303 1.684 2.021 2.423 2.704 3.301

50 1.299 1.676 2.009 2.403 2.678 3.261

60 1.296 1.671 2.000 2.390 2.660 3.232

70 1.294 1.667 1.994 2.381 2.648 3.211

80 1.292 1.664 1.990 2.374 2.639 3.195

90 1.291 1.662 1.987 2.368 2.632 3.183

100 1.290 1.660 1.984 2.364 2.626 3.174

∞ 1.282 1.645 1.960 2.326 2.576 3.090


Appendix F: Chi-Square Table

Degrees of freedom

ν 0.990 0.950 0.10 0.05 0.025 0.01

1 0.000 0.004 2.71 3.84 5.020 6.63

2 0.020 0.103 4.61 5.99 7.380 9.21

3 0.115 0.352 6.25 7.81 9.350 11.34

4 0.297 0.711 7.78 9.49 11.140 13.23

5 0.554 1.145 9.24 11.07 12.830 15.09

6 0.872 1.635 10.64 12.53 14.450 16.81

7 1.239 2.107 12.02 14.07 16.010 18.48

8 1.646 2.733 13.36 15.51 17.530 20.09

9 2.088 3.325 14.68 16.92 19.020 21.67

10 2.558 3.940 15.99 18.31 20.480 23.21

11 3.050 4.580 17.29 19.68 21.920 24.72

12 3.570 5.230 18.55 21.03 23.340 26.22

13 4.110 5.890 19.81 22.36 24.740 27.69

14 4.660 6.570 21.06 23.68 26.120 29.14

15 5.230 7.260 22.31 25.00 27.490 30.58

16 5.810 7.960 23.54 26.30 28.850 32.00

17 6.410 8.670 24.77 27.59 30.190 33.41

18 7.020 9.390 25.99 28.87 31.530 34.81

19 7.630 10.120 27.20 30.14 32.850 36.19

20 8.260 10.850 28.41 31.41 34.170 37.57

21 8.900 11.590 29.62 32.67 35.480 38.93

22 9.540 12.340 30.81 33.92 36.780 40.29

23 10.200 13.090 32.01 35.17 38.080 41.64

24 10.860 13.850 33.20 36.42 39.360 42.98

25 11.520 14.610 34.38 37.65 40.650 44.31

26 12.200 15.380 35.56 38.89 41.920 45.64

27 12.880 16.150 36.74 40.11 43.190 46.96

28 13.570 16.930 37.92 41.34 44.460 48.28

29 14.260 17.710 39.09 42.56 45.720 49.59

30 14.950 18.490 40.26 43.77 46.980 50.89


Appendix G: Standard Normal Table

Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916

2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952

2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964

2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974

2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981

2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990


Bibliography

Abdi, H.: Binomial distribution: Binomial and Sign Tests. In: Salkind, N.J. (ed.) Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks (2007) ► Binomial Test

Abdi, H.: Distance. In: Salkind, N.J. (ed.)Encyclopedia of Measurement and Statistics. Sage,Thousand Oaks (2007) � Distance

Agresti, A.: Categorical Data Analysis. Wiley, NewYork (1990) � Analysis of Categorical Data

Airy, G.B.: Letter from Professor Airy, AstronomerRoyal, to the editor. Astronom. J. 4, 137–138(1856) � Outlier

Allan, F.E., Wishart, J.: A method of estimating theyield of a missing plot in field experimental work.J. Agricult. Sci. 20, 399–406 (1930) � MissingData

Altman, D.G.: Practical Statistics for MedicalResearch. Chapman & Hall, London (1991)� Logistic Regression

Anderson, T.W., Darling, D.A.: Asymptotic theory ofcertain goodness of fit criteria based on stochasticprocesses. Ann. Math. Stat. 23, 193–212 (1952)� Anderson–Darling Test

Anderson, T.W., Darling, D.A.: A test of goodness offit. J. Am. Stat. Assoc. 49, 765–769 (1954)� Anderson–Darling Test

Anderson, T.W., Sclove, S.L.: An Introduction to theStatistical Analysis of Data. Houghton Mifflin,Boston (1978) � Range

Anderson, V.L., McLean, R.A.: Design ofexperiments: a realistic approach. Marcel Dekker,New York (1974) � Model

Andrews D.F., Hertzberg, A.M.: Data: A Collectionof Problems from Many Fields for Students andResearch Workers. Springer, Berlin HeidelbergNew York (1985) � Cp Criterion

Anscombe, F.J., Tukey, J.W.: Analysis of residuals.Technometrics 5, 141–160 (1963) � Analysis ofResiduals

Anscombe, F.J.: Examination of residuals. Proc. 4thBerkeley Symp. Math. Statist. Prob. 1, 1–36 (1961)� Analysis of Residuals

Anscombe, F.J.: Graphs in statistical analysis. Am.Stat. 27, 17–21 (1973) � Analysis of Residuals

Antille, G., Ujvari, A.: Pratique de la StatistiqueInférentielle. PAN, Neuchâtel, Switzerland (1991)� Data Collection

Arbuthnott, J.: An argument for Divine Providence,taken from the constant regularity observed in thebirths of both sexes. Philos. Trans. 27, 186–190(1710) � Hypothesis Testing, � NonparametricTest, � Sign Test

Armatte, M.: La construction des notionsd’estimation et de vraisemblance chez Ronald A.Fisher. In: Mairesse, J. (ed.) Estimation etsondages. Economica, Paris (1988) � Estimation

Armitage, P., Colton, T.: Encyclopedia ofBiostatistics. Wiley, New York (1998)� Biostatistics

Arthanari, T.S., Dodge, Y.: MathematicalProgramming in Statistics. Wiley, New York(1981) � Linear Programming, � Optimization

Asimov, D.: The grand tour: a tool for viewingmultidimensional data. SIAM J. Sci. Stat. Comput.6(1), 128–143 (1985) � Statistical Software

Atteia, O., Dubois, J.-P., Webster, R.: Geostatisticalanalysis of soil contamination in the Swiss Jura.Environ. Pollut. 86, 315–327 (1994)� Geostatistics, � Spatial Data

Bailey, D.E.: Probability and statistics: models forresearch. Wiley, New York (1971) � SampleSpace


Balestra, P.: Calcul matriciel pour économistes.Editions Castella, Albeuve, Switzerland (1972)� Matrix

Bartlett, M.S.: An Introduction to Stochastic Processes. Cambridge University Press, Cambridge (1960) ► Stochastic Process

Barnard, G.A., Plackett, R.L., Pearson, S.: “Student”:a statistical biography of William Sealy Gosset:based on writings by E.S. Pearson. OxfordUniversity Press, Oxford (1990) � StandardDeviation

Barnett, V., Lewis, T.: Outliers in statistical data.Wiley, New York (1984) � Outlier

Bartlett, M.S.: On the theoretical specification ofsampling properties of autocorrelated time series.J. Roy. Stat. Soc. Ser. B 8, 27–41 (1946) � SerialCorrelation

Bartlett, M.S.: Properties of sufficiency and statisticaltests. Proc. Roy. Soc. Lond. Ser. A 160, 268–282(1937) � Analysis of Categorical Data

Bartlett, M.S.: Some examples of statistical methodsof research in agriculture and applied biology.J. Roy. Stat. Soc. (Suppl.) 4, 137–183 (1937)� Chi-Square Test, � Covariance Analysis,� Missing Data

Basset, G., Koenker, R.: Asymptotic theory of leastabsolute error regression. J. Am. Stat. Assoc. 73,618–622 (1978) � L1 Estimation

Bates, D.M., Watts D.G.: Nonlinear regressionanalysis and its applications. Wiley, New York(1988) � Nonlinear Regression

Bayes, T.: An essay towards solving a problem in thedoctrine of chances. Philos. Trans. Roy. Soc. Lond.53, 370–418 (1763). Published, by the instigationof Price, R., 2 years after his death. Republishedwith a biography by Barnard, George A. in 1958and in Pearson, E.S., Kendall, M.G.: Studies in theHistory of Statistics and Probability. Griffin,London, pp. 131–153 (1970) � Bayes’ Theorem,� Inferential Statistics

Bealle, E.M.L.: Introduction to Optimization. Wiley,New York (1988) � Operations Research

Becker, R.A., Chambers, J.M., Wilks, A.R.: The newS Language. Waldsworth and Brooks/Cole, PacificGrove, CA (1988) � Statistical Software

Belsley, D., Kuh, E., Welsch, R.E.: Regression diagnostics: Identifying influential data and sources of collinearity. Wiley, New York (1976) ► Multicollinearity

Belsley, D.A., Kuh, E., Welsch, R.E.: Regressiondiagnostics. Wiley, New York pp. 16–19 (1980)� Hat Matrix

Beniger, J.R., Robyn, D.L.: Quantitative graphics instatistics: a brief history. Am. Stat. 32, 1–11 (1978)� Quantitative Graph

Benzécri, J.P.: Histoire et préhistoire de l’analyse desdonnées, les cahiers de l’analyse des données 1,no. 1–4. Dunod, Paris (1976) � CorrespondenceAnalysis, � Data Analysis

Benzécri, J.P.: Histoire et préhistoire de l’analyse desdonnées. Dunod, Paris (1982) � Probability

Benzécri, J.P.: L’Analyse des données. Vol. 1: LaTaxinomie. Vol. 2: L’Analyse factorielle descorrespondances. Dunod, Paris (1976)� Correspondence Analysis, � Data Analysis

Bernardo, J.M., Smith, A.F.M.: Bayesian Theory.Wiley, Chichester (1994) � Bayesian Statistics

Bernoulli, D.: Quelle est la cause physique del’inclinaison des planètes (. . . ). Rec. PiècesRemport. Prix Acad. Roy. Sci. 3, 95–122 (1734)� Hypothesis Testing

Bernoulli, D.: The most probable choice betweenseveral discrepant observations, translation andcomments (1778). In: Pearson, E.S. andKendall, M.G. (eds.) Studies in the History ofStatistics and Probability. Griffin, London (1970)� Estimation

Bernoulli, J.: Ars Conjectandi, Opus Posthumum.Accedit Tractatus de Seriebus infinitis, et EpistolaGallice scripta de ludo Pilae recticularis. ImpensisThurnisiorum, Fratrum, Basel (1713) � BernoulliTrial, � Bernoulli’s Theorem, � BinomialDistribution, � Estimation, � Law of LargeNumbers, � Normal Distribution

Berry, D.A.: Statistics, a Bayesian Perspective.Wadsworth, Belmont, CA (1996) � BayesianStatistics

Bickel, R.D.: Selecting an optimal rejection regionfor multiple testing. Working paper, Office ofBiostatistics and Bioinformation, Medical Collegeof Georgia (2002) � Rejection Region

Bienaymé, I.J.: Considérations à l'appui de la découverte de Laplace sur la loi de probabilité dans la méthode des moindres carrés. Comptes Rendus de l'Académie des Sciences, Paris 37, 5–13 (1853); reprinted in 1867 in Liouville's journal, preceding the proof of the Bienaymé–Chebyshev inequalities, Journal de Mathématiques Pures et Appliquées 12, 158–176 ► Law of Large Numbers

Billingsley, P.: Statistical Inference for MarkovProcesses. University of Chicago Press, Chicago(1961) � Stochastic Process

Birkes, D., Dodge, Y., Seely, J.: Spanning sets forestimable contrasts in classification models. Ann.Stat. 4, 86–107 (1976) � Missing Data

Birkes, D., Dodge, Y.: Alternative Methods ofRegression. Wiley, New York (1993)� Collinearity, � Least Absolute DeviationRegression, � Standardized Data

Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.:Discrete Multivariate Analysis: Theory andPractice. MIT Press, Cambridge, MA (1975)� Binary Data, � Generalized Linear Regression,� Logistic Regression

Bjerhammar, A.: Rectangular reciprocal matrices withspecial reference to geodetic calculations. Bull.Geod. 20, 188–220 (1951) � Generalized Inverse

Bock, R.D.: Multivariate Statistical Methods inBehavioral Research. McGraw-Hill, New York(1975) � Collinearity

Boscovich, R.J. et Maire, C.: De LitterariaExpeditione per Pontificiam ditionem addimetiendas duas Meridiani gradus. Palladis, Rome(1755) � Model

Boscovich, R.J.: De Litteraria Expeditione perPontificiam ditionem, et Synopsis ampliorisOperis, ac habentur plura eius ex exemplaria etiamsensorum impressa. Bononiensi Scientiarium etArtium Instituto Atque Academia Commentarii,vol. IV, pp. 353–396 (1757) � L1 Estimation,� Median

Bourbonnais, R.: Econométrie, manuel et exercicescorrigés, 2nd edn. Dunod, Paris (1998)� Autocorrelation, � Durbin–Watson Test

Boursin, J.-L.: Les structures du hasard. Seuil, Paris(1986) � Data Collection

Bowerman, B.L., O’Connel, R.T.: Time Series andForecasting: An Applied Approach. Duxbury,Belmont, CA (1979) � Irregular Variation

Bowley, A.L.: Presidential address to the economicsection of the British Association. J. Roy. Stat. Soc.69, 540–558 (1906) � Confidence Interval,� Sampling

Box, G.E.P, Draper, N.R.: A basis for the selection ofresponse surface design. J. Am. Stat. Assoc. 54,

622–654 (1959) � Criterion of Total MeanSquared Error

Box, G.E.P., Hunter, W.G., Hunter, J.S.: Statistics forExperimenters. An Introduction to Design, DataAnalysis, and Model Building. Wiley, New York(1978) � Experiment, � Main Effect, � StandardError

Box, G.E.P., Jenkins, G.M.: Time Series Analysis:Forecasting and Control (Series in Time SeriesAnalysis). Holden Day, San Francisco (1970)� ARMA Models, � Autocorrelation, � CyclicalFluctuation, � Serial Correlation

Box, G.E.P., Tiao, G.P.: Bayesian Inference inStatistical Analysis. Addison-Wesley, Reading,MA (1973) � Bayesian Statistics

Box, G.E.P.: Non-normality and tests on variance.Biometrika 40, 318–335 (1953) � RobustEstimation

Brown, L.D., Olkin, I., Sacks, J., Wynn, H.P.: JackCarl Kiefer, Collected Papers III: Design ofExperiments. Springer, Berlin Heidelberg NewYork (1985) � Optimal Design

Carli, G.R.: Del valore etc. In: Opere Scelte di Carli,vol. I, p. 299. Custodi, Milan (1764) � IndexNumber

Carlin, B., Louis, T.A.: Bayes & Empirical BayesMethods. Chapman & Hall /CRC, London (2000)� Bayesian Statistics

Casella, G., Berger, R.L.: Statistical Inference.Duxbury, Pacific Grove (2001) � JointProbability Distribution Function

Cauchy, A.L.: Sur les résultats moyens d’observationsde même nature, et sur les résultats les plusprobables. C.R. Acad. Sci. 37, 198–206 (1853)� Cauchy Distribution

Celeux, G., Diday, E., Govaert, G., Lechevallier, Y.,Ralambondrainy, H.: Classification automatiquedes données—aspects statistiques et informatiques.Dunod, Paris (1989) � Cluster Analysis,� Measure of Dissimilarity

Chambers, J.M and Hastie, T.K (eds.): Statisticalmodels in S. Waldsworth and Brooks/Cole, PacificGrove, CA (1992) � Statistical Software

Chambers, J.M., Cleveland, W.S., Kleiner, B.,Tukey, P.A.: Graphical Methods for Data Analysis.Wadsworth, Belmont, CA (1983) � Q-Q Plot(Quantile to Quantile Plot)

Chambers, J.M.: Computational Methods for DataAnalysis. Wiley, New York (1977) � Algorithm


Chatfield, C.: The Analysis of Time Series: AnIntroduction, 4th edn. Chapman & Hall (1989)� Autocorrelation

Chatfield, C.: The Analysis of Time Series: An Introduction. Chapman & Hall (2003) ► Secular Trend

Chatterjee, P., Hadi, A.S.: Regression Analysis byExample. Wiley, New York (2006) � MultipleLinear Regression

Cochran, W.G.: Sampling Techniques, 2nd edn.Wiley, New York (1963) � Sample Size,� Simple Random Sampling, � Standard Error

Collatz, L., Wetterling, W.: Optimization problems.Springer, Berlin Heidelberg New York (1975)� Operations Research

Cook, R.D., Weisberg, S.: An Introduction toRegression Graphics. Wiley, New York (1994)� Analysis of Residuals

Cook, R.D., Weisberg, S.: Applied RegressionIncluding Computing and Graphics. Wiley, NewYork (1999) � Analysis of Residuals

Cook, R.D., Weisberg, S.: Residuals and Influence inRegression. Chapman & Hall, London (1982)� Analysis of Residuals

Copeland, M.T.: Statistical indices of businessconditions. Q. J. Econ. 29, 522–562 (1915)� Time Series

Cornfield, J.: A method of estimating comparativerates from clinical data. Applications to cancer ofthe lung, breast, and cervix. J. Natl. Cancer Inst.11, 1269–75 (1951) � Attributable Risk,� Avoidable Risk, � Prevalence Rate, � RelativeRisk, � Risk

Cornfield, J.: Methods in the Sampling of HumanPopulations. The determination of sample size.Amer. J. Publ. Health 41, 654–661 (1951)� Sample Size

Cotes, R.: Aestimatio Errorum in Mixta Mathesi, pervariationes partium Trianguli plani et sphaerici. In:Smith, R. (ed.) Opera Miscellania, Cambridge(1722) � Weighted Arithmetic Mean

Cox, D.R., Box, G.E.P.: An analysis oftransformations (with discussion). J. Roy. Stat. Soc.Ser. B 26, 211–243 (1964) � Transformation

Cox, D.R., Hinkley, D.V.: Theoretical Statistics.Chapman & Hall, London (1973) � LikelihoodRatio Test

Cox, D.R., Miller, H.D.: Theory of StochasticProcesses. Wiley, New York; Methuen, London(1965) � Stochastic Process

Cox, D.R., Reid, N.: Theory of the design ofexperiments. Chapman & Hall, London (2000)� One-Way Analysis of Variance, � Two-wayAnalysis of Variance

Cox, D.R., Snell, E.J.: Analysis of Binary Data, 2ndedn. Chapman & Hall, London (1990)� Analysis of Categorical Data

Cox, D.R., Snell, E.J.: The Analysis of Binary Data.Chapman & Hall (1989) � Analysis of BinaryData, � Binary Data, � Logistic Regression

Cox, D.R.: Estimation by double sampling.Biometrika 39, 217–227 (1952) � Sample Size

Cox, D.R.: Interaction. Int. Stat. Rev. 52, 1–25 (1984)� Interaction

Cox, P.R.: Demography, 5th edn. CambridgeUniversity Press, Cambridge (1976)� Demography

Cressie, N.A.C.: Statistics for Spatial Data. Wiley,New York (1991) � Geostatistics, � SpatialStatistics

Crome, A.F.W.: Geographisch-statistischeDarstellung der Staatskräfte. Weygand, Leipzig(1820) � Graphical Representation

Crome, A.F.W.: Ueber die Grösse und Bevölkerungder sämtlichen Europäischen Staaten. Weygand,Leipzig (1785) � Graphical Representation

Dénes, J., Keedwell, A.D.: Latin squares and theirapplications. Academic, New York (1974)� Graeco-Latin Square Design

Dantzig, G.B.: Applications et prolongements de laprogrammation linéaire. Dunod, Paris (1966)� Linear Programming

David, F.N.: Dicing and gaming (a note on the historyof probability). Biometrika 42, 1–15 (1955)� Probability

Davison, A.C., Hinkley, D.V.: Bootstrap Methods andTheir Application. Cambridge University Press,Cambridge (1997) � Bootstrap

DeLury, D.B.: The analysis of covariance. Biometrics4, 153–170 (1948) � Covariance Analysis

Dalenius, T.: Elements of Survey Sampling. SAREC, Stockholm (1985) ► Sampling

Deming, W.E.: Sample Design in Business Research.Wiley, New York (1960) � Systematic Sampling

Desrosières, A.: La partie pour le tout: comment généraliser? La préhistoire de la contrainte de représentativité. Estimation et sondages. Cinq contributions à l'histoire de la statistique. Economica, Paris (1988) ► Confidence Interval, ► Sampling

Deuchler, G.: Über die Methoden derKorrelationsrechnung in der Pädagogik undPsychologie. Zeitung für PädagogischePsycholologie und Experimentelle Pädagogik, 15,114–131, 145–159, 229–242 (1914) � KendallRank Correlation Coefficient

Diggle, P. J., P. Heagerty, K.-Y. Liang, andS.L. Zeger.: Analysis of longitudinal data, 2nd edn.Oxford University Press, Oxford (2002) � Panel

Dixon, W.J. chief ed.: BMDP Statistical SoftwareCommunications. The University of CaliforniaPress. Berkeley, CA (1981) � Statistical Software

Dodge, Y. (2004) Optimization Appliquée. Springer,Paris � Lagrange Multiplier

Dodge, Y.: A natural random number generator, Int.Stat. Review, 64, 329–344 (1996) � Generationof Random Numbers

Dodge, Y.: Premiers pas en statistique. Springer, Paris(1999) � Measure of Central Tendency,� Measure of Dispersion

Dodge, Y. (ed.): Statistical Data Analysis Based onthe L1-norm and Related Methods. Elsevier,Amsterdam (1987) � L1 Estimation

Dodge, Y., Hand, D.: What Should Future StatisticalSoftware Look like? Comput. Stat. Data Anal.(1991) � Statistical Software

Dodge, Y., Kondylis, A., Whittaker, J.: ExtendingPLS1 to PLAD regression and the use of the L1norm in soft modelling. COMPSTAT 2004,pp. 935–942, Physica/Verlag-Springer (2004)� Partial Least Absolute Deviation Regression

Dodge, Y., Majumdar, D.: An algorithm for findingleast square generalized inverses for classificationmodels with arbitrary patterns. J. Stat. Comput.Simul. 9, 1–17 (1979) � Generalized Inverse,� Missing Data

Dodge, Y., Rousson, V.: Multivariate L1-mean.Metrika 49, 127–134 (1999) � Norm of a Vector

Dodge, Y., Shah, K.R.: Estimation of parameters inLatin squares and Graeco-latin squares withmissing observations. Commun. Stat. TheoryA6(15), 1465–1472 (1977) � Graeco-LatinSquare Design

Dodge, Y., Thomas, D.R.: On the performance ofnon-parametric and normal theory multiple

comparison procedures. Sankhya B42, 11–27(1980) � Least Significant Difference Test

Dodge, Y., Whittaker, J. (eds.): Computationalstatistics, vols. 1 and 2. COMPSTAT, Neuchâtel,Switzerland, 24–28 August 1992 (English).Physica-Verlag, Heidelberg (1992) � StatisticalSoftware

Dodge, Y., Whittaker, J., Kondylis, A.: Progress onPLAD and PQR. In: Proceedings 55th Session ofthe ISI 2005, pp. 495–503 (2005) � Partial LeastAbsolute Deviation Regression

Dodge, Y.: Analyse de régression appliquée. Dunod,Paris (1999) � Criterion of Total Mean SquaredError

Dodge, Y.: Analysis of Experiments with MissingData. Wiley, New York (1985) � Missing Data

Dodge, Y.: Mathématiques de base pour économistes.Springer, Berlin Heidelberg New York (2002)� Eigenvector, � Norm of a Vector

Dodge, Y.: Principaux plans d’expériences. In:Aeschlimann, J., Bonjour, C., Stocker, E. (eds.)Méthodologies et techniques de plansd’expériences: 28ème cours de perfectionnementde l’Association vaudoise des chercheurs enphysique, Saas-Fee, 2–8 March 1986 (1986)� Factorial Experiment

Dodge, Y.: Some difficulties involving nonparametricestimation of a density function. J. Offic. Stat. 2(2),193–202 (1986) � Histogram

Doob, J.L.: Stochastic Processes. Wiley, New York(1953) � Stochastic Process

Doob, J.L.: What is a stochastic process? Am. Math.Monthly 49, 648–653 (1942) � StochasticProcess

Draper, N.R., Smith, H.: Applied RegressionAnalysis, 3rd edn. Wiley, New York (1998)� Analysis of Residuals, � Nonlinear Regression

Drobisch, M.W.: Über Mittelgrössen und dieAnwendbarkeit derselben auf die Berechnung desSteigens und Sinkens des Geldwerts. Berichte überdie Verhandlungen der König. Sachs. Ges. Wiss.Leipzig. Math-Phy. Klasse, 23, 25 (1871)� Index Number

Droesbeke, J.J., Tassi, P.H.: Histoire de la Statistique.Editions “Que sais-je?” Presses universitaires deFrance, Paris (1990) � Geometric Mean

Dufour, I.M.: Histoire de l’analyse de SériesChronologues, Université de Montréal, Canada(2006) � Time Series


Duncan, G.J.: Household Panel Studies: Prospectsand Problems. Social Science Methodology,Trento, Italy (1992) � Panel

Duncan, J.W., Gross, A.C.: Statistics for the 21stCentury: Proposals for Improving Statistics forBetter Decision Making. Irwin ProfessionalPublishing, Chicago (1995) � Official Statistics

Dupâquier, J. et Dupâquier, M.: Histoire de ladémographie. Académie Perrin, Paris (1985)� Official Statistics

Durbin, J., Knott, M., Taylor, C.C.: Components ofCramer-Von Mises statistics, II. J. Roy. Stat. Soc.Ser. B 37, 216–237 (1975) � Anderson–DarlingTest

Durbin, J., Watson, G.S.: Testing for serial correlationin least squares regression, I. Biometrika 37,409–428 (1950) � Durbin–Watson Test

Durbin, J., Watson, G.S.: Testing for serial correlationin least squares regression, II. Biometrika 38,159–177 (1951) � Durbin–Watson Test, � SerialCorrelation

Durbin, J.: Alternative method to d-test. Biometrika56, 1–15 (1969) � Durbin–Watson Test

Dutot, C.: Réflexions politiques sur les finances, et lecommerce. Vaillant and Nicolas Prevost, TheHague (1738) � Index Number

Edgeworth F.Y.: A new method of reducingobservations relating to several quantities. Philos.Mag. (5th Ser) 24, 222–223 (1887) � L1Estimation

Edgeworth, F.Y.: Methods of Statistics. JubileeVolume of the Royal Statistical Society, London(1885) � Hypothesis Testing

Edington, E. S.: Randomisation Tests. MarcelDekker, New York (1980) � Resampling

Edwards, A.W.F.: Likelihood. An account of thestatistical concept of likelihood and its applicationto scientific inference. Cambridge University Press,Cambridge (1972) � Likelihood Ratio Test

Edwards, D.: Introduction to Graphical Modelling.Springer, Berlin Heidelberg New York (1995)� Statistical Software

Efron, B., Tibshirani, R.J.: An Introduction to theBootstrap. Chapman & Hall, New York (1993)� Bootstrap

Efron, B.: Bootstrap methods: another look at thejackknife. Ann. Stat. 7, 1–26 (1979) � Bootstrap

Efron, B.: The Jackknife, the Bootstrap and the otherResampling Plans. Society for Industrial and

Applied Mathematics, Philadelphia (1982)� Jackknife Method

Efron, B.: The jackknife, the bootstrap, and otherresampling plans, CBMS Monograph. No. 38.SIAM, Philadelphia (1982) � Resampling

Eggenberger, F. und Polya, G.: Über die Statistikverketteter Vorgänge. Zeitschrift für angewandteMathematische Mechanik 3, 279–289 (1923)� Negative Binomial Distribution

Eisenhart, C.: Boscovich and the Combination ofObservations. In: Kendall, M., Plackett, R.L. (eds.)Studies in the History of Statistics and Probability,vol. II. Griffin, London (1977) � RegressionAnalysis

Elderton, W.P.: Tables for testing the goodness of fitof theory to observation. Biometrika 1, 155–163(1902) � Chi-Square Table

Euler, L.: Recherches sur la question des inégalités dumouvement de Saturne et de Jupiter, pièce ayantremporté le prix de l’année 1748, par l’Académieroyale des sciences de Paris. Republié en 1960,dans Leonhardi Euleri, Opera Omnia, 2ème série.Turici, Bâle, 25, pp. 47–157 (1749) � Analysisof Residuals, � Median, � Model

Euler, L.: Recherches sur une nouvelle espèce decarrés magiques. Verh. Zeeuw. Gen. Weten.Vlissengen 9, 85–239 (1782) � Graeco-LatinSquare Design

Evelyn, Sir G.S.: An account of some endeavoursto ascertain a standard of weight and measure.Philos. Trans. 113 (1798) � Index Number

Everitt, B.S.: Cluster Analysis. Halstead, London(1974) � Classification, � Cluster Analysis,� Data Analysis

Fechner, G.T.: Kollektivmasslehre. W. Engelmann,Leipzig (1897) � Kendall Rank CorrelationCoefficient

Federer, W.T., Balaam, L.N.: Bibliography onExperiment and Treatment Design Pre-1968.Hafner, New York (1973) � Design ofExperiments

Federer, W.T.: Data collection. In: Kotz, S.,Johnson, N.L. (eds.) Encyclopedia of StatisticalSciences, vol. 2. Wiley, New York (1982) � Data

Fedorov, V.V.: Theory of Optimal Experiments.Academic, New York (1972) � Optimal Design

Festinger, L.: The significance of difference between means without reference to the frequency distribution function. Psychometrika 11, 97–105 (1946) ► Mann–Whitney Test

Fienberg, S.E.: Graphical method in statistics. Am.Stat. 33, 165–178 (1979) � GraphicalRepresentation

Fienberg, S.E.: The Analysis of Cross-ClassifiedCategorical Data, 2nd edn. MIT Press, Cambridge,MA (1980) � Contingency Table

Finetti, B. de: Theory of Probability; A CriticalIntroductory Treatment. Trans. Machi, A.,Smith, A. Wiley, New York (1975) � BayesianStatistics

Fisher, I.: The Making of Index Numbers. HoughtonMifflin, Boston (1922) � Fisher Index

Fisher, R.A., Mackenzie, W. A.: The manurialresponse of different potato varieties. J. Agricult.Sci. 13, 311–320 (1923) � Nonlinear Regression

Fisher, R.A., Yates, F.: Statistical Tables forBiological, Agricultural and Medical Research, 6thedn. Hafner (Macmillan), New York (1963)� Graeco-Latin Square Design

Fisher, R.A., Yates, F.: Statistical Tables forBiological, Agricultural and Medical Research.Oliver and Boyd, Edinburgh and London (1963)� Fisher Table

Fisher, R.A.: Applications of “Student’s” distribution.Metron 5, 90–104 (1925) � Degree of Freedom,� Student Distribution, � Student Table, � StudentTest

Fisher, R.A.: Moments and product moments ofsampling distributions. Proc. Lond. Math. Soc. 30,199–238 (1929) � Method of Moments

Fisher, R.A.: On an absolute criterium for fittingfrequency curves. Mess. Math. 41, 155–160 (1912)� Estimation, � Maximum Likelihood

Fisher, R.A.: On the interpretation of χ2 fromcontingency tables, and the calculation of P. J. Roy.Stat. Soc. Ser. A 85, 87–94 (1922) � Chi-squareTest of Independence

Fisher, R.A.: On the mathematical foundations oftheoretical statistics. Philos. Trans. Roy. Soc. Lond.Ser. A 222, 309–368 (1922) � Estimation,� Nonlinear Regression

Fisher, R.A.: Statistical Methods and ScientificInference. Oliver & Boyd, Edinburgh (1956)� Inferential Statistics

Fisher, R.A.: Statistical Methods for ResearchWorkers. Oliver & Boyd, Edinburgh (1925)� Analysis of Variance, � Block, � Covariance

Analysis, � Fisher Distribution, � Interaction,� Latin Square Designs, � Randomization,� Two-way Analysis of Variance, � Variance

Fisher, R.A.: Statistical Methods, ExperimentalDesign and Scientific Inference. A re-issue ofStatistical Methods for Research Workers, “TheDesign of Experiments’, and “Statistical Methodsand Scientific Inference.” Bennett, J.H., Yates, F.(eds.) Oxford University Press, Oxford (1990)� Randomization

Fisher, R.A.: The Design of Experiments. Oliver &Boyd, Edinburgh (1935) � Hypothesis,� Interaction, � Least Significant Difference Test,� Randomization, � Resampling, � Variance

Fisher, R.A.: The arrangement of field experiments.J. Ministry Agric. 33, 503–513 (1926) � Designof Experiments, � Randomization

Fisher, R.A.: The correlation between relatives on thesupposition of Mendelian inheritance. Trans. Roy.Soc. Edinb. 52, 399–433 (1918) � Variance

Fisher, R.A.: The moments of the distribution fornormal samples of measures of departure fromnormality. Proc. Roy. Soc. Lond. Ser. A 130, 16–28(1930) � Method of Moments

Fisher, R.A.: The precision of discriminant functions.Ann. Eugen. (London) 10, 422–429 (1940)� Correspondence Analysis, � Data Analysis

Fisher, R.A.: Theory of statistical estimation. Proc.Camb. Philos. Soc. 22, 700–725 (1925)� Estimation

Fisher-Box, J.: Guinness, Gosset, Fisher, and smallsamples. Stat. Sci. 2, 45–52 (1987) � Gosset,William Sealy

Fisher-Box, J.: R.A. Fisher, the Life of a Scientist.Wiley, New York (1978) � Fisher, RonaldAylmer

Fitzmaurice, G., Laird, N., Ware, J.: AppliedLongitudinal Analysis. Wiley, New York (2004)� Longitudinal Data

Flückiger, Y.: Analyse socio-économique desdifférences cantonales de chômage. Le chômage enSuisse: bilan et perspectives. Forum Helvet. 6,91–113 (1995) � Gini Index

Fleetwood, W.: Chronicon Preciosum, 2nd edn.Fleetwood, London (1707) � Index Number

Fletcher, R.: Practical Methods of Optimization.Wiley, New York (1997) � Nonlinear Regression

Foody, W.N., Hedayat, A.S.: On theory and application of BIB designs with repeated blocks. Ann. Stat. 5, 932–945 (1977) → Sampling

Francis, I.: Statistical software: a comparative review. North Holland, New York (1981) → Statistical Software

Frank, I., Friedman, J.: A statistical view of somechemometrics regression tools. Technometrics 35,109–135 (1993) � Partial Least-SquaresRegression

Fredholm, J.: Sur une classe d’équationsfonctionnelles. Acta Math. 27, 365–390 (1903)� Generalized Inverse

Freedman, D., Pisani, R., Purves, R.: Statistics.Norton, New York (1978) � Histogram

Fuller, W.A.: Introduction to Statistical Time Series. Wiley, New York (1976) → Seasonal Variation

Funkhouser, H.G.: A note on a 10th century graph. Osiris 1, 260–262 (1936) → Time Series

Galton, F.: Co-relations and their measurement,chiefly from anthropological data. Proc. Roy. Soc.Lond. 45, 135–145 (1888) � CorrelationCoefficient, � Genetic Statistics

Galton, F.: Kinship and correlation. North Am. Rev.150, 419–431 (1890) � Correlation Coefficient

Galton, F.: Memories of My Life. Methuen, London(1908) � Correlation Coefficient

Galton, F.: Natural Inheritance. Macmillan, London(1889) � Correlation Coefficient, � InferentialStatistics, � Regression Analysis

Galton, F.: The geometric mean in vital and socialstatistics. Proc. Roy. Soc. Lond. 29, 365–367(1879) � Lognormal Distribution

Galton, F.: Typical laws of heredity. Proc. Roy. Inst.Great Britain 8, 282–301 (1877) (reprinted in:Stigler, S.M. (1986) The History of Statistics: TheMeasurement of Uncertainty Before 1900. BelknapPress, Cambridge, MA, p. 280) � GeneticStatistics

Gauss, C.F.: Bestimmung der Genauigkeit derBeobachtungen, vol. 4, pp. 109–117 (1816). In:Gauss, C.F. Werke (published in 1880).Dieterichsche Universitäts-Druckerei, Göttingen� Normal Distribution, � Standard Error

Gauss, C.F.: Gauss’ Work (1803–1826) on the Theoryof Least Squares. Trotter, H.F., trans. StatisticalTechniques Research Group, Technical Report,No. 5, Princeton University Press, Princeton, NJ(1957) � Standard Error

Gauss, C.F.: In: Gauss’s Work (1803–1826) on theTheory of Least Squares. Trans. Trotter, H.F.Statistical Techniques Research Group, TechnicalReport No. 5. Princeton University Press,Princeton, NJ (1821) � Mean Squared Error

Gauss, C.F.: Méthode des Moindres Carrés.Mémoires sur la Combinaison des Observations.Traduction Française par J. Bertrand.Mallet-Bachelier, Paris (1855) � Bias

Gauss, C.F.: Theoria Combinationis ObservationumErroribus Minimis Obnoxiae, Parts 1, 2 and suppl.Werke 4, 1–108 (1821, 1823, 1826) � Bias,� Gauss–Markov Theorem

Gauss, C.F.: Theoria Motus Corporum Coelestium.Werke, 7 (1809) � Error, � Estimation,� Inferential Statistics, � Normal Distribution,� Regression Analysis

Gavarret, J.: Principes généraux de statistiquemédicale. Beché Jeune & Labé, Paris (1840)� Hypothesis Testing

Gelman, A., Carlin, J.B., Stern H.S., Rubin, D.B.:Bayesian Data Analysis. Chapman & Hall /CRC,London (2003) � Bayesian Statistics

Gibbons, J.D., Pratt, J.W.: P-values: interpretation and methodology. Am. Stat. 29(1), 20–25 (1975) → p-Value

Gibbons, J.D., Chakraborti, S.: NonparametricStatistical Inference, 4th ed. CRC (2003)� Nonparametric Statistics

Gibbons, J.D.: Nonparametric Statistical Inference. McGraw-Hill, New York (1971) → Wilcoxon Test

Glasser, G.J., Winter, R.F.: Critical values of the coefficient of rank correlation for testing the hypothesis of independence. Biometrika 48, 444–448 (1961) → Spearman Rank Correlation Coefficient

Goovaerts, P.: Geostatistics for Natural ResourcesEvaluation. Oxford University Press, Oxford(1997) � Geostatistics, � Spatial Data, � SpatialStatistics

Gordon, A.D.: Classification. Methods for theExploratory Analysis of Multivariate Data.Chapman & Hall, London (1981)� Classification, � Cluster Analysis

Gosset, W.S. (“Student”): On the error of counting with a haemacytometer. Biometrika 5, 351–360 (1907) → Negative Binomial Distribution

Gosset, W.S. (“Student”): New tables for testing the significance of observations. Metron 5, 105–120 (1925) → Student Distribution, → Student Table, → Student Test

Gosset, W.S. (“Student”): The Probable Error of a Mean. Biometrika 6, 1–25 (1908) → Inferential Statistics, → Simulation, → Standard Deviation, → Student Distribution, → Student Table, → Student Test

Gourieroux, C.: Théorie des sondages. Economica,Paris (1981) � Survey

Gower, J.C.: Classification, geometry and dataanalysis. In: Bock, H.H. (ed.) Classification andRelated Methods of Data Analysis. Elsevier,Amsterdam (1988) � Data Analysis

Graunt, J.: Natural and political observationsmentioned in a following index, and made upon thebills of mortality: with reference to thegovernment, religion, trade, growth, ayre, diseases,and the several changes of the said city. Tho.Roycroft, for John Martin, James Allestry, andTho. Dicas (1662) � Biostatistics

Graybill, F.A.: Theory and Applications of the LinearModel. Duxbury, North Scituate, MA (Waldsworthand Brooks/Cole, Pacific Grove, CA ) (1976)� Gauss–Markov Theorem, � Generalized LinearRegression, � Matrix

Greenacre, M.: Theory and Applications ofCorrespondence Analysis. Academic, London(1984) � Correspondence Analysis

Greenwald, D. (ed.): Encyclopédie économique.Economica, Paris, pp. 164–165 (1984) � GiniIndex

Greenwood, J.A., Hartley, H.O.: Guide to Tables inMathematical Statistics. Princeton UniversityPress, Princeton, NJ (1962) � Normal Table

Greenwood, M., Yule, G.U.: An inquiry into thenature of frequency distributions representative ofmultiple happenings with particular reference tothe occurrence of multiple attacks of disease or ofrepeated accidents. J. Roy. Stat. Soc. Ser. A 83,255–279 (1920) � Negative BinomialDistribution

Greenwood, M.: Medical Statistics from Graunt toFarr. Cambridge University Press, Cambridge(1948) � Biostatistics

Grosbras, J.M.: Méthodes statistiques des sondages.Economica, Paris (1987) � Survey

Guerry, A.M.: Essai sur la statistique morale de laFrance. Crochard, Paris (1833) � GraphicalRepresentation

Haberman, S.J.: Analysis of Qualitative Data. Vol. I:Introductory Topics. Academic, New York (1978)� Analysis of Categorical Data

Hadley, G.: Linear programming. Addison-Wesley,Reading, MA (1962) � Linear Programming

Hall, P.: The Bootstrap and Edgeworth Expansion.Springer, Berlin Heidelberg New York (1992)� Bootstrap

Hand, D.J.: Discrimination and classification. Wiley,New York (1981) � Measure of Dissimilarity

Hansen, M.H., Hurwitz, W.N., Madow, W.G.: Sample Survey Methods and Theory. Vol. I. Methods and Applications. Vol. II. Theory. Chapman & Hall, London (1953) → Cluster Sampling

Hardy, G.H.: Mendelian proportions in a mixedpopulation. Science 28, 49–50 (1908) � GeneticStatistics

Harnett, D.L., Murphy, J.L.: Introductory StatisticalAnalysis. Addison-Wesley, Reading, MA (1975)� Irregular Variation, � Sample Size

Harter, H.L., Owen, D.B.: Selected Tables inMathematical Statistics, vol. 1. Markham, Chicago(5.7 Appendix) (1970) � Wilcoxon Signed Table

Harvard University: Tables of the CumulativeBinomial Probability Distribution, vol. 35. Annalsof the Computation Laboratory, Harvard UniversityPress, Cambridge, MA (1955) � BinomialDistribution, � Binomial Table

Harvey, A.C.: The Econometric Analysis of TimeSeries. Philip Allan, Oxford (Wiley, New York )(1981) � Durbin–Watson Test

Hauser, P.M., Duncan, O.D.: The Study ofPopulation: An Inventory and Appraisal.University of Chicago Press, Chicago, IL (1959)� Demography

Hayes, A.R.: Statistical Software: A survey andCritique of its Development. Office of NavalResearch, Arlington, VA (1982) � StatisticalSoftware

Hecht, J.: L’idée du dénombrement jusqu’à laRévolution. Pour une histoire de la statistique,tome 1, pp. 21–82 . Economica/INSEE (1978)� Census

Hedayat, A.S., Sinha, B.K.: Design and Inference inFinite Population Sampling. Wiley, New York(1991) � Sampling

Helland, I.S.: On the structure of partial least squares regression. Commun. Stat. Simul. Comput. 17, 581–607 (1988) → Partial Least-Squares Regression

Heyde, C.C., Seneta, E.: I.J. Bienaymé: Statistical Theory Anticipated. Springer, Berlin Heidelberg New York (1977) → Chebyshev, Pafnutii Lvovich

Hirschfeld, H.O.: A connection between correlationand contingency. Proc. Camb. Philos. Soc. 31,520–524 (1935) � Correspondence Analysis,� Data Analysis

Hoaglin, D.C., Mosteller, F., Tukey, J.W. (eds.):Understanding Robust and Exploratory DataAnalysis. Wiley, New York (1983) � Q-Q Plot(Quantile to Quantile Plot)

Hoaglin, D.C., Welsch, R.E.: The hat matrix inregression and ANOVA. Am. Stat. 32, 17–22 (andcorrection at 32, 146) (1978) � Hat Matrix

Hoerl A.E., Kennard R.W., Baldwin K.F.: Ridgeregression: some simulations. Commun. Stat.,vol. 4, pp. 105–123 (1975) � Ridge Regression

Hoerl A.E., Kennard R.W.: Ridge regression: biasedestimation for nonorthogonal Problems.Technometrics 12, 55–67 (1970) � RidgeRegression

Hoerl A.E.: Application of Ridge Analysis toRegression Problems. Chem. Eng. Prog. 58(3),54–59 (1962) � Ridge Regression

Hogg, R.V., Craig, A.T.: Introduction toMathematical Statistics, 2nd edn. Macmillan, NewYork (1965) � Estimator, � Joint DistributionFunction, � Marginal Density Function, � Pair ofRandom Variables

Holgate, P.: The influence of Huygens’work inDynamics on his Contribution to Probability. Int.Stat. Rev. 52, 137–140 (1984) � Probability

Hosmer, D.W., Lemeshow S.: Applied logisticregression. Wiley Series in Probability andStatistics. Wiley, New York (1989) � LogisticRegression

Huber, P.J.: Robust Statistics. Wiley, New York(1981) � Leverage Point, � Robust Estimation

Huber, P.J.: Robust estimation of a locationparameter. Ann. Math. Stat. 35, 73–101 (1964)� Robust Estimation

Huber, P.J.: Robustness: Where are we now? Student,vol. 1, no. 2, pp. 75–86 (1995) � RobustEstimation

Huitema, B.E.: The Analysis of Covariance andAlternatives. Wiley, New York (1980)� Covariance Analysis

INSEE: Pour une histoire de la statistique. Vol. 1:Contributions. INSEE, Economica, Paris (1977)� Official Statistics

INSEE: Pour une histoire de la statistique. Vol. 2:Matériaux. J. Affichard (ed.) INSEE, Economica,Paris (1987) � Official Statistics

ISI: The Future of Statistics: An InternationalPerspective. Voorburg (1994) � Official Statistics

International Mathematical and Statistical Libraries: IMSL Library Information. IMSL, Houston (1981) → Statistical Software

Isaaks, E.H., Srivastava, R.M.: An Introduction toApplied Geostatistics. Oxford University Press,Oxford (1989) � Geostatistics, � SpatialStatistics

Jaffé, W.: Walras, Léon. In: Sills, D.L. (ed.)International Encyclopedia of the Social Sciences,vol. 16. Macmillan and Free Press, New York,pp. 447–453 (1968) � Econometrics

Jambu, M., Lebeaux, M.O.: Classificationautomatique pour l’analyse de données. Dunod,Paris (1978) � Cluster Analysis

Jevons, W.S.: A Serious Fall in the Value of GoldAscertained and Its Social Effects Set Forth.Stanford, London (1863) � Index Number

Jevons, W.S.: The Principles of Science: A Treatiseon Logic and Scientific Methods (two volumes).Macmillan, London (1874) � Geometric Mean

Joanes, D.N. & Gill, C.A.: Comparing measures ofsamp skewness and kurtosis, JRSS.D 47(1),183–189 (1998) � Measure of Kurtosis

Johnson, N.L., Kotz, S., Balakrishnan, N.: DiscreteMultivariate Distributions, John Wiley (1997)� Probability Distribution

Johnson, N.L., Kotz, S.: Distributions in Statistics:Continuous Univariate Distributions, vols. 1 and 2.Wiley, New York (1970) � ContinuousProbability Distribution, � LognormalDistribution, � Probability Distribution

Johnson, N.L., Kotz, S.: Distributions in Statistics:Discrete Distributions. Wiley, New York (1969)� Discrete Probability Distribution, � ProbabilityDistribution

Johnson, N.L., Leone, F.C.: Statistics andexperimental design in engineering and thephysical sciences. Wiley, New York (1964)� Coefficient of Variation

Journel, A., Huijbregts, C.J.: Mining Geostatistics.Academic, New York (1981) � Geostatistics,� Spatial Statistics

Kaarsemaker, L., van Wijngaarden, A.: Tables for usein rank correlation. Stat. Neerland. 7, 53 (1953)� Kendall Rank Correlation Coefficient

Kapteyn, J., van Uven, M.J.: Skew frequency curvesin biology and statistics. Hoitsema Brothers,Groningen (1916) � Lognormal Distribution

Kapteyn, J.: Skew frequency curves in biology andstatistics. Astronomical Laboratory Noordhoff,Groningen (1903) � Lognormal Distribution

Kasprzyk, D., Duncan, G. et al. (eds.): Panel Surveys. Wiley, New York (1989) → Panel

Kaufman, L., Rousseeuw, P.J.: Finding Groups inData: An Introduction to Cluster Analysis. Wiley,New York (1990) � Classification, � ClusterAnalysis

Kendall, M.G., Stuart, A.: The Advanced Theory of Statistics, vol. 2. Griffin, London (1967) → Likelihood Ratio Test

Kendall, M.G., Stuart, A.: The Advanced Theory of Statistics. Vol. 1. Distribution Theory, 4th edn. Griffin, London (1977) → Method of Moments

Kendall, M.G.: A new measure of rank correlation.Biometrika 30, 81–93 (1938) � Kendall RankCorrelation Coefficient

Kendall, M.G.: Rank Correlation Methods. Griffin,London (1948) � Kendall Rank CorrelationCoefficient

Kendall, M.G.: Studies in the history of probabilityand statistics: II. The beginnings of a probabilitycalculus. Biometrika 43, 1–14 (1956)� Probability

Kendall, M.G.: The early history of index numbers.In: Kendall, M., Plackett, R.L. (eds.) Studies in theHistory of Statistics and Probability, vol. II. Griffin,London (1977) � Index Number

Kendall, M.G.: Time Series. Griffin, London (1973)� Covariation, � Time Series

Kendall, M.G.: Where shall the history of Statisticsbegin? Biometrika 47, 447–449 (1960)� Statistics

Khwarizmi, Muhammad ibn Musa (9th cent.): al-Jabr wa-al-muqabalah. The algebra of Mohammed ben Musa, Rosen, F. (ed. and transl.). Georg Olms Verlag, Hildesheim (1986) → Algorithm

Kiefer, J.: Optimum experimental designs. J. Roy.Stat. Soc. Ser. B 21, 272–319 (1959) � Kiefer,Jack Carl, � Optimal Design

Kleijnen, J.P.C.: Statistical Techniques in Simulation(in two parts). Vol. 9 of Statistics: Textbooks andMonographs. Marcel Dekker, New York (1974)� Simulation

Kleijnen, J.P.C.: Statistical Tools for SimulationPractitioners. Vol. 76 of Statistics: Textbooks andMonographs. Marcel Dekker, New York (1987)� Simulation

Knight, K.: Mathematical statistics. Chapman &Hall/CRC, London (2000) � Joint DensityFunction

Kokoska, S., Zwillinger, D.: CRC StandardProbability and Statistics Tables and formulae.CRC Press (2000) � Statistical Table

Kolmogorov, A.N.: Foundations of the Theory ofProbability. Chelsea Publishing Company, NewYork (1956) � Conditional Probability

Kolmogorov, A.N.: Grundbegriffe derWahrscheinlichkeitsrechnung. Springer, BerlinHeidelberg New York (1933) � ConditionalProbability

Kolmogorov, A.N.: Math. Sbornik. N.S. 1, 607–610(1936) � Stochastic Process

Kolmogorov, A.N.: Sulla determinazione empirica diuna legge di distribuzione. Giornale dell’InstitutoItaliano degli Attuari 4, 83–91 (6.1) (1933)� Kolmogorov–Smirnov Test, � NonparametricTest

Kondylis, A., Whittaker, J.: Using the Bootstrap onPLAD regression. PLS and Related Methods. In:Proceedings of the PLS’05 InternationalSymposium 2005, pp. 395–402. Edition Aluja et al.(2005) � Partial Least Absolute DeviationRegression

Kotz, S., Johnson, N.L.: Encyclopedia of StatisticalSciences, vols. 1–9. Wiley, New York (1982–1988)� Statistics

Kruskal, W.H., Tanur, J.M.: InternationalEncyclopedia of Statistics, vols. 1–2. The FreePress, New York (1978) � Statistics

Kruskal, W.H., Wallis, W.A.: Use of ranks inone-criterion variance analysis. J. Am. Stat. Assoc.47, 583–621 and errata, ibid. 48, 907–911 (1952)� Kruskal-Wallis Table, � Kruskal-Wallis Test,� Nonparametric Test

L.D. Brown, I. Olkin, J. Sacks and H.P. Wynn (eds.): Jack Carl Kiefer, Collected Papers. I: Statistical inference and probability (1951–1963). New York (1985) → Kiefer, Jack Carl

L.D. Brown, I. Olkin, J. Sacks and H.P. Wynn (eds.): Jack Carl Kiefer, Collected Papers. II: Statistical inference and probability (1964–1984). New York (1985) → Kiefer, Jack Carl

L.D. Brown, I. Olkin, J. Sacks and H.P. Wynn (eds.): Jack Carl Kiefer, Collected Papers. III: Design of experiments. New York (1985) → Kiefer, Jack Carl

Lagarde, J. de: Initiation à l’analyse des données.Dunod, Paris (1983) � Inertia Matrix

Lagrange, J.L.: Mémoire sur l’utilité de la méthodede prendre le milieu entre les résultats de plusieursobservations; dans lequel on examine les avantagesde cette méthode par le calcul des probabilités; etoù l’on résoud différents problèmes relatifs à cettematière. Misc. Taurinensia 5, 167–232 (1776)� Error

Lancaster, H.O.: Forerunners of the Pearsonchi-square. Aust. J. Stat. 8, 117–126 (1966)� Chi-square Distribution, � Gamma Distribution

Laplace, P.S. de: Deuxième supplément à la théorieanalytique des probabilités. Courcier, Paris (1818).pp. 531–580 dans les Œuvres complètes deLaplace, vol. 7. Gauthier-Villars, Paris, 1886� Robust Estimation

Laplace, P.S. de: Mémoire sur l’inclinaison moyennedes orbites des comètes. Mém. Acad. Roy. Sci.Paris 7, 503–524 (1773) � Hypothesis Testing

Laplace, P.S. de: Mémoire sur la probabilité descauses par les événements. Mem. Acad. Roy. Sci.(presented by various scientists) 6, 621–656 (1774)(or Laplace, P.S. de (1891) Œuvres complètes,vol 8. Gauthier-Villars, Paris, pp. 27–65) � Error,� Estimation, � Laplace Distribution, � NormalDistribution, � Normal Table

Laplace, P.S. de: Mémoire sur les approximations desformules qui sont fonctions de très grands nombreset sur leur application aux probabilités. Mémoiresde l’Académie Royale des Sciences de Paris, 10.Reproduced in: Œuvres de Laplace 12, 301–347(1810) � Central Limit Theorem, � DeMoivre–Laplace Theorem

Laplace, P.S. de: Mémoire sur les probabilités. Mem. Acad. Roy. Sci. Paris, 227–332 (1781) (or Laplace, P.S. de (1891) Œuvres complètes, vol. 9. Gauthier-Villars, Paris, pp. 385–485) → Error, → Normal Table

Laplace, P.S. de: Sur les degrés mesurés desméridiens, et sur les longueurs observées surpendule. Histoire de l’Académie royale desinscriptions et belles lettres, avec les Mémoires delittérature tirées des registres de cette académie.Paris (1789) � Regression Analysis

Laplace, P.S. de: Théorie analytique des probabilités,3rd edn. Courcier, Paris (1820) � Mean SquaredError

Laplace, P.S. de: Théorie analytique des probabilités,suppl. to 3rd edn. Courcier, Paris (1836)� Gamma Distribution

Laplace, P.S. de: Théorie analytique des probabilités.Courcier, Paris (1812) � Estimation,� Inferential Statistics

Laspeyres, E.: Die Berechnung einer mittlerenWaarenpreissteigerung. Jahrb. Natl. Stat. 16,296–314 (1871) � Index Number, � LaspeyresIndex

1863 und die kalifornisch-australischen Geldentdeckung seit 1848. Jahrb. Natl. Stat. 3, 81–118, 209–236 (1864) → Index Number, → Laspeyres Index

Le Cam, L.M., Yang, G.L.: Asymptotics in Statistics: Some Basic Concepts. Springer, Berlin Heidelberg New York (1990) → Convergence

Le Cam, L.: The Central Limit Theorem around1935. Stat. Sci. 1, 78–96 (1986) � Central LimitTheorem

Lebart, L., Salem, A.: Analyse Statistique desDonnées Textuelles. Dunod, Paris (1988)� Correspondence Analysis

Legendre, A.M.: Nouvelles méthodes pour ladétermination des orbites des comètes. Courcier,Paris (1805) � Estimation, � Least-SquaresMethod, � Normal Equations, � RegressionAnalysis

Lehmann, E.L., Casella, G.: Theory of Point Estimation. Springer, New York (1998) → Point Estimation

Lehmann, E.L., Romano, J.P.: Testing Statistical Hypotheses, 3rd edn. Springer, New York (2005) → Alternative Hypothesis

Lehmann, E.L.: Testing Statistical Hypotheses, 2ndedn. Wiley, New York (1986) � HypothesisTesting

Lehmann, E.L.: Theory of Point Estimation, 2nd edn.Wiley, New York (1983) � Estimator,� Parameter

Lerman, L.C.: Classification et analyse ordinale desdonnées. Dunod, Paris (1981) � Cluster Analysis

Liapounov, A.M.: Sur une proposition de la théoriedes probabilités. Bulletin de l’Academie Imperialedes Sciences de St.-Petersbourg 8, 1–24 (1900)� Central Limit Theorem

Lilienfeld, A.M., Lilienfeld, D.E.: Foundations ofEpidemiology, 2nd edn. Clarendon, Oxford (1980)� Attributable Risk, � Avoidable Risk, � Causeand Effect in Epidemiology, � Epidemiology,� Incidence Rate, � Odds and Odds Ratio,� Prevalence Rate, � Relative Risk, � Risk

Lindley, D.V.: The philosophy of statistics.Statistician 49, 293–337 (2000) � BayesianStatistics

Lipps, G.F.: Die Psychischen Massmethoden. F.Vieweg und Sohn, Braunschweig, Germany (1906)� Kendall Rank Correlation Coefficient

Lowe, J.: The Present State of England in Regard toAgriculture, Trade, and Finance. Kelley, London(1822) � Index Number

MacMahon, B., Pugh, T.F.: Epidemiology: Principlesand Methods. Little Brown, Boston, MA (1970)� Attributable Risk, � Avoidable Risk, � Causeand Effect in Epidemiology, � Epidemiology,� Incidence Rate, � Odds and Odds Ratio,� Prevalence Rate, � Relative Risk, � Risk

MacMahon, P.A.: Combinatory Analysis, vols. Iand II. Cambridge University Press, Cambridge(1915–1916) � Combinatory Analysis

Macaulay, F.R.: The smoothing of time series. National Bureau of Economic Research, 121–136 (1930) → Time Series

Mahalanobis, P.C.: On the generalized distance instatistics, Proc. natn. Inst. Sci. India 2, 49–55(1936) � Mahalanobis Distance

Mahalanobis, P.C.: On Tests and Measures of Group Divergence. Part I. Theoretical formulae. J. Asiatic Soc. Bengal 26, 541–588 (1930) → Mahalanobis Distance

Maistrov, L.E.: Probability Theory—A HistorySketch (transl. and edit. by Kotz, S.). Academic,New York (1974) � Independence

Mallows, C.L.: Choosing variables in a linear regression: A graphical aid. Presented at the Central Regional Meeting of the Institute of Mathematical Statistics, Manhattan, KS, 7–9 May 1964 (1964) → Cp Criterion

Mallows, C.L.: Some comments on Cp. Technometrics 15, 661–675 (1973) → Cp Criterion

Mann, H.B., Whitney, D.R.: On a test whether one oftwo random variables is stochastically larger thanthe other. Ann. Math. Stat. 18, 50–60 (1947)� Mann–Whitney Test, � Nonparametric Test

Markov, A.A.: Application des fonctions continues au calcul des probabilités. Kasan. Bull. 9(2), 29–34 (1899) → Markov, Andrei Andreevich

Markov, A.A.: Extension des théorèmes limites ducalcul des probabilités aux sommes des quantitésliées en chaîne. Mem. Acad. Sci. St. Petersburg 8,365–397 (1908) � Central Limit Theorem

Martens, H., and Naes, T.: Multivariate Calibration.Wiley, New York (1989) � Partial Least-SquaresRegression

Massey, F.J.: Distribution table for the deviationbetween two sample cumulatives. Ann. Math. Stat.23, 435–441 (1952) � Kolmogorov–SmirnovTest

Matheron, G.: La théorie des variables regionaliséeset ses applications. Masson, Paris (1965)� Geostatistics, � Spatial Data, � Spatial Statistics

Mayer, T.: Abhandlung über die Umwälzung desMonds um seine Achse und die scheinbareBewegung der Mondflecken. KosmographischeNachrichten und Sammlungen auf das Jahr 1748 1,52–183 (1750) � Analysis of Residuals,� Median, � Model

Mayr, E., Linsley, E.G., Usinger, R.L.: Methods andPrinciples of Systematic Zoology. McGraw-Hill,New York (1953) � Dendrogram

McAlister, D.: The law of the geometric mean. Proc.Roy. Soc. Lond. 29, 367–376 (1879)� Lognormal Distribution

McMullen, L., Pearson, E.S.: William Sealy Gosset,1876–1937. In: Pearson, E.S., Kendall, M. (eds.)Studies in the History of Statistics and Probability,vol. I. Griffin, London (1970) � Gosset, WilliamSealy

Merrington, M., Thompson, C.M.: Tables ofpercentage points of the inverted beta (F)distribution. Biometrika 33, 73–88 (1943)� Fisher Table

Metropolis, N., Ulam, S.: The Monte Carlo method.J. Am. Stat. Assoc. 44, 335–341 (1949) � MonteCarlo Method

Meyer, C.D.: Matrix analysis and Applied LinearAlgebra, SIAM (2000) � Eigenvector

Miller, L.H.: Table of percentage points of Kolmogorov statistics. J. Am. Stat. Assoc. 51, 111–121 (1956) → Kolmogorov–Smirnov Test

Miller, R.G., Jr.: Simultaneous Statistical Inference,2nd edn. Springer, Berlin Heidelberg New York(1981) � Least Significant Difference Test,� One-Sided Test

Mitrinovic, D.S. (with Vasic, P.M.): AnalyticInequalities. Springer, Berlin Heidelberg New York(1970) � Harmonic Mean

Moivre, A. de: Approximatio ad summam terminorum binomii (a+b)^n, in seriem expansi. Supplementum II to Miscellanea Analytica, pp. 1–7 (1733). Photographically reprinted in a rare pamphlet on Moivre and some of his discoveries. Published by Archibald, R.C. Isis 8, 671–683 (1926) → Central Limit Theorem, → De Moivre–Laplace Theorem, → Normal Distribution

Moivre, A. de: The Doctrine of Chances: or,A Method of Calculating the Probability of Eventsin Play. Pearson, London (1718) � InferentialStatistics, � Normal Distribution

Molenberghs, G., Verbeke, G.: Models for Discrete Longitudinal Data. Springer Series in Statistics. Springer, Berlin Heidelberg New York (2005) → Longitudinal Data

Montgomery, D.C.: Design and analysis ofexperiments, 4th edn. Wiley, Chichester (1997)� One-Way Analysis of Variance

Montmort, P.R. de: Essai d’analyse sur les jeux dehasard, 2nd edn. Quillau, Paris (1713)� Negative Binomial Distribution

Mood, A.M.: Introduction to the Theory of Statistics.McGraw-Hill, New York (1950)� Nonparametric Test

Moore, E.H.: General Analysis, Part I. Mem. Am.Philos. Soc., Philadelphia, PA, pp. 147–209 (1935)� Generalized Inverse

Morabia, A.: Epidemiologie Causale. EditionsMédecine et Hygiène, Geneva (1996)� Attributable Risk, � Avoidable Risk, � Causeand Effect in Epidemiology, � Epidemiology,� Incidence Rate, � Odds and Odds Ratio,� Prevalence Rate, � Relative Risk, � Risk

Morabia, A.: L’Épidémiologie Clinique. Editions“Que sais-je?”. Presses Universitaires de France,Paris (1996) � Attributable Risk, � AvoidableRisk, � Incidence Rate, � Odds and Odds Ratio,� Prevalence Rate, � Relative Risk, � Risk

Mosteller, F., Tukey, J.W.: Data Analysis andRegression: A Second Course in Statistics.Addison-Wesley, Reading, MA (1977)� Correlation Coefficient, � Leverage Point

Mudholkar, G.S.: Multiple correlation coefficient. In:Kotz, S., Johnson, N.L. (eds.) Encyclopedia ofStatistical Sciences, vol. 5. Wiley, New York(1982) � Correlation Coefficient

National Bureau of Standards.: A Guide to Tables ofthe Normal Probability Integral. U.S. Departmentof Commerce. Applied Mathematics Series, 21(1952) � Normal Table

National Bureau of Standards.: Tables of the BinomialProbability Distribution. U.S. Department ofCommerce. Applied Mathematics Series 6 (1950)� Binomial Distribution, � Binomial Table

Naylor, T.H., Balintfy, J.L., Burdick, D.S., Chu, K.:Computer Simulation Techniques. Wiley, NewYork (1967) � Simulation

Neyman, J., Pearson, E.S.: On the problem of themost efficient tests of statistical hypotheses. Philos.Trans. Roy. Soc. Lond. Ser. A 231, 289–337 (1933)� Hypothesis Testing

Neyman, J., Pearson, E.S.: On the use andinterpretation of certain Test Criteria for purposesof Statistical Inference Part I (1928). Reprinted byCambridge University Press in 1967� Significance Level

Neyman, J., Pearson, E.S.: On the use andinterpretation of certain test criteria for purposes ofstatistical inference, Parts I and II. Biometrika 20A,175–240, 263–294 (1928) � Critical Value,� Hypothesis Testing, � Inferential Statistics,� Likelihood Ratio Test, � Type I Error

Nordin, J.A.: Determining sample size. J. Am. Stat.Assoc. 39, 497–506 (1944) � Sample Size

OCDE: OCDE et les indicateurs internationaux del’enseignement. Centre de la Recherche etl’innovation dans l’enseignement, OCDE, Paris(1992) � Indicator

OCDE: Principaux indicateurs de la science et de latechnologie. OCDE, Paris (1988) � Indicator

Oliver, M.A., Webster, R.: Kriging: a method of interpolation for geographical information system. Int. J. Geogr. Inf. Syst. 4(3), 313–332 (1990) → Geostatistics, → Spatial Statistics

Ostle, B.: Statistics in Research: Basic Concepts andTechniques for Research Workers. Iowa StateCollege Press, Ames, IA (1954) � Chi-SquareTest, � Contrast

Owen, D.B.: Handbook of Statistical Tables. (3.3, 5.7,5.8, 5.10). Addison-Wesley, Reading, MA (1962)� Wilcoxon Signed Table

Paasche, H.: Über die Preisentwicklung der letztenJahre nach den Hamburger Borsen-notirungen.Jahrb. Natl. Stat. 23, 168–178 (1874) � IndexNumber, � Paasche Index

Palgrave, R.H.I.: Currency and standard of value inEngland, France and India etc. (1886) In: IUP(1969) British Parliamentary Papers: Session 1886:Third Report and Final Report of the RoyalCommission on the Depression in Trade andIndustry, with Minutes of Evidence andAppendices, Appendix B. Irish University Press,Shannon � Index Number

Pannatier, Y.: Variowin 2.2. Software for Spatial DataAnalysis in 2D. Springer, Berlin Heidelberg NewYork (1996) � Geostatistics

Parzen, E.: On estimation of a probability densityfunction and mode. Ann. Math. Stat. 33,1065–1076 (1962) � Nonparametric Statistics

Pascal, B.: Mesnard, J. (ed.) Œuvres complètes.Vol. 2. Desclée de Brouwer, Paris (1970)� Arithmetic Triangle

Pascal, B.: Traité du triangle arithmétique (publ.posthum. in 1665), Paris (1654) � ArithmeticTriangle

Pascal, B.: Varia Opera Mathematica. D. Petri deFermat. Tolosae (1679) � Negative BinomialDistribution

Pascal, B.: Œuvres, vols. 1–14. Brunschvicg, L.,Boutroux, P., Gazier, F. (eds.) Les Grands Ecrivainsde France. Hachette, Paris (1904–1925)� Arithmetic Triangle

Pearson, E.S., Hartley, H.O.: Biometrika Tables forStatisticians, 1 (2nd edn.). Cambridge UniversityPress, Cambridge (1948) � Normal Table

Pearson, E.S., Hartley, H.O.: Biometrika Tables forStatisticians, vols. I and II. Cambridge UniversityPress, Cambridge (1966,1972) � Coefficient ofSkewness

Pearson, E.S.: The Neyman–Pearson story: 1926–34. In: David, F.N. (ed.) Research Papers in Statistics: Festschrift for J. Neyman. Wiley, New York (1966) → Hypothesis Testing

Pearson, K., Filon, L.N.G.: Mathematicalcontributions to the theory of evolution. IV: On theprobable errors of frequency constants and on theinfluence of random selection on variation andcorrelation. Philos. Trans. Roy. Soc. Lond. Ser. A191, 229–311 (1898) � Estimation

Pearson, K.: Contributions to the mathematical theoryof evolution. I. In: Karl Pearson’s Early StatisticalPapers. Cambridge University Press, Cambridge,pp. 1–40 (1948). First published in 1894 as: On thedissection of asymmetrical frequency curves.Philos. Trans. Roy. Soc. Lond. Ser. A 185, 71–110� Coefficient of Skewness, � Estimation,� Measure of Shape, � Method of Moments,� Standard Deviation

Pearson, K.: Contributions to the mathematical theoryof evolution. II: Skew variation in homogeneousmaterial. In: Karl Pearson’s Early StatisticalPapers. Cambridge University Press, Cambridge,pp. 41–112 (1948). First published in 1895 inPhilos. Trans. Roy. Soc. Lond. Ser. A 186, 343–414� Coefficient of Skewness, � Frequency Curve,� Histogram, � Measure of Shape

Pearson, K.: On the criterion, that a given system ofdeviations from the probable in the case ofa correlated system of variables is such that it canbe reasonably supposed to have arisen fromrandom sampling. In: Karl Pearson’s EarlyStatistical Papers. Cambridge University Press,pp. 339–357. First published in 1900 in Philos.Mag. (5th Ser) 50, 157–175 (1948) � Chi-squareDistribution, � Chi-square Goodness of Fit Test,� Chi-square Test of Independence, � InferentialStatistics

Pearson, K.: On the theory of contingency and itsrelation to association and normal correlation.Drapers’ Company Research Memoirs, BiometricSer. I., pp. 1–35 (1904) � Analysis of CategoricalData, � Contingency Table

Pearson, K.: Studies in the history of statistics andprobability. Biometrika 13, 25–45 (1920).Reprinted in: Pearson, E.S., Kendall, M.G. (eds.)Studies in the History of Statistics and Probability,vol. I. Griffin, London � Correlation Coefficient

Pearson, K.: Tables of the Incomplete Γ-function. H.M. Stationery Office (Cambridge University Press, Cambridge since 1934), London (1922) → Chi-Square Table

Peirce, B.: Criterion for the Rejection of DoubtfulObservations. Astron. J. 2, 161–163 (1852)� Outlier

Penrose, R.: A generalized inverse for matrices. Proc.Camb. Philos. Soc. 51, 406–413 (1955)� Generalized Inverse

Persons, W.M.: An index of general businessconditions. Rev. Econom. Stat. 1, 111–205 (1919)� Time Series

Petty, W.: Another essay in political arithmeticconcerning the growth of the city of London.Kelley, New York (1682) (2nd edn. 1963)� Biostatistics

Petty, W.: Observations Upon the Dublin-Bills ofMortality, 1681, and the State of That City. Kelley,New York (1683) (2nd edn. 1963) � Biostatistics

Petty, W.: Political arithmetic. In: William Petty, TheEconomic Writings . . . , vol 1. Kelley, New York,pp. 233–313 (1676) � Econometrics

Pitman, E.J.G.: Significance tests which may beapplied to samples from any populations.Supplement, J. Roy. Stat. Soc. 4, 119–130 (1937)� Resampling

Plackett, R.L., Barnard, G.A. (eds.): Student:A Statistical Biography of William Sealy Gosset(Based on writings of E.S. Pearson). Clarendon,Oxford (1990) � Gosset, William Sealy

Plackett, R.L.: Studies in the history of probability and statistics: The discovery of the method of least squares. In: Kendall, M., Plackett, R.L. (eds.) Studies in the History of Statistics and Probability, vol. II. Griffin, London (1977) → Legendre, Adrien Marie, → Regression Analysis

Plackett, R.L.: Studies in the history of probabilityand statistics. VII. The principle of the arithmeticmean. Biometrika 45, 130–135 (1958)� Arithmetic Mean

Playfair, W.: The Commercial and Political Atlas.Playfair, London (1786) � GraphicalRepresentation

Poisson, S.D.: Recherches sur la probabilité desjugements en matière criminelle et en matièrecivile. In: Procédés des Règles Générales du Calculdes Probabilités. Bachelier, Imprimeur-Librairepour les Mathématiques, Paris (1837) � PoissonDistribution

Poisson, S.D.: Sur la probabilité des résultats moyens des observations. Connaissance des temps pour l'an 1827, pp. 273–302 (1824) → Central Limit Theorem

Pólya, G.: Ueber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentproblem. Mathematische Zeitschrift 8, 171–181 (1920) → Central Limit Theorem

Porter, T.M.: The Rise of Statistical Thinking,1820–1900. Princeton University Press, Princeton,NJ (1986) � Correlation Coefficient

Preece, D.A.: Latin squares, Latin cubes, Latinrectangles, etc. In: Kotz, S., Johnson, N.L. (eds.)Encyclopedia of Statistical Sciences, vol. 4. Wiley,New York (1983) � Latin Square Designs

Py, B.: Statistique Déscriptive. Economica, Paris(1987) � Covariation

Quenouille, M.H.: Approximate tests of correlation intime series. J. Roy. Stat. Soc. Ser. B 11, 68–84(1949) � Jackknife Method

Quenouille, M.H.: Notes on bias in estimation.Biometrika 43, 353–360 (1956) � JackknifeMethod

Quetelet, A.: Letters addressed to H.R.H. the GrandDuke of Saxe Coburg and Gotha, on the Theory ofProbabilities as Applied to the Moral and PoliticalSciences. (French translation by Downs, OlinthusGregory). Charles & Edwin Layton, London(1849) � Chi-square Test of Independence

Rüegg, A.: Probabilités et statistique. PressesPolytechniques Romandes, Lausanne, Switzerland(1985) � Experiment

Rényi, A.: Probability Theory. North Holland,Amsterdam (1970) � Bernoulli’s Theorem

Raktoe, B.L., Hedayat, A., Federer, W.T.: Factorial Designs. Wiley, New York (1981) → Factor

Rao, C.R., Mitra, S.K., Matthai, A.,Ramamurthy, K.G.: Formulae and Tables forStatistical Work. Statistical Publishing Company,Calcutta, pp. 34–37 (1966) � BinomialDistribution

Rao, C.R., Mitra, S.K.: Generalized Inverses ofMatrices and its Applications. Wiley, New York(1971) � Generalized Inverse

Rao, C.R.: Advanced Statistical Methods inBiometric Research. Wiley, New York (1952)� Analysis of Variance

Rao, C.R.: A note on generalized inverse of a matrixwith applications to problems in mathematicalstatistics. J. Roy. Stat. Soc. Ser. B 24, 152–158(1962) � Generalized Inverse

Rao, C.R.: Calculus of generalized inverse ofmatrices. I. General theory. Sankhya A29, 317–342(1967) � Generalized Inverse

Rao, C.R.: Linear Statistical Inference and ItsApplications, 2nd edn. Wiley, New York (1973)� Gauss–Markov Theorem, � Generalized LinearRegression

Rashed, R.: La naissance de l’algèbre. In: Noël, E.(ed.) Le Matin des Mathématiciens. Belin-RadioFrance, Paris (1985) � Algorithm, � ArithmeticTriangle

Reid, C.: Neyman—From Life. Springer, BerlinHeidelberg New York (1982) � HypothesisTesting

Ripley, B.D., Venables, W.N.: S Programming. Springer, Berlin Heidelberg New York (2000) → Statistical Software

Ripley, B.D.: Thoughts on pseudorandom numbergenerators. J. Comput. Appl. Math. 31, 153–163(1990) � Generation of Random Numbers

Robert, C.P.: Le choix Bayesien. Springer, Paris(2006) � Bayesian Statistics

Rose, D.: Household panel studies: an overview.Innovation 8(1), 7–24 (1995) � Panel

Ross, S.M.: Introduction to Probability Models, 8thedn. John Wiley, New York (2006) � BernoulliTrial, � Joint Distribution Function, � MarginalDensity Function

Rothman, K.J.: Epidemiology: An Introduction. Oxford University Press (2002) → Cause and Effect in Epidemiology, → Epidemiology

Rothschild, V., Logothetis, N.: ProbabilityDistributions. Wiley, New York (1986)� Probability Distribution

Royston, E.: A note on the history of the graphicalpresentation of data. In: Pearson, E.S., Kendall, M.(eds.) Studies in the History of Statistics andProbability, vol. I. Griffin, London (1970)� Graphical Representation

Rubin, D.B.: A non-iterative algorithm for leastsquares estimation of missing values in anyanalysis of variance design. Appl. Stat. 21,136–141 (1972) � Missing Data

Saporta, G.: Probabilité, analyse de données etstatistiques. Technip, Paris (1990)� Correspondence Analysis

Savage, I.R.: Bibliography of NonparametricStatistics. Harvard University Press, Cambridge,MA (1962) � Nonparametric Test

Savage, R.: Bibliography of Nonparametric Statisticsand Related Topics: Introduction. J. Am. Stat.Assoc. 48, 844–849 (1953) � NonparametricStatistics

Scheffé, H.: A method for judging all contrasts in theanalysis of variance. Biometrika 40, 87–104 (1953)� Contrast

Scheffé, H.: The Analysis of Variance. Wiley, NewYork (1959) � Analysis of Variance

Schmid, C.F., Schmid, S.E.: Handbook of GraphicPresentation, 2nd edn. Wiley, New York (1979)� Quantitative Graph

Schmid, C.F.: Handbook of Graphic Presentation.Ronald Press, New York (1954) � GraphicalRepresentation

Searle, S.R.: Matrix Algebra Useful for Statistics.Wiley, New York (1982) � Matrix

Seber, G.A.F., Wild, C.J.: Nonlinear Regression. Wiley, New York (2003) → Nonlinear Regression

Seber, G.A.F.: Linear Regression Analysis. Wiley, New York (1977) → Simple Linear Regression, → Weighted Least-Squares Method

Senn, St.: Cross-over Trials in Clinical Research.Wiley (1993) � Treatment

Shao, J., Tu, D.: The Jackknife and Bootstrap.Springer, Berlin Heidelberg New York (1995)� Bootstrap

Scheffé, H.: The Analysis of Variance. Wiley, New York (1959) → One-Way Analysis of Variance

Sheppard, W.F.: New tables of the probabilityintegral. Biometrika 2, 174–190 (1903)� Normal Table

Sheppard, W.F.: On the calculation of the mostprobable values of frequency constants for dataarranged according to equidistant divisions ofa scale. London Math. Soc. Proc. 29, 353–380(1898) � Moment

Sheppard, W.F.: Table of deviates of the normal curve.Biometrika 5, 404–406 (1907) � Normal Table

Sheynin, O.B.: On the history of some statistical lawsof distribution. In: Kendall, M., Plackett, R.L.(eds.) Studies in the History of Statistics andProbability, vol. II. Griffin, London (1977)� Chi-square Distribution

Sheynin, O.B.: P.S. Laplace’s theory of errors,Archive for History of Exact Sciences 17, 1–61(1977) � De Moivre–Laplace Theorem

Sheynin, O.B.: P.S. Laplace’s work on probability,Archive for History of Exact Sciences 16, 137–187(1976) � De Moivre–Laplace Theorem

Simpson, T.: An attempt to show the advantagearising by taking the mean of a number ofobservations in practical astronomy. In:Miscellaneous Tracts on Some Curious and VeryInteresting Subjects in Mechanics,Physical-Astronomy, and SpeculativeMathematics. Nourse, London (1757). pp. 64–75� Arithmetic Mean

Simpson, T.: A letter to the Right Honorable GeorgeEarl of Macclesfield, President of the RoyalSociety, on the advantage of taking the mean ofa number of observations in practical astronomy.Philos. Trans. Roy. Soc. Lond. 49, 82–93 (1755)� Arithmetic Mean, � Error

Simpson, T.: Miscellaneous Tracts on Some Curiousand Very Interesting Subjects in Mechanics,Physical-Astronomy and Speculative Mathematics.Nourse, London (1757) � Error

Sittig, J.: The economic choice of sampling system inacceptance sampling. Bull. Int. Stat. Inst. 33(V),51–84 (1952) � Sample Size

Smirnov, N.V.: Estimate of deviation betweenempirical distribution functions in two independentsamples. (Russian). Bull. Moscow Univ. 2(2), 3–16(6.1, 6.2) (1939) � Kolmogorov–Smirnov Test,� Nonparametric Test

Smirnov, N.V.: Table for estimating the goodness offit of empirical distributions. Ann. Math. Stat. 19,279–281 (6.1) (1948) � Kolmogorov–SmirnovTest

Smith, C.A.B.: Statistics in human genetics. In:Kotz, S., Johnson, N.L. (eds.) Encyclopedia ofStatistical Sciences, vol. 3. Wiley, New York(1983) � Genetic Statistics

Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy:The Principles and Practice of NumericalClassification (A Series of Books in Biology).W.H. Freeman, San Francisco, CA (1973)� Dendrogram

Snedecor, G.W.: Calculation and Interpretation ofAnalysis of Variance and Covariance. Collegiate,Ames, IA (1934) � Fisher Distribution

Sobol, I.M.: The Monte Carlo Method. MirPublishers, Moscow (1975) � Monte CarloMethod

Sokal, R.R., Sneath, P.H.: Principles of NumericalTaxonomy. Freeman, San Francisco (1963)� Dendrogram

Spearman, C.: The Proof and Measurement ofAssociation between Two Things. Am. J. Psychol.15, 72–101 (1904) � Spearman Rank CorrelationCoefficient

Statistical Sciences: Statistical Analysis in S-Plus,Version 3.1. Seattle: StatSci, a division of MathSoft(1993) � Statistical Software

Staudte, R.G., Sheather, S.J.: Robust Estimation and Testing. Wiley, New York (1990) → Convergence

Stein, C.: A two-sample test for a linear hypothesiswhose power is independent of the variance. Ann.Math. Stat. 16, 243–258 (1945) � Sample Size

Stephens, M.A.: EDF statistics for goodness of fit andsome comparisons. J. Am. Stat. Assoc. 69,730–737 (1974) � Anderson–Darling Test

Stigler, S.: Francis Galton’s account of the inventionof correlation. Stat. Sci. 4, 73–79 (1989)� Correlation Coefficient

Stigler, S.: Simon Newcomb, Percy Daniell, and theHistory of Robust Estimation 1885–1920. J. Am.Stat. Assoc. 68, 872–879 (1973) � Outlier

Stigler, S.: Stigler’s law of eponymy, Transactions ofthe New York Academy of Sciences, 2nd series 39,147–157 (1980) � Normal Distribution

Stigler, S.: The History of Statistics, the Measurementof Uncertainty Before 1900. Belknap, London(1986) � Bernoulli Family, � CombinatoryAnalysis, � Official Statistics, � Probability,� Regression Analysis, � Statistics

Stuart, A., Ord, J.K.: Kendall’s advanced theory ofstatistics. Vol. I. Distribution theory. Wiley, NewYork (1943) � Measure of Central Tendency

Sukhatme, P.V.: Sampling Theory of Surveys withApplications. Iowa State University Press, Ames,IA (1954) � Quota Sampling

Takács, L.: Combinatorics. In: Kotz, S.,Johnson, N.L. (eds.) Encyclopedia of StatisticalSciences, vol. 2. Wiley, New York (1982)� Combinatory Analysis

Tassi, Ph.: De l’exhaustif au partiel: un peu d’histoiresur le développement des sondages. Estimation etsondages, cinq contributions à l’histoire de lastatistique. Economica, Paris (1988) � Sampling

Tchebychev, P.L. (1890–1891). Sur deux théorèmesrelatifs aux probabilités. Acta Math. 14, 305–315� Central Limit Theorem

Tchebychev, P.L.: Des valeurs moyennes. J. Math.Pures Appl. 12(2), 177–184 (1867). Publiésimultanément en Russe, dans Mat. Sbornik 2(2),1–9 � Law of Large Numbers

Tenenhaus, M.: La régression PLS. Théorie etpratique. Technip, Paris (1998) � PartialLeast-Squares Regression

Thiele, T.N.: Theory of Observations. C. and E.Layton, London; Ann. Math. Stat. 2, 165–308(1903) � Method of Moments

Thorndike, R.L.: Who belongs in a family?Psychometrika 18, 267–276 (1953)� Classification, � Data Analysis

Todhunter, I.: A history of the mathematical theory ofprobability. Chelsea Publishing Company, NewYork (1949) � Probability

Tomassone, R., Daudin, J.J., Danzart, M.,Masson, J.P.: Discrimination et classement.Masson, Paris (1988) � Cluster Analysis

Tryon, R.C.: Cluster Analysis. McGraw-Hill, NewYork (1939) � Classification

Tukey, J.W.: Bias and confidence in not quite largesamples. Ann. Math. Stat. 29, 614 (1958)� Jackknife Method

Tukey, J.W.: Comparing individual means in theanalysis of variance. Biometrics 5, 99–114 (1949)� Contrast

Tukey, J.W.: Exploratory Data Analysis, limited preliminary edition. Addison-Wesley, Reading, MA (1970–1971) → Data Analysis, → Stem-And-Leaf Diagram

Tukey, J.W.: Exploratory Data Analysis.Addison-Wesley, Reading, MA (1977) � DataAnalysis, � Exploratory Data Analysis

Tukey, J.W.: Quick and Dirty Methods in Statistics.Part II: Simple Analyses for Standard Designs.Quality Control Conference Papers 1951.American Society for Quality Control, New York,pp. 189–197 (1951) � Contrast

Tukey, J.W.: Some graphical and semigraphicaldisplays. In: Bancroft, T.A. (ed.) Statistical Papersin Honor of George W. Snedecor. Iowa StateUniversity Press, Ames, IA , pp. 293–316 (1972)� Box Plot, � Exploratory Data Analysis, � HatMatrix

Twisk, J.W.R.: Applied Longitudinal Data Analysisfor Epidemiology. A Practical Guide. CambridgeUniversity Press, Cambridge (2003)� Longitudinal Data

United Nations Statistical Commission: Fundamental Principles of Official Statistics → Official Statistics

Venables, W.N., Ripley, B.D.: Modern AppliedStatistics with S-PLUS. Springer, BerlinHeidelberg New York (1994) � StatisticalSoftware

Von Neumann, J.: Various Techniques Used inConnection with Random Digits. National Bureauof Standards symposium, NBS, AppliedMathematics Series 12, pp. 36–38, NationalBureau of Standards, Washington, D.C (1951)� Monte Carlo Method

Von Neumann, J.: Über ein ökonomischesGleichungssystem und eine Verallgemeinerung desBrouwerschen Fixpunktsatzes. Ergebn. math.Kolloqu. Wien 8, 73–83 (1937) � LinearProgramming

Wald, A.: Asymptotically Most Powerful Tests ofStatistical Hypotheses. Ann. Math. Stat. 12, 1–19(1941) � Likelihood Ratio Test

Wald, A.: Some Examples of Asymptotically MostPowerful Tests. Ann. Math. Stat. 12, 396–408(1941) � Likelihood Ratio Test

Wegman, E.J., Hayes, A.R.: Statistical Software.Encycl. Stat. Sci. 8, 667–674 (1988) � StatisticalSoftware

Weisberg, S.: Applied Linear Regression. Wiley, New York (2006) → Multiple Linear Regression

White, C.: The use of ranks in a test of significancefor comparing two treatments. Biometrics 8, 33–41(1952) � Mann–Whitney Test

Whyte, Lancelot Law: Roger Joseph Boscovich,Studies of His Life and Work on the 250thAnniversary of his Birth. Allen and Unwin,London (1961) � Boscovich, Roger J.

Wilcoxon, F.: Individual comparisons by rankingmethods. Biometrics 1, 80–83 (1945)� Mann–Whitney Test, � Nonparametric Test,� Wilcoxon Signed Table, � Wilcoxon SignedTest, � Wilcoxon Test

Wilcoxon, F.: Probability tables for individualcomparisons by ranking methods. Biometrics 3,119–122 (1947) � Wilcoxon Table

Wilcoxon, F.: Some rapid approximate statisticalprocedures. American Cyanamid, StamfordResearch Laboratories, Stamford, CT (1957)� Nonparametric Test, � Wilcoxon Signed Test

Wildt, A.R., Ahtola, O.: Analysis of Covariance(Sage University Papers Series on QuantitativeApplications in the Social Sciences, Paper 12).Sage, Thousand Oaks, CA (1978) � CovarianceAnalysis

Wilk, M.B., Gnanadesikan, R.: Probability plotting methods for the analysis of data. Biometrika 55, 1–17 (1968) → Q-Q Plot (Quantile to Quantile Plot)

Wilks, S.S.: Mathematical Statistics. Wiley, NewYork (1962) � Likelihood Ratio Test

Wold, H.O. (ed.): Bibliography on Time Series andStochastic Processes. Oliver & Boyd, Edinburgh(1965) � Cyclical Fluctuation

Wonnacott, R.J., Wonnacott, T.H.: Econometrics.Wiley, New York (1970) � Econometrics

Yates, F.: The design and analysis of factorialexperiments, Techn. comm. 35, Imperial Bureau ofSoil Science, Harpenden (1937) � FractionalFactorial Design

Yates, F.: Complex experiments. Suppl. J. Roy. Stat.Soc. 2, 181–247 (1935) � Two-way Analysis ofVariance

Yates, F.: Sampling Methods for Censuses andSurveys. Griffin, London (1949) � StratifiedSampling

Yates, F.: The analysis of replicated experimentswhen the field results are incomplete. Empire J.Exp. Agricult. 1, 129–142 (1933) � Missing Data

Yates, F.: The design and analysis of factorialexperiments. Technical Communication of theCommonwealth Bureau of Soils 35,Commonwealth Agricultural Bureaux, FarnhamRoyal (1937) � Two-way Analysis of Variance,� Yates’ Algorithm

Youschkevitch, A.P.: Les mathématiques arabes(VIIIème-XVème siècles). Partial translation byCazenave, M., Jaouiche, K. Vrin, Paris (1976)� Arithmetic Triangle

Yule, G.U. (1926) Why do we sometimes getnonsense-correlations between time-series?A study in sampling and the nature of time-series.J. Roy. Stat. Soc. (2) 89, 1–64 � CorrelationCoefficient, � Partial Correlation

Yule, G.U.: On the association of attributes instatistics: with illustration from the material of thechildhood society. Philos. Trans. Roy. Soc. Lond.Ser. A 194, 257–319 (1900) � Chi-square Test ofIndependence

Yule, G.U.: On the theory of correlation. J. Roy. Stat.Soc. 60, 812–854 (1897) � CorrelationCoefficient

Yule, G.U., Kendall, M.G.: An Introduction to the Theory of Statistics, 14th edn. Griffin, London (1968) → Yule and Kendall Coefficient

Zubin, J.: A technique for measuring likemindedness.J. Abnorm. Social Psychol. 33, 508–516 (1938)� Classification, � Data Analysis